Creating Customized Grid'5000 Environments with Chef
Introduction
The goal of this practical session is to introduce Grid'5000 users to Chef, an open source configuration management tool. We will present how Chef can be used to describe and automate large distributed software installations typically used on Grid'5000.
The basic utilization of Chef involves applying a set of cookbooks to nodes. A cookbook is a description of the installation and configuration of a software tool, written in Ruby using resources offered by the Chef framework. A node configuration is described as a JSON entry which is interpreted by Chef to modify software installed on a node in order to match the required configuration.
Installing Chef
First, we reserve three nodes in deploy mode on Grid'5000:
$ oarsub -l nodes=3,walltime=2 -t deploy 'sleep 86400'
We connect to the job (replace with your own job ID):
$ oarsub -C OAR_JOB_ID
Then, we deploy a squeeze image on the first node:
$ kadeploy3 -e squeeze-x64-nfs -m `head -1 $OAR_NODE_FILE` -k
Let's connect to the first node:
$ ssh root@`head -1 $OAR_NODE_FILE`
To install Chef, we first need to install RubyGems:
# aptitude update
# aptitude install ruby ruby-dev libopenssl-ruby rdoc ri irb build-essential wget ssl-cert # These are the required dependencies for RubyGems
# cd /tmp/
# env http_proxy=http://proxy:3128 wget http://production.cf.rubygems.org/rubygems/rubygems-1.7.2.tgz
# tar xzf rubygems-1.7.2.tgz
# cd rubygems-1.7.2
# ruby setup.rb
# ln -s /usr/bin/gem1.8 /usr/bin/gem
Now that RubyGems is installed, we can install Chef:
# env http_proxy=http://proxy:3128 gem install chef --no-rdoc --no-ri
Finally, we configure it:
# mkdir -p /etc/chef /tmp/chef-solo
Create a /etc/chef/solo.rb file containing the following content:
cookbook_path "/tmp/chef-solo/cookbooks"
file_cache_path "/tmp/chef-solo"
Our image is now ready to be used with Chef. Let's save it. From the frontend:
$ mkdir -p ~/public/{descriptions,images}
$ ssh root@`head -1 $OAR_NODE_FILE` tgz-g5k > ~/public/images/squeeze-x64-chef.tgz
tgz-g5k creates an archive of the modified environment, which takes a couple of minutes. Next, we create the environment description, replacing $USER and $SITE with your own username and site. Save it in ~/public/descriptions/squeeze-x64-chef.dsc:
name : squeeze-x64-chef
version : 1
description : squeeze-x64-nfs with Chef
author : $USER
tarball : http://public.$SITE.grid5000.fr/~$USER/images/squeeze-x64-chef.tgz|tgz
postinstall : /grid5000/postinstalls/debian-x64-nfs-2.3-post.tgz|tgz|traitement.ash /rambin
kernel : /vmlinuz
initrd : /initrd.img
fdisktype : 83
filesystem : ext3
environment_kind : linux
visibility : shared
demolishing_env : 0
Now, we deploy this new environment on the two other nodes we reserved:
$ kadeploy3 -a http://public.$SITE.grid5000.fr/~$USER/descriptions/squeeze-x64-chef.dsc -m second-node.$SITE.grid5000.fr -m third-node.$SITE.grid5000.fr -k
You can let this deployment run in the background and move to the next section.
Using Chef
Now that our environment is ready, we are going to use Chef for the first time. Chef can be run in two modes: a client/server model and a stand-alone model (chef-solo). In this tutorial, we only use chef-solo.
The chef-solo program takes as arguments a node configuration in JSON format describing the list of recipes to install, and a cookbook tarball containing the recipes.
Still on the first node of our reservation, we create the JSON file containing our node configuration. Let's name it /tmp/node.json:
{
"run_list": [ "recipe[java_sun]" ]
}
This tells Chef to install the java_sun recipe, which automates the installation of the Sun Java JDK. First, let's check that Java is not installed:
# java -version
-bash: java: command not found
Now, we can run chef-solo:
# chef-solo -j /tmp/node.json -r http://public.sophia.grid5000.fr/~priteau/cookbooks.tgz
Note that we pass our node.json file with the -j option, and with -r we pass a URL to a remote gzipped tarball of recipes that will be extracted to the cookbook cache (remember the cookbook_path in /etc/chef/solo.rb? That's where the recipes go). In this case, the cookbooks.tgz file contains a java_sun cookbook. You should see output similar to this:
[Mon, 18 Apr 2011 08:43:44 +0200] INFO: Setting the run_list to ["recipe[java_sun]"] from JSON
[Mon, 18 Apr 2011 08:43:44 +0200] INFO: Starting Chef Run (Version 0.9.16)
[Mon, 18 Apr 2011 08:43:44 +0200] INFO: Installing package[sun-java6-jdk] version 6.24-1~squeeze1
[Mon, 18 Apr 2011 08:43:44 +0200] INFO: Pre-seeding package[sun-java6-jdk] with package installation instructions.
[Mon, 18 Apr 2011 08:44:02 +0200] INFO: package[sun-java6-jdk] sending run action to execute[update-java-alternatives] (immediate)
[Mon, 18 Apr 2011 08:44:02 +0200] INFO: Ran execute[update-java-alternatives] successfully
[Mon, 18 Apr 2011 08:44:02 +0200] INFO: Chef Run complete in 17.475364 seconds
[Mon, 18 Apr 2011 08:44:02 +0200] INFO: cleaning the checksum cache
[Mon, 18 Apr 2011 08:44:02 +0200] INFO: Running report handlers
[Mon, 18 Apr 2011 08:44:02 +0200] INFO: Report handlers complete
We can see that Chef installs the sun-java6-jdk Debian package, but also runs update-java-alternatives to set this java version as the default. Check that Java has been successfully installed:
# java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Now, run the same Chef command again:
# chef-solo -j /tmp/node.json -r http://public.sophia.grid5000.fr/~priteau/cookbooks.tgz
You should see something like this:
[Mon, 18 Apr 2011 09:18:45 +0200] INFO: Setting the run_list to ["recipe[java_sun]"] from JSON
[Mon, 18 Apr 2011 09:18:45 +0200] INFO: Starting Chef Run (Version 0.9.16)
[Mon, 18 Apr 2011 09:18:45 +0200] INFO: Chef Run complete in 0.055622 seconds
[Mon, 18 Apr 2011 09:18:45 +0200] INFO: cleaning the checksum cache
[Mon, 18 Apr 2011 09:18:45 +0200] INFO: Running report handlers
[Mon, 18 Apr 2011 09:18:45 +0200] INFO: Report handlers complete
We notice that the execution is much faster, and there is no output about the Sun Java JDK. This is because Chef detects that the package has already been installed, and doesn't perform any action.
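The behaviour observed above (act on the first run, do nothing on the second) can be illustrated with a toy model in plain Ruby. This is only a sketch of the idea, not Chef's implementation: the PackageResource class and the set of installed packages are hypothetical stand-ins for a real resource and the real system state.

```ruby
require "set"

# A toy model of an idempotent "package" resource. This is NOT Chef's
# implementation; PackageResource and the installed set are hypothetical.
class PackageResource
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Bring the system to the desired state and report whether anything
  # had to change.
  def run(installed)
    return :up_to_date if installed.include?(name) # desired state already reached
    installed.add(name)                            # stands in for the real installation
    :installed
  end
end

installed = Set.new                         # pretend nothing is installed yet
jdk = PackageResource.new("sun-java6-jdk")

first  = jdk.run(installed)                 # first run performs the installation
second = jdk.run(installed)                 # second run finds nothing to do
puts "first run: #{first}, second run: #{second}"
```

The resource compares the current state with the desired one before acting, which is why running the same configuration repeatedly is safe.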
Writing Chef cookbooks
It is now time to start writing our own Chef cookbooks! Go to the frontend of the site where you made your reservation, and create an empty repository of Chef cookbooks:
$ gem install rake
$ export PATH=$HOME/.gem/ruby/1.8/bin:$PATH
$ env http_proxy=http://proxy:3128 git clone http://github.com/opscode/chef-repo.git
$ cd chef-repo
$ rake new_cookbook COOKBOOK=hadoop
Let's study what has been generated by this step:
$ cd cookbooks/hadoop
$ ls -1F
README.rdoc
attributes/
definitions/
files/
libraries/
metadata.rb
providers/
recipes/
resources/
templates/
- README.rdoc: a description of this cookbook
- attributes: attributes are configuration data that can be used by recipes
- definitions: allow you to create new Chef resources
- files: any file can be stored here and used by recipes
- libraries: include arbitrary Ruby code
- metadata.rb: metadata for the cookbook, describing its dependencies and supported platforms
- providers: to support multiple platforms with a single resource
- recipes: the code doing all system changes is stored here
- resources: abstraction of a configuration item
- templates: ERB templates used by recipes
Before starting to write the recipe for our Hadoop cookbook, we have to include its dependencies. Hadoop requires Java, so we are going to include in our repository the java_sun cookbook we used earlier.
$ cd ~/chef-repo/cookbooks
$ cp -R /home/priteau/cookbooks/java_sun .
$ cd hadoop
Edit metadata.rb to add the following line:
depends "java_sun"
This tells Chef that the hadoop cookbook depends on the java_sun cookbook.
Now, we can start writing our recipe. Edit the recipes/default.rb file and add the following line:
include_recipe "java_sun"
This tells the hadoop default recipe to execute the java_sun default recipe (and thus install the Sun Java JDK if it is not installed).
Next, we describe a command execution:
execute "Install Hadoop" do
  command <<-EOH
    cd /tmp
    wget http://public.sophia.grid5000.fr/~priteau/hadoop-0.20.2.tar.gz
    tar xzf hadoop-0.20.2.tar.gz
    rm -rf /opt/hadoop
    useradd -d /opt/hadoop hadoop
    mkdir -p /opt/hadoop
    mv hadoop-0.20.2/* /opt/hadoop
    chown -R hadoop:hadoop /opt/hadoop
    sed -i 's|# export JAVA_HOME=/usr/lib/j2sdk1.5-sun|export JAVA_HOME=/usr/lib/jvm/java-6-sun|' /opt/hadoop/conf/hadoop-env.sh
  EOH
end
As you can see, we can create an execute block and provide it with a shell script which is run as root by default. However, this doesn't bring much advantage compared to a regular shell script. Now, we are going to see how we can use resources provided by Chef to make this recipe cleaner and easier to read.
User and group creation
The following creates a hadoop Unix group, and a hadoop Unix user belonging to this group and using /opt/hadoop as home directory.
group "hadoop" do
end

user "hadoop" do
  group "hadoop"
  home "/opt/hadoop"
  shell "/bin/bash"
end
Directory creation
With the following code, we create the home directory of our user. The first block recursively removes the /opt/hadoop directory. The second one creates it and sets the ownership to hadoop:hadoop.
# We clean up the install directory to start from scratch
directory "/opt/hadoop" do
  action :delete
  recursive true
end

# The default directory action is create, so this block creates a /opt/hadoop directory owned by hadoop:hadoop
directory "/opt/hadoop" do
  owner "hadoop"
  group "hadoop"
end
File fetching
The following block fetches the hadoop-0.20.2.tar.gz file from the public HTTP server in Sophia and sets its ownership to hadoop:hadoop. The file is stored in the file cache of Chef (/tmp/chef-solo, configured in the /etc/chef/solo.rb file). The checksum (a SHA-256 hash) allows Chef to avoid downloading the file if it is already present on the node.
remote_file "#{Chef::Config[:file_cache_path]}/hadoop-0.20.2.tar.gz" do
  checksum "94a08444706bb09a4f1bd124e5533fbb483e30f764ce647eb0adc399c7b9b174"
  owner "hadoop"
  group "hadoop"
  source "http://public.sophia.grid5000.fr/~priteau/hadoop-0.20.2.tar.gz"
end
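If you package a different archive, you need its SHA-256 digest for the checksum attribute. A minimal sketch using Ruby's standard Digest library follows; sha256_of is a hypothetical helper name, and the file path is whatever you pass on the command line.

```ruby
require "digest"

# Return the SHA-256 hex digest of a file, i.e. a 64-character hexadecimal
# string like the one used in the checksum attribute.
# sha256_of is a hypothetical helper name chosen for this example.
def sha256_of(path)
  Digest::SHA256.file(path).hexdigest
end

# When run as a script with a file name, print that file's digest.
if __FILE__ == $0 && ARGV[0]
  puts sha256_of(ARGV[0])
end
```

Saved as, say, checksum.rb (an arbitrary name), it could be run as `ruby checksum.rb hadoop-0.20.2.tar.gz`. The sha256sum command-line tool computes the same digest.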
Command execution
We can now modify our execute block to only do the untar and move the files:
execute "Install Hadoop" do
  cwd "/tmp"
  user "hadoop"
  group "hadoop"
  command <<-EOH
    tar xzf #{Chef::Config[:file_cache_path]}/hadoop-0.20.2.tar.gz
    mv hadoop-0.20.2/* /opt/hadoop
  EOH
end
Templates
With Chef, it is possible to generate files from templates written in ERB. ERB lets you embed Ruby code in a template; this code is evaluated to generate the output file. From within the Chef recipe, we generate a file from a template using a block like this:
template "/opt/hadoop/conf/hadoop-env.sh" do
  source "hadoop-env.sh"
  mode 0644
  owner "hadoop"
  group "hadoop"
  variables(
    :java_home => "/usr/lib/jvm/java-6-sun/"
  )
end
/opt/hadoop/conf/hadoop-env.sh is the path to the output file. hadoop-env.sh is the source template and is stored in templates/default in the cookbook directory. variables is a Ruby hash accessible from the ERB code included in the template.
To create the template file, we start from the hadoop-env.sh file included in the Hadoop source. For convenience, it is available on the Grid'5000 NFS server:
$ cp /home/priteau/hadoop-0.20.2/conf/hadoop-env.sh templates/default/
Edit templates/default/hadoop-env.sh and replace the line:
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
with the line:
export JAVA_HOME=<%= @java_home %>
The Ruby code between <%= and %> markers is evaluated and its value (here, passed from the Chef hadoop recipe) replaces the block.
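To see this substitution outside of Chef, the template line can be rendered with Ruby's standard ERB library. The sketch below only approximates what happens during a Chef run: TemplateContext is a hypothetical class that exposes each entry of the variables hash as an instance variable, which is what makes @java_home visible to the template.

```ruby
require "erb"

# Stand-in for the context in which a template is rendered: each entry of
# the variables hash becomes an instance variable (@java_home here).
# TemplateContext is a hypothetical class, not part of Chef.
class TemplateContext
  def initialize(variables)
    variables.each { |key, value| instance_variable_set("@#{key}", value) }
  end

  # Evaluate the ERB markers with this object's instance variables in scope.
  def render(template_text)
    ERB.new(template_text).result(binding)
  end
end

template_text = "export JAVA_HOME=<%= @java_home %>\n"
context = TemplateContext.new(:java_home => "/usr/lib/jvm/java-6-sun/")
rendered = context.render(template_text)
puts rendered
```

Text outside the markers is copied verbatim, so the rest of hadoop-env.sh would pass through unchanged.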
We also need templates for the core-site.xml and mapred-site.xml files:
template "/opt/hadoop/conf/core-site.xml" do
  source "core-site.xml"
  mode 0644
  owner "hadoop"
  group "hadoop"
  variables(
    :namenode => node[:hadoop][:namenode]
  )
end

template "/opt/hadoop/conf/mapred-site.xml" do
  source "mapred-site.xml"
  mode 0644
  owner "hadoop"
  group "hadoop"
  variables(
    :jobtracker => node[:hadoop][:jobtracker]
  )
end
templates/default/core-site.xml will configure the name node (the metadata server):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://<%= @namenode %>:54310</value>
  </property>
</configuration>
templates/default/mapred-site.xml will configure the job tracker:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://<%= @jobtracker %>:54311</value>
  </property>
</configuration>
Final common configuration steps
These final common steps perform a cleanup of HDFS and kill Hadoop services if they are running.
# We clean up the HDFS directory
directory "/tmp/hadoop-hadoop" do
  action :delete
  recursive true
end

execute "Format HDFS" do
  user "hadoop"
  group "hadoop"
  command <<-EOH
    /opt/hadoop/bin/hadoop namenode -format
  EOH
end

execute "Stop Hadoop" do
  user "hadoop"
  group "hadoop"
  command "pkill -9 java; true"
end
Launching services
Finally we can run the Hadoop services. Since we don't run the same services on all nodes, we are going to differentiate between master (NameNode and JobTracker) and slave nodes (DataNode and TaskTracker).
We create a new recipe, recipes/master.rb:
include_recipe "hadoop"

service "NameNode" do
  start_command "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start namenode'"
  supports [ :start ]
  action [ :start ]
end

service "DataNode" do
  start_command "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start datanode'"
  supports [ :start ]
  action [ :start ]
end

service "JobTracker" do
  start_command "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start jobtracker'"
  supports [ :start ]
  action [ :start ]
end

service "TaskTracker" do
  start_command "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start tasktracker'"
  supports [ :start ]
  action [ :start ]
end
We create a recipe for the slaves, recipes/slave.rb:
include_recipe "hadoop"

service "DataNode" do
  start_command "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start datanode'"
  supports [ :start ]
  action [ :start ]
end

service "TaskTracker" do
  start_command "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start tasktracker'"
  supports [ :start ]
  action [ :start ]
end
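The two recipes differ only in the list of daemons they start. The plain-Ruby sketch below makes that explicit by building the start commands from a per-role list. ROLE_DAEMONS and start_command are hypothetical names introduced for illustration; the daemon lists and the command string are taken from the recipes above.

```ruby
# Daemons started by each recipe, as listed in recipes/master.rb and
# recipes/slave.rb. ROLE_DAEMONS and start_command are hypothetical names.
ROLE_DAEMONS = {
  "master" => %w[namenode datanode jobtracker tasktracker],
  "slave"  => %w[datanode tasktracker]
}

# Build the command used to start one Hadoop daemon as the hadoop user.
def start_command(daemon)
  "su hadoop sh -c '/opt/hadoop/bin/hadoop-daemon.sh start #{daemon}'"
end

# Print the commands each role would run.
ROLE_DAEMONS.each do |role, daemons|
  puts "#{role}:"
  daemons.each { |daemon| puts "  #{start_command(daemon)}" }
end
```

Since recipes are Ruby code, the same kind of loop could in principle wrap the service blocks inside a recipe, at the cost of making each service less explicit.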
Cookbook execution
To execute our Hadoop cookbook on the first node, first create its JSON description. Let's store it in ~/master.json on the Grid'5000 frontend:
{
"hadoop": {
"namenode": "first-node.site.grid5000.fr",
"jobtracker": "first-node.site.grid5000.fr"
},
"run_list": [ "recipe[hadoop::master]" ]
}
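To make the structure of this file concrete, the following sketch parses it with Ruby's standard JSON library and splits the run_list entry into a cookbook name and a recipe name. parse_run_list_item is a hypothetical helper written for this example, not Chef's own parser. Note also that this sketch reads the attributes with string keys, whereas the recipes above use node[:hadoop][:namenode].

```ruby
require "json"

# The node description from above, inlined so the example is self-contained.
node_json = <<-JSON
{
  "hadoop": {
    "namenode": "first-node.site.grid5000.fr",
    "jobtracker": "first-node.site.grid5000.fr"
  },
  "run_list": [ "recipe[hadoop::master]" ]
}
JSON

# Split a run_list entry into cookbook and recipe names. A missing
# "::recipe" part means the cookbook's default recipe.
# parse_run_list_item is a hypothetical helper, not Chef's own parser.
def parse_run_list_item(item)
  match = item.match(/\Arecipe\[([^\]:]+)(?:::([^\]]+))?\]\z/)
  raise ArgumentError, "unsupported run_list item: #{item}" unless match
  { :cookbook => match[1], :recipe => match[2] || "default" }
end

node = JSON.parse(node_json)
entry = parse_run_list_item(node["run_list"].first)
namenode = node["hadoop"]["namenode"]

puts "cookbook=#{entry[:cookbook]} recipe=#{entry[:recipe]}"
puts "namenode=#{namenode}"
```

Here recipe[hadoop::master] refers to recipes/master.rb in the hadoop cookbook, while an entry such as recipe[java_sun] refers to that cookbook's recipes/default.rb.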
Create an archive of the cookbooks:
$ tar -C ~/chef-repo -cvzf ~/cookbooks.tgz ./cookbooks
On the first node, as root, run (replacing USER with your username):
# chef-solo -j /home/USER/master.json -r /home/USER/cookbooks.tgz
Creating a slave.json file to configure the slave nodes is left as an exercise for the reader.
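As a starting point, here is one possible sketch that generates such a file with Ruby's standard JSON library. It assumes the slaves need the same hadoop attributes as the master, since hadoop::slave includes the default recipe whose templates read them, and it uses the placeholder host name from master.json. slave_config is a hypothetical helper name.

```ruby
require "json"

# Build a slave node description that mirrors master.json but runs the
# hadoop::slave recipe. slave_config is a hypothetical helper name.
def slave_config(master_host)
  {
    "hadoop" => {
      "namenode"   => master_host,
      "jobtracker" => master_host
    },
    "run_list" => [ "recipe[hadoop::slave]" ]
  }
end

# Placeholder host name; replace it with the name of your first node.
slave_json = JSON.pretty_generate(slave_config("first-node.site.grid5000.fr"))
puts slave_json
```

The printed text can be saved as ~/slave.json and passed to chef-solo with -j on the two other nodes, in the same way as master.json.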
