Deploy grid

Warning.png Warning

This practice was designed for the old OAR1 infrastructure of Grid'5000 and is now deprecated.

First, the sites that can be used for this part are the following:

Site       debian4RMS compliant
Bordeaux Fail.png
Grenoble Fail.png
Lille Fail.png
Lyon Fail.png
Nancy Fail.png
Orsay Fail.png
Rennes Fail.png
Sophia Check.png
Toulouse Fail.png


In this exercise we are going to learn how to deploy a "virtual" cluster or grid inside the Grid5000 infrastructure. We present how to construct and configure an image so that, after deployment, we can use our own cluster or grid, with its server(s) and computing node(s), on the reserved nodes of Grid5000.

The image of the system that we are going to deploy contains all the necessary software and middleware, so the only difference between the server and the client nodes is the set of services that we launch after the deployment.

Deploy a cluster

In the first part of this TP, we deploy and configure a "virtual" cluster inside Grid5000.

Among the tools required for the exploitation of a cluster, the batch scheduler is a central element. A batch scheduler is a system that manages resources, i.e. the nodes of the cluster. It is in charge of accepting job submissions and scheduling the execution of these jobs on the resources it manages.

The clusters of the Grid5000 infrastructure use OAR (http://oar.imag.fr/) as their default batch scheduler. Nevertheless, there are plenty of batch schedulers available for clusters and parallel machines. Among the best known are PBS/OpenPBS ([1]), Torque ([2]), LSF ([3]), LoadLeveler ([4]), Condor ([5]) and SGE ([6]).

For the exploitation of our "virtual" cluster we can easily use a batch scheduler of our choice. To do that, we simply have to construct an image that contains all the appropriate software for both the server and the client nodes, and deploy it on the reserved nodes that will form our new "virtual" cluster. After deployment, during the configuration phase, we assign a role to each node (server or client) by launching the appropriate services... and our cluster is ready for use!
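To make this idea concrete, here is a minimal, hypothetical sketch of a post-deployment script that assigns a role to a node simply by starting different services; the SERVER argument and the choice of pbs_server/pbs_mom (taken from the Torque example below) are assumptions, not part of the provided image:

#!/bin/sh
# Hypothetical role-assignment sketch: $1 is assumed to be the hostname
# chosen to act as the server of the "virtual" cluster.
SERVER=$1
if [ "$(hostname)" = "$SERVER" ]; then
    # server role: start the batch scheduler server (Torque in the example below)
    sudo pbs_server
else
    # client role: start the computing node daemon
    sudo pbs_mom
fi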

Torque.png


We present an example of a cluster deployment that uses a widely known batch scheduler for the management of jobs.

TORQUE Resource Manager + MAUI Cluster Scheduler

In this specific example we describe the configuration of the TORQUE Resource Manager 2.0 ([7]) with the Maui Cluster Scheduler ([8]) as the advanced scheduler for Torque.

Both of these tools are installed on the image "debian4RMS", which we have to deploy on the nodes that we want to include in our new "virtual cluster". To automate the configuration, we have constructed a script that makes the cluster ready for use on every site, without any manual modification. In more detail, we follow these steps:

  • Reserve the nodes of a specific site of Grid5000 for deployment:
Terminal.png frontend:
oarsub -I -q deploy -l nodes=3
  • Deploy the image "debian4RMS" on all the nodes of the reservation (do not forget to replace partition by the user-deploy partition ID of your working site):
Terminal.png frontend:
kadeploy -e debian4RMS -f $OAR_NODEFILE -p partition -l ygeorgiou
  • Execute the script named "launch_torque+maui.sh", which finalizes the configuration and starts the services of the Torque Resource Manager and the Maui Cluster Scheduler (the command to execute the script is given after its explanation). The script does the following:
    • Starts the NFS server service
    • Makes the modifications to some files on the server that are needed for the configuration of Torque
    • Makes the modification to a file on the server that is needed for the configuration of the Maui Scheduler
    • Diffuses the modified files to all the computing nodes
    • Starts the Torque Resource Manager service
    • Starts the Maui Cluster Scheduler service

For more detail, we present the script along with comments on every command that is used:

#!/bin/sh

# start the nfs server
 sudo /etc/init.d/nfs-kernel-server start

# create the string NODE_ARG which contains all the computing nodes with "-m" before each node...
# it is used as the argument for sentinelle to launch a command on each node
# create the string NODES which contains all the computing nodes with "np=2" after each node...
# it is copied to the file /tmp/nodes and used by Torque as the file describing all the computing nodes of the system
 NODE_ARG=""
 NODES=""
 for node in $@
 do
    NODE_ARG="$NODE_ARG -m $node "
    NODES="$NODES $node np=2  "
 done
 echo $NODES > /tmp/nodes

# store the hostname and the domain name of the server in variables
 HOSTNAME=$(hostname)
 DNSDOMAINNAME=$(dnsdomainname)
 export DNSDOMAINNAME

# mount the home directory of the server to all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo mount $HOSTNAME:/home/g5k /home/g5k"
# TORQUE configuration: change the file server_name with our new server hostname
 sudo sh -c 'sed s/localhost/$HOSTNAME/ /home/g5k/torque-2.0.0p7/server > /usr/spool/PBS/server_name'
# TORQUE configuration: change some lines of the file mom_config using the server hostname and the server domain name
sudo sh -c 'sed -e s/localhost/$HOSTNAME/g -e s/*.domainname/*.$DNSDOMAINNAME/g /home/g5k/torque-2.0.0p7/momconfig >  /usr/spool/PBS/mom_priv/config'
# MAUI SCHEDULER configuration: change the file maui.cfg using the new server hostname
 sudo sh -c 'sed s/localhost/$HOSTNAME/g /home/g5k/maui-3.2.6p14/maui_test.cfg > /usr/local/maui/maui.cfg'

# change the ownerships of some files that we need to modify
 sudo chown g5k /usr/spool/PBS/server_name
 sudo chown g5k /usr/spool/PBS/server_priv/
 sudo chown g5k /usr/spool/PBS/server_priv/nodes
# TORQUE configuration:creation of the computing nodes description file... 
# add the newline character after every np=2 of the /tmp/nodes file created before and store the file 
# on the location to be used by Torque
 sudo cat /tmp/nodes | sed  's:np=2 :np=2\
 :g' > /usr/spool/PBS/server_priv/nodes
# change the ownerships of some files that we need to modify on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo chown g5k /usr/spool/PBS/server_name"
 sentinelle.pl $NODE_ARG -p "sudo chown g5k /usr/spool/PBS/server_priv/"
 sentinelle.pl $NODE_ARG -p "sudo chown g5k /usr/spool/PBS/server_priv/nodes"
 sentinelle.pl $NODE_ARG -p "sudo chown g5k /usr/spool/PBS/mom_priv/config"
 sentinelle.pl $NODE_ARG -p "sudo chown g5k /usr/local/maui/maui.cfg"

# TORQUE configuration:copy the server_name file on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo -u g5k scp $HOSTNAME:/usr/spool/PBS/server_name /usr/spool/PBS/server_name"
# TORQUE configuration:copy the computing nodes description file on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo -u g5k scp $HOSTNAME:/usr/spool/PBS/server_priv/nodes /usr/spool/PBS/server_priv/nodes"
# TORQUE configuration:copy the config file needed by pbs_mom on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo -u g5k scp $HOSTNAME:/usr/spool/PBS/mom_priv/config /usr/spool/PBS/mom_priv/config"
# MAUI SCHEDULER configuration:copy the maui.cfg file needed by maui on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo -u g5k scp $HOSTNAME:/usr/local/maui/maui.cfg /usr/local/maui/maui.cfg"

# finished changes so change the ownerships of the files as they were before on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo chown root /usr/spool/PBS/server_name"
 sentinelle.pl $NODE_ARG -p "sudo chown root /usr/spool/PBS/server_priv/"
 sentinelle.pl $NODE_ARG -p "sudo chown root /usr/spool/PBS/server_priv/nodes"
 sentinelle.pl $NODE_ARG -p "sudo chown root /usr/spool/PBS/mom_priv/config"
 sentinelle.pl $NODE_ARG -p "sudo chown root /usr/local/maui/maui.cfg"
# change the ownerships of the files as they were before on the server
 sudo chown root /usr/spool/PBS/server_name
 sudo chown root /usr/spool/PBS/server_priv/
 sudo chown root /usr/spool/PBS/server_priv/nodes

# start the computing node client on all the computing nodes
 sentinelle.pl $NODE_ARG -p "sudo pbs_mom"
# start the torque server
 sudo pbs_server
# start maui scheduler
 (maui > /dev/null 2>&1)&
  • The above script has to be launched from the machine that is going to be the server of the cluster, and the names of the computing nodes have to be passed as arguments to the script. Hence, after deploying the images, we can run the following command to execute the script:
Terminal.png frontend:
sort -u $OAR_NODEFILE | tail -2 | ssh -l g5k $(sort -u $OAR_NODEFILE | head -1) "xargs ~g5k/launch_torque+maui.sh"
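For readability, here is an equivalent step-by-step form of this command; it is only a sketch, assuming as above that the first node of the reservation becomes the server and the two remaining nodes are the computing nodes:

# equivalent of the one-liner above, written with intermediate variables
SERVER=$(sort -u $OAR_NODEFILE | head -1)          # first node of the reservation: the server
NODES=$(sort -u $OAR_NODEFILE | tail -2 | xargs)   # remaining two nodes, joined on one line: the computing nodes
ssh -l g5k $SERVER "~g5k/launch_torque+maui.sh $NODES"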

The virtual cluster is now ready for use with the Torque Resource Manager 2.0 along with its own scheduler, ready for job submissions (the Maui Scheduler is no longer configured on this image). To test our cluster, we can log on to its server:

Terminal.png frontend:
ssh -l g5k $(sort -u $OAR_NODEFILE | head -1)

We can see the state of the cluster by writing:

Terminal.png node:
pbsnodes -a

We can create a simple job that sleeps for 60 seconds and then prints the date, by creating a file (job.torque) with our favorite editor that contains the following lines:

sleep 60
date

To submit the job we use the command qsub. For the execution of the job we can ask for 2 nodes with 2 processors each by writing:

Terminal.png node:
qsub -l nodes=2:ppn=2 job.torque
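Equivalently, the resource request can be embedded in the job file itself using #PBS directives, so that a plain "qsub job.torque" is enough; this is only a sketch, and the walltime value is an arbitrary example:

#!/bin/sh
#PBS -l nodes=2:ppn=2        # same resource request as on the qsub command line
#PBS -l walltime=00:05:00    # example time limit (not required by the TP)
sleep 60
date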

While the job is executing, we can see the state of the jobs with the command:

Terminal.png node:
qstat

When the job is finished, the file job.torque.e* contains the errors, if any, and the file job.torque.o* contains the output of the job (the printed date in our case).
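For instance, once the job has completed, its standard output can be displayed with:

Terminal.png node:
cat job.torque.o*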

For more details on job submission and advanced cluster administration, see the manuals of the two tools (Torque, Maui).

Deploy a grid

Now we are going to deploy a complete grid infrastructure within Grid'5000, which means a grid within the grid!

The idea is to prepare an image of a system containing all the software and middleware we need. Hence there will be no difference between a server node and a client one, apart from the services that are started after deployment (which means that all nodes use the same Kadeploy environment).

CIGRI OAR In G5K.png

The Kadeploy environment we prepared for this practice is named "debian4RMS" (available at the Orsay and Sophia sites). It is a Debian (unstable) system with OAR and CIGRI installed as explained before.

Here are the steps to follow:

  • Reserve nodes for the deployment on 2 different sites:
Terminal.png frontend:
oarsub -I -q deploy -l nodes=3
  • Deploy the environment on both sites:
Terminal.png frontend:
kadeploy -e debian4RMS -f $OAR_NODEFILE -p partition -l ygeorgiou
  • Call the script named "launch_OAR.sh", which is located in the home directory of the user g5k (password: grid5000). This script performs the following actions (it takes the list of the nodes of the cluster as parameters, excluding the node that launches the script):
  1. Start the NFS service (to share the home directory of the user g5k with all the nodes of the virtual cluster)
  2. Start MySQL (needed by the OAR server)
  3. Start OAR (server)
  4. Modify the OAR configuration file on all nodes, so that they can locate the server.
  5. Mount the g5k user home directory on all nodes
  6. Run "oarnodesetting -s Alive" on all nodes to register them in the OAR database.

Here is the detailed explanation of the script:

#!/bin/sh

# Each command-line argument is the name of a machine on which the OAR node mode has to be activated.
HOSTNAME=$(hostname) 

# Start the NFS server so that the computing nodes can mount the "home" of user g5k
sudo /etc/init.d/nfs-kernel-server start
# Start the HTTP server so that we can have access to the monitoring graphic tools (Monika and DrawGantt) of OAR.
sudo /etc/init.d/apache start
# Start the OAR server
sudo /etc/init.d/oar-server start

# The computing node names must be formatted for use with the command "sentinelle.pl" (add a -m in front of each name)
NODE_ARG=""
for node in $@
do
    NODE_ARG="$NODE_ARG -m $node "
done

# Mount NFS  home of g5k on all the nodes
sentinelle.pl $NODE_ARG -p "sudo mount $HOSTNAME:/home/g5k /home/g5k"
# Update the OAR configuration so that the computing nodes know which machine is the OAR server.
sudo -u oar sh -c 'sed s/localhost/$HOSTNAME/g /etc/oar.conf > /tmp/oar.conf'
# Diffusion of the new configuration
sentinelle.pl $NODE_ARG -p "sudo -u oar scp $HOSTNAME:/tmp/oar.conf /etc/oar.conf"
# Registration of the computing nodes on the OAR server.
sentinelle.pl $NODE_ARG -p "oarnodesetting -s Alive" 

As you can note, we use the command "sentinelle.pl". This Perl script manages SSH connections to a list of nodes. In particular, it makes it possible to specify a timeout and to detect which nodes did not succeed in executing the desired command (much more practical and effective than writing a "for" loop over SSH in a shell script). You can download the script "sentinelle.pl" from the address: sentinelle.pl
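To illustrate what sentinelle.pl replaces, here is a rough, sequential shell equivalent of "sentinelle.pl -m node1 -m node2 ... -p CMD"; it is only a sketch (no parallel execution, and the 30-second connection timeout and the uptime command are arbitrary example values):

#!/bin/sh
# Naive sequential replacement for sentinelle.pl: run a command on every
# node given as argument and report the nodes on which it failed.
CMD="uptime"                              # example command (assumption)
for node in "$@"
do
    if ! ssh -o ConnectTimeout=30 "$node" "$CMD"
    then
        echo "command failed on $node" >&2
    fi
done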

We want to launch the image initialization script on the first node of the reservation. This machine will thus be the front-end (server) of our "virtual" cluster, and the list of the computing nodes has to be provided as parameters of the script:

Terminal.png frontend:
sort -u $OAR_NODEFILE | tail -2 | ssh -l g5k $(sort -u $OAR_NODEFILE | head -1) "xargs ~g5k/launch_OAR.sh"

After this the virtual cluster is ready for use.

To test, we can launch "firefox" on the server (the node given by sort -u $OAR_NODEFILE | head -1), where bookmarks to visualize the state of the cluster are available.

Note.png Note

  1. It is necessary to initialize OAR on each virtual cluster that we want to deploy. You can test your new clusters with the commands "oarsub", "oarstat", "oarnodes"... (a small example is given below)
  2. If you use a virtual cluster with more than 2 computing nodes, replace tail -2 by grep -v $(sort -u $OAR_NODEFILE | head -1) in the command above.
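For instance, once OAR is initialized on a virtual cluster, a quick sanity check can be done from its front-end; the submission line is only a sketch, as the exact oarsub syntax depends on the OAR version installed on the image:

Terminal.png node:
oarnodes
oarsub -l nodes=2 "sleep 60"
oarstat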


Now we have to choose one of the two OAR front-end machines to play the role of the CIGRI server. Then it is necessary to add the other front-end to the configuration of CIGRI (by default only the front-end "localhost" is configured). For that, there is a script called "add_cluster_in_CIGRI.sh":

#!/bin/sh

# As argument: $1 has to contain the name of the OAR front-end that we want to add

echo "INSERT INTO clusters (clusterName,clusterAdmin,clusterBatch) VALUES ('$1', , 'OAR')" | mysql -ucigri -pcigri cigri
echo "INSERT INTO users (userGridName,userClusterName,userLogin) VALUES ('g5k', '$1', 'g5k')" | mysql -ucigri -pcigri cigri

After inserting the additional cluster into the database, this script also launches the CIGRI server.

This script is invoked as follows:

Terminal.png node:
~g5k/add_cluster_in_CIGRI.sh name_of_the_second_OAR_frontal

To test OAR and CIGRI, POV-Ray is installed on all the nodes. This makes it possible to compute each image of a graphic animation in parallel on the nodes. Here are the steps to follow to run the demonstration:

  • log on to the machine that was designated as the CIGRI server
  • the demonstration is located in ~g5k/demo
  • edit the file demo.jdl to add a section following this model:
name_of_the_second_frontal_OAR {
   execFile = /home/g5k/demo_povray/scripts/generateImage.sh ;
   execDir = /home/g5k/demo_povray/tmp ;
}
  • launch the command which submits the multijob to CIGRI:
Terminal.png node:
gridsub -f demo.jdl
  • visualize the state of the multijob in the Web interface by launching "firefox" and opening the CIGRI bookmark

You will be able to find the computed images in the directory "/home/g5k/demo_povray/tmp".
