Deploy environment-OAR2

Connect to Grid'5000

  • you should connect to the frontend node of the cluster you were granted access to:
Image:Terminal.png outside:
ssh login@access.site.grid5000.fr
  • if needed, you may then hop from there to the frontend of another site, depending on where you want to deploy:
Image:Terminal.png frontend:
ssh another_site.grid5000.fr

What you need to know before starting

The first thing to understand is that kadeploy3 remotely reboots nodes and boots them from configuration files hosted on a server, many nodes at a time. On some clusters, this operation has a non-zero failure rate, so you might see some operations fail during this tutorial. In that case, simply retry. The system does not retry for you because that would mean waiting for long timeouts in every case, even those where a 90% success rate is sufficient.
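Since retrying is left to you, a small wrapper can save some typing. The sketch below is not part of kadeploy3: retry is a hypothetical helper, and it assumes that the wrapped command reports failure through a non-zero exit status, which is worth checking for kadeploy3 on your site.

```shell
# retry MAX CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds, at most MAX times.
# Assumes CMD signals failure with a non-zero exit status.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

It could be called as retry 3 followed by the deployment command used later in this tutorial. Note that each attempt runs the whole command again, on every node it was given.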

What is an Environment?

Where we describe what exactly an image, a kernel, an initrd and a postinstall are

An environment in kadeploy3 is a set of files describing a fully functional operating system. To set up an operating system, kadeploy3 uses up to 4 files in the most common cases:

  1. An image
    • An image is a file containing all the operating system files. It can be a compressed archive (i.e. a tgz file) or a dump of a device (i.e. a dd file). In this tutorial, you will learn to build new images for Kadeploy3.
  2. A kernel file
    • For the Unix based environment, the kernel file specifies which kernel to boot. It is the full path of the kernel file.
  3. An initrd file (optional)
    • For Linux-based environments, the optional initrd file lets you use an initial ramdisk, which serves as the root filesystem during the boot sequence. More information: Initrd on Wikipedia
  4. A postinstall file (optional)
    • The postinstall file adapts the environment to the specificities of each cluster. It is not mandatory for a Kadeploy3 environment, but if you know what you are doing, feel free to define one.

Once you have this set of files, you can describe your environment to kadeploy3. This description represents an environment in the kadeploy3 sense.

Find and deploy existing images

Where you will learn to locate environments, find deployable partitions, make a job on a deployable node and use kadeploy...

Locate a suitable image

Grid'5000 maintains several reference environments directly available on any site. These environments are based on various versions of Debian, and for each Debian version there are several variants.
They are called reference environments because they can be used to generate customized environments.
The complete list of available environments, with their variants and the sites where they are available, is given on the wiki page:

From that same page, each variant of each reference environment links to another page that gives a thorough description of the environment content, how it was built and how to use it with kadeploy3. An example in the next link:

An environment library is maintained on each site in the /grid5000 directory of the frontend node: all environments available on a site are stored in that directory.

To deploy a registered environment, you must know its name as registered in the Kadeploy database. It is the first information on the environment description page. This tutorial uses the squeeze-x64-base environment.

You can also list all environments available on a site by using the kaenv3 command:

Image:Terminal.png frontend:
kaenv3 -l

This command lists all public as well as your private environments.

We distinguish three levels of visibility for an environment:

  • public: All users can see those environments. Only administrators can tag them this way.
  • shared: Every user can see the environment, provided they use the -u option to specify the user the environment belongs to.
  • private: The environment is only visible to the user it belongs to.

For example, a shared environment added by user user is listed this way :

Image:Terminal.png frontend:
kaenv3 -l -u user

Being able to reproduce experiments is a desirable feature, so you should always try to control as much as possible the environment an experiment runs in. We will therefore check that the environment chosen from the environment directory is the one actually available on a given cluster. On the cluster you would like to deploy on, type the following command to print information about an environment:

Image:Terminal.png frontend:
kaenv3 -p squeeze-x64-base -u deploy

You must specify the user option. In our case, all public environments belong to user deploy.

Check that the tarball file is the expected one by checking its name and its checksum which you should find on the identification sheet:

Image:Terminal.png frontend:
md5sum /grid5000/images/squeeze-x64-base-1.1.tgz
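The comparison can be scripted. This is a minimal sketch in which check_md5 is a hypothetical helper, and the expected checksum is whatever the environment description page lists for the tarball:

```shell
# check_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeeds only if FILE has the expected MD5 checksum.
check_md5() {
    file=$1
    expected=$2
    # md5sum prints "<checksum>  <file>"; keep the checksum only
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}
```

For the tarball above, the call would be check_md5 /grid5000/images/squeeze-x64-base-1.1.tgz followed by the checksum copied from the description page.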

In theory, you should also check the post-install script. A post-install script adapts an environment to the site it is deployed on. As for environments, you should be able to find a description of the post-install script on pages such as here. Post-install scripts are an evolving matter, so don't be too worried if you don't find things exactly as described here. If everything seems OK, please proceed to the next step.

Make a job on a deployable node

By default, Grid'5000 nodes run the production environment, which already contains most of the important features and can be used to run experiments. However, you do not have administrative (root) privileges on these nodes, so you cannot customize this environment at will. Only reference environments can be customized at will, and to have the right to deploy a reference environment on a node, you must supply the option -t deploy when submitting your job.

For this part of the tutorial, the job will be interactive (-I), of the deploy type (-t deploy), on only one machine (-l nodes=1) to do environment customization (we will give ourselves 3 hours with -l walltime=3). This gives the following command, which opens a new shell session on the frontend node:

Image:Terminal.png frontend:
oarsub -I -t deploy -l nodes=1,walltime=3

Since not all Grid'5000 nodes have console access, it is recommended, for this tutorial to go smoothly, to add the property rconsole="YES" to your reservation command. The usefulness of this feature is explained in a section below.

Image:Terminal.png frontend:
oarsub -I -t deploy -l '{rconsole="YES"}/nodes=1,walltime=3'

When you submit a job of the deploy type, a new shell is opened on the frontend node and not on the first machine of the job as for standard jobs. When you exit from this shell, the job ends. The shell is populated with OAR_* environment variables. You should look at the list of available variables to get an idea of the information you can use to script deployments later. As usual, if the job is successful, you will get the name of the machine allocated to your job with:

Image:Terminal.png frontend:
cat $OAR_FILE_NODES
Image:Warning.png Warning

At the end of a reservation made with the option -t deploy, the reserved nodes are restarted to boot the production environment and thus become available to any other user. So you should only use the option -t deploy when you actually intend to deploy a reference environment on the reserved nodes.

Deploy an environment

To deploy your environment, you must discover the nodes you were allocated by OAR. The simplest way of doing this is to look at the content of the file whose name is stored in $OAR_FILE_NODES (this variable is also available as $OAR_NODE_FILE) or at the messages displayed when the job was made. This variable simply stores the path of the file containing the FQDN of all your reserved nodes. Deployment happens when you run the following command:

Image:Terminal.png frontend:
kadeploy3 -e squeeze-x64-base -m node.site.grid5000.fr

You can automate this to deploy on all nodes of your job with Kadeploy3's -f option:

Image:Terminal.png frontend:
kadeploy3 -e squeeze-x64-base -f $OAR_FILE_NODES
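The node file may list the same hostname on several lines (the multi-site section further down filters it through uniq for this reason). To see each allocated node only once, a helper such as the following can be used; list_nodes is a hypothetical name, not an OAR or Kadeploy3 command.

```shell
# list_nodes FILE
# Hypothetical helper: prints each hostname of FILE once, sorted.
list_nodes() {
    sort -u "$1"
}
```

Inside a job shell it would be called as list_nodes "$OAR_FILE_NODES".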


If you want to be able to connect to the node as root without being prompted for a password, you can use the -k option in one of two ways:

  • You can either specify the public key that will be copied in /root/.ssh/authorized_keys on the deployed nodes :
Image:Terminal.png frontend:
kadeploy3 -e squeeze-x64-base -f $OAR_FILE_NODES -k ~/.ssh/my_special_key.pub
  • Or you can supply the -k option without argument. This will automatically copy your ~/.ssh/authorized_keys and replace the /root/.ssh/authorized_keys file on the deployed nodes.
Image:Terminal.png frontend:
kadeploy3 -e squeeze-x64-base -f $OAR_FILE_NODES -k

The second case is actually the simplest way. One of its advantages is that after deployment, you will be able to connect directly from your local computer to the deployed nodes, the same way you connect to the frontend of the site where those nodes are.
Once kadeploy has run successfully, the allocated node is deployed under the squeeze-x64-base environment. It is then possible to tune this environment according to your needs.

Image:Note.png Note

It is not necessary here, but you can specify the destination partition with the -p option. The Node storage page gives all the information about the partition table used on G5K.

Tune an environment to build another one: customize authentication parameters

Here you will learn to connect to a deployed environment, customize and add it to Kadeploy3 database with kaenv3.

Connect to the deployed environment and customize it

Connection, the usual way (ssh)

On reference environments managed by the staff, you can use root account for login through ssh (kadeploy checks that sshd is running before declaring success of a deployment). To connect to the node type :

Image:Terminal.png frontend:
ssh root@node.site.grid5000.fr
Image:Note.png Note

If you have not deployed the nodes with a public key using option -k, you will be asked for a password. Default root password for all reference environments is grid5000. Please check the environments descriptions.

In case this doesn't work, please take a look at the kadeploy section of the Sidebar > FAQ

Connection, the brutal way (kaconsole3)

When playing with environments, you could lose services such as sshd or rshd and then become unable to get a shell on the deployed machine. It is therefore possible to connect to the machine's console (for the OS, this is seen as physical access, thanks to remote console capabilities). This should work with the kaconsole3 command, provided nobody else has already opened a write access to the remote console and the post-install script has configured the remote console properly:

Image:Terminal.png frontend:
kaconsole3 -m node.site.grid5000.fr

You should be shown the login screen as if you were locally connected onto the machine. Remote console should be kept working all along the boot process, which is useful to debug an environment. To get out of this remote console without having to kill your shell or terminal window, you should type in the proper escape sequence:

Image:Warning.png Warning

Not all machines have the console capability. You can use the OAR node property rconsole=YES to make sure your reserved nodes have it, as recommended in the section above.

Customization

Using the root account for all your experiments is possible, but you will probably be better off creating a user account. You could even create user accounts for all the users that would be using your environment. However, if the number of users is greater than 2 or 3, you had better configure an LDAP client by tuning the post-install script or using a fat version of this script (beyond the scope of this tutorial). There are two ways of doing things.

The simplest is to create a dedicated account (e.g. the generic user g5k) and move in all experiment data at the beginning and back at the end of an experiment, using scp or rsync. A more elaborate approach is to locally recreate your usual Grid'5000 account with the same uid/gid on your deployed environment. This second approach could simplify file rights management if you need to store temporary data on shared volumes.

To create your local unix group on your environment, first find your primary group on the frontend node with:

Image:Terminal.png frontend:
id

The output of this command is for instance of the form:

uid=19002(dmargery) gid=19000(rennes) groups=9998(CT),9999(grid5000),19000(rennes)

Where :

userId = 19002
userLogin = dmargery
groupId = 19000
groupName = rennes
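The four values can also be obtained directly, without reading them off the full id output, through the standard -u, -un, -g and -gn options of id. In this sketch, print_account_ids is a hypothetical helper name; it is meant to be run on the frontend.

```shell
# print_account_ids
# Hypothetical helper: prints the current user's numeric ids and names
# in the same "name = value" form as above.
print_account_ids() {
    echo "userId = $(id -u)"
    echo "userLogin = $(id -un)"
    echo "groupId = $(id -g)"      # primary group only
    echo "groupName = $(id -gn)"
}
```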

Then, as root, on the deployed machine:

Image:Terminal.png node:
addgroup --gid groupId groupName

Now, to enable access to your local user account on your environment, as root, on the deployed machine:

Image:Terminal.png node:
adduser --uid userId --ingroup groupName userLogin

Finally, as root, become the newly created user and place your ssh key:

Image:Terminal.png node:
su - userLogin
mkdir ~/.ssh
exit
cp /root/.ssh/authorized_keys /home/userLogin/.ssh/
chown userLogin:groupName /home/userLogin/.ssh/authorized_keys

Now you can login to the node with your user account

Image:Terminal.png frontend:
ssh userLogin@node.site.grid5000.fr

Adding software to an environment

Where you learn to install software using the package repository of your distribution on Grid'5000 (using proxies)...

You can update your environment (to add missing libraries that you need, or to remove packages that you don't, which shrinks the image and speeds up the deployment process, etc.) using:

Image:Terminal.png node:
apt-get update
apt-get upgrade
apt-get install list of desired packages and libraries
apt-get --purge remove list of unwanted packages
apt-get clean


Image:Note.png Note

On reference environments, apt-* commands are automatically configured to use the proper proxy. But if you need an outside access for the HTTP, HTTPS and FTP protocols, with another command (wget, git,...), you will have to configure the proxy by following the documentation on the Web_proxy_client page.

Create a new environment from a customized environment

We now need to save this customized environment, where you have a user account, to be able to use this account again each time you deploy it.
The first step to create an environment is to create an archive of the node you just customized. Because of the various implementations of the /dev filesystem tree, this can be a more or less complex operation.

Image:Warning.png Warning

udev, a rule based daemon which helps with hardware detection on Linux systems, can have unexpected behaviour when deploying the same environment on multiple nodes, especially with network interfaces naming. On Debian based systems, once deployed and booted, your system could have network interfaces named eth{2..n} instead of eth0. You are advised to delete the appropriate udev rules before creating the archive:

Image:Terminal.png node:
rm /etc/udev/rules.d/*-persistent-net.rules
This warning applies to the manual and dd methods explained below. It does not apply to the tgz-g5k method, since recent versions of tgz-g5k (from version 1.0.7) automatically remove those unnecessary files.

We have three different ways of creating your archive.

Make a tgz archive manually

When you untar a system environment, you have nothing in /dev. This leads to a kernel panic during the boot process of your deployed node, because the kernel needs the /dev/console and /dev/null device files for instance, before the udev or devfs service is started. So, to create a correct environment archive, you must use a different command in order to also archive the static content of the /dev directory.

One way to be sure to archive the whole content of the / filesystem is to mount it in another location so that you have no subsequent mounts on subdirectories like /dev, /proc, /home, and so on. You may use the following sequence of commands:

  • as root on the target node, mount the target / filesystem on an available mountpoint (usually /mnt):
Image:Terminal.png node:
mount -o bind / /mnt
  • as yourself on the frontend machine, you can now safely tar the content of this target directory:
Image:Terminal.png frontend:
ssh root@node.site.grid5000.fr "cd /mnt; tar --posix --numeric-owner --one-file-system -zcf - *" > archive.tgz
  • as root on the target node, you may then umount /mnt:
Image:Terminal.png node:
umount /mnt


Use the provided tools

  • You can use TGZ-G5K, a script installed in all reference environments. You can find all instructions on how to use it on its TGZ-G5K page.

Examples :

Image:Terminal.png frontend:
ssh root@node tgz-g5k > path_to_myimage.tgz
Image:Terminal.png node:
tgz-g5k login@frontend:path_to_myimage.tgz

This will create a file path_to_myimage.tgz in your home directory on the frontend. The first example is to be preferred, as it can run password-less or passphrase-less without adding your private key to the image.

Make a dd archive file (advanced users)

With kadeploy you can use a dd file instead of a standard tgz file.

  • You have to know a few things about dd images:
    • Because a dd image is a dump of a partition, it takes more time to deploy, as the image is larger than a standard tgz.
    • The prepost system doesn't work with this kind of image, which means that you may need one image per site with a specific configuration, depending on the clusters' hardware.
    • You have to be aware of the different parts of the boot process (grub, udev, ...).
    • Please read the advanced section of Kadeploy3's documentation about the boot process
      • Make sure that /boot/grub/menu.lst is correctly configured
      • Install grub on your root partition
Image:Terminal.png node:
grub-install /dev/sda3
  • Now you can generate your dd image with
Image:Terminal.png frontend:
ssh root@deployed_node "dd if=/dev/sda3 | gzip" > myimage.dd.gz
Image:Note.png Note

To deploy a dd based environment, the process is basically the same as for a tgz environment:

Image:Terminal.png frontend:
oarsub -I -t deploy
Image:Terminal.png frontend:
kadeploy3 -f $OAR_NODEFILE -e myEnvironment

Image:Warning.png Warning

Do not forget to remove the files /etc/udev/rules.d/*-persistent-net.rules as warned in the section above.

Describe the newly created environment for deployments

Kadeploy3 works using an environment description. The easiest way to create a description for your new environment is to change the description of the environment it is based on. We have based this tutorial on the squeeze-x64-base environment of user deploy. We therefore print its description to a file that will be used as a good basis:

Image:Terminal.png frontend:
kaenv3 -p squeeze-x64-base -u deploy > mysqueeze-x64-base.env

It should be edited to change the name, description, author lines, as well as the tarball line. The visibility line should be removed, or changed to shared or private. Once this is done, the newly created environment can be deployed using:

Image:Terminal.png frontend:
kadeploy3 -f $OAR_NODEFILE -a mysqueeze-x64-base.env

This kind of deployment is called anonymous deployment because the description is not recorded into the Kadeploy3 database. It is particularly useful when you perform the tuning of your environment if you have to update the environment tarball several times.
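The edits to the description file mentioned above can be scripted when the tarball is rebuilt often. The sketch below assumes that the file printed by kaenv3 -p consists of "key : value" lines with the keys name, description, author, tarball and visibility, and that the tarball value ends with the |tgz suffix; check this against your own file first. customize_env is a hypothetical helper, and it relies on GNU sed for the -i option.

```shell
# customize_env FILE NAME DESCRIPTION AUTHOR TARBALL
# Hypothetical helper: rewrites the matching "key : value" lines of FILE
# in place and sets the visibility to private. All other lines are kept.
# The values must not contain '#', '&' or '\', which are special in the
# sed expressions below.
customize_env() {
    sed -i \
        -e "s#^name *:.*#name : $2#" \
        -e "s#^description *:.*#description : $3#" \
        -e "s#^author *:.*#author : $4#" \
        -e "s#^tarball *:.*#tarball : $5|tgz#" \
        -e "s#^visibility *:.*#visibility : private#" \
        "$1"
}
```

The first argument would be mysqueeze-x64-base.env, followed by the new name, description, author and tarball path.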

Once your customized environment is successfully tuned, you can save it to Kadeploy3 database so that you can directly deploy it with kadeploy3, by specifying its name:

Image:Terminal.png frontend:
kaenv3 -a mysqueeze-x64-base.env

With kaenv3 command, you can manage your environments at your ease. Please refer to its documentation for an overview of its features.

Scripting a deployment

To plan an experiment, you need to script most of the work to avoid too many interactive steps. The main points to note are the following:

  • a non-interactive job of type deploy will see the given script running on the frontend node;
  • kadeploy3 accepts the -f parameter to read the list of machines to deploy on from a file;
  • it is possible to create a file listing the correctly deployed nodes using the --output-ok-nodes (or -o) option, and one listing the incorrectly deployed nodes with the --output-ko-nodes (or -n) option;
  • in non-interactive mode, scripts started by OAR write their output in files named OAR.jobId.stderr and OAR.jobId.stdout.
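As an illustration of the -f, -o and -n options listed above, here is a minimal sketch of a function that deploys an environment and reports how many nodes succeeded. It is not the auto_deploy.sh script of this tutorial: deploy_nodes and the KADEPLOY variable are hypothetical names, and the function deliberately looks at the two output files rather than at the exit status of kadeploy3.

```shell
# KADEPLOY can be overridden for dry runs; it defaults to the real command.
KADEPLOY=${KADEPLOY:-kadeploy3}

# deploy_nodes ENV NODE_FILE OK_FILE KO_FILE
# Hypothetical helper: deploys ENV on the nodes listed in NODE_FILE and
# fails if any node ends up in KO_FILE or if OK_FILE is empty.
deploy_nodes() {
    env_name=$1 node_file=$2 ok_file=$3 ko_file=$4
    rm -f "$ok_file" "$ko_file"    # drop results of a previous run
    $KADEPLOY -e "$env_name" -f "$node_file" -k -o "$ok_file" -n "$ko_file"
    if [ -s "$ko_file" ]; then
        echo "deployment failed on $(grep -c '' "$ko_file") node(s)" >&2
        return 1
    fi
    if [ ! -s "$ok_file" ]; then
        echo "no node was deployed" >&2
        return 1
    fi
    echo "deployed $(grep -c '' "$ok_file") node(s)"
}
```

In a job script it could be called with squeeze-x64-base, "$OAR_FILE_NODES" and two file names of your choice; the experiment would then only be started on the nodes listed in the first output file.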

Using this additional information, we will now write a script to be run by OAR to deploy machines before using them for an experiment. You can use the following script as a base. The latest version of this script is available in the subversion repository of the Grid5000-code on InriaGforge.

To get a copy of this script, click on this link and upload it in your home directory (/home/mySite/myLogin/myPath/auto_deploy.sh) ; or download it using wget on a frontend:

Image:Terminal.png frontend:
  1. open it in your favorite text editor;
  2. check the public key to be copied on the nodes;
  3. check the name of the environment it will attempt to deploy when called without parameters;
  4. make the script executable:
    chmod +x /home/mySite/myLogin/myPath/auto_deploy.sh
  5. and submit a job to OAR (ask for 2 nodes on the same cluster):
Image:Terminal.png frontend:
oarsub -l cluster=1/nodes=2 -t deploy /home/mySite/myLogin/myPath/auto_deploy.sh

To follow the status of your job:

Image:Terminal.png frontend:
oarstat -f -j jobId

To follow progress output of your job:

Image:Terminal.png frontend:
tail -f OAR.jobId.stdout

To see the associated error log:

Image:Terminal.png frontend:
less OAR.jobId.stderr

As you will notice, this script accepts parameters. You could deploy your environment and run a script on the head node of your job with the following line (note the double quotes):

Image:Terminal.png frontend:
oarsub -l cluster=1/nodes=2 -t deploy "/home/mySite/myLogin/myPath/auto_deploy.sh environment-name script-name"
Image:Note.png Note

To simplify all those manipulations, a script named katapult has been created. Explanations can be found on this page.

Multi-site experiments

Up until now in this tutorial, we have learned how to play with environments on a single site. The job submission as well as the deployment occurred on the same site and involved nodes of a single site. Now we will see how to reserve nodes and deploy an environment simultaneously on multiple sites. In this tutorial we will show you how to deploy on Orsay and Rennes.

Multi-site nodes reservation

First you must reserve nodes on those sites, with the proper options. This can be achieved in two ways. Both examples below result in a reservation of 1 node in Orsay and 1 node in Rennes, both made in deploy mode with nodes reserved for 3 hours. Please adapt them to your case:

1. You can connect to each site to reserve nodes by using the proper options, as described above in section Make_a_job_on_a_deployable_node, and then retrieve the list of your reserved nodes from any other site.

Image:Terminal.png forsay:
oarsub -I -t deploy -l nodes=1,walltime=3
uniq $OAR_NODEFILE > ~/nodes
Image:Terminal.png frennes:
oarsub -I -t deploy -l nodes=1,walltime=3
uniq $OAR_NODEFILE > ~/nodes
Image:Terminal.png frontend:
ssh orsay 'cat ~/nodes' > ~/grid_nodes && ssh rennes 'cat ~/nodes' >> ~/grid_nodes

2. Or you can use oargrid to reserve nodes simultaneously on those sites and retrieve the list of your reserved nodes in a file:

Image:Terminal.png frontend:
oargridsub -t deploy -w 03:00:00 gdx:rdef="nodes=1",paradent:rdef="nodes=1" > ~/oargrid.out
grid_job_id=$(grep "Grid reservation id" ~/oargrid.out | cut -d"=" -f2)
grid_job_id=$((grid_job_id+0))
oargridstat -w -l $grid_job_id | sed '/^$/d' | uniq > ~/grid_nodes
Image:Note.png Note

For more information about grid reservation, please read the Grid Reservation tutorial.
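Whichever method was used, it can be worth checking that the gathered list covers the expected sites before deploying. The sketch below relies on the node.site.grid5000.fr naming used throughout this tutorial, where the site is the second dot-separated label of the FQDN; nodes_per_site is a hypothetical helper name.

```shell
# nodes_per_site FILE
# Hypothetical helper: prints "site count" for each site found in FILE,
# taking the site from the second label of each FQDN.
nodes_per_site() {
    awk -F. '{ count[$2]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}
```

For the reservation above, nodes_per_site ~/grid_nodes should report one node for orsay and one for rennes.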

Multi-site nodes deployment

Once your nodes are reserved on multiple sites and their FQDNs gathered in a file, you can deploy an environment on them. This can be achieved in two ways. You can either use environments already present on each site, or provide a single environment that will be used to deploy all nodes on all sites.

  • 1. Using environments local to each site:

As seen in this document in the section Locate_a_suitable_image, reference environments are already present on all sites. Moreover, you can add your own environments to a site's system by using the command kaenv3, as shown in the section Describe_the_newly_created_environment. As soon as an environment is listed by the kaenv3 -l command, you can deploy it with the following command.

Image:Terminal.png frontend:
kadeploy3 -f ~/grid_nodes --multi-server -e squeeze-x64-base -k

This command will deploy the environment squeeze-x64-base on all the nodes whose FQDNs are listed in the file ~/grid_nodes. The option --multi-server specifies that the deployment involves multiple sites. As usual, the option -k specifies that your ~/.ssh/authorized_keys file on each site will be copied to the nodes of that site. If you properly configured your SSH account on every site, your file ~/.ssh/authorized_keys should have the same content on every site (cf. SSH page). Hence, it will be possible to connect to all nodes the same way you connect to a site's frontend.

Image:Note.png Note

The deployed reference environment squeeze-x64-base is local to each site. So if you choose to use your customized environment, you should first make sure that it is properly registered on each site where you wish to perform a multi-site deployment. The best way to check this is to first deploy each customized environment, with the classic method, on nodes of the site where it is registered.

  • 2. Using global environments for all sites :

As you probably understood, the previous method for multi-site deployment obliges you to copy and synchronize your environments and data to all the involved sites. This can lead to errors made during inter-site synchronization: files not properly synchronized, files not in the same directory on all involved sites, ...

The best practice is actually to store all the common files on a single site. All sites then refer to those files through their HTTP URLs. The following is an example of how to do that:

  • Connect to the frontend of one site. Let's say you connect to the site Orsay.
  • Create your environment description file as explained in a section above.
  • Move that environment description file into your directory ~/public/ on the site Orsay.
  • Open that environment description file and change the option tarball as follows:
tarball : http://public.orsay.grid5000.fr/~your_login/your_environment_image.tgz|tgz
  • Copy your file ~/.ssh/authorized_keys into your directory ~/public/ on the site Orsay.
  • And give it the proper rights:
Image:Terminal.png frontend:
chmod 644 ~/public/authorized_keys
  • Now you can deploy your environment on all your nodes with the command:
Image:Note.png Note

This example used Orsay as the depot site where all common data are stored. Feel free to adapt it to your case.

Shutting down and rebooting nodes

Sometimes, you might want to restart your customized environments to apply and/or test your modifications.
This can be achieved with the kapower3 and/or kareboot3 commands.
Please consult the complete list of available features of those commands on the kareboot3 and kapower3 pages.

Image:Warning.png Warning

You must have reserved the nodes with the option -t deploy to have the right to shut them down or reboot them with kapower3 and kareboot3.

Tuning the Kadeploy3 deployment workflow (curious users)

kadeploy3 lets you fully modify the deployment workflow.

First of all you have to understand the different steps of a deployment. There are 3 macro-steps:

  1. SetDeploymentEnv: this step aims at setting up the deployment environment that contains all the required tools to perform a deployment ;
  2. BroadcastEnv: this step aims at broadcasting the new environment to the nodes and writing it to disk;
  3. BootNewEnv: this step aims at rebooting the nodes on their new environment.

kadeploy3 provides several implementations for each of those 3 macro-steps. You can consult that list in the kadeploy3 page. In Grid'5000, we use the following steps by default in all our clusters :

  • SetDeploymentEnv -> SetDeploymentEnvUntrusted : use an embedded deployment environment
  • BroadcastEnv -> BroadcastEnvKastafior : use the Kastafior tool to broadcast the environment
  • BootNewEnv -> BootNewEnvClassical : the nodes use a classical reboot

Each one of these implementations is divided into micro-steps. You can see the names of those micro-steps if you use the kadeploy3 option --verbose-level 4. To see what is actually executed during those micro-steps, you can add the kadeploy3 debug option -d:

Image:Terminal.png frontend:
kadeploy3 -f $OAR_FILE_NODES -k -e squeeze-x64-base --verbose-level 4 -d > ~/kadeploy3_steps

This command stores the kadeploy3 standard output in the file ~/kadeploy3_steps. Let's analyse its content:

Image:Terminal.png frontend:
grep "Time in" ~/kadeploy3_steps

This command will print on the terminal all the micro-steps executed during the deployment process, and the time spent for each execution. Here are the micro-steps that you should see:

  1. SetDeploymentEnvUntrusted-switch_pxe: Configures the PXE server so that this node will boot on an environment that contains all the required tools to perform the deployment.
  2. SetDeploymentEnvUntrusted-reboot: Sends a reboot signal to the node.
  3. SetDeploymentEnvUntrusted-wait_reboot: Waits for the node to restart.
  4. SetDeploymentEnvUntrusted-send_key_in_deploy_env: Sends the kadeploy user's ssh public key into the node's authorized_keys to ease the following ssh connections.
  5. SetDeploymentEnvUntrusted-create_partition_table: Creates the partition table.
  6. SetDeploymentEnvUntrusted-format_deploy_part: Formats the partition where your environment will be installed. This partition is by default /dev/sda3.
  7. SetDeploymentEnvUntrusted-mount_deploy_part: Mounts the deployment partition in a local directory.
  8. SetDeploymentEnvUntrusted-format_tmp_part: Formats the partition defined as tmp (by default, /dev/sda5).
  9. SetDeploymentEnvUntrusted-format_swap_part: Formats the swap partition.
  10. BroadcastEnvKastafior-send_environment: Sends your environment to the node and untars it into the deployment partition.
  11. BroadcastEnvKastafior-manage_admin_post_install: Executes post-installation instructions defined by the site admins, in general to adapt to the specificities of the cluster: console baud rate, Infiniband, Myrinet, proxy address, ...
  12. BroadcastEnvKastafior-manage_user_post_install: Executes user-defined post-installation instructions to automatically configure the node depending on its cluster, site, network capabilities, disk capabilities, ...
  13. BroadcastEnvKastafior-send_key: Sends the user's public ssh key(s) to the node (if the user specified it with the option -k).
  14. BroadcastEnvKastafior-install_bootloader: Properly configures the bootloader.
  15. BootNewEnvClassical-switch_pxe: Configures the PXE server so that this node will boot on the partition where your environment has been installed.
  16. BootNewEnvClassical-umount_deploy_part: Unmounts the deployment partition from the directory where it was mounted during step 7.
  17. BootNewEnvClassical-reboot_from_deploy_env: Sends a reboot signal to the node.
  18. BootNewEnvClassical-set_vlan: Properly configures the node's VLAN.
  19. BootNewEnvClassical-wait_reboot: Waits for the node to be up.

That is it. You now know all the default micro-steps used to deploy your environments.

Image:Note.png Note

It is recommended to consult the Node storage page to understand which partition is used at which step.

Adjusting timeout for some environments

Since kadeploy3 runs multiple macro-steps and micro-steps, it is important to detect when a step fails. This error detection relies on a timeout for each step. When a timeout is reached, the nodes that have not completed the given step are discarded from the deployment process.
The values of these timeouts vary from one cluster to another since they depend on the hardware configuration (network speed, hard disk speed, reboot speed, ...). All default timeouts are set in the configuration files on the kadeploy3 server, but you can consult the default timeouts of each macro-step with the kastat3 command:

Image:Terminal.png frontend:
kastat3 -x $(date --date=@$((`date +%s` - 60*60*12)) +%Y:%m:%d:%H:%M:%S) -d | sed -n '$p'
 gsimo,gdx-121.orsay.grid5000.fr,SetDeploymentEnvUntrusted,BroadcastEnvKastafior,BootNewEnvClassical,550,600,400,0,0,0,1302763272,107,50,0,squeeze-x64-base:5,false,,true,

This command simply prints information about the last deployment made on that site during the past 12 hours. The format of the output is the following:

user,hostname,step1,step2,step3,timeout_step1,timeout_step2,timeout_step3,retry_step1,retry_step2,retry_step3,start,step1_duration,step2_duration,step3_duration,env,anonymous_env,md5,success,error
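
Given that field order, the macro-step names (fields 3 to 5) and their timeouts (fields 6 to 8) can be pulled out of a kastat3 line with awk. The following is only a sketch: the function name print_timeouts is made up for this example, and the sample line is the one shown above.

```shell
#!/bin/sh
# Print "macro-step: timeout" pairs from one kastat3 CSV line.
# Assumes the field order documented above: fields 3-5 hold the
# macro-step names and fields 6-8 hold the matching timeouts.
print_timeouts() {
    printf '%s\n' "$1" | awk -F, '{
        for (i = 3; i <= 5; i++)
            printf "%s: %s\n", $i, $(i + 3)
    }'
}

# Sample line copied from the kastat3 output shown in this section.
line='gsimo,gdx-121.orsay.grid5000.fr,SetDeploymentEnvUntrusted,BroadcastEnvKastafior,BootNewEnvClassical,550,600,400,0,0,0,1302763272,107,50,0,squeeze-x64-base:5,false,,true,'
print_timeouts "$line"
```

On a frontend, the same function could be fed the last line of the real kastat3 output instead of the sample.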

You can verify the format in the kastat3 help by looking for the --field option:

Image:Terminal.png frontend:
kastat3 --help | grep '\-\-field'
Image:Note.png Note

Please consult the kastat3 page for more information on its features.

Nevertheless, kadeploy3 allows users to change the timeouts on the command line. When you deploy an environment with a large tarball or with a post-install that takes a long time, some nodes may be discarded even though nothing is wrong. Such false positives can be avoided by raising the timeout of the relevant step at deployment time.

For instance, in our previous example, the timeouts of the three macro-steps (in seconds) are:

  • SetDeploymentEnvUntrusted: 550
  • BroadcastEnvKastafior: 600
  • BootNewEnvClassical: 400

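You can increase the timeout of the second step to 1200 seconds with the following command. The --force-steps value in this example lists all three macro-steps, and it also sets the timeout of the first step to 450 seconds instead of the 550 seconds shown above: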

Image:Terminal.png frontend:
kadeploy3 -e my_big_env -f $OAR_FILE_NODES -k --force-steps "SetDeploymentEnv|SetDeploymentEnvUntrusted:1:450&BroadcastEnv|BroadcastEnvKastafior:1:1200&BootNewEnv|BootNewEnvClassical:1:400"
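
To avoid typing the --force-steps string by hand, it can be assembled from three timeout values. This is a minimal sketch: the function name force_steps is hypothetical, and the reading of each entry as MacroStep|Implementation:retries:timeout is an assumption inferred from the commands in this tutorial (only the last field is confirmed here to be the timeout). The script only prints the kadeploy3 command, it does not run it.

```shell
#!/bin/sh
# Build a --force-steps argument from three timeouts (in seconds).
# Assumption: each entry reads MacroStep|Implementation:retries:timeout;
# the middle field is kept at 1, as in the tutorial's own commands.
force_steps() {
    printf 'SetDeploymentEnv|SetDeploymentEnvUntrusted:1:%s&' "$1"
    printf 'BroadcastEnv|BroadcastEnvKastafior:1:%s&' "$2"
    printf 'BootNewEnv|BootNewEnvClassical:1:%s\n' "$3"
}

# Keep the first and third timeouts reported by kastat3, raise the second.
steps=$(force_steps 550 1200 400)
echo "kadeploy3 -e my_big_env -f \$OAR_FILE_NODES -k --force-steps \"$steps\""
```

Keeping the other two timeouts at the values reported by kastat3 changes only the step that actually needs more time.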

Set Break-Point during deployment

As mentioned in the section above, a deployment is a succession of micro-steps that can be consulted and modified.
Moreover, kadeploy3 allows users to set a breakpoint during a deployment.

Image:Terminal.png frontend:
kadeploy3 -f $OAR_FILE_NODES -k -e squeeze-x64-base --verbose-level 4 -d --breakpoint BroadcastEnvKastafior:manage_user_post_install

This command can be used for debugging purposes. It performs a deployment with the maximum verbose level and stops the deployment workflow just before executing the manage_user_post_install micro-step of the BroadcastEnvKastafior macro-step. You can then connect to the node in the deployment environment and manually run the user post-install script to debug it.
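
The breakpoint is written as MacroStep:micro_step. A small helper can check such a pair against the default micro-steps listed earlier before a long deployment is started. This is only a sketch: the function name breakpoint_arg and the hard-coded list are illustrative, and whether kadeploy3 accepts a breakpoint on every one of these micro-steps is not confirmed here.

```shell
#!/bin/sh
# Default micro-steps, written as MacroStep:micro_step, copied from the
# list given earlier in this tutorial.
KNOWN_STEPS='SetDeploymentEnvUntrusted:switch_pxe
SetDeploymentEnvUntrusted:reboot
SetDeploymentEnvUntrusted:wait_reboot
SetDeploymentEnvUntrusted:send_key_in_deploy_env
SetDeploymentEnvUntrusted:create_partition_table
SetDeploymentEnvUntrusted:format_deploy_part
SetDeploymentEnvUntrusted:mount_deploy_part
SetDeploymentEnvUntrusted:format_tmp_part
SetDeploymentEnvUntrusted:format_swap_part
BroadcastEnvKastafior:send_environment
BroadcastEnvKastafior:manage_admin_post_install
BroadcastEnvKastafior:manage_user_post_install
BroadcastEnvKastafior:send_key
BroadcastEnvKastafior:install_bootloader
BootNewEnvClassical:switch_pxe
BootNewEnvClassical:umount_deploy_part
BootNewEnvClassical:reboot_from_deploy_env
BootNewEnvClassical:set_vlan
BootNewEnvClassical:wait_reboot'

# Print MacroStep:micro_step if the pair is in the list, fail otherwise.
breakpoint_arg() {
    if printf '%s\n' "$KNOWN_STEPS" | grep -qxF -- "$1:$2"; then
        printf '%s:%s\n' "$1" "$2"
    else
        echo "unknown step: $1:$2" >&2
        return 1
    fi
}

breakpoint_arg BroadcastEnvKastafior manage_user_post_install
```

The printed value can be passed as is to the --breakpoint option.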

Image:Warning.png Warning

In the current state of kadeploy3, it is not possible to resume the deployment from the breakpoint step. You will therefore have to redeploy your environment from the first step. This feature will be implemented in a future version of kadeploy3.

Using Kexec to allow a quick reboot of the nodes

If you want to deploy a Linux-based environment, you can probably use an optimization for the last step of the deployment that saves several minutes, especially if the nodes reboot slowly. This is achieved by using the Kexec implementation of the BootNewEnv step.

For instance, run the following command and look at the time spent in the deployment:

Image:Terminal.png frontend:
kadeploy3 -e squeeze-x64-base -f $OAR_FILE_NODES -k --force-steps "SetDeploymentEnv|SetDeploymentEnvUntrusted:1:450&BroadcastEnv|BroadcastEnvKastafior:1:900&BootNewEnv|BootNewEnvKexec:1:250"
Image:Warning.png Warning

This optimization does not work on all clusters with all environments. Here is the list of clusters known to work, tested with the lenny-x64-base environment:

  • Bordeaux: bordereau, bordeplage
  • Lille: chicon, chuque, chti, chinqchint
  • Lyon: capricorne, sagittaire
  • Nancy: grelon, griffon
  • Orsay: gdx, netgdx
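
The list above can drive the choice of the BootNewEnv implementation in a script. This is a hedged sketch: the function name boot_impl is invented for this example, the cluster list is the one above (tested with lenny-x64-base only, so other environments may behave differently), and the timeouts 250 and 400 are the ones used in the commands of this tutorial. The script only prints the option, it does not call kadeploy3.

```shell
#!/bin/sh
# Clusters reported above as working with the Kexec implementation.
KEXEC_CLUSTERS="bordereau bordeplage chicon chuque chti chinqchint capricorne sagittaire grelon griffon gdx netgdx"

# Print the BootNewEnv entry to use for the given cluster name:
# Kexec when the cluster is in the list, the classical reboot otherwise.
boot_impl() {
    case " $KEXEC_CLUSTERS " in
        *" $1 "*) echo "BootNewEnvKexec:1:250" ;;
        *)        echo "BootNewEnvClassical:1:400" ;;
    esac
}

cluster=gdx
echo "--force-steps \"SetDeploymentEnv|SetDeploymentEnvUntrusted:1:450&BroadcastEnv|BroadcastEnvKastafior:1:900&BootNewEnv|$(boot_impl "$cluster")\""
```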

Modify and/or add scripts to the Grid5000-code repository

If you want to contribute a script that you think is useful for the Grid5000 community, or if you want to improve the scripts in this repository, follow these steps:

  1. Get an Inria gforge account: create one at gforge.inria.fr
  2. Log in at gforge.inria.fr
  3. Ask the project admin to add your account to the developer team:
    1. Go to: Grid5000-code at gforge
    2. Click Request to join

Once your request has been accepted, you have access to the public repository of Grid5000 and can fetch, add and commit your changes.
