Environments creation using Kameleon and Puppet

From Grid5000
Note.png Note

This tutorial is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This page summarizes what you need to know to create the environments used on Grid'5000 with Kameleon and Puppet.

Rebuild a standard Grid'5000 image

Overview

This section describes the steps to reproduce a vanilla Grid'5000 image using Kameleon. Kameleon is a software appliance builder: it creates a complete operating system image from a recipe. A recipe is a set of YAML files containing a list of steps that are executed to bootstrap, set up and export the image. To help users share recipes, Kameleon provides a template mechanism that you can extend with your own steps. It has many other features, such as context isolation, caching of recipe artefacts and interactive breakpoints.

For more details about Kameleon see the official documentation: http://kameleon.imag.fr/

Two major tools are used to prepare the environments used in Grid'5000, Kameleon and Puppet:

  • Kameleon creates a virtual machine (VM), calls Puppet inside it and exports its content to a tgz file (as used by Kadeploy) or to other formats (qcow2, ...).
  • Puppet configures the environments (packages, configuration files, services...).
Note.png Note

Puppet is used internally by Grid'5000, but you don't have to use it for your own recipes if you do not want to. Both methods, with and without Puppet, are presented here.

Install Kameleon and other dependencies

To install Kameleon, follow instructions from the Kameleon website.

Warning.png Warning

Kameleon needs a recent version of libguestfs and ruby. This tutorial is known to work on Debian Jessie, so you can run it on Grid'5000 by deploying this environment on a node. If you are not on Jessie, it is recommended to deploy a node to avoid installation problems:

Terminal.png frontend:
oarsub -I -t deploy -l nodes=1,walltime=2
Terminal.png frontend:
kadeploy3 -e jessie-x64-nfs -f $OAR_FILE_NODES -k
Then SSH to your node.


To build Grid'5000 environments, we need to install a few additional packages and then the kameleon-builder gem:

Terminal.png localhost:
apt-get update && apt-get install --no-install-recommends git virtualbox linux-headers-amd64 socat qemu-utils ruby-dev ruby-childprocess polipo && apt-get install -t jessie-backports --no-install-recommends libguestfs-tools
Terminal.png localhost:
gem install --no-ri --no-rdoc kameleon-builder
Warning.png Warning

The mdadm tool, installed as a libguestfs-tools dependency, may ask you which RAID array it should manage. If you don't know what to answer, leave the field blank.

All remaining commands of this tutorial may be run as a standard, non-root, user.

Prepare the environment with a script

On a deployed Grid'5000 node with a lot of RAM (more than 32 GB), run:

#!/bin/bash
set -e
apt-get update && apt-get install -y --no-install-recommends git virtualbox linux-headers-amd64 socat qemu-utils ruby-dev ruby-childprocess polipo pigz
apt-get install -y --no-install-recommends -t jessie-backports libguestfs-tools
gem install --no-ri --no-rdoc kameleon-builder
mount -t tmpfs tmpfs /tmp
mv /bin/gzip /bin/gzip.OLD
ln -s /usr/bin/pigz /bin/gzip
cd /tmp

This script installs all dependencies and mounts /tmp as tmpfs, which makes the build roughly 2 to 3 times faster (don't forget to launch the 'kameleon build' command from /tmp). It also replaces gzip with pigz, a multithreaded implementation of gzip, so the tar.gz compression time decreases roughly in proportion to the number of cores of the node.
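
Mounting /tmp as tmpfs only pays off on a node with enough memory, so it may be worth checking the RAM before doing it. The following is a minimal sketch, not part of Kameleon or the Grid'5000 tooling: the function names mem_total_gib and has_enough_ram are hypothetical, and the threshold of 32 from the text above is taken here as GiB (a node sold as "32 GB" may report slightly less).

```bash
#!/bin/bash
# Hypothetical helper: print the total RAM in whole GiB.
# Reads /proc/meminfo by default; another file can be passed for testing.
mem_total_gib() {
    local meminfo="${1:-/proc/meminfo}"
    # The MemTotal line is expressed in kB; convert to GiB (truncated).
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' "$meminfo"
}

# Hypothetical helper: succeed only if the node has at least $1 GiB of RAM.
has_enough_ram() {
    local required_gib="$1" meminfo="${2:-/proc/meminfo}"
    [ "$(mem_total_gib "$meminfo")" -ge "$required_gib" ]
}

# Example (as root): only mount /tmp as tmpfs when the node is large enough.
# has_enough_ram 32 && mount -t tmpfs tmpfs /tmp
```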

Get environments recipes

  • Get recipes from GitHub:
Terminal.png localhost:
kameleon template repo add grid5000 https://github.com/grid5000/environments-recipes.git

You can then list the available templates with:

Terminal.png localhost:
kameleon template list


Build an image

We will create a brand new workspace directory to store builds' data:

Terminal.png localhost:
mkdir ~/my_recipes && cd ~/my_recipes


Pick the right recipe

Kameleon provides some recipes, and we added others that are specialized for Grid'5000; their names end with -{variant_listed_below}. We will focus on the Grid'5000 recipes here.

Grid'5000 images exist in different variants. As explained earlier, puppet is in charge of configuring images. Choosing a variant is in fact equivalent to picking a set of puppet recipes. The managed variants are:

Min
Base
Nfs
Big
Std

Each variant is included in the next one, in the following order:

min ⊂ base ⊂ nfs ⊂ big ⊂ std

This means that changes made in min recipes will affect all other recipes. Changes made in big recipes will affect big and std recipes.
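
The inclusion order can be turned into a small lookup that tells which variants need rebuilding after a change. This is only an illustrative sketch: the VARIANTS list is copied from the order given above, and the function name affected_variants is hypothetical.

```bash
#!/bin/bash
# Variant inclusion order as described above: min, base, nfs, big, std.
VARIANTS="min base nfs big std"

# Hypothetical helper: print the variants impacted by a change made in the
# recipes of variant $1 (that variant and every variant that includes it).
affected_variants() {
    local changed="$1" found=0 out="" v
    for v in $VARIANTS; do
        [ "$v" = "$changed" ] && found=1
        [ "$found" -eq 1 ] && out="$out $v"
    done
    # Unknown variant: report an error instead of printing nothing.
    if [ "$found" -eq 0 ]; then
        echo "unknown variant: $changed" >&2
        return 1
    fi
    echo "${out# }"
}

affected_variants big   # prints: big std
```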

The 'xen' image is a special case: it is a base image with a Xen kernel added.

See Getting_Started#Deploying_nodes_with_Kadeploy for more information on differences between these variants. Please note that Kameleon's build logs of Grid'5000 images are available under /grid5000/descriptions/log on frontends.

In this example we will create a min image.

Instantiate a template

We instantiate a template in the current directory. Kameleon will take care of copying all files required to build the template for you.

Terminal.png localhost ~/my_recipes:
kameleon new jessie_custom grid5000/virtualbox/jessie-x64-min.yaml

To create images, Kameleon can use several back-ends (qemu, chroot, ...). For Grid'5000 images, it uses a virtual machine based on VirtualBox.

You can list the recipes present in your workspace using the list command:

Terminal.png localhost ~/my_recipes:
kameleon list

You can see your new jessie_custom recipe along with the templates that it extends. Its description field still reads <MY RECIPE DESCRIPTION>. You can edit your recipe (~/my_recipes/jessie_custom.yaml) and change the description in the commented header, for example:

#==============================================================================
#
# DESCRIPTION: My Grid'5000 Debian jessie 
#
#==============================================================================

Customise your image

Maybe you have an experiment that always requires the same packages, and these packages are not in min; they are in std, but the std environment is too big for you. Kameleon makes it really simple to add packages, or anything else you want, to your image.

There are two ways to add a package to your image.

Add a new step

Note.png Note

This part is a quick explanation of how to add packages to your image; you can find more details below in the Environments_creation_using_Kameleon_and_Puppet#Modifying_an_image part.

The first way is to add a new step to the Kameleon recipe. A step is just a sequence of actions to execute, described in a YAML file.

For example we will install ffmpeg.

1. Create a new file virtualbox/steps/setup/my_install.yaml

2. Fill this file with:

- install_ffmpeg:
    - exec_in : apt-get update && apt-get install -y ffmpeg

exec_in means the command will be executed with bash in the VM.

You can also execute a multi-line script:

- install_ffmpeg: 
    - exec_in : |
         apt-get update
         apt-get install -y ffmpeg

3. Edit your recipe (~/my_recipes/jessie_custom.yaml)

...
- setup : 
   - "@base"
   - my_install

You can find more information in the official documentation.

Edit puppet recipe

Grid'5000 images use Puppet, a configuration management tool, to install and configure packages. Be careful not to mix up Kameleon recipes and Puppet recipes: a Kameleon recipe is used by Kameleon on your computer, while a Puppet recipe is used by Puppet inside the VM created by Kameleon.

The Puppet recipes are in ~/my_recipes/grid5000/virtualbox/steps/data/puppet/modules/env/manifests/. In our case, the file min/packages.pp is the interesting one. If you open it, you will see a really simple file with variable assignments and, at the end, a Puppet resource that asks for the packages listed in the $installed variable to be installed.

We can add a new variable to install ffmpeg like this:

class env::min::packages () {
   ...
   $my_packages = ['ffmpeg']
   $installed = [ ...,..., $my_packages ]
   ...

In this simple case, using Puppet or adding a new step are equivalent, but for a package like postfix, which requires more configuration, Puppet is likely the better choice. Puppet covers a lot of needs and we cannot show everything it can do here. If you want to know more about it, refer to its documentation.

Launch the build

Run the following command:

Terminal.png localhost ~/my_recipes:
kameleon build jessie_custom.yaml --enable-cache
Note.png Note

If you have enough RAM (it depends on the image, but 16 GB or more is generally fine), think about mounting the "build" directory as tmpfs. This should make the build at least twice as fast.

Warning.png Warning

Depending on several factors, such as the size of the image you are about to create (variant), the network quality, whether a cache is used and the hardware (SSD?), the build can last a few minutes (6 min on my really expensive laptop :) ) or a few hours.

Warning.png Warning

If the build fails during the execution of 'echo -e "run\nzerofree /dev/sda1" | guestfish' [...], your version of libguestfs is too old.

Warning.png Warning

If the virtual machine refuses to launch during this step, try to launch it with VirtualBox's graphical client; it will show you the error message.

Warning.png Warning

Running the Grid'5000 VPN may be required to build some recipes that need access to the Grid'5000 APT repository (g5k_* deb metapackages).

You'll end up with a build folder that contains the files you are interested in:

  • build/jessie_custom/jessie_custom.tgz and build/jessie_custom/jessie_custom.dsc, to deploy your image on Grid'5000 using Kadeploy
  • build/jessie_custom/jessie_custom.qcow2 is a qcow2 version of the environment
  • build/jessie_custom/jessie_custom-cache.tar.gz is a cache that contains all the files downloaded by Kameleon during the build. It allows a quicker rebuild of the environment, even without network access. Rebuilding from the cache should be possible no matter when you try to rebuild the image (even if remote repositories or package servers have changed).
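
After a long build, it can be handy to confirm that all the files listed above are present before moving on. The sketch below is a hypothetical helper (check_build_outputs is not a Kameleon command); it only assumes the file layout described in the list above.

```bash
#!/bin/bash
# Hypothetical helper: check that a build produced the files listed above.
# $1 = recipe name (e.g. jessie_custom), $2 = build directory (default: build)
check_build_outputs() {
    local name="$1" build_dir="${2:-build}" missing=0 suffix path
    for suffix in .tgz .dsc .qcow2 -cache.tar.gz; do
        path="$build_dir/$name/$name$suffix"
        if [ -f "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is missing.
    return "$missing"
}

# Example, run from the workspace after 'kameleon build':
# check_build_outputs jessie_custom
```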

Run your image

Now that you have built your first image let's run it.

In Grid'5000 with Kadeploy

Warning.png Warning

build/jessie_custom/jessie_custom.dsc must be adapted, in particular the path to the tgz file. It can be specified manually in the file, or with the Kameleon option "--global=g5k_tgz_path:<path>", or inside the recipe (see Customize variables example)
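
If you prefer to adapt the description file from the shell, the path can be rewritten in place. The sketch below is a hypothetical helper, not a Kadeploy or Kameleon command. It assumes that the description file is YAML and holds the archive path on a single line of the form "file: <path>"; check your own .dsc before relying on it. Paths containing "&" or backslashes are not handled.

```bash
#!/bin/bash
# Hypothetical helper: replace the path on the "file:" line of a
# description file. $1 = description file, $2 = new path to the tgz.
# Assumption: the file has exactly one line of the form "<indent>file: <path>".
set_dsc_image_path() {
    local dsc="$1" new_path="$2" tmp rc
    tmp="$(mktemp)" || return 1
    # Keep the indentation, replace everything from "file:" to end of line.
    awk -v p="$new_path" '
        /^[ \t]*file:/ { sub(/file:.*/, "file: " p) }
        { print }
    ' "$dsc" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original file to preserve its permissions.
    cat "$tmp" > "$dsc"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example:
# set_dsc_image_path build/jessie_custom/jessie_custom.dsc \
#     "/home/<YOUR_G5K_USER>/my_g5k_images/jessie_custom.tgz"
```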

You have to put your image and its description file in the right place in your Grid'5000 home. For example:

Terminal.png localhost:
ssh username@access.grid5000.fr "mkdir ~/site/my_g5k_images"
Terminal.png localhost:
rsync -aAXP --bwlimit=5000 build/jessie_custom/jessie_custom.tgz build/jessie_custom/jessie_custom.dsc username@access.grid5000.fr:~/site/my_g5k_images/
Note.png Note

If you configured your SSH client as described in the SSH tutorial for Grid'5000, you should be able to connect to access.grid5000.fr by writing only "g5k" instead of "username@access.grid5000.fr"

Note.png Note

If the rsync command takes too much time and if you are currently on one of the Grid'5000 sites, you can directly rsync on the frontend of this site, which should be much quicker:

Terminal.png localhost:
rsync -aAXP --bwlimit=5000 build/jessie_custom/jessie_custom.tgz build/jessie_custom/jessie_custom.dsc username@access.site.grid5000.fr:~/my_g5k_images/


If you didn't do it before, customize your image description (see Kadeploy-v3#Environment_description for details)

Terminal.png frontend:
vim my_g5k_images/jessie_custom.dsc

Then register the description file with kaenv3 on the frontend:

Terminal.png frontend:
kaenv3 -a my_g5k_images/jessie_custom.dsc

And ask OAR for some nodes:

Terminal.png frontend:
oarsub -I -t deploy -l nodes=2,walltime=1

Finally launch it with kadeploy:

Terminal.png frontend:
kadeploy3 -f $OAR_NODEFILE -e jessie_custom

With Qemu

Note.png Note

This image is not contextualized by the Kadeploy post-install script, so it may be slightly misconfigured.

Just run Qemu on the image:

Terminal.png localhost:
qemu-system-x86_64 -enable-kvm -m 2048 -cpu host build/jessie_custom/jessie_custom.qcow2

With Docker

First you have to import your image into Docker. Simply use import with a tag to name your image:

Terminal.png localhost:
docker import build/jessie_custom/jessie_custom.tgz jessie-g5k

Then run the image with bash for example:

Terminal.png localhost:
docker run -ti jessie-g5k bash

Modifying an image

We will now consider that you have a few personal customizations to make to your image. You have two options. The easiest one is to modify Kameleon's build process to insert custom commands. The alternative is to write your own Puppet recipe, which may be more robust for large modifications but requires knowing Puppet's syntax.

Note.png Note

Keep in mind that the Grid'5000 way of modifying an image configuration is to use Puppet. Grid'5000 environments only use Kameleon to set up an initial empty image and export it to various formats. Puppet does all the configuration work (packages, files, services...).


Modifying an image using Kameleon

You can find more tutorials on the Kameleon web page.

First example: Customize variables

In this example, we will customize the default template using the Kameleon variables.

We will use the jessie_custom.yaml recipe that we created in the previous section.

Kameleon uses variables prefixed by $$, like $$my_variable. Among other things, the info command allows you to see all the variables defined in your recipe:

Terminal.png localhost:
kameleon info jessie_custom.yaml
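
The info command reports the variables of a whole recipe. To see which variables a single step file references, a plain grep is enough. The following sketch is a hypothetical helper (list_step_variables is not a Kameleon command) and it only matches the plain $$name form described above.

```bash
#!/bin/bash
# Hypothetical helper: list the names of the $$variables referenced in a
# step or recipe file, one per line, without duplicates.
# Only the plain $$name form is matched.
list_step_variables() {
    grep -o '\$\$[A-Za-z_][A-Za-z0-9_]*' "$1" | sed 's/^\$\$//' | sort -u
}

# Example, run from the workspace:
# list_step_variables jessie_custom.yaml
```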

In jessie_custom.yaml file, modify Kameleon variables by adding a new global variable in the 'global' part. For instance, we will customize the g5k_tgz_path so kadeploy will know where to find our new image:

global:
  g5k_tgz_path: /home/<YOUR_G5K_USER>/my_g5k_images/jessie_custom.tgz

To build the new image, use (or you may go directly to next section):

Terminal.png localhost:
kameleon build jessie_custom.yaml --enable-cache

Second example: Install the NAS Benchmarks

The NAS benchmarks are commonly used to benchmark HPC applications using MPI or OpenMP. In this example we will download and configure the NAS package and build some of its MPI benchmarks.

To do so, we will create a step file that will be called from the recipe. First, let's create the ./steps folder in our Kameleon workspace (~/my_recipes in this tutorial).

Terminal.png localhost:
mkdir steps
Note.png Note

It is good practice to add your steps in this folder rather than inside the steps of the imported templates: it makes it easier to tell your own work from what comes from the template.

Define the content of that step in steps/NAS_benchmark.yaml. Notice that a Kameleon variable, NAS_home, is used to define where the benchmarks are installed.

- NAS_home: /root
- install_NAS_bench:
  # install dependencies
  - exec_in: apt-get -y install openmpi-bin libopenmpi-dev make gfortran gcc
  - download_file_in:
    - https://www.nas.nasa.gov/assets/npb/NPB3.3.1.tar.gz
    - $$NAS_home/NPB3.3.1.tar.gz
  - exec_in: cd $$NAS_home && tar xf NPB3.3.1.tar.gz
- configure_make_def:
  - exec_in: |
      cd $$NAS_home/NPB3.3.1/NPB3.3-MPI/
      cp config/make.def{.template,}
      sed -i 's/^MPIF77.*/MPIF77 = mpif77/' config/make.def
      sed -i 's/^MPICC.*/MPICC = mpicc/' config/make.def
      sed -i 's/^FFLAGS.*/FFLAGS  = -O -mcmodel=medium/' config/make.def
- compile_different_MPI_bench:
  - exec_in: |
      cd $$NAS_home/NPB3.3.1/NPB3.3-MPI/
      for nbproc in 1 2 4 8 16 32
      do
        for class in B C D
        do
          for bench in is lu ft
          do
            # Not all IS bench are compiling but we get 48 working
            make -j 4 $bench NPROCS=$nbproc CLASS=$class || true
          done
        done
      done


To add this step to your jessie_custom.yaml recipe, modify Kameleon's build process by adding a new step NAS_benchmark (named after the file name) to the 'setup' part, in addition to the steps inherited from the parent recipe (@base). It is possible to override the NAS_home variable in the recipe. Here we choose to put it in /root/workdir, but first we use an inline step declaration to create the folder inside the image.

setup:
  - "@base"
  - create_my_working_directory:
    - create_the_folder:
      - exec_in: mkdir /root/workdir
  - NAS_benchmark:
    - NAS_home: /root/workdir

To build the new image, use (or you may go directly to next section):

Terminal.png localhost:
kameleon build jessie_custom.yaml --enable-cache

Third example: Add a file

Let's add a file to your image. You can access the steps/data folder inside Kameleon recipes using the $$kameleon_data_dir variable.

In this example, we will add a script that clears logs in your image.

First, write a step that copies a script and executes it. This step must be located at steps/clean_logs.yaml:

- script_path: /usr/local/sbin
- import_script:
  - local2in:
    - $$kameleon_data_dir/$$script_file_name
    - $$script_path/$$script_file_name
  - exec_in: chmod u+x $$script_path/$$script_file_name
- run_script:
  - exec_in: $$script_path/$$script_file_name
Note.png Note

In this step we are using the alias command local2in provided by Kameleon. See documentation of commands and alias for more details.

Here is an example of a cleaning script that must be copied in steps/data/debian_log_cleaner.sh.

#!/bin/sh
# This is my cleaning script 'cause I don't trust G5K
systemctl stop rsyslog
rm -rf /var/log/*.log*
rm -f /root/.bash_history
Note.png Note

The script content does not really matter; it is only an example. Of course, you can also run these commands directly inside the recipe.

Finally, we call that step by modifying the setup section of the recipe (jessie_custom.yaml). We set the variable script_file_name to select the script in the data folder.

  - clean_logs:
    - script_file_name: debian_log_cleaner.sh

To build the new image, use (or you may go directly to next section):

Terminal.png localhost:
kameleon build jessie_custom.yaml --enable-cache

Fourth example: Modify export format

Suppose now that you want to export your image to VDI (as used by VirtualBox) in addition to the qcow2 and tar.gz formats. To do so, we will override a variable in the global section of our jessie_custom.yaml recipe. If we look at the parent recipe (file grid5000/virtualbox/jessie-x64-global.yaml), we see an appliance_formats variable.

In jessie_custom.yaml file we then put:

global:
   appliance_formats: qcow2 tar.gz vdi
Note.png Note

Allowed formats are: tar.gz, tar.bz2, tar.xz, tar.lzo, qcow, qcow2, qed, vdi, raw, vmdk
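
A typo in the format list is only detected late in the build, so it may be worth checking the value beforehand. The following sketch is a hypothetical helper (validate_formats is not a Kameleon command); the list of allowed formats is copied from the note above and the value is assumed to be space-separated, as in the recipe.

```bash
#!/bin/bash
# Formats listed in the note above.
ALLOWED_FORMATS="tar.gz tar.bz2 tar.xz tar.lzo qcow qcow2 qed vdi raw vmdk"

# Hypothetical helper: succeed if every format in the space-separated
# list $1 is one of the allowed formats; otherwise name the first bad one.
validate_formats() {
    local fmt
    for fmt in $1; do
        case " $ALLOWED_FORMATS " in
            *" $fmt "*) ;;
            *) echo "unsupported format: $fmt" >&2; return 1 ;;
        esac
    done
}

validate_formats "qcow2 tar.gz vdi" && echo "formats OK"
```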

To build the new image, use:

Terminal.png localhost:
kameleon build jessie_custom.yaml --enable-cache

Modifying an image using Puppet

In the Grid'5000 environment build process, Puppet is called by Kameleon after it has created a minimal operating system. The Puppet recipes can be found in grid5000/virtualbox/steps/data/puppet/.


Add a package to base variant

In this example, we will regenerate a base variant with the additional packages iperf and emacs.

If you don't want to mess with the standard recipe, you should work in a new workspace:

Terminal.png localhost:
mkdir modified_recipes_puppet && cd modified_recipes_puppet

Then, instantiate a jessie-x64-base template:

Terminal.png localhost:
kameleon new my-jessie-x64-base grid5000/virtualbox/jessie-x64-base.yaml

Add a my_packages definition and append it to the installed list in grid5000/virtualbox/steps/data/puppet/modules/env/manifests/base/packages.pp:

class env::base::packages () {

  # Removed : findutils, grep, gzip, man-db, sed, tar, wget, diffutils, multiarch-support
  $utils = [ 'bzip2', 'curl', 'dnsutils', 'dtach', 'host', 'ldap-utils', 'lshw', 'lsof', 'bsd-mailx', 'm4', 'netcat-openbsd', 'rsync', 'screen', 'strace', 'taktuk', 'telnet', 'time', 'xstow', 'sudo' ]
  $languages = [ 'perl', 'python', 'ipython', 'ruby' ]
  $my_packages = [ 'iperf', 'emacs' ]

  $installed = [ $utils, $languages, $my_packages ]

  package {
    $installed:
      ensure => installed;
  }
}


To build the new image, use (or you may go directly to next section):

Terminal.png localhost:
kameleon build my-jessie-x64-base.yaml --enable-cache

Create a new variant

The previous section showed how to modify the base image. This may be suitable for minor modifications, but for bigger ones it is recommended to create a new variant. Having your own variant keeps your customizations separate from the Grid'5000 recipes, which eases their maintenance (for example when the Grid'5000 recipes are updated).

In this example, we consider that you need apache2 on your nodes and want to integrate it in your image. You have to create a Linux user (www-data), add an apache2 configuration file, add your application (here, a simple HTML file), and ensure that the apache2 service is running and enabled (started at boot time). We assume that the environment you usually work with is base, so we will extend a base environment with the modifications listed above.

Note.png Note

This part is an example of Puppet usage. If you want to know more about Puppet, have a look at the Puppet documentation.


First, declare a new Kameleon recipe to build our jessie-x64-webserv image, based on the jessie-x64-global template, the Kameleon template from which all Grid'5000 images inherit:

Terminal.png localhost:
kameleon new jessie-x64-webserv grid5000/virtualbox/jessie-x64-global.yaml

Create a new Puppet module apache2:

Terminal.png localhost:
mkdir grid5000/virtualbox/steps/data/puppet/modules/apache2
Terminal.png localhost:
mkdir grid5000/virtualbox/steps/data/puppet/modules/apache2/manifests
Terminal.png localhost:
mkdir grid5000/virtualbox/steps/data/puppet/modules/apache2/files
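
The three mkdir commands above can also be grouped in a small reusable function. This is only a sketch: new_puppet_module is a hypothetical name, and the function just creates the manifests and files directories used in this tutorial.

```bash
#!/bin/bash
# Hypothetical helper: create the manifests/ and files/ directories of a
# new Puppet module under the given modules directory.
# $1 = module name, $2 = modules directory
new_puppet_module() {
    local name="$1" modules_dir="$2"
    # -p creates missing parent directories and is harmless on re-runs.
    mkdir -p "$modules_dir/$name/manifests" "$modules_dir/$name/files"
}

# Example, run from the workspace:
# new_puppet_module apache2 grid5000/virtualbox/steps/data/puppet/modules
```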

Here is an example of content for grid5000/virtualbox/steps/data/puppet/modules/apache2/manifests/init.pp:

# Module apache2

class apache2 ( ) {

  package {
    "apache2":
      ensure  => installed;
  }
  user {
    "www-data":
      ensure   => present;
  }
  file {
    "/var/www/my_application":
      ensure   => directory,
      owner    => www-data,
      group    => www-data,
      mode     => '0644';
    "/var/www/my_application/index.html":
      ensure   => file,
      owner    => www-data,
      group    => www-data,
      mode     => '0644',
      source   => 'puppet:///modules/apache2/index.html',
      require  => File['/var/www/my_application'];
    "/etc/apache2/sites-available/my_application.conf":
      ensure   => file,
      owner    => root,
      group    => root,
      mode     => '0644',
      source   => 'puppet:///modules/apache2/my_application.conf',
      require  => Package['apache2'];
    "/etc/apache2/sites-enabled/my_application.conf":
      ensure   => link,
      target   => '../sites-available/my_application.conf',
      require  => Package['apache2'],
      notify   => Service['apache2'];
  }
  service {
    "apache2":
      ensure   => running,
      enable   => true,
      require  => Package['apache2'];
  }
}

Files my_application.conf and index.html must be stored in grid5000/virtualbox/steps/data/puppet/modules/apache2/files/

grid5000/virtualbox/steps/data/puppet/modules/apache2/files/my_application.conf:

<VirtualHost *:80>

    ServerName my_application

    DocumentRoot /var/www/my_application

    ErrorLog /var/log/apache2/error.log

    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn

    CustomLog /var/log/apache2/access.log combined

</VirtualHost>

grid5000/virtualbox/steps/data/puppet/modules/apache2/files/index.html:

<html>
    <head>
        <title>Hello World!</title>
    </head>
    <body>
        <P>I &#60;3 Grid'5000!</P>
    </body>
</html>


We will now integrate this module in a new variant called webserv that extends base variant.

First, we must create the file grid5000/virtualbox/steps/data/puppet/modules/env/manifests/webserv.pp:

# This file contains the webserv class, used to configure a user environment based on the base variant that also contains apache2.

class env::webserv ( ) {

  class { "env::base": } # we include the base variant here without overriding any of its default parameters
  class { "apache2": }
}

To have it included by the actual Puppet setup, we must also create grid5000/virtualbox/steps/data/puppet/manifests/webserv.pp:

# User env containing apache2
# All recipes are stored in env module. Here called with webserv variant parameter.

class { 'env':
  given_variant    => 'webserv';
}

And finally, modify grid5000/virtualbox/steps/data/puppet/modules/env/manifests/init.pp to include your variant:

 case $variant {
   'min' :  { include env::min }
   'base':  { include env::base }
   'webserv': { include env::webserv }
   'nfs' :  { include env::nfs }
   'prod':  { include env::prod }
   'big' :  { include env::big }
   'xen' :  { include env::xen }
   default: { notify {"flavor $variant is not implemented":}}
 }



Then, tell the jessie-x64-webserv recipe to build the webserv variant by modifying the global section of jessie-x64-webserv.yaml:

---
extend: grid5000/virtualbox/jessie-x64-global.yaml

global:
    # You can see the base template `grid5000/virtualbox/jessie-x64-global.yaml` to know the
    # variables that you can override
  variant: webserv

bootstrap:
  - "@base"

setup:
  - "@base"

export:
  - "@base"


Finally, you can launch a build:

Terminal.png localhost:
kameleon build jessie-x64-webserv.yaml --enable-cache