Multi-parametric experiments with CiGri

Note: This tutorial is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

CiGri is a grid middleware designed to manage the execution of large sets of independent (i.e. embarrassingly parallel) tasks on distributed computing platforms. CiGri is well suited to multi-parametric experiments such as Monte-Carlo simulations and, more generally, it eases the execution of any application composed of a large number of tasks that do not communicate with each other (so-called Bag-of-Tasks applications). Typical examples include large-scale data analysis and simulations. Thanks to CiGri, users can execute such tasks on the whole of Grid'5000 in a low-priority mode, gaining easy access to a huge amount of resources.

From a user standpoint, CiGri works as a grid submission system: it makes it easy to submit jobs to multiple sites and clusters. Users submit a script to the CiGri scheduler describing the set of tasks. CiGri dispatches the tasks composing the jobs to the different clusters and reports the outcome of the executions back to the user. Each task may itself be parallel, requiring several cores or nodes.

Technically, CiGri runs on top of OAR schedulers; on Grid'5000, it uses the best-effort mode of OAR to execute the tasks on idle resources. CiGri monitors submitted jobs and automatically resubmits them if they are killed. It detects problems on clusters and moves jobs accordingly, and it only submits jobs to OAR queues if the workload on the site allows it. In addition, CiGri works as a meta-scheduler and shares the idle resources between the different job campaigns that have been submitted.

Please note that OAR also provides a job array capability that already simplifies the submission of multi-parametric experiments to Grid'5000 clusters (the --array or --array-param-file options of the oarsub command). However, CiGri offers more services for such experiments (multi-site submission, fair sharing of idle resources, task buffering, monitoring and reporting of task executions...).
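
For comparison, a single-site submission using an OAR job array could look like the following sketch (my_script.sh and params.txt, which contains one argument list per line, are hypothetical placeholders):

 # Each line of params.txt becomes the argument list of one job of the array
 oarsub -l nodes=1,walltime=00:30:00 --array-param-file params.txt ./my_script.sh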

Description of a usage scenario

To demonstrate how CiGri can be used to run an embarrassingly parallel application, we provide here a small example that renders an animation in parallel. The sequence of images is generated using POV-Ray, a ray-tracing program. The scene to render is described in a text-based file (landscape.pov) and each image is generated independently by tracing the path of light and simulating the effects of its encounters with virtual objects at different time steps.

The computation of one image is done using the following command:

 # +O: output file, +W: image width, +H: image height, +K: clock parameter, landscape.pov: input povray file to render
 povray +W1280 +H720 +Olandscape_<XXX>.png +K<XXX> landscape.pov
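
For illustration, the whole sweep could be computed sequentially with a simple shell loop (a sketch assuming 100 frames, as in the campaign defined later in this tutorial; the three-digit numbering matches the pov.sh wrapper below). CiGri is used in the rest of this tutorial to run these independent renderings in parallel instead:

 # Sequential reference version: each iteration is independent,
 # which is exactly the Bag-of-Tasks pattern that CiGri distributes.
 for i in $(seq 0 99); do
   povray +W1280 +H720 +O$(printf "landscape_%03d.png" "$i") +K"$i" landscape.pov
 done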

Within Grid'5000 (e.g. from a frontend node), the POV-Ray executable, the input files and the job submission scripts used in the following tutorial can be downloaded from apt.grid5000.fr:

flyon: mkdir ~/cigri/; cd ~/cigri; wget http://apt.grid5000.fr/tutorial/cigri/pov.tbz2; tar -xf pov.tbz2

You can work from any frontend node, but if you use Lyon you will not have to adapt the scripts that transfer the files to the other sites.

For convenience, one can use a small wrapper (see pov.sh) instead of calling the executable directly with all its parameters:

#!/bin/bash
# Wrapper around povray: takes the frame number to render as its single argument ($1)
output_dir=~/cigri/pov-tmp/
output_file=$output_dir/$(printf "landscape_%03d.png" $1)
# Create the output directory on first use
[ -d $output_dir ] || mkdir -p $output_dir

cd ~/cigri/pov
./povray +O${output_file} +W1280 +H720 +K$1 landscape.pov

This script takes a single parameter, the frame number, and computes one frame of the animation:

oarsub -I
~/cigri/pov/pov.sh 0
exit
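
A single frame can also be submitted as a non-interactive batch job, optionally in best-effort mode as CiGri itself does (a sketch; the walltime value is an assumption):

 # Submit one frame as a best-effort OAR job on the current site
 oarsub -l nodes=1,walltime=00:30:00 -t besteffort "~/cigri/pov/pov.sh 0"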

When all the images have been generated, the final animation can be rendered using the convert utility from ImageMagick:

 convert pov-tmp/landscape_*.png pov-tmp/landscape.gif

(Note: convert may not be available on the default g5k image)
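
If convert is missing, one possible workaround (a sketch, not an official recommendation: the package name and the use of the sudo-g5k mechanism inside a job running the standard environment are assumptions, and this does not work on the frontends) is:

 # From inside an OAR job (e.g. oarsub -I), where sudo-g5k grants root access
 sudo-g5k apt-get install -y imagemagick
 convert ~/cigri/pov-tmp/landscape_*.png ~/cigri/pov-tmp/landscape.gif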

Job submission

Running a job with CiGri involves writing a submission script containing the details of the application to be executed (executable name, list of parameters, resources needed, list of sites where the application can be executed...). A CiGri job is called a campaign and submission scripts are written using a Job Description Language (JDL) based on JSON.

Here is an example of a CiGri submission script:

# Note: JSON does not support comments, remove them before submitting the script to CiGri
{
  "name": "Povray Landscape",  # Name of the campaign

  # Global settings
  "resources": "nodes=1",      # OAR resource description     (-l option of oarsub)
  "walltime": "00:30:00",      # Job duration
  "properties": "",            # OAR constraints on resources (-p option of oarsub)
  "exec_file": "./povray",     # Executable
  "exec_directory": "$HOME/cigri/pov/", # Execution directory (Default: $HOME). This directory has to exist on each site.

  # Input parameters
  # The list can be generated using: ruby -e '100.times {|i| puts "\"+Olandscape_%02d.png +W1280 +H720 +K#{i} landscape.pov\"," % i}'
  "params": [
    "+Olandscape_00.png +W1280 +H720 +K0 landscape.pov",
    "+Olandscape_01.png +W1280 +H720 +K1 landscape.pov",
    ...
    "+Olandscape_99.png +W1280 +H720 +K99 landscape.pov"
  ],

  # Per site configuration
  # Note: povray and landscape.pov must be present on each site
  "clusters": {               # List of OAR clusters where the campaign can run
   "lyon": {},                #  (note: on Grid'5000, OAR clusters == Grid'5000 sites)
   "nancy": {}, 
   "rennes":{}
  }
}
Note: JSON does not support comments; remove them before submitting the script to CiGri, or use ~/cigri/pov/cigri-with-params.jdl directly.
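
The "params" list of the example above can also be generated with a shell loop instead of the Ruby one-liner mentioned in the comments (a sketch that only prints the entries; the trailing comma of the last entry still has to be removed by hand):

 # Print one "params" entry per frame (0 to 99)
 for i in $(seq 0 99); do
   printf '  "+Olandscape_%02d.png +W1280 +H720 +K%d landscape.pov",\n' "$i" "$i"
 done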

  • "params" defines the input arguments of each execution. For example, with:
 "exec_file": "~/script.sh",
 "params": [ # parameters of our multi-parametric experiments
   "a b c",  # first experiment
   "b c d",  # second
   "c d e",  # third
   "e f g"   # fourth
 ]

4 tasks are executed on compute nodes:

 ~/script.sh a b c
 ~/script.sh b c d
 ~/script.sh c d e
 ~/script.sh e f g

Note that the first parameter is used to build the job names, which must be unique. The first parameter of each entry must therefore act as a unique key.

  • If the script only takes an incrementing integer as its parameter, the "nb_jobs" option can replace the full list of parameters. This option automatically generates parameters from 0 to nb_jobs-1. For instance, it can be used with our pov.sh wrapper:
  "exec_file": "~/cigri/pov/pov.sh", 
  "nb_jobs": 100

With these parameters, pov.sh is executed with the following arguments:

 ~/cigri/pov/pov.sh 0
 ~/cigri/pov/pov.sh 1
 [...]
 ~/cigri/pov/pov.sh 99
  • Global settings can be overridden for each Grid'5000 site. For instance, to change the executable path at Nancy and restrict executions there to the cluster named 'griffon', one can use:
 "nancy": {
   "exec_file": "~/cigri/pov/nancy/pov_nancy.sh",
   "properties": "cluster='griffon'"
 }
  • On Grid'5000, there is one instance of the OAR scheduler per site. For CiGri, a cluster is an entire Grid'5000 site (such as Rennes); CiGri does not recognize the individual names of Grid'5000 clusters (such as parapluie) in the "clusters" field. You can, however, restrict executions to specific clusters by using the "properties" field as shown above.
  • CiGri has built-in mechanisms to prepare the execution of tasks and to retrieve the results of the executions: the prologue is executed on each site before the first job, and the epilogue is executed on each site at the end of the campaign. For transferring files from/to Lyon, one can use:
 "prologue": [
   "mkdir -p ~/cigri/pov/",
   "rsync -avz lyon:~/cigri/pov/ ~/cigri/pov/",
   "mkdir -p ~/cigri/pov-tmp/"
 ],
  
 "epilogue": [
   "rsync -avz --update --exclude='OAR.cigri.*' ~/cigri/pov-tmp/ lyon:~/cigri/pov-results/",
   "rm ~/cigri/pov-tmp/*.png"
 ]

Using the --update option of rsync, we ensure that rsync skips any file that already exists in the pov-results/ directory with a modification time newer than the file in pov-tmp/. The -t option (included in -a) preserves modification times. These options are important because, if one of the jobs is killed during the campaign and rescheduled on another site, a partial result file may remain on the first site. The correct output files are always the newest, and this method preserves them.

At Lyon, the rsync step of the prologue is not needed. It can be skipped by specializing the prologue for this site (this is not mandatory, as the rsync would simply do nothing):

 "lyon": {
   "prologue": [
     "mkdir -p ~/cigri/pov-tmp/"
  ]
 }

The final submission script for our POV-Ray example can be found in ~/cigri/pov/cigri.jdl.

Scripts are submitted to CiGri by using the gridsub command:

 $ gridsub -f cigri.jdl

CiGri command line interface

CiGri commands are available on the frontend node of every Grid'5000 site.

The gridsub command is used to submit a new campaign or to add jobs to an existing campaign.

 $ gridsub -f cigri.jdl
 $ gridsub -c <CAMPAIGN_ID> -f cigri-extra.jdl

The global list of submitted campaigns can be retrieved using gridstat:

 $ gridstat
 
 Campaign id Name                User             Submission time     S  Progress
 ----------- ------------------- ---------------- ------------------- -- --------
 42          Some campaign       jdoe             2015-05-21 10-56-59 R  0/100 (0%)
 

Status codes are: 'R' (running), 'C' (canceled), 'T' (terminated), 'P' (paused) and 'Re' (running but awaiting event).

You can also get more information on ongoing campaigns by using the -c (campaign) and -f (full) options:

 $ gridstat -c <CAMPAIGN_ID> -f

Campaigns can be canceled, suspended and resumed using the griddel command:

 $ griddel -p <CAMPAIGN_ID> # suspend a campaign (p = pause)
 $ griddel -r <CAMPAIGN_ID> # resume a campaign
 $ griddel <CAMPAIGN_ID>    # cancel a campaign

Campaign monitoring

When an event prevents a campaign from running smoothly, its status becomes "Re" in the gridstat output:

 $ gridstat
 
 Campaign id Name                User             Submission time     S  Progress
 ----------- ------------------- ---------------- ------------------- -- --------
 42          Some campaign       jdoe             2015-05-21 10-56-59 Re 0/100 (0%)

The campaign is suspended and is awaiting user intervention.

To show the events of a campaign:

 $ gridevents -c <CAMPAIGN_ID>

Events can be linked either to a cluster problem (site unreachable, etc.) or to a problem with a task during the progress of the campaign.

Example:

 ------------------------------------------------------------------------------
 10024: (open) TIMEOUT at 2015-05-22T11:19:37+02:00 on nancy
 GET https://oar-api.nancy.grid5000.fr:4444/oarapi/jobs/details?array=615462: REST query timeouted!
 ------------------------------------------------------------------------------
 10025: (open) BLACKLIST at 2015-05-22T11:19:37+02:00 on nancy because of 10024
 ------------------------------------------------------------------------------

In particular, CiGri reports when a task returns an error code and stops the campaign, awaiting user instructions.

The output (stdout/stderr) of a job can be reviewed using the gridstat command:

 $ gridstat -j <CIGRI_JOB_ID> -O
 $ gridstat -j <CIGRI_JOB_ID> -E

After review, events can be discarded/closed using:

 $ gridevents -c 3 -f

Jobs affected by the fixed events can also be resubmitted using the '-r' option of the gridevents command in combination with -f.

 $ gridevents -c 3 -f -r

Cluster blacklisting is a common event. A cluster is blacklisted when a connection error occurs or when the cluster is under stress, i.e. when the workload of its OAR scheduler is high (lots of jobs in the queue, etc.). The command gridclusters -i lists the blacklisted clusters, the cluster stress values and the stress limits. Cluster blacklists are removed automatically by the CiGri watchdog.
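
For example, to check the current blacklists and stress values:

 $ gridclusters -i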

Please note that user commands (removing events, killing jobs or campaigns) are processed asynchronously by CiGri: you get the command prompt back before the commands are actually executed.

Notifications

Users can get email or XMPP (Jabber) notifications when a campaign event occurs. The gridnotify command can be used to set up notification preferences.

Examples:

 $ gridnotify -m jdoe@domain.com -s low
 $ gridnotify  -l
 You have the following notification subscriptions:
  * mail on jdoe@domain.com with severity low

API

All interactions with CiGri can also be done through a RESTful API: https://api.grid5000.fr/sid/cigri. All responses are in JSON format. To obtain a nicely formatted response, one can use the ?pretty option.

Examples:

To get the links available on the server:

 $ curl -kn https://api.grid5000.fr/sid/cigri/?pretty

To get the list of campaigns:

 $ curl -kn https://api.grid5000.fr/sid/cigri/campaigns?pretty

To submit a campaign:

 $ curl -kn -X POST https://api.grid5000.fr/sid/cigri/campaigns?pretty -d '{"name":"Some campaign", "nb_jobs":2,"resources":"nodes=1","walltime":"00:30:00","clusters":{"rennes":{"exec_file":"sleep 100"}}}'

or:

 $ curl -kn -X POST https://api.grid5000.fr/sid/cigri/campaigns?pretty -d @/home/username/cigri/script.jdl

To get campaign information:

 $ curl -kn https://api.grid5000.fr/sid/cigri/campaigns/<campaign_number>?pretty

To add jobs to an existing campaign:

 $ curl -kn -X POST https://api.grid5000.fr/sid/cigri/campaigns/<campaign_number>/jobs?pretty -d '["11", "12", "13"]'

To cancel a campaign:

 $ curl -kn -X DELETE https://api.grid5000.fr/sid/cigri/campaigns/<campaign_number>

Additional resources

See also: HPC and HTC tutorial | Run MPI On Grid'5000 | Accelerators on Grid5000 | Multi-parametric experiments with CiGri