OAR2 use cases


This page presents a list of use cases for OAR 2.

OAR 2 with the backward compatible usage

This section presents ways to use Grid'5000 with OAR 2 and the new per-site hierarchy of resources almost as it was done before the migration from OAR version 1 (September 2007). This backward compatible mode allows the use of the classic ssh connector as before, but with the restriction of not being able to select resources at the cpu or core level.

The sections after this one present several use cases for the normal usage of OAR 2, with no such restriction (selecting resources at the cpu or core level is possible), but with the need to use either oarsh or ssh with specific options.

See page OAR2 for more information.

Job submission

The idea of the backward compatible mode is to provide users with the same access to the resources as before (using a classic ssh), at the price of adding some options to the job submission command (and of not selecting resources at the core or cpu level).

Allow classic ssh connector

First create the job, using ONLY ENTIRE nodes
jdoe@idpot:~$ oarsub -I -t allow_classic_ssh -l nodes=2
Generate a job key...
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=11415
Interactive mode : waiting...
[2007-07-13 16:36:25] Starting...

Initialize X11 forwarding...
Connect to OAR job 11415 via the node idcalc-1.grenoble.grid5000.fr

idcalc-1.grenoble.grid5000.fr happens to be the head node of our job.

Then connect to the nodes

We connect to node idcalc-1 from the frontend:

jdoe@idpot:~$ ssh idcalc-1
Last login: Fri Dec 15 17:46:07 2006 from idpot.imag.fr

Assuming idcalc-2 is also part of our job:

jdoe@idcalc-1:~$ ssh idcalc-2
Last login: Fri Dec 15 17:46:07 2006 from idpot.imag.fr
jdoe@idcalc-2:~$

Hence, we can connect to the nodes with the classic SSH command (no special option needed).

Selecting nodes on a cluster

Submission

Using the -p 'cluster="<name>"' statement, you can select your cluster:

jdoe@idpot:~$ oarsub -I -t allow_classic_ssh -p 'cluster="idcalc"' -l nodes=2
Generate a job key...
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=11415
Interactive mode : waiting...
[2007-07-13 16:36:25] Starting...

Initialize X11 forwarding...
Connect to OAR job 11415 via the node idcalc-1.grenoble.grid5000.fr

Basic jobs using OAR 2

Note.png Note

In order not to change the former syntax, OAR 2 keeps the keyword nodes, which is actually a synonym for the resource keyword host. As a consequence, in any resource description in this document, the keyword nodes can be replaced by its synonym host, with exactly the same meaning. Note that nodes was kept plural to match the former syntax, while host is singular, just like core, cpu, cluster, etc.
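For illustration, here is a hypothetical pair of submissions (not taken from the original examples) that request exactly the same resources:

oarsub -I -l /nodes=3/core=1
oarsub -I -l /host=3/core=1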

Interactive job

Shell 1

Job submission

jdoe@idpot:~$ oarsub -I -l /nodes=3/core=1
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=4924 
Interactive mode : waiting...
[2007-03-07 08:51:04] Starting...

Connect to OAR job 4924 via the node idpot-5.grenoble.grid5000.fr
jdoe@idpot-5:~$

Connecting to the other cores

jdoe@idpot-5:~$ cat $OAR_NODEFILE
idpot-5.grenoble.grid5000.fr
idpot-8.grenoble.grid5000.fr
idpot-9.grenoble.grid5000.fr
jdoe@idpot-5:~$ oarsh idpot-8
Last login: Tue Mar  6 18:00:37 2007 from idpot.imag.fr
jdoe@idpot-8:~$ oarsh idpot-9
Last login: Wed Mar  7 08:48:30 2007 from idpot.imag.fr
jdoe@idpot-9:~$ oarsh idpot-5
Last login: Wed Mar  7 08:51:45 2007 from idpot-5.imag.fr
jdoe@idpot-5:~$

Copying a file from one node to another

jdoe@idpot-5:~$ hostname > /tmp/my_hostname
jdoe@idpot-5:~$ oarcp /tmp/my_hostname idpot-8:/tmp/
my_hostname                                              100%    7     0.0KB/s   00:00    
jdoe@idpot-5:~$ oarsh idpot-8 cat /tmp/my_hostname
idpot-5
jdoe@idpot-5:~$

Shell 2

Connecting to our job from the frontend

jdoe@idpot:~$ OAR_JOB_ID=4924 oarsh idpot-9
Last login: Wed Mar  7 08:52:09 2007 from idpot-8.imag.fr
jdoe@idpot-9:~$ oarsh idpot-5
Last login: Wed Mar  7 08:52:18 2007 from idpot-9.imag.fr
jdoe@idpot-5:~$

Batch mode job

Submission using a script

jdoe@paramount:~$ oarsub -l core=10 runhpl/runhpl
Generate a job key...
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=199522

Watching results

jdoe@paramount:~$ cat OAR.199522.stdout
...

Submission using an inline command

Sometimes it is very useful to run a short command directly in oarsub:

jdoe@paramount:~$ oarsub -l core=1 'echo $PATH;which ssh' 
Generate a job key...
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=199523

Watching results

jdoe@paramount:~$ cat OAR.199523.stdout
...

Advance Reservations

Job submission

The date format to pass to the -r option is YYYY-MM-DD HH:MM:SS:

jdoe@paramount:~$ oarsub -l core=10 -r "2007-10-10 18:00:00"
Generate a job key...
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=199524
Reservation mode : waiting validation...
Reservation valid --> OK
jdoe@paramount:~$

If the resources that you requested are not available, you will get a KO, which means your advance reservation was not accepted!

Note.png Note

You can also provide a script to your Advance Reservation; it will be run at the start time on the head node of your resources, just like with batch mode jobs (see above).
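For instance, combining the reservation above with the batch script used earlier (a hypothetical example, not from the original page):

oarsub -l core=10 -r "2007-10-10 18:00:00" runhpl/runhpl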

Getting information about my reserved resources

Note.png Note

As of OAR version 2.2.12, resources allocated to an Advance Reservation are fixed once it is validated (once oarsub prints 'OK'). This means that if some of those resources change state and are unavailable (e.g. absent, suspected or dead) at the job start time, they will be discarded from the job (after a retry timeout), and the job will run on the remaining ones.

As of OAR version 2.2.13, resources allocated to an Advance Reservation are displayed by oarstat:

jdoe@paramount:~$ oarstat -fj 199524
Job_Id: 199524
    name = toto
    project = default
    owner = jdoe
    state = Waiting
    wanted_resources = -l "{type = 'default'}/core=10,walltime=10:0:0" 
    types = 
    dependencies = 
    assigned_resources = 
    assigned_hostnames = 
    queue = default
    command = toto.sh
    launchingDirectory = /home/bordeaux/jdoe
    jobType = PASSIVE
    properties = 
    reservation = Scheduled
    reserved_resources = 42+43+44+45+49+50+70+22+(41+46)
    walltime = 
    submissionTime = 2008-09-22 12:00:35
    startTime = 2008-10-21 07:00:00
    cpuset_name = jdoe_176976
    message = 
    scheduledStart = 2008-10-21 07:00:00
    resubmit_job_id = 0
    events =

The list of resources here is 42+43+44+45+49+50+70+22+(41+46), where resources 41 and 46 are currently unavailable (but might be available again at the Advance Reservation start time).

Since OAR 2.2.14, a new option "--state" (or "-s") is available in oarstat. This option only gets the status of the specified jobs. It is an optimized query, meant to let scripts check job statuses.

rcavagna@chartreuse:~$ oarstat -s -j 116082
116082: Running

rcavagna@fgdx2 ~$> oarstat -s -j 217877
217877: Waiting
Note.png Note

Note that it is recommended to use the "--notify" option of the oarsub command to be notified of the beginning and the end of a job.
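A minimal sketch (the e-mail address is a placeholder; check the oarsub manual for the notification methods available on your site):

oarsub --notify "mail:jdoe@example.com" -l core=10 -r "2007-10-10 18:00:00"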

Connecting to the job resources

Once the Advance Reservation start time is reached, you can connect to your resources using oarsub -C:

jdoe@paramount:~$ oarsub -C 199524
Connect to OAR job 199524 via the node paraquad-23.rennes.grid5000.fr
jdoe@paraquad-23:~$
...

Examples of resource requests

Using the resource hierarchy

  • ask for 1 core on 15 nodes on a same cluster (total = 15 cores)
oarsub -I -l /cluster=1/nodes=15/core=1
  • ask for 1 core on 15 nodes on 2 clusters (total = 30 cores)
oarsub -I -l /cluster=2/nodes=15/core=1
  • ask for 1 core on 2 cpus on 15 nodes on a same cluster (total = 30 cores)
oarsub -I -l /cluster=1/nodes=15/cpu=2/core=1
  • ask for 10 cpus on 2 clusters (total = 20 cpus; the node or core count depends on the topology of the machines)
oarsub -I -l /cluster=2/cpu=10
  • ask for 1 core on 3 different network switches (total = 3 cores)
oarsub -I -l /switch=3/core=1

Using properties

See OAR2 properties for a description of all available properties, and watch Monika.

  • ask for 10 cores of the cluster azur
oarsub -I -l core=10 -p "cluster='azur'"
  • ask for 2 nodes with 4096 MB of memory and Infiniband 10G
oarsub -I -p "memnode=4096 and ib10g='YES'" -l nodes=2
  • ask for any 4 nodes except gdx-45
oarsub -I -p "not host like 'gdx-45.%'" -l nodes=4
Note.png Note

Please refer to an SQL syntax manual in order to build a correct -p <...> expression (it is the WHERE clause of the SQL query used for resource selection).
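For example, a compound condition combining properties already shown on this page (a hypothetical request, for illustration only):

oarsub -I -l nodes=2 -p "cluster='idcalc' and memnode>=2048"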

Mixing everything together

  • ask for 1 core on 2 nodes on the same cluster with 4096 MB of memory and Infiniband 10G, plus 1 cpu on 2 nodes on the same switch with dual-core processors, for a walltime of 4 hours
oarsub -I -l "{memnode=4096 and ib10g='YES'}/cluster=1/nodes=2/core=1+{cpucore=2}/switch=1/nodes=2/cpu=1,walltime=4:0:0"
Warning
  1. walltime must always be the last argument of -l <...>
  2. if no resource matches your request, oarsub will exit with the message
Generate a job key...
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
There are not enough resources for your request
OAR_JOB_ID=-5
Oarsub failed: please verify your request syntax or ask for support to your admin.

Moldable jobs

  • ask for 4 nodes and a walltime of 2 hours or 2 nodes and a walltime of 4 hours
oarsub -I -l nodes=4,walltime=2 -l nodes=2,walltime=4

Types of job

OAR 2 features the concept of job "type". Among those types are deploy (which used to be a queue with OAR 1.6) and besteffort.

  • ask for 4 nodes on the same cluster in order to deploy a customized environment:
oarsub -I -l cluster=1/nodes=4,walltime=6 -t deploy
  • submit besteffort jobs
for param in $(< ./paramlist); do
    oarsub -t besteffort -l core=1 "./my_script.sh $param"
done

Retrieving the resources allocated to my job

OAR version 2.2.13 and above feature a new tool called oarprint, which allows pretty-printing a job's resources.

Retrieving resources from within the job

We first submit a job, of course!

jdoe@capricorne:~$ oarsub -I -l nodes=4
...
OAR_JOB_ID=178361
..
Connect to OAR job 178361 via the node capricorne-34.lyon.grid5000.fr
..

Retrieve the host list

We want the list of the nodes we got, identified by unique hostnames

jdoe@capricorne-34:~$ oarprint host
sagittaire-32.lyon.grid5000.fr
capricorne-34.lyon.grid5000.fr
sagittaire-63.lyon.grid5000.fr
sagittaire-28.lyon.grid5000.fr

(We get 1 line per host, not per core!)

Warning.png Warning

nodes is a pseudo-property: you must use host instead

Retrieve the core list

jdoe@capricorne-34:~$ oarprint core
63
241
64
163
243
244
164
242

Obviously, retrieving OAR internal core Ids might not help much. Hence the use of a customized output format.

Retrieve core list with host and cpuset Id as identifier

We want to identify our cores by their associated host names and cpuset Ids:

jdoe@capricorne-34:~$ oarprint core -P host,cpuset
capricorne-34.lyon.grid5000.fr 0
sagittaire-32.lyon.grid5000.fr 0
capricorne-34.lyon.grid5000.fr 1
sagittaire-28.lyon.grid5000.fr 0
sagittaire-63.lyon.grid5000.fr 0
sagittaire-63.lyon.grid5000.fr 1
sagittaire-28.lyon.grid5000.fr 1
sagittaire-32.lyon.grid5000.fr 1

A more complex example with a customized output format

We want to identify our cores by their associated host name and cpuset Id, and get the memory information as well, with a customized output format

jdoe@capricorne-34:~$ oarprint core -P host,cpuset,memnode -F "NODE=%[%] MEM=%"
NODE=capricorne-34.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-32.lyon.grid5000.fr[0] MEM=2048
NODE=capricorne-34.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-28.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-63.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-63.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-28.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-32.lyon.grid5000.fr[1] MEM=2048

Retrieving resources from the submission frontend

You just have to pipe the oarstat command into oarprint:

jdoe@capricorne:~$ oarstat -j 178361 -p | oarprint core -P host,cpuset,memnode -F "%[%] (%)" -f -
capricorne-34.lyon.grid5000.fr[0] (2048)
sagittaire-32.lyon.grid5000.fr[0] (2048)
capricorne-34.lyon.grid5000.fr[1] (2048)
sagittaire-28.lyon.grid5000.fr[0] (2048)
sagittaire-63.lyon.grid5000.fr[0] (2048)
sagittaire-63.lyon.grid5000.fr[1] (2048)
sagittaire-28.lyon.grid5000.fr[1] (2048)
sagittaire-32.lyon.grid5000.fr[1] (2048)

List OAR properties

Properties can be listed using the oarprint -l command:

jdoe@capricorne-34:~$ oarprint -l
List of properties:
besteffort, cpuset, ib10gmodel, memnode, memcore, ethnb, cpuarch, myri2gmodel, cpu, myri10g, memcpu, xpanagran, myri10gmodel, wattmetre, type,
cpufreq, myri2g, ib10g, core, deploy, ip, disktype, nodemodel, cluster, cpucore, network_address, virtual, host, rconsole, cputype, switch,
xpsalome
Note.png Note

Those properties can also be used in oarsub, for instance with the -p switch.
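A sketch using one of the listed properties (the value is illustrative; check Monika for the values actually available on your site):

oarsub -I -l nodes=2 -p "cpuarch='x86_64'"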

X11 forwarding

Some users complained about the lack of X11 forwarding in oarsub or oarsh. It is now enabled.

We use xeyes to test X11 forwarding: 2 big eyes should appear on your screen and follow the movements of your mouse.

Shell 1

Check DISPLAY

jdoe@idpot:~$ echo $DISPLAY
localhost:11.0

Job submission

jdoe@idpot:~$ oarsub -I -l /nodes=2/core=1
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=4926 
Interactive mode : waiting...
[2007-03-07 09:01:16] Starting...

Initialize X11 forwarding...
Connect to OAR job 4926 via the node idpot-8.grenoble.grid5000.fr
jdoe@idpot-8:~$ xeyes &
[1] 14656
jdoe@idpot-8:~$ cat $OAR_NODEFILE
idpot-8.grenoble.grid5000.fr
idpot-9.grenoble.grid5000.fr
[1]+  Done                    xeyes
jdoe@idpot-8:~$ oarsh idpot-9 xeyes
Error: Can't open display: 
jdoe@idpot-8:~$ oarsh -X idpot-9 xeyes

Shell 2

jdoe@idpot:~$ echo $DISPLAY
localhost:13.0
jdoe@idpot:~$ OAR_JOB_ID=4928 oarsh -X idpot-9 xeyes

Using a parallel launcher: taktuk

Warning.png Warning

Taktuk MUST BE installed on all nodes to test this point

Shell 1

Unset DISPLAY so that X11 forwarding does not get in the way...

jdoe@idpot:~$ unset DISPLAY

Job submission

jdoe@idpot:~$ oarsub -I -l /nodes=20/core=1
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=4930 
Interactive mode : waiting...
[2007-03-07 09:15:13] Starting...

Connect to OAR job 4930 via the node idpot-1.grenoble.grid5000.fr

Running the taktuk command

jdoe@idpot-1:~$ taktuk -c "oarsh" -f $OAR_FILE_NODES broadcast exec [ date ]
idcalc-12.grenoble.grid5000.fr-1: date (11567): output > Thu May  3 18:56:58 CEST 2007
idcalc-12.grenoble.grid5000.fr-1: date (11567): status > Exited with status 0
idcalc-4.grenoble.grid5000.fr-8: date (31172): output > Thu May  3 19:00:09 CEST 2007
idcalc-2.grenoble.grid5000.fr-2: date (32368): output > Thu May  3 19:01:56 CEST 2007
idcalc-3.grenoble.grid5000.fr-5: date (31607): output > Thu May  3 18:56:44 CEST 2007
idcalc-3.grenoble.grid5000.fr-5: date (31607): status > Exited with status 0
idcalc-7.grenoble.grid5000.fr-13: date (31188): output > Thu May  3 18:59:54 CEST 2007
idcalc-9.grenoble.grid5000.fr-15: date (32426): output > Thu May  3 18:56:45 CEST 2007
idpot-6.grenoble.grid5000.fr-20: date (16769): output > Thu May  3 18:59:54 CEST 2007
idcalc-4.grenoble.grid5000.fr-8: date (31172): status > Exited with status 0
idcalc-5.grenoble.grid5000.fr-9: date (10288): output > Thu May  3 18:56:39 CEST 2007
idcalc-5.grenoble.grid5000.fr-9: date (10288): status > Exited with status 0
idcalc-6.grenoble.grid5000.fr-11: date (11290): output > Thu May  3 18:57:52 CEST 2007
idcalc-6.grenoble.grid5000.fr-11: date (11290): status > Exited with status 0
idcalc-7.grenoble.grid5000.fr-13: date (31188): status > Exited with status 0
idcalc-8.grenoble.grid5000.fr-14: date (10450): output > Thu May  3 18:57:34 CEST 2007
idcalc-8.grenoble.grid5000.fr-14: date (10450): status > Exited with status 0
idcalc-9.grenoble.grid5000.fr-15: date (32426): status > Exited with status 0
idpot-1.grenoble.grid5000.fr-16: date (18316): output > Thu May  3 18:57:19 CEST 2007
idpot-1.grenoble.grid5000.fr-16: date (18316): status > Exited with status 0
idpot-10.grenoble.grid5000.fr-17: date (31547): output > Thu May  3 18:56:27 CEST 2007
idpot-10.grenoble.grid5000.fr-17: date (31547): status > Exited with status 0
idpot-2.grenoble.grid5000.fr-18: date (407): output > Thu May  3 18:56:21 CEST 2007
idpot-2.grenoble.grid5000.fr-18: date (407): status > Exited with status 0
idpot-4.grenoble.grid5000.fr-19: date (2229): output > Thu May  3 18:55:37 CEST 2007
idpot-4.grenoble.grid5000.fr-19: date (2229): status > Exited with status 0
idpot-6.grenoble.grid5000.fr-20: date (16769): status > Exited with status 0
idcalc-2.grenoble.grid5000.fr-2: date (32368): status > Exited with status 0
idpot-11.grenoble.grid5000.fr-6: date (12319): output > Thu May  3 18:59:54 CEST 2007
idpot-7.grenoble.grid5000.fr-10: date (7355): output > Thu May  3 18:57:39 CEST 2007
idpot-5.grenoble.grid5000.fr-12: date (13093): output > Thu May  3 18:57:23 CEST 2007
idpot-3.grenoble.grid5000.fr-3: date (509): output > Thu May  3 18:59:55 CEST 2007
idpot-3.grenoble.grid5000.fr-3: date (509): status > Exited with status 0
idpot-8.grenoble.grid5000.fr-4: date (13252): output > Thu May  3 18:56:32 CEST 2007
idpot-8.grenoble.grid5000.fr-4: date (13252): status > Exited with status 0
idpot-11.grenoble.grid5000.fr-6: date (12319): status > Exited with status 0
idpot-9.grenoble.grid5000.fr-7: date (17810): output > Thu May  3 18:57:42 CEST 2007
idpot-9.grenoble.grid5000.fr-7: date (17810): status > Exited with status 0
idpot-7.grenoble.grid5000.fr-10: date (7355): status > Exited with status 0
idpot-5.grenoble.grid5000.fr-12: date (13093): status > Exited with status 0

Setting the connector permanently and running taktuk again

jdoe@idpot-1:~$ export TAKTUK_CONNECTOR=oarsh
jdoe@idpot-1:~$ taktuk -m idpot-3 -m idpot-4 broadcast exec [ date ]
idpot-3-1: date (12293): output > Wed Mar  7 09:20:25 CET 2007
idpot-4-2: date (7508): output > Wed Mar  7 09:20:19 CET 2007
idpot-3-1: date (12293): status > Exited with status 0
idpot-4-2: date (7508): status > Exited with status 0

Using MPI with OARSH

To use MPI, you must set up your MPI stack so that it uses OARSH instead of the default RSH or SSH connector. The required steps for the main flavors of MPI are presented below.

MPICH1

The MPICH1 connector can be changed using the P4_RSHCOMMAND environment variable. This variable must be set in your shell configuration files; for instance for bash, within ~/.bashrc:

export P4_RSHCOMMAND=oarsh

Please consider setting P4_GLOBMEMSIZE as well.
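For instance (a hypothetical value; P4_GLOBMEMSIZE is the size in bytes of the shared memory segment used by MPICH1's ch_p4 device, and the right value depends on your application):

export P4_GLOBMEMSIZE=33554432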

You can then run your mpich1 application:

jdoe@idpot-4:~/mpi/mpich$ mpirun.mpich -machinefile $OAR_FILE_NODES -np 6 ./hello
Hello world from process 0 of 6 running on idpot-4.grenoble.grid5000.fr
Hello world from process 4 of 6 running on idpot-6.grenoble.grid5000.fr
Hello world from process 1 of 6 running on idpot-4.grenoble.grid5000.fr
Hello world from process 3 of 6 running on idpot-5.grenoble.grid5000.fr
Hello world from process 2 of 6 running on idpot-5.grenoble.grid5000.fr
Hello world from process 5 of 6 running on idpot-6.grenoble.grid5000.fr

MPICH2

Tested version: 1.0.5p2

MPICH2 uses daemons on the nodes, which may be started with the "mpdboot" command. This command takes oarsh as an argument (--rsh=oarsh) and all goes well:

jdoe@idpot-2:~/mpi/mpich/mpich2-1.0.5p2/bin$ ./mpicc -o hello ../../../hello.c 
jdoe@idpot-2:~/mpi/mpich/mpich2-1.0.5p2/bin$ ./mpdboot --file=$OAR_NODEFILE --rsh=oarsh -n 2
jdoe@idpot-2:~/mpi/mpich/mpich2-1.0.5p2/bin$ ./mpdtrace -l
idpot-2_39441 (129.88.70.2)
idpot-4_36313 (129.88.70.4)
jdoe@idpot-2:~/mpi/mpich/mpich2-1.0.5p2/bin$ ./mpiexec -np 8 ./hello
Hello world from process 0 of 8 running on idpot-2
Hello world from process 1 of 8 running on idpot-4
Hello world from process 3 of 8 running on idpot-4
Hello world from process 2 of 8 running on idpot-2
Hello world from process 5 of 8 running on idpot-4
Hello world from process 4 of 8 running on idpot-2
Hello world from process 6 of 8 running on idpot-2
Hello world from process 7 of 8 running on idpot-4

LAM/MPI

Tested version: 7.1.3

You can export LAMRSH=oarsh before starting lamboot; otherwise, the "lamboot" command accepts the connector as an argument with the -ssi boot_rsh_agent "oarsh" option (this is not in the manual!). Also note that OARSH does not automatically forward the user's environment, so you may need to specify the path to the LAM distribution on the nodes with the -prefix option:

jdoe@idpot-2:~/mpi/lam$ ./bin/lamboot -prefix ~/mpi/lam \
                                         -ssi boot_rsh_agent "oarsh" \
                                         -d $OAR_FILE_NODES
jdoe@idpot-2:~/mpi/lam$ ./bin/mpirun -np 8 hello
Hello world from process 2 of 8 running on idpot-2
Hello world from process 3 of 8 running on idpot-2
Hello world from process 0 of 8 running on idpot-2
Hello world from process 1 of 8 running on idpot-2
Hello world from process 4 of 8 running on idpot-4
Hello world from process 6 of 8 running on idpot-4
Hello world from process 5 of 8 running on idpot-4
Hello world from process 7 of 8 running on idpot-4

OpenMPI

Tested version: 1.1.4

The magic option to use with OpenMPI and OARSH is "-mca pls_rsh_agent "oarsh"". Also note that OpenMPI works with daemons started on the nodes (orted), but "mpirun" starts them on demand. The "-prefix" option can help if OpenMPI is not installed in a standard path on the cluster nodes.

jdoe@idpot-2:~/mpi/openmpi$ ./bin/mpirun -prefix ~/mpi/openmpi \
                                -machinefile $OAR_FILE_NODES \
                                -mca pls_rsh_agent "oarsh" \
                                -np 8 hello
Hello world from process 0 of 8 running on idpot-2
Hello world from process 4 of 8 running on idpot-4
Hello world from process 1 of 8 running on idpot-2
Hello world from process 5 of 8 running on idpot-4
Hello world from process 2 of 8 running on idpot-2
Hello world from process 6 of 8 running on idpot-4
Hello world from process 7 of 8 running on idpot-4
Hello world from process 3 of 8 running on idpot-2

Tests of the CPUSET mechanism

Process isolation

In this test, we run 4 yes commands in a job whose only resource is one core (syntax tested with bash as the user's shell).

jdoe@idpot:~$ oarsub -l core=1 "yes > /dev/null & yes > /dev/null & yes > /dev/null & yes > /dev/null"
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=8683 

Then we connect to the node and check the processes with ps:

jdoe@idpot:~$ oarsub -C 8683
Initialize X11 forwarding...
Connect to OAR job 8683 via the node idpot-9.grenoble.grid5000.fr
jdoe@idpot-9:~$ ps -eo fname,pcpu,psr | grep yes
yes      23.2   1
yes      23.1   1
yes      24.0   1
yes      23.0   1

This shows that the 4 processes are indeed restricted to the core the job was assigned to, as expected.

Don't forget to delete your job:

jdoe@idpot:~$ oardel 8683

Using best effort mode jobs

Best effort job campaign

OAR 2 provides a way to specify that jobs are best effort, which means that the server can delete them if room is needed to fit other jobs. One can submit such a job using the besteffort job type.

For instance you can run a job campaign as follows:

for param in $(< ./paramlist); do
    oarsub -t besteffort -l core=1 "./my_script.sh $param"
done

In this example, the file ./paramlist contains a list of parameters for a parametric application.
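A hypothetical ./paramlist could simply contain one parameter value per line, for instance:

100
200
300
400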

The following demonstrates the mechanism.

Note.png Note

Please have a look at the user charter to avoid abuses.

Best effort job mechanism

Running a besteffort job in a first shell
jdoe@idpot:~$ oarsub -I -l nodes=23 -t besteffort
[ADMISSION RULE] Added automatically besteffort resource constraint
[ADMISSION RULE] Redirect automatically in the besteffort queue
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=9630 
Interactive mode : waiting...
[2007-05-10 11:06:25] Starting...

Initialize X11 forwarding...
Connect to OAR job 9630 via the node idcalc-1.grenoble.grid5000.fr
Running a non-besteffort job on the same set of resources in a second shell
jdoe@idpot:~$ oarsub -I
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=9631 
Interactive mode : waiting...
[2007-05-10 11:06:50] Start prediction: 2007-05-10 11:06:50 (Karma = 0.000)
[2007-05-10 11:06:53] Starting...

Initialize X11 forwarding...
Connect to OAR job 9631 via the node idpot-9.grenoble.grid5000.fr

As expected, the best effort job was meanwhile killed (watch the first shell):

jdoe@idcalc-1:~$ bash: line 1: 23946 Killed                  /bin/bash -l
Connection to idcalc-1.grenoble.grid5000.fr closed.
Disconnected from OAR job 9630
jdoe@idpot:~$

Testing the checkpointing trigger mechanism

Writing the test script

Here is a script featuring an infinite loop and a signal handler triggered by SIGUSR2 (the default signal for OAR's checkpointing mechanism).

#!/bin/bash

handler() { echo "Caught checkpoint signal at: `date`"; echo "Terminating."; exit 0; }
trap handler SIGUSR2

cat <<EOF
Hostname: `hostname`
Pid: $$
Starting job at: `date`
EOF
while : ; do sleep 1; done

Running the job

We run the job on 1 core with a walltime of 1 hour, and ask for the job to be checkpointed if it lasts (and it indeed will) more than walltime - 900 sec = 45 min.

jdoe@idpot:~/oar-2.0/tests/checkpoint$ oarsub -l "core=1,walltime=1:0:0" --checkpoint 900 ./checkpoint.sh 
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=9464 
jdoe@idpot:~/oar-2.0/tests/checkpoint$

Result

Taking a look at the job output:

jdoe@idpot:~/oar-2.0/tests/checkpoint$ cat OAR.9464.stdout 
Hostname: idpot-9
Pid: 26577
Starting job at: Fri May  4 19:41:11 CEST 2007
Caught checkpoint signal at: Fri May  4 20:26:12 CEST 2007
Terminating.

The checkpointing signal was sent to the job 15 minutes before the walltime, as expected, so that the job could terminate gracefully.

Interactive checkpointing

The oardel command provides the capability to interactively send a checkpoint event to a job.

We submit the job again

jdoe@idpot:~/oar-2.0/tests/checkpoint$ oarsub -l "core=1,walltime=1:0:0" --checkpoint 900 ./checkpoint.sh 
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=9521

Then run the oardel -c #jobid command...

jdoe@idpot:~/oar-2.0/tests/checkpoint$ oardel -c 9521
Checkpointing the job 9521 ...DONE.
The job 9521 was notified to checkpoint itself (send SIGUSR2).

And then watch the job's output:

jdoe@idpot:~/oar-2.0/tests/checkpoint$ cat OAR.9521.stdout 
Hostname: idpot-9
Pid: 1242
Starting job at: Mon May  7 16:39:04 CEST 2007
Caught checkpoint signal at: Mon May  7 16:39:24 CEST 2007
Terminating.

The job terminated as expected.

Testing the mechanism of dependency on a previous job's termination

First Job

We run a first interactive job in a first Shell

jdoe@idpot:~$ oarsub -I 
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=9458 
Interactive mode : waiting...
[2007-05-04 17:59:38] Starting...

Initialize X11 forwarding...
Connect to OAR job 9458 via the node idpot-9.grenoble.grid5000.fr
jdoe@idpot-9:~$

And leave that job running.

Second Job

Then we run a second job in another Shell, with a dependency on the first one:

jdoe@idpot:~$ oarsub -I -a 9458
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=9459 
Interactive mode : waiting...
[2007-05-04 17:59:55] Start prediction: 2007-05-04 19:59:39 (Karma = 4.469)

This second job will wait for the first job's walltime to be reached (or for its earlier termination) before starting.

Job dependency in action

We do a logout on the first interactive job...

jdoe@idpot-9:~$ logout
Connection to idpot-9.grenoble.grid5000.fr closed.
Disconnected from OAR job 9458
jdoe@idpot:~$ 

... then watch the second Shell and see the second job starting

[2007-05-04 18:05:05] Starting...

Initialize X11 forwarding...
Connect to OAR job 9459 via the node idpot-7.grenoble.grid5000.fr

... as expected.

Grid jobs with OAR 2

In order to run grid jobs, i.e. programs communicating across several sites of Grid'5000, one needs to be able to connect from the nodes of one site to the nodes of the other sites.

Each site features one unique OAR 2 server managing all the resources of the site (possibly several clusters). From site to site, however, nodes are not managed by the same OAR 2 server, which implies that one must enable a connection mechanism between the nodes allocated to one's jobs on every site involved in one's grid experiment.

That connection mechanism is OAR 2's job key feature.

In the examples below, we use idpot and idcalc as if they were 2 sites, even though they are actually managed by only one OAR 2 server and share the same NFS storage, as this does not invalidate the use case.

Grid jobs without using OARGrid

Reservation

Grid jobs mean using several clusters of Grid'5000 at the same time. To achieve that, one can make advance reservations, i.e. submit jobs to be started at the same given time on several clusters.

Idpot

In a first Shell, we create a first job on idpot. The request is for 2 nodes, 1 core per node, using the option to export the job key to the file my_key:

jdoe@idpot:~$ mkdir idpot; cd idpot
jdoe@idpot:~/idpot$ oarsub -p "cluster='idpot'" -l "nodes=2/core=1" -e my_key -r '2007-07-12 17:12:00'
Generate a job key...
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=11342
Export job key to file: my_key
Reservation mode : waiting validation...
Reservation valid --> OK
jdoe@idpot:~/idpot$ cd ..
jdoe@idpot:~$

Idcalc

If both clusters were on different sites, we would have to copy the job key file from the OAR submission frontend of the first site to the OAR submission frontend of the second site, and log in on that second frontend (both using SSH for instance). Below, we only use a different directory for the second reservation and copy the job key there.

Now we create a second job on the second cluster, idcalc. The request is for 3 nodes, 2 cores per node.

jdoe@idpot:~$ mkdir idcalc; cd idcalc
jdoe@idpot:~/idcalc$ scp ../idpot/my_key .
jdoe@idpot:~/idcalc$ oarsub -p "cluster='idcalc'" -l nodes=3/core=2  -i my_key -r '2007-07-12 17:17:00'
Import job key from file: my_key
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=11346
Reservation mode : waiting validation...
Reservation valid --> OK
jdoe@idpot:~/idcalc$

Building the node list

Once the jobs' start time is reached and the jobs are running...

jdoe@idpot:~$ oarstat 
Job id     Name           User           Time Use            S Queue
---------- -------------- -------------- ------------------- - ----------
11342                     jdoe        2007-07-12 17:11:08 R default   
11346                     jdoe        2007-07-12 17:16:29 R default  

...we connect to our jobs (oarsub -C) and build the node lists:

jdoe@idpot:~$ oarsub -C 11342
Initialize X11 forwarding...
Connect to OAR job 11342 via the node idpot-1.grenoble.grid5000.fr
jdoe@idpot-1:~/idpot$ sort -u $OAR_NODEFILE > my_nodes
jdoe@idpot-1:~/idpot$ logout
Connection to idpot-1.grenoble.grid5000.fr closed.
Disconnected from OAR job 11342
jdoe@idpot:~$
jdoe@idpot:~$ oarsub -C 11346
Initialize X11 forwarding...
Connect to OAR job 11346 via the node idcalc-1.grenoble.grid5000.fr
jdoe@idcalc-1:~/idcalc$ sort -u $OAR_NODEFILE > my_nodes
jdoe@idcalc-1:~/idcalc$ logout
Connection to idcalc-1.grenoble.grid5000.fr closed.
Disconnected from OAR job 11346
jdoe@idpot:~$

Testing connectivity

Now we can run taktuk using the nodes of both jobs on both clusters (idpot + idcalc).

jdoe@idpot:~$ unset DISPLAY
jdoe@idpot:~$ export OAR_JOB_KEY_FILE=~/idcalc/my_key
jdoe@idpot:~$ export TAKTUK_CONNECTOR=oarsh
jdoe@idpot:~$ cat idpot/my_nodes idcalc/my_nodes > my_grid_nodes
jdoe@idpot:~$ taktuk -f my_grid_nodes broadcast exec [ date ]
idcalc-1.grenoble.grid5000.fr-3: date (21410): output > Thu Jul 12 17:13:07 CEST 2007
idcalc-3.grenoble.grid5000.fr-5: date (8629): output > Thu Jul 12 17:12:52 CEST 2007
idpot-1.grenoble.grid5000.fr-1: date (18693): output > Thu Jul 12 17:13:29 CEST 2007
idpot-2.grenoble.grid5000.fr-2: date (15286): output > Thu Jul 12 17:12:30 CEST 2007
idcalc-2.grenoble.grid5000.fr-4: date (6156): output > Thu Jul 12 17:22:42 CEST 2007
idpot-1.grenoble.grid5000.fr-1: date (18693): status > Exited with status 0
idpot-2.grenoble.grid5000.fr-2: date (15286): status > Exited with status 0
idcalc-1.grenoble.grid5000.fr-3: date (21410): status > Exited with status 0
idcalc-3.grenoble.grid5000.fr-5: date (8629): status > Exited with status 0
idcalc-2.grenoble.grid5000.fr-4: date (6156): status > Exited with status 0

We can now delete our jobs:

jdoe@idpot:~$ oardel 11342 11346
Deleting the job = 11342 ...REGISTERED.
Deleting the job = 11346 ...REGISTERED.
The job(s) [ 11342 11346 ] will be deleted in a near future.

Using OARGrid2

OARGRID1 scripts should still work with OARGRID2, but the latter has 3 main new features that let you use the new capabilities of OAR2:

  • Oargrid generates an ssh key usable by oarsh, and transmits it to oar
  • A new keyword "rdef" may be used in the syntax to define the resource hierarchy (see example below)
  • A new concept of "cluster aliases", which defines a set of resources sharing the same properties as a cluster

There are two kinds of cluster aliases: the "sites" and the "clusters". Clusters are within a site. For example:

site1
  cluster1 (cluster='1')
  cluster2 (cluster='2')
site2
  clusterA (cpuarch='ia64')
  clusterB (cpuarch='x86_64')

If you ask for a site, you'll obtain resources of the site, on nodes that may be very different. If you ask for a cluster, you'll have resources from a set of nodes which have the same properties (shown in parentheses in our example).

You can get the list of the cluster aliases that are defined by calling oargridsub without any option:

jdoe@idpot:~$ oargridsub
Usage oargridsub -s date [-q queue_name][-p program_to_run][-w walltime][-d directory][-v][-V][-f file] DESC
   -s give the date of the begining of the reservation, ex: "2005-01-07 11:00:00" (default is NOW)
   -q give the queue
   -p give the program to run
   -d give the directory where the program will be launched (default is "~")
   -w walltime of the reservation (default is "1:00:00")
   -F continue even if there is a rejected reservation
   -v turn on verbose mode
   -V print oargrid version and exit
   -f specifie a file where to read reservation description (instead of command line)
Where DESC is in the form :
   clusterAlias1:rdef="/nodes=5/core=4":prop="switch = 'sw4'",clusterAlias2:rdef="/cpu=16/core=1"...

 Available cluster aliases are:
       grenoble-obs --> oar.icare.grenoble.grid5000.fr
          icare
       grenoble --> localhost
          idpot (cluster='idpot')
          idcalc (cluster='idcalc')

EXAMPLE:

To make a grid submission that asks for 1 entire cpu on 2 different nodes of idcalc, 4 cores on idpot having a 2.4GHz cpu frequency, and 1 entire node on icare:

oargridsub idcalc:rdef="/nodes=2/cpu=1",idpot:rdef="/core=4":prop="cpufreq='2.4'",icare:rdef="/nodes=1"

If you want to connect to your nodes, you can use the ssh key that has been generated for you:

jdoe@idpot:~$ oargridsub -k idcalc:rdef="/nodes=2/cpu=1",idpot:rdef="/core=4"
idcalc:rdef=/nodes=2/cpu=1,idpot:rdef=/core=4
[OAR_GRIDSUB] [idpot] Reservation success on idpot : batchId = 8696, nbNodes = 1, cpu = 0, properties = "", queue = default
[OAR_GRIDSUB] [idcalc] Reservation success on idcalc : batchId = 8697, nbNodes = 2, cpu = 0, properties = "", queue = default
[OAR_GRIDSUB] Grid reservation id = 189
[OAR_GRIDSUB] SSH KEY : /tmp/oargrid//oargrid_ssh_key_jdoe_189
       You can use this key to connect directly to your OAR nodes with the oar user.

You get the list of nodes with oargridstat:

jdoe@idpot:~$ oargridstat -l 189
idpot-1.grenoble.grid5000.fr
idpot-1.grenoble.grid5000.fr
idpot-2.grenoble.grid5000.fr
idpot-2.grenoble.grid5000.fr
idcalc-1.grenoble.grid5000.fr
idcalc-1.grenoble.grid5000.fr
idcalc-2.grenoble.grid5000.fr
idcalc-2.grenoble.grid5000.fr

And then, to connect to one of your nodes:

jdoe@idpot:~$ export OAR_JOB_KEY_FILE="/tmp/oargrid//oargrid_ssh_key_jdoe_189"
jdoe@idpot:~$ oarsh idpot-8.grenoble.grid5000.fr
Last login: Thu Apr 26 16:39:48 2007 from idpot.imag.fr

oargridstat now also reflects the new OAR2 concepts: it can no longer list a number of nodes and cpus, so it gives the resource hierarchy description for each cluster. It also recalls the command line options used to make the reservation:

jdoe@idpot:~$ oargridstat
Reservation n° 250:
       submission date : 2007-05-25 09:26:06
       start date : 2007-05-25 09:26:02
       walltime : 1:00:00
       program :
       directory : ~jdoe
       user : jdoe
       cmd : -k idcalc:rdef=/nodes=2/cpu=1,idpot:rdef=/core=4
       clusters with job id:
               idpot --> 9956 (name = "", resources = "/core=4", properties = "", queue = default, environment = "", partition = "")
               idcalc --> 9957 (name = "", resources = "/nodes=2/cpu=1", properties = "", queue = default, environment = "", partition = "")

oargriddel has also been improved: it updates the oargridstat output so that a grid job whose jobs have all been deleted no longer appears in the list, and it knows about already terminated jobs:

jdoe@idpot:~$ oargriddel 250
[OAR_GRIDDEL] I am deleting the job 9956 on the cluster idpot ... DONE
[OAR_GRIDDEL] I am deleting the job 9957 on the cluster idcalc ... DONE
jdoe@idpot:~$ oargriddel 250
[OAR_GRIDDEL] I am deleting the job 9956 on the cluster idpot ... ALREADY TERMINATED
[OAR_GRIDDEL] I am deleting the job 9957 on the cluster idcalc ... ALREADY TERMINATED
jdoe@idpot:~$ oargridstat
jdoe@idpot:~$

Finally, if your reservation fails, you can get more verbose messages about the problem by using the oargridsub "-v" option:

jdoe@idpot:~$ oargridsub -v idcalc:rdef="/nodes=2/cpu=1",idpot:rdef="/nodes=1/core=5"
idcalc:rdef=/nodes=2/cpu=1,idpot:rdef=/nodes=1/core=5
DESC string : idcalc:rdef=/nodes=2/cpu=1,idpot:rdef=/nodes=1/core=5
Scanning rdef=/nodes=2/cpu=1...
Scanning rdef=/nodes=1/core=5...
[OAR_GRIDSUB] Launch command : ssh localhost "/bin/sh -c \"cd ~jdoe && sudo -u jdoe oarsub -q default -r \\\"2007-05-25 09:37:11\\\" -l /nodes=1/core=5,walltime=1:00:00 --force-cpuset-name oargrid_311 -p \\\"(cluster='idpot')\\\" \""
$VAR1 = 'stderr';
$VAR2 = 'There are not enough resources for your request
Oarsub failed: please verify your request syntax or ask for support to your admin.
';
$VAR3 = 'status';
$VAR4 = '2048';
$VAR5 = 'stdout';
$VAR6 = '1180078631
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=-5
';
[OAR_GRIDSUB] [idpot] I am not able to parse correctly the output, contact administrator (Check if the launching directory is readable and browsable by oargrid user. This is a known limitation but we cannot do in an other way.).
[OAR_GRIDSUB] [DEBUG] [2048] 1180078631
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=-5
-- There are not enough resources for your request
Oarsub failed: please verify your request syntax or ask for support to your admin.
[OAR_GRIDSUB] I delete jobs already submitted