Note: This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.
This tutorial consists of independent sections describing various details of OAR useful for advanced usage, as well as some tips and tricks. It assumes you are familiar with OAR and Grid5000 basics. This OAR tutorial focuses on command line usage. It assumes you are using the bash shell (but should be easy to adapt to another shell). It can be read linearly, but you may also pick sections at random. Begin at least with #useful tips.
OAR
useful tips
- Take the time to carefully configure ssh, as described in [[1]].
- Use screen so that your work is not lost if you lose the connection to Grid5000. Moreover, having a screen session opened with one or more shell sessions allows you to leave your work session whenever you want, then get back to it later and recover it exactly as you left it.
- Most OAR commands (oarsub, oarstat, oarnodes) can provide output in various formats:
- textual (this is the default mode)
- PERL dumper (-D)
- xml (-X)
- yaml (-Y)
- json (-J)
- Direct access to the OAR database: users can directly access the mysql OAR database oar2 on the server mysql.<site>.grid5000.fr with the read-only account oarreader. The password is read.
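For instance, a read-only query could look like this (a sketch; the jobs table and its columns are an assumption and may differ between OAR versions):
# hypothetical example for the Rennes site; the password is "read" as stated above
frontend$ mysql -u oarreader -pread -h mysql.rennes.grid5000.fr oar2 -e "SELECT job_id, state FROM jobs LIMIT 5;"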
Connection to a job
Being connected to a job means that your environment is set up (OAR_JOB_ID and OAR_JOB_KEY_FILE) so that OAR commands can work. You are automatically connected to a job if you have submitted it in interactive mode. Otherwise, you must manually connect to it:
$ JOBID=$(oarsub 'sleep 300' | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
$ oarsub -C $JOBID
$ pkill -f 'sleep 300'
Connection to the job's nodes
You will normally use the oarsh wrapper to connect to the nodes instead of ssh, and oarcp instead of scp to copy files to/from the nodes. If you use taktuk (or a similar tool like pdsh), you have to configure it so that it uses oarsh instead of ssh.
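For example, with taktuk you can either pass the connector on the command line or set it once in your environment (both forms also appear in the taktuk section later on this page):
# one-shot: use oarsh as the connector, reading the node list from stdin
$ uniq $OAR_NODEFILE | taktuk -c oarsh -f - broadcast exec [ hostname ]
# or set the connector once for the whole session
$ export TAKTUK_CONNECTOR=oarsh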
oarsh and job keys
By default, OAR generates an ssh key pair for each job, and oarsh is used to connect the job's nodes.
oarsh looks at the OAR_JOB_ID or OAR_JOB_KEY_FILE environment variables to know which key to use. Hence oarsh works directly if you are connected to the job. You can also connect to the nodes without being connected to the job:
$ oarsub -I
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=<JOBID>
...
then, in another terminal:
$ OAR_JOB_ID=<JOBID> oarsh <NODE_NAME>
If needed, OAR also allows exporting the job key of a job.
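A minimal sketch (the option name should be checked against oarsub --help for your OAR version; the key file path is an arbitrary example):
$ oarsub -I --export-job-key-to-file ~/my_job_key -l nodes=1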
sharing keys between jobs
Telling OAR to always use the same key can be very convenient. If you have a passphrase-less ssh key dedicated to navigating inside Grid5000, then in your ~/.profile or ~/.bash_profile you can set:
export OAR_JOB_KEY_FILE=<path_to_your_private_key>
Then, OAR will always use this key for all submitted jobs, which allows you to connect to your nodes with oarsh without being connected to the job.
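A minimal sketch of the full setup (key name and path are arbitrary examples):
# generate a passphrase-less key dedicated to internal Grid5000 use
frontend$ ssh-keygen -t rsa -N "" -f ~/.ssh/g5k_internal
frontend$ cat ~/.ssh/g5k_internal.pub >> ~/.ssh/authorized_keys
# make OAR use it for all submitted jobs
frontend$ echo 'export OAR_JOB_KEY_FILE=~/.ssh/g5k_internal' >> ~/.bash_profile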
Moreover, if this key is replicated between all Grid5000 sites, and if the environment variable OAR_JOB_KEY_FILE
is exported in ~/.profile
or ~/.bash_profile
on all sites, you will be able to connect directly from any frontend to any reserved node of any site.
If using the same key for all jobs, be warned that this will raise issues if you submit two or more jobs that share the same subset of nodes with different cpusets: in this case, processes cannot be guaranteed to run in the correct cpuset.
allow_classic_ssh
Submitting with option -t allow_classic_ssh allows you to use ssh directly instead of oarsh to connect to the nodes, at the cost of not being able to select resources at a finer level than the node (cpu, core).
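For example (the node count is arbitrary):
$ oarsub -I -t allow_classic_ssh -l nodes=2
# once the job has started, plain ssh works on the reserved nodes
$ ssh <node_name>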
oarsh details
oarsh is a frontend to ssh. It opens an ssh connection as user oar
to the dedicated oar ssh server running on the node, listening on port 6667. It detects who you are based on your key, and if you have the right to use the node (if you have reserved it) it will su
to your user on the node.
So, if you don't have oarsh installed, you can still connect to the nodes by simulating it. One use case is if you have reserved nodes and want to connect to them through an ssh proxy as described in SSH#Using_SSH_with_ssh_proxycommand_setup_to_access_hosts_inside_Grid.275000:
If you have a passphrase-less ssh key internal to Grid5000, that you use to navigate inside Grid5000, you can tell oar to use this key instead of generating a job-key (see #sharing keys between jobs), then you can copy this key to your workstation outside of Grid5000:
user-laptop$ scp g5k:.ssh/<internal_key_name> g5k:.ssh/<internal_key_name>.pub ~/
In Grid5000, submit a job using this key:
$ oarsub -i ~/.ssh/<internal_key_name> -I
Wait for the job to start. Then in another terminal, from outside Grid5000, try connecting to the node:
user-laptop$ ssh -i ~/<internal_key_name> -p 6667 oar@<node name>.g5k
passive and interactive modes
In interactive mode: a shell is opened on the first node of the reservation (or on the frontend, with the appropriate environment set, if the job is of type deploy). In interactive mode, the job will be killed as soon as this job's shell is closed, and is limited by the job's walltime. It can also be killed by an explicit oardel.
You can experiment with 3 shells. In the first shell, run the following regularly to see the list of your own running jobs:
$ oarstat -u
In the second shell, run an interactive job:
$ oarsub -I
Wait for the job to start, run oarstat, then leave the job, run oarstat again. Submit another interactive job, and on the third shell, kill it:
$ oardel <JOBID>
In passive mode: an executable is run by OAR on the first node of the reservation (or on the frontend, with the appropriate environment set, if the job is of type deploy). In passive mode, the job's length is limited by its walltime. It can also be killed by an explicit oardel.
JOBID=$(oarsub 'uname -a' | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
cat OAR.$JOBID.stdout
You may not want a job to be interactive or to run a script when the job starts, for example because you will use the reserved resources from a program whose lifecycle is longer than the job's (and which will use the resources by connecting to the job). One trick to achieve this is to run the job in passive mode with a long sleep command. One drawback of this method is that the job may terminate with an error status if the sleep is killed. This can be a problem in some situations, e.g. when using job dependencies.
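A minimal sketch of this trick (resource amount, walltime and sleep duration are arbitrary):
$ oarsub -l nodes=1,walltime=2:00:00 "sleep 7200"
# later, from another shell or program, use the resources by connecting to the job
$ OAR_JOB_ID=<JOBID> oarsh <node_name>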
Submission and Reservation
- If you don't specify the job's start date (oar option -r), then your job is a submission and oar will choose the best schedule.
- If you specify the job's start date, this is a reservation: oar cannot decide the best schedule anymore, it is fixed.
There are some consequences:
- Current Grid5000 usage policy allows no more than 2 reservations per site (excluding reservations that start in less than one hour)
- in submission mode you're almost guaranteed to get your wanted resources, because oar can decide what resources to allocate at the last moment. You cannot get the list of resources until the job starts.
- in reservation mode, you're not guaranteed to get the wanted resources, because oar has to plan the allocation of resources at reservation time. If resources later become unavailable, you lose them for your job. You can get the list of resources as soon as the reservation starts.
- in submission mode, you cannot know the date at which your job will start until it starts. But OAR can give you an estimation of that date.
- to coordinate oar submissions on several sites, OARGRID must do OAR reservations.
example: a reservation in one week:
$ oarsub -r "$(date '+%Y-%m-%d %H:%M:%S' --date='+1 week')"
For reservations, there is no interactive mode. You can give oar a command to execute, or nothing. If you give it no command, you'll have to connect to the job once the reservation starts.
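For instance, a reservation that runs a script when it starts (date, resources and script name are arbitrary examples):
$ oarsub -r "2018-01-30 19:00:00" -l nodes=4,walltime=1:00:00 ./my_script.sh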
Getting information about a job
The oarstat command displays information about jobs. By default it lists the current jobs of all users. You can restrict it to your own jobs or to someone else's jobs with option -u:
$ oarstat -u
You can get full details of a job:
$ oarstat -fj <JOBID>
If scripting OAR and regularly polling job states with oarstat, you can cause a high load on the OAR server (because the default oarstat invocation issues costly SQL requests in the OAR database). In this case, you should use option -s, which is optimized and only queries the current state of a given job:
$ oarstat -s -j <JOBID>
Complex resources selection
The complete selector format syntax (oarsub -l option) is:
"-l {sql1}/name1=n1/name2=n2+{sql2}/name3=n3/name4=n4/name5=n5+...,walltime=hh:mm:ss"
where
- sqlN are optional SQL predicates on the resource properties (e.g. mem, ib_rate, gpu_count, ...)
- nameN=n gives the wanted number n of resources of type nameN (e.g. host, cpu, core, disk...).
- slashes (/) between resources express resource subtree selection
- + allows aggregating different resource specifications
- walltime=hh:mm:ss (separated by a comma) sets the job walltime (expected duration), which defaults to 1 hour
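As an illustration of this syntax (cluster names and amounts are arbitrary examples; real combinations are detailed in the following subsections):
$ oarsub -I -l "{cluster='graphene'}/nodes=2/core=4+{cluster='griffon'}/nodes=1,walltime=1:30:00"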
- List resource properties
You can get the list of resource properties usable in SQL predicates by running the oarprint -l command:
$ oarprint -l
List of properties: disktype, gpu_count, ...
You can get the property values set on resources using the oarnodes command:
$ oarnodes -Y --sql="host = 'sagittaire-1.lyon.grid5000.fr'"
These OAR properties are described in the OAR2 properties page.
Using the resource hierarchy
- ask for 1 core on 15 nodes on a same cluster (total = 15 cores)
$ oarsub -I -l /cluster=1/nodes=15/core=1
- ask for 1 core on 15 nodes on 2 clusters (total = 30 cores)
$ oarsub -I -l /cluster=2/nodes=15/core=1
- ask for 1 core on 2 cpus on 15 nodes on a same cluster (total = 30 cores)
$ oarsub -I -l /cluster=1/nodes=15/cpu=2/core=1
- ask for 10 cpus on 2 clusters (total = 20 cpus, the number of nodes and cores depends on the topology of the machines)
$ oarsub -I -l /cluster=2/cpu=10
- ask for 1 core on 3 different network switches (total = 3 cores)
$ oarsub -I -l /switch=3/core=1
Selecting nodes from a specific cluster
For example in Nancy:
$ oarsub -I -l {"cluster='graphene'"}/nodes=2
Or, alternative syntax:
$ oarsub -I -p "cluster='graphene'" -l /nodes=2
Selecting specific nodes
For example in Lyon:
$ oarsub -I -l {"network_address in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr', 'sagittaire-12.lyon.grid5000.fr')"}/nodes=1
or, alternative syntax:
$ oarsub -I -p "network_address in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr', 'sagittaire-12.lyon.grid5000.fr')" -l /nodes=1
By negating the SQL clause, you can also exclude some nodes.
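For example, to exclude specific nodes (node names are arbitrary):
$ oarsub -I -p "network_address not in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr')" -l /nodes=1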
Other examples using properties
- ask for 10 cores of the cluster graphene
$ oarsub -I -l core=10 -p "cluster='graphene'"
- ask for 2 nodes with 16384 MB of memory and Infiniband 20G
$ oarsub -I -p "memnode='16384' and ib_rate='20'" -l nodes=2
- ask for any 4 nodes except graphene-12
$ oarsub -I -p "not host like 'graphene-12.%'" -l nodes=4
Two nodes with virtualization capability, on different clusters + IP subnets
We want 2 nodes and 4 /22 subnets with the following constraints:
- Nodes are on 2 different clusters of the same site (Hint: use a site with several clusters :-D)
- Nodes have virtualization capability enabled
- /22 subnets are on two different /19 subnets
- 2 subnets belonging to the same /19 subnet are consecutive
$ oarsub -I -l /slash_19=2/slash_22=2+{"virtual!='none'"}/cluster=2/nodes=1
Let's verify the reservation:
$ uniq $OAR_NODE_FILE
graphene-43.nancy.grid5000.fr
graphite-3.nancy.grid5000.fr
$ g5k-subnets -p
10.144.32.0/22
10.144.36.0/22
10.144.0.0/22
10.144.4.0/22
$ g5k-subnets -ps
10.144.0.0/21
10.144.32.0/21
1 core on 2 nodes on the same cluster with 16384 MB of memory and Infiniband 20G + 1 cpu on 2 nodes on the same switch with 8 cores processors for a walltime of 4 hours
$ oarsub -I -l "{memnode=16384 and ib_rate='20'}/cluster=1/nodes=2/core=1+{cpucore=8}/switch=1/nodes=2/cpu=1,walltime=4:0:0"
Warning:
- walltime must always be the last argument of -l <...>
- if no resource matches your request, oarsub will exit with the message:
Generate a job key...
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
There are not enough resources for your request
OAR_JOB_ID=-5
Oarsub failed: please verify your request syntax or ask for support to your admin.
Retrieving the resources allocated to my job
You can use the oarprint command, which nicely prints the resources of a job.
Retrieving resources from within the job
We first submit a job
$ oarsub -I -l nodes=4
...
OAR_JOB_ID=178361
..
Connect to OAR job 178361 via the node capricorne-34.lyon.grid5000.fr
..
Retrieve the host list
We want the list of the nodes we got, identified by unique hostnames
$ oarprint host
sagittaire-32.lyon.grid5000.fr
capricorne-34.lyon.grid5000.fr
sagittaire-63.lyon.grid5000.fr
sagittaire-28.lyon.grid5000.fr
(We get 1 line per host, not per core!)
Retrieve the core list
$ oarprint core
63
241
64
163
243
244
164
242
Obviously, retrieving OAR internal core IDs might not help much. Hence the use of a customized output format:
Retrieve core list with host and cpuset Id as identifier
We want to identify our cores by their associated host names and cpuset Ids:
$ oarprint core -P host,cpuset
capricorne-34.lyon.grid5000.fr 0
sagittaire-32.lyon.grid5000.fr 0
capricorne-34.lyon.grid5000.fr 1
sagittaire-28.lyon.grid5000.fr 0
sagittaire-63.lyon.grid5000.fr 0
sagittaire-63.lyon.grid5000.fr 1
sagittaire-28.lyon.grid5000.fr 1
sagittaire-32.lyon.grid5000.fr 1
A more complex example with a customized output format
We want to identify our cores by their associated host name and cpuset Id, and get the memory information as well, with a customized output format
$ oarprint core -P host,cpuset,memnode -F "NODE=%[%] MEM=%"
NODE=capricorne-34.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-32.lyon.grid5000.fr[0] MEM=2048
NODE=capricorne-34.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-28.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-63.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-63.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-28.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-32.lyon.grid5000.fr[1] MEM=2048
Retrieving resources from the submission frontend
If you are not within a job ($OAR_RESOURCE_PROPERTIES_FILE is not defined), running oarprint will give:
$ oarprint
/usr/bin/oarprint: no input data available
In that case, you can however pipe the output of the oarstat command into oarprint, e.g.:
$ oarstat -j <JOB_ID> -p | oarprint core -P host,cpuset,memnode -F "%[%] (%)" -f -
capricorne-34.lyon.grid5000.fr[0] (2048)
sagittaire-32.lyon.grid5000.fr[0] (2048)
capricorne-34.lyon.grid5000.fr[1] (2048)
sagittaire-28.lyon.grid5000.fr[0] (2048)
sagittaire-63.lyon.grid5000.fr[0] (2048)
sagittaire-63.lyon.grid5000.fr[1] (2048)
sagittaire-28.lyon.grid5000.fr[1] (2048)
sagittaire-32.lyon.grid5000.fr[1] (2048)
List OAR properties
Properties can be listed using the oarprint -l command:
$ oarprint -l
List of properties: disktype, gpu_count, ...
X11 forwarding
X11 forwarding can now be enabled with oarsh. As with ssh, you need to pass the -X option to oarsh.
We will use xterm to test X.
Shell 1
Connect to a frontend with ssh with option -X:
Check DISPLAY
$ echo $DISPLAY
localhost:11.0
Job submission
$ oarsub -I -l /nodes=2/core=1
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=4926
Interactive mode : waiting...
[2007-03-07 09:01:16] Starting...
Initialize X11 forwarding...
Connect to OAR job 4926 via the node idpot-8.grenoble.grid5000.fr
jdoe@idpot-8:~$ xterm &
[1] 14656
jdoe@idpot-8:~$ cat $OAR_NODEFILE
idpot-8.grenoble.grid5000.fr
idpot-9.grenoble.grid5000.fr
[1]+ Done xterm
jdoe@idpot-8:~$ oarsh idpot-9 xterm
Error: Can't open display:
jdoe@idpot-8:~$ oarsh -X idpot-9 xterm
Shell 2
Also connected to the frontend with ssh -X:
$ echo $DISPLAY
localhost:13.0
$ OAR_JOB_ID=4928 oarsh -X idpot-9 xterm
Using a parallel launcher: taktuk
Warning: Taktuk MUST BE installed on all nodes to test this point. This is the case on production environments and provided default images, except the min and base images.
Shell 1
Unset DISPLAY so that X does not bother...
frennes:~$ unset DISPLAY
Job submission
frennes:~$ oarsub -I -l /nodes=20/core=1
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988498
Interactive mode : waiting...
Starting...
Connect to OAR job 988498 via the node paravance-37.rennes.grid5000.fr
Running the taktuk command
paravance-37:~$ uniq $OAR_FILE_NODES | taktuk -c "oarsh" -f - broadcast exec [ hostname ]
paravance-54.rennes.grid5000.fr-3: hostname (6730): output > paravance-54.rennes.grid5000.fr
paravance-59.rennes.grid5000.fr-7: hostname (6757): output > paravance-59.rennes.grid5000.fr
paravance-59.rennes.grid5000.fr-7: hostname (6757): status > Exited with status 0
paravance-47.rennes.grid5000.fr-16: hostname (6768): output > paravance-47.rennes.grid5000.fr
paravance-49.rennes.grid5000.fr-17: hostname (6778): output > paravance-49.rennes.grid5000.fr
paravance-45.rennes.grid5000.fr-14: hostname (6802): output > paravance-45.rennes.grid5000.fr
paravance-47.rennes.grid5000.fr-16: hostname (6768): status > Exited with status 0
paravance-41.rennes.grid5000.fr-12: hostname (6704): output > paravance-41.rennes.grid5000.fr
paravance-49.rennes.grid5000.fr-17: hostname (6778): status > Exited with status 0
paravance-41.rennes.grid5000.fr-12: hostname (6704): status > Exited with status 0
paravance-45.rennes.grid5000.fr-14: hostname (6802): status > Exited with status 0
paravance-52.rennes.grid5000.fr-19: hostname (6787): output > paravance-52.rennes.grid5000.fr
paravance-37.rennes.grid5000.fr-1: hostname (7373): output > paravance-37.rennes.grid5000.fr
paravance-52.rennes.grid5000.fr-19: hostname (6787): status > Exited with status 0
paravance-53.rennes.grid5000.fr-2: hostname (6778): output > paravance-53.rennes.grid5000.fr
paravance-53.rennes.grid5000.fr-2: hostname (6778): status > Exited with status 0
paravance-54.rennes.grid5000.fr-3: hostname (6730): status > Exited with status 0
paravance-56.rennes.grid5000.fr-5: hostname (6761): output > paravance-56.rennes.grid5000.fr
paravance-38.rennes.grid5000.fr-10: hostname (6831): output > paravance-38.rennes.grid5000.fr
paravance-38.rennes.grid5000.fr-10: hostname (6831): status > Exited with status 0
paravance-40.rennes.grid5000.fr-11: hostname (6784): output > paravance-40.rennes.grid5000.fr
paravance-40.rennes.grid5000.fr-11: hostname (6784): status > Exited with status 0
paravance-42.rennes.grid5000.fr-13: hostname (6762): output > paravance-42.rennes.grid5000.fr
paravance-42.rennes.grid5000.fr-13: hostname (6762): status > Exited with status 0
paravance-46.rennes.grid5000.fr-15: hostname (6774): output > paravance-46.rennes.grid5000.fr
paravance-46.rennes.grid5000.fr-15: hostname (6774): status > Exited with status 0
paravance-50.rennes.grid5000.fr-18: hostname (6765): output > paravance-50.rennes.grid5000.fr
paravance-50.rennes.grid5000.fr-18: hostname (6765): status > Exited with status 0
paravance-62.rennes.grid5000.fr-20: hostname (6781): output > paravance-62.rennes.grid5000.fr
paravance-62.rennes.grid5000.fr-20: hostname (6781): status > Exited with status 0
paravance-56.rennes.grid5000.fr-5: hostname (6761): status > Exited with status 0
paravance-37.rennes.grid5000.fr-1: hostname (7373): status > Exited with status 0
paravance-57.rennes.grid5000.fr-6: hostname (6716): output > paravance-57.rennes.grid5000.fr
paravance-57.rennes.grid5000.fr-6: hostname (6716): status > Exited with status 0
paravance-55.rennes.grid5000.fr-4: hostname (6721): output > paravance-55.rennes.grid5000.fr
paravance-55.rennes.grid5000.fr-4: hostname (6721): status > Exited with status 0
paravance-60.rennes.grid5000.fr-8: hostname (6754): output > paravance-60.rennes.grid5000.fr
paravance-60.rennes.grid5000.fr-8: hostname (6754): status > Exited with status 0
paravance-61.rennes.grid5000.fr-9: hostname (5826): output > paravance-61.rennes.grid5000.fr
paravance-61.rennes.grid5000.fr-9: hostname (5826): status > Exited with status 0
Setting the connector definitively and running taktuk again
paravance-37:~$ export TAKTUK_CONNECTOR=oarsh
paravance-37:~$ taktuk -m paravance-60 -m paravance-61 broadcast exec [ date ]
paravance-61-2: date (5875): output > Mon Jan 15 11:32:36 CET 2018
paravance-60-1: date (6805): output > Mon Jan 15 11:32:36 CET 2018
paravance-61-2: date (5875): status > Exited with status 0
paravance-60-1: date (6805): status > Exited with status 0
Using best effort mode jobs
Best effort job campaign
OAR 2 provides a way to specify that jobs are best effort, which means that the server can delete them if room is needed to fit other jobs. One can submit such jobs using the besteffort type of job.
For instance you can run a job campaign as follows:
for param in $(< ./paramlist); do
    oarsub -t besteffort -l core=1 "./my_script.sh $param"
done
In this example, the file ./paramlist
contains a list of parameters for a parametric application.
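For instance, ./paramlist could simply contain one parameter value per line (purely illustrative values):
$ cat ./paramlist
0.1
0.2
0.5
1.0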
The following demonstrates the mechanism.
Note: Please have a look at the UsagePolicy to avoid abuses.
Best effort job mechanism
- Running a besteffort job in a first shell
frennes:~$ oarsub -I -l nodes=10 -t besteffort
[ADMISSION RULE] Automatically redirect in the besteffort queue
[ADMISSION RULE] Automatically add the besteffort constraint on the resources
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988535
Interactive mode : waiting...
Starting...
Connect to OAR job 988535 via the node parasilo-26.rennes.grid5000.fr
parasilo-26:~$ uniq $OAR_FILE_NODES
parasilo-26.rennes.grid5000.fr
parasilo-27.rennes.grid5000.fr
parasilo-28.rennes.grid5000.fr
parasilo-3.rennes.grid5000.fr
parasilo-4.rennes.grid5000.fr
parasilo-5.rennes.grid5000.fr
parasilo-6.rennes.grid5000.fr
parasilo-7.rennes.grid5000.fr
parasilo-8.rennes.grid5000.fr
parasilo-9.rennes.grid5000.fr
- Running a non best effort job on the same set of resources in a second shell
frennes:~$ oarsub -I -l {"network_address in ('parasilo-9.rennes.grid5000.fr')"}/nodes=1
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988546
Interactive mode : waiting...
[2018-01-15 13:28:24] Start prediction: 2018-01-15 13:28:24 (FIFO scheduling OK)
Starting...
Connect to OAR job 988546 via the node parasilo-9.rennes.grid5000.fr
As expected, the best effort job was meanwhile stopped (watch the first shell):
parasilo-26:~$
Connection to parasilo-26.rennes.grid5000.fr closed by remote host.
Connection to parasilo-26.rennes.grid5000.fr closed.
[ERROR] An unknown error occured : 65280
Disconnected from OAR job 988545
Testing the checkpointing trigger mechanism
Writing the test script
Here is a script which features an infinite loop and a signal handler triggered by SIGUSR2 (the default signal for OAR's checkpointing mechanism).
#!/bin/bash

handler() { echo "Caught checkpoint signal at: `date`"; echo "Terminating."; exit 0; }
trap handler SIGUSR2

cat <<EOF
Hostname: `hostname`
Pid: $$
Starting job at: `date`
EOF

while : ; do sleep 10; done
Running the job
We run the job on 1 core, with a walltime of 5 minutes, and ask for the job to be checkpointed if it lasts (and it will indeed) more than walltime - 150 sec = 2 min 30.
$ oarsub -l "core=1,walltime=0:05:00" --checkpoint 150 ./checkpoint.sh [ADMISSION RULE] Modify resource description with type constraints OAR_JOB_ID=988555 $
Result
Taking a look at the job output:
$ cat OAR.988555.stdout
Hostname: parasilo-9.rennes.grid5000.fr
Pid: 12013
Starting job at: Mon Jan 15 14:05:50 CET 2018
Caught checkpoint signal at: Mon Jan 15 14:08:30 CET 2018
Terminating.
The checkpointing signal was sent to the job 2 minutes 30 before the walltime as expected so that the job can finish nicely.
Interactive checkpointing
The oardel command provides the capability to raise a checkpoint event interactively for a job.
We submit the job again
$ oarsub -l "core=1,walltime=0:05:0" --checkpoint 150 ./checkpoint.sh [ADMISSION RULE] Modify resource description with type constraints OAR_JOB_ID=988560
Then run the oardel -c #jobid
command...
$ oardel -c 988560
Checkpointing the job 988560 ...DONE.
The job 988560 was notified to checkpoint itself (send SIGUSR2).
And then watch the job's output:
$ cat OAR.988560.stdout
Hostname: parasilo-4.rennes.grid5000.fr
Pid: 11612
Starting job at: Mon Jan 15 14:17:25 CET 2018
Caught checkpoint signal at: Mon Jan 15 14:17:35 CET 2018
Terminating.
The job terminated as expected.
Testing the mechanism of dependency on an anterior job termination
First Job
We run a first interactive job in a first Shell
frennes:~$ oarsub -I
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988571
Interactive mode : waiting...
Starting...
Connect to OAR job 988569 via the node parasilo-28.rennes.grid5000.fr
parasilo-28:~$
And leave that job pending.
Second Job
Then we run a second job in another shell, with a dependency on the first one:
jdoe@idpot:~$ oarsub -I -a 988571
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
OAR_JOB_ID=988572
Interactive mode : waiting...
[2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
Job dependency in action
We do a logout on the first interactive job...
parasilo-28:~$ logout
Connection to parasilo-28.rennes.grid5000.fr closed.
Disconnected from OAR job 988571
... then watch the second shell and see the second job starting:
[2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
Starting...
Connect to OAR job 988572 via the node parasilo-3.rennes.grid5000.fr
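The same dependency mechanism also works for passive jobs; a minimal sketch (script names are arbitrary):
$ JOB1=$(oarsub ./step1.sh | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
$ oarsub -a $JOB1 ./step2.sh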
Container jobs
With the container job functionality, OAR allows one to execute inner jobs within the boundaries of a container job. Inner jobs are scheduled using the same algorithm as other jobs, but restricted to the container job's resources and timespan.
A typical use case is to first submit a container job; you can then submit as many inner jobs as you like, keeping in mind that inner jobs which do not fit in the container's boundaries may stay unscheduled and thus never execute.
Such a use case can for instance help to comply with the Grid5000:UsagePolicy, or to provide a frame for students' jobs during a tutorial.
Note: a container job must combine the container job type with the cosystem or noop job type. This is mandatory because inner jobs could be of type deploy and reboot the nodes hosting the container. Container jobs are usable with passive (batch, scripted), interactive (oarsub -I) and advance reservation (oarsub -r <date>) jobs, but inner jobs cannot be advance reservations.
- First a job of the type container must be submitted
oarsub -I -t cosystem -t container -l nodes=10,walltime=2:00:00
...
OAR_JOB_ID=42
...
- Then it is possible to use the inner type to schedule the new jobs within the previously created container job
oarsub -I -t inner=42 -l nodes=7,walltime=00:10:00
oarsub -I -t inner=42 -l nodes=1,walltime=00:20:00
oarsub -I -t inner=42 -l nodes=10,walltime=00:10:00
Note: in the case of oarsub -I -t inner=42 -l nodes=11, this job will never be scheduled because the container job "42" reserved only 10 nodes.
Changing the walltime of a running job
Starting with OAR version 2.5.8, users can request a change of the walltime (duration of the resource reservation) of a running job. This can be achieved using the oarwalltime command or Grid'5000's API.
This change can be an increase or a decrease, specified by giving either a new walltime value, an increase (beginning with +) or a decrease (beginning with -).
Please note that a request may stay partially or completely unsatisfied if a subsequent job occupies the resources.
The job must be running for a walltime change to be possible. For a waiting job, delete and resubmit it.
Walltime change is not possible in the production queue (Nancy).
Warning: While changes of walltime are not limited a priori (by the […])
Command line interface
Querying the walltime change status:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 1:0:0
Possible increase: UNLIMITED
Already granted: 0:0:0
Pending/unsatisfied: 0:0:0
Requesting the walltime change:
frontend$ oarwalltime 1743185 +1:30
Accepted: walltime change request updated for job 1743185, it will be handled shortly.
Querying right afterward:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 1:0:0
Possible increase: UNLIMITED
Already granted: 0:0:0
Pending/unsatisfied: +1:30:0
The request is still to be handled by OAR's scheduler.
Querying again a bit later:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 2:30:0
Possible increase: UNLIMITED
Already granted: +1:30:0
Pending/unsatisfied: 0:0:0
Should a job exist on the resources and partially prevent the walltime increase, the query output would be:
frontend$ oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
Current walltime: 2:30:0
Possible increase: UNLIMITED
Already granted: +1:10:0
Pending/unsatisfied: +0:20:0
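A decrease can be requested in the same way, for instance by giving a new, smaller walltime value (a sketch; the value is arbitrary):
frontend$ oarwalltime 1743185 2:0:0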
Change events are also reported in oarstat.
See man oarwalltime
for more information.
Using the REST API
Requesting the walltime change:
curl -i -X POST https://api.grid5000.fr/stable/sites/grenoble/internal/oarapi/jobs/1743185.json -H'Content-Type: application/json' -d '{"method":"walltime-change", "walltime":"+0:30:0"}'
Querying the status of the walltime change:
curl -i -X GET https://api.grid5000.fr/stable/sites/grenoble/internal/oarapi/jobs/1743185/details.json -H'Content-Type: application/json'
See the walltime-change and events keys of the output.
Multi-site jobs with OARGrid
oargrid allows submitting OAR jobs to several Grid'5000 sites at once.
For instance, we are going to reserve 4 nodes on 3 different sites for half an hour:
frontend$ oargridsub -t allow_classic_ssh -w '0:30:00' SITE1:rdef="/nodes=2",SITE2:rdef="/nodes=1",SITE3:rdef="nodes=1"
Note that in grid reservation mode, no script can be specified. Users are in charge of:
- connecting to the allocated nodes,
- launching their experiment.
OAR Grid connects to each of the specified clusters and makes a passive submission. Cluster job ids are returned by OAR. A grid job id is returned by OAR Grid to bind the cluster job ids together.
You should see an output like this:
SITE1:rdef=/nodes=2,SITE2:rdef=/nodes=1,SITE3:rdef=nodes=1
[OAR_GRIDSUB] [SITE3] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [SITE3] Reservation success on SITE3 : batchId = SITE_JOB_ID3
[OAR_GRIDSUB] [SITE2] Date/TZ adjustment: 1 seconds
[OAR_GRIDSUB] [SITE2] Reservation success on SITE2 : batchId = SITE_JOB_ID2
[OAR_GRIDSUB] [SITE1] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [SITE1] Reservation success on SITE1 : batchId = SITE_JOB_ID1
[OAR_GRIDSUB] Grid reservation id = GRID_JOB_ID
[OAR_GRIDSUB] SSH KEY : /tmp/oargrid//oargrid_ssh_key_LOGIN_GRID_JOB_ID
You can use this key to connect directly to your OAR nodes with the oar user.
Fetch the allocated nodes list (e.g. into a ~/machines file) to transmit it to the script we want to run.
(1) Select the node to launch the script (i.e. the first node listed in the ~/machines file). If (and only if) this node does not belong to the site where the ~/machines file was saved, copy the ~/machines file to this node:
frontend$ OAR_JOB_ID=SITE_JOB_ID oarcp -i /tmp/oargrid/oargrid_ssh_key_LOGIN_GRID_JOB_ID ~/machines `head -n 1 machines`:
(2) Connect to this node using oarsh:
frontend$ OAR_JOB_ID=SITE_JOB_ID oarsh -i /tmp/oargrid/oargrid_ssh_key_LOGIN_GRID_JOB_ID `head -n 1 machines`
And then run the script.
The grid counterpart of oarstat, oargridstat, gives information about the grid job.
Our grid submission is interactive, so its end time is unrelated to the end time of our script run. The submission ends when the submission owner requests that it ends or when the submission deadline is reached.
We are going to ask for our submission to end, using the oargriddel command.
Funk
funk is a grid resources discovery tool that works at the node level and generates complex oarsub/oargridsub commands. It can help you in three cases:
- to know the number of nodes available for 2 hours at run time, on the sites lille and rennes and on the clusters taurus and suno
- to know when 40 nodes on sagittaire and 4 nodes on taurus will be available, with deploy job type and a subnet
- to find the time when the maximum number of nodes are available during 10 hours, before next week deadline, avoiding usage policy periods, and not using genepi
More information on its dedicated page.
Multi-site visualization tools
OarGridGantt
OarGridGantt summarizes information given by its per-cluster counterpart DrawOARGantt. It prints temporal diagrams of past, current and planned states of each cluster.
Grid'5000 API
Another way to visualize node/job status is to use the Grid'5000 API.
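For instance, the running jobs of a site can be listed with a simple request (a sketch; the exact route and filters should be checked against the API documentation):
curl https://api.grid5000.fr/stable/sites/rennes/jobs?state=running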