Advanced OAR: Difference between revisions

From Grid5000
Jump to navigation Jump to search
 
(92 intermediate revisions by 9 users not shown)
Line 4: Line 4:
__TOC__
__TOC__


This tutorial consists of various independent sections describing various details of OAR useful for an advanced usage, as well as some tips and tricks. It assumes you are familiar with OAR and Grid5000 basics. '''If not, please first look at the [[Getting Started]] page'''.
This tutorial consists of various independent sections describing various details of OAR useful for advanced usage, as well as some tips and tricks. It assumes you are familiar with OAR and Grid5000 basics. '''If not, please first look at the [[Getting Started]] page'''.


This OAR tutorial focuses on command line usage. It assumes you are using the bash shell (but should be easy to adapt to another shell). It can be read linearly, but you also may pick some random sections. Begin at least by [[#useful tips]].
This OAR tutorial focuses on command line usages. It assumes you are using the bash shell (but should be easy to adapt to another shell). It can be read linearly, but you also may pick some random sections. Begin at least by [[#useful tips]].


= OAR =
= OAR =
Line 12: Line 12:
== Useful tips ==
== Useful tips ==


* Take the time to carefuly configure ssh, as described in [https://www.grid5000.fr/mediawiki/index.php/SSH#The_Grid.275000_case the SSH page].
* Take the time to carefully configure ssh, as described in [[SSH#The_Grid.275000_case|the SSH page]].
* Use ''[[screen]]'' or ''tmux'' so that your work is not lost if you loose the connection to Grid5000. Moreover, having a screen session opened with one or more shell sessions allows you to leave your work session when you want then get back to it later and recover it exactly as you leaved it.
* Use ''[[screen]]'' or ''tmux'' so that your work is not lost if you lose the connection to Grid5000. Moreover, having a screen session opened with one or more shell sessions allows you to leave your work session when you want then get back to it later and recover it exactly as you leaved it.
* Most OAR commands (oarsub, oarstat, oarnodes) can provide output in various formats:
* Most OAR commands (<code class=command>oarsub</code>, <code class=command>oarstat</code>, <code class=command>oarnodes</code>) can provide output in various formats:
** textual (this is the default mode)
** text (this is the default mode)
** PERL dumper (-D)
** PERL dumper (-D)
** xml (-X)
** XML (-X)
** yaml (-Y)
** Yaml (-Y)
** json (-J)
** json (-J)
* Direct access to the OAR database: users can directly access the PostgreSQL OAR database ''oar2'' on the server ''oardb.<site>.grid5000.fr'' with the read-only account ''oarreader''. The password is ''read''.
* Regarding the <code class="command">oarsub</code> command line, you should mostly only see the "host" word, but the <code class="command">oarsub</code> command can use both the word "host" or "nodes" indifferently in Grid'5000, as nodes is just an alias for host. Prefer using "host". Besides, the word "host" is also to be preferred to the longer "network_address" word in the resources filters (both properties have sometime the same value, but not always).
* Regarding the oarsub command line, you should mostly only see the "host" word, but the oarsub command can use both the word "host" or "nodes" indifferently in Grid'5000, nodes is just an alias. Besides, the word "host" is to be preferred to the longer "network_address" word in the resources filters (both properties have the same values).
* At job submission time, only important information are printed out by <code class="command">oarsub</code>. To have more indication about what is done by OAR on Grid'5000 (like computed resource filter, exceptional granted privileges, …) the <code class="command">oarsub</code> verbose (<code>-v</code>) option can be used.
* A syntax simplification mechanism was deployed on Grid'5000 to ease job submission, described at [[OAR Syntax simplification]].


== Connection to a job ==
== Connection to the job's nodes ==
 
Two commands can be used to connect to nodes on Grid'5000, <code class="command">oarsh</code> and <code class="command">ssh</code>.
 
=== Using ssh ===
 
<code class="command">ssh</code> can only be used when a node is entirely reserved in your job (all CPU cores). Other cases may not allow assigning processes to the correct job, thus connecting with <code class="command">ssh</code> is not allowed.
 
For instance, when a node is entirely reserved as follows:
 
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -I}}
# Set walltime to default (3600 s).
OAR_JOB_ID=<JOB_ID>
# Interactive mode: waiting...
# Starting...
user@node-32:~$


Being connected to a job means that your environment is setup (<code class="command">OAR_JOB_ID</code> and <code class="command">OAR_JOB_KEY_FILE</code>) so that OAR commands can work. You are automatically connected to a job if you have submitted it in interactive mode. Else you must manually connect to it:
If you open a new shell and try to connect to the node with <code class="command">ssh</code>, it should work:


$ JOBID=$(oarsub 'sleep 300' | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
{{Term|location=fontend.site|cmd=<code class=command>ssh</code> node-32 }}
  $ oarsub -C $JOBID
Linux node-32.site.grid5000.fr 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64
  $ pkill -f 'sleep 300'
Debian11-x64-std-2022013022
(Image based on Debian Bullseye for AMD64/EM64T)
  Maintained by support-staff <support-staff@lists.grid5000.fr>
 
Last login: Wed Feb 23 15:20:32 2022 from 172.16.31.101
  user@node-32:~$


== Connection to the job's nodes ==
However, when reserving for instance only one CPU core of a node:
 
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -I -l core=1}}
# Set walltime to default (3600 s).
OAR_JOB_ID=<JOB_ID>
# Interactive mode: waiting...
# Starting...
user@node-32:~$
 
When trying to connect to the node with <code class="command">ssh</code> in another shell, you get:
 
{{Term|location=fontend.site|cmd=<code class=command>ssh</code> node-32 }}
To connect using 'ssh' directly, you must have a single job using all available cores on the node.
Use 'oarsh' instead.
Connection closed by node-32 port 22
 
=== Using oarsh ===
 
<code class=command>oarsh</code> is a frontend to <code class=command>ssh</code> (the <code class=command>oarsh</code> command wraps the OpenSSH <code class=command>ssh</code> command to add some required functions to connect to a job, but provides mostly the same interface/options). 
 
{{Note|text=Technical note about <code class=command>oarsh</code> internals:
* It opens an ssh connection transiently as the <code class="command">oar</code> user to the OAR dedicated SSH server running on a node (TCP port 6667)
* It detects who you are based on the job id ou a job key: if you indeed have the right to connect to the node (you reserved it in an OAR job), it switches back to your user for the execution of the shell or command on the node in the job's context (cgroup/cpuset).
}}
In case of nodes are not entirely reserved (all CPU cores), you have to use the <code class="command">oarsh</code> command to connect to nodes instead of <code class="command">ssh</code>, and <code class="command">oarcp</code> instead of <code class="command">scp</code> to copy files to/from the nodes. If you use <code class="command">taktuk</code> for parallel executions (or a similar tools like <code class="command">pdsh</code>) or <code class="command">rsync</code> to synchronize files to/from a node, you have to configure the ''connector'' so the command uses <code class="command">oarsh</code> instead of <code class="command">ssh</code> underneath (see the man pages of the command to find out how to change the connector, e.g. using ''-c'' or ''-e'').
 
Please note that <code class=command>oarsh</code> also works for node entirely reserved in a job.
 
==== Splitting job resources ====
<code class=command>oarsh</code> also allows splitting resources of a job, for instance to execute commands on different subsets of resources in a job (e.g. 1 GPU each instead of all the reserved GPUs).


You will normally use the <code class="command">oarsh</code> wrapper to connect to the nodes instead of <code class="command">ssh</code>, and <code class="command">oarcp</code> instead of <code class="command">scp</code> to copy files to/from the nodes. If you use taktuk (or a similar tools like pdsh), you have to configure it so that it uses oarsh instead of ssh.
See an example of using this functionality with [[GNU_Parallel#Confining_GNU_Parallel_tasks_to_GPU.2C_CPU.2C_or_cores|GNU Parallel]].


=== oarsh and job keys ===
==== About OAR job keys ====


By default, OAR generates an ssh key pair for each job, and oarsh is used to connect the job's nodes.
By default, OAR generates a job key pair for each job. <code class="command">oarsh</code> can use either the <code class="command">OAR_JOB_ID</code> or <code class="command">OAR_JOB_KEY_FILE</code> environment variables to know what job to connect. If outside a job shell (e.g. on the frontend), you have to set one of those environment variable. This is not required if <code class="command">oarsh</code> is called from the shell of a job (e.g. on a node), since variables are already set.
oarsh looks to environment variables <code class="command">OAR_JOB_ID</code> or <code class="command">OAR_JOB_KEY_FILE</code> to know the key to use. This oarsh works directly if you are connected. You can also connect to the nodes without being connected to the job:


$ oarsub -I
; Example using <code class="command">OAR_JOB_ID</code>
  [ADMISSION RULE] Set default walltime to 3600.
For instance, create a job requesting 3 hosts (3 nodes):
[ADMISSION RULE] Modify resource description with type constraints
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -I -l host=3}}
Generate a job key...
  # Set default walltime to 3600.
  OAR_JOB_ID=<JOBID>
  OAR_JOB_ID=<JOBID>
# Interactive mode: waiting...
# Starting...
  ...
  ...


then, in another terminal:
Then, in another terminal, assuming the 2nd host in the job is named <code class=replace>node-2</code>:


$ OAR_JOB_ID=<JOBID> oarsh <NODE_NAME>
{{Term|location=fontend.site|cmd=OAR_JOB_ID=<code class=replace>JOBID</code> <code class=command>oarsh</code> <code class=replace>node-2</code>}}


If needed OAR allows to export the job key of a job.
; Example using <code class="command">OAR_JOB_KEY_FILE</code>
OAR can expose the job key, using the <code class=command>-e</code> option of <code class=command>oarsub</code>


Note that in this case (single-node job), the command is equivalent to:
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -I -l host=3 -e <code class=replace>my_job_key</code>}}


$ oarsub -C <JOBID>
Then, in another terminal, assuming the 2nd host in the job is named <code class=replace>node-2</code>:


=== sharing keys between jobs ===
{{Term|location=fontend.site|cmd=OAR_JOB_KEY_FILE=<code class=replace>my_job_key</code> <code class=command>oarsh</code> <code class=replace>node-2</code>}}


Telling oar to always use the same key can be very convenient. If you have a passphrase-less ssh key dedicated for navigating inside grid5000, then in your <code class="command">~/.profile</code> or <code class="command">~/.bash_profile</code> you can set:
{{Note|text=Note that the following command also allows getting a shell in a job, but only on the first default resource (i.e. node).


export OAR_JOB_KEY_FILE=<path_to_your_private_key>
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -C <code class=replace>JOBID</code>}}
}}


Then, OAR will always use this key for all submitted jobs, which allows you to connect to your nodes with oarsh without being connected to the job.
==== Connecting to a job of a different site ====
Job keys are especially useful when having to connect from nodes of different sites, since each site is managed by a different OAR instance.


Moreover, if this key is replicated between all Grid5000 sites, and if the environment variable <code class="command">OAR_JOB_KEY_FILE</code> is exported in <code class="command">~/.profile</code> or <code class="command">~/.bash_profile</code> on all sites, you will be able to connect directly from any frontend to any reserved node of any site.
Thus, a convenient way is to tell OAR to always use the same job key for all jobs. You can for instance use your Grid'5000 internal SSH key (This key is generated when your account is created) as the job key: In your <code class="command">~/.profile</code> or <code class="command">~/.bash_profile</code>, set:


If using the same key for all jobs, be warned that this will raise issues if submitting two or more jobs that share a same subset of nodes on different cpusets, because in this case processes cannot be guarantied to run on the good cpuset.
export OAR_JOB_KEY_FILE=<code class=replace>path_to_your_private_key</code>


=== allow_classic_ssh ===
Then, OAR will always use that key for all jobs, allowing you to connect to your nodes with <code class=command>oarsh</code> seamlessly from sites to sites, jobs to jobs, or even outside jobs.


Submitting with option <code class="command">-t allow_classic_ssh</code> allows you to use ssh directly instead of oarsh to connect to the nodes, at the cost of not being able to select resources at a finer level than the node (cpu, core).
{{Warning|text=When using the same job key for 2 jobs that share some nodes (each job reserving part of the nodes), <code class=command>oarsh</code> may not execute in the expected job context (i.e. cgroup/cpuset) as the job key does not differentiate jobs. You may look at the <code class="command">OAR_JOB_ID</code> to notice that.}}


=== oarsh details ===
==== oarsh vs ssh: tips and tricks ====
{{Note|text=The following is only interesting if you jobs do not reserve nodes entirely, as using <code class=command>oarsh</code> is useless otherwise}}


oarsh is a frontend to ssh. It opens an ssh connection as user <code class="command">oar</code> to the dedicated oar ssh server running on the node, listening on port 6667. It detects who you are based on your key, and if you have the right to use the node (if you have reserved it) it will <code class="command">su</code> to your user on the node.
; 1st tip - hide <code class=command>oarsh</code>, ''rename'' it <code class=command>ssh</code>
Creating a symlink from <code class=file>~/bin/ssh</code> (assuming it is in the execution <code class=command>PATH</code>) to <code class=file>/usr/bin/oarsh</code> allows hidding the wrapper use (as long as <code class=command>the OAR_JOB_ID</code> or <code class=command>OAR_JOB_KEY_FILE</code> environment variables are set when connecting from a frontend to a node).


So, if you don't have oarsh installed, you can still connect to the nodes by simulating it. One use case is if you have reserved nodes and want to connect to them through an ssh proxy as described in [[SSH#Using_SSH_with_ssh_proxycommand_setup_to_access_hosts_inside_Grid.275000]]:
; 2nd tip - using <code class=command>ssh</code> directly, without <code class=command>oarsh</code>
If using <code class=command>oarsh</code> does not suit your need, because you would like to use some of the options of <code class=command>ssh</code> that <code class=command>oarsh</code> does not support, you can also connect to reserved nodes by using the real <code class=command>ssh</code> by adding the right set of options to the command. It can also allow a connection to reserved nodes directly from some place where <code class=command>oarsh</code> is not available (e.g. from outside  Grid'5000):


If you have a passphrase-less ssh key internal to Grid5000, that you use to navigate inside Grid5000, you can tell oar to use this key instead of generating a job-key (see [[#sharing keys between jobs]]), then you can copy this key to your workstation outside of Grid5000:
Assuming you have a passphrase-less SSH key (preferably just for internal uses in Grid5000), you can tell <code class=command>oarsub</code> to use that key as a job key instead of letting OAR generate a new one (see [[#sharing keys between jobs]]). Then you can use that key to connect to nodes, even from outside Grid'5000.


user-laptop$ scp g5k:.ssh/<internal_key_name> g5k:.ssh/<internal_key_name>.pub ~/
* Copy the key to your workstation, for instance outside of Grid5000:
{{Term|location=workstation|cmd=scp <code class=replace>site</code>.g5k:.ssh/<code class=file>your_internal_private_key_file</code> ~/}}
* In Grid5000, submit a job using this key:
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -i ~/.ssh/<code class=file>your_internal_private_key_file</code> -I}}
* Wait for the job to start. Then in another terminal, from outside Grid5000, try connecting to the node:
{{Term|location=workstation|cmd=ssh -i ~/<code class=file>your_internal_private_key_file</code> -p 6667 <code class=replace>[any other ssh options]</code> oar@<code class=replace>reserved-node</code>.<code class=replace>site</code>.g5k}}


In Grid5000, submit a job using this key:
Finally, this can be hidden in a ''SSH ProxyCommand'' (See also [[SSH#Using_SSH_ProxyCommand_feature_to_ease_the_access_to_hosts_inside_Grid.275000]]):


$ oarsub -i ~/.ssh/<internal_key_name> -I
After adding the following configuration in your OpenSSH configuration file on your workstation (<code class=file>~/.ssh/config</code>):
Host *.g5koar
ProxyCommand ssh <code class=replace>g5k-username</code>@access.grid5000.fr -W "$(basename %h .g5koar):%p"
User oar
Port 6667
IdentityFile ~/<code class=file>your_internal_private_key_file</code>
ForwardAgent no


Wait for the job to start. Then in another terminal, from outside Grid5000, try connecting to the node:
'''Warning:''' the <code class=command>ProxyCommand</code> line works if your login shell is <code class=command>bash</code>. If not you may have to adapt it.


user-laptop$ ssh -i ~/<internal_key_name> -p 6667 oar@<node name>.<site>.g5k
You can just ssh to a reserved node directly from your workstation as follows:
{{Term|location=workstation|cmd=ssh <code class=replace>reserved-node</code>.<code class=replace>site</code>.g5koar}}


== Passive and interactive job modes ==
== Passive and interactive job modes ==


In interactive mode: a shell is opened on the first node of the reservation (or on the frontend, with appropriate environment set, if the job is of type <code class="command">deploy</code>). In interactive mode, the job will be killed as soon as this job's shell is closed and will be limited by the job's walltime. It can also be killed by an explicit <code class="command">oardel</code>.
=== Interactive mode ===
 
In interactive mode, a shell is opened on the first default resource (i.e. node) of the job (or on the frontend, if the job is of type <code class="command">deploy</code> or <code class="command">cosystem</code>). In interactive mode, the job will be terminated as soon as this job's shell is closed or will be killed earlier if the job's <code class=replace>walltime</code> is reached. It can also be killed by an explicit <code class="command">oardel</code>.


You can experiment with 3 shells. On first shell, to see the list of your running jobs, regularly run:
You can experiment with 3 shells. On first shell, to see the list of your running jobs, regularly run:


$ oarstat -u
{{Term|location=fontend.site|cmd=<code class=command>oarstat</code> -u}}


To see your own jobs. On the second shell, run an interactive job:
To see your own jobs. On the second shell, run an interactive job:


$ oarsub -I
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -l walltime=<code class=replace>walltime</code> -I}}
 
Wait for the job to start, run <code class=command>oarstat</code>, then leave the job, run <code class=command>oarstat</code> again. Submit another interactive job, and on the third shell, kill it:
 
{{Term|location=fontend.site|cmd=<code class=command>oardel</code> <code class=replace>JOBID</code>}}
 
=== Passive mode ===
 
In passive mode, the <code class=replace>command</code> that is given to <code class=command>oarsub</code> is executed on the first default resource (i.e. node) of the job (or on the site's frontend if the job is of type <code class=command>deploy</code> or <code class=command>cosystem</code>). The job's duration will be the shorter of the execution time of the <code class=replace>command</code> and the job's given <code class=replace>walltime</code>. That unless the job is terminated beforehand by an explicit <code class=command>oardel</code> call from the user or administrator.
 
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -l host=3,walltime=<code class=replace>walltime</code> "<code class=replace>command</code>"}}
 
'''command''' can be a simple script name or a more complex command line with arguments.


Wait for the job to start, run oarstat, then leave the job, run oarstat again. Submit another interactive job, and on the third shell, kill it:
To pass arguments, you have to quote the whole command line, like in the following example:
<code class="command">oarsub</code> -l nodes=4,walltime=2 <code class="replace">"/path/to/myscript arg1 arg2 arg3"</code>


$ oardel <JOBID>
'''Note:''' to avoid random code injection, <code class="command">oarsub</code> allows only alphanumeric characters (<code>[a-zA-Z0-9_]</code>), whitespace characters (<code>[ \t\n\r\f\v]</code>) and few others (<code>[/.-]</code>) inside its command line argument.


In passive mode: an executable is run by oar on the first node of the reservation (or on the frontend, with appropriate environment set, if the job is of type <code class="command">deploy</code>). In passive mode, the limitation to the job's length is its walltime. It can also be killed by an explicit <code class="command">oardel</code>.
Special case for jobs of type <code class=command>noop</code> which are always passive jobs: no command is executed for them. The duration of the job is the given <code class=replace>walltime</code>.


JOBID=$(oarsub 'uname -a' | sed -n 's/OAR_JOB_ID=\(.*\)/\1/p')
{{Term|location=fontend.site|cmd=<code class=command>oarsub</code> -t <code class=command>noop</code> -l host=3,walltime=<code class=replace>walltime</code>}}
cat OAR.$JOBID.stdout


You may not want a job to be interactive or to run a script when the job starts, for example because you will use the reserved resources from a program whose lifecycle is longer than the job (and which will use the resources by connecting to the job). One trick to achieve this is to run the job in passive mode with a long <code class="command">sleep</code> command. One drawback of this method is that the job may terminate with status error if the <code class="command">sleep</code> is killed. This can be a problem in some situations, eg. when using job dependencies.
<code class=replace>oardel</code> can also be used to terminate a passive mode reservation. Note that it is only possible to remove the complete reservation, and not individual nodes.


== Submission and reservation modes ==
{{Term|location=fontend.site|cmd=<code class=command>oardel</code> <code class=replace>JOBID</code>}}


; Submission (aka batch jobs):
=== Interactive mode without shell ===
If you don't specify the job's start date (oarsub <code class="command">-r</code> option), then your job is a ''submission'', which lets OAR choose the best schedule and resources.
; Reservation (aka advance reservations):
If you specify the job's start date, that is a ''reservation''. OAR will just try to find resources for the given schedule, fixed by you.


There are some consequences:
You may not want a job to open a shell or to run a script when the job starts, for example because you will use the reserved resources from a program whose lifecycle is longer than the job (and which will use the resources by connecting to the job).
* Current Grid5000 usage policy allows no more than 2 reservations per site (excluding reservations that start in less than one hour)
 
* in submission mode you're almost guaranteed to get your wanted resources, because oar can decide what resources to allocate at the last moment. You cannot get the list of resources until the job starts.
One trick to achieve this is to run the job in passive mode with a long <code class="command">sleep</code> command. One drawback of this method is that the job may terminate with status error if the <code class="command">sleep</code> is killed. This can be a problem in some situations, eg. when using job dependencies.
* in reservation mode, you're not guaranteed to get your wanted resources, because oar has to plan the allocation of resources at reservation time. If later resources become not available, you lose them for your job. You can get the list of resources as soon as the reservation starts.
 
* in submission mode, you cannot know the date at which your job will start until it starts. But OAR can give you an ''estimation'' of that date.
Another solution is to use an advance reservation (see below) with a starting date very close in the future, or even with the current date and time.
* to coordinate oar submissions on several sites, OARGrid has to do OAR reservations.
 
== Batch jobs vs. advance reservation jobs ==
 
; Batch jobs:
If you do not specify the job's start date (oarsub <code class="command">-r</code> option), then your job is a ''batch job''. It lets OAR choose the best schedule (start date).
* With batch jobs, you're guaranteed to get the count of allocated resources you requested, because OAR chooses what resources to allocate to the job just before its start. If some resources suddenly become unavailable, OAR changes the assigned resources and/or the start date.
* Therefore, you cannot get the actual list of resources until the job starts (but a forecast is provided, such as what is shown in the Drawgantt diagrams).
* With batch jobs, you cannot know the start date of your job until it actually starts (any event can change the forecast). But OAR gives an ''estimation'' of the start date (such as shown in the Drawgantt diagram, which also changes after any event).
 
; Advance reservations:
If you specify the job's start date, it is an ''advance reservation''. OAR will just try to find resources for the given schedule, fixed by you.
* The [[Grid5000:UsagePolicy#Privilege_levels_table|Grid5000 usage policy]] allows no more than 2 advance reservations per site (excluding reservations that start in less than one hour)
* With advance reservation jobs, you're not guaranteed to get the count of resources you requested, because OAR planned the allocation of resources at the reservation time.  
* If some resources became unavailable when the job has to start, the job is delayed a bit in case resources may come back (e.g. return from standby).
* If after 400 seconds, if not all resources are available, the job will start with fewer resources than initially allocated. This is however quite unusual.
* The list of allocated resources to an advance reservation job is fixed and known as soon as the advance reservation is validated. But you will get the actual list of resources (that is with unavailable resources removed for it) when the advance reservation starts.
* To coordinate the start date of OAR jobs on several sites, oargrid or funk use advance reservations.


Example: a reservation for a job in one week from now
Example: a reservation for a job in one week from now


  $ oarsub -r "$(date +'%Y-%m-%d %H:%M:%S' --date='+1 week')"
  $ oarsub -r "$(date +'%F %T' --date='+1 week')"


For reservations, there is no interactive mode. You can give OAR a command to execute or nothing. If you don't give a command, you'll have to connect to the jobs once the reservation starts (using oarsub -C <jobid> or oarsh).
For advance reservations, there is no interactive mode. You can give OAR a command to execute or nothing. If you do not give a command, you'll have to connect to the jobs once the reservation starts (using oarsub -C <jobid> or oarsh).
 
=== Why did my advance reservation start with less than all the resources I requested ? ===
Since resources states are transitional, the advance reservation process considers indifferently the current state of resources, be it ''alive'', ''suspended'' or ''absent''. Indeed, at the requested start time of an advance reservation, all resources in any of those states should presumably be back in the ''alive'' state.
 
This is different for resources in the ''dead'' state, which mark failed resources. Although ''dead'' resources may be repaired at some point, that state is less transitional, so ''dead'' resources are excluded from eligible resources for advance reservations.
 
Also, the allocation of resources to an advance reservation is fixed at the time of validation of the submission (contrarily to batch jobs for which both the start time and allocated resources can change up until the job is effectively started, in order to fit with all requested resources available). As a consequence, resources allocated to an advance reservation which would end up unavailable at the job start time are not replaced by other ''alive'' resources.
 
In fact, at the start time of an advance reservation, OAR looks after any unavailable resources (''absent'' or ''suspected''), and whenever some exists, wait for them to return to the ''alive'' state for 5 minutes. Then, if they are not back in time, the job starts with less resources than requested and initially allocated (assuming at least one resource is available).
 
;NB
Information about reduced number of resources or reduced walltime for a reservation due to this mechanism are available in the event part of the output of
<code class='command'>oarstat -fj </code><code class='replace'>jobid</code>


== Getting information about a job ==
== Getting information about a job ==
Line 164: Line 273:


; List resource properties
; List resource properties
You can get the list of resource properties for SQL predicates by running <code class="command">oarprint -l</code> command:
You can get the list of resource properties for SQL predicates by running the <code class="command">oarprint -l</code> command on a node:


  sagittaire-1 $ oarprint -l
  sagittaire-1 $ oarprint -l
Line 176: Line 285:
'''These OAR properties are described in the [[OAR Properties]] page.'''
'''These OAR properties are described in the [[OAR Properties]] page.'''


{{Note|text=A SQL predicates on the resource properties can also be set using the <code class="command">-p <...></code> syntax, in which case it applies to all aggregated resource specifications. It can also be combined with the <code class="command">-l <...></code> syntax (curly brackets), for some possible common parts among all aggragates. Please refer to a SQL syntax manual in order to build a correct SQL predicate syntax, which technically speecking is a ''WHERE'' clause of a resource selection SQL matching.}}
{{Note|text=A SQL predicate on the resource properties can also be set using the <code class="command">-p <...></code> syntax, in which case it applies to all aggregated resource specifications. It can also be combined with the <code class="command">-l <...></code> syntax (curly brackets), for some possible common parts among all aggragates. Please refer to a SQL syntax manual in order to build a correct SQL predicate syntax, which technically speaking is a ''WHERE'' clause of a resource selection SQL matching.}}


=== Using the resources hierarchies ===
=== Using the resources hierarchies ===
The OAR resources define implicit hierarchies to be used on the resource requests (oarsub -l). These hierarchies are specific to Grid'5000.
The OAR resources define implicit hierarchies to be used on the resource requests (oarsub -l). These hierarchies are specific to Grid'5000.


For instance:
; For instance:
* ask for 1 core on 15 nodes on a same cluster (total = 15 cores)
* request 1 core on 15 hosts (nodes) on a same cluster (total = 15 cores)
  $ oarsub -I -l /cluster=1/nodes=15/core=1
  $ oarsub -I -l /cluster=1/host=15/core=1
* ask for 1 core on 15 nodes on 2 clusters (total = 30 cores)
* request 1 core on 15 hosts (nodes) on 2 clusters (total = 30 cores)
  $ oarsub -I -l /cluster=2/nodes=15/core=1
  $ oarsub -I -l /cluster=2/host=15/core=1
* ask for 1 core on 2 cpus on 15 nodes on a same cluster (total = 30 cores)  
* request 1 core on 2 cpus on 15 hosts (nodes) on a same cluster (total = 30 cores)  
  $ oarsub -I -l /cluster=1/nodes=15/cpu=2/core=1
  $ oarsub -I -l /cluster=1/host=15/cpu=2/core=1
* ask for 10 cpus on 2 clusters (total = 20 cpus, the number of nodes and cores depends on the topology of the machines)
* request 10 cpus on 2 clusters (total = 20 cpus, the number of hosts and cores depends on the topology of the machines)
  $ oarsub -I -l /cluster=2/cpu=10
  $ oarsub -I -l /cluster=2/cpu=10
* ask for 1 core on 3 different network switches (total = 3 cores)
* request 1 core on 3 different network switches (total = 3 cores)
  $ oarsub -I -l /switch=3/core=1
  $ oarsub -I -l /switch=3/core=1
; Examples for GPUs
* request 3 GPUs on 1 single host (node). Obviously eligible nodes for the job need to have at least 3 GPU.
$ oarsub -I -l host=1/gpu=3
* request 3 GPUs, possibly on different nodes depending on availability (other jobs, possible resources):
$ oarsub -I -l gpu=3
* request a full node (possibly featuring more than 3 GPUs) with at lease 3 GPUs:
$ oarsub -p "gpu_count >= 3" -l host=1 [...]
* In the job, running oarprint as follows shows what GPUs are available in the job:
$ oarprint gpu -P host,gpudevice
(you may also look at nvidia-smi's output)


Valid resource hierarchies are:
;Valid resource hierarchies are:
* Compute and disk resources
* Compute and disk resources
** both ''switch > cluster'', or ''cluster > switch'' can be valid (some clusters spread their nodes on many switches, some clusters share a same switch), we note below ''cluster|switch'' to reflect that ambiguity.  
** both ''switch > cluster'', or ''cluster > switch'' can be valid (some clusters spread their hosts (nodes) on many switches, some clusters share a same switch), we note below ''cluster|switch'' to reflect that ambiguity.  
** ''cluster|switch > chassis > host > cpu > gpu > core''
** ''cluster|switch > chassis > host > cpu > gpu > core''
** ''cluster|switch > chassis > host > disk''
** ''cluster|switch > chassis > host > disk''
Line 203: Line 322:
** ''slash16 > slash17 > slash18 > slash19 > slash20 > slash21 > slash22''
** ''slash16 > slash17 > slash18 > slash19 > slash20 > slash21 > slash22''
Of course not all hierarchy levels have to be given in a resource request.
Of course not all hierarchy levels have to be given in a resource request.
{{Note|text=Please mind that the ''nodes'' keyword (plural!) is an (historical) alias for ''host'' (singular!). A ''node'' or ''host'' is one server (computer). For instance, ''-l /cluster=X/nodes=Y/core=Z'' is exactly the same as ''-l /cluster=X/host=Y/core=Z''.}}
{{Note|text=Please mind that the ''nodes'' keyword (plural!) is an alias for ''host'' (singular!). A ''node'' or ''host'' is one server (computer). For instance, ''-l /cluster=X/nodes=Y/core=Z'' is exactly the same as ''-l /cluster=X/host=Y/core=Z''.}}


=== Selecting resources using properties ===
=== Selecting resources using properties ===
Line 210: Line 329:


For example in Nancy:
For example in Nancy:
  $ oarsub -I -l {"cluster='graphene'"}/nodes=2
  $ oarsub -I -l {"cluster='graphene'"}/host=2


Or, alternative syntax:
Or, alternative syntax:
  $ oarsub -I -p "cluster='graphene'" -l /nodes=2
  $ oarsub -I -p "cluster='graphene'" -l /host=2
 
; Selecting nodes with a specific CPU architecture
 
For classical x86_64:
 
$ oarsub -I -p "cpuarch='x86_64'"
 
Other architectures are "exotic" so a specific type of job is needed:
 
$ oarsub -I -t exotic -p "cpuarch='ppc64le'"


; Selecting specific nodes
; Selecting specific nodes


For example in Lyon:
For example in Lyon:
  $ oarsub -I -l {"host in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr', 'sagittaire-12.lyon.grid5000.fr')"}/nodes=1
  $ oarsub -I -l {"host in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr', 'sagittaire-12.lyon.grid5000.fr')"}/host=1


or, alternative syntax:
or, alternative syntax:
Line 229: Line 358:
Ask for 10 cores of the cluster graphene
Ask for 10 cores of the cluster graphene
  $ oarsub -I -l core=10 -p "cluster='graphene'"
  $ oarsub -I -l core=10 -p "cluster='graphene'"
Ask for 2 nodes with 16384 GB of memory and Infiniband 20G
Ask for 2 nodes with 16384 MB of memory and Infiniband 20G
  $ oarsub -I -p "memnode='16384' and ib_rate='20'" -l nodes=2
  $ oarsub -I -p "memnode='16384' and ib_rate='20'" -l host=2
Ask for any 4 nodes except graphene-12
Ask for any 4 nodes except graphene-12
  $ oarsub -I -p "not host like 'graphene-12.%'" -l nodes=4
  $ oarsub -I -p "not host like 'graphene-12.%'" -l host=4
Ask for a node with 32 threads
$ oarsub -I  -p 'thread_count=32'


; Examples of joint resources requests
; Examples of joint resources requests
Line 238: Line 369:
Ask for 2 nodes with virtualization capability, on different clusters + IP subnets:
Ask for 2 nodes with virtualization capability, on different clusters + IP subnets:


* We want 2 nodes and 4 /22 subnets with the following constraints:
* We want 2 nodes (hosts) and 4 /22 subnets with the following constraints:
** Nodes are on 2 different clusters of the same site (Hint: use a site with several clusters :-D)
** Nodes are on 2 different clusters of the same site (Hint: use a site with several clusters :-D)
** Nodes have virtualization capability enabled
** Nodes have virtualization capability enabled
Line 244: Line 375:
** 2 subnets belonging to the same /19 subnet are consecutive
** 2 subnets belonging to the same /19 subnet are consecutive
   
   
  $ '''oarsub -I -l /slash_19=2/slash_22=2+{"virtual!='none'"}/cluster=2/nodes=1'''
  $ '''oarsub -I -l /slash_19=2/slash_22=2+{"virtual!='none'"}/cluster=2/host=1'''


Lets verify the reservation:
Lets verify the reservation:
Line 263: Line 394:


Another example, ask for both
Another example, ask for both
* 1 core on 2 nodes on the same cluster with 16384  MB of memory and Infiniband 20G
* 1 core on 2 hosts (nodes) on the same cluster with 16384  MB of memory and Infiniband 20G
* 1 cpu on 2 nodes on the same switch with 8 cores processors for a walltime of 4 hours
* 1 cpu on 2 hosts (nodes) on the same switch with 8 cores processors for a walltime of 4 hours


  $ '''oarsub -I -l "{memnode=16384 and ib_rate='20'}/cluster=1/nodes=2/core=1+{cpucore=8}/switch=1/nodes=2/cpu=1,walltime=4:0:0"'''
  $ '''oarsub -I -l "{memnode=16384 and ib_rate='20'}/cluster=1/host=2/core=1+{cpucore=8}/switch=1/host=2/cpu=1,walltime=4:0:0"'''


Walltime must always be the last argument of <code class="command">-l <...></code>
Walltime must always be the last argument of <code class="command">-l <...></code>
{{Note|text=If no resource matches your request, oarsub will exit with the message
{{Note|text=If no resource matches your request, oarsub will exit with the message
  Generate a job key...
  # Set default walltime to 3600.
[ADMISSION RULE] Set default walltime to 3600.
  There are not enough resources for your request
  [ADMISSION RULE] Modify resource description with type constraints
'''There are not enough resources for your request'''
  OAR_JOB_ID=-5
  OAR_JOB_ID=-5
  Oarsub failed: please verify your request syntax or ask for support to your admin.
  # Error: oarsub failed, please verify your request syntax.
Check that what you are requesting is actually relevant.
}}
}}


Line 283: Line 411:


We first submit a job
We first submit a job
  $ '''oarsub -I -l nodes=4'''
  $ '''oarsub -I -l host=4'''
  ...
  ...
  OAR_JOB_ID=178361
  OAR_JOB_ID=178361


; Retrieve the host list
; Retrieve the nodes list
We want the list of the nodes we got, identified by '''unique''' hostnames
We want the list of the nodes (hosts) we got, identified by '''unique''' hostnames
  $ '''oarprint host'''
  $ '''oarprint host'''
  sagittaire-32.lyon.grid5000.fr
  sagittaire-32.lyon.grid5000.fr
Line 295: Line 423:
  sagittaire-28.lyon.grid5000.fr
  sagittaire-28.lyon.grid5000.fr
(We get 1 line per host, not per core !)
(We get 1 line per host, not per core !)
{{Warning|text='''nodes''' is a ''pseudo'' property: you must use '''host''' instead}}


; Retrieve the core list
; Retrieve the core list
Line 307: Line 434:
  164
  164
  242
  242
Obviously, retrieving OAR internal core Id might not help much. Hence the use of a customized output format
Obviously, retrieving OAR internal core Id might not help much. Hence the use of a customized output format below.


; Retrieve core list with host and cpuset Id as identifier
; Retrieve core list with host and cpuset Id as identifier
Line 358: Line 485:


== X11 forwarding ==
== X11 forwarding ==
X11 forwarding can be enabled with oarsh. As for ssh you need to pass option -X to oarsh.
X11 forwarding is enabled in the shell opened in interactive job (<code class=command>oarsub -I</code>).
X11 forwarding can also be enabled in a shell opened on a node of a job with <code class=command>oarsh</code>, just like with a classic <code class=command>ssh</code> command: The <code class=command>-X</code> or <code class=command>-Y</code> option must be passed to <code class=command>oarsh</code>.
We will use xterm to test X11.
 
{{Note|text=Please mind that for X11 forwarding to work in the job, X11 forwarding must already work in the shell from which the OAR commands are run. Check the <code class=command>DISPLAY</code> environment variable !}}
 
We will use <code class=command>xterm</code> to test X11.
 
; Enabling X11 forwarding up to the frontend
Connect to a frontend with <code class=command>ssh</code> (reminder: read [[Getting_Started#Recommended_tips_and_tricks_for_efficient_use_of_Grid.275000|the getting started tutorial]] about the use of the ssh proxycommand), and make sure the X11 forwarding is operational so far:
 
Look at the <code class=command>DISPLAY</code> environment variable, which ssh should have set to <code class=host>localhost</code>:<code class=replace>10.0</code> or the like (the <code class=replace>10.0</code> part may vary from hop to hop in the X11 forwarding chain, with numbers greater than 10).
 
It requires to use the <code class=command>-X</code> or <code class=command>-Y</code> option in the <code class=command>ssh</code> command line, or to have <code class=command>ForwardX11=yes</code> set in your SSH configuration.


; Shell 1
In any case, check:
Connect to a frontend with ssh with option -X and look at the DISPLAY environment variable
{{Term|location=frontend.site|cmd=echo $DISPLAY}}
$ '''echo $DISPLAY'''
  localhost:11.0
  localhost:11.0


Submit a job
; Using X11 forwarding in the <code class=command>oarsub</code> job shell
$ '''oarsub -I -l /nodes=2/core=1'''
If the <code class=command>DISPLAY</code> environment variable is set in the calling shell, <code class=command>oarsub</code> will automatically enable the X11 forwarding. Verbose oarsub option (<code>-v</code>) is required to have the "Initialize X11 forwarding..." sentence.
  [ADMISSION RULE] Set default walltime to 7200.
{{Term|location=frontend.site|cmd=<code class=command>oarsub</code> -v -I -l core=1}}
  [ADMISSION RULE] Modify resource description with type constraints
  # Set default walltime to 3600.
  OAR_JOB_ID=4926  
  # Computed global resource filter: -p "maintenance = 'NO'"
  Interactive mode : waiting...
# Computed resource request: -l {"type = 'default'"}/core=1
  [2007-03-07 09:01:16] Starting...
# Generate a job key...
   
  OAR_JOB_ID=4926
Initialize X11 forwarding...
  # Interactive mode: waiting...
  Connect to OAR job 4926 via the node idpot-8.grenoble.grid5000.fr
  # Starting...
jdoe@idpot-8:~$ '''xterm &'''
  # Initialize X11 forwarding...
[1] 14656
  # Connect to OAR job 4926 via node idpot-8.grenoble.grid5000.fr
  jdoe@idpot-8:~$ '''cat $OAR_NODEFILE'''
 
  idpot-8.grenoble.grid5000.fr
Then from the shell of the job, check again the display:
idpot-9.grenoble.grid5000.fr
  jdoe@idpot-8:~$ echo $DISPLAY
[1]+  Done                    xterm
  localhost:10.0
  jdoe@idpot-8:~$ '''oarsh idpot-9 xterm'''
And run <code class=command>xterm</code>
Error: Can't open display:  
  jdoe@idpot-8:~$ xterm
jdoe@idpot-8:~$ oarsh -X idpot-9 xterm
Wait for the window to open: it may be pretty long!
 
; Using X11 forwarding in a job via <code class=command>oarsh</code>
With <code class=command>oarsh</code>, the <code class=command>-X</code> or <code class=command>-Y</code> option must be used to enable the X11 forwarding:
 
{{Term|location=frontend.site|cmd=OAR_JOB_ID=4928 <code class=command>oarsh</code> -X idpot-8}}
Then in the opened shell, you can again check that the <code class=command>DISPLAY</code> is set, and run <code class=command>xterm</code>.
 
You can also just run the <code class=command>xterm</code> command directly in the <code class=command>oarsh</code> call:
{{Term|location=frontend.site|cmd=OAR_JOB_ID=4928 <code class=command>oarsh</code> -X idpot-8 <code class=command>xterm</code>}}
 
; Using X11 forwarding in a job with a deployed environment
When an interactive job is used to deploy an environment, the spawned shell will not contain the <code class=comand>DISPLAY</code> environment variable, even if it was forwarded in the user connection shell.
 
To use X11 forwarding in this situation, you can open a new (X11 forwarded) shell on the frontend, and then connect to the node using again X11 forwarding.
 
you can also connect directly to the node from your laptop either by:
* using the Grid'5000 [[VPN]]
* following the recommendations about a better usage of ssh listed in [[Getting_Started#Recommended_tips_and_tricks_for_an_efficient_use_of_Grid.275000|Getting Started]] document.


; Shell 2
{{Note|text=X11 forwarding will suffer from the latency between your local network and the Grid'5000 network.
Also connected to the frontend with ssh -X:
* Mind using a site local access to Grid'5000 to lower that latency: see [[External access]] ;
$ '''echo $DISPLAY'''
* And/or prefer using another remote display service, such as VNC for instance}}
localhost:13.0
$ '''OAR_JOB_ID=4928 oarsh -X idpot-9 xterm'''


== Using best effort mode jobs ==
== Using best effort mode jobs ==
The best-effort jobs of OAR are implemented to back-fill the cluster with jobs considered as less important without blocking "regular" jobs.
To submit jobs under that policy, you simply have to select the besteffort type of job in your oarsub command.
<code class="command">oarsub</code> <code class="replace">-t besteffort</code> script_to_launch
Jobs submitted that way will only get scheduled on resources when no other job use them (any regular job
overtake besteffort jobs in the waiting queue, regardless of submission times).
Moreover, these jobs are killed (as if oardel were called) when a regular job recently submitted needs the nodes used by a besteffort job.
By default, no checkpointing or automatic restart of besteffort jobs is provided. They are just killed. That is why this mode
is best used with a tool which can detect the killed jobs and resubmit them. However OAR2 provides options for that.
=== Best effort job campaign ===
=== Best effort job campaign ===
OAR 2 provides a way to specify that jobs are ''best effort'', which means that the server can delete them if room is needed to fit other jobs. One can submit such jobs using the '''besteffort type''' of job.
One can submit such jobs using the '''besteffort''' of job type (or indifferently in the '''besteffort''' queue).


For instance you can run a job campaign as follows:
For instance you can run a job campaign as follows:
Line 408: Line 573:
=== Best effort job mechanism ===
=== Best effort job mechanism ===
; Running a besteffort job in a first shell:
; Running a besteffort job in a first shell:
  frennes:~$ '''oarsub -I -l nodes=10 -t besteffort'''
  frennes:~$ '''oarsub -I -l host=10 -t besteffort'''
  [ADMISSION RULE] Automatically redirect in the besteffort queue
  # Set default walltime to 3600.
[ADMISSION RULE] Automatically add the besteffort constraint on the resources
[ADMISSION RULE] Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
  OAR_JOB_ID=988535
  OAR_JOB_ID=988535
  Interactive mode : waiting...
  # Interactive mode: waiting...
  Starting...
  # Starting...
Connect to OAR job 988535 via the node parasilo-26.rennes.grid5000.fr
 
  parasilo-26:~$ '''uniq $OAR_FILE_NODES'''
  parasilo-26:~$ '''uniq $OAR_FILE_NODES'''
  parasilo-26.rennes.grid5000.fr
  parasilo-26.rennes.grid5000.fr
Line 434: Line 593:
; Running a non best effort job on the same set of resources in a second shell:
; Running a non best effort job on the same set of resources in a second shell:


  frennes:~$ '''oarsub -I -l {"host in ('parasilo-9.rennes.grid5000.fr')"}/nodes=1'''
  frennes:~$ '''oarsub -I -l {"host in ('parasilo-9.rennes.grid5000.fr')"}/host=1'''
  [ADMISSION RULE] Set default walltime to 3600.
  # Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
  OAR_JOB_ID=988546
  OAR_JOB_ID=988546
  Interactive mode : waiting...
  # Interactive mode: waiting...
  [2018-01-15 13:28:24] Start prediction: 2018-01-15 13:28:24 (FIFO scheduling OK)
  # [2022-01-10 16:00:07] Start prediction: 2022-01-10 16:00:07 (FIFO scheduling OK)
  Starting...
  # Starting...
  Connect to OAR job 988546 via the node parasilo-9.rennes.grid5000.fr
  Connect to OAR job 988546 via the node parasilo-9.rennes.grid5000.fr


Line 449: Line 606:
  parasilo-26:~$ Connection to parasilo-26.rennes.grid5000.fr closed by remote host.
  parasilo-26:~$ Connection to parasilo-26.rennes.grid5000.fr closed by remote host.
  Connection to parasilo-26.rennes.grid5000.fr closed.
  Connection to parasilo-26.rennes.grid5000.fr closed.
  [ERROR] An unknown error occured : 65280
  # Error: job was terminated.
  Disconnected from OAR job 988545
  Disconnected from OAR job 988545


Line 470: Line 627:
; Running the job
; Running the job
We run the job on 1 core, and a walltime of 5 minutes, and ask the job to be checkpointed if it lasts (and it will indeed) more than walltime - 150 sec = 2 min 30.
We run the job on 1 core, and a walltime of 5 minutes, and ask the job to be checkpointed if it lasts (and it will indeed) more than walltime - 150 sec = 2 min 30.
  $ '''oarsub -l "core=1,walltime=0:05:00" --checkpoint 150 ./checkpoint.sh '''
  $ '''oarsub -v -l "core=1,walltime=0:05:00" --checkpoint 150 ./checkpoint.sh '''
  [ADMISSION RULE] Modify resource description with type constraints
  # Modify resource description with type constraints
  OAR_JOB_ID=988555
  OAR_JOB_ID=988555
  $
  $
Line 491: Line 648:


We submit the job again
We submit the job again
  $ '''oarsub -l "core=1,walltime=0:05:0" --checkpoint 150 ./checkpoint.sh '''
  $ '''oarsub -v -l "core=1,walltime=0:05:0" --checkpoint 150 ./checkpoint.sh '''
  [ADMISSION RULE] Modify resource description with type constraints
  # Modify resource description with type constraints
  OAR_JOB_ID=988560
  OAR_JOB_ID=988560


Line 516: Line 673:
; First Job
; First Job
We run a first interactive job in a first Shell
We run a first interactive job in a first Shell
  frennes:~$ '''oarsub -I '''
frennes:~$ '''oarsub -I '''
  [ADMISSION RULE] Set default walltime to 3600.
  # Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
Generate a job key...
  OAR_JOB_ID=988571
  OAR_JOB_ID=988571
  Interactive mode : waiting...
  # Interactive mode: waiting...
  Starting...
  # Starting...
Connect to OAR job 988569 via the node parasilo-28.rennes.grid5000.fr
  parasilo-28:~$
  parasilo-28:~$
And leave that job pending.
And leave that job pending.
Line 530: Line 684:
Then we run a second job in another Shell, with a dependence on the first one
Then we run a second job in another Shell, with a dependence on the first one
  jdoe@idpot:~$ '''oarsub -I -a 988571'''
  jdoe@idpot:~$ '''oarsub -I -a 988571'''
  [ADMISSION RULE] Set default walltime to 3600.
  # Set default walltime to 3600.
[ADMISSION RULE] Modify resource description with type constraints
  OAR_JOB_ID=2071596
Generate a job key...
  # Interactive mode: waiting...
  OAR_JOB_ID=988572
  # [2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
  Interactive mode : waiting...
  [2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)


; Job dependency in action
; Job dependency in action
Line 544: Line 696:
   
   
... then watch the second Shell and see the second job starting
... then watch the second Shell and see the second job starting
  [2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
  # [2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
  Starting...
  # Starting...
  Connect to OAR job 988572 via the node parasilo-3.rennes.grid5000.fr
  parasilo-3:~$


== Container jobs ==
== Container jobs ==
Line 555: Line 707:
A typical use case is to submit first a ''container'' job, then have ''inner'' jobs submitted, with referring to the container job_id.  
A typical use case is to submit first a ''container'' job, then have ''inner'' jobs submitted, with referring to the container job_id.  


Mind that the ''inner'' jobs which will not fit in the container's boundaries will stay possibly not scheduled and not executed, but keep in the waiting state in the queue. They will be deleted when the container job is terminated.
Mind that the ''inner'' jobs that will not fit in the container's boundaries will stay in the waiting state in the queue, not scheduled and not executed. They will be deleted when the container job is terminated.


Container jobs are especially useful when organizing tutorial of teaching labs, with the container job created by the organizer, and inner jobs created by the attendees.
Container jobs are especially useful when organizing tutorial of teaching labs, with the container job created by the organizer, and inner jobs created by the attendees.


If all inner job are to be created by the same user as the container job, it is preferable to use a tool such as [[GNU Parallel]].
Mind that if in your use case, all inner job are to be created by the same user as the container job, it is preferable to use a tool such as [[GNU Parallel]].


Inner job are killed when the container job is terminated.
Inner job are killed when the container job is terminated.


{{Note|text=''Container'' job must ally both the ''container'' job type and any of the ''cosystem'' or ''noop'' job types. This is mandatory for the reason that ''inner'' jobs could be of type ''deploy'' and reboot the nodes hosting the container itself.
{{Note|text=A ''container'' job must use both the ''container'' job type and any of the ''cosystem'' or ''noop'' job types. This is mandatory for the reason that ''inner'' jobs could be of type ''deploy'' and reboot the nodes hosting the container itself.
''container'' jobs are usable with passive (batch, scripted), interactive (oarsub -I) and advance reservations (oarsub -r <date>) jobs. But ''inner'' jobs cannot be advance reservations.
''container'' jobs are usable with passive (batch, scripted), interactive (oarsub -I) and advance reservations (oarsub -r <date>) jobs. But ''inner'' jobs cannot be advance reservations.
}}
}}
Line 569: Line 721:
; First a job of the type ''container'' must be submitted:
; First a job of the type ''container'' must be submitted:


{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t cosystem -t container -l nodes=10,walltime=2:00:00}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t cosystem -t container -l host=10,walltime=2:00:00}}
  ...
  ...
  OAR_JOB_ID=42
  OAR_JOB_ID=42
Line 576: Line 728:
; Then it is possible to use the ''inner'' type to schedule the new jobs within the previously created ''container'' job:
; Then it is possible to use the ''inner'' type to schedule the new jobs within the previously created ''container'' job:


{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l nodes=7,walltime=00:10:00}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l host=7,walltime=00:10:00}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l nodes=1,walltime=00:20:00}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l host=1,walltime=00:20:00}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l nodes=10,walltime=00:10:00}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l host=10,walltime=00:10:00}}


{{Note|text=
{{Note|text=
A job created with:
A job created with:
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l nodes=11}}
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -I -t inner=42 -l host=11}}
will never be scheduled because the ''container'' job "42" only reserved 10 nodes.
will never be scheduled because the ''container'' job "42" only reserved 10 nodes.


Line 597: Line 749:
* ''noop'' jobs have the advantage over the ''cosystem'' job that they are not affected by a reboot (e.g. due to a maintenance or a failure) of the frontend.  
* ''noop'' jobs have the advantage over the ''cosystem'' job that they are not affected by a reboot (e.g. due to a maintenance or a failure) of the frontend.  


If running a script on the frontend is not required, ''noop'' job a probably to be preferred over the ''cosystem'' jobs.
If running a script on the frontend is not required, ''noop'' job should probably be preferred over the ''cosystem'' jobs.


== Changing the walltime of a running job (oarwalltime) ==
== Changing the walltime of a running job (oarwalltime) ==
Starting with OAR version 2.5.8, users can request a change to the walltime (duration of the resource reservation) of a running job. This can be achieved using the <code class='command'>oarwalltime</code> command or Grid'5000's API.
Users can request a extension of the walltime (duration of the resource reservation) of a running job. This can be achieved using the <code class='command'>oarwalltime</code> command or Grid'5000's API.


This change can be an increase or a decrease, and specified giving either a new walltime value, or an increase value (begin with '''+''') or a decrease value (begin with '''-''').
This change can be specified by giving either a new walltime value or an increase (begin with '''+''').


Please note that a request may stay partially or completely unsatisfied if a next job occupies the resources.
Please note that a request may stay partially or completely unsatisfied if a job is already scheduled to occupy the resources right after the running job.


Job must be '''running''' for a walltime change. For Waiting job, delete and resubmit.
Job must be '''running''' for a walltime change. For Waiting job, delete and resubmit.
Line 681: Line 833:
* If a job could not be scheduled during the current night (not enough resources available), it will be kept in the queue and then postponed in the morning for a retry the next night (hour constraints will be changed to the next night slot), that for 7 days.
* If a job could not be scheduled during the current night (not enough resources available), it will be kept in the queue and then postponed in the morning for a retry the next night (hour constraints will be changed to the next night slot), that for 7 days.
* If the walltime of the job is more than 13h59, the job will obviously not run before a weekend.
* If the walltime of the job is more than 13h59, the job will obviously not run before a weekend.
; Submit a job to run exclusively during the coming (or current) night (or week-end on Friday)
{{Term|location=frontend|cmd=<code class="command">oarsub</code> <code class="command">-t night=noretry</code> <code class="replace">…</code>}}
If job is not scheduled and run during the coming (or current) night (or week-end on Friday), it will not be postponed to the next night for a new try, but just set to error.


Note that:
Note that:
* the maximum walltime for a night is 14h, but due to some overhead in the system (resources state changes, reboots...), it is strongly advised to limit walltime to at most 13h30. Furthermore, a shorter walltime (max a few hours)? will result in more chances to get a job scheduled in case many jobs are already in queue.
* the maximum walltime for a night is 14h, but due to some overhead in the system (resources state changes, reboots...), it is strongly advised to limit walltime to at most 13h30. Furthermore, a shorter walltime (max a few hours)? will result in more chances to get a job scheduled in case many jobs are already in queue.
* jobs with a walltime greater than 14h will be required to run during the week-ends. But even if submitted at the beginning of the week, they will not be scheduled before the Friday morning. Thus, any advance reservation done before Friday will take precedence. Also, given that the rescheduling happens on a daily basis for the next night, advance reservations take precedence if they are submitted before the daily rescheduling. In practice, this mechanism thus provides a low priority way to submit batch jobs during nights and week-ends.
* jobs with a walltime greater than 14h will be required to run during the week-ends. But even if submitted at the beginning of the week, they will not be scheduled before the Friday morning. Thus, any advance reservation done before Friday will take precedence. Also, given that the rescheduling happens on a daily basis for the next night, advance reservations take precedence if they are submitted before the daily rescheduling. In practice, this mechanism thus provides a low priority way to submit batch jobs during nights and week-ends.
* a job will be kept 7 days before deletion (if it cannot be run because of lack of resources within a week)
* a job will be kept 7 days before deletion (if it cannot be run because of lack of resources within a week), unless using <code>night=noretry</code>
 
== Use --project for users in multiple GGAs ==
 
A OAR job is linked to the user's Granting Group Access (GGA). Indeed, GGAs are used by OAR for applying the privilege levels defined in the [[Grid5000:UsagePolicy#Privilege_levels_table|usage policy]] and for statistics:
* for users that belong to only one GGA, OAR automatically retrieves the GGA.
* for users that belong to more than one GGA at once (eg., teaching, multiple affiliations, economic activities), they must use the <code>--project</code> parameter to indicate OAR which GGA is used for the job.
 
Let us take an example for a user who is a member of group <code class="replace">projectA</code> for their research and of group <code class="replace">lab-session-B</code> for their teaching.
To submit jobs related to their research (GGA <code class="replace">projectA</code>), they have to use the following command:
 
{{Term|location=frontend|cmd=<code class="command">oarsub</code> <code class="command">--project=</code><code class="replace">projectA</code> ./myscript.sh}}
 
while for reserving nodes for their lab sessions with students (GGA <code class="replace">lab-session-B</code>), they will use:
 
{{Term|location=frontend|cmd=<code class="command">oarsub</code> <code class="command">-I -t cosystem -t container -l host=10,walltime=2:00:00 --project=</code><code class="replace">lab-session-B</code>}}
 
Note that it is possible to define a default GGA (used by OAR without specifying a GGA with <code>--project</code> parameter) :
* Login to https://api.grid5000.fr/ui/account
* Go to the "Groups" tab (on the left menu), then select the default GGA by clicking on the button located on the 'Default' column.
 
== About resources states ==
OAR resources can be in several states:
* Alive: Free for use or running a job.
* Absent: Temporarily unavailable for use, typically because rebooting after a deploy job.
* Absent/standby: Free for use but not immediately available because shut down. Will be powered on and become Alive whenever needed for a job
* Suspected: A fault has been detected, the resource is unavailable but it may be repaired soon.
* Dead: The resource is definitively not available.
 
Nodes in maintenance are nodes with a ''Dead'' state with the ''maintenance'' property set to ''YES''.


= Multi-site jobs with OARGrid =
= Multi-site jobs with OARGrid =
Line 693: Line 878:
For instance, we are going to reserve 4 nodes on 3 different sites for half an hour
For instance, we are going to reserve 4 nodes on 3 different sites for half an hour


{{Term|location=frontend|cmd=<code class="command">oargridsub -t allow_classic_ssh -w '0:30:00' '''SITE1''':rdef="/nodes=2",'''SITE2''':rdef="/nodes=1",'''SITE3''':rdef="nodes=1"</code>}}
{{Term|location=frontend|cmd=<code class="command">oargridsub -w '0:30:00' '''SITE1''':rdef="/nodes=2",'''SITE2''':rdef="/nodes=1",'''SITE3''':rdef="nodes=1"</code>}}


Note that in grid reservation mode, no script can be specified. Users are in charge to:
Note that in grid reservation mode, no script can be specified. Users are in charge to:
Line 753: Line 938:
* to find the time when the maximum number of nodes are available during 10 hours, before next week deadline, avoiding usage policy periods, and not using genepi
* to find the time when the maximum number of nodes are available during 10 hours, before next week deadline, avoiding usage policy periods, and not using genepi
{{Term|location=frontend|cmd=<code class="command">funk</code> -m <code class="replace">max</code> -w 10:00:00 -e <code class="replace">"2013-12-31 23:59:59"</code> -c -b <code class="replace">genepi</code>}}
{{Term|location=frontend|cmd=<code class="command">funk</code> -m <code class="replace">max</code> -w 10:00:00 -e <code class="replace">"2013-12-31 23:59:59"</code> -c -b <code class="replace">genepi</code>}}
More information on its [https://www.grid5000.fr/mediawiki/index.php/Funk dedicated page].
More information on its [[Funk|dedicated page]].


= OAR in the Grid'5000 API =
= OAR in the Grid'5000 API =
An other way to visualize nodes/jobs status is to use the [https://api.grid5000.fr/stable/ui/jobs.html Grid'5000 API]
An other way to visualize nodes/jobs status is to use the [https://api.grid5000.fr/stable/ui/jobs.html Grid'5000 API]
= OAR database logs =
Grid'5000 gives the possibility to all users to use a read only access to OAR's database. You should be able to connect using PostgresSQL client as user <code>oarreader</code> with password <code>read</code> to database <code>oar2</code> on all <code class=host>oardb.</code><code class=replace>site</code><code class=host>.grid5000.fr</code>. This gives you access to the complete history of jobs on all Grid'5000 sites. This gives you read-only access to the production database of OAR: please be careful with your queries to avoid overloading the testbed!
{{Note|text=Careful: Grid'5000 is not a computation grid, nor HPC center, nor a Cloud (Grid'5000 is a research instrument). That means that the usage of Grid'5000 by itself (OAR logs of Grid'5000 users' reservations) does not reflect a typical usage of any such infrastructure. It is therefore not relevant to analyze Grid'5000 OAR logs to that purpose. As a user, one can however use Grid'5000 to emulate a HPC cluster or cloud on reserved resources (in a job), possibly injecting a real load from a real infrastructure.}}
; Example of access to logs
In this example, we use the PostgresSQL client to generate a CSV file, named '~/oardata.csv', containing all the jobs of the user 'toto'. Each row of the file will be one job of the user. The columns of the CSV file will be the list of nodes assigned to the job, the number of nodes, the number of cores, the cluster name, the submission time, start time and stop time, the job ID, the job name (if any), the job type, the command executed by the job (if any) and the request made by the user.
First, on one of the frontend nodes, launch the client
<syntaxhighlight lang="bash">
psql -h oardb.grenoble.grid5000.fr -U oarreader oar2
</syntaxhighlight>
Then, after entering the password, run the following command (change the user name and the file name if needed):
<syntaxhighlight lang="sql">
\copy (Select string_agg(Distinct host, '/') as hosts, Count(Distinct host) as nb_hosts, Count(Distinct resources.resource_id) as nb_cores,cluster,submission_time,start_time,stop_time,job_id,job_name,job_type,command,initial_request From jobs Inner Join assigned_resources on jobs.assigned_moldable_job = assigned_resources.moldable_job_id Inner Join resources on assigned_resources.resource_id = resources.resource_id Where job_user = 'toto' Group By jobs.submission_time,jobs.start_time,jobs.stop_time,jobs.job_id,jobs.job_name,jobs.job_type,jobs.command,resources.cluster) To '~/oardata.csv' With CSV;
</syntaxhighlight>

Latest revision as of 13:29, 9 April 2024

Note.png Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This tutorial consists of various independent sections describing various details of OAR useful for advanced usage, as well as some tips and tricks. It assumes you are familiar with OAR and Grid5000 basics. If not, please first look at the Getting Started page.

This OAR tutorial focuses on command line usages. It assumes you are using the bash shell (but should be easy to adapt to another shell). It can be read linearly, but you also may pick some random sections. Begin at least by #useful tips.

OAR

Useful tips

  • Take the time to carefully configure ssh, as described in the SSH page.
  • Use screen or tmux so that your work is not lost if you lose the connection to Grid5000. Moreover, having a screen session opened with one or more shell sessions allows you to leave your work session when you want then get back to it later and recover it exactly as you leaved it.
  • Most OAR commands (oarsub, oarstat, oarnodes) can provide output in various formats:
    • text (this is the default mode)
    • PERL dumper (-D)
    • XML (-X)
    • Yaml (-Y)
    • json (-J)
  • Regarding the oarsub command line, you should mostly only see the "host" word, but the oarsub command can use both the word "host" or "nodes" indifferently in Grid'5000, as nodes is just an alias for host. Prefer using "host". Besides, the word "host" is also to be preferred to the longer "network_address" word in the resources filters (both properties have sometime the same value, but not always).
  • At job submission time, only important information are printed out by oarsub. To have more indication about what is done by OAR on Grid'5000 (like computed resource filter, exceptional granted privileges, …) the oarsub verbose (-v) option can be used.
  • A syntax simplification mechanism was deployed on Grid'5000 to ease job submission, described at OAR Syntax simplification.

Connection to the job's nodes

Two commands can be used to connect to nodes on Grid'5000, oarsh and ssh.

Using ssh

ssh can only be used when a node is entirely reserved in your job (all CPU cores). Other cases may not allow assigning processes to the correct job, thus connecting with ssh is not allowed.

For instance, when a node is entirely reserved as follows:

Terminal.png fontend.site:
oarsub -I
# Set walltime to default (3600 s).
OAR_JOB_ID=<JOB_ID>
# Interactive mode: waiting...
# Starting...
user@node-32:~$

If you open a new shell and try to connect to the node with ssh, it should work:

Terminal.png fontend.site:
ssh node-32
Linux node-32.site.grid5000.fr 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64
Debian11-x64-std-2022013022
(Image based on Debian Bullseye for AMD64/EM64T)
Maintained by support-staff <support-staff@lists.grid5000.fr>
 
Last login: Wed Feb 23 15:20:32 2022 from 172.16.31.101
user@node-32:~$

However, when reserving for instance only one CPU core of a node:

Terminal.png fontend.site:
oarsub -I -l core=1
# Set walltime to default (3600 s).
OAR_JOB_ID=<JOB_ID>
# Interactive mode: waiting...
# Starting...
user@node-32:~$

When trying to connect to the node with ssh in another shell, you get:

Terminal.png fontend.site:
ssh node-32
To connect using 'ssh' directly, you must have a single job using all available cores on the node.
Use 'oarsh' instead.
Connection closed by node-32 port 22

Using oarsh

oarsh is a frontend to ssh (the oarsh command wraps the OpenSSH ssh command to add some required functions to connect to a job, but provides mostly the same interface/options).

Note.png Note

Technical note about oarsh internals:

  • It opens an ssh connection transiently as the oar user to the OAR dedicated SSH server running on a node (TCP port 6667)
  • It detects who you are based on the job id ou a job key: if you indeed have the right to connect to the node (you reserved it in an OAR job), it switches back to your user for the execution of the shell or command on the node in the job's context (cgroup/cpuset).

In case of nodes are not entirely reserved (all CPU cores), you have to use the oarsh command to connect to nodes instead of ssh, and oarcp instead of scp to copy files to/from the nodes. If you use taktuk for parallel executions (or a similar tools like pdsh) or rsync to synchronize files to/from a node, you have to configure the connector so the command uses oarsh instead of ssh underneath (see the man pages of the command to find out how to change the connector, e.g. using -c or -e).

Please note that oarsh also works for node entirely reserved in a job.

Splitting job resources

oarsh also allows splitting resources of a job, for instance to execute commands on different subsets of resources in a job (e.g. 1 GPU each instead of all the reserved GPUs).

See an example of using this functionality with GNU Parallel.

About OAR job keys

By default, OAR generates a job key pair for each job. oarsh can use either the OAR_JOB_ID or OAR_JOB_KEY_FILE environment variables to know what job to connect. If outside a job shell (e.g. on the frontend), you have to set one of those environment variable. This is not required if oarsh is called from the shell of a job (e.g. on a node), since variables are already set.

Example using OAR_JOB_ID

For instance, create a job requesting 3 hosts (3 nodes):

Terminal.png fontend.site:
oarsub -I -l host=3
# Set default walltime to 3600.
OAR_JOB_ID=<JOBID>
# Interactive mode: waiting...
# Starting...
...

Then, in another terminal, assuming the 2nd host in the job is named node-2:

Terminal.png fontend.site:
OAR_JOB_ID=JOBID oarsh node-2
Example using OAR_JOB_KEY_FILE

OAR can expose the job key, using the -e option of oarsub

Terminal.png fontend.site:
oarsub -I -l host=3 -e my_job_key

Then, in another terminal, assuming the 2nd host in the job is named node-2:

Terminal.png fontend.site:
OAR_JOB_KEY_FILE=my_job_key oarsh node-2
Note.png Note

Note that the following command also allows getting a shell in a job, but only on the first default resource (i.e. node).

Terminal.png fontend.site:
oarsub -C JOBID

Connecting to a job of a different site

Job keys are especially useful when having to connect from nodes of different sites, since each site is managed by a different OAR instance.

Thus, a convenient way is to tell OAR to always use the same job key for all jobs. You can for instance use your Grid'5000 internal SSH key (This key is generated when your account is created) as the job key: In your ~/.profile or ~/.bash_profile, set:

export OAR_JOB_KEY_FILE=path_to_your_private_key

Then, OAR will always use that key for all jobs, allowing you to connect to your nodes with oarsh seamlessly from sites to sites, jobs to jobs, or even outside jobs.

Warning.png Warning

When using the same job key for 2 jobs that share some nodes (each job reserving part of the nodes), oarsh may not execute in the expected job context (i.e. cgroup/cpuset) as the job key does not differentiate jobs. You may look at the OAR_JOB_ID to notice that.

oarsh vs ssh: tips and tricks

Note.png Note

The following is only interesting if you jobs do not reserve nodes entirely, as using oarsh is useless otherwise

1st tip - hide oarsh, rename it ssh

Creating a symlink from ~/bin/ssh (assuming it is in the execution PATH) to /usr/bin/oarsh allows hidding the wrapper use (as long as the OAR_JOB_ID or OAR_JOB_KEY_FILE environment variables are set when connecting from a frontend to a node).

2nd tip - using ssh directly, without oarsh

If using oarsh does not suit your need, because you would like to use some of the options of ssh that oarsh does not support, you can also connect to reserved nodes by using the real ssh by adding the right set of options to the command. It can also allow a connection to reserved nodes directly from some place where oarsh is not available (e.g. from outside Grid'5000):

Assuming you have a passphrase-less SSH key (preferably just for internal uses in Grid5000), you can tell oarsub to use that key as a job key instead of letting OAR generate a new one (see #sharing keys between jobs). Then you can use that key to connect to nodes, even from outside Grid'5000.

  • Copy the key to your workstation, for instance outside of Grid5000:
Terminal.png workstation:
scp site.g5k:.ssh/your_internal_private_key_file ~/
  • In Grid5000, submit a job using this key:
Terminal.png fontend.site:
oarsub -i ~/.ssh/your_internal_private_key_file -I
  • Wait for the job to start. Then in another terminal, from outside Grid5000, try connecting to the node:
Terminal.png workstation:
ssh -i ~/your_internal_private_key_file -p 6667 [any other ssh options] oar@reserved-node.site.g5k

Finally, this can be hidden in a SSH ProxyCommand (See also SSH#Using_SSH_ProxyCommand_feature_to_ease_the_access_to_hosts_inside_Grid.275000):

After adding the following configuration in your OpenSSH configuration file on your workstation (~/.ssh/config):

Host *.g5koar
ProxyCommand ssh g5k-username@access.grid5000.fr -W "$(basename %h .g5koar):%p"
User oar
Port 6667
IdentityFile ~/your_internal_private_key_file
ForwardAgent no

Warning: the ProxyCommand line works if your login shell is bash. If not you may have to adapt it.

You can just ssh to a reserved node directly from your workstation as follows:

Terminal.png workstation:
ssh reserved-node.site.g5koar

Passive and interactive job modes

Interactive mode

In interactive mode, a shell is opened on the first default resource (i.e. node) of the job (or on the frontend, if the job is of type deploy or cosystem). In interactive mode, the job will be terminated as soon as this job's shell is closed or will be killed earlier if the job's walltime is reached. It can also be killed by an explicit oardel.

You can experiment with 3 shells. On first shell, to see the list of your running jobs, regularly run:

Terminal.png fontend.site:
oarstat -u

To see your own jobs. On the second shell, run an interactive job:

Terminal.png fontend.site:
oarsub -l walltime=walltime -I

Wait for the job to start, run oarstat, then leave the job, run oarstat again. Submit another interactive job, and on the third shell, kill it:

Terminal.png fontend.site:
oardel JOBID

Passive mode

In passive mode, the command that is given to oarsub is executed on the first default resource (i.e. node) of the job (or on the site's frontend if the job is of type deploy or cosystem). The job's duration will be the shorter of the execution time of the command and the job's given walltime. That unless the job is terminated beforehand by an explicit oardel call from the user or administrator.

Terminal.png fontend.site:
oarsub -l host=3,walltime=walltime "command"

command can be a simple script name or a more complex command line with arguments.

To pass arguments, you have to quote the whole command line, like in the following example:

oarsub -l nodes=4,walltime=2 "/path/to/myscript arg1 arg2 arg3"

Note: to avoid random code injection, oarsub allows only alphanumeric characters ([a-zA-Z0-9_]), whitespace characters ([ \t\n\r\f\v]) and few others ([/.-]) inside its command line argument.

Special case for jobs of type noop which are always passive jobs: no command is executed for them. The duration of the job is the given walltime.

Terminal.png fontend.site:
oarsub -t noop -l host=3,walltime=walltime

oardel can also be used to terminate a passive mode reservation. Note that it is only possible to remove the complete reservation, and not individual nodes.

Terminal.png fontend.site:
oardel JOBID

Interactive mode without shell

You may not want a job to open a shell or to run a script when the job starts, for example because you will use the reserved resources from a program whose lifecycle is longer than the job (and which will use the resources by connecting to the job).

One trick to achieve this is to run the job in passive mode with a long sleep command. One drawback of this method is that the job may terminate with status error if the sleep is killed. This can be a problem in some situations, eg. when using job dependencies.

Another solution is to use an advance reservation (see below) with a starting date very close in the future, or even with the current date and time.

Batch jobs vs. advance reservation jobs

Batch jobs

If you do not specify the job's start date (oarsub -r option), then your job is a batch job. It lets OAR choose the best schedule (start date).

  • With batch jobs, you're guaranteed to get the count of allocated resources you requested, because OAR chooses what resources to allocate to the job just before its start. If some resources suddenly become unavailable, OAR changes the assigned resources and/or the start date.
  • Therefore, you cannot get the actual list of resources until the job starts (but a forecast is provided, such as what is shown in the Drawgantt diagrams).
  • With batch jobs, you cannot know the start date of your job until it actually starts (any event can change the forecast). But OAR gives an estimation of the start date (such as shown in the Drawgantt diagram, which also changes after any event).
Advance reservations

If you specify the job's start date, it is an advance reservation. OAR will just try to find resources for the given schedule, fixed by you.

  • The Grid5000 usage policy allows no more than 2 advance reservations per site (excluding reservations that start in less than one hour)
  • With advance reservation jobs, you're not guaranteed to get the count of resources you requested, because OAR planned the allocation of resources at the reservation time.
  • If some resources became unavailable when the job has to start, the job is delayed a bit in case resources may come back (e.g. return from standby).
  • If after 400 seconds, if not all resources are available, the job will start with fewer resources than initially allocated. This is however quite unusual.
  • The list of allocated resources to an advance reservation job is fixed and known as soon as the advance reservation is validated. But you will get the actual list of resources (that is with unavailable resources removed for it) when the advance reservation starts.
  • To coordinate the start date of OAR jobs on several sites, oargrid or funk use advance reservations.

Example: a reservation for a job in one week from now

$ oarsub -r "$(date +'%F %T' --date='+1 week')"

For advance reservations, there is no interactive mode. You can give OAR a command to execute or nothing. If you do not give a command, you'll have to connect to the jobs once the reservation starts (using oarsub -C <jobid> or oarsh).

Why did my advance reservation start with less than all the resources I requested ?

Since resources states are transitional, the advance reservation process considers indifferently the current state of resources, be it alive, suspended or absent. Indeed, at the requested start time of an advance reservation, all resources in any of those states should presumably be back in the alive state.

This is different for resources in the dead state, which mark failed resources. Although dead resources may be repaired at some point, that state is less transitional, so dead resources are excluded from eligible resources for advance reservations.

Also, the allocation of resources to an advance reservation is fixed at the time of validation of the submission (contrarily to batch jobs for which both the start time and allocated resources can change up until the job is effectively started, in order to fit with all requested resources available). As a consequence, resources allocated to an advance reservation which would end up unavailable at the job start time are not replaced by other alive resources.

In fact, at the start time of an advance reservation, OAR looks after any unavailable resources (absent or suspected), and whenever some exists, wait for them to return to the alive state for 5 minutes. Then, if they are not back in time, the job starts with less resources than requested and initially allocated (assuming at least one resource is available).

NB

Information about reduced number of resources or reduced walltime for a reservation due to this mechanism are available in the event part of the output of

oarstat -fj jobid

Getting information about a job

The oarstat command gets jobs informations. By default it lists the current jobs of all users. You can restrict it to your own jobs or someone else's jobs with option -u:

$ oarstat -u

You can get full details of a job:

$ oarstat -fj <JOBID>

If scripting OAR and regularly polling job states with oarstat, you can cause a high load on the OAR server (because default oarstat invocation causes costly SQL request in the OAR database). In this case, you should use option -s which is optimized and only queries the current state of a given job:

$ oarstat -s -j <JOBID>

Complex resources selection

The complete selector format syntax (oarsub -l option) is:

"-l {sql1}/name1=n1/name2=n2+{sql2}/name3=n3/name4=n4/name5=n5+...,walltime=hh:mm:ss"

where

  • sqlN are optional SQL predicates on the resource properties (e.g. mem, ib_rate, gpu_count, ...)
  • nameN=n are the wanted number of given resources of name nameN (e.g. host, cpu, core, disk...).
  • slashes (/) between resources express resource subtree selection
  • + allows aggregating different resource specifications
  • walltime=hh:mm::ss (separated by a comma) sets the job walltime (expected duration), which defaults to 1 hour
List resource properties

You can get the list of resource properties for SQL predicates by running the oarprint -l command on a node:

sagittaire-1 $ oarprint -l
List of properties:
disktype, gpu_count, ...

You can get the property values set to resources using the oarnodes:

flyon $ oarnodes -Y --sql="host = 'sagittaire-1.lyon.grid5000.fr'"

These OAR properties are described in the OAR Properties page.

Note.png Note

A SQL predicate on the resource properties can also be set using the -p <...> syntax, in which case it applies to all aggregated resource specifications. It can also be combined with the -l <...> syntax (curly brackets), for some possible common parts among all aggragates. Please refer to a SQL syntax manual in order to build a correct SQL predicate syntax, which technically speaking is a WHERE clause of a resource selection SQL matching.

Using the resources hierarchies

The OAR resources define implicit hierarchies to be used on the resource requests (oarsub -l). These hierarchies are specific to Grid'5000.

For instance
  • request 1 core on 15 hosts (nodes) on a same cluster (total = 15 cores)
$ oarsub -I -l /cluster=1/host=15/core=1
  • request 1 core on 15 hosts (nodes) on 2 clusters (total = 30 cores)
$ oarsub -I -l /cluster=2/host=15/core=1
  • request 1 core on 2 cpus on 15 hosts (nodes) on a same cluster (total = 30 cores)
$ oarsub -I -l /cluster=1/host=15/cpu=2/core=1
  • request 10 cpus on 2 clusters (total = 20 cpus, the number of hosts and cores depends on the topology of the machines)
$ oarsub -I -l /cluster=2/cpu=10
  • request 1 core on 3 different network switches (total = 3 cores)
$ oarsub -I -l /switch=3/core=1
Examples for GPUs
  • request 3 GPUs on 1 single host (node). Obviously eligible nodes for the job need to have at least 3 GPU.
$ oarsub -I -l host=1/gpu=3
  • request 3 GPUs, possibly on different nodes depending on availability (other jobs, possible resources):
$ oarsub -I -l gpu=3
  • request a full node (possibly featuring more than 3 GPUs) with at lease 3 GPUs:
$ oarsub -p "gpu_count >= 3" -l host=1 [...]
  • In the job, running oarprint as follows shows what GPUs are available in the job:
$ oarprint gpu -P host,gpudevice

(you may also look at nvidia-smi's output)

Valid resource hierarchies are
  • Compute and disk resources
    • both switch > cluster, or cluster > switch can be valid (some clusters spread their hosts (nodes) on many switches, some clusters share a same switch), we note below cluster|switch to reflect that ambiguity.
    • cluster|switch > chassis > host > cpu > gpu > core
    • cluster|switch > chassis > host > disk
  • vlan resources
    • vlan only
  • subnet resources
    • slash16 > slash17 > slash18 > slash19 > slash20 > slash21 > slash22

Of course not all hierarchy levels have to be given in a resource request.

Note.png Note

Please mind that the nodes keyword (plural!) is an alias for host (singular!). A node or host is one server (computer). For instance, -l /cluster=X/nodes=Y/core=Z is exactly the same as -l /cluster=X/host=Y/core=Z.

Selecting resources using properties

The properties of the resources are described in the OAR Properties page.

Selecting nodes from a specific cluster

For example in Nancy:

$ oarsub -I -l {"cluster='graphene'"}/host=2

Or, alternative syntax:

$ oarsub -I -p "cluster='graphene'" -l /host=2
Selecting nodes with a specific CPU architecture

For classical x86_64:

$ oarsub -I -p "cpuarch='x86_64'"

Other architectures are "exotic" so a specific type of job is needed:

$ oarsub -I -t exotic -p "cpuarch='ppc64le'"
Selecting specific nodes

For example in Lyon:

$ oarsub -I -l {"host in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr', 'sagittaire-12.lyon.grid5000.fr')"}/host=1

or, alternative syntax:

$ oarsub -I -p "host in ('sagittaire-10.lyon.grid5000.fr', 'sagittaire-11.lyon.grid5000.fr', 'sagittaire-12.lyon.grid5000.fr')" -l /nodes=1

By negating the SQL clause, you can also exclude some nodes.

Other examples using properties

Ask for 10 cores of the cluster graphene

$ oarsub -I -l core=10 -p "cluster='graphene'"

Ask for 2 nodes with 16384 MB of memory and Infiniband 20G

$ oarsub -I -p "memnode='16384' and ib_rate='20'" -l host=2

Ask for any 4 nodes except graphene-12

$ oarsub -I -p "not host like 'graphene-12.%'" -l host=4

Ask for a node with 32 threads

$ oarsub -I  -p 'thread_count=32'
Examples of joint resources requests

Ask for 2 nodes with virtualization capability, on different clusters + IP subnets:

  • We want 2 nodes (hosts) and 4 /22 subnets with the following constraints:
    • Nodes are on 2 different clusters of the same site (Hint: use a site with several clusters :-D)
    • Nodes have virtualization capability enabled
    • /22 subnets are on two different /19 subnets
    • 2 subnets belonging to the same /19 subnet are consecutive
$ oarsub -I -l /slash_19=2/slash_22=2+{"virtual!='none'"}/cluster=2/host=1

Lets verify the reservation:

 $ uniq $OAR_NODE_FILE
 graphene-43.nancy.grid5000.fr
 graphite-3.nancy.grid5000.fr
 $ g5k-subnets -p
 10.144.32.0/22
 10.144.36.0/22
 10.144.0.0/22
 10.144.4.0/22
 $ g5k-subnets -ps
 10.144.0.0/21
 10.144.32.0/21

Another example, ask for both

  • 1 core on 2 hosts (nodes) on the same cluster with 16384 MB of memory and Infiniband 20G
  • 1 cpu on 2 hosts (nodes) on the same switch with 8 cores processors for a walltime of 4 hours
$ oarsub -I -l "{memnode=16384 and ib_rate='20'}/cluster=1/host=2/core=1+{cpucore=8}/switch=1/host=2/cpu=1,walltime=4:0:0"

Walltime must always be the last argument of -l <...>

Note.png Note

If no resource matches your request, oarsub will exit with the message

# Set default walltime to 3600.
There are not enough resources for your request
OAR_JOB_ID=-5
# Error: oarsub failed, please verify your request syntax.

Handling the resources allocated to my job with oarprint

The oarprint allows to print nicely the resources of a job.

We first submit a job

$ oarsub -I -l host=4
...
OAR_JOB_ID=178361
Retrieve the nodes list

We want the list of the nodes (hosts) we got, identified by unique hostnames

$ oarprint host
sagittaire-32.lyon.grid5000.fr
capricorne-34.lyon.grid5000.fr
sagittaire-63.lyon.grid5000.fr
sagittaire-28.lyon.grid5000.fr

(We get 1 line per host, not per core !)

Retrieve the core list
$ oarprint core
63
241
64
163
243
244
164
242

Obviously, retrieving OAR internal core Id might not help much. Hence the use of a customized output format below.

Retrieve core list with host and cpuset Id as identifier

We want to identify our cores by their associated host names and cpuset Ids:

$ oarprint core -P host,cpuset
capricorne-34.lyon.grid5000.fr 0
sagittaire-32.lyon.grid5000.fr 0
capricorne-34.lyon.grid5000.fr 1
sagittaire-28.lyon.grid5000.fr 0
sagittaire-63.lyon.grid5000.fr 0
sagittaire-63.lyon.grid5000.fr 1
sagittaire-28.lyon.grid5000.fr 1
sagittaire-32.lyon.grid5000.fr 1
A more complex example with a customized output format

We want to identify our cores by their associated host name and cpuset Id, and get the memory information as well, with a customized output format

$ oarprint core -P host,cpuset,memnode -F "NODE=%[%] MEM=%"
NODE=capricorne-34.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-32.lyon.grid5000.fr[0] MEM=2048
NODE=capricorne-34.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-28.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-63.lyon.grid5000.fr[0] MEM=2048
NODE=sagittaire-63.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-28.lyon.grid5000.fr[1] MEM=2048
NODE=sagittaire-32.lyon.grid5000.fr[1] MEM=2048
From the submission frontend

If you are not in a job shell ($OAR_RESOURCE_PROPERTIES_FILE is not defined), running oarprint will give:

$ oarprint 
/usr/bin/oarprint: no input data available

In that case, you can however pipe the output of the oarstat command in oarprint, e.g.:

$ oarstat -j <JOB_ID> -p | oarprint core -P host,cpuset,memnode -F "%[%] (%)" -f -
capricorne-34.lyon.grid5000.fr[0] (2048)
sagittaire-32.lyon.grid5000.fr[0] (2048)
capricorne-34.lyon.grid5000.fr[1] (2048)
sagittaire-28.lyon.grid5000.fr[0] (2048)
sagittaire-63.lyon.grid5000.fr[0] (2048)
sagittaire-63.lyon.grid5000.fr[1] (2048)
sagittaire-28.lyon.grid5000.fr[1] (2048)
sagittaire-32.lyon.grid5000.fr[1] (2048)
List the OAR properties to use with oarprint

Properties are descibed in the OAR Properties page, but they can also be listed using the oarprint -l command:

$ oarprint -l
List of properties:
disktype, gpu_count, ...
Note.png Note

Those properties can also be used in oarsub using the -p switch for instance.

X11 forwarding

X11 forwarding is enabled in the shell opened in interactive job (oarsub -I). X11 forwarding can also be enabled in a shell opened on a node of a job with oarsh, just like with a classic ssh command: The -X or -Y option must be passed to oarsh.

Note.png Note

Please mind that for X11 forwarding to work in the job, X11 forwarding must already work in the shell from which the OAR commands are run. Check the DISPLAY environment variable !

We will use xterm to test X11.

Enabling X11 forwarding up to the frontend

Connect to a frontend with ssh (reminder: read the getting started tutorial about the use of the ssh proxycommand), and make sure the X11 forwarding is operational so far:

Look at the DISPLAY environment variable, which ssh should have set to localhost:10.0 or the like (the 10.0 part may vary from hop to hop in the X11 forwarding chain, with numbers greater than 10).

It requires to use the -X or -Y option in the ssh command line, or to have ForwardX11=yes set in your SSH configuration.

In any case, check:

Terminal.png frontend.site:
echo $DISPLAY
localhost:11.0
Using X11 forwarding in the oarsub job shell

If the DISPLAY environment variable is set in the calling shell, oarsub will automatically enable the X11 forwarding. Verbose oarsub option (-v) is required to have the "Initialize X11 forwarding..." sentence.

Terminal.png frontend.site:
oarsub -v -I -l core=1
# Set default walltime to 3600.
# Computed global resource filter: -p "maintenance = 'NO'"
# Computed resource request: -l {"type = 'default'"}/core=1
# Generate a job key...
OAR_JOB_ID=4926
# Interactive mode: waiting...
# Starting...
# Initialize X11 forwarding...
# Connect to OAR job 4926 via node idpot-8.grenoble.grid5000.fr

Then from the shell of the job, check again the display:

jdoe@idpot-8:~$ echo $DISPLAY
localhost:10.0

And run xterm

jdoe@idpot-8:~$ xterm

Wait for the window to open: it may be pretty long!

Using X11 forwarding in a job via oarsh

With oarsh, the -X or -Y option must be used to enable the X11 forwarding:

Terminal.png frontend.site:
OAR_JOB_ID=4928 oarsh -X idpot-8

Then in the opened shell, you can again check that the DISPLAY is set, and run xterm.

You can also just run the xterm command directly in the oarsh call:

Terminal.png frontend.site:
OAR_JOB_ID=4928 oarsh -X idpot-8 xterm
Using X11 forwarding in a job with a deployed environment

When an interactive job is used to deploy an environment, the spawned shell will not contain the DISPLAY environment variable, even if it was forwarded in the user connection shell.

To use X11 forwarding in this situation, you can open a new (X11 forwarded) shell on the frontend, and then connect to the node using again X11 forwarding.

you can also connect directly to the node from your laptop either by:

  • using the Grid'5000 VPN
  • following the recommendations about a better usage of ssh listed in Getting Started document.
Note.png Note

X11 forwarding will suffer from the latency between your local network and the Grid'5000 network.

  • Mind using a site local access to Grid'5000 to lower that latency: see External access ;
  • And/or prefer using another remote display service, such as VNC for instance

Using best effort mode jobs

The best-effort jobs of OAR are implemented to back-fill the cluster with jobs considered as less important without blocking "regular" jobs. To submit jobs under that policy, you simply have to select the besteffort type of job in your oarsub command.

oarsub -t besteffort script_to_launch

Jobs submitted that way will only get scheduled on resources when no other job use them (any regular job overtake besteffort jobs in the waiting queue, regardless of submission times). Moreover, these jobs are killed (as if oardel were called) when a regular job recently submitted needs the nodes used by a besteffort job.

By default, no checkpointing or automatic restart of besteffort jobs is provided. They are just killed. That is why this mode is best used with a tool which can detect the killed jobs and resubmit them. However OAR2 provides options for that.

Best effort job campaign

One can submit such jobs using the besteffort of job type (or indifferently in the besteffort queue).

For instance you can run a job campaign as follows:

for param in $(< ./paramlist); do
    oarsub -t besteffort -l core=1 "./my_script.sh $param"
done

In this example, the file ./paramlist contains a list of parameters for a parametric application.

The following demonstrates the mechanism.

Note.png Note

Please have a look at the UsagePolicy to avoid abuses.

Best effort job mechanism

Running a besteffort job in a first shell
frennes:~$ oarsub -I -l host=10 -t besteffort
# Set default walltime to 3600.
OAR_JOB_ID=988535
# Interactive mode: waiting...
# Starting...
parasilo-26:~$ uniq $OAR_FILE_NODES
parasilo-26.rennes.grid5000.fr
parasilo-27.rennes.grid5000.fr
parasilo-28.rennes.grid5000.fr
parasilo-3.rennes.grid5000.fr
parasilo-4.rennes.grid5000.fr
parasilo-5.rennes.grid5000.fr
parasilo-6.rennes.grid5000.fr
parasilo-7.rennes.grid5000.fr
parasilo-8.rennes.grid5000.fr
parasilo-9.rennes.grid5000.fr


Running a non best effort job on the same set of resources in a second shell
frennes:~$ oarsub -I -l {"host in ('parasilo-9.rennes.grid5000.fr')"}/host=1
# Set default walltime to 3600.
OAR_JOB_ID=988546
# Interactive mode: waiting...
# [2022-01-10 16:00:07] Start prediction: 2022-01-10 16:00:07 (FIFO scheduling OK)
# Starting...
Connect to OAR job 988546 via the node parasilo-9.rennes.grid5000.fr


As expected, meanwhile the best effort job was stopped (watch the first shell):

parasilo-26:~$ Connection to parasilo-26.rennes.grid5000.fr closed by remote host.
Connection to parasilo-26.rennes.grid5000.fr closed.
# Error: job was terminated.
Disconnected from OAR job 988545

Using the checkpointing trigger mechanism

Writing the test script

Here is a script which features an infinite loop and a signal handler trigged by SIGUSR2 (default signal for OAR's checkpointing mechanism).

#!/bin/bash

handler() { echo "Caught checkpoint signal at: `date`"; echo "Terminating."; exit 0; }
trap handler SIGUSR2

cat <<EOF
Hostname: `hostname`
Pid: $$
Starting job at: `date`
EOF
while : ; do sleep 10; done
Running the job

We run the job on 1 core, and a walltime of 5 minutes, and ask the job to be checkpointed if it lasts (and it will indeed) more than walltime - 150 sec = 2 min 30.

$ oarsub -v -l "core=1,walltime=0:05:00" --checkpoint 150 ./checkpoint.sh 
# Modify resource description with type constraints
OAR_JOB_ID=988555
$
Result

Taking a look at the job output:

$ cat OAR.988555.stdout 
Hostname: parasilo-9.rennes.grid5000.fr
Pid: 12013
Starting job at: Mon Jan 15 14:05:50 CET 2018
Caught checkpoint signal at: Mon Jan 15 14:08:30 CET 2018
Terminating.

The checkpointing signal was sent to the job 2 minutes 30 before the walltime as expected so that the job can finish nicely.

Interactive checkpointing

The oardel command provides the capability to raise a checkpoint event interactively to a job.

We submit the job again

$ oarsub -v -l "core=1,walltime=0:05:0" --checkpoint 150 ./checkpoint.sh 
# Modify resource description with type constraints
OAR_JOB_ID=988560

Then run the oardel -c #jobid command...

$ oardel -c 988560
Checkpointing the job 988560 ...DONE.
The job 988560 was notified to checkpoint itself (send SIGUSR2).

And then watch the job's output:

$ cat OAR.988560.stdout 
Hostname: parasilo-4.rennes.grid5000.fr
Pid: 11612
Starting job at: Mon Jan 15 14:17:25 CET 2018
Caught checkpoint signal at: Mon Jan 15 14:17:35 CET 2018
Terminating.

The job terminated as expected.

Using jobs dependency

A job can wait for the termination of a previous job.

First Job

We run a first interactive job in a first Shell

frennes:~$ oarsub -I 
# Set default walltime to 3600.
OAR_JOB_ID=988571
# Interactive mode: waiting...
# Starting...
parasilo-28:~$

And leave that job pending.

Second Job

Then we run a second job in another Shell, with a dependence on the first one

jdoe@idpot:~$ oarsub -I -a 988571
# Set default walltime to 3600.
OAR_JOB_ID=2071596
# Interactive mode: waiting...
# [2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
Job dependency in action

We do a logout on the first interactive job...

parasilo-28:~$ logout
Connection to parasilo-28.rennes.grid5000.fr closed.
Disconnected from OAR job 988571

... then watch the second Shell and see the second job starting

# [2018-01-15 14:27:08] Start prediction: 2018-01-15 15:30:23 (FIFO scheduling OK)
# Starting...
parasilo-3:~$

Container jobs

With the container job functionality, OAR allows for someone to execute inner jobs within the boundaries of the container job. Inner jobs are scheduled using the same algorithm as other jobs, but restricted to the container job's resources and timespan.

A typical use case is to submit first a container job, then have inner jobs submitted, with referring to the container job_id.

Mind that the inner jobs that will not fit in the container's boundaries will stay in the waiting state in the queue, not scheduled and not executed. They will be deleted when the container job is terminated.

Container jobs are especially useful when organizing tutorial of teaching labs, with the container job created by the organizer, and inner jobs created by the attendees.

Mind that if in your use case, all inner job are to be created by the same user as the container job, it is preferable to use a tool such as GNU Parallel.

Inner job are killed when the container job is terminated.

Note.png Note

A container job must use both the container job type and any of the cosystem or noop job types. This is mandatory for the reason that inner jobs could be of type deploy and reboot the nodes hosting the container itself. container jobs are usable with passive (batch, scripted), interactive (oarsub -I) and advance reservations (oarsub -r <date>) jobs. But inner jobs cannot be advance reservations.

First a job of the type container must be submitted
Terminal.png frontend:
oarsub -I -t cosystem -t container -l host=10,walltime=2:00:00
...
OAR_JOB_ID=42
...
Then it is possible to use the inner type to schedule the new jobs within the previously created container job
Terminal.png frontend:
oarsub -I -t inner=42 -l host=7,walltime=00:10:00
Terminal.png frontend:
oarsub -I -t inner=42 -l host=1,walltime=00:20:00
Terminal.png frontend:
oarsub -I -t inner=42 -l host=10,walltime=00:10:00
Note.png Note

A job created with:

Terminal.png frontend:
oarsub -I -t inner=42 -l host=11
will never be scheduled because the container job "42" only reserved 10 nodes.

cosystem and noop jobs

cosystem
  • Jobs of type cosystem, just like jobs of type deploy, do not execute on the first node assigned to the job but on the frontend.
  • But unlike deploy jobs, cosystem jobs do not grant any special privileges (e.g. no kareboot right).
noop
  • Jobs of type noop do not execute anything at all. They just allocate resources for a time frame.
  • noop jobs cannot be interactive (oarsub -I).
  • noop jobs have the advantage over the cosystem job that they are not affected by a reboot (e.g. due to a maintenance or a failure) of the frontend.

If running a script on the frontend is not required, noop job should probably be preferred over the cosystem jobs.

Changing the walltime of a running job (oarwalltime)

Users can request a extension of the walltime (duration of the resource reservation) of a running job. This can be achieved using the oarwalltime command or Grid'5000's API.

This change can be specified by giving either a new walltime value or an increase (begin with +).

Please note that a request may stay partially or completely unsatisfied if a job is already scheduled to occupy the resources right after the running job.

Job must be running for a walltime change. For Waiting job, delete and resubmit.

Note.png Note

Walltime change is not possible in the production queue (Nancy).

Warning.png Warning

While changes of walltime are not limited a priori (by the oarwalltime command or the API), the resulting characteristics of the jobs must comply with the Grid5000:UsagePolicy. Enforcement checks happen as usual, a posteriori.

Command line interface

Querying the walltime change status:

Terminal.png frontend:
oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
  Current walltime:       1:0:0
  Possible increase:  UNLIMITED
  Already granted:        0:0:0
  Pending/unsatisfied:    0:0:0

Requesting the walltime change:

Terminal.png frontend:
oarwalltime 1743185 +1:30
Accepted: walltime change request updated for job 1743185, it will be handled shortly.

Querying right afterward:

Terminal.png frontend:
oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
  Current walltime:       1:0:0
  Possible increase:  UNLIMITED
  Already granted:        0:0:0
  Pending/unsatisfied:  +1:30:0

The request is still to be handled by OAR's scheduler.

Querying again a bit later:

Terminal.png frontend:
oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
  Current walltime:      2:30:0
  Possible increase:  UNLIMITED
  Already granted:      +1:30:0
  Pending/unsatisfied:    0:0:0

May a job exist on the resources and partially prevent the walltime increase, the query output would be:

Terminal.png frontend:
oarwalltime 1743185
Walltime change status for job 1743185 (job is running):
  Current walltime:      2:30:0
  Possible increase:  UNLIMITED
  Already granted:      +1:10:0
  Pending/unsatisfied:  +0:20:0

Changes events are also reported in oarstat.

See man oarwalltime for more information.

Using the REST API

Requesting the walltime change:

curl -i -X POST https://api.grid5000.fr/stable/sites/grenoble/internal/oarapi/jobs/1743185.json -H'Content-Type: application/json' -d '{"method":"walltime-change", "walltime":"+0:30:0"}'

Querying the status of the walltime change:

curl -i -X GET https://api.grid5000.fr/stable/sites/grenoble/internal/oarapi/jobs/1743185/details.json -H'Content-Type: application/json'

See the walltime-change and events keys of the output.

Restricting jobs to daytime or night/week-end time

To help submitting batch jobs fitting inside the time frames defined in the usage policy (day vs. night and week-end), the types day and night can be used (oarsub -t <type>…).

Submit a job to run during the current day time
Terminal.png frontend:
oarsub -t day

As such:

  • It will be forced to run between 9:00 and 19:00, or the next day if the job is submitted during the night.
  • If the job did not succeed to run before 19:00, it will be deleted.
Submit a job to run during the coming (or current) night (or week-end on Friday)
Terminal.png frontend:
oarsub -t night

As such:

  • It will be forced to run after 19:00, and before 9:00 for week nights (Monday to Thursday nights), or before 9:00 on the next Monday for a job which runs during a week-end.
  • If a job could not be scheduled during the current night (not enough resources available), it will be kept in the queue and then postponed in the morning for a retry the next night (hour constraints will be changed to the next night slot), that for 7 days.
  • If the walltime of the job is more than 13h59, the job will obviously not run before a weekend.
Submit a job to run exclusively during the coming (or current) night (or week-end on Friday)
Terminal.png frontend:
oarsub -t night=noretry

If job is not scheduled and run during the coming (or current) night (or week-end on Friday), it will not be postponed to the next night for a new try, but just set to error.

Note that:

  • the maximum walltime for a night is 14h, but due to some overhead in the system (resources state changes, reboots...), it is strongly advised to limit walltime to at most 13h30. Furthermore, a shorter walltime (max a few hours)? will result in more chances to get a job scheduled in case many jobs are already in queue.
  • jobs with a walltime greater than 14h will be required to run during the week-ends. But even if submitted at the beginning of the week, they will not be scheduled before the Friday morning. Thus, any advance reservation done before Friday will take precedence. Also, given that the rescheduling happens on a daily basis for the next night, advance reservations take precedence if they are submitted before the daily rescheduling. In practice, this mechanism thus provides a low priority way to submit batch jobs during nights and week-ends.
  • a job will be kept 7 days before deletion (if it cannot be run because of lack of resources within a week), unless using night=noretry

Use --project for users in multiple GGAs

A OAR job is linked to the user's Granting Group Access (GGA). Indeed, GGAs are used by OAR for applying the privilege levels defined in the usage policy and for statistics:

  • for users that belong to only one GGA, OAR automatically retrieves the GGA.
  • for users that belong to more than one GGA at once (eg., teaching, multiple affiliations, economic activities), they must use the --project parameter to indicate OAR which GGA is used for the job.

Let us take an example for a user who is a member of group projectA for their research and of group lab-session-B for their teaching. To submit jobs related to their research (GGA projectA), they have to use the following command:

Terminal.png frontend:
oarsub --project=projectA ./myscript.sh

while for reserving nodes for their lab sessions with students (GGA lab-session-B), they will use:

Terminal.png frontend:
oarsub -I -t cosystem -t container -l host=10,walltime=2:00:00 --project=lab-session-B

Note that it is possible to define a default GGA (used by OAR without specifying a GGA with --project parameter) :

  • Login to https://api.grid5000.fr/ui/account
  • Go to the "Groups" tab (on the left menu), then select the default GGA by clicking on the button located on the 'Default' column.

About resources states

OAR resources can be in several states:

  • Alive: Free for use or running a job.
  • Absent: Temporarily unavailable for use, typically because rebooting after a deploy job.
  • Absent/standby: Free for use but not immediately available because shut down. Will be powered on and become Alive whenever needed for a job
  • Suspected: A fault has been detected, the resource is unavailable but it may be repaired soon.
  • Dead: The resource is definitively not available.

Nodes in maintenance are nodes with a Dead state with the maintenance property set to YES.

Multi-site jobs with OARGrid

oargrid alows submitting OAR jobs to several Grid'5000 sites at once.

For instance, we are going to reserve 4 nodes on 3 different sites for half an hour

Terminal.png frontend:
oargridsub -w '0:30:00' SITE1:rdef="/nodes=2",SITE2:rdef="/nodes=1",SITE3:rdef="nodes=1"

Note that in grid reservation mode, no script can be specified. Users are in charge to:

  1. connect to the allocated nodes.
  2. launch their experiment.

OAR Grid connects to each of the specified clusters and makes a passive submission. Cluster job ids are returned by OAR. A grid job id is returned by OAR Grid to bind cluster jobs ids together.

You should see an output like this:

SITE1:rdef=/nodes=2,SITE2:rdef=/nodes=1,SITE3:rdef=nodes=1
[OAR_GRIDSUB] [SITE3] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [SITE3] Reservation success on SITE3 : batchId = SITE_JOB_ID3
[OAR_GRIDSUB] [SITE2] Date/TZ adjustment: 1 seconds
[OAR_GRIDSUB] [SITE2] Reservation success on SITE2 : batchId = SITE_JOB_ID2
[OAR_GRIDSUB] [SITE1] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [SITE1] Reservation success on SITE1 : batchId = SITE_JOB_ID1
[OAR_GRIDSUB] Grid reservation id = GRID_JOB_ID
[OAR_GRIDSUB] SSH KEY : /tmp/oargrid//oargrid_ssh_key_LOGIN_GRID_JOB_ID
       You can use this key to connect directly to your OAR nodes with the oar user.

Fetch the allocated nodes list to transmit it to the script we want to run:

Terminal.png frontend:
oargridstat -w -l GRID_JOB_ID | sed '/^$/d' > ~/machines
Note.png Note

The -w command-line argument makes oargridstat wait for the start of every cluster reservation.

  • Nodes list can be incomplete otherwise.

(1) Select the node to launch the script (ie: the first node listed in the ~/machines file).

If (and only if) this node does not belong to the site where the ~/machines file was saved, copy the ~/machines to this node:

Terminal.png frontend:
OAR_JOB_ID=SITE_JOB_ID oarcp -i /tmp/oargrid/oargrid_ssh_key_LOGIN_GRID_JOB_ID ~/machines `head -n 1 machines`:

(2) Connect to this node using oarsh:

Terminal.png frontend:
OAR_JOB_ID=SITE_JOB_ID oarsh -i /tmp/oargrid/oargrid_ssh_key_LOGIN_GRID_JOB_ID `head -n 1 machines`
Note.png Note

Do not forget to indicate the location of the temporary private key generated by the oargridsub command when you want to connect to one of your allocated nodes

  • In previous snippets, this is done by using the -i option.

And then run the script:

Terminal.png node:
~/hello/helloworld ~/machines


The Grid counterpart of oarstat gives information about the grid job:

Terminal.png frontend:
oargridstat GRID_JOB_ID

Our grid submission is interactive, so its end time is unrelated to the end time of our script run. The submission ends when the submission owner requests that it ends or when the submission deadline is reached.

We are going to ask for our submission to end:

Terminal.png frontend:
oargriddel GRID_JOB_ID

Funk

funk is grid resources discovery tool that works at nodes level and generate complex oarsub/oargridsub commands. It can help you in three cases:

  • to know the number of nodes availables for 2 hours at run time, on sites lille, rennes and on clusters taurus and suno
Terminal.png frontend:
funk -m date -r lille,rennes,taurus,suno -w 2:00:00
  • to know when 40 nodes on sagittaire and 4 nodes on taurus will be available, with deploy job type and a subnet
Terminal.png frontend:
funk -m free -r sagittaire:40,taurus:4 -o "-t deploy" -n slash_22=2
  • to find the time when the maximum number of nodes are available during 10 hours, before next week deadline, avoiding usage policy periods, and not using genepi
Terminal.png frontend:
funk -m max -w 10:00:00 -e "2013-12-31 23:59:59" -c -b genepi

More information on its dedicated page.

OAR in the Grid'5000 API

An other way to visualize nodes/jobs status is to use the Grid'5000 API

OAR database logs

Grid'5000 gives the possibility to all users to use a read only access to OAR's database. You should be able to connect using PostgresSQL client as user oarreader with password read to database oar2 on all oardb.site.grid5000.fr. This gives you access to the complete history of jobs on all Grid'5000 sites. This gives you read-only access to the production database of OAR: please be careful with your queries to avoid overloading the testbed!

Note.png Note

Careful: Grid'5000 is not a computation grid, nor HPC center, nor a Cloud (Grid'5000 is a research instrument). That means that the usage of Grid'5000 by itself (OAR logs of Grid'5000 users' reservations) does not reflect a typical usage of any such infrastructure. It is therefore not relevant to analyze Grid'5000 OAR logs to that purpose. As a user, one can however use Grid'5000 to emulate a HPC cluster or cloud on reserved resources (in a job), possibly injecting a real load from a real infrastructure.

Example of access to logs

In this example, we use the PostgresSQL client to generate a CSV file, named '~/oardata.csv', containing all the jobs of the user 'toto'. Each row of the file will be one job of the user. The columns of the CSV file will be the list of nodes assigned to the job, the number of nodes, the number of cores, the cluster name, the submission time, start time and stop time, the job ID, the job name (if any), the job type, the command executed by the job (if any) and the request made by the user.

First, on one of the frontend nodes, launch the client

 psql -h oardb.grenoble.grid5000.fr -U oarreader oar2

Then, after entering the password, run the following command (change the user name and the file name if needed):

 \copy (Select string_agg(Distinct host, '/') as hosts, Count(Distinct host) as nb_hosts, Count(Distinct resources.resource_id) as nb_cores,cluster,submission_time,start_time,stop_time,job_id,job_name,job_type,command,initial_request From jobs Inner Join assigned_resources on jobs.assigned_moldable_job = assigned_resources.moldable_job_id Inner Join resources on assigned_resources.resource_id = resources.resource_id Where job_user = 'toto' Group By jobs.submission_time,jobs.start_time,jobs.stop_time,jobs.job_id,jobs.job_name,jobs.job_type,jobs.command,resources.cluster) To '~/oardata.csv' With CSV;