This practice is about running jobs on a cluster. You will learn how to access a Grid'5000 cluster, how to install your data and how to run your jobs and visualize them.
We recommend that you open at least 2 terminals:
You're advised to look at the OAR2#Quick_Glossary for definitions of terms used below.
Prepare the experiment environment
In this tutorial, we are going to use a very simple program using the OpenMPI library. This hello world example only prints the rank of each parallel process and the name of the node it is running on.
- To simulate a computation, each process sleeps for 60 seconds.
The program to run is stored outside of Grid'5000. Thus, we need to retrieve and install it on the cluster where we are currently connected.
Check proxy configuration
To retrieve data from a web site outside Grid'5000, you must use the web proxy.
Only a few external web sites are reachable from inside Grid'5000.
We will use the wget command to retrieve our data from Internet. To use a web proxy with wget, the
$https_proxy environment variables must be defined.
By default, those
$https_proxy environment variables are not set, to avoid any confusion when using HTTP to connect to resources inside Grid'5000. Please run the following command to check your current environment:
You should get:
For this tutorial we use the web proxy, so we need to set the 2 environment variables:
We can now verify that the environment is indeed modified:
You should get:
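The check-then-set sequence described in this section can be sketched as follows. This is only an illustration: wget reads the lowercase http_proxy and https_proxy variables, but the proxy address used here is a made-up placeholder, and the helper names show_proxy and set_proxy are ours, not part of Grid'5000.

```shell
#!/bin/bash
# Report whether the proxy variables are set in the current environment.
show_proxy() {
    echo "http_proxy=${http_proxy:-<unset>}"
    echo "https_proxy=${https_proxy:-<unset>}"
}

# Export both proxy variables for the current shell session.
# $1: proxy URL, to be replaced by the address documented for your site.
set_proxy() {
    export http_proxy="$1"
    export https_proxy="$1"
}

show_proxy
set_proxy "http://proxy.example.invalid:3128"   # hypothetical proxy address
show_proxy
```

Because the variables are only exported in the current shell, they disappear when the terminal is closed, which keeps the default (proxy-less) behaviour for later sessions.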
The experiment data are stored on the project's repository at INRIA Gforge.
(1) Copy them onto the
(2) Unpack experiment data:
The ~/hello/ directory has been created in our home directory.
It will be available on every node of the site's clusters because home directories are NFS-mounted.
If your experiment performs many writes, you are advised to do them on the local disk space of the nodes instead of your network-shared home directory.
This way, you will avoid many NFS troubles, such as lags or breakdowns.
The NFS service is shared among all users and all compute nodes, so its performance may vary independently of your experiment.
To ensure the reproducibility of your experiment, avoid measurements that could depend on the performance of a shared NFS server!
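One way to follow this advice is to write into a temporary directory on the node and copy the results to the home directory only once, at the end. The sketch below assumes that /tmp (or $TMPDIR) is on the node's local disk, which should be checked for your cluster. The helper names and the sample workload are hypothetical.

```shell
#!/bin/bash
# Run a command with a node-local scratch directory, then copy results away.
# $1: destination directory (for example somewhere under the NFS home)
# $2...: command to run; it receives the scratch directory as its last argument
run_in_local_scratch() {
    local dest="$1"
    shift
    local scratch status
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/experiment.XXXXXX") || return 1
    "$@" "$scratch"
    status=$?
    # A single bulk copy at the end replaces many small writes over NFS.
    mkdir -p "$dest" && cp -r "$scratch"/. "$dest"/
    rm -rf "$scratch"
    return $status
}

# Hypothetical workload: writes a few small files into the directory it is given.
write_samples() {
    local dir="$1" i
    for i in 1 2 3; do
        echo "sample $i" > "$dir/sample_$i.txt"
    done
}

# Usage (inside a job):
# run_in_local_scratch "$HOME/hello/results" write_samples
```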
Experiment data are now ready. Before running the experiment in a job, we are now going to look at the cluster state, which can be visualized in many ways.
Scheduled or running jobs
oarstat is a command-line tool to view current or planned job submissions.
View all submissions:
View the details of each submission:
View the details of a specific submission:
View the status of a specified job:
View all submissions from a given user:
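The views listed above map onto a handful of oarstat options. The sketch below only prints the command lines so that it can be read without access to a cluster. It assumes the usual oarstat options (-f for full details, -j for a job, -s for the state, -u for a user). The view names, the job id and the login are made up for the example.

```shell
#!/bin/bash
# Print the oarstat command line for a given view instead of running it.
# $1: view name (hypothetical names chosen for this sketch)
# $2: job id or login, when the view needs one
oarstat_cmd() {
    case "$1" in
        all)         echo "oarstat" ;;
        all-details) echo "oarstat -f" ;;
        job-details) echo "oarstat -f -j $2" ;;
        job-state)   echo "oarstat -s -j $2" ;;
        user)        echo "oarstat -u $2" ;;
        *)           echo "unknown view: $1" >&2; return 1 ;;
    esac
}

oarstat_cmd all
oarstat_cmd job-details 12345   # 12345 is a made-up job id
oarstat_cmd user jdoe           # jdoe is a made-up login
```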
oarnodes is also a command-line tool. It shows cluster node properties:
Among returned information there is current node state. This state is generally
Absent. When nodes are sick, their state is
oarprint is a tool that pretty-prints the resources of a job.
The command prints a sorted output of the resources of a job with regard to a key property, with a customisable format.
On a job connection node (where
$OAR_RESOURCE_PROPERTIES_FILE is defined):
The following command must be executed in an OAR job (on a compute node)
On the submission
For now, you can test this tool using the second command with a
OAR_JOB_ID obtained with
Current nodes states
Monika is a web interface which in a way synthesizes information given by
- Current nodes states
- Scheduled or running submissions (at the bottom).
Drawgantt is a web interface that prints past, current and planned node states on a temporal diagram.
Node load, memory usage, cpu usage and so on are available with the Ganglia web interface: https://helpdesk.grid5000.fr/ganglia/
By default, current metrics are displayed but you can have an aggregated view of up to 1 year in the past.
OAR2, the Grid'5000 batch scheduler, has an interactive mode. This mode connects the user to the first of their allocated nodes.
Submit an interactive job:
OAR2 returns a unique numeric Id that identifies our submission:
The -I option automatically connects you to the job's first node.
OAR2 sets several environment variables that can be used by scripts to get parametrized by the current submission properties:
In particular, the list of your dedicated nodes can be viewed:
Sometimes node names are duplicated inside the
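Duplicated node names can be collapsed with standard text tools. The sketch below assumes a node file with one line per allocated core, such as the file referenced by $OAR_NODEFILE inside a job. The helper names are ours.

```shell
#!/bin/bash
# List each node once, given a node file that may repeat node names.
unique_nodes() {
    sort -u "$1"
}

# Print "<node> <count>" for each node, where count is the number of lines
# (typically cores) that the node contributes to the file.
cores_per_node() {
    sort "$1" | uniq -c | awk '{print $2, $1}'
}

# Usage (inside a job):
# unique_nodes "$OAR_NODEFILE"
# cores_per_node "$OAR_NODEFILE"
```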
It is time to run our script:
Results are going to be printed on the standard output.
Submission is visible on the Monika web interface of the site where it was submitted:
Cluster status cannot be obtained from the command line on the node where OAR connected you.
You need another terminal connected to the
With an interactive submission, the end of the job:
- Is not related to the execution lifespan of your scripts.
- Depends on the connection to the job's first node that OAR made for you.
Thus you can run as many scripts as you want until the job deadline.
The default walltime of a submission is 1 hour. If your connection is still open after that deadline, OAR2 cuts it off automatically.
You can kill your submission by quitting the shell opened by OAR.
But the submission should no longer appear when requesting the current cluster status:
The nodes dedicated to the job should return to the available state.
OAR2 can also be used in passive mode:
- A script is passed as a parameter.
- It will be executed on the reservation's head.
- It must know about the other dedicated nodes to split its work between them.
Submit a passive job with our script:
Environment variables, described during our interactive submission, are also set for passive jobs.
Do the same as for interactive submission.
The following script launches a passive job that will execute the
hello_mpi program and waits until the job starts.
#!/bin/bash
my_script="~/hello/run_hello_mpi"
oar_job_id=`oarsub $my_script | grep "OAR_JOB_ID" | cut -d '=' -f2`
oar_stdout_file="OAR.$oar_job_id.stdout"
until oarstat -s -j $oar_job_id | grep Running ; do
    echo "Job (id: $oar_job_id) is waiting..."
    sleep 1
done
echo "Job $oar_job_id is started !"
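The script above polls forever if the job never reaches the Running state (for instance if it is cancelled). A possible variant, shown here as a sketch rather than as the tutorial's own script, bounds the number of attempts. The status command is passed as an argument so that the function can be tried without a cluster. The function name is hypothetical.

```shell
#!/bin/bash
# Poll a status command until its output contains "Running", or give up.
# $1: maximum number of attempts
# $2: delay in seconds between attempts
# $3...: command that prints the job state (e.g. oarstat -s -j <id>)
wait_until_running() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@" 2>/dev/null | grep -q "Running"; then
            echo "running after $attempt attempt(s)"
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "gave up after $max_attempts attempt(s)" >&2
    return 1
}

# Usage (on a frontend, with the job id returned by oarsub):
# wait_until_running 600 1 oarstat -s -j "$oar_job_id"
```

The return status lets a calling script decide what to do when the job did not start in time, for example print a message and exit.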
To ease post-run analysis, OAR will by default redirect standard output and standard error output
They should be seen in your current working directory after job's end:
Thus you can follow the output of your job as it is running:
OAR lets you specify output files for the standard output and standard error streams by using options
For example it is possible to redirect the oarsub output in /dev/null or in other files:
The results of our job are available at the end of the files containing the standard (and error) outputs:
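As an illustration, the helpers below build the default output file names for a job id and show the end of the standard output file. The OAR.<job id>.stdout pattern is the one used by the script earlier in this section. The matching .stderr name is assumed to follow the same pattern. All function names are hypothetical.

```shell
#!/bin/bash
# Build the default output file names for a job id.
# $1: job id, $2: directory the job was submitted from (defaults to ".")
job_stdout_file() { echo "${2:-.}/OAR.$1.stdout"; }
job_stderr_file() { echo "${2:-.}/OAR.$1.stderr"; }   # assumed naming

# Show the last lines of a job's standard output, if the file exists yet.
# $1: job id, $2: directory, $3: number of lines (defaults to 10)
show_job_result() {
    local file
    file=$(job_stdout_file "$1" "$2")
    if [ -f "$file" ]; then
        tail -n "${3:-10}" "$file"
    else
        echo "no output yet: $file" >&2
        return 1
    fi
}

# Usage, with a made-up job id:
# show_job_result 12345
# tail -f "$(job_stdout_file 12345)"   # follow the output while the job runs
```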
Connection to a running job
While a job is running, it is possible to connect inside its environment with
-C OAR_JOB_ID option.
Unless you specify a submission with
-t allow_classic_ssh you have to use the OAR shell to connect to
Node number specification
Unless specified, submissions request the default resource quantity: 1 node.
To submit an interactive job on 2 nodes:
We are automatically connected to the reservation's head (one of the 2 nodes) due to the interactive submission.
We can learn about our dedicated resources:
As you can read, the example script detects the available nodes and adapts itself to use all the CPUs:
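The adaptation mentioned above can be sketched as follows: count the lines of the node file (one per available core) and start that many MPI processes. This is an illustration of the idea, not the content of the tutorial's script. The function only prints the mpirun command line, and the program path in the usage line is an assumption.

```shell
#!/bin/bash
# Print an mpirun command line that starts one process per line of the node file.
# $1: node file (one line per core), $2: MPI program to launch
mpirun_cmd() {
    local nodefile="$1" program="$2" nprocs
    nprocs=$(wc -l < "$nodefile" | tr -d ' ')
    echo "mpirun -np $nprocs -machinefile $nodefile $program"
}

# Usage (inside a job; the program path is hypothetical):
# mpirun_cmd "$OAR_NODEFILE" ./hello_mpi
```

With 2 nodes of 2 cores each, the node file would hold 4 lines and the printed command would request 4 processes.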
We can verify that the run occurs on the other node with another terminal:
With this functionality it is possible to execute jobs within another one.
- So it is like a sub-scheduling mechanism.
(1) Submit a job of type
oarsub returns the
OAR_JOB_ID of the container job.
(2) From the frontend in a new terminal, it's possible to use the
inner type to schedule new jobs within:
Inner jobs have to be submitted with:
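To make the two steps concrete, the sketch below prints a container submission and an inner submission that refers to it. It assumes the job types are selected with -t container and -t inner=<container job id>. The node counts, walltimes and job id are made-up values, and the helper names are ours.

```shell
#!/bin/bash
# Print the submission of a container job.
# $1: number of nodes, $2: walltime as H:MM:SS
container_cmd() {
    echo "oarsub -I -t container -l nodes=$1,walltime=$2"
}

# Print the submission of a job scheduled inside a container job.
# $1: OAR_JOB_ID of the container, $2: number of nodes, $3: walltime
inner_cmd() {
    case "$1" in
        ''|*[!0-9]*) echo "container job id must be a number: '$1'" >&2; return 1 ;;
    esac
    echo "oarsub -I -t inner=$1 -l nodes=$2,walltime=$3"
}

container_cmd 4 2:00:00
inner_cmd 12345 2 0:30:00   # 12345 stands for the container's OAR_JOB_ID
```

An inner job should presumably request no more nodes and no more time than its container provides, since it is scheduled within it.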
Until now, our submissions used the default start time (now) and the default duration (1 hour). OAR can of course let you choose a specific duration and a delayed start: advance reservations.
To run the job on April 19th, 5:30pm for 10 minutes:
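An advance reservation of this kind can be sketched as below, assuming the oarsub -r option with a "YYYY-MM-DD HH:MM:SS" start date and a walltime given with -l. The function only prints the command line. The year in the usage line is arbitrary, since the tutorial does not state one.

```shell
#!/bin/bash
# Print an advance-reservation command line.
# $1: start date as "YYYY-MM-DD HH:MM:SS"
# $2: walltime as H:MM:SS
# $3: script to run
reservation_cmd() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\ [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) ;;
        *) echo "expected a date like 'YYYY-MM-DD HH:MM:SS', got: '$1'" >&2; return 1 ;;
    esac
    echo "oarsub -r '$1' -l walltime=$2 $3"
}

# April 19th, 5:30pm, for 10 minutes (the year is a placeholder):
reservation_cmd "2030-04-19 17:30:00" "0:10:00" "~/hello/run_hello_mpi"
```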
You can do advance reservations without specifying a script.
The delayed submission appears as Scheduled on Status#Monika of the site where it was submitted.
When the job starts, you can connect to the reservation's head to interactively run the script or monitor its run:
The submission does not end when you disconnect from the reservation's head (even if you did not specify a script to run).
The submission ends when:
- The specified script ends.
- The job hits its walltime.
- The job is explicitly terminated.
If you did not specify a script to run and you finished before the job's walltime, it is a good idea to release the allocated nodes earlier.
To terminate a job:
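Jobs are deleted with the oardel command followed by the job id. The small sketch below prints that command line after checking that the id looks like a number, which guards against passing an empty variable by mistake. The function name and the job id are hypothetical.

```shell
#!/bin/bash
# Print the command that terminates a job, after validating the job id.
# $1: OAR_JOB_ID of the job to terminate
terminate_cmd() {
    case "$1" in
        ''|*[!0-9]*) echo "job id must be a number: '$1'" >&2; return 1 ;;
    esac
    echo "oardel $1"
}

terminate_cmd 12345   # 12345 is a made-up job id
```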
Same things, but at Grid level: Grid experiment