GNU Parallel
This page describes the use of GNU Parallel on Grid'5000.
Quoting the GNU Parallel website:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
For more general and complete information, see the GNU Parallel website.
This page details Grid'5000-specific information to help you benefit from the tool on the platform.
About the GNU Parallel version installed in Grid'5000
The version of GNU Parallel installed on Grid'5000 nodes comes from Debian's official packaging.
It is a rather old version, but it seems sufficient.
If you need a more recent version, you can download the tarball from http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2 and install it in your home directory. This is straightforward (e.g. ./configure --prefix=$HOME/parallel && make install).
(An environment module could be provided if requested by some users.)
Benefits of using GNU Parallel on Grid'5000
Although OAR is the Resource and Job Management System of Grid'5000 and supports managing batches of jobs, using it may be overkill for SPMD parallel executions of small tasks within a larger reservation. In concrete terms, a user may create a first OAR job to book a large set of resources for some time (e.g. for the night), and then need to submit a batch of many small tasks (e.g. each using only one core) within that first job.
For that purpose, using an OAR container for the first job and OAR inner jobs for the small tasks is overkill. (Note, however, that OAR container and inner jobs make sense when the jobs do not all belong to the same user, for tutorials for instance.)
We strongly advise using GNU Parallel to handle the execution of the small tasks within the initial OAR reservation of resources. That means creating only one OAR job to book the large set of resources (without the container job type), then using GNU Parallel within this job.
Note that when GNU Parallel handles the small tasks, the OAR restrictions (e.g. a maximum of 200 jobs in the queue) do not apply to them.
How to use GNU Parallel on Grid'5000
GNU Parallel must be used within an OAR job: GNU Parallel does not book resources, it only manages concurrent execution on already reserved resources.
- Single node
Within a job of a single node (host), GNU Parallel needs no Grid'5000-specific setup to exploit all the cores of the node. See the GNU Parallel documentation or manual page for more information.
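As a minimal sketch of the single-node case, the helper below runs one command per input on the local node. By default GNU Parallel starts one job per CPU core (or hardware thread, depending on the version), so no extra option is passed. The function name run_on_local_cores and the script ./my_task.sh are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND once per remaining argument, locally.
# Usage: run_on_local_cores COMMAND ARG...
run_on_local_cores() {
    cmd=$1
    shift
    # ":::" separates the command from its list of inputs.
    # "{}" is replaced by the current input for each task.
    parallel "$cmd" {} ::: "$@"
}

# Example, inside a single-node OAR job (./my_task.sh is a placeholder):
#   run_on_local_cores ./my_task.sh input1 input2 input3
```

Calling parallel directly with the same arguments is equivalent; the wrapper only makes the shape of the command line explicit.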
- Multiple nodes
Within a job of several nodes (hosts), the user needs to tell GNU Parallel how to execute on the nodes reserved in the OAR job.
- The user has to provide the list of target nodes to execute on, in a file passed to the GNU Parallel --slf option.
- The user has to use the oarsh connector (unless the -t allow_classic_ssh OAR job type was used), by passing it to the GNU Parallel --ssh option.
Examples
Example of use of GNU Parallel in a multi-node reservation
- Create an OAR job of 10 nodes
We create an interactive job for this example, so that the commands below are executed in the opened job shell.
- Create the sshlogin file for GNU Parallel from OAR_NODEFILE
The OAR node file contains one line per core of each node, so each node name is repeated as many times as the node has cores.
For a basic sshlogin file, GNU Parallel needs one line per host. It computes by itself how many tasks to run on each host, based on the host's hardware thread count.
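One possible way to build such a file is to deduplicate the OAR node file, as sketched below. The helper name make_sshlogin_file and the output file name nodes.txt are assumptions chosen for this example; any file name works.

```shell
#!/bin/sh
# Hypothetical helper: write one line per distinct host of NODEFILE to OUTFILE.
# Usage: make_sshlogin_file NODEFILE OUTFILE
make_sshlogin_file() {
    # sort -u drops the per-core duplicates of each node name.
    sort -u "$1" > "$2"
}

# Inside the OAR job shell (nodes.txt is an arbitrary file name):
#   make_sshlogin_file "$OAR_NODEFILE" "$HOME/nodes.txt"
```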
- Run parallel with the --ssh and --slf options
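A minimal sketch of this step follows, assuming an sshlogin file with one line per reserved host already exists. The helper name run_on_reserved_nodes, the file name nodes.txt and the example command are hypothetical; only the --ssh and --slf options and the oarsh connector come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND once per remaining argument, spread over
# the hosts listed in SSHLOGINFILE.
# Usage: run_on_reserved_nodes SSHLOGINFILE COMMAND ARG...
run_on_reserved_nodes() {
    slf=$1
    cmd=$2
    shift 2
    # --ssh oarsh : connect to the nodes with the OAR connector instead of ssh
    # --slf FILE  : sshlogin file listing the target hosts, one per line
    parallel --ssh oarsh --slf "$slf" "$cmd" {} ::: "$@"
}

# Example, inside the multi-node job shell (nodes.txt is an assumed file name):
#   run_on_reserved_nodes "$HOME/nodes.txt" 'hostname; echo task' 1 2 3 4
```

If the job was submitted with the -t allow_classic_ssh type, the --ssh oarsh part can be left out.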
