GNU Parallel

From Grid5000
Revision as of 19:39, 21 March 2020 by Pneyron

This page describes the use of GNU Parallel on Grid'5000.

Quoting the GNU Parallel website:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.

For more general and complete information, see the GNU Parallel website.

This page details the Grid'5000-specific information needed to take advantage of the tool on the platform.

About the GNU Parallel version installed in Grid'5000

The version of GNU Parallel installed on Grid'5000 nodes comes from Debian's official packaging.

It is a rather old version, but the examples below work with it.

If a more recent version is needed, one can download the tarball provided at http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2 and install it in the home directory, which is straightforward (e.g. ./configure --prefix=$HOME/parallel && make install). An environment module could be provided if requested by some users.
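The installation in the home directory could look like the sketch below. The function name install_parallel_in_home, the use of wget and the temporary build directory are assumptions made for illustration; only the tarball URL and the configure prefix come from this page.

```shell
# Hypothetical helper: build GNU Parallel from the latest tarball into a
# prefix under the home directory (default: $HOME/parallel).
install_parallel_in_home() {
    prefix=${1:-$HOME/parallel}
    builddir=$(mktemp -d) || return 1

    # Download the tarball mentioned above (assumes wget is available).
    wget -O "$builddir/parallel-latest.tar.bz2" \
        http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2 || return 1

    # --strip-components=1 (GNU tar) drops the versioned top-level directory,
    # so the sources land directly in $builddir.
    tar -xjf "$builddir/parallel-latest.tar.bz2" -C "$builddir" \
        --strip-components=1 || return 1

    # Configure, build and install in a subshell so the caller's working
    # directory is left unchanged.
    (cd "$builddir" && ./configure --prefix="$prefix" && make && make install) || return 1

    rm -rf "$builddir"
    echo "Installed in $prefix; add $prefix/bin to your PATH to use it."
}
```

Calling install_parallel_in_home without argument would install under $HOME/parallel; running export PATH=$HOME/parallel/bin:$PATH afterwards makes the newer version take precedence over the packaged one.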

Benefits of using GNU Parallel on Grid'5000

OAR is the Resource and Job Management System of Grid'5000 and supports the management of batches of jobs, but using it to handle SPMD parallel executions of small tasks within a larger reservation may be overkill. In concrete terms, a user may create a first OAR job to book a large set of resources for some time (e.g. for the night), and then need to submit a batch of many small tasks (e.g. each using only one core) within that first job. For that purpose, using an OAR container job for the first job and inner jobs for the small tasks is overkill.

We strongly advise using GNU Parallel to handle the execution of the small tasks within the initial reservation of resources. That means creating only one OAR job to book the large set of resources (without the container job type), then using GNU Parallel within this job.

How to use GNU Parallel on Grid'5000

GNU Parallel must be used within an OAR job. GNU Parallel does not book resources; it only manages the concurrent execution of tasks on resources that are already reserved.
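A script that relies on GNU Parallel can guard against being started outside of a job. The sketch below assumes that the OAR_NODEFILE variable is only set inside an OAR job; the function name check_oar_job is hypothetical.

```shell
# Hypothetical guard: succeed only if the shell runs inside an OAR job,
# detected here by a readable node file pointed to by OAR_NODEFILE.
check_oar_job() {
    if [ -n "${OAR_NODEFILE:-}" ] && [ -r "$OAR_NODEFILE" ]; then
        return 0
    fi
    echo "Not inside an OAR job: reserve resources with oarsub first." >&2
    return 1
}
```

A script would typically start with check_oar_job || exit 1 before calling parallel.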

Single-node

Within a job of only one node (host), using GNU Parallel to exploit all the cores of the node requires nothing specific to Grid'5000. See the GNU Parallel documentation.
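As an illustration, a single-node run might look like the sketch below. The function name run_single_node and the echo task are placeholders; -j, {} and ::: are standard GNU Parallel syntax, and -j is given explicitly here only to make the job count visible.

```shell
# Hypothetical wrapper: run one placeholder task per argument on the local node.
run_single_node() {
    # -j sets the number of concurrent jobs; nproc prints the number of
    # processing units available on the node.
    # {} is replaced by each argument given after :::
    parallel -j "$(nproc)" 'echo {} processed on $(hostname)' ::: "$@"
}
```

For instance, run_single_node input1 input2 input3 would run three tasks concurrently, as long as the node has at least three processing units.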

Multi-node

Within a job of several nodes (hosts), the user needs to tell GNU Parallel how to execute on the resources reserved by the OAR job.

  1. The user has to provide the list of target nodes to execute on, by passing it to the GNU Parallel --slf option.
  2. The user has to use the oarsh connector (unless the job was submitted with the -t allow_classic_ssh OAR job type), by passing it to the GNU Parallel --ssh option.

Examples

Example of using GNU Parallel in a multi-node reservation.
  • Create an OAR job of 10 nodes:
frontend:
oarsub -I -l host=10

We create an interactive job for this example, so that the commands below are executed in the shell opened by the job.

  • Create the sshlogin file for GNU Parallel from OAR_NODEFILE:

The OAR node file lists each node name once per core of that node, so the duplicate lines have to be removed.

node:
uniq $OAR_NODEFILE > nodes
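To see what this step does without a reservation, the sketch below applies it to a simulated node file. The host names node-1 and node-2 and the temporary files are made up for the illustration. It uses sort -u instead of uniq, which gives the same result here and also works if identical lines are not adjacent.

```shell
# Simulated node file (hypothetical host names): one line per reserved core.
nodefile=$(mktemp)
printf '%s\n' node-1 node-1 node-1 node-1 node-2 node-2 node-2 node-2 > "$nodefile"

# Keep exactly one line per host.
sshloginfile=$(mktemp)
sort -u "$nodefile" > "$sshloginfile"

cat "$sshloginfile"
# prints:
# node-1
# node-2
```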

For a basic sshlogin file, GNU Parallel needs one line per host. It works out by itself how many tasks to execute on each host, based on the number of hardware threads of the host.
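When the number of concurrent tasks per host should instead follow the number of lines in the OAR node file, the sshlogin file can carry an explicit count in the ncpu/host form accepted by GNU Parallel. This variant is not part of the procedure described on this page; the sketch below shows it on a simulated node file with hypothetical host names.

```shell
# Simulated node file (hypothetical host names): 2 cores on node-1, 3 on node-2.
nodefile=$(mktemp)
printf '%s\n' node-1 node-1 node-2 node-2 node-2 > "$nodefile"

# Count the lines per host and emit "count/host" entries,
# the [ncpu/]host form of an sshlogin file line.
slotsfile=$(mktemp)
sort "$nodefile" | uniq -c | awk '{ print $1 "/" $2 }' > "$slotsfile"

cat "$slotsfile"
# prints:
# 2/node-1
# 3/node-2
```

Inside a real job, "$OAR_NODEFILE" would replace the simulated file, and the resulting file would be passed to --slf.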

  • Run GNU Parallel with the --ssh and --slf options:
node:
parallel --progress --ssh $(which oarsh) --slf nodes command ::: argument
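In the command above, command and argument are placeholders. A more concrete invocation might look like the sketch below, where the function name run_multi_node and the echo task are hypothetical. It uses command -v rather than which to locate oarsh, which is equivalent for this purpose.

```shell
# Hypothetical wrapper: first argument is the sshlogin file, the remaining
# arguments are the task inputs, distributed over the listed nodes.
run_multi_node() {
    sshloginfile=$1
    shift
    # The quoted task is executed on the remote node, so $(hostname)
    # is expanded there and reports which node ran the task.
    parallel --progress --ssh "$(command -v oarsh)" --slf "$sshloginfile" \
        'echo {} processed on $(hostname)' ::: "$@"
}
```

For example, run_multi_node nodes input1 input2 input3 would spread the three tasks over the hosts listed in the nodes file created earlier.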