HPC and HTC tutorial


Grid'5000 gives easy access to a wide variety of hardware technologies and is particularly suitable for carrying out HPC (high performance computing) experiments: users can investigate parallel algorithms, scalability problems or performance portability on Grid'5000. Whereas HPC production systems generally have rather rigid restrictions (no root access, no possibility to install system-wide software, no ssh connection to the compute nodes, no internet access...), Grid'5000 does not suffer from these common limitations. In particular, Grid'5000 has a job scheduling policy that allows resources to be reserved in advance, which is useful for setting up an experiment on your own schedule. You can also reinstall cluster nodes and gain root access for the duration of your jobs using Kadeploy. This can be used to control the entire software stack, experiment with runtime environments, fine-tune network parameters (e.g. MTU) or simply ensure the reproducibility of your experiments by freezing their context. In addition, Grid'5000 provides a set of tools for monitoring experiments that you might find especially useful for detecting problems such as network contention in distributed algorithms.

Note

This tutorial is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Discovering HPC resources on Grid'5000

The easiest way to get the global picture of the HPC systems available on Grid'5000 is to consult the Hardware page. This page is built using the Grid'5000 Reference API and describes in detail the CPU models, network interfaces and accelerators of each cluster. You can also use the API Quick Start page as it provides advanced filters for selecting nodes by hardware capability. Alternatively, you can parse the Grid'5000 Reference API yourself to discover the available resources on each site.

Resource reservation on Grid'5000

Resource reservation using the OAR scheduler is covered by the Getting Started tutorial. You can select specific hardware by using the "-p" (properties) option of the oarsub command. The properties available on each site are listed on the Monika pages, which are linked from the Status page. For instance, see the Monika page for Nancy. You can combine OAR properties or even use SQL queries for advanced filtering.

Here is a non-exhaustive list of OAR properties for HPC experiments:

  • CPU: cpuarch, cpucore, cpufreq, cputype
  • Memory (RAM in MB): memnode (memory per node), memcpu (per cpu), memcore (per core)
  • Network:
    • eth_count (number of ethernet interfaces), eth_rate (rate of the fastest ethernet interface available on the node)
    • ib_count (number of InfiniBand interfaces), ib = {'NO', 'SDR', 'DDR', 'QDR', 'FDR'} (the InfiniBand technology available), ib_rate = {0, 10, 20, 40, 56} (max rate in Gbit/s)
  • Accelerator: gpu (YES/SHARED/NO), gpu_count (number of GPUs per node), mic (YES/NO)

For example, you can make a reservation at Lyon for a GPU node using:

flyon: oarsub -I -p "gpu!='NO'"

Or get a node with more than 256 GB of RAM at Nancy (memnode is expressed in MB):

fnancy: oarsub -I -p "memnode>256000"
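
Since properties can be combined, the two filters above can be merged into a single request. The following is a minimal sketch, assuming that conditions are joined with the SQL AND operator; join_props is a hypothetical helper (not an OAR command), and whether any node on a given site satisfies all three conditions is not guaranteed. The script only prints the oarsub command so that it can be tried outside a frontend.

```shell
#!/bin/sh
# Build an OAR property filter from individual conditions joined with AND.
# join_props is a hypothetical helper written for this example only.
join_props() {
    filter=""
    for cond in "$@"; do
        if [ -z "$filter" ]; then
            filter="$cond"
        else
            filter="$filter AND $cond"
        fi
    done
    printf '%s\n' "$filter"
}

# Property names are taken from the list above; the thresholds are illustrative.
props=$(join_props "gpu!='NO'" "memnode>256000" "ib_rate>=40")

# Print the command instead of running it, so the sketch works anywhere.
echo "oarsub -I -p \"$props\""
```

On a frontend, the printed command can be copied and run as is. If no resource matches the filter, relax one condition at a time to find out which one is too strict.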

Using Grid'5000 resources as an HPC production system

The primary purpose of Grid'5000 is to be a testbed for experiment-driven research in all areas of computer science, with a focus on parallel and distributed computing including Cloud, HPC and Big Data. However, Grid'5000 offers such a large amount of resources that its idle resources can be used for more production-oriented workloads, where the goal is simply to obtain results faster, regardless of the method used. These include HTC (high-throughput computing) projects that require the execution of a large number of loosely coupled tasks (also called embarrassingly parallel workloads). This usage of Grid'5000 is only allowed for projects connected with computer science research, and jobs must be submitted using the besteffort mode of the scheduler or be granted access according to the production queue usage policy.

Besteffort jobs are executed on idle resources and are killed whenever a regular job requests the resources. Therefore, besteffort jobs need to be monitored. In particular, long-running jobs should implement a checkpointing mechanism. If your job is of type besteffort and idempotent (oarsub -t besteffort -t idempotent) and is killed by the OAR scheduler, another job is automatically created and put in the queue with the same configuration (note that your job is also resubmitted if the exit code of your program is 99).
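
As an illustration of the checkpointing advice above, here is a minimal sketch of a restartable job script. It records the last completed step in a file and resumes from it on the next run, and it returns 99 when it stops before finishing, relying on the resubmission behaviour described above. The names CKPT, TOTAL, BUDGET and run_steps are hypothetical and not part of OAR, and the "work" is a placeholder.

```shell
#!/bin/sh
# Sketch of a restartable besteffort job. CKPT, TOTAL, BUDGET and run_steps
# are hypothetical names chosen for this example, not OAR features.
CKPT="${CKPT:-./checkpoint.txt}"   # file recording the last completed step
TOTAL="${TOTAL:-5}"                # number of steps in the whole job
BUDGET="${BUDGET:-1000}"           # max steps per run before pausing

run_steps() {
    step=0
    # Resume from the last completed step if a checkpoint exists.
    if [ -f "$CKPT" ]; then
        step=$(cat "$CKPT")
    fi
    done_now=0
    while [ "$step" -lt "$TOTAL" ]; do
        if [ "$done_now" -ge "$BUDGET" ]; then
            echo "paused at step $step"
            return 99   # exit code 99 asks for resubmission (see text above)
        fi
        # ... the real work for one step would go here ...
        step=$((step + 1))
        done_now=$((done_now + 1))
        echo "$step" > "$CKPT"   # checkpoint after every completed step
    done
    echo "finished $step steps"
}

# The script's exit status is the status of this last command.
run_steps
```

Saved as an executable file (for example a hypothetical job.sh), such a script could be submitted with oarsub -t besteffort -t idempotent ./job.sh. If the job is killed mid-run, the resubmitted job picks up from the checkpoint file, provided that file lives on storage shared between nodes.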

Applications that submit a large number of tasks (i.e. Bag-of-Tasks campaigns) should consider using CiGri, which is a grid middleware that dispatches the workload over the whole Grid'5000 infrastructure and distributes the idle resources equally among its users without overloading the infrastructure.

Using HPC hardware on Grid'5000

The rest of this tutorial is divided into three distinct parts that can be followed in any order: