{{Maintainer|Lucas Nussbaum}}
{{Portal|User}}
{{Note|text=For a more up-to-date list of gotchas, see https://www.grid5000.fr/status/artifact/}}
 
This page documents various [http://en.wikipedia.org/wiki/Gotcha_(programming) ''gotchas''] (counter-intuitive features of Grid'5000) that could affect users' experiments in surprising ways.
  
 
== Network ==
 
Global and per-site network documentation can be found on the [[Grid5000:Network]] page.
 
=== Topology of ethernet networks ===
 
Most (large) clusters have a hierarchical ethernet topology, because ethernet switches with a large number of ports are too expensive. A good example of such a hierarchical topology is the [[Rennes:Network]] for the paravance and parasilo clusters, where nodes are connected to 3 different switches. When doing experiments that use the ethernet network intensively, it is a good idea to request nodes on the same switch, using e.g. <tt>oarsub -l switch=1/nodes=5</tt>, or to request nodes connected to a specific switch, using e.g. <tt>oarsub -p "switch='cisco2'" -l nodes=5</tt>.
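For instance, a minimal sketch of both reservation forms (the walltime is a placeholder, and <tt>cisco2</tt> is only an example switch name; the names actually available depend on the site and are listed on the [[Grid5000:Network]] page and in the reference API):

<pre>
# Interactive job on 5 nodes that all share a single ethernet switch:
oarsub -I -l switch=1/nodes=5,walltime=2:00:00

# Interactive job on 5 nodes attached to one specific switch:
oarsub -I -p "switch='cisco2'" -l nodes=5,walltime=2:00:00
</pre>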
  
 
=== Performance of ethernet networks ===
 
The backplane bandwidth of ethernet switches doesn't usually allow full-speed communications between all the ports of the switch.
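One way to observe this is to measure point-to-point bandwidth between node pairs placed on different switches, for instance with iperf3 (a rough sketch; iperf3 is not necessarily installed in the standard environment, and the node name below is a placeholder):

<pre>
# On the first node, start a server:
iperf3 -s

# On the second node, run a 30-second test against the first one:
iperf3 -c paravance-1.rennes.grid5000.fr -t 30
</pre>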
  
 
=== High-performance networks ===
 
The topology of Infiniband and Omni-Path networks is generally less surprising. Two "fat-tree" topologies can be found on the testbed (a worked example follows the list):
* non-blocking (''1:1''): the number of up-link ports (from leaf switches to top switches) is equal to the number of down-link ports (from nodes to leaf switches), so all nodes can communicate with each other at full speed.
* blocking (''2:1''): the number of up-link ports (from leaf switches to top switches) is half the number of down-link ports (from nodes to leaf switches), so nodes attached to the same leaf switch can communicate with each other at full speed, but not with nodes attached to other leaf switches.
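As a purely illustrative example (the port counts are hypothetical, not a description of any particular Grid'5000 switch): in a ''2:1'' fabric built from 48-port leaf switches, 32 ports face the nodes and 16 face the top switches, so if all 32 nodes behind one leaf communicate with nodes behind other leaves at the same time, each node can only reach about half of its link rate; the same traffic pattern on a ''1:1'' fabric (24 ports down, 24 ports up) can run at full speed.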
  
 
== Compute nodes ==
 
 
All Grid'5000 clusters are supposed to contain homogeneous (identical) sets of nodes, but there are some exceptions.
 
Global and per-site cluster documentation can be found on the [[Hardware]] page.
 
=== Hard disks ===
 
Due to their high failure rate, hard disks tend to get replaced frequently, and it is not always possible to keep the same model during the whole life of a cluster. If this is important to you, please check the exact disk model using the reference API, as storage is described in detail for each node.
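For example, a minimal sketch of such a check from a frontend (the cluster and node names are placeholders, and the exact JSON layout returned by the reference API may vary):

<pre>
# Describe one node and keep only its storage devices (requires jq):
curl -s https://api.grid5000.fr/stable/sites/rennes/clusters/parasilo/nodes/parasilo-1 \
  | jq '.storage_devices'
</pre>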
 
=== NVMe disks configuration ===
  
Due to many issues, we had to disable "multipath" support for NVMe disks in most of our environments. This is done by passing the "multipath=off" parameter to the <code>nvme_core</code> module.
If you need NVMe multipath support, you can deploy any <code>-min</code> environment (e.g. <code>debian10-x64-min</code>) since they do not contain this workaround.
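To check what a given node is actually using, the module parameter can be read back from sysfs (a standard Linux path, not something Grid'5000-specific):

<pre>
# Prints "N" when NVMe multipath is disabled, "Y" when it is enabled:
cat /sys/module/nvme_core/parameters/multipath
</pre>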
== Software ==
* The standard environment (the one users get when not deploying) on all compute nodes is identical for a given architecture (x86-64, arm64 or ppc64), with the exception of additional drivers and software to support GPUs and high-speed networks on sites where they are available.
* The user frontends are identical on all sites.
* The reference environments (*-$arch-{min,base,nfs,xen,big}) are identical on all sites, for a given architecture.
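As an illustration, a minimal sketch of listing the available reference environments and deploying one of them on reserved nodes (the environment name and walltime are placeholders):

<pre>
# List the environments registered in Kadeploy on the current site:
kaenv3 -l

# Reserve two nodes in deploy mode, then deploy a reference environment on them:
oarsub -I -t deploy -l nodes=2,walltime=2:00:00
kadeploy3 -e debian10-x64-min -f $OAR_NODEFILE -k
</pre>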
  
Regarding CPU architectures, some differences exist between the environments, as summarized in the following table:
{| class="wikitable"
 +
|-
 +
! scope="col"| Feature
 +
! scope="col"| x86-64
 +
! scope="col"| arm64
 +
! scope="col"| ppc64
 +
! scope="col"| env
 +
|-
 +
! scope="row"| Infiniband
 +
| {{Yes}}
 +
| {{Yes}}
 +
| {{Yes}}
 +
| ''base''
 +
|-
 +
! scope="row"| OmniPath
 +
| {{Yes}}
 +
| {{No}}
 +
| {{No}}
 +
| ''base''
 +
|-
 +
! scope="row"| NFS
 +
| {{Yes}}
 +
| {{Yes}}
 +
| {{Yes}}
 +
| ''nfs''
 +
|-
 +
! scope="row"| Ceph
 +
| {{Yes}}
 +
| {{Yes}}
 +
| {{Yes}}
 +
| ''nfs''
 +
|-
 +
! scope="row"| Xen Dom0
 +
| {{Yes}}
 +
| {{Yes}}
 +
| {{No}}
 +
| ''xen''
 +
|-
 +
! scope="row"| Cuda
 +
| {{Yes}}
 +
| {{No}}
 +
| {{Yes}}
 +
| ''big''
 +
|-
 +
! scope="row" | BeegFS
 +
| {{Yes}}
 +
| {{No}}
 +
| {{No}}
 +
| ''big''
 +
|-
 +
! scope="row" | OpenMPI
 +
| {{Yes}}
 +
| {{Yes}}
 +
| {{Yes}}
 +
| ''big''
 +
|}
