Run MPI On Grid'5000

Warning: Practical session under construction.

Running MPI on Grid'5000

When attempting to run MPI on Grid'5000 you will face a number of challenges, ranging from classical setup problems for MPI software to problems specific to Grid'5000. This practical session aims at guiding you through the most common use cases, which are:

  • setting up and starting OpenMPI on a default environment using allow_classic_ssh
  • setting up and starting OpenMPI on a default environment using oarsh
  • setting up and starting OpenMPI on a kadeploy image
  • setting up and starting OpenMPI to use a high performance interconnect

Pre-requisite

Overview

Currently, the default environment is not the same on every site, so you do not get the same version of OpenMPI everywhere. If you want to use OpenMPI for a grid experiment (using several sites), you will have to install your own MPI version. You have two options:

  • install OpenMPI in your home directory (but you should recompile and install it on every site you want to use; simply copying the compiled files may work, but it is not guaranteed, see the sketch below)
  • build a kadeploy image and deploy it on all the sites you want to use

If you are only interested in a single-site experiment, you may use the version provided by the default environment.
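
As a rough sketch, assuming the sites involved run the same operating system and architecture, copying a build made on the current site into your home directory on another site could look like this (the nancy frontend is only an example):

# copy the locally installed OpenMPI tree to your home directory on the nancy site
rsync -az $HOME/openmpi/ nancy.grid5000.fr:openmpi/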

Compilation

  • Make a reservation (or connect to the compilation machine if available, compil.<node>.site.grid5000.fr) and compile OpenMPI with the options you like. The OpenMPI sources are assumed to be available as ~/softs/openmpi-1.4.1.tar.bz2 (the same tarball is used in the kadeploy section below):
mkdir -p $HOME/src/mpi
cd $HOME/src/mpi
tar jvxf ~/softs/openmpi-1.4.1.tar.bz2
oarsub -I
cd $HOME/src/mpi/openmpi-1.4.1
./configure --prefix=$HOME/openmpi/ --with-memory-manager=none
make -j4
  • Install it in your home directory
make install
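
As a quick sanity check of the installation, you can query the freshly installed ompi_info (the exact output depends on the OpenMPI version you built):

# print the version and compile-time configuration of the OpenMPI you just installed
$HOME/openmpi/bin/ompi_info | head -n 20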

Setting up and starting OpenMPI on a default environment using allow_classic_ssh

  • oarsub -I -t allow_classic_ssh -l nodes=3
  • Code to test:
$ mkdir -p $HOME/src/mpi
$ vi $HOME/src/mpi/tp.c
#include <stdio.h>
#include <unistd.h> /* for gethostname */
#include <mpi.h>

int main (int argc, char *argv []) {
       char hostname[257];
       int size, rank;
       int bcast_value = 1;

       gethostname (hostname, sizeof hostname);
       MPI_Init (&argc, &argv);
       MPI_Comm_rank (MPI_COMM_WORLD, &rank);
       MPI_Comm_size (MPI_COMM_WORLD, &size);
       if (!rank) {
            bcast_value = 42;
       }
       MPI_Bcast (&bcast_value,1 ,MPI_INT, 0, MPI_COMM_WORLD );
       printf("%s\t- %d - %d - %d\n", hostname, rank, size, bcast_value);
       fflush(stdout);

       MPI_Barrier (MPI_COMM_WORLD);
       MPI_Finalize ();
       return 0;
}


  • Compile your code
$ $HOME/openmpi/bin/mpicc $HOME/src/mpi/tp.c -o $HOME/src/mpi/tp
  • Launch it on the reserved nodes:
$HOME/openmpi/bin/mpirun -machinefile $OAR_NODEFILE $HOME/src/mpi/tp

Setting up and starting OpenMPI on a default environment using oarsh

  • oarsub -I -l nodes=3

oarsh is the default connector used when you reserve nodes. To make mpirun use this connector, you need to add the option --mca plm_rsh_agent "oarsh" to the mpirun command line.

$HOME/openmpi/bin/mpirun --mca  plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE $HOME/src/mpi/tp

Setting up and starting OpenMPI on a kadeploy image

Building a kadeploy image

Note.png Note

You can skip this section and directly use the squeeze-ompi environment.

We will create a kadeploy image based on an existing one.

oarsub -I -t deploy -l nodes=1,walltime=2
kadeploy3 -f $OAR_NODEFILE -e lenny-x64-base -k

Then connect to the deployed node as root, and install OpenMPI:

ssh root@<node>
cd /tmp/
tar jvxf ~/softs/openmpi-1.4.1.tar.bz2
cd openmpi-1.4.1
./configure --libdir=/usr/local/lib64 --with-memory-manager=none
make -j4
make install
apt-get -y install libblas-dev
tgz-g5k /dev/shm/image.tgz

On the frontend:

scp root@<node>:/dev/shm/image.tgz kaopenmpi.tgz

Using a kadeploy image
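
As a rough sketch, assuming the squeeze-ompi environment mentioned above (or your own registered image) provides OpenMPI, using the image follows the same pattern as the deployment step above:

# reserve nodes in deploy mode and deploy an MPI-ready environment on them
oarsub -I -t deploy -l nodes=3,walltime=2
kadeploy3 -f $OAR_NODEFILE -e squeeze-ompi -k

You can then copy your program and the machine file ($OAR_NODEFILE) to one of the deployed nodes, connect to it as root and start mpirun from there; depending on the image, passwordless SSH between the deployed nodes may require agent forwarding (ssh -A) or an additional key setup.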

Setting up and starting OpenMPI to use high performance interconnect

By default, OpenMPI tries to use any high performance interconnect it can find. This only works if the corresponding libraries were found when OpenMPI itself was compiled (not when your application was compiled). This should be the case if you have built OpenMPI on a lenny-x64 environment.
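
One way to check which interconnect support was compiled into your OpenMPI is to list the available BTL components with ompi_info:

# list the byte transfer layer (BTL) components known to this OpenMPI build;
# look for entries such as openib (InfiniBand) or mx (Myrinet)
$HOME/openmpi/bin/ompi_info | grep btl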


We will use the NetPIPE tool to check whether the high performance interconnect is actually used. Download it from this URL: http://www.scl.ameslab.gov/netpipe/code/NetPIPE-3.7.1.tar.gz

cd $HOME/src/mpi
tar zvxf ~/dload/NetPIPE-3.7.1.tar.gz
cd NetPIPE-3.7.1
export PATH=$HOME/openmpi/bin:$PATH
make mpi

Myrinet hardware: MPICH-MX

To reserve one core on two nodes with a myrinet interconnect:

  • oarsub -I -l /nodes=2/core=1 -p "myri2g='YES'"

or

  • oarsub -I -l /nodes=2/core=1 -p "myri10g='YES'"
cd $HOME/src/mpi/NetPIPE-3.7.1
$HOME/openmpi/bin/mpirun --mca  plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE NPmpi

You should get an output similar to this:

 0:         1 bytes   4080 times -->      0.31 Mbps in      24.40 usec     
 1:         2 bytes   4097 times -->      0.63 Mbps in      24.36 usec     
 ...
 122: 8388608 bytes      3 times -->    896.14 Mbps in   71417.13 usec
 123: 8388611 bytes      3 times -->    896.17 Mbps in   71414.83 usec

The minimum latency is given by the last column of the 1 byte message line; the maximum throughput is given by the last line, 896.17 Mbps in this case.
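
If you want to be sure that the Myrinet interconnect is used rather than plain TCP, you can restrict the transports OpenMPI is allowed to use (this assumes the mx BTL appears in the ompi_info output mentioned above):

# allow only the Myrinet MX transport, plus shared memory and self;
# the run will fail instead of silently falling back to TCP if MX is unavailable
$HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" --mca btl mx,sm,self -machinefile $OAR_NODEFILE NPmpi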

Infiniband hardware: MVAPICH

To reserve one core on two nodes with an infiniband interconnect:

  • oarsub -I -l /nodes=2/core=1 -p "ib10g='YES'"

or

  • oarsub -I -l /nodes=2/core=1 -p "ib20g='YES'"

Setting up OpenMPI to accept private networks as routable between hosts

OpenMPI site: http://www.open-mpi.org/

  • Connect to one site and reserve a node
ssh nancy.grid5000.fr
oarsub -I
  • Connect to another site and reserve a node
ssh rennes.grid5000.fr
oarsub -I
  • Try to launch the code that you previously created between the two reserved nodes (see the sketch below)
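
A minimal sketch of such a multi-site run, assuming plain ssh works between the nodes (for instance because both reservations were made with -t allow_classic_ssh, as in the first section), that your OpenMPI build and the compiled tp program are present in your home directory on both sites, and that your OpenMPI version provides the opal_net_private_ipv4 parameter (an assumption to verify, for instance in the output of ompi_info --all):

# machine file listing one node from each reservation (hypothetical node names)
cat > $HOME/machines <<EOF
griffon-1.nancy.grid5000.fr
paradent-1.rennes.grid5000.fr
EOF
# run from one of the reserved nodes; emptying opal_net_private_ipv4 makes
# OpenMPI treat the private Grid'5000 addresses as routable between hosts
$HOME/openmpi/bin/mpirun --mca opal_net_private_ipv4 "" -machinefile $HOME/machines $HOME/src/mpi/tp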