Run MPI On Grid'5000


Warning: Practical session under construction.

Running MPI on Grid'5000

When attempting to run MPI on Grid'5000, you will be faced with a number of challenges, ranging from classical setup problems for MPI software to problems specific to Grid'5000. This practical session aims at driving you through the most common use cases, which are:

  • setting up and starting OpenMPI on a default environment using allow_classic_ssh
  • setting up and starting OpenMPI on a default environment using oarsh
  • setting up and starting OpenMPI on a kadeploy image
  • setting up and starting OpenMPI to use a high performance interconnect

Pre-requisite

Overview

Currently, the default environment is not the same on every site; therefore, you do not have the same version of OpenMPI on every site. If you want to use OpenMPI for a grid experiment, you will have to install your own MPI version. You have two options:

  • install OpenMPI in your home directory (but you should recompile and install it on all the sites you want to use; it may work by simply copying the compiled files, but this is not guaranteed)
  • use the same kadeploy image and deploy it on all the sites you want to use

If you are only interested in a single-site experiment, you may use the version provided by the default environment.

Using OpenMPI on a default environment

Compilation

  • Make a reservation (or connect to the compilation machine if available: compil.<node>.site.grid5000.fr), with the OpenMPI source archive (openmpi-1.4.1.tar.bz2) in your home directory, and compile it:

frontend: oarsub -I

Unarchive OpenMPI:

node: cd /tmp/
node: tar jvxf ~/openmpi-1.4.1.tar.bz2
node: cd openmpi-1.4.1

Configure and compile:

node: ./configure --prefix=$HOME/openmpi/ --with-memory-manager=none
node: make -j4

  • Install it in your home directory (in $HOME/openmpi/):

node: make install
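
Optionally, you can check that the tools you just installed are the ones picked up by your shell; a minimal sketch (the exact version string will differ):

node: export PATH=$HOME/openmpi/bin:$PATH
node: which mpicc mpirun
node: ompi_info | grep "Open MPI:"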

Create a sample MPI program

  • We will use a very basic MPI program to test OAR/MPI; create a file $HOME/src/mpi/tp.c and copy the following source:
frontend: mkdir -p $HOME/src/mpi
frontend: vi $HOME/src/mpi/tp.c

The source code:

#include <stdio.h>
#include <unistd.h> /* for gethostname */
#include <mpi.h>

int main (int argc, char *argv []) {
       char hostname[257];
       int size, rank;
       int i, pid;
       int bcast_value = 1;

       gethostname (hostname, sizeof hostname);
       MPI_Init (&argc, &argv);
       MPI_Comm_rank (MPI_COMM_WORLD, &rank);
       MPI_Comm_size (MPI_COMM_WORLD, &size);
       if (!rank) {
            bcast_value = 42;
       }
       MPI_Bcast (&bcast_value,1 ,MPI_INT, 0, MPI_COMM_WORLD );
       printf("%s\t- %d - %d - %d\n", hostname, rank, size, bcast_value);
       fflush(stdout);

       MPI_Barrier (MPI_COMM_WORLD);
       MPI_Finalize ();
       return 0;
}


Setting up and starting OpenMPI on a default environment using allow_classic_ssh

Submit a job with the allow_classic_ssh type

frontend: oarsub -I -t allow_classic_ssh -l nodes=3
  • Compile your code
node: $HOME/openmpi/bin/mpicc src/mpi/tp.c -o src/mpi/tp
  • Launch it with mpirun
node: $HOME/openmpi/bin/mpirun -machinefile $OAR_NODEFILE $HOME/src/mpi/tp

Setting up and starting OpenMPI on a default environment using oarsh

frontend: oarsub -I -l nodes=3

oarsh is the default connector used when you reserve a node. To be able to use this connector, you need to add the option --mca plm_rsh_agent "oarsh" to mpirun.

node: $HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE $HOME/src/mpi/tp
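
If you do not want to pass this option on every mpirun invocation, OpenMPI can also read MCA parameters from a per-user configuration file; a minimal sketch using the standard per-user location:

node: mkdir -p ~/.openmpi
node: echo "plm_rsh_agent = oarsh" >> ~/.openmpi/mca-params.conf
node: $HOME/openmpi/bin/mpirun -machinefile $OAR_NODEFILE $HOME/src/mpi/tp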

multi sites

In this practical session, we will do multi-site MPI with kadeploy. If you want to do this with the default environment, the following steps are required:

  • recompile OpenMPI on all the sites you want to use, in the same directory ($HOME/openmpi)
  • recompile your MPI application on all the sites, using your OpenMPI
  • use oargridsub to reserve nodes on several sites
  • build a node file using oargridstat -l
  • launch mpirun from the first node of your nodefile, using this nodefile instead of $OAR_NODEFILE (see the sketch below)
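
A minimal sketch of the last two steps, assuming the grid reservation id returned by oargridsub is stored in GRID_JOB_ID and that ~/gridnodes is the machinefile you build (the oargridstat output may need light filtering, e.g. removal of blank lines):

frontend: oargridstat -l $GRID_JOB_ID | sed '/^$/d' | sort -u > ~/gridnodes
frontend: ssh $(head -n 1 ~/gridnodes)
node: $HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" -machinefile ~/gridnodes $HOME/src/mpi/tp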


Setting up and starting OpenMPI on a kadeploy image

Building a kadeploy image

Note: You can skip this section and directly use the lenny-x64-openmpi environment available at Sophia.

The default OpenMPI version available in Debian-based distributions is not compiled with high performance libraries like Myrinet/MX, therefore we must recompile OpenMPI from source. Fortunately, the default images (lenny-x64-XXX) include all the libraries for high performance interconnects, and OpenMPI will find them at compile time.

We will create a kadeploy image based on an existing one.

frontend: oarsub -I -t deploy -l nodes=1,walltime=2
frontend: kadeploy3 -f $OAR_NODEFILE -e lenny-x64-base -k

Then connect to the deployed node as root, and install OpenMPI:

frontend: ssh root@node

Unarchive OpenMPI:

node: cd /tmp/
node: tar jvxf ~/openmpi-1.4.1.tar.bz2
node: cd openmpi-1.4.1

Configure and compile:

node: ./configure --libdir=/usr/local/lib64 --with-memory-manager=none
node: make -j4
node: make install
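
To confirm that the high performance interconnect support was actually built in, you can list the BTL components this OpenMPI knows about; a minimal check, assuming the default /usr/local prefix used above (the components listed depend on the libraries present on the node):

node: /usr/local/bin/ompi_info | grep btl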

Add the BLAS library:

node: apt-get -y install libblas-dev

Create the image using tgz-g5k:

node: tgz-g5k /dev/shm/image.tgz

Copy the image to the frontend:

frontend: scp node:/dev/shm/image.tgz $HOME/lenny-openmpi.tgz

Copy the description file of lenny-x64-nfs:

frontend: cp /grid5000/desriptions/lenny-x64-nfs-2.0.dsc3 $HOME/lenny-openmpi.dsc

Change the image name in the description file:

frontend: perl -i -pe "s@/grid5000/images/lenny-x64-nfs-2.0.tgz@$HOME/lenny-openmpi.tgz@" $HOME/lenny-openmpi.dsc

Using a kadeploy image

single site

frontend: oarsub -t deploy -l /nodes=5
frontend: kadeploy3 -a $HOME/lenny-openmpi.dsc -f $OAR_NODEFILE -k

multiple sites

Choose three clusters from three different sites.

frontend: oargridsub -t deploy cluster1:rdef="nodes=2",cluster2:rdef="nodes=2",cluster3:rdef="nodes=2"

Setting up and starting OpenMPI to use high performance interconnect

By default, OpenMPI tries to use any high performance interconnect it can find. This is only true if it found the corresponding libraries at compile time (compilation of OpenMPI, not of your application). This should be the case if you have built OpenMPI on a lenny-x64 environment.
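
You can check which interconnect drivers your OpenMPI build knows about, and explicitly select or exclude them at run time; a minimal sketch using the standard MCA options (the run below only makes sense if a non-TCP interconnect is actually available between your nodes):

node: $HOME/openmpi/bin/ompi_info | grep btl
node: $HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" --mca btl ^tcp -machinefile $OAR_NODEFILE $HOME/src/mpi/tp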


We will use the NetPIPE tool to check whether the high performance interconnect is really used. Download it from this URL: http://www.scl.ameslab.gov/netpipe/code/NetPIPE-3.7.1.tar.gz

node: cd $HOME/src/mpi

Unarchive NetPIPE:

node: tar zvxf ~/dload/NetPIPE-3.7.1.tar.gz
node: cd NetPIPE-3.7.1

Update your PATH:

node: export PATH=$HOME/openmpi/bin:$PATH

Compile:

node: make mpi

Myrinet hardware: MPICH-MX

To reserve one core on two nodes with a Myrinet interconnect:

frontend: oarsub -I -l /nodes=2/core=1 -p "myri2g='YES'"

or

frontend: oarsub -I -l /nodes=2/core=1 -p "myri10g='YES'"

node: cd $HOME/src/mpi/NetPIPE-3.7.1
node: $HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" -machinefile $OAR_NODEFILE NPmpi

You should get something like this:

 0:         1 bytes   4080 times -->      0.31 Mbps in      24.40 usec     
 1:         2 bytes   4097 times -->      0.63 Mbps in      24.36 usec     
 ...
 122: 8388608 bytes      3 times -->    896.14 Mbps in   71417.13 usec
 123: 8388611 bytes      3 times -->    896.17 Mbps in   71414.83 usec

The minimum latency is given by the last column for a 1 byte message; the maximum throughput is given by the last line, 896.17 Mbps in this case.
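
To convince yourself that the high performance interconnect makes a difference, you can rerun the same benchmark restricted to plain TCP and compare the numbers; a minimal sketch using the standard btl selection option:

node: $HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" --mca btl tcp,self -machinefile $OAR_NODEFILE NPmpi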

Infiniband hardware: MVAPICH

To reserve one core on two nodes with an InfiniBand interconnect:

frontend: oarsub -I -l /nodes=2/core=1 -p "ib10g='YES'"

or

frontend: oarsub -I -l /nodes=2/core=1 -p "ib20g='YES'"
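
Then run NetPIPE again as in the Myrinet case; if you want to be explicit about the transport, you can force the InfiniBand component (a sketch, assuming the openib BTL was built in):

node: cd $HOME/src/mpi/NetPIPE-3.7.1
node: $HOME/openmpi/bin/mpirun --mca plm_rsh_agent "oarsh" --mca btl openib,self -machinefile $OAR_NODEFILE NPmpi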

Setting up OpenMPI to accept private networks as routable between hosts

OpenMPI website: http://www.open-mpi.org/

  • Connect to one site and reserve a node
ssh nancy.grid5000.fr
oarsub -I
  • Connect to another site and reserve a node
ssh rennes.grid5000.fr
oarsub -I
  • Try to launch the code that you previously created between the two reserved nodes (see the sketch below)
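
By default, OpenMPI may refuse to open TCP connections between addresses it considers non-routable. A hedged sketch of one common workaround, which restricts OpenMPI's TCP traffic to the interface that is routable between the two sites; the node names and the interface name eth0 are placeholders, and the relevant MCA parameters can differ between OpenMPI versions:

node: echo node-from-site-1 >  ~/twosites
node: echo node-from-site-2 >> ~/twosites
node: $HOME/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0 -machinefile ~/twosites $HOME/src/mpi/tp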