Grid'5000 experiment


Open MPI and MPICH2 Integration with Fault Tolerance Protocols (Middleware)

Conducted by

Aurelien Bouteiller, Camille Coti, Thomas Herault, Ala Rezmerita, Eric Rodriguez

Description

The MPICH-V project of the Grand-Large team has demonstrated the efficiency and feasibility of fault-tolerance techniques for message-passing systems such as MPI. We are now collaborating with both the Open MPI team at UTK and the MPICH2 team at Argonne National Laboratory to integrate efficient fault-tolerance mechanisms into their respective implementations of the MPI-2 specification. One of the major challenges of this integration is handling both high-speed networks (which Grid'5000 will soon provide) and large scale (which Grid'5000 already provides) efficiently. In this context, we will use Grid'5000 as a testbed for the integrated fault-tolerant MPI libraries.

Status

in progress

Resources

  • Nodes involved: 500
  • Sites involved: >6
  • Minimum walltime: 8h
  • Batch mode: no
  • Use kadeploy: yes
  • CPU bound: yes
  • Memory bound: yes
  • Storage bound: yes
  • Network bound: yes
  • Interlink bound: yes

Tools used

No information

Results

Not yet

Shared by: Aurelien Bouteiller, Camille Coti, Thomas Herault, Ala Rezmerita, Eric Rodriguez
Last update: 2007-10-06 20:30:25
Experiment #188
