Grid'5000 user report for
User information( user)
- MPI'5000 (Middleware) [in progress]
MPI5000 is a new transparent layer placed between MPI and TCP. It allows an application composed of several tasks to be distributed correctly over the available nodes, according to the grid topology and the application's communication scheme. To do so, the layer needs two data files: one describing the grid topology, including the available nodes and both the latency and the bandwidth between nodes and between sites; the other describing the application's communication patterns, with the size and number of messages exchanged between MPI processes. Using these two pieces of information, the layer should place tasks efficiently on the grid nodes.
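To make the role of the two input files concrete, here is a minimal sketch of how a placement could be scored against them. It is not the MPI5000 algorithm: the node names, site names, latencies, message counts, the cost function (message count times latency, ignoring bandwidth and message size) and the exhaustive search are all assumptions chosen for illustration.

```python
from itertools import permutations

# Hypothetical topology file contents: node -> site, and latency (ms) between sites.
NODE_SITE = {"n1": "lyon", "n2": "lyon", "n3": "rennes", "n4": "rennes"}
LATENCY_MS = {
    ("lyon", "lyon"): 0.1,
    ("rennes", "rennes"): 0.1,
    ("lyon", "rennes"): 10.0,
    ("rennes", "lyon"): 10.0,
}

# Hypothetical communication pattern: (rank_a, rank_b) -> number of messages.
MESSAGES = {(0, 1): 10, (0, 2): 1000, (1, 3): 1000, (2, 3): 10}


def placement_cost(placement):
    """Sum of (message count x latency) for a rank -> node mapping."""
    total = 0.0
    for (a, b), count in MESSAGES.items():
        sites = (NODE_SITE[placement[a]], NODE_SITE[placement[b]])
        total += count * LATENCY_MS[sites]
    return total


def best_placement(nodes):
    """Exhaustive search over all mappings; only workable for a few ranks."""
    ranks = range(len(nodes))
    candidates = (dict(zip(ranks, perm)) for perm in permutations(nodes))
    return min(candidates, key=placement_cost)


naive = {0: "n1", 1: "n2", 2: "n3", 3: "n4"}
best = best_placement(list(NODE_SITE))
print(placement_cost(naive), placement_cost(best))
```

In this toy case, the search puts the ranks that exchange the most messages on the same site, so the heavy traffic stays off the WAN link. A real placement would also have to account for bandwidth and message sizes, which the topology and pattern files are described as containing.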
Our layer also proposes to transparently split the TCP connections between MPI processes in order to take the grid topology into account. This new architecture is based on a system of relays placed at the LAN/WAN interface. We replace each end-to-end TCP connection with three connections: two on the LANs, each between a node and a relay, and one on the WAN between the two relays. We thus expect faster loss recovery on the LAN, as well as a reduction of the memory used for local TCP buffers, whose size depends on the RTT of the connection. On the relays, we plan to use different TCP implementations or different protocols for local and distant communications. The relays could also schedule messages differently according to their size; for example, we could give priority to small messages (usually control messages). Finally, as MPI applications mostly use small messages, they are penalised more when the network is congested by large flows. We plan to reserve bandwidth in order to optimise MPI communications on the shared long-distance link. The implementation of our proposal relies both on a library placed between MPI and the system calls and on relay daemons. The architecture is thus independent of the MPI implementation.
Results: For the moment, the relays and the library are in a test phase. We are now testing our architecture on Grid'5000. We will then implement the optimisations proposed above.
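The link between buffer size and RTT can be illustrated with the usual bandwidth-delay product rule: to keep a link full, a TCP connection needs roughly bandwidth times RTT bytes of buffer. The sketch below applies it to an end-to-end connection and to a node-to-relay LAN segment. The 1 Gbit/s rate and the 10 ms and 0.1 ms RTT values are assumed for illustration and are not measurements from this work.

```python
def bdp_bytes(bandwidth_bps, rtt_us):
    """Bandwidth-delay product in bytes, from bits/s and RTT in microseconds."""
    # bits/s * microseconds = bits * 1e-6; dividing by 8e6 converts to bytes.
    return bandwidth_bps * rtt_us // 8_000_000


GIGABIT = 1_000_000_000  # assumed link rate, bits per second

# Assumed RTTs: 10 ms end to end across the WAN, 0.1 ms between node and relay.
end_to_end_buffer = bdp_bytes(GIGABIT, 10_000)
lan_segment_buffer = bdp_bytes(GIGABIT, 100)

print(end_to_end_buffer, lan_segment_buffer)
```

Under these assumptions, a node that talks only to its local relay needs a buffer about one hundred times smaller than one that holds the end-to-end connection itself. The large buffer is still needed, but on the relay's WAN connection rather than on every compute node.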
- Optimization of Long-distance communications for MPICH-Madeleine (Networking) [achieved]
Description: To run on a grid, applications need a support layer such as MPI. But MPI was created for clusters. On grids, there are at least two more constraints to manage: heterogeneity and long-distance communication.
MPICH-Madeleine manages heterogeneity properly, but no experiments had been done over long distances.
I use Grid'5000 to see how MPICH-Madeleine behaves on the grid. I compare local and long-distance performance in order to adapt this implementation to the grid. Mainly, I try to optimize long-distance communications.
Results: The optimizations yield a bandwidth of 600 Mbps instead of 95 Mbps when sending MPI messages over a wide area network.
- Réseau longue distance et application distribuée dans les grilles de calcul : étude et propositions pour une interaction efficace  (national)
EntryType: phdthesis Author: Ludovic Hablot School: ENS Lyon, Université de Lyon
- Etude d'implémentations MPI dans une grille de calcul  (national)
EntryType: inproceedings Author: Hablot, Ludovic and Glück, Olivier and Mignot, Jean-Christophe and Vicat-Blanc Primet, Pascale Booktitle: Actes de Renpar'08 Month: February
- Comparison and tuning of MPI implementation in a grid context  (national)
EntryType: inproceedings Author: Hablot, Ludovic and Glück, Olivier and Mignot, Jean-Christophe and Genaud, Stéphane and Vicat-Blanc Primet, Pascale Booktitle: In Proceedings of 2007 IEEE International Conference on Cluster Computing (CLUSTER) Month: September Pages: 458-463
- Evaluation et optimisation d'une implémentation de MPI  (national)
EntryType: mastersthesis Author: L. Hablot and O. Glück School: LIP ENS Lyon INRIA Reso Month: July Number: RR2006-26 Note: Also available as Research Report RR2006-26, INRIA Rhône-Alpes Url: http://hal.inria.fr/inria-00090666
- Interaction efficace entre les réseaux rapides et le stockage distribué dans les grappes de calcul  (national)
EntryType: techreport Author: Brice Goglin and Olivier Glück and Pascale Vicat-Blanc Primet Institution: LIP, ENS Lyon Type: Research Report Number: RR2006-04 Address: Lyon, France Url: http://www.ens-lyon.fr/LIP/Pub/Rapports/RR/RR2006/RR2006-04.pdf Note: Also available as Research Report RR-5806, INRIA Rhône-Alpes
- Emulation d'un nuage réseau de grilles de calcul : eWAN  (national)
EntryType: techreport Author: Pascale Vicat-Blanc and Olivier Glück and Cyril Otal and François Echantillac Institution: LIP, ENS Lyon Type: Research Report Number: RR2004-59 Address: Lyon, France Url: http://www.ens-lyon.fr/LIP/Pub/Rapports/RR/RR2004/RR2004-59.pdf
- Energy considerations in Checkpointing and Fault Tolerance protocols  (international)
EntryType: inproceedings Address: Boston, USA Author: Diouri, M. and Gluck, O. and Lefevre, L. and Cappello, F. Booktitle: 2nd Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 2012) Month: 06
- Large Scale Gigabit Emulated Testbed for Grid Transport Evaluation  (international)
EntryType: inproceedings Author: Pascale Vicat-Blanc Primet and R. Takano and Y. Kodama and T. Kudoh and Olivier Glück and C. Otal Institution: LIP, ENS Lyon Booktitle: Proceedings of The Fourth International Workshop on Protocols for Fast Long-Distance Networks, PFLDnet'2006 Address: Nara, Japan Url: http://www.hpcc.jp/pfldnet2006/paper/s1_02.pdf
- An Efficient Network API for in-Kernel Applications in Clusters  (international)
EntryType: inproceedings Author: Brice Goglin and Olivier Glück and Pascale Vicat-Blanc Primet Institution: LIP, ENS Lyon Booktitle: Proceedings of the IEEE International Conference on Cluster Computing Address: Boston, Massachusetts Publisher: IEEE Computer Society Press Url: http://hal.inria.fr/inria-00070445
- eWAN : Wide Area Network emulator  (international)
EntryType: unpublished Author: Vicat-Blanc, Pascale and Glück, Olivier and Otal, Cyril and Echantillac, François Howpublished: Poster INRIA Booth, Supercomputing 2004, Pittsburgh, USA Month: nov
Success stories and benefits from Grid'5000
- Overall benefits: Grid'5000 comprises several heterogeneous high-speed interconnects. Furthermore, it provides dedicated Gigabit or 10-Gigabit links between its sites, which lets us hope for good performance over the WAN interconnect. It is a research platform that allows us to reserve nodes and links and to deploy our own system image on them.