KAAPI-Fault tolerance (Middleware)
Conducted by: Liyun Guelton, Serge Guelton
Description: I use Grid5000 to test the Fault Tolerance (FT) system of our KAAPI project. The objective is to execute KAAPI jobs successfully on many nodes even when some processes are killed or fail during execution; the FT system is able to restart the failed processes. I also use Grid5000 to test parallel programs.
- Nodes involved: 1000
- Sites involved: >3
- Minimum walltime: >1d
- Batch mode: no
- Use kadeploy: yes
- CPU bound: no
- Memory bound: no
- Storage bound: no
- Network bound: no
- Interlink bound: no
Tools used: taktuk, karun, oarsub, oarsh, oardel...
Shared by: Liyun Guelton, Serge Guelton
Last update: 2007-09-27 16:01:03