Grid'5000 user report for Raphael Jamet
User information
Raphael Jamet (users, user, grenoble, ml-users user)
- File broadcasting in a node / grid (Middleware) [achieved]
Description: This experiment aims to improve the file broadcast software used in Grid'5000. Rather than a line of transfers, we are trying to combine peer-to-peer technology and broadcast trees to create a failsafe and noticeably faster tool that takes the network structure into account.
Results: First tests show that it takes roughly 15-20 seconds to broadcast 450 Mb to 9 nodes. An unidentified problem still keeps the results below our predictions (we suspect disk access), but this experiment is on hold for now (my internship has ended).
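As a rough illustration of why a broadcast tree can beat a line of transfers, the sketch below counts how many store-and-forward hops separate the source from the last node to be served. This is a simplified model and not the tool itself: the function names, the fixed fanout, and the assumption that every hop costs the same are all hypothetical, and the model ignores pipelining, the peer-to-peer exchanges, and the network structure.

```python
def chain_hops(n_receivers: int) -> int:
    """Hops to reach the last node when nodes forward in a single line."""
    return n_receivers


def tree_depth(n_receivers: int, fanout: int = 2) -> int:
    """Depth of a full tree (source at the root) that holds n_receivers nodes.

    Hypothetical model: every node forwards to `fanout` children.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    depth = 0
    covered = 0      # receivers reached so far
    level_size = 1   # number of nodes on the current level (the source)
    while covered < n_receivers:
        level_size *= fanout
        covered += level_size
        depth += 1
    return depth


if __name__ == "__main__":
    n = 9  # number of nodes used in the first tests
    print("chain:", chain_hops(n), "hops")
    print("binary tree:", tree_depth(n, fanout=2), "hops")
```

With a fanout of 1 the tree degenerates into the line of transfers, so both functions agree in that case.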
- Aggregation of tasks in CiGri (Middleware) [achieved]
Description: CiGri is a grid middleware made for harvesting the idle cycles of the Ciment group clusters. It is built to work on top of OAR clusters, submitting best-effort jobs across multiple clusters wherever there are free resources. It manages campaigns and is meant to run over long periods. However, one job is submitted per available resource, and jobs can be of arbitrary length. We currently have a fixed overhead of 5-10 s per job (mostly due to the network and SSH commands), which cripples efficiency when jobs are short. Our goal is to group jobs into batches in order to minimise the impact of this overhead. (This section was written around July 2009 and is now outdated, see below): However, for now, the engineer responsible for the Ciment grid is on holiday, so deploying anything on these clusters is impossible. We are therefore going to use a G5K cluster for our tests. There will be only a few best-effort jobs, over short periods of time. We will start with night experiments to check all technical aspects, then run on weekdays to measure the performance gains. (End of outdated section.) In the end, we did not use Grid'5000: deploying CiGri is a complicated matter, and we did not manage to get it working before the CiGri admin came back from his leave.
Results: The tool is running and deployed, but not on Grid'5000: see the description.
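The effect of the fixed per-job overhead described above, and the gain expected from batching, can be estimated with a small calculation. This is only a sketch under stated assumptions: the overhead is taken to be paid once per submission (so once per batch), all jobs in a batch have the same length, and the function names are hypothetical rather than part of CiGri or OAR.

```python
def efficiency(job_seconds: float, overhead_seconds: float, batch_size: int = 1) -> float:
    """Fraction of wall time spent on useful work.

    Assumes the overhead is paid once per batch, not once per job.
    """
    useful = job_seconds * batch_size
    return useful / (useful + overhead_seconds)


def min_batch_size(job_seconds: float, overhead_seconds: float, target: float) -> int:
    """Smallest batch size whose efficiency reaches `target` (0 <= target < 1)."""
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    if job_seconds <= 0:
        raise ValueError("job_seconds must be positive")
    size = 1
    while efficiency(job_seconds, overhead_seconds, size) < target:
        size += 1
    return size


if __name__ == "__main__":
    # A 10 s job with a 10 s overhead (upper end of the 5-10 s range).
    print(efficiency(10, 10))               # one job per submission
    print(efficiency(10, 10, 10))           # ten jobs per submission
    print(min_batch_size(10, 10, 0.75))     # batch size needed for 75 % efficiency
```

In this model a 10 s job submitted alone spends half of its wall time on overhead, which is why short jobs benefit most from being grouped.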