Grid'5000 user report for Emmanuel Jeannot

Jump to: navigation, search

User information

Emmanuel Jeannot (users, user, account-manager, site-manager, cp, bordeaux, ml-users user)
More user information in the user management interface.


  • Data redistibution (Networking) [in progress]
    Description: Code coupling, executing parallel tasks, data persistence in the agent-client-server model are three examples where data redistribution between clusters occurs. Data redistribution is a communication phase where nodes on one cluster send data to nodes of the other cluster. This communication phase, in order to be efficient must tak einto account several parameters such that data redistribution pattern, latency and bandwidth of the network, bottlenecks. We have proposed several algorithms for scheduling messages in such a framework. Moreover, several algorithms of the litterature can be extended to this case. Currently running experiments aims at : 1- validates models ised when designing the algorithms 2- compare different algorithms bertween them We are using Grid 5000 because it is a platform that fits the field of study (clusters linked together by a backbone) and because it is possible to have a fine control of teh whole environment.
  • Wrekavoc a tool for emulating heterogeneity (Other) [in progress]
    Description: Computer science and especially heterogeneous distributed computing is an experimental science. Simulation, emulation, or in-situ implementation are complementary methodologies to conduct experiments in this context. In this paper we address the problem of defining and controlling the heterogeneity of a platform. We evaluate the proposed solution, called Wrekavoc, with micro-benchmark and by implementing algorithms of the literature.
    Results: In this work we propose an emulation tool called Wrekavoc. The goal of Wrekavoc is to define and control the heterogeneity of a given platform by degrading CPU, network or memory capabilities of each node composing this platform. Our current implementation of Wrekavoc have been tested on an homogeneous cluster. We have shown that configuring a set of nodes is very fast. Micro-benchmarks show that we are able to independently degrade CPU, bandwidth and latency to the desired values. Tests on algorithms of the literature (load balancing and matrix multiplication) confirm the previous results and show that Wrekavoc is a suitable tool for developing, studying and comparing algorithms in heterogeneous environments.
  • VSM-G project - step 1 (Application) [achieved]
    Description: Use of grid computing for solving a in silico drug discovery problem using the VSM-G platform.

    VSM-G (Virtual Screening Manager for computational Grids) is a platform dedicated to automated high throughput virtual screening. The goal of the platform is to identify within a very large molecular database (several million compounds) the best drug candidates for a given protein target. Because considerable computing power is needed for performing searches of this type within a reasonable time, VSM-G targets the use of computational grids. VSM-G is an integrated package constituted of a pre-processing engine for both protein targets and small molecules putative ligands to be screened on, and of a funel-based docking strategy. The latter is architectured as a sequence of different modules: a fast surface-matching procedure for rigid docking, next a flexible docking program, finally more elaborated molecular dynamics calculations. At each step of the funnel and depending on tuning, a proportion of inapropriate molecules can be discarded.

    The present study deals with the first step of the funnel, i.e the shape matching algorithm, on a computational grid. Indeed, this Grid'5000 experiment will allow us for the first time to use the in-house "fingerprint" program on grid clusters. "Fingerprint" approximates molecular surfaces using spherical harmonic expansions. For that purpose particular deflating/inflating algorithms are used, starting from triangulated ellipsoïds embrassing the cartesian coordinates of each molecules or of the target cavities.
    This experiment will highlight the behaviour of the program, estimate the profit time, and also estimate the effect of the size of the files needed for this calculation over the total computing time (granularity). Different times will be monitored such as the total processing time, the time for data transfer and the computing time. The ultimate goal of this experiment is to tune at best the parameters of the program to optimally spread the calculation on the grid. Moreover, we will estimate the best tool to distribute jobs on the grid cluster in order to interface later in optimal way VSM-G with a grid system.
    This experiment being concluded, using the coefficients of the spherical harmonic expansions obtained for both protein and ligand surfaces, the candidate molecules will be ranked according to their surface complementarity. The fast shape-matching in-house algorithm that will evaluate such a complementarity is called "SHEF".

    The candidate molecules ranked this way, will be next submitted to the second filter of the funnel. This step will constitute a forthcoming Grid'5000 experiment.

    Total number of molecular conformers: 91 063 822
    Granularity of "fingerprint" program on one processor on nancy site:
    - for one conformer: processor time of execution = 2 seconds
    - for one thousand of same conformer: processor time of execution = 1919 seconds
    - for one thousand of different conformers: processor time of execution = 2147 seconds
    For the whole experiment the total molecular conformers would take about 6 years on a single processor according to the previous average !!!!
    For the 12-hours reservation, the experiment includes 9 sites and 1360 processors. It corresponds to 43 200 s for one processor and 58 762 000 s nearly 2 years time processors for all 1360 processors.
    To carry out our experiment we have use the A Parameter Sweep Tool (APST) middelware. This tool automates the execution of a single program (or a small set of programs) many times, varying the runs by changing the command line arguments, the input files, or both. The individual runs are usually independent, and little or no communication takes place other than by using output files from one run as input files to another.

    More information here
  • Total Exchange data redistribution (Middleware) [achieved]
    Description: Conception of grid-aware total exchange (all-to-all) algorithms. Scalability tests.
    Results: We developed an algorithm for total exchange between two clusters that improves drastically the performance of all-to-all operations. This algorithm easily performs twice as faster than the traditional algorithm for large messages (>2MB), but it is over 8 times faster for small-medium messages (>1KB), which are the main target of traditional applications. The algorithm also presents a better scalability behavior, as we demonstrated in comparison experiences with up to 90 nodes. The results are actually the subject of papers published at EUROPAR07 and CoreGrid Symposium'07.
  • Modeling LU factorization (Application) [in progress]
    Description: Modeling the LU factorization is an important challenge as it can help to understand the scalability of the algorithm and helps the user to compute both the best block size and the optimal processor grid-size. Modeling the LU factorization has been addressed in several prior works.. IN the ScalaPACK users guide, a very crude model is proposed. It is simply used for showing the scalability of ScaLapack. It does not use the processor grid shape whereas it is well known that it has a great impact on the performance. Domas et al. propose a very fine model with multiple parameters for the environment (more than 30!). They use this model to compute the optimal grid size. However, they do not show how to compute these parameters. In the LAPACK Working Note 43 the authors propose a simple model with only three parameters for the environment (the bandwidth and the latency of the network and the flop rate of the processor). However, our experiments show that the value of these parameters depends on the phase of the algorithm. For instance a better floprate is achieved when updating the trailing sub-matrix than when updating the panel. It is important to note that none of the above models take into account the SMP clusters architecture where each processor can be grouped on a multi-processor node.
    Results: We have proposed a model that has three environment parameters per subroutine. As it takes into account the grid-size, it is able to derive the optimal grid shape in most of the cases. Moreover, we propose a simple and fast method to compute the model parameters. Finally we show how to enhance our model for multiprocessor clusters.
  • VSM-G project - step 2 (Application) [achieved]
    Description: Use of grid computing for solving a in silico drug discovery problem using the VSM-G platform.

    The proposed experiment deals with the second step of the funnel, i.e the flexible docking algorithm, on a computational grid. Indeed, this Grid'5000 experiment will allow us for the first time to use the GOLD program on grid clusters and to test its parallel implementation (PVM) at a large scale. GOLD uses a genetic algorithm to dock a flexible ligand to a semi-flexible protein. The program takes into account the position and conformation of the ligand and also the hydrogen bonding network in the binding site. Then, an empirical scoring function is used to rank the ligands according to their computed affinity with the protein target.
    This experiment will highlight the behaviour of the program, estimate the profit time, and also estimate the effect of the size of the files needed for this calculation over the total computing time (granularity). Different times will be monitored such as the total processing time, the time for data transfer and the computing time. The ultimate goal of this experiment is to tune at best the parameters of the program to optimally spread the calculation on the grid. Moreover, we will estimate the best tool to distribute jobs on the grid cluster in order to interface later in optimal way VSM-G with a grid system.

    This experiment being concluded, the best candidate molecules will be then submitted to the final module of the funnel, i.e. Molecular Dynamics simulations. This step will constitute another Grid'5000 experiment.

  • VSM-G project - step 3 (Application) [in progress]
    Description: Use of grid computing for solving a in silico drug discovery problem using the VSM-G platform.

    Molecular dynamics (MD) calculations will constitute the ultimate step of the screening funel. Starting from several millions of candidate molecules, and after elimination of most of them through the first filters, few thousands will have to be considered by the MD procedure. Considering the lenght of MD calculation for a single protein+molecule system (several months of CPU time on a single processor), it is an absolute requirement to use a MD code presenting an efficient parallelism scheme especially tailored for cluster grids.
    For that purpose, it will be necessary to test the scaling of some popular parallel MD codes such as NAMD, CHARMM and GROMACS on GRID'5000. All these codes are open source software so that it will be possible to introduce the necessary modifications in order to adapt them to running masive calculations on cluster grids.
    The present experience will therefore present several facets:
    - Tests of the scaling on a single cluster of the grid according to the particular architecture of this machine: processor types, network types (1Gb ethernet, infinyband, myrinet). These tests will be performed on different systems with increasing size in order to test how the calculations will be sensitive to this aspect (this will be related to the domain decomposition scheme used by the codes).
    - Next it will be necessary to perform again the same tests, but using now several distant clusters to check the influence of the communications between the different machines on the MD calculations. The tests to be performed will need ressources ranging from few hours on a very large number of CPUs (up to 1000) to several days on a limited nodelist (up to 128).
    After analysing the bottlenecks within the MD codes, we will then check the efficiency of different middlewares in order to choose the more suitable one for this type of simulation.

  • VSM-G project - RNA/protein docking (RNADock project) Conducted by Fabrice Leclerc, Manuel Simoes, Leo Ghemtio, Bernard Maigret (Application) [achieved]
    Description: Design of discrete rotamer libraries of dinucleotides for RNA/protein docking simulations (Applications) A large number of biological processes are regulated through RNA/protein interactions. A better understanding of the molecular basis for RNA/protein recognition will help to control these processes in a therapeutic or an enginerring perspective. Simulating in silico the interactions between RNA and protein molecules is achieved through docking methods. These methods involve two main functionalities related to the conformational search and the scoring evaluation of the energy of the bimolecular complexes. In this part of the projet, we focus on the first functionnality. The conformational search can be performed by exploring the conformational space on-the-fly or by searching through a pre-built conformational library in a pseudo-random way. For the sake of performance, we have opted for the second approach especially because RNA are very flexible molecules. Apart from the interaction with the protein, the RNA conformation is determined by specific stabilization forces: in particular the stacking of nucleic acid bases ("nucleotide side-chain") which involves a particular arrangement of two RNA residues in 3D space. The MC-Sym program is used to sample all possible arrangements between two adjacent residues which can be found in an RNA molecule. Each possible arrangement is defined by a matrix transformation which allows to place one residues with respect to the other one. A full-scale conformational library can be built based on a systematic and exhaustive exploration of the conformational space (backtracking) using any matrix of transformation corresponding to any arrangement between two adjacent RNA residues observed in experimental 3D structures (MC-Sym database). Even though the exploration is based on a discrete representation of RNA conformations, large computational ressources are still required for generating such conformational library for a dinucleotide (2 RNA residues). A root mean square devaition (RMSD) crirerion is used during the MC-Sym search to define how precise the library should be: a criterion of 1.5Å, for example, only retains RNA conformations which differ by more than 1.5Å of RMSD. In order to make RNA libraries easy to search, the RNA conformations generated are organized and clustered according to three geometrical constraints: the distance between the beginning and the end of the dinucleotide (4.0Å £ d £ 14.0Å, d± 0.5Å) and two torsion angles (0 £ q £ 360, d± 10°) that would define the orientation of the dinucleotide in a longer RNA chain at both ends. This is accomplished by the MC-Sym program by constraint satisfaction. Depending on the docking simulation time (user-defined) that will determine the exploration time, one may vary the RMSD value to get a more precise description of the RNA conformational space (RMSD of 0.5Å for example) for short simulations or a looser description (RMSD of 2.0Å for example) for long simulations without sacrifying the global description of conformational space. To optimize the computer time necessary for generating an RNA library for a given RMSD value, we plan to evaluate the time requirements depending on the three geometrical constraints. Since the distributions of the number of experimental 3D structures of RNA in the MC-Sym 3D database versus the geometrical constraints are not uniform, we expect a non-uniform computer time for searching through subsets of the conformational space defined by the contraints. First, we plan to distribute the RNA library building by incrementing the three geometrical constraints in an iterative way on a single processor up to x combinations of the three constraints and on all the nodes up to 1000 processors. An analysis of the CPU time for each elementary search on a single processor will be carried out and correlated with the number of RNA conformers generated. An analysis of the number of conformers versus the geometrical constraints will also be performed to evaluate the biais present in the MC-Sym database used to build the RNA libraries for a given RMSD.
  • Analysis of the AllToAllV in cluster of cluster environments (Networking) [in progress]
    Description: The context of this work is the total exchange of data between processes running in different clusters linked by a wide-area link (backbone). We can experiment such an environment using several sites of the Grid5000 testbed. Starting from the work done on the AllToAll operation we have extended the study to the AllToAllV operation. AllToAll in the MPI terminology means a total exchange of pieces of data of the same size between participant processes. AllToAllV implies that each piece of data may be of a different size. Unlike AllToAll, it is difficult to choose a good routing algorithm (e.g binomial tree, Bruck algorithm) depending on the data size for AllToAllV since the processes do not know the amount of data sent by the others. MPI implementations (OpenMPI, MPICH-2, GridMPI) have today a very simple implementation for AllToAllV, in which all processes send and receive asynchronously their data to all other processes and wait for the completion of all communications. Yet simple, this strategy performs better than any optimized routing scheme used in the other collective operations. We have tried more sophitiscated approaches based on message aggregation, congestion regulation and load-balance. In this kind of strategy, we select forwarders processes at each cluster, whose task is to aggregate single messages into a larger one sent over the backbone. The number of forwarders allows to control how many TCP streams simultaneously compete for the backbone. We can balance the size of the aggregated messages with an extra step at the begining where process first exchange the information about how much data they send. However, our proposal have only equaled the original AllToAllV implementation so far, bringing no significant improvement. One of the key finding of this work is that message aggregation does not improve the overall throughtput of the streams over the wide area link. Work is currently undergoing to model the behavior of the various routing strategies in the PLogP model.
  • RT Boinc chess (Middleware) [achieved]
    Description: Many large-scale distributed computing applications demand real-time responses by soft deadlines. To enable such real-time task distribution and execution on the volunteer resources, we previously proposed the design of the real-time volunteer computing platform called RT-BOINC. The system gives low $O(1)$ worst-case execution time for task management operations, such as task scheduling, state transitioning, and validation. In this work, we present a full implementation RT-BOINC, adding new features including deadline timer and parameter-based admission control. We evaluate RT-BOINC at large scale using two real-time applications, namely, the games Go and Chess. The results of our case study show that RT-BOINC provides much better performance than the original BOINC in terms of average and worst-case response time, scalability and efficiency.
    Results: We improved the real time features of RT BOINC. The experiments was done with a chess game.



    Success stories and benefits from Grid'5000

    last update: 2011-03-30 16:59:49