VSM-G project - RNA/protein docking (RNADock project)
Conducted by: Fabrice Leclerc, Manuel Simoes, Leo Ghemtio, Bernard Maigret (Application)
Conducted by: Xavier Delaruelle, Leo Ghemtio, Emmanuel Jeannot, Bernard Maigret
Description: Design of discrete rotamer libraries of dinucleotides for RNA/protein docking simulations (Applications)

Many biological processes are regulated through RNA/protein interactions. A better understanding of the molecular basis of RNA/protein recognition will help control these processes from a therapeutic or engineering perspective. Docking methods are used to simulate RNA/protein interactions in silico. They combine two main functionalities: a conformational search, and a scoring function that evaluates the energy of the bimolecular complexes. This part of the project focuses on the first functionality. The conformational search can be performed in one of two ways: by exploring the conformational space on the fly, or by searching a pre-built conformational library in a pseudo-random way. We have opted for the second approach for performance reasons, especially because RNAs are very flexible molecules. Apart from the interaction with the protein, the RNA conformation is determined by specific stabilizing forces. In particular, the stacking of nucleic acid bases (the "nucleotide side chains") involves a particular arrangement of two RNA residues in 3D space. The MC-Sym program is used to sample all possible arrangements between two adjacent residues that can be found in an RNA molecule. Each arrangement is defined by a transformation matrix that places one residue with respect to the other. A full-scale conformational library can be built by a systematic and exhaustive exploration of the conformational space (backtracking). This exploration uses every transformation matrix corresponding to an arrangement between two adjacent RNA residues observed in experimental 3D structures (MC-Sym database).
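MC-Sym describes each arrangement of two adjacent residues by a transformation matrix. The sketch below is minimal and assumes the matrix is a 4×4 homogeneous transform (a 3×3 rotation plus a translation column), which may not match MC-Sym's internal format. Under that assumption, placing the atoms of one residue in the frame of the other is one matrix-vector product per atom. Composing two such matrices chains placements along a longer strand. The names `apply_transform` and `compose` and the example matrix are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]  # 4x4 homogeneous matrix, last row (0, 0, 0, 1)


def apply_transform(m: Matrix4, coords: Sequence[Point]) -> List[Point]:
    """Map each atom p to R p + t (R = upper-left 3x3 block, t = last column)."""
    placed: List[Point] = []
    for x, y, z in coords:
        placed.append((
            m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3],
            m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3],
            m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3],
        ))
    return placed


def compose(a: Matrix4, b: Matrix4) -> List[List[float]]:
    """Matrix product a @ b: applying the result equals applying b, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


if __name__ == "__main__":
    # Hypothetical arrangement: 90 degree rotation about z plus a 3.4 A shift along z.
    t = [[0.0, -1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 3.4],
         [0.0, 0.0, 0.0, 1.0]]
    print(apply_transform(t, [(1.0, 0.0, 0.0)]))  # [(0.0, 1.0, 3.4)]
    # Chaining the same hypothetical arrangement twice (residue n -> n+2).
    print(apply_transform(compose(t, t), [(1.0, 0.0, 0.0)]))
```

A homogeneous matrix keeps rotation and translation in one object, so a chain of residue placements reduces to repeated calls to `compose`.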
Even though the exploration is based on a discrete representation of RNA conformations, generating such a conformational library for a dinucleotide (2 RNA residues) still requires large computational resources. A root mean square deviation (RMSD) criterion is used during the MC-Sym search to set the precision of the library. For example, a criterion of 1.5 Å only retains RNA conformations that differ from each other by more than 1.5 Å RMSD. To make the RNA libraries easy to search, the generated conformations are organized and clustered according to three geometrical constraints. The first is the distance d between the beginning and the end of the dinucleotide (4.0 Å ≤ d ≤ 14.0 Å, δ ± 0.5 Å). The other two are torsion angles θ (0° ≤ θ ≤ 360°, δ ± 10°) that define the orientation of the dinucleotide at both ends within a longer RNA chain. MC-Sym enforces these constraints by constraint satisfaction. The (user-defined) docking simulation time determines the exploration time. Depending on it, the RMSD value can be varied to obtain a more precise description of the RNA conformational space for short simulations (an RMSD of 0.5 Å, for example), or a looser description for long simulations (an RMSD of 2.0 Å, for example), without sacrificing the global description of the conformational space. To optimize the computing time needed to generate an RNA library for a given RMSD value, we plan to evaluate the time requirements as a function of the three geometrical constraints. The number of experimental RNA 3D structures in the MC-Sym 3D database is not uniformly distributed over the geometrical constraints. We therefore expect the computing time to vary across the subsets of the conformational space defined by the constraints. First, we plan to distribute the RNA library building by incrementing the three geometrical constraints iteratively: on a single processor up to x combinations of the three constraints, and on all the nodes up to 1000 processors.
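The RMSD criterion, the constraint grid and the distribution of constraint combinations over processors can be sketched as follows. This is a minimal illustration under several assumptions, not MC-Sym's implementation:

- RMSD is computed on conformers already placed in a common frame (no superposition step).
- Redundancy is removed with a simple greedy filter.
- The δ values are read as bin steps of 0.5 Å and 10°. This gives 21 distance values and 36 values per angle, with 360° folded onto 0°.
- Combinations are assigned to processors in static round-robin order.

All function names are hypothetical.

```python
import itertools
import math
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple, TypeVar

Point = Tuple[float, float, float]
Conformer = Sequence[Point]
Job = TypeVar("Job")


def rmsd(a: Conformer, b: Conformer) -> float:
    """RMSD between two conformers with the same atom order.

    Assumption: both are already in a common frame (no superposition step).
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformers need the same, non-zero number of atoms")
    sq = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
             for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def filter_by_rmsd(conformers: Iterable[Conformer], threshold: float) -> List[Conformer]:
    """Greedy filter: keep a conformer only if it differs by more than
    `threshold` RMSD from every conformer kept so far."""
    kept: List[Conformer] = []
    for c in conformers:
        if all(rmsd(c, k) > threshold for k in kept):
            kept.append(c)
    return kept


def frange(start: float, stop: float, step: float) -> List[float]:
    """Inclusive range of floats computed from integer step counts (no drift)."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(n + 1)]


def constraint_grid() -> Iterator[Tuple[float, float, float]]:
    """Enumerate (d, theta1, theta2) bins.

    Assumptions: d from 4.0 to 14.0 A in 0.5 A steps (21 values); angles from
    0 to 350 degrees in 10 degree steps (36 values), with 360 folded onto 0.
    """
    distances = frange(4.0, 14.0, 0.5)
    angles = frange(0.0, 350.0, 10.0)
    return itertools.product(distances, angles, angles)


def assign_round_robin(jobs: Iterable[Job], n_workers: int) -> Dict[int, List[Job]]:
    """Statically assign jobs to workers 0..n_workers-1 in round-robin order."""
    buckets: Dict[int, List[Job]] = {w: [] for w in range(n_workers)}
    for i, job in enumerate(jobs):
        buckets[i % n_workers].append(job)
    return buckets


if __name__ == "__main__":
    work = assign_round_robin(constraint_grid(), 1000)
    sizes = [len(v) for v in work.values()]
    print(sum(sizes), min(sizes), max(sizes))  # 27216 27 28
```

Under these assumptions, 1000 processors receive 27 or 28 elementary searches each. The time per bin is expected to be non-uniform, so a static split like this can leave some processors idle. A scheduler that hands out bins on demand would balance the load better.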
The CPU time of each elementary search on a single processor will be analyzed and correlated with the number of RNA conformers generated. The number of conformers will also be analyzed as a function of the geometrical constraints, to evaluate the bias present in the MC-Sym database used to build the RNA libraries for a given RMSD.
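The two analyses above could start from a log with one record per elementary search. The sketch below assumes a hypothetical record layout `(bin, cpu_seconds, n_conformers)`, where `bin` is a `(d, theta1, theta2)` tuple. It uses the Pearson coefficient as one possible correlation measure; a rank correlation such as Spearman's would be an alternative if the relationship is not linear. The per-distance totals show where the database concentrates conformers. All names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Bin = Tuple[float, float, float]  # (d, theta1, theta2), hypothetical layout
Record = Tuple[Bin, float, int]   # (bin, cpu_seconds, n_conformers)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def cpu_vs_conformers(records: Sequence[Record]) -> float:
    """Correlate the CPU time of each elementary search with its conformer count."""
    return pearson([r[1] for r in records], [float(r[2]) for r in records])


def conformers_per_distance(records: Iterable[Record]) -> Dict[float, int]:
    """Total conformers per distance bin, summed over both torsion angles."""
    totals: Counter = Counter()
    for (d, _theta1, _theta2), _cpu, n in records:
        totals[d] += n
    return dict(totals)
```

The same grouping can be repeated for either torsion angle to check for bias along each constraint separately.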
- Nodes involved: >1000
- Sites involved: >3
- Minimum walltime: 8h
- Batch mode: no
- Use kadeploy: no
- CPU bound: yes
- Memory bound: yes
- Storage bound: yes
- Network bound: yes
- Interlink bound: yes
Tools used: MC-Sym; Perl scripts; C-Shell; APST.
Shared by: Xavier Delaruelle, Leo Ghemtio, Emmanuel Jeannot, Bernard Maigret
Last update: 2011-07-03 14:49:54