Multi-parametric intensive stochastic simulations for hydrogeology (Hydro)
Leader: Jocelyne Erhel (SAGE)
Numerical modelling is a key tool for the management and remediation of groundwater resources. Natural geological formations are highly heterogeneous and fractured, leading to preferential flow paths and stagnant regions. Contaminant migration is strongly affected by these irregular water velocity distributions. To account for the limited knowledge of geological characteristics and for natural heterogeneity, numerical models are based on probabilistic data and rely on Uncertainty Quantification (UQ) methods. In this stochastic framework, non-intrusive methods require running multiple simulations. Numerical modelling also aims at studying the impact of various physical parameters, such as the Peclet number. Each simulation is therefore governed by multiple parameters, and a complete study requires analyses for more than 50 sets of parameters. The hydraulic simulations must be performed on large domains, at the scale at which the groundwater resource is managed or at the scale of a geologically homogeneous medium type. These domains must be discretized at a fine resolution to capture the scale of geological heterogeneities. Characterizing transport laws requires simulating advection and dispersion over very long times, and in turn in very large domains. Our objective is to use the computing and memory resources of computational grids to deploy these multiple simulations.
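For reference, one common form of the transport problem mentioned above is the advection-dispersion equation for a solute concentration c, with a Darcy-scale velocity field u (assumed divergence-free, porosity omitted) and a dispersion tensor D, together with a Peclet number built from a characteristic velocity U, a characteristic length l (for example the correlation length of the heterogeneity) and the molecular diffusion coefficient D_m. These are textbook definitions written under our own assumptions; the exact formulation and length scale used in this project are not stated here. A large Pe means advection-dominated transport.

```latex
\begin{aligned}
\frac{\partial c}{\partial t} + \mathbf{u}\cdot\nabla c - \nabla\cdot\left(\mathbf{D}\,\nabla c\right) &= 0, \\
\mathrm{Pe} &= \frac{U\,\ell}{D_m}.
\end{aligned}
```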
A first level of parallelism is used within each simulation: to reach large-scale domains, each simulation must run on a parallel computer with enough memory and computing power. A second level of parallelism comes from Uncertainty Quantification. A third level is the study of different sets of parameters. These multiparametric simulations are clearly independent and are thus very well suited to techniques inspired by peer-to-peer computing. However, each study is in itself a heavy computation involving a large number of random simulations, and each of these simulations requires high-performance computing. Our objective is to use current middleware developed for grid architectures in order to make the most of the three levels of parallelism. Several difficulties arise, ranging from basic software engineering (compatibility of systems, libraries and compilers) to scheduling issues.
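How the three levels can fit together is sketched below. This is a minimal Python illustration under our own assumptions, not H2OLab code: it flattens the two independent levels (parameter sets and random simulations) into a single list of tasks and deals them round-robin to groups of processes, each group running one parallel simulation (the first level). The function names and the static round-robin policy are hypothetical; a grid middleware would typically schedule tasks dynamically.

```python
from itertools import product


def enumerate_tasks(n_param_sets, n_samples):
    """List every independent (parameter set, random sample) pair.

    Levels 2 and 3 are both embarrassingly parallel, so they can be
    flattened into one list of independent tasks.
    """
    return list(product(range(n_param_sets), range(n_samples)))


def assign_tasks(tasks, total_procs, procs_per_sim):
    """Split total_procs into groups of procs_per_sim processes (level 1,
    one group per running simulation) and deal tasks to groups round-robin."""
    n_groups = total_procs // procs_per_sim
    if n_groups == 0:
        raise ValueError("not enough processes for a single simulation")
    groups = [[] for _ in range(n_groups)]
    for i, task in enumerate(tasks):
        groups[i % n_groups].append(task)
    return groups
```

For example, 50 parameter sets with 100 samples each (an assumed sample count) give 5000 tasks; with 512 processes and 16 processes per simulation there are 32 groups, each receiving 156 or 157 tasks.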
We are developing H2OLab, a scientific platform for hydrogeology. The platform is designed to ease the integration of new modules and the coupling of existing ones. We use C++ development environments and software engineering tools. We have implemented three levels of distributed and parallel computing.
Each simulation is memory- and CPU-intensive. The platform relies on free software libraries, such as parallel sparse solvers based on the MPI library. We therefore chose to develop distributed-memory algorithms with MPI as well. Each simulation is fully parallel, with data distributed from beginning to end. These parallel deterministic simulations are operational in H2OLab, and we are investigating their scalability.
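Keeping data distributed from beginning to end means that each process builds and stores only its own part of the domain. A minimal sketch, assuming a simple one-dimensional block partition of N grid cells (or grid rows) over p processes; the actual partitioning used in H2OLab is not described here, and `local_range` is a hypothetical helper. In an MPI program, `rank` and `n_ranks` would come from MPI_Comm_rank and MPI_Comm_size.

```python
def local_range(n_cells, n_ranks, rank):
    """Return the half-open range [start, end) of cells owned by `rank`
    when n_cells are split into n_ranks contiguous blocks whose sizes
    differ by at most one."""
    base, extra = divmod(n_cells, n_ranks)
    # The first `extra` ranks each own one additional cell.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end
```

Each process only ever allocates `end - start` cells, so the global mesh never has to fit in the memory of a single node.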
The intermediate level is the non-intrusive Uncertainty Quantification method, currently Monte Carlo. We have developed a generic Monte Carlo module. We use a specific random number generator to guarantee independent simulations. Thanks to the generic module and this random number generation, a Monte Carlo run contains an embarrassingly parallel loop of simulations, which can readily be distributed on a computational grid. We have currently implemented a parallel version using the MPI standard. It can be generalized to a version with an extended MPI library or to a distributed version with a grid service. The Monte Carlo module can also be extended to any non-intrusive UQ method.
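Independent, reproducible samples are what make the Monte Carlo loop embarrassingly parallel. The Python sketch below illustrates the idea; it does not reproduce the random number generator used in H2OLab, which the text does not name. As a stand-in technique, each sample derives its own seed by hashing a base seed with the sample index, so any process can regenerate any sample without communication. Hash-based seeding gives reproducible, well-separated streams, but it is not the formal independence guarantee of a dedicated parallel generator. `run_sample` is a hypothetical placeholder for one simulation.

```python
import hashlib
import random


def sample_seed(base_seed, sample_index):
    """Derive a reproducible 64-bit seed for one sample by hashing
    (base_seed, sample_index)."""
    digest = hashlib.sha256(f"{base_seed}:{sample_index}".encode()).digest()
    return int.from_bytes(digest[:8], "little")


def run_sample(base_seed, sample_index):
    """Hypothetical stand-in for one simulation: draw a random field of
    100 Gaussian values and return its mean as a scalar output."""
    rng = random.Random(sample_seed(base_seed, sample_index))
    field = [rng.gauss(0.0, 1.0) for _ in range(100)]
    return sum(field) / len(field)


def monte_carlo(base_seed, n_samples, rank=0, n_ranks=1):
    """Run the samples owned by `rank` (round-robin) and return
    {sample_index: output}. With n_ranks > 1, each process calls this
    with its own rank and the partial results are gathered afterwards."""
    return {i: run_sample(base_seed, i) for i in range(rank, n_samples, n_ranks)}
```

Because sample i always uses the same seed, merging the outputs of several ranks gives exactly the serial result, whatever the number of processes; the Monte Carlo estimate is then the mean over all samples (for example `statistics.fmean(results.values())`).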
At the multiparametric level, we choose only the distributed approach, as do most projects on computational grids. Multiparametric simulations require more than 50 sets of data and generate as many results. We have developed a tool that automatically generates a multiparametric study: from a given range of parameter values, it generates all corresponding input data files and an associated batch file to run the complete study. This tool is now ready to be deployed on a computational grid using suitable middleware.
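The study generator can be sketched as a Cartesian product over parameter ranges. This is a minimal Python illustration, not the H2OLab tool: the file names (`case_0000.in`, `run_study.sh`), the `name = value` input format and the simulator command `./h2olab_run` are all hypothetical.

```python
import itertools
from pathlib import Path


def make_cases(param_ranges):
    """Return one dict per combination of parameter values
    (Cartesian product over the ranges, parameters sorted by name)."""
    names = sorted(param_ranges)
    return [dict(zip(names, values))
            for values in itertools.product(*(param_ranges[n] for n in names))]


def input_text(case):
    """Format one case as a hypothetical 'name = value' input file."""
    return "".join(f"{name} = {value}\n" for name, value in case.items())


def batch_text(input_names, simulator):
    """Build a shell batch script that runs every case in turn."""
    lines = ["#!/bin/sh"] + [f"{simulator} {name}" for name in input_names]
    return "\n".join(lines) + "\n"


def write_study(param_ranges, out_dir, simulator="./h2olab_run"):
    """Write all input files and the batch script; return the input names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for k, case in enumerate(make_cases(param_ranges)):
        name = f"case_{k:04d}.in"
        (out / name).write_text(input_text(case))
        names.append(name)
    (out / "run_study.sh").write_text(batch_text(names, simulator))
    return names
```

For instance, four Peclet numbers combined with four values of a second parameter (illustrative choices) already give 16 input files; a third parameter with four values raises this to 64, which is how a study quickly exceeds 50 parameter sets. Keeping the case list separate from file writing also makes it easy to hand the same cases to a grid middleware instead of a local batch file.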
Preliminary experiments on Grid’5000 clusters clearly show that the results match our expectations. We can therefore adopt the same strategy for very large 3D computational domains. Our objective is to use the middleware available on Grid’5000 to run the three levels of distributed computing.