# Grid'5000 experiment

## Distributed Generation of Large Scale Permutations (Application)

### Conducted by

Jens Gustedt

### Description

For testing and benchmarking, the generation of large random input data
with known probability distributions is crucial. In
[GUSTEDT:2008:INRIA-00000900:2],
we show how to uniformly
distribute data at random in two related settings: *coarse grained
parallelism* and *external memory*. In contrast to previously
known work for parallel setups, our method is able to fulfill the three
criteria of uniformity, work-optimality and balance among the processors
simultaneously. To guarantee the uniformity we investigate the matrix of
communication requests between the processors. We show that its
distribution is a generalization of the multivariate hypergeometric
distribution and we give algorithms to sample it efficiently in the two
settings.
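The column-by-column structure of that matrix can be made concrete: conditioned on every processor holding and receiving exactly *n/p* items, each column of the communication matrix follows a multivariate hypergeometric distribution over the items still unassigned at each source. The following is a minimal sketch of this sampling idea (not the parXXL implementation), using NumPy's `multivariate_hypergeometric` sampler:

```python
import numpy as np

def sample_comm_matrix(p, block, rng=None):
    """Sample a p x p matrix M where M[i, j] is the number of items
    processor i sends to processor j, distributed as the counts induced
    by a uniform random permutation with balanced blocks of size
    `block` (= n/p) on every processor."""
    rng = np.random.default_rng() if rng is None else rng
    remaining = np.full(p, block)      # items not yet assigned on each source
    M = np.zeros((p, p), dtype=np.int64)
    for j in range(p):                 # fill the column of destination j
        col = rng.multivariate_hypergeometric(remaining, block)
        M[:, j] = col
        remaining -= col
    return M

M = sample_comm_matrix(p=4, block=10)
# Every row and every column sums to block, so the load stays balanced.
assert (M.sum(axis=0) == 10).all() and (M.sum(axis=1) == 10).all()
```

Sampling the columns sequentially is valid because, once a destination's block is drawn, the conditional distribution of the remaining columns is again multivariate hypergeometric on the reduced source counts.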

If instead of shuffling existing data we constrain ourselves to produce
permutations of integers *0, ..., n* for some large number *n*,
solutions that are much more efficient become possible. First, we are
able to show that the communication mentioned above can be improved by
using adapted compression techniques. In particular the information
theoretic lower bound of *Theta(n log p)* (instead of *Theta(n log
n)*) for the overall communication can be matched up to a constant
factor. Second, if instead of communicating data (integers in that case)
we generate them in place, we achieve a scheme that allows for a
trade-off between the number of random bits (i.e., the quality of the
target distribution) and the total running time of the algorithm. This
approach is presented in [GUSTEDT:2008:INRIA-00312131:1].
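The gap between the naive *Theta(n log n)* cost and the *Theta(n log p)* information-theoretic bound can be checked by a back-of-the-envelope count: shipping each of the *n* integers verbatim costs about *log n* bits, while the message a processor receives is a sorted subset of size *n/p* of *{0, ..., n-1}*, whose information content is only *log C(n, n/p) ≈ (n/p) log p* bits. A toy illustration of this accounting (with arbitrary example sizes, not the actual encoding scheme of the paper):

```python
from math import comb, log2

n, p = 1 << 16, 64          # example sizes: 65536 items, 64 processors
k = n // p                  # block size each processor ends up holding

# Naive cost: every integer is transmitted verbatim -> Theta(n log n).
naive_bits = n * log2(n)

# Information content of one sorted k-subset of {0, ..., n-1},
# summed over the p receiving processors -> Theta(n log p).
compressed_bits = p * log2(comb(n, k))

print(f"naive:      {naive_bits / n:.2f} bits/item")
print(f"compressed: {compressed_bits / n:.2f} bits/item")
```

With these numbers the verbatim encoding needs 16 bits per item, while the subset encoding needs roughly *log2 p + log2 e ≈ 7.4* bits per item, matching the *Theta(n log p)* bound up to a constant.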

### Status

achieved

### Resources

- *Nodes involved:* 100
- *Sites involved:* 1
- *Minimum walltime:* 1h
- *Batch mode:* yes
- *Use kadeploy:* no
- *CPU bound:* yes
- *Memory bound:* yes
- *Storage bound:* no
- *Network bound:* yes
- *Interlink bound:* no

### Tools used

parXXL

### Results

Not yet.

*Shared by: Jens Gustedt*

*Last update: 2008-11-19 10:52:37*

*Experiment #522*