Grid'5000 is a scientific instrument designed to support experiment-driven research in all areas of computer science related to parallel, large-scale or distributed computing and networking.
Grid'5000 was prototyped in mid-2003 and took off in 2004.
On April 5th, 2011, 7244 cores distributed among 1500 nodes were available for experiments.
The trend is toward a higher ratio of cores to nodes, while the number of nodes remains stable.
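As a quick illustration of that ratio, the short sketch below computes the average number of cores per node from the April 2011 figures quoted above. The variable names are illustrative only.

```python
# Figures quoted above for April 5th, 2011.
cores = 7244
nodes = 1500

# Average number of cores per node across the whole platform.
cores_per_node = cores / nodes
print(f"{cores_per_node:.2f} cores per node on average")
```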
Hardware information is gathered at :
The live status of the platform (live or dead machines, job load among the clusters, ...) is at :
Specialized visualizations of the platform are available through the Grid'5000 API user interface.
The technical team
The whole team comprises between 15 and 20 engineers, organized into specialized groups:
- support:
  - all subjects related to platform administration and maintenance
- development:
  - development of the platform and its tools: KaVLAN, Kadeploy, APIs, UMS, ...
For specialized topics such as specific parallel or distributed technologies, you are better off asking the Grid'5000 user community for help through the users mailing list (firstname.lastname@example.org).
Account management is handled through the centralized User Management System (UMS).
- Request your account (if not already done) with that form.
- Once your request is approved, manage your account with the dedicated interface.
Some accounts have higher privileges than others, which implements a hierarchy of account management.
- User accounts have standard privileges: access to the wiki, web services and grid systems.
  - They are supervised by managers.
- Account managers: they are in charge of one or several accounts.
  - They can approve some requests from user accounts.
  - They are supervised by top managers.
- Site managers (or top managers): they are responsible for all the accounts of a site or a project.
  - They can approve all requests from user accounts.
- Admin accounts: reserved for Grid'5000 staff; unlimited powers.
  - They can repair or unlock things.
- Observer accounts: access to the wiki only.
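The hierarchy described above can be pictured with a small model. This is only a sketch for illustration, not the UMS implementation. The names `Role`, `ACCESS`, `SUPERVISOR`, `can_access` and `supervision_chain` are hypothetical, and the service access of manager accounts is an assumption, since the list above does not state it.

```python
from enum import Enum


class Role(Enum):
    OBSERVER = "observer"
    USER = "user"
    ACCOUNT_MANAGER = "account manager"
    SITE_MANAGER = "site manager"
    ADMIN = "admin"


# Services reachable by each role, taken from the list above.
ACCESS = {
    Role.OBSERVER: {"wiki"},
    Role.USER: {"wiki", "web services", "grid systems"},
}

# Who supervises whom, taken from the list above.
SUPERVISOR = {
    Role.USER: Role.ACCOUNT_MANAGER,
    Role.ACCOUNT_MANAGER: Role.SITE_MANAGER,
}


def can_access(role, service):
    """Return True if the role may reach the given service."""
    if role is Role.ADMIN:
        return True  # admin accounts have unlimited powers
    # Assumption: manager accounts keep the standard user access.
    return service in ACCESS.get(role, ACCESS[Role.USER])


def supervision_chain(role):
    """List the roles supervising the given role, closest first."""
    chain = []
    while role in SUPERVISOR:
        role = SUPERVISOR[role]
        chain.append(role)
    return chain


print([r.value for r in supervision_chain(Role.USER)])
print(can_access(Role.OBSERVER, "grid systems"))
```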
Grid'5000 is a unique tool, but its usage and scientific output are regularly audited by its funders.
Users are therefore strongly encouraged to keep their reports up to date, describing their research activities and listing their experiments, results, publications, collaborations...
- Gathers the knowledge of engineers and researchers about Grid'5000.
- Archives and shares information for the community.
- Features 3 portals : public, users, admin.
- Institutional pages for public visibility
- Targeted at the whole scientific community: history, descriptions, publications...
- unauthenticated access
Scientific work produced with Grid'5000 can be accessed at:
- Dedicated to the Grid'5000 user.
- Unauthenticated access allowed.
Go there to grab basic documentation on how to start with Grid'5000. It features main pointers to :
- the tutorials : help to learn ...
- the tools : resource reservation, deployment, ...
- your account : management, usage, ...
- the platform information : events, status, ...
- Meta-information about how the Grid'5000 community works: charter, support, ...
- Heart of the Grid'5000 knowledge base
- Authenticated access only.
It gathers more detailed documentation about :
- Grid'5000 software.
- Site hardware.
- Technical documentation about Grid'5000 usage.
- Technical committee meeting minutes...
Some technical web services (monitoring tools, Bugzilla database, etc.) are hosted on a separate Helpdesk, but all links are provided by the main Grid'5000 portal.
The Grid'5000 wiki is open to its community: do not hesitate to contribute (typo fixes, updates, additions...).
Some editing rules must nevertheless be followed, to avoid chaos and to keep information easy to find and easy to maintain. Pay attention to:
- Namespace+name when creating a new page.
- Cross-linking to/from existing pages :
- Make new information easy to find from existing top-level pointers (sidebar, portals).
- Avoid redundancy.
- Prefer cross-links, inclusions or redirections over simple duplication.
In case of doubt, do not hesitate to ask the wiki administrators and coordinators (David Margery, et al.) for help.
For historical reasons, Grid'5000 has been built upon a network of dedicated clusters. It is not an ad hoc grid.
This implies two ways of using Grid'5000:
- At the grid level (to make experiments in a grid environment).
- At the cluster level (to maximise hardware homogeneity and bandwidth).
- It's the easiest level to start with as it is the basic building block unit, for hardware as well as for software.
- But multi-site experiments are favoured by the charter and some tools because these are difficult
A site refers to the geographical area of the laboratory hosting the machines. Almost all Grid'5000 sites host more than one cluster.
Historically, Grid'5000 hardware has been acquired in incremental steps at each site, each acquisition forming a new cluster.
A cluster is a set of computers which present homogeneous properties.
A cluster is connected to a given network architecture and is physically installed on a given site.
All clusters from a given site rely upon a common network infrastructure.
A computer that is part of a cluster is called a node. Two types of nodes can be distinguished:
- compute nodes: the base element of a cluster, on which computations are run; they are usually referred to simply as nodes.
- service nodes: these machines are not meant to execute user applications but are dedicated to hosting the grid infrastructure services.
- Some service nodes are called frontends because of their particular role.
Each node may offer several resources to the users. On Grid'5000, the finest grain of resource is the core.
Every cluster usually consists of:
- one or more service nodes (also called frontends)
  - from the user side, these machines are mainly used for access, resource reservation and deployment
  - from the admin side, these machines host virtual hosts and infrastructure services
- compute nodes: the main computing power.
  - Each node may have several CPUs, and each CPU possibly several cores. The Grid'5000 resource manager allows reservation at the node level and at the core level.
- Service nodes hosting all the infrastructure services: their systems rely on virtualization for isolation needs.
In principle, service nodes and frontends do not contribute to the computing power of the cluster and are therefore not counted as compute nodes in the cluster hardware description.
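The vocabulary above (site, cluster, node, CPU, core) maps naturally onto a small data model. The sketch below is only an illustration of that hierarchy. The class names, the cluster name `example` and the node sizes are hypothetical and do not describe real Grid'5000 hardware.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    cpus: int = 1
    cores_per_cpu: int = 1
    is_service: bool = False  # frontends and other service nodes

    @property
    def cores(self) -> int:
        # The core is the finest grain of resource.
        return self.cpus * self.cores_per_cpu


@dataclass
class Cluster:
    name: str
    site: str
    nodes: List[Node] = field(default_factory=list)

    def compute_nodes(self) -> List[Node]:
        # Service nodes are not counted as compute nodes.
        return [n for n in self.nodes if not n.is_service]

    def compute_cores(self) -> int:
        return sum(n.cores for n in self.compute_nodes())


# Hypothetical cluster: one frontend and two dual-CPU, quad-core compute nodes.
cluster = Cluster(
    name="example",
    site="example-site",
    nodes=[
        Node("frontend", is_service=True),
        Node("example-1", cpus=2, cores_per_cpu=4),
        Node("example-2", cpus=2, cores_per_cpu=4),
    ],
)
print(len(cluster.compute_nodes()), "compute nodes,", cluster.compute_cores(), "cores")
```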
- All clusters of a site are physically connected.
- The network topology of each site is described in pages like :
- RENATER (the French research and education backbone) provides 10 Gbps interconnection between Grid'5000 sites.
- System services that keep the platform working (DNS, LDAP, ...)
- Distributed among all sites, with no outbound routing policy so the traffic is well isolated.
- External services (wiki, mailing-list manager...) are centralized on dedicated hosts reachable from the outside (public IP addresses).
Software and middleware
The two main services offered specifically by Grid'5000 are :
- resource management and job scheduling: the ability for users to request resources on the platform, with the guarantee of fair access to them.
- deployment of system images: the ability for users to reinstall the system on their reserved nodes.
Other services are under way :
Resource management and job scheduling
This job is handled by OAR 2, developed at IMAG.
It performs 3 tasks :
- Reserve resources (compute nodes, cores...) for a given duration, on behalf of the requesting user.
- Schedule the user's job over the reserved nodes ; the scheduler guarantees a fair use of the machine time.
- Free the resources at the end of the reservation.
All the resources of a given site are managed by a single instance of OAR 2.
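The three tasks listed above (reserve, schedule, free) can be pictured with a toy in-memory model. This is a sketch for illustration only: it is not OAR, it does not use OAR's commands or API, and it ignores fairness and real timing. The class `ToyResourceManager` and its methods are hypothetical names.

```python
class ToyResourceManager:
    """Toy model of one per-site resource manager (not OAR itself)."""

    def __init__(self, node_names):
        self.free = set(node_names)
        self.reservations = {}  # job id -> (user, nodes, duration in seconds)
        self._next_id = 1

    def reserve(self, user, count, duration_s):
        """Task 1: reserve `count` nodes for a duration on behalf of a user."""
        if count > len(self.free):
            raise RuntimeError("not enough free nodes")
        nodes = sorted(self.free)[:count]
        self.free -= set(nodes)
        job_id = self._next_id
        self._next_id += 1
        self.reservations[job_id] = (user, nodes, duration_s)
        return job_id

    def run(self, job_id, job):
        """Task 2: run the user's job on each reserved node."""
        _, nodes, _ = self.reservations[job_id]
        return [job(node) for node in nodes]

    def release(self, job_id):
        """Task 3: free the resources at the end of the reservation."""
        _, nodes, _ = self.reservations.pop(job_id)
        self.free |= set(nodes)


manager = ToyResourceManager(["node-1", "node-2", "node-3"])
job_id = manager.reserve("alice", count=2, duration_s=3600)
print(manager.run(job_id, lambda node: f"hello from {node}"))
manager.release(job_id)
```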
Resource reservations must abide by the User Charter.
OAR-Grid 2 is a tool built on top of OAR for grid reservations: several nodes on several sites at the same time.
Grid'5000 allows users to build their own customized system environment and install it on their reserved nodes.
This task is handled by Kadeploy and the associated ka-* tools. Kadeploy usage is also covered in the following practical sessions.
Two options are available to benefit from a customized image :
- Building an image from scratch :
- Requires some preliminary work and specific knowledge.
- Allows for optimal experiment conditions and reproducibility (as long as the hardware remains the same).
- Customizing an image maintained by the support staff.
- Some knowledge about services infrastructure is required (name resolution: DNS, authentication: LDAP, Home directories: NFS, access: SSH, ...).
The Grid'5000 API UI provides a simple way to browse the hardware composition of Grid'5000.
The full Grid'5000 hardware description can be browsed programmatically through the API, but it is also gathered there.
- All machines are based on the x86-64 architecture.
- Processors are either AMD or Intel.
- Hardware differs slightly from cluster to cluster, which creates a richer grid ecosystem.
- At least full 1 Gbps Ethernet interconnection.
- Low-latency, high-performance networks such as Myrinet or InfiniBand (usually featuring 10 Gbps bandwidth).
- Local disks (at least 80 GB).
The same partitioning scheme is used throughout Grid'5000.
- Huge storage space, mainly for hosting
Working with Grid'5000
The security policy can be summarized as follows:
- No outbound connection allowed from Grid'5000 toward the Internet.
- Inbound Internet connections to a site's clusters may be filtered, depending on the local security policy.
- Hosting laboratory networks are allowed to connect to their site's Grid'5000 cluster(s).
- All traffic is allowed between two Grid'5000 endpoints.
For more information, see Security model.
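The four rules summarized above can be written down as a small decision function. This is a minimal sketch of the summary only, not the actual firewall configuration. The names `Zone`, `Verdict` and `check_connection` are hypothetical, and the sketch ignores any whitelist of external hosts reachable from inside the platform.

```python
from enum import Enum


class Zone(Enum):
    GRID5000 = "grid5000"
    HOSTING_LAB = "hosting laboratory"
    INTERNET = "internet"


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    SITE_POLICY = "depends on the local site policy"


def check_connection(src, dst):
    """Apply the summarized security rules to a (source, destination) pair."""
    if src is Zone.GRID5000 and dst is Zone.GRID5000:
        return Verdict.ALLOW  # all traffic between two Grid'5000 endpoints
    if src is Zone.GRID5000 and dst is Zone.INTERNET:
        return Verdict.DENY  # no outbound connection toward the Internet
    if src is Zone.HOSTING_LAB and dst is Zone.GRID5000:
        return Verdict.ALLOW  # hosting lab to its site's cluster(s)
    if src is Zone.INTERNET and dst is Zone.GRID5000:
        return Verdict.SITE_POLICY  # may be filtered locally
    raise ValueError("case not covered by the summary")


print(check_connection(Zone.GRID5000, Zone.INTERNET).value)
```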
Internet access is filtered for security reasons, so only a few external websites are reachable from inside Grid'5000. Permitted accesses fall into two categories:
- common : these accesses are common to the whole Grid'5000 platform.
- site : these accesses are specific to a site.
Common accesses are mainly about :
- Linux packages mirrors (Debian, Fedora, CentOS, ...)
- Kernel archive repository
- INRIA Gforge (INRIA's software forge)
Web accesses are governed by a policy. Please refer to it for a deeper understanding of web access inside Grid'5000.
If you need an extra access that is not currently authorized from inside Grid'5000, and you believe it serves a legitimate use of Grid'5000, you may ask for the host to be whitelisted by using the default support request procedure.
Grid'5000 is a grid, so each node of any site can communicate with any other node of any site.
A Grid'5000 account gives you a home directory on each site. This home directory is mounted on the site's frontend and on nodes that use the reference image (the default site system image).
- Please note that you have a distinct home directory on each site (i.e. 9 Grid'5000 sites --> 9 home directories).
Data synchronization between your lab's home directory and all your Grid'5000 home directories is your responsibility.
- Grid'5000 home directories are subject to a quota on each site (soft limit of 25 GB, hard limit of 100 GB).
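To make the two quota thresholds concrete, the sketch below classifies a home directory's usage against the limits quoted above. It assumes the usual meaning of disk quotas (a soft limit may be exceeded temporarily, a hard limit may not). The function `quota_status`, the site names and the usage figures are hypothetical; real usage would have to be measured on each site.

```python
SOFT_LIMIT_GB = 25   # soft limit quoted above
HARD_LIMIT_GB = 100  # hard limit quoted above


def quota_status(used_gb):
    """Classify a home directory's usage against the two limits."""
    if used_gb < 0:
        raise ValueError("usage cannot be negative")
    if used_gb >= HARD_LIMIT_GB:
        return "hard limit reached"
    if used_gb > SOFT_LIMIT_GB:
        return "over soft limit"
    return "ok"


# One home directory per site: hypothetical usage figures, in GB.
usage_by_site = {"site-a": 3.5, "site-b": 40.0, "site-c": 100.0}
for site, used in usage_by_site.items():
    print(f"{site}: {used} GB -> {quota_status(used)}")
```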
Information to consider:
- Advertisements of current events (maintenance, incidents)
- Bugzilla, to keep track of issues and solutions