Multi-cluster policy

From Grid5000

See also: Multi-cluster policy | Multi-cluster implementation (admin)

A uniform way to manage the clusters of a site.

Why a convention on multi-cluster management?

Rationalize the configuration of the sites so that:

  • Users can handle the platform more efficiently
  • Administrators can share their way of managing the platform

Convention policy

What is a cluster?

A cluster is a relatively homogeneous set of nodes:

  • homogeneous in terms of deployment (configuration and performance)
  • homogeneous for users' experiments

This loose definition can be adapted to each case when new hardware is purchased.

Note: a good, though not mandatory, indication that a new cluster has appeared is a new value of the (site, nodemodel, cputype, cpufreq) tuple
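
The note above can be read as a grouping rule. The following is a minimal sketch only, assuming a hypothetical node inventory whose field names mirror (site, nodemodel, cputype, cpufreq); the host names and hardware values are invented for illustration and do not describe any real site.

```python
from collections import defaultdict

# Hypothetical inventory: every name and value here is illustrative only.
NODES = [
    {"host": "node-1", "site": "siteA", "nodemodel": "modelX", "cputype": "cpuP", "cpufreq": 2.0},
    {"host": "node-2", "site": "siteA", "nodemodel": "modelX", "cputype": "cpuP", "cpufreq": 2.0},
    {"host": "node-3", "site": "siteA", "nodemodel": "modelY", "cputype": "cpuQ", "cpufreq": 2.4},
]

KEY_FIELDS = ("site", "nodemodel", "cputype", "cpufreq")


def cluster_key(node):
    """Return the (site, nodemodel, cputype, cpufreq) value of a node."""
    return tuple(node[field] for field in KEY_FIELDS)


def group_by_key(nodes):
    """Group host names by their key; each group is a candidate cluster."""
    groups = defaultdict(list)
    for node in nodes:
        groups[cluster_key(node)].append(node["host"])
    return dict(groups)


def introduces_new_key(existing_nodes, new_node):
    """True if new_node carries a key that no existing node has."""
    known = {cluster_key(node) for node in existing_nodes}
    return cluster_key(new_node) not in known
```

Since the indication is not mandatory, a new key is only a hint that a new cluster may be appropriate; the final decision follows the procedure for receiving new compute nodes.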

User frontends

At least one user frontend must be available on each site.

Default requirements

All the user frontends must allow users to:

  • authenticate themselves
  • manage their data
  • submit jobs
  • access all the resources of the grid

Main user frontend

A main user frontend is elected on each site and must fulfill additional requirements:

The main user frontend helps users understand the platform and makes scripting more comfortable.

Services

Todo

Load balancing may be part of this draft one day, but it is not required for the OAR2 migration.

Job submission

Users may be able to reserve from the user frontends:

  • any resources of the local site with OAR, to improve job allocation
  • any resources of the grid with OAR Grid
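
As an illustration of a local reservation, the sketch below assembles an `oarsub` argument list in Python without running it. The `-l` (resource request), `-p` (property filter) and `-I` (interactive job) options are standard `oarsub` options. The `cluster` property name and the cluster name `examplecluster` are assumptions, since they depend on how each site declares its resources.

```python
def build_oarsub_command(nodes, walltime, cluster=None, interactive=True):
    """Return the argument list for a local OAR reservation (not executed)."""
    cmd = ["oarsub", "-l", f"nodes={nodes},walltime={walltime}"]
    if cluster is not None:
        # Assumed property name: restricts the request to a single cluster.
        cmd += ["-p", f"cluster='{cluster}'"]
    if interactive:
        cmd.append("-I")
    return cmd


# Hypothetical request: 2 nodes for one hour on an invented cluster name.
command = build_oarsub_command(2, "1:00:00", cluster="examplecluster")
```

On a user frontend, the resulting list could be passed to `subprocess.run(command)`. Building a list rather than a single string avoids shell quoting problems with the property filter.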

Deployment

Users may be able to deploy on any of the resources of the local site from the main user frontend.

Procedures

Receiving new compute nodes

  • During a meeting of the CT, the site presents its new hardware and its wish to integrate it
  • The members of the CT then examine whether the request can be fulfilled without harming the platform's uniformity
  • At the end of the meeting, a consensus is reached on whether to integrate the new hardware as a new cluster or as an extension

Retiring old compute nodes

Todo

To be drafted one day, but not required for the OAR2 migration.
