Grid'5000 experiment

Integration of SLURM resource manager with OARv2 batch scheduler (Middleware)

Conducted by

Olivier Richard

Description

SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work. SLURM is not a sophisticated batch system, but it does provide an Application Programming Interface (API) for integration with external schedulers such as the Maui Scheduler, Moab Cluster Suite and Platform LSF.

OAR is an open-source batch scheduler that provides simple and flexible use of a cluster. It manages cluster resources as a traditional batch scheduler does (like PBS, Torque, LSF or SGE). Its design is based on high-level tools:

  • relational database engine: MySQL or PostgreSQL
  • scripting language: Perl
  • confinement mechanism: cpuset
  • scalable remote execution tool: Taktuk

It is flexible enough to be suitable for both production clusters and research experiments. It currently manages more than 5000 nodes and has executed more than 5 million jobs.

Our goal is to couple SLURM with the OARv2 batch scheduler. There are two different paths for this:

  1) Use the SLURM API and provide OARv2 as an external scheduler for SLURM. In this case the SLURM commands are used for job submission and monitoring.
  2) Use SLURM as the low-level resource manager in place of Taktuk for the OARv2 batch scheduler. In this case the OARv2 commands are used for job submission and monitoring.

We intend to implement both methods and propose experiments to test and evaluate the performance of the coupled systems in terms of scalability and efficiency, by comparing the two methods 1) with each other, 2) with each system running standalone, and 3) with other possible resource managers and batch schedulers.
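
The three resource manager functions described above (allocating nodes, starting work on them, and queuing requests that cannot yet be served) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it uses neither the SLURM nor the OAR API, and the names ToyResourceManager, Job, submit and finish are hypothetical, chosen for illustration. The strict first-in-first-out policy is likewise an assumption, not the policy of either system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job record: a name and the number of nodes requested.
    name: str
    nodes_needed: int
    allocated: list = field(default_factory=list)


class ToyResourceManager:
    """Toy model only: allocates nodes, starts jobs, queues pending work."""

    def __init__(self, node_names):
        self.free = list(node_names)   # nodes currently unallocated
        self.pending = deque()         # jobs waiting for resources (FIFO)
        self.running = {}              # job name -> running Job

    def submit(self, job):
        # Every request enters the queue, then we try to start work.
        self.pending.append(job)
        self._schedule()

    def finish(self, name):
        # Release the nodes of a finished job and re-run scheduling.
        job = self.running.pop(name)
        self.free.extend(job.allocated)
        self._schedule()

    def _schedule(self):
        # Strict FIFO (an assumption): the job at the head of the queue
        # blocks later jobs until enough nodes are free for it.
        while self.pending and self.pending[0].nodes_needed <= len(self.free):
            job = self.pending.popleft()
            job.allocated = [self.free.pop(0) for _ in range(job.nodes_needed)]
            self.running[job.name] = job
```

In this model, an external scheduler (path 1 above) would correspond to replacing the _schedule policy while keeping the submission interface, whereas path 2 would keep the batch scheduler's own interface and delegate only the allocation and launch steps.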

Status

In progress

Resources

  • Nodes involved: 1
  • Sites involved: 1
  • Minimum walltime: 1h
  • Use kadeploy: yes

Tools used

SLURMv1.2.1 resource manager, OARv2.0.1-2 batch scheduler

Results

Not yet

Shared by: Olivier Richard
Last update: 2008-11-19 21:31:52
Experiment #353
