Grid experiment-OAR2

From Grid5000

Introduction

More than 20 clusters spread over 9 sites are available on Grid'5000.

OAR Grid is a simple tool built on top of OAR2 that helps you use resources from the whole grid at once.

The purpose of this tutorial is to run a job across 3 clusters located at 3 different sites.

Image:Note.png Note

We recommend opening at least 2 terminals:

  1. one for running jobs
  2. another for monitoring and acting on the jobs.

Prepare the experiment environment

Cluster setup

Please refer to the cluster setup done in the previous tutorial.


Synchronize data on each site

Your home directories on each site are independent of each other. Before submitting grid jobs, you therefore have to make sure that the experiment's data is available on every site you plan to use.

(1) Synchronize your SSH public key and configuration:

Image:Terminal.png frontend:
rsync --delete -avz ~/.ssh --exclude known_hosts SITE.grid5000.fr:

(2) Synchronize the experiment data (code, configuration files...):

Image:Terminal.png frontend:
rsync --delete -avz ~/hello SITE.grid5000.fr:
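
When several sites are involved, the two rsync commands above have to be repeated for each of them. The loop below is a minimal sketch of that repetition. The SITES list, the DRY_RUN switch and the run and sync_site helpers are illustrative names, not part of Grid'5000. By default the commands are only printed, so the result can be checked before anything is copied.

```shell
# Remote sites to synchronize to (hypothetical list; adjust to your experiment
# and leave out the site you are currently logged in to).
SITES="lille rennes sophia"

# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run rsync.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Push the SSH configuration and the experiment data to one site,
# using the same rsync options as the two commands above.
sync_site() {
    run rsync --delete -avz ~/.ssh --exclude known_hosts "$1.grid5000.fr:"
    run rsync --delete -avz ~/hello "$1.grid5000.fr:"
}

for site in $SITES; do
    sync_site "$site"
done
```

Once the printed commands look right, running the same script with DRY_RUN=0 performs the actual synchronization.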

Visualize grid

As for cluster experiments, it's possible to analyze the grid state.

disco

disco is a grid resource discovery tool that finds the maximum resources available over a given time range for the specified resource alias(es).

Find the resources available from now to now + 1 hour on the paradent cluster (located in Rennes):

Image:Terminal.png frontend:
disco paradent

Find the resources available from now to now + 1 hour on the Lille, Rennes and Sophia sites:

Image:Terminal.png frontend:
disco lille rennes sophia
lille: 
  resources:(max/dead/avail):       1045/690/355
  nodes:    (max/dead/fully_avail): 175/115/21
sophia: 
  resources:(max/dead/avail):       1130/902/228
  nodes:    (max/dead/fully_avail): 224/73/6
rennes: 
  resources:(max/dead/avail):       2578/2366/212
  nodes:    (max/dead/fully_avail): 390/232/4

Nb available resources:    795
Nb fully available nodes:  31


To reserve at resource level (inside/outside):
  oargridsub -s "2011-04-09 15:59:08" -w 1:00:00   lille:rdef="core=355",sophia:rdef="core=228",rennes:rdef="core=212"
  ssh frontend.grenoble.grid5000.fr oargridsub -s \"2011-04-09 15:59:08\"  -w 1:00:00 lille:rdef="core=355",sophia:rdef="core=228",rennes:rdef="core=212" 

To reserve at nodes level with -t allow_classic_ssh enabled (inside/outside)::
  oargridsub -t allow_classic_ssh -s "2011-04-09 15:59:08" -w 1:00:00   lille:rdef="nodes=21",sophia:rdef="nodes=6",rennes:rdef="nodes=4"
  ssh frontend.grenoble.grid5000.fr oargridsub -t allow_classic_ssh -s \"2011-04-09 15:59:08\"  -w 1:00:00     lille:rdef="nodes=21",sophia:rdef="nodes=6",rennes:rdef="nodes=4"


In addition to reporting the maximum available resources, disco lists the oargridsub commands to use to reserve them.
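
The node-level command suggested by disco is derived from the third number of each "nodes:" line (fully_avail). The sketch below reproduces that derivation from the textual report, which can be handy in a script that wants to adjust the request before submitting. The disco_to_rdef function is a hypothetical helper, and the parsing assumes the report layout shown above.

```shell
# Turn a disco per-site report (read on stdin) into an oargridsub resource
# list requesting every fully available node of each site.
disco_to_rdef() {
    awk '
        # A line such as "lille:" opens the report of a site.
        /^[A-Za-z0-9_-]+:[ \t]*$/ { site = $1; sub(/:$/, "", site); next }
        # "nodes: (max/dead/fully_avail): 175/115/21" -> keep the third number.
        $1 == "nodes:" && site != "" {
            if (split($NF, parts, "/") == 3 && parts[3] + 0 > 0) {
                printf "%s%s:rdef=\"nodes=%d\"", sep, site, parts[3]
                sep = ","
            }
            site = ""
        }
        END { if (sep != "") print "" }
    '
}

# Example on the report shown above.
disco_to_rdef <<'EOF'
lille:
  resources:(max/dead/avail):       1045/690/355
  nodes:    (max/dead/fully_avail): 175/115/21
sophia:
  resources:(max/dead/avail):       1130/902/228
  nodes:    (max/dead/fully_avail): 224/73/6
rennes:
  resources:(max/dead/avail):       2578/2366/212
  nodes:    (max/dead/fully_avail): 390/232/4
EOF
# prints: lille:rdef="nodes=21",sophia:rdef="nodes=6",rennes:rdef="nodes=4"
```

Sites without any fully available node are left out of the list.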

OarGridMonika

OarGridMonika is a web interface that gathers the information retrieved by the Monika instance of each site.

OarGridGantt

OarGridGantt summarizes the information given by its cluster counterpart, DrawOARGantt. It draws timeline diagrams of the past, current and planned states of each cluster.

Grid'5000 API

Another way to visualize node and job status is to use the Grid'5000 API.

A script imitating some of disco's behavior using the API with restfully is available here for API 2.0 or here for the SID API.

Grid reservation

In grid reservation mode, no script can be specified for interactive submissions.

Users are responsible for:

  1. connecting to the allocated nodes.
  2. launching their experiment.


Reservation submission

We are going to reserve 4 nodes on 3 different sites for half an hour:

Image:Terminal.png frontend:
oargridsub -t allow_classic_ssh -w '0:30:00' CLUSTER1:rdef="/nodes=2",CLUSTER2:rdef="/nodes=1",CLUSTER3:rdef="nodes=1"

OAR Grid connects to each of the specified clusters and makes a passive submission. OAR returns the cluster job ids, and OAR Grid returns a grid job id that binds these cluster job ids together.

You should see an output like this:

CLUSTER1:rdef=/nodes=2,CLUSTER2:rdef=/nodes=1,CLUSTER3:rdef=nodes=1
[OAR_GRIDSUB] [CLUSTER3] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [CLUSTER3] Reservation success on CLUSTER3 : batchId = CLUSTER_JOB_ID3
[OAR_GRIDSUB] [CLUSTER2] Date/TZ adjustment: 1 seconds
[OAR_GRIDSUB] [CLUSTER2] Reservation success on CLUSTER2 : batchId = CLUSTER_JOB_ID2
[OAR_GRIDSUB] [CLUSTER1] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [CLUSTER1] Reservation success on CLUSTER1 : batchId = CLUSTER_JOB_ID1
[OAR_GRIDSUB] Grid reservation id = GRID_JOB_ID
[OAR_GRIDSUB] SSH KEY : /tmp/oargrid//oargrid_ssh_key_LOGIN_GRID_JOB_ID
       You can use this key to connect directly to your OAR nodes with the oar user.
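
The grid job id, the cluster job ids and the SSH key path printed above are all needed in the following steps. Assuming the output of oargridsub has been saved to a file (for instance through tee) and follows the layout shown above, the values could be extracted along these lines. The three function names are hypothetical helpers.

```shell
# Extract the grid job id from oargridsub output read on stdin.
grid_job_id() {
    sed -n 's/^\[OAR_GRIDSUB] Grid reservation id = *\([^[:space:]]*\).*$/\1/p'
}

# Extract the path of the temporary SSH key.
grid_ssh_key() {
    sed -n 's/^\[OAR_GRIDSUB] SSH KEY : *\([^[:space:]]*\).*$/\1/p'
}

# Print one "CLUSTER CLUSTER_JOB_ID" pair per successful cluster reservation.
cluster_job_ids() {
    sed -n 's/^\[OAR_GRIDSUB] .*Reservation success on \([^ ]*\) : batchId = *\([^[:space:]]*\).*$/\1 \2/p'
}

# Example on a few lines of the output shown above (placeholders kept as-is).
log=$(mktemp)
cat > "$log" <<'EOF'
[OAR_GRIDSUB] [CLUSTER1] Date/TZ adjustment: 0 seconds
[OAR_GRIDSUB] [CLUSTER1] Reservation success on CLUSTER1 : batchId = CLUSTER_JOB_ID1
[OAR_GRIDSUB] Grid reservation id = GRID_JOB_ID
[OAR_GRIDSUB] SSH KEY : /tmp/oargrid//oargrid_ssh_key_LOGIN_GRID_JOB_ID
EOF
echo "grid job id:  $(grid_job_id < "$log")"
echo "ssh key:      $(grid_ssh_key < "$log")"
echo "cluster jobs: $(cluster_job_ids < "$log")"
rm -f "$log"
```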

Fetch the list of allocated nodes so that it can be passed to the script we want to run:

Image:Terminal.png frontend:
oargridstat -w -l GRID_JOB_ID | sed '/^$/d' > ~/machines

Image:Note.png Note

The -w command-line argument makes oargridstat wait for the start of every cluster reservation.

  • Otherwise, the node list may be incomplete.
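
A quick way to check that the list is complete is to count the nodes obtained on each site. The sketch below assumes that each line of the machines file is a fully qualified host name of the form NODE.SITE.grid5000.fr. The nodes_per_site helper and the sample host names are illustrative only.

```shell
# Count distinct nodes per site in a machines file.
# The second dot-separated field of each host name is assumed to be the site.
nodes_per_site() {
    sort -u "$1" |
        awk -F. 'NF >= 2 { count[$2]++ }
                 END { for (s in count) print s, count[s] }' |
        sort
}

# Example with a made-up machines file matching the reservation above.
machines=$(mktemp)
cat > "$machines" <<'EOF'
paradent-1.rennes.grid5000.fr
paradent-2.rennes.grid5000.fr
node-1.lille.grid5000.fr
node-1.sophia.grid5000.fr
EOF
nodes_per_site "$machines"
rm -f "$machines"
```

On a real reservation, the function would be called as nodes_per_site ~/machines and the counts compared with the numbers requested from each cluster.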

(1) Select the node on which to launch the script (i.e., the first node listed in the ~/machines file).

If (and only if) this node does not belong to the site where the ~/machines file was saved, copy the ~/machines file to this node:

Image:Terminal.png frontend:
OAR_JOB_ID=CLUSTER_JOB_ID oarcp -i /tmp/oargrid/oargrid_ssh_key_LOGIN_GRID_JOB_ID ~/machines `head -n 1 ~/machines`:
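
Whether the copy is needed can be decided from the host name of the first node. The following sketch assumes host names of the form NODE.SITE.grid5000.fr and takes the name of the local site as an argument. first_node, site_of and needs_copy are hypothetical helpers, and the sample host names are made up.

```shell
# First non-empty line of the machines file.
first_node() { sed -n '/./{p;q;}' "$1"; }

# Site part of a host name such as NODE.SITE.grid5000.fr (assumed layout).
site_of() { printf '%s\n' "$1" | cut -d. -f2; }

# Succeeds when the first node of file $1 is on another site than $2.
needs_copy() { [ "$(site_of "$(first_node "$1")")" != "$2" ]; }

# Example: the machines file was saved at lille, the first node is at rennes.
machines=$(mktemp)
printf '%s\n' 'paradent-1.rennes.grid5000.fr' 'node-1.lille.grid5000.fr' > "$machines"
if needs_copy "$machines" lille; then
    echo "copy needed: first node is at $(site_of "$(first_node "$machines")")"
else
    echo "no copy needed"
fi
rm -f "$machines"
```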

(2) Connect to this node using oarsh:

Image:Terminal.png frontend:
OAR_JOB_ID=CLUSTER_JOB_ID oarsh -i /tmp/oargrid/oargrid_ssh_key_LOGIN_GRID_JOB_ID `head -n 1 ~/machines`

Image:Note.png Note

Do not forget to indicate the location of the temporary private key generated by the oargridsub command when you connect to one of your allocated nodes.

  • In the previous snippets, this is done with the -i option.

And then run the script:

Image:Terminal.png node:
~/hello/helloworld ~/machines

Visualization

The Grid counterpart of oarstat gives information about the grid job:

Image:Terminal.png frontend:
oargridstat GRID_JOB_ID

Ending

Our grid submission is interactive, so its end time is unrelated to the end time of our script run. The submission ends when the submission owner requests that it ends or when the submission deadline is reached.

We are going to ask for our submission to end:

Image:Terminal.png frontend:
oargriddel GRID_JOB_ID

Grid'5000 API

The restfully tutorial describes how to reserve nodes on all sites in a way similar to oargridsub. You can adapt its script to your needs.

Grid batch

OAR Grid can also do batch submission. In this mode, you specify a script that will be run on each specified cluster.


Reservation submission

Let us run the script on April 19th, 2011 at 4:00pm for a 10-minute duration:

Image:Terminal.png frontend:
oargridsub -t allow_classic_ssh CLUSTER1:rdef="/nodes=2",CLUSTER2:rdef="/nodes=1",CLUSTER3:rdef="/nodes=1" -s '2011-04-19 16:00:00' -w '0:10:00' -p ~/hello/helloworld

You should see behavior similar to that of an interactive submission.

  • OAR Grid connects to each specified cluster and makes a passive submission.
  • As opposed to interactive mode, the specified script is run by each passive cluster submission.

Image:Warning.png Warning

Our passive grid submission will trigger 3 independent cluster submissions.

  • A different result will be returned for each involved cluster.
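
The start date given to -s in the submission above has the form YYYY-MM-DD HH:MM:SS. Rather than typing it by hand, it can be computed relative to the current time. This is a minimal sketch that relies on the -d option of GNU date, which is an assumption about the tools installed on the frontend. start_in_minutes is a hypothetical helper name.

```shell
# Print the date N minutes from now in the "YYYY-MM-DD HH:MM:SS" form
# expected by the -s option. Requires GNU date for the -d option.
start_in_minutes() {
    date -d "+$1 minutes" '+%Y-%m-%d %H:%M:%S'
}

# Example: a start date half an hour from now.
START=$(start_in_minutes 30)
echo "start date: $START"
```

The resulting value can then be passed as -s "$START" in place of the fixed date.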

Visualization

The allocated nodes list:

Image:Terminal.png frontend:
oargridstat -w -l GRID_JOB_ID

Job results can be viewed on each involved cluster. As with any passive submission, the standard and error outputs are saved in your home directory:

OAR.CLUSTER_JOB_ID.stdout
OAR.CLUSTER_JOB_ID.stderr

We can follow the run of our cluster job live when connected to one of these clusters (Ctrl-C to quit):

Image:Terminal.png frontend:
tail -f OAR.CLUSTER_JOB_ID.stdout
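
Since each cluster produces its own pair of output files, looking at all results means visiting every site. The sketch below only prints the commands that would display the files from the current frontend, one per SITE:CLUSTER_JOB_ID pair. It assumes that the sites are reachable as SITE.grid5000.fr, as in the rsync commands of the synchronization step. The show_output_cmds helper and the job ids used in the example are made up.

```shell
# Print, for each SITE:CLUSTER_JOB_ID pair given as argument, a command that
# displays both output files of that cluster job.
show_output_cmds() {
    for pair in "$@"; do
        site=${pair%%:*}   # text before the first colon
        id=${pair#*:}      # text after the first colon
        echo "ssh $site.grid5000.fr cat OAR.$id.stdout OAR.$id.stderr"
    done
}

# Example with hypothetical cluster job ids.
show_output_cmds rennes:1001 lille:1002 sophia:1003
```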

Ending

A passive grid submission ends when all of its underlying passive cluster submissions have terminated.

  • So it ends when the script has finished running on every cluster.


Next tutorial

Learn to program Grid'5000 with API Main Practical.
