API all in one Tutorial

Note

This tutorial is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Presentation

This session will help you learn how to interact with the Grid'5000 API, using a range of tools of increasing levels of abstraction. It targets people who want remote, programmatic access to Grid'5000 tools in order to monitor nodes, submit jobs and deploy environments.

If you are not familiar with the HTTP protocol, you are strongly advised to read the introductory page about the Grid'5000 API. The introduction to this tutorial briefly covers the main things to know.

This practical uses the latest stable version of the API, 3.0, and guides you through the steps required to run a simple experiment looking at power consumption.

Introduction

You can access the API from both inside and outside Grid'5000. Some operations on the API require you to identify yourself. If you are using a shared computer, be careful not to expose your Grid'5000 password on the command line, and do not leave it in clear text on the filesystem. If you are using your personal computer, fewer precautions are required, but you should still handle your credentials with care. For your convenience, you are automatically identified when you query the API from a Grid'5000 frontend. You can therefore securely follow this tutorial:

  • From your personal computer
  • From a Grid'5000 frontend
  • From a machine with an SSH tunnel set up to a frontend. See How to setup an SSH Tunnel from the API page.

As the API is a REST API over HTTP, using it does not depend on any particular library or language. We will therefore start from the command line using cURL. However, using a programming language is recommended for any meaningful interaction with the API. As it is not practical to write a tutorial supporting many programming languages, Ruby was chosen for this tutorial. If you are not familiar with Ruby, you will have to adapt the examples to your favourite language, or learn the basics of Ruby as you go.

In this tutorial, we will use the API to discover, reserve and use nodes. For this, we will:

  1. Find appropriate resources using the Reference API
  2. Look at whether they are free or not, using the Monitoring API
  3. Secure access to them using the Jobs API
  4. Gather usage metrics using the Metrology API

But first, remember that you must choose where you will run your code. Note that all code examples in this tutorial are available in the api/3.0/ directory of the tutorials repository on GitHub. Please clone that repository on the machine you will be using for this tutorial.

frontend:

git clone https://github.com/grid5000/tutorials.git;

cd tutorials/measurements/3.0
Note

Platform state and reproducibility

The data served by the API, which describes the characteristics of Grid'5000 nodes, can change over time.

Please refer to Platform state and reproducibility to keep a stable reference to those characteristics before using the API.

API at a glance

This diagram gives an overview of the Grid'5000 API landscape and its sub-APIs.

[Figure: API overview]

Hands-on tutorial

Finding appropriate resources

Starting with cURL

We will start by using cURL to look for resources:

frontend:
laptop:
curl -k -u login[:password] https://api.grid5000.fr/stable
{
  "type":"grid",
  "uid":"grid5000",
  "version":"cd01beb4caa1366af286443897be9082aa3435e5",
  "release":"3.1.5",
  "timestamp":1380803003,
  "links":[
    {
      "rel":"environments",
      "href":"/3.0/environments",
      "type":"application/vnd.grid5000.collection+json"
    },
    {
      "rel":"network_equipments",
      "href":"/3.0/network_equipments",
      "type":"application/vnd.grid5000.collection+json"
    },
    {
      "rel":"sites",
      "href":"/3.0/sites",
      "type":"application/vnd.grid5000.collection+json"
    },
    {
     "rel":"self",
     "type":"application/vnd.grid5000.item+json",
     "href":"/3.0/"
    },
    {
     "rel":"parent",
     "type":"application/vnd.grid5000.item+json",
     "href":"/3.0/"
    },
    {
     "rel":"version",
     "type":"application/vnd.grid5000.item+json",
     "href":"/3.0/versions/cd01beb4caa1366af286443897be9082aa3435e5"
    },
    {
     "rel":"versions",
     "type":"application/vnd.grid5000.collection+json",
     "href":"/3.0/versions"
    },
    {
     "rel":"users",
     "type":"application/vnd.grid5000.collection+json",
     "href":"/3.0/users"
    },
    {
     "rel":"notifications",
     "type":"application/vnd.grid5000.collection+json",
     "href":"/3.0/notifications"
    }
  ]
}

The output you got was probably not nicely formatted. You can pipe it through json_pp, or add the built-in ?pretty=yes parameter to your queries. For readability, we will omit both in the rest of the tutorial.
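
The same formatting can be done from Ruby once an answer has been parsed. The following is a minimal sketch that uses only the standard json library; the compact string is a shortened excerpt of the answer shown above, not a live query.

```ruby
require 'json'

# Compact JSON, shortened from the answer shown above.
compact = '{"type":"grid","uid":"grid5000","release":"3.1.5",' \
          '"links":[{"rel":"sites","href":"/3.0/sites",' \
          '"type":"application/vnd.grid5000.collection+json"}]}'

# Parse the text into Ruby hashes and arrays, then serialize it again
# with indentation and line breaks.
pretty = JSON.pretty_generate(JSON.parse(compact))
puts pretty
```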


What you should note in the answer:

  1. The answer has a version identifier: the description of Grid'5000 is versioned.
  2. You have a link to the description of that version: /3.0/versions/cd01beb4caa1366af286443897be9082aa3435e5
  3. You have a link to list the versions: /3.0/versions. You can also query the API for the description of Grid'5000 at a specific date.
  4. You have links to the descriptions of all the sites, network equipment and environments. These link entries are the basic mechanism that allows users and programs to browse the API.
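
To illustrate how a program can use these link entries, here is a small sketch that indexes them by their "rel" name. The document it parses is a shortened copy of the answer above, and the name links_by_rel is only illustrative.

```ruby
require 'json'

# Shortened copy of the root answer shown above.
answer = JSON.parse(<<~DOC)
  {
    "type": "grid",
    "uid": "grid5000",
    "version": "cd01beb4caa1366af286443897be9082aa3435e5",
    "links": [
      {"rel": "sites", "href": "/3.0/sites", "type": "application/vnd.grid5000.collection+json"},
      {"rel": "self", "href": "/3.0/", "type": "application/vnd.grid5000.item+json"},
      {"rel": "versions", "href": "/3.0/versions", "type": "application/vnd.grid5000.collection+json"}
    ]
  }
DOC

# Build a lookup table: relation name => href.
links_by_rel = answer["links"].each_with_object({}) do |link, table|
  table[link["rel"]] = link["href"]
end

puts links_by_rel["sites"]
puts links_by_rel["versions"]
```

A program then never has to hard-code a path: it asks for the relation it needs and follows the href it gets back.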

Looking at the sites link should convince you that browsing the API with a program is desirable for most real use cases. If you want to explore further using only cURL, please take a look at the curl tutorial for the outdated API 2.0.

Browsing using ruby's net/http

We will now discover the sites without cutting and pasting a link from the result of a previous cURL call. This is the point where you need to decide whether to learn Ruby as you follow the examples or to adapt them to your language of choice. The examples work with the Ruby version installed on the frontend (source available as browse_using_ruby.rb).

#!/usr/bin/env ruby

require 'net/http'
require 'net/https'
require 'openssl'
require 'uri'
require 'rubygems'
require 'json'
require 'pp'

# by default, net/https does not trust
# any certificate authority: build a store
# from the system's default ones
store = OpenSSL::X509::Store.new
store.set_default_paths

# create the http object modeling the connection
# to the API
https = Net::HTTP.new('api.grid5000.fr',443)
https.use_ssl = true
https.cert_store = store
# VERIFY_NONE skips certificate checks, like curl -k above.
# Switch to OpenSSL::SSL::VERIFY_PEER to make use of the store.
https.verify_mode = OpenSSL::SSL::VERIFY_NONE

req = Net::HTTP::Get.new('/stable')
# WARNING: for use from outside Grid'5000, add basic authentication:
#req.basic_auth("user", "pass")

# send the request and return the parsed JSON body,
# or raise an exception on an HTTP error
def fetch_url(https,req)
  res = https.request(req)
  case res
  when Net::HTTPSuccess
    JSON.parse(res.body)
  else
    puts "HTTP Error #{res.code} calling #{req.path}"
    res.error!
  end
end

# return the href of the first link whose "rel" is name, or nil
def get_link(root, name)
  link = root["links"].find { |item| item["rel"] == name }
  link && link["href"]
end

root=fetch_url(https,req)
puts root

sites_url=get_link(root,"sites")
req = Net::HTTP::Get.new(sites_url)
# WARNING: for use from outside Grid'5000, add basic authentication:
#req.basic_auth("user", "pass")

all_sites=fetch_url(https,req)

all_sites["items"].each do |site|
  puts site["name"]
end

Run this example with:

frontend:
ruby browse_using_ruby.rb

As you can see, there is a lot of boilerplate code here. As an exercise, extend this example to get the list of clusters from the Reference API, and the total number of clusters available today.

Because we keep writing the same code, a higher-level library can abstract the standard parts. restfully is such a library. In fact, it is at once a library, an interactive shell and a command.
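
One possible shape for the cluster exercise is sketched below. It rests on assumptions that should be checked against the real API: that each site item carries a link with rel "clusters", and that the collection behind it lists its members under "items", as the sites collection does. The fetch parameter, the FAKE_API table and all names in it other than rennes and parapluie are invented so that the sketch runs offline.

```ruby
# Return the href of the first link whose "rel" is name, or nil.
def get_link(doc, name)
  link = doc["links"].find { |item| item["rel"] == name }
  link && link["href"]
end

# fetch is any callable that maps a path to the parsed JSON found there,
# for example a lambda wrapping an HTTP GET and JSON.parse.
def clusters_per_site(fetch, root_path)
  root  = fetch.call(root_path)
  sites = fetch.call(get_link(root, "sites"))
  sites["items"].each_with_object({}) do |site, result|
    # Assumption: sites expose a "clusters" link to an "items" collection.
    clusters = fetch.call(get_link(site, "clusters"))
    result[site["uid"]] = clusters["items"].map { |c| c["uid"] }
  end
end

# Canned documents standing in for the API (illustrative data only).
FAKE_API = {
  "/stable" => { "links" => [{ "rel" => "sites", "href" => "/3.0/sites" }] },
  "/3.0/sites" => { "items" => [
    { "uid" => "rennes",
      "links" => [{ "rel" => "clusters", "href" => "/3.0/sites/rennes/clusters" }] },
    { "uid" => "example-site",
      "links" => [{ "rel" => "clusters", "href" => "/3.0/sites/example-site/clusters" }] }
  ] },
  "/3.0/sites/rennes/clusters" => {
    "items" => [{ "uid" => "parapluie" }, { "uid" => "example-a" }]
  },
  "/3.0/sites/example-site/clusters" => { "items" => [{ "uid" => "example-b" }] }
}

fetch = ->(path) { FAKE_API.fetch(path) }
by_site = clusters_per_site(fetch, "/stable")
by_site.each { |site, names| puts "#{site}: #{names.join(', ')}" }
puts "Total: #{by_site.values.flatten.size} clusters"
```

Passing the fetcher as a parameter keeps the link-following logic separate from the HTTP code, which also makes it easy to try out without network access.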

Using Restfully as a shell

Get access to a working installation of restfully
laptop:
gem install restfully

It is also possible to run these examples from the restfully installation on site frontends in Grid'5000.

Run the example
frontend:
restfully --uri https://api.grid5000.fr/stable

pp root
pp root.sites
pp root.sites[:'rennes']

pp root.sites[:'rennes'].clusters[:'parapluie'].nodes[:'parapluie-1']

This will give you access to the description of a node in the Reference API:

{
  "bios": {
    "version": "O37",
    "vendor": "HP",
    "release_date": "09/06/2010"
  },
  "network_adapters": [
    {
      "device": "bmc",
      "mounted": false,
      "network_address": "parapluie-1-bmc.rennes.grid5000.fr",
      "rate": 1000000000,
      "mac": "78:e7:d1:65:a9:23",
      "management": true,
      "interface": "Ethernet",
      "enabled": true,
      "ip": "172.17.99.1",
      "mountable": false
    },
    { 
      "device": "eth0",
      "rate": 1000000000,
      "mac": "78:e7:d1:f5:ef:22",
      "interface": "Ethernet",
      "enabled": false,
      "driver": "igb"
    },
    {
      "device": "eth1",
      "switch": "gw",
      "mounted": true,
      "switch_port": "Gi3/42",
      "network_address": "parapluie-1.rennes.grid5000.fr",
      "rate": 1000000000,
      "mac": "78:e7:d1:f5:ef:23",
      "interface": "Ethernet",
      "enabled": true,
      "bridged": true,
      "version": "82576",
      "ip": "172.16.99.1",
      "vendor": "Intel",
      "driver": "igb",
      "mountable": true
    },
    {
      "device": "eth2",
      "rate": 1000000000,
      "mac": "78:e7:d1:f5:3f:30",
      "interface": "Ethernet",
      "enabled": false,
      "driver": "igb"
    },
    {
      "device": "eth3",
      "rate": 1000000000,
      "mac": "78:e7:d1:f5:3f:31",
      "interface": "Ethernet",
      "enabled": false,
      "driver": "igb"
    },
    {
      "device": "ib0",
      "mounted": true,
      "network_address": "parapluie-1-ib0.rennes.grid5000.fr",
      "rate": 10000000000,
      "mac": "20:00:55:00:41:80:00:00:00:00:00:00:00:02:c9:03:00:06:ba:0f",
      "interface": "Infiniband",
      "enabled": true,
      "version": "MT25418",
      "ip": "172.18.99.1",
      "vendor": "Mellanox",
      "driver": "mlx4_core",
      "mountable": true
    },
    {
      "device": "ib1",
      "rate": 10000000000,
      "mac": "20:00:55:00:41:80:00:00:00:00:00:00:00:02:c9:03:00:06:ba:10",
      "interface": "Infiniband",
      "enabled": false,
      "version": "MT25418",
      "vendor": "Mellanox",
      "driver": "mlx4_core"
    }
  ],
  "gpu": {
    "gpu": false
  },
  "operating_system": {
    "name": "Debian",
    "kernel": "2.6.26",
    "version": null,
    "release": "5.0"
  },
  "uid": "parapluie-1",
  "storage_devices": [
    {
      "model": "GB0250EAFYK",
      "device": "sda",
      "size": 250059350016,
      "rev": "HPG2",
      "interface": "SATA",
      "driver": "ahci"
    }
  ],
  "sensors": {
    "power": {
      "available": true,
      "via": {
        "api": {"metric": "pdu"},
        "pdu": {"port": 24, "uid": "parapluie-pdu-1"}}},
    "temperature": {
      "available": true,
      "via": {
        "api": {"metric": "ambient_temp"},
        "ipmi": {"sensors" : {"ambient"=>"Inlet Ambient"}}
       }
     }
   },
  "version": "5979b53f99f25f0a0773e5992a92e2763665ebc4",
  "type": "node",
  "links": [
    {
      "href": "/2.1/grid5000/sites/rennes/clusters/parapluie/nodes/parapluie-1/versions/5979b53f99f25f0a0773e5992a92e2763665ebc4",
      "title": "version",
      "rel": "member",
      "type": "application/vnd.fr.grid5000.api.Version+json;level=1"
    },
    { 
      "href": "/2.1/grid5000/sites/rennes/clusters/parapluie/nodes/parapluie-1/versions",
      "title": "versions",
      "rel": "collection",
      "type": "application/vnd.fr.grid5000.api.Collection+json;level=1"
    },
    {
      "href": "/2.1/grid5000/sites/rennes/clusters/parapluie/nodes/parapluie-1",
      "rel": "self",
      "type": "application/vnd.fr.grid5000.api.Node+json;level=1"
    },
    {
      "href": "/2.1/grid5000/sites/rennes/clusters/parapluie",
      "rel": "parent",
      "type": "application/vnd.fr.grid5000.api.Cluster+json;level=1"
    },
    {
      "href": "/2.1/grid5000/sites/rennes/clusters/parapluie/nodes/parapluie-1/status",
      "title": "status",
      "rel": "member",
      "type": "application/vnd.fr.grid5000.api.NodeStatus+json;level=1"
    }
  ],
  "supported_job_types": {
    "virtual": "amd-v",
    "besteffort": true,
    "deploy": true
  },
  "chassis": {
    "serial_number": "GB803651KY"
  },
  "processor": {
    "model": "AMD Opteron",
    "clock_speed": 1700000000.0,
    "cache_l1d": null,
    "version": "6164 HE",
    "other_description": "",
    "cache_l1": null,
    "cache_l2": null,
    "vendor": "AMD",
    "instruction_set": "",
    "cache_l1i": null
  },
  "main_memory": {
    "ram_size": 51539607552,
    "virtual_size": null
  },
  "monitoring": {
    "wattmeter": false,
    "temperature": true
  },
  "architecture": {
    "platform_type": "x86_64",
    "nb_procs": 2,
    "nb_cores": 24,
    "nb_threads": 24
  }
}

Using Restfully as a command

For this tutorial, we will suppose that we are only interested in nodes with exactly one hard drive. We will therefore browse the Reference API to discover those nodes, by passing the following program (name it restfully_count.rb) to the restfully command.

frontend:
restfully --uri https://api.grid5000.fr/stable restfully_count.rb
suitable_nodes=[]

root.sites.each do |site|
  site.clusters.each do |cluster|
    cluster.nodes.each do |node|
      if node["storage_devices"].size == 1
        suitable_nodes << node["uid"]+"."+site["uid"]+".grid5000.fr"
      end
    end
  end
end

puts "Found #{suitable_nodes.size} nodes with only one local storage device"

Of course, you can use arbitrarily complex filters to select nodes with such a script.
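
As a sketch of what a more complex filter could look like, the conditions can be written as named predicates over a node description and combined. The node hash below is a reduced copy of the parapluie-1 description shown earlier; the FILTERS table, its thresholds and the suitable? helper are illustrative names, not part of restfully.

```ruby
# Reduced copy of the parapluie-1 description shown earlier.
node = {
  "uid" => "parapluie-1",
  "storage_devices" => [
    { "device" => "sda", "size" => 250059350016, "interface" => "SATA" }
  ],
  "network_adapters" => [
    { "device" => "eth1", "interface" => "Ethernet",   "enabled" => true },
    { "device" => "ib0",  "interface" => "Infiniband", "enabled" => true },
    { "device" => "ib1",  "interface" => "Infiniband", "enabled" => false }
  ],
  "main_memory" => { "ram_size" => 51539607552 }
}

# Each filter takes a node description and answers true or false.
FILTERS = {
  single_disk: ->(n) { n["storage_devices"].size == 1 },
  infiniband:  ->(n) {
    n["network_adapters"].any? { |a| a["interface"] == "Infiniband" && a["enabled"] }
  },
  ram_32g:     ->(n) { n["main_memory"]["ram_size"] >= 32 * 1024**3 }
}

# A node is suitable when every filter accepts it.
def suitable?(node, filters)
  filters.values.all? { |f| f.call(node) }
end

puts suitable?(node, FILTERS)
```

Inside the restfully script, the single test on storage_devices could then be replaced by a call such as suitable?(node, FILTERS), provided node["..."] returns the same structures as in this hash.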

Monitoring API

We only want free resources for this tutorial. We will therefore query the status of each resource using the Monitoring API and check that it is available for at least one hour before adding it to our list. We update our script (restfully_count_free.rb) as follows:

suitable_nodes=[]

root.sites.each do |site|
  begin
    site.clusters.each do |cluster|
      nodes_status=nil
      cluster.nodes.each do |node|
        if node["storage_devices"].size == 1
          # there is at least one interesting node in this cluster
          # get the status of the nodes of the site
          nodes_status=site.status["nodes"] if nodes_status == nil
          status=nodes_status[node["uid"]+"."+site["uid"]+".grid5000.fr"]
          if status["soft"] == "free"
            # reservations are not necessarily sorted by start date:
            # look for the earliest scheduled one (nil if there is none)
            next_start=status["reservations"].map { |r| r["scheduled_at"] }.min
            if next_start == nil || Time.at(next_start)-Time.now >= 3600
              suitable_nodes << node["uid"]+"."+site["uid"]+".grid5000.fr"
            else
              puts "#{node["uid"]} is free but not available long enough"
            end
          end
        end
      end
    end
  rescue Restfully::HTTP::ServerError => e
    puts "Could not access information from #{site["uid"]}"
  end
end

puts "Found #{suitable_nodes.size} nodes with only one local storage device available for the next hour"

Run the example with:

frontend:
restfully --uri https://api.grid5000.fr/stable restfully_count_free.rb

Some explanations:

  • We have added some fault tolerance to the script, handling cases where a site is not reachable (this is the begin-rescue block).
  • We have queried the status of a node using site.status["nodes"], which gives access, in a single API call, to the status of all nodes of that site. It is a hash keyed by node name; for a given node, this is the kind of answer you will get:
{"hard"=>"alive",
 "soft"=>"free",
 "reservations"=>
  [{"uid"=>547479,
    "user_uid"=>"dshestakov",
    "user"=>"dshestakov",
    "walltime"=>223020,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1378722136,
    "scheduled_at"=>1380906120,
    "started_at"=>1380906120,
    "message"=>"",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/dshestakov"},
   {"uid"=>548661,
    "user_uid"=>"ddib",
    "user"=>"ddib",
    "walltime"=>216000,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380532200,
    "scheduled_at"=>1381510801,
    "started_at"=>1381510801,
    "message"=>"R=481,W=60:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/ddib"},
   {"uid"=>548662,
    "user_uid"=>"hchihoub",
    "user"=>"hchihoub",
    "walltime"=>216000,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380532283,
    "scheduled_at"=>1382115601,
    "started_at"=>1382115601,
    "message"=>"R=481,W=60:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/hchihoub"},
   {"uid"=>548666,
    "user_uid"=>"ddib",
    "user"=>"ddib",
    "walltime"=>216000,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380533161,
    "scheduled_at"=>1383933601,
    "started_at"=>1383933601,
    "message"=>"R=481,W=60:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/ddib"},
   {"uid"=>548668,
    "user_uid"=>"hchihoub",
    "user"=>"hchihoub",
    "walltime"=>216000,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380533260,
    "scheduled_at"=>1383328801,
    "started_at"=>1383328801,
    "message"=>"R=481,W=60:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/hchihoub"},
   {"uid"=>548754,
    "user_uid"=>"tbuchert",
    "user"=>"tbuchert",
    "walltime"=>46800,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380649673,
    "scheduled_at"=>1381161660,
    "started_at"=>1381161660,
    "message"=>"R=1393,W=13:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/tbuchert"},
   {"uid"=>548862,
    "user_uid"=>"sallier",
    "user"=>"sallier",
    "walltime"=>50400,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380717634,
    "scheduled_at"=>1380819600,
    "started_at"=>1380819600,
    "message"=>"R=160,W=14:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/sallier"},
   {"uid"=>548890,
    "user_uid"=>"amehiaoui",
    "user"=>"amehiaoui",
    "walltime"=>172800,
    "queue"=>"default",
    "state"=>"waiting",
    "project"=>"default",
    "types"=>["deploy"],
    "mode"=>"INTERACTIVE",
    "command"=>"",
    "submitted_at"=>1380783074,
    "scheduled_at"=>1382770800,
    "started_at"=>1382770800,
    "message"=>"R=793,W=48:0:0,J=R,T=deploy",
    "properties"=>"(deploy = 'YES') AND maintenance = 'NO'",
    "directory"=>"/home/amehiaoui"}],
 "comment"=>"OK"}

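In the answer above, the reservations are not listed in order of start date: the first entry is scheduled at 1380906120, while a later entry is scheduled at 1380819600. A check based only on the first entry can therefore overestimate how long a node stays free. The sketch below looks for the earliest start instead; the available_for? helper is an illustrative name, and the status hash is reduced to two of the reservations shown above.

```ruby
# True when the node is free now and no reservation starts within
# the next `duration` seconds.
def available_for?(status, duration, now = Time.now)
  return false unless status["soft"] == "free"
  # Earliest upcoming start, or nil when nothing is scheduled.
  next_start = status["reservations"].map { |r| r["scheduled_at"] }.min
  next_start.nil? || next_start - now.to_i >= duration
end

# Reduced copy of the status shown above.
status = {
  "hard" => "alive",
  "soft" => "free",
  "reservations" => [
    { "uid" => 547479, "scheduled_at" => 1380906120 },
    { "uid" => 548862, "scheduled_at" => 1380819600 }
  ]
}

# Fixed reference time, so that the result does not depend on the clock.
now = Time.at(1380803003)
puts available_for?(status, 3600, now)      # one hour
puts available_for?(status, 5 * 3600, now)  # five hours
```

With this reference time, the earliest reservation starts 16597 seconds later, so the node qualifies for a one-hour experiment but not for a five-hour one.
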
Reserve resources

The Jobs API will allow us to reserve one of the nodes. Regular users have found that, when writing a script that reserves resources, it is good practice to first check whether a reservation made by a previous run of the script already exists, especially when debugging other parts of the script. To find an existing job, we will use its name. Note that if you run the script from your workstation and your local username is not your Grid'5000 login, you will need to adapt line 9 (the query on ENV["USER"]). The following are the relevant code snippets. A complete example can be found in restfully_submit.rb.

 1 job_name = "Running the API all in one Tutorial to learn to reserve nodes #{File.basename($0)}"
 2 
 3 #Look for a previously submitted job of the same name
 4 #To avoid submitting twice
 5 my_job=nil
 6 root.sites.each do |site| 
 7   begin
 8     if my_job==nil
 9       site.jobs(:query => {:user =>ENV["USER"]}).each do |job|
10         if job["name"] == job_name
11           my_job=job
12           break
13         end
14       end
15     end
16   rescue Restfully::HTTP::ServerError => e
17     puts "Site #{site["uid"]} unreachable"
18     # print e.message
19   end
20 end

We will generalize the data structures to save the node, cluster and site data for each suitable node.

...
              # earliest scheduled reservation (nil if there is none)
              next_start=status["reservations"].map { |r| r["scheduled_at"] }.min
              if next_start == nil || Time.at(next_start)-Time.now >= 3600
                suitable_nodes << { :node => node,
                        :cluster => cluster,
                        :site => site
                }
...

If you did not find an existing job, you can now create one on one of the suitable nodes with:

  elected_node = suitable_nodes.pop

  if elected_node != nil
    puts "Attempt to create a job on #{elected_node[:node]["uid"]}.#{elected_node[:site]["uid"]}.grid5000.fr"
    begin
      # ask for this specific node, identified by its network address
      my_job=elected_node[:site].jobs.submit(
                                             :resources => "nodes=1,walltime=00:30:00",
                                             :properties => "network_address in ('#{elected_node[:node]["uid"]}.#{elected_node[:site]["uid"]}.grid5000.fr')",
                                             :command => "sleep 3600",
                                             :types => ["allow_classic_ssh"],
                                             :name => job_name
                                             )
    rescue Restfully::HTTP::ServerError => e
      # submission failed: show the current state of the node
      status=elected_node[:node].status(:query => { :reservations_limit => '5'})
      puts e.message
      puts "#{status["system_state"]}"
      pp status["reservations"]
      puts "Could not get a job on #{elected_node[:node]["uid"]}. Please retry on another node"
    end
  end

In all cases, you can poll the status of your job and wait for it to be running before going further:

if my_job != nil
  my_job.reload
  puts "Found job #{my_job["uid"]} in state #{my_job["state"]}"
  puts " expected to start at #{Time.at(my_job["scheduled_at"])}" if my_job["scheduled_at"] != nil

  # poll the job once per second, for 30 seconds at most
  wait_time=0
  while my_job.reload['state'] != "running" && wait_time < 30
    sleep 1
    wait_time+=1
    print '.'
  end
  if my_job['state'] == "running"
    puts "running on node #{my_job["assigned_nodes"].first}. Need to do something with this job. Ssh to #{my_job["assigned_nodes"].first.split('.')[1]} and connect to the job using oarsub -C #{my_job['uid']}"
  else
    puts "Stopped waiting for the job to start."
  end
end
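
The waiting loop above follows a pattern that can be isolated in a small helper: poll until a condition holds or a deadline passes. The sketch below is one way to write it; wait_until is a hypothetical name, and the list of states is made up so that the example runs without a real job.

```ruby
# Call the block repeatedly until it returns a truthy value, which is
# then returned, or until `timeout` seconds have elapsed (returns nil).
def wait_until(timeout:, interval: 1)
  deadline = Time.now + timeout
  loop do
    result = yield
    return result if result
    return nil if Time.now >= deadline
    sleep interval
  end
end

# Illustrative sequence of job states, consumed one per poll.
states = %w[waiting launching running]

state = wait_until(timeout: 5, interval: 0.01) do
  current = states.shift
  current if current == "running"
end

puts state ? "job is #{state}" : "Stopped waiting for the job to start."
```

With a restfully job, the block could be written as my_job.reload['state'] == "running", with a timeout of 30 to match the loop above.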

Let's experiment

We will now reserve one free node with only one hard drive and script an experiment using net-ssh, as in restfully_complete_experiment.rb.

restfully_complete_experiment.rb:

# (c) 2012-2016 Inria by David Margery (david.margery@inria.fr) in the context of the Grid'5000 project
# Licenced under the CeCILL B licence.

require 'net/ssh'
require 'socket' # for Socket.gethostname
require 'date'   # for DateTime.parse

pdu_nodes= {}
g5k_login=ENV["USER"]
job_name = "Running the API all in one Tutorial to learn to reserve nodes #{File.basename($0)}"

#Look for a previously submitted job of the same name
#To avoid submitting twice
my_job=nil
my_site=nil

root.sites.each do |site|
  begin
    if my_job==nil
      site.jobs(:query => {:user =>g5k_login, :name => job_name}).each do |job|
        if job["name"] == job_name && job['state'] != 'error'
          my_job=job
          my_site=site
          break
        end
      end
    end
  rescue Restfully::HTTP::ServerError => e
    puts "Site #{site["uid"]} unreachable"
    # print e.message
  end
end

if my_job==nil
  #we fallback to looking for available resources
  suitable_nodes=[]

  root.sites.each do |site|
    begin
      site.clusters.each do |cluster|
        nodes_status=nil
        cluster.nodes.each do |node|
          if node["storage_devices"].size == 1
            # there is at least one interesting node in this cluster
            # get the status of the nodes of the site
            nodes_status=site.status["nodes"] if nodes_status == nil
            status=nodes_status[node["uid"]+"."+site["uid"]+".grid5000.fr"]
            if status["soft"] == "free"
              # reservations are not necessarily sorted by start date:
              # look for the earliest scheduled one (nil if there is none)
              next_start=status["reservations"].map { |r| r["scheduled_at"] }.min
              if next_start == nil || Time.at(next_start)-Time.now >= 3000
                suitable_nodes << { :node => node,
                        :cluster => cluster,
                        :site => site
                }
              else
                puts "#{node["uid"]} is free but not available long enough"
              end
            end
          end
        end
      end
    rescue Restfully::HTTP::ServerError => e
      puts "Could not access information from #{site["uid"]}"
    end
  end

  elected_node = suitable_nodes.pop

  if elected_node != nil
    puts "Attempt to create a job on #{elected_node[:node]["uid"]}.#{elected_node[:site]["uid"]}.grid5000.fr"
    begin
      my_site=elected_node[:site]
      my_job=my_site.jobs.submit(
                                             :resources => "nodes=1,walltime=00:30:00",
                                             :properties => "network_address in ('#{elected_node[:node]["uid"]}.#{elected_node[:site]["uid"]}.grid5000.fr')",
                                             :command => "sleep 3600",
                                             :types => ["allow_classic_ssh"],
                                             :name => job_name
                                             )
    rescue Restfully::HTTP::ServerError => e
      status=elected_node[:node].status(:query => { :reservations_limit => '5'})
      puts e.message
      puts "#{status["system_state"]}"
      pp status["reservations"]
      puts "Could not get a job on #{elected_node[:node]["uid"]}. Please retry on another node"
    end
  end
end

if my_job != nil
  my_job.reload
  puts "Found job #{my_job["uid"]} in state #{my_job["state"]}"
  puts " expected to start at #{Time.at(my_job["scheduled_at"])}" if my_job["scheduled_at"] != nil

  # poll the job once per second, for 30 seconds at most
  wait_time=0
  while my_job.reload['state'] != "running" && wait_time < 30
    sleep 1
    wait_time+=1
    print '.'
  end

  if my_job['state'] == "running"
    puts "running on node #{my_job["assigned_nodes"]}. Will do something with this job"

    fqdn=my_job["assigned_nodes"][0]
    host=fqdn.split('.')[0]
    cluster= host.match(/(\w+)-.*/)[1]

    node=my_site.clusters[cluster.to_sym].nodes[host.to_sym]
    threads=node["architecture"]["nb_threads"]

    gw=nil
    if Socket.gethostname !~ /grid5000\.fr/
      require 'net/ssh/gateway'
      # Need to connect to the node through a gateway
      # A lot here depends on your ssh config
      # useful options are
      # * :keys_only => true to use specified keys before keys offered by your ssh-agent
      # * :verbose => :debug to see why the connection fails
      # * :keys => ["private_key_file to use"]
      # * :config => false to bypass your ssh_config file
      puts "  created a gateway for the ssh connection"
      gw=Net::SSH::Gateway.new('access.grid5000.fr', g5k_login, :keys_only => true)
    end

    ssh= if gw
           puts "  connecting to #{my_job["assigned_nodes"][0]} through gateway"
           gw.ssh(my_job["assigned_nodes"][0], g5k_login, :keys_only => true )
         else
           Net::SSH.start(my_job["assigned_nodes"][0], g5k_login, :keys_only => true )
         end

    #stress the node a bit to see the impact on consumption
    events={}
    puts "  running date"
    start=DateTime.parse(ssh.exec!('date'))
    cmd_time=240
    cmds=["stress -t #{cmd_time} -c #{threads}", "stress -t #{cmd_time} -i  #{threads}","stress -t #{cmd_time} -m  #{threads}","stress -t #{cmd_time} -c  #{threads} -i  #{threads} -m  #{threads}", "sleep #{cmd_time}"]
    cmds.each do |cmd|
      puts "  running #{cmd} on node"
      # record when each command starts, using the clock of the node
      events[DateTime.parse(ssh.exec!('date')).to_time.to_i]= "Now running #{cmd}"
      ssh.exec!(cmd)
    end

    ssh.close
    gw.close(ssh.transport.port) if gw

    #get the values from this experiment
    [:cpu_user,:cpu_system,:mem_free].each do |metric|
      data_desc=my_site.metrics[metric]
      pdu_values=data_desc.timeseries(:query => {:resolution => 15, :from => start.to_time.to_i})[host.to_sym]
      sample_timestamp=pdu_values["from"]
      sample_resolution=pdu_values["resolution"]
      puts "Got values from #{Time.at(sample_timestamp)} at a resolution of a value every #{sample_resolution}s for #{metric}"
      # samples are evenly spaced: rebuild the timestamp of each one
      pdu_values["values"].each do |sample|
        if !events.has_key?(sample_timestamp)
          events[sample_timestamp]=""
        end
        events[sample_timestamp]+= " #{sample}% #{metric} measured" if [:cpu_user,:cpu_system].include?(metric)
        events[sample_timestamp]+= " #{sample} Bytes of #{metric} measured" if [:mem_free,:mem_cache].include?(metric)
        sample_timestamp+=sample_resolution
      end
    end

    # print commands and measurements in chronological order
    events.keys.sort.each do | timestamp|
      puts "#{Time.at(timestamp)}: #{events[timestamp]}"
    end
  end
end

With a script of this length, it probably becomes more practical to use restfully as a library rather than passing a script as a parameter to the restfully executable. An example can be found at http://grid5000.github.com/tutorials/api/2.0/restfully-tutorial.html
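
The last part of the script rebuilds a timestamp for every sample from the "from" and "resolution" fields of a timeseries. That step can be isolated as follows; timestamp_samples is an illustrative name, and the sample values are invented.

```ruby
# Pair each sample with its timestamp: from, from + resolution, ...
def timestamp_samples(series)
  series["values"].each_with_index.map do |value, i|
    [series["from"] + i * series["resolution"], value]
  end
end

# Same shape as the timeseries used in the script, with invented values.
series = {
  "from"       => 1380803000,
  "resolution" => 15,
  "values"     => [2.0, 97.5, 98.1]
}

timestamp_samples(series).each do |timestamp, value|
  puts "#{Time.at(timestamp).utc}: #{value}"
end
```

Working on pairs rather than on a running counter makes it straightforward to merge several metrics into one timeline, for instance by grouping the pairs by timestamp.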