G5k-checks

From Grid5000


Description

Overview

  • g5k-checks is expected to be integrated into the standard environment of the Grid'5000 computational nodes. It checks that a node meets several basic requirements before it declares itself as available to the OAR server.
  • This lets the admins enable some checkers which may be very specific to the hardware of a cluster.

Architecture

G5kchecks is based on the rspec test suite. Rspec is normally used to test a program; here it is diverted from that purpose to test the characteristics of a node. The first step is to retrieve node information with ohai. By default, ohai provides a large set of characteristics of the machine. In addition, we have developed some plugins to supply missing information (particularly for the disk, the CPU and the network). The second step is to compare those characteristics with the Grid'5000 Reference_Repository. To do that, g5kchecks takes each value from the API and compares it with the value given by ohai. If the values don't match, an error is raised through the rspec process.
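
The comparison step can be sketched in plain Ruby. This is only an illustration under assumptions: `compare_descriptions` and the two sample hashes are hypothetical, and g5k-checks itself expresses each comparison as an rspec example rather than as one function. The message layout (measured value, expected value, then the key path) mirrors the exception field of the check log shown in the "Run g5k-checks" section.

```ruby
# Hypothetical sketch: walk the reference description and compare each leaf
# with the value measured on the node (for example by ohai).
def compare_descriptions(reference, measured, path = [])
  reference.flat_map do |key, expected|
    actual = measured.is_a?(Hash) ? measured[key] : nil
    here = path + [key]
    if expected.is_a?(Hash)
      compare_descriptions(expected, actual || {}, here)
    elsif actual == expected
      []
    else
      # Same layout as the failure messages: measured, expected, key path
      ["#{actual.inspect}, #{expected.inspect}, #{here.join(', ')}"]
    end
  end
end

# Sample data invented for the illustration
reference = { "architecture" => { "nb_procs" => 2, "nb_threads" => 8 } }
measured  = { "architecture" => { "nb_procs" => 2, "nb_threads" => 16 } }

mismatches = compare_descriptions(reference, measured)
puts mismatches
```

An empty result means the node matches its reference description; each string in the result corresponds to one failed check.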

oar

  • The oar-node flavour of OAR installation embeds an hourly cron job, /usr/lib/oar/oarnodecheckrun, which runs the executable file /etc/oar/check.d/start_g5kchecks. The server then periodically invokes /usr/bin/oarnodecheckquery remotely. This command returns with status 1 if /var/lib/oar/check.d/ contains any file and 0 otherwise. So if /etc/oar/check.d/start_g5kchecks finds something wrong, it simply has to create a log file in that directory.
  • The version of /etc/default/oar-node that g5k-checks installs runs both oarnodecheckrun and oarnodecheckquery scripts. If the latter fails, then the node is not ready to start, and it loops on running those scripts until either oarnodecheckquery returns 0 or a timeout is reached. If the timeout is reached, then it does not attempt to declare the node as "Alive".
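
The query and retry logic described above can be sketched as follows. The real scripts are shipped with OAR and are not reproduced here: `node_check_status` and `wait_until_ready` are hypothetical names, and the timeout is simplified to a maximum number of attempts.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for oarnodecheckquery: status 1 when the check
# directory contains at least one log file, 0 when it is empty.
def node_check_status(dir)
  (Dir.entries(dir) - %w[. ..]).empty? ? 0 : 1
end

# Hypothetical stand-in for the startup loop: run the checks, query the
# result, and retry until the node is clean or the attempts run out.
def wait_until_ready(dir, max_attempts:, delay: 0)
  max_attempts.times do
    yield if block_given?           # plays the role of oarnodecheckrun
    return true if node_check_status(dir).zero?
    sleep delay
  end
  false                             # timeout: do not declare the node "Alive"
end

Dir.mktmpdir do |dir|
  log = File.join(dir, "failed_check")
  FileUtils.touch(log)              # a check has left a log file
  puts node_check_status(dir)
  attempts = 0
  ready = wait_until_ready(dir, max_attempts: 5) do
    attempts += 1
    FileUtils.rm_f(log) if attempts == 3   # the problem goes away on the third run
  end
  puts ready
end
```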

Checks Overview

legend

  • Fail.png: no test
  • Check.png: tested
  • InProgress.png: tested, but the test doesn't work on every cluster
  • NoStarted.png: don't know whether it can be tested

g5k-parts

g5k-parts is designed to run at both phases of g5k-checks (see above).

  • In Phase 1, g5k-parts validates the partitioning of a Grid'5000 computational node against the G5K Node Storage convention: all partitions but /tmp are primary, and /tmp is a logical partition inside the only extended partition.
  • It first compares /etc/fstab with its backup generated at deployment time. When errors are found at this level, /etc/fstab is reset and the machine reboots.
  • Then, for every partition given on the command line, it first matches its geometry on the hard drive against the partition layout saved at deployment time. In the new g5kchecks, no formatting is performed after an error (this is left to charon).
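
The /etc/fstab comparison can be illustrated with a short Ruby sketch. It is an assumption-laden example, not the g5k-parts code: `fstab_entries` and `fstab_diff` are hypothetical helpers, the device names are invented sample data, and ignoring comments and column spacing is a choice made for this sketch.

```ruby
# Hypothetical helper: keep only the meaningful fstab entries, so that
# comments, blank lines and column spacing do not count as differences.
def fstab_entries(text)
  text.lines.map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      .map { |line| line.split(/\s+/) }
end

# Returns the entries present in one version but not in the other.
def fstab_diff(current, backup)
  cur = fstab_entries(current)
  bak = fstab_entries(backup)
  { added: cur - bak, removed: bak - cur }
end

# Invented sample: primary partitions plus a logical partition for /tmp
backup = <<~FSTAB
  # generated at deployment time
  /dev/sda1  none  swap  sw        0 0
  /dev/sda3  /     ext4  defaults  0 1
  /dev/sda5  /tmp  ext4  defaults  0 2
FSTAB

# Simulate a user who remounted /tmp from another device
current = backup.sub("/dev/sda5  /tmp", "/dev/sdb1  /tmp")

diff = fstab_diff(current, backup)
puts diff.inspect
```

When both lists are empty the file matches its backup; otherwise the backup would be restored, as described above.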

Clock

G5kchecks ensures that the node clock is correct by performing three steps:

* stop the NTP client;
* synchronize with the NTP server of the site;
* start the client.

If the OS clock differs from the hardware clock, g5kchecks writes the correct time to the hardware clock. This ensures that the hardware clock is right and was not changed by another user during a previous deployment.
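
The sequence can be sketched as a function that only builds the list of commands, without running anything. Everything here is an assumption made for illustration: `clock_sync_plan`, the `tolerance` parameter and the server name are hypothetical, and the commands (`service`, `ntpdate`, `hwclock --systohc`) are common Debian tools that g5k-checks may or may not use.

```ruby
# Hypothetical sketch: build the ordered list of commands for the clock check.
# Nothing is executed; the command names are assumptions, see the text above.
def clock_sync_plan(ntp_server, os_time:, hw_time:, tolerance: 1.0)
  plan = [
    "service ntp stop",           # 1. stop the NTP client
    "ntpdate #{ntp_server}",      # 2. synchronize with the site NTP server
    "service ntp start"           # 3. restart the client
  ]
  # Rewrite the hardware clock only when it disagrees with the OS clock.
  plan << "hwclock --systohc" if (os_time - hw_time).abs > tolerance
  plan
end

now = Time.now
# The hardware clock is one hour behind in this invented scenario.
plan = clock_sync_plan("ntp.example.org", os_time: now, hw_time: now - 3600)
puts plan
```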

Virtual Hardware

What is checked by the new version

 ref API                       check?      comment(s)
 supported_job_types_virtual   Check.png

Architecture

What is checked by the new version

 ref API                      check?      comment(s)
 architecture_platform_type   Check.png   platform type (x86_64 ...)
 architecture_nb_procs        Check.png   number of procs
 architecture_nb_cores        Check.png   number of cores
 architecture_nb_threads      Check.png   number of threads

Bios

What is checked by the new version

 ref API             check?      comment(s)
 bios_version        Check.png
 bios_vendor         Check.png
 bios_release_date   Check.png

BMC

What is checked by the new version

 ref API                          check?      comment(s)
 network_adapters_bmc_ip          Check.png   Possible, but ipmitool is not present in the standard environment
 network_adapters_bmc_mac         Check.png   Possible, but ipmitool is not present in the standard environment
 network_adapters_bmc_managment   Check.png   Possible, but ipmitool is not present in the standard environment

Chassis

What is checked by the new version

 ref API                 check?      comment(s)
 chassis_serial_number   Check.png
 chassis_manufacturer    Check.png
 chassis_product_name    Check.png

Disk

What is checked by the new version

 ref API                       check?      comment(s)
 storage_devices_*_device      Check.png
 storage_devices_*_size        Check.png
 storage_devices_*_model       Check.png
 storage_devices_*_rev         Check.png
 storage_devices_*_driver      Fail.png
 storage_devices_*_interface   Fail.png

Memory

What is checked by the new version

 ref API                check?      comment(s)
 main_memory_ram_size   Check.png

Network

What is checked by the new version

 ref API                          check?                    comment(s)
 network_adapters_*_device        Check.png
 network_adapters_*_interface     Check.png
 network_adapters_*_ip4           Check.png
 network_adapters_*_ip6           Check.png
 network_adapters_*_switch        Fail.png NoStarted.png
 network_adapters_*_switch_port   Fail.png NoStarted.png
 network_adapters_*_bridged       Fail.png
 network_adapters_*_driver        Fail.png
 network_adapters_*_mac           Check.png
 network_adapters_*_guid          Check.png
 network_adapters_*_rate          Check.png
 network_adapters_*_version       Fail.png
 network_adapters_*_vendor        Fail.png
 network_adapters_*_mountable     Check.png
 network_adapters_*_enabled       Check.png
 network_adapters_*_mounted       Check.png
 network_adapters_*_management    Check.png

OS

What is checked by the new version

 ref API                    check?      comment(s)
 operating_system_name      Check.png
 operating_system_kernel    Check.png
 operating_system_version   Check.png

Processor

turboboost_enabled
What is checked by the new version

 ref API                     check?      comment(s)
 processor_clock_speed       Check.png
 processor_instruction_set   Check.png
 processor_model             Check.png
 processor_version           Check.png
 processor_vendor            Check.png
 processor_description       Check.png
 processor_cache_l2          Check.png
 processor_cache_l3          Check.png
 processor_cache_l1          Check.png
 processor_cache_l1d         Check.png

Check.png

Simple usage

Installation

G5kchecks has been tested on wheezy and jessie. It is available from the Grid'5000 Debian repository; just add the following line to /etc/apt/sources.list:

deb http://apt.grid5000.fr/debian sid main

Get the Grid'5000 keyring (A5ED59A7AF7F6E3B):

Terminal.png node:
apt-get update ; apt-get install grid5000-keyring && apt-get update

Install it:

Terminal.png node:
apt-get install g5kchecks

Get sources

Terminal.png node:
git clone git@gitolite.grid5000.fr:g5k-checks

Run g5k-checks

If you want to check your node just run:

Terminal.png node:
g5k-checks

If an error occurs, g5k-checks writes a file into /var/lib/g5kchecks/ (the session below, from adonis-3, shows the files under /var/lib/oar/checklogs/). For instance:

 root@adonis-3:~# g5k-checks
 root@adonis-3:~# ls /var/lib/oar/checklogs/
 OAR_Architecture_should_have_the_correct_number_of_thread
 root@adonis-3:~# cat /var/lib/oar/checklogs/OAR_Architecture_should_have_the_correct_number_of_thread 
 {"started_at":"2013-09-25 15:07:16 +0200","exception":"16, 8, architecture, nb_threads",
   "status":"failed","finished_at":"2013-09-25 15:07:16 +0200","run_time":0.000155442}

This means that adonis-3 does not have the expected number of threads (nb_threads is 16 instead of 8).
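
Such a log file is plain JSON, so it can be read back with Ruby's standard library. The sketch below is an illustration only: `parse_check_log` is a hypothetical helper, and it assumes the exception field always follows the "measured, expected, key path" layout seen in the example above (a value that itself contains ", " would break the split).

```ruby
require "json"

# Log content copied from the adonis-3 example above.
log = '{"started_at":"2013-09-25 15:07:16 +0200","exception":"16, 8, architecture, nb_threads",' \
      '"status":"failed","finished_at":"2013-09-25 15:07:16 +0200","run_time":0.000155442}'

# Hypothetical helper: split the exception field into its parts, assuming the
# layout "measured, expected, key path..." used by the checks on this page.
def parse_check_log(json_text)
  entry = JSON.parse(json_text)
  measured, expected, *path = entry["exception"].split(", ")
  { status: entry["status"], measured: measured, expected: expected, path: path }
end

result = parse_check_log(log)
puts "#{result[:path].join('.')}: measured #{result[:measured]}, expected #{result[:expected]}"
```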

Get node description

If you want an exact node description, you can run:

Terminal.png node:
g5k-checks -m api

g5k-checks then writes a JSON file and a YAML file to /tmp/:

 root@adonis-3:~# g5k-checks -m api
 root@adonis-3:~# ls /tmp/
 adonis-3.grenoble.grid5000.fr.json  adonis-3.grenoble.grid5000.fr.yaml  lost+found

Write your own checks/description

G5k-checks internal

G5k-checks is written in Ruby with the help of the rspec test framework. It gathers information from the ohai program and compares it with the Grid'5000 reference API. Rspec is simple to read and write, so you can easily copy the checks that are installed by default and adapt them to your needs.

On Debian, the files are stored in /usr/lib/ruby/vendor_ruby/g5kchecks. The tree is:

 ├── ohai # Adds information to ohai; this information is then used by g5k-checks
 ├── rspec # Adds Rspec formatters (store the information in different ways)
 ├── spec # Checks directory
 └── utils # Some useful classes

Play with ohai

Ohai is a small program that retrieves information from various files and other programs on the host. It offers easy-to-parse JSON output. We can add information to the JSON just by writing plugins. For instance, to add the version of bash to the description, you can create a small file /usr/lib/ruby/vendor_ruby/g5kchecks/ohai/package_version.rb with:

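 # Ohai plugin: expose package versions under the "packages" key
 provides "packages"
 packages Mash.new
 # dpkg-query prints only the version of the package named exactly "bash"
 packages[:bash] = `dpkg-query -W -f='${Version}' bash`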

Play with Rspec

Rspec is a framework for testing Ruby programs. G5k-checks uses Rspec not to test a Ruby program, but to test the host. Rspec is simple to read and write. For instance, to ensure that the bash version is the expected one, you can create a file /usr/lib/ruby/vendor_ruby/g5kchecks/spec/packages/packages_spec.rb with:

 describe "Packages" do

   before(:all) do
     # Description of the node as collected by ohai
     @system = RSpec.configuration.node.ohai_description
   end

   it "bash should have the expected version" do
     puts @system[:packages][:bash].to_yaml
     bash_version = @system[:packages][:bash].strip
     # Failure message layout: measured value, expected value, key path
     bash_version.should eql("4.2+dfsg-0.1"), "#{bash_version}, 4.2+dfsg-0.1, packages, bash"
   end

 end

Add checks

Example: I want to check if flag "acpi" is available on the processor:

Add to /usr/lib/ruby/vendor_ruby/g5kchecks/spec/processor/processor_spec.rb:

 it "should have acpi" do
   # true when the "acpi" flag is listed for the first CPU
   acpi_ohai = @system[:cpu][:'0'][:flags].include?('acpi')
   acpi_ohai.should_not be_false, "#{acpi_ohai}, is not acpi, processor, acpi"
 end

Add information to the description

Example: I want to add the BogoMIPS of the node:

First, we add the information to the ohai description. To do this, add the following to /usr/lib/ruby/vendor_ruby/g5kchecks/ohai/cpu.rb at line 58:

   if line =~ /^BogoMIPS/
     cpu[:Bogo] = line.chomp.split(": ").last.lstrip
   end

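Then we can retrieve the information and add it to the description. To do this, add the following to /usr/lib/ruby/vendor_ruby/g5kchecks/spec/processor/processor_spec.rb. Note that the expectation fails whenever a value is present, so the failure message carries the value together with its key path (processor, bogoMIPS):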

   it "should have BogoMIPS" do
     bogo_ohai = @system[:cpu][:Bogo]
     bogo_ohai.should be_nil, "#{bogo_ohai}, don't have information, processor, bogoMIPS"
   end

Now you have the information in /tmp/mynode.mysite.grid5000.fr.yaml:

   root@graphene-100:/usr/lib/ruby/vendor_ruby/g5kchecks# g5k-checks -m api
   root@graphene-100:/usr/lib/ruby/vendor_ruby/g5kchecks# grep -C 3 bogo /tmp/graphene-100.nancy.grid5000.fr.yaml 
     ram_size: 16860348416
   processor:
     clock_speed: 2530000000
     bogoMIPS: 5053.74
     instruction_set: x86-64
     model: Intel Xeon
     version: X3440
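
The generated YAML file can then be read from a script. The following is a minimal sketch using Ruby's standard YAML library; the sample text only reproduces the processor section shown above (a real file contains many more keys), and on a node you would read /tmp/mynode.mysite.grid5000.fr.yaml with `File.read` instead.

```ruby
require "yaml"

# Sample shaped like the processor section shown above; a real description
# file contains many more keys.
sample = <<~YAML
  processor:
    clock_speed: 2530000000
    bogoMIPS: 5053.74
    instruction_set: x86-64
    model: Intel Xeon
    version: X3440
YAML

description = YAML.safe_load(sample)
processor = description.fetch("processor")

puts processor["bogoMIPS"]
# Clock speed is stored in Hz; convert to GHz for display
puts processor["clock_speed"] / 1_000_000_000.0
```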