G5k-checks
Description
Overview
- g5k-checks is expected to be integrated into the production environment of the Grid'5000 computational nodes. It checks that a node meets several basic requirements before the node declares itself available to the OAR server.
- This lets the admins enable some checkers which may be very specific to the hardware of a cluster.
Architecture
G5kchecks is built on the rspec test suite. This diverts rspec slightly from its primary purpose, which is to test a program: here it is used to test the characteristics of a node. The first step is to retrieve node information with ohai. By default, ohai provides a large set of machine characteristics. In addition, we have developed some plugins to supply missing information (particularly for the disks, the CPU and the network). The second step is to compare those characteristics with the Grid'5000 Reference_Repository. To do so, g5kchecks takes each value from the API and compares it with the value given by ohai. If the two values are not equal, an error is raised through the rspec process.
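The comparison step can be illustrated with a minimal sketch in plain Ruby. It is not the g5kchecks implementation: the two hashes are made-up sample data standing in for the reference API description and for ohai output, and the helper names `flatten_keys` and `mismatches` are hypothetical.

```ruby
# Hypothetical sample data: neither hash is real Grid'5000 or ohai output.
REFERENCE = {
  "architecture" => { "smp_size" => 2, "smt_size" => 16 },
  "main_memory"  => { "ram_size" => 68_719_476_736 }
}.freeze

DETECTED = {
  "architecture" => { "smp_size" => 2, "smt_size" => 8 },
  "main_memory"  => { "ram_size" => 68_719_476_736 }
}.freeze

# Flatten a nested hash into "a_b_c" => value pairs, mirroring the
# underscore-joined names used in the tables of this page.
def flatten_keys(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), flat|
    name = prefix ? "#{prefix}_#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(flatten_keys(value, name))
    else
      flat[name] = value
    end
  end
end

# Return one message per reference value that the node does not match.
def mismatches(reference, detected)
  found = flatten_keys(detected)
  flatten_keys(reference).each_with_object([]) do |(name, expected), errors|
    actual = found[name]
    next if actual == expected
    errors << "#{name}: expected #{expected.inspect}, got #{actual.inspect}"
  end
end

errors = mismatches(REFERENCE, DETECTED)
errors.each { |e| puts e }
# prints: architecture_smt_size: expected 16, got 8
```

In g5kchecks itself each comparison is an rspec expectation, so a mismatch surfaces as a failed example rather than as a string in a list.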
oar
- The oar-node flavour of the OAR installation embeds an hourly cron job, /usr/lib/oar/oarnodecheckrun, which runs the executable file /etc/oar/check.d/start_g5kchecks. The server then periodically invokes /usr/bin/oarnodecheckquery remotely. This command returns with status 1 if /var/lib/oar/check.d/ is not empty, and 0 otherwise. So if /etc/oar/check.d/start_g5kchecks finds something wrong, it simply has to create a log file in that directory.
- The version of /etc/default/oar-node that g5k-checks installs runs both the oarnodecheckrun and oarnodecheckquery scripts. If the latter fails, the node is not ready to start, and it loops on running those scripts until either oarnodecheckquery returns 0 or a timeout is reached. If the timeout is reached, it does not attempt to declare the node as "Alive".
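The protocol described above can be sketched in Ruby as follows. This is an illustration only, not the OAR scripts themselves: `check_query_status` and `wait_until_ready` are hypothetical names, the log file name is invented, and the timeout is simplified to a maximum number of attempts.

```ruby
require "tmpdir"
require "fileutils"

# Mimics the contract described for oarnodecheckquery:
# status 1 when the check directory holds at least one file, 0 otherwise.
def check_query_status(check_dir)
  Dir.children(check_dir).empty? ? 0 : 1
end

# Re-run the checks until the query returns 0 or the attempts run out.
# Returns true when the node may be declared "Alive".
# Assumption: the real timeout is time-based; attempts keep the sketch short.
def wait_until_ready(check_dir, max_attempts:, delay: 0)
  max_attempts.times do
    yield if block_given?            # stands in for oarnodecheckrun
    return true if check_query_status(check_dir).zero?
    sleep(delay)
  end
  false
end

Dir.mktmpdir do |dir|
  log = File.join(dir, "g5kchecks.log")   # hypothetical log file name
  File.write(log, "disk size mismatch\n")
  runs = 0
  ready = wait_until_ready(dir, max_attempts: 5) do
    runs += 1
    FileUtils.rm_f(log) if runs == 3      # pretend the fault clears on run 3
  end
  puts "ready=#{ready} after #{runs} runs"
  # prints: ready=true after 3 runs
end
```

Using the presence of a file as the failure signal keeps the query side trivial: the server never has to parse check output, only test whether the directory is empty.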
Checks Overview
legend
The check column of the tables below uses one of four status icons (images not preserved here), meaning:
- no test
- test
- test, but it does not work on every cluster
- not known whether it can be tested
g5k-parts
g5k-parts is designed to run at both phases of g5k-checks (see above).
- In Phase 1, g5k-parts validates the partitioning of a Grid'5000 computational node against the G5K Node Storage convention: all partitions but /tmp are primary, and /tmp is a logical partition inside the only extended partition.
- It first compares /etc/fstab with its backup generated at deployment time. When errors are found at this level, /etc/fstab is reset and the machine reboots.
- Then, for every partition given on the command line, it matches the partition's geometry on the hard drive against the partition layout saved at deployment time. In the new g5kchecks, we decided that no formatting is done after an error (that is to be done with charon).
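The /etc/fstab comparison can be sketched as below. This is a hedged illustration rather than the g5k-parts code: the helper names and the sample fstab contents are hypothetical, and ignoring comments and whitespace differences is an assumption made for the example.

```ruby
# Reduce fstab text to its meaningful entries: drop comments and blank
# lines, and collapse runs of whitespace so cosmetic edits are ignored.
def fstab_entries(text)
  text.each_line
      .map { |line| line.sub(/#.*/, "").strip }
      .reject(&:empty?)
      .map { |line| line.split.join(" ") }
end

# True when the current fstab differs from the backup taken at deployment.
def fstab_changed?(current, backup)
  fstab_entries(current) != fstab_entries(backup)
end

# Hypothetical contents, for illustration only.
backup = <<~FSTAB
  # <file system> <mount point> <type> <options> <dump> <pass>
  /dev/sda3   /      ext4   errors=remount-ro   0 1
  /dev/sda5   /tmp   ext4   defaults            0 2
FSTAB

current = <<~FSTAB
  /dev/sda3 / ext4 errors=remount-ro 0 1
  /dev/sda5 /tmp ext4 defaults,noexec 0 2
FSTAB

puts fstab_changed?(current, backup)   # true: the /tmp options were edited
puts fstab_changed?(backup, backup)    # false
```

When a change is detected, the behaviour described above is to restore /etc/fstab from the backup and reboot; the sketch stops at detection.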
Clock
G5kchecks ensures that the node's clock is correct by performing three steps:
- stop the NTP client
- synchronize with the NTP server of the site
- start the client
If the OS clock differs from the hardware clock, g5kchecks writes the correct time to the hardware clock. This ensures that the hardware clock is right and was not changed by another user during a previous deployment.
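The decision to rewrite the hardware clock can be illustrated with a small Ruby sketch. It only models the comparison and runs no system command. The function name and the one-second tolerance are assumptions, since this page does not state how large a difference is tolerated.

```ruby
# Hypothetical tolerance: the page only says the clocks must not differ.
MAX_DRIFT_SECONDS = 1.0

# Decide whether the hardware clock should be rewritten from the OS clock,
# which is taken as correct because it was just synchronized over NTP.
def hwclock_needs_update?(os_time, hw_time, max_drift = MAX_DRIFT_SECONDS)
  (os_time - hw_time).abs > max_drift
end

os_time = Time.utc(2024, 1, 1, 12, 0, 0)   # illustrative values
hw_ok   = os_time - 0.2                    # within tolerance
hw_bad  = os_time - 3600                   # one hour behind

puts hwclock_needs_update?(os_time, hw_ok)    # false
puts hwclock_needs_update?(os_time, hw_bad)   # true
```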
Virtual Hardware
ref API | check ? | comment(s) |
---|---|---|
supported_job_types_virtual | | |
Architecture
ref API | check ? | comment(s) |
---|---|---|
architecture_smp_size | | number of cores |
architecture_platform_type | | platform type |
architecture_smt_size | | number of threads |
Bios
ref API | check ? | comment(s) |
---|---|---|
bios_version | | |
bios_vendor | | |
bios_release_date | | |
BMC
ref API | check ? | comment(s) |
---|---|---|
network_adapters_bmc_ip | | |
network_adapters_bmc_mac | | |
network_adapters_bmc_managment | | |
Chassis
ref API | check ? | comment(s) |
---|---|---|
chassis_serial_number | | |
chassis_manufacturer | | |
chassis_product_name | | |
Disk
ref API | check ? | comment(s) |
---|---|---|
storage_devices_*_device | | |
storage_devices_*_size | | |
storage_devices_*_model | | |
storage_devices_*_rev | | |
Memory
ref API | check ? | comment(s) |
---|---|---|
main_memory_ram_size | | |
Network
ref API | check ? | comment(s) |
---|---|---|
network_adapters_*_device | | |
network_adapters_*_interface | | |
network_adapters_*_ip4 | | |
network_adapters_*_ip6 | | |
network_adapters_*_switch | | |
network_adapters_*_switch_port | | |
network_adapters_*_bridged | | |
network_adapters_*_driver | | |
network_adapters_*_mac | | |
network_adapters_*_guid | | |
network_adapters_*_rate | | |
network_adapters_*_version | | |
network_adapters_*_vendor | | |
network_adapters_*_mountable | | |
network_adapters_*_enabled | | |
network_adapters_*_mounted | | |
network_adapters_*_management | | |
OS
ref API | check ? | comment(s) |
---|---|---|
operating_system_name | | |
operating_system_kernel | | |
operating_system_version | | |
Processor
ref API | check ? | comment(s) |
---|---|---|
processor_clock_speed | | |
processor_instruction_set | | |
processor_model | | |
processor_version | | |
processor_vendor | | |
processor_description | | |
processor_cache_l2 | | |
processor_cache_l3 | | |
processor_cache_l1 | | |
processor_cache_l1d | | |
Installation
G5kchecks is available in the Grid'5000 Debian repository. Add
deb http://apt.grid5000.fr/debian sid main
to /etc/apt/sources.list and install it:
Get the sources
Get the latest development version from the Grid'5000 Git repository: