G5k-checks
Get the sources
Get the latest development from the Grid'5000 Git repository:
Description
g5k-checks
- g5k-checks is expected to be integrated into the production environment of the Grid'5000 computational nodes. It gathers a collection of programs which check that a node meets several basic requirements before it declares itself as available to the OAR server.
- This lets the admins enable some checkers which may be very specific to the hardware of a cluster.
g5k-checks executes at boot time in two phases:
- Phase 1
- An init script, /etc/init.d/g5k-checks, runs all checkers that must run early enough in the boot process.
- They are listed in the variable CHECKS_FOR_INIT in the configuration file.
- Then it enables all checkers listed in the variable CHECKS_FOR_OAR for Phase 2.
- Phase 2
- This phase strongly relies on the check mechanism provided by OAR and the oar-node configuration file (/etc/default/oar-node for deb distros, /etc/sysconfig/oar-node for rpm ones).
- The oar-node flavour of OAR installation embeds an hourly cron job, /usr/lib/oar/oarnodecheckrun, which runs all executable files stored in /etc/oar/check.d/. Then the server periodically invokes /usr/bin/oarnodecheckquery remotely. This command returns a non-zero status if there are files in /var/lib/oar/check.d, and 0 otherwise. So if a checker in /etc/oar/check.d/ finds something wrong, it simply has to create a log file in /var/lib/oar/check.d.
- The version of /etc/(default|sysconfig)/oar-node that g5k-checks installs runs both the oarnodecheckrun and oarnodecheckquery scripts. If the latter fails, the node is not ready to start, and it loops on running those scripts until either oarnodecheckquery returns 0 or a timeout is reached. If the timeout is reached, it does not attempt to declare the node as "Alive".
- During Phase 1, enabling a checker simply amounts to adding a symbolic link in /etc/oar/check.d to its "oar-node driver", that is, a short script file which interfaces the core checker to the OAR check mechanism.
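The run/query loop described above can be illustrated with a minimal Python sketch. It is not the actual oar-node script: the function names, the `timeout_s` and `delay_s` parameters, and the choice of 1 as the failing status are assumptions made for illustration, and the directories are passed in as arguments rather than hard-coded.

```python
import os
import subprocess
import tempfile
import time


def run_checks(check_dir):
    """Run every executable file found in check_dir, in name order."""
    for name in sorted(os.listdir(check_dir)):
        path = os.path.join(check_dir, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            subprocess.call([path])


def query_checks(log_dir):
    """Return 0 when log_dir is empty, 1 when some checker left a log file."""
    return 1 if os.listdir(log_dir) else 0


def wait_until_healthy(check_dir, log_dir, timeout_s=60.0, delay_s=5.0):
    """Loop on run + query until the query returns 0 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        run_checks(check_dir)
        if query_checks(log_dir) == 0:
            return True   # node may declare itself "Alive"
        if time.monotonic() >= deadline:
            return False  # timeout: do not declare the node "Alive"
        time.sleep(delay_s)


if __name__ == "__main__":
    # Demonstration on throwaway directories instead of the real OAR paths.
    with tempfile.TemporaryDirectory() as checks, tempfile.TemporaryDirectory() as logs:
        print(wait_until_healthy(checks, logs, timeout_s=0, delay_s=0))
```

In this sketch a checker signals a problem only by leaving a file in the log directory, which matches the convention described above; how a checker clears its log once the problem is fixed is left out.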
At any moment while the node is running, g5k-checks may be called to either disable or enable checks. This is expected to be used by the OAR prologue and epilogue:
- /etc/init.d/g5k-checks stop: disable OAR checks
- /etc/init.d/g5k-checks start: enable OAR checks for oarnodecheckrun
- /etc/init.d/g5k-checks startrun: enable OAR checks for oarnodecheckrun and run the oarnodecheckrun/oarnodecheckquery pair once, without waiting for the next hourly cron run.
Basically, the OAR prologue should call "stop", while the epilogue should call "startrun".
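Since enabling a checker is described as adding a symbolic link in /etc/oar/check.d, the "start" and "stop" actions can be pictured with the following Python sketch. The directory layout (`driver_dir`), the function names, and the idea that the checker names come from CHECKS_FOR_OAR are assumptions for illustration; the real init script may differ.

```python
import os
import tempfile


def enable_checks(driver_dir, check_dir, names):
    """Link each named oar-node driver into check_dir (the "start" action)."""
    for name in names:
        link = os.path.join(check_dir, name)
        if not os.path.lexists(link):
            os.symlink(os.path.join(driver_dir, name), link)


def disable_checks(check_dir):
    """Remove every symbolic link from check_dir (the "stop" action)."""
    for name in os.listdir(check_dir):
        link = os.path.join(check_dir, name)
        if os.path.islink(link):
            os.unlink(link)


if __name__ == "__main__":
    # Demonstration on throwaway directories instead of /etc/oar/check.d.
    with tempfile.TemporaryDirectory() as drivers, tempfile.TemporaryDirectory() as checks:
        open(os.path.join(drivers, "g5k-parts"), "w").close()
        enable_checks(drivers, checks, ["g5k-parts"])
        print(os.listdir(checks))
        disable_checks(checks)
        print(os.listdir(checks))
```

Only symbolic links are removed on "stop", so any regular file placed in the directory by another tool is left untouched.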
- At installation time, g5k-checks configures the local syslog daemon: it first looks for a free <n> such that the local<n>.alert and local<n>.warning selectors are not used, and then defines them with the action "@syslog". If there is no "syslog" host on the local network, it defaults to writing messages to the local syslog file.
- The checkers use the local<n> facility to report only important messages. They use a local log file for debugging messages. Please see section 3.2 for further details.
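The search for a free local<n> facility can be sketched as follows. This is a simplified illustration, not the installer's code: it assumes a classic syslog.conf layout where the selector is the first whitespace-separated field, it only looks at the .alert and .warning priorities named above, and the function names are hypothetical.

```python
import re

# Matches selectors such as "local3.alert" or "local3.warning".
_SELECTOR = re.compile(r"\blocal([0-7])\.(?:alert|warning)\b")


def find_free_local_facility(conf_text):
    """Return the smallest n in 0..7 whose alert/warning selectors are unused, or None."""
    used = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        selector_field = line.split()[0]
        for match in _SELECTOR.finditer(selector_field):
            used.add(int(match.group(1)))
    for n in range(8):
        if n not in used:
            return n
    return None


def rules_for(n, action="@syslog"):
    """Build the two configuration lines that forward local<n> messages."""
    return ["local%d.alert\t%s" % (n, action), "local%d.warning\t%s" % (n, action)]


if __name__ == "__main__":
    sample = "local0.alert /var/log/a\n# local1.warning ignored\nlocal2.info /var/log/b\n"
    n = find_free_local_facility(sample)
    print(n, rules_for(n))
```

Wildcard selectors such as `local1.*` are not treated as "used" here; a stricter check would have to account for them.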
Checks Overview
legend
:-) | means |
---|---|
![]() | no test |
![]() | test |
![]() | test but doesn't work on each cluster |
![]() | don't know if we could test |
g5k-parts
g5k-parts is designed to run at both phases of g5k-checks (see above).
- In Phase 1, g5k-parts validates the partitioning of a Grid'5000 computational node against the G5K Node Storage convention: all partitions but /tmp are primary, and /tmp is a logical partition inside the only extended partition.
- It first compares /etc/fstab with its backup generated at deployment time. When errors are found at this level, /etc/fstab is reset and the machine reboots.
- Then, for every partition given on the command line, it first matches its geometry on the hard drive against the partition layout saved at deployment time. In the new g5k-checks, no formatting is performed after an error (this is left to charon).
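The first step, comparing /etc/fstab with its deployment-time backup, might look like the Python sketch below. The comparison rule (ignore comments, blank lines and whitespace differences) and the function names are assumptions for illustration; the actual g5k-parts check may be stricter or looser, and the reset and reboot steps are not shown.

```python
def parse_fstab(text):
    """Return the fstab entries as tuples of fields, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries.append(tuple(line.split()))
    return entries


def fstab_matches_backup(current_text, backup_text):
    """True when both files describe the same entries in the same order."""
    return parse_fstab(current_text) == parse_fstab(backup_text)


if __name__ == "__main__":
    backup = "# saved at deployment\n/dev/sda5 /tmp ext3 defaults 0 2\n"
    current = "/dev/sda5   /tmp   ext3   defaults   0   2\n"
    print(fstab_matches_backup(current, backup))
```

Comparing parsed fields rather than raw bytes avoids flagging a node merely because a comment or the column alignment changed.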
Architecture
ref API | check ? | comment(s) |
---|---|---|
architecture_smp_size | number of cores | |
architecture_platform_type | platform type | |
architecture_smt_size | number of threads | |
Bios
ref API | check ? | comment(s) |
---|---|---|
bios_version | | |
bios_vendor | | |
bios_release_date | | |
BMC
ref API | check ? | comment(s) |
---|---|---|
network_adapters_bmc_ip | | |
network_adapters_bmc_mac | | |
network_adapters_bmc_managment | | |
Chassis
ref API | check ? | comment(s) |
---|---|---|
chassis_serial_number | | |
chassis_manufacturer | | |
chassis_product_name | | |
Disk
ref API | check ? | comment(s) |
---|---|---|
storage_devices_*_device | | |
storage_devices_*_size | | |
storage_devices_*_model | | |
storage_devices_*_rev | | |
Memory
ref API | check ? | comment(s) |
---|---|---|
main_memory_ram_size | | |
Network
ref API | check ? | comment(s) |
---|---|---|
network_adapters_*_device | | |
network_adapters_*_interface | | |
network_adapters_*_ip4 | | |
network_adapters_*_ip6 | | |
network_adapters_*_switch | | |
network_adapters_*_switch_port | | |
network_adapters_*_bridged | | |
network_adapters_*_driver | | |
network_adapters_*_mac | | |
network_adapters_*_guid | | |
network_adapters_*_rate | | |
network_adapters_*_version | | |
network_adapters_*_vendor | | |
network_adapters_*_mountable | | |
network_adapters_*_enabled | | |
network_adapters_*_mounted | | |
network_adapters_*_management | | |
OS
ref API | check ? | comment(s) |
---|---|---|
operating_system_name | | |
operating_system_kernel | | |
operating_system_version | | |
Processor
ref API | check ? | comment(s) |
---|---|---|
processor_clock_speed | | |
processor_instruction_set | | |
processor_model | | |
processor_version | | |
processor_vendor | | |
processor_description | | |
processor_cache_l2 | | |
processor_cache_l3 | | |
processor_cache_l1 | | |
processor_cache_l1d | | |