G5k-checks

From Grid5000
Revision as of 21:57, 6 June 2010 by Probert. Maintainer: Philippe Robert. Authors: Philippe Combes, Philippe Robert. Status: Open for comment.

Description

g5k-checks

  • g5k-checks is expected to be integrated into the production environment of the Grid'5000 computational nodes.

It gathers a collection of programs which check that a node meets several basic requirements before it declares itself as available to the OAR server.

  • The set of checkers is configurable, which lets admins enable checkers that are very specific to the hardware of a cluster.

g5k-checks executes at boot time in two phases:

    • Phase 1 - An init script, /etc/init.d/g5k-checks, runs all checkers that must run early in the boot process.
      • These checkers are listed in the variable CHECKS_FOR_INIT in the configuration file.
      • The init script then enables all checkers listed in the variable CHECKS_FOR_OAR for Phase 2.
    • Phase 2 - This phase strongly relies on the check mechanism provided by OAR and the oar-node configuration file (/etc/default/oar-node for deb distros, /etc/sysconfig/oar-node for rpm ones).
      • The oar-node flavour of OAR installation embeds an hourly cron job, oarnodecheckrun, which runs all executable files stored in /etc/oar/check.d/.

The server then periodically invokes oarnodecheckquery remotely, which returns status 0 if and only if there are no log files in a given OAR directory. So if a checker in /etc/oar/check.d/ finds something wrong, it simply has to create a log file in that directory.

      • The version of /etc/(default|sysconfig)/oar-node that g5k-checks installs runs both the oarnodecheckrun and oarnodecheckquery scripts.

If oarnodecheckquery fails, the node is not ready to start, and oar-node keeps running both scripts until either oarnodecheckquery returns 0 or a timeout is reached. If the timeout is reached, it does not attempt to declare the node as "Alive".
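The check, query and retry cycle described above can be sketched in Python. This is only an illustration, not the actual oar-node script. The function names, the timeout and delay values, and the rule that a passing checker removes its old log file are all assumptions. It takes status 0 from the query step to mean "no log file present", which is what the retry loop implies.

```python
import os
import tempfile
import time


def run_checks(checkers, log_dir):
    """Stand-in for oarnodecheckrun: run every checker once.

    `checkers` maps a (hypothetical) checker name to a callable that returns
    an error message when something is wrong, or None when all is well.
    """
    for name, check in checkers.items():
        log_path = os.path.join(log_dir, name)
        problem = check()
        if problem:
            # A failing checker reports by creating a log file.
            with open(log_path, "w") as log_file:
                log_file.write(problem + "\n")
        elif os.path.exists(log_path):
            # Assumption: a checker that now passes clears its old log.
            os.remove(log_path)


def query_checks(log_dir):
    """Stand-in for oarnodecheckquery: 0 only when no log file is present."""
    return 1 if os.listdir(log_dir) else 0


def wait_until_healthy(checkers, log_dir, timeout=5.0, delay=0.1):
    """Loop on run + query until the query returns 0 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        run_checks(checkers, log_dir)
        if query_checks(log_dir) == 0:
            return True   # the node may be declared "Alive"
        if time.monotonic() >= deadline:
            return False  # give up: do not declare the node "Alive"
        time.sleep(delay)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        healthy = wait_until_healthy({"always_ok": lambda: None}, log_dir)
        print("node ready:", healthy)
```

Checkers and the query step share nothing except the log directory, so a new checker can be added without touching the loop.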

      • During Phase 1, enabling a checker simply amounts to adding a symbolic link in /etc/oar/check.d to its "oar-node driver": a short script file which interfaces the core checker to the OAR check mechanism.
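A minimal Python sketch of the enabling step follows; it is not the actual init script. It assumes the configuration file holds shell-style assignments such as CHECKS_FOR_OAR="name1 name2", and that each checker's driver is a file named after the checker in some drivers directory. The checker names and the drivers directory are hypothetical; only the variable names and the check.d mechanism come from the description above.

```python
import os
import shlex


def parse_check_list(config_text, variable):
    """Return the checker names assigned to `variable`.

    Assumes a shell-style line such as VARIABLE="a b c".
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(variable + "="):
            value = line.split("=", 1)[1]
            # shlex strips the quotes; then split the list on whitespace.
            return " ".join(shlex.split(value)).split()
    return []


def enable_checkers(names, drivers_dir, check_dir):
    """Symlink each checker's driver into check_dir; return the links made."""
    os.makedirs(check_dir, exist_ok=True)
    links = []
    for name in names:
        driver = os.path.join(drivers_dir, name)  # hypothetical layout
        link = os.path.join(check_dir, name)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale link left by a previous boot
        os.symlink(driver, link)
        links.append(link)
    return links
```

For example, `enable_checkers(parse_check_list(text, "CHECKS_FOR_OAR"), drivers_dir, "/etc/oar/check.d")` would link every checker listed for Phase 2, with `text` and `drivers_dir` supplied by the caller.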