Status: Difference between revisions

From Grid5000
(monika for oar2)
(drop bordeaux)
(43 intermediate revisions by 13 users not shown)
Line 1: Line 1:
{{Maintainer|Pierre Neyron}}
{{Maintainer|Sebastien Badia}}
{{Author|Florian Le Goff}}
{{Status|In production}}
{{Status|In production}}
{{Portal|User}}
{{Portal|Platform}}


If you experience problems, please [[Current_events|check the grid administration schedule]], where past, present and future incidents (planned or not...) are notified for all sites:
= [https://www.grid5000.fr/status/ Current events] (maintenance, issues...) =
If you experience problems, please [https://www.grid5000.fr/status/ check the grid administration schedule], where past, present and future incidents (planned or not...) are notified for all sites.


= Monika =
= Monika =
[http://oar.imag.fr/ Monika] displays current and scheduled [[OAR]] jobs.
[http://oar.imag.fr/ Monika] displays current and scheduled OAR jobs.


You can select an individual site or cluster:
You can select an individual site or cluster:
Line 11: Line 15:
|-
|-
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Bordeaux/monika.cgi Bordeaux]
[https://intranet.grid5000.fr/oar/Grenoble/monika.cgi Grenoble]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Grenoble/monika.cgi Grenoble]
[https://intranet.grid5000.fr/oar/Lille/monika.cgi Lille]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Lille/monika.cgi Lille]
[https://intranet.grid5000.fr/oar/Luxembourg/monika.cgi Luxembourg]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Lyon/monika.cgi Lyon]
[https://intranet.grid5000.fr/oar/Lyon/monika.cgi Lyon]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Nancy/monika.cgi Nancy]
[https://intranet.grid5000.fr/oar/Nancy/monika.cgi Nancy]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Orsay/monika.cgi Orsay]
[https://intranet.grid5000.fr/oar/Nantes/monika.cgi Nantes]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Rennes/monika.cgi Rennes]
[https://intranet.grid5000.fr/oar/Reims/monika.cgi Reims]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Sophia/monika.cgi Sophia]
[https://intranet.grid5000.fr/oar/Rennes/monika.cgi Rennes]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Toulouse/monika.cgi Toulouse]
[https://intranet.grid5000.fr/oar/Sophia/monika.cgi Sophia]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Toulouse/monika.cgi Toulouse]
|}
|}


Or view the [https://www.grid5000.fr/gridstatus/oargridmonika.cgi global snapshot of the grid].
Or view the [https://www.grid5000.fr/gridstatus/oargridmonika.cgi global snapshot of the grid].


= DrawOARGantt =
= Drawgantt =
[http://oar.imag.fr/ DrawOARGantt] displays past, current and scheduled [[OAR]] jobs.
[http://oar.imag.fr/ Drawgantt] displays past, current and scheduled OAR jobs.


You can select an individual site or cluster:
You can select an individual site or cluster:
Line 39: Line 45:
|-
|-
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Bordeaux/DrawOARGantt.pl Bordeaux]
[https://intranet.grid5000.fr/oar/Grenoble/drawgantt.cgi Grenoble]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Lille/drawgantt.cgi Lille]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Luxembourg/drawgantt.cgi Luxembourg]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Lyon/drawgantt.cgi Lyon]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Nancy/drawgantt.cgi Nancy]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Nantes/drawgantt.cgi Nantes]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Reims/drawgantt.cgi Reims]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Rennes/drawgantt.cgi Rennes]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Sophia/drawgantt.cgi Sophia]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Toulouse/drawgantt.cgi Toulouse]
|}
 
Or view the [https://www.grid5000.fr/gridstatus/oargridgantt.cgi global grid Gantt diagram].
 
SVG versions of the Gantt charts are also available:
{|
|-
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Grenoble/drawgantt-svg/ Grenoble]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://intranet.grid5000.fr/oar/Lille/drawgantt-svg/ Lille]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
Grenoble:
[https://intranet.grid5000.fr/oar/Luxembourg/drawgantt-svg/ Luxembourg]
* [https://helpdesk.grid5000.fr/oar/Grenoble/DrawOARGantt.pl Idpot]
* [http://ita101.imag.fr/cgi-bin/DrawOARGantt.pl ICluster2]
* Icare
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Lille/DrawOARGantt.pl Lille]
[https://intranet.grid5000.fr/oar/Lyon/drawgantt-svg/ Lyon]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
Lyon:
[https://intranet.grid5000.fr/oar/Nancy/drawgantt-svg/ Nancy]
* [https://helpdesk.grid5000.fr/oar/Lyon/DrawOARGantt-capricorne.pl capricorne]
* [https://helpdesk.grid5000.fr/oar/Lyon/DrawOARGantt-sagittaire.pl sagittaire]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Nancy/DrawOARGantt.pl Nancy]
[https://intranet.grid5000.fr/oar/Nantes/drawgantt-svg/ Nantes]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Orsay/DrawOARGantt.pl Orsay]
[https://intranet.grid5000.fr/oar/Reims/drawgantt-svg/ Reims]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Rennes/DrawOARGantt.pl Rennes]
[https://intranet.grid5000.fr/oar/Rennes/drawgantt-svg/ Rennes]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Sophia/DrawOARGantt.pl Sophia]
[https://intranet.grid5000.fr/oar/Sophia/drawgantt-svg/ Sophia]
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
|bgcolor="#ffffff" valign="top" style="border:1px solid #cccccc;padding:1em;padding-top:0.5em;"|
[https://helpdesk.grid5000.fr/oar/Toulouse/DrawOARGantt.pl Toulouse]
[https://intranet.grid5000.fr/oar/Toulouse/drawgantt-svg/ Toulouse]
|}
|}


Or view the [https://www.grid5000.fr/gridstatus/DrawGridGantt.cgi global grid Gantt diagram].
= Network Monitoring =
== Backbone network status and load ==
[http://pasillo.renater.fr/weathermap/weathermap_g5k.html Grid5000 Weathermap]  (courtesy of Renater)
 
Shows the current state of the optical links between the Grid5000 10Gb-ready sites. A link painted in black on the weathermap means that you won't be able to access that site's nodes from the Grid5000 internal network.
 
== Historical network load ==
 
[http://pasillo.renater.fr/metrologie/GRID5000/ Grid5000 Network monitoring] User:GRID PASS:5000 (courtesy of Renater)
 
This page gives you some nice graphs built from the SNMP counters of Renater switches.
 
These graphs show whether one or several experiments are creating congestion on a switch interface. They are useful if you experience anomalies such as packet loss or abnormal delay. The EoMPLS graphs are outdated since Orsay and Bordeaux were migrated to Renater-5.
 
== Sites network traffic ==
[https://intranet.grid5000.fr/supervision/grenoble/monitoring/network/last/minute/ Grenoble]
[https://intranet.grid5000.fr/supervision/lille/monitoring/network/last/minute/ Lille]
[https://intranet.grid5000.fr/supervision/lyon/monitoring/network/last/minute/ Lyon]
[https://intranet.grid5000.fr/supervision/luxembourg/monitoring/network/last/minute/ Luxembourg]
[https://intranet.grid5000.fr/supervision/nancy/monitoring/network/last/minute/ Nancy]
[https://intranet.grid5000.fr/supervision/nantes/monitoring/network/last/minute/ Nantes]
[https://intranet.grid5000.fr/supervision/reims/monitoring/network/last/minute/ Reims]
[https://intranet.grid5000.fr/supervision/rennes/monitoring/network/last/minute/ Rennes]
[https://intranet.grid5000.fr/supervision/sophia/monitoring/network/last/minute/ Sophia]
[https://intranet.grid5000.fr/supervision/toulouse/monitoring/network/last/minute/ Toulouse]
 
== Latency monitoring ==
 
[https://intranet.grid5000.fr/smokeping/Lille/?target=G5KCore Grid5000 Interlink Latency] (please check  [http://oss.oetiker.ch/smokeping/doc/reading.en.html Reading the Graphs] if you are not used to Smokeping graphs).
 
We use ping (ICMP) probes to monitor the backbone's latency. Every 300 seconds, Smokeping forks Fping on each site to ping the adminfront of every site (including itself) 20 times. Each site pings all the others, giving us a full view of the network from each site.
 
Each host is pinged 20 times (similar to a ping -c 20) in order to study:
 
* '''Packet Loss''' (PL) occurring on the link. Packet loss can be caused by the saturation of a link: when the buffers of a router or switch are unable to store packets, the packets are dropped. It may also be caused by faulty transmitting equipment somewhere, or by a link flapping (up/down/up/down...) because of a faulty component.
 
* '''The variance between pings'''. If the first reply comes back in 10ms, the second in 50ms, then the third in 5ms... something unusual is going on (network overload or routing issues). The graph plots the median value, then draws smoke below and above that point. If the median is 20ms, the min 10ms and the max 80ms, you will have a colored point at 20ms with smoke going from 10ms to 80ms.
 


= Renater4 monitoring =
A dashboard combining links and real-time data is also available on the [https://intranet.grid5000.fr/net/Lille/ Grid5000 Backbone Network Monitoring] page.
At present you have two pages with information about Grid5000 network monitoring:


* [http://pasillo.renater.fr/metrologie/GRID5000/ Grid5000 Network monitoring] This page gives you the information of the interfaces corresponding to the Renater nodes that are linked to a Grid5000 site. User:GRID PASS:5000.
= Power Monitoring =
* [http://www.renater.fr/Metrologie/map-FON-GRID-5000/ Grid5000 weathermap] Shows the actual state of the FON structure, with link use and historical data.
* [https://intranet.grid5000.fr/supervision/lyon/monitoring/energy/last/minute/ Lyon]
* [https://intranet.grid5000.fr/supervision/nancy/monitoring/energy/last/minute/ Nancy]
* [https://intranet.grid5000.fr/supervision/reims/monitoring/energy/last/minute/ Reims]
* [https://intranet.grid5000.fr/supervision/rennes/monitoring/energy/last/minute/ Rennes]


See the page [[Renater_G5K_project%27s_metrology]] for more info.
= Usage statistics =
[https://intranet.grid5000.fr/stats/ Site availability] over time gathers many statistics about the raw usage of the platform.


= Kaspied =
= Kaspied =
Kaspied is a statistics tool that shows who is using the platform.
Kaspied is a statistics tool that shows who is using the platform.


https://www.grid5000.fr/kaspied
https://www.grid5000.fr/kaspied/


= Ganglia =
= Ganglia =
[http://ganglia.sourceforge.net/ Ganglia] provides resource usage metrics (memory, CPU, jobs...) for individual sites or the whole grid.
[http://ganglia.sourceforge.net/ Ganglia] provides resource usage metrics (memory, CPU, jobs...) for individual sites or the whole grid.


https://helpdesk.grid5000.fr/ganglia/
https://intranet.grid5000.fr/ganglia/


= Nagios =
= Nagios =
[http://www.nagios.org/ Nagios] monitors critical grid servers and services and automatically reports incidents and failures.
[http://www.nagios.org/ Nagios] monitors critical grid servers and services and automatically reports incidents and failures.


https://helpdesk.grid5000.fr/nagios/
[https://intranet.grid5000.fr/nagios/ Grid'5000 Nagios monitoring page.]
 
= Global Grid'5000 status geographical map =
[[Image:G5K_geographical_map02-07.jpg|thumbnail|250px|right|Geographical map screenshot]]
{{Warning|text=This is still an experimental tool (i.e. unstable).}}
This tool places all Grid'5000 sites geographically, and displays their current status.
 
http://www.lri.fr/~herault/G5K/action.html

Revision as of 12:09, 23 July 2015


Current events (maintenance, issues...)

If you experience problems, please check the grid administration schedule, where past, present and future incidents (planned or not...) are notified for all sites.

Monika

Monika displays current and scheduled OAR jobs.

You can select an individual site or cluster:

Grenoble

Lille

Luxembourg

Lyon

Nancy

Nantes

Reims

Rennes

Sophia

Toulouse

Or view the global snapshot of the grid.

Drawgantt

Drawgantt displays past, current and scheduled OAR jobs.

You can select an individual site or cluster:

Grenoble

Lille

Luxembourg

Lyon

Nancy

Nantes

Reims

Rennes

Sophia

Toulouse

Or view the global grid Gantt diagram.

SVG versions of the Gantt charts are also available:

Grenoble

Lille

Luxembourg

Lyon

Nancy

Nantes

Reims

Rennes

Sophia

Toulouse
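
The per-site Monika and Drawgantt links listed in this revision follow a regular URL pattern (https://intranet.grid5000.fr/oar/<Site>/<tool>). As a minimal sketch, assuming that pattern still holds, the Python snippet below rebuilds those links. The names SITES, TOOLS and oar_tool_url are illustrative and are not part of any Grid5000 tool.

```python
# Hypothetical helper: rebuilds the per-site links using the URL pattern
# seen in this revision of the page. It does not contact any server.

BASE_URL = "https://intranet.grid5000.fr/oar"

# Sites listed on the page in this revision.
SITES = [
    "Grenoble", "Lille", "Luxembourg", "Lyon", "Nancy",
    "Nantes", "Reims", "Rennes", "Sophia", "Toulouse",
]

# Last path component for each view, as it appears in the page links.
TOOLS = {
    "monika": "monika.cgi",
    "drawgantt": "drawgantt.cgi",
    "drawgantt-svg": "drawgantt-svg/",
}


def oar_tool_url(site: str, tool: str) -> str:
    """Return the link for one site and one view (assumed URL pattern)."""
    if site not in SITES:
        raise ValueError(f"unknown site: {site}")
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return f"{BASE_URL}/{site}/{TOOLS[tool]}"


if __name__ == "__main__":
    for site in SITES:
        print(oar_tool_url(site, "monika"))
```

These pages sit behind the Grid5000 intranet, so the generated links are only useful from an authenticated session.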

Network Monitoring

Backbone network status and load

Grid5000 Weathermap (courtesy of Renater)

Shows the current state of the optical links between the Grid5000 10Gb-ready sites. A link painted in black on the weathermap means that you won't be able to access that site's nodes from the Grid5000 internal network.

Historical network load

Grid5000 Network monitoring User:GRID PASS:5000 (courtesy of Renater)

This page gives you some nice graphs built from the SNMP counters of Renater switches.

These graphs show whether one or several experiments are creating congestion on a switch interface. They are useful if you experience anomalies such as packet loss or abnormal delay. The EoMPLS graphs are outdated since Orsay and Bordeaux were migrated to Renater-5.

Sites network traffic

Grenoble Lille Lyon Luxembourg Nancy Nantes Reims Rennes Sophia Toulouse

Latency monitoring

Grid5000 Interlink Latency (please check Reading the Graphs if you are not used to Smokeping graphs).

We use ping (ICMP) probes to monitor the backbone's latency. Every 300 seconds, Smokeping forks Fping on each site to ping the adminfront of every site (including itself) 20 times. Each site pings all the others, giving us a full view of the network from each site.

Each host is pinged 20 times (similar to a ping -c 20) in order to study:

  • Packet Loss (PL) occurring on the link. Packet loss can be caused by the saturation of a link: when the buffers of a router or switch are unable to store packets, the packets are dropped. It may also be caused by faulty transmitting equipment somewhere, or by a link flapping (up/down/up/down...) because of a faulty component.
  • The variance between pings. If the first reply comes back in 10ms, the second in 50ms, then the third in 5ms... something unusual is going on (network overload or routing issues). The graph plots the median value, then draws smoke below and above that point. If the median is 20ms, the min 10ms and the max 80ms, you will have a colored point at 20ms with smoke going from 10ms to 80ms.
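
To make the two quantities above concrete, here is a minimal Python sketch of how one round of 20 pings could be summarized. It is not Smokeping's own code: the function name summarize_round and its input format (a list of round-trip times in milliseconds, with lost pings simply absent) are assumptions made for illustration.

```python
from statistics import median


def summarize_round(rtts_ms, pings_sent=20):
    """Summarize one probe round (illustrative, not Smokeping's code).

    rtts_ms: round-trip times in ms for the replies that came back.
    pings_sent: number of pings sent in the round (20 on this page).
    """
    received = len(rtts_ms)
    if received > pings_sent:
        raise ValueError("more replies than pings sent")

    # Packet loss: share of pings that never got a reply.
    loss_pct = 100.0 * (pings_sent - received) / pings_sent

    if received == 0:
        # Nothing came back: no latency figures can be computed.
        return {"loss_pct": loss_pct, "median_ms": None,
                "min_ms": None, "max_ms": None}

    # The median is the plotted point; min and max bound the "smoke".
    return {
        "loss_pct": loss_pct,
        "median_ms": median(rtts_ms),
        "min_ms": min(rtts_ms),
        "max_ms": max(rtts_ms),
    }


if __name__ == "__main__":
    # Example from the text: median 20ms, min 10ms, max 80ms, no loss.
    print(summarize_round([10.0, 80.0] + [20.0] * 18))
```

A wide gap between min_ms and max_ms corresponds to a tall column of smoke on the graph, and a non-zero loss_pct corresponds to the packet loss case described in the first bullet.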


A dashboard combining links and real-time data is also available on the Grid5000 Backbone Network Monitoring page.

Power Monitoring

  • Lyon
  • Nancy
  • Reims
  • Rennes

Usage statistics

Site availability over time gathers many statistics about the raw usage of the platform.

Kaspied

Kaspied is a statistics tool that shows who is using the platform.

https://www.grid5000.fr/kaspied/

Ganglia

Ganglia provides resource usage metrics (memory, CPU, jobs...) for individual sites or the whole grid.

https://intranet.grid5000.fr/ganglia/

Nagios

Nagios monitors critical grid servers and services and automatically reports incidents and failures.

Grid'5000 Nagios monitoring page.