Status: Difference between revisions
Revision as of 10:10, 5 September 2018
Current platform events (maintenance, outages, issues...)
If you experience problems, please check the platform's operation schedule, which lists past, current and upcoming incidents (planned or unplanned) for all sites.
Monika
Monika displays current and scheduled OAR jobs.
You can select an individual site or cluster:
Drawgantt
Drawgantt displays past, current and future OAR jobs.
Default view:
Forecast over 1 week:
Or view the global Grid'5000 Gantt diagram.
Network Monitoring
Backbone network status and load
Grid'5000 Weathermap (courtesy of Renater)
Shows the current state of the optical links between the Grid'5000 10Gb-ready sites. A link painted in black on the weathermap means that you won't be able to reach that site's nodes from the Grid'5000 internal network.
Sites network traffic
Grenoble Lille Lyon Luxembourg Nancy Nantes Rennes Sophia
Latency monitoring
Grid'5000 Interlink Latency (please check Reading the Graphs if you are not used to Smokeping graphs).
We use ICMP ping probes to monitor the backbone's latency. On each site, Smokeping forks Fping every 300 seconds to ping the adminfront of every site (including its own) 20 times. Because each site pings all the others, we get a full view of the network from each site.
Each host is pinged 20 times (similar to ping -c 20) in order to measure:
- Packet Loss (PL) occurring on the link. Packet loss can be caused by the saturation of a link: when a router's or switch's buffers cannot store incoming packets, the packets are dropped. It may also be caused by faulty transmission equipment somewhere, or by a flapping link (up/down/up/down...) due to a faulty component.
- The variance between pings. If the first reply comes back in 10ms, the second in 50ms, the third in 5ms... something is wrong (network overload or routing issues). The graphs plot the median value, then draw smoke below and above that point. If the median is 20ms, the min 10ms and the max 80ms, you will see a colored point at 20ms with smoke extending from 10ms to 80ms.
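As a rough illustration of the two quantities above, the following Python sketch summarizes one round of 20 pings into the values a graph point is built from: packet loss, median, minimum and maximum. This is not Smokeping's own code; the function name summarize_probe and the sample round-trip times are hypothetical, and a lost ping is assumed to be represented by None.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_probe(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one probe round; None marks a ping that got no reply."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    lost = sent - len(replies)
    summary = {
        "sent": sent,
        # Packet loss as a percentage of the pings sent in this round.
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "median_ms": None,
        "min_ms": None,
        "max_ms": None,
    }
    if replies:
        # The median is the plotted point; min and max bound the "smoke".
        summary["median_ms"] = median(replies)
        summary["min_ms"] = min(replies)
        summary["max_ms"] = max(replies)
    return summary


# Hypothetical round of 20 pings: 18 replies and 2 lost packets.
sample = [10.0, 20.0, 80.0] + [20.0] * 15 + [None, None]
print(summarize_probe(sample))
```

With this sample, the point would sit at 20ms with smoke from 10ms to 80ms, and the round would show 10% packet loss.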
A dashboard combining links and real-time data is also available on the Grid'5000 Backbone Network Monitoring page.
Power Monitoring
Usage statistics
Stats5k gathers a lot of statistics about the testbed.
Ganglia
Ganglia provides resource usage metrics (memory, CPU, jobs...) for individual sites or the whole platform.