Grid'5000 EventsCurrent, Planned and Past Grid'5000 Events.https://www.grid5000.fr/status/upcoming.atom2024-03-19T14:09:10+01:00Grid'5000 Staff[NEW] #Exceptional #Maintenance at #Luxembourg from 2024-03-21@09:00 to 2024-04-05@19:00: OAR server migration testing on bullseyePatrice Ringot
<p>Reported by Patrice Ringot, assigned to Patrice Ringot. The event is expected to last for ~15 days.</p>
<p>
We are testing the migration of the OAR server to bullseye. <br/><br/>The reservation is deliberately large to be on the safe side; we may finish this operation sooner than planned.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td></td></tr>
<tr><td>Tags:</td><td>exceptional, maintenance, luxembourg</td></tr>
<tr><td>Start date:</td><td>2024-03-21 09:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-04-05 19:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2024-03-19 10:28:48 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-19 10:28:48 +0100</td></tr>
</tbody>
</table>
2024-03-19T10:28:48+01:002024-03-19T10:28:48+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15497[NEW] #Maintenance at #Nantes from 2024-03-21@10:00: to 2024-03-21@14:00 : API Proxy system upgradeJulien Lelaurain
<p>Reported by Julien Lelaurain, assigned to Julien Lelaurain. The event is expected to last for ~4 hours.</p>
<p>
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td></td></tr>
<tr><td>Tags:</td><td>maintenance, nantes</td></tr>
<tr><td>Start date:</td><td>2024-03-21 10:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-21 14:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-03-18 15:32:43 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-18 15:37:52 +0100</td></tr>
</tbody>
</table>
2024-03-18T15:32:43+01:002024-03-18T15:37:52+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15494[NEW] #Maintenance at #toulouse from 2024-03-21@10:00: to 2024-03-21@14:00 : API Proxy system upgradeJulien Lelaurain
<p>Reported by Julien Lelaurain, assigned to Julien Lelaurain. The event is expected to last for ~4 hours.</p>
<p>
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td></td></tr>
<tr><td>Tags:</td><td>maintenance, toulouse</td></tr>
<tr><td>Start date:</td><td>2024-03-21 10:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-21 14:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-03-18 15:34:09 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-18 15:37:47 +0100</td></tr>
</tbody>
</table>
2024-03-18T15:34:09+01:002024-03-18T15:37:47+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15495[NEW] #Maintenance at #Grenoble from 2024-03-21@08:00 to 2024-03-21@09:30 : API proxy unavailableNicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Nicolas Perrin. The event is expected to last for ~90 minutes.</p>
<p>
api-proxy migration with user impact.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td></td></tr>
<tr><td>Tags:</td><td>maintenance, grenoble</td></tr>
<tr><td>Start date:</td><td>2024-03-21 08:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-21 09:30:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-03-18 13:46:46 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-18 13:48:43 +0100</td></tr>
</tbody>
</table>
2024-03-18T13:46:46+01:002024-03-18T13:48:43+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15493[NEW] #Maintenance at #Rennes from 2024-03-21@08:00 to 2024-03-21@09:30 : API proxy unavailableNicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Nicolas Perrin. The event is expected to last for ~90 minutes.</p>
<p>
api-proxy migration with user impact.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td></td></tr>
<tr><td>Tags:</td><td>maintenance, rennes</td></tr>
<tr><td>Start date:</td><td>2024-03-21 08:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-21 09:30:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-03-18 13:46:38 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-18 13:48:40 +0100</td></tr>
</tbody>
</table>
2024-03-18T13:46:38+01:002024-03-18T13:48:40+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15492[RESOLVED] #Maintenance at #Luxembourg from 2024-03-15@16:00: to 2024-03-18@19:00 : API Proxy system upgradeJulien Lelaurain
<p>Reported by Julien Lelaurain, assigned to Julien Lelaurain. The event is expected to last for ~3 days.</p>
<p>
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>maintenance, luxembourg</td></tr>
<tr><td>Start date:</td><td>2024-03-15 16:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-18 19:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-03-15 16:21:28 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-15 23:02:32 +0100</td></tr>
</tbody>
</table>
2024-03-15T16:21:28+01:002024-03-15T23:02:32+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15483[RESOLVED] #Maintenance on #all_sites from 2024-03-14@09:30 to 2024-03-14@16:00: conda and mamba module unavailableLaurent Pouilloux
<p>Reported by Laurent Pouilloux, assigned to Support Staff. The event is expected to last for ~6 hours.</p>
<p>
Due to an upgrade of the standard environment that will ship a new LMOD version, usage of conda and mamba modules will be broken.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>maintenance, all_sites</td></tr>
<tr><td>Start date:</td><td>2024-03-14 09:30:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-14 16:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-03-14 09:22:07 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-14 16:34:17 +0100</td></tr>
</tbody>
</table>
2024-03-14T09:22:07+01:002024-03-14T16:34:17+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15473[RESOLVED] #Maintenance at #Lyon from 2024-03-07@8:00 to 2024-03-07@9:00 : frontend rebootLaurent Pouilloux
<p>Reported by Laurent Pouilloux, assigned to Support Staff. The event is expected to last for ~0 minutes.</p>
<p>
The frontend must be restarted.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>maintenance, lyon</td></tr>
<tr><td>Start date:</td><td>2024-03-07 00:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-03-07 00:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-02-29 11:29:13 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-07 08:09:50 +0100</td></tr>
</tbody>
</table>
2024-02-29T11:29:13+01:002024-03-07T08:09:50+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15435[ASSIGNED] #Incident at #Rennes from 2024-02-21@18:00 : electrical problem - abacus22 and abacus25 clusters unavailableJulien Lelaurain
<p>Reported by Julien Lelaurain, assigned to Nicolas Perrin. The event is expected to last for ~-28475580 minutes.</p>
<p>
The srv-data2 server is down, and the NFS volumes for storage2 and modules are unavailable.<br/>abacus22 and abacus25 are also impacted and unavailable.<br/>Since yesterday ~18:45, only abacus25 is still unavailable.<br/>Correction: only abacus22-1 is unavailable.<br/>Another electrical shutdown occurred: the abacus22 and abacus25 clusters (except for abacus25-3) are unavailable.<br/>(The srv-data2 server is alive.)<br/>This issue is still under investigation.<br/>abacus22-1 and abacus25-3 are the only nodes of the abacus22 and abacus25 clusters available for now.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td></td></tr>
<tr><td>Tags:</td><td>incident, rennes</td></tr>
<tr><td>Start date:</td><td>2024-02-21 18:00:00 +0100</td></tr>
<tr><td>End date:</td><td>1970-01-01 01:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-02-21 18:29:53 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-03-06 09:08:18 +0100</td></tr>
</tbody>
</table>
2024-02-21T18:29:53+01:002024-03-06T09:08:18+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15410[RESOLVED] #Maintenance on #all_sites from 2024-02-29@10:30 to 2024-02-29@11:30: access, VPN and intranet rebootLaurent Pouilloux
<p>Reported by Laurent Pouilloux, assigned to Support Staff. The event is expected to last for ~60 minutes.</p>
<p>
We have to reboot the access, VPN and intranet servers to apply security patches.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>maintenance, all_sites</td></tr>
<tr><td>Start date:</td><td>2024-02-29 10:30:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-02-29 11:30:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-02-29 08:25:38 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-02-29 11:26:48 +0100</td></tr>
</tbody>
</table>
2024-02-29T08:25:38+01:002024-02-29T11:26:48+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15430[RESOLVED] #Incident at Luxembourg from 2024-02-29@9:47 : Luxembourg site unreachableLaurent Pouilloux
<p>Reported by Laurent Pouilloux, assigned to Support Staff. The event is expected to last for ~9 hours.</p>
<p>
The Luxembourg site is unavailable.<br/><br/>No RENATER incident reported for the moment.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>incident</td></tr>
<tr><td>Start date:</td><td>2024-02-29 00:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-02-29 09:55:59 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-02-29 09:50:08 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-02-29 09:55:59 +0100</td></tr>
</tbody>
</table>
2024-02-29T09:50:08+01:002024-02-29T09:55:59+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15433[RESOLVED] #Exceptional #Maintenance at #Nancy from 2024-02-29@09:00 to 2024-02-29@11:00: frontend reboot, bios update on storage5.nancy.grid5000.frPatrice Ringot
<p>Reported by Patrice Ringot, assigned to Patrice Ringot. The event is expected to last for ~2 hours.</p>
<p>
We have to reboot the Nancy frontend to apply a new security patch.<br/> <br/>As usual in such a case, tmux/screen sessions and Jupyter labs hosted on the frontend will be lost.<br/><br/>During this maintenance, in order to apply a BIOS update, we will also reboot storage5.nancy.grid5000.fr, which hosts the /srv/storage/grvingt@storage5.nancy.grid5000.fr path.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>exceptional, maintenance, nancy</td></tr>
<tr><td>Start date:</td><td>2024-02-29 09:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-02-29 11:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-02-21 16:38:50 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-02-29 09:12:54 +0100</td></tr>
</tbody>
</table>
2024-02-21T16:38:50+01:002024-02-29T09:12:54+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15409[RESOLVED] #Maintenance at #grenoble from 2024-02-29@08:00 to 2024-02-29@09:00 : frontend and kavlan servers rebootNicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Nicolas Perrin. The event is expected to last for ~60 minutes.</p>
<p>
We have to reboot the frontend and kavlan servers to apply security patches.<br/>The security patches have been successfully applied.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>FIXED</td></tr>
<tr><td>Tags:</td><td>maintenance, grenoble</td></tr>
<tr><td>Start date:</td><td>2024-02-29 08:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2024-02-29 09:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2024-02-22 11:18:30 +0100</td></tr>
<tr><td>Event updated at:</td><td>2024-02-29 08:41:48 +0100</td></tr>
</tbody>
</table>
2024-02-22T11:18:30+01:002024-02-29T08:41:48+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15412[RESOLVED] #Maintenance at #Rennes 2023-11-09@08:00 to 2023-11-09@09:00 : storage2 reboot (srv-data2)Nicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Nicolas Perrin. The event is expected to last for ~60 minutes.</p>
<p>
Reboot needed to apply System Power Supply firmware update.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>maintenance, rennes</td></tr>
<tr><td>Start date:</td><td>2023-11-09 08:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2023-11-09 09:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2023-11-07 15:30:08 +0100</td></tr>
<tr><td>Event updated at:</td><td>2023-11-09 08:30:12 +0100</td></tr>
</tbody>
</table>
2023-11-07T15:30:08+01:002023-11-09T08:30:12+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=15072[RESOLVED] #Exceptional #Network #Renater #Maintenance #DMZ at #Lille from 2023-05-30@21:00 to 2023-05-31@07:00 : G5K link will be downNicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Nicolas Perrin. The event is expected to last for ~10 hours.</p>
<p>
access and api will be moved to the south during the day (DNS change) to avoid their unavailability.<br/><br/>Renater information:<br/>------------------------------------------------------------------<br/>Ticket no.: 4943702<br/>Ticket type: MAINTENANCE<br/>Ticket status: Open<br/>------------------------------------------------------------------<br/>Issuer: NOC-RENATER<br/>Affected element: liaison-ren_lille-paris1<br/>Impacted service(s): Grid5k Paris1 Lille<br/>------------------------------------------------------------------<br/>Maintenance start: 30/05/2023 22:00:44 CET/CEST<br/>Planned maintenance end: 31/05/2023 06:00:44 CET/CEST<br/>------------------------------------------------------------------<br/>Ticket opened: 26/05/2023 18:07:09 CET/CEST<br/>------------------------------------------------------------------<br/><br/>Maintenance description:<br/>Operator maintenance with an outage on liaison-ren_lille-paris1.<br/>------------------------------------------------------------------
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>exceptional, network, renater, maintenance, dmz, lille</td></tr>
<tr><td>Start date:</td><td>2023-05-30 21:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2023-05-31 07:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2023-05-30 08:38:00 +0200</td></tr>
<tr><td>Event updated at:</td><td>2023-05-30 16:20:49 +0200</td></tr>
</tbody>
</table>
2023-05-30T08:38:00+02:002023-05-30T16:20:49+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=14573[RESOLVED] #Renater #Network #Maintenance at Nantes from 2023-04-27@21:00 to 2023-04-28@07:00 : G5K link will be downNicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Support Staff. The event is expected to last for ~10 hours.</p>
<p>
Renater ticket:<br/>------------------------------------------------------------------<br/>Ticket no.: 4929222<br/>Ticket type: MAINTENANCE<br/>Ticket status: Open<br/>------------------------------------------------------------------<br/>Issuer: NOC-RENATER<br/>Affected element: liaison-ren_rennes-nantes<br/>Impacted service(s): Rennes-Nantes GRID5K and Paris1-Nantes LHCONE<br/>------------------------------------------------------------------<br/>Maintenance start: 27/04/2023 22:00:47 CET/CEST<br/>Planned maintenance end: 28/04/2023 06:00:47 CET/CEST<br/>------------------------------------------------------------------<br/>Ticket opened: 25/04/2023 11:28:32 CET/CEST<br/>------------------------------------------------------------------<br/>Renater:<br/>"The maintenance is postponed to a later date due to technical problems."
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>renater, network, maintenance</td></tr>
<tr><td>Start date:</td><td>2023-04-27 21:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2023-04-28 07:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2023-04-26 08:07:10 +0200</td></tr>
<tr><td>Event updated at:</td><td>2023-04-26 09:43:37 +0200</td></tr>
</tbody>
</table>
2023-04-26T08:07:10+02:002023-04-26T09:43:37+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=14514[RESOLVED] #Maintenance at #Grenoble from 2023-03-16@14:00 to 2023-03-16@18:00 : network maintenance in DC IMAGPatrice Ringot
<p>Reported by Patrice Ringot, assigned to Support Staff. The event is expected to last for ~4 hours.</p>
<p>
The whole site has been reserved during the maintenance as it can impact our network communications.<br/><br/>------------------------------------------------------------------<br/>Ticket number: 2039<br/>Ticket type: Maintenance<br/>Ticket status: Opened<br/>Ticket opening date: 10/03/2023 12:14 CET<br/>Maintenance start: 16/03/2023 14:00 CET<br/>Planned maintenance end: 16/03/2023 18:00 CET<br/>Estimated duration (in minutes): 240<br/>Maintenance location: IMAG - DC<br/>------------------------------------------------------------------<br/>Summary:<br/>SPRING - Replacement of leaves 313, 314 and 371<br/>------------------------------------------------------------------<br/>Service impact:<br/>Network outage for equipment connected to these leaves<br/>------------------------------------------------------------------<br/>Maintenance description:<br/>As part of a hardware refresh operation, leaves 313, 314 and 371 <br/>will be replaced on the afternoon of 16/3.<br/><br/>The following entities are affected: CIMENT, G5K, IMAG, OSUG, <br/>VERIMAG, MIASHS, braintech.<br/><br/>The Spring CT will contact the affected entities directly <br/>to indicate the impacted servers.<br/>------------------------------------------------------------------<br/>History:<br/>- Created on 10/03/2023 12:14 CET by thomasyl<br/>------------------------------------------------------------------
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>maintenance, grenoble</td></tr>
<tr><td>Start date:</td><td>2023-03-16 14:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2023-03-16 18:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2023-03-13 17:37:31 +0100</td></tr>
<tr><td>Event updated at:</td><td>2023-03-14 12:21:37 +0100</td></tr>
</tbody>
</table>
2023-03-13T17:37:31+01:002023-03-14T12:21:37+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=14437[RESOLVED] #Maintenance at #Grenoble from 2022-10-13@07:00 to 2022-10-13@17:00 : Electrical tests in server roomPatrice Ringot
<p>Reported by Patrice Ringot, assigned to Support Staff. The event is expected to last for ~10 hours.</p>
<p>
Note: we will probably release our maintenance reservation before 5pm.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>maintenance, grenoble</td></tr>
<tr><td>Start date:</td><td>2022-10-13 07:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2022-10-13 17:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2022-10-11 19:07:07 +0200</td></tr>
<tr><td>Event updated at:</td><td>2022-10-12 15:38:56 +0200</td></tr>
</tbody>
</table>
2022-10-11T19:07:07+02:002022-10-12T15:38:56+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=14167[RESOLVED] #Incident on #Rennes infiniband network failure since 2021-02-26@12:00Dimitri Delabroye
<p>Reported by Dimitri Delabroye, assigned to Pascal Morillon. The event is expected to last for ~235 days.</p>
<p>
As a result, the parapide and parapluie clusters are down.<br/><br/>Sorry for the inconvenience.<br/>The parapide and parapluie clusters are retired.<br/><br/>[WONTFIX]
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>WONTFIX</td></tr>
<tr><td>Tags:</td><td>incident, rennes</td></tr>
<tr><td>Start date:</td><td>2021-02-26 12:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2021-10-19 14:11:57 +0200</td></tr>
<tr><td>Event created at:</td><td>2021-02-26 16:09:17 +0100</td></tr>
<tr><td>Event updated at:</td><td>2021-10-19 14:11:57 +0200</td></tr>
</tbody>
</table>
2021-02-26T16:09:17+01:002021-10-19T14:11:57+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=12831[RESOLVED] #Network #Incident : possible stitching issues with IMEC since 2021-07-12Baptiste Jonglez
<p>Reported by Baptiste Jonglez, assigned to Support Staff. The event is expected to last for ~4 days.</p>
<p>
Our monitoring reports reachability issues with IMEC since July 12. This happens only when using VLAN stitching. We are investigating.<br/>Actually, the test relies on a VM on the IMEC side, whose reservation expired last weekend (the 24-hour reminder landing on a weekend is not great).<br/><br/>There are two solutions:<br/>1/ redo the setup (reserve a VM on the IMEC side and attach it to the VLAN so that it can be pinged). It would be better if I were not the one making the reservation<br/>2/ do something cleaner where we would dynamically provision a VM on the IMEC side with the scriptable version of jfed<br/><br/>(2) is clearly more elegant but requires understanding how to automate provisioning on the IMEC side<br/>(2) has two other advantages:<br/>- it does not permanently tie up one of the VMs<br/>- it also tests that we can reconfigure stitching on the IMEC side (with (1), there is no configuration change on the IMEC side during the test, so we could miss problems)<br/>Ah. I'll open a separate bug for that, then.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>network, incident</td></tr>
<tr><td>Start date:</td><td>2021-07-12 00:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2021-07-16 15:54:36 +0200</td></tr>
<tr><td>Event created at:</td><td>2021-07-13 19:24:23 +0200</td></tr>
<tr><td>Event updated at:</td><td>2021-07-16 15:54:36 +0200</td></tr>
</tbody>
</table>
2021-07-13T19:24:23+02:002021-07-16T15:54:36+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=13288[RESOLVED] #Maintenance on #DMZ South from 2020-09-10@09:00 to 2020-09-10@13:00 : Upgrade south DMZ serversDavid Loup
<p>Reported by David Loup, assigned to Support Staff. The event is expected to last for ~4 hours.</p>
<p>
The south DMZ server will be upgraded and rebooted.<br/>Grid'5000 won't be accessible through the south DMZ (access-south.grid5000.fr) but will be reachable as usual through the north DMZ (access.grid5000.fr, alias access-north.grid5000.fr).<br/>The maintenance is postponed.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>LATER</td></tr>
<tr><td>Tags:</td><td>maintenance, dmz</td></tr>
<tr><td>Start date:</td><td>2020-09-10 09:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2020-09-10 13:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2020-09-08 11:03:49 +0200</td></tr>
<tr><td>Event updated at:</td><td>2020-09-10 14:06:57 +0200</td></tr>
</tbody>
</table>
2020-09-08T11:03:49+02:002020-09-10T14:06:57+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=12185[RESOLVED] #Maintenance at #Rennes from 2020-06-04@08:15 to 2020-06-04@12:15 : systems updateNicolas Perrin
<p>Reported by Nicolas Perrin, assigned to Nicolas Perrin. The event is expected to last for ~4 hours.</p>
<p>
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>maintenance, rennes</td></tr>
<tr><td>Start date:</td><td>2020-06-04 08:15:00 +0200</td></tr>
<tr><td>End date:</td><td>2020-06-04 12:15:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2020-05-27 16:33:31 +0200</td></tr>
<tr><td>Event updated at:</td><td>2020-05-28 09:07:48 +0200</td></tr>
</tbody>
</table>
2020-05-27T16:33:31+02:002020-05-28T09:07:48+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=11900[RESOLVED] #Incident at #Nancy #Talc since 2016-06-20@08:00 : no network access to talc nodesClément Parisot
<p>Reported by Clément Parisot, assigned to Support Staff. The event is expected to last for ~1412 days.</p>
<p>
The link between Grid'5000 and Talc is saturated by a user experiment. All Talc nodes are Dead.<br/>The experiment has been terminated and the talc nodes are being brought back.<br/>Nodes are back.<br/>No talc storage has been accessible since Friday 9:30PM (approximately).
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>incident, nancy, talc</td></tr>
<tr><td>Start date:</td><td>2016-06-20 08:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2020-05-02 12:16:06 +0200</td></tr>
<tr><td>Event created at:</td><td>2016-06-20 10:48:11 +0200</td></tr>
<tr><td>Event updated at:</td><td>2020-05-02 12:16:06 +0200</td></tr>
</tbody>
</table>
2016-06-20T10:48:11+02:002020-05-02T12:16:06+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=7055[RESOLVED] Expired accounts being automatically retiredAdrien Courbet
<p>Reported by Adrien Courbet, assigned to Support Staff. The event is expected to last for ~3 hours.</p>
<p>
A bug in the User Management System is currently retiring expired accounts. We are working on fixing this; see https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=10926 and https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=10913
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td></td></tr>
<tr><td>Start date:</td><td>2019-09-30 11:00:40 +0200</td></tr>
<tr><td>End date:</td><td>2019-09-30 14:25:37 +0200</td></tr>
<tr><td>Event created at:</td><td>2019-09-30 11:00:40 +0200</td></tr>
<tr><td>Event updated at:</td><td>2019-09-30 14:25:37 +0200</td></tr>
</tbody>
</table>
2019-09-30T11:00:40+02:002019-09-30T14:25:37+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=10928[RESOLVED] #Maintenance #Nancy from 2019-08-29@10:00 to 2019-08-29@10:30: frontend rebootAdrien Courbet
<p>Reported by Adrien Courbet, assigned to Support Staff. The event is expected to last for ~30 minutes.</p>
<p>
The frontend will be rebooted.<br/><br/>All Nancy nodes have been reserved during this time period.<br/>Cancelled.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>maintenance, nancy</td></tr>
<tr><td>Start date:</td><td>2019-08-29 10:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2019-08-29 10:30:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2019-08-19 10:18:03 +0200</td></tr>
<tr><td>Event updated at:</td><td>2019-08-20 16:21:48 +0200</td></tr>
</tbody>
</table>
2019-08-19T10:18:03+02:002019-08-20T16:21:48+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=10759[RESOLVED] #Maintenance on #Intranet and #Wiki from 2019-07-25@09:00 to 2019-07-25@12:00 : Update intranet and wiki certificatesDavid Loup
<p>Reported by David Loup, assigned to Support Staff. The event is expected to last for ~3 hours.</p>
<p>
Intranet and wiki certificates will be replaced with stronger ones.<br/><br/>Intranet and wiki may be unavailable during the maintenance
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>maintenance, intranet, wiki</td></tr>
<tr><td>Start date:</td><td>2019-07-25 09:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2019-07-25 12:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2019-07-24 15:14:59 +0200</td></tr>
<tr><td>Event updated at:</td><td>2019-07-24 15:21:10 +0200</td></tr>
</tbody>
</table>
2019-07-24T15:14:59+02:002019-07-24T15:21:10+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=10712[RESOLVED] #Exceptional #Maintenance at #Grenoble from 2016-12-06@15:00 to 2016-12-06@17:00 : deploy new g5k-checks version (ram_size, bios and disk revision)Nicolas Michon
<p>Reported by Nicolas Michon, assigned to Support Staff. The event is expected to last for ~2 hours.</p>
<p>
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>INVALID</td></tr>
<tr><td>Tags:</td><td>exceptional, maintenance, grenoble</td></tr>
<tr><td>Start date:</td><td>2016-12-06 15:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2016-12-06 17:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2016-11-30 13:27:31 +0100</td></tr>
<tr><td>Event updated at:</td><td>2019-06-27 11:22:06 +0200</td></tr>
</tbody>
</table>
2016-11-30T13:27:31+01:002019-06-27T11:22:06+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=7615[RESOLVED] #Maintenance at #Lyon from 2017-06-26@09:00 to 2017-06-26@12:00 : OAR MigrationDavid Loup
<p>Reported by David Loup, assigned to Support Staff. The event is expected to last for ~3 hours.</p>
<p>
The OAR server will be migrated to stretch.<br/>It won't be possible to make reservations at Lyon during the maintenance.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>maintenance, lyon</td></tr>
<tr><td>Start date:</td><td>2017-06-26 09:00:00 +0200</td></tr>
<tr><td>End date:</td><td>2017-06-26 12:00:00 +0200</td></tr>
<tr><td>Event created at:</td><td>2018-06-27 15:56:46 +0200</td></tr>
<tr><td>Event updated at:</td><td>2018-06-27 15:58:36 +0200</td></tr>
</tbody>
</table>
2018-06-27T15:56:46+02:002018-06-27T15:58:36+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=9454[RESOLVED] #storage5k #lyon seems to be brokenDimitri Delabroye
<p>Reported by Dimitri Delabroye, assigned to Simon Delamare. The event is expected to last for ~352 days.</p>
<p>
After making a reservation that went to the running state, we get the following, and the directory is not mounted in /data. The LV is not created on the storage5k server either.<br/><br/>storage5k -a info<br/>/var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:45:in `stat': No such file or directory - statvfs (Errno::ENOENT)<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:45:in `block in myjobs'<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:31:in `each'<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:31:in `myjobs'<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/info.rb:14:in `puts_info'<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/application/storage5k.rb:180:in `oar_run_application'<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/lib/storage5k/application.rb:75:in `run'<br/> from /var/lib/gems/2.1.0/gems/storage5k-oar-1.1.6/bin/storage5k:12:in `<top (required)>'<br/> from /usr/local/bin/storage5k:23:in `load'<br/> from /usr/local/bin/storage5k:23:in `<main>'<br/>The content of /var/log/storage5k.log when I made a reservation:<br/><br/>[Tue Jun 27 10:17:14 +0200 2017] INFO -- : creation the logical volume dloup_878797 [Sucess]<br/> Logical volume "dloup_878797" created<br/>/usr/lib/ruby/gems/1.8/gems/storage5k-oar-1.1.1/lib/storage5k/external.rb:24:in `cmd': Fatal error, `/sbin/mkfs.ext4 -m 0 -E lazy_itable_init=1 -O uninit_bg /dev/G5K_VG/dloup_878797` returned 1 with 'mke2fs 1.41.12 (17-May-2010) (Storage5k::External::ExternalFailure)<br/><br/>Warning, had trouble writing out superblocks.' 
<br/> Filesystem label=<br/>OS type: Linux<br/>Block size=4096 (log=2)<br/>Fragment size=4096 (log=2)<br/>Stride=128 blocks, Stripe width=640 blocks<br/>655360 inodes, 2621440 blocks<br/>0 blocks (0.00%) reserved for the super user<br/>First data block=0<br/>Maximum filesystem blocks=2684354560<br/>80 block groups<br/>32768 blocks per group, 32768 fragments per group<br/>8192 inodes per group<br/>Superblock backups stored on blocks: <br/> 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632<br/><br/>Writing inode tables: done <br/>Creating journal (32768 blocks): done<br/>Writing superblocks and filesystem accounting information: done<br/><br/>This filesystem will be automatically checked every 38 mounts or<br/>180 days, whichever comes first. Use tune2fs -c or -i to override.<br/> from /usr/lib/ruby/gems/1.8/gems/storage5k-nfs-1.1.1/lib/storage5k/nfs-server-modules/lvmG5K.rb:69:in `mount_lvm'<br/> from /usr/lib/ruby/gems/1.8/gems/storage5k-oar-1.1.1/lib/storage5k/application/storage5k.rb:121:in `nfs_run_application'<br/> from /usr/lib/ruby/gems/1.8/gems/storage5k-oar-1.1.1/lib/storage5k/application.rb:66:in `run'<br/> from /usr/lib/ruby/gems/1.8/gems/storage5k-oar-1.1.1/bin/storage5k:11<br/> from /usr/bin/storage5k:19:in `load'<br/> from /usr/bin/storage5k:19<br/>[Tue Jun 27 10:17:16 +0200 2017] INFO -- : deletion of the logical volume dloup_878797 [Sucess]<br/> Logical volume "dloup_878797" successfully removed<br/>Fixed by tearing down and rebuilding the RAID (it had switched to read-only, although the disks seem OK)<br/><br/>/etc/init.d/nfs-kernel-server stop<br/>umount /data/*<br/>/etc/init.d/lvm stop<br/>mdadm -S /dev/md5*<br/>mdadm --create /dev/md51 --level=raid5 --raid-devices=7 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 --spare-devices=1 /dev/sdab1<br/>mdadm --create /dev/md52 --level=raid5 --raid-devices=7 /dev/sdi1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1 --spare-devices=1 /dev/sdac1<br/>mdadm 
--create /dev/md53 --level=raid5 --raid-devices=8 /dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 --spare-devices=1 /dev/sdae1<br/>mdadm --create /dev/md54 --level=raid5 --raid-devices=8 /dev/sdah1 /dev/sdai1 /dev/sdaj1 /dev/sdak1 /dev/sdal1 /dev/sdam1 /dev/sdan1 /dev/sdao1 --spare-devices=1 /dev/sdaf1<br/>mdadm --create /dev/md55 --level=raid5 --raid-devices=8 /dev/sdap1 /dev/sdaq1 /dev/sdar1 /dev/sdas1 /dev/sdat1 /dev/sdau1 /dev/sdav1 /dev/sdaw1 --spare-devices=1 /dev/sdag1<br/>mdadm --create /dev/md50 --level=raid0 --raid-devices=5 /dev/md51 /dev/md52 /dev/md53 /dev/md54 /dev/md55<br/>mdadm --detail --scan --verbose > /etc/mdadm/mdadm.conf<br/>sed -i 's/spares=\([1-9]*\) /spares=\1 spare-group=global /g' /etc/mdadm/mdadm.conf<br/>/etc/init.d/lvm start<br/>mount -a<br/>/etc/init.d/nfs-kernel-server start<br/>It is happening again:<br/><br/>pneyron@flyon:~$ storage5k -a info<br/>/var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:45:in `stat': No such file or directory - statvfs (Errno::ENOENT)<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:45:in `block in myjobs'<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:31:in `each'<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/oarstat.rb:31:in `myjobs'<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/oar-server-modules/info.rb:14:in `puts_info'<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/application/storage5k.rb:180:in `oar_run_application'<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/lib/storage5k/application.rb:75:in `run'<br/> from /var/lib/gems/2.3.0/gems/storage5k-oar-1.1.6/bin/storage5k:12:in `<top (required)>'<br/> from /usr/local/bin/storage5k:22:in `load'<br/> from /usr/local/bin/storage5k:22:in `<main>'
</p>
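<p>The <code>sed</code> substitution applied to /etc/mdadm/mdadm.conf in the report above can be illustrated with a minimal Python sketch. The function name <code>add_spare_group</code> and the sample ARRAY line are hypothetical, chosen only to resemble <code>mdadm --detail --scan --verbose</code> output; the regular expression is the one from the report.</p>

```python
import re

# Same pattern as the sed expression in the report: capture the digits after
# "spares=" (note that [1-9]* does not match the digit 0) and re-emit them
# followed by the spare-group keyword.
SPARES_PATTERN = re.compile(r"spares=([1-9]*) ")


def add_spare_group(conf_text: str, group: str = "global") -> str:
    """Append spare-group=<group> after every 'spares=N ' token (hypothetical helper)."""
    return SPARES_PATTERN.sub(rf"spares=\1 spare-group={group} ", conf_text)


# Hypothetical ARRAY line, shaped like `mdadm --detail --scan --verbose` output.
sample = "ARRAY /dev/md51 level=raid5 num-devices=7 metadata=1.2 spares=1 name=example:51\n"
print(add_spare_group(sample), end="")
```

<p>Lines without a <code>spares=</code> token, such as the RAID0 array built on top of the RAID5 arrays, are left unchanged. Because the character class is <code>[1-9]</code>, a spare count containing a zero (for example <code>spares=10</code>) would not be rewritten either.</p>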
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>storage5k, lyon</td></tr>
<tr><td>Start date:</td><td>2017-06-27 06:27:32 +0200</td></tr>
<tr><td>End date:</td><td>2018-06-14 08:32:43 +0200</td></tr>
<tr><td>Event created at:</td><td>2017-06-27 06:27:32 +0200</td></tr>
<tr><td>Event updated at:</td><td>2018-06-14 08:32:43 +0200</td></tr>
</tbody>
</table>
2017-06-27T06:27:32+02:002018-06-14T08:32:43+02:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=8277[RESOLVED] #Maintenance #API from 2018-01-11@09:00 to 2018-01-11@12:00Sébastien Philippot
<p>Reported by Sébastien Philippot, assigned to Sébastien Philippot. The event is expected to last for ~3 hours.</p>
<p>
The host needs to reboot onto an up-to-date kernel.<br/>The external API server will be rebooted in order to apply the update.<br/>A short interruption will occur.
</p>
<table border="0">
<tbody>
<tr><td>Resolution:</td><td>DUPLICATE</td></tr>
<tr><td>Tags:</td><td>maintenance, api</td></tr>
<tr><td>Start date:</td><td>2018-01-11 09:00:00 +0100</td></tr>
<tr><td>End date:</td><td>2018-01-11 12:00:00 +0100</td></tr>
<tr><td>Event created at:</td><td>2018-01-08 09:44:41 +0100</td></tr>
<tr><td>Event updated at:</td><td>2018-01-10 17:23:44 +0100</td></tr>
</tbody>
</table>
2018-01-08T09:44:41+01:002018-01-10T17:23:44+01:00https://intranet.grid5000.fr/bugzilla/show_bug.cgi?id=8892