FAQ: Difference between revisions

From Grid5000
(323 intermediate revisions by 38 users not shown)
{{Portal|User}}
{{Portal|Tutorial}}
__TOC__
== About this document ==
=== How to add/correct an entry to the FAQ? ===
{{Note|text=Just like any other page of this wiki, you can edit the FAQ yourself to improve it. Click one of the small "edit" links placed after each question to edit that particular question. To edit the whole page, choose the edit tab at the top of the page.}}
 
== Publications and Grid'5000 ==
=== Is there an official acknowledgement ? ===
Yes there is: you agreed to it when accepting the usage policy. As the policy might have been updated since, please refer to the [[Grid5000:UsagePolicy|latest version]]. You should use it on all publications presenting results obtained (even partially) using Grid'5000.
 
=== How to mention Grid'5000 in HAL? ===
[http://hal.inria.fr HAL] is an open archive you're invited to use. If you do so, the recommended way of mentioning Grid'5000 is to use the collaboration field of the submission form, with the '''Grid'5000''' keyword, capitalized as such.
 
== Account management ==
=== I forgot my password, how can I retrieve it ? ===
To retrieve your password, you can use [[Special:G5KChangePassword|this form]], or ask your account manager to reset it.
 
=== My account expired, how can I extend it? ===
Use the account management interface (''Manage account'' link in the sidebar).
 
=== Why doesn't my home directory contain the same files on every site? ===
Every site has its own file server, so it is the user's responsibility to synchronize personal data between home directories on the different sites. You may use the <code class="command">rsync</code> command to synchronize a remote site's home directory (be careful: with <code>--delete</code>, remote files that do not exist in the local home directory are erased, and remote files that differ are overwritten):
  <code class="command">rsync</code> -n --delete -avz <code class="file">~</code> <code class="host">frontend.</code><code class="replace">site</code><code class="host">.grid5000.fr</code>:<code class="file">~</code>


'''NB:''' the ''-n'' argument makes this a ''dry run'' only; remove it once you are sure you want the synchronization to actually happen.
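The dry-run-first habit described above can be made the default with a small wrapper. The following is only a sketch: the function name <code>build_sync_cmd</code> and its <code>apply</code> argument are hypothetical, and the function merely prints the command line instead of running it, so that it can be reviewed first. The site name ''lyon'' is used as an example.

```shell
# build_sync_cmd SITE [apply]
# Hypothetical helper: prints the rsync command that synchronizes the local
# home directory to the frontend of SITE. The command is a dry run (-n)
# unless the second argument is "apply".
build_sync_cmd() {
    site="$1"
    mode="${2:-dry-run}"
    flags="--delete -avz"
    if [ "$mode" != "apply" ]; then
        flags="-n $flags"
    fi
    # The ~ characters are printed literally; they are expanded only when
    # the printed command is later run by a shell.
    printf 'rsync %s ~ frontend.%s.grid5000.fr:~\n' "$flags" "$site"
}

build_sync_cmd lyon          # dry run
build_sync_cmd lyon apply    # real synchronization
```

Printing the command rather than executing it keeps the destructive <code>--delete</code> step under explicit user control.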


=== How to get my home mounted on deployed nodes? ===
This is completely automatic if you deploy a *-nfs or *-big image (automount).
* You can connect using your own username and should land in your home;
* If connecting as root, once connected to the node, just change directory to your home and it will be mounted:
  <code class="command">cd</code> /home/<code class="replace">username</code>
{{Note|text=The home directories of other users cannot be mounted, for security reasons.}}
 
=== How to restore a wrongly deleted file? ===
The Grid'5000 platform provides no backup facility. Be careful when deleting files, and back up your data using external backup services.
 
=== What about disk quotas ? ===
See the section about <code class="file">/home</code> on the [[Storage#.2Fhome|Storage]] page.
 
=== How do I unsubscribe from the mailing-list ? ===
 
Users' mailing-list subscription is tied to your Grid'5000 account.  You can configure your subscriptions in your account settings:
 
[[File:Grid5000-unsubscribe-mailing-list.png|thumb|How to unsubscribe from the mailing list]]
* Log in to https://api.grid5000.fr/ui/account
* Go to the "My account" tab, then click on the "Actions" button, then choose "Manage mailing lists"

Alternatively, you can configure Sympa to stop sending you any email from the list (while remaining subscribed):

* If you haven't done it before, ask for a password on ''sympa.inria.fr'' from this form: https://sympa.inria.fr/sympa/firstpasswd/. '''Use the email address you used to register to Grid'5000'''.
* Connect to https://sympa.inria.fr using the email address you used to register to Grid'5000 and your ''sympa.inria.fr'' password.
* From the left panel, select ''users_grid5000''. Then go to your subscriber options (''Options d'abonné'') and in the ''reception'' field (''Mode de réception''), select ''suspend'' (''interrompre la réception des messages'').
 
== Network access to/from Grid'5000 ==
=== How can I connect to Grid'5000 ? ===
This is documented at length in the [[Getting Started]] tutorial.
 
You should be able to access Grid'5000 from anywhere on the Internet, by connecting to <code class="host">access.grid5000.fr</code> using SSH. You'll need properly configured SSH keys (please refer to [[SSH#SSH_Key_usage| the page dedicated to SSH]] if you are not familiar with them), as this machine will not allow you to log in using a password.

Some sites have an <code class="host">access.</code><code class="replace">site</code><code class="host">.grid5000.fr</code> machine, which is only reachable from IP addresses of the local laboratory (replace <code class="replace">site</code> with the actual site name).
 
=== How to connect from different workstations with the same account? ===
You can associate several public SSH keys with your account. To do so:
* log in,
* go to [https://api.grid5000.fr/ui/account User Portal > Manage Account],
* select the ''My account'' top tab,
* select the ''SSH keys'' left tab,
* then, manage your keys:
** add a new public SSH key;
** remove an old one.
 
More information in the [[SSH]] page and the [[Public key authentication]] page.
 
=== How to directly connect by SSH to any machine within Grid'5000 from my workstation? ===
This tip consists of customizing the SSH configuration file <code class="file">~/.ssh/config</code> (compatible with the OpenSSH client):
 
Host *.g5k
    User <code class="replace">login</code>
    ProxyCommand <code class="command">ssh</code> <code class="replace">login</code>@access.grid5000.fr -W "$(basename %h .g5k):%p"
 
You can then connect to any machine using <code class="command">ssh</code> <code class="replace">machine.site</code><code>.g5k</code>
 
Please have a look at the '''[[SSH]]''' page for a deeper understanding and more information.
 
Users of ''PowerShell'' on ''Microsoft Windows'', which also ships an OpenSSH client, should adapt the configuration, as the <code class="command">basename</code> command may not be available.
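To see what the <code>$(basename %h .g5k)</code> part of the configuration does, the suffix removal can be tried on its own in a POSIX shell. This is only an illustrative sketch: the function names and the host name <code>node-1.lyon.g5k</code> are hypothetical, and the second variant simply shows that the same result can be obtained with shell parameter expansion, without calling <code class="command">basename</code>.

```shell
# strip_g5k HOST: removes the trailing .g5k, as $(basename %h .g5k) does in
# the ProxyCommand (ssh replaces %h with the host name given on the command
# line before the ProxyCommand is run).
strip_g5k() {
    basename "$1" .g5k
}

# Same result using POSIX parameter expansion only.
strip_g5k_noexec() {
    printf '%s\n' "${1%.g5k}"
}

strip_g5k node-1.lyon.g5k           # hypothetical host name
strip_g5k_noexec node-1.lyon.g5k
```

Both calls print the host name without its <code>.g5k</code> suffix, which is the name passed to <code>-W</code> on the access machine.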
 
{{Note|text=The Grid'5000 internal network uses private IPv4 addresses, which are not directly reachable from outside Grid'5000.}}
 
=== Is access to the Internet possible from nodes? ===
Full access to the Internet is allowed from the Grid'5000 network.
 
All IPv4 communication is NATed, while with [[IPv6]] each node uses its own public IPv6 address.
 
{{Warning|text=For security reasons, all connections are logged.}}
 
=== What is the source address of outgoing traffic from Grid'5000 nodes to the Internet? ===
The outgoing IPv4 traffic from Grid'5000 nodes to the Internet is NATed. The public IPv4 addresses used as sources for the NATed packets are:
    194.254.60.35 (nr-lil-536.grid5000.fr)
    194.254.60.13 (nr-sop-535.grid5000.fr)
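These addresses are typically what a remote service sees as the origin of a connection coming from a node. As a minimal sketch, a script on the remote side could recognize them as follows; the variable <code>G5K_NAT_IPS</code> and the function <code>is_g5k_nat_ip</code> are hypothetical names, and the list is copied from the addresses above, so it has to be updated if they change.

```shell
# Public IPv4 source addresses of NATed Grid'5000 traffic, as listed above.
G5K_NAT_IPS="194.254.60.35 194.254.60.13"

# is_g5k_nat_ip ADDRESS: succeeds if ADDRESS is one of the NAT addresses.
is_g5k_nat_ip() {
    for ip in $G5K_NAT_IPS; do
        if [ "$1" = "$ip" ]; then
            return 0
        fi
    done
    return 1
}

if is_g5k_nat_ip 194.254.60.35; then
    echo "connection comes from Grid'5000"
fi
```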
 
=== How can I connect to an HTTP or HTTPS service running on a node? ===
 
See the [[HTTP/HTTPs_access]] page.
 
=== How can I share files from Grid'5000 using HTTP? ===
See the [[HTTP/HTTPs_access]] page.
 
=== Could I access Grid'5000 nodes directly from the internet? ===
[[SSH#Easing_SSH_connections_from_the_outside_to_Grid.275000|SSH]] and [[HTTP/HTTPs_access|HTTP/HTTPs]] have their own lighter, dedicated solutions. For other protocols, see the [[VPN]] and [[Reconfigurable_Firewall]] pages.
 
=== SSH related questions ===
See the [[SSH]] page.
 
== Software installation issues ==
=== What is the general philosophy ? ===
This is how things should work: a basic set of software is installed on the frontends and in the nodes' standard environment of each site. If you need other software packages on nodes, you can create a Kadeploy image including them, and deploy it. You can also use sudo-g5k. If you think some software should be installed by default, you can contact the [[Support|support-staff]].
 
== Deployment related issues ==
See [[Advanced_Kadeploy#FAQ]].
 
== About resources reservations (jobs) ==
 
=== How can I check whether my reservations are respecting the Grid'5000 Usage Policy? ===
You can use the script <code>usagepolicycheck</code>, present on all frontends. To see whether your current reservations respect the Policy, run <code>usagepolicycheck -t</code>; use <code>usagepolicycheck -h</code> to see the other options.

To help respect the usage policy, it is possible to use the <code>day</code> and <code>night</code> OAR job types to fit batch jobs inside day vs. night / week-end time frames. More details are available in the [[Advanced OAR#Restricting_jobs_to_daytime_or_night.2Fweek-end_time|Advanced OAR]] guide.
 
=== How can I reserve resources purchased by my team for a longer duration (e.g. 1 month)? ===
 
If your team purchased specific computing resources (you already have 'p1' access to them), and you need a reservation that is longer than 1 week, you must email the Abaca/SLICES-FR Technical Team <support-staff@lists.grid5000.fr>, with your team leader in Cc, and the following information:
 
  Subject: Long job execution request
  * site:
  * cluster:
  * number of nodes:
  * date/time of the reservation:
  * duration of the reservation:
  * short explanation of the need for a long job:
 
== How can I execute a campaign of tasks within previously reserved resources? (or smaller job in a bigger job) ==
This can be done either with OAR's ''container'' jobs, or with '''[[GNU Parallel]]''':
* If all jobs, container and inner, belong to the same user, using '''[[GNU Parallel]]''' should be '''preferred'''.
* Container jobs are mostly relevant for tutorials or teaching labs, where jobs are created by a set of '''different users'''. More information in [[Advanced_OAR#Container_jobs]].
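For the single-user case, a campaign usually boils down to a list of independent command lines handed to GNU Parallel inside the reserved job. The sketch below only generates such a list: the <code>./simulate</code> program, its <code>--seed</code> option and the result file names are hypothetical placeholders. The GNU Parallel invocation is left as a comment because its exact options (number of simultaneous jobs, remote nodes) depend on the reservation; see the [[GNU Parallel]] page for those.

```shell
# make_tasks: prints one command line per task of the campaign.
# ./simulate and its --seed option are hypothetical placeholders.
make_tasks() {
    for seed in 1 2 3 4; do
        printf './simulate --seed %s > result-%s.txt\n' "$seed" "$seed"
    done
}

make_tasks

# Inside the reserved job, the list could then be executed with, e.g.:
#   make_tasks | parallel --jobs 2
# (GNU Parallel runs each input line as a command when no command is given.)
```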
 
== About checkpoint/restart support of job ==
 
The Grid'5000 OAR service setup does not provide a seamless checkpoint/restart mechanism for jobs. While this is obviously a much-wanted feature, especially for long-running tasks that have to be split in order to fit within the platform usage policy, we think it is better to let users take care of it. Indeed, while some techniques exist, such as [https://criu.org/ CRIU], none seems satisfactory enough for a sustainable deployment in Grid'5000.
 
Note that OAR provides a [[Advanced_OAR#Using_the_checkpointing_trigger_mechanism|mechanism]] to trigger an application to checkpoint itself, and to get a checkpointed job resubmitted.
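On the application side, this means the job script must be able to save its state when asked to, and to resume from that state when run again. The following is a minimal sketch of that pattern, not the platform's reference implementation: the function <code>run_job</code>, the state file and the step counter are hypothetical, and the use of <code>SIGUSR2</code> as the checkpoint signal is an assumption to be checked against the Advanced OAR page.

```shell
# run_job STATE_FILE TOTAL_STEPS
# Resumes from the step recorded in STATE_FILE (if any). The current step is
# written back to STATE_FILE when the checkpoint signal arrives, or when all
# steps are done.
run_job() {
    state_file="$1"
    total="$2"
    step=0
    if [ -s "$state_file" ]; then
        step=$(cat "$state_file")
    fi
    # Assumption: the checkpoint signal is SIGUSR2.
    trap 'echo "$step" > "$state_file"; exit 0' USR2
    while [ "$step" -lt "$total" ]; do
        # ... one unit of real work would go here ...
        step=$((step + 1))
    done
    echo "$step" > "$state_file"
}

state=$(mktemp)           # hypothetical checkpoint file
echo 3 > "$state"         # pretend a previous run stopped after step 3
( run_job "$state" 5 )    # resumes after step 3 and runs up to step 5
cat "$state"
```

Running the job in a subshell keeps the <code>trap</code> and its <code>exit</code> from affecting the calling shell.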
 
== Continuous Integration (CI) jobs ==
 
Running CI tasks on Grid'5000 is allowed, but special precautions must be taken:
* Inform the [[Support|support staff]] that you plan to use Grid'5000 for CI
* Use a dedicated user account (not your personal user account) that reflects your project's name, and make sure that the ''Professional status/Employee type'' is set to ''bot''. This is important to allow differentiating between your own personal usage, and usage potentially generated by others through CI (however, remember that you remain responsible for the usage made by your project's bot account). It also allows the testbed operators to track usage generated by CI (for statistics).
** If you need to share data between your personal user account and your bot account, you can use a [[Group Storage]].
* If you use GitHub, [https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks configure GitHub Actions to require approval before running workflows from external collaborators].
 
Orchestrating such tasks can be done using the [[API|Grid'5000 REST API]], together with client libraries described on [[Grid5000:Software|Software]] and [[Experiment_scripting_tutorial]].
 
Several schemes are possible to run such tasks from GitLab (and manage credentials):
* Use an existing GitLab runner (such as GitLab's shared ones), store credentials (Grid'5000 user account and password) in GitLab secrets, and create a job that will reserve resources as needed (typically using the Grid'5000 API). See for example [https://gitlab.inria.fr/discovery/enoslib/-/blob/main/.gitlab-ci.yml?ref_type=heads#L45 test_invivo_g5k* in EnOSLib's .gitlab-ci.yml]
* Run your own GitLab runner on a Grid'5000 frontend, as documented in the [https://gitlab.inria.fr/gitlabci_gallery/orchestration/supercomputer-oar GitLab CI gallery]. In that case, you do not need to store your Grid'5000 user account and password in your home directory (because users are automatically identified when using the API from frontends). However, you will need to store the GitLab runner token in your home directory, which might be a security issue (homes are not suitable for storing sensitive data).
* Use a [[Persistent Virtual Machine]] to host your GitLab runner service. All credentials (Grid'5000's and runner's) are stored in the virtual machine.
 
== Maintenance on Grid'5000 ==
 
A maintenance slot is planned every Thursday on Grid'5000.

If a maintenance operation may impact users' jobs, it is announced on the users@lists.grid5000.fr mailing list.

When a maintenance operation is announced, you can follow its progress on ''[https://www.grid5000.fr/status/ the platform's operation schedule]''.
 
== How to use MPI in Grid5000? ==
 
See [[Run_MPI_On_Grid'5000|The MPI Tutorial]].


== How to get a site list of nodes ? ==
The chosen naming convention requires nodes to have a name of the form <code class="host">node-''xx''.''site''.grid5000.fr</code>, so the list of nodes of a given site can be extracted from the DNS. The definition of a shell function (using <code>bash</code>) is given as an example:
 function nodelist {
     # Zone transfer for the site; keep the node-NN entries and print their name (first field)
     dig "${1}.grid5000.fr" axfr | grep -E '^node-[0-9]+\.' | awk '{print $1}'
 }
To get the list of nodes in lyon, simply run in the shell: <code class="command">nodelist lyon</code>

== Is there a way to get the SSH keys of all the machines? ==
To avoid answering 'yes' when connecting over ssh to new machines, the <code class="file">~/.ssh/known_hosts</code> file can be generated automatically for a ''site''; its content is given by the command:
 nodelist ''site'' | ssh-keyscan -t dsa,rsa -f -

== Is it possible to avoid validating the SSH keys of the machines? ==
The <code>StrictHostKeyChecking</code> option can be set to <code>no</code>, either in the configuration file <code class="file">~/.ssh/config</code> (i.e. <code>StrictHostKeyChecking no</code>), or passed as an argument to <code class="command">ssh</code> (i.e. <code>-o StrictHostKeyChecking=no</code>).

== How to quickly test that a set of machines works? ==
If the site has installed the <code class="command">nmap</code> or <code class="command">fping</code> command, the state of the nodes (deduced from ICMP requests) can be tested with either of them, for example on the list of nodes of a site:
 nodelist ''site'' | nmap -iL - -sP

 nodelist ''site'' | fping -a 2> /dev/null

== How to kill all my processes? ==
* On the machine you are currently connected to (beware, this also disconnects you):
 kill -KILL -1
* From the frontend, for example, for all the nodes of a site:
 for node in $(nodelist ''site'') ; do
   ssh -o StrictHostKeyChecking=no "$node" kill -KILL -1
 done

== How to connect directly to the nodes from my workstation? ==
The trick is to chain <code class="command">ssh</code> commands and to automate this using the configuration file <code class="file">~/.ssh/config</code>.
The example given here requires the <code class="command">nc</code> or <code class="command">tcpconnect</code> command, which couples a network connection with ''stdin''/''stdout'':
 Host frontale.*.grid5000.fr
 Host *.grid5000.fr
     User ''login''
     ProxyCommand ssh ''login''@''frontal'' "nc %h %p"

'''Note''': machine names in <code class="host">*.grid5000.fr</code> are normally not known outside Grid5000.

'''Warning''': the <code class="command">nc</code> and <code class="command">tcpconnect</code> commands are not necessarily available on the frontends.

== How to access the Internet from the nodes ? ==
For security reasons, '''it is not possible to connect to the Internet''' from inside Grid5000.
However, as long as ''port forwarding'' via <code class="command">ssh</code> has not been disabled, the following command can be used: when connecting to port ''port_g5k'' of machine ''host_g5k'', everything behaves as if the connection had been made to port ''port'' of machine ''host'' (to be combined with [[#How to connect directly to the nodes from my workstation?|direct access to the machines]]).
 ssh -R ''port_g5k'':''host'':''port'' ''host_g5k''.''site''.grid5000.fr
'''Warning''': if this is a genuine need, you may want to mention it to your local administrator before they notice and decide to take your workstation apart.

'''Note''': this method can be combined with a web proxy such as [http://www.squid-cache.org/ squid] or [http://httpd.apache.org/ Apache] in order to access web servers on the Internet.

== How to share data with other users in Grid5000? ==
See [[Storage]].

== How do I access other scientific infrastructures from Grid'5000 ? ==
=== Jean Zay supercomputer (and possibly other GENCI supercomputers) ===
If you have an account on the Jean Zay supercomputer operated by the Institute for Development and Resources in Intensive Scientific Computing (IDRIS), it is possible to connect directly to it using ssh/scp/sftp from Grid'5000 frontends or reserved nodes.

For this to be effective, you must add the Grid'5000 SSH outgoing IP addresses to the list of the IP addresses bound to your Jean Zay account.

These addresses are:
* 194.254.60.35 (nr-lil-536.grid5000.fr)
* 194.254.60.13 (nr-sop-535.grid5000.fr)

The procedure is the following:
* First download from the IDRIS website the form required to manage your account
** English: http://www.idris.fr/media/eng/forms/fgc-eng.pdf
** French: http://www.idris.fr/media/data/formulaires/fgc.pdf
* Then fill in the required sections:
** "Add, modify or delete machines"
*** add the IP addresses and names of the two machines cited above
*** both you and your associated security manager must sign this part of the form
** "Complete this box only if the machines are under the responsibility of an organisation or a department which is not the demander’s organisation"
*** Organisation hosting the machines: '''GIS Grid'5000'''
*** Laboratory unit number (if CNRS) or acronym: '''Grid'5000'''
*** Address: '''https://www.grid5000.fr'''
*** Telephone: leave this field blank
*** Last name, first name and qualification/function of the site manager: '''Guillaume Schreiner, Technical Director'''
*** Professional e-mail address: '''support-staff@lists.grid5000.fr'''
*** Telephone: leave this field blank
* Send us your request by mail at '''support-staff@lists.grid5000.fr''':
** Subject: ''Request to connect to Jean Zay supercomputer from Grid'5000''
** Attached: the above form filled and signed (PDF)
** Body of the mail (example): ''Hello, could you please sign the attached form? I need it to access Jean Zay from Grid'5000. Best regards. YOU.''
* We will send you back the form with our signature, and you will then have to send it to '''gestutil@idris.fr''' (it takes roughly a day for the change to become effective)

Latest revision as of 14:21, 18 December 2024
