Getting Started: Difference between revisions


Revision as of 12:08, 6 November 2012

Warning.png Warning

Work in progress

Grid'5000 is a scientific instrument that supports large-scale, reproducible experiments in the context of research on distributed systems (Cloud, Grid, HPC, P2P systems).

This tutorial will guide you through your first steps on Grid'5000. Before proceeding, make sure you have a Grid'5000 account (if not, follow this procedure), and an SSH client.

Getting support

The Support page describes how to get help during your Grid'5000 usage. There's also an FAQ.

Connecting for the first time and preparing your SSH environment

Step 1: Connect to Grid'5000

Terminal.png outside:
ssh login@access.grid5000.fr

You will be authenticated using the SSH public key you provided in the account creation form.

The access.grid5000.fr address points to two actual machines: access-south in Sophia and access-north in Lille. Those machines provide SSH access to Grid'5000 from the Internet.

Note.png Note

If you prefer, you might also be able to connect directly to your local Grid'5000 site, but per-site access restrictions are applied, so using access.grid5000.fr is usually a safer choice. See External_access for details about local access machines.

Grid'5000 is structured in sites (Grenoble, Rennes, Nancy, ...). Each site hosts one or more clusters. The primary way to move around Grid'5000 is using SSH. It is recommended that you use a second SSH key, created without a passphrase, and that you use it inside Grid'5000 to move around. The next steps of this tutorial will guide you through creating that SSH key, and configuring your SSH environment on all sites.

Step 2: Create a new SSH key with ssh-keygen

Terminal.png access:
ssh-keygen

Generating public/private rsa key pair.
Enter file in which to save the key (/home/login/.ssh/id_rsa): (press Enter)
Enter passphrase (empty for no passphrase): (press Enter)
Enter same passphrase again: (press Enter)
Your identification has been saved in /home/login/.ssh/id_rsa.
Your public key has been saved in /home/login/.ssh/id_rsa.pub.

You have a different home directory on each Grid'5000 site, so you will usually use Rsync or scp to move data around. Note that home directories on Grid'5000 are not backed up: it is your responsibility to save important data outside Grid'5000 (or to copy data to several Grid'5000 sites in order to increase redundancy). Also note that quotas are applied -- by default, you get about 25 GB per Grid'5000 site. If your usage of Grid'5000 requires more disk space, it is possible to request quota extensions in the account management interface, or to use other storage solutions (see Storage5k).

Todo.png Todo

update reference to Storage5k tutorial once it is written. the current page does not look like a user-friendly tutorial

On access machines, you have direct access to each of those home directories (through NFS mounts). In the next two steps of this tutorial, we will use that feature to propagate your SSH key to each site. First, we will prepare your SSH configuration for one site, then we will copy it to all other sites.

Step 3: Prepare your SSH configuration on one site

We will prepare the bordeaux site, then duplicate its configuration everywhere.

First, copy your new SSH keys to your .ssh directory in bordeaux:

Terminal.png access:
cp .ssh/id_rsa* bordeaux/.ssh/

Now, add your new SSH public key to bordeaux's authorized_keys file:

Terminal.png access:
cat .ssh/id_rsa.pub >> bordeaux/.ssh/authorized_keys

Step 4: Push your SSH configuration to all sites

We will use a shell trick to copy your SSH configuration to all sites, and to the other access machine:

Terminal.png access:
for site in $(ls); do cp bordeaux/.ssh/* $site/.ssh/; done

An error message about bordeaux is normal: for that site, cp is asked to copy each file onto itself.
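The loop above can also be written more defensively. The following is a minimal sketch, not a Grid'5000 tool: propagate_ssh is a hypothetical helper name, and it assumes that each site appears as a sub-directory of the directory given as first argument. It skips the reference site (so no error is printed for it), ignores plain files, and ignores directories that have no .ssh directory.

```shell
# Copy the SSH configuration of one site directory to every other site
# directory found under a base directory.
# Hypothetical helper for illustration, not part of Grid'5000.
propagate_ssh() {
    base=$1      # directory holding one sub-directory per site
    src_site=$2  # site whose .ssh directory is the reference
    for dir in "$base"/*/; do
        site=$(basename "$dir")
        # Skip the reference site itself, which avoids the cp error.
        [ "$site" = "$src_site" ] && continue
        # Skip directories that have no .ssh directory.
        [ -d "${dir}.ssh" ] || continue
        cp "$base/$src_site"/.ssh/* "${dir}.ssh/"
    done
}
```

On an access machine, it would be called as propagate_ssh "$HOME" bordeaux.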

Step 5: Use SSH to connect to another site

Terminal.png access:
ssh nancy

The figure below shows how you just connected from your local machine to access, and then to the site frontend in nancy. Site frontends (named fsite.site.grid5000.fr or simply site.grid5000.fr) are the machines you will use to interact with Grid'5000 tools such as OAR and Kadeploy. Those machines are virtual machines, and must not be used for CPU- or I/O-intensive tasks (use nodes instead).

Grid5000 SSH access.png

Note.png Note

If you are using Linux, Mac OS X, or another Unix-based system, it is recommended to configure your SSH client to enable shortcuts. Once done, you will be able to connect to any machine inside Grid'5000 in one shot, using ssh machine.g5k. See this page for details.
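As an illustration of such shortcuts, here is a minimal sketch of a shell function that prints a possible OpenSSH client configuration. The function name g5k_ssh_config, the g5k host alias and the exact stanza are assumptions made for this example; the page referenced above remains the authoritative source. The stanza relies on the ProxyCommand option and on the -W flag of ssh to hop through access.grid5000.fr.

```shell
# Print an OpenSSH client configuration stanza for reaching Grid'5000
# machines as "machine.g5k" through the access machine.
# Hypothetical helper: the "g5k" alias and ".g5k" suffix are local
# conventions chosen for this sketch, not real DNS names.
g5k_ssh_config() {
    login=$1  # your Grid'5000 login
    cat <<EOF
Host g5k
  User $login
  Hostname access.grid5000.fr

Host *.g5k
  User $login
  ProxyCommand ssh g5k -W "\$(basename %h .g5k):%p"
EOF
}
```

The output can be appended to your local configuration with g5k_ssh_config login >> ~/.ssh/config, after which ssh nancy.g5k should go through the access machine in one step.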

Visualization and reservation of Grid'5000 resources

At this point, you should be connected to a site frontend, as indicated by your shell prompt (login@fsite:~$). This machine will be used to reserve and manipulate resources on this site, using the OAR software suite.

Discovering and visualizing resources

There are several ways to learn about the site's resources and their status:

  • The site's MOTD (message of the day) lists all clusters and their features. Additionally, it gives the list of current or future downtimes due to maintenance, which is also available from https://www.grid5000.fr/status/.
  • Site pages on the wiki (e.g. Nancy:Home) contain a detailed description of the site's hardware and network.
  • The Status page links to the resource status on each site, with two different visualizations available: Monika (see Nancy's current status) and Gantt (see Nancy's current status).
  • The API (we'll look at it later on) provides a machine-readable description of Grid'5000 and machine-readable status information. This web UI or that one can be used to discover resources. Note that due to a bug, those interfaces do not currently work with the Chrome web browser.

Reserving resources: submitting OAR jobs

Please run

Terminal.png fnancy:
oarsub -I
Terminal.png griffon-54:
exit

With the first command, you request resources in interactive mode. Notice that with no parameters, oarsub gave you one resource for one hour. You were also connected directly to the node you reserved, with an interactive shell; when you run exit, you are disconnected and your reservation is terminated. To avoid premature termination of your jobs in case of errors, you can reserve and connect in two steps, using the job id associated with your reservation:

Terminal.png fnancy:
oarsub "sleep 10d"
Terminal.png fnancy:
oarsub -C job_id
Terminal.png griffon-25:
java -version

mpirun --version
whoami

env | grep OAR # discover the environment variables set by OAR
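When scripting the two-step approach, the job id has to be extracted from what oarsub prints. The sketch below assumes that oarsub reports the id on a line of the form OAR_JOB_ID=<number>; extract_job_id is a hypothetical helper name, not an OAR command.

```shell
# Read oarsub output on standard input and print the job id found on its
# "OAR_JOB_ID=<number>" line (assumed output format).
# Prints nothing if no such line is present.
extract_job_id() {
    sed -n 's/^OAR_JOB_ID=\([0-9][0-9]*\)$/\1/p'
}
```

It could be used as job_id=$(oarsub "sleep 10d" | extract_job_id), followed by oarsub -C "$job_id".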

Of course, you will probably want to use more than one node on a given site, and you might want them for a different duration than one hour.

Terminal.png fnancy:
oarsub -I -l nodes=2,walltime=0:30:0

By default, you can only connect to nodes in your reservation, and only using the oarsh connector to go from one node to the other. The connector supports the same options as the classical ssh, so it can be used as a replacement for software expecting ssh.

Terminal.png grillon-49:

uniq $OAR_NODEFILE # list the resources of your reservation
oarsh grillon-1 # use a node that is not in the file (will fail)
oarsh grillon-54 # use the other node of your reservation

ssh grillon-54 # will fail
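To run the same command on every node of a reservation, the node file and the connector can be combined in a loop. This is a minimal sketch under stated assumptions: run_on_nodes is a hypothetical helper, the node file is assumed to contain one node name per line (possibly repeated), and the connector is passed as an argument so that oarsh or any ssh-compatible command can be used.

```shell
# Run a command on every distinct node listed in a node file.
# Arguments: node file path, connector command (e.g. oarsh), then the
# command to run. Hypothetical helper for illustration.
run_on_nodes() {
    nodefile=$1
    connector=$2
    shift 2
    # Node names may appear several times in the file, so deduplicate.
    sort -u "$nodefile" | while read -r node; do
        # Redirect stdin so the connector does not consume the node list.
        "$connector" "$node" "$@" < /dev/null
    done
}
```

Inside a job, a typical call would be run_on_nodes "$OAR_NODEFILE" oarsh hostname.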

It is possible to avoid using oarsh for ssh with the allow_classic_ssh job type, as in

Terminal.png fnancy:
oarsub -I -l nodes=2,walltime=0:30:0 -t allow_classic_ssh
Todo.png Todo

expand: To reserve at a specific date, use -r

Using oarsub without specific options gives you access to resources configured in their production environment. You can use such an environment to run Java or MPI programs, or even to boot virtual machines with KVM, but you have no administrative privileges (root access), which you would need if your experiment requires changing the software environment in one way or another.

Get root access and create your own experimental environment with Kadeploy

Todo.png Todo

# oarsub -t deploy

  1. kadeploy3 
  2. install a package
    1. http_proxy (whitelisting)
    2. save for future usage or script installation
  3. Kaconsole and Kareboot or kapower3
  4. talk about g5k-checks and G5K's focus on experiments, explain that g5k-checks compares its data with the reference API, and thus introduce the API

Using Kadeploy, Grid'5000 enables you to install your own software environment (be it a different Debian version, another Linux distribution, or even Windows) on nodes, and get root access.

Reserve one node with the deploy job type:

Terminal.png fnancy:
oarsub -I -l nodes=1,walltime=1 -t deploy

Start a deployment of the squeeze-x64-min image on that node:

Terminal.png fnancy:
kadeploy3 -f $OAR_NODEFILE -e squeeze-x64-min -k

The -f parameter specifies a file containing the list of nodes to deploy. Alternatively, you can use -m to specify a node (such as -m graphene-42.nancy.grid5000.fr). The -k parameter asks Kadeploy to copy your SSH public key to the node's root account after deployment, so that you can connect without a password. If you don't specify it, you can still connect, but SSH will ask you for a password. The root password for all Grid'5000-provided images is grid5000.

Reference images are named debian version-architecture-type. The debian version can be lenny (Debian 5.0, released in 02/2009), squeeze (Debian 6.0, released in 02/2011), or wheezy (Debian 7.0, to be released in 2013). The architecture is x64 (in the past, 32-bit images were also provided). The type can be:

  • min: a minimalistic image with no Grid'5000-specific customizations
  • base: min + various Grid'5000 tuning for TCP buffers, open file descriptors, drivers for Infiniband and Myrinet networks, etc.
  • nfs: base + support for mounting your NFS home, and using your Grid'5000 user account on deployed nodes
  • big: nfs + packages for development, system tools, editors, and shells.
  • prod: big + integration with OAR. The squeeze version of this environment is the one used when you use nodes without deploying.
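The naming scheme can be illustrated by splitting an image name into its three parts with shell parameter expansion. This is a minimal sketch: parse_image_name is a hypothetical helper, and it assumes that none of the three parts contains a dash.

```shell
# Split a reference image name such as "squeeze-x64-min" into its
# Debian version, architecture and type.
# Hypothetical helper for illustration.
parse_image_name() {
    name=$1
    version=${name%%-*}   # text before the first dash
    rest=${name#*-}       # text after the first dash
    arch=${rest%%-*}      # text between the first and second dash
    kind=${rest#*-}       # text after the second dash
    echo "version=$version arch=$arch type=$kind"
}
```

For example, parse_image_name squeeze-x64-min prints version=squeeze arch=x64 type=min.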

Conclusions

Todo.png Todo

summarize important stuff seen in this tutorial

Todo.png Todo

introduce next tutorials