G5kss09 GRUDU TP
In a grid environment such as Grid'5000 (1), several complex tools are needed to manage resources, allocate them to jobs, or deploy system images with Kadeploy (2). Most grid software systems use command-line interfaces without any Graphical User Interface (GUI). A tool dedicated to the management of grid environments must therefore provide several functions. We can consider three main graphical interfaces for such a framework: one for resource monitoring, one for resource allocation, and one for remote usage (terminal and file access). GRUDU (3) answers the need for a unified set of tools providing the user with a complete, modular, portable, and powerful way to manage grid resources. (It should be noted that GRUDU as a standalone tool is a fork of the resource tool used in the DIET Dashboard (4), a front-end for DIET (5), a GridRPC middleware.)
Installation and Setup
To be able to work, GRUDU needs some prerequisites to be fulfilled concerning the network and the ssh configuration of your Grid'5000 account. GRUDU is written in Java (6) and can thus be executed on any platform that offers a recent version of the Java runtime (version 1.5.0 or higher). Currently, GRUDU supports only the Bourne shell, so if your Grid'5000 account uses another shell type, you need to change it. This requirement applies only to your Grid'5000 account; you can still use your preferred shell on your own machine. To allow GRUDU to access the whole platform, you need to configure your account to authorize direct access to the different sites. To do so, make sure all of the following conditions are fulfilled:
- you have your ssh key on every site: each Grid'5000 site has its own NFS filesystem, so you need to copy your ssh key (at least the public one) into the .ssh directory of each site.
- you have your public ssh key in the file $HOME/.ssh/authorized_keys. You can do this with a command such as:
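For example, on each site's frontend (a sketch assuming the usual OpenSSH file names; use id_dsa.pub instead if you generated a DSA key):

```shell
# Append your public key to the list of authorized keys, then tighten
# the permissions as ssh requires before accepting the file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```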
- the following option should be present in your $HOME/.ssh/config file:

    Host *
        StrictHostKeyChecking no
For more information about ssh access to Grid'5000, key management, or file exchange, please refer to the Grid'5000 documentation at: https://www.grid5000.fr/mediawiki/index.php/Documentation
To use GRUDU you should first download it from the dedicated web page (http://grudu.gforge.inria.fr/downloads.html). Using the installer is highly recommended, since some directories must be created on the different Grid'5000 sites for certain functionalities (Ganglia, the Kadeploy interface, etc.) to work properly.
- Once you've downloaded GRUDU you can install it.
GRUDU is provided through a single installation jar file containing the GRUDU software, the required libraries, the source files and the documentation (User Manual and JavaDoc). This installation file has been created with IzPack(7).
To launch the installer you can either double-click on the installer jar file (8), or launch the jar file from a shell terminal with the following command:
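For instance, assuming the installer file is named grudu-install.jar (the actual file name may differ depending on the release you downloaded):

```shell
# Run the IzPack installer from the directory where you saved the jar:
java -jar grudu-install.jar
```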
The installation is separated into two parts: the installation of the software itself (the jar file, the libraries and the resource files), and then its configuration (locally and remotely).
Installation of the software
The first part corresponds to the selection of the different "packages" you want to install. Five packages are available:
- The base package contains the software and the mandatory libraries (required).
- The JFTP module for GRUDU. This module corresponds to a File Transfer Protocol module. It allows you to transfer data between Grid'5000 and your local machine, but also between the Grid'5000 frontends.
- The Ganglia module for GRUDU. This module corresponds to a plugin retrieving data from Ganglia to display low-level information about all the nodes of a site or the history of these metrics for the nodes of your jobs.
- The documentation package corresponds to the User's Manual and the JavaDoc of GRUDU.
- The source code of GRUDU.
- Install the base package, the JFTP and the Ganglia module
If you have a Unix-like operating system (Linux or BSD variants) or Windows, the seventh panel will allow you to put shortcuts on your desktop and in the program group if you want to.
After having installed GRUDU, you should configure it. The configuration panel is separated into two parts; the first concerns access to Grid'5000. In this tab you have to define:
- a preferred access point (the external frontend that will be used to enter the Grid'5000 network). The combo box contains the different sites of Grid'5000.
- your user name (your GRID’5000 login)
- your ssh public key, rsa or dsa (for more information about ssh access with public/private keys, please refer to the SSH page).
Your passphrase is never stored on disk. It is kept in memory during the life of the GRUDU instance (same behaviour as ssh-agent).
- Enter the correct information to allow the installer to generate the mandatory remote directories and generate the local configuration
The second tab of the panel consists in selecting the clusters you want to enable in GRUDU (i.e. the clusters that will be considered when launching oarstat or oarnodes commands, or when reserving machines). In this tab you will also be able to define the partitions used by Kadeploy for the deployment of an image (for more information about the partitions you can specify, please refer to the sites' pages on the Grid'5000 wiki or to the message of the day displayed when you connect to a cluster).
- Once all this information is filled out, you will be able to write the configuration by clicking the "Write configuration" button. This action will create the local hierarchy of files mandatory for GRUDU (a directory called .diet containing all the files for GRUDU will be created in your home directory), and the remote hierarchy of files mandatory for GRUDU (approximately the same on the clusters of Grid'5000).
You can launch GRUDU in two different ways:
- If you chose to install the desktop shortcut, just click on it!
- Otherwise, you can launch it with the script UnixLauncher.sh (Unix-like OS) or WinLauncher.bat (Windows) at the top of your installation directory.
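If you use the launcher script, a minimal invocation from a Unix-like shell would be (the installation path below is an assumption; adjust it to where you installed GRUDU):

```shell
# Run the Unix launcher from the top of the GRUDU installation directory:
cd ~/GRUDU && sh UnixLauncher.sh
```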
GRUDU is composed of one principal frame shown in the following figure. From this frame the user will be able to:
- Log in to Grid'5000
- Monitor Grid'5000 and his/her reservations
- Open a terminal on the different sites and on the main nodes of his/her reservations on Grid'5000
- Deploy images through Kadeploy on the appropriate nodes
- Exchange files between the local machine and Grid'5000, and also synchronize files between Grid'5000 frontends
- Log out
- Display the GRUDU help
Legend of the figure:
- A Options toolbar (left-hand side).
- 1 Button used to log in to Grid'5000 (when connected to Grid'5000 you will have a button to log out).
- 2 Button used to display the reservation frame.
- 3 Button used to update the Grid'5000 tree of sites and jobs.
- 4 Button used to display the configuration frame.
- 5 Button used to display a summary of the information about Grid'5000 and your reservations.
- 6 Button used to display a terminal on the preferred access point you have defined.
- 7 Button used to display the Kadeploy frame for the deployment of images with user-defined environments.
- 8 Button used to display the JFTP module for GRUDU. This module allows the user to transfer data between the local machine and Grid'5000, but also between the Grid'5000 frontends.
- B Options toolbar (right-hand side)
- 9 JavaHelp dedicated to the Help of GRUDU.
- 10 Application settings of GRUDU.
- C Legend of the colors used for the sites, clusters and jobs.
- D Main panel where information is displayed. Information about Grid'5000, the different sites and the jobs appears here.
- E Grid'5000 sites and jobs.
- 11 Root node of the Grid'5000 sites and jobs tree. This node allows you to display information about the grid. When right-clicking on this node, you can either update the Grid'5000 view, open a shell on your preferred access point, or delete all your reservations on Grid'5000.
- 12 Node corresponding to a site. Information about the site is displayed, i.e. the occupation of the nodes and the existing reservations on this site. When right-clicking on a site node, you can either delete the reservations you have on the site or open a shell on the site frontend.
- 13 Node corresponding to a job. Information on the job is displayed in the information panel. When right-clicking on that node, you will be able to delete the corresponding reservation, update the site view, or open a shell on the main node of the corresponding job.
A tip-of-the-day frame is shown (if you want) at GRUDU startup and presents some tips for the use of GRUDU. You can enable/disable this frame in the application configuration frame (see 9).
- Now that you know the basic interface, you can connect to the grid with the connect button.
Grid resources monitoring
Grid monitoring is important for a regular user both before and after he or she reserves resources. Before submitting any job to a grid, the user should be aware of the available nodes and their states (free/already used/dead). Whenever there are not enough resources, the user should be able to know when they will become available for computation. After having successfully submitted some jobs, the user should have an interface to get information about his jobs but also about the other jobs running on the grid. Even if more information can sometimes be interesting for expert users, information at too low a level can be unusable for the regular user who only wants to perform computations on some resources for a given period of time.
- After being connected to Grid'5000, GRUDU gets the status of the different sites enabled in the configuration panel
To monitor Grid'5000, three views can be displayed:
- The first view in the following figure corresponds to the Grid'5000 view. You can see the occupancy of the grid in terms of node states (free/occupied/dead/absent/suspected/possessed by you) for each site and for the entire grid. In addition to the states of the nodes, you can also see which nodes you have reserved. A table summarizes this information. Finally, you have a table of your reservation(s) on the grid. Thanks to two buttons you can save your reservation(s) in a directory for future use (for example in the DIET Mapping Tool or with the XMLGoDIETGenerator).
- Click with the left button on the G5K node in the Tree to see the status of the grid.
- You can switch between the different possible views.
- You can also save the graph by right clicking on it.
Being able to save the graph presenting the status of the platform can help you present the use of the platform during an experiment.
- Modify the different parameters of the generated output to see the impact
- The second view in the following figure corresponds to a site view. A graph represents the number of nodes in each node state, as well as the ones corresponding to your possible reservation(s); a table presents this information in a different way. Another table presents the reservation(s) made on the site. You can also display a Gantt chart of the different reservations of the cluster to know when you will be able to reserve.
- Click with the left button on the site node you want in the Tree to see the status of the site.
- You can switch between the different possible views.
- You can also save the graph by right clicking on it.
Concerning the site view, you actually have two views: one with the status, and a second one (if you installed the Ganglia plugin) for Ganglia usage. The latter view will be explored later.
- The third view in the following figure corresponds to the job view. Here you can see the different pieces of information about the job, such as the nodes of the reservation, the job state, the walltime, etc. If you selected the Ganglia plugin at the installation step, you also have a button bar on the right-hand side of the frame that will be populated with a Ganglia history button allowing you to get a history of the low-level information of the nodes of your reservation.
- Click on an existing job in the Tree to see its characteristics (select a job where you are not the owner).
For the moment only the view of the characteristics of the job is available (as long as it is not your job).
The Ganglia module for GRUDU
Ganglia short introduction
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
Ganglia is an open-source project that grew out of the University of California, Berkeley Millennium Project which was initially funded in large part by the National Partnership for Advanced Computational Infrastructure (NPACI) and National Science Foundation RI Award EIA-9802069. NPACI is funded by the National Science Foundation and strives to advance science by creating a ubiquitous, continuous, and pervasive national computational infrastructure: the Grid. Current support comes from Planet Lab: an open platform for developing, deploying, and accessing planetary-scale services.
As Ganglia is installed on Grid'5000, GRUDU users can have access to the information provided by the software inside GRUDU. If you selected the Ganglia plugin during the installation step of GRUDU, there are two ways to use it:
- From the site information panels, where you can get instantaneous low-level information on every node of the site (computation nodes but also frontends).
- Return to a site view by clicking on a site in the tree on your left
- Click on the ganglia button on the right-hand side of the frame
- Select in the combobox the node you want to get information on
Nodes deployed with images where Ganglia is not installed and configured won't give you information about their status.
- Change the information displayed for the selected node
The information displayed is only instantaneous. You can only get a history of the node status if you reserve it and use the Ganglia view of the job.
- From the job information panels, where you can get the history of the low-level information brought to you by Ganglia. To generate the history, you first have to configure it, which means defining the period of data refreshing, the range of the chart, and the path to the Java home on the main node of your reservation for the launch of the remote jar creating the history.
Grid resources allocation
Allocation using OAR Interface
The most frequently used operation is probably resource allocation. On Grid'5000, this operation is handled by the OAR system. GRUDU provides an easy way to manipulate OAR (either OAR1 or OAR2). The resource allocation window on the following figure shows a map of France with the Grid'5000 sites and the job characteristics (time, queue, oargridsub behaviour, the script to launch). This information is presented on the first tab of the window. The second tab provides the definition of the properties for the different sites. Since some sites include more than one cluster, you have to click first on the site, and then select the number of desired nodes for each cluster (or you can specify that you do not care where they are located). When selecting resource numbers, the map displays the total number of requested resources for each site. The job characteristics are:
- Time parameters: date and reservation walltime. The starting time can be specified manually, or through a calendar for the day and boxes where you specify the hour, the minute and the second. For the walltime you have to define the number of hours and minutes of your reservation.
- Queue: default, deploy (for Kadeploy) or allow_classic_ssh (specific for OAR2 but corresponds to default for OAR1).
- OARGridSub behaviour: the user can specify whether the reservation should be done with the OARGridSub behaviour, i.e. when the user chooses to make several reservations on different sites and one of them fails, all the previously successful reservations are deleted.
- A script to launch: The user can specify a script that will be launched in order to be executed on the reserved nodes. The reservation will be stopped when the script ends.
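Under the hood these choices map onto OAR submission options. As a rough sketch (the exact flags differ slightly between OAR1 and OAR2, and the date and walltime below are placeholders):

```shell
# Reserve one node for two hours in the default queue,
# starting at the given date:
oarsub -r "2009-06-02 14:00:00" -l nodes=1,walltime=2:00:00
```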
- Realize a reservation of one node on your preferred site
The second tab of the window, presented on the following figure, allows the user to define the OAR properties that will be used for the reservation.
- You can specify a special OAR property on a site (for example the use of Myrinet, on a site that supports it or not) to see the impact on the reservation (if the property is not available, no nodes will be allocated).
After the reservation, a status frame summarizes the information about the success (or not) of your jobs submission.
- Go back to the site view and update it (by right-clicking on the site in the tree and selecting "update the cluster view"). Your job will appear in the tree.
- In the site view you will now see your reservation and the percentage of the site you have reserved. If the job does not appear in the tree on your left, update the site view as described above.
OAR GridSub Behaviour
- Try to make a reservation with nodes on different sites with the OARGridSub behaviour. If you select more nodes on a site than are available, no reservation will be made.
Ganglia plugin for a job
- If you have a running reservation, go to the job view and click the Ganglia button.
- Some parameters are needed:
- The refresh time can be set to 00:00:00 (but it will generate some load and CPU usage)
- For the range you can specify the walltime of your job
- The Java home path is mandatory (for example: /usr/lib/jvm/java-1.5.0-sun)
- After having specified the parameters you can refresh the view and select the node you want information on
Getting a list of nodes
When you have some nodes reserved you can either go to the site view or to the grid view to see your reservation(s), or directly click on the button used to display a summary of the information about Grid'5000 and your reservations (5 on the user interface image).
- Go to the site view where one of your reservations is running to see the information
If you need to keep this information, you can then save either all the reservation information, or only a selection of reservations. A directory will be created with one file per job containing the corresponding information and one file per site with the corresponding nodes.
- Realize a reservation on several sites from the OAR reservation interface
- Save your reservations to a directory
You have several ways to kill jobs:
- Directly, by right-clicking on the job and selecting "kill reservation". This works for only one job at a time.
- If you want to kill all the jobs of a site, you can right-click on the site and select "kill all reservations".
- You can also kill all your reservations on Grid'5000 by right-clicking on the G5K node and selecting "kill all reservations". A warning frame will appear asking you if you are sure.
- Kill all the reservations of a site or on the grid
Deploy your Kadeploy images
Kadeploy is a fast and scalable deployment system for cluster and grid computing. It is the reconfiguration system used on Grid'5000, allowing users to deploy their own OS on their reserved nodes.
The following figure shows the frame allowing you to deploy images on which you have rights. The left-hand side of the frame corresponds to the clusters and nodes available for deployment (i.e. reserved in the deploy queue). You can click on the checkboxes to select/deselect nodes. If you want to select/deselect all nodes, you can click on the corresponding button on the right-hand side of the frame. Then you can select the image you want to deploy from the lists on the right-hand side of the frame.
When you are done with the configuration, you can click on the deploy button. A new frame will be displayed corresponding to the log of the deployment (with both standard output and error).
- Realize a reservation on the kadeploy queue
- Refresh the site view
- Click on the Kadeploy button to get the available images and select one of them (for example the lenny one with NFS support)
- Click on the nodes' boxes that should be deployed and then click on the deploy button
- A frame with a tab per site will appear with the log of the deployment
You should be aware that there is a limitation on the number of simultaneous deployments due to the ssh library used by GRUDU.
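For reference, a deployment similar to what GRUDU triggers can also be launched by hand from a site frontend. This is only a sketch: the environment name, node name, partition, and even the exact binary name and flags depend on the Kadeploy version installed on the site.

```shell
# Deploy the chosen environment on one reserved node, on a given partition
# (all three values below are illustrative placeholders):
kadeploy -e lenny-x64-nfs -m node-1.lyon.grid5000.fr -p sda3
```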
Grid resources access
If the goal is to run experiments, the user needs a simplified access to the remote machines and to the files located on the different sites of Grid'5000. To fulfill these needs, a graphical tool should provide a remote access and a file transfer interface to put/get files to/from the grid and its different sites.
GRUDU provides both functionalities through the following features:
- Terminal Access which can be linked to different points: the grid access site, any site frontend and any reservation main node.
- File Transfer interface: allows the user to transparently transfer files from/to the grid access site, between the local machine and any of the remote sites, and also to synchronize files between two distant sites.
GRUDU provides terminal access at different levels of the grid:
- At the grid level (you will have a remote access to the preferred access site)
- Try to open a shell by right-clicking on the G5K node in the tree. You will have an access on the preferred access point
- At the site level (you will have a remote access to the frontend of the corresponding site)
- Try to open a shell on another site (different from the preferred access point) by right clicking on the selected site in the tree.
- At the job level (you will have a remote access to the first node of your reservation)
- Try to open a shell on a specific job (works only for your own jobs) by right-clicking on the corresponding node in the tree. You will be directly connected to the main node of your job.
It should be noted that the behaviour is slightly different for Kadeploy jobs, where you get an access to the frontend instead.
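Roughly, these three levels correspond to the following manual ssh hops (the login, site names, and node name below are illustrative placeholders; replace them with your own):

```shell
# Grid level: reach your preferred access point from outside Grid'5000
ssh login@access.lyon.grid5000.fr
# Site level: from inside the grid, reach another site's frontend
ssh frontend.sophia.grid5000.fr
# Job level: from the site frontend, reach the main node of your reservation
ssh node-12.sophia.grid5000.fr
```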
JFTP Module for GRUDU
JFTP is an acronym for Java File Transfer Protocol. JFTP is a graphical Java network and file transfer client. It was originally developed as a project under the GNU GPL license. You can find more information about the initial project at http://j-ftp.sourceforge.net/.
The JFTP module presents three internal frames, one for the local machine, one for GRID’5000 with one tab per site, and the last frame for the log of the module.
- Create a directory remotely on a site
- Transfer a file from your local machine to the one of the enabled sites
- Synchronize the directory to every site
- Copy the created directory from the distant site to your local machine
- Delete the distant directory
To configure the options for the rsync transfer between Grid'5000 frontends, you can click on the option menu; you will find the following frame where you can edit the rsync options:
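As an illustration of what such options mean, a typical frontend-to-frontend synchronization command looks like the following sketch (the paths and hostname are placeholders, and the option set GRUDU uses by default is an assumption):

```shell
# -a archive mode (recursion, permissions, times), -v verbose,
# -z compress during transfer, --progress shows per-file progress:
rsync -avz --progress ~/experiment/ frontend.sophia.grid5000.fr:~/experiment/
```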
- Modify the rsync parameters
- Create a new directory on one of the enabled sites
- Synchronize it
- R. Bolze, F. Cappello, E. Caron, M. Daydé, F. Desprez, E. Jeannot, Y. Jégou, S. Lanteri, J. Leduc, N. Melab, G. Mornet, R. Namyst, P. Primet, B. Quetier, O. Richard, E. Talbi, and I. Touché. Grid'5000: a large scale and highly reconfigurable experimental grid testbed. International Journal of High Performance Computing Applications , 20(4):481--494, November 2006.
- Y. Georgiou and J. Leduc and B. Videau and J. Peyrard and O. Richard. A Tool for Environment Deployment in Clusters and light Grids. Second Workshop on System Management Tools for Large-Scale Parallel Systems (SMTPS'06), Rhodes Island, Greece, April 2006.
- E. Caron, F. Desprez, and D. Loureiro. All-in-one graphical tool for the management of diet a gridrpc middleware. In CoreGRID Workshop on Grid Middleware (in conjunction with OGF'23) , Barcelona, Spain, June 2-6 2008.
- E. Caron and F. Desprez. DIET: A Scalable Toolbox to Build Network Enabled Servers on the Grid. International Journal of High Performance Computing Applications, 20(3):335--352, 2006.
- Sun, the Sun logo, Sun Microsystems, Java, and all Java-related trademarks are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.
- IzPack is an installer generator for the Java platform.
- This works on operating systems where the jar MIME type is associated with Java.