Globus
| Warning | |
|---|---|
| This practice was designed for the old OAR1 infrastructure of Grid'5000 and is now deprecated. | |
The Globus Toolkit is an open source software toolkit used for building Grid systems and applications. It provides tools to easily share computing power, databases, and other tools securely online across the IT boundaries (these boundaries being physical or logical).
The way we plan to use the Globus Toolkit across Grid'5000 differs slightly from the way it was designed to be used: a grid installed in Grid'5000 is volatile and must be easy for users to deploy.
Hence, this practice presents the Globus mechanisms, then the adaptations and technical choices made to install it within Grid'5000. Finally, it walks through using the Globus Toolkit in practice.
Globus 4 in Grid'5000
Technical choices to integrate Globus in Grid'5000
The technical choices are as follows:
User accounts
Two distinct users are created by default for the Globus image:
- The globus account: essential for Globus to run, able to configure and launch the different services.
- The globus_user account: local account used for experiments.
It is possible to create multiple local accounts. It is better to use local accounts instead of Grid'5000 LDAP accounts for performance purposes.
These two accounts have a default password (their name).
Communicating via SSH
To allow communication between Globus nodes, an ssh key has been generated in the Globus image. The corresponding public key has been added to ~/.ssh/authorized_keys for the root and globus user accounts. This key was generated without a pass-phrase so that non-interactive connections to the nodes can be established (essential for the Globus post-install scripts).
Certification Authority
We chose to use the simple-ca package provided by the Globus Toolkit. It allows us to use a simple and fast certification authority. This certification authority delivers X.509 certificates to:
- nodes
- globus services
- users
The delivery of certificates was automated to make the Globus images easier to use. When an image is personalized, the following tasks are performed:
- for master: installation of the certification authority
- for master and slaves: automatic generation (creation, signing) of the user certificates (globus, globus_user), the machine's certificate and the Globus container's certificate (service).
This could be adapted to use an external certification authority.
Provided Environment
The environment built for Grid'5000 is an image based on Debian Sid that can be installed on nodes with kadeploy. Shell scripts are added to automate the installation and configuration process. Some install scripts are also provided so that administrators, and ordinary users too, can build a Globus image based on their own environment.
The deployment architecture is as follows:
- Master node
- contains the Certification Authority
- contains the Monitoring and Discovery System (MDS)
- has users credentials
- Slave nodes
Each node also has:
- a Globus container ready to be activated
- MDS service activated
- GRAM for local jobs submission (fork)
- RFT and GridFTP activated
- Security service activated.
Other services may be present but not properly configured.
How to deploy a Globus grid in Grid'5000
Which environment? Which files?
The base image file contains the installed Globus Toolkit and all related files. These files can be divided into 4 parts:
- preinstall script: /root/globus/config-globus-base-pre.sh. The prerequisites of the toolkit must be fulfilled. It sets all environment variables necessary for the toolkit installation and creates users.
- post-configuration script: When the environment is deployed, one must run the post installation script: /root/globus/config-globus-base-post.sh. This script gets its parameters from the params.sh script and uses the files packaged in the archive. The aim of the script is to configure and launch on all nodes the globus services.
- the master image specialization script: /root/globus/config-globus-master.sh. This script sets up the certification authority and generates the users' certificates for the current globus install.
- the slave image specialization script: /root/globus/config-globus-slave.sh. It takes as argument the master node's name and generates certificates for the node.
Another script, install.sh, can automatically launch the Globus grid configuration process. It first configures the master node, then launches the configuration of the slaves in parallel and waits for them to finish.
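The flow described for install.sh (master first, slaves in parallel, then wait) can be sketched in a few lines of shell. This is not the actual install.sh, only an illustration under stated assumptions: the first line of the machines file is taken to be the master, the scripts are reached over ssh as root, and the RUN variable is a hypothetical dry-run hook that prints commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the flow described for install.sh: master first, then the
# slaves in parallel, then wait for all of them.
# RUN is a hypothetical hook. It defaults to "echo" (dry run: commands are
# printed, not executed). Export RUN="" to really run them over ssh.
RUN="${RUN-echo}"

configure_grid() {
    machines_file=$1
    # Assumption: the first line of the file names the master node.
    master=$(head -n 1 "$machines_file")

    # Step 1: the master sets up the certification authority.
    $RUN ssh "root@$master" /root/globus/config-globus-master.sh || return 1

    # Step 2: each slave is configured in the background; the slave script
    # takes the master's name as its argument.
    for slave in $(tail -n +2 "$machines_file"); do
        $RUN ssh "root@$slave" /root/globus/config-globus-slave.sh "$master" &
    done

    # Step 3: block until every background configuration has finished.
    wait
    echo "Globus Grid is ready!"
}

# Usage (dry run): configure_grid machines
```

Note that this sketch ignores the exit status of the background jobs; a real script would probably collect and report failures per node.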
The base file is available on lille's gateway: /grid5000/images/rocks4all-globus4.tgz
Note: The params.sh file contains the values of different variables used either by globus or by other scripts. If you plan to set up your own environment, you have to update it.
Deployment and configuration
- Kadeploy: to use the Globus Toolkit (GT), you must deploy an image on the nodes you have reserved. This image is registered in an environment, which you deploy with kadeploy from the command line:
kadeploy -e <env_globus> -f $OAR_FILE_NODES -p <partition>
For example, with the globus environment at Lille:
kadeploy -e globus4-debian -f $OAR_FILE_NODES -p sda3
- To be able to use the globus image, you must have root access on the node. This can be done simply by adding your ssh key to the image in /root/.ssh/authorized_keys during post-install (system).
- Once the image is deployed on all nodes, you have to connect to the master (the one you choose) as root user.
- Then edit the list of nodes (for example, a file named machines) so that it names every machine to configure:
# cat machines
node-1.lille.grid5000.fr
node-2.lille.grid5000.fr
node-1.bordeaux.grid5000.fr
node-26.bordeaux.grid5000.fr
node-54.sophia.grid5000.fr
- Then, you just have to launch ./globus/install.sh machines, which will configure the <master> node and then the slaves. When it has finished, you will see something like:
# vi machines
# ./globus/install.sh machines
...
Globus Grid is ready!
The Globus grid is now configured and ready for use!
Using the Globus environment
First step: obtaining access
As mentioned above, the master specialization script has generated certificates for the default Globus user, globus_user. This is the user we use to access the Globus services; to do so, we rely on an authentication proxy.
We do it this way:
- initialization of the proxy on the master only:
[globus_user@globus_master_node ~]$ myproxy-init -s <master>
You will be asked for a pass-phrase. This pass-phrase is used when authenticating with myproxy-logon.
- authentication on all nodes (including the master node):
[globus_user@node ~]$ myproxy-logon -s <master>
You have plenty of options, for example to set the time limit of the proxy.
Globus 4 and the Web Services
First steps
First, we must run the Globus container, as the globus user, on every node where we plan to use it. We do it this way:
[globus@node-xx globus]$ $GLOBUS_LOCATION/bin/globus-start-container
We can send the container to the background using CONTROL+Z then bg. Another command launches the container directly in the background:
[globus@node-xx globus]$ $GLOBUS_LOCATION/sbin/globus-start-container-detached
For the master to be able to register every container, the master's container must be started first.
We now use the globus_user account (root's public key is authorized in the globus_user account).
Submitting jobs with the command line
To submit jobs with Globus 4, we use the globusrun-ws command (WS because Globus 4 uses Web Services). It is used as follows:
[globus_user@node-xx globus]$ globusrun-ws -submit -c /bin/touch touched_it
(This is to be launched on a node that runs a container)
This command launches the /bin/touch binary on the current node. To use resources on a remote machine, we address its container:
[globus_user@node-xx globus]$ globusrun-ws -F https://<node-yy>:8443/wsrf/services/ManagedJobFactoryService -submit -c /bin/touch touched_it
We can verify if the command was successful with an ls -l on target node to check that the touched_it file was really created.
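When the same command has to be sent to several containers, the factory URL can be built from the node name. The sketch below is only an illustration: factory_url is a hypothetical helper, the two node names are examples, and the loop prints the globusrun-ws commands rather than running them. The port and service path are the ones used in the command above.

```shell
#!/bin/sh
# Build the WS GRAM factory URL for a node, as passed to globusrun-ws -F.
# Port 8443 and the service path are taken from the example on this page.
factory_url() {
    printf 'https://%s:8443/wsrf/services/ManagedJobFactoryService\n' "$1"
}

# Print (rather than run) one submission command per node.
# Remove the leading "echo" to actually submit the jobs.
for node in node-1.lille.grid5000.fr node-2.lille.grid5000.fr; do
    echo globusrun-ws -F "$(factory_url "$node")" -submit -c /bin/touch touched_it
done
```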
Submitting a job via the RSL file
The syntax of an RSL file in Globus 4 contains the following information: (this is a template to show all the different options)
<job>
<executable>/bin/echo</executable>
<directory>/tmp</directory>
<argument>12</argument>
<argument>abc</argument>
<argument>this is an example_string </argument>
<environment>
<name>PI</name>
<value>3.141</value>
</environment>
<stdin>/dev/null</stdin>
<stdout>stdout</stdout>
<stderr>stderr</stderr>
<count>2</count>
</job>
Here is a brief explanation of the RSL syntax:
- <executable>: path to binary
- <directory>: directory of execution
- <argument>: an argument to the binary file
- <environment>: to set up <name> + <value> couples corresponding to environment variables.
- <stdin>: standard input redirection file
- <stdout>: standard output redirection file
- <stderr>: standard error redirection file
- <count>: number of times the command is run (echo here)
We can now build and launch a job using RSL with the following command:
[globus_user@node-xx globus]$ globusrun-ws -submit -f file.rsl
To see the output:
[globus_user@node-xx globus]$ cat /tmp/stdout
[globus_user@node-xx globus]$ cat /tmp/stderr
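A job description like the template above can also be generated from a script, which is convenient when only the executable and its arguments change. The sketch below is a minimal illustration: make_rsl is a hypothetical helper, it emits only a subset of the elements shown in the template, and it assumes that the values contain no XML special characters.

```shell
#!/bin/sh
# Print a minimal job description for "globusrun-ws -submit -f".
# usage: make_rsl <executable> <directory> [argument...]
# Assumption: values contain no XML special characters (&, <, >).
make_rsl() {
    exe=$1
    dir=$2
    shift 2
    printf '%s\n' "<job>"
    printf '%s\n' "  <executable>$exe</executable>"
    printf '%s\n' "  <directory>$dir</directory>"
    # One <argument> element per remaining command-line argument.
    for arg in "$@"; do
        printf '%s\n' "  <argument>$arg</argument>"
    done
    printf '%s\n' "  <stdout>stdout</stdout>"
    printf '%s\n' "  <stderr>stderr</stderr>"
    printf '%s\n' "</job>"
}

# Example: write a description equivalent to "/bin/echo 12 abc" run in /tmp.
make_rsl /bin/echo /tmp 12 abc > "${TMPDIR:-/tmp}/file.rsl"
```

The generated file can then be submitted with globusrun-ws -submit -f as shown above.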
Submitting a job with file transfer using an rsl file
To be able to transfer the executable file prior to the execution, one must add the <fileStageIn> instruction in the job description file. The <fileCleanUp> instruction cleans the files after the job execution.
For example, we can transfer the /bin/echo file from the source node to the user's home directory, under the name my_echo, before launching the job. It will be deleted once the job ends.
Example of a simple rsl file for file transfer:
$ cat file.rsl
<job>
<executable>my_echo</executable>
<directory>${GLOBUS_USER_HOME}</directory>
<argument>Hello</argument>
<argument>World!</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
<fileStageIn>
<transfer>
<sourceUrl>gsiftp://<source_node>.grid5000.fr:2811/bin/echo</sourceUrl>
<destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
</transfer>
</fileStageIn>
<fileCleanUp>
<deletion>
<file>file:///${GLOBUS_USER_HOME}/my_echo</file>
</deletion>
</fileCleanUp>
</job>
Beware: One must replace <source_node> with the real name of the node (ex: node-1.lille.grid5000.fr)!
We can launch the job with the following command:
[globus_user@node-xx globus]$ globusrun-ws -submit -S -f file.rsl
The -S option is really important if file staging is enabled:
If the -S option is set AND the job description file includes staging or cleanup directives AND the job description does not include stagingCredentials and transfercredentials elements, globusrun-ws will delegate authorization access to WS GRAM and RFT, and will introduce the corresponding elements in the rsl file.
To see the output:
[globus_user@node-xx globus]$ cat ${GLOBUS_USER_HOME}/stdout
[globus_user@node-xx globus]$ cat ${GLOBUS_USER_HOME}/stderr
Submitting multiple jobs via RSL
Globus can launch multiple jobs from a single file (no recurrence). This helps when launching parallel jobs. In a multiple job, you must decide what the scheduler and the job manager will be. In the following example, node A handles a multiple job that does the following:
- We launch the '/bin/echo Hello World from xxx' command three times on node A, redirecting standard output to files that will be transferred to node B at the end of the job.
- We launch '/bin/echo Hello World from xxx' twice on node B, redirecting the standard output.
rsl file:
$ cat multi.rsl
<?xml version="1.0" encoding="UTF-8"?>
<multiJob xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<factoryEndpoint>
<wsa:Address>
https://<node_A>.grid5000.fr:8443/wsrf/services/ManagedJobFactoryService
</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>Multi</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
<directory>${GLOBUS_LOCATION}</directory>
<count>1</count>
<job>
<factoryEndpoint>
<wsa:Address>https://<node_A>.grid5000.fr:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>Fork</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
<executable>/bin/echo</executable>
<argument>Hello World from <node_A>!</argument>
<stdout>${GLOBUS_USER_HOME}/stdout.p1</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr.p1</stderr>
<count>3</count>
<fileStageOut>
<transfer>
<sourceUrl>file:///${GLOBUS_USER_HOME}/stdout.p1</sourceUrl>
<destinationUrl>gsiftp://<node_B>.grid5000.fr:2811//localhome/globus_user/stdout.p1</destinationUrl>
</transfer>
</fileStageOut>
</job>
<job>
<factoryEndpoint>
<wsa:Address>https://<node_B>.grid5000.fr:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>Fork</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
<executable>/bin/echo</executable>
<argument>Hello World from <node_B>!</argument>
<stdout>${GLOBUS_USER_HOME}/stdout.p2</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr.p2</stderr>
<count>2</count>
</job>
</multiJob>
The -J option of globusrun-ws is really important when submitting multiple jobs:
If the -J option is set AND the job description does not include a jobCredential element, globusrun-ws will delegate the access authorizations to WS GRAM and will add the corresponding element to the RSL file.
[globus_user@node-xx globus]$ globusrun-ws -submit -J -S -f multi.rsl
We can now check that everything went well (on node B):
$ more stdout.p1
Hello World from <node_A>!
Hello World from <node_A>!
Hello World from <node_A>!
$ more stdout.p2
Hello World from <node_B>!
Hello World from <node_B>!
More information on the RSL syntax
File transfers with RFT
File transfers are done with RFT (Reliable File Transfer).
The syntax for transferring files is as follows:
#Binary (true) or ascii (false) transfer
true
#Blocks size
16000
#TCP buffer size
16000
#Using a tier (e.g. because of a firewall)
false
#Number of parallel connections
1
#Use of authentication
true
#Concurrency of the request
1
#Subject of the source of the transfer (null if authentication is made at the node level)
null
#Subject of the destination of the transfer (null if authentication is made at the node level)
null
#All or none: if one transfer fails all others are erased
false
#Max trial numbers
10
#Source URL
gsiftp://<source_node>:2811<path>
#Destination URL
gsiftp://<destination_node>:2811<path>
Example:
cat /tmp/rft.xfr
true
16000
16000
false
1
true
1
null
null
false
10
gsiftp://node-xx.site.grid5000.fr:2811/etc/group
gsiftp://node-xx.site.grid5000.fr:2811/tmp/rftTest_Done.tmp
One launches the transfer this way:
[globus_user@node-xx globus]$ rft -h <globus_container> -f /tmp/rft.xfr
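Because only the last two lines of the transfer file usually change, the file can be produced by a small script. The sketch below is an illustration, not part of the Globus Toolkit: make_xfr is a hypothetical helper, and the eleven option values are simply copied from the example above.

```shell
#!/bin/sh
# Print an RFT transfer description for one source/destination pair.
# usage: make_xfr <source_url> <destination_url>
# The option values are the ones from the example on this page, in the
# same order: binary, block size, TCP buffer size, tier, parallel
# connections, authentication, concurrency, source subject, destination
# subject, all-or-none, max trials.
make_xfr() {
    cat <<EOF
true
16000
16000
false
1
true
1
null
null
false
10
$1
$2
EOF
}

# Example: reproduce the transfer file shown above.
make_xfr gsiftp://node-xx.site.grid5000.fr:2811/etc/group \
         gsiftp://node-xx.site.grid5000.fr:2811/tmp/rftTest_Done.tmp \
         > "${TMPDIR:-/tmp}/rft.xfr"
```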
Testing an application deployed in the Globus Container
Here we use a counter as an example. It is deployed by default in the Globus container.
First, we create a new counter:
$GLOBUS_LOCATION/bin/counter-create -s https://<node>:8443/wsrf/services/CounterService > epr
A new "counter" resource will be created and all information on the target will be saved in the file epr. This file can be used with a lot of commands like the wsrf-* and wsn-* clients using the -e option.
Then we increment the counter several times:
$GLOBUS_LOCATION/bin/counter-add -e epr 2
You should see the evolution of the counter:
$GLOBUS_LOCATION/bin/counter-add -e epr 2
2
$GLOBUS_LOCATION/bin/counter-add -e epr 2
4
$GLOBUS_LOCATION/bin/counter-add -e epr 2
6
Visualization of the MDS
As user globus on the <master> node, launch the Tomcat server:
[globus@globus_master_node globus]$ $CATALINA_HOME/bin/startup.sh
Then we can have access to the MDS via a web browser:
http://<master>.grid5000.fr:8080/webmds/
Polling the MDS
The MDS (Monitoring and Discovery System), and more specifically its Index Service, can also be polled via the Globus API (from a client application) or from the command line. The following command queries the default Index Service at the given URI and retrieves all available information with the '/*' syntax:
$GLOBUS_LOCATION/bin/wsrf-query -s https://<master>:8443/wsrf/services/DefaultIndexService '/*'
In our architecture, the master node is the centralized MDS server (i.e. slave nodes register their information in the master node's MDS), so we query the master node. The -s option takes a URI, whereas the -e option takes an EPR file as parameter.
Some examples:
- This command uses wsrf-get-property to retrieve the 'Entry' type elements:
$GLOBUS_LOCATION/bin/wsrf-get-property -s https://<master>:8443/wsrf/services/DefaultIndexService \
{http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-ServiceGroup-1.2-draft-01.xsd}Entry
- The following example queries the MDS for free CPUs by counting the ComputingElement entries that expose a FreeCPUs attribute:
$GLOBUS_LOCATION/bin/wsrf-query -s https://<master>:8443/wsrf/services/DefaultIndexService \
"count(//*[local-name()='GLUECE']/glue:ComputingElement/glue:State/@glue:FreeCPUs)"
Using Globus in "Globus 2" mode, binary version
Submitting a job
The first thing to check is that a node accepts submissions:
globusrun -a -r <node>
You should see the following message on success:
GRAM Authentication test successful
Actually, globusrun is the base program to submit jobs on a Globus grid.
It asks the target node's scheduler to execute the command. Here, the -a option tells that we only try an authentication request, the -r option tells which resource manager is called. For more information on the globusrun options, one should try: globusrun -help
Even once authentication succeeds, the best way to be sure that submission really works is to try it. So let us submit our first Globus job:
globusrun -s -r <node> '&(executable="/bin/echo") (arguments="Hello World")'
The argument '&(executable="/bin/echo") (arguments="Hello World")' follows the rsl (Resource Specification Language) syntax. In order to make the use of globus easier, one can also use the globus-job-run utility, this way:
globus-job-run <node> <executable> <arguments>
So, we can now submit the previous job like that:
globus-job-run <node> /bin/echo Hello World
Beware, the executable file must be available on the node you choose to execute it (with the same path).
The globus-job-run command can work around this problem with the -s option (for staging): with this option, globus-job-run transfers the file before execution.
For example, you can write a simple shell script on the master node and execute it on any slave node.
Example of shell script:
#! /bin/bash
echo -n "result is "
echo "scale=4; 123+987" | /usr/bin/bc -l
echo -n "Hostname is "
echo `hostname`
You can also prepare a submission to globus-job-run in a file:
cat my_file
node-xx.site.grid5000.fr <executable> <options>
And to use it:
globus-job-run -file my_file
Finally, you can use the globus-job-submit command to launch a job in the background:
globus-job-submit <node> <executable> <options>
You will get with this command a reference to the job, and will be able to see its evolution with the following commands:
globus-job-status <reference>
globus-job-get-output <reference>
globus-job-clean <reference>
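Following a background job usually means polling its status until it reaches a final state. The loop below is a sketch under assumptions: wait_for_job, STATUS_CMD and POLL_DELAY are hypothetical names, and the state names DONE and FAILED are assumed to be what globus-job-status prints, which should be checked on your installation.

```shell
#!/bin/sh
# Poll a job until it reaches a final state.
# STATUS_CMD defaults to globus-job-status; it can be overridden so the
# loop can be tried without a grid. The state names DONE and FAILED are an
# assumption about the command's output.
STATUS_CMD="${STATUS_CMD:-globus-job-status}"
POLL_DELAY="${POLL_DELAY:-5}"

# usage: wait_for_job <reference>
# returns 0 if the job is DONE, 1 if it FAILED, 2 if the status query fails
wait_for_job() {
    ref=$1
    while :; do
        state=$($STATUS_CMD "$ref") || return 2
        case $state in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
        esac
        # Any other state: wait a little and ask again.
        sleep "$POLL_DELAY"
    done
}

# Usage: wait_for_job <reference> && globus-job-get-output <reference>
```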
File Transfers
To complete this overview of Globus in binary mode, we briefly cover file transfers. There are two ways to do it:
- use directly the globus-url-copy utility
- use globusrun with an rsl file.
Globus uses GSIFTP to transfer files. The tests will be done with the /etc/passwd file of the current node, for example.
- using globus-url-copy:
# globus-url-copy file:<file to transfer> gsiftp://<target node>/<path to the target directory>
- using globusrun:
// the file gass-put.rsl
& (executable=$(GLOBUS_LOCATION)/bin/globus-url-copy)
  (arguments=$(GLOBUSRUN_GASS_URL)<path to target file> "file:<target directory>")
  (environment=(LD_LIBRARY_PATH $(GLOBUS_LOCATION)/lib))
// Transfer launch
# globusrun -s -r <target node> -f gass-put.rsl
Exercise
With the following script (pi-test.sh):
#! /bin/bash
get_host_count() {
NB=${#1}
}
compute_pi() {
echo "scale=$1; 4*a(1)" | /usr/bin/bc -l
}
case $# in
0) get_host_count $(hostname) ; compute_pi $NB ;;
1) compute_pi $1 ;;
*) echo "usage: ./pi-test.sh [digit_number]" ;;
esac
From the master node, execute this script on 2 slave nodes, one with an argument and the other without. Redirect the standard output to files that you transfer back to the master node.
Hint: The RSL syntax can be found here: Media:RSL syntax.pdf
MPICH-G2
MPICH-G2, which can be used to execute MPI programs on a Globus grid, is installed by default in the globus image.
To compile your application, you can use the classical commands mpiCC, mpiXX, etc...
This version of MPICH only works with Globus 2, but a later revision is expected to work with Globus 4. To submit an MPI job, you must have a machine file in your working directory.
There are two different ways to deploy and execute an MPI job:
- use the mpirun command as follows:
mpirun -np <processor_number> <my_prog> <my_arguments>
- generate an rsl file, to complete or modify to use globusrun:
mpirun -dumprsl -np <processor_number> <my_prog> <my_arguments> > <my_prog>.rsl
cat <my_prog>.rsl | sed -e 's|(executable="\(.*\)")|(executable=$(GLOBUSRUN_GASS_URL)\1)|g' > test.rsl
globusrun -s -f test.rsl
Sed is used here to modify the RSL file so that the executable is transferred to every target node. One can also modify the RSL file to add directives, for example to transfer data. Here is an example RSL file:
+
( &(resourceManagerContact="node_A")
(count=1)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(LD_LIBRARY_PATH /opt/globus/globus-4.0.1/lib/))
(arguments= "my_argument")
(executable=$(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/my_prog)
(file_stage_in = ($(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/data1 data1)
($(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/data2 data2))
)
( &(resourceManagerContact="node_B")
(count=1)
(label="subjob 1")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
(LD_LIBRARY_PATH /opt/globus/globus-4.0.1/lib/))
(arguments= "my_argument")
(executable=$(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/my_prog)
(file_stage_in = ($(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/data1 data1)
($(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/data2 data2))
)
( &(resourceManagerContact="node_C")
(count=1)
(label="subjob 2")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 2)
(LD_LIBRARY_PATH /opt/globus/globus-4.0.1/lib/))
(arguments= "my_argument")
(executable=$(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/my_prog)
(file_stage_in = ($(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/data1 data1)
($(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/data2 data2))
)
Beware: Do not transfer the executable with a stage_in directive: the copied file will not have execution rights.
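The sed rewrite used in this section to produce test.rsl can be checked in isolation before running it on a real file. The sketch below applies the same expression to a single sample line; stage_executable is a hypothetical wrapper name, and the sample line is illustrative rather than real mpirun -dumprsl output.

```shell
#!/bin/sh
# Apply the page's sed expression to RSL read from standard input: the
# quoted executable path is replaced by the same path prefixed with
# $(GLOBUSRUN_GASS_URL), and the quotes are dropped.
stage_executable() {
    sed -e 's|(executable="\(.*\)")|(executable=$(GLOBUSRUN_GASS_URL)\1)|g'
}

# Illustrative input line (not real mpirun -dumprsl output).
echo '(executable="/localhome/globus_user/demo/my_prog")' | stage_executable
# prints: (executable=$(GLOBUSRUN_GASS_URL)/localhome/globus_user/demo/my_prog)
```

Because `.*` is greedy, the expression only behaves this way when the executable clause is the last quoted clause on its line; if other quoted clauses follow on the same line, the match extends to the last `")` and the result needs checking by hand.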
