G5kss09 DIET TP
Introductory Hands On
Introduction
This is a basic tutorial introducing the Diet (Distributed Interactive Engineering Toolbox) middleware on Grid'5000.
A KaDeploy image named DIET_tutorial has been recorded. It contains all the applications and libraries required to run Diet.
Throughout this tutorial, you will need the Diet User's Manual [1]. A PDF version is available in the KaDeploy image under /opt/UserManual.pdf, and an online version is available at http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3.
Diet Middleware
The Distributed Interactive Engineering Toolbox (Diet)[2][3] project is focused on the development of a scalable middleware with initial efforts focused on distributing the scheduling problem across multiple agents. This middleware is able to find an appropriate server according to the information given in the client's request (e.g., problem to be solved, size of the data involved), the performance of the target platform (e.g., server load, available memory, communication performance) and the local availability of data stored during previous computations.
The Diet component architecture is structured hierarchically for improved scalability. Such an architecture is flexible and can be adapted to diverse environments, including arbitrary heterogeneous computing platforms. The Diet toolkit is implemented in CORBA and thus benefits from the many standardized and stable services provided by the various freely-available CORBA implementations. CORBA systems provide a remote method invocation facility with a high level of transparency.
The Diet framework is composed of several components:
- A Client is an application that uses the Diet infrastructure to solve problems using an RPC approach. Clients access Diet via various interfaces: web portals, Problem Solving Environments (PSEs) such as Scilab, or programmatically using published C or C++ APIs.
- A SeD, or server daemon, acts as the service provider, exporting functionalities via a standardized computational service interface; a single SeD can offer any number of computational services. A SeD can also serve as the interface and execution mechanism for either a stand-alone interactive machine or a parallel supercomputer, by interfacing with its batch scheduling facility.
- Agents are the components that facilitate the service location and invocation interactions of clients and SeDs.
Collectively, a hierarchy of agents provides higher-level services such as scheduling and data management. These services are made scalable by distributing them across a hierarchy of agents composed of a single Master Agent (MA) and several Local Agents (LA). The figure shows an example of a Diet hierarchy.
The Master Agent of a Diet hierarchy serves as the distinguished entry point from which the services contained within the hierarchy may be logically accessed. Clients identify the Diet hierarchy using a standard CORBA naming service. Clients submit requests - composed of the name of the specific computational service they require and the necessary arguments for that service - to the MA. The MA then forwards the request to its children, who subsequently forward the request to their children, such that the request is eventually received by all SeDs in the hierarchy. SeDs then evaluate their own capacity to perform the requested service; capacity can be measured in a variety of ways including an application-specific performance prediction, general server load, or local availability of datasets specifically needed by the application. SeDs forward this capacity information back up the agent hierarchy. Based on the capacities of individual SeDs to serve the request at hand, agents at each level of the hierarchy reduce the set of server responses to a manageable list of server choices with the greatest potential. The server choice can be made specific to any kind of application using plug-in schedulers at each level of the hierarchy.
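The reduction step performed by each agent can be pictured with a small stand-alone sketch. It is not Diet code: the `sed_estimate_t` type, the "estimated time" metric and the `reduce_answers` function are assumptions made for illustration, whereas a real hierarchy relies on the plug-in schedulers mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical SeD answer: an identifier and an estimated completion
 * time in seconds (lower is better). The metric is an assumption. */
typedef struct {
    int sed_id;
    double est_time;
} sed_estimate_t;

/* qsort comparator: ascending estimated time. */
static int by_est_time(const void *a, const void *b)
{
    const sed_estimate_t *x = a, *y = b;
    return (x->est_time > y->est_time) - (x->est_time < y->est_time);
}

/* Sorts the n answers gathered from an agent's children in place and
 * returns how many of them (at most max_keep) are forwarded upwards.
 * The forwarded answers are the first ones in the sorted array. */
size_t reduce_answers(sed_estimate_t *answers, size_t n, size_t max_keep)
{
    assert(answers != NULL || n == 0);
    if (n == 0)
        return 0;
    qsort(answers, n, sizeof *answers, by_est_time);
    return n < max_keep ? n : max_keep;
}
```

Applying the same function at every level keeps the list received by the Master Agent short, whatever the number of SeDs below it.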
Exercise 1 - Installing and compiling Diet
You first need to reserve a few nodes on which you will be able to deploy a Diet hierarchy. To simplify the installation process, a KaDeploy image containing all required software and libraries (omniORB, BLAS, Diet tools, etc.) is provided.
- Reserve a few nodes in deploy mode (replace n with the number of nodes you want, and t with the walltime):
- Deploy the Diet image:
- Wait a few minutes, as this last step can be quite long.
The /opt directory contains the following:
- diet_basic: Diet already installed for this tutorial, if you do not want to bother with the installation process
- diet_basic_tutorial: files required for this tutorial
- diet_dagda_tutorial: files required for the advanced data management tutorial
- diet_dagda_wf: Diet already installed for the advanced data management and workflow management tutorial, if you do not want to bother with the installation process
- diet_src: Diet source repository
- diet_utils:
- GoDIET: Diet deployment tool
- LogService: Diet logging tools, you will need them in the workflow tutorial
- VizDIET: VizDiet, Diet visualization tool
- diet_wf_tutorial: files required for the workflow management tutorial
- scripts: a few scripts to help you set your environment, and deploy a diet platform
- UserManual.pdf: Diet user manual
The installation process is described in the User's Manual. Install Diet in the directory of your choice, for instance ${HOME}/DIET. The basic steps are the following:
- create a temporary working directory, and change directory to it
- use CMake to run the Diet configuration:
- press 'c' to run the initial configuration, then select the compilation options. For this tutorial you only need to change the installation directory (CMAKE_INSTALL_PREFIX) and, if you want to, turn on compilation of the examples (DIET_BUILD_EXAMPLES and DIET_BUILD_BLAS_EXAMPLES). Then press 'c' again (configure), then 'g' (generate). CMake will exit, and you can finally type make && make install.
You will then need to add <DIET_HOME>/bin to your PATH and <DIET_HOME>/lib to your LD_LIBRARY_PATH.
Exercise 2 - An example of matrix computation
In this section, we learn how to program a simple client/server application that uses Diet. We will use the context of matrix computation to make this program look a bit more interesting than just a "Hello world". We will implement a basic scalar by matrix product. Then, we will test this program using different schemes of execution.
Skeleton files
The /opt/diet_basic_tutorial/exercise2 directory contains all the skeleton files needed for a quick start. Useful pieces of software are also included in them. This directory contains the following files:
- Makefile
- management of dependencies between source and compiled files
- server.c
- program implementing the service (scalar by matrix product)
- client_smprod.c
- program using the service defined in server.c: the matrix is stored in memory
- client_smprod_file.c
- same program as client_smprod.c, except that the matrix is passed as a file to the server
- client_smprodAsync.c
- same program as client_smprod.c, except that asynchronous mode is used to call Diet
- matrix1
- an example matrix file to be used along with client_smprod_file.c
Copy them into a working directory.
Server-side implementation
Using the skeleton of program server.c, write a service of scalar by matrix product. This service will have the following parameters:
| Parameter | Type |
|---|---|
| a scalar | double |
| a matrix to be multiplied | double |
| computing time | float |
The initial matrix is overwritten by the result. The matrix will be stored in memory.
First, try to define a detailed interface for the service, i.e., a precise definition of the service profile. To do so, have a look at the IN, INOUT and OUT parameters.
You can also have a look at the page concerning data management.
Then, program the solve function solve_smprod, as well as the initialization of the service in the main function.
int solve_smprod(...) {
}
The following function is given to help you:
int scal_mat_prod(double alpha, double *M, int nb_rows, int nb_cols, float *time)
It multiplies the matrix M (an nb_rows by nb_cols matrix) by the scalar alpha. It produces the matrix alpha*M, stored in M, and the time in seconds taken by this operation.
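To make the expected behaviour concrete, here is a minimal stand-alone sketch of what such a helper might do. It is an assumption, not the code shipped in the skeleton: the name `scal_mat_prod_sketch`, the use of `clock()` for timing and the return convention (0 on success) are all chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the tutorial's scal_mat_prod helper:
 * multiplies every element of M in place by alpha and stores the
 * elapsed CPU time, in seconds, in *time_s. Returns 0 on success. */
int scal_mat_prod_sketch(double alpha, double *M, int nb_rows, int nb_cols,
                         float *time_s)
{
    assert(M != NULL && time_s != NULL);
    assert(nb_rows >= 0 && nb_cols >= 0);

    clock_t start = clock();
    size_t n = (size_t)nb_rows * (size_t)nb_cols;
    for (size_t i = 0; i < n; i++)
        M[i] *= alpha;   /* storage order does not matter for a scalar product */
    clock_t end = clock();

    *time_s = (float)(end - start) / (float)CLOCKS_PER_SEC;
    return 0;
}
```

Because the matrix is modified in place, the same buffer serves as both input and result, which matches the "overwritten by the result" requirement of the service.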
Data Types
There are two kinds of data types: base types, and composite types. Base types have the following semantics:
| Type | Description | Size in octets |
|---|---|---|
| DIET_CHAR | Character | 1 |
| DIET_BYTE | Octet | 1 |
| DIET_INT | Signed integer | 4 |
| DIET_LONGINT | Long signed integer | 8 |
| DIET_FLOAT | Single precision real | 4 |
| DIET_DOUBLE | Double precision real | 8 |
| DIET_SCOMPLEX | Single precision complex | 8 |
| DIET_DCOMPLEX | Double precision complex | 16 |
Here are the composite types you can use:
| Type | Possible base types |
|---|---|
| DIET_SCALAR | all base types |
| DIET_VECTOR | all base types |
| DIET_MATRIX | all base types |
| DIET_STRING | DIET_CHAR |
| DIET_FILE | DIET_CHAR |
Persistence Modes
Several persistence modes are available for data:
- DIET_VOLATILE
- No persistence at all.
- DIET_PERSISTENT_RETURN
- (valid for INOUT and OUT arguments only)
- Data are saved on the server and a copy is sent back to the client after the computation is complete.
- DIET_PERSISTENT
- Data are saved on the server and nothing is brought back to the client.
- DIET_STICKY
- Data are saved on the server; they cannot be moved from there to another server, and thus cannot be sent back to the client.
Note: You will only need the DIET_VOLATILE persistence mode throughout the basic hands on session. Only for the example using a file as INOUT parameter will you need to use the DIET_PERSISTENT mode.
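The differences between the four modes can be summarised as two yes/no questions. The sketch below encodes them in a stand-alone enum; the `sketch_mode_t` type and both functions are hypothetical and only mirror the descriptions given in this section (the real constants are defined by Diet itself).

```c
#include <assert.h>

/* Hypothetical mirror of the four persistence modes described in this
 * section; it only names them and is not the DIET type. */
typedef enum {
    SKETCH_VOLATILE,
    SKETCH_PERSISTENT_RETURN,
    SKETCH_PERSISTENT,
    SKETCH_STICKY
} sketch_mode_t;

/* 1 if the data stay on the server after the call, 0 otherwise. */
int kept_on_server(sketch_mode_t mode)
{
    assert((int)mode >= 0 && (int)mode <= (int)SKETCH_STICKY);
    return mode != SKETCH_VOLATILE;
}

/* 1 if the stored data may later be moved to another server:
 * true for stored data except in the sticky mode. */
int may_move_to_other_server(sketch_mode_t mode)
{
    return kept_on_server(mode) && mode != SKETCH_STICKY;
}
```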
Initializing the Service Table
The following functions define the number of services a server can handle, add services to the service table and print the table (the diet_convertor_t *cvt argument is not needed here, so you can set it to NULL).
int diet_service_table_init(int max_size);
int diet_service_table_add(diet_profile_desc_t *profile,
diet_convertor_t *cvt,
diet_solve_t solve_func);
void diet_print_service_table();
Creating a Profile
Here are the main functions available to create a profile.
diet_profile_desc_t* diet_profile_desc_alloc(const char* path,
int last_in, int last_inout, int last_out);
int diet_generic_desc_set(diet_arg_t *desc,
diet_data_type_t type,
diet_base_type_t base_type );
int diet_profile_desc_free(diet_profile_desc_t* desc);
diet_arg_t* diet_parameter(diet_profile_t* profile, int position);
You need to define the profile of your service. last_in, last_inout and last_out correspond to the index in an array of the last IN, INOUT and OUT parameters (for example if last_in = 0 then we have 1 IN parameter, if last_in = last_inout, then there is no INOUT parameter).
diet_generic_desc_set has to be called for each parameter in the profile. For example:
diet_generic_desc_set(diet_param_desc(profile,0), DIET_SCALAR, DIET_DOUBLE);
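The index convention for last_in, last_inout and last_out is easy to get wrong, so it can help to check it with a few lines of plain C. The sketch below does not use the Diet API: `profile_bounds_t` and the helper functions are hypothetical names introduced only to illustrate the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper type holding the three "last index" values that
 * are passed when a profile is allocated. Not part of the DIET API. */
typedef struct {
    int last_in;
    int last_inout;
    int last_out;
} profile_bounds_t;

/* Bounds are consistent when the indices never decrease and the first
 * one is at least -1 (-1 meaning "no IN parameter"). */
int bounds_are_valid(profile_bounds_t b)
{
    return b.last_in >= -1 && b.last_inout >= b.last_in
        && b.last_out >= b.last_inout;
}

/* IN parameters occupy indices 0 .. last_in. */
int nb_in(profile_bounds_t b)    { return b.last_in + 1; }
/* INOUT parameters occupy indices last_in+1 .. last_inout. */
int nb_inout(profile_bounds_t b) { return b.last_inout - b.last_in; }
/* OUT parameters occupy indices last_inout+1 .. last_out. */
int nb_out(profile_bounds_t b)   { return b.last_out - b.last_inout; }

/* Total number of parameters in the profile. */
int nb_params(profile_bounds_t b)
{
    assert(bounds_are_valid(b));
    return b.last_out + 1;
}
```

For instance, a profile with one IN, one INOUT and one OUT parameter corresponds to the bounds (0, 1, 2), and a profile with a single OUT parameter to (-1, -1, 0).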
Getting and Setting IN, INOUT and OUT Arguments
You have to get IN and INOUT arguments using functions having the following syntax:
int diet_*_get(diet_arg_t* arg, void** value,
diet_persistence_mode_t* mode);
For the time being, you can set the mode to NULL. For example you could have for an IN parameter:
double* coeff;
...
diet_scalar_get(diet_parameter(pb,0), &coeff, NULL);
You have to set INOUT and OUT arguments using functions having the following syntax:
int diet_*_desc_set(diet_arg_t* arg, ...);
where * can be for example scalar, matrix... For example, you could have for an OUT parameter:
float* time;
...
diet_scalar_get(diet_parameter(pb,2), &time, NULL);
...
diet_scalar_desc_set(diet_parameter(pb,2), time);
Note: You can also have a look at /opt/diet_basic/include/DIET_server.h and /opt/diet_basic/include/DIET_data.h.
Client-side implementation
Using the client_smprod.c skeleton file, write a client for the service defined above. You will need to initialize a matrix and a scalar with known values so as to be able to verify if the answer is correct or not.
You will have to remember that the profile used in the client must match exactly the server profile.
You can also write another version of the client, using asynchronous calls to Diet.
Initializing and Finalizing a DIET Session
The client program must open its DIET session with a call to diet_initialize, which parses the configuration file to set all options and get a reference to the DIET Master Agent. The session is closed with a call to diet_finalize, which frees all resources associated with this session on the client. Note that memory allocated for all INOUT and OUT arguments brought back onto the client during the session is not freed during diet_finalize; this allows the user to continue to use the data, but also requires that the user explicitly free the memory. The user must also free the memory he or she allocated for IN arguments.
/* Before any call to DIET */
diet_error_t diet_initialize(char* config_file_name, int argc, char* argv[]);
/* When you do not need DIET anymore */
diet_error_t diet_finalize();
Create Profile and Set Arguments
Allocate a DIET profile with memory space for its arguments; pb_name is deep-copied. If there is no IN argument, give -1 for last_in. If there is no INOUT argument, give last_in for last_inout. If there is no OUT argument, give last_inout for last_out. Once the profile is allocated, use the set functions on each parameter. For example, if the nth argument is a matrix:
diet_matrix_set(diet_parameter(profile,n),
mode, value, btype, nb_r, nb_c, order);
diet_profile_t* diet_profile_alloc(char* pb_name,
int last_in, int last_inout, int last_out);
/* Once you do not need the profile anymore */
int diet_profile_free(diet_profile_t* profile);
You have to set IN and INOUT arguments using functions having the following syntax:
int diet_*_set(diet_arg_t* arg, ...);
where * can be for example scalar, matrix...
Calling a Service
Just call:
diet_error_t diet_call(diet_profile_t* profile);
This function will return once the service has finished.
You can also make asynchronous calls.
diet_error_t diet_call_async(diet_profile_t* profile, diet_reqID_t* reqID);
You then have to wait for the end of the services. You can either wait for one call to finish:
diet_error_t diet_wait(diet_reqID_t reqID);
Or several calls, either all of them, or any of them, on all submitted requests or on a subset of them:
diet_error_t diet_wait_and(diet_reqID_t* IDs, size_t length);
diet_error_t diet_wait_or(diet_reqID_t* IDs, size_t length, diet_reqID_t* IDptr);
diet_error_t diet_wait_all();
diet_error_t diet_wait_any(diet_reqID_t* IDptr);
Freeing Data
Free the amount of data pointed at by the value field of an argument. This should be used ONLY for VOLATILE data,
- on the server for IN arguments that will no longer be used
- on the client for OUT arguments, after the problem has been solved, when they will no longer be used.
NB: for files, this function removes the file and frees the path (since it has been dynamically allocated by DIET in both cases)
int diet_free_data(diet_arg_t* arg);
Note: You can also have a look at /opt/diet_basic/include/DIET_client.h and /opt/diet_basic/include/DIET_data.h.
Setting up and testing the client/server locally
The file env_vars.bash for bash shell (respectively env_vars.csh for tcsh shell) in /opt/diet_basic_tutorial/solutions/ contains all the environment variables needed for the execution of the programs. Verify the values of those variables, then load this file using the following command:
Note: This script sets PATH and LD_LIBRARY_PATH to an already installed version of Diet present in /opt.
When done with this operation, you need to start the name server of omniORB: omniNames. To do that, you must give a port number with the -start option (omniNames [-start <port>], default value is 2809), on which the service will be opened (and on which the server "listens"):
~> omniNames -start
Tue Dec 11 14:10:28 2003:
Starting omniNames for the first time.
Wrote initial log file.
Read log file successfully.
Root context is IOR:010000002b00000049444c3a6f6d672e6f72672f436f734e616d696e672f4e616d696e674
36f6e746578744578743a312e300000010000000000000060000000010102000d000000313430
2e37372e31332e36310000554f0b0000004e616d6553657276696365000200000000000000080
000000100000000545441010000001c0000000100000001000100010000000100010509010100
0100000009010100
Checkpointing Phase 1: Prepare.
Checkpointing Phase 2: Commit.
Checkpointing completed.
Then, you have to copy this port number into the omniORB configuration file: the name and location of this file are given by the environment variable OMNIORB_CONFIG, which is defined in the env_vars.bash file.
Using the Diet User's Manual, prepare the configuration files (suggestion: place them in a cfgs directory). To make it interesting, create a hierarchy of agents containing at least one MA and one LA. Basically, an agent configuration file has the following syntax:
#****************************************************************************#
# traceLevel for the DIET agent:
# 0 DIET prints only warnings and errors on the standard error output.
# 1 [default] DIET prints information on the main steps of a call.
# 5 DIET prints information on all internal steps too.
# 10 DIET prints all the communication structures too.
# >10 (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#
traceLevel = 1
#****************************************************************************#
# (*) agentType: Master Agent or Local Agent ? As there is only one executable
# for both agent types, it is COMPULSORY to specify the type of this agent.
#****************************************************************************#
agentType = DIET_MASTER_AGENT # or MA if this agent is a master agent
# or LA or DIET_LOCAL_AGENT if this agent is a
# local agent
#****************************************************************************#
# (*) name: the name of this MA. The ORB configuration files of the clients
# and the children of this MA (LAs and SeDs) must point at the same CORBA
# Naming Service as the one pointed at by the ORB configuration file of
# this agent.
#****************************************************************************#
name = MA1
#****************************************************************************#
# (*) parentName: the name of the agent to which the LA will register. This
# agent must have registered at the same CORBA Naming Service that is
# pointed to by your ORB configuration.
#****************************************************************************#
# parentName = MA1 only useful for a local agent
An agent can be run using the following command:
Compile the server and the client using the Makefile, then launch them. Here is an example of a client configuration file:
#****************************************************************************#
# traceLevel for the DIET client library:
# 0 DIET prints only warnings and errors on the standard error output.
# 1 [default] DIET prints information on the main steps of a call.
# 5 DIET prints information on all internal steps too.
# 10 DIET prints all the communication structures too.
# >10 (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#
traceLevel = 1
#****************************************************************************#
# (*) MAName: the name of the Master Agent to which the client will connect.
# This agent must have registered at the same CORBA Naming Service that is
# pointed to by your ORB configuration.
#****************************************************************************#
MAName = MA1
and a server configuration file:
#****************************************************************************#
# traceLevel for the DIET SeD library:
# 0 DIET prints only warnings and errors on the standard error output.
# 1 [default] DIET prints information on the main steps of a call.
# 5 DIET prints information on all internal steps too.
# 10 DIET prints all the communication structures too.
# >10 (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#
traceLevel = 1
#****************************************************************************#
# (*) parentName: the name of the agent to which the SeD will register. This
# agent must have registered at the same CORBA Naming Service that is
# pointed to by your ORB configuration.
#****************************************************************************#
parentName = MA1
Deploying on several nodes
Deploying a whole Diet hierarchy on many nodes by hand can be quite tedious. A tool has been specifically designed to deploy Diet platforms: GoDIET. The Diet hierarchy is described in an XML file. You won't have to write the whole XML file: we provide a script which creates a basic Diet platform composed of one MA and one or more SeDs. The script can be found in /opt/scripts/make_diet_platform_basic.py:
Usage: ./make_diet_platform_basic.py <nb SeD> <nodefile> <output_file>
<GoDIEt_scratch_runtime> <path to SeD>
Creates a simple DIET hierarchy: 1 MA, <nb SeD> SeDs
- <nb SeD>: number of SeDs
- <nodefile>: file containing the list of nodes, e.g. $OAR_NODEFILE
- <output_file>: the XML file to create
- <GoDIEt_scratch_runtime>: path to an existing directory where
GoDIET will write the configuration files
- <path to SeD>: path to SeD binary
Then, you just need to run:
Retrieve the omniORB configuration file and set your OMNIORB_CONFIG environment variable accordingly. You can now submit requests using your client. You can modify the XML file to produce different Diet hierarchies.
Another version of the service
In this part, you will modify the server to make it support a slightly different version of the scalar by matrix product. The matrix will be transmitted as a file, and no longer in memory.
Diet does not impose any data format on files, but to make your work easier you should use the format of the skeleton files. This format is simple text: the file contains a series of numbers separated by space characters. The meaning of the numbers is as follows:
- matrix dimensions (number of rows, number of columns)
- matrix values
Create a file containing a matrix, then implement a new service "smprod_file" with the following parameters:
| Parameter | Type |
|---|---|
| a scalar | double |
| a file containing the matrix to be multiplied | double |
| the time needed to compute the product | float |
The file is overwritten by the result.
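The text format described above (two dimensions followed by the values) can be read and written with the C standard library alone. The sketch below is only an illustration under stated assumptions: `read_matrix` and `write_matrix` are hypothetical names, not functions from the skeleton files, and the values are handled in whatever order they appear in the file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads "nb_rows nb_cols v1 v2 ..." from an open text stream.
 * On success returns a malloc'ed array of nb_rows*nb_cols values
 * (to be freed by the caller); returns NULL on any parse error. */
double *read_matrix(FILE *in, int *nb_rows, int *nb_cols)
{
    assert(in != NULL && nb_rows != NULL && nb_cols != NULL);
    if (fscanf(in, "%d %d", nb_rows, nb_cols) != 2
        || *nb_rows <= 0 || *nb_cols <= 0)
        return NULL;

    size_t n = (size_t)*nb_rows * (size_t)*nb_cols;
    double *values = malloc(n * sizeof *values);
    if (values == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (fscanf(in, "%lf", &values[i]) != 1) {
            free(values);
            return NULL;
        }
    }
    return values;
}

/* Writes the matrix in the same format, values separated by spaces.
 * Returns 0 on success, 1 on a write error. */
int write_matrix(FILE *out, const double *values, int nb_rows, int nb_cols)
{
    assert(out != NULL && values != NULL);
    if (fprintf(out, "%d %d", nb_rows, nb_cols) < 0)
        return 1;
    size_t n = (size_t)nb_rows * (size_t)nb_cols;
    for (size_t i = 0; i < n; i++)
        if (fprintf(out, " %.17g", values[i]) < 0)
            return 1;
    return fprintf(out, "\n") < 0;
}
```

A file-based solve function could then read the matrix, scale it, and write it back to the same path so that the file is overwritten by the result.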
Exercise 3 - Yet another service, using BLAS dgemm
To compile the programs of this exercise, the BLAS library (Basic Linear Algebra Subprograms) is required (it is already installed on the image).
The dgemm_ function is part of the BLAS. It performs the following matrix-matrix computation:
$C \leftarrow \alpha \, A \, B + \beta \, C$
$\alpha$ and $\beta$ are scalars, and A, B and C are matrices, with A an m by k matrix, B a k by n matrix and C an m by n matrix.
This exercise aims at adding a new service in a Diet platform, that performs the dgemm_ computation. The idea is to interface the existing dgemm_ function to a Diet SeD. Here is its prototype:
void dgemm_(char *tA,
char *tB,
int *m,
int *n,
int *k,
double *alpha,
double *A,
int *lda,
double *B,
int *ldb,
double *beta,
double *C,
int *ldc);
All parameters are given by address. Parameters alpha, beta, m, n, k, A, B and C correspond exactly to their respective roles in the computation $C \leftarrow \alpha \, A \, B + \beta \, C$. lda, ldb and ldc are the leading dimensions of the corresponding matrices. Since matrices are stored in a classical one-dimensional array, it is important to specify whether they are stored by rows or by columns. *tA and *tB are characters which have the following semantics:
| tA | Storage order of A (m,k) | Layout in memory |
|---|---|---|
| 'T' | row-major | [row 1, row 2, ... , row m] |
| 'N' | column-major | [col 1, col 2, ... , col k] |
For this exercise, there is no need to explore all the possibilities offered by the storage order or the leading dimension. Just set *tA and *tB to 'N', and lda, ldb and ldc to the number of rows of the corresponding matrix. Once you have specified the profile of the service, program a server that implements this service, and a test client, using the skeleton files in the exercise3 directory. Matrices will be stored in memory. Finally, test the client/server architecture, through Diet, in different contexts of execution.
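When testing the client, it helps to compare the result returned by the service with a value computed locally. The function below is a naive reference for the case used in this exercise (both storage flags set to 'N', leading dimensions equal to the number of rows). It is not BLAS code: `ref_dgemm_nn` is a hypothetical helper written for illustration, and element (i,j) of a column-major matrix with ld rows is assumed to live at index i + j*ld.

```c
#include <assert.h>

/* Reference C <- alpha*A*B + beta*C for column-major matrices, i.e.
 * the tA = tB = 'N' case with lda = m, ldb = k and ldc = m.
 * A is m x k, B is k x n, C is m x n. Hypothetical helper, not BLAS. */
void ref_dgemm_nn(int m, int n, int k, double alpha, const double *A,
                  const double *B, double beta, double *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i + p * m] * B[p + j * k];   /* A(i,p) * B(p,j) */
            C[i + j * m] = alpha * acc + beta * C[i + j * m];
        }
    }
}
```

With small matrices whose product is known by hand, the client can check the values returned through Diet against this reference before moving on to larger inputs.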
References
- [1] Raphaël Bolze, Yves Caniou, Eddy Caron, Pushpinder Kaur Chouhan, Philippe Combes, Sylvain Dahan, Holly Dail, Bruno Delfabro, Georg Hoesch, Mathieu Jan, Jean-Yves L'Excellent, Christophe Pera, Cyrille Pontvieux, Alan Su, Cédric Tedeschi, and Antoine Vernois. DIET User's Manual. http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3/index.html. Also available in PDF: http://graal.ens-lyon.fr/DIET/download/doc/UsersManualDiet2.3.pdf.
- [2] A. Amar, R. Bolze, Y. Caniou, E. Caron, B. Depardon, J.S. Gay, G. Le Mahec, and D. Loureiro. Tunable scheduling in a GridRPC framework. Concurrency and Computation: Practice and Experience, 2008.
- [3] E. Caron and F. Desprez. DIET: A scalable toolbox to build network enabled servers on the grid. International Journal of High Performance Computing Applications, 20(3):335-352, 2006.
Advanced Data Management Using DAGDA
This is a tutorial introducing how to use Dagda (Data Arrangement for the Grid and Distributed Applications) as the data manager for Diet.
You can use the same KaDeploy image as for the previous exercises of this tutorial. You just need to load another environment script to set the correct paths and configuration settings. Diet has already been compiled with the relevant options, and installed in /opt/diet_dagda_wf.
The file env_vars.bash for bash shell (respectively env_vars.csh for tcsh shell) in /opt/diet_dagda_tutorial/solutions/ contains all the environment variables needed for the execution of the programs. Verify the values of those variables, then load this file using the following command:
Introduction
Data Arrangement for the Grid and Distributed Applications (Dagda) is a new data manager for the Diet middleware which allows explicit or implicit data replication and advanced data management on the grid. It was designed to be backward compatible with previously developed Diet applications, but it also introduces new features for direct data management inside a Diet application.
Using Dagda, Diet users benefit from:
- Explicit or implicit data replication.
- A data status backup/restoration system.
- File sharing between nodes that can access the same disk partition.
- A choice of data replacement algorithm.
- Configuration of the memory and disk space Diet should use for data storage and transfers.
Dagda transfer model
To transfer data, Dagda uses a pull model instead of the push model used by DTM (Data Tree Manager, the historical Diet data manager): the data are not sent within the profile from the source to the destination; instead, the destination downloads them from the source.
Figure 1 presents how Dagda manages the data transfers for a standard Diet call:
- The client performs a Diet service call.
- Diet selects one or more SeDs to execute the service.
- The client submits its request to the selected SeD, sending only the data descriptions.
- The SeD downloads the new data from the client and the persistent ones from the nodes on which they are stored.
- The SeD executes the service using the input data.
- The SeD performs updates on the inout and out data and sends their descriptions to the client. The client then downloads the volatile and persistent return data.
At each step of the submission, the transfers are always launched by the destination node. All transfer operations are performed transparently, so users do not have to modify applications that use data persistence in order to benefit from data replication.
Remark: After the call, the persistent output data obtained from other nodes are replicated on the SeD and will stay on it until they are erased by the user or by the data replacement algorithm.
The Dagda API
In this section we present the part of the Dagda API used for this hands-on. For a complete Dagda API presentation, please refer to http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3
Basic data management
dagda_put_file(char* path, diet_persistence_mode_t mode, char**ID)
This macro adds the file located at path to the platform with the persistence mode mode, and stores the data ID into ID.
dagda_get_file(char* ID, char** path)
The file whose ID is ID is obtained from Dagda, and *path is set to the path of the file.
Asynchronous data management
dagda_put_file_async(char* path, diet_persistence_mode_t mode)
This function works like the synchronous version but returns an unsigned int which must be used as the transfer reference for the waiting functions.
dagda_wait_data_ID(unsigned int transferRef, char** ID)
The transferRef argument is the value returned by a dagda_put_*_async function. The ID content will be initialized to a pointer to the data ID.
dagda_get_file_async(char* ID)
This function works like the synchronous one, returning a transfer reference used by the wait function.
dagda_wait_file(unsigned int transferRef, char** path)
The transferRef argument is the value returned by the dagda_get_file_async function. The path content will be initialized to a pointer to the file path.
Data alias
dagda_data_alias(const char* id, const char* alias)
Tries to associate alias with id. If the alias is already defined, it returns a non-zero value. A data item can have several aliases, but an alias is always associated with exactly one data item.
dagda_id_from_alias(const char* alias, char** id)
This function tries to retrieve the data id associated to the alias.
Data replication
After data has been added to the Dagda hierarchy, users can choose to replicate it explicitly on one or several Diet nodes. The current Dagda version lets you choose the nodes where the data will be replicated by hostname or by Dagda component ID. In future developments, it will be possible to select the nodes differently. To maintain backward compatibility, the replication function uses a C string to define the replication rule.
dagda_replicate_data(const char* id, const char* rule)
The replication rule is defined as follows: "<Pattern target>:<identification pattern>:<Capacity overflow behavior>"
- The pattern target can be ID or host.
- The identification pattern can contain wildcard characters (for example, "*.lyon.grid5000.fr" is a valid pattern).
- The capacity overflow behavior can be replace or noreplace. replace means the cache replacement algorithm will be used if available on the target node (some data could be deleted from the node to make room for the new one). noreplace means that the data will be replicated on the node if and only if there is enough storage capacity on it.
For example, "host:capricorne-*.lyon.*:replace" is a valid replication rule.
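Since the rule is a plain C string made of three fields separated by colons, it can be assembled and sanity-checked before being handed to the replication function. The helper below is a sketch under stated assumptions: `make_replication_rule` is a hypothetical name, it only validates the two keyword fields described above, and it does not call Dagda.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<target>:<pattern>:<overflow>" into buf.
 * target must be "ID" or "host", overflow "replace" or "noreplace".
 * Returns 0 on success, 1 on invalid input or if buf is too small. */
int make_replication_rule(char *buf, size_t size, const char *target,
                          const char *pattern, const char *overflow)
{
    assert(buf != NULL && target != NULL);
    assert(pattern != NULL && overflow != NULL);

    if (strcmp(target, "ID") != 0 && strcmp(target, "host") != 0)
        return 1;
    if (strcmp(overflow, "replace") != 0 && strcmp(overflow, "noreplace") != 0)
        return 1;

    int len = snprintf(buf, size, "%s:%s:%s", target, pattern, overflow);
    return len < 0 || (size_t)len >= size;   /* 1 when truncated */
}
```

The resulting string can then be passed as the rule argument of dagda_replicate_data together with the data ID.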
Exercise 1
Here we see how to share data between different users. Dagda provides an alias system to share the data identifier easily between the users. We will use it to write a simple application that does not use a specific Diet service.
This application should work as follows:
The user passes two parameters to the application: the first one is a filename and the second one is a “human readable” identifier. If the file does not exist, the application tries to use the identifier to obtain it from the platform. Otherwise, it sends the file to the platform and tries to define the identifier as an alias for the data on the grid. If the identifier is already used, the application appends a number to it until it finds an available one (e.g., if file.txt is used, then try file.txt-0, file.txt-1, ...).
The /opt/diet_dagda_tutorial/exercise1 directory contains all the skeleton files needed for a quick start. This directory contains the following files:
- Makefile
- management of dependencies between source and compiled files
- data_sharing.cc
- The skeleton for the data sharing application.
Copy them into a working directory.
Exercise 2 - Asynchronous data management
In this example, we will use the asynchronous data transfer capabilities of Dagda to write an application which takes several parameters:
-send <<filename ID> <filename ID> ...> -recv < ID ID >
This application sends the files to the platform, using their respective IDs as aliases, and downloads the data whose IDs are passed after the -recv option. All the data transfers are performed simultaneously.
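Before starting any transfer, the application has to split its command line into files to send and IDs to receive. A minimal sketch of that parsing step is given below; TransferPlan and parse_transfer_args are hypothetical names, and no Dagda call is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for the parsed command line.
struct TransferPlan {
  std::vector<std::pair<std::string, std::string>> to_send;  // (filename, ID)
  std::vector<std::string> to_receive;                       // IDs to download
};

// Parses "-send <filename ID>... -recv <ID>...".
// Returns false if a filename has no ID or a value precedes any option.
bool parse_transfer_args(const std::vector<std::string>& args,
                         TransferPlan& plan) {
  assert(plan.to_send.empty() && plan.to_receive.empty());
  enum class Mode { None, Send, Recv };
  Mode mode = Mode::None;
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "-send") { mode = Mode::Send; continue; }
    if (args[i] == "-recv") { mode = Mode::Recv; continue; }
    if (mode == Mode::Send) {
      // Each filename must be followed by its ID.
      if (i + 1 >= args.size() || args[i + 1] == "-send" ||
          args[i + 1] == "-recv") {
        return false;
      }
      plan.to_send.emplace_back(args[i], args[i + 1]);
      ++i;  // skip the ID that was just consumed
    } else if (mode == Mode::Recv) {
      plan.to_receive.push_back(args[i]);
    } else {
      return false;  // value given before -send or -recv
    }
  }
  return true;
}
```

Once the plan is built, every entry can be handed to the asynchronous transfer functions, and the application waits for all of them at the end.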
The /opt/diet_dagda_tutorial/exercise2 directory contains all the skeleton files needed for a quick start. This directory contains the following files:
- Makefile
- management of dependencies between source and compiled files
- async_data_mngt.cc
- The skeleton for the asynchronous data management application.
Copy them into a working directory.
Exercise 3 - Data replications
Now, we will write an application which takes a filename as parameter, sends the file to the platform and then replicates it on the nodes of a particular cluster (e.g., all the nodes of the capricorne cluster in Lyon).
You can compare the time needed to get the data as a function of the replication level. To obtain significant results, deploy the Diet hierarchy and launch the client on different clusters.
The /opt/diet_dagda_tutorial/exercise3 directory contains all the skeleton files needed for a quick start. This directory contains the following files:
- Makefile
- management of dependencies between source and compiled files
- data_replication.cc
- The skeleton for the data replication application.
Copy them into a working directory.
Exercise 4 - A Diet application using Dagda
For this exercise, we will use what we saw previously about data management in Dagda to develop a Diet client-server application that takes advantage of the data management features provided by Dagda.
We want to develop a service which counts the number of occurrences of a string in a text file. This service takes 5 arguments:
- The file on which the service searches the string
- The string to search
- The offset to start the search
- The length of the search
- The number of occurrences in this part of the file
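The core of such a service, independent of any Diet profile handling, could look like the sketch below. count_occurrences is a hypothetical name, the file content is assumed to be already loaded into a string, and the choice to count overlapping matches by their start position is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Counts the occurrences of needle (overlapping ones included) that START
// inside [offset, offset + length) of text. A match may extend past the end
// of the range, so adjacent ranges never miss or double-count a match.
std::size_t count_occurrences(const std::string& text,
                              const std::string& needle,
                              std::size_t offset, std::size_t length) {
  assert(!needle.empty());
  if (offset >= text.size()) return 0;
  const std::size_t end = offset + std::min(length, text.size() - offset);
  std::size_t count = 0;
  std::size_t pos = text.find(needle, offset);
  while (pos != std::string::npos && pos < end) {
    ++count;
    pos = text.find(needle, pos + 1);
  }
  return count;
}
```

With this convention, summing the results obtained on consecutive parts of a file gives the same total as a single search over the whole file.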
We also need a client application that calls the service asynchronously on different parts of the file. The application takes a filename, the string to search, the number of parallel calls and an alias name for the file.
The application tries to obtain the ID of the data corresponding to the alias. If the alias does not exist, the application adds the file to the Diet hierarchy and then associates the alias with it. Then, it calls the service asynchronously using the data identifier instead of the file, avoiding a client-to-server file transfer.
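On the client side, the file has to be divided into as many parts as there are parallel calls. A minimal sketch of that division follows; Range and split_ranges are hypothetical names, and the file size is assumed to be known in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One part of the file, expressed as the offset and length service arguments.
struct Range {
  std::size_t offset;
  std::size_t length;
};

// Splits [0, total) into `parts` contiguous ranges whose lengths differ by
// at most one byte.
std::vector<Range> split_ranges(std::size_t total, std::size_t parts) {
  assert(parts > 0);
  std::vector<Range> ranges;
  const std::size_t base = total / parts;
  const std::size_t extra = total % parts;  // the first `extra` parts get +1
  std::size_t offset = 0;
  for (std::size_t i = 0; i < parts; ++i) {
    const std::size_t len = base + (i < extra ? 1 : 0);
    ranges.push_back({offset, len});
    offset += len;
  }
  return ranges;
}
```

Each range then provides the offset and length arguments of one asynchronous call. Whether a match that straddles two ranges is counted depends on how the service treats range boundaries, so this should be checked when comparing results.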
As a third step, we want to replicate the data on the SeDs and compare the execution times. To obtain significant results, deploy the Master Agent, the SeDs and the client on different clusters. A large text file to use as input is available here: /home/lyon/glemahec/large_text.txt (about 1 GB of French text).
The /opt/diet_dagda_tutorial/exercise3 directory contains all the skeleton files needed for a quick start. This directory contains the following files:
- Makefile
- management of dependencies between source and compiled files
- count_client.cc
- The skeleton for the client application.
- count_server.cc
- The skeleton for the server application.
Copy them into a working directory.
Workflows Management
Exercise 1: Installing and compiling Diet with workflow support
You can skip this exercise by using the already compiled distribution in /opt/diet_dagda_wf. To compile DIET with workflow support, follow the same steps as in the 'Introductory Hands-on', but turn on workflow support (DIET_USE_WORKFLOW option) in the ccmake configuration.
Exercise 2: Executing a DAG workflow
In this section, we learn how to execute a basic DAG workflow using the Diet MaDag agent. The services invoked within the workflow perform integer arithmetic.
Client-server binaries
The /opt/diet_dagda_wf/bin/examples/workflow directory contains all the binary files needed for a quick start. This directory contains the following files:
- scalar_server
- program implementing the services (SUCC, DOUBLE, SUM, SQUARE)
- client_scalar
- program using the workflow service to execute the basic workflow defined in the XML file
- generic_client
- generic client program to execute a workflow
Copy them into a working directory.
XML Dag description
The MaDag agent understands a specific language used to describe the workflow structure. This language is written in XML and contains the following tags:
- dag
- the main tag that contains the whole dag description
- node
- one node corresponds to a task to execute on the grid with given input data described in the 'arg' and 'in' tags. The name of the DIET service is given in the 'path' attribute. The 'id' attribute is used to identify the node and must be unique within the dag.
- arg
- an input port with a given static value that is given in the 'value' attribute. The 'type' attribute contains one of the usual DIET data types.
- in
- an input port connected to the output port of another node using the 'source' attribute
- out
- an output port
The workflow we will use in the exercise is described in the file /opt/diet_dagda_wf/bin/examples/workflow/xml/scalar.xml.
First, copy this file into the working directory.
This file describes a workflow using four different services:
- succ
- data_in -> data_in + 1
- double
- data_in -> 2x data_in
- sum
- data_in1, data_in2 -> data_in1 + data_in2
- square
- data_in -> square_root(data_in)
These services are combined to compute a mathematical formula by connecting the outputs of some services to the inputs of other services.
Setting up and testing the workflow locally
To run the workflow you must first launch the Diet hierarchy as in the 'Introductory Hands-on' with an additional agent called the MaDag (Master Agent DAG).
Using the Diet User's Manual (http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3/node111.html), prepare the configuration files (suggestion: place them in a cfgs directory). To make it interesting, create a hierarchy of agents that contains at least one MADAG, one MA and one LA.
Launch first the MA and the server, then the MADAG and finally the client using the scalar.xml workflow definition. You can first use the client_scalar binary which is specifically designed for the 'scalar' workflow, then you can use the generic_client binary that can be used to run any workflow defined in the MaDag XML language.
Here is an example of an MADAG configuration file:
#****************************************************************************#
# traceLevel for the DIET agent:
#   0   DIET prints only warnings and errors on the standard error output.
#   1   [default] DIET prints information on the main steps of a call.
#   5   DIET prints information on all internal steps too.
#   10  DIET prints all the communication structures too.
#   >10 (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#
traceLevel = 1

#****************************************************************************#
# (*) agentType:
#****************************************************************************#
agentType = DIET_MA_DAG

#****************************************************************************#
# (*) name: the name of this MA DAG.
#****************************************************************************#
name = mad

#****************************************************************************#
# (*) parentName: the name of the Master Agent
#****************************************************************************#
parentName = MA1
You can run the MADAG using the following command:
You also have to specify the MADAG to which a client must connect in order to submit workflows. This is done by adding the following lines to the client configuration file:
#****************************************************************************#
# MADAGNAME: the name of the MA DAG agent to which the client will connect
#****************************************************************************#
MADAGNAME = mad
Deploying on several nodes
Deploying a whole Diet hierarchy on many nodes may be quite tiresome. A tool has been specifically designed to deploy Diet platforms: GoDiet. The Diet hierarchy is described in an XML file. You won't have to write the whole XML file. We provide a script which creates a basic Diet platform composed of an MA, a MADAG and one or many SeDs. The script can be found in /opt/scripts/make_diet_platform_wf.py:
Usage: ./make_diet_platform_wf.py <nb SeD> <nodefile> <output_file>
<GoDIEt_scratch_runtime> <path to SeD>
Creates a simple DIET hierarchy: 1 MA, <nb SeD> SeDs
- <nb SeD>: number of SeDs
- <nodefile>: file containing the list of nodes, e.g. $OAR_NODEFILE
- <output_file>: the XML file to create
- <GoDIEt_scratch_runtime>: path to an existing directory where
GoDIET will write the configuration files
- <path to SeD>: path to SeD binary
Then, you just need to run:
Retrieve the omniORB configuration file and set your OMNIORB_CONFIG environment variable to point to it. You can now submit requests using your client. You can modify the XML file to produce different Diet hierarchies.
Another version of the workflow
In this part, you will modify the workflow description to produce a different computation.
The "multiplied by 4" operation is originally done by summing the results of two "double" operations. You can add a branch to the workflow that takes the output of one "double" operation and computes the same value by doubling it again. Then add a "square" node to get the same result as the original branch.
Note that the connection between inputs and outputs of tasks is done using the 'source' attribute of the in element within a node element.
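The equivalence between the two ways of multiplying by 4 can be checked outside the workflow engine. The sketch below models the "double" and "sum" services as plain C++ functions, following the service descriptions given earlier; the function names are hypothetical and no Diet call is made.

```cpp
#include <cassert>

// Plain-function models of two workflow services (assumed semantics:
// double: x -> 2 * x, sum: (a, b) -> a + b).
int double_service(int x) { return 2 * x; }
int sum_service(int a, int b) { return a + b; }

// Original branch: the sum of the results of two "double" operations.
int times4_original(int x) {
  return sum_service(double_service(x), double_service(x));
}

// Added branch: the output of one "double" operation doubled again.
int times4_alternative(int x) {
  return double_service(double_service(x));
}

// Checks that both branches agree for a given input.
void check_branches(int x) {
  assert(times4_original(x) == times4_alternative(x));
}
```

In the XML description, each nested call corresponds to an 'in' port whose 'source' attribute points to the output port of the node that produces the value.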
To execute the new workflow and get the results, you should use the generic_client binary (in the command-line parameters you must set the type of workflow to 'dag' and not 'wf'; the 'wf' type uses a different workflow language of which you can see examples in the xml directory).
Exercise 3: Launching multiple workflows simultaneously
The objective of this exercise is to use the different options offered by the MADAG agent for workflow scheduling in the context of multiple concurrent workflow requests.
Selecting the scheduling heuristic of the MaDAG agent
The MADAG agent implements different heuristics to optimize multi-workflow scheduling. You can choose between those heuristics by setting one of the following command-line options (for the 'maDagAgent' program):
- -g_heft: ("global heterogeneous earliest finish time") this heuristic extends the classical HEFT heuristic by creating a virtual meta-workflow that contains all the submitted workflows
- -g_aging_heft: similar to the previous one but applies an "aging" factor to the HEFT priorities in order to favor workflows that were submitted first
- -fairness: this heuristic computes a "slowdown" factor for each dag that is proportional to the difference between the actual estimated makespan and the estimated makespan when the dag is executed without platform limitation. The priority is given to the dag with the highest "slowdown" value.
- -srpt: ("Shortest remaining processing time") this heuristic gives the priority to dags having the smallest value for remaining estimated processing time.
- -fcfs: ("First come First served") as stated this heuristic uses only the order of arrival to set the dag priority
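To make the difference between two of these policies concrete, the sketch below selects the dag that would get priority under a first-come-first-served rule and under a shortest-remaining-processing-time rule. It is a toy model and not the MaDag implementation: DagInfo, pick_fcfs and pick_srpt are hypothetical names, and the time values are assumed to be known estimates.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of a submitted dag.
struct DagInfo {
  std::string id;
  double submit_time;     // when the dag was submitted
  double remaining_time;  // estimated remaining processing time
};

// FCFS-like rule: the dag submitted first gets priority.
const DagInfo& pick_fcfs(const std::vector<DagInfo>& dags) {
  assert(!dags.empty());
  return *std::min_element(dags.begin(), dags.end(),
                           [](const DagInfo& a, const DagInfo& b) {
                             return a.submit_time < b.submit_time;
                           });
}

// SRPT-like rule: the dag with the smallest remaining time gets priority.
const DagInfo& pick_srpt(const std::vector<DagInfo>& dags) {
  assert(!dags.empty());
  return *std::min_element(dags.begin(), dags.end(),
                           [](const DagInfo& a, const DagInfo& b) {
                             return a.remaining_time < b.remaining_time;
                           });
}
```

With a long dag submitted first and a short dag submitted just after it, the two rules choose different dags, which is the kind of difference the experiments in this exercise are meant to reveal.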
You can change the command-line option used for the MaDag agent by editing the GoDIET XML file that you used in Exercise 2: within the diet_hierarchy/master_agent/ma_dag tag, uncomment the parameters tag and put the chosen option in the 'string' attribute.
Using a script to launch several workflows one after the other
All required files are available in the /opt/diet_wf_tutorial/exercise3 directory. The workflow_exp.sh script must be launched with the exp1_example.desc.txt configuration file as parameter. This configuration file must be adapted to your setup (working directory and goDiet directories). To test the script, first launch the Diet platform using GoDIET and then launch the workflow_exp.sh script with your experiment description.
After the launch, you can monitor the progress of your experiment by using the watch.sh script as follows:
Using VizDIET to visualize the DAG execution schedule
At the end of the script execution, a directory is created (e.g. directory exp1.FOFT.loop-5.sleep-2: the exact name depends on the script parameters) containing all the logs of the different agents and a global log file (e.g. DietLog.FOFT.loop-5.sleep-2.log) that contains all the main events registered by the Diet hierarchy of agents and sent to the central logging agent. This file can be used as input for the VizDiet tool to display graphically the execution schedule.
For your information, the VizDiet tool can be used with a direct connection to the central logging agent in order to display the schedule in real-time, but in this case it must be launched on Grid5000 and therefore the X display must be exported.
In this exercise, we propose that you install VizDiet locally on your computer by downloading the Jar file from the /opt/diet_utils/VizDiet directory. Then simply launch the Java program with this Jar file; this should display the VizDiet GUI. Select the 'Read Log file' box and press the 'Open' button to open a dialog box where you select the previously mentioned DietLog file. Then press 'OK': the main VizDiet window will appear, and pressing the 'Play' button (leftmost icon) makes VizDiet read the log file. A diagram of the Diet hierarchy will be displayed in the main window. In the 'View' menu, you can select the 'Statistics' option to display the schedule. By default the colors identify the different SeDs, but you can switch to 'Dags' colors with the drop-down list at the bottom of the diagram. You can also switch between different types of diagrams by clicking on their names at the top of the diagram (for example the 'Gantt' chart).
The VizDiet tool can greatly help you to check the effect of the different MaDag heuristics: run different experiments with the same parameters but with different MaDag heuristic options (don't forget to restart the Diet hierarchy using GoDiet each time). Then use VizDiet to display the different logs generated.
Conclusion
In this tutorial section we have seen how to run a DAG workflow using Diet, how the workflow scheduler (the MaDAG) can be configured to use different scheduling heuristics, and how these heuristics change the execution schedule. The Diet middleware is able to efficiently execute workflows with a very high number of tasks, or a high number of workflows simultaneously. The scheduler automatically adapts to changes in the resource load and tries to optimize resource usage to reduce the workflow makespan. Depending on the multi-workflow scheduling strategy, the different MaDag heuristics can be used to improve the mean or the variance of the workflow makespan distribution.
We hope you enjoyed doing this tutorial and discovering Diet. Thanks a lot!
The Diet team.

