G5kss09 DIET TP

From Grid5000

Jump to: navigation, search

Contents

Introductory Hands On

Introduction

This is a basic tutorial introducing how to use the Diet (Distributive Interactive Engineering Toolbox) middleware on Grid'5000.

A KaDeploy image containing all necessary applications and libraries has been recorded under the name DIET_tutorial. It contains software and libraries required to run Diet.

Throughout this tutorial, you will need the Diet User's Manual [1], a pdf version is available in the KaDeploy image under /opt/UserManual.pdf, and an online version is also available: http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3.

Diet Middleware

The Distributed Interactive Engineering Toolbox (Diet)[2][3] project is focused on the development of a scalable middleware with initial efforts focused on distributing the scheduling problem across multiple agents. This middleware is able to find an appropriate server according to the information given in the client's request (e.g., problem to be solved, size of the data involved), the performance of the target platform (e.g., server load, available memory, communication performance) and the local availability of data stored during previous computations.

The Diet component architecture is structured hierarchically for improved scalability. Such an architecture is flexible and can be adapted to diverse environments, including arbitrary heterogeneous computing platforms. The Diet toolkit is implemented in CORBA and thus benefits from the many standardized and stable services provided by the various freely-available CORBA implementations. CORBA systems provide a remote method invocation facility with a high level of transparency.

Diet hierarchical organization.
Diet hierarchical organization.

The Diet framework is composed of several components:

  • A Client is an application that uses the Diet infrastructure to solve problems using an RPC approach. Clients access Diet via various interfaces: web portals, Problem Solving Environments (PSEs) such as Scilab, or programmatically using published C or C++ APIs.
  • A SeD , or server daemon, acts as the service provider, exporting functionalities via a standardized computational service interface; a single SeD can offer any number of computational services. A SeD can also serve as the interface and execution mechanism for either a stand-alone interactive machine or a parallel supercomputer, by interfacing with its batch scheduling facility.
  • Agents , which are the components that facilitate the service location and invocation interactions of clients and SeDs.

Collectively, a hierarchy of agents provides higher-level services such as scheduling and data management. These services are made scalable by distributing them across a hierarchy of agents composed of a single Master Agent (MA) and several Local Agents (LA). The figure shows an example of a Diet hierarchy.

The Master Agent of a Diet hierarchy serves as the distinguished entry point from which the services contained within the hierarchy may be logically accessed. Clients identify the Diet hierarchy using a standard CORBA naming service. Clients submit requests - composed of the name of the specific computational service they require and the necessary arguments for that service - to the MA. The MA then forwards the request to its children, who subsequently forward the request to their children, such that the request is eventually received by all SeDs in the hierarchy. SeDs then evaluate their own capacity to perform the requested service; capacity can be measured in a variety of ways including an application-specific performance prediction, general server load, or local availability of datasets specifically needed by the application. SeDs forward this capacity information back up the agent hierarchy. Based on the capacities of individual SeDs to serve the request at hand, agents at each level of the hierarchy reduce the set of server responses to a manageable list of server choices with the greatest potential. The server choice can be made specific to any kind of application using plug-in schedulers at each level of the hierarchy.

Exercise 1 - Installing and compiling Diet

You first need to reserve a few nodes on which you will be able to deploy a Diet hierarchy. In order to simplify the installation process, a KaDeploy image containing all required software and libraries is provided (omniORB, BLAS, Diet tools, etc. ).

  • Reserve a few nodes in deploy mode (replace n with the number of nodes you want, and t with the walltime):
Image:Terminal.png frontend:
oarsub -I -t deploy -l nodes=n,walltime=t
  • Deploy the Diet image:
Image:Terminal.png frontend:
kadeploy -f $OAR_NODEFILE -e DIET_tutorial -l bdepardo
  • Wait a few minutes, as this last step can be quite long.
Image:Warning.png Warning

It can take up to 10 minutes to deploy the image

The /opt directory contains the following:

  • diet_basic: Diet already installed for this tutorial, if you do not want to bother with the installation process
  • diet_basic_tutorial: files required for this tutorial
  • diet_dagda_tutorial: files required for the advanced data management tutorial
  • diet_dagda_wf: Diet already installed for the advanced data management and workflow management tutorial, if you do not want to bother with the installation process
  • diet_src: Diet source repository
  • diet_utils:
    • GoDIET: Diet deployment tool
    • LogService: Diet logging tools, you will need them in the workflow tutorial
    • VizDIET: VizDiet, Diet visualization tool
  • diet_wf_tutorial: files required for the workflow management tutorial
  • scripts: a few scripts to help you set your environment, and deploy a diet platform
  • UserManual.pdf: Diet user manual

The installation process is described in the User's Manual. Install Diet in the directory of your choice, for instance ${HOME}\DIET. To install Diet, basically what you need to do is the following:

  • create a temporary working directory, and change directory to it
  • use CMake to run diet configuration:
Image:Terminal.png node:
ccmake /opt/diet_src/
  • press 'c' to run initial configuration, and select compilation options, for this tutorial you only need to change the installation directory (CMAKE_INSTALL_PREFIX), turn on examples compilation if you want to (DIET_BUILD_EXAMPLES and DIET_BUILD_BLAS_EXAMPLES). Then press 'c' (configure), then 'g' (generate), CMake will exit, you can finally type make && make install.

You will need to set your PATH and LD_LIBRARY_PATH respectively to <DIET_HOME>/bin and <DIET_HOME>/lib.

Exercise 2 - An example of matrix computation

In this section, we learn how to program a simple client/server application that uses Diet. We will use the context of matrix computation to make this program look a bit more interesting than just a "Hello world". We will implement a basic scalar by matrix product. Then, we will test this program using different schemes of execution.

Files skeletons

The /opt/diet_basic_tutorial/exercise2 directory contains all the skeleton files needed for a quick start. Useful pieces of software are also included in them. This directory contains the following files:

Makefile
management of dependencies between source and compiled files
server.c
program implementing the service (scalar by matrix product)
client_smprod.c
program using the service defined in server.c: the matrix is stored in memory
client_smprod_file.c
same program as client_smprod.c, except that the matrix is passed as a file to the server
client_smprodAsync.c
same program as client_smprod.c, except that asynchronous mode is used to call Diet
matrix1
an example matrix file to be used along with client_smprod_file.c

Copy them in a working directory.

Server-side implementation

Using the skeleton of program server.c, write a service of scalar by matrix product. This service will have the following parameters:

Parameter Type
a scalar double
a matrix to be multiplied double
computing time float

The intial matrix is overwritten by the result. The matrix will be stored in memory.

First, try to define a detailed interface for the service, i.e., a precise definition of the service profile . To do so, have a look at in, inout and out parameters.

You can also have a look at the page concerning data management.

Then, program the solve function solve_smprod, as well as the initialization of the service in the main function.

int solve_smprod(...) {

}


The following function is given to help you:

int scal_mat_prod(double alpha, double *M, int nb_rows, int nb_cols, float *time)

It mutliplies the M matrix (an nb_rows by nb_cols matrix) by the alpha scalar. The result is the matrix alpha*M and the time in seconds taken by this operation.


Data Types

There are two kinds of data types: base types, and composite types. Base types have the following semantics:

Type Description Size in octets
DIET_CHAR Character 1
DIET_BYTE Octet 1
DIET_INT Signed integer 4
DIET_LONGINT Long signed integer 8
DIET_FLOAT Simple precision real 4
DIET_DOUBLE Double precision real 8
DIET_SCOMPLEX Simple precision complex 8
DIET_DCOMPLEX Double precision complex 16

Here are the composite types you can use:

Type Possible base types
DIET_SCALAR all base types
DIET_VECTOR all base types
DIET_MATRIX all base types
DIET_STRING DIET_CHAR
DIET_FILE DIET_CHAR


Persistence Modes

Various persistence modes for data

DIET_VOLATILE
No persistency at all.
DIET_PERSISTENT_RETURN
(valid for INOUT and OUT arguments only)
Data are saved on the server and a copy is sent back to the client after the computation is complete.
DIET_PERSISTENT
Data are saved on the server and nothing is brought back to the client.
DIET_STICKY
Data are saved on the server, they cannot been moved from there to another server, and thus cannot be sent back to the client.
Image:Note.png Note

You will only need the DIET_VOLATILE persistence mode throughout the basic hands on session. Only for the example using a file as INOUT parameter will you need to use the DIET_PERSISTENT mode.

Initializing the Service Table

You have the following function to define the number of services a server can handle, and add the services to the service table (we do not need the diet_convertor_t *cvt, so you can set it to NULL).

int diet_service_table_init(int max_size);
int diet_service_table_add(diet_profile_desc_t *profile,
                           diet_convertor_t    *cvt,
                           diet_solve_t         solve_func);
void diet_print_service_table();

Creating a Profile

Here are the main functions available to create a profile.

diet_profile_desc_t* diet_profile_desc_alloc(const char* path,
                                             int last_in, int last_inout, int last_out);
int diet_generic_desc_set(diet_arg_t       *desc,
                          diet_data_type_t  type,
                          diet_base_type_t  base_type );
int diet_profile_desc_free(diet_profile_desc_t* desc);
diet_arg_t* diet_parameter(diet_profile_t* profile, int position);


You need to define the profile of your service. last_in, last_inout and last_out correspond to the index in an array of the last IN, INOUT and OUT parameters (for example if last_in = 0 then we have 1 IN parameter, if last_in = last_inout, then there is no INOUT parameter).

diet_generic_desc_set has to be called for each parameter in the profile. For example:

  diet_generic_desc_set(diet_param_desc(profile,0), DIET_SCALAR, DIET_DOUBLE);

Getting and Setting IN, INOUT and OUT Arguments

You have to get IN and INOUT arguments using functions having the following syntax:

int diet_*_get(diet_arg_t* arg, void** value,
               diet_persistence_mode_t* mode);

For the time being, you can set the mode to NULL. For example you could have for an IN parameter:

  double* coeff;
  ...
  diet_scalar_get(diet_parameter(pb,0), &coeff, NULL);
Image:Note.png Note

The diet_*_get functions are defined in DIET_data.h. Do not forget that the necessary memory space for OUT arguments is allocated by DIET. So the user should call the diet_*_get functions to retrieve the pointer to the zone his/her program should write to.

You have to set INOUT and OUT arguments using functions having the following syntax:

int diet_*_desc_set(diet_arg_t* arg, ...);

where * can be for example scalar, matrix... For example, you could have for an OUT parameter:

  float* time;
  ...
  diet_scalar_get(diet_parameter(pb,2), &time, NULL);
  ...
  diet_scalar_desc_set(diet_parameter(pb,2), time);


Image:Note.png Note

You can also have a look at /opt/diet_basic/include/DIET_server.h and /opt/diet_basic/include/DIET_data.h.

Client-side implementation

Using the client_smprod.c skeleton file, write a client for the service defined above. You will need to initialize a matrix and a scalar with known values so as to be able to verify if the answer is correct or not.

You will have to remember that the profile used in the client must match exactly the server profile.

You can also write another version of the client, using asynchronous calls to Diet.

Initializing and Finalizing a DIET Session

The client program must open its DIET session with a call to diet_initialize, which parses the configuration file to set all options and get a reference to the DIET Master Agent. The session is closed with a call to diet_finalize, which frees all resources associated with this session on the client. Note that memory allocated for all INOUT and OUT arguments brought back onto the client during the session is not freed during diet_finalize; this allows the user to continue to use the data, but also requires that the user explicitly free the memory. The user must also free the memory he or she allocated for IN arguments.

/* Before any call to DIET */
diet_error_t diet_initialize(char* config_file_name, int argc, char* argv[]);

/* When you do not need DIET anymore */
diet_error_t diet_finalize();

Create Profile and Set Arguments

Allocate a DIET profile with memory space for its arguments. pb_name is deep-copied. If no IN argument, please give -1 for last_in. If no INOUT argument, please give last_in for last_inout. If no OUT argument, please give last_inout for last_out. Once the profile is allocated, please use set functions on each parameter. For example, the nth argument is a matrix:

   diet_matrix_set(diet_parameter(profile,n),
                   mode, value, btype, nb_r, nb_c, order);
diet_profile_t* diet_profile_alloc(char* pb_name,
                                   int last_in, int last_inout, int last_out);

/* Once you do not need the profile anymore */
int diet_profile_free(diet_profile_t* profile);

You have to set IN and INOUT arguments using functions having the following syntax:

int diet_*_set(diet_arg_t* arg, ...);

where * can be for example scalar, matrix...

Calling a Service

Just call:

diet_error_t diet_call(diet_profile_t* profile);

This function will return once the service has finished.


You can also make asynchronous calls.

diet_error_t diet_call_async(diet_profile_t* profile, diet_reqID_t* reqID);

You then have to wait for the end of the services. You can either wait for one call to finish:

diet_error_t  diet_wait(diet_reqID_t reqID);

Or several calls, either all of them, or any of them, on all submitted requests or on a subset of them:

diet_error_t
  diet_wait_and(diet_reqID_t* IDs, size_t length);
diet_error_t
  diet_wait_or(diet_reqID_t* IDs, size_t length, diet_reqID_t* IDptr);
diet_error_t
  diet_wait_all();
diet_error_t
  diet_wait_any(diet_reqID_t* IDptr);

Freeing Data

Free the amount of data pointed at by the value field of an argument. This should be used ONLY for VOLATILE data,

  • on the server for IN arguments that will no longer be used
  • on the client for OUT arguments, after the problem has been solved, when they will no longer be used.

NB: for files, this function removes the file and frees the path (since it has been dynamically allocated by DIET in both cases)

int diet_free_data(diet_arg_t* arg);


Image:Note.png Note

You can also have a look at /opt/diet_basic/include/DIET_client.h and /opt/diet_basic/include/DIET_data.h.

Setting up and testing the client/server locally

The file env_vars.bash for bash shell (respectively env_vars.csh for tcsh shell) in /opt/diet_basic_tutorial/solutions/ contains all the environment variables needed for the execution of the programs. Verify the values of those variables, then load this file using the following command:

Image:Terminal.png node:
source /opt/diet_basic_tutorial/solutions/env_vars.bash
Image:Note.png Note

This script sets PATH and LD_LIBRARY_PATH to an already installed version of Diet present in /opt.


When done with this operation, you need to start the name server of omniORB: omniNames. To do that, you must give a port number with the -start option (omniNames [-start <port>], default value is 2809), on which the service will be opened (and on which the server "listens"):

~> omniNames -start 

Tue Dec 11 14:10:28 2003:

Starting omniNames for the first time.
Wrote initial log file.
Read log file successfully.
Root context is
IOR:010000002b00000049444c3a6f6d672e6f72672f436f734e616d696e672f4e616d696e674
36f6e746578744578743a312e300000010000000000000060000000010102000d000000313430
2e37372e31332e36310000554f0b0000004e616d6553657276696365000200000000000000080
000000100000000545441010000001c0000000100000001000100010000000100010509010100
0100000009010100
Checkpointing Phase 1: Prepare.
Checkpointing Phase 2: Commit.
Checkpointing completed.

Then, you have to copy this port number in the omniORB configuration file: the name and location of this file is given by the environment variable OMNIORB_CONFIG which is defined in the env_vars.bash file.

Using Diet User's Manual, prepare configuration files (suggestion: place them in a cfgs directory). You will want to create a hierarchy of agents, to make it interesting. This hierarchy will contain at least one MA and one LA. Basically, an agent configuration file has the following syntax:

#****************************************************************************#
# traceLevel for the DIET agent:
#    0  DIET prints only warnings and errors on the standard error output.
#    1  [default] DIET prints information on the main steps of a call.
#    5  DIET prints information on all internal steps too.
#   10  DIET prints all the communication structures too.
#  >10  (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#

traceLevel = 1

#****************************************************************************#
# (*) agentType: Master Agent or Local Agent ? As there is only one excutable
#   for both agent types, it is COMPULSORY to specify the type of this agent.
#****************************************************************************#

agentType = DIET_MASTER_AGENT # or MA if this agent is a master agent
                              # or LA or DIET_LOCAL_AGENT if this agent is a
                              # local agent

#****************************************************************************#
# (*) name: the name of this MA. The ORB configuration files of the clients
#   and the children of this MA (LAs and SeDs) must point at the same CORBA
#   Naming Service as the one pointed at by the ORB configuration file of
#   this agent.
#****************************************************************************#

name = MA1

#****************************************************************************#
# (*) parentName: the name of the agent to which the LA will register. This
#   agent must have registered at the same CORBA Naming Service that is 
#   pointed to by your ORB configuration.
#****************************************************************************#

# parentName = MA1 only useful for a local agent

An agent can be run using the following command:

Image:Terminal.png node:
/opt/diet_basic/bin/dietAgent agent.cfg

Compile server and client using the Makefile, then, launch the server and the client. Here is a client configuration file example:

#****************************************************************************#
# traceLevel for the DIET client library:
#    0  DIET prints only warnings and errors on the standard error output.
#    1  [default] DIET prints information on the main steps of a call.
#    5  DIET prints information on all internal steps too.
#   10  DIET prints all the communication structures too.
#  >10  (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#

traceLevel = 1


#****************************************************************************#
# (*) MAName: the name of the Master Agent to which the client will connect.
#   This agent must have registered at the same CORBA Naming Service that is
#   pointed to by your ORB configuration.
#****************************************************************************#

MAName = MA1

and a server configuration file:

#****************************************************************************#
# traceLevel for the DIET SeD library:
#    0  DIET prints only warnings and errors on the standard error output.
#    1  [default] DIET prints information on the main steps of a call.
#    5  DIET prints information on all internal steps too.
#   10  DIET prints all the communication structures too.
#  >10  (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#

traceLevel = 1


#****************************************************************************#
# (*) parentName: the name of the agent to which the SeD will register. This
#   agent must have registered at the same CORBA Naming Service that is
#   pointed to by your ORB configuration.
#****************************************************************************#

parentName = MA1


Image:Note.png Note

If an element crashes and return an error such as:

DIET ERROR: exception caught (NO_RESOURCES) while attempting to connect to the CORBA name server.
DIET ERROR: cannot locate Master Agent MA1.
DIET initialization failed !
terminate called after throwing an instance of 'omni_thread_fatal'
make sure your OMNIORB_CONFIG environment variable is set to a valid omniORB configuration file (the one used for the deployment of the hierarchy)

Deploying on several nodes

Deploying a whole Diet hierarchy on many nodes may be quite tiresome. A tool has been specifically designed to deploy Diet platforms: GoDiet. The Diet hierarchy is described in an XML file. You won't have to write the whole XML file. We provide a script which creates a basic Diet platform composed of an MA, and one or many SeDs. The script can be found in /opt/scripts/make_diet_platform_basic.py:

Usage: ./make_diet_platform_basic.py <nb SeD> <nodefile> <output_file>
                                     <GoDIEt_scratch_runtime> <path to SeD>
  Creates a simple DIET hierarchy: 1 MA, <nb SeD> SeDs
   - <nb SeD>: number of SeDs
   - <nodefile>: file containing the list of nodes, e.g. $OAR_NODEFILE
   - <output_file>: the XML file to create
   - <GoDIEt_scratch_runtime>: path to an existing directory where
      GoDIET will write the configuration files
   - <path to SeD>: path to SeD binary

Then, you just need to run:

Image:Terminal.png node:
godiet <XML file>

Retrieve the omniORB configuration file, set your OMNIORB_CONFIG environment variable. You can now submit requests using your client. You can modify the XML file to produce different Diet hierarchies.

Another version of the service

In this part, you will modify the server to make it support a slightly different version of the scalar by matrix product. The matrix will be transmitted as a file, and not anymore in memory.

Diet doesn't impose anything about the data format of files, but it would be a good idea to facilitate your work to use the data format used in the skeleton files. This format is just simple text: the file contains a serie of numbers, separated by 'space' characters. The meaning of the numbers is as follows:

  • matrix dimensions (number of rows, number of columns)
  • matrix values

Create a file containing a matrix, then implement a new service "smprod_file" with the following parameters:

Parameter Type
a scalar double
a file containing matrix to be multiplied double
the time needed for the product to compute float

The file is overwritten by the result.

Exercise 3 - Yet another service, using BLAS dgemm

To compile programs of this exercise, the BLAS library (Basic Linear Algebric Subroutines) is required (it is already installed on the image).

The dgemm_ function is part of the BLAS. It performs the following matrix-matrix computation: Image:diet_tuto_eq_dgemm.png

Image:diet_tuto_eq_alpha.png and Image:diet_tuto_eq_beta.png are scalars, and A, B and C are matrices, with A an m by k matrix, B a k by n matrix and C an m by n matrix. This exercise aims at adding a new service in a Diet platform, that performs the dgemm_ computation. The idea is to interface the existing dgemm_ function to a Diet SeD. Here is its prototype:

void dgemm_(char   *tA,
            char   *tB,
            int    *m,
            int    *n,
            int    *k,
            double *alpha,
            double *A,
            int    *lda,
            double *B,
            int    *ldb,
            double *beta,
            double *C,
            int    *ldc);

All parameters are given by address. Parameters alpha, beta, m, n, k, A, B and C correspond exactly to their respective roles in the computation Image:diet_tuto_eq_dgemm.png. lda, ldb and ldc are the leading dimensions of the corresponding matrices. Since matrices are stored in a classical one-dimension array, it is important to specify if they are stored by rows or by columns. *tA and *tB are characters which have the following semantics:

tA Storage order of A (m,k)
'T' row-major [row 1, row 2, ... , row m]
'N' column-major [col 1, col 2, ... , col k]

For this exercise, there is no need to explore all possibilities offered by the storage order or the leading dimension. Just set *tA and *tB to 'N', and lda, ldb and ldc to the number of rows of the corresponding matrix. Once you have specified the profile of the service, program a server that implements this service, and a test client, using the files skeletons in the exercise3 directory. Matrices will be stored in memory. Eventually, test the client/server architecture, through Diet, in different contexts of execution.

References

  1. Raphaël Bolze, Yves Caniou, Eddy Caron, Pushpinder Kaur Chouhan, Philippe Combes, Sylvain Dahan, Holly Dail, Bruno Delfabro, Georg Hoesch, Mathieu Jan, Jean-Yves L'Excellent, Christophe Pera, Cyrille Pontvieux, Alan Su, Cédric Tedeschi, and Antoine Vernois. http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3/index.html. Also available in pdf: http://graal.ens-lyon.fr/DIET/download/doc/UsersManualDiet2.3.pdf.
  2. A. Amar, R. Bolze, Y. Caniou, E. Caron, B. Depardon, J.S. Gay, G. Le Mahec, and D. Loureiro. Tunable scheduling in a gridrpc framework. Concurrency Computation: Practice Experience , 2008.
  3. E. Caron and F. Desprez. DIET: A scalable toolbox to build network enabled servers on the grid. International Journal of High Performance Computing Applications , 20(3):335--352, 2006.

Advanced Data Management Using DAGDA

This is a tutorial introducing how to use Dagda (Data Arrangement for the Grid and Distributed Applications) as the data manager for Diet.

You can use the same KaDeploy image as for the previous exercises of this tutorial. You just need to load another environment variables setting script to fix the good paths and configuration settings. Diet has already been compiled with the relevant options, and installed in /opt/diet_dagda_wf.

The file env_vars.bash for bash shell (respectively env_vars.csh for tcsh shell) in /opt/diet_dagda_tutorial/solutions/ contains all the environment variables needed for the execution of the programs. Verify the values of those variables, then load this file using the following command:

Image:Terminal.png node:
source /opt/diet_dagda_tutorial/solutions/env_vars.bash
Image:Warning.png Warning

In this section of the tutorial, you need to use a version of Diet compiled with Dagda support. If something goes wrong, make sure your PATH and LD_LIBRARY_PATH are set to the path of the correct installation of Diet.


Introduction

Data Arrangement for Grid and Distributed Application (Dagda) is a new data manager for the Diet middleware which allows data explicit or implicit replications and advanced data management on the grid. It was designed to be backward compatible with previously developed applications for Diet but also introduces some new features for the direct data management inside a Diet application.

Using Dagda, the Diet users can do:

  • Explicit or implicit data replications.
  • A data status backup/restoration system.
  • File sharing between the nodes which can access to the same disk partition.
  • Choice of a data replacement algorithm.
  • Configure the memory and disk space Diet should use for the data storage and transfers.

Dagda transfer model

Figure 1: Dagda architecture.

To transfer a data, Dagda uses the pull model instead of the push model used by DTM (Data Tree Manager, Diet historical data manager): The data are not sent into the profile from the source to the destination, but they are downloaded by the destination from the source.

Figure 1 presents how Dagda manages the data transfers for a standard Diet call:

  • The client performs a Diet service call.
  • Diet selects one or more SeDs to execute the service.
  • The client submits its request to the selected SeD, sending only the data descriptions.
  • The SeD downloads the new data from the client and the persistent ones from the nodes on which they are stored.
  • The SeD executes the service using the input data.
  • The SeD performs updates on the inout and out data and sends their descriptions to the client. The client then downloads the volatile and persistent return data.

At each step of the submission, the transfers are allways launched by the destination node. All the transfers operations are transparently performed and the user does not have to modify its application which uses data persistency to take benefits of the data replications.

Remark: After the call, the persistent output data obtained from other nodes are replicated on the SeD and will stay on it until they are erased by the user or by the data replacement algorithm.

The Dagda API

In this section we present the part of the Dagda API used for this hands-on. For a complete Dagda API presentation, please refer to http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3

Basic data management

dagda_put_file(char* path, diet_persistence_mode_t mode, char**ID)

This macro adds the file of path path with the persistence mode mode to the platform and stores the data ID into ID.

dagda_get_file(char* ID, char** path)

The file of ID ID is obtained from Dagda and the *path is the path to the file. character.

Asynchronous data management

dagda_put_file_async(char* path, diet_persistence_mode_t mode)

This function works like the synchronous version but returns an unsigned int which must be used as the transfer reference for the waiting functions.

dagda_wait_data_ID(unsigned int transferRef, char** ID)

The transferRef argument is the value returned by a dagda_put_*_async function. The ID content will be initialized to a pointer on the data ID.

dagda_get_file_async(char* ID)

This function works like the synchronous one, returning a transfer reference used by the wait function.

dagda_wait_file(unsigned int transferRef, char** path)

The transferRef argument is the value returned by the dagda_get_file_async function. The path content will be initialized to a pointer on the file path.

Data alias

dagda_data_alias(const char* id, const char* alias)

Tries to associate alias to id. If the alias is already defined, returns a non zero value. A data can have several aliases but an alias is always associated to only one data.

dagda_id_from_alias(const char* alias, char** id)

This function tries to retrieve the data id associated to the alias.

Data replication

After a data has been added to the Dagda hierarchy, the users can choose to replicate it explicitely on one or several Diet nodes. With the current Dagda version, we allow to choose the nodes where the data will be replicated by hostname or Dagda component ID. In future developments, it will be possible to select the nodes differently. To maintain backward compatibility, the replication function uses a C-string to define the replication rule.

dagda_replicate_data(const char* id, const char* rule)

The replication rule is defined as follows: "<Pattern target>:<identification pattern>:<Capacity overflow behavior>"

  • The pattern target can be ID or host.
  • The identification pattern can contain some wildcards characters (for example ”*.lyon.grid5000.fr” is a

valid pattern.)

  • The capacity overflow behavior can be replace or noreplace. replace means the cache replacement algorithm will be used if available on the target node (a data could be deleted from the node to leave space to store the new one). noreplace means that the data will be replicated on the node if and only if there is enough storage capacity on it.

For example, ”host:capricorne-*.lyon.*:replace” is a valid replication rule.

Exercise 1

Here we see how to share data between different users. Dagda provides an alias system to share the data identifier easily between the users. We will use it to write a simple application that does not use a specific Diet service.

This application should work as follows:

If the user passes two parameters to the application: the first one is a filename and the second one is a “human readable” identifier. If the file does not exist, we try to use the identifier to obtain it from the platform. Otherwise, we send the file on the platform and tries to define the identifier as an alias name for the data on the grid. If the identifier is already used, the application add a number at the end of it until it finds an available one. (e.g.: if file.txt is used, then try file.txt-0, file.txt-1...).

The /opt/diet_dagda_tutorial/exercise1 directory contains all the skeleton files needed for a quick start. This directory contains the following files:

Makefile
management of dependencies between source and compiled files
data_sharing.cc
The skeleton for the data sharing application.

Copy them in a working directory.

Image:Warning.png Warning

Throughout the exercices, if you make local tests, or if you run the clients on the same machine as one of the agents or SeDs, you need to set the Dagda storage directory of each element to a separate directory. To do so you can either modify the configuration files by adding the following line:

storageDirectory = <path to an existing directory>

or add in your GoDiet XML file, for each Diet elements the following line:

<cfg_option>
  ...
  <option name='storageDirectory' value='path to an existing directory'
</cfg_options>

Make sure the path you give really exists on the node where the Diet element will be deployed, GoDiet won't create it.

If you do not use this option, a file transfered from, for example, the MA to the client located on a same machine will be read from and written to the same path, which will lead to corrupt files. Another possiblity is, of course, to run each Diet element and the client on a separate node.

Exercise 2 - Asynchronous data management

In this example, we will use the data asynchronous transfers capacities of Dagda to write an application which takes several parameters:

-send <<filename ID> <filename ID> ...> 
-recv < ID ID > 

This application sends the files on the platform using their respectives ID as alias and downloads the data of ID passed after the -recv option. All the data transfers will be done simultaneously

The /opt/diet_dagda_tutorial/exercise2 directory contains all the skeleton files needed for a quick start. This directory contains the following files:

Makefile
management of dependencies between source and compiled files
async_data_mngt.cc
The skeleton for the data asynchronous management application.

Copy them in a working directory.

Exercise 3 - Data replications

Now, we will write an application which takes a filename as parameter, send it on the platform and then replicate it on the nodes located on a particular cluster (e.g: all the nodes of the capricorne cluster in Lyon).

You can compare the time needed to get the data in function of the data replications level. To obtain significant results, deploy the Diet hierarchy and launch the client on different clusters.

The /opt/diet_dagda_tutorial/exercise3 directory contains all the skeleton files needed for a quick start. This directory contains the following files:

Makefile
management of dependencies between source and compiled files
data_replication.cc
The skeleton for the data replication application.

Copy them in a working directory.

Exercise 4 - A Diet application using Dagda

For this exercise, we will use what we saw previously about data management in Dagda to develop a Diet client-server application that takes advantage of the data management features provided by Dagda.

We want to develop a service which counts the number of occurences of a string in a text file. This service takes 5 arguments:

  • The file on which the service searches the string
  • The string to search
  • The offset to start the search
  • The length of the search
  • The number of occurences in this part of the file

We need the client application which call the service asynchronously on different parts of the file: The application will take a filename, the string to search, the number of parallel calls and an alias name for the file.

The application tries to obtain the ID of the data corresponding to the alias. If the alias does not exist, the application adds the file on the Diet hierarchy and then associates the alias to it. Then, it calls the service asynchronously using the data identifier instead of the file, avoiding a client to server file transfer.

In a third step, we want to replicate the data on the SeDs and compare the execution time. To obtain significant results, you will deploy the Master Agent, the SeDs and the client on different clusters. You will find a large text file to use as entry file here: /home/lyon/glemahec/large_text.txt (~1GB of french texts file)

The /opt/diet_dagda_tutorial/exercise3 directory contains all the skeleton files needed for a quick start. This directory contains the following files:

Makefile
management of dependencies between source and compiled files
count_client.cc
The skeleton for the client application.
count_server.cc
The skeleton for the server application.

Copy them in a working directory.

Workflows Management

Exercise 1: Installing and compiling Diet with workflow support

You can skip this exercise by using the already compiled /opt/diet_dagda_wf distrib. To compile DIET with workflow support, follow the same steps as in the 'Introductory Hands-on' but in the ccmake configuration you must turn on workflow support (DIET_USE_WORKFLOW option).

Exercise 2: Executing a DAG workflow

In this section, we learn how to execute a basic DAG workflow using Diet MaDag agent. We will use the context of integer arithmetic for the services invoked within the workflow.

Client-server binaries

The /opt/diet_dagda_wf/bin/examples/workflow directory contains all the binary files needed for a quick start. This directory contains the following files:

scalar_server
program implementing the services (SUCC, DOUBLE, SUM, SQUARE)
client_scalar
program using the workflow service to execute the basic workflow defined in the XML file
generic_client
generic client program to execute a workflow

Copy them in a working directory.

XML Dag description

The MaDag agent understands a specific language used to describe the workflow structure. This language is written in XML and contains the following tags:

dag 
the main tag that contains the whole dag description
node 
one node corresponds to a task to execute on the grid with given input data described in the 'arg' and 'in' tags. The name of the DIET service is given in the 'path' attribute. The 'id' attribute is used to identify the node and must be unique within the dag.
arg 
an input port with a given static value that is given in the 'value' attribute. The 'type' attribute contains one of the usual DIET data types.
in 
an input port connected to the output port of another node using the 'source' attribute
out 
an output port

The workflow we will use in the exercise is described in the file /opt/diet_dagda_wf/bin/examples/workflow/xml/scalar.xml.

First, copy this file in the working directory.

This file describes a workflow using three different services:

succ 
data_in -> data_in + 1
double 
data_in -> 2x data_in
sum 
data_in1, data_in2 -> data_in1 + data_in2
square 
data_in -> square_root ( data_int)

These services are combined to compute a mathematical formula by connecting the outputs of some services to the outputs of other services.

Setting up and testing the workflow locally

To run the workflow you must first launch the Diet hierarchy as in the 'Introductory Hands-on' with an additional agent called the MaDag (Master Agent DAG).

Using Diet User's Manual, prepare configuration files http://graal.ens-lyon.fr/DIET/UsersManualDIET2.3/node111.html (suggestion: place them in a cfgs directory). You will want to create a hierarchy of agents, to make it interesting. This hierarchy will contain at least one MADAG, one MA and one LA.

Launch first the MA and the server, then the MADAG and finally the client using the scalar.xml workflow definition. You can first use the client_scalar binary which is specifically designed for the 'scalar' workflow, then you can use the generic_client binary that can be used to run any workflow defined in the MaDag XML language.

Here is an example of an MADAG configuration file:

#****************************************************************************#
# traceLevel for the DIET agent:
#    0  DIET prints only warnings and errors on the standard error output.
#    1  [default] DIET prints information on the main steps of a call.
#    5  DIET prints information on all internal steps too.
#   10  DIET prints all the communication structures too.
#  >10  (traceLevel - 10) is given to the ORB to print CORBA messages too.
#****************************************************************************#

traceLevel = 1


#****************************************************************************#
# (*) agentType: 
#****************************************************************************#

agentType = DIET_MA_DAG


#****************************************************************************#
# (*) name: the name of this MA DAG. 
#****************************************************************************#

name = mad


#****************************************************************************#
# (*) parentName: the name of the Master Agent
#****************************************************************************#

parentName = MA1

You can run the MADAG using the following command: An agent can be run using the following command:

Image:Terminal.png node:
/opt/diet_dagda_wf/bin/maDagAgent madag.cfg

You also have to specify the MADAG a client has to connect to, to submit workflows, this is done by adding the following line in the client configuration file:

#****************************************************************************#
# MADAGNAME: the name of the MA DAG agent to wich the client will connect
#****************************************************************************#

MADAGNAME = mad
Image:Warning.png Warning

In this section of the tutorial, you need to use a version of Diet compiled with workflow support. If something goes wrong, make sure your PATH and LD_LIBRARY_PATH are set to the path of the correct installation of Diet.

Deploying on several nodes

Deploying a whole Diet hierarchy on many nodes may be quite tiresome. A tool has been specifically designed to deploy Diet platforms: GoDiet. The Diet hierarchy is described in an XML file. You won't have to write the whole XML file. We provide a script which creates a basic Diet platform composed of an MA, a MADAG and one or many SeDs. The script can be found in /opt/scripts/make_diet_platform_wf.py:

Usage: ./make_diet_platform_wf.py <nb SeD> <nodefile> <output_file>
                                     <GoDIEt_scratch_runtime> <path to SeD>
  Creates a simple DIET hierarchy: 1 MA, <nb SeD> SeDs
   - <nb SeD>: number of SeDs
   - <nodefile>: file containing the list of nodes, e.g. $OAR_NODEFILE
   - <output_file>: the XML file to create
   - <GoDIEt_scratch_runtime>: path to an existing directory where
      GoDIET will write the configuration files
   - <path to SeD>: path to SeD binary

Then, you just need to run:

Image:Terminal.png node:
godiet <XML file>

Retrieve the omniORB configuration file, set your OMNIORB_CONFIG environment variable. You can now submit requests using your client. You can modify the XML file to produce different Diet hierarchies.

Another version of the workflow

In this part, you will modify the workflow description to produce a different computation.

The "multiplied by 4" operation is originally done using the sum of the result of two "double" operation. You can add a branch to the workflow using the output of one "double" operation to compute the same value by doubling its result. Then add a "square" node to get the same result as the original branch.

Note that the connection between inputs and outputs of tasks is done using the 'source' attribute of the in element within a node element.

To execute the new workflow and get the results, you should use the generic_client binary (in the command-line parameters you must set the type of workflow to 'dag' and not 'wf'; the 'wf' type uses a different workflow language of which you can see examples in the xml directory).

Exercise 3: Launching multiple workflows simultaneously

The objective of this exercise is to use the different options offered by the MADAG agent for workflow scheduling in the context of multiple concurrent workflow requests

Selecting the scheduling heuristic of the MaDAG agent

The MADAG agent implements different heuristics to optimize multi-workflow scheduling. You can choose between those heuristics by setting one of the following command-line options (for the 'maDagAgent' program):

  • -g_heft: ("global heterogeneous earliest finish time") this heuristic extends the classical HEFT heuristic by creating a virtual meta-workflow that contains all the submitted workflows
  • -g_aging_heft: similar to the previous one but applies an "aging" factor to the HEFT priorities in order to favorize workflows that were submitted first
  • -fairness: this heuristic computes a "slowdown" factor for each dag that is proportional to the difference between the actual estimated makespan minus the estimated makespan when the dag is executed without platform limitation. The priority is given to the dag with the highest "slowdown" value.
  • -srpt: ("Shortest remaining processing time") this heuristic gives the priority to dags having the smallest value for remaining estimated processing time.
  • -fcfs: ("First come First served") as stated this heuristic uses only the order of arrival to set the dag priority

You can modify the command-line option used for the MaDag agent by modifying the GoDIET xml file that you used in Exercise 2: within the diet_hierarchy/master_agent/ma_dag tag you must uncomment the parameters tag and use the choosen option in the 'string' attribute.

Using a script to launch several workflows one after the other

All required files are available in the /opt/diet_wf_tutorial/exercise3 directory. The workflow_exp.sh script must be launched with the exp1_example.desc.txt configuration file as parameter. This configuration file must be adapted to your setup (working directory and goDiet directories). To test the script, you must first launch the Diet platform using GoDIET and then launch the script workflow_exp.sh with your experiment description

Image:Terminal.png node:
./workflow_exp.sh <exp.desc.txt>

After the launch you can monitor the end of your experiment by using the script watch.sh as follow :

Image:Terminal.png node:
watch ./watch.sh <exp.desc.txt>

Using VizDIET to vizualize the DAGs execution schedule

At the end of the script execution, a directory is created (e.g. directory exp1.FOFT.loop-5.sleep-2: the exact name depends on the script parameters) containing all the logs of the different agents and a global log file (e.g. DietLog.FOFT.loop-5.sleep-2.log) that contains all the main events registered by the Diet hierarchy of agents and sent to the central logging agent. This file can be used as input for the VizDiet tool to display graphically the execution schedule.

For your information, the VizDiet tool can be used with a direct connection to the central logging agent in order to display the schedule in real-time, but in this case it must be launched on Grid5000 and therefore the X display must be exported.

In this exercise, we propose that you install VizDiet locally on your computer by downloading the Jar file from the /opt/diet_utils/VizDiet directory. Then simply launch the Java program with this Jar file and this should display the GUI of VizDiet. Select the 'Read Log file' box and press the 'Open' button to open a dialog box where you will select the previously mentionned DietLog file. Then press 'OK', the main VizDiet window will appear and by pressing the 'Play button' (leftmost icon) VizDiet will read the log file. A diagram of the Diet hierarchy will be displayed on the main window. In the 'View' menu, you can select the 'Statistics' option to display the schedule. By default the colors used identify the different SeDs but you can switch to 'Dags' colors by changing the drop-down list at the bottom of the diagram. You can also switch between different types of diagrams by clicking on their name at the top of the diagram (for example the 'Gantt' chart).

The VizDiet tool can greatly help you to check the effect of the different MaDag heuristics: run different experiments with the same parameters but with different MaDag heuristic options (don't forget to restart the Diet hierarchy using GoDiet each time). Then use VizDiet to display the different logs generated.

Conclusion

In this tutorial section we have seen how to run a DAG workflow using Diet and also how the workflow scheduler (the MaDAG) can be configured to use different scheduling heuristics and how these heuristics change the execution schedule. The Diet middleware is able to efficiently execute workflows with a very high number of tasks or a high number of workflows simultaneously. The scheduler automatically adapts to changes in the ressources load and tries to optimize their usage to reduce the workflows makespan. Depending on the strategy for multi-workflow scheduling the different MaDag heuristics can be used to improve the mean or the variance of the workflows makespan distribution.

We hope you enjoyed doing this tutorial and discovering Diet. Thanks a lot!

The Diet team.

Personal tools
Wiki special pages