Large-Scale Trace Visualization Triva

Introduction

The goal of this practical session is to introduce Grid'5000 users to Triva, an open source trace visualization tool for analyzing large-scale parallel and distributed applications. To that end, we present the Pajé file format, the visualization scalability issues that arise in large-scale scenarios, and the two data integration techniques that reduce information in the time and space dimensions. We then present the two visualization techniques (Treemap and GraphView) and let users experiment with four case studies that illustrate the features of the tool.

Basic use of Triva involves selecting a visualization technique, a collected trace file, and a configuration file that maps the entities in the trace to the visualization. Triva can be compiled on any platform that supports GNUstep (Windows, Linux and Mac).

The basic concepts and slides for this practical session are available here.

Installing Triva

Install Triva & GNUstep dependencies

$ sudo apt-get install libxml2-dev libxslt1-dev libssl-dev libx11-dev \
                      libxext-dev libxt-dev libjpeg62-dev libtiff4-dev \
                      libpng12-dev libffi-dev gobjc \
                      build-essential autoconf automake \
                      libgraphviz-dev libmatheval1-dev \
                      libgsl0-dev

Install GNUstep

$ GNUSTEP=gnustep-startup-0.25.0
$ wget http://ftpmain.gnustep.org/pub/gnustep/core/$GNUSTEP.tar.gz
$ tar xfz $GNUSTEP.tar.gz   
$ cd $GNUSTEP
$ sudo ./InstallGNUstep

Make sure you source the GNUstep.sh script to initialize GNUstep environment variables

$ source /usr/GNUstep/System/Library/Makefiles/GNUstep.sh
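Since the later build steps depend on this environment, a quick check before running make can save time. The following is a minimal sketch, assuming that GNUstep.sh exports the GNUSTEP_MAKEFILES variable (as gnustep-make normally does); the function name gnustep_env_ready is hypothetical and not part of GNUstep or Triva.

```shell
#!/bin/sh
# Hypothetical helper: report whether the GNUstep environment looks initialized.
# Assumption: sourcing GNUstep.sh exports GNUSTEP_MAKEFILES.
gnustep_env_ready() {
    if [ -n "${GNUSTEP_MAKEFILES:-}" ]; then
        echo "GNUstep environment set: $GNUSTEP_MAKEFILES"
        return 0
    fi
    echo "GNUstep environment not set; source GNUstep.sh first" >&2
    return 1
}
```

Calling gnustep_env_ready at the start of a build script, and stopping if it fails, avoids confusing make errors when the script was not sourced in the current shell.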

Install Renaissance

$ svn co http://svn.gna.org/svn/gnustep/libs/renaissance/trunk renaissance
$ cd renaissance
$ source /usr/GNUstep/System/Library/Makefiles/GNUstep.sh
$ make
$ sudo make install

Install Pajé

$ git clone -b schnorr/official/master git://paje.git.sourceforge.net/gitroot/paje/paje Paje
$ export LANG=C
$ cd Paje
$ git checkout 
$ make
$ sudo make install

Get the latest development version from the Triva web site

$ svn checkout svn://scm.gforge.inria.fr/svn/triva

Then, we bootstrap, configure, compile and install Triva

$ cd triva
$ ./bootstrap
$ ./configure
$ make
$ sudo make install

Download trace files

The trace files used in the practical session are available here.

After extracting the tarball, we find four trace files

# 2-01092008-6-188np-avec-poa-anomaly.trace
# dt-a-bh-21-g0.fix.trace
# dt-c-sh-448-g1.fix.trace
# dt-c-wh-85-g0.fix.trace

and one configuration file for Triva's graph view

# smpi_uncat.plist

Running Triva

KAAPI Work Stealing traces (Fibonacci Application)

First, we have to define the colors for the states present in the trace file (Idle is blue, Rsteal is red and Run is green):

$ defaults write Triva 'State Colors' '{
   IDLE = "0 0 1";
   RSTEAL = "1 0 0";
   RUN = "0 1 0";
 }'
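The color values above are written as three components between 0 and 1 (red, green, blue). If a color is only known as 8-bit RGB values, it can be converted as sketched below. This is an illustration only: the helper name rgb255_to_unit is hypothetical, and it assumes Triva accepts fractional components, which the examples above (using only 0 and 1) do not confirm.

```shell
#!/bin/sh
# Hypothetical helper: convert 8-bit RGB (0..255) into the "r g b" form
# with components in 0..1, as used in the 'State Colors' default.
# Assumption: fractional components are accepted by Triva.
rgb255_to_unit() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v r="$1" -v g="$2" -v b="$3" \
        'BEGIN { printf "%.2f %.2f %.2f\n", r / 255, g / 255, b / 255 }'
}
```

For example, rgb255_to_unit 0 0 255 gives the same blue that is used for IDLE, written with two decimals.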

Next, we activate the treemap visualization technique using the --treemap switch and use the treemap to look for the anomaly located at the Porto Alegre site.

$ Triva --treemap 2-01092008-6-188np-avec-poa-anomaly.trace

When this command is issued, three windows open: TimeInterval, TypeFilter and SquarifiedTreemap. The first configures the current time-slice used to calculate the values for the visualization, the second selects a subset of the entities present in the trace, and the third shows the treemap visualization itself. The mouse wheel can be used over the treemap to zoom in and out of the hierarchy present in the trace. Colors are associated with the integrated values. We can verify the anomaly at the Porto Alegre site, where processes spent more time in one state (stealing tasks) than in the other (executing tasks).
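The idea of integrating state durations over a time-slice can be sketched outside the tool. The example below is not Triva's implementation and does not read the Pajé format; it assumes a hypothetical, simplified input with one "container state start end" record per line, and sums, for each container and state, the time that falls inside the slice [t0, t1]. The function name integrate_slice is also hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of time-slice integration (not Triva's code).
# Input on stdin: "container state start end", one record per line.
# Arguments: t0 t1, the bounds of the time-slice.
integrate_slice() {
    LC_ALL=C awk -v t0="$1" -v t1="$2" '
        {
            # Clip the state interval to the slice [t0, t1].
            s = ($3 + 0 > t0 + 0) ? $3 + 0 : t0 + 0
            e = ($4 + 0 < t1 + 0) ? $4 + 0 : t1 + 0
            # Accumulate only the part that overlaps the slice.
            if (e > s) total[$1 " " $2] += e - s
        }
        END { for (k in total) printf "%s %.2f\n", k, total[k] }
    ' | LC_ALL=C sort
}
```

With records such as "p1 RUN 0 4" and "p1 RSTEAL 4 10" and the slice 2 to 8, the output shows p1 spending 2 time units in RUN and 4 in RSTEAL. A process whose RSTEAL total exceeds its RUN total is the kind of imbalance the treemap makes visible.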

NAS Data Traffic Benchmark

We activate the graph visualization technique using the --graph switch, and also provide the configuration file for the tool using the --gc_conf switch. The --gc_hide switch only hides the window that opens to change the configuration dynamically.

We use the first trace file, where the DT-A Black Hole with 21 processes is executed on 92 machines of the Griffon cluster (simulated using SimGrid). We look for the network links limiting the execution of the application. Those links are close to the nodes. What could be done to improve the performance of the application?

$ Triva --gc_conf smpi_uncat.plist --gc_hide --graph dt-a-bh-21-g0.fix.trace

The second NAS-DT benchmark is the class C White Hole with 85 processes, on the same platform. We look for the backbone links limiting the execution. Could a better deployment of the application improve performance?

$ Triva --gc_conf smpi_uncat.plist --gc_hide --graph dt-c-wh-85-g0.fix.trace

The third and last trace file is the NAS-DT benchmark class C Shuffle with 448 processes, executing on the 92 machines. We look for the main backbone limiting the execution of the application. Can we improve the performance of this application considering the platform characteristics?

$ Triva --gc_conf smpi_uncat.plist --gc_hide --graph dt-c-sh-448-g1.fix.trace

Conclusion

We have used Triva to search for anomalies in a KAAPI work stealing application and to pinpoint network bottlenecks in the execution of a benchmark focused on data traffic.

Any comments and feedback are welcome!
