Performance testing of OAR
From Grid5000
Overview
Tsung is a multi-protocol distributed load testing tool. It includes an experimental plugin for testing OAR; this page shows how to use that plugin.
Installation
Tsung depends on Erlang. The Erlang version shipped with Debian lenny is too old for Tsung, so to run Tsung on Debian you need squeeze or a backport of Erlang.
A prebuilt package for squeeze is available here: http://tsung.erlang-projects.org/dist/debian/tsung_1.4.1-1_all.deb
For lenny, you can also use a package built against erlang-R14B3: http://public.sophia.grid5000.fr/~nniclausse/lenny/tsung_1.4.1a-1_all.deb
Configuration
First you need two scripts in your home directory:
- notify.sh: the notification script used by all jobs to let Tsung know when a job starts and finishes. Tsung listens on a TCP port (33333 in the example below); netcat is used to send the notifications to Tsung.
#!/bin/bash
# Usage: notify.sh PORT MESSAGE...
PORT=$1
shift
# Send the remaining arguments to the Tsung notification port.
echo "$*" | nc -q 1 localhost "$PORT"
- sleep.sh: the job script. Here, the job simply sleeps for $1 seconds.
#!/bin/sh
# Sleep for the number of seconds given as the first argument.
sleep "$1"
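To check notify.sh by hand without starting Tsung, a throwaway listener can stand in for Tsung's notification port. The Ruby sketch below makes no assumption about the message format Tsung expects and simply returns whatever text arrives; collect_notifications is a hypothetical helper name, and the demo uses an ephemeral local port rather than 33333.

```ruby
require "socket"

# Accept `count` connections on `server` and return the text each one sent.
def collect_notifications(server, count)
  Array.new(count) do
    client = server.accept
    begin
      client.read.to_s.strip   # read until the sender closes the connection
    ensure
      client.close
    end
  end
end

# Self-contained demo: listen on an ephemeral local port and send one message,
# much as `notify.sh PORT some text` would do through netcat.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
listener = Thread.new { collect_notifications(server, 1) }

TCPSocket.open("127.0.0.1", port) { |s| s.puts "some text" }

puts "received on port #{port}: #{listener.value.inspect}"
server.close
```

To try it against the real script, one option is to bind the server to the port you intend to pass to notify.sh (for instance TCPServer.new("127.0.0.1", 33333)) and run ./notify.sh 33333 hello from another terminal.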
Example of a Tsung configuration file (several "sessions" are defined; see the comments):
<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/home/nniclausse/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice" dumptraffic="protocol">
  <clients>
    <client host="localhost" use_controller_vm="true"/>
  </clients>
  <!-- server is not used by oar plugin, so this is meaningless -->
  <servers><server host="127.0.0.1" port="5432" type="erlang"/></servers>
  <monitoring>
    <monitor host="localhost" type="erlang"/>
  </monitoring>
  <load>
    <user session="manynodes" start_time="1" unit="second"></user>
    <user session="manycores" start_time="30" unit="second"></user>
    <user session="seq-jobs" start_time="60" unit="second"></user>
    <user session="manycores" start_time="100" unit="second"></user>
    <user session="besteffort" start_time="1" unit="second"></user>
  </load>
  <options>
    <option name="job_notify_port" value="33333"></option>
    <option name="tcp_timeout" value="86400000"></option>
  </options>
  <sessions>
    <!-- besteffort: start a lot of 1 core BE jobs and then a big (64 core) regular job -->
    <session probability="0" name="besteffort" type="ts_job">
      <for from="1" to="1000" incr="1" var="counter">
        <request>
          <job type="oar" name="besteffort" req="submit" resources="/core=1" duration="500" script="/home/nniclausse/sleep.sh" walltime="00:30:00" options="-t besteffort -t idempotent" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
        </request>
      </for>
      <thinktime value="30"/>
      <request>
        <job type="oar" name="bigafterBE" req="submit" resources="/core=64" duration="60" script="/home/nniclausse/sleep.sh" walltime="00:20:00" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
      </request>
      <request><job type="oar" req="wait_jobs"/></request>
    </session>
    <!-- start a big job on several nodes, but not using all the cores -->
    <session probability="0" name="manynodes" type="ts_job">
      <request subst='true'>
        <job type="oar" name="manynodes" req="submit" resources="/nodes=8/core=3" duration="190" script="/home/nniclausse/sleep.sh" walltime="00:20:00" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
      </request>
      <request><job type="oar" req="wait_jobs"/></request>
    </session>
    <!-- start a big container job (64 cores) and lots of single core jobs inside the container -->
    <session probability="0" name="container" type="ts_job">
      <request>
        <dyn_variable name="container" re="OAR_JOB_ID=(\d+)"/>
        <job type="oar" name="container" req="submit" resources="/core=64" duration="290" script="/home/nniclausse/sleep.sh" walltime="01:20:00" options="-t container" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
      </request>
      <for from="1" to="128" incr="1" var="counter">
        <request subst='true'><job name="inner" type="oar" req="submit" resources="/core=1" duration="120" script="/home/nniclausse/sleep.sh" options="-t inner=%%_container%%" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/></request>
      </for>
      <request><job type="oar" req="wait_jobs"/></request>
    </session>
    <!-- multiple cores job -->
    <session probability="0" name="manycores" type="ts_job">
      <request subst='true'>
        <job type="oar" name="manycores" queue="par" req="submit" resources="/core=16" duration="150" script="/home/nniclausse/sleep.sh" walltime="00:20:00" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
      </request>
      <request><job type="oar" req="wait_jobs"/></request>
    </session>
    <!-- submit a lot of sequential (one core) jobs -->
    <session probability="100" name="seq-jobs" type="ts_job">
      <setdynvars sourcetype="random_number" start="10" end="120">
        <var name="duration" />
      </setdynvars>
      <for from="1" to="200" incr="1" var="counter">
        <request subst='true'><job name="seq" type="oar" req="submit" resources="/core=1" duration="120" script="/home/nniclausse/sleep.sh" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/></request>
      </for>
      <request><job type="oar" req="wait_jobs"/></request>
    </session>
  </sessions>
</tsung>
Save this as ~/.tsung/oar_test.xml, after adapting the /home/nniclausse paths (DTD, sleep.sh and notify.sh) to your own account.
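Before launching a long test, it can help to run a quick consistency check on the configuration file. The Ruby sketch below is not part of Tsung; check_tsung_config is a hypothetical helper that uses REXML to verify two things: that the session probabilities add up to 100 (which Tsung requires, to our understanding) and that every user entry in the load section refers to a defined session.

```ruby
require "rexml/document"

# Return a list of problems found in a Tsung configuration (empty if none).
def check_tsung_config(xml_text)
  doc = REXML::Document.new(xml_text)

  # Map each session name to its probability.
  sessions = {}
  doc.elements.each("tsung/sessions/session") do |s|
    sessions[s.attributes["name"]] = s.attributes["probability"].to_f
  end

  problems = []
  total = sessions.values.sum
  unless (total - 100).abs < 1e-9
    problems << "session probabilities sum to #{total}, expected 100"
  end

  # Every <user session="..."> must point at a session defined above.
  doc.elements.each("tsung/load/user") do |u|
    name = u.attributes["session"]
    unless sessions.key?(name)
      problems << "user refers to undefined session #{name.inspect}"
    end
  end
  problems
end

# Usage: ruby check_config.rb ~/.tsung/oar_test.xml
if ARGV[0]
  problems = check_tsung_config(File.read(ARGV[0]))
  puts(problems.empty? ? "no problems found" : problems)
end
```

A session that no user entry refers to (such as a session kept for later experiments) is not reported, since it is harmless.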
How to run a test
Run it by hand
Install your OAR setup with kadeploy (one OAR server, one OAR frontend, many clients), then run tsung on an OAR frontend node:
tsung -f ~/.tsung/oar_test.xml start
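Change to the log directory of the test (by default ~/.tsung/log/YYYYMMDD-HHMM) and generate the graphical report, during or after the test, with ~/lib/tsung/bin/tsung_stats.pl (or /usr/lib/tsung/bin/tsung_stats.pl if tsung was installed from the Debian package).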
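When several tests have been run, picking the right directory by hand is error-prone. As a small sketch, the hypothetical helper latest_run_dir below returns the most recent run directory found under a given parent directory. It assumes only the YYYYMMDD-HHMM naming mentioned above, for which lexical order matches chronological order; the parent directory is passed as an argument.

```ruby
# Return the most recent run directory under `base`, or nil if there is none.
# Run directories are assumed to be named YYYYMMDD-HHMM.
def latest_run_dir(base)
  return nil unless File.directory?(base)
  names = Dir.children(base).select do |name|
    name.match?(/\A\d{8}-\d{4}\z/) && File.directory?(File.join(base, name))
  end
  names.empty? ? nil : File.join(base, names.max)
end

# Usage: ruby latest_run.rb PARENT_DIRECTORY
if ARGV[0]
  dir = latest_run_dir(ARGV[0])
  puts(dir || "no run directory found under #{ARGV[0]}")
end
```

The printed path can then be used as the working directory for tsung_stats.pl.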
Deploy and run with g5k-campaign
Install g5k-campaign, save the engine file shown below as ./lib/oar_cluster_multideployment.rb (adapt the site and the number of resources), and then run:
g5k-campaign -i ./lib/oar_cluster_multideployment.rb --no-cleanup
This deploys one OAR server node and 20 OAR nodes, sets up OAR, and then runs tsung on the OAR server, which is used as the frontend.
# Engine used to deploy a cluster with one oar-server and oar nodes on all other nodes.
# Based on the multiple deployment engine in the examples.
class Oar_cluster < Grid5000::Campaign::Engine
  $base_url = "http://public.sophia.grid5000.fr/~nniclausse/"
  set :environment, $base_url+"oar-node.dsc"
  set :environment_master, $base_url+"oar-server.dsc"
  set :resources, "nodes=21"
  set :site, "sophia"
  $user = defaults[:user]
  $slave_file = "/home/#{defaults[:user]}/slaves.txt"
  $tsung_conf = "oar_job_big.xml"

  on :deploy! do |env, block|
    @master, *@slaves = env[:nodes]
    fail "Well... seems like you didn't reserve more than one node..." if @slaves.empty?
    env[:parallel_deploy!] = parallel
    env_slaves = env.merge(:nodes => @slaves)
    env_master = env.merge(:nodes => @master, :environment => environment_master, :status => "master")
    File.open($slave_file, 'w') {|f| f.write( @slaves.join("\n")) }
    env[:parallel_deploy!].add(env_slaves) { |env|
      deploy!(env, &block)
    }.add(env_master) { |env|
      # master deployment
      deploy!(env, &block)
    }.loop!
    env
  end

  # This procedure will be called for each <job,deployment> tuple.
  on :install! do |env, *args|
    if (env[:status] == "master")
      ssh(env[:nodes].first, 'root') do |ssh|
        logger.info "Installing tsung on master"
        ssh.exec!("wget --no-proxy -P /tmp "+ $base_url+ "/lenny/tsung_1.4.1a-1_all.deb")
        ssh.exec!("dpkg -i /tmp/tsung_1.4.1a-1_all.deb")
        ssh.exec!("apt-get -y -f install") # will install tsung dependencies
        ssh.exec!("apt-get -y install gnuplot libtemplate-perl")
        ssh.sftp.upload!($slave_file,"/tmp/slaves.txt")
        # wait for all deployments to be finished setting up oar.
        env[:parallel_deploy!].wait!
        ssh.exec!("yes yes | oar_resources_init /tmp/slaves.txt") do |channel, stream, data|
          logger.debug data
        end
        ssh.exec!("bash /tmp/oar_resources_init.txt")
        ssh.exec!("wget --no-proxy -P /tmp "+ $base_url + $tsung_conf)
        # TODO: install/modify oar.conf file
      end
    end
    logger.info "Valid nodes=#{env[:nodes].inspect}"
    logger.info "Job=#{env[:job].inspect}"
    logger.info "Deployment=#{env[:deployment].inspect}"
    # nothing to install on nodes
    env
  end

  # run tsung on master
  on :execute! do |env, *args|
    if (env[:status] == "master")
      ssh(env[:nodes].first, $user) do |ssh|
        logger.info "Running tsung on master"
        ssh.exec!("/usr/bin/tsung -f /tmp/" + $tsung_conf+" start > /tmp/tsung.out") do |channel, stream, data|
          logger.info data
        end
        # extract the log directory from tsung's output (\042 is the double quote character)
        ssh.exec!("grep 'Log directory' /tmp/tsung.out| cut -f4 -d' ' | tr -d \\\042 > /tmp/tsungdir.out")
        ssh.exec!("cd `cat /tmp/tsungdir.out`; /usr/lib/tsung/bin/tsung_stats.pl")
      end
    end
    env
  end
end
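The grep/cut/tr pipeline in the execute! step is terse, so here is a plain Ruby sketch of what it appears to do. The line format is an assumption inferred from the pipeline itself (the path is the fourth space-separated field of the "Log directory" line and is wrapped in double quotes); log_directory is a hypothetical helper and the sample output is made up for illustration.

```ruby
# Extract the log directory from the text tsung prints when it starts.
# Assumed line format, inferred from the shell pipeline:
#   "Log directory is: /some/path"
def log_directory(output)
  line = output.each_line.find { |l| l.include?("Log directory") }  # grep 'Log directory'
  return nil unless line
  field = line.strip.split(" ")[3]   # roughly: cut -f4 -d' '
  field && field.delete('"')         # tr -d \042 (octal 042 is the double quote)
end

# Made-up sample of tsung's start-up output.
sample = "Starting Tsung\n\"Log directory is: /tmp/tsung/log/20110909-1535\"\n"
puts log_directory(sample)
```

Unlike cut, split(" ") collapses runs of whitespace, so the two only agree when fields are separated by single spaces, as in the assumed format.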
Results
The output generated by tsung looks like this: test OAR
Detailed information on the duration of each job is also available in the file tsung.dump.
The format is:
#date;pid;jobid;job name;submit date;submit time (ms);waiting time (ms);execution time (ms);status
Example:
1315575820.125672;<0.104.0>;100;besteffort;1315575314;340;5400;500833;ok
1315575820.126362;<0.104.0>;101;besteffort;1315575314;284;5020;500929;ok
1315575882.023912;<0.104.0>;1001;bigafterBE;1315575617;332;203714;61054;ok
1315576382.698305;<0.104.0>;107;besteffort;1315575316;366;566092;500506;ok
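Since tsung.dump is a plain semicolon-separated file, per-job statistics are easy to compute with a short script. The following Ruby sketch parses the nine fields listed in the format line and reports, for each job name, the number of jobs and the mean waiting and execution times. The field names in DumpRecord and the helper names are our own labels, not Tsung identifiers.

```ruby
# One record per line of tsung.dump, following the format line above.
DumpRecord = Struct.new(:date, :pid, :job_id, :name, :submit_date,
                        :submit_ms, :wait_ms, :exec_ms, :status)

# Parse one line; returns nil for comments, blank lines or malformed records.
def parse_dump_line(line)
  line = line.strip
  return nil if line.empty? || line.start_with?("#")
  f = line.split(";")
  return nil unless f.size == 9
  DumpRecord.new(f[0].to_f, f[1], f[2].to_i, f[3], f[4].to_i,
                 f[5].to_i, f[6].to_i, f[7].to_i, f[8])
end

# Group records by job name; compute count and mean waiting/execution time (ms).
def summarize(records)
  records.group_by(&:name).map do |name, recs|
    n = recs.size
    [name, { count: n,
             mean_wait_ms: recs.sum(&:wait_ms) / n.to_f,
             mean_exec_ms: recs.sum(&:exec_ms) / n.to_f }]
  end.to_h
end

# Usage: ruby dump_summary.rb tsung.dump
if ARGV[0]
  records = File.foreach(ARGV[0]).map { |l| parse_dump_line(l) }.compact
  summarize(records).each do |name, s|
    puts format("%-12s n=%d wait=%.0f ms exec=%.0f ms",
                name, s[:count], s[:mean_wait_ms], s[:mean_exec_ms])
  end
end
```

Grouping by job name keeps the sessions of the configuration file apart (besteffort, bigafterBE, and so on), which is usually the comparison of interest.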
