Performance testing of Oar

From Grid5000

Overview

Tsung is a multi-protocol distributed load testing tool. It includes an experimental plugin for OAR testing. In this page, we show how to use this plugin.

Installation

Tsung depends on Erlang. It does not work on Debian lenny, whose Erlang version is too old, so to run Tsung on Debian you need squeeze or a backport of Erlang.

A prebuilt binary package for squeeze is available here: http://tsung.erlang-projects.org/dist/debian/tsung_1.4.1-1_all.deb

You can also use a build against erlang-R14B3 for lenny here: http://public.sophia.grid5000.fr/~nniclausse/lenny/tsung_1.4.1a-1_all.deb

Configuration

First, you need two scripts in your home directory:

  • notify.sh: the notification script used by all jobs to let Tsung know when a job starts and finishes. Tsung listens on a TCP port (33333 in the example below); we use netcat to send notifications to Tsung.
#!/bin/bash
# The first argument is the port Tsung listens on; the rest is the message.
PORT=$1
shift
echo "$*" | nc -q 1 localhost "$PORT"
  • sleep.sh: the job script. Here, the job simply sleeps for $1 seconds.
#!/bin/sh
sleep $1
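
To check the notification path without running Tsung, a small listener can stand in for the port Tsung opens. The following is a minimal sketch of that idea, not part of Tsung or OAR: the helper names are made up for illustration, the payload "start 1234" is a hypothetical message rather than the real notification format, and port 0 asks the operating system for a free port where a real test would use 33333.

```ruby
require "socket"

# Accept a single connection and return the line it carries, roughly what a
# listener on the notification port would see from notify.sh.
def receive_notification(server)
  client = server.accept
  begin
    client.gets.to_s.chomp
  ensure
    client.close
  end
end

# Send one line to a local port, like `echo ... | nc localhost PORT`.
def send_notification(port, *words)
  TCPSocket.open("127.0.0.1", port) do |sock|
    sock.puts(words.join(" "))
  end
end

# Port 0 lets the OS pick a free port (assumption for this demo only).
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

listener = Thread.new { receive_notification(server) }
# Hypothetical payload; the message Tsung really expects is not shown here.
send_notification(port, "start", "1234")
received = listener.value
server.close

puts "received: #{received}"
```

Pointing notify.sh at the port of such a listener shows exactly which arguments a job passes to the script.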

Example of a Tsung configuration file (several "sessions" are defined; see the comments):

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/home/nniclausse/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice" dumptraffic="protocol">

  <clients>
    <client host="localhost" use_controller_vm="true"/>
  </clients>

<!-- server is not used by oar plugin, so this is meaningless -->
 <servers><server host="127.0.0.1" port="5432" type="erlang"/></servers>

  <monitoring>
   <monitor host="localhost" type="erlang"/>
  </monitoring>
 <load>
  <user session="manynodes" start_time="1" unit="second"></user>
  <user session="manycores" start_time="30" unit="second"></user>
  <user session="seq-jobs" start_time="60" unit="second"></user>
  <user session="manycores" start_time="100" unit="second"></user>
  <user session="besteffort" start_time="1" unit="second"></user>
 </load>

<options>
  <option name="job_notify_port" value="33333"></option>
  <option name="tcp_timeout" value="86400000"></option>
</options>

 <sessions>

<!-- besteffort: start a lot of 1 core BE jobs and then a big (64 core) regular job -->
  <session probability="0" name="besteffort" type="ts_job">
   <for from="1" to="1000" incr="1" var="counter">
    <request>
      <job type="oar" name="besteffort" req="submit" resources="/core=1" duration="500" script="/home/nniclausse/sleep.sh" walltime="00:30:00" options="-t besteffort -t idempotent" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
    </request>
   </for>

   <thinktime value="30"/>

   <request>
     <job type="oar" name="bigafterBE" req="submit" resources="/core=64" duration="60" script="/home/nniclausse/sleep.sh" walltime="00:20:00" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
   </request>
   <request><job type="oar" req="wait_jobs"/></request>
 </session>

<!-- start a big job on several nodes, but not using all the cores -->
  <session probability="0" name="manynodes" type="ts_job">
    <request subst='true'>
      <job type="oar" name="manynodes" req="submit" resources="/nodes=8/core=3" duration="190" script="/home/nniclausse/sleep.sh" walltime="00:20:00" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
   </request>
    <request><job type="oar" req="wait_jobs"/></request>
 </session>
<!-- start a big container job (64 cores) and lots of single core jobs inside the container -->
  <session probability="0" name="container" type="ts_job">
    <request>
      <dyn_variable name="container" re="OAR_JOB_ID=(\d+)"/>
      <job type="oar" name="container" req="submit" resources="/core=64" duration="290" script="/home/nniclausse/sleep.sh" walltime="01:20:00" options="-t container" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
    </request>

    <for from="1" to="128" incr="1" var="counter">
      <request subst='true'><job name="inner" type="oar" req="submit" resources="/core=1" duration="120" script="/home/nniclausse/sleep.sh" options="-t inner=%%_container%%" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/></request>
    </for>
    <request><job type="oar" req="wait_jobs"/></request>
  </session>

<!-- multiple cores job -->
  <session probability="0" name="manycores" type="ts_job">
    <request subst='true'>
      <job type="oar" name="manycores" queue="par" req="submit" resources="/core=16" duration="150" script="/home/nniclausse/sleep.sh" walltime="00:20:00" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/>
   </request>
    <request><job type="oar" req="wait_jobs"/></request>
 </session>

<!-- submit a lot of sequential (one core) jobs -->
  <session probability="100" name="seq-jobs" type="ts_job">
      <setdynvars sourcetype="random_number" start="10" end="120">
        <var name="duration" />
      </setdynvars>

      <for from="1" to="200" incr="1" var="counter">
        <request subst='true'><job name="seq" type="oar" req="submit" resources="/core=1" duration="%%_duration%%" script="/home/nniclausse/sleep.sh" notify_script="/home/nniclausse/notify.sh" notify_port="33333"/></request>
      </for>
      <request><job type="oar" req="wait_jobs"/></request>
  </session>
 </sessions>
</tsung>

Save this as ~/.tsung/oar_test.xml
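
Before launching a long test, it can help to check which sessions the <load> section actually schedules. The sketch below is one possible way to do that with REXML, which ships with Ruby. It works on a trimmed excerpt of the configuration above, with the session bodies left out. The function names load_schedule and unscheduled_sessions are made up for this example.

```ruby
require "rexml/document"

# Trimmed excerpt of the configuration above: the <load> section and the
# session names are enough for this check.
CONFIG = <<~XML
  <tsung loglevel="notice">
    <load>
      <user session="manynodes" start_time="1" unit="second"></user>
      <user session="manycores" start_time="30" unit="second"></user>
      <user session="seq-jobs" start_time="60" unit="second"></user>
      <user session="manycores" start_time="100" unit="second"></user>
      <user session="besteffort" start_time="1" unit="second"></user>
    </load>
    <sessions>
      <session probability="0" name="besteffort" type="ts_job"/>
      <session probability="0" name="manynodes" type="ts_job"/>
      <session probability="0" name="container" type="ts_job"/>
      <session probability="0" name="manycores" type="ts_job"/>
      <session probability="100" name="seq-jobs" type="ts_job"/>
    </sessions>
  </tsung>
XML

# Return [start_time, session] pairs ordered by start time; entries with the
# same start time keep their order of appearance in the file.
def load_schedule(xml)
  doc = REXML::Document.new(xml)
  entries = REXML::XPath.match(doc, "//load/user").map do |user|
    [user.attributes["start_time"].to_i, user.attributes["session"]]
  end
  entries.each_with_index
         .sort_by { |(start, _name), index| [start, index] }
         .map(&:first)
end

# Return the names of sessions that are defined but never started in <load>.
def unscheduled_sessions(xml)
  doc = REXML::Document.new(xml)
  defined_names = REXML::XPath.match(doc, "//sessions/session").map { |s| s.attributes["name"] }
  used_names = REXML::XPath.match(doc, "//load/user").map { |u| u.attributes["session"] }
  defined_names - used_names
end

load_schedule(CONFIG).each { |start, name| puts "t=#{start}s  #{name}" }
puts "never started: #{unscheduled_sessions(CONFIG).join(', ')}"
```

With the configuration above, this kind of check reports that the "container" session is defined but not started by any <user> entry.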

How to run a test

Run it by hand

Install your OAR setup with kadeploy (one OAR server, one OAR frontend, many clients), then, on an OAR frontend node, run Tsung:

tsung -f ~/.tsung/oar_test.xml  start

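Change to the test directory (~/.tsung/log/YYYYMMDD-HHMM) and generate the graphical report, during or after the test, with tsung_stats.pl (~/lib/tsung/bin/tsung_stats.pl for an installation in your home directory, /usr/lib/tsung/bin/tsung_stats.pl for the Debian package).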
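
When several tests have been run, the most recent test directory can be found by name, since the YYYYMMDD-HHMM format sorts chronologically. This is a minimal sketch under that assumption. The helper latest_test_dir and the directory names in the demo are hypothetical, and the demo runs in a temporary directory rather than in the real Tsung log directory.

```ruby
require "tmpdir"
require "fileutils"

# Return the path of the most recent YYYYMMDD-HHMM directory under base,
# or nil when there is none. Lexicographic order equals date order here.
def latest_test_dir(base)
  names = Dir.entries(base).select do |name|
    name =~ /\A\d{8}-\d{4}\z/ && File.directory?(File.join(base, name))
  end
  newest = names.max
  newest && File.join(base, newest)
end

# Demo in a throwaway directory with made-up test directories.
demo_base = Dir.mktmpdir
%w[20110909-1530 20110910-0815 notes].each do |name|
  FileUtils.mkdir_p(File.join(demo_base, name))
end

latest = latest_test_dir(demo_base)
puts latest
FileUtils.remove_entry(demo_base)
```

The returned path is where tsung_stats.pl would then be run.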

Deploy and run with g5k-campaign

Install g5k-campaign, save the following file as oar_cluster_multideployment.rb (adapt the site and the number of resources), and then run:

g5k-campaign -i ./lib/oar_cluster_multideployment.rb  --no-cleanup 

This deploys one OAR server node and 20 OAR nodes, sets up OAR, and then runs Tsung on the OAR server, which is used as the frontend.


# Engine used to deploy a cluster with one oar-server and oar nodes on all the other nodes.
# Based on the multiple deployment engine in the examples.

class Oar_cluster < Grid5000::Campaign::Engine

  $base_url = "http://public.sophia.grid5000.fr/~nniclausse/"
  set :environment, $base_url+"oar-node.dsc"
  set :environment_master, $base_url+"oar-server.dsc"
  set :resources, "nodes=21"
  set :site, "sophia"
  $user = defaults[:user]
  $slave_file = "/home/#{defaults[:user]}/slaves.txt"
  $tsung_conf = "oar_job_big.xml"

  on :deploy! do |env, block|
    @master, *@slaves = env[:nodes]
    fail "Well... seems like you didn't reserve more than one node..." if @slaves.empty?

    env[:parallel_deploy!] = parallel

    env_slaves = env.merge(:nodes => @slaves)
    env_master = env.merge(:nodes => @master, :environment => environment_master, :status => "master")
    File.open($slave_file, 'w') {|f| f.write( @slaves.join("\n")) }

    env[:parallel_deploy!].add(env_slaves) { |env|
      deploy!(env, &block)
    }.add(env_master) { |env|
      # master deployment
      deploy!(env, &block)
    }.loop!

    env
  end

  # This procedure will be called for each <job,deployment> tuple.
  on :install! do |env, *args|
    if (env[:status] == "master")
      ssh(env[:nodes].first, 'root') do |ssh|
        logger.info "Installing tsung on master"
        # $base_url already ends with "/", so the path must not start with one
        ssh.exec!("wget --no-proxy -P /tmp " + $base_url + "lenny/tsung_1.4.1a-1_all.deb")
        ssh.exec!("dpkg -i /tmp/tsung_1.4.1a-1_all.deb")
        ssh.exec!("apt-get -y -f install") # will install tsung dependencies
        ssh.exec!("apt-get -y install gnuplot libtemplate-perl")
        ssh.sftp.upload!($slave_file,"/tmp/slaves.txt")

        #  wait for all deployments to be finished setting up oar.
        env[:parallel_deploy!].wait!

        ssh.exec!("yes yes | oar_resources_init /tmp/slaves.txt") do |channel, stream, data|
            logger.debug data
        end
        ssh.exec!("bash /tmp/oar_resources_init.txt")
        ssh.exec!("wget --no-proxy -P /tmp "+ $base_url +  $tsung_conf)
        # TODO: install/modify oar.conf file
      end
    end
    logger.info "Valid nodes=#{env[:nodes].inspect}"
    logger.info "Job=#{env[:job].inspect}"
    logger.info "Deployment=#{env[:deployment].inspect}"
    # nothing to install on nodes
    env
  end

  # run tsung on master
  on :execute! do |env, *args|
    if (env[:status] == "master")
      ssh(env[:nodes].first, $user) do |ssh|
        logger.info "Running tsung on master"
        ssh.exec!("/usr/bin/tsung -f /tmp/" + $tsung_conf+" start > /tmp/tsung.out") do |channel, stream, data|
            logger.info data
        end
        ssh.exec!("grep 'Log directory' /tmp/tsung.out|  cut -f4 -d' ' | tr -d \\\042 > /tmp/tsungdir.out")
        ssh.exec!("cd `cat /tmp/tsungdir.out`; /usr/lib/tsung/bin/tsung_stats.pl")
      end
    end
    env
  end
end
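
The node split done in the :deploy! handler can be tried in isolation. The sketch below reproduces only that step outside g5k-campaign: the first reserved node becomes the master and the others are written, one per line, to the slaves file. The node names and the helper split_nodes are hypothetical, and the file is written to a temporary directory instead of the user's home directory.

```ruby
require "tmpdir"

# Split a node list the way the engine does: first node is the master,
# all remaining nodes are slaves. At least two nodes are required.
def split_nodes(nodes)
  master, *slaves = nodes
  raise ArgumentError, "need more than one node" if slaves.empty?
  [master, slaves]
end

# Hypothetical node names standing in for env[:nodes].
nodes = ["node-1.example", "node-2.example", "node-3.example"]
master, slaves = split_nodes(nodes)

written = nil
Dir.mktmpdir do |dir|
  slave_file = File.join(dir, "slaves.txt")
  # Same format as in the engine: one slave host name per line.
  File.open(slave_file, "w") { |f| f.write(slaves.join("\n")) }
  written = File.read(slave_file)
end

puts "master: #{master}"
puts written
```

With the "nodes=21" reservation used in the engine, this split gives one master and 20 slaves.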

Results

The output generated by Tsung looks like this: [figure: test OAR]


Detailed information on the duration of each job is also available in the file tsung.dump.

The format is:

#date;pid;jobid;job name;submit date;submit time (ms);waiting time (ms);execution time (ms);status

Example:

1315575820.125672;<0.104.0>;100;besteffort;1315575314;340;5400;500833;ok
1315575820.126362;<0.104.0>;101;besteffort;1315575314;284;5020;500929;ok
1315575882.023912;<0.104.0>;1001;bigafterBE;1315575617;332;203714;61054;ok
1315576382.698305;<0.104.0>;107;besteffort;1315575316;366;566092;500506;ok
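
The dump format is simple enough to post-process with a few lines of code. The following is a minimal sketch, assuming the nine semicolon-separated fields listed in the header above. The struct JobRecord and the helpers parse_dump_line and parse_dump are names chosen for this example, and the sample data is the excerpt shown above.

```ruby
# One record per line of tsung.dump, with the fields in header order.
JobRecord = Struct.new(:date, :pid, :job_id, :name, :submit_date,
                       :submit_ms, :waiting_ms, :execution_ms, :status)

# Parse a single data line into a JobRecord, converting numeric fields.
def parse_dump_line(line)
  date, pid, job_id, name, submit_date, submit_ms, waiting_ms, execution_ms, status =
    line.chomp.split(";")
  JobRecord.new(date.to_f, pid, job_id.to_i, name, submit_date.to_i,
                submit_ms.to_i, waiting_ms.to_i, execution_ms.to_i, status)
end

# Parse a whole dump, skipping blank lines and the "#" header line.
def parse_dump(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      .map { |line| parse_dump_line(line) }
end

SAMPLE = <<~DUMP
  #date;pid;jobid;job name;submit date;submit time (ms);waiting time (ms);execution time (ms);status
  1315575820.125672;<0.104.0>;100;besteffort;1315575314;340;5400;500833;ok
  1315575820.126362;<0.104.0>;101;besteffort;1315575314;284;5020;500929;ok
  1315575882.023912;<0.104.0>;1001;bigafterBE;1315575617;332;203714;61054;ok
  1315576382.698305;<0.104.0>;107;besteffort;1315575316;366;566092;500506;ok
DUMP

records = parse_dump(SAMPLE)

# Longest waiting time per job name, in milliseconds.
longest_wait = records.group_by(&:name)
                      .transform_values { |jobs| jobs.map(&:waiting_ms).max }
longest_wait.each { |name, ms| puts "#{name}: #{ms} ms" }
```

On a real run, File.read("tsung.dump") would replace the SAMPLE constant.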