Metroflux Usage

From Grid5000

Introduction

Description

The aim of Metroflux is to give Grid’5000 users access to the state of the network. The service provides a web interface as well as an API that exposes the state of the network.

The state of the network is currently obtained using software probes (sFlow, NetFlow) and hardware probes (GNET, DAG) that capture traffic at different data scales (aggregates, flows, packets) and different time scales (from milliseconds to minutes). Metroflux is also able to analyze the network traffic between Grid5000 sites by monitoring the link of each site to RENATER.

[Figure: Metroflux Lyon]

Metroflux is a multipoint measurement tool intended to become available on all Grid5000 sites in the near future. For the moment, it is available only in Lyon and Lille (and is being installed in Rennes). It supports i) arbitrary traffic queries that run continuously on the live data streams and ii) retrospective queries that analyze past traffic data to enable network forensics.

Unlike active monitoring systems (which monitor networks by injecting custom packets), Metroflux is a passive system: ingress/egress packet headers are captured on the fly, then forwarded to and stored on external servers, from which the data is later queried and retrieved.

Why/When to use Metroflux

Examples of questions that Metroflux can answer:

  • Q1: What are the traffic characteristics of my own experiment (throughput, number of flows, top IP addresses, top ports, ...)?
  • Q2: What is the bandwidth cost of a specific operation (e.g. migrating virtual machines)?
  • Q3: When did my machines start to send data?
  • Q4: Was my experiment affected by insufficient bandwidth?

Other possible uses of Metroflux:

  • U1: Use Metroflux to improve load balancing.
  • U2: Use Metroflux to detect network attacks.
  • U3: Satisfy your curiosity about Grid5000 network utilization and characteristics.

Querying Metroflux

As of now, Metroflux is installed in Lille and Lyon. The machines currently monitoring and analyzing the corresponding network traffic are:

  • Lyon -> {metroflux,flows}.lyon.grid5000.fr. metroflux.lyon uses a dedicated hardware probe and monitors only Lyon egress traffic, whilst flows.lyon acts as an sFlow collector and monitors both egress and ingress traffic of Lyon.
  • Lille -> metroflux.lille.grid5000.fr, which acts as a NetFlow-based software probe and monitors both the output and the input traffic of Lille.

Syntax

A query consists of an analysis name (module) and a set of parameters, both passed in an HTTP GET request. They are encoded in the request URL according to the schema:

 http://host:port/module?parameters

or

 http://host:port/service

where service=status.

Parameters are encoded as in standard HTTP query strings: “param1=value1&param2=value2&…”. There are standard parameters that the core processes understand, and additional module-specific parameters.
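The encoding rules above can be sketched in Python with the standard urllib.parse module. This is a minimal sketch, not part of Metroflux: the helper names build_query and fetch are hypothetical, port 44444 is taken from the examples on this page, and it assumes the server decodes standard percent-encoding (a literal '+' in a relative time such as +10m is sent as %2B, while spaces in a filter become '+', as in the filter example further down this page).

```python
from urllib.parse import urlencode, quote_plus
from urllib.request import urlopen

# Port used by every example on this page.
DEFAULT_PORT = 44444


def build_query(host, module, params=None, port=DEFAULT_PORT):
    """Build http://host:port/module?param1=value1&... (hypothetical helper).

    ':' and '@' are left unescaped so time ranges stay readable
    (e.g. time=-10h:0); spaces become '+', and a literal '+' becomes %2B.
    """
    url = f"http://{host}:{port}/{module}"
    if params:
        url += "?" + urlencode(params, safe=":@", quote_via=quote_plus)
    return url


def fetch(url, timeout=30):
    """Return the body of a reply as text (requires access to the Grid5000 network)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_query("flows.lyon.grid5000.fr", "traffic", {"time": "-10h:0"}))
    # Service requests take no parameters, e.g. the status service:
    print(build_query("metroflux.lille.grid5000.fr", "status"))
```

Calling fetch(build_query(...)) from a Grid5000 frontend would then return the module output as plain text.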

Parameters include:

  • start=<unix timestamp>, UNIX timestamp of the beginning of the time window of interest.
  • end=<unix timestamp>, UNIX timestamp of the end of the time window of interest.
  • time=<time>:<time>, specifies a time range. "<time>" can be an exact timestamp in the format "@[cc[yy[mm[dd[hh]]]]]mmss" as well as a relative timestamp in the format "[+|-](<number>{d,h,m,s})".
  • format=<string>, specifies the output format of the query. The supported formats depend on the analysis module itself.
  • wait=yes|no, when the end time of the query is in the future, specifies whether to wait until the necessary packets have been processed or to return only the information currently available.
  • gridjobid=<OARGRID job identifier>, restricts the analysis to the nodes belonging to that grid job.
  • jobid=<OAR job identifier>, restricts the statistics to the local nodes assigned to that OAR job.
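The two notations accepted by the time parameter can be checked on the client side before a query is sent. This is a minimal sketch based on our reading of the syntax above, with a hypothetical parse_time helper: a relative time is converted to an offset in seconds, and an absolute time is split into two-digit fields read from the right (seconds, minutes, then hour, day, month, year and century). Treating a value without a unit, such as the 0 in time=-10h:0, as seconds is an assumption; the server remains the authority on what it accepts.

```python
import re

# Seconds per unit of a relative time such as "-10h" or "+10m".
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

RELATIVE_RE = re.compile(r"^([+-]?)(\d+)([dhms]?)$")
ABSOLUTE_RE = re.compile(r"^@(\d{4}|\d{6}|\d{8}|\d{10}|\d{12}|\d{14})$")

# Absolute fields, from the rightmost (always present) to the leftmost.
ABSOLUTE_FIELDS = ["second", "minute", "hour", "day", "month", "year", "century"]


def parse_time(spec):
    """Parse one side of time=<time>:<time> (hypothetical helper).

    Returns ("relative", offset_in_seconds) or ("absolute", {field: int}).
    ASSUMPTION: a relative value without a unit (e.g. "0") is in seconds.
    """
    m = RELATIVE_RE.match(spec)
    if m:
        sign, number, unit = m.groups()
        seconds = int(number) * UNIT_SECONDS[unit or "s"]
        return ("relative", -seconds if sign == "-" else seconds)
    m = ABSOLUTE_RE.match(spec)
    if m:
        digits = m.group(1)
        # Split into 2-digit groups and read them right to left: ss, mm, hh, dd, mm, yy, cc.
        pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
        return ("absolute", dict(zip(ABSOLUTE_FIELDS, reversed(pairs))))
    raise ValueError(f"unrecognised time specification: {spec!r}")


def parse_time_range(value):
    """Split a time=<time>:<time> value and parse both ends."""
    start, sep, end = value.partition(":")
    if not sep:
        raise ValueError("expected <time>:<time>")
    return parse_time(start), parse_time(end)
```

For instance, parse_time_range("@093000:+10m") describes a window starting at 09:30:00 and ending 600 seconds later.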


One parameter makes the analysis read its input packet stream from a module instead of the packet stream coming from the probe:

  • source=<module>, specifies the module whose output is used as the input packet stream of this query, instead of the actual traffic.

This parameter is very useful if you want to customize the default analyses provided by the system. The accepted source modules are trace and tuple.


The following parameters customize the analysis (the source parameter must be specified):

  • filter=<expression>, specifies a filter on the incoming packet stream.
  • interval=<integer>, specifies the measurement interval of the new module, in seconds.
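The rule that filter and interval only apply together with a source module can be enforced before a query is sent. The sketch below is a hypothetical client-side check (check_custom_analysis is not part of Metroflux); it only encodes the constraints stated on this page: the accepted sources are trace and tuple, and filter and interval require source.

```python
# Source modules accepted by the source parameter, per this page.
ACCEPTED_SOURCES = {"trace", "tuple"}
# Parameters that only make sense when a source module is given.
NEEDS_SOURCE = ("filter", "interval")


def check_custom_analysis(params):
    """Raise ValueError if params break the source/filter/interval rules (hypothetical helper)."""
    source = params.get("source")
    if source is not None and source not in ACCEPTED_SOURCES:
        raise ValueError(f"source must be one of {sorted(ACCEPTED_SOURCES)}, got {source!r}")
    missing = [name for name in NEEDS_SOURCE if name in params and source is None]
    if missing:
        raise ValueError(f"{', '.join(missing)} requires source=trace or source=tuple")
    if "interval" in params:
        interval = int(params["interval"])  # measurement interval, in seconds
        if interval <= 0:
            raise ValueError("interval must be a positive number of seconds")
    return params
```

Queries that do not customize the analysis (no filter, no interval) pass through unchanged.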


Any other parameter is passed down as-is to the application module and may trigger different behaviors depending on the specific module implementation.


Some examples of valid queries are:

  • http://flows.lyon.grid5000.fr:44444/traffic?time=-10h:0
    Returns the traffic of the past 10 hours.
  • http://metroflux.lille.grid5000.fr:44444/traffic?time=@093000:+10m&format=gnuplot
    Returns the traffic for 10 minutes starting at 09:30:00 today, with records output in gnuplot format.
  • http://metroflux.lille.grid5000.fr:44444/topaddr?time=-1h:0&source=tuple
    Replays the flow stream produced by the tuple module over the past hour and returns all records that the topaddr module generates.
  • http://metroflux.lille.grid5000.fr:44444/traffic?gridjobid=23162
    Returns the input and output traffic of the nodes involved in grid job 23162.
  • http://metroflux.lille.grid5000.fr:44444/traffic?jobid=10111213&time=-1h:0&filter=icmp
    Returns the ICMP traffic of the past hour for the local nodes assigned to OAR job 10111213.
  • http://metroflux.lille.grid5000.fr:44444/traffic?jobid=10111213&gridjobid=23162&time=-1h:0
    Returns the traffic of the past hour for the grid-wide nodes assigned to grid job 23162 and the local nodes assigned to OAR job 10111213.


The process in charge of reading and processing incoming user queries understands several standard parameters, such as the start and end timestamps of the queried time interval, the expected output format, etc. Each parameter has a default value; for example, if no start time is specified, it is assumed to be the current time.

In addition to standard queries, Metroflux also accepts special requests for 'services'. For example, the status query informs the user about the modules that run on the node, their packet filters, timestamp of the first record, supported formats and description, as well as information about virtual nodes.

Available modules

The following analysis modules are currently enabled:

  • trace - packet-level trace (pcap file of the traffic)
  • traffic - throughput in packets and bytes per second
  • tuple - flow stream (5 tuple)
  • topaddr - top IP addresses (source or destination) in bytes
  • topports - top ports
  • flowcount - approximate active flow counter
  • protocol - protocol breakdown
  • ethtypes - ethertypes breakdown
  • flow-reassembly - TCP flow reassembly

Traffic Filters Syntax

The analysis modules can apply filters to the incoming packet streams. A filter can operate on any header field as well as on more complex data structures; for example, packets can be filtered by source or destination Autonomous System according to publicly available BGP table dumps. If no filter is specified, all packets are processed by the analysis module; otherwise, only the packets that match the filter are processed. A filter consists of one or more keyword/value pairs. Keywords designate protocol-specific fields in the packet, while values are exact matches or ranges (including CIDR prefixes). Multiple keyword/value pairs can be combined into more complex filters with the logical connectors and, or and not. Currently supported keywords include:

  • src|dst, source or destination IP address or CIDR network block.
  • addr|host, IP address or network block matching either the source or the destination.
  • sport|dport, source or destination port number.
  • ip, IP packets.
  • tcp|udp|icmp, transport protocol.
  • input|output, input|output interface (for NetFlow, sFlow data).
  • ether, Ethernet address.
  • asn, Autonomous system number.
  • exporter, NetFlow exporter (router).
  • to_ds|from_ds, IEEE 802.11 packets to/from access points.

Examples of valid filters:

  • tcp
    Process TCP packets only.
  • tcp and src 10.213.54.6 and sport 5000:6000
    Process TCP packets with source IP address 10.213.54.6 and a source port in the range 5000 to 6000.
  • udp and (not(sport 21) or src 64.32.234.9/31)
    Process UDP packets whose source port is not 21 or whose source address is in the 64.32.234.9/31 range.
  • src asn 2529 or addr asn 65535
    Process packets whose source Autonomous System is 2529 or 65535, or whose destination Autonomous System is 65535.
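Filters are plain strings, so more complex ones can be assembled programmatically. A minimal sketch with hypothetical helper names (all_of, any_of, negate, port_range): it only joins expressions with the and, or and not connectors and adds parentheses around compound sub-expressions; it does not validate keywords against the list above.

```python
def _group(expr):
    # Parenthesise compound sub-expressions so operator precedence stays explicit.
    return f"({expr})" if " and " in expr or " or " in expr else expr


def all_of(*exprs):
    """Combine filter expressions with 'and'."""
    return " and ".join(_group(e) for e in exprs)


def any_of(*exprs):
    """Combine filter expressions with 'or'."""
    return " or ".join(_group(e) for e in exprs)


def negate(expr):
    """Negate a filter expression, written as not(<expr>) like in the examples."""
    return f"not({expr})"


def port_range(keyword, low, high):
    """Build e.g. 'sport 5000:6000' for a port range."""
    return f"{keyword} {low}:{high}"
```

For example, all_of("udp", any_of(negate("sport 21"), "src 64.32.234.9/31")) rebuilds the third filter above.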


Here is an example of a query using a filter:

 http://metroflux.lyon.grid5000.fr:44444/tuple?time=-10m:0&filter=tcp+and+src+192.168.159.243&source=tuple

Remember that you must specify a source when using the filter parameter: "source=tuple" or "source=trace".

Output Format

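The modules support different output formats. Again, the status service tells us which formats each module supports. Each line gives the module name, its packet filter, the timestamp of its first record, its supported formats and its description: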

Module: ethtypes        | all | 1252604695 |  plain pretty gnuplot | Ethertypes breakdown
Module: flow-reassembly | tcp | 1253064987 |  plain como | TCP flow reassembly
Module: flowcount       | all | 1252604695 |  gnuplot | Approximate active flow counter
Module: protocol        | ip | 1156349940 |  plain pretty gnuplot | Protocol breakdown
Module: topaddr         | ip | 1156349940 |  plain pretty html sidebox | Top IP addresses (source or destination) in bytes
Module: topports        | tcp or udp | 1156349940 | plain pretty html sidebox | Top ports
Module: traffic         | all | 1156349940 | gnuplot plain pretty | Packet/bytes counter
Module: tuple           | ip | 1156349940 | plain pretty html como | Active flows (5 tuple)
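Because each status line uses '|' as a separator, the output is easy to parse, for instance to check whether a module supports a format before requesting it. A minimal sketch, assuming exactly the line layout shown above; the ModuleStatus type and the parse_status_line and supports helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModuleStatus:
    name: str
    packet_filter: str
    first_record: int      # UNIX timestamp of the first record
    formats: list
    description: str


def parse_status_line(line):
    """Parse 'Module: <name> | <filter> | <ts> | <formats> | <description>'."""
    head, filt, first, formats, description = (part.strip() for part in line.split("|", 4))
    if not head.startswith("Module:"):
        raise ValueError(f"not a module line: {line!r}")
    return ModuleStatus(
        name=head[len("Module:"):].strip(),
        packet_filter=filt,
        first_record=int(first),
        formats=formats.split(),
        description=description,
    )


def supports(lines, module, fmt):
    """Return True if `module` lists `fmt` among its output formats."""
    for line in lines:
        if line.startswith("Module:"):
            status = parse_status_line(line)
            if status.name == module:
                return fmt in status.formats
    return False
```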

To request a specific format, pass the parameter format=<format> in the query:

 http://metroflux.lyon.grid5000.fr:44444/tuple?format=html&time=-1h:20m

The pretty format is purely cosmetic.

Gnuplot

As the status output shows, several modules offer a format named gnuplot, which makes it easy to plot the data with gnuplot.

To plot the data, request the gnuplot format and pipe the output to gnuplot:

 curl "http://metroflux.lyon.grid5000.fr:44444/flowcount?format=gnuplot&time=-1h:0" | gnuplot > tmp.eps

For the traffic module, this takes two steps, because there are two curves to plot (Mbps, pkts/s):

 curl "http://metroflux.lyon.grid5000.fr:44444/traffic?format=gnuplot&time=-10m:0" > gnuplot_file
 # After each line starting with 'e' (end of inline data), replay the data so gnuplot can draw the second curve
 awk '{v[i++] = $0; print $0;} /^e/ {for (j=1;j<i;j++) print v[j];}' gnuplot_file | gnuplot > tmp.eps
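The awk one-liner for the traffic module stores every line it reads and, each time it meets a line starting with e (gnuplot's end-of-inline-data marker), prints again every line read so far except the first one. The inline data is therefore supplied a second time, once per plotted curve. The same transformation in Python, as a sketch that mirrors the awk logic line for line (the function name is hypothetical):

```python
def duplicate_inline_data(lines):
    """Python equivalent of:
    awk '{v[i++] = $0; print $0;} /^e/ {for (j=1;j<i;j++) print v[j];}'
    """
    seen = []   # every line read so far (awk's v[])
    out = []
    for line in lines:
        seen.append(line)
        out.append(line)
        if line.startswith("e"):
            # Replay all stored lines except the first one, including this 'e' line.
            out.extend(seen[1:])
    return out
```

To use it as a filter in the pipeline, read the saved file (or standard input), join the returned lines with newlines and pipe the result to gnuplot.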

Timestamp

When looking at the output of some modules, you may notice that the displayed time is one or two hours behind the local time in France (depending on daylight saving time). This is because it is expressed in UTC:

Start                         Duration     Proto Source IP:Port      Destination IP:Port   Bytes    Packets
Sep 17 2009 18:19:58.806605   1.151999   tcp 192.168.159.243  5667   172.24.120.20 39652      320        4
Sep 17 2009 18:19:58.806605   1.279999   tcp   172.24.120.20 39652 192.168.159.243  5667      866        5

18:19:58 is actually 20:19:58 in France.

The traffic module, on the other hand, also shows the local time next to the timestamp:

Date                     Timestamp          Bytes    Pkts
Thu Sep 17 20:25:00 2009   1253211900.154005    48759       64
Thu Sep 17 20:26:00 2009   1253211960.169377    39730       57


Here, 20:26:00 is the actual local time in France.
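Converting between these representations is straightforward with Python's datetime module. A sketch, assuming (as stated above) that the Start column of the tuple module is in UTC and that the Timestamp column of the traffic module is a UNIX timestamp; the helper names are hypothetical. French local time comes from the Europe/Paris zone of the zoneinfo module, which needs the system time-zone database (or the tzdata package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Format of the "Start" column of the tuple module, e.g. "Sep 17 2009 18:19:58.806605".
TUPLE_START_FORMAT = "%b %d %Y %H:%M:%S.%f"


def parse_tuple_start(text):
    """Parse a tuple 'Start' value, which is expressed in UTC."""
    return datetime.strptime(text, TUPLE_START_FORMAT).replace(tzinfo=timezone.utc)


def from_unix(ts):
    """Convert a UNIX timestamp (traffic 'Timestamp' column) to an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def to_france(dt):
    """Convert an aware datetime to French local time (CET/CEST)."""
    return dt.astimezone(ZoneInfo("Europe/Paris"))
```

With the time-zone database available, to_france(from_unix(1253211960)) gives 2009-09-17 20:26:00+02:00, matching the traffic output above.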

The Timescale

The timescale of the data differs from site to site, depending on the probes that capture the traffic. In Lyon, the traffic is captured by dedicated hardware (GNET10) that captures every packet header crossing the link, which allows a timescale of one second.

In Lille, on the other hand, the data is obtained with NetFlow, which provides information about flows rather than packets, so the timescale is limited to one minute.

If you want a different interval you can pass the interval parameter to the query:

 curl "http://metroflux.lille.grid5000.fr:44444/traffic?interval=5&source=tuple&time=-10m:0"

which will give the statistics at intervals of 5 seconds:

Thu Sep 17 20:59:40 2009   1253213980.392950   112118      113
Thu Sep 17 20:59:45 2009   1253213985.445963     2292       18
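Records in this layout can be parsed back into numbers, for instance to compute a throughput. A sketch assuming the column layout shown above (date, UNIX timestamp, bytes, packets) and assuming, which this page does not state, that the byte and packet values are totals over one interval; the names TrafficRecord, parse_traffic_line and throughput are hypothetical.

```python
from typing import NamedTuple


class TrafficRecord(NamedTuple):
    timestamp: float   # UNIX timestamp of the interval
    n_bytes: int
    n_packets: int


def parse_traffic_line(line):
    """Parse e.g. 'Thu Sep 17 20:59:40 2009   1253213980.392950   112118      113'."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"unexpected traffic line: {line!r}")
    # fields[0:5] hold the human-readable date; the last three are numeric.
    return TrafficRecord(float(fields[5]), int(fields[6]), int(fields[7]))


def throughput(record, interval):
    """Return (bits/s, packets/s), ASSUMING the counts are totals per `interval` seconds."""
    return record.n_bytes * 8 / interval, record.n_packets / interval
```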

For intervals smaller than the default ones, the correctness of the data is not guaranteed.


Contact

- Oana Goga oana.goga_AT_ens-lyon.fr

- Armel Soro armel DOT soro AT inria DOT fr

