Metroflux Usage
Introduction
Description
The aim of Metroflux is to give Grid’5000 users access to the state of the network. The service provides a web interface and an API that expose this state.
The state of the network is currently obtained using software probes (sFlow, NetFlow) and hardware probes (GNET, DAG) that measure traffic at different data scales (aggregate, flow, packet) and different time scales (from milliseconds to minutes). Metroflux is also able to analyze the network traffic between Grid5000 sites by monitoring the link of each site to RENATER.
Metroflux is a multipoint measurement tool that is intended to be available on all Grid5000 sites in the near future. For the moment, it is available only in Lyon and Lille (and is being installed in Rennes). It supports i) arbitrary traffic queries that run continuously on the live data streams and ii) retrospective queries that analyze past traffic data to enable network forensics.
Unlike active monitoring systems, which monitor networks by injecting custom packets, Metroflux is a passive system: ingress/egress packet headers are captured on the fly, forwarded to external servers and stored there, from which the data can be queried and retrieved later.
Why/When to use Metroflux
Examples of questions that Metroflux can answer:
- Q1: What are the traffic characteristics of my own experiment (throughput, number of flows, top IP addresses, top ports, ...)?
- Q2: What is the bandwidth cost of a specific operation (e.g. migrating virtual machines)?
- Q3: When did my machines start to send data?
- Q4: Was my experiment influenced by the lack of sufficient bandwidth?
Examples of other possible uses of Metroflux:
- U1: Use Metroflux to improve load balancing.
- U2: Use Metroflux to detect network attacks.
- U3: Simply satisfy your curiosity about the utilization and characteristics of the Grid5000 network.
Querying Metroflux
Please note that as of now, Metroflux is installed in both Lille and Lyon. The machines that are currently analyzing and monitoring the corresponding network traffic are:
- Lyon -> {metroflux,flows}.lyon.grid5000.fr. metroflux.lyon uses a dedicated hardware probe and monitors only Lyon egress traffic, whilst flows.lyon acts as an sFlow collector and monitors both the egress and the ingress traffic of Lyon.
- Lille -> metroflux.lille.grid5000.fr, which acts as a NetFlow-based software probe and monitors both the egress and the ingress traffic of Lille.
Syntax
Queries include an analysis name (module) and a set of parameters that are passed in the HTTP GET query. The queried analysis name and parameters are encoded in the request according to the schema:
"http://host:port/module?parameters"
or
"http://host:port/service"
where service=status.
Parameters are encoded as in a standard HTTP query string: "param1=value1&param2=value2&…". There are standard parameters that the core processes understand, and additional module-specific parameters.
Parameters include:
- start=<unix timestamp>, UNIX timestamp of the beginning of the time window of interest.
- end=<unix timestamp>, UNIX timestamp of the end of the time window of interest.
- time=<time>:<time>, specifies a time range. "<time>" can be an exact timestamp in the format "@[cc[yy[mm[dd[hh]]]]]mmss" as well as a relative timestamp in the format "[+|-](<number>{d,h,m,s})".
- format=<string>, specifies the output format of the query. The supported formats depend on the analysis module itself.
- wait=yes|no, applies when the end time of the query is in the future: specifies whether to wait until the necessary packets have been processed or to return only the information that is currently available.
- gridjobid=<OARGRID job identifier>, which is used to filter and show only network analysis from the nodes belonging to that grid job.
- jobid=<OAR job identifier>, used to restrict the statistics to the local nodes assigned to that OAR job.
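The relative form of the time parameter can be checked on the client side before a query is sent. The sketch below converts a single relative timestamp such as "-10h" into a signed number of seconds. The function name is hypothetical, the sign is treated as optional, and the bare "0" used in the examples on this page is deliberately not handled because its exact meaning is not specified here.

```python
import re

# Seconds per unit letter accepted in relative timestamps (d, h, m, s).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_RELATIVE = re.compile(r"^([+-]?)(\d+)([dhms])$")


def relative_offset_seconds(text: str) -> int:
    """Convert a relative timestamp such as "-10h" or "+10m" to signed seconds.

    Hypothetical client-side helper: it only mirrors the documented
    "[+|-](<number>{d,h,m,s})" form and is not part of Metroflux itself.
    """
    match = _RELATIVE.match(text)
    if match is None:
        raise ValueError(f"not a relative timestamp: {text!r}")
    sign, number, unit = match.groups()
    seconds = int(number) * _UNIT_SECONDS[unit]
    return -seconds if sign == "-" else seconds
```

For instance, relative_offset_seconds("-10h") gives the offset of the start of a ten-hour window relative to now.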
One specific parameter makes the analysis read its input packet stream from somewhere other than the probe:
- source=<module>, specifies a source module whose output is used as the input packet stream of this query, instead of the live traffic.
This parameter is very useful if you want to customize the default analyses provided by the system. The accepted modules are trace and tuple.
The following parameters customize the analysis (the source parameter must be specified for them to apply):
- filter=<expression>, specifies a filter on the incoming packet stream.
- interval=<integer>, specifies the measurement interval of the new module.
Any other parameter is passed down as-is to the application module and may trigger different behaviors depending on the specific module implementation.
Some examples of valid queries are:
| Command | Description |
|---|---|
| http://flows.lyon.grid5000.fr:44444/traffic?time=-10h:0 | Returns traffic for the past 10 hours. |
| http://metroflux.lille.grid5000.fr:44444/traffic?time=@093000:+10m&format=gnuplot | Returns traffic from 09:30:00 today for 10 minutes. Records are output in gnuplot format. |
| http://metroflux.lille.grid5000.fr:44444/topaddr?time=-1h:0&source=tuple | Replays a flow stream using the output of the tuple module over the past hour and returns all records that the topaddr module generates. |
| http://metroflux.lille.grid5000.fr:44444/traffic?gridjobid=23162 | Retrieves the nodes involved in gridjobid 23162 and returns the input and output traffic. |
| http://metroflux.lille.grid5000.fr:44444/traffic?jobid=10111213&time=-1h:0&filter=icmp | Returns the ICMP traffic of the past hour for the local nodes assigned to OAR job id 10111213. |
| http://metroflux.lille.grid5000.fr:44444/traffic?jobid=10111213&gridjobid=23162&time=-1h:0 | Returns the traffic of the past hour for the grid-wide nodes assigned to grid job id 23162 and the local nodes assigned to OAR job id 10111213. |
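Query URLs like the ones in the table can also be assembled programmatically. The following is a minimal sketch using only the Python standard library. The function name build_query_url and the DEFAULT_PORT constant are hypothetical, and the port value is simply the one that appears in every example on this page.

```python
from urllib.parse import urlencode

DEFAULT_PORT = 44444  # port used in all example URLs on this page


def build_query_url(host: str, module: str, port: int = DEFAULT_PORT, **params: object) -> str:
    """Build "http://host:port/module?parameters" from keyword arguments.

    Hypothetical helper, not part of Metroflux. Spaces become "+", while
    ":", "@" and "/" are left readable. A literal "+" in a value is
    percent-encoded as "%2B", as required in a query string.
    """
    base = f"http://{host}:{port}/{module}"
    if not params:
        return base  # e.g. the parameter-less "status" service
    return f"{base}?{urlencode(params, safe=':@/')}"


if __name__ == "__main__":
    print(build_query_url("flows.lyon.grid5000.fr", "traffic", time="-10h:0"))
    print(build_query_url("metroflux.lille.grid5000.fr", "traffic",
                          jobid=10111213, time="-1h:0", filter="icmp"))
```

The resulting string can be passed to curl or to urllib.request.urlopen.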
The process in charge of reading and processing the incoming user queries is able to understand several standard parameters, such as the starting and ending timestamp of the time interval being queried, the expected output format, etc. Each variable has a default value; for example, if no start time is specified, it is assumed to be the current time.
In addition to standard queries, Metroflux also accepts special requests for 'services'. For example, the status query informs the user about the modules that run on the node, their packet filters, timestamp of the first record, supported formats and description, as well as information about virtual nodes.
Available modules
For the moment, the following analysis modules are enabled:
- trace - packet-level trace (pcap file of the traffic)
- traffic - throughput in packets and bytes per second
- tuple - flow stream (5 tuple)
- topaddr - top IP addresses (source or destination) in bytes
- topports - top ports
- flowcount - approximate active flow counter
- protocol - protocol breakdown
- ethtypes - ethertypes breakdown
- flow-reassembly - TCP flow reassembly
Traffic Filters Syntax
The analysis modules can apply filters to the incoming packet streams. A filter can operate on any header field as well as on more complex data structures. For example, it is possible to filter packets by source or destination Autonomous System, according to publicly available BGP table dumps. If no filter is specified, all packets are processed by the analysis module; otherwise, only the packets that match the filter are processed.
A filter consists of one or more keyword/value pairs. Keywords specify protocol-specific fields in the packet, while values are exact matches or ranges (including CIDR prefixes). Multiple keyword/value pairs can be combined into more complex filters using the logical connectors and, or and not. Currently supported keywords include:
- src|dst, source or destination IP address or CIDR network block.
- addr|host, source and destination IP address or network block.
- sport|dport, source or destination port number.
- ip, IP packets.
- tcp|udp|icmp, transport protocol.
- input|output, input|output interface (for NetFlow, sFlow data).
- ether, Ethernet address.
- asn, Autonomous system number.
- exporter, NetFlow exporter (the router that exported the flow records).
- to_ds|from_ds, IEEE 802.11 packets from/to access points.
Examples of valid filters:
| Filter | Comment |
|---|---|
| tcp | Process TCP packets only. |
| tcp and src 10.213.54.6 and sport 5000:6000 | Process TCP packets with source IP address 10.213.54.6 and source port in the range 5000 to 6000. |
| udp and (not(sport 21) or src 64.32.234.9/31) | Process UDP packets whose source port number is not 21 or whose source address is in the 64.32.234.9/31 range. |
| src asn 2529 or addr asn 65535 | Process packets whose source Autonomous System is 2529 or 65535, or whose destination Autonomous System is 65535. |
Here is an example of a complete query with a filter:
http://metroflux.lyon.grid5000.fr:44444/tuple?time=-10m:0&filter=tcp+and+src+192.168.159.243&source=tuple
Remember that a source must be specified whenever the filter parameter is used: "source=tuple" or "source=trace".
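Filter expressions can be composed from smaller pieces before being placed in a URL. The sketch below uses hypothetical helper names (all_of, any_of, negate, encode_filter) and only performs string concatenation and URL encoding. It does not validate keywords, and it always parenthesizes "or" groups because the precedence rules of the Metroflux filter parser are not described on this page.

```python
from urllib.parse import quote_plus


def all_of(*terms: str) -> str:
    """Join filter terms with the "and" connector."""
    return " and ".join(terms)


def any_of(*terms: str) -> str:
    """Join filter terms with "or", wrapped in parentheses to avoid
    relying on operator precedence (which is an unknown here)."""
    return "(" + " or ".join(terms) + ")"


def negate(term: str) -> str:
    """Negate a filter term, written as not(...) like the examples above."""
    return f"not({term})"


def encode_filter(expression: str) -> str:
    """Encode a filter for use as the value of the "filter" URL parameter.
    Spaces become "+"; ":", "/" and parentheses are kept readable."""
    return quote_plus(expression, safe=":/()")


if __name__ == "__main__":
    expr = all_of("udp", any_of(negate("sport 21"), "src 64.32.234.9/31"))
    print(expr)
    print("filter=" + encode_filter(expr) + "&source=tuple")
```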
Output Format
The modules support different output formats. Again, with the status command, we can find out what formats are supported by each module:
Module: ethtypes | all | 1252604695 | plain pretty gnuplot | Ethertypes breakdown
Module: flow-reassembly | tcp | 1253064987 | plain como | TCP flow reassembly
Module: flowcount | all | 1252604695 | gnuplot | Approximate active flow counter
Module: protocol | ip | 1156349940 | plain pretty gnuplot | Protocol breakdown
Module: topaddr | ip | 1156349940 | plain pretty html sidebox | Top IP addresses (source or destination) in bytes
Module: topports | tcp or udp | 1156349940 | plain pretty html sidebox | Top ports
Module: traffic | all | 1156349940 | gnuplot plain pretty | Packet/bytes counter
Module: tuple | ip | 1156349940 | plain pretty html como | Active flows (5 tuple)
To request a specific format, pass the parameter format=<format> in the query:
http://metroflux.lyon.grid5000.fr:44444/tuple?format=html&time=-1h:20m
The pretty format is purely cosmetic: it is meant to be easier on the eye.
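The module entries returned by the status service can be turned into a lookup table, for example to check which formats a module supports before querying it. This is a sketch that assumes every entry has exactly the five "|"-separated fields shown above (name, packet filter, timestamp of the first record, formats, description). The names ModuleInfo and parse_status are hypothetical, and anything else in the status output is skipped.

```python
import re
from typing import Dict, List, NamedTuple


class ModuleInfo(NamedTuple):
    name: str
    packet_filter: str
    first_record: int      # UNIX timestamp of the first record
    formats: List[str]
    description: str


def parse_status(text: str) -> Dict[str, ModuleInfo]:
    """Parse "Module: a | b | c | d | e" entries into a dict keyed by name.

    Splitting on the "Module:" marker means the entries may be on
    separate lines or run together on one line.
    """
    modules: Dict[str, ModuleInfo] = {}
    for chunk in re.split(r"Module:", text):
        fields = [field.strip() for field in chunk.split("|")]
        if len(fields) != 5:
            continue  # not a module entry (assumed layout), ignore it
        name, packet_filter, first_record, formats, description = fields
        modules[name] = ModuleInfo(name, packet_filter, int(first_record),
                                   formats.split(), description)
    return modules


if __name__ == "__main__":
    sample = ("Module: flowcount | all | 1252604695 | gnuplot | Approximate active flow counter\n"
              "Module: topports | tcp or udp | 1156349940 | plain pretty html sidebox | Top ports\n")
    info = parse_status(sample)
    print("gnuplot" in info["flowcount"].formats)
```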
Gnuplot
As you have probably noticed, there is a format named gnuplot, which makes it easy to plot the data with gnuplot.
To plot the data, request the gnuplot format and pipe the output to gnuplot:
curl "http://metroflux.lyon.grid5000.fr:44444/flowcount?format=gnuplot&time=-1h:0" | gnuplot > tmp.eps
For the traffic module, two steps are needed, because there are two lines to plot (Mbps and pkts/s):
curl "http://metroflux.lyon.grid5000.fr:44444/traffic?format=gnuplot&time=-10m:0" > gnuplot_file
cat gnuplot_file | awk '{v[i++] = $0; print $0;} /^e/ {for (j=1;j<i;j++) print v[j];}' | gnuplot > tmp.eps
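For readers less familiar with awk, here is what the one-liner above does, rewritten as a small Python function (the name duplicate_inline_data is hypothetical). It echoes every line, and each time it meets a line starting with "e" it replays everything seen so far except the very first line. The assumption, not confirmed on this page, is that the first line is the plot command and that the data block ends with gnuplot's "e" terminator, so the inline data is supplied once per plotted line.

```python
def duplicate_inline_data(script: str) -> str:
    """Mimic the awk one-liner: print each line, and after a line that
    starts with "e" print again every line seen so far except the first."""
    seen = []
    out = []
    for line in script.splitlines():
        seen.append(line)
        out.append(line)
        if line.startswith("e"):
            # Replay the data block, including the "e" terminator itself.
            out.extend(seen[1:])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    with open("gnuplot_file") as handle:
        print(duplicate_inline_data(handle.read()), end="")
```

Saved under a file name of your choice, its output can be piped to gnuplot in the same way as the awk output.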
Timestamp
When looking at the output of the modules, you might notice that the displayed time is one or two hours behind what you expect. This is because it is expressed in UTC:
Start Duration Proto Source IP:Port Destination IP:Port Bytes Packets
Sep 17 2009 18:19:58.806605 1.151999 tcp 192.168.159.243 5667 172.24.120.20 39652 320 4
Sep 17 2009 18:19:58.806605 1.279999 tcp 172.24.120.20 39652 192.168.159.243 5667 866 5
18:19:58 is actually 20:19:58 in France.
The traffic module, on the other hand, also shows the local time next to the timestamp:
Date Timestamp Bytes Pkts
Thu Sep 17 20:25:00 2009 1253211900.154005 48759 64
Thu Sep 17 20:26:00 2009 1253211960.169377 39730 57
Here 20:26:00 is the actual local time in France.
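The conversion between the UTC times and the local times in these outputs can be reproduced with the Python standard library. The sketch below uses hypothetical helper names and a fixed offset of +2 hours, which is an assumption matching French summer time on the dates of the sample output. A real script would rather use a proper time zone database.

```python
from datetime import datetime, timedelta, timezone


def utc_from_epoch(epoch_seconds: float) -> datetime:
    """Interpret a UNIX timestamp (as printed by the traffic module) as UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def utc_from_text(text: str) -> datetime:
    """Parse a "Sep 17 2009 18:19:58.806605" style time and mark it as UTC.
    Month names are parsed with %b, so an English/C locale is assumed."""
    return datetime.strptime(text, "%b %d %Y %H:%M:%S.%f").replace(tzinfo=timezone.utc)


def to_fixed_offset(moment: datetime, offset_hours: int) -> datetime:
    """Shift an aware datetime to a fixed UTC offset (e.g. 2 for UTC+2)."""
    return moment.astimezone(timezone(timedelta(hours=offset_hours)))


if __name__ == "__main__":
    flow_start = utc_from_text("Sep 17 2009 18:19:58.806605")
    print(to_fixed_offset(flow_start, 2).strftime("%H:%M:%S"))
    print(to_fixed_offset(utc_from_epoch(1253211900), 2).isoformat())
```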
The Timescale
The timescale of the data differs from site to site because of the probes that capture the traffic. At Lyon, the traffic is captured with dedicated hardware (GNET10) that captures the headers of all packets crossing the link, which allows a timescale of one second.
At Lille, on the other hand, the data is obtained with NetFlow, which gives information about flows rather than packets, so the timescale is limited to one minute.
If you want a different interval you can pass the interval parameter to the query:
curl "http://metroflux.lille.grid5000.fr:44444/traffic?interval=5&source=tuple&time=-10m:0"
which will give the statistics at intervals of 5 seconds:
Thu Sep 17 20:59:40 2009 1253213980.392950 112118 113
Thu Sep 17 20:59:45 2009 1253213985.445963 2292 18
For intervals smaller than the default ones, the correctness of the data is not guaranteed.
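To check that a custom interval was honoured, the spacing between consecutive records can be computed from the output. This sketch assumes the plain traffic output has the layout shown on this page, where the last three whitespace-separated fields are the UNIX timestamp, the byte count and the packet count. TrafficSample and parse_traffic_line are hypothetical names.

```python
from typing import NamedTuple


class TrafficSample(NamedTuple):
    timestamp: float   # UNIX timestamp of the record
    byte_count: int
    packet_count: int


def parse_traffic_line(line: str) -> TrafficSample:
    """Parse one record of the traffic module, reading the last three fields."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected traffic record: {line!r}")
    timestamp, byte_count, packet_count = fields[-3:]
    return TrafficSample(float(timestamp), int(byte_count), int(packet_count))


if __name__ == "__main__":
    first = parse_traffic_line("Thu Sep 17 20:59:40 2009 1253213980.392950 112118 113")
    second = parse_traffic_line("Thu Sep 17 20:59:45 2009 1253213985.445963 2292 18")
    print(round(second.timestamp - first.timestamp))  # spacing in seconds
```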
Contact
- Oana Goga oana.goga_AT_ens-lyon.fr
- Armel Soro armel DOT soro AT inria DOT fr

