Advanced KaVLAN

From Grid5000
Revision as of 17:30, 9 December 2015

Note.png Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

Overview

The goal of KaVLAN is to provide network isolation for Grid'5000 users. KaVLAN allows users to manage VLANs on their Grid'5000 nodes. The benefit is complete level 2 isolation. It can be used together with OAR and Kadeploy to run experiments on the grid.

The first step is to read the KaVLAN introduction.

Use Open-MX with KaVLAN

In the first part of the tutorial, we will use kadeploy and kavlan together on a single site.

Open-MX is a high-performance implementation of the Myrinet Express message-passing stack over generic Ethernet networks.

KaVLAN lets several users use Open-MX simultaneously on a Grid'5000 site without interfering with each other. For this, we will use a routed VLAN (we could also use a local VLAN).

First, connect to one of the following sites: sophia, lille, lyon, nancy, rennes or toulouse (reims should be available soon).

To obtain nodes and a VLAN, you must reserve a kavlan resource with oarsub. There are 3 kinds of resources: kavlan, kavlan-local and kavlan-global. Here, we will use 3 nodes and a routed VLAN:

Terminal.png frontend:
oarsub -t deploy -l {"type='kavlan'"}/vlan=1,{"cluster='suno'"}/nodes=3 -I

A shell is now opened on the frontend (as with any regular deploy job). You can get the ID of your VLAN using the kavlan command:

Terminal.png frontend:
kavlan -V

If you run this command outside the shell started by OAR for your reservation, you must add the oar JOBID.

Terminal.png frontend:
kavlan -V -j JOBID

You should get an integer in the <4-9> range for this routed VLAN (the range for local VLANs is <1-3>, and there is one global VLAN per OAR server).
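
The ranges above can be turned into a small check for scripts. This is only a sketch: the function name kavlan_kind is hypothetical, and treating every ID above 9 as a global VLAN is an assumption, since no explicit range is given here for global VLANs.

```shell
# Hypothetical helper: classify a VLAN ID using the ranges quoted above.
# Assumption: any ID above 9 is treated as a global VLAN.
kavlan_kind() {
    id=$1
    case $id in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;   # reject non-numeric input
    esac
    if [ "$id" -ge 1 ] && [ "$id" -le 3 ]; then
        echo "local"
    elif [ "$id" -ge 4 ] && [ "$id" -le 9 ]; then
        echo "routed"
    elif [ "$id" -gt 9 ]; then
        echo "global"
    else
        echo "invalid"; return 1
    fi
}

kavlan_kind 4    # prints "routed"
```

Inside a reservation, you could call it as kavlan_kind "$(kavlan -V)".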

You can get all the options of the command using --help:

# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                 CA certificate
    -c, --client-cert CERT           client certificate
    -k, --client-key KEY             client key
    -l, --get-nodelist               Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp               Stop DHCP server
    -V, --show-vlan-id               Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                   Show vlan of nodes
    -s, --set-vlan                   Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE               set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose               Run verbosely
    -q, --[no-]quiet                 Run quietly
        --[no-]debug                 Run with debug output
    -h, --help                       Show this message
        --version                    Show version

Once you have a kavlan reservation running, you can put your nodes in your VLAN (and back into the default VLAN) at any time during the lifetime of your job; we will not use this for now.

Instead we will change the vlan with kadeploy. The next step is to deploy the nodes with an Open-MX aware image.


Enable the dhcp server of the VLAN

Before deploying, if you don't install your own DHCP server, you should start the default DHCP server of the VLAN. Do this with the kavlan command (add -j JOBID if needed):

Terminal.png frontend:
kavlan -e

(You can disable the DHCP server with kavlan -d)

Deploy nodes and change VLAN in one step

Warning.png Warning

This part of the tutorial is known to be buggy. If the following deployment really fails (see warning below), use the "wheezy-x64-base" image instead and go directly to Network_isolation_on_Grid'5000#Setup_a_DHCP_server_on_your_nodes

Terminal.png frontend:
kadeploy3 -f $OAR_NODEFILE -k -a http://public.sophia.grid5000.fr/~nniclausse/openmx.dsc --vlan `kavlan -V` --reboot-classical-timeout 450 --reboot-kexec-timeout 450
Warning.png Warning

This image takes longer to reboot, which is why we have to specify a longer reboot timeout manually. It may be necessary to increase this value further. You can use kaconsole to check whether the reboot goes well and whether it takes too long for kadeploy to handle. Kadeploy may label the deployment as failed just because the node took too long to reboot. In this case, the node still becomes available through ping and SSH a few minutes later.
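
If kadeploy reports a failure only because of the timeout, you can poll the node until it answers instead of checking by hand. The following is a minimal sketch; the function name wait_until and the retry values are hypothetical and not part of kadeploy or kavlan.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0            # the command succeeded
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1                    # gave up after ATTEMPTS tries
}

# Self-contained demonstration with a command that always succeeds.
wait_until 3 0 true && echo "reachable"

# On the frontend (not run here), something like:
# wait_until 30 10 ping -c 1 "$(kavlan -l | head -1)"
```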

Once the deployment is done, you will be able to connect to your nodes. They are now inside the VLAN, so they are no longer reachable at their default IP. You can get the list of the new hostnames of your nodes in the VLAN with kavlan -l.

Create a nodefile and copy it on the first node:

Terminal.png frontend:
kavlan -l > nodefile
Terminal.png frontend:
scp nodefile root@`head -1 < nodefile`:/tmp


When prompted, use the following root password:

 Password : grid5000 

Use Open-MX

Once the deployment is done, you can connect to a node. The nodes are now in the VLAN, so they can be reached at the address:

node-XX-kavlan-vlanid
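
Because the VLAN ID is part of the hostname, a script can recover it from a name returned by kavlan -l. A minimal sketch follows; the function name vlan_of_host is hypothetical.

```shell
# Hypothetical helper: print the VLAN ID embedded in a node name of the
# form node-XX-kavlan-VLANID[.site.grid5000.fr]; prints nothing otherwise.
vlan_of_host() {
    printf '%s\n' "$1" | sed -n 's/^[^.]*-kavlan-\([0-9][0-9]*\).*/\1/p'
}

vlan_of_host suno-6-kavlan-4.sophia.grid5000.fr   # prints "4"
```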

Note on Open-MX configuration: it was compiled with a MTU of 1500 instead of the default 9000 because jumbo frames are not configured on all Grid'5000 routers and switches.


Now connect to the first node:

Terminal.png frontend:
ssh root@`kavlan -l | head -1`

and run omx_info; you should see your 3 nodes:

Terminal.png node:
/opt/open-mx/bin/omx_info
Open-MX version 1.3.4
 build: root@suno-30-kavlan-18.sophia.grid5000.fr:/tmp/open-mx-1.3.4 Wed Mar  9 16:23:47 CET 2011

Found 1 boards (32 max) supporting 32 endpoints each:
 suno-6-kavlan-4.sophia.grid5000.fr:0 (board #0 name eth0 addr 00:26:b9:3f:40:af)
   managed by driver 'bnx2'
   WARNING: high interrupt-coalescing

Peer table is ready, mapper is 00:00:00:00:00:00
================================================
  0) 00:26:b9:3f:40:af suno-6-kavlan-4.sophia.grid5000.fr:0
  1) 00:26:b9:3f:43:a1 suno-7-kavlan-4.sophia.grid5000.fr:0
  2) 00:26:b9:3f:4a:3d suno-8-kavlan-4.sophia.grid5000.fr:0


You can try running an MPI application with MX support, such as NetPIPE:

Terminal.png node:
su - mpi
Terminal.png node:
export LD_LIBRARY_PATH=/var/mpi/openmpi/lib/:/opt/open-mx/lib/
Terminal.png node:
cd NetPIPE-3.7.1
Terminal.png node:
mpirun -n 2 -x LD_LIBRARY_PATH --machinefile /tmp/nodefile --mca btl self,sm,mx ./NPmpi
  0:       1 bytes   4627 times -->      0.37 Mbps in      20.36 usec
  1:       2 bytes   4910 times -->      0.75 Mbps in      20.31 usec
  2:       3 bytes   4924 times -->      1.13 Mbps in      20.33 usec
 ...
121: 8388605 bytes      3 times -->    911.48 Mbps in   70215.34 usec
122: 8388608 bytes      3 times -->    911.30 Mbps in   70229.01 usec
123: 8388611 bytes      3 times -->    911.20 Mbps in   70237.00 usec

Compare with the result without MX (TCP):

Terminal.png node:
mpirun -n 2 -x LD_LIBRARY_PATH --machinefile /tmp/nodefile --mca btl self,sm,tcp ./NPmpi
  0:       1 bytes   2789 times -->      0.28 Mbps in      27.19 usec
  1:       2 bytes   3678 times -->      0.58 Mbps in      26.23 usec
  2:       3 bytes   3812 times -->      0.87 Mbps in      26.38 usec
 ...
121: 8388605 bytes      3 times -->    903.15 Mbps in   70862.85 usec
122: 8388608 bytes      3 times -->    902.94 Mbps in   70879.18 usec
123: 8388611 bytes      3 times -->    903.05 Mbps in   70870.68 usec

Without any tuning of the Ethernet driver, the latency is improved by 23%, and the maximum bandwidth is slightly better with Open-MX.
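
To check the latency figure, you can compute the relative reduction from the small-message lines of the two NetPIPE runs above. This is a sketch using awk; the function name latency_gain is hypothetical.

```shell
# Relative latency reduction of Open-MX over TCP, in percent.
# usage: latency_gain TCP_USEC MX_USEC
latency_gain() {
    awk -v tcp="$1" -v mx="$2" 'BEGIN { printf "%.1f\n", (tcp - mx) / tcp * 100 }'
}

latency_gain 27.19 20.36   # 1-byte messages
latency_gain 26.23 20.31   # 2-byte messages
latency_gain 26.38 20.33   # 3-byte messages
```

With the figures shown above, this gives 25.1, 22.6 and 22.9 percent, which is in line with the quoted value of about 23%.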

Setup a DHCP server on your nodes

Configure DHCP

If you need to run your own DHCP server (for example, to run a cluster distribution inside KaVLAN or to test kadeploy), you can use a script to generate the configuration file.

Go back to the frontend and download the script that will generate your DHCP configuration (this script uses the restfully and ruby-ip gems).


Then generate the configuration (replace SITE and VLANID with your current site and VLAN ID) and copy it to the node:

Terminal.png frontend:
chmod +x ./gen_dhcpd_conf.rb
Terminal.png frontend:
gem install ruby-ip --no-ri --no-rdoc --user-install
Terminal.png frontend:
./gen_dhcpd_conf.rb --site SITE --vlan-id VLANID
Terminal.png frontend:
scp dhcpd-kavlan-VLANID-SITE.conf root@`head -1 < nodefile`:

For user accounts, you need to set GEM_HOME to a directory of your own, because you cannot install the "ruby-ip" gem in the default one. To do so, type the following before running gem install:

Terminal.png frontend:
export GEM_HOME=~/.gem/ruby/1.8/
Warning.png Warning

On a frontend that runs Debian Jessie (only in Reims for now), the version number is different:

Terminal.png frontend:
export GEM_HOME=~/.gem/ruby/2.1.0/

Disable the default DHCP server of the VLAN from the frontend:

Terminal.png frontend:
kavlan -d

Now install a DHCP server on the node (we assume that the node is not yet in the job VLAN, or that the VLAN is routed and has access to the proxy for apt):

Terminal.png node:
apt-get install dhcp3-server
Terminal.png node:
cp /root/dhcpd-kavlan-VLANID-SITE.conf /etc/dhcp/dhcpd.conf

On the node chosen as a DHCP server, start the server:

Terminal.png node:
/etc/init.d/dhcp3-server start

Then, in another shell, connect as root to a second node (or use kaconsole):

Terminal.png frontend:
ssh root@node-xx-kavlan-yy

And restart the network configuration:

Terminal.png node-dhcp-client:
/etc/init.d/networking restart
...
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
[ 5185.656817] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
[ 5185.670596] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
DHCPOFFER from 10.32.3.6
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 10.32.3.6
Stopping NTP server: ntpd.
Starting NTP server: ntpd.
bound to 10.32.3.7 -- renewal in 37174 seconds.

On the DHCP server, check the logs:

Terminal.png node-dhcp-server:
tail /var/log/daemon.log
...
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPDISCOVER from 00:26:b9:3f:43:a1 via eth0
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPOFFER on 10.32.3.7 to 00:26:b9:3f:43:a1 via eth0
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPREQUEST for 10.32.3.7 (10.32.3.6) from 00:26:b9:3f:43:a1 via eth0
Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPACK on 10.32.3.7 to 00:26:b9:3f:43:a1 via eth0

In the last four lines, you can see that your own DHCP server has given an address to the other node.
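
When several nodes request an address, it can help to reduce the log to one line per lease. The sketch below assumes the log lines have the same layout as the excerpt above; the function name dhcp_acks is hypothetical.

```shell
# Hypothetical helper: print "IP MAC" for every DHCPACK line of a dhcpd log.
# Reads the files given as arguments, or standard input if there are none.
dhcp_acks() {
    awk '/dhcpd: DHCPACK on/ { print $8, $10 }' "$@"
}

# Demonstration on one log line taken from the excerpt above.
printf '%s\n' \
  'Apr 12 17:32:24 suno-6-kavlan-4 dhcpd: DHCPACK on 10.32.3.7 to 00:26:b9:3f:43:a1 via eth0' \
  | dhcp_acks
```

On the DHCP server node, the equivalent call would be dhcp_acks /var/log/daemon.log.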

DHCP and PXE

For your information, if you need to do a PXE boot, you must change the tftp server in the generated dhcpd configuration file:

Terminal.png node:
IP=`hostname -i`
Terminal.png node:
perl -i -pe "s/next-server .*/next-server $IP;/" /etc/dhcp/dhcpd.conf

If there is no next-server configured, you must edit the file by hand and add a line like this:

next-server XX.XX.XX.XX ;

where XX.XX.XX.XX is the IP of your node (echo $IP).
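
Both cases (replacing an existing next-server, or adding a missing one) can be handled by one small function. This is a sketch only: the function name set_next_server is hypothetical, and adding the directive at the top of the file, in the global scope, is an assumption that may not suit every generated configuration.

```shell
# Hypothetical helper: set next-server in a dhcpd configuration file.
# Replaces existing directives, or adds one at the top of the file
# (global scope, an assumption) when none is present.
# usage: set_next_server FILE IP
set_next_server() {
    file=$1
    ip=$2
    tmp=$(mktemp) || return 1
    if grep -q 'next-server' "$file"; then
        sed "s/next-server .*/next-server $ip;/" "$file" > "$tmp"
    else
        { printf 'next-server %s;\n' "$ip"; cat "$file"; } > "$tmp"
    fi
    # Write back in place so that the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# On the node (not run here):
# set_next_server /etc/dhcp/dhcpd.conf "$(hostname -i)"
```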

Change the VLAN of your nodes manually

Put your nodes into the reserved VLAN

If you really want to change the VLAN manually, you can, but it's much simpler to change the vlan with kadeploy.

To change the VLAN of the nodes manually, you must reconfigure the network after the VLAN has changed; but once the VLAN has changed, you can no longer connect to the node! An easy way around this is to schedule the reconfiguration beforehand with the 'at' command (apt-get install at if it is not installed on your nodes).

We will use Taktuk to start remote commands on several nodes at once. In this example, we will use all the nodes. Since taktuk does not handle duplicate names in the nodefile, we must first remove duplicates.

First, we will use taktuk to install at on all nodes, then the taktuk command will simply launch the network reconfiguration in one minute. Finally, we set the VLAN of all our nodes.

As we will change the network configuration of the nodes, we will use an isolated VLAN (a.k.a. kavlan-local) so as not to interfere with the rest of the Grid'5000 network.

Terminal.png frontend:
oarsub -t deploy -l {"type='kavlan-local'"}/vlan=1,walltime=2 -I
Terminal.png frontend:
kavlan -V > myvlan
Terminal.png frontend:
oarsub -t deploy -l nodes=2 -I
Terminal.png frontend:
kadeploy3 -e wheezy-x64-base -k -f $OAR_FILE_NODES
Terminal.png frontend:
taktuk -s -l root -f $OAR_FILE_NODES broadcast exec [ "apt-get update; apt-get --yes install at" ]
Terminal.png frontend:
taktuk -s -l root -f $OAR_FILE_NODES broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
Terminal.png frontend:
kavlan -i `cat myvlan` -s -f $OAR_FILE_NODES
all nodes are configured in the vlan 2


In one minute, your nodes will renegotiate their IP addresses and will be available inside the VLAN. You can connect to each of them using kaconsole or ssh (as we use a kavlan-local, you must connect to the gateway of that VLAN first):

Terminal.png frontend:
ssh kavlan-`cat myvlan`
Terminal.png kavlan-VLANID:
ssh root@suno-30-kavlan-2

You can use the ip neigh command to see the known hosts in your LAN; you should only see IPs in the 192.168.66.0/24 subnet

Terminal.png node:
ip neigh
192.168.66.250 dev eth0  INCOMPLETE
192.168.66.254 dev eth0 lladdr 02:00:00:00:01:02 REACHABLE

You should be able to ping another of your hosts inside your VLAN:

Terminal.png node:
ping -c 3 suno-42-kavlan-2
64 bytes from 192.168.211.42: icmp_req=1 ttl=64 time=0.141 ms
64 bytes from 192.168.211.42: icmp_req=2 ttl=64 time=0.166 ms
64 bytes from 192.168.211.42: icmp_req=3 ttl=64 time=0.165 ms

--- suno-42-kavlan-2.sophia.grid5000.fr ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.141/0.157/0.166/0.015 ms

Put your nodes back into the default VLAN

First, put the list of your node names, with the VLAN suffix, in a file:

Terminal.png frontend:
uniq $OAR_NODEFILE > mynodes
Terminal.png frontend:
sed "s/\([^.]*\)\(.*\)/\1-kavlan-`cat myvlan`\2/" mynodes > mynodes-vlan
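
To see what these two commands do, the sketch below applies the same sed expression to a few sample names. The function name to_vlan_names is hypothetical, and it uses sort -u instead of uniq, which also removes duplicates that are not adjacent (at the cost of reordering the list).

```shell
# Hypothetical helper: read node names on stdin, drop duplicates, and
# insert "-kavlan-VLANID" before the first dot of each name.
# usage: to_vlan_names VLANID < nodefile
to_vlan_names() {
    sort -u | sed "s/\([^.]*\)\(.*\)/\1-kavlan-$1\2/"
}

# Demonstration with duplicated sample names and VLAN 2.
printf '%s\n' suno-30.sophia.grid5000.fr suno-30.sophia.grid5000.fr \
              suno-42.sophia.grid5000.fr | to_vlan_names 2
```

This prints suno-30-kavlan-2.sophia.grid5000.fr and suno-42-kavlan-2.sophia.grid5000.fr, one per line.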

Don't forget to first start the network restarting command with taktuk:

Terminal.png frontend:
taktuk -s -l root -f ./mynodes-vlan broadcast exec [ "echo '/etc/init.d/networking restart' | at now + 1 minute " ]

Then you can put your nodes back in the default VLAN:

Terminal.png frontend:
kavlan -s -i DEFAULT -f $OAR_NODEFILE

You should be able to ping your nodes:

Terminal.png frontend:
for i in `uniq $OAR_NODEFILE`; do ping -c 1 $i; done


Another way to put nodes back into the default VLAN is to change the VLAN and then kareboot the nodes:

Terminal.png frontend:
kavlan -s -i DEFAULT -f $OAR_NODEFILE
Terminal.png frontend:
kareboot3 -f $OAR_NODEFILE -r simple_reboot

Other usage

Using the API

KaVLAN is also available through the API. Using the job and deploy APIs, you can, as with the command-line tools, reserve nodes with a VLAN and deploy nodes into a VLAN. If you want to manipulate VLANs directly through the API, you can do several things:

You can get the vlans you have reserved:

GET https://api.grid5000.fr/stable/sites/:site_uid/vlans/users/:user_uid

You can get all the vlan available on a site:

GET https://api.grid5000.fr/stable/sites/:site_uid/vlans/

You can get the VLAN of nodes on a site:

GET https://api.grid5000.fr/stable/sites/:site_uid/vlans/nodes

You can print the VLAN of a list of nodes:

POST https://api.grid5000.fr/stable/sites/:site_uid/vlans/nodes {"nodes": [ <list of node names>]}

You can change the VLAN of a list of nodes:

POST https://api.grid5000.fr/stable/sites/:site_uid/vlans/:vlan_uid/ {"nodes": [ <list of node names>]}

You can start the DHCP server for the VLAN:

PUT https://api.grid5000.fr/stable/sites/:site_uid/vlans/:vlan_uid/dhcpd {"action":"start"}

You can stop the DHCP server for the VLAN:

PUT https://api.grid5000.fr/stable/sites/:site_uid/vlans/:vlan_uid/dhcpd {"action":"stop"}
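
When scripting these calls, most of the work is building the URL and the JSON body. The helpers below are a sketch: the names vlan_url and nodes_json are hypothetical, and the commented curl invocation (including how you authenticate) is an assumption to adapt to your own setup.

```shell
# Hypothetical helpers for building KaVLAN API requests.
API_BASE="https://api.grid5000.fr/stable"

# Print the URL of a VLAN on a site, with a trailing slash.
# usage: vlan_url SITE VLANID
vlan_url() {
    printf '%s/sites/%s/vlans/%s/\n' "$API_BASE" "$1" "$2"
}

# Print the JSON body {"nodes": [...]} for the node names given as arguments.
nodes_json() {
    body=''
    for n in "$@"; do
        body="${body:+$body, }\"$n\""
    done
    printf '{"nodes": [%s]}\n' "$body"
}

vlan_url sophia 4
nodes_json suno-6.sophia.grid5000.fr suno-7.sophia.grid5000.fr

# Possible use with curl (not run here; authentication is an assumption):
# curl -u "$USER" -X POST -H 'Content-Type: application/json' \
#      -d "$(nodes_json suno-6.sophia.grid5000.fr)" "$(vlan_url sophia 4)"
```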

Use a global VLAN

With a global VLAN, you can put nodes from several sites in the same VLAN.

First reserve a global VLAN on one site (here sophia) and 2 nodes each on lille, sophia and lyon:

Terminal.png frontend:
oargridsub -t deploy -w 2:00:00 sophia:rdef="{\\\\\\\"type='kavlan-global'\\\\\\\"}/vlan=1+/nodes=2",lille:rdef=/nodes=2,lyon:rdef=/nodes=2 > oargrid.out


Get the oargrid Id and Job key from the output of oargridsub:

Terminal.png frontend:
export OAR_JOB_KEY_FILE=`grep "SSH KEY" oargrid.out | cut -f2 -d: | tr -d " "`
Terminal.png frontend:
export OARGRID_JOB_ID=`grep "Grid reservation id" oargrid.out | cut -f2 -d= | cut -d ' ' -f2`

Get the node list using oargridstat:

Terminal.png frontend:
oargridstat -w -l $OARGRID_JOB_ID | grep grid > ~/gridnodes

Then use kadeploy3 to deploy your image on all sites and change the VLAN:

Terminal.png frontend:
kadeploy3 -f gridnodes -a http://public.sophia.grid5000.fr/~nniclausse/openmx.dsc -k --multi-server -o ~/nodes.deployed --vlan 18

If you want to manipulate the VLAN of a node directly, you have to run the kavlan command on the site where the node is. For example, if you have reserved the global VLAN located at sophia and want to put some nodes of lille into this VLAN, you have to run kavlan on the lille site (or use the API with the lille site in the URL).

How to use a local VLAN

In this section, we describe the specifics of local VLANs.

If you want to use a local VLAN, you first have to connect to the gateway of the VLAN. Once you have a running reservation on a local VLAN, you have SSH access to the gateway:

Terminal.png frontend:
ssh kavlan-<vlanid>

Then you can reach your nodes inside the VLAN. Another option is to use the kaconsole command.

(You can still use kadeploy to put your nodes in the VLAN in one step.)


Configure ssh to easily connect to nodes in a local VLAN

You can configure ssh to make the connection through the gateway transparent. To access isolated nodes (local VLAN) transparently, add this to your .ssh/config file on the frontend:

Host *-*-kavlan-1 *-*-kavlan-1.*.grid5000.fr
    ProxyCommand ssh -a -x kavlan-1 nc -q 0 %h %p
Host *-*-kavlan-2 *-*-kavlan-2.*.grid5000.fr
    ProxyCommand ssh -a -x kavlan-2 nc -q 0 %h %p
Host *-*-kavlan-3 *-*-kavlan-3.*.grid5000.fr
    ProxyCommand ssh -a -x kavlan-3 nc -q 0 %h %p

Then you can simply use ssh <cluster>-<nodeid>-kavlan-<vlanid> to access the node, for example:

Terminal.png frontend:
ssh root@NODE-kavlan-VLANID
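
The three Host stanzas shown earlier in this section differ only by the VLAN ID, so they can be generated rather than typed. A minimal sketch follows; the function name kavlan_ssh_config is hypothetical.

```shell
# Hypothetical helper: print one ssh Host stanza per local VLAN ID
# given as argument, in the same form as the stanzas in this section.
kavlan_ssh_config() {
    for id in "$@"; do
        printf 'Host *-*-kavlan-%s *-*-kavlan-%s.*.grid5000.fr\n' "$id" "$id"
        printf '    ProxyCommand ssh -a -x kavlan-%s nc -q 0 %%h %%p\n' "$id"
    done
}

kavlan_ssh_config 1 2 3
```

To install the result, you could append it to your configuration with kavlan_ssh_config 1 2 3 >> ~/.ssh/config.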