KaVLAN: Difference between revisions

From Grid5000
(153 intermediate revisions by 11 users not shown)
Line 1: Line 1:
{{Maintainer|Nicolas Niclausse}}
{{Maintainer|Nicolas Niclausse}}
{{Maintainer|Pierre Neyron}}
{{Portal|User}}
{{Portal|User}}
{{Portal|Tutorial}}
{{Portal|Tutorial}}
{{Status|Draft}}
{{Portal|Network}}
{{Status|In production}}
{{Pages|KaVLAN}}
__FORCETOC__
__FORCETOC__


= Overview =
= Overview =
The goal of KaVLAN is to allow people to manage VLANs on Grid'5000 nodes. The benefit is complete level 2 isolation. It can be used together with OAR and Kadeploy to run experiments on the grid.
[[Image:fig-kavlan.png|thumb|340px|alt="KaVLAN scheme"|KaVLAN big picture]]


The following figure shows two jobs running with KaVLAN: each job has its nodes isolated in a VLAN (purple and green). The other nodes are all in the default VLAN (red). The only way to reach the isolated nodes is to use a gateway node (kavlan-1 and kavlan-2 in the figure). The ''gateway'' has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. This way, you can use ssh to reach your nodes (another way to reach an isolated node is to use the <code class='command'>kaconsole</code> command).
[[Image:Kavlan_admin.png|500px|right|thumb|KaVLAN architecture: see ''local VLANs'' in '''<font color="green">green</font>''', ''routed VLANs'' in '''<font color="blue">blue</font>''', ''global VLANs'' in '''<font color="purple">purple</font>''' and the default VLAN in '''<font color="red">red</font>''']]
[[Image:kavlan.png|450px|center|thumbnail|KaVLAN architecture: 2 jobs running KaVLAN]]


{{Note|text=The gateways are NOT doing any routing: they are only used as ssh gateways.}}
[[KaVLAN]] provides ''network isolation capabilities'' for Grid'5000 users' experiments, via a high-level, user-driven interface to '''[https://en.wikipedia.org/wiki/Virtual_LAN VLANs (802.1Q)]'''.


Currently, KaVLAN can be used on a single site only. The Technical team is currently developing an extension to use QinQ in Grid'5000 to allow Grid-wide VLANs.
Said differently: [[KaVLAN]] allows users to manage VLANs for the network connection of their Grid'5000 nodes.  


Installation status on sites :
Behind the scenes, [[KaVLAN]] actually changes the configuration of the network switches of Grid'5000 infrastructure, to set the VLAN membership (VLAN ID) for the ports which are cabled to the network interfaces of one or more nodes.


{| class="checks" style="width: auto;"
'''The benefit is complete layer 2 isolation for users' experiments.'''
! class="left" |Sites
! Version
! Status
|-
| class="left" |[[Bordeaux:Home|Bordeaux]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Grenoble:Home|Grenoble]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Lille:Home|Lille]]
| 1.0rc3
| [[Image:Check.png]]
|-
| class="left" |[[Lyon:Home|Lyon]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Nancy:Home|Nancy]]
|
| [[Image:InProgress.png]]
|-
| class="left" |[[Orsay:Home|Orsay]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Rennes:Home|Rennes]]
|
| [[Image:Fail.png]]
|-
| class="left" |[[Sophia:Home|Sophia]]
| 1.0rc5
| [[Image:Check.png]]
|-
| class="left" |[[Toulouse:Home|Toulouse]]
|
| [[Image:Fail.png]]
|}


=Usage=
''It is however important to note that KaVLAN does not guarantee performance isolation: on sites with a hierarchical network (such as [[Nancy:Network|Nancy]]), inter-switch links may indeed be shared between various VLANs/experiments.''
== How to reserve a VLAN ==


KaVLAN only works with ''deploy'' reservations; to obtain nodes and a VLAN, simply add the '''-t kavlan''' option to <code class="command">oarsub</code>. For example, if you need 3 nodes and a VLAN:
For experiments involving network reconfiguration, [[KaVLAN]] is used together with OAR and Kadeploy (to reserve resources and to gain control over the operating system and network configuration of the nodes).
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -t kavlan -t deploy -l /nodes=3 -I}}


Then you can get the id of your VLAN using the <code class="command">kavlan</code> command
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V}}


If you run this command outside the shell started by OAR for your reservation, you must add the oar JOBID.
Please note the installation status of KaVLAN for all Grid'5000 sites:
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V -j <code class="replace">JOBID</code>}}
{{KaVLAN installation status}}


You should get an integer in the <1-8> range.
= The 3 KaVLAN VLAN types =
3 types of VLANs are available for users in Grid'5000: '''local''', '''routed''' and '''global'''.
{{Kvlan-types-and-id}}


You can get all the options of the command using --help:
See the 2 schemas on the right of this page, which illustrate [[KaVLAN]] big picture and architecture.
<pre class="brush: bash">
# kavlan --help
Version 1.0rc2
USAGE : kavlan [options]
      -r|--get-network-range
      -g|--get-network-gateway
      -l|--get-nodelist
      -V|--get-vlan-id              print VLAN ID of job (needs -j JOBID)
      -d|--disable-dhcp
      -e|--enable-dhcp
      -i|--vlan_id <VLANID>
      -s                            set vlan for given node(s)
      -f|--filenode <NODEFILE>
      -j|--oar-jobid=<JOBID>
      -m|--machine <nodename>
      -q|--quiet                    quiet mode
      -h|--help                    print this help
      -v|--verbose                  verbose mode
</pre>


Once you have a kavlan reservation running, you are allowed to connect to the VLAN gateway named <code class='hostname'>kavlan-<ID></code> where ID is your vlan ID, and you can also put your nodes in your VLAN (and back into the default VLAN) at anytime during the lifetime of your job.
== 1: Local VLAN ==


Since KaVLAN works only with deploy jobs, the next step is to deploy at least one node (otherwise, you won't have root access on it and therefore can't restart its network configuration).
From the IP routing point of view, a ''local VLAN'' is completely '''isolated''' from the rest of Grid'5000. '''No IP routing is configured in any router of the infrastructure'''. Therefore, to reach your nodes inside that kind of VLAN, the Grid'5000 infrastructure provides a special host you can hop through: '''the SSH gateway of the VLAN'''. For each local VLAN, the hostname of that SSH gateway is: ''kavlan-<code class="replace">ID</code>''.


Let's say you want to deploy all nodes using the lenny-x64-base environment:
{{Term|location=frontend|cmd=<code class="command">kadeploy3</code> -f $OAR_NODEFILE -k -e <code class="replace">lenny-x64-base</code>}}


== Enable/disable the dhcp server of the gateway ==
Then you can connect to any of your nodes within the VLAN using hostnames such as ''<code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>'' (adding the suffix ''-kavlan-'' + the ''VLAN_ID'' to the regular hostname), for instance from the SSH gateway of the VLAN, or from node to node (with the default provided DNS configuration in the VLAN).
Once the deployment is over, you are now able to change the VLAN of your nodes. First check that the DHCP server is running on the gateway, run on the frontend (add ''-j JOBID'' if needed) :
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -e}}


You can disable the DHCP server with <code class='command'>kavlan -d</code>


== Change the VLAN of your nodes ==
The figure below shows two jobs using KaVLAN: each job has its nodes isolated in a ''local VLAN'' ('''<font color="green">green</font>''' and '''<font color="purple">purple</font>'''). The other nodes are all in the default VLAN ('''<font color="red">red</font>'''). To reach the isolated nodes over the network, you must hop through the VLAN's ''SSH gateway'' machine (kavlan-1 and kavlan-2 in the figure). Technically speaking, the ''SSH gateway'' has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. Besides the ''SSH gateway'', an isolated node can also be reached with the <code class='command'>kaconsole</code> command.
In order to change the VLAN of the nodes, you must  reconfigure the network after the vlan has changed; but once the VLAN has changed, you can't connect to the node! An easy way to do this is to use the 'at' command (<code class='command'>apt-get install at</code> if it's not installed in your nodes)
[[Image:kavlan.png|450px|center|thumbnail|KaVLAN architecture: 2 jobs running KaVLAN]]


We will use [[Using_TakTuk|Taktuk]] to start remote commands on several nodes at once. In this example, we will use all the nodes. Since taktuk does not handle duplicate names in the nodefile, we must first remove duplicates.
{{Note|text=Please note that:
* as your nodes are isolated from the rest of Grid'5000, NFS mounts of the /home partition are not possible. Therefore, '''Grid'5000 environments that mount the /home partition (-nfs, -big, -std) may fail to boot'''}}


First, we will use taktuk to install <code class='command'>at</code> on all nodes, then the taktuk command will simply launch the network reconfiguration in one minute. Finally, we set the VLAN of all our nodes.
== 2: Routed VLAN ==


<pre class="brush: bash">
Unlike ''local VLANs'', which are isolated, '''''routed VLANs'' are not isolated at layer 3: IP packets are routed'''. Therefore you can reach the nodes inside a ''routed VLAN'' from the rest of Grid'5000 (e.g. from the default VLAN, or from another ''routed VLAN''). Unlike with ''local VLANs'', there is no need to hop through an SSH gateway.
$ uniq $OAR_NODEFILE > ./mynodes
$ taktuk -s -l root -f ./mynodes broadcast exec [ "apt-get update; apt-get --yes install at" ]
$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
$ kavlan -s
Take node list from OAR nodefile: /var/lib/oar/387465
... node azur-25.sophia.grid5000.fr changed to vlan KAVLAN-7
... node azur-28.sophia.grid5000.fr changed to vlan KAVLAN-7
... node azur-30.sophia.grid5000.fr changed to vlan KAVLAN-7
all nodes are configured in the vlan 7
</pre>


In one minute, your nodes will renegotiate their IP addresses and will be available inside the VLAN. To get the name of your nodes in the VLAN, use the ''-l'' option:
Nodes in the VLAN are reachable with the following hostname: ''<code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>'' (same naming scheme as for ''local VLANs''), from the frontends of the sites for instance.
<pre class="brush: bash">
$kavlan  -l
azur-25-kavlan-7.sophia.grid5000.fr
azur-28-kavlan-7.sophia.grid5000.fr
azur-30-kavlan-7.sophia.grid5000.fr
</pre>


You can connect to each of them using kaconsole or ssh (first, you must connect to the gateway of the vlan):
== 3: Global VLAN ==
<pre class="brush: bash">
$ VLANID=`kavlan -V`
$ ssh kavlan-$VLANID
kavlan-7@sophia$ ssh root@azur-25-kavlan-7
</pre>


You can use the <code class='command'>ip neigh</code> command to see the known hosts in your LAN; you should only see IPs in the 192.168.66.0/24 subnet
'''''Global VLANs'' are VLANs which span all Grid'5000 sites'''. Therefore you can configure nodes of different sites in the same ''global VLAN'', i.e. in the same Ethernet network (no inter-site IP routing is required, since nodes in a global VLAN share a single broadcast domain).
<pre class="brush: bash">
azur-25-kavlan-7:~$ip neigh
192.168.66.250 dev eth0  INCOMPLETE
192.168.66.254 dev eth0 lladdr 02:00:00:00:01:02 REACHABLE
</pre>


You should be able to ping another of your host inside your VLAN
(Underneath, they use the [https://en.wikipedia.org/wiki/IEEE_802.1ad IEEE 802.1ad] encapsulation, also known as QinQ, to provide a single layer 2 network for all sites.)
<pre class="brush: bash">
azur-25-kavlan-7:~# ping -c 3 azur-30-kavlan-7
PING azur-30-kavlan-7.sophia.grid5000.fr (192.168.66.30) 56(84) bytes of data.
64 bytes from azur-30.local (192.168.66.30): icmp_seq=1 ttl=64 time=0.154 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from azur-30.local (192.168.66.30): icmp_seq=3 ttl=64 time=0.163 ms


--- azur-30-kavlan-7.sophia.grid5000.fr ping statistics ---
There is exactly one ''global VLAN'' provided per site. If that VLAN is already reserved by another user, you can try to get one from another site. '''The reservation must be made on the site of the ''global VLAN''.'''
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.154/0.162/0.170/0.012 ms
</pre>


You can configure ssh to make the connection through the gateway transparent:
Since it is a single layer 2 network, no routing is required between the nodes placed in a ''global VLAN'' (even from site to site).


== Configure ssh to easily connect to nodes in a VLAN ==
To reach nodes inside a ''global VLAN'' from outside, routing is configured on the router of the site where the ''global VLAN'' is reserved.
The hostnames of nodes within a VLAN follow the same scheme as above: ''<code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>''.


In order to transparently use ssh to access isolated nodes, you should add this to your .ssh/config file on the frontend:
{{Note|text=Please mind that there is no performance isolation between the ''global VLANs'', nor between them and the Grid'5000 inter-site VLAN (backbone VLAN). All share the same inter-site ''physical'' link.}}


<pre class="brush: bash;">
= Reserving a VLAN =
Host *-*-kavlan-1 *-*-kavlan-1.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-1 nc %h %p
Host *-*-kavlan-2 *-*-kavlan-2.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-2 nc %h %p
Host *-*-kavlan-3 *-*-kavlan-3.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-3 nc %h %p
Host *-*-kavlan-4 *-*-kavlan-4.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-4 nc %h %p
Host *-*-kavlan-5 *-*-kavlan-5.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-5 nc %h %p
Host *-*-kavlan-6 *-*-kavlan-6.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-6 nc %h %p
Host *-*-kavlan-7 *-*-kavlan-7.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-7 nc %h %p
Host *-*-kavlan-8 *-*-kavlan-8.*.grid5000.fr
    ProxyCommand ssh -q -a -x kavlan-8 nc %h %p
</pre>


Then you can simply use ssh <cluster>-<nodeid>-kavlan-<vlanid> to access the node , for ex:
Using KaVLAN requires working with ''deploy'' reservations, because it necessarily involves reconfiguring the network stack of the operating system of the nodes.
{{Term|location=frontend|cmd=<code class="command">ssh</code> root@<code class='replace'>NODE</code>-kavlan-<code class='replace'>VLANID</code>}}


== Put your nodes back into the default VLAN ==
To obtain both nodes and a VLAN, you must reserve kavlan resources (VLAN-IDs) with OAR using the <code class="command">oarsub</code> command. As shown in the table above, there are 3 kinds of resources defined in OAR for VLANs: '''kavlan''', '''kavlan-local''', '''kavlan-global'''. For example, if you need 3 nodes and a local VLAN, you can run:
{{Term|location=frontend|cmd=<code class="command">oarsub</code> -t deploy -l {"type='kavlan-local'"}/vlan=1+/nodes=3 -I}}


First, put the list of your node names in the VLAN into a file:
Then you can get the ID of your VLAN using the <code class="command">kavlan</code> command
{{Term|location=frontend|cmd=<code class='command'>kavlan</code> -l > mynodes-vlan}}
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V}}


Don't forget to first start the network restarting command with taktuk:
If you need to run that command from outside the shell which is started by OAR for your reservation, you have to give the OAR ''JOBID''.
{{Term|location=frontend|cmd=<code class='command'>taktuk</code> -s -l root -f ./mynodes-vlan broadcast exec [ "echo '/etc/init.d/networking restart' &#124;  at now + 1 minute " ]}}
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -V -j <code class="replace">JOBID</code>}}


Then you can put your nodes back in the default VLAN:
Either way, you should get a VLAN ID in the ''<1-3>'' range for ''local VLANs'', ''<4-9>'' for ''routed VLANs'', and 10 or greater for ''global VLANs'' (only one global VLAN ID is available per site, which should be the one of the site you are connected to).
{{Term|location=frontend|cmd=<code class="command">kavlan</code> -s -i DEFAULT -f $OAR_NODEFILE}}


You should be able to ping your nodes:
<pre class="brush: bash">
for i in `uniq $OAR_NODEFILE`; do ping -c 1 $i; done
PING azur-25.sophia.grid5000.fr (138.96.20.25 56(84) bytes of data.
64 bytes from azur-25.sophia.grid5000.fr (138.96.20.25): icmp_seq=1 ttl=64 time=1002 ms


--- azur-25.sophia.grid5000.fr ping statistics ---
See below the KaVLAN ID, and associated IP subnets (served by DHCP in the VLANs)  
1 packets transmitted, 1 received, 0% packet loss, time 0ms
{{Template:KaVLAN IP Network Golden rules}}
rtt min/avg/max/mdev = 1002.910/1002.910/1002.910/0.000 ms
(More info in the [[Grid5000:Network|Network page]])
PING azur-28.sophia.grid5000.fr (138.96.20.28) 56(84) bytes of data.
64 bytes from azur-28.sophia.grid5000.fr (138.96.20.28): icmp_seq=1 ttl=64 time=1.23 ms


--- azur-28.sophia.grid5000.fr ping statistics ---
= Setting up the VLAN =
1 packets transmitted, 1 received, 0% packet loss, time 0ms
Configuring the VLANs is done with the  '''<code class="command">kavlan</code>''' command.
rtt min/avg/max/mdev = 1.234/1.234/1.234/0.000 ms
PING azur-30.sophia.grid5000.fr (138.96.20.30) 56(84) bytes of data.
64 bytes from azur-30.sophia.grid5000.fr (138.96.20.30): icmp_seq=1 ttl=64 time=1.25 ms


--- azur-30.sophia.grid5000.fr ping statistics ---
All the options of the command can be shown using ''--help'', as follows:
1 packets transmitted, 1 received, 0% packet loss, time 0ms
<pre class="brush: bash">
rtt min/avg/max/mdev = 1.259/1.259/1.259/0.000 ms
# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                CA certificate
    -c, --client-cert CERT          client certificate
    -k, --client-key KEY            client key
    -l, --get-nodelist              Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp              Stop DHCP server
    -V, --show-vlan-id              Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                  Show vlan of nodes
    -s, --set-vlan                  Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE              set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose              Run verbosely
    -q, --[no-]quiet                Run quietly
        --[no-]debug                Run with debug output
    -h, --help                      Show this message
        --version                    Show version
</pre>
</pre>


=Advanced usage=
So, once you have a ''kavlan'' job running and know your VLAN ID, you can use the '''<code class="command">kavlan</code>''' command to put some network interfaces of your nodes in your VLAN (and later, back into the default VLAN) at any time during the lifetime of your job.
== Setup a DHCP server on your nodes ==
 
If you need to run your own DHCP server (for example if you want to run a cluster distribution inside kavlan or test kadeploy ), you can use the configuration file available on the VLAN's gateway.
 
Let's say that you want to install dhcpd on azur-25-kavlan-7. You first have to install a DHCP server on this node (we assume the node is not yet in the job VLAN):
{{Term|location=node|cmd=<code class="command">apt-get</code> install dhcp3-server}}
 
Then, copy the configuration file from the gateway to the node:
 
{{Term|location=frontend|cmd=<code class="command">scp</code> kavlan-7:/etc/dhcp3/dhcpd.conf root@azur-25-kavlan-7:/etc/dhcp3/}}
 


In case of a node with multiple cabled network interfaces, each of them can be used, with the following naming:
* for the default interface: <code class="replace">hostname-X</code>-kavlan-<code class="replace">ID</code>
* for other interfaces: <code class="replace">hostname-X</code>-eth<code class="replace">Y</code>-kavlan-<code class="replace">ID</code>


Then we must isolate our nodes before starting the dhcp server:
{{Note|text=You may notice that the hostname for secondary interfaces has the form "<code class="replace">hostname-X</code>-eth<code class="replace">Y</code>-kavlan-<code class="replace">ID</code>", while the name of the interface in the system is "<code class="replace">enoY</code>" or "<code class="replace">enpYsZ</code>".
<pre class="brush: bash">
This is due to changes in the naming of interfaces since Debian 9 (see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/).<br/>
frontend$ taktuk -s -l root -f ./mynodes broadcast exec [ "echo '/etc/init.d/networking restart'| at now + 1 minute " ]
KaVLAN hostnames still use the old interface names. If you are not sure which name corresponds to which interface, both namings (old and new) are described in the API.
frontend$ kavlan -s
For example, '''grisou-2-eth1-kavlan-1.nancy.grid5000.fr''' corresponds to interface '''eno2''' on '''grisou-2''' ( https://api.grid5000.fr/stable/sites/nancy/clusters/grisou/nodes/grisou-2.json )
</pre>
}}
Wait one minute, and then you can start the server, once you have disabled the gateway's DHCP server.
On the frontend {{Term|location=frontend|cmd=<code class="command">kavlan -d</code>}}
then on the node:
{{Term|location=node|cmd=<code class='command'>/etc/init.d/dhcp3-server</code> start}}


Then, in another shell, connect as root on a second node:
{{Term|location=frontend|cmd=<code class='command'>ssh</code> azur-30-kavlan-7}}


And restart the network configuration:
<pre class="brush: bash">
azur-30-kavlan-7:~# /etc/init.d/networking restart
Reconfiguring network interfaces...There is already a pid file /var/run/dhclient.eth1.pid with pid 5319
killed old client process, removed PID file
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/


Listening on LPF/eth1/00:11:25:c4:d9:c5
;A DHCP service is provided in all VLANs (local, routed and global):
Sending on  LPF/eth1/00:11:25:c4:d9:c5
Once you have put network interfaces of some nodes in a VLAN, you can down-up them, or restart the networking service of the operating system (with kaconsole3 for instance, or using a command line with some bash magic like '| at now + 1 minute', to run the command asynchronously and overcome the network disconnection that will occur), or reboot the node (with kareboot3) in order to get a relevant IP for the VLAN.  
Sending on  Socket/fallback
DHCPRELEASE on eth1 to 192.168.66.254 port 67
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/


Listening on LPF/eth1/00:11:25:c4:d9:c5
Sending on  LPF/eth1/00:11:25:c4:d9:c5
Sending on  Socket/fallback
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 14
DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4
DHCPOFFER from 192.168.66.19
DHCPREQUEST on eth1 to 255.255.255.255 port 67
DHCPACK from 192.168.66.19
bound to 192.168.66.2 -- renewal in 41122 seconds.
done.
</pre>


on the dhcp server, check the logs:
If needed for your experiment, please note that the '''<code class="command">kavlan</code>''' command allows you to deactivate the DHCP service in a VLAN.


<pre class="brush: bash">
azur-25-kavlan-7:~# tail /var/log/messages
Mar 17 16:22:51 azur-25 dhcpd: Copyright 2004-2008 Internet Systems Consortium.
Mar 17 16:22:51 azur-25 dhcpd: All rights reserved.
Mar 17 16:22:51 azur-25 dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 deleted host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 new dynamic host decls to leases file.
Mar 17 16:22:51 azur-25 dhcpd: Wrote 0 leases to leases file.
Mar 17 16:25:27 azur-25 dhcpd: DHCPDISCOVER from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPOFFER on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPREQUEST for 192.168.66.2 (192.168.66.19) from 00:11:25:c4:d9:c5 via eth1
Mar 17 16:25:27 azur-25 dhcpd: DHCPACK on 192.168.66.2 to 00:11:25:c4:d9:c5 via eth1
</pre>


In the four last lines, you see that your own dhcp server has given an address to the other node.
Reminder: for local VLANs, you are also allowed to ssh to the VLAN's SSH gateway, which is named kavlan-<code class='replace'>ID</code>.


If you need to do PXE boot, you must change the tftp server in the config file:
{{Term|location=node|cmd=IP=`hostname -i`}}
{{Term|location=node|cmd=<code class='command'>perl</code> -i -pe "s/next-server .*/next-server $IP;/" /etc/dhcp3/dhcpd.conf}}
If no next-server is configured, you must edit the file by hand and add a line like this:
next-server XX.XX.XX.XX ;


where XX.XX.XX.XX is the IP of your node (echo $IP).
Please look at the other KaVLAN pages for examples of usage (look at the '''see-also dialog box''' at the top of the page).

Revision as of 13:25, 8 January 2019


Overview

Figure: KaVLAN big picture
Figure: KaVLAN architecture, with local VLANs in green, routed VLANs in blue, global VLANs in purple and the default VLAN in red

KaVLAN provides network isolation capabilities for Grid'5000 users' experiments, via a high-level, user-driven interface to VLANs (802.1Q).

Said differently: KaVLAN allows users to manage VLANs for the network connection of their Grid'5000 nodes.

Behind the scenes, KaVLAN actually changes the configuration of the network switches of Grid'5000 infrastructure, to set the VLAN membership (VLAN ID) for the ports which are cabled to the network interfaces of one or more nodes.

The benefit is complete layer 2 isolation for users' experiments.

It is however important to note that KaVLAN does not guarantee performance isolation: on sites with a hierarchical network (such as Nancy), inter-switch links may indeed be shared between various VLANs/experiments.

For experiments involving network reconfiguration, KaVLAN is used together with OAR and Kadeploy (to reserve resources and to gain control over the operating system and network configuration of the nodes).


Please note the installation status of KaVLAN for all Grid'5000 sites:

Sites | Version | Status
Grenoble | 1.2.7-1 | OK
Lille | 1.2.7-1 | OK
Luxembourg | 1.2.7-1 | OK
Lyon | 1.2.7-1 | OK
Nancy | 1.2.7-1 | OK
Nantes | 1.2.7-1 | OK
Rennes | 1.2.7-1 | OK
Sophia | 1.2.7-1 | OK

The 3 KaVLAN VLAN types

3 types of VLANs are available for users in Grid'5000: local, routed and global.

KaVLAN name in OAR | Type | First ID | Last ID
kavlan-local | local | 1 | 3
kavlan | routed | 4 | 9
kavlan-global | global | 10 | 21

See the 2 schemas on the right of this page, which illustrate KaVLAN big picture and architecture.

1: Local VLAN

From the IP routing point of view, a local VLAN is completely isolated from the rest of Grid'5000. No IP routing is configured in any router of the infrastructure. Therefore, to reach your nodes inside that kind of VLAN, the Grid'5000 infrastructure provides a special host you can hop through: the SSH gateway of the VLAN. For each local VLAN, the hostname of that SSH gateway is: kavlan-ID.


Then you can connect to any of your nodes within the VLAN using hostnames such as hostname-X-kavlan-ID (adding the suffix -kavlan- + the VLAN_ID to the regular hostname), for instance from the SSH gateway of the VLAN, or from node to node (with the default provided DNS configuration in the VLAN).
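The naming rule above can be sketched as a small shell helper. This is only an illustration: the function name kavlan_hostname is hypothetical and is not part of the KaVLAN tools. It inserts the -kavlan-ID suffix right after the short host name, whether or not a domain is given.

```shell
# Hypothetical helper (not part of KaVLAN): build the in-VLAN name of a node.
kavlan_hostname() {
    local node=$1   # regular node name, short or fully qualified
    local id=$2     # VLAN ID
    local short=${node%%.*}          # part before the first dot
    local domain=${node#"$short"}    # remaining ".site.grid5000.fr" part, possibly empty
    printf '%s-kavlan-%s%s\n' "$short" "$id" "$domain"
}

kavlan_hostname azur-25.sophia.grid5000.fr 7   # prints azur-25-kavlan-7.sophia.grid5000.fr
```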


The figure below shows two jobs using KaVLAN: each job has its nodes isolated in a local VLAN (green and purple). The other nodes are all in the default VLAN (red). To reach the isolated nodes over the network, you must hop through the VLAN's SSH gateway machine (kavlan-1 and kavlan-2 in the figure). Technically speaking, the SSH gateway has two Ethernet interfaces: one in the default VLAN and one in the dedicated VLAN. Besides the SSH gateway, an isolated node can also be reached with the kaconsole command.

KaVLAN architecture: 2 jobs running KaVLAN
Note

Please note that:

  • As your nodes are isolated from the rest of Grid'5000, NFS mounts of the /home partition are not possible. Therefore, Grid'5000 environments that mount the /home partition (-nfs, -big, -std) may fail to boot.
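To make the hop through the SSH gateway of a local VLAN transparent, an earlier revision of this page suggested ProxyCommand entries in .ssh/config on the frontend. The sketch below generates such entries for the local VLAN IDs 1 to 3. The function name kavlan_ssh_config is hypothetical, and the stanza assumes that nc is available on the gateway, as in that earlier revision.

```shell
# Hypothetical helper: print ~/.ssh/config stanzas that reach nodes of
# local VLANs (IDs 1 to 3) through their SSH gateway kavlan-ID.
kavlan_ssh_config() {
    local id
    for id in 1 2 3; do
        printf 'Host *-*-kavlan-%s *-*-kavlan-%s.*.grid5000.fr\n' "$id" "$id"
        # %%h and %%p become the literal %h and %p tokens expanded by ssh
        printf '    ProxyCommand ssh -q -a -x kavlan-%s nc %%h %%p\n' "$id"
    done
}

kavlan_ssh_config
```

Once the output is appended to .ssh/config on the frontend, a command such as ssh root@NODE-kavlan-VLANID should go through the matching gateway without an explicit hop.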

2: Routed VLAN

Unlike local VLANs, which are isolated, routed VLANs are not isolated at layer 3: IP packets are routed. Therefore you can reach the nodes inside a routed VLAN from the rest of Grid'5000 (e.g. from the default VLAN, or from another routed VLAN). Unlike with local VLANs, there is no need to hop through an SSH gateway.

Nodes in the VLAN are reachable with the following hostname: hostname-X-kavlan-ID (same naming scheme as for local VLANs), from the frontends of the sites for instance.

3: Global VLAN

Global VLANs are VLANs which span all Grid'5000 sites. Therefore you can configure nodes of different sites in the same global VLAN, i.e. in the same Ethernet network (no inter-site IP routing is required, since nodes in a global VLAN share a single broadcast domain).

(Underneath, they use the IEEE 802.1ad encapsulation, also known as QinQ, to provide a single layer 2 network for all sites.)

There is exactly one global VLAN provided per site. If that VLAN is already reserved by another user, you can try to get one from another site. The reservation must be made on the site of the global VLAN.

Since it is a single layer 2 network, no routing is required between the nodes placed in a global VLAN (even from site to site).

To reach nodes inside a global VLAN from outside, routing is configured on the router of the site where the global VLAN is reserved. The hostnames of nodes within a VLAN follow the same scheme as above: hostname-X-kavlan-ID.

Note

Please mind that there is no performance isolation between the global VLANs, nor between them and the Grid'5000 inter-site VLAN (backbone VLAN). All share the same inter-site physical link.

Reserving a VLAN

Using KaVLAN requires working with deploy reservations, because it necessarily involves reconfiguring the network stack of the operating system of the nodes.

To obtain both nodes and a VLAN, you must reserve kavlan resources (VLAN-IDs) with OAR using the oarsub command. As shown in the table above, there are 3 kinds of resources defined in OAR for VLANs: kavlan, kavlan-local, kavlan-global. For example, if you need 3 nodes and a local VLAN, you can run:

frontend:
oarsub -t deploy -l {"type='kavlan-local'"}/vlan=1+/nodes=3 -I
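The same request can be built for the other VLAN types. The sketch below is a hypothetical helper (kavlan_oar_request is not an OAR or KaVLAN command) and it assumes that the resource syntax shown above applies unchanged to the kavlan and kavlan-global types. It prints the argument as oarsub receives it, that is, after the shell has removed the inner double quotes of the command above.

```shell
# Hypothetical helper: print the oarsub -l argument for one VLAN of the
# given OAR type plus a number of nodes.
kavlan_oar_request() {
    local vlan_type=$1 nodes=$2
    case $vlan_type in
        kavlan-local|kavlan|kavlan-global) ;;   # the 3 OAR types listed on this page
        *) echo "unknown VLAN type: $vlan_type" >&2; return 1 ;;
    esac
    printf "{type='%s'}/vlan=1+/nodes=%s\n" "$vlan_type" "$nodes"
}

kavlan_oar_request kavlan 3   # prints {type='kavlan'}/vlan=1+/nodes=3
```

A possible use on the frontend would be: oarsub -t deploy -l "$(kavlan_oar_request kavlan 3)" -I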

Then you can get the ID of your VLAN using the kavlan command

frontend:
kavlan -V

If you need to run that command from outside the shell which is started by OAR for your reservation, you have to give the OAR JOBID.

frontend:
kavlan -V -j JOBID

Either way, you should get a VLAN ID in the <1-3> range for local VLANs, <4-9> for routed VLANs, and 10 or greater for global VLANs (only one global VLAN ID is available per site, which should be the one of the site you are connected to).
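As a quick check on the value returned by kavlan -V, the ID ranges can be mapped back to a VLAN type. The helper below is a hypothetical sketch (kavlan_type is not a KaVLAN command); the upper bound of 21 for global VLANs is taken from the type table on this page.

```shell
# Hypothetical helper: map a VLAN ID to its KaVLAN type.
kavlan_type() {
    case $1 in
        [1-3])        echo local ;;
        [4-9])        echo routed ;;
        1[0-9]|2[01]) echo global ;;    # IDs 10 to 21
        *)            echo unknown; return 1 ;;
    esac
}

kavlan_type 7    # prints routed
```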


See below the KaVLAN ID, and associated IP subnets (served by DHCP in the VLANs)

Local VLANs (non-routed)
Site              KAVLAN-1            KAVLAN-2            KAVLAN-3
All               192.168.192.0/20    192.168.208.0/20    192.168.224.0/20

Routed VLANs
Site              KAVLAN-4        KAVLAN-5        KAVLAN-6        KAVLAN-7        KAVLAN-8        KAVLAN-9
Bordeaux          10.0.0.0/18     10.0.64.0/18    10.0.128.0/18   10.0.192.0/18   10.1.0.0/18     10.1.64.0/18
Grenoble          10.4.0.0/18     10.4.64.0/18    10.4.128.0/18   10.4.192.0/18   10.5.0.0/18     10.5.64.0/18
Lille             10.8.0.0/18     10.8.64.0/18    10.8.128.0/18   10.8.192.0/18   10.9.0.0/18     10.9.64.0/18
Lyon              10.12.0.0/18    10.12.64.0/18   10.12.128.0/18  10.12.192.0/18  10.13.0.0/18    10.13.64.0/18
Nancy             10.16.0.0/18    10.16.64.0/18   10.16.128.0/18  10.16.192.0/18  10.17.0.0/18    10.17.64.0/18
Orsay             10.20.0.0/18    10.20.64.0/18   10.20.128.0/18  10.20.192.0/18  10.21.0.0/18    10.21.64.0/18
Rennes            10.24.0.0/18    10.24.64.0/18   10.24.128.0/18  10.24.192.0/18  10.25.0.0/18    10.25.64.0/18
Toulouse          10.28.0.0/18    10.28.64.0/18   10.28.128.0/18  10.28.192.0/18  10.29.0.0/18    10.29.64.0/18
Sophia            10.32.0.0/18    10.32.64.0/18   10.32.128.0/18  10.32.192.0/18  10.33.0.0/18    10.33.64.0/18
Strasbourg Reims  10.36.0.0/18    10.36.64.0/18   10.36.128.0/18  10.36.192.0/18  10.37.0.0/18    10.37.64.0/18
Luxembourg        10.40.0.0/18    10.40.64.0/18   10.40.128.0/18  10.40.192.0/18  10.41.0.0/18    10.41.64.0/18
Nantes            10.44.0.0/18    10.44.64.0/18   10.44.128.0/18  10.44.192.0/18  10.45.0.0/18    10.45.64.0/18
Note.png Note

At the end of each network, the address x.x.x.253 is used by the KaVLAN server.
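
The routed VLAN subnets follow a regular pattern: the second octet advances by 4 from one site to the next, and the six /18 subnets of a site are consecutive. The sketch below rebuilds a subnet from that pattern. It is only read off the table, not an official allocation rule; site_index and routed_subnet are hypothetical helpers, and keying the "Strasbourg Reims" row as reims is an assumption.

```shell
# Hypothetical helper: position of a site in the table (0 for Bordeaux).
site_index() {
    case "$1" in
        bordeaux) echo 0 ;;   grenoble) echo 1 ;;    lille) echo 2 ;;
        lyon) echo 3 ;;       nancy) echo 4 ;;       orsay) echo 5 ;;
        rennes) echo 6 ;;     toulouse) echo 7 ;;    sophia) echo 8 ;;
        reims) echo 9 ;;      luxembourg) echo 10 ;; nantes) echo 11 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: subnet of routed VLAN $2 (4 to 9) on site $1.
routed_subnet() {
    site="$1"; id="$2"
    idx=$(site_index "$site") || return 1
    [ "$id" -ge 4 ] && [ "$id" -le 9 ] || return 1
    k=$((id - 4))   # 0 for KAVLAN-4 up to 5 for KAVLAN-9
    echo "10.$((4 * idx + k / 4)).$(( (k % 4) * 64 )).0/18"
}

routed_subnet nancy 5    # prints: 10.16.64.0/18
routed_subnet lille 8    # prints: 10.9.0.0/18
```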

Global VLANs
Site              Global VLAN   Subnet            Router IP
Bordeaux          KAVLAN-10     10.3.192.0/18     10.3.255.254
Grenoble          KAVLAN-11     10.7.192.0/18     10.7.255.254
Lille             KAVLAN-12     10.11.192.0/18    10.11.255.254
Lyon              KAVLAN-13     10.15.192.0/18    10.15.255.254
Nancy             KAVLAN-14     10.19.192.0/18    10.19.255.254
Orsay             KAVLAN-15     10.23.192.0/18    10.23.255.254
Rennes            KAVLAN-16     10.27.192.0/18    10.27.255.254
Toulouse          KAVLAN-17     10.31.192.0/18    10.31.255.254
Sophia            KAVLAN-18     10.35.192.0/18    10.35.255.254
Strasbourg Reims  KAVLAN-19     10.39.192.0/18    10.39.255.254
Luxembourg        KAVLAN-20     10.43.192.0/18    10.43.255.254
Nantes            KAVLAN-21     10.47.192.0/18    10.47.255.254
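
In the table above, the subnet and router IP of a global VLAN can be derived from its ID alone. A minimal sketch, assuming the pattern visible in the table holds (global_vlan_info is a hypothetical helper, not a kavlan feature):

```shell
# Hypothetical helper: subnet and router IP of global VLAN $1 (10 to 21).
global_vlan_info() {
    id="$1"
    [ "$id" -ge 10 ] && [ "$id" -le 21 ] || return 1
    # Second octet: 3 for KAVLAN-10, then +4 for each following ID.
    octet=$((4 * (id - 10) + 3))
    echo "subnet=10.$octet.192.0/18 router=10.$octet.255.254"
}

global_vlan_info 12   # prints: subnet=10.11.192.0/18 router=10.11.255.254
```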
IP subnet assignments for the sites within a global VLAN

A global VLAN is a /18 subnet (16382 usable IP addresses). It is split so that every site gets one /23 (510 usable IP addresses) in the global VLAN address space.
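
These address counts follow from the prefix lengths: a /N subnet holds 2^(32-N) addresses, minus the network and broadcast addresses. A quick check in shell arithmetic (usable_hosts is a hypothetical helper name):

```shell
# Hypothetical helper: number of usable host addresses in a /N IPv4 subnet.
usable_hosts() { echo $(( (1 << (32 - $1)) - 2 )); }

usable_hosts 18              # prints: 16382
usable_hosts 23              # prints: 510
echo $(( 1 << (23 - 18) ))   # prints: 32 (number of /23 blocks in a /18)
```

A /18 therefore has room for 32 blocks of size /23, which is more than the 12 sites listed in the tables above.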

Example for the global VLAN of Lille, KAVLAN-12, whose address space is 10.11.192.0/18:

  • Bordeaux: 10.11.192.1 - 10.11.193.254
  • Grenoble: 10.11.194.1 - 10.11.195.254
  • Lille: 10.11.196.1 - 10.11.197.254
  • Lyon: 10.11.198.1 - 10.11.199.254
  • Nancy: 10.11.200.1 - 10.11.201.254
  • Orsay: 10.11.202.1 - 10.11.203.254
  • Rennes: 10.11.204.1 - 10.11.205.254
  • Toulouse: 10.11.206.1 - 10.11.207.254
  • Sophia: 10.11.208.1 - 10.11.209.254
  • Strasbourg Reims: 10.11.210.1 - 10.11.211.254
  • Luxembourg: 10.11.212.1 - 10.11.213.254
  • Nantes: 10.11.214.1 - 10.11.215.254
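
The list above can be regenerated with a short loop: each site takes the next /23, starting at third octet 192. The sketch below assumes that the same site order applies to every global VLAN, which the text only shows for KAVLAN-12; site_ranges is a hypothetical helper.

```shell
# Hypothetical helper: print the per-site /23 range inside global VLAN $1.
site_ranges() {
    id="$1"
    octet=$((4 * (id - 10) + 3))   # second octet, e.g. 11 for KAVLAN-12
    third=192                      # third octet of the first site's /23
    for site in Bordeaux Grenoble Lille Lyon Nancy Orsay Rennes Toulouse \
                Sophia "Strasbourg Reims" Luxembourg Nantes; do
        echo "$site: 10.$octet.$third.1 - 10.$octet.$((third + 1)).254"
        third=$((third + 2))
    done
}

site_ranges 12
```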

(More info in the Network page)

Setting up the VLAN

Configuring the VLANs is done with the kavlan command.

All the options of the command can be shown using --help, as follows:

# kavlan --help
Usage: kavlan [options]
Specific options:
    -i, --vlan-id N                  set VLAN ID (integer or DEFAULT)
    -C, --ca-cert CA                 CA certificate
    -c, --client-cert CERT           client certificate
    -k, --client-key KEY             client key
    -l, --get-nodelist               Show nodenames in the given vlan
    -e, --enable-dhcp                Start DHCP server
    -d, --disable-dhcp               Stop DHCP server
    -V, --show-vlan-id               Show vlan id of job (needs -j JOBID)
    -g, --get-vlan                   Show vlan of nodes
    -s, --set-vlan                   Set vlan of nodes
    -j, --oar-jobid JOBID            OAR job id
    -m, --machine NODE               set nodename (several -m are OK)
    -f, --filename NODEFILE          read nodes from a file
    -u, --user USERNAME              username
    -v, --[no-]verbose               Run verbosely
    -q, --[no-]quiet                 Run quietly
        --[no-]debug                 Run with debug output
    -h, --help                       Show this message
        --version                    Show version

So, once you have a kavlan job running and know your VLAN ID, you can use the kavlan command to put some network interfaces of your nodes in your VLAN (and later, back into the default VLAN) at any time during the lifetime of your job.
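
As an illustration, the -s, -i and -f options listed in the help output can be combined to move all nodes of a node file into a VLAN. The wrapper below is a sketch, not an official tool: kavlan_move and the KAVLAN override variable are hypothetical, and nodes.txt is a placeholder file name.

```shell
# Hypothetical wrapper around "kavlan -s": put the nodes listed in file $2
# into VLAN $1 (an integer ID, or DEFAULT to go back to the default VLAN).
# Setting KAVLAN=echo prints the arguments instead of running kavlan.
kavlan_move() {
    vlan="$1"; nodefile="$2"
    "${KAVLAN:-kavlan}" -s -i "$vlan" -f "$nodefile"
}

# Dry run in a subshell, so that the override does not leak:
(KAVLAN=echo; kavlan_move 4 nodes.txt)   # prints: -s -i 4 -f nodes.txt
```

Inside a job, the node file would typically be "$OAR_NODEFILE", and kavlan_move DEFAULT "$OAR_NODEFILE" would put the nodes back into the default VLAN.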

In case of a node with multiple cabled network interfaces, each of them can be used, with the following naming:

  • for the default interface: hostname-X-kavlan-ID
  • for other interfaces: hostname-X-ethY-kavlan-ID
Note.png Note

You may notice that the hostname of a secondary interface takes the form "hostname-X-ethY-kavlan-ID", while the name of the interface in the system is "enoY" or "enpYsZ".

This is due to the change in interface naming since Debian 9 (see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/).
KaVLAN hostnames still use the old interface names. If you are not sure which name corresponds to which interface, both namings (old and new) are described in the API.

For example, grisou-2-eth1-kavlan-1.nancy.grid5000.fr corresponds to interface eno2 on grisou-2 ( https://api.grid5000.fr/stable/sites/nancy/clusters/grisou/nodes/grisou-2.json )
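
The naming scheme can be expressed as a small helper. This is a sketch: kavlan_hostname is a hypothetical function, and the fully qualified form for the default interface is inferred from the example above.

```shell
# Hypothetical helper: build the KaVLAN hostname of a node interface.
# $1 node (e.g. grisou-2), $2 VLAN ID, $3 site,
# $4 optional old-style interface name (e.g. eth1) for secondary interfaces.
kavlan_hostname() {
    node="$1"; vlan="$2"; site="$3"; iface="${4:-}"
    if [ -n "$iface" ]; then
        echo "$node-$iface-kavlan-$vlan.$site.grid5000.fr"
    else
        echo "$node-kavlan-$vlan.$site.grid5000.fr"
    fi
}

kavlan_hostname grisou-2 1 nancy eth1   # prints: grisou-2-eth1-kavlan-1.nancy.grid5000.fr
kavlan_hostname grisou-2 1 nancy        # prints: grisou-2-kavlan-1.nancy.grid5000.fr
```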


A DHCP service is provided in all VLANs (local, routed and global).

Once you have put the network interfaces of some nodes in a VLAN, the nodes must obtain an IP address that is relevant for that VLAN. To do so, you can bring the interfaces down and up again, restart the networking service of the operating system (with kaconsole3 for instance, or using a command line with some shell magic like '| at now + 1 minute', to run the command asynchronously and overcome the network disconnection that will occur), or reboot the node (with kareboot3).
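
The '| at now + 1 minute' trick mentioned above could look like the following sketch. It assumes that the node image uses ifupdown (ifdown/ifup), that the atd service is running on the node, and that the function is run as root on the node; renew_ip_later and the AT override variable are hypothetical.

```shell
# Hypothetical helper: schedule a down/up of interface $1 one minute from now
# with at(1), so that the commands still run after the SSH session is cut.
# AT can be overridden for testing; by default the real "at" command is used.
renew_ip_later() {
    iface="$1"
    echo "ifdown $iface; ifup $iface" | "${AT:-at}" now + 1 minute
}

# Example call on a node (interface name is only an example):
#   renew_ip_later eno1
```

Queue the job first, then change the VLAN of the node from the frontend within the minute, so that the interface is restarted once it is already in the new VLAN.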


If needed for your experiment, please note that the kavlan command can also disable the DHCP service in a VLAN (see the -d option in the help output above).


Reminder: for local VLANs, you are also allowed to ssh to the VLAN's SSH gateway, which is named kavlan-ID.


Please look at the other KaVLAN pages for examples of usage (look at the see-also dialog box at the top of the page).