Armored Node for Sensitive Data

From Grid5000
Revision as of 10:52, 13 April 2021

Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This page documents how to secure a Grid'5000 node, making it suitable to host and process more sensitive data. The process is based on a tool (g5k-armor-node.py) that runs on a debian10-x64-big environment.

Node reservation and deployment

Identify your requirements:

  • Select a cluster that suits your needs (for example using the Hardware page).
  • Estimate how long you will need the resources. If the duration exceeds what the Usage Policy allows for the default queue, the production queue may suit your needs. If it also exceeds what the production queue allows (more than one week), follow the procedure explained on the Usage Policy page to request an exception. Remember that your data will be destroyed at the end of the reservation.
  • Reserve a node and a VLAN, then deploy the node with the debian10-x64-big environment inside the VLAN (see detailed steps below).

Detailed steps for reservation and deployment

Reserve the node and the VLAN:

nancy frontend:oarsub -q production -t deploy -l {"type='kavlan'"}/vlan=1+{"cluster='CLUSTER'"}/nodes=1,walltime=WALLTIME -r START_DATE

FIXME: mention reserving additional disks

Once the job has started, connect inside the job:

frontend:oarsub -C JOB_ID

Get the assigned VLAN number:

frontend:kavlan -V

Get the reserved node:

frontend:uniq $OAR_NODEFILE

Deploy the node with the debian10-x64-big environment, inside the VLAN:

frontend:kadeploy3 -e debian10-x64-big -m NODE --vlan VLAN_NUMBER -k

Now wait for the deployment to complete.
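
The last three steps (VLAN number, node name, deployment) can be chained in a small shell function, called from inside the job on the frontend. This is only a sketch: the function name deploy_in_vlan is hypothetical and not part of the Grid'5000 tooling, and it uses nothing beyond the commands already shown above.

```shell
# Hypothetical helper (not provided by Grid'5000): chains the steps above.
# Assumes it is called inside the job shell, where OAR_NODEFILE is set.
deploy_in_vlan() {
    # VLAN number assigned to the job
    vlan=$(kavlan -V) || return 1
    # Reserved node; the node file may list it several times, hence uniq
    node=$(uniq "$OAR_NODEFILE") || return 1
    echo "Deploying $node inside VLAN $vlan"
    kadeploy3 -e debian10-x64-big -m "$node" --vlan "$vlan" -k
}
```

After defining the function, run deploy_in_vlan and wait for the deployment to complete as described above.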

Securing the node with g5k-armor-node.py

Connect to the node from outside Grid'5000, using the node name suffixed with the Kavlan number (because the node was deployed inside a Kavlan VLAN). Once the node is secured, this will be the only way to connect to it, as SSH will only be authorized from the Grid'5000 access machines:

your machine:ssh -J YOUR_G5K_LOGIN@access.grid5000.fr root@node-X-kavlan-Y.site.grid5000.fr
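
The suffixed name follows the pattern node-X-kavlan-Y.site.grid5000.fr shown above, so it can be derived from the node name and the VLAN number. The helper below is a minimal sketch; the function name kavlan_hostname and the example values are hypothetical, and it assumes the node name is fully qualified.

```shell
# Hypothetical helper: builds the Kavlan-suffixed host name of a node.
# $1: fully qualified node name, e.g. node-1.site.grid5000.fr
# $2: VLAN number
kavlan_hostname() {
    node="$1"
    vlan="$2"
    case "$node" in
        *.*) ;;
        *) echo "expected a fully qualified node name" >&2; return 1 ;;
    esac
    # Insert "-kavlan-N" between the short host name and the domain
    printf '%s-kavlan-%s.%s\n' "${node%%.*}" "$vlan" "${node#*.}"
}
```

For example, kavlan_hostname node-1.site.grid5000.fr 4 prints node-1-kavlan-4.site.grid5000.fr, which can then be used as the target of the ssh -J command above.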

On the node, download g5k-armor-node.py, for example with:

node:wget https://gitlab.inria.fr/grid5000/g5k-armor/-/raw/master/g5k-armor-node.py

Run it:

node:chmod a+rx g5k-armor-node.py
node:./g5k-armor-node.py

After the script finishes, disconnect from the node, and try to connect again using SSH. You should get an error message from SSH, because the node's host key changed. This is expected: the script replaced the node's SSH host key with a newly generated one. Follow the instructions given in the SSH error message to remove the old key.
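
The SSH error message normally prints the exact command to run on your machine. It is typically equivalent to the sketch below, which assumes OpenSSH's ssh-keygen and the default known_hosts location; the function name forget_host_key is hypothetical.

```shell
# Hypothetical helper: removes the stale host key of a node from known_hosts.
# $1: host name used to connect, e.g. node-X-kavlan-Y.site.grid5000.fr
# $2: known_hosts file (assumption: defaults to ~/.ssh/known_hosts)
forget_host_key() {
    host="$1"
    known_hosts="${2:-$HOME/.ssh/known_hosts}"
    # -R removes all keys belonging to the host; ssh-keygen keeps the
    # previous file contents in a backup named "<file>.old"
    ssh-keygen -f "$known_hosts" -R "$host"
}
```

On the next connection, SSH will ask you to accept the newly generated host key.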

Using the secured node

You can either connect using the root account, or using your Grid'5000 login. The node can access the Internet, and you can use the root access on the node to install additional software if needed.

Please remember that:

  • Only the /data partition is encrypted. You must not store sensitive data outside it (for example, not in your home directory, and not on other Grid'5000 machines).
  • You must only use secured protocols to transfer data to/from the node as described below.
  • If you reboot the node, or if the node is shut down for any reason, you will no longer be able to access your data (unless you made a copy of the encryption key, but this is not recommended).
    • It is therefore a good idea to make intermediate backups of the processed data, in case the secured node becomes unreachable during processing.

Transferring data to/from the node

FIXME