Revision as of 17:44, 27 January 2021

	Note
	This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.

This page describes how to install common deep learning frameworks on Grid'5000.

Deep learning on x86_64 nodes (common case)

pip will be used to install the frameworks (conda could be used much the same way). Installation is performed under your home directory.

Reserve some GPU nodes with OAR

Reserve a node with some GPUs (see the Hardware page for the list of sites and clusters with GPUs).

For instance, to reserve one GPU using OAR:

$ oarsub -I -l gpu=1

(Remember to add the '-q production' option if you want to reserve a GPU from the Nancy "production" resources.)

To reserve the full node:

$ oarsub -I -l host=1

To reserve a GPU or a full node on a specific cluster, add the following option to the oarsub command:

-p cluster=<clustername>
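The options above can be combined in a single command. As a minimal sketch, the Python helper below assembles such a command line from the options shown on this page. `build_oarsub_command` is a hypothetical name used for illustration, not part of OAR or Grid'5000, and `mycluster` is a placeholder cluster name. The helper only builds the string; it never submits a job.

```python
import shlex


def build_oarsub_command(gpus=None, cluster=None, production=False):
    """Assemble an interactive oarsub command from the options shown above.

    Hypothetical helper for illustration only: it returns the command line
    as a string and never calls oarsub itself.
    """
    args = ["oarsub", "-I"]
    if production:
        # Needed for GPUs that belong to the Nancy "production" resources.
        args += ["-q", "production"]
    if cluster is not None:
        args += ["-p", f"cluster={cluster}"]
    # Reserve individual GPUs when a count is given, otherwise one full node.
    resource = f"gpu={gpus}" if gpus is not None else "host=1"
    args += ["-l", resource]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_oarsub_command(gpus=1))
    # "mycluster" is a placeholder, not a real cluster name.
    print(build_oarsub_command(cluster="mycluster"))
```

Building the argument list first and quoting each item at the end keeps the helper safe to extend with other options.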

Once connected to the node, check GPU presence and the available CUDA version:

$ nvidia-smi 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
(...)
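If you want to pick the CUDA version up from a script, the header printed by nvidia-smi can be parsed. The sketch below assumes the header format shown above; `parse_cuda_version` and `detect_cuda_version` are illustrative names, not Grid'5000 tools. The value reported in this header is the highest CUDA version the installed driver supports.

```python
import re
import subprocess


def parse_cuda_version(smi_output):
    """Return the CUDA version string found in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*([0-9]+\.[0-9]+)", smi_output)
    return match.group(1) if match else None


def detect_cuda_version():
    """Run nvidia-smi and parse its header; return None if it cannot be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # No NVIDIA driver on this machine, or nvidia-smi reported an error.
        return None
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    print("CUDA version reported by nvidia-smi:", detect_cuda_version())
```

On a machine without a GPU (a frontend, for example), `detect_cuda_version()` returns `None` rather than raising.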

PyTorch

Go to the PyTorch website to find the installation command that suits you.

For instance (as of May 2020), selecting “Stable”, “Linux”, “Pip”, “Python”, “Cuda 10.1” gives this command to execute:

$ pip3 install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
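The `+cu101` suffix in the command above encodes the CUDA version (10.1). The following sketch rebuilds that command from its parts, assuming the suffix is always `cu` followed by the version digits without the dot. Both function names are hypothetical, and valid combinations of torch and torchvision versions must still be taken from the PyTorch website.

```python
def cuda_wheel_tag(cuda_version):
    """Turn a CUDA version such as "10.1" into a wheel tag such as "cu101"."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{int(major)}{int(minor)}"


def pytorch_pip_command(torch_version, torchvision_version, cuda_version):
    """Rebuild the pip command shown above (assumed naming pattern)."""
    tag = cuda_wheel_tag(cuda_version)
    return (
        f"pip3 install torch=={torch_version}+{tag} "
        f"torchvision=={torchvision_version}+{tag} "
        "-f https://download.pytorch.org/whl/torch_stable.html"
    )


if __name__ == "__main__":
    # Versions taken from the May 2020 example above.
    print(pytorch_pip_command("1.5.0", "0.6.0", "10.1"))
```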

Check that PyTorch is correctly installed and can use the GPU:

$ python3 -c "import torch; print(torch.cuda.is_available())"
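The one-line check above only prints True or False. A slightly more detailed check, sketched below, also lists the visible devices. `pytorch_gpu_report` is an illustrative name; the sketch relies only on `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`.

```python
def pytorch_gpu_report():
    """Summarize whether PyTorch is installed and which GPUs it can see."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        # One entry per GPU visible inside the current reservation.
        devices = [
            torch.cuda.get_device_name(index)
            for index in range(torch.cuda.device_count())
        ]
    return {"installed": True, "cuda_available": available, "devices": devices}


if __name__ == "__main__":
    print(pytorch_gpu_report())
```

If `cuda_available` is False on a GPU node, a likely cause is a wheel built for a CUDA version newer than the one reported by nvidia-smi.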

TensorFlow (with Keras)

Go to the TensorFlow website to find the installation commands. As of May 2020 (TensorFlow v2.2.0), they are:

$ pip3 install --upgrade pip
$ pip3 install tensorflow

To use GPUs, TensorFlow requires the cuDNN library. We provide it as a module to load:

$ module load cudnn

Now check that TensorFlow is correctly installed and can use the GPU:

$ python3 -c "import tensorflow as tf; print('Num GPUs Available:', len(tf.config.experimental.list_physical_devices('GPU')))"
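The same check can be written as a short script that also reports the device names. This is a sketch: `tensorflow_gpu_names` is an illustrative name, and it uses `tf.config.list_physical_devices`, the non-experimental spelling available since TensorFlow 2.1.

```python
def tensorflow_gpu_names():
    """Return the names of GPUs visible to TensorFlow, or None if it is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    names = tensorflow_gpu_names()
    if names is None:
        print("TensorFlow is not installed")
    else:
        print("Num GPUs Available:", len(names), names)
```

An empty list on a GPU node may mean that the cudnn module was not loaded in the current shell.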

Note: This installs TensorFlow v2. If you need TensorFlow v1, see https://www.tensorflow.org/guide/migrate

MXNet

Go to the MXNet website to find the installation command that suits you.

For instance (as of May 2020), selecting “Linux”, “Python”, “GPU” and “Pip”, the command to execute (in order to use CUDA 10.1) is:

$ pip3 install mxnet-cu101

Check that MXNet is correctly installed and can use the GPU:

$ python3 -c "import mxnet; print('Num GPUs Available:', mxnet.context.num_gpus())"
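Counting GPUs does not prove that MXNet can actually run on them. The sketch below goes one step further and allocates a small array on the first GPU. `mxnet_gpu_check` is an illustrative name; the sketch assumes the MXNet 1.x API (`mx.context.num_gpus`, `mx.gpu`, `mx.nd.ones`).

```python
def mxnet_gpu_check():
    """Return the number of GPUs MXNet can use, or None if MXNet is unusable."""
    try:
        import mxnet as mx
    except Exception:
        # MXNet is not installed, or it cannot be imported in this environment.
        return None

    count = mx.context.num_gpus()
    if count > 0:
        # Allocate a small array on the first GPU and copy it back to the host;
        # this fails loudly if the CUDA libraries do not match the driver.
        values = mx.nd.ones((2, 2), ctx=mx.gpu(0)).asnumpy()
        assert values.sum() == 4
    return count


if __name__ == "__main__":
    print("Num GPUs Available:", mxnet_gpu_check())
```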

Additional resources

An in-depth tutorial contributed by a Grid'5000 user, Ismael Bada
Many Docker images provide a ready-to-use deep learning software stack. They can be run with the Docker or Singularity tools (using the appropriate options to enable GPU usage). See the corresponding wiki pages to learn how to use these tools in Grid'5000.
If you want to use virtualenv to manage your Python packages, it is available in Grid'5000 standard environments. Create your environment with python3 -m venv <env_directory> and activate it using source <env_directory>/bin/activate before using pip and installed packages.
If you prefer to use conda to manage your Python packages, it is available in Grid'5000 as a module. Just execute "module load miniconda3" from a node or a frontend to make it available.
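The virtual environment mentioned in the list above can also be created from Python with the standard `venv` module, which is what `python3 -m venv` runs. This is a minimal sketch: `create_env` is an illustrative name, and the `bin/activate` location assumes a Linux layout, as on Grid'5000 nodes.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path of its activate script.

    Similar in spirit to `python3 -m venv <env_directory>`; the bin/ layout
    assumed here is the one used on Linux.
    """
    env_path = Path(env_dir).expanduser()
    venv.create(env_path, with_pip=with_pip)
    return env_path / "bin" / "activate"
```

For example, `create_env("~/dl-env")` (a placeholder directory name) returns the script to pass to `source` before running pip.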

Deep learning on ppc64 nodes

Grid'5000 has an IBM cluster with many GPUs. Because it uses the ppc64 architecture, many deep learning frameworks cannot be installed easily on it.

Reserve ppc64 GPU nodes with OAR

Reserve a ppc64 node with GPUs (see the Hardware page of the drac cluster for details).

To reserve a full node:

$ oarsub -I -p cluster=drac -l host=1,walltime=1:00

Once connected to the node, check GPU presence and the available CUDA version:

$ nvidia-smi 
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
(...)

	Note
	ppc64 nodes come with a known-working NVIDIA driver version in their default environment, but it only supports CUDA versions up to 10.1. If you install a more recent driver or deploy your own images, you may experience frequent system crashes with recent NVIDIA drivers on Debian or Ubuntu. CentOS seems unaffected by the crashes. See the NVIDIA developer forum thread: https://forums.developer.nvidia.com/t/recent-nvidia-tesla-drivers-cause-system-crashs-on-powernvl-w-p100-gpus/160938
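Before installing a framework build on these nodes, it can help to check that the CUDA version it targets does not exceed the 10.1 limit of the default driver. The sketch below shows one way to compare versions; the function names are hypothetical and the default limit is the one stated in the note above.

```python
def cuda_version_tuple(version):
    """Convert a version string such as "10.1" into a tuple such as (10, 1)."""
    return tuple(int(part) for part in version.split("."))


def supported_by_default_driver(requested, driver_max="10.1"):
    """Return True if the requested CUDA version is within the driver's limit."""
    return cuda_version_tuple(requested) <= cuda_version_tuple(driver_max)


if __name__ == "__main__":
    for candidate in ("9.2", "10.1", "10.2", "11.0"):
        print(candidate, supported_by_default_driver(candidate))
```

Comparing integer tuples rather than strings matters here: as plain strings, "9.2" would sort after "10.1".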

Deep Learning Frameworks: Difference between revisions
