Conda: Difference between revisions

From Grid5000
(288 intermediate revisions by 3 users not shown)
{{Author|Laurent Mirtain}}
{{Maintainer|Laurent Mirtain}}
{{Status|In production}}
{{Portal|User}}
{{Portal|Tutorial}}
{{Pages|HPC}}
{{Portal|HPC}}
{{TutorialHeader}}


{{Note|text='''The purpose of this document is to:'''<br />
* explain how to use conda on Grid'5000
* give examples of installing and configuring software with conda to create environments for running HPC and AI jobs on Grid'5000
'''It was written by consolidating the following different information resources:'''
* Grid'5000 documentation Environment modules, HPC and HTC tutorial, Deep Learning Frameworks
* An [[User:Ibada/Tuto Deep Learning|in-depth tutorial]] contributed by a Grid'5000 user, Ismael Bada
* [[User:Bjonglez/Debian11/Deep Learning Frameworks]]
}}


= Introduction =
[https://docs.conda.io/projects/conda/en/latest/index.html Conda] is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.
To get started with Conda, have a look at this [https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html Conda cheat sheet] and this [https://towardsdatascience.com/managing-project-specific-environments-with-conda-b8b50aa8be0e Getting Started with Conda] guide.
== Conda, Miniconda, Anaconda ? ==


= Introduction =
* '''conda''' is the package manager.
* '''miniconda''' is a minimal Python distribution for '''conda''' that includes base packages
* '''anaconda''' is another Python distribution for '''conda''' that adds 160+ additional packages on top of miniconda
 
On Grid'5000, we installed ''conda'' using the ''miniconda'' installer, but you are free to create an anaconda environment, using the ''anaconda'' meta-package.
 
More information about Miniconda vs Anaconda is available on the [https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda Conda website].


Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.
== Conda or Mamba? ==


Reference:
[https://mamba.readthedocs.io/en/latest/index.html mamba] is a reimplementation of the conda package manager in C++. Conda has a reputation for taking time when dealing with complex sets of dependencies. Mamba is much more efficient and is fully compatible with Conda packages and supports most of Conda's commands. It consists of:
* [https://docs.conda.io/projects/conda/en/latest/index.html conda website]
* mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions
* [https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html Conda cheat sheet]
* micromamba: a pure C++-based CLI, self-contained in a single-file executable
* libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built


= Installation =
=== Mamba on Grid'5000 ===


The conda package and environment manager is included in all versions of Anaconda®, Miniconda, and Anaconda Repository. Conda is also included in Anaconda Enterprise, which provides on-site enterprise package and environment management for Python, R, Node.js, Java, and other application stacks. Conda is also available on conda-forge, a community channel.
Like Conda, Mamba is available as a module on Grid'5000:
{{Term|location=frontal|cmd=<code class="command">module load mamba</code>}}


== Anaconda or Miniconda? ==
Then, since its syntax is generally compatible with Conda, you can use the <code class="command">mamba</code> command where you would use the <code class="command">conda</code> command.


See: https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda
= Conda on Grid'5000 =


== Conda availability in Grid'5000 ==
Conda is already available in Grid'5000 as a module. '''You do not need to install Anaconda or Miniconda on Grid'5000 !'''


Conda is already available in Grid'5000 as a [https://www.grid5000.fr/w/Software_using_modules module]. Just execute `module load miniconda3` from a node or a frontend to make it available.
== Load Conda module ==


If you want to install conda on your desktop, follow the [https://conda.io/projects/conda/en/stable/user-guide/install/index.html# installation guide] on the conda website.
* To make it available on a node or on a frontend, load the Conda module as follows (default version):
{{Term|location=frontal|cmd=<code class="command">module load conda</code>}}


You have 3 conda download options:
== Optional: Conda initialization and activation in your shell ==
* Download Anaconda (free).
* Download Miniconda (free).
* Purchase Anaconda Enterprise.


== Miniconda installation on your desktop ==
Conda initialization is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment. It is not required to use Conda.


Commands to install miniconda somewhere in your home directory
The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library.


```
There are two ways to initialize conda in a standard installation:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
eval "$(conda shell.bash hook)" # to load the conda shell environment
```


NB: your ` .bashrc` is modified by the installation program
* 1. occasionally : activate conda in your current shell (ex: bash)
{{Term|location=$|cmd=<code class="command">eval "$(conda shell.bash hook)"</code>}}


* 2. always : activate conda in your login shell environment permanently (this command modifies your .bashrc by adding conda setup directives)
{{Term|location=$|cmd=<code class="command">conda init</code>}}


{{Warning|text=bash is the default shell for conda.<br>
Users of tcsh or zsh should use:
* <code class="command">eval "$(conda shell.{tcsh,zsh} hook)"</code>
* <code class="command">conda init {tcsh,zsh}</code>}}


= Conda usage =
In Grid'5000, the '''conda''' initialization is done transparently when the conda module is loaded.


== Getting started ==
The <code class="command">conda activate</code> and
<code class="command">conda deactivate</code> commands rely on the conda shell initialization to load or unload the corresponding conda environment variables in the current shell session.


The command `conda init bash` sets up the bash environment that conda adds.
By default, you are located in the <code>base</code> Conda environment that corresponds to the base installation of Conda.


The command `conda info` displays information about the current conda installation.
If you’d prefer that conda’s base environment not be activated on startup, set the auto_activate_base parameter to false:
{{Term|location=$|cmd=<code class="command">conda config --set auto_activate_base false</code>}}


Usage
Verify your conda configuration with this command:
```
{{Term|location=$|cmd=<code class="command">conda config --show</code>}}
$ conda --help
usage: conda [-h] [-V] command ...


conda is a tool for managing and deploying applications, environments and packages.
Look at all available configuration options with:
{{Term|location=$|cmd=<code class="command">conda config --describe</code>}}


Options:
== Conda environments ==


positional arguments:
Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.
  command
    clean            Remove unused packages and caches.
    compare          Compare packages between conda environments.
    config            Modify configuration values in .condarc. This is modeled
                      after the git config command. Writes to the user
                      .condarc file (/user/lmirtain/home/.condarc) by default.
                      Use the --show-sources flag to display all identified
                      configuration locations on your computer.
    create           Create a new conda environment from a list of specified
                      packages.
    info              Display information about current conda install.
    init              Initialize conda for shell interaction.
    install          Installs a list of packages into a specified conda
                      environment.
    list              List installed packages in a conda environment.
    package          Low-level conda package utility. (EXPERIMENTAL)
    remove (uninstall)
                      Remove a list of packages from a specified conda
                      environment.
    rename            Renames an existing environment.
    run              Run an executable in a conda environment.
    search            Search for packages and display associated
                      information.The input is a MatchSpec, a query language
                      for conda packages. See examples below.
    update (upgrade)  Updates conda packages to the latest compatible version.
    notices          Retrieves latest channel notifications.


options:
When you begin using conda, you already have a default environment named <code>base</code>.  
  -h, --help          Show this help message and exit.
You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.
  -V, --version      Show the conda version number and exit.
```


See:
{{Warning|text=The <code>base</code> environment is stored in a read-only directory, as shown by the <code>conda info</code> command.
* [https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html Conda cheat sheet] for current commands
'''That's why you need to systematically create your own conda environments to install the software you need.'''}}
* [https://towardsdatascience.com/managing-project-specific-environments-with-conda-b8b50aa8be0e Getting Started with Conda]


* List all your environments
{{Term|location=$|cmd=<code class="command">conda info --envs</code>}}
or
{{Term|location=$|cmd=<code class="command">conda env list</code>}}


== Conda channels and packages ==
* Create a new environment
{{Term|location=$|cmd=<code class="command">conda create --name ENVNAME</code>}}


Channels are the locations of the repositories where conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created.
* Activate this environment before installing packages
In its default configuration, conda can install and manage the over 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda. This is the default conda channel, which may require a paid commercial license, as described in the repository terms of service.
{{Term|location=$|cmd=<code class="command">conda activate ENVNAME</code>}}


Other useful channels:
For further information:
* The conda-forge channel https://conda-forge.org/ is free for all to use.
* https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
* The nvidia channel https://anaconda.org/nvidia provides Nvidia's software
* [https://towardsdatascience.com/managing-project-specific-environments-with-conda-406365a539ab Managing your data science project environments with Conda]
* The IBM PowerAI https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/#/ Watson Machine Learning Community Edition


== Conda package installation ==


list all packages + source channels
In its default configuration (the default Conda channel), Conda can install and manage the over 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda.
{{Term|location=inside|cmd=<code class="command">conda list --show-channel-urls</code>}}


Install a package from a specific channel
{{Term|location=$|cmd=<code class="command">conda install <package></code>}}
{{Term|location=inside|cmd=<code class="command">conda install -c CHANNELNAME PKG1 PKG2</code>}}


Visit for details:
* Install a specific version of a package:
* [https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/packages.html]
* [https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html]


{{Term|location=$|cmd=<code class="command">conda install <package>=<version></code>}}


== Conda environments ==
* Uninstall a package:
 
{{Term|location=$|cmd=<code class="command">conda uninstall <package></code>}}
 
For further information:
* https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/packages.html


Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.
== Conda package installation from channels ==


When you begin using conda, you already have a default environment named "base".  
Channels are the locations of the repositories where Conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created. Useful channels are:
You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.
* <code>conda-forge</code> from https://conda-forge.org. It is free for all to use.  
* <code>nvidia</code> from https://anaconda.org/nvidia. It provides Nvidia's software.


List all your environments
To install a package from a specific channel:
{{Term|location=inside|cmd=<code class="command">conda info --envs</code>}}
{{Term|location=$|cmd=<code class="command">conda install -c <channel_name> <package></code>}}
or
{{Term|location=inside|cmd=<code class="command">conda env list</code>}}


Typical sequence
* List all packages installed with their source channels
* create a new environment
{{Term|location=$|cmd=<code class="command">conda list --show-channel-urls</code>}}
{{Term|location=inside|cmd=<code class="command">conda create --name ENVNAME</code>}}


* activate this environment before installing packages
For further information:
{{Term|location=inside|cmd=<code class="command">conda activate ENVNAME</code>}}
* https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html


Recommendation: Packages are installed in $HOME/.conda
{{Warning|text=Installing Conda packages can be time- and resource-consuming. Preferably use a node (instead of a frontend) to perform such an operation. Note that using a node is mandatory if you need to access specific hardware resources such as GPUs.}}
You could, therefore, rapidly saturate the disk quota of your $HOME


See more:
= Application examples =
* [https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html]
* [https://towardsdatascience.com/managing-project-specific-environments-with-conda-406365a539ab Managing your data science project environments with Conda]


== Create an environment ==


= Use conda on Grid'5000 =  
For example, create the environment <code class="replace"><env_name></code> (specify a Python version; otherwise, the module default version is used)
{{Term|location=fgrenoble|cmd=<code class="command">conda create -y -n </code><code class="replace"><env_name></code> <code class="command">python=x.y</code>}}


Reminder: you do not need to install Anaconda or Miniconda to create your own environment on Grid'5000! Just load conda with the `module load miniconda3` command.
== Load this environment ==
{{Term|location=fgrenoble|cmd=<code class="command">conda activate </code><code class="replace"><env_name></code>}}
Log in to a Grid'5000 frontend
{{Term|location=inside|cmd=<code class="command">ssh</code> <code class="replace">login</code><code class="command">@</code><code class="host">access.grid5000.fr</code>}}


Go to your favorite Grid'5000 site
== Install a package into this environment ==
{{Term|location=inside|cmd=<code class="command">ssh grenoble</code>}}
{{Term|location=fgrenoble|cmd=<code class="command">conda install </code><code class="replace"><package_name></code>}}


Load conda and source it
== Exit from the loaded environment ==
{{Term|location=inside|cmd=<code class="command">module load miniconda3</code>}}
{{Term|location=fgrenoble|cmd=<code class="command">conda deactivate</code>}}
{{Term|location=inside|cmd=<code class="command">eval "$(conda shell.bash hook)"</code>}}


== Remove unused Conda environments ==


Current commands are:
{{Warning|text=Conda packages are installed in <code>$HOME/.conda</code>. You could, therefore, rapidly saturate your [[Storage#.2Fhome|homedir quota]] (25GB by default). Do not forget to occasionally remove unused Conda environments to free up space.}}
<pre>
    conda init <SHELL_NAME> to initialize your shell
    conda deactivate to exit from the environment loaded with module load
    conda create -y -n <name> python=x.y to create an environment (specify a Python version; otherwise, it is the module default version)
    conda activate <name>
    conda env remove --name <name> to correctly delete the environment
    conda clean -a to remove unused packages and the cache. Do not be concerned if this appears to try to delete the packages of the system environment (ie. non-local).
</pre>


* To delete an environment
{{Term|location=fgrenoble|cmd=<code class="command">conda deactivate</code><br>
<code class="command">conda env remove --name </code><code class="replace"><env_name></code>}}


Tip: Creating an environment can be time- and resource-consuming. Preferably reserve a node and connect to it with the oarsub command (mandatory if you need to access specific hardware resources such as GPUs).
* To remove unused packages and the cache, use the command below. Do not be concerned if this appears to try to delete the packages of the system environment (i.e. non-local).
{{Term|location=fgrenoble|cmd=<code class="command">conda clean -a</code>}}
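Before cleaning, it can help to see how much space the Conda directory actually uses. The following is a minimal sketch, not an official Grid'5000 tool: the function name <code>conda_usage</code> is hypothetical, and it assumes the default <code>~/.conda</code> location mentioned in the warning above and GNU <code>du</code> and <code>sort</code>.

```shell
# conda_usage [DIR]: report the disk space used by a Conda directory.
# DIR defaults to ~/.conda (assumed default location of user environments).
conda_usage() {
    local dir="${1:-$HOME/.conda}"
    if [ ! -d "$dir" ]; then
        echo "no Conda directory at $dir"
        return 0
    fi
    # Total size first, then one line per environment, smallest first.
    du -sh "$dir"
    if [ -d "$dir/envs" ]; then
        du -sh "$dir"/envs/* 2>/dev/null | sort -h
    fi
}

conda_usage
```

The largest environments appear at the bottom of the list, which makes them easy candidates for <code>conda env remove</code>.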


== Use a Conda environment in a job ==


== Create a specific environment for PowerPC arch ==
As seen in the previous section, the Conda environment is stored by default in the user's home directory (at <code>~/.conda</code>). Once the environment is created and the packages are installed, it is usable on all nodes of the given site.


Because the version of pytorch in PowerAI is too old for py38 or py39, we must use Python 3.7:
=== For interactive jobs ===
{{Term|location=inside|cmd=<code class="command">conda create --name pytorch-ppc64-py37 python=3.7</code>}}


Use this specific environment
Load, initialize, and activate your Conda environment <code class="replace">env_name</code> in an interactive job
{{Term|location=inside|cmd=<code class="command">conda activate pytorch-ppc64-py37</code>}}


Install pytorch in this environment
{{Term|location=frontal|cmd=<code class="command">oarsub -I</code>}}
{{Term|location=inside|cmd=<code class="command">conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/</code>}}
{{Term|location=node|cmd=<code class="command">module load conda</code><br>
{{Term|location=inside|cmd=<code class="command">conda install pytorch -c</code>}}
<code class="command">conda activate </code><code class="replace">env_name</code>}}


=== For batch jobs ===


== Use NVIDIA tools ==
Load, initialize, and activate your Conda environment <code class="replace">env_name</code> in a batch job


NVIDIA libraries are available via Conda and you can manage project specific versions of the NVIDIA CUDA Toolkit, NCCL, and cuDNN using Conda.
First prepare your conda environment on the frontend:
* module load and conda initialization
* conda creation of an environment <code>testconda</code> containing <code>gcc</code> from <code>conda-forge</code> channel
* list installed packages with source info
{{Term|location=fsiteA|cmd=<code class="command">module load conda</code><br>
<code class="command">conda create --name testconda</code><br>
<code class="command">conda activate testconda</code><br>
<code class="command">conda install -c conda-forge gcc_linux-64 gxx_linux-64</code>}}
* Run these commands and keep their output
{{Term|location=fsiteA|cmd=<code class="command">conda info</code><br>
<code class="command">conda list -n testconda --show-channel-urls</code>}}


NVIDIA actually maintains their own Conda channel and the versions of CUDA Toolkit available from the default channels are the same as those you will find on the NVIDIA channel.
In this example, we launch a job that performs the same tasks as a batch job.
* The important step is to start a login shell (<code>bash -l</code>) so that the shell environment is sourced and the <code>module</code> and <code>conda activate</code> commands work
{{Term|location=fsiteA|cmd=<code class="command">oarsub 'bash -l -c "module load conda ; conda activate testconda ; conda info ; conda list -n testconda --show-channel-urls"'</code>}}
<pre>OAR_JOB_ID=1539228</pre>


To compare build numbers between the default and nvidia channels:
* Check whether the job has finished:
{{Term|location=inside|cmd=<code class="command">conda search --channel nvidia cudatoolkit</code>}}
{{Term|location=fsiteA|cmd=<code class="command">oarsub -C 1539228</code>}}
<pre># Error: job 1539228 is not running. Its current state is Finishing.</pre>


Also compare the version limitations of the different channels, e.g. NVIDIA
* Compare output with the previous one : they should be identical
{{Term|location=fsiteA|cmd=<code class="command">cat OAR.1539228.stdout</code>}}
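As the job grows, the quoted one-line command becomes hard to read. A possible alternative, shown here as a sketch only, is to put the same steps in a script and submit that script. The file name <code>conda_job.sh</code> is hypothetical, and the login-shell shebang (<code>bash -l</code>) is assumed to make the <code>module</code> command available in the same way as <code>bash -l -c</code> does in the command above.

```shell
# Write the job script (conda_job.sh is a hypothetical file name).
cat > conda_job.sh <<'EOF'
#!/bin/bash -l
# Login shell (-l), assumed to define the module command as bash -l -c does.
module load conda
conda activate testconda
conda info
conda list -n testconda --show-channel-urls
EOF

# The script is run directly by the job, so it has to be executable.
chmod +x conda_job.sh
echo "job script ready: conda_job.sh"
```

The script can then be submitted from the frontend with <code>oarsub ./conda_job.sh</code>, and its output compared with the interactive run in the same way.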


See:
= Advanced Conda environment operations =
* [https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#conda-installation]
* [https://towardsdatascience.com/managing-cuda-dependencies-with-conda-89c5d817e7e1 Managing CUDA dependencies with Conda]


conda create --name NvidiaTools
== Synchronize Conda environments between Grid'5000 sites ==
conda activate NvidiaTools
conda install cudatoolkit -c nvidia


== Use pytorch ==
* To synchronize a Conda directory from a siteA to a siteB:


{{Term|location=fsiteA|cmd=<code class="command">rsync --dry-run --delete -avz ~/.conda siteB.grid5000.fr:~</code>}}


Install pytorch from the nvidia channel
To actually perform the synchronization, remove the <code>--dry-run</code> argument and replace ''siteB'' with a real site name.
{{Term|location=inside|cmd=<code class="command">conda install pytorch -c nvidia</code>}}


Hello world pytorch:
== Share Conda environments between multiple users ==
- See: https://towardsai.net/p/l/how-to-set-up-and-run-cuda-operations-in-pytorch


== Tensorflow ==
You can use two different approaches to share Conda environments with other users.
`conda install tensorflow-gpu -c conda-forge`
Do not forget to run activate after the installation.


```gemini-1@lyon:~/dload nniclausse# python
=== Export an environment as a yaml file ===
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
> import tensorflow as tf
> hello = tf.constant("hello TensorFlow!")
> sess=tf.Session()
```


== CUDA ==
* Export it as follows:
To display the installed version:
{{Term|location=fgrenoble|cmd=<code class="command">conda env export > environment.yml</code>}}


`nvcc --version`
* Share it by putting the yaml file in your public folder
{{Term|location=fgrenoble|cmd=<code class="command">cp environment.yml ~/public/</code>}}


Use the conda-forge or nvidia channels to install CUDA.
* Other users can create the environment from the <code>environment.yml</code> file
{{Term|location=fgrenoble|cmd=<code class="command">conda env create -f ~/<login>/public/environment.yml</code>}}


== GCC ==
* Advantage : it prevents other users from damaging the environment if they add packages that could conflict with other packages and/or even delete packages that another user might need.
Install the latest versions of gcc via conda-forge (currently gcc 12).
* Drawback: it is not a truly shared environment. The environment is duplicated in each user's home directory. Any modification to one Conda environment will not be automatically replicated to the others.
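An environment file is plain YAML with a name, a list of channels, and a list of dependencies. The sketch below writes a small hand-made example to illustrate that layout. The file name <code>environment-example.yml</code>, the environment name <code>myenv</code>, and the package list are all hypothetical; a real export normally lists many more packages with exact versions.

```shell
# Write a small, hand-made environment file (all names are illustrative).
cat > environment-example.yml <<'EOF'
name: myenv
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
EOF

# Show the result.
cat environment-example.yml
```

Another user could then recreate such an environment with <code>conda env create -f environment-example.yml</code>.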


`conda install -c conda-forge gcc_linux-64 gxx_linux-64`
=== Use a group storage ===


== OpenMPI ==
[[Group Storage]] gives you the possibility to share a storage between multiple users. You can take advantage of a group storage to share a single Conda environment among multiple users.
ucx: used to take advantage of the fast network (InfiniBand)


```
* Create a shared Conda environment with <code>--prefix</code> to specify the path to use to store the conda environment
  module load miniconda3/4.10.3_gcc-10.2.0
{{Term|location=flyon|cmd=<code class="command">conda create --prefix /srv/storage/</code><code class="replace">storage_name</code>@<code class="replace">server_hostname_(fqdn)/ENVNAME</code>}}
  conda activate mamba
  conda activate --stack openmpi
  mamba install -c conda-forge gcc_linux-64 openmpi ucx
```
Install NetPIPE to test network latency and throughput
```
  wget https://src.fedoraproject.org/lookaside/pkgs/NetPIPE/NetPIPE-3.7.1.tar.gz/5f720541387be065afdefc81d438b712/NetPIPE-3.7.1.tar.gz
  tar zvxf NetPIPE-3.7.1.tar.gz
  cd NetPIPE-3.7.1/
  make mpi
```
Reserve 2 cores on 2 different nodes:
`oarsub -I -l /nodes=2/core=1`


Version without the fast network:
* Activate the shared environment (share this command with the targeted users)
    `mpirun -np 2 --machinefile $OAR_NODEFILE --prefix $CONDA_PREFIX --mca plm_rsh_agent oarsh  NPmpi `
{{Term|location=flyon|cmd=<code class="command">conda activate /srv/storage/</code><code class="replace">storage_name</code>@<code class="replace">server_hostname_(fqdn)/ENVNAME</code>}}
   
Version with ucx:
    `mpirun -np 2 --machinefile $OAR_NODEFILE --prefix $CONDA_PREFIX --mca plm_rsh_agent oarsh --mca pml ucx  --mca osc ucx NPmpi`


* Advantage : It avoids storing duplicate packages and makes any modification accessible to all users
* Drawbacks:
** Users could potentially harm the environment by installing or removing packages.
** When installing additional packages, conda still stores them in the package cache located in your home directory. Use <code class="command">conda clean</code> as described above to clean those files.


== Keras ==
* Create your environments by default in a group storage location
You can modify your <code>~/.condarc</code> file to specify this location for Conda environment and package installation as follows (change the location to suit your group and your convenience). Add these lines:
<pre>
pkgs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/pkgs/
envs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/envs/
</pre>


Keras is a high-level neural networks API, written in python, which is used as a wrapper of theano, tensorflow or CNTK.
= Build your HPC-AI framework with conda =
Keras makes it much easier to create deep learning experiments than using theano or tensorflow directly;
it is the recommended tool for beginners, and even for advanced users who do not want to spend too much time dealing with the complexity of low-level libraries such as theano and tensorflow.


`conda install keras`
Here are some pointers to help you set up your software environment for HPC or AI with conda
* [[HPC_and_HTC_tutorial]]
* Running [[Run_MPI_On_Grid'5000|MPI applications on Grid'5000]]
* [[Deep_Learning_Frameworks|Deep Learning Frameworks documentation]]

Latest revision as of 17:23, 21 September 2023

Note.png Note

This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.


Introduction

Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.

To get started with Conda, have a look at this Conda cheat sheet and this Getting Started with Conda guide.

Conda, Miniconda, Anaconda ?

  • conda is the package manager.
  • miniconda is a minimal Python distribution for conda that includes base packages
  • anaconda is another Python distribution for conda that adds 160+ additional packages on top of miniconda

On Grid'5000, we installed conda using the miniconda installer, but you are free to create an anaconda environment, using the anaconda meta-package.

More information about Miniconda vs Anaconda is available on the Conda website.

Conda or Mamba?

mamba is a reimplementation of the conda package manager in C++. Conda has a reputation for taking time when dealing with complex sets of dependencies. Mamba is much more efficient and is fully compatible with Conda packages and supports most of Conda's commands. It consists of:

  • mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions
  • micromamba: a pure C++-based CLI, self-contained in a single-file executable
  • libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built

Mamba on Grid'5000

Like Conda, Mamba is available as a module on Grid'5000:

Terminal.png frontal:
module load mamba

Then, since its syntax is generally compatible with Conda, you can use the mamba command where you would use the conda command.

Conda on Grid'5000

Conda is already available in Grid'5000 as a module. You do not need to install Anaconda or Miniconda on Grid'5000 !

Load Conda module

  • To make it available on a node or on a frontend, load the Conda module as follows (default version):
Terminal.png frontal:
module load conda

Optional: Conda initialization and activation in your shell

Conda initialization is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment. It is not required to use Conda.

The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library.

There are two ways to initialize conda in standard installation:

  • 1. occasionally : activate conda in your current shell (ex: bash)
Terminal.png $:
eval "$(conda shell.bash hook)"
  • 2. always : activate conda in your login shell environment permanently (this command modifies your .bashrc by adding conda setup directives)
Terminal.png $:
conda init
Warning.png Warning

bash is the default shell for conda.

Users of tcsh or zsh should use:

  • eval "$(conda shell.{tcsh,zsh} hook)"
  • conda init {tcsh,zsh}

In Grid'5000, the conda initialization is made transparently by loading the conda module.

The conda activate and conda deactivate commands rely on the conda shell initialization to load or unload the corresponding conda environment variables in the current shell session.

By default, you are located in the base Conda environment that corresponds to the base installation of Conda.

If you’d prefer that conda’s base environment not be activated on startup, set the auto_activate_base parameter to false:

Terminal.png $:
conda config --set auto_activate_base false

Verify your conda configuration with this command:

Terminal.png $:
conda config --show

Look at all available configuration options with:

Terminal.png $:
conda config --describe

Conda environments

Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.

When you begin using conda, you already have a default environment named base. You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.

Warning.png Warning

The base environment is stored in a read-only directory, as shown by the conda info command. That is why you must always create your own conda environments to install the software you need.

  • List all your environments
Terminal.png $:
conda info --envs

or

Terminal.png $:
conda env list
  • Create a new environment
Terminal.png $:
conda create --name ENVNAME
  • Activate this environment before installing packages
Terminal.png $:
conda activate ENVNAME

For further information:

Conda package installation

In its default configuration (the default Conda channel), Conda can install and manage more than 7,500 packages hosted at https://repo.anaconda.com/pkgs/, which are built, reviewed, and maintained by Anaconda.

  • Install a package:
Terminal.png $:
conda install <package>
  • Install a specific version of a package:
Terminal.png $:
conda install <package>=<version>
  • Uninstall a package:
Terminal.png $:
conda uninstall <package>

For further information:

Conda package installation from channels

Channels are the locations of the repositories where Conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created. Useful channels are:

To install a package from a specific channel:

Terminal.png $:
conda install -c <channel_name> <package>
  • List all installed packages with their source channels
Terminal.png $:
conda list --show-channel-urls

For further information:

Warning.png Warning

Installing Conda packages can consume a lot of time and resources. Prefer doing it on a node rather than on a frontend. Note that using a node is mandatory if you need access to specific hardware resources such as GPUs.

Application examples

Create an environment

For example, create an environment <env_name>, specifying a Python version (otherwise, the default version of the module is used):

Terminal.png fgrenoble:
conda create -y -n <env_name> python=x.y

Load this environment

Terminal.png fgrenoble:
conda activate <env_name>

Install a package into this environment

Terminal.png fgrenoble:
conda install <package_name>

Exit from the loaded environment

Terminal.png fgrenoble:
conda deactivate

Remove unused Conda environments

Warning.png Warning

Conda packages are installed in $HOME/.conda, so you can quickly fill up your home directory quota (25GB by default). Remember to remove unused Conda environments from time to time to free up space.

  • To delete an environment
Terminal.png fgrenoble:
conda deactivate
conda env remove --name <env_name>
  • To remove unused packages and the cache. Do not be concerned if this appears to try to delete the packages of the system environment (i.e. non-local).
Terminal.png fgrenoble:
conda clean -a
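
To decide which environments are worth removing, you can first look at how much space each one takes. The following is a minimal sketch using only du and sort; the helper name conda_usage is hypothetical, and it assumes the default $HOME/.conda location mentioned in the warning above (pass another directory as the first argument if your environments live elsewhere).

```shell
# Print the total size of a conda directory, then one line per environment
# (sizes in KiB, smallest first). Hypothetical helper, not a conda command.
conda_usage() {
    dir="${1:-$HOME/.conda}"     # assumption: default per-user conda directory
    if [ ! -d "$dir" ]; then
        echo "no conda directory at $dir"
        return 0
    fi
    du -sk "$dir"                # total: environments plus package cache
    if [ -d "$dir/envs" ]; then
        # errors from an empty envs/ directory are silenced
        du -sk "$dir"/envs/* 2>/dev/null | sort -n
    fi
}

conda_usage
```

The largest environments appear at the bottom of the list and are the first candidates for conda env remove.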

Use a Conda environment in a job

As seen in the previous section, Conda environments are stored by default in the user's home directory (at ~/.conda). Once an environment is created and its packages are installed, it is usable on all nodes of the given site.

For interactive jobs

Load the conda module and activate your conda environment env_name in an interactive job

Terminal.png frontal:
oarsub -I
Terminal.png node:
module load conda
conda activate env_name

For batch jobs

Load the conda module and activate your conda environment env_name in a batch job

First, prepare your conda environment on the frontend:

  • Load the conda module (this also initializes conda)
  • Create an environment testconda containing gcc from the conda-forge channel
  • List the installed packages with their source channels
Terminal.png fsiteA:
module load conda

conda create --name testconda
conda activate testconda

conda install -c conda-forge gcc_linux-64 gxx_linux-64
  • Run these commands and keep their output
Terminal.png fsiteA:
conda info
conda list -n testconda --show-channel-urls

In this example, we launch a batch job that performs the same tasks.

  • The key step is to run the commands in a login shell (bash -l), so that the shell environment is sourced and the module and conda activate commands are available
Terminal.png fsiteA:
oarsub 'bash -l -c "module load conda ; conda activate testconda ; conda info ; conda list -n testconda --show-channel-urls"'
OAR_JOB_ID=1539228
  • Check whether the job has finished:
Terminal.png fsiteA:
oarsub -C 1539228
# Error: job 1539228 is not running. Its current state is Finishing.
  • Compare the output with the previous one: they should be identical
Terminal.png fsiteA:
cat OAR.1539228.stdout
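
When the command line becomes long, the same steps can be placed in a script file instead of being quoted inline. The following is a minimal sketch under a few assumptions: the file name conda_job.sh is hypothetical, the testconda environment created above already exists, and the script relies on the #!/bin/bash -l shebang to get the login shell that the inline bash -l -c form provides.

```shell
# Write a job script (hypothetical name: conda_job.sh) that runs in a login shell
cat > conda_job.sh <<'EOF'
#!/bin/bash -l
# -l makes bash a login shell, so that the `module` command is defined
module load conda
conda activate testconda
conda info
conda list -n testconda --show-channel-urls
EOF
chmod +x conda_job.sh

# Submit only where OAR is available (i.e. on a Grid'5000 frontend)
if command -v oarsub >/dev/null 2>&1; then
    oarsub ./conda_job.sh
fi
```

Keeping the job in a file makes it easier to version and to extend with further commands than a long quoted one-liner.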

Advanced Conda environment operations

Synchronize Conda environments between Grid'5000 sites

  • To synchronize a Conda directory from a siteA to a siteB:
Terminal.png fsiteA:
rsync --dry-run --delete -avz ~/.conda siteB.grid5000.fr:~

To actually perform the synchronization, remove the --dry-run argument and replace siteB with a real site name. Note that --delete removes from siteB any file under ~/.conda that does not exist on siteA.

Share Conda environments between multiple users

You can use two different approaches to share Conda environments with other users.

Export an environment as a yaml file

  • Export it as follows:
Terminal.png fgrenoble:
conda env export > environment.yml
  • Share it by putting the yaml file in your public folder
Terminal.png fgrenoble:
cp environment.yml ~/public/
  • Other users can create the environment from the environment.yml file
Terminal.png fgrenoble:
conda env create -f ~<login>/public/environment.yml
  • Advantage: other users cannot damage your environment by adding conflicting packages or by deleting packages that someone else needs.
  • Drawback: it is not a truly shared environment. The environment is duplicated in each user's home directory, and a modification made to one copy is not automatically replicated to the others.
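
A full export pins every package, including exact builds, which can make the file harder to reuse elsewhere. As a possible variant, conda env export accepts a --from-history option that records only the packages you explicitly asked for. The following is a minimal sketch: the helper name export_env_portable is hypothetical, and whether the lighter file is sufficient for your use case is an assumption to check (for instance, versions you did not request explicitly are not recorded).

```shell
# Export only the packages that were explicitly requested in the environment.
# usage: export_env_portable ENVNAME [OUTFILE]   (hypothetical helper)
export_env_portable() {
    env_name="$1"
    out_file="${2:-environment.yml}"
    conda env export --name "$env_name" --from-history > "$out_file"
}

# Example call (requires the conda module to be loaded):
#   export_env_portable testconda ~/public/environment.yml
```

Other users can then recreate the environment with conda env create -f as described above; the dependency solver chooses the remaining packages at creation time.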

Use a group storage

Group Storage lets you share storage space between multiple users. You can take advantage of a group storage to share a single Conda environment among multiple users.

  • Create a shared Conda environment with --prefix to specify the path to use to store the conda environment
Terminal.png flyon:
conda create --prefix /srv/storage/storage_name@server_hostname_(fqdn)/ENVNAME
  • Activate the shared environment (share this command with the targeted users)
Terminal.png flyon:
conda activate /srv/storage/storage_name@server_hostname_(fqdn)/ENVNAME
  • Advantage: it avoids storing duplicate packages and makes any modification accessible to all users
  • Drawbacks:
    • Users could potentially harm the environment by installing or removing packages.
    • When installing additional packages, conda still stores them in the package cache located in your home directory. Use conda clean as described above to clean those files.
  • Create your environments by default in a group storage location

You can modify your ~/.condarc file to set this location for conda environments and package installation as follows (change the location to suit your group and your needs). Add these lines:

pkgs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/pkgs/
envs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/envs/

Build your HPC-AI framework with conda

Here are some pointers to help you set up your software environment for HPC or AI with conda