Conda: Difference between revisions
| (153 intermediate revisions by 3 users not shown) | |||
| {{Portal|User}} | {{Portal|User}} | ||
| {{Portal|Tutorial}} | |||
| {{Pages|HPC}} | |||
| {{Portal|HPC}} | |||
| {{TutorialHeader}} | {{TutorialHeader}} | ||
| = Introduction = | = Introduction = | ||
| Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software. | [https://docs.conda.io/projects/conda/en/latest/index.html Conda] is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software. | ||
| To get started with Conda, have a look at this [https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html Conda cheat sheet] and this [https://towardsdatascience.com/managing-project-specific-environments-with-conda-b8b50aa8be0e Getting Started with Conda] guide. | |||
| == Anaconda  | == Conda, Miniconda, Anaconda ? == | ||
| ''' | * '''conda''' is the package manager. | ||
| * '''miniconda''' is a minimal Python distribution for '''conda''' that includes base packages. | |||
| * '''anaconda''' is another Python distribution for '''conda''' that adds 160+ additional packages to miniconda. | |||
| On Grid'5000, we installed ''conda'' using the ''miniconda'' installer, but you are free to create an anaconda environment, using the ''anaconda'' meta-package. | |||
| More information about Miniconda vs Anaconda is available on the [https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda Conda website]. | |||
| = Conda  | == Conda or Mamba? == | ||
| [https://mamba.readthedocs.io/en/latest/index.html mamba] is a reimplementation of the conda package manager in C++. Conda has a reputation for taking time when dealing with complex sets of dependencies. Mamba is much more efficient and is fully compatible with Conda packages and supports most of Conda's commands. It consists of: | |||
| * mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions | |||
| * micromamba: a pure C++-based CLI, self-contained in a single-file executable | |||
| * libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built | |||
| === Mamba on Grid'5000 === | |||
| ==  | Like Conda, Mamba is available as a module on Grid'5000: | ||
| {{Term|location=frontal|cmd=<code class="command">module load mamba</code>}} | |||
| Then, since its syntax is generally compatible with Conda, you can use the <code class="command">mamba</code> command where you would use the <code class="command">conda</code> command. | |||
| = Conda on Grid'5000 =  | |||
| Conda is already available in Grid'5000 as a module. '''You do not need to install Anaconda or Miniconda on Grid'5000 !'''  | |||
| == Conda  | == Load Conda module == | ||
| * To make it available on a node or on a frontend, load the Conda module as follows (default version): | |||
| {{Term|location=frontal|cmd=<code class="command">module load conda</code>}} | |||
| == Conda shell  | == Optional: Conda initialization and activation in your shell == | ||
| Conda  | Conda initialization is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment. It is not required to use Conda. | ||
| The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library. | The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library. | ||
| There are two ways to initialize conda in standard installation: | |||
| * 1. occasionally : activate conda in your current shell (ex: bash) | |||
| {{Term|location=$|cmd=<code class="command">eval "$(conda shell.bash hook)"</code>}}   | {{Term|location=$|cmd=<code class="command">eval "$(conda shell.bash hook)"</code>}}   | ||
| * 2. always : activate conda in your login shell environment permanently (this command modifies your .bashrc by adding conda setup directives) | |||
| {{Term|location=$|cmd=<code class="command">conda init</code>}}  | |||
| = | {{Warning|text=bash is the default shell for conda.<br> | ||
| *  | For users using tcsh or zsh  use : | ||
| *  | * <code class="command">eval "$(conda shell.{tcsh,zsh} hook)"</code> | ||
| * <code class="command">conda init {tcsh,zsh}</code>}} | |||
| In Grid'5000, the '''conda''' initialization is made transparently by loading the conda module. | |||
| The <code class="command">conda activate</code> or   | |||
| <code class="command">conda deactivate</code> commands rely on the conda shell initialization to load/unload the corresponding conda environment variables in the current shell session. | |||
| By default, you are located in the <code>base</code> Conda environment that corresponds to the base installation of Conda. | |||
| If you’d prefer that conda’s base environment not be activated on startup, set the auto_activate_base parameter to false: | |||
| {{Term|location= | {{Term|location=$|cmd=<code class="command">conda config --set auto_activate_base false</code>}} | ||
| Verify your conda configuration with this command: | |||
| {{Term|location=$|cmd=<code class="command">conda config --show</code>}} | |||
| Look at all available configuration options with: | |||
| {{Term|location= | {{Term|location=$|cmd=<code class="command">conda config --describe</code>}} | ||
| ==  | == Conda environments == | ||
| Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments. | |||
| When you begin using conda, you already have a default environment named <code>base</code>.  | |||
| You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment. | |||
| {{ | {{Warning|text=The <code>base</code> environment is stored in a read-only directory, as shown by the <code>conda info</code> command. | ||
| '''That's why you need to systematically create your own conda environments to install the software you need.'''}} | |||
| *  | * List all your environments | ||
| {{Term|location= | {{Term|location=$|cmd=<code class="command">conda info --envs</code>}} | ||
| or | |||
| {{Term|location=$|cmd=<code class="command">conda env list</code>}} | |||
| *  | * Create a new environment | ||
| {{Term|location= | {{Term|location=$|cmd=<code class="command">conda create --name ENVNAME</code>}} | ||
| *  | * Activate this environment before installing packages | ||
| {{Term|location= | {{Term|location=$|cmd=<code class="command">conda activate ENVNAME</code>}} | ||
| *  | For further information: | ||
| * https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html | |||
| * [https://towardsdatascience.com/managing-project-specific-environments-with-conda-406365a539ab Managing your data science project environments with Conda] | |||
| ==  | == Conda package installation == | ||
| In its default configuration (the default Conda channel), Conda can install and manage the over 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda. | |||
| {{Term|location=$|cmd=<code class="command">conda install <package></code>}} | |||
| {{Term|location= | |||
| *  | * Install a specific version of a package: | ||
| ==  | {{Term|location=$|cmd=<code class="command">conda install <package>=<version></code>}} | ||
| * Uninstall a package: | |||
| {{Term|location=$|cmd=<code class="command">conda uninstall <package></code>}} | |||
| For further information: | |||
| * https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/packages.html | |||
| == Conda package installation from channels == | |||
| Channels are the locations of the repositories where Conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created. Useful channels are: | |||
| * <code>conda-forge</code> from https://conda-forge.org. It is free for all to use.  | |||
| * <code>nvidia</code> from https://anaconda.org/nvidia. It provides Nvidia's software. | |||
| To install a package from a specific channel: | |||
| {{Term|location=$|cmd=<code class="command">conda install -c <channel_name> <package></code>}} | |||
| * List all packages installed with their source channels | |||
| *  | {{Term|location=$|cmd=<code class="command">conda list --show-channel-urls</code>}} | ||
| {{Term|location= | |||
| For further information: | |||
| * https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html | |||
| {{Warning|text=Installing Conda packages can be time- and resource-consuming. Preferably use a node (instead of a frontend) to perform such an operation. Note that using a node is mandatory if you need to access specific hardware resources such as a GPU.}} | |||
| {{ | |||
| = Application examples = | |||
| == | == Create an environment == | ||
| For example, create the environment <code class="replace"><env_name></code> (specify a Python version; otherwise, the module's default version is used): | |||
| {{Term|location= | {{Term|location=fgrenoble|cmd=<code class="command">conda create -y -n </code><code class="replace"><env_name></code> <code class="command">python=x.y</code>}} | ||
| == Load this environment == | |||
| {{Term|location=fgrenoble|cmd=<code class="command">conda activate </code><code class="replace"><env_name></code>}} | |||
| {{Term|location= | |||
| == Install a package into this environment == | |||
| {{Term|location= | {{Term|location=fgrenoble|cmd=<code class="command">conda install </code><code class="replace"><package_name></code>}} | ||
| == Exit from the loaded environment == | |||
| {{Term|location= | {{Term|location=fgrenoble|cmd=<code class="command">conda deactivate</code>}} | ||
| == Remove unused Conda environments == | |||
| {{Warning|text=Conda packages are installed in <code>$HOME/.conda</code>. You could, therefore, rapidly saturate your [[Storage#.2Fhome|homedir quota]] (25GB by default). Do not forget to occasionally remove unused Conda environments to free up space.}} | |||
| {{ | |||
| *  | * To delete an environment | ||
| {{Term|location= | {{Term|location=fgrenoble|cmd=<code class="command">conda deactivate</code><br> | ||
| <code class="command">conda env remove --name </code><code class="replace"><env_name></code>}} | |||
| *  | * To remove unused packages and the cache. Do not be concerned if this appears to try to delete the packages of the system environment (i.e. non-local). | ||
| {{Term|location= | {{Term|location=fgrenoble|cmd=<code class="command">conda clean -a</code>}} | ||
| == Use a Conda environment in a job == | |||
| As seen in the previous section, Conda environments are stored by default in the user's home directory (at <code>~/.conda</code>). Once an environment is created and its packages installed, it is usable on all nodes of the given site. | |||
| === For interactive jobs === | |||
| = | Load, initialize, and activate your conda environment <code class="replace">env_name</code> in an interactive job | ||
| {{Term|location=frontal|cmd=<code class="command">oarsub -I</code>}} | |||
| {{Term|location=node|cmd=<code class="command">module load conda</code><br> | |||
| <code class="command">conda activate </code><code class="replace">env_name</code>}} | |||
| === For batch jobs === | |||
| Load, initialize, and activate your conda environment <code class="replace">env_name</code> in a batch job | |||
| *  | First prepare your conda environment on the frontend:  | ||
| {{Term|location= | * module load and conda initialization | ||
| * conda creation of an environment <code>testconda</code> containing <code>gcc</code> from <code>conda-forge</code> channel | |||
| {{Term|location= | * list installed packages with source info | ||
| {{Term|location=fsiteA|cmd=<code class="command">module load conda</code><br> | |||
| <code class="command">conda create --name testconda</code><br> | |||
| <code class="command">conda activate testconda</code><br> | |||
| <code class="command">conda install -c conda-forge gcc_linux-64 gxx_linux-64</code>}} | |||
| * Run these commands and keep their output | |||
| {{Term|location=fsiteA|cmd=<code class="command">conda info</code><br> | |||
| <code class="command">conda list -n testconda --show-channel-urls</code>}} | |||
| *  | In this example, we launch a batch job that performs the same tasks. | ||
| {{Term|location= | * The important step is to run a login shell (<code>bash -l</code>) so that the shell environment is sourced and the <code>module</code> and <code>conda activate</code> commands are available | ||
| <pre> | {{Term|location=fsiteA|cmd=<code class="command">oarsub 'bash -l -c "module load conda ; conda activate testconda ; conda info ; conda list -n testconda --show-channel-urls"'</code>}} | ||
| <pre>OAR_JOB_ID=1539228</pre> | |||
| </pre> | |||
| *  | * Check whether the job has finished: | ||
| {{Term|location= | {{Term|location=fsiteA|cmd=<code class="command">oarsub -C 1539228</code>}} | ||
| <pre># Error: job 1539228 is not running. Its current state is Finishing.</pre> | |||
| <pre> | |||
| </pre> | |||
| * Compare the output with the previous one: they should be identical | |||
| {{Term|location=fsiteA|cmd=<code class="command">cat OAR.1539228.stdout</code>}} | |||
| = Advanced Conda environment operations = | |||
| == Synchronize Conda environments between Grid'5000 sites == | |||
| * To synchronize a Conda directory from a siteA to a siteB: | |||
| * To  | |||
| {{Term|location=fsiteA|cmd=<code class="command">rsync --dry-run --delete -avz ~/.conda siteB.grid5000.fr:~</code>}} | |||
| {{Term|location= | |||
| To actually perform the synchronization, remove the <code>--dry-run</code> argument and replace ''siteB'' with a real site name. | |||
| == Share Conda environments between multiple users == | == Share Conda environments between multiple users == | ||
| You can use two different approaches to share Conda environments with other users. | |||
| You can  | |||
| === Export an environment as a yaml file === | === Export an environment as a yaml file === | ||
| * Export it as  | * Export it as follows: | ||
| < | {{Term|location=fgrenoble|cmd=<code class="command">conda env export > environment.yml</code>}} | ||
| * Share it by putting the yaml file in your public folder | * Share it by putting the yaml file in your public folder | ||
| < | {{Term|location=fgrenoble|cmd=<code class="command">cp environment.yml ~/public/</code>}} | ||
| * Other users can create the environment from the <code>environment.yml</code> file | * Other users can create the environment from the <code>environment.yml</code> file | ||
| < | {{Term|location=fgrenoble|cmd=<code class="command">conda env create -f ~<login>/public/environment.yml</code>}} | ||
| *  | * Advantage: it prevents other users from damaging the environment by adding packages that conflict with existing ones or by deleting packages that another user might need. | ||
| * Inconvenient : it's not a true shared environment | * Drawback: it is not a true shared environment. The environment is duplicated in each user's home directory, and modifications to one copy are not automatically replicated to the others. | ||
| ===  | === Use a group storage === | ||
| [[Group Storage]] gives you the possibility to share a storage between multiple users. You can take advantage of a group storage to share a single Conda environment among multiple users.  | |||
| *  | * Create a shared Conda environment with <code>--prefix</code> to specify the path to use to store the conda environment | ||
| <code> | {{Term|location=flyon|cmd=<code class="command">conda create --prefix /srv/storage/</code><code class="replace">storage_name</code>@<code class="replace">server_hostname_(fqdn)/ENVNAME</code>}} | ||
| * Activate the shared environment (share this command with the targeted users) | |||
| {{Term|location=flyon|cmd=<code class="command">conda activate /srv/storage/</code><code class="replace">storage_name</code>@<code class="replace">server_hostname_(fqdn)/ENVNAME</code>}} | |||
| * Advantage: it avoids storing duplicate packages and makes any modification accessible to all users | |||
| * Drawbacks: | |||
| ** Users could potentially harm the environment by installing or removing packages. | |||
| ** When installing additional packages, conda still stores them in the package cache located in your home directory. Use <code class="command">conda clean</code> as described above to clean those files. | |||
| * Create your environments by default in a group storage location | |||
| You can modify your <code>~/.condarc</code> file to specify this location for conda environment and package installation as follows (change the location to suit your group and your convenience). Add these lines: | |||
| <pre> | |||
| pkgs_dirs: | |||
|   - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/pkgs/ | |||
| envs_dirs: | |||
|   - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/envs/ | |||
| </pre> | |||
| = Build your HPC-AI framework with conda = | |||
| [ | Here are some pointers to help you set up your software environment for HPC or AI with conda | ||
| * [[HPC_and_HTC_tutorial]] | |||
| * Running [[Run_MPI_On_Grid'5000|MPI applications on Grid'5000]] | |||
| * [[Deep_Learning_Frameworks|Deep Learning Frameworks documentation]] | |||
Latest revision as of 17:23, 21 September 2023
|   | Note | 
|---|---|
| This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team. | |
Introduction
Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Linux, OS X and Windows, and was created for Python programs but can package and distribute any software.
To get started with Conda, have a look at this Conda cheat sheet and this Getting Started with Conda guide.
Conda, Miniconda, Anaconda ?
- conda is the package manager.
- miniconda is a minimal Python distribution for conda that includes base packages.
- anaconda is another Python distribution for conda that adds 160+ additional packages to miniconda.
On Grid'5000, we installed conda using the miniconda installer, but you are free to create an anaconda environment, using the anaconda meta-package.
More information about Miniconda vs Anaconda is available on the Conda website.
Conda or Mamba?
mamba is a reimplementation of the conda package manager in C++. Conda has a reputation for taking time when dealing with complex sets of dependencies. Mamba is much more efficient and is fully compatible with Conda packages and supports most of Conda's commands. It consists of:
- mamba: a Python-based CLI conceived as a drop-in replacement for conda, offering higher speed and more reliable environment solutions
- micromamba: a pure C++-based CLI, self-contained in a single-file executable
- libmamba: a C++ library exposing low-level and high-level APIs on top of which both mamba and micromamba are built
Mamba on Grid'5000
Like Conda, Mamba is available as a module on Grid'5000:
|   | frontal: | module load mamba |
Then, since its syntax is generally compatible with Conda, you can use the mamba command where you would use the conda command.
Conda on Grid'5000
Conda is already available in Grid'5000 as a module. You do not need to install Anaconda or Miniconda on Grid'5000 !
Load Conda module
- To make it available on a node or on a frontend, load the Conda module as follows (default version):
|   | frontal: | module load conda |
Optional: Conda initialization and activation in your shell
Conda initialization is the process of defining some shell functions that facilitate activating and deactivating Conda environments, as well as some optional features such as updating PS1 to show the active environment. It is not required to use Conda.
The conda shell function is mainly a forwarder function. It will delegate most of the commands to the real conda executable driven by the Python library.
There are two ways to initialize conda in a standard installation:
- 1. Occasionally: activate conda in your current shell (e.g. bash)
|   | $: | eval "$(conda shell.bash hook)" |
- 2. Always: activate conda permanently in your login shell environment (this command modifies your .bashrc by adding conda setup directives)
|   | $: | conda init |
|   | Warning | 
|---|---|
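| bash is the default shell for conda. Users of tcsh or zsh should use eval "$(conda shell.{tcsh,zsh} hook)" or conda init {tcsh,zsh} instead, choosing one shell name from the braces. | |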
On Grid'5000, conda initialization is done transparently when you load the conda module.
The conda activate and conda deactivate commands rely on the conda shell initialization to load/unload the corresponding conda environment variables in the current shell session.
By default, you are located in the base Conda environment that corresponds to the base installation of Conda.
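If you’d prefer that conda’s base environment not be activated on startup, set the auto_activate_base parameter to false:
|   | $: | conda config --set auto_activate_base false |
Verify your conda configuration with this command:
|   | $: | conda config --show |
Look at all available configuration options with:
|   | $: | conda config --describe |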
Conda environments
Conda allows you to create separate environments containing files, packages, and their dependencies that will not interact with other environments.
When you begin using conda, you already have a default environment named base. 
You can create separate environments to keep your programs isolated from each other. Specifying the environment name confines conda commands to that environment.
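|   | Warning | 
|---|---|
| The base environment is stored in a read-only directory, as shown by the conda info command. That's why you need to systematically create your own conda environments to install the software you need. | |
- List all your environments
|   | $: | conda info --envs |
or
|   | $: | conda env list |
- Create a new environment
|   | $: | conda create --name ENVNAME |
- Activate this environment before installing packages
|   | $: | conda activate ENVNAME |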
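Since an environment only needs to be created once, a script may want to check whether it already exists before calling conda create. The following is a minimal sketch under stated assumptions: ENVNAME is the placeholder used above, the helper name env_exists is hypothetical, and the check simply reads the first column of the conda env list output.

```shell
#!/usr/bin/env bash
# ENVNAME is a placeholder; pass a real name as the first argument.
env_name="${1:-ENVNAME}"

# Hypothetical helper: succeeds if an environment with this exact name is listed.
# Comment lines (starting with '#') in the listing are skipped.
env_exists() {
    conda env list | awk '!/^#/ {print $1}' | grep -qx -- "$1"
}

if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found; run 'module load conda' first."
elif env_exists "${env_name}"; then
    echo "Environment ${env_name} already exists."
else
    # -y skips the interactive confirmation, as in the examples on this page.
    conda create -y --name "${env_name}"
fi
```

Environments created with --prefix have no name in the listing, so this check only applies to named environments.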
For further information:
- https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
- Managing your data science project environments with Conda
Conda package installation
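In its default configuration (the default Conda channel), Conda can install and manage the more than 7,500 packages at https://repo.anaconda.com/pkgs/ that are built, reviewed, and maintained by Anaconda.
|   | $: | conda install <package> |
- Install a specific version of a package:
|   | $: | conda install <package>=<version> |
- Uninstall a package:
|   | $: | conda uninstall <package> |
For further information:
- https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/packages.html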
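As a small illustration of the package=version syntax described in this section, the sketch below builds a version-pinned spec and prints the resulting command instead of running it. The package name numpy and the version 1.26 are hypothetical placeholders, not taken from this page.

```shell
#!/usr/bin/env bash
# Hypothetical package and version, chosen only for illustration.
package="numpy"
version="1.26"

# conda accepts "<package>=<version>" to request a specific version.
spec="${package}=${version}"

# Print the command rather than executing it, so this is safe to run anywhere.
install_cmd="conda install ${spec}"
echo "${install_cmd}"
```

Run the printed command inside an activated environment of your own, since the base environment is read-only on Grid'5000.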
Conda package installation from channels
Channels are the locations of the repositories where Conda looks for packages. Channels may point to a Cloud repository or a private location on a remote or local repository that you or your organization created. Useful channels are:
- conda-forge from https://conda-forge.org. It is free for all to use.
- nvidia from https://anaconda.org/nvidia. It provides Nvidia's software.
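To install a package from a specific channel:
|   | $: | conda install -c <channel_name> <package> |
- List all packages installed with their source channels
|   | $: | conda list --show-channel-urls |
For further information:
- https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html
|   | Warning | 
|---|---|
| Installing Conda packages can be time- and resource-consuming. Preferably use a node (instead of a frontend) to perform such an operation. Note that using a node is mandatory if you need to access specific hardware resources such as a GPU. | |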
Application examples
Create an environment
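For example, create the environment <env_name> (specify a Python version; otherwise, the module's default version is used):
|   | fgrenoble: | conda create -y -n <env_name> python=x.y |
Load this environment
|   | fgrenoble: | conda activate <env_name> |
Install a package into this environment
|   | fgrenoble: | conda install <package_name> |
Exit from the loaded environment
|   | fgrenoble: | conda deactivate |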
Remove unused Conda environments
|   | Warning | 
|---|---|
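| Conda packages are installed in $HOME/.conda. You could, therefore, rapidly saturate your homedir quota (25GB by default). Do not forget to occasionally remove unused Conda environments to free up space. | |
- To delete an environment
|   | fgrenoble: | conda deactivate |
|   | fgrenoble: | conda env remove --name <env_name> |
- To remove unused packages and the cache. Do not be concerned if this appears to try to delete the packages of the system environment (i.e. non-local).
|   | fgrenoble: | conda clean -a |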
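Because the quota concern above is about $HOME/.conda, it can help to check how much space that directory uses before and after cleaning. This is a minimal sketch, assuming the default location given in the warning and an envs subdirectory beneath it. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Default location of user environments and package cache, per the warning above.
conda_dir="${HOME}/.conda"

if [ -d "${conda_dir}" ]; then
    # Total size of the directory.
    du -sh "${conda_dir}"
    # Size of each environment, assuming they live under envs/.
    if [ -d "${conda_dir}/envs" ]; then
        du -sh "${conda_dir}/envs"/* 2>/dev/null
    fi
    usage_checked="yes"
else
    echo "No ${conda_dir} directory found; nothing to clean."
    usage_checked="no"
fi
```

Comparing the totals before and after conda clean -a gives a rough idea of how much the package cache was using.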
Use a Conda environment in a job
As seen in the previous section, Conda environments are stored by default in the user's home directory (at ~/.conda). Once an environment is created and its packages installed, it is usable on all nodes of the given site.
For interactive jobs
Load, initialize, and activate your conda environment env_name in an interactive job
|   | frontal: | oarsub -I |
|   | node: | module load conda |
|   | node: | conda activate env_name |
For batch jobs
Load, initialize, and activate your conda environment env_name in a batch job
First prepare your conda environment on the frontend:
- module load and conda initialization
- conda creation of an environment testconda containing gcc from the conda-forge channel
- list installed packages with source info
|   | fsiteA: | module load conda |
|   | fsiteA: | conda create --name testconda |
|   | fsiteA: | conda activate testconda |
|   | fsiteA: | conda install -c conda-forge gcc_linux-64 gxx_linux-64 |
- Run these commands and keep their output
|   | fsiteA: | conda info |
|   | fsiteA: | conda list -n testconda --show-channel-urls |
In this example, we launch a batch job that performs the same tasks.
- The important step is to run a login shell (bash -l) so that the shell environment is sourced and the module and conda activate commands are available
|   | fsiteA: | oarsub 'bash -l -c "module load conda ; conda activate testconda ; conda info ; conda list -n testconda --show-channel-urls"' | 
OAR_JOB_ID=1539228
- Check whether the job has finished
|   | fsiteA: | oarsub -C 1539228 |
# Error: job 1539228 is not running. Its current state is Finishing.
- Compare the output with the previous one: they should be identical
|   | fsiteA: | cat OAR.1539228.stdout |
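For longer jobs, the quoted one-liner can become hard to maintain. One possible alternative is to put the same commands in a script file and submit that file. This sketch assumes that a script with a login-shell shebang behaves like the bash -l -c wrapper. The file name conda_job.sh is hypothetical, and the snippet only writes the script: the oarsub call is left as a comment.

```shell
#!/usr/bin/env bash
# Write a hypothetical job script in the current directory. On Grid'5000 this
# should be somewhere under your home directory so that the nodes can read it.
job_script="${PWD}/conda_job.sh"

cat > "${job_script}" <<'EOF'
#!/bin/bash -l
# Login shell (-l) so that the module command is defined, as with "bash -l -c".
module load conda
conda activate testconda
conda info
conda list -n testconda --show-channel-urls
EOF

# Make the script executable before submitting it.
chmod +x "${job_script}"
echo "Job script written to ${job_script}"

# To submit it from the frontend (not executed here):
#   oarsub "${job_script}"
```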
Advanced Conda environment operations
Synchronize Conda environments between Grid'5000 sites
- To synchronize a Conda directory from a siteA to a siteB:
|   | fsiteA: | rsync --dry-run --delete -avz ~/.conda siteB.grid5000.fr:~ |
To actually perform the synchronization, remove the --dry-run argument and replace siteB with a real site name.
Share Conda environments between multiple users
You can use two different approaches to share Conda environments with other users.
Export an environment as a yaml file
- Export it as follows:
|   | fgrenoble: | conda env export > environment.yml |
- Share it by putting the yaml file in your public folder
|   | fgrenoble: | cp environment.yml ~/public/ |
- Other users can create the environment from the environment.yml file
|   | fgrenoble: | conda env create -f ~<login>/public/environment.yml |
- Advantage: it prevents other users from damaging the environment by adding packages that conflict with existing ones or by deleting packages that another user might need.
- Drawback: it is not a true shared environment. The environment is duplicated in each user's home directory, and modifications to one copy are not automatically replicated to the others.
Use a group storage
Group Storage lets you share storage between multiple users. You can take advantage of a group storage to share a single Conda environment among multiple users.
- Create a shared Conda environment with --prefix to specify the path where the conda environment is stored
|   | flyon: | conda create --prefix /srv/storage/storage_name@server_hostname_(fqdn)/ENVNAME |
- Activate the shared environment (share this command with the targeted users)
|   | flyon: | conda activate /srv/storage/storage_name@server_hostname_(fqdn)/ENVNAME |
- Advantage: it avoids storing duplicate packages and makes any modification accessible to all users
- Drawbacks:
  - Users could potentially harm the environment by installing or removing packages.
  - When installing additional packages, conda still stores them in the package cache located in your home directory. Use conda clean as described above to clean those files.
 
- Create your environments by default in a group storage location
You can modify your ~/.condarc file to specify this location for conda environment and package installation as follows (change the location to suit your group and your convenience). Add these lines:
pkgs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/pkgs/
envs_dirs:
  - /srv/storage/storage_name@server_hostname_(fqdn)/conda_shared_envs/envs/
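To check that conda picked up these settings, one option is to query just the two keys with conda config --show. The sketch below guards the call so that it only prints a hint when conda is not loaded in the current shell. The variable condarc_checked is illustrative.

```shell
#!/usr/bin/env bash
if command -v conda >/dev/null 2>&1; then
    # Show only the two keys set in ~/.condarc.
    conda config --show pkgs_dirs envs_dirs
    condarc_checked="yes"
else
    echo "conda not found in this shell; run 'module load conda' first."
    condarc_checked="no"
fi
```

The group storage paths should appear first in both lists if the ~/.condarc entries are taken into account.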
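Build your HPC-AI framework with conda
Here are some pointers to help you set up your software environment for HPC or AI with conda:
- HPC and HTC tutorial
- Running MPI applications on Grid'5000
- Deep Learning Frameworks documentation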