PMEM
Latest revision as of 12:28, 31 March 2022
Note: This page is actively maintained by the Grid'5000 team. If you encounter problems, please report them (see the Support page). Additionally, as it is a wiki page, you are free to make minor corrections yourself if needed. If you would like to suggest a more fundamental change, please contact the Grid'5000 team.
Some nodes of Grid'5000 feature the new Persistent Memory technology. As of writing this page, the troll cluster in Grenoble is equipped.
Foreword
This Persistent Memory technology is known by many different names, e.g.
- nvdimm (generic term, nvdimm-N = battery backed DRAM, nvdimm-P...)
- SCM (storage class memory)
- PMM/PMEM
In the rest of this document, we'll use the PMEM acronym.
The PMEM technology currently available in Grid'5000 is Intel's Optane DC Persistent Memory. Other vendors may provide PMEM in the future (IBM, HPE Memristor?). PMEM has also long been available for testing in emulators such as qemu.
This technology comes as DIMMs (just like DRAM) but offers a different set of characteristics:
- It fills the gap between memory and storage in terms of latency: PMEM is roughly 10 times slower than RAM and roughly 100 times faster than a high-end NVMe SSD
- Persistence: can be used as (persistent) memory or filesystem on steroids
- Byte addressable, zero-copy memory mapping
- No energy consumption when idle, but more than RAM when used
- Lower price per GB compared to DRAM, larger memory sizes than DRAM
This technology is not to be confused with the generic NVRAM term or with NVMe storage (SSD drives attached over PCIe).
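To illustrate the "byte addressable, zero-copy memory mapping" characteristic, here is a minimal Python sketch that maps a file into memory, stores a few bytes with plain slice assignments and reads them back. The function name `write_then_read` and the `/mnt/pmem` location mentioned in the comments are assumptions made for this example. The demo uses a temporary directory so that it runs anywhere; the direct-access behaviour is only expected when the file lives on a filesystem mounted with the dax option on top of PMEM. On a regular filesystem the same code goes through the page cache.

```python
import mmap
import os
import tempfile


def write_then_read(path: str, payload: bytes, size: int = 4096) -> bytes:
    """Map `path` into memory, store `payload` at offset 0 and read it back."""
    # Create (or reuse) the file and make sure it is `size` bytes long,
    # because a mapping cannot extend past the end of the file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as mapping:
            # Byte-level stores into the mapping, without any write() call.
            mapping[: len(payload)] = payload
            # Ask the kernel to flush the modified range back to the file.
            mapping.flush()
            return bytes(mapping[: len(payload)])
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage: replace the temporary directory with a directory on
    # a DAX-mounted filesystem (for instance /mnt/pmem) to target PMEM.
    with tempfile.TemporaryDirectory() as directory:
        print(write_then_read(os.path.join(directory, "demo.bin"), b"hello pmem"))
```

The payload must fit in the mapped size; a real application would typically rely on a dedicated library (see the references section) to handle cache flushing and consistency.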
Intel's PMEM settings
Intel's PMEM can be configured in 2 modes, which can also be mixed:
- Memory
  - Just more RAM, no persistence. DRAM serves as cache (it disappears from the operating system's viewpoint).
  - The persistence of the PMEM is not actually exposed in this mode.
- App Direct
  - Offers an explicit use of the PMEM in different modes.
  - Many choices of configuration:
    - DIMMs interleave option in the region (change needs reboot)
    - region splits in namespaces (change may need reboot)
    - sector, fsdax, devdax, kmem (kmem not available before Linux 5.1)
- Mix mode
  - It is also possible to allocate part of the memory to Memory mode and part of it to App Direct
In order to change the configuration (e.g. from Memory mode to App Direct mode, or vice versa), a reboot of the machine is needed.
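As an illustration of the App Direct choices listed above, the following Python sketch builds the usual command sequence to expose one region as an fsdax namespace and mount it as a DAX filesystem. It only prints the commands. The region name `region0`, the device `/dev/pmem0` and the mount point `/mnt/pmem` are assumptions to adapt to what `ndctl list` reports on the node, and the helper name `fsdax_setup_commands` is made up for this example.

```python
import shlex
from typing import List


def fsdax_setup_commands(region: str, device: str, mountpoint: str) -> List[List[str]]:
    """Return the commands exposing one App Direct region as a DAX filesystem."""
    return [
        # Create a namespace in fsdax mode on the given region.
        ["ndctl", "create-namespace", "--mode=fsdax", f"--region={region}"],
        # Format the block device that the namespace provides.
        ["mkfs.ext4", device],
        # Make sure the mount point exists.
        ["mkdir", "-p", mountpoint],
        # Mount with the dax option so that mappings bypass the page cache.
        ["mount", "-o", "dax", device, mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical names: check them against the node before running anything.
    for argv in fsdax_setup_commands("region0", "/dev/pmem0", "/mnt/pmem"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch harmless; on a deployed node they would be run as root, once the PMEM is in App Direct mode.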
Grid'5000 setup for experimentation
The choice in Grid'5000 has been to configure PMEM in Memory mode by default: in the Grid'5000 default environment (when not deploying), the PMEM appears just like more RAM.
Kadeploying makes it possible to experiment with the App Direct mode. We encourage users who want to experiment with the App Direct mode to deploy a very recent system (e.g. Debian testing), in order to benefit from the latest support for PMEM.
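To that purpose, jobs need to be of the deploy type, and kadeploy must be used. On the Grenoble frontend (fgrenoble), reserve a node:

    oarsub -t exotic -p troll -t deploy -I

Then deploy an environment on it:

    kadeploy3 -e debian11-x64-min -f $OAR_NODEFILE -k
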
Once a node is deployed, one can connect to it as root, install the PMEM software and possibly change the configuration and reboot to apply it.
The PMEM tools are:
- ipmctl: tool to change the config of Intel's PMEM (switch mode, etc.)
- ndctl: tool to configure PMEM when in App Direct mode
- daxctl: tool to configure the PMEM direct access (dax)
Warning: The ipmctl software evolves a lot and backward compatibility of the PMEM support is not always ensured. Please try to always use the latest version of ipmctl. The latest version provided in Debian sid/testing or buster-backports should be fine. If not, it may happen that the Grid'5000 recovery tool is not able to reset the PMEM configuration at the end of your job. In such a case, please let support-staff know what you actually did in order to help with the diagnosis, by providing the ipmctl commands and the ipmctl version you used. Thanks.
Install them in Debian testing as follows (here as root on the troll-2 node):

    apt install ipmctl ndctl daxctl

See the man pages or the external documentation of the tools (see the references section) to find out how to use them.
For instance, to change to App Direct mode with DIMMs interleaved, one can run (here on troll-2):

    ipmctl create -goal MemoryMode=0

And then reboot.
Rebooting the machine takes a long time (about 10 minutes), so be patient. You might want to look at the console to follow the progress, from the Grenoble frontend (fgrenoble):

    kaconsole3 -m troll-2

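Once the node is back, one way to check that the switch to App Direct took effect is to look for PMEM regions with `ndctl list --regions`, which prints JSON. The sketch below parses such output in Python. The field names `dev`, `size` and `type` match what ndctl typically reports but should be treated as assumptions and checked against the installed version; the helper name `region_sizes` is made up for this example.

```python
import json
import subprocess
from typing import Dict


def region_sizes(ndctl_json: str) -> Dict[str, int]:
    """Map each PMEM region name to its size in bytes."""
    # An empty output is treated as "no region".
    data = json.loads(ndctl_json) if ndctl_json.strip() else []
    # Accept either a single JSON object or a list of objects.
    regions = data if isinstance(data, list) else [data]
    return {r["dev"]: r["size"] for r in regions if r.get("type") == "pmem"}


if __name__ == "__main__":
    # To be run as root on the deployed node, with ndctl installed.
    output = subprocess.run(
        ["ndctl", "list", "--regions"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = region_sizes(output)
    if not sizes:
        print("No PMEM region found: the node is probably still in Memory mode")
    for name, size in sizes.items():
        print(f"{name}: {size / 2**30:.1f} GiB")
```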
Important notes
- When a job is terminated, the nodes of the job are automatically reconfigured to the default mode of operation, that is Memory mode.
- The PMEM is not erased after a job. Data stored during an experiment (e.g. after switching the PMEM to the App Direct mode) may be accessed in a later job, for instance by another user (despite the switch to Memory mode in between jobs). Please erase your data if it matters to you. Conversely, you should of course not expect data to be preserved between two jobs.
- sudo-g5k is of NO help to experiment with the App Direct mode, since rebooting the node after changing the configuration will terminate the job and switch the node back to Memory mode. Using the App Direct mode requires kadeploying.
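Since the PMEM is not erased between jobs, users who care about their data may want to overwrite it before the job ends. The following Python sketch overwrites the content of one file with zeros, for instance a file stored on a DAX-mounted filesystem. It is only an illustration under that assumption: the helper name `zero_fill` is made up for this example, and overwriting a file does not guarantee that every copy of the data is gone from the modules.

```python
import os
import tempfile


def zero_fill(path: str, chunk_size: int = 1 << 20) -> int:
    """Overwrite the existing content of `path` with zeros; return its size."""
    zeros = bytes(chunk_size)
    with open(path, "r+b") as handle:
        # Seeking to the end gives the current size of the file.
        total = handle.seek(0, os.SEEK_END)
        handle.seek(0)
        remaining = total
        while remaining > 0:
            # Write at most one chunk, never past the original end of file.
            remaining -= handle.write(zeros[: min(chunk_size, remaining)])
        # Push the zeros down to the underlying storage.
        handle.flush()
        os.fsync(handle.fileno())
    return total


if __name__ == "__main__":
    # Demo on a temporary file; on a node, pass the path of the data to erase.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"experiment data")
    print(zero_fill(tmp.name), "bytes overwritten")
    os.remove(tmp.name)
```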
Note: See also the User:Pneyron/PMEM-environment page for preparing a Grid'5000 environment that takes care of the PMEM switch to App Direct right away during the initial deployment of a node.
References
- https://docs.pmem.io/
- https://software.intel.com/pmem
- https://software.intel.com/en-us/articles/quick-start-guide-configure-intel-optane-dc-persistent-memory-on-linux
- https://software.intel.com/en-us/videos/provisioning-intel-optane-dc-persistent-memory-modules-in-linux
- https://www.youtube.com/watch?v=BShO6h8Lc1s
- https://www.youtube.com/watch?v=UTVt_AZmWjM
- https://github.com/intel/ipmctl
- https://hal.inria.fr/hal-02173336/document
- https://nvdimm.wiki.kernel.org/
- https://stevescargall.com/2019/07/09/how-to-extend-volatile-system-memory-ram-using-persistent-memory-on-linux/
- https://www.dell.com/support/manuals/fr/fr/frbsdt1/poweredge-r640/idrac_3.36.36.36_racadm_ar_referenceguide/biospmcreategoalconfigpmpersistentpercentage-read-or-write
- VirtIO-PMEM https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.3-VirtIO-PMEM
- https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00074717en_us&docLocale=en_US