GPUs on Grid5000
Purpose
This page presents how to use GPUs on Grid'5000 and how to install your own NVIDIA drivers and CUDA toolkit.
In this tutorial, we will first compile and run the CUDA examples; in the second part, we will install the NVIDIA drivers and CUDA 5 starting from a plain wheezy-x64-base environment.
Prerequisites
- A basic knowledge of Grid'5000 is required; we suggest you read the Getting Started tutorial first.
- Information about hardware and GPU availability can be found on Special:G5KHardware.
Download and compile examples
GPUs are available at the following sites:
- Grenoble (adonis)
- Lyon (orion)
- Lille (chirloute)
NVIDIA driver 304.54 and CUDA 5.0 are installed by default on these nodes.
You can reserve a node with a GPU using the OAR GPU property. For Grenoble and Lyon:

frontend : oarsub -I -p "GPU='YES'"

or for Lille:

lille : oarsub -I -p "GPU='SHARED'"
Warning: Lille GPUs are shared between nodes, so the oarsub command differs from Grenoble or Lyon. When using GPUs at Lille you may encounter some trouble; you can read more about Lille GPUs on Lille:GPU.
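Once connected on the reserved node, you can check that the advertised driver and runtime versions match what your programs actually see. Below is a minimal sketch using the CUDA runtime API; the file name cuda_version.cu is our own choice, not part of the default installation:

/* cuda_version.cu -- print the CUDA driver and runtime versions seen on this node.
   This is our own illustrative example, not an official CUDA sample. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driver = 0, runtime = 0;

    /* Both calls are part of the CUDA runtime API. Versions are encoded
       as 1000*major + 10*minor, e.g. 5000 means CUDA 5.0. */
    cudaDriverGetVersion(&driver);
    cudaRuntimeGetVersion(&runtime);

    printf("CUDA driver version:  %d.%d\n", driver / 1000, (driver % 100) / 10);
    printf("CUDA runtime version: %d.%d\n", runtime / 1000, (runtime % 100) / 10);
    return 0;
}

Compile and run it on the node with nvcc cuda_version.cu -o cuda_version && ./cuda_version; on the default environment it should report 5.0 for both.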
We will then download the CUDA 5.0 samples and install them.

node : wget http://git.grid5000.fr/sources/cuda-samples_5.0.35_linux.run -P /tmp/ && cd /tmp

Then run the installer:

node : sh cuda-samples_5.0.35_linux.run
You will be prompted to accept the EULA and asked for an installation path; we suggest you use /tmp/samples/.
Then you can go to the installation path, in our case /tmp/samples/. If you list the directory, you will see many folders whose names start with a number: these are the CUDA examples. The examples are described in the document named Samples.html. You might also want to have a look at the doc directory.
We will now compile the examples; this will take a little time. From the CUDA samples installation directory (/tmp/samples), run make:

node : make
The process is complete when "Finished building CUDA samples" is printed. You should then be able to run the CUDA examples. We will try the one named deviceQuery, which is located in /tmp/samples/1_Utilities/deviceQuery/. This sample enumerates the properties of the CUDA devices present in the system.

node : /tmp/samples/1_Utilities/deviceQuery/deviceQuery

This is an example of the result on the adonis cluster at Grenoble:
ebertoncello@adonis-2:/tmp/samples$ ./1_Utilities/deviceQuery/deviceQuery
1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla T10 Processor"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP:     240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           12 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla T10 Processor"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP:     240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 2, Device0 = Tesla T10 Processor, Device1 = Tesla T10 Processor
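If you want to retrieve this kind of information from your own program, all of the properties printed above are exposed by the CUDA runtime API. Below is a minimal sketch of what the deviceQuery sample does; the file name list_gpus.cu is ours, and only a few of the available fields are printed:

/* list_gpus.cu -- a trimmed-down, illustrative equivalent of the deviceQuery
   sample: enumerate CUDA devices and print a few of their properties. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;

    /* cudaGetDeviceCount fails if no driver or no CUDA-capable GPU is present. */
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA-capable device detected\n");
        return 1;
    }
    printf("Detected %d CUDA capable device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;

        /* Fill the property structure for device number 'dev'. */
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:      %lu MBytes\n",
               (unsigned long)(prop.totalGlobalMem >> 20));
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
    }
    return 0;
}

Compile it on the node with nvcc list_gpus.cu -o list_gpus and run ./list_gpus.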
Install CUDA from a base environment
Reservation and deployment

frontend : oarsub -I -p "GPU='YES'" -l /nodes=1,walltime=2