GPUs on Grid5000
Purpose
This page presents how to use GPUs on Grid'5000 and how to install your own NVIDIA drivers and CUDA toolkit.
In this tutorial, we will first compile and run the CUDA examples; in the second part, we will install the NVIDIA drivers and CUDA 5 starting from a plain wheezy-x64-base environment.
Prerequisites
- A basic knowledge of Grid'5000 is required; we suggest you read the Getting Started tutorial first.
- Information about hardware and GPU availability can be found on Special:G5KHardware.
Download and compile examples
GPUs are available at the following sites:
- Grenoble (adonis)
- Lyon (orion)
- Lille (chirloute)
NVIDIA driver 304.54 and CUDA 5.0 are installed by default on these nodes.
You can reserve a node with a GPU using the OAR GPU property. For Grenoble and Lyon:

frontend : oarsub -I -p "GPU='YES'"

or for Lille:

lille : oarsub -I -p "GPU='SHARED'"
Warning: Lille GPUs are shared between nodes, so the oarsub command differs from Grenoble or Lyon. When using GPUs at Lille you may encounter some trouble; you can read more about Lille GPUs on Lille:GPU.
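Once connected on the reserved node, you can check that the advertised driver and runtime versions match what your programs actually see. Below is a minimal sketch using the CUDA runtime API; the file name cuda_version.cu is our own choice, not part of the default installation:

/* cuda_version.cu -- print the CUDA driver and runtime versions seen on this node.
   This is our own illustrative example, not an official CUDA sample. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driver = 0, runtime = 0;

    /* Both calls are part of the CUDA runtime API. Versions are encoded
       as 1000*major + 10*minor, e.g. 5000 means CUDA 5.0. */
    cudaDriverGetVersion(&driver);
    cudaRuntimeGetVersion(&runtime);

    printf("CUDA driver version:  %d.%d\n", driver / 1000, (driver % 100) / 10);
    printf("CUDA runtime version: %d.%d\n", runtime / 1000, (runtime % 100) / 10);
    return 0;
}

Compile and run it on the node with nvcc cuda_version.cu -o cuda_version && ./cuda_version; on the default environment it should report 5.0 for both.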
We will then download the CUDA 5.0 samples and install them.

node : wget http://git.grid5000.fr/sources/cuda-samples_5.0.35_linux.run -P /tmp/ && cd /tmp

Then run the installer:

node : sh cuda-samples_5.0.35_linux.run
You will be prompted to accept the EULA and asked for an installation path; we suggest you use /tmp/samples/.
Then you can go to the installation path, in our case /tmp/samples/. If you list the directory, you will see many folders whose names start with a number: these are the CUDA examples. The examples are described in the document named Samples.html. You might also want to have a look at the doc directory.
We will now compile the examples; this will take a little time. From the CUDA samples installation directory (/tmp/samples), run make:

node : make
The process is complete when "Finished building CUDA samples" is printed. You should then be able to run the CUDA examples. We will try the one named deviceQuery, which is located in /tmp/samples/1_Utilities/deviceQuery/. This sample enumerates the properties of the CUDA devices present in the system.

node : /tmp/samples/1_Utilities/deviceQuery/deviceQuery

This is an example of the result on the adonis cluster at Grenoble:
ebertoncello@adonis-2:/tmp/samples$ ./1_Utilities/deviceQuery/deviceQuery
1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla T10 Processor"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP:     240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           12 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla T10 Processor"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.3
  Total amount of global memory:                 4096 MBytes (4294770688 bytes)
  (30) Multiprocessors x ( 8) CUDA Cores/MP:     240 CUDA Cores
  GPU Clock rate:                                1296 MHz (1.30 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              512-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 16384
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 2, Device0 = Tesla T10 Processor, Device1 = Tesla T10 Processor
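If you want to retrieve this kind of information from your own program, all of the properties printed above are exposed by the CUDA runtime API. Below is a minimal sketch of what the deviceQuery sample does; the file name list_gpus.cu is ours, and only a few of the available fields are printed:

/* list_gpus.cu -- a trimmed-down, illustrative equivalent of the deviceQuery
   sample: enumerate CUDA devices and print a few of their properties. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;

    /* cudaGetDeviceCount fails if no driver or no CUDA-capable GPU is present. */
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA-capable device detected\n");
        return 1;
    }
    printf("Detected %d CUDA capable device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;

        /* Fill the property structure for device number 'dev'. */
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory:      %lu MBytes\n",
               (unsigned long)(prop.totalGlobalMem >> 20));
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
    }
    return 0;
}

Compile it on the node with nvcc list_gpus.cu -o list_gpus and run ./list_gpus.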
Install CUDA from a base environment
Reservation and deployment

frontend : oarsub -I -p "GPU='YES'" -l /nodes=1,walltime=2