This page contains all the information you need to submit GPU jobs successfully on Ubelix.
Important Information on GPU Usage
Code that runs on the CPU will not magically make use of GPUs simply because you submit the job to the ‘gpu’ partition! You have to explicitly adapt your code to run on the GPU. Also, code that runs on a GPU will not necessarily run faster than it does on the CPU. For example, GPUs are poorly suited to tasks that cannot be highly parallelized. In other words, you must understand the characteristics of your job and submit to the ‘gpu’ partition only those jobs that can actually benefit from GPUs.
Ubelix currently features four types of GPUs. You have to choose an architecture and use the following --gres option to select it.
|Type|SLURM gres option|
|Nvidia Geforce GTX 1080 Ti||
|Nvidia Geforce RTX 2080 Ti|--gres=gpu:rtx2080ti:<number_of_gpus>|
|Nvidia Geforce RTX 3090||
|Nvidia Tesla P100||
Use the following options to submit a job to the gpu partition using the default job QoS:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:<type>:<number_of_gpus>
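To show how these two options fit into a complete job script, here is a minimal sketch. The job name, the time limit, and the helper function `count_visible_gpus` are illustrative assumptions, and the `rtx2080ti` type string is borrowed from the preemption example on this page. The sketch also assumes that Slurm exports `CUDA_VISIBLE_DEVICES` for the allocated GPUs, which is the usual behavior but depends on the cluster configuration.

```shell
#!/bin/bash
# Hypothetical job name and time limit; adjust to your needs.
#SBATCH --job-name=gpu-check
#SBATCH --time=00:10:00
#SBATCH --partition=gpu
# One GPU of the type used in the example on this page.
#SBATCH --gres=gpu:rtx2080ti:1

# Count the GPUs exposed to this job through CUDA_VISIBLE_DEVICES
# (assumed to be a comma-separated list set by Slurm).
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Split the list on commas and count the entries.
    local IFS=','
    set -- $devices
    echo $#
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

If the reported count is 0 inside a job on the gpu partition, the --gres request was probably missing or malformed.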
For investors, we provide dedicated investor partitions, each with a specific QoS that defines the purchased resources. Investors need instant access to the GPU resources they purchased. Nevertheless, to use all resources efficiently, the QoS job_gpu_preempt exists in the gpu partition. Jobs submitted with this QoS may be interrupted if the resources are required by investors. Short jobs and jobs with checkpointing benefit from these additional resources.
For example, to request four RTX 2080 Ti GPUs:
#SBATCH --partition=gpu
#SBATCH --qos=job_gpu_preempt
#SBATCH --gres=gpu:rtx2080ti:4
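Because preemptible jobs may be interrupted at any time, it helps if a restarted job can continue where it stopped. The following is a minimal sketch of step-based checkpointing, not a recommended implementation: the checkpoint file name, the helper `start_step`, and the five-step loop are hypothetical placeholders for your real workload.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --qos=job_gpu_preempt
#SBATCH --gres=gpu:rtx2080ti:1

# Hypothetical checkpoint file that stores the last completed step.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
# Hypothetical total number of work steps.
TOTAL_STEPS=5

# Print the step to start from: one after the last completed step,
# or 1 if no checkpoint exists yet.
start_step() {
    if [ -f "$CHECKPOINT_FILE" ]; then
        echo $(( $(cat "$CHECKPOINT_FILE") + 1 ))
    else
        echo 1
    fi
}

for step in $(seq "$(start_step)" "$TOTAL_STEPS"); do
    # Placeholder for the real GPU work of this step.
    echo "running step $step"
    # Record the step only after it has finished.
    echo "$step" > "$CHECKPOINT_FILE"
done
```

If the job is interrupted and later started again, the loop skips the steps already recorded in the checkpoint file.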
Use the following option to ensure that the job, if preempted, won’t be requeued but canceled instead:
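Slurm's sbatch provides the --no-requeue option, which prevents a job from being requeued. The sketch below assumes this is the option meant here and combines it with the preemption example from this page.

```shell
#SBATCH --partition=gpu
#SBATCH --qos=job_gpu_preempt
#SBATCH --gres=gpu:rtx2080ti:4
# Assumed option: never requeue this job, so a preempted job is canceled.
#SBATCH --no-requeue
```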
CUDA versions are now managed through modules. Run module avail to see which versions are available:
module avail CUDA
---- /software.el7/modulefiles/all ----
CUDA/8.0.61                       cuDNN/7.1.4-CUDA-9.2.88
CUDA/9.0.176                      cuDNN/184.108.40.206-gcccuda-2019a (D)
CUDA/9.1.85                       fosscuda/2019a
CUDA/9.2.88                       fosscuda/2019b (D)
CUDA/10.1.105-GCC-8.2.0-2.31.1    gcccuda/2019a
CUDA/10.1.243 (D)                 gcccuda/2019b (D)
cuDNN/6.0-CUDA-8.0.61             OpenMPI/3.1.3-gcccuda-2019a
cuDNN/7.0.5-CUDA-9.0.176          OpenMPI/3.1.4-gcccuda-2019b
cuDNN/7.0.5-CUDA-9.1.85
Run module load <module> to load a specific version, for example:
module load cuDNN/7.1.4-CUDA-9.2.88
If you need cuDNN you must load the cuDNN module. The appropriate CUDA version is then loaded as a dependency.
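Inside a job script, loading the module and checking the result could look like the sketch below. It assumes that the CUDA version is encoded after "-CUDA-" in the cuDNN module name, as in the listing above. The helper `cuda_version_from_module` is hypothetical, and the `module` and `nvcc` calls are guarded so the script also runs where they are unavailable.

```shell
#!/bin/bash
MODULE_NAME="cuDNN/7.1.4-CUDA-9.2.88"

# Extract the CUDA version that follows "-CUDA-" in the module name
# (assumes the naming scheme shown in the module listing).
cuda_version_from_module() {
    echo "${1##*-CUDA-}"
}

echo "Expecting CUDA $(cuda_version_from_module "$MODULE_NAME")"

# Load cuDNN; the matching CUDA module is pulled in as a dependency.
if command -v module >/dev/null 2>&1; then
    module load "$MODULE_NAME"
    module list
fi

# Print the CUDA compiler version if nvcc is on the PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
fi
```

Comparing the expected version with the output of nvcc --version is a quick way to confirm that the dependency was loaded.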
CUDA C/C++ Basics: http://www.nvidia.com/docs/IO/116711/sc11-cuda-c-basics.pdf
Nvidia Geforce GTX 1080 Ti: https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080-ti
Nvidia Geforce RTX 2080 Ti: https://www.nvidia.com/de-de/geforce/graphics-cards/rtx-2080-ti/
Nvidia Geforce RTX 3090: https://www.nvidia.com/de-de/geforce/graphics-cards/30-series/rtx-3090/
Nvidia Tesla P100: http://www.nvidia.com/object/tesla-p100.html