UBELIX - Overview

Description

This page provides a high-level system overview of an HPC cluster such as UBELIX. It describes the hardware components that constitute the cluster and gives a quantitative list of the different generations of compute nodes in UBELIX.

UBELIX (University of Bern Linux Cluster) is an HPC cluster that currently consists of about 330 compute nodes with almost 10'000 CPU cores and ~333'000 GPU cores, plus a software-defined storage infrastructure providing ~3 PB of net disk storage. UBELIX is a heterogeneous cluster, meaning it consists of different generations of compute nodes with different instruction sets. Compute nodes, front-end servers and the storage are interconnected through a high-speed InfiniBand network. The front-end servers also provide the link to the outside world. UBELIX is used for scientific research by various institutes and research groups in chemistry, biology, physics, astronomy, computer science, geography, medical radiology and other fields, and by students working on their theses.

High-level system overview

[System overview diagram]

The cluster can only be reached from within the UniBE network. Users land on the login nodes, where jobs can be prepared and submitted.

Login node (aka submit node)

A user connects to the cluster by logging into the submit host via SSH. You can use this host for low- to medium-demand tasks, e.g. to edit files or to compile programs. Resource-demanding/high-performance tasks must be submitted to the batch-queueing system as jobs and will eventually run on one or multiple compute nodes. Even long-running compile tasks should be submitted as jobs to a compute node instead of being run on the submit host.
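For illustration, a login might look like the following sketch; the user name and host name are placeholders, the actual submit host name is given in the login documentation.

```
# Connect to the submit host from within the UniBE network.
# <username> and <submit-host> are placeholders; use the actual host name
# from the login documentation.
ssh <username>@<submit-host>
```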

Batch-Queueing System

On UBELIX we use the open-source batch-queueing system Slurm, which manages all jobs on the compute nodes. Job submission is described in detail in the Job handling section, starting with Submitting jobs.
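Two standard Slurm client commands are useful for a first orientation on the login node:

```
sinfo             # list partitions and the state of their nodes
squeue -u $USER   # list your own pending and running jobs
```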

The procedure looks like this:

  • defining resources: you as a user define the resources your job requires, such as the number of CPU cores, time limit, memory, etc. Unspecified parameters fall back to default values. These resources can be defined in the batch script or as command-line arguments
  • submitting: you submit your job by calling sbatch, srun, or salloc (see the example script after this list). The job is then registered in Slurm, provided it is within the specifications and limits.
  • scheduling: Slurm finds the optimal slot for each registered job in terms of resources and time. This also includes priority handling and optimizing for the best overall utilization.
  • launch: Slurm prepares the environment on the selected compute resources. This also includes setting up the MPI environment, interactive sessions if requested, etc., and launching your batch script.
  • serial/parallel tasks: by default, all tasks defined in your batch script run on the first core of your allocation. Compute tasks should be started with srun; parallel tasks are then launched on all (or a defined subset of) the resources allocated to the job.
  • cancelling/completing: when the tasks have finished, or the wall-time or memory limit is reached, the job and its environment are removed from the resources. All output is written to file(s), except for interactive sessions.
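As a minimal sketch of this workflow, a batch script might look like the following; the job name, resource values and program name are illustrative placeholders and must be adapted to your job.

```
#!/bin/bash
# Minimal example batch script; all values are illustrative.
#SBATCH --job-name=example
#SBATCH --time=01:00:00        # wall clock limit (hh:mm:ss)
#SBATCH --ntasks=4             # number of tasks, e.g. MPI ranks
#SBATCH --mem-per-cpu=2G       # memory per allocated core

# Start the compute task with srun so it runs on the allocated resources;
# "./my_program" is a placeholder for your executable.
srun ./my_program
```

Such a script would be submitted with sbatch (e.g. sbatch job.sh); the job can then be monitored with squeue -u $USER and cancelled with scancel <jobid>.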

Cluster Partitions (Queues) and their Compute Nodes

Partitions group nodes into logical sets. Nodes in a partition share the same limits. You can choose a partition depending on your job's requirements (CPUs, memory, runtime). UBELIX provides several partitions, shown in the following table together with the hardware details of the compute nodes each partition comprises:

| Partition name | Max runtime (wall clock) | Max memory per node | Max cores per node | GPU | Node group | CPU generation | # Nodes | # Cores | RAM | Local scratch |
|---|---|---|---|---|---|---|---|---|---|---|
| all | 96 h | 243 GB | 20 | - | hnodes[23-42] | sandybridge | 20 | 16 | 70 GB | 250 GB |
| | | | | | jnodes | sandybridge | 21 | 16 | 243 GB | 500 GB |
| | | | | | knodes | ivybridge | 36 | 16 | 117 GB | 850 GB |
| | | | | | anodes[145-216] | broadwell | 72 | 20 | 117 GB | 850 GB |
| amd | 96 h | 1 TB | 192 | - | bnode[001-026] | AMD EPYC 2 | | | | |
| empi | 24 h | 117 GB | 20 | - | anodes[003-144] | broadwell | 142 | 20 | 117 GB | 850 GB |
| long ¹ | 360 h | 86 GB | 24 | - | hnode[43-49] | ivybridge | 7 | 24 | 86 GB | 500 GB |
| debug | 10 min | 117 GB | 20 | - | anodes[001-002] | broadwell | 2 | 20 | 117 GB | 850 GB |
| gpu ² | 24 h | 243 GB | 24 | GTX 1080Ti | gnode[01-06] | broadwell | 6 | 24 | 243 GB | 850 GB |
| | | | | Tesla P100 | gnode[07-08] | broadwell | 2 | 24 | 243 GB | 850 GB |
| | | | | RTX 2080Ti | gnode[09-10] | broadwell | 2 | 24 | 243 GB | 850 GB |
| gpu-debug | 10 min | 243 GB | 24 | GTX 1080Ti | gnode[11-12] | broadwell | 12 | 24 | 243 GB | 850 GB |

To access specific nodes you may need to specify the partition or, via the --constraint option, a specific architecture within a partition.

The all partition is the default partition if you do not specify one explicitly.
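For example, a job could be directed to the empi partition and restricted to Broadwell nodes as sketched below; the feature name broadwell is an assumption based on the CPU generation column above, and the feature labels actually configured on the nodes can be checked with scontrol show node.

```
# Submit job.sh to the empi partition, restricted to Broadwell nodes.
# The feature name "broadwell" is an assumption; verify the available
# features with "scontrol show node <nodename>".
sbatch --partition=empi --constraint=broadwell job.sh
```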

Storage Infrastructure

A modular, software-defined storage system (IBM Spectrum Scale) provides a shared, parallel file system that is mounted on all front-end servers and compute nodes. UBELIX also provides a limited amount of storage space on the Campus Storage. The different storage locations are summarized in the table below. For more information about the storage infrastructure, see here.

| Path | Connection | Availability | Backup | Default quota |
|---|---|---|---|---|
| /storage/homefs/$USER | Network | global | yes ³ | 1 TB |
| /storage/workspaces/ | Network | global | yes | 10 TB per research group, plus additional purchased storage |
| /scratch/network/ | Network | global | no | 50 TB per space |
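As a quick orientation, standard tools can be used to inspect these locations; the workspace path below is a placeholder for your research group's workspace.

```
# Show size and free space of the file system holding your home directory.
df -h /storage/homefs/$USER

# Summarize the space your data occupies in a workspace;
# "<group>/<workspace>" is a placeholder path.
du -sh /storage/workspaces/<group>/<workspace>
```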

  1. Due to the limited resources and the potentially long job runtime, access to the long partition must be requested explicitly once. 

  2. The gpu partition is closed by default. If you need GPU resources, you have to request access to this partition; write an email to hpc@id.unibe.ch to do so.

  3. Home backup and snapshots will only be activated after the migration and the Workspace introduction phase are finished.