Parallel BZIP2


Data often needs to be packed and compressed for archiving or transfer. Several tools are available for this, such as tar, gzip, and bzip2. pbzip2 is a parallel implementation of bzip2. For general information, see bzip2 and pbzip2.


The tool is available on all nodes without loading any module.


Parallel packing and compressing can be performed on a compute node by piping the output of tar into pbzip2, which compresses the stream with a specified number of threads.

As an example, a file or directory /data/to/pack can be packed and compressed into a file /packed/file.tar.bz2 using the job script:

#!/bin/bash
#SBATCH --job-name="pbzip2"
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
## For parallel jobs with 8 cores
#SBATCH --cpus-per-task=8           ## select the number of cores required

source="/data/to/pack"              ## specify your data to compress
target="/packed/file.tar.bz2"       ## specify directory and filename

# archive the source into a tar stream (written to stdout via "-f -")
# and compress it with pbzip2, using one thread per allocated core
srun tar -cSf - "$source" | pbzip2 -p"$SLURM_CPUS_PER_TASK" > "$target"

# Generate a SHA-256 fingerprint to check the integrity later
sha256sum "$target" > "${target}.sha256sum"
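
The fingerprint written by the job script can later be compared against the archive, for example after a transfer. The following is a minimal sketch of such a check, not part of the original job script: it uses a small temporary stand-in file (a hypothetical placeholder for /packed/file.tar.bz2) so that it can be run anywhere, and the variable names workdir and check_result are chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for /packed/file.tar.bz2, created in a temporary directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT          # remove the demo files when the script ends
target="$workdir/file.tar.bz2"
printf 'example payload\n' > "$target"

# Record the fingerprint, in the same way as the job script does
sha256sum "$target" > "${target}.sha256sum"

# Later: recompute the hash and compare it with the recorded one.
# sha256sum -c returns exit status 0 only if every listed file matches.
if sha256sum -c "${target}.sha256sum"; then
    check_result="intact"
else
    check_result="corrupted"
fi
echo "archive is $check_result"
```

Note that sha256sum stores the file path exactly as it was passed on the command line, so if the archive is moved to another location, the path inside the .sha256sum file has to be adjusted before running the check. After a successful check, the archive could be unpacked again with a pipeline such as pbzip2 -dc "$target" | tar -xf -.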