Parallel BZIP2
Data often needs to be packed and compressed for archiving or transfer. Several tools are available for this, such as tar, gzip, and bzip2. pbzip2 is a parallel implementation of bzip2. For general information, see bzip2 and pbzip2.
The tool is available on all nodes without loading any module.
Parallel packing and compressing can be performed on a compute node by piping the output of tar through pbzip2
with a specified number of threads.
As an example, a file or directory /data/to/pack
can be packed and compressed into a file /packed/file.tar.bz2
using the job script:
#!/bin/bash
#SBATCH --job-name="pbzip2"
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=2G
## For parallel jobs with 8 cores
#SBATCH --cpus-per-task=8 ## select the number of cores required
source="/data/to/pack" ## specify your data to compress
target="/packed/file.tar.bz2" ## specify directory and filename
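# archive the source with tar (-S stores sparse files efficiently, -f - writes to stdout)
# and compress the stream with pbzip2, using one thread per allocated core
srun tar -cSf - "$source" | pbzip2 -p"$SLURM_CPUS_PER_TASK" > "$target"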
# Generate a SHA-256 fingerprint to check the integrity later
sha256sum "$target" > "${target}.sha256sum"
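
The fingerprint written in the last step can be checked later, for example after a transfer. The following is a minimal sketch of that check, not part of the job script above. It assumes GNU coreutils `sha256sum`, and it uses a small temporary file as a hypothetical stand-in for /packed/file.tar.bz2 so that it runs anywhere. The variable names `workdir` and `verify_status` are illustrative only.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for /packed/file.tar.bz2 (assumption, for illustration only)
workdir=$(mktemp -d)
target="$workdir/file.tar.bz2"
printf 'example archive content\n' > "$target"

# Same step as in the job script: store the fingerprint next to the archive
sha256sum "$target" > "${target}.sha256sum"

# Later: recompute the fingerprint and compare it with the stored one
if sha256sum -c "${target}.sha256sum" > /dev/null; then
    verify_status="OK"
else
    verify_status="FAILED"
fi
echo "integrity check: $verify_status"
```

The checksum file records the path exactly as it was passed to `sha256sum`, so if the archive is moved, the check must be run where that path still resolves, or the path in the checksum file adjusted. Once verified, the archive can typically be unpacked with `pbzip2 -dc file.tar.bz2 | tar -x`.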