DGX-2 User Guide

Use Slurm to Submit Jobs

sbatch Sample

Preparing a job script and then submitting it with sbatch is the most common way to use SLURM. To feed a job script to the job scheduling system, SLURM uses the sbatch command. The following is a sample job script:

#!/bin/bash

#SBATCH --job-name=dgx-test
#SBATCH --partition=dgx2
#SBATCH --gres=gpu:1
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH -n 1

ulimit -l unlimited
ulimit -s unlimited

module load cuda/10.0.130-gcc-5.4.0 gcc/5.4.0-gcc-4.8.5

./cudaTensorCoreGemm

The parameter #SBATCH --gres=gpu:1 requests one GPU for this job. You can request up to 16 GPUs on a single dgx2 node with #SBATCH --gres=gpu:16.
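
Since the GPU count is the only line that differs between a 1-GPU and a 16-GPU job, the sample script can be generated from a small shell function. The following is a minimal sketch of that idea, not an official tool: the function name make_dgx_job is hypothetical, and the 1 to 16 range check simply mirrors the per-node limit described in this guide.

```shell
# Hypothetical helper (not part of SLURM): print the sample job script
# with a configurable GPU count.
make_dgx_job() {
  n="$1"
  # Reject empty or non-numeric input.
  case "$n" in
    ''|*[!0-9]*)
      echo "usage: make_dgx_job <gpus, 1-16>" >&2
      return 1
      ;;
  esac
  # Enforce the per-node limit of 16 GPUs described in this guide.
  if [ "$n" -lt 1 ] || [ "$n" -gt 16 ]; then
    echo "GPU count must be between 1 and 16" >&2
    return 1
  fi
  # Emit the same script as the sample, with only the --gres line changed.
  cat <<EOF
#!/bin/bash

#SBATCH --job-name=dgx-test
#SBATCH --partition=dgx2
#SBATCH --gres=gpu:$n
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH -n 1

ulimit -l unlimited
ulimit -s unlimited

module load cuda/10.0.130-gcc-5.4.0 gcc/5.4.0-gcc-4.8.5

./cudaTensorCoreGemm
EOF
}

# Print a 4-GPU variant of the sample script.
make_dgx_job 4
```

For example, redirecting the output to a file (make_dgx_job 4 > job.slurm, where job.slurm is an arbitrary name) and then running sbatch job.slurm would submit a 4-GPU job.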

srun Sample

srun can launch interactive jobs. The command blocks until the job completes or is terminated.

$ srun -u -p dgx2 -w vol01 --exclusive ./cudaTensorCoreGemm
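
To make the flags in the srun command easier to read, the sketch below rebuilds the same command line from a node name and a program, with one comment per flag. It only prints the command and does not launch anything. The function name dgx_srun_cmd is hypothetical; the flag descriptions follow the standard srun options.

```shell
# Hypothetical helper: print (not run) the srun command line from this guide
# for a given node and program.
dgx_srun_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: dgx_srun_cmd <node> <program> [args...]" >&2
    return 1
  fi
  node="$1"
  shift
  # -u           unbuffered: show the program's output as it is produced
  # -p dgx2      run in the dgx2 partition
  # -w <node>    run on this specific node
  # --exclusive  do not share the allocated node with other jobs
  echo "srun -u -p dgx2 -w $node --exclusive $*"
}

# prints: srun -u -p dgx2 -w vol01 --exclusive ./cudaTensorCoreGemm
dgx_srun_cmd vol01 ./cudaTensorCoreGemm
```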

Use Singularity

Converting Images

This operation requires root privileges, so we suggest converting the image in your own private Linux environment.

$ singularity pull library://sylabsed/examples/lolcow

When the conversion is done, upload the image to PI HPC.

SLURM Sample

#!/bin/bash

#SBATCH --job-name=dgx-test
#SBATCH --partition=dgx2
#SBATCH --output=%j.out
#SBATCH --error=%j.err
#SBATCH -n 1
#SBATCH --exclusive

ulimit -l unlimited
ulimit -s unlimited

singularity run /path/to/lolcow_latest.sif