title: Advanced Slurm usage
author: Janne Blomqvist
theme: white
Previously we have covered the basics of using Slurm via the various Slurm command line tools. In this session we'll focus on slightly more advanced usage, namely:
- Array jobs
- GPU jobs
- Job dependencies
- Multithreaded jobs (e.g. OpenMP)
- Parallel jobs with MPI
For running these "non-standard" types of jobs, you need additional options in your job submission.
Array jobs are the tool of choice when you are faced with a bunch of
data sets, and you need to run the same program on all the data
sets. In other words, you have an embarrassingly parallel
problem. For array jobs, the crucial additional parameter is
--array, where you specify the array indices for the job. For each
individual job in the array, Slurm sets the environment variable
SLURM_ARRAY_TASK_ID to the current array index.
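#!/bin/bash
# One task, 4 hours and 2500 MB per CPU for each array element
#SBATCH -n 1
#SBATCH -t 04:00:00
#SBATCH --mem-per-cpu=2500
# Run 30 array elements with indices 0..29
#SBATCH --array=0-29
# Each element reads its own input file, selected by the array index
srun ./my_application -input input_data_$SLURM_ARRAY_TASK_ID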
cd ..
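When the input files do not follow a numeric naming scheme, one option is to keep their names in a list file and pick the line that matches the array index. The following is a minimal sketch of that idea; the helper `input_for_index` and the list file `inputs.txt` are hypothetical names introduced for illustration and are not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: print the line of a list file that corresponds
# to a zero-based array index (index 0 -> line 1).
input_for_index() {
    local list_file=$1
    local index=$2
    sed -n "$((index + 1))p" "$list_file"
}

# Inside an array job script this could be used as (names are illustrative):
#   input=$(input_for_index inputs.txt "$SLURM_ARRAY_TASK_ID")
#   srun ./my_application -input "$input"
```

With `--array=0-29`, the list file would need 30 lines, one per array index.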
In order to allocate a GPU for your jobs, you need to use the Slurm
GRES (Generic Resources) system. The syntax of the --gres
option is:
--gres=gpu[[:<type>]:<count>]
where you can optionally specify the type of GPU you want, and how many of those GPUs you want.
#!/bin/bash -l
#SBATCH -p gpushort
#SBATCH --gres=gpu
module load CUDA
srun --gres=gpu path/to/my_GPU_binary
To see which kinds of GPUs are available, run
slurm features
If you want to run a GPU job which requires two Tesla K80 GPUs, you need a GRES specification that gives both the GPU type and a count of 2.
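As a sketch of such a specification: GPU type names are site-specific, so the name `teslak80` below is an assumption, and the real name should be taken from the output of `slurm features` on your cluster. The partition is the one from the example above.

```shell
#!/bin/bash -l
#SBATCH -p gpushort
# "teslak80" is an assumed type name; substitute the one your cluster reports
#SBATCH --gres=gpu:teslak80:2
```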
Job dependencies allow you to specify dependencies between Slurm jobs. E.g. if job B needs some results that are calculated and written to a file by job A, then job B can specify a dependency saying that it can start only after A has finished successfully.
Dependencies are specified with the option
--dependency=<dependency list>
where <dependency list> is a list of dependencies. E.g.
--dependency=afterok:123:124
meaning that the job can start only after jobs with IDs 123 and 124 have both completed successfully.
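If the job IDs are held in shell variables, the dependency list can be assembled by joining them with colons. A small sketch of this; `afterok_list` is a hypothetical helper name, not a Slurm command.

```shell
#!/bin/bash
# Join the given job IDs into an afterok dependency list.
afterok_list() {
    local IFS=:
    # "$*" joins the arguments with the first character of IFS
    printf 'afterok:%s\n' "$*"
}
```

For example, `sbatch --dependency="$(afterok_list 123 124)" jobC.sh` would pass `--dependency=afterok:123:124` to sbatch (`jobC.sh` being an illustrative script name).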
So you have a workflow where you want, say, job B to start only after job A finishes successfully.
However, you don't know the job ID of job A before you have submitted it!
- So how to automate it?
Slurm, by itself, offers no good solution to this.
Create a shell script for submitting your dependent jobs
idA=$(sbatch jobA.sh | perl -lane 'print $F[3]')
sbatch --dependency=afterok:${idA} jobB.sh
(Error handling above omitted for brevity. For real work you'll want to handle errors when submitting jobA.)
Note: This also works for job arrays!
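A version with basic error handling could look like the sketch below. It assumes your Slurm version supports `sbatch --parsable`, which prints only the job ID (followed by `;cluster` on multi-cluster setups) instead of the "Submitted batch job" message. The function name `submit_chain` is hypothetical.

```shell
#!/bin/bash
# Submit each script so that it starts only after the previous one
# completed successfully. Prints the job IDs, one per line, and stops
# with a non-zero status at the first failed submission.
submit_chain() {
    local prev=""
    local id script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            id=$(sbatch --parsable "$script") || return 1
        else
            id=$(sbatch --parsable --dependency="afterok:${prev}" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep the job ID only
        prev=${id%%;*}
        echo "$prev"
    done
}
```

Calling `submit_chain jobA.sh jobB.sh` submits jobA.sh first and then jobB.sh with an afterok dependency on it; further scripts given as arguments extend the chain.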
Slurm makes a distinction between multiple processes (tasks) and the number of threads for each task.
- If you ask for multiple tasks you might get allocated CPUs on multiple nodes, which won't work if you want to run a single process with multiple threads.
To specify the number of threads per task, use the -c (--cpus-per-task) option:
#SBATCH -c 4
If you're using OpenMP, always set
export OMP_PROC_BIND=true
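Putting the pieces together, a job script for a single multithreaded process might start like the sketch below. Setting `OMP_NUM_THREADS` from `SLURM_CPUS_PER_TASK` is an addition not covered above; Slurm sets that variable only when `-c` is given, hence the fallback to 1. The binary name in the final comment is hypothetical.

```shell
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 4

# Use as many OpenMP threads as CPUs were allocated to the task;
# fall back to 1 if the variable is unset (e.g. outside a Slurm job).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export OMP_PROC_BIND=true

# Then launch the program, e.g.:
#   srun ./my_openmp_application
```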
To specify the number of tasks, use the -n (--ntasks) option:
#SBATCH -n 48
Note: Slurm runs the job script only on the first allocated node; it's up to you to make use of all the other task slots allocated!
In most cases, you're using MPI, and the MPI runtime's Slurm integration takes care of setting up all the MPI ranks.
Yes, you can create an array job using multiple tasks, multiple threads per task, dependencies on other jobs etc.
Most likely, you won't need to do all of this at once!
- Create a chain of jobs A -> B -> C, each depending on the successful
completion of the previous job. In each job, run e.g.
sleep 120
(sleep for 2 minutes) to give you time to investigate the queue. What happens if, at the end of the job A script, you put exit 1?
- Run a GPU job where you run the deviceQuery sample application. Hint: To compile deviceQuery, you need to copy the samples directory and run make:
cp -a $CUDA_HOME/samples .
cd samples/1_Utilities/deviceQuery
make
- Create a parallel job script with -n 10. Run hostname, then srun hostname. What happens?