Using environment variables
When a SLURM batch script is submitted, the job scheduler exports a number of environment variables to the job's environment. Some of the most important variables are:
Variable | Meaning |
---|---|
SLURM_SUBMIT_DIR | Directory from which the script was submitted |
SLURM_JOB_USER | The username of the user that submitted the job |
SLURM_EXPORT_ENV | Which environment variables are propagated to the job's environment |
SLURM_NNODES | The number of nodes allocated for the job |
SLURM_JOBID | A numeric value that uniquely identifies the job |
SLURM_NODELIST | The names of the nodes on which the job will run |
SLURM_SUBMIT_HOST | The hostname of the node from which the job was submitted |
SLURM_NTASKS_PER_NODE | Number of tasks requested per node |
SLURM_NTASKS_PER_SOCKET | Number of tasks requested per socket (physical processor) |
All of these variables may be used inside a batch script, for example to control processing or to keep track of a job's progress. For example:
#!/bin/bash
#SBATCH -J my_parallel_job
#SBATCH --cpus-per-task=2
#SBATCH --ntasks=16
#SBATCH --partition=parallel
srun my_parallel_program
echo "Job with ID: $SLURM_JOBID used $SLURM_NNODES number of nodes and $SLURM_NTASKS_PER_SOCKET task per socket"
Using Python as the batch script interpreter
It is possible to use Python as the batch script interpreter, which means that you can write a batch script entirely in Python. For example:
#!/usr/bin/env python
#SBATCH --time=00:30:00
#SBATCH --ntasks=100
#SBATCH --partition=parallel
from mpi4py import MPI
from pk import work          # "pk" here is an example module providing a work() function
comm = MPI.COMM_WORLD        # communicator spanning all MPI tasks
rank = comm.Get_rank()       # rank of this task
size = comm.Get_size()       # total number of tasks
work(rank, size)
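Assuming the script above is saved as, say, pyjob.py (the filename is only a placeholder), it is submitted exactly like any other batch script:
sbatch pyjob.py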
Disabling Hyper-Threading
Sometimes the use of Hyper-Threading Technology (HT) can cause serious performance issues. Since HT is enabled by default on all nodes of the Metropolis cluster, if you need to disable it you must instruct the scheduler to do so. This can be achieved with the --extra-node-info switch:
#SBATCH --extra-node-info=2:10:1
In the above example, we instruct the job scheduler that each node has 2 processors and 10 cores per processor, but that only 1 thread per core should be used. In this way HT is virtually disabled and our tasks will use just one thread per core.
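For reference, a minimal batch script using this switch could look like the following (the job name, task count and program name are only placeholders):
#!/bin/bash
#SBATCH -J no_ht_job
#SBATCH --ntasks=20
#SBATCH --partition=parallel
#SBATCH --extra-node-info=2:10:1   # 2 sockets x 10 cores, 1 thread per core
srun my_parallel_program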
Hybrid jobs: MPI and OpenMP
It is possible to have a job that uses both MPI and OpenMP. To take advantage of OpenMP, we need to specify the number of OpenMP threads via the OMP_NUM_THREADS environment variable. Additionally, to avoid running all the threads on a single core, we should disable core affinity. A batch script that handles such a job could look like this:
#!/bin/bash
#SBATCH -J my_hybrid_job
export OMP_NUM_THREADS=16        # number of OpenMP threads per MPI task
export MV2_ENABLE_AFFINITY=0     # disable core affinity in the MPI library
module load mpi/ofed/mpich2
srun -n 64 my_hybrid_executable  # launch 64 MPI tasks
In the above example, we set the number of OpenMP threads to 16 and we instruct MVAPICH2 to disable affinity so that the threads are not pinned to a single core. Please note that this approach may result in decreased performance due to thread migration by the operating system.
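If you also want SLURM to reserve the cores that the OpenMP threads will run on, the thread count can be expressed in the resource request as well. A minimal sketch, assuming 64 MPI tasks with 16 threads each (the program name and counts are only placeholders; adjust them to your own job):
#!/bin/bash
#SBATCH -J my_hybrid_job
#SBATCH --ntasks=64              # number of MPI tasks
#SBATCH --cpus-per-task=16       # CPUs (OpenMP threads) per MPI task
#SBATCH --partition=parallel
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match the thread count to the allocation
export MV2_ENABLE_AFFINITY=0
module load mpi/ofed/mpich2
srun my_hybrid_executable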
Allocating tasks to nodes, cores and threads
Requeueing jobs
There is always the possibility that a node will fail while executing your job (e.g. a hardware failure). In this case, you may or may not want your job to be automatically requeued. You can control this behavior with the --requeue or the --no-requeue switch (both are available on the command line and as batch-script directives).
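For example, to prevent a job from ever being requeued you can submit it as follows (the script name is only a placeholder):
sbatch --no-requeue myjob.sh
The equivalent directive form inside the batch script would be:
#SBATCH --no-requeue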
Job Arrays
SLURM offers a mechanism for submitting and managing collections of similar jobs, provided that all jobs share the same initial options (these can later be altered with the scontrol update command). To explain the mechanism used for job arrays, we will use a simple example. Let's assume that we need to run a program named "vectormap" which maps 10 datasets to 10 individual vectors. We could create several batch files, one for each run, or even a single batch file that calls the vectormap program several times. With job arrays, we have the flexibility to create a single batch script that calls the vectormap program only once, and SLURM will split the work into 10 individual jobs.
In order to achieve that, first we need to create a batch script (let's call it vectormap.sh) like this:
#!/bin/bash
#SBATCH -J vectormap
#SBATCH -o vectormap%A%a.out # Standard output (%A = master job ID, %a = array task index)
#SBATCH -e vectormap%A%a.err # Standard error
vectormap dataset"${SLURM_ARRAY_TASK_ID}".inp
and then we can submit the script using the sbatch command with the --array switch:
sbatch --array=1-10 vectormap.sh
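As mentioned above, options shared by the array's jobs can still be changed after submission with scontrol update. A brief example, assuming the array was assigned job ID 1234 (the ID and the new time limit are only placeholders):
scontrol update JobId=1234 TimeLimit=02:00:00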
For a more detailed description of job arrays, please visit the SLURM Job Arrays page.