Metropolis uses the SLURM (Simple Linux Utility for Resource Management) batch job system. SLURM handles both serial and parallel jobs and is responsible for allocating the required resources (CPU and memory) to each job. Nodes at Metropolis are organized into virtual node groups called partitions. Each partition contains nodes with similar resources, and jobs running in a given partition share the same limits on CPU count, maximum allowed memory allocation, total CPU time, and so on. The purpose and characteristics of each partition are explained in the following table*:
Partition | Nodes in partition | Max. nodes per job | Max. threads per node | Max. memory per thread | Max. running time | Notes
---|---|---|---|---|---|---
test | 15 | 1 | 10 | 1 GB | 30 min | Low priority; for testing purposes only
master | 52 | 52 | 10 | 4 GB | 2 days | Normal priority; small to medium jobs that need at most 2 days to complete
large | 52 | 20 | 10 | 8 GB | 4 days | Normal priority; medium to large jobs
long | 52 | 1 | 5 | 4 GB | unlimited | Serial jobs only; per-user restrictions apply, including the total number of jobs per user
vlong | 40 | 10 | 20 | 4 GB | 30 days | Normal priority; large jobs
serial | 10 | 1 | special | special | special | Normal priority; serial jobs *only*
parallel | 40 | 20 | 40 | special | special | MPI jobs *only*

special: restrictions apply, based on the relevant project and user level.
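As an illustration, a minimal job script for a single-node, multi-threaded (OpenMP) job on the master partition might look like the sketch below. The partition name and resource limits follow the table above; the job name, executable name (`my_program`), and thread count are placeholders to adapt to your own job.

```bash
#!/bin/bash
#SBATCH --job-name=example_job      # placeholder job name
#SBATCH --partition=master          # one of the partitions from the table above
#SBATCH --nodes=1                   # single node
#SBATCH --ntasks=1                  # one task (process)
#SBATCH --cpus-per-task=10          # up to "Max. threads per node" for the partition
#SBATCH --mem-per-cpu=4G            # stay within "Max. memory per thread"
#SBATCH --time=2-00:00:00           # D-HH:MM:SS, within "Max. running time"

# Match the number of OpenMP threads to the allocated CPUs
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# ./my_program is a placeholder for your own executable
./my_program
```

Assuming the script is saved as `job.sh` (a placeholder name), it would be submitted with `sbatch job.sh`, monitored with `squeue -u $USER`, and cancelled with `scancel <jobid>`; `sinfo` lists the partitions and their current state.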
All jobs must be submitted to and handled by the SLURM scheduler; jobs started outside the scheduler are discarded. Please bear in mind that this is a multi-user system: resources are shared between users and groups, and there is no such thing as "unlimited" resources. With that in mind, please read the following "code of ethics" (courtesy of Dr. Jacek Herbrych):
- Think twice before starting very heavy calculations. Are the parameters correct? Is the calculation necessary at all?
- Don't interfere with other users' work. Badly behaved programs will be terminated.
- If you generate huge amounts of data, exclude them from backup. Try to keep your daily backup below 1 GB.
- Try not to allocate more than 4 GB of RAM per thread. Currently only CPU usage counts in the fair-share system; however, this may change in the future, since the number of cores per CPU grows faster than RAM sizes on typical computers.
- Run parallel code whenever possible. OpenMP and MPI stacks are provided (see the MPI job script sketched after this list).
- Try to keep your jobs short, ideally a few days at most. Parallelize your code to achieve that, or use checkpointing (your own, or a library such as Berkeley Lab Checkpoint/Restart). The longer your job runs, the less likely it is to finish properly. Furthermore, there are other causes of job failure, such as hard ECC errors, malfunctioning programs of other users, etc.
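For instance, an MPI job on the parallel partition could be requested with a script along these lines. The executable name (`my_mpi_program`), task counts, and running time are placeholders, and the MPI module name is an assumption to verify with `module avail` on Metropolis.

```bash
#!/bin/bash
#SBATCH --job-name=mpi_example      # placeholder job name
#SBATCH --partition=parallel        # MPI jobs only (see the table above)
#SBATCH --nodes=2                   # within "Max. nodes per job" for the partition
#SBATCH --ntasks-per-node=40        # up to "Max. threads per node"
#SBATCH --time=1-00:00:00           # placeholder; the parallel partition limits are "special"

# Load an MPI stack; the module name is an assumption, check `module avail`
module load openmpi

# srun launches one MPI rank per allocated task
srun ./my_mpi_program
```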
Resource allocation and job script examples
More complex batch script examples
*Under review