Metropolis uses the SLURM (Simple Linux Utility for Resource Management) batch job system. SLURM handles both serial and parallel jobs and is responsible for allocating the resources (CPUs and memory) that each job requires. Nodes at Metropolis are organized into virtual node groups called partitions. Each partition contains nodes with similar resources, and jobs running in the same partition share the same limits on CPUs, maximum memory allocation, total CPU time and so on. The purpose and limits of each partition are listed in the following table*:

Partition  Nodes  Max. nodes/job  Max. threads/node  Max. memory/thread  Max. run time  Notes
test       15     1               10                 1 GB                30 min         Low priority, for testing purposes only
master     52     52              10                 4 GB                2 days         Normal priority, small to medium jobs that need at most 2 days to complete
large      52     20              10                 8 GB                4 days         Normal priority, medium to large jobs
long       52     1               5                  4 GB                unlimited      Serial jobs only; restrictions apply per user and on the total number of jobs per user
vlong      40     10              20                 4 GB                30 days        Normal priority, large jobs
serial     10     1               special            special             special        Normal priority, serial jobs *only*
parallel   40     20              40                 special             special        MPI jobs *only*
special                                                                                 Restrictions apply, based on the relevant project and user level
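As a rough illustration of how the limits in the table combine, the following Python sketch lists the partitions whose numeric limits fit a given request and renders a matching sbatch header. It rests on several assumptions: the limits are transcribed from the table above (itself under review), the rows whose limits are "special" are left out, partition names are assumed to be usable directly as `--partition` values, and each thread is mapped to one SLURM CPU (`--cpus-per-task`) with one task per node. Priorities and usage notes (e.g. test being for testing only, long being for serial jobs) are not checked. The helper names `candidate_partitions` and `sbatch_header` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    name: str
    max_nodes: int                 # maximum number of nodes per job
    max_threads_per_node: int
    max_mem_per_thread_gb: float
    max_minutes: Optional[int]     # None stands for "unlimited"


# Numeric limits transcribed from the partition table (marked as under review).
# The serial, parallel and special partitions are omitted because their limits
# are listed as "special".
PARTITIONS = [
    Partition("test", 1, 10, 1, 30),
    Partition("master", 52, 10, 4, 2 * 24 * 60),
    Partition("large", 20, 10, 8, 4 * 24 * 60),
    Partition("long", 1, 5, 4, None),
    Partition("vlong", 10, 20, 4, 30 * 24 * 60),
]


def candidate_partitions(nodes: int, threads_per_node: int,
                         mem_per_thread_gb: float, minutes: int) -> List[str]:
    """Return the names of partitions whose numeric limits fit the request.

    Priorities and usage notes (e.g. "for testing purposes only") are not checked.
    """
    return [
        p.name
        for p in PARTITIONS
        if nodes <= p.max_nodes
        and threads_per_node <= p.max_threads_per_node
        and mem_per_thread_gb <= p.max_mem_per_thread_gb
        and (p.max_minutes is None or minutes <= p.max_minutes)
    ]


def sbatch_header(partition: str, nodes: int, threads_per_node: int,
                  mem_per_thread_gb: float, minutes: int) -> str:
    """Render #SBATCH directives for the request (hypothetical helper).

    Assumes one task per node with one SLURM CPU per thread.
    """
    hours, mins = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --ntasks-per-node=1",
        f"#SBATCH --cpus-per-task={threads_per_node}",
        f"#SBATCH --mem-per-cpu={int(mem_per_thread_gb * 1024)}M",
        f"#SBATCH --time={days}-{hours:02d}:{mins:02d}:00",  # days-hh:mm:ss
    ])


if __name__ == "__main__":
    # 4 nodes, 8 threads per node, 2 GB per thread, 36 hours of run time
    request = (4, 8, 2, 36 * 60)
    names = candidate_partitions(*request)
    print(names)  # ['master', 'large', 'vlong']
    print(sbatch_header(names[0], *request))
```

Taking the first match is an arbitrary choice made for the sketch; which of the fitting partitions suits a job best depends on its priority and on the notes in the table.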

All jobs must be submitted and handled through the SLURM scheduler; otherwise they are discarded. Please bear in mind that this is a multi-user system: resources are shared between users and groups, and there is no such thing as "unlimited resources". With that in mind, please read the following "code of ethics" (courtesy of Dr. Jacek Herbrych):

  • Think twice before starting very heavy calculations. Are the parameters correct? Is the calculation necessary at all?
  • Don't interfere with other users' work. Badly behaved programs will be terminated.
  • If you generate huge amounts of data, exclude them from backup. Try to keep your daily backup below 1 GB.
  • Try not to allocate more than 4 GB of RAM per thread. Currently, only CPU usage counts in the fair-share system; however, this may change in the future, since the number of cores per CPU is growing faster than RAM sizes on typical computers.
  • Run parallel code whenever possible. OpenMP and MPI stacks are provided.
  • Try to keep your jobs short, ideally a few days at most. Parallelize your code to achieve that, or use checkpointing (your own, or a library such as Berkeley Lab Checkpoint/Restart). The longer your job runs, the less likely it is to finish properly. Jobs can also fail for other reasons, such as hard ECC errors or malfunctioning programs of other users.
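Do-it-yourself checkpointing can be as simple as periodically saving the program state to disk and resuming from it when the job is resubmitted. The following Python sketch shows the idea with `pickle`; the file name `checkpoint.pkl`, the state layout and the loop body are hypothetical stand-ins for a real calculation. Each checkpoint is written to a temporary file and then renamed over the old one with `os.replace`, so a job killed in the middle of a write still leaves the previous checkpoint intact.

```python
import os
import pickle
import tempfile

CHECKPOINT = "checkpoint.pkl"  # hypothetical checkpoint file name


def load_state(path: str) -> dict:
    """Resume from an existing checkpoint, or start from scratch."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def save_state(state: dict, path: str) -> None:
    """Write the checkpoint atomically: temp file first, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run(n_steps: int, path: str = CHECKPOINT, every: int = 1000) -> int:
    state = load_state(path)
    for step in range(state["step"], n_steps):
        state["total"] += step       # stand-in for one unit of real work
        state["step"] = step + 1     # the next step to run after a restart
        if state["step"] % every == 0:
            save_state(state, path)
    save_state(state, path)
    return state["total"]


if __name__ == "__main__":
    # If the job is killed (time limit, node failure), resubmitting it
    # continues from the last saved step instead of from step 0.
    print(run(1_000_000))
```

The `every` parameter trades checkpointing overhead against the amount of work lost on a failure. Pickle files should only be loaded from trusted sources.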

Resource allocation and job script examples

More complex batch script examples

*Under review