Difference between revisions of "Advanced SLURM"

From Storrs HPC Wiki

Revision as of 10:11, 7 September 2016

Cancel All of Your Jobs

If you have a large number of jobs to cancel, cancelling them individually with scancel is tedious. The following command cancels all of your jobs, so use it carefully.

$ for i in $(sacct --format=JobID --noheader --allocations); do scancel $i; done
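Because the command above cancels everything at once, it can help to preview what would be cancelled first. The following is a minimal sketch of a dry-run helper, not part of SLURM: `preview_cancel` is a hypothetical function name, and it only prints the `scancel` commands for the job IDs it reads on standard input.

```shell
# Dry-run sketch: read job IDs on stdin (one per line) and print the
# scancel command that would be run for each. Nothing is cancelled.
# preview_cancel is a hypothetical helper name, not a SLURM command.
preview_cancel() {
  local jobid
  while read -r jobid; do
    # Skip blank lines so stray whitespace does not produce empty commands.
    [ -n "$jobid" ] && echo "scancel $jobid"
  done
  return 0
}

# Example on a login node (assumes squeue is on your PATH):
#   squeue -u "$USER" -h -o %i | preview_cancel
```

If the preview looks right, `scancel -u "$USER"` should cancel all of your own jobs in a single call and may be simpler than the loop.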

Targeting Specific Node Architectures

Some SLURM partitions, specifically general, serial, and debug, span multiple generations of hardware architecture. In some circumstances you may want to ensure that your jobs run on a specific node architecture, for example:

  • MPI jobs may perform significantly better on homogeneous nodes that are tightly coupled with InfiniBand
  • Some applications may use compiler optimization flags specific to a CPU's built-in instructions, such as hardware AES encryption
  • You may experience non-deterministic runtimes from one job to another if each runs on a different architecture
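To confirm from inside a job whether the node it landed on supports a given instruction set, one option is to inspect the CPU flags. This is a minimal sketch that assumes Linux x86 compute nodes, where `/proc/cpuinfo` has a `flags` line; `has_cpu_flag` is a hypothetical helper name.

```shell
# Sketch: succeed if the named CPU flag (e.g. "aes") appears on the first
# "flags" line of cpuinfo-style text read from stdin.
# has_cpu_flag is a hypothetical helper name.
has_cpu_flag() {
  local flag=$1
  # Take the first flags line, put one token per line, match the flag exactly.
  grep -m1 '^flags' | tr ' \t' '\n\n' | grep -qx -- "$flag"
}

# Example inside a job script on a Linux node:
#   has_cpu_flag aes < /proc/cpuinfo && echo "hardware AES available"
```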

Notes:

  • None of the priority partitions span multiple hardware architectures.
  • We are frequently adding new nodes, and occasionally removing old nodes, so the instructions below may change frequently. If you are experiencing an issue with job targeting, please refer back to this page and ensure that your specific command is still accurate.
  • A comparison chart of the different CPU types on our compute nodes is also available.

Use SLURM's --exclude parameter to target a job at a specific hardware architecture. It lists the nodes to avoid, so the job can only run on the nodes that remain.
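Since --exclude names the nodes to avoid, it is easy to leave a range stale when nodes are added. The sketch below prints the nodes that remain after an exclude list is applied, so the result can be compared with the intended node count. It assumes nodes are named cn01 upward with no gaps in the numbering, which is inferred from the ranges on this page and may not match the real cluster. `remaining_nodes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the node names left over after applying an --exclude list.
# Assumption: nodes are numbered 1..max with no gaps (may not hold on the
# real cluster). remaining_nodes is a hypothetical helper name.
# usage: remaining_nodes "01-64,105-328" 328
remaining_nodes() {
  local ranges=$1 max=$2
  local -a excluded=()
  local part lo hi n
  local IFS=,
  for part in $ranges; do
    lo=${part%-*}   # text before the dash (or the whole value if no dash)
    hi=${part#*-}   # text after the dash (or the whole value if no dash)
    # 10# forces base 10 so that leading zeros such as "08" are not octal.
    for ((n = 10#$lo; n <= 10#$hi; n++)); do
      excluded[n]=1
    done
  done
  for ((n = 1; n <= max; n++)); do
    [[ -z ${excluded[n]:-} ]] && printf 'cn%02d\n' "$n"
  done
  return 0
}
```

For example, `remaining_nodes "01-324" 328` prints cn325 through cn328, four nodes in total.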

Broadwell

To target your job at the Broadwell architecture, which has 4 nodes, each with two Intel E5-2699 V4 CPUs and 256GB of RAM:

#SBATCH --exclude=cn[01-324]

Haswell

To target your job at the Haswell architecture, which has 175 nodes, each with two Intel E5-2690 V3 CPUs and 128GB of RAM:

#SBATCH --exclude=cn[01-136,325-328]

Eight of the Haswell nodes have more RAM, 192GB each. These are available in the serial_requeue partition. To target these nodes:

#SBATCH --partition=serial_requeue
#SBATCH --exclude=cn[01-256,265-328]

Ivy Bridge

To target your job at the Ivy Bridge architecture, which has 32 nodes, each with two Intel E5-2680 V2 CPUs and 128GB of RAM:

#SBATCH --exclude=cn[01-104,137-328]

Sandy Bridge

To target your job at the Sandy Bridge architecture, which has 40 nodes, each with two Intel E5-2650 CPUs and 64GB of RAM:

#SBATCH --exclude=cn[01-64,105-328]