Difference between revisions of "Advanced SLURM"
Revision as of 10:04, 7 September 2016
Cancel All of Your Jobs
If you have a large number of jobs to cancel, it would be tedious to cancel them individually with
scancel. This short command will cancel all of your jobs, so use it carefully:
$ for i in $(sacct --format=JobID --noheader --allocations); do scancel $i; done
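scancel can also select jobs by owner by itself, which avoids looping over sacct output (by default sacct can also list jobs from today that have already finished, and scancel will simply complain about those). The sketch below wraps scancel --user, optionally narrowed with --state (PENDING, RUNNING or SUSPENDED). The function name cancel_all_jobs and the DRY_RUN variable are hypothetical conveniences, not Slurm features.

```bash
#!/bin/bash
# Cancel every Slurm job owned by the current user.
# Optional first argument: a job state (PENDING, RUNNING or SUSPENDED)
# passed to scancel's --state option to cancel only jobs in that state.
# DRY_RUN=1 (hypothetical helper variable) prints the command instead of running it.
cancel_all_jobs() {
    local state="${1:-}"
    local cmd=(scancel --user="${USER:-$(id -un)}")
    if [ -n "$state" ]; then
        cmd+=(--state="$state")
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, running cancel_all_jobs PENDING would cancel only the jobs still waiting in the queue and leave running jobs alone; DRY_RUN=1 cancel_all_jobs shows what would be executed first.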
Targeting Specific Node Architectures
Some of the SLURM partitions, specifically debug, span multiple generations of hardware architecture. In some circumstances you may want to ensure that your jobs run on a specific node architecture, for example:
- MPI jobs may perform significantly better on homogeneous nodes that are tightly coupled with InfiniBand
- Some applications may use compiler optimization flags specific to a CPU's built-in instructions, such as hardware AES encryption
- You may see non-deterministic runtimes from one job to another if each runs on a different architecture
None of the priority partitions span multiple hardware architectures.
We frequently add new nodes and occasionally remove old ones, so the instructions below may change often. If you experience an issue with job targeting, check back on this page to make sure your command is still accurate.
A comparison chart of the different CPU types on our compute nodes is also available.
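To confirm which architecture a job actually landed on, for instance when comparing runtimes across jobs, a job script can log the node name and CPU model at startup. This is a minimal sketch: SLURMD_NODENAME is set by Slurm inside a job, and outside a job the script falls back to bash's own HOSTNAME variable. The function names are hypothetical.

```bash
#!/bin/bash
# Print the CPU model of the current node from /proc/cpuinfo (Linux only).
# Prints "unknown" if the file or the "model name" field is missing.
cpu_model() {
    local line
    line=$(grep -m1 '^model name' /proc/cpuinfo 2>/dev/null) || { echo unknown; return; }
    echo "${line#*: }"   # drop the "model name<TAB>: " prefix
}

# Log the node name and CPU model on one line, e.g. at the top of a job script.
log_architecture() {
    local node=${SLURMD_NODENAME:-${HOSTNAME:-unknown}}
    echo "node=$node cpu=$(cpu_model)"
}
```

Calling log_architecture near the top of a batch script records the node and CPU in the job's output file, so a slow run can later be matched to the hardware it ran on.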
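The --exclude parameter is used to target a job at a specific hardware architecture: by excluding every node that does not belong to that architecture, the job can only be scheduled on the nodes that do.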
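Because --exclude lists the nodes to avoid, targeting one group of nodes means excluding the complement of its range. Assuming the nodes are numbered contiguously from cn01 to cn312 and zero-padded to two digits (inferred from the serial_requeue example further down this page, so treat it as an assumption), a small hypothetical helper can build that complement as a Slurm hostlist:

```bash
#!/bin/bash
# Build a Slurm --exclude hostlist that leaves only nodes LO..HI eligible.
# Assumption: nodes are named PREFIX01..PREFIX<TOTAL>, zero-padded to two digits.
# Usage: exclude_list_for LO HI TOTAL [PREFIX]
exclude_list_for() {
    local lo=$1 hi=$2 total=$3 prefix=${4:-cn}
    local ranges=()
    # Nodes numbered below the target range
    if (( lo > 1 )); then
        ranges+=("$(printf '%02d-%02d' 1 $((lo - 1)))")
    fi
    # Nodes numbered above the target range
    if (( hi < total )); then
        ranges+=("$(printf '%02d-%02d' $((hi + 1)) "$total")")
    fi
    # Nothing to exclude: the target range already covers every node
    (( ${#ranges[@]} > 0 )) || return 0
    local IFS=,
    echo "${prefix}[${ranges[*]}]"
}
```

Note that #SBATCH lines are not expanded by the shell, so a computed list has to be given on the command line, e.g. sbatch --partition=serial_requeue --exclude="$(exclude_list_for 257 264 312)" job.sh. To double-check a hostlist before submitting, scontrol show hostnames 'cn[01-256,265-312]' expands it into individual node names.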
To target your job at the Broadwell architecture, which has 4 nodes, each with two Intel E5-2699 V4 CPUs and 256GB of RAM:
To target your job at the Haswell architecture, which has 175 nodes, each with two Intel E5-2690 V3 CPUs and 128GB of RAM:
Eight of the Haswell nodes have more RAM (192GB). These are available in the
serial_requeue partition. To target these nodes:
#SBATCH --partition=serial_requeue
#SBATCH --exclude=cn[01-256,265-312]
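Put together, a batch script for these nodes might look like the sketch below. The --ntasks and --time values are placeholders, and the cn257 to cn264 range for the high-memory nodes is inferred from the exclude list rather than stated on this page; the check simply warns if the job lands somewhere unexpected, which is useful given that the node lists here may change. SLURMD_NODENAME is set by Slurm inside a job; outside one the script falls back to bash's HOSTNAME.

```bash
#!/bin/bash
#SBATCH --partition=serial_requeue
#SBATCH --exclude=cn[01-256,265-312]
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Succeeds if NODE is one of the 192GB Haswell nodes, assumed to be
# cn257-cn264 because those are the only nodes the exclude list leaves.
is_highmem_haswell() {
    local num=${1#cn}
    [[ $num =~ ^[0-9]+$ ]] || return 1
    num=$((10#$num))   # force base 10 so names like cn08 are not read as octal
    (( num >= 257 && num <= 264 ))
}

node=${SLURMD_NODENAME:-${HOSTNAME:-unknown}}
if is_highmem_haswell "$node"; then
    echo "Running on high-memory Haswell node $node"
else
    echo "Warning: $node is not in cn257-cn264; check the --exclude list" >&2
fi

# ... your own commands go here (placeholder)
```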
To target your job at the Ivy Bridge architecture, which has 32 nodes, each with two Intel E5-2680 V2 CPUs and 128GB of RAM:
To target your job at the Sandy Bridge architecture, which has 40 nodes, each with two Intel E5-2650 CPUs and 64GB of RAM: