HPC Intermediate

From Storrs HPC Wiki

The HPC Intermediate workshop is for any researcher who uses UConn's computer cluster and has at least some experience with the command line or with clusters, even if not with our particular cluster. This event will be more informal, and we expect to fine-tune the topics based on your experiences.

This workshop covers:

  • Optimizing your jobs
  • Troubleshooting
  • Power user shortcuts
  • Compiling and running your own software
  • Answering complex usage questions

If you are in the workshop classroom, sign in as a student to UCONNHPC on socrative.com. If you don't see the lesson materials under /scratch/lesson-intermediate, they are also available on our public GitHub repository, HPC/lesson-intermediate:

cd /scratch/$USER
git clone https://github.uconn.edu/HPC/lesson-intermediate.git

Unix Shell automation

Commands below assume you are using the bash shell. Equivalent commands are given after the bash commands in case you are using the csh shell instead.

Check for the shell you are using with:

echo $0

Aliases

Save yourself typing; create aliases:

# the command we want to alias
htop
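# one way: single quotes, so $USER is expanded each time the alias runs
alias htop='htop -u $USER' # for csh, no =
type htop                  # for csh, use `where` instead
# another way: double quotes, so $USER is expanded once, when the alias is defined
alias htop="htop -u $USER" # for csh, no =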
type htop                  # for csh, use `where` instead
# ignore alias
command htop               # for csh, use \htop
# remove alias
unalias htop
type htop                  # for csh, use `where` instead
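
The difference between the two quoting styles only shows up when the variable changes after the alias is defined. For $USER both forms behave the same, because the value normally stays fixed for the whole session. The following is a minimal sketch for bash; DEMO_DIR and the alias names late and early are hypothetical and exist only for this demonstration.

```shell
# DEMO_DIR is a hypothetical variable used only for this demonstration.
DEMO_DIR=/scratch/first

# Single quotes: $DEMO_DIR is stored literally and expanded each time the alias runs.
alias late='echo $DEMO_DIR'
# Double quotes: $DEMO_DIR is expanded once, right now, while the alias is defined.
alias early="echo $DEMO_DIR"

# Change the variable after both aliases exist.
DEMO_DIR=/scratch/second

# Print the stored definitions to see what each alias really contains.
alias late
alias early
```

The definition printed for late still contains $DEMO_DIR, so it would follow the new value, while the definition printed for early contains the old value /scratch/first.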

Our sjobs command is an alias!

type sjobs                 # for csh, use `where` instead

Challenge 1: Create an alias to `cd` to the scratch directory by typing `scratch`

Functions

cp -r /scratch/lesson-intermediate ~   # -r is required to copy a directory
cd ~/lesson-intermediate
# for bash
cat watch-sbatch.sh
source watch-sbatch.sh
# for csh
cat stail
set path = (`pwd` $path)
stail
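
Unlike an alias, a function can take arguments and use them anywhere in its body. csh has no shell functions, which is why the csh variant above puts a script on the path instead. The sketch below is not the contents of watch-sbatch.sh; mkcd is a hypothetical helper chosen only to show the syntax in bash.

```shell
# mkcd is a hypothetical helper, not part of the lesson repository.
# Create a directory (and any missing parents), then change into it.
mkcd() {
    mkdir -p -- "$1" && cd -- "$1"
}

# Usage: work in a throwaway directory so nothing real is touched.
demo_root=$(mktemp -d)
mkcd "$demo_root/my-descriptive-long-name"
pwd
```

Sourcing a file that contains such a definition, as done with watch-sbatch.sh above, makes the function available in the current shell.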

Job troubleshooting

cd slurm-troubleshooting
# Let's see what nodes are free
sinfo -s
# Try out stail
stail bash-fail.slurm

Challenge 2: Why did the job fail?

# What does success look like?
stail bash-success.slurm

There are two useful debugging features enabled in "bash-debug.slurm":

# Getting debug information
stail bash-debug.slurm
# Why is this stuck in pending?
stail bash-pending.slurm
# Use Control + C to cancel out.
# To find why, see the output of:
squeue -u $USER
stimes -u $USER
# Related and also useful
sprio -u $USER
# Cancel our job
sjobs
scancel -u $USER
sjobs
make clean
srun --partition=phi hostname

History-fu

Ctrl + R starts a reverse search through your shell history.

Ctrl + G exits search mode.

mkdir my-descriptive-long-name
cd   # now use the "retype last word" shortcut below to insert my-descriptive-long-name

Retype last word:

  • Bash: Alt + . (Use ESC on OSX for Alt)
  • csh: Alt + Shift + _
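
Keyboard shortcuts are not available inside scripts. A rough equivalent in bash is the special parameter $_, which holds the last argument of the previous command. The sketch below assumes bash and uses a throwaway directory; hist_root is a hypothetical variable name.

```shell
# hist_root is a hypothetical scratch location for this demonstration.
hist_root=$(mktemp -d)

mkdir "$hist_root/my-descriptive-long-name"
# $_ expands to the last argument of the previous command (the new directory).
cd "$_"
pwd
```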

Multiple windows using tmux and screen

The version of tmux shipped with Red Hat is too old, so we provide a module:

module load tmux
tmux

Shortcuts (all shortcuts are prefixed by Ctrl + B):

  • " Split pane into top and bottom
  • % Split pane into left and right
  • Arrow keys Move between panes (Wait a second or two before recalling commands)
  • Space Cycle pane layouts
  • c Create new window
  • n Go to next window
  • , Rename window
  • d Detach from session

# Reconnect to detached session
tmux attach -d

Environmental variables

Challenge 3: How do you make your program use an environmental variable?

  1. Create the variable in a terminal and restart the program # Learner does not understand how the environment is set
  2. Export the variable and restart the program # Learner does not understand that environment needs to be selectively imported
  3. Read the environmental variable from the program # Correct answer
  4. Use the variable name "ENVIRONMENT" # Does not understand the concept of environment


Set the environmental variable TERM to xterm to enable color:

export TERM=xterm   # for csh, use: setenv TERM xterm
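
To see how the environment reaches a program, it helps to compare a plain shell variable with an exported one. The sketch below uses a child sh process as the "program"; the names DEMO_LOCAL, DEMO_EXPORTED and DEMO_ONCE are hypothetical and chosen only for illustration.

```shell
# DEMO_LOCAL and DEMO_EXPORTED are hypothetical names for this demonstration.
DEMO_LOCAL=shell-only            # plain shell variable: child processes do not see it
export DEMO_EXPORTED=inherited   # exported: copied into every child's environment

# The child program (here a new sh) has to read the variable itself.
child_view=$(sh -c 'echo "local=[${DEMO_LOCAL:-unset}] exported=[${DEMO_EXPORTED:-unset}]"')
echo "$child_view"

# A one-off setting for a single command, without changing the current shell.
one_off=$(DEMO_ONCE=temporary sh -c 'echo "$DEMO_ONCE"')
echo "$one_off"
```

The child reports the plain variable as unset and the exported one as inherited. The one-off form is the same mechanism used when a variable assignment is written in front of a command.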

Job optimization

$ sacct
$ sacct -l
# Lots of info!  Let's break that down.
$ man sacct
$ type sjobs
$ cat /scratch/lesson-intermediate/job-aliases.sh # or .csh
$ source /scratch/lesson-intermediate/job-aliases.sh # or .csh
# CPU usage
$ job-cpu
$ job-cpu 761842
       JobID      NCPUS    Elapsed     MinCPU      CPUTime 
------------ ---------- ---------- ---------- ------------ 
761842               96   05:50:33             23-08:52:48 
761842.batch         24   05:50:33   00:00:00   5-20:13:12 
761842.1             96   05:50:31   05:49:57  23-08:49:36 
# Memory usage
$ job-mem 761842
       JobID  MaxVMSize     MaxRSS     AveRSS 
------------ ---------- ---------- ---------- 
761842                                        
761842.batch    652640K      7548K      7548K 
761842.1        502984K     94908K     40215K
# Disk usage
$ job-disk 761842
       JobID  MaxDiskRead MaxDiskWrite MaxPages 
------------ ------------ ------------ -------- 
761842                                          
761842.batch           2M           1M        0 
761842.1             119M        4850M        0
# Exercise
# See disk usage for 760356
# See CPU usage for 576568

In summary, SLURM's accounting is good for measuring disk usage, but less useful for memory and CPU. htop is better for run-time performance profiling (RES in htop = RSS in SLURM).
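
One reason the CPU columns need care: according to the sacct manual, CPUTime is the elapsed time multiplied by the number of allocated CPUs, so it measures what was reserved, not what was used. MinCPU, the CPU time of the least busy task, is closer to actual use. The sketch below checks this against the job-cpu output for job 761842 shown earlier. It assumes bash, and to_seconds is a hypothetical helper, not part of job-aliases.sh.

```shell
# to_seconds is a hypothetical helper: convert a Slurm duration
# written as [D-]HH:MM:SS into seconds.
to_seconds() {
    local t=$1 days=0 h m s
    if [[ $t == *-* ]]; then
        days=${t%%-*}   # text before the dash is the day count
        t=${t#*-}       # the rest is HH:MM:SS
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so that "08" and "09" are not parsed as octal
    echo $(( days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Values copied from the job-cpu output for job 761842
elapsed=$(to_seconds 05:50:33)
cputime=$(to_seconds 23-08:52:48)
echo "NCPUS x Elapsed = $(( 96 * elapsed )) s, CPUTime = $cputime s"

# Step 761842.1: compare the least busy task (MinCPU) with the step's Elapsed
step_elapsed=$(to_seconds 05:50:31)
min_cpu=$(to_seconds 05:49:57)
echo "Least busy task: $(( 100 * min_cpu / step_elapsed ))% busy"
```

Both numbers on the first line come out as 2019168 seconds, and the least busy task was about 99% busy, which suggests this particular job kept its CPUs occupied.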

Install software

cd ~/lesson-intermediate/install-apps
bash python-example
# Loading the module in the subshell did not affect our current environment
module list
cutadapt # error
~/.local/bin/cutadapt -h
PATH="$HOME/.local/bin:$PATH" cutadapt     # bash only; use $HOME because ~ is not expanded inside quotes
env PATH="$HOME/.local/bin:$PATH" cutadapt # csh only
cutadapt # error again: change to PATH was temporary
export PATH="$HOME/.local/bin:$PATH" # bash only
setenv PATH "$HOME/.local/bin:$PATH" # csh only
cutadapt # now works
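
The same sequence can be tried without cutadapt. The sketch below assumes bash and creates a tiny stand-in program in a temporary directory; demo-tool and demo_bin are hypothetical names used only here.

```shell
# demo-tool is a hypothetical program created only for this demonstration.
demo_bin=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo-tool"\n' > "$demo_bin/demo-tool"
chmod +x "$demo_bin/demo-tool"

# One-off: PATH is changed for this single command only.
PATH="$demo_bin:$PATH" demo-tool

# Back in the unchanged environment the program is not found.
command -v demo-tool || echo "demo-tool: not on PATH"

# Persistent for the rest of this shell session and its child processes.
export PATH="$demo_bin:$PATH"
command -v demo-tool
```

The export lasts only until the shell exits; a new login starts again from the default PATH.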

Compile and install software

Compilers / module name

  • GCC - gcc
  • Intel - intelics
  • PGI (Portland Group) - pgi

MPI

  • MPICH (MPI "Chameleon" started by Argonne National Laboratory) - mpi/mpich
  • MVAPICH (InfiniBand optimized version by Ohio State University) - mpi/mvapich and mpi/mvapich2
  • OpenMPI - mpi/openmpi
  • Intel MPI - intelics

Let's compile some software. As many of you are using R, power users might like ESS (Emacs Speaks Statistics), an Emacs package that lets you do at the command line the same sort of things you would do in RStudio. ESS needs Emacs, so let's fetch the latest version of GNU Emacs from gnu.org. At the top of the page, choose "GNU/Linux", click the "main GNU ftp server" link, and at the bottom of the page find the latest version. At the time of writing, this is 25.1. The *.tar.xz file is the smaller download, although unpacking it requires loading the xz module. Right-click on that link and choose "copy link location". Then type wget in the terminal and paste the link:

wget https://ftp.gnu.org/gnu/emacs/emacs-25.1.tar.xz
module load xz
tar -xf emacs-25.1.tar.xz
cd emacs-25.1
./configure --help

The typical flow of installing software is:

  • ./configure
  • make
  • make install

Always specify a --prefix in ./configure: the default install location is /usr/local/, which you don't have write access to, so we install under our home directory instead. We also don't need most of the optional features, so we will use --without-all and --with-x-toolkit=no, as described in the ./configure --help output.

./configure --prefix=~/apps/emacs --without-all --with-x-toolkit=no # error: ~ is passed literally, so the prefix looks like a relative path
echo $HOME  # the shell only expands ~ at the start of a word; $HOME is expanded by the shell before ./configure runs
./configure --prefix=$HOME/apps/emacs --without-all --with-x-toolkit=no
# Compile and install faster using 12 CPU cores:
make -j 12
make -j 12 install
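
The tilde pitfall can be reproduced without running ./configure. In this sketch, which assumes bash, printf stands in for ./configure and prints each argument exactly as a program would receive it; the variable names literal and expanded are hypothetical.

```shell
# printf stands in for ./configure: it prints the argument exactly as received.

# ~ in the middle of a word is not expanded, so the program sees a literal tilde.
literal=$(printf '%s' --prefix=~/apps/emacs)
echo "$literal"

# $HOME is expanded by the shell before the program starts.
expanded=$(printf '%s' --prefix="$HOME/apps/emacs")
echo "$expanded"
```

The first line printed still contains the ~ character, while the second contains the full path to your home directory.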

To compile on a compute node, try requesting a node in the debug partition and limiting the job to a single CPU architecture.

Troubleshooting

LAMMPS error - remove MPI

Appendix

Answers

Challenge 1

# bash
alias scratch='cd /scratch/$USER'
# csh (no equal sign)
alias scratch 'cd /scratch/$USER'

Challenge 2

There is a typo in the submission file. Because of the space between bin and /bash in the first line, SLURM tries to run /bin as the interpreter, with /bash as its first argument. /bin is a directory, not an executable file, so the job fails.

$ diff -u bash-fail.slurm bash-success.slurm 
--- bash-fail.slurm	2017-03-19 22:13:31.473867000 -0400
+++ bash-success.slurm	2017-03-19 22:15:09.780362131 -0400
@@ -1,4 +1,4 @@
-#!/bin /bash
+#!/bin/bash
 # Submit a 1 minute job.
 #SBATCH --partition=phi
 #SBATCH --time=1:00
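
The same failure can be reproduced outside SLURM by running two small scripts directly. This is a sketch assuming bash on Linux with /bin/bash present; the file names below are hypothetical stand-ins, not the lesson's .slurm files.

```shell
# Two hypothetical scripts that differ only in the first line.
demo_dir=$(mktemp -d)
printf '#!/bin /bash\necho "job ran"\n' > "$demo_dir/shebang-fail.sh"
printf '#!/bin/bash\necho "job ran"\n'  > "$demo_dir/shebang-success.sh"
chmod +x "$demo_dir/shebang-fail.sh" "$demo_dir/shebang-success.sh"

# Run each script and record its exit status without stopping on failure.
if "$demo_dir/shebang-success.sh"; then good_status=0; else good_status=$?; fi
if "$demo_dir/shebang-fail.sh"; then bad_status=0; else bad_status=$?; fi

echo "correct first line, exit status: $good_status"
echo "broken first line, exit status:  $bad_status"
```

Only the script with the correct first line prints "job ran"; the other one is rejected before any of its commands run.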

Challenge 3

Answer: 3

  1. Create the variable in a terminal and restart the program - The environment is not always set by the terminal. It may be set by other programs.
  2. Export the variable and restart the program - The environmental variable needs to be selectively imported by the program.
  3. Read the environmental variable from the program - Correct answer.
  4. Use the variable name "ENVIRONMENT" - An environment is made up of several variables, not a single variable named "ENVIRONMENT".