HPC Intermediate Python

From Storrs HPC Wiki
Revision as of 14:30, 22 May 2017 by Pan14001 (talk | contribs) (Add overview, reference sheet, and start on profiling section.)

This article is a work in progress

The HPC Intermediate workshop is for any researcher who uses UConn's computer cluster and has at least some experience with the command line or with clusters, even if not with our particular cluster. Fluency in any programming language is also expected, so that concepts like variables and functions are already familiar. This event will be more informal, and we expect to fine-tune the topics based on your experiences.

This workshop covers:

  • 70% Strategies to parallelize programs.
  • 20% Good practices and habits for numerical programming.
  • 10% A little vocabulary and theory.

If you are in the workshop classroom, sign in as a student to UCONNHPC on socrative.com. If you don't see the lesson materials under /scratch/lesson-intermediate, they are also available on our public GitHub repository: HPC/lesson-intermediate

# Make your own copy of the lessons
mkdir -p /scratch/$USER
cd /scratch/$USER
git clone https://github.uconn.edu/HPC/lesson-intermediate.git


Learning doesn't work simply by explaining a bunch of theory and then doing practical problems. Memorable learning comes from forming associations with existing knowledge. Therefore, many of the topics will be explained by example: first a less optimal way of solving a problem, then the more accepted, preferred way.

To explain the broad topic of parallelism, we will be switching between coding in Python and using SLURM. Instead of using the plain, vanilla Python shell we will introduce IPython and use it throughout.

We will cover the following topics:

  1. Profiling code
    • Introduction to IPython
    • Using %timeit on loops, lists and arrays
    • Review of pandas library API
    • split-apply-combine
  2. Built-in SLURM parallelism
    • Job arrays are unpredictable
    • srun --multiprog works but has limitations
  3. Resume jobs using "checkpointing"
    • Using traps.
    • Guarding against data corruption.
  4. Disk usage best practices
    • Parallel IO
    • Binary formats like HDF5
    • Reading large files with MPI
  5. Multi-process parallelism
    • External process managers like GNU Parallel
    • Internal process management
    • Multi-threading not usually worth it
  6. General debugging
    • Using %debug in IPython
    • Recognize when you run out of memory
    • Disable MPI to see the underlying error
  7. Learning to be a better programmer
    • Writing a short package
    • Working on projects with real consequences
    • Reading good packages
    • Books


To learn any deep subject well, we also need to know a little bit of the vocabulary:

Cheatsheet for vocabulary
Term | Description
Library / Module | Reusable collection of code that accomplishes a specific task. "Library" is the more general term; in Python, libraries are called "modules". Always try to write the minimum code needed to solve your problem by using libraries.
Documentation | Flavorful explanation of why you should use a particular program or library and how it can make your life easier. Often accompanied by short examples.
API | Stands for "Application Programming Interface". Less flavorful explanation of a library. Libraries don't just provide functions: they can also provide variables (e.g. numpy.pi), operators (+, -, *, /), Python decorators, and so on.
Checkpoint | Saving progress so that you can pick up close to where you left off after a job is interrupted. Think of video game checkpoints: convenient locations where you can respawn after being killed :)
Multi-processing | Make temporary copies of your program, each with the same functions and variables.
Multi-threading | Run several threads within one program that share the same memory. More efficient, but also more complicated. Usually done with "OpenMP" in HPC.
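
The "Checkpoint" entry above can be illustrated with a minimal sketch. It is not the workshop's own implementation: the file name checkpoint.json, the function names, and the stop_after parameter (which only simulates an interrupted job) are all assumptions made for this example. Progress is written to a temporary file first and then renamed over the old checkpoint, so an interruption in the middle of a write should not leave a half-written file behind.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it over the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # The rename replaces the old checkpoint in a single step.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, stop_after=None):
    """Sum the integers 0..n_steps-1, saving progress after every step."""
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step >= stop_after:
            return state  # hypothetical stand-in for the job being killed
        state["total"] += step
        state["next_step"] = step + 1
        save_checkpoint(path, state)
    return state


# Hypothetical checkpoint location in a scratch directory.
checkpoint_file = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
interrupted = run(checkpoint_file, n_steps=10, stop_after=4)
resumed = run(checkpoint_file, n_steps=10)
print(interrupted, resumed)
```

The second call to run() does not start from zero: it reads the checkpoint and continues from the first unfinished step.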

Profiling code

Let's quickly recall the 7 elements common to every programming language:

  1. Store individual things (the number 2, the word "Hello")
  2. Store groups of things (lists, dictionaries, DataFrames, arrays)
  3. Commands that operate on things (the + symbol, the len() function)
  4. Ways to create chunks (functions, objects/classes, and packages)
  5. Ways to repeat yourself (for loops)
  6. Ways to make choices (if and try)
  7. Ways to combine chunks (function composition)
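
As a quick refresher, the seven elements in the list above might look like this in Python. All variable and function names here are invented for illustration.

```python
# 1. Store individual things
count = 2
greeting = "Hello"

# 2. Store groups of things
values = [1, 2, 3]
lookup = {"a": 1, "b": 2}

# 3. Commands that operate on things
total = count + len(values)


# 4. Ways to create chunks
def square(x):
    return x * x


def increment(x):
    return x + 1


# 5. Ways to repeat yourself
squares = []
for v in values:
    squares.append(square(v))

# 6. Ways to make choices
if total > 4:
    size = "big"
else:
    size = "small"

try:
    missing = lookup["c"]
except KeyError:
    missing = None

# 7. Ways to combine chunks (function composition)
composed = increment(square(3))
```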

For the 20% of today's workshop devoted to numerical programming, we will mainly focus on #3: being mindful of efficient operations.
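
One way to become mindful of efficient operations is to time two versions of the same operation. The sketch below uses the standard library timeit module, which is what IPython's %timeit magic is built on. The function names and the list size are assumptions chosen for this example, and the timings it prints will differ from machine to machine.

```python
import timeit

numbers = list(range(100_000))


def loop_sum(items):
    """Add the items one at a time with an explicit for loop."""
    total = 0
    for item in items:
        total += item
    return total


def builtin_sum(items):
    """Let the built-in sum() do the looping."""
    return sum(items)


# Run each version 10 times per measurement, take the best of 3 measurements.
loop_seconds = min(timeit.repeat(lambda: loop_sum(numbers), number=10, repeat=3))
builtin_seconds = min(timeit.repeat(lambda: builtin_sum(numbers), number=10, repeat=3))

print(f"for loop:     {loop_seconds:.4f} s for 10 calls")
print(f"built-in sum: {builtin_seconds:.4f} s for 10 calls")
```

Inside IPython, the same comparison can be made interactively with %timeit loop_sum(numbers) and %timeit builtin_sum(numbers).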