Category: General, Commandline utility
Loading the Python module
For either method, you must first load the python module. Get a list of available versions using:
module avail python
For example, to load Python 3.4.3 you would then run:
module load python/3.4.3
Or, if you need Python 2.7 for compatibility, the module avail command above lists 2.7.6 at the time of writing:
module load python/2.7.6
Create a toy Python example script
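Any short script works for a first test. The following is a minimal sketch of what my_program.py might contain; the contents and the function name describe_environment are assumptions for illustration, and only the filename is taken from the submission script in the next step.

```python
# my_program.py: hypothetical toy script (the contents are an assumption)
import platform


def describe_environment():
    """Return a one-line summary of the interpreter and the host running it."""
    return "Python {} on node {}".format(
        platform.python_version(), platform.node()
    )


if __name__ == "__main__":
    # When the script runs as a batch job, this line goes to the job's output.
    print(describe_environment())
```

Printing the node name makes it easy to confirm that the script ran on a compute node rather than on the login node.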
Create the SLURM submission script
#!/bin/bash
#SBATCH -n 1
python my_program.py
Please read Laurent Duchesne's excellent step-by-step guide for parallelizing your Python code using multiple processors and MPI.
On our cluster, mpi4py has been compiled against OpenMPI 1.10.1, so to run MPI Python programs we need to load that additional module:
module load python/3.4.3 mpi/openmpi/1.10.1-gcc
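Create the test MPI example file as described in Laurent's guide above, using the same name, mpi.py:
from mpi4py import MPI

# COMM_WORLD is the communicator containing every process in the job
comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # index of this process, from 0 to size - 1
size = comm.Get_size()  # total number of processes
print("I am rank", rank, "of", size)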
Create the SLURM submission script
#!/bin/bash
#SBATCH -n 4
mpirun python mpi.py
You should get output similar to:
I am rank 3 of 4
I am rank 0 of 4
I am rank 1 of 4
I am rank 2 of 4
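In practice, rank and size are mostly used to decide which share of the work each process handles. The sketch below shows one simple way to do that split, a round-robin assignment. The helper name items_for_rank is hypothetical, and the function deliberately avoids importing mpi4py so it can be tried on any machine; in an MPI program you would pass comm.Get_rank() and comm.Get_size() as the arguments.

```python
def items_for_rank(items, rank, size):
    """Return the items that process `rank` out of `size` should handle.

    Round-robin split: rank 0 takes items 0, size, 2*size, ...; rank 1 takes
    items 1, size + 1, ...; and so on.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return list(items)[rank::size]


if __name__ == "__main__":
    work = list(range(10))
    # Simulate the four ranks of the job above without needing MPI.
    for rank in range(4):
        print("rank", rank, "handles", items_for_rank(work, rank, 4))
```

A round-robin split keeps the shares within one item of each other even when the number of items is not a multiple of the number of processes.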
Craig Finch has a more practical example for high throughput MPI on GitHub.
Installing Python libraries
Global package install
Please submit a ticket listing the packages you would like installed and the Python version, and the administrators will install them for you.
Local package install
You can easily install Python packages to your home directory using:
pip install --user <name of your package>
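To confirm where such packages end up, Python's standard site module reports the per-user install locations. This is a minimal sketch; the function name user_install_paths is made up for this example, and the exact paths depend on the Python version you have loaded.

```python
import site


def user_install_paths():
    """Return the per-user base directory and site-packages directory."""
    return {
        "user_base": site.getuserbase(),
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for name, path in sorted(user_install_paths().items()):
        print(name, "=", path)
```

Because the site-packages path includes the Python version, a package installed with --user under one Python module is generally not visible after loading a different version.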
A few reasons why you may want to use Jupyter notebooks on the cluster are:
- Plotting data.
- Being productive in the same familiar environment as on your personal laptop, quickly editing code and preserving the output alongside that code.
- Integrating functions into a larger reusable script by first using cluster-specific installed Python libraries and environment variables interactively.
Because the Jupyter software is updated frequently, it is usually not installed with the Python modules. Once you have loaded a Python module version suitable for your work, you will probably need to install Jupyter notebook yourself as a local package install (see "Local package install" above):
# Load the python module that is most useful to you:
module purge
module load python/3.5.2
# If needed, install jupyter notebook locally in your home ~/.local directory:
python3 -m pip install --user --upgrade notebook
Then to run the notebook on the cluster you will need to run the notebook server on the compute node and connect to it from your laptop:
# On the cluster:
# Add a particular --partition if you need, additional CPUs with --ntasks, etc:
srun --pty bash
jupyter notebook --no-browser --ip='*'
Here is some example output from the jupyter command:
[W 12:41:50.615 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[I 12:41:50.618 NotebookApp] Serving notebooks from local directory: /home/pan14001
[I 12:41:50.618 NotebookApp] The Jupyter Notebook is running at:
[I 12:41:50.618 NotebookApp] http://cn360:8888/?token=fd0458cb33cc82ceaf276d2784e3dfba242ff2e3e60721ae
[I 12:41:50.618 NotebookApp]  or http://127.0.0.1:8888/?token=fd0458cb33cc82ceaf276d2784e3dfba242ff2e3e60721ae
[I 12:41:50.618 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 12:41:50.623 NotebookApp]
    To access the notebook, open this file in a browser:
        file:///home/pan14001/.local/share/jupyter/runtime/nbserver-122777-open.html
    Or copy and paste one of these URLs:
        http://cn360:8888/?token=fd0458cb33cc82ceaf276d2784e3dfba242ff2e3e60721ae
     or http://127.0.0.1:8888/?token=fd0458cb33cc82ceaf276d2784e3dfba242ff2e3e60721ae
[I 12:42:25.737 NotebookApp] 302 GET /?token=fd0458cb33cc82ceaf276d2784e3dfba242ff2e3e60721ae (192.168.100.1) 0.40ms
On your local machine, tunnel your local desktop traffic to the login node over SSH using:
# On your laptop
ssh -NL localhost:8888:cn360:8888 email@example.com
Here cn360 is the node that SLURM assigned to us and on which the Jupyter notebook is running. You will almost certainly need to change cn360 to the node SLURM assigns you, and you will also need to use your own NetID instead of abc12345.
In the browser on your local machine, open the second URL above, the one starting with http://127.0.0.1:8888, which reaches the notebook through the tunnel. The "302 GET" line in the output above appears once we successfully connect to the cluster notebook server from our local machine.
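The node name and port needed for the tunnel can be read directly from the first URL that Jupyter prints. As a sketch of that step, the hypothetical helper below builds the ssh command from that URL and a login string. The function name tunnel_command and its arguments are illustrative and not part of Jupyter or SLURM, and the code assumes Python 3.

```python
from urllib.parse import urlparse


def tunnel_command(jupyter_url, login, local_port=8888):
    """Build the ssh port-forwarding command for a notebook URL.

    jupyter_url: the compute-node URL printed by jupyter,
                 for example http://cn360:8888/?token=TOKEN
    login:       your user@host string for the cluster login node
    """
    parsed = urlparse(jupyter_url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("URL must contain a node name and a port")
    return "ssh -NL localhost:{}:{}:{} {}".format(
        local_port, parsed.hostname, parsed.port, login
    )


if __name__ == "__main__":
    print(tunnel_command("http://cn360:8888/?token=TOKEN", "email@example.com"))
```

If port 8888 is already in use on your laptop, passing a different local_port changes only the local end of the tunnel; the 127.0.0.1 URL opened in the browser must then use that port as well.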
Current Supported Python Versions
2.7.6, 2.7.10, 3.4.3, 3.5.2
Check Installed Packages
List the Python directories to see whether a package is installed:
# Check which Python versions have numpy installed
ls -d /apps2/python/*/lib/python*/site-packages/numpy
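A similar check can be made from inside whichever Python is currently loaded. This sketch uses importlib.util.find_spec from the standard library (Python 3.4 or later); the helper name is_installed is an assumption for this example, and it is meant for top-level package names such as numpy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    for name in ("numpy", "mpi4py"):
        status = "installed" if is_installed(name) else "not found"
        print(name, "is", status)
```

Unlike the ls command, this also finds packages installed with pip install --user, since it searches the same paths that import does.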
If you already have a Python module loaded, you can also see all the installed packages and their versions with:
Latest Installed Packages List
1. Tensorflow: already installed for Python 2.7.6. Currently, the CPU version is available. It has passed our basic tests.
2. scikit-learn (0.17.1)
Notice: you need to load the module "intelics/2012.0.032" before using sklearn 0.17.1. Some functions, such as "LinearRegression", depend on libraries such as libmkl_rt.so, which are provided by this Intel module.