IPython

From Storrs HPC Wiki

IPython is so much more useful than the standard Python shell that installing it should be one of the first things you do as a programmer on a new computer. IPython gives you features like tab completion, running programs from the shell, editing files from inside Python, saving history, and exploring source code, among other things. To use the latest version of IPython, version 6, you need Python 3.4 or later.

Don't worry if you have never programmed with Python before. The language is fun to learn. Learning Python is beyond the scope of this guide, so feel free to consult the resources in the sidebar of the Python subreddit. If you need help and are new to programming in general, you can post to Learn Python, or contact the HPC admins if your question is cluster-related.

Requirements for this guide:

  • Some experience with the shell
  • Some experience with any programming language

Learning objectives:

  • Overview of IPython
  • Building up a script
  • Running python programs
  • Debugging errors

This introduction to IPython is part of the HPC intermediate workshop.

Install IPython

First check the version of Python you are currently using.

python --version
#> Python 2.6.6

The operating system version of Python 2.6.6 is ancient, so if that's the version you have by default, you should load a more recent version. We need Python 3.4 or later for IPython 6's static code completion, so let's use the latest python 3.6.1:

module -t avail python
module load python/3.6.1
python --version
#> Python 3.6.1
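
The same check can be made from inside Python. This is only a minimal sketch: the function name python_is_new_enough is made up for illustration, and the (3, 4) minimum is simply the requirement stated above.

```python
import sys

# Minimum version taken from the text above (IPython 6 wants Python 3.4+).
MIN_VERSION = (3, 4)


def python_is_new_enough(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if python_is_new_enough() else "too old"
    print(sys.version.split()[0], status)
```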

I promise that is the last time we will use python ^_~

Now we can install IPython:

pip3 install --user ipython
#> Successfully installed decorator-4.0.11 ipython-6.0.0 ipython-genutils-0.2.0 jedi-0.10.2 pexpect-4.2.1 pickleshare-0.7.4 prompt-toolkit-1.0.14 ptyprocess-0.5.1 pygments-2.2.0 simplegeneric-0.8.1 traitlets-4.3.2 wcwidth-0.1.7

The --user argument is to install the packages to our home directory under ~/.local/ because we don't have write permissions to the system directory /apps2/python/3.6.1/.
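
If you are curious where --user installs end up on your own account, Python's standard site module can report it. A small sketch, assuming a Linux-style layout in which scripts land in a bin/ directory under the user base; the helper name user_install_dirs is hypothetical.

```python
import os
import site


def user_install_dirs():
    """Return the directories that a --user install is expected to use."""
    base = site.getuserbase()              # typically ~/.local on Linux
    packages = site.getusersitepackages()  # where the libraries go
    # Assumption: POSIX layout, where command-line scripts go to <base>/bin.
    scripts = os.path.join(base, "bin")
    return {"base": base, "packages": packages, "scripts": scripts}


if __name__ == "__main__":
    for label, path in user_install_dirs().items():
        print(label, path)
```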

Now IPython is installed! But if we try to run ipython we get an error because it is not yet in the $PATH:

ipython
#> -bash: ipython: command not found

Let's edit our ~/.bashrc so that the shell knows where to search for the program:

echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
which ipython
#> ~/.local/bin/ipython
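
The lookup that which performs can be sketched in Python with shutil.which. The helper name find_program and its extra_dir parameter are made up for this illustration; prepending extra_dir mimics what the export line does with $HOME/.local/bin.

```python
import os
import shutil


def find_program(name, extra_dir=None):
    """Return the full path of `name` if it is found on the search path, else None."""
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        # Prepend the directory, as the export line does for ~/.local/bin.
        search_path = extra_dir + os.pathsep + search_path
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    local_bin = os.path.join(os.path.expanduser("~"), ".local", "bin")
    print(find_program("ipython", extra_dir=local_bin))
```

A result of None means the program would give "command not found" with that search path.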

Run ipython:

ipython
#> Python 3.6.1 (default, May 13 2017, 20:28:40) 
#> Type 'copyright', 'credits' or 'license' for more information
#> IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
#> 
#> In [1]:

Overview

Spend 2 minutes reading the builtin help, as suggested by the startup message:

?

Like many IPython commands, the help opens inside the less pager. To navigate, use the less keyboard commands you know and love:

Keyboard Shortcut   Action
arrow keys          Move one line
space               Next page
b                   Previous page
g                   Jump to top
G                   Jump to bottom
/word               Search for "word"
n                   Next search result
N                   Previous search result
q                   Quit
h                   Help

As the help screen explains, IPython gives us all the power of the shell inside of Python. We can run shell commands inside of Python, so that we never have to leave Python to run various housekeeping commands. This is possible using what IPython calls "magic". All magic commands begin with a %, including the command to learn about "magic"!

Exit out of the help screen and let's read the short form of magic commands:

%quickref

One can see the equivalent full help with %magic:

%magic

As you can see the help is vast. For now let's look at the few most important commands by creating our project directories.

# The [TAB] here means use the TAB key; don't actually type "[TAB]" ^_~

cd /scr[TAB]
#> cd /scratch/
cd /scratch/
#> '/gpfs/scratchfs1'

ls
#> (Lots of directories!)

# Replace abc12345 with your NetID in the next two commands!
mkdir -p abc12345/ipy
cd abc12345/ipy
#> /gpfs/scratchfs1/abc12345/ipy

mkdir data src results
ls
#> data/  results/  src/

!find
#> .
#> ./results
#> ./data
#> ./src

!tree -F $PWD
#> /gpfs/scratchfs1/abc12345/ipy
#> ├── data/
#> ├── results/
#> └── src/
#> 
#> 3 directories, 0 files

To run any shell command, we put an exclamation mark ! before the command name. (cd, ls and mkdir worked without it because IPython provides cd as a magic command and ls and mkdir as built-in aliases.) We used the !find and !tree commands to recursively print all our directories; tree gives a nicer format for humans (and a worse one for machines!). $PWD is a shell variable, which we can access inside IPython only when running shell commands with !. Strictly speaking, the $PWD was not necessary, but it nicely shows the absolute path and demonstrates that shell variables are usable with shell commands inside IPython.
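
For comparison, here is roughly what the directory listing looks like in plain Python, without the shell. This is a sketch using os.walk; the function name find_dirs is made up, and the result is sorted so that the order is predictable (find itself makes no such promise).

```python
import os


def find_dirs(root="."):
    """Return `root` and every directory below it, similar to `find -type d`."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit them in order
        found.append(dirpath)
    return found


if __name__ == "__main__":
    for path in find_dirs("."):
        print(path)
```

The ! form is much shorter, which is the point of having the shell available inside IPython.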

Now that we have our directory structure we can move on to the next section of creating our Python script.

Running programs

We will use the IPython shell to try many lines of Python code, some of which we will then save to our script.

Let's create our script. Create a new script with your favorite editor and then run the script using the run magic command.

# Use your favorite editor here, like emacs or vim.  Or nano if you're not familiar with either:
!nano src/plot.py

Create your script similar to:

# coding: utf-8
'''Explore airline data'''

import os
import pandas as pd

Save your file and exit from your editor. Back in the IPython shell, run your program:

run src/plot.py
#> (No output if all is well!)

If you need to install any packages, run !pip3 install --user NAME_OF_PACKAGE.

We can see that the os and pd libraries have been imported using python's dir() function, but IPython has better who* commands:

dir()
#> ...
#> 'os',
#> 'pd',
#> 'quit']

who
#> os      pd

whos
#> Variable      Type         Data/Info
#> ------------------------------------
#> os            module       <module 'os' from '/apps2<...>6.1/lib/python3.6/os.py'>
#> pd            module       <module 'pandas' from '/a<...>ages/pandas/__init__.py'>

Note that unlike running a script from the command-line with python path/to/file.py, running a program from inside IPython allows us to inspect the variables to play around, or to troubleshoot if something goes wrong.
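
To get a feel for the idea outside IPython, the standard runpy module can run a script and hand back the names it defined. This is only a rough analogy to run followed by who, not how IPython implements them; the function name run_and_inspect is hypothetical.

```python
import runpy


def run_and_inspect(script_path):
    """Run a script and return the names it defined at top level."""
    namespace = runpy.run_path(script_path)
    # Drop entries such as __name__ and __builtins__ to keep the listing short.
    return {name: value for name, value in namespace.items()
            if not name.startswith("__")}


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        for name in sorted(run_and_inspect(sys.argv[1])):
            print(name)
```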

Building a Script

At this point, our program does nothing useful.

Let's read in some airline data.

cd data/
#> /gpfs/scratchfs1/abc12345/ipy/data

!wget http://stat-computing.org/dataexpo/2009/1987.csv.bz2
!bzip2 -d 1987.csv.bz2
ls
#> 1987.csv

!du -hs 1987.csv
#> 122M    1987.csv

cd ../src
#> /gpfs/scratchfs1/abc12345/ipy/src

df = pd.read[TAB]
df = pd.read_csv('../data/[TAB]'
df = pd.read_csv('../data/1987.csv')

type(df)
#> pandas.core.frame.DataFrame

df.columns
#> Index(['Year', 'Month', 'DayofMonth', 'DayOfWeek', 'DepTime', 'CRSDepTime',
#>        'ArrTime', 'CRSArrTime', 'UniqueCarrier', 'FlightNum', 'TailNum',
#>        'ActualElapsedTime', 'CRSElapsedTime', 'AirTime', 'ArrDelay',
#>        'DepDelay', 'Origin', 'Dest', 'Distance', 'TaxiIn', 'TaxiOut',
#>        'Cancelled', 'CancellationCode', 'Diverted', 'CarrierDelay',
#>        'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay'],
#>       dtype='object')
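
With a file this size it can be handy to look at the header and a few rows before loading everything. A minimal sketch with the standard csv module; peek_csv is a made-up helper, and it assumes the file has a header row and can be read with your default text encoding.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) without reading the whole file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])            # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # stop after n_rows lines
    return header, rows


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        header, rows = peek_csv(sys.argv[1])
        print(header)
        for row in rows:
            print(row)
```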

Now we want to save the df = pd.read_csv(...) line to our plot.py script, but we don't want to copy and paste the command. We can use the save IPython magic command:

?save

However save requires knowing the line number. We have run a couple of commands, so let's print out a list of our commands with their line numbers:

hist -n
#> (Lots of commands!)
hist -nl
#> (Only last 10 lines)

In my output, the command corresponds to IPython line 83, so I will append only that line:

# Change 83 to whatever you see in your output!
save -a ./plot.py 83
#> The following commands were written to file `./plot.py`:
#> df = pd.read_csv('../data/1987.csv')

We will be using the save command frequently, and IPython has a builtin variable _i for the last line we ran. The pass statement in python does nothing, so let's save that to the file for testing:

pass
save -a plot.py _i
#> The following commands were written to file `plot.py`:
#> pass

Now whenever we need to save the last line, we can press [Ctrl] + [R] to search backward through previous commands, start typing "save", and hit [Enter]:

save -a plot.py _i
#> I-search backward: sa[Enter][Enter]

This saves us from going back and forth while editing our script.

Configuration options

Try inspecting our DataFrame of airline times:

df.[TAB]
#> (After several seconds a menu pops up!)

There is an open issue with tab completion on large objects[1]. As a workaround, we can disable jedi completion:

%config
%config IPCompleter
%config IPCompleter.use_jedi=False
df.[TAB]
#> (Now completion menu pops up immediately!)

It's difficult to see all the completion options for a complicated pandas DataFrame object, and we cannot move one page at a time because we are in multicolumn mode. Let's switch to the older readline-like completion behavior:

%config TerminalInteractiveShell.display_completions
#> 'multicolumn'
%config TerminalInteractiveShell.display_completions='readlinelike'
df.[TAB]

Unlike the previous setting, this change requires a restart of IPython. Also, configuration changes made with %config are temporary. We need to save them in a profile configuration file[2] to apply them permanently.

!ipython profile locate
#> /home/abc12345/.ipython/profile_default

!echo "c.IPCompleter.use_jedi=False" >> ~/.ipython/profile_default/ipython_config.py
!echo "c.TerminalInteractiveShell.display_completions='readlinelike'" >> ~/.ipython/profile_default/ipython_config.py
pycat ~/.ipython/profile_default/ipython_config.py
#> c.IPCompleter.use_jedi=False
#> c.TerminalInteractiveShell.display_completions='readlinelike'
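
One catch with echo ... >> is that running it twice appends the same line twice. If that bothers you, a small Python helper can add only the lines that are missing. This is a sketch under that assumption; ensure_config_lines is a hypothetical name, and it matches whole lines exactly.

```python
import os


def ensure_config_lines(config_path, lines):
    """Append each of `lines` to the file unless already present; return those added."""
    content = ""
    if os.path.exists(config_path):
        with open(config_path) as handle:
            content = handle.read()
    existing = set(content.splitlines())
    missing = [line for line in lines if line not in existing]
    if missing:
        parent = os.path.dirname(config_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(config_path, "a") as handle:
            # Start on a fresh line if the file does not end with a newline.
            if content and not content.endswith("\n"):
                handle.write("\n")
            for line in missing:
                handle.write(line + "\n")
    return missing


# The two settings used in this guide.
SETTINGS = [
    "c.IPCompleter.use_jedi=False",
    "c.TerminalInteractiveShell.display_completions='readlinelike'",
]
```

To apply it, call ensure_config_lines(os.path.expanduser("~/.ipython/profile_default/ipython_config.py"), SETTINGS); a second call returns an empty list and leaves the file untouched.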

Restart ipython, run plot.py, and df.[TAB] now shows the full list without the columns hiding the df object's functions and properties.