IPython is so much more useful than the typical python shell that the first thing to do as a programmer on a new computer is to install IPython before anything else! IPython gives you features like running programs from the shell, editing files from inside Python, and exploring source code, among other things.
To use the latest version of IPython, version 6, you need Python 3.4 or later.
Don't worry if you have never programmed with Python before; the language is fun to learn. Learning Python is beyond the scope of this guide, so feel free to consult the resources in the sidebar of the Python subreddit. If you are new to programming in general and need help, you can post to Learn Python, or contact the HPC admins if your question is cluster-related.
Requirements for this guide:
- Some experience with the shell
- Some experience with any programming language

Sections of this guide:
- Overview of IPython
- Building up a script
- Running Python programs
- Debugging errors
This introduction to IPython is part of the HPC intermediate workshop.
First check the version of Python you are currently using.
python --version #> Python 2.6.6
The operating system's Python 2.6.6 is ancient, so if that's the version you have by default, load a more recent one. We need Python 3.4 or later for IPython 6's static code completion, so let's use the latest available module, Python 3.6.1:
module -t avail python
module load python/3.6.1
python --version
#> Python 3.6.1
I promise that is the last time we will use
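Scripts you write later can refuse to start under an old interpreter such as the system Python 2.6.6. Below is a minimal sketch of that check. It assumes the 3.4 minimum stated above, and the function and constant names are only illustrative. It uses %-formatting on purpose, so the check still runs on Python 2.6:

```python
import sys

# Minimum (major, minor) version assumed by this guide for IPython 6
MIN_VERSION = (3, 4)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given Python version is at least `minimum`."""
    # Tuples compare element by element, so (3, 6) >= (3, 4) is True
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if not python_is_recent_enough():
        # %-formatting keeps this line valid even on Python 2.6
        sys.exit("Python %d.%d is too old; try: module load python/3.6.1" % (major, minor))
    print("Python %d.%d is recent enough" % (major, minor))
```

Comparing tuples avoids parsing the version string that python --version prints.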
Now we can install IPython:
pip3 install --user ipython #> Successfully installed decorator-4.0.11 ipython-6.0.0 ipython-genutils-0.2.0 jedi-0.10.2 pexpect-4.2.1 pickleshare-0.7.4 prompt-toolkit-1.0.14 ptyprocess-0.5.1 pygments-2.2.0 simplegeneric-0.8.1 traitlets-4.3.2 wcwidth-0.1.7
The --user argument installs the packages to our home directory under ~/.local, because we don't have write permissions to the system directory.
Now IPython is installed! But if we try to run ipython, we get an error because it is not yet in the PATH:
ipython #> -bash: ipython: command not found
Let's edit our
~/.bashrc so that the shell knows where to search for the program:
echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
which ipython
#> ~/.local/bin/ipython
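Why does this work? pip3 install --user puts console scripts such as ipython in the bin directory under Python's "user base". On Linux that is ~/.local, unless the PYTHONUSERBASE environment variable overrides it. The sketch below checks whether that directory is on your PATH, which is roughly what which ipython confirmed. The helper names are illustrative, and it assumes a POSIX system, where scripts live in a bin subdirectory:

```python
import os
import site


def user_bin_dir():
    """Directory where `pip install --user` puts console scripts on POSIX systems."""
    # site.getuserbase() is typically ~/.local on Linux (PYTHONUSERBASE overrides it)
    return os.path.join(site.getuserbase(), "bin")


def dir_on_path(directory, path=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path is None:
        path = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    # Normalize each entry so trailing slashes or ~ don't cause false negatives
    entries = [os.path.normpath(os.path.expanduser(p))
               for p in path.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    bin_dir = user_bin_dir()
    print(bin_dir, "is on PATH" if dir_on_path(bin_dir) else "is NOT on PATH")
```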
ipython
#> Python 3.6.1 (default, May 13 2017, 20:28:40)
#> Type 'copyright', 'credits' or 'license' for more information
#> IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
#>
#> In [1]:
Spend 2 minutes reading the built-in help, as suggested by the startup message:
?
Like many IPython commands, the help opens up inside the less pager, so use the less keyboard commands you know and love:
| Key | Action |
|---|---|
| arrow keys | Move one line |
| g | Jump to top |
| G | Jump to bottom |
| /word | Search for "word" |
| n | Next search result |
| N | Previous search result |
As the help screen explains, IPython gives us all the power of the shell inside of Python. We can run shell commands inside of Python, so we never have to leave Python to run various housekeeping commands. This is possible using what IPython calls "magic".
All magic commands begin with a % sign, including the command to learn about "magic"!
Exit out of the help screen and let's read the short form of magic commands:
One can see the equivalent full help with %magic.
As you can see, the help is vast. For now, let's look at the few most important commands by creating our project directories.
# The [TAB] here means use the TAB key; don't actually type "[TAB]" ^_~
cd /scr[TAB]
#> cd /scratch/
cd /scratch/
#> '/gpfs/scratchfs1'
ls
#> (Lots of directories!)
mkdir -p abc12345/ipy # Replace with your NetID!
cd abc12345/ipy # Replace with your NetID!
#> /gpfs/scratchfs1/abc12345/ipy
mkdir data src results
ls
#> data/ results/ src/
!find
#> .
#> ./results
#> ./data
#> ./src
!tree -F $PWD
#> /gpfs/scratchfs1/abc12345/ipy
#> ├── data/
#> ├── results/
#> └── src/
#>
#> 3 directories, 0 files
To run any shell command,
we put an exclamation mark
! before the command name.
We used the !find command to recursively print all our directories, though tree gives a nicer format for humans (and a worse one for machines!).
$PWD is a shell variable, which we are able to access inside of IPython only when running shell commands with !.
Strictly speaking, the
$PWD was not necessary,
but it nicely shows the absolute path
and demonstrates that shell variables are usable with shell commands inside IPython.
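For comparison, here is a rough standard-library equivalent of mkdir data src results followed by !find. In it, os.getcwd() plays the role of $PWD. This is only a sketch. It works in a throwaway temporary directory instead of your scratch space, and it sorts the output because os.walk and find may list entries in different orders:

```python
import os
import tempfile


def make_project(root, subdirs=("data", "src", "results")):
    """Create project subdirectories under `root`, like `mkdir -p data src results`."""
    for name in subdirs:
        os.makedirs(os.path.join(root, name), exist_ok=True)


def find(root):
    """Return './'-prefixed paths below `root`, like running `find` inside it."""
    paths = ["."]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            paths.append(os.path.join(".", os.path.relpath(full, root)))
    return sorted(paths)


if __name__ == "__main__":
    print("Current directory ($PWD):", os.getcwd())
    with tempfile.TemporaryDirectory() as root:
        make_project(root)
        for path in find(root):
            print(path)  # prints ., ./data, ./results, ./src (one per line)
```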
Now that we have our directory structure, we can move on to the next section: creating our Python script.
We will use the IPython shell to try many lines of Python code, some of which we will then save to our script.
Let's create our script.
Create a new script with your favorite editor
and then run the script using the
run magic command.
# Use your favorite editor here, like emacs or vim, or nano if you're not familiar with either:
!nano src/plot.py
Create your script similar to:
# coding: utf-8
'''Explore airline data'''

import os
import pandas as pd
Save your file and exit from your editor. Back in the IPython shell, run your program:
run src/plot.py #> (No output if all is well!)
If you need to install any packages, run !pip3 install --user NAME_OF_PACKAGE.
We can check with dir() that the os and pd libraries have been imported, but IPython has better commands for this, who and whos:
dir()
#> ...
#> 'os',
#> 'pd',
#> 'quit']
who
#> os pd
whos
#> Variable Type Data/Info
#> ------------------------------------
#> os module <module 'os' from '/apps2<...>6.1/lib/python3.6/os.py'>
#> pd module <module 'pandas' from '/a<...>ages/pandas/__init__.py'>
Note that, unlike running a script from the command line with python, running a program from inside IPython lets us inspect its variables, play around with them, or troubleshoot if something goes wrong.
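In plain Python, the closest analogue is the standard library's runpy.run_path. It executes a file and returns the script's global variables as a dictionary, which is roughly what %run leaves behind in IPython's interactive namespace. This illustrates the idea only and is not how IPython implements %run. The script text is a hypothetical stand-in for src/plot.py that needs no pandas:

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for src/plot.py (standard library only)
SCRIPT = '''\
"""Explore airline data"""
import os
data_dir = os.path.join("..", "data")
'''


def run_and_inspect(source):
    """Execute `source` as a script file and return its user-visible globals."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "plot.py")
        with open(path, "w") as f:
            f.write(source)
        namespace = runpy.run_path(path)
    # Hide dunder names like __name__ and __builtins__, much as `who` does
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


if __name__ == "__main__":
    for name, value in sorted(run_and_inspect(SCRIPT).items()):
        print(name, type(value).__name__)
```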
Building a Script
At this point, our program does nothing useful.
Let's read in some airline data.
cd data/
#> /gpfs/scratchfs1/abc12345/ipy/data
!wget http://stat-computing.org/dataexpo/2009/1987.csv.bz2
!bzip2 -d 1987.csv.bz2
ls
#> 1987.csv
!du -hs 1987.csv
#> 122M 1987.csv
cd ../src
#> /gpfs/scratchfs1/abc12345/ipy/src
df = pd.read[TAB]
df = pd.read_csv('../data/[TAB]'
df = pd.read_csv('../data/1987.csv')
type(df)
#> pandas.core.frame.DataFrame
df.columns
#> Index(['Year', 'Month', 'DayofMonth', 'DayOfWeek', 'DepTime', 'CRSDepTime',
#> 'ArrTime', 'CRSArrTime', 'UniqueCarrier', 'FlightNum', 'TailNum',
#> 'ActualElapsedTime', 'CRSElapsedTime', 'AirTime', 'ArrDelay',
#> 'DepDelay', 'Origin', 'Dest', 'Distance', 'TaxiIn', 'TaxiOut',
#> 'Cancelled', 'CancellationCode', 'Diverted', 'CarrierDelay',
#> 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay'],
#> dtype='object')
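If all you need is the column names, you don't have to decompress the 122M file or load it into pandas. Below is a minimal standard-library sketch. It assumes a comma-separated file whose first row is the header, as with 1987.csv. It reads just that row, and it can read it straight from the compressed 1987.csv.bz2 before you run bzip2 -d. The function name is illustrative:

```python
import bz2
import csv


def csv_columns(path):
    """Return the header row of a CSV file, reading .bz2 files transparently."""
    opener = bz2.open if path.endswith(".bz2") else open
    # Text mode ("rt") decompresses and decodes on the fly; only the first row is read
    with opener(path, "rt", newline="") as f:
        return next(csv.reader(f))
```

For example, csv_columns('../data/1987.csv') should give the same names as df.columns, as a plain list.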
Now we want to save the df = pd.read_csv(...) line to our script, but we don't want to copy and paste the command. We can use the save IPython magic command. However, save requires knowing the line number. We have run a couple of commands, so let's print a list of our commands with their line numbers:
hist -n # (Lots of commands!)
hist -nl # (Only the last 10 lines)
In my output, the command corresponds to IPython line 83, so I will append only that line:
save -a ./plot.py 83 # Change 83 to whatever you see in your output!
#> The following commands were written to file `./plot.py`:
#> df = pd.read_csv('../data/1987.csv')
We will be using the save command frequently, and IPython has a built-in variable, _i, for the last line we ran.
The pass statement in Python does nothing, so let's save that to the file for testing:
pass
save -a plot.py _i
#> The following commands were written to file `plot.py`:
#> pass
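To make the save semantics concrete, here is a toy model rather than IPython's actual implementation. The session history is a plain list numbered from 1. save -a FILE N appends entry N to FILE without overwriting it, and _i is simply the most recent input. Both helper names are hypothetical:

```python
def append_history(history, path, numbers):
    """Append 1-based history entries to `path`, like `save -a path 83`."""
    chosen = [history[n - 1] for n in numbers]
    # Mode "a" appends, which is what the -a flag asks save to do
    with open(path, "a") as f:
        for line in chosen:
            f.write(line + "\n")
    return chosen


def last_input(history):
    """Toy model of IPython's `_i`: the most recently executed input."""
    return history[-1] if history else ""
```

Without -a, save would replace the file's contents, which is why every example here appends.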
Now whenever we need to save the last line, we can press [Ctrl] + [R] to search previous commands, start typing "save", and then hit [Enter] twice:
save -a plot.py _i
#> I-search backward: sa[Enter][Enter]
This saves us from going back and forth while editing our script.
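Under the hood, [Ctrl] + [R] is an incremental backward search: it finds the most recent history entry that contains what you have typed so far. Below is a simplified sketch of that idea, and the function is hypothetical. The real prompt_toolkit search also updates as you type, and pressing [Ctrl] + [R] again moves to older matches. The skip parameter loosely imitates that:

```python
def reverse_search(history, query, skip=0):
    """Return the most recent history entry containing `query`, or None.

    `skip` passes over that many newer matches, loosely like pressing
    [Ctrl] + [R] again to reach an older match.
    """
    for entry in reversed(history):  # newest entries first
        if query in entry:
            if skip == 0:
                return entry
            skip -= 1
    return None
```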
Try inspecting our DataFrame of airline times:
df.[TAB] #> (After several seconds a menu pops up!)
There is an open issue with tab completion on large objects. As a workaround, we can disable jedi completion:
%config
%config IPCompleter
%config IPCompleter.use_jedi=False
df.[TAB]
#> (Now the completion menu pops up immediately!)
It's difficult to see all the completion options for a complicated pandas DataFrame object, and one cannot move one page at a time because we are in multicolumn mode. Let's switch to the older, readline-like completion display:
%config TerminalInteractiveShell.display_completions
#> 'multicolumn'
%config TerminalInteractiveShell.display_completions='readlinelike'
df.[TAB]
Unlike the previous setting, this change requires a restart of IPython.
Also, configuration changes made with %config are temporary, so we need to save them in a profile configuration file to apply them permanently.
!ipython profile locate
#> /home/abc12345/.ipython/profile_default
!echo "c.IPCompleter.use_jedi=False" >> ~/.ipython/profile_default/ipython_config.py
!echo "c.TerminalInteractiveShell.display_completions='readlinelike'" >> ~/.ipython/profile_default/ipython_config.py
pycat ~/.ipython/profile_default/ipython_config.py
#> c.IPCompleter.use_jedi=False
#> c.TerminalInteractiveShell.display_completions='readlinelike'
Now df.[TAB] shows the full list without the columns hiding the df object's functions and properties.
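One caveat with the echo ... >> approach is that running those lines a second time appends duplicate settings. Below is a small sketch that appends each setting only if that exact line is not already in the file. The helper name is hypothetical. You would call it with the profile_default path reported by ipython profile locate, for example ensure_config_lines(os.path.expanduser('~/.ipython/profile_default/ipython_config.py'), SETTINGS):

```python
import os

# The two settings from this section, one config line each
SETTINGS = [
    "c.IPCompleter.use_jedi=False",
    "c.TerminalInteractiveShell.display_completions='readlinelike'",
]


def ensure_config_lines(path, lines):
    """Append each line to `path` unless an identical line is already present.

    Returns the list of lines that were actually added.
    """
    text = ""
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    present = set(text.splitlines())
    added = []
    for line in lines:
        if line not in present:
            added.append(line)
            present.add(line)  # also skips duplicates within `lines`
    if added:
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(path, "a") as f:
            # Don't glue our first line onto an unterminated last line
            if text and not text.endswith("\n"):
                f.write("\n")
            f.write("\n".join(added) + "\n")
    return added
```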