Difference between revisions of "IPython"
(Introduce the IPython shell) |
(Add sections to run programs and build up a script) |
Revision as of 23:54, 13 May 2017
This article is a work in progress.
IPython is so much more useful than the typical python shell that the first thing to do as a programmer on a new computer is to install IPython before anything else! IPython gives you features like tab completion, running programs from the shell, editing files from inside Python, saving history, and exploring source code, among other things. Version 6 of IPython now also has static completion, so if you open square brackets you can tab-complete Python lists and dictionaries, write really long commands, and use fewer temporary variables.
Don't worry if you have never programmed with Python before. The language is fun to learn. Learning python is beyond the scope of this guide, so feel free to consult the resources in the sidebar of the Python subreddit. If you need help and are new to programming in general, you can post to Learn Python or contact the HPC admins if your question is cluster related.
Requirements for this guide:
- Some experience with the shell
- Some experience with any programming language
Learning objectives:
- Overview of IPython
- Building up a script
- Running python programs
- Debugging errors
This introduction to IPython is part of the HPC intermediate workshop.
Install IPython
First check the version of Python you are currently using.
python --version
#> Python 2.6.6
The operating system version of Python 2.6.6 is ancient, so if that's the version you have by default, you should load a more recent version. We need Python 3.4 or later for IPython 6's static code completion, so let's use the latest python 3.6.1:
module -t avail python
module load python/3.6.1
python --version
#> Python 3.6.1
I promise that is the last time we will use python ^_~
Now we can install IPython:
pip3 install --user ipython
#> Successfully installed decorator-4.0.11 ipython-6.0.0 ipython-genutils-0.2.0 jedi-0.10.2 pexpect-4.2.1 pickleshare-0.7.4 prompt-toolkit-1.0.14 ptyprocess-0.5.1 pygments-2.2.0 simplegeneric-0.8.1 traitlets-4.3.2 wcwidth-0.1.7
The --user argument installs the packages to our home directory under ~/.local/ because we don't have write permissions to the system directory /apps2/python/3.6.1/.
Now IPython is installed! But if we try to run ipython we get an error because it is not yet in the $PATH:
ipython
#> -bash: ipython: command not found
Let's edit our ~/.bashrc so that the shell knows where to search for the program:
echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
which ipython
#> ~/.local/bin/ipython
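The same check can be made from Python itself. The following is a minimal sketch, assuming a Linux system where `pip3 install --user` places scripts in the `bin` directory under the user base; the function names `user_bin_dir` and `on_path` are made up for this illustration.

```python
import os
import shutil
import site


def user_bin_dir():
    """Directory where `pip install --user` is assumed to place scripts (Linux layout)."""
    return os.path.join(site.getuserbase(), "bin")


def on_path(directory, path=None):
    """Return True if `directory` is one of the entries of PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    wanted = os.path.realpath(directory)
    entries = [entry for entry in path.split(os.pathsep) if entry]
    return any(os.path.realpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    print("user bin directory:", user_bin_dir())
    print("on PATH:", on_path(user_bin_dir()))
    # shutil.which returns None when the program cannot be found on PATH.
    print("ipython found at:", shutil.which("ipython"))
```

If `on_path` reports False, the `export PATH=...` line above has probably not taken effect in the current shell yet.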
Run ipython:
ipython
#> Python 3.6.1 (default, May 13 2017, 20:28:40)
#> Type 'copyright', 'credits' or 'license' for more information
#> IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.
#>
#> In [1]:
Overview
Spend 2 minutes reading the built-in help, as suggested by the startup message:
?
Like many IPython commands, the help opens inside the less pager. To navigate, use the keyboard commands for less that you know and love:
Keyboard Shortcut | Action |
---|---|
arrow keys | Move one line |
space | Next page |
b | Previous page |
g | Jump to top |
G | Jump to bottom |
/word | Search for "word" |
n | Next search result |
N | Previous search result |
q | Quit |
h | Help |
As the help screen explains, IPython gives us all the power of the shell inside of Python. We can run shell commands inside of Python, so that we never have to leave Python to run various housekeeping commands. This is possible using what it calls "magic". All magic commands begin with a %, including the command to learn about "magic"!
Exit out of the help screen and let's read the short form of magic commands:
%quickref
One can see the equivalent full help with %magic
%magic
As you can see the help is vast. For now let's look at the few most important commands by creating our project directories.
# The [TAB] here means use the TAB key; don't actually type "[TAB]" ^_~
cd /scr[TAB]
#> cd /scratch/
cd /scratch/
#> '/gpfs/scratchfs1'
ls
#> (Lots of directories!)
mkdir -p abc12345/ipy # Replace with your NetID!
cd abc12345/ipy # Replace with your NetID!
#> /gpfs/scratchfs1/abc12345/ipy
mkdir data src results
ls
#> data/ results/ src/
!find
#> .
#> ./results
#> ./data
#> ./src
!tree -F $PWD
#> /gpfs/scratchfs1/abc12345/ipy
#> ├── data/
#> ├── results/
#> └── src/
#>
#> 3 directories, 0 files
To run any shell command, we put an exclamation mark ! before the command name. We used the !find and !tree commands to recursively print out all directories, though tree is a nicer format for humans (and worse for machines!). $PWD is a shell variable which we are able to access inside of IPython only when running shell commands with !. Strictly speaking, the $PWD was not necessary, but it nicely shows the absolute path and demonstrates that shell variables are usable with shell commands inside IPython.
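For comparison, the recursive listing can also be produced in plain Python. This is a minimal sketch using the standard library's `os.walk`; the function name `list_dirs` is made up for this illustration, and unlike `find` it lists directories only.

```python
import os


def list_dirs(root="."):
    """Return every directory under `root`, similar to `find -type d`."""
    found = []
    for current, subdirs, _files in os.walk(root):
        subdirs.sort()  # visit subdirectories in alphabetical order
        found.append(current)
    return found


if __name__ == "__main__":
    for directory in list_dirs("."):
        print(directory)
```

Run from the ipy directory created above, `list_dirs(".")` returns `['.', './data', './results', './src']`.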
Now that we have our directory structure we can move on to the next section of creating our Python script.
Running programs
We will use the IPython shell to try many lines of Python code, some of which we will then save to our script.
Let's create our script.
Create a new script with your favorite editor and then run the script using the run magic command.
# Use your favorite editor here, like emacs or vim. Or nano if you're not familiar with either:
!nano src/plot.py
Create your script similar to:
# coding: utf-8
'''Explore airline data'''

import os
import pandas as pd
Save your file and exit from your editor. Back in the IPython shell, run your program:
run src/plot.py
#> (No output if all is well!)
If you need to install any packages, run !pip3 install --user NAME_OF_PACKAGE.
We can see that the os and pd libraries have been imported using Python's dir() function, but IPython has better who* commands:
dir()
#> ...
#> 'os',
#> 'pd',
#> 'quit']
who
#> os pd
whos
#> Variable Type Data/Info
#> ------------------------------------
#> os module <module 'os' from '/apps2<...>6.1/lib/python3.6/os.py'>
#> pd module <module 'pandas' from '/a<...>ages/pandas/__init__.py'>
Note that unlike running a script from the command line with python path/to/file.py, running a program from inside IPython allows us to inspect the variables to play around, or to troubleshoot if something goes wrong.
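To get a feel for what who and whos report, here is a rough approximation in plain Python. It is only a sketch and not how IPython implements these commands: IPython also hides names that were loaded at startup, which this version does not. The function names `interactive_names` and `describe` are made up for this illustration.

```python
def interactive_names(namespace):
    """List the public names in a namespace, a rough stand-in for `who`."""
    return sorted(name for name in namespace if not name.startswith("_"))


def describe(namespace):
    """Return (name, type name) pairs, loosely like the first columns of `whos`."""
    return [(name, type(namespace[name]).__name__)
            for name in interactive_names(namespace)]


if __name__ == "__main__":
    import os
    for name, kind in describe({"os": os, "_private": 1}):
        print(name, kind)
```

Passing `globals()` as the namespace in an ordinary Python session gives a similar overview, only noisier than the IPython commands.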
Building a Script
At this point, our program does nothing useful.
Let's read in some airline data.
cd data/
#> /gpfs/scratchfs1/abc12345/ipy/data
!wget http://stat-computing.org/dataexpo/2009/1987.csv.bz2
!bzip2 -d 1987.csv.bz2
ls
#> 1987.csv
!du -hs 1987.csv
#> 122M 1987.csv
cd ../src
#> /gpfs/scratchfs1/abc12345/ipy/src
df = pd.read[TAB]
df = pd.read_csv('../data/[TAB]'
df = pd.read_csv('../data/1987.csv')
type(df)
#> pandas.core.frame.DataFrame
df.columns
#> Index(['Year', 'Month', 'DayofMonth', 'DayOfWeek', 'DepTime', 'CRSDepTime',
#> 'ArrTime', 'CRSArrTime', 'UniqueCarrier', 'FlightNum', 'TailNum',
#> 'ActualElapsedTime', 'CRSElapsedTime', 'AirTime', 'ArrDelay',
#> 'DepDelay', 'Origin', 'Dest', 'Distance', 'TaxiIn', 'TaxiOut',
#> 'Cancelled', 'CancellationCode', 'Diverted', 'CarrierDelay',
#> 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay'],
#> dtype='object')
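Because the file is large, it can be convenient to look at the header and a few rows before loading everything into pandas. The following is a minimal sketch using only the standard library's `csv` module; the function name `peek_csv` and its `rows` parameter are made up for this illustration.

```python
import csv
import itertools


def peek_csv(path, rows=5):
    """Return the header and the first `rows` data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        sample = list(itertools.islice(reader, rows))
    return header, sample
```

From the src directory, `peek_csv('../data/1987.csv')` would return the column names together with the first five data rows, without reading the rest of the file.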
Now we want to save the df = pd.read_csv(...) line to our plot.py script, but we don't want to copy and paste the command. We can use the save IPython magic command:
?save
However, save requires knowing the line number. We have run a couple of commands, so let's print out a list of our commands with their line numbers:
hist -n
In my output, the command corresponds to IPython line 83, so I will append only that line:
save -a ./plot.py 83 # Change 83 to whatever you see in your output!
#> The following commands were written to file `./plot.py`:
#> df = pd.read_csv('../data/1987.csv')
We will be using the save command frequently, so it makes sense to create a small helper function that returns the previous input line:
last_line = lambda : In[len(In) - 2]
pass
save -a plot last_line()
#> The following commands were written to file `plot.py`:
#> pass
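The index arithmetic can be checked outside IPython with an ordinary list. This sketch assumes, as the output above suggests, that the line currently being executed is already the last entry of In when last_line() is evaluated; the names `previous_input` and `fake_in` are made up for this illustration.

```python
def previous_input(history):
    """Return the entry just before the most recent one in `history`."""
    return history[len(history) - 2]


# A stand-in for IPython's `In` list at the moment the `save` line runs:
# index 0 is an empty placeholder and the last entry is the `save` line itself.
fake_in = ["", "x = 1", "pass", "save -a plot last_line()"]

if __name__ == "__main__":
    print(previous_input(fake_in))
```

The last entry is the save command itself, so the line we actually want sits one position earlier, which is why the helper subtracts 2 rather than 1.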
Now whenever we need to save the last line, we can type [Ctrl] + [R] to recall previous commands and then start typing "save" and hit [Enter]:
save -a plot.py last_line()
#> I-search backward: sa[Enter][Enter]
This saves us from going back and forth while editing our script.