Compiling software

From Storrs HPC Wiki
Revision as of 15:31, 3 January 2020 by Pan14001 (talk | contribs) (Fix missing closing tag)

Compiling allows you to access the latest and greatest software. If you have never compiled software before, the process may seem a little involved at first, but once you have compiled a few programs you'll enjoy the process.

First grab a copy of the source code of your program.

Don't I need sudo permissions?

No. The administrative program sudo is commonly suggested for installing software, with commands like sudo make install, but sudo is only needed because the default install locations like /usr/local are protected.

As long as you choose a different install location where you have write access, such as a location in your home directory, you don't need any special permissions or sudo.

The setting to change the install location is typically called a "prefix". We will explain how to set the prefix location.
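
As a minimal sketch of what a prefix under your home directory looks like, the snippet below creates one and puts its bin directory on your PATH. The location ${HOME}/apps is only an assumption for illustration; any directory you can write to works. It also assumes the software follows the common convention of installing its programs into the bin directory of the prefix.

```shell
# Hypothetical install location; any directory you own will do.
prefix="${HOME}/apps"

# Create the prefix (and its bin directory) if it does not exist yet.
mkdir -p "${prefix}/bin"

# Confirm we can write there, which is why sudo is not needed.
if [ -w "${prefix}" ]; then
    echo "OK: ${prefix} is writable"
fi

# Programs installed with this prefix usually land in ${prefix}/bin,
# so add that directory to PATH to run them by name.
export PATH="${prefix}/bin:${PATH}"
```

To make the PATH change permanent you would add the export line to your ~/.bashrc.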

Getting the source code

Often source code for GNU/Linux will be provided in a "tarball" file. You can recognize a tarball file by its file extension; some examples are:

.tar.gz   .tgz    # These 2 are equivalent file extensions
.tar.bz2  .tbz2
.tar.xz

You can unpack these files in a directory using tar -xf ${NAME_OF_TARBALL}.tar.gz.
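
Before unpacking, it can help to list what is inside a tarball and to choose where it is unpacked. The sketch below builds a small throwaway tarball first so that it can be run anywhere; the names demo-1.0 and src are made up for illustration.

```shell
# Build a small demo tarball so the commands below have something to work on.
workdir=$(mktemp -d)
cd "${workdir}"
mkdir demo-1.0
echo "hello" > demo-1.0/README
tar -czf demo-1.0.tar.gz demo-1.0
rm -r demo-1.0

# List the contents first to see whether everything sits in one top-level directory.
tar -tf demo-1.0.tar.gz

# Unpack into a chosen directory with -C (the directory must already exist).
mkdir -p src
tar -xf demo-1.0.tar.gz -C src
ls src/demo-1.0
```

With a real tarball you would only need the tar -tf and tar -xf lines.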

At other times, instead of a tarball, one may need to grab a copy from a version control system like git. In the case of git, one might create the source directory by cloning the source URL.

Now that we have our source files in a directory, the next thing we need to do is consider the compiler to use.

Which compiler should I use?

Usually the developer will state in the documentation which compiler(s) are supported. If not, using gcc is safest. Our Red Hat 6.7 compute nodes use gcc 4.4.7 by default. If your compilation complains about needing a newer version, you can load any of the gcc modules.

Some of our users report better performance with the Intel compilers and Intel MPI. You can access them from the intelics modules, where the version is the year.

Another option is the Portland Group (PGI) compiler. Intel and PGI tend to be more popular with Fortran programs, as they are quicker to implement the latest Fortran standards.

Finally, if you are compiling for GPUs, you need to load the NVIDIA compiler, which is available in the cuda module.

# List all compilers: GNU, Intel, Portland Group, and Nvidia
for compiler in gcc intel pgi cuda ; do module avail $compiler ; done
# Output from the above command
----------------------- /apps2/Modules/3.2.6/modulefiles -----------------------
gcc/4.8.2     gcc/4.9.3     gcc/5.4.0-alt gcc/9.1.0
gcc/4.8.5     gcc/5.4.0     gcc/6.3.0     gcc/9.2.0

----------------------- /apps2/Modules/3.2.6/modulefiles -----------------------
intel/2019u3                      intelics/2016.1-full-gcc
intelics/2012.0.032               intelics/2016.3-full
intelics/2013.1.039-compiler      intelics/2017
intelics/2013.1.039-full(default) intelics/ifort/11.0.084
intelics/2016.1-full

----------------------- /apps2/Modules/3.2.6/modulefiles -----------------------
pgi/14.6  pgi/15.1  pgi/15.10 pgi/15.7  pgi/16.1  pgi/16.10

----------------------- /apps2/Modules/3.2.6/modulefiles -----------------------
cuda/10.0   cuda/7.0    cuda/8.0    cuda/9.1
cuda/10.1   cuda/7.5    cuda/8.0.61

Before compiling programs, you may want to remove any other modules you have loaded so that they do not interfere with your compilation.

# Unload all modules
module purge

General workflow

Follow the documentation in your software source directory. Typically the workflow is:

./configure --prefix=${HOME}/apps
make -j $(nproc)
make install

Good practice is to put these commands in a shell script. A few months from now the script will remind you exactly how you compiled your software, and it makes your work more reproducible for yourself, your lab mates and collaborators. You may also want to write the line set -e toward the top of the script so that it stops as soon as a command fails.
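
A minimal sketch of such a script, assuming the software uses the configure and make workflow above and that you install into ${HOME}/apps. The file name build.sh is only an example. Run the commands from your source directory:

```shell
# Write the build script; the quoted 'EOF' keeps $-expressions literal in the file.
cat > build.sh <<'EOF'
#!/bin/bash
# Stop at the first command that fails.
set -e

# Assumed install location; change it to suit.
prefix="${HOME}/apps"

./configure --prefix="${prefix}"
make -j "$(nproc)"
make install
EOF

chmod +x build.sh

# Check the script for syntax errors without running it.
bash -n build.sh
```

You would then run ./build.sh whenever you need to rebuild, and keep the script alongside your notes.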

Nearly all software that needs compilation ships with at least a makefile. If you are not yet comfortable with makefiles, it is well worth spending 2 hours on a short introduction such as the Software Carpentry automation and make lesson, so that you can read and understand them. The make command searches for a file named Makefile or makefile. If one does not exist, you need to specify the file name, e.g. make -f ${NAME_OF_MAKEFILE}.mk.
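
To get a feel for make -f, the sketch below writes a tiny makefile and runs it. It also shows a dry run with -n and overriding a makefile variable from the command line. The file name demo.mk and the GREETING variable are invented for illustration.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "${workdir}"

# Write a tiny makefile; printf emits the literal tab that recipe lines require.
printf 'GREETING = hello\n\nout.txt:\n\techo $(GREETING) > out.txt\n' > demo.mk

# Dry run: print the commands make would run, without executing them.
make -n -f demo.mk

# Real run, overriding the GREETING variable from the command line.
make -f demo.mk GREETING=goodbye
cat out.txt
```

A variable given on the command line takes precedence over the ordinary assignment inside the makefile, which is handy when a shipped makefile names a compiler or library that your system does not have.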

If your software is complex enough to require other dependencies, it will likely come with a configure shell script. It is a good idea to run ./configure --help to see how to change variables and set paths to libraries. You will almost always need to set the --prefix option to choose the final installation path, as you do not have write access to protected system directories such as /bin, /lib64 and /usr/local. If you obtained your code from version control instead of a traditional release, you may not see a configure script even though the documentation tells you that you need one. In that case you likely need to generate the configure script from configure.ac using a program with a name like bootstrap.sh or autogen.sh, or at worst by running autoreconf directly.
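
The order of fallbacks described above can be sketched as a small shell function. The function name ensure_configure is hypothetical, and the script names it looks for are only the common ones mentioned above; your project's documentation takes precedence.

```shell
# Make sure a configure script exists in the current directory,
# generating it if the project ships only the autotools inputs.
ensure_configure() {
    # Nothing to do when a configure script already exists.
    if [ -x ./configure ]; then
        return 0
    fi
    if [ -x ./autogen.sh ]; then
        ./autogen.sh
    elif [ -x ./bootstrap.sh ]; then
        ./bootstrap.sh
    elif [ -f ./configure.ac ]; then
        # Last resort: run autoreconf directly, installing missing helper files.
        autoreconf --install
    else
        echo "No configure script or configure.ac found" >&2
        return 1
    fi
}
```

From the source directory you might then run ensure_configure && ./configure --help to see the available options.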

Good resources for understanding how the autotools programs process configure.ac and Makefile.am files are the basics of autotools in the Gentoo Linux development manual, and Diego Pettenò's comprehensive online book at autotools.io.

Examples

VASP 5.3.3

According to the comments in the VASP makefiles, we can compile VASP with the PGI compiler or the Intel compiler. As the makefile comments mention that there is no performance change with the PGI compiler versions, we will use the Intel compiler in this example:

# Create a directory for our VASP project.
mkdir -p ~/src/vasp-5.3.3
cd ~/src/vasp-5.3.3

# Copy the source code from the admin directory.
cp -arv /shared/admin/sw-src/rhel6/vasp/vasp.5.3.3.tar.gz .
cp -arv /shared/admin/sw-src/vasp/vasp.5.lib.tar.gz .
# Unpack the sources.
tar -xvpf vasp.5.3.3.tar.gz
tar -xvpf vasp.5.lib.tar.gz

# Load the Intel compiler.
#
# List available Intel compiler versions:
module avail intelics
# Get rid of any other modules that might interfere with our compilation.
module purge
# The latest version at this time is 2017.
module load intelics/2017

# Compile the VASP 5 library.
cd vasp.5.lib/
# There are several makefiles to compile VASP for different types of CPUs and compilers.
# These are the linux compatible makefiles:
ls -1 makefile.linux*
# The best supported for our cluster is makefile.linux_ifc_P4
# Let's see what the makefile will do before running it.
make -n -f makefile.linux_ifc_P4
# Now compile by running make without the `-n` flag.
# Also override Intel's old Fortran compiler name `ifc` with `ifort` by passing FC as a variable to `make`.
make -f makefile.linux_ifc_P4 FC=ifort
# Go back to src directory
cd ..

# Compile the VASP program.
cd vasp.5.3/
# Compile by running make without the `-n` flag.
# With VASP we cannot use `-j` for simultaneous compilation as it is unreliable.
# Override the BLAS variable, as Intel now calls the "guide" library "iomp5" per https://software.intel.com/en-us/forums/intel-c-compiler/topic/284445
make -f makefile.linux_ifc_P4 BLAS=-liomp5\ -mkl

Create a module file for VASP so that we can conveniently load VASP and its dependencies. The name that you choose for your module file is important, as that is what module uses to reference it. We will make our name distinct by adding the "-mine" suffix to help separate it from the system-installed vasp.

mkdir -p ~/mod/vasp
cd ~/mod/vasp
nano 5.3.3-mine
#%Module1.0

# Throw an error if any of these modules are loaded.
conflict vasp
conflict intelics

# Load the particular Intel compiler module we used for the Math Kernel library, etc.
module load intelics/2017

# Modify the PATH to use our compiled VASP.  Do not use a trailing slash.
# $env(HOME) is Tcl for your home directory.
prepend-path PATH $env(HOME)/src/vasp-5.3.3/vasp.5.3

If you are interested in learning more about module files, you can read man modulefile.

Finally, make sure that module knows to look in your ~/mod directory for your module files by setting the MODULEPATH environment variable:

nano ~/.bashrc  # Add the lines below.
# My modules
source /etc/profile.d/modules.sh
MODULEPATH=${HOME}/mod:${MODULEPATH}
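
Each time ~/.bashrc is sourced, a plain assignment like the one above adds ~/mod to MODULEPATH again. That is harmless, but if you prefer to keep the variable tidy, one possible approach is a small helper that only prepends a directory when it is not already listed. The function name prepend_modulepath is made up for this sketch.

```shell
# Prepend a directory to MODULEPATH unless it is already listed there.
prepend_modulepath() {
    local dir=$1
    case ":${MODULEPATH:-}:" in
        *":${dir}:"*)
            # Already present: leave MODULEPATH unchanged.
            ;;
        *)
            # Prepend, adding the ":" separator only if MODULEPATH was non-empty.
            MODULEPATH="${dir}${MODULEPATH:+:${MODULEPATH}}"
            ;;
    esac
    export MODULEPATH
}

prepend_modulepath "${HOME}/mod"
prepend_modulepath "${HOME}/mod"   # The second call changes nothing.
echo "${MODULEPATH}"
```

In ~/.bashrc the function definition and a single call would replace the MODULEPATH assignment.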

Reload your ~/.bashrc file in your current shell:

source ~/.bashrc
# Now we can load and run our VASP module
module load vasp/5.3.3-mine
which vasp
vasp -h

GEMMA 0.96

To compile and install your own copy of GEMMA, let's fetch the source tarball from GitHub. On the GEMMA releases page on GitHub, copy the link to the .tar.gz source code. Below is a short installation script:

cd  # Go to the home directory
wget -O GEMMA-0.96.tar.gz https://github.com/xiangzhou/GEMMA/archive/v0.96.tar.gz
tar -xf GEMMA-0.96.tar.gz
cd GEMMA-0.96
cat README.txt  # We need the GSL and LAPACK libraries to compile.

If you try to compile GEMMA now with make all, you will get a C++ error:

make all
# g++ -Wall -Weffc++ -O3 -std=gnu++11 -DWITH_LAPACK -m64 -static  -c src/main.cpp -o src/main.o
# cc1plus: error: unrecognized command line option "-std=gnu++11"
# make: *** [src/main.o] Error 1

This is because our GCC 4.4.7 compiler is too old to support the gnu++11 standard. Let's load a newer GCC module, as well as the GSL and LAPACK modules suggested by README.txt:

module load gcc/5.4.0-alt gsl/2.4 lapack/3.5.0 zlib/1.2.8
make -j $(nproc) all FORCE_DYNAMIC=1
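
To check ahead of time whether a compiler accepts a flag such as -std=gnu++11, you can ask it to syntax-check a trivial program. This is a sketch that assumes a GCC-compatible C++ compiler driver; the function name supports_flag is hypothetical.

```shell
# Return success if the compiler ($1) accepts the flag ($2).
# The trivial program is read from stdin and only syntax-checked, so no files are written.
supports_flag() {
    printf 'int main() { return 0; }\n' |
        "$1" "$2" -x c++ -fsyntax-only - >/dev/null 2>&1
}

if supports_flag g++ -std=gnu++11; then
    echo "g++ accepts -std=gnu++11"
else
    echo "g++ rejects -std=gnu++11 (or g++ is not available)"
fi
```

Running this before and after loading a newer gcc module shows whether the module made the difference.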

We have quite a few module dependencies for GEMMA to run. Let's create a module file to simplify using gemma.

# Create the module file
mkdir -p ~/mod
cat > ~/mod/gemma <<EOF
#%Module1.0
conflict gemma
module load gcc/5.4.0-alt gsl/2.4 lapack/3.5.0 zlib/1.2.8
prepend-path PATH /home/$USER/GEMMA-0.96/bin
EOF

# Add ~/mod to our MODULEPATH so that we can load the new gemma module.
# This inserts the export before every indented "module" line in ~/.bashrc.
sed -i '/    module /i \
    export MODULEPATH=$HOME/mod:$MODULEPATH' ~/.bashrc
source ~/.bashrc

# Load the new gemma module
module load gemma
gemma -h