Difference between revisions of "R Guide"

From Storrs HPC Wiki
(MPI: Commenting to explain my previous edit: added a partially filled out table of R versions and their OpenMPI dependencies, along with updates to the script that work. Also added the MPI no-warn-on-fork line, as that warning may show up more in the future.)

Revision as of 14:43, 22 June 2017

R
Author R Core Team (originally Ross Ihaka and Robert Gentleman)
Website https://www.r-project.org
Source CRAN
Category Statistics, Commandline utility
Help manual, mailing list, conferences


R is a GNU project for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and for data analysis. See the Wikipedia article on R for more background.

There are several versions of R installed on the HPC Cluster. Users can install their own packages in their home directories.

RStudio cannot be used on HPC

RStudio is a very useful interface for R, and our support team has received many requests to install it on the cluster. Unfortunately, a bug in the current desktop version and our usage policy prevent us from installing it. The newest version of RStudio fails with linking errors against the QtWebKit library, which the RStudio team has not yet resolved. If you are interested in investigating the error and have a suggestion for us, it is described on this page: https://bugreports.qt.io/browse/QTBUG-34302. RStudio also requires gstreamer for its interface, but the cluster only has gstreamer on the login node, and our policy does not allow running graphical interfaces on the login node.

We apologize for the inconvenience. Please write and debug your R code on your own computer, then copy it to the cluster to run. Thank you for your cooperation.

Loading the R module

To list available versions of R, type

  module avail r

At the time of writing, version 3.1.1 contains the largest number of R packages, which were installed at users' request. To load that module version for the current session, type

  module load r/3.1.1

To load R automatically at login, type

  module initadd r/3.1.1

Install an R package

Local package install

Start an R session

R

We make an exception for running R on a login node only for the purpose of installing a package. In all other cases, R processes that run at 100% CPU on the login node for more than 30 minutes will be killed.

If your package name is foo, run

install.packages('foo')

It will then output

Warning in install.packages("foo") :
 'lib = "/gpfs/gpfs1/apps/R/3.1.1/lib64/R/library"' is not writable
Would you like to create a personal library
~/R/x86_64-unknown-linux-gnu-library/3.1.1
to install packages into?  (y/n)

Enter y. If the command finishes with no errors, your local package was installed successfully.

If you see the following error:
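The prompt above only appears in an interactive session, so a batch job may not be able to answer it. One way around that, shown here as a minimal sketch, is to create the personal library directory ahead of time and point R at it through the standard R_LIBS_USER environment variable. The function name setup_r_user_lib is hypothetical, and the directory in the commented example is simply the one from the prompt above; adjust it to the R version you load.

```shell
# Hypothetical helper: create a personal R library directory and tell R to use it.
# R_LIBS_USER is a standard R environment variable; the helper name is an assumption.
setup_r_user_lib() {
    lib_dir="$1"
    if [ -z "$lib_dir" ]; then
        echo "usage: setup_r_user_lib <directory>" >&2
        return 1
    fi
    # Create the directory (and any missing parents) if it does not exist yet.
    mkdir -p "$lib_dir" || return 1
    # Export so that R sessions started from this shell pick it up.
    export R_LIBS_USER="$lib_dir"
}

# Example call, using the directory from the prompt above:
# setup_r_user_lib "$HOME/R/x86_64-unknown-linux-gnu-library/3.1.1"
```

After calling the function, install.packages('foo') in an R session started from the same shell should install into that directory without asking.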

Warning: unable to access index for repository https://cran.fhcrc.org/src/contrib:
  unsupported URL scheme

Remove the "s" from "https" and try the following command instead:

install.packages('foo',repos="http://cran.fhcrc.org")

Global package install

Please submit a ticket listing the packages you would like installed and the R version, and the administrators will install them for you.

Submitting jobs

Serial

Assume that you have a script called helloworld.R with these contents:

cat('Hello world!')

Submit it to the Slurm scheduler using sbatch:

 sbatch -n 1 R CMD BATCH helloworld.R

To submit it to the Slurm scheduler with multi-threading:

 sbatch -n 1 -c 20 --exclusive R CMD BATCH helloworld.R # use "-c 20" to set up multi-threading for R

When the job completes, the output will be written to helloworld.Rout.
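The same serial job can also be described in a batch script, which is easier to keep alongside your code. The sketch below writes both files with here-documents. The file name submit-serial.slurm is hypothetical, and the choice of the general partition and the r/3.1.1 module is an assumption borrowed from other parts of this page.

```shell
# Write the one-line R script described above.
cat > helloworld.R <<'EOF'
cat('Hello world!')
EOF

# Write a batch script for a single-task job (hypothetical file name).
cat > submit-serial.slurm <<'EOF'
#!/bin/bash
#SBATCH -p general
#SBATCH -n 1

source /etc/profile.d/modules.sh
module purge
module load r/3.1.1

R CMD BATCH helloworld.R
EOF

# Submit only where Slurm is available, i.e. on the cluster.
if command -v sbatch >/dev/null 2>&1; then
    sbatch submit-serial.slurm
fi
```

As with the one-line submission, the output should end up in helloworld.Rout.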

MPI

For MPI programs, Rmpi has been compiled against OpenMPI, so the submission script submit-mpi.slurm must load the matching OpenMPI module along with R:

#!/bin/bash
#SBATCH -p general
#SBATCH -n 30

source /etc/profile.d/modules.sh
module purge
module load r/3.2.3 mpi/openmpi/1.10.1-gcc

# If MPI tells you that forking is bad uncomment the line below 
# export OMPI_MCA_mpi_warn_on_fork=0

Rscript mpi.R

Now create the mpi.R script:

library(parallel)

hello_world <- function() {
    ## Return the hostname and MPI worker rank as "host:rank".
    paste(Sys.info()["nodename"], Rmpi::mpi.comm.rank(), sep = ":")
}

## Slurm sets SLURM_NTASKS as a string, so convert it to an integer worker count.
cl <- makeCluster(as.integer(Sys.getenv("SLURM_NTASKS")), type = "MPI")
clusterCall(cl, hello_world)
stopCluster(cl)

Run the script with:

sbatch submit-mpi.slurm

In your Slurm output file you will see a message from each of the MPI workers.

Read R's built-in "parallel" package documentation for tips on parallel programming in R: https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

Each version of R may depend on a different version of MPI. The known dependencies as of Thu Jun 22 13:30:50 EDT 2017 are:

R Version    MPI Version
r/3.2.3      mpi/openmpi/1.10.1-gcc
r/3.3.3      mpi/openmpi/1.10.1-gcc
r/3.1.1      Unknown*

*If you have been using R 3.1.1 with Rmpi and know which MPI version works with it, please let us know.
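If you switch between R versions, a small lookup in your submission script can keep the R and MPI modules paired. This is only a sketch: the function name mpi_module_for_r is hypothetical, and it encodes nothing beyond the known pairings listed above, so it reports "unknown" for any other version.

```shell
# Hypothetical helper: print the OpenMPI module recorded for a given R module.
# Only the pairings listed above are included.
mpi_module_for_r() {
    case "$1" in
        r/3.2.3|r/3.3.3)
            echo "mpi/openmpi/1.10.1-gcc"
            ;;
        *)
            # No known pairing (this includes r/3.1.1 at the time of writing).
            echo "unknown"
            return 1
            ;;
    esac
}

# Example use inside a submission script:
# module load r/3.3.3 "$(mpi_module_for_r r/3.3.3)"
```

The non-zero return status for unknown versions lets a script stop early instead of loading a mismatched MPI module.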

If you prefer to use one of the other MPI implementations compatible with Rmpi, such as MPICH, feel free to install Rmpi as a local package. This is how Rmpi was built against OpenMPI in an R session started with fisbatch (change the module versions, paths and MPI type to match the implementation you want):

fisbatch
module load r/3.1.1 mpi/openmpi/1.10.1-gcc
R
install.packages('Rmpi', configure.args='--with-Rmpi-include=/apps2/openmpi/1.10.1-gcc/include --with-Rmpi-libpath=/apps2/openmpi/1.10.1-gcc/lib --with-Rmpi-type=OPENMPI')
# When prompted for the mirror, try TX (i.e. 121 at the time of writing) since some mirrors are problematic.
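The three configure arguments in the command above all derive from the MPI installation prefix and the MPI type, so they can be generated rather than typed by hand. The sketch below assumes the include and lib directories sit directly under the prefix, as they do in the OpenMPI path used above. The function name rmpi_configure_args is hypothetical.

```shell
# Hypothetical helper: build the Rmpi configure.args string from an MPI
# installation prefix and an MPI type (defaults to OPENMPI).
# Assumes headers are in <prefix>/include and libraries in <prefix>/lib.
rmpi_configure_args() {
    prefix="$1"
    mpi_type="${2:-OPENMPI}"
    printf '%s\n' "--with-Rmpi-include=${prefix}/include --with-Rmpi-libpath=${prefix}/lib --with-Rmpi-type=${mpi_type}"
}

# Example: reproduce the arguments used in the install.packages call above.
rmpi_configure_args /apps2/openmpi/1.10.1-gcc
```

The printed string can be pasted into the configure.args argument of install.packages. For another implementation, pass its prefix and type as the two arguments, after checking the Rmpi documentation for the type names it accepts.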