How to use a local R installation on HPC with qsub

I work with a cluster where it is not possible to globally install a specific R version. Given that I built a specific version of R in the folder:
<generic_path>/R/R-X.Y.Z
and installed some packages locally in:
<generic_path>/R/packages
how can I set, in a shell script (bash), the environment variables and aliases to run this specific R version, loading the packages from the local package directory?

Option 1:
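Using a job script for the HPC scheduler (in my case a qsub script), this is possible by sourcing a shell script (e.g. in bash) that contains the following lines:
# The alias only helps in interactive shells (bash does not expand aliases in scripts by default);
# inside a job script, the PATH line is what makes R and Rscript resolve to this build.
alias R="<path_to_R>/R/R-X.Y.Z/bin/R"
export R_LIBS="<path_to_R>/R/packages"
export PATH="<path_to_R>/R/R-X.Y.Z/bin:${PATH}"
The script (here named makeenv.sh) can then be sourced inside the qsub script with: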
source makeenv.sh
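A slightly more defensive makeenv.sh, sketched under the assumption that the custom build and the package directory share one parent folder. R_ROOT is a hypothetical stand-in for <path_to_R>/R. The sketch leaves out the alias, since the PATH entry already covers both R and Rscript, and it prepends to R_LIBS instead of overwriting whatever the cluster may already have set.

```bash
# makeenv.sh: meant to be *sourced* (source makeenv.sh), not executed,
# so that the exported variables survive in the calling job script.
# R_ROOT is a hypothetical placeholder for <path_to_R>/R; override it if needed.
R_ROOT="${R_ROOT:-$HOME/R}"
R_BUILD_DIR="$R_ROOT/R-X.Y.Z"

# Put the custom build first on PATH so `R` and `Rscript` resolve to it.
export PATH="$R_BUILD_DIR/bin:$PATH"

# Prepend the private package library, keeping any R_LIBS value already set.
export R_LIBS="$R_ROOT/packages${R_LIBS:+:$R_LIBS}"
```

In the qsub script, `source makeenv.sh` followed by `Rscript myRcode.R` should then pick up both the build and the library; running `command -v Rscript` in the job is a quick way to confirm which binary is found.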
Option 2: Depending on your HPC system, you may have the module avail and module load commands. If so, use:
myBsubFile.sh
#!/bin/bash
# some #BSUB headers...
# ...
module load /R/R-X.Y.Z
Rscript myRcode.R
Then load libraries in the R script as:
myRcode.R
library("data.table", lib.loc = "path/to/my/libs")
# some more R code...

Related

Running Rscript in Bash on Windows

I'm writing a Git hook which should run some R code. If the hook starts with
#!/usr/bin/env Rscript
then the following code is correctly interpreted as R.
However, Rscript looks for my installed packages in my home directory. According to the R for Windows FAQ, the home directory is defined by the environment variables R_USER or HOME. However, the Git Bash shell does not contain either of these, so the home directory defaults to the working directory.
The hook, therefore, fails to locate any non-base packages.
The solution I've found is to create an R file and then use Bash in the hook to call Rscript after manually defining R_USER:
#!/bin/sh
R_USER="my/home/directory"
export R_USER
Rscript foo.R
(In this case, foo.R is in the project's working directory.)
This works but is somewhat inelegant. I'd rather simply use Rscript in the hook itself.
So I considered setting the home directory within the Rscript:
#!/usr/bin/env Rscript
Sys.setenv(R_USER = "my/home/directory")
But while that does set R_USER, it doesn't fix the problem. I assume this means that R defines the package-finding path before the script itself is run, so defining R_USER within the script doesn't change the fact that it's still looking for packages in the working directory instead.
So, is there a solution to this (for example, by setting R_USER and then somehow telling R to update the package directory)? Or is the use-Bash-to-call-Rscript method the way to go?
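For the hook itself, the two lines that set and export R_USER can be collapsed into a prefix assignment, which puts R_USER into the environment of that one command only. A minimal sketch; the function name run_with_r_user and the RSCRIPT override variable are hypothetical, added so the Rscript binary can be swapped out.

```bash
#!/bin/sh
# Hypothetical helper: run Rscript with R_USER set for this one command only.
# Usage: run_with_r_user "my/home/directory" foo.R
run_with_r_user() {
  r_user="$1"
  shift
  # The VAR=value prefix exports R_USER to the child process without
  # changing it in the hook's own shell. RSCRIPT (hypothetical) defaults to Rscript.
  R_USER="$r_user" "${RSCRIPT:-Rscript}" "$@"
}
```

The hook body then reduces to a single line such as `R_USER="my/home/directory" Rscript foo.R`. It still needs a shell wrapper, because R_USER has to be in place before R starts, which matches the observation that setting it from inside the R script is too late.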

Making system2 use a specific version of python

I have both python2 and python3 installed on my desktop. If I run
python -V in the terminal, I get Python 3.6.0 :: Anaconda 4.3.0 (x86_64).
However, if I use the system2 command from R
system2("python", args = "-V")
then it reports Python 2.7.10
If I specify the full path, I get the right version
system2("//anaconda/bin/python", args = "-V")
Python 3.6.0 :: Anaconda 4.3.0 (x86_64)
But I'd like system2 to just use python3 by default. Is there some way to change which version it uses? This is on Mac OS X.
When you run R from the R application or RStudio, system calls see a different environment than they do when you run R from the terminal. Because of that, the PATH environment variable you have configured in your shell to run the correct version of a Unix executable is different from the one used by a system2() or system() call in an R session in either of these applications. To solve this, you need to set the path in your R environment.
In an interactive session, you can do this:
# Reproducing your problem (in the R application or RStudio)
system2("python", args="-V")
# Python 2.7.10
# set a new PATH in the environment accessed by R
# This is the line you can also add to your .Rprofile
Sys.setenv(PATH = paste(c("//anaconda/bin", Sys.getenv("PATH")),
                        collapse = .Platform$path.sep))
# For users other than the OP, you'll want to use the directory
# where your preferred installation of python is. For the OP that's
# //anaconda/bin
# Confirm
system2("python", args="-V")
# Python 3.6.0 :: Anaconda 4.3.0 (x86_64)
The system command python should now be found in the directory //anaconda/bin, rather than /usr/bin. This, of course, depends on where these unix executables are found in your system, so for readers other than the OP, you'll need to use the directory that holds your desired version of python.
This PATH will remain in effect for the rest of your R session. To change your path in all R sessions, update (or create, if you haven't yet) your .Rprofile file, which R looks for in the current working directory and then in your home directory; a site-wide equivalent is R_HOME/etc/Rprofile.site. If you add the Sys.setenv() line above to .Rprofile, it will run at the beginning of every R session.
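The lookup rule this answer relies on can be checked from a shell as well: the first directory on PATH that contains an executable named python wins. A small sketch with a hypothetical prepend_path helper that avoids adding the same directory twice (useful if the line ends up in a profile that is sourced repeatedly); the temporary directory and the fake python script exist only for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: put a directory at the front of PATH unless it is
# already somewhere on PATH (in that case PATH is left as it is).
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Demonstration with a throwaway directory holding a fake `python`.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf '#!/bin/sh\necho "Python 3 (fake)"\n' > "$demo_dir/python"
chmod +x "$demo_dir/python"

prepend_path "$demo_dir"
prepend_path "$demo_dir"        # second call is a no-op
command -v python               # now resolves to $demo_dir/python
```

In R, the Sys.setenv() line above does the same job for system() and system2() calls.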

Setting .libPaths() For Running R Scripts From Command Line Using Rscript.exe

I am trying to run R scripts via BAT files on Windows Command Prompt.
The scripts require a few R packages such as data.table, tidyr, etc.
For operational reasons, all required R packages and dependencies (including data.table) are installed at C:\Users\username\Documents\R\R-3.5.1\library. I am not allowed to install RStudio in this environment.
When I try
"C:\Program Files\R\R-3.5.1\bin\x64\Rscript.exe" script.R, I get an error similar to
Error in library(data.table) : there is no package called 'data.table'
Execution halted
How can I set the .libPaths via Command Prompt to point to the correct location of the packages (i.e. to C:\Users\username\Documents\R\R-3.5.1\library)?
Thank you in advance.
Disclaimer: I'm unfamiliar with R.
From the R help page "Search Paths for Packages" (?.libPaths):
The library search path is initialized at startup from the environment variable R_LIBS (which should be a colon-separated list of directories at which R library trees are rooted) followed by those in environment variable R_LIBS_USER. Only directories which exist at the time will be included.
By default R_LIBS is unset, and R_LIBS_USER is set to directory ‘R/R.version$platform-library/x.y’ of the home directory (or ‘Library/R/x.y/library’ for CRAN macOS builds), for R x.y.z.
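Because R silently drops library directories that do not exist at startup, a typo in R_LIBS only shows up later as the "there is no package called ..." error seen throughout these questions. A hedged bash sketch (the function name build_r_libs is hypothetical) that joins only the existing directories into an R_LIBS value and reports the missing ones. It uses the colon separator of Linux/macOS; on Windows the separator is a semicolon (.Platform$path.sep).

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an R_LIBS value from the directories given as
# arguments, keeping only those that exist (R would ignore the others anyway)
# and warning on stderr about each missing one.
build_r_libs() {
  local dir joined=""
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      joined="${joined:+$joined:}$dir"
    else
      echo "warning: R library directory not found: $dir" >&2
    fi
  done
  printf '%s\n' "$joined"
}
```

Typical use would be `export R_LIBS="$(build_r_libs "$HOME/R/packages" /some/shared/library)"` before calling Rscript, with both paths replaced by your own.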
An environment variable can be created with the batch command set VARIABLE_NAME=YOUR_VALUE.
So your batch file should probably be something like this:
cd /d "C:\INSERT_PATH_TO_DIRECTORY_CONTAINING_script.R"
set "R_LIBS=C:\Users\username\Documents\R\R-3.5.1\library"
"C:\Program Files\R\R-3.5.1\bin\x64\Rscript.exe" script.R
However, for portability reasons (let's say a colleague asks for a copy of your script, or your computer dies), I suggest putting the script, the R library and the batch file in a single directory, let's say C:\Users\username\Documents\R. The batch file C:\Users\username\Documents\R\script.bat becomes:
cd /d "%~dp0"
set "R_LIBS=%~dp0R-3.5.1\library"
"%PROGRAMFILES%\R\R-3.5.1\bin\x64\Rscript.exe" "%~dpn0.R"
The %PROGRAMFILES% environment variable expands to the full path of the Program Files folder, the %~dp0 parameter expands to the full path of the directory that holds your batch file, and %~dpn0 expands to the batch file's full path without its extension.
Notice that %~dp0R-3.5.1 is not a typo: %~dp0 already includes a trailing backslash.
This way you can copy C:\Users\username\Documents\R to D:\Users\SOMEOTHERNAME\Documents\R and the script will still run.
If you create another version of your script, just copy the batch file so that it has the same file name as your script, with a .bat extension instead of .R, and it will call the new script; this has proven very handy when debugging and distributing scripts.
Alternatively, if you would rather install libraries separately, you may want to use %HOMEDRIVE%%HOMEPATH%, which expands to C:\Users\username.
Extracting the proper Documents folder path, as well as the R installation path, is possible but requires reading the registry and is therefore a bit more complicated.
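On Linux or macOS (for example on a cluster), the same "launcher next to the script" idea can be sketched in bash. This is an adaptation, not part of the original answer: the function names are hypothetical, and the layout (launcher, .R file and an R-3.5.1/library folder side by side) mirrors the batch example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: absolute directory of a given file,
# similar to %~dp0 but without the trailing slash.
launcher_dir() {
  (cd "$(dirname "$1")" && pwd)
}

# Hypothetical helper: same path with the extension replaced by .R,
# similar to "%~dpn0.R". Dots in directory names are left alone.
matching_r_script() {
  local base="${1##*/}"          # file name only
  local dir="${1%"$base"}"       # directory part, possibly empty
  printf '%s%s.R\n' "$dir" "${base%.*}"
}
```

A launcher script.sh stored next to script.R could then run `dir=$(launcher_dir "$0")`, `export R_LIBS="$dir/R-3.5.1/library"` and `exec Rscript "$(matching_r_script "$0")"`, so copying and renaming the launcher works the same way as with the batch file.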

R: Startup script on cluster

I would like R to run a script any time I start R. I am using R on a Linux cluster, and I do not have access to the R installation or its .Rprofile file; I cannot find it either. I can run R and use my local packages/libraries. For example, I would like to set the library paths each time R is started.
.libPaths("path")
Is it possible to define and set a path for a custom startup script file?
In a non-interactive session, I could do in Bash something like
Rscript script.R
and my script.R has
.libPaths('path')
...
more code
...
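One way to get a custom startup file without touching the installation is the R_PROFILE_USER environment variable: when it is set, R uses that file as the user profile instead of searching for .Rprofile in the current and home directories (see ?Startup), and Rscript reads it too unless started with --vanilla. Creating ~/.Rprofile yourself would also work; R_PROFILE_USER additionally lets you choose the file's location. A sketch with a hypothetical helper name and hypothetical paths:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an R startup file that prepends LIB_DIR to the
# library search path, and point R_PROFILE_USER at it for this shell.
# Usage: setup_r_startup FILE LIB_DIR
setup_r_startup() {
  local startup_file="$1" lib_dir="$2"
  # R code executed at the start of every R/Rscript session.
  printf '.libPaths(c("%s", .libPaths()))\n' "$lib_dir" > "$startup_file"
  # R reads the file named here instead of looking for .Rprofile.
  export R_PROFILE_USER="$startup_file"
}
```

Calling `setup_r_startup "$HOME/my_startup.R" "$HOME/R/library"` once, and keeping `export R_PROFILE_USER="$HOME/my_startup.R"` in ~/.bashrc or in the job script, should make the .libPaths() call run in both interactive R and Rscript sessions. The installation's own R_HOME/etc/Rprofile.site, if any, still runs first.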

Rscript: There is no package called ...?

I want to run R files in batch mode using Rscript, however it does not seem to be loading the libraries that I need. The specific error I am getting is:
Error in library(timeSeries) : there is no package called 'timeSeries'
Execution halted
However, I do have the package timeSeries and can load it from RStudio, RGui, and R from the command line without any problem. The issue seems to occur only when running a script using Rscript.
My system/environment variables are configured as:
C:\Program Files\R\R-3.1.0\bin\x64 (Appended to PATH)
R_HOME = C:\Program Files\R\R-3.1.0
R_User = Patrick
I am running the same version of R in RStudio, RGui, and R from command line. I've also checked .Library from these three sources and got the same output as well.
How can I run Rscript from command line with the packages that I am using (and have installed) in R?
EDIT:
I am using Rscript via Rscript script.r at the windows command line in the directory where script.r is located.
The output of Rscript -e print(.Library) is [1] "C:/PROGRA~1/R/R-31~1.0/library"
which is consistent with the other three options that I mentioned: [1] "C:/PROGRA~1/R/R-31~1.0/library"
However, if I put this in my script:
print(.libPaths())
library(timeSeries) #This is the package that failed to load
I get an output of:
[1] "C:/Program Files/R/R-3.1.0/library"
Error in library(timeSeries) : there is no package called 'timeSeries'
Execution halted
The corresponding call in RStudio gives an additional path to where the package is actually installed:
> print(.libPaths())
[1] "C:/Users/Patrick/Documents/R/win-library/3.1" "C:/Program Files/R/R-3.1.0/library"
In short, the value returned by calling Sys.getenv('R_LIBS_USER') in R.exe needs to be the same as the value returned by calling this at the command line:
Rscript.exe -e "Sys.getenv('R_LIBS_USER')"
and the above value needs to be included in this command line call:
Rscript.exe -e ".libPaths()"
Note that the values of R_LIBS_USER may differ between R.exe and Rscript.exe if the value of R_USER is changed, either in the .Rprofile or in the target field of the user's shortcut to R.exe, and in general, I find that the user library (i.e. .libPaths()[2]) is simply not set in Rscript.exe.
Since I'm fond of setting R_USER to my USERPROFILE, I include the following block at the top of .R files that I wish to run on multiple computers, or in Rscript.exe's .Rprofile (i.e. Rscript -e "path.expand('~/.Rprofile')"):
# =====================================================================
# For compatibility with Rscript.exe:
# =====================================================================
if(length(.libPaths()) == 1){
  # We're in Rscript.exe
  possible_lib_paths <- file.path(Sys.getenv(c('USERPROFILE','R_USER')),
                                  "R","win-library",
                                  paste(R.version$major,
                                        substr(R.version$minor,1,1),
                                        sep='.'))
  indx <- which(file.exists(possible_lib_paths))
  if(length(indx)){
    .libPaths(possible_lib_paths[indx[1]])
  }
  # CLEAN UP
  rm(indx,possible_lib_paths)
}
# =====================================================================
As mentioned in the comments, it seems Rscript doesn't recognize the library path defaults automatically. I am writing an R script that needs to be source-able from the command line on different people's computers, so I came up with this more general workaround:
First, store the default library path in a variable (code run by Rscript can find this path; it just doesn't use it automatically).
Then pass that path to the library() call through the lib.loc argument.
This should work regardless of what the path is on a given computer.
library.path <- .libPaths()
library("timeSeries", lib.loc = library.path)
Thanks again to @flodel above for putting me on the right path.
This answer will not help the original asker (pbreach), but it may help someone else who stumbles across this question and has a similar problem to me.
I have many bash .sh script files which call RScript to execute .R files. My operating system is Windows 10, and I execute those bash files using cygwin.
Everything had been working fine until yesterday, when I finally upgraded my R from Revolution R 8.0.1 beta to Microsoft R Open 3.4.1. After that upgrade, every bash script that called RScript failed due to the exact same reason asked here (e.g. Error in library(zoo) : there is no package called 'zoo').
Investigation revealed that RScript actually worked fine if called from a DOS shell instead of from a cygwin bash shell.
For example, if I execute this in a DOS shell
C:\Progra~1\Microsoft\ROpen~1\R-3.4.1\bin\x64\Rscript.exe -e ".libPaths()"
I see the output
[1] "C:/Users/HaroldFinch/Documents/R/win-library/3.4"
[2] "C:/Program Files/Microsoft/R Open/R-3.4.1/library"
I eventually discovered the reason. As explained in the R FAQ, to define its home directory, R will first use the R_USER environment variable if defined, else it will use HOME environment variable if defined, else it will use the Windows "personal" directory.
My Windows configuration does not define either R_USER or HOME environment variables. So, in the DOS shell case, R uses my Windows "personal" directory (C:/Users/HaroldFinch/Documents). That is good, because that is where all my libraries are installed (C:/Users/HaroldFinch/Documents/R/win-library/3.4).
In contrast, cygwin defines and exports a HOME environment variable that points to my cygwin user directory, which lacks any R stuff. Hence, RScript called from cygwin had a wrong R home directory, and so failed to load libraries.
There are probably many ways to solve this. I decided to have my bash script set a R_USER environment variable which points to my Windows user directory.
For example, if I execute this in a cygwin bash shell:
R_USER="C:/Users/HaroldFinch/Documents"
export R_USER
/cygdrive/c/Progra~1/Microsoft/ROpen~1/R-3.4.1/bin/x64/Rscript.exe -e ".libPaths()"
I see the output
[1] "C:/Users/HaroldFinch/Documents/R/win-library/3.4"
[2] "C:/Program Files/Microsoft/R Open/R-3.4.1/library"
which is exactly the same output now as the DOS shell example above.
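A variant of this fix that leaves an already-defined R_USER alone, so the same bash script can run on machines with different user folders. The fallback path is the one from this answer and is only a placeholder; RSCRIPT_EXE is a hypothetical variable holding the Rscript path used above.

```bash
#!/usr/bin/env bash
# Use an existing R_USER if the caller set one; otherwise fall back to the
# Windows Documents folder from the answer (placeholder, adjust per machine).
: "${R_USER:=C:/Users/HaroldFinch/Documents}"
export R_USER

# Hypothetical variable holding the Rscript path used in the answer.
RSCRIPT_EXE="/cygdrive/c/Progra~1/Microsoft/ROpen~1/R-3.4.1/bin/x64/Rscript.exe"

if [ -x "$RSCRIPT_EXE" ]; then
  "$RSCRIPT_EXE" -e ".libPaths()"
else
  echo "Rscript not found at $RSCRIPT_EXE" >&2
fi
```

Another user can then run the script as `R_USER="C:/Users/SomeoneElse/Documents" ./myscript.sh` without editing it.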
Another cause is packrat. If you are running with packrat, RStudio turns it on for you when you open the project. RScript does not, so you need a packrat::on() early in your script (before the library calls).
As the others have already pointed out, the problem is that Rscript.exe cannot recognise the win-library folder. The easiest solution for me was to explicitly set the path to the library folder by adding:
.libPaths("C:/Users/Benutzer1/Documents/R/win-library/4.0")
to my program. Then it loads all the packages from the win-library folder and it is still capable of loading packages from the standard library folder.
