Running Rscript in Bash on Windows - r

I'm writing a Git hook which should run some R code. If the hook starts with
#!/usr/bin/env Rscript
then the following code is correctly interpreted as R.
However, Rscript looks for my installed packages in my home directory. According to the R for Windows FAQ, the home directory is defined by the environment variables R_USER or HOME. However, the Git Bash shell does not contain either of these, so the home directory defaults to the working directory.
The hook, therefore, fails to locate any non-base packages.
The solution I've found is to create an R file and then use Bash in the hook to call Rscript after manually defining R_USER:
#!/bin/sh
R_USER="my/home/directory"
export R_USER
Rscript foo.R
(Where foo.R is in the project's working directory, in this case).
This works but is somewhat inelegant. I'd rather simply use Rscript in the hook itself.
So I considered setting the home directory within the Rscript:
#!/usr/bin/env Rscript
Sys.setenv(R_USER = "my/home/directory")
But while that does set R_USER, it doesn't fix the problem. I assume this means that R defines the package-finding path before the script itself is run, so defining R_USER within the script doesn't change the fact that it's still looking for packages in the working directory instead.
So, is there a solution to this (for example, by setting R_USER and then somehow telling R to update the package directory)? Or is the use-Bash-to-call-Rscript method the way to go?

Related

Prevent R in conda environment from running `~/.Rprofile`

Problem
I have a conda environment with an R installation. My issue is that the conda R will run ~/.Rprofile during startup. This is breaking the expectation that conda environments are self-contained. Specifically, I am loading packages in my ~/.Rprofile that are not installed in the conda R (I am using require(), so just warnings). The process works the following way:
The first .Rprofile file found on the R startup search path is processed. The search path is (in order): (i) Sys.getenv("R_PROFILE_USER"), (ii) ./.Rprofile, and (iii) ~/.Rprofile. Source: startup package vignette
Goal
Ideally, I would like to alter the third path to a location within the environment directory and somehow do this in the environment yaml during setup, such that I can easily replicate the setup on another device. I realize that this might not work, so a solution that permanently sets R_PROFILE_USER to an environment-specific location would also be appreciated.
Edit:
Since I am using R through rpy2 I don't think I can use the --no-init-file flag.
Quoting from ?.Rprofile, which invokes the man page for Startup,
unless --no-init-file was given, R searches for a user profile, a file
of R code. The path of this file can be specified by the
R_PROFILE_USER environment variable (and tilde expansion will be
performed). If this is unset, a file called ‘.Rprofile’ is searched
for in the current directory or in the user's home directory (in that
order). The user profile file is sourced into the workspace.
I wrote a bash script to solve this issue. Run it in the project directory containing the environment.yml. It prompts for a name of the newly created conda environment, then creates and activates it. Subsequently an empty .Rprofile is created in the environment directory and R_PROFILE_USER is set to this location. Finally, the environment is reactivated for the change to take effect. Thus, any time R is run from this environment, the newly created .Rprofile is used.
It should be noted that an .Rprofile in R_PROFILE_USER takes precedence over an .Rprofile in the project directory. This might lead to confusion if the user wants to create use such a file and is unaware of the setup.
echo What name do you want to give to the conda environment?
read input
conda env create -n $input -f environment.yml
conda activate $input
touch $CONDA_PREFIX/.Rprofile
conda env config vars set R_PROFILE_USER=$CONDA_PREFIX
conda activate $input

Execute a script from the exec directory

In R packages there can be a directory exec which contains some executable scripts. I have such a script called json_merge.R in my package numericprojection. This gets installed to ~/R/x86_64-redhat-linux-gnu-library/3.6/numericprojection/exec/json_merge.R.
To execute it I can of course specify that particular path and call it with Rscript from the command line. I was wondering whether there is some way to have R resolve this path such that I could just specify json_merge.R and numericprojection.
In the meantime I constructed this here:
r_libs_user="$(Rscript -e "cat(Sys.getenv('R_LIBS_USER'))")"
script="$r_libs_user/numericprojection/exec/projected_merge.R"
script="${script/#\~/$HOME}" # https://stackoverflow.com/a/27485157/653152
"$script"
That's what the system.file command is for. In your case that command should look like this:
system.file("exec", "json_merge.R", package = "numericprojection")
And will return:
~/R/x86_64-redhat-linux-gnu-library/3.6/numericprojection/exec/json_merge.R
If that is where the file was installed.
However, I think that your question is likely based on a misunderstanding as outlined in the comments.

Setting .libPaths() For Running R Scripts From Command Line Using Rscript.exe

I am trying to run R scripts via BAT files on Windows Command Prompt.
The scripts require a few R packages such as data.table, tidyR, etc.
For operational reasons, all required R packages and dependencies (including data.table) are installed at C:\Users\username\Documents\R\R-3.5.1\library. I am not allowed to install RStudio in this environment.
When I try
"C:\Program Files\R\R-3.5.1\bin\x64\Rscript.exe" script.R, I get an error similar to
Error in library(data.table) : there is no package called 'data.table'
Execution halted
How can I set the .libPaths via Command Prompt to point to the correct location of the packages (i.e. to C:\Users\username\Documents\R\R-3.5.1\library)?
Thank you in advance.
Disclaimer: I'm unfamiliar with R.
From R: Search paths :
The library search path is initialized at startup from the environment
variable R_LIBS (which should be a colon-separated list of directories
at which R library trees are rooted) followed by those in environment
variable R_LIBS_USER. Only directories which exist at the time will be
included.
By default R_LIBS is unset, and R_LIBS_USER is set to directory
‘R/R.version$platform-library/x.y’ of the home directory (or
‘Library/R/x.y/library’ for CRAN macOS builds), for R x.y.z.
An environment variable can be created with set VARIABLE_NAME=YOUR_VALUE batch command.
So your batch file should probably be something like this:
cd /d "C:\INSERT_PATH_TO_DIRECTORY_CONTAINING_script.R"
set "R_LIBS=C:\Users\username\Documents\R\R-3.5.1\library"
"C:\Program Files\R\R-3.5.1\bin\x64\Rscript.exe" script.R
However for portability reasons (let's say a collegue asks for a copy of your script or your computer dies) I suggest putting the script, R library and batch file in a single directory, let's say C:\Users\username\Documents\R. The batch file C:\Users\username\Documents\R\script.bat becomes:
cd /d "%~dp0"
set "R_LIBS=%~dp0R-3.5.1\library"
"%PROGRAMFILES%\R\R-3.5.1\bin\x64\Rscript.exe" "%~dpn0.R"
%PROGRAMFILES% environment variable expands to full path of program files folder, %~dp0 parameter expands to full path of a directory that holds your batch file, and %~dpn0 is a batch-file full path without extension.
Notice that %~dp0R-3.5.1 is not a typo because %~dp0 includes trailing backslash.
This way you can copy C:\Users\username\Documents\R to D:\Users\SOMEOTHERNAME\Documents\R and the script will still run.
If you create another version of your script, just copy the batch file so that it has same filename as your script but .bat extension instead of .R and it should call the new script - this has proven to be very handy when debugging and distributing scripts.
Alternatively, if you would rather install libraries separately you may want to use %HOMEDRIVE%%HOMEPATH% which expands to C:\Users\username.
Extracting proper Documents folder path, as well as R installation path is possible but requires reading the registry and thus is a bit more complicated.

How does R system() recognize command path?

How does R's built-in system() function know where to look to invoke some arbitrary OS command specified by the command argument? For example, if I homebrew install some_command_line_program, how does R's system() function know where it is located when I type:
cmd <- "some_complicated_code_from_some_command_line_program"
system(cmd, wait=FALSE)
In other words, how is R smart enough to know where to look without any user input? If I compile from source via Github (instead of homebrew install), would system() also recognize the command?
What system does depends on your OS, you've not told us (although you've given us some clues...).
On unix-alike systems, it executes it as a command in a bash shell, which searches for the first match in the directories on the $PATH environment variable. You can see what that is in R:
> Sys.getenv("PATH")
[1] "/usr/local/heroku/bin:/usr/local/heroku/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/nobackup/rowlings/NEST4B"
In Windows, it does something else.
You can get a full path to whatever it might run with Sys.which, which uses the systems' which command on unixes and fakes it on Windows. Read the help for more.
If you compile something from source then it will be found if the file that runs the command (a shell script, an executable, a #!-script in any language) is placed in a folder in your $PATH. You can create a folder yourself, say /home/user/bin, put your executables in there, add that to your $PATH, and (possibly after logging out an in again, or restarting R, or just firing up a new shell...) then R will find it.

Rscript: There is no package called ...?

I want to run R files in batch mode using Rscript, however it does not seem to be loading the libraries that I need. The specific error I am getting is:
Error in library(timeSeries) : there is no package called 'timeSeries'
Execution halted
However I do have the package timeSeries and can load it from Rstudio, RGui, and R from the command line no problem. The issue seems to only be when running a script using Rscript.
My system/environment variables are configured as:
C:\Program Files\R\R-3.1.0\bin\x64 (Appended to PATH)
R_HOME = C:\Program Files\R\R-3.1.0
R_User = Patrick
I am running the same version of R in RStudio, RGui, and R from command line. I've also checked .Library from these three sources and got the same output as well.
How can I run Rscript from command line with the packages that I am using (and have installed) in R?
EDIT:
I am using Rscript via Rscript script.r at the windows command line in the directory where script.r is located.
The output of Rscript -e print(.Library) is [1] "C:/PROGRA~1/R/R-31~1.0/library"
which is consistent with the other three options that I mentioned: [1] "C:/PROGRA~1/R/R-31~1.0/library"
However, if I put this in my script:
print(.libPaths())
library(timeSeries) #This is the package that failed to load
I get an output of:
[1] "C:/Program Files/R/R-3.1.0/library"
Error in library(timeSeries) : there is no package called 'timeSeries'
Execution halted
The corresponding call in RStudio gives an additional path to where the package is actually installed:
> print(.libPaths())
[1] "C:/Users/Patrick/Documents/R/win-library/3.1" "C:/Program Files/R/R-3.1.0/library"
In short, the value returned by calling Sys.getenv('R_LIBS_USER') in R.exe needs to be the same as the value returned by calling this at the command line:
Rscript.exe -e "Sys.getenv('R_LIBS_USER')"
and the above value needs to be included in this command line call:
Rscript.exe -e ".libPaths()"
Note that the values of R_LIBS_USER may be differ between R.exe and Rscript.exe if the value of R_USER is changed, either in the .Rprofile or the in target field of user's shortcut to R.exe, and in general, I find that the user library (i.e. .libPaths()[2]) is simply not set in Rscript.exe
Since I'm fond of setting R_USER to my USERPROFILE, I include the following block in at the top of .R files that I wish to run on mulitiple computers or in Rscript.exe's .Rprofile (i.e. Rscript -e "path.expand('~/.Rprofile')"):
# =====================================================================
# For compatibility with Rscript.exe:
# =====================================================================
if(length(.libPaths()) == 1){
# We're in Rscript.exe
possible_lib_paths <- file.path(Sys.getenv(c('USERPROFILE','R_USER')),
"R","win-library",
paste(R.version$major,
substr(R.version$minor,1,1),
sep='.'))
indx <- which(file.exists(possible_lib_paths))
if(length(indx)){
.libPaths(possible_lib_paths[indx[1]])
}
# CLEAN UP
rm(indx,possible_lib_paths)
}
# =====================================================================
As mentioned in the comments, it seems Rscript doesn't recognize the library path defaults automatically. I am writing an R script that needs to be source-able from the command line on different people's computers, so I came up with this more general workaround:
First store the default library path in a variable (Rscript-sourced functions can find this, they just don't automatiocally)
Then include that path in the library() call with lib.loc = argument.
This should work regardless of what the path is on a given computer.
library.path <- .libPaths()
library("timeseries", lib.loc = library.path)
Thanks again to #flodel above for putting me on the right path
This answer will not help the original asker (pbreach), but it may help someone else who stumbles across this question and has a similar problem to me.
I have many bash .sh script files which call RScript to execute .R files. My operating system is Windows 10, and I execute those bash files using cygwin.
Everything had been working fine until yesterday, when I finally upgraded my R from Revolution R 8.0.1 beta to Microsoft R Open 3.4.1. After that upgrade, every bash script that called RScript failed due to the exact same reason asked here (e.g. Error in library(zoo) : there is no package called 'zoo').
Investigation revealed that RScript actually worked fine if called from a DOS shell instead of from a cygwin bash shell.
For example, if I execute this in a DOS shell
C:\Progra~1\Microsoft\ROpen~1\R-3.4.1\bin\x64\Rscript.exe -e ".libPaths()"
I see the output
[1] "C:/Users/HaroldFinch/Documents/R/win-library/3.4"
[2] "C:/Program Files/Microsoft/R Open/R-3.4.1/library"
I eventually discovered the reason. As explained in the R FAQ, to define its home directory, R will first use the R_USER environment variable if defined, else it will use HOME environment variable if defined, else it will use the Windows "personal" directory.
My Windows configuration does not define either R_USER or HOME environment variables. So, in the DOS shell case, R uses my Windows "personal" directory (C:/Users/HaroldFinch/Documents). That is good, because that is where all my libraries are installed (C:/Users/HaroldFinch/Documents/R/win-library/3.4).
In contrast, cygwin defines and exports a HOME environment variable that points to my cygwin user directory, which lacks any R stuff. Hence, RScript called from cygwin had a wrong R home directory, and so failed to load libraries.
There are probably many ways to solve this. I decided to have my bash script set a R_USER environment variable which points to my Windows user directory.
For example, if I execute this in a cygwin bash shell:
R_USER="C:/Users/HaroldFinch/Documents"
export R_USER
/cygdrive/c/Progra~1/Microsoft/ROpen~1/R-3.4.1/bin/x64/Rscript.exe -e ".libPaths()"
I see the output
[1] "C:/Users/HaroldFinch/Documents/R/win-library/3.4"
[2] "C:/Program Files/Microsoft/R Open/R-3.4.1/library"
which is exactly the same output now as the DOS shell example above.
Another cause is packrat. If you are running with packrat, RStudio turns it on for you when you open the project. RScript does not, so you need a packrat::on() early in your script (before the library calls).
As the others have already pointed out, the problem is that Rscript.exe cannot recognise the win-library folder. The easiest solution for me was to explicitly set the path to the library folder by adding:
.libPaths("C:/Users/Benutzer1/Documents/R/win-library/4.0")
to my program. Then it loads all the packages from the win-library folder and it is still capable of loading packages from the standard library folder.

Resources