Rscript not working with packaged R for AWS Lambda - r

I'm trying to run an R script on the command line of an AWS EC2 instance using packaged R binaries and libraries (without installation) -- the point is to test the script for deployment to AWS Lambda. I followed these instructions, which package all the R binaries and libraries into a zip file and move everything to an Amazon EC2 instance for testing. I unzipped everything on the new machine, ran 'sudo yum update' on the machine, and set R's environment variables to point to the proper location:
export R_HOME=$HOME
export LD_LIBRARY_PATH=$HOME/lib
NOTE: $HOME is equal to /home/ec2-user.
I created this hello_world.R file to test:
#!/home/ec2-user/bin/Rscript
print ("Hello World!")
But when I ran this:
ec2-user$ Rscript hello_world.R
I got the following error:
Rscript execution error: No such file or directory
So I checked the path, but everything checks out:
ec2-user$ whereis Rscript
Rscript: /home/ec2-user/bin/Rscript
ec2-user$ whereis R
R: /home/ec2-user/bin/R /home/ec2-user/R
But when I tried to evaluate an expression using Rscript at the command line, I got this:
ec2-user$ Rscript -e "" --verbose
running
'/usr/lib64/R/bin/R --slave --no-restore -e '
Rscript execution error: No such file or directory
It seems Rscript is still looking for R in the default location '/usr/lib64/R/bin/R' even though my R_HOME variable is set to '/home/ec2-user':
ec2-user$ echo $R_HOME
/home/ec2-user
I've found sprinkles of support, but I can't find anything that addresses my specific issue. Some people have suggested reinstalling R, but my understanding is, for the purposes of Lambda, everything needs to be self-contained so I installed R on a separate EC2 instance, then packaged it up. I should mention that everything runs fine on the machine where R was installed with the package manager.
SOLUTION: Posted my solution in the answers.

I think it is staring at you right there:
ec2-user$ whereis R
R: /home/ec2-user/bin/R /home/ec2-user/R
is where you put R -- however it was built for / expects this:
ec2-user$ Rscript -e "" --verbose
running
'/usr/lib64/R/bin/R --slave --no-restore -e '
These paths are not the same. The real error may be your assumption that you could just relocate the built and configured R installation to a different directory. You can't.
You could build R for the new (known) path and install that. On a system where the configured-for and installed-at path are the same, all is good:
$ Rscript -e "q()" --verbose
running
'/usr/lib/R/bin/R --slave --no-restore -e q()'
$
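A quick way to see this mismatch on your own machine is to compare the R path that Rscript was built to call with the R that is actually on your PATH. The sketch below is only an illustration: extract_r_path is a hypothetical helper name, and it assumes the --verbose output has the same shape as shown above (a quoted command line whose first word is the path to R).

```shell
#!/bin/sh
# Hypothetical helper: read Rscript's --verbose output on stdin and print
# the path of the R front end that Rscript is about to run.
extract_r_path() {
    # The line of interest looks like: '/usr/lib64/R/bin/R --slave ... -e '
    # (possibly indented), so keep the first word after the opening quote.
    sed -n "s/^[[:space:]]*'\([^ ]*\) .*/\1/p" | head -n 1
}

if command -v Rscript >/dev/null 2>&1; then
    # Merge stderr into stdout so the "running ..." lines are captured
    # whichever stream they are written to.
    expected=$(Rscript --verbose -e "" 2>&1 | extract_r_path)
    actual=$(command -v R || echo "not on PATH")
    echo "Rscript will call: $expected"
    echo "R found on PATH:   $actual"
else
    echo "Rscript is not on PATH; nothing to compare"
fi
```

If the two paths differ, you are in the situation described in the question: the binary was configured for one location and unpacked into another.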

This blog post walks through a similar problem and offers a potential solution. I also had to implement part of the solution from this post.
I changed the first assignment in R's shell wrapper script from this:
#!/bin/sh
# Shell wrapper for R executable.
R_HOME_DIR=${R_ROOT_DIR}/lib64${R_ROOT_DIR}
To this:
R_HOME_DIR=${RHOME}/lib64${R_ROOT_DIR}
I'll explain why below.
NOTE -- The rest of the code is:
if test "${R_HOME_DIR}" = "${R_ROOT_DIR}/lib64${R_ROOT_DIR}"; then
case "linux-gnu" in
linux*)
run_arch=`uname -m`
case "$run_arch" in
x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
libnn=lib64
libnn_fallback=lib
;;
*)
libnn=lib
libnn_fallback=lib64
;;
esac
if [ -x "${R_ROOT_DIR}/${libnn}${R_ROOT_DIR}/bin/exec${R_ROOT_DIR}" ]; then
R_HOME_DIR="${R_ROOT_DIR}/${libnn}${R_ROOT_DIR}"
elif [ -x "${R_ROOT_DIR}/${libnn_fallback}${R_ROOT_DIR}/bin/exec${R_ROOT_DIR}" ]; then
R_HOME_DIR="${R_ROOT_DIR}/${libnn_fallback}${R_ROOT_DIR}"
## else -- leave alone (might be a sub-arch)
fi
;;
esac
fi
if test -n "${R_HOME}" && \
test "${R_HOME}" != "${R_HOME_DIR}"; then
echo "WARNING: ignoring environment value of R_HOME"
fi
R_HOME="${R_HOME_DIR}"
export R_HOME
You can see at the bottom, the code sets R_HOME equal to R_HOME_DIR, which it originally assigned based on R_ROOT_DIR.
No matter what you set the R_HOME_DIR or R_HOME variable to, R resets everything using the R_ROOT_DIR variable.
With the change, I can set all my environment variables:
export RHOME=$PWD/R #/home/ec2-user/R
export R_HOME=$PWD/R #/home/ec2-user/R
export R_ROOT_DIR=/R #/R
I set RHOME to my working directory where the R package sits. RHOME basically acts as a prefix, in my case, it's /home/ec2-user/.
Also, Rscript appends /R/bin to whatever RHOME is, so now I can properly run...
Rscript hello_world.R
...on the command line. Rscript knows where to find R, which knows where to find all its stuff.
I feel like packaging up R to run in a portable self-contained folder, without using Docker or something, should be easier than this, so if anyone has a better way of doing this, I'd really appreciate it.
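The exports above can be gathered into a small launcher so they do not have to be retyped in every session. This is a sketch under assumptions, not a tested Lambda deployment: setup_bundled_r is a hypothetical name, the bundle is assumed to be unpacked so that R lives in <bundle>/R and the shared libraries in <bundle>/lib (as in the question), and it only helps if the R wrapper has been patched to read RHOME as described.

```shell
#!/bin/sh
# Hypothetical helper: point the environment at an R tree unpacked under $1.
setup_bundled_r() {
    bundle_dir=$1
    export RHOME="$bundle_dir/R"     # prefix used by the patched wrapper
    export R_HOME="$bundle_dir/R"    # kept equal to RHOME, as in the answer
    export R_ROOT_DIR="/R"           # suffix the wrapper appends to build paths
    # Put the bundled shared libraries first, keeping any existing entries.
    export LD_LIBRARY_PATH="$bundle_dir/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

setup_bundled_r "$PWD"
echo "RHOME=$RHOME"

# Only try to run the test script if both Rscript and the script exist.
if command -v Rscript >/dev/null 2>&1 && [ -f hello_world.R ]; then
    Rscript hello_world.R || echo "Rscript failed; check the paths above"
fi
```

Passing the bundle directory as an argument, rather than hard-coding /home/ec2-user, keeps the same launcher usable on the test instance and wherever the bundle is unpacked later.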

Another, quicker method:
create the same folder that Rscript expects (for example /usr/lib/R/bin/),
then put R into this folder.

Related

R Markdown: Can't access Bash command installed through Conda/Anaconda

I'm exploring some bioinformatics data and I like to use R notebooks (i.e. Rmarkdown) when I can. Right now, I need to use a command line tool to analyze a VCF file and I would like to do it through a Bash code chunk in the Rmarkdown notebook.
The problem is that the command I want to use was installed with conda into my conda environment. The tool is bcftools. When I try to access this command, I get this error (code chunk commented out to show rmarkdown code chunk format):
#```{bash}
bcftools view -H test.vcf.gz
#```
/var/folders/9l/phf62p1s0cxgnzp4hgl7hy8h0000gn/T/RtmplzEvEh/chunk-code-6869322acde0.txt: line 3: bcftools: command not found
Whereas if I run from Terminal, I get output (using conda environment called "binfo"):
> bcftools view -H test.vcf.gz | head -n 3
chr10 78484538 . A C . PASS DP=57;SOMATIC;SS=2;SSC=16;GPV=1;SPV=0.024109 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:34:33:0:0%:0,33,0,0 0/1:.:23:19:4:17.39%:1,18,0,4
chr12 4333138 . G T . PASS DP=119;SOMATIC;SS=2;SSC=14;GPV=1;SPV=0.034921 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:72:71:1:1.39%:71,0,1,0 0/1:.:47:42:5:10.64%:42,0,5,0
chr15 75086860 . C T . PASS DP=28;SOMATIC;SS=2;SSC=18;GPV=1;SPV=0.013095 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:15:15:0:0%:4,11,0,0 0/1:.:13:8:5:38.46%:5,3,1,4
(binfo)
So, how do I access tools installed with conda/in my conda env from an R notebook/Rmarkdown bash code chunk? I searched for quite a while and could not find anyone talking about running conda commands in a shell chunk in Rmarkdown. Any help would be appreciated because I like the R notebook format for exploratory analysis.
Passing Arguments to Engines
If your Conda is properly configured to work in bash, then you can use engine.opts to tell bash to launch in login mode (i.e., source your .bash_profile (Mac) or .bashrc (Linux)):
bash
```{bash engine.opts='-l'}
bcftools view -H test.vcf.gz
```
zsh
If working with zsh (e.g., Mac OS 10.15 Catalina users), then the interactive flag, --interactive|-i is what you want (Credit: @Leo).
```{zsh engine.opts='-i'}
bcftools view -H test.vcf.gz
```
Again, this presumes you've previously run conda init zsh to set up Conda to work with the shell.
Note on Reproducibility
Since reproducibility is usually a concern in scientific work, I will add that you may want to do something to capture the state of your Conda environment. For example, if you are working in version control, then commit a conda env export > environment.yaml. Another option would be to output that info directly at the end of the Rmd, like what is usually done with sessionInfo(). That is,
```{bash engine.opts='-l', comment=NA}
conda env export
```
where the comment=NA is so that the output can be cleanly copied from the rendered version.
Quick solution for bash: add the following init snippet to the top of your Bash scripts.
eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)"
# you may need to activate the "base" environment explicitly
conda activate base
Detail
When you open your terminal, an interactive shell is spawned, but your script runs in a non-interactive shell. A non-interactive Bash does not read ~/.bashrc, so the conda initialization is skipped and your "base" environment is never added to PATH.
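To see the difference for yourself, you can ask the shell whether it is interactive and whether a given tool can be resolved. The sketch below uses only POSIX shell features; shell_mode and have_tool are hypothetical helper names, and bcftools is simply the tool from the question.

```shell
#!/bin/sh
# Report whether the current shell is interactive: "$-" holds the active
# option flags and contains "i" only in an interactive shell.
shell_mode() {
    case "$-" in
        *i*) echo "interactive" ;;
        *)   echo "non-interactive" ;;
    esac
}

# Succeed if the shell can resolve the named command
# (a PATH lookup, a builtin, or a function).
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

echo "This shell is $(shell_mode)"
if have_tool bcftools; then
    echo "bcftools found at $(command -v bcftools)"
else
    echo "bcftools cannot be resolved; the conda hook probably has not run"
fi
```

Running this once in a terminal and once inside a script or notebook chunk should make it clear which environment is missing the conda setup.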
References
Python - Activate conda env through shell script

executing file contents as commands under R from Linux terminal

I've written a text file containing a script for R. I've gotten it to run under Windows from a .bat file that runs the .txt file under R with CMD BATCH.
I'm trying to replicate that (minus the clickability) in the Linux terminal.
I've changed the permissions to allow execution, added a shebang to the file, and tried pointing the shebang at a few different programs, such as
#!/usr/bin/R
library(rvest)
library(plyr)
which returns an error "Syntax error near unexpected symbol 'rvest'"
and
#!/home/robert/Téléchargements/R-3.2.3/src/unix/Rscript.c
library(rvest)
library(plyr)
which also returns an error "Syntax error near unexpected symbol 'rvest'"
Separately, on both of these I changed the file extension from nothing to .R
In one case it gave the same error, in the other it started a session of R but didn't execute the commands.
I realise it's a messy question, but I'm having difficulty getting these ducks in a row.
Ultimately, this was what worked:
R < /home/robert/R/scraper1.R --no-save
But here's the rest of my answer in case that doesn't work for someone else:
I'm not sure if you've seen any of these, but here's some stuff to try:
Dupe?
The top answer from a very similar post: I'm going to assume you googled your question before posting it, so I'm sure you've already seen this, but it's not referenced in your question, so here it is [Source]:
Content of script.r:
#!/usr/bin/Rscript
cat("Hello")
Invocation from command line:
./script.r
?Rscript
?Rscript allows you to run R scripts in a Unix-esque system [Source]:
## example #! script for a Unix-alike
#! /path/to/Rscript --vanilla --default-packages=utils
args <- commandArgs(TRUE)
res <- try(install.packages(args))
if(inherits(res, "try-error")) q(status=1) else q()
Batch
Here's something from an old R mailing list thread [Source]:
Place the line: R --vanilla < foo.txt > foo.results into a file named foo.batch. No other text should be in the file.
Make this file executable via chmod 755 foo.batch
At the command line try at -f foo.batch now or perhaps, batch -f foo.batch.
If this does not work, ask your system administrator how to set up a batch process.
The advantage of the batch process is 1. you need not be logged in, 2. your job will take a lower priority than interactive jobs.
R -e
Loading two libraries and running an R command [Source]
R -e 'library("rmarkdown");library("knitr");rmarkdown::render("NormalDevconJuly.Rmd")'
R -e 'library("markdown");rpubsUpload("normalDev","NormalDevconJuly.html")'
Other
R < scriptname.R --no-save [Source]
> source("scriptname.R") (from inside an R session) [Source]

Rscript: command not found

I've been working with R for a while, always through RStudio. I just tried to run an Rscript command in the terminal (I have a Mac) and got this error:
>Rscript script.R
-bash: Rscript: command not found
When I tried to open R in the terminal I got the same error:
>R
-bash: R: command not found
I can run R code with RStudio and the R application, but I know there is a way to run R through the terminal.
Did I miss something when I installed R on my computer? Do I need to add R to my PATH?
Thanks in advance!
Steps to run R script through Windows command prompt
Set the PATH variable for Rscript.exe in the environment variables. Rscript.exe can be found inside the bin folder of the R installation; adding that folder to PATH lets you use the Rscript command in the Windows command prompt. To check whether Rscript.exe is on the PATH, type Rscript in the command prompt; a usage message should appear.
Go to the Command Prompt and change to the directory that contains your .R file.
Run the script. Here abcd.R is in the Documents folder, so I change to that folder and then run Rscript abcd.R
For those who stumbled upon this but use a Mac, you might find this useful. I recently downloaded and installed R and RStudio through the CRAN site rather than through Homebrew. Because I installed directly from the site, it DID NOT add the Rscript executable to my /usr/local/bin directory.
I have locate on my mac so I did a quick lookup:
locate RScript
And I found it here:
/Library/Frameworks/R.framework/Versions/4.0/Resources/bin/Rscript
What I had to do was create a symbolic link in my /usr/local/bin directory to get it to work:
cd /usr/local/bin
ln -s /Library/Frameworks/R.framework/Versions/4.0/Resources/bin/Rscript Rscript
Now I'm able to run Rscript through the command line. This may help someone else out there.
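If you would rather not create links in /usr/local/bin, an alternative is to add R's bin directory to PATH. This is a sketch rather than the method used above: prepend_path is a hypothetical helper, and the directory is the one found by locate above, so the version component (4.0) will differ on other installs.

```shell
#!/bin/sh
# Hypothetical helper: put a directory at the front of PATH unless it is
# already listed there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Directory taken from the locate output above; adjust the version as needed.
r_bin="/Library/Frameworks/R.framework/Versions/4.0/Resources/bin"
if [ -d "$r_bin" ]; then
    prepend_path "$r_bin"
    command -v Rscript || echo "Rscript still not found"
else
    echo "No R framework directory at $r_bin"
fi
```

To make the change permanent, the prepend_path call would go in your shell startup file; which file that is depends on the shell you use.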

Executing Shell Script via Automator

via Terminal/Totalterminal or iTerm, this script works very well:
cd ~/go/to/dir/ && R -e "shiny::runApp('/go/to/dir', launch.browser=TRUE)"
But when run as an app via Automator, the second part won't work.
In Automator: Run Shell Script.
What is the difference between the "normal" Terminal and the shell used by Automator?
Both use /bin/bash.
Is R on the PATH in Automator? If not, replace R with its full path, such as /usr/local/bin/R (shown by which R).
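Automator typically runs shell scripts with a much shorter PATH than an interactive Terminal session, so a lookup by bare name can fail there. One defensive sketch is to search a few likely directories and use the full path that is found. find_in_dirs is a hypothetical helper, and the candidate directories are assumptions about common macOS install locations, not something the question states.

```shell
#!/bin/sh
# Hypothetical helper: print the first executable called $1 found in the
# directories given as the remaining arguments; fail if none has it.
find_in_dirs() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Candidate directories are assumptions; extend the list for your machine.
if r_bin=$(find_in_dirs R /usr/local/bin /opt/homebrew/bin \
        /Library/Frameworks/R.framework/Resources/bin); then
    echo "Using $r_bin"
    # "$r_bin" -e "shiny::runApp('/go/to/dir', launch.browser=TRUE)"
else
    echo "R not found in the candidate directories"
fi
```

The launch line is left commented out so the snippet can be pasted into an Automator "Run Shell Script" action and checked before it starts the app.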

/usr/bin/env: RScript: No such file or directory | After recent R-3.0.1. installation.

I am a bit lost when dealing with installing and using R. I installed R 3.0.1 from source and did the ./configure, make, make check, and make install as suggested. However, when I tried running R, it said that R wasn't in the /usr/bin folder, so I copied the entire R-3.0.1/bin directory into my /usr/bin directory using cp. Now I'm getting a few errors regarding /usr/bin/env when trying to use RScript on a hello_world.R script I wrote from the O'Reilly book R in a Nutshell. The contents of hello_world.R are below:
#! /usr/bin/env RScript
print("Hello World!");
Simple enough, but when I try to load it I get the following error:
$ ./hello_world.R
/usr/bin/env: RScript: No such file or directory
I'm not sure if this is a PATH problem or something, but when I search in my /usr/bin directory I do see the RScript file in there along with (R, BATCH, and the others associated with R programming language). Any help is greatly appreciated. Cheers.
You may be using an invalid command line option for Rscript in your shebang line.
For instance ...
#!/usr/bin/env RScript --vanilla
remove "--vanilla" (or other offending option) and rerun your script
#!/usr/bin/env RScript
I know you didn't put this in your example, but the solution may help others searching for the same issue.
Again, the good solution to this problem is very simple and clearly explained in the man page of env. The script should use the env command to invoke Rscript and not Rscript directly:
#!/usr/bin/env Rscript
some R code now...
But a script like this will read the user's .Rprofile among other things. When we want to have a vanilla R session (in order to start with a clean and controlled R), we must pass the option --vanilla. If you try something like
#!/usr/bin/env Rscript --vanilla
some R code now...
env will take the string Rscript --vanilla as the command to execute and will inevitably return the error message
/usr/bin/env: ‘Rscript --vanilla’: No such file or directory
In env's man page, there is an option called -S for splitting the strings. Its role is exactly to solve the problem above and use the first string Rscript as the command name, and the following strings (like --vanilla) as options to pass to Rscript.
The solution is therefore:
#!/usr/bin/env -S Rscript --vanilla
some R code now...
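The failure mode is easy to reproduce without R. On Linux, everything after the interpreter path in a shebang line is handed over as a single argument, so env is asked to find a program literally named "Rscript --vanilla". The sketch below imitates that lookup with sh standing in for Rscript; lookup is a hypothetical helper, not part of env.

```shell
#!/bin/sh
# Hypothetical helper: report whether a string resolves to a command.
lookup() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "not found"
    fi
}

# A bare command name resolves normally.
echo "sh      -> $(lookup "sh")"
# Name and option glued into one word, as env sees it without -S.
echo "'sh -e' -> $(lookup "sh -e")"
```

With -S, env splits that single string back into a command name and its options before doing the lookup. Older env implementations may not support -S, so check env's man page on the target system first.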
Put in the shebang line of your script #!/usr/bin/Rscript and it should work.
As a side remark, if you want to keep up to date with the R versions from CRAN rather than rely on the native R of your Linux distro (Ubuntu), add the following line to your apt sources:
deb http://my_favorite_cran_mirror/bin/linux/ubuntu raring/
After that you can always use the apt system to install R, which (I agree with Jake above) should be the preferable way to install R.
*Replace my_favorite_cran_mirror with a valid CRAN mirror that is close to you.
#! /usr/bin/env RScript
print("Hello World!");
Simple enough, but when I try to load it I get the following error:
$ ./hello_world.R
/usr/bin/env: RScript: No such file or directory
The mistake here is that you wrote RScript instead of Rscript; the command name is case-sensitive.
The correct syntax is:
#! /usr/bin/env Rscript
print("Hello World!");
Then run it and it will work. All the best.
$./hello_world.R
I arrived at this question trying to understand this error message on a cluster computer where I did not have control over the R installation.
In general, when I converted Rscript in my makefile to /usr/bin/Rscript the error message no longer occurred.
