R Markdown: Can't access Bash command installed through Conda/Anaconda - r

I'm exploring some bioinformatics data and I like to use R notebooks (i.e. Rmarkdown) when I can. Right now, I need to use a command line tool to analyze a VCF file and I would like to do it through a Bash code chunk in the Rmarkdown notebook.
The problem is that the command I want to use was installed with conda into my conda environment. The tool is bcftools. When I try to access this command, I get this error (code chunk commented out to show rmarkdown code chunk format):
#```{bash}
bcftools view -H test.vcf.gz
#```
/var/folders/9l/phf62p1s0cxgnzp4hgl7hy8h0000gn/T/RtmplzEvEh/chunk-code-6869322acde0.txt: line 3: bcftools: command not found
Whereas if I run from Terminal, I get output (using conda environment called "binfo"):
> bcftools view -H test.vcf.gz | head -n 3
chr10 78484538 . A C . PASS DP=57;SOMATIC;SS=2;SSC=16;GPV=1;SPV=0.024109 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:34:33:0:0%:0,33,0,0 0/1:.:23:19:4:17.39%:1,18,0,4
chr12 4333138 . G T . PASS DP=119;SOMATIC;SS=2;SSC=14;GPV=1;SPV=0.034921 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:72:71:1:1.39%:71,0,1,0 0/1:.:47:42:5:10.64%:42,0,5,0
chr15 75086860 . C T . PASS DP=28;SOMATIC;SS=2;SSC=18;GPV=1;SPV=0.013095 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:15:15:0:0%:4,11,0,0 0/1:.:13:8:5:38.46%:5,3,1,4
(binfo)
So, how do I access tools installed with conda/in my conda env from an R notebook/Rmarkdown bash code chunk? I searched for quite a while and could not find anyone talking about running conda commands in a shell chunk in Rmarkdown. Any help would be appreciated because I like the R notebook format for exploratory analysis.

Passing Arguments to Engines
If your Conda is properly configured to work in bash, then you can use engine.opts to tell bash to launch in login mode (i.e., source your .bash_profile (Mac) or .bashrc (Linux)):
bash
```{bash engine.opts='-l'}
bcftools view -H test.vcf.gz
```
zsh
If working with zsh (e.g., Mac OS 10.15 Catalina users), then the interactive flag, --interactive|-i is what you want (Credit: #Leo).
```{zsh engine.opts='-i'}
bcftools view -H test.vcf.gz
```
Again, this presumes you've previously run conda init zsh to set up Conda to work with the shell.
Note on Reproducibility
Since reproducibility is usually a concern in scientific work, I will add that you may want to do something to capture the state of your Conda environment. For example, if you are working in version control, then commit a conda env export > environment.yaml. Another option would be to output that info directly at the end of the Rmd, like what is usually done with sessionInfo(). That is,
```{bash engine.opts='-l', comment=NA}
conda env export
```
where the comment=NA is so that the output can be cleanly copied from the rendered version.

Quick solution for bash: prepend the following init script into your Bash scripts.
eval "$(command conda 'shell.bash' 'hook' 2> /dev/null)"
# you may need to activate the "base" environment explicitly
conda activate base
Detail
When you open your terminal, an interactive shell is spawned. But your script is run in a non-interactive shell. Bash configuration file ~/.bashrc will not be used for the scripts, which skips the conda initialization and your "base" environment is not exposed into PATH.
References
Python - Activate conda env through shell script

Related

Creating a portable version of R for Mac (and installing package from source for this version)

I am trying to create a completely portable version of R for Mac that I can send to users with no R on their system and they can essentially double click a command file and it launches a Shiny application. I'll need to be able to install packages including some built from source (and some from GitHub).
I am using the script from this GitHub repository (https://github.com/dirkschumacher/r-shiny-electron/blob/master/get-r-mac.sh) as a starting point (it's also pasted below), creating a version of R, but (A) I find that when I try to launch R it gives me an error not finding etc/ldpaths and (B) when I try to launch Rscript it runs my system version -- I run `Rscript -e 'print(R.version)' and it prints out 4.0 which is my system version of R rather than the version 3.5.1 which the shell script has downloaded and processed.
I've experimented with editing the "R" executable and altering R_HOME and R_HOME_DIR but it still runs into issues when I try to install packages to the 3.5.1 directory.
Can anyone provide some guidance?
(By the way docker is not an option, this needs to be as simple as possible end-users with limited technical skills. So having them install docker etc won't be an option)
#!/usr/bin/env bash
set -e
# Download and extract the main Mac Resources directory
# Requires xar and cpio, both installed in the Dockerfile
mkdir -p r-mac
curl -o r-mac/latest_r.pkg \
https://cloud.r-project.org/bin/macosx/R-3.5.1.pkg
cd r-mac
xar -xf latest_r.pkg
rm -r r-1.pkg Resources tcltk8.pkg texinfo5.pkg Distribution latest_r.pkg
cat r.pkg/Payload | gunzip -dc | cpio -i
mv R.framework/Versions/Current/Resources/* .
rm -r r.pkg R.framework
# Patch the main R script
sed -i.bak '/^R_HOME_DIR=/d' bin/R
sed -i.bak 's;/Library/Frameworks/R.framework/Resources;${R_HOME};g' \
bin/R
chmod +x bin/R
rm -f bin/R.bak
# Remove unneccessary files TODO: What else
rm -r doc tests
rm -r lib/*.dSYM
Happy to help you get this working for your shiny app. You can use this github repo for Electron wrapping R/Shiny... just clone, and replace the app.R (for your other packages you need to install them in the local R folder after cloning and then running R from the command line out of the R-Portable-Mac/bin folder...
Try it with the Hello World app.R that is included first
https://github.com/ColumbusCollaboratory/electron-quick-start
And, then installing your packages in the local R-Portable-Mac folder runtime. Included packages by default...
https://github.com/ColumbusCollaboratory/electron-quick-start/tree/master/R-Portable-Mac/library
Your packages will show up here after install.packages() from the command line using the local R-Mac-Portable runtime.
We have been working on a R Addin for this also...
https://github.com/ColumbusCollaboratory/photon
But, note the add-in is still a work in progress and doesn't work with compiled R packages; still have to go into the local R folder and runtime on the command line and install the packages directly into the local R folder libpath as discussed above.
Give it a try and let us know through Github issues if you have any questions and issues. And, if you've already posted out there, sorry we haven't responded as of yet. Would love to communicate through the photon Add-In for this to get it working with compiling packages (into the libPath)--if you have the time to help. Thanks!

Rscript not working with packaged R for AWS Lambda

I'm trying to run an R script on the command line of an AWS EC2 instance using packaged R binaries and libraries (without installation) -- the point is to test the script for deployment to AWS Lambda. I followed these instructions. The instructions are for packaging up all the R binaries and libraries in a zip file and moving everything to a Amazon EC2 instance for testing. I unzipped everything on the new machine, ran 'sudo yum update' on the machine, and set R's environment variables to point to the proper location:
export R_HOME=$HOME
export LD_LIBRARY_PATH=$HOME/lib
NOTE: $HOME is equal to /home/ec2-user.
I created this hello_world.R file to test:
#!/home/ec2-user/bin/Rscript
print ("Hello World!")
But when I ran this:
ec2-user$ Rscript hello_world.R
I got the following error:
Rscript execution error: No such file or directory
So I checked the path, but everything checks out:
ec2-user$ whereis Rscript
Rscript: /home/ec2-user/bin/Rscript
ec2-user$ whereis R
R: /home/ec2-user/bin/R /home/ec2-user/R
But when I tried to evaluate an expression using Rscript at the command line, I got this:
ec2-user$ Rscript -e "" --verbose
running
'/usr/lib64/R/bin/R --slave --no-restore -e '
Rscript execution error: No such file or directory
It seems Rscript is still looking for R in the default location '/usr/lib64/R/bin/R' even though my R_HOME variable is set to '/home/ec2-user':
ec2-user$ echo $R_HOME
/home/ec2-user
I've found sprinkles of support, but I can't find anything that addresses my specific issue. Some people have suggested reinstalling R, but my understanding is, for the purposes of Lambda, everything needs to be self-contained so I installed R on a separate EC2 instance, then packaged it up. I should mention that everything runs fine on the machine where R was installed with the package manager.
SOLUTION: Posted my solution in the answers.
It thinkt it is staring at you right there:
ec2-user$ whereis R
R: /home/ec2-user/bin/R /home/ec2-user/R
is where you put R -- however it was built for / expects this:
ec2-user$ Rscript -e "" --verbose
running
'/usr/lib64/R/bin/R --slave --no-restore -e '
These paths are not the same. The real error may be your assumption that you could just relocate the built and configured R installation to a different directory. You can't.
You could build R for the new (known) path and install that. On a system where the configured-for and installed-at path are the same, all is good:
$ Rscript -e "q()" --verbose
running
'/usr/lib/R/bin/R --slave --no-restore -e q()'
$
This blog post walks through a similar problem and offers a potential solution. I also had to implement part of the solution from this post.
I changed the very first line of R's source code from this:
#!/bin/sh
# Shell wrapper for R executable.
R_HOME_DIR=${R_ROOT_DIR}/lib64${R_ROOT_DIR}
To this:
R_HOME_DIR=${RHOME}/lib64${R_ROOT_DIR}
I'll explain why below.
NOTE -- The rest of the code is:
if test "${R_HOME_DIR}" = "${R_ROOT_DIR}/lib64${R_ROOT_DIR}"; then
case "linux-gnu" in
linux*)
run_arch=`uname -m`
case "$run_arch" in
x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
libnn=lib64
libnn_fallback=lib
;;
*)
libnn=lib
libnn_fallback=lib64
;;
esac
if [ -x "${R_ROOT_DIR}/${libnn}${R_ROOT_DIR}/bin/exec${R_ROOT_DIR}" ]; then
R_HOME_DIR="${R_ROOT_DIR}/${libnn}${R_ROOT_DIR}"
elif [ -x "${R_ROOT_DIR}/${libnn_fallback}${R_ROOT_DIR}/bin/exec${R_ROOT_DIR}" ]; then
R_HOME_DIR="${R_ROOT_DIR}/${libnn_fallback}${R_ROOT_DIR}"
## else -- leave alone (might be a sub-arch)
fi
;;
esac
fi
if test -n "${R_HOME}" && \
test "${R_HOME}" != "${R_HOME_DIR}"; then
echo "WARNING: ignoring environment value of R_HOME"
fi
R_HOME="${R_HOME_DIR}"
export R_HOME
You can see at the bottom, the code sets R_HOME equal to R_HOME_DIR, which it originally assigned based on R_ROOT_DIR.
No matter what you set the R_HOME_DIR or R_HOME variable to, R resets everything using the R_ROOT_DIR variable.
With the change, I can set all my environment variables:
export RHOME=$PWD/R #/home/ec2-user/R
export R_HOME=$PWD/R #/home/ec2-user/R
export R_ROOT_DIR=/R #/R
I set RHOME to my working directory where the R package sits. RHOME basically acts as a prefix, in my case, it's /home/ec2-user/.
Also, Rscript appends /R/bin to whatever RHOME is, so now I can properly run...
Rscript hello_world.R
...on the command line. Rscript knows where to find R, which knows where to find all it's stuff.
I feel like packaging up R to run in a portable self-contained folder, without using Docker or something, should be easier than this, so if anyone has a better way of doing this, I'd really appreciate it.
Another more quickly method:
create same folder /usr/lib/R/bin/
then put R into this folder.

How can I print R documentation from a Linux command shell (e.g. bash)?

How can I check documentation for R code from a Linux command shell such as bash? I DO NOT mean an interactive session.
With Perl, I can use perldoc to print out documentation at the command line:
perldoc lib
I was hoping for something simple like that for R. I don't always want to pull up a full interactive R session just to look up some documentation.
There might be other ways, but one that works for me is using the -e flag to execute code on the command line. I also use the --slave flag, which prevents anything from being printed to standard output (e.g. no R startup messages, etc.):
R --slave -e '?function'
I actually created a super small script I call rdoc to act like a simple R version of perldoc:
#!/bin/bash
R --slave -e "?$1"
After installing that in my ~/bin directory (or however you install it in your PATH), it's easy:
rdoc function
If you want to look at documentation of a function from a particular package, prepend the library name followed by two colons. For example, to pull up documentation of the dmrFinder function from the charm package:
rdoc charm::dmrFinder

Executing Shell Script via Automator

via Terminal/Totalterminal or iTerm, this script works very well:
cd ~/go/to/dir/ && R -e "shiny::runApp("/go/to/dir", launch.browser=TRUE)"
but as/in a App via Automator, the second Part wont work.
In Automator: run Shell-Script.
Where is the difference of "normal" Terminal and the Terminal used by Automator.
In both /bin/bash
Is r on the path in Automator? If not, replace r with the full path like /usr/local/bin/r (shown by which r).

/usr/bin/env: RScript: No such file or directory | After recent R-3.0.1. installation.

I am a bit lost when dealing with installing and using R. I installed R 3.0.1 from source and did the ./configure, make, make check, and make install as suggested. However I tried running R but it said that R wasn't in the /usr/bin folder. So I then copied the entire R-3.0.1/bin directory into my /usr/bin directory using cp. Now I'm getting a few errors regarding /usr/bin/env when trying to use RScript on a hello_world.R script I wrote from the O'Reilly R In a Nutshell book I store in a file hello_world.R the contents are below:
#! /usr/bin/env RScript
print("Hello World!");
Simple enough, but when I try to load it I get the following error:
$ ./hello_world.R
/usr/bin/env: RScript: No such file or directory
I'm not sure if this is a PATH problem or something, but when I search in my /usr/bin directory I do see the RScript file in there along with (R, BATCH, and the others associated with R programming language). Any help is greatly appreciated. Cheers.
You may be using an invalid command line option for Rscript in your shebang line.
For instance ...
#!/usr/bin/env RScript --vanilla
remove "--vanilla" (or other offending option) and rerun your script
#!/usr/bin/env RScript
I know you didn't put this in your example, but the solution may help others searching for the same issue.
Again, the good solution to this problem is very simple and clearly explained in the man page of env. The script should use the env command to invoke Rscript and not Rscript directly:
#!/usr/bin/env Rscript
some R code now...
But a script like this will read the user's .Rprofile among other things. When we want to have a vanilla R session (in order to start with a clean and controlled R), we must pass the option --vanilla. If you try something like
#!/usr/bin/env Rscript --vanilla
some R code now...
env will take the string Rscript --vanilla a the command to execute and will inevitably return the error message
/usr/bin/env: ‘Rscript --vanilla’: No such file or directory
In env's man page, there is an option called -S for splitting the strings. Its role is exactly to solve the problem above and use the first string Rscript as the command name, and the following strings (like --vanilla) as options to pass to Rscript.
The solution is therefore:
#!/usr/bin/env -S Rscript --vanilla
some R code now...
Put in the shebang line of your script #!/usr/bin/Rscript and it should work.
As a side remark if you want to keep up-to-date with the R versions from CRAN and not relying on the native R of your Linux distro (Ubuntu) then add the following line in your apt sources:
deb http://my_favorite_cran_mirror/bin/linux/ubuntu raring/
After that you can always use the apt system to install R which -I would agree with Jake above- it should be the preferable way to install R.
*Change the my_favorite_cran_mirror with a valid CRAN mirror that is close to you.
#! /usr/bin/env RScript
print("Hello World!");
Simple enough, but when I try to load it I get the following error:
$ ./hello_world.R
/usr/bin/env: RScript: No such file or directory
Here u make mistake is that instead of RScript write Rscript.
The syntax will be
#! /usr/bin/env Rscript
print("Hello World!");
Then run it it will work (y) all the best.
$./hello_world.R
I arrived at this question trying to understand this error message on a cluster computer where I did not have control over the R installation.
In general, when I converted Rscript in my makefile to /usr/bin/Rscript the error message no longer occurred.

Resources