R: x13binary missing in Docker build

I have a Dockerfile in which I'm trying to install the R seasonal package:
# Debian-based image
FROM continuumio/miniconda3:4.5.12
. . .
# Install packages not on conda
RUN conda activate r_env && \
R -e "install.packages(c('RUnit', 'seasonal'), dependencies=TRUE, repos='https://cran.case.edu')"
Everything appears to install correctly; however, when I get into the container and run library(seasonal), I get this error:
> library(seasonal)
The binaries provided by 'x13binary' do not work on this
machine. To get more information, run:
x13binary::checkX13binary()
> x13binary::checkX13binary()
Error in x13binary::checkX13binary() : X-13 binary file not found
After some googling, it looks like I can set the path to the binary manually, and running find shows that the binary exists on the machine:
(r_env) root@89c7265d9316:/# find / -name "*x13*"
/opt/conda/envs/arimaApiR/lib/R/library/x13binary
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/help/x13binary.rdx
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/help/x13binary.rdb
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/html/x13path.html
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/html/x13binary-package.html
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/bin/x13ashtml.exe
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/R/x13binary.rdx
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/R/x13binary.rdb
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/R/x13binary
/opt/conda/envs/arimaApiR/conda-meta/r-x13binary-1.1.39_2-r36h6115d3f_0.json
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/help/x13binary.rdx
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/help/x13binary.rdb
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/html/x13path.html
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/html/x13binary-package.html
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/bin/x13ashtml.exe
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/R/x13binary.rdx
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/R/x13binary.rdb
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/R/x13binary
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0.tar.bz2
However, no matter what I set the path to, the library still reports that it cannot find the binary:
(r_env) root@89c7265d9316:/# export X13_PATH=/opt/conda/envs/arimaApiR/lib/R/library/x13binary
(r_env) root@89c7265d9316:/# R -e "library(seasonal)"
The system variable 'X13_PATH' has been manually set to:
/opt/conda/envs/arimaApiR/lib/R/library/x13binary
Since version 1.2, 'seasonal' relies on the 'x13binary'
package and does not require 'X13_PATH' to be set anymore.
Only set 'X13_PATH' manually if you intend to use your own
binaries. See ?seasonal for details.
Binary executable file /opt/conda/envs/arimaApiR/lib/R/library/x13binary/x13as or /opt/conda/envs/arimaApiR/lib/R/library/x13binary/x13ashtml not found.
See ?seasonal for details.
I feel like I'm running in circles. Has anyone had luck running this inside a container?
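The last error message shows what seasonal checks when X13_PATH is set: an executable named x13as or x13ashtml directly inside that directory, so the package root is the wrong value. The find output places the binary under bin/ and lists only x13ashtml.exe, so even the bin/ directory may not pass this check. As a hedged sketch, the hypothetical shell function check_x13_dir below mirrors that lookup (it is not part of seasonal or x13binary) so you can test a candidate directory without starting R.

```shell
# Hypothetical helper (not part of seasonal/x13binary): mirrors the lookup
# implied by the error message, i.e. an executable named x13as or x13ashtml
# directly inside the directory that X13_PATH points to.
check_x13_dir() {
  local dir="$1" name
  for name in x13as x13ashtml; do
    # Must be a regular file with execute permission.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  printf 'no x13as or x13ashtml executable in %s\n' "$dir" >&2
  return 1
}
```

For example, check_x13_dir /opt/conda/envs/arimaApiR/lib/R/library/x13binary/bin prints the binary's path only if a correctly named executable is present there.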

I've prepared my own container, but I didn't use continuumio/miniconda since I don't know how it works internally.
This is the Dockerfile I've prepared:
FROM r-base:3.6.1
RUN apt-get update \
&& apt-get install -y libxml2-dev
RUN R -e "install.packages('RUnit', dependencies=TRUE, repos='https://cran.case.edu')"
RUN R -e "install.packages('x13binary', dependencies=TRUE, repos='https://cran.case.edu')"
RUN R -e "install.packages('seasonal', dependencies=TRUE, repos='https://cran.case.edu')"
CMD [ "bash" ]
If I run your test commands, I get this:
> library(seasonal)
> x13binary::
x13binary::checkX13binary x13binary::supportedPlatform x13binary::x13path
> x13binary::checkX13binary
x13binary::checkX13binary
> x13binary::checkX13binary()
x13binary is working properly
>
NOTE: the Dockerfile can be improved, e.g. you can install the packages together with c('RUnit', 'x13binary', 'seasonal') and remove the apt cache after installing packages, but I just wanted to run a test to see if it would work.
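Following the note above, the three install.packages() calls can be collapsed into a single R expression. The function below is a small hypothetical shell helper (build_install_expr is not an R or Docker command) that assembles that expression from package names, using the same repository URL as the Dockerfile.

```shell
# Hypothetical helper: build one install.packages() call for several packages,
# so a Dockerfile can use a single RUN step instead of one per package.
build_install_expr() {
  local repo='https://cran.case.edu' pkgs='' p
  for p in "$@"; do
    # Quote each name for R and join the names with ", ".
    pkgs="${pkgs:+$pkgs, }'$p'"
  done
  printf "install.packages(c(%s), dependencies=TRUE, repos='%s')\n" "$pkgs" "$repo"
}
```

build_install_expr RUnit x13binary seasonal prints install.packages(c('RUnit', 'x13binary', 'seasonal'), dependencies=TRUE, repos='https://cran.case.edu'). In the Dockerfile you would typically paste that expression into one RUN step, e.g. RUN apt-get update && apt-get install -y libxml2-dev && rm -rf /var/lib/apt/lists/* && R -e "<expression>", where removing /var/lib/apt/lists/* is the usual way to drop the apt cache.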

Related

Using R in a Snakemake workflow with Mambaforge

I'm building a pipeline with Snakemake. One rule involves an R script that reads a CSV file using readr. I get this error when I run the pipeline with --use-singularity and --use-conda:
Error: Unknown TZ UTC
In addition: Warning message:
In OlsonNames() : no Olson database found
Execution halted
Google suggests readr is crashing because tzdata is missing, but I can't figure out how to install the tzdata package and make readr see it. I am running the entire pipeline in a Mambaforge container to ensure reproducibility. Snakemake recommends Mambaforge over a Miniconda container because it's faster, but I think my error is specific to Mambaforge, since switching to Miniconda makes it go away.
Here's a workflow to reproduce the error:
#Snakefile
singularity: "docker://condaforge/mambaforge"

rule targets:
    input:
        "out.txt"

rule readr:
    input:
        "input.csv"
    output:
        "out.txt"
    conda:
        "env.yml"
    script:
        "test.R"
#env.yml
name: env
channels:
  - defaults
  - bioconda
  - conda-forge
dependencies:
  - r-readr
  - tzdata
#test.R
library(readr)
fp <- snakemake@input[[1]]
df <- read_csv(fp)
print(df)
write(df$x, "out.txt")
I run the workflow with snakemake --use-conda --use-singularity. How do I run R scripts when the Snakemake workflow is running from a Mambaforge singularity container?
Tracing the R code that leads to the error, I see that it checks a series of default locations for the zoneinfo folder that tzdata provides, but it also checks for a TZDIR environment variable.
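The lookup described above can be sketched in shell as "use TZDIR if it points at an existing directory, otherwise take the first fallback location that exists". The fallback list here is an assumption for illustration (a few common Linux locations); R's own list differs in detail, and find_zoneinfo is a hypothetical name.

```shell
# Hypothetical sketch of a zoneinfo lookup: TZDIR first, then fallbacks.
# The fallback list is illustrative, not R's exact search order.
find_zoneinfo() {
  local d
  for d in "${TZDIR:-}" /usr/share/zoneinfo /usr/lib/zoneinfo /etc/zoneinfo; do
    # Skip the empty entry (TZDIR unset) and anything that is not a directory.
    if [ -n "$d" ] && [ -d "$d" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  echo 'no Olson database found' >&2
  return 1
}
```

In a Mambaforge container without a system tzdata, the fallbacks are presumably absent, so only a correctly set TZDIR can succeed.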
I believe a proper solution to this would be for the Conda tzdata package to set this variable to point to it. This will require a PR to the Conda Forge package (see repo issue). In the meantime, one could do either of the following as workarounds.
Workaround 1: Set TZDIR from R
Continuing to use the tzdata package from Conda, one could set the environment variable at the start of the R script.
#!/usr/bin/env Rscript
## the following assumes active Conda environment with `tzdata` installed
Sys.setenv("TZDIR"=paste0(Sys.getenv("CONDA_PREFIX"), "/share/zoneinfo"))
I would consider this a temporary workaround.
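If R is started from a shell (for example a Snakemake shell: directive rather than script:), the same workaround can be applied before R starts. A minimal sketch, assuming the active Conda environment ships tzdata under $CONDA_PREFIX/share/zoneinfo as in the R snippet above; conda_tzdir is a hypothetical helper name.

```shell
# Hypothetical helper: print the zoneinfo directory provided by Conda's tzdata,
# assuming the $CONDA_PREFIX/share/zoneinfo layout used in Workaround 1.
conda_tzdir() {
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo 'CONDA_PREFIX is not set; activate the environment first' >&2
    return 1
  fi
  local d="$CONDA_PREFIX/share/zoneinfo"
  if [ ! -d "$d" ]; then
    echo "no zoneinfo directory at $d" >&2
    return 1
  fi
  printf '%s\n' "$d"
}
```

Usage: TZDIR="$(conda_tzdir)" && export TZDIR && Rscript test.R. Assigning first and exporting separately keeps a failure from being silently ignored.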
Workaround 2: Derive a New Docker
Otherwise, make a new Docker image that includes a system-level tzdata installation. This appears to be a common issue, so following other examples (and keeping things clean), it'd go something like:
Dockerfile
FROM --platform=linux/amd64 condaforge/mambaforge:latest
## include tzdata
RUN apt-get update > /dev/null \
&& DEBIAN_FRONTEND="noninteractive" apt-get install --no-install-recommends -y tzdata > /dev/null \
&& apt-get clean
Upload this to Docker Hub and use it instead of the Mambaforge image as the image for Snakemake. This is probably a more reliable long-term solution, but perhaps not everyone wants to create a Docker Hub account.

Creating a portable version of R for Mac (and installing package from source for this version)

I am trying to create a completely portable version of R for Mac that I can send to users who have no R on their system, so that they can simply double-click a command file that launches a Shiny application. I'll need to be able to install packages, including some built from source (and some from GitHub).
I am using the script from this GitHub repository (https://github.com/dirkschumacher/r-shiny-electron/blob/master/get-r-mac.sh) as a starting point (it's also pasted below). It creates a version of R, but (A) when I try to launch R, it gives me an error about not finding etc/ldpaths, and (B) when I try to launch Rscript, it runs my system version: running Rscript -e 'print(R.version)' prints 4.0, my system version of R, rather than 3.5.1, the version the shell script downloaded and processed.
I've experimented with editing the R executable script and altering R_HOME and R_HOME_DIR, but I still run into issues when I try to install packages into the 3.5.1 directory.
Can anyone provide some guidance?
(By the way, Docker is not an option: this needs to be as simple as possible for end users with limited technical skills, so having them install Docker etc. won't work.)
#!/usr/bin/env bash
set -e
# Download and extract the main Mac Resources directory
# Requires xar and cpio, both installed in the Dockerfile
mkdir -p r-mac
curl -o r-mac/latest_r.pkg \
https://cloud.r-project.org/bin/macosx/R-3.5.1.pkg
cd r-mac
xar -xf latest_r.pkg
rm -r r-1.pkg Resources tcltk8.pkg texinfo5.pkg Distribution latest_r.pkg
cat r.pkg/Payload | gunzip -dc | cpio -i
mv R.framework/Versions/Current/Resources/* .
rm -r r.pkg R.framework
# Patch the main R script
sed -i.bak '/^R_HOME_DIR=/d' bin/R
sed -i.bak 's;/Library/Frameworks/R.framework/Resources;${R_HOME};g' \
bin/R
chmod +x bin/R
rm -f bin/R.bak
# Remove unnecessary files TODO: What else
rm -r doc tests
rm -r lib/*.dSYM
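Because the sed patch deletes the R_HOME_DIR= line from bin/R, that script has to receive R_HOME_DIR from the environment; if it is unset, R_HOME ends up empty, which would be consistent with the etc/ldpaths error in (A). Likewise, the Rscript binary from the .pkg presumably still has the framework path compiled in, which would explain (B). A hedged sketch of launcher helpers, assuming the launcher sits in the r-mac/ folder next to bin/ and that Rscript honours RHOME (as the AWS answer further down relies on); r_home_from and launch_portable_r are hypothetical names, and both assumptions are worth verifying on your setup.

```shell
# Hypothetical helpers for a portable-R launcher (e.g. a .command file)
# that lives in the r-mac/ folder next to bin/, etc/ and library/.

# Print the absolute, symlink-resolved directory containing the given file.
r_home_from() {
  (cd "$(dirname "$1")" && pwd -P)
}

# Point the patched bin/R and Rscript at the portable tree, then run Rscript.
launch_portable_r() {
  local home="$1"; shift
  export R_HOME_DIR="$home"       # bin/R no longer assigns this itself
  export R_HOME="$home"
  export RHOME="$home"            # assumption: Rscript launches $RHOME/bin/R
  export PATH="$home/bin:$PATH"   # portable bin/ ahead of the system R
  exec "$home/bin/Rscript" "$@"
}
```

A launch.command file in r-mac/ could source these helpers and call launch_portable_r "$(r_home_from "$0")" -e 'print(R.version)'; if the redirection works, it should report 3.5.1 rather than the system R.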
Happy to help you get this working for your Shiny app. You can use this GitHub repo for wrapping R/Shiny in Electron: just clone it and replace app.R. For your other packages, install them into the local R folder after cloning, by running R from the command line out of the R-Portable-Mac/bin folder.
Try it first with the Hello World app.R that is included:
https://github.com/ColumbusCollaboratory/electron-quick-start
Then install your packages into the local R-Portable-Mac runtime. The packages included by default are here:
https://github.com/ColumbusCollaboratory/electron-quick-start/tree/master/R-Portable-Mac/library
Your packages will show up there after you run install.packages() from the command line using the local R-Portable-Mac runtime.
We have also been working on an R add-in for this:
https://github.com/ColumbusCollaboratory/photon
Note, however, that the add-in is still a work in progress and doesn't work with compiled R packages; you still have to use the local R runtime on the command line and install those packages directly into the local R folder's library path, as discussed above.
Give it a try and let us know through GitHub issues if you have any questions or problems. If you've already posted there, sorry we haven't responded yet. We'd love to work with you through the photon add-in to get it working with compiled packages (installed into the libPath) if you have the time to help. Thanks!

Rscript not finding installed packages in container

I am trying to schedule an R script to run inside a container. I have a Dockerfile like this:
# Install R version 3.5
FROM rocker/tidyverse:3.5.1
USER root
# Install Ubuntu packages
RUN apt-get update && apt-get install -y \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev \
libxt-dev \
libssl-dev \
xtail \
wget \
cron
# Install R packrat, which we'll then use to install the other packages
RUN R -e 'install.packages("packrat", repos="http://cran.rstudio.com", dependencies=TRUE);'
# copy packrat files
COPY packrat/ /home/project/packrat/
# copy .Rprofile so that it know where to look for packages
COPY .Rprofile /home/project/
RUN R -e 'packrat::restore(project="/home/project");'
# Copy DB query script into the Docker image
COPY 002_query_db_for_kpis.R /home/project/002_query_db_for_kpis.R
# copy crontab for db query
COPY db_query_cronjob /etc/crontabs/db_query_cronjob
# give execution rights
RUN chmod 644 /etc/crontabs/db_query_cronjob
# run the job
RUN crontab /etc/crontabs/db_query_cronjob
# start cron in the foreground
CMD ["cron", "-f"]
It builds ok and then the cron job fails silently. When I investigate with:
docker exec -it 19338f50b4ed Rscript /home/project/002_query_db_for_kpis.R
The output I get is:
Error in library(zoo) : there is no package called ‘zoo’
Execution halted
Now, the first part of the script looks like this:
#!/usr/local/bin/env Rscript --default-packages=zoo,RcppRoll,lubridate,broom,magrittr,tidyverse,rlang,RPostgres,DBI
library(zoo)
...
So, clearly it's not finding the packages. They are in there, though. That was the whole point of packrat and copying the .Rprofile, and it seemed to work, because if I open a shell inside the running container I can find them in:
root@d2b4f6e7eade:/usr/local/lib/R/site-library#
and all the packrat files seem to be in the right place as well. Could it be that the .Rprofile file isn't being seen because its name starts with a '.'? Can I change that?
UPDATE
If I don't use packrat but install packages normally, it works. Digging around inside the container's files, I can see that /usr/local/lib/R/site-library doesn't contain the packages needed, whereas /home/project/packrat/src does. So it must be that Rscript is looking in the wrong place. I thought the .Rprofile in /home/project would solve that, but it doesn't; maybe there is something else I didn't copy over? I've got the script running now, but it's not ideal, since those packages might be different versions (which is why I want to use packrat). If anyone can figure out how to get it to work with packrat, I'll mark that answer as correct.
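One likely factor: at startup, R looks for a user .Rprofile in the current working directory and then in the home directory (unless R_PROFILE_USER is set), so running Rscript /home/project/002_query_db_for_kpis.R from / or from cron will not read /home/project/.Rprofile. packrat's generated .Rprofile also typically sources packrat/init.R by a relative path, which is another reason to start R from the project directory. A minimal sketch; run_in_project is a hypothetical helper name.

```shell
# Hypothetical wrapper: run a command from inside the project directory so R
# finds the project's .Rprofile (R looks in the working directory first).
run_in_project() {
  local dir="$1"; shift
  # The subshell keeps the caller's working directory unchanged.
  (cd "$dir" && "$@")
}
```

Usage: run_in_project /home/project Rscript 002_query_db_for_kpis.R. In the crontab entry itself, the equivalent is to prefix the command with cd /home/project &&, and for manual testing docker exec -w /home/project <container> Rscript 002_query_db_for_kpis.R sets the working directory the same way.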
A couple of things to try, based on the problem and the update:
Have you excluded your packrat/lib* and packrat/src/ directories in .dockerignore? I'm worried you are copying over all the built packages, so restore() thinks the packages have already been built in your container.
Does the root user in your container have the required permissions on the packrat.lock file? Missing permissions would obviously prevent restore() from running.
Change the Docker install user to the rocker/rstudio image's default user, rstudio, and copy over just the packrat.lock and packrat.opts files:
USER rstudio
COPY --chown=rstudio:rstudio packrat/packrat.* /home/project/packrat/
A good reference for these options: https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/
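For the first suggestion, a .dockerignore along these lines keeps locally built packrat libraries and cached sources out of the build context, so packrat::restore() builds them inside the container (excluding packrat/src means restore() has to download the sources again). A sketch; the patterns assume packrat's default layout, and write_packrat_dockerignore is a hypothetical helper.

```shell
# Hypothetical helper: write a .dockerignore that keeps packrat's locally
# built libraries and cached sources out of the Docker build context.
write_packrat_dockerignore() {
  local target="${1:-.dockerignore}"
  printf '%s\n' \
    '# rebuilt inside the image by packrat::restore()' \
    'packrat/lib*' \
    'packrat/src' > "$target"
}
```

Run it once in the project root (next to the Dockerfile). packrat/packrat.lock and packrat/packrat.opts are not matched by these patterns, so they are still sent to the build.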

Use conda skeleton to build an R Bioconductor package

I know that it's generally easier to install R packages, even within conda, using install.packages() or similar. But I also know that I can usually build my own package with, for instance:
conda skeleton cran tensorA
conda build r-tensora
conda install --use-local r-tensora
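As far as I know, conda skeleton cran names the recipe r- plus the lowercased CRAN name, so for tensorA the directory to build would be r-tensora; it is worth checking the directory it actually creates. The hypothetical helper below just spells out that assumed naming rule.

```shell
# Hypothetical helper: derive the conda recipe/package name assumed to be
# produced by conda skeleton cran for a CRAN package (r- + lowercase name).
conda_r_name() {
  printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}
```

For example: pkg=$(conda_r_name tensorA) && conda skeleton cran tensorA && conda build "$pkg" && conda install --use-local "$pkg".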
But what if the package lives in Bioconductor rather than CRAN? DECIPHER, for instance, can be installed within R by running:
source("https://bioconductor.org/biocLite.R")
biocLite("DECIPHER")
For primarily learning purposes, I'd like to try to build DECIPHER (and other Bioconductor packages) into a conda package. Can anyone point me in a good direction for doing something like this? Or, if you are really feeling awesome, outline the steps one would take?
Ok. Here's how I managed to solve this:
I used the conda skeleton files from another R build as a template. It turns out that Bioconductor package pages (here's an example) have a link to the git repository under the heading Source Repository.
I found the following guide helpful here.
I ended up making a meta.yaml file that looks like this
package:
  name: r-decipher
  version: "2.6.0"

source:
  git_url: https://git.bioconductor.org/packages/DECIPHER

requirements:
  build:
    - r
    - bioconductor-biostrings
    - r-rsqlite
  run:
    - r
    - bioconductor-biostrings
    - r-rsqlite

test:
  commands:
    # You can put additional test commands to be run here.
    - $R -e "library('DECIPHER')"  # [not win]
    - "\"%R%\" -e \"library('DECIPHER')\""  # [win]

about:
  home: https://bioconductor.org/packages/release/bioc/html/DECIPHER.html
I also made a build.sh file that looks like the following:
#!/bin/bash
if [[ $target_platform =~ linux.* ]] || [[ $target_platform == win-32 ]] || [[ $target_platform == win-64 ]] || [[ $target_platform == osx-64 ]]; then
  export DISABLE_AUTOBREW=1
  mv DESCRIPTION DESCRIPTION.old
  grep -v '^Priority: ' DESCRIPTION.old > DESCRIPTION
  $R CMD INSTALL --build .
else
  mkdir -p $PREFIX/lib/R/library/decipher
  mv * $PREFIX/lib/R/library/decipher
fi
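The DESCRIPTION rewrite in build.sh (which comes from the conda skeleton template reused here) simply drops any line starting with "Priority: ". A small sketch of that filter on its own; strip_priority is a hypothetical helper, and the test uses a made-up DESCRIPTION.

```shell
# Hypothetical helper mirroring the build.sh step: print the given
# DESCRIPTION file without any "Priority: " lines.
strip_priority() {
  grep -v '^Priority: ' "$1"
}
```

Note that grep -v exits with status 1 if no lines remain, which cannot happen for a real DESCRIPTION but matters if the script runs under set -e.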
And a bld.bat that looks like the following
"%R%" CMD INSTALL --build .
IF %ERRORLEVEL% NEQ 0 exit 1
All of these went into a source directory called r-decipher.
From its parent directory I ran conda build, which worked (I may have had to install some dependencies, but it reports them on the command line, and each is available on CRAN at least), and then I ran conda install r-decipher.
If anyone wants to use my specific build of r-decipher, it can now be found at https://anaconda.org/cramjaco/r-decipher

Rscript not working with packaged R for AWS Lambda

I'm trying to run an R script on the command line of an AWS EC2 instance using packaged R binaries and libraries (without installation); the point is to test the script for deployment to AWS Lambda. I followed these instructions, which describe packaging all the R binaries and libraries in a zip file and moving everything to an Amazon EC2 instance for testing. I unzipped everything on the new machine, ran 'sudo yum update' on it, and set R's environment variables to point to the proper location:
export R_HOME=$HOME
export LD_LIBRARY_PATH=$HOME/lib
NOTE: $HOME is equal to /home/ec2-user.
I created this hello_world.R file to test:
#!/home/ec2-user/bin/Rscript
print ("Hello World!")
But when I ran this:
ec2-user$ Rscript hello_world.R
I got the following error:
Rscript execution error: No such file or directory
So I checked the path, but everything checks out:
ec2-user$ whereis Rscript
Rscript: /home/ec2-user/bin/Rscript
ec2-user$ whereis R
R: /home/ec2-user/bin/R /home/ec2-user/R
But when I tried to evaluate an expression using Rscript at the command line, I got this:
ec2-user$ Rscript -e "" --verbose
running
'/usr/lib64/R/bin/R --slave --no-restore -e '
Rscript execution error: No such file or directory
It seems Rscript is still looking for R in the default location '/usr/lib64/R/bin/R' even though my R_HOME variable is set to '/home/ec2-user':
ec2-user$ echo $R_HOME
/home/ec2-user
I've found scattered advice, but nothing that addresses my specific issue. Some people have suggested reinstalling R, but my understanding is that, for the purposes of Lambda, everything needs to be self-contained, so I installed R on a separate EC2 instance and then packaged it up. I should mention that everything runs fine on the machine where R was installed with the package manager.
SOLUTION: Posted my solution in the answers.
I think the answer is staring at you right there:
ec2-user$ whereis R
R: /home/ec2-user/bin/R /home/ec2-user/R
is where you put R; however, it was built for (and expects) this:
ec2-user$ Rscript -e "" --verbose
running
'/usr/lib64/R/bin/R --slave --no-restore -e '
These paths are not the same. The real error may be your assumption that you could just relocate the built and configured R installation to a different directory. You can't.
You could build R for the new (known) path and install that. On a system where the configured-for and installed-at path are the same, all is good:
$ Rscript -e "q()" --verbose
running
'/usr/lib/R/bin/R --slave --no-restore -e q()'
$
This blog post walks through a similar problem and offers a potential solution. I also had to implement part of the solution from this post.
I changed the R_HOME_DIR assignment at the top of R's shell wrapper script (bin/R) from this:
#!/bin/sh
# Shell wrapper for R executable.
R_HOME_DIR=${R_ROOT_DIR}/lib64${R_ROOT_DIR}
To this:
R_HOME_DIR=${RHOME}/lib64${R_ROOT_DIR}
I'll explain why below.
For reference, the code that follows in the script is:
if test "${R_HOME_DIR}" = "${R_ROOT_DIR}/lib64${R_ROOT_DIR}"; then
  case "linux-gnu" in
  linux*)
    run_arch=`uname -m`
    case "$run_arch" in
    x86_64|mips64|ppc64|powerpc64|sparc64|s390x)
      libnn=lib64
      libnn_fallback=lib
      ;;
    *)
      libnn=lib
      libnn_fallback=lib64
      ;;
    esac
    if [ -x "${R_ROOT_DIR}/${libnn}${R_ROOT_DIR}/bin/exec${R_ROOT_DIR}" ]; then
      R_HOME_DIR="${R_ROOT_DIR}/${libnn}${R_ROOT_DIR}"
    elif [ -x "${R_ROOT_DIR}/${libnn_fallback}${R_ROOT_DIR}/bin/exec${R_ROOT_DIR}" ]; then
      R_HOME_DIR="${R_ROOT_DIR}/${libnn_fallback}${R_ROOT_DIR}"
    ## else -- leave alone (might be a sub-arch)
    fi
    ;;
  esac
fi

if test -n "${R_HOME}" && \
   test "${R_HOME}" != "${R_HOME_DIR}"; then
  echo "WARNING: ignoring environment value of R_HOME"
fi
R_HOME="${R_HOME_DIR}"
export R_HOME
You can see at the bottom that the code sets R_HOME equal to R_HOME_DIR, which it originally assigned based on R_ROOT_DIR.
No matter what you set the R_HOME_DIR or R_HOME variable to, R resets everything using the R_ROOT_DIR variable.
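To make the difference concrete, here is how the original and modified R_HOME_DIR assignments expand with the values used in this answer (R_ROOT_DIR=/R and RHOME=/home/ec2-user/R). This is only an illustration of the string expansion; expand_r_home_dir is a hypothetical function, not part of R's script.

```shell
# Illustration only: expand the original and modified R_HOME_DIR assignments
# from the bin/R wrapper using the values from this answer.
expand_r_home_dir() {
  local R_ROOT_DIR=/R RHOME=/home/ec2-user/R
  printf 'original: %s\n' "${R_ROOT_DIR}/lib64${R_ROOT_DIR}"
  printf 'modified: %s\n' "${RHOME}/lib64${R_ROOT_DIR}"
}
```

It prints original: /R/lib64/R and modified: /home/ec2-user/R/lib64/R. The original value is fixed regardless of R_HOME, while the modified one follows RHOME.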
With the change, I can set all my environment variables:
export RHOME=$PWD/R #/home/ec2-user/R
export R_HOME=$PWD/R #/home/ec2-user/R
export R_ROOT_DIR=/R #/R
I set RHOME to the directory where the R package sits. RHOME basically acts as a prefix; in my case it's /home/ec2-user/R.
Also, Rscript appends /bin/R to whatever RHOME is, so now I can properly run...
Rscript hello_world.R
...on the command line. Rscript knows where to find R, which knows where to find all its stuff.
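Since Rscript only reports a terse "No such file or directory" when the R it tries to launch is missing, a quick pre-flight check can save time. A sketch, assuming Rscript launches $RHOME/bin/R; check_rhome is a hypothetical helper.

```shell
# Hypothetical pre-flight check: confirm that the R front end Rscript is
# assumed to launch ($RHOME/bin/R) exists and is executable.
check_rhome() {
  if [ -z "${RHOME:-}" ]; then
    echo 'RHOME is not set' >&2
    return 1
  fi
  if [ ! -x "$RHOME/bin/R" ]; then
    echo "not executable or missing: $RHOME/bin/R" >&2
    return 1
  fi
  printf '%s\n' "$RHOME/bin/R"
}
```

Usage: check_rhome && Rscript hello_world.R. If the check passes but Rscript still fails, Rscript -e "" --verbose shows which R it actually tried to run.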
I feel like packaging up R to run in a portable self-contained folder, without using Docker or something, should be easier than this, so if anyone has a better way of doing this, I'd really appreciate it.
Another, quicker method:
on the target machine, create the folder R was built for (e.g. /usr/lib/R/bin/; in the question it is /usr/lib64/R/bin/),
then put R into that folder.
