Problem building R API with plumber, RPostgreSQL, and Docker

I'm trying to install plumber and RPostgreSQL into my Docker image. Here's my Dockerfile:
FROM rocker/r-base
RUN R -e "install.packages('plumber')"
RUN R -e "install.packages('RPostgreSQL')"
RUN mkdir -p /code
COPY ./plumber.R /code/plumber.R
CMD Rscript --no-save /code/plumber.R
The only thing my plumber script does is try to reference the RPostgreSQL package:
library('RPostgreSQL')
When I build, both packages appear to install successfully, but when my script runs, it complains that RPostgreSQL doesn't exist. I've tried other base images and many other things.
Any help appreciated. Thanks!

You are trying to install RPostgres and then trying to load RPostgreSQL -- these are different packages. Hence the error.
Next, as you are on r-base, the latter is installed more easily with sudo apt install r-cran-rpostgresql (perhaps after an initial sudo apt update). While you're at it, you can also install plumber as a pre-built binary (along with its dependencies). So
RUN apt update -qq \
&& apt install --yes --no-install-recommends \
r-cran-rpostgresql \
r-cran-plumber
is easier and faster.
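The apt route relies on Debian's naming convention for CRAN binaries: the CRAN name lowercased with an `r-cran-` prefix, as in `r-cran-rpostgresql` above. Below is a small sketch of a helper that builds such an install list, assuming that convention holds for the packages you need. `cran_to_deb` is a hypothetical name, and some CRAN packages have no Debian binary at all.

```shell
#!/bin/sh
# Hypothetical helper (not part of apt or R): map CRAN package names to Debian
# binary package names, assuming the usual convention of lowercasing the name
# and adding the "r-cran-" prefix.
cran_to_deb() {
  for pkg in "$@"; do
    printf 'r-cran-%s\n' "$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
  done
}

# Print (rather than run) the install command for the packages in the question.
echo "apt-get install --yes --no-install-recommends $(cran_to_deb RPostgreSQL plumber | paste -sd ' ' -)"
```

This prints `apt-get install --yes --no-install-recommends r-cran-rpostgresql r-cran-plumber`. Before relying on the mapping, check that each name actually exists, for example with `apt-cache policy r-cran-rpostgresql`.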

Related

Installing R in a docker container

I'm trying to install Miniconda in an Ubuntu 20.04-based container and then, using conda, R 4.0.5.
The Dockerfile I'm using is this:
FROM ubuntu:20.04
USER root
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get -y install libcurl4-openssl-dev
RUN apt-get install -y wget
RUN mkdir -p ~/miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
RUN bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
RUN export PATH=~/miniconda3/bin:$PATH
RUN rm -rf ~/miniconda3/miniconda.sh
RUN ~/miniconda3/bin/conda init bash
RUN ~/miniconda3/bin/conda init zsh
RUN ~/miniconda3/bin/conda config --add channels conda-forge
RUN ~/miniconda3/bin/activate
RUN ~/miniconda3/bin/conda install -y -c conda-forge r-base
RUN R -e "install.packages('BiocManager')"
RUN R -e "BiocManager::install('DESeq2')"
In lines 8 to 16 I download Miniconda and install it into ~/miniconda3.
In line 17:
RUN R -e "install.packages('BiocManager')"
I try to use R and install the BiocManager package from the command line, but I receive this error:
> [16/17] RUN R -e "install.packages('BiocManager')":
#19 2.767 /bin/sh: 1: R: not found
------
executor failed running [/bin/sh -c R -e "install.packages('BiocManager')"]: exit code: 127
I've also tried starting from the official Rocker images, but I would prefer the approach shown in this post, since I would end up with an image that has both Miniconda and R.
Can someone help me?
Thanks a lot!
Each RUN command runs in a separate shell, so your export command sets the path, but then the shell exits and the path is reset for the next RUN command.
You also have to use the absolute path, because tilde expansion doesn't work in ENV.
Instead of
RUN export PATH=~/miniconda3/bin:$PATH
try
ENV PATH=/root/miniconda3/bin:$PATH
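To see why the export is lost, the sketch below simulates Dockerfile steps with separate `sh -c` invocations, one per RUN instruction. It is only a model of Docker's behaviour: nothing is installed, and `/root/miniconda3/bin` is the path from the question, used purely as a label.

```shell
#!/bin/sh
# Sketch: Docker runs every RUN instruction as its own `/bin/sh -c` process, so
# an `export` made in one RUN is gone by the next. Each `sh -c` call below plays
# the role of one build step; nothing is installed.

MINICONDA_BIN=/root/miniconda3/bin   # path from the question, used only as a label

# Shell snippet that prints "found" if directory $1 is on the current PATH.
on_path='case ":$PATH:" in *":$1:"*) echo found ;; *) echo missing ;; esac'

# Like "RUN export PATH=/root/miniconda3/bin:$PATH && <check>": same process
step_same_run=$(sh -c "export PATH=\"\$1:\$PATH\"; $on_path" sh "$MINICONDA_BIN")

# Like the following RUN instruction: a fresh shell with the original PATH
step_next_run=$(sh -c "$on_path" sh "$MINICONDA_BIN")

# Like "ENV PATH=/root/miniconda3/bin:$PATH": Docker passes the variable into
# every later step, mimicked here by setting it in the child's environment
step_after_env=$(PATH="$MINICONDA_BIN:$PATH" sh -c "$on_path" sh "$MINICONDA_BIN")

echo "same RUN: $step_same_run, next RUN: $step_next_run, after ENV: $step_after_env"
```

On a machine where `/root/miniconda3/bin` is not already on PATH, this prints `same RUN: found, next RUN: missing, after ENV: found`.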

Docker does not find R despite pre-built layers

I am creating a Docker image to run R scripts on a VM server with no internet access.
For the first layer I install R and all libraries:
Dockerfile1
FROM r-base
## Needed to access R
ENV R_HOME /usr/lib/R
## install required libraries
RUN apt-get update
RUN apt-get -y install libgdal-dev
## install R-packages
RUN R -e "install.packages('dplyr',dependencies=TRUE, repos='http://cran.rstudio.com/')"
...
and build it:
docker build -t mycreate_od/libraries -f Dockerfile1 .
Then I use this library layer to load the R script
Dockerfile2
FROM mycreate_od/libraries
## Create directory
RUN mkdir -p /home/analysis/
## Copy files
COPY my_script_dir /home/analysis/
## Run the script
CMD R -e "source('/home/analysis/my_script_dir/main.R')"
Create the analysis layer
docker build -t mycreate_od/analysis -f vault/Dockerfile2 .
On my master VM this runs and succeeds, but on the fresh VM I get
docker run mycreate_od/analysis
ERROR: R_HOME ('/usr/lib/R') not found
Based on an earlier search for this bug, I set the R_HOME environment variable in the Dockerfile (see Dockerfile1),
but it looks like Docker installs R somewhere else.
Thanks to Dirk's advice I managed to get it done using r-apt (see the Dockerfile below).
The image then builds and runs without the R_HOME error.
It also builds much faster and produces a significantly smaller image.
FROM rocker/r-apt:bionic
RUN apt-get update && \
apt-get -y install libgdal-dev && \
apt-get install -y -qq \
r-cran-dplyr \
r-cran-rjson \
r-cran-data.table \
r-cran-crayon \
r-cran-geosphere \
r-cran-lubridate \
r-cran-sp \
r-cran-R.utils
# 'tools' ships with base R, so no install.packages("tools") step is needed
RUN R -e 'install.packages("rgdal", repos="http://R-Forge.R-project.org")'
This is unfortunately a cargo-cult solution, as I am unable to explain why the previous Dockerfile failed.
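One way to narrow down this kind of failure is to check, inside the image, whether the directory R_HOME points to really contains R's launcher script, `$R_HOME/bin/R`. The helper below is a hypothetical sketch, and `check_r_home` is not part of R or Docker. Inside a container you could also run `R RHOME`, which prints the home directory R itself uses, and compare it with the value set via ENV.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of R or Docker): check that the directory
# given as $1, or else $R_HOME, looks like an R installation, i.e. that it
# contains the executable launcher script bin/R.
check_r_home() {
  home=${1:-${R_HOME:-}}
  if [ -z "$home" ]; then
    echo "R_HOME is not set"
    return 1
  fi
  if [ -x "$home/bin/R" ]; then
    echo "ok: $home"
    return 0
  fi
  echo "not an R home (no executable $home/bin/R)"
  return 1
}
```

For example, `check_r_home /usr/lib/R` prints `ok: /usr/lib/R` only if `/usr/lib/R/bin/R` exists and is executable in the environment where you run it.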

Using docker buildkit caching with R-packages

I'm trying to use the Docker BuildKit approach of caching packages to speed up adding packages to Docker containers. I learned about it from the instructions for both Python and apt-get packages and from a useful Stack Exchange answer on caching Python packages while building Docker images. For Python and apt-get I am able to get this to work, but I can't get it to work for R packages.
In a Dockerfile for Python I'm able to change:
RUN pip install -r requirements.txt
to the following (the comment-like line at the top of the Dockerfile is required):
# syntax=docker/dockerfile:experimental
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
And then when I add a package to the requirements.txt file, rather than re-downloading and building the packages, pip is able to re-use all the work it has done. So buildkit cache mounts add a level of caching beyond the image layers of docker. It's a massive timesaver. I'm hoping to set up something similar for r-packages.
Here is what I've tried; it works for apt-get but not for R packages. I've also tried with the install2.r script.
# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
apt update && apt install -y gcc \
zsh \
vim
COPY ./requirements.R .
RUN --mount=type=cache,target=/usr/local/lib/R/site-library Rscript ./requirements.R
I think I don't understand:
How BuildKit works. Does it build containers inside a container, i.e., is the cache path on the 'build container'?
What one needs to specify as the target for R to notice that it has already downloaded (and possibly built) a package.
I suspect that it has something to do with the keep.source option when installing an R package, as discussed in this question.

How to speed up R packages installation in docker

Say you have the following list of packages you would like to install for a docker image
("jsonlite","dplyr","stringr","tidyr","lubridate",
"knitr","purrr","tm","cba","caret",
"plumber","httr")
It actually takes around 1 hour to install these!
Any suggestions on how to speed this up? (Or how to avoid re-installing the packages at every new image build?)
Side note
I do not install these packages from the dockerfile like this:
RUN Rscript -e "install.packages('stringr')"
...
Instead I create an R script Requirements.R which installs these packages and simply do:
RUN Rscript Requirements.R
Is this less optimal than installing the packages directly from the Dockerfile?
Use binary packages where you can, as we often do in the Rocker Project, which provides multiple Dockerfiles for R, including the official r-base one.
If you start from Ubuntu, you get Michael's PPAs with over 3000 packages; if you start from Debian, you get fewer from the distro but still many essential ones. (There are some efforts to bring more binary packages to Debian, but nothing is up right now.)
Lastly, building a Docker image is of course compile time too. You spend the time once (per container creation) and potentially reuse the result many times after. Also, by building on Docker Hub you can avoid spending your local CPU cycles.
Edit in Sep 2020: The (updated) Ubuntu PPA now has over 4600 packages for the three most recent LTS releases. Still highly, highly recommended.
I found an article that described how to install R packages from precompiled binaries. It reduced the build time on our Jenkins server from 45 minutes down to 3 minutes.
Here is my Dockerfile:
FROM rocker/r-apt:bionic
WORKDIR /app
RUN apt-get update && \
apt-get install -y libxml2-dev
# Install binaries (see https://datawookie.netlify.com/blog/2019/01/docker-images-for-r-r-base-versus-r-apt/)
COPY ./requirements-bin.txt .
RUN cat requirements-bin.txt | xargs apt-get install -y -qq
# Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R
# Clean up package registry
RUN rm -rf /var/lib/apt/lists/*
COPY ./src /app
EXPOSE 5000
CMD ["Rscript", "Server.R"]
You can add a file requirements-bin.txt with package names:
r-cran-plumber
r-cran-quanteda
r-cran-irlba
r-cran-lsa
r-cran-caret
r-cran-stringr
r-cran-dplyr
r-cran-magrittr
r-cran-randomforest
And finally, a requirements-src.R for packages that are not available as binaries:
pkgs <- c(
'otherpackage'
)
install.packages(pkgs)
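The split between requirements-bin.txt and requirements-src.R can also be generated instead of maintained by hand. The sketch below assumes the lowercase `r-cran-<name>` naming convention and a file listing the available binary package names. Inside the image, `apt-cache pkgnames r-cran-` can produce such a list; here it is passed in as a file so the helper can be tested. `split_requirements` and the input file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a list of CRAN package names into Debian binaries
# (written to requirements-bin.txt) and the rest (written to requirements-src.R),
# assuming binaries follow the lowercase "r-cran-<name>" naming convention.
split_requirements() {
  cran_list=$1   # one CRAN package name per line
  available=$2   # one available Debian package name per line
  : > requirements-bin.txt
  src=""
  while IFS= read -r pkg; do
    if [ -z "$pkg" ]; then
      continue
    fi
    deb="r-cran-$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
    if grep -qxF "$deb" "$available"; then
      echo "$deb" >> requirements-bin.txt
    else
      src="$src'$pkg', "
    fi
  done < "$cran_list"
  # Drop the trailing ", " and emit an R script for the source-only packages.
  printf "pkgs <- c(%s)\nif (length(pkgs) > 0) install.packages(pkgs)\n" "${src%, }" > requirements-src.R
}
```

For example, with `plumber`, `caret` and `otherpackage` in the input list and only the first two available as binaries, requirements-bin.txt gets `r-cran-plumber` and `r-cran-caret`, and requirements-src.R installs `'otherpackage'` from source.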
I ended up using rocker/r-base as @DirkEddelbuettel suggested. Also, thanks to "How to avoid reinstalling packages when building Docker image for Python projects?", I wrote my Dockerfile in a way that doesn't reinstall packages every time I rebuild my Docker image.
I want to share what my Dockerfile looks like now; hopefully this will help others:
FROM rocker/r-base
RUN apt-get update
# install packages
RUN apt-get -y install libcurl4-openssl-dev
RUN apt-get -y install libssl-dev
# set work directory
WORKDIR /myapp
# copy requirements R script
COPY ./Requirements.R /myapp/Requirements.R
# run requirements R script
RUN Rscript Requirements.R
COPY . /myapp
EXPOSE 8094
ENV NAME R-test-service
CMD ["Rscript", "my_R_api.R"]
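Here is why this ordering helps. Docker reuses a cached layer as long as the instruction and its inputs are unchanged, and for COPY the inputs are the copied files' contents. Because Requirements.R is copied on its own before `COPY . /myapp`, editing the app code leaves the `RUN Rscript Requirements.R` layer cached. The toy model below is not Docker's actual implementation. It mimics that decision with POSIX `cksum`, and `install_layer` and the stamp file are hypothetical names.

```shell
#!/bin/sh
# Toy model, not Docker's real cache implementation: the expensive install step
# only re-runs when the content of the copied Requirements.R changes, which is
# why it is COPYed separately before `COPY . /myapp`.
# POSIX cksum stands in for Docker's checksum of COPY inputs.
STAMP=.requirements.cksum   # hypothetical file remembering the last checksum

install_layer() {
  new=$(cksum < Requirements.R)
  if [ -f "$STAMP" ] && [ "$(cat "$STAMP")" = "$new" ]; then
    echo "CACHED"                        # Docker would reuse the layer
  else
    echo "$new" > "$STAMP"
    echo "RUN Rscript Requirements.R"    # Docker would re-install packages here
  fi
}
```

Calling `install_layer` twice in a row reports the install step first and `CACHED` second. Changing any other file in the directory keeps it cached, and only an edit to Requirements.R triggers the install again.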

How can I install packages in R properly and correctly using the RStudio app?

I tried to install the "xlsx" package using RStudio, and I couldn't install it.
I am trying to install my packages using install.packages("xlsxjars"). I already tried doing that from the Tools menu in the RStudio app, and I tried using the console.
I am using Linux (Gnome 17.04). I get this error:
input
install.packages("xlsxjars")
output
Installing package into ‘/home/aim/R/x86_64-pc-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/xlsxjars_0.6.1.tar.gz'
Error in install.packages : error reading from connection
If you get errors while installing R packages, such as a missing something.so library or missing packages depending on your version of R, try these commands. I had this kind of error with R 3.0.x on Ubuntu 14.04,
so I upgraded my R version to fix it with the following commands:
sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >>
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get --purge remove r-base-core
sudo apt-get update
sudo apt-get -y install r-base
sudo apt-get update
Now I can install all packages without errors.
I totally agree with the answer by @kissi salim yahia.
I just want to add: when you type in R
install.packages("your_package")
you get some dependencies listed below.
These may sometimes cause trouble, so you just need to install them manually from the shell, like:
sudo apt-get install r-cran-your_dependency
I spent a while struggling with that too. Eventually I just gave up, as the Java in that package was out of date and I would have needed to install an older Java to get it to run. I would suggest readxl instead.
A simple intro to the package can be found here: https://www.datacamp.com/community/tutorials/r-tutorial-read-excel-into-r#readxl
