Dockerfile for minimum image size on R-base

I'm trying to minimize the size of my Docker image. It's based on r-base from the Rocker project. It needs to be as small as possible, since it is used as a container instance in a cloud-based workflow.
The image needs some extra packages (dplyr, pdftools, stringr and AzureStor). Some are available as binaries, but I could not find AzureStor as one.
I already used some recommended commands to minimize size. What more can I do? Please read the Dockerfile below. A few options I'm considering now:
Can I save space using 'no cache'? How do I implement this?
Is there a binary version of an R package like AzureStor? I can't find one.
Are there any other build commands or Dockerfile lines I can use to reduce excess size?
Any help would be much appreciated!
Here is my current Dockerfile:
FROM rocker/r-base:latest
## install binary, build and dependent packages from a single run command
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-stringr \
libxml2-dev \
libssl-dev && \
## install non-binary packages (from the same run command)
echo "r <- getOption('repos');r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile && \
Rscript -e "install.packages(c('AzureStor'))" && \
mkdir -p /scripts && \
## remove and clean what we can (still the same run command)
apt-get autoclean && \
apt-get -y autoremove libssl-dev && \
rm -rf /var/lib/apt/lists/*
## copy code
COPY script /scripts
## Set workdir
WORKDIR /scripts
## command line for autorunning the entire rscript
CMD [ "Rscript", "runscript.R"]
Right now, the size is around 800 MB. I'm hoping to get this down.
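A minimal sketch of the same image with installing, compiling and cleaning up kept in a single RUN. Each RUN produces its own layer, so files deleted in a later RUN still count toward the image size. It reuses the package list and the script/runscript.R names from the Dockerfile above. Purging the -dev header packages afterwards rests on the assumption that the compiled AzureStor dependencies only need the runtime libraries, which stay installed; the `library(AzureStor)` check makes the build fail if that assumption is wrong. Note that `docker build --no-cache` only disables the build cache; it does not make the resulting image smaller.

```dockerfile
FROM rocker/r-base:latest

# Install binaries and build headers, compile AzureStor, then clean up,
# all in one RUN so the removed files never end up in any layer.
# Assumption: purging the -dev packages leaves the runtime libraries
# (pulled in as their dependencies) installed for the compiled R packages;
# the library() call fails the build if that is not the case.
RUN apt-get update \
 && apt-get install -y -qq --no-install-recommends \
      r-cran-pdftools \
      r-cran-dplyr \
      r-cran-stringr \
      libxml2-dev \
      libssl-dev \
 && Rscript -e "install.packages('AzureStor', repos = 'https://cloud.r-project.org')" \
 && apt-get purge -y libxml2-dev libssl-dev \
 && Rscript -e "library(AzureStor)" \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/* /tmp/*

COPY script /scripts
WORKDIR /scripts
CMD ["Rscript", "runscript.R"]
```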

Related

Docker does not find R despite pre-built layers

I created a Docker image to run R scripts on a VM server with no internet access.
For the first layer, I install R and all the libraries:
Dockerfile1
FROM r-base
## Needed to access R
ENV R_HOME /usr/lib/R
## install required libraries
RUN apt-get update
RUN apt-get -y install libgdal-dev
## install R-packages
RUN R -e "install.packages('dplyr',dependencies=TRUE, repos='http://cran.rstudio.com/')"
...
and build it:
docker build -t mycreate_od/libraries -f Dockerfile1 .
Then I use this library image as the base layer for the R script:
Dockerfile2
FROM mycreate_od/libraries
## Create directory
RUN mkdir -p /home/analysis/
## Copy files
COPY my_script_dir /home/analysis/
## Run the script
CMD R -e "source('/home/analysis/my_script_dir/main.R')"
Build the analysis image:
docker build -t mycreate_od/analysis -f vault/Dockerfile2 .
On my master VM this runs and succeeds, but on a fresh VM I get:
docker run mycreate_od/analysis
R docker ERROR: R_HOME ('/usr/lib/R') not found
Based on a previous bug search, I set the R_HOME environment variable in the Dockerfile (see Dockerfile1),
but it looks like Docker installs R somewhere else.
Thanks to Dirk's advice, I managed to get it working using rocker/r-apt (see the Dockerfile below).
The image then builds and runs without the R_HOME error.
BTW, the build is much faster and the resulting image is significantly smaller.
FROM rocker/r-apt:bionic
RUN apt-get update && \
apt-get -y install libgdal-dev && \
apt-get install -y -qq \
r-cran-dplyr \
r-cran-rjson \
r-cran-data.table \
r-cran-crayon \
r-cran-geosphere \
r-cran-lubridate \
r-cran-sp \
r-cran-R.utils
## note: 'tools' is a base R package, so this line only triggers a warning
RUN R -e 'install.packages("tools")'
RUN R -e 'install.packages("rgdal", repos="http://R-Forge.R-project.org")'
This is unfortunately a cargo-cult solution, as I am unable to explain why the previous Dockerfile failed.
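For comparison, a sketch of Dockerfile1 that leaves R_HOME to the image (R and Rscript determine it themselves) and follows Docker's guidance of running `apt-get update` in the same RUN as `apt-get install`, so a cached update layer is never reused with a newer install list. The `R.home()` line is only a diagnostic that prints where R lives in this base image; this is not a confirmed explanation of the fresh-VM failure, which the thread leaves open. `dependencies = TRUE` from the original is dropped here because it also installs every package listed under Suggests, which can make the image much larger.

```dockerfile
FROM r-base

# No ENV R_HOME here: R and Rscript locate their own home directory.
# update and install share one RUN so the package index is never stale.
RUN apt-get update \
 && apt-get install -y --no-install-recommends libgdal-dev \
 && rm -rf /var/lib/apt/lists/*

# Diagnostic only: print the R home directory this image actually uses
RUN Rscript -e 'cat(R.home(), "\n")'

RUN Rscript -e "install.packages('dplyr', repos = 'https://cloud.r-project.org/')"
```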

Why renv in docker is not installing and loading packages?

This is the first time I'm using Docker. I have tried it a few times before, but this is the first time I'm containerizing a project. The project generates reports using LaTeX and officer. However, after building the image and running the container, RStudio shows that none of the packages are installed or loaded.
Dockerfile
FROM rocker/verse:4.0.5
ENV RENV_VERSION 0.15.1
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv#${RENV_VERSION}')"
WORKDIR /REAL
COPY . .
RUN apt-get -y update --fix-missing
#RUN apt-get -y install build-essential \
# libssl-dev \
# libcurl4-openssl-dev \
# libxml2-dev \
# libfontconfig1-dev \
# libcairo2-dev
RUN R -e 'options(renv.consent = TRUE); renv::restore(lockfile = "/REAL/renv.lock"); renv::activate()'
Building the Docker image:
docker build -t reports/real .
Running Docker with RStudio support:
docker run --rm -d -p 8787:8787 --name real_report -e ROOT=TRUE -e DISABLE_AUTH=TRUE -v C:/Users/test/sources/REAL_Reports:/home/rstudio/REAL_Reports reports/real:latest
What am I doing wrong and how do I fix it so that all packages that renv has discovered are installed and loaded?
RUN R -e 'options(renv.consent = TRUE); renv::restore(lockfile = "/REAL/renv.lock"); renv::activate()'
You likely need to activate / load your project before calling restore(), or just forego using a project-local library altogether.
https://rstudio.github.io/renv/articles/docker.html may also be useful here.
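A sketch of the second option (no project-local library), in the spirit of the renv Docker article linked above: only the lockfile is copied before restoring, and packages are restored into the image's site library so that every R session, including RStudio's, can see them. Two assumptions: that `.libPaths()[1]` is the site library in this rocker image, and that the mounted project at /home/rstudio/REAL_Reports does not activate its own renv library on startup (if its .Rprofile sources renv/activate.R, R sessions in that project will look there instead). Separately, the `#` in the question's `install_github('rstudio/renv#...')` call is remotes syntax for a pull request number, not a tag; the sketch sidesteps that by installing the pinned version from the CRAN archive with `remotes::install_version()`.

```dockerfile
FROM rocker/verse:4.0.5

ENV RENV_VERSION=0.15.1
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
# Pinned renv from the CRAN archive (assumes this version is published there)
RUN R -e "remotes::install_version('renv', version = '${RENV_VERSION}', repos = 'https://cloud.r-project.org')"

WORKDIR /REAL
# Copy only the lockfile first so this layer is rebuilt only when it changes
COPY renv.lock renv.lock
# Restore into the site library (assumed to be .libPaths()[1] here)
# instead of a project-local renv library
RUN R -e "options(renv.consent = TRUE); renv::restore(lockfile = 'renv.lock', library = .libPaths()[1])"

COPY . .
```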

Reduce Docker Image Size with R-installation & dependencies

I am using images to run R in its base form with certain packages and their dependencies installed. For this, I create an intermediary (base) image, which I subsequently use to build the final image.
Below is the Dockerfile that builds this intermediary image (not the final image).
Goal
I want to reduce the total size AND the number of layers (and thus reduce the time to 'pull' the image from the registry).
Question
What can I do to further reduce size and layers in the final image?
Current approach
Please look at the docker file enclosed:
I used a multi-stage approach to keep as many build-only libraries as I could find confined to stage 2.
Some R packages are available as binaries, but some are not. That's why some packages are installed with Rscript -e and others with apt-get update / apt-get install. The latter is faster and takes up less space.
I only install the libraries the image needs to actually run the R-session using all the packages provided.
Multistage Dockerfile to create an intermediary image
# Base image
FROM rocker/r-base:latest AS stage1
#install binary & build dependencies
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
r-cran-pdftools \
r-cran-dplyr \
r-cran-knitr \
r-cran-magick \
r-cran-purrr \
r-cran-tidyr \
r-cran-tm \
r-cran-lubridate \
r-cran-ggplot2 \
r-cran-readxl \
pandoc \
libxml2 \
libssl1.1 \
tesseract-ocr-eng \
tesseract-ocr-nld \
liblept5 \
libgit2-dev \
libtesseract4 && \
apt-get autoclean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /tmp/*
##Build the second stage
FROM stage1 AS stage2
RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
libtesseract-dev \
libleptonica-dev \
libxml2-dev \
libgit2-dev \
libssl-dev && \
Rscript -e "install.packages(c('rmarkdown', 'forcats', 'tesseract', 'AzureStor', 'AzureKeyVault', 'stopwords', 'SnowballC', 'NbClust', 'flexdashboard', 'formattable', 'htmlwidgets', 'xgboost'))"
#use stage 1 as base and copy run time libraries needed for final version
FROM stage1
COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
Any help is much appreciated!
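One low-cost safeguard for this layout, sketched as a replacement for the last two lines of the Dockerfile above: because only /usr/local/lib/R/site-library is copied out of stage2, a package compiled there against a -dev library only fails at run time if its runtime counterpart is missing from stage1. Loading a few of the source-built packages in the final stage turns such a gap into a build failure. The package names come from the question; the extra RUN adds one (near-empty) layer, so it can be removed once the image is known to work if the layer count matters.

```dockerfile
FROM stage1
COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
# Smoke test: library() errors (and Rscript exits non-zero) if a source-built
# package cannot find a shared library it was compiled against
RUN Rscript -e "invisible(lapply(c('tesseract', 'AzureStor', 'AzureKeyVault', 'xgboost'), library, character.only = TRUE))"
```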

How to install R packages in parallel way via docker?

I am installing several R packages from CRAN via a Dockerfile. Below is my Dockerfile:
FROM r-base:4.0.2
RUN apt-get update \
&& apt-get install -y --auto-remove \
build-essential \
libcurl4-openssl-dev \
libpq-dev \
libssl-dev \
libxml2-dev \
&& R -e "system.time(install.packages(c('shiny', 'rmarkdown', 'Hmisc', 'rjson', 'caret','DBI', 'RPostgres','curl', 'httr', 'xml2', 'aws.s3'), repos='https://cloud.r-project.org/'))"
RUN mkdir /shinyapp
COPY . /shinyapp
EXPOSE 5000
CMD ["R", "-e", "shiny::runApp('/shinyapp/src/shiny', port = 5000, host = '0.0.0.0')"]
The docker build takes too long (25 to 30 minutes). Below is the system.time() output after the build completed:
    user   system  elapsed
1306.268  232.438 1361.374
Is there any way to optimize the above Dockerfile? Is there any way to install packages in parallel?
Note: I have also tried rocker/r-base, but it did not improve installation speed.
‘pak’ performs package download and installation in parallel.
Unfortunately the current CRAN version of ‘pak’ (0.1.2.1) is arguably broken: it has tons of dependencies. By contrast, the development version on GitHub has no external dependencies, as it should. So we need to install that one.
So you could change your Dockerfile as follows:
…
&& Rscript -e "install.packages('pak', repos = 'https://r-lib.github.io/p/pak/dev/'); pak::pkg_install(c('shiny', 'rmarkdown', 'Hmisc', 'rjson', 'caret','DBI', 'RPostgres','curl', 'httr', 'xml2', 'aws.s3'))"
…
But, frankly, that’s quite unreadable. A better approach would be to use ARG or ENV to supply the packages to be installed (this is regardless of whether we use ‘pak’ to install packages):
FROM r-base:4.0.2
ARG PKGS="shiny, rmarkdown, Hmisc, rjson, caret, DBI, RPostgres, curl, httr, xml2, aws.s3"
RUN apt-get update \
&& apt-get install -y --auto-remove \
build-essential \
libcurl4-openssl-dev \
libpq-dev \
libssl-dev \
libxml2-dev
RUN Rscript -e 'install.packages("pak", repos = "https://r-lib.github.io/p/pak/dev/")' \
&& echo "$PKGS" \
| Rscript -e 'pak::pkg_install(strsplit(readLines("stdin"), ", ?")[[1L]])'
RUN mkdir /shinyapp
COPY . /shinyapp
EXPOSE 5000
CMD ["Rscript", "-e", "shiny::runApp('/shinyapp/src/shiny', port = 5000, host = '0.0.0.0')"]
Also note that R shouldn’t be invoked via the R binary for scripted use — that’s what Rscript is for. Amongst other things it handles stdout better.
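Without pak, base R's `install.packages()` also has an `Ncpus` argument that installs several source packages in parallel, which can help when many packages have to be compiled. A sketch using the package list from the question; `parallel::detectCores()` is just one way to choose the value, and the actual speed-up depends on the build machine's cores and on how the packages depend on each other.

```dockerfile
FROM r-base:4.0.2

RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      build-essential \
      libcurl4-openssl-dev \
      libpq-dev \
      libssl-dev \
      libxml2-dev \
 && rm -rf /var/lib/apt/lists/*

# Ncpus > 1 lets install.packages() build independent source packages in parallel
RUN Rscript -e "install.packages(c('shiny', 'rmarkdown', 'Hmisc', 'rjson', 'caret', 'DBI', 'RPostgres', 'curl', 'httr', 'xml2', 'aws.s3'), repos = 'https://cloud.r-project.org/', Ncpus = parallel::detectCores())"

COPY . /shinyapp
EXPOSE 5000
CMD ["Rscript", "-e", "shiny::runApp('/shinyapp/src/shiny', port = 5000, host = '0.0.0.0')"]
```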

cannot install devtools in r shiny docker

I'm trying to build a Docker image for my Shiny app. Below is my Dockerfile. When I build the image, everything else seems fine, except that I get the error message Error in library(devtools) : there is no package called ‘devtools’ Execution halted. I also tried devtools::install_github('nik01010/dashboardthemes') with no success. I have no clue why. What could go wrong? Does anyone know what is wrong with my Dockerfile? Thanks a lot.
# Install R version 3.6
FROM r-base:3.6.0
# Install Ubuntu packages
RUN apt-get update && apt-get install -y \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev \
libssl-dev
# Download and install ShinyServer (latest version)
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
# Install R packages that are required
RUN R -e "install.packages(c('devtools', 'shiny','shinythemes','shinydashboard','shinyWidgets','shinyjs', 'tidyverse', 'dplyr', 'ggplot2','rlang','DT','lubridate', 'plotly', 'leaflet', 'mapview', 'tigris', 'rgdal', 'visNetwork', 'wordcloud2', 'arules'), repos='http://cran.rstudio.com/')"
RUN R -e "library(devtools)"
RUN R -e "install_github('nik01010/dashboardthemes')"
# Copy configuration files into the Docker image
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /app /srv/shiny-server/
# Make the ShinyApp available at port 80
EXPOSE 80
# Copy further configuration files into the Docker image
COPY shiny-server.sh /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]
There are a few approaches you might try.
Easiest:
Use remotes::install_github instead of devtools. remotes has many fewer dependencies if you don't need the other functionality.
Second Easiest:
Use the rocker/tidyverse image from Docker Hub instead of the r-base image.
docker pull rocker/tidyverse
Change line 2 (the FROM line) to:
FROM rocker/tidyverse
Hardest:
Otherwise, you will need to figure out which system dependencies you need to install inside your Docker image before you can install devtools. They will likely become obvious if you try to install it interactively:
Make sure the container is running
Get the container name using docker ps
Start a shell with docker exec -it <container name> /bin/bash
Start R and try to install devtools interactively
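Two details in the question's Dockerfile explain the confusing error: `install.packages()` only warns when a package fails to build, so the build carries on without devtools, and every RUN starts a fresh R session, so `library(devtools)` in one RUN does not attach it for the next. A sketch of the remotes route (only the relevant lines) that makes a failed install stop the build; `quit(status = 1)` is just one way to turn the missing package into a build error.

```dockerfile
FROM r-base:3.6.0

# install.packages() only warns on failure, so check explicitly and exit non-zero
RUN Rscript -e "install.packages('remotes', repos = 'https://cloud.r-project.org/'); if (!requireNamespace('remotes', quietly = TRUE)) quit(status = 1)"

# Call remotes with '::' in the same R session instead of relying on an
# earlier RUN's library() call
RUN Rscript -e "remotes::install_github('nik01010/dashboardthemes')"
```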
