How to install R packages in parallel via Docker?

I am installing several R packages from CRAN via a Dockerfile. Below is my Dockerfile:
FROM r-base:4.0.2
RUN apt-get update \
&& apt-get install -y --auto-remove \
build-essential \
libcurl4-openssl-dev \
libpq-dev \
libssl-dev \
libxml2-dev \
&& R -e "system.time(install.packages(c('shiny', 'rmarkdown', 'Hmisc', 'rjson', 'caret','DBI', 'RPostgres','curl', 'httr', 'xml2', 'aws.s3'), repos='https://cloud.r-project.org/'))"
RUN mkdir /shinyapp
COPY . /shinyapp
EXPOSE 5000
CMD ["R", "-e", "shiny::runApp('/shinyapp/src/shiny', port = 5000, host = '0.0.0.0')"]
The docker build process takes too long (25 to 30 minutes). Below are the execution-time details after the build completes:
user system elapsed
1306.268 232.438 1361.374
Is there any way to optimize the above Dockerfile? Any way to install the packages in parallel?
Note: I have also tried rocker/r-base, but had no luck improving installation speed.

‘pak’ performs package download and installation in parallel.
Unfortunately the current CRAN version of ‘pak’ (0.1.2.1) is arguably broken: it has tons of dependencies. By contrast, the development version on GitHub has no external dependencies, as it should. So we need to install that one.
So you could change your Dockerfile as follows:
…
&& Rscript -e "install.packages('pak', repos = 'https://r-lib.github.io/p/pak/dev/'); pak::pkg_install(c('shiny', 'rmarkdown', 'Hmisc', 'rjson', 'caret','DBI', 'RPostgres','curl', 'httr', 'xml2', 'aws.s3'))"
…
But, frankly, that’s quite unreadable. A better approach would be to use ARG or ENV to supply the packages to be installed (this is regardless of whether we use ‘pak’ to install packages):
FROM r-base:4.0.2
ARG PKGS="shiny, rmarkdown, Hmisc, rjson, caret, DBI, RPostgres, curl, httr, xml2, aws.s3"
RUN apt-get update \
&& apt-get install -y --auto-remove \
build-essential \
libcurl4-openssl-dev \
libpq-dev \
libssl-dev \
libxml2-dev
RUN Rscript -e 'install.packages("pak", repos = "https://r-lib.github.io/p/pak/dev/")' \
&& echo "$PKGS" \
| Rscript -e 'pak::pkg_install(strsplit(readLines("stdin"), ", ?")[[1L]])'
RUN mkdir /shinyapp
COPY . /shinyapp
EXPOSE 5000
CMD ["Rscript", "-e", "shiny::runApp('/shinyapp/src/shiny', port = 5000, host = '0.0.0.0')"]
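As an aside, the strsplit(readLines("stdin"), ", ?") call just turns the comma-separated ARG into a character vector. A plain-shell illustration of the same split (purely for clarity, not part of the build):

```shell
# Split the comma-separated package list, one name per line,
# mirroring strsplit(readLines("stdin"), ", ?")[[1L]] in R.
PKGS="shiny, rmarkdown, Hmisc, rjson"
echo "$PKGS" | tr ',' '\n' | sed 's/^ *//'
```

This prints one package name per line, which is exactly the shape pak::pkg_install() needs on the R side.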
Also note that R shouldn’t be invoked via the R binary for scripted use — that’s what Rscript is for. Amongst other things it handles stdout better.

Related

Copying R Libraries to Docker Image from Local Libraries

I created an renv lockfile for my Shiny app. As you know, when you build a Dockerfile, it has to install every package and its dependencies; if there are many packages, that takes a long time.
I have already installed the packages on my local machine, and the app will run on a remote machine. Is there any way to copy all the libraries without restoring from the renv lockfile? My local machine runs macOS and the remote machine runs Ubuntu, so using the copied libraries might not work: the R packages probably have different dependencies on macOS than on Ubuntu.
Which is the best way to install R packages in a Dockerfile? Do I have to sit through a long Docker build every time?
Dockerfile
FROM openanalytics/r-base
# system libraries of general use
RUN apt-get update && apt-get install -y \
sudo \
nano \
automake \
pandoc \
pandoc-citeproc \
libcurl4-openssl-dev \
libcairo2-dev \
libxt-dev \
libssl-dev \
libssh2-1-dev \
libssl1.1 \
libmpfr-dev \
# RPostgres
libpq-dev \
# rvest
libxml2-dev
# Install Packages
# RUN R -e "install.packages(c('janitor','DBI','RPostgres','httr', 'rvest','xml2', 'shiny', 'rmarkdown', 'tidyverse', 'lubridate','shinyWidgets', 'shinyjs', 'shinycssloaders', 'openxlsx', 'DT'), repos='https://cloud.r-project.org/')"
# copy the app to the image
RUN mkdir /root/appfile
COPY appfile /root/appfile
# renv restore
RUN Rscript -e 'install.packages("renv")'
RUN Rscript -e "renv::restore(lockfile = '/root/appfile/renv.lock')"
EXPOSE 3838
CMD ["R", "-e", "shiny::runApp('/root/appfile')"]
Thanks.
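One common way to shorten repeated builds (a sketch using the asker's paths; it does not copy macOS libraries, which indeed would not load on Ubuntu) is to order the Dockerfile so Docker's layer cache keeps the installed packages: copy only renv.lock before restoring, and copy the app code afterwards. Then only a lockfile change triggers a reinstall; app-code changes reuse the cached package layer:

```dockerfile
FROM openanalytics/r-base
# ... system libraries as above ...
# Copy ONLY the lockfile first: the restore layer below stays cached
# until renv.lock itself changes.
COPY appfile/renv.lock /root/appfile/renv.lock
RUN Rscript -e 'install.packages("renv")' \
 && Rscript -e "renv::restore(lockfile = '/root/appfile/renv.lock')"
# App code changes no longer invalidate the package layer.
COPY appfile /root/appfile
EXPOSE 3838
CMD ["Rscript", "-e", "shiny::runApp('/root/appfile')"]
```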

Why is renv in Docker not installing and loading packages?

This is the first time I'm using Docker, and the first time I'm containerizing a project. The project generates reports using latex and officer. However, after building the container and running the image, RStudio shows that none of the packages are installed or loaded.
Dockerfile
FROM rocker/verse:4.0.5
ENV RENV_VERSION 0.15.1
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"
WORKDIR /REAL
COPY . .
RUN apt-get -y update --fix-missing
#RUN apt-get -y install build-essential \
# libssl-dev \
# libcurl4-openssl-dev \
# libxml2-dev \
# libfontconfig1-dev \
# libcairo2-dev
RUN R -e 'options(renv.consent = TRUE); renv::restore(lockfile = "/REAL/renv.lock"); renv::activate()'
Building Docker image
docker build -t reports/real .
Running docker with rstudio support
docker run --rm -d -p 8787:8787 --name real_report -e ROOT=TRUE -e DISABLE_AUTH=TRUE -v C:/Users/test/sources/REAL_Reports:/home/rstudio/REAL_Reports reports/real:latest
What am I doing wrong and how do I fix it so that all packages that renv has discovered are installed and loaded?
RUN R -e 'options(renv.consent = TRUE); renv::restore(lockfile = "/REAL/renv.lock"); renv::activate()'
You likely need to activate/load your project before calling restore(), or just forgo using a project-local library altogether.
https://rstudio.github.io/renv/articles/docker.html may also be useful here.
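A minimal sketch of that advice against the asker's own Dockerfile (activation moved before restore; paths and versions as in the question):

```dockerfile
FROM rocker/verse:4.0.5
ENV RENV_VERSION 0.15.1
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"
WORKDIR /REAL
COPY . .
# Activate the project BEFORE restoring, so restore() installs into
# the project library that later sessions will actually load from.
RUN R -e 'options(renv.consent = TRUE); renv::activate(); renv::restore(lockfile = "/REAL/renv.lock")'
```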

Update SSL Certificates inside dockerfile

I have the following dockerfile:
FROM rocker/tidyverse:3.5.2
RUN apt-get update
# System dependices for R packages
RUN apt-get install -y \
git \
make \
curl \
libcurl4-openssl-dev \
libssl-dev \
pandoc \
libxml2-dev \
unixodbc \
libsodium-dev \
tzdata
# Clean up package installations
RUN apt-get clean
# ODBC system dependencies
RUN apt-get install -y gnupg apt-transport-https
RUN curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update
RUN ACCEPT_EULA=Y apt-get install msodbcsql17 -y
# Install renv (package management)
ENV RENV_VERSION 0.11.0
RUN R -e "install.packages('remotes', repos = c(CRAN = 'https://cloud.r-project.org'))"
RUN R -e "remotes::install_github('rstudio/renv@${RENV_VERSION}')"
# Specify USER for rstudio session
ENV USER rstudio
COPY ./renv.lock /renv/tmp/renv.lock
WORKDIR /renv/tmp
RUN R -e 'renv::consent(provided = TRUE)'
RUN R -e "renv::restore()"
WORKDIR /home/$USER
I use this image to recreate environments for R scripting purposes. This was working for a number of months, up until the end of September, when I started getting:
Error in curl::curl_fetch_memory(url, handle = handle) :
SSL certificate problem: certificate has expired
This occurred when using a GET request to query a website. How do I update my certificates, now and in the future, to avoid them expiring? I do not want to use the config(ssl_verifypeer = FALSE) workaround.
This happened to me too. Any chance you are working on a macOS or Linux machine? It seems to be a bug:
https://security.stackexchange.com/questions/232445/https-connection-to-specific-sites-fail-with-curl-on-macos
ca-certificates Mac OS X
SSL Certificates - OS X Mavericks
Instead of adjusting the certificates as suggested here, you can simply tell R not to verify the peer. Just add the following line at the beginning of your code.
httr::set_config(config(ssl_verifypeer = FALSE, ssl_verifyhost = FALSE))
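If you would rather repair the trust store than skip verification (an expired root certificate in the image's CA bundle is a common cause of this exact error), refreshing the system CA certificates inside the image may be enough. A hedged sketch for a Debian-based image such as rocker/tidyverse:

```dockerfile
# Rebuild the image with an up-to-date CA bundle instead of
# turning off peer verification in R.
RUN apt-get update \
 && apt-get install -y --reinstall ca-certificates \
 && update-ca-certificates \
 && apt-get clean && rm -rf /var/lib/apt/lists/*
```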

Importing R CRAN, Bioconductor and github R packages in one Dockerfile

My apologies because I think this may be a simple question but it is something that I am really struggling to understand!
As a background, I am trying to create a Dockerfile which installs a lot of R CRAN and R Bioconductor packages as well as some R packages from Github. I want to do this as quickly as possible so I'm using rocker's base image to install binary files, see here for a great, quick tutorial: https://datawookie.dev/blog/2019/01/docker-images-for-r-r-base-versus-r-apt/
My approach is first to install all my necessary packages as binaries and, if any are not available, install them from source. After this, I use the Bioconductor base image to install the necessary Bioconductor packages.
However, the packages I installed through the rocker base image aren't available after I import the Bioconductor base image. This is where I feel I lack a clear understanding of how Dockerfiles are built, and I can't seem to find an answer in any documentation. Is there some way to copy the packages over after importing another image? I don't know whether this is even necessary; I have seen others take the same approach, such as the poster of this question: Minimizing the size of docker image R shiny app
To note, I import the Bioconductor base image because I thought it would help deal with dependency issues. I suppose I could install the Bioconductor packages the same way as the R packages that weren't available as binaries, but I want this to be as quick and clean as possible and I thought that would slow things down.
Essentially, I want to know what's the quickest way to install, R binaries, R non-binaries, R bioconductor and github packages all in one dockerfile.
An example of my approach is below, with a very small subset of the packages I need. Note that it shows my full approach (R binaries, R source packages, Bioconductor and GitHub packages), but for the issue at hand, watch what happens to the tidyverse package before and after I import the Bioconductor image: library(tidyverse) runs before the second FROM but fails after it.
Dockerfile
## Use r-ubuntu, prev r-apt:bionic to enable the use of binary r packages for speed for R 4.0
FROM rocker/r-ubuntu:18.04
## Install available binaries - for speed
RUN apt-get update && \
apt-get install -y -qq \
r-cran-tidyverse \
r-cran-ids \
r-cran-snow
## Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R
## This works
RUN R -e 'library(tidyverse)'
## Install Bioconductor packages
# Docker inheritance
FROM bioconductor/bioconductor_docker:RELEASE_3_12
COPY ./requirements-bioc.R .
#Don't bother running for speed but this will run
#RUN R -e 'BiocManager::install(ask = F)' && Rscript requirements-bioc.R
#This will fail - can't find the package
RUN R -e 'library(tidyverse)'
## Install from GH the following
#Don't bother running for speed but this will run
#RUN installGithub.r mojaveazure/loomR
EXPOSE 8787
## Make R the default
CMD ["R"]
requirements-src.R
pkgs <- c(
'spelling',
'english',
'DT'
)
install.packages(pkgs)
requirements-bioc.R
bioc_pkgs<-c(
'biomaRt',
'DropletUtils',
'rhdf5'
)
BiocManager::install(bioc_pkgs,ask=F)
Just in the interest of anyone else facing a similar problem, I will post my solution. I am not suggesting this is the only one; if others find better alternatives, I'll update this answer.
In the end, my approach to creating a docker image which installs a lot of R CRAN and R Bioconductor packages, as well as some R packages from GitHub, was:
Use the latest Rocker RStudio image - to get packages installed as binary and to also enable easy debugging of your package with the correct dependencies since you can interactively run your image
Install all libraries from the latest Bioconductor image - to ensure you can install any Bioconductor package without issue
Install CRAN binaries
Install CRAN packages from source - where binaries aren't available
Install Bioconductor packages
Install Github packages
My solution uses these steps in this order and should prove a fast and efficient solution. (The use case for me was an R package which required >80 other packages from CRAN, Bioconductor and GitHub as dependencies! This solution reduced the runtime to a fraction of the original.) Also, since we are using the latest versions of Rocker RStudio and the packages, this should stay up to date with the latest versions of software and packages.
The Dockerfile looks like this:
#LABEL maintainer="John Doe"
## rocker/rstudio installs binaries from RStudio's RSPM service by default,
## Uses the latest stable ubuntu, R and Bioconductor versions
FROM rocker/rstudio
## Add packages dependencies - from Bioconductor
RUN apt-get update \
&& apt-get install -y --no-install-recommends apt-utils \
&& apt-get install -y --no-install-recommends \
## Basic deps
gdb \
libxml2-dev \
python3-pip \
libz-dev \
liblzma-dev \
libbz2-dev \
libpng-dev \
libgit2-dev \
## sys deps from bioc_full
pkg-config \
fortran77-compiler \
byacc \
automake \
curl \
## This section installs libraries
libpcre2-dev \
libnetcdf-dev \
libhdf5-serial-dev \
libfftw3-dev \
libopenbabel-dev \
libopenmpi-dev \
libxt-dev \
libudunits2-dev \
libgeos-dev \
libproj-dev \
libcairo2-dev \
libtiff5-dev \
libreadline-dev \
libgsl0-dev \
libgslcblas0 \
libgtk2.0-dev \
libgl1-mesa-dev \
libglu1-mesa-dev \
libgmp3-dev \
libhdf5-dev \
libncurses-dev \
libbz2-dev \
libxpm-dev \
liblapack-dev \
libv8-dev \
libgtkmm-2.4-dev \
libmpfr-dev \
libmodule-build-perl \
libapparmor-dev \
libprotoc-dev \
librdf0-dev \
libmagick++-dev \
libsasl2-dev \
libpoppler-cpp-dev \
libprotobuf-dev \
libpq-dev \
libperl-dev \
## software - perl extensions and modules
libarchive-extract-perl \
libfile-copy-recursive-perl \
libcgi-pm-perl \
libdbi-perl \
libdbd-mysql-perl \
libxml-simple-perl \
libmysqlclient-dev \
default-libmysqlclient-dev \
libgdal-dev \
## new libs
libglpk-dev \
## Databases and other software
sqlite \
openmpi-bin \
mpi-default-bin \
openmpi-common \
openmpi-doc \
tcl8.6-dev \
tk-dev \
default-jdk \
imagemagick \
tabix \
ggobi \
graphviz \
protobuf-compiler \
jags \
## Additional resources
xfonts-100dpi \
xfonts-75dpi \
biber \
libsbml5-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
#install R CRAN binary packages
RUN install2.r -e \
testthat
## Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R
## Install Bioconductor packages
COPY ./requirements-bioc.R .
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libfftw3-dev \
gcc && apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN Rscript -e 'requireNamespace("BiocManager"); BiocManager::install(ask=F);' \
&& Rscript requirements-bioc.R
## Install from GH the following
RUN installGithub.r theislab/kBET \
chris-mcginnis-ucsf/DoubletFinder
Note that the CRAN packages from source and the Bioconductor packages are held in separate scripts in the same folder as your Dockerfile.
requirements-src.R:
pkgs <- c(
'spelling',
'english',
'Seurat')
install.packages(pkgs)
requirements-bioc.R:
bioc_pkgs<-c(
'biomaRt',
'SingleCellExperiment',
'SummarizedExperiment')
requireNamespace("BiocManager")
BiocManager::install(bioc_pkgs,ask=F)
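As an aside, on why the two-FROM Dockerfile in the question failed: every FROM starts a fresh build stage, and files from an earlier stage survive only if you copy them across explicitly. A rough multi-stage sketch is below; the library paths are assumptions that vary by image, and packages compiled against one base image may not load in another, which is why the single-image approach above is safer:

```dockerfile
FROM rocker/r-ubuntu:18.04 AS cran
RUN apt-get update && apt-get install -y -qq r-cran-tidyverse

FROM bioconductor/bioconductor_docker:RELEASE_3_12
# Nothing from the first stage exists here unless copied in.
# Both site-library paths below are assumptions; check them in each image,
# and beware of binary incompatibility between base images.
COPY --from=cran /usr/lib/R/site-library /usr/local/lib/R/site-library
```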

Cannot install devtools in R Shiny Docker

I'm trying to build a docker image for my shiny app. Below is my dockerfile. When I build my image, everything else seems fine, except I get the error message Error in library(devtools) : there is no package called ‘devtools’ Execution halted. I also tried devtools::install_github('nik01010/dashboardthemes') with no success. I have no clue why. What could be wrong? Does anyone know what is wrong with my Dockerfile? Thanks a lot.
# Install R version 3.6
FROM r-base:3.6.0
# Install Ubuntu packages
RUN apt-get update && apt-get install -y \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxt-dev \
libssl-dev
# Download and install ShinyServer (latest version)
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
# Install R packages that are required
RUN R -e "install.packages(c('devtools', 'shiny','shinythemes','shinydashboard','shinyWidgets','shinyjs', 'tidyverse', 'dplyr', 'ggplot2','rlang','DT','lubridate', 'plotly', 'leaflet', 'mapview', 'tigris', 'rgdal', 'visNetwork', 'wordcloud2', 'arules'), repos='http://cran.rstudio.com/')"
RUN R -e "library(devtools)"
RUN R -e "install_github('nik01010/dashboardthemes')"
# Copy configuration files into the Docker image
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /app /srv/shiny-server/
# Make the ShinyApp available at port 80
EXPOSE 80
# Copy further configuration files into the Docker image
COPY shiny-server.sh /usr/bin/shiny-server.sh
CMD ["/usr/bin/shiny-server.sh"]
There are a few approaches you might try.
Easiest:
Use remotes::install_github instead of devtools. remotes has many fewer dependencies if you don't need the other functionality.
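In Dockerfile terms, the easiest fix might be sketched like this (remotes from CRAN first, then the GitHub package; repo names taken from the question):

```dockerfile
# remotes pulls in far fewer dependencies than devtools.
RUN R -e "install.packages('remotes', repos='http://cran.rstudio.com/')"
RUN R -e "remotes::install_github('nik01010/dashboardthemes')"
```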
Second Easiest:
Use the rocker/tidyverse image from Docker Hub instead of the r-base image.
docker pull rocker/tidyverse
Change line 2:
FROM rocker/tidyverse
Hardest:
Otherwise, you will need to figure out which dependencies you need to install inside your docker image before you can install devtools. It will likely be obvious if you try to install it interactively.
Make sure the container is running
Get the container name using docker ps
Start a shell with docker exec -it <container name> /bin/bash
Start R and try to install devtools interactively
