Using docker buildkit caching with R packages

I'm trying to use the Docker BuildKit approach to caching packages to speed up adding packages to Docker containers. I learned about it from the instructions for both Python and apt-get packages, and from a useful Stack Exchange answer on caching Python packages while building Docker images. For Python and apt-get I am able to get this to work, but I can't get it to work for R packages.
In a Dockerfile for Python I'm able to change:
RUN pip install -r requirements.txt
to the following (the comment-looking line at the top of the Dockerfile is required):
# syntax=docker/dockerfile:experimental
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
Then, when I add a package to the requirements.txt file, rather than re-downloading and rebuilding all the packages, pip is able to reuse the work it has already done. BuildKit cache mounts thus add a level of caching beyond Docker's image layers, and it's a massive timesaver. I'm hoping to set up something similar for R packages.
Here is what I've tried; it works for apt-get but not for R packages. I've also tried with the install2.r script.
# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
    apt update && apt install -y gcc \
        zsh \
        vim
COPY ./requirements.R .
RUN --mount=type=cache,target=/usr/local/lib/R/site-library Rscript ./requirements.R
I think I don't understand:
How BuildKit works. Does it build containers inside a container, i.e. is the cache path on the 'build container'?
What one needs to specify as the target for R to notice that it has already downloaded (and possibly built) a package.
I suspect it has something to do with the keep.source option when installing an R package, as discussed in this question.
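
One direction that may help (my sketch, not a confirmed answer): a cache mount exists only for the duration of its RUN instruction and is not committed to the image layer, so mounting the cache over /usr/local/lib/R/site-library means the freshly installed packages vanish from the final image. Mounting over a download/metadata cache, as pip does with /root/.cache/pip, seems more promising. For example, pak (via pkgcache) keeps its cache under ~/.cache/R, so something along these lines might work, assuming requirements.R installs packages through pak::pkg_install():
# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse
RUN Rscript -e "install.packages('pak')"
COPY ./requirements.R .
# Cache pak's download/metadata directory across builds; the installed
# library itself still lands in the image layer as usual.
RUN --mount=type=cache,target=/root/.cache/R Rscript ./requirements.R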

Related

Problem building R api with plumber, RPostgreSQL, and docker

I'm trying to install plumber and RPostgreSQL into my docker image. Here's my Dockerfile:
FROM rocker/r-base
RUN R -e "install.packages('plumber')"
RUN R -e "install.packages('RPostgres')"
RUN mkdir -p /code
COPY ./plumber.R /code/plumber.R
CMD Rscript --no-save /code/plumber.R
The only thing my plumber script does is try to reference the RPostgreSQL package:
library('RPostgreSQL')
When I build, it appears to successfully install both packages, but when my script runs, it complains that RPostgreSQL doesn't exist. I've tried other base images, I've tried many things.
Any help appreciated. Thanks!
You are trying to install RPostgres and then trying to load RPostgreSQL -- these are different packages. Hence the error.
Next, as you are on r-base, the latter is installed more easily via sudo apt install r-cran-rpostgresql (maybe after an initial sudo apt update). While you're at it, you can also install plumber as a pre-made binary (along with its dependencies). So
RUN apt update -qq \
    && apt install --yes --no-install-recommends \
        r-cran-rpostgresql \
        r-cran-plumber
is easier and faster.

Deleted /usr/bin/dotnet and pacman -S dotnet-sdk will not install it

I am trying to write a web API in dotnet core on my Manjaro (Arch) Linux distro.
I installed the edge version of dotnet first (^3), since I like the bleeding edge. However, I had already made the project using dotnet 2.2 on a different computer, so I installed dotnet 2.2 as well. This did not work, since the version in /usr/bin/ was still 3.0.
I deleted the executable from /usr/bin and now I can't get it back. I have run pacman -Su and pacman -R, and I have tried rebooting as well.
Note: the first two times I installed them, I did it with yay -S dotnet-sdk, which allowed me to choose from the different versions.
You can force installation of packages that are already installed using --force in pacman.
So, you should be able to get the binary again by using sudo pacman -S dotnet-sdk --force.
You might also attempt to remove dotnet-sdk before you install; you can do that by running sudo pacman -Rns dotnet-sdk (remove the package along with its configuration files and any dependencies not required by any other package).
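A side note I'll add (not from the original answer): pacman 5.2 removed the --force flag, so on current Arch/Manjaro the equivalent is --overwrite with a glob:
# --overwrite <glob> is the modern replacement for --force
sudo pacman -S dotnet-sdk --overwrite '*'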

Answer yes to repeated Unix prompts [duplicate]

I'm trying to uninstall all django packages in my superuser environment to ensure that all my webapp dependencies are installed to my virtualenv.
sudo su
sudo pip freeze | grep -E '^django-' | xargs pip -q uninstall
But pip wants to confirm every package uninstall, and there doesn't seem to be a -y option for pip. Is there a better way to uninstall a batch of python modules? Is rm -rf .../site-packages/ a proper way to go? Is there an easy_install alternative?
Alternatively, would it be better to force pip to install all dependencies to the virtualenv rather than relying on the system python modules to meet those dependencies, e.g. pip --upgrade install, but forcing even equally old versions to be installed to override any system modules. I tried activating my virtualenv and then pip install --upgrade -r requirements.txt and that does seem to install the dependencies, even those existing in my system path, but I can't be sure if that's because my system modules were old. And man pip doesn't seem to guarantee this behavior (i.e. installing the same version of a package that already exists in the system site-packages).
Starting with pip version 7.1.2 you can run pip uninstall -y <python package(s)>:
pip uninstall -y package1 package2 package3
or from file
pip uninstall -y -r requirements.txt
Pip does NOT include a --yes option (as of pip version 1.3.1).
WORKAROUND: pipe yes to it!
$ sudo ls # enter pw so not prompted again
$ /usr/bin/yes | sudo pip uninstall pymongo
If you want to uninstall every package from requirements.txt,
pip uninstall -y -r requirements.txt
On www.saturncloud.io, in Jupyter notebooks, one can use it like this:
!yes | pip uninstall tensorflow
!yes | pip uninstall gast
!yes | pip uninstall tensorflow-probability
Alternatively, would it be better to force pip to install all dependencies to the virtualenv rather than relying on the system python modules to meet those dependencies,
Yes. Don't mess too much with the built-in, system-installed packages. Many parts of the system, particularly in OS X (and even Debian and its derivatives), depend heavily on them.
pip --upgrade install, but forcing even equally old versions to be installed to override any system modules.
It should not be a big deal if a few extra packages are installed within the venv that are already there in the system, particularly if they are of different versions. That's the whole point of virtualenv.
I tried activating my virtualenv and then pip install --upgrade -r requirements.txt and that does seem to install the dependencies, even those existing in my system path, but I can't be sure if that's because my system modules were old. And man pip doesn't seem to guarantee this behavior (i.e. installing the same version of a package that already exists in the system site-packages).
No, it doesn't reinstall the packages already present in the main installation unless you created the virtualenv with the --no-site-packages flag, or the required and present versions differ.
Lakshman Prasad was right: pip --upgrade and/or virtualenv --no-site-packages is the way to go. Uninstalling the system-wide Python modules is bad.
The --upgrade option to pip does install required modules in the virtual env, even if they already exist in the system environment, and even if the required version or latest available version is the same as the system version.
pip --upgrade install
And using the --no-site-packages option when creating the virtual environment ensures that missing dependencies can't be masked by modules present on the system path. This helps expose problems during migration of a module from one package to another, e.g. pinax.apps.groups -> django-groups, especially when the problem is with load templatetags statements in Django, which search all available modules for templatetags directories and the tag definitions within.
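Putting the two pieces together, a minimal shell sketch (note that --no-site-packages later became virtualenv's default behavior):
# create an isolated environment that ignores system site-packages
virtualenv --no-site-packages venv
. venv/bin/activate
# install pinned requirements into the venv, even if the same
# versions already exist system-wide
pip install --upgrade -r requirements.txt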
pip install -U xxxx
can bypass the confirmation prompt.

How to speed up R package installation in Docker

Say you have the following list of packages you would like to install for a Docker image:
c("jsonlite", "dplyr", "stringr", "tidyr", "lubridate",
  "knitr", "purrr", "tm", "cba", "caret",
  "plumber", "httr")
It actually takes around 1 hour to install these!
Any suggestions into how to speed up such a thing ? (or how to prevent the re-installation at every new image build ?)
Side note
I do not install these packages from the Dockerfile like this:
RUN Rscript -e "install.packages('stringr')"
...
Instead, I create an R script Requirements.R which installs these packages, and simply run:
RUN Rscript Requirements.R
Is this less optimal than installing the packages directly from the Dockerfile?
Use binary packages where you can, as we often do in the Rocker Project, which provides multiple Dockerfiles for R, including the official r-base one.
If you start from Ubuntu, you get Michael's PPAs with over 3000 packages; if you start from Debian you get fewer from the distro, but still many essential ones. (There are some efforts to bring more binary packages to Debian, but nothing is up right now.)
Lastly, Dockerfile creation is of course compile time too. You spend the time once (per container creation) and potentially re-use it many times after. Also, by using Docker Hub you can avoid spending your local CPU cycles.
Edit in Sep 2020: The (updated) Ubuntu PPA now has over 4600 packages for the three most recent LTS releases. Still highly, highly recommended.
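To make that concrete, here is a minimal sketch I'm adding (the PPA names are ppa:marutter/rrutter4.0 for R itself and ppa:c2d4u.team/c2d4u4.0+ for the CRAN binaries; the package selection is just an example):
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# enable the PPAs that carry R and thousands of prebuilt CRAN packages
RUN apt-get update \
    && apt-get install -y --no-install-recommends software-properties-common dirmngr \
    && add-apt-repository -y ppa:marutter/rrutter4.0 \
    && add-apt-repository -y ppa:c2d4u.team/c2d4u4.0+ \
    && apt-get update \
    && apt-get install -y --no-install-recommends \
        r-base r-cran-dplyr r-cran-stringr r-cran-caret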
I found an article that described how to install R packages from precompiled binaries. It reduced the build time on our Jenkins server from 45 minutes down to 3 minutes.
Here is my Dockerfile:
FROM rocker/r-apt:bionic
WORKDIR /app
RUN apt-get update && \
    apt-get install -y libxml2-dev
# Install binaries (see https://datawookie.netlify.com/blog/2019/01/docker-images-for-r-r-base-versus-r-apt/)
COPY ./requirements-bin.txt .
RUN cat requirements-bin.txt | xargs apt-get install -y -qq
# Install remaining packages from source
COPY ./requirements-src.R .
RUN Rscript requirements-src.R
# Clean up package registry
RUN rm -rf /var/lib/apt/lists/*
COPY ./src /app
EXPOSE 5000
CMD ["Rscript", "Server.R"]
You can add a file requirements-bin.txt with package names:
r-cran-plumber
r-cran-quanteda
r-cran-irlba
r-cran-lsa
r-cran-caret
r-cran-stringr
r-cran-dplyr
r-cran-magrittr
r-cran-randomforest
And finally, a requirements-src.R for packages that are not available as binaries:
pkgs <- c(
  'otherpackage'
)
install.packages(pkgs)
I ended up using rocker/r-base as @DirkEddelbuettel suggested. Also, thanks to How to avoid reinstalling packages when building Docker image for Python projects?, I wrote my Dockerfile in a way that doesn't reinstall packages every time I rebuild my Docker image: since Docker caches build layers, the RUN Rscript Requirements.R layer is reused as long as Requirements.R itself is unchanged.
I want to share what my Dockerfile looks like now, hoping it will be of help to others:
FROM rocker/r-base
RUN apt-get update
# install system libraries
RUN apt-get -y install libcurl4-openssl-dev
RUN apt-get -y install libssl-dev
# set work directory
WORKDIR /myapp
# copy requirements R script
COPY ./Requirements.R /myapp/Requirements.R
# run requirements R script
RUN Rscript Requirements.R
COPY . /myapp
EXPOSE 8094
ENV NAME R-test-service
CMD ["Rscript", "my_R_api.R"]

MXNet R package on an Amazon Linux Deep Learning EC2 instance

I'm attempting to setup an Amazon Linux EC2 instance with MXNet and R (and the MXNet r package available as well). Unfortunately this has been a lot harder than I expected.
I've attempted to follow the instructions from MXNet using Amazon's deep learning AMI with CUDA 8.0 on a p2.xlarge (https://mxnet.incubator.apache.org/get_started/install.html)
However I get the same error when attempting to compile the mxnet r package from this SO post:
Issues installing mxnet GPU R package for Amazon deep learning AMI
The solutions discussed in that post are somewhat beyond my abilities to fully test/debug, i.e. I'm not particularly familiar with modifying Linux environment variables and the like. I've also reviewed some issues raised on the apache-incubator GitHub for MXNet, and those were pretty unhelpful as well.
So my questions are:
Is anyone aware of any available AMIs which come pre-packaged with R and MXNet? The ones I see seem to only include Python.
Does anyone have a working set of instructions (or a script) to run on an Amazon Linux EC2 instance to install the required dependencies (assuming I'm using some type of deep learning AMI that comes with at least CUDA 8.0) for the MXNet R package?
Right, so I was the guy on the other post, and I DID eventually get it working. Took 50+ hours, and I'm not 100% sure where the issue was because... Linux.
sudo yum install R
sudo yum install libxml2-devel
sudo yum install cairo-devel
sudo yum install giflib-devel
sudo yum install libXt-devel
sudo R
install.packages("devtools")
library(devtools)
install_github("igraph/rigraph")
install.packages(c("DiagrammeR", "roxygen2", "rgexf", "influenceR", "Cairo", "imager"))
cd
cd /src/mxnet
cp make/config.mk .
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "ADD_LDFLAGS += /usr/local/lib" >>config.mk
echo "USE_CUDA=1" >>config.mk
echo "USE_CUDA_PATH=/usr/local/cuda-9.0/lib64" >>config.mk
echo "USE_CUDNN=1" >>config.mk
# add another LD flag for /usr/local/lib:
cd /etc/ld.so.conf.d/
sudo nano cuda.conf
# insert the line: /usr/local/cuda-9.0/lib64
cd
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
sudo ldconfig
cd R-package
Rscript -e "install.packages('devtools', repo = 'https://cran.rstudio.com')"
Rscript -e "library(devtools); library(methods);options(repos=c(CRAN='https://cran.rstudio.com'));install_deps(dependencies = TRUE)"
cd ..
sudo make rpkg
THEN you gotta make sure R/RStudio can actually find those libraries:
cd /etc/rstudio
sudo nano rserver.conf
You can add elements to the default LD_LIBRARY_PATH for R sessions (as determined by the R ldpaths script) by adding an rsession-ld-library-path entry to the server config file. This might be useful for ensuring that packages can locate external library dependencies that aren't installed in the system standard library paths. For example:
rsession-ld-library-path=/opt/local/lib:/usr/local/cuda/lib64
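As a final sanity check, something I'd add here (not part of the original answer; mx.nd.ones and mx.gpu are standard mxnet R API calls) to confirm the package loads and can see the GPU:
Rscript -e 'library(mxnet); print(mx.nd.ones(c(2, 3), ctx = mx.gpu()))'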
