Let Docker image build fail when R package installation returns error

I am trying to create a custom Docker image based on Rocker using Dockerfile. In the Dockerfile I am pulling my own R package from a custom GitLab server using:
RUN R -e "devtools::install_git('[custom gitlab server]', quiet = FALSE)"
Everything usually works, but I have noticed that when the GitLab server is down, or the machine running Docker is low on RAM, the package does not install correctly and an error message appears in the R console. This behavior is to be expected. However, Docker does not notice the error produced by R and continues evaluating the rest of the Dockerfile. I would like Docker to fail building the image when this occurs. That way, I could ultimately prevent Kubernetes from automatically deploying the incomplete Docker container.
So far I have thought of two potential solutions, but I am struggling with the execution:
R level: Wrap tryCatch() around devtools::install_git() to catch the error. But then what? Use stop()? Will that cause the Docker build process to stop as well? Could withCallingHandlers() be used?
Dockerfile level: Use a shell command to check for errors? I cannot check the contents of R --help as I do not have a Linux machine at the moment, so I am not sure what R -e actually does (evaluate the expression, I presume) or which other options can be passed to R.
It seems that a similar issue is discussed here and here, but I do not understand how they solved it.
So, how can I make sure that no Docker image without the custom package ends up running on the Kubernetes cluster?

The Docker build process stops as soon as one of the commands in the Dockerfile returns a non-zero exit status.
install_git() doesn't seem to throw an error when the package isn't installed successfully, so execution simply continues.
An obvious way to go is to wrap the installation inside a dedicated R script that throws an error if the installation didn't finish successfully; the resulting non-zero exit status then stops the build.
So I would suggest something like this ...
Create installation script install_gitlab.R:
### file install_gitlab.R
## change repo and package name!!
repo <- '[custom gitlab server]'
pkgname <- 'testpackage'
devtools::install_git(repo, quiet = FALSE)
stopifnot(pkgname %in% installed.packages()[,'Package'])
Modify your Dockerfile accordingly (replace the install_git line):
...
ADD install_gitlab.R /runscripts/install_gitlab.R
RUN Rscript /runscripts/install_gitlab.R
...
One thing to keep in mind: this approach assumes the package you're trying to install is NOT already installed before the script runs, since a pre-existing installation would make the stopifnot() check pass even if the fresh installation failed.
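If you prefer the R-level idea from the question (tryCatch() plus an explicit exit) over stopifnot(), a minimal sketch could look as follows; the repo URL and package name are placeholders, and an error raised in a script run via Rscript makes the process exit with a non-zero status, which fails the build:
## sketch: fail hard if installation errors or the package is still missing
tryCatch(
  devtools::install_git('[custom gitlab server]', quiet = FALSE),
  error = function(e) {
    message('install_git() failed: ', conditionMessage(e))
    quit(save = 'no', status = 1)  # non-zero exit status fails the RUN step
  }
)
if (!requireNamespace('testpackage', quietly = TRUE)) {
  quit(save = 'no', status = 1)    # also catch silent installation failures
}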

If you're using a rocker image, it already has the littler package installed, which provides the handy installGithub.r script. I believe it should already have the functionality you want; if not, it at least simplifies running a custom install script.
A Dockerfile RUN command using littler just looks like:
RUN installGithub.r "yourRepo"

Related

How to use R libraries in Azure Databricks without depending on CRAN server connectivity?

We are using a few R libraries in Azure Databricks which do not come preinstalled. To install these libraries during Job Runs on Job Clusters, we use an init script to install them.
sudo R --vanilla -e 'install.packages("package_name",
  repos = "https://mran.microsoft.com/snapshot/YYYY-MM-DD")'
During one of our production runs, the Microsoft Server was down (could the timing be any worse?) and the job failed.
As a workaround, we now install libraries in /dbfs/folder_x and when we want to use them, we include the following block in our R code:
.libPaths('/dbfs/folder_x')
library("libraryName")
This does work for us, but what is the ideal solution? If we want to update a library to another version, remove a library, or add one, we have to go through the following steps every time, and there is a chance of forgetting this during code promotions:
install.packages("xyz")
system("cp -R /databricks/spark/R/lib/xyz /dbfs/folder_x/xyz")
It is a very simple and workable solution, but not ideal.
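One way to cut out the manual copy step (a sketch, assuming the same /dbfs/folder_x layout as above; the package name and snapshot date are placeholders) is to install straight into the DBFS library via the lib argument of install.packages(), so the init script and the R code agree on one location:
## install directly into the shared DBFS library
lib <- "/dbfs/folder_x"
dir.create(lib, showWarnings = FALSE, recursive = TRUE)
install.packages("xyz", lib = lib,
                 repos = "https://mran.microsoft.com/snapshot/YYYY-MM-DD")
## at run time, put the shared library first on the search path
.libPaths(c(lib, .libPaths()))
library(xyz)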

Singularity container with stringr fails only locally with 'libicui18n.so.66: cannot open shared object file: No such file or directory'

I enjoy using the Singularity container software, as well as the R package 'stringr' to work with strings.
What I fail to understand is why a Singularity container fails locally (i.e. on my Ubuntu 20.04 computer), yet passes remotely (i.e. on GitHub Actions), when both containers are built at approximately the same time.
Here I run a simple script ([1], see below) that uses stringr:
singularity run --bind $PWD/scripts/ stringr.sif scripts/demo_container.R
(I can remove --bind $PWD/scripts/, but I want to have exactly the same call here as on GitHub Actions)
The error I get is:
'stringr.sif' running with arguments 'scripts/demo_container.R'
Error: package or namespace load failed for ‘stringr’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/richel/R/x86_64-pc-linux-gnu-library/4.1/stringi/libs/stringi.so':
libicui18n.so.66: cannot open shared object file: No such file or directory
Execution halted
On GitHub Actions, however, this exact call passes without problems (from this GitHub Actions log):
'stringr.sif' running with arguments 'scripts/demo_container.R'
Hello world
The Singularity script is very simple: it only updates apt and installs the stringr package ([2], see below).
I understand that this is a shared object problem; some workarounds fail in this context:
sudo apt install libicu-dev: ICU is the library that stringr (via stringi) uses
uninstalling stringr and installing it again, forcing a recompilation of the shared object, as suggested in this GitHub issue comment
How can it be that my Singularity container fails locally, yet passes on GitHub Actions? How can I fix this so that the container works in both environments?
A non-fix is to use rocker/tidyverse as a base image; I can get that to work successfully, but the question is about why this stringr setup fails.
Thanks and cheers, Richel Bilderbeek
[1] demo_container.R
library(stringr)
message(stringr::str_trim(" Hello world "))
[2] Singularity
Bootstrap: docker
From: r-base
%post
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get clean
Rscript -e 'install.packages("stringr")'
%runscript
echo "'stringr.sif' running with arguments '$@'"
Rscript "$@"
I had what seems like the same problem, and setting the environment variable R_LIBS solved it. Details below.
As background, the typical error message would look something like this:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/home/mrodo/R/x86_64-pc-linux-gnu-library/4.2/fs/libs/fs.so':
/usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /home/mrodo/R/x86_64-pc-linux-gnu-library/4.2/fs/libs/fs.so)
The reason, as I understand it, is that the R package fs had previously been installed into the library the container is using, but from a different system (either the host system or a container based on a different image). That installation of fs is incompatible with the running container and so throws an error; in this case, a different version of GLIBC is available than the one fs wants to use.
How this scenario arose is as follows. When I tried to install a package using R inside a previous container (based on a different image but also running R 4.2.x), no default library was writable because R_LIBS_USER was not set and/or did not exist (if the directory does not exist when R starts, it is not added to the library paths). R then prompted to install to a personal library, and I accepted. My latest container picked up the same personal library, creating the clash.
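As a quick check (a generic diagnostic, not from the original answer), you can ask R which copy of the package it would load and in which order libraries are searched:
.libPaths()        # library search path, in order
find.package("fs") # which installation of fs would be loaded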
The solution I used was to make a directory in my home directory the first path returned by .libPaths(). To do this, create the directory (this won't work if it doesn't exist first) and add its path as the first entry of R_LIBS. You can do that in various ways; I used the --env option of singularity run.
To make it easier to set this every time, I created an alias for the run command, adding the following to .bashrc:
export R_LIBS_USER_AR42=~/.R/ar42
mkdir -p "$R_LIBS_USER_AR42"
alias ar42='singularity run --env "R_LIBS='$R_LIBS_USER_AR42':/usr/local/lib/R/site-library:/usr/local/lib/R/library" $sif/ar42.sif radian'
That would just be tweaked for your own settings.
If you look at the error message, you'll see that the library that cannot be loaded is in your HOME on the host OS: /home/richel/R/x86_64-pc-linux-gnu-library/4.1/stringi/libs/stringi.so
This suggests that the R being used is one you have locally installed on the host and not the one installed in the image. Since singularity processes inherit the full host environment by default, I'd guess you've modified your $PATH and that's clobbering the value set inside the container. Since the environment on the CI / actions server is clean, it is able to run successfully.
I strongly recommend always using the -e/--cleanenv parameters to ensure that the environment inside the singularity container is the same anywhere it is run.
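For the call from the question, that would be something like (same arguments as above, only the flag added):
singularity run --cleanenv --bind $PWD/scripts/ stringr.sif scripts/demo_container.R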

R there is no package called in a docker container

I have a docker image where I install several packages using the following lines:
FROM rocker/verse:4.0.3
... (some other installation details)
RUN install2.r --error \
glmnet \
ineq
...
However, I sporadically get error messages when running a container from that image, where it seems that R cannot find the package:
Error in library(ineq) : there is no package called 'ineq'
If I create a new version of the container, manually open R, and run it, I can never reproduce this error. Does anyone have an idea how I can fix this (or what I should look for to reproduce it)?
I am hitting the same "weird" behaviour.
My workaround (I know it's not elegant) was to use pacman instead of the library() lines. If pacman can't find a library, it simply installs it, which is obviously rude and against the container pattern, but at some point we need to move on :-/. A big disadvantage is that container startup time can be huge if the container has lost all of its packages and starts re-installing them.
if(!require("pacman")) install.packages("pacman")
pacman::p_load("glmnet","ineq")

Use crontab to automate R script

I am attempting to automate an R script using RStudio Server on an EC2 machine.
The R script runs without errors. I then navigated to the terminal in RStudio Server and ran the script with Rscript "Rfilename", which works.
At this point I created a shell script containing the command above; this shell script also runs fine via sh "shellfilename".
But when I try to schedule this shell command using crontab, it does not produce any result. I am using the following cron entry :
* * * * * /usr/bin/sh ./shellfilename.sh
I am using cron for the first time and need help debugging what is going wrong. My intuition is that there is a difference between the environment the command sees when I run it in a terminal and the one it sees under crontab. In case it is relevant: I am doing all of this from a user account created for myself on this machine, so it differs from the admin account.
Can someone help resolve this issue? Thanks!
The issue arose due to relative paths used in the script for importing files and objects. Changing these to absolute paths resolved the issue.
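If you want relative paths to keep working under cron, one pattern (a generic sketch, not part of the original fix; the data path is hypothetical) is to let the script locate itself and set the working directory accordingly, since cron starts in a different directory than an interactive shell:
## find this script's own directory when run via Rscript
args <- commandArgs(trailingOnly = FALSE)
file_arg <- grep("^--file=", args, value = TRUE)
if (length(file_arg) == 1) {
  setwd(dirname(normalizePath(sub("^--file=", "", file_arg))))
}
read.csv("data/input.csv")  # hypothetical relative path, now resolved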

Julia and HTCondor - ENV["HOME"] causes error on Condor

When I run a Julia script that prints "Hello World" on HTCondor, I get the following error:
fatal: error thrown and no exception handler available.
Base.InitError(mod=:Pkg, error=Base.KeyError(key="HOME"))
The code runs without a problem on my local Ubuntu machine. I can run
eval julia --version
in a bash script on condor and the output is
julia version 0.5.0
This problem has been discussed in two places on github: one, two.
ENV["HOME"] is used in a single file, and the common recommendation is to modify that file. However, I cannot change the Julia installation on Condor.
Is there a way to fix this on the fly, before running a script, without sudo?
As @sujeet suggested, it is possible to set environment variables in Condor. The issue is resolved by adding the following line to the condor submit script:
Environment = "HOME=""/tmp"""
, which sets the home directory to /tmp. Julia code then runs fine (as long as one is careful not to write to home before resetting it in the script itself).
