Using R in a Snakemake workflow with Mambaforge

I'm building a pipeline with Snakemake. One rule involves an R script that reads a CSV file using readr. I get this error when I run the pipeline with --use-singularity and --use-conda:
Error: Unknown TZ UTC
In addition: Warning message:
In OlsonNames() : no Olson database found
Execution halted
Googling suggests readr is crashing because tzdata is missing, but I can't figure out how to install the tzdata package so that readr sees it. I am running the entire pipeline in a Mambaforge container to ensure reproducibility. Snakemake recommends a Mambaforge container over a Miniconda one because it's faster, but I think the error is specific to Mambaforge, since switching to Miniconda makes it go away.
Here's a workflow to reproduce the error:
#Snakefile
singularity: "docker://condaforge/mambaforge"
rule targets:
    input:
        "out.txt"
rule readr:
    input:
        "input.csv"
    output:
        "out.txt"
    conda:
        "env.yml"
    script:
        "test.R"
#env.yml
name: env
channels:
  - defaults
  - bioconda
  - conda-forge
dependencies:
  - r-readr
  - tzdata
#test.R
library(readr)
fp <- snakemake@input[[1]]
df <- read_csv(fp)
print(df)
write(df$x, "out.txt")
I run the workflow with snakemake --use-conda --use-singularity. How do I run R scripts when the Snakemake workflow is running from a Mambaforge singularity container?

Looking through the stack of R code leading to the error, I see that it checks a bunch of default locations for the zoneinfo folder that tzdata includes, but also checks for a TZDIR environment variable.
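To make that lookup order concrete, here is a minimal Python sketch of the same idea: honour TZDIR first, then fall back to a list of system locations, and report failure if none of them contains the requested zone. It is an illustration under stated assumptions, not R's actual code: the fallback list is hypothetical, and checking for a file named after the zone (UTC) is only a stand-in for whatever check R really performs.

```python
import os
from pathlib import Path

# Hypothetical fallback list for illustration; R's real list may differ.
DEFAULT_ZONEINFO_DIRS = (
    "/usr/share/zoneinfo",
    "/usr/share/lib/zoneinfo",
    "/usr/lib/zoneinfo",
)


def find_zoneinfo_dir(env=None, defaults=DEFAULT_ZONEINFO_DIRS, zone="UTC"):
    """Return the first directory containing a file for `zone`, or None.

    TZDIR (if set) is tried before the default locations.
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("TZDIR"):
        candidates.append(env["TZDIR"])
    candidates.extend(defaults)
    for directory in candidates:
        if (Path(directory) / zone).is_file():
            return directory
    # Nothing found: the situation behind "no Olson database found".
    return None


if __name__ == "__main__":
    print(find_zoneinfo_dir() or "no zoneinfo directory found")
```

In a container without system tzdata and with TZDIR unset, a check like this would presumably find nothing, which matches the symptom in the question.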
I believe a proper solution to this would be for the Conda tzdata package to set this variable to point to it. This will require a PR to the Conda Forge package (see repo issue). In the meantime, one could do either of the following as workarounds.
Workaround 1: Set TZDIR from R
Continuing to use the tzdata package from Conda, one could set the environment variable at the start of the R script.
#!/usr/bin/env Rscript
## the following assumes active Conda environment with `tzdata` installed
Sys.setenv("TZDIR"=paste0(Sys.getenv("CONDA_PREFIX"), "/share/zoneinfo"))
I would consider this a temporary workaround.
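If Rscript is started from a Python helper running inside the Conda environment (rather than through Snakemake's script: directive), the same TZDIR derivation can be applied on the calling side. This is a minimal sketch making the same assumption as the R line above, namely that Conda's tzdata puts its files under $CONDA_PREFIX/share/zoneinfo; tzdir_env and run_rscript are hypothetical helper names.

```python
import os
import subprocess
from pathlib import Path


def tzdir_env(base_env=None):
    """Return a copy of the environment with TZDIR set to the Conda tzdata files.

    Assumes the active Conda environment (CONDA_PREFIX) has tzdata installed
    under share/zoneinfo, as the R workaround does.
    """
    env = dict(os.environ if base_env is None else base_env)
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; is a Conda environment active?")
    zoneinfo = Path(prefix) / "share" / "zoneinfo"
    if not zoneinfo.is_dir():
        raise FileNotFoundError(f"{zoneinfo} not found; is tzdata installed?")
    env["TZDIR"] = str(zoneinfo)
    return env


def run_rscript(script, *args):
    """Run an R script with TZDIR exported for the child process only."""
    subprocess.run(["Rscript", script, *args], env=tzdir_env(), check=True)
```

Setting the variable on the calling side leaves the R script untouched; the trade-off is that it only helps when you control how Rscript is launched.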
Workaround 2: Derive a New Docker
Otherwise, make a new Docker image that includes a system-level tzdata installation. This appears to be a common issue, so following other examples (and keeping things clean), it'd go something like:
Dockerfile
FROM --platform=linux/amd64 condaforge/mambaforge:latest
## include tzdata
RUN apt-get update > /dev/null \
&& DEBIAN_FRONTEND="noninteractive" apt-get install --no-install-recommends -y tzdata > /dev/null \
&& apt-get clean
Upload this to Docker Hub and use it instead of the Mambaforge image as the image for Snakemake. This is probably a more reliable long-term solution, but perhaps not everyone wants to create a Docker Hub account.

Related

R X13binary missing in docker build

I have a Dockerfile in which I'm trying to install the R seasonal package:
# Debian-based image
FROM continuumio/miniconda3:4.5.12
. . .
# Install packages not on conda
RUN conda activate r_env && \
R -e "install.packages(c('RUnit', 'seasonal'), dependencies=TRUE, repos='https://cran.case.edu')"
Everything looks like it installs correctly, however when I get into the container and run library(seasonal) I get the error:
> library(seasonal)
The binaries provided by 'x13binary' do not work on this
machine. To get more information, run:
x13binary::checkX13binary()
> x13binary::checkX13binary()
Error in x13binary::checkX13binary() : X-13 binary file not found
After some googling, it looks like I can set the path to the binary manually, and running find shows that the binary exists on the machine:
(r_env) root@89c7265d9316:/# find / -name "*x13*"
/opt/conda/envs/arimaApiR/lib/R/library/x13binary
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/help/x13binary.rdx
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/help/x13binary.rdb
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/html/x13path.html
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/html/x13binary-package.html
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/bin/x13ashtml.exe
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/R/x13binary.rdx
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/R/x13binary.rdb
/opt/conda/envs/arimaApiR/lib/R/library/x13binary/R/x13binary
/opt/conda/envs/arimaApiR/conda-meta/r-x13binary-1.1.39_2-r36h6115d3f_0.json
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/help/x13binary.rdx
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/help/x13binary.rdb
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/html/x13path.html
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/html/x13binary-package.html
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/bin/x13ashtml.exe
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/R/x13binary.rdx
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/R/x13binary.rdb
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0/lib/R/library/x13binary/R/x13binary
/opt/conda/pkgs/r-x13binary-1.1.39_2-r36h6115d3f_0.tar.bz2
However, no matter what I set the path to, the library still reports that it can't find the binary:
(r_env) root@89c7265d9316:/# export X13_PATH=/opt/conda/envs/arimaApiR/lib/R/library/x13binary
(r_env) root@89c7265d9316:/# R -e "library(seasonal)"
The system variable 'X13_PATH' has been manually set to:
/opt/conda/envs/arimaApiR/lib/R/library/x13binary
Since version 1.2, 'seasonal' relies on the 'x13binary'
package and does not require 'X13_PATH' to be set anymore.
Only set 'X13_PATH' manually if you intend to use your own
binaries. See ?seasonal for details.
Binary executable file /opt/conda/envs/arimaApiR/lib/R/library/x13binary/x13as or /opt/conda/envs/arimaApiR/lib/R/library/x13binary/x13ashtml not found.
See ?seasonal for details.
I feel like I'm running in circles. Has anyone had luck running this inside a container?
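Going by the error message, seasonal expects a file named x13as or x13ashtml directly inside X13_PATH, whereas the find output above only shows bin/x13ashtml.exe inside the conda package. The following Python sketch (a hypothetical diagnostic helper, not part of seasonal or x13binary) checks a candidate X13_PATH for those two names and lists similarly named files deeper in the tree, which makes that mismatch visible.

```python
import sys
from pathlib import Path

# Names taken from the "Binary executable file ... not found" message above.
EXPECTED_NAMES = ("x13as", "x13ashtml")


def check_x13_path(x13_path):
    """Return (direct, elsewhere) for a candidate X13_PATH.

    direct:    expected binaries sitting directly in x13_path
    elsewhere: other x13as* files further down the tree
    """
    root = Path(x13_path)
    direct = [root / name for name in EXPECTED_NAMES if (root / name).is_file()]
    elsewhere = sorted(
        p for p in root.rglob("x13as*") if p.is_file() and p not in direct
    )
    return direct, elsewhere


if __name__ == "__main__" and len(sys.argv) > 1:
    found, others = check_x13_path(sys.argv[1])
    print("usable:", [str(p) for p in found] or "none")
    print("other candidates:", [str(p) for p in others])
```

Run against /opt/conda/envs/arimaApiR/lib/R/library/x13binary, an empty "usable" list plus a bin/x13ashtml.exe candidate would be consistent with the error shown.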
I've prepared my own container, but I didn't use continuumio/miniconda because I don't know how it works internally.
This is the Dockerfile I've prepared:
FROM r-base:3.6.1
RUN apt-get update \
&& apt-get install -y libxml2-dev
RUN R -e "install.packages('RUnit', dependencies=TRUE, repos='https://cran.case.edu')"
RUN R -e "install.packages('x13binary', dependencies=TRUE, repos='https://cran.case.edu')"
RUN R -e "install.packages('seasonal', dependencies=TRUE, repos='https://cran.case.edu')"
CMD [ "bash" ]
If I run your test commands, I receive this:
> library(seasonal)
> x13binary::
x13binary::checkX13binary x13binary::supportedPlatform x13binary::x13path
> x13binary::checkX13binary
x13binary::checkX13binary
> x13binary::checkX13binary()
x13binary is working properly
>
NOTE: the Dockerfile can be improved, e.g. you can install the packages together with c('RUnit', 'x13binary', 'seasonal') in a single call, and you can remove the apt cache after installing libxml2-dev, but I just wanted to run a quick test to see if it would work.
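As a sketch of the improvements mentioned in the note, this Python snippet renders a variant of the Dockerfile above with the three R packages installed in one install.packages() call and the apt lists removed in the same layer as the apt install. render_dockerfile is a hypothetical helper; apart from the common rm -rf /var/lib/apt/lists/* cleanup, the commands are the ones already used above.

```python
def render_dockerfile(r_packages, apt_packages,
                      base="r-base:3.6.1", repo="https://cran.case.edu"):
    """Render a Dockerfile that installs system and R packages in one layer each."""
    apt = " ".join(apt_packages)
    pkgs = ", ".join(f"'{p}'" for p in r_packages)
    lines = [
        f"FROM {base}",
        # System libraries, with the apt lists removed in the same layer.
        f"RUN apt-get update \\\n && apt-get install -y {apt} \\\n && rm -rf /var/lib/apt/lists/*",
        # All R packages in a single install.packages() call.
        f"RUN R -e \"install.packages(c({pkgs}), dependencies=TRUE, repos='{repo}')\"",
        'CMD [ "bash" ]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(["RUnit", "x13binary", "seasonal"], ["libxml2-dev"]), end="")
```

Keeping the cleanup in the same RUN instruction matters because files deleted in a later layer still take up space in the earlier one.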

Rscript not finding installed packages in container

I am trying to schedule an R script to run inside a container. I have a Dockerfile like this:
# Install R version 3.5
FROM rocker/tidyverse:3.5.1
USER root
# Install Ubuntu packages
RUN apt-get update && apt-get install -y \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev \
libxt-dev \
libssl-dev \
xtail \
wget \
cron
# Install R packrat, which we'll then use to install the other packages
RUN R -e 'install.packages("packrat", repos="http://cran.rstudio.com", dependencies=TRUE);'
# copy packrat files
COPY packrat/ /home/project/packrat/
# copy .Rprofile so that R knows where to look for packages
COPY .Rprofile /home/project/
RUN R -e 'packrat::restore(project="/home/project");'
# Copy DB query script into the Docker image
COPY 002_query_db_for_kpis.R /home/project/002_query_db_for_kpis.R
# copy crontab for db query
COPY db_query_cronjob /etc/crontabs/db_query_cronjob
# give execution rights
RUN chmod 644 /etc/crontabs/db_query_cronjob
# run the job
RUN crontab /etc/crontabs/db_query_cronjob
# start cron in the foreground
CMD ["cron", "-f"]
It builds ok and then the cron job fails silently. When I investigate with:
docker exec -it 19338f50b4ed Rscript /home/project/002_query_db_for_kpis.R
The output I get is:
Error in library(zoo) : there is no package called ‘zoo’
Execution halted
Now, the first part of the scripts looks like:
#!/usr/local/bin/env Rscript --default-packages=zoo,RcppRoll,lubridate,broom,magrittr,tidyverse,rlang,RPostgres,DBI
library(zoo)
...
So, clearly it's not finding the packages. They are in there though. That was the whole point of packrat and copying the .Rprofile, and it seemed to work because if I run a shell inside the container while it's running I can find them in:
root@d2b4f6e7eade:/usr/local/lib/R/site-library#
and all the packrat files seem to be in the right place as well. Could it be that the .Rprofile file isn't being seen because its name starts with a '.'? Can I change that?
UPDATE
If I don't use packrat but install packages normally, it works. Digging around inside the container's files, I can see that /usr/local/lib/R/site-library doesn't contain the packages needed, whereas /home/project/packrat/src does. So it must be something to do with Rscript looking in the wrong place. I thought the .Rprofile in /home/project would solve that, but it doesn't. Maybe there is something else I didn't copy over? Although I've got the script running now, it's not ideal, since those packages might be different versions (hence why I want to use packrat), so if anyone can figure out how to get it to work with packrat, I'll mark that answer as correct.
A couple of things to try, based on the problem and the update:
Have you excluded your packrat/lib* and packrat/src/ directories in .dockerignore? I am worried that you are copying over all the built packages, so restore() thinks the packages have already been built in your container.
Does the root user in your container have the right permissions on the packrat.lock file? If not, that would obviously prevent restore() from running.
Change the Docker install user to the rocker rstudio image's default user, "rstudio", and copy over just the packrat.lock and packrat.opts files:
USER rstudio
COPY --chown=rstudio:rstudio packrat/packrat.* /home/project/packrat/
A good reference for these options: https://rviews.rstudio.com/2018/01/18/package-management-for-reproducible-r-code/
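On the question of whether /home/project/.Rprofile is seen at all: according to R's startup documentation (?Startup), the user profile is the file named by R_PROFILE_USER if that is set, otherwise .Rprofile in the current working directory, then .Rprofile in the home directory. The leading dot is not the problem, but the working directory might be: if Rscript starts somewhere other than /home/project, that file is not read. This Python sketch models the lookup in simplified form (it ignores --vanilla, --no-init-file and the site profile); user_profile_path is a hypothetical helper.

```python
import os
from pathlib import Path


def user_profile_path(cwd, env=None):
    """Return the user profile R would read at startup, or None.

    Simplified model of ?Startup: R_PROFILE_USER if set, otherwise
    ./.Rprofile in the working directory, otherwise ~/.Rprofile.
    """
    env = os.environ if env is None else env
    explicit = env.get("R_PROFILE_USER")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None
    candidates = [Path(cwd) / ".Rprofile"]
    if env.get("HOME"):
        candidates.append(Path(env["HOME"]) / ".Rprofile")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If this is the cause, starting the job from the project directory (cd /home/project in the crontab line, or docker exec -w /home/project ... when testing) may help. Packrat's generated .Rprofile typically sources packrat/init.R through a relative path, so pointing R_PROFILE_USER at the file alone may not be enough.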

Can't install a Julia package through a proxy connection

I tried the following command in Julia to install FixedEffectModels, but I'm getting this error:
julia> Pkg.add("FixedEffectModels")
INFO: Initializing package repository /root/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
ERROR: failed process: Process(`git clone -q -b metadata-v2 git://github.com/JuliaLang/METADATA.jl METADATA`, ProcessExited(128)) [128]
in anonymous at ./pkg/dir.jl:52
I'm using a proxy connection, is it related?
On Windows 7, it seemed to me the .gitconfig worked. But are you sure it's in the correct user home directory (C:/Users/Username by default) and that it is really .gitconfig and not .gitconfig.txt (like I managed to do on my first attempt)?
Is your home directory really /root??
This discussion could be inspiring.
Running export https_proxy=... and then Pkg.setprotocol!("https") could probably help.
I would also suggest upgrading Julia. Since you are using Ubuntu, you could add this PPA:
sudo add-apt-repository ppa:staticfloat/juliareleases
sudo apt-get update
and you would get Julia 0.5.2 (unfortunately the PPA was not updated once 0.6 came out).
But if you trust packages on GitHub, you could probably download Julia 0.6.1 too. :)
The git config --global url."https://github.com/".insteadOf git://github.com/ setting solved the problem temporarily, but after a server reboot it didn't work anymore. The best solution found in this discussion was installing Julia 0.6+, because it can use the environment variables, so the export takes effect in Julia.
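For reference, the url.<base>.insteadOf setting makes git rewrite any URL that starts with the configured prefix so that it starts with <base> instead, using the longest matching prefix when several apply (as described in the git-config documentation). This Python sketch only models that rule for illustration; git applies it internally, and apply_instead_of is a hypothetical name.

```python
def apply_instead_of(url, rules):
    """Rewrite `url` the way git's url.<base>.insteadOf does.

    `rules` maps an insteadOf prefix to its replacement base; when several
    prefixes match, the longest one wins.
    """
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url
    prefix = max(matches, key=len)
    return rules[prefix] + url[len(prefix):]


RULES = {"git://github.com/": "https://github.com/"}
print(apply_instead_of("git://github.com/JuliaLang/METADATA.jl", RULES))
# https://github.com/JuliaLang/METADATA.jl
```

With that rule in the .gitconfig of the user running Julia, the git:// clone in the original error would go over HTTPS instead, which is generally easier to route through a proxy.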

How do you create a fake install of a debian package for use in testing?

I have a package that previously only targeted RPM based distros for which I am now building .deb packages for Debian based distros.
The aim is to simulate a test installation from user-space that is isolated from the system you are building on. It may be multi-user and you do not want to require root access just to build the software. Many of our tests simulate the installation directory structure already. This is for the next step up to simulate an actual installation using packages built.
For the RPM packages I was able to create test installations using:
WSDIR=/where/I/want/my/tests/to/run
rpmdb --initdb --dbpath "$WSDIR"/rpmdb
rpm --relocate /opt="$WSDIR"/opt --dbpath $WSDIR/rpmdb -i <package>.rpm
The equivalent in the Debian world is something like:
dpkg --force-not-root --admindir=$WSDIR/dpkg --root=$WSDIR/install --install "$DEB"
However, I am stuck over the equivalent to the rpmdb --initdb step.
Note that I can just unpack the archive using:
dpkg-deb -x "$DEB" $WSDIR/install
But I would prefer to be closer to how a real package is installed.
Also, I don't think this will run the preinst and postinst scripts.
Similar questions have suggested using debootstrap to create a chroot environment, but this creates a complete new installation. As well as being overkill, it is too slow for an automated test. I intend to use this for quick tests of the installation package prior to further testing in actual test environments.
My experiments so far:
(cd $WSDIR/dpkg && mkdir alternatives info parts triggers updates)
cp /var/lib/dpkg/status $WSDIR/dpkg/status
have at best resulted in:
dpkg: error: unable to access dpkg status area: No such file or directory
which does not clearly indicate what is wrong.
So how do you create a dpkg admin directory?
Cross posted as https://superuser.com/questions/1271145/how-do-you-create-a-dpkg-admin-directory
Update 24/11/2017
I've tried copying the dpkg dir from an environment created by cowdancer (which uses debootstrap under the hood), or copying the real one from /var/lib/dpkg, but I still get the same error message, so perhaps the error (and/or the --admindir option) doesn't mean quite what I think it means.
Note that:
sudo dpkg --force-not-root --root=$WSDIR/install --admindir=/var/lib/dpkg --install "$DEB"
does work. So it is something to do with the admin dir.
I've also retitled the question as "How do you create a dpkg admin directory" is interesting question but the answer is not necessarily the solution to my problem.
The minimal way to create a dpkg database is something like this:
$ mkdir -p db/{updates,info}
$ touch db/{status,diversions,statoverride}
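If the test harness is written in Python, the two commands above translate directly. This is a minimal sketch; init_dpkg_admindir is a hypothetical helper, and it creates exactly the directories and empty files listed above.

```python
from pathlib import Path


def init_dpkg_admindir(admindir):
    """Create a minimal, empty dpkg database: updates/ and info/ directories
    plus empty status, diversions and statoverride files (safe to re-run)."""
    admin = Path(admindir)
    for sub in ("updates", "info"):
        (admin / sub).mkdir(parents=True, exist_ok=True)
    for name in ("status", "diversions", "statoverride"):
        (admin / name).touch(exist_ok=True)  # creates, never truncates
    return admin
```

Calling init_dpkg_admindir("db") gives the same layout as mkdir -p db/{updates,info} followed by touch db/{status,diversions,statoverride}.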
If you want to use that as non-root, currently the best way is to use fakeroot.
$ mkdir -p fsys
$ PATH=/sbin:/usr/sbin:$PATH fakeroot dpkg --log=/dev/null --admindir=db --instdir=fsys -i pkg.deb
But take into account that passing --root after --admindir or --instdir will reset those paths, which I think is the problem you have been having here.
Also, using sudo together with --force-not-root does not make much sense :) and is definitely less confined than just using fakeroot. In the near future it will be possible to run dpkg fully unprivileged in some local tree.
I eventually found an answer for this. Thanks to Guillem Jover for some of this.
Pasting a copy of it here:
mkdir fake
mkdir fake/install
mkdir -p fake/dpkg/info
mkdir -p fake/dpkg/updates
touch fake/dpkg/status
PATH=/sbin:/usr/sbin:$PATH fakeroot dpkg --force-script-chrootless --log=`pwd`/fake/dpkg.log --root=`pwd`/fake --instdir=`pwd`/fake --admindir=`pwd`/fake/dpkg --install *.deb
Some points to note:
--force-not-root is not enough. fakeroot is required.
ldconfig and start-stop-daemon must be on the PATH (hence PATH=/sbin:/usr/sbin:$PATH).
The log file needs to be relocated from the default /var/log/dpkg.log
The order of arguments is significant. If used, --root must come before --instdir and --admindir.
The admindir is supposed to have the installation dir as a prefix.
If the package contains any pre- or post-installation scripts (preinst, postinst), then --force-script-chrootless is required, as these scripts are normally run via chroot(), which fails with "Operation not permitted" under fakeroot.
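The recipe above, with the argument-order and PATH constraints from these notes built in, could be wrapped for a test harness roughly as follows. This is a sketch assuming fakeroot and dpkg are installed on the build machine and that fake/dpkg has already been laid out by the mkdir/touch lines above; fake_dpkg_install_cmd and fake_dpkg_install are hypothetical names, and the flags are the same ones used in the command above.

```python
import os
import subprocess
from pathlib import Path


def fake_dpkg_install_cmd(fake_root, debs):
    """Build the fakeroot + dpkg command line used above.

    --root comes first: given after --instdir/--admindir it would reset them.
    """
    root = Path(fake_root).resolve()
    return [
        "fakeroot", "dpkg",
        "--force-script-chrootless",  # maintainer scripts cannot chroot() under fakeroot
        f"--log={root / 'dpkg.log'}",  # keep the log out of /var/log
        f"--root={root}",
        f"--instdir={root}",
        f"--admindir={root / 'dpkg'}",
        "--install", *(str(deb) for deb in debs),
    ]


def fake_dpkg_install(fake_root, debs):
    """Run the install with /sbin and /usr/sbin prepended to PATH."""
    env = dict(os.environ)
    # ldconfig and start-stop-daemon must be found on PATH.
    env["PATH"] = "/sbin:/usr/sbin:" + env.get("PATH", "")
    subprocess.run(fake_dpkg_install_cmd(fake_root, debs), env=env, check=True)
```

Building the argument list separately from running it makes the ordering easy to check in a unit test without invoking dpkg.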
For a quick test of trivial dependencies, you can directly install on the system using 'dpkg -i' then 'dpkg -P' and 'apt-get autoremove' to purge the package and clean the dependencies.
Another, more secure but slower, solution could be to use the autopkgtest package:
https://people.debian.org/~mpitt/autopkgtest/README.package-tests.html

Can I install a .deb during a BitBake Build?

Problem Definition
I'm attempting to adapt these rosjava installation instructions so that I can include rosjava on a target image built by the BitBake build system. I'm using the jethro branch of Poky.
Implementation Attempt: Build From .deb with package_deb.bbclass
According to the installation instructions, all that really needs to be done to install rosjava is the following:
sudo apt-get install ros-indigo-rosjava
This works perfectly fine on my build machine. I figured that if I can just point to a .deb and use the Poky metadata class package_deb, it would do all the heavy lifting for me, so I produced the following simple recipe, adapted from this posting on the Yocto Project mailing list:
inherit package_deb
SRC_URI = "http://packages.ros.org/ros/ubuntu/pool/main/r/ros-indigo-rosjava/ros-indigo-rosjava_0.2.1-0trusty-20160207-031808-0800_amd64.deb"
SRC_URI[md5sum] = "2020ccc8b4a67dd918a9a2c426eece0b"
SRC_URI[sha256sum] = "ab9493fabe1285b0d21aab031348d0d733d116b0b2470bae90025709b303b649"
The relevant part of the error output I get during the above recipe's do_unpack is:
| no entry data.tar.gz in archive
|
| gzip: stdin: unexpected end of file
| tar: This does not look like a tar archive
| tar: Exiting with failure status due to previous errors
| DEBUG: Python function base_do_unpack finished
| DEBUG: Python function do_unpack finished
The following command produces the output below:
$ ar t python-rosdistro_0.4.5-1_all.deb
debian-binary
control.tar.gz
data.tar.xz
You can see here that there's a data.tar.xz, not data.tar.gz. What can I do to remedy this error and install from this particular .deb?
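A .deb is an ar archive containing debian-binary, a control.tar.* member and a data.tar.* member, and the data member is not always gzip-compressed (the listing above shows xz). Python's standard library has no ar reader, so this sketch parses the 60-byte ar member headers by hand to list the members and report which compression the data member uses. It is illustrative only and handles just the short member names dpkg-deb writes; ar_members and data_member are hypothetical helpers.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def ar_members(data):
    """Return the member names of an ar archive (the container format of a .deb)."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names, pos = [], len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        # Layout: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] magic[2]
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        names.append(name)
        # Member data is padded to an even length.
        pos += HEADER_SIZE + size + (size % 2)
    return names


def data_member(names):
    """Return (member name, compression suffix) for the data.tar.* member."""
    for name in names:
        if name.startswith("data.tar"):
            suffix = name[len("data.tar"):].lstrip(".")
            return name, suffix or "none"
    raise LookupError("no data.tar.* member found")
```

For the package listed above, data_member(ar_members(open("python-rosdistro_0.4.5-1_all.deb", "rb").read())) would be expected to report xz, which is exactly what an unpacker looking only for data.tar.gz trips over.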
I've included package_deb in my PACKAGE_CLASSES variable and package-management in my IMAGE_FEATURES. I've tried other methods of installation which have all failed; I thought this method in particular would be very useful to know how to implement.
Update - 3/22
I'm attempting to circumvent the problems with the method above by doing my installation through a ROOTFS_POSTPROCESS_COMMAND which I've adapted from forum posts like this
install_rosjava() {
${STAGING_BINDIR_NATIVE}/dpkg \
--root=${IMAGE_ROOTFS}/ \
--admindir=${IMAGE_ROOTFS}/var/lib/dpkg/ \
-i /var/cache/apt/archives/ros-indigo-rosjava_0.2.1-0trusty-20160207-031808-0800_amd64.deb
}
ROOTFS_POSTPROCESS_COMMAND += " install_rosjava() ; "
However, this fails due to dpkg not being a command found within the ${STAGING_BINDIR_NATIVE} path. The Yocto Project Reference Manual states that:
STAGING_BINDIR_NATIVE Specifies the path to the /usr/bin subdirectory of the sysroot directory for the build host.
Taking a look inside this directory yields a lot of commands but not dpkg (The recipe depends on the dpkg package, and this command can be found in my target rootfs after the build is finished; I've also tried pointing to ${IMAGE_ROOTFS}/usr/bin/dpkg which yields the same results). From what I understand of the BitBake process, this command may be in another sysroot, but I must admit that this is where my understanding breaks down.
Can I adjust this method so that it works, or will I need to start from scratch on an installation from source?
Perhaps there's a different method entirely which I could consider?
If you really want to install their deb directly, then your rootfs postprocess is one solution. It doesn't work because depending on dpkg builds you a dpkg for the target, but you want a dpkg that will run on the host. Add a dependency on dpkg-native to your image.
Personally, though, I'd either inherit bin_package, extract the deb they provide and re-package it as a standard package in OE, or, ideally, write a proper recipe to build rosjava and submit it to meta-ros (https://github.com/bmwcarit/meta-ros).
package_deb is where the packaging machinery for deb packages is stored; it's not something you'd inherit in a recipe, but it should be listed in PACKAGE_CLASSES.
When you put a .deb in a SRC_URI the fetcher will try to unpack it so you can access the contents: the assumption is that you're going to repack the contents as a native Yocto recipe.
If that's what you want to do, then first you'll need to fix the unpack logic (in bitbake/lib/bb/fetch2/__init__.py) to handle .debs with xz-compressed data. This is a bug in BitBake, and a bug report and/or patch would be appreciated.
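As an illustration of why handling xz is mostly a matter of not hard-coding gzip: once the data.tar.* member has been pulled out of the .deb, Python's tarfile can open it with mode "r:*", which detects gzip, bzip2 and xz compression transparently. This is only a Python analogy, not BitBake's fetcher code, and list_data_tar is a hypothetical helper.

```python
import io
import tarfile


def list_data_tar(member_name, payload):
    """List the files in a .deb's data.tar.* member, whatever its compression.

    mode "r:*" lets tarfile detect the compression from the data itself
    instead of assuming data.tar.gz.
    """
    if not member_name.startswith("data.tar"):
        raise ValueError(f"{member_name} is not a data.tar member")
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
        return tar.getnames()
```

The equivalent fix in an unpacker is to choose the decompressor from the member name (or its magic bytes) rather than always piping through gzip.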
The alternative would be to use their deb directly but I don't recommend that as it's likely the dependencies don't match. The best long-term solution would be to build it from source directly instead of attempting to use a package for another distro.

Resources