Dockerhub automated builds appear to cache access to other GitHub repositories

My Dockerfile includes a step that pulls from an unrelated, public git repository. Specifically, it's installing an R package using the R command
devtools::install_github("bmbroom/NGCHMR")
This should install the latest version of the package from the master branch of that repository, and on local builds that is exactly what it does.
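Roughly, the relevant part of the Dockerfile looks like this (the base image and the devtools installation line are illustrative assumptions, not my exact setup; only the install_github() call is as shown above):

# illustrative base image (an assumption, not necessarily the one I use)
FROM rocker/r-ver:4.0.3
# assumed earlier step installing devtools
RUN R -e 'install.packages("devtools")'
# the step that always ends up with the stale package version on Dockerhub
RUN R -e 'devtools::install_github("bmbroom/NGCHMR")'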
On Dockerhub automated builds, however, I always get the version of the package that was current when I first built the image. There is otherwise nothing special about that particular version; it's as if GitHub access is being cached. The Docker build process itself is clearly not being cached, based on authoritative replies to other questions here as well as the build logs, which show the earlier steps in the build being executed in detail.
Any suggestions why I'm not getting the most recent version of the R package installed?

Related

Using renv: where should renv itself be installed?

I am getting started with collaborating with team members on R projects using renv. While I can (mostly) get it to work, I am a bit confused about whether and where to install renv itself. According to the documented workflow I basically need renv installed before I start a new project with renv.
However, when I do not have renv installed, and clone a repo that uses renv, it seems to install (bootstrap?) itself. But it does this within the local renv environment.
I have a couple of questions regarding this:
Do you recommend having renv installed "outside" the renv virtual environment?
How do you deal with differences between the version of renv installed on my machine and the version used in a repo that I clone and would like to replicate? I ran into problems with this one: I could not restore an renv environment from a cloned repo that used a different renv version.
On a more conceptual level: why is renv itself part of the virtual environment it creates? That's not the case for the Python virtual environment managers I know.
Do you recommend having renv installed "outside" the renv virtual environment?
We do. In fact, this is necessary if you want to initialize an renv project in the first place, since this is done by calling renv::init() -- and so the regular renv initialization workflow expects renv to be installed into the user library.
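A minimal sketch of that initialization workflow, assuming you are starting from a fresh project directory:

# install renv into the user library (outside any project)
install.packages("renv")
# then, from within the new project directory, initialize the project
renv::init()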
How do you deal with differences between the version of renv installed on my machine and the version used in a repo that I clone and would like to replicate? I ran into problems with this one: I could not restore an renv environment from a cloned repo that used a different renv version.
Since renv is just an R package, you can install or upgrade (or downgrade) the version of renv used within a project as required, without affecting other projects. For example, installing the latest version from CRAN can be done with a plain install.packages("renv").
When working within an renv project, the version of renv installed in that project is normally the copy that is used -- so at this point, it should no longer matter what version of renv is installed in the user library.
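As a sketch of what this looks like for a collaborator, assuming the cloned repo already contains a lockfile and using renv's documented functions:

# after opening the cloned project, renv bootstraps its project-local copy;
# reinstall the package versions recorded in renv.lock
renv::restore()
# if you want the project itself to move to a different release of renv
renv::upgrade()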
On a more conceptual level: why is renv itself part of the virtual environment it creates? That's not the case for the Python virtual environment managers I know.
This is done primarily to ensure existing renv projects can continue to function even if an update to renv happened to break some existing workflows. (We endeavor to make sure that will never happen, but want to make sure users have an escape hatch in case it does.)
However, when I do not have renv installed, and clone a repo that uses renv, it seems to install (bootstrap?) itself. But it does this within the local renv environment.
The "bootstrap" behavior here is done to help streamline the collaborative workflow. Rather than requiring users explicitly install renv before opening an renv project, renv knows enough to bootstrap itself in an existing project so that new users can get up and running quickly. (In addition, the bootstrapper script also tries to ensure that the version of renv that project was configured to use is installed.)

Non-standard Remotes package INLA in R package

I have a package that requires INLA, which is not hosted on CRAN or in a standard GitHub repository. There are multiple SO questions detailing how to install the package on a personal machine, such as this, or even how to mention it as a dependency in a package.
The two ways that are typically recommended to install on a personal machine are:
Direct from INLA website
install.packages("INLA",repos=c(getOption("repos"),INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)
From the GitHub host
devtools::install_github(repo = "https://github.com/hrue/r-inla", ref = "stable", subdir = "rinla", build = FALSE)
Now, these are fine for personal machines, but they don't work in the DESCRIPTION file's Remotes: section.
If we do url::https://inla.r-inla-download.org/R/stable, this gives an error that the file extension isn't recognized.
Error: Error: Failed to install 'unknown package' from URL:
Don't know how to decompress files with extension
If we do github::hrue/r-inla, I am unaware of how to pass (or if it's even possible) the ref, subdir, and build arguments in the DESCRIPTION file.
Previous packages used a read-only mirror of the INLA code, hosted on GitHub solely for this purpose at this repo, and then just used github::inbo/INLA. However, this repository is out of date.
Current solution
What I'm doing instead is to directly reference the tarball hosted on the main webpage.
url::https://inla.r-inla-download.org/R/stable/src/contrib/INLA_21.02.23.tar.gz
This solution works and passes CI, as the machines are able to install and load the package from there. The only issue is that I need to periodically update the static link to this tarball; I would prefer to reference the stable build, either directly from the INLA website as above, or from the hrue/r-inla repo with those other arguments passed. Directly referencing those links also has the advantage that when my package is re-installed on a machine, it would recognize whether or not the latest version of INLA is already installed on that machine. Is there a way to achieve this in the DESCRIPTION file?
This is not a perfect answer, but what you can do is reference the zip URL of the stable branch from the new INLA GitHub repository:
url::https://github.com/hrue/r-inla/archive/refs/heads/stable.zip
Hence, this will always install the latest stable version of the package.
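In the DESCRIPTION file, that remote might be declared roughly as follows (the surrounding fields are placeholders; only the URL comes from the suggestion above):

Imports:
    INLA
Remotes:
    url::https://github.com/hrue/r-inla/archive/refs/heads/stable.zip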

How can I test a change on a forked R package before sending a pull request?

I have forked and cloned an R repository on my local computer. I have made some edits but I'm not sure how to test these changes before sending out a pull request.
I don't know how to build an R package from this clone and test it.
The usual two-step:
R CMD build directoryOfYourPackage
resulting in a tar.gz archive you use in the next step:
R CMD check package_1.2.3.tar.gz
where package and the version are determined by the DESCRIPTION file.
Both commands have options; for example, you can suppress vignette creation and checking if you have an insufficient LaTeX installation for building the PDF vignettes, and so on. See Writing R Extensions for all the gory details.
Also, if you are set up for Travis CI and the package is too, then your commit back to your fork should trigger a build at Travis doing the same: package building and checking. However, it is also a good idea to check locally before committing...
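Putting the two-step together in a concrete, hypothetical session (the directory name and version number are placeholders standing in for what your DESCRIPTION file says):

# build the source tarball from your cloned fork
R CMD build mypackage
# run the standard package checks on the resulting archive
R CMD check mypackage_1.2.3.tar.gz
# optionally install it into your library to try the change interactively
R CMD INSTALL mypackage_1.2.3.tar.gz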

R package build failing on Unix machines due to missing GSL - GNU Scientific Library

I am facing a particularly vexing problem with R package development. My own package, called ggstatsplot (https://github.com/IndrajeetPatil/ggstatsplot), depends on userfriendlyscience, which depends on another package called MBESS, which itself ultimately depends on another package called gsl. There is no problem at all for installation of ggstatsplot on a Windows machine (as assessed by AppVeyor continuous integration platform: https://ci.appveyor.com/project/IndrajeetPatil/ggstatsplot).
But whenever the package is to be installed on Unix machines, it throws the error that ggstatsplot can't be downloaded because userfriendlyscience and MBESS can't be downloaded because gsl can't be downloaded. The same thing is also revealed on Travis continuous integration platform with virtual Unix machines, where the package build fails (https://travis-ci.org/IndrajeetPatil/ggstatsplot).
Now one way to solve this problem for the user on the Unix machine is to configure GSL (as described here:
installing R gsl package on Mac), but I can't possibly expect every user of ggstatsplot to go through the arduous process of configuring GSL. I want them to just run install.packages("ggstatsplot") and be done with it.
So I would really appreciate if anyone can offer me any helpful advice as to how I can make my package user's life simpler by removing this problem at its source. Is there something I should include in the package itself that will take care of this on behalf of the user?
This may not have a satisfying solution via changes to your R package (I'm not sure either way). If the gsl package authors (who include a former R Core member) didn't configure it to avoid a prerequisite installation of a Linux package, there's probably a good reason not to.
But it may be some consolation that most R+Linux users understand that some R packages first require installing the underlying Linux libraries (eg, through apt or dnf/yum).
Primary Issue: making it easy for the users to install
Try to be super clear on the GitHub readme and the CRAN INSTALL file. The gsl package has decent CRAN directions. This leads to the following bash code:
sudo apt-get install libgsl0-dev
The best example of clear (linux pre-req package) documentation I've seen is from the curl and sf packages. sf's CRAN page lists only the human names of the 3 libraries, but the GitHub page provides the exact bash commands for three major distribution branches. The curl package does this very well too (eg, CRAN and GitHub). For example, it provides the following explanation and bash code:
Installation from source on Linux requires libcurl. On Debian or Ubuntu use libcurl4-openssl-dev:
sudo apt-get install -y libcurl-dev
Ideally your documentation would describe how to install the gsl Linux package on multiple distributions.
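For example, such a note might look roughly like this (the exact package names can vary by distribution and release):

# Debian/Ubuntu (newer releases name the package libgsl-dev)
sudo apt-get install libgsl0-dev
# Fedora / RHEL / CentOS
sudo dnf install gsl-devel    # or: sudo yum install gsl-devel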
Disclaimer: I've never developed a package that directly requires a Linux package, but I use them a lot. In case more examples would help, this doc includes a script I use to install stuff on new Ubuntu machines. Some commands were stated explicitly in the package documentation; some had little or no documentation, and required research.
edit 2018-04-07:
I encountered my new favorite example: the sys package uses a config file to produce the following message in the R console. While installing 100+ packages on a new computer, it was nice to see this direct message, and not have to track down the R package and the documentation about its dependencies.
On Debian/Ubuntu this package requires AppArmor.
Please run: sudo apt-get install libapparmor-dev
Another good one is pdftools, that also uses a config file (and is also developed by Jeroen Ooms).
Secondary Issue: installing on Travis
The userfriendlyscience Travis config file apparently installs a lot of binaries directly (including gsl), unlike the current ggstatsplot version.
Alternatively, I'm more familiar with telling Travis to install the Linux package, as demonstrated by curl's config file. As a bonus, this probably more closely replicates what typical users do on their own machines:
addons:
  apt:
    packages:
      - libcurl4-openssl-dev
Follow-up 2018-03-13: Indrajeet and I tweaked the Travis file so it's working. Two sections were changed in the YAML file (a sketch follows the list):
The libgsl0-dev entry was added under the packages section (similar to the libcurl4-openssl-dev entry above).
Packages were listed in the r_binary_packages section so they install as binaries. The build was timing out after 50 minutes, and now it's under 10 min. In this particular package, the r_binary_packages section was nested in the Linux part of the Travis matrix so it wouldn't interfere with his two OS X jobs on Travis.
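A sketch of the relevant sections of that .travis.yml (the r_binary_packages entries below are illustrative, not the actual list from ggstatsplot):

addons:
  apt:
    packages:
      - libgsl0-dev
matrix:
  include:
    - os: linux
      # illustrative entries; list the heavy dependencies to be installed as binaries
      r_binary_packages:
        - ggplot2
        - dplyr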

What is the difference between installing an R package with these two commands?

When I install packages in R, sometimes devtools::install_github() is used, and other times install.packages().
Could I ask what is the essential difference between them?
R's official repository for packages is CRAN (the Comprehensive R Archive Network). The process of publishing a package there is very strict, and packages hosted there are installed via install.packages(). For the most part, binary packages (as opposed to source code, which has not yet been compiled) are available, so no additional tools need to be present for proper installation (see next paragraph).
GitHub is one of many web services that offer repositories for code, including R code. An author can upload his or her package, and if everything is in its place, the user can install the package from source via devtools::install_github(). This means you need to have a proper toolchain installed (and possibly a LaTeX distribution). On Windows, this means Rtools. Linux-based OSes are likely to ship with most of the necessary tools.
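As a concrete illustration, using dplyr as an example of a package that is available both ways:

# released version from CRAN, usually installed as a pre-built binary
install.packages("dplyr")
# development version from the GitHub repository, built from source
# (run install.packages("devtools") first if devtools is not yet installed)
devtools::install_github("tidyverse/dplyr")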

Resources