How to submit an R package to CRAN via GitHub Actions? - r

I've created an R package and I'd like to upload it to CRAN via GitHub Actions whenever I merge changes into the master branch. I've found a lot of examples of R actions and I've even looked up how some of the most popular packages like dplyr do it and even though I've found a devtools::release() helper function, I still haven't seen a workflow that would submit a library to CRAN when you merge changes into the master branch. Do package developers do this manually? Is there any reason why this hasn't been automated?

CRAN works quite differently from other language repositories, as uploads are not fully automated like in e.g. PyPI.
When you upload a new package, it is subject to verification from an actual human. When you update a package, if it triggers certain checks it will also be subject to a new review from a human. When a package uploads successfully and passes the first verification, many automated checks are run for it over the course of weeks (e.g. different OSes, compilers, compiler options, architectures, sanitizers, valgrind, etc.), and precompiled binaries are automatically generated for some platforms and R versions from your source code.
The CRAN policies explicitly state that frequent updates are not allowed, and you're not supposed to be submitting uploads any faster than once every couple months, for which I think this level of automation would not be worth it.
Even if you do want to automate this process, there is an email verification in the middle, so you'd perhaps have to do something with selenium + other scripts.
BTW if you are worried about complicated building processes and are using RStudio, you can configure on a per-project basis what arguments to use when building source or binary distributions of your package.

Related

Programmatic check of R package version

I'm building an Rmarkdown document reliant on a frequently updated data package that's hosted on github.
How do I make sure that the document is always built using the latest version of the package, without installing the package on each build?
You can see the list of commits to a package by getting the commits page for the package. For example,
https://github.com/tidyverse/dplyr/commits
shows that there were commits today. If you save a copy of the top hash in that response (currently af75177), and then update whenever it changes, you should be sure to have the latest version.
However, this is likely a bad policy. The package is not necessarily in a working condition after a commit: perhaps the author is planning another one a minute later to finish some update. It's much safer to use update.packages() and only get the updates that are judged to be stable enough to be sent to and accepted on CRAN.

Pinning R package versions

How do you best pin package versions in R?
Rejected strategy 1: Pin to CRAN source tar.gzs
Doesn't work if you want to pin it at the latest version since CRAN does not put the tip version in the archive (duh)
Rejected strategy 2: Use devtools
Don't want to, because it takes ages to compile and adds lots of stuff I don't want to use
Rejected strategy 3: Vendor
Would rather avoid having to copy all source
To provide a little bit more information on packrat, which I use for this purpose. From the website.
R package dependencies can be frustrating. Have you ever had to use
trial-and-error to figure out what R packages you need to install to
make someone else’s code work–and then been left with those packages
globally installed forever, because now you’re not sure whether you
need them? Have you ever updated a package to get code in one of your
projects to work, only to find that the updated package makes code in
another project stop working?
We built packrat to solve these problems. Use packrat to make your R
projects more:
Isolated: Installing a new or updated package for one project won’t
break your other projects, and vice versa. That’s because packrat
gives each project its own private package library. Portable: Easily
transport your projects from one computer to another, even across
different platforms. Packrat makes it easy to install the packages
your project depends on. Reproducible: Packrat records the exact
package versions you depend on, and ensures those exact versions are
the ones that get installed wherever you go.
Packrat stores the version of the packages you use in the packrat.lock file, and then downloads that version from CRAN whenever you packrat::restore(). It is much lighter weight than devtools, but can still take some time to re-download all of the packages (depending on the packages you are using).
If you prefer to store all of the sources in a zip file, you can use packrat::snapshot() to pull down the sources / update the packrat.lock and then packrat::bundle() to "bundle" everything up. The aim for this is to make projects / research reproducible and portable over time by storing the package versions and dependencies used on the original design (along with the source code so that the OS dependency on a binary is avoided).
There is much more information on the website I linked to, and you can see current activity on the git repo. I have encountered a few cases that work in a less-than-ideal way (packages not on CRAN have some issues at times), but the git repo still seems to be pretty active with issues/patches which is encouraging.

What's a good strategy for saving old R package versions on GitHub?

The development of RStudio and the packages devtools and roxygen2 has made R package creation pretty easy. I use GitHub for version control and devtools allows others to easily install directly from my account.
As my package gradually changes with each version, I'm wondering if I should be maintaining .zip files (or other format) of my past stable builds, in case anyone would ever want to use a previous version.
It's easy to download a .zip of an R package directly from GitHub, but I'm wondering if I should add this to the same GitHub directory (e.g. https://github.com/myaccount/mypackage/previous_versions/mypackage_0.1.zip) without messing up somebody's installation via install_github("myaccount/mypackage").
So, the main Qs are:
Should I keep an old package version at all?
Should I keep old package versions in a sub-folder of my GitHub R package directory?
Should I save .zip files downloaded from GitHub as my old version, or produce a Source or Binary file during the package build itself (i.e. in RStudio)?
Is this a superfluous activity if one isn't yet willing to publish to CRAN?!
When you think your package is at a good solid place, you should tag a release. This archives the branch at that point in time and stores the zip file with the source code, and the tar.gz file.
I tend to mark my CRAN packages as a release each time I release it to CRAN (for example, see https://github.com/nutterb/pixiedust/releases) and with some intemediary tags that I consider noteworthy.
Another good strategy for managing changes in between tagged releases is to maintain a development branch below your main branch. That way your development changes won't pollute or break anything being used by those pulling from your main branch. It makes you free to experiment in the dev branch while always having a clean, working copy to push to and restore from.
1. Should I keep an old package version at all?
It's subjective, but I'd definitely say "yes" unless there's a space constraint, which is probably unlikely.
This serves 2 purposes. One is for your own convenience, such as if you want to make sure that you always have a quick way to test the results of older versions versus a newer version.
The other is that people often need older versions of packages, such as if someone wants to use your package but they're using an older version of R on a server where the policies prevent an update to R. Perhaps a newer version of your package includes a new dependency which only works with a package that depends on a certain version of R or higher.
Of course, packages can always be installed without the compressed or binary files, but it's a nice convenience.
2. Should I keep old package versions in a sub-folder of my GitHub R package directory?
I would put it in a trunk or special subfolder that won't be automatically downloaded when someone tries to install_github or clone your master branch. Having a separate branch is a good idea.
3. Should I save .zip files downloaded from GitHub as my old version, or produce a Source or Binary file during the package build itself (i.e. in RStudio)?
As the package author you're in a position to know if these differ significantly and which if either is better, but by default I'd recommend the RStudio build because I assume (if you're like me) that you're less likely to include unnecessary files this way.
4. Is this a superfluous activity if one isn't yet willing to publish to CRAN?!
No, not necessarily. If people rely on your package then it really doesn't matter if it's on CRAN or not. In fact, not being on CRAN may be a reason to be more proactive like this to ensure that your users will always have access to the needed version of your package.

Can CRAN packages be modified and re-uploaded by others?

I've been working with R for a long time, but I'm a complete newbie in writing (and/or publishing) own packages via CRAN. Actually, I create a new package for educational purposes (university) and I want to load it to CRAN, so my students (and, of course, others) can download and use it.
After I uploaded my package (let's call it “JohnnyStat”), is it possible that another person (let's call “Mark Miller”) modifies it and adds his name as another “co-author” (“author”/“contributor” etc.)?
So, as a result, the package “JohnnyStat” would be registered as written by “Johnny” AND “Mark Miller”?
No. Only the maintainer can upload package updates, not any co-author. An acceptance mail is automatically sent to the mail address of the maintainer. And no-one can become maintainer without the explicit consent of the previous maintainer (for obvious reasons).
If you want the possibility of various people modifying the package, maybe CRAN is not the best option. It is possible to install from other repositories. Why not have the package at e.g. R-forge or github?
You may get a more complete answer if you ask this on the CRAN mailing list R-package-devel.

How do you use multiple versions of the same R package?

In order to be able to compare two versions of a package, I need to able to choose which version of the package that I load. R's package system is set to by default to overwrite existing packages, so that you always have the latest version. How do I override this behaviour?
My thoughts so far are:
I could get the package sources, edit the descriptions to give different names and build, in effect, two different packages. I'd rather be able to work directly with the binaries though, as it is much less hassle.
I don't necessarily need to have both versions of the packages loaded at the same time (just installed somewhere at the same time). I could perhaps mess about with Sys.getenv('R_HOME') to change the place where R installs the packages, and then .libpaths() to change the place where R looks for them. This seems hacky though, so does anyone have any better ideas?
You could selectively alter the library path. For complete transparency, keep both out of your usual path and then do
library(foo, lib.loc="~/dev/foo/v1") ## loads v1
and
library(foo, lib.loc="~/dev/foo/v2") ## loads v2
The same works for install.packages(), of course. All these commands have a number of arguments, so the hooks you aim for may already be present. So don't look at changing R_HOME, rather look at help(install.packages) (assuming you install from source).
But AFAIK you cannot load the same package twice under the same name.
Many years have passed since the accepted answer which is of course still valid. It might however be worthwhile to mention a few new options that arised in the meanwhile:
Managing multiple versions of packages
For managing multiple versions of packages on a project (directory) level, the packrat tool can be useful: https://rstudio.github.io/packrat/. In short
Packrat enhances your project directory by storing your package dependencies inside it, rather than relying on your personal R library that is shared across all of your other R sessions.
This basically means that each of your projects can have its own "private library", isolated from the user and system libraries. If you are using RStudio, packrat is very neatly integrated and easy to use.
Installing custom package versions
In terms of installing a custom version of a package, there are many ways, perhaps the most convenient may be using the devtools package, example:
devtools::install_version("ggplot2", version = "0.9.1")
Alternatively, as suggested by Richie, there is now a more lightweight package called remotes that is a result of the decomposition of devtools into smaller packages, with very similar usage:
remotes::install_version("ggplot2", version = "0.9.1")
More info on the topic can be found:
https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
I worked with R for a longtime now and it's only today that I thought about this. The idea came from the fact that I started dabbling with Python and the first step I had to make was to manage what they (pythonistas) call "Virtual environments". They even have dedicated tools for this seemingly important task. I informed myself more about this aspect and why they take it so seriously. I finally realized that this is a neat and important way to manage different projects with conflicting dependencies. I wanted to know why R doesn't have this feature and found that actually the concept of "environments" exists in R but not introduced to newbies like in Python. So you need to check the documentation about this and it will solve your issue.
Sorry for rambling but I thought it would help.

Resources