How to autoupdate version number on successful build of an R package

I found this article (original) about how to auto update a package version number in R.
I would like to implement it the way they suggest, but I am stuck at the point where I have to create my own Makefile to build the package.
The function they provide works fine.
Can anyone help me create a Makefile that checks and builds the package and, if both steps succeed, increases the version number and builds again under the new version number? Within RStudio it is possible to select a Makefile as the build tool.
In general I like the idea of giving all development packages a fourth version component, like 0.1.2.9001. At the moment I always overwrite my packages and set only three numbers manually, like 0.1.3.
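For illustration, a Makefile along the following lines could drive that check/bump/rebuild sequence. This is only a rough sketch, not the Makefile from the linked article: the target names, the sed call, and the inline version-bump expression are my own assumptions (the bump step increments, or adds, a fourth version component in DESCRIPTION).

# hypothetical package name; adjust to your package
PKG     = FooPkg
VERSION = $(shell sed -n 's/^Version: //p' DESCRIPTION)

all: build

# build the current version and check it
check:
    R CMD build .
    R CMD check $(PKG)_$(VERSION).tar.gz

# if the check succeeded, bump the 4th component of Version: in DESCRIPTION
bump: check
    Rscript -e 'd <- read.dcf("DESCRIPTION"); v <- unclass(package_version(d[1, "Version"]))[[1]]; v <- if (length(v) < 4) c(v, 9000) else c(v[1:3], v[4] + 1); d[1, "Version"] <- paste(v, collapse = "."); write.dcf(d, "DESCRIPTION")'

# rebuild under the new version number
build: bump
    R CMD build .

.PHONY: all check bump build

With something like this in place, selecting Makefile as the build tool in RStudio should simply run make on this file.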

Related

How to run R package tests without building or installing the package?

R CMD check automatically runs tests located in the tests/ directory. However, running the tests this way requires building the package first; after that, R CMD check goes through various sanity checks before finally reaching the tests at the end.
Question: Is there a way to run those tests without having to build or install the package first?
NOTE: without using testthat or other non-standard packages.
To summarise our discussion.
To my knowledge there is no standard alternative to R CMD check for unit testing provided by base R.
Typically for unit testing, I source everything under R/ (and dyn.load everything under source/) and then source everything under tests/ (actually, I also use the Example sections of the help pages in the man/ directory as test cases and compare their outcome to those from previous package versions).
I assume that these are the basic testing functionalities provided by devtools and testthat. If you expect to develop multiple packages and want to stay independent from non-base-R, I'd recommend automating the above processes with custom scripts/packages (see the sketch below).
I'd recommend looking into http://r-pkgs.had.co.nz/tests.html.
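A rough sketch of such a script, assuming it is run from the package root (the file layout, and how your test scripts signal failure, will differ between packages):

# source all R code of the package
for (f in list.files("R", pattern = "\\.R$", full.names = TRUE)) source(f)
# load compiled code, if any (adjust the path and extension to your platform)
for (so in list.files("src", pattern = "\\.(so|dll)$", full.names = TRUE)) dyn.load(so)
# run every test script; each script is expected to stop() on failure
for (t in list.files("tests", pattern = "\\.R$", full.names = TRUE)) {
  cat("Running", t, "\n")
  source(t)
}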

Developing R package, testing with `foreach`, while running simulations at same time with different package version

I write almost all my R code in packages at work (and use git). I make heavy use of devtools, in particular short cuts for load_all, etc as I update functions used in a package. I have a rough understanding of devtools, in that load_all makes a temporary copy of the package, and I really like this workflow for testing function updates in packages.
Is there a nice easy way/workflow for running simulations depending on the package, while developing it at the same time, without "breaking" those simulations?
I suspect there is an easy solution that I've overlooked.
Right now what I do is:
1. Get the package "mypackage" up to a point where it is ready for running simulations. Copy the whole folder containing the project and run the simulations in the copied folder under a new package name ("mypackage2"). The simulation scripts then include library(mypackage2) but NOT library(mypackage). This annoyingly means I need to update library(mypackage) calls to library(mypackage2) calls. If I instead run the simulations using library(mypackage) and avoid library(mypackage2), then I need to make sure the currently built version of mypackage is the 'old' one that doesn't reflect the updates in 2. below (but 2. below requires rebuilding the package too!). Handling all this gets messy.
2. While the simulations are running in the copied folder, I can update the functions in "mypackage", either using load_all or by rebuilding the package. I often need to rebuild the package (i.e. using load_all without rebuilding the package isn't a workable solution when testing updates), because I want to test functions that run small parallel simulations with doParallel and foreach, etc. (on Windows), and any functions I modify and want to test need the latest built "mypackage" in the child processes, since foreach spawns new R processes that load "mypackage". I understand that when a package is built in R it gets stored in ..\R\R-3.6.1\library, and future R sessions calling library(mypackage) will use that version of the package.
What I'd ideally like to be able to do is, in the same original folder, run simulations with a version of mypackage, and then update the code in the package while simulations are stopped/started, confident my development changes won't break the simulations which are running a specific version of the package.
Is there a simple way for doing the above, without having to recopy folders (and make things like "mypackage2")?
thanks
The issue described here is sort of similar to what I am facing: Specify package location in foreach.
The problem is that if I run a simulation that takes several days using "mypackage", with many calls to foreach, and update and rebuild "mypackage" when testing changes, future foreach calls from the simulation may pick up the new updated version of the package, which would be a disaster.
I think the answers in the other question do apply, but you need to do some extra steps. Let's say you have a version of the package you want to test. You'd still create a specific folder for that version, but you leave it empty; here I'll use /tmp/mypkg2 as an example. While having your project open in RStudio, you execute:
withr::with_libpaths(c("/tmp/mypkg2", .libPaths()), devtools::install())
That will install that version of the package to the provided folder. You could then have a wrapper script, say wrapper.R, with something like:
pkg_path <- commandArgs(trailingOnly = TRUE)[1L]
cat("Using package at", pkg_path, "\n")
.libPaths(c(pkg_path, .libPaths()))
library(doParallel)
workers <- makeCluster(detectCores())
registerDoParallel(workers)
# We need to modify the lib path in each worker too
parallel::clusterExport(workers, "pkg_path")
parallel::clusterEvalQ(workers, .libPaths(c(pkg_path, .libPaths())))
# ... Your code calling your package and doing stuff
parallel::stopCluster(workers)
Afterwards, from the command line (outside of R/RStudio), you could type (assuming Rscript is in your path):
Rscript path/to/wrapper.R /tmp/mypkg2
This way, the actual testing code can stay the same (including calls to library), and R will automatically search first in pkg_path, loading your specific package version, and then search the standard locations for any dependencies you may have.
I don't fully understand your use-case (as to why you want to do this), but what I normally do when testing two versions of a package is to push the most recent version to my dev branch on GitHub and then use devtools::load_all() to test what I'm currently working on. Then, by using remotes::install_github() and specifying the dev branch, you can run the GitHub version with mypackage::func and the devtools version with func.
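A sketch of that setup (the repository, branch, and function names are placeholders):

# install the frozen version from the dev branch on GitHub
remotes::install_github("youruser/mypackage", ref = "dev")
library(mypackage)
# ...meanwhile, in your development session, keep iterating on the local source:
devtools::load_all()   # loads the work-in-progress version without installing it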

What does a typical Rcpp edit-compile-test cycle look like?

I can only find information on how to install a ready-made R extension package, but it is nowhere mentioned which commands a developer of an extension package has to use during daily development. I am using Rcpp and I am on Windows.
If this were a typical C++ project, it would go like this:
edit
make # oops, typo
edit # fix typo
make # oops, forgot an #include
edit
make # good; updates header dependencies for subsequent 'make' automatically
./fooreader # test it
make install # only now I'm ready
Which commands do I need for daily development of an Rcpp package project?
I've created a skeleton project using these commands from the R command line:
library(Rcpp)
Rcpp.package.skeleton("FooReader", example_code=FALSE,
                      author="My Name", email="my.email@example.com")
This created 3 files:
DESCRIPTION
NAMESPACE
man/FooReader-package.Rd
Now I dropped source code into
src/readfoo.cpp
with these contents:
#include <Rcpp.h>
#error here
I know I can run this from the R command line:
Rcpp::sourceCpp("D:/Projects/FooReader/src/readfoo.cpp")
(this does run the compiler and indicates the #error).
But I want to develop a package ultimately.
There is no universal answer for everybody, I guess.
For some people, RStudio is everything, and with some reason. One can use the package creation facility to create an Rcpp package, then edit and just hit the buttons (or keyboard shortcuts) to compile and re-load and test.
I also work a lot on a shell, so I do a fair amount of editing in Emacs/ESS along with R CMD INSTALL (where thanks to ccache recompilation of unchanged code is immediate) with command-line use via r of the littler package -- this allows me to write compact expressions loading the new package and evaluating: r -lnewpackage -esomeFunc(somearg) to test newpackage::someFunc() with somearg.
You can also launch the build and test from Emacs. As I said, it all depends.
Both those answers are for packages, where I do real work. When I just test something in a single file, I do that in one Emacs buffer and sourceCpp() in an R session in another buffer of the same Emacs. Or sometimes I edit in Emacs and run sourceCpp() in RStudio.
There is no one answer. Find what works for you.
Also, the first part of your question describes the initial setup of a package. That is not part of the edit/compile/link/test cycle, as it is a one-off. And for that, too, we have different approaches, many of which have been discussed here.
Edit: The other main misunderstanding of your question is that once you have a package you generally do not use sourceCpp() anymore.
In order to test an R package, it has to be installed into a (temporary) library such that it can be attached to a running R process. So you will typically need:
R CMD build . to build package_version.tar.gz
R CMD check <package_version.tar.gz> to test your package, including the tests placed in the tests folder
R CMD INSTALL <package_version.tar.gz> to install it into a library
After that you can attach the package and test it. Quite often I try to use a more TDD (test-driven development) approach, which means I do not have to INSTALL the package. Running the unit tests (e.g. via R CMD check) is enough.
All that is independent of Rcpp. For a package using Rcpp you need to call Rcpp::compileAttributes() before these steps, e.g. with Rscript -e 'Rcpp::compileAttributes()'.
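Putting those pieces together, one pass through the cycle might look like this from the command line (the tarball name is just a placeholder):

Rscript -e 'Rcpp::compileAttributes()'   # regenerate RcppExports.cpp / RcppExports.R
R CMD build .                            # produces FooReader_1.0.tar.gz
R CMD check FooReader_1.0.tar.gz         # compiles the code and runs the tests
R CMD INSTALL FooReader_1.0.tar.gz       # only when you want it in your library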
If you use RStudio for package development, it offers a lot of automation via the devtools package. I still find it useful to know what has to go on under the hood and it is by no means required.

Virtual environment in R?

I've found several posts about best practice, reproducibility and workflow in R, for example:
How to increase longer term reproducibility of research (particularly using R and Sweave)
Complete substantive examples of reproducible research using R
One of the major preoccupations is ensuring portability of code, in the sense that moving it to a new machine (possibly running a different OS) is relatively straightforward and gives the same results.
Coming from a Python background, I'm used to the concept of a virtual environment. When coupled with a simple list of required packages, this goes some way to ensuring that the installed packages and libraries are available on any machine without too much fuss. Sure, it's no guarantee - different OSes have their own foibles and peculiarities - but it gets you 95% of the way there.
Does such a thing exist within R, even if it's not as sophisticated? For example, simply maintaining a plain-text list of required packages and a script that will install any that are missing?
I'm about to start using R in earnest for the first time, probably in conjunction with Sweave, and would ideally like to start in the best way possible! Thanks for your thoughts.
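As a point of comparison for the answers below, the plain-text-list idea from the question can be done with a few lines of base R (the file name packages.txt is just an assumption):

wanted  <- readLines("packages.txt")                       # one package name per line
missing <- setdiff(wanted, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)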
I'm going to use the comment posted by @cboettig in order to resolve this question.
Packrat
Packrat is a dependency management system for R. It gives you three important advantages (all of them focused on your portability needs):
Isolated: Installing a new or updated package for one project won't break your other projects, and vice versa. That's because packrat gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
What's next?
Walkthrough guide: http://rstudio.github.io/packrat/walkthrough.html
Most common commands: http://rstudio.github.io/packrat/commands.html
Using Packrat with RStudio: http://rstudio.github.io/packrat/rstudio.html
Limitations and caveats: http://rstudio.github.io/packrat/limitations.html
Update: Packrat has been soft-deprecated and is now superseded by renv, so you might want to check this package instead.
The Anaconda package manager conda supports creating R environments.
conda create -n r-environment r-essentials r-base
conda activate r-environment
I have had a great experience using conda to maintain different Python installations, both user-specific and several versions for the same user. I have tested R with conda and the Jupyter notebook and it works great, at least for my needs, which include RNA-sequencing analyses using DESeq2 and related packages, as well as data.table and dplyr. There are many Bioconductor packages available in conda via bioconda, and according to the comments on this SO question, it seems like install.packages() might work as well.
It looks like there is another option from RStudio devs, renv. It's available on CRAN and supersedes Packrat.
In short, you use renv::init() to initialize your project library, and use renv::snapshot() / renv::restore() to save and load the state of your library.
I prefer this option to conda R environments because here everything is stored in the file renv.lock, which can be committed to a Git repo and distributed to the team.
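A typical renv session, as a rough sketch of the workflow described above:

renv::init()               # create a project-private library and an renv.lock file
install.packages("dplyr")  # install dependencies into the project library as usual
renv::snapshot()           # record the exact package versions in renv.lock
# ...later, on another machine, after cloning the repository:
renv::restore()            # reinstall exactly what renv.lock records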
To add to this:
Note:
1. Have Anaconda installed already.
2. It is assumed that your working directory is "C:".
To create the desired environment -> "r_environment_name"
C:\>conda create -n "r_environment_name" r-essentials r-base
To see available environments
C:\>conda info --envs
.
..
...
To activate environment
C:\>conda activate "r_environment_name"
(r_environment_name) C:\>
Launch Jupyter Notebook and let the party begin
(r_environment_name) C:\> jupyter notebook
For a similar "requirements.txt", perhaps this link will help -> Is there something like requirements.txt for R?
Check out roveR, the R container management solution. For details, see https://www.slideshare.net/DavidKunFF/ownr-technical-introduction, in particular slide 12.
To install roveR, execute the following command in R:
install.packages("rover", repos = c("https://lair.functionalfinances.com/repos/shared", "https://lair.functionalfinances.com/repos/cran"))
To make full use of the power of roveR (including installing specific versions of packages for reproducibility), you will need access to a laiR. For CRAN, you can use our laiR instance at https://lair.ownr.io; for uploading your own packages and sharing them with your organization, you will need a laiR license. You can contact us on the email address in the presentation linked above.

How can I get a Makefile from an existing R package?

I like the GBM package in R.
I can't get R's memory management to work with my particular combination of machine, data set, and task, for reasons that have been covered elsewhere and should be considered off topic for the purposes of this question.
I would like to "rip" the GBM algorithm out of R and rebuild it as standalone code.
Unfortunately there is no Makefile in the package tarball (or indeed any R package tarball I've seen). Is there a place I can look for straightforward Makefiles of R packages? Or do I really have to go way back to ground zero and write my own Makefile for the long painful journey ahead?
As Henry Spencer quipped: "Those who do not understand Unix are doomed to reinvent it, poorly."
R packages do not have a Makefile because R creates one on the fly when building the package, using both the defaults of the current R installation and the settings in the package, typically via a file Makevars.
Run the usual command R CMD INSTALL foo_1.2.3.tar.gz and you will see the effect of the generated Makefile as the build proceeds. Worst case you can always start by copying and pasting.
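If you do need package-specific build settings, they usually live in a small src/Makevars file rather than a full Makefile. A minimal illustrative example (these flags are generic placeholders, not taken from the GBM package):

# compiler flags and extra libraries picked up by R's generated Makefile
PKG_CPPFLAGS = -I../inst/include
PKG_LIBS     = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)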
You could also take a look at CMake which can quite easily create makefiles for you. It took me minimal time to get it working for a project of mine.
