Difference between tests/ and inst/tests/ for R packages

I'm developing a new package and I have some unit tests to write. What's the difference between tests/ and inst/tests/? What kinds of stuff should go in each?
In particular, I see in http://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf that Hadley recommends using inst/tests/ "so users
also have access to them," then putting a reference in tests/ to run them all. But why not just put them all in tests/?

What @hadley means is that tests/ is not present in binary packages; it exists only in the source package. The convention is that anything in inst/ is copied to the package's top-level directory upon installation, so inst/tests/ will be available as tests/ in the binary and installed package directory structure.
See my permute package as an example. I used @hadley's testthat package as a learning experience and for my package tests. The package is on CRAN. Grab the source tarball and notice that it has both tests/ and inst/tests/; then grab the Windows binary and notice that it only has tests/, which is a copy of the inst/tests/ directory from the sources.
Strictly, only tests/ is run by R CMD check etc., so during development, and as checks in production packages, you need code in tests/ to test that the package does what it claims, or to run other unit tests. You can of course have code in tests/ that runs the R scripts in inst/tests/ which actually do the testing, and this has the side effect of also making the test code available to package users.
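A minimal sketch of that delegation pattern (the file and package names here are illustrative, not a fixed convention):
# tests/run-all.R -- executed by R CMD check; delegates to the installed
# copies of the test scripts, which live in inst/tests/ in the sources
pkg <- "mypackage"                               # illustrative name
library(pkg, character.only = TRUE)
test_dir <- system.file("tests", package = pkg)  # inst/tests/ installs here
for (f in list.files(test_dir, pattern = "\\.[Rr]$", full.names = TRUE))
    source(f, echo = TRUE)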
The way I see things, you need at least tests/; whether you need inst/tests/ will depend upon how you want to develop your package and what unit-testing code/packages you are using. inst/tests/ is something @hadley advocates, but it is far from being a standard across much of CRAN.

As 'Writing R Extensions' says, only inst/tests gets installed; tests/ does not. So only the former can provide tests for both the source version of the package and its binary (installed) form.
Otherwise tests/ is of course there for the usual R CMD check tests. Now, Martin Maechler once developed a 'hook' script in tests/ that uses inst/tests/, and I have been using that in a few of my packages, permitting the tests to be invoked when a user inspects the source as well as after installation. So you can in fact pivot out of the one set of tests into the other and get the best of both worlds.
Edit: Here is a link to what our RProtoBuf package does to invoke inst/tests/ tests from tests/: https://github.com/eddelbuettel/rprotobuf/blob/master/tests/runUnitTests.R And the basic idea is due to Martin as I said.
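In condensed form, such a hook might look like the following (a sketch using the RUnit package; the actual runUnitTests.R linked above differs in its details):
# tests/ hook that runs the RUnit scripts installed from inst/tests/
if (requireNamespace("RUnit", quietly = TRUE)) {
    library(RProtoBuf)
    suite <- RUnit::defineTestSuite("unit tests",
        dirs = system.file("tests", package = "RProtoBuf"),
        testFileRegexp = "^runit\\..*\\.[Rr]$")
    result <- RUnit::runTestSuite(suite)
    RUnit::printTextProtocol(result)
}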

Related

How to run R package tests without building or installing the package?

R CMD check automatically runs tests located in the tests/ directory. However, running the tests this way requires building the package first. After that, R CMD check goes through various sanity checks before finally reaching the tests at the end.
Question: Is there a way to run those tests without having to build or install the package first?
NOTE: without using testthat or other non-standard packages.
To summarise our discussion:
To my knowledge, there is no standard alternative to R CMD check for unit testing provided by base R.
Typically for unit testing, I source everything under R/ (and dyn.load the compiled objects built from src/) and then source everything under tests/ (actually, I also use the Example sections of the help pages in the man/ directory as test cases and compare their output to that from previous package versions).
I assume that these are the basic testing functionalities provided by devtools and testthat. If you expect to develop multiple packages and want to stay independent of non-base R, I'd recommend automating the above processes with custom scripts/packages.
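A bare-bones sketch of that approach for a pure-R package (the path is illustrative):
# Run the test scripts straight from a source checkout, without installing
pkg_path <- "path/to/mypackage"
r_files <- list.files(file.path(pkg_path, "R"),
                      pattern = "\\.[Rr]$", full.names = TRUE)
test_files <- list.files(file.path(pkg_path, "tests"),
                         pattern = "\\.[Rr]$", full.names = TRUE)
lapply(r_files, source)                  # load the package's functions
lapply(test_files, source, echo = TRUE)  # then run each test script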
I'd recommend looking into http://r-pkgs.had.co.nz/tests.html.

R: prevent source re-compilation using devtools::install

I am in the process of developing an R package with external source code that takes a long time to compile. While compilation time isn't a big problem for a one-off installation, I have to routinely reinstall the package to test new additions. Is it possible to prevent re-compiling the source code if there haven't been any changes to it?
I don't necessarily need this to be automated, but I can't figure out a manual solution either. As my source code is in Rust, the following serves as the most representative example I have (note that it requires Rust cargo to be installed):
git clone https://github.com/r-rust/hellorust
Rscript -e "devtools::install('hellorust', quick = TRUE)"
When I run the above, I see that the hellorust.so file has been created in the src directory, but how do I make devtools::install() use this file rather than recompiling everything? It doesn't seem like devtools::install(quick = TRUE) or devtools::install(build = FALSE) is meant for this...
Alternatively, is it possible to achieve the desired behavior on the Rust side of things? I don't understand why cargo would recompile everything if there haven't been any changes and the target directory is still there. That said, I'm quite new to Rust and compiled languages in general so my understanding of the broader concepts involved here is unfortunately quite limited...
I would also be interested to learn if there is a better way to test R packages during development than manually reinstalling them.
Based on the comments by r2evans, the final answer seems to be that this isn't what devtools::install is for.
As per the devtools documentation, there are three main tools for "frequent development tasks":
load_all
document
test
Of these, load_all "simulates installing and reloading your package, loading R code in R/, compiled shared objects in src/ and data files in data/". By default, load_all() will not recompile source code in src/ (unless the recompile flag is set to TRUE).
So the answer is to use load_all as opposed to install during package development, and to control manually when to compile the source code using something like devtools::compile_dll.
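Put together, a typical iteration loop might look like this (a sketch, run from the package root; test() assumes you use testthat):
library(devtools)
compile_dll()  # recompile the native code in src/ only when it has changed
load_all()     # (re)load the R code plus the already-built shared object
test()         # exercise the package without a full install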

When should a Julia project have a Manifest AND Project file?

I am trying to understand when a Julia project needs a Manifest AND a Project file vs. when it just needs a Project file. What are the different situations that warrant each case? I am trying to make sure my own project is set up correctly (it has both files currently).
The Manifest.toml is a snapshot of the exact state of a Julia environment. It specifies all packages that are installed in the environment with version numbers - not just the ones that have been ] added but the entire dependency graph!
The Project.toml on the other hand just lists the direct dependencies, that is the packages that have been ] added explicitly, potentially with version bounds specified in a [compat] section.
By checking in both files (specifically the Manifest.toml), you make your project reproducible. Another user only has to ] instantiate and will have the exact same environment that you had when working on the project. This is great for application projects which might consist of multiple Julia scripts which are not intended for use by other Julia projects.
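As a script, that reproduction step might look like this (a sketch, equivalent to ] activate . followed by ] instantiate in the Pkg REPL):
using Pkg
Pkg.activate(".")    # use the Project.toml/Manifest.toml in this directory
Pkg.instantiate()    # install exactly the versions recorded in Manifest.toml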
If you only check in the Project.toml, you are specifying the dependency information more loosely and leaving room for Julia's resolver to find appropriate package versions for all dependencies. This is what you should do when working on a Julia package, since people will likely want to install your package next to other packages, and overly restricting the versions of its dependencies would make your package incompatible.
Hence, I'd summarize as follows:
Application / "Project" -> Project.toml + Manifest.toml
Julia Package -> Only Project.toml
For more on applications and packages, check out the glossary of the Pkg.jl documentation.
(Note that there are exceptional cases (unregistered dependencies, for example) where you might have to check in a Manifest.toml for a Julia package.)
In Julia 1.2 and above, you can have nested Project.toml files to express test-specific dependencies. Since you may have a Project.toml in your test folder, which you would need to activate, I would also suggest including a Manifest.toml as a record of the environment under which you know for sure that your package's tests are passing.
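With that layout, Pkg.test() picks up the nested test environment automatically (a sketch, run from the package root):
using Pkg
Pkg.activate(".")  # the package's own environment
Pkg.test()         # runs test/runtests.jl, merging in test/Project.toml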
In other words, I believe in the package/application dichotomy mentioned in crstnbr's answer and the recommendation to include a Manifest.toml with applications, and I would further say that the tests within a package are like an application. The same goes for any performance benchmarks that you might have in your package.
I haven't practiced this myself, but it seems like it would be nice to have the CI tests run both under the "frozen" versions in test/Manifest.toml and under the latest versions of each package that the package manager can find. If the tests start failing, it would then be easier to tease apart whether the breakage is caused by a change in a dependency.

What type of object is an R package?

Probably a pretty basic question, but a friend and I tried to run str(package_name) and R threw us an error. Now that I'm looking at it, I'm wondering if an R package is like a .zip file in that it is a collection of objects, say pictures and songs, but not a picture or song itself.
If I tried to open a zip of pictures with an image viewer, it wouldn't know what to do until I unzipped it - just like I can't call str(forecast) but I can call str(ts) once I've loaded the forecast package into my library...
Can anyone set me straight?
R packages are generally distributed as compressed bundles of files. They can either be in "binary" form, preprocessed at a repository to compile any C or Fortran source and create the proper headers, or in source form, where the various required files are available for use in the installation process; the latter requires that users have the necessary compilers and tools installed at locations where the R build process, using OS system resources, can get at them.
If you read the documentation for a package at CRAN you see they are distributed in a set of compressed formats that vary depending on the OS targets:
Package source: Rcpp_0.11.3.tar.gz # the Linux/UNIX targets
Windows binaries: r-devel: Rcpp_0.11.3.zip, r-release: Rcpp_0.11.3.zip, r-oldrel: Rcpp_0.11.3.zip
OS X Snow Leopard binaries: r-release: Rcpp_0.11.3.tgz, r-oldrel: Rcpp_0.11.3.tgz
OS X Mavericks binaries: r-release: Rcpp_0.11.3.tgz
Old sources: Rcpp archive # not really a file but a web link
Once installed, an R package will have a specified directory structure. The DESCRIPTION file is a text file with specific entries for components that determine whether the local installation meets the dependencies of the package. There are NAMESPACE, LICENSE, and INDEX files. There are directories named '/help', '/html', '/Meta', '/R', and possibly '/libs', '/demo', '/data', '/unitTests', and others.
This is the tree at the top of the ../library/Rcpp package directory:
$ ls
CITATION NAMESPACE THANKS examples libs
DESCRIPTION NEWS.Rd announce help prompt
INDEX R discovery html skeleton
Meta README doc include unitTests
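You can reproduce such a listing from within R itself (a sketch; it assumes Rcpp is installed):
pkg_dir <- find.package("Rcpp")  # path of the installed package
list.files(pkg_dir)              # the top level of the installed tree
readLines(file.path(pkg_dir, "DESCRIPTION"), n = 5)  # peek at the metadata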
So in the "life-cycle" of a package, there is initially a series of required and optional files, which get processed by the BUILD and CHECK mechanisms into an installed package, which then gets compressed for distribution and later unpacked into a specified directory tree on the user's machine. See these help pages:
?.libPaths # also describes .Library
?package.skeleton
?install.packages
?INSTALL
And of course read Writing R Extensions, a document that ships with every installation of R.
Your question is:
What type of object is an R package?
Somehow, I’m still missing an answer to this exact question. So here goes:
As far as R is concerned, an R package is not an object. That is, it’s not an object in R’s type system. R is being a bit difficult, because it allows you to write
library(pkg_name)
without requiring you to define pkg_name anywhere beforehand. In contrast, other objects which you use in R have to be defined somewhere, either by you or by some package that's loaded explicitly or implicitly.
This is unfortunate, and confuses people. Therefore, when you see library(pkg_name), think
library('pkg_name')
That is, imagine the package name in quotes. This does in fact work just as expected. The fact that the code also works without quotes is a peculiarity of the library function, known as non-standard evaluation. In this case, it’s mostly an unfortunate design decision (but there are reasons).
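A short illustration, using the base package stats so nothing extra needs to be installed:
library("stats")                     # standard evaluation: a character string
library(stats)                       # NSE: the bare name is deparsed internally
pkg <- "stats"
library(pkg, character.only = TRUE)  # required once the name is held in a variable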
So, to repeat the answer: a package isn't a type of R object¹. For R, it's simply a name which refers to a known location in the file system, similar to what you've assumed. BondedDust's answer goes into detail to explain that structure, so I shan't repeat it here.
¹ For super technical details, see Joshua's and Richard's comments below.
From R's own documentation:
Packages provide a mechanism for loading optional code, data and documentation as needed. … A package is a directory of files which extend R, a source package (the master files of a package), or a tarball containing the files of a source package, or an installed package, the result of running R CMD INSTALL on a source package. On some platforms (notably OS X and Windows) there are also binary packages, a zip file or tarball containing the files of an installed package which can be unpacked rather than installing from sources. A package is not a library.
So yes, a package is not the functions within it; it is a mechanism that lets R use the functions and data which comprise the package. Thus, it needs to be loaded first.
I am reading Hadley's book Advanced R (Chapter 6.3, Functions, p. 79), and this quote should cover it, I think:
Every operation is a function call
“To understand computations in R, two slogans are helpful:
Everything that exists is an object.
Everything that happens is a function call."
— John Chambers
According to that, using library(name_of_library) is a function call that loads the package. Every bit that has been loaded, i.e. the functions or data sets, is an object which you can use by calling other functions. In that sense a package is not an object in any of R's environments until it is loaded; then you can say that it is a collection of the objects it contains.
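That distinction is easy to see at the prompt (a sketch; it assumes the forecast package is installed):
exists("forecast")                # FALSE in a fresh session: no such object yet
library(forecast)                 # load and attach the package
"package:forecast" %in% search()  # TRUE: now an environment on the search path
head(ls("package:forecast"))      # some of the objects it exports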

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION file is all you get, for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires expressing the dependencies in a portable manner across operating systems---good luck encoding even something as simple as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example which comes in two packages (which all provide their dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
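The basic renv workflow looks like this (a sketch; see the renv documentation for details):
install.packages("renv")
renv::init()      # create a project-local library and an renv.lock lockfile
# ...develop, installing packages as usual...
renv::snapshot()  # record the exact package versions in renv.lock
renv::restore()   # later, or on another machine: reinstall those versions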
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION file to define dependencies, the project needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
You could use the following workflow:
1) Create a script file which contains everything you want to set up, and store it in your project directory as e.g. projectInit.R
2) Source this script from your .Rprofile (or any other file executed by R at startup) inside a try statement:
try(source("./projectInit.R"), silent=TRUE)
This guarantees that R starts without an error message even when no projectInit.R is found.
3) If you start R in your project directory, the projectInit.R file will be sourced if present, and you are ready to go
This is from a Linux perspective, but it should work the same way under Windows and Mac as well.
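For the library-isolation use case discussed above, a projectInit.R might contain something like this (a sketch; the directory name is illustrative):
# projectInit.R -- point this R session at a project-local library
local_lib <- file.path(getwd(), ".Rlibs")
dir.create(local_lib, showWarnings = FALSE, recursive = TRUE)
.libPaths(c(local_lib, .libPaths()))  # search the local library first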
