R testthat and devtools: why does a minimal unit test break my package?

I'm working on an R package for sparse matrix handling. It kinda works; here's a minimal example to set the stage for my question.
devtools::install_github("ekernf01/MatrixLazyEval", ref = "eef5593ad")
library(Matrix)
library(MatrixLazyEval)
data(CAex)
M = rbind(CAex, CAex)
M = matrix(stats::rnorm(prod(dim(M))), nrow = nrow(M))
M_lazy = AsLazyMatrix( M )
svd_lazy = RandomSVDLazyMatrix(M_lazy)
But, when I run even a minimal unit test, it breaks the package permanently (I have to restart my R session or reinstall the package). The immediate cause is that R can't find some S4 methods from packages I depend on (e.g. for matrix transpose t or colSums from the Matrix package). I run the unit test like this:
devtools::test(filter = "minimal")
svd_lazy = RandomSVDLazyMatrix(M_lazy)
Here's the contents of the test files.
> cat tests/testthat.R
library(testthat)
testthat::test_check("MatrixLazyEval")
> cat tests/testthat/testthat_minimal.R
context("minimal")
Why does this happen? Maybe this is naive, but the unit test shouldn't even do anything.
Edit
Possibly related:
r - data.table and testthat package
https://github.com/r-lib/devtools/issues/192
R data.table breaks in exported functions

You need to import all the generics you're using in your package namespace:
#' @importFrom Matrix t tcrossprod colSums rowMeans
NULL
This will fix the issue you're observing, and you'll be able to run the tests multiple times in the same session.
Also this will allow other packages that import Matrix::t to consistently use your custom methods. Currently, since you're calling setMethod() in your package, you're creating a new t() generic local to your namespace whenever Matrix is not attached to the search path at load-time (this is why it worked the first time you ran the tests). This prevents other packages using Matrix::t() to access your methods. Importing Matrix::t() explicitly will fix this because you'll never create a local generic for t().
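The local-generic behaviour described above can be reproduced outside a package. The following is a minimal sketch, assuming a fresh session in which no S4 generic for t() is visible; the class name LazyThing is hypothetical and stands in for the package's own classes.

```r
library(methods)

# 'LazyThing' is a hypothetical class used only for illustration.
setClass("LazyThing", representation(x = "matrix"))

# In a fresh session this is expected to be FALSE: t() is a plain function.
had_generic_before <- isGeneric("t")

# No S4 generic for t() is visible, so setMethod() creates one in the
# current top-level environment, derived from base::t.
setMethod("t", "LazyThing", function(x) {
  x@x <- t(x@x)  # dispatches to the default (base::t) for a plain matrix
  x
})

has_generic_after <- isGeneric("t")

obj <- t(new("LazyThing", x = matrix(1:6, nrow = 2)))
dim(obj@x)
```

Inside a package, the same call would create the generic in the package namespace unless t is imported from Matrix first, which is the situation the @importFrom line avoids.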

Related

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because of the use of a data set that is included in the package in a function which is not exported (i.e. the user cannot use it unless they use the ::: operator). A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
set.seed(seed)
np = 2 * nc[2]
freqs = USCaucs$freqs
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However I get this note from https://win-builder.r-project.org which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables::
USCaucs
I find this especially frustrating, since, as I said, this function is not even exported (it also works without complaint because the package loads this dataset). All help appreciated
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.
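For reference, here is a rough base-R sketch of what that duplication amounts to on disk. The package root (a temporary directory here) and the contents of USCaucs are placeholders, not the real relSim data, and devtools::use_data may differ in details such as compression.

```r
# Hypothetical stand-in for the real USCaucs object.
USCaucs <- list(freqs = list(c(0.2, 0.8)))

# Hypothetical package root; in practice this is the package source directory.
pkg <- file.path(tempdir(), "relSimSketch")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)

# Copy available to users through data() and LazyData.
exported_copy <- file.path(pkg, "data", "USCaucs.rda")
save(USCaucs, file = exported_copy)

# Internal copy: objects in R/sysdata.rda are loaded into the package
# namespace, so internal functions can refer to them directly.
internal_copy <- file.path(pkg, "R", "sysdata.rda")
save(USCaucs, file = internal_copy)
```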

Why does using "<<-" in a function in the global workspace work, but not in a package?

I'm creating a package using devtools and roxygen2 (in RStudio), however after I've built the package my function no longer works as intended. Yet, if I load the function's .R file and run the function from there in RStudio, it works perfectly. I've created another package using this method before and it worked fine (13 functions all working as intended from my other package), yet I can't seem to get this new one to work.
To start creating the package I start with:
library("devtools")
devtools::install_github("klutometis/roxygen")
library(roxygen2)
setwd("my parent directory")
create("triale")
All is working fine so far. So I put my .R file containing my function in the R folder under the triale folder. The .R file looks like this:
#' Trial Z Function
#'
#' This function counts the values in the columns
#' @param x is the number
#' @keywords x
#' @export
#' @examples
#' trialz()
trialz = function(x) {w_id= c(25,x,25,25,25,1,1,1,1,1);
wcenter= c(rep("BYSTAR-1",10));
df1 <<- data.frame(w_id, wcenter);
countit <<- data.table(df1);
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
Again if I were to just run the code from the .R file, and test the function it works fine. But to continue, next I enter:
setwd("./triale")
document()
The triale documentation is updated, triale is loaded, and the NAMESPACE and trialz.Rd are both written so that trialz.Rd is under the man folder, and NAMESPACE is under the triale folder as intended. Next I install triale:
setwd("..")
install("triale")
Which I know works because I get the following:
Installing triale
"C:/PROGRA~1/R/R-31~1.3/bin/x64/R" --vanilla CMD INSTALL \
"C:/Users/grice/Documents/R/triale" \
--library="C:/Users/grice/Documents/R/win-library/3.1" --install-tests
* installing *source* package 'triale' ...
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (triale)
Reloading installed triale
Package is now built, so I do the following:
library("triale")
library("data.table")
Note whenever I load the package data.table I get the following error message:
data.table 1.9.4 For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.
However, it doesn't seem to affect my function. So now it's time to test my function from my package:
trialz(25)
This goes through, and I of course get a populated df1, and countit, but for whatever reason view is always empty (as in 0 obs. of 0 variables).
So I test my work using the dummy code below:
>trialy = function(x) {wid= c(25,x,25,25,25,1,1,1,1,1);
wc= c(rep("BYSTAR-1",10));
df2 <<- data.frame(wid, wc);
countitt <<- data.table(df2);
viewer <<- countitt[, .N, by = list(wid, wc)];
View(viewer)}
>trialy(25)
Even though this is the same exact code with just the names changed around it works. Dumbfounded I open trialz.R and copy the function from there and run it as below, and that works:
> trialz = function(x) {w_id= c(25,x,25,25,25,1,1,1,1,1);
wcenter= c(rep("BYSTAR-1",10));
df1 <<- data.frame(w_id, wcenter);
countit <<- data.table(df1);
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
> trialz(25)
Since I've created a package before I know my method is solid (that package had 13 dif. functions, all of which worked). I just don't understand how a function can work fine as written, yet when I package it, the function no longer works.
Again here is where it stops working as intended when using my package:
view <<- countit[, .N, by = list(w_id, wcenter)];
View(view)}
And my end result should look something like this, if my package worked:
wid wc N
1 25 BYSTAR-1 5
2 1 BYSTAR-1 5
Can anyone explain why view is never populated after I package my function? I've tested it as much as I know how, and my results should be reproducible for anyone who is willing to try it for themselves.
Thanks, I appreciate any feedback.
Your problem here is that "<<-" does not necessarily create variables in the global environment. It searches the enclosing environments, starting with the function's parent environment, for an existing variable of that name and assigns there; only when no such variable is found does it assign in the global environment. (See help("<<-").)
The parent environment of a function is the environment in which it has been defined. If you defined your function directly in your workspace, this parent environment is the workspace itself (namely .GlobalEnv), which is why your variables are assigned values as you expect. When your function is packaged, however, the parent environment is the package namespace and not .GlobalEnv, so the search starts there and the assignment may not land where you expect.
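A small sketch of how "<<-" resolves its target may help. The function and variable names (make_counter, count) are made up for illustration and are not part of the asker's package.

```r
make_counter <- function() {
  count <- 0
  function() {
    # 'count' exists in the enclosing environment (the one created by
    # make_counter), so "<<-" updates it there, not in .GlobalEnv.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()

# The variable lives in the closure's environment.
count_in_closure <- get("count", envir = environment(counter))
```

Here the assignment never touches the workspace, because a variable named count is found earlier in the chain of enclosing environments.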
Refer to the chapter on environments in Hadley's book and How R Searches and Finds Stuff for more details on environments in R.
Note that doing this would not be considered a proper debugging technique, to say the least. In general, you never want to use the "<<-" operator.
For options on debugging R code, see, e.g., this question. I, in particular, like the debugonce function very well. See ?debugonce.
I forgot one important part when editing my DESCRIPTION file: I forgot to add
Imports: data.table
Also the NAMESPACE file needed to include the data.table package as an import as well, like so:
import(data.table)
export(Z)
export(AS) .... etc.
Doing this ensures that whenever a function within your package uses a function from another package, that (second) package is called up before your code is executed.
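With roxygen2, the import(data.table) line in NAMESPACE is normally generated from an @import tag rather than written by hand. The following is a sketch only, not the asker's final code: it assumes data.table is listed under Imports in DESCRIPTION, and it returns the result instead of assigning with "<<-".

```r
#' Count rows per (w_id, wcenter) pair
#'
#' @param x A number placed in the second position of w_id.
#' @return A data.table with one row per group and a count column N.
#' @import data.table
#' @export
trialz <- function(x) {
  w_id <- c(25, x, 25, 25, 25, 1, 1, 1, 1, 1)
  wcenter <- rep("BYSTAR-1", 10)
  countit <- data.table::data.table(w_id, wcenter)
  # .N is data.table's per-group row count; this line relies on the
  # package namespace importing data.table.
  countit[, .N, by = list(w_id, wcenter)]
}
```

Running document() on a file like this should write import(data.table) and export(trialz) to NAMESPACE.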

Choose function to load from an R package

I like using the reshape function from the matlab package, but I then need to specify base::sum(m) each time I want to sum the elements of my matrix, or else matlab::sum is called, which only sums by columns.
I need to load the gtools package to use the rdirichlet function, but then gtools::logit masks pracma::logit, which I like better.
I guess there are no such things as:
library(loadOnly = "rdirichlet", from = "gtools")
or
library(loadEverythingFrom = "matlab", except = "sum")
because functions from the matlab package may internally rely on matlab::sum, so the latter must be loaded. But is there no way to get this behavior from the point of view of the user? Something that would feel like:
library(pracma)
library(matlab)
library(gtools)
sum <- base::sum
logit <- pracma::logit
but that would not clutter your ls() with all these small utility functions?
Maybe I need to define my own default namespace?
To avoid spoiling your ls, you can do something like this:
.ns <- new.env()
.ns$sum <- base::sum
.ns$logit <- pracma::logit
attach(.ns)
To my knowledge there is no easy answer to what you want to achieve. The only dirty hack I can think of is to download the source of the packages "matlab", "gtools", "pracma" and delete the offending functions from their NAMESPACE file prior to installation from source (with R CMD INSTALL package).
However, I would recommend using the explicit notation pracma::logit, because it improves readability of your code for other people and yourself in the future.
This site gives a good overview about package namespaces:
http://r-pkgs.had.co.nz/namespace.html
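In R 3.6.0 and later, library() also accepts include.only and exclude arguments, which come close to what the question asks for. A minimal sketch follows, using the tools package (shipped with R) as a stand-in so that it runs without gtools, matlab or pracma installed.

```r
# Attach only file_ext() from tools; nothing else from the package
# is placed on the search path by this call.
library(tools, include.only = "file_ext")

ext <- file_ext("report.txt")
```

By analogy, the question's wishes would read library(gtools, include.only = "rdirichlet") and library(matlab, exclude = "sum"). Functions inside those packages still see their own versions, since they resolve names through the package namespace rather than the search path.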

Passing an entire package to a snow cluster

I'm trying to parallelize (using snow::parLapply) some code that depends on a package (ie, a package other than snow). Objects referenced in the function called by parLapply must be explicitly passed to the cluster using clusterExport. Is there any way to pass an entire package to the cluster rather than having to explicitly name every function (including a package's internal functions called by user functions!) in clusterExport?
Install the package on all nodes, and have your code call library(thePackageYouUse) on all nodes via one of the available commands, e.g. something like
clusterEvalQ(cl, library(thePackageYouUse))
I think the parallel package which comes with recent R releases has examples -- see for example here from help(clusterApply) where the boot package is loaded everywhere:
## A bootstrapping example, which can be done in many ways:
clusterEvalQ(cl, {
    ## set up each worker. Could also use clusterExport()
    library(boot)
    cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
    cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
    NULL
})
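A complete round trip might look like the sketch below. It assumes a local two-worker cluster and uses the tools package (shipped with R) in place of thePackageYouUse, so the package is already "installed on all nodes".

```r
library(parallel)

cl <- makeCluster(2)  # two local worker processes

# Attach the package on every worker; the expression is evaluated remotely.
invisible(clusterEvalQ(cl, library(tools)))

# file_ext() is found on each worker's search path, so it does not need
# to be passed through clusterExport().
res <- parLapply(cl, c("a.csv", "b.txt"), function(p) file_ext(p))

stopCluster(cl)
```

The same pattern works with snow, which provides clusterEvalQ and parLapply under the same names.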

How to correct for "Error in nullmodel(comm, method) : could not find function "list2env" in the vegan package

I'm busy exploring the package vegan for R, using it to calculate nestedness of community matrices and null models. I'm particularly interested in using the permat functions as well as Oecosimu.
However, when running my program I obtained the following errors:
Error in nullmodel(comm, method) : could not find function "list2env"
Error in nullmodel(m, ALGO) : could not find function "list2env"
I then even ran an example (given below) of how to use these functions given by the R help function, and even these examples gave the same error. Am I supposed to import something else in order to use these functions, or how do I go about fixing this?
Examples:
m <- matrix(c(
1,3,2,0,3,1,
0,2,1,0,2,1,
0,0,1,2,0,3,
0,0,0,1,4,3
), 4, 6, byrow=TRUE)
x1 <- permatswap(m, "quasiswap")
summary(x1)
x2 <- permatfull(m)
summary(x2)
x3 <- permatfull(m, "none", mtype="prab")
x3$orig
summary(x3)
x4 <- permatfull(m, strata=c(1,1,2,2))
summary(x4)
Technically, this is a bug in the development version of Vegan on R-Forge. We were failing to declare a dependency on R versions >= 2.12 in DESCRIPTION. I have checked in the relevant change to the source tree to fix this but it will take a day or so before the tarball and binaries are rebuilt by R-Forge.
That said, you should probably update your R to something more recent. Or use the versions of those functions provided in Vegan 2.0-x on CRAN.
list2env is part of R base, which means it comes with the distribution, not in an add-on package. So if you don't have it you're probably either running an old version of R or have a broken installation. The example worked fine for me, with R 2.12.1 and vegan 2.1-0.
Your code works for me without an error message.
The most probable cause of your error is that you are using old versions of R, vegan or permute.
The R NEWS file says:
CHANGES IN R VERSION 2.12.0: NEW FEATURES:
o New list2env() utility function as an inverse of
as.list(<environment>) and for fast multi-assign() to existing
environment. as.environment() is now generic and uses list2env()
as list method.
CHANGES IN R VERSION 2.12.1: BUG FIXES:
o When list2env() created an environment it was missing a PROTECT
call and so was vulnerable to garbage collection.
CHANGES IN R VERSION 2.13.0: NEW FEATURES:
o list2env(envir = NULL) defaults to hashing (with a suitably sized
environment) for lists of more than 100 elements.
So update your version of R and the packages and try again.
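To check whether a given installation is affected, something like the following sketch can be run; the list contents are arbitrary.

```r
# list2env() was added in R 2.12.0, so anything older will lack it.
r_is_new_enough <- getRversion() >= "2.12.0"
has_list2env <- exists("list2env")

if (has_list2env) {
  # Turn a named list into an environment, the inverse of as.list(<environment>).
  env <- list2env(list(a = 1, b = "two"))
  a_value <- get("a", envir = env)
}
```

If has_list2env is FALSE, updating R is the fix; a package can state the requirement with a line such as Depends: R (>= 2.12.0) in its DESCRIPTION file, which is the declaration the vegan maintainers mention adding.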
