packaging issues with a function that uses arules - r

I'm using R and trying to assemble a bunch of functions into a package. One of the functions uses the arules package to mine rules from a dataset, subset them, and compute other interest measures.
I'm having a problem with the line that subsets them.
rules <- apriori(trainingTrans, parameter = list(support = 0.005, confidence = 0.0, maxlen = 6))
rulesCases <- subset(rules, subset = rhs %in% "event")
The function works outside of the package as long as I've loaded arules, but it doesn't work in the package, whether I list arules under Depends or Imports or have the function call library(arules). The error displayed is 'match' requires vector arguments. I thought arules has its own version of match to get around that; I've tried arules::match(rhs, "event"), but I still have the same problem.

The issue is that R does not find the correct version of %in%. This might work:
rulesCases <- subset(rules, subset = arules::"%in%"(rhs, "event"))
This should not be necessary if you import arules, but something odd seems to be going on. I hope this will be resolved in a future release of arules.
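A small sketch of why the version of %in% matters, using only base R and the methods package. ToyItems is a hypothetical stand-in for the itemMatrix stored in an arules rules object. Dispatch to the S4 method only happens through the generic; calling base's %in% directly hands the S4 object to match(), which cannot handle it.

```r
library(methods)

# Hypothetical stand-in for an arules itemMatrix: an S4 object holding item labels
setClass("ToyItems", slots = c(labels = "character"))
items <- new("ToyItems", labels = c("event", "other"))

# Make %in% an S4 generic (here in the global environment) and add a method
# for the toy class; inside the method, character %in% character uses base
setGeneric("%in%")
setMethod("%in%", signature(x = "ToyItems", table = "character"),
          function(x, table) x@labels %in% table)

# Through the generic, the ToyItems method is selected
found <- items %in% "event"

# base's %in% calls match() on the S4 object and raises an error
base_result <- try(base::"%in%"(items, "event"), silent = TRUE)
```

The same reasoning is behind writing arules::"%in%"(...): the prefix pins the call to the generic that knows about the arules classes.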

I had the same problem in my package and was able to fix it:
The syntax subset(rules, subset = arules::"%in%"(rhs, "event")) forces the package to use the correct version of %in%, as Michael Hahsler noticed.
But rhs is then no longer looked up inside rules, so it has to be referenced explicitly through the S4 slot, rules@rhs.
So the correct syntax is subset(rules, subset = arules::"%in%"(rules@rhs, "event")).
It does the job for my package, with the DESCRIPTION file containing
LinkingTo: arules
Imports: arules
And no further uses of library(arules).

Related

Removing/de-registering a specific function from an R package

I may not be using the terminology correctly here so please forgive me...
I have a case of one package 'overwriting' a function with the same name loaded by another package, thus changing (breaking) the behavior of a function.
The specific case:
X <- data.frame ( y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100) )
library(CausalImpact)
a <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # works
library(bfast) # imports quantmod which loads crappy version of as.zoo.data.frame
b <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # Error
I know the error comes from two versions of the function as.zoo.data.frame.
The problematic version is imported by bfast from the package 'quantmod' (see https://github.com/joshuaulrich/quantmod/issues/168). Unfortunately their hotfix did not prevent this error. Super annoying.
I can hack around this specific problem, but I was wondering if there is a general way to 'de-register' this function variant from the search path. Neither detach nor unloadNamespace removes the offending function (same behavior afterwards). An explanation and a similar problem are discussed here and here, but I wasn't able to find a general solution. For instance, I'd rather just remove this function than clone and rewrite CausalImpact to deal with this behavior.
From R 3.6.0 onwards, there is a new option called "conflicts.policy" to handle this within an established framework. For small issues like this, you can use the new arguments to library(). If you aren't on 3.6 yet, the easiest solution might be to explicitly namespace CausalImpact when you need it, i.e. CausalImpact::CausalImpact. That's a mouthful, so you could do causal_impact <- CausalImpact::CausalImpact and use that alias.
# only attach select
library(dplyr, include.only = "select")
# exclude slice/arrange from being attached.
library(dplyr, exclude = c("slice", "arrange"))
library(bfast, exclude = "CausalImpact") should solve your problem.
Attaching means that the functions are available for use without an explicit package prefix. In either of these cases, something like dplyr::slice still works just fine.
For more information, see ?library. The R-Core member Luke Tierney also wrote a blog post explaining how conflicts.policy works. You can find that here.
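A minimal sketch of both approaches, using the tools package that ships with R instead of dplyr or bfast so it runs anywhere (include.only requires R >= 3.6.0). The alias name title_case is hypothetical, mirroring the causal_impact alias suggested above.

```r
# Attach only file_ext() from tools; everything else stays reachable via tools::
library(tools, include.only = "file_ext")

ext <- file_ext("report.csv")        # attached, so no prefix is needed
attached <- ls("package:tools")      # what actually landed on the search path

# A short alias for a function you call often, without attaching its package
title_case <- tools::toTitleCase
heading <- title_case("hello world")
```

The alias is an ordinary variable in your workspace, so it works on any R version; include.only and exclude only control what gets attached, not what the package can use internally.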
Here's an answer that works, but it is less preferable than de-registering an S3 method because it involves replacing the registered version in the S3 methods table with the desired method:
library(CausalImpact)
library(bfast)
assignInNamespace("as.zoo.data.frame", zoo:::as.zoo.data.frame, ns = asNamespace("zoo"))
Based partially on @smingerson's suggestion in the comments.
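The mechanism can be sketched with a toy class; toyclass and the two format functions are hypothetical. Registered S3 methods live in a methods table, and a later registration for the same generic/class pair overwrites the earlier one. That is how quantmod's as.zoo.data.frame displaced zoo's, and registering the preferred function again restores it, which is the effect assignInNamespace() has on zoo's registration above.

```r
# Two competing implementations of the same S3 method for a toy class
format_preferred <- function(x, ...) "preferred"
format_other     <- function(x, ...) "other"
obj <- structure(list(), class = "toyclass")

registerS3method("format", "toyclass", format_preferred)
first <- format(obj)

# A later registration (e.g. from another package being loaded) overwrites it
registerS3method("format", "toyclass", format_other)
second <- format(obj)

# Re-registering the preferred implementation restores the original behavior
registerS3method("format", "toyclass", format_preferred)
third <- format(obj)
```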

R testthat and devtools: why does a minimal unit test break my package?

I'm working on an R package for sparse matrix handling. It kinda works; here's a minimal example to set the stage for my question.
devtools::install_github("ekernf01/MatrixLazyEval", ref = "eef5593ad")
library(Matrix)
library(MatrixLazyEval)
data(CAex)
M = rbind(CAex, CAex)
M = matrix(stats::rnorm(prod(dim(M))), nrow = nrow(M))
M_lazy = AsLazyMatrix( M )
svd_lazy = RandomSVDLazyMatrix(M_lazy)
But when I run even a minimal unit test, it breaks the package permanently (I have to restart my R session or reinstall the package). The immediate cause is that R can't find some S4 methods from packages I depend on (e.g. for matrix transpose t or colSums from the Matrix package). I run the unit test like this:
devtools::test(filter = "minimal")
svd_lazy = RandomSVDLazyMatrix(M_lazy)  # fails after the test run: Matrix S4 methods not found
Here are the contents of the test files.
> cat tests/testthat.R
library(testthat)
testthat::test_check("MatrixLazyEval")
> cat tests/testthat/testthat_minimal.R
context("minimal")
Why does this happen? Maybe this is naive, but the unit test shouldn't even do anything.
Edit
Possibly related:
r - data.table and testthat package
https://github.com/r-lib/devtools/issues/192
R data.table breaks in exported functions
You need to import all the generics you're using in your package namespace:
#' @importFrom Matrix t tcrossprod colSums rowMeans
NULL
This will fix the issue you're observing, and you'll be able to run the tests multiple times in the same session.
This will also allow other packages that import Matrix::t to consistently use your custom methods. Currently, since you're calling setMethod() in your package, you're creating a new t() generic local to your namespace whenever Matrix is not attached to the search path at load time (this is why it worked the first time you ran the tests). This prevents other packages that use Matrix::t() from accessing your methods. Importing Matrix::t() explicitly fixes this because you'll never create a local generic for t().
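For reference, when you run devtools::document(), roxygen2 turns that tag into importFrom() directives in the package's NAMESPACE file. The generated lines should look roughly like this (exact ordering may differ):

```r
# Generated by roxygen2: do not edit by hand
importFrom(Matrix,colSums)
importFrom(Matrix,rowMeans)
importFrom(Matrix,t)
importFrom(Matrix,tcrossprod)
```

These directives are what make the Matrix generics visible inside your namespace, so setMethod() adds methods to them instead of creating local generics.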

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) onto CRAN. It has been rejected because a data set included in the package is used in a function that is not exported (i.e. the user cannot use it unless they use the ::: operator). A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
  set.seed(seed)
  np = 2 * nc[2]
  freqs = USCaucs$freqs
  # ... (rest of the function omitted)
}
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However, I get this NOTE from https://win-builder.r-project.org, which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables:
USCaucs
I find this especially frustrating since, as I said, this function is not even exported (and it works without complaint because the package loads this dataset). All help appreciated.
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.
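Under the hood, the internal copy is just an .rda file in the package's R/ directory. A base-R sketch roughly equivalent to what devtools::use_data(..., internal = TRUE) writes, run against a temporary directory; the contents of USCaucs below are made-up placeholders, not the real relSim data.

```r
# Hypothetical placeholder for the real USCaucs object shipped with relSim
USCaucs <- list(freqs = list(c(0.1, 0.2, 0.7)))

# Pretend package source tree in a temporary directory
pkg_dir <- file.path(tempdir(), "relSimSketch")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "data"), showWarnings = FALSE)

# Exported copy (data/USCaucs.rda), available to users via LazyData: true
save(USCaucs, file = file.path(pkg_dir, "data", "USCaucs.rda"), compress = "bzip2")

# Internal copy (R/sysdata.rda), loaded into the namespace for code like testIS()
save(USCaucs, file = file.path(pkg_dir, "R", "sysdata.rda"), compress = "bzip2")

# Check that the internal copy round-trips
internal <- new.env()
load(file.path(pkg_dir, "R", "sysdata.rda"), envir = internal)
```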

dplyr 0.7.0 tidyeval in packages

Preamble
I commonly use dplyr in my packages. Prior to 0.7.0, I was using the underscored versions of dplyr verbs to avoid NOTEs during R CMD check. For example, the code:
x <- tibble::tibble(v = 1:3, w = 2)
y <- dplyr::filter(x, v > w)
would have yielded the R CMD check NOTE:
* checking R code for possible problems ... NOTE
no visible binding for global variable ‘v’
By comparison, using the standard evaluation version:
y <- dplyr::filter_(x, ~v > w)
yielded no such NOTE.
However, in dplyr 0.7.0, the vignette Programming with dplyr says that the appropriate syntax for including dplyr functions in packages (to avoid NOTEs) is:
y <- dplyr::filter(x, .data$v > .data$w)
Consequently, the news file says that "the underscored version of each main verb is no longer needed, and so these functions have been deprecated (but remain around for backward compatibility)."
Question
The vignette says that the above new syntax will not yield R CMD check NOTEs, "provided that you’ve also imported rlang::.data with @importFrom rlang .data." However, when I run the code:
y <- dplyr::filter(x, rlang::.data$v > rlang::.data$w)
I get:
Evaluation error: Object `From` not found in data.
Is this error similar to the following?
y <- dplyr::filter(x, v == dplyr::n())
Evaluation error: This function should not be called directly.
Namely, for some functions, calling them prefixed with the package yields errors? (Something to do with whether or not they've been exported, perhaps?)
Comment
As an aside, is there a less verbose way of writing package-friendly dplyr functions with the new syntax in 0.7.0? In particular, the syntax for dplyr >=0.7.0:
y <- dplyr::filter(x, .data$v > .data$w)
is more verbose than the syntax for dplyr <0.7.0:
y <- dplyr::filter_(x, ~v > w)
and the verbosity increases as more variables are referenced. However, I don't want to use the less verbose syntax with the underscored version, as it is deprecated.
for some functions, calling them prefixed with the package yields errors?
That's right, but we could make them work to make things more predictable. You can file a github issue for this feature.
is there a less verbose way of writing package-friendly dplyr functions with the new syntax in 0.7.0?
The alternative is to declare all your column symbols to R, e.g. within a globalVariables(c("v", "w")) statement somewhere in your package.
Ideally, R should know about NSE functions and never warn for unknown symbols in those cases.
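A sketch of that declaration. It is conventionally placed at the top level of one of the package's R files; the file name R/globals.R is just a common choice, not a requirement.

```r
# R/globals.R (hypothetical file name): column names used through NSE in dplyr
# calls, declared once so that R CMD check does not report them as undefined
utils::globalVariables(c("v", "w"))
```

The trade-off is that the declaration is package-wide: a genuine typo named v or w elsewhere in the package would no longer be flagged.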
Another work-around is to add lines such as
v <- NULL; # mark as not an unbound global reference
just above your expressions that are generating CRAN checks. It is no less accurate (column names are not in fact global variables) and has somewhat limited scope.
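The same trick, sketched with base R's subset() standing in for dplyr::filter() so it runs without dplyr; the function name filter_rows is hypothetical. Because subset() evaluates its condition inside the data frame first, the local NULL bindings never shadow the real columns; they only satisfy the static check.

```r
filter_rows <- function(df) {
  # Bind the column names locally so R CMD check's code analysis sees them
  # as defined; subset() still finds the real columns in `df` first
  v <- w <- NULL
  subset(df, v > w)
}

x <- data.frame(v = 1:3, w = 2)
kept <- filter_rows(x)   # keeps only the row where v exceeds w
```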

Choose function to load from an R package

I like using the reshape function from the matlab package, but then I need to specify base::sum(m) each time I want to sum the elements of my matrix, or else matlab::sum is called, which only sums by columns.
I need to load the gtools package to use the rdirichlet function, but then gtools::logit masks pracma::logit, which I like better.
I guess there is nothing like:
library(loadOnly = "rdirichlet", from = "gtools")
or
library(loadEverythingFrom = "matlab", except = "sum")
... because functions from the matlab package may internally rely on matlab::sum, so the latter must be loaded. But is there no way to get this behavior from the user's point of view? Something that would feel like:
library(pracma)
library(matlab)
library(gtools)
sum <- base::sum
logit <- pracma::logit
... but without cluttering ls() with all these small utility functions?
Maybe I need to define my own default namespace?
To avoid cluttering your ls() output, you can do something like this:
.ns <- new.env()
.ns$sum <- base::sum
.ns$logit <- pracma::logit
attach(.ns)
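A runnable version of that idea, with stats::qlogis() standing in for pracma::logit() (assumed here to compute the same log(p / (1 - p))) so it needs no extra packages. Because .ns starts with a dot, ls() hides it, and attach() puts the aliases on the search path instead of in the global workspace.

```r
# Keep preferred aliases in a hidden environment rather than the workspace
.ns <- new.env()
.ns$sum <- base::sum
.ns$logit <- stats::qlogis    # stand-in for pracma::logit
attach(.ns)                   # listed in search() as ".ns"

half <- logit(0.5)            # log-odds of 0.5
clean <- !("logit" %in% ls()) # the alias is not in the global workspace
```

Call detach(".ns") to drop the aliases again. Note that attach() copies the environment's contents, so later changes to .ns are not picked up until you detach and attach it again.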
To my knowledge there is no easy answer to what you want to achieve. The only dirty hack I can think of is to download the source of the packages "matlab", "gtools", "pracma" and delete the offending functions from their NAMESPACE file prior to installation from source (with R CMD INSTALL package).
However, I would recommend using the explicit notation pracma::logit, because it improves readability of your code for other people and yourself in the future.
This site gives a good overview about package namespaces:
http://r-pkgs.had.co.nz/namespace.html
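Since R 3.6.0, library() itself supports the wished-for calls through its include.only and exclude arguments, e.g. library(gtools, include.only = "rdirichlet") and library(matlab, exclude = "sum"). These only change what is attached; the package's own functions still see their full namespace, so matlab's internal calls to its sum keep working. A runnable sketch using the parallel package that ships with R:

```r
# Attach parallel but leave detectCores() off the search path
library(parallel, exclude = "detectCores")

masked_out <- !exists("detectCores")   # not reachable without a prefix
still_there <- exists("makeCluster")   # the rest of the package is attached
n_cores <- parallel::detectCores()     # the namespace itself is untouched
```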

Resources