How to check whether a dataset exists in an R package?

Is there a more elegant (fail-safe/robust, shorter) way of checking whether a dataset (whose name is known as a character string) exists in a package than this?
rda.name <- "Animals" # name of the data set/.rda
rda.name %in% data(package = "MASS")[["results"]][,"Item"]
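A minimal sketch of the same index lookup, wrapped in a reusable predicate. has_dataset() is a hypothetical helper, not an existing function. It assumes the usual data() index format, where objects that share a data file are listed as "object (file)". For packages that are not installed, it returns FALSE instead of raising an error.

```r
# has_dataset() is a hypothetical helper, not part of base R or utils.
has_dataset <- function(name, package) {
  # system.file() returns "" for packages that are not installed,
  # so we can bail out before data() raises an error.
  if (!nzchar(system.file(package = package))) return(FALSE)
  items <- utils::data(package = package)[["results"]][, "Item"]
  # Entries such as "beaver1 (beavers)" name the object and the file it
  # lives in; strip the " (file)" part before comparing.
  items <- sub(" \\(.*\\)$", "", items)
  name %in% items
}

has_dataset("iris", "datasets")
has_dataset("Animals", "MASS")
has_dataset("not_a_dataset", "datasets")
has_dataset("iris", "not_an_installed_package")
```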

You can try this approach using exists:
exists(data("Animals", package = "MASS"))
# [1] TRUE

As mentioned in the comment, I cannot replicate Sven's answer (under any recent version of R). The following works, but the use of suppressWarnings() is rather ugly, and calling data() this way also loads the dataset instead of just checking that it exists. As such, I don't think this is preferable to my original version, but perhaps it will inspire someone to provide a fix.
exists(suppressWarnings(data(list = rda.name, package = "MASS")))
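If the main concern is polluting the workspace, one option is to pass an explicit envir so that data() writes into a scratch environment. The sketch below assumes that actually loading the data is acceptable. dataset_loads() is a hypothetical name. The approach still reads the data, and data() still errors if the package itself is not installed.

```r
# dataset_loads() is a hypothetical helper name.
dataset_loads <- function(name, package) {
  scratch <- new.env()
  # data() only warns (it does not error) when the dataset is not found
  suppressWarnings(
    utils::data(list = name, package = package, envir = scratch)
  )
  # The dataset exists only if data() actually created it in scratch
  exists(name, envir = scratch, inherits = FALSE)
}

dataset_loads("iris", "datasets")
dataset_loads("not_a_dataset", "datasets")
```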

Related

Removing/de-registering a specific function from an R package

I may not be using the terminology correctly here so please forgive me...
I have a case of one package 'overwriting' a function of the same name loaded by another package, thus changing the behavior of (breaking) that function.
The specific case:
X <- data.frame ( y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100) )
library(CausalImpact)
a <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # works
library(bfast) # imports quantmod which loads crappy version of as.zoo.data.frame
b <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # Error
I know the error comes from two versions of the function as.zoo.data.frame.
The problematic version is imported by bfast from the package 'quantmod' (see https://github.com/joshuaulrich/quantmod/issues/168). Unfortunately their hotfix did not prevent this error. Super annoying.
I can hack around this specific problem, but I was wondering if there is a general way to 'de-register' this function variant from the search path. Neither detach nor unloadNamespace removes the offending function (the behavior is the same afterwards). An explanation and a similar problem are discussed here and here, but I wasn't able to find a general solution. For instance, I'd rather just remove this function than clone and rewrite CausalImpact to deal with this behavior.
From R 3.6.0 onwards, there is a new option called "conflicts.policy" to handle this within an established framework. For small issues like this, you can use the new arguments to library(). If you aren't on 3.6 yet, the easiest solution might be to explicitly namespace CausalImpact when you need it, i.e. CausalImpact::CausalImpact. That's a mouthful, so you could do causal_impact <- CausalImpact::CausalImpact and use that alias.
# only attach select
library(dplyr, include.only = "select")
# exclude slice/arrange from being attached.
library(dplyr, exclude = c("slice", "arrange"))
library(bfast, exclude = "CausalImpact") should solve your problem.
Attaching means that the functions are available for use without an explicit package prefix. In either of these cases, something like dplyr::slice would still work just fine.
For more information, see ?library. Also, R-core member Luke Tierney wrote a blog post explaining how conflicts.policy works. You can find that here
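Here is a small sketch of include.only, which requires R >= 3.6.0. It uses the tools package only because tools ships with every R installation. The alias title_case is hypothetical and follows the same pattern as the causal_impact alias suggested above.

```r
# Requires R >= 3.6.0 for the include.only argument of library()
library(tools, include.only = "file_ext")

# file_ext was attached, so it can be called without a prefix
ext <- file_ext("report.csv")
print(ext)

# Functions that were not attached are still reachable with an explicit
# prefix; binding one to a short alias saves typing the prefix each time
title_case <- tools::toTitleCase
```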
Here's an answer that works, but is less preferable than de-registering an S3 method because it involves replacing the registered version in the S3 methods table with the desired method:
library(CausalImpact)
library(bfast)
assignInNamespace("as.zoo.data.frame", zoo:::as.zoo.data.frame, ns = asNamespace("zoo"))
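A diagnostic built on utils::getS3method() can show which package currently supplies a registered S3 method, for example before and after the assignInNamespace() call. method_origin() is a hypothetical helper. The example uses print methods from base and stats because those are always available.

```r
# method_origin() is a hypothetical helper: it reports the name of the
# environment (normally a package namespace) that defines the S3 method
# dispatch would currently use for generic/class.
method_origin <- function(generic, class) {
  m <- utils::getS3method(generic, class, optional = TRUE)
  if (is.null(m)) return(NA_character_)
  environmentName(environment(m))
}

method_origin("print", "data.frame")    # defined in base
method_origin("print", "lm")            # registered by stats
method_origin("print", "no_such_class") # no method registered
```

With zoo loaded, method_origin("as.zoo", "data.frame") should name the namespace whose as.zoo.data.frame is registered. I have not tested this against the bfast/quantmod combination.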
Based partially on @smingerson's suggestion in the comments.

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because a data set included in the package is used in a function which is not exported (i.e., the user cannot use it unless they use the ::: operator). A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
  set.seed(seed)
  np = 2 * nc[2]
  freqs = USCaucs$freqs
  # ... (rest of the function omitted)
}
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However, I get this NOTE from https://win-builder.r-project.org, which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables:
  USCaucs
I find this especially frustrating since, as I said, this function is not even exported (and it works without complaint because the package loads this dataset). All help appreciated.
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.
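For reference, here is roughly what the internal copy amounts to. usethis::use_data(..., internal = TRUE), the successor of devtools::use_data, writes the objects into R/sysdata.rda. The sketch below mimics that with base save() in a temporary directory. USCaucs_demo and demoPkg are made-up names, and the real helper's compression and format settings may differ.

```r
# Build a throwaway package skeleton in the temp directory
pkg_dir <- file.path(tempdir(), "demoPkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# A made-up stand-in for USCaucs
USCaucs_demo <- list(freqs = list(locus1 = c(0.25, 0.75)))

# Internal data lives in R/sysdata.rda (not data/), so once the package is
# installed the object sits in the namespace and package code can use it
sysdata <- file.path(pkg_dir, "R", "sysdata.rda")
save(USCaucs_demo, file = sysdata, compress = "bzip2")

# Inspect what the file contains without touching the global environment
check_env <- new.env()
load(sysdata, envir = check_env)
ls(check_env)
```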

packaging issues with a function that uses arules

I'm using R and trying to assemble a bunch of functions into a package. One of the function uses the package arules to mine rules from a dataset, subset them and get other interest measures.
I'm having problem with the line that subsets them.
rules <- apriori(trainingTrans, parameter = list(support = 0.005, confidence = 0.0, maxlen = 6))
rulesCases <- subset(rules, subset = rhs %in% "event")
The function works outside of the package as long as I've loaded arules, but it doesn't work in the package whether I've set arules as a Depends, an Imports, or had the function call it with library(arules). The error displayed is 'match' requires vector arguments. I thought arules had its own version of match to get around that; I've tried arules::match(rhs,"event"), but I still have the same problem.
The issue is that it does not find the correct version of %in%. Possibly this works:
rulesCases <- subset(rules, subset = arules::"%in%"(rhs, "event"))
This should be not necessary if you import arules, but there seems to be something weird going on. I hope this will be resolved in a future release of arules.
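The following self-contained sketch shows why the explicit namespace helps. Calling the operator through pkg::"name" looks it up in that namespace only, so a masking definition elsewhere is never consulted. Here base stands in for arules, and the broken %in% is deliberately fake.

```r
pinned <- local({
  # A deliberately broken %in% that masks the real one in this scope
  `%in%` <- function(x, table) stop("the masking %in% was called")
  labels <- c("event", "other", "event")
  # pkg::"name" resolves the operator in that namespace only,
  # so the local mask above is bypassed
  base::"%in%"(labels, "event")
})
print(pinned)

# The backtick spelling is equivalent to the quoted one
same <- base::`%in%`(c("event", "other", "event"), "event")
```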
I had the same problem in my package and was able to fix it:
The syntax subset(rules, subset = arules::"%in%"(rhs, "event")) forces the correct version of %in% to be used inside the package, as Michael Hahsler pointed out.
However, rhs is then no longer tied to rules, so it has to be specified explicitly as rules@rhs.
So the correct syntax should be subset(rules, subset = arules::"%in%"(rules@rhs, "event"))
This does the job for my package, with the DESCRIPTION file containing
LinkingTo: arules
Imports: arules
And no further uses of library(arules).

Data inside a function (package creation)

If I need to use a data set inside a function (as a lookup table) inside a package I'm creating, do I need to explicitly load the data set inside the function?
The function and the data set are both part of my package.
Is this the correct way to use that data set inside the function:
foo <- function(x){
x <- dataset_in_question
}
or is this better:
foo <- function(x){
x <- data(dataset_in_question)
}
or is there some approach I'm not thinking of that's correct?
There was a recent discussion about this topic (in the context of package development) on R-devel, numerous points of which are relevant to this question:
If only the options you provide are applicable to your example, R himself (i.e., Brian Ripley) tells you to do:
foo <- function(x){
data("dataset_in_question")
}
This approach will, however, throw a NOTE in R CMD check, which can be avoided in upcoming versions of R (or currently R-devel) by using the globalVariables() function, added by John Chambers.
The 'correct' approach (i.e., the one advocated by Brian Ripley and Peter Dalgaard) would be to use the LazyData option for your package. See this section of "Writing R Extensions".
Btw: I do not fully understand how your first approach should work. What should x <- dataset_in_question do? Is dataset_in_question a global variable or defined previously?
For me it was necessary to use get() in addition to LazyData: true in the DESCRIPTION file (see the posting by @Henrik, point 3) to get rid of the NOTE "no visible binding for global variable ...". My R version is 3.2.3.
foo <- function(x){
get("dataset_in_question")
}
So LazyData makes dataset_in_question directly accessible (without using data("dataset_in_question", envir = environment())), and get() is only there to satisfy R CMD check.
HTH
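Below is a sketch of the data(..., envir = environment()) variant. It loads the lookup table into the function's own frame rather than the global environment. iris from the datasets package stands in for dataset_in_question, and mean_sepal_length() is a hypothetical name.

```r
# mean_sepal_length() is a hypothetical example function.
mean_sepal_length <- function(species) {
  # Load the dataset into this function's frame, not the global environment
  utils::data(list = "iris", package = "datasets", envir = environment())
  # get() fetches the freshly loaded local copy by name, the same trick
  # used above to keep R CMD check from flagging the symbol
  lookup <- get("iris", envir = environment(), inherits = FALSE)
  mean(lookup$Sepal.Length[lookup$Species == species])
}

mean_sepal_length("setosa")
```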
One can just place the data set as an .rda file named sysdata.rda in the R folder, as described by Hadley here: http://r-pkgs.had.co.nz/data.html#data-sysdata
Matthew Jockers uses this approach in the syuzhet package for data sets including the bing data set as seen at ~line 452 here: https://github.com/mjockers/syuzhet/blob/master/R/syuzhet.R
bing is not exported to the user but is available to the package, as demonstrated by syuzhet:::bing
Essentially, the command devtools::use_data(..., internal = TRUE) will set everything up in the way it's needed.
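Objects stored in R/sysdata.rda live in the package namespace without being exported. The sketch below checks for exactly that situation. is_internal() is a hypothetical helper, and stats:::Pillai is used only as a convenient non-exported object that ships with R.

```r
# is_internal() is a hypothetical helper: TRUE when `name` is defined in the
# package namespace but is not part of its exports (reachable only via :::).
is_internal <- function(name, package) {
  ns <- asNamespace(package)
  exists(name, envir = ns, inherits = FALSE) &&
    !(name %in% getNamespaceExports(package))
}

is_internal("Pillai", "stats")  # internal helper used by summary.manova
is_internal("lm", "stats")      # exported, so not internal
```

With syuzhet installed, is_internal("bing", "syuzhet") would be expected to return TRUE, though I have not verified it.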

No visible binding for global variable Note in R CMD check

I noticed in checking a package that I obtain notes "no visible binding for global variable" when I use functions like subset that use verbatim names of list elements as arguments.
For example with a data frame:
foo <- data.frame(a=c(TRUE,FALSE,TRUE),b=1:3)
I can do silly things like:
subset(foo,a)
transform(foo,a=b)
Which work as expected. The code check in R CMD check, however, doesn't understand that these refer to elements and complains that there are no visible bindings for these global variables.
While this works OK, I don't really like having NOTEs in my package and would prefer it to pass the check with no errors, warnings or notes at all. I also don't really want to rework my code too much. Is there a way to write this code so that it is clear the arguments do not refer to global variables?
To get it past R CMD check you can either:
Use get("b") (but that is onerous)
Place a=b=NULL somewhere higher up in your function (that's what I do)
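The following sketch shows the NULL-binding approach. The dummy assignments give codetools a local binding for a and b. subset() and transform() still evaluate those names inside the data frame. flag_rows() is a hypothetical function name.

```r
# flag_rows() is a hypothetical function using the foo data from the question.
flag_rows <- function(df) {
  # Dummy bindings: codetools now sees local variables named a and b,
  # so R CMD check no longer reports them as undefined globals
  a <- b <- NULL
  # subset()/transform() still find a and b among the columns of df first
  kept <- subset(df, a)
  transform(kept, a = b)
}

foo <- data.frame(a = c(TRUE, FALSE, TRUE), b = 1:3)
res <- flag_rows(foo)
print(res)
```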
There was a thread on r-devel a while ago where somebody from r-core basically said (from memory) "NOTES are ok, you know. The assumption is that the author checked it and is ok with the NOTE.". But, I agree with you. I do prefer to have CRAN checks return a clean "OK" on all platforms. That way the user is left in no doubt that it passes checks ok.
EDIT:
Here is the r-devel thread I was remembering (from April 2010). So that appears to suggest that there are some situations where there is no known way to avoid the NOTE, but that's ok.
This is one of the potential "unanticipated consequences" of using subset non-interactively. As it says in the Warning section of ?subset:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.
From R version 2.15.1 onwards there is a way around this:
if(getRversion() >= "2.15.1") utils::globalVariables(c("a", "othervar"))
As per the Warning section of ?subset, it is better to use subset interactively and [ for programming.
I would replace a command like
subset(foo,a)
with
foo[foo$a]
or, if foo is a data frame:
foo[foo$a, ]
You might also like to use with() if foo is a data frame and the expression to be evaluated is complex:
with(foo, foo[a, ])
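The check below uses the foo data frame from the question. It confirms that the [ and with() forms select the same rows as subset(). The last variant also shows that [ accepts a column name held in a variable, which is usually what package code needs.

```r
foo <- data.frame(a = c(TRUE, FALSE, TRUE), b = 1:3)

via_subset  <- subset(foo, a)        # non-standard evaluation
via_bracket <- foo[foo$a, ]          # standard subsetting
via_with    <- with(foo, foo[a, ])   # a is looked up inside foo

# With [ the column can also be chosen by a string, so no bare column
# name appears in the code for R CMD check to flag
flag_col <- "a"
via_string <- foo[foo[[flag_col]], ]
```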
I had this issue and traced it to my ggplot2 section.
This code provided the error:
ggplot2::ggplot(data = spec.df, ggplot2::aes(E.avg, fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
Prefixing the column names with the data frame name eliminated the NOTE:
ggplot2::ggplot(data = spec.df, ggplot2::aes(spec.df$E.avg, spec.df$fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
