Using datasets in an R package - r

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because of the use of a data set that is included in the package in a function which is not exported (i.e. the user cannot use it unless they use the ::: operator. A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
set.seed(seed)
np = 2 * nc[2]
freqs = USCaucs$freqs
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However I get this note from https://win-builder.r-project.org which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables::
USCaucs
I find this especially frustrating, since, as I said, this function is not even exported (it also works without complaint because the package loads this dataset). All help appreciated

The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.

Related

Removing/de-registering a specific function from an R package

I may not be using the terminology correctly here so please forgive me...
I have a case of one package 'overwriting' a function with the same name loaded by another package and thus changing the behavior (breaking) of a function.
The specific case:
X <- data.frame ( y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100) )
library(CausalImpact)
a <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # works
library(bfast) # imports quantmod which loads crappy version of as.zoo.data.frame
b <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # Error
I know the error comes from two versions of the function as.zoo.data.frame.
The problematic version is imported by bfast from the package 'quantmod' (see https://github.com/joshuaulrich/quantmod/issues/168). Unfortunately their hotfix did not prevent this error. Super annoying.
I can hack around this specific problem, but I was wondering if there is a general way to like 'de-register' this function variant from the search path. Neither detach nor unloadNamespace remove the offending function (same behavior after). An explanation and similar problem is discussed here and here, but I wasn't able to find a general solution. For instance I'd rather just remove this function than clone and re-write CausalImpact to deal with this behavior.
From R 3.6.0 onwards, there is a new option called "conflicts.policy" to handle this within an established framework. For small issues like this, you can use the new arguments to library(). If you aren't yet to 3.6, the easiest solution might be to explicitly namespace CausalImpact when you need it, i.e. CausalImpact::CausalImpact. That's a mouthful, so you could do causal_impact <- CausalImpact::CausalImpact and use that alias.
# only attach select
library(dplyr, include.only = "select")
# exclude slice/arrange from being attached.
library(dplyr, exclude = c("slice", "arrange"))
library(bfast, exclude = "CausalImpact") should solve your problem.
Attach means that they are available for use without explicit prefixing with their package. In either of these cases, something like dplyr::slice would work just fine.
For more information, you can see ?library. Also, the R-Core member Luke Tierney wrote a blog explaining how the conflicts.policy works. You can find that here
Here's an answer that works, but is less preferable than de-registering a S3 method because it involves replacing the registered version in the S3 Methods table with the desired method:
library(CausalImpact)
library(bfast)
assignInNamespace("as.zoo.data.frame", zoo:::as.zoo.data.frame, ns = asNamespace("zoo"))
based partially on #smingerson's suggestion in the comments

utils::globalVariables(.) not applicable to R CMD CHECK note:no visible binding for global variable '.' [duplicate]

I noticed in checking a package that I obtain notes "no visible binding for global variable" when I use functions like subset that use verbatim names of list elements as arguments.
For example with a data frame:
foo <- data.frame(a=c(TRUE,FALSE,TRUE),b=1:3)
I can do silly things like:
subset(foo,a)
transform(foo,a=b)
Which work as expected. The R code check in R CMD however doesn't understand that these refer to elements and complains about there not being any visible bindings of global variables.
While this works ok, I don't really like having notes in my package and prefer for it to pass the check with no errors, warnings and notes at all. I also don't really want to rework my code too much. Is there a way to write these codes so that it is clear the arguments do not refer to global variables?
To get it past R CMD check you can either :
Use get("b") (but that is onerous)
Place a=b=NULL somewhere higher up in your function (that's what I do)
There was a thread on r-devel a while ago where somebody from r-core basically said (from memory) "NOTES are ok, you know. The assumption is that the author checked it and is ok with the NOTE.". But, I agree with you. I do prefer to have CRAN checks return a clean "OK" on all platforms. That way the user is left in no doubt that it passes checks ok.
EDIT :
Here is the r-devel thread I was remembering (from April 2010). So that appears to suggest that there are some situations where there is no known way to avoid the NOTE, but that's ok.
This is one of the potential "unanticipated consequences" of using subset non-interactively. As it says in the Warning section of ?subset:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.
From R version 2.15.1 onwards there is a way around this:
if(getRversion() >= "2.15.1") utils::globalVariables(c("a", "othervar"))
As per the warning section of ?subset it is better to use subset interactively, and [ for programming.
I would replace a command like
subset(foo,a)
with
foo[foo$a]
or if foo is a dataframe:
foo[foo$a, ]
you might also like to use with if foo is a dataframe and the expression to be evaluated is complex:
with(foo, foo[a, ])
I had this issue and traced it to my ggplot2 section.
This code provided the error:
ggplot2::ggplot(data = spec.df, ggplot2::aes(E.avg, fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
Adding the data name to the parameters eliminated the not:
ggplot2::ggplot(data = spec.df, ggplot2::aes(spec.df$E.avg, spec.df$fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R started a new session what would be the best way to do this? Currently I can only think of storing a file in the users home directory but I'm not sure if I like that approach.
This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores that I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/Data")
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
scores <- 10
save(scores, file=paste0(loc, "/scores.rda"))
Then when I unloaded the library and re loaded agin the data set now says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow but am not sure on how to do this without messing with .Last function.
EDIT:
It appears this option is not viable in that when you compile as a package and use lazy load it saves the data sets as:
RData.rbd, RData.rbx, not as .rda files. That means the approach I use above is kinda worthless in that we want it to automatically be recognized.
EDIT2
This approach works and I tried it on a package I made. You can't do lazy load of the data and you have to either explicitly use data(scores) or use data(scores) inside of the function you're calling. I also assigned scores to .scores int he global.env the first time it was created and used exists inside the function to see if it exists. If `.scores. existed I assigned that to scores within the function. Once you unload the library and laod again you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function
I guess there is no way to store settings without saving them to disk or a database, some way or another. It can be done silently though by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in other ways than using options you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function(){
my.options <- options()
save(my.options, file="~/.Roptions.Rdata")
}
.First <- function(){
tryCatch({
load("~/.Roptions.Rdata")
do.call(options, my.options)
rm(my.options)
}, error=function(...){})
}
To my suprise try(..., silent=TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.
The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()
scores<-data.frame(
user=c("one","two","three"),
score=c("500,200,1100")
)
save(scores,file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()
#new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS - you can see a working example in the rnoaa package found on at github "opensci/rnoaa". Check their R/onload.r file! I can expand if needed.

Data inside a function (package creation)

If I need to use a data set inside a function (as a lookup table) inside of a package I'm creating do I need to explicitly load the data set inside of the function?
The function and the data set are both part of my package.
Is this the correct way to use that data set inside the function:
foo <- function(x){
x <- dataset_in_question
}
or is this better:
foo <- function(x){
x <- data(dataset_in_question)
}
or is there some approach I'm not thinking of that's correct?
There was a recent discussion about this topic (in the context of package development) on R-devel, numerous points of which are relevant to this question:
If only the options you provide are applicable to your example R himself (i.e., Brian Ripley) tells you to do:
foo <- function(x){
data("dataset_in_question")
}
This approach will however throw a NOTE in R CMD check which can be avoided in upcoming versions of R (or currently R devel) by using the globalVariables() function, added by John Chambers
The 'correct' approach (i.e., the one advocated by Brian Ripley and Peter Dalgaard) would be to use the LazyData option for your package. See this section of "Writing R Extensions".
Btw: I do not fully understand how your first approach should work. What should x <- dataset_in_question do? Is dataset_in_question a global Variable or defined previously?
For me it was necessary to use get() additionally to LazyData: true in DESCRIPTION file (see postig by #Henrik point 3) to get rid of the NOTE no visible binding for global variable .... My R version is 3.2.3.
foo <- function(x){
get("dataset_in_question")
}
So LazyData makes dataset_in_question directly accessible (without using data("dataset_in_question", envir = environment())) and get() is to satisfy R CMD check
HTH
One can just place the data set as a .rda file in the R folder as described by Hadley here: http://r-pkgs.had.co.nz/data.html#data-sysdata
Matthew Jockers uses this approach in the syuzhet package for data sets including the bing data set as seen at ~line 452 here: https://github.com/mjockers/syuzhet/blob/master/R/syuzhet.R
bing is not available to the user but is to the package as demonstrated by: syuzhet:::bing
Essentially, the command devtools::use_data(..., internal = TRUE) will set everything up in the way it's needed.

No visible binding for global variable Note in R CMD check

I noticed in checking a package that I obtain notes "no visible binding for global variable" when I use functions like subset that use verbatim names of list elements as arguments.
For example with a data frame:
foo <- data.frame(a=c(TRUE,FALSE,TRUE),b=1:3)
I can do silly things like:
subset(foo,a)
transform(foo,a=b)
Which work as expected. The R code check in R CMD however doesn't understand that these refer to elements and complains about there not being any visible bindings of global variables.
While this works ok, I don't really like having notes in my package and prefer for it to pass the check with no errors, warnings and notes at all. I also don't really want to rework my code too much. Is there a way to write these codes so that it is clear the arguments do not refer to global variables?
To get it past R CMD check you can either :
Use get("b") (but that is onerous)
Place a=b=NULL somewhere higher up in your function (that's what I do)
There was a thread on r-devel a while ago where somebody from r-core basically said (from memory) "NOTES are ok, you know. The assumption is that the author checked it and is ok with the NOTE.". But, I agree with you. I do prefer to have CRAN checks return a clean "OK" on all platforms. That way the user is left in no doubt that it passes checks ok.
EDIT :
Here is the r-devel thread I was remembering (from April 2010). So that appears to suggest that there are some situations where there is no known way to avoid the NOTE, but that's ok.
This is one of the potential "unanticipated consequences" of using subset non-interactively. As it says in the Warning section of ?subset:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.
From R version 2.15.1 onwards there is a way around this:
if(getRversion() >= "2.15.1") utils::globalVariables(c("a", "othervar"))
As per the warning section of ?subset it is better to use subset interactively, and [ for programming.
I would replace a command like
subset(foo,a)
with
foo[foo$a]
or if foo is a dataframe:
foo[foo$a, ]
you might also like to use with if foo is a dataframe and the expression to be evaluated is complex:
with(foo, foo[a, ])
I had this issue and traced it to my ggplot2 section.
This code provided the error:
ggplot2::ggplot(data = spec.df, ggplot2::aes(E.avg, fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
Adding the data name to the parameters eliminated the not:
ggplot2::ggplot(data = spec.df, ggplot2::aes(spec.df$E.avg, spec.df$fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))

Resources