R:"Note: no visible binding for global variable" after devtools:use_data - r

I want to make a dataframe available to users of a package, as well as to functions of the package.
Starting from a new package I've used
devtools::use_data_raw("my_data")
This creates the file data-raw/my_data.R, which I edit as
my_data <- data.frame(x = runif(3), y = runif(3))
devtools::use_data(my_data, overwrite = TRUE)
After running the code above, a file data/my_data.rda is created.
According to Hadley Wickham's R Packages every file under data/ is exported, and if I try
load_all()
my_data
I can see that this is the case. However, if I now try to use the my_data dataframe inside a function in the package, say in R/test_my_data.R
test_my_data <- function() {
my_data
}
and then I run
devtools::check()
I get the infamous no visible binding for global variable my_data NOTE.
I appreciate there are many questions already on this topic but many are related to the cases where a tidy evaluation function is used, or data from another package is referred to. Why is R CMD check failing on the example above, and what's the correct way of sorting this out?
I'm aware that the
utils::globalVariables("my_data")
solution will avoid this NOTE, but I'd like to know if there's a proper way of informing R CMD check that my_data actually exists.
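For reference, that workaround usually lives in a small file such as R/globals.R (the file name is only a convention), guarded for older versions of R, since globalVariables() was added in R 2.15.1:
if (getRversion() >= "2.15.1") utils::globalVariables("my_data")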

Related

Determining which version of a function is active when many packages are loaded

If I have multiple packages loaded that define functions of the same name, is there an easy way to determine which version of the function is currently the active one? Let's say I have base R, the tidyverse, and a bunch of time series packages loaded. I'd like a function which_package("intersect") that would tell me the package name of the active version of the intersect function. I know you can go back and look at all the masking messages you received when loading packages, but I think that sort of manual search is not only tedious but also error-prone.
There is a function here that does sort of what I want, except it produces a table for all conflicts rather than the value for one function. I would actually be quite happy with that, and would also accept a similar function as an answer, but I have had problems with the implementation of the given function. As applied to my examples, it inserts vast amounts of white space and many duplicates of the package names (e.g. the %>% function shows up with 132 packages listed), making the output hard to read and hard to use. It seems like it should be easy to remove the white space and duplicates, and I have spent considerable time on various approaches that I expected to work but which had no impact on the outcome.
So, for an example of many conflicts:
install.packages(pkgs = c("tidyverse", "fpp3", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"))
lapply(X = c("tidyverse", "fable", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"),
       library, character.only = TRUE)
You can pull this information with your own helper function.
which_package <- function(fun) {
  # Accept either a function or the name of one
  if (is.character(fun)) fun <- getFunction(fun)
  stopifnot(is.function(fun))
  # A package function's enclosing environment is its namespace
  environmentName(environment(fun))
}
This will return R_GlobalEnv for functions that you define in the global environment. There is also the packageName function if you really want to restrict it to packages only.
For example
library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"

Removing/de-registering a specific function from an R package

I may not be using the terminology correctly here so please forgive me...
I have a case of one package 'overwriting' a function with the same name loaded by another package and thus changing the behavior (breaking) of a function.
The specific case:
X <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
library(CausalImpact)
a <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # works
library(bfast) # imports quantmod which loads crappy version of as.zoo.data.frame
b <- CausalImpact::CausalImpact( X, c(1,75), c(76, 100) ) # Error
I know the error comes from two versions of the function as.zoo.data.frame.
The problematic version is imported by bfast from the package 'quantmod' (see https://github.com/joshuaulrich/quantmod/issues/168). Unfortunately their hotfix did not prevent this error. Super annoying.
I can hack around this specific problem, but I was wondering if there is a general way to like 'de-register' this function variant from the search path. Neither detach nor unloadNamespace remove the offending function (same behavior after). An explanation and similar problem is discussed here and here, but I wasn't able to find a general solution. For instance I'd rather just remove this function than clone and re-write CausalImpact to deal with this behavior.
From R 3.6.0 onwards, there is a new option called "conflicts.policy" to handle this within an established framework. For small issues like this, you can use the new arguments to library(). If you aren't on 3.6 yet, the easiest solution might be to explicitly namespace CausalImpact when you need it, i.e. CausalImpact::CausalImpact. That's a mouthful, so you could do causal_impact <- CausalImpact::CausalImpact and use that alias.
# only attach select
library(dplyr, include.only = "select")
# exclude slice/arrange from being attached.
library(dplyr, exclude = c("slice", "arrange"))
library(bfast, exclude = "CausalImpact") should solve your problem.
Attaching means the functions are available for use without explicit prefixing with their package. In either of these cases, something like dplyr::slice would still work just fine.
For more information, see ?library. Also, the R Core member Luke Tierney wrote a blog post explaining how conflicts.policy works. You can find that here.
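A small sketch of that mechanism (R >= 3.6.0), using dplyr as the example since its masking of stats::filter()/stats::lag() and a few base set functions is well known; with a strict policy, any masking must be declared when the package is attached:
options(conflicts.policy = "strict")
# Either whitelist the names the package is allowed to mask...
library(dplyr, mask.ok = c("filter", "lag", "intersect", "setdiff", "setequal", "union"))
# ...or, alternatively, attach only what you need:
library(dplyr, include.only = "select")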
Here's an answer that works, but is less preferable than de-registering an S3 method, because it involves replacing the registered version in the S3 methods table with the desired method:
library(CausalImpact)
library(bfast)
assignInNamespace("as.zoo.data.frame", zoo:::as.zoo.data.frame, ns = asNamespace("zoo"))
Based partially on @smingerson's suggestion in the comments.
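A quick way to check that the replacement took (a sketch; assumes zoo is loaded) is to compare the method now found for dispatch with zoo's own:
identical(getS3method("as.zoo", "data.frame"), zoo:::as.zoo.data.frame)
# [1] TRUE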

Creating R package containing a dataset and an R function which uses the data

I am creating an R package containing a dataset and an R function which uses the data.
The R function looks like this:
myFun <- function(iobs){
data(MyData)
return(MyData[iobs,])
}
When I do the usual "R CMD check myPack" business, it gives me a NOTE saying
* checking R code for possible problems ... NOTE
myFun: no visible binding for global variable ‘MyData’
Is there way to fix this problem?
You can use lazy-loading for this.
Just put
LazyData: yes
in your DESCRIPTION file and remove
data(MyData)
from your function.
Due to lazy-loading, your MyData object will be available in your namespace, so there is no need to call data().
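A minimal sketch of what the function reduces to once LazyData is set; MyData is resolved through the package namespace:
myFun <- function(iobs){
  MyData[iobs, ]
}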
Two alternatives to the lazy data approach, both of which rely on using the list argument to data():
data(list = 'MyData')
1. Define it as a default argument of the function (perhaps not ideal, since the caller can then change it). Note that data() returns the name of the dataset it loaded, so get() is needed to fetch the object itself:
myFun <- function(iobs, myData = get(data(list = 'MyData'))){
  return(myData[iobs, ])
}
2. Load it into an empty environment, then extract it using [[:
myFun2 <- function(iobs){
e <- new.env(parent = emptyenv())
data(list='MyData', envir = e)
e[['MyData']][iobs,]
}
Note that e$MyData[iobs,] should also work.
I would also suggest using drop = FALSE as safe practice to retain the same class as MyData, e.g. MyData[iobs, , drop = FALSE]. This may not be an issue given the specifics of this function and the structure of MyData, but it is good programming practice, especially within packages where you want robust code.
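A tiny sketch of why drop matters, assuming MyData could be matrix-like; single-row indexing otherwise simplifies the result to a vector:
m <- matrix(1:6, nrow = 3)
m[1, ]                # simplifies to a numeric vector
m[1, , drop = FALSE]  # stays a 1-row matrix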

Save package settings between sessions

Is there a definitive way to save options or information pertaining to a certain package between sessions?
For example, say somebody made a game and released it as an R package. If they wanted to save high scores and not have them reset each time R started a new session, what would be the best way to do this? Currently I can only think of storing a file in the user's home directory, but I'm not sure if I like that approach.
This may be an approach. I created a dummy package with a dummy function (any function I create is bound to be a dummy function) and a data set I called scores that I set as follows:
scores <- NA
Then I created the package with the scores data set.
Then I used the following to change the data set from within R.
loc <- paste0(find.package("new"), "/data")
# Remove the installed copy of the data set...
unlink(paste0(loc, "/scores.rda"), recursive = TRUE, force = FALSE)
# ...and save a new version in its place
scores <- 10
save(scores, file = paste0(loc, "/scores.rda"))
Then when I unloaded the library and reloaded it again, the data set now says:
> scores
[1] 10
Could this be modified to do what you want? You'd have to have it save in between somehow, but I am not sure how to do this without messing with the .Last function.
EDIT:
It appears this option is not viable in that when you compile as a package and use lazy loading it saves the data sets as Rdata.rdb and Rdata.rdx, not as .rda files. That means the approach I used above is kinda worthless, in that we want the data to be recognized automatically.
EDIT2
This approach works and I tried it on a package I made. You can't lazy load the data, and you have to either explicitly use data(scores) or call data(scores) inside the function you're calling. I also assigned scores to .scores in the global env the first time it was created and used exists() inside the function to see if it exists. If .scores existed, I assigned that to scores within the function. Once you unload the library and load it again you never have to worry about that again.
Maybe an alternative is to save this as a function somehow that can be altered using Josh's advice here: Permanently replacing a function
I guess there is no way to store settings without saving them to disk or a database, some way or another. It can be done silently though by putting the code below in your ~/.Rprofile. However, if you have packages that save settings in other ways than using options you need to add them manually.
I know this is exactly what you said you did not want, but it might spark some debate at least.
.Last <- function(){
  # On session exit, snapshot all current options to disk
  my.options <- options()
  save(my.options, file = "~/.Roptions.Rdata")
}
.First <- function(){
  # On startup, restore the saved options; silently skip if the file is missing
  tryCatch({
    load("~/.Roptions.Rdata")
    do.call(options, my.options)
    rm(my.options)
  }, error = function(...){})
}
To my surprise, try(..., silent = TRUE) gives a warning on startup if ~/.Roptions.Rdata does not exist, which is why I used tryCatch instead.
The modern answer to this problem is well explained at https://blog.r-hub.io/2020/03/12/user-preferences/
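One modern option in that vein is base R's tools::R_user_dir() (added in R 4.0.0), which hands you a standard per-user, per-package directory; a minimal sketch, with "yourpackage" as a placeholder name:
prefs <- file.path(tools::R_user_dir("yourpackage", which = "config"), "scores.rds")
dir.create(dirname(prefs), recursive = TRUE, showWarnings = FALSE)
saveRDS(data.frame(user = "one", score = 500), prefs)
readRDS(prefs)  # still there in the next session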
I think I will be trying the hoardr package! Here is an example that worked for me :)
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$mkdir()
scores <- data.frame(
  user = c("one", "two", "three"),
  score = c(500, 200, 1100)
)
save(scores,file = file.path(x$cache_path_get(), "scores.rdata"))
x$list()
x$details()
#new session
x <- hoardr::hoard()
x$cache_path_set("yourpackage", type = 'user_cache_dir')
x$list()
x$details()
load(file = file.path(x$cache_path_get(), "scores.rdata"))
PS - you can see a working example in the rnoaa package found on GitHub at "ropensci/rnoaa". Check their R/onload.r file! I can expand if needed.

Data inside a function (package creation)

If I need to use a data set inside a function (as a lookup table) inside of a package I'm creating do I need to explicitly load the data set inside of the function?
The function and the data set are both part of my package.
Is this the correct way to use that data set inside the function:
foo <- function(x){
x <- dataset_in_question
}
or is this better:
foo <- function(x){
x <- data(dataset_in_question)
}
or is there some approach I'm not thinking of that's correct?
There was a recent discussion about this topic (in the context of package development) on R-devel, numerous points of which are relevant to this question:
1. If only the options you provide are applicable to your example, R himself (i.e., Brian Ripley) tells you to do:
foo <- function(x){
data("dataset_in_question")
}
2. This approach will however throw a NOTE in R CMD check, which can be avoided in upcoming versions of R (or currently R-devel) by using the globalVariables() function, added by John Chambers.
3. The 'correct' approach (i.e., the one advocated by Brian Ripley and Peter Dalgaard) would be to use the LazyData option for your package. See this section of "Writing R Extensions".
Btw: I do not fully understand how your first approach is supposed to work. What should x <- dataset_in_question do? Is dataset_in_question a global variable or defined previously?
For me it was necessary to use get() in addition to LazyData: true in the DESCRIPTION file (see the posting by @Henrik, point 3) to get rid of the NOTE no visible binding for global variable .... My R version is 3.2.3.
foo <- function(x){
get("dataset_in_question")
}
So LazyData makes dataset_in_question directly accessible (without using data("dataset_in_question", envir = environment())), and get() is there to satisfy R CMD check.
HTH
One can just place the data set as a .rda file in the R folder as described by Hadley here: http://r-pkgs.had.co.nz/data.html#data-sysdata
Matthew Jockers uses this approach in the syuzhet package for data sets including the bing data set as seen at ~line 452 here: https://github.com/mjockers/syuzhet/blob/master/R/syuzhet.R
bing is not available to the user but is to the package as demonstrated by: syuzhet:::bing
Essentially, the command devtools::use_data(..., internal = TRUE) will set everything up in the way it's needed.
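A sketch of that setup; the source file path and contents are hypothetical, but the internal = TRUE call is what writes R/sysdata.rda:
# in data-raw/internal_data.R (hypothetical script)
bing <- read.csv("data-raw/bing.csv")       # hypothetical source file
devtools::use_data(bing, internal = TRUE)   # writes R/sysdata.rda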
