No visible binding for global variable Note in R CMD check - r

I noticed in checking a package that I obtain notes "no visible binding for global variable" when I use functions like subset that use verbatim names of list elements as arguments.
For example with a data frame:
foo <- data.frame(a=c(TRUE,FALSE,TRUE),b=1:3)
I can do silly things like:
subset(foo,a)
transform(foo,a=b)
Which work as expected. The R code check in R CMD however doesn't understand that these refer to elements and complains about there not being any visible bindings of global variables.
While this works ok, I don't really like having notes in my package and prefer for it to pass the check with no errors, warnings and notes at all. I also don't really want to rework my code too much. Is there a way to write these codes so that it is clear the arguments do not refer to global variables?

To get it past R CMD check you can either :
Use get("b") (but that is onerous)
Place a=b=NULL somewhere higher up in your function (that's what I do)
There was a thread on r-devel a while ago where somebody from r-core basically said (from memory) "NOTES are ok, you know. The assumption is that the author checked it and is ok with the NOTE.". But, I agree with you. I do prefer to have CRAN checks return a clean "OK" on all platforms. That way the user is left in no doubt that it passes checks ok.
EDIT :
Here is the r-devel thread I was remembering (from April 2010). So that appears to suggest that there are some situations where there is no known way to avoid the NOTE, but that's ok.

This is one of the potential "unanticipated consequences" of using subset non-interactively. As it says in the Warning section of ?subset:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.

From R version 2.15.1 onwards there is a way around this:
if(getRversion() >= "2.15.1") utils::globalVariables(c("a", "othervar"))

As per the warning section of ?subset it is better to use subset interactively, and [ for programming.
I would replace a command like
subset(foo,a)
with
foo[foo$a]
or if foo is a dataframe:
foo[foo$a, ]
you might also like to use with if foo is a dataframe and the expression to be evaluated is complex:
with(foo, foo[a, ])

I had this issue and traced it to my ggplot2 section.
This code provided the error:
ggplot2::ggplot(data = spec.df, ggplot2::aes(E.avg, fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
Adding the data name to the parameters eliminated the not:
ggplot2::ggplot(data = spec.df, ggplot2::aes(spec.df$E.avg, spec.df$fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))

Related

R generic dispatching to attached environment

I have a bunch of functions and I'm trying to keep my workspace clean by defining them in an environment and attaching the environment. Some of the functions are S3 generics, and they don't seem to play well with this approach.
A minimum example of what I'm experiencing requires 4 files:
testfun.R
ttt.xxx <- function(object) print("x")
ttt <- function(object) UseMethod("ttt")
ttt2 <- function() {
yyy <- structure(1, class="xxx")
ttt(yyy)
}
In testfun.R I define an S3 generic ttt and a method ttt.xxx, I also define a function ttt2 calling the generic.
testenv.R
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
In testenv.R I source testfun.R to an environment, which I attach.
test1.R
source("testfun.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test1.R sources testfun.R to the global environment. Both ttt2 and a direct function call work.
test2.R
source("testenv.R")
ttt2()
xxx <- structure(1, class="xxx")
ttt(xxx)
test2.R uses the "attach" approach. ttt2 still works (and prints "x" to the console), but the direct function call fails:
Error in UseMethod("ttt") :
no applicable method for 'ttt' applied to an object of class "xxx"
however, calling ttt and ttt.xxx without arguments show that they are known, ls(pos=2) shows they are on the search path, and sloop::s3_dispatch(ttt(xxx)) tells me it should work.
This questions is related to Confusion about UseMethod search mechanism and the link therein https://blog.thatbuthow.com/how-r-searches-and-finds-stuff/, but I cannot get my head around what is going on: why is it not working and how can I get this to work.
I've tried both R Studio and R in the shell.
UPDATE:
Based on the answers below I changed my testenv.R to:
test_env <- new.env(parent=globalenv())
source("testfun.R", local=test_env)
attach(test_env)
if (is.null(.__S3MethodsTable__.))
.__S3MethodsTable__. <- new.env(parent = baseenv())
for (func in grep(".", ls(envir = test_env), fixed = TRUE, value = TRUE))
.__S3MethodsTable__.[[func]] <- test_env[[func]]
rm(test_env, func)
... and this works (I am only using "." as an S3 dispatching separator).
It’s a little-known fact that you must use .S3method() to define methods for S3 generics inside custom environments (outside of packages).1 The reason almost nobody knows this is because it is not necessary in the global environment; but it is necessary everywhere else since R version 3.6.
There’s virtually no documentation of this change, just a technical blog post by Kurt Hornik about some of the background. Note that the blog post says the change was made in R 3.5.0; however, the actual effect you are observing — that S3 methods are no longer searched in attached environments — only started happening with R 3.6.0; before that, it was somehow not active yet.
… except just using .S3method will not fix your code, since your calling environment is the global environment. I do not understand the precise reason why this doesn’t work, and I suspect it’s due to a subtle bug in R’s S3 method lookup. In fact, using getS3method('ttt', 'xxx') does work, even though that should have the same behaviour as actual S3 method lookup.
I have found that the only way to make this work is to add the following to testenv.R:
if (is.null(.__S3MethodsTable__.)) {
.__S3MethodsTable__. <- new.env(parent = baseenv())
}
.__S3MethodsTable__.$ttt.xxx <- ttt.xxx
… in other words: supply .GlobalEnv manually with an S3 methods lookup table. Unfortunately this relies on an undocumented S3 implementation detail that might theoretically change in the future.
Alternatively, it “just works” if you use ‘box’ modules instead of source. That is, you can replace the entirety of your testenv.R by the following:
box::use(./testfun[...])
This code treats testfun.R as a local module and loads it, attaching all exported names (via the attach declaration [...]).
1 (and inside packages you need to use the equivalent S3method namespace declaration, though if you’re using ‘roxygen2’ then that’s taken care of for you)
First of all, my advice would be: don't try to reinvent R packages. They solve all the problems you say you are trying to solve, and others as well.
Secondly, I'll try to explain what went wrong in test2.R. It calls ttt on an xxx object, and ttt.xxx is on the search list, but is not found.
The problem is how the search for ttt.xxx happens. The search doesn't look for ttt.xxx in the search list, it looks for it in the environment from which ttt was called, then in an object called .__S3MethodsTable__.. I think there are two reasons for this:
First, it's a lot faster. It only needs to look in one or two places, and the table can be updated whenever a package is attached or detached, a relatively rare operation.
Second, it's more reliable. Each package has its own methods table, because two packages can use the same name for generics that have nothing to do with each other, or can use the same class names that are unrelated. So package code needs to be able to count on finding its own definitions first.
Since your call to ttt() happens at the top level, that's where R looks first for ttt.xxx(), but it's not there. Then it looks in the global .__S3MethodsTable__. (which is actually in the base environment), and it's not there either. So it fails.
There is a workaround that will make your code work. If you run
.__S3MethodsTable__. <- list2env(list(ttt.xxx = ttt.xxx))
as the last line of testenv.R, then you'll create a methods table in the global environment. (Normally there isn't one there, because that's user space, and R doesn't like putting things there unless the user asks for it.)
R will find that methods table, and will find the ttt.xxx method that it defines. I wouldn't be surprised if this breaks some other aspect of S3 dispatch, so I don't recommend doing it, but give it a try if you insist on reinventing the package system.

A note on graphics::curve() in R CMD check

I use the following code in my own package.
graphics::curve( foo (x) )
When I run R CMD check, it said the following note.How do I delete the NOTE?
> checking R code for possible problems ... NOTE
foo: no visible binding for global variable 'x'
Undefined global functions or variables:
x
Edit for the answers:
I try the answer as follows.
function(...){
utils::globalVariables("x")
graphics::curve( sin(x) )
}
But it did not work. So,..., now, I use the following code, instead
function(...){
x <-1 # This is not used but to avoid the NOTE, I use an object "x".
graphics::curve( sin(x) )
}
The last code can remove the NOTE.
Huuum, I guess, the answer is correct, but, I am not sure but it dose not work for me.
Two things:
Add
utils::globalVariables("x")
This can be added in a file of its own (e.g., globals.R), or (my technique) within the file that contains that code.
It is not an error to include the same named variables in multiple files, so the same-file technique will preclude you from accidentally removing it when you remove one (but not another) reference. From the help docs: "Repeated calls in the same package accumulate the names of the global variables".
This must go outside of any function declarations, on its own (top-level). While it is included in the package source (it needs to be, in order to have an effect on the CHECK process), but otherwise has no impact on the package.
Add
importFrom(utils,globalVariables)
to your package NAMESPACE file, since you are now using that function (unless you want another CHECK warning about objects not found in the global environment :-).

Using datasets in an R package

I am trying to get the latest version of my package (https://github.com/jmcurran/relSim) on CRAN. This has been rejected because of the use of a data set that is included in the package in a function which is not exported (i.e. the user cannot use it unless they use the ::: operator. A code snippet:
testIS = function(nc = c(3, 2), locus = 1, seed = 123456){
set.seed(seed)
np = 2 * nc[2]
freqs = USCaucs$freqs
The dataset is included in the package, and as per Hadley's advice I have LazyData: true in my DESCRIPTION file. However I get this note from https://win-builder.r-project.org which I don't know how to resolve.
* checking R code for possible problems ... [11s] NOTE
testIS: no visible binding for global variable 'USCaucs'
Undefined global functions or variables::
USCaucs
I find this especially frustrating, since, as I said, this function is not even exported (it also works without complaint because the package loads this dataset). All help appreciated
The solution appears to involve a little duplication. At the suggestion of Thomas Lumley, I placed the object in R/sysdata.rda as well as having it in data/USCaucs.rda. I followed Hadley Wickham's suggestion to use devtools::use_data with the argument internal set to TRUE so that it was saved in the correct manner for a package.
As noted, this solution involves duplicating the data. This isn't an issue for a small object such as the one I have here, but I'd like to think there is a more elegant solution out there.

utils::globalVariables(.) not applicable to R CMD CHECK note:no visible binding for global variable '.' [duplicate]

I noticed in checking a package that I obtain notes "no visible binding for global variable" when I use functions like subset that use verbatim names of list elements as arguments.
For example with a data frame:
foo <- data.frame(a=c(TRUE,FALSE,TRUE),b=1:3)
I can do silly things like:
subset(foo,a)
transform(foo,a=b)
Which work as expected. The R code check in R CMD however doesn't understand that these refer to elements and complains about there not being any visible bindings of global variables.
While this works ok, I don't really like having notes in my package and prefer for it to pass the check with no errors, warnings and notes at all. I also don't really want to rework my code too much. Is there a way to write these codes so that it is clear the arguments do not refer to global variables?
To get it past R CMD check you can either :
Use get("b") (but that is onerous)
Place a=b=NULL somewhere higher up in your function (that's what I do)
There was a thread on r-devel a while ago where somebody from r-core basically said (from memory) "NOTES are ok, you know. The assumption is that the author checked it and is ok with the NOTE.". But, I agree with you. I do prefer to have CRAN checks return a clean "OK" on all platforms. That way the user is left in no doubt that it passes checks ok.
EDIT :
Here is the r-devel thread I was remembering (from April 2010). So that appears to suggest that there are some situations where there is no known way to avoid the NOTE, but that's ok.
This is one of the potential "unanticipated consequences" of using subset non-interactively. As it says in the Warning section of ?subset:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.
From R version 2.15.1 onwards there is a way around this:
if(getRversion() >= "2.15.1") utils::globalVariables(c("a", "othervar"))
As per the warning section of ?subset it is better to use subset interactively, and [ for programming.
I would replace a command like
subset(foo,a)
with
foo[foo$a]
or if foo is a dataframe:
foo[foo$a, ]
you might also like to use with if foo is a dataframe and the expression to be evaluated is complex:
with(foo, foo[a, ])
I had this issue and traced it to my ggplot2 section.
This code provided the error:
ggplot2::ggplot(data = spec.df, ggplot2::aes(E.avg, fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
Adding the data name to the parameters eliminated the not:
ggplot2::ggplot(data = spec.df, ggplot2::aes(spec.df$E.avg, spec.df$fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))

Is there a sensible way to do something like docstrings in R?

This is not just a coding style question. If you know python (and I think also Ruby has something like this), you can have a docstring in a function, such that you can readily get that string by issuing a "help" command. e.g.:
def something(t=None):
'''Do something, perhaps to t
t : a thing
You may not want to do this
'''
if t is not None:
return t ** 2
else:
return 'Or maybe not'
Then help(something) returns the following:
Help on function something in module __main__:
something(t=None)
Do something, perhaps to t
t : a thing
You may not want to do this
The way things work in R, you can get the full text of the defined code snippet, so you could see comments (including those at the beginning of the function), but that can be a lot of scrolling and visual filtering. Is there any better way?
I recently wrote a package to do just this task. The docstring package allows one to write their documentation as roxygen style comments within the function they are documenting. For example one could do
square <- function(x){
#' Square a number
return(x^2)
}
and then to view the documentation either call the docstring function
docstring(square)
or use the built in ? support and do
?square
The comments can either be a single chunk like shown above or fully roxygen style to take advantage of some of the keywords provided
square <- function(x){
#' Square a number
#'
#' Calculates the square of the input
#'
#' #param x the input to be squared
return(x^2)
}
This is on CRAN now: https://cran.r-project.org/package=docstring so you can just install using
install.packages("docstring")
or if you want the latest development version you can install from github:
library(devtools)
install_github("dasonk/docstring")
You can add any attributes you like to R objects, including function. So something like
describe <- function(obj) attr(obj, "help")
foo <- function(t=NULL) ifelse(!is.null(t), t^2, "Or maybe not")
attr(foo, "help") <- "Here is the help message"
produces more or less the desired output
> foo(2)
[1] 4
> foo()
[1] "Or maybe not"
> describe(foo)
[1] "Here is the help message"
Sort-of -- look at the roxygen2 package on CRAN (vignette here). You write a declarative header, and among other things a help page is created for you when you 'roxygen-ize' your sources.
It may not be the easiest package to use, see here on SO for questions pertaining to it as well as its mailing list. But it probably is the closest match.
RStudio helps you to create documentation quite easily.
See their documentation for more information.
The new reference class system has something very similar to docstrings for documenting methods of a class. Here is an example:
Something <- setRefClass("Something",
methods=list(
something=function(t=NULL) {
"Do something, perhaps to t
t : a thing
You may not want to do this
"
if(!is.null(t))
t^2
else
"Or maybe not"
}
))
a <- Something$new()
a$something(2)
a$something()
Something$help("something") ## to see help page
I had another idea as I'm finally wrapping my head around the fact that "R is a (very poor) LISP". Specifically, you can get access to the source code (usually) using the deparse command. So, this function would be a start towards defining your own custom source-code parsing help function:
docstr <- function(my.fun) {
# Comments are not retained
# So, we can put extra stuff here we don't want
# our docstr users to see
'This is a docstring that will be printed with extra whitespace and quotes'
orig.code.ish <- deparse(my.fun)
print(orig.code.ish[3])
}
docstr(docstr)
The above illustrates that deparse really does deparse, and is different from what you'd print at the REPL prompt if you typed docstr: quotes are changed to (default) double-quotes, opening curly brace gets moved to the second line, and blank lines (including comments) are removed. This actually helps a lot if you want to design a robust function. Would be trivial to look for e.g., opening and closing quotes down through the first line that doesn't start with a quote.
Another way to do it would be to get the list of call objects that make up the body list with body(docstr). The string would be in body(docstr)[[2]]. I have to admit that I'm a bit out of my depth here, as I don't fully understand the return value of body, and don't know where to find documentation! In particular, note that body(docstr)[2] returns an object of type and mode 'call'.
This latter approach seems much more LISPy. I'd love to hear other's thoughts on the general idea, or have pointers to actual language reference materials for this!

Resources