EDIT: Hadley Wickham points out that I misspoke. R CMD check is throwing NOTES, not Warnings. I'm terribly sorry for the confusion. It was my oversight.
The short version
R CMD check throws this note every time I use sensible plot-creation syntax in ggplot2:
no visible binding for global variable [variable name]
I understand why R CMD check does that, but it seems to be criminalizing an entire vein of otherwise sensible syntax. I'm not sure what steps to take to get my package to pass R CMD check and get admitted to CRAN.
The background
Sascha Epskamp previously posted on essentially the same issue. The difference, I think, is that subset()'s manpage says it's designed for interactive use.
In my case, the issue is not over subset() but over a core feature of ggplot2: the data = argument.
An example of code I write that generates these notes
Here's a sub-function in my package that adds points to a plot:
JitteredResponsesByContrast <- function (data) {
return(
geom_point(
aes(
x = x.values,
y = y.values
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
)
}
R CMD check, on parsing this code, will say
granovagg.contr : JitteredResponsesByContrast: no visible binding for
global variable 'x.values'
granovagg.contr : JitteredResponsesByContrast: no visible binding for
global variable 'y.values'
Why R CMD check is right
The check is technically correct. x.values and y.values
Aren't defined locally in the function JitteredResponsesByContrast()
Aren't pre-defined in the form x.values <- [something] either globally or in the caller.
Instead, they're variables within a dataframe that gets defined earlier and passed into the function JitteredResponsesByContrast().
Why ggplot2 makes it difficult to appease R CMD check
ggplot2 seems to encourage the use of a data argument. The data argument, presumably, is why this code will execute
library(ggplot2)
p <- ggplot(aes(x = hwy, y = cty), data = mpg)
p + geom_point()
but this code will produce an object-not-found error:
library(ggplot2)
hwy # a variable in the mpg dataset
Two work-arounds, and why I'm happy with neither
The NULLing out strategy
Matthew Dowle recommends setting the problematic variables to NULL first, which in my case would look like this:
JitteredResponsesByContrast <- function (data) {
x.values <- y.values <- NULL # Setting the variables to NULL first
return(
geom_point(
aes(
x = x.values,
y = y.values
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
)
}
I appreciate this solution, but I dislike it for three reasons.
it serves no additional purpose beyond appeasing R CMD check.
it doesn't reflect intent. It raises the expectation that the aes() call will see our now-NULL variables (it won't), while obscuring the real purpose (making R CMD check aware of variables it apparently wouldn't otherwise know were bound)
The problems of 1 and 2 multiply because every time you write a function that returns a plot element, you have to add a confusing NULLing statement
The with() strategy
You can use with() to explicitly signal that the variables in question can be found inside some larger environment. In my case, using with() looks like this:
JitteredResponsesByContrast <- function (data) {
with(data, {
geom_point(
aes(
x = x.values,
y = y.values
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
}
)
}
This solution works. But, I don't like this solution because it doesn't even work the way I would expect it to. If with() were really solving the problem of pointing the interpreter to where the variables are, then I shouldn't even need the data = argument. But, with() doesn't work that way:
library(ggplot2)
p <- ggplot()
p <- p + with(mpg, geom_point(aes(x = hwy, y = cty)))
p # will generate an error saying `hwy` is not found
So, again, I think this solution has similar flaws to the NULLing strategy:
I still have to go through every plot element function and wrap the logic in a with() call
The with() call is misleading. I still need to supply a data = argument; all with() is doing is appeasing R CMD check.
Conclusion
The way I see it, there are three options I could take:
Lobby CRAN to ignore the notes by arguing that they're "spurious" (pursuant to CRAN policy), and do that every time I submit a package
Fix my code with one of two undesirable strategies (NULLing or with() blocks)
Hum really loudly and hope the problem goes away
None of the three make me happy, and I'm wondering what people suggest I (and other package developers wanting to tap into ggplot2) should do.
You have two solutions:
Rewrite your code to avoid non-standard evaluation. For ggplot2, this means using aes_string() instead of aes() (as described by Harlan)
Add a call to globalVariables(c("x.values", "y.values")) somewhere in the top-level of your package.
You should strive for 0 NOTES in your package when submitting to CRAN, even if you have to do something slightly hacky. This makes life easier for CRAN, and easier for you.
(Updated 2014-12-31 to reflect my latest thoughts on this)
Have you tried with aes_string instead of aes? This should work, although I haven't tried it:
aes_string(x = 'x.values', y = 'y.values')
This question has been asked and answered a while ago but just for your information, since version 2.1.0 there is another way to get around the notes: aes_(x=~x.values,y=~y.values).
In 2019, the best way to get around this is to use the .data prefix from the rlang package, which also gets exported to ggplot2. This tells R to treat x.values and y.values as columns in a data.frame (so it won't complain about undefined variables).
Note: This works best if you have predefined columns names that you know will exist in you data input
#' #importFrom ggplot2 .data
my_func <- function(data) {
ggplot(data, aes(x = .data$x, y = .data$y))
}
EDIT: Updated to export .data from ggplot2 instead of rlang based off #Noah comment
If
getRversion() >= "3.1.0"
You can add a call at the top level of the package:
utils::suppressForeignCheck(c("x.values", "y.values"))
from:
help("suppressForeignCheck")
Add this line of code to the file in which you provide package-level documentation:
if(getRversion() >= "2.15.1") utils::globalVariables(c("."))
Example here
how about using get()?
geom_point(
aes(
x = get('x.values'),
y = get('y.values')
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
Because the manual for ?aes_string says
All these functions are soft-deprecated. Please use tidy evaluation
idioms instead (see the quasiquotation section in aes()
documentation).
So I read that page, and came up with this pattern:
ggplot2::aes(x = !!quote(x.values),
y = !!quote(y.values))
It is about as fugly as an IIFE, and mixes base expressions with tidy-bang-bangs. But does not require the global variables workaround, either, and doesn't use anything that is deprecated afaict. It seems like it also works with calculations in aesthetics and the derived variables like ..count..
Related
I'm trying to copy a ggplot object and then change some properties of the new copied object as, for instance, the colour line to red.
Assume this code:
df = data.frame(cbind(x=1:10, y=1:10))
a = ggplot(df, aes(x=x, y=y)) + geom_line()
b = a
Then, if I change the colour of line of variable a
a$layers[[1]]$geom_params$colour = "red"
it also changes the colour of b
> b$layers[[1]]$geom_params$colour
[1] "red" # why it is not "black"?
I wish I could have two different objects a and b with different characteristics. So, in order to do this in the correct way, I would need to call the plot again for b using b = ggplot(df, aes(xy, y=z)) + geom_line(). However, at this time in the algorithm, there is no way to know the plot command ggplot(df, aes(x=x, y=y)) + geom_line()
Do you know what's wrong with this? Is ggplot objects treated in a different manner?
Thanks!
The issue here is that ggplot uses the proto library to mimic OO-style objects. The proto library relies on environments to collect variables for objects. Environments are passed by reference which is why you are seeing the behavior you are (and also a reason no one would probably recommend changing the properties of a layer that way).
Anyway, adapting an example from the proto documentaiton, we can try to make a deep copy of the laters of the ggplot object. This should "disconnect" them. Here's such a helper function
duplicate.ggplot<-function(x) {
require(proto)
r<-x
r$layers <- lapply(r$layers, function(x) {
as.proto(as.list(x), parent=x)
})
r
}
so if we run
df = data.frame(cbind(x=1:10, y=1:10))
a = ggplot(df, aes(x=x, y=y)) + geom_line()
b = a
c = duplicate.ggplot(a)
a$layers[[1]]$geom_params$colour = "red"
then plot all three, we get
which shows we can change "c" independently from "a"
Ignoring the specifics of ggplot, there's a simple trick to make a deep copy of (almost) any object in R:
obj_copy <- unserialize(serialize(obj, NULL))
This serializes the object to a binary representation suitable for writing to disk and then reconstructs the object from that representation. It's equivalent to saving the object to a file and then loading it again (i.e. saveRDS followed by readRDS), only it never actually saves to a file. It's probably not the most efficient solution, but it should work for just about any object that can be saved to a file.
You can define a deepcopy function using this trick:
deepcopy <- function(p) {
unserialize(serialize(p, NULL))
}
This seems to successfully break the links between related ggplots.
Obviously, this will not work for objects that cannot be serialized, such as big matrices from the bigmemory package.
I use ggalluvial with ggplot2, though, I'd like to be able to generate the same plot without attaching ggalluvial but only specify its use with ggalluvial::. If it is not attached, I get the following error: Error: Can't find stat called "stratum".
d <- data.frame(
status = rep(c("state1","state2","state3"), rep(90, times=3)),
cellIndex = rep(seq_len(90), times=3),
cellCategory = c(rep(letters[seq_len(3)], each=30),
rep(letters[c(2,3,1)], each=30),
rep(letters[c(3,1,2)], each=30))
)
ggplot2::ggplot(data=d, ggplot2::aes(x=status, stratum=cellCategory, alluvium=cellIndex,
fill=cellCategory, label=cellCategory)) +
ggalluvial::geom_flow(stat="alluvium", lode.guidance="rightleft", color="darkgray") +
ggalluvial::geom_stratum() +
ggplot2::geom_text(stat="stratum", size=3)
This was a tough one---digging into the code for ggplot2, the stat argument pastes the string you give and then looks for that object (in this case "StatStratum") in the environment you're in. Because you don't want to load the package, it won't be able to find it (and there's no way to change the argument itself).
Answer
So you need to save that object from the ggalluvial package like so:
StatStratum <- ggalluvial::StatStratum
Then leave the rest of your code as is.
The following worked for me.
ggplot2::geom_text(stat = ggalluvial::StatStratum)
I'm contributing to the qmethod R package, and I just wrote a function that creates a bunch of ggplot2 objects.
The function works fine, but builds and R CMD Check warns me that:
replacing previous import by ‘ggplot2::%+%’ when loading ‘qmethod’
I've looked at SE posts and #hadley's book but can't figure out the problem.
Here's the relevant parts of my NAMESPACE:
import("ggplot2",
"stringr")
import("psych")
importFrom("plyr","count")
importFrom("reshape2","melt")
importFrom("digest", "digest")
importFrom("RColorBrewer", "brewer.pal")
And here's part of my DESCRIPTION:
Imports:
digest,
psych,
knitr,
RColorBrewer,
stringr,
ggplot2,
plyr,
reshape2
The part where I call a ggplot2 function inside my function array.viz.R looks like this (and more):
g <- ggplot(
data = array.viz.data
,aes(
x = fsc # factor scores, always same variable bc dataframe is constructed for every factor array by above loop
,y = ycoord # just the random ycoord for viz
,ymax = max(ycoord)
,ymin = 0
#,label = item.wrapped # this for some reason causes an error
)
)
g <- g + geom_tile( # add background tiles
aes(
fill = item.sd
)
)
Ps.: you can find the entire current work here: https://github.com/maxheld83/qmethod/tree/array-viz
Pps.: I am aware that ggplot2 itself imports a bunch of the functions I also import (such as reshape2), so I have a hunch that that might be a problem.
Turns out, import("psych") is the offending package.
It seems to somehow export again ggplot::%+%, though I can't think of why that would be the case.
Anyway, the fix is:
importFrom("psych", "principal") # that's the function we were using
EDIT: Hadley Wickham points out that I misspoke. R CMD check is throwing NOTES, not Warnings. I'm terribly sorry for the confusion. It was my oversight.
The short version
R CMD check throws this note every time I use sensible plot-creation syntax in ggplot2:
no visible binding for global variable [variable name]
I understand why R CMD check does that, but it seems to be criminalizing an entire vein of otherwise sensible syntax. I'm not sure what steps to take to get my package to pass R CMD check and get admitted to CRAN.
The background
Sascha Epskamp previously posted on essentially the same issue. The difference, I think, is that subset()'s manpage says it's designed for interactive use.
In my case, the issue is not over subset() but over a core feature of ggplot2: the data = argument.
An example of code I write that generates these notes
Here's a sub-function in my package that adds points to a plot:
JitteredResponsesByContrast <- function (data) {
return(
geom_point(
aes(
x = x.values,
y = y.values
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
)
}
R CMD check, on parsing this code, will say
granovagg.contr : JitteredResponsesByContrast: no visible binding for
global variable 'x.values'
granovagg.contr : JitteredResponsesByContrast: no visible binding for
global variable 'y.values'
Why R CMD check is right
The check is technically correct. x.values and y.values
Aren't defined locally in the function JitteredResponsesByContrast()
Aren't pre-defined in the form x.values <- [something] either globally or in the caller.
Instead, they're variables within a dataframe that gets defined earlier and passed into the function JitteredResponsesByContrast().
Why ggplot2 makes it difficult to appease R CMD check
ggplot2 seems to encourage the use of a data argument. The data argument, presumably, is why this code will execute
library(ggplot2)
p <- ggplot(aes(x = hwy, y = cty), data = mpg)
p + geom_point()
but this code will produce an object-not-found error:
library(ggplot2)
hwy # a variable in the mpg dataset
Two work-arounds, and why I'm happy with neither
The NULLing out strategy
Matthew Dowle recommends setting the problematic variables to NULL first, which in my case would look like this:
JitteredResponsesByContrast <- function (data) {
x.values <- y.values <- NULL # Setting the variables to NULL first
return(
geom_point(
aes(
x = x.values,
y = y.values
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
)
}
I appreciate this solution, but I dislike it for three reasons.
it serves no additional purpose beyond appeasing R CMD check.
it doesn't reflect intent. It raises the expectation that the aes() call will see our now-NULL variables (it won't), while obscuring the real purpose (making R CMD check aware of variables it apparently wouldn't otherwise know were bound)
The problems of 1 and 2 multiply because every time you write a function that returns a plot element, you have to add a confusing NULLing statement
The with() strategy
You can use with() to explicitly signal that the variables in question can be found inside some larger environment. In my case, using with() looks like this:
JitteredResponsesByContrast <- function (data) {
with(data, {
geom_point(
aes(
x = x.values,
y = y.values
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
}
)
}
This solution works. But, I don't like this solution because it doesn't even work the way I would expect it to. If with() were really solving the problem of pointing the interpreter to where the variables are, then I shouldn't even need the data = argument. But, with() doesn't work that way:
library(ggplot2)
p <- ggplot()
p <- p + with(mpg, geom_point(aes(x = hwy, y = cty)))
p # will generate an error saying `hwy` is not found
So, again, I think this solution has similar flaws to the NULLing strategy:
I still have to go through every plot element function and wrap the logic in a with() call
The with() call is misleading. I still need to supply a data = argument; all with() is doing is appeasing R CMD check.
Conclusion
The way I see it, there are three options I could take:
Lobby CRAN to ignore the notes by arguing that they're "spurious" (pursuant to CRAN policy), and do that every time I submit a package
Fix my code with one of two undesirable strategies (NULLing or with() blocks)
Hum really loudly and hope the problem goes away
None of the three make me happy, and I'm wondering what people suggest I (and other package developers wanting to tap into ggplot2) should do.
You have two solutions:
Rewrite your code to avoid non-standard evaluation. For ggplot2, this means using aes_string() instead of aes() (as described by Harlan)
Add a call to globalVariables(c("x.values", "y.values")) somewhere in the top-level of your package.
You should strive for 0 NOTES in your package when submitting to CRAN, even if you have to do something slightly hacky. This makes life easier for CRAN, and easier for you.
(Updated 2014-12-31 to reflect my latest thoughts on this)
Have you tried with aes_string instead of aes? This should work, although I haven't tried it:
aes_string(x = 'x.values', y = 'y.values')
This question has been asked and answered a while ago but just for your information, since version 2.1.0 there is another way to get around the notes: aes_(x=~x.values,y=~y.values).
In 2019, the best way to get around this is to use the .data prefix from the rlang package, which also gets exported to ggplot2. This tells R to treat x.values and y.values as columns in a data.frame (so it won't complain about undefined variables).
Note: This works best if you have predefined columns names that you know will exist in you data input
#' #importFrom ggplot2 .data
my_func <- function(data) {
ggplot(data, aes(x = .data$x, y = .data$y))
}
EDIT: Updated to export .data from ggplot2 instead of rlang based off #Noah comment
If
getRversion() >= "3.1.0"
You can add a call at the top level of the package:
utils::suppressForeignCheck(c("x.values", "y.values"))
from:
help("suppressForeignCheck")
Add this line of code to the file in which you provide package-level documentation:
if(getRversion() >= "2.15.1") utils::globalVariables(c("."))
Example here
how about using get()?
geom_point(
aes(
x = get('x.values'),
y = get('y.values')
),
data = data,
position = position_jitter(height = 0, width = GetDegreeOfJitter(jj))
)
Because the manual for ?aes_string says
All these functions are soft-deprecated. Please use tidy evaluation
idioms instead (see the quasiquotation section in aes()
documentation).
So I read that page, and came up with this pattern:
ggplot2::aes(x = !!quote(x.values),
y = !!quote(y.values))
It is about as fugly as an IIFE, and mixes base expressions with tidy-bang-bangs. But does not require the global variables workaround, either, and doesn't use anything that is deprecated afaict. It seems like it also works with calculations in aesthetics and the derived variables like ..count..
I want to create a scatterplot and draw the regression line for a subset of a dataset. To give a reproducible example I'll use the CO2 dataset.
I tried this but the regression line doesn't appear for some reason
with(subset(CO2,Type=="Quebec"),plot(conc,uptake),abline(lm(uptake~conc)))
What is the correct way to give a command like this? Can I do it with a one-liner?
You need to provide both your lines of code as a single R expression. The abline() is being taken as a subsequent argument to with(), which is the ... argument. This is documented a a means to pass arguments on to future methods, but the end result is that it is effectively a black hole for this part of your code.
Two options, i) keep one line but wrap the expression in { and } and separate the two expressions with ;:
with(subset(CO2,Type=="Quebec"), {plot(conc,uptake); abline(lm(uptake~conc))})
Or spread the expression out over two lines, still wrapped in { and }:
with(subset(CO2,Type=="Quebec"),
{plot(conc,uptake)
abline(lm(uptake~conc))})
Edit: To be honest, if you are doing things like this you are missing out on the advantages of doing the subsetting via R's model formulae. I would have done this as follows:
plot(uptake ~ conc, data = CO2, subset = Type == "Quebec")
abline(lm(uptake ~ conc, data = CO2, subset = Type == "Quebec"), col = "red")
The with() is just causing you to obfuscate your code with braces and ;.
From ?with: with ... evaluates expr in a local environment created using data. You're passing abline() via .... You need to do something like this:
with(subset(CO2,Type=="Quebec"),{plot(conc,uptake);abline(lm(uptake~conc))})
Gavin and Joshua offer good solutions to your immediate problem; here's the equivalent plot using ggplot:
library(ggplot2)
qplot(conc, uptake, data = CO2[CO2$Type == "Quebec" , ]) + stat_smooth(method = "lm", se = FALSE)