R preserve symbol names across indirection - r

I made a plot like so...
ggplot(my_data, aes(x = ttd, y = aval)) +
theme_bw() +
geom_point(alpha = 0.25)
That gave me a nice plot with ttd and aval as my axis labels. I like how it used the names of the arguments as the default labels.
However, I have a bunch of plots like this, and I wanted to abstract it into my own function. But I can't seem to make the plot from inside the function. Here's what I tried:
bw_plot <- function(data, x_, y_) {
ggplot(data, aes(x = substitute(x_), y = substitute(y_))) +
theme_bw() +
geom_point(alpha = 0.25)
}
bw_plot(my_data, ttd, aval)
But I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : object 'x_' not found
I simply want to pass the symbols down from my bw_plot function into ggplot. How can I get it to see my actual column names?
(I also tried passing the column names in as strings and calling as.name on them, but I get the same result.)

substitute will correctly "quote" your arguments x_ and y_. However, aes will apply a second round of "quoting" internally, which is what gives you the error. You need to unquote the result of your substitute calls as you pass them to aes. This can be done using the !! operator from rlang.
library( ggplot2 )
library( rlang )
bw_plot <- function( .data, x_, y_ )
{
xsym <- ensym(x_)
ysym <- ensym(y_)
ggplot( .data, aes(x = !!xsym, y = !!ysym) ) +
theme_bw() +
geom_point(alpha=0.25)
}
Note that the correct function to use is rlang::ensym, rather than substitute, because you are aiming to capture individual symbols (column names). Also, I suggest not naming your argument data to avoid name collisions with a built-in function.
Here's example usage: bw_plot( mtcars, mpg, wt )
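To see why the original attempt failed, the double quoting can be reproduced in base R without ggplot2. This is only a minimal sketch: capture(), naive_wrapper() and unquoting_wrapper() are hypothetical names, and capture() merely mimics the quoting behaviour of aes(), not its real implementation.

```r
# capture() stands in for aes(): it quotes whatever expression it is handed.
capture <- function(arg) substitute(arg)

# Naive wrapper: capture() quotes the literal code `substitute(x_)`,
# so the caller's column name never reaches it.
naive_wrapper <- function(x_) {
  capture(substitute(x_))
}

# Unquoting wrapper: build the call with the symbol already filled in,
# then evaluate it. This is roughly what `!!` does for aes().
unquoting_wrapper <- function(x_) {
  eval(bquote(capture(.(substitute(x_)))))
}

naive_result <- naive_wrapper(ttd)         # the call substitute(x_)
unquoted_result <- unquoting_wrapper(ttd)  # the symbol ttd
```

In the naive version the inner function only ever sees x_, which matches the "object 'x_' not found" error from the question.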

The accepted answer works for ggplot2 version 2.2.1.9000 and later. For versions 2.2.1 and prior, it looks like the non-standard evaluation used by aes makes this impossible. Instead, I have to use aes_ which provides an "escape hatch" allowing me to provide the symbols to the function as expected. Here is the solution:
bw_plot <- function(data, x_, y_) {
ggplot(data, aes_(x = substitute(x_), y = substitute(y_))) +
theme_bw() +
geom_point(alpha = 0.25)
}
bw_plot(my_data, ttd, aval)
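If the column names arrive as strings (as mentioned in the question), one option in ggplot2 3.0.0 and later is the .data pronoun. This is a sketch under that version assumption; bw_plot_chr, x_col and y_col are hypothetical names, not part of the original answers.

```r
library(ggplot2)

# Hypothetical variant of bw_plot() that takes column names as strings.
bw_plot_chr <- function(df, x_col, y_col) {
  ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    theme_bw() +
    geom_point(alpha = 0.25) +
    # The default label may not be the bare column name in this form,
    # so set the axis labels explicitly.
    labs(x = x_col, y = y_col)
}

p_chr <- bw_plot_chr(mtcars, "mpg", "wt")
```

The explicit labs() call is there because the automatic labelling that the question relies on is derived from the expression inside aes(), which here is no longer a bare symbol.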

Related

ggplot2: default fill if aesthetic not supplied

I've got a function that uses ggplot. I'd like to either pass it a variable to use for the fill aesthetic, or use a different colour when I don't pass in anything.
The if/else statement does that for the function below, but my actual function is a bit more complicated and has a few if/else branches, so I'd like to minimise them.
Would it be possible to set the default fill for geom_bar() to "blue", but have that overridden if I pass in a fill aesthetic?
Alternatively, I was thinking of creating a list of arguments to be passed to geom_bar(), based on the arguments passed to my function, and splicing it in using !!!. I haven't really got my head around tidyeval/quasiquotation yet though so I don't quite know how that would work.
I'm open to other solutions too though.
library(ggplot2)
plotter <- function(x = mtcars, fill = NULL) {
p <- ggplot(x, aes(factor(cyl)))
# I don't like this if/else statement
if (!is.null(fill)) {
p <- p + geom_bar(mapping = aes_string(group = fill, fill = fill))
} else {
p <- p + geom_bar(fill = "blue")
}
p
}
plotter()
plotter(fill = "as.factor(gear)")
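One way to follow the "list of arguments" idea from the question is to build the arguments for geom_bar() first and apply them with do.call(). This is only a sketch: plotter2 and bar_args are hypothetical names, it assumes ggplot2 3.0.0 or later (for !! inside aes()), and it does not remove the branch, it just confines it to building the argument list.

```r
library(ggplot2)

plotter2 <- function(x = mtcars, fill = NULL) {
  # Decide the geom_bar() arguments in one place.
  bar_args <- if (is.null(fill)) {
    list(fill = "blue")                      # constant colour, no mapping
  } else {
    fill_expr <- rlang::parse_expr(fill)     # e.g. "as.factor(gear)"
    list(mapping = aes(group = !!fill_expr, fill = !!fill_expr))
  }
  ggplot(x, aes(factor(cyl))) + do.call(geom_bar, bar_args)
}

p_default <- plotter2()
p_mapped <- plotter2(fill = "as.factor(gear)")
```

With several optional aesthetics, each one can add its entry to the same list, so the plotting code itself stays free of if/else.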

subset=.() cannot be called in ggplot() directly

It is pretty clear what the following line wants to do:
ggplot(data=mtcars, aes(x=mpg, y=cyl), subset=.(gear=="5")) +
geom_point(aes(colour=gear))
But it doesn't work (subset is just ignored). What does indeed work is:
ggplot(data=mtcars, aes(x=mpg, y=cyl)) +
geom_point(aes(colour=gear), subset=.(gear=="5"))
or also:
ggplot(data=subset(mtcars, gear=="5"), aes(x=mpg, y=cyl)) +
geom_point(aes(colour=gear))
So it seems subset can be called only from geometry calls, and not from ggplot() directly.
Is this a bug or is this the correct behaviour? ggplot doesn't return any kind of warning or error.
I don't think this is a bug. It looks like it is intended if you see the source code of the two functions: ggplot and geom_point:
For ggplot:
> getAnywhere(ggplot.data.frame)
A single object matching ‘ggplot.data.frame’ was found
It was found in the following places
registered S3 method for ggplot from namespace ggplot2
namespace:ggplot2
with value
function (data, mapping = aes(), ..., environment = globalenv())
{
if (!missing(mapping) && !inherits(mapping, "uneval"))
stop("Mapping should be created with aes or aes_string")
p <- structure(list(data = data, layers = list(), scales = Scales$new(),
mapping = mapping, theme = list(), coordinates = coord_cartesian(),
facet = facet_null(), plot_env = environment), class = c("gg",
"ggplot"))
p$labels <- make_labels(mapping)
set_last_plot(p)
p
}
<environment: namespace:ggplot2>
And geom_point:
> geom_point
function (mapping = NULL, data = NULL, stat = "identity", position = "identity",
na.rm = FALSE, ...)
{
GeomPoint$new(mapping = mapping, data = data, stat = stat,
position = position, na.rm = na.rm, ...)
}
<environment: namespace:ggplot2>
If you look at the ellipsis argument ... you will see that it is not used in the ggplot function. So, your use of the argument subset=.() is not transferred or used anywhere. It does not give any errors or warnings however because of the existence of the ellipsis in the ggplot function.
On the other hand the geom_point function uses the ellipsis and transfers it to GeomPoint$new where it is used. In this case your subset=.() argument is transferred to GeomPoint$new where it is used, producing the result you want.
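The same effect can be reproduced in base R. In this sketch, ignores_dots(), forwards_dots() and count_rows() are hypothetical stand-ins for ggplot(), geom_point() and GeomPoint$new(); they only illustrate how ... behaves.

```r
# Stand-in for GeomPoint$new(): actually uses the `keep` argument.
count_rows <- function(data, keep = TRUE) {
  nrow(data[keep, , drop = FALSE])
}

# Stand-in for ggplot(): accepts ... but never looks at it.
ignores_dots <- function(data, ...) {
  count_rows(data)
}

# Stand-in for geom_point(): forwards ... to the function that uses it.
forwards_dots <- function(data, ...) {
  count_rows(data, ...)
}

n_ignored <- ignores_dots(mtcars, keep = mtcars$gear == 5)    # filter silently dropped
n_forwarded <- forwards_dots(mtcars, keep = mtcars$gear == 5) # filter applied
```

Neither call raises an error or warning, which is why the misplaced subset=.() goes unnoticed.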

Alternatives to combination to paste and parse functions using ggplot2

I'm trying to construct a general function to generate some graphics. So I have to create an axis label that combines an expression and a string. I've tried:
i <- 2
measure <- c(expression((kg/m^2)), "(%)")
variable <- c("BMI", "Cintilografia 1h")
data <- data.frame(x = 0, y = 0)
gp <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(y = parse(text = paste(variable[i], measure[i])))
It works for i = 1, but not for i = 2.
I discovered that the parse function has problems with special characters. I would prefer that users of the function not have to worry about this, so I'm looking for a more general solution.
Does anyone have any idea?
Thanks in advance
Trying to paste and parse isn't a great idea when working with labels where you want to use ?plotmath markup. It's better to work with expressions. The bquote() function makes it somewhat easier to pop values into expressions. This should work for you
i<-1
g1<-ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(y = bquote(.(variable[i])~.(measure[[i]])))
i<-2
g2<-ggplot(data, aes(x = x, y = y)) +
geom_point() +
labs(y = bquote(.(variable[i])~.(measure[[i]])))
gridExtra::grid.arrange(g1,g2, ncol=2)
Also note that when using bquote() on the measure object, since it's an expression, you want to extract the sub-expressions with [[ ]] rather than [ ] because the latter always wraps the result in an expression() and plotmath doesn't like nested expressions -- you want proper call objects.
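The difference between [ ] and [[ ]] can be checked without drawing anything. A small sketch using the objects from the question; make_label() is a hypothetical helper that just wraps the bquote() call from above.

```r
measure <- c(expression((kg/m^2)), "(%)")
variable <- c("BMI", "Cintilografia 1h")

# Hypothetical helper wrapping the bquote() call from the answer.
make_label <- function(i) {
  bquote(.(variable[i]) ~ .(measure[[i]]))
}

wrapped <- measure[1]     # still an expression() vector of length 1
bare <- measure[[1]]      # the call (kg/m^2) itself

label_1 <- make_label(1)  # a call combining a string and a plotmath call
label_2 <- make_label(2)  # a call combining two plain strings
```

Because "(%)" is spliced in as a character string and never parsed, the special characters that broke parse() are no longer an issue.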

recreate scale_fill_brewer with scale_fill_manual

I am trying to understand the connection between scale_fill_brewer and scale_fill_manual of package ggplot2.
First, generate a ggplot with filled colors:
library(ggplot2)
p <- ggplot(data = mtcars, aes(x = mpg, y = wt,
group = cyl, fill = factor(cyl))) +
geom_area(position = 'stack')
# apply ready-made palette with scale_fill_brewer from ggplot2
p + scale_fill_brewer(palette = "Blues")
Now, replicate with scale_fill_manual
library(RColorBrewer)
p + scale_fill_manual(values = brewer.pal(3, "Blues"))
where 3 is the number of fill-colors in the data. For convenience, I have used the brewer.pal function of package RColorBrewer.
As far as I understand, the convenience of scale_fill_brewer is that it automatically computes the number of unique levels in the data (3 in this example). Here is my attempt at replicating:
p + scale_fill_manual(values = brewer.pal(length(levels(factor(mtcars$cyl))), "Blues"))
My question is: how does scale_fill_brewer compute the number of levels in the data?
I'm interested in understanding what else scale_fill_brewer might be doing under the hood. Might I run into any difficulty if I replace the more user-friendly scale_fill_brewer with a more contorted implementation of scale_fill_manual like the one above?
Perusing the source code:
scale_fill_brewer
function (..., type = "seq", palette = 1) {
discrete_scale("fill", "brewer", brewer_pal(type, palette), ...)
}
I couldn't see through this how scale_fill_brewer computes the number of unique levels in the data. Perhaps hidden in the ... ?
Edit: Where does the function scale_fill_brewer receive instructions to compute the number of levels in the data? Is it in "seq" or in ... or elsewhere?
The discrete_scale function is intricate and I'm lost. Here are its arguments:
discrete_scale <- function(aesthetics, scale_name, palette, name = NULL,
breaks = waiver(), labels = waiver(), legend = NULL, limits = NULL,
expand = waiver(), na.value = NA, drop = TRUE, guide="legend") {
Does any of this compute the number of levels?
The easiest way to trace it is to think in terms of (1) setting up the plot data structure, and (2) resolving the aesthetics. It uses S3, so the branching is implicit.
The setup call sequence
[scale-brewer.R] scale_fill_brewer(type="seq", palette="Blues")
[scale-.R] discrete_scale(...) - return an object representing the scale
structure(list(
call = match.call(),
aesthetics = aesthetics,
scale_name = scale_name,
palette = palette,
range = DiscreteRange$new(), ## this is scales::DiscreteRange
...), class = c(scale_name, "discrete", "scale"))
The resolve call sequence
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_train_df(...) - iterate again over scales$scales (list)
[scale-.r] scale_train_df(...) - iterate again
[scale-.r] scale_train(...) - S3 generic function
[scale-.r] scale_train.discrete(...) - almost there...
scale$range$train(x, drop = scale$drop)
but scale$range is a DiscreteRange instance, so it calls (scales::DiscreteRange$new())$train, which overwrites scale$range!
range <<- train_discrete(x, range, drop)
scales:::train_discrete(...) - again, almost there...
scales:::discrete_range(...) - still not there..
scales:::clevels(...) - there it is!
As of this point, scale$range has been overwritten by the levels of the factor. Unwinding the call stack to #1, we now call scales_map_df
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_map_df(...) - iterate
[scale-.r] scale_map_df(...) - iterate
[scale-.r] scale_map.discrete - fill up the palette (non-position scale!)
scale_map.discrete <- function(scale, x, limits = scale_limits(scale)) {
n <- sum(!is.na(limits))
pal <- scale$palette(n)
...
}
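The key point of the trace is that the palette stored in the scale is a function of n, and n is only known after training. A minimal sketch of those two steps done by hand, assuming the scales and RColorBrewer packages are installed; trained_levels and fill_values are hypothetical names, and this imitates the idea rather than the actual ggplot2 internals.

```r
library(scales)

# Setup: the scale only stores a palette *function*, no colours yet.
pal <- brewer_pal(type = "seq", palette = "Blues")

# Train: record the distinct levels seen in the data.
trained_levels <- levels(factor(mtcars$cyl))

# Map: count the non-missing limits and ask the palette for that many colours.
n <- sum(!is.na(trained_levels))
fill_values <- pal(n)
```

For this data the result should match brewer.pal(3, "Blues"), which is why the scale_fill_manual version in the question gives the same plot.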

Writing ggplot functions in R with optional arguments

I have a series of ggplot graphs that I'm repeating with a few small variations. I would like to wrap these qplots with their options into a function to avoid a lot of repetition in the code.
My problem is that for some of the graphs I am using the + facet_wrap() option, but for others I am not. I.e. I need the facet wrap to be an optional argument. When it is included the code needs to call the +facet_wrap() with the variable supplied in the facets argument.
So ideally my function would look like this, with facets being an optional argument:
qhist(variable, df, heading, facets)
I have tried googling how to add optional arguments, and the suggestions are either to pass a default value or to use an if statement with the missing() function. I haven't been able to get either to work.
Here is the function that I have written, with the desired functionality of the optional facets argument included too.
qhist <- function(variable, df, heading, facets) {
qplot(variable, data = df, geom = "histogram", binwidth = 2000,
xlab = "Salary", ylab = "Noms") +
theme_bw() +
scale_x_continuous(limits=c(40000,250000),
breaks=c(50000,100000,150000,200000,250000),
labels=c("50k","100k","150k","200k","250k")) +
opts(title = heading, plot.title = theme_text(face = "bold",
size = 14), strip.text.x = theme_text(size = 10, face = 'bold'))
# If facets argument supplied add the following, else do not add this code
+ facet_wrap(~ facets)
The way to set up a default is like this:
testFunction <- function( requiredParam, optionalParam=TRUE, alsoOptional=123 ) {
print(requiredParam)
if (optionalParam==TRUE) print("you kept the default for optionalParam")
paste("for alsoOptional you entered", alsoOptional)
}
*EDIT*
Oh, ok... so I think I have a better idea of what you are asking. It looks like you're not sure how to bring the optional facet into the ggplot object. How about this:
qhist <- function(variable, df, heading, facets=NULL) {
d <- qplot(variable, data = df, geom = "histogram", binwidth = 2000,
xlab = "Salary", ylab = "Noms") +
theme_bw() +
scale_x_continuous(limits=c(40000,250000),
breaks=c(50000,100000,150000,200000,250000),
labels=c("50k","100k","150k","200k","250k")) +
opts(title = heading, plot.title = theme_text(face = "bold",
size = 14), strip.text.x = theme_text(size = 10, face = 'bold'))
# If facets argument supplied add the following, else do not add this code
if (!is.null(facets)) d <- d + facet_wrap(as.formula(paste("~", facets)))
return(d)
}
I have not tested this code at all. But the general idea is that the facet_wrap expects a formula, so if the facets are passed as a character string you can build a formula with as.formula() and then add it to the plot object.
If I were doing it, I would have the function accept an optional facet formula and then pass that facet formula directly into the facet_wrap. That would negate the need for the as.formula() call to convert the text into a formula.
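A sketch of that formula-based variant with current ggplot2 functions follows. It assumes ggplot2 3.0.0 or later, and hist_plot, column and the bins value are hypothetical choices rather than the original qhist().

```r
library(ggplot2)

# Hypothetical helper: `facets` is an optional one-sided formula such as ~ cyl.
hist_plot <- function(df, column, facets = NULL) {
  p <- ggplot(df, aes(x = .data[[column]])) +
    geom_histogram(bins = 10) +
    theme_bw()
  # Only add the facet when a formula was supplied.
  if (!is.null(facets)) {
    p <- p + facet_wrap(facets)
  }
  p
}

p_plain <- hist_plot(mtcars, "mpg")
p_facet <- hist_plot(mtcars, "mpg", facets = ~ cyl)
```

Passing the formula straight through means no string pasting is needed, so column names are handled by the usual formula rules.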
Probably, the best way is to stop using such unusual variable names including commas or spaces.
As a workaround, here is an extension of @JD Long's answer. The trick is to rename the facet variable.
f <- function(dat, facet = NULL) {
if(!missing(facet)) {
names(dat)[which(names(dat) == facet)] <- ".facet."
ff <- facet_wrap(~.facet.)
} else {
ff <- list()
}
qplot(x, y, data = dat) + ff
}
d <- data.frame(x = 1:10, y = 1:10, "o,o" = gl(2,5), check.names = FALSE)
f(d, "o,o")
f(d)
Note that you can also use missing(facets) to check if the facets argument was specified or not. If you use @JD Long's solution, it would look something like this:
qhist <- function(variable, df, heading, facets) {
... insert @JD Long's solution ...
if (!missing(facets)) d <- d + facet_wrap(as.formula(paste("~", facets)))
return(d)
}
...Note that I also changed the default argument from facets=NULL to just facets.
Many R functions use missing arguments like this, but in general I tend to prefer @JD Long's variant of using a default argument value (like NULL or NA) when possible. But sometimes there is no good default value...
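The two styles can be compared side by side in base R. A small sketch; describe_missing() and describe_null() are hypothetical names used only for illustration.

```r
# Style 1: no default, test with missing().
describe_missing <- function(facets) {
  if (missing(facets)) "no facets" else paste("facet by", facets)
}

# Style 2: NULL default, test with is.null().
describe_null <- function(facets = NULL) {
  if (is.null(facets)) "no facets" else paste("facet by", facets)
}

omitted_missing <- describe_missing()      # argument left out
omitted_null <- describe_null()            # argument left out
explicit_null <- describe_null(NULL)       # caller explicitly asks for no facets
explicit_missing <- describe_missing(NULL) # counts as supplied under missing()
```

The practical difference is the explicit NULL case: with a NULL default the caller can say "no facets" programmatically, whereas missing() treats any supplied value, including NULL, as present.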

Resources