Writing ggplot functions in R with optional arguments - r

I have a series of ggplot graphs that I'm repeating with a few small variations. I would like to wrap these qplots with their options into a function to avoid a lot of repetition in the code.
My problem is that for some of the graphs I am using the + facet_wrap() option, but for others I am not. I.e. I need the facet wrap to be an optional argument. When it is included the code needs to call the +facet_wrap() with the variable supplied in the facets argument.
So ideally my function would look like this, with facets being an optional argument:
qhist(variable, df, heading, facets)
I have tried googling how to add optional arguments, and the suggestions are either to pass a default value or to use an if statement with the missing() function. I haven't been able to get either to work.
Here is the function that I have written, with the desired functionality of the optional facets argument included too.
qhist <- function(variable, df, heading, facets) {
qplot(variable, data = df, geom = "histogram", binwidth = 2000,
xlab = "Salary", ylab = "Noms") +
theme_bw() +
scale_x_continuous(limits=c(40000,250000),
breaks=c(50000,100000,150000,200000,250000),
labels=c("50k","100k","150k","200k","250k")) +
opts(title = heading, plot.title = theme_text(face = "bold",
size = 14), strip.text.x = theme_text(size = 10, face = 'bold'))
# If facets argument supplied add the following, else do not add this code
+ facet_wrap(~ facets)

The way to set up a default is like this:
testFunction <- function( requiredParam, optionalParam=TRUE, alsoOptional=123 ) {
print(requiredParam)
if (optionalParam==TRUE) print("you kept the default for optionalParam")
paste("for alsoOptional you entered", alsoOptional)
}
*EDIT*
Oh, ok... so I think I have a better idea of what you are asking. It looks like you're not sure how to bring the optional facet into the ggplot object. How about this:
qhist <- function(variable, df, heading, facets = NULL) {
  d <- qplot(variable, data = df, geom = "histogram", binwidth = 2000,
             xlab = "Salary", ylab = "Noms") +
    theme_bw() +
    scale_x_continuous(limits = c(40000, 250000),
                       breaks = c(50000, 100000, 150000, 200000, 250000),
                       labels = c("50k", "100k", "150k", "200k", "250k")) +
    opts(title = heading,
         plot.title = theme_text(face = "bold", size = 14),
         strip.text.x = theme_text(size = 10, face = "bold"))
  # If the facets argument is supplied, add the facet layer; otherwise leave the plot unchanged
  if (!is.null(facets)) d <- d + facet_wrap(as.formula(paste("~", facets)))
  return(d)
}
I have not tested this code at all. But the general idea is that the facet_wrap expects a formula, so if the facets are passed as a character string you can build a formula with as.formula() and then add it to the plot object.
If I were doing it, I would have the function accept an optional facet formula and then pass that facet formula directly into the facet_wrap. That would negate the need for the as.formula() call to convert the text into a formula.
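The formula-based variant described above could look roughly like the sketch below. It is not the original answer's code: the name qhist_formula and the sample data are made up for illustration, the x-axis scale is left out for brevity, and it assumes a current ggplot2 (3.0.0 or later), where theme(), labs() and element_text() have replaced opts() and theme_text(), and where the .data pronoun is available inside aes().

```r
library(ggplot2)

# Hypothetical variant of qhist(): `variable` is a column name (string) and
# `facets` is an optional one-sided formula such as ~ dept (NULL = no facets).
qhist_formula <- function(df, variable, heading, facets = NULL) {
  p <- ggplot(df, aes(x = .data[[variable]])) +
    geom_histogram(binwidth = 2000) +
    labs(title = heading, x = "Salary", y = "Noms") +
    theme_bw() +
    theme(plot.title = element_text(face = "bold", size = 14),
          strip.text.x = element_text(size = 10, face = "bold"))
  # Add the facet layer only when a formula was supplied
  if (!is.null(facets)) {
    p <- p + facet_wrap(facets)
  }
  p
}

# Made-up data, for illustration only
salaries <- data.frame(salary = c(52000, 61000, 75000, 98000, 120000, 143000),
                       dept = rep(c("A", "B"), each = 3))
p_plain <- qhist_formula(salaries, "salary", "All staff")
p_facet <- qhist_formula(salaries, "salary", "By department", facets = ~ dept)
```

Because the caller hands over a formula, no as.formula(paste(...)) step is needed, and column names with spaces can be written with backticks inside the formula.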

Probably the best approach is to stop using unusual variable names that include commas or spaces.
As a workaround, here is an extension of @JD Long's answer. The trick is to rename the facet variable.
f <- function(dat, facet = NULL) {
  if (!missing(facet)) {
    # Rename the facet column to a syntactically safe name
    names(dat)[which(names(dat) == facet)] <- ".facet."
    ff <- facet_wrap(~.facet.)
  } else {
    ff <- list()
  }
  qplot(x, y, data = dat) + ff
}
d <- data.frame(x = 1:10, y = 1:10, "o,o" = gl(2, 5), check.names = FALSE)
f(d, "o,o")
f(d)

Note that you can also use missing(facets) to check whether the facets argument was specified. If you use @JD Long's solution, it would look something like this:
qhist <- function(variable, df, heading, facets) {
  # ... insert @JD Long's solution here ...
  if (!missing(facets)) d <- d + facet_wrap(as.formula(paste("~", facets)))
  return(d)
}
Note that I also changed the argument from facets=NULL to just facets, i.e. it no longer has a default value.
Many R functions use missing arguments like this, but in general I tend to prefer @JD Long's variant of using a default argument value (like NULL or NA) when possible. But sometimes there is no good default value...
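To make the difference between the two checks concrete, here is a small base R sketch. The helper describe_facets is hypothetical and exists only to show which branch each style of call reaches.

```r
# Hypothetical helper: reports how its optional argument was supplied
describe_facets <- function(facets = NULL) {
  if (missing(facets)) {
    "not supplied"            # caller omitted the argument entirely
  } else if (is.null(facets)) {
    "supplied as NULL"        # caller passed NULL explicitly
  } else {
    paste("supplied:", facets)
  }
}

res_missing <- describe_facets()
res_null    <- describe_facets(NULL)
res_value   <- describe_facets("dept")
```

missing() can tell an omitted argument apart from an explicit NULL, whereas an is.null() check treats both the same, which is usually what a plotting helper wants.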

Related

Plotting two stat_function()'s in a grid using ggplot

I want to output two plots in a grid using the same function but with different input for x. I am using ggplot2 with stat_function as per this post and I have combined the two plots as per this post and this post.
f01 <- function(x) {1 - abs(x)}
ggplot() +
stat_function(data = data.frame(x=c(-1, 1)), aes(x = x, color = "red"), fun = f01) +
stat_function(data = data.frame(x=c(-2, 2)), aes(x = x, color = "black"), fun = f01)
With the following outputs:
Plot: (image not included)
Message:
`mapping` is not used by stat_function()
`data` is not used by stat_function()
`mapping` is not used by stat_function()
`data` is not used by stat_function()
I don't understand why stat_function() won't use either of the arguments. I would expect it to plot two graphs, one with x between -1 and 1 and the second with x between -2 and 2. Furthermore, it takes the colors as labels, and I don't understand why. I must be missing something obvious.
The issue is that according to the docs the data argument is
Ignored by stat_function(), do not use.
Hence, at least in the second call to stat_function the data is ignored.
Second, the docs state:
The function is called with a grid of evenly spaced values along the x axis, and the results are drawn (by default) with a line.
Therefore both functions are plotted over the same range of x values.
If you simply want to draw functions, this can be achieved without data and mappings like so:
library(ggplot2)
f01 <- function(x) {1 - abs(x)}
ggplot() +
stat_function(color = "black", fun = f01, xlim = c(-2, 2)) +
stat_function(color = "red", fun = f01, xlim = c(-1, 1))
To be honest, I'm not really sure what happens here inside ggplot. It seems that the functions are always applied to the complete range, here -2 to 2. Also, there is an issue on GitHub regarding a wrong error message for stat_function.
However, you can use the xlim argument for your stat_function to limit the range on which a function is drawn. Also, if you don't specify the colour argument by a variable, but by a manual label, you need to tell which colours should be used for which label with scale_colour_manual (easiest with a named vector). I also adjusted the line width to show the function better:
library(ggplot2)
f01 <- function(x) {1 - abs(x)}
cols <- c("red" = "red", "black" = "black")
ggplot() +
stat_function(data = data.frame(x=c(-1, 1)), aes(x = x, colour = "red"), fun = f01, size = 1.5, xlim = c(-1, 1)) +
stat_function(data = data.frame(x=c(-2, 2)), aes(x = x, colour = "black"), fun = f01) +
scale_colour_manual(values = cols)

Framework for adding lines on existing plots: "points" and "plot" in ggplot

This is probably a pretty naive question. I searched and couldn't find a duplicate, but please let me know if this has already been asked!
I have some functions that, based on a boolean argument, either make a new plot or plot on top of the existing graph. Here's a MWE:
plotThing <- function(boolPoints = FALSE, color = "black") {
  x <- sample(c(1:100), size = 10, replace = TRUE)
  y <- sample(c(1:100), size = 10, replace = TRUE)
  if (boolPoints) {
    # Add to the plot that is already open
    points(y ~ x, col = color, pch = 19)
  } else {
    # Start a new plot
    plot(y ~ x, col = color, pch = 19)
  }
}
This style of function has been useful for me because I import all of these functions into a markdown document, and then it's really easy to plot an arbitrarily large number of points on one plot. For example, in the markdown document:
```{r MWE}
source("MWE.R")
plotThing();
plotThing(boolPoints = TRUE, color = "red")
legend("topright", title = "Things", c("thing1", "thing2"), pch = c(19,19), col = c("black", "red"))
```
In the actual code, I think this style might be a good choice because my plotting functions are pretty large and have lots of arguments that would be ugly to repeat over and over again.
Is it possible/how would I have an equivalent setup with ggplot? Or is this just poor design that you'd recommend changing?
I am not sure whether this will directly answer your question, but I think it might help.
So, in ggplot2 you have the nice feature of being able to save plots without printing them:
p <- ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point()
Now, you could - based on a condition - add further points to the same plot if you have gathered more data and print it or just print the basic plot:
if(addPoints){
print(p + geom_point(data = data.frame(hp=500, mpg=50)))
} else {
print(p)
}
However, if you add additional data to the plot you would just have to make sure that the aesthetic mapping matches.
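One way to carry the plot()/points() pattern over to ggplot2 is to pass the plot object itself instead of a boolean. The sketch below is only an illustration of that idea: plot_thing, its arguments, and the thing1/thing2 labels are hypothetical names chosen to mirror the question's MWE.

```r
library(ggplot2)

# Hypothetical ggplot2 counterpart of plotThing(): pass an existing plot via
# `p` to add points to it, or leave `p` as NULL to start a new plot.
plot_thing <- function(p = NULL, label = "thing1") {
  d <- data.frame(x = sample(1:100, size = 10, replace = TRUE),
                  y = sample(1:100, size = 10, replace = TRUE),
                  thing = label)
  # Each call contributes one layer with its own data
  layer <- geom_point(data = d, aes(x = x, y = y, colour = thing))
  if (is.null(p)) ggplot() + layer else p + layer
}

p <- plot_thing(label = "thing1")
p <- plot_thing(p, label = "thing2")
# The legend comes from the colour scale rather than a separate legend() call
p <- p + scale_colour_manual(name = "Things",
                             values = c(thing1 = "black", thing2 = "red"))
```

Printing p (or calling print(p) inside a markdown chunk) draws both sets of points. Since every layer maps the same aesthetics (x, y, colour), they share one legend.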

R preserve symbol names across indirection

I made a plot like so...
ggplot(my_data, aes(x = ttd, y = aval)) +
theme_bw() +
geom_point(alpha = 0.25)
That gave me a nice plot with ttd and aval as my axes labels. I like how it used the names of the arguments as the default labels.
However, I have a bunch of plots like this, and I wanted to abstract it into my own function. But I can't seem to make the plot from inside the function. Here's what I tried:
bw_plot <- function(data, x_, y_) {
ggplot(data, aes(x = substitute(x_), y = substitute(y_))) +
theme_bw() +
geom_point(alpha = 0.25)
}
bw_plot(my_data, ttd, aval)
But I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : object 'x_' not found
I simply want to pass the symbols down from my bw_plot function into ggplot. How can I get it to see my actual column names?
(I also tried passing the column names in as strings and calling as.name on them, but I get the same result.)
substitute will correctly "quote" your arguments x_ and y_. However, aes will apply a second round of "quoting" internally, which is what gives you the error. You need to unquote the result of your substitute calls as you pass them to aes. This can be done using the !! operator from rlang.
library( ggplot2 )
library( rlang )
bw_plot <- function( .data, x_, y_ )
{
xsym <- ensym(x_)
ysym <- ensym(y_)
ggplot( .data, aes(x = !!xsym, y = !!ysym) ) +
theme_bw() +
geom_point(alpha=0.25)
}
Note that the correct function to use is rlang::ensym, rather than substitute, because you are aiming to capture individual symbols (column names). Also, I suggest not naming your argument data to avoid name collisions with a built-in function.
Here's example usage: bw_plot( mtcars, mpg, wt )
The accepted answer works for ggplot2 version 2.2.1.9000 and later. For versions 2.2.1 and prior, it looks like the non-standard evaluation used by aes makes this impossible. Instead, I have to use aes_ which provides an "escape hatch" allowing me to provide the symbols to the function as expected. Here is the solution:
bw_plot <- function(data, x_, y_) {
ggplot(data, aes_(x = substitute(x_), y = substitute(y_))) +
theme_bw() +
geom_point(alpha = 0.25)
}
bw_plot(my_data, ttd, aval)
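For completeness, newer releases offer a shorter spelling of the same idea as the two versions above: the embrace operator {{ }}. This sketch assumes ggplot2 3.0.0 or later together with rlang 0.4.0 or later; the name bw_plot2 is made up here to avoid clashing with the functions above.

```r
library(ggplot2)

# Assumes ggplot2 >= 3.0.0 and rlang >= 0.4.0, where aes() understands {{ }}.
# {{ x_ }} captures the caller's expression and unquotes it in one step.
bw_plot2 <- function(df, x_, y_) {
  ggplot(df, aes(x = {{ x_ }}, y = {{ y_ }})) +
    theme_bw() +
    geom_point(alpha = 0.25)
}

p <- bw_plot2(mtcars, mpg, wt)
```

As with the ensym() version, the axis labels default to the column names the caller typed (here mpg and wt).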

ggplot2: default fill if aesthetic not supplied

I've got a function that uses ggplot. I'd like to either pass it a variable to use for the fill aesthetic, or use a different colour when I don't pass in anything.
The if/else statement does that for the function below, but my actual function is a bit more complicated and has a few if/else branches, so I'd like to minimise them.
Would it be possible to set the default fill for geom_bar() to "blue", but have that overridden if I pass in a fill aesthetic?
Alternatively, I was thinking of creating a list of arguments to be passed to geom_bar(), based on the arguments passed to my function, and splicing it in using !!!. I haven't really got my head around tidyeval/quasiquotation yet though so I don't quite know how that would work.
I'm open to other solutions too though.
library(ggplot2)
plotter <- function(x = mtcars, fill = NULL) {
p <- ggplot(x, aes(factor(cyl)))
# I don't like this if/else statement
if (!is.null(fill)) {
p <- p + geom_bar(mapping = aes_string(group = fill, fill = fill))
} else {
p <- p + geom_bar(fill = "blue")
}
p
}
plotter()
plotter(fill = "as.factor(gear)")

recreate scale_fill_brewer with scale_fill_manual

I am trying to understand the connection between scale_fill_brewer and scale_fill_manual of package ggplot2.
First, generate a ggplot with filled colors:
library(ggplot2)
p <- ggplot(data = mtcars, aes(x = mpg, y = wt,
group = cyl, fill = factor(cyl))) +
geom_area(position = 'stack')
# apply ready-made palette with scale_fill_brewer from ggplot2
p + scale_fill_brewer(palette = "Blues")
Now, replicate with scale_fill_manual
library(RColorBrewer)
p + scale_fill_manual(values = brewer.pal(3, "Blues"))
where 3 is the number of fill-colors in the data. For convenience, I have used the brewer.pal function of package RColorBrewer.
As far as I understand, the convenience of scale_fill_brewer is that it automatically computes the number of unique levels in the data (3 in this example). Here is my attempt at replicating:
p + scale_fill_manual(values = brewer.pal(length(levels(factor(mtcars$cyl))), "Blues"))
My question is: how does scale_fill_brewer compute the number of levels in the data?
I'm interested in understanding what else scale_fill_brewer might be doing under the hood. Might I run into any difficulty if I replace the more user-friendly scale_fill_brewer with a more contorted implementation of scale_fill_manual like the one above?
Perusing the source code:
scale_fill_brewer
function (..., type = "seq", palette = 1) {
discrete_scale("fill", "brewer", brewer_pal(type, palette), ...)
}
I couldn't see through this how scale_fill_brewer computes the number of unique levels in the data. Perhaps hidden in the ... ?
Edit: Where does the function scale_fill_brewer receive instructions to compute the number of levels in the data? Is it in "seq" or in ... or elsewhere?
The discrete_scale function is intricate and I'm lost. Here are its arguments:
discrete_scale <- function(aesthetics, scale_name, palette, name = NULL,
breaks = waiver(), labels = waiver(), legend = NULL, limits = NULL,
expand = waiver(), na.value = NA, drop = TRUE, guide="legend") {
Does any of this compute the number of levels?
The easiest way to trace it is to think in terms of (1) setting up the plot data structure, and (2) resolving the aesthetics. It uses S3, so the branching is implicit.
The setup call sequence
[scale-brewer.R] scale_fill_brewer(type="seq", palette="Blues")
[scale-.R] discrete_scale(...) - return an object representing the scale
structure(list(
call = match.call(),
aesthetics = aesthetics,
scale_name = scale_name,
palette = palette,
range = DiscreteRange$new(), ## this is scales::DiscreteRange
...), class = c(scale_name, "discrete", "scale"))
The resolve call sequence
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_train_df(...) - iterate again over scales$scales (list)
[scale-.r] scale_train_df(...) - iterate again
[scale-.r] scale_train(...) - S3 generic function
[scale-.r] scale_train.discrete(...) - almost there...
scale$range$train(x, drop = scale$drop)
but scale$range is a DiscreteRange instance, so it calls (scales::DiscreteRange$new())$train, which overwrites scale$range!
range <<- train_discrete(x, range, drop)
scales:::train_discrete(...) - again, almost there...
scales:::discrete_range(...) - still not there..
scales:::clevels(...) - there it is!
At this point, scale$range has been overwritten by the levels of the factor. Unwinding the call stack back to ggplot_build, we now call scales_map_df
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_map_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_map_df(...) - iterate
[scale-.r] scale_map_df(...) - iterate
[scale-.r] scale_map.discrete - fill up the palette (non-position scale!)
scale_map.discrete <- function(scale, x, limits = scale_limits(scale)) {
n <- sum(!is.na(limits))
pal <- scale$palette(n)
...
}
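The last step can be reproduced outside ggplot2. The sketch below assumes the scales and RColorBrewer packages are installed; it imitates what the trace describes (count the non-missing limits, then call the stored palette function with that count) and is not the actual ggplot2 source.

```r
library(scales)
library(RColorBrewer)

# The scale stores a palette *function*; it is only called with n at build time
pal <- brewer_pal(type = "seq", palette = "Blues")

# After training, the limits are the levels seen in the data
limits <- levels(factor(mtcars$cyl))
n <- sum(!is.na(limits))

colours_from_scale <- pal(n)                   # what the brewer scale would use
colours_manual <- brewer.pal(n, "Blues")       # what the manual version passes
same_colours <- all(colours_from_scale == colours_manual)
```

So the number of levels is never passed to scale_fill_brewer itself. It is discovered when the scale is trained on the data, and the palette function is evaluated afterwards.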
