I am trying to understand the connection between scale_fill_brewer and scale_fill_manual of package ggplot2.
First, generate a ggplot with filled colors:
library(ggplot2)
p <- ggplot(data = mtcars, aes(x = mpg, y = wt,
group = cyl, fill = factor(cyl))) +
geom_area(position = 'stack')
# apply ready-made palette with scale_fill_brewer from ggplot2
p + scale_fill_brewer(palette = "Blues")
Now, replicate with scale_fill_manual
library(RColorBrewer)
p + scale_fill_manual(values = brewer.pal(3, "Blues"))
where 3 is the number of fill-colors in the data. For convenience, I have used the brewer.pal function of package RColorBrewer.
As far as I understand, the convenience of scale_fill_brewer is that it automatically computes the number of unique levels in the data (3 in this example). Here is my attempt at replicating:
p + scale_fill_manual(values = brewer.pal(length(levels(factor(mtcars$cyl))), "Blues"))
My question is: how does scale_fill_brewer compute the number of levels in the data?
I'm interested in understanding what else scale_fill_brewer might be doing under the hood. Might I run into any difficulty if I replace the more user-friendly scale_fill_brewer with a more contorted implementation of scale_fill_manual like the one above?
Perusing the source code:
scale_fill_brewer
function (..., type = "seq", palette = 1) {
discrete_scale("fill", "brewer", brewer_pal(type, palette), ...)
}
I couldn't see through this how scale_fill_brewer computes the number of unique levels in the data. Perhaps hidden in the ... ?
Edit: Where does the function scale_fill_brewer receive instructions to compute the number of levels in the data? Is it in "seq" or in ... or elsewhere?
The discrete_scale function is intricate and I'm lost. Here are its arguments:
discrete_scale <- function(aesthetics, scale_name, palette, name = NULL,
breaks = waiver(), labels = waiver(), legend = NULL, limits = NULL,
expand = waiver(), na.value = NA, drop = TRUE, guide="legend") {
Does any of this compute the number of levels?
The easiest way to trace it is to think in terms of (1) setting up the plot data structure, and (2) resolving the aesthetics. It uses S3, so the branching is implicit.
The setup call sequence
[scale-brewer.R] scale_fill_brewer(type="seq", palette="Blues")
[scale-.R] discrete_scale(...) - return an object representing the scale
structure(list(
call = match.call(),
aesthetics = aesthetics,
scale_name = scale_name,
palette = palette,
range = DiscreteRange$new(), ## this is scales::DiscreteRange
...), , class = c(scale_name, "discrete", "scale"))
The resolve call sequence
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_train_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_train_df(...) - iterate again over scales$scales (list)
[scale-.r] scale_train_df(...) - iterate again
[scale-.r] scale_train(...) - S3 generic function
[scale-.r] scale_train.discrete(...) - almost there...
scale$range$train(x, drop = scale$drop)
but scale$range is a DiscreteRange instance, so it calls (scales::DiscreteRange$new())$train, which overwrites scale$range!
range <<- train_discrete(x, range, drop)
scales:::train_discrete(...) - again, almost there...
scales:::discrete_range(...) - still not there..
scales:::clevels(...) - there it is!
As of this point, scale$range has been overwritten by the levels of the factor. Unwinding the call stack to #1, we now call scales_map_df
[plot-build.R] ggplot_build(plot) - for non-position scales, apply scales_map_df
# Train and map non-position scales
npscales <- scales$non_position_scales() ## scales is plot$scales, S4 type Scales
if (npscales$n() > 0) {
lapply(data, scales_train_df, scales = npscales)
data <- lapply(data, scales_map_df, scales = npscales)
}
[scales-.r] scales_map_df(...) - iterate
[scale-.r] scale_map_df(...) - iterate
[scale-.r] scale_map.discrete - fill up the palette (non-position scale!)
scale_map.discrete <- function(scale, x, limits = scale_limits(scale)) {
n <- sum(!is.na(limits))
pal <- scale$palette(n)
...
}
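To connect this back to the question, the last step can be reproduced by hand. The sketch below assumes the scales and RColorBrewer packages are installed. It mimics what scale_map.discrete does (count the non-missing limits, then ask the palette function for that many colours) and is not the actual ggplot2 code path.

```r
library(scales)        # provides brewer_pal(), the palette factory used by scale_fill_brewer
library(RColorBrewer)  # provides brewer.pal(), used in the manual version

# After training, the scale's limits are the factor levels found in the data.
limits <- levels(factor(mtcars$cyl))

# Same counting step as in scale_map.discrete above.
n <- sum(!is.na(limits))

# brewer_pal() returns a function of n; colours are only generated when it is called.
pal_fun <- brewer_pal(type = "seq", palette = "Blues")
colours <- pal_fun(n)

# Compare with the manual brewer.pal() call from the question.
identical(colours, brewer.pal(n, "Blues"))
```

So the number of levels is never passed to scale_fill_brewer itself. The scale only stores a palette function, and n is supplied later, once the data has been seen.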
I want to output two plots in a grid using the same function but with different input for x. I am using ggplot2 with stat_function as per this post and I have combined the two plots as per this post and this post.
f01 <- function(x) {1 - abs(x)}
ggplot() +
stat_function(data = data.frame(x=c(-1, 1)), aes(x = x, color = "red"), fun = f01) +
stat_function(data = data.frame(x=c(-2, 2)), aes(x = x, color = "black"), fun = f01)
With the following outputs:
Plot:
Message:
`mapping` is not used by stat_function()
`data` is not used by stat_function()
`mapping` is not used by stat_function()
`data` is not used by stat_function()
I don't understand why stat_function() won't use either of the arguments. I would expect it to plot two graphs, one with x between -1 and 1 and the second with x between -2 and 2. Furthermore, it takes the colors as labels, and I don't understand why. I must be missing something obvious.
The issue is that according to the docs the data argument is
Ignored by stat_function(), do not use.
Hence, at least in the second call to stat_function the data is ignored.
Second, the docs state:
The function is called with a grid of evenly spaced values along the x axis, and the results are drawn (by default) with a line.
Therefore both functions are plotted over the same range of x values.
If you simply want to draw functions, this can be achieved without data and mappings like so:
library(ggplot2)
f01 <- function(x) {1 - abs(x)}
ggplot() +
stat_function(color = "black", fun = f01, xlim = c(-2, 2)) +
stat_function(color = "red", fun = f01, xlim = c(-1, 1))
To be honest, I'm not really sure what happens here with ggplot and its inner workings. It seems that the functions are always applied to the complete range, here -2 to 2. Also, there is an issue on GitHub regarding a wrong error message for stat_function.
However, you can use the xlim argument for your stat_function to limit the range on which a function is drawn. Also, if you don't specify the colour argument by a variable, but by a manual label, you need to tell which colours should be used for which label with scale_colour_manual (easiest with a named vector). I also adjusted the line width to show the function better:
library(ggplot2)
f01 <- function(x) {1 - abs(x)}
cols <- c("red" = "red", "black" = "black")
ggplot() +
stat_function(data = data.frame(x=c(-1, 1)), aes(x = x, colour = "red"), fun = f01, size = 1.5, xlim = c(-1, 1)) +
stat_function(data = data.frame(x=c(-2, 2)), aes(x = x, colour = "black"), fun = f01) +
scale_colour_manual(values = cols)
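If the goal is for each curve to follow its own data range, one alternative (a sketch, not what stat_function does internally) is to evaluate the function yourself and draw the result with geom_line. The helper name curve_df and the column names below are made up for illustration.

```r
library(ggplot2)

f01 <- function(x) {1 - abs(x)}

# Hypothetical helper: evaluate `fun` on an evenly spaced grid over [from, to].
curve_df <- function(fun, from, to, label, n = 101) {
  x <- seq(from, to, length.out = n)
  data.frame(x = x, y = fun(x), curve = label)
}

curves <- rbind(
  curve_df(f01, -1, 1, "narrow"),
  curve_df(f01, -2, 2, "wide")
)

# `curve` is a real variable here, so mapping it to colour gives a meaningful legend.
p <- ggplot(curves, aes(x = x, y = y, colour = curve)) +
  geom_line() +
  scale_colour_manual(values = c(narrow = "red", wide = "black"))
p
```

Because the data frame carries the x values, each line only spans the range it was computed on, and the legend labels come from the curve column, not from colour names.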
I made a plot like so...
ggplot(my_data, aes(x = ttd, y = aval)) +
theme_bw() +
geom_point(alpha = 0.25)
That gave me a nice plot with ttd and aval as my axes labels. I like how it used the names of the arguments as the default labels.
However, I have a bunch of plots like this, and I wanted to abstract it into my own function. But I can't seem to make the plot from inside the function. Here's what I tried:
bw_plot <- function(data, x_, y_) {
ggplot(data, aes(x = substitute(x_), y = substitute(y_))) +
theme_bw() +
geom_point(alpha = 0.25)
}
bw_plot(my_data, ttd, aval)
But I get this error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : object 'x_' not found
I simply want to pass the symbols down from my bw_plot function into ggplot. How can I get it to see my actual column names?
(I also tried passing the column names in as strings and calling as.name on them, but I get the same result.)
substitute will correctly "quote" your arguments x_ and y_. However, aes will apply a second round of "quoting" internally, which is what gives you the error. You need to unquote the result of your substitute calls as you pass them to aes. This can be done using the !! operator from rlang.
library( ggplot2 )
library( rlang )
bw_plot <- function( .data, x_, y_ )
{
xsym <- ensym(x_)
ysym <- ensym(y_)
ggplot( .data, aes(x = !!xsym, y = !!ysym) ) +
theme_bw() +
geom_point(alpha=0.25)
}
Note that the correct function to use is rlang::ensym, rather than substitute, because you are aiming to capture individual symbols (column names). Also, I suggest not naming your argument data to avoid name collisions with a built-in function.
Here's example usage: bw_plot( mtcars, mpg, wt )
The accepted answer works for ggplot2 version 2.2.1.9000 and later. For versions 2.2.1 and prior, it looks like the non-standard evaluation used by aes makes this impossible. Instead, I have to use aes_ which provides an "escape hatch" allowing me to provide the symbols to the function as expected. Here is the solution:
bw_plot <- function(data, x_, y_) {
ggplot(data, aes_(x = substitute(x_), y = substitute(y_))) +
theme_bw() +
geom_point(alpha = 0.25)
}
bw_plot(my_data, ttd, aval)
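In more recent releases there is also the embrace operator {{ }}, which captures and unquotes an argument in one step. A minimal sketch, assuming ggplot2 3.0.0 or later and rlang 0.4.0 or later, with mtcars standing in for my_data:

```r
library(ggplot2)

bw_plot <- function(df, x_, y_) {
  # {{ x_ }} forwards the column the caller wrote, so aes() sees mpg, not x_.
  ggplot(df, aes(x = {{ x_ }}, y = {{ y_ }})) +
    theme_bw() +
    geom_point(alpha = 0.25)
}

p <- bw_plot(mtcars, mpg, wt)
p
```

The axis labels still default to the column names, as in the original hand-written plot.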
I have a facetted point plot, and the facets are based on multiple factors:
p = p + facet_wrap(~ model + run, ncol = 1, scales = 'fixed')
I'm happy with ggplot2 using the existing unique values of model and run to build the facet labels, but I'd like to have them across one line, not multiple. However,
p = p + facet_wrap(~ model + run, ncol = 1, scales = 'fixed',
labeller = label_value(multi_line = FALSE))
results in a missing argument error, because label_value() expects the argument labels first:
Error in lapply(labels, as.character) :
argument "labels" is missing, with no default
I'm not sure how to supply this given there are multiple facetting variables specified. If I leave the labeller out entirely, ggplot2 seems happy to work it out itself.
You want to pass the function without calling it (it will be called with the labels when constructing the plot), so:
p = p + facet_wrap(~ model + run, ncol = 1, scales = 'fixed',
labeller = function(labs) {label_value(labs, multi_line = FALSE)})
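To see why the function has to be passed uncalled, it can help to call label_value() by hand on the kind of data frame the facet passes in: one column per facetting variable, one row per panel. The sketch below uses made-up label values. It also shows labeller(.multi_line = FALSE), which as far as I know is a built-in way to get the same single-line labels without writing a wrapper function.

```r
library(ggplot2)

# Made-up labels: two panels, facetted by two variables.
labs <- data.frame(model = c("A", "B"), run = c("1", "2"))

# With multi_line = FALSE the variables are collapsed into one label per panel.
out <- label_value(labs, multi_line = FALSE)
out

# Same idea on built-in data, using the labeller() helper instead of a wrapper function.
p <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  facet_wrap(~ cyl + am, ncol = 1, labeller = labeller(.multi_line = FALSE))
p
```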
I'm quite new to ggplot, but I like the systematic way you build your plots. Still, I'm struggling to achieve the desired results. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame for ggplot so that it then draws the legend automatically. In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plot?
Note this is just a toy example. Usually the model values are derived from a more complex model, just in case you want to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
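Putting the question's setup and this answer together gives a self-contained version. One variation in this sketch is my own choice and not part of the answer above: the special value is drawn from a one-row data frame (called special here), so a single point is plotted instead of one per row of df. The shapes are copied from the base-graphics call in the question.

```r
library(ggplot2)
library(splines)

set.seed(10)
x <- seq(-1, 1, 0.01)
s <- interpSpline(x, x^2)                         # model fitted to the noise-free curve
y <- x^2 + rnorm(length(x), mean = 0, sd = 0.1)   # noisy observations

df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
special <- data.frame(x = 0, y = 0)               # one row, so one point

p <- ggplot(df, aes(x, y)) +
  geom_line(aes(y = fit, col = "Model values")) +
  geom_point(aes(col = "True values"), shape = 4) +
  geom_point(data = special, aes(col = "Special value"), shape = 1) +
  scale_color_manual(values = c("True values" = "red",
                                "Special value" = "blue",
                                "Model values" = "black"))
p
```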
I have a series of ggplot graphs that I'm repeating with a few small variations. I would like to wrap these qplots with their options into a function to avoid a lot of repetition in the code.
My problem is that for some of the graphs I am using the + facet_wrap() option, but for others I am not. I.e. I need the facet wrap to be an optional argument. When it is included the code needs to call the +facet_wrap() with the variable supplied in the facets argument.
So ideally my function would look like this, with facets being an optional argument:
qhist(variable, df, heading, facets)
I have tried googling how to add optional arguments, and the results suggest either passing a default value or using an if statement with the missing() function. I haven't been able to get either to work.
Here is the function that I have written, with the desired functionality of the optional facets argument included too.
qhist <- function(variable, df, heading, facets) {
qplot(variable, data = df, geom = "histogram", binwidth = 2000,
xlab = "Salary", ylab = "Noms") +
theme_bw() +
scale_x_continuous(limits=c(40000,250000),
breaks=c(50000,100000,150000,200000,250000),
labels=c("50k","100k","150k","200k","250k")) +
opts(title = heading, plot.title = theme_text(face = "bold",
size = 14), strip.text.x = theme_text(size = 10, face = 'bold'))
# If facets argument supplied add the following, else do not add this code
+ facet_wrap(~ facets)
The way to set up a default is like this:
testFunction <- function( requiredParam, optionalParam=TRUE, alsoOptional=123 ) {
print(requiredParam)
if (optionalParam==TRUE) print("you kept the default for optionalParam")
paste("for alsoOptional you entered", alsoOptional)
}
*EDIT*
Oh, ok... so I think I have a better idea of what you are asking. It looks like you're not sure how to bring the optional facet into the ggplot object. How about this:
qhist <- function(variable, df, heading, facets=NULL) {
d <- qplot(variable, data = df, geom = "histogram", binwidth = 2000,
xlab = "Salary", ylab = "Noms") +
theme_bw() +
scale_x_continuous(limits=c(40000,250000),
breaks=c(50000,100000,150000,200000,250000),
labels=c("50k","100k","150k","200k","250k")) +
opts(title = heading, plot.title = theme_text(face = "bold",
size = 14), strip.text.x = theme_text(size = 10, face = 'bold'))
# If facets argument supplied add the following, else do not add this code
if (is.null(facets)==FALSE) d <- d + facet_wrap(as.formula(paste("~", facets)))
d
return(d)
}
I have not tested this code at all. But the general idea is that the facet_wrap expects a formula, so if the facets are passed as a character string you can build a formula with as.formula() and then add it to the plot object.
If I were doing it, I would have the function accept an optional facet formula and then pass that facet formula directly into the facet_wrap. That would negate the need for the as.formula() call to convert the text into a formula.
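A sketch of that last idea, written against current ggplot2. As far as I know, opts() and theme_text() used above are no longer available and qplot() is deprecated, so this uses ggplot(), geom_histogram() and ggtitle() instead. The function name hist_plot, the bins value, and the use of mtcars are placeholders; the column is passed as a string and looked up with the .data pronoun.

```r
library(ggplot2)

hist_plot <- function(df, variable, heading, facets = NULL) {
  p <- ggplot(df, aes(x = .data[[variable]])) +
    geom_histogram(bins = 30) +
    theme_bw() +
    ggtitle(heading)
  # Only add the facet layer when a formula was supplied.
  if (!is.null(facets)) p <- p + facet_wrap(facets)
  p
}

p_plain <- hist_plot(mtcars, "mpg", "Miles per gallon")
p_facet <- hist_plot(mtcars, "mpg", "Miles per gallon by cylinders", facets = ~ cyl)
p_facet
```

Since the caller passes a formula such as ~ cyl, there is no string to convert, and multi-variable facets like ~ cyl + am work without any extra code.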
Probably, the best way is to stop using such unusual variable names including commas or spaces.
As a workaround, here is an extension of JD Long's answer. The trick is to rename the facet variable.
f <- function(dat, facet = NULL) {
if(!missing(facet)) {
names(dat)[which(names(dat) == facet)] <- ".facet."
ff <- facet_wrap(~.facet.)
} else {
ff <- list()
}
qplot(x, y, data = dat) + ff
}
d <- data.frame(x = 1:10, y = 1:10, "o,o" = gl(2,5), check.names=F)
f(d, "o,o")
f(d)
Note that you can also use missing(facets) to check whether the facets argument was specified. If you use JD Long's solution, it would look something like this:
qhist <- function(variable, df, heading, facets) {
# ... insert JD Long's solution here ...
if (!missing(facets)) d <- d + facet_wrap(as.formula(paste("~", facets)))
return(d)
}
Note that I also changed the default argument from facets=NULL to just facets.
Many R functions use missing arguments like this, but in general I tend to prefer JD Long's variant of using a default argument value (like NULL or NA) when possible. But sometimes there is no good default value...
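The difference between the two styles can be seen without any plotting. A small sketch in base R; the function names with_missing and with_null are made up for illustration:

```r
# Variant 1: no default, detect absence with missing().
with_missing <- function(facets) {
  if (missing(facets)) "no facets" else paste("facet by", facets)
}

# Variant 2: NULL default, detect absence with is.null().
with_null <- function(facets = NULL) {
  if (is.null(facets)) "no facets" else paste("facet by", facets)
}

with_missing()          # "no facets"
with_null()             # "no facets"
with_missing("region")  # "facet by region"

# Only the NULL variant lets a caller say "no facets" explicitly,
# e.g. when the value comes from another variable.
choice <- NULL
with_null(choice)       # "no facets"
```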