Consider this code:
require(ggplot2)
ggplot(data = mtcars) +
  geom_point(aes(x = drat, y = wt)) +
  geom_hline(yintercept = 3) +
  facet_grid(~ cyl) ## works
ggplot(data = mtcars) +
  geom_point(aes(x = drat, y = wt)) +
  geom_hline(yintercept = 3) +
  facet_grid(~ factor(cyl)) ## does not work
# Error in factor(cyl) : object 'cyl' not found
# removing geom_hline: works again.
Google helped me find a workaround, namely wrapping yintercept in aes():
ggplot(data = mtcars) +
  geom_point(aes(x = drat, y = wt)) +
  geom_hline(aes(yintercept = 3)) +
  facet_grid(~ factor(cyl)) # works
# R version 3.4.3 (2017-11-30)
# ggplot2_2.2.1
Hadley writes here that when you facet with a function, the variables it uses need to be present in every layer (which sounds mysterious to me).
Why does this happen when factorising the facet variable?
So here's my best guess and explanation.
When Hadley says:
This is a known limitation of facetting with a function - the variables you use have to be present on every layer.
He means that in ggplot2, when you use a function inside the facetting specification, the variables it refers to must be available in every geom. The error occurs because the cyl variable is not present in the geom_hline layer.
It's important to remember that this is a limitation, not ideal behaviour. It is a consequence of how their code works: when a function is used to facet, the variables must be present in every geom.
Without looking into the specifics of the ggplot2 functions, my guess is that wrapping yintercept in aes() gives geom_hline an aesthetic mapping. The aes function maps variables to components of the plot, rather than setting static values, and that is an important distinction. Even though we still set yintercept = 3, placing it in the aesthetic mapping must somehow make cyl visible to this layer. A likely mechanism is that, when yintercept is passed as a plain argument, geom_hline builds its own small data set for the layer, which has no cyl column, whereas with aes() the layer falls back on the plot's data (mtcars), which does. That is, aes() connects geom_hline indirectly with cyl, so the variable is now present in the layer and the limitation no longer applies.
This may not be an entirely satisfying answer, but without reading over the ggplot2 code to try and work out specifically why this limitation occurs, this might be as good as you'll get for now. Hopefully one of these workarounds is sufficient for you :)
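Another workaround that sidesteps the limitation altogether is to compute the factor as a column of the data before plotting, so that the facet specification refers to a plain variable rather than a function call. A minimal sketch, where mtcars2 and cyl_f are names made up for this illustration:

```r
library(ggplot2)

# Assumption: mtcars2 and cyl_f are illustrative names, not from the question.
mtcars2 <- transform(mtcars, cyl_f = factor(cyl))

p <- ggplot(data = mtcars2) +
  geom_point(aes(x = drat, y = wt)) +
  geom_hline(yintercept = 3) +   # plain argument, no aes() needed
  facet_grid(~ cyl_f)            # facet on a column, not on a function call
p
```

This mirrors the first example in the question, which facets on a bare column and works even though the geom_hline layer carries no such column.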
I am a fairly experienced ggplot2 user and teach it to university students. However, I only just came across an example that uses the following syntax:
ggplot(mtcars) + aes(cyl) + geom_histogram()
This fits a lot better into the logic of adding up layers than specifying aes inside ggplot() or the geom_ ... but it does not seem to be documented anywhere in the ggplot2 help. Therefore, I am wondering whether there are any reasons why this syntax is limited / should not be used? (Obviously, I see that it needs to be specified in the geom if it is meant to differ between geoms ...)
This is verging on an opinion-based question, but I think it is on-topic, since it helps to clarify the syntax and structure of ggplot calls.
In a sense you have already answered the question yourself:
it does not seem to be documented anywhere in the ggplot2 help
This, and the near absence of examples in online tutorials, blogs and SO answers is a good enough reason not to use aes this way (or at least not to teach people to use it this way). It could lead to confusion and frustration on the part of new users.
This fits a lot better into the logic of adding up layers
This is sort of true, but could be a bit misleading. What it actually does is specify the default aesthetic mapping that subsequent layers will inherit from the ggplot object itself. It should be considered a core part of the base plot, along with the default data object, and therefore "belongs" in the initial ggplot call, rather than being something that is added or layered on to the plot. If you create a default ggplot object without data and mapping, the slots are still there, but contain waivers rather than being NULL:
p <- ggplot()
p$mapping
#> Aesthetic mapping:
#> <empty>
p$data
#> list()
#> attr(,"class")
#> [1] "waiver"
Note that unlike the scales and co-ordinate objects, for which you might argue that the same is also true, there can be no defaults for data and aesthetic mappings.
Does this mean you should never use this syntax? No, but it should be considered an advanced trick for folks who are well versed in ggplot. The most frequent use case I find for it is in changing the mapping of ggplots that are created in extension packages, such as ggsurvplot or ggraph, where the plotting functions use wrappers around ggplot. It can also be used to quickly create multiple plots with the same themes and colour scales:
p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point(aes(color = Species)) +
theme_light()
library(patchwork)
p + (p + aes(Petal.Width, Petal.Length))
So the bottom line is that you can use this if you want, but it is best to avoid teaching it to beginners.
TL;DR
I cannot see any strong reasons why not to use this pattern, but other patterns are recommended in the documentation, without elaboration.
What does + aes() do?
A ggplot has two types of aesthetics:
the default one (typically supplied inside ggplot()), and
geom_*() specific aesthetics
If inherit.aes = TRUE is set inside the geoms (which is the default), then these two types of aesthetics are combined in the final plot. If the default aesthetic is not set, then the geom_* specific aesthetics must be set.
Using ggplot(df) + aes(x, y) changes the default aesthetic.
This is documented in ?"+.gg":
An aes() object replaces the default aesthetics.
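A small sketch of this behaviour, using mtcars as stand-in data (the object names p0 and p1 are only for illustration): the plot starts with an empty default mapping, and adding aes() fills it in, so a geom added afterwards can inherit x and y without any mapping of its own.

```r
library(ggplot2)

p0 <- ggplot(mtcars)        # data only, no default mapping yet
p1 <- p0 + aes(hp, mpg)     # sets the default mapping to x = hp, y = mpg

length(p0$mapping)          # number of default aesthetics before
names(p1$mapping)           # names of the default aesthetics after

# geom_point() has no mapping of its own and inherits the default one
p1 + geom_point()
```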
Are there any reasons not to use it?
I cannot see any strong reasons not to. However, in the documentation of ?ggplot it is stated that:
There are three common ways to invoke ggplot():
ggplot(df, aes(x, y, other aesthetics))
ggplot(df)
ggplot()
The first method is recommended if all layers use the same data and the same set of aesthetics.
As far as I can see, the typical use case for + aes() is when all layers use the same aesthetics. So the documentation recommends the usual pattern ggplot(df, aes(x, y, other aesthetics)), but I cannot find an elaboration of why.
Further: even though the plots look identical, the objects returned by ggplot(df, aes()) and ggplot(df) + aes() are not identical, so there might be some edge cases where one pattern would lead to errors or a different plot.
You can see the many small differences with this code:
library(ggplot2)
a <- ggplot(mtcars, aes(hp, mpg)) + geom_point()
b <- ggplot(mtcars) + aes(hp, mpg) + geom_point()
waldo::compare(a, b, x_arg = "a", y_arg = "b")
So I'm teaching myself R right now using this online resource: "https://r4ds.had.co.nz/data-visualisation.html#facets"
This particular section is going over the use of facet_wrap and facet_grid. It's clear to me that facet_grid is primarily used when wanting to visualize a plot along two additional dimensions, rather than just one. What I don't understand is why you can use facet_grid(.~variable) or facet_grid(variable~.) to basically achieve the same result as facet_wrap. Putting a "." in place of a variable results in just not faceting along the row or column dimension, or in other words showing 1 additional variable just as facet_wrap would do.
If anyone can shed some light on this, thank you!
If you use facet_grid, the facets will always be in one row/column. They will never wrap to make a rectangle. But really if you just have one variable with few levels, it doesn't much matter.
You can also see that facet_grid(.~variable) and facet_grid(variable~.) will put the facet labels in different places (row headings vs column headings)
mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()
mg + facet_grid(vs~ .) + labs(title="facet_grid(vs~ .)")
mg + facet_grid(.~ vs) + labs(title="facet_grid(.~ vs)")
So in the most simple of cases, there's nothing that different between them. The main reason to use facet_grid is to have a single, common axis for all facets so you can easily scan across all panels to make a direct comparison of data.
Actually, the same result is not produced all the time...
The number of facets which appear across the graphics pane is fixed with facet_grid (always the number of unique values in the variable), whereas facet_wrap, as its name suggests, wraps the facets around the graphics pane. As a result, the two functions only produce the same graph when the number of facets is small.
Both facet_grid and facet_wrap accept a formula (for facet_grid, of the form rows ~ columns), and nowadays we don't need to use the dot with facet_grid.
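For instance, assuming ggplot2 3.0.0 or later (where the rows and cols arguments and vars() are available), the following should be equivalent ways of writing the single-variable grids without a dot placeholder. The names mg, by_col, by_col2 and by_row are only for illustration:

```r
library(ggplot2)

mg <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()

# one column per level of vs (same layout as facet_grid(. ~ vs))
by_col  <- mg + facet_grid(~ vs)
by_col2 <- mg + facet_grid(cols = vars(vs))

# one row per level of vs (same layout as facet_grid(vs ~ .))
by_row <- mg + facet_grid(rows = vars(vs))

by_col
by_row
```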
To compare them, let's add a new variable with 8 unique values to the mtcars data set:
library(tidyverse)
mtcars$example <- rep(1:8, length.out = 32)
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_grid(~example, labeller = label_both)
Which results in a cluttered plot:
Compared to:
ggplot()+
geom_point(data = mtcars, aes(x = mpg, y = wt))+
facet_wrap(~example, labeller = label_both)
Which results in a plot where the eight panels wrap onto several rows.
Consider the following lines.
p <- ggplot(mpg, aes(x=factor(cyl), y=..count..))
p + geom_histogram()
p + stat_summary(fun.y=identity, geom='bar')
In theory, the last two should produce the same plot. In practice, stat_summary fails and complains that the required y aesthetic is missing.
Why can't I use ..count.. in stat_summary? I can't find anywhere in the docs information about how to use these variables.
Expanding on @joran's comment, the special variables in ggplot with double periods around them (..count.., ..density.., etc.) are returned by a stat transformation of the original data set. Those particular ones are returned by stat_bin, which is implicitly called by geom_histogram (note in the documentation that the default value of the stat argument is "bin"). Your second example calls a different stat function, which does not create a ..count.. variable. You can get the same graph with:
p + geom_bar(stat="bin")
In newer versions of ggplot2, one can also use the stat function instead of the enclosing .., so aes(y = ..count..) becomes aes(y = stat(count)). From ggplot2 3.3.0 onwards, after_stat() serves the same purpose: aes(y = after_stat(count)).
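A minimal sketch of the point above, assuming ggplot2 3.3.0 or later, where after_stat() is available. For a discrete x such as factor(cyl), current versions compute the count with stat_count (the default stat of geom_bar) rather than stat_bin:

```r
library(ggplot2)

# count is not a column of mpg; it is computed by the layer's stat
p <- ggplot(mpg, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(count)))
p

# inspect the variables the stat produced for this layer
head(layer_data(p, 1))
```

Looking at layer_data() is a quick way to see which computed variables a given stat makes available.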
I have seen both usages, but I don't know the practical difference between the two.
And, why
stat_vline(xintercept="mean", geom="vline") # this works
But
geom_vline(xintercept="mean", stat="vline") # this doesn't work
Does that mean that after "mean" is passed on to the next layer, which is vline in this case, the function becomes a character string? Is this behaviour general?
You might have found a bug. If you specify the aesthetics mapping (again) it works:
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p + geom_vline(aes(x=wt, y=mpg), xintercept="mean", stat="vline")
As is typical for ggplot2, the documentation is somewhat sparse, which makes it difficult to judge whether this is intentional.
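In ggplot2 versions where stat_vline is not available, one way to get the same line is to compute the summary beforehand and pass the resulting number to geom_vline. A sketch under that assumption (wt_mean is a name introduced here):

```r
library(ggplot2)

wt_mean <- mean(mtcars$wt)   # compute the summary outside the plot

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_vline(xintercept = wt_mean)   # a number, not the string "mean"
p
```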