Consider the following lines.
p <- ggplot(mpg, aes(x=factor(cyl), y=..count..))
p + geom_histogram()
p + stat_summary(fun.y=identity, geom='bar')
In theory, the last two should produce the same plot. In practice, stat_summary fails and complains that the required y aesthetic is missing.
Why can't I use ..count.. in stat_summary? I can't find anywhere in the docs information about how to use these variables.
Expanding on @joran's comment: the special variables in ggplot2 with double periods around them (..count.., ..density.., etc.) are computed by a stat transformation of the original data set. Those particular ones are returned by stat_bin, which geom_histogram calls implicitly (note in the documentation that the default value of its stat argument is "bin"). Your second example calls a different stat function, stat_summary, which does not create a variable named ..count.. at all. You can get the same graph with
p + geom_bar(stat="bin")
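In newer versions of ggplot2 (3.0.0 and later), one can also use the stat() function instead of the enclosing .., so aes(y = ..count..) becomes aes(y = stat(count)). From 3.3.0 on, after_stat(count) is the recommended spelling. Also note that since ggplot2 2.0.0, counting a discrete x is done by stat_count rather than stat_bin, so there the equivalent call is simply geom_bar() with its default stat.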
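To make the point about stat-computed variables concrete, here is a minimal sketch. It assumes ggplot2 3.3.0 or later (for after_stat()) and relies on geom_bar's default stat, stat_count, which computes count and prop; the object names p_count, p_prop and computed are only illustrative.

```r
library(ggplot2)

# stat_count (geom_bar's default stat) computes `count` for every x value,
# so the variable exists and can be mapped to y.
p_count <- ggplot(mpg, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(count)))

# stat_count also computes `prop`; group = 1 makes the proportions
# relative to all rows instead of to each bar separately.
p_prop <- ggplot(mpg, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(prop), group = 1))

# layer_data() returns the data frame the layer works with after the
# stat has run, which is a quick way to see which variables exist.
computed <- layer_data(p_count)
computed[, c("x", "count")]
```

Inspecting layer_data() like this is also a reasonable way to find out which computed variables a given stat offers when the documentation is unclear.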
I have to admit that it has been a while since I used ggplot, but this seems a bit silly. Either I am missing something fundamental when trying to make a density plot, or there is a bug in ggplot2 (v3.3.2).
test <- data.frame(Time = rnorm(100), Age = rnorm(100))
ggplot(test, aes(y = Time, x = Age)) +
  geom_density(aes(y = Time, x = Age))
produces
ggplot(test,aes(y=Time,x=Age)) +
geom_density(aes(y=Time,x=Age))
Error: geom_density requires the following missing aesthetics: y
How could the 'y' aesthetic be missing?
There are two cases when using geom_density(), depending on which stat the layer uses:
1. The standard case is stat = "density", which makes geom_density() compute its y values from the distribution of the given x values. In this case you must NOT provide a y aesthetic, because the y values are computed behind the curtain.
2. The second case, which is yours, has to be requested explicitly by changing the stat to "identity". This is needed if, for some reason, you have precalculated values which you want to feed directly into the density geom.
Your problem arises when you mix cases 1 and 2. But I agree that the error message is not really clear; it could remind you to check that the stat in use is the one you intended.
library(ggplot2)
test <- data.frame(time = rnorm(100), age = rnorm(100))
# If you want to use precalculated y values, change the stat to "identity":
ggplot(test) +
  geom_density(aes(x = age, y = time),
               stat = "identity")
# Compare with the default, stat = "density":
ggplot(test) +
  geom_density(aes(x = age))
Created on 2020-08-04 by the reprex package (v0.3.0)
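As a hedged illustration of what "precalculated values" could look like in the second case, the sketch below computes a kernel density estimate with base R's density() and hands the result to geom_density(stat = "identity"). The names d, dens, p_pre and p_auto, and the seed, are assumptions made for this example only.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(age = rnorm(100))

# Precompute the density estimate ourselves; density() evaluates it
# on a grid of 512 points by default.
d <- density(test$age)
dens <- data.frame(age = d$x, dens = d$y)

# Case 2: y is supplied, so the stat must be "identity".
p_pre <- ggplot(dens, aes(x = age, y = dens)) +
  geom_density(stat = "identity")

# Case 1: only x is supplied and stat = "density" computes y.
p_auto <- ggplot(test, aes(x = age)) +
  geom_density()
```

Both plots should show a similar curve; they need not match exactly, because the two routes may use different bandwidth and grid settings.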
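If you want to plot both variables in the same graphic, you need to "melt" the data into long format first.
library(ggplot2)
library(data.table)
test <- data.frame(Time = rnorm(100), Age = rnorm(100))
dt <- data.table(test)
# Long format: one row per observation, with the source column in `variable`
dt_melt <- melt(dt, measure.vars = c("Time", "Age"))
ggplot(dt_melt, aes(x = value, fill = variable)) + geom_density(alpha = 0.25)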
What is the difference between putting aes(x=…) in ggplot() and putting it in a geom (e.g. geom_histogram() below)?
1. in the geom():
ggplot(diamonds) +
  geom_histogram(binwidth=500, aes(x=diamonds$price)) +
  xlab("Diamond Price U$") + ylab("Frequency") +
  ggtitle("Diamond Price Distribution")
2. in ggplot():
ggplot(diamonds, aes(x=diamonds$price)) +
  geom_histogram(binwidth=500) +
  xlab("Price") + ylab("Frequency") +
  ggtitle("Diamonds Price distribution")
Whether you put x = price in the original ggplot() call or in a specific geom only really matters if you have multiple geoms with different mappings. The mapping you specify in the ggplot() call will be applied to all geoms, so it's often best to put the mapping in the top level like that, if only to save you having to type it out again for each individual geom. Specify mappings in the individual geoms if they only apply to that specific geom.
Also note that it should just be aes(x = price), not aes(x = diamonds$price). ggplot knows to look in the data frame you pass as the data argument. If you pass a vector manually, like diamonds$price, you might mess up faceting or grouping in a more complex plot.
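A small sketch of the rule described above: the mapping given to ggplot() is shared by every layer, while a mapping given to one geom applies to that geom only. The choice of geom_freqpoly() and of the colour = cut mapping is purely illustrative.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = price)) +     # x = price is inherited by both layers
  geom_histogram(binwidth = 500, fill = "grey70") +
  geom_freqpoly(aes(colour = cut),          # colour applies to this layer only
                binwidth = 500)

# Both layers binned the same inherited x variable:
hist_data <- layer_data(p, 1)
poly_data <- layer_data(p, 2)
```

Here the histogram stays a single grey distribution, while the frequency polygon is split into one line per cut level, because only that layer carries the colour mapping.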
I have been trying to work out a few things about ggplot2, and how supplementary arguments inherit from the first part, ggplot(). Specifically, whether inheritance is passed on beyond the geom_*() part.
I have a histogram of data:
ggplot(data = faithful, aes(eruptions)) + geom_histogram()
Which produces a fine chart, though the breaks are default. It appears to me (an admitted novice), that geom_histogram() is inheriting the data specification from ggplot(). If I want to have a smarter way of setting the breaks I could use a process like so:
ggplot(data = faithful, aes(eruptions)) +
  geom_histogram(breaks = seq(from = min(faithful$eruptions),
                              to = max(faithful$eruptions), length.out = 10))
However, here I am re-specifying within the geom_histogram() function that I want faithful$eruptions. I have been unable to find a way to phrase this without re-specifying. Further, if I use the data = argument in geom_histogram(), and specify just eruptions in min and max, seq() still doesn't understand that I mean the faithful data set.
I know that seq is not part of ggplot2, but I wondered if it might be able to inherit regardless, as it is bound within geom_histogram(), which itself inherits from ggplot(). Am I just using the wrong syntax, or is this possible?
Note that the term you are looking for is not "inheritance" but non-standard evaluation (NSE). ggplot2 lets you refer to your data by bare column names instead of a full reference in only a few places: the mapping arguments of ggplot() and the geom_* layers, and even then only inside aes(). These work:
ggplot(faithful) + geom_point(aes(eruptions, eruptions))
ggplot(faithful) + geom_point(aes(eruptions, eruptions, size=waiting))
The following doesn't work because we are referring to waiting outside of aes and mapping (note first arg to geom_* is the mapping arg):
ggplot(faithful) + geom_point(aes(eruptions, eruptions), size=waiting)
But this works:
ggplot(faithful) + geom_point(aes(eruptions, eruptions), size=faithful$waiting)
though differently, since size is now interpreted literally instead of being scaled as it is when part of the mapping.
In your case, since breaks is not part of the aes/mapping spec, you can't use NSE and you are left using the full reference. Some possible work-arounds:
ggplot(data = faithful, aes(eruptions)) + geom_histogram(bins=10) # not identical
ggplot(data=faithful, aes(eruptions)) +
  geom_histogram(
    breaks=with(faithful,    # use `with` so bare column names resolve
      seq(from=min(eruptions), to=max(eruptions), length.out=10)
    )
  )
And no-NSE, but a little less typing:
ggplot(data=faithful, aes(eruptions)) +
  geom_histogram(
    breaks=do.call(seq, c(as.list(range(faithful$eruptions)), length.out=10))
  )
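Another way to see why the full reference is needed: breaks is an ordinary argument, so R evaluates it where the call is written, before ggplot2 looks at the data. A minimal sketch that computes the breaks up front (x_breaks and p are names chosen only for this example):

```r
library(ggplot2)

# `breaks` is plain R, evaluated where it is written, so refer to the
# data explicitly and compute the break points once.
x_breaks <- seq(min(faithful$eruptions), max(faithful$eruptions),
                length.out = 10)

p <- ggplot(faithful, aes(eruptions)) +
  geom_histogram(breaks = x_breaks)

# Ten break points define nine bins.
binned <- layer_data(p)
```

This is no shorter than the versions above, but it keeps the plot call readable and makes the break points reusable across layers.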
Based on the ggplot2 documentation, it seems that the + operator, which is really the +.gg function, allows adding the following objects to a ggplot object:
data.frame, uneval, layer, theme, scale, coord, facet
The geom_* functions create layers that inherit the data and aes from the ggplot object "above" unless stated otherwise.
However, ordinary arguments such as breaks are evaluated as plain R code in the environment the call is made from (here the global environment), not inside the data. seq() is not one of the objects listed above and inherits nothing from the ggplot object through the + operator, so it looks for an object named eruptions in the global environment, which does not contain one.