What is the difference between putting aes(x=…) in ggplot() and putting it in a geom (e.g. geom_histogram() below)?
1. in the geom():
ggplot(diamonds) +
geom_histogram(binwidth=500, aes(x=diamonds$price))+
xlab("Diamond Price U$") + ylab("Frequency")+
ggtitle("Diamond Price Distribution")
2. in ggplot():
ggplot(diamonds, aes(x=diamonds$price)) +
geom_histogram(binwidth= 500) +
xlab("Price") + ylab("Frequency") +
ggtitle("Diamonds Price distribution")
Whether you put x = price in the original ggplot() call or in a specific geom only really matters if you have multiple geoms with different mappings. The mapping you specify in the ggplot() call is inherited by every geom, so it's often best to put shared mappings at the top level, if only to save typing them out again for each geom. Specify a mapping in an individual geom when it applies only to that geom.
Also note that it should just be aes(x = price), not aes(x = diamonds$price). ggplot looks the name up in the data frame you passed as the data argument. If you pass a vector manually, like diamonds$price, you can break faceting or grouping in a more complex plot.
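To make the inheritance concrete, here is a minimal sketch, assuming ggplot2 is installed and using its bundled diamonds data. The object names p_top and p_layer and the extra geom_freqpoly() layer are only illustrative and do not come from the question.

```r
library(ggplot2)

# Mapping declared once in ggplot(): both layers inherit x = price.
p_top <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500) +
  geom_freqpoly(binwidth = 500, colour = "red")

# Mapping declared inside one geom: only that layer sees it.
# Any further layer added here would need its own aes(x = ...).
p_layer <- ggplot(diamonds) +
  geom_histogram(aes(x = price), binwidth = 500)
```

Printing p_top or p_layer renders the plot. With a single histogram layer the two placements give the same picture; the difference only shows once a second layer is added.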
I've been set this question for an assignment, but I've never used R before. Any help is appreciated.
Many thanks.
Question:
Produce a scatter plot to compare CO2 emissions from Brazil and Argentina between 1950 and 2019....
I can get it for Brazil but cannot figure out how to add Argentina.
I think I have to do something with geom_point and filter?
df%>%
filter(Country=="Brazil", Year<=2019 & Year>=1950) %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(na.rm =TRUE, shape=20, size=2, colour="green") +
labs(x = "Year", y = "CO2 Emissions (tonnes)")
The answer depends on what you're looking to do, but generally adding another dimension to a scatter plot where you already have clear x and y dimensions is done by applying an aesthetic (color, shape, etc) or via faceting.
In both approaches, you don't want to filter the data down to a single country. Instead, aesthetics or faceting do the "filtering" for you by mapping the data according to the Country column in the dataset. If your dataset contains more countries than Argentina and Brazil, first filter it to include only those two, so:
your_filtered_df <- your_df %>%
dplyr::filter(Country %in% c("Argentina", "Brazil"))
Faceting
Faceting is another way of saying you want to split up your one plot into two separate plots (one for Argentina, one for Brazil). Each plot will have the same aesthetics (look the same), but will have the appropriate "filtered" dataset.
In your case, you can try:
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(na.rm =TRUE, shape=20, size=2, colour="green") +
facet_wrap(~Country)
Aesthetics
Here, you have a lot of options. The idea is that you tell ggplot2 to map the appearance of individual points in the point geom to the value specified in your_filtered_df$Country. You do this by placing one of the aesthetic arguments for geom_point() inside of aes(). If you use shape=, for example it might look like this:
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(aes(shape=Country), na.rm =TRUE, size=2, colour="green")
This should show a plot with a legend and two different point shapes, one for each country name. It's very important to remember that when you put an aesthetic like shape or color or size inside of aes(), you must not also set it outside. So, this will behave correctly:
geom_point(aes(colour=Country), ...)
But this will not:
geom_point(aes(colour=Country), colour="green", ...)
When an aesthetic is also set outside aes(), the constant overrides the mapping in aes(), so the second version will still show all points as green.
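If you want to check this for yourself, the following sketch compares the two calls. It assumes ggplot2 is installed; the toy data frame and the names mapped and overridden are invented for illustration and are not real emissions figures.

```r
library(ggplot2)

# Toy data, made up for this example only.
toy <- data.frame(
  Year    = rep(2000:2002, times = 2),
  Value   = c(1, 2, 3, 2, 3, 4),
  Country = rep(c("Argentina", "Brazil"), each = 3)
)

# Colour mapped to Country inside aes().
mapped <- ggplot(toy, aes(Year, Value)) +
  geom_point(aes(colour = Country))

# Same mapping, but a constant colour is also set outside aes().
overridden <- ggplot(toy, aes(Year, Value)) +
  geom_point(aes(colour = Country), colour = "green")

# layer_data() returns the data a layer will actually draw.
unique(layer_data(mapped)$colour)      # two distinct colours, one per country
unique(layer_data(overridden)$colour)  # only "green"
```

layer_data() is handy here because it shows the final per-point colours without having to judge them by eye from the rendered plot.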
Don't Do this... but it works
OP posted a comment that indicated some additional hints from the professor, which was:
We were given the hint in the question "you can embed piped filter
functions within geom_point objects"
I believe they are referring to a final... very bad way of generating the points. This method would require you to have two geom_point() objects, and send each one a different filtered dataset. You can do this by accessing the data= argument within each geom_point() object. There are many problems with this approach, including the lack of a legend being generated, but if you simply must do it this way... here it is:
# painful to write this. it goes against all good practices with ggplot
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(data=your_filtered_df %>% dplyr::filter(Country=="Argentina"),
color="green", shape=20) +
geom_point(data=your_filtered_df %>% dplyr::filter(Country=="Brazil"),
color="red", shape=20)
You can probably see why this is not a good convention. Think about what you would do to represent 50 different countries... the faceting and aesthetic methods above would still work unchanged, but with this method you would need 50 individual geom_point() objects in your plot... ugh. Don't make a typo!
Consider the following lines.
p <- ggplot(mpg, aes(x=factor(cyl), y=..count..))
p + geom_histogram()
p + stat_summary(fun.y=identity, geom='bar')
In theory, the last two should produce the same plot. In practice, stat_summary fails and complains that the required y aesthetic is missing.
Why can't I use ..count.. in stat_summary? I can't find anywhere in the docs information about how to use these variables.
Expanding @joran's comment, the special variables in ggplot with double periods around them (..count.., ..density.., etc.) are returned by a stat transformation of the original data set. Those particular ones are returned by stat_bin, which is implicitly called by geom_histogram (note in the documentation that the default value of the stat argument is "bin"). Your second example calls a different stat function, which does not create a ..count.. variable. You can get the same graph with
p + geom_bar(stat="bin")
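In ggplot2 3.0.0 and later, one can also use the stat function instead of the enclosing .., so aes(y = ..count..) becomes aes(y = stat(count)). From 3.3.0 on the recommended form is aes(y = after_stat(count)), and both older notations are deprecated as of 3.4.0.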
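As a rough sketch of how the computed variables look in practice, assuming ggplot2 3.3.0 or later (where after_stat() is available): on current versions stat = "bin" expects a continuous x, so counting a factor is done with stat = "count", which is the default for geom_bar(). The object names p_count and built are illustrative only.

```r
library(ggplot2)

# after_stat(count) refers to the 'count' column computed by the layer's stat.
# geom_bar() uses stat = "count" by default, which accepts a discrete x.
p_count <- ggplot(mpg, aes(x = factor(cyl))) +
  geom_bar(aes(y = after_stat(count)))

# The columns produced by the stat can be inspected directly.
built <- layer_data(p_count)
built[, c("x", "count")]
```

Only the variables that a layer's own stat computes can be used this way, which is why a count is not available to stat_summary() in the question.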
I would like to plot another series of data on top of a current graph. The additional data only contains information for 3 (out of 6) spp, which are used in the facet_wrap().
The other series of data is currently a column (in the same data file).
Current graph:
ped.num <- ggplot(data, aes(ped.length, seeds.inflorstem))
ped.num + geom_point(size=2) + theme_bw() + facet_wrap(~spp, scales = "free_y")
Additional layer would be:
aes(ped.length, seeds.filled)
I feel I should be able to plot them using the same y-axis, because they have just slightly smaller values. How do I go about adding this layer?
@ialm's solution should work fine, but I recommend calling the aes function separately in each geom_* because it makes the code easier to read.
ped.num <- ggplot(data) +
geom_point(aes(x=ped.length, y=seeds.inflorstem), size=2) +
theme_bw() +
facet_wrap(~spp, scales="free_y") +
geom_point(aes(x=ped.length, y=seeds.filled))
(You'll always get better answers if you include example data, but I'll take a shot in the dark)
Since you want to plot two variables that are in the same data.frame, it's probably easiest to reshape the data before feeding it into ggplot:
library(reshape2)
# Melting data gives you exactly one observation per row - ggplot likes that
dat.melt <- melt(dat,
id.vars = c("spp", "ped.length"),
measure.vars = c("seeds.inflorstem", "seeds.filled")
)
# Plotting is slightly different - instead of explicitly naming each variable,
# you'll refer to "variable" and "value"
ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
geom_point(size=2) +
theme_bw() +
facet_wrap(~spp, scales = "free_y")
The seeds.filled values should plot only on the facets for the corresponding species.
I prefer this to Drew's (totally valid) approach of explicitly mapping different layers because you only need a single geom_point() whether you have two variables or twenty and it's easy to map a variety of aesthetics to variable.
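For readers who use tidyr rather than reshape2, the same reshaping can be sketched with pivot_longer(). This is a swapped-in alternative, not the method used in the answer above, and it assumes tidyr and ggplot2 are installed. The dat data frame below is a hypothetical stand-in with made-up values, since the question did not include example data.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data; all values are invented.
dat <- data.frame(
  spp              = rep(c("A", "B"), each = 3),
  ped.length       = c(1.0, 1.5, 2.0, 1.2, 1.8, 2.4),
  seeds.inflorstem = c(10, 14, 18, 8, 11, 15),
  seeds.filled     = c(8, 12, 15, NA, NA, NA)  # species B has no filled-seed data
)

# One row per (observation, variable) pair, like melt() produces.
dat.long <- pivot_longer(
  dat,
  cols = c(seeds.inflorstem, seeds.filled),
  names_to = "variable",
  values_to = "value",
  values_drop_na = TRUE  # drop rows for species with no seeds.filled values
)

p_long <- ggplot(dat.long, aes(x = ped.length, y = value, colour = variable)) +
  geom_point(size = 2) +
  theme_bw() +
  facet_wrap(~spp, scales = "free_y")
```

Printing p_long draws the plot. Dropping the NA rows up front means species without seeds.filled values simply show one series in their facet, without a missing-values warning.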