Add mean to grouped box plot in R with ggplot2

I created a grouped box plot in R with ggplot2. However, when I want to add the mean, the dots appear between the two boxes in a group. How can I change it such that the dots are within each box?
Here my code:
ggplot(results, aes(x=treatment, y=effect, fill=sex)) +
  geom_boxplot() +
  stat_summary(fun.y=mean, geom="point", shape=20, size=3, color="red")

You can use position_dodge2. Because points and boxplots have different widths, you will need to experiment with the width argument to centre the dots within each box.
library(ggplot2)
ggplot(mtcars, aes(x=factor(gear), y=hp, fill=factor(vs))) +
  geom_boxplot() +
  # fun.y is deprecated since ggplot2 3.3.0; fun is its replacement
  stat_summary(fun=mean, geom="point", shape=20, size=3, color="red",
               position = position_dodge2(width = 0.75,
                                          preserve = "single"))
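Rather than adjusting the width purely by eye, one way to check the alignment is to compare the x positions that ggplot2 computes for each layer. The sketch below is a minimal example, assuming ggplot2 3.3.0 or later (for the fun argument). It uses position_dodge(width = 0.75) as a candidate setting, which is an assumption to verify rather than a guaranteed match, and the names box_x and point_x are only illustrative.

```r
library(ggplot2)

# Build the plot with a candidate dodge width (0.75 is an assumption to check).
p <- ggplot(mtcars, aes(x = factor(gear), y = hp, fill = factor(vs))) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 20, size = 3, color = "red",
               position = position_dodge(width = 0.75))

# layer_data() returns the computed data of one layer, after dodging.
box_x   <- sort(as.numeric(layer_data(p, 1)$x))  # centres of the boxes
point_x <- sort(as.numeric(layer_data(p, 2)$x))  # positions of the mean points

# Side-by-side comparison of the two sets of positions.
print(data.frame(box_x, point_x))
```

If the two columns differ, change width and rerun the comparison until they agree.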

In most cases, you will not be able to place the individual points inside each grouped box, because they overlap with each other along the axes. One alternative is to use facet_wrap.
Here is one example with iris data:
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, fill=Species)) +
  geom_point(aes(color = Species), shape = 20, size = 3) +
  geom_boxplot(alpha = 0.8) +
  facet_wrap(~Species)
If you don't want the color of the points to be the same as the color of the boxplots, you have to remove the grouping variable from the aes inside the geom_point. Again, with the iris example,
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, fill=Species)) +
  geom_boxplot(alpha = 0.8) +
  geom_point(shape = 20, size = 3, color = 'red') +
  facet_wrap(~Species)
Note that the ggplot2 package works in layers. Therefore, if you add the geom_point layer after the geom_boxplot layer, the points will be drawn on top of the boxplot. If you add the geom_point layer before the geom_boxplot layer, the points will be in the background.
Edit:
If what you want is to add a single point in your boxplot to indicate the mean, you can do something like:
library(dplyr)    # provides %>%, group_by() and mutate()
library(ggplot2)
iris %>%
  group_by(Species) %>%
  mutate(mean.y = mean(Sepal.Width),
         mean.x = mean(Sepal.Length)) %>%
  ggplot(aes(x=Sepal.Length, y=Sepal.Width, fill=Species)) +
  geom_boxplot(alpha = 0.8) +
  geom_point(aes(y = mean.y, x = mean.x), shape = 20, size = 3, color = 'red')
But be aware that it would probably require some calibration on the x axis to make it exactly in the middle of each box.

Related

How to flip a geom_area to be under the line when using scale_y_reverse()

I had to flip the axis of my line, but still need the geom_area to be under the curve. However I cannot figure out how to do so.
This is the line of code I tried
ggplot(PalmBeachWell, aes(x=Date, y=Depth.to.Water.Below.Land.Surface.in.ft.)) +
  geom_area(position= "identity", fill='lightblue') +
  theme_classic() +
  geom_line(color="blue") +
  scale_y_reverse()
and here is what I got: the area is filled above the line rather than under it.
One option would be to use geom_ribbon to fill the area above the curve, which, after applying scale_y_reverse, results in a fill under the curve.
Using some fake example data based on the ggplot2::economics dataset:
library(ggplot2)
PalmBeachWell <- economics[c("date", "psavert")]
names(PalmBeachWell) <- c("Date", "Depth.to.Water.Below.Land.Surface.in.ft.")
ggplot(PalmBeachWell, aes(x = Date, y = Depth.to.Water.Below.Land.Surface.in.ft.)) +
  geom_ribbon(aes(ymin = Depth.to.Water.Below.Land.Surface.in.ft., ymax = Inf),
    fill = "lightblue"
  ) +
  geom_line(color = "blue") +
  scale_y_reverse() +
  theme_classic()

Mixed fill color in ggplot2 legend using geom_smooth() in R

When plotting two regression curves using geom_smooth() in ggplot2, the legend's fill color is the one that appears where the confidence intervals intersect. I think this behaviour arises when the overlapping area is proportionally bigger than the other, but I find it undesirable because the reader can already deduce that the "darkened" area is where the CIs intersect. IMHO it is harder to read and unintuitive when both curves are assigned the same fill color in the legend.
How can I correct this?
MWE:
library(ggplot2)
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()
p <- p + geom_smooth(method=loess, aes(colour="Loess"), fill="yellow")
p <- p + geom_smooth(method=lm, aes(colour="LM"))
print(p)
You can add the fill as an aesthetic mapping, ensuring you name it the same as the color mapping to get legends to merge:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length)) +
  geom_point(aes(shape = "data")) +
  geom_smooth(method=loess, aes(colour="Loess", fill="Loess")) +
  geom_smooth(method=lm, aes(colour="LM", fill = "LM")) +
  scale_fill_manual(values = c("yellow", "gray"), name = "model") +
  scale_colour_manual(values = c("red", "blue"), name = "model") +
  labs(shape = "")
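With unnamed values, the manual scales assign colors by the order of the legend entries, so it can be unclear which model receives which color. A minimal sketch of one way to make the pairing explicit is to pass named vectors, whose names match the strings used in aes(). The particular color choices and the names model_fills and model_colours below are assumptions for illustration, not part of the answer above.

```r
library(ggplot2)

# Named vectors tie each color to a model label, independent of legend order.
model_fills   <- c(Loess = "yellow", LM = "gray")
model_colours <- c(Loess = "blue", LM = "red")

p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x,
              aes(colour = "Loess", fill = "Loess")) +
  geom_smooth(method = "lm", formula = y ~ x,
              aes(colour = "LM", fill = "LM")) +
  # Same scale name on both scales so the two legends merge into one.
  scale_fill_manual(values = model_fills, name = "model") +
  scale_colour_manual(values = model_colours, name = "model")
p
```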

Removing confidence ellipses for some groups when using stat_ellipse in ggplot2

You have a scatter plot with several groups (for example 10). You draw the 95% confidence ellipses for the groups.
Problem: you don't need to see the confidence ellipses of all groups (because it is not necessary, or because some groups have few points, resulting in huge ellipses).
Question: how do you remove the confidence ellipses of specific groups while keeping the points on the scatter plot?
Example:
In this code, you wish to remove the confidence ellipse of versicolor while keeping its points with their colour and keeping the other ellipses:
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  stat_ellipse(aes(color = Species)) +
  theme(legend.position = "bottom")
The legend in Z.lin's answer would not be accurate, as versicolor still has a line going through its shape even though no ellipse was plotted. We can work around this by specifying aes(linetype = Species) and using scale_linetype_manual.
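# library() attaches one package per call; its second positional argument is `help`
library(dplyr)
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species)) +
  stat_ellipse(aes(linetype = Species)) +
  scale_linetype_manual(values = c(1,0,1)) +  # linetype 0 is blank, so versicolor gets no line
  theme(legend.position = "bottom")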
You can filter the data passed to the stat_ellipse layer to include only groups for which you want ellipses:
library(dplyr)
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  # only this layer receives the filtered data; the points still use all of iris
  stat_ellipse(data = . %>% filter(Species != "versicolor"),
               aes(color = Species)) +
  theme(legend.position = "bottom")
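The . %>% filter(...) expression above builds a function, and ggplot2 applies a function supplied as a layer's data argument to the plot's default data. A minimal sketch of the same idea without dplyr follows; drop_versicolor is a hypothetical helper name introduced only for illustration.

```r
library(ggplot2)

# Hypothetical helper: keep every row except those of the versicolor group.
drop_versicolor <- function(d) d[d$Species != "versicolor", ]

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  # The function is applied to the plot data (iris) for this layer only.
  stat_ellipse(data = drop_versicolor) +
  theme(legend.position = "bottom")
p
```

Because the points layer still sees the full data, versicolor keeps its points and its colour while only its ellipse is omitted.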

ggplot Adding Tracking Colors Below X-Axis

I'd like to add a line below the x-axis whose color is dependent on a factor that is not plotted.
In this example, I'm creating a box plot and would like to add a line that indicates another variable.
Using the mtcars data set as an example, and then physically drawing in what I'm trying to do:
ggplot(mtcars, aes(factor(cyl), mpg, fill=factor(am))) +
geom_boxplot()
My thought was to create a bar, column, or geom_tile plot and then arrange it below the boxplot. This is how I would do it in base R. Is there a way to add in these kinds of color labels in ggplot2?
The natural way to do this sort of thing in ggplot2 would be to facet on the categorical variable to create subplots. However, if you want to keep everything on the same graph, you could try using a geom_tile() layer, something like this:
df <-data.frame(x = factor(c(4,6,8)), colour = factor(c(1,2,1)))
ggplot(mtcars, aes(factor(cyl), mpg, fill=factor(am))) +
  geom_boxplot() +
  geom_tile(data=df, aes(x = x, y = 8, fill = colour))
Alternatively, as you suggest, you could align an additional plot underneath it. You could use ggarrange() from the ggpubr package for this:
plot1 <- ggplot(mtcars, aes(factor(cyl), mpg, fill=factor(am))) +
  geom_boxplot() +
  geom_tile(data=df, aes(x = x, y = 10, fill = colour)) +  # the + is needed so theme() is applied
  theme(legend.position = 'none')
plot2 <- ggplot(df, aes(x=x, y=1, fill = colour)) +
  geom_tile() +
  theme_void() +
  scale_fill_manual(values=c('orange', 'green', 'orange')) +
  theme(legend.position = 'none')
library(ggpubr)
ggarrange(plot1, plot2, nrow = 2, heights = c(10, 1), align = 'h')

ggplot2 add legend for each geom_point manually

I created a plot using 2 separate data sets so that I could create different errorbars. The first data set has error bars that go down only whereas the second data set has error bars that go up only. This prevents unnecessary overlap in the plot. I also used a compound shape for one of the groups.
I want to create a legend based on these shapes (not a colour), but I can't seem to figure it out. Here is the plot code.
p <-ggplot()
p + geom_point(data=df.figure.1a, aes(x=Hour, y=Mean), shape=5, size=4) +
  geom_point(data=df.figure.1a, aes(x=Hour, y=Mean), shape=18, size=3) +
  geom_errorbar(data=df.figure.1a, aes(x=Hour, y=Mean, ymin = Mean - SD, ymax = Mean), size=0.7, width = 0.4) +
  geom_point(data=df.figure.1b, aes(x=Hour, y=Mean), shape=17, size=4) +
  geom_errorbar(data=df.figure.1b, aes(x=Hour, y=Mean, ymin = Mean, ymax = Mean + SD), size=0.7, width = 0.4)
