Removing confidence ellipses for some groups when using stat_ellipse in ggplot2 - r

You have a scatter plot with several groups (for example, 10) and you draw the 95% confidence ellipses for the groups.
Problem: you don't need to see the confidence ellipses of all groups (because they are not necessary, or because some groups have few points, resulting in huge ellipses).
Question: how do you remove the confidence ellipses of specific groups while keeping their points on the scatter plot?
Example:
In this code, you wish to remove the confidence ellipse of versicolor while keeping its points with their colour and keeping the other ellipses:
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point() +
stat_ellipse(aes(color = Species)) +
theme(legend.position = "bottom")

The legend in Z.lin's answer would not be accurate, as versicolor still has a line going through the shape even though no ellipse was plotted for it. We can bypass this by specifying aes(linetype = Species) and using scale_linetype_manual, where linetype 0 is blank.
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point(aes(shape = Species)) +
stat_ellipse(aes(linetype = Species)) +
scale_linetype_manual(values = c(1,0,1)) +
theme(legend.position = "bottom")

You can filter the data passed to the stat_ellipse layer to include only groups for which you want ellipses:
library(dplyr)
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point() +
stat_ellipse(data = . %>% filter(Species != "versicolor"),
aes(color = Species)) +
theme(legend.position = "bottom")
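The same filtering idea can cover the "few points" case from the question without naming groups by hand. The following is only a sketch: the data frame sparse_iris, the helper enough_points and the threshold min_points are hypothetical names introduced for illustration, and it relies on ggplot2 accepting a function as a layer's data argument (the function is called with the plot data).

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: keep only 3 versicolor rows so that group becomes sparse.
sparse_iris <- iris %>%
  filter(Species != "versicolor" | row_number() <= 53)

# Assumed threshold: groups with fewer rows than this get no ellipse.
min_points <- 10

# Hypothetical helper: keep only the groups that have enough points.
enough_points <- function(d) {
  d %>%
    group_by(Species) %>%
    filter(n() >= min_points) %>%
    ungroup()
}

# All points are drawn; ellipses are computed only for the filtered data.
p <- ggplot(sparse_iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  stat_ellipse(data = enough_points) +
  theme(legend.position = "bottom")
```

Printing p should show all three species as points, with ellipses only for the groups that pass the threshold.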

Related

Add mean to grouped box plot in R with ggplot2

I created a grouped box plot in R with ggplot2. However, when I want to add the mean, the dots appear between the two boxes in a group. How can I change it such that the dots are within each box?
Here my code:
ggplot(results, aes(x=treatment, y=effect, fill=sex)) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=20, size=3, color="red")
You can use position_dodge2. Because points and boxplots have different widths, you will need to find a suitable value for the width argument by trial and error to centre the dots.
ggplot(mtcars, aes(x=factor(gear), y=hp, fill=factor(vs))) +
geom_boxplot() +
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
stat_summary(fun=mean, geom="point", shape=20, size=3, color="red",
position = position_dodge2(width = 0.75,
preserve = "single"))
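As a variation on the approach above, the means can be computed up front and drawn with geom_point, which makes the plotted values easy to inspect. This is a sketch, not a definitive recipe: group_means is a hypothetical name, and width = 0.75 is simply the starting value from the answer above, so it may still need adjusting.

```r
library(dplyr)
library(ggplot2)

# One row per gear/vs combination, holding the mean hp of that group.
group_means <- mtcars %>%
  mutate(gear = factor(gear), vs = factor(vs)) %>%
  group_by(gear, vs) %>%
  summarise(hp = mean(hp), .groups = "drop")

# Boxplots from the raw data, red dots from the precomputed means.
# width = 0.75 is an assumed starting value, tune it until the dots are centred.
p <- ggplot(mtcars, aes(x = factor(gear), y = hp, fill = factor(vs))) +
  geom_boxplot() +
  geom_point(data = group_means,
             aes(x = gear, y = hp, group = vs),
             position = position_dodge(width = 0.75),
             shape = 20, size = 3, color = "red",
             inherit.aes = FALSE)
```

Because the means live in their own data frame, they can be checked or labelled independently of the boxplot layer.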
In most cases, you will not be able to place the points inside each grouped box as they overlap with each other through the axes. One alternative is to use facet_wrap.
Here is one example with iris data:
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, fill=Species)) +
geom_point(aes(color = Species), shape = 20, size = 3) +
geom_boxplot(alpha = 0.8) +
facet_wrap(~Species)
If you don't want the color of the points to be the same as the color of the boxplots, you have to remove the grouping variable from the aes inside the geom_point. Again, with the iris example,
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, fill=Species)) +
geom_boxplot(alpha = 0.8) +
geom_point(shape = 20, size = 3, color = 'red') +
facet_wrap(~Species)
Note that ggplot2 works in layers. Therefore, if you add the geom_point layer after the geom_boxplot layer, the points will be drawn on top of the boxplot. If you add the geom_point layer before the geom_boxplot layer, the points will be in the background.
Edit:
If what you want is to add a single point in your boxplot to indicate the mean, you can do something like:
library(dplyr)
library(ggplot2)
iris %>%
group_by(Species) %>%
mutate(mean.y = mean(Sepal.Width),
mean.x = mean(Sepal.Length)) %>%
ggplot(aes(x=Sepal.Length, y=Sepal.Width, fill=Species)) +
geom_boxplot(alpha = 0.8) +
geom_point(aes(y = mean.y, x = mean.x), shape = 20, size = 3, color = 'red')
But be aware that it would probably require some calibration on the x axis to make it exactly in the middle of each box.

Mixed fill color in ggplot2 legend using geom_smooth() in R

When plotting two regression curves using geom_smooth() in ggplot2, the fill color shown in the legend is the one where the confidence intervals intersect. I think this behaviour arises when the overlapping area is proportionally bigger than the rest. I find this undesirable, because the reader can already deduce that the "darkened" area is where the CIs intersect, and it is IMHO harder and less intuitive to have the same fill color assigned to both curves.
How can I correct this ?
MWE:
library(ggplot2)
p <- ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length)) + geom_point()
p <- p + geom_smooth(method=loess, aes(colour="Loess"), fill="yellow")
p <- p + geom_smooth(method=lm, aes(colour="LM"))
print(p)
You can add the fill as an aesthetic mapping, ensuring you name it the same as the color mapping to get legends to merge:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Width, y=Sepal.Length)) +
geom_point(aes(shape = "data")) +
geom_smooth(method=loess, aes(colour="Loess", fill="Loess")) +
geom_smooth(method=lm, aes(colour="LM", fill = "LM")) +
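# named values: unnamed ones are matched alphabetically, which would give LM the yellow fill
scale_fill_manual(values = c(Loess = "yellow", LM = "gray"), name = "model") +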
scale_colour_manual(values = c("red", "blue"), name = "model") +
labs(shape = "")

Combine and merge legends in ggplot2 with patchwork (discrete/categorical data)

I arranged 3 ggplot2 plots into a single figure by using the functionality of package patchwork. I tried to collect the legends and they appeared one next to the other. But still, they are 3 separate legends and I expected a single legend. So how can I merge the legends that contain identical values of the same factor variable into a single legend?
Notes:
I do not want to remove the legends of the separate plots by using, e.g., theme(legend.position = "none"), in case some additional factor level appears. I expect a patchwork-specific solution.
A similar question was answered in Combine and merge legends in ggplot2 with patchwork but the data was continuous. And in my case, I have categorical data.
The code:
library(ggplot2)
library(patchwork)
iris_1 <-
ggplot(iris, aes(x = Sepal.Length, fill = Species, color = Species)) +
geom_density(alpha = 0.3, adjust = 1.5)
iris_2 <-
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
geom_point()
iris_3 <-
ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
geom_boxplot()
(iris_1 + iris_2 + iris_3) + plot_layout(guides = "collect")
Created on 2020-10-14 by the reprex package (v0.3.0)
Update
I tried using the same aesthetic mappings (fill = Species and color = Species) as it was proposed in the comments below but it had no effect:
library(tidyverse)
library(patchwork)
iris_1 <-
ggplot(iris, aes(x = Sepal.Length, color = Species, fill = Species)) +
geom_density(alpha = 0.3, adjust = 1.5)
iris_2 <-
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species, fill = Species)) +
geom_point()
iris_3 <-
ggplot(iris, aes(x = Species, y = Sepal.Width, color = Species, fill = Species)) +
geom_boxplot(color = "black")
(iris_1 + iris_2 + iris_3) + plot_layout(guides = "collect")
Unfortunately setting the same aes is only one condition. patchwork will merge legends only if they are identical. Therefore we have to ensure that the legends are the same for each plot. To this end I add a guides layer which makes the look of each legend the same by setting color, shape, size and alpha. Additionally we have to choose the same glyph for each geom using argument key_glyph. After these adjustments the three legends get merged into one.
library(ggplot2)
library(patchwork)
g <- guides(fill = guide_legend(override.aes = list(color = scales::hue_pal()(3),
shape = c(16, 16, 16),
size = c(1, 1, 1),
alpha = c(1, 1, 1))))
iris_1 <-
ggplot(iris, aes(x = Sepal.Length)) +
geom_density(aes(fill = Species, color = Species), key_glyph = "point", alpha = 0.3, adjust = 1.5) +
g
iris_2 <-
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point(aes(fill = Species, color = Species), key_glyph = "point") +
g
iris_3 <-
ggplot(iris, aes(x = Species, y = Sepal.Width)) +
geom_boxplot(aes(fill = Species, color = Species), key_glyph = "point") +
scale_color_manual(values = c("black", "black", "black")) +
g
(iris_1 + iris_2 + iris_3) + plot_layout(guides = "collect")

How to edit legend description in ggplot without getting a second legend?

I had been trying to include a legend in my plot that shows the name of the months next to the line with its respective colour and shape but I can't figure it out.
I have tried using scale_color_hue() but I got two different legends
isop_temp <- ggplot(bio_all_data, aes(t_2m, isop)) +
geom_jitter(aes(shape = month, colour = month, fill = month)) +
geom_smooth(aes(group = month, colour = month), method='lm',
fullrange = T, se = F) +
theme_bw() +
ylim(0, 4.5) +
xlab('temperature °C')+
ylab('Isoprene[ppb]') +
theme(legend.position = "top") +
scale_color_hue(labels = c('February','March','April','May','June'))
And this is what I am getting. What am I missing?
Short answer: you need to add scale_shape() with the same labels.
The issue here is that you map one variable (month) to 3 aesthetics - color, shape and fill. That would give you one legend, but the addition of scale_color_hue() separates the mapping of color and shape.
To illustrate using a reproducible example - we will omit fill because only color is relevant to geom_point. This works as expected:
library(dplyr)
library(ggplot2)
iris %>%
ggplot(aes(Sepal.Length, Petal.Width)) +
geom_point(aes(color = Species, shape = Species))
Now we add scale_color_hue. We get a separate legend because the labels differ from the default labels used when we mapped to shape:
iris %>%
ggplot(aes(Sepal.Length, Petal.Width)) +
geom_point(aes(color = Species, shape = Species)) +
scale_color_hue(labels = LETTERS[1:3])
The simplest fix is to use the same labels in scale_shape. Alternatively you could dplyr::mutate() the data frame to add a column with month name and map to that instead.
iris %>%
ggplot(aes(Sepal.Length, Petal.Width)) +
geom_point(aes(color = Species, shape = Species)) +
scale_color_hue(labels = LETTERS[1:3]) +
scale_shape(labels = LETTERS[1:3])
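The mutate() alternative mentioned above could look roughly like this. It is a sketch on the iris stand-in rather than the asker's data: species_labels, iris_labelled and the label column are hypothetical names, and the letters A to C stand in for the month names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical lookup from the original factor levels to display labels.
species_labels <- c(setosa = "A", versicolor = "B", virginica = "C")

# Add a label column, keeping the display order explicit via factor levels.
iris_labelled <- iris %>%
  mutate(label = factor(species_labels[as.character(Species)],
                        levels = unname(species_labels)))

# Both aesthetics map to the same relabelled column, so no scale_*(labels = )
# call is needed and the legends stay merged.
p <- ggplot(iris_labelled, aes(Sepal.Length, Petal.Width)) +
  geom_point(aes(color = label, shape = label))
```

The trade-off is that the labels now live in the data rather than in the scales, which avoids having to repeat them for every aesthetic.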

ggplot with full data and subset(s) along x-axis

I'll use violin plots here as an example, but the question extends to many other ggplot types.
I know how to subset my data along the x-axis by a factor:
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_violin() +
geom_point(position = "jitter")
And I know how to plot only the full dataset:
ggplot(iris, aes(x = 1, y = Sepal.Length)) +
geom_violin() +
geom_point(position = "jitter")
My question is: is there a way to plot the full data AND a subset-by-factor side-by-side in the same plot? In other words, for the iris data, could I make a violin plot that has both "full data" and "setosa" along the x-axis?
This would enable a comparison of the distribution of a full dataset and a subset of that dataset. If this isn't possible, any recommendations on a better way to visualise this would also be welcome :)
Thanks for any ideas!
Using:
ggplot(iris, aes(x = "All", y = Sepal.Length)) +
geom_violin() +
geom_point(aes(color="All"), position = "jitter") +
geom_violin(data=iris, aes(x = Species, y = Sepal.Length)) +
geom_point(data=iris, aes(x = Species, y = Sepal.Length, color = Species),
position = "jitter") +
scale_color_manual(values = c("black","#F8766D","#00BA38","#619CFF")) +
theme_minimal(base_size = 16) +
theme(axis.title.x = element_blank(), legend.title = element_blank())
gives a violin plot with the full data ("All") alongside each species.
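Another way to get the same layout, sketched here as an alternative rather than as the answer's method, is to stack the data with an extra "All" copy so that a single pair of layers draws everything. The names iris_with_all and group are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Stack two copies of the data: one labelled "All", one labelled by species.
iris_with_all <- bind_rows(
  iris %>% mutate(group = "All"),
  iris %>% mutate(group = as.character(Species))
) %>%
  # Put "All" first on the x-axis, followed by the species in their usual order.
  mutate(group = factor(group, levels = c("All", levels(iris$Species))))

# A single violin layer and a single point layer now cover all four categories.
p <- ggplot(iris_with_all, aes(x = group, y = Sepal.Length)) +
  geom_violin() +
  geom_point(aes(color = group), position = "jitter") +
  theme(axis.title.x = element_blank(), legend.title = element_blank())
```

To compare the full data with only one subset, such as setosa, the second copy could be filtered before stacking.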
