Connect medians with lines in dodged boxplots within subgroups? - r

In the code below, which uses an interaction of two variables to define a group, why does ggplot2 correctly plot separate points for the medians, but complain that each group consists of only one observation when I try to plot lines connecting the medians within each group?
How can I do it?
dat <-
data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"),
value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))
ggplot(dat, aes(var, value)) +
geom_boxplot(aes(fill = method),
position = position_dodge(width = .9),
outlier.color = NA) +
stat_summary(
fun = median,
geom = "point",
aes(group = interaction(method, var)),
position = position_dodge(width = .9)
) +
stat_summary(
fun = median,
geom = "line",
aes(group = interaction(method, var)),
position = position_dodge(width = .9)
)
geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?
Desired output:

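Because ggplot2 treats each combination of method and var as its own group. Each of those groups yields exactly one median, so a line drawn per group has only a single point to work with. ggplot2 doesn't know that, once the medians have been calculated, you'd like to drop the method part of the grouping. I don't think there is an elegant way to get the lines you want.
Below is an inelegant but pragmatic solution: you first calculate the medians in the point layer, then feed that layer's data to a new line geom. Grouping by round(x) maps the dodged x positions back to their var, so each line connects the two medians within one var.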
library(ggplot2)
dat <-
data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"),
value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))
p <- ggplot(dat, aes(var, value)) +
geom_boxplot(aes(fill = method),
position = position_dodge(width = .9),
outlier.color = NA) +
stat_summary(
fun = median,
geom = "point",
aes(group = interaction(method, var)),
position = position_dodge(width = .9)
)
p + geom_line(
data = layer_data(p, 2),
aes(x = x, group = round(x), y = y)
)
Created on 2022-08-14 by the reprex package (v2.0.1)
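As an alternative to reading the medians back out of the built plot, the same idea can be sketched by computing the medians up front and placing them by hand. This is only a sketch under stated assumptions: `med` and its `x` column are names introduced here, and the offsets of -0.225 and +0.225 assume exactly two methods and position_dodge(width = .9), where each box is centred a quarter of the dodge width away from the tick.

```r
library(ggplot2)

dat <- data.frame(
  method = rep(c("A", "B"), each = 10),
  var = rep(rep(c("X01", "X02"), each = 5), times = 2),
  value = c(-1.1, 0.3, 0.3, -0.6, -0.2, 0.7, 1.2, 0.2, 0.3, 1.8,
            0.5, 1.1, -0.8, 0.8, 0.6, 1.0, 0.5, 0.6, 1.5, 1.3)
)

# One median per method/var combination (four rows in total)
med <- aggregate(value ~ method + var, data = dat, FUN = median)

# Assumed manual dodge: tick position of var plus a per-method offset
med$x <- as.numeric(factor(med$var)) +
  ifelse(med$method == "A", -0.225, 0.225)

ggplot(dat, aes(var, value)) +
  geom_boxplot(aes(fill = method),
               position = position_dodge(width = .9),
               outlier.color = NA) +
  # Points and lines use the precomputed positions; lines are grouped by var only
  geom_point(data = med, aes(x = x, y = value)) +
  geom_line(data = med, aes(x = x, y = value, group = var))
```

Because the line layer has its own data, its group can be var alone while the boxplots stay dodged by method. With more than two methods the offsets would have to be recomputed.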

Related

R ggplot2 : geom_jitter and fill, problem to have the dots on the right boxplot

Here's my R code
ggplot(dat = Table, aes(x = Group, y = value, fill = Type)) +
geom_boxplot(alpha=0.08)+
geom_jitter()+
scale_fill_brewer(palette="Spectral")+
theme_minimal()
As you can see, the dots are in the middle of the boxplots. What can I add to geom_jitter to place each point on the right boxplot and not in the middle like this? I also tried geom_point, and it gave the same result.
Thanks to the help, it works now, but I wanted to add a line to connect the dots and I got this. Can someone tell me how to properly connect the dots with lines?
I think if you group by interaction(Group, Type) and use position_jitterdodge() you should get what you're looking for.
ggplot(mtcars, aes(as.character(am), mpg, color = as.character(vs),
group = interaction(as.character(vs), as.character(am)))) +
geom_boxplot() +
geom_jitter(position = position_jitterdodge()) # same output with geom_point()
Edit - here's an example with manual jittering applied to data where each subject appears once in each Group.
I looked for a built-in way to do this, and this answer comes close, but I couldn't get it to work in terms of using position_jitterdodge with position defined by the groups of Group/Type, but line grouping defined by id alone and not by Group/Type. Both aesthetics (position adjustment and series identification) rely on the same group parameter, but they each need a different value for it.
Table = data.frame(id = 1:4,
value = rnorm(8),
Group = rep(c("a","b"), each = 4),
Type = c("1", "2"))
library(dplyr)
Table %>%
mutate(x = as.numeric(as.factor(Group)) +
0.2 * scale(as.numeric(as.factor(Type))) +
rnorm(n(), sd = 0.06)) %>%
ggplot(aes(x = Group, y = value, fill = Type, group = interaction(Group, Type))) +
geom_boxplot(alpha=0.2)+
geom_point(aes(x = x)) +
geom_line(aes(x = x, group = id), alpha = 0.1) +
scale_fill_brewer(palette="Spectral")+
theme_minimal()
Best to use position_dodge instead if you want them to line up:
library(ggplot2)
Table <- tibble::tibble(
Group = rep(c("A", "B"), each = 20),
Type = factor(rep(c(1:2, 1:2), each = 10)),
value = rnorm(40, mean = 10)
)
ggplot(data = Table, aes(x = Group, y = value, fill = Type)) +
geom_boxplot(alpha=0.08)+
geom_point(position = position_dodge(width = 0.75))+
scale_fill_brewer(palette="Spectral")+
theme_minimal()
To add a line, make sure group = ID goes in both the geom_point and geom_line calls:
library(ggplot2)
Table <- tibble::tibble(
Group = rep(c("A", "B"), each = 20),
Type = factor(rep(c(1:2, 1:2), each = 10)),
ID = factor(rep(1:20, times = 2)),
value = rnorm(40, mean = 10)
)
ggplot(data = Table, aes(x = Group, y = value, fill = Type)) +
geom_boxplot(alpha = 0.08) +
geom_point(aes(group = ID), position = position_dodge(width = 0.75))+
geom_line(aes(group = ID), position = position_dodge(width = 0.75), colour = "grey")+
scale_fill_brewer(palette = "Spectral") +
theme_minimal()

Is there a way to facet a line graph but add a constant line in R?

I am making a line graph with ggplot from data where I size objects in each treatment group into bins and look at the percentage in each size bin by group.
My goal is to facet the data so that the control line has its own panel but is also added to the panels of the other groups.
I have the graph working: I get all of my groups, one of which is my control, and I can facet it. I would just like to keep the control line on the other panels, slightly grayed out. I am not sure how to add that.
graph <- ggplot(data=my_data, aes(x=bins, y=p_per_bin, group = group,
color = group, linetype = group, shape = group))+
stat_summary(fun = "mean", geom = "line", lwd = rel(1))+
stat_summary(fun = mean,
geom = "pointrange",
fun.max = function(x) mean(x) + sd(x) / sqrt(length(x)),
fun.min = function(x) mean(x) - sd(x) / sqrt(length(x)))+
stat_summary(fun = "mean", geom = "point", size = rel(2), fill = "white", stroke = rel(1.1))
graph + facet_wrap(~group)
Here is a portion of my data as a sample. Group "a" is the control.
sample <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8,9,9,9)
group <- c("a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b",
"b","c","c","c","c","c","c","c","c","c")
bins <-c("0-20","20-40","40-60","0-20","20-40","40-60","0-20","20-40","40-60","0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60", "0-20","20-40","40-60")
p_per_bin <- c(0,37.7192982,21.0526316,0,36.744186,23.7209302,0,36.2126246,31.5614618,
0,31.25,27.0833333,0,41.2280702,28.5087719,0,39.6078431,31.372549,0,43.7262357,
20.1520913,0,35.4716981,21.1320755,0,38.5350318, 29.9363057)
# data.frame() keeps p_per_bin numeric; cbind() would return a character matrix
my_data <- data.frame(sample, group, bins, p_per_bin)
One option would be to add your reference group via an additional stat_summary for which you only use the data on the reference group. To get this layer displayed on each facet it's important to drop the group column after filtering.
For the example code I have chosen group "a" as the reference group:
library(ggplot2)
library(dplyr)
ggplot(data = my_data, aes(
x = bins, y = p_per_bin, group = group,
color = group, linetype = group, shape = group
)) +
#### Add line for reference group
stat_summary(fun = "mean", geom = "line", lwd = rel(.5),
data = ~filter(.x, group == "a") |> select(-group),
color = "grey45", linetype = "solid") +
####
stat_summary(fun = "mean", geom = "line", lwd = rel(1)) +
stat_summary(
fun = mean,
geom = "pointrange",
fun.max = function(x) mean(x) + sd(x) / sqrt(length(x)),
fun.min = function(x) mean(x) - sd(x) / sqrt(length(x))
) +
stat_summary(fun = "mean", geom = "point", size = rel(2), fill = "white", stroke = rel(1.1)) +
facet_wrap(~group)
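The key point, that a layer whose data lacks the faceting column is drawn in every panel, can also be shown in a reduced form. The sketch below is not the code above: it keeps only x and y in the plot-level mapping so that the reference layer inherits nothing that refers to group, and `ref` is a name introduced here for illustration.

```r
library(ggplot2)

my_data <- data.frame(
  group = rep(c("a", "b", "c"), each = 9),
  bins = rep(c("0-20", "20-40", "40-60"), times = 9),
  p_per_bin = c(0, 37.7192982, 21.0526316, 0, 36.744186, 23.7209302,
                0, 36.2126246, 31.5614618, 0, 31.25, 27.0833333,
                0, 41.2280702, 28.5087719, 0, 39.6078431, 31.372549,
                0, 43.7262357, 20.1520913, 0, 35.4716981, 21.1320755,
                0, 38.5350318, 29.9363057)
)

# Control rows only, without the faceting column, so the layer is repeated per panel
ref <- my_data[my_data$group == "a", c("bins", "p_per_bin")]

p <- ggplot(my_data, aes(x = bins, y = p_per_bin)) +
  # Grey reference line: one constant group, because x is discrete
  stat_summary(data = ref, aes(group = 1), fun = mean,
               geom = "line", colour = "grey60") +
  # Coloured line for the group that owns each panel
  stat_summary(aes(group = group, colour = group), fun = mean, geom = "line") +
  facet_wrap(~group)
p
```

In the panel for group "a" the grey and coloured lines coincide, which is expected since both summarise the same rows.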

Connecting means with stat_summary (geom = 'line') within ticks on the x-axis

I am trying to create a plot in ggplot2 similar to this one:
Here is the code I am using:
Dataset %>%
group_by(Participant, Group, Emotion) %>%
ggplot(aes(y = Score, x = Emotion, fill = Group, colour = Group)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .4) +
geom_point(aes(y = Score, color = Group), position = position_jitter(width = .15), size = 3, alpha = 0.4) +
stat_summary(aes(y = Score, group = Emotion), fun.y = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48') +
stat_summary(fun = mean, geom = 'pointrange', width = 0.2, size = 2, alpha = 1.2, position=position_dodge(width=0.3)) +
stat_summary(fun.data = mean_se, geom='errorbar', width = 0.25, size = 2.2, alpha = 1.2, linetype = "solid",position=position_dodge(width=0.3)) +
guides(color = FALSE) +
scale_color_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Dark2") +
ylim(0, 100) +
graph_theme
What I am failing to do is set up the stat_summary(geom = 'line') to connect the green and orange means within each emotion on the x-axis. Could anyone give any pointers on this? I'd also like all the other features to stay the same if possible (e.g., I wouldn't like to use facet_grid or facet_wrap).
Thank you!
When I change the group argument in stat_summary to 'Group' instead of 'Emotion', means for each group are connected across emotions, but I can't figure out how to connect means of different groups within each emotion:
This is a tricky one because your line needs to connect points that have different x values, but even if you jitter or dodge in the point layer, the points still technically have the same x value, so the line doesn't know how to connect them. What others have done is to manually add the offset to force the points to have different x positions. For more inspiration check out this, this and this. Here's an example:
library(tidyverse)
set.seed(1)
emotion <- c("anger", "fear", "sadness")
group <- letters[1:2]
participant <- 1:10
dat <- expand_grid(emotion, group, participant) %>%
mutate(across(everything(), as.factor),
score = sample(x = 1:100, size = nrow(.), replace = T))
dat %>%
mutate(new_emot = case_when(
group == "a" ~as.numeric(emotion) - 0.125,
group == "b" ~as.numeric(emotion) + 0.125
)) %>%
ggplot(aes(x = emotion, y = score)) +
stat_summary(aes(color = group), fun = mean, geom = "point", position = position_dodge(width = 0.5)) +
stat_summary(aes(color = group), fun.data = mean_se, geom = "errorbar", width = 0.5, position = position_dodge(width = 0.5)) +
stat_summary(aes(x = new_emot, group = emotion), fun = mean, geom = "line") +
theme_bw()
Created on 2021-03-24 by the reprex package (v1.0.0)
Setting geom_line to the same position as pointrange and errorbar will solve the problem.
i.e.,
stat_summary(aes(y = Score, group = Emotion), fun.y = mean, geom="line", size = 2.2, alpha = 1.2, width = 0.25, colour = 'gray48', position=position_dodge(width=0.3))

Problem with jittered data points in geom_boxplot

I have created a boxplot using the following code -
ggplot(xray50g, aes(x = Company, y = DefScore, label = Batch,
label2 = PercentPopAff, label3 = AvVertAff,
label4 = EggsPerLitreReceiving)) +
geom_boxplot() +
geom_point(aes(colour = Ploidy), size = 0.5) +
geom_jitter() +
# USE ENVSTATS PACKAGE TO INCLUDE SAMPLE SIZE
stat_n_text(size = 3) +
# INCLUDE MEAN VALUES
stat_summary(fun = mean, geom = "point", shape = 4, size = 2, color = "black") +
stat_summary(fun = mean, colour = "black", geom = "text", size = 3, show.legend = FALSE,
hjust = -0.35, vjust = -0.5, aes( label = round(..y.., digits = 2)))
I wanted to spread the data points out a little; however, when I use geom_jitter it seems to blur all the data points together and ruin the chart (see image).
Any help with this would be greatly appreciated.
You can use the width argument of geom_jitter to control how much the points are spread along the x-axis. I'd also recommend making the jittered points transparent (alpha argument) and stopping geom_boxplot from plotting the outliers with the outlier.shape argument (as those points will also be plotted by the jitter layer). Try the following:
ggplot(xray50g, aes(x = Company, y = DefScore, label = Batch,
label2 = PercentPopAff, label3 = AvVertAff,
label4 = EggsPerLitreReceiving)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(alpha = 0.25, width = 0.1)
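Since xray50g is not available here, the same settings can be tried on the mpg dataset that ships with ggplot2. This is a sketch under that substitution. It additionally sets height = 0 and a seed through position_jitter(), which the answer above does not do: by default jittering also moves points vertically, and a seed makes the horizontal spread reproducible.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Hide boxplot outliers; the point layer already draws every observation
  geom_boxplot(outlier.shape = NA) +
  # Horizontal jitter only, reproducible through the seed
  geom_point(
    position = position_jitter(width = 0.1, height = 0, seed = 42),
    alpha = 0.25
  )
p
```

geom_point() with an explicit position is used instead of geom_jitter() because geom_jitter() does not accept both a width argument and a position object at the same time.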

Is there a way to present multiple means in ggplot in r

This is the code that I am using:
p1 <- ggplot(df_xy, aes(as.factor(x = Vehicle), y = accuracy)) +
geom_boxplot(aes(fill = Analyzer)) +
stat_boxplot(aes(fill = Analyzer), geom = 'errorbar') +
stat_summary(fun.y = "mean", geom = "point", shape = 20, size = 2, color = "red", fill = "red")
I present accuracy of 2 different analyzers vs vehicle type on x-axis.
I have 2 boxplots (not for all vehicles) and I want to show also two mean values with dots (and not only median). How can I do that? Below is example of my plot.
It would be helpful if you could include some sample data so that we could verify an answer. However, it looks like you need to tell stat_summary how to group the data for the mean calculation and then to space the points horizontally. The code probably should be
p1 <- ggplot(df_xy, aes(x = as.factor(Vehicle), y = accuracy)) +
geom_boxplot(aes(fill = Analyzer)) +
stat_boxplot(aes(fill = Analyzer), geom = 'errorbar') +
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
stat_summary( aes(group = Analyzer), fun = "mean", geom = "point",
shape = 20, size = 2, color = "red", fill = "red", position = position_dodge(width = .75) )
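Because no sample data was posted, here is a runnable sketch of the same pattern on the built-in mtcars dataset, with cyl standing in for Vehicle and am for Analyzer (both substitutions are assumptions made for illustration only).

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(aes(fill = factor(am))) +
  # group tells stat_summary to compute one mean per box;
  # the dodge width of 0.75 matches the default boxplot width
  stat_summary(
    aes(group = factor(am)),
    fun = mean, geom = "point",
    shape = 20, size = 2, colour = "red",
    position = position_dodge(width = 0.75)
  )
p
```

Without the group aesthetic, stat_summary would compute a single mean per x value and draw it in the middle of the two boxes.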
