How to insert grouped median segments in a violin plot in ggplot2

I'd like to insert median lines for factor levels into a violin plot in ggplot2. Here's some reproducible data:
set.seed(12)
FactorVar <- sample(LETTERS[1:5], 500, replace = TRUE)
NumericVar <- abs(rnorm(500))
df <- data.frame(FactorVar, NumericVar)
To get the grouped medians I use tapply:
medians <- tapply(df$NumericVar, df$FactorVar, FUN = median)
And this is the code for the plot. As can be seen, I'm inserting each median line individually. That's cumbersome and uneconomical:
library(ggplot2)
g <-
ggplot(data = df,
aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
geom_point(aes(y = NumericVar),
position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
geom_hline(yintercept = mean(df$NumericVar), color = "blue", size = 0.8, linetype = 4) +
geom_segment(x = 0.5, xend = 1.5, y = medians[1], yend = medians[1], color = "red", linetype = 2) +
geom_segment(x = 1.5, xend = 2.5, y = medians[2], yend = medians[2], color = "red", linetype = 2) +
geom_segment(x = 2.5, xend = 3.5, y = medians[3], yend = medians[3], color = "red", linetype = 2) +
geom_segment(x = 3.5, xend = 4.5, y = medians[4], yend = medians[4], color = "red", linetype = 2) +
geom_segment(x = 4.5, xend = 5.5, y = medians[5], yend = medians[5], color = "red", linetype = 2) +
guides(fill = "none", color = "none") +
coord_flip() +
theme_gray(); g
How can the median segments be inserted in a single command? Also, notice that the median line for factor A is thinner than the others. Why is that?

One method (which simplifies the x/xend positions of the segments) is to facet the plot. First, though, we need to put the medians into a data frame, preferably with the same grouping variable name as the original.
mediansdf <- data.frame(FactorVar=names(medians), NumericVar=medians)
g <-
ggplot(data = df,
aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
geom_point(aes(y = NumericVar),
position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
geom_hline(yintercept = mean(df$NumericVar), color = "blue", size = 0.8, linetype = 4) +
guides(fill = "none", color = "none") +
coord_flip() +
theme_gray() +
facet_grid(FactorVar~., scales="free") +
geom_segment(aes(x = 0.5, xend = 1.5, yend = NumericVar), color = "red", linetype = 2, data = mediansdf)
g
This example reused the y aesthetic, but since we have a different frame, we could easily use different names (and specify them within aes(...)). One advantage of using the same variable names is, in my opinion, clearer declarative code.
Since facet_grid adds the factor label on the right side, you could likely remove it from the axis. Note that if you do not use scales = "free", you'll see all factors in each facet, which is distracting and unnecessary.
The reason I am suggesting facets is that they make x and xend simple and relative to a single violin, so 0.5 to 1.5; otherwise, as you saw, you have to assume which factor level ends up at which integer position.
Last, for me the thinner red line appeared only in the raster plot window. If you save to a vector-based format (e.g., PDF), the lines appear to have the same thickness.
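An alternative that avoids faceting, sketched here under the assumption that the factor levels appear on the axis in alphabetical order (the default for a character column, and the order tapply returns): ggplot2 draws discrete levels at x = 1, 2, ..., so a small data frame holding each level's median and integer position lets a single geom_segment layer draw every median line. The segs data frame and its column names are illustrative, not taken from the posts above.

```r
library(ggplot2)

set.seed(12)
FactorVar <- sample(LETTERS[1:5], 500, replace = TRUE)
NumericVar <- abs(rnorm(500))
df <- data.frame(FactorVar, NumericVar)

# Medians per level; tapply returns them in sorted level order (A, B, ..., E)
medians <- tapply(df$NumericVar, df$FactorVar, FUN = median)

# One row per level: its median and the integer position ggplot2 gives it on a
# discrete axis (assumed alphabetical, matching the tapply order above)
segs <- data.frame(FactorVar = names(medians),
                   median = as.vector(medians),
                   pos = seq_along(medians))

g <- ggplot(df, aes(x = FactorVar, y = NumericVar, fill = FactorVar)) +
  geom_violin(scale = "count", trim = FALSE, adjust = 0.75) +
  geom_point(position = position_jitter(width = .15), size = 0.9, alpha = 0.8) +
  # A single layer draws all medians; each segment spans pos - 0.5 to pos + 0.5
  geom_segment(data = segs,
               aes(x = pos - 0.5, xend = pos + 0.5, y = median, yend = median),
               inherit.aes = FALSE, color = "red", linetype = 2) +
  guides(fill = "none") +
  coord_flip()
```

Print it with g. Mixing numeric positions into a discrete axis is the same thing the hard-coded geom_segment(x = 0.5, ...) calls in the question rely on; the data frame just generates those positions instead of typing them out.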

Related

R: Reduce the distance between the axis and geom_density_ridges after flipping the plot with coord_flip in ggplot2

First we prepare some toy data that sufficiently resembles the data I am working with.
library(ggplot2)
library(ggridges)  # geom_density_ridges()
rawdata <- data.frame(Score = rnorm(1000, seq(1, 0, length.out = 10), sd = 1),
Group = rep(LETTERS[1:3], 10000))
stdev <- c(10.78, 10.51, 9.42)
col <- c("#1b9e77", "#d95f02", "#7570b3")  # hypothetical palette; col is not defined in the original post
Now we plot the estimated densities via geom_density_ridges, add a grey highlight around zero via geom_rect, and flip the chart with coord_flip.
p <- ggplot(rawdata, aes(x = Score, y = Group)) +
scale_y_discrete() +
geom_rect(inherit.aes = FALSE, mapping = aes(ymin = 0, ymax = Inf, xmin = -0.1 * min(stdev), xmax = 0.1 * max(stdev)),
fill = "grey", alpha = 0.5) +
geom_density_ridges(aes(fill = Group), scale = 0.5, size = 1, alpha=0.5) +
scale_color_manual(values = col) +
scale_fill_manual(values = col) +
labs(title="Toy Graph", y="Group", x="Value") +
coord_flip(xlim = c(-8, 8), ylim = NULL, expand = TRUE, clip = "on")
p
And this is the result I get, which is close to what I was expecting, except for the enormous gap between the y axis and the start of the first factor, A, on the x axis. I tried using expand = c(0, 0) inside scale_y_discrete(), following suggestions from other posts, but it does not make the gap any smaller. If possible, I would still like a small gap. I have also been trying to flip the densities along the y axis so that the gap is filled by the first factor's density plot, but I have been unsuccessful, as it does not seem as trivial as one might expect.
Sorry, I know this is technically two different questions: "How to reduce the gap from the y axis to the first density plot?" and "How to flip the densities along the y axis to reduce the gap?" But I would be happy with an answer to the first one, as I understand the second seems less straightforward.
Thanks in advance! Any help is appreciated.
Flipping the densities also effectively reduces the space, so this might be all you need to do. You can achieve it with a negative scale parameter:
ggplot(rawdata, aes(x = Score, y = Group)) +
scale_y_discrete() +
geom_rect(inherit.aes = FALSE,
mapping = aes(ymin = 0, ymax = Inf,
xmin = -0.1 * min(stdev),
xmax = 0.1 * max(stdev)),
fill = "grey", alpha = 0.5) +
geom_density_ridges(aes(fill = Group), scale = -0.5, size = 1, alpha = 0.5) +
scale_color_manual(values = col) +
scale_fill_manual(values = col) +
labs(title = "Toy Graph", y = "Group", x = "Value") +
coord_flip(xlim = c(-8, 8), ylim = NULL, expand = TRUE, clip = "on")
If you want to keep the densities pointing the same way but just reduce space on the left side, simply set hard limits in your coord_flip, with no expansion:
ggplot(rawdata, aes(x = Score, y = Group)) +
geom_rect(inherit.aes = FALSE,
mapping = aes(ymin = 0, ymax = Inf,
xmin = -0.1 * min(stdev),
xmax = 0.1 * max(stdev)),
fill = "grey", alpha = 0.5) +
geom_density_ridges(aes(fill = Group), scale = 0.5, size = 1, alpha = 0.5) +
scale_color_manual(values = col) +
scale_fill_manual(values = col) +
scale_y_discrete() +
labs(title = "Toy Graph", y = "Group", x = "Value") +
coord_flip(xlim = c(-8, 8), ylim = c(0.8, 4), expand = FALSE)

How to get overlapped rectangular bars in ggplot?

I am trying to create three layers of rectangles, each with a different color, on top of each other, to get something like the image below:
Data:
library(tidyverse)
library(scales)    # unit_format()
library(ggthemes)  # theme_clean()
df_vaccination <- data.frame(type = c('Population', 'First.Dose.Administered', 'Second.Dose.Administered'),
count = c(1366400000, 952457943, 734608556))
Code tried:
df_vaccination %>%
ggplot()+
geom_rect(aes(xmin = 0, ymin = 0, xmax = count, ymax = 0,
size = 10, lineend = 'round',
alpha = 0.5, fill = type)) +
scale_fill_manual(values = c("#d8b365", "orange", "#5ab4ac")) +
theme_clean() +
scale_x_continuous(labels = unit_format(scale = 1e-7, unit = "Cr")) +
guides(color = guide_legend(order = 1),
size = FALSE,
alpha = FALSE)
The result is a blank plot when I use geom_rect() with scale_fill_manual(). I am not sure why I get blank rectangles:
Convert the type column to a factor whose levels follow the row order, so that the largest value is drawn first, then use geom_col with x = 1 and position = "identity" so the bars are drawn on top of each other rather than stacked end to end (geom_col stacks by default). Finally, flip the coordinates:
df_vaccination$type <- factor(df_vaccination$type, levels = df_vaccination$type)
ggplot(df_vaccination, aes(x = 1, y = count, fill = type)) +
geom_col(position = "identity") +
scale_fill_manual(values = c("#d8b365", "orange", "#5ab4ac")) +
coord_flip() +
theme_void()
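The blank plot in the question comes from ymin and ymax both being 0, which gives every rectangle zero height; size and alpha are also placed inside aes(), where they are mapped to data (hence the extra guides) rather than set. A minimal sketch that keeps the question's geom_rect approach: give the rectangles a non-zero height, set alpha outside aes(), and order the rows so the largest rectangle is drawn first, assuming geom_rect draws rows in data order. The ymax = 1 height and theme_minimal() are arbitrary choices.

```r
library(ggplot2)

df_vaccination <- data.frame(
  type = c("Population", "First.Dose.Administered", "Second.Dose.Administered"),
  count = c(1366400000, 952457943, 734608556)
)

# Keep the data order as the level order so the legend follows it
df_vaccination$type <- factor(df_vaccination$type, levels = df_vaccination$type)
# Put the largest rectangle in the first row so it ends up underneath the others
df_vaccination <- df_vaccination[order(df_vaccination$count, decreasing = TRUE), ]

p <- ggplot(df_vaccination) +
  # ymin and ymax must differ; with ymin = ymax = 0 every rectangle has zero height.
  # alpha is set outside aes(), so it is a fixed setting, not a mapping.
  geom_rect(aes(xmin = 0, xmax = count, ymin = 0, ymax = 1, fill = type),
            alpha = 0.8) +
  scale_fill_manual(values = c("#d8b365", "orange", "#5ab4ac")) +
  theme_minimal()
```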

Custom color for each group + category combination in a raincloud plot

I have a raincloud plot:
but I would like each combination of TL group and yr to be a different color, as one can do in base boxplot():
I have tried using the following code for the raincloud plot:
Y_C_rain= ggplot(yct_rain, aes(y=d13C, x=lengthcat,fill = yr,color=yr)) +
geom_flat_violin(position = position_nudge(x = .2, y =0), alpha = .8)+
geom_point(aes(y = , color = yr),
position = position_jitter(width = .05), size = 2, alpha = .5) +
geom_boxplot(width = .3, guides = FALSE, outlier.shape = NA, alpha = 0, notch = FALSE) +
stat_summary(fun= mean, geom = "point", shape = 21, size = 3, fill = "black") +
scale_y_continuous (limits = c(-35,-10),expand = c(0,0),breaks=seq(-35,-10,5)) +
ylab("d13C") + xlab("TL group") +
ggtitle("YCT d13C") +
theme_bw() +
scale_colour_discrete(my_clrs_yct)+
scale_fill_discrete(my_clrs_yct)
Y_C_rain
I know that the colors in the raincloud plot will need to be coded with some variant of scale_fill_xxx, but I am hitting a roadblock, since it appears that each point also needs its own color. Therefore, the variants of scale_fill_xxx with only six colors listed are not working.
Do you want something like this?
library(dplyr)
library(data.table)
library(ggplot2)
# used geom_flat_violin from https://gist.github.com/dgrtwo/eb7750e74997891d7c20
my_clrs_yct <- c("#404040", "#407a8c", "#7a7a7a", "#404f86", "#a6a6a6", "#3e1451")  # needs one color per status x season combination present in the data
## used storms from dplyr as reproducible example
data("storms")
setDT(storms)
storms[, season:= factor(ifelse(month <=6, "Q12", "Q34"))]
ggplot(storms, aes(x=status, y=pressure, color=interaction(status, season),
fill=interaction(status, season))) +
geom_point(aes(color = interaction(status, season)),
position = position_jitterdodge(
jitter.width=.1, dodge.width=.25), size = 2, alpha = .5)+
geom_flat_violin(position = position_nudge(x = .5, y =0), alpha = .5)+
geom_boxplot(width = .3, outlier.shape = NA, alpha = 0)+
stat_summary(fun = mean, geom = "point", shape = 21, size = 3,
fill = "black", position = position_nudge(x = c(-.075,.075), y =0)) +
theme_bw() +
scale_colour_manual(values=my_clrs_yct) +
scale_fill_manual(values=my_clrs_yct)
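The key in the answer above is interaction(): it turns two grouping variables into one factor with a level per combination, so a manual scale with one value per level colors each combination separately. A base-R sketch with hypothetical stand-ins for the question's lengthcat and yr columns (yct_rain itself is not shown in the post):

```r
# Hypothetical stand-ins for the question's lengthcat and yr columns:
# three length groups observed in two years
lengthcat <- factor(rep(c("small", "medium", "large"), times = 2),
                    levels = c("small", "medium", "large"))
yr <- factor(rep(c("2019", "2020"), each = 3))

# One factor level per lengthcat x yr combination
combo <- interaction(lengthcat, yr)

# Name the six colors by combination so each one is tied to a specific level
my_clrs_yct <- c("#404040", "#407a8c", "#7a7a7a", "#404f86", "#a6a6a6", "#3e1451")
names(my_clrs_yct) <- levels(combo)
my_clrs_yct
```

In the plot, the mappings would then be fill = interaction(lengthcat, yr) and color = interaction(lengthcat, yr), together with scale_fill_manual(values = my_clrs_yct) and scale_colour_manual(values = my_clrs_yct). Naming the palette ties each color to a specific combination rather than to the position of the level.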

removing part of a fill legend in ggplot

I am having difficulty with a plot: I want to remove part of a fill legend in a ggplot plot while keeping the automatic coloring. Here is an example:
library(ggplot2)
df1 <- data.frame(x = 1:20,y1 = rnorm(20,2,0.2),y2 = sqrt(1:20))
df2 <- data.frame(x1 = c(1,5,10),x2 = c(5,10,20),color2 = as.factor(1:3))
ggplot(data=df1) +
geom_rect(data = df2,
aes(xmin = x1,
xmax = x2,
ymin = 0,
ymax = Inf,
fill = color2),
color = "black",
size = 0.3,
alpha = 0.2)+
geom_bar(aes(x = x,
y= y1,
fill = "daily"),
stat='identity',
width = 0.75,
size = 0.1,
alpha = 0.5) +
geom_line(aes(x = x,
y =y2,
color = "somthing"),
size = 1.5)
I would like to:
keep only the daily entry of the fill legend
keep the automated filling based on the color2 for the geom_rect
ideally, merge the two legends (color and fill) into one
I have been playing around with scale_fill_manual and guides, but I did not come up with anything that works. I suspect the solution could be to make two independent layers and combine them, but I don't know how to do that.
Does anyone know how to do this?
Remember that you can set the breaks on any scale, so just set a single break at "daily" on your fill scale. To make it read as one legend together with the color scale (if I understand your meaning), drop the fill legend's title and pull the two legends closer together with a negative legend.margin:
ggplot(data=df1) +
geom_rect(data = df2,
aes(xmin = x1,
xmax = x2,
ymin = 0,
ymax = Inf,
fill = color2),
color = "black",
size = 0.3,
alpha = 0.2)+
geom_bar(aes(x = x,
y= y1,
fill = "daily"),
stat='identity',
width = 0.75,
size = 0.1,
alpha = 0.5) +
geom_line(aes(x = x,
y =y2,
color = "somthing"),
size = 1.5) +
scale_fill_discrete(breaks = "daily", name = NULL) +
scale_color_discrete(name = "labels") +
theme(legend.margin = margin(0, 0, -10, 0))
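Setting breaks works here because breaks only decide which keys the legend shows; the mapping from data values to colors is governed by the scale's limits, which stay unchanged. A small check on the question's df2, using layer_data() to look at the fill values ggplot2 actually computed (the ymax = 1 height is an arbitrary choice for this sketch):

```r
library(ggplot2)

df2 <- data.frame(x1 = c(1, 5, 10), x2 = c(5, 10, 20), color2 = as.factor(1:3))

p <- ggplot(df2) +
  geom_rect(aes(xmin = x1, xmax = x2, ymin = 0, ymax = 1, fill = color2)) +
  # breaks only chooses which keys the legend shows; mapping still uses all levels
  scale_fill_discrete(breaks = "2")

# Fill colors ggplot2 computed for the three rectangles
fills <- layer_data(p)$fill
length(unique(fills))
```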

How to merge legends for color and shape when geom_hline has a separate (additional) entry in the color legend?

I have the following code, which produces the following plot:
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P"), show_guide = FALSE) +
scale_color_manual(name="Approach", breaks=c("C2P", "P2P", "CP2P"), values = cols[c(1,3,2)]) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4 <- p4 + guides(colour=guide_legend(override.aes=list(shape=c(NA,17,16))))
p4
When I try show_guide = FALSE in geom_point, the shapes of the points in the upper legend all revert to the default solid circle.
How can I make the lower legend to disappear, without affecting the upper legend?
This is a solution, complete with reproducible data:
library("ggplot2")
library("grid")
library("RColorBrewer")
cp2p <- data.frame(xval = 10 * 2:15, yval = cumsum(c(0.55, rnorm(13, 0.01, 0.005))), Approach = "CP2P", stringsAsFactors = FALSE)
p2p <- data.frame(xval = 10 * 1:15, yval = cumsum(c(0.7, rnorm(14, 0.01, 0.005))), Approach = "P2P", stringsAsFactors = FALSE)
pd <- position_dodge(0.1)
cp.best <- list(slope = 0.65)
all.m <- rbind(p2p, cp2p)
all.m$Approach <- factor(all.m$Approach, levels = c("C2P", "P2P", "CP2P"))
all.m$se <- rnorm(29, 0.1, 0.02)
all.m[nrow(all.m) + 1, ] <- all.m[nrow(all.m) + 1, ] # Creates a new row filled with NAs
all.m$Approach[nrow(all.m)] <- "C2P"
cols <- brewer.pal(n = 3, name = 'Dark2')
p4 <- ggplot(all.m, aes(x=xval, y=yval, colour = Approach, ymax = 0.95)) + theme_bw() +
geom_errorbar(aes(ymin= yval - se, ymax = yval + se), width=5, position=pd) +
geom_line(position=pd) +
geom_point(aes(shape=Approach, colour = Approach), size = 4, na.rm = TRUE) +
geom_hline(aes(yintercept = cp.best$slope, colour = "C2P")) +
scale_color_manual(values = c(C2P = cols[1], P2P = cols[2], CP2P = cols[3])) +
scale_shape_manual(values = c(C2P = NA, P2P = 16, CP2P = 17)) +
scale_y_continuous(breaks = seq(0.4, 0.95, 0.05), "Test AUROC") +
scale_x_continuous(breaks = seq(10, 150, by = 20), "# Number of Patient Samples in Training")
p4 <- p4 + theme(legend.direction = 'horizontal',
legend.position = 'top',
plot.margin = unit(c(5.1, 7, 4.5, 3.5)/2, "lines"),
text = element_text(size=15), axis.title.x=element_text(vjust=-1.5), axis.title.y=element_text(vjust=2))
p4
The trick is to make sure that all of the desired levels of all.m$Approach appear in all.m, even if one of them gets dropped out of the graph. The warning about the omitted point is suppressed by the na.rm = TRUE argument to geom_point.
Short answer:
Just add a dummy geom_point layer (transparent points) where shape is mapped to the same level as in geom_hline.
geom_point(aes(shape = "int"), alpha = 0)
Longer answer:
Whenever possible, ggplot merges legends of different aesthetics. For example, if colour and shape are mapped to the same variable, the two legends are combined into one.
I illustrate this using a simple data set with 'x', 'y', and a grouping variable 'grp' with two levels:
df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))
First we map both color and shape to 'grp'
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4)
Fine, the legends for the aesthetics, color and shape, are merged into one.
Then we add a geom_hline. We want it to have a separate color from the geom_lines and to appear in the legend. Thus, we map color to a variable, i.e. put color inside aes of geom_hline. In this case we do not map the color to a variable in the data set, but to a constant. We may give the constant a desired name, so we don't need to rename the legend entries afterwards.
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int"))
Now two legends appear: one for the color aesthetic of geom_line and geom_hline, and one for the shape of geom_point. The reason is that the "variable" that color is mapped to now contains three levels: the two levels of 'grp' in the original data, plus the level 'int' introduced in the geom_hline aes. Thus, the levels of the color scale differ from those of the shape scale, and by default ggplot can't merge the two scales into one legend.
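To see the mismatch directly, you can read back the keys of each legend. This sketch assumes ggplot2 3.5.0 or later, which added get_guide_data(); with older versions it will not run.

```r
library(ggplot2)

df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))

p <- ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
  geom_line() +
  geom_point(size = 4) +
  geom_hline(aes(yintercept = 2.5, color = "int"))

# Labels shown in each legend: colour picks up "int" from geom_hline, shape does not
colour_labels <- get_guide_data(p, "colour")$.label
shape_labels <- get_guide_data(p, "shape")$.label
colour_labels
shape_labels
```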
How to combine the two legends?
One possibility is to introduce the same additional level for shape as for color, using a dummy geom_point layer with transparent points (alpha = 0), so that the two aesthetics contain the same levels:
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
geom_point(aes(shape = "int"), alpha = 0) # <~~~~ a blank geom_point
Another possibility is to convert the original grouping variable to a factor, and add the "geom_hline level" to the original levels. Then use drop = FALSE in scale_shape_discrete to include "unused factor levels from the scale":
df$grp <- factor(df$grp, levels = c(unique(df$grp), "int"))
ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
geom_line() +
geom_point(size = 4) +
geom_hline(aes(yintercept = 2.5, color = "int")) +
scale_shape_discrete(drop = FALSE)
Then, as you already know, you may use the guides function to "override" the shape aesthetics in the legend, and remove the shape from the geom_hline entry by setting it to NA:
guides(colour = guide_legend(override.aes = list(shape = c(16, 17, NA))))
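Putting the factor-level approach and the guides() call together, a complete sketch on the toy df from above. It assumes, as the text says, that the default shapes for the first two levels are 16 and 17; the override only changes the legend keys, not the plotted points.

```r
library(ggplot2)

df <- data.frame(x = rep(1:2, 2), y = 1:4, grp = rep(c("a", "b"), each = 2))
# Add the geom_hline level to the grouping factor so both scales can know about it
df$grp <- factor(df$grp, levels = c(unique(df$grp), "int"))

p <- ggplot(data = df, aes(x = x, y = y, color = grp, shape = grp)) +
  geom_line() +
  geom_point(size = 4) +
  geom_hline(aes(yintercept = 2.5, color = "int")) +
  # Keep the unused "int" level in the shape scale so it matches the colour scale
  scale_shape_discrete(drop = FALSE) +
  # Hide the point symbol in the legend key of the hline entry
  guides(colour = guide_legend(override.aes = list(shape = c(16, 17, NA))))
```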
