Shaded violin plot by group - r

I'm trying to produce a variation of a grouped violin plot in R (preferably using ggplot2), similar to the one below:
which was produced by the following reproducible example code:
# Load libraries #
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
Y = rgamma(n = 200, shape = 2, rate = 2),
Z = rep(c("Za", "Zb"), times = 100),
stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
ggplot(., aes(x = X, y = Y, fill = Z)) +
geom_violin(draw_quantiles = 0.5) +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
The variation I'd like to have is that the density above the median should have a different shade compared to the density below the median, as in the following plot:
I produced the above (single) violin plot for the combination X = X1 and Z = Za in the data, using the following code:
## Shaded violin plot ##
# Calculate limits and median #
df.lim <- df %>%
filter(X == "X1", Z == "Za") %>%
summarise(Y_min = min(Y),
Y_qnt = quantile(Y, 0.5),
Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category #
df.dens <- df %>%
filter(X == "X1", Z == "Za") %>%
do(data.frame(LOC = density(.$Y)$x,
DENS = density(.$Y)$y)) %>%
filter(LOC >= df.lim$Y_min, LOC <= df.lim$Y_max) %>%
mutate(COL = ifelse(LOC > df.lim$Y_qnt, "Empty", "Filled"))
# Find density values at limits #
df.lim.2 <- df.dens %>%
filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded single violin plot #
df.dens %>%
ggplot(aes(x = LOC)) +
geom_area(aes(y = DENS, alpha = COL), fill = "red") +
geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
geom_path(aes(y = DENS)) +
geom_path(aes(y = -DENS)) +
geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
coord_flip() +
scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
As you will notice in the code, I'm building the violin plot from scratch, using the density function horizontally and then flipping the axes. The problem arises when I try to produce a grouped violin plot, mainly because the axis on which the groups X and Z should appear is already used for the "height" of the density. I did try to reach the same result by repeating all the calculations by group, but I'm stuck at the final step:
## Shaded grouped violin plot ##
# Calculate limits and median by group #
df.lim <- df %>%
group_by(X, Z) %>%
summarise(Y_min = min(Y),
Y_qnt = quantile(Y, 0.5),
Y_max = max(Y))
# Calculate density, truncate at limits and assign shade category by group #
df.dens <- df %>%
group_by(X, Z) %>%
do(data.frame(LOC = density(.$Y)$x,
DENS = density(.$Y)$y)) %>%
left_join(., df.lim, by = c("X", "Z")) %>%
filter(LOC >= Y_min, LOC <= Y_max) %>%
mutate(COL = ifelse(LOC > Y_qnt, "Empty", "Filled"))
# Find density values at limits by group #
df.lim.2 <- df.dens %>%
group_by(X, Z) %>%
filter(LOC == min(LOC) | LOC == max(LOC))
# Produce shaded grouped violin plot #
df.dens %>%
ggplot(aes(x = LOC, group = interaction(X, Z))) +
# The following two lines don't work when included #
#geom_area(aes(y = DENS, alpha = COL), fill = "red") +
#geom_area(aes(y = -DENS, alpha = COL), fill = "red") +
geom_path(aes(y = DENS)) +
geom_path(aes(y = -DENS)) +
geom_segment(data = df.lim.2, aes(x = LOC, y = DENS, xend = LOC, yend = -DENS)) +
coord_flip() +
scale_alpha_manual(values = c("Empty" = 0.1, "Filled" = 1))
Running the code above will produce the outline of the violin plots for each group, each one on top of the other. But once I try to include the geom_area lines, the code fails.
My gut feeling tells me that I would need to somehow produce the "shaded" violin plot as a new geom which can then be used under the general structure of ggplot2 graphics but I have no idea how to do that, as my coding skills don't extend that far. Any help or pointers, either along my line of thought or in a different direction would be much appreciated. Thank you for your time.

Idea
For the fun of it, I hacked together a quick half-violin geom. It is basically a lot of copy & paste from GeomViolin. To make it run I had to access some internal ggplot2 functions, which are not exported, via :::. This means the solution may stop working in the future (if the ggplot2 team decides to change their internal functions).
However, this solution works and you can specify the alpha level of both the upper and the lower part. The geom assumes that you are providing just one quantile. The code is only superficially tested but it gives you an idea of how this can be done. As said it is in large part a simple copy & paste from GeomViolin where I added some code which finds out which values are below and above the quantile and splits the underlying GeomPolygon in 2 parts, as this function uses just a single alpha value. It works with groups and coord_flip likewise.
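The splitting step can be sketched outside ggplot2. The following is a minimal base R illustration, not the geom itself: it builds a closed violin outline from a density estimate and cuts it at the median, giving one polygon per alpha value. The names `half`, `outline`, `lower` and `upper` are hypothetical, and restricting the density to the data range is an assumption made to keep the sketch short.

```r
set.seed(321)
y <- rgamma(100, shape = 2, rate = 2)

# Density restricted to the observed range (assumption for this sketch)
d <- density(y, from = min(y), to = max(y))
half <- data.frame(y = d$x, width = d$y / max(d$y))

# Closed outline: left side going up, then right side coming back down
outline <- rbind(
  data.frame(x = -half$width, y = half$y),
  data.frame(x = rev(half$width), y = rev(half$y))
)

# Split the outline at the quantile: one polygon per alpha value
q <- unname(quantile(y, 0.5))
lower <- outline[outline$y <= q, ]
upper <- outline[outline$y > q, ]
```

Each of the two pieces keeps the original point order, so it can be drawn as its own polygon with its own alpha.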
Code
library(ggplot2) # ggproto(), GeomViolin, GeomPolygon, GeomPath, zeroGrob(), layer()
library(grid)    # grobTree()
GeomHalfViolin <- ggproto("GeomHalfViolin", GeomViolin,
draw_group = function (self, data, ..., draw_quantiles = NULL,
alpha_upper = .5, alpha_lower = 1) {
data <- transform(data, xminv = x - violinwidth * (x - xmin),
xmaxv = x + violinwidth * (xmax - x))
newdata <- rbind(transform(data, x = xminv)[order(data$y),
], transform(data, x = xmaxv)[order(data$y, decreasing = TRUE),
])
newdata <- rbind(newdata, newdata[1, ])
if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
1))
stopifnot(length(draw_quantiles) <= 1)
## ggplot2::: is needed to access ggplot2 internal functions here and there
quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
###------------------------------------------------
## find out where the quantile is supposed to be
quantile_line <- unique(quantiles$y)
## which y values are below this quantile?
ind <- newdata$y <= quantile_line
## set the alpha values accordingly
newdata$alpha[!ind] <- alpha_upper
newdata$alpha[ind] <- alpha_lower
###------------------------------------------------
aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data),
c("x", "y", "group")), drop = FALSE]
aesthetics$alpha <- rep(1, nrow(quantiles))
both <- cbind(quantiles, aesthetics)
both <- both[!is.na(both$group), , drop = FALSE]
quantile_grob <- if (nrow(both) == 0) {
zeroGrob()
}
else {
GeomPath$draw_panel(both, ...)
}
###------------------------------------------------
## GeomPolygon uses a single alpha value by default
## Hence, split the violin in two parts
ggplot2:::ggname("geom_half_violin",
grobTree(GeomPolygon$draw_panel(newdata[ind, ], ...),
GeomPolygon$draw_panel(newdata[!ind, ], ...),
quantile_grob))
###------------------------------------------------
}
else {
ggplot2:::ggname("geom_half_violin", GeomPolygon$draw_panel(newdata,
...))
}
}
)
geom_half_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
position = "dodge", ..., draw_quantiles = NULL,
alpha_upper = .5, alpha_lower = 1,
trim = TRUE, scale = "area",
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
layer(data = data, mapping = mapping, stat = stat, geom = GeomHalfViolin,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles,
alpha_lower = alpha_lower, alpha_upper = alpha_upper,
na.rm = na.rm, ...))
}
library(tidyverse)
# Create dummy data #
set.seed(321)
df <- data.frame(X = rep(c("X1", "X2"), each = 100),
Y = rgamma(n = 200, shape = 2, rate = 2),
Z = rep(c("Za", "Zb"), times = 100),
stringsAsFactors = FALSE)
# Grouped violin plot #
df %>%
ggplot(., aes(x = X, y = Y, fill = Z)) +
geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1) +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue"))
# no groups
df %>% filter(Z == "Za") %>%
ggplot(., aes(x = X, y = Y)) +
geom_half_violin(draw_quantiles = 0.5, alpha_upper = .1, fill = "red") +
scale_fill_manual(values = c("Za" = "red", "Zb" = "blue")) +
coord_flip()
Graphs

Related

How to modify breaks and labels within ggproto?

I have a ggproto-based ggplot2 stat that generates a Manhattan plot. The code works, but I can't get it to position the breaks and labels on the x-axis correctly. I would really appreciate help with the following: how can I get my function to plot the axis that I want? Is this possible within the stat_manhattan function, or would it be better to write a separate scale function (scale_x_chr or similar)? Many thanks in advance! P.S. I hope my question is clear.
This is the code for the function:
StatManhattan <- ggplot2::ggproto("StatManhattan",
ggplot2::Stat,
# set up parameters, e.g. unpack from list
setup_params = function(data, params) {
params
},
# Compute group is the most granular component to be called
compute_group = function(data, scales, params, col_chrom) {
message("My param has value ", scales)
data_1 <- data %>%
dplyr::arrange(chr, pos) %>%
dplyr::transmute(x = cumsum(as.numeric(pos)),
y,
chr)
data_2 <- data_1 %>%
dplyr::distinct(chr) %>%
dplyr::mutate(colour = rep(col_chrom, length.out = dplyr::n()))
# the final df
data_1 %>%
dplyr::left_join(data_2, by = "chr") %>%
as.data.frame()
},
required_aes = c("y", "pos", "chr"),
default_aes = aes(y = stat(y),
x = stat(x),
colour = stat(colour),
size = 0.2)
)
stat_manhattan <- function(mapping = NULL, data = NULL, geom = "point",
position = "identity", na.rm = FALSE, show.legend = FALSE,
inherit.aes = TRUE, col_chrom = c("magenta2", "grey60"),
...) {
ggplot2::layer(
stat = StatManhattan, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm,
col_chrom = col_chrom,
...)
)
}
This can generate the desired plot, but the x-axis has continuous labels - which is not what I want.
# test data
test_data <- tibble(chr = rep(1:3, each = 1000),
bp = c(sort(sample.int(1e6, 1000)),
sort(sample.int(1e6, 1000)),
sort(sample.int(1e6, 1000))),
p = runif(3000, min=0, max=1))
# plotting the test data
ggplot(test_data)+aes(chr = chr, pos = bp, y =-log10(p))+stat_manhattan()
What I want to achieve is something like this, where the chromosome (chr) numbers are centered on the x-axis.
axisdf = test_data %>%
mutate(bp_cum = cumsum(as.numeric(bp))) %>%
group_by(chr) %>%
summarize(center=( max(bp_cum) + min(bp_cum) ) / 2 )
test_data %>%
mutate(bp_cum = cumsum(as.numeric(bp))) %>%
ggplot(., aes(x=bp_cum, y=-log10(p), colour = as.character(chr))) +
geom_point() +
# custom X axis:
scale_x_continuous(label = axisdf$chr, breaks= axisdf$center )
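Whichever route is taken, the break computation itself can be isolated in a small helper. The following is a minimal base R sketch, assuming the x positions have already been computed; `chr_centres()` and the `toy` data frame are hypothetical names introduced for illustration and are not part of ggplot2.

```r
# Centre of each chromosome's x range, for use as axis breaks
chr_centres <- function(chr, x) {
  centres <- tapply(x, chr, function(v) (min(v) + max(v)) / 2)
  data.frame(chr = names(centres), centre = as.numeric(centres),
             stringsAsFactors = FALSE)
}

# Toy data: three chromosomes with four positions each
toy <- data.frame(chr = rep(1:3, each = 4), x = 1:12)
axisdf <- chr_centres(toy$chr, toy$x)
```

The result has the same shape as `axisdf` above, so it could be passed to `scale_x_continuous(breaks = axisdf$centre, labels = axisdf$chr)` in the same way.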

Plotting points in voronoi respective to x and y coords to join an additional coordinate

I am working on plotting a point in a Voronoi diagram between two coordinates, so that the point aligns with the respective lower and upper values in the data frame.
Unfortunately, when I plot the data, it linearly increases by the value of the points. This then produces the following plot:
Here's the script that I used:
library(ggplot2)
library(ggvoronoi)
ggplot(voronoi_data, aes(x=lower, y=upper)) + stat_voronoi() +geom_point(aes(fit))
ggplot(voronoi_data, aes(x = lower, y = upper)) +
stat_voronoi(
geom = "path",
color = 4,
lwd = 0.7,
linetype = 1
) + geom_point(aes(fit))
Some points do not lie within the Voronoi diagram because they are either too large or too small. I have therefore thought of shifting the upper and lower values by fit, layering two Voronoi diagrams with different colour schemes on top of one another, and then plotting the fit values (this is still in progress, as I have only just had the idea).
Progress so far on this idea:
voronoi_1 <-
mapply(function(a, b)
a - b, voronoi_data$lower, voronoi_data$fit) %>% data.frame(lower_add =
.)
voronoi_1 <-
rbind(
voronoi_1,
mapply(function(a, b)
a + b, voronoi_data$lower, voronoi_data$fit) %>% data.frame(lower_add =
.)
)
voronoi_2 <-
mapply(function(a, b)
a + b, voronoi_data$upper, voronoi_data$fit) %>% data.frame(upper_add =
.)
voronoi_2 <-
rbind(
voronoi_2,
mapply(function(a, b)
a - b, voronoi_data$upper, voronoi_data$fit) %>% data.frame(upper_add =
.)
)
voronoi_3 <- rep(voronoi_data$fit, 2) %>% data.frame(fit = .)
voronoi_update <- cbind(voronoi_1, voronoi_2, voronoi_3)
ggplot(voronoi_data, aes(x = lower, y = upper)) +
stat_voronoi(
geom = "path",
color = 4,
lwd = 0.7,
linetype = 1
) + geom_point(aes(fit), col = 'blue') + stat_voronoi(
data = voronoi_update,
aes(x = lower_add, y = upper_add),
geom = "path",
color = 2,
lwd = 0.7,
linetype = 1
) + geom_point(col = 'green') + geom_segment(aes(
x = lower,
y = upper,
xend = fit,
yend = fit
))
Update produces the following picture:
How can I get the green-dots to connect with the blue-dots?
reproducible data:
structure(list(lower = c(-50.231394143356, -56.2551026846824,
28.4249214917657, -72.7725910398994, -81.2658846682781, 21.6407972918016,
-6.38857800084765, -83.9469403037355, -7.49345446155375, -9.25035611734441
), upper = c(83.3536041213786, 63.9866816320508, 145.329559457229,
48.0937531194102, 42.462115738722, 136.959651947817, 100.752432092854,
40.2250494139988, 110.279218627158, 107.6295802627), fit = c(16.5611049890113,
3.86578947368421, 86.8772404744972, -12.3394189602446, -19.4018844647781,
79.3002246198091, 47.1819270460033, -21.8609454448684, 51.3928820828022,
49.1896120726776)), class = "data.frame", row.names = c(NA, -10L
))
The reason your points don't lie within the Voronoi diagram is that you have put aes(fit) inside geom_point. This is interpreted as the x axis value for the points, which presumably isn't what you want:
ggplot(voronoi_data, aes(x = lower, y = upper)) +
stat_voronoi(
geom = "path",
color = 4,
lwd = 0.7,
linetype = 1
) + geom_point()
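The positional matching behind this can be checked directly. A small sketch, assuming ggplot2 is installed: unnamed arguments to aes() are matched to x first and then y, so aes(fit) maps fit to x and the layer inherits y = upper from the plot. The object names `m_bad` and `m_good` are only for illustration.

```r
library(ggplot2)

# An unnamed first argument is taken as the x aesthetic
m_bad <- aes(fit)
names(m_bad)
# "x"

# The mapping the points are presumably meant to use
m_good <- aes(x = lower, y = upper)
names(m_good)
# "x" "y"
```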

Remove whiskers and outliers in R plotly

I have continuous data that I'd like to plot using R's plotly with a box or violin plot without the outliers and whiskers:
set.seed(1)
df <- data.frame(group=c(rep("g1",500),rep("g2",700),rep("g3",600)),
value=c(c(rep(0,490),runif(10,10,15)),abs(rnorm(700,1,10)),c(rep(0,590),runif(10,10,15))),
stringsAsFactors = F)
df$group <- factor(df$group, levels = c("g1","g2","g3"))
I know how to remove outliers in plotly:
plotly::plot_ly(x = df$group, y =df$value, type = 'box', color = df$group, boxpoints = F, showlegend = F)
But I'm still left with the whiskers.
I tried using ggplot2 for that (also limiting the height of the y-axis to that of the 75 percentile):
library(ggplot2)
gp <- ggplot(df, aes(group, value, color = group, fill = group)) + geom_boxplot(outlier.shape = NA, coef = 0) +
scale_y_continuous(limits = c(0, ceiling(max(dplyr::summarise(dplyr::group_by(df, group), tile = quantile(value, probs = 0.75))$tile)))) +
theme_minimal() + theme(legend.position = "none",axis.title = element_blank())
But then trying to convert that to a plotly object doesn't maintain that:
plotly::ggplotly(gp)
Any idea?
This is a workaround.
I changed your plot a bit, first.
library(plotly) # plot_ly(), layout()
library(dplyr)  # %>%, group_by(), summarise()
# box without outliers
p <- plot_ly(df, x = ~group, y = ~value, type = 'box',
color = ~group, boxpoints = F, showlegend = F,
whiskerwidth = 0, line = list(width = 0)) # no whisker, max or min line
Then I add the medians back to the graph. This requires calculating the medians, matching the colors, and creating the shape lists for Plotly.
Oddly, the first three default colors are used, but in the order g3, g2, g1.
# the medians
res = df %>% group_by(group) %>%
summarise(med = median(value))
# default color list: https://community.plotly.com/t/plotly-colours-list/11730/2
col = rev(c('#1f77b4', '#ff7f0e', '#2ca02c')) # the plot is colored 3, 2, 1
# discrete x-axis; domain default [0, 1]
# default box margin = .08, three groups, each get 1/3 of space
details <- function(col){ # properties shared by every shape
list(type = 'line',
line = list(color = col, width = 4),
xref = "paper", yref = "y")
}
# horizontal segments/ median
segs = lapply(1:nrow(res),
function(k){
x1 <- k/3 - .08 # if the domain is [0, 1]
x0 <- (k - 1)/3 + .08
y0 <- y1 <- res[k, ]$med
line = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1)
deets = details(col[k])
c(deets, line)
})
Finally, I added them back onto the plot.
p %>% layout(shapes = segs)
I made the lines obnoxiously wide, but you get the idea.
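The paper-coordinate arithmetic used for x0 and x1 can be written for any number of groups. A minimal sketch, assuming the x domain is [0, 1], each group gets an equal share of it, and the 0.08 margin quoted above applies; `box_span()` is a hypothetical helper, not a plotly function.

```r
# Horizontal extent of box k out of n in paper coordinates
box_span <- function(k, n, margin = 0.08) {
  c(x0 = (k - 1) / n + margin, x1 = k / n - margin)
}

# One row per group, columns x0 and x1
spans <- t(sapply(1:3, box_span, n = 3))
```

With three groups this reproduces the values used in `segs`, for example `x0 = 0.08` and `x1 = 1/3 - 0.08` for the first box.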
If you want the IQR outline back, you can add that as well. I used functions here too, since I figured the data you've provided is not the actual data, so the functions will serve a purpose.
# include IQR outline
res2 = df %>% group_by(group) %>%
summarise(q1 = setNames(quantile(value, type = 7, 1/4), NULL),
q3 = setNames(quantile(value, type = 7, 3/4), NULL),
med = median(value))
# IQR segments
rects = lapply(1:nrow(res2), # if the domain is [0, 1]
function(k){
x1 <- k/3 - .08
x0 <- (k - 1)/3 + .08
y0 <- res2[k, ]$q1
y1 <- res2[k, ]$q3
line = list(color = col[k], width = 4)
rect = list("x0" = x0, "x1" = x1,
"y0" = y0, "y1" = y1,
type = "rect", xref = "paper",
yref = "y", "line" = line)
rect
})
rects = append(segs, rects)
p %>% layout(shapes = rects)

How to draw a multi-colored dashed line (alternating colors for visual effect) [duplicate]

This question already has answers here: Alternating color of individual dashes in a geom_line (4 answers). Closed 8 months ago.
I was wondering if it is possible to create a multicolored dashed line in ggplot.
Basically I have a plot displaying savings based on two packages.
An orange line with savings based on package A
A green line with savings based on package B
I also have a third line, and I would like that one to be dashed, alternating between orange and green. Has anybody been able to do that?
Here is an example:
library(tidyverse)
S <- seq(0, 5, by = 0.05)
a <- S ^ 2
b <- S
a_b = a + b #This data should have the dashed multicolor line, since it is the sum of the other two lines.
S <- data.frame(S)
temp <- cbind(S, a, b, a_b)
temp <- gather(temp, variable, value, -S)
desiredOrder <- c("a", "b", "a_b")
temp$variable <- factor(temp$variable, levels = desiredOrder)
temp <- temp[order(temp$variable),]
p <- ggplot(temp, aes(x = S, y = value, colour = variable)) +
theme_minimal() +
geom_line(size = 1) +
scale_color_manual(name = "Legend", values = c("orange", "green", "#0085bd"),
breaks = c("a", "b", "a_b"))
p
I basically want to have a multicolored (dashed or dotted) line for "a_b".
This is, to my best knowledge, currently only possible via creation of new segments for each alternate color. This is fiddly.
Below I've tried a largely programmatic approach in which you can define the size of the repeating segment (based on your x unit). The positioning of y values is slightly convoluted, and it will also result in slightly irregular segment lengths when dealing with different slopes. I haven't tested it on much data either. But I guess it's a good start :)
For the legend, I'm taking the same approach, by creating a fake legend and stitching it onto the other plot. The challenges here include:
positioning of legend elements relative to the plot
relative distance between the legend elements
update
For a much neater way to create those segments and a Stat implementation see this thread
library(tidyverse)
library(patchwork)
S <- seq(0, 5, by = 0.05)
a <- S^2
b <- S
a_b <- a + b
df <- data.frame(x = S, a, b, a_b) %>%
pivot_longer(-x, names_to = "variable", values_to = "value")
## a function to create modifiable cuts in order to get segments.
## this looks convoluted - and it is! there are a few if/else statements.
## Why? The assignment of new y to x values depends on how many original values
## you have.
## There might be more direct ways to get there
alt_colors <- function(df, x, y, seg_length, my_cols) {
x <- df[[x]]
y <- df[[y]]
## create new x for each tiny segment
length_seg <- seg_length / length(my_cols)
new_x <- seq(min(x, na.rm = TRUE), x[length(x)], length_seg)
## now we need to interpolate y values for each new x
## This is different depending on how many x and new x you have
if (length(new_x) < length(x)) {
ind_int <- findInterval(new_x, x)
new_y <- sapply(seq_along(ind_int), function(i) {
if (y[ind_int[i]] == y[ind_int[length(ind_int)]]) {
y[ind_int[i]]
} else {
seq_y <- seq(y[ind_int[i]], y[ind_int[i] + 1], length.out = length(my_cols))
head(seq_y, -1)
}
})
} else {
ind_int <- findInterval(new_x, x)
rle_int <- rle(ind_int)
new_y <- sapply(rle_int$values, function(i) {
if (y[i] == y[max(rle_int$values)]) {
y[i]
} else {
seq_y <- seq(y[i], y[i + 1], length.out = rle_int$lengths[i] + 1)
head(seq_y, -1)
}
})
}
## This is also a bit painful and might cause other bugs that I haven't
## discovered yet.
if (length(unlist(new_y)) < length(new_x)) {
newdat <- data.frame(
x = new_x,
y = rep_len(unlist(new_y), length.out = length(new_x))
)
} else {
newdat <- data.frame(x = new_x, y = unlist(new_y))
}
newdat <- newdat %>%
mutate(xend = lead(x), yend = lead(y)) %>%
drop_na(xend)
newdat$color <- my_cols
newdat
}
## the below is just a demonstration of how the function would work
## using different segment widths
df_alt1 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", 1, c("orange", "green"))
df_alt.5 <-
df %>%
filter(variable == "a_b") %>%
alt_colors("x", "value", .5, c("orange", "green"))
df_ab <-
df %>%
filter(variable != "a_b") %>%
# for the identity mapping
mutate(color = ifelse(variable == "a", "green", "orange"))
## create data frame for the legend, also using the alt_colors function as per above
## the amount of x is a bit of trial and error, this is just a quick hack
## this is a trick to center the legend more or less relative to the main plot
y_leg <- ceiling(mean(range(df$value, na.rm = TRUE)))
dist_y <- 2
df_legend <-
data.frame(
variable = rep(unique(df$variable), each = 2),
x = 1:2,
y = rep(seq(y_leg - dist_y, y_leg + dist_y, by = dist_y), each = 2)
)
df_leg_onecol <-
df_legend %>%
filter(variable != "a_b") %>%
mutate(color = ifelse(variable == "a", "green", "orange"))
df_leg_alt <-
df_legend %>%
filter(variable == "a_b") %>%
alt_colors("x", "y", .5, c("orange", "green"))
## I am mapping the colors globally using identity mapping (see scale_identity).
p1 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt1, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every 1 unit")
p.5 <-
ggplot(mapping = aes(x, value, colour = color)) +
theme_minimal() +
geom_line(data = df_ab, size = 1) +
geom_segment(data = df_alt.5, aes(y = y, xend = xend, yend = yend), size = 1) +
scale_color_identity() +
ggtitle("alternating every .5 unit")
p_leg <-
ggplot(mapping = aes(x, y, colour = color)) +
theme_void() +
geom_line(data = df_leg_onecol, size = 1) +
geom_segment(data = df_leg_alt, aes(xend = xend, yend = yend), size = 1) +
scale_color_identity() +
annotate(
geom = "text", y = unique(df_legend$y), label = unique(df_legend$variable),
x = max(df_legend$x + 1), hjust = 0
)
## set y limits to the range of the main plot
## in order to make the labels visible you need to adjust the plot margin and
## turn clipping off
p1 + p.5 +
(p_leg + coord_cartesian(ylim = range(df$value), clip = "off") +
theme(plot.margin = margin(r = 20, unit = "pt"))) +
plot_layout(widths = c(1, 1, .2))
Created on 2022-01-18 by the reprex package (v2.0.1)
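The segment-building step can also be sketched with plain linear interpolation through approx(), which avoids the index bookkeeping in alt_colors. This is a different technique from the function above, not a drop-in replacement. `alt_segments()` is a hypothetical name, and the sketch assumes x is sorted and has no missing values.

```r
# Cut a line into short segments and cycle through the given colours
alt_segments <- function(x, y, seg_length, cols) {
  # one segment per colour within each repeating unit of length seg_length
  new_x <- seq(min(x), max(x), by = seg_length / length(cols))
  # linear interpolation of y at the new x positions
  new_y <- approx(x, y, xout = new_x)$y
  n <- length(new_x) - 1
  data.frame(
    x = new_x[-length(new_x)], y = new_y[-length(new_y)],
    xend = new_x[-1], yend = new_y[-1],
    color = rep_len(cols, n),
    stringsAsFactors = FALSE
  )
}

S <- seq(0, 5, by = 0.05)
segs <- alt_segments(S, S^2 + S, seg_length = 1, cols = c("orange", "green"))
```

The resulting data frame has the same columns that the geom_segment() layers above expect (x, y, xend, yend, color), so it could be used in place of df_alt1.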
(Copied this over from Alternating color of individual dashes in a geom_line)
Here's a ggplot hack that is simple, but works for two colors only. It results in two lines being overlayed, one a solid line, the other a dashed line.
library(dplyr)
library(ggplot2)
library(reshape2)
# Create df
x_value <- 1:10
group1 <- c(0,1,2,3,4,5,6,7,8,9)
group2 <- c(0,2,4,6,8,10,12,14,16,18)
dat <- data.frame(x_value, group1, group2) %>%
mutate(group2_2 = group2) %>% # Duplicate the column that you want to be alternating colors
melt(id.vars = "x_value", variable.name = "group", value.name ="y_value") # Long format
# Put in your selected order
dat$group <- factor(dat$group, levels=c("group1", "group2", "group2_2"))
# Plot
ggplot(dat, aes(x=x_value, y=y_value)) +
geom_line(aes(color=group, linetype=group), size=1) +
scale_color_manual(values=c("black", "red", "black")) +
scale_linetype_manual(values=c("solid", "solid", "dashed"))
Unfortunately the legend still needs to be edited by hand. Here's the example plot.

ggplot2 shade area under density curve by group

I have this dataframe:
set.seed(1)
x <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))
y <- c(rep("site1", 50), rep("site2", 50))
xy <- data.frame(x, y)
And I have made this density plot:
library(ggplot2)
ggplot(xy, aes(x, color = y)) + geom_density()
For site1 I need to shade the area under the curve above the 1st percentile of the data (x greater than the 1% quantile). For site2 I need to shade the area under the curve below the 75th percentile of the data (x less than the 75% quantile).
I'm expecting the plot to look something like this (photoshopped). Having been through stack overflow, I'm aware that others have asked how to shade part of the area under a curve, but I cannot figure out how to shade the area under a curve by group.
Here is one way (and, as @joran says, this is an extension of the response here):
# same data, just renaming columns for clarity later on
# also, use data tables
library(data.table)
set.seed(1)
value <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))
site <- c(rep("site1", 50), rep("site2", 50))
dt <- data.table(site,value)
# generate kernel density estimates by site
gg <- dt[,list(x=density(value)$x, y=density(value)$y),by="site"]
# calculate quantiles
q1 <- quantile(dt[site=="site1",value],0.01)
q2 <- quantile(dt[site=="site2",value],0.75)
# generate the plot
ggplot(dt) + stat_density(aes(x=value,color=site),geom="line",position="dodge")+
geom_ribbon(data=subset(gg,site=="site1" & x>q1),
aes(x=x,ymax=y),ymin=0,fill="red", alpha=0.5)+
geom_ribbon(data=subset(gg,site=="site2" & x<q2),
aes(x=x,ymax=y),ymin=0,fill="blue", alpha=0.5)
Produces this:
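The two hand-written subsets above can be generalised, under the assumption that each site has exactly one cutoff and one direction. A minimal base R sketch; the `rules` list and the `shaded` data frame are hypothetical names introduced for illustration.

```r
set.seed(1)
xy <- data.frame(
  x = c(rnorm(50, mean = 1), rnorm(50, mean = 3)),
  y = rep(c("site1", "site2"), each = 50)
)

# One rule per site: quantile probability and which side of it to keep
rules <- list(
  site1 = list(prob = 0.01, keep = `>`),
  site2 = list(prob = 0.75, keep = `<`)
)

# Density points that fall on the shaded side of each site's cutoff
shaded <- do.call(rbind, lapply(names(rules), function(s) {
  v <- xy$x[xy$y == s]
  d <- density(v)
  cutoff <- unname(quantile(v, rules[[s]]$prob))
  keep <- rules[[s]]$keep(d$x, cutoff)
  data.frame(site = s, x = d$x[keep], dens = d$y[keep])
}))
```

`shaded` could then feed a single geom_ribbon(aes(x = x, ymax = dens, fill = site), ymin = 0, alpha = 0.5) layer instead of one layer per site.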
The problem with @jlhoward's solution is that you need to manually add a geom_ribbon for each group you have. I wrote my own ggplot2 stat wrapper following this vignette. The benefit is that it works automatically with groups and facets, so you don't need to add a geom for each group by hand.
StatAreaUnderDensity <- ggproto(
"StatAreaUnderDensity", Stat,
required_aes = "x",
compute_group = function(data, scales, xlim = NULL, n = 50) {
fun <- approxfun(density(data$x))
StatFunction$compute_group(data, scales, fun = fun, xlim = xlim, n = n)
}
)
stat_aud <- function(mapping = NULL, data = NULL, geom = "area",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, n = 50, xlim=NULL,
...) {
layer(
stat = StatAreaUnderDensity, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(xlim = xlim, n = n, ...))
}
Now you can use the stat_aud function just like other ggplot2 stat layers.
library(ggplot2)
library(tibble) # tibble()
library(dplyr)  # %>%
set.seed(1)
x <- c(rnorm(500, mean = 1), rnorm(500, mean = 3))
y <- c(rep("group 1", 500), rep("group 2", 500))
t_critical = 1.5
tibble(x=x, y=y)%>%ggplot(aes(x=x,color=y))+
geom_density()+
geom_vline(xintercept = t_critical)+
stat_aud(geom="area",
aes(fill=y),
xlim = c(0, t_critical),
alpha = .2)
tibble(x=x, y=y)%>%ggplot(aes(x=x))+
geom_density()+
geom_vline(xintercept = t_critical)+
stat_aud(geom="area",
fill = "orange",
xlim = c(0, t_critical),
alpha = .2)+
facet_grid(~y)
You need to use fill. color controls the outline of the density plot, which is necessary if you want non-black outlines.
ggplot(xy, aes(x, color=y, fill = y, alpha=0.4)) + geom_density()
This gets you something like the desired plot. You can then remove the alpha entry from the legend by using
ggplot(xy, aes(x, color = y, fill = y, alpha=0.4)) + geom_density()+ guides(alpha='none')
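An alternative that avoids the extra legend entry in the first place is to set alpha as a constant on the layer rather than mapping it inside aes(). A short sketch, assuming ggplot2 is installed and using the same toy data as the question:

```r
library(ggplot2)

set.seed(1)
xy <- data.frame(
  x = c(rnorm(50, mean = 1), rnorm(50, mean = 3)),
  y = rep(c("site1", "site2"), each = 50)
)

# alpha is a fixed layer parameter here, so no alpha legend is created
p <- ggplot(xy, aes(x, color = y, fill = y)) +
  geom_density(alpha = 0.4)
```

Because alpha is no longer an aesthetic mapping, guides(alpha = 'none') is not needed with this form.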
