Dodge two different geoms apart in ggplot2 - r

Let's say I have two different sources of data. One is of repeated observations, and one is just a mean +/- standard error predicted by a model.
library(tidyverse) # provides %>% and mutate()
n <- 30
obs <- data.frame(
group = rep(c("A", "B"), each = n*3),
level = rep(rep(c("low", "med", "high"), each = n), 2),
yval = c(
rnorm(n, 30), rnorm(n, 50), rnorm(n, 90),
rnorm(n, 40), rnorm(n, 55), rnorm(n, 70)
)
) %>%
mutate(level = factor(level, levels = c("low", "med", "high")))
model_preds <- data.frame(
group = c("A", "A", "A", "B", "B", "B"),
level = rep(c("low", "med", "high"), 2),
mean = c(32,56,87,42,51,74),
sem = runif(6, min = 2, max = 5)
)
Now I can plot these on the same graph easily enough:
p <- ggplot(obs, aes(x = level, y = yval, fill = group)) +
geom_boxplot() +
geom_point(data = model_preds, aes(x = level, y = mean), size = 2, colour = "forestgreen") +
geom_errorbar(data = model_preds, aes(x = level, y = mean, ymax = mean + sem, ymin = mean - sem), colour = "forestgreen", size = 1) +
facet_wrap(~group)
and use that to visually compare the model predictions with the observed results.
But I think this looks a bit ugly, so ideally I would want to 'dodge' the point-and-errorbar geoms away from the boxplot geom.
If you'll forgive my quick paint drawing, something like this:
It seems like position_dodge() might be the way to go but I haven't figured out how to combine two different geoms this way and the docs don't have any examples.
It might be impossible, but I thought I'd ask to check.

As a consequence of the grammar of graphics, which clearly separates the various aspects of plotting, there is no way to communicate information between different layers (geoms and stats) of a plot. This also means that a position adjustment cannot be shared across layers, so separate layers cannot be dodged against each other.
The next best thing is to use position = position_nudge() in every layer, so that the layers appear dodged relative to each other. You might also want to adjust the width parameter of the boxplot and errorbar for this. Example below:
library(tidyverse)
n <- 30
obs <- data.frame(
group = rep(c("A", "B"), each = n*3),
level = rep(rep(c("low", "med", "high"), each = n), 2),
yval = c(
rnorm(n, 30), rnorm(n, 50), rnorm(n, 90),
rnorm(n, 40), rnorm(n, 55), rnorm(n, 70)
)
) %>%
mutate(level = factor(level, levels = c("low", "med", "high")))
model_preds <- data.frame(
group = c("A", "A", "A", "B", "B", "B"),
level = rep(c("low", "med", "high"), 2),
mean = c(32,56,87,42,51,74),
sem = runif(6, min = 2, max = 5)
)
ggplot(obs, aes(x = level, y = yval, fill = group)) +
geom_boxplot(position = position_nudge(x = -0.3),
width = 0.5) +
geom_point(data = model_preds, aes(x = level, y = mean),
size = 2, colour = "forestgreen",
position = position_nudge(x = 0.3)) +
geom_errorbar(data = model_preds,
aes(x = level, y = mean, ymax = mean + sem, ymin = mean - sem),
colour = "forestgreen", size = 1, width = 0.5,
position = position_nudge(x = 0.3)) +
facet_wrap(~group)
Created on 2021-01-17 by the reprex package (v0.3.0)
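To check how far a layer has actually been shifted, the built layer data can be inspected. The following is a minimal sketch and not part of the answer above: the toy data frame preds and the 0.3 offset are made up for illustration, and it assumes that layer_data() reports positions after the position adjustment has been applied.

```r
library(ggplot2)

# Hypothetical toy data: one predicted mean per level
preds <- data.frame(
  level = factor(c("low", "med", "high"), levels = c("low", "med", "high")),
  mean  = c(32, 56, 87)
)

p <- ggplot(preds, aes(x = level, y = mean)) +
  geom_point(position = position_nudge(x = 0.3))

# Discrete levels sit at 1, 2, 3 on the x axis; the nudge shifts each by 0.3
nudged_x <- as.numeric(layer_data(p, 1)$x)
print(nudged_x)
```

Using the same offset with opposite signs in the two groups of layers is what makes them look dodged.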

Related

Overlay two plots from different dataframes in R

I would like to overlay two ggplots from different data sources. I don't think a left_join will work, because the dataframes have different lengths and the join could potentially change the underlying plots. [Maybe?]
library(tidyverse)
set.seed(123)
player_df <- tibble(name = rep(c("A","B","C","D"), each = 10, times = 1),
pos = rep(c("DEF","DEF","MID","MID"), each = 10, times = 1),
load = c(rnorm(10, mean = 200, sd = 100),
rnorm(10, mean = 300, sd = 50),
rnorm(10, mean = 400, sd = 100),
rnorm(10, mean = 500, sd = 50)))
p1 <- player_df %>%
ggplot(aes(x = load, y = name)) +
geom_point()
pos_df <- tibble(pos = rep(c("DEF","MID"), each = 30, times = 1),
load = (c(rnorm(30, mean = 250, sd = 100),
rnorm(30, mean = 350, sd = 100))))
p2 <- pos_df %>%
ggplot(aes(x = load, y = pos)) +
geom_boxplot()
p1
p2
# add p2 to every p1 player plot by pos
I would like p1 to have the corresponding p2 - by pos - appear behind it. So... add the matching p2 boxplot to each p1 scatterplot.
It's not really advisable to attempt to superimpose two plots on each other. A ggplot is made of layers already, so usually it's just a case of superimposing one geom on another. This can be difficult if (as in your case) one of the axes has different labels. However, with a little work it is possible to wrangle your data so that it all sits on a single plot. In your case, you could do something like:
levs <- c("A", "DEF", "B", "C", "MID", "D")
ggplot(within(pos_df, pos <- factor(pos, levs)), aes(x = load, y = pos)) +
geom_boxplot(width = 2.3) +
geom_point(data = within(player_df, pos <- factor(name, levs))) +
scale_y_discrete(limits = c("A", "DEF", "B", " ", "C", "MID", "D"))
Dug into ggplot a bit and re-engineered a boxplot bit by bit.
# manually calculate stats that are used in boxplots
pos_df_summary <- pos_df %>%
group_by(pos, .drop = FALSE) %>%
summarise(min = fivenum(load)[1],
Q1 = fivenum(load)[2],
median = fivenum(load)[3],
Q3 = fivenum(load)[4],
max = fivenum(load)[5]
)
# add the boxplot data to each player
joined_df <- player_df %>%
left_join(., pos_df_summary, by = "pos") %>%
distinct(name, .keep_all = TRUE)
# plot
ggplot(data = NULL, aes(group = name)) +
# create the line from min to max
geom_segment(data = joined_df, aes(y = name, yend = name, x=min, xend=max), color="black") +
#create the box with median line
geom_crossbar(data = joined_df,
aes(y = name, xmin = Q1, xmax = Q3, x = median, fill = "NA"),
color = "black",
fatten = 1) +
scale_fill_manual(values = "white") +
# add the points from the player_df
geom_point(data = player_df,
aes(x = load, y = name, group=name),
color = "red",
show.legend=FALSE) +
theme(legend.position = "none")
There may be some extraneous code in here as I cobbled it from some other resources. Specifically, I'm not sure what the aes(group = name) in the ggplot() call does exactly.

How to extend line across entire violin plot

Dataframe as example:
library(tidyverse)
set.seed(123)
df <- data.frame("b" = runif(1000, min = 2, max = 10),
"c" = runif(1000, min = 2, max = 10),
"d" = runif(1000, min = 2, max = 10))
df_2 <- data.frame(id = c("b", "c", "d"),
cutoff = c(5, 3, 5),
stringsAsFactors = FALSE)
df <-
pivot_longer(
df,
cols = c("b", "c", "d"),
names_to = "id",
values_to = "value"
) %>%
left_join(df_2, by = "id")
I can now make a violin plot (or a boxplot, same issue) with a line overlaid:
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_line(aes(x = id, y = cutoff, group = 1), color = "red")
What I'd like though is three lines (don't need to be connected) each of which extend across the entire width of a single violin, at the cutoff value specified in df_2.
I can do this manually with geom_segment, but is there a better, more programmatic way?
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_segment(aes(x = 0.55, xend = 1.45, y = 5, yend = 5), color = "blue") +
geom_segment(aes(x = 1.55, xend = 2.45, y = 3, yend = 3), color = "blue") +
geom_segment(aes(x = 2.55, xend = 3.45, y = 5, yend = 5), color = "blue")
I understand that at some fundamental level the x-axis is ordered by factor level, with b = 1, c = 2 etc., so asking for a line intersecting x = 0.9 would require specifying corresponding y value. In another sense though, ggplot2 clearly knows (in some sense) that the region above x = 0.9 (that is, y values intersected by a vertical line at x = 0.9) is associated with factor level b because the corresponding violin for b overlaps that region. Is there a way to get at that information?
You can use geom_errorbar(). So change your second block to:
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_errorbar(aes(x = id, ymin = cutoff,ymax = cutoff), color = "red")
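Because cutoff was joined onto every row of df, the call above draws the same bar once per observation. Here is a minimal sketch of a variant that passes the three-row cutoff table as the layer's own data and sets width explicitly. The names df_long and cutoffs are placeholders for this sketch, and width = 0.9 is chosen only to match the 0.55 to 1.45 span of the manual segments in the question.

```r
library(ggplot2)

set.seed(123)
# Long-format stand-in for the pivoted data in the question
df_long <- data.frame(
  id    = rep(c("b", "c", "d"), each = 1000),
  value = runif(3000, min = 2, max = 10)
)
# One cutoff per id, as in df_2
cutoffs <- data.frame(id = c("b", "c", "d"), cutoff = c(5, 3, 5))

p <- ggplot(df_long, aes(x = id)) +
  geom_violin(aes(y = value)) +
  # ymin == ymax collapses the error bar into a single horizontal line;
  # width sets how far it extends around each discrete x position
  geom_errorbar(data = cutoffs, aes(ymin = cutoff, ymax = cutoff),
                width = 0.9, colour = "red")

bars <- layer_data(p, 2)
```

The x aesthetic is inherited from ggplot(), so each bar lands on the violin with the matching id without any hard-coded positions.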

Reorder panels in ridgeline plot

I have a dataframe like this:
set.seed(3467)
df<- data.frame(method= c(rep("A", 1000), rep("B", 1000), rep("C", 1000)),
beta=c(rnorm(1000, mean=0, sd=1),rnorm(1000, mean=2, sd=1.4),rnorm(1000, mean=0, sd=0.5)))
I wish to create a ridgeline plot similar to this:
library(ggplot2)
library(ggridges)
ggplot() +
geom_rect(data = data.frame(x = 1),
xmin = -0.391, xmax = 0.549, ymin = -Inf, ymax = Inf,
alpha = 0.5, fill = "gray") +
geom_density_ridges(data = df, aes(x = beta, y = method, color = method, fill = method),
size=0.75)+
xlim(-5,5)+
scale_fill_manual(values = c("#483d8b50", "#0072B250","#228b2250")) +
scale_color_manual(values = c("#483d8b", "#0072B2", "#228b22"), guide = "none") +
stat_density_ridges(data = df, aes(x = beta, y = method, color = method, fill = method),
quantile_lines = TRUE, quantiles = c(0.025, 0.5, 0.975), alpha = 0.6, size=0.75)+
scale_y_discrete(expand = expand_scale(add = c(0.1, 0.9)))
However, I wish to order the y-axis in the order of method= "B", "C", "A" not method= "A", "B", "C"
I have tried the following method, without success, to reorder the density plots:
library(dplyr)
library(forcats) # provides fct_relevel()
df %>%
mutate(method = fct_relevel(method,
"B", "C", "A"))%>%
ggplot() +
geom_rect(data = data.frame(x = 1),
xmin = -0.391, xmax = 0.549, ymin = -Inf, ymax = Inf,
alpha = 0.5, fill = "gray") +
geom_density_ridges(data = df, aes(x = beta, y = method, color = method, fill = method),
size=0.75)+
xlim(-5,5)+
scale_fill_manual(values = c("#483d8b50", "#0072B250","#228b2250")) +
scale_color_manual(values = c("#483d8b", "#0072B2", "#228b22"), guide = "none") +
stat_density_ridges(data = df, aes(x = beta, y = method, color = method, fill = method),
quantile_lines = TRUE, quantiles = c(0.025, 0.5, 0.975), alpha = 0.6, size=0.75)+
scale_y_discrete(expand = expand_scale(add = c(0.1, 0.9)))
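You were nearly there. In your attempt the releveled data is piped into ggplot(), but every layer then sets data = df, which is still the original data frame, so the new level order never reaches the plot. Assign the releveled result back to df first. (See ?fct_relevel: there is no levels argument; the level names are passed through ..., so naming the first one levels = as in the code below is not required.)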
library(tidyverse)
library(ggridges)
set.seed(3467)
df<- data.frame(method= c(rep("A", 1000), rep("B", 1000), rep("C", 1000)),
beta=c(rnorm(1000, mean=0, sd=1),rnorm(1000, mean=2, sd=1.4),rnorm(1000, mean=0, sd=0.5)))
# here is the main change:
df <- df %>%
mutate(method = fct_relevel(method, levels = "B", "C", "A"))
ggplot(df) +
geom_density_ridges(data = df, aes(x = beta, y = method, color = method, fill = method))
#> Picking joint bandwidth of 0.225
Created on 2020-02-17 by the reprex package (v0.3.0)
If you change the method column to ordered factor, it should work:
df$method <- factor(df$method, levels = c("C", "A", "B"), ordered = TRUE)
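What matters in both cases is that the data frame the layers actually use carries the factor levels in the desired order. Here is a minimal base-R sketch, assuming the first level is drawn at the bottom of a discrete y axis. The name df_reordered and the smaller sample size are introduced here, and geom_boxplot() stands in for the ridgeline geoms to avoid the ggridges dependency.

```r
library(ggplot2)

set.seed(3467)
df <- data.frame(
  method = rep(c("A", "B", "C"), each = 100),
  beta   = c(rnorm(100, mean = 0, sd = 1),
             rnorm(100, mean = 2, sd = 1.4),
             rnorm(100, mean = 0, sd = 0.5))
)

# Set the level order explicitly and keep the result
df_reordered <- df
df_reordered$method <- factor(df_reordered$method, levels = c("B", "C", "A"))

# No data = argument in the layer, so it inherits the reordered data
p <- ggplot(df_reordered, aes(x = beta, y = method)) +
  geom_boxplot()

print(levels(df_reordered$method))
```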

Use position_jitterdodge without mapping aesthetic

I would like to produce a plot like the one obtained with the code below. However, I would like to dodge by "replicate", but without actually mapping an aesthetic (because I would like to assign fill and colors to other aesthetics).
library(tidyverse)
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
ggplot(data = dataset, aes(x = sample, y = value, fill = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = F)
I had hoped that using group = replicate instead of fill = replicate would work, but it doesn't. I can imagine a workaround that uses, for example, alpha = replicate as an aesthetic and sets scale_alpha_manual(values = c(1, 1)) so that both replicates are drawn fully opaque, but I don't find this solution ideal and would like to keep all aesthetics other than x and y available for further use.
ggplot(data = dataset, aes(x = sample, y = value, alpha = replicate)) +
geom_point(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.75),
show.legend = F) +
scale_alpha_manual(values = c(1, 1))
The plot that I expect to get is:
I hope my question makes sense. Any hints?
Best,
Yvan
You could unite the sample and replicate columns and use that as the x-axis, injecting a 'Placeholder' value for spacing between samples.
library(tidyverse)
set.seed(20181101)
dataset <- tibble(sample = rep(c("Sample1","Sample2","Sample3", "Sample4"), each = 25),
replicate = sample(x = c("A", "B"), size = 100, replace = TRUE),
value = rnorm(n = 100, mean = 0, sd = 10))
dataset %>%
bind_rows({
#create a dummy placeholder to allow for spacing between samples
data.frame(sample = unique(dataset$sample),
replicate = rep("Placeholder", length(unique(dataset$sample))),
stringsAsFactors = FALSE)
}) %>%
#unite the sample & replicate columns, and use it as the new x-axis
unite(sample_replicate, sample, replicate, remove = FALSE) %>%
ggplot(aes(x = sample_replicate, y = value, color = replicate)) +
geom_jitter() +
#only have x-axis labels for each sample
scale_x_discrete(breaks = paste0("Sample", 1:length(unique(dataset$sample)), "_B"),
labels = paste0("Sample ", 1:length(unique(dataset$sample)))) +
labs(x = "Sample") +
#don't show the Placeholder value in the legend
scale_color_discrete(breaks = c("A", "B"))
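A different route, sketched here as an alternative and not taken from the answer above, is to compute the dodge offsets by hand on a numeric x axis, which leaves every aesthetic other than x and y free. The column x_dodged and the jitter amount are choices made for this sketch. The offset of dodge_width / 4 assumes exactly two replicates, which is where position_dodge() would centre two groups.

```r
library(ggplot2)

set.seed(20181101)
dataset <- data.frame(
  sample    = rep(c("Sample1", "Sample2", "Sample3", "Sample4"), each = 25),
  replicate = sample(c("A", "B"), size = 100, replace = TRUE),
  value     = rnorm(100, mean = 0, sd = 10)
)

dodge_width <- 0.75
sample_levels <- sort(unique(dataset$sample))

# Integer position of each sample on the x axis (1, 2, 3, 4)
sample_idx <- as.integer(factor(dataset$sample, levels = sample_levels))

# Replicate A goes left of the sample position, replicate B goes right
offset <- ifelse(dataset$replicate == "A", -dodge_width / 4, dodge_width / 4)
dataset$x_dodged <- sample_idx + offset

p <- ggplot(dataset, aes(x = x_dodged, y = value)) +
  # Horizontal jitter only; the amount is an arbitrary choice
  geom_point(position = position_jitter(width = 0.075, height = 0)) +
  # Label the integer positions with the sample names
  scale_x_continuous(breaks = seq_along(sample_levels), labels = sample_levels) +
  labs(x = "sample")
```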

Divide one bin2d by another bin2d to generate a new bin2d plot in ggplot

I have two data.frames x and y. I am interested in how the relation between "A" and "B" changes from x to y. So I plot "A" against "B" with geom_bin2d for x and y separately, but it's hard to see the difference between the two bin2d plots.
ggplot(X1, aes_string("factor_1", "factor_2")) + geom_bin2d(aes(fill = ..density..)) + scale_fill_gradientn(colours = c("red", "blue"))
ggplot(X2, aes_string("factor_1", "factor_2")) + geom_bin2d(aes(fill = ..density..)) + scale_fill_gradientn(colours = c("red", "blue"))
Is there a way to do (bin2d_x / bin2d_y) or (bin2d_x - bin2d_y), and then plot the difference?
Test code to illustrate the situation:
X1 <- data.frame( class = "a", "factor_1" = rnorm(5000, mean = 0, sd = 1), "factor_2" = rnorm(5000, mean = 0, sd =1))
X2 <- data.frame( class = "b", "factor_1" = rnorm(5000, mean = 3, sd = 1), "factor_2" = rnorm(5000, mean = 3, sd =1))
ggplot(rbind(X1, X2), aes_string("factor_1", "factor_2")) + geom_bin2d(aes(fill = ..density..)) + scale_fill_gradientn(colours = c("red", "blue"))
But the class variable cannot be shown properly in the graph.
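One way to approach this, sketched under assumptions and not taken from an existing answer, is to bin both data frames on a shared grid with base R's cut() and table(), subtract the normalised counts, and draw the result with geom_tile(). The bin count n_bins, the helper bin_density(), and the colour scale are choices made for this sketch; geom_bin2d itself is not used.

```r
library(ggplot2)

set.seed(1)
X1 <- data.frame(factor_1 = rnorm(5000, mean = 0, sd = 1),
                 factor_2 = rnorm(5000, mean = 0, sd = 1))
X2 <- data.frame(factor_1 = rnorm(5000, mean = 3, sd = 1),
                 factor_2 = rnorm(5000, mean = 3, sd = 1))

# Shared bin edges covering both data sets
n_bins <- 30
both <- rbind(X1, X2)
x_breaks <- seq(min(both$factor_1), max(both$factor_1), length.out = n_bins + 1)
y_breaks <- seq(min(both$factor_2), max(both$factor_2), length.out = n_bins + 1)

# Hypothetical helper: proportion of rows falling in each grid cell
bin_density <- function(d) {
  xi <- factor(cut(d$factor_1, x_breaks, include.lowest = TRUE, labels = FALSE),
               levels = seq_len(n_bins))
  yi <- factor(cut(d$factor_2, y_breaks, include.lowest = TRUE, labels = FALSE),
               levels = seq_len(n_bins))
  counts <- table(xi, yi)
  counts / sum(counts)
}

diff_mat <- bin_density(X1) - bin_density(X2)

# Bin midpoints; as.vector() unrolls the matrix with the x index varying fastest
x_mid <- (head(x_breaks, -1) + tail(x_breaks, -1)) / 2
y_mid <- (head(y_breaks, -1) + tail(y_breaks, -1)) / 2
diff_df <- data.frame(
  factor_1 = rep(x_mid, times = n_bins),
  factor_2 = rep(y_mid, each = n_bins),
  diff     = as.vector(diff_mat)
)

p <- ggplot(diff_df, aes(factor_1, factor_2, fill = diff)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red")
```

A ratio could be computed the same way, but empty cells in the denominator would then need separate handling.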
