I'm trying to colour specific points within a ridge plot, but the points I want to highlight are neither displayed nor shown in the legend. In this example I am trying to highlight the points labelled "X" and "Y" within ridges based on groups "A" and "B", but the points associated with "X" aren't plotted.
library(tidyverse)
library(ggridges)
data <- tibble(y = 1:10,
group = as.factor(rep(c("A", "B"), each=5)),
subgroup = as.factor(rep(c("X", rep("Y", each=4)),times=2)))
data%>%
ggplot(aes(x = y, y= group, fill = group))+
geom_density_ridges(alpha=0.5)+
geom_density_ridges(aes(point_fill = subgroup, point_color = subgroup),
alpha=0, colour = NA, jittered_points = T, point_alpha=1)
I was expecting the points associated with subgroup "X" to be plotted as points in a different colour and for subgroup "X" to appear in the legend.
You need to explicitly define the group, then it works:
library(dplyr)
library(ggplot2)
library(ggridges)
data <- tibble(y = 1:10,
group = as.factor(rep(c("A", "B"), each=5)),
subgroup = as.factor(rep(c("X", rep("Y", each=4)),times=2)))
data%>%
ggplot(aes(x = y, y= group, group=group, fill = group))+
geom_density_ridges(alpha=0.5)+
geom_density_ridges(aes(point_fill = subgroup, point_color = subgroup),
alpha=0, colour = NA, jittered_points = T, point_alpha=1)
#> Picking joint bandwidth of 0.974
#> Picking joint bandwidth of 0.974
Created on 2022-04-06 by the reprex package (v2.0.1)
You could achieve your desired result by adding the group aesthetic to the second geom_density_ridges, i.e. add group = group to aes.
library(tibble)
library(ggplot2)
library(ggridges)
set.seed(123)
data <- tibble(
y = 1:10,
group = as.factor(rep(c("A", "B"), each = 5)),
subgroup = as.factor(rep(c("X", rep("Y", each = 4)), times = 2))
)
ggplot(data, aes(x = y, y = group)) +
geom_density_ridges(aes(fill = group), alpha = 0.5) +
geom_density_ridges(aes(point_color = subgroup, group = group),
alpha = 0, colour = NA, jittered_points = T, point_alpha = 1
)
#> Picking joint bandwidth of 0.974
#> Picking joint bandwidth of 0.974
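A likely reason the explicit group helps: when no group aesthetic is given, ggplot2 derives the grouping from the combination of all discrete variables in the layer, which here would include subgroup. The base-R sketch below only mimics that default with interaction() (an assumption about the mechanism, not ggplot2's actual internals) to show that "X" would end up in groups of a single observation, which is too little data to estimate a density from.

```r
# Same toy data as in the question, built with base R only.
data <- data.frame(
  y = 1:10,
  group = factor(rep(c("A", "B"), each = 5)),
  subgroup = factor(rep(c("X", rep("Y", each = 4)), times = 2))
)

# Assumption: approximates the implicit grouping, i.e. one group per
# combination of the discrete variables mapped in the layer.
implicit_groups <- table(interaction(data$group, data$subgroup))

# Grouping once group = group is set explicitly: one group per ridge.
explicit_groups <- table(data$group)

print(implicit_groups)
print(explicit_groups)
```

With the implicit grouping, each "X" point sits alone in its own group, whereas the explicit grouping keeps all five observations of a ridge together.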
Related
I have a bar graph that uses both alpha and fill. I would like the bars to alternate by alpha and mix the colours. Currently, I can only get it to mix alpha and order by colour (see MWE below). Any help with changing the ordering would be much appreciated.
library(tibble)
library(ggplot2)
set.seed(1)
tibble(
y = runif(20),
x = letters[rep(1:2, times = 10)],
f = factor(LETTERS[rep(1:2, each = 10)], levels = LETTERS[1:2], ordered = TRUE),
a = factor(rep(1:5, times = 4), levels = 1:5, ordered = TRUE)
) %>%
ggplot(aes(x, y, fill = f, alpha = a)) +
geom_col(position = position_dodge2(1)) +
scale_alpha_discrete(range = c(.5, 1))
#> Warning: Using alpha for a discrete variable is not advised.
Created on 2023-01-26 with reprex v2.0.2
I.e., I am trying to get dark yellow mixed with dark purple and light/transparent yellow with light/transparent purple.
Perhaps like this: simply switch the order of fill and alpha in aes() to change the grouping:
library(tibble)
library(ggplot2)
set.seed(1)
tibble(
y = runif(20),
x = letters[rep(1:2, times = 10)],
f = factor(LETTERS[rep(1:2, each = 10)], levels = LETTERS[1:2], ordered = TRUE),
a = factor(rep(1:5, times = 4), levels = 1:5, ordered = TRUE)
) %>%
ggplot(aes(x, y, alpha = a, fill = f)) +
geom_col(position = position_dodge2(1)) +
scale_alpha_discrete(range = c(.5, 1))
#> Warning: Using alpha for a discrete variable is not advised.
Let's say I have two different sources of data. One is of repeated observations, and one is just a mean +/- standard error predicted by a model.
library(dplyr)    # for %>% and mutate()
library(ggplot2)
n <- 30
obs <- data.frame(
group = rep(c("A", "B"), each = n*3),
level = rep(rep(c("low", "med", "high"), each = n), 2),
yval = c(
rnorm(n, 30), rnorm(n, 50), rnorm(n, 90),
rnorm(n, 40), rnorm(n, 55), rnorm(n, 70)
)
) %>%
mutate(level = factor(level, levels = c("low", "med", "high")))
model_preds <- data.frame(
group = c("A", "A", "A", "B", "B", "B"),
level = rep(c("low", "med", "high"), 2),
mean = c(32,56,87,42,51,74),
sem = runif(6, min = 2, max = 5)
)
Now I can plot these on the same graph easily enough:
p <- ggplot(obs, aes(x = level, y = yval, fill = group)) +
geom_boxplot() +
geom_point(data = model_preds, aes(x = level, y = mean), size = 2, colour = "forestgreen") +
geom_errorbar(data = model_preds, aes(x = level, y = mean, ymax = mean + sem, ymin = mean - sem), colour = "forestgreen", size = 1) +
facet_wrap(~group)
and use that to visually compare the model predictions with the observed results.
But I think this looks a bit ugly, so ideally would want to 'dodge' the point-and-errorbars geom(s) from the boxplot geom.
If you'll forgive my quick paint drawing, something like this:
It seems like position_dodge() might be the way to go but I haven't figured out how to combine two different geoms this way and the docs don't have any examples.
It might be impossible, but I thought I'd ask to check.
As a consequence of the grammar of graphics, which clearly separates the various aspects of plotting, there is no way to communicate information between different layers (geoms and stats) of a plot. This also means that a position adjustment cannot be shared across layers, so layers cannot be dodged relative to one another.
The next best thing you could do is to use position = position_nudge() in every layer, so that across the layers they appear dodged. You might also want to adjust the width parameter of the boxplot and errorbar for this. Example below:
library(tidyverse)
n <- 30
obs <- data.frame(
group = rep(c("A", "B"), each = n*3),
level = rep(rep(c("low", "med", "high"), each = n), 2),
yval = c(
rnorm(n, 30), rnorm(n, 50), rnorm(n, 90),
rnorm(n, 40), rnorm(n, 55), rnorm(n, 70)
)
) %>%
mutate(level = factor(level, levels = c("low", "med", "high")))
model_preds <- data.frame(
group = c("A", "A", "A", "B", "B", "B"),
level = rep(c("low", "med", "high"), 2),
mean = c(32,56,87,42,51,74),
sem = runif(6, min = 2, max = 5)
)
ggplot(obs, aes(x = level, y = yval, fill = group)) +
geom_boxplot(position = position_nudge(x = -0.3),
width = 0.5) +
geom_point(data = model_preds, aes(x = level, y = mean),
size = 2, colour = "forestgreen",
position = position_nudge(x = 0.3)) +
geom_errorbar(data = model_preds,
aes(x = level, y = mean, ymax = mean + sem, ymin = mean - sem),
colour = "forestgreen", size = 1, width = 0.5,
position = position_nudge(x = 0.3)) +
facet_wrap(~group)
Created on 2021-01-17 by the reprex package (v0.3.0)
Dataframe as example:
library(tidyverse)
set.seed(123)
df <- data.frame("b" = runif(1000, min = 2, max = 10),
"c" = runif(1000, min = 2, max = 10),
"d" = runif(1000, min = 2, max = 10))
df_2 <- data.frame(id = c("b", "c", "d"),
cutoff = c(5, 3, 5),
stringsAsFactors = FALSE)
df <-
pivot_longer(
df,
cols = c("b", "c", "d"),
names_to = "id",
values_to = "value"
) %>%
left_join(df_2, by = "id")
I can now make a violin plot (or a boxplot, same issue) with a line overlaid:
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_line(aes(x = id, y = cutoff, group = 1), color = "red")
What I'd like though is three lines (don't need to be connected) each of which extend across the entire width of a single violin, at the cutoff value specified in df_2.
I can do this manually with geom_segment, but is there a better, more programmatic way?
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_segment(aes(x = 0.55, xend = 1.45, y = 5, yend = 5), color = "blue") +
geom_segment(aes(x = 1.55, xend = 2.45, y = 3, yend = 3), color = "blue") +
geom_segment(aes(x = 2.55, xend = 3.45, y = 5, yend = 5), color = "blue")
I understand that at some fundamental level the x-axis is ordered by factor level, with b = 1, c = 2 etc., so asking for a line intersecting x = 0.9 would require specifying corresponding y value. In another sense though, ggplot2 clearly knows (in some sense) that the region above x = 0.9 (that is, y values intersected by a vertical line at x = 0.9) is associated with factor level b because the corresponding violin for b overlaps that region. Is there a way to get at that information?
You can use geom_errorbar(). So change your second block to:
df %>%
ggplot(aes(x = id)) +
geom_violin(aes(y = value)) +
geom_errorbar(aes(x = id, ymin = cutoff, ymax = cutoff), color = "red")
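If segments are preferred over error bars, the manual geom_segment calls from the question could also be generated from df_2. This is only a sketch under a few assumptions: the data are a simplified stand-in for the question's data frame, xpos and half_width are hypothetical helper names, and the x positions rely on the discrete axis being ordered by the default (alphabetical) factor levels.

```r
library(ggplot2)

set.seed(123)
# Simplified stand-in for the long-format data in the question.
values <- data.frame(
  id = rep(c("b", "c", "d"), each = 100),
  value = runif(300, min = 2, max = 10)
)
df_2 <- data.frame(id = c("b", "c", "d"), cutoff = c(5, 3, 5))

# Hypothetical helper column: numeric axis position of each id, assuming
# the default factor ordering (b = 1, c = 2, d = 3).
df_2$xpos <- as.numeric(factor(df_2$id))

# Half-width taken from the manual segments in the question (0.55 to 1.45).
half_width <- 0.45

p <- ggplot(values, aes(x = id, y = value)) +
  geom_violin() +
  geom_segment(
    data = df_2,
    aes(x = xpos - half_width, xend = xpos + half_width,
        y = cutoff, yend = cutoff),
    colour = "blue",
    inherit.aes = FALSE
  )
p
```

The trade-off is the same one noted for segments elsewhere: the discrete axis positions have to be handled as numbers, whereas geom_errorbar() works directly with the factor.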
I am trying to plot a simple scatter plot for 3 groups, with different horizontal lines (line segment) for each group: for instance a hline at 3 for group "a", a hline at 2.5 for group "b" and a hline at 6 for group "c".
library(ggplot2)
df <- data.frame(tt = rep(c("a","b","c"),40),
val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
ggplot(df, aes(tt, val))+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05))
I really appreciate your help!
Never send a line when a point can suffice:
library(ggplot2)
df <- data.frame(tt = rep(c("a","b","c"),40),
val = round(rnorm(120, m = rep(c(4, 5, 7), each = 40))))
hline <- data.frame(tt=c("a", "b", "c"), v=c(3, 2.5, 6))
ggplot(df, aes(tt, val))+
geom_point(data=hline, aes(tt, v), shape=95, size=20) +
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05))
There are other ways if this isn't acceptable, such as:
hline <- data.frame(tt=c(1, 2, 3), v=c(3, 2.5, 6))
ggplot(df, aes(tt, val))+
geom_jitter(aes(tt, val), data = df, colour = I("red"),
position = position_jitter(width = 0.05)) +
geom_segment(data=hline, aes(x=tt-0.25, xend=tt+0.25, y=v, yend=v))
The downside for the point is the egregious thickness and no control over width.
The downside for the segment is the need to use numerics for the discrete axis position vs the factors.
I also should have set the random seed to ensure reproducibility.
I'm trying to draw a simple (scree)-plot with some extra geom_hline and geom_vlines thrown in.
Problem is: whenever I so much as add show_guide=TRUE or add some aes() to the geom_xline, I screw up the original legend.
Here's some (ugly) fake data:
exdf <- data.frame(rep(x=1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
And here's my plot:
g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue))
g <- g + geom_line(mapping = aes(group = factor(variable), linetype = variable))
g <- g + geom_vline(xintercept = 7, colour = "green", show_guide = TRUE)
How do I add a separate legend for the geom_vline without polluting the other legend?
Can't wrap my head around why one layer's color would change that of another legend.
This partly solves the problem:
g <- ggplot(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue))
g <- g + geom_line(mapping = aes(group = factor(variable), linetype = variable))
g <- g + geom_vline(aes(xintercept = x, colour = Threshold), data.frame(x = 7, Threshold = "A"), show_guide = TRUE) + scale_colour_manual(values = c(A = "green"))
But the legend will still have crosses for the variable section, albeit not green ones.
Alternatively you could use a geom_line with a new data frame with two rows, both with the same x and y equal to the lower and upper bounds of your data. This will give you a legend that has a horizontal green line for your threshold and no vertical lines.
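The geom_line alternative could look roughly like the sketch below. It is not tested against the original poster's setup: the fake data are rebuilt with a fixed seed, and the threshold data frame with its Threshold column is a hypothetical name introduced for illustration.

```r
library(ggplot2)

set.seed(1)
# Fake data in the same shape as the question's exdf.
exdf <- data.frame(
  PC = rep(1:12, times = 3),
  variable = rep(c("A", "B", "C"), each = 12),
  eigenvalue = rnorm(36),
  stringsAsFactors = FALSE
)

# Two rows with the same x; y spans the lower and upper bounds of the data.
threshold <- data.frame(
  PC = c(7, 7),
  eigenvalue = range(exdf$eigenvalue),
  Threshold = "A"
)

g <- ggplot(exdf, aes(x = factor(PC), y = eigenvalue)) +
  geom_line(aes(group = variable, linetype = variable)) +
  # Vertical threshold drawn as an ordinary line with its own colour legend.
  geom_line(data = threshold, aes(group = Threshold, colour = Threshold)) +
  scale_colour_manual(values = c(A = "green"))
g
```

Because the threshold is drawn by geom_line rather than geom_vline, its legend key is a horizontal line, and it only spans the range of the data rather than the full panel height.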
Based on @Nick K's suggestion above, here's a way to do this with clean legends by using a different data = for each layer.
exdf <- data.frame(rep(x=1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
g <- ggplot()
g <- g + geom_line(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue, group = factor(variable), linetype = variable))
g
thresholds <- data.frame(threshold = "Threshold-A", PC = 7, ymin = min(exdf$eigenvalue), ymax = max(exdf$eigenvalue))
g <- g + geom_linerange(data = thresholds, mapping = aes(x = PC, ymin = ymin, ymax = ymax, color = threshold))
g
yields:
Notice:
I know, the original data exdf are dumb and make an ugly plot; that's not the point here.
Notice that you have to set the data = argument for both layers, and keep the first g <- ggplot() blank, otherwise ggplot2 gets confused about the dataframes.
Yes, it's a hack job (see below), and it also doesn't fill the full y-height of the plot, as a geom_vline should.
As an add-on, (not a solution!), here's how it should work with geom_vline:
exdf <- data.frame(rep(x=1:12, times = 3), rep(x = c("A", "B", "C"), times = 6), rnorm(36), stringsAsFactors = FALSE)
colnames(exdf) <- c("PC", "variable", "eigenvalue")
g <- ggplot()
g <- g + geom_line(data = exdf, mapping = aes(x = factor(PC), y = eigenvalue, group = factor(variable), linetype = variable))
g
g + geom_vline(data = thresholds, mapping = aes(xintercept = PC, color = threshold), show_guide = TRUE)
yields:
That fills the y-height, as you would expect from geom_vline, but somehow messes up the legend of variable (notice the vertical lines).
I'm not sure why this happens; it feels like a bug to me.
Here reported: https://github.com/hadley/ggplot2/issues/1267