How to extend line across entire violin plot - r

Dataframe as example:
library(tidyverse)
set.seed(123)
df <- data.frame("b" = runif(1000, min = 2, max = 10),
                 "c" = runif(1000, min = 2, max = 10),
                 "d" = runif(1000, min = 2, max = 10))
df_2 <- data.frame(id = c("b", "c", "d"),
                   cutoff = c(5, 3, 5),
                   stringsAsFactors = FALSE)
df <-
  pivot_longer(
    df,
    cols = c("b", "c", "d"),
    names_to = "id",
    values_to = "value"
  ) %>%
  left_join(df_2, by = "id")
I can now make a violin plot (or a boxplot, same issue) with a line overlaid:
df %>%
  ggplot(aes(x = id)) +
  geom_violin(aes(y = value)) +
  geom_line(aes(x = id, y = cutoff, group = 1), color = "red")
What I'd like though is three lines (don't need to be connected) each of which extend across the entire width of a single violin, at the cutoff value specified in df_2.
I can do this manually with geom_segment, but is there a better, more programmatic way?
df %>%
  ggplot(aes(x = id)) +
  geom_violin(aes(y = value)) +
  geom_segment(aes(x = 0.55, xend = 1.45, y = 5, yend = 5), color = "blue") +
  geom_segment(aes(x = 1.55, xend = 2.45, y = 3, yend = 3), color = "blue") +
  geom_segment(aes(x = 2.55, xend = 3.45, y = 5, yend = 5), color = "blue")
I understand that at some fundamental level the x-axis is ordered by factor level, with b = 1, c = 2 etc., so asking for a line intersecting x = 0.9 would require specifying corresponding y value. In another sense though, ggplot2 clearly knows (in some sense) that the region above x = 0.9 (that is, y values intersected by a vertical line at x = 0.9) is associated with factor level b because the corresponding violin for b overlaps that region. Is there a way to get at that information?

You can use geom_errorbar(). So change your second block to:
df %>%
  ggplot(aes(x = id)) +
  geom_violin(aes(y = value)) +
  geom_errorbar(aes(x = id, ymin = cutoff, ymax = cutoff), color = "red")
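A minimal sketch of the same idea, under two assumptions that are not part of the answer above. First, the cutoffs are passed as their own three-row data frame (called cutoffs here, a name chosen only for illustration), so one bar is drawn per violin rather than one per observation. Second, width = 0.9 is set explicitly, matching the 0.55 to 1.45 span used in the question's geom_segment version. The data are rebuilt with base R only to keep the example short.

```r
library(ggplot2)

set.seed(123)
ids <- c("b", "c", "d")

# Long-format data comparable to the question's df (built with base R here)
df_long <- data.frame(
  id    = rep(ids, each = 1000),
  value = runif(3000, min = 2, max = 10)
)

# One row per violin; "cutoffs" is an illustrative name
cutoffs <- data.frame(id = ids, cutoff = c(5, 3, 5))

p <- ggplot(df_long, aes(x = id)) +
  geom_violin(aes(y = value)) +
  # ymin == ymax collapses the error bar into a single horizontal line
  geom_errorbar(
    data = cutoffs,
    aes(ymin = cutoff, ymax = cutoff),
    width = 0.9, color = "red"
  )

print(p)
```

Because the x position comes from id, adding or reordering groups should not require touching any coordinates by hand.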

Related

Overlay two plots from different dataframes in R

I would like to overlay two ggplots from different data sources. I don't think a left_join will work because the data frames have different lengths, and joining could potentially change the underlying plots. [Maybe?]
library(tidyverse)
set.seed(123)
player_df <- tibble(name = rep(c("A","B","C","D"), each = 10, times = 1),
pos = rep(c("DEF","DEF","MID","MID"), each = 10, times = 1),
load = c(rnorm(10, mean = 200, sd = 100),
rnorm(10, mean = 300, sd = 50),
rnorm(10, mean = 400, sd = 100),
rnorm(10, mean = 500, sd = 50)))
p1 <- player_df %>%
ggplot(aes(x = load, y = name)) +
geom_point()
pos_df <- tibble(pos = rep(c("DEF","MID"), each = 30, times = 1),
load = (c(rnorm(30, mean = 250, sd = 100),
rnorm(30, mean = 350, sd = 100))))
p2 <- pos_df %>%
ggplot(aes(x = load, y = pos)) +
geom_boxplot()
p1
p2
# add p2 to every p1 player plot by pos
I would like p1 to have the corresponding p2 - by pos - appear behind it. So... add the matching p2 boxplot to each p1 scatterplot.
It's not really advisable to attempt to superimpose two plots on each other. A ggplot is made of layers already, so usually it's just a case of superimposing one geom on another. This can be difficult if (as in your case) one of the axes has different labels. However, with a little work it is possible to wrangle your data so that it all sits on a single plot. In your case, you could do something like:
levs <- c("A", "DEF", "B", "C", "MID", "D")
ggplot(within(pos_df, pos <- factor(pos, levs)), aes(x = load, y = pos)) +
geom_boxplot(width = 2.3) +
geom_point(data = within(player_df, pos <- factor(name, levs))) +
scale_y_discrete(limits = c("A", "DEF", "B", " ", "C", "MID", "D"))
Dug into ggplot a bit and re-engineered a boxplot bit by bit.
# manually calculate stats that are used in boxplots
pos_df_summary <- pos_df %>%
group_by(pos, .drop = FALSE) %>%
summarise(min = fivenum(load)[1],
Q1 = fivenum(load)[2],
median = fivenum(load)[3],
Q3 = fivenum(load)[4],
max = fivenum(load)[5]
)
# add the boxplot data to each player
joined_df <- player_df %>%
left_join(., pos_df_summary, by = "pos") %>%
distinct(name, .keep_all = TRUE)
# plot
ggplot(data = NULL, aes(group = name)) +
  # create the line from min to max
  geom_segment(data = joined_df,
               aes(y = name, yend = name, x = min, xend = max),
               color = "black") +
  # create the box with median line
  geom_crossbar(data = joined_df,
                aes(y = name, xmin = Q1, xmax = Q3, x = median, fill = "NA"),
                color = "black",
                fatten = 1) +
  scale_fill_manual(values = "white") +
  # add the points from the player_df
  geom_point(data = player_df,
             aes(x = load, y = name, group = name),
             color = "red",
             show.legend = FALSE) +
  theme(legend.position = "none")
There may be some extraneous code in here as I cobbled it from some other resources. Specifically, I'm not sure what the aes(group = name) in the ggplot() call does exactly.

How to add individual hlines for each bar in a plot?

Given a data frame and a plot as follows:
library(dplyr)
library(ggplot2)
dat <- data.frame(grp = c("a", "b", "c"),
val = c(30, 20, 10),
avg = c(25, 15, 5))
dat %>%
ggplot(aes(x = grp, y = val)) +
geom_bar(stat = "identity")
How do I amend the code above to place a unique horizontal reference line (avg) on each bar?
This can be achieved via geom_segment. I first convert grp to a numeric and then, since the default bar width is .9, put x at .45 to the left and xend at .45 to the right of each bar's center:
library(ggplot2)
dat <- data.frame(grp = c("a", "b", "c"),
val = c(30, 20, 10),
avg = c(25, 15, 5))
ggplot(dat, aes(x = grp, y = val)) +
geom_bar(stat = "identity") +
geom_segment(aes(y = avg, yend = avg,
x = as.numeric(factor(grp)) - .45,
xend = as.numeric(factor(grp)) + .45), color = "red")
EDIT: Thanks to a comment by @tjebo: since hard-coding is rarely a good idea, one could set the width via a variable:
w <- .9
...
geom_segment(aes(y = avg, yend = avg,
x = as.numeric(factor(grp)) - w/2,
xend = as.numeric(factor(grp)) + w/2), color = "red")
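The position arithmetic above can be checked on its own. This is only a sketch using base R; x_start and x_end are illustrative names, not part of the answer.

```r
grp <- c("a", "b", "c")
w <- 0.9  # bar width, as in the answer above

# Discrete x values sit at integer positions 1, 2, 3, ... in factor-level order
pos <- as.numeric(factor(grp))

# Left and right edges of each bar
x_start <- pos - w / 2
x_end   <- pos + w / 2

print(data.frame(grp, x_start, x_end))
```

If a different width is passed to geom_bar, the same w would need to be used there for the segments to line up with the bars.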

control overlaying lines while color is continuous value in ggplot

I have some data and would like to plot lines while controlling the order in which they are drawn on top of each other.
I would like to use scale_color_viridis() as my palette, but I have no idea how to plot the lighter (yellow) line on top of the darker ones.
Here is my toy data frame and my code:
toy_data <- data.frame(x = c(1,3,1,2,5,0), y = c(0, 01, 1, 0.6, 1, .7),
col = rep(c("r", "b", "g"), each = 2), group = seq(0,1, by = 0.2))
ggplot(toy_data, aes(x = x, y = y, group = col, color = group)) +
geom_line(size = 2) +
scale_color_viridis()
Any idea how I can do this?
The group aesthetic determines the plotting order, in this case, the col variable which is character data. It will normally plot in alphabetical order (b g r), so to get the yellow line from col "g" to print last, you could convert it to a factor ordered in order of appearance, like with forcats::fct_inorder:
ggplot(toy_data,
aes(x = x, y = y, group = col %>% forcats::fct_inorder(), color = group)) +
geom_line(size = 2) +
scale_color_viridis_c() # added in ggplot2 3.0 in July 2018.
# scale_color_viridis for older ggplot2 versions
If col is numeric, you could achieve the same thing by giving your "top" series the biggest number.
toy_data2 <- data.frame(x = c(1,3,1,2,5,0), y = c(0, 01, 1, 0.6, 1, .7),
col = rep(c(3, 1, 2), each = 2), group = seq(0,1, by = 0.2))
ggplot(toy_data2,
aes(x = x, y = y, group = if_else(col == 2, 1e10, col), color = group)) +
geom_line(size = 2) +
scale_color_viridis_c()
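The factor-ordering step from the first approach can be inspected without plotting. The sketch below uses base R only and assumes that factor(col, levels = unique(col)) is an acceptable stand-in for forcats::fct_inorder(col); alphabetical and in_order are illustrative names.

```r
col <- rep(c("r", "b", "g"), each = 2)

# Levels in alphabetical order, which is what a plain character vector gives
alphabetical <- levels(factor(col))

# Levels in order of first appearance, the base R counterpart of
# forcats::fct_inorder(col)
in_order <- levels(factor(col, levels = unique(col)))

print(alphabetical)
print(in_order)
```

The last level is drawn last, so whichever series should end up on top needs to appear last in in_order.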

Dodge two different geoms apart in ggplot2

Let's say I have two different sources of data. One is of repeated observations, and one is just a mean +/- standard error predicted by a model.
n <- 30
obs <- data.frame(
group = rep(c("A", "B"), each = n*3),
level = rep(rep(c("low", "med", "high"), each = n), 2),
yval = c(
rnorm(n, 30), rnorm(n, 50), rnorm(n, 90),
rnorm(n, 40), rnorm(n, 55), rnorm(n, 70)
)
) %>%
mutate(level = factor(level, levels = c("low", "med", "high")))
model_preds <- data.frame(
group = c("A", "A", "A", "B", "B", "B"),
level = rep(c("low", "med", "high"), 2),
mean = c(32,56,87,42,51,74),
sem = runif(6, min = 2, max = 5)
)
Now I can plot these on the same graph easily enough:
p <- ggplot(obs, aes(x = level, y = yval, fill = group)) +
geom_boxplot() +
geom_point(data = model_preds, aes(x = level, y = mean), size = 2, colour = "forestgreen") +
geom_errorbar(data = model_preds, aes(x = level, y = mean, ymax = mean + sem, ymin = mean - sem), colour = "forestgreen", size = 1) +
facet_wrap(~group)
and use that to visually compare the model predictions with the observed results.
But I think this looks a bit ugly, so ideally would want to 'dodge' the point-and-errorbars geom(s) from the boxplot geom.
If you'll forgive my quick paint drawing, something like this:
It seems like position_dodge() might be the way to go but I haven't figured out how to combine two different geoms this way and the docs don't have any examples.
Might be that it's impossible, but thought I'd ask to check
As a consequence of the grammar of graphics, which clearly separates various aspects of plotting, there is no way to communicate information between different layers (geoms and stats) of a plot. This also means that a position adjustment cannot be shared across layers, so they cannot be dodged together in a multi-layer fashion.
The next best thing you could do is to use position = position_nudge() in every layer, so that across the layers they seem dodged. You might also want to adjust the width parameter of the boxplot and errorbar for this. Example below:
library(tidyverse)
n <- 30
obs <- data.frame(
group = rep(c("A", "B"), each = n*3),
level = rep(rep(c("low", "med", "high"), each = n), 2),
yval = c(
rnorm(n, 30), rnorm(n, 50), rnorm(n, 90),
rnorm(n, 40), rnorm(n, 55), rnorm(n, 70)
)
) %>%
mutate(level = factor(level, levels = c("low", "med", "high")))
model_preds <- data.frame(
group = c("A", "A", "A", "B", "B", "B"),
level = rep(c("low", "med", "high"), 2),
mean = c(32,56,87,42,51,74),
sem = runif(6, min = 2, max = 5)
)
ggplot(obs, aes(x = level, y = yval, fill = group)) +
geom_boxplot(position = position_nudge(x = -0.3),
width = 0.5) +
geom_point(data = model_preds, aes(x = level, y = mean),
size = 2, colour = "forestgreen",
position = position_nudge(x = 0.3)) +
geom_errorbar(data = model_preds,
aes(x = level, y = mean, ymax = mean + sem, ymin = mean - sem),
colour = "forestgreen", size = 1, width = 0.5,
position = position_nudge(x = 0.3)) +
facet_wrap(~group)
Created on 2021-01-17 by the reprex package (v0.3.0)

Add count as label to points in geom_count

I used geom_count to visualise overlaying points as sized groups, but I also want to add the actual count as a label to the plotted points, like this:
However, to achieve this, I had to create a new data frame containing the counts and use these data in geom_text as shown here:
#Creating two data frames
data <- data.frame(x = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4),
y = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
id = c("a", "b", "b", "b", "c",
"c", "d", "d", "d", "e"))
data2 <- data %>%
group_by(id) %>%
summarise(x = mean(x), y = mean(y), count = n())
# Creating the plot
ggplot(data = data, aes(x = x, y = y)) +
geom_count() +
scale_size_continuous(range = c(10, 15)) +
geom_text(data = data2,
aes(x = x, y = y, label = count),
color = "#ffffff")
Is there any way to achieve this in a more elegant way (i.e. without the need for the second data frame)? I know that you can access the count in geom_count using ..n.., yet if I try to access this in geom_text, this is not working.
Are you expecting this:
ggplot(data %>%
group_by(id) %>%
summarise(x = mean(x), y = mean(y), count = n()),
aes(x = x, y = y)) + geom_point(aes(size = count)) +
scale_size_continuous(range = c(10, 15)) +
geom_text(aes(label = count),
color = "#ffffff")
update:
If using geom_count is a must, then the expected output can be achieved using:
p <- ggplot(data = data, aes(x = x, y = y)) +
geom_count() + scale_size_continuous(range = c(10, 15))
p + geom_text(data = ggplot_build(p)$data[[1]],
aes(x, y, label = n), color = "#ffffff")
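Another option that avoids both the second data frame and ggplot_build() is to let the text layer compute the counts itself. The sketch below assumes ggplot2 3.3.0 or later (for after_stat()) and relies on stat = "sum", the stat behind geom_count(). The constant size = 4 is an assumption added here so that the labels do not follow the point size scale.

```r
library(ggplot2)

data <- data.frame(x = c(2, 2, 2, 2, 3, 3, 3, 3, 3, 4),
                   y = c(1, 2, 2, 2, 2, 2, 3, 3, 3, 3))

p <- ggplot(data, aes(x = x, y = y)) +
  geom_count() +
  scale_size_continuous(range = c(10, 15)) +
  # stat = "sum" counts observations per (x, y) location and exposes the
  # count as n; size is fixed so the text ignores the point size scale
  geom_text(stat = "sum", aes(label = after_stat(n)),
            size = 4, color = "#ffffff")

print(p)
```

Note that this counts by x and y position rather than by id, which is the same grouping geom_count() applies to the points.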
Here is a solution for data with discrete values:
f <- ggplot(data = STest, aes(x = x, y = y)) +
  geom_count() +
  scale_x_discrete(labels = c("strong decrease", "decrease", "no change", "increase", "strong increase", "no opinion")) +
  scale_y_discrete(labels = c("strong decrease", "decrease", "no change", "increase", "strong increase", "no opinion"))
f + geom_text(data = ggplot_build(f)$data[[1]], aes(x, y, label = n, vjust = -2))
A much easier way to change this is to use the labs() function so in this case it would be ...labs(size = "Count") + ....
That should be all you need.
