Creating equal and pretty scale breaks with facet_grid() - r

I want to create a plot using facet_grid(), with free scales for the y axis. However, in each row the scale breaks should be evenly distributed, with 3 breaks per panel.
I borrowed from this question, but I was not able to adapt the code so that the scale breaks are actually pretty.
This is my current approach:
# Packages
library(dplyr)
library(ggplot2)
library(scales)
# Test Data
set.seed(123)
result_df <- data.frame(
variable = rep(c(1,2,3,4), each = 4),
mode = rep(c(1,2), each = 2),
treat = rep(c(1,2)) %>% as.factor(),
mean = rnorm(16, mean = .7, sd = 0.2),
x = abs(rnorm(16, mean = 0, sd = 0.5))) %>%
mutate(lower = mean - x,upper = mean + x)
# Function for equal breaks (borrowed from the question linked above)
equal_breaks <- function(n = 3, s = 0.05, ...) {
function(x) {
d <- s * diff(range(x)) / (1+2*s)
round(seq(min(x)+d, max(x)-d, length=n), 2)
}}
## Plot
result_df %>%
ggplot(aes(y = mean*100, x = treat)) +
geom_pointrange(aes(ymin = lower*100, ymax = upper*100), shape = 20) +
facet_grid(variable ~ mode, scales = "free_y")+
scale_y_continuous(breaks = equal_breaks(n = 3, s = .2))+
labs(x = "", y = "")
This leads to the current plot. As one can see, the breaks are far from reasonable.
Thanks in advance for any recommendation, and please excuse me if I have missed an already existing solution.
Best, Malte
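One possible direction, sketched here and not taken from the original post: instead of spacing breaks evenly and rounding them, let base R's pretty() pick "round" values from each panel's own limits. The helper name pretty_breaks_n and the demo data are made up for illustration. Note that pretty() treats n as a target, so a panel may get roughly rather than exactly three breaks, and ggplot2 drops any break that falls outside the panel limits.

```r
# Hypothetical helper (not from the original post): wraps base R's pretty()
# so that each facet gets "round" break values computed from its own limits.
pretty_breaks_n <- function(n = 3) {
  function(limits) pretty(limits, n = n)
}

# Quick look at two made-up panel ranges
pretty_breaks_n(3)(c(43.2, 97.8))
pretty_breaks_n(3)(c(12.1, 14.9))

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(123)
  demo_df <- data.frame(
    variable = rep(1:4, each = 4),
    mode     = rep(rep(1:2, each = 2), times = 4),
    treat    = factor(rep(1:2, times = 8)),
    mean     = rnorm(16, mean = 0.7, sd = 0.2)
  )
  p <- ggplot(demo_df, aes(x = treat, y = mean * 100)) +
    geom_point() +
    facet_grid(variable ~ mode, scales = "free_y") +
    scale_y_continuous(breaks = pretty_breaks_n(3))
  # print(p) to draw the plot
}
```

The trade-off is that the number of breaks is no longer guaranteed to be identical in every row; in exchange, the labels are round numbers.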

Related

Referring to the input data of ggplot and use that in a custom function within a geom

I'm using ggplot geom_vline in combination with a custom function to plot certain values on top of a histogram.
The example function below e.g. returns a vector of three values (the mean and x sds below or above the mean). I can now plot these values in geom_vline(xintercept) and see them in my graph.
library(tidyverse) # tibble(), pivot_longer(), pull(), bind_rows(), ggplot()
# example function
sds_around_the_mean <- function(x, multiplier = 1) {
mean <- mean(x, na.rm = TRUE)
sd <- sd(x, na.rm = TRUE)
tibble(low = mean - multiplier * sd,
mean = mean,
high = mean + multiplier * sd) %>%
pivot_longer(cols = everything()) %>%
pull(value)
}
Reproducible data
#data
set.seed(123)
normal <- tibble(data = rnorm(1000, mean = 100, sd = 5))
outliers <- tibble(data = runif(5, min = 150, max = 200))
df <- bind_rows(lst(normal, outliers), .id = "type")
df %>%
ggplot(aes(x = data)) +
geom_histogram(bins = 100) +
geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 3),
linetype = "dashed", color = "red") +
geom_vline(xintercept = sds_around_the_mean(df$data, multiplier = 2),
linetype = "dashed")
The problem is that, as you can see, I have to reference df$data in several places.
This becomes more error-prone when I apply any change to the original df that I pipe into ggplot, e.g. filtering out outliers before plotting. I would have to apply the same changes again at multiple places.
E.g.
df %>% filter(type == "normal")
#also requires
df$data
#to be changed to
df$data[df$type == "normal"]
#in geom_vline to obtain the correct input values for the xintercept.
So instead, how could I replace the df$data argument with the respective column of whatever has been piped into ggplot() in the first place? Something similar to the "." operator, I assume. I've also tried stat_summary with geom = "vline" to achieve this, but without the desired effect.
You can enclose the ggplot part in curly brackets and reference the incoming dataset with the . symbol both in the ggplot command and when calculating the sds_around_the_mean. This will make it dynamic.
df %>%
{ggplot(data = ., aes(x = data)) +
geom_histogram(bins = 100) +
geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 3),
linetype = "dashed", color = "red") +
geom_vline(xintercept = sds_around_the_mean(.$data, multiplier = 2),
linetype = "dashed")}
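An alternative to the curly-brace approach, given only as a sketch: ggplot2 layers accept a function for their data argument, and that function is called with the plot's data. The helper sd_lines below is hypothetical (it is not part of the thread) and returns such a function, so the line positions are always computed from whatever was passed to ggplot().

```r
# Hypothetical helper (not from the thread): returns a function that ggplot2
# calls with the plot data, producing one row per vertical line.
sd_lines <- function(column, multiplier = 1) {
  function(d) {
    m <- mean(d[[column]], na.rm = TRUE)
    s <- sd(d[[column]], na.rm = TRUE)
    data.frame(xintercept = c(m - multiplier * s, m, m + multiplier * s))
  }
}

# The helper can be checked without ggplot2: mean 2, sd 1
sd_lines("data", multiplier = 2)(data.frame(data = c(1, 2, 3)))

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  set.seed(123)
  demo <- data.frame(
    type = rep(c("normal", "outliers"), times = c(1000, 5)),
    data = c(rnorm(1000, mean = 100, sd = 5), runif(5, min = 150, max = 200))
  )
  # Filtering happens once; both vline layers follow automatically
  p <- ggplot(subset(demo, type == "normal"), aes(x = data)) +
    geom_histogram(bins = 100) +
    geom_vline(data = sd_lines("data", 3), aes(xintercept = xintercept),
               linetype = "dashed", color = "red") +
    geom_vline(data = sd_lines("data", 2), aes(xintercept = xintercept),
               linetype = "dashed")
  # print(p) to draw the plot
}
```

Compared with the . placeholder, this keeps the usual ggplot() + layer syntax, at the cost of naming the column as a string.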

R ggplot: overlay two conditional density plots (same binary outcome variable) - possible?

I know how to plot several density curves/polygons on one plot, but not conditional density plots.
Reproducible example:
require(ggplot2)
# generate data
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
c <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,c)
# plot 1
ggplot(df, aes(a, fill = c)) +
geom_density(position='fill', alpha = 0.5)
# plot 2
ggplot(df, aes(b, fill = c)) +
geom_density(position='fill', alpha = 0.5)
In my real data I have a bunch of these paired conditional density plots and I would need to overlay one over the other to see (and show) how different (or similar) they are. Does anyone know how to do this?
One way would be to plot the two versions as layers. The overlapping areas will be slightly different, depending on the layer order, based on how alpha works in ggplot2. This may or may not be what you want. You might fiddle with the two alphas, or vary the border colors, to distinguish them more.
ggplot(df, aes(fill = c)) +
geom_density(aes(a), position='fill', alpha = 0.5) +
geom_density(aes(b), position='fill', alpha = 0.5)
For example, you might make it so the fill only applies to one layer, but the other layer distinguishes groups using the group aesthetic, and perhaps a different linetype. This one seems more readable to me, especially if there is a natural ordering to the two variables that justifies putting one in the "foreground" and one in the "background."
ggplot(df) +
geom_density(aes(a, group = c), position='fill', alpha = 0.2, linetype = "dashed") +
geom_density(aes(b, fill = c), position='fill', alpha = 0.5)
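A side note with a small base R sketch. With position = 'fill' and the default y (density), each group's curve is scaled to area 1 before the shares are computed, so group sizes are ignored; the ggplot2 documentation's conditional density example maps y to after_stat(count) for that reason. The helper cond_share below is hypothetical and reproduces the count-weighted version numerically, which also gives a way to quantify how far apart two conditional curves are instead of only comparing them by eye.

```r
# Hypothetical helper (not from the thread): share of each group along x,
# weighting each group's kernel density estimate by its sample size.
cond_share <- function(x, g, rng = range(x), n = 256) {
  groups <- sort(unique(as.character(g)))
  # One column per group: density * group size, all on a common grid
  weighted <- sapply(groups, function(k) {
    xi <- x[g == k]
    density(xi, from = rng[1], to = rng[2], n = n)$y * length(xi)
  })
  data.frame(x = seq(rng[1], rng[2], length.out = n),
             weighted / rowSums(weighted))
}

set.seed(1)
a   <- runif(200, min = 0, max = 1000)
b   <- runif(200, min = 0, max = 1000)
grp <- sample(c("A", "B"), 200, replace = TRUE)

# Same grid for both variables so the curves are directly comparable
share_a <- cond_share(a, grp, rng = c(0, 1000))
share_b <- cond_share(b, grp, rng = c(0, 1000))

# Largest vertical gap between the two conditional curves for group "A"
max(abs(share_a$A - share_b$A))
```

The bandwidth is chosen per group by density(), so the numbers will not match a particular plot exactly; treat this as an approximation.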
I'm not so sure that "on top of one another" is a great idea. Jon's ideas are probably the way to go. But what about just plotting side by side? Our brains can cope with that, and we can compare the panels pretty well.
Make it long, then use facet.
Another option might be an animated graph (see 2nd code chunk below).
require(ggplot2)
#> Loading required package: ggplot2
library(tidyverse)
a <- runif(200, min=0, max = 1000)
b <- runif(200, min=0, max = 1000)
#### BAAAAAD idea to call anything "c" in R!!! Don't do this. ever!
d <- sample(c("A", "B"), 200, replace =T)
df <- data.frame(a,b,d)
df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
facet_grid(~name)
library(gganimate)
p <- df %>% pivot_longer(cols = c(a,b)) %>%
ggplot(aes(value, fill = d)) +
geom_density(position='fill', alpha = 0.5) +
labs(title = "{closest_state}")
p_anim <- p + transition_states(name)
animate(p_anim, duration = 2, fps = 5)
Created on 2022-06-14 by the reprex package (v2.0.1)
Although it is not the overlay you might have thought of, it facilitates the comparison of density curves:
library(tidyverse)
library(ggridges)
library(truncnorm)
DF <- tibble(
alpha = rtruncnorm(n = 200, a = 0, b = 1000, mean = 500, sd = 50),
beta = rtruncnorm(n = 200, a = 0, b = 1000, mean = 550, sd = 50)
)
DF <- DF %>%
pivot_longer(c(alpha, beta), names_to = "name", values_to = "meas") %>%
mutate(name = factor(name))
DF %>%
ggplot(aes(meas, name, fill = factor(after_stat(quantile)))) +
stat_density_ridges(
geom = "density_ridges_gradient",
calc_ecdf = T,
quantiles = 4,
quantile_lines = T
) +
scale_fill_viridis_d(name = "Quartiles")

Avoid overlap of points on a timeline (1-D repeling)

I want to create a timeline plot that roughly resembles the example below: lots of overlap at some points, not a lot of overlap at others.
What I need: overlapping images should repel each other where necessary, eliminating or reducing overlap. Ideally I'd be able to implement either a vertical or horizontal repel.
library(tidyverse)
library(ggimage)
test_img <- list.files(system.file("extdata", package="ggimage"), pattern="png", full.names=TRUE)
set.seed(123)
df <-
tibble(date = as.Date(paste0("2020-", round(runif(45, 1, 2)), "-", round(runif(45, 1, 10)))),
group = paste0("Timeline ", rep(1:9, each = 5)),
img = sample(test_img, size = 45, replace = T) )
df %>%
ggplot() +
geom_line(aes(x = date, y = group, group = group), size = 5, alpha = 0.2) +
geom_image(aes(x = date, y = group, image = img, group = group), asp = 1)
Something similar to the repelling in ggbeeswarm::geom_beeswarm or ggrepel::geom_text_repel would be nice, but those don't support images. So I think I need to pre-apply some kind of 1-dimensional packing algorithm, implementing iterative pair-wise repulsion on my vector of dates within each group, to try to find a non-overlapping arrangement.
Any ideas? Thank you so much!
Created on 2021-10-30 by the reprex package (v2.0.1)
Here is the solution I've been able to come up with, repurposing the circleRepelLayout function from the awesome packcircles package into a repel_vector function that takes your overlapping vector and a "repel_radius" and returns, if possible, a non-overlapping version.
I demonstrate the solution with the richtext geom since this is a geom I’ve always wished had repel functionality.
library(packcircles)
library(tidyverse)
library(ggtext)
library(ggimage)
repel_vector <- function(vector, repel_radius = 1, repel_bounds = range(vector)){
stopifnot(is.numeric(vector))
repelled_vector <-
packcircles::circleRepelLayout(x = data.frame(vector, ypos = 1, repel_radius),
xysizecols = c("vector", "ypos", "repel_radius"),
xlim = repel_bounds, ylim = c(0,1),
wrap = FALSE) %>%
as.data.frame() %>%
.$layout.x
return(repelled_vector)
}
overlapping_vec <- c(1, 1.1, 1.2, 10, 10.1, 10.2)
repelled_vec_default <- repel_vector(overlapping_vec)
repelled_vec_tighter <- repel_vector(overlapping_vec, repel_radius = 0.35)
ggplot() +
annotate("richtext", x = overlapping_vec, y = 3, label = "**test**", alpha = 0.5) +
annotate("richtext", x = repelled_vec_default, y = 2, label = "**test**", alpha = 0.5) +
annotate("richtext", x = repelled_vec_tighter, y = 1, label = "**test**", alpha = 0.5) +
scale_y_continuous(breaks = 1:3, labels = c("Tighter repel", "Default repel", "Overlapping points"))
In theory you could apply this to 2D repelling as well.
To solve the problem in my question, this can be applied like so:
test_img <- list.files(system.file("extdata", package="ggimage"), pattern="png", full.names=TRUE)
set.seed(123)
df <-
tibble(date = as.Date(paste0("2020-", round(runif(45, 1, 2)), "-", round(runif(45, 1, 10)))),
group = paste0("Timeline ", rep(1:9, each = 5)),
img = sample(test_img, size = 45, replace = T) ) %>%
group_by(group) %>%
mutate(repelled_date = repel_vector(as.numeric(date),
repel_radius = 4,
repel_bounds = range(as.numeric(date)) + c(-3,3)),
repelled_date = as.Date(repelled_date, origin = "1970-01-01"))
df %>%
ggplot() +
geom_line(aes(x = date, y = group, group = group), size = 5, alpha = 0.2) +
geom_image(aes(x = repelled_date, y = group, image = img, group = group), asp = 1)
Created on 2021-10-30 by the reprex package (v2.0.1)
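For comparison with the packcircles-based repel_vector above, here is a much cruder, dependency-free sketch of 1-D spacing. It is not the circleRepelLayout algorithm: it sorts the values and only ever pushes points to the right until neighbours are at least min_gap apart. The function name spread_1d and its arguments are made up for illustration.

```r
# Hypothetical helper (not from the thread): enforce a minimum gap between
# neighbouring values with a single left-to-right pass over the sorted input.
spread_1d <- function(x, min_gap = 1) {
  stopifnot(is.numeric(x), min_gap >= 0)
  ord <- order(x)
  pos <- x[ord]
  for (i in seq_along(pos)[-1]) {
    # Keep the point where it is unless it crowds its left neighbour
    pos[i] <- max(pos[i], pos[i - 1] + min_gap)
  }
  # Return the values in the original input order
  out <- numeric(length(x))
  out[ord] <- pos
  out
}

overlapping_vec <- c(1, 1.1, 1.2, 10, 10.1, 10.2)
spread_1d(overlapping_vec, min_gap = 0.5)

# Dates can be handled the same way as in the answer above
d <- as.Date(c("2020-01-03", "2020-01-03", "2020-01-04"))
as.Date(spread_1d(as.numeric(d), min_gap = 4), origin = "1970-01-01")
```

Because points only move right, clusters drift away from their original centre and there are no bounds; the packcircles approach handles both, so this is mainly useful as a quick fallback.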

Mix color and fill aesthetics in ggplot

I wonder whether it is possible to change the main fill colour according to a categorical variable.
Here is a reproducible example:
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = c(rep('a', times = 10),
rep('b', times = 10)),
val = rep(1:10, times = 2))
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(color = grp,
fill = val,
size = val))
Of course it is easy to change the circle colour/shape, according to the variable grp, but I'd like to have the a group in shades of red and the b group in shades of blue.
I also thought about using facets, but don't know if the fill gradient can be changed for the two panels.
Does anyone know whether that can be done without gridExtra?
Thanks!
I think there are two ways to do this. The first is using the alpha aesthetic for your val column. This is a quick and easy way to accomplish your goal but may not be exactly what you want:
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(alpha=val,
fill = grp,
size = val)) + theme_minimal()
The second way would be to do something similar to this post: Vary the color gradient on a scatter plot created with ggplot2. I edited the code slightly so it's not a range from white to your color of interest but from a lighter color to a darker color. This requires a little bit of work and uses the scale_fill_identity function, which takes a variable that already holds the colors you want and maps them directly to each point (so it doesn't do any scaling).
This code is:
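library(scales) # rescale()
library(plyr)   # ddply()
# grp must be a factor so that as.numeric(x$grp) can index the colour vectors
df$grp <- factor(df$grp)
#Rescale val to [0,1]
df$scaled_val <- rescale(df$val)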
low_cols <- c("firebrick1","deepskyblue")
high_cols <- c("darkred","deepskyblue4")
df$col <- ddply(df, .(grp), function(x)
data.frame(col=apply(colorRamp(c(low_cols[as.numeric(x$grp)[1]], high_cols[as.numeric(x$grp)[1]]))(x$scaled_val),
1,function(x)rgb(x[1],x[2],x[3], max=255)))
)$col
df
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(
fill = col,
size = val)) + theme_minimal() +scale_fill_identity()
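The same idea as the ddply version above can be sketched with base R only, which avoids the plyr dependency. The helper group_shades and its low/high arguments are hypothetical names; it rescales val within each group and interpolates between that group's light and dark colour, returning hex strings suitable for scale_fill_identity().

```r
# Hypothetical helper (not from the thread): one colour per observation,
# interpolated between a per-group "low" and "high" colour.
group_shades <- function(val, grp, low, high) {
  grp <- as.character(grp)
  out <- character(length(val))
  for (g in unique(grp)) {
    idx <- grp == g
    v   <- val[idx]
    rng <- range(v)
    # Rescale to [0, 1] within the group; constant groups get the midpoint
    s <- if (diff(rng) == 0) rep(0.5, length(v)) else (v - rng[1]) / diff(rng)
    ramp <- grDevices::colorRamp(c(low[[g]], high[[g]]))
    out[idx] <- grDevices::rgb(ramp(s), maxColorValue = 255)
  }
  out
}

demo <- data.frame(grp = rep(c("a", "b"), each = 10),
                   val = rep(1:10, times = 2))
demo$col <- group_shades(demo$val, demo$grp,
                         low  = c(a = "firebrick1", b = "deepskyblue"),
                         high = c(a = "darkred",    b = "deepskyblue4"))
head(demo)
```

The resulting col column is then mapped with aes(fill = col) and scale_fill_identity(), exactly as in the plot call above.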
Thanks to this other post I found a way to visualize the fill bar in the legend, even though that wasn't what I meant to do.
Here's the output
And the code
library(dplyr)   # %>%, group_by(), mutate()
library(scales)  # rescale()
library(ggplot2)
df = data.frame(x = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
y = c(rnorm(10, mean = 0),
rnorm(10, mean = 3)),
grp = factor(c(rep('a', times = 10),
rep('b', times = 10)),
levels = c('a', 'b')),
val = rep(1:10, times = 2)) %>%
group_by(grp) %>%
mutate(scaledVal = rescale(val)) %>%
ungroup %>%
mutate(scaledValOffSet = scaledVal + 100*(as.integer(grp) - 1))
scalerange <- range(df$scaledVal)
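# One (start, end) pair per group: two groups, so offsets 0 and 100,
# giving four values to match the four colours below
gradientends <- scalerange + rep(c(0,100), each=2)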
ggplot(data = df,
aes(x = x,
y = y)) +
geom_point(pch = 21,
aes(fill = scaledValOffSet,
size = val)) +
scale_fill_gradientn(colours = c('white',
'darkred',
'white',
'deepskyblue4'),
values = rescale(gradientends))
Basically one should rescale fill values (e.g. between 0 and 1) and separate them using another order of magnitude, provided by the categorical variable grp.
This is not what I wanted though: the snippet can be improved, of course, to make the whole thing less manual, but still lacks the simple usual discrete fill legend.

Boxplot width in ggplot with cross classified groups

I am making boxplots with ggplot with data that is classified by 2 factor variables. I'd like to have the box sizes reflect sample size via varwidth = TRUE but when I do this the boxes overlap.
1) Some sample data with a 3 x 2 structure
data <- data.frame(group1= sample(c("A","B","C"),100, replace = TRUE),group2= sample(c("D","E"),100, replace = TRUE) ,response = rnorm(100, mean = 0, sd = 1))
2) Default boxplots: ggplot without variable width
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot()
I like how the first level of grouping is shown.
Now I try to add variable widths...
3) ...and What I get when varwidth = TRUE
ggplot(data = data, aes(y = response, x = group1, color = group2)) + geom_boxplot(varwidth = T)
This overlap seems to occur whether I use color = group2 or group = group2 in both the main call to ggplot and in the geom_boxplot statement. Fussing with position_dodge doesn't seem to help either.
4) A solution I don't like visually is to make unique factors by combining my group1 and group2
data$grp.comb <- paste(data$group1, data$group2)
ggplot(data = data, aes(y = response, x = grp.comb, color = group2)) + geom_boxplot()
I prefer having things grouped to reflect the cross classification
5) The way forward:
I'd like to either a) figure out how to make varwidth = TRUE not cause the boxes to overlap, or b) manually adjust the space between the combined groups so that boxes within the 1st level of grouping are closer together.
I think your problem can be solved best by using facet_wrap.
library(ggplot2)
data <- data.frame(group1= sample(c("A","B","C"),100, replace = TRUE), group2=
sample(c("D","E"),100, replace = TRUE) ,response = rnorm(100, mean = 0, sd = 1))
ggplot(data = data, aes(y = response, x = group2, color = group2)) +
geom_boxplot(varwidth = TRUE) +
facet_wrap(~group1)
Which gives:
A recent update to ggplot2 makes it so that the code provided by @N Brouwer in (3) works as expected:
# library(devtools)
# install_github("tidyverse/ggplot2")
packageVersion("ggplot2") # works with v2.2.1.9000
library(ggplot2)
set.seed(1234)
data <- data.frame(group1= sample(c("A","B","C"), 100, replace = TRUE),
group2= sample(c("D","E"), 100, replace = TRUE),
response = rnorm(100, mean = 0, sd = 1))
ggplot(data = data, aes(y = response, x = group1, color = group2)) +
geom_boxplot(varwidth = T)
(I'm a new user and can't post images inline)
fig 1
This question has been answered here: ggplot increase distance between boxplots
The answer involves using the position = position_dodge() argument of geom_boxplot().
For your example:
data <- data.frame(group1= sample(c("A","B","C"),100, replace = TRUE), group2=
sample(c("D","E"),100, replace = TRUE) ,response = rnorm(100, mean = 0, sd = 1))
ggplot(data = data, aes(y = response, x = group1, color = group2)) +
geom_boxplot(position = position_dodge(1))
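As a complement to the answers above, it can help to look at what varwidth = TRUE actually encodes: the ggplot2 documentation describes the box widths as proportional to the square roots of the number of observations in each group. The sketch below only tabulates those cell counts for the cross classification in base R; the variable names n_cell and rel_width are made up for illustration, and the exact widths drawn also depend on dodging.

```r
set.seed(1234)
data <- data.frame(
  group1   = sample(c("A", "B", "C"), 100, replace = TRUE),
  group2   = sample(c("D", "E"), 100, replace = TRUE),
  response = rnorm(100, mean = 0, sd = 1)
)

# Observations per group1 x group2 cell
n_cell <- table(data$group1, data$group2)

# Width of each box relative to the largest cell (square-root scaling)
rel_width <- round(sqrt(n_cell) / sqrt(max(n_cell)), 2)

n_cell
rel_width
```

If the relative widths are all close to 1, varwidth adds little information, and the plain dodged or faceted versions above may be the clearer choice.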
