Graph with a shaded area occupied by multiple lines - r

PROBLEM STATEMENT
My dataset contains 100 groups, each of which can be plotted as a line with a similar shape against a response variable. I would like to produce a graph where all the space taken by the 100 curved lines turns into a shaded area, so it is easier to show the variation of the response variable across all the groups. This will also make it easy to see the values or intervals on the x-axis where the response variable has lower variation (the shaded area will be narrower, as most lines will overlap) or higher variation.
CODE EXAMPLE
library(tidyverse)
library(ggplot2)
set.seed(1)
# Produce a similar table to the real one
example <- tibble(values = seq(0, 10, 0.1),
sine1 = sin(values + 0.2),
sine2 = sin(values - 0.2),
sine3 = sin(values + 0.4) + 0.2,
sine4 = sin(values - 0.4) - 0.2,
sine5 = sin(values - 0.4) + 0.2,
sine6 = sin(values - 0.2) + 0.4) %>%
pivot_longer(-values) # final format with 3 columns
# Create a line graph, where each line represents a different sine curve
graph1 <- ggplot(example, aes(x = values, y = value, col = name)) +
geom_line(size = 3, show.legend = FALSE, alpha = 0.5) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
graph1
QUESTION
Is there a way of going from this graph...
to this one (or similar)? Note: the thick black line is not strictly necessary

You just need to group by each x value (the values column) and calculate the minimum and maximum of value within each group. This gives you the lower and upper bounds for a geom_ribbon:
example %>%
group_by(values) %>%
summarize(min = min(value), max = max(value)) %>%
ggplot() +
geom_ribbon(aes(x = values, ymin = min, ymax = max), size = 2,
fill = "#29c8e5", color = "black") +
theme_classic()
If you would rather have the ribbon overlying your original plot, you could do:
ribbon <- example %>%
group_by(values) %>%
summarize(min = min(value), max = max(value))
graph1 +
geom_ribbon(aes(x = values, ymin = min, ymax = max),
data = ribbon, size = 0, fill = "#29c8e5",
color = NA, alpha = 0.3, inherit.aes = FALSE)
For what it's worth, I think the first option is more visually striking.
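To make the "narrower where the lines overlap" idea from the question concrete, here is a minimal base R sketch of the same per-x minimum/maximum calculation. It rebuilds the six example sine curves as a matrix and adds a width column; the names curves, envelope, width and narrowest are illustrative assumptions, not part of the original code.

```r
# Rebuild the example data in base R (same six sine curves as in the question)
values <- seq(0, 10, 0.1)
curves <- cbind(
  sine1 = sin(values + 0.2),
  sine2 = sin(values - 0.2),
  sine3 = sin(values + 0.4) + 0.2,
  sine4 = sin(values - 0.4) - 0.2,
  sine5 = sin(values - 0.4) + 0.2,
  sine6 = sin(values - 0.2) + 0.4
)

# Envelope: minimum and maximum across all curves at each x value
envelope <- data.frame(
  values = values,
  min = apply(curves, 1, min),
  max = apply(curves, 1, max)
)

# Width of the shaded area at each x (assumed helper column)
envelope$width <- envelope$max - envelope$min

# x value where the curves agree most, i.e. where the ribbon is narrowest
narrowest <- envelope$values[which.min(envelope$width)]
```

The min and max columns correspond to the ymin and ymax aesthetics used in the geom_ribbon calls above, so envelope could be passed as the data argument in place of the summarized tibble.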

Related

Change ggplot bar chart fill colors

With this data:
df <- data.frame(value =c(20, 50, 90),
group = c(1, 2,3))
I can get a bar chart:
df %>% ggplot(aes(x = group, y = value, fill = value)) +
geom_col() +
coord_flip()+
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
But I would like the color to vary along the length of each bar, following the values from 0 up to that bar's value, rather than having one flat color per bar.
I have managed to change them using geom_raster:
ggplot() +
geom_raster(aes(x = c(0:20), y = .9, fill = c(0:20)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:50), y = 2, fill = c(0:50)),
interpolate = TRUE) +
geom_raster(aes(x = c(0:90), y = 3.1, fill = c(0:90)),
interpolate = TRUE) +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
This approach is not efficient when I have many groups in real data. Any suggestions to get it done more efficiently would be appreciated.
I found the accepted answer to a previous similar question, but "These numbers needs to be adjusted depending on the number of x values and range of y". I was looking for an approach where I do not have to adjust numbers based on the data. David Gibson's answer fits my purpose.
It does not look like this is supported natively in ggplot. I was able to get something close by adding additional rows (ranging from 0 to value) to the data, then using geom_tile and separating the tiles by specifying width.
library(tidyverse)
df <- data.frame(value = c(20, 50, 90),
group = c(1, 2, 3))
df_expanded <- df %>%
rowwise() %>%
summarise(group = group,
value = list(0:value)) %>%
unnest(cols = value)
df_expanded %>%
ggplot() +
geom_tile(aes(
x = group,
y = value,
fill = value,
width = 0.9
)) +
coord_flip() +
scale_fill_viridis_c(option = "C") +
theme(legend.position = "none")
If this is too pixelated, you can increase the number of rows generated by replacing list(0:value) with list(seq(0, value, by = 0.1)).
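As a rough illustration of the finer-grained expansion, here is a base R sketch of the same row-expansion step. The step variable and the expand_group helper are hypothetical names introduced for this example; a smaller step produces more rows and a smoother gradient.

```r
df <- data.frame(value = c(20, 50, 90),
                 group = c(1, 2, 3))

# Hypothetical resolution of the gradient: smaller means more, thinner tiles
step <- 0.5

# Build one small data frame per group, with y values from 0 up to that group's value
expand_group <- function(group, value, step) {
  data.frame(group = group, value = seq(0, value, by = step))
}

# Apply the helper to each row of df and stack the results
df_expanded <- do.call(
  rbind,
  Map(expand_group, df$group, df$value, MoreArgs = list(step = step))
)
```

The resulting df_expanded has the same two columns as the tidyverse version above, so it should be usable with the same geom_tile call.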
This is a real hack using ggforce. This package has a geom that can take color gradients but it is for a line segment. I've just increased the size to make the line segment look like a bar. I made all the bars the same length to get the correct gradient, then covered a portion of each bar over with the same color as the background color to make them appear to be the correct length. Had to hide the grid lines, however. :-)
library(ggforce) # provides geom_link
df %>%
ggplot() +
geom_link(aes(x = 0, xend = max(value), y = group, yend = group, color = stat(index)), size = 30) +
geom_link(aes(x = value, xend = max(value), y = group, yend = group), color = "grey", size = 31) +
scale_color_viridis_c(option = "C") +
theme(legend.position = "none", panel.background = element_rect(fill = "grey"),
panel.grid = element_blank()) +
ylim(0.5, max(df$group)+0.5 )

How can I visualise points on a single line in R?

I want to plot 3 numerical size values on one line in R in order of ascending size, but my research so far has only pointed me towards regular line graphs. I'm looking for something like this:
where size increases from left to right and I can plot my 3 data points on the line to show where each sample falls. It doesn't need to be as complicated as this example, just one standalone line.
How would I go about doing this?
Here's a quick recreation:
library(tidyverse)
mtcars %>%
group_by(gear = as.factor(gear)) %>%
summarize(min = min(wt),
max = max(wt),
mean = mean(wt),
sd = sd(wt),
median = median(wt)) -> summary
ggplot(summary, aes(y=gear)) +
geom_errorbarh(aes(xmin = min, xmax = max), height = 0.04, color = "gray70") +
geom_segment(aes(yend = gear, x = mean-sd, xend = mean+sd), alpha = 0.3,
color = "forestgreen", size = 10) +
geom_point(aes(x = median), shape = 17, color = "darkred") +
geom_text(aes(x = median, label = median), vjust = -1.5) +
theme_minimal() + theme(panel.grid = element_blank())

Reproduce a plot using ggplot

I am trying to reproduce a plot using ggplot.
The code I got from the textbook:
skeptic<-c(1,1.171,1.4,1.8,2.2,2.6,3,3.4,3.8,3.934,4.2,
4.6,5,5.4,5.8,6.2,6.6,7,7.4,7.8,8.2,8.6,9)
effect<-c(-.361,-.327,-.281,-.200,-.120,-.039,.041,.122,.202,.229,.282,
.363,.443,.524,.604,.685,.765,.846,.926,1.007,1.087,1.168,1.248)
llci<-c(-.702,-.654,-.589,-.481,-.376,-.276,-.184,-.099,-.024,0,.044,.105,
.161,.212,.261,.307,.351,.394,.436,.477,.518,.558,.597)
ulci<-c(-.021,0,.028,.080,.136,.197,.266,.343,.428,.458,.521,.621,.726,.836,
.948,1.063,1.180,1.298,1.417,1.537,1.657,1.778,1.899)
plot(x=skeptic,y=effect,type="l",pch=19,ylim=c(-1,1.5),xlim=c(1,6),lwd=3,
ylab="Conditional effect of disaster frame",
xlab="Climate Change Skepticism (W)")
points(skeptic,llci,lwd=2,lty=2,type="l")
points(skeptic,ulci,lwd=2,lty=2,type="l")
abline(h=0, untf=FALSE,lty=3,lwd=1)
abline(v=1.171,untf=FALSE,lty=3,lwd=1)
abline(v=3.934,untf=FALSE,lty=3,lwd=1)
text(1.171,-1,"1.171",cex=0.8)
text(3.934,-1,"3.934",cex=0.8)
The exemplary plot is
I have tried ggplot but I am struggling with the vertical and horizontal dashed lines. Could anybody reproduce the plot using ggplot? I also have a follow-up question: how can I mark the area between x = 1.171 and x = 3.934? Thank you!
Here is a way to reproduce the posted graph.
library(ggplot2)
library(magrittr)
library(tidyr)
df1 <- data.frame(skeptic, effect, llci, ulci)
vlines <- data.frame(x = c(0, 1.171, 3.934))
vertices <- data.frame(xmin = 1.171, xmax = 3.934,
ymin = -Inf, ymax = Inf)
brks <- names(df1)[-1]
df1 %>%
pivot_longer(-skeptic, names_to = "line") %>%
ggplot(aes(skeptic, value)) +
geom_rect(data = vertices,
mapping = aes(xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax),
fill = "blue", alpha = 0.2,
inherit.aes = FALSE) +
geom_line(aes(size = line, linetype = line)) +
geom_hline(yintercept = 0, linetype = "dotted") +
geom_vline(data = vlines,
mapping = aes(xintercept = x),
linetype = "dotted") +
geom_text(data = subset(vlines, x != 0),
mapping = aes(x = x, label = x),
y = -0.75,
hjust = 0, vjust = 1) +
scale_size_manual(breaks = brks, values = c(1, 0.5, 0.5)) +
scale_linetype_manual(breaks = brks, values = c("solid", "dashed", "dashed")) +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
This focuses on your specific question (the horizontal and vertical lines and the shaded area), since you said you already have the remaining parts right.
Use geom_hline for the horizontal line and geom_vline for the vertical ones; linetype="dashed" will render dashed lines. As you didn't say how you want the area rendered, here is my guess: a gray band extending horizontally between the abscissas of your two vertical lines and vertically across the whole panel (Inf values), drawn using geom_rect.
ggplot(data.frame(skeptic,effect))+
  # draw the shaded band first so that it sits behind the line instead of covering it
  geom_rect(aes(xmin=1.171,xmax=3.934,ymin=-Inf,ymax=Inf),fill="lightgray")+
  geom_line(aes(skeptic,effect))+
  geom_hline(yintercept=0,linetype="dashed") +
  geom_vline(xintercept=c(1.171,3.934),linetype="dashed")

Overlay histogram and density with varying alpha

I am trying to create a ggplot histogram with a density overlay, where the alpha changes past the number 1. An example can be seen on 538 under the Every outcome in our simulations section. The alpha differs based on the electoral vote count. I am close to getting a similar graph but I cannot figure out how to get the density and histogram to work together.
Code
library(data.table)
library(ggplot2)
library(magrittr) # provides the %>% pipe used below
dt <- data.table(ratio = rnorm(10000, mean = .5, sd = 1))
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio, alpha = al)) +
geom_histogram(aes(), bins = 100,
fill = 'red') +
geom_density(aes(),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly changes alpha past 1 as desired but the density estimate is not scaled.
dt[, .(ratio,
al = (ratio >= 1))] %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density.., alpha = al), bins = 100,
fill = 'red') +
geom_density(aes(y = ..scaled..),size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
scale_alpha_discrete(range = c(.65, .9))
This attempt correctly scales the density curve, but now the geom_histogram is calculated separately for values under 1 and above 1. I want them calculated as one group.
What am I missing?
The reason why knowing your theme is important is that there's an easy shortcut to this, which is not using alpha, but just drawing a semitransparent rectangle over the left half of your plot:
library(data.table)
library(ggplot2)
library(dplyr)
data.table(ratio = rnorm(10000, mean = .5, sd = 1)) %>%
ggplot(aes(x = ratio)) +
geom_histogram(aes(y = ..density..), bins = 100,
fill = 'red') +
geom_line(aes(), stat = "density", size = 1.5,
color = 'blue') +
geom_vline(xintercept = 1,
color = '#0080e2',
size = 1.2) +
annotate("rect", xmin = -Inf, xmax = 1, ymin = 0, ymax = Inf, fill = "white",
alpha = 0.5) +
theme_bw()
Splitting into two groups and using alpha is possible, but it basically requires you to precalculate the histogram and the density curve. That's fine, but it would be an awful lot of extra effort for very little visual gain.
Of course, if theme_josh has a custom background color and zany gridlines, this approach may not be quite so effective. As long as you set the fill color to the panel background you should get a decent result (the default ggplot2 panel background, from theme_gray(), is "grey92").
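For completeness, here is a rough base R sketch of the precalculation route mentioned above: the histogram is computed once over all values, so the bins are shared by both sides of the threshold, and each bin is then flagged as below or past 1. The object names (breaks, bins, past_one, curve) and the choice of bin edges are assumptions made for this example, not part of the original answer.

```r
set.seed(1)
ratio <- rnorm(10000, mean = 0.5, sd = 1)

# 100 equal-width bins spanning the whole data range (assumed bin edges)
breaks <- seq(floor(min(ratio)), ceiling(max(ratio)), length.out = 101)

# Compute the histogram once, without plotting it
h <- hist(ratio, breaks = breaks, plot = FALSE)

# One row per bin; past_one is the flag that would drive the alpha value
bins <- data.frame(
  mid = h$mids,
  density = h$density,
  past_one = h$mids >= 1
)

# Kernel density estimate, on the same density scale as the bin heights
d <- density(ratio)
curve <- data.frame(x = d$x, y = d$y)
```

With these two data frames, the bars could be drawn from bins (for example with geom_col, mapping alpha to past_one) and the curve from curve with geom_line, so that neither layer recomputes its statistic per group.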

R - ggplot2 - Add arrow if geom_errorbar outside limits when x-axis is a factor variable

I want to use geom_segment to replace error bars with arrows when the error exceeds a certain limit. I found a previous post that addresses this question: R - ggplot2 - Add arrow if geom_errorbar outside limits
The code works well, except that my x-axis is a factor variable instead of a numeric variable. Using position_dodge within the geom_segment statement makes the arrows start in the correct location, but it doesn't change the terminal point (xend) and all arrows point towards one central point on the x-axis instead of going straight up from the origins.
Instead of recoding the x-axis to be numeric (I will use this code to create many plots that have a range of x-axis values, with the last numeric value always ending in "+"), is there a way to correct this within geom_segment?
Code used:
data$OR.95U_u = ifelse(data$OR.95U > 10, 10 , NA)
ggplot(data, aes(x = numAlleles, y = OR, fill = Outcome)) +
geom_bar(position = position_dodge(.5), stat = "identity", width = .4, color = "black") + geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
ylim(0,10) + geom_errorbar(aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(.5)) +
theme(legend.key = element_blank(), text = element_text(size = 11.5), legend.title = element_blank()) +
labs(x = "Number of rare alleles") +
scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) +
geom_segment(aes(x = numAlleles, xend = numAlleles, y = OR, yend = OR.95U_u), position = position_dodge(.5), arrow = arrow(length = unit(0.3, "cm")))
Resulting figure
OK, after investigating a bit, I didn't find a clean way of doing this, as it seems that position_dodge only changes the x aesthetic and not the xend aesthetic. position_nudge doesn't work here either, as it moves all the arrows by the same amount.
So I came up with a dirty way of doing this. All we need is to create a new variable with the desired xend position for the geom_segment. I tried to come up with a semi-automated way of doing it, for any number of levels of the coloring variable, and also created a reproducible dataset to work with, as I'm sure this could be improved a lot by people with more knowledge than me.
The code has inline comments explaining the steps:
library(tidyverse)
# dummy data (tried to replicate your plot data more or less accurately)
df <- tibble(
numAlleles = rep(c("1", "2+"), each = 4),
Outcome = rep(LETTERS[1:4], 2),
OR = c(1.4, 1.5, 1.45, 2.3, 3.8, 4.2, 4.0, 1.55),
OR.95U = c(1.9,2.1,1.9,3.8,12,12,12,12),
OR.95L = c(0.9, 0.9, 0.9, 0.8, NA, NA,NA,NA)
) %>%
mutate(
OR.95U_u = if_else(OR.95U > 10, 10, NA_real_)
)
# as it seems that position_dodge in a geom_segment only "dodge" the x aes and
# not the xend aes, we need to supply a custom xend. Also, we need to try
# to automatize the position, for more classes or different dodge widths.
# To do that, lets start with some parameters:
# position_dodge width
position_dodge_width <- 0.5
# number of bars per x axis class
bars_per_class <- length(unique(df$Outcome))
# total space available per class. In discrete vars, this is 1 au (arbitrary unit)
# for each class, but position_dodge only use the fraction of that unit
# indicated in the width parameter, so we need to calculate the real
# space available:
total_space_available <- 1 * position_dodge_width
# now we calculate the real bar width used by ggplot in these au, dividing the
# space available by the number of bars to plot for each class
bar_width_real <- (total_space_available / bars_per_class)
# position_dodge with discrete variables place bars to the left and to the right of the
# class au value, so we need to know when to place the xend to the left or
# to the right. Also, the number of bars has to be taken in to account, as
# in odd number of bars, one is located on the exact au value
if (bars_per_class%%2 == 0) {
# we need an offset, as bars are wider than arrows, and we want them in the
# middle of the bar
offset_segment <- bar_width_real / 2
# offset modifier to know when to subtract or add the offset
offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
# we also need to know how many bars to the left and how many to the right,
# but, the first bar of each side is already taken in account with the offset,
# so the bar modifier has to have one bar less for each side
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
} else {
# when odd number of columns, the offset is the same as the bar width
offset_segment <- bar_width_real
# and the modifiers have to have a middle zero value for the middle bar
offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
}
# finally we create the vector of xend values needed:
# (the result must be assigned back to df so that numAlleles_u exists for the plot)
df <- df %>%
mutate(
numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
)
ggplot(df, aes(x = numAlleles, y = OR, fill = Outcome)) +
geom_bar(
position = position_dodge(position_dodge_width), stat = "identity",
width = 0.4, color = "black"
) +
geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
ylim(0,10) +
geom_errorbar(
aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
) +
theme(
legend.key = element_blank(), text = element_text(size = 11.5),
legend.title = element_blank()
) +
labs(x = "Number of rare alleles") +
scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) +
geom_segment(
aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
)
And the plot:
We can check that it also works for a discrete variable with three levels:
df_three_bars <- df %>% filter(Outcome != 'D')
bars_per_class <- length(unique(df_three_bars$Outcome))
total_space_available <- 1 * position_dodge_width
bar_width_real <- (total_space_available / bars_per_class)
if (bars_per_class%%2 == 0) {
offset_segment <- bar_width_real / 2
offset_modifier <- c(rep(-1, bars_per_class%/%2), rep(1, bars_per_class%/%2))
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), seq(0, (bars_per_class%/%2-1)))
} else {
offset_segment <- bar_width_real
offset_modifier <- c(rep(-1, bars_per_class%/%2), 0, rep(1, bars_per_class%/%2))
bar_width_modifier <- c(seq((bars_per_class%/%2-1), 0), 0, seq(0, (bars_per_class%/%2-1)))
}
df_three_bars <- df_three_bars %>%
mutate(
numAlleles_u = as.numeric(as.factor(numAlleles)) + offset_modifier*(offset_segment + (bar_width_modifier*bar_width_real))
)
ggplot(df_three_bars, aes(x = numAlleles, y = OR, fill = Outcome)) +
geom_bar(
position = position_dodge(position_dodge_width), stat = "identity",
width = 0.4, color = "black"
) +
geom_hline(yintercept = 1, linetype = "dashed", color = "black") +
ylim(0,10) +
geom_errorbar(
aes(ymin=OR.95L, ymax=OR.95U), width=.2,position=position_dodge(position_dodge_width)
) +
theme(
legend.key = element_blank(), text = element_text(size = 11.5),
legend.title = element_blank()
) +
labs(x = "Number of rare alleles") +
scale_fill_manual(values=c("chocolate1","coral1", "red2", "darkred")) +
geom_segment(
aes(x = numAlleles, xend = numAlleles_u, y = OR, yend = OR.95U_u),
position = position_dodge(position_dodge_width), arrow = arrow(length = unit(0.3, "cm"))
)
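The even/odd branches used above can also be written as a single expression. The sketch below uses a hypothetical helper, dodge_offsets, which is not part of the original answer; it assumes the bars are evenly spaced inside the dodge width, which is the same assumption the if/else code makes, and it returns the same offsets for the four-bar and three-bar cases.

```r
# Hypothetical helper: horizontal offset of each dodged bar's centre,
# relative to the integer position of its class on the discrete x axis.
# n     = number of bars per class
# width = the width passed to position_dodge()
dodge_offsets <- function(n, width) {
  (seq_len(n) - (n + 1) / 2) * width / n
}

# Four bars per class with position_dodge(0.5)
offsets_four <- dodge_offsets(4, 0.5)

# Three bars per class with position_dodge(0.5): the middle bar has offset 0
offsets_three <- dodge_offsets(3, 0.5)
```

Because the vector has one entry per bar within a class, it is recycled across classes in the same way as offset_modifier above, so an expression such as as.numeric(as.factor(numAlleles)) + dodge_offsets(bars_per_class, position_dodge_width) could stand in for the longer xend calculation, assuming the rows are ordered by class and then by Outcome as in the dummy data.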

Resources