Modifying geom_ribbon borders - r

I am plotting a series of means and standard deviations over time with code below, and am trying to use geom_ribbon to display the sd's, see below.
Due to the significant overlap I'd like to add a border to the ribbons that is the same color as the corresponding variable but is a dashed line, but I can't figure out where in the code this would go. I know "colour" and "linetype" commands are involved somehow...
Thanks!
graph.msd <- ggplot(data=g.data, aes(x=quarter,y=mean,group=number))
graph.msd <- graph.msd + geom_line(aes(colour = number),size=1)+geom_ribbon(aes(ymin=mean-sd,ymax=mean+sd,fill=number),linetype=2,alpha=0.1)

You need to pass a value for colour to geom_ribbon, something like:
graph.msd <- graph.msd +
geom_line(aes(colour = number),size=1)+
geom_ribbon(aes(ymin = mean-sd, ymax = mean+sd,
fill = number,colour = number), linetype=2, alpha=0.1)
Here is a reproducible example (using a variant on the examples in ?geom_ribbon):
huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
library(plyr) # to access round_any
huron$decade <- round_any(huron$year, 10, floor)
ggplot(huron, aes(x =year, group = decade)) +
geom_ribbon(aes(ymin = level-1, ymax = level+1,
colour = factor(decade), fill = factor(decade)),
linetype = 2, alpha= 0.1)
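The same example can be run without plyr. The sketch below assumes that rounding down to the decade with floor() is an acceptable stand-in for round_any(year, 10, floor), and it converts decade to a factor once instead of inside aes().

```r
library(ggplot2)

huron <- data.frame(year = 1875:1972, level = as.vector(LakeHuron))
# Base-R stand-in for plyr::round_any(year, 10, floor)
huron$decade <- factor(floor(huron$year / 10) * 10)

# Mapping both colour and fill to the same variable gives each ribbon
# a border in its own colour; linetype = 2 makes that border dashed.
p <- ggplot(huron, aes(x = year, group = decade)) +
  geom_ribbon(aes(ymin = level - 1, ymax = level + 1,
                  colour = decade, fill = decade),
              linetype = 2, alpha = 0.1)
p
```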

Related

How do I line up my error bars with my bars in ggplot?

I'm creating a bar chart with a pattern for a subset of the bars, and I want to add error bars.
However, I'm having trouble lining up the error bars with the bars: I want them to appear centered on each bar. How do I do this? Moreover, the legend currently does not clearly distinguish the striped and non-striped bars as corresponding to the not treated and treated groups.
Finally, I'd like to create a version of this plot which stacks adjacent bars (i.e. bars within each facet_grid); any tips on how to do that would be much appreciated.
The code I'm using is:
library(ggplot2)
library(tidyverse)
library(ggpattern)
models = c("a", "b")
task = c("1","2")
ratios = c(0.3, 0.4)
standard_errors = c(0.02, 0.02)
ymax = ratios + standard_errors
ymin = ratios - standard_errors
colors = c("#F39B7FFF", "#8491B4FF")
df <- data.frame(task = task, ratios = ratios)
df <- df %>% mutate(filler = 1-ratios)
df <- df %>% gather(key = "obs", value = "ratios", -1)
df$upper <- df$ratios + c(standard_errors,standard_errors)
df$models <- c(models,models)
df$lower <- df$ratios - c(standard_errors,standard_errors)
df$col <- c(colors,colors)
df$group <- paste(df$task, df$models, sep="-")
df$treated <- "yes"
df[df$ratios<0.5,]$treated = "no"
p <- ggplot(df, aes(x = group, y = ratios, fill = col, ymin = lower, ymax = upper)) +
stat_summary(aes(pattern=treated),
fun = "mean", position=position_dodge(),
geom = "bar_pattern", pattern_fill="black", colour="black") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, position=position_dodge(0.9)) +
scale_pattern_manual(values=c("none", "stripe"))+ #edited part
facet_grid(.~task,
scales = "free_x", # Let the x axis vary across facets.
space = "free_x", # Let the width of facets vary and force all bars to have the same width.
switch = "x") + guides(colour = guide_legend(nrow = 1)) +
guides(fill = "none")
p
Here is an option
df %>%
ggplot(aes(x = models, y = ratios)) +
geom_col_pattern(
aes(fill = col, pattern = treated),
pattern_fill = "black",
colour = "black",
pattern_key_scale_factor = 0.2,
position = position_dodge()) +
geom_errorbar(
aes(ymin = lower, ymax = upper, group = interaction(task, treated)),
width = 0.2,
position = position_dodge(0.9)) +
facet_grid(~ task, scales = "free_x") +
scale_pattern_manual(values = c("none", "stripe")) +
scale_fill_identity()
A few comments:
I don't understand the point of creating group. IMO this is unnecessary. TBH, I also don't understand the point of models and task: if task = "1" then models = "a"; if task = "2" then models = "b"; so task and models are redundant as they encode the same thing (whether you call it "1"/"2" or "a"/"b").
The reason why you (originally) didn't see a pattern in the legend is the scale factor in the legend key. As per ?geom_col_pattern, you can adjust this with the pattern_key_scale_factor parameter. Here, I've chosen pattern_key_scale_factor = 0.2 but you may want to play with different values.
The reason why the error bars didn't align with the dodged bars was because geom_errorbar didn't know that there are different task-treated combinations. We can fix this by explicitly defining a group aesthetic given by the combination of task & treated values. The reason why you don't need this in geom_col_pattern is because you already allow for different treated values through the pattern aesthetic.
You want to use scale_fill_identity() if you already have actual colour values defined in the data.frame.
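The alignment point can be seen in isolation with plain ggplot2 (no ggpattern). This is a minimal sketch on made-up data: the data frame d and its values are hypothetical, and the key assumptions are that bars and error bars share the same group aesthetic and the same dodge width.

```r
library(ggplot2)

# Hypothetical data: two conditions per task, each with a standard error.
d <- data.frame(
  task    = rep(c("1", "2"), each = 2),
  treated = rep(c("no", "yes"), times = 2),
  ratio   = c(0.30, 0.70, 0.40, 0.60),
  se      = 0.02
)

# One shared position object, so both layers are dodged identically.
# 0.9 matches the default bar width of geom_col().
dodge <- position_dodge(width = 0.9)

p <- ggplot(d, aes(x = task, y = ratio, fill = treated, group = treated)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = ratio - se, ymax = ratio + se),
                width = 0.2, position = dodge)
p
```

Because group is set in the top-level aes(), geom_errorbar inherits it and splits the data the same way the bars do.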

Shade parts of a ggplot based on a (changing) dummy variable

I want to shade areas in a ggplot but I don't want to manually tell geom_rect() where to stop and where to start. My data changes and I always want to shade several areas based on a condition.
Here for example with the condition "negative":
library("ggplot2")
set.seed(3)
plotdata <- data.frame(somevalue = rnorm(10), indicator = 0 , counter = 1:10)
plotdata[plotdata$somevalue < 0,]$indicator <- 1
plotdata
I can do that manually like here or here:
plotranges <- data.frame(from = c(1,4,9), to = c(2,4,9))
ggplot() +
geom_line(data = plotdata, aes(x = counter, y = somevalue)) +
geom_rect(data = plotranges, aes(xmin = from - 1, xmax = to, ymin = -Inf, ymax = Inf), alpha = 0.4)
But my problem is that, so to speak, the set.seed() argument changes and I want to still automatically generate the plot without specifying min and max values of the shading manually. Is there a way (maybe without geom_rect() but instead geom_bar()?) to plot shading based directly on my indicator variable?
edit: Here is my own best attempt, as you can see not great:
ggplot(data = plotdata, aes(x = counter, y = somevalue)) + geom_line() +
geom_bar(aes(y = indicator*max(somevalue)), stat= "identity")
You can use stat_summary() to calculate the extremes of runs of your indicator. In the code below data.table::rleid() groups the data by runs of indicators. In the summary layer, y doesn't really do anything, so we use it to store the resolution of your datapoints, which we then later use to offset the xmin/xmax parameters. The after_stat() function is used to access computed variables after the ranges have been computed.
library("ggplot2")
plotdata <- data.frame(somevalue = rnorm(10), counter = 1:10)
plotdata$indicator <- as.numeric(plotdata$somevalue < 0)
ggplot(plotdata, aes(counter, somevalue)) +
stat_summary(
geom = "rect",
aes(group = data.table::rleid(indicator),
xmin = after_stat(xmin - y * 0.5),
xmax = after_stat(xmax + y * 0.5),
alpha = I((indicator) * 0.4),
y = resolution(counter)),
fun.min = min, fun.max = max,
orientation = "y", ymin = -Inf, ymax = Inf
) +
geom_line()
Created on 2021-09-14 by the reprex package (v2.0.1)
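If you would rather keep geom_rect(), the ranges can also be derived from the indicator with base R's rle(). The helper name runs_to_ranges below is made up for this sketch, and the half-unit padding assumes the counter is evenly spaced with step 1.

```r
library(ggplot2)

# Collapse each run of indicator == 1 into one (from, to) row.
runs_to_ranges <- function(x, indicator) {
  r      <- rle(indicator)
  ends   <- cumsum(r$lengths)        # last index of each run
  starts <- ends - r$lengths + 1     # first index of each run
  keep   <- r$values == 1            # only runs where the condition holds
  data.frame(from = x[starts[keep]], to = x[ends[keep]])
}

set.seed(3)
plotdata <- data.frame(somevalue = rnorm(10), counter = 1:10)
plotdata$indicator <- as.numeric(plotdata$somevalue < 0)
plotranges <- runs_to_ranges(plotdata$counter, plotdata$indicator)

p <- ggplot() +
  geom_rect(data = plotranges,
            aes(xmin = from - 0.5, xmax = to + 0.5, ymin = -Inf, ymax = Inf),
            alpha = 0.4) +
  geom_line(data = plotdata, aes(x = counter, y = somevalue))
p
```

The ranges are recomputed from the data each time, so changing the seed or the condition needs no manual edits.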

ggplot specific thick line

How would one be able to plot one line thicker than the other? I tried using geom_line(size=X), but this increases the thickness of both lines. Let's say I would like to increase the thickness of the first column; how would one approach this?
a <- (cbind(rnorm(100),rnorm(100))) #nav[,1:10]
sa <- stack(as.data.frame(a))
sa$x <- rep(seq_len(nrow(a)), ncol(a))
require("ggplot2")
p<-qplot(x, values, data = sa, group = ind, colour = ind, geom = "line")
p + theme(legend.position = "none")+ylab("Millions")+xlab("Age")+
geom_line( size = 1.5)
You need to map line thickness to the variable:
p + geom_line(aes(size = ind))
To control the thickness use scale_size_manual():
p + geom_line(aes(size = ind)) +
scale_size_manual(values = c(0.1, 1))
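In ggplot2 3.4.0 and later, line thickness is controlled by the linewidth aesthetic rather than size. A sketch of the same idea under that assumption, written with ggplot() instead of qplot(); the widths 1.5 and 0.3 are arbitrary:

```r
library(ggplot2)

set.seed(1)
a  <- cbind(rnorm(100), rnorm(100))
sa <- stack(as.data.frame(a))            # columns: values, ind (levels V1, V2)
sa$x <- rep(seq_len(nrow(a)), ncol(a))

# Map linewidth to the series, then set one width per level by name.
p <- ggplot(sa, aes(x, values, colour = ind, linewidth = ind)) +
  geom_line() +
  scale_linewidth_manual(values = c(V1 = 1.5, V2 = 0.3)) +
  theme(legend.position = "none")
p
```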

ggplot2: set (nonlinear) values for alpha

I'd like to plot a mirrored 95% density curve and map alpha to the density:
foo <- function(mw, sd, lower, upper) {
x <- seq(lower, upper, length=500)
dens <- dnorm(x, mean=mw, sd=sd, log=TRUE)
dens0 <- dens -min(dens)
return(data.frame(dens0, x))
}
df.rain <- foo(0,1,-1,1)
library(ggplot2)
drf <- ggplot(df.rain, aes(x=x, y=dens0))+
geom_line(aes(alpha=..y..))+
geom_line(aes(x=x, y=-dens0, alpha=-..y..))+
stat_identity(geom="segment", aes(xend=x, yend=0, alpha=..y..))+
stat_identity(geom="segment", aes(x=x, y=-dens0, xend=x, yend=0, alpha=-..y..))
drf
This works fine, but I'd like to make the contrast between the edges and the middle more prominent, i.e., I want the edges to be nearly white and only the middle part to be black. I've been tampering with scale_alpha() but without luck. Any ideas?
Edit: Ultimately, I'd like to plot several raindrops, i.e., the individual drops will be small but the shading should still be clearly visible.
Instead of mapping dens0 to the alpha, I'd map it to color:
drf <- ggplot(df.rain, aes(x=x, y=dens0))+
geom_line(aes(color=..y..))+
geom_line(aes(x=x, y=-dens0, color=-..y..))+
stat_identity(geom="segment", aes(xend=x, yend=0, color=..y..))+
stat_identity(geom="segment", aes(x=x, y=-dens0, xend=x, yend=0, color=-..y..))
The contrast in color is still mainly present in the tails. Using two colors helps a bit (note that the switch in color is at 0.25):
drf + scale_color_gradient2(midpoint = 0.25)
Finally, to include the distribution of the dens0 values, I base the midpoint of the color scale on the median value in the data:
drf + scale_color_gradient2(midpoint = median(df.rain$dens0))
Note: however you tweak your data, most of the contrast is in the more extreme values of your dataset. Trying to mask this with a non-linear scale, or by tweaking a color scale like I did, could present a false picture of the real data.
Here is a solution using geom_ribbon() instead of geom_line()
df.rain$group <- seq_along(df.rain$x)
tmp <- tail(df.rain, -1)
tmp$group <- tmp$group - 1
tmp$dens0 <- head(df.rain$dens0, -1)
dataset <- rbind(head(df.rain, -1), tmp)
ggplot(dataset, aes(x = x, ymin = -dens0, ymax = dens0, group = group,
alpha = dens0)) + geom_ribbon() + scale_alpha(range = c(0, 1))
ggplot(dataset, aes(x = x, ymin = -dens0, ymax = dens0, group = group,
fill = dens0)) + geom_ribbon() +
scale_fill_gradient(low = "white", high = "black")
See Paul's answer for changing the colours.
dataset9 <- merge(dataset, data.frame(study = 1:9))
ggplot(dataset9, aes(x = x, ymin = -dens0, ymax = dens0, group = group,
alpha = dens0)) + geom_ribbon() + scale_alpha(range = c(0, 0.5)) +
facet_wrap(~study)
While pondering both your answers I actually found exactly what I was looking for. The easiest way is to simply use scale_colour_gradientn with a vector of greys.
library(RColorBrewer)
grey <- brewer.pal(9,"Greys")
drf <- ggplot(df.rain, aes(x=x, y=dens0, col=dens0))+
stat_identity(geom="segment", aes(xend=x, yend=0))+
stat_identity(geom="segment", aes(x=x, y=-dens0, xend=x, yend=0))+
scale_colour_gradientn(colours=grey)
drf
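If the alpha values themselves should be non-linear, as the question title asks, one option is to compute them in the data and pass them through unchanged with scale_alpha_identity(). This is only a sketch: the column name a and the exponent 4 are arbitrary choices, and the caveat above about distorting the data applies here too.

```r
library(ggplot2)

# Same construction as foo(0, 1, -1, 1) in the question.
x    <- seq(-1, 1, length.out = 500)
dens <- dnorm(x, mean = 0, sd = 1, log = TRUE)
df.rain <- data.frame(x = x, dens0 = dens - min(dens))

# Rescale to [0, 1], then raise to a power > 1 so the edges fade
# much faster than the middle.
df.rain$a <- (df.rain$dens0 / max(df.rain$dens0))^4

# One vertical segment per x spans the mirrored drop.
p <- ggplot(df.rain, aes(x = x, xend = x, y = -dens0, yend = dens0, alpha = a)) +
  geom_segment() +
  scale_alpha_identity()
p
```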

Overlaying histograms with ggplot2 in R

I am new to R and am trying to plot 3 histograms onto the same graph.
Everything worked fine, but my problem is that you don't see where 2 histograms overlap - they look rather cut off.
When I make density plots, it looks perfect: each curve is surrounded by a black frame line, and colours look different where curves overlap.
Can someone tell me if something similar can be achieved with the histograms in the 1st picture? This is the code I'm using:
lowf0 <-read.csv (....)
mediumf0 <-read.csv (....)
highf0 <-read.csv(....)
lowf0$utt<-'low f0'
mediumf0$utt<-'medium f0'
highf0$utt<-'high f0'
histogram<-rbind(lowf0,mediumf0,highf0)
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
Using @joran's sample data,
ggplot(dat, aes(x=xx, fill=yy)) + geom_histogram(alpha=0.2, position="identity")
Note that the default position of geom_histogram is "stack".
See "position adjustment" in the geom_histogram documentation.
Your current code:
ggplot(histogram, aes(f0, fill = utt)) + geom_histogram(alpha = 0.2)
is telling ggplot to construct one histogram using all the values in f0 and then color the bars of this single histogram according to the variable utt.
What you want instead is to create three separate histograms, with alpha blending so that they are visible through each other. So you probably want to use three separate calls to geom_histogram, where each one gets its own data frame and fill:
ggplot(histogram, aes(f0)) +
geom_histogram(data = lowf0, fill = "red", alpha = 0.2) +
geom_histogram(data = mediumf0, fill = "blue", alpha = 0.2) +
geom_histogram(data = highf0, fill = "green", alpha = 0.2)
Here's a concrete example with some output:
dat <- data.frame(xx = c(runif(100,20,50),runif(100,40,80),runif(100,0,30)),yy = rep(letters[1:3],each = 100))
ggplot(dat,aes(x=xx)) +
geom_histogram(data=subset(dat,yy == 'a'),fill = "red", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'b'),fill = "blue", alpha = 0.2) +
geom_histogram(data=subset(dat,yy == 'c'),fill = "green", alpha = 0.2)
which produces something like this:
Edited to fix typos; you wanted fill, not colour.
While only a few lines are required to plot multiple/overlapping histograms in ggplot2, the results aren't always satisfactory. Borders and coloring need to be used properly so that the eye can differentiate between histograms.
The following functions balance border colors, opacities, and superimposed density plots to enable the viewer to differentiate among distributions.
Single histogram:
plot_histogram <- function(df, feature) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)))) +
geom_histogram(aes(y = ..density..), alpha=0.7, fill="#33AADE", color="black") +
geom_density(alpha=0.3, fill="red") +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
print(plt)
}
Multiple histogram:
plot_multi_histogram <- function(df, feature, label_column) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(aes(xintercept=mean(eval(parse(text=feature)))), color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
Usage:
Simply pass your data frame into the above functions along with desired arguments:
plot_histogram(iris, 'Sepal.Width')
plot_multi_histogram(iris, 'Sepal.Width', 'Species')
The extra parameter in plot_multi_histogram is the name of the column containing the category labels.
We can see this more dramatically by creating a dataframe with many different distribution means:
a <-data.frame(n=rnorm(1000, mean = 1), category=rep('A', 1000))
b <-data.frame(n=rnorm(1000, mean = 2), category=rep('B', 1000))
c <-data.frame(n=rnorm(1000, mean = 3), category=rep('C', 1000))
d <-data.frame(n=rnorm(1000, mean = 4), category=rep('D', 1000))
e <-data.frame(n=rnorm(1000, mean = 5), category=rep('E', 1000))
f <-data.frame(n=rnorm(1000, mean = 6), category=rep('F', 1000))
many_distros <- do.call('rbind', list(a,b,c,d,e,f))
Passing data frame in as before (and widening chart using options):
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, 'n', 'category')
To add a separate vertical line for each distribution:
plot_multi_histogram <- function(df, feature, label_column, means) {
plt <- ggplot(df, aes(x=eval(parse(text=feature)), fill=eval(parse(text=label_column)))) +
geom_histogram(alpha=0.7, position="identity", aes(y = ..density..), color="black") +
geom_density(alpha=0.7) +
geom_vline(xintercept=means, color="black", linetype="dashed", size=1) +
labs(x=feature, y = "Density")
plt + guides(fill=guide_legend(title=label_column))
}
The only change over the previous plot_multi_histogram function is the addition of means to the parameters, and changing the geom_vline line to accept multiple values.
Usage:
options(repr.plot.width = 20, repr.plot.height = 8)
plot_multi_histogram(many_distros, "n", 'category', c(1, 2, 3, 4, 5, 6))
Result:
Since I set the means explicitly in many_distros, I can simply pass them in. Alternatively, you can calculate them inside the function and use them that way.
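Calculating the means inside the function might look like the sketch below. The name plot_multi_histogram_auto is made up here, and the sketch assumes a ggplot2 version that supports the .data pronoun and after_stat() (3.3.0 or later), which it uses in place of eval(parse()) and ..density..; bins = 30 is set explicitly to match the default.

```r
library(ggplot2)

plot_multi_histogram_auto <- function(df, feature, label_column) {
  # Per-group means, computed from the data instead of passed in by hand.
  means <- tapply(df[[feature]], df[[label_column]], mean)

  ggplot(df, aes(x = .data[[feature]], fill = .data[[label_column]])) +
    geom_histogram(aes(y = after_stat(density)), bins = 30, alpha = 0.7,
                   position = "identity", color = "black") +
    geom_density(alpha = 0.7) +
    geom_vline(xintercept = as.numeric(means),
               color = "black", linetype = "dashed") +
    labs(x = feature, y = "Density", fill = label_column)
}

p <- plot_multi_histogram_auto(iris, "Sepal.Width", "Species")
p
```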
