ggplot2 barplots with errorbars when using stacked bars

I'm trying to produce a stacked barplot with an error bar which represents the total variability per bar. I don't want to use a dodged barplot as I have >10 categories per bar.
Below I have some sample data for a reproducible example:
scenario = c('A','A','A','A')
strategy = c('A','A','A','A')
decile = c(0,0,10,10)
asset = c('A','B','A','B')
lower = c(10,20,10, 15)
mean = c(30,50,60, 70)
upper = c(70,90,86,90)
data = data.frame(scenario, strategy, decile, asset, lower, mean, upper)
With this data frame, we can use ggplot2 to create a stacked bar chart like so:
library(ggplot2)
ggplot(data, aes(x=decile, y=mean, fill=asset)) +
geom_bar(stat="identity") +
facet_grid(strategy~scenario) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)
However, the error bars produced are for each individual component of each stacked bar:
I appreciate this results from me providing the lower, mean and upper for each row of the df, but even when I summed these per decile I didn't get my desired errorbars at the top of each bar stack.
What is the correct ggplot2 code, or alternatively, what is the correct data structure to enable this?

I think you're correct in realising you need to manipulate your data rather than your plot. You can't really have position_stack on an errorbar, so you'll need to recalculate the mean, upper and lower values for the errorbars. Essentially this means getting the cumulative sum of the mean values, and shifting the upper and lower ranges accordingly. You can do this inside a dplyr pipe.
Note that I think you will also need a position_dodge on the error bars: their ranges overlap even when shifted appropriately, which makes them harder to interpret visually:
library(ggplot2)
library(dplyr)
data %>%
# express the limits as offsets from each segment's own mean
mutate(lower = lower - mean, upper = upper - mean) %>%
group_by(scenario, strategy, decile) %>%
# position_stack() puts the first asset level on top, so accumulate from the last level up
arrange(desc(asset), .by_group = TRUE) %>%
# mean2 is the top edge of each segment; move the offsets up to it
mutate(mean2 = cumsum(mean), lower = lower + mean2, upper = upper + mean2) %>%
ggplot(aes(x = decile, y = mean, fill = asset)) +
geom_bar(stat = "identity") +
facet_grid(strategy ~ scenario) +
geom_errorbar(aes(y = mean2, ymin = lower, ymax = upper), width = 2,
position = position_dodge(width = 2)) +
geom_point(aes(y = mean2), position = position_dodge(width = 2))
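To see what the cumulative-sum step does to the numbers, here is a minimal data-only sketch on the question's sample data. It groups by scenario, strategy and decile so each facet's stack is accumulated separately, and sorts with arrange(desc(asset), .by_group = TRUE) because ggplot2's default stacking puts the first fill level on top. The column name top and the object name shifted are illustrative choices, not part of the answer above.

```r
library(dplyr)

# Sample data from the question
data <- data.frame(
  scenario = c("A", "A", "A", "A"),
  strategy = c("A", "A", "A", "A"),
  decile   = c(0, 0, 10, 10),
  asset    = c("A", "B", "A", "B"),
  lower    = c(10, 20, 10, 15),
  mean     = c(30, 50, 60, 70),
  upper    = c(70, 90, 86, 90)
)

shifted <- data %>%
  group_by(scenario, strategy, decile) %>%
  # the last fill level sits at the bottom of the stack, so accumulate from it
  arrange(desc(asset), .by_group = TRUE) %>%
  mutate(
    top   = cumsum(mean),          # top edge of each segment in the stack
    lower = top + (lower - mean),  # keep the original distance below the mean
    upper = top + (upper - mean)   # keep the original distance above the mean
  ) %>%
  ungroup()

print(shifted[, c("decile", "asset", "mean", "top", "lower", "upper")])
```

For decile 0 this puts asset B's error bar around 50 (the top of B) and asset A's around 80 (the top of the whole stack).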

If you want only one error bar per decile, you should aggregate the values so that there is no difference between assets, like this:
library(ggplot2)
library(dplyr)
#Code
data %>% group_by(scenario,decile) %>%
mutate(nlower=mean(lower),nupper=mean(upper)) %>%
ggplot(aes(x=factor(decile), y=mean, fill=asset,group=scenario)) +
geom_bar(stat="identity") +
facet_grid(strategy~scenario) +
geom_errorbar(aes(ymin = nlower, ymax = nupper), width = 0.25)
Using asset is a different matter, because each class then gets its own error bar from its own values:
#Code 2
data %>%
ggplot(aes(x=factor(decile), y=mean, fill=asset,group=scenario)) +
geom_bar(stat="identity") +
facet_grid(strategy~scenario) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)
In the last version, each asset has its own error bar. If you want to see the error globally, you should aggregate the limits, as was done here with the mean values, or with any other measure you prefer.
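If the goal is a single error bar for the total of each stack, one option is to summarise to one row per bar and give geom_errorbar its own data with inherit.aes = FALSE, so the layer does not look for the asset column. Summing the lower and upper limits across assets is an assumption made here purely for illustration; it is not a statistically derived interval, so substitute whatever combined measure suits your data. The names totals and p are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
data <- data.frame(
  scenario = c("A", "A", "A", "A"),
  strategy = c("A", "A", "A", "A"),
  decile   = c(0, 0, 10, 10),
  asset    = c("A", "B", "A", "B"),
  lower    = c(10, 20, 10, 15),
  mean     = c(30, 50, 60, 70),
  upper    = c(70, 90, 86, 90)
)

# One row per stacked bar; summing the limits is an illustrative assumption
totals <- data %>%
  group_by(scenario, strategy, decile) %>%
  summarise(
    mean  = sum(mean),
    lower = sum(lower),
    upper = sum(upper),
    .groups = "drop"
  )

p <- ggplot(data, aes(x = factor(decile), y = mean, fill = asset)) +
  geom_col() +
  # totals has no asset column, so do not inherit fill = asset from ggplot()
  geom_errorbar(
    data = totals,
    aes(x = factor(decile), ymin = lower, ymax = upper),
    width = 0.25,
    inherit.aes = FALSE
  ) +
  facet_grid(strategy ~ scenario)

print(p)
```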

Related

Add error bar to ggplot2 stacked barplot, without using dodge

I can find examples of adding error bars to dodged barplots (e.g. here).
However, is it possible to draw a stacked barplot with a single error bar at the top of each bar showing the overall error, like the middle plot below? How would I add the red error bars?
My basic ggplot2 code is here:
ggplot(sample, aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
facet_grid(~scenario_capacity)
And my data are below:
income,scenario_capacity,strategy_short,baseline,high,low
LIC,50_gb_month,4G_f,0.260317022,0.326222444,0.234391846
LIC,50_gb_month,5G_f,0.124212858,0.146834332,0.115607428
LIC,50_gb_month,4G_w,0.266087059,0.331992481,0.240156101
LIC,50_gb_month,5G_w,0.129977113,0.152604368,0.121371683
LMIC,50_gb_month,4G_f,0.83300281,0.981024297,0.770961424
LMIC,50_gb_month,5G_f,0.527561846,0.56027992,0.517383821
LMIC,50_gb_month,4G_w,0.837395381,0.985564298,0.77528317
LMIC,50_gb_month,5G_w,0.53198477,0.564819922,0.521741702
UMIC,50_gb_month,4G_f,2.084363642,2.161110527,2.047796949
UMIC,50_gb_month,5G_f,1.644845928,1.667321898,1.634737764
UMIC,50_gb_month,4G_w,2.08822286,2.165063696,2.051605578
UMIC,50_gb_month,5G_w,1.648696474,1.67124905,1.638559402
HIC,50_gb_month,4G_f,1.016843718,1.026058625,1.010465168
HIC,50_gb_month,5G_f,0.820046245,0.823345129,0.81792777
HIC,50_gb_month,4G_w,1.019669475,1.028904617,1.013290925
HIC,50_gb_month,5G_w,0.823000642,0.82634578,0.820861932
Whenever I try to use an aggregated dataframe to feed to geom_errorbar, as below, I end up with an error message ('object 'income' not found').
sample_short <- sample %>%
group_by(scenario_capacity, strategy_short) %>%
summarize(
low = sum(low),
baseline = sum(baseline),
high = sum(high),
)
ggplot(sample, aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
geom_errorbar(data=sample_short, aes(y = baseline, ymin = low, ymax = high)) +
facet_grid(~scenario_capacity)
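The error arises because geom_errorbar inherits the global fill = income mapping, but sample_short has no income column. You need to include income in your summary stats, like so
(df being your dataframe; avoid naming objects after functions such as sample):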
library(dplyr)
library(ggplot2)
df_errorbar <-
df |>
group_by(scenario_capacity, strategy_short) |>
summarize(
income = first(income),
low = sum(low),
baseline = sum(baseline),
high = sum(high)
)
df |>
ggplot(aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
geom_errorbar(data = df_errorbar, aes(y = baseline, ymin = low, ymax = high)) +
facet_grid(~scenario_capacity)
Take care to use appropriate grouping when you want an overall "error".
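An alternative that avoids attaching an arbitrary income value to the totals is to map fill only in the bar layer, so geom_errorbar never needs income at all. A minimal sketch with the question's data read inline via read.csv(text = ...); the object names df, df_errorbar and p are illustrative, and summing low and high is simply the aggregation the question already uses.

```r
library(dplyr)
library(ggplot2)

df <- read.csv(text = "income,scenario_capacity,strategy_short,baseline,high,low
LIC,50_gb_month,4G_f,0.260317022,0.326222444,0.234391846
LIC,50_gb_month,5G_f,0.124212858,0.146834332,0.115607428
LIC,50_gb_month,4G_w,0.266087059,0.331992481,0.240156101
LIC,50_gb_month,5G_w,0.129977113,0.152604368,0.121371683
LMIC,50_gb_month,4G_f,0.83300281,0.981024297,0.770961424
LMIC,50_gb_month,5G_f,0.527561846,0.56027992,0.517383821
LMIC,50_gb_month,4G_w,0.837395381,0.985564298,0.77528317
LMIC,50_gb_month,5G_w,0.53198477,0.564819922,0.521741702
UMIC,50_gb_month,4G_f,2.084363642,2.161110527,2.047796949
UMIC,50_gb_month,5G_f,1.644845928,1.667321898,1.634737764
UMIC,50_gb_month,4G_w,2.08822286,2.165063696,2.051605578
UMIC,50_gb_month,5G_w,1.648696474,1.67124905,1.638559402
HIC,50_gb_month,4G_f,1.016843718,1.026058625,1.010465168
HIC,50_gb_month,5G_f,0.820046245,0.823345129,0.81792777
HIC,50_gb_month,4G_w,1.019669475,1.028904617,1.013290925
HIC,50_gb_month,5G_w,0.823000642,0.82634578,0.820861932")

# One row per stacked bar (no income column needed)
df_errorbar <- df %>%
  group_by(scenario_capacity, strategy_short) %>%
  summarise(
    low      = sum(low),
    baseline = sum(baseline),
    high     = sum(high),
    .groups  = "drop"
  )

p <- ggplot(df, aes(x = strategy_short, y = baseline)) +
  # fill is mapped only in this layer, so the errorbar layer never looks for income
  geom_col(aes(fill = income)) +
  geom_errorbar(data = df_errorbar, aes(ymin = low, ymax = high), width = 0.25) +
  facet_grid(~scenario_capacity)

print(p)
```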

How do I line up my error bars with my bars in ggplot?

I'm creating a bar chart with a pattern for a subset of the bars, and I want to add error bars.
However, I'm having trouble lining up the error bars with the bar charts: I want them to appear centered on each bar. How do I do this? Moreover, the legend currently does not clearly distinguish the striped and non-striped bars as corresponding to the not-treated and treated groups.
Finally, I'd like to create a version of this plot which stacks adjacent bars (i.e. bars within each facet_grid panel); any tips on how to do that would be much appreciated.
The code I'm using is:
library(ggplot2)
library(tidyverse)
library(ggpattern)
models = c("a", "b")
task = c("1","2")
ratios = c(0.3, 0.4)
standard_errors = c(0.02, 0.02)
ymax = ratios + standard_errors
ymin = ratios - standard_errors
colors = c("#F39B7FFF", "#8491B4FF")
df <- data.frame(task = task, ratios = ratios)
df <- df %>% mutate(filler = 1-ratios)
df <- df %>% gather(key = "obs", value = "ratios", -1)
df$upper <- df$ratios + c(standard_errors,standard_errors)
df$models <- c(models,models)
df$lower <- df$ratios - c(standard_errors,standard_errors)
df$col <- c(colors,colors)
df$group <- paste(df$task, df$models, sep="-")
df$treated <- "yes"
df[df$ratios<0.5,]$treated = "no"
p <- ggplot(df, aes(x = group, y = ratios, fill = col, ymin = lower, ymax = upper)) +
stat_summary(aes(pattern=treated),
fun = "mean", position=position_dodge(),
geom = "bar_pattern", pattern_fill="black", colour="black") +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, position=position_dodge(0.9)) +
scale_pattern_manual(values=c("none", "stripe"))+ #edited part
facet_grid(.~task,
scales = "free_x", # Let the x axis vary across facets.
space = "free_x", # Let the width of facets vary and force all bars to have the same width.
switch = "x") + guides(colour = guide_legend(nrow = 1)) +
guides(fill = "none")
p
Here is an option
df %>%
ggplot(aes(x = models, y = ratios)) +
geom_col_pattern(
aes(fill = col, pattern = treated),
pattern_fill = "black",
colour = "black",
pattern_key_scale_factor = 0.2,
position = position_dodge()) +
geom_errorbar(
aes(ymin = lower, ymax = upper, group = interaction(task, treated)),
width = 0.2,
position = position_dodge(0.9)) +
facet_grid(~ task, scales = "free_x") +
scale_pattern_manual(values = c("none", "stripe")) +
scale_fill_identity()
A few comments:
I don't understand the point of creating group. IMO this is unnecessary. TBH, I also don't understand the point of models and task: if task = "1" then models = "a"; if task = "2" then models = "b"; so task and models are redundant as they encode the same thing (whether you call it "1"/"2" or "a"/"b").
The reason why you (originally) didn't see a pattern in the legend is the scale factor in the legend key. As per ?geom_col_pattern, you can adjust this with the pattern_key_scale_factor parameter. Here, I've chosen pattern_key_scale_factor = 0.2, but you may want to play with different values.
The reason why the error bars didn't align with the dodged bars was because geom_errorbar didn't know that there are different task-treated combinations. We can fix this by explicitly defining a group aesthetic given by the combination of task & treated values. The reason why you don't need this in geom_col_pattern is because you already allow for different treated values through the pattern aesthetic.
You want to use scale_fill_identity() if you already have actual colour values defined in the data.frame.
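The alignment point does not depend on ggpattern. A minimal sketch, assuming plain geom_col() in place of geom_col_pattern() (so it runs without ggpattern) and fill = treated standing in for the pattern aesthetic, rebuilds the question's df directly and uses layer_data() to check that the error bars land on the same dodged x positions as the bars once the group aesthetic is supplied.

```r
library(ggplot2)

# The question's df after gather(), written out directly
df <- data.frame(
  task    = c("1", "2", "1", "2"),
  models  = c("a", "b", "a", "b"),
  ratios  = c(0.3, 0.4, 0.7, 0.6),
  treated = c("no", "no", "yes", "yes")
)
df$lower <- df$ratios - 0.02
df$upper <- df$ratios + 0.02

p <- ggplot(df, aes(x = models, y = ratios)) +
  # fill = treated splits each bar position into two dodged bars
  geom_col(aes(fill = treated), position = position_dodge(0.9)) +
  # without this group, the error bars would sit at the centre of each x position
  geom_errorbar(
    aes(ymin = lower, ymax = upper, group = interaction(task, treated)),
    width = 0.2,
    position = position_dodge(0.9)
  ) +
  facet_grid(~ task, scales = "free_x")

bars <- layer_data(p, 1)
errs <- layer_data(p, 2)
print(data.frame(bar_x = sort(as.numeric(bars$x)), errorbar_x = sort(as.numeric(errs$x))))
```

Dropping the group = interaction(task, treated) term leaves both error bars of each panel at the same central x, which is the misalignment described in the question.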

ggplot2 and jitter/dodge points by a group

I have 'elevation' as my y-axis and I want it as a discrete variable (in other words I want the space between each elevation to be equal and not relative to the numerical differences). My x-axis is 'time' (julian date).
mydata2<- data.frame(
"Elevation" = c(rep(c(1200),10),rep(c(1325.5),10),rep(c(1350.75),10), rep(c(1550.66),10)),
"Sex" = c(rep(c("F","M"),20)),
"Type" = c(rep(c("emerge","emerge","endhet","endhet","immerge","immerge","melt","melt", "storpor","storpor"),4)),
"mean" = c(rep(c(104,100,102,80,185,210,84,84,188,208,104,87,101,82, 183,188,83,83,190,189),2))
"se"=c(rep(c(.1,.01,.2,.02,.03),4)))
mydata2$Sex<-factor(mydata2$Sex))
mydata2$Type<-factor(mydata2$Type))
mydata2$Elevation<-factor(mydata2$Elevation))
at<-ggplot(mydata2, aes(y = mean, x = Elevation,color=Type, group=Sex)) +
geom_pointrange(aes(ymin = mean-se, ymax = mean+se),
position=position_jitter(width=0.2,height=.1),
linetype='solid') +
facet_grid(Sex~season,scales = "free")+
coord_flip()
at
Ideally, I would like each 'type' to be separated vertically. When I jitter or dodge only those that are close separate and not evenly. Is there a way to force each 'type' to be slightly shifted so they are all on their own line? I tried to force it by giving each type a slightly different 'elevation' but then I end up with a messy y-axis (I can't figure out a way to keep the point but not display all the tick marks with a discrete scale).
Thank you for your help.
If you want to use a numerical value as a discrete value, you should use as.factor. In your example, try x = as.factor(Elevation).
Additionally, I would suggest using position = position_dodge() so that points from different conditions at the same elevation are plotted side by side:
ggplot(mydata2, aes(y = mean, x = as.factor(Elevation),color=Type, group=Sex)) +
geom_pointrange(aes(ymin = mean-se, ymax = mean+se),
position=position_dodge(),
linetype='solid') +
facet_grid(Sex~season,scales = "free")+
coord_flip()
EDIT with example data provided by the OP
Using your dataset, I could not see the ranges around the points (the se values are tiny relative to the means), so I created two variables, Lower and Upper, with dplyr, scaling se by 100.
Then, instead of running the factor(...) commands from your question, I used as.factor(Elevation) and position_dodge(0.9) when plotting, to get the following plot:
library(tidyverse)
mydata2 %>% mutate(Lower = mean-se*100, Upper = mean+se*100) %>%
ggplot(., aes( x = as.factor(Elevation), y = mean, color = Type))+
geom_pointrange(aes(ymin = Lower, ymax = Upper), linetype = "solid", position = position_dodge(0.9))+
facet_grid(Sex~., scales = "free")+
coord_flip()
Does this look like what you are looking for?
Data
The dataset you provided contains a few errors (too many parentheses and a missing comma), so I corrected it here:
mydata2<- data.frame(
"Elevation" = c(rep(c(1200),10),rep(c(1325.5),10),rep(c(1350.75),10), rep(c(1550.66),10)),
"Sex" = rep(c("F","M"),20),
"Type" = rep(c("emerge","emerge","endhet","endhet","immerge","immerge","melt","melt", "storpor","storpor"),4),
"mean" = rep(c(104,100,102,80,185,210,84,84,188,208,104,87,101,82, 183,188,83,83,190,189),2),
"se"=rep(c(.1,.1,.2,.05,.03),4))
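To confirm that dodging gives every Type its own lane at each elevation (rather than only separating points that happen to be close, as jitter does), a minimal sketch rebuilds the corrected mydata2 and inspects the dodged positions with layer_data(). It follows the answer's facet_grid(Sex ~ .) because mydata2 has no season column; the object names p and ld are illustrative.

```r
library(dplyr)
library(ggplot2)

# Corrected data from the answer
mydata2 <- data.frame(
  "Elevation" = c(rep(1200, 10), rep(1325.5, 10), rep(1350.75, 10), rep(1550.66, 10)),
  "Sex"  = rep(c("F", "M"), 20),
  "Type" = rep(c("emerge", "emerge", "endhet", "endhet", "immerge", "immerge",
                 "melt", "melt", "storpor", "storpor"), 4),
  "mean" = rep(c(104, 100, 102, 80, 185, 210, 84, 84, 188, 208,
                 104, 87, 101, 82, 183, 188, 83, 83, 190, 189), 2),
  "se"   = rep(c(.1, .1, .2, .05, .03), 4)
)

p <- mydata2 %>%
  mutate(Lower = mean - se * 100, Upper = mean + se * 100) %>%
  ggplot(aes(x = as.factor(Elevation), y = mean, color = Type)) +
  # dodge spreads the five Types evenly across each elevation slot
  geom_pointrange(aes(ymin = Lower, ymax = Upper),
                  position = position_dodge(width = 0.9)) +
  facet_grid(Sex ~ ., scales = "free") +
  coord_flip()

ld <- layer_data(p)
# Each Type gets its own offset, so no two points in a panel share a position
print(head(ld[order(ld$PANEL, ld$x), c("PANEL", "x", "colour")], 10))
```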

position_dodge() does not seem to work with stat_summary() and x variables with different fill groups

Using stat_summary(geom = "bar") + stat_summary(geom = "errorbar") does not seem to work with position_dodge() when x values have varying numbers of condition groups.
I am trying to make a (what should be straightforward) barplot with ggplot2. My data has a number of different samples (x variable), and some of these samples also have a fill (condition) variable ("Scr" or "shRNA") while others don't (condition = NA). When I plot these data using the stat_summary wrappers to make bar plots with error bars, position_dodge only places the error bars correctly for samples that do not have different fill groups. The stat_summary(geom = "bar") layer seems to work, because the separate bars do show up, but their error bars are not aligned.
library(ggplot2)
test <- data.frame(Sample = c(rep("A",6),rep("B",3)),
Target = c(rep("GENE1",9)),
val = c(1.1,1.2,1.15,.5,.6,.7,.95,1,1.05),
condition = c(rep("Scr",3),rep("shRNA",3),rep(NA,3)))
# fun and expansion() replace the deprecated fun.y and expand_scale() (ggplot2 >= 3.3.0)
g <- ggplot(data=test,aes(x=Sample,y=val,fill=condition)) +
stat_summary(geom = "bar", fun = mean,position = position_dodge2(width=.5,preserve = "single"),color="black",width=.8) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = position_dodge2(width=.2,preserve = "single"),width=.2) +
scale_y_continuous(expand = expansion(mult = c(0,.2)))
#scale_fill_discrete(guide=guide_legend(title="",nrow=2))
g
I expect the position_dodge() argument in both stat_summary()'s to align error bars to the correct x position, regardless of whether or not that particular sample has one or two fill groups.
I'm a bit confused about what you're trying to do. Why not use geom_col/geom_bar instead of stat_summary? I always prefer keeping data manipulation/summarisation and plotting separate.
This is what I'd do
library(tidyverse)
test %>%
group_by(Sample, condition) %>%
summarise(val.mean = mean(val), val.sd = sd(val)) %>%
ggplot(aes(Sample, val.mean, fill = condition)) +
geom_col(position = position_dodge(width = 0.8)) +
geom_errorbar(
aes(ymin = val.mean - val.sd, ymax = val.mean + val.sd),
position = position_dodge(width = 0.8),
width = 0.2)
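If you also want sample B's single bar to keep the same width as the paired bars, one option is position_dodge(preserve = "single"). A minimal sketch, assuming you want the standard error (as mean_se() gave in the question) rather than the SD; it defines one dodge object and passes it to both layers so bars and error bars are shifted identically. The names summ, dodge and p are illustrative.

```r
library(dplyr)
library(ggplot2)

test <- data.frame(Sample = c(rep("A", 6), rep("B", 3)),
                   Target = rep("GENE1", 9),
                   val = c(1.1, 1.2, 1.15, .5, .6, .7, .95, 1, 1.05),
                   condition = c(rep("Scr", 3), rep("shRNA", 3), rep(NA, 3)))

# Summarise first; the standard error mirrors what mean_se() computes
summ <- test %>%
  group_by(Sample, condition) %>%
  summarise(val.mean = mean(val),
            val.se = sd(val) / sqrt(n()),
            .groups = "drop")

# One dodge object shared by both layers keeps bars and error bars in step
dodge <- position_dodge(width = 0.8, preserve = "single")

p <- ggplot(summ, aes(Sample, val.mean, fill = condition)) +
  geom_col(position = dodge, width = 0.8) +
  geom_errorbar(aes(ymin = val.mean - val.se, ymax = val.mean + val.se),
                position = dodge, width = 0.2)

bars <- layer_data(p, 1)
errs <- layer_data(p, 2)
print(p)
```

With preserve = "single", position_dodge() places B's lone bar in the left-hand slot rather than centring it; position_dodge2() centres it instead, but then its padding and width arguments need to match between the two layers.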

How to vary line and ribbon colours in a facet_grid

I'm hoping someone can help with this plotting problem I have. The data can be found here.
Basically I want to plot a line (mean) and its associated confidence interval (lower, upper) for 4 models I have tested. I want to facet on the Cat_Auth variable, for which there are 4 categories (so 4 plots). The first 'model' is actually just the mean of the sample data and I don't want a CI for this (NA values specified in the data; not sure if this is the correct thing to do).
I can get the plot some way there with:
newdata <- read.csv("data.csv", header=T)
ggplot(newdata, aes(x = Affil_Max, y = Mean)) +
geom_line(data = newdata, aes(), colour = "blue") +
geom_ribbon(data = newdata, alpha = .5, aes(ymin = Lower, ymax = Upper, group = Model, fill = Model)) +
facet_grid(.~ Cat_Auth)
But I'd like different coloured lines and shaded ribbons for each model (e.g. a red mean line and red shaded ribbon for model 2, green for model 3 etc). Also, I can't figure out why the blue line corresponding to the first set of mean values is disjointed as it is.
Would be really grateful for any assistance!
Try this:
library(dplyr)
library(ggplot2)
newdata %>%
mutate(Model = as.factor(Model)) %>%
ggplot(aes(Affil_Max, Mean)) +
geom_line(aes(color = Model, group = Model)) +
geom_ribbon(alpha = .5, aes(ymin = Lower, ymax = Upper,
group = Model, fill = Model)) +
facet_grid(. ~ Cat_Auth)
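On the question of why the original blue line looked disjointed: with no group and no discrete aesthetic, geom_line() treats every row in a panel as one group and joins them in x order, so it zigzags between the different models' means at each x. A minimal sketch on hypothetical toy data (made-up values with the question's column names, since data.csv is not available here) contrasts that single group with the per-model grouping used above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (made-up values, NOT the question's data.csv):
# three models over five x values in two Cat_Auth panels; model 1 has no CI
toy <- expand.grid(Affil_Max = 1:5, Model = 1:3, Cat_Auth = c("c1", "c2"))
toy$Mean  <- toy$Affil_Max * toy$Model / 2
toy$Lower <- ifelse(toy$Model == 1, NA, toy$Mean - 0.5)
toy$Upper <- ifelse(toy$Model == 1, NA, toy$Mean + 0.5)

# No group: all rows in a panel form one line, joined in x order (zigzag)
p_single <- ggplot(toy, aes(Affil_Max, Mean)) +
  geom_line(colour = "blue") +
  facet_grid(. ~ Cat_Auth)

# Model as a factor mapped to colour/fill and group: one line and ribbon per model
p_models <- toy %>%
  mutate(Model = as.factor(Model)) %>%
  ggplot(aes(Affil_Max, Mean)) +
  geom_line(aes(color = Model, group = Model)) +
  geom_ribbon(alpha = .5, aes(ymin = Lower, ymax = Upper,
                              group = Model, fill = Model)) +
  facet_grid(. ~ Cat_Auth)
```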
