Add error bar to ggplot2 stacked barplot, without using dodge - r

I can find examples of adding error bars to dodged barplots (e.g. here).
However, is it possible to draw a stacked barplot with a single error bar at the top of each bar showing the overall error? For example, like the middle plot below? How would I add the red error bars?
My basic ggplot2 code is here:
ggplot(sample, aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
facet_grid(~scenario_capacity)
And my data are below:
income,scenario_capacity,strategy_short,baseline,high,low
LIC,50_gb_month,4G_f,0.260317022,0.326222444,0.234391846
LIC,50_gb_month,5G_f,0.124212858,0.146834332,0.115607428
LIC,50_gb_month,4G_w,0.266087059,0.331992481,0.240156101
LIC,50_gb_month,5G_w,0.129977113,0.152604368,0.121371683
LMIC,50_gb_month,4G_f,0.83300281,0.981024297,0.770961424
LMIC,50_gb_month,5G_f,0.527561846,0.56027992,0.517383821
LMIC,50_gb_month,4G_w,0.837395381,0.985564298,0.77528317
LMIC,50_gb_month,5G_w,0.53198477,0.564819922,0.521741702
UMIC,50_gb_month,4G_f,2.084363642,2.161110527,2.047796949
UMIC,50_gb_month,5G_f,1.644845928,1.667321898,1.634737764
UMIC,50_gb_month,4G_w,2.08822286,2.165063696,2.051605578
UMIC,50_gb_month,5G_w,1.648696474,1.67124905,1.638559402
HIC,50_gb_month,4G_f,1.016843718,1.026058625,1.010465168
HIC,50_gb_month,5G_f,0.820046245,0.823345129,0.81792777
HIC,50_gb_month,4G_w,1.019669475,1.028904617,1.013290925
HIC,50_gb_month,5G_w,0.823000642,0.82634578,0.820861932
Whenever I try to use an aggregated dataframe to feed to geom_errorbar, as below, I end up with an error message ('object 'income' not found').
sample_short <- sample %>%
group_by(scenario_capacity, strategy_short) %>%
summarize(
low = sum(low),
baseline = sum(baseline),
high = sum(high),
)
ggplot(sample, aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
geom_errorbar(data=sample_short, aes(y = baseline, ymin = low, ymax = high)) +
facet_grid(~scenario_capacity)

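You need to include income in your summary stats, because the error bar layer inherits fill=income from the main ggplot() call and therefore looks for that column in the summary data frame. Like so
(df being your dataframe: avoid naming objects with function names like sample):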
df_errorbar <-
df |>
group_by(scenario_capacity, strategy_short) |>
summarize(
income = first(income),
low = sum(low),
baseline = sum(baseline),
high = sum(high)
)
df |>
ggplot(aes(x=strategy_short, y=baseline, fill=income)) +
geom_bar(position="stack", stat="identity") +
geom_errorbar(data = df_errorbar, aes(y = baseline, ymin = low, ymax = high)) +
facet_grid(~scenario_capacity)
Take care to group appropriately when you want an overall "error".
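An alternative sketch, assuming you would rather not carry a placeholder income value in the summary: set inherit.aes = FALSE on the error bar layer so it no longer looks for income, and map x explicitly. The data frame below is only a four-row subset of the question's data, and the width value is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Four rows taken from the question's data (two income groups, two strategies)
df <- data.frame(
  income = rep(c("LIC", "LMIC"), each = 2),
  scenario_capacity = "50_gb_month",
  strategy_short = rep(c("4G_f", "5G_f"), times = 2),
  baseline = c(0.260317022, 0.124212858, 0.83300281, 0.527561846),
  high = c(0.326222444, 0.146834332, 0.981024297, 0.56027992),
  low = c(0.234391846, 0.115607428, 0.770961424, 0.517383821)
)

# One row per bar: totals across income groups, no income column needed
df_errorbar <- df |>
  group_by(scenario_capacity, strategy_short) |>
  summarize(low = sum(low), baseline = sum(baseline), high = sum(high),
            .groups = "drop")

p <- ggplot(df, aes(x = strategy_short, y = baseline, fill = income)) +
  geom_col() +
  # inherit.aes = FALSE stops this layer from inheriting fill = income
  geom_errorbar(data = df_errorbar,
                aes(x = strategy_short, ymin = low, ymax = high),
                inherit.aes = FALSE, width = 0.2) +
  facet_grid(~scenario_capacity)
```

Print p to draw the plot. Whether summing low and high across income groups is a sensible measure of overall error depends on how those bounds were derived.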

Related

ggplot2 barplots with errorbars when using stacked bars

I'm trying to produce a stacked barplot with an error bar which represents the total variability per bar. I don't want to use a dodged barplot as I have >10 categories per bar.
Below I have some sample data for a reproducible example:
scenario = c('A','A','A','A')
strategy = c('A','A','A','A')
decile = c(0,0,10,10)
asset = c('A','B','A','B')
lower = c(10,20,10, 15)
mean = c(30,50,60, 70)
upper = c(70,90,86,90)
data = data.frame(scenario, strategy, decile, asset, lower, mean, upper)
And once we have the data frame (called data above) we can use ggplot2 to create a stacked bar as so:
ggplot(data, aes(x=decile, y=mean, fill=asset)) +
geom_bar(stat="identity") +
facet_grid(strategy~scenario) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)
However, the error bars produced are for each individual component of each stacked bar:
I appreciate this results from me providing the lower, mean and upper for each row of the df, but even when I summed these per decile I didn't get my desired errorbars at the top of each bar stack.
What is the correct ggplot2 code, or alternatively, what is the correct data structure to enable this?
I think you're correct in realising you need to manipulate your data rather than your plot. You can't really have position_stack on an errorbar, so you'll need to recalculate the mean, upper and lower values for the errorbars. Essentially this means getting the cumulative sum of the mean values, and shifting the upper and lower ranges accordingly. You can do this inside a dplyr pipe.
Note I think you will also need to have a position_dodge on the error bars, since their range overlaps even when shifted appropriately, which will make them harder to interpret visually:
library(ggplot2)
library(dplyr)
data %>%
mutate(lower = lower - mean, upper = upper - mean) %>%
group_by(decile) %>%
arrange(desc(asset), .by_group = TRUE) %>%
mutate(mean2 = cumsum(mean), lower = lower + mean2, upper = upper + mean2) %>%
ggplot(aes(x = decile, y = mean, fill = asset)) +
geom_bar(stat = "identity") +
facet_grid(strategy ~ scenario) +
geom_errorbar(aes(y = mean2, ymin = lower, ymax = upper), width = 2,
position = position_dodge(width = 2)) +
geom_point(aes(y = mean2), position = position_dodge(width = 2))
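To see what the shift does numerically, here is a minimal sketch on the question's data. The names stack_data, lower_off, upper_off, top, lower_abs and upper_abs are made up for illustration; the logic is the same cumulative-sum idea as above, with assets ordered so that the running total follows the stacking order (B at the bottom, A on top).

```r
library(dplyr)

# The question's sample data
stack_data <- data.frame(
  scenario = "A", strategy = "A",
  decile = c(0, 0, 10, 10),
  asset = c("A", "B", "A", "B"),
  lower = c(10, 20, 10, 15),
  mean = c(30, 50, 60, 70),
  upper = c(70, 90, 86, 90)
)

shifted <- stack_data %>%
  # express the bounds as offsets from each segment's own mean
  mutate(lower_off = lower - mean, upper_off = upper - mean) %>%
  group_by(decile) %>%
  arrange(desc(asset), .by_group = TRUE) %>%
  # top of each segment in the stack, then re-attach the offsets there
  mutate(top = cumsum(mean),
         lower_abs = top + lower_off,
         upper_abs = top + upper_off) %>%
  ungroup()

shifted[, c("decile", "asset", "top", "lower_abs", "upper_abs")]
```

For decile 0, for example, asset B keeps its original range (20 to 90) while asset A, which sits on top of B, moves from 10 to 70 up to 60 to 120.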
If you want only one error bar per decile, you should aggregate the values so that there is no difference between assets, like this:
library(ggplot2)
library(dplyr)
#Code
data %>% group_by(scenario,decile) %>%
mutate(nlower=mean(lower),nupper=mean(upper)) %>%
ggplot(aes(x=factor(decile), y=mean, fill=asset,group=scenario)) +
geom_bar(stat="identity") +
facet_grid(strategy~scenario) +
geom_errorbar(aes(ymin = nlower, ymax = nupper), width = 0.25)
Output:
It is a different matter if you keep the per-asset values: each asset class then gets its own error bar, because each has different limits:
#Code 2
data %>%
ggplot(aes(x=factor(decile), y=mean, fill=asset,group=scenario)) +
geom_bar(stat="identity") +
facet_grid(strategy~scenario) +
geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25)
Output:
In the last version, each asset has its own error bar, but if you want to see the errors globally, you should aggregate the limits, as was done above with mean values, or with another measure of your choice.
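If the goal is a single error bar sitting at the top of each stack, one possible sketch is to build a separate summary with one row per bar and give it to geom_errorbar as its own data. This assumes that summing the per-asset lower and upper values is an acceptable measure of total variability, which is a statistical assumption the question does not confirm; the names stack_data, totals and total_* are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# The question's sample data
stack_data <- data.frame(
  scenario = "A", strategy = "A",
  decile = c(0, 0, 10, 10),
  asset = c("A", "B", "A", "B"),
  lower = c(10, 20, 10, 15),
  mean = c(30, 50, 60, 70),
  upper = c(70, 90, 86, 90)
)

# One row per stacked bar (assumption: bounds are simply summed)
totals <- stack_data %>%
  group_by(scenario, strategy, decile) %>%
  summarise(total_mean = sum(mean),
            total_lower = sum(lower),
            total_upper = sum(upper),
            .groups = "drop")

p <- ggplot(stack_data, aes(x = factor(decile), y = mean, fill = asset)) +
  geom_col() +
  # separate data, no inherited fill, so there is one bar per stack
  geom_errorbar(data = totals,
                aes(x = factor(decile), ymin = total_lower, ymax = total_upper),
                inherit.aes = FALSE, width = 0.25) +
  facet_grid(strategy ~ scenario)
```

Print p to draw the plot. The facet variables are kept in the summary so the error bars land in the correct panel.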

Additional x axis on ggplot

I'm aware there are similar posts but I could not get those answers to work in my case.
e.g. Here and here.
Example:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut))
Returns a plot:
Since I used scale, the numbers at each break are z-scores, i.e. standard deviations away from the mean.
I would like to add, as a row underneath, the equivalent unscaled raw value that corresponds to each break.
Tried:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(aes(label = price))
Gives:
Error: geom_text requires the following missing aesthetics: y
My primary question is how can I add the raw values underneath -3:3 of each break? I don't want to change those breaks, I still want 6 breaks between -3:3.
Secondary question, how can I get -3 and 3 to actually show up in the chart? They have been trimmed.
[edit]
I've been trying to make it work with geom_text but keep hitting errors:
diamonds %>%
ggplot(aes(x = scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(label = price)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, :
object 'price' not found
I then tried changing my call to geom_text()
geom_text(data = diamonds, aes(price), label = price)
This results in the same error message.
You can make a custom labeling function for your axis. This takes each label on the axis and performs a custom transform for you. In your case you could paste the z-score, a line break, and the z-score times the standard deviation plus the mean. Because of the distribution of prices in the diamonds data set, this means that z-scores below about -1 represent negative prices. This may not be a problem in your own data. For clarity I have drawn in a vertical line representing $0.
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(diamonds$price) * x + mean(diamonds$price)))
}
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
geom_vline(aes(xintercept = -0.98580251364833), linetype = 2) +
facet_wrap(vars(cut)) +
scale_x_continuous(labels = labeller, limits = c(-3, 3)) +
xlab("price")
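The back-transformation behind those labels can be checked directly. A small sketch, using made-up variable names (z, z_breaks, raw_at_breaks, zero_price_z): a z-score is turned back into a price with mean + sd * z, and the z-score at which the price is $0 is -mean / sd, which is where the dashed vertical line above sits.

```r
library(ggplot2)  # only needed for the diamonds data set

mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)

# scale() centres on the mean and divides by the standard deviation
z <- as.vector(scale(diamonds$price))

# raw prices that correspond to the axis breaks -3, -2, ..., 3
z_breaks <- -3:3
raw_at_breaks <- mean_price + sd_price * z_breaks

# z-score at which the back-transformed price equals zero
zero_price_z <- -mean_price / sd_price
```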
We can use the sec_axis functionality in scale_x_continuous. To use this functionality we need to manually scale your data. This will add a secondary axis at the top of the plot, not underneath. So it's not quite exactly what you're looking for.
library(tidyverse)
# manually scale the data
mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)
diamonds$price_scaled <- (diamonds$price - mean_price) / sd_price
# make the plot
ggplot(diamonds, aes(price_scaled))+
geom_density()+
facet_wrap(~cut)+
scale_x_continuous(sec.axis = sec_axis(~ mean_price + (sd_price * .)),
limits = c(-3, 4), breaks = -3:3)
You could cheat a bit by passing some dummy data to geom_text:
geom_text(data = tibble(label = round(((-3:3) * sd_price) + mean_price),
y = -0.25,
x = -3:3),
aes(x, y, label = label))

ggplot2 barplot - adding percentage labels inside the stacked bars but retaining counts on the y-axis

I have created a stacked barplot with the counts of a variable. I want to keep these as counts, so that the different bar sizes represent different group sizes. However, inside the bars I would like to add labels that show the proportion of each stack segment as a percentage.
I managed to create the stacked plot of counts for every group. I have also created the labels and they are placed correctly. What I struggle with is how to calculate the percentage there.
I have tried this, but i get an error:
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n())
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = ifelse((..count..)==0,"",scales::percent((..count..)/sum(..count..)))), position = position_stack(vjust = 0.5), size = 3) +
theme_pubclean()
Error in (count) == 0 : comparison (1) is possible only for atomic
and list types
desired result:
Well, I just found an answer, or at least a workaround. Maybe this will help someone in the future: calculate the percentage before calling ggplot and then just use that column as the labels.
dataex <- iris %>%
dplyr::group_by(group, Species) %>%
dplyr::summarise(N = n()) %>%
dplyr::mutate(pct = paste0((round(N/sum(N)*100, 2))," %"))
names(dataex)
dataex <- as.data.frame(dataex)
str(dataex)
ggplot(dataex, aes(x = group, y = N, fill = factor(Species))) +
geom_bar(position="stack", stat="identity") +
geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3) +
theme_pubclean()
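A self-contained sketch of the same idea follows. Note that iris has no group column, so the one created here (splitting on Sepal.Length > 5.8) is purely hypothetical and only there to make the example run. The percentages are computed within each group, so they sum to 100% per bar while the bar heights stay as counts.

```r
library(dplyr)
library(ggplot2)

# Hypothetical grouping column, invented only for this illustration
iris_g <- iris %>%
  mutate(group = ifelse(Sepal.Length > 5.8, "long", "short"))

dataex <- iris_g %>%
  count(group, Species, name = "N") %>%
  group_by(group) %>%
  # share of each species within its group, plus a formatted label
  mutate(prop = N / sum(N),
         pct = scales::percent(prop, accuracy = 0.1)) %>%
  ungroup()

p <- ggplot(dataex, aes(x = group, y = N, fill = Species)) +
  geom_col() +
  # position_stack(vjust = 0.5) centres each label in its segment
  geom_text(aes(label = pct), position = position_stack(vjust = 0.5), size = 3)
```

Print p to draw the plot. Referring to the column as pct inside aes(), rather than dataex$pct, keeps the labels aligned with the rows ggplot actually uses.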

dodge columns in ggplot2

I am trying to create a picture that summarises my data. The data are about prevalence of drug use obtained from different practices from different countries. Each practice has contributed a different amount of data and I want to show all of this in my picture.
Here is a subset of the data to work on:
gr<-data.frame(matrix(0,36))
gr$drug<-c("a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b","b")
gr$practice<-c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r")
gr$country<-c("c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3","c1","c1","c1","c1","c1","c1","c1","c1","c1","c1","c2","c2","c2","c2","c2","c2","c3","c3")
gr$prevalence<-c(9.14,5.53,16.74,1.93,8.51,14.96,18.90,11.18,15.00,20.10,24.56,22.29,19.41,20.25,25.01,25.87,29.33,20.76,18.94,24.60,26.51,13.37,23.84,21.82,23.69,20.56,30.53,16.66,28.71,23.83,21.16,24.66,26.42,27.38,32.46,25.34)
gr$prop<-c(0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406,0.027,0.023,0.002,0.500,0.011,0.185,0.097,0.067,0.066,0.023,0.433,0.117,0.053,0.199,0.098,0.100,0.594,0.406)
gr$low.CI<-c(8.27,4.80,12.35,1.83,7.22,14.53,18.25,10.56,14.28,18.76,24.25,21.72,18.62,19.83,24.36,25.22,28.80,20.20,17.73,23.15,21.06,13.12,21.79,21.32,22.99,19.76,29.60,15.41,28.39,23.25,20.34,24.20,25.76,26.72,31.92,24.73)
gr$high.CI<-c(10.10,6.37,22.31,2.04,10.00,15.40,19.56,11.83,15.74,21.52,24.87,22.86,20.23,20.68,25.67,26.53,29.86,21.34,20.21,26.10,32.79,13.63,26.02,22.33,24.41,21.39,31.48,17.98,29.04,24.43,22.01,25.12,27.09,28.05,33.01,25.95)
The code I wrote is this
p<-ggplot(data=gr, aes(x=factor(drug), y=as.numeric(gr$prevalence), ymax=max(high.CI),position="dodge",fill=practice,width=prop))
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
p + theme_bw()+
geom_bar(stat="identity",position = position_dodge(0.9)) +
labs(x="Drug",y="Prevalence") +
geom_errorbar(ymax=gr$high.CI,ymin=gr$low.CI,position=position_dodge(0.9),width=0.25,size=0.25,colour="black",aes(x=factor(drug), y=as.numeric(gr$prevalence), fill=practice)) +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill=F)
The figure I obtain is this one where bars are all on top of each other while I want them "dodge".
I also obtain the following warning:
ymax not defined: adjusting position using y instead
Warning message:
position_dodge requires non-overlapping x intervals
Ideally I would get each bar near one another, with their error bars in the middle of its bar, all organised by country.
Also should I be concerned about the warning (which I clearly do not fully understand)?
I hope this makes sense. I hope I am close enough, but I don't seem to be going anywhere, some help would be greatly appreciated.
Thank you
ggplot's geom_bar() accepts the width parameter, but doesn't line them up neatly against one another in dodged position by default. The following workaround references the solution here:
library(dplyr)
# calculate x-axis position for bars of varying width
gr <- gr %>%
group_by(drug) %>%
arrange(practice) %>%
mutate(pos = 0.5 * (cumsum(prop) + cumsum(c(0, prop[-length(prop)])))) %>%
ungroup()
x.labels <- gr$practice[gr$drug == "a"]
x.pos <- gr$pos[gr$drug == "a"]
ggplot(gr,
aes(x = pos, y = prevalence,
fill = country, width = prop,
ymin = low.CI, ymax = high.CI)) +
geom_col(col = "black") +
geom_errorbar(size = 0.25, colour = "black") +
facet_wrap(~drug) +
scale_fill_manual(values = c("c1" = "gray79",
"c2" = "gray60",
"c3" = "gray39"),
guide = "none") +
scale_x_continuous(name = "Drug",
labels = x.labels,
breaks = x.pos) +
labs(title = "Drug usage by country and practice", y = "Prevalence") +
theme_classic()
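The pos calculation is the part that is easiest to get wrong, so here is a tiny worked sketch with made-up widths (prop, right_edge, left_edge are illustrative names). Each bar's centre is the midpoint between its left edge (the running total of the widths before it) and its right edge (the running total including it).

```r
# Made-up bar widths that sum to 1
prop <- c(0.2, 0.5, 0.3)

right_edge <- cumsum(prop)                           # 0.2, 0.7, 1.0
left_edge  <- cumsum(c(0, prop[-length(prop)]))      # 0.0, 0.2, 0.7

# Centre of each bar: midpoint between its two edges
pos <- 0.5 * (right_edge + left_edge)
pos
```

With these widths the centres come out at 0.1, 0.45 and 0.85, so bars of width prop drawn at pos sit edge to edge without overlapping.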
There is a lot of information you are trying to convey here - to contrast drug A and drug B across countries using the barplots and accounting for proportions, you might use the facet_grid function. Try this:
colour<-c(rep("gray79",10),rep("gray60",6),rep("gray39",2))
gr$drug <- paste("Drug", gr$drug)
p<-ggplot(data=gr, aes(x=factor(practice), y=as.numeric(prevalence),
ymax=high.CI,ymin = low.CI,
position="dodge",fill=practice, width=prop))
p + theme_bw()+ facet_grid(drug~country, scales="free") +
geom_bar(stat="identity") +
labs(x="Practice",y="Prevalence") +
geom_errorbar(position=position_dodge(0.9), width=0.25,size=0.25,colour="black") +
ggtitle("Drug usage by country and practice") +
scale_fill_manual(values = colour)+ guides(fill="none")
The bars are too narrow in country c1 and, as you indicated, one clinic is quite influential.
Also, you can specify your aesthetics once in ggplot(aes(...)) without resetting them in each layer, and you do not need to include the data frame's name inside aes() in the ggplot call.

Visualizing the difference between two points with ggplot2

I want to visualize the difference between two points with a line/bar in ggplot2.
Suppose we have some data on income and spending as a time series.
We would like to visualize not only them, but the balance (=income - spending) as well.
Furthermore, we would like to indicate whether the balance was positive (=surplus) or negative (=deficit).
I have tried several approaches, but none of them produced a satisfying result. Here we go with a reproducible example.
# Load libraries and create LONG data example data.frame
library(dplyr)
library(ggplot2)
library(tidyr)
df <- data.frame(year = rep(2000:2009, times=3),
var = rep(c("income","spending","balance"), each=10),
value = c(0:9, 9:0, rep(c("deficit","surplus"), each=5)))
df
1.Approach with LONG data
Unsurprisingly, it doesn't work with LONG data,
because the geom_linerange arguments ymin and ymax cannot be specified correctly. ymin=value, ymax=value is definitely the wrong way to go (expected behaviour). ymin=income, ymax=spending is obviously wrong, too (expected behaviour).
df %>%
ggplot() +
geom_point(aes(x=year, y=value, colour=var)) +
geom_linerange(aes(x=year, ymin=value, ymax=value, colour=net))
#>Error in function_list[[i]](value) : could not find function "spread"
2.Approach with WIDE data
I almost got it working with WIDE data.
The plot looks good, but the legend for the geom_point(s) is missing (expected behaviour).
Simply adding show.legend = TRUE to the two geom_point(s) doesn't solve the problem as it overprints the geom_linerange legend. Besides, I would rather have the geom_point lines of code combined in one (see 1.Approach).
df %>%
spread(var, value) %>%
ggplot() +
geom_linerange(aes(x=year, ymin=spending, ymax=income, colour=balance)) +
geom_point(aes(x=year, y=spending), colour="red", size=3) +
geom_point(aes(x=year, y=income), colour="green", size=3) +
ggtitle("income (green) - spending (red) = balance")
3.Approach using LONG and WIDE data
Combining the 1.Approach with the 2.Approach results in yet another unsatisfying plot. The legend does not differentiate between balance and var (=expected behaviour).
ggplot() +
geom_point(data=(df %>% filter(var=="income" | var=="spending")),
aes(x=year, y=value, colour=var)) +
geom_linerange(data=(df %>% spread(var, value)),
aes(x=year, ymin=spending, ymax=income, colour=balance))
Any (elegant) way out of this dilemma?
Should I use some other geom instead of geom_linerange?
Is my data in the right format?
Try
ggplot(df[df$var != "balance", ]) +
geom_point(
aes(x = year, y = value, fill = var),
size=3, pch = 21, colour = alpha("white", 0)) +
geom_linerange(
aes(x = year, ymin = income, ymax = spending, colour = balance),
data = spread(df, var, value)) +
scale_fill_manual(values = c("green", "red"))
Output:
The main idea is that we use two different types of aesthetics for colours (fill for the points, with the appropriate pch, and colour for the lines) so that we get separate legends for each.
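A self-contained sketch of the same two-aesthetic idea, under one assumption: the question's value column mixes numbers with the "deficit"/"surplus" labels, so R stores it as character. Here the data start in wide form with numeric income and spending, and balance is derived from them. The names wide and long are made up, and pivot_longer is used in place of spread.

```r
library(ggplot2)
library(tidyr)

# Numeric income and spending, one row per year
wide <- data.frame(year = 2000:2009, income = 0:9, spending = 9:0)
wide$balance <- ifelse(wide$income >= wide$spending, "surplus", "deficit")

# Long form for the points: one row per year and variable
long <- pivot_longer(wide, c(income, spending),
                     names_to = "var", values_to = "value")

p <- ggplot() +
  # colour is used only by the lines, so it gets its own legend
  geom_linerange(data = wide,
                 aes(x = year, ymin = spending, ymax = income, colour = balance)) +
  # fill is used only by the points (shape 21 has a fill), second legend
  geom_point(data = long,
             aes(x = year, y = value, fill = var),
             shape = 21, size = 3, colour = "transparent") +
  scale_fill_manual(values = c(income = "green", spending = "red"))
```

Print p to draw the plot. Naming the values in scale_fill_manual ties each colour to its variable regardless of factor order.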

Resources