How to add labels with observation count to stat_summary ggplot?

I have a dataset e.g.
outcome <- c(rnorm(500, 45, 10), rnorm(250, 40, 12), rnorm(150, 38, 7), rnorm(1000, 35, 10), rnorm(100, 30, 7))
group <- c(rep("A", 500), rep("B", 250), rep("C", 150), rep("D", 1000), rep("E", 100))
reprex <- data.frame(outcome, group)
I can plot this as a "dynamite" plot with:
graph <- ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
stat_summary(geom = "bar", fun.y = mean) +
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)
giving:
I would also like to add beneath each column a label specifying how many observations were in that group. However I can't work out how to do this. I tried:
graph + geom_label (aes(label=paste(..count.., "Obs.", sep=" ")), y=-0.75, size=3.5, color="black", fontface="bold")
which returns
Error in paste(count, "Obs.", sep = " ") :
cannot coerce type 'closure' to vector of type 'character'
I've also tried
graph + stat_summary(aes(label=paste(..y.., "Obs.", sep=" ")), fun.y=count, geom="label")
but this returns:
Error: stat_summary requires the following missing aesthetics: y
I know that I can do this if I just make a dataframe of summary statistics first but that will result in me creating a new dataframe every time I need a graph and therefore I'd ideally like to be able to plot this using stat_summary() from the original dataset.
Does anyone know how to do this?

Without creating a separate dataframe beforehand, you can compute the count on the fly with dplyr, as follows:
library(dplyr)
library(ggplot2)
ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
stat_summary(geom = "bar", fun.y = mean) +
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)+
geom_label(inherit.aes = FALSE, data = . %>% group_by(group) %>% count(),
aes(label = paste0(n, " Obs."), x = group), y = -0.5)

You cannot use stat="count" when there's already a y variable declared. I would say the easiest way would be to create a small dataframe for counts:
label_df = reprex %>% group_by(group) %>% summarise(outcome=mean(outcome),n=n())
Then plot using that
ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
stat_summary(geom = "bar", fun.y = mean) +
stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)+
geom_text(data=label_df,aes(label=paste(n, "Obs.", sep=" ")), size=3.5, color="black", fontface="bold",nudge_y =1)
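For completeness, here is a minimal sketch of a way to stay inside stat_summary(), as the question asked. It relies on the fact that a function passed to fun.data may return any columns, including label. The helper n_label and its y_pos argument are hypothetical names introduced for illustration, the data is a shortened version of the question's example, and mean_se is swapped in for mean_cl_normal only to avoid the Hmisc dependency. It assumes ggplot2 3.3.0 or later (for fun and after_stat()).

```r
library(ggplot2)

# Shortened version of the question's data: one numeric outcome, one group column
set.seed(1)
reprex <- data.frame(
  outcome = c(rnorm(500, 45, 10), rnorm(250, 40, 12), rnorm(150, 38, 7)),
  group   = rep(c("A", "B", "C"), times = c(500, 250, 150))
)

# Hypothetical helper: for one group's y values, return a fixed y position
# and a label holding the number of observations
n_label <- function(y, y_pos = -0.75) {
  data.frame(y = y_pos, label = paste(length(y), "Obs."))
}

p <- ggplot(reprex, aes(x = group, y = outcome)) +
  # fill is mapped in the bar layer only, so the label layer does not inherit it
  stat_summary(aes(fill = after_stat(y)), geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.1) +
  # fun.data is called once per group; its 'label' column feeds geom_label
  stat_summary(geom = "label", fun.data = n_label, size = 3.5, fontface = "bold")
p
```

Because the counts are computed inside the layer, no separate summary dataframe has to be kept in sync with the plotted data.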

Related

Change colour in stacked plots, ggplot 2

I would like to use palette colours for my stacked plot:
p <- ggplot() + theme_bw() +
geom_bar(aes(fill = a, y = b, x= c), data = df, width = 0.7,
position="stack", stat="identity") + theme(legend.position="bottom")
I tried the following but it didn't work:
p + scale_color_brewer(palette = "PuOr")
Furthermore, I would like to plot a line showing the mean over the barplot. Maybe somebody has an idea how to do this.
Some thoughts:
1) better to use geom_col than geom_bar for values you want the bar to represent, see the documentation
2) Used factor(...) to make continuous variables discrete
3) your code will be easier to read if you follow the order of arguments as set out in the documentation, although of course the order does not matter.
4) updated to reflect request with mean for each x value
library(ggplot2)
library(dplyr)
df <- data.frame(a = c(2001, 2001, 2001, 2002, 2002, 2003),
x = c(6, 7, 8, 6, 7, 6),
y = c(1, 258, 1, 3, 9, 11))
#data frame for means
df_y_mean <-
df %>%
group_by(x) %>%
summarise(y_mean = mean(y))
ggplot() +
geom_col(data = df, aes(x = factor(x), y = y, fill = factor(a)), width = 0.7) +
geom_line(data = df_y_mean, aes(factor(x), y_mean, colour = "red"), group = 1, size = 1) +
scale_fill_brewer(palette = "PuOr", name = "Year") +
guides(colour = guide_legend(title = "Mean", label = FALSE)) +
theme_bw() +
theme(legend.position = "bottom")
Created on 2020-05-20 by the reprex package (v0.3.0)
You are defining fill but using scale_colour_brewer(). Use scale_fill_brewer() to modify fill.
To draw a horizontal line add geom_hline() to your plot call.
p <- ggplot() + theme_bw() +
geom_bar(aes(fill = a, y = b, x= c), data = df, width = 0.7,
position="stack", stat="identity") +
theme(legend.position="bottom")
my.mean <- mean(df$b) ## can be any value, change as needed
p + scale_fill_brewer(palette = "PuOr") + geom_hline(yintercept = my.mean)

How to position labels on grouped bar plot columns in ggplot2

I am having trouble positioning percentage&count labels on a grouped barplot.
The labels are currently stacked together:
I think this is because I have been referring to example code for a stacked barplot. I have tried adding position=position_dodge(width=1) to geom_text to unstack the labels, but I got the following warning:
Warning: Ignoring unknown aesthetics: position
Don't know how to automatically pick scale for object of type PositionDodge/Position/ggproto/gg. Defaulting to continuous.
Error: Aesthetics must be valid data columns. Problematic aesthetic(s): position = position_dodge(width = 1).
Did you mistype the name of a data column or forget to add stat()?
Here is the code I have using the Titanic dataset:
library(titanic)
library(dplyr)
library(ggplot2)
data("titanic_train")
head(titanic_train, 6)
titanic_train$Survived <- as.factor(titanic_train$Survived)
summary = titanic_train %>% group_by(Survived, Sex) %>% tally %>% mutate(pct = n/sum(n))
ggplot(summary, aes(x=Sex, y=n, fill=Survived)) + geom_bar(stat="identity", position="dodge") + geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%\n", n)), colour="black")
How can I resolve this?
You can just add position = position_dodge(width = 1) to your geom_text call, but outside of aes. Your error was caused by trying to put position... inside aes.
library(dplyr)
library(ggplot2)
library(titanic)
ggplot(summary, aes(x = Sex, y = n, fill = Survived)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%\n", n)),
colour = "black",
position = position_dodge(width = 1)) +
coord_cartesian(ylim = c(0, 550))
Here is a minimal example that you can adapt to your own data.
Data:
df <- data.frame(
x = factor(c(1, 1, 2, 2)),
y = c(1, 3, 2, 1),
grp = c("a", "b", "a", "b")
)
Plot:
ggplot(data = df, aes(x, y, group = grp)) +
geom_col(aes(fill = grp), position = "dodge") +
geom_text(
aes(label = y, y = y + 0.05),
position = position_dodge(0.9),
vjust = 0
)

GGPlot With Specifications

data=data.frame("grade"=c(1, 2, 3, 1, 2, 3),
"class"=c('a', 'a', 'a', 'b', 'b', 'b'),
"size"=c(1, 1, 2, 2, 2, 1),
"var"=c('q33', 'q35', 'q39', 'q33', 'q35', 'q39'),
"score"=c(5, 8, 7, 3, 7, 5))
My data have many group variables.
First I want to just plot 'score' by 'grade' with a line
library(reshape2)
library(ggplot2)
ggplot(data, aes(x = grade, y = score)) + geom_line()
It gives a funny graph because I have 'grade' repeated for different classes and sizes.
If I take a subset of my data then the graph looks ok.
ggplot(subset(data, size == 1), aes(x = grade, y = score)) + geom_line()
So I wonder how can I plot my data 'score' by 'grade' for ALL combinations without the graph somehow combining all values?
Here is one approach. You can plot score vs. grade, and use stat_summary to add a line going through mean at each grade, and a ribbon that contains the 95% confidence interval. Is this what you had in mind?
library(ggplot2)
ggplot(data = data, mapping = aes(x = grade, y = score)) +
stat_summary(geom = "line", fun = mean, linetype = "dashed") +
stat_summary(geom = "ribbon", fun.data= mean_cl_normal, fun.args = list(conf.int=0.95), alpha=.1) +
scale_x_continuous(breaks = data$grade)
Plot
Alternatively, you can plot points for mean values at each grade and standard error bars.
library(tidyverse)
data %>%
group_by(grade) %>%
summarise(mean_score = mean(score),
SD = sd(score),
n = n(),
SE = SD/sqrt(n)) %>%
ggplot(mapping = aes(x = grade, y = mean_score)) +
geom_point() +
geom_line() +
geom_errorbar(aes(ymin = mean_score - SE, ymax = mean_score + SE), width = .1) +
scale_x_continuous(breaks = data$grade)
Plot
You could use facet_wrap(~class+size); this will give one plot per combination.
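As a rough sketch of the faceting suggestion, using the data frame from the question (geom_point() is an addition here, so that combinations with a single observation remain visible):

```r
library(ggplot2)

# Data from the question
data <- data.frame(
  grade = c(1, 2, 3, 1, 2, 3),
  class = c("a", "a", "a", "b", "b", "b"),
  size  = c(1, 1, 2, 2, 2, 1),
  var   = c("q33", "q35", "q39", "q33", "q35", "q39"),
  score = c(5, 8, 7, 3, 7, 5)
)

# One panel per class/size combination; lines are drawn within each panel only
p <- ggplot(data, aes(x = grade, y = score)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ class + size)
p
```

With this data there are four class/size combinations, two of which contain only one row, so those panels show a single point and no line.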

Bar plot without grouping variable

This seems like the simplest thing to do, but I have not been able to figure it out in R. For descriptive purposes, I want to create one bar graph that shows the means and error bars of multiple questions/variables. My data are based on anonymous responses, so there are no grouping variables.
Is there a way to do this in R? Below is an example of what my data looks like. I would like to plot the mean and standard deviation of each variable next to each other in the same bar graph.
dat <- data.frame(satisfaction = c(1, 2, 3, 4),
engaged = c(2, 3, 4, 2),
relevant = c(4, 1, 3, 2),
recommend = c(4, 1, 3, 3))
What you could do is reshape the data into long format with reshape2 (or data.table or tidyr) without specifying an id-variable and using all columns as measure variables. After that you can create a plot with for example ggplot2. Using:
library(reshape2)
library(ggplot2)
# reshape into long format
dat2 <- melt(dat, measure.vars = 1:4) # or just: melt(dat)
# create the plot
ggplot(dat2, aes(x = variable, y = value)) +
stat_summary(geom = 'bar', fun.y = 'mean', width = 0.7, fill = 'grey') +
stat_summary(geom = 'errorbar', width = 0.2, size = 1.5) +
theme_minimal(base_size = 14) +
theme(axis.title = element_blank())
gives:
Update: As @GavinSimpson pointed out in his answer, a barplot is not the best choice for visualizing means and standard errors. As an alternative you could use the pointrange geom:
ggplot(dat2, aes(x = variable, y = value)) +
stat_summary(geom = 'pointrange', fatten = 5, size = 1.2) +
theme_minimal(base_size = 14) +
theme(axis.title = element_blank())
which gives:
Whilst I know you asked for a barplot, a dotplot of the data is an alternative visualisation that focuses on the means and standard errors. If the drawing of a bar all the way to 0 is not that informative, the dotplot is a good alternative.
Reusing the objects and code from @Procrastinatus Maximus' answer we have:
ggplot(dat2, aes(x = variable, y = value)) +
stat_summary(geom = 'point', fun.y = 'mean', size = 2) +
stat_summary(geom = 'errorbar', width = 0.2) +
xlab(NULL) +
theme_bw()
which produces
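The first answer mentions tidyr as an alternative for the reshape step. A minimal sketch of that variant, assuming tidyr 1.0.0 or later (for pivot_longer) and ggplot2 3.3.0 or later (for the fun argument); the name dat_long is introduced here for illustration:

```r
library(tidyr)
library(ggplot2)

# Data from the question
dat <- data.frame(satisfaction = c(1, 2, 3, 4),
                  engaged = c(2, 3, 4, 2),
                  relevant = c(4, 1, 3, 2),
                  recommend = c(4, 1, 3, 3))

# Stack all columns into name/value pairs (no id variable needed)
dat_long <- pivot_longer(dat, cols = everything(),
                         names_to = "variable", values_to = "value")

# Points for the means; the errorbar layer uses stat_summary's default, mean_se()
p <- ggplot(dat_long, aes(x = variable, y = value)) +
  stat_summary(geom = "point", fun = mean, size = 2) +
  stat_summary(geom = "errorbar", width = 0.2)
p
```

Note that pivot_longer returns variable as a character column, so the x axis is ordered alphabetically, whereas melt returns a factor in the original column order.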

Condition a ..count.. summation on the faceting variable

I'm trying to annotate a bar chart with the percentage of observations falling into that bucket, within a facet. This question is very closely related to the question "Show % instead of counts in charts of categorical variables", but faceting introduces a wrinkle. The answer to the related question is to use stat_bin with the text geom and construct the label like so:
stat_bin(geom="text", aes(x = bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
))
This works fine for an un-faceted plot. However, with facets, this sum(..count..) is summing over the entire collection of observations without regard for the facets. The plot below illustrates the issue---note that the percentages do not sum to 100% within a panel.
Here is the actual code for the figure above:
g.invite.distro <- ggplot(data = df.exp) +
geom_bar(aes(x = invite_bins)) +
facet_wrap(~cat1, ncol=3) +
stat_bin(geom="text", aes(x = invite_bins,
y = ..count..,
label = paste(round(100*(..count../sum(..count..)),1), "%", sep="")
),
vjust = -1, size = 3) +
theme_bw() +
scale_y_continuous(limits = c(0, 3000))
UPDATE: As per request, here's a small example re-producing the issue:
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
ggplot(data = df) + geom_bar(aes(x = x)) +
stat_bin(geom = "text", aes(
x = x,
y = ..count.., label = ..count../sum(..count..)), vjust = -1) +
facet_wrap(~f)
Update: geom_bar requires stat = "identity".
Sometimes it's easier to obtain summaries outside the call to ggplot.
df <- data.frame(x = c('a', 'a', 'b','b'), f = c('c', 'd','d','d'))
# Load packages
library(ggplot2)
library(plyr)
# Obtain summary. 'Freq' is the count, 'pct' is the percent within each 'f'
m = ddply(data.frame(table(df)), .(f), mutate, pct = round(Freq/sum(Freq) * 100, 1))
# Plot the data using the summary data frame
ggplot(data = m, aes(x = x, y = Freq)) +
geom_bar(stat = "identity", width = .7) +
geom_text(aes(label = paste(pct, "%", sep = "")), vjust = -1, size = 3) +
facet_wrap(~ f, ncol = 2) + theme_bw() +
scale_y_continuous(limits = c(0, 1.2*max(m$Freq)))
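If you would rather keep the computation inside ggplot, one possible sketch is to divide each count by the total of its own panel. This assumes ggplot2 3.3.0 or later (for after_stat()) and relies on the PANEL column that ggplot2 adds to the computed layer data; treat it as an illustration rather than the canonical approach.

```r
library(ggplot2)

# Small example from the question
df <- data.frame(x = c("a", "a", "b", "b"), f = c("c", "d", "d", "d"))

p <- ggplot(df, aes(x = x)) +
  geom_bar() +
  geom_text(
    stat = "count",
    # ave() returns the per-panel total of 'count' for every row
    aes(label = after_stat(
      paste0(round(100 * count / ave(count, PANEL, FUN = sum), 1), "%")
    )),
    vjust = -0.5
  ) +
  facet_wrap(~ f)
p
```

In this example the single bar in panel "c" should be labelled 100%, and the two bars in panel "d" 33.3% and 66.7%, so the percentages sum to 100% within each panel.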
