Add a number of observations per group AND SUBGROUP in ggplot2 boxplot - r

This might seem like a duplicate of this question, but I actually want to extend the original question.
I want to annotate the boxplot with the number of observations per group AND SUBGROUP in ggplot2. Following the example of the original post, here is my minimal example:
library(ggplot2)
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x)*1.05, label = length(x)))
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  # fun.y is deprecated since ggplot2 3.3.0 and is not needed when fun.data is given
  stat_summary(fun.data = give.n, geom = "text")
My problem is that the sample counts all line up in the center of each group, rather than being plotted over the corresponding boxplot.

Is this what you want? Adding position_dodge() to stat_summary() shifts each label over its own box:
library(ggplot2)
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x)*1.05, label = length(x)))
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text", position = position_dodge(width = 0.75))
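As a sanity check on the labels, the counts per group and subgroup can also be tabulated directly. This is a small base R sketch, not part of the original answer; the column name n is an illustrative choice.

```r
# Count observations for every cyl x gear combination that occurs in mtcars.
counts <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = length)

# aggregate() stores the result under the name of the response column,
# so rename it to something more descriptive.
names(counts)[names(counts) == "mpg"] <- "n"

print(counts)
```

Each row of counts should correspond to one box in the plot, and the n column to the number printed over it.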

Related

ggplot 2, boxplot add mean with legend

I would like to add a mean, with a legend, to my ggplot boxplot, but I don't know how.
The following R script is included in a function for a single variable:
p2 <- ggplot(df, aes("",!!sym(x))) +
geom_boxplot() +
stat_summary(fun=mean, geom="crossbar", color="steelblue2")
Thanks.
Adapted from this solution (Use stat_summary to annotate plot with number of observations), you could try something like this:
fun_mean <- function(x) {return(data.frame(y = mean(x),
label = paste0("mean = ", round(mean(x), 2))))}
ggplot(mtcars, aes(factor(cyl), mpg, label = rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
stat_summary(fun.data = fun_mean, geom = "text")
Hope that helps!

Centering dots in ggplot on faceted boxplot [duplicate]

Following this question: How to add a number of observations per group and use group mean in ggplot2 boxplot?, I want to add the number of observations per group to a ggplot boxplot too. However, I have also mapped colour in aes().
The existing answer shows how to adjust the text position along the y axis. How can I adjust the text position along the x axis?
This is a minimal example to reproduce my problem:
library(ggplot2)
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x)*1.05, label = length(x)))
}
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text")
p
Thanks for any suggestions.
You can just use position:
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p
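The width argument of position_dodge() controls the positioning on the horizontal axis. A value of 0.75 works well because it matches the default width of the dodged boxes, so the labels stay aligned with them. See how it works for a different number of groupings: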
p2 <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(cyl))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p2
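To see why one dodge width can serve different numbers of subgroups, the horizontal offsets can be worked out by hand. The sketch below assumes that dodging splits the total width into equally wide slots and centres one element in each slot; dodge_offsets is a hypothetical helper written for illustration, not a ggplot2 function.

```r
# Hypothetical helper: offsets of n dodged elements around the group position,
# assuming the dodge width is divided into n equal slots.
dodge_offsets <- function(n, width = 0.75) {
  i <- seq_len(n)
  width * ((i - 0.5) / n - 0.5)
}

print(dodge_offsets(2))  # two subgroups, as in p
print(dodge_offsets(3))  # three subgroups, as in p2
```

Under this assumption the offsets are always symmetric around the group position, which is why the labels stay centred on the group however many subgroups it has.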
Instead of stat_summary, you can use geom_text. Please refer to the following question: ggplot2 add text on top of boxplots.
This is an example of how you may do it with the number of observations:
# Create an aggregate of median & count
cts <- merge(aggregate(mpg ~ cyl + am, mtcars, length),
             aggregate(mpg ~ cyl + am, mtcars, median),
             by = c("cyl", "am"))
# Rename the col names to fit with the original dataset..
names(cts) <- c("cyl", "am", "count", "mpg")
# As alexwhan suggested, position_dodge helps with positioning
# along the x-axis..
ggplot(mtcars, aes(factor(cyl), mpg, colour = factor(am))) +
  geom_boxplot(position = position_dodge(width = 1.0)) +
  geom_text(data = cts, aes(label = count),
            position = position_dodge(width = 1.0))

R ggplot stat_summary Inconsistent Results

I plot the average mpg in the mtcars data frame using ggplot2. I get several points for each cylinder class denoting the mean value, categorized by the vs variable.
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) + geom_point(aes(color = factor(vs)), stat = "summary", fun = "mean")
If I overlay these averages on top of the raw data by adding + geom_point() (below), the averages appear to differ from what they were above. What am I doing wrong? Why aren't the means consistent?
ggplot(mtcars, aes(cyl, mpg)) + geom_point() + geom_point(aes(color = factor(vs)), stat = "summary", fun = "mean")
How embarrassing. I didn't even look at the scale of the y-axis. Thank you, aosmith. There is no inconsistency in stat_summary: the means are the same in both plots, and only the y-axis range changes once the raw data is added.
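The same conclusion can be checked without plotting. The following base R sketch (not from the original thread) computes the group means directly and compares their range with the range of the raw data, which is what drives the y-axis in each plot.

```r
# Mean mpg for every cyl x vs combination that occurs in mtcars.
means <- aggregate(mpg ~ cyl + vs, data = mtcars, FUN = mean)
print(means)

# The first plot only has to cover the range of the means;
# the second must cover the range of all raw observations.
print(range(means$mpg))
print(range(mtcars$mpg))
```

Because the raw values span a wider range than the means, the axis is rescaled in the second plot and the same means land at different heights on the page.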

Change stat_summary colours based on group and add text to the label in ggplot2 boxplot

Following this question: Add number of observations per group in ggplot2 boxplot, how do I change the colours of stat_summary?
Below is the example code:
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
}
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p
When I do the following to change the colours, it doesn't work (it gives me only one combined number):
stat_summary(fun.data = give.n, geom = "text",
             position = position_dodge(width = 0.75), colour = c("black", "red"))
Adding the text "n=" doesn't work either. I tried to add the "n=" in the function itself, like this:
give.n <- function(x){
return(c(y = median(x)*1.05, label = paste0("n=",length(x))))
}
But I get the below error:
Error: Discrete value supplied to continuous scale
For the colours, you want to add these using scale_colour_manual, so the plot call looks like:
p <-
ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75)) +
scale_colour_manual(values = c("black", "red"))
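The answer to adding "n=" is a duplicate of this question: Use stat_summary to annotate plot with number of observations. You need to use data.frame(...) in your give.n function rather than c(...). c(...) coerces all of its elements to a single type, so once the label is a string, y becomes a string as well, which is what triggers the "Discrete value supplied to continuous scale" error: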
give.n <-
function(x){
return(data.frame(y = median(x)*1.05, label = paste0("n=",length(x))))
}
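The difference between the two return values can be inspected in plain R. This sketch uses only base R; the variable names as_vector and as_frame are illustrative.

```r
x <- mtcars$mpg

# c() builds an atomic vector, so mixing a number and a string
# coerces everything to character, including y.
as_vector <- c(y = median(x) * 1.05, label = paste0("n=", length(x)))

# data.frame() keeps one type per column, so y stays numeric.
as_frame <- data.frame(y = median(x) * 1.05, label = paste0("n=", length(x)))

print(class(as_vector[["y"]]))
print(class(as_frame$y))
```

With the data.frame version, y remains a number that the continuous y scale can place, while label is free to be text.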
EDIT:
Re comment on changing colours for stat_summary items only, this proved a bit tricky in that I don't think you can have multiple scale_colour_manual layers. However, in this case you can make use of the fill aesthetic for box plots and leave the colour aesthetic for your text geom.
To make it cleaner, I've taken the colour and fill aesthetics out of the ggplot(...) call and put these in each geom:
p <-
ggplot(mtcars, aes(factor(vs), mpg)) +
geom_boxplot(aes(fill = factor(am))) +
  stat_summary(aes(colour = factor(am)), fun.data = give.n,
               geom = "text", position = position_dodge(width = 0.75)) +
scale_colour_manual(values = c("black", "red"))
Then, if you want to specify colours for the box plot fill, you can use scale_fill_manual(...).

