adding summary statistics to two factor boxplot - r

I would like to add summary statistics (e.g. mean) to the boxplot which have two factors. I have tried this:
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean, aes(supp,dose), geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green"))
I got this picture:
It is possible somehow put the sample size of each box (e.g. top of the whiskers)? I have tried this:
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean,aes(supp,dose),geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth,
group_by(dose, supp),
summarize(Count = n(),
q3 = quantile(ToothGrowth, 0.75),
iqr = IQR(ToothGrowth),
aes(x= dose, y = len,label = paste0("n = ",Count, "\n")), position = position_dodge(width = 0.75)))

You can state the aesthetics just once by putting them in the main ggplot call and then they will apply to all of the geom layers: ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp))
For the count of observations: The data summary step in geom_text isn't coded properly. Also, to set len (the y-value) for the text placement, the summarize function needs to output values for len.
To add the mean values in the correct locations on the x-axis, use stat_summary with the exact same aesthetics as the other geoms and stats. I've overridden the color aesthetic by setting the color to yellow so that the point markers will be visible on top of the box plot fill colors.
The code to implement the plot is below:
library(tidyverse)
pd = position_dodge(0.85)
ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp)) +
stat_boxplot(geom = "errorbar", position = pd) +
geom_boxplot(notch=TRUE, notchwidth=0.5, outlier.size=2, position=pd) +
stat_summary(fun.y=mean, geom="point", shape=3, size=2, colour="yellow", stroke=1.5,
position=pd, show.legend=FALSE) +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n(),
len=max(len) + 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +
theme_bw()
Note that the notch goes outside the hinges for all of the box plots. Also, having the sample size just above the maximum of each boxplot seems distracting and unnecessary to me. You could place all of the text annotations at the bottom of the plot like this:
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n()) %>%
ungroup %>%
mutate(len=min(ToothGrowth$len) - 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +

Related

How to calculate standard error instead of standard deviation in ggplot

I need some help to figure out to estimate the standard error using the following R script:
library(ggplot2)
library(ggpubr)
library(Hmisc)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth, 4)
theme_set(
theme_classic() +
theme(legend.position = "top")
)
# Initiate a ggplot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
# Add mean points +/- SD
# Use geom = "pointrange" or geom = "crossbar"
e + geom_violin(trim = FALSE) +
stat_summary(
fun.data = "mean_sdl", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
# Combine with box plot to add median and quartiles
# Change fill color by groups, remove legend
e + geom_violin(aes(fill = dose), trim = FALSE) +
geom_boxplot(width = 0.2)+
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
theme(legend.position = "none")
Many thanks for the help
Kind regards
A couple of things. First, you need to reassign e when you add geom_violin and stat_summary. Otherwise, it isn't carrying those changes forward when you add the boxplot in the next step. Second, when you add the boxplot last, it is mapping over the points and error bars from stat_summary so it looks like they're disappearing. If you add the boxplot first and then stat_summary the points and error bars will be placed on top of the boxplot. Here is an example:
library(ggplot2)
library(ggpubr)
library(Hmisc)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
theme_set(
theme_classic() +
theme(legend.position = "top")
)
# Initiate a ggplot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
# Add violin plot
e <- e + geom_violin(trim = FALSE)
# Combine with box plot to add median and quartiles
# Change fill color by groups, remove legend
e <- e + geom_violin(aes(fill = dose), trim = FALSE) +
geom_boxplot(width = 0.2)+
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
theme(legend.position = "none")
# Add mean points +/- SE
# Use geom = "pointrange" or geom = "crossbar"
e +
stat_summary(
fun.data = "mean_se", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
You said in a comment that you couldn't see any changes when you tried mean_se and mean_cl_normal. Perhaps the above solution will have solved the problem, but you should see a difference. Here is an example just comparing mean_se and mean_sdl. You should notice the error bars are smaller with mean_se.
ggplot(ToothGrowth, aes(x = dose, y = len)) +
stat_summary(
fun.data = "mean_sdl", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
ggplot(ToothGrowth, aes(x = dose, y = len)) +
stat_summary(
fun.data = "mean_se", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
Here is a simplified solution if you don't want to reassign at each step:
ggplot(ToothGrowth, aes(x = dose, y = len)) +
geom_violin(aes(fill = dose), trim = FALSE) +
geom_boxplot(width = 0.2) +
stat_summary(fun.data = "mean_se", fun.args = list(mult = 1),
geom = "pointrange", color = "black") +
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
theme(legend.position = "none")

How to add a legend manually for line chart

i need the plan legend
How to add a legend manually for geom_line
ggplot(data = impact_end_Current_yr_m_actual, aes(x = month, y = gender_value)) +
geom_col(aes(fill = gender))+theme_classic()+
geom_line(data = impact_end_Current_yr_m_plan, aes(x=month, y= gender_value, group=1),color="#288D55",size=1.2)+
geom_point(data = impact_end_Current_yr_m_plan, aes(x=month, y=gender_value))+
theme(axis.line.y = element_blank(),axis.ticks = element_blank(),legend.position = "bottom", axis.text.x = element_text(face = "bold", color = "black", size = 10, angle = 0, hjust = 1))+
labs(x="", y="End Beneficiaries (in Num)", fill="")+
scale_fill_manual(values=c("#284a8d", "#00B5CE","#0590eb","#2746c2"))+
scale_y_continuous(labels = function(x) format(x, scientific = FALSE)
The neatest way to do it I think is to add colour = "[label]" into the aes() section of geom_line() then put the manual assigning of a colour into scale_colour_manual() here's an example from mtcars (apologies that it uses stat_summary instead of geom_line but does the same trick):
library(tidyverse)
mtcars %>%
ggplot(aes(gear, mpg, fill = factor(cyl))) +
stat_summary(geom = "bar", fun = mean, position = "dodge") +
stat_summary(geom = "line",
fun = mean,
size = 3,
aes(colour = "Overall mean", group = 1)) +
scale_fill_discrete("") +
scale_colour_manual("", values = "black")
Created on 2020-12-08 by the reprex package (v0.3.0)
The limitation here is that the colour and fill legends are necessarily separate. Removing labels (blank titles in both scale_ calls) doesn't them split them up by legend title.
In your code you would probably want then:
...
ggplot(data = impact_end_Current_yr_m_actual, aes(x = month, y = gender_value)) +
geom_col(aes(fill = gender))+
geom_line(data = impact_end_Current_yr_m_plan,
aes(x=month, y= gender_value, group=1, color="Plan"),
size=1.2)+
scale_color_manual(values = "#288D55") +
...
(but I cant test on your data so not sure if it works)

ggplot: color points by density as they approach a specific value?

I have a dataset containing 1,000 values for a model, these values are all within the same range (y=40-70), so the points overlap a ton. I'm interested in using color to show the density of the points converging on a single value (y=56.72) which I have indicated with a horizontal dashed line on the plot below. How can I color these points to show this?
ggplot(data, aes(x=model, y=value))+
geom_point(size=1) +
geom_hline(yintercept=56.72,
linetype="dashed",
color = "black")
I think that you should opt for an histogram or density plot:
n <- 500
data <- data.frame(model= rep("model",n),value = rnorm(n,56.72,10))
ggplot(data, aes(x = value, y = after_stat(count))) +
geom_histogram(binwidth = 1)+
geom_density(size = 1)+
geom_vline(xintercept = 56.72, linetype = "dashed", color = "black")+
theme_bw()
Here is your plot with the same data:
ggplot(data, aes(x = model, y = value))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")
If your model is iterative and do converge to the value, I suggest you plot as a function of the iteration to show the convergence. An other option, keeping a similar plot to your, is dodging the position of the points :
ggplot(data, aes(x = model, y = value))+
geom_point(position = position_dodge2(width = 0.2),
shape = 1,
size = 2,
stroke = 1,
alpha = 0.5) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")
Here is a color density plot as you asked:
library(dplyr)
library(ggplot2)
data %>%
mutate(bin = cut(value, breaks = 10:120)) %>%
dplyr::group_by(bin) %>%
mutate(density = dplyr::n()) %>%
ggplot(aes(x = model, y = value, color = density))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")+
scale_colour_viridis_c(option = "A")
I would suggest to use the alpha parameter within the geom_point. You should use a value close to 0.
ggplot(data, aes(x=model, y=value)) +
geom_point(size=1, alpha = .1) +
geom_hline(yintercept=56.72, linetype="dashed", color = "black")

Removing points from geom_bar legend ggplot r

This is my data.
Mod <- as.factor(c(rep("GLM",5),rep("MLP",5),rep("RF",5),rep("SVML",5),rep("SVMR",5)))
Manifold <- as.factor(rep(c("LLE","Iso","PCA","MDS","kPCA"),5))
ROC <- runif(25,0,1)
Sens <- runif(25,0,1)
Spec <- runif(25,0,1)
df <- data.frame("Mod"= Mod, "Manifold"= Manifold, "ROC" = ROC, "Sens" = sens, "Spec" = spec)
And I am making this graph
resul3 <- ggplot(df, aes(x = Mod, y = ROC, fill= Manifold)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
ylab("ROC & Specificity") +
xlab("Classifiers") +
theme_bw() +
ggtitle("Classifiers' ROC per Feature Extraction Plasma") +
geom_point(aes(y=Spec), color = "black", position=position_dodge(.9)) +
scale_fill_manual(name = "Feature \nExtraction", values = c("#FFEFCA",
"#EDA16A" ,"#C83741", "#6C283D", "#62BF94"))
first graph
And what I want is another legend with tittle "Specificity" and a single black point. I dont want the point to be inside the Manifolds legend.
Something like this but without the points inside the manifold squares
Changing the geom_point line, adding a scale_color_manual and using the override as seen in #drmariod's answer will result in this plot:
ggplot(df, aes(x = Mod, y = ROC, fill= Manifold)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
ylab("ROC & Specificity") +
xlab("Classifiers") +
theme_bw() +
ggtitle("Classifiers' ROC per Feature Extraction Plasma") +
geom_point(aes(y=Spec, color = "Specificity"), position=position_dodge(.9)) +
scale_fill_manual(name = "Feature \nExtraction", values = c("#FFEFCA",
"#EDA16A" ,"#C83741", "#6C283D", "#62BF94")) +
scale_color_manual(name = NULL, values = c("Specificity" = "black")) +
guides(fill = guide_legend(override.aes = list(shape = NA)))
You can overwrite the aesthetics for shape and set it to NA like this
ggplot(df, aes(x = Mod, y = ROC, fill= Manifold)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
ylab("ROC & Specificity") +
xlab("Classifiers") +
theme_bw() +
ggtitle("Classifiers' ROC per Feature Extraction Plasma") +
geom_point(aes(y=Spec), color = "black", position=position_dodge(.9)) +
scale_fill_manual(name = "Feature \nExtraction", values = c("#FFEFCA",
"#EDA16A" ,"#C83741", "#6C283D", "#62BF94")) +
guides(fill = guide_legend(override.aes = list(shape = NA)))

Aligning subsetted data points with ggplot2

I'm trying to build a complex figure that overlays individual data points on a boxplot to display both summary statistics as well as dispersion of the raw data. I have 2 questions in rank order of importance:
How do I center the jittered points around the middle of their respective box plot?
How can I remove the dark dots from the "drv" legend?
Code:
library(ggplot2)
library(dplyr)
mpg$cyl <- as.factor(mpg$cyl)
mpg %>% filter(fl=="p" | fl=="r" & cyl!="5") %>% sample_n(100) %>% ggplot(aes(cyl, hwy, fill=drv)) +
stat_boxplot(geom = "errorbar", width=0.5, position = position_dodge(1)) +
geom_boxplot(position = position_dodge(1), outlier.shape = NA)+
geom_point(aes(fill=drv, shape=fl), color="black", show.legend=TRUE, alpha=0.5, size=3, position = position_jitterdodge(dodge.width = 1)) +
scale_shape_manual(values = c(21,23))
It looks like the current dodging for geom_point is based on both fill and shape. Use group to indicate you only want to dodge on drv.
You can use override.aes in guide_legend to remove the points from the fill legend.
mpg %>%
filter(fl=="p" | fl=="r" & cyl!="5") %>%
sample_n(100) %>%
ggplot(aes(cyl, hwy, fill=drv)) +
stat_boxplot(geom = "errorbar", width=0.5, position = position_dodge(1)) +
geom_boxplot(position = position_dodge(1), outlier.shape = NA)+
geom_point(aes(fill = drv, shape = fl, group = drv), color="black",
alpha =0.5, size=3,
position = position_jitterdodge(jitter.width = .1, dodge.width = 1)) +
scale_shape_manual (values = c(21,23) ) +
guides(fill = guide_legend(override.aes = list(shape = NA) ) )

Resources