How to calculate standard error instead of standard deviation in ggplot - r

I need some help to figure out how to estimate the standard error using the following R script:
library(ggplot2)
library(ggpubr)
library(Hmisc)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
head(ToothGrowth, 4)
theme_set(
theme_classic() +
theme(legend.position = "top")
)
# Initiate a ggplot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
# Add mean points +/- SD
# Use geom = "pointrange" or geom = "crossbar"
e + geom_violin(trim = FALSE) +
stat_summary(
fun.data = "mean_sdl", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
# Combine with box plot to add median and quartiles
# Change fill color by groups, remove legend
e + geom_violin(aes(fill = dose), trim = FALSE) +
geom_boxplot(width = 0.2)+
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
theme(legend.position = "none")
Many thanks for the help
Kind regards

A couple of things. First, you need to reassign e when you add geom_violin and stat_summary; otherwise those layers are not carried forward when you add the boxplot in the next step. Second, when you add the boxplot last, it is drawn over the points and error bars from stat_summary, so it looks like they have disappeared. If you add the boxplot first and then stat_summary, the points and error bars will be placed on top of the boxplot. Here is an example:
library(ggplot2)
library(ggpubr)
library(Hmisc)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
theme_set(
theme_classic() +
theme(legend.position = "top")
)
# Initiate a ggplot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))
# Add violin plot, filled by group (added once so the layer is not duplicated)
# Combine with box plot to add median and quartiles
# Remove legend
e <- e + geom_violin(aes(fill = dose), trim = FALSE) +
geom_boxplot(width = 0.2)+
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07"))+
theme(legend.position = "none")
# Add mean points +/- SE
# Use geom = "pointrange" or geom = "crossbar"
e +
stat_summary(
fun.data = "mean_se", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
You said in a comment that you couldn't see any changes when you tried mean_se and mean_cl_normal. Perhaps the above solution will have solved the problem, but you should see a difference. Here is an example just comparing mean_se and mean_sdl. You should notice the error bars are smaller with mean_se.
ggplot(ToothGrowth, aes(x = dose, y = len)) +
stat_summary(
fun.data = "mean_sdl", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
ggplot(ToothGrowth, aes(x = dose, y = len)) +
stat_summary(
fun.data = "mean_se", fun.args = list(mult = 1),
geom = "pointrange", color = "black"
)
Here is a simplified solution if you don't want to reassign at each step:
ggplot(ToothGrowth, aes(x = dose, y = len)) +
geom_violin(aes(fill = dose), trim = FALSE) +
geom_boxplot(width = 0.2) +
stat_summary(fun.data = "mean_se", fun.args = list(mult = 1),
geom = "pointrange", color = "black") +
scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
theme(legend.position = "none")
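To see why the mean_se bars are shorter, it can help to compute the standard error by hand and compare it with what ggplot2's helper returns. The following is a minimal sketch, assuming ggplot2 is installed; the helper name se and the objects manual and from_ggplot are made up for illustration.

```r
library(ggplot2)

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Standard error of the mean: sd / sqrt(n) (hypothetical helper name)
se <- function(x) sd(x) / sqrt(length(x))

# Split tooth length by dose level
by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Mean +/- 1 SE computed by hand for each dose
manual <- do.call(rbind, lapply(by_dose, function(x) {
  data.frame(y = mean(x), ymin = mean(x) - se(x), ymax = mean(x) + se(x))
}))

# The same summary from ggplot2::mean_se()
from_ggplot <- do.call(rbind, lapply(by_dose, mean_se))

# Compare the two tables, ignoring row names and classes
all.equal(manual, from_ggplot, check.attributes = FALSE)

# SD bars (what mean_sdl with mult = 1 draws) are wider by a factor of sqrt(n)
sapply(by_dose, sd) / sapply(by_dose, se)
```

Because the SE divides the SD by the square root of the group size, the pointrange drawn with mean_se is always narrower than the one drawn with mean_sdl for groups with more than one observation.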

Related

Raincloud plot - histogram?

I would like to create a raincloud plot. I have successfully done it. But I would like to know if instead of the density curve, I can put a histogram (it's better for my dataset).
This is my code, in case it is useful:
ATSC <- ggplot(data = data, aes(y = atsc, x = numlecteur, fill = numlecteur)) +
geom_flat_violin(position = position_nudge(x = .2, y = 0), alpha = .5) +
geom_point(aes(y = atsc, color = numlecteur), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
geom_point(data = sumld, aes(x = numlecteur, y = mean), position = position_nudge(x = 0.25), size = 2.5) +
geom_errorbar(data = sumld, aes(ymin = lower, ymax = upper, y = mean), position = position_nudge(x = 0.25), width = 0) +
guides(fill = FALSE) +
guides(color = FALSE) +
scale_color_brewer(palette = "Spectral") +
scale_y_continuous(breaks=c(0,2,4,6,8,10), labels=c("0","2","4","6","8","10"))+
scale_fill_brewer(palette = "Spectral") +
coord_flip() +
theme_bw() +
expand_limits(y=c(0, 10))+
xlab("Lecteur") + ylab("Age total sans check")+
raincloud_theme
I thought I could maybe use geom_histogram(), but it doesn't work.
Thank you in advance for your help !
(sources : https://peerj.com/preprints/27137v1.pdf
https://neuroconscience.wordpress.com/2018/03/15/introducing-raincloud-plots/)
This is actually not that easy. There are a few challenges:
geom_histogram is "horizontal by nature", whereas the custom geom_flat_violin is vertical, as are boxplots; hence the final call to coord_flip in that tutorial. To combine both, I think it is best to switch x and y, forget about coord_flip, and use ggstance::geom_boxploth instead.
Creating separate histograms for each category is another challenge. My workaround is to create facets and "merge them together".
The histograms are scaled way bigger than the width of the points/boxplots. My workaround is to scale them via the after_stat function.
Nudging the histograms to the right position above the boxplot and points: I convert the discrete scale to a continuous one by mapping a constant numeric to the global y aesthetic, and then use the facet labels as discrete labels.
library(tidyverse)
my_data<-read.csv("https://data.bris.ac.uk/datasets/112g2vkxomjoo1l26vjmvnlexj/2016.08.14_AnxietyPaper_Data%20Sheet.csv")
my_datal <-
my_data %>%
pivot_longer(cols = c("AngerUH", "DisgustUH", "FearUH", "HappyUH"), names_to = "EmotionCondition", values_to = "Sensitivity")
# use y = -... to position boxplot and jitterplot below the histogram
ggplot(data = my_datal, aes(x = Sensitivity, y = -.5, fill = EmotionCondition)) +
# after_stat for scaling
geom_histogram(aes(y = after_stat(count/100)), binwidth = .05, alpha = .8) +
# from ggstance
ggstance::geom_boxploth( width = .1, outlier.shape = NA, alpha = 0.5) +
geom_point(aes(color = EmotionCondition), position = position_jitter(width = .15), size = .5, alpha = 0.8) +
# merged those calls to one
guides(fill = "none", color = "none") +
# scale_y_continuous(breaks = 1, labels = unique(my_datal$EmotionCondition))
scale_color_brewer(palette = "Spectral") +
scale_fill_brewer(palette = "Spectral") +
# facetting, because each histogram needs its own y
# strip position = left to fake discrete labels in continuous scale
facet_wrap(~EmotionCondition, nrow = 4, scales = "free_y" , strip.position = "left") +
# remove all continuous labels from the y axis
theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Created on 2021-04-15 by the reprex package (v1.0.0)
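The after_stat() scaling used in the histogram layer can be checked in isolation. This is a small sketch on the built-in ToothGrowth data rather than the dataset above, assuming ggplot2 3.3.0 or later (where after_stat() is available); the divisor max(count) is just one possible choice of scale.

```r
library(ggplot2)

# Rescale bar heights so that the tallest bin has height 1
p <- ggplot(ToothGrowth, aes(x = len)) +
  geom_histogram(aes(y = after_stat(count / max(count))), binwidth = 2)

# layer_data() returns the computed values that are actually drawn
heights <- layer_data(p)$y
range(heights)
```

Dividing by a constant, as in count/100 above, or by max(count) only changes the height of the bars, so the factor can be tuned until the histogram matches the width of the boxplot and points.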

Removing points from geom_bar legend ggplot r

This is my data.
Mod <- as.factor(c(rep("GLM",5),rep("MLP",5),rep("RF",5),rep("SVML",5),rep("SVMR",5)))
Manifold <- as.factor(rep(c("LLE","Iso","PCA","MDS","kPCA"),5))
ROC <- runif(25,0,1)
Sens <- runif(25,0,1)
Spec <- runif(25,0,1)
df <- data.frame("Mod"= Mod, "Manifold"= Manifold, "ROC" = ROC, "Sens" = Sens, "Spec" = Spec)
And I am making this graph
resul3 <- ggplot(df, aes(x = Mod, y = ROC, fill= Manifold)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
ylab("ROC & Specificity") +
xlab("Classifiers") +
theme_bw() +
ggtitle("Classifiers' ROC per Feature Extraction Plasma") +
geom_point(aes(y=Spec), color = "black", position=position_dodge(.9)) +
scale_fill_manual(name = "Feature \nExtraction", values = c("#FFEFCA",
"#EDA16A" ,"#C83741", "#6C283D", "#62BF94"))
first graph
And what I want is another legend with the title "Specificity" and a single black point. I don't want the point to be inside the Manifold legend.
Something like this but without the points inside the manifold squares
Changing the geom_point line, adding a scale_color_manual and using the override as seen in @drmariod's answer will result in this plot:
ggplot(df, aes(x = Mod, y = ROC, fill= Manifold)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
ylab("ROC & Specificity") +
xlab("Classifiers") +
theme_bw() +
ggtitle("Classifiers' ROC per Feature Extraction Plasma") +
geom_point(aes(y=Spec, color = "Specificity"), position=position_dodge(.9)) +
scale_fill_manual(name = "Feature \nExtraction", values = c("#FFEFCA",
"#EDA16A" ,"#C83741", "#6C283D", "#62BF94")) +
scale_color_manual(name = NULL, values = c("Specificity" = "black")) +
guides(fill = guide_legend(override.aes = list(shape = NA)))
You can override the shape aesthetic in the fill legend and set it to NA like this:
ggplot(df, aes(x = Mod, y = ROC, fill= Manifold)) +
geom_bar(stat = "identity", position = "dodge", color = "black") +
ylab("ROC & Specificity") +
xlab("Classifiers") +
theme_bw() +
ggtitle("Classifiers' ROC per Feature Extraction Plasma") +
geom_point(aes(y=Spec), color = "black", position=position_dodge(.9)) +
scale_fill_manual(name = "Feature \nExtraction", values = c("#FFEFCA",
"#EDA16A" ,"#C83741", "#6C283D", "#62BF94")) +
guides(fill = guide_legend(override.aes = list(shape = NA)))

adding summary statistics to two factor boxplot

I would like to add summary statistics (e.g. the mean) to a boxplot that has two factors. I have tried this:
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean, aes(supp,dose), geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green"))
I got this picture:
Is it possible to somehow put the sample size of each box on the plot (e.g. on top of the whiskers)? I have tried this:
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean,aes(supp,dose),geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth,
group_by(dose, supp),
summarize(Count = n(),
q3 = quantile(ToothGrowth, 0.75),
iqr = IQR(ToothGrowth),
aes(x= dose, y = len,label = paste0("n = ",Count, "\n")), position = position_dodge(width = 0.75)))
You can state the aesthetics just once by putting them in the main ggplot call and then they will apply to all of the geom layers: ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp))
For the count of observations: The data summary step in geom_text isn't coded properly. Also, to set len (the y-value) for the text placement, the summarize function needs to output values for len.
To add the mean values in the correct locations on the x-axis, use stat_summary with the exact same aesthetics as the other geoms and stats. I've overridden the color aesthetic by setting the color to yellow so that the point markers will be visible on top of the box plot fill colors.
The code to implement the plot is below:
library(tidyverse)
pd = position_dodge(0.85)
ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp)) +
stat_boxplot(geom = "errorbar", position = pd) +
geom_boxplot(notch=TRUE, notchwidth=0.5, outlier.size=2, position=pd) +
stat_summary(fun=mean, geom="point", shape=3, size=2, colour="yellow", stroke=1.5,
position=pd, show.legend=FALSE) +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n(),
len=max(len) + 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +
theme_bw()
Note that the notch goes outside the hinges for all of the box plots. Also, having the sample size just above the maximum of each boxplot seems distracting and unnecessary to me. You could place all of the text annotations at the bottom of the plot like this:
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n()) %>%
ungroup %>%
mutate(len=min(ToothGrowth$len) - 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +
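It can be easier to debug the label placement by building the summary table on its own before handing it to geom_text. A minimal sketch, assuming dplyr 1.0 or later for the .groups argument; the object name n_labels is made up for illustration.

```r
library(dplyr)

# One row per dose/supp combination, with the count and a y-position for the label
n_labels <- ToothGrowth %>%
  group_by(dose, supp) %>%
  summarize(Count = n(),
            len = max(len) + 0.05 * diff(range(ToothGrowth$len)),
            .groups = "drop")

# Inspect the table that geom_text(data = n_labels, ...) would receive
n_labels
```

The table has the same column names (dose, supp, len) as the aesthetics mapped in the main ggplot call, which is what lets geom_text inherit x, y, color and fill and only add the label.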

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt and ggplot2. I keep getting the same aes error and other solutions in the forum did not address this as effectively. This is my sample data.
df <- data_frame(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("+%d", as.integer((Democrats-Republicans)*100)))
I wanted to keep the order of the plot, so I converted policy to a factor, and I wanted % to be shown only on the first line.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
Then I used ggplot that rendered something close to what I wanted.
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted the dumbbell to read Democrat and Republican on top to label the two points (like this). This is where I get the error.
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?
I think it would be easier to build your own "dumbbells" with geom_segment() and geom_point(). Working with your df and changing the variable references from "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift the text up so it does not overlap the points
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # let scales::percent format the labels instead of pasting "%" by hand
Produces this plot:
Which has some overlapping labels. I think you could avoid that if you use just the first letter of party like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
Only label the top issue with names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
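The reshaping step with gather() still works, but gather() is superseded in tidyr. As a sketch of the same reshape with pivot_longer(), assuming tidyr 1.0 or later and a plain data.frame copy of the sample data without the diff column:

```r
library(tidyr)

# Sample data from the question (diff column omitted for brevity)
df <- data.frame(
  policy = c("Not enough restrictions on gun ownership",
             "Climate change is an immediate threat",
             "Abortion should be illegal"),
  Democrats = c(0.54, 0.82, 0.30),
  Republicans = c(0.23, 0.38, 0.40)
)

# One row per policy/party pair, equivalent in content to the gather() call
df2 <- pivot_longer(df, cols = c(Democrats, Republicans),
                    names_to = "party", values_to = "value")

df2
```

The resulting df2 has the policy, party and value columns that the ggplot calls above expect, although the row order differs from the gather() result.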

Coloring the vertical lines by Class in ggplot

I want to plot the distribution of a variable by Class and add vertical lines denoting the means of the subsets defined by each Class, colored by Class. While I succeed in coloring the distributions by Class, the vertical lines appear gray. For a reproducible example see below:
library(data.table)
library(ggplot2)
library(ggthemes)
data(mtcars)
setDT(mtcars)
mtcars[, am := factor(am, levels = c(1, 0))]
mean_data <- mtcars[, .(mu = mean(hp)), by = am]
ggplot(mtcars, aes(x = hp, fill = am , color = am)) +
geom_histogram(aes(y=..density..), position="identity",alpha = 0.4) + guides(color = FALSE) +
geom_density (alpha = 0.5)+
geom_vline(data = mean_data, xintercept = mean_data$mu, aes(color = as.factor(mean_data$am)), size = 2, alpha = 0.5) +
ggtitle("Hp by am") + scale_fill_discrete(labels=c("am" , "no am")) +
labs(fill = "Transmission") + theme_economist()
This code renders the following plot:
Your advice will be appreciated.
You need to include the xintercept mapping in your aes call, so that ggplot properly maps all the aesthetics:
ggplot(mtcars, aes(x = hp, fill = am , color = am)) +
geom_histogram(aes(y=..density..), position="identity",alpha = 0.4) + guides(color = FALSE) +
geom_density (alpha = 0.5)+
geom_vline(data = mean_data, aes(xintercept = mu, color = as.factor(am)), size = 2, alpha = 0.5) +
ggtitle("Hp by am") + scale_fill_discrete(labels=c("am" , "no am")) +
labs(fill = "Transmission") + theme_economist()
Anything you put in a geom call that's not in aes gets treated as a one-off value, and doesn't get all the mapped aesthetics applied to it.
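The difference between mapping and setting an aesthetic can be inspected directly with layer_data(). This is a small sketch on the unmodified datasets::mtcars, assuming a recent ggplot2; the object names means, mapped and fixed are made up for illustration.

```r
library(ggplot2)

cars <- datasets::mtcars

# Mean horsepower per transmission type
means <- data.frame(
  am = factor(c(0, 1)),
  mu = c(mean(cars$hp[cars$am == 0]), mean(cars$hp[cars$am == 1]))
)

base <- ggplot(cars, aes(x = hp)) + geom_histogram(bins = 15)

# Mapped inside aes(): colour goes through the colour scale, one colour per level
mapped <- base + geom_vline(data = means, aes(xintercept = mu, colour = am))

# Set outside aes(): a constant applied to every line, with no scale or legend
fixed <- base + geom_vline(data = means, aes(xintercept = mu), colour = "grey40")

# Layer 2 is the geom_vline layer; compare the colours it will draw with
unique(layer_data(mapped, 2)$colour)
unique(layer_data(fixed, 2)$colour)
```

In the mapped version each level of am gets its own colour from the scale, while in the fixed version both lines share the single constant that was supplied.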

Resources