Aligning subsetted data points with ggplot2 - r

I'm trying to build a complex figure that overlays individual data points on a boxplot to display both summary statistics and the dispersion of the raw data. I have two questions, in order of importance:
How do I center the jittered points around the middle of their respective box plot?
How can I remove the dark dots from the "drv" legend?
Code:
library(ggplot2)
library(dplyr)
mpg$cyl <- as.factor(mpg$cyl)
mpg %>% filter(fl=="p" | fl=="r" & cyl!="5") %>% sample_n(100) %>% ggplot(aes(cyl, hwy, fill=drv)) +
stat_boxplot(geom = "errorbar", width=0.5, position = position_dodge(1)) +
geom_boxplot(position = position_dodge(1), outlier.shape = NA)+
geom_point(aes(fill=drv, shape=fl), color="black", show.legend=TRUE, alpha=0.5, size=3, position = position_jitterdodge(dodge.width = 1)) +
scale_shape_manual(values = c(21,23))

It looks like the current dodging for geom_point is based on both fill and shape. Use group to indicate you only want to dodge on drv.
You can use override.aes in guide_legend to remove the points from the fill legend.
mpg %>%
filter((fl=="p" | fl=="r") & cyl!="5") %>% # parentheses: & binds tighter than |
sample_n(100) %>%
ggplot(aes(cyl, hwy, fill=drv)) +
stat_boxplot(geom = "errorbar", width=0.5, position = position_dodge(1)) +
geom_boxplot(position = position_dodge(1), outlier.shape = NA)+
geom_point(aes(fill = drv, shape = fl, group = drv), color="black",
alpha =0.5, size=3,
position = position_jitterdodge(jitter.width = .1, dodge.width = 1)) +
scale_shape_manual (values = c(21,23) ) +
guides(fill = guide_legend(override.aes = list(shape = NA) ) )
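To see why the explicit group helps, it can be useful to count the groups ggplot2 computes for the point layer. The following is a minimal sketch on the full mpg data (not the filtered sample above); the object names are only illustrative. Without an explicit group, grouping falls back to the interaction of every discrete variable in the layer, so shape splits each drv cluster further.

```r
library(ggplot2)

base <- ggplot(mpg, aes(factor(cyl), hwy, fill = drv))

# Default grouping: interaction of all discrete variables (cyl, drv, fl)
p_default <- base + geom_point(aes(shape = fl))

# Explicit grouping: only drv defines the groups that get dodged
p_grouped <- base + geom_point(aes(shape = fl, group = drv))

n_default <- length(unique(layer_data(p_default)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))

n_default  # one group per cyl/drv/fl combination present in the data
n_grouped  # one group per drv level
```

Because position_jitterdodge() dodges by group, the smaller group count is what keeps the points lined up with the drv boxes.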

Related

How to add a legend manually for line chart

I need a legend entry for the plan line.
How can I add a legend manually for geom_line?
ggplot(data = impact_end_Current_yr_m_actual, aes(x = month, y = gender_value)) +
geom_col(aes(fill = gender))+theme_classic()+
geom_line(data = impact_end_Current_yr_m_plan, aes(x=month, y= gender_value, group=1),color="#288D55",size=1.2)+
geom_point(data = impact_end_Current_yr_m_plan, aes(x=month, y=gender_value))+
theme(axis.line.y = element_blank(),axis.ticks = element_blank(),legend.position = "bottom", axis.text.x = element_text(face = "bold", color = "black", size = 10, angle = 0, hjust = 1))+
labs(x="", y="End Beneficiaries (in Num)", fill="")+
scale_fill_manual(values=c("#284a8d", "#00B5CE","#0590eb","#2746c2"))+
scale_y_continuous(labels = function(x) format(x, scientific = FALSE))
I think the neatest way is to add colour = "[label]" inside the aes() of geom_line() and then assign the actual colour manually in scale_colour_manual(). Here's an example with mtcars (apologies that it uses stat_summary instead of geom_line, but the same trick applies):
library(tidyverse)
mtcars %>%
ggplot(aes(gear, mpg, fill = factor(cyl))) +
stat_summary(geom = "bar", fun = mean, position = "dodge") +
stat_summary(geom = "line",
fun = mean,
size = 3,
aes(colour = "Overall mean", group = 1)) +
scale_fill_discrete("") +
scale_colour_manual("", values = "black")
Created on 2020-12-08 by the reprex package (v0.3.0)
The limitation here is that the colour and fill legends are necessarily separate. Giving both scale_ calls a blank title at least means they are not visibly split up by legend titles.
In your code you would probably want then:
...
ggplot(data = impact_end_Current_yr_m_actual, aes(x = month, y = gender_value)) +
geom_col(aes(fill = gender))+
geom_line(data = impact_end_Current_yr_m_plan,
aes(x=month, y= gender_value, group=1, color="Plan"),
size=1.2)+
scale_color_manual(values = "#288D55") +
...
(but I can't test it on your data, so I'm not sure whether it works)

Ggplot - How to add 2 different error bars?

I'm trying to make a bar plot with two different confidence intervals: one for the proportion of females in a sample and one for the proportion of males. Each category should of course have its own confidence interval, so how can I make this graph with only 2 error bars rather than 4?
ggplot(data, aes(x= GENDER)) +
geom_bar(aes(y = (..count..)/sum(..count..)), stat="count", fill=c("deeppink","deepskyblue"), alpha=0.7) +
scale_y_continuous("Percent",labels = scales::percent)+
geom_text(aes(label = scales::percent((..count..)/sum(..count..)),
y= ((..count..)/sum(..count..))), stat="count",vjust = 5) +
geom_errorbar (aes(ymin = ymin1, ymax =ymax1), width=0.4, colour = "red", alpha =0.9, size= 1.3)+
geom_errorbar (aes(ymin = ymin2, ymax =ymax2), width=0.4, colour = "red", alpha =0.9, size= 1.3)
Thanks ahead!
You only need to call geom_errorbar once, where ymin and ymax are vectors. The way you've coded it, ggplot is plotting both error bars in both positions because it is expecting a vector of positions for ymin and ymax equal to the number of bars in your plot. E.g.
ymin = c(ymin1,ymin2)
ymax = c(ymax1,ymax2)
ggplot(data, aes(x= GENDER)) +
geom_bar(aes(y = (..count..)/sum(..count..)), stat="count", fill=c("deeppink","deepskyblue"), alpha=0.7) +
scale_y_continuous("Percent",labels = scales::percent)+
geom_text(aes(label = scales::percent((..count..)/sum(..count..)),
y= ((..count..)/sum(..count..))), stat="count",vjust = 5) +
geom_errorbar (aes(ymin = ymin, ymax =ymax), width=0.4, colour = "red", alpha =0.9, size= 1.3)
You could also manage the data ahead of time with dplyr::summarise() to make a dataset that only has one observation per group that has whatever summary values you want. This makes the plotting code a bit more streamlined. I did this on the Chile data from the carData package
data("Chile", package="carData")
library(dplyr)
library(ggplot2)
Chile %>%
group_by(sex) %>%
dplyr::summarise(n_g= n(),
n = sum(!is.na(Chile$sex)),
prop = n_g/n,
ymin = binom.test(n_g, n)$conf.int[1],
ymax = binom.test(n_g, n)$conf.int[2]) %>%
ggplot(aes(x=sex, y=prop, ymin=ymin, ymax=ymax) )+
geom_bar(stat="identity", alpha=.7, fill=c("deeppink","deepskyblue")) +
geom_errorbar(width=.4, colour="red", alpha=.9, size=1.3) +
geom_text(aes(label = scales::percent(prop)), vjust = 5)
EDIT
changed to use prop.test() as in the OP's comment.
data("Chile", package="carData")
library(dplyr)
library(ggplot2)
data <- Chile
data <- data %>% rename("GENDER" = "sex")
data %>%
group_by(GENDER) %>%
dplyr::summarise(n_g= n(),
n = sum(!is.na(data$GENDER)),
prop = n_g/n,
ymin = prop.test(n_g, n, p=.7, alternative="greater")$conf.int[1],
ymax = prop.test(n_g, n, p=.7, alternative="greater")$conf.int[2]) %>%
ggplot(aes(x=GENDER, y=prop, ymin=ymin, ymax=ymax) )+
geom_bar(stat="identity", alpha=.7, fill=c("deeppink","deepskyblue")) +
geom_errorbar(width=.4, colour="red", alpha=.9, size=1.3) +
geom_text(aes(label = scales::percent(prop)), vjust = 5)
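One thing worth checking in the edited version: with alternative = "greater", prop.test() returns a one-sided interval whose upper limit is always 1, so the upper end of each error bar would sit at 100%. A small sketch with made-up counts (30 out of 50, purely illustrative) compares it with the default two-sided interval:

```r
# Hypothetical counts, for illustration only
x <- 30
n <- 50

one_sided <- prop.test(x, n, p = 0.7, alternative = "greater")$conf.int
two_sided <- prop.test(x, n)$conf.int

one_sided[2]  # upper limit of the one-sided interval
two_sided     # lower and upper limits of the two-sided interval
```

If the goal is an error bar around each bar, the two-sided interval is probably the one to plot.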

How to define width in error bars in ggplot2 (R)?

I have the following data that I'm trying to plot. I'm trying to change the width of the error bars, but I get a message that says: Width not defined. Set with position_dodge(width = ?). I tried using position_dodge, but it didn't help. Any suggestions?
library(ggplot2)
time <- c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
species <- c(1,1,1,2,2,2,1,1,1,2,2,2)
value <- c(1,2,3,11,12,13,4,5,6,11,12,13)
df <- data.frame(time, species,value)
df$time <- as.factor(df$time)
df$species <- as.factor(df$species)
ggplot(df,aes(x=time, y=value, color = species, group = species)) + # Change fill to color
theme_bw() +
geom_point() +
stat_summary(fun.y=mean, position = "dodge") +
stat_summary(
geom="errorbar",
fun.data= mean_cl_boot,
width = 0.1, size = 0.2, col = "grey57") +
# Lines by species using grouping
stat_summary(aes(group = species), geom = "line", fun.y = mean) +
ylab("Fitness")
position_dodge() shifts overlapping elements sideways so that they can all be seen. I'm not sure it is needed in your example; if your data do not overlap, simply removing the position argument may solve the issue. Alternatively, define the dodge once with an explicit width and use the same position in every layer:
pd<-position_dodge(0.5)
ggplot(df,aes(x=time, y=value, color = species, group = species)) + # Change fill to color
theme_bw() +
geom_point(position = pd) +
stat_summary(fun=mean, position = pd) + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
stat_summary(
geom="errorbar",
fun.data= mean_cl_boot,
width = 0.1, size = 0.2, col = "grey57",
position = pd) +
# Lines by species using grouping
stat_summary(aes(group = species), geom = "line", fun = mean, position = pd) +
ylab("Fitness")
Just edited to keep everything from breaking apart.
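Note that mean_cl_boot relies on the Hmisc package being installed. As a hedged sketch without that dependency, the same layout can be drawn with ggplot2's built-in mean_se(), which gives the mean plus or minus one standard error rather than a bootstrap interval. It assumes ggplot2 >= 3.3.0 (fun instead of fun.y) and rebuilds the data frame from the question:

```r
library(ggplot2)

# Same data as in the question
df <- data.frame(
  time    = factor(rep(c("t1", "t2"), each = 6)),
  species = factor(rep(rep(1:2, each = 3), times = 2)),
  value   = c(1, 2, 3, 11, 12, 13, 4, 5, 6, 11, 12, 13)
)

# One dodge object with an explicit width, shared by every layer
pd <- position_dodge(width = 0.5)

p <- ggplot(df, aes(time, value, color = species, group = species)) +
  theme_bw() +
  geom_point(position = pd) +
  # Mean +/- one standard error; width now only controls the cap width
  stat_summary(fun.data = mean_se, geom = "errorbar",
               width = 0.1, position = pd) +
  stat_summary(fun = mean, geom = "line", position = pd) +
  ylab("Fitness")

p
```

Sharing one position object keeps the points, error bars, and lines dodged by the same amount.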

adding summary statistics to two factor boxplot

I would like to add summary statistics (e.g. the mean) to a boxplot with two factors. I have tried this:
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean, aes(supp,dose), geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green"))
I got this picture:
Is it also possible to show the sample size of each box (e.g. at the top of the whiskers)? I have tried this:
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
stat_boxplot(geom = "errorbar", aes(col = supp, fill=supp), position = position_dodge(width = 0.85)) +
geom_boxplot(aes(col = supp, fill=supp), notch=T, notchwidth = 0.5, outlier.size=2, position = position_dodge(width = 0.85)) +
stat_summary(fun.y=mean,aes(supp,dose),geom="point", shape=20, size=7, color="violet", fill="violet") +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth,
group_by(dose, supp),
summarize(Count = n(),
q3 = quantile(ToothGrowth, 0.75),
iqr = IQR(ToothGrowth),
aes(x= dose, y = len,label = paste0("n = ",Count, "\n")), position = position_dodge(width = 0.75)))
You can state the aesthetics just once by putting them in the main ggplot call and then they will apply to all of the geom layers: ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp))
For the count of observations: The data summary step in geom_text isn't coded properly. Also, to set len (the y-value) for the text placement, the summarize function needs to output values for len.
To add the mean values in the correct locations on the x-axis, use stat_summary with the exact same aesthetics as the other geoms and stats. I've overridden the color aesthetic by setting the color to yellow so that the point markers will be visible on top of the box plot fill colors.
The code to implement the plot is below:
library(tidyverse)
pd = position_dodge(0.85)
ggplot(ToothGrowth, aes(x = factor(dose), y = len, color=supp, fill=supp)) +
stat_boxplot(geom = "errorbar", position = pd) +
geom_boxplot(notch=TRUE, notchwidth=0.5, outlier.size=2, position=pd) +
stat_summary(fun=mean, geom="point", shape=3, size=2, colour="yellow", stroke=1.5,
position=pd, show.legend=FALSE) +
scale_color_manual(name = "SUPP", values = c("blue", "darkgreen")) +
scale_fill_manual(name = "SUPP", values = c("lightblue", "green")) +
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n(),
len=max(len) + 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +
theme_bw()
Note that the notch goes outside the hinges for all of the box plots. Also, having the sample size just above the maximum of each boxplot seems distracting and unnecessary to me. You could place all of the text annotations at the bottom of the plot like this:
geom_text(data = ToothGrowth %>% group_by(dose, supp) %>%
summarize(Count = n()) %>%
ungroup %>%
mutate(len=min(ToothGrowth$len) - 0.05 * diff(range(ToothGrowth$len))),
aes(label = paste0("n = ", Count)),
position = pd, size=3, show.legend = FALSE) +
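Since the snippet above is only the geom_text layer, here is a sketch of how it might slot into a complete plot. It assumes ggplot2 >= 3.3.0 (fun instead of fun.y), leaves out the notches and manual colour scales for brevity, and uses dplyr::count() as a shorthand for the group_by/summarize step; n_labels is just an illustrative name.

```r
library(ggplot2)
library(dplyr)

pd <- position_dodge(0.85)

# One row per dose/supp box, all placed at the same height below the data
n_labels <- ToothGrowth %>%
  count(dose, supp, name = "Count") %>%
  mutate(len = min(ToothGrowth$len) - 0.05 * diff(range(ToothGrowth$len)))

p <- ggplot(ToothGrowth,
            aes(x = factor(dose), y = len, color = supp, fill = supp)) +
  geom_boxplot(position = pd, alpha = 0.5) +
  # Mean marker, dodged exactly like the boxes
  stat_summary(fun = mean, geom = "point", shape = 3, size = 2,
               colour = "yellow", stroke = 1.5,
               position = pd, show.legend = FALSE) +
  # Sample-size labels from the summary data frame
  geom_text(data = n_labels, aes(label = paste0("n = ", Count)),
            position = pd, size = 3, show.legend = FALSE) +
  theme_bw()

p
```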

r ggplot2 facet_grid how to add space between the top of the chart and the border

Is there a way to add space between the labels at the top of the bars and the border of the plot when using ggplot's facet_grid? Below is a reproducible example.
library(dplyr)
library(ggplot2)
Titanic %>% as.data.frame() %>%
filter(Survived == "Yes") %>%
mutate(FreqSurvived = ifelse(Freq > 100, Freq*1e+04,Freq)) %>%
ggplot( aes(x = Age, y = FreqSurvived, fill = Sex)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(Class ~ ., scales = "free") +
theme_bw() +
geom_text(aes(label = prettyNum(FreqSurvived,big.mark = ",")), vjust = 0, position = position_dodge(0.9), size = 2)
The resulting chart has the label of numbers right next to the border of the plot.
I wanted to add to @dww's answer, but don't have enough reputation.
The expand option actually will allow you to add space only to the top of your graph. From the ?expand_scale help file:
# No space below the bars but 10% above them
ggplot(mtcars) +
geom_bar(aes(x = factor(cyl))) +
scale_y_continuous(expand = expand_scale(mult = c(0, .1)))
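In ggplot2 3.3.0 and later, expand_scale() is deprecated in favour of expansion(), which takes the same mult and add arguments. A sketch of how this might look on the Titanic example from the question (the 15% figure is an arbitrary choice, and geom_col() is used as shorthand for geom_bar(stat = "identity")):

```r
library(ggplot2)

dt <- as.data.frame(Titanic)
dt <- dt[dt$Survived == "Yes", ]
dt$FreqSurvived <- ifelse(dt$Freq > 100, dt$Freq * 1e4, dt$Freq)

p <- ggplot(dt, aes(x = Age, y = FreqSurvived, fill = Sex)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = prettyNum(FreqSurvived, big.mark = ",")),
            vjust = 0, position = position_dodge(0.9), size = 2) +
  facet_grid(Class ~ ., scales = "free") +
  # No padding below the bars, 15% extra room above them in every facet
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  theme_bw()

p
```

Because the expansion is multiplicative, each free-scaled facet gets headroom proportional to its own range.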
One simple way is to use the expand argument of scale_y_continuous:
dt = Titanic %>% as.data.frame() %>%
filter(Survived == "Yes") %>%
mutate(FreqSurvived = ifelse(Freq > 100, Freq*1e+04,Freq))
ggplot(dt, aes(x = Age, y = FreqSurvived, fill = Sex)) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(Class ~ ., scales = "free") +
theme_bw() +
geom_text(aes(label = prettyNum(FreqSurvived,big.mark = ",")),
vjust = 0, position = position_dodge(0.9), size = 2) +
scale_y_continuous(expand = c(0.1,0))
The downside of using expand this way is that it adds space both above and below the bars. An alternative is to plot some invisible data on the graph at a height above the bars, which forces ggplot to expand the axis ranges to accommodate this dummy data. Here I add some invisible bars whose height is 1.2 times that of the actual bars:
Titanic %>% as.data.frame() %>%
filter(Survived == "Yes") %>%
mutate(FreqSurvived = ifelse(Freq > 100, Freq*1e+04,Freq)) %>%
ggplot( aes(x = Age, y = FreqSurvived, fill = Sex)) +
geom_bar(aes(y = FreqSurvived*1.2), stat = "identity",
position = "dodge", fill=NA) +
geom_bar(stat = "identity", position = "dodge") +
facet_grid(Class ~ ., scales = "free") +
theme_bw() +
geom_text(aes(label = prettyNum(FreqSurvived,big.mark = ",")),
vjust = 0,
position = position_dodge(0.9), size = 2)

Resources