R ggplot2: Add means as horizontal line in a boxplot

I have created a boxplot using ggplot2:
library(ggplot2)
# 50 random values for each of the two studies
dat <- data.frame(study = c(rep('a', 50), rep('b', 50)),
                  FPKM = c(rnorm(50), rnorm(50)))
ggplot(dat, aes(x = study, y = FPKM)) + geom_boxplot()
The boxplot shows the median as a horizontal line across each box.
How do I add a dashed line to the box representing the mean of that group?
Thanks!

You can add horizontal lines to a plot by using stat_summary with geom_errorbar. The line is horizontal because the y minimum and maximum are both set to the same value as y, the group mean.
# `fun` and after_stat() (ggplot2 >= 3.3.0) replace the deprecated
# `fun.y` argument and ..y.. notation
ggplot(dat, aes(x = study, y = FPKM)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "errorbar",
               aes(ymax = after_stat(y), ymin = after_stat(y)),
               width = .75, linetype = "dashed")
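To check that the dashed line really sits at the group mean, you can compare the values stat_summary computes with means calculated by hand. This is only a sketch: it assumes ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments), and set.seed() and the names mean_layer and group_means are my own additions, not part of the question.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the example is reproducible
dat <- data.frame(study = rep(c("a", "b"), each = 50),
                  FPKM = rnorm(100))

p <- ggplot(dat, aes(x = study, y = FPKM)) +
  geom_boxplot() +
  # ymin and ymax are both the mean, so the error bar collapses to one line
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", width = 0.75, linetype = "dashed")

# Data computed for the second layer (the mean lines), one row per group
mean_layer <- layer_data(p, 2)

# Group means computed directly from the data
group_means <- tapply(dat$FPKM, dat$study, mean)
```

The ymin and ymax columns of mean_layer should match group_means, which is why the error bar is drawn as a single horizontal line.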

Related

How to position or align mean lines in a grouped boxplot in ggplot2?

I'm working with a nested boxplot comparing observed landtypes and simulated land types. I want to show the means of each data set as a line (not a point) within the box of each boxplot.
To remove the median lines and draw the means as an "errorbar", I used this code:
No.Forest.Plot <- ggplot(data = NoForest_Long,
                         aes(x = value, y = variable, fill = ObsSim)) +
  labs(x = "Percent of Cover", y = "Land Type Categories") +
  ggtitle("Observed vs. Simulated Land Categories (No Forest)") +
  geom_boxplot(fatten = NULL) +
  stat_summary(fun = mean, fun.min = mean,
               fun.max = mean, geom = "errorbar", width = 0.5)
Which produces this plot. Unfortunately, the lines are centered on each category as opposed to the boxes themselves.
How do I get the means of both the "observed" and "simulated" to be centered in their respective boxes?
This can be achieved by setting the position of the error bars. Because geom_boxplot dodges its boxes over a width of .75 by default, you have to use position_dodge(.75) for the error bars as well.
Using mtcars as example data:
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = factor(cyl), fill = factor(am))) +
  geom_boxplot(fatten = NULL) +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", position = position_dodge(.75), width = .7)
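If you want to see where the dodging puts things, you can pull the computed positions out of both layers and compare them. A rough sketch, assuming ggplot2 3.3.0 or later; I switched to a vertical layout so the dodging happens along x, and the names boxes, bars, box_centres and bar_centres are just for illustration.

```r
library(ggplot2)

# Vertical version of the mtcars example, so the dodging happens along x
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(am))) +
  geom_boxplot() +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "errorbar", position = position_dodge(0.75), width = 0.7)

boxes <- layer_data(p, 1)  # one row per box
bars  <- layer_data(p, 2)  # one row per mean line

# Horizontal centres of the boxes and of the mean lines, side by side
box_centres <- (boxes$xmin + boxes$xmax) / 2
bar_centres <- (bars$xmin + bars$xmax) / 2
print(data.frame(box_centres, bar_centres))
```

With matching dodge widths the two columns should line up; changing position_dodge(0.75) to another value makes the mismatch visible in the printed table.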

R plot errorbars with outliers

I'm trying to get the same aesthetic as below, where the error bars look the same and outliers are shown. geom_errorbar and stat_summary are somewhat similar, but don't provide outliers. geom_boxplot provides outliers, but the box takes up too much space and I would prefer the slimmed-down appearance below. Does anyone know how to achieve this, with ggplot or without?
We can set the width of the boxplot to 0, then use stat_boxplot and stat_summary to produce the rest of the plot in the picture you added:
library(ggplot2)
# `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
p1 <- ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(width = 0,
               outlier.colour = "red") +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  stat_summary(fun = mean, geom = "point", size = 2) +
  stat_summary(fun = mean, geom = "line", aes(group = 1)) +
  theme_bw()
p1
Created on 2018-03-18 by the reprex package (v0.2.0).

Plot multiple group histogram with overlaid line ggplot

I'm trying to plot a multiple group histogram with overlaid line, but I cannot get the right scaling for the histogram.
For example:
ggplot() +
  geom_histogram(data = df8, aes(x = log(Y), y = ..density..),
                 binwidth = 0.15, colour = 'black') +
  geom_line(data = as.data.frame(pdf8), aes(y = pdf8$f, x = pdf8$x),
            col = "black", size = 1) +
  theme_bw()
This produces the right scale. But when I fill by group, each group is scaled separately.
ggplot() +
  geom_histogram(data = df8, aes(x = log(Y), fill = vec8, y = ..density..),
                 binwidth = 0.15, colour = 'black') +
  geom_line(data = as.data.frame(pdf8), aes(y = pdf8$f, x = pdf8$x),
            col = "black", size = 1) +
  theme_bw()
How would I scale it so that a black line is overlaid over the histogram and on the y axis is density?
It is going to be difficult for others to help you without a reproducible example, but perhaps something like this is what you're after:
library(ggplot2)
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(data = mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_line(stat = "density")
If you would rather the density line pertain to the entire dataset, you need to move the fill aesthetic into the geom_histogram function:
ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density), fill = factor(cyl))) +
  geom_line(data = mtcars, stat = "density")
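If the goal is a histogram that is filled by group but scaled as a density over the whole dataset, so that a single overall density curve matches it, one possible approach is to compute the bar heights from the counts yourself. This is a sketch rather than part of the answer above: it assumes ggplot2 3.3.0 or later for after_stat(), and binwidth = 2 is an arbitrary choice for mtcars.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  # count / (total count * bin width) rescales every bar against the whole
  # dataset, so the stacked bars together have a total area of 1
  geom_histogram(aes(y = after_stat(count / (sum(count) * width)),
                     fill = factor(cyl)),
                 binwidth = 2, colour = "black") +
  # one density curve for all observations, ignoring the groups
  geom_line(stat = "density") +
  labs(y = "density") +
  theme_bw()

# Total area of the stacked bars
bars <- layer_data(p, 1)
total_area <- sum((bars$ymax - bars$ymin) * (bars$xmax - bars$xmin))
```

Here sum(count) runs over all groups in the layer, which is what keeps the groups on a common scale instead of normalising each one separately.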

Dodging boxplots and error bars with ggplot2

library(ggplot2)
library(Hmisc)
data(mtcars)
myplot <- ggplot(mtcars, aes(x = as.factor(cyl), y = qsec)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 5, size = 2) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar",
               width = 0.2)
produces
I'd like to dodge the mean and error bars a bit to the right, so that the error bars don't obscure the IQR line of the boxplot. Specifying position=position_dodge(.5) doesn't seem to work, because geom_errorbar doesn't know about geom_boxplot.
You can introduce a new variable which you use as the x offset for your errorbars:
library(ggplot2)
library(Hmisc)
data(mtcars)
# Numeric position of each cyl level, shifted 0.5 units to the right
mtcars$cyl.n <- as.numeric(as.factor(mtcars$cyl)) + .5
# `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
(myplot <- ggplot(mtcars, aes(x = as.factor(cyl), y = qsec)) +
  geom_boxplot() +
  stat_summary(aes(x = cyl.n), fun = mean, geom = "point", shape = 5, size = 2) +
  stat_summary(aes(x = cyl.n), fun.data = mean_cl_normal, geom = "errorbar",
               width = 0.2))
The as.numeric(as.factor(.)) makes sure that the new error bar is spaced at the same position as the boxplots but shifted by 0.5 units.
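Another option, not used in the answer above, is position_nudge(), which shifts a layer by a fixed amount without adding a helper column. A minimal sketch, assuming mean_se() (shipped with ggplot2) as a stand-in for Hmisc's mean_cl_normal so that the example needs no extra package; the offset of 0.3 and the names shift and nudged_x are arbitrary.

```r
library(ggplot2)

shift <- position_nudge(x = 0.3)  # assumption: 0.3 units to the right

p <- ggplot(mtcars, aes(x = as.factor(cyl), y = qsec)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 5, size = 2,
               position = shift) +
  # mean_se() gives mean +/- one standard error, not a confidence interval
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2,
               position = shift)

# x positions of the nudged error bars (third layer)
nudged_x <- as.numeric(layer_data(p, 3)$x)
```

The boxes stay at x = 1, 2, 3 while the points and error bars move to the right by the nudge amount, so the original data frame is left untouched.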

How to annotate geom_bar above bars?

I'm trying to do a simple plot using ggplot2:
library(ggplot2)
ggplot(diamonds, aes(x = cut, y = depth)) +
  geom_bar(stat = "identity", color = "blue") +
  facet_wrap(~ color) +
  geom_text(aes(x = cut, y = depth, label = cut, vjust = 0))
How can I annotate this plot so that I get annotations above bars? Now geom_text puts labels at the bottom of the bars, but I want them above these bars.
You can use stat_summary() to calculate the y position of the labels as the sum of depth, and use geom = "text" to add them. The sum is used because your bars show the sum of the depth values for each cut value.
As suggested by @joran, it is better to use stat_summary() instead of geom_bar() to show sums of y values: stat = "identity" causes problems because the bars are overplotted, and if there are negative values, a bar will start in the negative part of the plot and end in the positive part, so the result will not be the actual sum of the values.
ggplot(diamonds[1:100, ], aes(x = cut, y = depth)) +
  facet_wrap(~ color) +
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0);
  # the bar layer needs no label or vjust aesthetic
  stat_summary(fun = sum, geom = "bar", fill = "blue") +
  stat_summary(fun = sum, geom = "text", aes(label = cut), vjust = 0)
You can also precalculate the sums of the depth values, and then use geom_bar() with stat = "identity" and geom_text().
library(plyr)
diamonds2 <- ddply(diamonds, .(cut, color), summarise, depth = sum(depth))
ggplot(diamonds2, aes(x = cut, y = depth)) +
  geom_bar(stat = "identity", fill = "blue") +
  geom_text(aes(label = cut), vjust = 0, angle = 45, hjust = 0) +
  facet_wrap(~ color)
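If you would rather not load plyr, the same precalculation can be sketched with base R's aggregate(). This is an alternative to the ddply() call, not what the answer itself uses; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), and the vjust and size values are arbitrary choices.

```r
library(ggplot2)

# Sum of depth for every cut/color combination, using base R only
diamonds2 <- aggregate(depth ~ cut + color, data = diamonds, FUN = sum)

p <- ggplot(diamonds2, aes(x = cut, y = depth)) +
  geom_col(fill = "blue") +
  # a negative vjust lifts each label slightly above the top of its bar
  geom_text(aes(label = cut), vjust = -0.3, size = 3) +
  facet_wrap(~ color)
```

Because each cut/color combination now has exactly one row, every bar is drawn once and the label's y value is the top of that bar.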
