Add mean to grouped box plots in R using ggplot2

I have three cultures of algae (A, B, C) at two temperatures (27C and 31C) and their densities. I want to make a box plot with Temperature on the x axis, Density on the y axis, and the three cultures side by side at each temperature (see picture below). I also need to include a dot with the mean density per culture and temperature. My script plots only one mean above each temperature, but what I need is a mean for each combination: A#27C, B#27C, C#27C, A#31C, B#31C and C#31C. I tried to adapt scripts from similar questions but couldn't get it to work. Any help would be much appreciated.
graph<-ggplot(Algae, aes(x = Temperature,
y = Density,
fill=Culture))+
geom_boxplot()+
stat_summary(fun=mean,
geom="point",
shape=20,
size=2,
color="red",
fill="red",
position = position_dodge2 (width = 0.5, preserve = "single"))

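Remove fill from stat_summary and adjust the width in position. Setting fill = "red" as a constant replaces the fill = Culture mapping for that layer, so the means are no longer grouped by culture.
Here is an example with the mtcars data set: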
ggplot(mtcars, aes(x = factor(am),
y = mpg,
fill=factor(cyl)))+
geom_boxplot() +
stat_summary(fun=mean,
geom="point",
shape=20,
size=2,
color="red",
position = position_dodge2 (width = 0.7, preserve = "single"))
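
The same idea applied to the question's variables might look like the sketch below. The Algae data frame here is made up purely for illustration (the real data are not shown in the question), and position_dodge(width = 0.75) is swapped in as an assumption because 0.75 is the default box plot width, so the points should line up with the box centres.

```r
library(ggplot2)

set.seed(42)
# Made-up stand-in for the Algae data: 10 densities per culture and temperature
Algae <- data.frame(
  Temperature = factor(rep(c("27C", "31C"), each = 30)),
  Culture     = factor(rep(rep(c("A", "B", "C"), each = 10), times = 2)),
  Density     = runif(60, min = 1, max = 10)
)

graph <- ggplot(Algae, aes(x = Temperature, y = Density, fill = Culture)) +
  geom_boxplot() +
  # No constant fill here, so the layer keeps the grouping by Culture
  stat_summary(fun = mean, geom = "point", shape = 20, size = 2,
               color = "red", position = position_dodge(width = 0.75))

# The summary layer should now hold one mean per culture and temperature
mean_points <- layer_data(graph, 2)
group_means <- aggregate(Density ~ Culture + Temperature, data = Algae, FUN = mean)
```

Printing graph draws the plot; mean_points can be compared with group_means to check that each point is the mean of its own box.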

I fixed it by adding a facet. Here is the script
graph <-ggplot(Algae, aes(x = Temperature,
y = Density,
fill=Culture))+
geom_boxplot()+
stat_summary(fun=mean,
geom="point",
shape=21,
size=2,
color="black",
fill="violet")+
facet_grid(.~Temperature,scales="free")
graph

Related

How to draw mean as a dotted line in boxplot using ggplot?

I was wondering if it's possible to draw a dotted line that corresponds to the mean value of my data in a box plot.
I know it is possible to draw shapes with stat_summary(), for example a + at the mean with stat_summary(fun.y=mean, shape="+", size=1, color = "black"). The nearest thing I have found is geom="crossbar", but that line is not dotted.
The idea is to get this graphed
You could achieve your desired result by setting linetype="dotted":
library(ggplot2)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
stat_summary(geom = "crossbar", fun = "mean", linetype = "dotted", width = .75)

Calculating means with stat_summary for two different groupings and plotting in one plot

I am having issues with plotting two calculated means using stat_summary in the same figure.
I am using ggplot and stat_summary to plot means of a dataset grouped by variable A, which can take the values 1, 2, 3, 4. The same data also have a variable B, which can take the values 1, 2.
So, I can make a plot with means of the data grouped by variable A, and I get 4 lines.
I can also make a plot with means of the data grouped by variable B, where I get 2 lines.
But how can I plot them in the same figure, so that I get 6 lines? I have made a somewhat similar example using the mtcars dataset:
library(ggplot2)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars
plot1 <- ggplot(mtcars, aes(x=gear, y=hp, color=cyl, fill=cyl)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot1
plot2 <- ggplot(mtcars, aes(x=gear, y=hp, color=vs, fill=vs)) +
stat_summary(geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(geom='line', fun.y = mean, size=1)
plot2
So far I have the impression, that since I start with ggplot(xxx), where xxx defines the data and grouping, I can't combine it with another ggplot with another grouping. If I could initiate ggplot() without defining anything in the argument, but only defining data and grouping in the argument for stat_summary, I feel like that would be the solution. But I can't figure out how to use stat_summary like that, if even possible.
You can just add more layers, defining the aes for each separately (mean_cl_normal requires the Hmisc package to be installed):
ggplot(mtcars) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl), fill = paste('cyl:', cyl)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('cyl:', cyl)), geom='line', fun = mean, size=1) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs), fill=paste('vs:', vs)), geom='ribbon', fun.data = mean_cl_normal, fun.args=list(conf.int=0.95), alpha=0.5) +
stat_summary(aes(x=gear, y=hp, color=paste('vs:', vs)), geom='line', fun = mean, size=1)
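
The question also asks whether ggplot() can be started empty, with data and grouping given only to stat_summary. A minimal sketch of that variant, showing only the mean lines so that Hmisc is not needed, could be the following; the legend title "Grouping" is an arbitrary choice.

```r
library(ggplot2)

# Empty ggplot(): each layer brings its own data and aesthetics
p <- ggplot() +
  stat_summary(data = mtcars,
               aes(x = gear, y = hp, color = paste("cyl:", cyl)),
               geom = "line", fun = mean) +
  stat_summary(data = mtcars,
               aes(x = gear, y = hp, color = paste("vs:", vs)),
               geom = "line", fun = mean) +
  labs(color = "Grouping")

# Count the lines drawn by each layer (one per group)
n_cyl_lines <- length(unique(layer_data(p, 1)$group))
n_vs_lines  <- length(unique(layer_data(p, 2)$group))
```

With mtcars this gives three lines for cyl and two for vs in one shared legend; with the asker's variables A and B it would give the six lines described.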

how to plot probability histogram in ggplot2

I want to plot a probability histogram overlaid with a probability density curve and compare them between two groups.
My code is as follows:
ggplot(MDmedianall, aes(x= MD_median, y=..density.., fill =IDH.type )) +
geom_histogram(alpha = 0.5,binwidth = 0.00010, position = 'identity') +
geom_density( stat="density", position="identity", alpha=0.3 ) +
scale_fill_discrete(breaks=c("0","1"), labels=c("IDH wild type","IDH mutant type")) +
scale_y_continuous(labels = scales :: percent) +
ylab("Relative cumulative frequency(%)") +
xlab("MD median value")
However, the y axis is not what I want. Is there a reason for that?
Also, how can I change the line style and label the groups in the legend key on the right?
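
One likely reason for the unexpected y axis is that density is not a proportion: it is scaled so that the bars have a total area of 1, so with a bin width of 0.0001 the bar heights come out in the thousands. Multiplying the density by the bin width turns it back into the share of observations per bin. The sketch below uses made-up data in place of MDmedianall, and the name bw is introduced only for illustration.

```r
library(ggplot2)

set.seed(1)
# Made-up stand-in for MDmedianall
md <- data.frame(
  MD_median = c(rnorm(200, mean = 0.0010, sd = 0.0002),
                rnorm(200, mean = 0.0013, sd = 0.0002)),
  IDH.type  = factor(rep(c("0", "1"), each = 200))
)
bw <- 0.0001  # bin width, reused to rescale both layers

p <- ggplot(md, aes(x = MD_median, fill = IDH.type)) +
  # density * bin width = proportion of the group's observations in each bin
  geom_histogram(aes(y = after_stat(density * bw)),
                 binwidth = bw, alpha = 0.5, position = "identity") +
  # Rescale the smoothed curve the same way so it overlays the bars
  geom_density(aes(y = after_stat(density * bw)), alpha = 0.3) +
  scale_fill_discrete(breaks = c("0", "1"),
                      labels = c("IDH wild type", "IDH mutant type")) +
  scale_y_continuous(labels = scales::percent)

# Bar heights within each group should add up to 1 (100%)
hist_data    <- layer_data(p, 1)
group_totals <- tapply(hist_data$y, hist_data$group, sum)
```

Note that these are per-bin relative frequencies, not cumulative ones, so the y-axis label in the original code would need adjusting to match.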

ggplot2: Shift the baseline of barplot (geom_bar) to the minimum data value

I'm trying to generate a bar plot using geom_bar. My bars have both negative and positive values:
set.seed(1)
df <- data.frame(y=log(c(runif(6,0,1),runif(6,1,10))),se=runif(12,0.05,0.1),name=factor(rep(c("a","a","b","b","c","c"),2),levels=c("a","b","c")),side=factor(rep(1:2,6),levels=1:2),group=factor(c(rep("x",6),rep("y",6)),levels=c("x","y")),stringsAsFactors=F)
This plot command plots the positive bars to face up and the negative ones to face down:
library(ggplot2)
dodge <- position_dodge(width=0.9)
limits <- aes(ymax=y+se,ymin=y-se)
ggplot(df,aes(x=name,y=y,group=interaction(side,name),col=group,fill=group))+facet_wrap(~group)+geom_bar(width=0.6,position=position_dodge(width=1),stat="identity")+
geom_bar(position=dodge,stat="identity")+geom_errorbar(limits,position=dodge,width=0.25)
My question is: how do I set the baseline to the minimum of all bars instead of 0, and therefore have the red bars facing up?
You can subtract min(df$y) from each value so that the data are shifted to a baseline of zero, but then relabel the y-axis to the actual values of the points. The code to do it is below, but I wouldn't recommend this. It seems confusing to have bars emanating from a non-zero baseline, as the lengths of the bars no longer encode the magnitudes of the y values.
ggplot(df, aes(x=name,y=y - min(y),group=interaction(side, name), col=group, fill=group)) +
facet_wrap(~group) +
geom_bar(position=dodge, stat="identity", width=0.8) +
geom_errorbar(aes(ymin=y-se-min(y), ymax=y+se-min(y)),
position=dodge, width=0.25, colour="black") +
scale_y_continuous(breaks=0:4, labels=round(0:4 + min(df$y), 1)) +
geom_hline(aes(yintercept=0))
Another option is to use geom_linerange which avoids having to shift the y-values and relabel the y-axis. But this suffers from the same distortions as the bar plot above:
ggplot(df, aes(x=name, group=interaction(side, name), col=group, fill=group)) +
facet_wrap(~group) +
geom_linerange(aes(ymin=min(y), ymax=y, x=name, xend=name), position=dodge, size=10) +
geom_errorbar(aes(ymin=y-se, ymax=y+se), position=dodge, width=0.25, colour="black") +
geom_hline(aes(yintercept=min(y)))
Instead, it seems to me points would be more intuitive and natural than bars here:
ggplot(df, aes(x=name,y=y,group=interaction(side, name), col=group, fill=group)) +
facet_wrap(~group) +
geom_hline(yintercept=0, lwd=0.4, colour="grey50") +
geom_errorbar(limits, position=dodge, width=0.25) +
geom_point(position=dodge)
This simple hack also works:
m <- min(df$y) # find min
df$y <- df$y - m
ggplot(df,aes(x=name,y=y,group=interaction(side,name),col=group,fill=group))+
facet_wrap(~group)+
geom_bar(width=0.6,position=position_dodge(width=1),stat="identity")+
geom_bar(position=dodge,stat="identity")+
geom_errorbar(limits,position=dodge,width=0.25) +
scale_y_continuous(breaks=seq(min(df$y), max(df$y), length=5),labels=as.character(round(seq(m, max(df$y+m), length=5),2))) # relabel
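
A variant of this shift-and-relabel idea passes a labelling function to scale_y_continuous, so the breaks do not have to be computed by hand. This is only a sketch on a small made-up data frame; shift_back is a hypothetical helper name.

```r
library(ggplot2)

# Small made-up data with negative and positive values
df_small <- data.frame(name = factor(c("a", "b", "c")),
                       y    = log(c(0.3, 0.8, 5)))
m <- min(df_small$y)

# Hypothetical helper: turn shifted axis positions back into original values
shift_back <- function(breaks) round(breaks + m, 2)

p <- ggplot(df_small, aes(x = name, y = y - m)) +
  geom_col() +                              # bars now start at the minimum
  scale_y_continuous(labels = shift_back)   # axis still shows original values

bar_heights <- layer_data(p, 1)$y
```

The same caveat applies as for the other bar-based versions: the bar lengths no longer encode the magnitude of y.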
I ran into the same problem and discovered you can also easily do this using geom_crossbar.
As long as color and fill are the same you don't see the break in the crossbar (set with y aesthetic) so they look exactly like bars.
library(ggplot2)
dodge <- position_dodge(width=0.9)
limits <- aes(ymax = y+se, ymin = y-se)
df$ymin <- min(df$y)
ggplot(df, aes(x = name, ymax = y, y = y, ymin = ymin, group = interaction(side,name), col = group, fill = group)) +
facet_wrap(~group) +
geom_crossbar(width=0.6,position=position_dodge(width=1),stat="identity") +
geom_errorbar(limits, color = 'black', position = dodge, width=0.25)

How can I force ggplot's geom_tile to fill every facet?

I am using ggplot's geom_tile to do 2-D density plots faceted by a factor. Every facet's scale goes from the minimum of all the data to the maximum of all the data, but the geom_tile in each facet only extends to the range of the data plotted in that facet.
Example code that demonstrates the problem:
library(ggplot2)
data.unlimited <- data.frame(x=rnorm(500), y=rnorm(500))
data.limited <- subset(data.frame(x=rnorm(500), y=rnorm(500)), x<1 & y<1 & x>-1 & y>-1)
mydata <- rbind(data.frame(groupvar="unlimited", data.unlimited),
data.frame(groupvar="limited", data.limited))
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar)
Run the code, and you will see two facets. One facet shows a density plot of an "unlimited" random normal distribution. The second facet shows a random normal truncated to lie within a 2x2 square about the origin. The geom_tile in the "limited" facet will be confined inside this small box instead of filling the facet.
last_plot() +
scale_x_continuous(limits=c(-5,5)) +
scale_y_continuous(limits=c(-5,5))
These last three lines plot the same data with specified x and y limits, and we see that neither facet extends the tile sections to the edge in this case.
Is there any way to force the geom_tile in each facet to extend to the full range of the facet?
I think you're looking for a combination of scales = "free" and expand = c(0,0):
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar,scales = "free") +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0))
EDIT
Given the OP's clarification, here's one option via simply setting the panel background manually:
ggplot(mydata) +
aes(x=x,y=y) +
stat_density2d(geom="tile", aes(fill = ..density..), contour = FALSE) +
facet_wrap(~ groupvar) +
scale_fill_gradient(low = "blue", high = "red") +
theme(panel.background = element_rect(fill = "blue"), panel.grid.major = element_blank(),
panel.grid.minor = element_blank())

Resources