How to add the mean line to a grouped density plot in R?

I am trying to draw a grouped density plot and add the mean line of each plot; here is the code
data <- data.frame(
Accuracy=abs(rnorm(140)),
Species=c(rep("A.All",20),rep("B. double",60),rep("C.single",60),
rep("D.All",20),rep("E.double",60),rep("F.single",60)),
Modality=c(rep("All,w0",10),rep("double1,w0",10),rep("double2,w0",10),rep("double3,w0",10),
rep("single1,w0",10),rep("single2,w0",10),rep("single3,w0",10),
rep("All,w2",10),rep("double1,w2",10),rep("double2,w2",10),rep("double3,w2",10),
rep("single1,w2",10),rep("single2,w2",10),rep("single3,w2",10))
)
p<-ggplot(data, aes(x=Accuracy, fill=Modality)) +
geom_density(alpha=0.4)+
facet_wrap(. ~ Species) +
xlab("Accuracy") + ylab("Density")
library(plyr)
mu <- ddply(data, "Modality", summarise, grp.mean=mean(Accuracy))
head(mu)
# Add mean lines
a<-p+geom_vline(data=mu, aes(xintercept=grp.mean, color=Modality),
linetype="dashed")+ xlab("Accuracy") + ylab("Density")
However, the mean lines in the output figure are incorrect. For example, the top-left panel should show two lines, one for each of its two density curves, but instead several lines are drawn and the same set is repeated in every panel.

Compute the means per combination of Species and Modality, so that each facet only gets the lines for its own groups. Two ways to do it:
plyr
dummy <- ddply(data, c("Species","Modality"), summarise, grp.mean=mean(Accuracy))
ggplot(data, aes(x=Accuracy, fill=Modality)) +
geom_density(alpha=0.4)+
facet_wrap(. ~ Species) +
xlab("Accuracy") + ylab("Density") +
geom_vline(data = dummy, aes(xintercept = grp.mean, color = Modality))
dplyr
library(dplyr)
dummy <- data %>%
group_by(Species, Modality) %>%
summarize(mean = mean(Accuracy))
ggplot(data, aes(x=Accuracy, fill=Modality)) +
geom_density(alpha=0.4)+
facet_wrap(. ~ Species) +
xlab("Accuracy") + ylab("Density") +
geom_vline(data = dummy, aes(xintercept = mean, color = Modality))
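The reason the original lines repeat in every panel can be checked without plotting: a layer whose data lacks the faceting column is drawn in all panels. A minimal base-R sketch, assuming a small made-up data frame (toy and its values are hypothetical, not the question's data), compares the two summaries:

```r
# Toy data: two facets (Species), two fill groups in each (Modality)
toy <- data.frame(
  Accuracy = c(1, 3, 5, 7, 2, 4, 6, 8),
  Species  = rep(c("A", "B"), each = 4),
  Modality = rep(c("m1", "m2", "m3", "m4"), each = 2)
)

# Grouped by Modality only: the result has no Species column, so a
# faceted plot cannot tell which panel each mean belongs to
mu_modality <- aggregate(Accuracy ~ Modality, data = toy, FUN = mean)

# Grouped by both: every row carries its Species, so each mean can be
# matched to exactly one panel
mu_both <- aggregate(Accuracy ~ Species + Modality, data = toy, FUN = mean)

print(names(mu_modality))
print(mu_both)
```

Passing a summary shaped like mu_both to geom_vline() is what the plyr and dplyr versions above do.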

Related

How to add a mean line for grouped data plots

I'm using a loop to plot histograms of monthly air quality data, faceted by year with facet_grid(). In my plots the mean line is the mean of each month across all years, and I would like it to be the mean of each month within each year.
My code is:
for (z in vec) {
df.g <- pol %>% filter(poluentes==z)
df.g$year <- as.character(df.g$year)
df.g$month<- as.character(df.g$month)
mu <- ddply(df.g, "month", summarise, grp.mean=mean(value)) # mean line
print(ggplot(df.g, aes(x=value, fill=month, color=month)) +
geom_histogram(position="identity", alpha=0.2) +
labs(title=z,x="µg/m3", caption = "Análise: poluente") +
geom_vline(data=mu, aes(xintercept=grp.mean, color=month),
linetype="dashed") + facet_grid(year ~.))
}
The output is:
As you can see, the mean lines are the same in all three histograms.
Your calculations of means need to include the year as well:
set.seed(111)
df.g = data.frame(year = sample(18:20,1000,replace=TRUE),
month = factor(sample(3:4,1000,replace=TRUE)),
value = rnbinom(1000,mu=50,size=1))
mu = aggregate(df.g$value,list(month=df.g$month,year=df.g$year),mean)
Then pass it:
ggplot(df.g,aes(x=value,fill=month,col=month)) +
geom_histogram(bins=20,position="identity", alpha=0.2) +
facet_grid(year ~ .) +
geom_vline(data = mu,aes(xintercept = x,col=month))

Keeping unit of measure in facet_wrap while scales="free_y"? [duplicate]

This question already has an answer here:
Setting individual y axis limits with facet wrap NOT with scales free_y
I'm trying to create a facet_wrap() plot in which the unit of measure (the y-axis interval) stays identical across panels, while each panel is allowed to cover a different part of the y axis.
To clarify what I mean, I have created a dataset df:
library(tidyverse)
df <- tibble(
Year = c(2010,2011,2012,2010,2011,2012),
Category=c("A","A","A","B","B","B"),
Value=c(1.50, 1.70, 1.60, 4.50, 4.60, 4.55)
)
with df, we can create the following plot using facet_wrap:
ggplot(data = df, aes(x=Year, y=Value)) + geom_line() + facet_wrap(.~ Category)
Plot 1
To make the variation within each panel visible, one can use scales = "free_y":
ggplot(data = df, aes(x=Year, y=Value)) + geom_line() +
facet_wrap(.~ Category, scales="free_y")
Plot 2
Although this is clearer, the interval on the y-axis is 0.025 in panel A but 0.0125 in panel B. This could mislead someone who compares A and B side by side.
So my question is whether there is an elegant way of plotting something like the graph below (with a y-axis interval of 0.025 in both panels) without having to arrange two separate plots in a grid.
Thanks
Desired result:
Code for the grid:
# Grid
## Plot A
df_A <- df %>%
filter(Category == "A")
plot_A <- ggplot(data = df_A, aes(x=Year, y=Value)) + geom_line() + coord_cartesian(ylim = c(1.5,1.7)) + ggtitle("A")
## Plot B
df_B <- df %>%
filter(Category == "B")
plot_B <- ggplot(data = df_B, aes(x=Year, y=Value)) + geom_line() + coord_cartesian(ylim = c(4.4,4.6)) + ggtitle("B")
library(gridExtra) # provides grid.arrange()
grid.arrange(plot_A, plot_B, nrow=1)
Based on the info at Setting individual y axis limits with facet wrap NOT with scales free_y, you can use geom_blank() with manually specified y-limits for each Category:
# df from above code
df2 <- tibble(
Category = c("A", "B"),
y_min = c(1.5, 4.4),
y_max = c(1.7, 4.6)
)
df <- full_join(df, df2, by = "Category")
ggplot(data = df, aes(x=Year, y=Value)) + geom_line() +
facet_wrap(.~ Category, scales = "free_y") +
geom_blank(aes(y = y_min)) +
geom_blank(aes(y = y_max))
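The limits in df2 above were typed in by hand. One possible way to derive them instead, assuming the goal is the same y-axis span in every panel, centred on each category's data, is sketched below in base R. The helper equal_span_limits and its pad argument are hypothetical names, not part of ggplot2, and the resulting limits differ slightly from the hand-picked ones:

```r
# Hypothetical helper: per-group limits that all share the widest group's span
equal_span_limits <- function(values, groups, pad = 0) {
  rng  <- lapply(split(values, groups), range)  # c(min, max) for each group
  mid  <- sapply(rng, mean)                     # centre of each group's range
  span <- max(sapply(rng, diff)) + 2 * pad      # common span for all panels
  data.frame(
    Category = names(rng),
    y_min = unname(mid - span / 2),
    y_max = unname(mid + span / 2)
  )
}

df <- data.frame(
  Year = c(2010, 2011, 2012, 2010, 2011, 2012),
  Category = c("A", "A", "A", "B", "B", "B"),
  Value = c(1.50, 1.70, 1.60, 4.50, 4.60, 4.55)
)

limits <- equal_span_limits(df$Value, df$Category)
print(limits)
```

A data frame shaped like limits could take the place of df2 in the full_join() step, so that the two geom_blank() layers stretch every panel to the same span.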

Display the total number of bin elements in a stacked histogram with ggplot2

I'd like to show data values on a stacked histogram in ggplot2. After many attempts, the only way I found to show the total count (for each bin) is the following code:
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
p<-ggplot(df, aes(x=weight, fill=sex, color=sex))
p<-p + geom_histogram(position="stack", alpha=0.5, binwidth=5)
tbl <- (ggplot_build(p)$data[[1]])[, c("x", "count")]
agg <- aggregate(tbl["count"], by=tbl["x"], FUN=sum)
for(i in 1:length(agg$x))
if(agg$count[i])
p <- p + geom_text(x=agg$x[i], y=agg$count[i] + 1.5, label=agg$count[i], colour="black" )
which generates the following plot:
Is there a better (and more efficient) way to get the same result using ggplot2?
Thanks a lot in advance
You can use stat_bin to count up the values and add text labels.
p <- ggplot(df, aes(x=weight)) +
geom_histogram(aes(fill=sex, color=sex),
position="stack", alpha=0.5, binwidth=5) +
stat_bin(aes(y=..count.. + 2, label=..count..), geom="text", binwidth=5)
I moved the fill and color aesthetics to geom_histogram so that they apply only to that layer rather than to the whole plot, because we want stat_bin to generate an overall count for each bin, not separate counts for each level of sex. ..count.. is an internal variable returned by stat_bin that stores the counts; since ggplot2 3.4.0 this notation is deprecated in favor of after_stat(count).
In this case, it was straightforward to add the counts directly. However, in more complicated situations, you might sometimes want to summarise the data outside of ggplot and then feed the summary data to ggplot. Here's how you would do that in this case:
library(dplyr)
counts = df %>% group_by(weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
summarise(n = n())
countsByGroup = df %>% group_by(sex, weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
summarise(n = n())
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
geom_bar(stat="identity", alpha=0.5, width=1) +
geom_text(data=counts, aes(label=n, y=n+2), colour="black")
Or, you can just create countsByGroup and then create the equivalent of counts on the fly inside ggplot:
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
geom_bar(stat="identity", alpha=0.5, width=1) +
geom_text(data=countsByGroup %>% group_by(weight) %>% mutate(n=sum(n)),
aes(label=n, y=n+2), colour="black")
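The summaries above depend on how cut() assigns values to bins. A small base-R sketch of that step, using made-up weights (the vector w is hypothetical), shows that right = FALSE produces left-closed intervals [a, b), so a value that sits exactly on a break falls into the bin that starts there:

```r
# Made-up weights, including values exactly on the breaks 55 and 60
w <- c(50, 52, 54.9, 55, 59, 60)

# Left-closed, right-open bins of width 5
bins <- cut(w, breaks = seq(50, 65, 5), right = FALSE)

# Number of values per bin
counts <- table(bins)
print(counts)
```

With the default right = TRUE the value 55 would be counted in the bin ending at 55 instead, so the same setting should be used for every summary that feeds one plot.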

R - How to overlay the average of a set of iid RVs

In the code below I build a 40x1000 data frame where in each column I have the cumulative means for successive random draws from an exponential distribution with parameter lambda = 0.2.
I add an additional column to host the specific number of the "draw".
I also calculate the rowmeans as df_means.
How do I add df_means (as a black line) on top of all my simulated RVs? I don't understand ggplot well enough to do this.
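library(ggplot2)
library(reshape2) # provides melt()
lambda <- 0.2
df <- data.frame(replicate(1000,cumsum(rexp(40,lambda))/(1:40)))
df_means <- rowMeans(df) # computed before the draw column is added
df$draw <- seq(1,40)
Molten <- melt(df, id.vars="draw")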
ggplot(Molten, aes(x = draw, y = value, colour = variable)) + geom_line() + theme(legend.position = "none") + geom_line(df_means)
How would I add plot(df_means, type="l") to my ggplot, below?
Thank you,
You can make another data.frame with the means and ids and use that to draw the line:
# exclude the draw column so it does not enter the average
df_means <- rowMeans(df[, names(df) != "draw"])
means <- data.frame(id=1:40, mu=df_means)
ggplot(Molten, aes(x=draw, y=value, colour=variable)) +
geom_line() +
theme(legend.position = "none") +
geom_line(data=means, aes(x=id, y=mu), color="black")
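Alternatively, let stat_summary() compute the mean inside ggplot with a small helper:
# group = 1 pools all series, so one mean is computed per draw
stat_sum_single <- function(fun, geom="line", ...) {
stat_summary(aes(group=1), fun=fun, colour="black", geom=geom, ...)
}
k<-ggplot(Molten, aes(x = draw, y = value, colour = variable)) + geom_line() + theme(legend.position = "none")
k+stat_sum_single(mean) # adds the mean across all series as a black line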
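As a rough check of what the black line represents, the sketch below rebuilds a much smaller version of the simulation in base R. It assumes 5 draws of 3 series with fixed, made-up numbers in place of rexp() so that the values are reproducible, and it shows why the draw column should be left out of rowMeans():

```r
# Three made-up series of 5 draws each, stored column-wise
draws <- data.frame(X1 = c(2, 4, 6, 8, 10),
                    X2 = c(1, 1, 1, 1, 1),
                    X3 = c(3, 3, 3, 3, 3))

# Cumulative mean of each column: running sum divided by the draw number
cum_means <- as.data.frame(lapply(draws, function(x) cumsum(x) / seq_along(x)))
cum_means$draw <- seq_len(nrow(cum_means))

# Average across the series only; including `draw` mixes the index into the mean
series_cols  <- setdiff(names(cum_means), "draw")
mean_correct <- rowMeans(cum_means[, series_cols])
mean_biased  <- rowMeans(cum_means)

print(data.frame(draw = cum_means$draw, mean_correct, mean_biased))
```

With 1000 series the distortion from the extra column is small, but it grows with the draw number, so it is safer to drop the column explicitly.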

Boxplot show the value of mean

This boxplot shows the mean as a point, but how can we also print the numeric value of the mean on each box?
ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) + geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point",
shape=18, size=3, show.legend=FALSE)
First, you can calculate the group means with aggregate:
means <- aggregate(weight ~ group, PlantGrowth, mean)
This dataset can be used with geom_text:
library(ggplot2)
ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) + geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point",
shape=18, size=3, show.legend=FALSE) +
geom_text(data = means, aes(label = weight, y = weight + 0.08))
Here, + 0.08 is used to place the label above the point representing the mean.
An alternative version without ggplot2:
means <- aggregate(weight ~ group, PlantGrowth, mean)
boxplot(weight ~ group, PlantGrowth)
points(1:3, means$weight, col = "red")
text(1:3, means$weight + 0.08, labels = means$weight)
You can use the value computed by stat_summary():
ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend=FALSE) +
stat_summary(fun=mean, colour="red", geom="text", show.legend=FALSE,
vjust=-0.7, aes(label=round(after_stat(y), digits=1)))
You can also pass stat_summary() a function that computes the mean together with its label, and use the vjust argument to position the text. This needs an additional function but no additional data frame:
fun_mean <- function(x){
m <- mean(x, na.rm=TRUE)
return(data.frame(y=m, label=m))}
ggplot(PlantGrowth,aes(x=group,y=weight)) +
geom_boxplot(aes(fill=group)) +
stat_summary(fun = mean, geom="point",colour="darkred", size=3) +
stat_summary(fun.data = fun_mean, geom="text", vjust=-0.7)
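stat_summary(fun.data = ...) calls the supplied function once per group and expects a data frame whose columns are named after aesthetics, here y and label. The base-R sketch below imitates that per-group call outside ggplot to show the shape of the result. The name mean_label and the rounding to two decimals are illustrative assumptions, not part of the answer above:

```r
# Hypothetical variant of fun_mean that rounds the printed label
mean_label <- function(x) {
  m <- mean(x, na.rm = TRUE)
  data.frame(y = m, label = round(m, 2))
}

# Apply it to each group by hand, roughly as stat_summary() does per group
per_group <- do.call(
  rbind,
  lapply(split(PlantGrowth$weight, PlantGrowth$group), mean_label)
)
print(per_group)
```

Each row corresponds to one box, with y giving the position of the text and label the value that is printed.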
The Magrittr way
I know there is an accepted answer already, but I wanted to show one cool way to do it in a single command with the help of the magrittr package.
library(magrittr)
PlantGrowth %$% # open dataset and make colnames accessible with '$'
split(weight,group) %T>% # split by group and side-pipe it into boxplot
boxplot %>% # plot
lapply(mean) %>% # data from split can still be used thanks to side-pipe '%T>%'
unlist %T>% # convert to atomic and side-pipe it to points
points(pch=18) %>% # add points for means to the boxplot
text(x=.+0.06,labels=.) # use the values to print text
This code will produce a boxplot with means printed as points and values:
I split the command on multiple lines so I can comment on what each part does, but it can also be entered as a oneliner. You can learn more about this in my gist.
