R ggplot stat_summary Inconsistent Results - r

I plot the average mpg in the mtcar data frame using ggplot. I get several points for each cylinder class denoting the mean value, categorized by the vs variable.
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) + geom_point(aes(color = factor(vs)), stat = "summary", fun.y = "mean")
If I overlay these averages on top of the raw data by adding + geom_point (below) the averages differ from what they originally were above. What am I doing wrong? Why aren't the means consistent?
ggplot(mtcars, aes(cyl, mpg)) + geom_point() + geom_point(aes(color = factor(vs)), stat = "summary", fun.y = "mean")

How embarassing. I didn't even look at the scale of the Y-axis. Thank you aosmith. There is no inconsistency in stat_summary.

Related

Centering dots in ggplot on faceted boxplot [duplicate]

Following this question: How to add a number of observations per group and use group mean in ggplot2 boxplot?, I want to add number of observations per group in ggplot boxplot too. But I have added a colour into aes mapping.
The existing answer shows how to adjust text position in y axis. How could I adjust the text position in the x axis?
This is a minimum example to reproduce my problem:
library(ggplot2)
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
# experiment with the multiplier to find the perfect position
}
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median)
p
Thanks for any suggestions.
You can just use position:
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median,
position = position_dodge(width = 0.75))
p
The width argument of position_dodge() controls the positioning on the horizontal axis. 0.75 is the sweet spot, see how it works for different numbers of groupings:
p2 <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(cyl))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median,
position = position_dodge(width = 0.75))
p2
Instead of stat_summary, you can use geom_text. Please refer to the following question: ggplot2 add text on top of boxplots.
This is an example of how you may do it with the number of observations:
# Create an aggregate of median & count
> cts <- merge(aggregate(mpg ~ cyl + am, mtcars, length),
aggregate(mpg ~ cyl + am, mtcars, median),
by=c("cyl", "am"))
# Rename the col names to fit with the original dataset..
> names(cts) <- c("cyl", "am", "count", "mpg")
# As alexwhan suggested, position_dodge helps with positioning
# along the x-axis..
> ggplot(mtcars, aes(factor(cyl), mpg, colour = factor(am))) +
geom_boxplot(position = position_dodge(width=1.0)) +
geom_text(data = cts, aes(label=count),
position=position_dodge(width=1.0))

Add a number of observations per group AND SUBGROUP in ggplot2 boxplot

This might seem like a duplicate of this question, but in fact I want to expand the original question.
I want to annote the boxplot with the number of observations per group AND SUBGROUP in ggplot. Following the example or the original post, here is my minimal example:
require(ggplot2)
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
# experiment with the multiplier to find the perfect position
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median)
My problem is that the number of samples all line up in the center of the group, rather than plotting on the appropriate boxplot (as the picture below shows):
is it what you want?
require(ggplot2)
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
# experiment with the multiplier to find the perfect position
}
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median, position=position_dodge(width=0.75))

How do I make a continuous smoothed median with ggplot function?

Here is my code:
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))+
geom_point(stat = "identity")+
theme_bw()+
geom_smooth()
Instead of getting a continuous smoothed median, I get what looks like a fractured and inaccurate of the median of the data as a whole. I think this is due to the "factor(cyl)" function.
Here is a link to what my code gives:
If add aes(group=1) in the geom_smooth() you will fix your problem:
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))+
geom_point(stat = "identity")+
theme_bw()+
geom_smooth(aes(group=1))

Plot multiple group histogram with overlaid line ggplot

I'm trying to plot a multiple group histogram with overlaid line, but I cannot get the right scaling for the histogram.
For example:
ggplot() + geom_histogram(data=df8,aes(x=log(Y),y=..density..),binwidth=0.15,colour='black') +
geom_line(data = as.data.frame(pdf8), aes(y=pdf8$f,x=pdf8$x), col = "black",size=1)+theme_bw()
produces the right scale. But when I try to perform fill according to groups, each group is scaled separately.
ggplot() + geom_histogram(data=df8,aes(x=log(Y),fill=vec8,y=..density..),binwidth=0.15,colour='black') +
geom_line(data = as.data.frame(pdf8), aes(y=pdf8$f,x=pdf8$x), col = "black",size=1)+theme_bw()
How would I scale it so that a black line is overlaid over the histogram and on the y axis is density?
It is going to be difficult for others to help you without a reproducible example, but perhaps something like this is what you're after:
library(ggplot2)
ggplot(data = mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram(aes(y = ..density..)) +
geom_line(stat = "density")
If you would rather the density line pertain to the entire dataset, you need to move the fill aesthetic into the geom_histogram function:
ggplot(data = mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density.., fill = factor(cyl))) +
geom_line(data = mtcars, stat = "density")

Add number of observations per group in ggplot2 boxplot

Following this question: How to add a number of observations per group and use group mean in ggplot2 boxplot?, I want to add number of observations per group in ggplot boxplot too. But I have added a colour into aes mapping.
The existing answer shows how to adjust text position in y axis. How could I adjust the text position in the x axis?
This is a minimum example to reproduce my problem:
library(ggplot2)
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
# experiment with the multiplier to find the perfect position
}
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median)
p
Thanks for any suggestions.
You can just use position:
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median,
position = position_dodge(width = 0.75))
p
The width argument of position_dodge() controls the positioning on the horizontal axis. 0.75 is the sweet spot, see how it works for different numbers of groupings:
p2 <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(cyl))) +
geom_boxplot() +
stat_summary(fun.data = give.n, geom = "text", fun.y = median,
position = position_dodge(width = 0.75))
p2
Instead of stat_summary, you can use geom_text. Please refer to the following question: ggplot2 add text on top of boxplots.
This is an example of how you may do it with the number of observations:
# Create an aggregate of median & count
> cts <- merge(aggregate(mpg ~ cyl + am, mtcars, length),
aggregate(mpg ~ cyl + am, mtcars, median),
by=c("cyl", "am"))
# Rename the col names to fit with the original dataset..
> names(cts) <- c("cyl", "am", "count", "mpg")
# As alexwhan suggested, position_dodge helps with positioning
# along the x-axis..
> ggplot(mtcars, aes(factor(cyl), mpg, colour = factor(am))) +
geom_boxplot(position = position_dodge(width=1.0)) +
geom_text(data = cts, aes(label=count),
position=position_dodge(width=1.0))

Resources