Here is my code:
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))+
geom_point(stat = "identity")+
theme_bw()+
geom_smooth()
Instead of getting one continuous smoothed line, I get what looks like a fractured and inaccurate smooth, rather than a smooth of the data as a whole. I think this is due to mapping colour to factor(cyl).
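If you add aes(group=1) inside geom_smooth(), you will fix your problem. The colour mapping splits the data into one group per cyl level, and group=1 overrides that for the smoother only: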
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl)))+
geom_point(stat = "identity")+
theme_bw()+
geom_smooth(aes(group=1))
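As an alternative sketch (not part of the original answer), the same result can be had by mapping colour only in the point layer, so that geom_smooth() never sees the grouping. The explicit method = "loess" and formula = y ~ x arguments are my own choice here, added only to make the fit explicit and silence the default-method message:

```r
library(ggplot2)

# Colour is mapped inside geom_point() only, so the smoother
# is fitted once to the whole dataset instead of once per cyl level.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  geom_smooth(method = "loess", formula = y ~ x) +
  theme_bw()

p
```

Which form to use is mostly taste: aes(group = 1) is handy when other layers should still inherit the colour mapping.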
If I use ggplot2's stat_summary() to make a barplot of the average number of miles per gallon for 3-, 4-, and 5-geared cars, for example, how can I label each of the bars with the average value for mpg?
library(ggplot2)
CarPlot <- ggplot() +
stat_summary(data= mtcars,
aes(x = factor(gear),
y = mpg,
fill = factor(gear)
),
fun.y="mean",
geom="bar"
)
CarPlot
I know that you can normally use geom_text(), but I'm having trouble figuring out what to do in order to get the average value from stat_summary().
You should use the internal variable ..y.. to get the computed mean.
library(ggplot2)
CarPlot <- ggplot(data= mtcars) +
aes(x = factor(gear),
y = mpg)+
stat_summary(aes(fill = factor(gear)), fun.y=mean, geom="bar")+
stat_summary(aes(label=round(..y..,2)), fun.y=mean, geom="text", size=6,
vjust = -0.5)
CarPlot
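For readers on a recent ggplot2: as far as I know, fun.y was deprecated in favour of fun in ggplot2 3.3.0, and the ..y.. notation has since been superseded by after_stat(y). A sketch of the same plot in that newer syntax, assuming ggplot2 >= 3.3.0 (the name car_plot is mine):

```r
library(ggplot2)

car_plot <- ggplot(mtcars, aes(x = factor(gear), y = mpg)) +
  # Bars: mean mpg per gear
  stat_summary(aes(fill = factor(gear)), fun = mean, geom = "bar") +
  # Labels: after_stat(y) refers to the mean computed by stat_summary()
  stat_summary(aes(label = round(after_stat(y), 2)),
               fun = mean, geom = "text", size = 6, vjust = -0.5)

car_plot
```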
That said, it is probably better to aggregate beforehand.
I'd simply precompute the statistics, and build the plot afterwards:
library(plyr)
library(ggplot2)
dat = ddply(mtcars, .(gear), summarise, mean_mpg = mean(mpg))
dat = within(dat, {
gear = factor(gear)
mean_mpg_string = sprintf('%0.1f', mean_mpg)
})
ggplot(dat, aes(x = gear, y = mean_mpg)) +
geom_bar(aes(fill = gear), stat = "identity") +
geom_text(aes(label = mean_mpg_string), vjust = -0.5)
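If you would rather not depend on plyr, the same precompute-then-plot idea can be sketched with base R's aggregate(). This is my own variant, not the answerer's code; geom_col() is used as shorthand for geom_bar(stat = "identity") and assumes ggplot2 >= 2.2.0:

```r
library(ggplot2)

# Mean mpg per gear, computed up front with base R
dat <- aggregate(mpg ~ gear, data = mtcars, FUN = mean)
names(dat)[names(dat) == "mpg"] <- "mean_mpg"
dat$gear <- factor(dat$gear)
dat$mean_mpg_string <- sprintf("%0.1f", dat$mean_mpg)

p <- ggplot(dat, aes(x = gear, y = mean_mpg)) +
  geom_col(aes(fill = gear)) +
  geom_text(aes(label = mean_mpg_string), vjust = -0.5)

p
```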
I am having some trouble with ggplot and stat_summary.
Please consider the following data:
head(mtcars)
data<-mtcars
data$hp2<-mtcars$hp+50
Please consider the following code:
ggplot(mtcars, aes(x = cyl, y = hp)) +
stat_summary(aes(y = hp, group = 1), fun.y=mean, colour="red", geom="line",group=1) +
stat_summary(fun.y=mean, colour="red", geom="text", show.legend = FALSE, vjust=-0.7, aes( label=round(..y.., digits=0)))
The code will produce a line plot with the means of hp, and text labels for the means as well. If we would like to add another line/curve we simply have to add:
ggplot(mtcars, aes(x = cyl, y = hp)) +
stat_summary(aes(y = hp, group = 1), fun.y=mean, colour="red", geom="line",group=1) +
stat_summary(fun.y=mean, colour="red", geom="text", show.legend = FALSE, vjust=-0.7, aes( label=round(..y.., digits=0)))+
stat_summary(aes(y = hp2), fun.y=mean, colour="blue", geom="line",group=1)
Now comes the tricky part:
How can I use stat_summary with geom="text" for hp2, i.e. how do I force stat_summary to calculate the means of hp2 and print them as text labels? It seems that I can only use it for the "main" y.
This type of problem, which asks for graphs of several related columns, is almost always a wide-to-long data reshaping problem. Once hp and hp2 sit in one value column with a variable column identifying them, a single stat_summary call handles both:
library(ggplot2)
data_long <- reshape2::melt(data[c('cyl', 'hp', 'hp2')], id.vars = 'cyl')
head(data_long)
ggplot(data_long, aes(x = cyl, y = value, colour = variable)) +
stat_summary(fun.y = mean, geom = "line", show.legend = FALSE) +
stat_summary(fun.y = mean, geom = "text", show.legend = FALSE, vjust=-0.7, aes( label=round(..y.., digits=0))) +
scale_color_manual(values = c("red", "blue"))
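In case reshape2 is not available, here is a rough sketch of the same reshaping done by hand in base R, with the plot written in the newer fun / after_stat(y) syntax (assuming ggplot2 >= 3.3.0). The data frame is rebuilt from mtcars so the snippet runs on its own:

```r
library(ggplot2)

data <- mtcars
data$hp2 <- data$hp + 50

# Wide to long by hand: one row per (car, variable) pair
data_long <- data.frame(
  cyl      = rep(data$cyl, times = 2),
  variable = factor(rep(c("hp", "hp2"), each = nrow(data))),
  value    = c(data$hp, data$hp2)
)

p <- ggplot(data_long, aes(x = cyl, y = value, colour = variable)) +
  stat_summary(fun = mean, geom = "line", show.legend = FALSE) +
  stat_summary(aes(label = round(after_stat(y))),
               fun = mean, geom = "text", show.legend = FALSE, vjust = -0.7) +
  scale_colour_manual(values = c(hp = "red", hp2 = "blue"))

p
```

Because colour is mapped to variable, each stat_summary() layer computes the means separately for hp and hp2, which is what gives labels on both lines.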
I plot the average mpg in the mtcars data frame using ggplot. I get several points for each cylinder class denoting the mean value, categorized by the vs variable.
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) + geom_point(aes(color = factor(vs)), stat = "summary", fun.y = "mean")
If I overlay these averages on top of the raw data by adding + geom_point() (below), the averages appear to differ from what they originally were above. What am I doing wrong? Why aren't the means consistent?
ggplot(mtcars, aes(cyl, mpg)) + geom_point() + geom_point(aes(color = factor(vs)), stat = "summary", fun.y = "mean")
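How embarrassing. I didn't even look at the scale of the Y-axis. Thank you aosmith. There is no inconsistency in stat_summary: the means are the same in both plots, and only the y-axis range changes once the raw points are added.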
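One way to convince yourself of this is to pull the computed values out of both plots with layer_data() and compare them. This is just a sketch; the object names are mine, and fun = mean assumes ggplot2 >= 3.3.0 (use fun.y on older versions):

```r
library(ggplot2)

# Means only
p_means <- ggplot(mtcars, aes(cyl, mpg)) +
  geom_point(aes(color = factor(vs)), stat = "summary", fun = mean)

# Raw data with the same means overlaid
p_overlay <- ggplot(mtcars, aes(cyl, mpg)) +
  geom_point() +
  geom_point(aes(color = factor(vs)), stat = "summary", fun = mean)

# The summary layer is layer 1 in the first plot and layer 2 in the second
means_alone    <- layer_data(p_means, 1)$y
means_overlaid <- layer_data(p_overlay, 2)$y

all.equal(means_alone, means_overlaid)
```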
I'll use violin plots here as an example, but the question extends to many other ggplot types.
I know how to subset my data along the x-axis by a factor:
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_violin() +
geom_point(position = "jitter")
And I know how to plot only the full dataset:
ggplot(iris, aes(x = 1, y = Sepal.Length)) +
geom_violin() +
geom_point(position = "jitter")
My question is: is there a way to plot the full data AND a subset-by-factor side-by-side in the same plot? In other words, for the iris data, could I make a violin plot that has both "full data" and "setosa" along the x-axis?
This would enable a comparison of the distribution of a full dataset and a subset of that dataset. If this isn't possible, any recommendations on a better way to visualise this would also be welcome :)
Thanks for any ideas!
Using:
ggplot(iris, aes(x = "All", y = Sepal.Length)) +
geom_violin() +
geom_point(aes(color="All"), position = "jitter") +
geom_violin(data=iris, aes(x = Species, y = Sepal.Length)) +
geom_point(data=iris, aes(x = Species, y = Sepal.Length, color = Species),
position = "jitter") +
scale_color_manual(values = c("black","#F8766D","#00BA38","#619CFF")) +
theme_minimal(base_size = 16) +
theme(axis.title.x = element_blank(), legend.title = element_blank())
gives a plot with one violin for the full dataset ("All") next to one violin per species.
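Another way to get the same picture, sketched here as an alternative rather than as the answer's method, is to stack a copy of the data labelled "All" on top of the per-species data, so a single pair of layers does the work. The column name Group and the object names are made up for this example:

```r
library(ggplot2)

# One copy of iris labelled "All", one copy labelled by species
iris_all        <- transform(iris, Group = "All")
iris_by_species <- transform(iris, Group = as.character(Species))
iris_both       <- rbind(iris_all, iris_by_species)

# Fix the x-axis order: full data first, then each species
iris_both$Group <- factor(iris_both$Group,
                          levels = c("All", levels(iris$Species)))

p <- ggplot(iris_both, aes(x = Group, y = Sepal.Length)) +
  geom_violin() +
  geom_point(aes(colour = Group), position = "jitter") +
  scale_colour_manual(values = c("black", "#F8766D", "#00BA38", "#619CFF")) +
  theme_minimal(base_size = 16) +
  theme(axis.title.x = element_blank(), legend.title = element_blank())

p
```

To show only "All" and "setosa", as in the question, you could filter iris_by_species down to the species of interest before the rbind().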
I'm trying to plot a multiple group histogram with overlaid line, but I cannot get the right scaling for the histogram.
For example:
ggplot() +
geom_histogram(data = df8, aes(x = log(Y), y = ..density..), binwidth = 0.15, colour = 'black') +
geom_line(data = as.data.frame(pdf8), aes(x = x, y = f), col = "black", size = 1) +
theme_bw()
This produces the right scale. But when I try to fill according to groups, each group is scaled separately.
ggplot() +
geom_histogram(data = df8, aes(x = log(Y), fill = vec8, y = ..density..), binwidth = 0.15, colour = 'black') +
geom_line(data = as.data.frame(pdf8), aes(x = x, y = f), col = "black", size = 1) +
theme_bw()
How would I scale it so that a black line is overlaid over the histogram and on the y axis is density?
It is going to be difficult for others to help you without a reproducible example, but perhaps something like this is what you're after:
library(ggplot2)
ggplot(data = mtcars, aes(x = mpg, fill = factor(cyl))) +
geom_histogram(aes(y = ..density..)) +
geom_line(stat = "density")
If you would rather the density line pertain to the entire dataset, you need to move the fill aesthetic into the geom_histogram() call:
ggplot(data = mtcars, aes(x = mpg)) +
geom_histogram(aes(y = ..density.., fill = factor(cyl))) +
geom_line(data = mtcars, stat = "density")
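Note that with fill mapped, ..density.. is still computed per group, so each group's bars integrate to 1 on their own. If the goal is a stacked histogram whose total area is 1, one possible sketch is to compute the height from the bin counts instead. This assumes ggplot2 >= 3.3.0 for after_stat(), and relies on count and width being computed variables of stat_bin; the binwidth of 2 is an arbitrary choice for mtcars:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(
    # count / (total count * bin width): density relative to the whole
    # dataset, because sum(count) here runs over all groups in the layer
    aes(y = after_stat(count / sum(count) / width), fill = factor(cyl)),
    binwidth = 2, colour = "black"
  ) +
  geom_line(stat = "density") +
  ylab("density") +
  theme_bw()

p
```

Since sum(count) is taken over the whole layer, this sketch is only meant for a single-panel plot; with facets the total would span all panels.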