Draw mean and outlier points for box plots using ggplot2 - r

I am trying to plot the outliers and the mean point for the box plots below, using the data available here. The dataset has 3 different factors and 1 value column, with 3600 rows.
When I run the code below, it shows the mean points but doesn't draw the outliers properly:
ggplot(df, aes(x=Representations, y=Values, fill=Methods)) +
geom_boxplot() +
facet_wrap(~Metrics) +
stat_summary(fun.y=mean, colour="black", geom="point", position=position_dodge(width=0.75)) +
geom_point() +
theme_bw()
And when I modify the code as below, the mean points disappear:
ggplot(df, aes(x=Representations, y=Values, colour=Methods)) +
geom_boxplot() +
facet_wrap(~Metrics) +
stat_summary(fun.y=mean, colour="black", geom="point", position=position_dodge(width=0.75)) +
geom_point() +
theme_bw()
In both cases I get the message "ymax not defined: adjusting position using y instead" 3 times.
Any suggestions on how to fix this? I would like to draw the mean points within the individual box plots and show the outliers in the same colour as the boxes.
EDIT:
The original data set does not have any outliers, and that was the reason for my confusion. Thanks to MrFlick's answer, whose randomly generated data made this clear.

Rather than downloading the data, I just made a random sample.
set.seed(18)
gg <- expand.grid (
Methods=c("BC","FD","FDFND","NC"),
Metrics=c("DM","DTI","LB"),
Representations=c("CHG","QR","HQR")
)
df <- data.frame(
gg,
Values=rnorm(nrow(gg)*50)
)
Then you should be able to create the plot you want with
library(ggplot2)
ggplot(df, aes(x=Representations, y=Values, fill=Methods)) +
geom_boxplot() +
stat_summary(fun.y="mean", geom="point",
position=position_dodge(width=0.75), color="white") +
facet_wrap(~Metrics)
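which gave me the expected plot (image not included here).
I was using ggplot2 version 0.9.3.1. (From ggplot2 3.3.0 onwards, the fun.y argument of stat_summary is deprecated in favour of fun.)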
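As the question's EDIT points out, a data set may simply have no outliers to draw. A quick way to check is to count, per box, the points that fall more than 1.5 times the interquartile range beyond the box, which is the rule geom_boxplot() uses by default. The sketch below does this in base R with boxplot.stats() on the same random sample; boxplot.stats() computes its hinges slightly differently from ggplot2's quartiles, so counts can differ in borderline cases. The names count_outliers and outliers_per_box are made up for this illustration.

```r
# Recreate the random sample used in the answer.
set.seed(18)
gg <- expand.grid(
  Methods = c("BC", "FD", "FDFND", "NC"),
  Metrics = c("DM", "DTI", "LB"),
  Representations = c("CHG", "QR", "HQR")
)
df <- data.frame(gg, Values = rnorm(nrow(gg) * 50))

# Hypothetical helper: number of points beyond 1.5 * IQR from the hinges.
count_outliers <- function(v) length(boxplot.stats(v)$out)

# One row per box (Methods x Metrics x Representations combination).
outliers_per_box <- aggregate(
  Values ~ Methods + Metrics + Representations,
  data = df,
  FUN = count_outliers
)
names(outliers_per_box)[4] <- "n_outliers"

# If this total is 0, geom_boxplot() has no outlier points to draw.
total_outliers <- sum(outliers_per_box$n_outliers)
print(total_outliers)
```

Running the same check on the original data would show whether missing outlier points are a plotting problem or simply absent from the data.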

Related

ggplot2 - why does changing axis scale affect summary statistics of variables? [duplicate]

I have the following data:
x <- data.frame('myvar'=c(10,10,9,9,8,8, runif(100)), 'mygroup' = c(rep('a', 26), rep('b', 80)))
I want to describe the data using a box-and-whiskers plot in ggplot2. I have also included the mean using a stat_summary.
library(ggplot2)
ggplot(x, aes(x=myvar, y=mygroup)) +
geom_boxplot() +
stat_summary(fun=mean, geom='point', shape=20, color='red', fill='red')
This is fine, but in some of my graphs the outliers are so large that it's hard to make sense of the overall distribution. In these cases, I have cut the x axis:
ggplot(x, aes(x=myvar, y=mygroup)) +
geom_boxplot() +
stat_summary(fun=mean, geom='point', shape=20, color='red', fill='red') +
scale_x_continuous(limit=c(0,5))
Note that the means (and medians?) are now calculated using only the subset of data that is visible on the graph. Is there a ggplot way to include the outlier observations in the calculation but drop them from the visualisation?
My desired output would be a graph with x limits at c(0,5) and a red dot at 2.48 for group mygroup='a'.
scale_x_continuous will remove those points not lying within the limits. You want to use coord_cartesian to "zoom in" without removing your data:
ggplot(x, aes(x=myvar, y=mygroup)) +
geom_boxplot() +
stat_summary(fun=mean, geom='point', shape=20, color='red', fill='red') +
coord_cartesian(xlim=c(0,5))
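The difference can be checked without plotting. The sketch below uses base R only; the seed and the names full_means, in_range and clipped_means are additions for illustration. It computes the group means once on all the data, which is what stat_summary sees under coord_cartesian, and once after discarding values outside c(0, 5), which mimics what happens when the limits are set on the scale.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
x <- data.frame(
  myvar = c(10, 10, 9, 9, 8, 8, runif(100)),
  mygroup = c(rep("a", 26), rep("b", 80))
)

# Means over all observations (the behaviour under coord_cartesian).
full_means <- tapply(x$myvar, x$mygroup, mean)

# Means after dropping values outside the limits
# (the behaviour under scale_x_continuous(limits = c(0, 5))).
in_range <- x$myvar >= 0 & x$myvar <= 5
clipped_means <- tapply(x$myvar[in_range], x$mygroup[in_range], mean)

print(full_means)
print(clipped_means)
```

Group a loses its six large values under the scale limits, so its mean drops; group b lies entirely within the limits and is unaffected.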

How can I add an error bar in dataframe and in ggplot2?

I am making a grouped bar graph using ggplot2, but I am having trouble adding the SD/SEM. The standard errors are 1, 2, 3 and 1. How can I add error bars to this bar graph?
survey <- data.frame(group=rep(c("LG", "RM"),each=2),
sample=rep(c("sample1", "sample2"),1),
values=c(200,50,300,25 ))
library(ggplot2)
ggplot(survey, aes(x=sample, y=values, fill=group)) +
geom_bar(stat="identity", position=position_dodge())
Since the standard errors were already calculated elsewhere, just add them to the data frame as a column and map them in geom_errorbar:
survey <- data.frame(group=rep(c("LG", "RM"),each=2),
sample=rep(c("sample1", "sample2"),1),
values=c(200,50,300,25 ),
se=c(1,2,3,1)) #added values here
library(ggplot2)
ggplot(survey, aes(x=sample, y=values, fill=group)) +
geom_bar(stat="identity", position=position_dodge())+
geom_errorbar(aes(ymin=values - se, ymax=values + se),
position=position_dodge(width = 0.9),width=0.5)
This gives the grouped bar chart with an error bar on each bar.
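If the standard errors were not already known, they could be computed from the raw measurements first. The sketch below is a minimal base R illustration under assumptions: the raw data frame, its replicate values, and the helper sem are all hypothetical, chosen only so that the group means match the values in the question.

```r
# Hypothetical raw measurements: three replicates per group/sample combination.
raw <- data.frame(
  group  = rep(c("LG", "RM"), each = 6),
  sample = rep(rep(c("sample1", "sample2"), each = 3), times = 2),
  value  = c(198, 200, 202, 46, 50, 54, 294, 300, 306, 23, 25, 27)
)

# Standard error of the mean: sample standard deviation over sqrt(n).
sem <- function(v) sd(v) / sqrt(length(v))

# Summarise each group/sample combination separately, then join the results.
means <- aggregate(value ~ group + sample, data = raw, FUN = mean)
ses   <- aggregate(value ~ group + sample, data = raw, FUN = sem)
names(means)[3] <- "values"
names(ses)[3]   <- "se"
survey_from_raw <- merge(means, ses, by = c("group", "sample"))

print(survey_from_raw)
```

The resulting data frame has the same columns as survey above (group, sample, values, se), so it can be passed to the same ggplot call.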

Equal geom point size in legend in multiple plots with ggplot

Is there a way to equalise the size of geom_points throughout multiple plots, so that they are easily comparable?
i.e. I want a value of 100 to be drawn at the same size in every plot, regardless of the minimum and maximum of the size values in each plot. As seen below, the geom_points are the same size, but they represent different values.
graph <- ggplot(mar, aes(x=long, y=lat)) + xlab("Longitude") + ylab("Latitude")
graph + theme_grey() + geom_point(aes(size=distance$NEAR_DIST)) + scale_size_area() + labs(size = "Distance from predicted LCP Roman\nroad to known Roman road (m)")
Thanks!
You could achieve that as follows:
df1 =data.frame(x=1:20,y=runif(20,1,10),size=runif(20,1,10))
df2 =data.frame(x=1:20,y=runif(20,1,10),size=runif(20,31,40))
maximum = max(c(df1$size,df2$size))
graph <- ggplot(df1, aes(x=x, y=y,size=size)) + geom_point() +
scale_size_area(limits=c(1,maximum))
graph2 <- ggplot(df2, aes(x=x, y=y,size=size)) + geom_point() +
scale_size_area(limits=c(1,maximum))
Hope this helps!

ggplot2: multiple colours in stat_summary

I have a plot in which I am displaying individual values from multiple subjects, coloured by group. Added to that are means per group, calculated using stat_summary.
I would like the two means to be coloured by group, but in colours other than the individual data. This turns out to be difficult, at least when using stat_summary. I have the following code:
ggplot(data=dat,
aes(x=Round, y=DV, group=Subject, colour=T1)) +
geom_line() + geom_point() + theme_bw() +
stat_summary(fun.y=mean, geom="line", size=1.5,
linetype="dotted", color="black",
aes(group=T1))
Which produces this example graph.
The colour for the means created by stat_summary is set to black; otherwise it would be red and blue like the individual data lines. However, it is not possible to set more than one colour - so color=c("black", "blue") does not work.
I've already tried scale_colour_manual as explained here, but this will change the colours of the individual data lines, leaving the mean lines unaffected.
Any suggestion how to solve this? Code and data here.
You need to create different values for the mapping to color:
ggplot(data=iris,
aes(x=Sepal.Length, y=Sepal.Width, color=Species)) +
geom_line() + geom_point() + theme_bw() +
stat_summary(fun.y=mean, geom="line", size=1.5,
linetype="dotted", aes(color=paste("mean", Species)))
You can then use scale_color_manual to get specific colors.
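As a sketch of that last step: because the mean lines are mapped to their own values ("mean setosa" and so on), a named vector passed to scale_color_manual can assign a colour to each raw series and each mean series independently. The colours and the name line_colours below are arbitrary choices for illustration, and the code assumes a recent ggplot2 (3.4.0 or later), where stat_summary takes fun rather than fun.y and lines take linewidth rather than size.

```r
library(ggplot2)

species <- levels(iris$Species)

# Named colour vector: one entry per raw series, one per "mean <species>" series.
# The specific colours are arbitrary and only for illustration.
line_colours <- c(
  setNames(c("red", "blue", "darkgreen"), species),
  setNames(c("black", "grey40", "orange"), paste("mean", species))
)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_line() + geom_point() + theme_bw() +
  stat_summary(fun = mean, geom = "line", linewidth = 1.5,
               linetype = "dotted",
               aes(color = paste("mean", Species))) +
  scale_color_manual(values = line_colours)

print(p)
```

Matching by name keeps the colour assignment stable even if the order of the legend entries changes.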

How to combine 2 plots (ggplot) into one plot?

Using R, is it possible to place 2 ggplots together (i.e., on the same plot)? I wish to show a trend from 2 different data frames and, instead of putting them next to each other, I'd like to integrate them into one plot and change only the colour of one of them (the black dots).
To be more specific, I have the following 2 visuals:
ggplot(visual1, aes(ISSUE_DATE,COUNTED)) + geom_point() + geom_smooth(fill="blue", colour="darkblue", size=1)
and
ggplot(visual2, aes(ISSUE_DATE,COUNTED)) + geom_point() + geom_smooth(fill="red", colour="red", size=1)
Both currently have black dots (images not shown), and I need to change one of them to something different.
Creating a single combined plot with your current data set up would look something like this
p <- ggplot() +
# blue plot
geom_point(data=visual1, aes(x=ISSUE_DATE, y=COUNTED)) +
geom_smooth(data=visual1, aes(x=ISSUE_DATE, y=COUNTED), fill="blue",
colour="darkblue", size=1) +
# red plot
geom_point(data=visual2, aes(x=ISSUE_DATE, y=COUNTED)) +
geom_smooth(data=visual2, aes(x=ISSUE_DATE, y=COUNTED), fill="red",
colour="red", size=1)
However, if you combine the data sets before plotting, ggplot will
automatically give you a legend, and in general the code looks a bit cleaner:
visual1$group <- "visual1"
visual2$group <- "visual2"
visual12 <- rbind(visual1, visual2)
# group is a character column, so colour and fill get a discrete scale and legend
p <- ggplot(visual12, aes(x=ISSUE_DATE, y=COUNTED, group=group, col=group, fill=group)) +
geom_point() +
geom_smooth(size=1)
Dummy data (you should supply this for us)
visual1 = data.frame(ISSUE_DATE=runif(100,2006,2008),COUNTED=runif(100,0,50))
visual2 = data.frame(ISSUE_DATE=runif(100,2006,2008),COUNTED=runif(100,0,50))
combine:
visuals = rbind(visual1,visual2)
visuals$vis=c(rep("visual1",100),rep("visual2",100)) # 100 points of each flavour
Now do:
ggplot(visuals, aes(ISSUE_DATE,COUNTED,group=vis,col=vis)) +
geom_point() + geom_smooth()
and adjust colours etc to taste.
Just combine them. I think this should work but it's untested:
p <- ggplot(visual1, aes(ISSUE_DATE,COUNTED)) + geom_point() +
geom_smooth(fill="blue", colour="darkblue", size=1)
p <- p + geom_point(data=visual2, aes(ISSUE_DATE,COUNTED)) +
geom_smooth(data=visual2, fill="red", colour="red", size=1)
print(p)
