I am trying to make an IQR plot with no min or max and I just want to display the median and IQR. The reason is I am working with sensitive data and showing outliers might identify participants. I have been able to create a version of the plot below but as you can see the "-" that represents the median does not stretch all the way across the bars. Is it possible to get the "-" to be the length of each bar? Other packages to answer the question are welcome, also acceptable is to have the plot made from aggregate or raw data.
library(ggplot2)
library(dplyr)
median_IQR <- function(x) {
data.frame(y = median(x), # Median
ymin = quantile(x)[2], # 1st quartile
ymax = quantile(x)[4]) # 3rd quartile
}
iris <- mutate(iris, let = rep(c("a","b"), nrow(iris)/2))
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, color = let, fill=let ))
posn.d <- position_dodge(width=0.5)
p +
stat_summary(geom = "crossbar",
fun.data = median_IQR,
position = 'dodge') +
stat_summary(geom = "point",
fun = "median",
position = posn.d,
size = 3,
col = "black",
shape = "_")
Rewriting the first stat_summary() solution, using the geom function instead of the stat function:
rm(list = ls())
library(ggplot2)
library(dplyr)
iris <- mutate(iris, let = rep(c("a","b"), nrow(iris)/2))
p <- ggplot(iris, aes(x=Species, y=Sepal.Length, color = let, fill=let ))+theme_bw()
p + geom_pointrange(mapping = aes(x=Species, y=Sepal.Length),
stat = "summary",
fun.min = function(z) {quantile(z,0.25)},
fun.max = function(z) {quantile(z,0.75)},
fun = median,
size=1)
To place the points for each group side by side, pass position = position_dodge(width = 0.5) to geom_pointrange().
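Another option, closer to the plot in the question, is to keep the crossbar and draw its outline in a fixed colour. This is a sketch that assumes the median line was hidden because geom_crossbar draws it in the outline colour, which in the question was mapped to the same variable as the fill. The data frame iris2 and the widths used here are illustrative choices, not part of the original post.

```r
library(ggplot2)

# Summary function returning the median and the two quartiles
median_IQR <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Illustrative copy of iris with the same two-level grouping as in the question
iris2 <- transform(iris, let = rep(c("a", "b"), nrow(iris) / 2))

p <- ggplot(iris2, aes(x = Species, y = Sepal.Length, fill = let)) +
  stat_summary(geom = "crossbar",
               fun.data = median_IQR,
               position = position_dodge(width = 0.9),
               width = 0.8,
               colour = "black")  # fixed outline colour keeps the median line visible
p
```

Because the median line is part of the crossbar itself, it always spans the full width of each box, so the extra point layer with shape "_" is no longer needed.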
I'm trying to replicate this image.
I was able to plot a scatter plot and the median (but it's not continuous).
I failed to plot the percentiles.
The median varies according to different spell length.
ggplot(df,aes(x=Spell.Length,y=Growth.Rate)) +
geom_point() +
stat_summary(fun = median, fun.min = median, fun.max = median,
geom = "crossbar", width = 0.5,colour="red")
What I'm trying to do
What I got so far
Use dplyr's group_by(Spell.Length) and summarize() to build a data frame of the percentiles, then plot those with geom_line() and add the horizontal lines with geom_hline().
df %>% group_by(Spell.Length) %>%
  summarize(median = quantile(Growth.Rate, probs = .5), q1 = quantile(Growth.Rate, probs = .25)) %>%
  ggplot(aes(x = Spell.Length, y = median)) +
  geom_line() +
  geom_line(aes(y = q1)) +
  geom_hline(yintercept = 3)
That is the basic idea: one geom_line() for each line style or group, and geom_hline() for the red lines.
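A self-contained sketch of the same approach is shown here. The data frame is simulated, since the asker's data is not available; only the column names Spell.Length and Growth.Rate come from the question, and the percentiles chosen (25th, 50th, 75th) and the reference line at 3 are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Simulated stand-in for the asker's data
df <- data.frame(Spell.Length = rep(1:10, each = 50),
                 Growth.Rate  = rnorm(500, mean = 3, sd = 2))

# One row per spell length, one column per percentile
pct <- df %>%
  group_by(Spell.Length) %>%
  summarise(p25 = quantile(Growth.Rate, probs = 0.25),
            p50 = quantile(Growth.Rate, probs = 0.50),
            p75 = quantile(Growth.Rate, probs = 0.75))

p <- ggplot(df, aes(Spell.Length, Growth.Rate)) +
  geom_point(alpha = 0.2) +
  geom_line(data = pct, aes(y = p50), colour = "red") +       # continuous median
  geom_line(data = pct, aes(y = p25), linetype = "dashed") +  # lower quartile
  geom_line(data = pct, aes(y = p75), linetype = "dashed") +  # upper quartile
  geom_hline(yintercept = 3, colour = "grey40")
p
```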
About 18 months ago, this helpful exchange appeared, with code to show how to produce a plot of median along with interquartile ranges. Here's the code:
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.min = function(z) {quantile(z,0.25)},
fun.max = function(z) {quantile(z,0.75)},
fun = median)
Producing this plot:
What I'd like to know is how to add labels for the median and the IQR, and how to format the bar (color, alpha, etc.). I tried saving the plot as an object to see if it contained anything I could then use to call formatting functions, but nothing was obvious when I looked at it in the RStudio IDE.
Is this even doable? I know I can do a boxplot, but that would have to include min/max. I'd like to produce boxplots with just the mean/median and IQR.
You can change the formatting like you would for any ggplot layer; see the docs for "Vertical intervals: lines, crossbars & errorbars" in this case. An example of this is the following:
library(ggplot2)
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.min = function(z) {quantile(z,0.25)},
fun.max = function(z) {quantile(z,0.75)},
fun = median,
size = 4, # <- adjusts size
colour = "red", # <- adjusts colour
alpha = .3) # <- adjusts transparency
If you want to control formatting for the points and lines individually, you need to do as @camille suggests and pre-process your data, because geom_pointrange() draws a single graphical object, so the points and lines are one and the same.
I would suggest something like this:
library(dplyr)
library(ggplot2)
diamonds %>%
group_by(cut) %>%
summarise(median = median(depth),
lq = quantile(depth, 0.25),
uq = quantile(depth, 0.75)) %>%
ggplot(aes(cut, median)) +
geom_linerange(aes(ymin=lq, ymax=uq), size = 4, colour = "blue", alpha = .4) +
geom_point(size = 10, colour = "red", alpha = .8)
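The question also asked about labels for the median and the quartiles. One possible way, sketched here on the same pre-summarised data, is to add a geom_text() layer per statistic. The nudge_x offset and text size are arbitrary values chosen for illustration.

```r
library(dplyr)
library(ggplot2)

depth_summary <- diamonds %>%
  group_by(cut) %>%
  summarise(median = median(depth),
            lq = quantile(depth, 0.25),
            uq = quantile(depth, 0.75))

p <- ggplot(depth_summary, aes(cut, median)) +
  geom_linerange(aes(ymin = lq, ymax = uq), colour = "blue", alpha = 0.4) +
  geom_point(colour = "red", size = 3) +
  # nudge_x shifts each label to the right of its range
  geom_text(aes(label = median), nudge_x = 0.3, size = 3) +
  geom_text(aes(y = lq, label = lq), nudge_x = 0.3, size = 3) +
  geom_text(aes(y = uq, label = uq), nudge_x = 0.3, size = 3)
p
```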
So I'm trying to make some different boxplots.
Completely normal boxplot
I can't figure out how to create a boxplot without the lower and upper quartiles, which would essentially leave the extremes and the median connected by the whiskers. So something which would look like this:
My attempt
But I need a continuous vertical line connecting the whiskers.
What I did for the second plot in R was the following:
boxplot(mpg~cyl,data=mtcars, main="Car Milage Data", xlab="Number of Cylinders",
ylab="Miles Per Gallon",col="white",frame=F,medcol = "black", boxlty =0,
whisklty = 1, staplelwd = 1,boxwex=0.4)
Many Thanks.
Here is a way to get what you are looking for using a scatter plot and error bars:
library(tidyverse)
data_summary <- data %>%
group_by(grouping_var) %>%
summarize(median = median(quant_var),
max = max(quant_var),
min = min(quant_var))
ggplot(data_summary, aes(x = grouping_var,
y = median)) +
geom_point() +
geom_errorbar(aes(ymin = min,
ymax = max))
Then if you need to overlay your old data you can just add a new geom like so:
ggplot(data_summary, aes(x = grouping_var,
y = median)) +
geom_point() +
geom_errorbar(aes(ymin = min,
ymax = max)) +
geom_point(data = data, aes(x = grouping_var,
y = quant_var))
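The code above uses placeholder names (data, grouping_var, quant_var). As a concrete sketch, here is the same idea applied to the mtcars columns from the question; the error bar width and point size are illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Median, minimum and maximum mpg per number of cylinders
mpg_summary <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  group_by(cyl) %>%
  summarise(median = median(mpg),
            min = min(mpg),
            max = max(mpg))

p <- ggplot(mpg_summary, aes(x = cyl, y = median)) +
  geom_errorbar(aes(ymin = min, ymax = max), width = 0.2) +  # continuous line from min to max
  geom_point(size = 3) +                                     # median marker
  labs(x = "Number of Cylinders", y = "Miles Per Gallon")
p
```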
Suppose that I have a dataframe that looks like this:
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
What I would like to do is to cut the x values into bins, such as:
data$bins <- cut(data$x,breaks = 4)
Then, I would like to plot (using ggplot) the result in a way that the x-axis is the bins, and the y axis is the mean of data$y data points that fall into the corresponding bin.
Thank you in advance
You can use the stat_summary() function.
library(ggplot2)
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
data$bins <- cut(data$x,breaks = 4)
# Points:
ggplot(data, aes(x = bins, y = y)) +
stat_summary(fun = "mean", geom = "point")
# Bars:
ggplot(data, aes(x = bins, y = y)) +
stat_summary(fun = "mean", geom = "bar")
Here is the picture of the points:
This thread is a bit old but here you go, use stat_summary_bin (it might be in the newer versions).
ggplot(data, mapping=aes(x, y)) +
stat_summary_bin(fun = "mean", geom="bar", bins=4 - 1) +
ylab("mean")
Since the mean of your y values can be smaller than 0, I recommend a dot plot instead of a bar chart. The dots represent the means. You can use either qplot or the regular ggplot function. The latter is more customizable. In this example, both produce the same output.
library(ggplot2)
set.seed(7)
data <- data.frame(y = rnorm(10,0,1), x = runif(10,0,1))
data$bins <- cut(data$x,breaks = 4, dig.lab = 2)
qplot(bins, y, data = data, stat="summary", fun.y = "mean")
ggplot(data, aes(x = factor(bins), y = y)) +
stat_summary(fun = mean, geom = "point")
You can also add error bars. In this case, they show the mean +/- 1.96 times the group standard deviation. The group mean and SD can be obtained using tapply.
m <- tapply(data$y, data$bins, mean)
sd <- tapply(data$y, data$bins, sd)
df <- data.frame(mean.y = m, sd = sd, bin = names(m))
ggplot(df, aes(x = bin, y = mean.y,
ymin = mean.y - 1.96*sd,
ymax = mean.y + 1.96*sd)) +
geom_errorbar() + geom_point(size = 3)
I am doing a basic boxplot where y=age and x=Patient groups
age <- ggplot(data, aes(factor(group2), age)) + ylim(15, 80)
age + geom_boxplot(fill = "grey80", colour = "#3366FF")
I was hoping you could help me out with a few things:
1) Is it possible to include a number of observations per group above each group boxplot (but NOT on the X axis where my group labels are) without having to do this in paint :)?
I have tried using:
age + annotate("text", x = "CON", y = 60, label = "25")
where CON is the 1st group and y = 60 is ~ just above the boxplot for this group. However, the command didn't work. I assume it has something to do that it reads x as a continuous rather than a categorical variable.
2) Also, although there are plenty of questions about using the mean rather than the median for boxplots, I still haven't found code that works for me.
3) On the same matter is there a way you could include the mean group stat in the boxplot? Perhaps using
age + stat_summary(fun = mean, colour = "red", geom = "point")
which however only includes a dot of where the mean lies. Or again using
age + annotate("text", x = "CON", y = 30, label = "30")
where CON is the 1st group and y = 30 is ~ the group age mean.
Knowing how flexible and rich ggplot2 syntax is I was hoping that there is a more elegant way of using the real stats output rather than annotate.
Any suggestions/links would be much appreciated!
Thanks!!
Is this anything like what you're after? With stat_summary, as requested:
# function for number of observations
give.n <- function(x){
return(c(y = median(x)*1.05, label = length(x)))
# experiment with the multiplier to find the perfect position
}
# function for mean labels
mean.n <- function(x){
return(c(y = median(x)*0.97, label = round(mean(x),2)))
# experiment with the multiplier to find the perfect position
}
# plot
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
geom_boxplot(fill = "grey80", colour = "#3366FF") +
stat_summary(fun.data = give.n, geom = "text") +
stat_summary(fun.data = mean.n, geom = "text", colour = "red")
Black number is number of observations, red number is mean value. joran's answer shows you how to put the numbers at the top of the boxes
hat-tip: https://stackoverflow.com/a/3483657/1036500
I think this is what you're looking for maybe?
library(plyr) # provides ddply() and .()
library(ggplot2)
myboxplot <- ddply(mtcars,
.(cyl),
summarise,
min = min(mpg),
q1 = quantile(mpg,0.25),
med = median(mpg),
q3 = quantile(mpg,0.75),
max= max(mpg),
lab = length(cyl))
ggplot(myboxplot, aes(x = factor(cyl))) +
geom_boxplot(aes(lower = q1, upper = q3, middle = med, ymin = min, ymax = max), stat = "identity") +
geom_text(aes(y = max,label = lab),vjust = 0)
I just realized I mistakenly used the median when you were asking about the mean, but you can obviously use whatever function for the middle aesthetic you please.
Answer to the first problem.
To show a value above the box, provide the x value as a number rather than as a level name. So, to place the value above the first box, use x = 1.
data(ToothGrowth)
ggplot(ToothGrowth,aes(supp,len))+geom_boxplot()+
annotate("text",x=1,y=32,label=30)
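To avoid typing each count by hand, one possible approach is to compute the counts per group and pass them to geom_text(). This is a sketch only: the counts data frame and its column names n and top are names introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Number of observations per group and the highest value in each group
counts <- ToothGrowth %>%
  group_by(supp) %>%
  summarise(n = n(), top = max(len))

p <- ggplot(ToothGrowth, aes(supp, len)) +
  geom_boxplot() +
  # vjust = -0.5 places each label just above the group's maximum
  geom_text(data = counts, aes(y = top, label = n), vjust = -0.5)
p
```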