Plot min, max, median for each x value in geom_pointrange - r

So I know the better way to approach this is to use the stat_summary() function, but this is to address a question presented in Hadley's R for Data Science book mostly for my own curiosity. It asks how to convert code for an example plot made using stat_summary() to make the same plot with geom_pointrange(). The example is:
ggplot(data = diamonds) +
stat_summary(
mapping = aes(x = cut, y = depth),
fun.ymin = min,
fun.ymax = max,
fun.y = median
)
And the plot should look like this:
(source: had.co.nz)
I've attempted with code such as:
ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) +
geom_pointrange(mapping = aes(ymin = min(depth), ymax = max(depth)))
However, this plots the min and max for all depth values across each cut category (i.e., all ymin's and ymax's are the same). I also tried passing a vector of mins and maxs, but ymin only takes single values as far as I can tell. It's probably something simple, but I think people mostly use stat_summary() as I've found very few examples of geom_pointrange() usage via Google.

I think you need to do the summary outside the plot function to use geom_pointrange:
library(dplyr)
library(ggplot2)
summary_diamonds <- diamonds %>%
group_by(cut) %>%
summarise(lower = min(depth), upper = max(depth), p = median(depth))
ggplot(data = summary_diamonds, mapping = aes(x = cut, y = p)) +
geom_pointrange(mapping = aes(ymin = lower, ymax = upper))

geom_pointrange includes a stat argument, so you can do the statistical transformation inline https://stackoverflow.com/a/41865061
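A minimal sketch of that inline approach, assuming ggplot2 3.3.0 or later, where the summary arguments are named fun, fun.min and fun.max (older releases call them fun.y, fun.ymin and fun.ymax). The object name p_inline is only illustrative.

```r
library(ggplot2)

# stat = "summary" makes geom_pointrange() compute y, ymin and ymax
# separately for each level of cut, instead of once for the whole column.
p_inline <- ggplot(data = diamonds, mapping = aes(x = cut, y = depth)) +
  geom_pointrange(
    stat = "summary",
    fun = median,   # point
    fun.min = min,  # lower end of the range
    fun.max = max   # upper end of the range
  )

# layer_data() returns what the layer actually computed: one row per cut level.
layer_data(p_inline)[, c("x", "y", "ymin", "ymax")]
```

Printing p_inline draws the plot; the layer_data() call is just a quick way to confirm that each cut gets its own min, median and max.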

Related

Ggplot boxplot by group, change summary statistics shown

I want to change the summary statistics shown in the following boxplot:
I have created the boxplot as follows:
ggplot(as.data.frame(beta2), aes(y=var1,x=as.factor(Year))) +
geom_boxplot(outlier.shape = NA)+
ylab(expression(beta[1]))+
xlab("\nYear")+
theme_bw()
By default, the box spans the first and third quartiles. I want the box to show the 2.5% and 97.5% quantiles instead. I know one can easily change what is shown when a single boxplot is drawn by adding the following to geom_boxplot:
aes(
ymin= min(var1),
lower = quantile(var1,0.025),
middle = mean(var1),
upper = quantile(var1,0.975),
ymax=max(var1))
However, this does not work for when boxplots are generated by group. Any idea how one would do this? You can use the Iris data set:
ggplot(iris, aes(y=Sepal.Length,x=Species)) +
geom_boxplot(outlier.shape = NA)
EDIT:
The answer accepted does work. My data-frame is really big and as such the method provided takes a bit of time. I found another solution here: SOLUTION that works for large datasets and my specific need.
This can be achieved via stat_summary() by setting geom = "boxplot" and passing to fun.data a function that returns a data frame with the summary statistics you want to display as ymin, lower, ... in your boxplot:
library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
stat_summary(geom = "boxplot", fun.data = function(x) {
data.frame(
ymin = min(x),
lower = quantile(x, 0.025),
middle = mean(x),
upper = quantile(x, 0.975),
ymax = max(x)
)}, outlier.shape = NA)
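For a large data frame, one possible alternative (not necessarily the solution linked in the edit above) is to compute the five statistics once with dplyr and hand them to geom_boxplot() with stat = "identity". This is a sketch under that assumption; box_stats and p_box are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Compute the five boxplot statistics once per group.
box_stats <- iris %>%
  group_by(Species) %>%
  summarise(
    ymin   = min(Sepal.Length),
    lower  = quantile(Sepal.Length, 0.025),
    middle = mean(Sepal.Length),
    upper  = quantile(Sepal.Length, 0.975),
    ymax   = max(Sepal.Length)
  )

# stat = "identity" tells geom_boxplot() to draw the supplied values as-is
# rather than computing quartiles itself.
p_box <- ggplot(box_stats, aes(x = Species)) +
  geom_boxplot(
    aes(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax),
    stat = "identity"
  )
```

Because the plot only receives one row per group, the drawing cost no longer depends on the size of the raw data.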

ggplot median and percentile

I'm trying to replicate this image.
I was able to plot a scatter plot and the median (but it's not continuous).
I failed to plot the percentiles.
The median varies according to different spell length.
ggplot(df,aes(x=Spell.Length,y=Growth.Rate)) +
geom_point() +
stat_summary(fun = median, fun.min = median, fun.max = median,
geom = "crossbar", width = 0.5,colour="red")
(image: what I'm trying to do)
(image: what I got so far)
Use group_by(Spell.Length) together with dplyr::summarize() to build a data frame of the percentile values, then plot those with geom_line(). Add the horizontal lines with geom_hline().
df %>% group_by(Spell.Length) %>%
summarize(median = quantile(Growth.Rate, probs = .5), q1 = quantile(Growth.Rate, probs = .25)) %>%
ggplot(aes(x = Spell.Length, y = median)) +
geom_line() +
geom_line(aes(y = q1)) +
geom_hline(yintercept = 3)
That would be the basic idea: one geom_line() for each line style/group, and geom_hline() for the red lines.
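The question does not include its data, so the following is a self-contained sketch on a made-up df (the simulated values, the 25th/75th percentile choice and the yintercept of 3 are all assumptions for illustration). It overlays the per-Spell.Length percentile lines on the raw scatter plot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's df, which is not shown in the question.
set.seed(1)
df <- data.frame(
  Spell.Length = rep(1:10, each = 30),
  Growth.Rate  = rnorm(300, mean = 3, sd = 2)
)

# One row per Spell.Length with the percentiles to draw.
pct <- df %>%
  group_by(Spell.Length) %>%
  summarise(
    q25    = quantile(Growth.Rate, probs = 0.25),
    median = quantile(Growth.Rate, probs = 0.50),
    q75    = quantile(Growth.Rate, probs = 0.75)
  )

# Raw points from df; each line layer reads from pct and overrides only y.
p_pct <- ggplot(df, aes(x = Spell.Length, y = Growth.Rate)) +
  geom_point(alpha = 0.3) +
  geom_line(data = pct, aes(y = median)) +
  geom_line(data = pct, aes(y = q25), linetype = "dashed") +
  geom_line(data = pct, aes(y = q75), linetype = "dashed") +
  geom_hline(yintercept = 3, colour = "red")
```

Because the summary has one row per Spell.Length, geom_line() connects the medians into a continuous line, which is what the crossbar version in the question could not do.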

Passing extra parameters into the stat function

I'm learning R and ggplot2. According to the instructions, geom and stat are usually interchangeable, as a geom has a default stat and a stat has a default geom.
My exercise is to create a plot in 3 ways: with stat, manually, and with geom_pointrange. I'm stuck at the third part:
library("tidyverse")
# With stat_summary
ggplot(data = diamonds) +
stat_summary(
mapping = aes(x = cut, y = depth),
fun.min = min,
fun.max = max,
fun = median
)
# Manually
diamonds_summary = diamonds %>%
group_by(cut) %>%
summarize(p = median(depth), lower = min(depth), upper = max(depth))
ggplot(diamonds_summary) +
geom_pointrange(
mapping = aes(x = cut, y = p, ymin = lower, ymax = upper)
)
# With geom_pointrange and stat
ggplot(data = diamonds) +
geom_pointrange(
mapping = aes(x = cut, y = median(depth)),
stat = "summary"
)
# Warning: No summary function supplied, defaulting to `mean_se()`
How can I pass two summary functions (min and max) to the function identified by the stat param?
All three solutions should produce the following output:
When you specify stat = "summary", the layer still needs to know how to summarize your values. The default is to use the mean with standard errors, but you want medians with min and max values. You can write your own summary function:
median_min_max <- function(x) {
data.frame(y=median(x), ymin=min(x), ymax=max(x))
}
And then pass that to your pointrange layer via the fun.data= parameter.
ggplot(data = diamonds) +
geom_pointrange(
mapping = aes(x = cut, y = depth),
stat = "summary", fun.data = median_min_max
)
This will give you a plot that matches the one you created from your summarized data.
ggplot2 does have a median_hilow() summary function, but it wraps Hmisc::smedian.hilow(), which uses quantiles based on a confidence interval rather than min/max values.
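As a quick sanity check, the custom summary function can be called on a small vector and the layer's computed values compared against a manual per-group median. This is only a sketch; p_stat, computed and manual are illustrative names.

```r
library(ggplot2)

# Same summary function as in the answer: returns one row with y, ymin, ymax.
median_min_max <- function(x) {
  data.frame(y = median(x), ymin = min(x), ymax = max(x))
}

# Called directly on a small vector: y = 6.5, ymin = 2, ymax = 10.
median_min_max(c(2, 4, 9, 10))

# The stat calls median_min_max() once per cut level.
p_stat <- ggplot(data = diamonds) +
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary", fun.data = median_min_max
  )

# Compare what the layer computed with a manual per-group median.
computed <- layer_data(p_stat)
manual <- tapply(diamonds$depth, diamonds$cut, median)
all.equal(as.numeric(computed$y), as.numeric(manual))
```

The names y, ymin and ymax matter: they are the aesthetics geom_pointrange() expects, so the data frame returned by the function has to use exactly those column names.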

overlay discrete and continuous layer in ggplot - surprised that layer order matters

Consider the following example dataset:
library(dplyr)
library(ggplot2)
d = mtcars %>%
as_tibble(rownames = "name") %>%
mutate(wt.cat = cut(wt, seq(1.5, 5.5, by = 1))) %>%
group_by(wt.cat) %>%
summarize(
Mean = mean(mpg),
Min = min(mpg),
Max = max(mpg)
)
Say I want to plot points for the "mean" value of each category in wt.cat and a ribbon showing the range. This works:
ggplot(d, aes(x = wt.cat)) +
geom_point(aes(y= Mean)) +
geom_ribbon(aes(x = as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue")
But the points are masked by the ribbon. However, if I change the order of the layers so that the points are plotted on top of the ribbon, I get an error:
ggplot(d, aes(x = wt.cat)) +
geom_ribbon(aes(x = as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue") +
geom_point(aes(y= Mean))
## Error: Discrete value supplied to continuous scale
So even though I'm specifying the discrete axis as the "default" aesthetic, it gets overridden by the specification of the first plotted layer. The only way I can find around this is to plot a dummy point layer first:
ggplot(d, aes(x = wt.cat)) +
geom_point(aes(y= Mean), shape = NA) +
geom_ribbon(aes(x = as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue") +
geom_point(aes(y= Mean))
## Warning message:
## Removed 4 rows containing missing values (geom_point).
Is there a more "proper" or correct way of combining discrete and continuous layers? Is there a solution that doesn't require creating a dummy layer?
Would something like this be a solution?
d %>% {
ggplot(., aes(x = wt.cat)) +
scale_x_discrete(labels = levels(.$wt.cat)) +
geom_ribbon(aes(x =as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue") +
geom_point(aes(y=Mean))
}
I just learned you can wrap a pipe with { } and then reference the entire data frame with .
As camille said, the issue is that geom_ribbon requires a continuous scale because it plots area across values relating to the adjacent position. I believe the scale gets converted to continuous when geom_ribbon is added, but the labels are maintained.
Hope this helps
As per my reply -- the following works just as well if you want ggplot2 to handle all the labeling
d %>%
ggplot(aes(x = wt.cat)) +
scale_x_discrete() +
geom_ribbon(aes(x =as.numeric(wt.cat), ymin = Min, ymax = Max), fill = "blue") +
geom_point(aes(y=Mean))
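A different technique from the answers above, offered only as a sketch: keep x discrete for every layer and set group = 1 so that geom_ribbon() treats all categories as one connected series. With no as.numeric() conversion, layer order should no longer matter. The alpha value and the name p_ribbon are illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Same summary as in the question, rebuilt here so the block is self-contained.
d <- mtcars %>%
  mutate(wt.cat = cut(wt, seq(1.5, 5.5, by = 1))) %>%
  group_by(wt.cat) %>%
  summarise(Mean = mean(mpg), Min = min(mpg), Max = max(mpg))

# group = 1 puts all categories into a single group, so the ribbon is drawn
# across the discrete x positions instead of as one isolated point per level.
p_ribbon <- ggplot(d, aes(x = wt.cat, group = 1)) +
  geom_ribbon(aes(ymin = Min, ymax = Max), fill = "blue", alpha = 0.3) +
  geom_point(aes(y = Mean))
```

Without group = 1, a discrete x puts every category into its own group, and a ribbon with a single point per group has nothing to span.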

Adding labels in ggplot for summary statistics

About 18 months ago, this helpful exchange appeared, with code to show how to produce a plot of median along with interquartile ranges. Here's the code:
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymin = function(z) {quantile(z,0.25)},
fun.ymax = function(z) {quantile(z,0.75)},
fun.y = median)
Producing this plot:
What I'm wondering is how to add labels for the median and interquartile ranges, and how to format the bar (color, alpha, etc.). I tried saving the plot as an object to see whether it contained anything I could pass to formatting functions, but nothing was obvious when I looked at it in the RStudio IDE.
Is this even doable? I know I can do a boxplot, but that would have to include min/max. I'd like to produce boxplots with just the mean/median and IQRs.
You can change the formatting as you would for any ggplot layer; see the docs for "Vertical intervals: lines, crossbars & errorbars" in this case. An example of this is the following:
library(ggplot2)
ggplot(data = diamonds) +
geom_pointrange(mapping = aes(x = cut, y = depth),
stat = "summary",
fun.ymin = function(z) {quantile(z,0.25)},
fun.ymax = function(z) {quantile(z,0.75)},
fun.y = median,
size = 4, # <- adjusts size
colour = "red", # <- adjusts colour
alpha = .3) # <- adjusts transparency
If you want to control formatting for the points and lines individually, you need to do as @camille suggests and pre-process your data, because geom_pointrange() draws a single graphical object, so the points and lines are one and the same.
I would suggest something like this:
library(dplyr)
library(ggplot2)
diamonds %>%
group_by(cut) %>%
summarise(median = median(depth),
lq = quantile(depth, 0.25),
uq = quantile(depth, 0.75)) %>%
ggplot(aes(cut, median)) +
geom_linerange(aes(ymin=lq, ymax=uq), size = 4, colour = "blue", alpha = .4) +
geom_point(size = 10, colour = "red", alpha = .8)
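The question also asks about labels, which the code above does not cover. One way to add them, sketched here under the assumption that the data is pre-summarised as in this answer, is to add geom_text() layers that read from the same summary columns. The nudge_x offset, the text sizes and the names iqr_summary and p_labels are illustrative.

```r
library(dplyr)
library(ggplot2)

# One row per cut with the median and the quartiles.
iqr_summary <- diamonds %>%
  group_by(cut) %>%
  summarise(
    median = median(depth),
    lq = quantile(depth, 0.25),
    uq = quantile(depth, 0.75)
  )

# Each geom_text() layer places a label at one of the three summary values;
# nudge_x shifts the text to the right so it does not sit on top of the bar.
p_labels <- ggplot(iqr_summary, aes(x = cut, y = median)) +
  geom_linerange(aes(ymin = lq, ymax = uq), colour = "blue", alpha = 0.4) +
  geom_point(colour = "red") +
  geom_text(aes(label = median), nudge_x = 0.25) +
  geom_text(aes(y = lq, label = lq), nudge_x = 0.25, size = 3) +
  geom_text(aes(y = uq, label = uq), nudge_x = 0.25, size = 3)
```

Wrapping the label values in round() or format() inside aes() is a simple way to control how many digits are shown.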
