I am trying to draw an ECDF of some data with a "confidence interval" represented via a shaded region using ggplot2. I am having trouble combining geom_ribbon() with stat_ecdf() to achieve the effect I am after.
Consider the following example data:
set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)
dat <- transform(dat, lower = variable - 2, upper = variable + 2)
> head(dat)
variable lower upper
1 2.534484 0.5344838 4.534484
2 3.201587 1.2015872 5.201587
3 2.433602 0.4336018 4.433602
4 6.929713 4.9297132 8.929713
5 3.390284 1.3902836 5.390284
6 2.440225 0.4402254 4.440225
I am able to produce an ECDF of variable using
library("ggplot2")
ggplot(dat, aes(x = variable)) +
geom_step(stat = "ecdf")
However I am unable to use lower and upper as the ymin and ymax aesthetics of geom_ribbon() to superimpose the confidence interval on the plot as another layer. I have tried:
ggplot(dat, aes(x = variable)) +
geom_ribbon(aes(ymin = lower, ymax = upper), stat = "ecdf") +
geom_step(stat = "ecdf")
but this raises the following error
Error: geom_ribbon requires the following missing aesthetics: ymin, ymax
Is there a way to coax geom_ribbon() into working with stat_ecdf() to produce a shaded confidence interval? Or, can anyone suggest an alternative means of adding a shaded polygon defined by lower and upper as a layer to the ECDF plot?
Try this (a bit of a shot in the dark):
ggplot(dat, aes(x = variable)) +
geom_ribbon(aes(x = variable,ymin = ..y..-2,ymax = ..y..+2), stat = "ecdf",alpha=0.2) +
geom_step(stat = "ecdf")
OK, so that's not the same thing as what you are trying to do, but it should explain what's going on. The stat returns a data frame with just the original x and the computed y, so I think that's all you have to work with; i.e., stat_ecdf only computes the cumulative distribution function for a single x at a time.
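To see what the stat actually hands to the geom, the computed layer data can be inspected directly. This is a small sketch using ggplot2's layer_data() helper on the example data; the extra columns it returns may vary between ggplot2 versions, but lower and upper are not among them.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)

p <- ggplot(dat, aes(x = variable)) +
  geom_step(stat = "ecdf")

# Data for layer 1 after the stat has run: x values and the ECDF at each of them
computed <- layer_data(p, 1)
head(computed[, c("x", "y")])
names(computed)
```

Because only x and y survive the stat, a ribbon layer that uses stat = "ecdf" has nothing to map ymin and ymax to unless they are derived from the computed y.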
The only other thing I can think of is the obvious, calculating the lower and upper separately, something like this:
l <- ecdf(dat$lower)
u <- ecdf(dat$upper)
v <- ecdf(dat$variable)
dat$lower1 <- l(dat$variable)
dat$upper1 <- u(dat$variable)
dat$variable1 <- v(dat$variable)
ggplot(dat,aes(x = variable)) +
geom_step(aes(y = variable1)) +
geom_ribbon(aes(ymin = upper1,ymax = lower1),alpha = 0.2)
Not sure exactly how you want to reflect the CI, but ggplot_build() lets you get the generated data back from the plot; you can then overplot what you like.
This chart shows:
red = original ribbon
blue = takes the original CI vectors and applies to the ecdf curve
green = calculates the ecdf of upper and lower series and plots
g<-ggplot(dat, aes(x = variable)) +
geom_step(stat = "ecdf") +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.5, fill="red")
inside<-ggplot_build(g)
matched<-merge(inside$data[[1]],data.frame(x=dat$variable,dat$lower,dat$upper),by=("x"))
g +
geom_ribbon(data=matched, aes(x = x,
ymin = y - x + dat.lower,
ymax = y + dat.upper - x),
alpha=0.5, fill="blue") +
geom_ribbon(data=matched, aes(x = x,
ymin = ecdf(dat.upper)(x),
ymax = ecdf(dat.lower)(x)),
alpha=0.5, fill="green")
Related
I have a question about ggplot2.
I want to connect each data point to the OLS fit with a vertical line, as in the code listed below.
Can I transfer ..y.., the value calculated by stat_smooth, to geom_linerange directly?
I tried stat_smooth(..., geom = "linerange", mapping = aes(ymin = pmin(myy, ..y..), ymax = pmax(myy, ..y..))), but it does not give the result I want.
library(ggplot2)
df <- data.frame(myx = 1:10,
myy = c(1:10) * 5 + 2 * rnorm(10, 0, 1))
lm.fit <- lm(myy ~ myx, data = df)
pred <- predict(lm.fit)
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(mapping = aes(ymin = pmin(myy, pred),
ymax = pmax(myy, pred)))
stat_smooth evaluates the values at n evenly spaced points, with n = 80 by default. These points may not coincide with the original x values in your data frame.
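This can be checked by pulling the computed data out of the smooth layer. The sketch below rebuilds a data frame like the one in the question (the seed is arbitrary and only there for reproducibility) and uses layer_data() to look at the grid the fit is evaluated on:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
df <- data.frame(myx = 1:10,
                 myy = (1:10) * 5 + 2 * rnorm(10))

p <- ggplot(df, aes(myx, myy)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

smooth_grid <- layer_data(p, 1)
nrow(smooth_grid)     # number of points the fit is evaluated at
head(smooth_grid$x)   # evenly spaced over the x range, mostly not the integers 1:10
```

Since the rows of the smooth layer do not line up with the rows of df, there is no per-observation ..y.. available for a linerange to use.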
Since you are calculating predicted values anyway, it would probably be more straightforward to add that back to your data frame and plot all geom layers based on that as your data source, for example:
df$pred <- pred
ggplot(df, aes(myx, myy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_linerange(aes(ymin = myy, ymax = pred))
In base R it is easy (but cumbersome) to create a plot with error bars from descriptive statistics. With ggplot2 I am struggling to do so, and all the examples I have found are based on the raw data.
Specifically, how can I create a barplot with confidence intervals for a simple two-group design? M1 = 3, M2 = 4, SD1 = 1, SD2 = 1.2, n1 = 111, n2 = 222? I started off simply with
ggplot(aes(x=c(1:2), y=c(3, 4))) + geom_bar()
# or
ggplot(aes(y=c(3, 4))) + geom_bar()
but not even this seems to work to create a barplot.
Any suggestions?
What about using ggplot2::stat_summary()? You can let it take care of your mean and se calculations (it relies on library(Hmisc) for most of these summary functions, so look there for more help).
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(geom = "bar", fun = mean) +  # 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0)
stat_summary(geom = "errorbar", fun.data = mean_se)
Adjust width = for skinnier bars or error bars.
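For reference, a fun.data function is just a function that takes the y values of one group and returns a one-row data frame with y, ymin and ymax columns. The sketch below defines a hypothetical helper, mean_ci95 (not part of ggplot2 or Hmisc), that uses a normal approximation, and plugs it into stat_summary(). Converting cyl to a factor is my choice here so that each cylinder count is treated as its own group.

```r
library(ggplot2)

# Hypothetical helper: group mean with a normal-approximation 95% interval
mean_ci95 <- function(x) {
  m  <- mean(x)
  se <- sd(x) / sqrt(length(x))
  data.frame(y = m,
             ymin = m - qnorm(0.975) * se,
             ymax = m + qnorm(0.975) * se)
}

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  stat_summary(geom = "errorbar", fun.data = mean_ci95, width = 0.2) +
  stat_summary(geom = "point", fun.data = mean_ci95)

# One summary row per cylinder group, as computed for the error bar layer
summary_data <- layer_data(p, 1)
summary_data[, c("x", "y", "ymin", "ymax")]
```

Any function with this shape can be passed as fun.data, which is what mean_se and the Hmisc-backed mean_cl_* functions do.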
You can also show a true confidence interval with mean_cl_normal or mean_cl_boot (both need Hmisc installed), and use a crossbar for a better visualization of the data dispersion:
ggplot(mtcars, aes(cyl, mpg)) +
stat_summary(geom = "crossbar", fun.data = mean_cl_normal)
Edit:
If you want to recreate a plot from a published paper, just roll the summary statistics into a data.frame first:
datf <- data.frame(
group = c("1", "2"),
means = c(3,4),
sds = c(1,1.2),
ns = c(111, 222)
)
# add your CI calcs as column called upr and lwr
library(tidyverse)
datf <- datf %>% mutate(lwr = means - (qnorm(.975)*(sds/sqrt(ns))),
upr = means + (qnorm(.975)*(sds/sqrt(ns))))
ggplot(datf, aes(group, y = means, ymin = lwr, ymax = upr)) +
geom_crossbar()
Or, if you must, the traditional columns with error bars, like this:
ggplot(datf, aes(group, y = means, ymin = lwr, ymax = upr)) +
geom_col() +
geom_errorbar()
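The interval arithmetic in the mutate() call above can be checked in base R without the tidyverse. This is only a sketch of the same normal-approximation formula, mean ± z × sd / sqrt(n), applied to the numbers from the question:

```r
means <- c(3, 4)
sds   <- c(1, 1.2)
ns    <- c(111, 222)

se  <- sds / sqrt(ns)   # standard error of each mean
z   <- qnorm(0.975)     # about 1.96 for a 95% interval
lwr <- means - z * se
upr <- means + z * se

round(cbind(lwr, upr), 3)
```

This gives roughly 2.81 to 3.19 for group 1 and 3.84 to 4.16 for group 2. With samples this large, a t quantile such as qt(0.975, ns - 1) gives nearly the same bounds.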
You can draw an error bar to whatever values you want. Error bars have aesthetics called ymin and ymax that you can set. Here I draw the bars +/- 1 standard deviation from the mean:
dd<-read.table(text="sample mean sd n
1 3 1 111
2 4 1.2 222", header=T)
ggplot(dd, aes(sample)) +
geom_col(aes(y=mean)) +
geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd))
I'm hoping someone can help with this plotting problem I have. The data can be found here.
Basically I want to plot a line (mean) and its associated confidence interval (lower, upper) for 4 models I have tested. I want to facet on the Cat_Auth variable, for which there are 4 categories (so 4 plots). The first 'model' is actually just the mean of the sample data and I don't want a CI for this (NA values specified in the data - not sure if this is the correct thing to do).
I can get the plot some way there with:
newdata <- read.csv("data.csv", header=T)
ggplot(newdata, aes(x = Affil_Max, y = Mean)) +
geom_line(data = newdata, aes(), colour = "blue") +
geom_ribbon(data = newdata, alpha = .5, aes(ymin = Lower, ymax = Upper, group = Model, fill = Model)) +
facet_grid(.~ Cat_Auth)
But I'd like different coloured lines and shaded ribbons for each model (e.g. a red mean line and red shaded ribbon for model 2, green for model 3 etc). Also, I can't figure out why the blue line corresponding to the first set of mean values is disjointed as it is.
Would be really grateful for any assistance!
Try this:
library(dplyr)
library(ggplot2)
newdata %>%
mutate(Model = as.factor(Model)) %>%
ggplot(aes(Affil_Max, Mean)) +
geom_line(aes(color = Model, group = Model)) +
geom_ribbon(alpha = .5, aes(ymin = Lower, ymax = Upper,
group = Model, fill = Model)) +
facet_grid(. ~ Cat_Auth)
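The answer above does not say why the original line looked disjointed. A likely cause is that, without a group or a discrete colour mapping, geom_line() joins the rows of all models into a single path ordered by x. Since the linked data are not reproduced here, the sketch below uses made-up values (toy and its numbers are purely illustrative) to show the difference in grouping:

```r
library(ggplot2)

# Made-up stand-in for the linked data: 3 models, 2 facets
toy <- expand.grid(Affil_Max = 1:5, Model = 1:3, Cat_Auth = c("A", "B"))
toy$Mean  <- 0.5 * toy$Affil_Max + toy$Model
toy$Lower <- toy$Mean - 0.3
toy$Upper <- toy$Mean + 0.3

# No grouping: a single zig-zag path per facet
p_single <- ggplot(toy, aes(Affil_Max, Mean)) +
  geom_line(colour = "blue") +
  facet_grid(. ~ Cat_Auth)

# Model as a factor: one ribbon and one line per model, in matching colours
toy$Model <- factor(toy$Model)
p_grouped <- ggplot(toy, aes(Affil_Max, Mean, group = Model)) +
  geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = Model), alpha = 0.3) +
  geom_line(aes(colour = Model)) +
  facet_grid(. ~ Cat_Auth)

length(unique(layer_data(p_single, 1)$group))   # groups in the ungrouped line layer
length(unique(layer_data(p_grouped, 2)$group))  # groups in the grouped line layer
```

Converting Model to a factor matters for the colours as well: a numeric Model would be mapped to a continuous gradient rather than to distinct hues.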
I intend to put four graphs in a single page. Each plot shows the point estimate of a single statistic and its confidence interval. I am struggling with altering the width of geom_errorbar whisker in each plot. It does not seem to change, even though I alter the width argument in geom_errorbar().
It is important for me to graph those four statistics separately because both point estimates and confidence intervals are defined in different ranges for each statistic, as you can notice on the graph below. The multiplot function I use to plot multiple graphs is defined in http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/.
#creates data.frame where point estimates and confidence intervals will be
#stored
#the numbers entered in df are similar to the ones I get from previously
#performed regressions
w<-c(1:4)
x<-c(0.68,0.87,2.93,4.66)
y<-c(0.47,0.57,0.97,3.38)
z<-c(0.83,1.34,4.17,7.46)
df<-data.frame(w,x,y,z)
#plot each statistic
#(each row from df is a statistic: w for index, x for point estimate,
#y for ci lower bound and z for ci upper bound)
p1 <- ggplot(df[1,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
labs(x="",y="")
p2 <- ggplot(df[2,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
labs(x="",y="")
p3 <- ggplot(df[3,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
labs(x="",y="")
p4 <- ggplot(df[4,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
labs(x="",y="")
multiplot(p1, p2, p3, p4, cols=2)
I greatly appreciate any help and advice.
Thanks,
Gabriel
[Example plot not shown.] How can I change the error bar whisker width for each graph separately?
The width is changing, but the x-axis is scaling to match the width of the error bar. You need to set the x axis manually using, for example, xlim.
For p1, you could try + xlim(0.8, 1.2)
Alternatively you could use the expand argument to scale_x_continuous, e.g. scale_x_continuous(expand = c(0, 0.1)).
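The effect is easier to see with the x limits pinned. In this sketch, make_plot is a hypothetical helper that reuses the first row of df from the question and maps the lower bound y to ymin and the upper bound z to ymax. Both plots share xlim(0.5, 1.5), so whisker widths of 0.1 and 0.4 are drawn at visibly different sizes:

```r
library(ggplot2)

df <- data.frame(w = 1, x = 0.68, y = 0.47, z = 0.83)

# Hypothetical helper: same plot, only the whisker width differs
make_plot <- function(whisker_width) {
  ggplot(df, aes(x = w, y = x)) +
    geom_point(size = 4) +
    geom_errorbar(aes(ymin = y, ymax = z), width = whisker_width) +
    xlim(0.5, 1.5) +
    labs(x = "", y = "")
}

p_narrow <- make_plot(0.1)
p_wide   <- make_plot(0.4)

# Whisker extent in x units for each plot (layer 2 is the error bar)
narrow <- layer_data(p_narrow, 2)
wide   <- layer_data(p_wide, 2)
c(narrow = narrow$xmax - narrow$xmin, wide = wide$xmax - wide$xmin)
```

The width is always expressed in x-axis units, so it only looks different once the axis range stops adapting to it.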
I'd like to plot a mirrored 95% density curve and map alpha to the density:
foo <- function(mw, sd, lower, upper) {
x <- seq(lower, upper, length=500)
dens <- dnorm(x, mean=mw, sd=sd, log=TRUE)
dens0 <- dens -min(dens)
return(data.frame(dens0, x))
}
df.rain <- foo(0,1,-1,1)
library(ggplot2)
drf <- ggplot(df.rain, aes(x=x, y=dens0))+
geom_line(aes(alpha=..y..))+
geom_line(aes(x=x, y=-dens0, alpha=-..y..))+
stat_identity(geom="segment", aes(xend=x, yend=0, alpha=..y..))+
stat_identity(geom="segment", aes(x=x, y=-dens0, xend=x, yend=0, alpha=-..y..))
drf
This works fine, but I'd like to make the contrast between the edges and the middle more prominent, i.e., I want the edges to be nearly white and only the middle part to be black. I've been tampering with scale_alpha() but without luck. Any ideas?
Edit: Ultimately, I'd like to plot several raindrops, i.e., the individual drops will be small but the shading should still be clearly visible.
Instead of mapping dens0 to the alpha, I'd map it to color:
drf <- ggplot(df.rain, aes(x=x, y=dens0))+
geom_line(aes(color=..y..))+
geom_line(aes(x=x, y=-dens0, color=-..y..))+
stat_identity(geom="segment", aes(xend=x, yend=0, color=..y..))+
stat_identity(geom="segment", aes(x=x, y=-dens0, xend=x, yend=0, color=-..y..))
The contrast in color is still mainly present in the tails. Using two colors helps a bit (note that the switch in color is at 0.25):
drf + scale_color_gradient2(midpoint = 0.25)
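One way to read the 0.25: for foo(0, 1, -1, 1) the shifted log-density works out to dens0 = (1 - x^2) / 2, which runs from 0 at the edges to 0.5 in the middle, so 0.25 is the centre of that range. A quick base R check that repeats the computation from foo():

```r
x     <- seq(-1, 1, length.out = 500)
dens  <- dnorm(x, mean = 0, sd = 1, log = TRUE)
dens0 <- dens - min(dens)

range(dens0)                      # from 0 up to just under 0.5
all.equal(dens0, (1 - x^2) / 2)   # matches the closed form
```

For other arguments to foo() the range of dens0 changes, so a fixed midpoint of 0.25 would no longer sit in the middle.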
Finally, to include the distribution of the dens0 values, I base the midpoint of the color scale on the median value in the data:
drf + scale_color_gradient2(midpoint = median(df.rain$dens0))
Note: however you tweak the scale, most of the contrast in your data is in the more extreme values of the dataset. Trying to mask this with a non-linear scale, or by tweaking a color scale as I did, could present a false picture of the real data.
Here is a solution using geom_ribbon() instead of geom_line():
df.rain$group <- seq_along(df.rain$x)
tmp <- tail(df.rain, -1)
tmp$group <- tmp$group - 1
tmp$dens0 <- head(df.rain$dens0, -1)
dataset <- rbind(head(df.rain, -1), tmp)
ggplot(dataset, aes(x = x, ymin = -dens0, ymax = dens0, group = group,
alpha = dens0)) + geom_ribbon() + scale_alpha(range = c(0, 1))
ggplot(dataset, aes(x = x, ymin = -dens0, ymax = dens0, group = group,
fill = dens0)) + geom_ribbon() +
scale_fill_gradient(low = "white", high = "black")
See Paul's answer for changing the colours.
dataset9 <- merge(dataset, data.frame(study = 1:9))
ggplot(dataset9, aes(x = x, ymin = -dens0, ymax = dens0, group = group,
alpha = dens0)) + geom_ribbon() + scale_alpha(range = c(0, 0.5)) +
facet_wrap(~study)
While pondering both your answers I actually found exactly what I was looking for. The easiest way is to simply use scale_colour_gradientn with a vector of greys.
library(RColorBrewer)
grey <- brewer.pal(9,"Greys")
drf <- ggplot(df.rain, aes(x=x, y=dens0, col=dens0))+
stat_identity(geom="segment", aes(xend=x, yend=0))+
stat_identity(geom="segment", aes(x=x, y=-dens0, xend=x, yend=0))+
scale_colour_gradientn(colours=grey)
drf