Cannot alter geom_errorbar width inside multiplot - r

I intend to put four graphs on a single page. Each plot shows the point estimate of a single statistic and its confidence interval. I am struggling to alter the width of the geom_errorbar whiskers in each plot: the width does not seem to change, even though I alter the width argument in geom_errorbar().
It is important for me to graph the four statistics separately because both the point estimates and the confidence intervals lie in different ranges for each statistic, as you can see in the graph below. The multiplot function I use to arrange multiple graphs is defined at http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/.
# create the data.frame where the point estimates and confidence intervals
# will be stored
# (the numbers entered in df are similar to the ones I get from previously
# performed regressions)
library(ggplot2)

w <- c(1:4)
x <- c(0.68, 0.87, 2.93, 4.66)
y <- c(0.47, 0.57, 0.97, 3.38)
z <- c(0.83, 1.34, 4.17, 7.46)
df <- data.frame(w, x, y, z)
# plot each statistic
# (each row of df is one statistic: w is the index, x the point estimate,
# y the CI lower bound and z the CI upper bound)
p1 <- ggplot(df[1, ], aes(x = w, y = x)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
  labs(x = "", y = "")
p2 <- ggplot(df[2, ], aes(x = w, y = x)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
  labs(x = "", y = "")
p3 <- ggplot(df[3, ], aes(x = w, y = x)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
  labs(x = "", y = "")
p4 <- ggplot(df[4, ], aes(x = w, y = x)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
  labs(x = "", y = "")
multiplot(p1, p2, p3, p4, cols = 2)
I greatly appreciate any help and advice.
Thanks,
Gabriel
[Example plot: the four graphs produced by multiplot.] How can I change the error bar whisker width for each graph separately?

The width is changing, but the x axis is rescaling to match the width of the error bar. Because each plot contains a single x value, the default axis expansion is driven by the error bar itself, so the whiskers always span the same fraction of the panel. You need to set the x axis manually, for example with xlim.
For p1, you could try + xlim(0.8, 1.2).
Alternatively, you could use the expand argument of scale_x_continuous, e.g. scale_x_continuous(expand = c(0, 0.1)).
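For example, a minimal sketch of the first plot with each approach (assuming the df from the question; the 0.8 to 1.2 range is just the example above, so adjust it to your data):
library(ggplot2)

# Option 1: pin the x axis so the panel no longer rescales around the single point;
# the whisker width of 0.1 x units then shows up as intended.
p1 <- ggplot(df[1, ], aes(x = w, y = x)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
  xlim(0.8, 1.2) +
  labs(x = "", y = "")

# Option 2: keep automatic limits but use a fixed additive expansion
# instead of the default expansion around the single point.
p1_alt <- ggplot(df[1, ], aes(x = w, y = x)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = y, ymax = z), width = .1) +
  scale_x_continuous(expand = c(0, 0.1)) +
  labs(x = "", y = "")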

Related

Expanding confidence interval in R ggplots using coord_cartesian -- not working [duplicate]

My ggplot R code works perfectly fine with my other datasets, but I'm stumped as to why it's not working for one particular dataset. See the image below, where the filled confidence interval stops at 0.10:
For reproducing the problem:
library(nlme)
library(ggeffects)
library(ggplot2)
SurfaceCoverage <- c(0.02,0.04,0.06,0.08,0.1,0.12,0.02,0.04,0.06,0.08,0.1,0.12)
SpecificSurfaceEnergy <- c(18.0052997,15.9636971,14.2951057,13.0263081,13.0816591,13.3825573,2.9267577,2.2889628,1.8909175,1.0083036,0.5683574,0.1681063)
sample <- c(1,1,1,1,1,1,2,2,2,2,2,2)
highW <- data.frame(sample,SurfaceCoverage,SpecificSurfaceEnergy)
highW$sample <- sub("^", "Wettable", highW$sample)
highW$RelativeHumidity <- "High relative humidity"; highW$group <- "Wettable"
highW$sR <- paste(highW$sample,highW$RelativeHumidity)
dfhighW <- data.frame(
  y = highW$SpecificSurfaceEnergy,
  x = highW$SurfaceCoverage,
  b = highW$sample,
  sR = highW$sR
)
mixed.lme <- lme(y ~ log(x), random = ~1|b, data = dfhighW)
pred.mmhighW <- ggpredict(mixed.lme, terms = c("x"))
(ggplot(pred.mmhighW) +
    geom_line(aes(x = x, y = predicted)) +  # slope
    geom_ribbon(aes(x = x, ymin = predicted - std.error, ymax = predicted + std.error),
                fill = "lightgrey", alpha = 0.5) +  # error band
    geom_point(data = dfhighW,  # adding the raw data (scaled values)
               aes(x = x, y = y, shape = b)) +
    xlim(0.01, 0.2) +
    ylim(0, 30) +
    labs(title = "") +
    ylab(bquote('Specific Surface Energy ' (mJ/m^2))) +
    xlab(bquote('Surface Coverage ' (n/n[m]))) +
    theme_minimal()
)
Can someone advise me how to fix this? Thanks.
The last part of your ribbon has disappeared because you have excluded it from the plot. The lower edge of your ribbon is the following vector:
pred.mmhighW$predicted - pred.mmhighW$std.error
#> [1] 3.91264018 2.37386628 1.47061258 0.82834206 0.32935718 -0.07886245
Note that the final value is a small negative number, but you have set the y-axis limits with:
ylim(0, 30)
so anything negative is cut off. If you change this to
ylim(-2, 30)
you get the full ribbon.
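For reference, a minimal sketch of the question's plot with only the y limits widened (the raw-data layer and axis labels are left out here for brevity; everything else is as in the question):
library(ggplot2)

# Widening ylim keeps the slightly negative lower edge of the ribbon
# from being dropped.
ggplot(pred.mmhighW) +
  geom_line(aes(x = x, y = predicted)) +
  geom_ribbon(aes(x = x, ymin = predicted - std.error, ymax = predicted + std.error),
              fill = "lightgrey", alpha = 0.5) +
  xlim(0.01, 0.2) +
  ylim(-2, 30) +
  theme_minimal()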
I don't know whether this has already been answered elsewhere, but coord_cartesian and scales::squish are two other solutions to this problem.
coord_cartesian() only changes the viewport: the underlying data, scale and grid-line spacing are left alone, unlike xlim()/scale_*_continuous(limits = ...), which drop out-of-bounds values.
scales::squish() is suboptimal if you are "squishing" lines and points rather than edgeless polygons (for fills/polygons, squishing and clipping produce the same result).
gg0 <- (ggplot(pred.mmhighW)
        + geom_ribbon(aes(x = x, ymin = predicted - std.error,
                          ymax = predicted + std.error),
                      fill = "lightgrey", alpha = 0.5)
        + theme_minimal()
)
## set the lower limit to 5 for a more obvious effect
gg0 + coord_cartesian(ylim = c(5, 30))
gg0 + scale_y_continuous(limits = c(5, 30),
                         ## oob = "out of bounds" behaviour
                         oob = scales::squish)

How to apply separate coord_cartesian() to "zoom in" into individual panels of a facet_grid()?

Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degree of freedom) influences the approximation and the first and second derivative. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
  geom_point(data = dp, alpha = 0.2) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has chosen a scale which displays all the data. Unfortunately, one panel has some "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
  geom_point(data = dp, alpha = 0.2) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw() +
  coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have become visible. But the limit has been applied to all panels of the grid, so in row 1 details can hardly be distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
  geom_point(data = dp, alpha = 0.2) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw() +
  coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw() +
  coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution
requires writing code to create three separate plots, and
duplicates strips and axes, adding whitespace which is no longer available for displaying the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
  # geom_point(data = dp, alpha = 0.2) +  # this line causes an error message
  geom_line() +
  facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
  theme_bw()
Now the y-axis of every panel is scaled individually, revealing details in some of the panels. Unfortunately, we still can't "zoom in" on the bottom-right panel, because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
  x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
        6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
        3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
        1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
  y = 1:32,
  deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
  lapply(seq(2, length(dp$x), length.out = 4),
         function(df) {
           rbindlist(
             lapply(0:2,
                    function(deriv) {
                      result <- as.data.table(
                        predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
                      result[, c("df", "deriv") := list(df, deriv)]
                    })
           )
         })
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
  group_by(deriv) %>%
  mutate(upper = quantile(y, 0.75),
         lower = quantile(y, 0.25),
         IQR.multiplier = (upper - lower) * 10) %>%
  ungroup() %>%
  mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
  mutate(y = ifelse(is.outlier, NA, y)) %>%
  ggplot(aes(x, y)) +
  geom_point(data = dp, alpha = 0.2) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, and draw the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting all values more than 10*IQR away from the upper / lower quartile in each facet row to NA. This can be replaced by any other logic for defining outliers.
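As an illustration of swapping in different outlier logic, here is a hypothetical variant (a sketch, not part of the original answer): clip each facet row to its 1st and 99th percentiles instead of using the 10*IQR criterion, then reuse Steps 2 and 3 unchanged with p.alt2 in place of p.alt.
# Assumed alternative rule: values outside the 1st-99th percentile of their
# deriv row are treated as outliers and set to NA.
p.alt2 <- ap %>%
  group_by(deriv) %>%
  mutate(y = ifelse(y < quantile(y, 0.01) | y > quantile(y, 0.99), NA, y)) %>%
  ungroup() %>%
  ggplot(aes(x, y)) +
  geom_point(data = dp, alpha = 0.2) +
  geom_line() +
  facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
  theme_bw()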

How to prevent line to extend across whole graph

Currently, the code below (part of a more comprehensive script) generates a line that ranges from the very left to the very right of the graph.
geom_abline(intercept=-8.3, slope=1/1.415, col = "black", size = 1,
lty="longdash", lwd=1) +
However, I would like the line to only range from x=1 to x=9; the limits of the x-axis are 1-9.
In ggplot2, is there a command to reduce a line that is derived from a manually defined intercept and slope to only cover the range of the x-axis value limits?
You could use geom_segment instead of geom_abline if you want to manually define the line. If your slope is derived from the dataset you are plotting from, the easiest thing to do is use stat_smooth with method = "lm".
Here is an example with some toy data:
set.seed(16)
x = runif(100, 1, 9)
y = -8.3 + (1/1.415)*x + rnorm(100)
dat = data.frame(x, y)
Estimate intercept and slope:
coef(lm(y ~ x))
#> (Intercept)           x
#>  -8.3218990   0.7036189
First make the plot with geom_abline for comparison:
ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = -8.32, slope = 0.704) +
  xlim(1, 9)
Using geom_segment instead, you have to define the start and end of the line for both x and y; make sure the line is truncated between 1 and 9 on the x axis.
ggplot(dat, aes(x, y)) +
  geom_point() +
  geom_segment(aes(x = 1, xend = 9, y = -8.32 + .704, yend = -8.32 + .704*9)) +
  xlim(1, 9)
Using stat_smooth: this will draw the line only within the range of the explanatory variable by default.
ggplot(dat, aes(x, y)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, color = "black") +
  xlim(1, 9)
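For contrast, stat_smooth() also takes a fullrange argument; a brief sketch to illustrate the default mentioned above (fullrange = TRUE extends the fitted line across the whole x scale, i.e. the behaviour the question is trying to avoid):
ggplot(dat, aes(x, y)) +
  geom_point() +
  # fullrange = TRUE draws the fit over the full scale range rather than
  # only over the range of the data
  stat_smooth(method = "lm", se = FALSE, color = "black", fullrange = TRUE) +
  xlim(1, 9)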

How to combine stat_ecdf with geom_ribbon?

I am trying to draw an ECDF of some data with a "confidence interval" represented via a shaded region using ggplot2. I am having trouble combining geom_ribbon() with stat_ecdf() to achieve the effect I am after.
Consider the following example data:
set.seed(1)
dat <- data.frame(variable = rlnorm(100) + 2)
dat <- transform(dat, lower = variable - 2, upper = variable + 2)
> head(dat)
variable lower upper
1 2.534484 0.5344838 4.534484
2 3.201587 1.2015872 5.201587
3 2.433602 0.4336018 4.433602
4 6.929713 4.9297132 8.929713
5 3.390284 1.3902836 5.390284
6 2.440225 0.4402254 4.440225
I am able to produce an ECDF of variable using
library("ggplot2")
ggplot(dat, aes(x = variable)) +
geom_step(stat = "ecdf")
However I am unable to use lower and upper as the ymin and ymax aesthetics of geom_ribbon() to superimpose the confidence interval on the plot as another layer. I have tried:
ggplot(dat, aes(x = variable)) +
geom_ribbon(aes(ymin = lower, ymax = upper), stat = "ecdf") +
geom_step(stat = "ecdf")
but this raises the following error
Error: geom_ribbon requires the following missing aesthetics: ymin, ymax
Is there a way to coax geom_ribbon() into working with stat_ecdf() to produce a shaded confidence interval? Or, can anyone suggest an alternative means of adding a shaded polygon defined by lower and upper as a layer to the ECDF plot?
Try this (a bit of a shot in the dark):
ggplot(dat, aes(x = variable)) +
  geom_ribbon(aes(x = variable, ymin = ..y.. - 2, ymax = ..y.. + 2),
              stat = "ecdf", alpha = 0.2) +
  geom_step(stat = "ecdf")
OK, so that's not the same thing as what you are trying to do, but it should explain what's going on. The stat returns a data frame with just the original x and the computed y, so that's all you have to work with; i.e., stat_ecdf only computes the cumulative distribution function of a single x variable at a time.
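To see this concretely, a small sketch that inspects the data the stat produces (layer_data() is the ggplot2 helper that returns a layer's built data; only x, the computed y and some bookkeeping columns come back):
p <- ggplot(dat, aes(x = variable)) +
  geom_step(stat = "ecdf")
# no lower/upper columns are carried through, which is why geom_ribbon
# cannot find its ymin/ymax aesthetics from this stat
head(layer_data(p, 1))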
The only other thing I can think of is the obvious one: calculating the lower and upper ECDFs separately, something like this:
l <- ecdf(dat$lower)
u <- ecdf(dat$upper)
v <- ecdf(dat$variable)
dat$lower1 <- l(dat$variable)
dat$upper1 <- u(dat$variable)
dat$variable1 <- v(dat$variable)
ggplot(dat, aes(x = variable)) +
  geom_step(aes(y = variable1)) +
  geom_ribbon(aes(ymin = upper1, ymax = lower1), alpha = 0.2)
Not sure exactly how you want to reflect the CI, but ggplot_build() lets you get the generated data back from the plot, and you can then overplot whatever you like.
This chart shows:
red = the original ribbon
blue = the original CI vectors applied as offsets to the ECDF curve
green = the ECDFs of the upper and lower series
g <- ggplot(dat, aes(x = variable)) +
  geom_step(stat = "ecdf") +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.5, fill = "red")
inside <- ggplot_build(g)
matched <- merge(inside$data[[1]],
                 data.frame(x = dat$variable, dat$lower, dat$upper),
                 by = "x")
g +
  geom_ribbon(data = matched, aes(x = x,
                                  ymin = y + dat.upper - x,
                                  ymax = y - x + dat.lower),
              alpha = 0.5, fill = "blue") +
  geom_ribbon(data = matched, aes(x = x,
                                  ymin = ecdf(dat.lower)(x),
                                  ymax = ecdf(dat.upper)(x)),
              alpha = 0.5, fill = "green")
