I am trying to generate a graph with the estimates and confidence intervals from the same regression for a number of countries. I ran the regressions using dplyr's group_by(country), and then I aggregated all the results into a data frame with broom's tidy().
When creating the graph from this data frame (called bycountry1), I run the following code:
ggplot(bycountry1, aes(x = country, y = estimate, ymin = estimate - std.error * 2, ymax = estimate + std.error * 2)) +
geom_hline(yintercept = 0, colour = "black", lty = 2) +
geom_pointrange() +
coord_flip() + facet_grid(. ~ term, scales = "free")
This is what I want, except that I'd like each box to have its own scale, so that all of them look more like the religious1 box. Since that is the term with the most variability, it dominates the shared scale, and in most of the other boxes you cannot see the variance. As the code above shows, I did set scales = "free" in facet_grid(), and I tried all the variants, also with facet_wrap(), but I cannot get this to work.
Following the suggestion of aosmith, I made it work using geom_errorbarh and removing coord_flip(). I also had to set the height of the geom_errorbarh to 0 and add a geom_point for the estimate. Here is the code:
ggplot(bycountry1, aes(y = country, x = estimate, xmin = estimate - std.error * 2, xmax = estimate + std.error * 2)) +
geom_vline(xintercept = 0, colour = "black", lty = 2) +
geom_point() +
geom_errorbarh(height = 0) +
facet_grid(. ~ term, scales = "free")
And the resulting image
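For anyone who wants to try the same idea without the original data, here is a minimal, self-contained sketch. It assumes ggplot2 3.3.0 or later, where geom_pointrange() is drawn horizontally when it is given xmin/xmax, so a single layer can stand in for geom_point() plus geom_errorbarh(). The data frame bycountry_toy and all its numbers are made up for illustration; they are not the results from the question.

```r
library(ggplot2)

# Made-up stand-in for the tidy() output (hypothetical values)
bycountry_toy <- data.frame(
  country   = rep(c("A", "B", "C"), times = 2),
  term      = rep(c("age", "religious1"), each = 3),
  estimate  = c(0.01, 0.02, -0.01, 0.8, -1.2, 0.3),
  std.error = c(0.005, 0.004, 0.006, 0.4, 0.5, 0.3)
)

# Intervals are mapped to x directly, so no coord_flip() is needed
p <- ggplot(bycountry_toy,
            aes(y = country, x = estimate,
                xmin = estimate - 2 * std.error,
                xmax = estimate + 2 * std.error)) +
  geom_vline(xintercept = 0, linetype = 2) +
  geom_pointrange() +
  facet_grid(. ~ term, scales = "free_x")
p
```

With the estimates on the x axis, scales = "free_x" lets each term get its own horizontal range, which is the effect asked for above.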
My ggplot R code works perfectly well with my other datasets, but I'm stumped as to why it's not working for one particular data set. See the image below, where the filled confidence interval stops at 0.10:
For reproducing the problem:
library(nlme)
library(ggeffects)
library(ggplot2)
SurfaceCoverage <- c(0.02,0.04,0.06,0.08,0.1,0.12,0.02,0.04,0.06,0.08,0.1,0.12)
SpecificSurfaceEnergy <- c(18.0052997,15.9636971,14.2951057,13.0263081,13.0816591,13.3825573,2.9267577,2.2889628,1.8909175,1.0083036,0.5683574,0.1681063)
sample <- c(1,1,1,1,1,1,2,2,2,2,2,2)
highW <- data.frame(sample,SurfaceCoverage,SpecificSurfaceEnergy)
highW$sample <- sub("^", "Wettable", highW$sample)
highW$RelativeHumidity <- "High relative humidity"; highW$group <- "Wettable"
highW$sR <- paste(highW$sample,highW$RelativeHumidity)
dfhighW <- data.frame(
"y"=c(highW$SpecificSurfaceEnergy),
"x"=c(highW$SurfaceCoverage),
"b"=c(highW$sample),
"sR"=c(highW$sR)
)
mixed.lme <- lme(y~log(x),random=~1|b,data=dfhighW)
pred.mmhighW <- ggpredict(mixed.lme, terms = c("x"))
(ggplot(pred.mmhighW) +
geom_line(aes(x = x, y = predicted)) + # slope
geom_ribbon(aes(x = x, ymin = predicted - std.error, ymax = predicted + std.error),
fill = "lightgrey", alpha = 0.5) + # error band
geom_point(data = dfhighW, # adding the raw data (scaled values)
aes(x = x, y = y, shape = b)) +
xlim(0.01,0.2) +
ylim(0,30) +
labs(title = "") +
ylab(bquote('Specific Surface Energy ' (mJ/m^2))) +
xlab(bquote('Surface Coverage ' (n/n[m]) )) +
theme_minimal()
)
Can someone advise me how to fix this? Thanks.
The last part of your ribbon has disappeared because you have excluded it from the plot. The lower edge of your ribbon is the following vector:
pred.mmhighW$predicted - pred.mmhighW$std.error
#> [1] 3.91264018 2.37386628 1.47061258 0.82834206 0.32935718 -0.07886245
Note the final value is a small negative number, but you have set the y axis limits with:
ylim(0, 30)
So anything negative will be cut off. If you change to
ylim(-2, 30)
you get the complete ribbon, including the part that dips below zero.
I don't know whether this is already answered previously, but coord_cartesian and scales::squish are two solutions to this problem.
coord_cartesian() adjusts the viewport without removing any data: it zooms in on the region you ask for (unlike xlim()/ylim()/scale_*_continuous(limits = ...), which by default replace out-of-range values with NA before anything is drawn)
scales::squish() is suboptimal if you are "squishing" lines and points, not just edgeless polygons (in the case of fill/polygons, squishing and clipping produce the same results)
gg0 <- (ggplot(pred.mmhighW)
+ geom_ribbon(aes(x = x, ymin = predicted - std.error,
ymax = predicted + std.error),
fill = "lightgrey", alpha = 0.5)
+ theme_minimal()
)
## set lower limit to 5 for a more obvious effect
gg0 + coord_cartesian(ylim = c(5, 30))
gg0 + scale_y_continuous(limits = c(5, 30),
## oob = "out of bounds" behaviour
oob = scales::squish)
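To see numerically what the two out-of-bounds behaviours do, here is a small sketch using the lower ribbon edge printed earlier and the limits c(0, 30) from the question. scales::censor() is the default oob function for continuous scales; the variable names are introduced here for illustration only.

```r
# Lower edge of the ribbon, copied from the output shown above
lower_edge <- c(3.91264018, 2.37386628, 1.47061258,
                0.82834206, 0.32935718, -0.07886245)
limits <- c(0, 30)

# Default behaviour of ylim()/limits: out-of-range values become NA,
# so the ribbon has nothing to draw at the last x value
censored <- scales::censor(lower_edge, range = limits)

# squish: out-of-range values are moved to the nearest limit instead
squished <- scales::squish(lower_edge, range = limits)

print(censored)
print(squished)
```

The last element is NA after censoring and 0 after squishing, which is why the ribbon stops early in the first case and is drawn down to the axis in the second.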
I want to overlay parameter estimates of group intercept and slope from a Bayesian analysis onto a grouped ggplot scatter-plot of actual data. I can overlay the individual lines just fine but I would really like to get a single mean line for each of the groups as well.
Here is some toy data. Three groups with differing intercepts and slopes
# data
x <- rnorm(120, 0, 1)
y <- c(20 + 3*x[1:40] + rnorm(40,0.01), rnorm(40,0.01), 10 + -3*x[81:120] + rnorm(40,0.01))
group = factor(rep(letters[1:3], each = 40))
df <- data.frame(group,x,y)
# fake parameter estimates of intercept and slope
parsDF <- data.frame(int = c(rnorm(10,20,.5), rnorm(10,0,.5), rnorm(10,10,.5)),
slope = c(rnorm(10,3,.3), rnorm(10,0,.3), rnorm(10,-3,.3)),
group = rep(letters[1:3], each = 10))
Now for the plot
ggplot(df, aes(x,y, colour = group)) +
geom_abline(data = parsDF, aes(intercept = int, slope = slope), colour = "gray75") +
geom_point() +
facet_wrap(~group)
I thought maybe I could add a single mean intercept and slope line for each group via stat.summary-type methods, like so
ggplot(df, aes(x,y, colour = group)) +
geom_abline(data = parsDF, aes(intercept = int, slope = slope), colour = "gray75") +
geom_abline(data = parsDF, aes(intercept = int, slope = slope), stat = "summary", fun.y = "mean", colour = "black", linetype = "dotted") +
geom_point() +
facet_wrap(~group)
But it just ignores those arguments and re-plots the individual lines over the existing ones.
I realise I could just calculate the mean of the intercepts and slopes for each group and brute-force that into the graph somehow but I can't see how to do that without mucking up the faceting by group, other than by creating another dataframe for mean slopes and intercepts and passing that into the plot as well. And I don't want to simply use geom_smooth() because that will use the actual data not my parameter estimates.
Any help much appreciated
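For reference, the "separate data frame" route mentioned above does not have to disturb the faceting, as long as the summary keeps the group column so that facet_wrap(~group) can place each mean line in its own panel. A minimal sketch, reusing the toy data from the question with a seed added for reproducibility; meansDF is a name introduced here, not something from the original code.

```r
library(ggplot2)

set.seed(1)  # added so the toy data are reproducible

# toy data, as in the question
x <- rnorm(120, 0, 1)
y <- c(20 + 3 * x[1:40] + rnorm(40, 0.01), rnorm(40, 0.01),
       10 + -3 * x[81:120] + rnorm(40, 0.01))
group <- factor(rep(letters[1:3], each = 40))
df <- data.frame(group, x, y)

# fake parameter estimates of intercept and slope, as in the question
parsDF <- data.frame(int   = c(rnorm(10, 20, .5), rnorm(10, 0, .5), rnorm(10, 10, .5)),
                     slope = c(rnorm(10, 3, .3), rnorm(10, 0, .3), rnorm(10, -3, .3)),
                     group = rep(letters[1:3], each = 10))

# one row per group: mean intercept and mean slope
meansDF <- aggregate(cbind(int, slope) ~ group, data = parsDF, FUN = mean)

p <- ggplot(df, aes(x, y, colour = group)) +
  geom_abline(data = parsDF, aes(intercept = int, slope = slope),
              colour = "gray75") +
  geom_abline(data = meansDF, aes(intercept = int, slope = slope),
              colour = "black", linetype = "dotted") +
  geom_point() +
  facet_wrap(~group)
p
```

Note that this averages intercepts and slopes separately, which is one possible definition of a "mean line"; whether that is the right summary for posterior draws is a modelling choice.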
I'm trying to plot the results of the margins() command (average marginal effects), but the order of the variables on the plot doesn't match the order of the labels (one label gets the value of another variable). With ggplot everything is OK (although it uses the summary). Can anyone explain what is going on and how to make a proper plot? I'd be grateful :)
library(ggplot2)
library(tibble)
library(broom)
library(margins)
library(Ecdat)
data(Participation)
?Participation
logit_participation = glm(lfp ~ ., data = Participation, family = "binomial")
tidy(logit_participation)
summary(logit_participation)
effects_logit_participation = margins(logit_participation)
print(effects_logit_participation)
summary(effects_logit_participation)
plot(effects_logit_participation)
effects_logit_participation = summary(effects_logit_participation)
ggplot(data = effects_logit_participation) +
geom_point(aes(factor, AME)) +
geom_errorbar(aes(x = factor, ymin = lower, ymax = upper)) +
geom_hline(yintercept = 0) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45))
I intend to put four graphs in a single page. Each plot shows the point estimate of a single statistic and its confidence interval. I am struggling with altering the width of geom_errorbar whisker in each plot. It does not seem to change, even though I alter the width argument in geom_errorbar().
It is important for me to graph those four statistics separately because both point estimates and confidence intervals are defined in different ranges for each statistic, as you can notice on the graph below. The multiplot function I use to plot multiple graphs is defined in http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/.
#creates data.frame where point estimates and confidence intervals will be
#stored
#the numbers entered in df are similar to the ones I get from previously
#performed regressions
w<-c(1:4)
x<-c(0.68,0.87,2.93,4.66)
y<-c(0.47,0.57,0.97,3.38)
z<-c(0.83,1.34,4.17,7.46)
df<-data.frame(w,x,y,z)
#plot each statistic
#(each row from df is a statistic: w for index, x for point estimate,
#y for ci lower bound and z for ci upper bound)
p1 <- ggplot(df[1,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
p2 <- ggplot(df[2,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
p3 <- ggplot(df[3,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
p4 <- ggplot(df[4,], aes(x = w, y = x)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = y, ymax = z),width=.1) +
labs(x="",y="")
multiplot(p1, p2, p3, p4, cols=2)
I greatly appreciate any help and advice.
Thanks,
Gabriel
[Figure: example plot] How can I change the errorbar whisker width for each graph separately?
The width is changing, but the x-axis is scaling to match the width of the error bar. You need to set the x axis manually using, for example, xlim.
For p1, you could try + xlim(0.8, 1.2)
Alternatively you could use the expand argument to scale_x_continuous, e.g. scale_x_continuous(expand = c(0, 0.1)).
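As a rough sketch of the xlim suggestion applied to all four plots at once: each panel gets a fixed x window one unit wide around its point, so a width of 0.1 always covers a tenth of the axis and changing width visibly changes the whiskers. make_panel() is a hypothetical helper written for this example, and the half-unit window is an arbitrary choice.

```r
library(ggplot2)

# Same numbers as in the question: w = index, x = point estimate,
# y = CI lower bound, z = CI upper bound
df <- data.frame(w = 1:4,
                 x = c(0.68, 0.87, 2.93, 4.66),
                 y = c(0.47, 0.57, 0.97, 3.38),
                 z = c(0.83, 1.34, 4.17, 7.46))

# make_panel() is a hypothetical helper, not part of ggplot2
make_panel <- function(i, width = 0.1) {
  ggplot(df[i, ], aes(x = w, y = x)) +
    geom_point(size = 4) +
    geom_errorbar(aes(ymin = y, ymax = z), width = width) +
    xlim(i - 0.5, i + 0.5) +  # fixed window around the point
    labs(x = "", y = "")
}

# one plot per row of df
plots <- lapply(1:4, make_panel)
```

The resulting list can then be handed to whichever layout function is in use. Passing a different width for a given panel, for example make_panel(3, width = 0.3), should now change the whisker size relative to the axis.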
Inspired by the Q Finding the elbow/knee in a curve I started to play around with smooth.spline().
In particular, I want to visualize how the parameter df (degrees of freedom) influences the approximation and the first and second derivatives. Note that this Q is not about approximation but about a specific problem (or edge case) in visualisation with ggplot2.
First attempt: simple facet_grid()
library(ggplot2)
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
dp is a data.table containing the data points for which an approximation is sought and ap is a data.table with the approximated data plus the derivatives (data are given below).
For each row, facet_grid() with scales = "free_y" has chosen a scale which displays all data. Unfortunately, one panel has some "outliers" which make it difficult to see details in the other panels. So, I want to "zoom in".
"Zoom in" using coord_cartesian()
ggplot(ap, aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 50))
With the manually selected range, more details in the panels of row 3 have been made visible. But the limit has been applied to all panels of the grid, so in row 1 details can hardly be distinguished.
What I'm looking for is a way to apply coord_cartesian() with specific parameters separately to each individual panel (or group of panels, e.g., rowwise) of the grid. For instance, is it possible to manipulate the ggplot object afterwards?
Workaround: Combine individual plots with cowplot
As a workaround, we can create three separate plots and combine them afterwards using the cowplot package:
g0 <- ggplot(ap[deriv == 0], aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
g1 <- ggplot(ap[deriv == 1], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-50, 50))
g2 <- ggplot(ap[deriv == 2], aes(x, y)) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw() +
coord_cartesian(ylim = c(-200, 100))
cowplot::plot_grid(g0, g1, g2, ncol = 1, align = "v")
Unfortunately, this solution requires writing code to create three separate plots, and it duplicates strips and axes and adds whitespace which is then no longer available for displaying the data.
Is facet_wrap() an alternative?
We can use facet_wrap() instead of facet_grid():
ggplot(ap, aes(x, y)) +
# geom_point(data = dp, alpha = 0.2) + # this line causes error message
geom_line() +
facet_wrap(~ deriv + df, scales = "free_y", labeller = label_both, nrow = 3) +
theme_bw()
Now, the y-axes of every panel are scaled individually exhibiting details of some of the panels. Unfortunately, we still can't "zoom in" into the bottom right panel because using coord_cartesian() would affect all panels.
In addition, the line
geom_point(data = dp, alpha = 0.2)
strangely causes
Error in gList(list(x = 0.5, y = 0.5, width = 1, height = 1, just = "centre", :
only 'grobs' allowed in "gList"
I had to comment this line out, so the data points which are to be approximated are not displayed.
Data
library(data.table)
# data points
dp <- data.table(
x = c(6.6260, 6.6234, 6.6206, 6.6008, 6.5568, 6.4953, 6.4441, 6.2186,
6.0942, 5.8833, 5.7020, 5.4361, 5.0501, 4.7440, 4.1598, 3.9318,
3.4479, 3.3462, 3.1080, 2.8468, 2.3365, 2.1574, 1.8990, 1.5644,
1.3072, 1.1579, 0.95783, 0.82376, 0.67734, 0.34578, 0.27116, 0.058285),
y = 1:32,
deriv = 0)
# approximated data points and derivatives
ap <- rbindlist(
lapply(seq(2, length(dp$x), length.out = 4),
function(df) {
rbindlist(
lapply(0:2,
function(deriv) {
result <- as.data.table(
predict(smooth.spline(dp$x, dp$y, df = df), deriv = deriv))
result[, c("df", "deriv") := list(df, deriv)]
})
)
})
)
Late answer, but the following hack just occurred to me. Would it work for your use case?
Step 1. Create an alternative version of the intended plot, limiting the range of y values such that scales = "free_y" gives a desired scale range for each facet row. Also create the intended facet plot with the full data range:
library(ggplot2)
library(dplyr)
# alternate plot version with truncated data range
p.alt <- ap %>%
group_by(deriv) %>%
mutate(upper = quantile(y, 0.75),
lower = quantile(y, 0.25),
IQR.multiplier = (upper - lower) * 10) %>%
ungroup() %>%
mutate(is.outlier = y < lower - IQR.multiplier | y > upper + IQR.multiplier) %>%
mutate(y = ifelse(is.outlier, NA, y)) %>%
ggplot(aes(x, y)) +
geom_point(data = dp, alpha = 0.2) +
geom_line() +
facet_grid(deriv ~ df, scales = "free_y", labeller = label_both) +
theme_bw()
# intended plot version with full data range
p <- p.alt %+% ap
Step 2. Use ggplot_build() to generate plot data for both ggplot objects. Apply the panel parameters of the alt version onto the intended version:
p <- ggplot_build(p)
p.alt <- ggplot_build(p.alt)
p$layout$panel_params <- p.alt$layout$panel_params
rm(p.alt)
Step 3. Build the intended plot from the modified plot data, & plot the result:
p <- ggplot_gtable(p)
grid::grid.draw(p)
Note: in this example, I truncated the data range by setting all values more than 10*IQR away from the upper / lower quartile in each facet row as NA. This can be replaced by any other logic for defining outliers.