Plotting a confidence interval on a Poisson GLM (ggplot) - r

I have a Poisson GLM and I am trying to plot 95% confidence intervals using ggplot. My issue arises when I add the geom_ribbon() layer. I think my model and CIs are set up correctly; it's just the ggplot code that I cannot get to work. If anyone knows what I've done wrong in the geom_ribbon() call, that would be great.
model, CIs and plot code
#creating the poisson GLM model
model3 = glm(cases ~ date,
data = aids,
family = poisson(link='log'))
#make predictions
model3_preds = predict(model3, type = 'response')
#create predictions for confidence intervals
predictions_model3 = predict(model3, aids, se.fit = TRUE, type = 'response')
#calculate 95% confidence intervals limit
upper_mod3 = predictions_model3$fit+1.96*predictions_model3$se.fit
lower_mod3 = predictions_model3$fit-1.96*predictions_model3$se.fit
#combining our predictions and confidence intervals into a df
predframe_model3 = data.frame(lwr = lower_mod3, upr = upper_mod3, data = aids$date, cases = aids$cases)
#plotting our model with 95% confidence intervals around the mean
ggplot(aids, aes(date, cases)) +
geom_ribbon(data = predframe_model3, aes(ymin = lwr, ymax = upr), fill = 'grey') +
geom_point() +
geom_line(aes(date, model3_preds), col = 'red')
aids data snippet if needed
aids
cases quarter date
1 2 1 83.00
2 6 2 83.25
3 10 3 83.50
4 8 4 83.75
5 12 1 84.00
6 9 2 84.25

In this case, you can plot directly with ggplot without a prediction data frame, using geom_smooth:
ggplot(aids, aes(date, cases)) +
geom_smooth(method = glm, formula = y ~ x, color = "red",
method.args = list(family = poisson)) +
geom_point()
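As a complement to the geom_smooth approach, here is a minimal sketch of the manual route. One likely cause of the ribbon failing in the question is that the prediction data frame names its x column `data` rather than `date`, so the ribbon layer cannot find the `date` variable it inherits from aes(date, cases). The sketch also builds the interval on the link (log) scale and back-transforms with exp(), which keeps the lower limit above zero; that is a suggested change, not something the question did. The six rows are the snippet from the question, and the ggplot call is guarded in case ggplot2 is not installed.

```r
# First rows of the aids data shown in the question
aids <- data.frame(
  cases   = c(2, 6, 10, 8, 12, 9),
  quarter = c(1, 2, 3, 4, 1, 2),
  date    = c(83.00, 83.25, 83.50, 83.75, 84.00, 84.25)
)

model3 <- glm(cases ~ date, data = aids, family = poisson(link = "log"))

# Predict on the link (log) scale, then back-transform with exp()
link_pred <- predict(model3, newdata = aids, se.fit = TRUE, type = "link")

predframe_model3 <- data.frame(
  date  = aids$date,   # named `date` so it matches aes(date, cases)
  cases = aids$cases,
  fit   = exp(link_pred$fit),
  lwr   = exp(link_pred$fit - 1.96 * link_pred$se.fit),
  upr   = exp(link_pred$fit + 1.96 * link_pred$se.fit)
)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(predframe_model3, aes(date, cases)) +
    geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "grey") +
    geom_point() +
    geom_line(aes(y = fit), col = "red")
  print(p)
}
```

Keeping the fitted values, limits and raw data in one data frame means every layer can inherit the same x aesthetic.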

Related

Plotting smooth functions from my GAM in ggplot

I have created a GAM and set up the predictions, but I am having trouble working out how to plot the smooth functions from my model. I have been trying to plot these in ggplot but am struggling with the arguments/aesthetics now that I have added month as well. I have seen some people say to use geom_smooth() too, but I'm not sure. If anyone can advise me on this, that would be great. I have added my data, model and predictions below:
model
mod = gam(co2 ~ s(timeStep, k = 200, bs = "cs") + s(month, k = 12, bs = "cc"),
data = carbonD,
family = gaussian(link = "identity"))
predictions
#create predictions
preds = predict(mod, type = 'terms', se.fit = TRUE)
#combine our predictions with coefficients
fit = preds$fit + coef(mod)[1]
data snippet
carbonD
co2 month year timeStep
1 315.42 1 1959 1
2 316.31 2 1959 2
3 316.50 3 1959 3
4 317.56 4 1959 4
5 318.13 5 1959 5
6 318.00 6 1959 6
7 316.39 7 1959 7
8 314.65 8 1959 8
9 313.68 9 1959 9
10 313.18 10 1959 10
11 314.66 11 1959 11
12 315.43 12 1959 12
13 316.27 1 1960 13
14 316.81 2 1960 14
15 317.42 3 1960 15
There are two ways to plot your exact model in ggplot. One is to use geom_smooth, but you can't do this with two variables on the right hand side. Actually, in your case it would be possible because month is calculable from the time step, but let's ignore that for now and just plot your model predictions directly using a ribbon and a line.
First, load the required packages and create the model (note that because we only have a snippet of your data, I have had to reduce the number of knots):
library(mgcv)
library(ggplot2)
mod = gam(co2 ~ s(timeStep, k = 4, bs = "cs") + s(month, k = 12, bs = "cc"),
data = carbonD,
family = gaussian(link = "identity"))
Now we create a little data frame of the values we want our predictions at, with 1000 points across the range of our data:
newdata <- data.frame(timeStep = seq(1, 15, length.out = 1000),
month = (seq(1, 15, length.out = 1000) - 1) %% 12 + 1)
Now we make our predictions and use the standard error fit to create an upper and lower confidence band.
pred <- predict(mod, newdata, type = 'response', se.fit = TRUE)
newdata$co2 <- pred$fit
newdata$lower <- pred$fit - 1.96 * pred$se.fit
newdata$upper <- pred$fit + 1.96 * pred$se.fit
Now we can plot our results:
ggplot(carbonD, aes(timeStep, co2)) +
geom_point() +
geom_ribbon(data = newdata, alpha = 0.3,
aes(ymin = lower, ymax = upper, fill = "confidence interval")) +
geom_line(data = newdata, aes(color = "GAM")) +
scale_fill_manual(values = "lightblue", name = NULL) +
scale_color_manual(values = "darkblue", name = NULL) +
theme_minimal(base_size = 16)
It is also possible to use your gam within geom_smooth directly, but you need to be able to express the model in terms of y and x, where x is the time step. You can get the month by subtracting 1 from the time step, getting this number modulo 12, and adding 1 again, so it is possible to avoid explicitly creating a prediction data frame, at the cost of making the plotting code more complex:
ggplot(carbonD, aes(timeStep, co2)) +
geom_point() +
geom_smooth(formula = y ~ s(x, k = 4, bs = "cs") +
s((x - 1) %% 12 + 1, k = 12, bs = "cc"),
method = "gam", size = 0.7,
method.args = list(family = gaussian(link = "identity")),
aes(color = "gam", fill = "confidence interval")) +
scale_fill_manual(values = "lightblue", name = NULL) +
scale_color_manual(values = "darkblue", name = NULL) +
theme_minimal(base_size = 16)
As a caveat to this, it is not clear to me that you should have both month and timestep, since one is just the modulus of the other. It may be better to use just timestep alone, or use year and month if you want to separate the long-term and seasonal effects.
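The question also asks about plotting the individual smooth functions. A rough sketch of one way to get at them is predict() with type = "terms", which returns one column per smooth on the linear predictor scale, centred around zero. The data here are simulated as a hypothetical stand-in for carbonD, and the names carbon_sim, mod_sim and month_effect, as well as the smaller k values, are assumptions made for illustration.

```r
library(mgcv)

# Hypothetical stand-in for carbonD: 10 years of simulated monthly values
set.seed(1)
n <- 120
carbon_sim <- data.frame(timeStep = 1:n, month = rep(1:12, length.out = n))
carbon_sim$co2 <- 315 + 0.1 * carbon_sim$timeStep +
  3 * sin(2 * pi * carbon_sim$month / 12) + rnorm(n, sd = 0.3)

# Same model structure as in the question, with smaller k for the simulated data
mod_sim <- gam(co2 ~ s(timeStep, k = 10, bs = "cs") + s(month, k = 6, bs = "cc"),
               data = carbon_sim, family = gaussian(link = "identity"))

# One column of fitted values (and standard errors) per smooth term
terms_sim <- predict(mod_sim, type = "terms", se.fit = TRUE)

# Partial effect of month with an approximate 95% band
month_effect <- data.frame(
  month = carbon_sim$month,
  fit   = terms_sim$fit[, "s(month)"],
  lower = terms_sim$fit[, "s(month)"] - 1.96 * terms_sim$se.fit[, "s(month)"],
  upper = terms_sim$fit[, "s(month)"] + 1.96 * terms_sim$se.fit[, "s(month)"]
)
# Keep one row per month, in calendar order
month_effect <- month_effect[!duplicated(month_effect$month), ]
month_effect <- month_effect[order(month_effect$month), ]
```

A data frame like month_effect can then be passed to geom_ribbon() and geom_line() in the same way as newdata above, with month on the x axis.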
The simplest way would be to use geom_smooth with LOESS: geom_smooth(method="loess", span=0.5) and play with the span parameter to get a more smooth or wiggly shape.

Unable to plot confidence intervals using ggplot, (geom_ribbon() argument)

I am trying to plot 95% confidence intervals on some simulated values but am running into some issues when I try to plot the CIs using geom_ribbon(). The trouble I'm having is that the plot of my model does not show the CIs.
I have included all of my code below in case anyone knows where I have gone wrong:
set.seed(20220520)
#simulating 200 values between 0 and 1 from a uniform distribution
x = runif(200, min = 0, max = 1)
lam = exp(0.3+5*x)
y = rpois(200, lambda = lam)
#before we do this each Yi may contain zeros so we need to add a small constant
y <- y + .1
#combining x and y into a dataframe so we can plot
df = data.frame(x, y)
#fitting a Poisson GLM
model2 <- glm(y ~ x,
data = df,
family = poisson(link='log'))
#make predictions (this may be the same as predictions_mod2)
preds <- predict(model2, type = "response")
#making CI predictions
predictions_mod2 = predict(model2, df, se.fit = TRUE, type = 'response')
#calculate confidence intervals limit
upper_mod2 = predictions_mod2$fit+1.96*predictions_mod2$se.fit
lower_mod2 = predictions_mod2$fit-1.96*predictions_mod2$se.fit
#transform the CI limit to get one at the level of the mean
upper_mod2 = exp(upper_mod2)/(1+exp(upper_mod2))
lower_mod2 = exp(lower_mod2)/(1+exp(lower_mod2))
#combining into a df
predframe = data.frame(lwr=lower_mod2,upr=upper_mod2, x = df$x, y = df$y)
#plot model with 95% confidence intervals using ggplot
ggplot(df, aes(x, y)) +
geom_ribbon(data = predframe, aes(ymin=lwr, ymax=upr), alpha = 0.4) +
geom_point() +
geom_line(aes(x, preds2), col = 'blue')
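A comment on the question asks why the predicted values should not be logit-transformed. The reason is that the type of prediction requested is already "response", so the values are on the scale of the response and need no further back-transformation. (For a log link the inverse would in any case be exp(), not the logistic function.) From the documentation of predict.glm: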
type
the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
The clearest way to answer is to show the working code.
library(ggplot2, quietly = TRUE)
set.seed(20220520)
#simulating 200 values between 0 and 1 from a uniform distribution
x = runif(200, min = 0, max = 1)
lam = exp(0.3+5*x)
y = rpois(200, lambda = lam)
#before we do this each Yi may contain zeros so we need to add a small constant
y <- y + 0.1
#combining x and y into a dataframe so we can plot
df = data.frame(x, y)
#fitting a Poisson GLM
suppressWarnings(
model2 <- glm(y ~ x,
data = df,
family = poisson(link='log'))
)
#make predictions (this may be the same as predictions_mod2)
preds <- predict(model2, type = "response")
#making CI predictions
predictions_mod2 = predict(model2, df, se.fit = TRUE, type = 'response')
#calculate confidence intervals limit
upper_mod2 = predictions_mod2$fit+1.96*predictions_mod2$se.fit
lower_mod2 = predictions_mod2$fit-1.96*predictions_mod2$se.fit
#combining into a df
predframe = data.frame(lwr=lower_mod2,upr=upper_mod2, x = df$x, y = df$y)
#plot model with 95% confidence intervals using ggplot
ggplot(df, aes(x, y)) +
geom_ribbon(data = predframe, aes(ymin=lwr, ymax=upr), alpha = 0.4) +
geom_point() +
geom_line(aes(x, preds), col = 'blue')
Created on 2022-05-29 by the reprex package (v2.0.1)
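To illustrate the documentation point about type, here is a small sketch comparing the two prediction scales for a Poisson GLM with a log link. It reuses the simulation from the question but leaves out the + 0.1 so the counts stay integers; the names sim, model_sim, on_link and on_response are made up for the example.

```r
set.seed(20220520)
x <- runif(200, min = 0, max = 1)
y <- rpois(200, lambda = exp(0.3 + 5 * x))
sim <- data.frame(x, y)

model_sim <- glm(y ~ x, data = sim, family = poisson(link = "log"))

# Default scale: the linear predictor, i.e. log(expected count)
on_link <- predict(model_sim, type = "link")

# Response scale: the expected count itself
on_response <- predict(model_sim, type = "response")

# Back-transforming the link-scale predictions with exp() recovers
# the response-scale predictions
same <- all.equal(exp(on_link), on_response)
print(same)
```

So a back-transformation is only needed when predicting with type = "link", and for this model it is exp() rather than the logistic function.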

Boxplot not showing range

I have predicted values, via:
glm0 <- glm(use ~ as.factor(decision), data = decision_use, family = binomial(link = "logit"))
predicted_glm <- predict(glm0, newdata = decision_use, type = "response", interval = "confidence", se = TRUE)
predict <- predicted_glm$fit
predict <- predict + 1
head(predict)
1 2 3 4 5 6
0.3715847 0.3095335 0.3095335 0.3095335 0.3095335 0.5000000
Now when I plot a box plot using ggplot2,
ggplot(decision_use, aes(x = decision, y = predict)) +
geom_boxplot(aes(fill = factor(decision)), alpha = .2)
I get a box plot with one horizontal line per category. If you look at the predict data, the value is the same within each category, so that makes sense.
But I want a box plot with the range. How can I get that? When I use "use" instead of predict, I get boxes stretching from end to end (1 to 0). So I suppose that's not it. Thank you in advance.
To clarify, predicted_glm includes se.fit values. I wonder how to incorporate those.
It doesn't really make sense to do a boxplot here. A boxplot shows the range and spread of a continuous variable within groups. Your dependent variable is binary, so the values are all 0 or 1. Since you are plotting predictions for each group, your plot would have just a single point representing the expected value (i.e. the probability) for each group.
The closest you can come is probably to plot the prediction with 95% confidence bars around it.
You haven't provided any sample data, so I'll make some up here:
set.seed(100)
df <- data.frame(outcome = rbinom(200, 1, c(0.1, 0.9)), var1 = rep(c("A", "B"), 100))
Now we'll create our model and get the prediction for each level of my predictor variable using the newdata parameter of predict. I'm going to specify type = "link" because I want the log odds, and I'm also going to specify se.fit = TRUE so I can get the standard error of these predictions:
mod <- glm(outcome ~ var1, data = df, family = binomial)
prediction <- predict(mod, list(var1 = c("A", "B")), se.fit = TRUE, type = "link")
Now I can work out the 95% confidence intervals for my predictions:
prediction$lower <- prediction$fit - prediction$se.fit * 1.96
prediction$upper <- prediction$fit + prediction$se.fit * 1.96
Finally, I transform the fit and confidence intervals from log odds into probabilities:
prediction <- lapply(prediction, function(logodds) exp(logodds)/(1 + exp(logodds)))
plotdf <- data.frame(Group = c("A", "B"), fit = prediction$fit,
upper = prediction$upper, lower = prediction$lower)
plotdf
#> Group fit upper lower
#> 1 A 0.13 0.2111260 0.07700412
#> 2 B 0.92 0.9594884 0.84811360
Now I am ready to plot. I will use geom_point for the probability estimates and geom_errorbar for the confidence intervals:
library(ggplot2)
ggplot(plotdf, aes(x = Group, y = fit, colour = Group)) +
geom_errorbar(aes(ymin = lower, ymax = upper), size = 2, width = 0.5) +
geom_point(size = 3, colour = "black") +
scale_y_continuous(limits = c(0, 1)) +
labs(title = "Probability estimate with 95% CI", y = "Probability")
Created on 2020-05-11 by the reprex package (v0.3.0)
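As a variation on the back-transformation step, base R's plogis() is the inverse logit, so it can replace the hand-written exp(x)/(1 + exp(x)). The sketch below applies it only to the fit and the two limits rather than to every element of the prediction list. The names df_sim and link_pred are made up for the example; the simulated data follow the same recipe as above.

```r
set.seed(100)
df_sim <- data.frame(outcome = rbinom(200, 1, c(0.1, 0.9)),
                     var1 = rep(c("A", "B"), 100))

mod <- glm(outcome ~ var1, data = df_sim, family = binomial)

# Predictions and standard errors on the log-odds scale
link_pred <- predict(mod, data.frame(var1 = c("A", "B")),
                     se.fit = TRUE, type = "link")

# plogis() converts log odds to probabilities
plotdf <- data.frame(
  Group = c("A", "B"),
  fit   = plogis(link_pred$fit),
  lower = plogis(link_pred$fit - 1.96 * link_pred$se.fit),
  upper = plogis(link_pred$fit + 1.96 * link_pred$se.fit)
)
plotdf
```

Because the limits are computed on the log-odds scale before transforming, they always stay between 0 and 1.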

How is `level` used to generate the confidence interval in geom_smooth?

I'm having trouble emulating how stat_smooth calculates its confidence interval.
Let's generate some data and a simple model:
library(tidyverse)
# sample data
df = tibble(
x = runif(10),
y = x + rnorm(10)*0.2
)
# simple linear model
model = lm(y ~ x, df)
Now use predict() to generate values and confidence intervals
# predict
df$predicted = predict(
object = model,
newdata = df
)
# predict 95% confidence interval
df$CI = predict(
object = model,
newdata = df,
se.fit = TRUE
)$se.fit * qnorm(1 - (1-0.95)/2)
Notice that qnorm is used to expand from the standard error to a 95% CI.
Plot the data (black dots), geom_smooth (black line + gray ribbon), and the predicted ribbon (red and blue lines).
ggplot(df) +
aes(x = x, y = y) +
geom_point(size = 2) +
geom_smooth(method = "lm", level = 0.95, fullrange = TRUE, color = "black") +
geom_line(aes(y = predicted + CI), color = "blue") + # upper
geom_line(aes(y = predicted - CI), color = "red") + # lower
theme_classic()
The red and blue lines should be the same as the ribbon's edges. What am I doing wrong?
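As posted in a comment by @Dason, the answer is that geom_smooth uses a t-distribution, not a normal distribution.
In my original question, replace qnorm(1 - (1-0.95)/2) with qt(1 - (1-0.95)/2, df.residual(model)) for the lines to match up. The degrees of freedom are the residual degrees of freedom of the model (nrow(df) - 2 for this simple linear model), not nrow(df).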
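A quick way to check the t-based interval without plotting is to compare it with predict(..., interval = "confidence"), which, as far as I know, is what geom_smooth(method = "lm") relies on internally. This is a sketch on freshly simulated data; the names d, manual and builtin are made up for the example.

```r
set.seed(42)
d <- data.frame(x = runif(10))
d$y <- d$x + rnorm(10) * 0.2

model <- lm(y ~ x, data = d)

# Manual interval: fit +/- t quantile * standard error,
# using the residual degrees of freedom of the model
p <- predict(model, newdata = d, se.fit = TRUE)
t_mult <- qt(1 - (1 - 0.95) / 2, df = df.residual(model))
manual <- data.frame(lwr = p$fit - t_mult * p$se.fit,
                     upr = p$fit + t_mult * p$se.fit)

# Built-in interval from predict.lm
builtin <- predict(model, newdata = d, interval = "confidence", level = 0.95)

lower_matches <- all.equal(unname(manual$lwr), unname(builtin[, "lwr"]))
upper_matches <- all.equal(unname(manual$upr), unname(builtin[, "upr"]))
print(c(lower_matches, upper_matches))
```

With only 10 observations the t quantile is noticeably larger than the normal one, which is why the qnorm-based lines sit inside the ribbon.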

Line plot of mixed models / lsmeans results (with ggplot?)

I have longitudinal repeated measures on individuals over 4 timepoints. Following a mixed models analysis with time as fixed effect and random slopes I have used lsmeans to estimate the mean values at each time point as well as 95% confidence intervals. I would now like to plot a line graph with time points (x) and mean values of my outcome variable (y) with the CIs. Can I use e.g. ggplot to plot the results that I got from lsmeans? Or is there another smart way to plot this?
The results that I get from lsmeans, and that I would like to plot (lsmean, lower.CL, upper.CL over time), are:
$lsmeans
time lsmean SE df lower.CL upper.CL
0 21.967213 0.5374422 60 20.892169 23.04226
1 16.069586 0.8392904 60 14.390755 17.74842
2 13.486802 0.8335159 60 11.819522 15.15408
3 9.495137 0.9854642 60 7.523915 11.46636
Confidence level used: 0.95
Is this what you meant?
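# To convert from lsmeans output (d <- lsmeans(parameters))
# Keep the time column too, since the plot maps it to x
d <- summary(d)$lsmeans[c("time", "lsmean", "lower.CL", "upper.CL")]
library(ggplot2)
ggplot(d, aes(time)) +
# group = 1 so the line is drawn even if time is a factor
geom_line(aes(y = lsmean, group = 1)) +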
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL),
width = 0.2) +
geom_point(aes(y = lsmean), size = 3,
shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean",
title = "ls mean result over time") +
theme_bw()
To summarize, the whole code that will give you the estimates and plot of the mixed model is:
## random slope model
library(nlme) # provides lme()
summary(model <- lme(outcome ~ time, random = ~1+time|ID, data = data,
na.action = na.exclude, method = "ML"))
## pairwise comparisons of timepoints
install.packages("lsmeans")
library(lsmeans)
lsmeans(model, pairwise~time, adjust="tukey")
### Draw the picture
d <- summary(lsmeans(model, ~time))
library(ggplot2)
ggplot(d, aes(time)) +
geom_line(aes(y = lsmean, group = 1)) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
geom_point(aes(y = lsmean), size = 3, shape = 21, fill = "white") +
labs(x = "Time", y = "ls mean", title = "ls mean result over time") +
theme_bw()
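If the model object is not at hand, the same plot can be sketched directly from the printed lsmeans table. The data frame below simply retypes the values shown in the question; the name lsm_table is made up for the example, treating time as a factor is an assumption, and the ggplot call is guarded in case ggplot2 is not installed.

```r
# Values copied from the lsmeans output in the question
lsm_table <- data.frame(
  time     = factor(0:3),
  lsmean   = c(21.967213, 16.069586, 13.486802, 9.495137),
  SE       = c(0.5374422, 0.8392904, 0.8335159, 0.9854642),
  df       = 60,
  lower.CL = c(20.892169, 14.390755, 11.819522, 7.523915),
  upper.CL = c(23.04226, 17.74842, 15.15408, 11.46636)
)

# The limits are lsmean +/- t quantile * SE on 60 degrees of freedom
recomputed_lower <- lsm_table$lsmean - qt(0.975, lsm_table$df) * lsm_table$SE

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(lsm_table, aes(time)) +
    geom_line(aes(y = lsmean, group = 1)) +
    geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.2) +
    geom_point(aes(y = lsmean), size = 3, shape = 21, fill = "white") +
    labs(x = "Time", y = "ls mean", title = "ls mean result over time") +
    theme_bw()
  print(p)
}
```

The group = 1 in geom_line matters here because time is a factor; without it ggplot2 treats each level as its own group and draws no connecting line.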
