Calculating credible intervals for marginal effects in a binomial logit using rstanarm

In this method for calculating marginal effects for a binomial logit using rstanarm (https://stackoverflow.com/a/45042387/9264004):
nd <- md                 # copy of the data used to fit glm1
nd$x1 <- 0
p0 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)  # draws of Pr(y = 1 | x1 = 0)
nd$x1 <- 1
p1 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)  # draws of Pr(y = 1 | x1 = 1)
ME <- p1 - p0            # marginal effects: one row per draw, one column per observation
AME <- rowMeans(ME)      # average marginal effect, one value per posterior draw
Can intervals for the marginal effects be calculated by taking quantiles, like this:
QME <- quantile(AME, c(.025, .25, .5, .75, .975))
or is there a more correct way to calculate a standard error for the effect?

If you are interested in the posterior standard deviation of the average (over the data) "marginal" effect of changing x1 from 0 to 1, then it would be sd(AME), or possibly mad(AME). But if you want quantiles, then call quantile on AME, as above.
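A minimal sketch, continuing the code above (glm1 and md as in the linked answer): AME holds one draw of the average marginal effect per posterior iteration, so posterior summaries of the effect are just summaries of that vector.
sd(AME)                       # posterior standard deviation of the AME
mad(AME)                      # robust alternative
quantile(AME, c(.025, .975))  # 95% credible interval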

Related

Predicting CI for a predicted value from a logistic regression model

I have a specific predicted value that I calculated using logistic regression, and now I need to find the CI for that probability. Here is my code:
cheese_out <- glm(taste ~ acetic + person, data = cheese, family = "binomial")
probabilities <- predict(cheese_out, newdata = cheese, type = "response")
testdat <- data.frame(acetic = 6, person = "Child")
pred_accp <- predict(cheese_out, newdata = testdat, type = "response")
I get a pred_accp value of 0.1206, but how do I calculate a confidence interval based on that value?
You can use the se.fit = TRUE option of the predict function. It returns standard errors, from which you can calculate the confidence interval. Example:
out <- glm(I(Sepal.Length > 5.8) ~ Sepal.Width + Species, iris, family = binomial())
testdat <- data.frame(Sepal.Width = 3, Species = "versicolor")
pred_accp <- predict(out, newdata = testdat, type = "response", se.fit = TRUE)
alpha <- .05 ## confidence level
cc <- -qt(alpha/2, df = Inf) * pred_accp$se.fit
setNames(
  pred_accp$fit + cc * c(-1, 0, 1),
  c("lower", "estimate", "upper"))
# lower estimate upper
# 0.5505699 0.7072896 0.8640093
Note that a normal (z) distribution is assumed here, i.e. df = Inf; for a t-distribution, specify the appropriate degrees of freedom.
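A related sketch, not part of the original answer: a symmetric interval on the response scale can cross 0 or 1, so a common alternative is to build the interval on the link (logit) scale and back-transform with plogis(), which keeps the endpoints inside [0, 1]. This reuses out, testdat, and alpha from above.
pred_link <- predict(out, newdata = testdat, type = "link", se.fit = TRUE)
cc <- -qnorm(alpha/2) * pred_link$se.fit  # qnorm is qt with df = Inf
setNames(plogis(pred_link$fit + cc * c(-1, 0, 1)),
         c("lower", "estimate", "upper"))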

How to check for overdispersion in a GAM with negative binomial distribution?

I fit a generalized additive model in the negative binomial family using gam from the mgcv package. I have a data frame containing my dependent variable y, an independent variable x, a factor fac and a random effect ran. I fit the following model
gam1 <- gam(y ~ fac + s(x) + s(ran, bs = "re"), data = dt, family = "nb")
I have read in the book Negative Binomial Regression that it is still possible for the model to be overdispersed. I have found code to check for overdispersion in a glm, but I cannot find the equivalent for a gam. I have also seen suggestions to just check the QQ plot and the standardized residuals vs. predicted values, but I cannot tell from my plots whether the data are still overdispersed. Therefore, I am looking for a statistic that would solve my problem.
A good way to check how well the model compares with the observed data (and hence check for overdispersion in the data relative to the conditional distribution implied by the model) is via a rootogram.
I have a blog post showing how to do this for glm() models using the countreg package, but this works for GAMs too.
The salient parts of the post applied to a GAM version of the model are:
library("coenocliner")
library('mgcv')
## parameters for simulating
set.seed(1)
locs <- runif(100, min = 1, max = 10) # environmental locations
A0 <- 90 # maximal abundance
mu <- 3 # position on gradient of optima
alpha <- 1.5 # parameter of beta response
gamma <- 4 # parameter of beta response
r <- 6 # range on gradient species is present
pars <- list(m = mu, r = r, alpha = alpha, gamma = gamma, A0 = A0)
nb.alpha <- 1.5 # overdispersion parameter 1/theta
zprobs <- 0.3 # prob(y == 0) in binomial model
## simulate some negative binomial data from this response model
nb <- coenocline(locs, responseModel = "beta", params = pars,
countModel = "negbin",
countParams = list(alpha = nb.alpha))
df <- setNames(cbind.data.frame(locs, nb), c("x", "yNegBin"))
OK, so we have a sample of data drawn from a negative binomial sampling distribution and we will now fit two models to these data:
A Poisson GAM
m_pois <- gam(yNegBin ~ s(x), data = df, family = poisson())
A negative binomial GAM
m_nb <- gam(yNegBin ~ s(x), data = df, family = nb())
The countreg package is not yet on CRAN but it can be installed from R-Forge:
install.packages("countreg", repos="http://R-Forge.R-project.org")
Then load the packages and compute the rootograms:
library("countreg")
library("ggplot2")
root_pois <- rootogram(m_pois, style = "hanging", plot = FALSE)
root_nb <- rootogram(m_nb, style = "hanging", plot = FALSE)
Now plot the rootograms for each model:
autoplot(root_pois)
autoplot(root_nb)
This is what we get (after arranging the two rootograms on the same plot with cowplot::plot_grid()):
[Figure: hanging rootograms for the Poisson and negative binomial GAMs]
We can see that the negative binomial model does a bit better here than the Poisson GAM for these data: the bottoms of the bars are closer to zero throughout the range of the observed counts.
The countreg package has details on how you can add an uncertainty band around the zero line as a form of goodness-of-fit test.
You can also compute the Pearson estimate for the dispersion parameter using the Pearson residuals of each model:
sum(residuals(m_pois, type = "pearson")^2) / df.residual(m_pois)
# [1] 28.61546
sum(residuals(m_nb, type = "pearson")^2) / df.residual(m_nb)
# [1] 0.5918471
If the model's conditional distribution matches the data, this ratio should be approximately 1; we see substantial overdispersion in the Poisson GAM and mild underdispersion in the negative binomial GAM.
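As a small convenience (not from the original answer), the Pearson dispersion check above can be wrapped in a helper that works on any fitted mgcv::gam object:
dispersion <- function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}
dispersion(m_pois) # ~28.6: substantial overdispersion under the Poisson model
dispersion(m_nb)   # ~0.59: mild underdispersion under the negative binomial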

Prediction intervals from model average

Is it possible to get prediction intervals from a model average in R?
I've used the MuMIn package to model-average several linear mixed models (that I fit using lme4::lmer()). The MuMIn package supports model predictions and standard errors of estimates (if all of the component models support the estimation of standard errors), which are convenient for getting an estimated [1] confidence interval on the prediction.
To get a prediction interval from a single linear mixed model fit using lme4::lmer(), I could follow Ben Bolker's instructions:
library(lme4)
data("Orthodont", package = "MEMSS")
fm1 <- lmer(
  formula = distance ~ age*Sex + (age|Subject)
  , data = Orthodont
)
newdat <- expand.grid(
  age = c(8, 10, 12, 14)
  , Sex = c("Female", "Male")
  , distance = 0
)
newdat$distance <- predict(fm1, newdat, re.form = NA)
mm <- model.matrix(terms(fm1), newdat)
## or newdat$distance <- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1), mm))
tvar1 <- pvar1 + VarCorr(fm1)$Subject[1] ## must be adapted for more complex models
cmult <- 2 ## could use 1.96
newdat <- data.frame(
  newdat
  , plo = newdat$distance - cmult*sqrt(pvar1) # confidence interval
  , phi = newdat$distance + cmult*sqrt(pvar1) # confidence interval
  , tlo = newdat$distance - cmult*sqrt(tvar1) # prediction interval
  , thi = newdat$distance + cmult*sqrt(tvar1) # prediction interval
)
But how could I do this for several models that are averaged together? This gives me a rough [1] confidence interval, but it's unclear to me how to average the prediction interval across models:
library(lme4)
library(MuMIn)
data("Orthodont", package = "MEMSS")
fit_full <- lmer(
  formula = distance ~ age*Sex + (age|Subject),
  data = Orthodont,
  REML = FALSE,
  na.action = "na.fail"
)
fit_dredge <- dredge(fit_full)
fit_ma <- model.avg(object = get.models(fit_dredge, subset = delta <= 4))
newdat <- expand.grid(
  age = c(8, 10, 12, 14),
  Sex = c("Female", "Male"),
  distance = 0
)
predicted <- predict(fit_ma, newdat, re.form = NA, se.fit = TRUE)
newdat$distance <- predicted$fit
newdat$distance_lower_CI <- predicted$fit - 1.96*predicted$se.fit
newdat$distance_upper_CI <- predicted$fit + 1.96*predicted$se.fit
[1] As Ben Bolker notes here, these confidence intervals only account for uncertainty in the fixed effects, not uncertainty in the random effects. lme4::bootMer() will give a better estimate of the confidence interval, but it only works on a single model, not on a model average.
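For a single model, that bootMer() route might look like the following sketch (not from the original post; predFun is a hypothetical helper, and fm1 and newdat come from the first code block above):
predFun <- function(fit) predict(fit, newdata = newdat, re.form = NA)
boot_fm1 <- lme4::bootMer(fm1, predFun, nsim = 200)
## percentile confidence interval for each row of newdat
t(apply(boot_fm1$t, 2, quantile, c(0.025, 0.975)))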

How to obtain profile confidence intervals of the difference in probability of success between two groups from a logit model (glmer)?

I am struggling to transform the log odds ratio profile confidence intervals obtained from a logit model into probabilities. I would like to know how to calculate the confidence intervals of the difference between two groups.
If the p-value is > 0.05, the 95% CI of the difference should span from below zero to above zero. However, I don't see how negative values can be obtained when the log odds ratios have to be exponentiated. Therefore I tried calculating the CI for one of the groups (B) and looking at the difference between the lower and upper ends of that CI and the estimate for group A. I believe this is not the correct way to calculate the CI of the difference, because the estimate for A is also uncertain.
I would be happy if anyone could help me out.
library(lme4)
# Example data:
set.seed(11)
treatment <- c(rep("A", 30), rep("B", 40))
site <- rep(1:14, each = 5)
presence <- c(rbinom(30, 1, 0.6), rbinom(40, 1, 0.8))
df <- data.frame(presence, treatment, site)
# Likelihood ratio test
M0 <- glmer(presence ~ 1 + (1|site), family = "binomial", data = df)
M1 <- glmer(presence ~ treatment + (1|site), family = "binomial", data = df)
anova(M1, M0)
# Calculating confidence intervals
cc <- confint(M1, parm = "beta_")
ctab <- cbind(est = fixef(M1), cc)
cdat <- as.data.frame(ctab)
# Function to back-transform to probability (0-1); equivalent to plogis()
unlogit <- function(y) {
  exp(y) / (1 + exp(y))
}
# Getting estimates
A_est <- unlogit(cdat$est[1])
B_est <- unlogit(cdat$est[1] + cdat$est[2])
B_lwr <- unlogit(cdat$est[1] + cdat[2, 2])
B_upr <- unlogit(cdat$est[1] + cdat[2, 3])
Difference_est <- B_est - A_est
# This is how I tried to calculate the CI of the difference
Difference_lwr <- B_lwr - A_est
Difference_upr <- B_upr - A_est
# However, I believe this is wrong because A_est is also "uncertain"
How to get the confidence interval of the difference of the probability of presence?
We can calculate the average treatment effect in the following way. From the original data, create two new datasets, one in which all units receive treatment A, and one in which all units receive treatment B. Now, based on your model estimates (in your case, M1), we compute predicted outcomes for units in each of these two datasets. We then compute the mean difference in the outcomes between the two datasets to get our estimated average treatment effect. Here, we can write a function that takes a glmer object and computes the average treatment effect:
ate <- function(.) {
  ## counterfactual datasets: everyone assigned treatment A, everyone treatment B
  treat_A <- treat_B <- df
  treat_A$treatment <- "A"
  treat_B$treatment <- "B"
  c("ate" = mean(predict(., newdata = treat_B, type = "response") -
                   predict(., newdata = treat_A, type = "response")))
}
ate(M1)
# ate
# 0.09478276
How do we get the uncertainty interval? We can use the bootstrap: refit the model many times, calculating the average treatment effect each time, and then use the distribution of the bootstrapped average treatment effects to compute our uncertainty interval. (By default, bootMer performs a parametric bootstrap, simulating new responses from the fitted model.) Here we generate 100 simulations using the bootMer function:
out <- bootMer(M1, ate, seed = 1234, nsim = 100)
and inspect the distribution of the effect:
quantile(out$t, c(0.025, 0.5, 0.975))
# 2.5% 50% 97.5%
# -0.06761338 0.10508751 0.26907504
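Alternatively (a sketch, assuming the boot package; not part of the original answer), bootMer() returns a "boot"-class object, so boot.ci() can compute the percentile interval directly:
library(boot)
boot.ci(out, type = "perc")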

Generating confidence intervals for predicted probabilities after running mlogit function in R

I have been struggling with the following problem for some time and would be very grateful for any help. I am running a logit model in R using the mlogit function and am able to generate the predicted probability of choosing each alternative for a given value of the predictors as follows:
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
Fish_fit <- Fish[-(1:4), ]
Fish_test <- Fish[1:4, ]
m <- mlogit(mode ~ price + catch | income, data = Fish_fit)
predict(m, newdata = Fish_test)
I cannot, however, work out how to add confidence intervals to the predicted probability estimates. I have already tried adding arguments to the predict function, but none seem to generate them. Any ideas on how it can be achieved would be much appreciated.
One approach here is Monte Carlo simulation. You'd simulate repeated draws from a multivariate-normal sampling distribution whose parameters are given by your model results.
For each simulation, estimate your predicted probabilities, and use their empirical distribution over simulations to get your confidence intervals.
library(MASS)
est_betas <- m$coefficients
est_preds <- predict(m, newdata = Fish_test)
sim_betas <- mvrnorm(1000, m$coefficients, vcov(m))  # draws from the sampling distribution
sim_preds <- apply(sim_betas, 1, function(x) {
  m$coefficients <- x  # plug each simulated coefficient vector into the model
  predict(m, newdata = Fish_test)
})
sim_ci <- apply(sim_preds, 1, quantile, c(.025, .975))
cbind(prob = est_preds, t(sim_ci))
# prob 2.5% 97.5%
# beach 0.1414336 0.10403634 0.1920795
# boat 0.3869535 0.33521346 0.4406527
# charter 0.3363766 0.28751240 0.3894717
# pier 0.1352363 0.09858375 0.1823240
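The same simulated draws also support intervals for derived quantities (a sketch, not in the original answer), e.g. the difference in probability between the boat and pier alternatives:
diff_draws <- sim_preds["boat", ] - sim_preds["pier", ]
quantile(diff_draws, c(.025, .5, .975))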
