Let's imagine I want to fit an INLA model with a large dataset and want to do some prediction using the fitted model. One solution is to include into the date the covariates used to predict the outcome, assigning NA values to the outcome.
In my case, I want to regularly calculate the predicted outcome based on updated covariates. Ideally, I would like to run the model once to fit it, save it, and use it later to regularly make prediction (e.g for linear model 'lm()' in R using the 'predict()' function).
I haven't found a way to do it in INLA. Below is a simple reproducible example.
library(INLA)
#simulate data
n = 100; a = 1; b = 1; tau = 100
z = rnorm(n)
eta = a + b*z
scale = exp(rnorm(n))
prec = scale*tau
y = rnorm(n, mean = eta, sd = 1/sqrt(prec))
plot(z,y)
#run INLA model
data = list(y=y, z=z)
formula = y ~ 1+z
result = inla(formula, family = "gaussian", data = data)
summary(result)
#define prediction data
data.pred$z = c(data$z,seq(2,4,length.out=100))
data.pred$y = c(data$y, y=rep(NA,100))
#run INLA model with prediction
result = inla(formula, family = "gaussian", data = data,
control.compute=list(config = TRUE))
summary(result)
#get posterior samples of the predictions
post.samples <- inla.posterior.sample(n = 10, result = result)
pred <- do.call(cbind,
lapply(post.samples,
function(X) X$latent[startsWith(rownames(X$latent), "Pred")]))
I am trying to figure out how to get confidence intervals from predicted values from a model run on medrc (nlme model). The code worked on the regular drc package model, which does not use random effects, so I assume there is something I am not doing right with this nlme model to get CI because I am getting errors.
Below is an example data frame of the data I am using
df <- data.frame(Geno = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,
9,9,9,9,10,10,10,10,11,11,11,11,12,12,12,12,13,13,13,13,14,14,14,14),
Treatment = c(3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",
3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",3,6,9,"MMM",
3,6,9,"MMM",3,6,9,"MMM"),
Temp = c(32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535,
32.741,34.628,37.924,28.535,32.741,34.628,37.924,28.535),
PAM = c(0.62225,0.593,0.35775,0.654,0.60625,0.5846667,0.316,0.60875,0.62275,0.60875,0.32125,
0.63725,0.60275,0.588,0.32275,0.60875,0.65225,0.6185,0.29925,0.64525,0.61925,0.61775,
0.11725,0.596,0.603,0.6065,0.2545,0.59025,0.586,0.5895,0.27025,0.59125,0.6345,0.6135,
0.3755,0.622,0.53375,0.552,0.2485,0.51925,0.6375,0.6256667,0.3575,0.63975,0.59375,0.6055,
0.333,0.64125,0.55275,0.51025,0.319,0.55725,0.6375,0.64725,0.348,0.66125))
df$Geno <- as.factor(df$Geno)
With this data, I am running this model that has 3 parameters for the dose-response curve model, b =slope, d= max, e= ED50.
model <- medrm(PAM ~ Temp,
data=df,
random= d + e ~ 1|Geno,
fct=LL.3(),
control=nlmeControl(msMaxIter = 2000, maxIter=2000, minScale=0.00001, tolerance=0.1, pnlsTol=1))
summary(model)
plot(model)
From this model I want to make prediction values for different temperatures along the model
model_preddata = data.frame(Temp = seq(28,39, length.out = 100))
model_pred = as.data.frame(predict(model, newdata = model_preddata, interval = 'confidence'))
with this I get an error but I can make it predict the PAM values if I add this
model_pred = as.data.frame(predict(model, newdata = model_preddata, interval = 'confidence', level = 0))
However this does not give me the lower and upper bounds columns like it does when I run this code with other non mixed effect models.
Can anyone help me figure out how to get the CI from the predicted values of this model
I am undertaking a ordinal logistic regression using R package MASS.
For example:
library(MASS)
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
summary(house.plr, digits = 3)
I am using the s3 method predict() to obtain the predicted values
test_dat <- data.frame(Infl = factor(rep("Low",4)),
Cont = factor(rep("Low",4)),
Type = unique(housing$Type))
predict(house.plr, test_dat, type = "p")
Low Medium High
1 0.3784493 0.2876752 0.3338755
2 0.5190445 0.2605077 0.2204478
3 0.4675584 0.2745383 0.2579033
4 0.6444840 0.2114256 0.1440905
The result is a table of predicted means for each level of Sat given the variables defined in the test_dat.
How might I extract the variation around each of these means in the form of a standard error or standard deviation?
First, your predicted values are the predicted probability of each outcome for each observation. It is not the predicted mean on the response scale.
Second, you can use the marginaleffects package to get the standard errors for the predicted probabilities and then calculate the confidence intervals yourself. Alternatively, you may implement the non-parametric bootstrap. I implement both in the below. Note that I shifted the order of the columns around in the test data to match the training data.
# Packages
library(MASS)
library(marginaleffects)
library(dplyr)
# Create a test set
N <- 4
test_dat <- data.frame(
Infl = factor(rep("Low", N)),
Type = unique(housing$Type),
Cont = factor(rep("Low", N))
)
# Fit ordered logistic regression model
house.plr <- polr(Sat ~ Infl + Type + Cont,
weights = Freq,
data = housing,
Hess = TRUE)
# Demonstrate that predict() doesn't provide any measure of variability
# for the predicted class probabilities, as shown in question
predict(house.plr, test_dat, type = "probs")
# Use the marginaleffects package to get delta method standard errors for
# each predicted probability
probs <- marginaleffects::predictions(house.plr,
newdata = test_dat,
type = "probs")
# Compute CIs from the standard error using normal approximation
probs$predicted - 1.96*probs$std.error
probs$predicted + 1.96*probs$std.error
# Alternatively, use non-parametric bootstrapped confidence intervals.
# note that this does not adjust the weights to a constant sum for
# each bootstrap, although it is easy to implement. You're free to
# determine how to handle the weights, including resampling based
# on the weights.
# Generate bootstrapped data.frames
set.seed(123)
sims <- 5
samples <- vector(mode = "list", length = sims)
samples <- lapply(samples, function(x){ slice_sample(housing, n = nrow(housing), replace = TRUE)})
# Fit model on each bootstrapped data.frame
models <- lapply(samples, function(x){polr(Sat ~ Infl + Type + Cont,
weights = Freq,
data = x,
Hess = TRUE)})
# Get test predictions into a data.frame
probs_boot <- lapply(models, function(x) {
marginaleffects::predictions(x,
newdata = test_dat,
type = "probs")
})
probs_boot_df <- bind_rows(probs_boot)
# Compute CIs
probs_boot_df %>%
group_by(group, Type.x, Infl, Type.y, Cont) %>%
summarise(ci_low = quantile(predicted, probs = 0.025),
ci_high = quantile(predicted, probs = 0.975))
I'm running multilevel multiple imputation through the package mitml (using the panimpute() function) and am fitting linear mixed models and marginal models through the packages nlme and geepack and the mitml:with() function.
I can get the estimates, p-values etc for those through the testEstimates() function but I'm also looking to get estimated means across my model predictors. I've tried the emmeans package, which I normally use for getting estimated means when running nlme & geepack without multiple imputation but doing so emmeans tell me "Can't handle an object of class “mitml.result”".
I'm wondering is there a way to get pooled estimated means from the multiple imputation analyses I've run?
The data frames I'm analyzing are longitudinal/repeated measures and in long format. In the linear mixed model I want to get the estimated means for a 2x2 interaction effect and in the marginal model I'm trying to get estimated means for the 6 levels of 'time' variable. The outcome in all models is continuous.
Here's my code
# mixed model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100, group = "treatment")
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, lme(Dep ~ time*treatment, random = ~ 1|id, method = "ML", na.action = na.exclude, control = list(opt = "optim")))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
# marginal model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100)
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, geeglm(Dep ~ time, id = id, corstr ="unstructured"))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
is there a way to get pooled estimated means from the multiple imputation analyses I've run?
This is not a reprex without Data, so I can't verify this works for you. But emmeans provides support for mira-class (lists of) models in the mice package. So if you fit your model in with() using the mids rather than mitml.list class object, then you can use that to obtain marginal means of your outcome (and any contrasts or pairwise comparisons afterward).
Using example data found here, which uncomfortably loads an external workspace:
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
## imputation
library(mice)
ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth[c(3, 5, 6, 7)] <- "norm"
pred <- ini$pred
pred[, "pupil"] <- 0
imp <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
## analysis
library(lme4) # fit multilevel model
mod <- with(imp, lmer(popular ~ sex + (1|class)))
library(emmeans) # obtain pooled estimates of means
(em <- emmeans(mod, specs = ~ sex) )
pairs(em) # test comparison