Is it possible to get prediction intervals from a model average in R?
I've used the MuMIn package to model-average several linear mixed models (fit using lme4::lmer()). MuMIn supports model-averaged predictions and standard errors of those predictions (provided all of the component models support standard-error estimation), which is convenient for getting an [estimated][1] confidence interval on the prediction.
To get a prediction interval from a single linear mixed model fit using lme4::lmer(), I could follow Ben Bolker's instructions:
library(lme4)
data("Orthodont",package="MEMSS")
fm1 <- lmer(
formula = distance ~ age*Sex + (age|Subject)
, data = Orthodont
)
newdat <- expand.grid(
age=c(8,10,12,14)
, Sex=c("Female","Male")
, distance = 0
)
newdat$distance <- predict(fm1,newdat,re.form=NA)
mm <- model.matrix(terms(fm1),newdat)
## or newdat$distance <- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)$Subject[1] ## must be adapted for more complex models
cmult <- 2 ## could use 1.96
newdat <- data.frame(
newdat
, plo = newdat$distance-cmult*sqrt(pvar1) # Confidence Interval
, phi = newdat$distance+cmult*sqrt(pvar1) # Confidence Interval
, tlo = newdat$distance-cmult*sqrt(tvar1) # Prediction Interval
, thi = newdat$distance+cmult*sqrt(tvar1) # Prediction Interval
)
But how could I do this for several models that are averaged together? This gives me a [rough][1] confidence interval, but it's unclear to me how to average the prediction interval across models:
library(lme4)
library(MuMIn)
data("Orthodont",package="MEMSS")
fit_full <- lmer(
formula = distance ~ age*Sex + (age|Subject),
data = Orthodont,
REML = FALSE,
na.action = 'na.fail'
)
fit_dredge <- dredge(fit_full)
fit_ma <- model.avg(object = get.models(fit_dredge, subset = delta <= 4))
newdat <- expand.grid(
age=c(8,10,12,14),
Sex=c("Female","Male"),
distance = 0
)
predicted <- predict(fit_ma,newdat,re.form=NA, se.fit = TRUE)
newdat$distance <- predicted$fit
newdat$distance_lower_CI <- predicted$fit - 1.96*predicted$se.fit
newdat$distance_upper_CI <- predicted$fit + 1.96*predicted$se.fit
[1] As Ben Bolker notes here, these confidence intervals only account for uncertainty in the fixed effects, not uncertainty in the random effects. lme4::bootMer() will give a better estimate of the confidence interval, but it only works on a single model, not a model-average.
So I have a specific predicted value that I calculated using logistic regression and now I need to find the CI for that probability. Here is my code:
cheese_out <- glm(taste~acetic+person,data=cheese,family = "binomial")
probabilities <- predict(cheese_out,newdata=cheese, type="response")
testdat <- data.frame(acetic = 6, person = "Child")
pred_accp <- predict(cheese_out, newdata=testdat, type="response")
This gives a pred_accp value of 0.1206, but how do I calculate a confidence interval for that value?
You may use option se.fit=TRUE of the predict function. This gives you standard errors from which you can calculate the confidence interval. Example:
out <- glm(I(Sepal.Length > 5.8) ~ Sepal.Width + Species, iris, family=binomial())
testdat <- data.frame(Sepal.Width=3, Species="versicolor")
pred_accp <- predict(out, newdata=testdat, type="response", se.fit=TRUE)
alpha <- .05 ## significance level, i.e. a 95% confidence level
cc <- -qt(alpha/2, df=Inf)*pred_accp$se.fit
setNames(
pred_accp$fit + cc * c(-1, 0, 1),
c("lower", "estimate", "upper"))
# lower estimate upper
# 0.5505699 0.7072896 0.8640093
Note that this assumes a standard normal (z) reference distribution, i.e. df=Inf. For a t-distribution, specify the appropriate degrees of freedom instead.
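A common alternative, sketched here on the same iris example, is to build the interval on the link (logit) scale and then back-transform it, which keeps both bounds inside (0, 1). The names fit, new_obs, lp, ci_link and ci_prob are illustrative only, and the 95% level is an assumption.

```r
# Sketch: Wald interval on the logit scale, back-transformed to probabilities.
fit <- glm(I(Sepal.Length > 5.8) ~ Sepal.Width + Species,
           data = iris, family = binomial())
new_obs <- data.frame(Sepal.Width = 3, Species = "versicolor")

# Linear predictor and its standard error for the new observation
lp <- predict(fit, newdata = new_obs, type = "link", se.fit = TRUE)

z <- qnorm(0.975)                               # 95% two-sided normal quantile
ci_link <- lp$fit + z * lp$se.fit * c(-1, 0, 1) # lower, estimate, upper (logit)

# plogis() is the inverse logit, so the bounds cannot leave (0, 1)
ci_prob <- setNames(plogis(ci_link), c("lower", "estimate", "upper"))
ci_prob
```

Unlike the response-scale interval, this one is not symmetric around the estimate.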
I am running a program where I fit an OLS regression and then subtract the predicted values from the actual observations to keep the residuals.
model1 = lm(data = final, obs ~ day + poly(temp,2) + prpn + school + lag1) # linear model
predfit = predict(model1, final) # predicted values
residuals = data.frame(final$obs - predfit) # obtain residuals
I want to bootstrap my model and then do the same with the bootstrapped coefficients. I try doing this the following way:
lboot <- lm.boot(model1, R = 1000)
predfit = predict(lboot, final)
residuals = data.frame(final$obs - predfit) # obtain residuals
However, that does not work. I also try:
boot_predict(model1, final, R = 1000, condense = T, comparison = "difference")
and that also does not work.
How can I bootstrap my model and then predict based on that?
If you're trying to fit the best OLS using bootstrap, I'd use the caret package.
library(caret)
#separate indep and dep variables (drop the response column by name)
indepVars = final[, names(final) != "obs", drop = FALSE]
depVar = final$obs
#train model
ols.train = train(indepVars, depVar, method='lm',
trControl = trainControl(method='boot', number=1000))
#make prediction and get residuals
ols.pred = predict(ols.train, indepVars)
residuals = final$obs - ols.pred # observed minus predicted, as in the question
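If the goal is predictions from each bootstrap refit rather than a resampling-based performance estimate, a case-resampling bootstrap can be sketched in base R. This is only an illustration under stated assumptions: mtcars and the formula mpg ~ wt + hp stand in for the asker's final data and model1, and R = 200 is an arbitrary replicate count.

```r
set.seed(1)                                   # reproducible resampling
dat <- mtcars                                 # stand-in for `final` (assumption)
R <- 200                                      # number of bootstrap replicates

boot_pred <- replicate(R, {
  idx <- sample(nrow(dat), replace = TRUE)    # resample rows with replacement
  refit <- lm(mpg ~ wt + hp, data = dat[idx, ]) # refit on the bootstrap sample
  predict(refit, newdata = dat)               # predict for the original rows
})
# boot_pred is an nrow(dat) x R matrix: one column per bootstrap refit

pred_mean <- rowMeans(boot_pred)              # bootstrap-averaged prediction
pred_ci <- t(apply(boot_pred, 1, quantile, probs = c(0.025, 0.975)))
boot_resid <- dat$mpg - pred_mean             # observed minus averaged prediction
head(cbind(pred_mean, pred_ci, boot_resid))
```

The percentile columns in pred_ci describe uncertainty in the fitted mean, not a prediction interval for new observations.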
In this regression:
I know that confint(), e.g. applied to a glht object from the multcomp package, gives a confidence interval for each coefficient estimate.
But how do I get a confidence interval for a linear combination of coefficients,
such as β3+2*β5, in R?
You can do that with linearHypothesis in the car package:
library(car)
dat <- data.frame(
y = rnorm(100),
x1 = rnorm(100),
x2 = rnorm(100)
)
fit <- lm(y ~ x1 + x2, data = dat)
# enter linear hypothesis as a matrix
linearHypothesis(fit, t(c(0,2,2)), 0)
# enter linear hypothesis as a string
linearHypothesis(fit, "2*x1 + 2*x2 = 0")
Or with glht in the multcomp package, which also provides a confidence interval for the linear combination:
library(multcomp)
lh <- glht(fit, linfct = t(c(0,2,2)))
confint(lh)
# Linear Hypotheses:
# Estimate lwr upr
# 1 == 0 0.1258 -0.4398 0.6914
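For a plain lm fit, the same kind of interval can also be sketched by hand from the coefficient covariance matrix, which makes the calculation explicit: the estimate is w'b and its variance is w' V w. The names w, est, se, tq and ci are illustrative, and the seed and the 95% level are assumptions.

```r
set.seed(42)
dat <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
fit <- lm(y ~ x1 + x2, data = dat)

w <- c(0, 2, 2)                              # weights for (Intercept), x1, x2
est <- sum(w * coef(fit))                    # point estimate of 2*b1 + 2*b2
se <- sqrt(drop(t(w) %*% vcov(fit) %*% w))   # standard error from vcov(fit)
tq <- qt(0.975, df = df.residual(fit))       # t quantile for a 95% interval

ci <- c(estimate = est, lwr = est - tq * se, upr = est + tq * se)
ci
```

For a combination such as β3+2*β5, set the third and fifth entries of w to 1 and 2 and the rest to 0.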
In this method for calculating marginal effects for a binomial logit using rstanarm,
https://stackoverflow.com/a/45042387/9264004
nd <- md
nd$x1 <- 0
p0 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)
nd$x1 <- 1
p1 <- posterior_linpred(glm1, newdata = nd, transform = TRUE)
ME <- p1 - p0
AME <- rowMeans(ME)
Can intervals for the marginal effects be calculated by taking quantiles, like this:
QME <- quantile(AME, c(.025,.25,.5,.75,.975))
or is there a more correct way to calculate a standard error for the effect?
If you are interested in the posterior standard deviation of the average (over the data) "marginal" effect of changing x1 from 0 to 1, then it would be sd(AME) or possibly mad(AME), since AME holds one averaged effect per posterior draw. But if you want quantiles, then call quantile.
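To make the shapes concrete, here is a small sketch that does not need rstanarm. It assumes ME is a draws-by-observations matrix, as posterior_linpred returns, and fills it with simulated numbers purely for illustration; the dimensions and values are made up.

```r
set.seed(123)
n_draws <- 4000
n_obs <- 50
# Hypothetical stand-in for ME = p1 - p0 (rows = posterior draws, cols = observations)
ME <- matrix(rnorm(n_draws * n_obs, mean = 0.1, sd = 0.05), nrow = n_draws)

AME <- rowMeans(ME)                      # one average marginal effect per draw

post_sd <- sd(AME)                       # posterior standard deviation of the AME
post_mad <- mad(AME)                     # robust alternative
post_q <- quantile(AME, c(.025, .25, .5, .75, .975)) # credible-interval endpoints
post_q
```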
Anyone's got a quick short educational example how to use Neural Networks (nnet in R) for the purpose of prediction?
Here is an example, in R, of a time series
T = seq(0,20,length=200)
Y = 1 + 3*cos(4*T+2) +.2*T^2 + rnorm(200)
plot(T,Y,type="l")
Many thanks
David
I think you can use the caret package, especially the train function:
This function sets up a grid of tuning parameters for a number
of classification and regression routines.
require(quantmod)
require(nnet)
require(caret)
T = seq(0,20,length=200)
y = 1 + 3*cos(4*T+2) +.2*T^2 + rnorm(200)
dat <- data.frame( y, x1=Lag(y,1), x2=Lag(y,2))
names(dat) <- c('y','x1','x2')
dat <- dat[c(3:200),] #delete first 2 observations
#Fit model
model <- train(y ~ x1+x2 ,
dat,
method='nnet',
linout=TRUE,
trace = FALSE)
ps <- predict(model, dat)
#Examine results
plot(T,y,type="l",col = 2) # the series is stored in lowercase y above
lines(T[-c(1:2)],ps, col=3)
legend(5, 70, c("y", "pred"), cex=1.5, fill=2:3)
The solution proposed by @agstudy is useful, but in-sample fits are not a reliable guide to out-of-sample forecasting accuracy. The gold standard in forecasting accuracy measurement is to use a holdout sample. Remove the last 5, 10 or 20 observations (depending on the length of the time series) from the training sample, fit your models to the rest of the data, use the fitted models to forecast the holdout sample, and compare accuracies on the holdout using Mean Absolute Deviations (MAD) or weighted Mean Absolute Percentage Errors (wMAPEs).
So to do this you can change the code above in this way:
require(quantmod)
require(nnet)
require(caret)
t = seq(0,20,length=200)
y = 1 + 3*cos(4*t+2) +.2*t^2 + rnorm(200)
dat <- data.frame( y, x1=Lag(y,1), x2=Lag(y,2))
names(dat) <- c('y','x1','x2')
train_set <- dat[c(3:185),]
test_set <- dat[c(186:200),]
#Fit model
model <- train(y ~ x1+x2 ,
train_set,
method='nnet',
linout=TRUE,
trace = FALSE)
ps <- predict(model, test_set)
#Examine results
plot(t,y,type="l",col = 2)
lines(t[c(186:200)],ps, col=3)
legend(5, 70, c("y", "pred"), cex=1.5, fill=2:3)
This last line outputs the wMAPE of the forecasts from the model:
sum(abs(ps - test_set$y))/sum(abs(test_set$y))
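The two accuracy measures mentioned above can be wrapped in small helpers. This is a minimal sketch: the function names wmape and mad_err are hypothetical (mad_err avoids masking stats::mad), and the toy vectors are made up to keep the arithmetic checkable by hand.

```r
# Weighted MAPE: total absolute error relative to total absolute actuals
wmape <- function(actual, predicted) {
  sum(abs(actual - predicted)) / sum(abs(actual))
}

# Mean absolute deviation of the forecasts from the actuals
mad_err <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

actual <- c(10, 20, 30)
predicted <- c(12, 18, 33)      # absolute errors are 2, 2 and 3

wmape(actual, predicted)        # 7 / 60
mad_err(actual, predicted)      # 7 / 3
```

With the holdout code above, the call would be wmape(test_set$y, ps).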