I am trying to plot a bootstrapped linear model together with the model's upper and lower confidence bounds, but I am not sure I am calculating the bootstrapped upper and lower bounds correctly. Below is an example using the cars dataset in R in conjunction with the boot library.
library(boot)
plot(speed~dist,cars,pch=21,bg="grey")
## standard linear model
mod<-lm(speed~dist,cars)
new.dat=seq(0,120,10)
mod.fit<-predict(mod,newdata=data.frame(dist=new.dat),interval="confidence")
lines(new.dat,mod.fit[,1]);#line fit
lines(new.dat,mod.fit[,2],lty=2);#lower confidence interval
lines(new.dat,mod.fit[,3],lty=2);#upper confidence interval
##Bootstrapped Confidence Intervals
lm.boot=function(formula, data, indices) {
d <- data[indices,] # allows boot to select sample
fit <- lm(formula, data=d)
return(coef(fit))
}
results <- boot(data=cars, statistic=lm.boot,
R=100, formula=speed~dist)
N.mod<-nrow(cars)
x.val<-new.dat
y.boot.fit<-(mean(results$t[,2])*x.val)+mean(results$t[,1])
y.boot.fit.uCI<-y.boot.fit+qt(0.975,N.mod-2)*sd(results$t[,2])
y.boot.fit.lCI<-y.boot.fit-qt(0.975,N.mod-2)*sd(results$t[,2])
lines(new.dat,y.boot.fit,col="red")
lines(new.dat,y.boot.fit.lCI,lty=2,col="red");#lower confidence interval
lines(new.dat,y.boot.fit.uCI,lty=2,col="red");#upper confidence interval
legend("bottomright",legend=c("Linear Model","Bootstrapped Model"),lty=1,
col=c("black","red"),ncol=1,cex=1,bty="n",y.intersp=1.5,x.intersp=0.75,
xpd=NA,xjust=0.5)
Using this code I get this output where the bootstrapped confidence interval is on top of the bootstrapped fitted line.
Any help/direction is appreciated. Granted, this might be better suited for Cross Validated or another statistics-specific board.
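For what it's worth, the dashed red lines in the code above sit at a constant offset from the fitted line because the same quantity, qt(0.975,N.mod-2)*sd(results$t[,2]), is added at every x value. One common alternative is a pointwise percentile band: return the predictions on a fixed grid from each bootstrap refit and take the 2.5% and 97.5% quantiles at each grid point. The following is only a minimal sketch under that assumption; the names pred.boot, grid and band are made up for illustration.

```r
library(boot)

# Grid of x values at which to evaluate every bootstrap fit (hypothetical name)
grid <- data.frame(dist = seq(0, 120, 10))

# Statistic: refit on the resampled rows and return predictions on the grid
pred.boot <- function(data, indices, formula, newdata) {
  fit <- lm(formula, data = data[indices, ])
  predict(fit, newdata = newdata)
}

set.seed(1)
res <- boot(data = cars, statistic = pred.boot, R = 1000,
            formula = speed ~ dist, newdata = grid)

# res$t has one row per replicate and one column per grid point;
# take pointwise 2.5% and 97.5% quantiles across replicates
band <- t(apply(res$t, 2, quantile, probs = c(0.025, 0.975)))

plot(speed ~ dist, cars, pch = 21, bg = "grey")
lines(grid$dist, res$t0, col = "red")              # fit on the original data
lines(grid$dist, band[, 1], lty = 2, col = "red")  # lower bound
lines(grid$dist, band[, 2], lty = 2, col = "red")  # upper bound
```

Because each replicate contributes a whole fitted line, the band reflects the joint uncertainty in intercept and slope and is free to widen away from the centre of the data.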
Related
I'm analyzing a linear model with 2 factors and I need to find a confidence interval for the response variable. As I understand it confint() and predict(, interval='confidence') both find confidence intervals so what is the difference between them?
confint() finds confidence intervals on the model parameters
predict(., interval="confidence") finds confidence intervals on the model predictions
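A small illustration of the difference, as a sketch only: it assumes the built-in mtcars data with cyl and am treated as the two factors, which is not the asker's model, and the object names are made up.

```r
# Two-factor linear model on a built-in data set (stand-in for the asker's data)
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))
fit <- lm(mpg ~ cyl + am, data = d)

# Intervals for the model parameters: one row per coefficient
ci.coef <- confint(fit)
print(ci.coef)

# Intervals for the mean response at chosen factor combinations:
# one row per row of newdata, with columns fit, lwr, upr
new <- data.frame(cyl = factor(c("4", "8"), levels = levels(d$cyl)),
                  am  = factor(c("0", "1"), levels = levels(d$am)))
ci.mean <- predict(fit, newdata = new, interval = "confidence")
print(ci.mean)
```

So for a confidence interval on the response at given factor levels, the predict() form is the one to look at.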
This is an extension of this question about 95% confidence intervals for quantile regression using quantreg:
Calculating 95% confidence intervals in quantile regression in R using rq function
Here, the goal is to determine 95% confidence intervals for quantile regression for a polynomial fit.
Data:
x<-1:50
y<-c(x[1:50]+rnorm(50,0,5))^2
Attempt using the approach in the aforementioned question:
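library(quantreg)
QR.b <- boot.rq(cbind(1,x,x^2),y,tau=0.5, R=1000)
# recent quantreg versions return a list; the bootstrapped coefficients are in $B
t(apply(QR.b$B, 2, quantile, c(0.025,0.975)))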
            2.5%      97.5%
[1,] -14.9880661 126.906083
[2,] -20.5603779   5.424308
[3,]   0.8608203   1.516513
But this of course determines the 95% CI for each coefficient independently, and appears to overestimate the interval (see image below).
I had another idea for an approach: simply determine the coefficients from bootstrap samples of the data (i.e. run rq(y~x+I(x^2)) on thousands of resamples of y and x), but I wanted to see whether something is built into the package.
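The resampling idea described above could look roughly like this. It is a sketch under assumptions rather than a built-in quantreg feature: rows of (x, y) are resampled with replacement, the quadratic median regression is refit each time, and pointwise percentiles of the fitted curves are taken. The names dat, grid, boot.pred and band are made up for illustration.

```r
library(quantreg)

set.seed(1)
x <- 1:50
y <- (x + rnorm(50, 0, 5))^2
dat <- data.frame(x, y)
grid <- data.frame(x = x)   # x values at which every refit is evaluated

R <- 1000
# Each replicate: resample rows, refit, predict on the grid.
# The result is a matrix with one row per grid point and one column per replicate.
boot.pred <- replicate(R, {
  d <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- rq(y ~ x + I(x^2), tau = 0.5, data = d)
  predict(fit, newdata = grid)
})

# Pointwise 95% percentile band for the fitted median curve (2 rows: lower, upper)
band <- apply(boot.pred, 1, quantile, probs = c(0.025, 0.975))

fit0 <- rq(y ~ x + I(x^2), tau = 0.5, data = dat)
plot(x, y)
lines(x, predict(fit0, newdata = grid))
lines(x, band[1, ], lty = 2)
lines(x, band[2, ], lty = 2)
```

Since every replicate keeps its three coefficients together, the band accounts for their correlation, which the per-coefficient intervals do not.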
I would like to get 95% confidence intervals for the regression coefficients of a quantile regression. You can calculate quantile regressions using the rq function of the quantreg package in R (compared to an OLS model):
library(quantreg)
LM<-lm(mpg~disp, data = mtcars)
QR<-rq(mpg~disp, data = mtcars, tau=0.5)
I am able to get 95% confidence intervals for the linear model using the confint function:
confint(LM)
When I use quantile regression I understand that the following code produces bootstrapped standard errors:
summary.rq(QR,se="boot")
But what I would actually like is something like 95% confidence intervals. That is, something to interpret like: "with a probability of 95%, the interval [...] includes the true coefficient". When I calculate standard errors using summary.lm() I would just multiply SE*1.96 and get results similar to those from confint(). But this is not possible using bootstrapped standard errors.
So my question is: how do I get 95% confidence intervals for quantile regression coefficients?
You can use the boot.rq function directly to bootstrap the coefficients:
x<-1:50
y<-c(x[1:48]+rnorm(48,0,5),rnorm(2,150,5))
QR <- rq(y~x, tau=0.5)
summary(QR, se='boot')
LM<-lm(y~x)
QR.b <- boot.rq(cbind(1,x),y,tau=0.5, R=10000)
t(apply(QR.b$B, 2, quantile, c(0.025,0.975)))
confint(LM)
plot(x,y)
abline(coefficients(LM),col="green")
abline(coefficients(QR),col="blue")
for(i in seq_len(nrow(QR.b$B))) {
abline(QR.b$B[i,1], QR.b$B[i,2], col='#0000ff01')
}
You may want to use the boot package to compute intervals other than the percentile interval.
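As a rough sketch of that suggestion, the following uses boot() and boot.ci() with the mtcars model from the question. The statistic function rq.coef is a made-up name, and the choice of interval types is only an example.

```r
library(boot)
library(quantreg)

# Statistic: median-regression coefficients on the resampled rows
rq.coef <- function(data, indices) {
  coef(rq(mpg ~ disp, tau = 0.5, data = data[indices, ]))
}

set.seed(1)
b <- boot(mtcars, rq.coef, R = 2000)

# Normal, basic and percentile intervals for the slope (index = 2)
ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc"), index = 2)
print(ci)
```

Setting index = 1 gives the same intervals for the intercept.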
You could also simply retrieve the vcov from the object, setting covariance=TRUE. This amounts to using bootstrapped standard errors in your CI:
vcov.rq <- function(x, se = "boot") { # "boot" so that confint() uses bootstrapped SEs
vc <- summary.rq(x, se=se, cov=TRUE)$cov
dimnames(vc) <- list(names(coef(x)), names(coef(x)))
vc
}
confint(QR)
But yes, the better way to do this is to use a studentized bootstrap.
I am trying to determine confidence intervals for predicted probabilities from a binomial logistic regression in R. The model is estimated using lrm (from the package rms) to allow for clustering standard errors on survey respondents (each respondent appears up to 3 times in the data):
library(rms)
model1<-lrm(outcome~var1+var2+var3,data=mydata,x=T,y=T,se.fit=T)
model.rob<-robcov(model1,cluster=respondent.id)
I am able to estimate a predicted probability for the outcome using predict.lrm:
predicted.prob<-predict(model.rob,newdata=data.frame(var1=1,var2=.33,var3=.5),
type="fitted")
What I want to determine is a 95% confidence interval for this predicted probability. I have tried specifying se.fit=T, but this is not permissible in predict.lrm when type="fitted".
I have spent the last few hours scouring the Internet for how to do this with lrm to no avail (obviously). Can anyone point me toward a method for determining this confidence interval? Alternatively, if it is impossible or difficult with lrm models, is there another way to estimate a logit with clustered standard errors for which confidence intervals would be more easily obtainable?
The help file for predict.lrm has a clear example. Here is a slight modification of it:
L <- predict(fit, newdata=data.frame(...), se.fit=TRUE)
plogis(with(L, linear.predictors + 1.96*cbind(- se.fit, se.fit)))
For some problems you may want to use the gendata or Predict functions, e.g.
L <- predict(fit, gendata(fit, var1=1), se.fit=TRUE) # leave other vars at median/mode
Predict(fit, var1=1:2, var2=3) # leave other vars at median/mode; gives CLs
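The same pattern (build the interval on the linear-predictor scale, then back-transform with plogis) can be tried without rms. The sketch below swaps in base R's glm() on mtcars purely for illustration; it does not use clustered standard errors, so it is not a substitute for the robcov() fit above.

```r
# Plain logistic regression with base R (no cluster-robust covariance here)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
new <- data.frame(wt = 3)

# Prediction and standard error on the logit (link) scale
L <- predict(fit, newdata = new, type = "link", se.fit = TRUE)

# 95% Wald interval on the logit scale, mapped back to probabilities
p  <- plogis(L$fit)
ci <- plogis(L$fit + qnorm(0.975) * c(-1, 1) * L$se.fit)
print(c(prob = unname(p), lower = ci[1], upper = ci[2]))
```

Back-transforming the endpoints keeps the interval inside (0, 1), which adding and subtracting a standard error on the probability scale would not guarantee.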
I am working on a big data set with over 300K elements, and I am running a regression analysis to estimate a parameter called Rate using the predictor variable Distance. I have the regression equation. Now I want to get the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients with the command:
> confint(W1500.LR1, level = 0.95)
                  2.5 %      97.5 %
(Intercept) 666.2817393 668.0216072
Distance      0.3934499   0.3946572
which gives me the upper and lower bounds for the CI of the coefficients. Now I want to get the same upper and lower bounds for the prediction intervals. The only thing I have learnt so far is that I can get the prediction intervals for specific values of Distance (say 200, 500, etc.) using the code:
predict(W1500.LR1, newdata, interval="predict")
This is not useful for me because I have over 300K different distance values, which would require running this code for each of them. Is there a simple way to get the prediction intervals, like the confint command I showed above?
Had to make up my own data but here you go
x = rnorm(300000)
y = jitter(3*x,1000)
fit = lm(y~x)
#Prediction intervals
pred.int = predict(fit,interval="prediction")
#Confidence intervals
conf.int = predict(fit,interval="confidence")
fitted.values = pred.int[,1]
pred.lower = pred.int[,2]
pred.upper = pred.int[,3]
plot(x[1:1000],y[1:1000])
lines(x[1:1000],fitted.values[1:1000],col="red",lwd=2)
lines(x[1:1000],pred.lower[1:1000],lwd=2,col="blue")
lines(x[1:1000],pred.upper[1:1000],lwd=2,col="blue")
So as you can see, prediction intervals are for predicting new data values, not for constructing intervals for the beta coefficients. The confidence intervals you actually want can be obtained in the same fashion from conf.int.