Calculating 95% confidence intervals in quantile regression in R using rq function

I would like to get 95% confidence intervals for the regression coefficients of a quantile regression. You can calculate quantile regressions using the rq function of the quantreg package in R (compared to an OLS model):
library(quantreg)
LM<-lm(mpg~disp, data = mtcars)
QR<-rq(mpg~disp, data = mtcars, tau=0.5)
I am able to get 95% confidence intervals for the linear model using the confint function:
confint(LM)
When I use quantile regression I understand that the following code produces bootstrapped standard errors:
summary.rq(QR,se="boot")
But what I actually want is something like a 95% confidence interval, that is, something to interpret as: "with a probability of 95%, the interval [...] includes the true coefficient". When I calculate standard errors using summary.lm(), I can just multiply SE*1.96 and get results similar to those from confint(). But this is not possible using bootstrapped standard errors.
So my question is: how do I get 95% confidence intervals for quantile regression coefficients?

You can use the boot.rq function directly to bootstrap the coefficients:
x<-1:50
y<-c(x[1:48]+rnorm(48,0,5),rnorm(2,150,5))
QR <- rq(y~x, tau=0.5)
summary(QR, se='boot')
LM<-lm(y~x)
QR.b <- boot.rq(cbind(1,x),y,tau=0.5, R=10000)
t(apply(QR.b$B, 2, quantile, c(0.025,0.975)))
confint(LM)
plot(x,y)
abline(coefficients(LM),col="green")
abline(coefficients(QR),col="blue")
for(i in seq_len(nrow(QR.b$B))) {
abline(QR.b$B[i,1], QR.b$B[i,2], col='#0000ff01')
}
You may want to use the boot package to compute intervals other than the percentile interval.
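As a minimal sketch of that suggestion, assuming the boot package is installed and reusing the mtcars model from the question: boot() refits the median regression on resampled rows, and boot.ci() then reports several interval types side by side. The helper name rq_coef and the choice R = 999 are illustrative and not part of the original answer.

```r
library(quantreg)
library(boot)

set.seed(1)  # bootstrap results vary from run to run without a seed

# Statistic for boot(): refit the median regression on the resampled rows
# (rq_coef is a hypothetical helper name)
rq_coef <- function(data, idx) {
  coef(rq(mpg ~ disp, tau = 0.5, data = data[idx, ]))
}

b <- boot(mtcars, statistic = rq_coef, R = 999)

# index = 2 selects the slope for disp; index = 1 would give the intercept
ci_slope <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc"), index = 2)
print(ci_slope)
```

rq() may warn that the solution is non-unique on some resamples; that is expected with tied data and does not stop the bootstrap.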

You could also simply retrieve the vcov from the object, setting covariance=TRUE. This amounts to using bootstrapped standard errors in your CI:
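# vcov method for rq fits, so that confint() can build normal-approximation intervals
# se = "boot" requests the bootstrap covariance; other summary.rq methods (e.g. "iid") also work
vcov.rq <- function(x, se = "boot") {
  vc <- summary.rq(x, se=se, covariance=TRUE)$cov
  dimnames(vc) <- list(names(coef(x)), names(coef(x)))
  vc
}
confint(QR)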
But yes, the better way to do this is to use a studentized bootstrap.
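For comparison with the SE*1.96 calculation in the question, here is a hedged sketch of the standard-error route written out by hand. It assumes the coefficient table returned by summary() for an rq fit holds the estimates in column 1 and the standard errors in column 2, and it treats the bootstrap distribution as approximately normal, which is exactly the assumption a studentized or percentile interval avoids.

```r
library(quantreg)

set.seed(1)  # bootstrapped standard errors change between runs otherwise

QR <- rq(mpg ~ disp, data = mtcars, tau = 0.5)

# Coefficient table with bootstrapped standard errors
tab <- summary(QR, se = "boot", R = 1000)$coefficients

est <- tab[, 1]  # estimates
se  <- tab[, 2]  # bootstrapped standard errors
z   <- qnorm(0.975)

# Normal-approximation 95% interval: estimate +/- z * SE
ci <- cbind(lower = est - z * se, upper = est + z * se)
print(ci)
```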

Related

95% confidence interval for smooth.spline in R [duplicate]

I have used smooth.spline to estimate a cubic spline for my data. But when I calculate the 90% point-wise confidence interval using the formula in the code below, the results seem to be a little bit off. Can someone please tell me if I did it wrong? I am also wondering if there is a function that can automatically calculate a point-wise interval band for a smooth.spline fit.
boneMaleSmooth = smooth.spline( bone[males,"age"], bone[males,"spnbmd"], cv=FALSE)
error90_male = qnorm(.95)*sd(boneMaleSmooth$x)/sqrt(length(boneMaleSmooth$x))
plot(boneMaleSmooth, ylim=c(-0.5,0.5), col="blue", lwd=3, type="l", xlab="Age",
ylab="Relative Change in Spinal BMD")
points(bone[males,c(2,4)], col="blue", pch=20)
lines(boneMaleSmooth$x,boneMaleSmooth$y+error90_male, col="purple",lty=3,lwd=3)
lines(boneMaleSmooth$x,boneMaleSmooth$y-error90_male, col="purple",lty=3,lwd=3)
Because I am not sure if I did it correctly, I then used the gam() function from the mgcv package.
It instantly gave a confidence band, but I am not sure whether it is a 90% or 95% CI or something else. It would be great if someone could explain.
males=gam(bone[males,c(2,4)]$spnbmd ~s(bone[males,c(2,4)]$age), method = "GCV.Cp")
plot(males,xlab="Age",ylab="Relative Change in Spinal BMD")
I'm not sure smooth.spline has "nice" confidence intervals like those from lowess. But I found a code sample from a CMU Data Analysis course that makes bootstrap confidence intervals.
Here are the functions used and an example. The main function is spline.cis, whose first parameter is a data frame in which the first column holds the x values and the second column holds the y values. The other important parameter is B, which sets the number of bootstrap replications. (See the linked PDF above for the full details.)
# Helper functions
resampler <- function(data) {
n <- nrow(data)
resample.rows <- sample(1:n,size=n,replace=TRUE)
return(data[resample.rows,])
}
spline.estimator <- function(data,m=300) {
fit <- smooth.spline(x=data[,1],y=data[,2],cv=TRUE)
eval.grid <- seq(from=min(data[,1]),to=max(data[,1]),length.out=m)
return(predict(fit,x=eval.grid)$y) # We only want the predicted values
}
spline.cis <- function(data,B,alpha=0.05,m=300) {
spline.main <- spline.estimator(data,m=m)
spline.boots <- replicate(B,spline.estimator(resampler(data),m=m))
cis.lower <- 2*spline.main - apply(spline.boots,1,quantile,probs=1-alpha/2)
cis.upper <- 2*spline.main - apply(spline.boots,1,quantile,probs=alpha/2)
return(list(main.curve=spline.main,lower.ci=cis.lower,upper.ci=cis.upper,
x=seq(from=min(data[,1]),to=max(data[,1]),length.out=m)))
}
#sample data
data<-data.frame(x=rnorm(100), y=rnorm(100))
#run and plot
sp.cis <- spline.cis(data, B=1000,alpha=0.05)
plot(data[,1],data[,2])
lines(x=sp.cis$x,y=sp.cis$main.curve)
lines(x=sp.cis$x,y=sp.cis$lower.ci, lty=2)
lines(x=sp.cis$x,y=sp.cis$upper.ci, lty=2)
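And that gives a scatterplot of the data with the fitted spline as a solid line and the bootstrap confidence limits as dashed lines.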
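The two limits computed inside spline.cis are basic (pivotal) bootstrap limits rather than plain percentile limits. As a short worked statement, writing \hat f(x) for the spline fitted to the original data and q_p(x) for the p-quantile of the bootstrap fits at x (notation introduced here for illustration):

```latex
% Basic (pivotal) bootstrap interval at each grid point x.
% The sampling error \hat f(x) - f(x) is approximated by the
% bootstrap error \hat f^*(x) - \hat f(x), which reflects the
% bootstrap quantiles around \hat f(x).
\[
  \Bigl[\, 2\hat f(x) - q_{1-\alpha/2}(x),\;\; 2\hat f(x) - q_{\alpha/2}(x) \,\Bigr]
\]
```

This is why cis.lower subtracts the upper quantile (probs = 1-alpha/2) and cis.upper subtracts the lower one.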
Actually it looks like there might be a more parametric way to calculate confidence intervals using the jackknife residuals. This code comes from the S+ help page for smooth.spline
fit <- smooth.spline(data$x, data$y) # smooth.spline fit
res <- (fit$yin - fit$y)/(1-fit$lev) # jackknife residuals
sigma <- sqrt(var(res)) # estimate sd
upper <- fit$y + 2.0*sigma*sqrt(fit$lev) # upper 95% conf. band
lower <- fit$y - 2.0*sigma*sqrt(fit$lev) # lower 95% conf. band
matplot(fit$x, cbind(upper, fit$y, lower), type="plp", pch=".")
And that results in a plot with the fit drawn as a line and the upper and lower bands drawn as dots.
And as far as the gam confidence intervals go, if you read the plot.gam help file, there is an se= parameter with default TRUE and the docs say
when TRUE (default) upper and lower lines are added to the 1-d plots at 2 standard errors above and below the estimate of the smooth being plotted while for 2-d plots, surfaces at +1 and -1 standard errors are contoured and overlayed on the contour plot for the estimate. If a positive number is supplied then this number is multiplied by the standard errors when calculating standard error curves or surfaces. See also shade, below.
So the default band is roughly a 95% interval (2 standard errors), and you can change its width by adjusting this parameter. (This would be in the plot() call.)
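As an illustration of the se argument described above, the following sketch draws an approximate 90% band and then rebuilds the same kind of band by hand from predict(). The simulated data frame d and the grid are assumptions made up for this example; they stand in for the bone data, which is not available here.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in data (hypothetical, not the bone data from the question)
d <- data.frame(x = runif(200))
d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)

fit <- gam(y ~ s(x), data = d)

# Multiplier for an approximate 90% pointwise band instead of the default 2 SE
z90 <- qnorm(0.95)
plot(fit, se = z90)

# The same kind of band by hand, on the response scale.
# Note: plot.gam shows the centred smooth, while predict() includes the intercept.
grid <- data.frame(x = seq(0, 1, length.out = 100))
p <- predict(fit, newdata = grid, se.fit = TRUE)
band <- cbind(lower = p$fit - z90 * p$se.fit,
              upper = p$fit + z90 * p$se.fit)
```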
The R package mgcv calculates smoothing splines and Bayesian "confidence intervals." These are not confidence intervals in the usual (frequentist) sense, but numerical simulations have shown that there is almost no difference; see the linked paper by Marra and Wood in the help file of mgcv.
library(SemiPar)
data(lidar)
require(mgcv)
fit=gam(range~s(logratio), data = lidar)
plot(fit)
with(lidar, points(logratio, range-mean(range)))

plotting bootstrapped confidence intervals

I am attempting to plot a bootstrapped linear model and would like to include the model's upper and lower confidence bounds, but I am not sure I am calculating the bootstrapped bounds correctly. Below is an example using the cars dataset in R in conjunction with the boot library.
library(boot)
plot(speed~dist,cars,pch=21,bg="grey")
## standard linear model
mod<-lm(speed~dist,cars)
new.dat=seq(0,120,10)
mod.fit<-predict(mod,newdata=data.frame(dist=new.dat),interval="confidence")
lines(new.dat,mod.fit[,1]);#line fit
lines(new.dat,mod.fit[,2],lty=2);#lower confidence interval
lines(new.dat,mod.fit[,3],lty=2);#upper confidence interval
##Bootstrapped Confidence Intervals
lm.boot=function(formula, data, indices) {
d <- data[indices,] # allows boot to select sample
fit <- lm(formula, data=d)
return(coef(fit))
}
results <- boot(data=cars, statistic=lm.boot,
R=100, formula=speed~dist)
N.mod<-nrow(cars)
x.val<-new.dat
y.boot.fit<-(mean(results$t[,2])*x.val)+mean(results$t[,1])
y.boot.fit.uCI<-y.boot.fit+qt(0.975,N.mod-2)*sd(results$t[,2])
y.boot.fit.lCI<-y.boot.fit-qt(0.975,N.mod-2)*sd(results$t[,2])
lines(new.dat,y.boot.fit,col="red")
lines(new.dat,y.boot.fit.lCI,lty=2,col="red");#lower confidence interval
lines(new.dat,y.boot.fit.uCI,lty=2,col="red");#upper confidence interval
legend("bottomright",legend=c("Linear Model","Bootstrapped Model"),lty=1,
col=c("black","red"),ncol=1,cex=1,bty="n",y.intersp=1.5,x.intersp=0.75,
xpd=NA,xjust=0.5)
Using this code I get this output where the bootstrapped confidence interval is on top of the bootstrapped fitted line.
Any help/direction is appreciated. Granted, this might be better suited for Cross Validated or other statistics-specific boards.
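One possible way to get a band that actually widens away from the data, offered as a sketch rather than as the asker's method: evaluate every bootstrapped line on the prediction grid and take pointwise quantiles. Adding a constant multiple of the slope's standard deviation, as in the code above, shifts the line by the same amount everywhere, which is why the bounds sit almost on top of the fit. The names boot.lines and band are illustrative.

```r
library(boot)

set.seed(1)
new.dat <- seq(0, 120, 10)

# Statistic for boot(): intercept and slope refitted on the resampled rows
lm.boot <- function(data, indices) {
  coef(lm(speed ~ dist, data = data[indices, ]))
}
results <- boot(data = cars, statistic = lm.boot, R = 1000)

# One fitted line per bootstrap replicate: an R x length(new.dat) matrix
boot.lines <- results$t %*% rbind(1, new.dat)

# Pointwise 2.5% and 97.5% quantiles across replicates (percentile band)
band <- apply(boot.lines, 2, quantile, probs = c(0.025, 0.975))

plot(speed ~ dist, cars, pch = 21, bg = "grey")
lines(new.dat, results$t0[1] + results$t0[2] * new.dat, col = "red")
lines(new.dat, band[1, ], lty = 2, col = "red")  # lower bound
lines(new.dat, band[2, ], lty = 2, col = "red")  # upper bound
```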

Calculating 95% confidence intervals in quantile regression in R using rq function for polynomial

This is an extension of this question about 95% confidence intervals for quantile regression using rq:
Calculating 95% confidence intervals in quantile regression in R using rq function
Here, the goal is to determine 95% confidence intervals for quantile regression for a polynomial fit.
Data:
x<-1:50
y<-c(x[1:50]+rnorm(50,0,5))^2
Attempt using the approach in the aforementioned question:
QR.b <- boot.rq(cbind(1,x,x^2),y,tau=0.5, R=1000)
t(apply(QR.b$B, 2, quantile, c(0.025,0.975))) # boot.rq returns a list; the replicates are in $B
            2.5%      97.5%
[1,] -14.9880661 126.906083
[2,] -20.5603779   5.424308
[3,]   0.8608203   1.516513
But this of course determines the 95% CI for each coefficient independently, and it appears to overestimate the interval (see image below).
I had another idea: simply determine the coefficients from bootstrap samples of the data (i.e. rq(y~x+I(x^2)) on thousands of resamples of y and x). But I wanted to see if there is something built into the package.
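A hedged sketch of that idea using boot.rq itself: each bootstrap replicate is a full coefficient vector, so evaluating every replicate's curve on the design matrix and taking pointwise quantiles keeps the dependence between coefficients instead of combining their marginal limits. The names X, Bmat, curves and band are illustrative, and the is.list() check is an assumption to cope with quantreg versions in which boot.rq returns either a matrix or a list with component B.

```r
library(quantreg)

set.seed(1)
x <- 1:50
y <- (x + rnorm(50, 0, 5))^2

# Design matrix for the quadratic median regression
X <- cbind(1, x, x^2)

QR.b <- boot.rq(X, y, tau = 0.5, R = 1000)

# Extract the R x 3 matrix of bootstrapped coefficients
Bmat <- if (is.list(QR.b)) QR.b$B else QR.b

# One fitted curve per replicate: an R x 50 matrix
curves <- Bmat %*% t(X)

# Pointwise 95% percentile band for the fitted median curve
band <- apply(curves, 2, quantile, probs = c(0.025, 0.975))

plot(x, y)
lines(x, band[1, ], lty = 2)
lines(x, band[2, ], lty = 2)
```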

Confidence intervals for predicted probabilities from predict.lrm

I am trying to determine confidence intervals for predicted probabilities from a binomial logistic regression in R. The model is estimated using lrm (from the package rms) to allow for clustering standard errors on survey respondents (each respondent appears up to 3 times in the data):
library(rms)
model1<-lrm(outcome~var1+var2+var3,data=mydata,x=T,y=T,se.fit=T)
model.rob<-robcov(model1,cluster=respondent.id)
I am able to estimate a predicted probability for the outcome using predict.lrm:
predicted.prob<-predict(model.rob,newdata=data.frame(var1=1,var2=.33,var3=.5),
type="fitted")
What I want to determine is a 95% confidence interval for this predicted probability. I have tried specifying se.fit=T, but this is not permissible in predict.lrm when type="fitted".
I have spent the last few hours scouring the Internet for how to do this with lrm to no avail (obviously). Can anyone point me toward a method for determining this confidence interval? Alternatively, if it is impossible or difficult with lrm models, is there another way to estimate a logit with clustered standard errors for which confidence intervals would be more easily obtainable?
The help file for predict.lrm has a clear example. Here is a slight modification of it:
L <- predict(fit, newdata=data.frame(...), se.fit=TRUE)
plogis(with(L, linear.predictors + 1.96*cbind(- se.fit, se.fit)))
For some problems you may want to use the gendata or Predict functions, e.g.
L <- predict(fit, gendata(fit, var1=1), se.fit=TRUE) # leave other vars at median/mode
Predict(fit, var1=1:2, var2=3) # leave other vars at median/mode; gives CLs
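To make the first recipe concrete, here is a self-contained sketch on simulated data. The data frame d, its variable names and the cluster column id are made up for illustration, and the sketch assumes that predict() on the robcov-adjusted fit uses the cluster-robust variance when computing se.fit. The interval is built on the linear-predictor (logit) scale and then mapped to probabilities with plogis().

```r
library(rms)

set.seed(1)
# Simulated stand-in data: 100 respondents, 3 rows each (hypothetical)
d <- data.frame(var1 = rnorm(300), var2 = runif(300), id = rep(1:100, each = 3))
d$outcome <- rbinom(300, 1, plogis(0.5 * d$var1))

fit <- lrm(outcome ~ var1 + var2, data = d, x = TRUE, y = TRUE)
fit.rob <- robcov(fit, cluster = d$id)  # cluster-robust covariance

# Linear predictor and its standard error for one covariate profile
L <- predict(fit.rob, newdata = data.frame(var1 = 1, var2 = 0.33),
             type = "lp", se.fit = TRUE)

# 95% interval on the logit scale, transformed to the probability scale
ci <- plogis(L$linear.predictors + qnorm(c(0.025, 0.975)) * L$se.fit)
print(ci)
```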

Any simple way to get regression prediction intervals in R?

I am working on a big data set having over 300K elements, and running some regression analysis trying to estimate a parameter called Rate using the predictor variable Distance. I have the regression equation. Now I want to get the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients by the command:
> confint(W1500.LR1, level = 0.95)
                  2.5 %      97.5 %
(Intercept) 666.2817393 668.0216072
Distance      0.3934499   0.3946572
which gives me the upper and lower bounds for the CI of the coefficients. Now I want to get the same upper and lower bounds for the prediction intervals. The only thing I have learnt so far is that I can get the prediction intervals for specific values of Distance (say 200, 500, etc.) using the code:
predict(W1500.LR1, newdata, interval="predict")
This is not useful for me because I have over 300K different distance values, and I would have to run this code for each of them. Is there any simple way to get the prediction intervals, like the confint command I showed above?
Had to make up my own data but here you go
x = rnorm(300000)
y = jitter(3*x,1000)
fit = lm(y~x)
#Prediction intervals
pred.int = predict(fit,interval="prediction")
#Confidence intervals
conf.int = predict(fit,interval="confidence")
fitted.values = pred.int[,1]
pred.lower = pred.int[,2]
pred.upper = pred.int[,3]
plot(x[1:1000],y[1:1000])
lines(x[1:1000],fitted.values[1:1000],col="red",lwd=2)
lines(x[1:1000],pred.lower[1:1000],lwd=2,col="blue")
lines(x[1:1000],pred.upper[1:1000],lwd=2,col="blue")
So as you can see, prediction intervals are for predicting new data values, not for constructing intervals for the beta coefficients. The confidence intervals you actually want would be obtained in the same fashion from conf.int.
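For reference, the two interval types returned by predict() differ only in one term. For a simple linear regression with n observations, residual standard error s, and S_xx the sum of squared deviations of x from its mean (standard textbook notation, not taken from the original answer), the 95% limits at a point x_0 are:

```latex
% Confidence interval for the mean response at x_0
\[
  \hat y_0 \;\pm\; t_{n-2,\,0.975}\; s \sqrt{\frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}}
\]
% Prediction interval for a new observation at x_0:
% the extra 1 under the root accounts for the noise in the new response
\[
  \hat y_0 \;\pm\; t_{n-2,\,0.975}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar x)^2}{S_{xx}}}
\]
```

With 300K observations the 1/n and distance terms are tiny, so the confidence band is very narrow while the prediction band stays close to plus or minus 1.96 s around the fit.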
