I plotted the autocorrelation function of a given set of residuals I obtained from estimating a linear regression model:
> require("stats")
> acf(Reg$residuals)
It resulted in the following graphic:
I then wanted to look up what kind of confidence interval (95%, 99%) is displayed, but there is no information on that in the function's help page. In addition, I could not find a way to adjust the confidence interval manually.
Is there a way to manually set the confidence interval displayed?
See ?plot.acf:
plot(x, ci = 0.95, ...)
and:
ci: coverage probability for confidence interval. Plotting of the confidence interval is suppressed if ci is zero or negative.
That is, the default is 95% confidence intervals, and e.g.:
plot(acf(Reg$residuals), ci = 0.99)
should plot the 99% confidence intervals.
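For completeness, here is a small self-contained sketch (using simulated residuals, since Reg itself is not shown): because acf() forwards its ... arguments to plot.acf, the ci argument can also be given to acf() directly.
set.seed(1)
res <- rnorm(200)                        # stand-in for Reg$residuals
acf(res, ci = 0.99)                      # ci is forwarded to plot.acf
plot(acf(res, plot = FALSE), ci = 0.99)  # equivalent: compute first, then plot
plot(acf(res, plot = FALSE), ci = 0)     # suppress the confidence bounds entirely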
I have used smooth.spline to estimate a cubic spline for my data. But when I calculate the 90% point-wise confidence interval using the equation, the results seem to be a little bit off. Can someone please tell me if I did it wrong? I am also wondering whether there is a function that can automatically calculate a point-wise confidence band for a smooth.spline fit.
boneMaleSmooth = smooth.spline( bone[males,"age"], bone[males,"spnbmd"], cv=FALSE)
error90_male = qnorm(.95)*sd(boneMaleSmooth$x)/sqrt(length(boneMaleSmooth$x))
plot(boneMaleSmooth, ylim=c(-0.5,0.5), col="blue", lwd=3, type="l", xlab="Age",
ylab="Relative Change in Spinal BMD")
points(bone[males,c(2,4)], col="blue", pch=20)
lines(boneMaleSmooth$x,boneMaleSmooth$y+error90_male, col="purple",lty=3,lwd=3)
lines(boneMaleSmooth$x,boneMaleSmooth$y-error90_male, col="purple",lty=3,lwd=3)
Because I am not sure whether I did it correctly, I also used the gam() function from the mgcv package. It instantly gave a confidence band, but I am not sure whether it is a 90% or 95% CI or something else. It would be great if someone could explain.
males=gam(bone[males,c(2,4)]$spnbmd ~s(bone[males,c(2,4)]$age), method = "GCV.Cp")
plot(males,xlab="Age",ylab="Relative Change in Spinal BMD")
I'm not sure that smooth.spline has "nice" confidence intervals like those from lowess do. But I found a code sample from a CMU Data Analysis course to make Bayesian bootstrap confidence intervals.
Here are the functions used and an example. The main function is spline.cis, whose first parameter is a data frame in which the first column holds the x values and the second column the y values. The other important parameter is B, which indicates the number of bootstrap replications to do. (See the linked PDF above for the full details.)
# Helper functions
resampler <- function(data) {
  # Resample the rows of the data frame with replacement
  n <- nrow(data)
  resample.rows <- sample(1:n, size = n, replace = TRUE)
  return(data[resample.rows, ])
}

spline.estimator <- function(data, m = 300) {
  # Fit a smoothing spline and evaluate it on an m-point grid over x
  fit <- smooth.spline(x = data[, 1], y = data[, 2], cv = TRUE)
  eval.grid <- seq(from = min(data[, 1]), to = max(data[, 1]), length.out = m)
  return(predict(fit, x = eval.grid)$y) # We only want the predicted values
}

spline.cis <- function(data, B, alpha = 0.05, m = 300) {
  # Basic (pivotal) bootstrap confidence bands around the spline fit
  spline.main <- spline.estimator(data, m = m)
  spline.boots <- replicate(B, spline.estimator(resampler(data), m = m))
  cis.lower <- 2 * spline.main - apply(spline.boots, 1, quantile, probs = 1 - alpha/2)
  cis.upper <- 2 * spline.main - apply(spline.boots, 1, quantile, probs = alpha/2)
  return(list(main.curve = spline.main, lower.ci = cis.lower, upper.ci = cis.upper,
              x = seq(from = min(data[, 1]), to = max(data[, 1]), length.out = m)))
}
#sample data
data<-data.frame(x=rnorm(100), y=rnorm(100))
#run and plot
sp.cis <- spline.cis(data, B=1000,alpha=0.05)
plot(data[,1],data[,2])
lines(x=sp.cis$x,y=sp.cis$main.curve)
lines(x=sp.cis$x,y=sp.cis$lower.ci, lty=2)
lines(x=sp.cis$x,y=sp.cis$upper.ci, lty=2)
And that gives something like this:
Actually it looks like there might be a more parametric way to calculate confidence intervals using the jackknife residuals. This code comes from the S+ help page for smooth.spline
fit <- smooth.spline(data$x, data$y) # smooth.spline fit
res <- (fit$yin - fit$y)/(1-fit$lev) # jackknife residuals
sigma <- sqrt(var(res)) # estimate sd
upper <- fit$y + 2.0*sigma*sqrt(fit$lev) # upper 95% conf. band
lower <- fit$y - 2.0*sigma*sqrt(fit$lev) # lower 95% conf. band
matplot(fit$x, cbind(upper, fit$y, lower), type="plp", pch=".")
And that results in a plot like this:
And as far as the gam confidence intervals go, if you read the plot.gam help file, there is an se= parameter with default TRUE, and the docs say:
when TRUE (default) upper and lower lines are added to the 1-d plots at 2 standard errors above and below the estimate of the smooth being plotted while for 2-d plots, surfaces at +1 and -1 standard errors are contoured and overlayed on the contour plot for the estimate. If a positive number is supplied then this number is multiplied by the standard errors when calculating standard error curves or surfaces. See also shade, below.
So you can adjust the width of the interval by adjusting this parameter. (This would be in the plot() call.)
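For example, here is a hedged sketch reusing the males gam fit from the question: since se = TRUE draws bands at roughly 2 standard errors, supplying a normal quantile as the multiplier gives other approximate pointwise coverage levels.
# Default: bands at +/- 2 standard errors (roughly 95% pointwise)
plot(males, xlab = "Age", ylab = "Relative Change in Spinal BMD")

# Approximate 90% pointwise bands: multiply the standard errors by qnorm(0.95)
plot(males, se = qnorm(0.95),
     xlab = "Age", ylab = "Relative Change in Spinal BMD")

# Approximate 99% pointwise bands
plot(males, se = qnorm(0.995),
     xlab = "Age", ylab = "Relative Change in Spinal BMD")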
The R package mgcv calculates smoothing splines and Bayesian "confidence intervals." These are not confidence intervals in the usual (frequentist) sense, but numerical simulations have shown that there is almost no difference; see the linked paper by Marra and Wood in the help file of mgcv.
library(SemiPar)
data(lidar)
require(mgcv)
fit=gam(range~s(logratio), data = lidar)
plot(fit)
with(lidar, points(logratio, range-mean(range)))
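If you want the band at an explicit level rather than the default drawn by plot(), a hedged sketch of the usual approach is to build it from predict(..., se.fit = TRUE); note that plot(fit) shows the centered smooth, while predict() returns fitted values on the response scale. This reuses fit and lidar from the code above.
pr  <- predict(fit, se.fit = TRUE)          # fitted values and standard errors
ord <- order(lidar$logratio)                # sort x for clean line drawing
with(lidar, plot(logratio, range))
lines(lidar$logratio[ord], pr$fit[ord], lwd = 2)
lines(lidar$logratio[ord], (pr$fit + 1.96 * pr$se.fit)[ord], lty = 2)  # ~95% upper
lines(lidar$logratio[ord], (pr$fit - 1.96 * pr$se.fit)[ord], lty = 2)  # ~95% lower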
I am attempting to plot a bootstrapped linear model, and I would like to include the model's upper and lower confidence intervals, but I am not sure I am calculating the bootstrapped upper and lower bounds correctly. Below is an example using the cars dataset in R in conjunction with the boot library.
library(boot)
plot(speed~dist,cars,pch=21,bg="grey")
## standard linear model
mod<-lm(speed~dist,cars)
new.dat=seq(0,120,10)
mod.fit<-predict(mod,newdata=data.frame(dist=new.dat),interval="confidence")
lines(new.dat,mod.fit[,1]);#line fit
lines(new.dat,mod.fit[,2],lty=2);#lower confidence interval
lines(new.dat,mod.fit[,3],lty=2);#upper confidence interval
##Bootstrapped Confidence Intervals
lm.boot=function(formula, data, indices) {
d <- data[indices,] # allows boot to select sample
fit <- lm(formula, data=d)
return(coef(fit))
}
results <- boot(data=cars, statistic=lm.boot,
R=100, formula=speed~dist)
N.mod<-nrow(cars)
x.val<-new.dat
y.boot.fit<-(mean(results$t[,2])*x.val)+mean(results$t[,1])
y.boot.fit.uCI<-y.boot.fit+qt(0.975,N.mod-2)*sd(results$t[,2])
y.boot.fit.lCI<-y.boot.fit-qt(0.975,N.mod-2)*sd(results$t[,2])
lines(new.dat,y.boot.fit,col="red")
lines(new.dat,y.boot.fit.lCI,lty=2,col="red");#lower confidence interval
lines(new.dat,y.boot.fit.uCI,lty=2,col="red");#upper confidence interval
legend("bottomright",legend=c("Linear Model","Bootstrapped Model"),lty=1,
col=c("black","red"),ncol=1,cex=1,bty="n",y.intersp=1.5,x.intersp=0.75,
xpd=NA,xjust=0.5)
Using this code I get this output where the bootstrapped confidence interval is on top of the bootstrapped fitted line.
Any help/direction is appreciated. Granted, this might be better suited for Cross Validated or other statistics-specific boards.
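One possible direction, offered as a hedged sketch rather than a definitive answer: instead of using only the standard deviation of the slope draws, evaluate each bootstrap replicate's fitted line on new.dat and take pointwise percentile bands, which accounts for the uncertainty in both coefficients jointly. This reuses results and new.dat from the code above.
# Each column is one bootstrap replicate's fitted line evaluated on new.dat
boot.lines <- apply(results$t, 1, function(b) b[1] + b[2] * new.dat)

# Pointwise 2.5% and 97.5% quantiles across the replicates
boot.band <- apply(boot.lines, 1, quantile, probs = c(0.025, 0.975))

lines(new.dat, boot.band[1, ], lty = 3, col = "red")  # lower percentile band
lines(new.dat, boot.band[2, ], lty = 3, col = "red")  # upper percentile band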
Below I have my code, which determines the number of times the mean of a population falls within the confidence intervals of samples taken from that population. Basically, I am hoping to show that 95% confidence intervals work as advertised.
rp <- function(x, s, n) { # x = population data, s = number of samples taken
                          # from the population, n = size of each sample
  m <- mean(x)
  ci.mat <- NULL
  tot <- 0
  for (i in 1:s) {
    # obtain the confidence interval for one sample of size n drawn from x
    cix <- t.test(sample(x, n))$conf.int
    # total number of confidence intervals containing the population mean
    if (cix[1] < m & m < cix[2]) { tot <- tot + 1 }
    ci.mat <- rbind(ci.mat, cbind(cix[1], cix[2]))
  }
  par(mfrow = c(2, 1))
  hist(ci.mat[, 1], main = "Lower Limits for Sample Confidence Intervals",
       xlab = "Lower Limit")
  hist(ci.mat[, 2], main = "Upper Limits for Sample Confidence Intervals",
       xlab = "Upper Limit")
  return(data.frame(mean(x), tot/s))
}
I am hoping to add the population mean to my histograms, so I can show where the confidence intervals did not include the mean. So wherever the mean lies on the histogram with the lower limits, any values to the right of it would be part of a confidence interval that did not include the mean. I have no experience with modifying plots in R, so I don't even know if this is possible. Thanks for any help!
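Not an authoritative answer, but a minimal sketch of one way to do it: abline(v = ...) draws a vertical line on the most recent plot, so adding it after each hist() call inside rp() marks the population mean m (both ci.mat and m already exist inside the function).
hist(ci.mat[, 1], main = "Lower Limits for Sample Confidence Intervals",
     xlab = "Lower Limit")
abline(v = m, col = "red", lwd = 2)  # lower limits to the right of this line
                                     # belong to intervals that missed the mean
hist(ci.mat[, 2], main = "Upper Limits for Sample Confidence Intervals",
     xlab = "Upper Limit")
abline(v = m, col = "red", lwd = 2)  # upper limits to the left of this line
                                     # belong to intervals that missed the mean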
This is an extension of this question about 95% confidence intervals for quantile regression using the rq function:
Calculating 95% confidence intervals in quantile regression in R using rq function
Here, the goal is to determine 95% confidence intervals for quantile regression for a polynomial fit.
Data:
x<-1:50
y<-c(x[1:50]+rnorm(50,0,5))^2
Attempt using the approach in the aforementioned question:
QR.b <- boot.rq(cbind(1,x,x^2),y,tau=0.5, R=1000)
t(apply(QR.b, 2, quantile, c(0.025,0.975)))
             2.5%      97.5%
[1,] -14.9880661 126.906083
[2,] -20.5603779   5.424308
[3,]   0.8608203   1.516513
But this of course determines the 95% CI for each coefficient independently, and it would appear to overestimate the interval (see image below).
I had another idea: simply determine the coefficients from bootstrap samples of the data (i.e. run rq(y~x+I(x^2)) on thousands of samples of y and x), but I wanted to see if there is something built into the package.
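One possibility along those lines, offered as a hedged sketch rather than a built-in feature of quantreg: bootstrap the whole polynomial fit and take pointwise quantiles of the predicted curves. This reuses x and y from above; dat, grid, and B are illustrative names introduced here.
library(quantreg)

dat  <- data.frame(x = x, y = y)
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))

B <- 1000
boot.pred <- replicate(B, {
  d   <- dat[sample(nrow(dat), replace = TRUE), ]   # resample rows of the data
  fit <- rq(y ~ x + I(x^2), tau = 0.5, data = d)    # median polynomial fit
  predict(fit, newdata = grid)                      # fitted curve on the grid
})

band <- apply(boot.pred, 1, quantile, probs = c(0.025, 0.975))  # pointwise 95%

plot(y ~ x, data = dat)
lines(grid$x, predict(rq(y ~ x + I(x^2), tau = 0.5, data = dat), newdata = grid))
lines(grid$x, band[1, ], lty = 2)
lines(grid$x, band[2, ], lty = 2)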
I am working on a big data set with over 300K elements, and I am running a regression analysis to estimate a parameter called Rate using the predictor variable Distance. I have the regression equation. Now I want to get the confidence and prediction intervals. I can easily get the confidence intervals for the coefficients with the command:
> confint(W1500.LR1, level = 0.95)
2.5 % 97.5 %
(Intercept) 666.2817393 668.0216072
Distance 0.3934499 0.3946572
which gives me the upper and lower bounds for the CI of the coefficients. Now I want the same upper and lower bounds for the prediction intervals. The only thing I have learned so far is that I can get the prediction intervals for specific values of Distance (say 200, 500, etc.) using the code:
predict(W1500.LR1, newdata, interval="predict")
This is not useful for me because I have over 300K different Distance values, which would require running this code for each of them. Is there a simple way to get the prediction intervals, like the confint command I showed above?
I had to make up my own data, but here you go:
x = rnorm(300000)
y = jitter(3*x,1000)
fit = lm(y~x)
#Prediction intervals
pred.int = predict(fit,interval="prediction")
#Confidence intervals
conf.int = predict(fit,interval="confidence")
fitted.values = pred.int[,1]
pred.lower = pred.int[,2]
pred.upper = pred.int[,3]
plot(x[1:1000],y[1:1000])
lines(x[1:1000],fitted.values[1:1000],col="red",lwd=2)
lines(x[1:1000],pred.lower[1:1000],lwd=2,col="blue")
lines(x[1:1000],pred.upper[1:1000],lwd=2,col="blue")
So as you can see, the prediction intervals are for predicting new data values, not for constructing intervals around the beta coefficients. The confidence intervals you actually want would be obtained in the same fashion from conf.int.
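A hedged note tying this back to the model in the question (W1500.LR1 is not defined here): calling predict() without newdata evaluates the intervals at every Distance already in the fitted data, so there is no need to loop over the 300K values.
# fit, lwr, upr for every observation used to fit W1500.LR1
all.pred <- predict(W1500.LR1, interval = "prediction")
head(all.pred)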