BCa and p-value, how to find intercept - r

I am trying to extrapolate the p-value of my numerous bootstrap BCA. I know that confidence intervals are more robust but in my case I have hundreds of bootstrapped glm and thus I need to correct for multiple comparisons.
As BCa bootstraps are radically different than the basic ones (using the boot function), I cannot find the p-value in a usual, straightforward way.
Instead I need to calculate the BCA for a range of alpha levels and plot alpha vs the confidence limit (upper or lower as appropriate) and graphically I can find the alpha where the plot of alpha vs limit crosses a vertical line at the hypothesized value.
So here is my code so far:
# Run the boot function
boot.ci(results, type="bca", index=2)
# Extrapolate the alphas and CI limits
conf <- c()
alphas <- seq(1,.01,by=-0.01)
for (i in alphas) {
conf <- c(conf, boot.ci(results, type="bca", index=2, conf=1-i)$bca[5])
}
# Plot the results
ggplot(data.frame(alphas=alphas,conf=conf), aes(conf,alphas))+
geom_line() +geom_vline(xintercept=0)
It works very well but I am running into a problem now, I need to find the exact point where the alpha intercept with the abline=0 which is basically my p-value.
Here is my figure:
Do you have an easy solution to find the intercept? In this case the p-value/intercept would be around 0.01
Thanks a lot

Related

95% confidence interval for smooth.spline in R [duplicate]

I have used smooth.spline to estimate a cubic spline for my data. But when I calculate the 90% point-wise confidence interval using equation, the results seems to be a little bit off. Can someone please tell me if I did it wrongly? I am just wondering if there is a function that can automatically calculate a point-wise interval band associated with smooth.spline function.
boneMaleSmooth = smooth.spline( bone[males,"age"], bone[males,"spnbmd"], cv=FALSE)
error90_male = qnorm(.95)*sd(boneMaleSmooth$x)/sqrt(length(boneMaleSmooth$x))
plot(boneMaleSmooth, ylim=c(-0.5,0.5), col="blue", lwd=3, type="l", xlab="Age",
ylab="Relative Change in Spinal BMD")
points(bone[males,c(2,4)], col="blue", pch=20)
lines(boneMaleSmooth$x,boneMaleSmooth$y+error90_male, col="purple",lty=3,lwd=3)
lines(boneMaleSmooth$x,boneMaleSmooth$y-error90_male, col="purple",lty=3,lwd=3)
Because I am not sure if I did it correctly, then I used gam() function from mgcv package.
It instantly gave a confidence band but I am not sure if it is 90% or 95% CI or something else. It would be great if someone can explain.
males=gam(bone[males,c(2,4)]$spnbmd ~s(bone[males,c(2,4)]$age), method = "GCV.Cp")
plot(males,xlab="Age",ylab="Relative Change in Spinal BMD")
I'm not sure the confidence intervals for smooth.spline have "nice" confidence intervals like those form lowess do. But I found a code sample from a CMU Data Analysis course to make Bayesian bootstap confidence intervals.
Here are the functions used and an example. The main function is spline.cis where the first parameter is a data frame where the first column are the x values and the second column are the y values. The other important parameter is B which indicates the number bootstrap replications to do. (See the linked PDF above for the full details.)
# Helper functions
resampler <- function(data) {
n <- nrow(data)
resample.rows <- sample(1:n,size=n,replace=TRUE)
return(data[resample.rows,])
}
spline.estimator <- function(data,m=300) {
fit <- smooth.spline(x=data[,1],y=data[,2],cv=TRUE)
eval.grid <- seq(from=min(data[,1]),to=max(data[,1]),length.out=m)
return(predict(fit,x=eval.grid)$y) # We only want the predicted values
}
spline.cis <- function(data,B,alpha=0.05,m=300) {
spline.main <- spline.estimator(data,m=m)
spline.boots <- replicate(B,spline.estimator(resampler(data),m=m))
cis.lower <- 2*spline.main - apply(spline.boots,1,quantile,probs=1-alpha/2)
cis.upper <- 2*spline.main - apply(spline.boots,1,quantile,probs=alpha/2)
return(list(main.curve=spline.main,lower.ci=cis.lower,upper.ci=cis.upper,
x=seq(from=min(data[,1]),to=max(data[,1]),length.out=m)))
}
#sample data
data<-data.frame(x=rnorm(100), y=rnorm(100))
#run and plot
sp.cis <- spline.cis(data, B=1000,alpha=0.05)
plot(data[,1],data[,2])
lines(x=sp.cis$x,y=sp.cis$main.curve)
lines(x=sp.cis$x,y=sp.cis$lower.ci, lty=2)
lines(x=sp.cis$x,y=sp.cis$upper.ci, lty=2)
And that gives something like
Actually it looks like there might be a more parametric way to calculate confidence intervals using the jackknife residuals. This code comes from the S+ help page for smooth.spline
fit <- smooth.spline(data$x, data$y) # smooth.spline fit
res <- (fit$yin - fit$y)/(1-fit$lev) # jackknife residuals
sigma <- sqrt(var(res)) # estimate sd
upper <- fit$y + 2.0*sigma*sqrt(fit$lev) # upper 95% conf. band
lower <- fit$y - 2.0*sigma*sqrt(fit$lev) # lower 95% conf. band
matplot(fit$x, cbind(upper, fit$y, lower), type="plp", pch=".")
And that results in
And as far as the gam confidence intervals go, if you read the print.gam help file, there is an se= parameter with default TRUE and the docs say
when TRUE (default) upper and lower lines are added to the 1-d plots at 2 standard errors above and below the estimate of the smooth being plotted while for 2-d plots, surfaces at +1 and -1 standard errors are contoured and overlayed on the contour plot for the estimate. If a positive number is supplied then this number is multiplied by the standard errors when calculating standard error curves or surfaces. See also shade, below.
So you can adjust the confidence interval by adjusting this parameter. (This would be in the print() call.)
The R package mgcv calculates smoothing splines and Bayesian "confidence intervals." These are not confidence intervals in the usual (frequentist) sense, but numerical simulations have shown that there is almost no difference; see the linked paper by Marra and Wood in the help file of mgcv.
library(SemiPar)
data(lidar)
require(mgcv)
fit=gam(range~s(logratio), data = lidar)
plot(fit)
with(lidar, points(logratio, range-mean(range)))

Change significance level MannKendall trend test -- R

I want to perform Mann-Kendall test at 99% and 90% confidence interval (CI). When running the lines below the analysis will be based on a 95% CI. How to change the code to perform it on 99 and 90% CI?
vec = c(1,2,3,4,5,6,7,8,9,10)
MannKendall(vec)
I cannot comment yet, but I have a question, what do you mean when you say that you need to perform the analysis on a 99 and 95% CI. Do you want to know if your value is significant at the 99 and 90% significance level?
If you just need to know if your score is significant at 99 and 90% significance then r2evans was right, the alpha or significance level is just an arbitrary threshold that you use to define how small your probability should be for you to assume that there "is no effect" or in this case that there is independence between the observations. More importantly, the calculation of the p-value is independent of the confidence level you select, so if you want to know if your result is significant at different confidence levels just compare your p-value at those levels.
I checked how the function works and did not see any indication that the alpha level selected is going to affect the results. if you check the source code of MannKendall(x) (by typing MannKendall without parenthesis or anything) you can see that is just Kendall(1:length(x), x). The function Kendall calculates a statistic tau, that "measures the strength of monotonic association between the vectors x and y", then it returns a p-value by calculating how likely your observed tau is under the assumption that there is no relation between length(x) and x. In other words, how likely it is that you obtain that tau just by chance, as you can see this is not dependent on the confidence level at all, the confidence level only matters at the end when you are deciding how small the probability of your tau should be for you to assume that it cannot have been obtained just by chance.

Plotting standard errors for effects

I have a lme4 model I have run for a hierarchical logistic regression, and I'm plotting the effects using the effects package. I would like to create an effects graph with the standard error of the mean as the error bars. I can get the point estimates, 95% confidence intervals, and standard errors into a dataframe. The standard errors, however, seem at odds with the confidence limit parameters, see below for an example in a regular glm.
library(effects)
library(dplyr)
mtcars <- mtcars %>%
mutate(vs = factor(vs))
glm1 <- glm(am ~ vs, mtcars, family = "binomial")
(glm1_eff <- Effect("vs", glm1) %>%
as.data.frame())
vs fit se lower upper
1 0 0.3333333 0.4999999 0.1580074 0.5712210
2 1 0.5000000 0.5345225 0.2596776 0.7403224
My understanding is that the fit column displays the point estimate for the probability of am is equal to 1 and that lower and upper correspond to the 95% confidence intervals for the probability that am equals 1. Note that the standard error does not seem to correspond to the confidence interval (e.g., .33+.49 > .57).
Here's what I am shooting for. As opposed to a 95% confidence interval, I would like to have an effects plot with +- the standard error of the mean.
Are the standard errors in log-odds instead of probability? Is there a simply way to convert them to probabilities and plot them so that I can make the graph?
John Fox shared this helpful response:
From ?Effect: "se: (for "eff" objects) a vector of standard errors for the effect, on the scale of the linear predictor." So the standard errors are on the log-odds scale." You could use the delta method to get standard errors on the probability scale but that would be very ill-advised, since the approach to asymptotic normality of estimated probabilities will be much slower than of log-odds. Effect() computes confidence limits on the scale of the linear predictor (log-odds for a logit model) and then inverse-transforms them to the scale of the response (probabilities).
All of the information you need to create a custom plot is in the "eff" object returned by Effect(); the contents of the object are documented in ?Effect.
I agree, by the way, that the as.data.frame.eff() method could be improved, and I'll do that when I have a chance. In particular, it invites misunderstanding to report the effects and confidence limits on the scale of the response but to show standard errors for the linear-predictor scale.
I'm answering the mystery first, then addressing the "show SE on the plot" question
Explanation of the SE mystery: All math in a GLM needs to be done on the link scale because this is the additive scale (where stuff can be added up). So...
The values in the column "fit" are the predicted probability of success (or the "predictions on the response scale"). Their values are expit(b0) and expit(b0 + b1). expit() is the inverse logit function. The SEs are on the link scale. An SE on the response scale doesn't make much sense because the response scale is non-linear (although its kinda weird to have stats on the response and link scale in the same table). "lower" and "upper" are on the response scale, so these are the CIs of the predicted probabilities of success. They are computed as expit(b0 ± 1.96SE) and expit(b0 + b1 ± 1.96SE). To recover these values with what is given
library(boot) # inv.logit and logit functions
expit.pred_0 <- 1/3 # fit 0
expit.pred_1 <- 1/2 # fit 1
se1 <- 1/2
se2 <- .5345225
inv.logit(logit(expit.pred_0) - qnorm(.975)*se1)
inv.logit(logit(expit.pred_0) + qnorm(.975)*se1)
inv.logit(logit(expit.pred_1) - qnorm(.975)*se2)
inv.logit(logit(expit.pred_1) + qnorm(.975)*se2)
> inv.logit(logit(expit.pred_0) - qnorm(.975)*se1)
[1] 0.1580074
> inv.logit(logit(expit.pred_0) + qnorm(.975)*se1)
[1] 0.5712211
> inv.logit(logit(expit.pred_1) - qnorm(.975)*se2)
[1] 0.2596776
> inv.logit(logit(expit.pred_1) + qnorm(.975)*se2)
[1] 0.7403224
Showing an SE computed from a glm on the response (non additive) scale doesn't make any sense because the SE is only additive on the link scale. In other words Multiplying SE by some quantile on the response scale (the scale of the plot you envision, with probability on the y axis) is meaningless. A CI is a point estimate back transformed from the link scale and so makes sense for plotting.
I frequently see researchers plotting SE bars computed from a linear model, like you envision, even though the statistics presented are from a GLM. These SE's are meaningful in a sense I guess but they often imply absurd consequences (like probabilities that could be less than zero or greater than one) so...don't do that either.

{Methcomp} – Deming / orthogonal regression – goodness of fit + confidence intervals

A question following this post. I have the following data:
x1, disease symptom
y1, another disease symptom
I fitted the x1/y1 data with a Deming regression with vr (or sdr) option set to 1. In other words, the regression is a Total Least Squares regression, i.e. orthogonal regression. See previous post for the graph.
x1=c(24.0,23.9,23.6,21.6,21.0,20.8,22.4,22.6,
21.6,21.2,19.0,19.4,21.1,21.5,21.5,20.1,20.1,
20.1,17.2,18.6,21.5,18.2,23.2,20.4,19.2,22.4,
18.8,17.9,19.1,17.9,19.6,18.1,17.6,17.4,17.5,
17.5,25.2,24.4,25.6,24.3,24.6,24.3,29.4,29.4,
29.1,28.5,27.2,27.9,31.5,31.5,31.5,27.8,31.2,
27.4,28.8,27.9,27.6,26.9,28.0,28.0,33.0,32.0,
34.2,34.0,32.6,30.8)
y1=c(100.0,95.5,93.5,100.0,98.5,99.5,34.8,
45.8,47.5,17.4,42.6,63.0,6.9,12.1,30.5,
10.5,14.3,41.1, 2.2,20.0,9.8,3.5,0.5,3.5,5.7,
3.1,19.2,6.4, 1.2, 4.5, 5.7, 3.1,19.2, 6.4,
1.2,4.5,81.5,70.5,91.5,75.0,59.5,73.3,66.5,
47.0,60.5,47.5,33.0,62.5,87.0,86.0,77.0,
86.0,83.0,78.5,83.0,83.5,73.0,69.5,82.5,78.5,
84.0,93.5,83.5,96.5,96.0,97.5)
x11()
plot(x1,y1,xlim=c(0,35),ylim=c(0,100))
library(MethComp)
dem_reg <- Deming(x1, y1)
abline(dem_reg[1:2], col = "green")
I would like to know how much x1 helps to predict y1:
normally, I’d go for a R-squared, but it does not seem to be relevant; although another mathematician told me he thinks a R-squared may be appropriate. And this page suggests to calculate a Pearson product-moment correlation coefficient, which is R I believe?
partially related, there is possibly a tolerance interval. I could calculated it with R ({tolerance} package or code shown in the post), but it is not exactly what I am searching for.
Does someone know how to calculate a goodness of fit for Deming regression, using R? I looked at MetchComp pdf but could not find it (perhaps missed it though).
EDIT: following Gaurav's answers about confidence interval: R code
Firstly: confidence intervals for parameters
library(mcr)
MCR_reg=mcreg(x1,y1,method.reg="Deming",error.ratio=1,method.ci="analytical")
getCoefficients(MCR_reg)
Secondly: confidence intervals for predicted values
# plot of data
x11()
plot(x1,y1,xlim=c(0,35),ylim=c(0,100))
# Deming regression using functions from {mcr}
library(mcr) MCR_reg=mcreg(x1,y1,method.reg="Deming",error.ratio=1,method.ci="analytical")
MCR_intercept=getCoefficients(MCR_reg)[1,1]
MCR_slope=getCoefficients(MCR_reg)[2,1]
# CI for predicted values
x_to_predict=seq(0,35)
predicted_values=MCResultAnalytical.calcResponse(MCR_reg,x_to_predict,alpha=0.05)
CI_low=predicted_values[,4]
CI_up=predicted_values[,5]
# plot regression line and CI for predicted values
abline(MCR_intercept,MCR_slope, col="red")
lines(x_to_predict,CI_low,col="royalblue",lty="dashed")
lines(x_to_predict,CI_up,col="royalblue",lty="dashed")
# comments
text(7.5,60, "Deming regression", col="red")
text(7.5,40, "Confidence Interval for", col="royalblue")
text(7.5,35, "Predicted values - 95%", col="royalblue")
EDIT 2
Topic moved to Cross Validated:
https://stats.stackexchange.com/questions/167907/deming-orthogonal-regression-measuring-goodness-of-fit
There are many proposed methods to calculate goodness of fit and tolerance intervals for Deming Regression but none of them widely accepted. The conventional methods we use for OLS regression may not make sense. This is an area of active research. I don't think there many R-packages which will help you compute that since not many mathematicians agree on any particular method. Most methods for calculating intervals are based on Resampling techniques.
However you can check out the 'mcr' package for intervals...
https://cran.r-project.org/web/packages/mcr/

how can I predict probability of an event using the weibull distribution

I have a data set of connection forces based on axial force in N (http://pastebin.com/Huwg4vxv)
Some previous analyses has been undertaken (by another party) and has fitted a Weibull distribution to it, and then predicted that the chances of recording a force of 60N or higher is around 1.2%.
I have to say that eyeballing the data, that doesn't seem likely to me, but I know nothing about this particular distribution.
So far I am able to fit the curve:
force<-read.csv(file="forcestats.csv",header = T)
library(MASS)
fitdistr(force$F, 'weibull')
hist(force$F)
I am trying to understand
is a weibull distro really the best fit for this data ?
how I can make that same prediction using R (how to calculate the probability of values above 60N);
is it possible to calculate the 95% confidence interval for that value (i.e., 1.2% +/- x%)
Thanks for reading
Pete
To address your first item,
is a weibull distro really the best fit for this data ?
conceptually, this is more of a question about statistical inference rather than programming, so you most likely want to tackle that on CrossValidated rather than SO. However, you can certainly inquire about the means of investigating this programmatically, such as comparing the estimated density of the observed data to the theoretical density function or to the density function of random samples from a weibull distribution with your parameter estimates:
library(MASS)
##
Weibull <- read.csv(
"F:/Studio/MiscData/force_in_newtons.txt",
header=TRUE)
##
params <- fitdistr(Weibull$F, 'weibull')
##
Shape <- params[[1]][1]
Scale <- params[[1]][2]
##
set.seed(123)
plot(
density(
rweibull(
500,shape=Shape,scale=Scale)),
col="red",
lwd=2,lty=3,
main="")
##
lines(
density(
Weibull$F),
col="blue",
lty=3,lwd=2)
##
legend(
"topright",
legend=c(
"rweibull(n=500,...)",
"observed data"),
lty=c(3,3),
col=c("red","blue"),
lwd=c(3,3),
bty="n")
Of course, there are many other ways of assessing the fit of your model, this is just a quick sanity check.
As for your second question, you can use the pweibull function with lower.tail=FALSE to get probabilities from the theoretical survival function (S(x) = 1 - F(x)):
## Pr(X >= 60)
> pweibull(
60,shape=Shape,scale=Scale,
lower.tail=FALSE)
[1] 0.01268268
As for your final item, I believe that calculating confidence intervals on probabilities (as well as certain other statistical quantities) for an estimated distribution requires using the Delta method; I could be recalling incorrectly though, so you may want to double check on this. If this is the case and you aren't familiar with the Delta method, then unfortunately you will probably have to do a fair amount of reading on the subject because the calculation involved is generally non-trivial - here's another link; the Wikipedia article doesn't give a very in-depth treatment of the subject. Or, you could inquire about this on Cross Validated as well.

Resources