Why are parametric bootstrap bias and standard error zero here? - R

I'm performing parametric bootstrapping in R for a simple problem, and the bias and standard error always come out as zero. What am I doing wrong?
set.seed(12345)
df <- rnorm(n = 10, mean = 0, sd = 1)

Boot.fun <- function(data) {
  m1 <- mean(data)
  return(m1)
}

Boot.fun(data = df)

library(boot)
out <- boot(df, Boot.fun, R = 20, sim = "parametric")
out
PARAMETRIC BOOTSTRAP
Call:
boot(data = df, statistic = Boot.fun, R = 20, sim = "parametric")
Bootstrap Statistics :
      original  bias    std. error
t1* -0.1329441     0             0

You need to add a line of code to do the sampling, i.e.

Boot.fun <- function(data) {
  data <- sample(data, replace = TRUE)
  m1 <- mean(data)
  return(m1)
}

since you didn't supply a function to the ran.gen argument to generate random values. This is discussed in the documentation in ?boot: if sim = "parametric" and you don't supply a generating function, the original data are passed to statistic, so you need to do the sampling inside that function. Since every replicate was run on the same data, the bias and standard error are zero.
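Alternatively, a minimal sketch of a genuinely parametric bootstrap, supplying the ran.gen and mle arguments described in ?boot (here assuming a normal model with parameters estimated from df):

library(boot)

# Generate each bootstrap dataset from the fitted normal model
norm.gen <- function(data, mle) {
  rnorm(length(data), mean = mle[1], sd = mle[2])
}

out <- boot(df, Boot.fun, R = 20, sim = "parametric",
            ran.gen = norm.gen, mle = c(mean(df), sd(df)))
out  # bias and standard error should now be non-zero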


Logistic regression in R: plotting bootstrap confidence intervals using the Titanic dataset

I am working on an exercise for an online statistics course. I need to fit a logistic regression in R using the Titanic dataset, and I want to apply the bootstrap method to create and plot 95% confidence intervals for the predictions of the logistic regression.
When I run the bootstrap command and try to plot it, I get the error: "All values of t* are equal to 0.0159971772980342". I also get a bias and standard error of 0, which cannot be true. I guess there is an error in how I set up the bootstrap command, but I unfortunately cannot find it. What can I try?
My Code:
library(boot)
set.seed(50000)

logit_test <- function(data, indices) {
  dt <- data[indices,]
  fit <- glm(Clean_data$Survived ~ Fare, data = Clean_data, family = "binomial")
  return(coef(fit))
}

boot_strap <- boot(
  data = Clean_data,
  statistic = logit_test,
  R = 100)

boot.ci(boot.out = boot_strap,
        type = c("basic"))

# Now we look at the results and plot them
boot_strap
plot(boot_strap, index = 2)
My Output:
> library(boot)
>
> set.seed(50000)
>
> logit_test <- function(data, indices) {
+ dt <- data[indices,]
+ fit <- glm(Clean_data$Survived ~ Fare, data = Clean_data, family = "binomial")
+ return(coef(fit))
+ }
> boot_strap <- boot(
+ data = Clean_data,
+ statistic = logit_test,
+ R = 100)
>
> boot.ci(boot.out = boot_strap,
+ type = c("basic"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 100 bootstrap replicates
CALL :
boot.ci(boot.out = boot_strap, type = c("basic"))
Intervals :
Level Basic
95% (-0.8968, -0.8968 )
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
> boot_strap
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = Clean_data, statistic = logit_test, R = 100)
Bootstrap Statistics :
       original  bias    std. error
t1* -0.89682819     0             0
t2*  0.01599718     0             0
> plot(boot_strap, index=2)
[1] "All values of t* are equal to 0.0159971772980342"
The problem is that your bootstrap function isn't using the bootstrapped data to fit the model. You have this function:
logit_test <- function(data, indices){
  dt <- data[indices, ]
  fit <- glm(Clean_data$Survived ~ Fare, data = Clean_data,
             family = binomial)
  return(coef(fit))
}
Note that there are a couple of problems. One is that you should be using dt in the data= argument, but you should also not be using Clean_data$Survived as the dependent variable: it should just be Survived, because you want to take that variable not from the original data but from the bootstrapped data. Something like this for your bootstrap function should work:
logit_test <- function(data, indices){
  dt <- data[indices, ]
  fit <- glm(Survived ~ Fare, data = dt, family = binomial)
  return(coef(fit))
}
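With the corrected statistic, the same calls should now give non-degenerate replicates. One note, as a sketch (assuming Clean_data is loaded): boot.ci and plot need index = 2 to look at the Fare coefficient rather than the intercept.

boot_strap <- boot(data = Clean_data, statistic = logit_test, R = 100)
boot.ci(boot.out = boot_strap, type = c("basic"), index = 2)
plot(boot_strap, index = 2)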

Wald Test for Multinomial Reg. in R

I asked this question before but never got an answer, so I am trying again, this time providing a sample data set, in the hope that someone can tell me why I'm getting errors when I try to implement the Wald test from the aod and lmtest packages.
Sample data:
marital <- sample(1:5, 64614, replace = T)
race <- sample(1:3, 64614, replace = T)
educ <- sample(1:20, 64614, replace = T)
test <- data.frame(educ, marital, race)
test$marital <- as.factor(test$marital)
test$race <- as.factor(test$race)
test$marital <- relevel(test$marital, ref = "3")
require(nnet)
require(aod)
require(lmtest)
testmod <- multinom(marital ~ race*educ, data = test)
testnull <- multinom(marital ~ 1, data = test) #null model for the global test
waldtest(testnull, testmod)
wald.test(b = coef(testmod), Sigma = vcov(testmod), Terms = 1:24) #testing all terms for the global test
As you can see, when I use the waldtest function from the lmtest package I get the following error:
Error in solve.default(vc[ovar, ovar]) : 'a' is 0-diml
When I use the wald.test function from aod, I get the following error:
Error in L %*% b : non-conformable arguments
I assume these are related errors, as they both seem to have to do with the variance matrix. I'm not sure why I'm getting them, though, as the data set has no missing values.
Just a heads up when using the nnet package with multinom: you can also use the broom package to tidy things up a bit:
tidy(multinom_model, conf.int = TRUE, conf.level = 0.95, exponentiate = TRUE)
This returns a tibble with the coefficients exponentiated, confidence intervals (similar to confint used with lm), as well as the z-scores, standard errors, and the respective p-values for the Wald z test (essentially doing z = summary(multinom_model)$coefficients / summary(multinom_model)$standard.errors and then round((1 - pnorm(abs(z), 0, 1)) * 2, digits = 5)).
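A minimal sketch putting this together with the simulated test data from the question (multinom_model here is just the fit from above):

library(nnet)
library(broom)

multinom_model <- multinom(marital ~ race * educ, data = test)
tidy(multinom_model, conf.int = TRUE, conf.level = 0.95, exponentiate = TRUE)

# The manual equivalent of the Wald z test that tidy() reports:
z <- summary(multinom_model)$coefficients / summary(multinom_model)$standard.errors
p <- round((1 - pnorm(abs(z), 0, 1)) * 2, digits = 5)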

Using coeftest results in predict.lm()

I am analyzing a dataset in which the variance of the regression error term is not constant across observations. To deal with this, I re-estimated the model with heteroskedasticity-robust (Huber-White) standard errors using the coeftest function. Now I want to use these new results for a prediction with the predict() function.
The dataset looks like the following, but with multiple X:
set.seed(123)
x <- rep(c(10, 15, 20, 25), each = 25)
e <- c()
e[1:25] <- rnorm(25, sd = 10)
e[26:50] <- rnorm(25, sd = 15)
e[51:75] <- rnorm(25, sd = 20)
e[76:100] <- rnorm(25, sd = 25)
y <- 720 - 3.3 * x + e
model <- lm(y ~ x)
library(lmtest)
library(sandwich)
coeftest(model, vcov=vcovHC(model, "HC1"))
I found the following solution for the issue on the internet:
predict.rob <- function(x, vcov, newdata){
  if(missing(newdata)){ newdata <- x$model }
  tt <- terms(x)
  Terms <- delete.response(tt)
  m.mat <- model.matrix(Terms, data = newdata)
  m.coef <- x$coef
  fit <- as.vector(m.mat %*% m.coef)
  se.fit <- sqrt(diag(m.mat %*% vcov %*% t(m.mat)))
  return(list(fit = fit, se.fit = se.fit))
}
The remaining problem is that my regression has more than one regressor.
Is there any way to adapt this solution to multiple (7) explanatory variables?
Thanks in advance!
I'm not sure, but the coeftest function only performs a test; you can't use its result directly for your prediction. Maybe you can somehow pass the covariance from vcovHC(model, "HC1") into the prediction step. I hope it helps you a bit.
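For what it's worth, a minimal usage sketch of that idea with the predict.rob helper quoted in the question: because model.matrix builds the full design matrix, the helper should generalize to any number of regressors unchanged (shown here with the single-x model from the question; the new data frame is hypothetical):

library(sandwich)

rob.vcov <- vcovHC(model, "HC1")
new <- data.frame(x = c(12, 18))  # hypothetical new observations
predict.rob(model, vcov = rob.vcov, newdata = new)

With 7 explanatory variables, newdata would just need one column per regressor.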

Bootstrapping in R using the Nagelkerke R-squared

I am new to R. I am trying to use the boot() function with Nagelkerke R-squared as the statistic. I know that I need a function that measures the Nagelkerke R-squared of each resample, but I have no idea what to pass as the statistic function.
I know that Nagelkerke R-squared can be computed from the deviance and null deviance of a logit regression, so I wrote a function to compute it:
NagR2 <- function(Objects){
  n <- nrow(Objects)
  reg <- glm(form,
             family = binomial("logit"), data = datainput)
  mo <- stepAIC(regression, direction = c("backward"), trace = FALSE)
  R2cox <- 1 - exp((mo$deviance - mo$null.deviance)/n)
  R2nag <- R2cox/(1 - exp((-mo$null.deviance)/n))
  R2nag
}
How should I change my NagR2 function so that I can use it as statistic in the boot() function?
You need to alter the function to take an input data.frame as its first argument and indices into the data.frame as its second, followed by any other arguments. Changing your existing function a bit:
NagR2 <- function(datainput, ind, form){
  n <- nrow(datainput[ind,])
  reg <- glm(form, family = binomial("logit"), data = datainput[ind,])
  mo <- stepAIC(reg, direction = c("backward"), trace = FALSE)
  R2cox <- 1 - exp((mo$deviance - mo$null.deviance)/n)
  R2nag <- R2cox/(1 - exp((-mo$null.deviance)/n))
  R2nag
}
And applying it to a test dataset:
library(MASS)
library(boot)

dat <- iris
dat$Species <- factor(ifelse(dat$Species == "versicolor", "v", "o"))
bo <- boot(dat, statistic = NagR2, R = 100, form = as.formula(Species ~ .))
bo
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = dat, statistic = NagR2, R = 100, form = as.formula(Species ~ .))

Bootstrap Statistics :
     original       bias    std. error
t1* 0.3650395   0.01470299    0.0720022
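From there, boot.ci can give an interval for the bootstrapped statistic in the usual way, for example a percentile interval (a sketch using the bo object above):

boot.ci(bo, type = "perc")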

boot() generating an error on replacement - R

I've written a couple of functions for retrieving statistics (coefficients and p-values) from an lm object, to be bootstrapped over. The coefficient one works; the p-value one fails with this error:
Error in boot(data = data, statistic = bs_p, R = 1000) :
number of items to replace is not a multiple of replacement length
I now believe the error is related to the inclusion of a factor variable. Here is an attempt to recreate the problem with easily reproducible data.
L3 <- LETTERS[1:3]
data <- data.frame(cbind(x = 20:69, y = 1:50), fac = sample(L3, 50, replace = TRUE))

bs_p <- function(data, i) {
  d <- data[i,]
  fit <- lm(d$y ~ d$x * d$fac, data = d)
  return(summary(fit)$coefficients[,4])
}

bt <- boot(data = data, statistic = bs_p, R = 1000)
The class "numeric" values returned from each of these appears to be in exactly the same format, to my beginner's eye... but I'm guessing it isn't? I have also cleared the returned bt bootstrap object before running the next function, but that did not solve it. How could I best retrieve boot-strapped p-values? Thanks for any thoughts. (Running R 3.0.1 on Mac OSX.)
I am not sure whether it makes sense to bootstrap p-values from an lm model, but a solution for doing so is provided below. In your bs_p function, you can remove the d$ prefixes on the right-hand side of fit, since you already subset the data into d. Here is an example using the mtcars data:
library(boot)

bs <- function(mtcars, i) {
  d <- mtcars[i,]
  fit <- lm(mpg ~ drat + wt, data = d)
  return(coef(fit))
}

bt <- boot(data = mtcars, statistic = bs, R = 1000)
bt
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = mtcars, statistic = bs, R = 1000)
Bootstrap Statistics :
     original        bias    std. error
t1* 30.290370  0.54284222      7.494441
t2*  1.442491 -0.07260619      1.393801
t3* -4.782890 -0.09804271      1.000838
And here are the bootstrapped p-values from the lm fit:
bs_r <- function(mtcars, i) {
  d <- mtcars[i,]
  fit <- lm(mpg ~ drat + wt, data = d)
  return(summary(fit)$coefficients[,4])
}

bt1 <- boot(data = mtcars, statistic = bs_r, R = 1000)
bt1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = mtcars, statistic = bs_r, R = 1000)
Bootstrap Statistics :
        original         bias    std. error
t1* 2.737824e-04 4.020024e-03  0.0253248217
t2* 3.308544e-01 7.108738e-02  0.2960776146
t3* 1.589075e-06 5.405459e-05  0.0005540412
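As for the original replacement-length error: a plausible cause is that boot sizes its result matrix from the first call of the statistic, so if a resample happens to drop a level of the factor, the coefficient vector comes back shorter and the assignment fails with "number of items to replace is not a multiple of replacement length". A hedged workaround sketch, padding the result to a fixed set of term names (names assumed from the reproducible data above):

bs_p <- function(data, i) {
  d <- data[i,]
  fit <- lm(y ~ x * fac, data = d)
  p <- summary(fit)$coefficients[, 4]
  # Fixed-length result: NA for terms absent from this resample
  all_terms <- c("(Intercept)", "x", "facB", "facC", "x:facB", "x:facC")
  out <- setNames(rep(NA_real_, length(all_terms)), all_terms)
  out[names(p)] <- p
  out
}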
