Bootstrapping in R using Nagelkerke R-squared

I am new to R. I am trying to use the boot() function with the Nagelkerke R-squared as the statistic. I know that I need a function that computes the Nagelkerke R-squared on each resample of the original data. However, I have no idea what I should pass as the statistic function.
I know that the Nagelkerke R-squared can be computed from the deviance and null deviance of a logistic regression. I wrote the following function to compute it:
NagR2 <- function(Objects){
n <- nrow(Objects)
reg <- glm(form,
family = binomial("logit"), data = datainput)
mo <- stepAIC(regression,direction = c("backward"), trace = FALSE)
R2cox <- 1- exp((mo$deviance - mo$null.deviance)/n)
R2nag <- R2cox/(1-exp((-mo$null.deviance)/n))
R2nag
}
How should I change my NagR2 function so that I can use it as statistic in the boot() function?

You need to alter the function so that it takes the input data.frame as its first argument, the indices of the resampled rows as its second, and any other arguments after that. Changing your existing function a bit:
NagR2 <- function(datainput,ind,form){
n <- nrow(datainput[ind,])
reg <- glm(form,family = binomial("logit"), data = datainput[ind,])
mo <- stepAIC(reg,direction = c("backward"), trace = FALSE)
R2cox <- 1- exp((mo$deviance - mo$null.deviance)/n)
R2nag <- R2cox/(1-exp((-mo$null.deviance)/n))
R2nag
}
And applying onto a test dataset:
library(MASS)
library(boot)
dat = iris
dat$Species=factor(ifelse(dat$Species=="versicolor","v","o"))
bo = boot(dat,statistic=NagR2,R=100,form = as.formula(Species ~ .))
bo
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = dat, statistic = NagR2, R = 100, form = as.formula(Species ~
.))
Bootstrap Statistics :
original bias std. error
t1* 0.3650395 0.01470299 0.0720022
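To turn the bootstrap replicates into an interval for the Nagelkerke R-squared, boot.ci() can be applied to the returned object. The following is a minimal sketch, assuming a fixed formula (no stepAIC step) to keep it short. The helper names nagelkerke and nag_stat and the choice of predictors are illustrative assumptions, not part of the answer above.

```r
library(boot)

# Hypothetical helper: Nagelkerke R-squared from a fitted binomial glm
nagelkerke <- function(fit) {
  n <- nobs(fit)
  r2_cox <- 1 - exp((fit$deviance - fit$null.deviance) / n)
  r2_cox / (1 - exp(-fit$null.deviance / n))
}

# Statistic for boot(): refit the model on the resampled rows
nag_stat <- function(data, ind) {
  fit <- glm(Species ~ Petal.Length + Sepal.Width,
             family = binomial("logit"), data = data[ind, ])
  nagelkerke(fit)
}

dat <- iris
dat$Species <- factor(ifelse(dat$Species == "versicolor", "v", "o"))

set.seed(1)
bo <- boot(dat, statistic = nag_stat, R = 200)

# Percentile interval; columns 4 and 5 of $percent hold the limits
ci <- boot.ci(bo, type = "perc")
ci_limits <- ci$percent[4:5]
ci_limits
```

The percentile interval is only one of the types boot.ci() offers; it is used here because it needs nothing beyond the replicates themselves.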

Related

Bootstrapping in R: Predict

I am running a program where I fit an OLS regression and then subtract the predicted values from the actual observations to obtain the residuals.
model1 = lm(data = final, obs ~ day + poly(temp,2) + prpn + school + lag1) # linear model
predfit = predict(model1, final) # predicted values
residuals = data.frame(final$obs - predfit) # obtain residuals
I want to bootstrap my model and then do the same with the bootstrapped coefficients. I tried doing this the following way:
lboot <- lm.boot(model1, R = 1000)
predfit = predict(lboot, final)
residuals = data.frame(final$obs - predfit) # obtain residuals
However, that does not work. I also tried:
boot_predict(model1, final, R = 1000, condense = T, comparison = "difference")
and that does not work either.
How can I bootstrap my model and then predict based on that?
If you're trying to fit the best OLS using bootstrap, I'd use the caret package.
library(caret)
#separate independent and dependent variables
indepVars = final[, names(final) != "obs"] # drop the response column by name
depVar = final$obs
#train model
ols.train = train(indepVars, depVar, method='lm',
trControl = trainControl(method='boot', number=1000))
#make prediction and get residuals (observed minus predicted)
ols.pred = predict(ols.train, indepVars)
residuals = final$obs - ols.pred
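Note that caret uses the bootstrap resamples to estimate performance and then fits the final model on the full data, so the predictions above match those of a plain lm fit. If the goal is predictions that come from the bootstrapped coefficients themselves, one option is to refit the model inside boot() and predict from each refit. This is a rough sketch under assumptions: mtcars stands in for the final data set (which is not available here), the formula is made up, and pred_stat is a hypothetical name.

```r
library(boot)

# Statistic: fit on the resampled rows, predict for all original rows
pred_stat <- function(data, ind) {
  fit <- lm(mpg ~ wt + hp, data = data[ind, ])
  predict(fit, newdata = data)
}

set.seed(42)
bt <- boot(mtcars, statistic = pred_stat, R = 500)

# Average prediction per observation across the bootstrap fits
boot_pred <- colMeans(bt$t)

# Residuals relative to the bootstrap-averaged predictions
boot_resid <- mtcars$mpg - boot_pred

# Pointwise 95% percentile band for each prediction
pred_ci <- apply(bt$t, 2, quantile, probs = c(0.025, 0.975))
```

Each row of bt$t holds one bootstrap fit's predictions, so any summary over rows (mean, quantiles) gives a per-observation bootstrap summary.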

R: obtain coefficients&CI from bootstrapping mixed-effect model results

The working data looks like:
set.seed(1234)
df <- data.frame(y = rnorm(1:30),
fac1 = as.factor(sample(c("A","B","C","D","E"),30, replace = T)),
fac2 = as.factor(sample(c("NY","NC","CA"),30,replace = T)),
x = rnorm(1:30))
The lme model is fitted as:
library(lme4)
mixed <- lmer(y ~ x + (1|fac1) + (1|fac2), data = df)
I used bootMer to run the parametric bootstrapping and I can successfully obtain the coefficients (intercept) and SEs for fixed&random effects:
mixed_boot_sum <- function(data){s <- sigma(data)
c(beta = getME(data, "fixef"), theta = getME(data, "theta"), sigma = s)}
mixed_boot <- bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", use.u = FALSE)
My first question is how to obtain the coefficients (slopes) for each individual level of the two random effects from the bootstrapping results in mixed_boot?
I have no problem extracting the coefficients(slope) from mixed model by using augment function from broom package, see below:
library(broom)
mixed.coef <- augment(mixed, df)
However, it seems that broom can't deal with objects of class boot, so I can't use the above function directly on mixed_boot.
I also tried to modify mixed_boot_sum by adding mmList (I thought this would be what I am looking for), but R complains:
Error in bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", :
bootMer currently only handles functions that return numeric vectors
Furthermore, is it possible to obtain CIs for both the fixed and random effects by specifying FUN as well?
I am now very confused about how to specify FUN correctly for my needs. Any help with my question would be greatly appreciated!
My first question is how to obtain the coefficients (slopes) for each individual level of the two random effects from the bootstrapping results in mixed_boot?
I'm not sure what you mean by "coefficients(slope) of each individual level". broom::augment(mixed, df) gives the predictions (residuals, etc.) for every observation. If you want the predicted coefficients at each level I would try
mixed_boot_coefs <- function(fit){
unlist(coef(fit))
}
which for the original model gives
mixed_boot_coefs(mixed)
## fac1.(Intercept)1 fac1.(Intercept)2 fac1.(Intercept)3 fac1.(Intercept)4
## -0.4973925 -0.1210432 -0.3260958 0.2645979
## fac1.(Intercept)5 fac1.x1 fac1.x2 fac1.x3
## -0.6288728 0.2187408 0.2187408 0.2187408
## fac1.x4 fac1.x5 fac2.(Intercept)1 fac2.(Intercept)2
## 0.2187408 0.2187408 -0.2617613 -0.2617613
## ...
If you want the resulting object to be more clearly named you can use:
flatten <- function(cc) setNames(unlist(cc),
outer(rownames(cc),colnames(cc),
function(x,y) paste0(y,x)))
mixed_boot_coefs <- function(fit){
unlist(lapply(coef(fit),flatten))
}
When run through bootMer/confint/boot::boot.ci these functions will give confidence intervals for each of these values (note that all of the slopes facW.xZ are identical across groups because the model assumes random variation in the intercept only). In other words, whatever information you know how to extract from a fitted model (conditional modes/BLUPs [ranef], predicted intercepts and slopes for each level of the grouping variable [coef], parameter estimates [fixef, getME], random-effects variances [VarCorr], predictions under specific conditions [predict] ...) can be used in bootMer's FUN argument, as long as you can flatten its structure into a simple numeric vector.
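As a rough illustration of that last point, the sketch below passes the per-level coefficient extractor to bootMer and computes plain percentile intervals from the replicate matrix. It assumes lme4 is installed and reuses the simulated data from the question. The use of quantile() on the $t matrix is a choice made for this sketch and is not the only way to get intervals.

```r
library(lme4)

set.seed(1234)
df <- data.frame(y = rnorm(30),
                 fac1 = factor(sample(c("A", "B", "C", "D", "E"), 30, replace = TRUE)),
                 fac2 = factor(sample(c("NY", "NC", "CA"), 30, replace = TRUE)),
                 x = rnorm(30))
mixed <- lmer(y ~ x + (1 | fac1) + (1 | fac2), data = df)

# Flatten the per-level coefficients into one numeric vector
mixed_boot_coefs <- function(fit) unlist(coef(fit))

# Small nsim to keep the sketch fast; singular-fit messages are expected
# with data this small and noisy
boot_coefs <- bootMer(mixed, FUN = mixed_boot_coefs, nsim = 50,
                      type = "parametric", use.u = FALSE)

# 95% percentile interval for every element returned by FUN
ci <- t(apply(boot_coefs$t, 2, quantile,
              probs = c(0.025, 0.975), na.rm = TRUE))
rownames(ci) <- names(boot_coefs$t0)
head(ci)
```

Each row of ci corresponds to one element of unlist(coef(mixed)), that is, one intercept or slope for one level of fac1 or fac2.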

Why are the parametric bootstrap bias and standard error zero here?

I'm performing parametric bootstrapping in R for a simple problem and getting Bias and Standard Error zero always. What am I doing wrong?
set.seed(12345)
df <- rnorm(n=10, mean = 0, sd = 1)
Boot.fun <-
function(data) {
m1 <- mean(data)
return(m1)
}
Boot.fun(data = df)
library(boot)
out <- boot(df, Boot.fun, R = 20, sim = "parametric")
out
PARAMETRIC BOOTSTRAP
Call:
boot(data = df, statistic = Boot.fun, R = 20, sim = "parametric")
Bootstrap Statistics :
original bias std. error
t1* -0.1329441 0 0
You need to add a line of code to do the sampling, i.e.
Boot.fun <-
function(data) {
data <- sample(data, replace=T)
m1 <- ...
since you didn't supply a function to the ran.gen argument to generate random values. This is discussed in the documentation (?boot): if sim = "parametric" and you don't supply a generating function, the original data is passed to statistic unchanged, so you need to sample inside that function. Since every replicate was computed on the same data, there is no standard error or bias.
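The alternative is to leave the statistic alone and supply the generating function through ran.gen, together with the fitted parameters through mle. A minimal sketch, assuming a normal model for the data; the names mean_stat and gen_normal are hypothetical:

```r
library(boot)

set.seed(12345)
x <- rnorm(n = 10, mean = 0, sd = 1)

# Statistic for a parametric bootstrap: receives a simulated data set
mean_stat <- function(data) mean(data)

# Generator: boot() calls this with the original data and the mle argument
gen_normal <- function(data, mle) {
  rnorm(length(data), mean = mle$mean, sd = mle$sd)
}

out <- boot(x, statistic = mean_stat, R = 200, sim = "parametric",
            ran.gen = gen_normal,
            mle = list(mean = mean(x), sd = sd(x)))
out
```

Because each replicate is now computed on a freshly simulated sample, the reported bias and standard error are no longer zero.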

ANOVA after using glm.fit

I would like to perform a likelihood ratio test to determine the power of a model term in a DOE. Till now I have been using the p-value from the glm fit to do this and things have been fine. As I started to use the anova function, I realized that there does not seem to be an anova function designed to accept the input from a glm.fit function, only a glm function. Here is an example of what I would like to do:
X # This is a model matrix from model.matrix
y # These are the y values for the fit
tfit = glm.fit(X, y, family = poisson())
anova(tfit, test = 'LRT')
Typically I would assume that the anova function call would just need to be altered to anova.glm, but that is not the case. How can I get the glm.fit function output to be compatible with an anova function input?
The problem is that glm.fit does not return an object of class glm, but a raw list with all kinds of data about the model. This cannot be fed to anova.glm, since that function expects an object of class glm as produced by the glm function. If you have the raw data available (that is, not yet turned into a model matrix), you can apply the glm function to it to produce the desired outcome.
X <- matrix(c(runif(10), rnorm(10)), ncol = 2)
y <- round(runif(10, 1, 5))
X.mm <- model.matrix(y ~ X)
model.fit.1 <- glm.fit(X.mm, y, family = poisson())
class(model.fit.1)
model.fit.2 <- glm(y ~ X, family = "poisson")
class(model.fit.2)
anova(model.fit.2, test = "LRT")
If you can't use the glm function and must use glm.fit, you can construct the LRT yourself from the glm.fit output. As a starting point, take the following function:
LRT.glm.fit <- function(glm.fit.mod){
df.null <- glm.fit.mod$df.null
df.mod <- glm.fit.mod$df.residual
dev.null <- glm.fit.mod$null.deviance
dev.mod <- glm.fit.mod$deviance
dev.diff <- dev.null - dev.mod
p.value <- pchisq(dev.diff, df.null - df.mod, lower.tail = FALSE)
output <- c(round(df.null), round(df.mod), dev.null, dev.mod, p.value)
names(output) <- c("df.null", "df.mod", "dev.null", "dev.mod", "p.value")
output
}
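As a quick sanity check of this approach, the same deviance-difference test can be computed from a glm.fit result and compared with what anova() reports for the equivalent glm fit. The sketch below uses simulated Poisson data with a single predictor, which is an assumption for illustration. With one model term, the overall test against the null model coincides with the single row of the sequential anova table.

```r
set.seed(1)
x <- runif(30)
y <- rpois(30, lambda = exp(0.5 + 0.8 * x))
X.mm <- model.matrix(~ x)

# Fit via glm.fit and build the LRT against the intercept-only model by hand
fit_raw <- glm.fit(X.mm, y, family = poisson())
dev_diff <- fit_raw$null.deviance - fit_raw$deviance
df_diff <- fit_raw$df.null - fit_raw$df.residual
p_raw <- pchisq(dev_diff, df_diff, lower.tail = FALSE)

# Same model via glm, tested with anova
fit_glm <- glm(y ~ x, family = poisson())
tab <- anova(fit_glm, test = "LRT")
p_glm <- tab[nrow(tab), ncol(tab)]  # p-value is the last column of the last row

c(by_hand = p_raw, anova = p_glm)
```

For models with several terms, anova() tests the terms sequentially, whereas the hand-built test compares the full model with the null model in one step, so the two will no longer report the same rows.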

boot() generating an error on replacement - R

I've written a couple of functions for retrieving statistics (coefficients and p-values) from an lm object, to be bootstrapped upon. The coefficient one works; the p-value one is failing with error:
Error in boot(data = data, statistic = bs_p, R = 1000) :
number of items to replace is not a multiple of replacement length
I now believe the error is related to the inclusion of a factor variable. Attempting to recreate the problem with easily reproducible data.
L3 <- LETTERS[1:3]
data <- data.frame(cbind(x = 20:69, y = 1:50), fac = sample(L3, 50, replace = TRUE))
bs_p <- function (data, i) {
d <- data[i,]
fit <- lm (d$y~d$x*d$fac, data=d)
return(summary(fit)$coefficients[,4])
}
bt <- boot(data=data, statistic=bs_p, R=1000)
The class "numeric" values returned from each of these appears to be in exactly the same format, to my beginner's eye... but I'm guessing it isn't? I have also cleared the returned bt bootstrap object before running the next function, but that did not solve it. How could I best retrieve boot-strapped p-values? Thanks for any thoughts. (Running R 3.0.1 on Mac OSX.)
I am not sure whether bootstrapping p-values from an lm model is meaningful, but a solution for that is given below. In your bs_p function, you can remove the d$ prefixes in the formula since you already pass data = d. Here is an example using the mtcars data:
library(boot)
bs <- function(mtcars, i) {
d <- mtcars[i,]
fit <- lm (mpg~drat+wt, data=d)
return(coef(fit))
}
bt <- boot(data=mtcars, statistic=bs, R=1000)
bt
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = mtcars, statistic = bs, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 30.290370 0.54284222 7.494441
t2* 1.442491 -0.07260619 1.393801
t3* -4.782890 -0.09804271 1.000838
Here are the bootstrapped p-values from lm:
bs_r <- function(mtcars, i) {
d <- mtcars[i,]
fit <- lm (mpg~drat+wt, data=d)
return(summary(fit)$coefficients[,4])
}
bt1 <- boot(data=mtcars, statistic=bs_r, R=1000)
bt1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = mtcars, statistic = bs_r, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 2.737824e-04 4.020024e-03 0.0253248217
t2* 3.308544e-01 7.108738e-02 0.2960776146
t3* 1.589075e-06 5.405459e-05 0.0005540412
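On the original error message: boot() needs the statistic to return a vector of the same length for every resample. One plausible cause with a factor in the model (not confirmed for the asker's data) is that a resample leaves a coefficient inestimable, and summary() drops such rows, so the returned vector gets shorter. A minimal sketch of a workaround that pads the result to a fixed set of names follows; the data, the rare level C, and the name bs_p_fixed are all made up for illustration.

```r
library(boot)

set.seed(1)
dat <- data.frame(x = 1:30,
                  fac = factor(rep(c("A", "B", "C"), times = c(14, 14, 2))))
dat$y <- 2 + 0.5 * dat$x + rnorm(30)

# Coefficient names from the fit on the full data define the output layout
full_names <- names(coef(lm(y ~ x * fac, data = dat)))

bs_p_fixed <- function(data, i) {
  d <- data[i, ]
  fit <- lm(y ~ x * fac, data = d)
  p <- summary(fit)$coefficients[, 4]  # may have fewer rows than full_names
  out <- setNames(rep(NA_real_, length(full_names)), full_names)
  out[names(p)] <- p                   # fill what was estimable, leave NA otherwise
  out
}

bt <- boot(data = dat, statistic = bs_p_fixed, R = 200)
colMeans(bt$t, na.rm = TRUE)
```

Resamples in which a coefficient could not be estimated show up as NA in the corresponding column of bt$t, so downstream summaries need na.rm = TRUE or explicit handling.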

Resources