ANOVA after using glm.fit - r

I would like to perform a likelihood ratio test to determine the power of a model term in a DOE. Till now I have been using the p-value from the glm fit to do this and things have been fine. As I started to use the anova function, I realized that there does not seem to be an anova function designed to accept the input from a glm.fit function, only a glm function. Here is an example of what I would like to do:
X # This is a model matrix from matrix.model
y # These are the y values for the fit
tfit = glm.fit(X, y, family = poisson())
anova(tfit, test = 'LRT')
Typically I would assume that the anova function call would just need to be altered to anova.glm, but that is not the case. How can I get the glm.fit function output to be compatible with an anova function input?

The problem is that glm.fit does not output of class glm, but a raw list with all kinds of data about the model. This cannot be fed to anova.glm since this function expects an object of class glm as produced by the glm function. If you have the raw data available (thus not turned in to a model matrix, you can apply the glm function to this to produce the desired outcome.
X <- matrix(c(runif(10), rnorm(10)), ncol = 2)
y <- round(runif(10, 1, 5))
X.mm <- model.matrix(y ~ X)
model.fit.1 <- glm.fit(X.mm, y, family = poisson())
class(model.fit.1)
model.fit.2 <- glm(y ~ X, family = "poisson")
class(model.fit.2)
anova(model.fit.2, test = "LRT")
If you can't use the glm function and must use the glm.fit then you can construct the LRT yourself from the glm.fit output. For a start take the following function
LRT.glm.fit <- function(glm.fit.mod){
df.null <- glm.fit.mod$df.null
df.mod <- glm.fit.mod$df.residual
dev.null <- glm.fit.mod$null.deviance
dev.mod <- glm.fit.mod$deviance
dev.diff <- dev.null - dev.mod
p.value <- 1 - pchisq(dev.null - dev.mod, df.null - df.mod)
output <- c(round(df.null), round(df.mod), dev.null, dev.mod, p.value)
names(output) <- c("df.null", "df.mod", "dev.null", "dev.mod", "p.value")
output
}

Related

Predict mean response for a logistic regression model in R

I trained a logistic regression model in R using the glm function
model<-glm(df1$deny~df1$dir+df1$hir+df1$lvr+df1$ccs+df1$mcs+df1$pbcr+df1$dmi+df1$self+df1$single+df1$uria+df1$condominium+df1$black,data=df1,family='binomial')
Now i want to get the mean response for a data point
test<-c(0.59,0.24,0.941177,3,2,0,1,0,0,10.6,1,1)
the test data points are the respective predictors as in the model. i.e. dir = 0.59, hir = 0.24...
How to obtain the mean response in this case?
model <- glm(deny~dir+hir+lvr+ccs+mcs+pbcr+dmi+
self+single+uria+condominium+black,
data=df1,family='binomial')
test <- c(0.59,0.24,0.941177,3,2,0,1,0,0,10.6,1,1)
You can either use the model definition:
X <- matrix(c(1, test), nrow = 1)
beta <- coef(model)
drop(plogis(X %*% beta))
or
dftest <- as.data.frame(X)
names(dftest) <- c("dir", "hir", "lvr", "ccs", ...)
(you need to complete the list of names yourself, I'm lazy)
or possibly
names(dftest) <- setdiff(names(df1), "deny")
if the model variables match the order etc. of the data frame
Then:
predict(model, newdata = dftest, type = "response")
Sorted. I did
df.test<- df1[0,-13]
head(df.test)
test<- c(0.59,0.24,0.941177,3,2,0,1,0,0,10.6,1,1)
df.test[nrow(df.test)+1,]=test
pred<- model.1 %>% predict(df.test,type='response')
pred

Pass model formula as argument in R

I need to cross-validate several glmer models on the same data so I've made a function to do this (I'm not interested in preexisting functions for doing this). I want to pass an arbitrary glmer model to my function as the only argument. Sadly, I can't figure out how to do this, and the interwebz won't tell me.
Ideally, I would like to do something like:
model = glmer(y ~ x + (1|z), data = train_folds, family = "binomial"
model2 = glmer(y ~ x2 + (1|z), data = train_folds, family = "binomial"
And then call cross_validation_function(model) and cross_validation_function(model2). The training data within the function is called train_fold.
However, I suspect I need to pass the model formula in different way using reformulate.
Here is an example of my function. The project is about predicting autism(ASD) from behavioral features. The data variable is da.
library(pacman)
p_load(tidyverse, stringr, lmerTest, MuMIn, psych, corrgram, ModelMetrics,
caret, boot)
cross_validation_function <- function(model){
#creating folds
participants = unique(da$participant)
folds <- createFolds(participants, 10)
cross_val <- sapply(seq_along(folds), function(x) {
train_folds = filter(da, !(as.numeric(participant) %in% folds[[x]]))
predict_fold = filter(da, as.numeric(participant) %in% folds[[x]])
#model to be tested should be passed as an argument here
train_model <- model
predict_fold <- predict_fold %>%
mutate(predictions_perc = predict(train_model, predict_fold, allow.new.levels = T),
predictions_perc = inv.logit(predictions_perc),
predictions = ifelse(predictions_perc > 0.5, "ASD","control"))
conf_mat <- caret::confusionMatrix(data = predict_fold$predictions, reference = predict_fold$diagnosis, positive = "ASD")
accuracy <- conf_mat$overall[1]
sensitivity <- conf_mat$byClass[1]
specificity <- conf_mat$byClass[2]
fixed_ef <- fixef(train_model)
output <- c(accuracy, sensitivity, specificity, fixed_ef)
})
cross_df <- t(cross_val)
return(cross_df)
}
Solution developed from the comment: Using as.formula strings can be converted into a formula which can passed as arguments to my function in the following way:
cross_validation_function <- function(model_formula){
...
train_model <- glmer(model_formula, data = da, family = "binomial")
...}
formula <- as.formula( "y~ x + (1|z"))
cross_validation_function(formula)
If you aim is to extract the model formula from a fitted model, the you can use
attributes(model)$call[[2]]. Then you can use this formula when fitting model with the cv folds.
mod_formula <- attributes(model)$call[[2]]
train_model = glmer(mod_formula , data = train_data,
family = "binomial")

R: obtain coefficients&CI from bootstrapping mixed-effect model results

The working data looks like:
set.seed(1234)
df <- data.frame(y = rnorm(1:30),
fac1 = as.factor(sample(c("A","B","C","D","E"),30, replace = T)),
fac2 = as.factor(sample(c("NY","NC","CA"),30,replace = T)),
x = rnorm(1:30))
The lme model is fitted as:
library(lme4)
mixed <- lmer(y ~ x + (1|fac1) + (1|fac2), data = df)
I used bootMer to run the parametric bootstrapping and I can successfully obtain the coefficients (intercept) and SEs for fixed&random effects:
mixed_boot_sum <- function(data){s <- sigma(data)
c(beta = getME(data, "fixef"), theta = getME(data, "theta"), sigma = s)}
mixed_boot <- bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", use.u = FALSE)
My first question is how to obtain the coefficients(slope) of each individual levels of the two random effects from the bootstrapping results mixed_boot ?
I have no problem extracting the coefficients(slope) from mixed model by using augment function from broom package, see below:
library(broom)
mixed.coef <- augment(mixed, df)
However, it seems like broom can't deal with boot class object. I can't use above functions directly on mixed_boot.
I also tried to modify the mixed_boot_sum by adding mmList( I thought this would be what I am looking for), but R complains as:
Error in bootMer(mixed, FUN = mixed_boot_sum, nsim = 100, type = "parametric", :
bootMer currently only handles functions that return numeric vectors
Furthermore, is it possible to obtain CI of both fixed&random effects by specifying FUN as well?
Now, I am very confused about the correct specifications for the FUN in order to achieve my needs. Any help regarding to my question would be greatly appreciated!
My first question is how to obtain the coefficients(slope) of each individual levels of the two random effects from the bootstrapping results mixed_boot ?
I'm not sure what you mean by "coefficients(slope) of each individual level". broom::augment(mixed, df) gives the predictions (residuals, etc.) for every observation. If you want the predicted coefficients at each level I would try
mixed_boot_coefs <- function(fit){
unlist(coef(fit))
}
which for the original model gives
mixed_boot_coefs(mixed)
## fac1.(Intercept)1 fac1.(Intercept)2 fac1.(Intercept)3 fac1.(Intercept)4
## -0.4973925 -0.1210432 -0.3260958 0.2645979
## fac1.(Intercept)5 fac1.x1 fac1.x2 fac1.x3
## -0.6288728 0.2187408 0.2187408 0.2187408
## fac1.x4 fac1.x5 fac2.(Intercept)1 fac2.(Intercept)2
## 0.2187408 0.2187408 -0.2617613 -0.2617613
## ...
If you want the resulting object to be more clearly named you can use:
flatten <- function(cc) setNames(unlist(cc),
outer(rownames(cc),colnames(cc),
function(x,y) paste0(y,x)))
mixed_boot_coefs <- function(fit){
unlist(lapply(coef(fit),flatten))
}
When run through bootMer/confint/boot::boot.ci these functions will give confidence intervals for each of these values (note that all of the slopes facW.xZ are identical across groups because the model assumes random variation in the intercept only). In other words, whatever information you know how to extract from a fitted model (conditional modes/BLUPs [ranef], predicted intercepts and slopes for each level of the grouping variable [coef], parameter estimates [fixef, getME], random-effects variances [VarCorr], predictions under specific conditions [predict] ...) can be used in bootMer's FUN argument, as long as you can flatten its structure into a simple numeric vector.

GLMNET prediction with intercept

I have two questions about prediction using GLMNET - specifically about the intercept.
I made a small example of train data creation, GLMNET estimation and prediction on the train data (which I will later change to Test data):
# Train data creation
Train <- data.frame('x1'=runif(10), 'x2'=runif(10))
Train$y <- Train$x1-Train$x2+runif(10)
# From Train data frame to x and y matrix
y <- Train$y
x <- as.matrix(Train[,c('x1','x2')])
# Glmnet model
Model_El <- glmnet(x,y)
Cv_El <- cv.glmnet(x,y)
# Prediction
Test_Matrix <- model.matrix(~.-y,data=Train)[,-1]
Test_Matrix_Df <- data.frame(Test_Matrix)
Pred_El <- predict(Model_El,newx=Test_Matrix,s=Cv_El$lambda.min,type='response')
I want to have an intercept in the estimated formula. This code gives an error concerning the dimensions of the Test_Matrix matrix unless I remove the (Intercept) column of the matrix - as in
Test_Matrix <- model.matrix(~.-y,data=Train)[,-1]
My questions are:
Is it the right way to do this in order to get the prediction - when I want the prediction formula to include the intercept?
If it is the right way: Why do I have to remove the intercept in the matrix?
Thanks in advance.
The matrix x you were feeding into the glmnet function doesn't contain an intercept column. Therefore, you should respect this format when constructing your test matrix: i.e. just do model.matrix(y ~ . - 1, data = Train).
By default, an intercept is fit in glmnet (see the intercept parameter in the glmnet function). Therefore, when you called glmnet(x, y), you are technically doing glmnet(x, y, intercept = T). Thus, even though your x matrix didn't have an intercept, one was fit for you.
If you want to predict a model with intercept, you have to fit a model with intercept. Your code used model matrix x <- as.matrix(Train[,c('x1','x2')]) which is intercept-free, therefore if you provide an intercept when using predict, you get an error.
You can do the following:
x <- model.matrix(y ~ ., Train) ## model matrix with intercept
Model_El <- glmnet(x,y)
Cv_El <- cv.glmnet(x,y)
Test_Matrix <- model.matrix(y ~ ., Train) ## prediction matrix with intercept
Pred_El <- predict(Model_El, newx = Test_Matrix, s = Cv_El$lambda.min, type='response')
Note, you don't have to do
model.matrix(~ . -y)
model.matrix will ignore the LHS of the formula, so it is legitimate to use
model.matrix(y ~ .)

Function for Logistic Regression Training Set

I am trying to create a function to test a logistic regression model developed on a training set.
For example
train <- filter(y, folds != i)
test <- filter(y, folds == i)
I want to be able to use the formula for different data sets.
For example, if I were to take y to be a response variable such as “low” in the birthwt data set and x to be the explanatory variables e.g. “age", “race” how would I implement these arguments into glm.train formula without having to type the function separately for different data sets ?
glm.train <- glm(y ~x, family = binomial, data = train)
You can use reformulate to create a formula based on strings:
x <- c("age", "race")
y <- "low"
form <- reformulate(x, response = y)
# low ~ age + race
Use this formula for glm:
glm.train <- glm(form, family = binomial, data = train)

Resources