I'm doing a bit of exploratory data analysis using HMDA data from the AER package; however, the variables that I used to fit the model seem to contain some observations that perfectly determine the outcomes, an issue known as "separation." So I tried to remedy this using the solution recommended by this thread, yet when I tried to execute the first set of source code from glm.fit(), R returned an error message:
Error in family$family : object of type 'closure' is not subsettable
so I could not proceed any further to remove those fully determined observations from my data with this code. I am wondering if anyone could help me fix this?
My current code is provided at below for your reference.
# load the AER package and HMDA data
library(AER)
data(HMDA)
# fit a 2-degree olynomial probit model
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA)
# using the revised source code from that stackexchage thread to find out observations that received a warning message
library(tidyverse)
library(dplyr)
library(broom)
eps <- 10 * .Machine$double.eps
if (family$family == "binomial") {
if (any(mu > 1 - eps) || any(mu < eps))
warning("glm.fit: fitted probabilities numerically 0 or 1 occurred",
call. = FALSE)
}
# this return the following error message
# Error in family$family : object of type 'closure' is not subsettable
probit.resids <- augment(probit.fit) %>%
mutate(p = 1 / (1 + exp(-.fitted)),
warning = p > 1-eps)
arrange(probit.resids, desc(.fitted)) %>%
select(2:5, p, warning) %>%
slice(1:10)
HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
# using HMDA.nwarning should solve the problem...
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA.nwarning)
This chunk of code
if (family$family == "binomial") {
if (any(mu > 1 - eps) || any(mu < eps))
warning("glm.fit: fitted probabilities numerically 0 or 1 occurred",
call. = FALSE)
}
there is a function, binomial() called when you run glm with family == "binomial". If you look under glm (just type glm):
if (is.character(family))
family <- get(family, mode = "function", envir = parent.frame())
if (is.function(family))
family <- family()
if (is.null(family$family)) {
print(family)
stop("'family' not recognized")
}
And the glm function checks binomial()$family during the fit, and if any of the predicted values differ from 1 or 0 by eps, it raises that warning.
You don't need to run that part, and yes, you need to set eps <- 10 * .Machine$double.eps . So let's run the code below, and if you run a probit, you need to specify link="probit" in binomial, otherwise the default is logit:
library(AER)
library(tidyverse)
library(dplyr)
library(broom)
data(HMDA)
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA)
eps <- 10 * .Machine$double.eps
probit.resids <- augment(probit.fit) %>%
mutate(p = 1 / (1 + exp(-.fitted)),
warning = p > 1-eps)
The column warning indicates if the observations raises a warning, in this dataset, there's one:
table(probit.resids$warning)
FALSE TRUE
2379 1
We can use the next step to filter it
HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
dim(HMDA.nwarning)
[1] 2379 14
And rerun the regression:
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA.nwarning)
coefficients(probit.fit)
(Intercept) poly(hirat, 2)1 poly(hirat, 2)2
-1.191292 8.708494 6.884404
I am having an error when trying to extract values form a mixed effect model using the 'effects' package. A reproducible example below.
#create database.
df<-NULL
df$col1<-as.numeric(c(4,18,10,41,30,40,47,30,41,18,11,16,47,26,20,10,11,11,
10,22,0,11,20,11,23,17,8,27,17,47,40,28,10,10,40,17,17,
20,17,11,28,41,10,20,37,22,31,18,17,30,16,11,27,20,10,10,
28,41,20,20,20,13,11,41,13,10,27,13,18,17,24,6,11,17,10,24,
17,10,22))
df$col2<-as.factor(c('location01','location01','location01','location01','location01','location01','location01','location01','location01','location01','location01','location01','location01',
'location01','location01','location03','location03','location03','location03','location03','location03','location03','location03','location05','location05','location05',
'location05','location05','location08','location08','location08','location08','location08','location08','location08','location08','location08','location17','location17',
'location17','location17','location20','location20','location20','location20','location20','location20','location20','location20','location20','location23','location23',
'location23','location23','location23','location23','location23','location23','location23','location26','location26','location26','location26','location26','location31',
'location31','location31','location31','location31','location31','location31','location31','location31','location34','location34','location34','location34','location34','location34'))
df$col3<-as.factor(c('a','a','a','a','c','c','c','c','c','b','b','b','b','b','b','a','a','a',
'a','a','a','a','a','c','c','c','c','c','a','a','a','c','c','c','b','b',
'b','c','c','c','c','a','a','a','a','a','b','b','b','b','a','a','a','a',
'c','c','c','c','c','b','b','b','b','b','c','c','c','c','b','b','b','b',
'b','a','a','a','b','b','b'))
df$col4<-as.factor(c('x','x','x','x','y','y','y','y','y','y','y','y','y','y','y','y','y',
'y','y','y','y','y','y','x','x','x','x','x','x','x','x','y','y',
'y','y','y','y','x','x','x','x','x','x','x','x','x','y','y','y',
'y','y','y','y','y','x','x','x','x','x','x','x','x','x','x','y',
'y','y','y','x','x','x','x','x','y','y','y','x','x','x'))
df<-as.data.frame(df)
#run two potential models
M1<-lme(col1 ~ col3*col4,random = ~ 1 | col2, data = df)
M2<-lme(col1 ~ col3*col4, random = ~ 1 | col2,data = df,weights = varIdent(form = ~ 1 | col3), method = "REML")
#both models work, and their results make sense. So now extract values with effects package
effsM1 <- effect("col3*col4", M1) #works!
effsM2 <- as.data.frame(effect("col3*col4", M2))# doesn't work
Error in model.frame.default(formula = col1 ~ col3 * col4, data = df,
: variable lengths differ (found for '(weights)')
I have a feeling that it is because I do not have the same number replicates for each treatment combination but this didn't used to give me any problems in the past.
I want to use the new bootMer() feature of the new lme4 package (the developer version currently). I am new to R and don't know which function should I write for its FUN argument. It says it needs a numerical vector, but I have no idea what that function will perform. So I have a mixed-model formula which is cast to the bootMer(), and have a number of replicates. So I don't know what that external function does? Is it supposed to be a template for bootstrapping methods? Aren't bootstrapping methods already implemented in he bootMer? So why they need an external "statistic of interest"? And which statistic of interest should I use?
Is the following syntax proper to work on? R keeps on error generating that the FUN must be a numerical vector. I don't know how to separate the estimates from the "fit" and even should I do that in the first place? I can just say I am lost with that "FUN" argument. Also I don't know should I pass the mixed-model glmer() formula using the variable "Mixed5" or should I pass some pointers and references? I see in the examples that X (the first argument of bootMer() is a *lmer() object. I wanted to write *Mixed5 but it rendered an error.
Many thanks.
My code is:
library(lme4)
library(boot)
(mixed5 <- glmer(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2
+ (1 | PatientID) + (0 + Trt | PatientID)
, family=binomial(logit), MixedModelData4))
FUN <- function(formula) {
fit <- glmer(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2
+ (1 | PatientID) + (0 + Trt | PatientID)
, family=binomial(logit), MixedModelData4)
return(coef(fit))
}
result <- bootMer(mixed5, FUN, nsim = 3, seed = NULL, use.u = FALSE,
type = c("parametric"),
verbose = T, .progress = "none", PBargs = list())
result
FUN
fit
And the error:
Error in bootMer(mixed5, FUN, nsim = 3, seed = NULL, use.u = FALSE, type = c("parametric"), :
bootMer currently only handles functions that return numeric vectors
-------------------------------------------------------- Update -----------------------------------------------------
I edited the code like what Ben instructed. The code ran very good but the SEs and Biases were all zero. Also do you know how to extract P values from this output (strange to me)? Should I use mixed() of afex package?
My revised code:
library(lme4)
library(boot)
(mixed5 <- glmer(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2
+ (0 + Trt | PatientID)
, family=binomial(logit), MixedModelData4))
FUN <- function(fit) {
fit <- glmer(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2
+ (1 | PatientID) + (0 + Trt | PatientID)
, family=binomial(logit), MixedModelData4)
return(fixef(fit))
}
result <- bootMer(mixed5, FUN, nsim = 3)
result
-------------------------------------------------------- Update 2 -----------------------------------------------------
I also tried the following but the code generated warnings and didn't give any result.
(mixed5 <- glmer(DV ~ Demo1 +Demo2 +Demo3 +Demo4 +Trt
+ (1 | PatientID) + (0 + Trt | PatientID)
, family=binomial(logit), MixedModelData4))
FUN <- function(mixed5) {
return(fixef(mixed5))}
result <- bootMer(mixed5, FUN, nsim = 2)
Warning message:
In bootMer(mixed5, FUN, nsim = 2) : some bootstrap runs failed (2/2)
> result
Call:
bootMer(x = mixed5, FUN = FUN, nsim = 2)
Bootstrap Statistics :
WARNING: All values of t1* are NA
WARNING: All values of t2* are NA
WARNING: All values of t3* are NA
WARNING: All values of t4* are NA
WARNING: All values of t5* are NA
WARNING: All values of t6* are NA
-------------------------------------------------------- Update 3 -----------------------------------------------------
This code as well generated warnings:
FUN <- function(fit) {
return(fixef(fit))}
result <- bootMer(mixed5, FUN, nsim = 2)
The warnings and results:
Warning message:
In bootMer(mixed5, FUN, nsim = 2) : some bootstrap runs failed (2/2)
> result
Call:
bootMer(x = mixed5, FUN = FUN, nsim = 2)
Bootstrap Statistics :
WARNING: All values of t1* are NA
WARNING: All values of t2* are NA
WARNING: All values of t3* are NA
WARNING: All values of t4* are NA
WARNING: All values of t5* are NA
WARNING: All values of t6* are NA
There are basically two (simple) confusions here.
The first is between coef() (which returns a list of matrices) and fixef() (which returns a vector of the fixed-effect
coefficients): I assume that fixef() is what you wanted, although you might want something like c(fixef(mixed),unlist(VarCorr(mixed))).
the second is that FUN should take a fitted model object as input ...
For example:
library(lme4)
library(boot)
mixed <- glmer(incidence/size ~ period + (1|herd),
weights=size, data=cbpp, family=binomial)
FUN <- function(fit) {
return(fixef(fit))
}
result <- bootMer(mixed, FUN, nsim = 3)
result
## Call:
## bootMer(x = mixed, FUN = FUN, nsim = 3)
## Bootstrap Statistics :
## original bias std. error
## t1* -1.398343 -0.20084060 0.09157886
## t2* -0.991925 0.02597136 0.18432336
## t3* -1.128216 -0.03456143 0.05967291
## t4* -1.579745 -0.08249495 0.38272580
##
This might be the same problem, that I reported as an issue here. At least it leads to the same, unhelpful error message and took me a while too.
That would mean you have missings in your data, which lmer ignores but which kill bootMer.
Try:
(mixed5 <- glmer(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2
+ (1 | PatientID) + (0 + Trt | PatientID)
, family=binomial(logit), na.omit(MixedModelData4[,c('DV','Demo1','Demo2','Demo3','Trt','PatientId')])))