Is there a way to get the null deviance and df for a generalized linear mixed model fit with glmer()? Is there a reason that this is not included in the summary() output, the way that it is with a glm() object?
You can compute the null deviance by re-fitting the model with an intercept term only, e.g.
library(lme4)  ## for glmer() and the cbpp data
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
gm0 <- update(gm1, . ~ 1 + (1|herd))
deviance(gm1) ## 73.47
deviance(gm0) ## 92.42 (null deviance)
I'm not sure what you mean by the "null df" for the GLMM; the 'denominator degree of freedom' measure of effective sample size that works perfectly for balanced ANOVAs and questionably for linear mixed models [via inclusion/exclusion, Satterthwaite, Kenward-Roger, etc.] is hard to define for GLMMs.
I can think of a couple of reasons that lme4 doesn't automatically do this computation for you:
it could be an expensive re-fit (even for GLMs it does require refitting the model, see here for the code in glm that does it)
it's less obvious for GLMMs what the appropriate null model for comparison is. Do you remove both random and fixed effects and reduce the model to a GLM? Do you keep all of the random effects, or only intercept-level random effects, or some other mixture depending on the context of the question? Making the user do it themselves forces them to make this choice.
(That said, I don't believe that omitting the null deviance was an explicit choice.)
If you do choose to discard all of the random effects (i.e., comparing to deviance(glm(cbind(incidence, size - incidence) ~ period, data = cbpp, family = binomial)) in the example above), you should be able to do a meaningful comparison with a glmer fit, but there are some subtleties: you might want to read the section on "Deviance and log-likelihood of GLMMs" in ?deviance.merMod.
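For concreteness, a minimal sketch of that fixed-effects-only comparison (this just pairs the glmer deviances above with those of the corresponding GLM):
## GLM with the same fixed effects but no random effects
glm1 <- glm(cbind(incidence, size - incidence) ~ period,
            data = cbpp, family = binomial)
glm0 <- update(glm1, . ~ 1)   ## GLM null model (intercept only)
deviance(glm1)     ## GLM residual deviance
deviance(glm0)     ## GLM null deviance (also glm1$null.deviance)
df.residual(glm0)  ## the "null df" in glm() terms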
I am building a model using the mgcv package in r. The data has serial measures (data collected during scans 15 minutes apart in time, but discontinuously, e.g. there might be 5 consecutive scans on one day, and then none until the next day, etc.). The model has a binomial response, a random effect of day, a fixed effect, and three smooth effects. My understanding is that REML is the best fitting method for binomial models, but that this method cannot be specified using the gamm function for a binomial model. Thus, I am using the gam function, to allow for the use of REML fitting. When I fit the model, I am left with residual autocorrelation at a lag of 2 (i.e. at 30 minutes), assessed using ACF and PACF plots.
So, we wanted to include an autocorrelation structure in the model, but my understanding is that only the gamm function and not the gam function allows for the inclusion of such structures. I am wondering if there is anything I am missing and/or if there is a way to deal with autocorrelation with a binomial response variable in a GAMM built in mgcv.
My current model structure looks like:
gam(Response ~
s(Day, bs = "re") +
s(SmoothVar1, bs = "cs") +
s(SmoothVar2, bs = "cs") +
s(SmoothVar3, bs = "cs") +
as.factor(FixedVar),
family=binomial(link="logit"), method = "REML",
data = dat)
I tried thinning my data (using only every 3rd data point from consecutive scans), but found this overly restrictive to allow effects to be detected due to my relatively small sample size (only 42 data points left after thinning).
I also tried using the prior value of the binomial response variable as a factor in the model to account for the autocorrelation. This did appear to resolve the residual autocorrelation (based on the updated ACF/PACF plots), but it doesn't feel like the most elegant way to do so and I worry this added variable might be adjusting for more than just the autocorrelation (though it was not collinear with the other explanatory variables; VIF < 2).
I would use bam() for this. You don't need to have big data to fit a model with bam(); you just lose some of the guarantees about convergence that you get with gam(). bam() will fit a GEE-like model with an AR(1) working correlation matrix, but you need to specify the AR parameter via rho. For non-Gaussian families this only works if you also set discrete = TRUE when fitting the model.
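A minimal sketch of what that might look like with the model from the question (the ar_start construction and the rho value are assumptions; rho is usually read off the lag-1 residual ACF of a rho = 0 fit, and dat is assumed to be ordered by Day and scan time):
library(mgcv)
## TRUE at the first scan of each day, so the AR(1) process restarts there
dat$ar_start <- !duplicated(dat$Day)
m_bam <- bam(Response ~
               s(Day, bs = "re") +
               s(SmoothVar1, bs = "cs") +
               s(SmoothVar2, bs = "cs") +
               s(SmoothVar3, bs = "cs") +
               as.factor(FixedVar),
             family = binomial(link = "logit"),
             rho = 0.4,               ## assumed AR(1) coefficient
             AR.start = dat$ar_start,
             discrete = TRUE,         ## needed for AR(1) with non-Gaussian families
             data = dat)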
You could use gamm() with family = binomial(), but this uses PQL to estimate the GLMM version of the GAMM, and if your binomial counts are low this method isn't very good.
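For completeness, a sketch of what the gamm()/PQL route might look like with an explicit AR(1) structure (the within-day ordering variable ScanTime is an assumption, and the random intercept for Day stands in for the s(Day, bs = "re") term):
library(mgcv)  ## gamm(); nlme (for corAR1) is attached with it
## Day should be a factor; ScanTime is an assumed ordering variable within each day
m_pql <- gamm(Response ~
                s(SmoothVar1, bs = "cs") +
                s(SmoothVar2, bs = "cs") +
                s(SmoothVar3, bs = "cs") +
                as.factor(FixedVar),
              random = list(Day = ~ 1),
              correlation = corAR1(form = ~ ScanTime | Day),
              family = binomial(link = "logit"),
              data = dat)
summary(m_pql$gam)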
I've fit two models, one with gam and another with gamm.
gam(y ~ x, family= betar)
gamm(y ~ x)
So the only difference is the distributional assumption. I use betar with gam and normal with gamm.
I would like to compare these two models, but I am guessing AIC will not work since the two models are based on different methods? Is there then some other suitable estimate I can use for comparison? I know I could just fit the second with gam, but let's ignore that for the sake of this question.
AIC is independent of the type of model used as long as y is exactly the same response being predicted; it is just the deviance (-2 log-likelihood) penalised by the number of fitted parameters.
However, depending on the goal of your model (if you want to use it for prediction, for instance), you should use validation to compare model performance; 10-fold cross-validation would be a good choice.
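A rough sketch of that comparison by 10-fold cross-validation (assumes a data frame df with the response y strictly in (0, 1), as betar requires; both models are fit with gam() here for simplicity, as the question itself notes is possible for the Gaussian model):
library(mgcv)
set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(df)))
cv_rmse <- sapply(1:10, function(k) {
  train <- df[folds != k, ]
  test  <- df[folds == k, ]
  m_beta  <- gam(y ~ x, family = betar(link = "logit"), data = train)  ## beta regression
  m_gauss <- gam(y ~ x, data = train)                                  ## Gaussian
  c(beta  = sqrt(mean((test$y - predict(m_beta,  test, type = "response"))^2)),
    gauss = sqrt(mean((test$y - predict(m_gauss, test, type = "response"))^2)))
})
rowMeans(cv_rmse)  ## average out-of-sample RMSE for each model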
I want to use a mixed model without a random intercept but with a correlation structure. The reason is to get the AIC to help choose the best correlation structure (e.g., autoregressive versus compound symmetry). So it is essentially a GEE, but GEEs don't allow estimation of the AIC. They are also called covariance pattern models.
The code below simulates random data with a compound symmetry correlation. The model fits both a random intercept and a variance-covariance matrix. Is there any way to switch off the random intercept?
library(MASS)
library(nlme)
Sigma = toeplitz(c(1,0.5,0.5,0.5))
data = data.frame(mvrnorm(n=10, mu=1:4, Sigma=Sigma))
data$id = 1:nrow(data)
long = reshape(data, direction='long', varying=list(1:4), v.names='Y')
cs = corCompSymm(0.5, form = ~ 1 | id)
model = lme(Y ~ time, random = list(~ 1 | id), data = long, correlation = cs)
summary(model)
If you are solely interested in comparing correlation structures, then I am pretty sure your goal could be served by a generalized least squares model fit with gls:
model = gls(Y~time, data=long, correlation=cs)
summary(model)
AIC(model)
Otherwise, a linear mixed effects model fit with lme must have random effects specified.
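Since the point of dropping the random intercept was to compare correlation structures by AIC, here is a small sketch of that comparison with gls(), reusing the simulated long data from the question (REML is the default, which is fine when only the correlation structure differs):
m_cs  <- gls(Y ~ time, data = long,
             correlation = corCompSymm(form = ~ 1 | id))
m_ar1 <- gls(Y ~ time, data = long,
             correlation = corAR1(form = ~ 1 | id))
AIC(m_cs, m_ar1)  ## lower AIC indicates the better-fitting structure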
I am trying to implement the use of conditional inference trees (from package partykit) as induction trees, whose purpose is merely describing rather than predicting individual cases. According to Ritschard here, here and there, for example, a measure of deviance can be estimated by cross-tabulating the observed and estimated distributions of the response variable against the possible predictor-based profiles, in the so-called T and T̂ tables.
I would like to use deviance and other derived statistics as goodness-of-fit (GOF) measures for objects produced by the ctree() function. I am introducing myself to this topic, and I would very much appreciate some input, such as a piece of R code or some orientation about the structure of ctree objects that could be involved in the coding.
I have thought that I could obtain both the target and predicted tables from scratch and then compute the deviance formula, but I confess I am not at all confident about how to proceed.
Thanks a lot in advance!
Some background information first: We have discussed adding deviance() or logLik() methods for ctree objects. So far we haven't done so because conditional inference trees are not associated with a particular loss function or even likelihood. Instead, only the associations between response and partitioning variables are assessed by means of conditional inference tests using certain influence and regressor transformations. However, for the default regression and classification case, measures of deviance or log-likelihood can be a useful addition in practice. So maybe we will add these methods in future versions.
If you want to consider trees associated with a formal deviance/likelihood, you may consider using the general mob() framework or the lmtree() and glmtree() convenience functions. If only partitioning variables are specified (and no further regressors to be used in every node), these often lead to very similar trees compared to ctree(). But then you can also use AIC() etc.
But to come back to your original question: You can compute deviance/log-likelihood or other loss functions fairly easily if you look at the model response and the fitted response. Alternatively, you can extract a factor variable that indicates the terminal nodes and refit a linear or multinomial model. This will have the same fitted values but also supply deviance() and logLik(). Below, I illustrate this with the airct and irisct trees that you obtain when running example("ctree", package = "partykit").
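To reproduce the objects used below, running the package example creates airq and airct (a regression tree for Ozone) as well as irisct (a classification tree for Species):
library("partykit")
example("ctree", package = "partykit")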
Regression: The Gaussian deviance is simply the residual sum of squares:
sum((airq$Ozone - predict(airct, newdata = airq, type = "response"))^2)
## [1] 46825.35
The same can be obtained by re-fitting as a linear regression model:
airq$node <- factor(predict(airct, newdata = airq, type = "node"))
airlm <- lm(Ozone ~ node, data = airq)
deviance(airlm)
## [1] 46825.35
logLik(airlm)
## 'log Lik.' -512.6311 (df=6)
Classification: The log-likelihood is simply the sum of the predicted log-probabilities at the observed classes. And the deviance is -2 times the log-likelihood:
irisprob <- predict(irisct, type = "prob")
sum(log(irisprob[cbind(1:nrow(iris), iris$Species)]))
## [1] -15.18056
-2 * sum(log(irisprob[cbind(1:nrow(iris), iris$Species)]))
## [1] 30.36112
Again, this can also be obtained by re-fitting as a multinomial model:
library("nnet")
iris$node <- factor(predict(irisct, newdata = iris, type = "node"))
irismultinom <- multinom(Species ~ node, data = iris, trace = FALSE)
deviance(irismultinom)
## [1] 30.36321
logLik(irismultinom)
## 'log Lik.' -15.1816 (df=8)
See also the discussion in https://stats.stackexchange.com/questions/6581/what-is-deviance-specifically-in-cart-rpart for the connections between regression and classification trees and generalized linear models.
I am currently testing whether I should include certain random effects in my lmer model or not. I use the anova() function for that. My procedure so far is to fit the model with a call to lmer() with REML=TRUE (the default option). Then I call anova() on two models, one of which includes the random effect to be tested for and one of which does not. However, it is well known that the anova() function refits the models with ML; in newer versions of anova() you can prevent this by setting refit=FALSE. In order to test for random effects, should I set refit=FALSE in my call to anova() or not? (If I do set refit=FALSE, the p-values tend to be lower. Are the p-values anti-conservative when I set refit=FALSE?)
Method 1:
mod0_reml <- lmer(x ~ y + z + (1 | w), data=dat)
mod1_reml <- lmer(x ~ y + z + (y | w), data=dat)
anova(mod0_reml, mod1_reml)
This will result in anova() refitting the models with ML instead of REML. (Newer versions of anova() will also print a message saying so.)
Method 2:
mod0_reml <- lmer(x ~ y + z + (1 | w), data=dat)
mod1_reml <- lmer(x ~ y + z + (y | w), data=dat)
anova(mod0_reml, mod1_reml, refit=FALSE)
This will result in anova() performing its calculations on the original models, i.e. with REML=TRUE.
Which of the two methods is correct in order to test whether I should include a random effect or not?
Thanks for any help!
In general I would say that it would be appropriate to use refit=FALSE in this case, but let's go ahead and try a simulation experiment.
First fit a model without a random slope to the sleepstudy data set, then simulate data from this model:
library(lme4)
mod0 <- lmer(Reaction ~ Days + (1|Subject), data=sleepstudy)
## also fit the full model for later use
mod1 <- lmer(Reaction ~ Days + (Days|Subject), data=sleepstudy)
set.seed(101)
simdat <- simulate(mod0, nsim = 1000)
Now refit the null data with the full and the reduced model, and save the distribution of p-values generated by anova() with and without refit=FALSE. This is essentially a parametric bootstrap test of the null hypothesis; we want to see if it has the appropriate characteristics (i.e., uniform distribution of p-values).
sumfun <- function(x) {
    ## refit both models to one simulated response vector x
    m0 <- refit(mod0, x)
    m1 <- refit(mod1, x)
    ## p-value with ML refitting (the default) and without it (refit = FALSE)
    a_refit    <- suppressMessages(anova(m0, m1)["m1", "Pr(>Chisq)"])
    a_no_refit <- anova(m0, m1, refit = FALSE)["m1", "Pr(>Chisq)"]
    c(refit = a_refit, no_refit = a_no_refit)
}
I like plyr::laply for its convenience, although you could just as easily use a for loop or one of the other *apply approaches.
library(plyr)
pdist <- laply(simdat,sumfun,.progress="text")
library(ggplot2); theme_set(theme_bw())
library(reshape2)
ggplot(melt(pdist),aes(x=value,fill=Var2))+
geom_histogram(aes(y=..density..),
alpha=0.5,position="identity",binwidth=0.02)+
geom_hline(yintercept=1,lty=2)
ggsave("nullhist.png",height=4,width=5)
Type I error rate for alpha=0.05:
colMeans(pdist<0.05)
## refit no_refit
## 0.021 0.026
You can see that in this case the two procedures give practically the same answer, and both are strongly conservative, for well-known reasons having to do with the fact that the null value of the hypothesis test lies on the boundary of its feasible space. For the specific case of testing a single simple random effect, halving the p-value gives an appropriate answer (see Pinheiro and Bates 2000 and others). That actually appears to give reasonable answers here too, although it is not really justified, because here we are dropping two random-effects parameters (the random slope and the correlation between the slope and intercept random effects):
colMeans(pdist/2<0.05)
## refit no_refit
## 0.051 0.055
Other points:
You might be able to do a similar exercise with the PBmodcomp function from the pbkrtest package.
The RLRsim package is designed precisely for fast randomization (parametric bootstrap) tests of null hypotheses about random-effects terms, but it doesn't appear to work in this slightly more complex situation
see the relevant GLMM faq section for similar information, including arguments for why you might not want to test the significance of random effects at all ...
for extra credit you could redo the parametric bootstrap runs using the deviance (-2 log-likelihood) differences rather than the p-values as output and check whether the results conform to a mixture of a chi^2_0 (point mass at 0) and a chi^2_n distribution (where n is probably 2, but I wouldn't be sure for this geometry); a rough sketch of this check is given below
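A rough sketch of that last check, reusing simdat and the refit() approach from sumfun() above (the 50:50 mixture weights and df = 2 are the assumptions mentioned in the bullet):
## deviance (-2 restricted log-likelihood) differences, refit = FALSE as above
devdiff <- laply(simdat, function(x) {
    m0 <- refit(mod0, x)
    m1 <- refit(mod1, x)
    anova(m0, m1, refit = FALSE)$Chisq[2]
}, .progress = "text")
mean(devdiff < 1e-6)  ## observed point mass at zero (the chi^2_0 component)
plot(ecdf(devdiff), main = "simulated deviance differences")
curve(0.5 + 0.5 * pchisq(x, df = 2), add = TRUE, col = "red")  ## assumed mixture CDF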