Likelihood ratio test for a glmmTMB model?

I have a mixed model where I'm trying to find the significance of my random effect. It is a zero-inflated beta mixed model that I built using the R package glmmTMB, with the following call:
model <- glmmTMB(Overlap ~ Diff.Long + Diff.Bkp + DiffSeason + (1 | Xnumber),
                 ziformula = ~1, data = data, family = beta_family())
I'm trying to find the significance of the variable "Xnumber". I've read that what I need is a likelihood ratio test, but I don't know how to do this with a glmmTMB object. I've tried the Anova() function from the car package, but I don't think the output is giving me what I want:
Anova(model,type="II")
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: Overlap
Chisq Df Pr(>Chisq)
Diff.Long 5.0217 1 0.02503 *
Diff.Bkp 1.4717 1 0.22507
DiffSeason 7.5487 2 0.02295 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Any suggestions?

Since Xnumber is the grouping variable for which random effects are generated, it won't show up in the Anova() table: it has no fixed-effect coefficient to test. To test this term, you can leave it out (i.e., estimate a pooled model without the random effect) and then use lrtest() from the lmtest package to carry out the likelihood ratio (LR) test. The null hypothesis is that the pooled model is sufficient. Here's the example from the glmmTMB() help page:
library(glmmTMB)
library(lmtest)
m1 <- glmmTMB(count ~ mined + (1 | site),
              zi = ~mined,
              family = poisson, data = Salamanders)
m2 <- glmmTMB(count ~ mined,
              zi = ~mined,
              family = poisson, data = Salamanders)
lrtest(m1, m2)
#> Likelihood ratio test
#>
#> Model 1: count ~ mined + (1 | site)
#> Model 2: count ~ mined
#> #Df LogLik Df Chisq Pr(>Chisq)
#> 1 5 -949.23
#> 2 4 -958.96 -1 19.456 1.03e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2022-11-25 by the reprex package (v2.0.1)
Note that in this case, since the test is significant, the pooled model is insufficient and the random-effects model is preferred. (One caveat: because a variance component is tested at the boundary of its parameter space, the naive LR-test p-value is conservative, so a significant result like this one can be trusted.)
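Applied to the model from the question, the same comparison is a two-line change (a sketch only; "data" is assumed to be the questioner's data frame, and model.full/model.pooled are illustrative names):
model.full <- glmmTMB(Overlap ~ Diff.Long + Diff.Bkp + DiffSeason + (1 | Xnumber),
                      ziformula = ~1, data = data, family = beta_family())
model.pooled <- glmmTMB(Overlap ~ Diff.Long + Diff.Bkp + DiffSeason,
                        ziformula = ~1, data = data, family = beta_family())
lrtest(model.full, model.pooled)
anova(model.full, model.pooled)  # base anova() should give an equivalent LR test for nested glmmTMB fits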

Related

How to get the overall significance from an mblogit model (mclogit package)

I am modeling a multiple-choice response with the mclogit package (mblogit function) and I don't get an overall model significance in summary(). Running anova() to compare against a null model doesn't work either, and I'm not sure the chi-square test even applies here. I get this warning:
Warning in anova.mclogitlist(c(list(object), dotargs), dispersion = dispersion,
Results are unreliable, since deviances from quasi-likelihoods are not comparable.
Analysis of Deviance Table
Model 1: this_response ~ 1
Model 2: this_response ~ classLevel
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 7755 6530.4
2 7704 6058.6 51 471.82 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Here's my code:
m1 <- mclogit::mblogit(
  formula = this_response ~ classLevel,
  random = ~ 1 + classLevel | question,
  data = test_set_8_df)
summary(m1)  # -> significance for the coefficients, but not for the overall model
m0 <- mclogit::mblogit(
  formula = this_response ~ 1,
  data = test_set_8_df)
anova(m0, m1, test = "Chisq")  # -> produces the warning above

R code to test the difference between coefficients of regressors from one panel regression

I am trying to compare two estimates of the same coefficient from the same panel regression run over two different time periods, in order to confirm that their difference is statistically significant. Running my panel regression on observations from 2007-2009, I get an estimate of the coefficient of interest, which I want to compare with the estimate of the same coefficient obtained from the same panel model over 2010-2017.
Based on R code to test the difference between coefficients of regressors from one regression, I tried to compute a likelihood ratio test. In the linked discussion they use a simple linear equation. If I use the same R commands as described in that answer, I get results based on a chi-squared distribution, and I don't understand whether and how I can interpret them.
In r, I did the following:
linearHypothesis(reg.pannel.recession.fe, "Exp_Fri=0.311576")
where reg.pannel.recession.fe is the panel regression over 2007-2009, Exp_Fri is the coefficient I want to compare, and 0.311576 is the estimate of that coefficient over 2010-2017.
I get a chi-squared test in the linearHypothesis() output. How can I interpret that? Should I use another function, since these are plm objects?
Thank you very much for your help.
You get an F test in that example because, as stated in the vignette:
The method for "lm" objects calls the default method, but it changes the default test to "F" [...]
You can also set the test to "F" for plm objects, but basically linearHypothesis() works whenever the standard error of the coefficient can be estimated from the variance-covariance matrix, as the vignette also says:
The default method will work with any model object for which the coefficient vector can be retrieved by 'coef' and the coefficient-covariance matrix by 'vcov' (otherwise the argument 'vcov.' has to be set explicitly)
So using an example from the package:
library(car)  # provides linearHypothesis()
library(plm)
data(Grunfeld)
wi <- plm(inv ~ value + capital,
          data = Grunfeld, model = "within", effect = "twoways")
linearHypothesis(wi, "capital = 0.3", test = "F")
Linear hypothesis test
Hypothesis:
capital = 0.3
Model 1: restricted model
Model 2: inv ~ value + capital
Res.Df Df F Pr(>F)
1 170
2 169 1 6.4986 0.01169 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
linearHypothesis(wi,"capital=0.3")
Linear hypothesis test
Hypothesis:
capital = 0.3
Model 1: restricted model
Model 2: inv ~ value + capital
Res.Df Df Chisq Pr(>Chisq)
1 170
2 169 1 6.4986 0.0108 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
And you can also use a t-test:
tested_value <- 0.3
BETA <- coefficients(wi)["capital"]
SE <- coefficients(summary(wi))["capital", 2]
tstat <- (BETA - tested_value) / SE
pvalue <- as.numeric(2 * pt(-abs(tstat), wi$df.residual))  # abs() keeps the two-sided test correct for either sign of tstat
pvalue
[1] 0.01168515
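Note that, per the second vignette passage quoted above, you can also supply the covariance matrix explicitly through the vcov. argument. A sketch using plm's panel-robust estimator (the method name comes from the plm documentation; adapt to your setting):
linearHypothesis(wi, "capital = 0.3",
                 vcov. = vcovHC(wi, method = "arellano"))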

Why is the p-value from the analysis of deviance table different from the estimated with pchisq()?

I'm studying GLM models from the Lane (2002) paper and I am a bit confused by the analysis of deviance for the Gamma GLM model.
In the paper the reported p-value is P < 0.001, but if we use the reported deviance and degrees of freedom to calculate the p-value with the pchisq() function in R, we get the following:
> 1 - pchisq(11.1057, 7)
[1] 0.1340744
and not the P < 0.001 reported in the paper.
I've copied the data to replicate the GLM model (here's the link!) and this is the code I used to generate the results:
library(readr)  # for read_csv()
test <- read_csv("data/test_glm_gamma.csv", col_types = cols())
model.test <- glm(soil ~ trt, family = Gamma(link = "log"), data = test)
anova(model.test, test = "Chisq")
which returns:
Analysis of Deviance Table
Model: Gamma, link: log
Response: cont
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 23 11.5897
trt 7 11.106 16 0.4839 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
with deviances similar to those in the paper, and presumably a similar p-value, but not the 0.13 obtained before.
Is there some transformation applied before the p-value is calculated? Or am I calculating the p-value in the wrong way? How do they get the < 2.2e-16 in the deviance table?
Lane, P. W. (2002). Generalized linear models in soil science. European Journal of Soil Science, 53, 241–251. https://doi.org/10.1046/j.1365-2389.2002.00440.x
You need to pass the deviance scaled by the dispersion to pchisq:
p <- pchisq(anova(model.test)$Deviance[2] /
              summary(model.test)$dispersion,
            anova(model.test)$Df[2],
            lower.tail = FALSE)
p == anova(model.test, test = "Chisq")$`Pr(>Chi)`[2]
#[1] TRUE
You can study the code of stats:::stat.anova to see how the p-values are calculated for the different tests.
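For intuition about where that dispersion comes from: for the Gamma family, summary.glm() estimates it as the Pearson chi-square statistic divided by the residual degrees of freedom. A sketch you can verify against your own fit (the two values should agree):
phi <- sum(residuals(model.test, type = "pearson")^2) / model.test$df.residual
all.equal(phi, summary(model.test)$dispersion)  # expected: TRUE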

Interpreting output of analysis of deviance table from anova() model comparison

I have a large multivariate abundance dataset and I am interested in comparing multiple models that fit different combinations of three categorical predictor variables to my species-matrix response variable. I have been using anova() to compare the models, but I am having difficulty interpreting the output. Below is my code along with the corresponding R output.
library(mvabund)
invert.mvabund <- mvabund(mva.dat)
null <- manyglm(mva.dat ~ 1, family = "negative.binomial")
m1 <- manyglm(mva.dat ~ Habitat + Detritus, family = "negative.binomial")
m2 <- manyglm(mva.dat ~ Habitat * Detritus, family = "negative.binomial")
m3 <- manyglm(mva.dat ~ Habitat * Detritus + Block, family = "negative.binomial")
anova(null, m1, m2, m3)
Analysis of Deviance Table
null: mva.dat ~ 1
m1: mva.dat ~ Habitat + Detritus
m2: mva.dat ~ Habitat * Detritus
m3: mva.dat ~ Habitat * Detritus + Block
Multivariate test:
Res.Df Df.diff Dev Pr(>Dev)
null 99
m1 94 5 257.2 0.001 ***
m2 90 4 87.7 0.003 **
m3 81 9 173.5 0.003 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How do I interpret these results? Is m2 the best-fitting model because it has the lowest deviance, even though it has a higher p-value than m1? Or does the p-value indicate a significant amount of deviance explained, so that the optimal model would have a higher p-value? Any suggestions on how to interpret these results would be much appreciated; I haven't been able to find a clear answer in my searches. Thanks!

Why does step() return weird results from backward elimination of the full model using lmerTest

I am confused about why the results from running step(model) in lmerTest look abnormal.
library(lmerTest)
m0 <- lmer(seed ~ connection * age + (1 | unit), data = test)
step(m0)
Note: both "connection" and "age" have been converted to factors with as.factor().
Random effects:
Chi.sq Chi.DF elim.num p.value
unit 0.25 1 1 0.6194
Fixed effects:
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
connection 1 0.01746 0.017457 1.5214 0.22142
age 1 0.07664 0.076643 6.6794 0.01178 *
connection:age 1 0.04397 0.043967 3.8317 0.05417 .
Residuals 72 0.82617 0.011475
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Least squares means:
Estimate Standard Error DF t-value Lower CI Upper CI p-value
Final model:
Call:
lm(formula = fo, data = mm, contrasts = l.lmerTest.private.contrast)
Coefficients:
(Intercept) connectionD ageB connectionD:ageB
-0.84868 -0.07852 0.01281 0.09634
Why does it not show me the final model?
The random effect was eliminated as non-significant (NS) according to the LR test. The anova() method was then applied to the remaining fixed-effects model, an "lm" object, and no elimination of NS fixed effects was done. You are right that the output differs from that for "lmer" objects, and there are no (differences of) least squares means; to get those you may try the lsmeans package. For backward elimination of NS effects from the final model you may use the stats::step() function.
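A sketch of that suggested follow-up (assuming "test" is the questioner's data frame and "fm" is an illustrative name; lsmeans has since been superseded by emmeans, so treat the last line as optional):
fm <- lm(seed ~ connection * age, data = test)  # fixed-effects model after the random effect was dropped
stats::step(fm)                                 # AIC-based backward elimination of fixed effects
lsmeans::lsmeans(fm, ~ connection * age)        # least squares means for the factor combinations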
