I want to calculate confidence intervals (CIs) for two mixed models: a zero-inflated negative binomial model and a hurdle model. My code for the hurdle model looks like this (x1 and x2 are continuous, x3 is categorical):
m1 <- glmmTMB(count ~ x1 + x2 + x3 + (1 | year/class),
              data = bd, ziformula = ~ x2 + x3 + (1 | year/class),
              family = truncated_nbinom2)
I used confint, and I got these results:
ci <- confint(m1,parm="beta_")
ci
                                           2.5 %       97.5 %     Estimate
cond.(Intercept)                    1.816255e-01  0.448860094  0.285524861
cond.x1                             9.045278e-01  0.972083366  0.937697401
cond.x2                             1.505770e+01 26.817439186 20.094998772
cond.x3high                         1.190972e+00  1.492335046  1.333164894
cond.x3low                          1.028147e+00  1.215828654  1.118056377
cond.x3reg                          1.135515e+00  1.385833853  1.254445909
class:year.cond.Std.Dev.(Intercept) 2.256324e+00  2.662976154  2.441845815
year.cond.Std.Dev.(Intercept)       1.051889e+00  1.523719169  1.157153015
zi.(Intercept)                      1.234418e-04  0.001309705  0.000402085
zi.x2                               2.868578e-02  0.166378014  0.069084606
zi.x3high                           8.972025e-01  1.805832900  1.272869874
Am I calculating the intervals correctly? Why is there only one category in x3 for zi?
I would also like to know whether it's possible to plot these CIs.
Thanks!
Data looks like this:
class id year count x1 x2 x3
956 5 3002 2002 3 15.6 47.9 high
957 5 4004 2002 3 14.3 47.9 low
958 5 6021 2002 3 14.2 47.9 high
959 4 2030 2002 3 10.5 46.3 high
960 4 2031 2002 3 15.3 46.3 high
961 4 2034 2002 3 15.2 46.3 reg
where x1 and x2 are continuous and x3 is a three-level categorical variable (factor).
Summary of the model:
summary(m1)
'giveCsparse' has been deprecated; setting 'repr = "T"' for you
'giveCsparse' has been deprecated; setting 'repr = "T"' for you
'giveCsparse' has been deprecated; setting 'repr = "T"' for you
Family: truncated_nbinom2 ( log )
Formula: count ~ x1 + x2 + x3 + (1 | year/class)
Zero inflation: ~x2 + x3 + (1 | year/class)
Data: bd
AIC BIC logLik deviance df.resid
37359.7 37479.7 -18663.8 37327.7 13323
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
class:year (Intercept) 0.79701 0.8928
year (Intercept) 0.02131 0.1460
Number of obs: 13339, groups: class:year, 345; year, 15
Zero-inflation model:
Groups Name Variance Std.Dev.
dpto:year (Intercept) 1.024e+02 1.012e+01
year (Intercept) 7.842e-07 8.856e-04
Number of obs: 13339, groups: class:year, 345; year, 15
Overdispersion parameter for truncated_nbinom2 family (): 1.02
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.25343 0.23081 -5.431 5.62e-08 ***
x1 -0.06433 0.01837 -3.501 0.000464 ***
x2 3.00047 0.14724 20.378 < 2e-16 ***
x3high 0.28756 0.05755 4.997 5.82e-07 ***
x3low 0.11159 0.04277 2.609 0.009083 **
x3reg 0.22669 0.05082 4.461 8.17e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.8188 0.6025 -12.977 < 2e-16 ***
x2 -2.6724 0.4484 -5.959 2.53e-09 ***
x3high 0.2413 0.1784 1.352 0.17635
x3low -0.1325 0.1134 -1.169 0.24258
x3reg -0.3806 0.1436 -2.651 0.00802 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
CI with broom.mixed
> broom.mixed::tidy(m1, effects="fixed", conf.int=TRUE)
# A tibble: 12 x 9
effect component term estimate std.error statistic p.value conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed cond (Intercept) -1.25 0.231 -5.43 5.62e- 8 -1.71 -0.801
2 fixed cond x1 -0.0643 0.0184 -3.50 4.64e- 4 -0.100 -0.0283
3 fixed cond x2 3.00 0.147 20.4 2.60e-92 2.71 3.29
4 fixed cond x3high 0.288 0.0575 5.00 5.82e- 7 0.175 0.400
5 fixed cond x3low 0.112 0.0428 2.61 9.08e- 3 0.0278 0.195
6 fixed cond x3reg 0.227 0.0508 4.46 8.17e- 6 0.127 0.326
7 fixed zi (Intercept) -9.88 1.32 -7.49 7.04e-14 -12.5 -7.30
8 fixed zi x1 0.214 0.120 1.79 7.38e- 2 -0.0206 0.448
9 fixed zi x2 -2.69 0.449 -6.00 2.01e- 9 -3.57 -1.81
10 fixed zi x3high 0.232 0.178 1.30 1.93e- 1 -0.117 0.582
11 fixed zi x3low -0.135 0.113 -1.19 2.36e- 1 -0.357 0.0878
12 fixed zi x3reg -0.382 0.144 -2.66 7.74e- 3 -0.664 -0.101
tl;dr: as far as I can tell, this is a bug in confint.glmmTMB (and probably in the internal function glmmTMB:::getParms). In the meantime, broom.mixed::tidy(m1, effects="fixed", conf.int=TRUE) should do what you want. (A fix is now in progress in the development version on GitHub and should make it to CRAN sometime soon.)
Reproducible example:
set up data
set.seed(101)
n <- 1e3
bd <- data.frame(
year=factor(sample(2002:2018, size=n, replace=TRUE)),
class=factor(sample(1:20, size=n, replace=TRUE)),
x1 = rnorm(n),
x2 = rnorm(n),
x3 = factor(sample(c("low","reg","high"), size=n, replace=TRUE),
levels=c("low","reg","high")),
count = rnbinom(n, mu = 3, size=1))
fit
library(glmmTMB)
m1 <- glmmTMB(count ~ x1 + x2 + x3 + (1 | year/class),
              data = bd, ziformula = ~ x2 + x3 + (1 | year/class),
              family = truncated_nbinom2)
confidence intervals
confint(m1, "beta_") ## wrong/ incomplete
broom.mixed::tidy(m1, effects="fixed", conf.int=TRUE) ## correct
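On the plotting question: one simple option is a dot-and-whisker plot in base graphics. The sketch below uses the conditional-model estimates and limits typed in by hand from the tidy() output shown in the question, so the data frame `ci` is only an illustration (it is not something a package returns). With a real fit you would use the columns of the tidy() result instead.

```r
# Conditional-model estimates and Wald limits, copied by hand from the
# tidy() output in the question (link scale)
ci <- data.frame(
  term      = c("(Intercept)", "x1", "x2", "x3high", "x3low", "x3reg"),
  estimate  = c(-1.25, -0.0643, 3.00, 0.288, 0.112, 0.227),
  conf.low  = c(-1.71, -0.100, 2.71, 0.175, 0.0278, 0.127),
  conf.high = c(-0.801, -0.0283, 3.29, 0.400, 0.195, 0.326)
)

pos <- rev(seq_len(nrow(ci)))            # put the first term at the top
op <- par(mar = c(4, 7, 1, 1))           # wider left margin for the labels
plot(ci$estimate, pos,
     xlim = range(ci$conf.low, ci$conf.high),
     pch = 19, yaxt = "n", ylab = "",
     xlab = "Estimate (link scale) with 95% Wald CI")
segments(ci$conf.low, pos, ci$conf.high, pos)    # one whisker per term
axis(2, at = pos, labels = ci$term, las = 1)     # term names on the y axis
abline(v = 0, lty = 2)                           # reference line at zero
par(op)
```

The zero-inflation terms could be added as extra rows, or drawn in a second panel, since the two components are on different link scales.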
You may want to think about which kind of confidence intervals you want:
Wald CIs (default) are much faster to compute and are generally OK as long as (1) your data set is large and (2) you aren't estimating any parameters on the log/logit scale that are near the boundaries
likelihood profile CIs are more accurate but much slower
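To make the Wald option concrete: a Wald interval is just the estimate plus or minus a normal quantile times the standard error, computed on the link scale. Here is a minimal base-R sketch using the conditional intercept and its standard error from summary(m1) in the question. The helper name `wald_ci` is made up for this example.

```r
# Wald interval: estimate +/- z * SE
# (wald_ci is a hypothetical helper, not part of glmmTMB or broom.mixed)
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, estimate = est, upper = est + z * se)
}

# Conditional intercept and standard error, from summary(m1) in the question
ci_link <- wald_ci(est = -1.25343, se = 0.23081)
ci_link        # on the link (log) scale
exp(ci_link)   # back-transformed
```

The link-scale limits agree with the conf.low and conf.high columns that tidy() reports for the conditional intercept. The back-transformed limits come out at about 0.18 and 0.45, which is what the cond.(Intercept) row of the confint() output in the question shows, so those values appear to have been exponentiated. For profile intervals, see the method argument documented in ?confint.glmmTMB.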
I have a 2x2 factorial design: control vs. enriched, and strain1 vs. strain2. I fitted a linear mixed model as follows:
anova(lmer(length ~ Strain + Insect + Strain:Insect + BW_final + (1|Pen), data = mydata))
Here length is one of the dependent variables I want to analyse, Strain and Insect are the treatments, Strain:Insect is their interaction, BW_final is a covariate, and Pen is a random effect.
As output I get this:
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Strain 3.274 3.274 1 65 0.1215 0.7285
Insect 14.452 14.452 1 65 0.5365 0.4665
BW_final 45.143 45.143 1 65 1.6757 0.2001
Strain:Insect 52.813 52.813 1 65 1.9604 0.1662
As you can see, I only get 1 interaction term: Strain:Insect. However, I'd like to see 4 interaction terms: Strain1:Control, Strain1:Enriched, Strain2:Control, Strain2:Enriched.
Is there any way to do this in R?
Using summary instead of anova I get:
> summary(linearmer)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: length ~ Strain + Insect + Strain:Insect + BW_final + (1 | Pen)
Data: mydata_young
REML criterion at convergence: 424.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.95735 -0.52107 0.07014 0.43928 2.13383
Random effects:
Groups Name Variance Std.Dev.
Pen (Intercept) 0.00 0.00
Residual 26.94 5.19
Number of obs: 70, groups: Pen, 27
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 101.646129 7.530496 65.000000 13.498 <2e-16 ***
StrainRoss 0.648688 1.860745 65.000000 0.349 0.729
Insect 0.822454 2.062696 65.000000 0.399 0.691
BW_final -0.005188 0.004008 65.000000 -1.294 0.200
StrainRoss:Insect -3.608430 2.577182 65.000000 -1.400 0.166
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) StrnRs Insect BW_fnl
StrainRoss 0.253
Insect -0.275 0.375
BW_final -0.985 -0.378 0.169
StrnRss:Ins 0.071 -0.625 -0.775 0.016
convergence code: 0
boundary (singular) fit: see ?isSingular
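A note on why only one interaction row appears: with two two-level factors the interaction has a single degree of freedom. What the question seems to be after is the four cell means, one per treatment combination. A rough base-R sketch of that idea follows, on simulated data and with plain lm(), so it ignores the BW_final covariate and the Pen random effect. The data frame `dat`, the level names and the column `cell` are all made up for illustration.

```r
set.seed(1)

# Simulated 2x2 data (hypothetical values, 10 replicates per cell)
dat <- expand.grid(Strain = c("strain1", "strain2"),
                   Insect = c("control", "enriched"),
                   rep    = 1:10)
dat$Strain <- factor(dat$Strain)
dat$Insect <- factor(dat$Insect)
dat$length <- rnorm(nrow(dat), mean = 100, sd = 5)

# One factor with four levels, one per treatment combination
dat$cell <- interaction(dat$Strain, dat$Insect, sep = ":")

# Dropping the intercept gives one coefficient (the cell mean) per combination
fit_cells <- lm(length ~ 0 + cell, data = dat)
coef(fit_cells)

# Same fitted values as the usual main-effects-plus-interaction coding
fit_int <- lm(length ~ Strain * Insect, data = dat)
all.equal(fitted(fit_cells), fitted(fit_int))
```

For the lmer fit itself, estimated marginal means (for example emmeans::emmeans(fit, ~ Strain * Insect)) are one way to get the same four quantities with the covariate held at its mean. Note that Insect shows up in the summary above as a single numeric-looking term, so it would probably need to be converted to a factor first.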
I am trying to predict and graph models with species presence as the response, but I've run into a problem: the ggpredict outputs for glmer and glmmTMB fits to the same data are wildly different, even though the estimates and AIC are very similar. The models below are simplified to include only date (which has been centered and scaled), which seems to be the most problematic term to predict.
yntest<- glmer(MYOSOD.P~ jdate.z + I(jdate.z^2) + I(jdate.z^3) +
(1|area/SiteID), family = binomial, data = sodpYN)
> summary(yntest)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) + (1 | area/SiteID)
Data: sodpYN
AIC BIC logLik deviance df.resid
1260.8 1295.1 -624.4 1248.8 2246
Scaled residuals:
Min 1Q Median 3Q Max
-2.0997 -0.3218 -0.2013 -0.1238 9.4445
Random effects:
Groups Name Variance Std.Dev.
SiteID:area (Intercept) 1.6452 1.2827
area (Intercept) 0.6242 0.7901
Number of obs: 2252, groups: SiteID:area, 27; area, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96778 0.39190 -7.573 3.65e-14 ***
jdate.z -0.72258 0.17915 -4.033 5.50e-05 ***
I(jdate.z^2) 0.10091 0.08068 1.251 0.21102
I(jdate.z^3) 0.25025 0.08506 2.942 0.00326 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) jdat.z I(.^2)
jdate.z 0.078
I(jdat.z^2) -0.222 -0.154
I(jdat.z^3) -0.071 -0.910 0.199
The glmmTMB model + summary:
Tyntest<- glmmTMB(MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) +
(1|area/SiteID), family = binomial("logit"), data = sodpYN)
> summary(Tyntest)
Family: binomial ( logit )
Formula: MYOSOD.P ~ jdate.z + I(jdate.z^2) + I(jdate.z^3) + (1 | area/SiteID)
Data: sodpYN
AIC BIC logLik deviance df.resid
1260.8 1295.1 -624.4 1248.8 2246
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
SiteID:area (Intercept) 1.6490 1.2841
area (Intercept) 0.6253 0.7908
Number of obs: 2252, groups: SiteID:area, 27; area, 9
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.96965 0.39638 -7.492 6.78e-14 ***
jdate.z -0.72285 0.18250 -3.961 7.47e-05 ***
I(jdate.z^2) 0.10096 0.08221 1.228 0.21941
I(jdate.z^3) 0.25034 0.08662 2.890 0.00385 **
---
ggpredict outputs
testg<-ggpredict(yntest, terms ="jdate.z[all]")
> testg
# Predicted probabilities of MYOSOD.P
# x = jdate.z
x predicted std.error conf.low conf.high
-1.95 0.046 0.532 0.017 0.120
-1.51 0.075 0.405 0.036 0.153
-1.03 0.084 0.391 0.041 0.165
-0.58 0.072 0.391 0.035 0.142
-0.14 0.054 0.390 0.026 0.109
0.35 0.039 0.399 0.018 0.082
0.79 0.034 0.404 0.016 0.072
1.72 0.067 0.471 0.028 0.152
Adjusted for:
* SiteID = 0 (population-level)
* area = 0 (population-level)
Standard errors are on link-scale (untransformed).
testgTMB<- ggpredict(Tyntest, "jdate.z[all]")
> testgTMB
# Predicted probabilities of MYOSOD.P
# x = jdate.z
x predicted std.error conf.low conf.high
-1.95 0.444 0.826 0.137 0.801
-1.51 0.254 0.612 0.093 0.531
-1.03 0.136 0.464 0.059 0.280
-0.58 0.081 0.404 0.038 0.163
-0.14 0.054 0.395 0.026 0.110
0.35 0.040 0.402 0.019 0.084
0.79 0.035 0.406 0.016 0.074
1.72 0.040 0.444 0.017 0.091
Adjusted for:
* SiteID = NA (population-level)
* area = NA (population-level)
Standard errors are on link-scale (untransformed).
The estimates are completely different and I have no idea why.
I tried both the CRAN version and the development version of the ggeffects package in case that changed anything. It did not. I am using the most up-to-date version of glmmTMB.
This is my first time asking a question here so please let me know if I should provide more information to help explain the problem.
I checked and the issue is the same when using predict instead of ggpredict, which would imply that it is a glmmTMB issue?
GLMER:
dayplotg<-expand.grid(jdate.z=seq(min(sodp$jdate.z), max(sodp$jdate.z), length=92))
Dfitg<-predict(yntest, re.form=NA, newdata=dayplotg, type='response')
dayplotg<-data.frame(dayplotg, Dfitg)
head(dayplotg)
> head(dayplotg)
jdate.z Dfitg
1 -1.953206 0.04581691
2 -1.912873 0.04889584
3 -1.872540 0.05195598
4 -1.832207 0.05497553
5 -1.791875 0.05793307
6 -1.751542 0.06080781
glmmTMB:
dayplot<-expand.grid(jdate.z=seq(min(sodp$jdate.z), max(sodp$jdate.z), length=92),
SiteID=NA,
area=NA)
Dfit<-predict(Tyntest, newdata=dayplot, type='response')
head(Dfit)
dayplot<-data.frame(dayplot, Dfit)
head(dayplot)
> head(dayplot)
jdate.z SiteID area Dfit
1 -1.953206 NA NA 0.4458236
2 -1.912873 NA NA 0.4251926
3 -1.872540 NA NA 0.4050944
4 -1.832207 NA NA 0.3855801
5 -1.791875 NA NA 0.3666922
6 -1.751542 NA NA 0.3484646
I contacted the ggpredict developer and figured out that if I used poly(jdate.z,3) rather than jdate.z + I(jdate.z^2) + I(jdate.z^3) in the glmmTMB model, the glmer and glmmTMB predictions were the same.
I'll leave this post up even though I was able to answer my own question in case someone else has this question later.
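For anyone checking their own fits: the raw terms x + I(x^2) + I(x^3) and poly(x, 3) span the same cubic, so predictions from the two codings should agree when everything works as intended. The base-R sketch below shows that check with glm() on simulated data. It has no random effects and does not reproduce the glmmTMB behaviour described above, and the names `x`, `y` and `dat` are placeholders (x stands in for jdate.z).

```r
set.seed(42)
n <- 500
x <- rnorm(n)                                  # stands in for jdate.z
y <- rbinom(n, 1, plogis(-1 - 0.7 * x + 0.1 * x^2 + 0.25 * x^3))
dat <- data.frame(x, y)

# Same cubic, two codings: raw powers vs. orthogonal polynomials
fit_raw  <- glm(y ~ x + I(x^2) + I(x^3), family = binomial, data = dat)
fit_poly <- glm(y ~ poly(x, 3),          family = binomial, data = dat)

# Predict on a common grid and compare
newd   <- data.frame(x = seq(min(x), max(x), length.out = 92))
p_raw  <- predict(fit_raw,  newdata = newd, type = "response")
p_poly <- predict(fit_poly, newdata = newd, type = "response")
all.equal(p_raw, p_poly, tolerance = 1e-6)
```

If the same comparison on a mixed-model fit disagrees, as it did here, that points at how the prediction code rebuilds the I() terms from newdata rather than at the model fit itself.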
So I'm an R novice attempting a GLMM and post hoc analysis... help! I've collected binary data on 9 damselflies under 6 light levels: 1 = response to movement of the optomotor drum, 0 = no response. My data were imported into R with the headings 'Animal_ID, light_intensity, response', with Animal_ID (1-9) repeated for each light intensity (3.36-0.61) (see below).
Using the following code (lme4 package), I've fitted a GLMM and found light level to have a significant effect on response:
d = data.frame(id = data[,1], var = data$Light_Intensity, Response = data$Response)
model <- glmer(Response~var+(1|id),family="binomial",data=d)
summary(model)
Returns
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Response ~ var + (1 | Animal_ID)
Data: d
AIC BIC logLik deviance df.resid
66 72 -30 60 51
Scaled residuals:
Min 1Q Median 3Q Max
-3.7704 -0.6050 0.3276 0.5195 1.2463
Random effects:
Groups Name Variance Std.Dev.
Animal_ID (Intercept) 1.645 1.283
Number of obs: 54, groups: Animal_ID, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7406 1.0507 -1.657 0.0976 .
var 1.1114 0.4339 2.561 0.0104 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
var -0.846
Then running:
m1 <- update(model, ~.-var)
anova(model, m1, test = 'Chisq')
Returns
Data: d
Models:
m1: Response ~ (1 | Animal_ID)
model: Response ~ var + (1 | Animal_ID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 3 66.017 71.983 -30.008 60.017 8.5388 1 0.003477 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I've installed the multcomp and lsmeans packages in an attempt at performing a Tukey post hoc to see where the difference is, but have run into difficulties with both.
Running:
summary(glht(m1,linfct=mcp("Animal_ID"="Tukey")))
Returns:
"Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘Animal_ID’ have been specified in ‘linfct’ but cannot be found in ‘model’! "
Running:
lsmeans(model,pairwise~Animal_ID,adjust="tukey")
Returns:
"Error in lsmeans.character.ref.grid(object = new("ref.grid", model.info = list( :
No variable named Animal_ID in the reference grid"
I'm aware that I'm probably being very stupid here, but any help would be very much appreciated. My confusion is snowballing.
Also, does anyone have any suggestions as to how I might best visualize my results (and how to do this)?
Thank you very much in advance!
UPDATE:
New code-
# Light level and subject are categorical; the response is numeric 0/1
Light <- factor(rep(c("3.36", "2.98", "2.73", "2.15", "1.72", "0.61"), each = 9))
Subject <- factor(rep(1:9, times = 6))
Value <- c(1, 0, 1, 0, 1, 1, 1, 0, 1,   # light 3.36
           1, 0, 1, 1, 1, 1, 1, 1, 1,   # light 2.98
           0, 1, 1, 1, 1, 1, 1, 0, 1,   # light 2.73
           0, 0, 1, 1, 1, 1, 1, 1, 1,   # light 2.15
           0, 0, 0, 1, 0, 0, 1, 0, 1,   # light 1.72
           0, 0, 0, 1, 1, 0, 1, 0, 0)   # light 0.61
data <- data.frame(Light, Subject, Value)
library(lme4)
model <- glmer(Value~Light+(1|Subject),family="binomial",data=data)
summary(model)
Returns:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Value ~ Light + (1 | Subject)
Data: data
AIC BIC logLik deviance df.resid
67.5 81.4 -26.7 53.5 47
Scaled residuals:
Min 1Q Median 3Q Max
-2.6564 -0.4884 0.2193 0.3836 1.2418
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 2.687 1.639
Number of obs: 54, groups: Subject, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.070e+00 1.053e+00 -1.016 0.3096
Light1.72 -7.934e-06 1.227e+00 0.000 1.0000
Light2.15 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.73 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.98 4.049e+00 1.699e+00 2.383 0.0172 *
Light3.36 2.111e+00 1.308e+00 1.613 0.1067
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Lg1.72 Lg2.15 Lg2.73 Lg2.98
Light1.72 -0.582
Light2.15 -0.595 0.426
Light2.73 -0.595 0.426 0.555
Light2.98 -0.534 0.361 0.523 0.523
Light3.36 -0.623 0.469 0.553 0.553 0.508
Then running:
m1 <- update(model, ~.-Light)
anova(model, m1, test= 'Chisq')
Returns:
Data: data
Models:
m1: Value ~ (1 | Subject)
model: Value ~ Light + (1 | Subject)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 7 67.470 81.393 -26.735 53.470 15.086 5 0.01 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Finally, running:
library(lsmeans)
lsmeans(model,list(pairwise~Light),adjust="tukey")
Returns (it actually works now!):
$`lsmeans of Light`
Light lsmean SE df asymp.LCL asymp.UCL
0.61 -1.070208 1.053277 NA -3.1345922 0.9941771
1.72 -1.070216 1.053277 NA -3.1345997 0.9941687
2.15 1.860339 1.172361 NA -0.4374459 4.1581244
2.73 1.860332 1.172360 NA -0.4374511 4.1581149
2.98 2.978658 1.443987 NA 0.1484964 5.8088196
3.36 1.040537 1.050317 NA -1.0180467 3.0991215
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$`pairwise differences of contrast`
contrast estimate SE df z.ratio p.value
0.61 - 1.72 7.933829e-06 1.226607 NA 0.000 1.0000
0.61 - 2.15 -2.930547e+00 1.438239 NA -2.038 0.3209
0.61 - 2.73 -2.930539e+00 1.438237 NA -2.038 0.3209
0.61 - 2.98 -4.048866e+00 1.699175 NA -2.383 0.1622
0.61 - 3.36 -2.110745e+00 1.308395 NA -1.613 0.5897
1.72 - 2.15 -2.930555e+00 1.438239 NA -2.038 0.3209
1.72 - 2.73 -2.930547e+00 1.438238 NA -2.038 0.3209
1.72 - 2.98 -4.048874e+00 1.699175 NA -2.383 0.1622
1.72 - 3.36 -2.110753e+00 1.308395 NA -1.613 0.5897
2.15 - 2.73 7.347728e-06 1.357365 NA 0.000 1.0000
2.15 - 2.98 -1.118319e+00 1.548539 NA -0.722 0.9793
2.15 - 3.36 8.198019e-01 1.302947 NA 0.629 0.9889
2.73 - 2.98 -1.118326e+00 1.548538 NA -0.722 0.9793
2.73 - 3.36 8.197945e-01 1.302947 NA 0.629 0.9889
2.98 - 3.36 1.938121e+00 1.529202 NA 1.267 0.8029
Results are given on the log odds ratio (not the response) scale.
P value adjustment: tukey method for comparing a family of 6 estimates
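Your model specifies Animal_ID as a random effect. The glht and lsmeans functions compare levels of fixed effects only, so the comparison has to be requested for a fixed-effect factor (here, the light level) rather than for Animal_ID.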
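A minimal sketch of what that looks like, assuming the lme4 and emmeans packages are installed (emmeans is the successor to lsmeans). The data are rebuilt from the UPDATE in the question. The object names `dat` and `res` are mine.

```r
library(lme4)
library(emmeans)   # successor to lsmeans

# Data from the UPDATE in the question: 9 subjects x 6 light levels
dat <- data.frame(
  Light   = factor(rep(c("3.36", "2.98", "2.73", "2.15", "1.72", "0.61"), each = 9)),
  Subject = factor(rep(1:9, times = 6)),
  Value   = c(1, 0, 1, 0, 1, 1, 1, 0, 1,
              1, 0, 1, 1, 1, 1, 1, 1, 1,
              0, 1, 1, 1, 1, 1, 1, 0, 1,
              0, 0, 1, 1, 1, 1, 1, 1, 1,
              0, 0, 0, 1, 0, 0, 1, 0, 1,
              0, 0, 0, 1, 1, 0, 1, 0, 0)
)

# Light is the fixed factor; Subject is the random effect
model <- glmer(Value ~ Light + (1 | Subject), family = binomial, data = dat)

# Tukey-adjusted pairwise comparisons among the levels of Light
res <- emmeans(model, pairwise ~ Light, adjust = "tukey")
res$contrasts
```

The multcomp equivalent would be glht(model, linfct = mcp(Light = "Tukey")), again naming the fixed factor and passing the full model rather than the reduced model m1. Neither approach applies if light intensity is entered as a numeric covariate, as in the first model, because then there is a single slope and no levels to compare.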