Linear mixed model confidence intervals question - r

Hoping that you can clear up some confusion in my head.
I fitted a linear mixed model with lmerTest (names translated here; the output below uses the original names SISTEM, LETO and ponovitev):
MODEL <- lmer(CA ~ SYSTEM + (1 | YEAR/replicate) +
(1 | YEAR:SYSTEM), data = IOSDV1)
The fun starts when I try to get confidence intervals for the individual levels of the main effect.
The emmeans and lsmeans commands produce the same intervals (for example, SYSTEM A3: 23.9-128.9, mean 76.4, SE 8.96).
However, as.data.frame(effect("SYSTEM", MODEL)) produces different, narrower confidence intervals (for example, SYSTEM A3: 58.0-94.9, mean 76.4, SE 8.96).
What am I missing, and which numbers should I report?
To summarize, for Ca content I have 6 measurements in total per treatment (three per year, each from a different replicate). I have left the names in the code in my language, as used. The idea is to test whether certain production practices affect the content of specific minerals in the grain. Random effects whose estimated variance is zero were left in the model for this example.
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: CA ~ SISTEM + (1 | LETO/ponovitev) + (1 | LETO:SISTEM)
Data: IOSDV1
REML criterion at convergence: 202.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.60767 -0.74339 0.04665 0.73152 1.50519
Random effects:
Groups Name Variance Std.Dev.
LETO:SISTEM (Intercept) 0.0 0.0
ponovitev:LETO (Intercept) 0.0 0.0
LETO (Intercept) 120.9 11.0
Residual 118.7 10.9
Number of obs: 30, groups: LETO:SISTEM, 10; ponovitev:LETO, 8; LETO, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 76.417 8.959 1.548 8.530 0.0276 *
SISTEM[T.C0] -5.183 6.291 24.000 -0.824 0.4181
SISTEM[T.C110] -13.433 6.291 24.000 -2.135 0.0431 *
SISTEM[T.C165] -7.617 6.291 24.000 -1.211 0.2378
SISTEM[T.C55] -10.883 6.291 24.000 -1.730 0.0965 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) SISTEM[T.C0 SISTEM[T.C11 SISTEM[T.C16
SISTEM[T.C0 -0.351
SISTEM[T.C11 -0.351 0.500
SISTEM[T.C16 -0.351 0.500 0.500
SISTEM[T.C5 -0.351 0.500 0.500 0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
> ls_means(MODEL, ddf="Kenward-Roger")
Least Squares Means table:
Estimate Std. Error df t value lower upper Pr(>|t|)
SISTEMA3 76.4167 8.9586 1.5 8.5299 23.9091 128.9243 0.02853 *
SISTEMC0 71.2333 8.9586 1.5 7.9514 18.7257 123.7409 0.03171 *
SISTEMC110 62.9833 8.9586 1.5 7.0305 10.4757 115.4909 0.03813 *
SISTEMC165 68.8000 8.9586 1.5 7.6797 16.2924 121.3076 0.03341 *
SISTEMC55 65.5333 8.9586 1.5 7.3151 13.0257 118.0409 0.03594 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Confidence level: 95%
Degrees of freedom method: Kenward-Roger
> emmeans(MODEL, spec = c("SISTEM"))
SISTEM emmean SE df lower.CL upper.CL
A3 76.4 8.96 1.53 23.9 129
C0 71.2 8.96 1.53 18.7 124
C110 63.0 8.96 1.53 10.5 115
C165 68.8 8.96 1.53 16.3 121
C55 65.5 8.96 1.53 13.0 118
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
> as.data.frame(effect("SISTEM", MODEL))
SISTEM fit se lower upper
1 A3 76.41667 8.958643 57.96600 94.86734
2 C0 71.23333 8.958643 52.78266 89.68400
3 C110 62.98333 8.958643 44.53266 81.43400
4 C165 68.80000 8.958643 50.34933 87.25067
5 C55 65.53333 8.958643 47.08266 83.98400
Many thanks.

I'm pretty sure this has to do with the dreaded "denominator degrees of freedom" question, i.e. what kind (if any) of finite-sample correction is being employed. tl;dr: emmeans is using a Kenward-Roger correction, which is more or less the most accurate available option; the only reason not to use K-R is if you have a large data set for which it becomes unbearably slow.
Load packages, simulate data, fit model:
library(lmerTest)
library(emmeans)
library(effects)
dd <- expand.grid(f=factor(letters[1:3]),g=factor(1:20),rep=1:10)
set.seed(101)
dd$y <- simulate(~f+(1|g), newdata=dd, newparams=list(beta=rep(1,3),theta=1,sigma=1))[[1]]
m <- lmer(y~f+(1|g), data=dd)
Compare default emmeans with effects:
emmeans(m, ~f)
## f emmean SE df lower.CL upper.CL
## a 0.848 0.212 21.9 0.409 1.29
## b 1.853 0.212 21.9 1.414 2.29
## c 1.863 0.212 21.9 1.424 2.30
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
as.data.frame(effect("f",m))
## f fit se lower upper
## 1 a 0.8480161 0.2117093 0.4322306 1.263802
## 2 b 1.8531805 0.2117093 1.4373950 2.268966
## 3 c 1.8632228 0.2117093 1.4474373 2.279008
effects doesn't explicitly tell us what/whether it's using a finite-sample correction: we could dig around in the documentation or the code to try to find out. Alternatively, we can tell emmeans not to use finite-sample correction:
emmeans(m, ~f, lmer.df="asymptotic")
## f emmean SE df asymp.LCL asymp.UCL
## a 0.848 0.212 Inf 0.433 1.26
## b 1.853 0.212 Inf 1.438 2.27
## c 1.863 0.212 Inf 1.448 2.28
## Degrees-of-freedom method: asymptotic
## Confidence level used: 0.95
Testing shows that these are equivalent to about a tolerance of 0.001 (probably close enough). In principle we should be able to specify KR=TRUE to get effects to use Kenward-Roger correction, but I haven't been able to get that to work yet.
However, I will also say that there's something a little bit funky about your example. If we compute the distance between the mean and the lower CI in units of standard error, for emmeans we get (76.4-23.9)/8.96 = 5.86, which implies very small effective degrees of freedom (about 1.55). That seems questionable to me unless your data set is extremely small ...
From your updated post, it appears that Kenward-Roger is indeed estimating only 1.5 denominator df.
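One way to see which degrees of freedom each package used is to back them out of the reported intervals: the half-width divided by the SE is the t quantile, and the df that gives that quantile can be found numerically. The sketch below uses only base R and the numbers printed in the question; the helper name implied_df is made up for illustration and is not part of emmeans or effects.

```r
# Back out the df implied by a 95% CI: half-width / SE = qt(0.975, df).
# implied_df() is a hypothetical helper written for this illustration.
implied_df <- function(estimate, upper, se, level = 0.95) {
  mult <- (upper - estimate) / se          # CI half-width in SE units
  p <- 1 - (1 - level) / 2                 # upper-tail probability, 0.975
  # qt(p, df) decreases as df grows, so a root search on df recovers it
  uniroot(function(df) qt(p, df) - mult,
          interval = c(1, 1e4), tol = 1e-8)$root
}

# SISTEM A3 as reported by ls_means()/emmeans() (Kenward-Roger)
df_emm <- implied_df(76.4167, 128.9243, 8.9586)
# SISTEM A3 as reported by effect()
df_eff <- implied_df(76.41667, 94.86734, 8.958643)

round(c(emmeans = df_emm, effects = df_eff), 2)
```

The first value lands between 1 and 2, in line with the 1.5 df reported by Kenward-Roger. The second is very close to 25, which is what 30 observations minus 5 fixed-effect coefficients would give. That is consistent with effect() using residual degrees of freedom here, although I have not confirmed this in its source.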
In general it is dicey/not recommended to try fitting random effects where the grouping variable has a small number of levels (although see here for a counterargument). I would try treating LETO (which has only two levels) as a fixed effect, i.e.
CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM)
and see if that helps. (I would expect you would then get on the order of 7 df, which would make your CIs ± 2.4 SE instead of ± 6 SE ...)

Related

Is there any way to split interaction effects in a linear model up?

I have a 2x2 factorial design: control vs enriched, and strain1 vs strain2. I wanted to make a linear model, which I did as follows:
anova(lmer(length ~ Strain + Insect + Strain:Insect + BW_final + (1|Pen), data = mydata))
Where length is one of the dependent variables I want to analyse, Strain and Insect as treatments, Strain:Insect as interaction effect, BW_final as covariate, and Pen as random effect.
As output I get this:
              Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Strain         3.274   3.274     1    65  0.1215 0.7285
Insect        14.452  14.452     1    65  0.5365 0.4665
BW_final      45.143  45.143     1    65  1.6757 0.2001
Strain:Insect 52.813  52.813     1    65  1.9604 0.1662
As you can see, I only get 1 interaction term: Strain:Insect. However, I'd like to see 4 interaction terms: Strain1:Control, Strain1:Enriched, Strain2:Control, Strain2:Enriched.
Is there any way to do this in R?
Using summary instead of anova I get:
> summary(linearmer)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: length ~ Strain + Insect + Strain:Insect + BW_final + (1 | Pen)
Data: mydata_young
REML criterion at convergence: 424.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.95735 -0.52107 0.07014 0.43928 2.13383
Random effects:
Groups Name Variance Std.Dev.
Pen (Intercept) 0.00 0.00
Residual 26.94 5.19
Number of obs: 70, groups: Pen, 27
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 101.646129 7.530496 65.000000 13.498 <2e-16 ***
StrainRoss 0.648688 1.860745 65.000000 0.349 0.729
Insect 0.822454 2.062696 65.000000 0.399 0.691
BW_final -0.005188 0.004008 65.000000 -1.294 0.200
StrainRoss:Insect -3.608430 2.577182 65.000000 -1.400 0.166
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) StrnRs Insect BW_fnl
StrainRoss 0.253
Insect -0.275 0.375
BW_final -0.985 -0.378 0.169
StrnRss:Ins 0.071 -0.625 -0.775 0.016
convergence code: 0
boundary (singular) fit: see ?isSingular
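For what it's worth, a 2x2 interaction has only one degree of freedom, which is why anova() shows a single Strain:Insect row; the four combinations are cell means rather than four separate interaction terms. A minimal base-R sketch shows how the four cell means are built from an intercept, two main-effect coefficients and one interaction coefficient. The level names and the numbers in beta are illustrative assumptions, not the fitted values, and both predictors are assumed to be factors.

```r
# All four Strain x Insect combinations (level names are placeholders)
grid <- expand.grid(Strain = factor(c("strain1", "strain2")),
                    Insect = factor(c("control", "enriched")))

# Design matrix under R's default treatment contrasts
X <- model.matrix(~ Strain * Insect, data = grid)
colnames(X)   # intercept, two main effects, ONE interaction column

# Hypothetical coefficients, in the column order of X
beta <- c(100, 0.6, 0.8, -3.6)

# Each cell mean is a linear combination of the four coefficients
cell_means <- drop(X %*% beta)
cbind(grid, cell_mean = cell_means)
```

With a fitted model, something like emmeans(model, ~ Strain * Insect) would report the estimated cell means directly, provided both predictors are factors; in the summary above, Insect appears to be coded as a number.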

Discrepancy between summary() and anova() of linear mixed model

I have fitted a linear mixed model with a split-plot design to assess the effects of water, nitrogen and phosphorus on BWC (biomass-weighted 2C-value, obtained by summing the product of each species' 2C-value (DNA content) and its biomass fraction (species subplot biomass / total subplot biomass)):
model1.1<-lmer(log(BWC)~W*N*P+(1|year)+(1|W:Block),data=BWC)
There are two levels each for W (0, 1), N (0, 1) and P (0, 1). I would like to report my results with a boxplot together with the output of the linear mixed model. However, I'm confused by the model output.
The estimated coefficient for W:N:P in model1.1 is negative. Does that mean the WNP treatment decreases BWC compared to the control plot? Yet the boxplot shows that BWC was highest under the WNP treatment.
There is also a discrepancy between summary() and anova(), for example in the significance of the N and P effects. The estimate for N is -4.0911, which suggests that N addition decreased BWC, but the N effect is not significant in anova(). How should I report treatment effects like N?
Many thanks for any comments.
Boxplot of WNP treatment on BWC:
https://i.stack.imgur.com/cKOFt.png
(Sorry for the link; it seems I need at least 10 reputation to post images.)
The summary() and anova() output:
> summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: BWC ~ W * N * P + (1 | year) + (1 | W:Block)
Data: BWC
REML criterion at convergence: 2969.1
Scaled residuals:
Min 1Q Median 3Q Max
-2.93847 -0.71228 -0.07573 0.68191 2.92589
Random effects:
Groups Name Variance Std.Dev.
W:Block (Intercept) 0.9169 0.9575
year (Intercept) 0.8346 0.9136
Residual 18.2966 4.2774
Number of obs: 515, groups: W:Block, 14; year, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 10.8498 0.6985 46.5200 15.532 < 2e-16 ***
W1 2.0844 0.8969 45.9613 2.324 0.02460 *
N1 -4.0911 0.7364 486.0288 -5.556 4.56e-08 ***
P1 -2.0460 0.7600 490.1120 -2.692 0.00734 **
W1:N1 4.6738 1.0394 485.9800 4.497 8.65e-06 ***
W1:P1 0.9695 1.0687 485.9809 0.907 0.36478
N1:P1 5.7550 1.0687 485.9773 5.385 1.13e-07 ***
W1:N1:P1 -3.3306 1.5100 485.9541 -2.206 0.02788 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) W1 N1 P1 W1:N1 W1:P1 N1:P1
W1 -0.645
N1 -0.531 0.414
P1 -0.515 0.401 0.488
W1:N1 0.376 -0.582 -0.708 -0.346
W1:P1 0.366 -0.566 -0.347 -0.706 0.488
N1:P1 0.366 -0.285 -0.689 -0.706 0.488 0.502
W1:N1:P1 -0.259 0.400 0.488 0.499 -0.688 -0.708 -0.708
> anova(model1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
W 750.15 750.15 1 11.90 40.9995 3.519e-05 ***
N 10.84 10.84 1 485.95 0.5926 0.44177
P 29.14 29.14 1 494.92 1.5926 0.20755
W:N 290.51 290.51 1 485.95 15.8778 7.793e-05 ***
W:P 15.54 15.54 1 485.96 0.8493 0.35721
N:P 536.85 536.85 1 485.95 29.3415 9.562e-08 ***
W:N:P 89.01 89.01 1 485.95 4.8648 0.02788 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> emmeans::emmeans(model1,pairwise~N*P*W)
$emmeans
N P W emmean SE df lower.CL upper.CL
0 0 0 10.85 0.699 46.9 9.44 12.26
1 0 0 6.76 0.696 46.2 5.36 8.16
0 1 0 8.80 0.721 52.1 7.36 10.25
1 1 0 10.47 0.721 52.1 9.02 11.91
0 0 1 12.93 0.696 46.2 11.53 14.33
1 0 1 13.52 0.696 46.2 12.12 14.92
0 1 1 11.86 0.721 52.1 10.41 13.30
1 1 1 14.86 0.721 52.1 13.42 16.31
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
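The two tables answer different questions, and the emmeans output above lets one check that by hand. With R's default treatment contrasts, the N1 coefficient in summary() is the effect of N when W and P are both at 0, whereas the Type III test for N in anova() is (roughly) about the N effect averaged over the levels of W and P. The sketch below recomputes both from the printed estimated marginal means; these are rounded, so the results are approximate.

```r
# Estimated marginal means copied from the emmeans table above
cells <- data.frame(
  N = c(0, 1, 0, 1, 0, 1, 0, 1),
  P = c(0, 0, 1, 1, 0, 0, 1, 1),
  W = c(0, 0, 0, 0, 1, 1, 1, 1),
  emmean = c(10.85, 6.76, 8.80, 10.47, 12.93, 13.52, 11.86, 14.86)
)

# What summary() reports as N1: the N effect at W = 0 and P = 0
simple_N <- with(cells, emmean[N == 1 & P == 0 & W == 0] -
                        emmean[N == 0 & P == 0 & W == 0])

# What the N main effect is about: N effect averaged over W and P
avg_N <- with(cells, mean(emmean[N == 1]) - mean(emmean[N == 0]))

# Three-way term: the N:P interaction at W = 1 minus the N:P interaction at W = 0
three_way <- with(cells, {
  np_at <- function(w) {
    (emmean[N == 1 & P == 1 & W == w] - emmean[N == 0 & P == 1 & W == w]) -
    (emmean[N == 1 & P == 0 & W == w] - emmean[N == 0 & P == 0 & W == w])
  }
  np_at(1) - np_at(0)
})

round(c(simple_N = simple_N, avg_N = avg_N, three_way = three_way), 3)
```

The simple effect reproduces the -4.09 from summary(), while the averaged N effect is only about 0.29, which fits the non-significant N row in anova(). The negative three-way value likewise does not say that WNP lowers BWC relative to control (the WNP cell mean, 14.86, is the highest in the table); it says that the N:P interaction is smaller with added water than without.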

Estimating effect size with emmeans for post hoc

Is there a way to have effect size (such as Cohen's d or the most appropriate) directly using emmeans()?
I cannot find anything for obtaining effect size by using emmeans()
post <- emmeans(fit, pairwise~ favorite.pirate | sex)
emmip(fit, ~ favorite.pirate | sex)
There is not a built-in provision for effect-size calculations, but you can cobble one together by defining a custom contrast function that divides each pairwise comparison by a value of sigma:
mypw.emmc = function(..., sigma = 1) {
    # start from the standard pairwise contrast coefficients
    result = emmeans:::pairwise.emmc(...)
    # rescale each contrast column by sigma
    for (i in seq_along(result[1, ]))
        result[[i]] = result[[i]] / sigma
    result
}
Here's a test run:
> mypw.emmc(1:3, sigma = 4)
1 - 2 1 - 3 2 - 3
1 0.25 0.25 0.00
2 -0.25 0.00 0.25
3 0.00 -0.25 -0.25
With your model, the error SD is 9.246 (look at summary(fit)); so, ...
> emmeans(fit, mypw ~ sex, sigma = 9.246, name = "effect.size")
NOTE: Results may be misleading due to involvement in interactions
$emmeans
sex emmean SE df lower.CL upper.CL
female 63.8 0.434 3.03 62.4 65.2
male 74.5 0.809 15.82 72.8 76.2
other 68.8 1.439 187.08 65.9 71.6
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
effect.size estimate SE df t.ratio p.value
female - male -1.158 0.0996 399 -11.624 <.0001
female - other -0.537 0.1627 888 -3.299 0.0029
male - other 0.621 0.1717 981 3.617 0.0009
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
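As a sanity check, the estimates in the contrasts table are just the pairwise differences of the estimated marginal means divided by sigma. A base-R sketch using the rounded numbers printed above (so the last digit can differ from the table):

```r
# Estimated marginal means as printed above (rounded to one decimal)
emm <- c(female = 63.8, male = 74.5, other = 68.8)
sigma <- 9.246   # residual SD quoted from summary(fit)

# All pairwise combinations of the level names, one pair per column
pairs_idx <- combn(names(emm), 2)
diffs <- emm[pairs_idx[1, ]] - emm[pairs_idx[2, ]]
names(diffs) <- paste(pairs_idx[1, ], "-", pairs_idx[2, ])

# Standardized differences (Cohen's d style, treating sigma as known)
effect_size <- diffs / sigma
round(effect_size, 3)
```

Newer emmeans releases also ship an eff_size() function that takes sigma together with its degrees of freedom (edf); it is worth checking ?eff_size before relying on the hand-rolled contrast.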
Some words of caution though:
1. The SEs of the effect sizes are misleading because they don't account for the variation in sigma.
2. This is not a very good example because
   a. The factors interact (Edward Low is different in his profile). Also, see the warning message.
   b. The model is singular (as warned when the model was fitted), yielding an estimated variance of zero for college.
library(yarrr)
View(pirates)
library(lme4)
library(lmerTest)
library(emmeans)
fit <- lmer(weight~ favorite.pirate * sex +(1|college), data = pirates)
anova(fit, ddf = "Kenward-Roger")
post <- emmeans(fit, pairwise~ sex)
post

Post hoc for binary GLMM (lme4) and plot

So I'm an R novice attempting a GLMM and post hoc analysis... help! I've collected binary data on 9 damselflies under 6 light levels: 1 = response to movement of the optomotor drum, 0 = no response. My data were imported into R with the headings 'Animal_ID, light_intensity, response'. Animal ID (1-9) is repeated for each light intensity (3.36-0.61) (see below).
Using the following code (lme4 package), I've fitted a GLMM and found that light level has a significant effect on response:
d = data.frame(id = data[,1], var = data$Light_Intensity, Response = data$Response)
model <- glmer(Response~var+(1|id),family="binomial",data=d)
summary(model)
Returns
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Response ~ var + (1 | Animal_ID)
Data: d
AIC BIC logLik deviance df.resid
66 72 -30 60 51
Scaled residuals:
Min 1Q Median 3Q Max
-3.7704 -0.6050 0.3276 0.5195 1.2463
Random effects:
Groups Name Variance Std.Dev.
Animal_ID (Intercept) 1.645 1.283
Number of obs: 54, groups: Animal_ID, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7406 1.0507 -1.657 0.0976 .
var 1.1114 0.4339 2.561 0.0104 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
var -0.846
Then running:
m1 <- update(model, ~.-var)
anova(model, m1, test = 'Chisq')
Returns
Data: d
Models:
m1: Response ~ (1 | Animal_ID)
model: Response ~ var + (1 | Animal_ID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 3 66.017 71.983 -30.008 60.017 8.5388 1 0.003477 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I've installed the multcomp and lsmeans packages in an attempt at performing a Tukey post hoc to see where the difference is, but have run into difficulties with both.
Running:
summary(glht(m1,linfct=mcp("Animal_ID"="Tukey")))
Returns:
"Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘Animal_ID’ have been specified in ‘linfct’ but cannot be found in ‘model’! "
Running:
lsmeans(model,pairwise~Animal_ID,adjust="tukey")
Returns:
"Error in lsmeans.character.ref.grid(object = new("ref.grid", model.info = list( :
No variable named Animal_ID in the reference grid"
I'm aware that I'm probably being very stupid here, but any help would be very much appreciated. My confusion is snowballing.
Also, does anyone have any suggestions as to how I might best visualize my results (and how to do this)?
Thank you very much in advance!
UPDATE:
New code-
Light <- c("3.36","3.36","3.36","3.36","3.36","3.36","3.36","3.36","3.36","2.98","2.98","2.98","2.98","2.98","2.98","2.98","2.98","2.98","2.73","2.73","2.73","2.73","2.73","2.73","2.73","2.73","2.73","2.15","2.15","2.15","2.15","2.15","2.15","2.15","2.15","2.15","1.72","1.72","1.72","1.72","1.72","1.72","1.72","1.72","1.72","0.61","0.61","0.61","0.61","0.61","0.61","0.61","0.61","0.61")
Subject <- c("1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9")
Value <- c("1","0","1","0","1","1","1","0","1","1","0","1","1","1","1","1","1","1","0","1","1","1","1","1","1","0","1","0","0","1","1","1","1","1","1","1","0","0","0","1","0","0","1","0","1","0","0","0","1","1","0","1","0","0")
data <- data.frame(Light, Subject, Value)
library(lme4)
model <- glmer(Value~Light+(1|Subject),family="binomial",data=data)
summary(model)
Returns:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Value ~ Light + (1 | Subject)
Data: data
AIC BIC logLik deviance df.resid
67.5 81.4 -26.7 53.5 47
Scaled residuals:
Min 1Q Median 3Q Max
-2.6564 -0.4884 0.2193 0.3836 1.2418
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 2.687 1.639
Number of obs: 54, groups: Subject, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.070e+00 1.053e+00 -1.016 0.3096
Light1.72 -7.934e-06 1.227e+00 0.000 1.0000
Light2.15 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.73 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.98 4.049e+00 1.699e+00 2.383 0.0172 *
Light3.36 2.111e+00 1.308e+00 1.613 0.1067
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Lg1.72 Lg2.15 Lg2.73 Lg2.98
Light1.72 -0.582
Light2.15 -0.595 0.426
Light2.73 -0.595 0.426 0.555
Light2.98 -0.534 0.361 0.523 0.523
Light3.36 -0.623 0.469 0.553 0.553 0.508
Then running:
m1 <- update(model, ~.-Light)
anova(model, m1, test= 'Chisq')
Returns:
Data: data
Models:
m1: Value ~ (1 | Subject)
model: Value ~ Light + (1 | Subject)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 7 67.470 81.393 -26.735 53.470 15.086 5 0.01 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Finally, running:
library(lsmeans)
lsmeans(model,list(pairwise~Light),adjust="tukey")
Returns (it actually works now!):
$`lsmeans of Light`
Light lsmean SE df asymp.LCL asymp.UCL
0.61 -1.070208 1.053277 NA -3.1345922 0.9941771
1.72 -1.070216 1.053277 NA -3.1345997 0.9941687
2.15 1.860339 1.172361 NA -0.4374459 4.1581244
2.73 1.860332 1.172360 NA -0.4374511 4.1581149
2.98 2.978658 1.443987 NA 0.1484964 5.8088196
3.36 1.040537 1.050317 NA -1.0180467 3.0991215
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$`pairwise differences of contrast`
contrast estimate SE df z.ratio p.value
0.61 - 1.72 7.933829e-06 1.226607 NA 0.000 1.0000
0.61 - 2.15 -2.930547e+00 1.438239 NA -2.038 0.3209
0.61 - 2.73 -2.930539e+00 1.438237 NA -2.038 0.3209
0.61 - 2.98 -4.048866e+00 1.699175 NA -2.383 0.1622
0.61 - 3.36 -2.110745e+00 1.308395 NA -1.613 0.5897
1.72 - 2.15 -2.930555e+00 1.438239 NA -2.038 0.3209
1.72 - 2.73 -2.930547e+00 1.438238 NA -2.038 0.3209
1.72 - 2.98 -4.048874e+00 1.699175 NA -2.383 0.1622
1.72 - 3.36 -2.110753e+00 1.308395 NA -1.613 0.5897
2.15 - 2.73 7.347728e-06 1.357365 NA 0.000 1.0000
2.15 - 2.98 -1.118319e+00 1.548539 NA -0.722 0.9793
2.15 - 3.36 8.198019e-01 1.302947 NA 0.629 0.9889
2.73 - 2.98 -1.118326e+00 1.548538 NA -0.722 0.9793
2.73 - 3.36 8.197945e-01 1.302947 NA 0.629 0.9889
2.98 - 3.36 1.938121e+00 1.529202 NA 1.267 0.8029
Results are given on the log odds ratio (not the response) scale.
P value adjustment: tukey method for comparing a family of 6 estimates
Your model specifies Animal_ID as a random effect. The glht and lsmeans functions work only for fixed-effect comparisons.
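On the visualization question: the lsmeans above are on the logit scale, so one option is to back-transform them with plogis() and set them next to the raw proportion of responders at each light level. The sketch below uses base R only, with the data and lsmeans copied from the update in the question. The model-based values are conditional on a random effect of zero, so they are not expected to match the raw proportions exactly.

```r
# Data from the question's update: 9 animals at each of 6 light levels
light <- factor(rep(c("3.36", "2.98", "2.73", "2.15", "1.72", "0.61"), each = 9))
value <- c(1,0,1,0,1,1,1,0,1,  1,0,1,1,1,1,1,1,1,  0,1,1,1,1,1,1,0,1,
           0,0,1,1,1,1,1,1,1,  0,0,0,1,0,0,1,0,1,  0,0,0,1,1,0,1,0,0)

# Observed proportion responding at each light level
obs_prop <- tapply(value, light, mean)

# lsmeans on the logit scale, copied from the output above
lsmean_logit <- c("0.61" = -1.070208, "1.72" = -1.070216, "2.15" = 1.860339,
                  "2.73" = 1.860332, "2.98" = 2.978658, "3.36" = 1.040537)

# Inverse logit turns log-odds into probabilities
pred_prob <- plogis(lsmean_logit)

round(cbind(observed = obs_prop, model = pred_prob[names(obs_prop)]), 3)
```

From here, barplot(obs_prop, xlab = "Light intensity", ylab = "Proportion responding") would give a quick picture of where the response rate drops.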

(Quasi)-Complete separation according to a random effect in logistic GLMM

I am experiencing convergence warnings and a very large group variance while fitting a binary logistic GLMM using lme4. I am wondering whether this could be related to (quasi-)complete separation according to the random effect, i.e., the fact that many individuals (the random effect/grouping variable) have only 0 in the dependent variable, resulting in low within-individual variation. If this could be a problem, are there alternative modelling strategies to deal with such cases?
More precisely, I am studying the chance that an individual is observed in a given status (having children while living with their parents) at a given age. In other words, I have several observations for each individual (typically 50) specifying whether the individual was observed in this state at a given age. Here is an example:
id age status
1 21 0
1 22 0
1 23 0
1 24 1
1 25 0
1 26 1
1 27 0
...
The chance to observe a status of 1 is quite low (between 1 and 5% depending on the cases) and I have a lot of observations (150'000 observations and 3'000 individuals).
The model was fitted using glmer specifying ID (individual) as a random effect and including some explanatory factors (age categories, parental education and the period where the status was observed). I get the following convergence warnings (except when using nAGQ=0) and very large group variance (here more than 25).
"Model failed to converge with max|grad| = 2.21808 (tol = 0.001, component 2)"
"Model is nearly unidentifiable: very large eigenvalue\n - Rescale variables?"
Here is the obtained model.
AIC BIC logLik deviance df.resid
9625.0 9724.3 -4802.5 9605.0 151215
Scaled residuals:
Min 1Q Median 3Q Max
-2.529 -0.003 -0.002 -0.001 47.081
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 28.94 5.38
Number of obs: 151225, groups: id, 3067
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.603822 0.496392 -21.362 < 2e-16 ***
agecat[18,21) -0.413018 0.075119 -5.498 3.84e-08 ***
agecat[21,24) -1.460205 0.095315 -15.320 < 2e-16 ***
agecat[24,27) -2.844713 0.137484 -20.691 < 2e-16 ***
agecat[27,30) -3.837227 0.199644 -19.220 < 2e-16 ***
parent_educ -0.007390 0.003609 -2.048 0.0406 *
period_cat80's 0.126521 0.113044 1.119 0.2630
period_cat90's -0.105139 0.176732 -0.595 0.5519
period_cat00's -0.507052 0.263580 -1.924 0.0544 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) a[18,2 a[21,2 a[24,2 a[27,3 prnt_d pr_80' pr_90'
agct[18,21) -0.038
agct[21,24) -0.006 0.521
agct[24,27) 0.006 0.412 0.475
agct[27,30) 0.011 0.325 0.393 0.378
parent_educ -0.557 0.059 0.087 0.084 0.078
perd_ct80's -0.075 -0.258 -0.372 -0.380 -0.352 -0.104
perd_ct90's -0.048 -0.302 -0.463 -0.471 -0.448 -0.151 0.732
perd_ct00's -0.019 -0.293 -0.459 -0.434 -0.404 -0.138 0.559 0.739
You could try one of a few different optimizers available through the nloptr and optimx packages. There's even an allFit function, available through the afex package (and, in current releases, from lme4 itself), that tries them for you (see the allFit help file). For example:
all_mod <- allFit(exist_model)
That will let you check how stable your estimates are. This points over to more resources on the gradient topic.
If you're worried about complete separation, see here for Ben Bolker's answer to use the bglmer function from the blme package. It operates much like glmer, but allows you to add priors to the model specification.
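Before changing optimizers or adding priors, it may help to count how many individuals never show status = 1, since a large share of all-zero individuals can go along with a very large random-intercept variance. Below is a sketch on simulated data. The data frame d, the intercept of -8, the SD of 5 and the sample sizes are all made-up assumptions chosen to mimic a rare outcome with large between-individual variation; this is not the asker's data.

```r
set.seed(1)

# Simulated stand-in: 200 individuals, 50 observations each (assumed sizes)
n_id  <- 200
n_obs <- 50
d <- data.frame(id = rep(seq_len(n_id), each = n_obs))

# Individual intercepts with a large SD, similar in size to the fit above
u <- rnorm(n_id, sd = 5)
d$status <- rbinom(nrow(d), size = 1, prob = plogis(-8 + u[d$id]))

# For each individual: was status ever 1?
ever <- tapply(d$status, d$id, max)

# Share of individuals whose responses are all zero
prop_all_zero <- mean(ever == 0)
prop_all_zero

# Overall event rate, for comparison with the 1-5% quoted in the question
mean(d$status)
```

On real data, the same two lines (tapply() by id, then mean(ever == 0)) give a quick sense of how little within-individual information there is to estimate the random-intercept variance from.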

Resources