glmer mixed models inconsistent between lme4 updates - r

This is a long post. I’m having trouble obtaining consistent model results between lme4 versions. My old computer was running R 3.0.0 “Masked Marvel” with lme4 version 0.999999-2; my new work computer is running R 3.0.2 “Frisbee Sailing” with lme4 version 1.1-2. They are not giving me the same model outputs. Holding back the new lme4, I installed every other package update on my old laptop one by one and re-ran the models. Then I updated R itself and ran the models again. Finally, I made further package updates under R 3.0.2 and ran the models once more. The results remained consistent with the original analysis throughout. The exceptions: I wasn’t able to run the models with the most recent MASS package under R 3.0.0, nor with the most recent nlme package under R 3.0.2. In both cases I received the same error message, shown below:
Error in validObject(.Object) : invalid class "mer" object: Slot L must be a monotonic LL' factorization of size dims['q']
I’m wondering if someone can shed light on this before I commit to lme4 1.1-2 to complete my analyses. There doesn’t appear to be a changelog at the address linked in RStudio: http://cran.rstudio.com/web/packages/lme4/NEWS
Maybe something else has changed on my end? I’ve been using the same data file and setting it up the same way within R.
I am not sure how to attach data, or how to make this easiest for you all to run yourselves; I can do that with some instruction. Below are the outputs for two models before the updates, followed by the same models after. The variation also occurs in models I haven’t included here, some worse than others, and it definitely affects the AIC ranking.
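(A minimal way to share data in a post like this, using the data frame named later in the question, is dput() plus sessionInfo():)
# dput() prints an R expression that rebuilds the object, so readers can
# paste the data straight into their own session; sessionInfo() records the
# R and package versions the output came from.
dput(head(PU3.atristis, 20))
sessionInfo()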
lme4 version 0.999999-2
R version 3.0.0 (2013-04-03) -- "Masked Marvel"
at1500o <- glmer(apred ~
(1 | fFarm) + (1 | Eps) + (1 | fExp),
family = binomial,
data = PU3.atristis)
summary(at1500o)
Number of levels of a grouping factor for the random effects
is *equal* to n, the number of observations
Warning message: In mer_finalize(ans) : false convergence (8)
Generalized linear mixed model fit by the Laplace approximation
Formula: apred ~ (1 | fFarm) + (1 | Eps) + (1 | fExp)
Data: PU3.atristis
AIC BIC logLik deviance
182.2 193 -87.11 174.2
Random effects:
Groups Name Variance Std.Dev.
Eps (Intercept) 1.6630e+01 4.0780e+00
fFarm (Intercept) 2.4393e+00 1.5618e+00
fExp (Intercept) 5.8587e-13 7.6542e-07
Number of obs: 110, groups: Eps, 110; fFarm, 17; fExp, 2
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.1406 0.7126 -8.618 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Despite the warning, false convergence didn’t seem to be a problem at the time.
at1500a <- glmer(apred ~
pctfield1500 * factor(SiteTrt) +
(1 | fFarm) + (1 | Eps) + (1 | fExp),
family = binomial,
data = PU3.atristis)
summary(at1500a)
Number of levels of a grouping factor for the random effects
is *equal* to n, the number of observations
Warning message: In mer_finalize(ans) : false convergence (8)
Generalized linear mixed model fit by the Laplace approximation
Formula: apred ~ pctfield1500 + factor(SiteTrt) + pctfield1500:factor(SiteTrt) + (1 | fFarm) + (1 | Eps) + (1 | fExp)
Data: PU3.atristis
AIC BIC logLik deviance
185.2 209.5 -83.59 167.2
Random effects:
Groups Name Variance Std.Dev.
Eps (Intercept) 2.4539e+01 4.95370375
fFarm (Intercept) 8.1397e-01 0.90220336
fExp (Intercept) 1.3452e-08 0.00011598
Number of obs: 110, groups: Eps, 110; fFarm, 17; fExp, 2
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.82245 2.64044 -2.584 0.00977 **
pctfield1500 -0.01041 0.10849 -0.096 0.92353
factor(SiteTrt)2 2.47654 3.97069 0.624 0.53282
factor(SiteTrt)3 4.33391 4.65551 0.931 0.35190
pctfield1500:factor(SiteTrt)2 -0.05073 0.13117 -0.387 0.69895
pctfield1500:factor(SiteTrt)3 -0.05729 0.14524 -0.394 0.69323
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) pc1500 f(ST)2 f(ST)3 p1500:(ST)2
pctfild1500 -0.876
fctr(StTr)2 -0.665 0.582
fctr(StTr)3 -0.571 0.504 0.380
p1500:(ST)2 0.724 -0.827 -0.850 -0.417
p1500:(ST)3 0.654 -0.747 -0.435 -0.887 0.618
lme4 version 1.1-2
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
at1500o <- glmer(apred ~
(1 | fFarm) + (1 | Eps) + (1 | fExp),
family = binomial,
data = PU3.atristis)
summary(at1500o)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: apred ~ (1 | fFarm) + (1 | Eps) + (1 | fExp)
Data: PU3.atristis
AIC BIC logLik deviance
236.8296 247.6316 -114.4148 228.8296
Random effects:
Groups Name Variance Std.Dev.
Eps (Intercept) 4.799e+01 6.927e+00
fFarm (Intercept) 6.542e-01 8.088e-01
fExp (Intercept) 6.056e-10 2.461e-05
Number of obs: 110, groups: Eps, 110; fFarm, 17; fExp, 2
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.975 1.115 -7.154 8.43e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
No warnings anymore, but everything is quite different. The model fit statement has changed from “the Laplace approximation” to “maximum likelihood ['glmerMod']”.
at1500a <- glmer(apred ~
pctfield1500 * factor(SiteTrt) +
(1 | fFarm) + (1 | Eps) + (1 | fExp),
family = binomial,
data = PU3.atristis)
summary(at1500a)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: binomial ( logit )
Formula: apred ~ pctfield1500 + factor(SiteTrt) + pctfield1500:factor(SiteTrt) + (1 | fFarm) + (1 | Eps) + (1 | fExp)
Data: PU3.atristis
AIC BIC logLik deviance
242.7874 267.0917 -112.3937 224.7874
Random effects:
Groups Name Variance Std.Dev.
Eps (Intercept) 3.346e+01 5.784e+00
fFarm (Intercept) 1.809e-07 4.253e-04
fExp (Intercept) 4.380e-11 6.618e-06
Number of obs: 110, groups: Eps, 110; fFarm, 17; fExp, 2
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.46800 3.04197 -2.455 0.0141 *
pctfield1500 0.00032 0.12301 0.003 0.9979
factor(SiteTrt)2 2.05618 4.57444 0.450 0.6531
factor(SiteTrt)3 6.72139 5.20895 1.290 0.1969
pctfield1500:factor(SiteTrt)2 -0.04882 0.14911 -0.327 0.7434
pctfield1500:factor(SiteTrt)3 -0.11324 0.16495 -0.686 0.4924
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) pc1500 f(ST)2 f(ST)3 p1500:(ST)2
pctfild1500 -0.875
fctr(StTr)2 -0.665 0.582
fctr(StTr)3 -0.584 0.511 0.388
p1500:(ST)2 0.722 -0.825 -0.850 -0.422
p1500:(ST)3 0.653 -0.746 -0.434 -0.886 0.615
Cheers,
Nava

Version 1.0 of the lme4 package marked a major change, with a pretty substantial rework of the underlying fitting code. From the changelog:
Because the internal computational machinery has changed, results from
the newest version of lme4 will not be numerically identical to those
from previous versions. For reasonably well-defined fits, they will
be extremely close (within numerical tolerances of 1e-4 or so), but
for unstable or poorly-defined fits the results may change, and very
unstable fits may fail when they (apparently) succeeded with previous
versions. Similarly, some fits may be slower with the new version,
although on average the new version should be faster and more stable.
More numerical tuning options are now available (see below);
non-default settings may restore the speed and/or ability to fit a
particular model without an error. If you notice significant or
disturbing changes when fitting a model with the new version of lme4,
please notify the maintainers.
Considering that your model fits under lme4 0.999999-2 produced false-convergence warnings, your models may well fall into the "unstable fits" category described here.
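In the new lme4, the numerical tuning options the changelog mentions are exposed through glmerControl(); refitting with a different optimizer is a quick way to see how stable such a fit is. A minimal sketch using the question's first model:
library(lme4)
# Refit with the bobyqa optimizer instead of the default; if the estimates
# move substantially between optimizers, the fit is numerically unstable.
at1500o_b <- glmer(apred ~ (1 | fFarm) + (1 | Eps) + (1 | fExp),
                   family = binomial,
                   data = PU3.atristis,
                   control = glmerControl(optimizer = "bobyqa"))
summary(at1500o_b)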

Related

Tukey post hoc test after glmm error R studio

Our data set consists of 3 periods of time measuring how often monkeys were at different heights in the tree.
After using a generalized linear mixed model on our data set, we want to perform a post hoc test. We want to test whether the monkeys are more often in higher areas in the different periods. We want to use TukeyHSD() to run the Tukey post hoc test, but we get an error:
Error in UseMethod("TukeyHSD") :
no applicable method for 'TukeyHSD' applied to an object of class "c('glmerMod', 'merMod')".
Also, I can't install lsmeans or emmeans because it is not possible with my version of R (even though I just updated R). Does anybody know how to solve this problem?
To do the glmm we used:
output2 <- glmer(StrataNumber ~ ffactor1 + ( 1 | Focal), data = aa, family = "poisson", na.action = "na.fail")
dredge(output2)
dredgeout2 <- dredge(output2)
subset(dredgeout2, delta <6)
summary(output2)
This gave us the following significant results:
> summary(output2)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: poisson ( log )
Formula: StrataNumber ~ ffactor1 + (1 | Focal)
Data: aa
AIC BIC logLik deviance df.resid
9404.4 9428.0 -4698.2 9396.4 2688
Scaled residuals:
Min 1Q Median 3Q Max
-1.78263 -0.33628 0.06559 0.32481 1.37514
Random effects:
Groups Name Variance Std.Dev.
Focal (Intercept) 0.006274 0.07921
Number of obs: 2692, groups: Focal, 7
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.31659 0.03523 37.368 < 2e-16 ***
ffactor12 0.09982 0.02431 4.107 4.01e-05 ***
ffactor13 0.17184 0.02425 7.087 1.37e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) ffct12
ffactor12 -0.403
ffactor13 -0.403 0.585
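(For reference: TukeyHSD() only has methods for aov fits, which is what the error message is saying. A common alternative for merMod objects, assuming the multcomp package can be installed, is glht():)
library(multcomp)
# Tukey-style pairwise comparisons between the ffactor1 levels of the
# fitted glmer model; summary() applies a multiplicity adjustment.
posthoc <- glht(output2, linfct = mcp(ffactor1 = "Tukey"))
summary(posthoc)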

simr: powerSim gives 100% for all effect sizes

I have carried out a binomial GLMM to determine how latitude and native status (native/non-native) of a set of plant species affects herbivory damage. I am now trying to determine the statistical power of my model when I change the effect sizes. My model looks like this:
latglmm <- glmer(cbind(chewing,total.cells-chewing) ~ scale(latitude) * native.status + scale(sample.day.of.year) + (1|genus) + (1|species) + (1|catalogue.number), family=binomial, data=mna)
where cbind(chewing, total.cells - chewing) gives me a proportion (of leaves with herbivory damage), native.status is either "native" or "non-native", and catalogue.number acts as an observation-level random effect to deal with overdispersion. There are 10 genera, each with at least 1 native and 1 non-native species, making 26 species in total. The model summary is:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(chewing, total.cells - chewing) ~ scale(latitude) * native.status +
scale(sample.day.of.year) + (1 | genus) + (1 | species) + (1 | catalogue.number)
Data: mna
AIC BIC logLik deviance df.resid
3986.7 4023.3 -1985.4 3970.7 706
Scaled residuals:
Min 1Q Median 3Q Max
-1.3240 -0.4511 -0.0250 0.1992 1.0765
Random effects:
Groups Name Variance Std.Dev.
catalogue.number (Intercept) 1.26417 1.1244
species (Intercept) 0.08207 0.2865
genus.ID (Intercept) 0.33431 0.5782
Number of obs: 714, groups: catalogue.number, 713; species, 26; genus.ID, 10
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.61310 0.20849 -12.534 < 2e-16 ***
scale(latitude) -0.17283 0.06370 -2.713 0.00666 **
native.statusnon-native 0.11434 0.15554 0.735 0.46226
scale(sample.day.of.year) 0.28521 0.05224 5.460 4.77e-08 ***
scale(latitude):native.statusnon-native -0.02986 0.09916 -0.301 0.76327
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) scallt ntv.s- scaldy
scalelat 0.012
ntv.sttsnn- -0.304 -0.014
scaledoy 0.018 -0.085 -0.027
scllt:ntv.- -0.011 -0.634 0.006 -0.035
I should add that the model I have actually been using is a glmmTMB model, since the lme4 fit still showed some overdispersion even with the observation-level random effect; but glmmTMB is not compatible with simr, so I am using lme4 here (the results are very similar for both). I want to see what happens to the model's power when I increase or decrease the effect sizes of latitude and native status, but when I run fixef(latglmm1)["scale(latitude)"] <- -1 and fixef(latglmm1)["native.statusnon-native"] <- -1 and then try this:
powerSim(latglmm, fcompare(~ scale(latitude) + native.status))
I get the following output:
Power for model comparison, (95% confidence interval):
100.0% (69.15, 100.0)
Test: Likelihood ratio
Comparison to ~scale(latitude) + native.status + [re]
Based on 10 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 1428
Time elapsed: 0 h 1 m 5 s
The output is the same (100% power) no matter what I change fixef() to. Based on other similar questions online, I have ensured that my data has no NA values, and according to my powerSim output there are no warnings or errors to address. I am completely lost as to why this isn't working, so any help would be greatly appreciated!
Alternatively, if anyone has recommendations for other methods to carry out a similar analysis, I would love to hear them. What I really want is a p-value for each effect size I put in, but statistical power would be very valuable too.
Thank you!
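(One detail worth checking in the code above, noted as an observation rather than a confirmed fix: the effect sizes are assigned to fixef(latglmm1), while powerSim() is called on latglmm, so the modified object may never be simulated. A minimal sketch of the usual simr workflow, applied to a single object:)
library(simr)
# simr supplies a fixef<- replacement method, so the effect of interest can
# be overwritten on the SAME object that powerSim() will simulate from.
fixef(latglmm)["scale(latitude)"] <- -0.1
# More simulations give a tighter confidence interval on the power estimate.
powerSim(latglmm, fcompare(~ scale(latitude) + native.status), nsim = 100)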

model comparison interpretation when chi square equals 0

I'm comparing two multilevel models in R using the anova() function. One model contains a control variable and the other an experimental variable. When I compare the two, I get a strange result where the chi-square is 0 and the p-value is 1. I would interpret this as the models not being significantly different, but that doesn't make sense given the data and the other analyses I've done with this experimental variable. Can someone help me understand this output?
To explain the variables, block_order (control) is the counterbalancing of the questions. It's a factor with 5 levels.
team_num is a level 2 random effect; it's the participant's team that they belong to.
cent_team_wm_agg is the team's desire to maintain a healthy weight. It is a continuous variable.
exer_vig is the continuous dependent variable, and it is how often people exercise.
Here's the model comparison output that has me confused:
anova(m2_ev_full_team, m1_ev_control_block_team)
refitting model(s) with ML (instead of REML)
Data: clean_data_0_nona
Models:
m2_ev_full_team: exer_vig ~ 1 + cent_team_wm_agg + (1 | team_num)
m1_ev_control_block_team: exer_vig ~ 1 + block_order + (1 | team_num)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m2_ev_full_team 4 523.75 536.27 -257.88 515.75
m1_ev_control_block_team 8 533.96 559.00 -258.98 517.96 0 4 1
In case this helps, here are the models themselves. This is the one with the experimental variable:
summary(m2_ev_full_team <- lmer(exer_vig ~ 1 + cent_team_wm_agg + (1 |team_num), data = clean_data_0_nona))
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: exer_vig ~ 1 + cent_team_wm_agg + (1 | team_num)
Data: clean_data_0_nona
REML criterion at convergence: 519.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.7585 -0.5819 -0.2432 0.5531 2.5569
Random effects:
Groups Name Variance Std.Dev.
team_num (Intercept) 0.1004 0.3168
Residual 1.1628 1.0783
Number of obs: 169, groups: team_num, 58
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.65955 0.09478 42.39962 28.061 <2e-16 ***
cent_team_wm_agg 0.73291 0.23572 64.27148 3.109 0.0028 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
cnt_tm_wm_g -0.004
And the one with the control:
summary(m1_ev_control_block_team <- lmer(exer_vig ~ 1 + block_order + (1 |team_num), data = clean_data_0_nona))
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: exer_vig ~ 1 + block_order + (1 | team_num)
Data: clean_data_0_nona
REML criterion at convergence: 525.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.6796 -0.6597 -0.1625 0.5291 2.0941
Random effects:
Groups Name Variance Std.Dev.
team_num (Intercept) 0.2499 0.4999
Residual 1.1003 1.0490
Number of obs: 169, groups: team_num, 58
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.0874 0.2513 155.4960 12.284 <2e-16 ***
block_orderBlock2|Block4|Block3 -0.2568 0.3057 154.8652 -0.840 0.4020
block_orderBlock3|Block2|Block4 -0.3036 0.3438 160.8279 -0.883 0.3785
block_orderBlock3|Block4|Block2 -0.6204 0.3225 161.5186 -1.924 0.0561 .
block_orderBlock4|Block2|Block3 -0.4215 0.3081 151.2908 -1.368 0.1733
block_orderBlock4|Block3|Block2 -0.7306 0.3178 156.5548 -2.299 0.0228 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) b_B2|B b_B3|B2 b_B3|B4 b_B4|B2
bl_B2|B4|B3 -0.757
bl_B3|B2|B4 -0.687 0.557
bl_B3|B4|B2 -0.733 0.585 0.543
bl_B4|B2|B3 -0.741 0.601 0.545 0.577
bl_B4|B3|B2 -0.734 0.586 0.535 0.561 0.575
EDIT: If I had to guess, I'd assume it's because the control model has more degrees of freedom than the experimental one, but that's all I can think of. I've tried running anova() with the order of the models flipped, but it doesn't change anything. If that is the cause, I don't know why the number of degrees of freedom would make a difference in being able to compare which model is better.
Thank you!
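(A likely explanation: the likelihood-ratio test in anova() assumes the models are nested, and these two are not, since neither fixed-effect set contains the other. Here the model with more parameters actually has the lower log-likelihood, so the deviance difference is negative and gets reported as a chi-square of 0 with p = 1. For non-nested models, information criteria are the usual comparison; a minimal sketch:)
# Because the fixed effects differ, refit with ML (REML = FALSE) before
# comparing information criteria across the two models.
m2_ml <- update(m2_ev_full_team, REML = FALSE)
m1_ml <- update(m1_ev_control_block_team, REML = FALSE)
AIC(m2_ml, m1_ml)   # lower AIC = better penalized fit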

Convergence warnings in glmer from lme4 package + weird standard errors of estimated fixed coefficients

I am working on a generalized linear mixed model with a binomial family.
My dependent variable is a score ranging from 0 to 12 for each of 107 subjects (63 control subjects and 44 subjects with disease A). The test producing the score is repeated twice, with different versions.
I want to test whether there is a group difference (control vs disease A), whether there is a version difference, and whether there is a group-by-version interaction.
I use glmer from lme4:
glmer(cbind(score, 12-score) ~ gender + age + group + version + group:version + (1|id), data = data, family="binomial")
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(score, 12-score) ~ gender + age + group + version + group:version + (1 | id)
Data: data
AIC BIC logLik deviance df.resid
764.7 788.2 -375.4 750.7 206
Scaled residuals:
Min 1Q Median 3Q Max
-6.1421 -0.6240 0.3693 0.7269 3.4653
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.3852 0.6207
Number of obs: 213, groups: id, 107
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.4707862 0.0017104 2029.2 <2e-16 ***
genderH 0.3402744 0.0017093 199.1 <2e-16 ***
age -0.0152378 0.0009988 -15.3 <2e-16 ***
groupCTRL 0.9554189 0.0017101 558.7 <2e-16 ***
versionunknown -2.0853952 0.0017089 -1220.3 <2e-16 ***
groupCTRL:versionunknown 0.1156636 0.0017092 67.7 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) SexeH age gropNC fmlrtn
SexeH 0.000
age -0.016 -0.006
groupeNC 0.001 0.000 -0.008
familrtncnn 0.000 0.000 -0.013 0.000
grpNC:fmlrt 0.000 0.000 -0.007 0.000 0.000
convergence code: 0
Model failed to converge with max|grad| = 0.0420639 (tol = 0.001, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
I followed the steps at this link to handle the convergence failure: https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html
None of these steps resolved the issue.
Moreover, the standard errors of the estimated fixed coefficients are almost all identical, except for the age coefficient, so I can see something is genuinely wrong here.
Could someone help me?
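(The "very large eigenvalue ... Rescale variables?" hint usually points at a continuous predictor whose numeric scale is much larger than the others; age is the only continuous covariate here, so centring and scaling it is the natural first step. A minimal sketch, using the question's variable names:)
library(lme4)
# Centre and scale age, then refit; bobyqa is often a more robust optimizer
# than the default for binomial glmer fits.
data$age_sc <- as.numeric(scale(data$age))
m_sc <- glmer(cbind(score, 12 - score) ~ gender + age_sc + group + version +
                group:version + (1 | id),
              data = data, family = "binomial",
              control = glmerControl(optimizer = "bobyqa"))
summary(m_sc)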

Reporting results from glmmadmb

I am very new to R, and I want to know whether random effects across the different areas where my research took place could be biasing my results and producing a false positive effect of my conditions. My data are based on natural observations in four different conditions, over 9 areas. It is a between-subjects design: each row is an observation of a different subject.
Condition = factor (4 levels)
Area = random effect (9 levels)
GeneralDisplayingBehaviours = DV
ObsTime = how many minutes each observation lasted
This is what my model looks like:
data2$Condition<-as.factor(data2$Condition)
data2$Area<-as.factor(data2$Area)
data2$Condition<-relevel(data2$Condition,ref="22")
data2$Area<-relevel(data2$Area,ref="1")
mod<-glmmadmb(GeneralDisplayingBehaviours~Condition+ObsTime+(1|Area), data=data2, family="nbinom", zeroInflation=TRUE)
This is the output:
Call:
glmmadmb(formula = GeneralDisplayingBehaviours ~ Condition +
ObsTime + (1 | Area), data = data2, family = "nbinom", zeroInflation = TRUE)
AIC: 2990.1
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6233 0.5019 3.23 0.0012 **
Condition12 1.3291 0.1330 9.99 <2e-16 ***
Condition11 1.2965 0.1294 10.02 <2e-16 ***
Condition21 0.0715 0.1351 0.53 0.5966
ObsTime 0.0829 0.0341 2.43 0.0151 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Number of observations: total=360, Area=9
Random effect variance(s):
Group=Area
Variance StdDev
(Intercept) 8.901e-09 9.434e-05
Negative binomial dispersion parameter: 1.7376 (std. err.: 0.16112)
Zero-inflation: 0.16855 (std. err.: 0.02051 )
Log-likelihood: -1487.06
I would then go on to change the Condition reference level for each level in turn, and I think I would have to do the same with Area.
How would I interpret these results and report them?
What does this tell me about my random effect of area? Does it impact the DV?
Are the conditions significantly different in terms of the DV, and if so, which conditions differ from which?
For the last question, I have a feeling I would need to do multiple comparisons, so how would I do this with glmmADMB?
Thank you
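(On the multiple-comparisons question: the relevel-and-refit approach described above works but produces many unadjusted tests; a sketch of one round is below. The multcomp package's glht() is also reported to accept glmmadmb fits, which would give adjusted Tukey-style contrasts in one step, but treat that as an assumption to verify.)
library(glmmADMB)
# One round of relevel-and-refit: contrasts are now against Condition "12".
data2$Condition <- relevel(data2$Condition, ref = "12")
mod12 <- glmmadmb(GeneralDisplayingBehaviours ~ Condition + ObsTime +
                    (1 | Area),
                  data = data2, family = "nbinom", zeroInflation = TRUE)
summary(mod12)
# If glht() supports glmmadmb objects (an assumption, not verified here),
# all pairwise contrasts with a single adjustment would look like:
# library(multcomp)
# summary(glht(mod, linfct = mcp(Condition = "Tukey")))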
