I'm comparing two multilevel models in R using the anova() function. One model contains a control variable and the other an experimental variable. When I compare the two, I get a strange result: the chi-square is 0 and the p-value is 1. I would interpret this as the models not being significantly different, but that doesn't square with the data or with other analyses I've run with this experimental variable. Can someone help me understand this output?
To explain the variables: block_order (the control) is the counterbalancing of the questions. It's a factor with 5 levels.
team_num is a level-2 random effect; it's the team the participant belongs to.
cent_team_wm_agg is the team's desire to maintain a healthy weight. It is a continuous variable.
exer_vig is the continuous dependent variable: how often people exercise.
Here's the model comparison output that has me confused:
anova(m2_ev_full_team, m1_ev_control_block_team)
refitting model(s) with ML (instead of REML)
Data: clean_data_0_nona
Models:
m2_ev_full_team: exer_vig ~ 1 + cent_team_wm_agg + (1 | team_num)
m1_ev_control_block_team: exer_vig ~ 1 + block_order + (1 | team_num)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m2_ev_full_team 4 523.75 536.27 -257.88 515.75
m1_ev_control_block_team 8 533.96 559.00 -258.98 517.96 0 4 1
In case this helps, here are the models themselves. This is the one with the experimental variable:
summary(m2_ev_full_team <- lmer(exer_vig ~ 1 + cent_team_wm_agg + (1 |team_num), data = clean_data_0_nona))
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: exer_vig ~ 1 + cent_team_wm_agg + (1 | team_num)
Data: clean_data_0_nona
REML criterion at convergence: 519.7
Scaled residuals:
Min 1Q Median 3Q Max
-1.7585 -0.5819 -0.2432 0.5531 2.5569
Random effects:
Groups Name Variance Std.Dev.
team_num (Intercept) 0.1004 0.3168
Residual 1.1628 1.0783
Number of obs: 169, groups: team_num, 58
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.65955 0.09478 42.39962 28.061 <2e-16 ***
cent_team_wm_agg 0.73291 0.23572 64.27148 3.109 0.0028 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
cnt_tm_wm_g -0.004
And the one with the control:
summary(m1_ev_control_block_team <- lmer(exer_vig ~ 1 + block_order + (1 |team_num), data = clean_data_0_nona))
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: exer_vig ~ 1 + block_order + (1 | team_num)
Data: clean_data_0_nona
REML criterion at convergence: 525.1
Scaled residuals:
Min 1Q Median 3Q Max
-1.6796 -0.6597 -0.1625 0.5291 2.0941
Random effects:
Groups Name Variance Std.Dev.
team_num (Intercept) 0.2499 0.4999
Residual 1.1003 1.0490
Number of obs: 169, groups: team_num, 58
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.0874 0.2513 155.4960 12.284 <2e-16 ***
block_orderBlock2|Block4|Block3 -0.2568 0.3057 154.8652 -0.840 0.4020
block_orderBlock3|Block2|Block4 -0.3036 0.3438 160.8279 -0.883 0.3785
block_orderBlock3|Block4|Block2 -0.6204 0.3225 161.5186 -1.924 0.0561 .
block_orderBlock4|Block2|Block3 -0.4215 0.3081 151.2908 -1.368 0.1733
block_orderBlock4|Block3|Block2 -0.7306 0.3178 156.5548 -2.299 0.0228 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) b_B2|B b_B3|B2 b_B3|B4 b_B4|B2
bl_B2|B4|B3 -0.757
bl_B3|B2|B4 -0.687 0.557
bl_B3|B4|B2 -0.733 0.585 0.543
bl_B4|B2|B3 -0.741 0.601 0.545 0.577
bl_B4|B3|B2 -0.734 0.586 0.535 0.561 0.575
EDIT: If I had to guess, I'd assume it's because the control model has more degrees of freedom than the experimental one, but that's all I can think of. I've tried running anova() with the order of the models flipped, but it doesn't change anything. Even if that is the cause, I don't know why the number of dfs would prevent comparing which model is better.
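One thing worth noting about the comparison above: the two models are not nested (neither fixed-effect set contains the other), so the likelihood-ratio test that anova() performs doesn't strictly apply here, and since the model with more parameters actually has the higher deviance, the negative chi-square is reported as 0 with p = 1. A sketch of comparing them on information criteria instead, using the model names from above (refitting with ML assumed so the criteria are comparable across different fixed effects):

```r
# Non-nested models can be compared on AIC/BIC rather than a
# likelihood-ratio test. Refit with ML first, since REML criteria
# are not comparable across models with different fixed effects.
m2_ml <- update(m2_ev_full_team, REML = FALSE)
m1_ml <- update(m1_ev_control_block_team, REML = FALSE)
AIC(m2_ml, m1_ml)  # lower AIC indicates the better-fitting model
BIC(m2_ml, m1_ml)
```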
Thank you!
Related
Our data set consists of 3 time periods measuring how often monkeys were at different heights in the tree.
After fitting a generalized linear mixed model to our data we want to perform a post hoc test: whether the monkeys are more often in higher areas in the different periods. We want to use TukeyHSD() for the Tukey post hoc test, but we get an error:
Error in UseMethod("TukeyHSD") :
no applicable method for 'TukeyHSD' applied to an object of class "c('glmerMod', 'merMod')".
Also, I can't install lsmeans or emmeans because it is not possible with my version of R (even though I just updated R). Does anybody know how to solve this problem?
To do the glmm we used:
output2 <- glmer(StrataNumber ~ ffactor1 + ( 1 | Focal), data = aa, family = "poisson", na.action = "na.fail")
dredge(output2)
dredgeout2 <- dredge(output2)
subset(dredgeout2, delta <6)
summary(output2)
This gave us the following significant results:
> summary(output2)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [
glmerMod]
Family: poisson ( log )
Formula: StrataNumber ~ ffactor1 + (1 | Focal)
Data: aa
AIC BIC logLik deviance df.resid
9404.4 9428.0 -4698.2 9396.4 2688
Scaled residuals:
Min 1Q Median 3Q Max
-1.78263 -0.33628 0.06559 0.32481 1.37514
Random effects:
Groups Name Variance Std.Dev.
Focal (Intercept) 0.006274 0.07921
Number of obs: 2692, groups: Focal, 7
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.31659 0.03523 37.368 < 2e-16 ***
ffactor12 0.09982 0.02431 4.107 4.01e-05 ***
ffactor13 0.17184 0.02425 7.087 1.37e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) ffct12
ffactor12 -0.403
ffactor13 -0.403 0.585
I want to see whether four predictors ("OA_statusclosed", "OA_statusgreen", "OA_statushybrid", "OA_statusbronze") have an effect on "logAlt". I have chosen to fit an lmer model in R to account for random intercepts and slopes by the variable "Journal".
I want to interpret the output in terms of OA status ordered from highest to lowest: green, hybrid, bronze, closed. To do this, I have contrast coded the four levels as follows (adhering to the order of the levels in my df: hybrid, closed, green, bronze):
contrasts(df$OA_status.c) <- c(0.25, -0.5, 0.5, -0.25)
contrasts(df$OA_status.c)
I have run this analysis:
M3 <- lmer(logAlt ~ OA_status + (1|Journal),
data = df,
control=lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
And got this summary(M3):
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: logAlt ~ OA_status + (1 | Journal)
Data: df
Control: lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
REML criterion at convergence: 20873.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.1272 -0.6719 0.0602 0.6618 4.4344
Random effects:
Groups Name Variance Std.Dev.
Journal (Intercept) 0.08848 0.2975
Residual 1.49272 1.2218
Number of obs: 6435, groups: Journal, 7
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.03867 0.15059 18.27727 13.538 5.71e-11 ***
OA_statusclosed -0.97648 0.09915 6428.62227 -9.848 < 2e-16 ***
OA_statusgreen -0.74956 0.10320 6429.65387 -7.263 4.22e-13 ***
OA_statushybrid 0.04621 0.12590 6427.44114 0.367 0.714
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) OA_sttsc OA_sttsg
OA_sttsclsd -0.640
OA_statsgrn -0.613 0.934
OA_sttshybr -0.501 0.763 0.744
I interpret this to mean that, for example, OA_statusclosed is associated with a logAlt value 0.97 lower, on average, than the reference level, and that OA_statusclosed is a significant predictor.
I have two questions:
Am I approaching contrast coding correctly, that is, am I making "OA_statusgreen" my reference level (which is what I think I need to do)?
Am I interpreting the output correctly?
Thanks in advance!
I want to test the effects of island area and land use, and the interaction between island area and land use, on species richness. For land use, I have three groups, namely forest, farmland, and mix. The data come from transects on different islands, so island ID is set as a random effect.
My model looks like this:
model <- lmer(SR ~ Area + land_use + Area:land_use + (1 | islandID), data = transect_ZS)
summary(model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: SR ~ Area + land_use + Area:land_use + (1 | islandID)
Data: transect_ZS
REML criterion at convergence: 184.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.66105 -0.56159 -0.00294 0.57259 1.72096
Random effects:
Groups Name Variance Std.Dev.
islandID (Intercept) 0.1524 0.3903
Residual 0.6805 0.8249
Number of obs: 70, groups: islandID, 34
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -0.9996 0.5187 57.0061 -1.927 0.05893 .
Area 0.9064 0.2834 40.9977 3.198 0.00267 **
land_useforest 0.6563 0.5569 62.0889 1.179 0.24309
land_usemix 0.9611 0.6373 55.3032 1.508 0.13723
Area:land_useforest -0.8318 0.3034 63.4045 -2.742 0.00793 **
Area:land_usemix -0.7756 0.4748 56.3692 -1.633 0.10795
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The results told me that island area and the interaction term have a significant effect on SR:
> anova(model)
Type III Analysis of Variance Table with Satterthwaite's method
              Sum Sq Mean Sq NumDF  DenDF F value  Pr(>F)
Area          3.0359 3.03590     1 27.448  4.4615 0.04390 *
land_use      1.5520 0.77601     2 57.617  1.1404 0.32679
Area:land_use 5.1658 2.58288     2 60.935  3.7958 0.02795 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Then I used the lsmeans function to conduct Tukey's pairwise comparisons:
lsmeans(model, pairwise ~ Area:land_use, adjust = "tukey")
The results indicate that species richness differs significantly between farmland and forest, right? I wonder whether this difference should be read as a significant difference in the intercept of the species richness-area relationship between farmland and forest in this model, i.e., that species richness from farmland transects is higher than that from forest transects?
$contrasts
contrast estimate SE df t.ratio p.value
1.19968425045037 farmland - 1.19968425045037 forest 3.4153 0.288 62.6 1.185 0.0466
1.19968425045037 farmland - 1.19968425045037 mix -0.0306 0.426 64.0 -0.072 0.9972
1.19968425045037 forest - 1.19968425045037 mix -0.3722 0.377 63.9 -0.987 0.5087
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
But how can I test whether the slope of the species richness-area relationship differs significantly between farmland and forest in this model? That is, how can I test whether the species richness-area relationship from farmland transects is steeper than that from forest transects?
I think you want
lstrends(model, pairwise ~ land_use, var = "Area", adjust="tukey")
The functions lsmeans and lstrends are in the emmeans package, where they are equivalent to emmeans and emtrends respectively, so look at the documentation for those functions. The lsmeans package is just a front end.
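In emmeans terms, the same slope comparison would look like this (a sketch, assuming the same `model` object fitted above):

```r
library(emmeans)
# Compare the Area slopes (the species richness-area relationship)
# across the three land-use groups; the pairwise contrasts answer
# whether the farmland slope differs from the forest slope.
emtrends(model, pairwise ~ land_use, var = "Area", adjust = "tukey")
```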
I have fit a model using glmer from the lme4 package. I used the following code to fit the model:
GLMmmia.4 <- glmer(Total_abun ~ EC + DO_sat + TP + Vegetationcover + (1 | Season),
                   data = wetlandmacro, family = poisson,
                   control = lmerControl(optimizer = "bobyqa",
                                         optCtrl = list(maxfun = 2e4)))
> summary(GLMmmia.4)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: poisson ( log )
Formula:
Total_abun ~ EC + DO_sat + TP + Vegetationcover + (1 | Season)
Data: wetlandmacro
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 20000))
AIC BIC logLik deviance df.resid
4817.3 4833.1 -2402.6 4805.3 98
Scaled residuals:
Min 1Q Median 3Q Max
-9.8730 -4.4156 -0.4338 3.1763 19.4737
Random effects:
Groups Name Variance Std.Dev.
Season (Intercept) 0.009201 0.09592
Number of obs: 104, groups: Season, 2
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.0118238 0.0931197 53.82 < 2e-16 ***
EC -0.0011036 0.0001632 -6.76 1.34e-11 ***
DO_sat -0.0009736 0.0003168 -3.07 0.00211 **
TP 0.2050763 0.0422935 4.85 1.24e-06 ***
Vegetationcover -0.0015678 0.0005251 -2.99 0.00283 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) EC DO_sat TP
EC -0.392
DO_sat -0.521 0.313
TP -0.021 0.155 -0.167
Vegetatncvr -0.617 0.293 0.606 -0.142
I would appreciate any help plotting the predicted value of the response variable (species abundance) as a function of each fixed effect (environmental variable).
You can use plot and fitted separately for each of your fixed effects, plotting the fitted values against the predictor from your data frame. For example:
plot(fitted(GLMmmia.4) ~ wetlandmacro$EC)
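For a smoother effect curve, another option is to predict over a grid of one variable while holding the other fixed effects at their means (a sketch, assuming the wetlandmacro column names from the model above; re.form = NA drops the Season random effect, giving population-level predictions):

```r
# Grid over EC, with the other fixed effects held at their means
newdat <- data.frame(
  EC = seq(min(wetlandmacro$EC), max(wetlandmacro$EC), length.out = 100),
  DO_sat = mean(wetlandmacro$DO_sat),
  TP = mean(wetlandmacro$TP),
  Vegetationcover = mean(wetlandmacro$Vegetationcover)
)
# Predict on the response (count) scale, ignoring random effects
newdat$pred <- predict(GLMmmia.4, newdata = newdat,
                       re.form = NA, type = "response")
plot(pred ~ EC, data = newdat, type = "l",
     xlab = "EC", ylab = "Predicted abundance")
```

The same pattern repeats for DO_sat, TP, and Vegetationcover by swapping which variable gets the grid.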
I am working on a generalized linear mixed model with a binomial link.
My dependent variable is a score ranging from 0 to 12 for each of 107 subjects (63 controls and 44 subjects with disease A). The test that produces the score is repeated twice with different versions.
I want to test whether there is a group difference (control vs. disease A), whether there is a version difference, and whether there is an interaction between group and version.
I use glmer from lme4:
glmer(cbind(score, 12-score) ~ gender + age + group + version + group:version + (1|id), data = data, family="binomial")
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(score, 12-score) ~ gender + age + group + version + group:version + (1 | id)
Data: data
AIC BIC logLik deviance df.resid
764.7 788.2 -375.4 750.7 206
Scaled residuals:
Min 1Q Median 3Q Max
-6.1421 -0.6240 0.3693 0.7269 3.4653
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.3852 0.6207
Number of obs: 213, groups: id, 107
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.4707862 0.0017104 2029.2 <2e-16 ***
genderH 0.3402744 0.0017093 199.1 <2e-16 ***
age -0.0152378 0.0009988 -15.3 <2e-16 ***
groupCTRL 0.9554189 0.0017101 558.7 <2e-16 ***
versionunknown -2.0853952 0.0017089 -1220.3 <2e-16 ***
groupCTRL:versionunknown 0.1156636 0.0017092 67.7 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) SexeH age gropNC fmlrtn
SexeH 0.000
age -0.016 -0.006
groupeNC 0.001 0.000 -0.008
familrtncnn 0.000 0.000 -0.013 0.000
grpNC:fmlrt 0.000 0.000 -0.007 0.000 0.000
convergence code: 0
Model failed to converge with max|grad| = 0.0420639 (tol = 0.001, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
I followed the steps at https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html to handle the convergence failure, but none of them resolved the issue.
Moreover, the standard errors of the estimated fixed coefficients are almost identical for every term except age, so I can see something is genuinely wrong here.
Could someone help me?
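One of the things the warning itself suggests is rescaling; a sketch of centering and scaling the continuous predictor before refitting (assuming the data and variable names from the model above):

```r
# The "very large eigenvalue - Rescale variables?" warning often
# points to predictors on very different scales; age is the only
# continuous covariate here, so center and scale it before refitting.
data$age_s <- as.numeric(scale(data$age))
m_scaled <- glmer(cbind(score, 12 - score) ~ gender + age_s +
                    group * version + (1 | id),
                  data = data, family = binomial)
```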