I have a 2x2 factorial design: control vs enriched, and strain1 vs strain2. I wanted to make a linear model, which I did as follows:
anova(lmer(length ~ Strain + Insect + Strain:Insect + BW_final + (1|Pen), data = mydata))
Here length is one of the dependent variables I want to analyse, Strain and Insect are the treatments, Strain:Insect is their interaction, BW_final is a covariate, and Pen is a random effect.
As output I get this:
              Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Strain         3.274   3.274     1    65  0.1215 0.7285
Insect        14.452  14.452     1    65  0.5365 0.4665
BW_final      45.143  45.143     1    65  1.6757 0.2001
Strain:Insect 52.813  52.813     1    65  1.9604 0.1662
As you can see, I only get 1 interaction term: Strain:Insect. However, I'd like to see 4 interaction terms: Strain1:Control, Strain1:Enriched, Strain2:Control, Strain2:Enriched.
Is there any way to do this in R?
Using summary instead of anova I get:
> summary(linearmer)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: length ~ Strain + Insect + Strain:Insect + BW_final + (1 | Pen)
Data: mydata_young
REML criterion at convergence: 424.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.95735 -0.52107 0.07014 0.43928 2.13383
Random effects:
Groups Name Variance Std.Dev.
Pen (Intercept) 0.00 0.00
Residual 26.94 5.19
Number of obs: 70, groups: Pen, 27
Fixed effects:
                    Estimate Std. Error        df t value Pr(>|t|)
(Intercept)       101.646129   7.530496 65.000000  13.498   <2e-16 ***
StrainRoss          0.648688   1.860745 65.000000   0.349    0.729
Insect              0.822454   2.062696 65.000000   0.399    0.691
BW_final           -0.005188   0.004008 65.000000  -1.294    0.200
StrainRoss:Insect  -3.608430   2.577182 65.000000  -1.400    0.166
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) StrnRs Insect BW_fnl
StrainRoss 0.253
Insect -0.275 0.375
BW_final -0.985 -0.378 0.169
StrnRss:Ins 0.071 -0.625 -0.775 0.016
convergence code: 0
boundary (singular) fit: see ?isSingular
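A 2x2 interaction has a single degree of freedom, which is why only one Strain:Insect row appears; the four strain-by-diet combinations are cell means rather than four separate interaction terms. The base-R sketch below illustrates this on simulated data. The data frame `d`, its values and the level names are hypothetical, and a plain `lm` without the covariate and the random effect stands in for the mixed model.

```r
# Hypothetical 2x2 data; names mirror the question, values are simulated
set.seed(42)
d <- expand.grid(Strain = c("strain1", "strain2"),
                 Insect = c("control", "enriched"),
                 rep = 1:10)
d$length <- 100 + 2 * (d$Strain == "strain2") + 1 * (d$Insect == "enriched") -
  3 * (d$Strain == "strain2" & d$Insect == "enriched") + rnorm(nrow(d), sd = 5)

# Stand-in fixed-effects model (no covariate, no random effect)
fit <- lm(length ~ Strain * Insect, data = d)
coef(fit)  # intercept, two main effects, and ONE interaction coefficient

# The four combinations are recovered as predicted cell means
cells <- expand.grid(Strain = levels(d$Strain), Insect = levels(d$Insect))
cells$fitted <- predict(fit, newdata = cells)
cells

# For this saturated model they match the raw cell means
observed <- aggregate(length ~ Strain + Insect, data = d, FUN = mean)
observed
```

For the mixed model in the question, `emmeans::emmeans(model, ~ Strain * Insect)` should give the analogous four means adjusted for BW_final. The summary labels the coefficient `Insect` rather than `Insect<level>`, which suggests Insect is stored as a number; converting it with `factor()` before fitting would probably be needed for that call to list both levels.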
I have fitted a linear mixed model for a split-plot design to assess the effects of water (W), nitrogen (N) and phosphorus (P) on BWC (biomass-weighted 2C-value, obtained by summing the product of each species' 2C-value (DNA content) and its biomass fraction (species subplot biomass / total subplot biomass)):
model1.1<-lmer(log(BWC)~W*N*P+(1|year)+(1|W:Block),data=BWC)
W, N and P each have two levels (0, 1). I would like to report my results with a boxplot alongside the output of the linear mixed model. However, I'm confused by that output.
The estimate (slope) for the WNP term in model1.1 is negative. Does that mean the WNP treatment decreases BWC compared to the control plot? Yet the boxplot shows that BWC was highest under the WNP treatment.
There is also a discrepancy between summary() and anova(), for example in the significance of the N and P effects. The estimate for N is -4.0911, which suggests that N addition decreased BWC, but the N effect is not significant in anova(). How should I report treatment effects such as N?
Many thanks for any comments.
Boxplot of WNP treatment on BWC:
https://i.stack.imgur.com/cKOFt.png
(Sorry for the link; it seems I need at least 10 reputation to post images.)
The summary() and anova() output:
> summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: BWC ~ W * N * P + (1 | year) + (1 | W:Block)
Data: BWC
REML criterion at convergence: 2969.1
Scaled residuals:
Min 1Q Median 3Q Max
-2.93847 -0.71228 -0.07573 0.68191 2.92589
Random effects:
Groups Name Variance Std.Dev.
W:Block (Intercept) 0.9169 0.9575
year (Intercept) 0.8346 0.9136
Residual 18.2966 4.2774
Number of obs: 515, groups: W:Block, 14; year, 10
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 10.8498 0.6985 46.5200 15.532 < 2e-16 ***
W1 2.0844 0.8969 45.9613 2.324 0.02460 *
N1 -4.0911 0.7364 486.0288 -5.556 4.56e-08 ***
P1 -2.0460 0.7600 490.1120 -2.692 0.00734 **
W1:N1 4.6738 1.0394 485.9800 4.497 8.65e-06 ***
W1:P1 0.9695 1.0687 485.9809 0.907 0.36478
N1:P1 5.7550 1.0687 485.9773 5.385 1.13e-07 ***
W1:N1:P1 -3.3306 1.5100 485.9541 -2.206 0.02788 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) W1 N1 P1 W1:N1 W1:P1 N1:P1
W1 -0.645
N1 -0.531 0.414
P1 -0.515 0.401 0.488
W1:N1 0.376 -0.582 -0.708 -0.346
W1:P1 0.366 -0.566 -0.347 -0.706 0.488
N1:P1 0.366 -0.285 -0.689 -0.706 0.488 0.502
W1:N1:P1 -0.259 0.400 0.488 0.499 -0.688 -0.708 -0.708
> anova(model1)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
W 750.15 750.15 1 11.90 40.9995 3.519e-05 ***
N 10.84 10.84 1 485.95 0.5926 0.44177
P 29.14 29.14 1 494.92 1.5926 0.20755
W:N 290.51 290.51 1 485.95 15.8778 7.793e-05 ***
W:P 15.54 15.54 1 485.96 0.8493 0.35721
N:P 536.85 536.85 1 485.95 29.3415 9.562e-08 ***
W:N:P 89.01 89.01 1 485.95 4.8648 0.02788 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> emmeans::emmeans(model1,pairwise~N*P*W)
$emmeans
N P W emmean SE df lower.CL upper.CL
0 0 0 10.85 0.699 46.9 9.44 12.26
1 0 0 6.76 0.696 46.2 5.36 8.16
0 1 0 8.80 0.721 52.1 7.36 10.25
1 1 0 10.47 0.721 52.1 9.02 11.91
0 0 1 12.93 0.696 46.2 11.53 14.33
1 0 1 13.52 0.696 46.2 12.12 14.92
0 1 1 11.86 0.721 52.1 10.41 13.30
1 1 1 14.86 0.721 52.1 13.42 16.31
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
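The apparent contradictions can be checked by arithmetic on the printed estimates. Assuming the default treatment (dummy) coding that the labels W1, N1 and P1 suggest, each cell mean is a sum of coefficients, so W1:N1:P1 on its own is not the WNP-versus-control difference, and N1 is the N effect only in plots with W = 0 and P = 0. The sketch below copies the coefficients from the summary output; the object names are made up for illustration.

```r
# Fixed-effect estimates copied from the summary() output above
b <- c("(Intercept)" = 10.8498, W1 = 2.0844, N1 = -4.0911, P1 = -2.0460,
       "W1:N1" = 4.6738, "W1:P1" = 0.9695, "N1:P1" = 5.7550,
       "W1:N1:P1" = -3.3306)

# All eight treatment combinations, in the same order as the emmeans table
cells <- expand.grid(N = 0:1, P = 0:1, W = 0:1)

# Dummy-coded design matrix; its columns follow the order of b
X <- model.matrix(~ W * N * P, data = cells)
cells$cell_mean <- drop(X %*% b)
cells  # compare with the emmean column printed by emmeans()

# WNP versus control: the difference between two cell means
wnp_vs_control <- cells$cell_mean[8] - cells$cell_mean[1]

# N effect at W = 0, P = 0 (this is what the N1 row of summary() estimates)
n_simple <- cells$cell_mean[2] - cells$cell_mean[1]

# N effect averaged over W and P (closer to what the N row of anova() tests)
n_avg <- mean(cells$cell_mean[cells$N == 1]) - mean(cells$cell_mean[cells$N == 0])

c(wnp_vs_control = wnp_vs_control, n_simple = n_simple, n_avg = n_avg)
```

Read this way, the two outputs answer different questions: the summary row for N1 is a simple effect at the reference levels of W and P, while the Type III anova row for N concerns the effect averaged over the other factors. With interactions this strong, reporting the cell means or simple effects is probably more informative than a single N effect.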
Is there a way to have effect size (such as Cohen's d or the most appropriate) directly using emmeans()?
I cannot find anything on obtaining an effect size with emmeans(). This is what I have so far:
post <- emmeans(fit, pairwise~ favorite.pirate | sex)
emmip(fit, ~ favorite.pirate | sex)
There is not a built-in provision for effect-size calculations, but you can cobble one together by defining a custom contrast function that divides each pairwise comparison by a value of sigma:
# Custom contrast method: pairwise comparisons, each scaled by 1/sigma
mypw.emmc = function(..., sigma = 1) {
    result = emmeans:::pairwise.emmc(...)
    for (i in seq_along(result[1, ]))
        result[[i]] = result[[i]] / sigma
    result
}
Here's a test run:
> mypw.emmc(1:3, sigma = 4)
  1 - 2 1 - 3 2 - 3
1  0.25  0.25  0.00
2 -0.25  0.00  0.25
3  0.00 -0.25 -0.25
With your model, the error SD is 9.246 (look at summary(fit)), so:
> emmeans(fit, mypw ~ sex, sigma = 9.246, name = "effect.size")
NOTE: Results may be misleading due to involvement in interactions
$emmeans
sex emmean SE df lower.CL upper.CL
female 63.8 0.434 3.03 62.4 65.2
male 74.5 0.809 15.82 72.8 76.2
other 68.8 1.439 187.08 65.9 71.6
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
effect.size estimate SE df t.ratio p.value
female - male -1.158 0.0996 399 -11.624 <.0001
female - other -0.537 0.1627 888 -3.299 0.0029
male - other 0.621 0.1717 981 3.617 0.0009
Results are averaged over the levels of: favorite.pirate
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
Some words of caution, though:
1. The SEs of the effect sizes are misleading because they don't account for the variation in sigma.
2. This is not a very good example because
   a. The factors interact (Edward Low is different in his profile). Also, see the warning message.
   b. The model is singular (as warned when the model was fitted), yielding an estimated variance of zero for college.
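More recent releases of emmeans include an `eff_size()` function that computes Cohen's d style effect sizes and, through its `edf` argument, accounts for the uncertainty in sigma. The sketch below assumes an emmeans version that provides it, and it uses a plain `lm` on the built-in `warpbreaks` data as a stand-in, not the pirates model from the question. For a mixed model the choice of `sigma` and `edf` is a judgment call, since there is more than one variance component.

```r
library(emmeans)

# Stand-in model on a built-in data set (not the model from the question)
fit_lm <- lm(breaks ~ wool + tension, data = warpbreaks)

# Estimated marginal means for the three tension levels
emm <- emmeans(fit_lm, ~ tension)

# Pairwise effect sizes: differences divided by the residual SD;
# edf is the degrees of freedom behind the estimate of sigma
es <- eff_size(emm, sigma = sigma(fit_lm), edf = df.residual(fit_lm))
es
```

The confidence intervals printed for `es` are wider than those from simply rescaling the contrasts, because they include the variation in sigma that the first caution above refers to.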
library(yarrr)
View(pirates)
library(lme4)
library(lmerTest)
library(emmeans)  # provides emmeans() and emmip() used below
fit <- lmer(weight~ favorite.pirate * sex +(1|college), data = pirates)
anova(fit, ddf = "Kenward-Roger")
post <- emmeans(fit, pairwise~ sex)
post
I'm an R novice attempting a GLMM and post hoc analysis... help! I've collected binary data on 9 damselflies under 6 light levels: 1 = response to movement of the optomotor drum, 0 = no response. My data were imported into R with the headings 'Animal_ID, light_intensity, response'. Animal ID (1-9) is repeated for each light intensity (3.36-0.61) (see below).
Using the following code (lme4 package), I've fitted a GLMM and found light level to have a significant effect on response:
d = data.frame(id = data[,1], var = data$Light_Intensity, Response = data$Response)
model <- glmer(Response~var+(1|id),family="binomial",data=d)
summary(model)
Returns
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Response ~ var + (1 | Animal_ID)
Data: d
AIC BIC logLik deviance df.resid
66 72 -30 60 51
Scaled residuals:
Min 1Q Median 3Q Max
-3.7704 -0.6050 0.3276 0.5195 1.2463
Random effects:
Groups Name Variance Std.Dev.
Animal_ID (Intercept) 1.645 1.283
Number of obs: 54, groups: Animal_ID, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7406 1.0507 -1.657 0.0976 .
var 1.1114 0.4339 2.561 0.0104 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
var -0.846
Then running:
m1 <- update(model, ~.-var)
anova(model, m1, test = 'Chisq')
Returns
Data: d
Models:
m1: Response ~ (1 | Animal_ID)
model: Response ~ var + (1 | Animal_ID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 3 66.017 71.983 -30.008 60.017 8.5388 1 0.003477 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I've installed the multcomp and lsmeans packages in an attempt at performing a Tukey post hoc to see where the difference is, but have run into difficulties with both.
Running:
summary(glht(m1,linfct=mcp("Animal_ID"="Tukey")))
Returns:
"Error in mcp2matrix(model, linfct = linfct) :
Variable(s) ‘Animal_ID’ have been specified in ‘linfct’ but cannot be found in ‘model’! "
Running:
lsmeans(model,pairwise~Animal_ID,adjust="tukey")
Returns:
"Error in lsmeans.character.ref.grid(object = new("ref.grid", model.info = list( :
No variable named Animal_ID in the reference grid"
I'm aware that I'm probably being very stupid here, but any help would be very much appreciated. My confusion is snowballing.
Also, does anyone have any suggestions as to how I might best visualize my results (and how to do this)?
Thank you very much in advance!
UPDATE:
New code-
Light <- c("3.36","3.36","3.36","3.36","3.36","3.36","3.36","3.36","3.36","2.98","2.98","2.98","2.98","2.98","2.98","2.98","2.98","2.98","2.73","2.73","2.73","2.73","2.73","2.73","2.73","2.73","2.73","2.15","2.15","2.15","2.15","2.15","2.15","2.15","2.15","2.15","1.72","1.72","1.72","1.72","1.72","1.72","1.72","1.72","1.72","0.61","0.61","0.61","0.61","0.61","0.61","0.61","0.61","0.61")
Subject <- c("1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9","1","2","3","4","5","6","7","8","9")
Value <- c("1","0","1","0","1","1","1","0","1","1","0","1","1","1","1","1","1","1","0","1","1","1","1","1","1","0","1","0","0","1","1","1","1","1","1","1","0","0","0","1","0","0","1","0","1","0","0","0","1","1","0","1","0","0")
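# stringsAsFactors = TRUE keeps the columns as factors in R >= 4.0, where strings stay character by default
data <- data.frame(Light, Subject, Value, stringsAsFactors = TRUE)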
library(lme4)
model <- glmer(Value~Light+(1|Subject),family="binomial",data=data)
summary(model)
Returns:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Value ~ Light + (1 | Subject)
Data: data
AIC BIC logLik deviance df.resid
67.5 81.4 -26.7 53.5 47
Scaled residuals:
Min 1Q Median 3Q Max
-2.6564 -0.4884 0.2193 0.3836 1.2418
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 2.687 1.639
Number of obs: 54, groups: Subject, 9
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.070e+00 1.053e+00 -1.016 0.3096
Light1.72 -7.934e-06 1.227e+00 0.000 1.0000
Light2.15 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.73 2.931e+00 1.438e+00 2.038 0.0416 *
Light2.98 4.049e+00 1.699e+00 2.383 0.0172 *
Light3.36 2.111e+00 1.308e+00 1.613 0.1067
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Lg1.72 Lg2.15 Lg2.73 Lg2.98
Light1.72 -0.582
Light2.15 -0.595 0.426
Light2.73 -0.595 0.426 0.555
Light2.98 -0.534 0.361 0.523 0.523
Light3.36 -0.623 0.469 0.553 0.553 0.508
Then running:
m1 <- update(model, ~.-Light)
anova(model, m1, test= 'Chisq')
Returns:
Data: data
Models:
m1: Value ~ (1 | Subject)
model: Value ~ Light + (1 | Subject)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m1 2 72.555 76.533 -34.278 68.555
model 7 67.470 81.393 -26.735 53.470 15.086 5 0.01 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Finally, running:
library(lsmeans)
lsmeans(model,list(pairwise~Light),adjust="tukey")
Returns (it actually works now!):
$`lsmeans of Light`
Light lsmean SE df asymp.LCL asymp.UCL
0.61 -1.070208 1.053277 NA -3.1345922 0.9941771
1.72 -1.070216 1.053277 NA -3.1345997 0.9941687
2.15 1.860339 1.172361 NA -0.4374459 4.1581244
2.73 1.860332 1.172360 NA -0.4374511 4.1581149
2.98 2.978658 1.443987 NA 0.1484964 5.8088196
3.36 1.040537 1.050317 NA -1.0180467 3.0991215
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
$`pairwise differences of contrast`
contrast estimate SE df z.ratio p.value
0.61 - 1.72 7.933829e-06 1.226607 NA 0.000 1.0000
0.61 - 2.15 -2.930547e+00 1.438239 NA -2.038 0.3209
0.61 - 2.73 -2.930539e+00 1.438237 NA -2.038 0.3209
0.61 - 2.98 -4.048866e+00 1.699175 NA -2.383 0.1622
0.61 - 3.36 -2.110745e+00 1.308395 NA -1.613 0.5897
1.72 - 2.15 -2.930555e+00 1.438239 NA -2.038 0.3209
1.72 - 2.73 -2.930547e+00 1.438238 NA -2.038 0.3209
1.72 - 2.98 -4.048874e+00 1.699175 NA -2.383 0.1622
1.72 - 3.36 -2.110753e+00 1.308395 NA -1.613 0.5897
2.15 - 2.73 7.347728e-06 1.357365 NA 0.000 1.0000
2.15 - 2.98 -1.118319e+00 1.548539 NA -0.722 0.9793
2.15 - 3.36 8.198019e-01 1.302947 NA 0.629 0.9889
2.73 - 2.98 -1.118326e+00 1.548538 NA -0.722 0.9793
2.73 - 3.36 8.197945e-01 1.302947 NA 0.629 0.9889
2.98 - 3.36 1.938121e+00 1.529202 NA 1.267 0.8029
Results are given on the log odds ratio (not the response) scale.
P value adjustment: tukey method for comparing a family of 6 estimates
Your model specifies Animal_ID as a random effect. The glht and lsmeans functions work only for fixed-effect comparisons.
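On the visualisation question, one simple option is to plot the observed proportion of responding animals at each light level, which is a fixed-effect summary of the kind the comparison functions work with. The base-R sketch below re-enters the data posted in the update as numbers; the object names are made up for illustration. It is purely descriptive and ignores the repeated measures on each animal.

```r
# Data from the update above, entered as numbers (9 animals x 6 light levels)
light <- rep(c(3.36, 2.98, 2.73, 2.15, 1.72, 0.61), each = 9)
value <- c(1, 0, 1, 0, 1, 1, 1, 0, 1,
           1, 0, 1, 1, 1, 1, 1, 1, 1,
           0, 1, 1, 1, 1, 1, 1, 0, 1,
           0, 0, 1, 1, 1, 1, 1, 1, 1,
           0, 0, 0, 1, 0, 0, 1, 0, 1,
           0, 0, 0, 1, 1, 0, 1, 0, 0)

# Observed proportion of animals responding at each light level
prop <- tapply(value, light, mean)
prop

plot(as.numeric(names(prop)), prop, type = "b", ylim = c(0, 1),
     xlab = "Light intensity", ylab = "Proportion responding")

# The lsmeans printed above are on the logit scale; plogis() maps them to
# probabilities for an animal with a random effect of zero
lsm_logit <- c("0.61" = -1.070208, "1.72" = -1.070216, "2.15" = 1.860339,
               "2.73" = 1.860332, "2.98" = 2.978658, "3.36" = 1.040537)
round(plogis(lsm_logit), 2)
```

The back-transformed values differ from the raw proportions because they are conditional on the random effect rather than averaged over animals.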
I am getting convergence warnings and a very large group variance while fitting a binary logistic GLMM using lme4. I am wondering whether this could be related to (quasi-)complete separation with respect to the random effect, i.e., the fact that many individuals (the random effect/grouping variable) have only 0s in the dependent variable, resulting in low within-individual variation. If this could be the problem, are there alternative modelling strategies to deal with such cases?
More precisely, I am studying the chance that an individual is observed in a given status (having children while living with their parents) at a given age. In other words, I have several observations for each individual (typically 50), each specifying whether the individual was observed in this state at a given age. Here is an example:
id age status
1 21 0
1 22 0
1 23 0
1 24 1
1 25 0
1 26 1
1 27 0
...
The chance to observe a status of 1 is quite low (between 1 and 5% depending on the cases) and I have a lot of observations (150'000 observations and 3'000 individuals).
The model was fitted using glmer specifying ID (individual) as a random effect and including some explanatory factors (age categories, parental education and the period where the status was observed). I get the following convergence warnings (except when using nAGQ=0) and very large group variance (here more than 25).
"Model failed to converge with max|grad| = 2.21808 (tol = 0.001, component 2)"
"Model is nearly unidentifiable: very large eigenvalue\n - Rescale variables?"
Here is the obtained model.
AIC BIC logLik deviance df.resid
9625.0 9724.3 -4802.5 9605.0 151215
Scaled residuals:
Min 1Q Median 3Q Max
-2.529 -0.003 -0.002 -0.001 47.081
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 28.94 5.38
Number of obs: 151225, groups: id, 3067
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.603822 0.496392 -21.362 < 2e-16 ***
agecat[18,21) -0.413018 0.075119 -5.498 3.84e-08 ***
agecat[21,24) -1.460205 0.095315 -15.320 < 2e-16 ***
agecat[24,27) -2.844713 0.137484 -20.691 < 2e-16 ***
agecat[27,30) -3.837227 0.199644 -19.220 < 2e-16 ***
parent_educ -0.007390 0.003609 -2.048 0.0406 *
period_cat80's 0.126521 0.113044 1.119 0.2630
period_cat90's -0.105139 0.176732 -0.595 0.5519
period_cat00's -0.507052 0.263580 -1.924 0.0544 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) a[18,2 a[21,2 a[24,2 a[27,3 prnt_d pr_80' pr_90'
agct[18,21) -0.038
agct[21,24) -0.006 0.521
agct[24,27) 0.006 0.412 0.475
agct[27,30) 0.011 0.325 0.393 0.378
parent_educ -0.557 0.059 0.087 0.084 0.078
perd_ct80's -0.075 -0.258 -0.372 -0.380 -0.352 -0.104
perd_ct90's -0.048 -0.302 -0.463 -0.471 -0.448 -0.151 0.732
perd_ct00's -0.019 -0.293 -0.459 -0.434 -0.404 -0.138 0.559 0.739
You could try one of the different optimizers available through the nloptr and optimx packages. There is even an allFit function that tries them for you (originally available through the afex package, it is also included in current versions of lme4; see the allFit help file), e.g.:
all_mod <- allFit(exist_model)
That will let you check how stable your estimates are. This points over to more resources on the gradient topic.
If you're worried about complete separation, see here for Ben Bolker's answer to use the bglmer function from the blme package. It operates much like glmer, but allows you to add priors to the model specification.
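A minimal sketch of the optimizer check, assuming a current lme4 in which `allFit()` is available. It uses the `cbpp` data set that ships with lme4 as a stand-in, since the data from the question are not available. Optimizers from packages that are not installed are skipped.

```r
library(lme4)

# Stand-in binomial GLMM on a data set shipped with lme4
fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             family = binomial, data = cbpp)

# Refit the same model with every available optimizer
all_mod <- allFit(fit)
ss <- summary(all_mod)

ss$which.OK  # which optimizers ran without error
ss$fixef     # fixed-effect estimates, one row per optimizer
ss$llik      # log-likelihoods, one per optimizer
```

If the fixed effects and log-likelihoods agree closely across optimizers, the convergence warning is probably a false positive; large differences would point to a real problem with the model or the data.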