lme4 deviation/treatment contrast coding with interactions in R - levels are missing

I have a mixed-effects model (fitted with lme4) with a two-way interaction term. Each factor has four levels, and I would like to investigate their effects relative to the grand mean. Here is an example using the Cars93 data set; I omit the error term since it is not necessary for this example:
## shorten data frame for simplicity
df=Cars93[c(1:15),]
df=Cars93[is.element(Cars93$Make,c('Acura Integra', 'Audi 90','BMW 535i','Subaru Legacy')),]
df$Make=drop.levels(df$Make)
df$Model=drop.levels(df$Model)
## define contrasts (every factor has 4 levels)
contrasts(df$Make) = contr.treatment(4)
contrasts(df$Model) = contr.treatment(4)
## model
m1 <- lm(Price ~ Model*Make,data=df)
summary(m1)
As you can see, the first levels are omitted in the interaction term. I would like to have all 4 levels in the output, referenced to the grand mean (often referred to as deviation coding). These are the sources I looked at: https://marissabarlaz.github.io/portfolio/contrastcoding/#coding-schemes and "How to change contrasts to compare with mean of all levels rather than reference level (R, lmer)?". The latter does not cover interactions, though.

The simple answer is that what you want is not possible directly. You have to use a slightly different approach.
In a model with interactions, you want to use contrasts that sum to zero, so that the intercept is the grand mean and not a specific level. Otherwise, the lower-order effects (i.e., main effects) are not main effects but simple effects (evaluated when the other factor is at its reference level). This is explained in more detail in my chapter on mixed models:
http://singmann.org/download/publications/singmann_kellen-introduction-mixed-models.pdf
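The difference between the two coding schemes can be checked with a minimal base-R sketch. It uses the balanced warpbreaks data; the object names (m_trt, m_sum, cell_means) are chosen here for illustration only.

```r
# Base R only; warpbreaks is balanced (9 observations per cell).
ctr_trt <- list(wool = "contr.treatment", tension = "contr.treatment")
ctr_sum <- list(wool = "contr.sum", tension = "contr.sum")

m_trt <- lm(breaks ~ wool * tension, data = warpbreaks, contrasts = ctr_trt)
m_sum <- lm(breaks ~ wool * tension, data = warpbreaks, contrasts = ctr_sum)

# Observed cell means (2 wool levels x 3 tension levels)
cell_means <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))

# Treatment coding: the intercept is the reference cell (wool A, tension L)
# and "woolB" is the simple effect of wool at tension L only.
int_trt  <- coef(m_trt)[["(Intercept)"]]
wool_trt <- coef(m_trt)[["woolB"]]

# Sum-to-zero coding: the intercept is the unweighted mean of the cell means
# and "wool1" is the deviation of wool A from it, averaged over tension.
int_sum  <- coef(m_sum)[["(Intercept)"]]
wool_sum <- coef(m_sum)[["wool1"]]

c(int_trt = int_trt, int_sum = int_sum, wool_trt = wool_trt, wool_sum = wool_sum)
```

Under treatment coding the "wool" coefficient describes only one row of the design, which is why it should not be read as a main effect when an interaction is present.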
To get what you want, you have to fit the model in a reasonable manner and then pass it to emmeans to compare against the intercept (i.e., the unweighted grand mean). This also works for interactions, as shown below (as your code did not work, I use warpbreaks).
afex::set_sum_contrasts() ## uses contr.sum globally
library("emmeans")
## model
m1 <- lm(breaks ~ wool * tension,data=warpbreaks)
car::Anova(m1, type = 3)
coef(m1)[1]
# (Intercept)
# 28.14815
## both CIs include grand mean:
emmeans(m1, "wool")
# wool emmean SE df lower.CL upper.CL
# A 31.0 2.11 48 26.8 35.3
# B 25.3 2.11 48 21.0 29.5
#
# Results are averaged over the levels of: tension
# Confidence level used: 0.95
## same using test
emmeans(m1, "wool", null = coef(m1)[1], infer = TRUE)
# wool emmean SE df lower.CL upper.CL null t.ratio p.value
# A 31.0 2.11 48 26.8 35.3 28.1 1.372 0.1764
# B 25.3 2.11 48 21.0 29.5 28.1 -1.372 0.1764
#
# Results are averaged over the levels of: tension
# Confidence level used: 0.95
emmeans(m1, "tension", null = coef(m1)[1], infer = TRUE)
# tension emmean SE df lower.CL upper.CL null t.ratio p.value
# L 36.4 2.58 48 31.2 41.6 28.1 3.196 0.0025
# M 26.4 2.58 48 21.2 31.6 28.1 -0.682 0.4984
# H 21.7 2.58 48 16.5 26.9 28.1 -2.514 0.0154
#
# Results are averaged over the levels of: wool
# Confidence level used: 0.95
emmeans(m1, c("tension", "wool"), null = coef(m1)[1], infer = TRUE)
# tension wool emmean SE df lower.CL upper.CL null t.ratio p.value
# L A 44.6 3.65 48 37.2 51.9 28.1 4.499 <.0001
# M A 24.0 3.65 48 16.7 31.3 28.1 -1.137 0.2610
# H A 24.6 3.65 48 17.2 31.9 28.1 -0.985 0.3295
# L B 28.2 3.65 48 20.9 35.6 28.1 0.020 0.9839
# M B 28.8 3.65 48 21.4 36.1 28.1 0.173 0.8636
# H B 18.8 3.65 48 11.4 26.1 28.1 -2.570 0.0133
#
# Confidence level used: 0.95
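As a possible alternative, sketched here and not part of the original answer: the null argument treats the estimated intercept as a fixed number, so its own uncertainty is ignored. The "eff" contrast method in emmeans instead estimates each level's deviation from the average of all levels directly. The object names below are illustrative.

```r
library(emmeans)

# emmeans results do not depend on the contrast coding used in the fit
m1 <- lm(breaks ~ wool * tension, data = warpbreaks)

# Marginal means for tension, averaged over wool
emm_tension <- emmeans(m1, "tension")

# "eff" contrasts: each level minus the mean of all levels
eff_tension <- contrast(emm_tension, method = "eff")
eff_tension
```

The estimates are the deviations from the grand mean, and the standard errors account for the fact that the grand mean is itself estimated from the same data.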
Note that for lme4 models you probably want to use fixef() instead of coef().
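A small sketch of why this matters, assuming the sleepstudy data that ships with lme4 (the model is only an illustration, not the asker's model): for a fitted lmer model, coef() returns per-group coefficients, so coef(m)[1] is not the population-level intercept.

```r
library(lme4)

# Illustrative random-intercept model on lme4's built-in sleepstudy data
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Population-level (fixed-effect) coefficients: a named numeric vector
fixef(m)

# coef() returns one data frame per grouping factor, with one row per group
head(coef(m)$Subject)

# The value to pass as `null` would therefore come from fixef()
grand_intercept <- fixef(m)[["(Intercept)"]]
```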

Related

How to evaluate a string variable as factor in the emmeans() command in R?

I would like to pass a variable holding the name of a factor from an ANOVA model to emmeans(). Here I use the oranges dataset (shipped with emmeans) to make the code reproducible. This is my model and how I would usually calculate the emmeans of the factor store:
library(emmeans)
oranges$store<-as.factor(oranges$store)
model <- lm (sales1 ~ 1 + price1 + store ,data=oranges)
means<-emmeans(model, pairwise ~ store, adjust="tukey")
Now I would like to assign a variable (lsmeanfact) defining the factor for which the lsmeans are calculated.
lsmeanfact<-"store"
However, when I try to evaluate this variable inside emmeans(), it returns an error: it does not find the variable lsmeanfact, so the variable is never evaluated.
means<-emmeans(model, pairwise ~ eval(parse(lsmeanfact)), adjust="tukey")
Error in emmeans(model, pairwise ~ eval(parse(lsmeanfact)), adjust = "tukey") :
No variable named lsmeanfact in the reference grid
How should I change my code so that lsmeanfact is evaluated and the lsmeans for "store" are calculated correctly?
You can use the reformulate() function.
library(emmeans)
lsmeanfact<-"store"
means <- emmeans(model, reformulate(lsmeanfact, 'pairwise'), adjust="tukey")
Or construct a formula with formula/as.formula.
means <- emmeans(model, formula(paste('pairwise', lsmeanfact, sep = '~')), adjust="tukey")
Here both reformulate(lsmeanfact, 'pairwise') and formula(paste('pairwise', lsmeanfact, sep = '~')) return pairwise ~ store.
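The same formula construction extends to several factor names. This is a base-R sketch of the formula building only; the second name, "day", is assumed purely for illustration.

```r
# Factor names held as strings ("day" is a hypothetical second factor)
factor_names <- c("store", "day")

# One factor: pairwise ~ store
f_one <- reformulate(factor_names[1], response = "pairwise")

# Several factors: reformulate() joins the terms with "+"
f_two <- reformulate(factor_names, response = "pairwise")

# Equivalent construction via paste() and as.formula()
f_paste <- as.formula(paste("pairwise ~", paste(factor_names, collapse = " + ")))

f_one
f_two
```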
You do not need to do anything special at all. The specs argument to emmeans() can be a character value. You can get the pairwise comparisons in a separate call, which is actually a better way to go anyway.
library(emmeans)
model <- lm(sales1 ~ price1 + store, data = oranges)
lsmeanfact <- "store"
( EMM <- emmeans(model, lsmeanfact) )
## store emmean SE df lower.CL upper.CL
## 1 8.01 2.61 29 2.67 13.3
## 2 9.60 2.30 29 4.89 14.3
## 3 7.84 2.30 29 3.13 12.6
## 4 10.44 2.35 29 5.63 15.2
## 5 10.19 2.28 29 5.53 14.9
## 6 15.22 2.28 29 10.56 19.9
##
## Confidence level used: 0.95
pairs(EMM)
## contrast estimate SE df t.ratio p.value
## 1 - 2 -1.595 3.60 29 -0.443 0.9976
## 1 - 3 0.165 3.60 29 0.046 1.0000
## 1 - 4 -2.428 3.72 29 -0.653 0.9856
## 1 - 5 -2.185 3.50 29 -0.625 0.9882
## 1 - 6 -7.209 3.45 29 -2.089 0.3206
## 2 - 3 1.761 3.22 29 0.546 0.9936
## 2 - 4 -0.833 3.23 29 -0.258 0.9998
## 2 - 5 -0.590 3.23 29 -0.182 1.0000
## 2 - 6 -5.614 3.24 29 -1.730 0.5239
## 3 - 4 -2.593 3.23 29 -0.802 0.9648
## 3 - 5 -2.350 3.23 29 -0.727 0.9769
## 3 - 6 -7.375 3.24 29 -2.273 0.2373
## 4 - 5 0.243 3.26 29 0.075 1.0000
## 4 - 6 -4.781 3.28 29 -1.457 0.6930
## 5 - 6 -5.024 3.23 29 -1.558 0.6314
##
## P value adjustment: tukey method for comparing a family of 6 estimates
Created on 2021-06-29 by the reprex package (v2.0.0)
Moreover, what specs needs in any case are the name(s) of the factors involved, not the factors themselves. Note also that it was unnecessary to convert store to a factor before fitting the model.
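Because specs accepts character values, looping over factor names needs no formula manipulation. A minimal sketch, assuming the warpbreaks data and a hypothetical helper named pairwise_for():

```r
library(emmeans)

model <- lm(breaks ~ wool + tension, data = warpbreaks)

# Hypothetical helper: pairwise comparisons for a factor given by name
pairwise_for <- function(model, factor_name) {
  emm <- emmeans(model, specs = factor_name)  # specs is a character string
  pairs(emm)                                  # Tukey-adjusted by default
}

# Apply the helper to each factor name in turn
results <- lapply(c(wool = "wool", tension = "tension"), pairwise_for, model = model)
results$tension
```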

Effect size (Cohen's d) for pairwise comparisons

I'm trying to calculate the effect size among different factor levels. To compare the two means within each factor level, the code below works fine:
cohens_d_list <- by(mydata, mydata$factor, function(sub)
cohens_d(sub$score1, sub$score2)
)
cohens_d_list
However, I couldn't figure out how to compare the factor levels with each other on a single score (e.g., for score1: level 1 vs. level 2, level 1 vs. level 3, level 1 vs. level 4, and so on). I tried the psych, effectsize, and effsize packages, but they don't seem to handle more than 2 levels in a single factor variable. Any suggestions for code or a package?
After trying dozens of packages, the esvis package did the trick.
df%>%
ungroup(Group)%>% # Include this line if you get grouping error
coh_d(score1~ Group)
You get a nice table with all possible comparisons.
You can fit a model and use the eff_size() function from emmeans (which will have the benefit of using the pooled SD from all groups, not just the 2 being compared):
m <- lm(mpg ~ factor(cyl), data = mtcars)
library(emmeans)
(em <- emmeans(m, ~ cyl))
#> cyl emmean SE df lower.CL upper.CL
#> 4 26.7 0.972 29 24.7 28.7
#> 6 19.7 1.218 29 17.3 22.2
#> 8 15.1 0.861 29 13.3 16.9
#>
#> Confidence level used: 0.95
eff_size(em, sigma = sigma(m), edf = df.residual(m))
#> contrast effect.size SE df lower.CL upper.CL
#> 4 - 6 2.15 0.56 29 1.003 3.29
#> 4 - 8 3.59 0.62 29 2.320 4.86
#> 6 - 8 1.44 0.50 29 0.418 2.46
#>
#> sigma used for effect sizes: 3.223
#> Confidence level used: 0.95
Created on 2021-06-07 by the reprex package (v2.0.0)
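To make explicit what these effect sizes are, the point estimates can be recomputed by hand in base R: each one is a difference in group means divided by the residual SD of the model, which is pooled over all three groups. This is a sketch of the point estimates only; it does not reproduce the standard errors or confidence intervals.

```r
m <- lm(mpg ~ factor(cyl), data = mtcars)

# Group means of mpg by number of cylinders
means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# All pairs of levels: columns are ("4","6"), ("4","8"), ("6","8")
level_pairs <- combn(names(means), 2)

# Difference in means divided by the pooled residual SD
d <- apply(level_pairs, 2, function(p) (means[[p[1]]] - means[[p[2]]]) / sigma(m))
names(d) <- apply(level_pairs, 2, paste, collapse = " - ")

round(d, 2)
```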

Is the emmeans (R) intercept-only function broken?

I've noticed that emmeans (in R) isn't working for an intercept-only estimate after the latest update.
Reproducible example:
test=lm(mpg~1,mtcars)
library(emmeans)
emmeans::emmeans(test,~1)
The output on 2 of my machines (windows and Linux) is:
> emmeans::emmeans(test,~1)
Error in `[[<-.data.frame`(`*tmp*`, ".wgt.", value = 2) :
replacement has 1 row, data has 0
Is this a known issue, or have I messed up my system somehow?
This used to work I believe.
It does work if you include a variable:
test2=lm(mpg~as.factor(cyl),mtcars)
emmeans(test2,~cyl)
Thanks very much for the help in advance.
It turns out that the fix for issue #197, incorporated in CRAN version 1.4.7, created the issue (#206) that we see here. I think I have them both fixed now:
require(emmeans)
## Loading required package: emmeans
#206...
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
emmeans(warp.lm, "1")
## 1 emmean SE df lower.CL upper.CL
## overall 28.1 1.49 48 25.2 31.1
##
## Results are averaged over the levels of: wool, tension
## Confidence level used: 0.95
emmeans(warp.lm, "1", by = "wool")
## wool = A:
## 1 emmean SE df lower.CL upper.CL
## overall 31.0 2.11 48 26.8 35.3
##
## wool = B:
## 1 emmean SE df lower.CL upper.CL
## overall 25.3 2.11 48 21.0 29.5
##
## Results are averaged over the levels of: tension
## Confidence level used: 0.95
#197...
model <- lm(Sepal.Length ~ poly(Petal.Length,2), data = iris)
emtrends(model, ~ 1, "Petal.Length", max.degree = 2)
## degree = linear:
## 1 Petal.Length.trend SE df lower.CL upper.CL
## overall 0.4474 0.0180 147 0.4119 0.483
##
## degree = quadratic:
## 1 Petal.Length.trend SE df lower.CL upper.CL
## overall 0.0815 0.0132 147 0.0554 0.108
##
## Confidence level used: 0.95
Created on 2020-06-01 by the reprex package (v0.3.0)
Users who need this now can install from GitHub via
remotes::install_github("rvlenth/emmeans")
It is working fine with emmeans - 1.4.6 on macOS Catalina 10.15.4 and R 4.0
emmeans::emmeans(test,~1)
# 1 emmean SE df lower.CL upper.CL
# overall 20.1 1.07 31 17.9 22.3
#Confidence level used: 0.95
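On an affected version, the same intercept-only estimate can be obtained without emmeans. A minimal base-R sketch:

```r
test <- lm(mpg ~ 1, data = mtcars)

# Estimate, standard error, and residual degrees of freedom of the overall mean
est <- coef(test)[["(Intercept)"]]
se  <- sqrt(vcov(test)[1, 1])
df  <- df.residual(test)

# 95% confidence interval for the intercept
ci <- confint(test, level = 0.95)

round(c(emmean = est, SE = se, df = df, lower.CL = ci[1, 1], upper.CL = ci[1, 2]), 2)
```

For an intercept-only model the estimate is simply the sample mean, and its standard error is the sample SD divided by the square root of n.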

How to get absolute difference estimate and confidence intervals from log(x+1) variable with emmeans

I have a mixed-effects model with a log(x+1)-transformed response variable. The output from emmeans with type = "response" provides the mean and confidence interval for each of the two groups that I am comparing. However, what I want is the mean and CI of the difference between the groups (i.e., the estimate). emmeans only provides the ratio (with type = "response") or the log ratio (with type = "link"), and I am unsure how to change this into absolute values. If you run the model without the log(x+1) transformation, emmeans provides the estimated difference and the CI around this difference, not the ratio. How can I get the same when my response variable is log(x+1) transformed?
bmnameF.lme2 = lme(log(bm+1)~TorC*name, random=~TorC|site,
data=matched.cases3F, method='REML')
emmeans(lme, pairwise~TorC,
type='response')%>%confint(OmeanFHR[[2]])%>%as.data.frame
emmeans.TorC emmeans.emmean emmeans.SE emmeans.df emmeans.lower.CL emmeans.upper.CL contrasts.contrast contrasts.estimate contrasts.SE contrasts.df contrasts.lower.CL contrasts.upper.CL
Managed 376.5484 98.66305 25 219.5120 645.9267 Managed - Open 3.390123 1.068689 217 1.821298 6.310297
Open 111.0722 43.15374 25 49.8994 247.2381 Managed - Open 3.390123 1.068689 217 1.821298 6.310297
Let me show a different example so the results are reproducible to all viewers:
mod = lm(log(breaks+1) ~ wool*tension, data = warpbreaks)
As you see, with a log transformation, comparisons/contrasts are expressed as ratios by default. But this can be changed by specifying transform instead of type in the emmeans() call:
> emmeans(mod, pairwise ~ tension|wool, transform = "response")
$emmeans
wool = A:
tension response SE df lower.CL upper.CL
L 42.3 5.06 48 32.1 52.4
M 23.6 2.83 48 17.9 29.3
H 23.7 2.83 48 18.0 29.4
wool = B:
tension response SE df lower.CL upper.CL
L 27.7 3.32 48 21.0 34.4
M 28.4 3.40 48 21.6 35.3
H 19.3 2.31 48 14.6 23.9
Confidence level used: 0.95
$contrasts
wool = A:
contrast estimate SE df t.ratio p.value
L - M 18.6253 5.80 48 3.213 0.0065
L - H 18.5775 5.80 48 3.204 0.0067
M - H -0.0479 4.01 48 -0.012 0.9999
wool = B:
contrast estimate SE df t.ratio p.value
L - M -0.7180 4.75 48 -0.151 0.9875
L - H 8.4247 4.04 48 2.086 0.1035
M - H 9.1426 4.11 48 2.224 0.0772
P value adjustment: tukey method for comparing a family of 3 estimates
Or, you can do this later via the regrid() function:
emm1 = emmeans(mod, ~ tension | wool)
emm2 = regrid(emm1)
emm2 # estimates
pairs(emm2) # comparisons
regrid() creates a new emmGrid object where everything is already back-transformed, thus side-stepping the behavior that happens with contrasts of log-transformed results. (In the previous illustration, the transform argument just calls regrid after it constructs the reference grid.)
But there is another subtle thing going on: The transformation is auto-detected as log; the +1 part is ignored. Thus, the back-transformed estimates are all too large by 1. To get this right, you need to use the make.tran() function to create this generalization of the log transformation:
> emm3 = update(emmeans(mod, ~ tension | wool), tran = make.tran("genlog", 1))
> str(emm3)
'emmGrid' object with variables:
tension = L, M, H
wool = A, B
Transformation: “log(mu + 1)”
> regrid(emm3)
wool = A:
tension response SE df lower.CL upper.CL
L 41.3 5.06 48 31.1 51.4
M 22.6 2.83 48 16.9 28.3
H 22.7 2.83 48 17.0 28.4
wool = B:
tension response SE df lower.CL upper.CL
L 26.7 3.32 48 20.0 33.4
M 27.4 3.40 48 20.6 34.3
H 18.3 2.31 48 13.6 22.9
Confidence level used: 0.95
The comparisons will come out the same as shown earlier, because offsetting all the means by 1 doesn't affect the pairwise differences.
See vignette("transformations", "emmeans") or https://cran.r-project.org/web/packages/emmeans/vignettes/transformations.html for more details.
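The arithmetic behind the "+1" issue can be checked in base R. This sketch reproduces the point estimates only, not the standard errors, and the names nd, naive, and correct are illustrative.

```r
mod <- lm(log(breaks + 1) ~ wool * tension, data = warpbreaks)

# One row per wool x tension cell
nd <- expand.grid(wool = levels(warpbreaks$wool),
                  tension = levels(warpbreaks$tension))

# Predictions on the log(breaks + 1) scale
eta <- predict(mod, newdata = nd)

naive   <- exp(eta)      # inverse of log(x): what auto-detection as "log" gives
correct <- exp(eta) - 1  # inverse of log(x + 1)

cbind(nd, naive = round(naive, 1), correct = round(correct, 1))

# The constant offset cancels in any pairwise difference
diff_naive   <- naive[1] - naive[2]
diff_correct <- correct[1] - correct[2]
```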

How to get solution for mixed model using nlme package

My data look like this
Study NDF ADF CP Eeff
1 35.8 24.4 18.6 34.83181476
1 35.8 24.4 18.6 33.76824264
1 35.8 24.4 18.6 32.67390287
1 35.8 24.4 18.6 33.05520666
2 39.7 23.4 16.1 33.19730252
2 39.4 22.9 16.3 34.04709188
3 28.9 20.6 18.7 33.22501606
3 27.1 18.9 17.9 33.80766289
Of course, I have 80 lines like this.
I used the lme function to fit a mixed model (Study as a random effect), as follows:
fm1<-lme(Eeff~NDF+ADF+CP,random=~1|Study, data=na.omit(phuong))
I got this result:
Fixed effects: Ratio ~ ADF + CP + FCM + DMI + DIM
Value Std.Error DF t-value p-value
(Intercept) 3.1199808 0.16237303 158 19.214896 0.0000
ADF -0.0265626 0.00406990 158 -6.526603 0.0000
CP -0.0534021 0.00539108 158 -9.905636 0.0000
FCM -0.0149314 0.00353524 158 -4.223598 0.0000
DMI 0.0072318 0.00498779 158 1.449894 0.1491
DIM -0.0008994 0.00019408 158 -4.634076 0.0000
Correlation:
(Intr) ADF CP FCM DMI
ADF -0.628
CP -0.515 0.089
FCM -0.299 0.269 -0.203
DMI -0.229 -0.145 0.083 -0.624
DIM -0.113 0.127 -0.061 0.010 -0.047
These results are for the case where the intercept is random but the slopes are fixed. How can I see the individual intercepts (one per study), as in the output below, which I got when I used Study as a fixed effect?
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0021083 0.0102536 -0.206 0.837351
ADF 0.0005248 0.0002962 1.772 0.078313 .
CP 0.0021131 0.0003277 6.448 1.26e-09 ***
factor(Study)2 0.0057274 0.0038709 1.480 0.140933
factor(Study)3 0.0117722 0.0035262 3.338 0.001046 **
factor(Study)4 0.0091049 0.0043227 2.106 0.036730 *
factor(Study)6 0.0149733 0.0045345 3.302 0.001182 **
factor(Study)7 0.0065518 0.0036837 1.779 0.077196 .
factor(Study)8 0.0066134 0.0035371 1.870 0.063337 .
factor(Study)9 0.0086758 0.0036641 2.368 0.019083 *
factor(Study)10 0.0105657 0.0041296 2.559 0.011434 *
factor(Study)11 0.0083694 0.0040194 2.082 0.038900 *
factor(Study)16 0.0171258 0.0028962 5.913 1.95e-08 ***
factor(Study)18 0.0019277 0.0042300 0.456 0.649209
factor(Study)20 0.0172469 0.0040412 4.268 3.36e-05 ***
factor(Study)23 0.0132676 0.0031658 4.191 4.57e-05 ***
factor(Study)24 0.0063313 0.0031519 2.009 0.046236 *
factor(Study)25 0.0050929 0.0039135 1.301 0.194989
Thank you very much,
Phuong
You didn't give us a reproducible example, but the answer is to use coef(), for example:
> library(nlme)
> fm1 <- lme(distance~age,random=~1|Subject,data=Orthodont)
> coef(fm1)
(Intercept) age
M16 15.84314 0.6601852
M05 15.84314 0.6601852
M02 16.17959 0.6601852
M11 16.40389 0.6601852
M07 16.51604 0.6601852
M08 16.62819 0.6601852
M03 16.96464 0.6601852
[snip]
Use fixef() to get just the fixed-effect coefficients.
Use ranef() to get just the random effects (i.e., the deviations of each individual from the fixed coefficients).
The Orthodont example in the lme documentation actually uses a random-slope (plus intercept) model; here I have fitted a random-intercept model, so the estimated slope (the age parameter) is the same for every individual.
It looks like individuals are sorted in increasing order of estimated random effect.
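The relationship between the three extractors can be verified directly: for a random-intercept model, each row of coef() is the fixed effect plus that individual's random effect. A minimal sketch with the same Orthodont model (the helper vectors re_int, co_int, and rebuilt are illustrative names):

```r
library(nlme)

fm1 <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fe <- fixef(fm1)  # fixed effects: (Intercept) and age
re <- ranef(fm1)  # per-subject deviations of the intercept
co <- coef(fm1)   # per-subject coefficients

# Name the intercept columns by subject so they can be matched safely
re_int <- setNames(re[["(Intercept)"]], rownames(re))
co_int <- setNames(co[["(Intercept)"]], rownames(co))

# Rebuild the per-subject intercepts as fixed effect + random effect
rebuilt <- fe[["(Intercept)"]] + re_int

head(cbind(coef = co_int[names(rebuilt)], rebuilt = rebuilt))
```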
