emmeans - control vs treatment for more than one factor - r

I'm using emmeans to perform custom comparisons to a control group. The trt.vs.ctrl approach works perfectly for me if I'm only interested in comparing one factor, but then fails (or I fail) when I set the comparison to be more complicated (i.e., the control group is described by a specific combination of 2+ variables).
Example code below. Say that using the pigs data, I want to compare all diets to the low percent fish diet. Note how in the nd data frame, "fish" only has 9% associated with it. However, when I run emmeans, the function does not pick up on the nesting, and while the control is correct, the treatment groups also include various values of fish and percents. Which means that the p-value adjustment is wrong.
So the two approaches I can think of:
1. How do I make emmeans pick up on the nesting in this case, or
2. How do I do the dunnettx adjustment manually (i.e., can I use adjust = "none", pull out the tests I actually want, and adjust the p-values myself)?
library(emmeans)
library(dplyr)
pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
nd <- expand.grid(source = levels(pigs$source), percent = unique(pigs$percent)) %>%
filter(percent == 9 | source != "fish")
emmeans(pigs.lm, trt.vs.ctrl ~ source + percent,
data = nd, covnest = TRUE, cov.reduce = FALSE)
Appreciate your help.
The suggestion to use include worked perfectly. Posting my code here in case anyone else has the same issue in the future.
library(emmeans)
library(dplyr)
library(tidyr)
pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
nd <- expand.grid(source = levels(pigs$source), percent = unique(pigs$percent)) %>%
filter(percent == 9 | source != "fish")
ems <- emmeans(pigs.lm, trt.vs.ctrl ~ source + percent,
data = nd, covnest = TRUE, cov.reduce = FALSE)
# to identify which levels to exclude - in this case,
# I only want the low-percent fish to remain as the ref level
aux <- as.data.frame(ems[[1]]) %>%
mutate(ID = 1:n()) %>%
filter(!grepl("fish", source) | ID == 1)
emmeans(pigs.lm, trt.vs.ctrl ~ source + percent,
data = nd, covnest = TRUE, cov.reduce = FALSE, include = aux$ID)

I'm not totally clear on what you are trying to accomplish, but I don't think filtering the data is the solution.
If your goal is to compare the marginal means for source with the (fish, 9 percent) combination, you can do it by constructing two sets of emmeans, then subsetting and combining:
emm1 = emmeans(pigs.lm, "source")
emm2 = emmeans(pigs.lm, ~source*percent)
emm3 = emm2[1] + emm1 # or rbind(emm2[1], emm1)
Then you get
> confint(emm3, adjust ="none")
source percent emmean SE df lower.CL upper.CL
fish 9 3.22 0.0536 23 3.11 3.33
fish . 3.39 0.0367 23 3.32 3.47
soy . 3.67 0.0374 23 3.59 3.74
skim . 3.80 0.0394 23 3.72 3.88
Results are averaged over some or all of the levels of: percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
> contrast(emm3, "trt.vs.ctrl1")
contrast estimate SE df t.ratio p.value
fish,. - fish,9 0.174 0.0366 23 4.761 0.0002
soy,. - fish,9 0.447 0.0678 23 6.595 <.0001
skim,. - fish,9 0.576 0.0696 23 8.286 <.0001
Results are averaged over some or all of the levels of: percent
Results are given on the log (not the response) scale.
P value adjustment: dunnettx method for 3 tests
Another (much more tedious, more error-prone) way to do the same thing is to get the EMMs for the factor combinations, and then use custom contrasts:
> contrast(emm2, list(con1 = c(-3,0,0, 1,0,0, 1,0,0, 1,0,0)/4,
+ con2 = c(-4,1,0, 0,1,0, 0,1,0, 0,1,0)/4,
+ con3 = c(-4,0,1, 0,0,1, 0,0,1, 0,0,1)/4),
+ adjust = "mvt")
contrast estimate SE df t.ratio p.value
con1 0.174 0.0366 23 4.761 0.0002
con2 0.447 0.0678 23 6.595 <.0001
con3 0.576 0.0696 23 8.286 <.0001
Results are given on the log (not the response) scale.
P value adjustment: mvt method for 3 tests
(The mvt adjustment is the exact correction for which dunnettx is only an approximation. It doesn't default to mvt because it is computationally heavy for a large number of tests.)
In answer to the last part of the question, you may use exclude (or include) to focus on a subset of the levels; see ?pairwise.emmc.
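As a rough sketch of how exclude might be applied to the setup in the question: the object names emm_cells, cells, ref_idx and drop_idx are made up for illustration, and the example assumes the pigs data that ships with emmeans. It keeps (fish, 9) as the reference and leaves the other fish cells out of the comparisons.

```r
library(emmeans)

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)

# EMMs for all 12 source-by-percent combinations
emm_cells <- emmeans(pigs.lm, ~ source * percent)

# Locate the reference cell and the fish cells that should not act as treatments
cells    <- as.data.frame(emm_cells)
ref_idx  <- which(cells$source == "fish" & cells$percent == 9)
drop_idx <- which(cells$source == "fish" & cells$percent != 9)

# Compare the remaining cells with (fish, 9), skipping the other fish cells
con <- contrast(emm_cells, "trt.vs.ctrl", ref = ref_idx, exclude = drop_idx)
con
```

With three fish cells excluded, 8 comparisons remain, and the default dunnettx adjustment is applied to just those 8 tests.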

Related

Any way to reverse the direction of comparisons when using emmeans contrast with "interaction" argument?

I'm trying to use emmeans to test "contrasts of contrasts" with custom orthogonal contrasts applied to a zero-inflated negative binomial model. The study design has 4 groups (study_group: grp1, grp2, grp3, grp4), each of which is assessed at 3 timepoints (time: Time1, Time2, Time3).
With the code below, I am able to get very close to, but not exactly, what I want. The contrasts that emerge are expressed in terms of ratios such as grp1/grp2, grp1/grp3,..., grp3/grp4 ("lower over higher"; see output following code).
What would be immensely helpful to me is a way to flip these ratios to be grp2/grp1, grp3/grp1,..., grp4/grp3 ("higher over lower"). I've tried sticking reverse=TRUE in various spots, but to no effect.
Short of re-leveling the study_group factor, is there any way to do this in emmeans?
Thanks!
library(glmmTMB)
library(emmeans)
set.seed(3456)
# Building grid for study design: 4 groups of 3 sites,
# each with 20 participants observed 3 times
site <- rep(1:12, each=60)
pid <- 1000*site+10*(rep(rep(1:20,each=3),12))
study_group <- c(rep("grp1",180), rep("grp2",180), rep("grp3",180), rep("grp4",180))
grp_num <- c(rep(0,180), rep(1,180), rep(2,180), rep(3,180))
time <- c(rep(c("Time1", "Time2", "Time3"),240))
time_num <- c(rep(c(0:2),240))
# Site-level random effects (intercepts)
site_eff_count = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
site_eff_zeros = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
# Simulating a neg binomial outcome
y_count <- rnbinom(n = 720, mu=exp(3.25 + grp_num*0.15 + time_num*-0.20 + grp_num*time_num*0.15 + site_eff_count), size=0.8)
# Simulating some extra zeros
log_odds = (-1.75 + grp_num*0.2 + time_num*-0.40 + grp_num*time_num*0.50 + site_eff_zeros)
prob_1 = plogis(log_odds)
prob_0 = 1 - prob_1
y_zeros <- rbinom(n = 720, size = 1, prob = prob_0)
# Building dataset with ZINB-ish outcome
data_ZINB <- data.frame(site, pid, study_group, time, y_count, y_zeros)
data_ZINB$y_obs <- ifelse(y_zeros==1, y_count, 0)
# Estimating ZINB GLMM in glmmTMB
mod_ZINB <- glmmTMB(y_obs ~ 1
+ study_group + time + study_group*time
+ (1|site),
family=nbinom2,
zi = ~ .,
data=data_ZINB)
#summary(mod_ZINB)
# Getting model-estimated "cell" means for conditional (non-zero) sub-model
# in response (not linear predictor) scale
count_means <- emmeans(mod_ZINB,
pairwise ~ time | study_group,
component="cond",
type="response",
adjust="none")
# count_means
# Defining custom contrast function for orthogonal time contrasts
# contr1 = Time 2 - Time 1
# contr2 = Time 3 - Times 1 and 2
compare_arms.emmc <- function(levels, ...) {
  # Assumes exactly 3 time levels, in the order Time1, Time2, Time3
  coef <- data.frame(
    T1vT2   = c(-1,  1, 0),  # contr1: Time 2 - Time 1
    T1T2vT3 = c(-1, -1, 2)   # contr2: Time 3 - Times 1 and 2
  )
  attr(coef, "adjust") <- "none"
  coef
}
# Estimating pairwise between-group "contrasts of contrasts"
# i.e., testing if time contrasts differ across groups
compare_arms_contrast <- contrast(count_means[[1]],
interaction = c("compare_arms", "pairwise"),
by = NULL)
compare_arms_contrast
Applying the emmeans::contrast function as above yields this:
time_compare_arms study_group_pairwise ratio SE df null t.ratio p.value
T1vT2 grp1 / grp2 1.091 0.368 693 1 0.259 0.7957
T1T2vT3 grp1 / grp2 0.623 0.371 693 1 -0.794 0.4276
T1vT2 grp1 / grp3 1.190 0.399 693 1 0.520 0.6034
T1T2vT3 grp1 / grp3 0.384 0.241 693 1 -1.523 0.1283
T1vT2 grp1 / grp4 0.664 0.245 693 1 -1.108 0.2681
.
.
.
T1T2vT3 grp3 / grp4 0.676 0.556 693 1 -0.475 0.6346
Tests are performed on the log scale
The answer, provided by Russ Lenth in the comments and in the emmeans documentation for the contrast function, is to replace pairwise with revpairwise in the contrast function call.
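A minimal sketch of the effect of that swap, using a stand-in linear model on the built-in warpbreaks data instead of the ZINB model above. The model and the object names warp.emm, con_fwd and con_rev are assumptions made for illustration, and "consec" simply plays the role of the custom time contrast.

```r
library(emmeans)

# Stand-in model: two crossed factors from the built-in warpbreaks data
warp.lm  <- lm(breaks ~ wool * tension, data = warpbreaks)
warp.emm <- emmeans(warp.lm, ~ tension * wool)

# Consecutive tension contrasts, compared between wools as A - B ...
con_fwd <- contrast(warp.emm, interaction = c("consec", "pairwise"))
# ... and the same contrasts with the wool comparison flipped to B - A
con_rev <- contrast(warp.emm, interaction = c("consec", "revpairwise"))

con_fwd
con_rev
```

On the linear scale the flipped estimates are the negatives of the originals; when results are shown as ratios, as in the question, flipping corresponds to taking reciprocals.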

R lme4 model: calculating effect size between continuous predictor's max-min value

I'm struggling to calculate an effect size between a continuous predictor's max-min value while using an R lme4 multilevel model.
Simulated data: predictor "x" ranges from 1 to 3
library(tidyverse)
n = 100
a = tibble(y = rep(c("pos", "neg", "neg", "neg"), length.out = n), x = rep(3, length.out = n), group = rep(letters[1:7], length.out = n))
b = tibble(y = rep(c("pos", "pos", "neg", "neg"), length.out = n), x = rep(2, length.out = n), group = rep(letters[1:7], length.out = n))
c = tibble(y = rep(c("pos", "pos", "pos", "neg"), length.out = n), x = rep(1, length.out = n), group = rep(letters[1:7], length.out = n))
d = rbind(a, b)
df = rbind(d, c)
df = df %>% mutate(y = as.factor(y))
df
Model
library("lme4")
m = glmer(
y ~ x + (x | group),
data = df,
family = binomial(link = "logit"))
Output
library(ggeffects)
ggpredict(m, "x")
# Predicted probabilities of y
x | Predicted | 95% CI
----------------------------
1 | 0.75 | [0.67, 0.82]
2 | 0.50 | [0.44, 0.56]
3 | 0.25 | [0.18, 0.33]
Adjusted for:
* group = 0 (population-level)
I'm failing to calculate the effect size between the predictor's "x" max (3) and min (1) value
My best try
library("emmeans")
emmeans(m, "x", trans = "logit", type = "response", at = list(x = c(1, 3)))
x response SE df asymp.LCL asymp.UCL
1 0.75 0.0387 Inf 0.667 0.818
3 0.25 0.0387 Inf 0.182 0.333
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
How can I calculate the effect size with CIs between the predictor's "x" max (3) and min (1) value? The effect size should be in probability scale.
I'll try to answer, though I'm still not sure what the question is. I am going to assume that what is wanted is the difference between the two probabilities.
There are a lot of moving parts in the emmeans call shown, so I will proceed in smaller steps. First, let's get estimates of the probabilities in question:
> library(emmeans)
> EMM = emmeans(m, "x", at = list(x = c(1, 3)), type = "response")
> EMM
x prob SE df asymp.LCL asymp.UCL
1 0.75 0.0387 Inf 0.667 0.818
3 0.25 0.0387 Inf 0.182 0.333
Confidence level used: 0.95
Intervals are back-transformed from the logit scale
The quickest way to obtain a pairwise comparison is via
> pairs(EMM)
contrast odds.ratio SE df null z.ratio p.value
1 / 3 9 2.94 Inf 1 6.728 <.0001
Tests are performed on the log odds ratio scale
As stated in the annotations (and also in the documentation, e.g. the vignette on comparisons), when a log or logit transformation is in place, the comparison is shown as a ratio. This happens because the tests are performed on the link (logit) scale, and the difference between logs is the log of a ratio.
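To spell out that step with the estimates above, write p_1 = 0.75 and p_3 = 0.25 for the probabilities at x = 1 and x = 3 (notation introduced here only for illustration):

```latex
\begin{aligned}
\operatorname{logit}(p_1) - \operatorname{logit}(p_3)
  &= \log\frac{p_1}{1 - p_1} - \log\frac{p_3}{1 - p_3}
   = \log\frac{p_1 / (1 - p_1)}{p_3 / (1 - p_3)} \\
  &= \log\frac{0.75 / 0.25}{0.25 / 0.75}
   = \log\frac{3}{1/3}
   = \log 9
\end{aligned}
```

Exponentiating the difference of logits therefore gives the odds ratio of 9 reported by pairs(EMM).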
If we want the difference between probabilities, it is necessary to create a new object where the primary quantities being estimated are the probabilities, rather than their logits. In emmeans, this may be done via the regrid() function:
> EMMP = regrid(EMM, transform = "response")
> EMMP
x prob SE df asymp.LCL asymp.UCL
1 0.75 0.0387 Inf 0.674 0.826
3 0.25 0.0387 Inf 0.174 0.326
Confidence level used: 0.95
This output looks a lot like the summary of EMM; however, all memory of the logit transformation has been erased, thus the confidence intervals are different because they are calculated directly from the SEs of the prob estimates. For more information, see the vignette on transformations.
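The two sets of intervals can be approximately reproduced by hand from the rounded estimates shown above. This is only a sketch: the logit-scale standard error is recovered with the delta method, which is an assumption about how the numbers relate, not a description of what emmeans does internally.

```r
p  <- 0.75     # estimated probability at x = 1
se <- 0.0387   # its standard error on the probability scale
z  <- qnorm(0.975)

# After regrid(): symmetric interval computed directly on the probability scale
ci_prob <- p + c(-1, 1) * z * se

# Before regrid(): interval computed on the logit scale, then back-transformed.
# Delta method: SE on the logit scale is se / (p * (1 - p))
se_logit <- se / (p * (1 - p))
ci_logit <- plogis(qlogis(p) + c(-1, 1) * z * se_logit)

round(ci_prob, 3)   # 0.674 0.826
round(ci_logit, 3)  # 0.667 0.818
```

The first interval is symmetric around 0.75, while the back-transformed one is not, which is why the two summaries disagree even though the estimates and SEs match.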
So now if we compare these, we get the difference of the probabilities:
> confint(pairs(EMMP))
contrast estimate SE df asymp.LCL asymp.UCL
1 - 3 0.5 0.0612 Inf 0.38 0.62
Confidence level used: 0.95
(Note: I wrapped this in confint() so that we would obtain a confidence interval, rather than the default summary with the test statistic and P value.)
This could be accomplished in one line of code as follows:
confint(pairs(emmeans(m, "x", transform = "response", at = list(x = c(1, 3)))))
The transform argument requests that the reference grid be immediately passed to regrid(). Note that the correct argument here is transform = "response", rather than transform = "logit" (that is, specify what you want to end with, not what you started with). The latter undoes, then redoes, the logit transformation, putting you back where you started.
The emmeans package provides a lot of options, and I really do recommend reading the vignettes.

rstatix package anova_test function gives partial eta squared despite setting effect.size = 'ges'

I am not able to obtain eta squared, only partial eta squared, when I use rstatix::anova_test.
Example from the iris dataset:
First using aov:
aov <- aov(Sepal.Length ~ Sepal.Width + Species, data = iris)
summary(aov)
Df Sum Sq Mean Sq F value Pr(>F)
Sepal.Width 1 1.41 1.41 7.363 0.00746 **
Species 2 72.75 36.38 189.651 < 2e-16 ***
Residuals 146 28.00 0.19
Then using sjstats::eta_sq, if I choose partial = TRUE or FALSE I get a different effect size, as I would expect.
eta_sq(aov, partial = FALSE)
term etasq
1 Sepal.Width 0.014
2 Species 0.712
eta_sq(aov, partial = TRUE)
term partial.etasq
1 Sepal.Width 0.048
2 Species 0.722
However, when I do the same with anova_test, I get the same value whether effect.size is "pes" or "ges"; both times it appears to be the partial eta squared:
aov_pes <- iris %>% anova_test(Sepal.Length ~ Sepal.Width + Species,
detailed = T,
effect.size = "pes")
get_anova_table(aov_pes)
Effect SSn SSd DFn DFd F p p<.05
1 Sepal.Width 10.953 28.004 1 146 57.102 4.19e-12 *
2 Species 72.752 28.004 2 146 189.651 2.56e-41 *
pes
1 0.281
2 0.722
aov_ges <- iris %>% anova_test(Sepal.Length ~ Sepal.Width + Species,
detailed = T,
effect.size = "ges")
get_anova_table(aov_ges)
Effect SSn SSd DFn DFd F p p<.05
1 Sepal.Width 10.953 28.004 1 146 57.102 4.19e-12 *
2 Species 72.752 28.004 2 146 189.651 2.56e-41 *
ges
1 0.281
2 0.722
Does anyone know why this is? Thanks!
Answer
rstatix::anova_test seems to contain a mistake in the calculation! I would be very, very careful with this function.
Note that eta_sq is deprecated, and effectsize::eta_squared should be used.
Proper calculation
We have three SS values: 1.412238, 72.752431, and 28.003665. We can calculate the pes and ges:
pes: 1.412238 / (1.412238 + 28.003665)
ges: 1.412238 / (1.412238 + 72.752431 + 28.003665)
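For reference, the two ratios above can be computed directly from the aov table in base R. This is a small sketch; the names tab, ss, ratio_partial and ratio_total are made up for illustration.

```r
fit <- aov(Sepal.Length ~ Sepal.Width + Species, data = iris)

# Sequential (type I) sums of squares from the aov table
tab <- summary(fit)[[1]]
ss  <- setNames(tab[["Sum Sq"]], trimws(rownames(tab)))

ss_effect <- ss[c("Sepal.Width", "Species")]
ss_resid  <- ss[["Residuals"]]

# Effect SS over (effect SS + residual SS)
ratio_partial <- ss_effect / (ss_effect + ss_resid)
# Effect SS over the total SS
ratio_total <- ss_effect / sum(ss)

round(ratio_partial, 3)  # Sepal.Width 0.048, Species 0.722
round(ratio_total, 3)    # Sepal.Width 0.014, Species 0.712
```

These match the values returned by eta_sq with partial = TRUE and partial = FALSE in the question.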
anova_test
Under the hood, anova_test calls two functions for pes and ges calculation:
pes: rstatix:::add_partial_eta_squared
ges: rstatix:::add_generalized_eta_squared
The pes calculation by anova_test
res.anova.summary$ANOVA %>% mutate(pes = .data$SSn/(.data$SSn + .data$SSd))
This indeed calculates the pes as we expect it to.
The ges calculation by anova_test
aov.table <- res.anova.summary$ANOVA
aov.table %>% mutate(ges = .data$SSn/(.data$SSn + sum(unique(.data$SSd)) + obs.SSn1 - obs.SSn2))
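Here, we run into a problem. This code seems blatantly incorrect. When the obs.SSn1 and obs.SSn2 terms cancel, it just divides each sum of squares by itself plus the residual sum of squares (28.004). That is the pes, not the ges as calculated above. A caveat, though: in rstatix, "ges" stands for generalized eta squared, which is not the same quantity as the classical eta squared that eta_sq(aov, partial = FALSE) reports. For a purely between-subjects design with no factors declared through the observed argument, generalized and partial eta squared coincide, so identical pes and ges columns may be intended behavior rather than a bug. Note also that anova_test reports an SSn of 10.953 for Sepal.Width where aov reports 1.41, which is consistent with anova_test defaulting to type II sums of squares while aov uses sequential (type I) ones.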
You could contact the maintainer of the package (maintainer("rstatix")) or open a new issue on the rstatix issue tracker.

Why do I get empty results with ls_means statement in lmerTest?

Here's my data:
subject arm treat bline change
'subject1' 'L' N 6.3597 4.9281
'subject1' 'R' T 10.3499 1.8915
'subject3' 'L' N 12.4108 -0.9008
'subject3' 'R' T 13.2422 -0.7357
'subject4' 'L' T 8.7383 2.756
'subject4' 'R' N 10.8257 -0.531
'subject5' 'L' N 7.1766 2.0536
'subject5' 'R' T 8.1369 1.9841
'subject6' 'L' T 10.3978 9.0743
'subject6' 'R' N 11.3184 3.381
'subject8' 'L' T 10.7251 2.9658
'subject8' 'R' N 10.9818 2.9908
'subject9' 'L' T 7.3745 2.9143
'subject9' 'R' N 9.4863 -3.0847
'subject10' 'L' T 11.8132 -2.1629
'subject10' 'R' N 9.5287 0.1401
'subject11' 'L' T 8.2977 6.2219
'subject11' 'R' N 9.3691 0.7408
'subject12' 'L' T 12.6003 -0.7645
'subject12' 'R' N 11.7329 0.0342
'subject13' 'L' N 9.4918 2.0716
'subject13' 'R' T 9.6205 1.5705
'subject14' 'L' T 9.3945 4.6176
'subject14' 'R' N 11.0176 1.445
'subject16' 'L' T 8.0221 1.4751
'subject16' 'R' N 9.8307 -2.3697
When I fit a mixed model with treat and arm as factors:
library(lmerTest)
m <- lmer(change ~ bline + treat + arm + (1|subject), data=change1)
ls_means(m, which = NULL, level=0.95, ddf="Kenward-Roger")
The ls_means statement returns no result. Can anyone help with what is going wrong?
I too see empty results:
> ls_means(m, which = NULL, level=0.95, ddf="Kenward-Roger")
Least Squares Means table:
Estimate Std. Error df t value lower upper Pr(>|t|)
Confidence level: 95%
Degrees of freedom method: Kenward-Roger
However, the emmeans package works fine. You can use emmeans() or lsmeans() -- the latter just re-labels the emmeans() results. "Estimated marginal means" is a more generally-appropriate term.
> library(emmeans)
> lsmeans(m, "treat")
treat lsmean SE df lower.CL upper.CL
N 0.996 0.72 15 -0.539 2.53
T 2.290 0.72 15 0.755 3.82
Results are averaged over the levels of: arm
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
> lsmeans(m, "arm")
arm lsmean SE df lower.CL upper.CL
L 1.97 0.737 15.6 0.403 3.53
R 1.32 0.737 15.6 -0.248 2.88
Results are averaged over the levels of: treat
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
I suspect that lmerTest::ls_means() does not support predictors of class "character". If you change treat and arm to factors, it may work.
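A sketch of that conversion on a small simulated data set. The toy data frame and its effect sizes are invented for illustration and are not the asker's data; the default Satterthwaite degrees of freedom are used here only to avoid needing an extra package. Whether this resolves the empty table in the original case is not confirmed.

```r
library(lmerTest)

set.seed(1)
n_subj <- 12

# Toy paired-arm design: each subject gets N on one arm and T on the other
toy <- data.frame(
  subject = rep(paste0("subject", seq_len(n_subj)), each = 2),
  arm     = rep(c("L", "R"), times = n_subj),
  stringsAsFactors = FALSE
)
even_subj  <- rep(seq_len(n_subj), each = 2) %% 2 == 0
toy$treat  <- ifelse(xor(toy$arm == "L", even_subj), "N", "T")
toy$bline  <- rnorm(nrow(toy), mean = 10, sd = 2)
toy$change <- 1 + 1.3 * (toy$treat == "T") +
  rep(rnorm(n_subj), each = 2) + rnorm(nrow(toy))

# Convert the character predictors to factors before fitting
toy[c("treat", "arm")] <- lapply(toy[c("treat", "arm")], factor)

m_toy <- lmer(change ~ bline + treat + arm + (1 | subject), data = toy)
ls_means(m_toy)
```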
We're going to need more information. Here's a reproducible example that seems just fine:
set.seed(101)
library(lme4)
library(lmerTest)
dd <- expand.grid(subject=factor(1:40), arm=c("L","R"))
## replicate N/T in random order for each subject
dd$treat <- c(replicate(40,sample(c("N","T"))))
dd$bline <- rnorm(nrow(dd))
dd$change <- simulate(~bline+treat+arm+(1|subject),
newdata=dd,
newparams=list(beta=rep(1,4),
theta=1,
sigma=1))[[1]]
m <- lmer(change ~ bline + treat + arm + (1|subject), data=dd)
ls_means(m, which = NULL, level=0.95, ddf="Kenward-Roger")
## Least Squares Means table:
##
## Estimate Std. Error df t value lower upper Pr(>|t|)
## armL 1.37494 0.22716 55.6 6.0527 0.91981 1.83007 1.275e-07 ***
## armR 2.54956 0.22716 55.6 11.2235 2.09443 3.00469 6.490e-16 ***
My best guess at this point is that you are having some problem with model fitting. lmerTest can sometimes be opaque/swallow warnings or error messages. Did you get any warnings you neglected to tell us about? If you re-run the model with lme4::lmer(...) (i.e. use the basic version from lme4, not the augmented version in lmerTest), do you see any warnings?

How to extract the confidence limits of LSMEANS?

I am using the oranges data provided with lsmeans.
library(lsmeans)
oranges.rg1<-lm(sales1 ~ price1 + price2 + day + store, data = oranges)
days.lsm <- lsmeans(oranges.rg1, "day")
days_contr.lsm <- contrast(days.lsm, "trt.vs.ctrl", ref = c(5,6))
The confidence intervals can be visualized with plot(contrast(days.lsm, "trt.vs.ctrl", ref = c(5,6))), but they are not shown when printing days_contr.lsm:
> days_contr.lsm
contrast estimate SE df t.ratio p.value
1 - avg(5,6) -7.8538769 2.194243 23 -3.579 0.0058
2 - avg(5,6) -6.9234858 2.127341 23 -3.255 0.0125
3 - avg(5,6) 0.2462789 2.155529 23 0.114 0.9979
4 - avg(5,6) -4.6760034 2.110761 23 -2.215 0.1184
How can I extract the confidence intervals to a data.frame?
> days_contr.lsm
contrast estimate SE df t.ratio p.value lower.CL upper.CL
1 - avg(5,6) -7.8538769 2.194243 23 -3.579 0.0058 ? ?
2 - avg(5,6) -6.9234858 2.127341 23 -3.255 0.0125 ? ?
3 - avg(5,6) 0.2462789 2.155529 23 0.114 0.9979 ? ?
4 - avg(5,6) -4.6760034 2.110761 23 -2.215 0.1184 ? ?
confint(contrast(days.lsm, "trt.vs.ctrl", ref = c(5,6))) worked fine
At risk of beating a dead horse, I feel that the main point of the question is getting the confidence intervals, given that what is seen in days_contr.lsm is only the t ratios and P values.
This happened because the default method for summarizing contrast() results is to show tests and not CIs, whereas the default method for summarizing emmeans() results is to show CIs and not tests. The infer argument of summary.emmGrid() controls what you see. Thus, you can get both CIs and tests using
summary(days_contr.lsm, infer = c(TRUE, TRUE))
and this would fill in the question marks in the OP. The summary() result, by the way, is of class c("summary_emm", "data.frame"); it is a data.frame with a special print method that often shows some additional annotations.
There are additional emmGrid methods confint() and test() that run summary() with infer = c(TRUE, FALSE) and infer = c(FALSE, TRUE) respectively (though both have additional capabilities). The as.data.frame() method is just as.data.frame(summary(...)). For details, see the help page for emmeans::summary.emmGrid.
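Putting those pieces together, a minimal sketch of getting the contrasts, tests and confidence limits into a plain data frame. It uses the emmeans package (which also ships the oranges data) rather than lsmeans; the object names oranges.lm, days.emm, days.con and con_df are made up for illustration.

```r
library(emmeans)

oranges.lm <- lm(sales1 ~ price1 + price2 + day + store, data = oranges)
days.emm   <- emmeans(oranges.lm, "day")
days.con   <- contrast(days.emm, "trt.vs.ctrl", ref = c(5, 6))

# infer = c(TRUE, TRUE) asks for both confidence intervals and tests
con_df <- as.data.frame(summary(days.con, infer = c(TRUE, TRUE)))

names(con_df)
con_df[, c("contrast", "estimate", "lower.CL", "upper.CL")]
```

The confidence limits are then available as the ordinary columns lower.CL and upper.CL.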
