Cox regression HR grouping in R

I would like to perform a Cox regression for the following question: a group of patients either receives a treatment ("drug" = 1) or not ("drug" = 0). My time variable "time" records how many days each patient was observed, and "status" indicates whether the patient died (died = 1, survived = 0).
library(survival)
set.seed(123)
df <- data.frame(time = round(runif(100, min = 1, max = 70)),
                 status = round(runif(100, min = 0, max = 1)),
                 drug = round(runif(100, min = 0, max = 1)),
                 age40 = round(runif(100, min = 0, max = 1)),
                 stringsAsFactors = FALSE)
object <- Surv(df$time, df$status)
model <- coxph(object ~ drug, data = df)
summary(model)
This works fine for me and tells me that the HR is 0.89, so the drug seems to protect patients from dying.
Now I want to do some subgroup analysis, e.g. how does the HR change if the patient is <= 40 years or > 40 years old (age40: 0 vs 1)?
Is all I have to do to include the variable "age40" in the coxph call?
object2 <- Surv(df$time, df$status)
model2 <- coxph(object2 ~ drug + age40, data = df)
summary(model2)
If I do that, the HR for drug in the summary changes slightly to 0.86, and I get another one for age40 (1.12).
Now my question is: what are the hazard ratios for dying under treatment (drug = 1) if the patient is <= 40 or > 40 years old?
EDIT: Another question would be how to graphically show the different HRs of the effect of drug on status in a forest plot, e.g. like this: https://rpkgs.datanovia.com/survminer/reference/ggforest-2.png.
Instead of "sex", "rx", "adhere" etc. I would like to show the HRs for age40 = 0 vs. 1 and other variables as well, such as hypertension = 0 vs. 1 and smoker = 0 vs. 1.
Thank you!

The function you need to use is predict on your model2, and it needs to be supplied with a newdata argument that includes all the cases that you want to consider:
exp( predict(model2, newdata=expand.grid(drug=c(0,1), age40=c(0,1))) )
#         1         2         3         4 
# 1.0000000 0.8564951 1.1268713 0.9651598
You now have all 4 possible combinations of drug and age40. The base case has a value of unity because you are estimating risk ratios from the baseline case of {drug=0, age40=0}. You can see which combination each risk ratio corresponds to:
expand.grid(drug=c(0,1), age40=c(0,1))
  drug age40
1    0     0
2    1     0
3    0     1
4    1     1
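A quick sketch, using model2 and the grid above, makes the within-stratum drug ratios explicit:

rr <- exp(predict(model2, newdata = expand.grid(drug = c(0, 1), age40 = c(0, 1))))
rr[2] / rr[1]  # drug = 1 vs drug = 0 within age40 = 0: ~0.856
rr[4] / rr[3]  # drug = 1 vs drug = 0 within age40 = 1: ~0.856 as well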
Notice that the ratio of drug=1 to drug=0 is the same for each age category considered separately. If you had wanted to see whether the effect of drug was different in the two age categories, you would have used an interaction model:
model3 <- coxph(object2 ~ drug * age40, data = df)
summary(model3)
#----------------
Call:
coxph(formula = object2 ~ drug * age40, data = df)
n= 100, number of events= 50
              coef exp(coef) se(coef)      z Pr(>|z|)
drug       -0.18524   0.83091  0.45415 -0.408    0.683
age40       0.09611   1.10089  0.39560  0.243    0.808
drug:age40  0.05679   1.05843  0.63094  0.090    0.928

           exp(coef) exp(-coef) lower .95 upper .95
drug          0.8309     1.2035    0.3412     2.024
age40         1.1009     0.9084    0.5070     2.390
drug:age40    1.0584     0.9448    0.3073     3.645
Concordance= 0.528 (se = 0.042 )
Likelihood ratio test= 0.34 on 3 df, p=1
Wald test = 0.33 on 3 df, p=1
Score (logrank) test = 0.33 on 3 df, p=1
And the effect estimates are now a bit different:
exp( predict(model3, newdata=expand.grid(drug=c(0,1), age40=c(0,1))) )
#         1         2         3         4 
# 1.0000000 0.8309089 1.1008850 0.9681861
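So, reading the subgroup HRs for drug directly off model3 (a sketch; the second HR folds in the interaction term):

exp(coef(model3)["drug"])                               # HR of drug when age40 = 0: ~0.83
exp(coef(model3)["drug"] + coef(model3)["drug:age40"])  # HR of drug when age40 = 1: ~0.88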

Use the strata argument:
coxph(object ~ drug + strata(age40), data = df)
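For the forest-plot part of the EDIT, here is a minimal sketch with survminer::ggforest; it assumes the survminer package is installed and recodes the subgroup variables as factors so each level gets its own labelled row:

library(survminer)
df$drug  <- factor(df$drug)   # factors give labelled rows in the plot
df$age40 <- factor(df$age40)
model_f <- coxph(Surv(time, status) ~ drug + age40, data = df)  # hypothetical refit with factors
ggforest(model_f, data = df)  # one row per term: HR, 95% CI and p-value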


Any way to reverse the direction of comparisons when using emmeans contrast with "interaction" argument?

I'm trying to use emmeans to test "contrasts of contrasts" with custom orthogonal contrasts applied to a zero-inflated negative binomial model. The study design has 4 groups (study_group: grp1, grp2, grp3, grp4), each of which is assessed at 3 timepoints (time: Time1, Time2, Time3).
With the code below, I am able to get very close to, but not exactly, what I want. The contrasts that emerge are expressed in terms of ratios such as grp1/grp2, grp1/grp3,..., grp3/grp4 ("lower over higher"; see output following code).
What would be immensely helpful to me is a way to flip these ratios to be grp2/grp1, grp3/grp1,..., grp4/grp3 ("higher over lower"). I've tried sticking reverse=TRUE in various spots, but to no effect.
Short of re-leveling the study_group factor, is there any way to do this in emmeans?
Thanks!
library(glmmTMB)
library(emmeans)
set.seed(3456)
# Building grid for study design: 4 groups of 3 sites,
# each with 20 participants observed 3 times
site <- rep(1:12, each=60)
pid <- 1000*site+10*(rep(rep(1:20,each=3),12))
study_group <- c(rep("grp1",180), rep("grp2",180), rep("grp3",180), rep("grp4",180))
grp_num <- c(rep(0,180), rep(1,180), rep(2,180), rep(3,180))
time <- c(rep(c("Time1", "Time2", "Time3"),240))
time_num <- c(rep(c(0:2),240))
# Site-level random effects (intercepts)
site_eff_count = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
site_eff_zeros = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
# Simulating a neg binomial outcome
y_count <- rnbinom(n = 720, mu=exp(3.25 + grp_num*0.15 + time_num*-0.20 + grp_num*time_num*0.15 + site_eff_count), size=0.8)
# Simulating some extra zeros
log_odds = (-1.75 + grp_num*0.2 + time_num*-0.40 + grp_num*time_num*0.50 + site_eff_zeros)
prob_1 = plogis(log_odds)
prob_0 = 1 - prob_1
y_zeros <- rbinom(n = 720, size = 1, prob = prob_0)
# Building datasest with ZINB-ish outcome
data_ZINB <- data.frame(site, pid, study_group, time, y_count, y_zeros)
data_ZINB$y_obs <- ifelse(y_zeros==1, y_count, 0)
# Estimating ZINB GLMM in glmmTMB
mod_ZINB <- glmmTMB(y_obs ~ 1
                    + study_group + time + study_group*time
                    + (1|site),
                    family = nbinom2,
                    zi = ~ .,
                    data = data_ZINB)
#summary(mod_ZINB)
# Getting model-estimated "cell" means for conditional (non-zero) sub-model
# in response (not linear predictor) scale
count_means <- emmeans(mod_ZINB,
                       pairwise ~ time | study_group,
                       component = "cond",
                       type = "response",
                       adjust = "none")
# count_means
# Defining custom contrast function for orthogonal time contrasts
# contr1 = Time 2 - Time 1
# contr2 = Time 3 - Times 1 and 2
compare_arms.emmc <- function(levels) {
  k <- length(levels)
  contr1 <- c(-1, 1, 0)
  contr2 <- c(-1, -1, 2)
  coef <- as.data.frame(lapply(seq_len(k - 1), function(i) {
    if (i == 1) contr1 else contr2
  }))
  names(coef) <- c("T1vT2", "T1T2vT3")
  attr(coef, "adjust") <- "none"
  coef
}
# Estimating pairwise between-group "contrasts of contrasts"
# i.e., testing if time contrasts differ across groups
compare_arms_contrast <- contrast(count_means[[1]],
                                  interaction = c("compare_arms", "pairwise"),
                                  by = NULL)
compare_arms_contrast
Applying the emmeans::contrast function as above yields this:
time_compare_arms study_group_pairwise ratio SE df null t.ratio p.value
T1vT2 grp1 / grp2 1.091 0.368 693 1 0.259 0.7957
T1T2vT3 grp1 / grp2 0.623 0.371 693 1 -0.794 0.4276
T1vT2 grp1 / grp3 1.190 0.399 693 1 0.520 0.6034
T1T2vT3 grp1 / grp3 0.384 0.241 693 1 -1.523 0.1283
T1vT2 grp1 / grp4 0.664 0.245 693 1 -1.108 0.2681
.
.
.
T1T2vT3 grp3 / grp4 0.676 0.556 693 1 -0.475 0.6346
Tests are performed on the log scale
The answer, provided by Russ Lenth in the comments and in the emmeans documentation for the contrast function, is to replace pairwise with revpairwise in the contrast function call.
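In code, the suggested change looks like this (a sketch):

compare_arms_contrast_rev <- contrast(count_means[[1]],
                                      interaction = c("compare_arms", "revpairwise"),
                                      by = NULL)
compare_arms_contrast_rev  # ratios now run grp2/grp1, grp3/grp1, ..., i.e. "higher over lower"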

Comparing two curves for difference in trend

I have some data about trends over time in drug use across the state. I want to know whether there have been changes in the gender difference in intravenous drug use versus gender differences in all recreational drug use over time.
My data is below. I think I might need to use time-series analysis, but I'm not sure. Any help would be much appreciated.
[Image omitted: yearly values of iv and all illicit drug use, 2011-2016; the numbers are reproduced in the answer's code below.]
Since the description in the question does not match the data (there is no information on gender), we will assume from the subject line that we want to determine whether the trends of illicit and iv are the same.
Comparing Trends
Note that there is no autocorrelation in the detrended values of iv or illicit so we will use ordinary linear models.
iv <- c(0.4, 0.3, 0.4, 0.3, 0.2, 0.2)
illicit <- c(5.5, 5.7, 4.8, 4.7, 6.1, 5.3)
time <- 2011:2016
ar(resid(lm(iv ~ time)))
## Call:
## ar(x = resid(lm(iv ~ time)))
##
## Order selected 0 sigma^2 estimated as 0.0024
ar(resid(lm(illicit ~ time)))
## Call:
## ar(x = resid(lm(illicit ~ time)))
##
## Order selected 0 sigma^2 estimated as 0.287
Create a 12x3 data frame long with columns time, values and ind (iv or illicit). Then run a linear model with two slopes and another with one slope; both have two intercepts. Then compare them using anova. Evidently they are not significantly different, so we cannot reject the hypothesis that the slopes are the same.
wide <- data.frame(iv, illicit)
long <- cbind(time, stack(wide))
fm2 <- lm(values ~ ind/(time + 1) + 0, long)
fm1 <- lm(values ~ ind + time + 0, long)
anova(fm1, fm2)
giving:
Analysis of Variance Table
Model 1: values ~ ind + time + 0
Model 2: values ~ ind/(time + 1) + 0
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1      9 1.4629                           
2      8 1.4469  1  0.016071 0.0889 0.7732
Comparing model with slopes to one without slopes
Actually the slopes are not significant in the first place and we cannot reject the hypothesis that both the slopes are zero. Compare to a two intercept model with no slopes.
fm0 <- lm(values ~ ind + 0, long)
anova(fm0, fm2)
giving:
Analysis of Variance Table
Model 1: values ~ ind + 0
Model 2: values ~ ind/(time + 1) + 0
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     10 1.4750                           
2      8 1.4469  2  0.028143 0.0778 0.9258
Or, running a stepwise regression, we find that its favored model is one with two intercepts and no slopes:
step(fm2)
giving:
Start:  AIC=-17.39
values ~ ind/(time + 1) + 0

           Df Sum of Sq    RSS     AIC
- ind:time  2  0.028143 1.4750 -21.155
<none>                  1.4469 -17.386

Step:  AIC=-21.15
values ~ ind - 1

       Df Sum of Sq     RSS    AIC
<none>               1.4750 -21.155
- ind   2    172.28 173.750  32.073

Call:
lm(formula = values ~ ind - 1, data = long)

Coefficients:
     indiv  indillicit  
      0.30        5.35  
Log transformed values
If we use log(values) then we similarly find no autocorrelation (not shown) but we do find the slopes of the log transformed values are significantly different.
fm2log <- lm(log(values) ~ ind/(time + 1) + 0, long)
fm1log <- lm(log(values) ~ ind + time + 0, long)
anova(fm1log, fm2log)
giving:
Analysis of Variance Table
Model 1: log(values) ~ ind + time + 0
Model 2: log(values) ~ ind/(time + 1) + 0
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
1      9 0.35898                              
2      8 0.18275  1   0.17622 7.7141 0.02402 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
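For a quick visual check of the two series, a sketch using the objects defined above (note the very different scales of iv and illicit):

matplot(time, wide, type = "b", pch = 1, lty = 1:2, col = 1:2,
        xlab = "year", ylab = "value")
legend("topright", legend = names(wide), col = 1:2, lty = 1:2, pch = 1)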

How to parametrize piecewise regression coefficient to represent the slope for the following interval (instead of the change in the slope)

Consider the following dataset
Quantity <- c(25,39,45,57,70,85,89,100,110,124,137,150,177)
Sales <- c(1000,1250,2600,3000,3500,4500,5000,4700,4405,4000,3730,3400,3300)
df <- data.frame(Quantity,Sales)
df
Plotting the data, the distribution of observations is clearly non-linear, but it presents a likely breakpoint around Quantity = 89 (I skip the plot here). Therefore, I built a joint piecewise linear model as follows
df$Xbar <- ifelse(df$Quantity>89,1,0)
df$diff <- df$Quantity - 89
reg <- lm(Sales ~ Quantity + I(Xbar * (Quantity - 89)), data = df)
summary(reg)
or simply
df$X <- df$diff*df$Xbar
reg <- lm(Sales ~ Quantity + X, data = df)
summary(reg)
However, according to this parametrization, the coefficient of X represents the change in the slope from the preceding interval.
How can I parametrize the relevant coefficient to rather represent the slope for the second interval?
I did some research but I was unable to find the desired specification, apart from some automation in Stata (see the entry 'marginal' here: https://www.stata.com/manuals13/rmkspline.pdf).
Any help is much appreciated. Thank you!
Acknowledgement:
the workable example is retrieved from
https://towardsdatascience.com/unraveling-spline-regression-in-r-937626bc3d96
The key here is to use a logical variable is.right which is TRUE for the points to the right of 89 and FALSE otherwise.
From the output shown, 60.88 is the slope to the left of 89 and -19.97 is the slope to the right. The lines intersect at Quantity = 89, Sales = 4817.30.
is.right <- df$Quantity > 89
fm <- lm(Sales ~ diff : is.right, df)
fm
## Call:
## lm(formula = Sales ~ diff:is.right, data = df)
##
## Coefficients:
## (Intercept) diff:is.rightFALSE diff:is.rightTRUE
## 4817.30 60.88 -19.97
Alternatives
Alternatively, if you want to use Xbar from the question, do it this way. It gives the same coefficients as fm.
fm2 <- lm(Sales ~ diff : factor(Xbar), df)
or
fm3 <- lm(Sales ~ I(Xbar * diff) + I((1 - Xbar) * diff), df)
Double check with nls
We can double check these using nls with the following formulation, which makes use of the fact that, if we extend both lines, the one to use at any Quantity is the lower of the two.
st <- list(a = 0, b1 = 1, b2 = -1)
fm4 <- nls(Sales ~ a + pmin(b1 * (Quantity - 89), b2 * (Quantity - 89)), start = st)
fm4
## Nonlinear regression model
## model: Sales ~ a + pmin(b1 * (Quantity - 89), b2 * (Quantity - 89))
## data: parent.frame()
## a b1 b2
## 4817.30 60.88 -19.97
## residual sum-of-squares: 713120
##
## Number of iterations to convergence: 1
## Achieved convergence tolerance: 2.285e-09
This would also work:
fm5 <- nls(Sales ~ a + ifelse(Quantity > 89, b2, b1) * diff, df, start = st)
Plot
Here is a plot:
plot(Sales ~ Quantity, df)
lines(fitted(fm) ~ Quantity, df)
Model matrix
And here is the model matrix for the linear regression:
> model.matrix(fm)
   (Intercept) diff:is.rightFALSE diff:is.rightTRUE
1            1                -64                 0
2            1                -50                 0
3            1                -44                 0
4            1                -32                 0
5            1                -19                 0
6            1                 -4                 0
7            1                  0                 0
8            1                  0                11
9            1                  0                21
10           1                  0                35
11           1                  0                48
12           1                  0                61
13           1                  0                88
If you know the breakpoints, then you almost have the model; it should be:
fit <- lm(Sales ~ Quantity + Xbar + Quantity:Xbar, data = df)
If you don't introduce a new intercept (Xbar), the second segment is forced to start from the intercept already in the model, which will not work. We can plot it:
plot(df$Quantity,df$Sales)
newdata = data.frame(Quantity=seq(40,200,by=5))
newdata$Xbar= ifelse(newdata$Quantity>89,1,0)
lines(newdata$Quantity,predict(fit,newdata))
The coefficients are:
summary(fit)
Call:
lm(formula = Sales ~ Quantity * Xbar, data = df)
Residuals:
   Min     1Q Median     3Q    Max 
-527.9 -132.2  -15.1  148.1  464.7 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -545.435    327.977  -1.663    0.131    
Quantity         59.572      5.746  10.367 2.65e-06 ***
Xbar           7227.288    585.933  12.335 6.09e-07 ***
Quantity:Xbar   -80.133      6.856 -11.688 9.64e-07 ***
And the slope of the second segment is 59.572 + (-80.133) = -20.561.
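The same number can be recovered from the fitted object (a sketch):

unname(coef(fit)["Quantity"] + coef(fit)["Quantity:Xbar"])  # ~ -20.561, the second-interval slope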

How to change unit increment in hazard ratio from coxph and frailty model in R?

I ran a coxph model and a frailty model, but now I would like the hazard ratio for a continuous variable (age) to be shown in terms of a 5-unit increment instead of a 1-unit increment. Is there a function in R that can perform such a task? If so, does the function also work for frailty models? I used the package frailtypack.
library('survival')
data(veteran)
cox <- coxph(Surv(time, status) ~ age, data = veteran)
summary(cox)
# Call:
# coxph(formula = Surv(time, status) ~ age, data = veteran)
#
# n= 137, number of events= 128
#
# coef exp(coef) se(coef) z Pr(>|z|)
# age 0.007500 1.007528 0.009565 0.784 0.433
#
# exp(coef) exp(-coef) lower .95 upper .95
# age 1.008 0.9925 0.9888 1.027
#
# Concordance= 0.515 (se = 0.029 )
# Likelihood ratio test= 0.63 on 1 df, p=0.4
# Wald test = 0.61 on 1 df, p=0.4
# Score (logrank) test = 0.62 on 1 df, p=0.4
Just add a new variable that represents the age group each subject belongs to; for example 1: 0-4, 2: 5-9, 3: 10-14, etc.
This is an example using the veteran dataset in the survival package. The data has a continuous variable age. Adding this as a predictor to the model will give you the relative risk (hazard ratio) for a one-year increase in age. If you are interested in an x-year increment, you should generate a new variable which groups subjects accordingly. For these data, I applied the following grouping; group 1: younger than 40, group 2: 40 - <50, group 3: 50 - <60, group 4: 60 - <70, and group 5: 70 or older. As such, the HR for a 10-year increment is 1.049; alternatively, the risk increases by about 5% for every 10-year increase in age. Note that the association is not statistically significant.
library(survival)
data(veteran)
veteran$ageCat <- 5
veteran$ageCat[veteran$age < 70] <- 4
veteran$ageCat[veteran$age < 60] <- 3
veteran$ageCat[veteran$age < 50] <- 2
veteran$ageCat[veteran$age < 40] <- 1
table(veteran$ageCat)
 1  2  3  4  5 
11 20 22 72 12 
cox <- coxph(Surv(time, status) ~ ageCat, data = veteran)
summary(cox)
Call:
coxph(formula = Surv(time, status) ~ ageCat, data = veteran)
n= 137, number of events= 128
          coef exp(coef) se(coef)     z Pr(>|z|)
ageCat 0.04793   1.04910  0.09265 0.517    0.605

       exp(coef) exp(-coef) lower .95 upper .95
ageCat     1.049     0.9532    0.8749     1.258
Concordance= 0.509 (se = 0.028 )
Rsquare= 0.002 (max possible= 0.999 )
Likelihood ratio test= 0.27 on 1 df, p=0.6024
Wald test = 0.27 on 1 df, p=0.6049
Score (logrank) test = 0.27 on 1 df, p=0.6048
@milan's post answers a similar question but not the one asked. Since age was split into decades and modeled as a continuous variable, the hazard ratio compares a subject's age-decade with the next youngest decade. That is, the HR for subjects aged 51 vs 49 or 59 vs 41 would be the same despite there being 2 or 18 years between them.
Anyway, the default, as you suggest, is a 1-unit increment in the continuous variable, age in this case. It's not always useful to compare subjects by a 1-unit change, especially when the range gets to be much larger.
You can do the following, which is agnostic to the model, so it should work for lm, glm, survival::coxph, frailtypack::frailtyPenal, etc.
library('survival')
data(veteran)
## 1-year increase in age
cox <- coxph(Surv(time, status) ~ age, data = veteran)
exp(coef(cox))
# age
# 1.007528
For a multiplicative model like Cox regressions, you can get the x-unit change after the model is fit:
## 5-year increase in age
exp(coef(cox)) ^ 5
# age
# 1.038211
## or equivalently
exp(coef(cox) * 5)
# age
# 1.038211
However, it's easier to create a variable for the age transformation and then fit the model:
## or you can create a variable to model
veteran <- within(veteran, {
  age5 <- age / 5
})
cox5_1 <- coxph(Surv(time, status) ~ age5, data = veteran)
exp(coef(cox5_1))
# age5
# 1.038211
cox5_2 <- coxph(Surv(time, status) ~ I(age / 5), data = veteran)
exp(coef(cox5_2))
# I(age/5)
# 1.038211
Note you need to use I here in the formula interface since some operators have special meanings in formulae. For example, lm(mpg ~ wt - 1, mtcars) and lm(mpg ~ I(wt - 1), mtcars) are two different models.
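A small sketch of that difference with mtcars:

coef(lm(mpg ~ wt - 1, mtcars))     # "- 1" is a formula operator: it drops the intercept
coef(lm(mpg ~ I(wt - 1), mtcars))  # I() protects the arithmetic: intercept plus slope on (wt - 1)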
You can use these methods in other models, for example frailtyPenal if that is indeed the one you are using:
library('frailtypack')
fp <- frailtyPenal(Surv(time, status) ~ age, data = veteran, n.knots = 12, kappa = 1e5)
exp(fp$coef)
exp(fp$coef) ^ 5
fp5_1 <- frailtyPenal(Surv(time, status) ~ age5, data = veteran, n.knots = 12, kappa = 1e5)
fp5_2 <- frailtyPenal(Surv(time, status) ~ I(age / 5), data = veteran, n.knots = 12, kappa = 1e5)
exp(fp5_1$coef)
exp(fp5_2$coef)

How to combine groups in Poisson regression to estimate contrast?

I'm not sure if this is more a programming or statistical (i.e. my lack of understanding) question.
I have a Poisson mixed model that I want to use to compare average counts across groups at different time periods.
mod <- glmer(Y ~ TX_GROUP * time + (1|ID), data = dat, family = poisson)
mod_em <- emmeans(mod, c("TX_GROUP","time"), type = "response")
TX_GROUP time rate SE df asymp.LCL asymp.UCL
0 1 5.743158 0.4566671 Inf 4.914366 6.711723
1 1 5.529303 0.4639790 Inf 4.690766 6.517741
0 2 2.444541 0.2981097 Inf 1.924837 3.104564
1 2 1.467247 0.2307103 Inf 1.078103 1.996855
0 3 4.570218 0.4121428 Inf 3.829795 5.453790
1 3 1.676827 0.2472920 Inf 1.255904 2.238826
Now, I want to estimate the marginal count for the combined time period (2 + 3) for each group. Is it not a simple case of exponentiating the sum of the logged counts from:
contrast(mod_em, list(`2 + 3` = c(0, 0, 1, 0, 1, 0)))
contrast(mod_em, list(`2 + 3` = c(0, 0, 0, 1, 0, 1)))
If I try that, the value does not come close to matching the simple mean of the combined groups.
First, I suggest that you put both of your contrasts in one list, e.g.,
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
             `2+3|1` = c(0, 0, 0, 1, 0, 1))
You have to decide when you want to back-transform. See the vignette on transformations and note the discussion on "timing is everything". The two basic options are:
One option: Obtain the marginal means of the log counts, and then back-transform:
mod_con = update(contrast(mod_em, contr), tran = "log")
summary(mod_con, type = "response")
[The update call is needed because contrast strips off transformations except in special cases; it doesn't always know what scale to assign to arbitrary linear functions. For example, the difference of two square roots is not on a square-root scale.]
Second option: Back-transform the predictions, then sum them:
mod_emmr = regrid(mod_em)
contrast(mod_emmr, contr)
The distinction between these results is the same as the distinction between a geometric mean (option 1) and an arithmetic mean (option 2). I doubt that either of them will yield the same results as the raw marginal mean counts, because they are based on the predictions from your model. Personally, I think the first option is the better choice, because sums are a linear operation, and the model is linear on the log scale.
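The distinction in miniature, with made-up numbers (a sketch):

x <- c(2, 8)
exp(mean(log(x)))  # 4 -- geometric mean: average on the log scale, then back-transform
mean(x)            # 5 -- arithmetic mean: back-transform first, then average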
Addendum
There is actually a third option, which is to create a grouping variable. I will illustrate with the pigs dataset.
> pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
Here are the EMMs for percent:
> emmeans(pigs.lm, "percent")
percent emmean SE df lower.CL upper.CL
9 3.445307 0.04088810 23 3.360723 3.529890
12 3.624861 0.03837600 23 3.545475 3.704248
15 3.662706 0.04372996 23 3.572244 3.753168
18 3.745156 0.05296030 23 3.635599 3.854713
Results are averaged over the levels of: source
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Now let's create a grouping factor group:
> pigs.emm = add_grouping(ref_grid(pigs.lm), "group", "percent", c("1&2","1&2","3&4","3&4"))
> str(pigs.emm)
'emmGrid' object with variables:
source = fish, soy, skim
percent = 9, 12, 15, 18
group = 1&2, 3&4
Nesting structure: percent %in% group
Transformation: “log”
Now get the EMMs for group and note they are just the averages of the respective levels:
> emmeans(pigs.emm, "group")
group emmean SE df lower.CL upper.CL
1&2 3.535084 0.02803816 23 3.477083 3.593085
3&4 3.703931 0.03414907 23 3.633288 3.774574
Results are averaged over the levels of: source, percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
And here is a summary on the response scale:
> summary(.Last.value, type = "response")
group response SE df lower.CL upper.CL
1&2 34.29790 0.961650 23 32.36517 36.34605
3&4 40.60662 1.386678 23 37.83703 43.57893
Results are averaged over the levels of: source, percent
Confidence level used: 0.95
Intervals are back-transformed from the log scale
These are averages rather than sums, but otherwise it works, and the transformation doesn't get zapped like it does in contrast().
Using the example data from the package, it seems to be fine, though I'd use the grouping in the formula instead.
> warp.lm <- lm(breaks ~ wool*tension, data = warpbreaks)
> warp.emm <- emmeans(warp.lm, c("tension", "wool"))
> warp.emm
tension wool emmean SE df lower.CL upper.CL
L A 44.55556 3.646761 48 37.22325 51.88786
M A 24.00000 3.646761 48 16.66769 31.33231
H A 24.55556 3.646761 48 17.22325 31.88786
L B 28.22222 3.646761 48 20.88992 35.55453
M B 28.77778 3.646761 48 21.44547 36.11008
H B 18.77778 3.646761 48 11.44547 26.11008
Confidence level used: 0.95
The sums of L and M should be roughly 44.6 + 24.0 ≈ 68.6 for A and 28.2 + 28.8 ≈ 57.0 for B.
> contrast(warp.emm, list(A.LM = c(1, 1, 0, 0, 0, 0),
+ B.LM = c(0, 0, 0, 1, 1, 0)))
contrast estimate SE df t.ratio p.value
A.LM 68.55556 5.157299 48 13.293 <.0001
B.LM 57.00000 5.157299 48 11.052 <.0001
Though I'd use the grouping in the formula.
> warp.em2 <- emmeans(warp.lm, ~tension|wool)
> contrast(warp.em2, list(LM = c(1, 1, 0)))
wool = A:
contrast estimate SE df t.ratio p.value
LM 68.55556 5.157299 48 13.293 <.0001
wool = B:
contrast estimate SE df t.ratio p.value
LM 57.00000 5.157299 48 11.052 <.0001
Thanks. The second method works for me, but not the first (which seems more intuitive) - it doesn't seem to give me back-transformed values:
(mod_em_inj <- emmeans(mod_inj, c("TX_GROUP","time"), type = "response"))
TX_GROUP time rate SE df asymp.LCL asymp.UCL
0 1 5.743158 0.4566671 Inf 4.914366 6.711723
1 1 5.529303 0.4639790 Inf 4.690766 6.517741
0 2 2.444541 0.2981097 Inf 1.924837 3.104564
1 2 1.467247 0.2307103 Inf 1.078103 1.996855
0 3 4.570218 0.4121428 Inf 3.829795 5.453790
1 3 1.676827 0.2472920 Inf 1.255904 2.238826
# Marginal means for combined period (7 - 24 months) - Method 1
(mod_em_inj2 <- emmeans(mod_inj, c("TX_GROUP","time")))
TX_GROUP time emmean SE df asymp.LCL asymp.UCL
0 1 1.7480092 0.07951497 Inf 1.59216273 1.9038557
1 1 1.7100619 0.08391274 Inf 1.54559591 1.8745278
0 2 0.8938574 0.12194916 Inf 0.65484147 1.1328734
1 2 0.3833880 0.15724024 Inf 0.07520279 0.6915732
0 3 1.5195610 0.09018011 Inf 1.34281119 1.6963107
1 3 0.5169035 0.14747615 Inf 0.22785558 0.8059515
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
             `2+3|1` = c(0, 0, 0, 1, 0, 1))
summary(contrast(mod_em_inj2, contr), type = "response")
contrast estimate SE df z.ratio p.value
2+3|0 2.4134184 0.1541715 Inf 15.654 <.0001
2+3|1 0.9002915 0.2198023 Inf 4.096 <.0001
# Marginal means for combined period (7 - 24 months) - Method 2
mod_emmr = regrid(mod_em_inj)
contrast(mod_emmr, contr)
contrast estimate SE df z.ratio p.value
2+3|0 7.014759 0.5169870 Inf 13.569 <.0001
2+3|1 3.144075 0.3448274 Inf 9.118 <.0001
The values of 7.01 and 3.14 are about what I should be getting. Apologies if I'm missing something obvious in your response.
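For completeness, Method 1 appears to be missing the update(..., tran = "log") step described above; a sketch, assuming mod_em_inj2 and contr as defined:

mod_con_inj <- update(contrast(mod_em_inj2, contr), tran = "log")
summary(mod_con_inj, type = "response")  # back-transformed (geometric-mean-style) estimates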
