R: Proportional Hazard Assumption in coxme()

I am running a mixed effects model using the coxme() function in R. The model analyzes the event of product success of firms in different countries.
Fixed effects are for example GDP, population, technology and cultural variables. Random effects are the different countries.
I know that with coxph() it is possible to test for proportional hazard using the cox.zph() command.
My question: How can I check for proportional hazard with coxme()?

Fixed effects in a random-effects coxme model can be checked for proportional hazards (PH) with the same cox.zph() function used for standard coxph() models. According to the manual, the fit argument for cox.zph() is "the result of fitting a Cox regression model, using the coxph or coxme functions."
Random effects "are not checked for proportional hazards, rather they are treated as a fixed offset in model."
An example, borrowed from this Cross-Validated question:
> library(survival)
> library(coxme)
> df <- stanford2
> df$cid <- round(df$id / 10) + 1 ## generates some clusters
> fit <- coxme(Surv(time, status) ~ age + t5 + (1 | cid), data = df)
> fit
Cox mixed-effects model fit by maximum likelihood
  Data: df
  events, n = 102, 157 (27 observations deleted due to missingness)
  Iterations= 2 12
                    NULL Integrated    Fitted
Log-likelihood -451.0944  -446.8618 -446.8261
                  Chisq   df        p  AIC   BIC
Integrated loglik  8.47 3.00 0.037317 2.47 -5.41
 Penalized loglik  8.54 2.04 0.014582 4.46 -0.88
Model:  Surv(time, status) ~ age + t5 + (1 | cid)
Fixed coefficients
          coef exp(coef)   se(coef)    z      p
age 0.02960206  1.030045 0.01135724 2.61 0.0091
t5  0.17056610  1.185976 0.18330590 0.93 0.3500
Random effects
 Group Variable  Std Dev      Variance
 cid   Intercept 0.0199835996 0.0003993443
> cox.zph(fit)
       chisq   df    p
age    0.831 3.00 0.84
t5     2.062 2.04 0.36
GLOBAL 2.767 5.04 0.74
This was done with survival_3.1-11 and coxme_2.2-16.

Related

Generalized least squares results interpretation

I checked my linear regression model (WMAN = Species, WDNE = sea surface temp) and found autocorrelation, so I am trying generalized least squares instead, with the following script:
library(nlme)
modelwa <- gls(WMAN ~WDNE, data=dat,
correlation = corAR1(form=~MONTH),
na.action=na.omit)
summary(modelwa)
I compared both models:
> library(MuMIn)
> model.sel(modelw,modelwa)
Model selection table
        (Intrc)   WDNE class na.action  correlation df   logLik   AICc delta weight
modelwa   31.50 0.1874   gls   na.omit crAR1(MONTH)  4 -610.461 1229.2  0.00      1
modelw    11.31 0.7974    lm   na.excl               3 -658.741 1323.7 94.44      0
Abbreviations:
na.action: na.excl = ‘na.exclude’
correlation: crAR1(MONTH) = ‘corAR1(~MONTH)’
Models ranked by AICc(x)
I believe the results suggest I should use gls as the AIC is lower.
My problem is, I have been reporting F-value/R²/p-value, but the output from the gls does not have these?
I would be very grateful if someone could assist me in interpreting these results?
> summary(modelwa)
Generalized least squares fit by REML
  Model: WMAN ~ WDNE
  Data: mp2017.dat
       AIC      BIC    logLik
  1228.923 1240.661 -610.4614
Correlation Structure: ARMA(1,0)
 Formula: ~MONTH
 Parameter estimate(s):
     Phi1
0.4809973
Coefficients:
                Value Std.Error  t-value p-value
(Intercept) 31.496911  8.052339 3.911524  0.0001
WDNE         0.187419  0.091495 2.048401  0.0424
 Correlation:
     (Intr)
WDNE -0.339
Standardized residuals:
      Min        Q1       Med        Q3       Max
-2.023362 -1.606329 -1.210127  1.427247  3.567186
Residual standard error: 18.85341
Degrees of freedom: 141 total; 139 residual
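I have now overcome the problem of autocorrelation, so I can use lm().
Add the lag-1 residual as an X variable to the original model. This can be done using the slide function in the DataCombine package.
library(DataCombine)
# economics ships with ggplot2; lmMod is assumed to be the original model
data(economics, package = "ggplot2")
lmMod <- lm(pce ~ pop, data = economics)
econ_data <- data.frame(economics, resid_mod1 = lmMod$residuals)
# slideBy = -1 shifts the residuals down one row, giving the lag-1 residual
econ_data_1 <- slide(econ_data, Var = "resid_mod1",
                     NewVar = "lag1", slideBy = -1)
econ_data_2 <- na.omit(econ_data_1)
lmMod2 <- lm(pce ~ pop + lag1, data = econ_data_2)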
This script can be found here
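On the reporting question above (no F-value in the gls summary), one option that may help is calling anova() on the gls fit, which gives a term-wise F-test, alongside the coefficient table from summary(). A minimal sketch on simulated data follows; the data frame sim, its columns, and the AR(1) coefficient of 0.5 are made up for illustration and are not the poster's data.

```r
library(nlme)

set.seed(1)
n <- 120
# Hypothetical monthly series: predictor x plus AR(1) errors (phi = 0.5)
sim <- data.frame(month = seq_len(n), x = rnorm(n, mean = 15, sd = 3))
sim$y <- 10 + 0.8 * sim$x + as.numeric(arima.sim(list(ar = 0.5), n = n))

# GLS with an AR(1) correlation structure indexed by month
fit_gls <- gls(y ~ x, data = sim, correlation = corAR1(form = ~ month))

# Coefficient table: estimate, standard error, t-value, p-value
coef_tab <- summary(fit_gls)$tTable

# Term-wise F-tests (numDF, F-value, p-value)
f_tab <- anova(fit_gls)

print(coef_tab)
print(f_tab)
```

With a single predictor, the F-test for x in f_tab and the t-test for x in coef_tab address the same hypothesis, so either could be reported.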

R command for stphtest in Stata

What is the equivalent command in R for stphtest in Stata?
If there is not a 1-to-1 command, how do you test the assumption of proportional hazards in a Cox Proportional-Hazards Regression model?
Just to expand Glen's answer:
I know how different a mindset R demands of Stata users.
First run the coxph model:
model.coxph0 <- coxph(Surv(t1, t2, event) ~ var1 + var2, data = data)
> summary(model.coxph0)
           coef exp(coef)  se(coef)      z Pr(>|z|)
var1  1.665e-01 1.181e+00 1.605e-01  1.038  0.29948
var2 -1.358e-03 9.986e-01 6.852e-04 -1.982  0.04746 *
Then run cox.zph on the model:
(viol.cox <- cox.zph(model.coxph0))
           rho chisq     p
var1    0.0486 1.095 0.295
var2   -0.0250 0.258 0.611
GLOBAL      NA 1.462 0.691
A p-value above 0.05 means the test finds no evidence that the proportional hazards assumption is violated for that term (or, in the GLOBAL row, for the model as a whole); a small p-value suggests a violation.
You can also plot the scaled Schoenfeld residuals with a simple plot on the cox.zph object:
> plot(viol.cox)
See ?survival:::cox.zph:
Description
Test the proportional hazards assumption for a Cox regression model fit (coxph).
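To make the steps above reproducible without the poster's data, here is a minimal sketch on the lung data set that ships with the survival package. The choice of covariates (age, sex) and the 5% cut-off are illustrative assumptions, not a recommendation.

```r
library(survival)

# Built-in lung cancer data; status is coded 1 = censored, 2 = dead
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test of the PH assumption: one row per term plus a GLOBAL row
zph <- cox.zph(fit)
print(zph)

# Terms whose test rejects PH at the conventional (arbitrary) 5% level
p_vals <- zph$table[, "p"]
flagged <- names(p_vals)[p_vals < 0.05]
print(flagged)
```

Calling plot(zph) afterwards draws the scaled Schoenfeld residuals against time for each term, which is usually more informative than the p-values alone.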

Testing for multiple determinants in cox

I'm doing an analysis where I want to find the determinants of (cardiovascular) events in my cohort of patients. I want to do a Cox analysis (coxph from the survival package) and find out which determinants are independent.
Now the question is: do I simply build one model containing all my prespecified determinants (age, gender, BMI, cholesterol levels etc.) and see what happens? Or do I have to do univariate testing first, and then only add the significant/near-significant ones (e.g. p-value < 0.10) to the full Cox model?
This was the model I am using now:
model1 <- coxph(Surv(time,event==1)~age+gender+ckdepi+smoking+cholesterol+hdl+bpsys+t2dm+bmi, data=data)
And this is my output:
            coef exp(coef) se(coef)     z       p
age      0.04235   1.04326  0.00911  4.65 3.3e-06 ***
gender  -0.36583   0.69362  0.11538 -3.17 0.00152 **
ckdepi  -0.01078   0.98927  0.00309 -3.49 0.00048 ***
smoking  0.14560   1.15674  0.03070  4.74 2.1e-06 ***
chol     0.10312   1.10862  0.03746  2.75 0.00590 **
hdl     -0.04485   0.95614  0.13231 -0.34 0.73465
sysbp    0.00356   1.00357  0.00225  1.58 0.11392
t2dm     0.39539   1.48496  0.11763  3.36 0.00078 ***
bmi     -0.00898   0.99106  0.01227 -0.73 0.46427
Likelihood ratio test=97.4 on 9 df, p=0
n= 3084, number of events= 525
(2 observations deleted due to missingness)
Also, I THINK I need to test for linearity (e.g. use restricted cubic splines or add quadratic terms) in the fully adjusted model, but I'm not sure. Is this correct?
In that case, my next step would be to test at least ckdepi and bmi since I'm pretty sure they might not be linear.
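One way such a linearity check is sometimes done is to compare the linear fit with a spline fit by a likelihood ratio test. The sketch below is only an illustration under stated assumptions: it uses the lung data from the survival package rather than the cohort above, and a natural cubic spline from splines::ns() with 3 degrees of freedom stands in for the restricted cubic splines mentioned in the question.

```r
library(survival)
library(splines)

# Linear effect of age versus a natural cubic spline with 3 df
fit_lin <- coxph(Surv(time, status) ~ age + sex, data = lung)
fit_spl <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex, data = lung)

# Likelihood ratio test; the linear model is nested in the spline model
lrt <- anova(fit_lin, fit_spl)
print(lrt)
```

A small p-value in the second row would suggest that the spline terms improve on the linear term, i.e. some evidence of non-linearity in age.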

Odds ratio and confidence intervals from glmer output

I have made a model that looks at a number of variables and the effect that has on pregnancy outcome. The outcome is a grouped binary. A mob of animals will have 34 pregnant and 3 empty, the next will have 20 pregnant and 4 empty and so on.
I have modelled this data using the glmer function where y is the pregnancy outcome (pregnant or empty).
mclus5 <- glmer(y~adg + breed + bw_start + year + (1|farm),
data=dat, family=binomial)
I get all the usual output with coefficients etc. but for interpretation I would like to transform this into odds ratios and confidence intervals for each of the coefficients.
In past logistic regression models I have used the following code
round(exp(cbind(OR=coef(mclus5),confint(mclus5))),3)
This would very nicely provide what I want, but it does not seem to work with the model I have run.
Does anyone know a way that I can get this output for my model through R?
The only real difference is that you have to use fixef() rather than coef() to extract the fixed-effect coefficients (coef() gives you the estimated coefficients for each group).
I'll illustrate with a built-in example from the lme4 package.
library("lme4")
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial)
Fixed-effect coefficients and confidence intervals, log-odds scale:
cc <- confint(gm1,parm="beta_") ## slow (~ 11 seconds)
ctab <- cbind(est=fixef(gm1),cc)
(If you want faster-but-less-accurate Wald confidence intervals you can use confint(gm1,parm="beta_",method="Wald") instead; this will be equivalent to @Gorka's answer but marginally more convenient.)
Exponentiate to get odds ratios:
rtab <- exp(ctab)
print(rtab,digits=3)
##               est 2.5 % 97.5 %
## (Intercept) 0.247 0.149  0.388
## period2     0.371 0.199  0.665
## period3     0.324 0.165  0.600
## period4     0.206 0.082  0.449
A marginally simpler/more general solution:
library(broom.mixed)
tidy(gm1,conf.int=TRUE,exponentiate=TRUE,effects="fixed")
for Wald intervals, or add conf.method="profile" for profile confidence intervals.
I believe there is another, much faster way (if you are OK with a less accurate result).
From: http://www.ats.ucla.edu/stat/r/dae/melogit.htm
First we get the confidence intervals for the Estimates
se <- sqrt(diag(vcov(mclus5)))
# table of estimates with 95% CI
tab <- cbind(Est = fixef(mclus5), LL = fixef(mclus5) - 1.96 * se, UL = fixef(mclus5) + 1.96 * se)
Then the odds ratios with 95% CI
print(exp(tab), digits=3)
Another option, I believe, is simply to use the emmeans package:
library(emmeans)
data.frame(confint(pairs(emmeans(fit, ~ factor_name,type="response"))))

Colnames error after running Summary() in mixed model

R version 3.1.0 (2014-04-10)
lme4 package version 1.1-6
lmerTest package version 2.0-6
I am currently working with lmer and lmerTest for my analysis.
Every time I add an effect to the random structure, I get the following error when running summary():
#Fitting a mixed model:
TRT5ToVerb.lmer3 = lmer(TRT5ToVerb ~ Group + Condition + (1+Condition|Participant) + (1|Trial), data=AllData, REML=FALSE, na.action=na.omit)
summary(TRT5ToVerb.lmer3)
Error in `colnames<-`(`*tmp*`, value = c("Estimate", "Std. Error", "df", : length of 'dimnames' [2] not equal to array extent
If I leave the structure like this:
TRT5ToVerb.lmer2 = lmer(TRT5ToVerb ~ Group + Condition + (1|Participant) + (1|Trial), data=AllData, REML=FALSE, na.action=na.omit)
there is no error when I run summary(TRT5ToVerb.lmer2), which returns AIC, BIC, logLik, deviance, estimates of the random effects, estimates of the fixed effects and their corresponding p-values, etc.
So, apparently something happens when I run lmerTest, despite the fact that the object TRT5ToVerb.lmer3 is there. The only difference between both is the random structure: (1+Condition|Participant) vs. (1|Participant)
Some characteristics of my data:
Both Condition and Group are categorical variables: Condition comprises 3 levels, and Group 2.
The dependent variable (TRT5ToVerb) is continuous: it corresponds to reading time in ms.
This is a repeated-measures experiment, with 48 observations per participant (participants = 28).
I read this thread, but I cannot see a clear solution. Could it be that I have to transform my data frame to long format?
And if so, then how do I work with that in lmer?
I hope it is not that.
Thanks!
Disclaimer: I am neither an expert in R, nor in statistics, so please, have some patience.
(Should be a comment, but too long/code formatting etc.)
This fake example seems to work fine with lmerTest 2.0-6 and a development version of lme4 (1.1-8; but I wouldn't expect there to be any relevant differences from 1.1-6 for this example ...)
AllData <- expand.grid(Condition=factor(1:3),Group=factor(1:2),
Participant=1:28,Trial=1:8)
form <- TRT5ToVerb ~ Group + Condition + (1+Condition|Participant) + (1|Trial)
library(lme4)
set.seed(101)
AllData$TRT5ToVerb <- simulate(form[-2],
newdata=AllData,
family=gaussian,
newparam=list(theta=rep(1,7),sigma=1,beta=rep(0,4)))[[1]]
library(lmerTest)
lmer3 <- lmer(form,data=AllData,REML=FALSE)
summary(lmer3)
Produces:
Linear mixed model fit by maximum likelihood ['merModLmerTest']
Formula: TRT5ToVerb ~ Group + Condition + (1 + Condition | Participant) +
    (1 | Trial)
   Data: AllData
     AIC      BIC   logLik deviance df.resid
  4073.6   4136.0  -2024.8   4049.6     1332
Scaled residuals:
     Min       1Q   Median       3Q      Max
-2.97773 -0.65923  0.02319  0.66454  2.98854
Random effects:
 Groups      Name        Variance Std.Dev. Corr
 Participant (Intercept) 0.8546   0.9245
             Condition2  1.3596   1.1660   0.58
             Condition3  3.3558   1.8319   0.44 0.82
 Trial       (Intercept) 0.9978   0.9989
 Residual                0.9662   0.9829
Number of obs: 1344, groups: Participant, 28; Trial, 8
Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)
(Intercept)    0.49867    0.39764   12.40000   1.254    0.233
Group2         0.03002    0.05362 1252.90000   0.560    0.576
Condition2    -0.03777    0.22994   28.00000  -0.164    0.871
Condition3    -0.27796    0.35237   28.00000  -0.789    0.437
Correlation of Fixed Effects:
           (Intr) Group2 Cndtn2
Group2     -0.067
Condition2  0.220  0.000
Condition3  0.172  0.000  0.794
