How to fit frailty survival models in R - r

Because this is such a long question I've broken it down into 2 parts; the first being just the basic question and the second providing details of what I've attempted so far.
Question - Short
How do you fit an individual frailty survival model in R? In particular I am trying to re-create the coefficient estimates and SE's in the table below that were found from fitting the a semi-parametric frailty model to this dataset link. The model takes the form:
h_i(t) = z_i h_0(t) exp(\beta'X_i)
where z_i is the unknown frailty parameter per each patient, X_i is a vector of explanatory variables, \beta is the corresponding vector of coefficients and h_0(t) is the baseline hazard function using the explanatory variables disease, gender, bmi & age ( I have included code below to clean up the factor reference levels).
Question - Long
I am attempting to follow and re-create the Modelling Survival Data in Medical Research text book example for fitting frailty mdoels. In particular I am focusing on the semi parametric model for which the textbook provides parameter and variance estimates for the normal cox model, lognormal frailty and Gamma frailty which are shown in the above table
I am able to recreate the no frailty model estimates using
library(dplyr)
library(survival)
dat <- read.table(
"./Survival of patients registered for a lung transplant.dat",
header = T
) %>%
as_data_frame %>%
mutate( disease = factor(disease, levels = c(3,1,2,4))) %>%
mutate( gender = factor(gender, levels = c(2,1)))
mod_cox <- coxph( Surv(time, status) ~ age + gender + bmi + disease ,data = dat)
mod_cox
however I am really struggling to find a package that can reliably re-create the results of the second 2 columns. Searching online I found this table which attempts to summarise the available packages:
source
Below I have posted my current findings as well as the code I've used encase it helps someone identify if I have simply specified the functions incorrectly:
frailtyEM - Seems to work the best for gamma however doesn't offer log-normal models
frailtyEM::emfrail(
Surv(time, status) ~ age + gender + bmi + disease + cluster(patient),
data = dat ,
distribution = frailtyEM::emfrail_dist(dist = "gamma")
)
survival - Gives warnings on the gamma and from everything I've read it seems that its frailty functionality is classed as depreciated with the recommendation to use coxme instead.
coxph(
Surv(time, status) ~ age + gender + bmi + disease + frailty.gamma(patient),
data = dat
)
coxph(
Surv(time, status) ~ age + gender + bmi + disease + frailty.gaussian(patient),
data = dat
)
coxme - Seems to work but provides different estimates to those in the table and doesn't support gamma distribution
coxme::coxme(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat
)
frailtySurv - I couldn't get to work properly and seemed to always fit the variance parameter with a flat value of 1 and provide coefficient estimates as if a no frailty model had been fitted. Additionally the documentation doesn't state what strings are support for the frailty argument so I couldn't work out how to get it to fit a log-normal
frailtySurv::fitfrail(
Surv(time, status) ~ age + gender + bmi + disease + cluster(patient),
dat = dat,
frailty = "gamma"
)
frailtyHL - Produce warning messages saying "did not converge" however it still produced coeficiant estimates however they were different to that of the text books
mod_n <- frailtyHL::frailtyHL(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat,
RandDist = "Normal"
)
mod_g <- frailtyHL::frailtyHL(
Surv(time, status) ~ age + gender + bmi + disease + (1|patient),
data = dat,
RandDist = "Gamma"
)
frailtypack - I simply don't understand the implementation (or at least its very different from what is taught in the text book). The function requires the specification of knots and a smoother which seem to greatly impact the resulting estimates.
parfm - Only fits parametric models; having said that everytime I tried to use it to fit a weibull proportional hazards model it just errored.
phmm - Have not yet tried
I fully appreciate given the large number of packages that I've gotten through unsuccessfully that it is highly likely that the problem is myself not properly understanding the implementation and miss using the packages. Any help or examples on how to successfully re-create the above estimates though would be greatly appreciated.

Regarding
I am really struggling to find a package that can reliably re-create the results of the second 2 columns.
See the Survival Analysis CRAN task view under Random Effect Models or do a search on R Site Search on e.g., "survival frailty".

Related

Probing interactions in nlme using the "interactions" package in R

I am running a linear mixed effects models using the "nlme" package looking at stress and lifestyle as predictors of change in cognition over 4 years in a longitudinal dataset. All variables in the model are continuous variables.
I am able to create the model and get the summary statistics using this code:
mod1 <- lme(MS ~ age + sex + edu + GDST1*Time + HLI*Time + GDST1*HLI*Time, random= ~ 1|ID, data=NuAge_long, na.action=na.omit)
summary(mod1)
I am trying to use the "interactions" package to probe the 3-way interaction:
sim_slopes(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
but am receiving this error:
Error in if (tcol == "df") tcol <- "t val." : argument is of length zero
I am also trying to plot the interaction using the same "interactions" package:
interact_plot(model = mod1, pred = Time, modx = GDST1, mod2 = HLI, data = NuAge_long)
and am receiving this error:
Error in UseMethod("family") : no applicable method for 'family' applied to an object of class "lme"
I can't seem to find what these errors mean and why I'm getting them. Any help would be appreciated!
From ?interactions::sim_slopes:
The function is tested with ‘lm’, ‘glm’,
‘svyglm’, ‘merMod’, ‘rq’, ‘brmsfit’, ‘stanreg’ models. Models
from other classes may work as well but are not officially
supported. The model should include the interaction of
interest.
Note this does not include lme models. On the other hand, merMod models are those generated by lme4::[g]lmer(), and as far as I can tell you should be able to fit this model equally well with lmer():
library(lme4)
mod1 <- lmer(MS ~ age + sex + edu + GDST1*Time + HLI*Time + GDST1*HLI*Time
+ (1|ID), data=NuAge_long)
(things will get harder if you want to specify correlation structures, e.g. correlation = corAR1(), which works for lme() but not lmer() ...)

Stratified Cox Model, Unusual Output For Stratified Factor Variable

I'm estimating a Cox PH model in R with time-varying stratification on a few of the variables. I'm using the following code to create the dataset and run the estimation
sepsis_working_fac_strat7 <- survSplit(Surv(LOS, facility) ~ ., data = sepsis_working, cut = 7, episode = "tgroup", id = "id")
cox_facility2 <- coxph(Surv(tstart, LOS, facility) ~ Age_10 + Gender + Insurance + LowInc + AbxBeforeCulture+ MorningDC+ AttendingAffilGrp+
CentralLine:strata(tgroup)+ Consults+ CountIVAbx:strata(tgroup)+ DischargeUnit+ FSSAdmit+ OralAbxBeforeDC+
OrderedToDischarge+ TimeToAbx+ TimeToBC+ UrineCulture+ Vasopressor+ Ventilator+
Behavioral_Dx + BradenGroup+ CCI+ Diabetes_Dx+ Dialysis+ PVD_Dx+ SUD_Dx, data = sepsis_working_fac_strat7, cluster = PersPersObjId)
In the output, there are four rows for the covariate "CentralLine" where there should only be two:
I have run the same estimation for other event types in the data and have not encountered this problem for CentralLine, which is a factor variable with two levels, where the base level is set to "No". Normally I see two rows for the strata, one for above and below the cut point. In the data, there are 184 events with a fairly even distribution between CentralLine == "Yes" and CentralLine == "No".
I'd like to know what is causing this output to appear as it does.
EDIT: This may be the issue: Why coxph() results some of the coefficient as NA when using survSplit() in R?

How to estimate coefficients for linear regression model on CEM matched data using `R` `cem` package?

I'm working on a project where we'd like to run a follow-up linear regression model on treatment-control data where the treatments have been matched to the controls using the cem package to perform coarsened exact matching:
match <- cem(treatment="cohort", data=df, drop=c("member_id","period","cohort_period"))
est <- att(match, total_cost ~ cohort + period + cohort_period, data = df)
where I'd like to estimate the coefficient and 95% CI on the "cohort_period" interaction term. It seems the att function in the cem package only estimates the coefficient for the specified treatment variable (in this case, "cohort") while adjusting for other variables in the regression.
Is there a way to return the coefficients and 95% CIs for the other regression terms?
Figured it out! I was using the wrong package - instead of cem I discovered the MatchIt and Zelig packages allow me to perform both exact matching and parameteric regression on the matched data:
library(MatchIt)
library(Zelig)
matched_df <- matchit(cohort ~ age_catg + sex + market_code + risk_score_catg, method="exact", data=df)
matched_df_reg <- zelig(total_cost ~ cohort + period + cohort_period, data = match.data(matched_df), model = "ls")

How to run fixed-effects logit model with clustered standard errors and survey weights in R?

I am using Afrobarometer survey data using 2 rounds of data for 10 countries. My DV is a binary 0-1 variable. I need to use logistic regression, fixed-effects, clustered standard errors (at country), and weighted survey data. A variable for the weights already exists in the dataframe.
I've been looking at help files for the following packages: clogit, glm, pglm, glm2, zelig, bife , etc. Typical errors include: can't add weights, can't do fixed effects, cant do either or etc.
#Glm
t3c1.fixed <- glm(formula = ethnic ~ elec_prox +
elec_comp + round + country, data=afb,
weights = afb$survey_weight,
index c("country", "round"),
family=binomial(link='logit'))
#clogit
t3c1.fixed2 <- clogit(formula = ethnic ~ elec_prox +
elec_comp + round + country, data=afb,
weights = afb$survey_weight,
method=c("within"))
#bife attempt
library(bife)
t3c1.fixed3 <- bife(ethnic ~ elec_prox + elec_comp + round +
country, model = logit,data=afb,
weights = afb$survey_weight,
bias_corr = "ana")
I either get error messages or the code doesn't include one of the conditions I need to include, so I can't use them. In Stata it appears this process is very simple, but in R it seems rather tedious. Any help would be appreciated!
I would check out the survey package which provides everything for which you are asking. The first step is to create the survey object, specify the survey weights and then you are off to the races.
library(survey)
my_survey <- svydesign(ids= ~1, strata = ~country, wts = ~wts, data = your_data)
# Then you can use the survey glm to do what you want via
svy_fit <- svy_glm(ethnic ~ elec_prox +
elec_comp + round + country, data = my_survey, family = binomial())
Or at least I would go down this path given you are using survey data.

how to do predictions from cox survival model with time varying coefficients

I have built a survival cox-model, which includes a covariate * time interaction (non-proportionality detected).
I am now wondering how could I most easily get survival predictions from my model.
My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex +
ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary +
ICH + IVH_dummy + IVH_dummy:log(event_time_mod)
And now I was hoping to get a prediction using survfit and providing new.data for the combination of variables I am doing the predictions:
survfit(cox, new.data=new)
Now as I have event_time_mod in the right-hand side in my model I need to specify it in the new data frame passed on to survfit. This event_time would need to be set at individual times of the predictions. Is there an easy way to specify event_time_mod to be the correct time to survfit?
Or are there any other options for achieving predictions from my model?
Of course I could create as many rows in the new data frame as there are distinct times in the predictions and setting to event_time_mod to correct values but it feels really cumbersome and I thought that there must be a better way.
You have done what is refereed to as
An obvious but incorrect approach ...
as stated in Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model vignette in version 2.41-3 of the R survival package. Instead, you should use the time-transform functionality, i.e., the tt function as stated in the same vignette. The code would be something similar to the example in the vignette
> library(survival)
> vfit3 <- coxph(Surv(time, status) ~ trt + prior + karno + tt(karno),
+ data=veteran,
+ tt = function(x, t, ...) x * log(t+20))
>
> vfit3
Call:
coxph(formula = Surv(time, status) ~ trt + prior + karno + tt(karno),
data = veteran, tt = function(x, t, ...) x * log(t + 20))
coef exp(coef) se(coef) z p
trt 0.01648 1.01661 0.19071 0.09 0.9311
prior -0.00932 0.99073 0.02030 -0.46 0.6462
karno -0.12466 0.88279 0.02879 -4.33 1.5e-05
tt(karno) 0.02131 1.02154 0.00661 3.23 0.0013
Likelihood ratio test=53.8 on 4 df, p=5.7e-11
n= 137, number of events= 128
The survfit though does not work when you have a tt term
> survfit(vfit3, veteran[1, ])
Error in survfit.coxph(vfit3, veteran[1, ]) :
The survfit function can not yet process coxph models with a tt term
However, you can easily get out the terms, linear predictor or mean response with predict. Further, you can create the term over time for the tt term using the answer here.

Resources