Cox regression with inverse probability of treatment weighting (IPTW) in R

A standard Cox regression looks like this:
coxph(formula = Surv(time, status) ~ v1 + v2 + v3, data = x)
I've calculated inverse probability of treatment weights (IPTW) from the propensity scores.
The propensity scores can be calculated like this (note the $fitted.values, so that ps holds the fitted probabilities rather than the model object):
ps <- glm(treat ~ v1 + v2 + v3, family = "binomial", data = x)$fitted.values
The IPTW weights are then calculated as:
weight <- ifelse(x$treat == 1, 1/ps, 1/(1 - ps))
Every subject in the dataset gets a specific weight this way, but I see no place to put the weights in the 'normal' Cox regression call.
Is there a Cox regression formula in which we can assign the calculated weight to each subject, and what R package or code is used for these calculations?

Propensity score weighting method
(inverse probability weighting method)
R was used for the following statistical analysis.
Load the following R packages:
library(ipw)
library(survival)
Estimate the propensity score for each ID in your data frame (base_model), based on the covariates.
The propensity score is the probability of treatment assignment given the observed covariates (v).
As in your data:
PS estimation
ps_model <- glm(treatment~v1+v2+v3...., family = binomial, data = base_model)
summary(ps_model)
# view propensity score values
pscore <- ps_model$fitted.values
base_model$propensityScore <- predict(ps_model, type = "response")
Calculate weights
#estimate weight for each patient
base_model$weight.ATE <- ifelse(base_model$treatment == "1", 1/base_model$propensityScore, 1/(1 - base_model$propensityScore))
base_weight <- ipwpoint(exposure = treatment, family = "binomial", link = "logit", numerator = ~1, denominator = ~v1+v2+v3....vn, data = base_model, trunc = 0.05) # truncation at 5% for a few extreme weights, if needed
base_model$weights.trunc <- base_weight$weights.trunc # truncated weights returned by ipwpoint
Survival analysis: Cox regression
#time to event analysis with weights
HR5 <- coxph(Surv(time, event) ~ as.factor(treatment), weights = weights.trunc, data = base_model)
summary(HR5)
The weights argument supplies the weights estimated earlier.
The cobalt or tableone packages would help you check balance in characteristics before and after propensity score weighting.
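For example, a minimal balance check with cobalt might look like this (a sketch reusing the treatment variable, covariates, and ATE weights from above):
library(cobalt)
# compare covariate balance before (un = TRUE) and after weighting
bal.tab(treatment ~ v1 + v2 + v3, data = base_model, weights = "weight.ATE", un = TRUE)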
Good luck!

You can do it like this, using the DIVAT dataset from the iptwsurvival package:
##Generate ID
DIVAT$ID<- 1:nrow(DIVAT)
We can calculate the IPTW weights for the average treatment effect (ATE) instead of the average treatment effect among the treated (ATT):
DIVAT$p.score <- glm(retransplant ~ age + hla, data = DIVAT,
                     family = "binomial")$fitted.values
DIVAT$ate.weights <- with(DIVAT, retransplant * 1/p.score + (1 - retransplant) * 1/(1 - p.score))
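For comparison, the corresponding ATT weights (a standard formula, shown here for illustration: treated subjects get weight 1, controls get the propensity odds) would be:
DIVAT$att.weights <- with(DIVAT, retransplant + (1 - retransplant) * p.score/(1 - p.score))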
Then we can perform a Cox regression:
####COX without weight
fit <- coxph(Surv(times, failures) ~ retransplant, data = DIVAT)
summary(fit)
Adding weights is quite easy:
###COX with weight naive model
fit <- coxph(Surv(times, failures) ~ retransplant, data = DIVAT, weights = ate.weights)
summary(fit)
###COX with weight and robust estimation
fit <- coxph(Surv(times, failures) ~ retransplant + cluster(ID), data = DIVAT, weights = ate.weights)
summary(fit)
However, estimated this way the standard error is biased (see Austin, Peter C. "Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis." Statistics in Medicine 35.30 (2016): 5642-5655).
Austin suggested relying on a bootstrap estimator. However, I'm stuck too, since I'm not able to find a way to perform this kind of analysis. If you find any answer, please let me know.
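For what it's worth, here is a sketch of how the bootstrap might be approached with the boot package (my own tentative implementation, re-estimating the propensity score within each resample, not code from Austin's paper):
library(boot)
boot_hr <- function(data, indices) {
  d <- data[indices, ]  # resample subjects with replacement
  # re-estimate the propensity score and ATE weights within the resample
  d$p.score <- glm(retransplant ~ age + hla, data = d, family = "binomial")$fitted.values
  d$w <- with(d, retransplant/p.score + (1 - retransplant)/(1 - p.score))
  # return the log hazard ratio from the weighted Cox model
  coef(coxph(Surv(times, failures) ~ retransplant, data = d, weights = w))
}
set.seed(42)
boot_res <- boot(DIVAT, boot_hr, R = 500)
boot.ci(boot_res, type = "perc")  # percentile CI for the log hazard ratio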


Longitudinal analysis using sampling weights in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical outcomes, I don't know how to do the analyses. WeMix::mix() has a weights parameter, but I'm not sure whether it treats them as sampling weights. In any case, that function doesn't support a multinomial family.
So, to summarize: can you enlighten me on how to do a pre-post analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
  id = rep(1:5, 2),
  time = c(rep("Pre", 5), rep("Post", 5)),
  outcome1 = sample(c("Yes", "No"), 10, replace = TRUE),
  outcome2 = sample(c("Low", "Medium", "High"), 10, replace = TRUE),
  outcome3 = rnorm(10),
  group = rep(sample(c("Man", "Woman"), 5, replace = TRUE), 2),
  weight = rep(c(1, 0.5, 1.5, 0.75, 1.25), 2)
)
data_wide <- dcast(data_long, id ~ time, value.var = c('outcome1', 'outcome2', 'outcome3', 'group', 'weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. It happens that glmer runs into a lot of problems (convergence warnings, high eigenvalues...), so I took another look at @ThomasLumley's answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now whether I can use participant id as the cluster in svydesign:
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.875e+01  1.000e+00  18.746   0.0339 *
groupWoman         -1.903e+01  1.536e+00 -12.394   0.0513 .
timePre             5.443e-09  5.443e-09   1.000   0.5000
groupWoman:timePre  2.877e-01  1.143e+00   0.252   0.8431
and still interpret groupWoman:timePre as the difference in the average rate of change/improvement in the outcome over time between sex groups, as if I were using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters, and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs) I would recommend just using svyglm or svy_vglm.
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.
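As an illustration of that last suggestion, a sketch with the toy data from the question (note that glmer treats these as prior weights and will warn about non-integer weights with a binomial response):
library(lme4)
# scale the weights so they sum to the sample size
data_long$w_scaled <- data_long$weight * nrow(data_long) / sum(data_long$weight)
m <- glmer(factor(outcome1) ~ group * time + (1 | id), data = data_long, family = binomial, weights = w_scaled)
summary(m)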

What weights mean in WeightIt package

I want to balance my data using the WeightIt package in R (method = "ebal"). I have used code similar to the one below:
#Balancing covariates between treatment groups (binary)
W1 <- weightit(treat ~ age + educ + married + nodegree + re74, data = lalonde, method = "ebal", estimand = "ATT")
match.data(W1)
The output is my data table with an additional column called weights. What do those weights mean, and how do I go on from here? (My next step would be a logit regression on the balanced dataset.)
Thank you so much for helping!
weightit() estimates weights that, when applied to a dataset, yield balance in the treatment groups. To estimate effects in the weighted sample, include the weights in a regression of the outcome on the treatment. This is demonstrated in the WeightIt vignette.
You should not use match.data() with WeightIt. I'm not sure where you found the code to do that. match.data() is for use with MatchIt, which is a different package with its own functions. The fact that match.data() happened to work with WeightIt is unintended behavior and should not be relied on.
To estimate the effect of the treatment on a binary outcome (which I'll denote as Y in the code below and assume is in the lalonde dataset, even though in reality it is not), you would run the following after running the first line in your code above:
fit <- glm(Y ~ treat, data = lalonde, weights = W1$weights, family = binomial)
lmtest::coeftest(fit, vcov. = sandwich::vcovHC)
The coefficient on treat is the log odds ratio of the outcome.
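For example, to report it as an odds ratio with a robust confidence interval (one way to do it; lmtest::coefci accepts the same vcov. argument as coeftest):
exp(coef(fit)["treat"])  # odds ratio
exp(lmtest::coefci(fit, vcov. = sandwich::vcovHC)["treat", ])  # robust 95% CI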

Propensity score matching with individual weights

I'm trying to perform propensity score matching on survey data. I'm aware of the MatchIt package, which can perform the matching procedure, but can I include the individual weights in some way? If I don't account for them, a less relevant observation can be matched with a more relevant one. Thank you!
Update 2020-11-25 below this answer.
Survey weights cannot be used with matching in this way. You might consider using weighting, which can accommodate survey weights. With weighting, you estimate the propensity score weights using a model that accounts for the survey weights, and then multiply the estimated weights by the survey weights to arrive at your final set of weights.
This can be done using the weighting companion to the MatchIt package, WeightIt (of which I am the author). With your treatment A, outcome Y (I assume continuous for this demonstration), covariates X1 and X2, and sampling weights S, you could run the following:
#Estimate the propensity score weights
w.out <- weightit(A ~ X1 + X2, data = data, s.weights = "S",
                  method = "ps", estimand = "ATT")
#Combine the estimated weights with the survey weights
att.weights <- w.out$weights * data$S
#Fit the outcome model with the weights
fit <- lm(Y ~ A, data = data, weights = att.weights)
#Estimate the effect of treatment and its robust standard error
lmtest::coeftest(fit, vcov. = sandwich::vcovHC)
It's critical that you assess balance after estimating the weights; you can do that using the cobalt package, which works with WeightIt objects and automatically incorporates the sampling weights into the balance statistics. Prior to estimating the effect, you would run the following:
cobalt::bal.tab(w.out, un = TRUE)
Only if balance was achieved would you continue on to estimating the treatment effect.
There are other ways to estimate weights besides using logistic regression propensity scores. WeightIt provides support for many methods, and almost all of them support sampling weights. The documentation for each method explains whether sampling weights are supported.
MatchIt 4.0.0 now supports survey weights through the s.weights argument, just like WeightIt. This supplies the survey weights to the model used to estimate the propensity scores but otherwise does not affect the matching. If you want units to be paired with other units that have similar survey weights, you should enter the survey weights as a variable to match on or to place a caliper on.
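In code, that might look like the following (a sketch using the same hypothetical variables A, X1, X2, and S as above):
library(MatchIt)
# survey weights enter the propensity score model via s.weights
m.out <- matchit(A ~ X1 + X2, data = data, s.weights = "S")
summary(m.out)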

Get predictions from coxph

library(survival)
# Create the simplest test data set
test1 <- list(time   = c(4,3,1,1,2,2,3),
              status = c(1,1,1,0,1,1,0),
              x      = c(0,2,1,1,1,0,0),
              sex    = c(0,0,0,0,1,1,1))
# Fit a simple model
m <- coxph(Surv(time, status) ~ x + sex, test1)
# 'by' is not an argument of predict(); this is what the question is about
y <- predict(m, type = "survival", by = "sex")
Basically what I am doing is making fake data called test1, then I am fitting a simple coxph model and saving it as 'm'. Then what I aim to do is get the predicted probabilities and confidence bands for the survival probability, separately for the sexes. Ideally the dataset 'y' would include: age, survival probability, lower confidence band, upper confidence band, and sex, which equals '0' or '1'.
This can be accomplished in two ways. The first is a slight modification to your code, using the predict() function to get predictions at a specific times for specific combinations of covariates. The second is by using the survfit() function, which estimates the entire survival curve and is easy to plot. The confidence intervals don't exactly agree as we'll see, but they should match fairly closely as long as the probabilities aren't too close to 1 or 0.
Below is code to make the predictions your code attempts. It uses the built-in cancer data. The important difference is to create a newdata data frame with the covariate values you're interested in. Because of the non-linear nature of survival probabilities, it is generally a bad idea to try to make a prediction for the "average person". Because we want a survival probability, we must also specify the time at which to evaluate it. I've taken time = 365, age = 60, and both sex = 1 and sex = 2. So this code predicts the 1-year survival probability for a 60-year-old male and a 60-year-old female. Note that we must also include status in the newdata, even though it doesn't affect the result.
library(survival)
mod <- coxph(Surv(time,status) ~ age + sex, data = cancer)
pred_dat <- data.frame(time = c(365, 365), status = c(2, 2),
                       age = c(60, 60), sex = c(1, 2))
preds <- predict(mod, newdata = pred_dat,
                 type = "survival", se.fit = TRUE)
pred_dat$prob <- preds$fit
pred_dat$lcl <- preds$fit - 1.96*preds$se.fit
pred_dat$ucl <- preds$fit + 1.96*preds$se.fit
pred_dat
#> time status age sex prob lcl ucl
#> 1 365 2 60 1 0.3552262 0.2703211 0.4401313
#> 2 365 2 60 2 0.5382048 0.4389833 0.6374264
We see that for a 60 year old male the 1 year survival probability is estimated as 35.5%, while for a 60 year old female it is 53.8%.
Below we estimate the entire survival curve using survfit(). I've saved time by reusing the pred_dat from above, and because the plot gets messy I've only plotted the male curve, which is the first row. I've also added some flair, but you only need the first 2 lines.
fit <- survfit(mod, newdata = pred_dat[1,])
plot(fit, conf.int = TRUE)
title("Estimated survival probability for age 60 male")
abline(v = 365, col = "blue")
abline(h = pred_dat[1,]$prob, col = "red")
abline(h = pred_dat[1,]$lcl, col = "orange")
abline(h = pred_dat[1,]$ucl, col = "orange")
Created on 2022-06-09 by the reprex package (v2.0.1)
I've overlaid lines corresponding to the predicted probabilities from part 1. The red line is the estimated survival probability at day 365 and the orange lines are the 95% confidence interval. The predicted survival probability matches, but if you squint closely you'll see the confidence interval doesn't match exactly. That's generally not a problem, but if it is a problem you should trust the ones from survfit() instead of the ones calculated from predict().
You can also dig into the values of fit to extract fitted probabilities and confidence bands, but the programming is a little more complicated because the desired time doesn't usually match exactly.
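For the common case of wanting the values at specific times, summary.survfit() handles the time matching for you; a sketch using the fit from above:
summ <- summary(fit, times = 365)
data.frame(time = summ$time, prob = summ$surv, lcl = summ$lower, ucl = summ$upper)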
Section 5 of this document by Dimitris Rizopoulos discusses how to estimate survival probabilities from a Cox model. Rizopoulos states:
the Cox model does not estimate the baseline hazard, and therefore we cannot directly obtain survival probabilities from it. To achieve that we need to combine it with a non-parametric estimator of the baseline hazard function. The most popular method to do that is to use the Breslow estimator. For a fitted Cox model from package survival these probabilities are calculated by function survfit(). As an illustration, we would like to derive survival probabilities from the following Cox model for the AIDS dataset:
He then goes on to provide R code that shows how to estimate Survival Probabilities at specific follow-up times.
I found this useful, it may help you too.

Parameter estimates and variance for stratified variables in Cox regression (strata / survival package)

I have run a Cox regression using the survival package to calculate the mortality hazard ratio of an exposure A. I found that the age variable violated the proportional hazards assumption (checked with cox.zph), so I used strata(age) to stratify by age in further models.
I need a parameter estimate for the age variable, as well as its variance and the covariance matrix (to calculate Rate Advancement Periods)... and I don't know where to find them!
Am I missing something or am I misunderstanding what strata is doing?
Here is a reproducible example, using the lung data from the survival package.
library(survival)
I create the survival object and do a first Cox regression with the non-stratified age variable.
lung$SurvObj <- with(lung, Surv(time, status == 2))
coxreg1 <- coxph(SurvObj ~ age + sex, data = lung)
So, I get coefficients, variance, and covariance matrix for the parameter estimates.
> coxreg1$coefficients
        age         sex
 0.01704533 -0.51321852
> vcov(coxreg1)
             age          sex
age 8.506877e-05 8.510634e-05
sex 8.510634e-05 2.804217e-02
Now, if I do a second regression with the stratified age variable, I don't get any coefficient estimate, variance, or covariance for age.
coxreg2 <- coxph(SurvObj ~ strata(age) + sex, data = lung)
> coxreg2$coefficients
     sex
-0.64471
> vcov(coxreg2)
          sex
sex 0.0449369
Thanks for the help!
When you use a variable for stratification, you don't get a coefficient estimate for it. Instead, separate baseline hazards are estimated for the different age strata.
The essence of a stratified Cox regression is to fit a model that has a different baseline hazard in each stratum.
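If you want to inspect those stratum-specific baselines, basehaz() returns the cumulative baseline hazard with a strata column (a quick look at the stratified model from the question):
bh <- basehaz(coxreg2)  # cumulative baseline hazard, one curve per age stratum
head(bh)                # columns: hazard, time, strata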
