Here is how I do propensity score matching in R:
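library(MatchIt)
library(MASS) # for glm.nb()
m.out <- matchit(treat ~ x1+x2, data = Newdata, method = "subclass", subclass=6)
dta_m <- match.data(m.out)
# negative binomial outcome model fitted to the matched data
propensity <- glm.nb(y ~ treat+x1+x2+treat:x1+treat:x2,data=dta_m)
summary(propensity)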
Here, "treat" is a dummy variable.
I want to assess the accuracy of the matching function (matchit), so I want to get the area under the ROC curve. My question is: how do I get the AUC in PSM?
Thank you.
You should not do this. See my answer here. Several studies have shown that there is no correspondence between the AUC of a propensity score model (aka the C-statistic) and its performance. That said, the propensity scores are stored in the distance component of the matchit output object, so you can take those and the treatment vector and put them into a function that computes the AUC from these values. I don't know of a function to do this because, as I mentioned, it's not good practice to do this with propensity scores.
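If the number is still wanted for purely descriptive purposes, the computation described above needs only the scores and the treatment vector. The following is a minimal base R sketch using the rank-based (Mann-Whitney) formula for the AUC. The function name auc_rank and the toy values are made up for illustration; the commented-out last line assumes the m.out object from the question. The caveat above still applies: this value says nothing about how well the matching balanced the covariates.

```r
# Rank-based (Mann-Whitney) AUC, base R only.
# `score`: numeric vector (e.g. propensity scores); `treat`: 0/1 vector.
# `auc_rank` is a hypothetical helper name, not part of MatchIt.
auc_rank <- function(score, treat) {
  stopifnot(length(score) == length(treat), all(treat %in% c(0, 1)))
  n1 <- sum(treat == 1)
  n0 <- sum(treat == 0)
  r <- rank(score)  # average ranks, so ties count as half
  (sum(r[treat == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy check on made-up values (not from any real model)
toy_auc <- auc_rank(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))
print(toy_auc)

# With a matchit object this would be, assuming `m.out` from the question:
# auc_rank(m.out$distance, m.out$treat)
```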
I'm using MatchIt to perform time-varying propensity score matching. I estimate propensity scores and do nearest neighbour matching on those, as well as exact matching on a few variables (some of which are used in the estimation of propensity scores, some of which are not), as follows:
matches <- matchit(
## Estimate propensity scores and perform nearest neighbour matching on propensity scores
y ~ x1
+ x2
+ x3
+ x4
+ x5
, method = "nearest" # matching method: nearest neighbour matching (on propensity score)
, distance = "glm" # method for estimating the propensity score: 'glm' = logit
# Also perform exact matching on additional variables
, exact = ~ x3
+ x4
+ x6
, data = df
, s.weights = ~ sampling_weights
)
Balance is good across all variables, but it's not so good on the propensity scores.
I think that matching on percentiles of propensity scores would solve this problem. My understanding is that this could be achieved by changing the 'method' argument to:
method = "subclass", subclass = 100
However, I don't think it's possible to use method = subclass while exact matching on other variables.
Can anyone say if it's possible to match on percentiles of propensity scores, while exact matching on other covariates using MatchIt?
Edited for clarity
"Balance is good across all variables, but it's not so good on the propensity scores."
Balance doesn't need to be good on the propensity scores. In fact, Stuart et al. (2013) found that balance on the propensity score is totally uncorrelated with bias. The purpose of matching is to achieve balance on the covariates; the propensity score is just an instrument to achieve that end. This is the propensity score tautology described in Ho et al. (2007). It sounds like your nearest neighbor match would be sufficient if balance was achieved, though it also sounds like your results might improve with a more sophisticated matching method, like genetic matching. Remember that many matching methods don't involve a propensity score at all.
You can also try full matching, which is very similar to subclassification with many subclasses and does allow exact matching.
If you can tell me what you think subclassification with exact matching constraints is supposed to look like, I can tell you how to achieve it. But subclassification works differently from other matching methods, and it's not immediately clear how to combine it with exact matching.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013
Stuart, E. A., Lee, B. K., & Leacy, F. P. (2013). Prognostic score-based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. Journal of Clinical Epidemiology, 66(8), S84. https://doi.org/10.1016/j.jclinepi.2013.01.013
My analysis focuses on causal inference. I am using the inverse of the propensity scores to form weights (Propensity score is the probability of receiving a treatment (or intervention) given a set of covariates).
My question is: does anyone know how to do the balance assessment for covariates before and after weighting?
I know there are packages out there that may do this, but I want to write it by hand not by using packages.
Here is an example:
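X1 <- c(1,1,1,0,0,1) #Covariate
X2 <- c(0,1,1,0,1,0) #Covariate
X3 <- c(1,0,1,1,1,0) #Treatment
X4 <- c(1,0,1,1,0,0) #Outcome
data <- data.frame(X1,X2,X3,X4)
#Fit the propensity score model on the full data, not only the treated
model <- glm(X3~X1+X2, family= "binomial", data=data)
propensity_score <- predict(model, newdata=data, type="response")
#Inverse probability weights: 1/ps for treated, 1/(1-ps) for controls
weights <- ifelse(data$X3==1, 1/propensity_score, 1/(1-propensity_score))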
The goal is to see whether the covariates are balanced after weighting with the inverse of the propensity score (I know the general idea but am not familiar with the theory behind it).
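One common hand-written check is the standardized mean difference (SMD) of each covariate between treatment groups, computed once without weights and once with them. The sketch below uses base R only and simulated data, because six rows are too few to show anything. The names dat, treat, smd and balance are made up for illustration, and dividing by the pooled unweighted standard deviation is one convention among several. Absolute values below roughly 0.1 are often treated as adequate balance.

```r
set.seed(1)
n  <- 500
X1 <- rbinom(n, 1, 0.5)                      # binary covariate
X2 <- rnorm(n)                               # continuous covariate
treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * X1 + 0.6 * X2))
dat <- data.frame(X1, X2, treat)             # simulated, illustration only

# Propensity scores and inverse probability (ATE) weights
ps_fit <- glm(treat ~ X1 + X2, family = binomial, data = dat)
ps <- predict(ps_fit, type = "response")
w  <- ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps))

# Standardized mean difference: (weighted) difference in group means
# divided by the pooled unweighted standard deviation
smd <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  s  <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / s
}

covs <- c("X1", "X2")
balance <- data.frame(
  covariate  = covs,
  smd_before = sapply(covs, function(v) smd(dat[[v]], dat$treat)),
  smd_after  = sapply(covs, function(v) smd(dat[[v]], dat$treat, w)),
  row.names  = NULL
)
print(balance)
```

Keeping the same denominator before and after weighting means a change in the SMD reflects a change in the mean difference rather than a change in the scaling.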
I'm trying to perform propensity score matching on survey data. I'm aware of the MatchIt package, which can perform the matching procedure, but can I somehow include the individual survey weights? If I don't take them into account, a less relevant observation can be matched with a more relevant one. Thank you!
Update 2020-11-25 below this answer.
Survey weights cannot be used with matching in this way. You might consider using weighting, which can accommodate survey weights. With weighting, you estimate the propensity score weights using a model that accounts for the survey weights, and then multiply the estimated weights by the survey weights to arrive at your final set of weights.
This can be done using the weighting companion to the MatchIt package, WeightIt (of which I am the author). With your treatment A, outcome Y (I assume continuous for this demonstration), covariates X1 and X2, and sampling weights S, you could run the following:
#Estimate the propensity score weights
w.out <- weightit(A ~ X1 + X2, data = data, s.weights = "S",
method = "ps", estimand = "ATT")
#Combine the estimated weights with the survey weights
att.weights <- w.out$weights * data$S
#Fit the outcome model with the weights
fit <- lm(Y ~ A, data = data, weights = att.weights)
#Estimate the effect of treatment and its robust standard error
lmtest::coeftest(fit, vcov. = sandwich::vcovHC)
It's critical that you assess balance after estimating the weights; you can do that using the cobalt package, which works with WeightIt objects and automatically incorporates the sampling weights into the balance statistics. Prior to estimating the effect, you would run the following:
cobalt::bal.tab(w.out, un = TRUE)
Only if balance was achieved would you continue on to estimating the treatment effect.
There are other ways to estimate weights besides using logistic regression propensity scores. WeightIt provides support for many methods, and almost all of them support sampling weights. The documentation for each method explains whether sampling weights are supported.
MatchIt 4.0.0 now supports survey weights through the s.weights argument, just like WeightIt. This supplies survey weights to the model used to estimate the propensity scores but otherwise does not affect the matching. If you want units to be paired with other units that have similar survey weights, you should enter the survey weights as a variable to match on or to place a caliper on.
A normal Cox regression looks as follows:
coxph(formula = Surv(time, status) ~ v1 + v2 + v3, data = x)
I've estimated propensity scores and used them to calculate inverse probability of treatment weighting (IPTW) weights.
Propensity scores can be calculated as follows:
ps <- glm(treat ~ v1 + v2 + v3, family = "binomial", data = x)$fitted.values
Weights used for IPTW are calculated as follows:
weight <- ifelse(x$treat == 1, 1/ps, 1/(1 - ps))
Every subject in the dataset can be weighted with aforementioned method (every subject does get a specific weight, calculated as above), but I see no place to put the weights in the 'normal' Cox regression formula.
Is there a Cox regression formula in which we can assign the calculated weights to each subject, and which R package or code is used for these calculations?
Propensity score weighting method
(inverse probability weighting method)
R was used for the following statistical analysis.
Load the following R packages:
library(ipw)
library(survival)
Estimate the propensity score for each ID in your data frame (base_model), based on the covariates.
The propensity score is the probability of treatment assignment given the covariates (v).
As shown in your data,
PS estimation
ps_model <- glm(treatment~v1+v2+v3...., family = binomial, data = base_model)
summary(ps_model)
# view propensity score values
pscore <- ps_model$fitted.values
# store the propensity score for each subject in the same data frame used below
base_model$propensityScore <- predict(ps_model, type = "response")
Calculate weights
#estimate weight for each patient
base_model$weight.ATE <- ifelse((base_model$treatment=="1"),(1/base_model$propensityScore), (1/(1-base_model$propensityScore)))
base_weight <- ipwpoint(exposure = treatment, family = "binomial", link="logit", numerator = ~1, denominator =~v1+v2+v3....vn, data = base_model, trunc=0.05) #truncation of 5% for few extreme weights if needed
Survival analysis: Cox regression
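#time to event analysis with weights
#use the truncated weights returned by ipwpoint()
base_model$weights.trunc <- base_weight$weights.trunc
HR5 <- coxph(Surv(time, event)~as.factor(treatment), weights = weights.trunc, data = base_model)
summary(HR5)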
The weights argument takes the weights estimated earlier.
The cobalt or tableone R packages can help you check the balance in characteristics before and after propensity score weighting.
Good luck!
You can do it like this, using the DIVAT dataset from the IPWsurvival package:
library(survival)
data(DIVAT, package = "IPWsurvival")
##Generate ID
DIVAT$ID<- 1:nrow(DIVAT)
We can calculate the IPTW weights for the average treatment effect (ATE) instead of the average treatment effect among the treated (ATT):
DIVAT$p.score <- glm(retransplant ~ age + hla, data = DIVAT,
family = "binomial")$fitted.values
DIVAT$ate.weights <- with(DIVAT, retransplant * 1/p.score + (1-retransplant)* 1/(1-p.score))
Then we can perform a Cox regression:
####COX without weight
coxph(Surv(times, failures)~ retransplant, data=DIVAT)->fit
summary(fit)
Adding weight is quite easy
###COX with weight naive model
coxph(Surv(times, failures)~ retransplant, data=DIVAT, weights = ate.weights)->fit
summary(fit)
###COX with weight and robust estimation
coxph(Surv(times, failures)~ retransplant + cluster(ID), data=DIVAT, weights = ate.weights)->fit
summary(fit)
However, in this way the estimation of standard error is biased (please see Austin, Peter C. "Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis." Statistics in medicine 35.30 (2016): 5642-5655.).
Austin suggested relying on a bootstrap estimator. However, I'm stuck too, since I haven't found a way to perform this kind of analysis. If you find an answer, please let me know.
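One way a bootstrap could be set up is sketched below. It is not taken from Austin's paper and is only a minimal illustration under stated assumptions: the data are simulated (not DIVAT), and the names sim, trt and iptw_loghr are hypothetical. Rows are resampled with replacement, and the propensity score model, the weights and the weighted Cox model are all re-estimated within each resample. That way the spread of the estimates also reflects the uncertainty from estimating the weights.

```r
library(survival)

set.seed(42)
n   <- 300
age <- rnorm(n, 50, 10)
hla <- rbinom(n, 1, 0.3)
trt <- rbinom(n, 1, plogis(-0.2 + 0.03 * (age - 50) + 0.5 * hla))
t_event <- rexp(n, rate = 0.05 * exp(0.4 * trt + 0.02 * (age - 50)))
t_cens  <- rexp(n, rate = 0.02)
sim <- data.frame(age, hla, trt,            # simulated, illustration only
                  time   = pmin(t_event, t_cens),
                  status = as.integer(t_event <= t_cens))

# One full pass: estimate the PS, build ATE weights, fit the weighted Cox model
iptw_loghr <- function(d) {
  ps  <- glm(trt ~ age + hla, family = binomial, data = d)$fitted.values
  d$w <- ifelse(d$trt == 1, 1 / ps, 1 / (1 - ps))
  unname(coef(coxph(Surv(time, status) ~ trt, data = d, weights = w)))
}

# Resample rows with replacement and repeat the whole pass each time
B <- 200
boot_est <- replicate(B, {
  idx <- sample(nrow(sim), replace = TRUE)
  iptw_loghr(sim[idx, ])
})

est     <- iptw_loghr(sim)                      # point estimate (log hazard ratio)
se_boot <- sd(boot_est)                         # bootstrap standard error
ci_pct  <- quantile(boot_est, c(0.025, 0.975))  # percentile interval
print(c(logHR = est, se = se_boot))
print(ci_pct)
```

B = 200 keeps the example fast; a real analysis would normally use more replicates.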
I'm running a dif-in-dif estimation and using the MatchIt package to match my treatment and control groups by their distance to a certain location (nearest neighbour matching, logit model, caliper = 0.25).
Everything is OK with the actual matching. However, I ran across this kind of plot in a paper I read:
I'm a bit confused: how is it possible to plot propensity scores before matching, since the matching itself gives the propensity scores? If anyone is familiar with this kind of plotting, I'd appreciate help. Here's my code so far, which only gives the density functions after matching for treatment (Near) and control.
m.df <- matchit(Near ~ Distance_to_center, data = df, method = "nearest", distance = "logit", caliper =0.25)
mdf <- match.data(m.df,distance = "pscore")
df <- mdf
plot(density(df$pscore[df$Near==1]))
plot(density(df$pscore[df$Near==0]))
Matching does not give the propensity scores. Propensity scores are first estimated, then matchit() matches units on the propensity scores.
You can extract the propensity scores for the whole sample from the matchit object. What you did when you used match.data() is extract the propensity scores for only the matched data. The propensity scores for the whole sample are stored in m.df$distance. So, to manually generate those plots, you can use:
plot(density(m.df$distance[df$Near==1]))
plot(density(m.df$distance[df$Near==0]))
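before using match.data() and, in particular, before overwriting df with the matched data (the df <- mdf step), so that df$Near still lines up with m.df$distance.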
You can also use the cobalt package to automatically generate these plots:
bal.plot(m.df, var.name = "distance", which = "both")
will generate the same density plots in one simple line of code.