What the weights mean in the WeightIt package - R

I want to balance my data using the WeightIt package in R (method = "ebal"). I have used code similar to the one below:
#Balancing covariates between treatment groups (binary)
W1 <- weightit(treat ~ age + educ + married + nodegree + re74, data = lalonde, method = "ebal", estimand = "ATT")
match.data(W1)
The outcome is my data table with an additional column called weights. What do those weights mean and how do I go on from here? (My next step would be to do a logit regression with a balanced dataset)
Thank you so much for helping!

weightit() estimates weights that, when applied to a dataset, yield balance in the treatment groups. To estimate effects in the weighted sample, include the weights in a regression of the outcome on the treatment. This is demonstrated in the WeightIt vignette.
You should not use match.data() with WeightIt. I'm not sure where you found the code to do that. match.data() is for use with MatchIt, which is a different package with its own functions. The fact that match.data() happened to work with WeightIt is unintended behavior and should not be relied on.
To estimate the effect of the treatment on a binary outcome (which I'll denote as Y in the code below and assume is in the lalonde dataset, even though in reality it is not), you would run the following after running the first line in your code above:
fit <- glm(Y ~ treat, data = lalonde, weights = W1$weights, family = binomial)
lmtest::coeftest(fit, vcov. = sandwich::vcovHC)
The coefficient on treat is the log odds ratio of the outcome.
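If you want the effect on the odds ratio scale, you can exponentiate the coefficient and its robust confidence interval; a minimal sketch using the fit above:
exp(coef(fit)["treat"]) # odds ratio for treatment
exp(lmtest::coefci(fit, vcov. = sandwich::vcovHC)["treat", ]) # robust 95% CI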

Related

Running a glmer model with only random factors to get an estimate

I have run the following GLMER using the mixed() function (from the afex package) to estimate the paternity success of two different types of sires in different ecological scenarios. The predicted outcome is the relative number of offspring from each of the sires. I would like to know if sex ratio and density contribute to differential paternity success.
I used the following model to estimate this:
sex.model.3 <- mixed(cbind(Ive, IV) ~ log2ratio + Total + (1|VialID) + (1|Batch),
                     method = "LRT", data = sexratioSO, family = binomial())
summary(sex.model.3)
sex.model.3$anova_table
Now, I would like to get estimates for each of these different sex ratios, so that I can plot them in a manner something like this:
[Plot omitted: relative fitness on the y axis against sex ratio.]
In this plot, the "relative fitness" on the y axis would be calculated from the estimate.
So, for this I need the estimates specific to each sex ratio. I subsetted the data by one of the sex ratios using the following code:
unique(sexratioSO$ratio) # choosing the first sex ratio to subset
data = subset(sexratioSO, ratio == "2:01") # I would like to do this for all sex ratios in my data file.
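For example, split() could produce one such subset per ratio in a single step (ratio_subsets is just an illustrative name):
# one data frame per sex ratio, named by the ratio value
ratio_subsets <- split(sexratioSO, sexratioSO$ratio)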
I then ran the null model with only my random factors to get an estimate of the relative number of offspring for that sex ratio, so that I can plot it (I would like to do the same for all the sex ratios):
sex.ratio.estimate = lmer(cbind(Ive, IV) ~ (1|VialID) + (1|Batch), data = data,
                          control = lmerControl(check.nobs.vs.nlev = "ignore",
                                                check.nobs.vs.rankZ = "ignore",
                                                check.nobs.vs.nRE = "ignore"))
However, I get this error -
Error in v/v.e : non-conformable arrays
FYI - "VialID" and "Batch" are factors, since they are taken as random effects.
You can find the data file here.
It would be great if I can have some help on how I can obtain estimates for each of the sex ratios to plot them.
Many thanks.

Longitudinal analysis using sampling weights in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a weights parameter, but I'm not sure whether it treats them as sampling weights. In any case, that function doesn't support the multinomial family.
So, to summarize: can you enlighten me on how to do a pre-post analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. It happens that glmer reports a lot of problems (convergence, high eigenvalues...), so I took another look at @ThomasLumley's answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question now is whether I can use participant id as clusters in svydesign:
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         1.875e+01  1.000e+00  18.746   0.0339 *
groupWoman         -1.903e+01  1.536e+00 -12.394   0.0513 .
timePre             5.443e-09  5.443e-09   1.000   0.5000
groupWoman:timePre  2.877e-01  1.143e+00   0.252   0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters, and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs), I would recommend just using svyglm.
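For a multinomial outcome like outcome2 above, that could look something like the sketch below; I'm assuming svyVGAM accepts the VGAM family this way, so check the arguments against your version:
library(svyVGAM)
# multinomial pre-post model on the long-format design defined above
svy_vglm(factor(outcome2) ~ group * time,
         design = w_data_long_cluster,
         family = VGAM::multinomial())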
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.
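The scaling step itself is one line; for example, with the data above:
# rescale the sampling weights to sum to the sample size before passing
# them to non-survey mixed-model software
data_long$wt_scaled <- data_long$weight * nrow(data_long) / sum(data_long$weight)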

plm() versus lm() with multiple fixed effects

I am attempting to run a model with county, year, and state:year fixed effects. The lm() approach looks like this:
lm <- lm(data = mydata, formula = y ~ x + county + year + state:year)
where county, year, and state:year are all factors.
Because I have a large number of counties, running the model is very slow using lm(). More frustratingly, given the number of models I need to produce, lm() also creates a much larger object than plm(). The following plm() command yields the same coefficients and significance levels for my main variables:
plm <- plm(data = mydata, formula = y ~ x + year + state:year, index = "county", model = "within")
However, these produce substantially different R-squared, Adj. R-squared, etc. I thought I could solve the R-squared problem by calculating the R-squared for plm by hand:
SST <- sum((mydata$y - mean(mydata$y))^2)
fit <- (mydata$y - plm$residuals)
SSR <- sum((fit - mean(mydata$y))^2)
R2 <- SSR / SST
I tested the R-squared code with lm and got the same result reported by summary(lm). However, when I calculated R-squared for plm I got a different R-squared (and it was greater than 1).
At this point I checked what the coefficients for my fixed effects in plm were and they were different than the coefficients in lm.
Can someone please 1) help me understand why I'm getting these differing results and 2) suggest the most efficient way to construct the models I need and obtain correct R-squareds? Thanks!
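For comparison, a formulation that is bounded above by 1 by construction would be the following, though I'm not sure it is the right one for plm:
SST <- sum((mydata$y - mean(mydata$y))^2)
SSE <- sum(residuals(plm)^2) # within-model residuals
R2 <- 1 - SSE / SST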

Cox regression with Inverse Propensity Treatment Weighting

A normal Cox regression is as follows:
coxph(formula = Surv(time, status) ~ v1 + v2 + v3, data = x)
I've calculated the Inverse Propensity Treatment Weighting (IPTW) weights from the corresponding propensity scores.
Propensity scores can be calculated as follows:
ps <- glm(treat ~ v1 + v2 + v3, family = "binomial", data = x)$fitted.values
The weights used for IPTW are calculated as follows:
weight <- ifelse(x$treat == 1, 1/ps, 1/(1 - ps))
Every subject in the dataset can be weighted with the aforementioned method (every subject gets a specific weight, calculated as above), but I see no place to put the weights in the 'normal' Cox regression formula.
Is there a Cox regression formula in which we can assign the calculated weights to each subject, and which R package or code is used for these calculations?
Propensity score weighting method (inverse probability weighting method).
The following statistical analysis uses R.
Load the following R packages:
library(ipw)
library(survival)
Estimate the propensity score for each ID in your data frame (base_model), based on the covariates.
The propensity score is the probability of treatment assignment given the covariates (v).
As shown in your data:
# PS estimation
ps_model <- glm(treatment~v1+v2+v3...., family = binomial, data = base_model)
summary(ps_model)
# view the propensity score values
pscore <- ps_model$fitted.values
base_model$propensityScore <- predict(ps_model, type = "response")
Calculate weights
#estimate weight for each patient
base_model$weight.ATE <- ifelse((base_model$treatment=="1"),(1/base_model$propensityScore), (1/(1-base_model$propensityScore)))
base_weight <- ipwpoint(exposure = treatment, family = "binomial", link = "logit",
                        numerator = ~1, denominator = ~v1+v2+v3....vn,
                        data = base_model, trunc = 0.05) # truncation of 5% for few extreme weights if needed
Survival analysis: Cox regression
#time to event analysis with weights
HR5 <- coxph(Surv(time, event) ~ as.factor(treatment), weights = base_weight$weights.trunc, data = base_model)
summary(HR5)
The weights argument is supplied with the weights estimated earlier.
The cobalt or tableone packages in R can help you view balance in characteristics before and after propensity score weighting.
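For example, a minimal balance check with cobalt might look like this (covariate and weight column names assumed from the code above):
library(cobalt)
# standardized mean differences before (un = TRUE) and after weighting
bal.tab(treatment ~ v1 + v2 + v3, data = base_model,
        weights = "weight.ATE", method = "weighting", un = TRUE)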
Good luck!
You can do it like this using the DIVAT dataset from the iptwsurvival package:
##Generate ID
DIVAT$ID<- 1:nrow(DIVAT)
We can calculate the IPTW weights for the average treatment effect (ATE) instead of the average treatment effect among the treated (ATT):
DIVAT$p.score <- glm(retransplant ~ age + hla, data = DIVAT,
family = "binomial")$fitted.values
DIVAT$ate.weights <- with(DIVAT, retransplant * 1/p.score + (1-retransplant)* 1/(1-p.score))
Then we can perform a Cox regression:
####COX without weight
fit <- coxph(Surv(times, failures) ~ retransplant, data = DIVAT)
summary(fit)
Adding weights is quite easy:
###COX with weight naive model
fit <- coxph(Surv(times, failures) ~ retransplant, data = DIVAT, weights = ate.weights)
summary(fit)
###COX with weight and robust estimation
fit <- coxph(Surv(times, failures) ~ retransplant + cluster(ID), data = DIVAT, weights = ate.weights)
summary(fit)
However, in this way the estimation of the standard error is biased (please see Austin, Peter C. "Variance estimation when using inverse probability of treatment weighting (IPTW) with survival analysis." Statistics in Medicine 35.30 (2016): 5642-5655).
Austin suggested relying on a bootstrap estimator. However, I'm stuck too, since I haven't been able to find a way to perform this kind of analysis. If you find an answer, please let me know.
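The general idea would be something like the sketch below (resample subjects, re-estimate the propensity score, and refit the weighted Cox model in each replicate), though I haven't validated it:
library(boot)
# one bootstrap replicate: resample subjects, re-estimate the propensity
# score, rebuild the ATE weights, and refit the weighted Cox model
boot_hr <- function(data, indices) {
  d <- data[indices, ]
  d$p.score <- glm(retransplant ~ age + hla, data = d, family = "binomial")$fitted.values
  d$ate.weights <- with(d, retransplant / p.score + (1 - retransplant) / (1 - p.score))
  coef(coxph(Surv(times, failures) ~ retransplant, data = d, weights = d$ate.weights))
}
set.seed(123)
boot_res <- boot(DIVAT, boot_hr, R = 1000)
boot.ci(boot_res, type = "perc") # percentile CI for the log hazard ratio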

Life expectancy with the survival package in R

I would like to calculate the life-years lost due to a disease in a way that corrects for other variables in the model (the corrected group prognosis method). My dataset is a cohort of individuals for which I have follow-up time until death/censoring and a variable indicating whether they died, together with covariates such as age, sex, and prevalent disease. I searched the web and got the impression this should be possible with the survival package in R.
I used the following code which returns probabilities:
fit1 <- coxph(Surv(fup_death, death) ~ age + sex + prev_disease, data)
direct <- survexp( ~prev_disease, data=data, ratetable=fit1)
I also tried the survfit function, but then my computer crashes:
t <- survfit(fit1, newdata = data)
How can I derive the life expectancy for those with the disease and those without? Or should I do it differently?
Thank you in advance!
The calculation for years of life lost is the difference in mean survival. You can get survfit objects for two separate but comparable conditions like this:
fit1 <- coxph(Surv(fup_death, death) ~ age + sex + prev_disease, data)
survfit_WithDisease <- survfit(fit1,
newdata=data.frame(age=50,
sex='m',
prev_disease=TRUE))
survfit_NoDisease <- survfit(fit1,
newdata=data.frame(age=50,
sex='m',
prev_disease=FALSE))
and by setting print.rmean=TRUE you can get estimates of mean survival for each condition.
print(survfit_WithDisease,print.rmean=TRUE)
print(survfit_NoDisease,print.rmean=TRUE)
Note that mean isn't defined for every survival curve. There are several options for calculating mean survival when the survival curve does not go all the way to zero, which you should read about in ?print.survfit.
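A sketch of extracting those means programmatically, assuming the survfit objects above (the rmean argument and the "*rmean" column name are documented in ?summary.survfit; adjust the time horizon to your data):
# restricted mean survival up to 10 time units for each condition;
# the difference approximates the life-years lost due to the disease
tab_with <- summary(survfit_WithDisease, rmean = 10)$table
tab_no <- summary(survfit_NoDisease, rmean = 10)$table
tab_no["*rmean"] - tab_with["*rmean"]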
