I would really appreciate any help with specifying probability weights in R without using the Lumley survey package. I am conducting mediation analysis in R using the Imai et al mediation package, which does not currently support svyglm.
The code I am currently running is:
olsmediator_basic<-lm(poledu ~ gateway_strict_alt + gender_n + spline1 + spline2 + spline3,
data = unifiedanalysis, weights = designweight).
However, I'm unsure if this is weighting the data correctly. The reason is that this code yields standard errors that differ from those I am getting in Stata. The Stata code I am running is:
reg poledu gateway_strict_alt gender_n spline1 spline2 spline3 [pweight=designweight]).
I was wondering if the weights option in R may not be for inverse probability weights, but I was unable to determine this from the documentation, this forum or elsewhere. If I am missing something, I really apologize - I am new to R as well as to this forum.
Thank you in advance for your help.
The R documentation specifies that the weights parameter of the lm function is inversely proportional to the variance of the observations. This is the definition of analytic weights, or aweights in Stata.
Have a look at the ipw package for inverse probability weighting.
To correct a previous answer - I looked up the manual on weights and found the following description for weights in lm
Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).
These are actually frequency weights (fweights in stata). They multiply out the observation n number of times as defined by the weight vector. Probability weights, on the other hand, refer to the probability that observations group is included in the population. Doing so adjusts the impact of the observation on the coefficients, but not on the standard errors, as they don't change the number of observations represented in the sample.
Related
Implementing a GLM in R with weights.
I am wondering whether this quote from R Documentation of GLM
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers , that each response is the mean of unit-weight observations.
means that the log-likelihood is modified in the following way
\sum_{i} \log f(X_i) \to \sum_{i} w_i \log f(X_i)
only when the weights are positive integers?
If this is not possible at all then how do I incorporate weights in the way as described in the formula?
I am running an analysis of hospital length of stay based on a number of parameters in R, with one primary exposure. Many of the covariates are missing, most commonly lab values, because these aren't checked for all patients. I've worked out what seems to be a good multiple imputation schema using MICE. Because of imbalance between exposed and unexposed groups, I'm also weighting using propensity scores.
I've managed to run a successful weighted Poisson model with MICE and WeightThem. However, when I checked the models for overdispersion, it does appear that the variance is greater than the mean, implying I should be using a quasipoisson or negative binomial model. However, I can't find documentation on negative binomial models with WeightThem or WeightIt in R.
Does anyone have any experience? To run a negative binomial model, i can just use the following code:
results <- with(models, MASS::glm.nb(LOS ~ exposure + covariate1 + covariate2)
in which "models" is the multiply-imputed WeightIt object.
However, according to the WeightIt documentation, when using any glm model you need to run it as a svyglm to get proper standard errors:
results <- with(models, svyglm(LOS ~ exposure + covariate1 + covariate2,
family = poisson()))
There is a function in the sjstats package called svyglm.nb, but this requires creating a design matrix or the model won't run. I have no idea how/whether this is necessary - is the first version (just glm.nb) sufficient? Am I entirely thinking about this wrong?
Thanks so much, advice is much appreciated.
My response variable is Yijk corresponding to the recovery time of
patient i (i=1,...,I)
with treatment j (j=1,...,J)
and measured at time k (k=1,...,K)
I would like to fit the following model:Model equation, where:
μ is a global fixed intercept
αj is a fixed effect for the treatment
bik is a random effect with the following covariance structure. Denote bi the K-dimensional vector of effect for the patient i, then its variance-covariance matrix would have the following AR(1) structure.
Variance covariance matrix
uijk is the usual error term with variance σ²
Consider the following line of command:
lme(recovery ~ treatment, method="REML", random=~1|patient, correlation=corAR1,form=~time|patient,data=data)
Several questions:
What does this correlation argument correspond to? The structure of covariance of what? Is that the var-cov matrix which I defined as R?
Does the line actually do what I would like to?
If not, what does it do?
If not, is there a way to do what I would like to?
Thank you in advance!
First, you have a command lme, I will assume that is meant to be nlme because a) lme isn't an R command in any package that I know of or that R could find and b) correlation isn't an option in lme4
Second, in the documentation for nlme they have this:
an optional corStruct object describing the within-group correlation
structure. See the documentation of corClasses for a description of
the available corStruct classes. Defaults to NULL, corresponding to no
within-group correlations.
and in corClasses it says
corAR1 autoregressive process of order 1.
So, the answers to your first two questions appears to be "Yes".
I want to fit mydata with several known distributions, power law with exponential cutoff distribution is one of the candidates.
fitdistr function in package fitdistrplus is one of good methods to use for the parameter estimation using MLE, or MME, or QME.
But power law with exponential cutoff is not the base probability function according to CRAN Task View: Probability Distributions , so I try the nls function.
The pdf of power law with exponential cutoff is f(x;α,λ)=C*x^(−α)*exp(−λ*x)
First, I generate some random values to replace my real data:
data <- rlnorm(1000,0.6,1.23)
h <- hist(data,breaks=1000,plot=FALSE)
x <- h$mids
y <- h$density
Then, I use nls function to conduct parameter estimation:
nls(y~c*x^(-a)*exp(-b*x),start=list(a=1,b=1,c=1))
But it does not work and always throws one of these two errors:
Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model
Or: singular gradient matrix at initial parameter estimates
Before posting, I have read almost all the previous posts and google, there are several reasons for the errors:
bad start values for the nls. I tried a lot, but it does not work.
some negative values or values less than 1 or values equal to Inf may be generated. I tried to do the data cleaning, also, it does not work.
What should I do now? Or are there some other better methods to do the parameter estimation of power law with exponential cutoff? I need your help, thank you!
I'm using the svydesign package in R to run survey weighted logit regressions as follows:
sdobj <- svydesign(id = ~0, weights = ~chweight, strata = ~strata, data = svdat)
model1 <- svyglm(formula=formula1,design=sdobj,family = quasibinomial)
However, the documentation states a caveat about regressions without specifying finite population corrections (FPC):
If fpc is not specified then sampling is assumed to be
with replacement at the top level and only the first stage of
cluster is used in computing variances.
Unfortunately, I do not have sufficient information to specify my populations at each level (of which I sampling very little). Any information on how to specify survey weights without FPC information would be very helpful.
You're doing it right. "With replacement" is survey statistics jargon for what you want in this case.
If the sampling fraction is low, it is standard to use an approximation that would be exact if the sampling fraction were infinitesimal or sampling were with replacement. No-one actually does surveys with replacement, but the approximation is almost universal. With this approximation you don't need to supply fpc, and conversely, if you don't supply fpc, svydesign() assumes you want this approximation.