R GLM weights implementation/interpretation

Implementing a GLM in R with weights.
I am wondering whether this quote from the R documentation for glm
Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers, that each response is the mean of unit-weight observations.
means that the log-likelihood is modified in the following way
\sum_{i} \log f(X_i) \to \sum_{i} w_i \log f(X_i)
only when the weights are positive integers?
If not, how can I incorporate weights in the way described by the formula?
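For the Gaussian case, at least, the coefficient estimates coincide either way. A minimal sketch with made-up data, comparing glm()'s weighted fit against direct maximisation of the weighted log-likelihood in the formula:

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- sample(1:3, n, replace = TRUE)           # positive integer weights

fit <- glm(y ~ x, weights = w)                # gaussian family by default

# direct maximisation of sum_i w_i * log f(y_i)
negll <- function(par) {
  mu    <- par[1] + par[2] * x
  sigma <- exp(par[3])                        # keep sigma positive
  -sum(w * dnorm(y, mu, sigma, log = TRUE))
}
opt <- optim(c(0, 0, 0), negll, method = "BFGS")
# opt$par[1:2] should match coef(fit) up to optimiser tolerance, whether or
# not the w_i are integers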

Related

Using GAMLSS, the difference between fitDist() and gamlss()

When using the GAMLSS package in R, there are many different ways to fit a distribution to a set of data. My data is a single vector of values, and I am fitting a distribution over these values.
My question is this: what is the main difference between using fitDist() and gamlss() since they give similar but different answers for parameter values, and different worm plots?
Also, using the function confint() works for gamlss() fitted objects but not for objects fitted with fitDist(). Is there any way to produce confidence intervals for parameters fitted with the fitDist() function? Is there an accuracy difference between the two procedures? Thanks!
m1 <- fitDist(y)
(where y is your data vector) fits many distributions and chooses the best according to a generalized Akaike information criterion, GAIC(k), with penalty k for each fitted parameter in the distribution, where k is specified by the user, e.g.
k = 2 for AIC,
k = log(n) for BIC,
k = 4 for a Chi-squared test (rounded from 3.84, the 5% critical value of a Chi-squared distribution with 1 degree of freedom), which is my preference.
m1$fits
gives the full results from the best to worst distribution according to GAIC(k).
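A minimal sketch with made-up data (the type argument restricts the candidate distributions; see ?fitDist for the available groups):

library(gamlss)
set.seed(1)
y <- rgamma(200, shape = 2, rate = 0.5)       # made-up positive data
m1 <- fitDist(y, k = 4, type = "realplus")    # GAIC penalty k = 4
m1$family                                     # best-fitting distribution
m1$fits                                       # all candidates ranked by GAIC(k)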

How to determine the coefficient of svm classifiers for linear kernels in R?

I am using the kernlab package for SVM in R, with the linear kernel so that I can directly check the importance of the feature vectors, that is, my variables. Using the coefficients of these feature vectors, I need to calculate the weights of the various factors in the model, so that the linear separating plane that the SVM draws in my feature space can be evaluated. Basically, I want to calculate the w in transpose(w)*x + b. Could someone please suggest what to do? I used the fields alpha, b and alphaindex and tried to calculate the weight vector, but to verify the calculation I predicted the decision score on a test sample, and it did not match the value returned by the built-in predict function. How do I calculate the weights?
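A minimal sketch of the usual reconstruction for a binary ksvm fit (the data here are made up, and scaled = FALSE is set so the raw inputs match what kernlab trained on):

library(kernlab)
set.seed(42)
x <- matrix(rnorm(80), ncol = 2)                      # made-up two-feature data
y <- factor(ifelse(x[, 1] + x[, 2] > 0, 1, -1))
m <- ksvm(x, y, kernel = "vanilladot", C = 1, scaled = FALSE)

# w = sum_i (alpha_i * y_i) x_i; kernlab's coef() already returns alpha_i * y_i
w <- colSums(coef(m)[[1]] * x[alphaindex(m)[[1]], ])
# kernlab's decision function is f(x) = <w, x> - b, with b = b(m)
dec_manual  <- as.vector(x %*% w - b(m))
dec_builtin <- as.vector(predict(m, x, type = "decision"))
all.equal(dec_manual, dec_builtin)                    # sign may flip with class order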

specifying probability weights in R *without* using Lumley survey package

I would really appreciate any help with specifying probability weights in R without using the Lumley survey package. I am conducting mediation analysis in R using the Imai et al mediation package, which does not currently support svyglm.
The code I am currently running is:
olsmediator_basic <- lm(poledu ~ gateway_strict_alt + gender_n + spline1 + spline2 + spline3,
                        data = unifiedanalysis, weights = designweight)
However, I'm unsure if this is weighting the data correctly. The reason is that this code yields standard errors that differ from those I am getting in Stata. The Stata code I am running is:
reg poledu gateway_strict_alt gender_n spline1 spline2 spline3 [pweight=designweight]
I was wondering whether the weights option in R might not be for inverse probability weights, but I was unable to determine this from the documentation, this forum, or elsewhere. If I am missing something, I apologize; I am new to R as well as to this forum.
Thank you in advance for your help.
The R documentation specifies that the weights parameter of the lm function is inversely proportional to the variance of the observations. This is the definition of analytic weights, or aweights in Stata.
Have a look at the ipw package for inverse probability weighting.
To correct a previous answer - I looked up the manual on weights and found the following description for weights in lm
Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).
These are actually frequency weights (fweight in Stata): they replicate each observation the number of times given by the weight vector. Probability weights, on the other hand, are the inverse of the probability that an observation is included in the sample. Weighting by them adjusts each observation's influence on the coefficients, but it should not change the number of observations the standard errors are based on, which is why lm's weights give standard errors different from Stata's pweight.
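A sketch of a common workaround, using the model from the question: keep lm's weights for the point estimates (these match Stata's pweight estimates) and compute robust standard errors with the sandwich package, since Stata's pweight implies robust standard errors:

library(sandwich)
library(lmtest)
fit <- lm(poledu ~ gateway_strict_alt + gender_n + spline1 + spline2 + spline3,
          data = unifiedanalysis, weights = designweight)
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))   # HC1 is close to Stata's robust SEs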

Equivalent of "PROC GLM; FREQ" in glm()?

Does R's glm() have any equivalent of the FREQ statement in SAS's PROC GLM, whereby each unit of observation is counted the number of times defined by the corresponding FREQ variable?
If your data are set up properly, then passing the weights parameter a vector of integers representing frequency weights may succeed. For family="binomial", the response should be the proportion of successes, with the weights giving the number of occurrences of each covariate pattern. The relevant two sentences in the help page, with slight editing:
"Non-NULL weights can be used to indicate that [ when the elements of weights are positive integers w_i] .... that each response y_i is the mean of w_i unit-weight observations. For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes: they would rarely be used for a Poisson GLM."
In the Poisson GLM situation, however, such weights might instead be entered as an offset term (on the log scale).
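A minimal sketch with made-up aggregated data, checking that integer weights reproduce the expanded-data fit:

agg <- data.frame(y = c(0, 1, 0, 1),
                  x = c(0, 0, 1, 1),
                  freq = c(10, 5, 3, 12))
expanded <- agg[rep(seq_len(nrow(agg)), agg$freq), ]    # each row repeated freq times

fit_w <- glm(y ~ x, family = binomial, data = agg, weights = freq)
fit_e <- glm(y ~ x, family = binomial, data = expanded)
all.equal(coef(fit_w), coef(fit_e))                     # TRUE: weights act as frequencies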

How to fit a negative binomial distribution in R while incorporating censoring

I need to fit Y_ij ~ NegBin(m_ij, k), i.e. a negative binomial distribution to a count. However, the observed data are censored: I know the value of y_ij, but the true count could exceed it. Writing down the log-likelihood for this problem gives:
ll = \sum_{i=1}^{n} w_i \left[ c_i \log P(Y_{ij} = y_{ij} \mid X_{ij}) + (1 - c_i) \log\left( 1 - \sum_{k=1}^{32} P(Y_{ij} = k \mid X_{ij}) \right) \right]
where X_{ij} is the design matrix (with the covariates of interest), w_i is the weight for each observation, y_{ij} is the response variable, and P(Y_{ij} = y_{ij} | X_{ij}) is the negative binomial probability with mean m_{ij} = exp(X_{ij} \beta) and overdispersion parameter k.
Does anyone know whether there is built-in code in R that could be used to obtain this?
Check this paper out: Regression Models for Count Data in R
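That paper covers standard (uncensored) count regressions; for the weighted, censored log-likelihood above, a direct optim() sketch may be a useful starting point. Here X, y, cens (1 = fully observed, 0 = censored) and w are assumed to be supplied by the user, and pnbinom(..., lower.tail = FALSE) plays the role of the 1 - Σ censoring term:

negll <- function(par, X, y, cens, w) {
  beta <- par[-length(par)]
  size <- exp(par[length(par)])                  # overdispersion, kept positive
  mu   <- exp(as.vector(X %*% beta))
  obs  <- dnbinom(y, mu = mu, size = size, log = TRUE)
  # P(Y > y); use pnbinom(y - 1, ...) instead if "at least y" is meant
  cns  <- pnbinom(y, mu = mu, size = size, lower.tail = FALSE, log.p = TRUE)
  -sum(w * ifelse(cens == 1, obs, cns))
}
# fit <- optim(rep(0, ncol(X) + 1), negll, X = X, y = y, cens = cens, w = w,
#              method = "BFGS", hessian = TRUE)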
