Survey package: How do I get R-squared from a svyglm object?

I'm analysing a social survey and need to use the survey package to account for oversampling. I fitted a log-linear regression with svyglm and everything worked perfectly fine. However, the output is a svyglm object that, unlike an ordinary lm object, apparently does not include the R-squared value. So how do I get this value, and how do I include it in my regression table if it's not part of the actual object? (I'm using the stargazer package to create the tables for my paper.)
Thanks in advance :)

If you have a linear regression model fitted with svyglm, you can get the residual variance and the total variance directly; you don't need a pseudo-R-squared:
library(survey)
total_var <- svyvar(~y, design)          # design-based estimate of the outcome variance
resid_var <- summary(model)$dispersion   # residual variance of the svyglm fit
rsq <- 1 - resid_var / total_var
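
To get the value into a stargazer table, one option (a sketch, assuming the fitted model is called model and rsq is computed as above; stargazer should treat the svyglm object like the glm it inherits from) is the add.lines argument, which appends extra rows to the bottom of the table:
library(stargazer)
stargazer(model,
          add.lines = list(c("R-squared", round(as.numeric(rsq), 3))))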

Related

Multilevel mixed-effects tobit regression in R

I have a dataset with left-censored data and I want to fit a multilevel mixed-effects tobit regression, but I only find information about how to do it in Stata. Is it possible to do it in R?
I found the packages 'VGAM' and 'censReg', but I don't see how to add fixed and random effects.
Also, my data are log-normally distributed; is there a way to add this to the model?
Thanks!
According to Section 3.5 of a vignette, the censReg package can handle a mixed model if the data are prepared properly via the plm package.
This Cross Validated page shows an example.
I don't have experience with this; it might only work with formal panel data rather than more general random-effects structures.
If your data are truly log-normal, you could take logs first and set the lower censoring limit on the log scale. Note that an apparent log-normal distribution of outcomes might just represent a corresponding distribution of predictor values with an underlying normal error distribution around the predictions. Don't jump blindly into a log-normal assumption.
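
A minimal sketch of that approach, assuming a hypothetical data frame df with an individual identifier id, a time variable time, a response y left-censored at zero, and a predictor x:
library(plm)      # pdata.frame() declares the panel structure
library(censReg)  # censored (tobit) regression

pdat <- pdata.frame(df, index = c("id", "time"))

# With a pdata.frame, censReg() fits a random-effects (individual-level) tobit.
# 'left' is the censoring limit; if you log-transform the response, move the
# limit to the log scale as well.
fit <- censReg(y ~ x, left = 0, data = pdat)
summary(fit)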

GLM & LM assumptions and interpretation

I am currently running some lm and lmer models (with replicate as a random effect) for continuous data, and glm and glmer models (again with replicate as a random effect) for count data.
I was wondering whether lm, lmer, glm and glmer all require the data to be normally distributed, and if not, whether I need an alternative test.
Also, I have run a glm and looked at the pairwise differences, and when reporting them I don't know what I should report other than P < 0.001, as my glm output doesn't really give me that much. Thanks!
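
For reference, a sketch of the four model types being described, assuming a hypothetical data frame df with a continuous response y, a count response n, a predictor trt, and a replicate grouping factor:
library(lme4)

m_lm    <- lm(y ~ trt, data = df)                      # continuous response, fixed effects only
m_lmer  <- lmer(y ~ trt + (1 | replicate), data = df)  # continuous, replicate as random effect
m_glm   <- glm(n ~ trt, family = poisson, data = df)   # counts, fixed effects only
m_glmer <- glmer(n ~ trt + (1 | replicate),
                 family = poisson, data = df)          # counts, replicate as random effect
Note that the normality assumption in lm/lmer concerns the residuals, not the raw data, and the Poisson models make no normality assumption at all.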

R - Testing for homo/heteroscedasticity and collinearity in a multivariate regression model

I'm trying to optimize a multivariate linear regression model lmMod=lm(depend_var~var1+var2+var3+var4....,data=df) and I'm currently checking the model's assumptions: constant variance of the residuals and absence of autocorrelation. For this I'm using:
Breusch-Pagan test for homo/heteroscedasticity: lmtest::bptest(lmMod) 
Durbin Watson test for auto-correlation: durbinWatsonTest(lmMod)
I found examples that test either one independent variable at a time:
example for the Breusch-Pagan test – one independent variable:
https://datascienceplus.com/how-to-detect-heteroscedasticity-and-rectify-it/
example for the Durbin-Watson test – one independent variable:
http://math.furman.edu/~dcs/courses/math47/R/library/lmtest/html/dwtest.html
or the whole model with several independent variables at a time:
example for the Durbin-Watson test – multiple independent variables:
https://www.rdocumentation.org/packages/car/versions/2.1-6/topics/durbinWatsonTest
Here are my questions:
1. Can durbinWatsonTest() and bptest() be fed a whole multivariate model?
2. If the answer to 1 is yes, how can I determine which variable is causing heteroscedasticity or autocorrelation in the model in order to fix it, given that each of these tests returns only one p-value for the entire multivariate model?
3. If the answer to 1 is no, the tests have to be performed with one independent variable at a time. But homoscedasticity can only be tested AFTER a particular regression has been fitted, so the pattern of homo/heteroscedasticity in a univariate regression model lmMod_1=lm(depend_var~var1, data=df) will differ from the pattern of the multivariate regression model lmMod_2=lm(depend_var~var1+var2+var3+var4....,data=df).
Thanks very much in advance for your help!
Let me try to give some initial help.
The answer to the first question: yes, you can use the Breusch-Pagan test and the Durbin-Watson test on multivariate models. (However, I have always used dwtest() instead of durbinWatsonTest().)
Also note that dwtest() only checks for first-order autocorrelation. Unfortunately, I do not know how to find out which variable is causing the heteroscedasticity or autocorrelation. However, if you encounter these problems, one possible solution is to use a robust estimation method: Newey-West standard errors (coeftest(model, vcov = NeweyWest)) for autocorrelation, or coeftest(model, vcov = vcovHC) for heteroscedasticity. coeftest() comes from the lmtest package and the variance estimators from sandwich; the AER package loads both.
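
A minimal sketch putting the tests and the robust follow-up together, reusing the hypothetical lmMod from the question:
library(lmtest)    # bptest(), dwtest(), coeftest()
library(sandwich)  # vcovHC(), NeweyWest()

lmMod <- lm(depend_var ~ var1 + var2 + var3 + var4, data = df)

bptest(lmMod)   # Breusch-Pagan: H0 = homoscedastic residuals
dwtest(lmMod)   # Durbin-Watson: H0 = no first-order autocorrelation

# If the tests reject, keep the coefficients but report robust standard errors:
coeftest(lmMod, vcov = vcovHC)     # heteroscedasticity-consistent (HC) SEs
coeftest(lmMod, vcov = NeweyWest)  # HAC SEs, robust to autocorrelation as well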

caret: Linear Model parameter estimates (mean and s.e.) via LOOCV

I am wondering if there is a way to extract all of the lm parameter estimates from the cross-validation training runs produced by caret::train() with an lm model.
I have a gist of the R code I used to do some of this checking, where I directly access the train() output object and get the cross-validation row indexes used in each run. But I was wondering whether functions already exist that do this for me, because I would think that 1) if it were a good or reasonable idea, the functionality would be there, or 2) if the functionality is not there, my wanting to do this may not be a good idea.
As a second part of the question, you can see in the gist that when I compute the mean and standard error of the single linear parameter over all of the cross-validation estimates, the mean of the CV parameter estimates matches the estimate from an lm fit on the entire training set quite well, but the standard error from the CV estimates is much smaller than the one from the single lm run on the whole training set. Is that expected, or am I computing/thinking about that wrong?
EDIT: I think the second part is answered by reading this answer.
Thanks in advance,
Matt
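
For the first part, a sketch of the refit-per-fold approach described in the question, assuming a hypothetical data frame df with response y and predictor x; fit$control$index holds the training-row indexes caret used for each resample:
library(caret)

fit <- train(y ~ x, data = df, method = "lm",
             trControl = trainControl(method = "LOOCV"))

# Refit lm on each resample's training rows and collect the slope estimates
slopes <- sapply(fit$control$index,
                 function(i) coef(lm(y ~ x, data = df[i, ]))[["x"]])
mean(slopes)
sd(slopes) / sqrt(length(slopes))  # naive SE; LOOCV folds overlap almost entirely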

Model fitting: glm vs glmmPQL

I am fitting a model to presence-absence data and I would like to check whether the random factor is significant or not.
To do this, one should compare a glmm with a glm and use a likelihood-ratio test to see which model fits better, if I understand correctly.
But if I run anova(glm, glmm), I get an analysis-of-deviance table and no output that compares the models.
How do I get the output that I desire, i.e. a comparison of both models?
Thanks in advance,
Koen
Somewhere you got the wrong impression about using anova() for this. Below, re was fit with glmmPQL() from the MASS package and fe was fit with glm() from base R:
> anova(re, fe)
Error in anova.glmmPQL(re, fe) : 'anova' is not available for PQL fits
That message appears to be the sole reason anova.glmmPQL() was created.
See this thread for verification and vague explanation:
https://stat.ethz.ch/pipermail/r-help/2002-July/022987.html
Simply put, anova() does not work with glmmPQL fits; you need to fit the GLMM with glmer() from the lme4 package to be able to use anova(), as in the sketch below.
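
A minimal sketch of that route, assuming a hypothetical data frame df with a presence/absence response pres, a predictor x, and a grouping factor site; lme4's anova() method can compare a glmer fit against a plain glm fit:
library(lme4)

fe <- glm(pres ~ x, family = binomial, data = df)                # no random effect
re <- glmer(pres ~ x + (1 | site), family = binomial, data = df) # random intercept

anova(re, fe)  # likelihood-ratio comparison; testing a variance component on
               # its boundary (variance = 0) makes this test conservative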
