I have a dataset with left-censored data and I want to fit a multilevel mixed-effects tobit regression, but I can only find information about how to do it in Stata. Is it possible to do it in R?
I found the packages 'VGAM' and 'censReg', but I don't understand how to specify fixed and random effects.
Also, my data are log-normally distributed; is there a way to account for this in the model?
Thanks!
According to Section 3.5 of a vignette, the censReg package can handle a mixed model if the data are prepared properly via the plm package.
This Cross Validated page shows an example.
I don't have experience with this; it might only work with formal panel data rather than more general random-effects structures.
If your data are truly log-normal, you could take logs first and set the lower censoring limit on the log scale. Note that an apparent log-normal distribution of outcomes might just represent a corresponding distribution of predictor values with an underlying normal error distribution around the predictions. Don't jump blindly into a log-normal assumption.
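To illustrate the "take logs first and put the censoring limit on the log scale" step, here is a minimal sketch in base R. It is not the censReg/plm approach from the vignette and it has no random effects; it just maximises a left-censored normal (tobit) likelihood by hand on simulated data. The coefficients, noise level, detection limit and the helper name tobit_nll are all made up for the example.

```r
# Simulated log-normal outcome; all numbers here are illustrative assumptions
set.seed(1)
n <- 500
x <- rnorm(n)
y_latent <- exp(1 + 0.5 * x + rnorm(n, sd = 0.7))
limit <- 1.5                                # detection limit on the original scale
censored <- y_latent < limit
y_obs <- ifelse(censored, limit, y_latent)  # censored values are recorded at the limit

# Move both the outcome and the censoring limit to the log scale
log_y <- log(y_obs)
log_limit <- log(limit)

# Negative log-likelihood of a left-censored normal (tobit) model
tobit_nll <- function(par, y, x, censored, limit) {
  mu <- par[1] + par[2] * x
  sigma <- exp(par[3])  # optimise log(sigma) so that sigma stays positive
  ll <- ifelse(censored,
               pnorm(limit, mean = mu, sd = sigma, log.p = TRUE),  # P(Y <= limit)
               dnorm(y, mean = mu, sd = sigma, log = TRUE))        # density if observed
  -sum(ll)
}

fit <- optim(c(0, 0, 0), tobit_nll, y = log_y, x = x,
             censored = censored, limit = log_limit, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2], sigma = exp(fit$par[3]))
est
```

The estimates are on the log scale, so they should be compared with the values used in the simulation (1, 0.5 and 0.7), not with anything on the original scale.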
I am currently running some linear models and lmer (with replicate as a random effect) for continuous data and a glm and glmer (again, replicate as a random effect) for count data.
I was wondering if a lm, lmer, glm and glmer all need the data to be normally distributed and if not, do I need an alternative test?
Also, I have run a glm and looked at the pairwise differences, but I don't know what I should report other than P<0.001, as my glm output doesn't really give me much else. Thanks!
I am currently examining marginal effects of some fixed-effects factors in a mixed-effects logistic regression. To do so, I've employed the ggpredict function of the tremendously helpful ggeffects package. I then also used the tab_model function of the associated sjPlot package to produce tables that include odds ratios. However, I was a bit surprised by the output of each:
1) I now see that all levels of my factor predictors are included in the output (as opposed to R's usual dummy coding in which one level of each factor serves as a reference for contrasts). Is it possible to retain a reference level in the ggpredict output? I was hoping to use it to i) check against manual calculations and ii) compare it to the glmer model coefficients that are not similarly calculated conditionally upon the random effects.
2) The odds ratios provided by tab_model are identical to those that I obtained by exponentiating the coefficients provided by my original glmer model (per the IDRE example procedure). However, I was under the impression that the ORs calculated were derived from marginal coefficients that did not account for the influence of the random effect in my model (see the paragraph starting with "Many people prefer" here, the "Predicted Probabilities and Graphing" section here, and top answer here for more information). In turn, does this mean that the ORs for fixed effects variables provided by tab_model similarly do not account for the influence of the random effect? If that's the case, is there an argument or other means by which to do so?
Thanks!
I came across 2 packages to calculate marginal effect for a logistic regression model in R with some interaction terms.
margins package https://cran.r-project.org/web/packages/margins/vignettes/Introduction.html
and mfx package https://cran.r-project.org/web/packages/mfx/mfx.pdf
I want to calculate the average marginal effect and don't know which package is appropriate. For some reason, I cannot use the first one. So I have tried
install.packages("margins-package")
library(margins) ## Library not found
margins(logit) ## Logit is my glm model but that function is not found
The author of the margins package criticised the mfx package and other packages used to calculate marginal effects, as they do not account for interaction terms properly. Does anyone have experience with one of the packages or both? I'd like to hear your feedback.
Thank you!
What functions do you use in R to fit a curve to your data and test how well that curve fits? What results are considered good?
Just the first part of that question can fill entire books. Just some quick choices:
lm() for standard linear models
glm() for generalised linear models (e.g. for logistic regression)
rlm() from package MASS for robust linear models
lmrob() from package robustbase for robust linear models
loess() for non-linear / non-parametric models
Then there are domain-specific models, e.g. for time series, micro-econometrics, mixed effects and much more. Several of the Task Views, such as Econometrics, discuss this in more detail. As for goodness of fit, that is also something one could easily spend an entire book discussing.
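As a quick illustration of three of the choices above (only the ones that ship with base R), here is a small sketch on simulated data. The data, the cut-off of 6 and the object names are all made up for the example.

```r
# Simulated data: a noisy straight line (illustrative values only)
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.8 * x + rnorm(100)
dat <- data.frame(x = x, y = y, high = as.integer(y > 6))  # binary outcome for glm()

fit_lm    <- lm(y ~ x, data = dat)                         # standard linear model
fit_glm   <- glm(high ~ x, family = binomial, data = dat)  # logistic regression
fit_loess <- loess(y ~ x, data = dat)                      # local (non-parametric) fit

coef(fit_lm)
coef(fit_glm)
head(predict(fit_loess))  # fitted values of the smooth curve
```

summary() works on all three fitted objects and is usually the first thing to look at.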
The workhorses of canonical curve fitting in R are lm(), glm() and nls(). To me, goodness-of-fit is a subproblem in the larger problem of model selection. In fact, using goodness-of-fit incorrectly (e.g., via stepwise regression) can give rise to a seriously misspecified model (see Harrell's book "Regression Modeling Strategies"). Rather than discussing the issue from scratch, I recommend Harrell's book for lm and glm. Venables and Ripley's bible is terse, but still worth reading. "Extending the Linear Model with R" by Faraway is comprehensive and readable. nls is not covered in these sources, but "Nonlinear Regression with R" by Ritz & Streibig fills the gap and is very hands-on.
The nls() function (http://sekhon.berkeley.edu/stats/html/nls.html) is pretty standard for nonlinear least-squares curve fitting. Chi squared (the sum of the squared residuals) is the metric that is minimized in that case, but it is not normalized, so you can't readily use it to determine how good the fit is. The main thing you should ensure is that your residuals are normally distributed. Unfortunately I'm not sure of an automated way to do that.
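For what it's worth, here is a minimal nls() sketch on simulated data. The exponential-decay model, the starting values and the noise level are assumptions made up for the example. It also shows one possible automated check of the residuals, shapiro.test() from base R, which is just one option among several.

```r
# Simulated exponential decay (illustrative values only)
set.seed(7)
x <- seq(0, 5, length.out = 60)
y <- 5 * exp(-0.9 * x) + rnorm(60, sd = 0.1)

# nls() needs a model formula and starting values for the parameters
fit <- nls(y ~ a * exp(-b * x), start = list(a = 4, b = 1))
coef(fit)

# The sum of squared residuals is the quantity being minimised;
# deviance() returns the same number for an nls fit
res <- as.numeric(residuals(fit))
rss <- sum(res^2)
c(rss = rss, deviance = deviance(fit))

# Shapiro-Wilk test of the residuals (null hypothesis: normally distributed)
shapiro.test(res)
```

A small p-value from the test suggests non-normal residuals; with large samples it will flag even trivial departures, so a normal quantile plot is still worth a look.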
The Quick R site has a reasonably good summary of basic functions used for fitting models and testing the fits, along with sample R code:
http://www.statmethods.net/stats/regression.html
"The main thing you should ensure is that your residuals are normally distributed. Unfortunately I'm not sure of an automated way to do that."
qqnorm() could probably be modified to find the correlation between the sample quantiles and the theoretical quantiles. Essentially, this would just be a numerical interpretation of the normal quantile plot. Perhaps providing several values of the correlation coefficient for different ranges of quantiles could be useful. For example, if the correlation coefficient is close to 1 for the middle 97% of the data and much lower at the tails, this tells us the distribution of residuals is approximately normal, with some funniness going on in the tails.
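A rough sketch of that idea: qqnorm() already returns the theoretical and sample quantiles when plotting is switched off, so no modification is needed. The residuals here are simulated stand-ins, and the 97% window is just the figure from the paragraph above.

```r
# Stand-in for model residuals (simulated for illustration)
set.seed(3)
res <- rnorm(200)

# plot.it = FALSE returns the quantile pairs without drawing the plot:
# $x holds the theoretical quantiles, $y the sample quantiles
qq <- qqnorm(res, plot.it = FALSE)
r_all <- cor(qq$x, qq$y)

# Same correlation restricted to the middle 97% of the residuals
p <- 0.015
keep <- qq$y >= quantile(res, p) & qq$y <= quantile(res, 1 - p)
r_mid <- cor(qq$x[keep], qq$y[keep])

c(all = r_all, middle = r_mid)
```

Replacing res with residuals(fit) from a fitted model gives the numerical version of its normal quantile plot.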
Best to keep it simple, and see if linear methods work "well enough". You can judge your goodness of fit GENERALLY by looking at the R squared AND the F statistic together, never separately. Adding variables to your model that have no bearing on your dependent variable can increase R2, so you must also consider the F statistic.
You should also compare your model to other nested, or simpler, models. Do this using a log-likelihood ratio test, so long as the dependent variables are the same.
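A small sketch of where to find these numbers in R, using simulated data. The variable called noise is unrelated to y by construction, to show that R squared still does not go down when it is added. All names and values are made up for the example.

```r
# Simulated data (illustrative values only)
set.seed(10)
n <- 100
x <- rnorm(n)
noise <- rnorm(n)            # has no bearing on y
y <- 1 + 2 * x + rnorm(n)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)    # same model plus the irrelevant variable
s1 <- summary(fit1)
s2 <- summary(fit2)

# R squared cannot decrease when a variable is added; adjusted R squared can
c(r2_small = s1$r.squared, r2_big = s2$r.squared)
c(adj_small = s1$adj.r.squared, adj_big = s2$adj.r.squared)

# F statistic with its numerator and denominator degrees of freedom
s1$fstatistic
s2$fstatistic
```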
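A minimal sketch of such a likelihood ratio test for two nested logistic regressions, computed by hand from logLik() and then with anova(). The data are simulated and x2 is irrelevant by construction.

```r
# Simulated binary outcome that depends on x1 only (illustrative values)
set.seed(11)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1))

small <- glm(y ~ x1, family = binomial)
big   <- glm(y ~ x1 + x2, family = binomial)   # 'small' is nested in 'big'

# Likelihood ratio statistic: twice the difference in log-likelihoods,
# compared with a chi-squared distribution on the difference in parameters
lr <- as.numeric(2 * (logLik(big) - logLik(small)))
df <- attr(logLik(big), "df") - attr(logLik(small), "df")
p  <- pchisq(lr, df = df, lower.tail = FALSE)
c(statistic = lr, df = df, p.value = p)

# The same test via anova()
lrt_table <- anova(small, big, test = "LRT")
lrt_table
```

Both models must be fitted to the same response and the same rows of data for the comparison to make sense.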
The Jarque–Bera test is good for testing the normality of the residual distribution.
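Base R does not include this test as far as I know (the tseries package provides jarque.bera.test()), but the statistic is simple enough to sketch by hand from the sample skewness and kurtosis. The function name jarque_bera and the simulated samples are made up for the example.

```r
# Hand-rolled Jarque-Bera test: JB = n/6 * (S^2 + (K - 3)^2 / 4),
# compared with a chi-squared distribution with 2 degrees of freedom
jarque_bera <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  skew <- mean(d^3) / m2^1.5   # sample skewness S
  kurt <- mean(d^4) / m2^2     # sample kurtosis K (3 for a normal distribution)
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(5)
jb_normal <- jarque_bera(rnorm(300))  # stand-in for well-behaved residuals
jb_skewed <- jarque_bera(rexp(300))   # strongly skewed, clearly non-normal
jb_normal
jb_skewed
```

In practice you would pass residuals(fit) from your model instead of the simulated vectors. The chi-squared approximation is asymptotic, so treat p-values from small samples with caution.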