Model fitting: glm vs glmmPQL - r

I am fitting a model to presence-absence data and I would like to check whether the random factor is significant or not.
To do this, one should compare a GLMM with a GLM and use a likelihood-ratio test to check which model fits better, if I understand correctly.
But if I run anova(glm, glmm), I get an analysis-of-deviance table and no output that compares the models.
How do I get the output that I desire, i.e. a comparison of both models?
Thanks in advance,
Koen

Somewhere you got the wrong impression about using anova() for this. Below, re was fit using glmmPQL() from the MASS package and fe was fit using glm() from base R:
> anova(re, fe)
Error in anova.glmmPQL(re, fe) : 'anova' is not available for PQL fits
That error message appears to be the sole reason anova.glmmPQL() was created. See this thread for verification and a (rather vague) explanation:
https://stat.ethz.ch/pipermail/r-help/2002-July/022987.html

Simply put, anova() does not work with glmmPQL fits; fit the GLMM with glmer() from the lme4 package instead, and then anova() works.
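As a minimal sketch of the comparison the original poster describes, assuming a binary outcome y, a fixed effect x, and a grouping factor g in a data frame d (all placeholder names):
library(lme4)
fe <- glm(y ~ x, family = binomial, data = d)              # fixed effects only
re <- glmer(y ~ x + (1 | g), family = binomial, data = d)  # adds a random intercept
anova(re, fe)  # recent lme4 versions accept a glm fit here
## or compute the likelihood-ratio test by hand:
lr <- as.numeric(2 * (logLik(re) - logLik(fe)))
pchisq(lr, df = 1, lower.tail = FALSE)
Keep in mind that testing a variance component puts the null hypothesis on the boundary of the parameter space, so the naive chi-square p-value is conservative; halving it is a common correction.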

Related

Survey-package: How do I get R-squared from a svyglm-object?

I'm analysing a social survey and need to use the survey package to account for oversampling. I performed a log-linear regression with the svyglm command and everything worked perfectly fine. However, the output is an svyglm object that apparently does not include the R-squared value, unlike normal lm objects. So how do I get this value, and how do I include it in my regression table if it's not part of the actual object? (I'm using the stargazer package to create the tables for my paper.)
Thanks in advance :)
If you have a linear regression model fit with svyglm you can get the residual variance and the total variance directly; you don't need a pseudo R-squared:
total_var <- svyvar(~y, design)           # design-based variance of the outcome (formula interface)
resid_var <- summary(model)$dispersion    # residual variance of the fit
rsq <- 1 - resid_var / total_var
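As a runnable sketch using the api data shipped with the survey package (the design and variable names below just illustrate the pattern; substitute your own):
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, fpc = ~fpc, data = apistrat)
model <- svyglm(api00 ~ ell + meals, design = dstrat)
total_var <- svyvar(~api00, dstrat)
resid_var <- summary(model)$dispersion
rsq <- 1 - as.numeric(resid_var) / as.numeric(total_var)
To get the value into a table, one option is stargazer's add.lines argument, e.g. add.lines = list(c("R2", round(rsq, 3))).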

How to interpret the result of a two-way ANOVA test in R?

m1 <- lm(AmountSpent ~ Catalogs * Salary, data = d)
summary(m1)
m2 <- lm(AmountSpent ~ Catalogs + Salary, data = d)
summary(m2)
anova(m2, m1, test = "Chisq")
The output is as follows (screenshot not reproduced here).
Which model is better according to the test? Is the order in which we pass the models to the function important? Please explain the statistical concept behind this test.
The chi-square test looks at the statistical significance of the reduction in the residual sum of squares between the nested linear models. From your R output you can see that adding the interaction term resulted in a statistically better model (the one with the lower RSS value, Model 2).
It is usual to start the comparison with the simpler model and then add terms; however, the docs also mention that the order is up to the user.
You should take a look at the docs of the anova.lm function here.
For comparing models that are not nested, use the AIC or BIC criteria instead.
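As a runnable illustration with simulated data (the variable names mirror the question; the coefficients are made up):
set.seed(1)
d <- data.frame(Catalogs = rpois(200, 10), Salary = rnorm(200, 50000, 10000))
d$AmountSpent <- 100 + 2 * d$Catalogs + 0.01 * d$Salary + rnorm(200, sd = 50)
m2 <- lm(AmountSpent ~ Catalogs + Salary, data = d)  # additive (nested) model
m1 <- lm(AmountSpent ~ Catalogs * Salary, data = d)  # adds the interaction
anova(m2, m1, test = "Chisq")  # does the interaction significantly reduce the RSS?
AIC(m2, m1)  # for non-nested models, compare information criteria instead
BIC(m2, m1)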

R: mixed model with heteroscedastic data -> only lm function works?

This question asks the same thing, but hasn't been answered. My question relates to how to specify the model with the lm() function and is therefore a programming (not statistical) question.
I have a mixed design (2 repeated and 1 independent predictors). Participants were first primed into group A or B (this is the independent predictor) and then rated how much they liked 4 different statements (these are the two repeated predictors).
There are many great online resources on how to model this data. However, my data is heteroscedastic, so I would like to use heteroscedasticity-consistent covariance matrices. This paper explains it well. The sandwich and lmtest packages are great. Here is a good explanation of how to do it for an independent design in R with lm(y ~ x).
It seems that I have to use lm(), or else it won't work?
Here is the code for a regression model assuming that all variances are equal (which they are not, as Levene's test comes back significant):
fit3 <- nlme::lme(DV ~ repeatedIV1*repeatedIV2*independentIV1, random = ~1|participants, data = df)  # works fine
Here is the code for an independent model correcting for heteroscedasticity, which works:
fit3 <- lm(DV ~ independentIV1, data = df)
library(sandwich)
vcovHC(fit3, type = "HC4", sandwich = FALSE)
library(lmtest)
coeftest(fit3, vcov. = vcovHC(fit3, type = "HC4"))  # coef() ignores vcov.; coeftest() from lmtest is needed
So my question really is: how do I specify my model with lm()?
Alternative approaches in R to fit my model while accounting for heteroscedasticity are welcome too!
Thanks a lot!!!
My impression is that your problems come from mixing various approaches for various aspects (repeated measurements/correlation vs. heteroscedasticity) that cannot be mixed so easily. Instead of using random effects you might also consider fixed effects, or instead of only adjusting the inference for heteroscedasticity you might consider a Gaussian model and model both the mean and the variance, etc. For me, it's hard to say what the best route forward is here. Hence, I only comment on some aspects regarding the sandwich package:
The sandwich package is not limited to lm/glm; it is in principle object-oriented, see vignette("sandwich-OOP", package = "sandwich") (also published as doi:10.18637/jss.v016.i09).
There are suitable methods for a wide variety of packages/models, but not for nlme or lme4. The reason is that it's not so obvious for which mixed-effects models the usual sandwich trick actually works. (Disclaimer: I'm no expert in mixed-effects modeling.)
However, for lme4 there is a relatively new package called merDeriv (https://CRAN.R-project.org/package=merDeriv) that supplies estfun and bread methods, so that sandwich covariances can be computed for lmer output etc. There is also a working paper associated with that package: https://arxiv.org/abs/1612.04911
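A minimal sketch of that route, using the sleepstudy data from lme4 (whether the sandwich estimator is appropriate for your particular model is a separate question):
library(lme4)
library(merDeriv)  # registers estfun() and bread() methods for merMod objects
library(sandwich)
fm <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
sandwich(fm)  # robust covariance matrix built from merDeriv's bread/estfun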

post hoc test for a two way mixed model anova

I am doing a repeated-measures ANOVA with a mixed model. I would like to run a post hoc test to see the p-values of the interaction TREAT*TIME, but I have only managed to use the following glht Tukey test, which does not give me the interaction I am looking for.
library(multcomp)
library(nlme)
oi <- lme(total ~ TREAT * TIME, data = TURN, random = ~1|NO_UNIT)
anova(oi)
summary(glht(oi, linfct=mcp(TIME="Tukey", TREAT="Tukey")))
What I would be looking for is something like:
summary(glht(oi, linfct=mcp(TIME="Tukey",TREAT="Tukey",TREAT*TIME="Tukey")))
Use snk.test(model, term = "TREAT*TIME", among = "TREAT", within = "TIME") from the GAD package if you have a balanced design, or summary(lsmeans(oi, pairwise ~ TIME * TREAT), infer = TRUE) from lsmeans if your design is unbalanced.
I have also had this problem.
It appears that a straightforward post hoc test for two-way ANOVAs does not exist.
However, you may like to try bootstrapping, which is a form of robust estimation for a two-way ANOVA. I found the following link very helpful:
http://rcompanion.org/rcompanion/d_08a.html
It contains a step-by-step tutorial using the rcompanion, WRS2, psych, and multcompView packages to perform your bootstrapped ANOVA and follow up with a post hoc test. Good luck.
For a mixed model, an alternative is the aov_ez() function from the afex package instead of lme(), followed by post hoc analysis using lsmeans(); see the sketch after the link below.
You will find a detailed tutorial here:
https://www.psychologie.uni-heidelberg.de/ae/meth/team/mertens/blog/anova_in_r_made_easy.nb.html
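A minimal sketch of that approach, assuming the poster's column names (NO_UNIT, total, TREAT, TIME), with TREAT between subjects and TIME within:
library(afex)
library(emmeans)  # lsmeans() is nowadays a thin wrapper around emmeans
fit <- aov_ez(id = "NO_UNIT", dv = "total", data = TURN,
              between = "TREAT", within = "TIME")
emmeans(fit, pairwise ~ TREAT * TIME)  # post hoc contrasts for the interaction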

anova.rms problem with rcs() terms

I'm having a problem with the anova function in the rms package:
require(rms)
getHdata(prostate)
mod1 <- cph(Surv(dtime, status != "Alive") ~ stage + rx + age + wt, data = prostate, x = TRUE, y = TRUE)
mod2 <- cph(Surv(dtime, status != "Alive") ~ stage + rx + rcs(age, 4) + wt, data = prostate, x = TRUE, y = TRUE)
anova(mod1)
anova(mod2)
Everything works fine, but when I try to compare the models to assess the impact of the non-linearity in age,
anova(mod1,mod2)
I get
Error in anova.rms(mod1, mod2) : factor names not in design: mod2
What does this mean? What can I do to circumvent it?
//M
You should be able to use the output of anova(mod2) as one way to assess the significance, but the better answer is to compare the -2*log(likelihood) statistics. The anova.rms function is not designed to take two model fits; the second and subsequent unnamed arguments are assumed to be names of terms within the model rather than fit objects.
(Note that with rcs terms you will not see the sum of the individual terms equal the full-model chi-square values. I have asked Harrell about this and he says to do the cross-model comparisons "by hand".)
This comparison is done using lrtest (per Misha's comment).
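In this case the comparison is a one-liner, since rms provides lrtest() for nested fits:
lrtest(mod1, mod2)  # LR test of the nested cph fits; here it tests the non-linear age terms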
