How to report overall results of an nlme mixed effects model - r

I want to report the results of a one-factorial lme from the nlme package. I want to know the overall effect of A on y. To do so, I would compare the model with a null model:
m1 <- lme(y~A,random=~1|B/C,data=data,weights=varIdent(form = ~1|A),method="ML")
m0 <- lme(y~1,random=~1|B/C,data=data,weights=varIdent(form = ~1|A),method="ML")
I am using maximum likelihood because I am comparing models with different main effects.
stats::anova(m0,m1) gives me a significant p-value, meaning that there is a significant effect of A on y. However, in contrast to lmer models fitted with lme4, no Chi2 values are given. First: is this approach valid? And second: what is the best way to report the result?
Thanks for your answers

An anova with lme should give you the same information as with lmer. Both use what's called a deviance test or likelihood ratio test. The L.Ratio column in the table returned by anova is simply twice the difference in the log-likelihoods of the two models, 2 * (logLik(m1) - logLik(m0)). A deviance test tests this value against a Chi2 distribution with degrees of freedom equal to the difference in the number of model parameters (in your case the number of levels of A minus 1, so 1 if A has two levels). So the value reported under L.Ratio for lme models is the same as the Chi2 value reported for lmer models (assuming the models are the same, of course, and allowing for rounding in the lmer output).
The approach is valid and you could report the value under L.Ratio along with the degrees of freedom and p-value, but I would add more information to your report, such as the fixed and random coefficients of both models and any other parameters you've added (such as the difference in variance between levels of A specified under weights). If you're only interested in the fixed effect of A, then a Wald test should also be appropriate, though REML estimates are recommended in cases with a small number of groups (Snijders & Bosker, 2012). The test statistic is the t-value, with its associated p-value, in the model summary output summary(m1). Chapter 6 in Snijders & Bosker (2012) gives a great explanation of tests for fixed and random parameters, along with reporting examples.
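As a minimal sketch of the likelihood ratio test described above, the following uses the Oats data that ships with nlme as a stand-in for the asker's data (an assumption: yield plays the role of y, nitro as a factor plays A, and Block/Variety play B/C). The varIdent weights are left out to keep the example small. It recomputes the L.Ratio value and its p-value by hand and compares them with the anova table.

```r
library(nlme)

# Stand-in data (assumption): Oats from nlme, renamed to match the question
oats <- data.frame(
  y = Oats$yield,
  A = factor(Oats$nitro),                  # 4 levels, so the test has 3 df
  B = factor(as.character(Oats$Block)),
  C = factor(as.character(Oats$Variety))
)

# ML fits, because the two models differ in their fixed effects
m1 <- lme(y ~ A, random = ~1 | B/C, data = oats, method = "ML")
m0 <- lme(y ~ 1, random = ~1 | B/C, data = oats, method = "ML")

tab <- anova(m0, m1)
print(tab)

# Likelihood ratio statistic and its Chi2 p-value, computed by hand
lr      <- as.numeric(2 * (logLik(m1) - logLik(m0)))
df_diff <- attr(logLik(m1), "df") - attr(logLik(m0), "df")
p       <- pchisq(lr, df = df_diff, lower.tail = FALSE)

# Wald t-tests for the individual coefficients, from a REML fit
m1_reml <- lme(y ~ A, random = ~1 | B/C, data = oats, method = "REML")
print(summary(m1_reml)$tTable)
```

The values lr, df_diff and p are the three numbers one would report for the overall effect of A, for example as "Chi2(df_diff) = lr, p = p".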

Related

Model averaging with MuMIn: interpretation of coefficient-names in results

I am doing model averaging with MuMIn and trying to interpret the results.
Everything works fine, but I am wondering about the names of my coefficients in the results:
Model-averaged coefficients:
(full average)
                     Estimate Std. Error Adjusted SE
cond((Int))         0.9552775  0.0967964   0.0969705
cond(Distanzpunkt) -0.0001217  0.0001451   0.0001453
cond(area_km2)      0.0022712  0.0030379   0.0030422
cond(prop)          0.0487036  0.1058994   0.1060808
Does anyone know what "cond()" tells me and why it appears in the model output?
Within the models, the coefficients are named "Distanzpunkt", "area_km2" and "prop".
Were you fitting a zero-inflated model with glmmTMB? If so, then cond() refers to the terms in the conditional model, rather than the zero-inflation model.

Interpreting zero-inflated regression summary

The output of the zeroinfl regression from pscl provides a list of coefficients under "count model coefficients" as well as a list of coefficients under "zero-inflation model coefficients."
Given that the interest is in the zero-inflation model, what is the utility of the count model coefficients? Are they simply provided for reference?
Your zero-inflated regression consists of two component models that are fitted jointly. The zero-inflation part is a binary model, such as a logit or probit model, and accounts for the probability that an observation is an excess zero, i.e. a zero that does not come from the count process. The count part is a model for count data (usually integers), such as a Poisson or negative binomial model; it describes the count process for all observations and can itself produce zeros. In sum, the zero model estimates the probability that an observation is an excess zero, and the count model describes the counts (including the remaining zeros) for the observations that are not excess zeros, so the count coefficients are part of the model rather than just a reference.
This zero-inflated regression is similar to a hurdle model, in which the count part is a truncated count model for the non-zero observations only. You can read more on this at Cross Validated: What is the difference between zero-inflated and hurdle models?. BTW, that platform is actually better suited for this kind of purely statistical question.

What post-hoc test should be used for a glmer model with a continuous and a categorical predictor variable?

I'm a bit of a newbie with stats and R, so I need a bit of direction to find a suitable post-hoc test for my glmer model.
The model has a binary dependent variable (absent/present), and the predictor variables are interaction terms between a continuous variable (e.g. temp) and a categorical variable (species, n=3). Only the interaction terms, rather than the continuous variables in isolation, produce significant results when an anova is run on the model. Species by itself has a large effect because one species is much rarer than the others. I'm trying to tease apart how the presence of these species varies across pH and between species.
I've tried lsmeans with a Tukey adjustment, emmeans, and Firth's bias-reduced logistic regression (logistf). I ran the effects function on the interaction terms, so I had a rough expectation of what a post-hoc test could show, but the results that logistf (Firth's) produced were not what I was expecting. emmeans and Tukey both gave the same results and ignored the continuous variable, I assume because it's not a factor.
When I run Firth's regression, it produces chi-squared values that are infinite or p-values that are astronomically small, even though what I saw through effects suggested no significant difference. I can't tell from the interaction term whether there truly is an effect of the environmental variable or whether the significant effect is due to the difference between species. Based on what I have seen of the logistf function, I didn't think it would produce a chi-square score. Is this an issue with my code, or is it because of my data?
If I wasn't clear enough about something please let me know and if anyone has any suggestions or advice, they would be massively appreciated. Thanks!
The model and test code I used are below:
### glmer model
Large <- glmer(Abs.Pres ~ Species:Q.Depth + Species:Conductivity + Species:Temp + Species:pH + Species:DO.P + (1|QID),
               nAGQ = 0,
               family = binomial,
               data = Stacked_Pref)
anova(Large)
Output:
Analysis of Variance Table
                     npar  Sum Sq Mean Sq F value
Species:Q.Depth         3 234.904  78.301 78.3014
Species:Conductivity    3  32.991  10.997 10.9970
Species:Temp            3  39.001  13.000 13.0004
Species:pH              3  25.369   8.456  8.4562
Species:DO.P            3  34.930  11.643 11.6434
### Firth's
Lp <- logistf(Abs.Pres ~ Species:pH, data = Stacked_Pref, contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
> Lp
logistf(formula = Abs.Pres ~ Species:pH, data = Stacked_Pref,
    contrasts.arg = list(pH = "contr.treatment", Species = "contr.sum"))
Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood
                         coef   se(coef) lower 0.95 upper 0.95    Chisq            p
(Intercept)         1.9711411 0.57309880  0.8552342  3.1015114 12.09107 5.066380e-04
SpeciesGoby:pH     -0.3393185 0.07146049 -0.4804047 -0.2003108 23.31954 1.371993e-06
SpeciesMosquito:pH -0.3001385 0.07127771 -0.4408186 -0.1614419 18.24981 1.937453e-05
SpeciesRFBE:pH     -0.4771393 0.07232469 -0.6200179 -0.3365343 45.73750 1.352096e-11
Likelihood ratio test=267.0212 on 3 df, p=0, n=3945

Comparing AIC for different types of models (beta and normal)

I have responses which are proportions mainly centered around 0.6-0.7, and not many of them are close to 0 or 1. I have tried fitting both normal and beta models, and the normal models yield lower AIC than the beta models. I use lm() for fitting the normal model, and betareg for the beta model.
But I wonder if it is really possible to compare AIC values for different model types like that? I do of course use the same response variables and the same data for both regressions.
Note: I tried to read about Kullback-Leibler divergence here: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4049836 (name: The AIC Criterion and Symmetrizing the Kullback–Leibler Divergence), but got confused by this sentence on page two: "It is also assumed as in [12] that the search is carried out in a parametric family of distribution including the true model.", where [12] refers to Akaike's article from 1974. Does this imply that I cannot compare the AIC from a beta and a normal model, as the true model cannot be both beta and normal?
Note 2: I tried to logit-transform the responses and then fit a normal model, but that just made the residual plots look worse.

How to test joint parameter hypothesis in multinomial logit regression R?

I am trying to test the hypothesis of market efficiency in bookmaker odds for football matches. I have estimated a multinomial logit model with the mlogit package:
Model: outcome=log(P1/Px)+log(P2/Px)
where P1 is the implicit bookie probability of a home win, Px is the implicit bookie probability of a draw, etc. Draw (x) is the reference category.
Now I want to use a likelihood-based test (LR, Wald or LM) for the following hypothesis:
H0: β1=(0,1,0), β2=(0,0,1)
I.e., under the null hypothesis the intercept coefficient is 0 for both regressions. The coefficient for the logit of home win is 1 when y=home win, and 0 when y=away win. The coefficient for the logit of away win is 0 when y=home win, and 1 when y=away win.
I am having trouble understanding how to fit the constrained model (the H0-model), from which I would extract a log-likelihood to compare with the one obtained from the ML-estimated model in an LR test.
I have tried following the instructions from page 57 here:
https://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.pdf
but I don't understand how to specify my H0-model using the update()-function. Is it possible?
If you know how to do an equivalent test using the nnet (multinom) package, perhaps using "offset", an explanation of how to do that would also be very appreciated.
Thanks for any help!
I now understand that I did not need to fit a constrained model with fixed parameter values (the H0-model) to extract the loglikelihood value under the null hypothesis.
If the null hypothesis is true, the log likelihood will be:
sum(ln(Pj)),
where j is the actual outcome of the game and P is the implicit bookmaker probability.
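The calculation above can be sketched as follows. Everything here is illustrative: the probabilities and outcomes are simulated (hypothetical data, with the implied probabilities assumed to be normalized so that they sum to 1 per match), the unrestricted model is fitted with nnet::multinom rather than mlogit, and the 6 degrees of freedom come from the six coefficients fixed by H0.

```r
library(nnet)  # multinom(); used here instead of mlogit (an assumption)

set.seed(1)
n <- 500

# Hypothetical implied probabilities for draw (x), home win (1), away win (2)
raw <- matrix(runif(3 * n, 0.2, 1), ncol = 3,
              dimnames = list(NULL, c("x", "1", "2")))
P <- raw / rowSums(raw)  # normalize so that each row sums to 1

# Simulate outcomes from these probabilities, so H0 holds by construction
outcome <- apply(P, 1, function(p) sample(colnames(P), 1, prob = p))

# Log-likelihood under H0: sum of log(probability of the actual outcome)
idx <- cbind(seq_len(n), match(outcome, colnames(P)))
ll0 <- sum(log(P[idx]))

# Unrestricted multinomial logit with draw as the reference category
d <- data.frame(
  outcome = factor(outcome, levels = c("x", "1", "2")),
  l1 = log(P[, "1"] / P[, "x"]),
  l2 = log(P[, "2"] / P[, "x"])
)
fit <- multinom(outcome ~ l1 + l2, data = d, trace = FALSE)

# LR test of H0: 2 equations x 3 coefficients = 6 restrictions
lr      <- as.numeric(2 * (logLik(fit) - ll0))
p_value <- pchisq(lr, df = 6, lower.tail = FALSE)
```

With real data, P would hold the bookmaker's implied probabilities and outcome the observed results; no constrained model needs to be fitted, because ll0 is computed directly from the odds.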
