glht() and lsmeans() can't find contrast in lmer() model - r

I have the following situation:
My fixed-effect model finds a main effect of Relation_PenultimateLast in the group of participants called 'composers'. I therefore want to find which levels of Relation_PenultimateLast differ statistically from the others.
f.e.model.composers = lmer(Score ~ Relation_PenultimateLast + (1|TrajectoryType) + (1|StimulusType) + (1|Relation_FirstLast) + (1|LastPosition), data=datasheet.complete.composers)
summary(f.e.model.composers)
Random effects:
Groups Name Variance Std.Dev.
TrajectoryType (Intercept) 0.005457 0.07387
LastPosition (Intercept) 0.036705 0.19159
Relation_FirstLast (Intercept) 0.004298 0.06556
StimulusType (Intercept) 0.019197 0.13855
Residual 1.318116 1.14809
Number of obs: 2200, groups:
TrajectoryType, 25; LastPosition, 8; Relation_FirstLast, 4; StimulusType, 4
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.90933 0.12476 14.84800 23.320 4.15e-13 ***
Relation_PenultimateLast 0.09987 0.02493 22.43100 4.006 0.000577 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I have to make a Tukey comparison of my lmer() model.
Now, I find two methods for the comparison among Relation_PenultimateLast levels (I have found them in here: https://stats.stackexchange.com/questions/237512/how-to-perform-post-hoc-test-on-lmer-model):
summary(glht(f.e.model.composers, linfct = mcp(Relation_PenultimateLast = "Tukey")), test = adjusted("holm"))
and
lsmeans(f.e.model.composers, list(pairwise ~ Relation_PenultimateLast), adjust = "holm")
These do not work.
The former reports:
Variable(s) ‘Relation_PenultimateLast’ of class ‘integer’ is/are not contained as a factor in ‘model’
The latter:
Relation_PenultimateLast lsmean SE df lower.CL upper.CL
2.6 3.168989 0.1063552 8.5 2.926218 3.41176
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
$` of contrast`
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
Can somebody help me understand why I have this result?

First, it's important to realize that the model you have fitted is inappropriate. It uses Relation_PenultimateLast as a numeric predictor; thus it fits a linear trend to its values 1, 2, 3, and 4, rather than separate estimates for each level of this as a factor. I also wonder, given the plot you show, why Test is not in the model; it looks like it should be (again as a factor, not a numeric predictor). I suggest that you get some statistical consulting help to check that you are using appropriate models in your research. Perhaps you could give a graduate student in statistics some grounding in practical applications -- a win-win proposition.
To model Relation_PenultimateLast as a factor, one way is to replace it in the model formula with factor(Relation_PenultimateLast). That will work for lsmeans() but not glht(). A better way is probably to change it in the dataset:
datasheet.complete.composers = transform(datasheet.complete.composers,
Relation_PenultimateLast = factor(Relation_PenultimateLast))
f.e.model.composers = lmer(...) ### (as before, assuming Test isn't needed)
(BTW, you must be a heck of a better typist than I am; I'd use shorter names, though I do applaud using informative ones.)
(Note: is f.e.model.composers supposed to suggest a fixed-effects model? It isn't one; it is a mixed model. Again, a consultant...)
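To see why the integer coding leaves nothing to compare, here is a minimal base-R sketch. It uses simulated data and lm() as a stand-in for the mixed model; the names d, rel, rel_f and score are hypothetical and not from the original dataset.

```r
# Simulated stand-in data; the names d, rel, rel_f and score are hypothetical.
set.seed(1)
d <- data.frame(rel = rep(1:4, each = 25))
d$score <- c(3.0, 3.4, 3.1, 3.6)[d$rel] + rnorm(nrow(d))

# Integer coding: one slope for a linear trend, so there are no levels to contrast.
fit_num <- lm(score ~ rel, data = d)
names(coef(fit_num))   # "(Intercept)" "rel"

# Factor coding: a separate coefficient for each non-reference level.
d$rel_f <- factor(d$rel)
fit_fac <- lm(score ~ rel_f, data = d)
names(coef(fit_fac))   # "(Intercept)" "rel_f2" "rel_f3" "rel_f4"

# With a factor in the model, all six pairwise comparisons become available.
tk <- TukeyHSD(aov(score ~ rel_f, data = d))
rownames(tk$rel_f)
```

The same logic applies to the lmer() fit: once the predictor is a factor, the pairwise contrasts are estimable.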
The lsmeans package is destined to be deprecated, so I suggest you use its continuation, the emmeans package:
library(emmeans)
emmeans(f.e.model.composers, pairwise ~ Relation_PenultimateLast)
I suggest using the default "tukey" adjustment rather than Holm for this application.
If indeed Test should be in the model, then it looks like you need to include the interaction; so it'd go something like this:
model.composers = lmer(Score ~ Relation_PenultimateLast * factor(Test) + ...)
### A plot like the one shown, but based on the model predictions:
emmip(model.composers, Relation_PenultimateLast ~ Test)
### Estimates and comparisons of Relation_PenultimateLast for each Test:
emmeans(model.composers, pairwise ~ Relation_PenultimateLast | Test)


One-way anova using the Survey package in R

I am trying to identify the best way to run a one-way Anova on a complex survey design. After perusing Lumley's Survey package documentation, I am none the wiser.
The survey::anova function is meant to 'Fit and compare hierarchical loglinear models for complex survey data', which is not what I am doing.
What I am trying to do
I have collected data about one categorical independent variable [3 levels] and one quantitative dependent variable. I want to use ANOVA to check if the dependent variable changes according to the level of the independent variable.
Here is an example of my process:
Load Survey package and create complex survey design object
library(survey)
df <- data.frame(sex = c('F', 'O', NA, 'M', 'M', 'O', 'F', 'F'),
married = c(1,1,1,1,0,0,1,1),
pens = c(0, 1, 1, NA, 1, 1, 0, 0),
weight = c(1.12, 0.55, 1.1, 0.6, 0.23, 0.23, 0.66, 0.67))
svy_design <- svydesign(ids=~1, data=df, weights=~weight)
Borrowing from this post over here,
Method 1: using survey::aov
summary(aov(weight~sex,data = svy_design))
However I got an error saying:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'summary': object 'api00' not found
Method 2: use survey::glm instead of anova
That same post has an answer/explanation with a case against using anova:
According to the main statistician of our institute there is no easy implementation of this kind of analysis in any common modeling environment. The reason for that is that ANOVA and ANCOVA are linear models that were not further developed after the emergence of General Linear Models (later Generalized linear models - GLMs) in the 70's.
A normal linear regression model yields practically the same results as an ANOVA, but is much more flexible regarding variable choice. Since weighting methods exist for GLMs (see survey package in R) there is no real need to develop methods to weight for stratified sampling design in ANOVA... simply use a GLM instead.
summary(svyglm(weight~sex,svy_design))
I got this output:
call:
svyglm(formula = weight ~ sex, design = svy_design)
Survey design:
svydesign(ids = ~1, data = df, weights = ~weight)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.8730 0.1478 5.905 0.00412 **
sexM -0.3756 0.1855 -2.024 0.11292
sexO -0.4174 0.1788 -2.334 0.07989 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.04270091)
Number of Fisher Scoring iterations: 2
My Questions:
Why does method 1 throw an error?
Is it possible to use the survey::aov function to accomplish my goal?
If I were to use survey::glm [method 2], which value should I be looking at to identify a difference in means? Would it be the p value of the intercept?
I am a far cry from a stats buff, please do explain in the simplest possible terms. Thank you!!
There is no such function as survey::aov, so you can't use it to accomplish your goal. Your code uses stats::aov.
You can use survey::svyglm. I will use one of the examples from the package, so I can actually run the code:
> model<-svyglm(api00~stype, design=dclus2)
> summary(model)
Call:
svyglm(formula = api00 ~ stype, design = dclus2)
Survey design:
dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 692.81 30.28 22.878 < 2e-16 ***
stypeH -94.47 27.66 -3.415 0.00156 **
stypeM -50.46 23.01 -2.193 0.03466 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 17528.44)
Number of Fisher Scoring iterations: 2
There are three school types, E, M, and H. The two coefficients here estimate differences between the mean of E and the means of the other two groups and the $p$-values test the hypotheses that H and E have the same mean and that M and E have the same mean.
If you want an overall test for the difference in means among the three groups you can use the regTermTest function, which tests a term or set of terms in the model, eg,
> regTermTest(model,~stype)
Wald test for stype
in svyglm(formula = api00 ~ stype, design = dclus2)
F = 12.5997 on 2 and 37 df: p= 6.7095e-05
That F test is analogous to the one stats::aov gives. It's not identical, because this is survey data.
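For readers unsure how to read the coefficient table, here is a small unweighted sketch in base R. It uses simulated data and plain lm(), not the survey package, and the group means are hypothetical. It shows that the intercept is the mean of the reference group, each other coefficient is a difference from that group, and anova() gives the single overall test.

```r
# Simulated stand-in data; group labels mimic the E/H/M school types.
set.seed(42)
g <- factor(rep(c("E", "H", "M"), each = 30))
group_means <- c(E = 690, H = 600, M = 640)   # hypothetical true means
y <- group_means[as.character(g)] + rnorm(length(g), sd = 50)

fit <- lm(y ~ g)
obs_means <- tapply(y, g, mean)

# Intercept = mean of the reference level (E, the first level alphabetically).
coef(fit)[["(Intercept)"]] - obs_means[["E"]]               # essentially 0
# gH = mean(H) - mean(E); its p-value tests H against E only.
coef(fit)[["gH"]] - (obs_means[["H"]] - obs_means[["E"]])   # essentially 0

# One overall F test for the factor, i.e. the one-way ANOVA question.
anova(fit)

# To compare against a different baseline, change the reference level.
fit_m <- lm(y ~ relevel(g, ref = "M"))
```

So the p-value of the intercept is not the one to look at: it only tests whether the reference group's mean is zero.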

Multivariable regression interaction term with categorical variables

I am kind of new to R and am working on glm model and wanted to look for the interaction effect of BMI groups and patient groups (4 groups) on mortality (binary) in subgroup analysis. I have the following codes:
model <- glm(death~patient.group*bmi.group, data = data, family = "binomial")
summary(model)
and I get the following:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4798903 0.0361911 -96.153 < 2e-16 ***
patient.group2 0.0067614 0.0507124 0.133 0.894
patient.group3 0.0142658 0.0503444 0.283 0.777
patient.group4 0.0212416 0.0497523 0.427 0.669
bmi.group2 0.1009282 0.0478828 2.108 0.035 *
bmi.group3 0.2397047 0.0552043 4.342 1.41e-05 ***
patient.group2:bmi.group2 -0.0488768 0.0676473 -0.723 0.470
patient.group3:bmi.group2 -0.0461319 0.0672853 -0.686 0.493
patient.group4:bmi.group2 -0.1014986 0.0672675 -1.509 0.131
patient.group2:bmi.group3 -0.0806240 0.0791977 -1.018 0.309
patient.group3:bmi.group3 -0.0008951 0.0785683 -0.011 0.991
patient.group4:bmi.group3 -0.0546519 0.0795683 -0.687 0.492
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So as displayed I will have a p-value for each of the patient.group:bmi.group. My question is, is there a way I can get a single p-value for patient.group:bmi.group instead of one for each subgroup? I have tried to look for answers online but I still could not find the answer :(
Many thanks in advance.
It depends on whether you regard your patient and BMI groups as factors or continuous covariates. If they are covariates, @jay.sf's suggestion is appropriate. It fits a single degree of freedom term for the interaction between the linear effect of patient group and the linear effect of BMI group.
But this depends on both the ordering and definition of the groups. It assumes, for example, that the "difference" between patient groups 1 and 2 is the same as that between patient groups 2 and 3 and so on. Is the ordering of patient groups such that, in some way, group 1 < group 2 < group 3 < group 4? Similarly for BMI. This model would also assume that a change of 1 unit on the patient scale was "the same" as a change of one unit on the BMI scale. I don't know if these are reasonable assumptions.
It would be more usual to consider both patient group and BMI group as factors. This assumes no ordering in groups, nor that the difference between any two groups was equal to that between any other two. In this case, jay.sf's suggestion would give a misleading answer.
To illustrate my point...
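First, generate some artificial data as you haven't provided any:
library(tidyverse)  # tibble(), expand(), mutate(), select(), rbernoulli(), %>%
data <- tibble() %>%
expand(patient.group=1:4, bmi.group=1:3, rep=1:5) %>%
mutate(
z=-0.25*patient.group + 0.75*bmi.group,
# inverse-logit of z gives the probability of death
death=rbernoulli(nrow(.), exp(z)/(1+exp(z)))
) %>%
select(-z)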
Fit a simple continuous covariate model with interaction, as per jay.sf's suggestion:
covariateModel <- glm(death~patient.group * bmi.group, data = data, family = "binomial")
summary(covariateModel)
Giving, in part
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.6962 1.8207 -1.481 0.139
patient.group 0.7407 0.6472 1.144 0.252
bmi.group 1.2697 0.8340 1.523 0.128
patient.group:bmi.group -0.3807 0.2984 -1.276 0.202
Here, the p value for the patient.group:bmi.group interaction is a Wald test based on a single degree of freedom z test.
A slightly more complicated approach is necessary to fit the factor model with interaction and obtain a test for the "overall" interaction effect.
mainEffectModel <- glm(death~as.factor(patient.group) + as.factor(bmi.group), data = data, family = "binomial")
interactionModel <- glm(death~as.factor(patient.group) * as.factor(bmi.group), data = data, family = "binomial")
anova(mainEffectModel, interactionModel, test="Chisq")
Giving
Analysis of Deviance Table
Model 1: death ~ as.factor(patient.group) + as.factor(bmi.group)
Model 2: death ~ as.factor(patient.group) * as.factor(bmi.group)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 54 81.159
2 48 70.579 6 10.58 0.1023
Here, the change in deviance is a likelihood-ratio test statistic and is distributed as chi-squared on (4-1) x (3-1) = 6 degrees of freedom.
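As a check, the p-value in the table can be reproduced by hand from the two residual deviances printed above. This is only a sketch of the arithmetic; the variable names are mine.

```r
# Residual deviances copied from the Analysis of Deviance Table above.
dev_main <- 81.159   # main-effects model
dev_int  <- 70.579   # model with interaction

# Degrees of freedom used by the interaction: (4 - 1) * (3 - 1).
df_diff <- (4 - 1) * (3 - 1)

# Drop in deviance, referred to a chi-squared distribution.
dev_drop <- dev_main - dev_int
p_value  <- pchisq(dev_drop, df = df_diff, lower.tail = FALSE)
round(p_value, 4)   # 0.1023
```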
The two approaches give similar answers using my particular dataset, but they may not always do so. Both are statistically correct, but which one is most appropriate depends on your particular situation. We don't have enough information to comment.
This excellent post provides more context.

How to obtain Poisson's distribution "lambda" from R glm() coefficients

My R-script produces glm() coeffs below.
What is Poisson's lambda, then? It should be ~3.0 since that's what I used to create the distribution.
Call:
glm(formula = h_counts ~ ., family = poisson(link = log), data = pois_ideal_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-22.726 -12.726 -8.624 6.405 18.515
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.222532 0.015100 544.53 <2e-16 ***
h_mids -0.363560 0.004393 -82.75 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 11451.0 on 10 degrees of freedom
Residual deviance: 1975.5 on 9 degrees of freedom
AIC: 2059
Number of Fisher Scoring iterations: 5
random_pois = rpois(10000,3)
h=hist(random_pois, breaks = 10)
mean(random_pois) #verifying that the mean is close to 3.
h_mids = h$mids
h_counts = h$counts
pois_ideal_data <- data.frame(h_mids, h_counts)
pois_ideal_model <- glm(h_counts ~ ., pois_ideal_data, family=poisson(link=log))
summary_ideal=summary(pois_ideal_model)
summary_ideal
What are you doing here???!!! You used a glm to fit a distribution???
Well, it is not impossible to do so, but it is done via this:
set.seed(0)
x <- rpois(10000,3)
fit <- glm(x ~ 1, family = poisson())
i.e., we fit data with an intercept-only regression model.
fit$fitted[1]
# 3.005
This is the same as:
mean(x)
# 3.005
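To connect this back to the coefficient table: with the log link, the intercept is log(lambda), so lambda is recovered by exponentiating it. A short sketch using the same simulated data as above (the name lambda_hat is mine):

```r
set.seed(0)
x <- rpois(10000, 3)
fit <- glm(x ~ 1, family = poisson())

# The coefficient is on the log scale because of the log link.
coef(fit)

# Back-transform to get the estimate of lambda.
lambda_hat <- exp(coef(fit)[["(Intercept)"]])
all.equal(lambda_hat, mean(x), tolerance = 1e-6)

# Wald confidence interval, back-transformed to the lambda scale.
exp(confint.default(fit))
```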
It looks like you're trying to do a Poisson fit to aggregated or binned data; that's not what glm does. I took a quick look for canned ways of fitting distributions to binned data but couldn't find one; it looks like earlier versions of the bda package might have offered this, but not now.
At root, what you need to do is set up a negative log-likelihood function that computes -sum((# counts)*log(prob(count|lambda))) and minimize it using optim(); the solution given below using the bbmle package is a little more complex up-front but gives you added benefits like easily computing confidence intervals etc.
Set up data:
set.seed(101)
random_pois <- rpois(10000,3)
tt <- table(random_pois)
dd <- data.frame(counts=unname(c(tt)),
val=as.numeric(names(tt)))
Here I'm using table rather than hist because histograms on discrete data are fussy (having integer cutpoints often makes things confusing because you have to be careful about right- vs left-closure)
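A tiny illustration of the closure issue, using a made-up vector: with integer cutpoints, hist() puts the values 0 and 1 into the same first bin, while table() keeps every value separate.

```r
x <- c(0, 0, 1, 2, 3)   # toy data, not from the question

# table() counts each distinct value on its own.
table(x)                # 0: 2, 1: 1, 2: 1, 3: 1

# hist() bins are right-closed, and the lowest value is included in the
# first bin, so the bins here are [0,1], (1,2], (2,3].
h <- hist(x, breaks = 0:3, plot = FALSE)
h$counts                # 3 1 1
h$mids                  # 0.5 1.5 2.5
```

The midpoints are also not the integer values themselves, which is one more reason the counts and mids from hist() are awkward to fit directly.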
Set up density function for binned Poisson data (to work with bbmle's formula interface, the first argument must be called x, and it must have a log argument).
dpoisbin <- function(x,val,lambda,log=FALSE) {
probs <- dpois(val,lambda,log=TRUE)
r <- sum(x*probs)
if (log) r else exp(r)
}
Fit lambda (log link helps prevent numerical problems/warnings from negative lambda values):
library(bbmle)
m1 <- mle2(counts~dpoisbin(val,exp(loglambda)),
data=dd,
start=list(loglambda=0))
all.equal(unname(exp(coef(m1))),mean(random_pois),tol=1e-6) ## TRUE
exp(confint(m1))
## 2.5 % 97.5 %
## 2.972047 3.040009
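For comparison, the plain optim() route mentioned earlier can be sketched as follows. This is a minimal version with no confidence intervals; the function and variable names are mine, and lambda is again parameterized on the log scale.

```r
set.seed(101)
random_pois <- rpois(10000, 3)
tt <- table(random_pois)
counts <- as.numeric(tt)          # how often each value occurred
vals   <- as.numeric(names(tt))   # the values themselves

# Negative log-likelihood of the tabulated data for a given log(lambda).
nll <- function(loglambda) {
  -sum(counts * dpois(vals, lambda = exp(loglambda), log = TRUE))
}

opt <- optim(par = 0, fn = nll, method = "BFGS")
lambda_hat <- exp(opt$par)
c(lambda_hat = lambda_hat, sample_mean = mean(random_pois))
```

The two numbers should agree closely, since the maximum-likelihood estimate of a Poisson lambda is the sample mean.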

anova.rq() in quantreg package in R

I'm interested in comparing estimates from different quantiles (same outcome, same covariates) using the anova.rqlist function called by anova in the quantreg package in R. However, the math in the function is beyond my rudimentary expertise. Let's say I fit 3 models at different quantiles:
library(quantreg)
data(Mammals) # data in quantreg to be used as a useful example
fit1 <- rq(weight ~ speed + hoppers + specials, tau = .25, data = Mammals)
fit2 <- rq(weight ~ speed + hoppers + specials, tau = .5, data = Mammals)
fit3 <- rq(weight ~ speed + hoppers + specials, tau = .75, data = Mammals)
Then I compare them using:
anova(fit1, fit2, fit3, test="Wald", joint=FALSE)
My question is: which of these models is being used as the basis of the comparison?
My understanding of the Wald test (wiki entry) is that the statistic is
$W = (\hat{\theta} - \theta_0)^2 / \operatorname{var}(\hat{\theta})$
where θ̂ is the estimate of the parameter(s) of interest θ that is compared with the proposed value θ0.
So my question is what is the anova function in quantreg choosing as the θ0?
Based on the p-value returned from the anova, my best guess is that it is choosing the lowest quantile specified (i.e. tau = 0.25). Is there a way to specify the median (tau = 0.5) or, better yet, the mean estimate obtained using lm(y ~ x1 + x2 + x3, data)?
anova(fit1, fit2, fit3, joint=FALSE)
actually produces
Quantile Regression Analysis of Deviance Table
Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in { 0.25 0.5 0.75 }
Df Resid Df F value Pr(>F)
speed 2 319 1.0379 0.35539
hoppersTRUE 2 319 4.4161 0.01283 *
specialsTRUE 2 319 1.7290 0.17911
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
while
anova(fit3, fit1, fit2, joint=FALSE)
produces the exact same result
Quantile Regression Analysis of Deviance Table
Model: weight ~ speed + hoppers + specials
Tests of Equality of Distinct Slopes: tau in { 0.5 0.25 0.75 }
Df Resid Df F value Pr(>F)
speed 2 319 1.0379 0.35539
hoppersTRUE 2 319 4.4161 0.01283 *
specialsTRUE 2 319 1.7290 0.17911
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The order of the models is clearly being changed in the anova, but how is it that the F value and Pr(>F) are identical in both tests?
All the quantiles you input are used and there is not one model used as a reference.
I suggest you read this post and the related answer to understand what your "theta.0" is.
I believe what you are trying to do is to test whether the regression lines are parallel. In other words whether the effects of the predictor variables (only income here) are uniform across quantiles.
You can use the anova() from the quantreg package to answer this question. You should indeed use several fits for each quantile.
When you use joint=FALSE as you did, you get coefficient-wise comparisons. But you only have one coefficient so there is only one line! And your results tell you that the effect of income is not uniform across quantiles in your example. Use several predictor variables and you will get several p-values.
You can do an overall test of equality of the entire sets of coefficients if you do not use joint=FALSE and that would give you a "Joint Test of Equality of Slopes" and therefore only one p-value.
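The order-invariance seen in the question is a general property of Wald tests of equality, which can be checked numerically. The sketch below is a generic Wald statistic in base R, not quantreg's internal code. The slope estimates b and their covariance V are made up for illustration.

```r
# Hypothetical slope estimates at three quantiles and a made-up covariance.
set.seed(1)
b <- c(1.0, 1.4, 1.1)
A <- matrix(rnorm(9), nrow = 3)
V <- crossprod(A) / 10   # symmetric, positive definite

# Generic Wald statistic for H0: R %*% b = 0.
wald <- function(R) {
  d <- R %*% b
  drop(t(d) %*% solve(R %*% V %*% t(R)) %*% d)
}

# "All slopes equal" written as differences from the first slope ...
R1 <- rbind(c(-1, 1, 0),
            c(-1, 0, 1))
# ... and as differences from the second slope.
R2 <- rbind(c(1, -1, 0),
            c(0, -1, 1))

all.equal(wald(R1), wald(R2))   # TRUE
```

Both contrast matrices describe the same hypothesis (all slopes equal), so the statistic does not depend on which model is listed first or treated as the baseline.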
EDIT:
I think theta.0 is the average slope for all 'tau' values or the actual estimate from 'lm()', rather than a specific slope of any of the models. My reasoning is that 'anova.rq()' does not require any specific low value of 'tau' or even the median 'tau'.
There are several ways to test this. Either do the calculations by hand with theta.0 being equal to the average value, or compare many combinations, because then you could have a situation where some of your models are close to the model with a low 'tau' value but not to the 'lm()' value. So if theta.0 is the slope of the first model with lowest 'tau' then your Pr(>F) will be high whereas in the other case, it will be low.
This question should maybe have been asked on Cross Validated.

model checking and test of overdispersion for glmer

I am testing differences on the number of pollen grains loading on plant stigmas in different habitats and stigma types.
My sample design comprises two habitats, with 10 sites each habitat.
In each site, I have up to 3 stigma types (wet, dry and semidry), and for each stigma type, I have a different number of plant species, with a different number of individuals per plant species (code).
So, I ended up with nested design as follow: habitat/site/stigmatype/stigmaspecies/code
As it is a descriptive study, stigmatype, stigmaspecies and code vary between sites.
My response variable (n) is the number of pollen grains (log10 + 1) per stigma per plant, averaged because I collected 3 stigmas per plant.
The data don't fit a Poisson distribution because (i) the values are not integers, and (ii) the variance is much higher than the mean (ratio = 911.0756). So, I fitted a negative binomial model.
After model selection, I have:
m4a <- glmer(n ~ habitat*stigmatype + (1|stigmaspecies/code),
family=negative.binomial(2))
> summary(m4a)
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
Family: Negative Binomial(2) ( log )
Formula: n ~ habitat * stigmatype + (1 | stigmaspecies/code)
AIC BIC logLik deviance
993.9713 1030.6079 -487.9856 975.9713
Random effects:
Groups Name Variance Std.Dev.
code:stigmaspecies (Intercept) 1.034e-12 1.017e-06
stigmaspecies (Intercept) 4.144e-02 2.036e-01
Residual 2.515e-01 5.015e-01
Number of obs: 433, groups: code:stigmaspecies, 433; stigmaspecies, 41
Fixed effects:
Estimate Std. Error t value Pr(>|z|)
(Intercept) -0.31641 0.08896 -3.557 0.000375 ***
habitatnon-invaded -0.67714 0.10060 -6.731 1.68e-11 ***
stigmatypesemidry -0.24193 0.15975 -1.514 0.129905
stigmatypewet -0.07195 0.18665 -0.385 0.699885
habitatnon-invaded:stigmatypesemidry 0.60479 0.22310 2.711 0.006712 **
habitatnon-invaded:stigmatypewet 0.16653 0.34119 0.488 0.625491
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) hbttn- stgmtyps stgmtypw hbttnn-nvdd:stgmtyps
hbttnn-nvdd -0.335
stgmtypsmdr -0.557 0.186
stigmatypwt -0.477 0.160 0.265
hbttnn-nvdd:stgmtyps 0.151 -0.451 -0.458 -0.072
hbttnn-nvdd:stgmtypw 0.099 -0.295 -0.055 -0.403 0.133
Two questions:
How do I check for overdispersion from this output?
What is the best way to go through model validation here?
I have been using:
qqnorm(resid(m4a))
hist(resid(m4a))
plot(fitted(m4a),resid(m4a))
While the qqnorm() and hist() plots seem OK, there is a tendency towards heteroscedasticity in the 3rd graph. And here is my final question:
Can I go through model validation with this graph in glmer? or is there a better way to do it? if not, how much should I worry about the 3rd graph?
A simple way to check for overdispersion in glmer is:
> library("blmeco")
> dispersion_glmer(your_model) # it shouldn't be over 1.4
To solve overdispersion I usually add an observation level random factor
For model validation I usually start from these plots...but then depends on your specific model...
par(mfrow=c(2,2))
qqnorm(resid(your_model), main="normal qq-plot, residuals")
qqline(resid(your_model))
qqnorm(ranef(your_model)$id[,1])
qqline(ranef(your_model)$id[,1])
plot(fitted(your_model), resid(your_model)) #residuals vs fitted
abline(h=0)
your_data$fitted <- fitted(your_model) #fitted vs observed
plot(your_data$fitted, jitter(your_data$total,0.1))
abline(0,1)
hope this helps a little....
cheers
Just an addition to Q1 for those who might find this by googling: the blmeco dispersion_glmer function appears to be outdated. It is better to use @Ben_Bolker's function for this purpose:
overdisp_fun <- function(model) {
rdf <- df.residual(model)
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
Source: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#overdispersion.
With the highlighted notion:
Do PLEASE note the usual, and extra, caveats noted here: this is an APPROXIMATE estimate of an overdispersion parameter.
PS. Why outdated?
The lme4 package includes the residuals function these days, and Pearson residuals are supposedly more robust for this type of calculation than the deviance residuals. The blmeco::dispersion_glmer function squares the deviance residuals together with the values in the model's u slot, sums them, divides by the number of observations and takes the square root of the value (the function):
dispersion_glmer <- function (modelglmer)
{
n <- length(resid(modelglmer))
return(sqrt(sum(c(resid(modelglmer), modelglmer@u)^2)/n))
}
The blmeco solution gives considerably higher deviance/df ratios than Bolker's function. Since Ben is one of the authors of the lme4 package, I would trust his solution more although I am not qualified to rationalize the statistical reason.
x <- InsectSprays
x$id <- rownames(x)
mod <- lme4::glmer(count ~ spray + (1|id), data = x, family = poisson)
blmeco::dispersion_glmer(mod)
# [1] 1.012649
overdisp_fun(mod)
# chisq ratio rdf p
# 55.7160734 0.8571704 65.0000000 0.7873823
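For context, the same function also works on a plain glm() fit, since it only needs df.residual() and Pearson residuals. The sketch below applies it to the InsectSprays data without the observation-level random effect, which shows what that extra term is absorbing. The object name fit_glm is mine.

```r
# Same function as quoted above, repeated so this block runs on its own.
overdisp_fun <- function(model) {
  rdf <- df.residual(model)
  rp <- residuals(model, type = "pearson")
  Pearson.chisq <- sum(rp^2)
  prat <- Pearson.chisq / rdf
  pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
  c(chisq = Pearson.chisq, ratio = prat, rdf = rdf, p = pval)
}

# Plain Poisson GLM, no random effects.
fit_glm <- glm(count ~ spray, data = InsectSprays, family = poisson)
res <- overdisp_fun(fit_glm)
res
```

A ratio clearly above 1 here, compared with a ratio below 1 once (1|id) is added, is the usual sign that the observation-level term is soaking up extra-Poisson variation. As the FAQ stresses, the ratio is only an approximate estimate.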
