Fitting an ordinal logistic mixed-effects model - r

How do I fit an ordinal (3-level) logistic mixed-effects model in R? I guess it would be like a glmer except with three outcome levels.
data structure
patientid Viral_load Adherence audit_score visit
1520 0 optimal nonhazardous 1
1520 0 optimal nonhazardous 2
1520 0 optimal hazardous 3
1526 1 suboptimal hazardous 1
1526 0 optimal hazardous 2
1526 0 optimal hazardous 3
1568 2 suboptimal hazardous 1
1568 2 suboptimal nonhazardous 2
1568 2 suboptimal nonhazardous 3
Where viral load (the outcome of interest) consists of three levels (0, 1, 2), adherence is optimal/suboptimal, audit score is nonhazardous/hazardous, and there are 3 visits.
So here is an example of how the model would look using generalized mixed-effects model code.
library(lme4)
test <- glmer(viral_load ~ audit_score + adherence + (1|patientid) + (1|visit), data = df, family = binomial)
summary(test)
The results from this code are incorrect because it treats viral_load as a binomial outcome.
I hope my question is clear.

You might try the ordinal package's clmm() function:
library(ordinal)
fmm1 <- clmm(rating ~ temp + contact + (1|judge), data = wine)
summary(fmm1)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ temp + contact + (1 | judge)
data: wine
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -81.57 177.13 332(999) 1.02e-05 2.8e+01
Random effects:
Groups Name Variance Std.Dev.
judge (Intercept) 1.279 1.131
Number of groups: judge 9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 3.0630 0.5954 5.145 2.68e-07 ***
contactyes 1.8349 0.5125 3.580 0.000344 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.6237 0.6824 -2.379
2|3 1.5134 0.6038 2.507
3|4 4.2285 0.8090 5.227
4|5 6.0888 0.9725 6.261
I'm pretty sure that the link is logistic, since running the same model with the more flexible clmm2 function, where the default link is documented to be logistic, I get the same results.
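Applied to the data structure in the question, the call might look like the sketch below. It assumes your data frame is named df with the column names shown in the question; note that clmm() wants the outcome as an ordered factor, and with only three visits you may prefer visit as a fixed effect rather than a random one:

```r
library(ordinal)

# the outcome must be an ordered factor for a cumulative link model
df$Viral_load <- factor(df$Viral_load, levels = c(0, 1, 2), ordered = TRUE)

# proportional-odds mixed model with a random intercept per patient;
# the logit link is the default
fit <- clmm(Viral_load ~ audit_score + Adherence + (1 | patientid),
            data = df, link = "logit")
summary(fit)
```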

Related

simr: powerSim gives 100% for all effect sizes

I have carried out a binomial GLMM to determine how latitude and native status (native/non-native) of a set of plant species affects herbivory damage. I am now trying to determine the statistical power of my model when I change the effect sizes. My model looks like this:
latglmm <- glmer(cbind(chewing,total.cells-chewing) ~ scale(latitude) * native.status + scale(sample.day.of.year) + (1|genus) + (1|species) + (1|catalogue.number), family=binomial, data=mna)
where cbind(chewing, total.cells - chewing) gives me a proportion (of leaves with herbivory damage), native.status is either "native" or "non-native", and catalogue.number acts as an observation-level random effect to deal with overdispersion. There are 10 genera, each with at least 1 native and 1 non-native species, making 26 species in total. The model summary is:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(chewing, total.cells - chewing) ~ scale(latitude) * native.status +
scale(sample.day.of.year) + (1 | genus) + (1 | species) + (1 | catalogue.number)
Data: mna
AIC BIC logLik deviance df.resid
3986.7 4023.3 -1985.4 3970.7 706
Scaled residuals:
Min 1Q Median 3Q Max
-1.3240 -0.4511 -0.0250 0.1992 1.0765
Random effects:
Groups Name Variance Std.Dev.
catalogue.number (Intercept) 1.26417 1.1244
species (Intercept) 0.08207 0.2865
genus.ID (Intercept) 0.33431 0.5782
Number of obs: 714, groups: catalogue.number, 713; species, 26; genus.ID, 10
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.61310 0.20849 -12.534 < 2e-16 ***
scale(latitude) -0.17283 0.06370 -2.713 0.00666 **
native.statusnon-native 0.11434 0.15554 0.735 0.46226
scale(sample.day.of.year) 0.28521 0.05224 5.460 4.77e-08 ***
scale(latitude):native.statusnon-native -0.02986 0.09916 -0.301 0.76327
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) scallt ntv.s- scaldy
scalelat 0.012
ntv.sttsnn- -0.304 -0.014
scaledoy 0.018 -0.085 -0.027
scllt:ntv.- -0.011 -0.634 0.006 -0.035
I should add that the model I have actually been using is a glmmTMB model, since lme4 still showed some overdispersion even with the observation-level random effect; however, glmmTMB is not compatible with simr, so here I am using lme4 (the results are very similar for both). I want to see what happens to the model's power when I increase or decrease the effect sizes of latitude and native status, but when I run fixef(latglmm1)["scale(latitude)"] <- -1 and fixef(latglmm1)["native.statusnon-native"] <- -1 and then try this:
powerSim(latglmm, fcompare(~ scale(latitude) + native.status))
I get the following output:
Power for model comparison, (95% confidence interval):
100.0% (69.15, 100.0)
Test: Likelihood ratio
Comparison to ~scale(latitude) + native.status + [re]
Based on 10 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 1428
Time elapsed: 0 h 1 m 5 s
The output is the same (100% power) no matter what I change fixef() to. Based on other similar questions online I have ensured that my data has no NA values, and according to my powerSim output there are no warnings or errors to address. I am completely lost as to why this isn't working, so any help would be greatly appreciated!
Alternatively, if anyone has any recommendations for other methods to carry out similar analysis I would love to hear them. What I really want is to get a p-value for each effect size I input but statistical power would be very valuable too.
Thank you!

fitting a GLMER model with glmmTMB

I'm trying to fit a generalized linear mixed model with glmmTMB:
summary(glmmTMB(cbind(SARA_ph58, 1)~ `Melk T`+VetT+EiwT+
`VET EIWIT ratio`+LactT+CelT+UrmT+vetg+eiwitg+lactg+
`DS opname`+`boluses per day`+`chewings per bolus`+
`rumination (min/d)`+ Activiteit + (1|experiment),
data = dataset1geenNA, family = binomial()))
When I run this code I get some output, but I also get the following warning messages:
1: In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
2: In sqrt(diag(vcov)) : NaNs produced
Does anybody know how to solve this problem?
Output:
Family: binomial ( logit )
Formula: cbind(SARA_ph58, 1) ~ `Melk T` + VetT + EiwT + `VET EIWIT ratio` + LactT + CelT + UrmT + vetg + eiwitg + lactg + `DS opname` +
`boluses per day` + `chewings per bolus` + `rumination (min/d)` + Activiteit + (1 | experiment)
Data: dataset1geenNA
AIC BIC logLik deviance df.resid
NA NA NA NA 79
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
experiment (Intercept) 5.138e-08 0.0002267
Number of obs: 96, groups: experiment, 3
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.595e+01 1.605e+01 -0.994 0.3202
`Melk T` -2.560e-01 1.330e-01 -1.925 0.0542 .
VetT -7.499e+00 3.166e+00 -2.369 0.0178 *
EiwT 8.353e+00 4.885e+00 1.710 0.0872 .
`VET EIWIT ratio` 2.100e+01 1.545e+01 1.359 0.1742
LactT -2.086e+00 8.571e-01 -2.434 0.0149 *
CelT -1.430e-04 6.939e-04 -0.206 0.8367
UrmT 1.300e-02 3.978e-02 0.327 0.7438
vetg 1.166e-03 NA NA NA
eiwitg -2.596e-03 5.180e-03 -0.501 0.6162
lactg 7.862e-03 NA NA NA
`DS opname` -1.882e-02 8.416e-02 -0.224 0.8231
`boluses per day` -3.200e-02 1.226e-01 -0.261 0.7940
`chewings per bolus` 1.758e-02 6.688e-02 0.263 0.7927
`rumination (min/d)` -1.468e-03 3.145e-03 -0.467 0.6408
Activiteit 4.265e-03 4.625e-03 0.922 0.3564
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
There are a number of issues here.
The proximal problem is that you have a (near) singular fit: glmmTMB is trying to make the variance zero (5.138e-08 is as close as it can get). Because it fits on a log-variance (actually log-standard-deviation) scale, this means that it's trying to go to -∞, which makes the covariance matrix of the parameters impossible to estimate.
The main reason this is happening is that you have a very small number of groups (3) in your random effect (experiment).
These are extremely common issues with mixed models: you can start by reading ?lme4::isSingular and the relevant section of the GLMM FAQ.
The simplest solution would be to treat experiment as a fixed effect, in which case you would no longer have a mixed model and you could go back to plain glm().
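A minimal sketch of that fixed-effect alternative, keeping just a few of the predictors for brevity (it assumes your data frame and column names as shown in the question):

```r
# treat experiment as a fixed factor and drop the random effect;
# with the grouping variable gone, plain glm() is sufficient
fit <- glm(SARA_ph58 ~ `Melk T` + VetT + EiwT + factor(experiment),
           data = dataset1geenNA, family = binomial)
summary(fit)
```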
Another slightly worrying aspect of your code is the response variable cbind(SARA_ph58, 1). If SARA_ph58 is a binary (0/1) variable you can use just SARA_ph58. If you pass a two-column matrix as you are doing, the first column is interpreted as the number of successes and the second column as the number of failures; it looks like you may have been trying to specify that the total number of trials for each observation is 1 (again, if this is the case you can just use SARA_ph58 as the response).
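To see the difference, assuming SARA_ph58 is coded 0/1, the following intercept-only models illustrate the point (a sketch, not your full model):

```r
# these two specifications are equivalent for a binary response:
glm(SARA_ph58 ~ 1, data = dataset1geenNA, family = binomial)
glm(cbind(SARA_ph58, 1 - SARA_ph58) ~ 1, data = dataset1geenNA,
    family = binomial)

# cbind(SARA_ph58, 1) instead says "one failure on every row",
# i.e. SARA_ph58 + 1 trials per observation -- a different model
```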
One final note is that lme4::glmer is a little more tolerant of singular fits than glmmTMB.

Checking parallel regression assumption in ordinal logistic regression

I have tried to build an ordinal logistic regression using one ordered categorical outcome variable and three categorical independent variables (N = 43097). While all coefficients are significant, I have doubts about meeting the parallel regression assumption. Though the probability values of all variables and of the whole model in the brant test are essentially zero (which, as I understand, are supposed to be more than 0.05), the test still displays "H0: Parallel Regression Assumption holds". I am confused here. Does this model meet the criteria of the parallel regression assumption?
library(MASS)
table(hh18_u_r$cat_ci_score) # Dependent variable
Extremely Vulnerable Moderate Vulnerable Pandemic Prepared
6143 16341 20613
# Ordinal logistic regression
olr_2 <- polr(cat_ci_score ~ r1_gender + r2_merginalised + r9_religion, data = hh18_u_r, Hess=TRUE)
summary(olr_2)
Call:
polr(formula = cat_ci_score ~ r1_gender + r2_merginalised + r9_religion,
data = hh18_u_r, Hess = TRUE)
Coefficients:
Value Std. Error t value
r1_genderMale 0.3983 0.02607 15.278
r2_merginalisedOthers 0.6641 0.01953 34.014
r9_religionHinduism -0.2432 0.03069 -7.926
r9_religionIslam -0.5425 0.03727 -14.556
Intercepts:
Value Std. Error t value
Extremely Vulnerable|Moderate Vulnerable -1.5142 0.0368 -41.1598
Moderate Vulnerable|Pandemic Prepared 0.4170 0.0359 11.6260
Residual Deviance: 84438.43
AIC: 84450.43
## significance of coefficients and intercepts
summary_table_2 <- coef(summary(olr_2))
pval_2 <- pnorm(abs(summary_table_2[, "t value"]), lower.tail = FALSE)* 2
summary_table_2 <- cbind(summary_table_2, pval_2)
summary_table_2
Value Std. Error t value pval_2
r1_genderMale 0.3982719 0.02606904 15.277583 1.481954e-52
r2_merginalisedOthers 0.6641311 0.01952501 34.014386 2.848250e-250
r9_religionHinduism -0.2432085 0.03068613 -7.925682 2.323144e-15
r9_religionIslam -0.5424992 0.03726868 -14.556436 6.908533e-48
Extremely Vulnerable|Moderate Vulnerable -1.5141502 0.03678710 -41.159819 0.000000e+00
Moderate Vulnerable|Pandemic Prepared 0.4169645 0.03586470 11.626042 3.382922e-31
#Test of parallel regression assumption
library(brant)
brant(olr_2) # Probability supposed to be more than 0.05 as I understand
----------------------------------------------------
Test for X2 df probability
----------------------------------------------------
Omnibus 168.91 4 0
r1_genderMale 12.99 1 0
r2_merginalisedOthers 41.18 1 0
r9_religionHinduism 86.16 1 0
r9_religionIslam 25.13 1 0
----------------------------------------------------
H0: Parallel Regression Assumption holds
# Similar test of parallel regression assumption using car package
library(car)
car::poTest(olr_2)
Tests for Proportional Odds
polr(formula = cat_ci_score ~ r1_gender + r2_merginalised + r9_religion,
data = hh18_u_r, Hess = TRUE)
b[polr] b[>Extremely Vulnerable] b[>Moderate Vulnerable] Chisquare df Pr(>Chisq)
Overall 168.9 4 < 2e-16 ***
r1_genderMale 0.398 0.305 0.442 13.0 1 0.00031 ***
r2_merginalisedOthers 0.664 0.513 0.700 41.2 1 1.4e-10 ***
r9_religionHinduism -0.243 -0.662 -0.147 86.2 1 < 2e-16 ***
r9_religionIslam -0.542 -0.822 -0.504 25.1 1 5.4e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Kindly suggest whether this model satisfies the parallel regression assumption. Thank you!
It tells you that the null hypothesis (H0) is that the assumption holds. Your values are statistically significant, which means you reject that null hypothesis. The test wasn't displaying that line to say the assumption was met; it is just a reminder of what the null is.
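The probabilities of 0 in the table are just very small p-values rounded to zero; you can recover them from the chi-square statistics and degrees of freedom printed above. For example, for the omnibus test:

```r
# p-value for the omnibus Brant test: X2 = 168.91 on 4 df
pchisq(168.91, df = 4, lower.tail = FALSE)
# a vanishingly small value, far below 0.05 -> reject H0:
# the parallel regression assumption does not hold here
```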

Direction of Estimate coefficient in a Log Regression

I'm analysing an ordinal logistic regression and I'm wondering how to know which direction the estimated coefficient has. My variables are just 0, 1 for women, men and 0,1,2,4 for different postures. So my question is, how do I know whether the estimate describes the change from 0 to 1 or the change from 1 to 0, talking about gender?
The output appended a 1 to PicSex; is that a sign that this one has a 1 -> 0 direction? See the code for that.
Thank you for any help
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: Int ~ PicSex + Posture + (1 | PicID)
data: x
Random effects:
Groups Name Variance Std.Dev.
PicID (Intercept) 0.0541 0.2326
Number of groups: PicID 16
Coefficients:
Estimate Std. Error z value Pr(>|z|)
PicSex1 0.3743 0.1833 2.042 0.0411 *
Posture -1.1232 0.1866 -6.018 1.77e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
P.S.
Thank you here are my head results (I relabeled PicSex to Sex)
> head(Posture)
[1] 1 1 1 1 1 1
Levels: 0 1
> head(Sex)
[1] 0 0 0 0 0 0
Levels: 0 1
So the level order is the same, but for Sex it still appended a 1 while for Posture it did not. I am still very confused about the directions.
Your sex variable has two levels, 0 or 1. So PicSex1 means the effect of PicSex being 1 compared to PicSex being 0. I show an example below using the wine dataset:
library(ordinal)
DATA = wine
> head(DATA$temp)
[1] cold cold cold cold warm warm
Levels: cold warm
Here cold comes first in Levels, so it is set as the reference in any linear models. First we verify the effect of cold vs warm:
do.call(cbind,tapply(DATA$rating,DATA$temp,table))
#warm has a higher average rating
Fit the model
# fit the model; temp is a fixed effect
summary(clmm(rating ~ temp + contact+(1|judge), data = DATA))
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ temp + contact + (1 | judge)
data: DATA
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -81.57 177.13 332(999) 1.03e-05 2.8e+01
Random effects:
Groups Name Variance Std.Dev.
judge (Intercept) 1.279 1.131
Number of groups: judge 9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 3.0630 0.5954 5.145 2.68e-07 ***
contactyes 1.8349 0.5125 3.580 0.000344 ***
Here we see warm attached to "temp", and, as we know, it has a positive coefficient because the rating is better in warm than in cold (the reference).
So if you set another group as the reference, you will see another name attached, and the coefficient is reversed (-3.063 compared to +3.063 in the previous example):
# we set warm as reference now
DATA$temp = relevel(DATA$temp,ref="warm")
summary(clmm(rating ~ temp + contact+(1|judge), data = DATA))
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ temp + contact + (1 | judge)
data: DATA
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 72 -81.57 177.13 269(810) 1.14e-04 1.8e+01
Random effects:
Groups Name Variance Std.Dev.
judge (Intercept) 1.28 1.131
Number of groups: judge 9
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempcold -3.0630 0.5954 -5.145 2.68e-07 ***
contactyes 1.8349 0.5125 3.580 0.000344 ***
So always check which level is the reference before you fit the model.

Significant interaction in linear mixed effect model but plot shows overlapping confidence intervals?

I try to show you as much as possible of the structure of the data and the results I produced.
The structure of the data is the following:
GroupID Person Factor2 Factor1 Rating
<int> <int> <fctr> <fctr> <int>
1 2 109 2 0 1
2 2 109 2 1 -2
3 2 104 1 0 4
4 2 236 1 1 1
5 2 279 1 1 2
6 2 179 2 1 0
Person is the participant ID, GroupID is the kind of stimulus rated, Factor 1 (levels 0 and 1) and Factor 2 (levels 1 and 2) are fixed factors and the Ratings are the outcome variables.
I am trying to print a plot for a significant interaction in a linear mixed effect model. I used the packages lme4 and lmerTest to analyze the data.
This is the model we ran:
> model_interaction <- lmer(Rating ~ Factor1 * Factor2 + ( 1 | Person) +
(1 | GroupID), data)
> model_interaction
Linear mixed model fit by REML ['merModLmerTest']
Formula: Rating ~ Factor1 * Factor2 + (1 | Person) + (1 | GroupID)
Data: data
REML criterion at convergence: 207223.9
Random effects:
Groups Name Std.Dev.
Person (Intercept) 1.036
GroupID (Intercept) 1.786
Residual 1.880
Number of obs: 50240, groups: Person, 157; GroupID, 80
Fixed Effects:
(Intercept) Factor11 Factor22 Factor11:Factor22
-0.43823 0.01313 0.08568 0.12440
When I use the summary() function R returns the following output
> summary(model_interaction)
Linear mixed model fit by REML
t-tests use Satterthwaite approximations to degrees of freedom
['lmerMod']
Formula: Rating ~ Factor1 * Factor2 + (1 | Person) + (1 | GroupID)
Data: data
REML criterion at convergence: 207223.9
Scaled residuals:
Min 1Q Median 3Q Max
-4.8476 -0.6546 -0.0213 0.6516 4.2284
Random effects:
Groups Name Variance Std.Dev.
Person (Intercept) 1.074 1.036
GroupID (Intercept) 3.191 1.786
Residual 3.533 1.880
Number of obs: 50240, groups: Person, 157; GroupID, 80
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -4.382e-01 2.185e-01 1.110e+02 -2.006 0.047336 *
Factor11 1.313e-02 2.332e-02 5.004e+04 0.563 0.573419
Factor22 8.568e-02 6.275e-02 9.793e+03 1.365 0.172138
Factor11:Factor22 1.244e-01 3.385e-02 5.002e+04 3.675 0.000238 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Fctr11 Fctr22
Factor11 -0.047
Factor22 -0.135 0.141
Fctr11:Fc22 0.034 -0.694 -0.249
I know it is not possible to interpret p-values for a linear mixed-effects model. So I ran an additional anova, comparing the interaction model to a model with just the main effects of Factor1 and Factor2:
> model_Factor1_Factor2 = lmer(Rating ~ Factor1 + Factor2 +
( 1 | Person) + (1 | GroupID), data)
> anova(model_Factor1_Factor2, model_interaction)
Data: data
Models:
object: Rating ~ Factor1 + Factor2 + (1 | Person) + (1 | GroupID)
..1: Rating ~ Factor1 * Factor2 + (1 | Person) + (1 | GroupID)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
object 6 207233 207286 -103611 207221
..1 7 207222 207283 -103604 207208 13.502 1 0.0002384 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I interpreted this Output as: the Interaction of Factor1 and Factor2 explains additional variance in my outcome measurement compared to the model with only the main effects of Factor1 and Factor2.
Since interpreting output for linear mixed-effects models is hard, I would like to print a graph showing the interaction of Factor1 and Factor2. I did so using the lsmeans package (first I used plot(allEffects), but after reading the question "How to get coefficients and their confidence intervals in mixed effects models?" I realized that was not the right way to produce graphs for linear mixed-effects models).
So this is what I did (following this website: http://rcompanion.org/handbook/G_06.html):
> leastsquare = lsmeans(model_interaction, pairwise ~ Factor2:Factor1,
adjust="bon")
> CLD = cld(leastsquare, alpha=0.05, Letters=letters, adjust="bon")
> CLD$.group=gsub(" ", "", CLD$.group)
> CLD
Factor2 Factor1 lsmean SE df lower.CL upper.CL .group
1 0 -0.4382331 0.2185106 111.05 -0.9930408 0.1165746 a
1 1 -0.4251015 0.2186664 111.36 -0.9803048 0.1301018 a
2 0 -0.3525561 0.2190264 112.09 -0.9086735 0.2035612 a
2 1 -0.2150234 0.2189592 111.95 -0.7709700 0.3409233 b
Degrees-of-freedom method: satterthwaite
Confidence level used: 0.95
Conf-level adjustment: bonferroni method for 4 estimates
P value adjustment: bonferroni method for 6 tests
significance level used: alpha = 0.05
This is the plotting function I used:
> ggplot(CLD, aes(`Factor1`, y = lsmean, ymax = upper.CL,
ymin = lower.CL, colour = `Factor2`, group = `Factor2`)) +
geom_pointrange(stat = "identity",
position = position_dodge(width = 0.1)) +
geom_line(position = position_dodge(width = 0.1))
The plot can be found using this link (I am not allowed to post images yet, please excuse the workaround)
Interaction of Factor1 and Factor2
Now my question is the following: why do I have a significant interaction, and a significant amount of variance explained by this interaction, but the confidence intervals in the plot overlap? I guess I did something wrong with the confidence intervals? Or is it just not possible to interpret the significance indices for linear mixed-effects models this way?
Because it’s apples and oranges.
Apples: confidence intervals for means.
Oranges: tests of differences of means.
Means, and differences of means, are different statistics, and they have different standard errors and other distributional properties. In mixed models especially, they can be radically different because some sources of variation may cancel out when you take differences.
Don’t try to use confidence intervals to do comparisons. It’s like trying to make chicken soup out of hamburger.
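If you want intervals that do correspond to the tests, ask for confidence intervals on the differences themselves, using the leastsquare object already computed in the question (a sketch; in current versions the lsmeans package has been superseded by emmeans, which works the same way here):

```r
# confidence intervals for the pairwise differences of means,
# rather than for the means themselves
confint(leastsquare$contrasts)
```

Non-overlapping CIs on these differences (i.e. intervals excluding 0) line up with the significant comparisons, unlike overlap between the CIs of the individual means.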
