I am trying to run MuMIn::dredge on linear mixed-effects models fitted with lme4::lmer, using categorical and continuous variables. The code is as follows:
# Selection of variables of interest
sig<-c("Age", "Sex", "BMI", "(1|HID)", "h_age", "h", "h_g", "smk_hs")
# Model formula
formula<-paste0("log10_PBA_N", "~", paste0(c(sig), collapse="+"))
# Global model
model<-lmer(formula, data=data)
# Dredging
DRG<-dredge(global.model=model)
The code runs fine (I guess), but in the results, I have this:
Global model call: lmer(formula = formula, data = data)
---
Model selection table
     (Int)      Age      BMI h h_age h_g Sex smk_hs df   logLik  AICc delta weight
2  -0.2363 -0.01421                                  4 -332.476 673.0  0.00  0.847
66 -0.2461 -0.01420                               +  5 -333.689 677.5  4.47  0.090
34 -0.2406 -0.01417                        +         5 -334.508 679.2  6.11  0.040
4  -0.3348 -0.01598 0.007096                         5 -335.935 682.0  8.96  0.010
18 -0.1553 -0.01421                    +             7 -334.310 682.9  9.84  0.006
98 -0.2493 -0.01416                        +      +  6 -335.723 683.6 10.60  0.004
68 -0.3463 -0.01599 0.007206                      +  6 -337.140 686.5 13.43  0.001
Can someone please explain to me, what does the "+" sign mean in the results?
I recently had the exact same question and struggled to find an answer. However, based on a response to a similar question asked on RStudio Community, I think the answer is simply that a '+' sign means that a given categorical (factor) term is included in that particular model. Numeric terms show their estimated coefficient instead; a factor can have several coefficients, so dredge only marks its presence. The '+' says nothing about whether the term is significant.
So, looking at your table, the first model includes the intercept and Age, the second includes Age and the smk_hs categorical variable, the third includes Age and the Sex variable, etc.
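To see the '+' notation on something small, here is a minimal sketch using an ordinary lm on mtcars rather than your lmer model (the data set and the choice of wt and am are my own assumptions, purely for illustration):

```r
library(MuMIn)

# dredge() requires that missing values cause an error instead of being dropped
options(na.action = "na.fail")

# Toy data: one numeric predictor (wt) and one factor (am)
d <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

global <- lm(mpg ~ wt + am, data = d)
ms <- dredge(global)

# Numeric terms show their coefficient; the factor shows "+" when it is in the model
print(ms)

# The actual coefficients of the top-ranked model, including any factor contrasts
coef(get.models(ms, subset = 1)[[1]])
```

If you need the estimates behind a '+', pulling the fitted model out with get.models() as above is one way to get them.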
We are performing a beta mixed-effects regression analysis using glmmTMB package, as shown below:
mod = glmmTMB::glmmTMB(data = data,
formula = rating ~ par1 + par2 + par3 +
(1|subject)+(1|item),
family = glmmTMB::beta_family())
Next, we would like to run a model comparison — something similar to the ‘step’ function that is used for ‘lm’ objects. So far, we found the function ‘dredge’ from the MuMIn package which computes the fit of the nested models according to a criterion (e.g. BIC):
MuMIn::dredge(mod, rank = 'BIC', evaluate = T)
OUTPUT:
Model selection table
  cnd((Int)) dsp((Int)) cnd(par1) cnd(par2) cnd(par3) df   logLik     BIC delta weight
2      1.341          +   -0.4466                      5 2648.524 -5258.3  0.00  0.950
6      1.341          +   -0.4466             0.03311  6 2648.913 -5251.3  6.97  0.029
4      1.341          +   -0.4468 -0.005058            6 2648.549 -5250.6  7.70  0.020
8      1.341          +   -0.4470 -0.011140   0.03798  7 2649.025 -5243.8 14.49  0.001
1      1.321          +                                4 2604.469 -5177.9 80.36  0.000
5      1.321          +                       0.03116  5 2604.856 -5171.0 87.34  0.000
3      1.321          +             -0.001771          5 2604.473 -5170.2 88.10  0.000
7      1.321          +             -0.007266   0.03434  6 2604.909 -5163.3 94.98  0.000
However, we would like to know whether the difference in fit between these nested models is statistically significant. For lms with a normally distributed dependent variable, we would use anova, but here we are not sure if it is applicable to models with beta distribution or glmmTMB object.
You could use the buildmer package to do stepwise regression with glmmTMB models (you should definitely read about critiques of stepwise regression as well). However, the short answer to your question is that the anova() method, which implements a likelihood ratio test, is implemented for pairwise comparison of glmmTMB fits of nested models, and the theory works just fine. Some of the more important assumptions are: (1) no model assumptions are violated [independence, choice of conditional distribution, linearity on the appropriate scale, normality of random effects, etc.]; (2) the models are nested, and are applied to the same data set; (3) the sample size is large enough that asymptotic methods are applicable.
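As a rough sketch of the pairwise likelihood ratio test described above: the data below are simulated (I don't have yours), with the variable names borrowed from the question and a single random intercept for subject to keep it short. The effect sizes and the precision parameter phi are arbitrary assumptions.

```r
library(glmmTMB)

set.seed(42)
# Simulated ratings in (0, 1): 20 subjects x 15 trials each
n_subj <- 20
n_rep <- 15
sim <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_rep)),
  par1 = rnorm(n_subj * n_rep),
  par2 = rnorm(n_subj * n_rep)   # no true effect in this simulation
)
subj_eff <- rnorm(n_subj, sd = 0.3)
mu <- plogis(0.3 - 0.5 * sim$par1 + subj_eff[sim$subject])
phi <- 20
sim$rating <- rbeta(nrow(sim), mu * phi, (1 - mu) * phi)

# Two nested models fitted to the same data
m_small <- glmmTMB(rating ~ par1 + (1 | subject),
                   data = sim, family = beta_family())
m_big <- glmmTMB(rating ~ par1 + par2 + (1 | subject),
                 data = sim, family = beta_family())

# Likelihood ratio test of the extra term (par2)
lrt <- anova(m_small, m_big)
print(lrt)
```

The p-value sits in the Pr(>Chisq) column of the second row. The same call works for any pair of nested glmmTMB fits, subject to the assumptions listed above.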
I am kind of new to R and am working on a glm model. I want to look at the interaction effect of BMI groups and patient groups (4 groups) on mortality (binary) in a subgroup analysis. I have the following code:
model <- glm(death~patient.group*bmi.group, data = data, family = "binomial")
summary(model)
and I get the following:
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.4798903 0.0361911 -96.153 < 2e-16 ***
patient.group2 0.0067614 0.0507124 0.133 0.894
patient.group3 0.0142658 0.0503444 0.283 0.777
patient.group4 0.0212416 0.0497523 0.427 0.669
bmi.group2 0.1009282 0.0478828 2.108 0.035 *
bmi.group3 0.2397047 0.0552043 4.342 1.41e-05 ***
patient.group2:bmi.group2 -0.0488768 0.0676473 -0.723 0.470
patient.group3:bmi.group2 -0.0461319 0.0672853 -0.686 0.493
patient.group4:bmi.group2 -0.1014986 0.0672675 -1.509 0.131
patient.group2:bmi.group3 -0.0806240 0.0791977 -1.018 0.309
patient.group3:bmi.group3 -0.0008951 0.0785683 -0.011 0.991
patient.group4:bmi.group3 -0.0546519 0.0795683 -0.687 0.492
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
So as displayed I will have a p-value for each of the patient.group:bmi.group. My question is, is there a way I can get a single p-value for patient.group:bmi.group instead of one for each subgroup? I have tried to look for answers online but I still could not find the answer :(
Many thanks in advance.
It depends on whether you regard your patient and BMI groups as factors or continuous covariates. If they are covariates, @jay.sf's suggestion is appropriate. It fits a single degree of freedom term for the interaction between the linear effect of patient group and the linear effect of BMI group.
But this depends on both the ordering and definition of the groups. It assumes, for example, that the "difference" between patient groups 1 and 2 is the same as that between patient groups 2 and 3 and so on. Is the ordering of patient groups such that, in some way, group 1 < group 2 < group 3 < group 4? Similarly for BMI. This model would also assume that a change of 1 unit on the patient scale was "the same" as a change of one unit on the BMI scale. I don't know if these are reasonable assumptions.
It would be more usual to consider both patient group and BMI group as factors. This assumes no ordering in groups, nor that the difference between any two groups was equal to that between any other two. In this case, jay.sf's suggestion would give a misleading answer.
To illustrate my point...
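First, generate some artificial data as you haven't provided any:
library(tidyverse)
data <- tibble() %>%
  expand(patient.group = 1:4, bmi.group = 1:3, rep = 1:5) %>%
  mutate(
    # linear predictor on the logit scale
    z = -0.25 * patient.group + 0.75 * bmi.group,
    # inverse logit turns z into a probability; draw one 0/1 outcome per row
    death = rbinom(n(), 1, exp(z) / (1 + exp(z)))
  ) %>%
  select(-z)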
Fit a simple continuous covariate model with interaction, as per jay.sf's suggestion:
covariateModel <- glm(death~patient.group * bmi.group, data = data, family = "binomial")
summary(covariateModel)
Giving, in part
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.6962 1.8207 -1.481 0.139
patient.group 0.7407 0.6472 1.144 0.252
bmi.group 1.2697 0.8340 1.523 0.128
patient.group:bmi.group -0.3807 0.2984 -1.276 0.202
Here, the p value for the patient.group:bmi.group interaction is a Wald test based on a single degree of freedom z test.
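For what it's worth, that Wald p-value can be reproduced by hand from the estimate and standard error in the table above (the two numbers are simply copied from the output):

```r
# Interaction row of the summary table: estimate and standard error
est <- -0.3807
se <- 0.2984

# Wald statistic and its two-sided p-value from the standard normal
z <- est / se
p <- 2 * pnorm(-abs(z))

round(c(z = z, p = p), 3)
```

This gives the same -1.276 and 0.202 that summary() prints for the interaction row.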
A slightly more complicated approach is necessary to fit the factor model with interaction and obtain a test for the "overall" interaction effect.
mainEffectModel <- glm(death~as.factor(patient.group) + as.factor(bmi.group), data = data, family = "binomial")
interactionModel <- glm(death~as.factor(patient.group) * as.factor(bmi.group), data = data, family = "binomial")
anova(mainEffectModel, interactionModel, test="Chisq")
Giving
Analysis of Deviance Table
Model 1: death ~ as.factor(patient.group) + as.factor(bmi.group)
Model 2: death ~ as.factor(patient.group) * as.factor(bmi.group)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 54 81.159
2 48 70.579 6 10.58 0.1023
Here, the change in deviance is a likelihood ratio test statistic, which is asymptotically distributed as chi-squared on (4-1) x (3-1) = 6 degrees of freedom.
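If it helps to see where the single p-value comes from, here is a small base-R sketch on freshly simulated data (so the numbers will not match the table above; the sample size and effect sizes are my own assumptions). It computes the test by hand from the two deviances and checks it against anova() and drop1():

```r
set.seed(1)
# Simulated data: 4 patient groups x 3 BMI groups, 50 patients per cell
dat <- expand.grid(patient.group = factor(1:4),
                   bmi.group = factor(1:3),
                   rep = 1:50)
dat$death <- rbinom(nrow(dat), 1, plogis(-2 + 0.3 * as.integer(dat$bmi.group)))

main_fit <- glm(death ~ patient.group + bmi.group, data = dat, family = binomial)
int_fit <- glm(death ~ patient.group * bmi.group, data = dat, family = binomial)

# By hand: drop in deviance, referred to chi-squared on the difference in df
dev_diff <- deviance(main_fit) - deviance(int_fit)
df_diff <- df.residual(main_fit) - df.residual(int_fit)
p_manual <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

# Same test via anova(), and via drop1() on the interaction model alone
p_anova <- anova(main_fit, int_fit, test = "Chisq")[2, "Pr(>Chi)"]
p_drop1 <- drop1(int_fit, test = "Chisq")["patient.group:bmi.group", "Pr(>Chi)"]

c(manual = p_manual, anova = p_anova, drop1 = p_drop1)
```

drop1() is convenient because it needs only the interaction model: it refits without the interaction term for you.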
The two approaches give similar answers using my particular dataset, but they may not always do so. Both are statistically correct, but which one is most appropriate depends on your particular situation. We don't have enough information to comment.
This excellent post provides more context.
I am looking to run a mixed effects model in R based on how I used to run the stats in SPSS with a repeated measures ANOVA. Here is how I set up the repeated measures ANOVA in SPSS. How would I convert this to lme4 in R?
Key:
EBT100... is the name of the task, Genotype is my IV, and my within-subject factors are Day (5 levels) and Cue (9 levels). Att is my DV.
In R, this is the code that I am trying to run:
lmeModel <- lmer(Att ~ Genotype*Day*Cue + (1|Subject), data = dat)
My Genotype Effect is the same between R and SPSS (p~0.12), but all of my interactions are different (Genotype x Day, Genotype x Cue, Genotype x Day x Cue).
R (lme4) Output:
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
Genotype 488 243.9 2 32 2.272 0.11954
Day 25922 6480.4 4 1408 60.356 < 2.2e-16 ***
Cue 35821 4477.6 8 1408 41.703 < 2.2e-16 ***
Genotype:Day 3646 455.7 8 1408 4.244 4.751e-05 ***
Genotype:Cue 736 46.0 16 1408 0.429 0.97560
Day:Cue 5063 158.2 32 1408 1.474 0.04352 *
Genotype:Day:Cue 3297 51.5 64 1408 0.480 0.99984
SPSS Repeated Measures ANOVA output:
F.value Pr(>F)
Genotype 2.272 0.120
Day 9.603 0.000
Cue 83.916 0.000
Genotype:Day 0.675 0.712
Genotype:Cue 0.863 0.613
Day:Cue 3.168 0.00
Genotype:Day:Cue 1.031 0.411
You can see that the main effect of Genotype is the same for both R and SPSS. Additionally, in R, my DenDF output is not correct either. Any idea as to why this would be?
Even more...
Using ezANOVA, with the same dataset that I am using for lme4, this is my code:
anova <- ezANOVA(data = dat,
wid = Subject,
dv = Att,
within = .(Day, Cue),
between = Genotype,
type = 3)
ezANOVA Output:
Effect DFn DFd F p p<.05 ges
2 Genotype 2 32 2.2715034 1.195449e-01 0.044348362
3 Day 4 128 9.6034152 8.003233e-07 * 0.103474748
5 Cue 8 256 83.9162989 3.938364e-67 * 0.137556761
4 Genotype:Day 8 128 0.6753544 7.124675e-01 0.015974029
6 Genotype:Cue 16 256 0.8624463 6.133218e-01 0.003267726
7 Day:Cue 32 1024 3.1679308 1.257738e-08 * 0.022046134
8 Genotype:Day:Cue 64 1024 1.0313631 4.115000e-01 0.014466102
How can I convert ezANOVA to lme4?
Any information would be greatly appreciated!
Thank you!
First off: It would be very beneficial and instructive if you could share your data, which allows for an easier comparison of lmer results with those from SPSS/ezANOVA.
Personally I prefer mixed effect (i.e. hierarchical) models as I find them easier to understand (and construct), so I am not that familiar with repeated measure ANOVA. Translating the latter into the former boils down to correctly translating within/between effects of your RM-ANOVA into the appropriate terms of your lmer mixed-effect model.
Provided I understood you correctly, the following seems consistent with your model problem statement:
Genotype is your fixed effect
Subject is your random (grouping or blocking) effect
Day is a within-Subject effect
Cue is a within-Subject effect
The corresponding lmer model should look something like this:
lmer(Obs ~ Genotype * Day * Cue + (Day:Cue|Subject))
If this is not tractable, you should try
lmer(Obs ~ Genotype * Day * Cue + (Day|Subject) + (Cue|Subject) + (1|Subject))
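As a cross-check on the within/between translation, here is a sketch in base R (not lme4) that spells out the error strata of the repeated measures design with aov(). The data are simulated and balanced, with 36 subjects instead of your 35, so the degrees of freedom will be close to but not equal to the ezANOVA output; all effect sizes are arbitrary assumptions.

```r
set.seed(123)
# Balanced toy design: 36 subjects (12 per genotype), 5 days, 9 cues
n_subj <- 36
design <- expand.grid(Subject = factor(seq_len(n_subj)),
                      Day = factor(1:5),
                      Cue = factor(1:9))
design$Genotype <- factor(c("A", "B", "C")[(as.integer(design$Subject) - 1) %% 3 + 1])
subj_eff <- rnorm(n_subj, sd = 5)
design$Att <- 50 + 2 * as.integer(design$Day) +
  subj_eff[design$Subject] + rnorm(nrow(design), sd = 10)

# Each within-subject term gets its own Subject-by-term error stratum
rm_fit <- aov(Att ~ Genotype * Day * Cue + Error(Subject / (Day * Cue)),
              data = design)
summary(rm_fit)

# Denominator df per stratum (dropping the intercept stratum)
sapply(rm_fit[-1], function(stratum) stratum$df.residual)
```

With N subjects in 3 genotype groups the denominators are N-3, 4(N-3), 8(N-3) and 32(N-3), which matches the 32 / 128 / 256 / 1024 pattern in the ezANOVA table. A model with only (1|Subject) has a single residual term, which is presumably why every within-subject effect in the lmer output is tested on 1408 denominator df. A random-effects structure often suggested as the lmer counterpart of these strata is (1|Subject) + (1|Subject:Day) + (1|Subject:Cue), but I have not checked that against your data.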
I have been trying to work with options available within R (i.e. MICE) to do binary logistic regression analyses (with interaction between continuous and categorical predictors).
However, I am struggling to carry out this simple analysis on multiply imputed data (details and reproducible example here).
Specifically, I have not been able to figure out a way to pool every aspect of the output, including an equivalent of the 'log likelihood ratio', using the glm function with mice.
To avoid redundancy with a previous post, I am seeking ANY suggestions for R packages or other software that may make it easy/possible to pool all essential components of the output for binary logistic regression (i.e. equivalent of model likelihood ratio test, regression coefficients, Wald test). See below for an example that I was able to obtain using rms on non-imputed data (I could not figure out a way to run this on multiply imputed data):
> mylogit
Frequencies of Missing Values Due to Each Variable
P1 ST P8
18 0 31
Logistic Regression Model
lrm(formula = P1 ~ ST + P8 + ST * P8, data = PS, x = TRUE,
y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 362 LR chi2 18.34 R2 0.077 C 0.652
0 287 d.f. 9 g 0.664 Dxy 0.304
1 75 Pr(> chi2) 0.0314 gr 1.943 gamma 0.311
max |deriv| 8e-08 gp 0.099 tau-a 0.100 Brier 0.155
Coef S.E. Wald Z Pr(>|Z|)
Intercept -0.5509 0.3388 -1.63 0.1040
ST= 2 -0.5688 0.4568 -1.25 0.2131
ST= 3 -0.7654 0.4310 -1.78 0.0757
ST= 4 -0.7995 0.5229 -1.53 0.1263
ST= 5 -1.2813 0.4276 -3.00 0.0027
P8 0.2162 0.4189 0.52 0.6058
ST= 2 * P8 -0.1527 0.5128 -0.30 0.7659
ST= 3 * P8 -0.0461 0.5130 -0.09 0.9285
ST= 4 * P8 -0.5031 0.5635 -0.89 0.3719
ST= 5 * P8 0.3661 0.4734 0.77 0.4393
In sum, my question is: 1) which package/software is capable of handling multiply imputed data to complete a traditional binary logistic regression analysis, esp. with an interaction term, and 2) what steps I need to take to run the analysis in that program.
The rms package has great features for combining multiply imputed data using the fit.mult.impute() function. Here is a small working example:
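library(Hmisc)  # aregImpute(), fit.mult.impute()
library(rms)    # lrm()
dat <- mtcars
## introduce NAs
dat[sample(rownames(dat), 10), "cyl"] <- NA
## multiply impute the incomplete predictor
im <- aregImpute(~ cyl + wt + mpg + am, data = dat)
## fit the logistic model on each completed data set and pool the results
fit.mult.impute(am ~ cyl + wt + mpg, xtrans = im, data = dat, fitter = lrm)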
I'm trying to run a rmANOVA and a corresponding regression model. In the experiment participants were completing a questionnaire which was evaluating how much of a trait X they have (score). Then they were performing a task, in which each participant was exposed to three conditions (COND - nSCM, SCM, SC). Their brain responses were measured (ERP).
This is what it looks like:
> head(df)
code SEX AGE SCORE COND ERP
1 AA1407 male 29 14 nSCM -3.0348373
2 AN0312 male 26 13 nSCM -1.8799240
3 BR1410 male 23 30 nSCM 0.4284033
4 EZ2404 male 23 23 nSCM -0.7615117
5 HA1012 female 27 22 nSCM -2.9301698
6 HS3004 male 30 16 nSCM -0.5468492
Since I am a bit confused about how to use different types of variables in R, maybe someone could also reassure me about the following:
> sapply(df,class)
code SEX AGE SCORE COND ERP
"factor" "factor" "numeric" "numeric" "factor" "numeric"
Based on the experimental design, the ANOVA design has one between-subject IV: SCORE, one within-subject IV: COND and the DV is ERP (right?).
This is the model I used and the summary:
> anERP <- aov(ERP ~ COND*SCORE, data = df)
> summary(anERP)
Df Sum Sq Mean Sq F value Pr(>F)
COND 2 0.21 0.105 0.027 0.9736
SCORE 1 16.87 16.868 4.297 0.0419 *
COND:SCORE 2 0.58 0.289 0.074 0.9291
Residuals 69 270.85 3.925
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So, IF this is right (please let me know if anything doesn't seem right), I should also find an effect for SCORE when I build a regression model, right? Also, I'm not sure how to interpret this effect, since AQ is an interval variable (scores in range 6-35). I would appreciate a little help here.
Now I'm very confused about what this model should look like for regression. I started with a simple lm model with SCORE and COND as fixed effects:
> lmERP <- lm(ERP ~ SCORE*COND, data = df)
> summary(lmERP)
Call:
lm(formula = ERP ~ SCORE * COND, data = df)
Residuals:
Min 1Q Median 3Q Max
-5.2554 -1.0916 0.1975 1.4582 3.3097
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.04269 1.06193 -2.865 0.00552 **
SCORE 0.06458 0.05229 1.235 0.22108
CONDSCM -0.08141 1.50180 -0.054 0.95693
CONDnSCM 0.36646 1.50180 0.244 0.80795
SCORE:CONDSCM 0.01111 0.07396 0.150 0.88104
SCORE:CONDnSCM -0.01707 0.07396 -0.231 0.81814
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.981 on 69 degrees of freedom
Multiple R-squared: 0.0612, Adjusted R-squared: -0.006827
F-statistic: 0.8997 on 5 and 69 DF, p-value: 0.4864
However, here the main effect of SCORE doesn't reach significance. How is it possible? Shouldn't rmANOVA and regression show roughly similar results (or at least the main effects)?
I guess I'm not applying the right linear model here, since it doesn't seem to recognise there are both within and between subject factors in the design.
I have read hundreds of webpages, tutorials and forums and I'm still completely confused about these models. Thank you in advance for any piece of advice!
Repeated-measures or mixed-model designs can be very confusing to specify using R's base aov function. In the code you have written, for example, aov will treat all the specified factors as independent (i.e., between-subject). I highly recommend using a library that makes it easier to specify these types of designs.
The ez library contains ezANOVA, which makes these tests simple to perform, provided that all your cases are complete (all factors are fully crossed, with no missing data). Assuming that your code column uniquely identifies each subject and you wanted to include all factors from your data set, the test would look something like this:
my.aov <- ezANOVA(data = df, dv = ERP, wid = code, between = .(SEX, AGE, SCORE), within = COND)
It is also possible to implement these designs with the lme4 package (ezANOVA itself is built on aov and car's Anova rather than on lme4; the function in ez that wraps lme4 is ezMixed). While lme4 allows for more flexible model specifications and can tolerate incomplete data, its syntax is more difficult. Bodo Winter's tutorial on lme4 is a good start, if you want to go really deep.
As an aside, there is usually little point in performing both an ANOVA and a linear regression. Unless the two tests are specified in a way that treats the factors differently, the results will be equivalent.
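To make the equivalence concrete, and to show where the apparent disagreement in the question comes from, here is a base-R sketch on simulated data shaped like yours (25 subjects x 3 conditions; the effect sizes are my own assumptions, so the numbers will differ from your output):

```r
set.seed(7)
n_subj <- 25
df_sim <- data.frame(
  code = factor(rep(seq_len(n_subj), times = 3)),
  SCORE = rep(sample(6:35, n_subj, replace = TRUE), times = 3),
  COND = factor(rep(c("SC", "SCM", "nSCM"), each = n_subj),
                levels = c("SC", "SCM", "nSCM"))
)
subj_eff <- rnorm(n_subj, sd = 1.5)
df_sim$ERP <- -3 + 0.06 * df_sim$SCORE + subj_eff[df_sim$code] +
  rnorm(nrow(df_sim))

# Same formula, fitted with aov() and with lm()
aov_fit <- aov(ERP ~ COND * SCORE, data = df_sim)
lm_fit <- lm(ERP ~ COND * SCORE, data = df_sim)

summary(aov_fit)   # sequential F tests, one row per term
anova(lm_fit)      # identical F tests from the lm fit
summary(lm_fit)    # t tests per coefficient, which answer a different question

# Repeated measures version: COND varies within subject (code)
rm_fit <- aov(ERP ~ COND * SCORE + Error(code / COND), data = df_sim)
summary(rm_fit)
```

The aov table and anova(lm_fit) agree exactly. The SCORE line in summary(lm_fit) is not the same test: with the interaction in the model it is the slope of SCORE in the reference condition only (SC here), which is why it can miss significance while the SCORE row of the ANOVA table does not. Neither of the first two fits knows about the repeated measures; the Error(code / COND) version is one way to tell aov that COND is within-subject.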