I am looking to run a mixed effects model in R based on how I used to run the stats in SPSS with a repeated measures ANOVA. Here is how I set up the repeated measures ANOVA in SPSS. How would I convert this to lme4 in R?
Key:
EBT100... is the name of the task, Genotype is my IV, and my within-subject factors are Day (5 levels) and Cue (9 levels). Att is my DV.
In R, this is the code that I am trying to run:
In R, here is my code:
lmeModel <- lmer(Att ~ Genotype*Day*Cue + (1|Subject)
My Genotype Effect is the same between R and SPSS (p~0.12), but all of my interactions are different (Genotype x Day, Genotype x Cue, Genotype x Day x Cue).
R (lme4) Output:
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
Genotype 488 243.9 2 32 2.272 0.11954
Day 25922 6480.4 4 1408 60.356 < 2.2e-16 ***
Cue 35821 4477.6 8 1408 41.703 < 2.2e-16 ***
Genotype:Day 3646 455.7 8 1408 4.244 4.751e-05 ***
Genotype:Cue 736 46.0 16 1408 0.429 0.97560
Day:Cue 5063 158.2 32 1408 1.474 0.04352 *
Genotype:Day:Cue 3297 51.5 64 1408 0.480 0.99984
SPSS Repeated Measures ANOVA output:
F.value Pr(>F)
Genotype 2.272 0.120
Day 9.603 0.000
Cue 83.916 0.000
Genotype:Day 0.675 0.712
Genotype:Cue 0.863 0.613
Day:Cue 3.168 0.00
Genotype:Day:Cue 1.031 0.411
You can see that the main effect of Genotype is the same for both R and SPSS. Additionally, in R, my DenDF output is not correct either. Any idea as to why this would be?
Even more...
Using ezANOVA, with the same dataset that I am using for lme4, this is my code:
anova <- ezANOVA(data = dat,
wid = Subject,
dv = Att,
within = .(Day, Cue),
between = Genotype,
type = 3)
ezANOVA Output:
Effect DFn DFd F p p<.05 ges
2 Genotype 2 32 2.2715034 1.195449e-01 0.044348362
3 Day 4 128 9.6034152 8.003233e-07 * 0.103474748
5 Cue 8 256 83.9162989 3.938364e-67 * 0.137556761
4 Genotype:Day 8 128 0.6753544 7.124675e-01 0.015974029
6 Genotype:Cue 16 256 0.8624463 6.133218e-01 0.003267726
7 Day:Cue 32 1024 3.1679308 1.257738e-08 * 0.022046134
8 Genotype:Day:Cue 64 1024 1.0313631 4.115000e-01 0.014466102
How can I convert ezANOVA to lme4?
Any information would be greatly appreciated!
Thank you!
First off: It would be very beneficial and instructive if you could share your data, which allows for an easier comparison of lmer results with those from SPSS/ezANOVA.
Personally I prefer mixed effect (i.e. hierarchical) models as I find them easier to understand (and construct), so I am not that familiar with repeated measure ANOVA. Translating the latter into the former boils down to correctly translating within/between effects of your RM-ANOVA into the appropriate terms of your lmer mixed-effect model.
Provided I understood you correctly, the following seems consistent with your model problem statement:
Genotype is your fixed effect
Subject is your random (grouping or blocking) effect
Day is a within-Subject effect
Cue is a within-Subject effect
The corresponding lmer model should look something like this:
lmer(Obs ~ Genotype * Day * Cue + (Day:Cue|Subject)
If this is not tractable, you should try
lmer(Obs ~ Genotype * Day * Cue + (Day|Subject) + (Cue|Subject) + (1|Subject)
Related
I am trying to MuMIn::dredge linear mixed-effect models lme4::lmer with categorical/continuous variables, the code is as follows:
# Selection of variables of interest
sig<-c("Age", "Sex", "BMI", "(1|HID)", "h_age", "h", "h_g", "smk_hs")
# Model formula
formula<-paste0("log10_PBA_N", "~", paste0(c(sig), collapse="+"))
# Global model
model<-lmer(formula, data=data)
# Dredging
DRG<-dredge(global.model=model)
The code runs fine (I guess), but in the results, I have this:
Global model call: lmer(formula = formula, data = data)
---
Model selection table
(Int) Age BMI h h_age h_g Sex smk_hs df logLik AICc delta weight
2 -0.2363 -0.01421 4 -332.476 673.0 0.00 0.847
66 -0.2461 -0.01420 + 5 -333.689 677.5 4.47 0.090
34 -0.2406 -0.01417 + 5 -334.508 679.2 6.11 0.040
4 -0.3348 -0.01598 0.007096 5 -335.935 682.0 8.96 0.010
18 -0.1553 -0.01421 + 7 -334.310 682.9 9.84 0.006
98 -0.2493 -0.01416 + + 6 -335.723 683.6 10.60 0.004
68 -0.3463 -0.01599 0.007206 + 6 -337.140 686.5 13.43 0.001
Can someone please explain to me, what does the "+" sign mean in the results?
I recently had the exact same question and was struggling to find an answer. However, based on a response to a similar question asked on R Studio Community, I think the answer is simply that a '+' sign means that a given categorical variable term is included as significant in that particular model.
So, looking at your table, the first model only includes the intercept, the second includes the intercept and the smk_hs categorical variable, the third includes the intercept and the Sex variable, etc.
I have a very simple stat question probably.
So, I am fitting linear mixed models like this:
lme(dependent ~ Group + Sex + Age + npgs, data=boookclub, random = ~ 1| subject)
Group is a factor variable with levels = 0, 1 , 2 , 3
The dependent are continuous variables standardized (mean 0) and the others are covariates with sex being factor, with Male/Female levels, Age being numerical, and npgs being numerical continuous standardized as well.
When I get the table with beta, standard error, t and p values, I get this:
Value Std.Error DF t-value p-value
(Intercept) -0.04550502 0.02933385 187 -1.551280 0.0025
Group1 0.04219801 0.03536929 181 1.193069 0.2344
Group2 0.03350827 0.03705896 181 0.904188 0.3671
Group3 0.00192119 0.03012654 181 0.063771 0.9492
SexMale 0.03866387 0.05012901 181 0.771287 0.4415
Age -0.00011675 0.00148684 181 -0.078520 0.9375
npgs 0.15308844 0.01637163 181 9.350835 0.0000
SexMale:Age 0.00492966 0.00276117 181 1.785352 0.0759
My problem is: how do I get the beta of Group0? In this case the intercept is Group0 but also the average of npgs, being npgs standardized. How do I get the Beta of Group0? And how can I check if Group0 is significantly associated to the dependent? I'd like to see the effect of all Group levels.
Thanks
The easiest way to do what you want may be with the emmeans package, but you may also have some conceptual issues. Technical details first, then conceptual:
Technical
Fitting an example (this isn't necessarily statistically sensible, but I wanted an example with a categorical fixed effect)
library(nlme)
m1 <- lme(Yield~Variety, random = ~1|Block, data=Alfalfa)
As with your example, the effects are "intercept" (= mean of the baseline group, which is the "Cossack" variety in this case [by default, the alphabetically-first group]), "Ladak" (difference between Ladak and Cossack means) and "Ranger" (similarly). (As #Ben hints in the comments above, R automatically generates dummies for [most of] the levels of the categorical variables [factors] in your model.)
coef(summary(m1))
## Value Std.Error DF t-value p-value
## (Intercept) 1.57166667 0.11665326 64 13.4729767 2.373343e-20
## VarietyLadak 0.09458333 0.07900687 64 1.1971532 2.356624e-01
## VarietyRanger -0.01916667 0.07900687 64 -0.2425949 8.090950e-01
The emmeans package is a convenient way to see predicted values for each group without recoding.
library(emmeans)
emmeans(m1, spec = ~Variety)
## Variety emmean SE df lower.CL upper.CL
## Cossack 1.57 0.117 5 1.27 1.87
## Ladak 1.67 0.117 5 1.37 1.97
## Ranger 1.55 0.117 5 1.25 1.85
Conceptual
You can't "check if Group0 is significantly associated with the dependent [response] variable". You can only check whether the response variables differs significantly between two groups, or whether it differs significantly among all groups (e.g. the results of anova()). You have to pick a baseline. (If you insist, you can test all pairwise comparisons among groups; emmeans can help with this too.) If you "remove the intercept" (by fitting Variety ~Yield-1, or by looking at the results that emmeans produces) then the difference you are quantifying is the difference between the mean of a particular group and zero. This is usually not a meaningful question; in the example here, for instance, this would be testing whether a wheat variety gave a yield that was significantly greater than zero — probably not very interesting.
On the other hand, if you are just interested in estimating the expected value in each group (conditioning on the baseline values of the other variables in the model), along with the standard errors/CIs, then the answers you get from emmeans are perfectly sensible.
There's a related question here that explains why you get an NA value if you manually create dummies for every level of your factor ...
I have been trying to work with options available within R (i.e. MICE) to do binary logistic regression analyses (with interaction between continuous and categorical predictors).
However, I am struggling to carry out this simple analysis on multiply imputed data (details and reproducible example here).
Specifically, I have not been able to figure out a way to pool every aspect of the output including an equivalence of 'log likelihood ratio' using the GLM function of Mice.
To avoid redundancy from a previous post, I am seeking ANY suggestions for R packages or other softwares that may make it easy/possible to pool all essential components of the output for binary logistic regression (i.e. equivalent of model likelihood ratio test, regression coefficients, wald test). See below for an example that I was able to obtain using rms on a non-imputed data (could not figure out a way to run this on multiply imputed data)
> mylogit
Frequencies of Missing Values Due to Each Variable
P1 ST P8
18 0 31
Logistic Regression Model
lrm(formula = P1 ~ ST + P8 + ST * P8, data = PS, x = TRUE,
y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 362 LR chi2 18.34 R2 0.077 C 0.652
0 287 d.f. 9 g 0.664 Dxy 0.304
1 75 Pr(> chi2) 0.0314 gr 1.943 gamma 0.311
max |deriv| 8e-08 gp 0.099 tau-a 0.100 Brier 0.155
Coef S.E. Wald Z Pr(>|Z|)
Intercept -0.5509 0.3388 -1.63 0.1040
ST= 2 -0.5688 0.4568 -1.25 0.2131
ST= 3 -0.7654 0.4310 -1.78 0.0757
ST= 4 -0.7995 0.5229 -1.53 0.1263
ST= 5 -1.2813 0.4276 -3.00 0.0027
P8 0.2162 0.4189 0.52 0.6058
ST= 2 * P8 -0.1527 0.5128 -0.30 0.7659
ST= 3 * P8 -0.0461 0.5130 -0.09 0.9285
ST= 4 * P8 -0.5031 0.5635 -0.89 0.3719
ST= 5 * P8 0.3661 0.4734 0.77 0.4393
In sum, my question is: 1) package/software that is capable of handling multiply imputed data to complete a traditional binary logistic regression analysis, esp with interaction term 2) possible steps I need to take to do run the analysis in that program
The rms package has great features for combining multiply imputed data using the fit.mult.impute() function. Here is a small working example:
dat <- mtcars
## introduce NAs
dat[sample(rownames(dat), 10), "cyl"] <- NA
im <- aregImpute(~ cyl + wt + mpg + am, data = dat)
fit.mult.impute(am ~ cyl + wt + mpg, xtrans = im, data = dat, fitter = lrm)
I'm trying to run a rmANOVA and a corresponding regression model. In the experiment participants were completing a questionnaire which was evaluating how much of a trait X they have (score). Then they were performing a task, in which each participant was exposed to three conditions (COND - nSCM, SCM, SC). Their brain responses were measured (ERP).
This is how it looks like:
> head(df)
code SEX AGE SCORE COND ERP
1 AA1407 male 29 14 nSCM -3.0348373
2 AN0312 male 26 13 nSCM -1.8799240
3 BR1410 male 23 30 nSCM 0.4284033
4 EZ2404 male 23 23 nSCM -0.7615117
5 HA1012 female 27 22 nSCM -2.9301698
6 HS3004 male 30 16 nSCM -0.5468492
Since I am a bit confused about how to use different types of variables in R, maybe someone could also reassure me about the following:
> sapply(df,class)
code SEX AGE SCORE COND ERP
"factor" "factor" "numeric" "numeric" "factor" "numeric"
Based on the experimental design, the ANOVA design has one between-subject IV: SCORE, one within-subject IV: COND and the DV is ERP (right?).
This is the model I used and the summary:
> anERP <- aov(ERP ~ COND*SCORE, data = df)
> summary(anERP)
Df Sum Sq Mean Sq F value Pr(>F)
COND 2 0.21 0.105 0.027 0.9736
SCORE 1 16.87 16.868 4.297 0.0419 *
COND:SCORE 2 0.58 0.289 0.074 0.9291
Residuals 69 270.85 3.925
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So, IF this is right (please let me know if anything doesn't seem right), I should also find an effect for SCORE when I build a regression model, right? Also, I'm not sure how to interpret this effect, since AQ is an interval variable (scores in range 6-35). I would appreciate a little help here.
Now I'm very confused about how this model should look like for regression. I started with simple lm model with SCORE and COND as fixed effects:
> lmERP <- lm(ERP ~ SCORE*COND, data = df)
> summary(lmERP)
Call:
lm(formula = ERP ~ SCORE * COND, data = df)
Residuals:
Min 1Q Median 3Q Max
-5.2554 -1.0916 0.1975 1.4582 3.3097
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.04269 1.06193 -2.865 0.00552 **
SCORE 0.06458 0.05229 1.235 0.22108
CONDSCM -0.08141 1.50180 -0.054 0.95693
CONDnSCM 0.36646 1.50180 0.244 0.80795
SCORE:CONDSCM 0.01111 0.07396 0.150 0.88104
SCORE:CONDnSCM -0.01707 0.07396 -0.231 0.81814
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.981 on 69 degrees of freedom
Multiple R-squared: 0.0612, Adjusted R-squared: -0.006827
F-statistic: 0.8997 on 5 and 69 DF, p-value: 0.4864
However, here the main effect of SCORE doesn't reach significance. How is it possible? Shouldn't rmANOVA and regression show roughly similar results (or at least the main effects)?
I guess I'm not applying the right linear model here, since it doesn't seem to recognise there are both within and between subject factors in the design.
I have read hundreds of webpages, tutorials and forums and I'm still completely confused about these models. Thank you in advance for any piece of advice!
Repeated-measures or mixed-model designs can be very confusing to specify using R's base aov function. In the code you have written, for example, aov will treat all the specified factors as independent (i.e., between-subject). I highly recommend using a library that makes it easier to specify these types of designs.
The ez library contains ezANOVA, which makes these tests simple to perform, provided that all your cases are complete (all factors are fully crossed, with no missing data). Assuming that your CODE column uniquely identifies each subject and you wanted to include all factors from your data set, the test would look something like this:
my.aov <- ezANOVA(data = df, dv = ERP, wid = CODE, between = .(SEX, AGE, SCORE), within = COND).
It is also possible to implement these designs with the lme4 package (in fact, ezANOVA is a wrapper around lme4's functions). While lme4 allows for more flexible model specifications and can tolerate incomplete data, its syntax is more difficult. Bodo Winter's tutorial on lme4 is a good start, if you want to go really deep.
As an aside, there is usually little point in performing both an ANOVA and a linear regression. Unless the two tests are specified in a way that treats the factors differently, the results will be equivalent.
I perform following ezANOVA:
RMANOVAGHB1 <- ezANOVA(GHB1, dv=DIF.SCORE.STARTLE, wid=RAT.ID, within=TRIAL.TYPE, between=GROUP, detailed = TRUE, return_aov = TRUE)
My dataset looks like this:
RAT.ID DIF.SCORE.STARTLE GROUP TRIAL.TYPE
1 1 170.73 SAL TONO
2 1 80.07 SAL NOAL
3 2 456.40 PROP TONO
4 2 290.40 PROP NOAL
5 3 507.20 SAL TONO
6 3 261.60 SAL NOAL
7 4 208.67 PROP TONO
8 4 137.60 PROP NOAL
9 5 500.50 SAL TONO
10 5 445.73 SAL NOAL
up until rat.id 16.
My supervisors don't work with R, so they can't help me. I need code that will give me all post hoc contrasts, but looking it up only confuses me more and more.
I already tried to do TukeyHSD on the aov output of ezANOVA and tried pairwise.t.test next (as I found out bonferroni is a more appropriate correction in this case), but none seem to work. I've also found things about using a linear model and then multcomp, but I don't know if that would be a good solution in this case. I feel like the problem with everything I tried is either that I have between and within variables or that my dataset is not set up right. Another complicating factor is that I'm just a beginner with R and my statistical knowledge is still pretty basic as this is one of my first practical experiences with doing analyses.
If it's important, this is the output of the anova:
$ANOVA
Effect DFn DFd SSn SSd F p p<.05 ges
1 (Intercept) 1 14 1233568.9 1076460.9 16.043280 0.001302172 * 0.508451750
2 GROUP 1 14 212967.9 1076460.9 2.769771 0.118272657 0.151521743
3 TRIAL.TYPE 1 14 137480.6 116097.9 16.578499 0.001143728 * 0.103365833
4 GROUP:TRIAL.TYPE 1 14 11007.2 116097.9 1.327335 0.268574391 0.009145489
$aov
Call:
aov(formula = formula(aov_formula), data = data)
Grand Mean: 196.3391
Stratum 1: RAT.ID
Terms:
GROUP Residuals
Sum of Squares 212967.9 1076460.9
Deg. of Freedom 1 14
Residual standard error: 277.2906
1 out of 2 effects not estimable
Estimated effects are balanced
Stratum 2: RAT.ID:TRIAL.TYPE
Terms:
TRIAL.TYPE GROUP:TRIAL.TYPE Residuals
Sum of Squares 137480.6 11007.2 116097.9
Deg. of Freedom 1 1 14
Residual standard error: 91.0643
Estimated effects may be unbalanced
My solution, considering your dataset - first 5 rats:
1. Let's build the linear model:
model.lm = lm(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
2. Let's chceck the homogeneity of variance (leveneTest) and distribution of our model (Shapiro-Wilk). We are looking for normal distribution and our variance should be homogenic. Two tests for this:
>shapiro.test(resid(model.lm))
Shapiro-Wilk normality test
data: resid(model.lm)
W = 0.91783, p-value = 0.3392
> leveneTest(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 0.066 0.976
6
Our p-values are higher than 0.05 in both cases so we don't have proof that our variance differs between groups. In case of normality test we can also conclude that the sample doesn't deviate from normality. Summarizing we can use parametrical tests such as ANOVA or pairwise t-test.
3.Yo can also run:
hist(resid(model.lm))
To check how does distribution of our data look like. And check the model:
plot(model.lm)
Here: https://stats.stackexchange.com/questions/58141/interpreting-plot-lm/65864 you'll find interpretation of plots produced by this function. As I saw, data looks fine.
4.Now finally we can do ANOVA test and post hoc HSD test:
> anova(model.lm)
Analysis of Variance Table
Response: DIF_SCORE_STARTLE
Df Sum Sq Mean Sq F value Pr(>F)
GROUP 1 7095 7095 0.2323 0.6469
TRIAL_TYPE 1 39451 39451 1.2920 0.2990
GROUP:TRIAL_TYPE 1 84 84 0.0027 0.9600
Residuals 6 183215 30536
> (result.hsd = HSD.test(model.lm, list('GROUP', 'TRIAL_TYPE')))
$statistics
Mean CV MSerror HSD r.harmonic
305.89 57.12684 30535.91 552.2118 2.4
$parameters
Df ntr StudentizedRange alpha test name.t
6 4 4.895599 0.05 Tukey GROUP:TRIAL_TYPE
$means
DIF_SCORE_STARTLE std r Min Max
PROP:NOAL 214.0000 108.0459 2 137.60 290.40
PROP:TONO 332.5350 175.1716 2 208.67 456.40
SAL:NOAL 262.4667 182.8315 3 80.07 445.73
SAL:TONO 392.8100 192.3561 3 170.73 507.20
$comparison
NULL
$groups
trt means M
1 SAL:TONO 392.8100 a
2 PROP:TONO 332.5350 a
3 SAL:NOAL 262.4667 a
4 PROP:NOAL 214.0000 a
As you see, our 'pairs' have been grouped in one big group a that means that there are not significant difference between them. However there's some difference between NOAL and TONO no matter of SAL and PROP.