I am trying to run a 2x2 mixed ANOVA with unequal sample sizes in R. The data were collected from 30 individuals under two conditions (i.e., 2 levels of the within-subjects factor), and participants were allocated to groups by a k-means clustering analysis (group 1: n = 11; group 2: n = 19). Here is a sample of my code and the output:
summary(aov(JH ~ Box+Group+Box:Group+Error(P/Box), data = d3))
Error: P
           Df Sum Sq Mean Sq F value Pr(>F)
Group       1  0.027 0.02715    1.56   0.22
Box:Group   1  0.001 0.00078    0.04   0.83
Residuals  27  0.470 0.01741

Error: P:Box
           Df   Sum Sq  Mean Sq F value Pr(>F)
Box         1 0.000000 1.00e-07    0.00   0.97
Group       1 0.000022 2.17e-05    0.24   0.63
Box:Group   1 0.000032 3.24e-05    0.35   0.56
Residuals  27 0.002488 9.21e-05
Unlike the output when I ran a 2x2 mixed ANOVA with equal sample sizes on another dataset (attached below), here I get an extra Box:Group interaction row under Error: P.
summary(aov(HipLoadingK ~ Box*Sex+Error(P/Box), data = TW))
Error: P
          Df  Sum Sq  Mean Sq F value  Pr(>F)
Sex        1 0.02578 0.025779   8.038 0.00841 **
Residuals 28 0.08980 0.003207
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: P:Box
          Df   Sum Sq  Mean Sq F value Pr(>F)
Box        1 0.003454 0.003454   4.271 0.0481 *
Box:Sex    1 0.000066 0.000066   0.082 0.7765
Residuals 28 0.022645 0.000809
Is this result fine as long as I read off the appropriate rows for the between-subjects main effect, the within-subjects main effect, and the interaction? Or should I edit my code to get the correct output? If so, please let me know what should be added or changed in my code.
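(As a cross-check, not from the original post: the afex package wraps aov() and applies Type III sums of squares by default, which is often preferred with unequal group sizes. A sketch, assuming the same column names as above and that afex is installed:)

```r
library(afex)
# aov_ez() takes the subject ID column, the DV, and the between/within
# factors; with unequal group sizes it uses Type III SS by default.
aov_ez(id = "P", dv = "JH", data = d3,
       between = "Group", within = "Box")
```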
When I converted the dataset from wide format to long format, I accidentally assigned the group labels in a different order for each participant. For instance, P01 was in group A for condition X, but his or her group was B for condition Y. After I fixed the code, it worked well.
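For anyone hitting the same wide-to-long mistake, here is a minimal sketch using base R's reshape(); the column names (JH_X, JH_Y) are hypothetical stand-ins for the poster's data:

```r
# Hypothetical wide data: one row per participant, one group label each
wide <- data.frame(P     = c("P01", "P02"),
                   Group = c("A", "B"),
                   JH_X  = c(1.2, 1.5),
                   JH_Y  = c(1.1, 1.4))

# reshape() carries Group over unchanged to every long-format row for a
# participant, so the group label cannot flip between conditions.
long <- reshape(wide, direction = "long",
                varying = c("JH_X", "JH_Y"), v.names = "JH",
                timevar = "Box", times = c("X", "Y"), idvar = "P")
```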
I am having trouble making sense of the error strata in the output of a repeated measures ANOVA in R, as they appear to be doing something funky. I have a repeated measures ANOVA where each participant gets a score for two different roles and a score for two different valences (Role and Valence are dichotomous categorical within-subjects factors), and I am including them in a model that has Gender as a between-subjects factor (also dichotomous categorical).
My model is as follows:
summary(aov(data = data,
            score ~ Role * Valence * Gender + Error(Subject_ID / (Role * Valence))))
The output looks unusual:
summary(aov(data = data,
+ score ~ Role * Valence * Gender + Error(Subject_ID / (Role*Valence))))
Error: Subject_ID
       Df  Sum Sq Mean Sq
Gender  1 0.06647 0.06647

Error: Valence
        Df Sum Sq Mean Sq
Valence  1  6.774   6.774

Error: Subject_ID:Role
     Df  Sum Sq Mean Sq
Role  1 0.04595 0.04595

Error: Subject_ID:Valence
               Df  Sum Sq Mean Sq
Valence:Gender  1 0.06981 0.06981

Error: Subject_ID:Role:Valence
             Df Sum Sq Mean Sq
Role:Valence  1  1.329   1.329

Error: Within
                     Df Sum Sq Mean Sq F value Pr(>F)
Role                  1   0.00  0.0000   0.000 0.9986
Gender                1   0.08  0.0781   0.382 0.5371
Role:Valence          1   0.65  0.6457   3.159 0.0767
Role:Gender           1   0.04  0.0354   0.173 0.6777
Valence:Gender        1   0.04  0.0443   0.217 0.6420
Role:Valence:Gender   1   0.24  0.2447   1.197 0.2749
Residuals           252  51.50  0.2044
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I don't understand why the F and p values are only displayed for the Within error stratum, why my main effects appear twice (e.g., Role under Within and under Subject_ID:Role), or why my between-subjects variable shows up in the Within stratum. I'm not sure how to begin troubleshooting this, so any insights would be greatly appreciated.
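One thing worth checking first (an assumption on my part, since the post doesn't show how the data were read in): aov() produces exactly this kind of scrambled strata list, with spurious strata like Error: Valence and effects leaking into Error: Within, when the variables inside Error() are numeric rather than factors. A sketch of that fix:

```r
# If Subject_ID (or Role/Valence) came in as numbers, Error() treats them as
# continuous covariates and the strata come out wrong. Convert to factors:
data$Subject_ID <- factor(data$Subject_ID)
data$Role       <- factor(data$Role)
data$Valence    <- factor(data$Valence)

summary(aov(score ~ Role * Valence * Gender +
              Error(Subject_ID / (Role * Valence)), data = data))
```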
I am running a manova and my code is something like this:
fit <- manova(y ~ grp + cov)
summary(fit)
So the output of summary is:
            Df   Pillai approx F num Df den Df    Pr(>F)
grp          2 0.185330   5.6511      6    332 1.322e-05 ***
age          1 0.110497   6.8323      3    165 0.0002284 ***
fd           1 0.153049   9.9388      3    165 4.646e-06 ***
scan         1 0.037374   2.1354      3    165 0.0977272 .
Residuals  167
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am having trouble finding a way to extract the p value for grp from that table (the Pr(>F) entry in the first row, 1.322e-05 in this case) so that I can write a conditional to run univariate and post hoc tests on significant models. I'm hoping for something like:
p <- [method of obtaining the p value (1.322e-05 in this case)]
if (p < 0.017) {
  for (i in 1:length(dependentvars)) {
    # univariate tests with aov()
    p.uni <- [same thing as before with obtaining a p value]
    if (p.uni < ...) {
      summary(glht(....
Any advice or direction is greatly appreciated.
Thank you
-J
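For what it's worth, a sketch of one way to get at that number: summary.manova() returns an object whose $stats component is the printed matrix, so the Pr(>F) entry can be indexed by row and column name. (The mtcars example below is a self-contained stand-in for the poster's data.)

```r
# Toy MANOVA so the sketch runs on its own
fit <- manova(cbind(mpg, disp, hp) ~ factor(cyl) + wt, data = mtcars)
s   <- summary(fit)            # Pillai's trace by default

s$stats                        # the matrix that summary() prints
p <- s$stats[1, "Pr(>F)"]      # p value in the first row (grp, in the question)

if (p < 0.017) {
  # ... run the univariate aov() / glht() follow-ups here ...
}
```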
What is the difference between the aov(depvar~timevar+Error(id)) and the aov(depvar~timevar+Error(id/timevar)) formula specifications? These two variants produce slightly different results.
The same question was once asked here: https://stats.stackexchange.com/questions/60108/how-to-write-the-error-term-in-repeated-measures-anova-in-r
However, I'd like to repeat it with a more appropriate example.
Here is an example that I created:
var   <- rep(NA, 180)
id    <- rep(1:20, each = 180/20)               # 20 subjects, 9 rows each
group <- rep(rep(1:2, each = 9), 180/(9*2))     # subjects alternate between groups
time1 <- rep(rep(1:3, each = 3), 180/(3*3))
time2 <- rep(c(8, 15, 20), 180/3)

var[group==1 & time1==1 & time2==8]  <- runif(10, 105, 115)
var[group==2 & time1==1 & time2==8]  <- runif(10, 105, 115)
var[group==1 & time1==1 & time2==15] <- runif(10, 95, 105)
var[group==2 & time1==1 & time2==15] <- runif(10, 95, 105)
var[group==1 & time1==1 & time2==20] <- runif(10, 85, 95)
var[group==2 & time1==1 & time2==20] <- runif(10, 85, 95)
var[group==1 & time1==2 & time2==8]  <- runif(10, 95, 105)
var[group==2 & time1==2 & time2==8]  <- runif(10, 95, 105)
var[group==1 & time1==2 & time2==15] <- runif(10, 85, 95)
var[group==2 & time1==2 & time2==15] <- runif(10, 75, 85)
var[group==1 & time1==2 & time2==20] <- runif(10, 75, 85)
var[group==2 & time1==2 & time2==20] <- runif(10, 65, 75)
var[group==1 & time1==3 & time2==8]  <- runif(10, 95, 105)
var[group==2 & time1==3 & time2==8]  <- runif(10, 95, 105)
var[group==1 & time1==3 & time2==15] <- runif(10, 85, 95)
var[group==2 & time1==3 & time2==15] <- runif(10, 75, 85)
var[group==1 & time1==3 & time2==20] <- runif(10, 75, 85)
var[group==2 & time1==3 & time2==20] <- runif(10, 65, 75)

df <- data.frame(id, var, group, time1, time2)
df$id    <- factor(df$id)
df$group <- factor(df$group)
df$time1 <- factor(df$time1)
df$time2 <- factor(df$time2)
Performing aov() on this gives slightly different results depending on the Error() term specification:
Just for one time term:
> summary(aov(var~time1+Error(id),data=df))

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19  958.4   50.44

Error: Within
           Df Sum Sq Mean Sq F value   Pr(>F)
time1       2   7538    3769   30.41 6.72e-12 ***
Residuals 158  19584     124

> summary(aov(var~time1+Error(id/time1),data=df))

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 19  958.4   50.44

Error: id:time1
          Df Sum Sq Mean Sq F value Pr(>F)
time1      2   7538    3769   211.5 <2e-16 ***
Residuals 38    677      18
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 120  18907   157.6
Or for both time terms (I omit the output here for the sake of space; you may check it on your own):
summary(aov(var~group*time1*time2+Error(id/(group*time1*time2)),data=df))
summary(aov(var~group*time1*time2+Error(id),data=df))
Why does it happen? Which variant is correct?
Here's a blog post that will help break down what each means, under its "Random Effects in Classical ANOVA" section.
From the blog, here's a summary of what "dividing" in the Error term means.
aov(Y ~ Error(A), data=d) # Lone random effect
aov(Y ~ B + Error(A/B), data=d) # A random, B fixed, B nested within A
aov(Y ~ (B*X) + Error(A/(B*X)), data=d) # B and X interact within levels of A
So, from your question,
aov(depvar~timevar+Error(id/timevar))
means you have a random effect for id, with timevar fixed and nested within the levels of id, versus
aov(depvar~timevar+Error(id))
which just takes id as a random effect, with no constraint on the other variables.
Source: http://conjugateprior.org/2013/01/formulae-in-r-anova/
This code walkthrough of analysis of variance, which includes some recommendations on learning ANOVA, might prove useful as well.
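As another cross-check (my addition, not from the blog): the two Error() forms correspond to the following lme4 mixed models, which can make the random-effects structure easier to see. Here depvar, timevar, and d are placeholders matching the question's notation, and lme4 is assumed to be installed:

```r
library(lme4)
# Error(id): one random intercept per subject
m1 <- lmer(depvar ~ timevar + (1 | id), data = d)
# Error(id/timevar): random intercepts for subjects AND for
# subject-by-timevar cells
m2 <- lmer(depvar ~ timevar + (1 | id) + (1 | id:timevar), data = d)
```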
The difference between aov(depvar~timevar+Error(id)) and aov(depvar~timevar+Error(id/timevar)) is whether or not you include timevar as a random effect.
Note that there's more than one way to include a variable as a random effect. You could also use aov(depvar~timevar+Error(id*timevar)) or aov(depvar~timevar+Error(id + timevar)) as well. Each of these means something quite different, but it can be confusing because they'll often give you similar results when applied to the same dataset, due to the constraints of the data themselves.
The slash / used in aov() denotes nesting. When you use /, R automatically expands it to the main effect of the bottom variable plus the interaction between the bottom and the top. For example, A/B automatically expands to A + A:B. This is similar to how A*B automatically expands to A + B + A:B, but with nesting, the variable in the nest never appears outside of its nest (i.e. there can be no main effect of B on its own).
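You can check this expansion directly with terms(), which reports the term labels a formula generates (a quick verification, not part of the original answer):

```r
attr(terms(~ A/B), "term.labels")
# "A"   "A:B"          -- no standalone B term
attr(terms(~ A*B), "term.labels")
# "A"   "B"   "A:B"
```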
You can see this expansion happening in your output:
> summary(aov(var~time1+Error(id / time1)))

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1  52.24   52.24

Error: id:time1
      Df Sum Sq Mean Sq
time1  1   4291    4291

Error: Within
           Df Sum Sq Mean Sq F value  Pr(>F)
time1       1   1239  1238.7   10.19 0.00167 **
Residuals 176  21399   121.6
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The Error terms denote random effects. Notice you get one for the main effect of id, because it's the base of the nest, and one for the interaction between id and time1, because time1 is nested within id. (You also get an Error: Within stratum, which is the basic residual term for the model, i.e. the random effect of the individual observations themselves.)
So what's the correct approach for your data?
It depends on 1) how your data are actually structured and 2) what model you intend to run. Note: There's no definitive test you can run on the data to determine the structure or the correct model; this is a thinking exercise rather than a computational one.
In the example models you provided, you have an outcome var, and then what appear to be grouping variables group and id, and then two time variables time1 and time2. Each id is only in 1 group, not across both groups, suggesting that id is nested within group.
> table(group, id)
     id
group 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
    1 9 0 9 0 9 0 9 0 9  0  9  0  9  0  9  0  9  0  9  0
    2 0 9 0 9 0 9 0 9 0  9  0  9  0  9  0  9  0  9  0  9
I'll assume that id refers to a single participant, and the 9 measurements on time1 and time2 are within-subjects tests on each participant (i.e. each participant was measured 9 times on var, so this is a repeated measures design).
To make it concrete, let's say var is a score on some problem solving task, and time1 and time2 are the minutes participants are allowed to study the problem and the amount of time they're given to complete the problem, respectively. Since time1 and time2 are crossed, each participant completes the task 9 times, under each combination of circumstances.
> table(time1, time2)
     time2
time1  8 15 20
    1 20 20 20
    2 20 20 20
    3 20 20 20

> table(time1, time2, id)
, , id = 1

     time2
time1 8 15 20
    1 1  1  1
    2 1  1  1
    3 1  1  1

, , id = 2

     time2
time1 8 15 20
    1 1  1  1
    2 1  1  1
    3 1  1  1

(output truncated)
Participants are tested in groups, with half of the participants in group 1 and the other half in group 2. Perhaps the study was run in classrooms, and group 1 is one class while group 2 is the second class. Probably, group identity is not actually a variable of interest, but we shouldn't leave it out of the model, because there may be some nuisance variance resulting from differences between the groups. For example, maybe the first classroom had better lighting, giving all of the members of group 1 a better chance of scoring well on the puzzles than the members of group 2.
The scores (i.e., the individual observations), ID, and Group should all be random effects, and time1 and time2 should be fixed effects. (Note this could vary for the same data if you had a different model in mind; e.g., you may want to consider group as fixed, depending on your research question.)
Given that model, this would be the most complete version of the model, using aov():
aov(var~time1*time2 + Error(group/id/(time1*time2)),data=df)
Here's the output:
> summary(aov(var~time1*time2 + Error(group/id/(time1*time2)),data=df))

Error: group
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1  771.7   771.7

Error: group:id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 18  243.8   13.55

Error: group:id:time1
          Df Sum Sq Mean Sq F value Pr(>F)
time1      2   7141    3571   181.6 <2e-16 ***
Residuals 38    747      20
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: group:id:time2
          Df Sum Sq Mean Sq F value Pr(>F)
time2      2  16353    8176   434.6 <2e-16 ***
Residuals 38    715      19
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: group:id:time1:time2
            Df Sum Sq Mean Sq F value  Pr(>F)
time1:time2  4  214.5   53.63   5.131 0.00103 **
Residuals   76  794.3   10.45
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Warning message:
In aov(var ~ time1 * time2 + Error(group/id/(time1 * time2)), data = df) :
  Error() model is singular
(Along with the links above, here is some additional guidance on random vs. fixed effects)
I'm quite new to R, but I recently tried to run a two-way repeated measures ANOVA to replicate the results my supervisor obtained in SPSS.
I've struggled for days and read dozens of articles to understand what is going on in R, but I still don't get the same results.
> library(car)
> mod <- lm(Y~A*B)
> Anova(mod, type="III")
Anova Table (Type III tests)

Response: Y
            Sum Sq  Df F value    Pr(>F)
(Intercept)  0.000   1  0.0000   1.00000
A            2.403   5  8.6516 4.991e-08 ***
B            0.403   2  3.6251   0.02702 *
A:B          1.220  10  2.1962   0.01615 *
Residuals   51.987 936
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My data are from a balanced design, and I used Type III SS since that is what SPSS uses as well. The sums of squares, the Df, and the linear model are the same as in SPSS, the only things that differ being the F and p values. Thus, it should not be a sum-of-squares mistake.
Results in SPSS are:
         F  Sig.
A    7.831  .000
B    2.681  .073
A:B  2.247  .014
I'm a little bit lost. Could it be a problem related to the contrasts?
Lucas
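For reference, a common source of exactly this mismatch (an assumption about this particular dataset, but worth ruling out first): car::Anova() Type III tests are only meaningful with sum-to-zero contrasts, while R defaults to treatment contrasts; SPSS effectively uses sum-to-zero coding. A sketch of the check:

```r
library(car)
# R's default contr.treatment makes Type III tests disagree with SPSS.
# Refit with sum-to-zero coding before calling Anova(..., type = "III"):
options(contrasts = c("contr.sum", "contr.poly"))
mod <- lm(Y ~ A * B)
Anova(mod, type = "III")
```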