Two-way repeated measures ANOVA does not show sphericity test - R

There were 4 levels of treatment over 11 sampling time points. Each treatment had 3 identical systems, and samples were analyzed for different parameters. So I used a two-way repeated measures ANOVA, for example:
ezANOVA(data = WP4, dv = parameter1, wid = System, within = Time, between = Treatment, type=3)
However, for some parameters it produced the ANOVA table without the sphericity test, like below:
Warning: NaNs produced
Warning: NaNs produced
$ANOVA
          Effect DFn DFd         F            p p<.05       ges
2      Treatment   3   8  2.255871 1.590810e-01       0.2934692
3           Time  10  80 35.989273 1.524902e-25     * 0.6960297
4 Treatment:Time  30  80  7.574502 2.275032e-13     * 0.5911306
Please let me know if you need more information to solve my problem. Thanks.
I would like to have the complete ANOVA table, with $`Mauchly's Test for Sphericity` and $`Sphericity Corrections`.
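A likely explanation (my note, not confirmed against the original data): Mauchly's test needs the covariance matrix over the Time contrasts to be of full rank, which requires more error degrees of freedom than within-subject contrasts. With 3 systems per treatment (12 systems, 8 error d.f.) but 10 Time contrasts, that matrix is singular, hence the NaN warnings and the missing sphericity output. A minimal sketch with simulated data (all names hypothetical) shows the full table reappearing once there are enough subjects per group:
## Hypothetical demo: 12 systems per treatment and 4 time points,
## so the sphericity test is estimable (error d.f. exceed the
## number of within-subject contrasts).
library(ez)
set.seed(1)
demo <- expand.grid(System = factor(1:48), Time = factor(1:4))
demo$Treatment <- factor(rep(rep(LETTERS[1:4], each = 12), times = 4))
demo$parameter1 <- rnorm(nrow(demo))
ezANOVA(data = demo, dv = parameter1, wid = System,
        within = Time, between = Treatment, type = 3)
## The output now includes $`Mauchly's Test for Sphericity`
## and $`Sphericity Corrections`.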


Repeated measures ANOVA and link to mixed-effect models in R

I have a problem when performing a two-way rm ANOVA in R on the following data (link: https://drive.google.com/open?id=1nIlFfijUm4Ib6TJoHUUNeEJnZnnNzO29):
subjectnbr is the ID of the subject, blockType and linesTTL are the independent variables, and RT2 is the dependent variable.
I first performed the rm ANOVA using ezANOVA with the following code:
ANOVA_RTS <- ezANOVA(
  data = castRTs
  , dv = RT2
  , wid = subjectnbr
  , within = .(blockType, linesTTL)
  , type = 2
  , detailed = TRUE
  , return_aov = FALSE
)
ANOVA_RTS
The result is correct (I double-checked using Statistica).
However, when I perform the rm ANOVA using the lme function, I do not get the same answer, and I have no clue why.
Here is my code:
lmeRTs <- lme(
  RT2 ~ blockType * linesTTL,
  random = ~ 1 | subjectnbr / blockType / linesTTL,
  data = castRTs)
anova(lmeRTs)
Here are the outputs of both ezANOVA and lme.
I hope I have been clear enough and have given you all the information needed.
I'm looking forward to your help, as I have been trying to figure this out for at least 4 hours!
Thanks in advance.
Here is a step-by-step example of how to reproduce the ezANOVA results with nlme::lme.
The data
We read in the data and ensure that all categorical variables are factors.
# Read in data
library(tidyverse);
df <- read.csv("castRTs.csv");
df <- df %>%
  mutate(
    blockType = factor(blockType),
    linesTTL = factor(linesTTL));
Results from ezANOVA
As a check, we reproduce the ez::ezANOVA results.
## ANOVA using ez::ezANOVA
library(ez);
model1 <- ezANOVA(
  data = df,
  dv = RT2,
  wid = subjectnbr,
  within = .(blockType, linesTTL),
  type = 2,
  detailed = TRUE,
  return_aov = FALSE);
model1;
# $ANOVA
# Effect DFn DFd SSn SSd F p
#1 (Intercept) 1 13 2047405.6654 34886.767 762.9332235 6.260010e-13
#2 blockType 1 13 236.5412 5011.442 0.6136028 4.474711e-01
#3 linesTTL 1 13 6584.7222 7294.620 11.7348665 4.514589e-03
#4 blockType:linesTTL 1 13 1019.1854 2521.860 5.2538251 3.922784e-02
# p<.05 ges
#1 * 0.976293831
#2 0.004735442
#3 * 0.116958989
#4 * 0.020088855
Results from nlme::lme
We now run nlme::lme
## ANOVA using nlme::lme
library(nlme);
model2 <- anova(lme(
  RT2 ~ blockType * linesTTL,
  random = list(subjectnbr = pdBlocked(list(
    ~1, pdIdent(~blockType - 1), pdIdent(~linesTTL - 1)))),
  data = df))
model2;
# numDF denDF F-value p-value
#(Intercept) 1 39 762.9332 <.0001
#blockType 1 39 0.6136 0.4382
#linesTTL 1 39 11.7349 0.0015
#blockType:linesTTL 1 39 5.2538 0.0274
Results/conclusion
We can see that the F test results from both methods are identical. The somewhat complicated structure of the random effect definition in lme arises from the fact that you have two crossed random effects. Here "crossed" means that for every combination of blockType and linesTTL there exists an observation for every subjectnbr.
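As a quick sanity check of the "crossed" claim (a sketch using the df from above), every subject should appear in every blockType x linesTTL cell:
## Cross-tabulate observations per subject and condition cell;
## a fully crossed design has no zero cells.
with(df, table(subjectnbr, blockType, linesTTL))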
Some additional (optional) details
To understand the role of pdBlocked and pdIdent, we need to take a look at the corresponding two-level mixed-effect model.
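As a sketch (my addition, following the standard Laird-Ware form in Pinheiro and Bates), the two-level model for subject i is

y_i = X_i β + Z_i b_i + ε_i,   b_i ~ N(0, Ψ),   ε_i ~ N(0, σ² I),

where y_i collects the responses of subject i, β holds the fixed effects, and Ψ is the variance-covariance matrix of the subject-level random effects b_i.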
The predictor variables are your categorical variables blockType and linesTTL, which are generally encoded using dummy variables.
The variance-covariance matrix for the random effects can take different forms, depending on the underlying correlation structure of your random effect coefficients. To be consistent with the assumptions of a two-level repeated measure ANOVA, we must specify a block-diagonal variance-covariance matrix pdBlocked, where we create diagonal blocks for the offset ~1, and for the categorical predictor variables blockType pdIdent(~blockType - 1) and linesTTL pdIdent(~linesTTL - 1), respectively. Note that we need to subtract the offset from the last two blocks (since we've already accounted for the offset).
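Concretely, under the assumptions above (two 2-level within-subject factors), the pdBlocked call specifies a block-diagonal

Ψ = blockdiag( σ₀², σ_b² I₂, σ_l² I₂ ),

where σ₀² is the random-intercept variance and the pdIdent blocks force the two levels of blockType (σ_b²) and linesTTL (σ_l²) to each share a common variance with zero correlations.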
Some relevant/interesting resources
Pinheiro and Bates, Mixed-Effects Models in S and S-PLUS, Springer (2000)
Potvin and Schutz, Statistical power for the two-factor repeated measures ANOVA, Behavior Research Methods, Instruments & Computers, 32, 347-356 (2000)
Deming Mi, How to understand and apply mixed-effect models, Department of Biostatistics, Vanderbilt University

R post hoc comparisons of ezANOVA

I perform the following ezANOVA:
RMANOVAGHB1 <- ezANOVA(GHB1, dv = DIF.SCORE.STARTLE, wid = RAT.ID,
                       within = TRIAL.TYPE, between = GROUP,
                       detailed = TRUE, return_aov = TRUE)
My dataset looks like this:
RAT.ID DIF.SCORE.STARTLE GROUP TRIAL.TYPE
1 1 170.73 SAL TONO
2 1 80.07 SAL NOAL
3 2 456.40 PROP TONO
4 2 290.40 PROP NOAL
5 3 507.20 SAL TONO
6 3 261.60 SAL NOAL
7 4 208.67 PROP TONO
8 4 137.60 PROP NOAL
9 5 500.50 SAL TONO
10 5 445.73 SAL NOAL
up until RAT.ID 16.
My supervisors don't work with R, so they can't help me. I need code that will give me all post hoc contrasts, but looking it up only confuses me more and more.
I already tried TukeyHSD on the aov output of ezANOVA and then pairwise.t.test (as I found out Bonferroni is a more appropriate correction in this case), but neither seems to work. I've also found things about using a linear model and then multcomp, but I don't know if that would be a good solution in this case. I feel like the problem with everything I tried is either that I have between and within variables or that my dataset is not set up right. Another complicating factor is that I'm just a beginner with R, and my statistical knowledge is still pretty basic, as this is one of my first practical experiences with doing analyses.
If it's important, this is the output of the anova:
$ANOVA
Effect DFn DFd SSn SSd F p p<.05 ges
1 (Intercept) 1 14 1233568.9 1076460.9 16.043280 0.001302172 * 0.508451750
2 GROUP 1 14 212967.9 1076460.9 2.769771 0.118272657 0.151521743
3 TRIAL.TYPE 1 14 137480.6 116097.9 16.578499 0.001143728 * 0.103365833
4 GROUP:TRIAL.TYPE 1 14 11007.2 116097.9 1.327335 0.268574391 0.009145489
$aov
Call:
aov(formula = formula(aov_formula), data = data)
Grand Mean: 196.3391
Stratum 1: RAT.ID
Terms:
GROUP Residuals
Sum of Squares 212967.9 1076460.9
Deg. of Freedom 1 14
Residual standard error: 277.2906
1 out of 2 effects not estimable
Estimated effects are balanced
Stratum 2: RAT.ID:TRIAL.TYPE
Terms:
TRIAL.TYPE GROUP:TRIAL.TYPE Residuals
Sum of Squares 137480.6 11007.2 116097.9
Deg. of Freedom 1 1 14
Residual standard error: 91.0643
Estimated effects may be unbalanced
My solution, considering your dataset (first 5 rats):
1. Let's build the linear model:
model.lm = lm(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
2. Let's check the homogeneity of variance (leveneTest) and the distribution of our model residuals (Shapiro-Wilk). We are looking for a normal distribution, and our variance should be homogeneous. Two tests for this:
>shapiro.test(resid(model.lm))
Shapiro-Wilk normality test
data: resid(model.lm)
W = 0.91783, p-value = 0.3392
> leveneTest(DIF_SCORE_STARTLE ~ GROUP * TRIAL_TYPE, data = dat)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 3 0.066 0.976
6
Our p-values are higher than 0.05 in both cases, so we have no evidence that our variance differs between groups. In the case of the normality test, we can also conclude that the sample doesn't deviate from normality. Summarizing, we can use parametric tests such as ANOVA or the pairwise t-test.
3. You can also run:
hist(resid(model.lm))
to check what the distribution of our data looks like. And check the model:
plot(model.lm)
Here: https://stats.stackexchange.com/questions/58141/interpreting-plot-lm/65864 you'll find an interpretation of the plots produced by this function. From what I saw, the data looks fine.
4. Now we can finally run the ANOVA and the post hoc HSD test (HSD.test comes from the agricolae package):
> anova(model.lm)
Analysis of Variance Table
Response: DIF_SCORE_STARTLE
Df Sum Sq Mean Sq F value Pr(>F)
GROUP 1 7095 7095 0.2323 0.6469
TRIAL_TYPE 1 39451 39451 1.2920 0.2990
GROUP:TRIAL_TYPE 1 84 84 0.0027 0.9600
Residuals 6 183215 30536
> (result.hsd = HSD.test(model.lm, list('GROUP', 'TRIAL_TYPE')))
$statistics
Mean CV MSerror HSD r.harmonic
305.89 57.12684 30535.91 552.2118 2.4
$parameters
Df ntr StudentizedRange alpha test name.t
6 4 4.895599 0.05 Tukey GROUP:TRIAL_TYPE
$means
DIF_SCORE_STARTLE std r Min Max
PROP:NOAL 214.0000 108.0459 2 137.60 290.40
PROP:TONO 332.5350 175.1716 2 208.67 456.40
SAL:NOAL 262.4667 182.8315 3 80.07 445.73
SAL:TONO 392.8100 192.3561 3 170.73 507.20
$comparison
NULL
$groups
trt means M
1 SAL:TONO 392.8100 a
2 PROP:TONO 332.5350 a
3 SAL:NOAL 262.4667 a
4 PROP:NOAL 214.0000 a
As you can see, our 'pairs' have all been grouped into one big group a, which means there are no significant differences between them. However, there is some difference between NOAL and TONO regardless of SAL and PROP.
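As a hedged alternative (my addition, not part of the original answer): the lm above ignores the rat-level pairing. The emmeans package can work directly on the multi-stratum aov object that ezANOVA returns with return_aov = TRUE, keeping the within-subject error strata:
## Sketch, assuming the full 16-rat data and the ezANOVA call above.
library(emmeans)
emm <- emmeans(RMANOVAGHB1$aov, ~ GROUP * TRIAL.TYPE)
pairs(emm, adjust = "bonferroni")  # all pairwise post hoc contrasts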

Inflated DF in lsmeans results for an lmer model

I used lmer from the lme4 package to run a linear mixed-effects model. I have 3 years of temperature data for untreated (5) and treated (10) plots. The model:
modela<-lmer(ave~yr*tr+(1|pl), REML=FALSE, data=mydata)
I checked the model for normality of residuals with a qqnorm plot.
My data:
'data.frame': 6966 obs. of 7 variables:
$ yr : Factor w/ 3 levels "yr1","yr2","yr3": 1 1 1 1 1 1 1 1 1 1 ...
$ pl : Factor w/ 15 levels "C02","C03","C05",..: 1 1 1 1 1 1 1 1 1 1 ...
$ tr : Factor w/ 2 levels "Cont","OTC": 1 1 1 1 1 1 1 1 1 1 ...
$ ave: num 14.8 16.1 11.6 10.3 11.6 ...
The interaction is significant, so I used lsmeans:
lsmeans(modela, pairwise~yr*tr, adjust="tukey")
In the contrasts, I get (excerpts only)
contrast estimate SE df t.ratio p.value
yr1,Cont - yr2,Cont -0.727102895 0.2731808 6947.24 -2.662 0.0832
yr1,OTC - yr2,OTC -0.990574030 0.2015650 6449.10 -4.914 <.0001
yr1,Cont - yr1,OTC -0.005312771 0.3889335 31.89 -0.014 1.0000
yr2,Cont - yr2,OTC -0.268783907 0.3929332 32.97 -0.684 0.9825
My question regards the high dfs for some of the contrasts and the associated, but seemingly meaningless, low p-values.
Can this be due to:
- presence of NAs in my data set (some improvement when removed)
- unequal sample sizes (e.g. 5 of one treatment, 10 of the other; however, those contrasts (yr1,Cont - yr1,OTC) don't seem to be a problem)
- other issues?
I have searched Stack Overflow questions and Cross Validated.
Thanks for any answers, ideas, comments.
In this example, treatments are assigned experimentally to plots. Having small numbers of plots assigned to treatments severely limits the information available to statistically compare the treatments. (If you had only one plot per treatment, it would not even be possible to compare treatments, because you wouldn't be able to sort out the effect of the treatments from the effect of the plots.) You have 10 plots assigned to one treatment and 5 to the other. In terms of the main effect for treatment, you thus have (10-1)+(5-1) = 13 d.f. for the main effect of treatment, and if you do
lsmeans(modela, pairwise ~ tr)
you will see around 13 d.f. (maybe less due to imbalance and missingness) for those statistics. When you compare combinations of years and treatments, you get roughly 3 times the d.f. because there are 3 years. However, in some of those comparisons, year is the same in each combination being compared, and in those comparisons the variation in plots mostly cancels out (it is a within-plot comparison); in those cases, the d.f. basically come from the residual error for the model, which has thousands of d.f. Due to imbalances in the data, these comparisons are a little bit polluted by the between-plot variations, making the d.f. somewhat smaller than the residual d.f.
It appears you are not particularly interested in cross-comparisons such as treat1, year1 vs. treat2, year3. I suggest using "by" variables to cut down on the number of comparisons tested, because when you test them all, the multiplicity correction is unnecessarily conservative. It would go something like this:
modela.lsm = lsmeans(modela, ~ tr * yr)
pairs(modela.lsm, by = "yr") # compare tr for each yr
pairs(modela.lsm, by = "tr") # compare yr for each tr
These calls will apply the Tukey correction separately to each "by" group. If you want a multiplicity correction for each whole family, do this:
rbind(pairs(modela.lsm, by = "yr"))
rbind(pairs(modela.lsm, by = "tr"))
By default, a multivariate t correction is used (Tukey is not the right method here). You can even do
rbind(pairs(modela.lsm, by = "yr"), pairs(modela.lsm, by = "tr"))
to group all of the comparisons into one family and apply a multivariate t adjustment.

Equivalent R script for repeated measures SAS script, nested design

I'm having trouble converting a SAS script to the corresponding R script.
The model is a repeated measures analysis of the response (resp) based on treatment (trt) with plot (plot) nested in the treatment.
SAS code:
data data_set;
input trt $ plot time resp;
datalines;
Burn 1 1 27
Burn 1 9 25
Burn 1 12 18
Burn 1 15 21
Burn 2 1 5
Burn 2 9 15
Burn 2 12 10
Burn 2 15 12
...
Unburn 1 1 57
Unburn 1 9 46
Unburn 1 12 49
Unburn 1 15 51
Unburn 2 1 43
Unburn 2 9 59
Unburn 2 12 59
Unburn 2 15 60
proc mixed data = data_set;
class trt plot time;
model resp = trt time trt*time / ddfm = kr;
repeated time / subject = trt(plot) type = vc rcorr;
run;
R code attempted (loading the data set from a CSV file):
library(nlme)
data.set <- read.csv( "data_set.csv" )
data.set$plot <- factor( data.set$plot )
data.set$time <- factor( data.set$time )
model1 <- lme( resp ~ trt + time + trt:time, data = data.set, random = ~1 | plot )
This works, but isn't the desired model. Other attempts I've tried have generally resulted in the error:
Error in getGroups.data.frame(dataMix, groups) :
invalid formula for groups
Basically I'm off in the weeds here...
Question 1: how to specify the same model in R as what is already specified in SAS?
Question 2: I want to be able to change the covariance matrix to replicate other work done in SAS. I believe I know how to do this with the correlation parameter for the lme function. But please correct me if I'm wrong.
Thanks in advance.
The specification of the model in R would logically be:
model1 <- lme( resp ~ trt + time + trt:time, data = data.set, random = ~1 | trt:plot )
This is because plot is nested in treatment per the coding or, alternatively, because there is an interaction between plot and treatment. However, if specified as such, it generates the error mentioned:
Error in getGroups.data.frame(dataMix, groups) : invalid formula
for groups
The problem encountered has to do with the levels introduced (I think) by using such an interaction. Regardless of the exact issue, the problem can be resolved by creating a combined treatment-plot predictor variable:
data.set$trtplot <- with( data.set, factor( paste( trt, plot, sep = "." ) ) )
And then performing the analysis as follows:
model1 <- lme( resp ~ trt + time + trt:time, data = data.set, random = ~ 1 | trtplot )
For completeness this could just as easily be the following, where each predictor variable is added plus the interaction:
model1 <- lme( resp ~ trt * time, data = data.set, random = ~ 1 | trtplot )
This then matches the results achieved in SAS when a Compound Symmetry (CS) covariance structure is specified (although the AIC criterion is different; I'm not sure why). So it is a little different from the SAS code above, where a Variance Components (VC) covariance structure is specified, but that is just a matter of changing the structure type in the SAS code.
As for comparing different covariance structures, this appears to be more of a challenge. The covariance structures that I would like to investigate are:
Compound Symmetry (CS) - done
Variance Components(VC)
Unstructured (UN)
Spatial Power (SP)
Any thoughts would be most welcome!
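As a hedged starting point for those structures (my sketch, not verified against the SAS output): in nlme, the correlation and weights arguments select the within-group covariance structure, and gls makes the comparison explicit. Object names below are just illustrative.
## SAS's VC (independent errors) is essentially the default lme fit above;
## heterogeneous variances per time point can be added with varIdent:
m.vc <- lme(resp ~ trt * time, data = data.set, random = ~ 1 | trtplot,
            weights = varIdent(form = ~ 1 | time))

## Compound symmetry, made explicit via gls:
m.cs <- gls(resp ~ trt * time, data = data.set,
            correlation = corCompSymm(form = ~ 1 | trtplot))

## Unstructured covariance over time:
m.un <- gls(resp ~ trt * time, data = data.set,
            correlation = corSymm(form = ~ 1 | trtplot),
            weights = varIdent(form = ~ 1 | time))

## Spatial power is roughly corCAR1 on a numeric time scale:
data.set$time.num <- as.numeric(as.character(data.set$time))
m.sp <- gls(resp ~ trt * time, data = data.set,
            correlation = corCAR1(form = ~ time.num | trtplot))
The AICs from these fits can then be compared against the SAS values.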

Repeated-measures / within-subjects ANOVA in R

I'm attempting to run a repeated-measures ANOVA using R. I've gone through various examples on various websites, but they never seem to talk about the error that I'm encountering. I assume I'm misunderstanding something important.
The ANOVA I'm trying to run is on some data from an experiment using human participants. It has one DV and three IVs. All of the levels of all of the IVs are run on all participants, making it a three-way repeated-measures / within-subjects ANOVA.
The code I'm running in R is as follows:
aov.output = aov(DV ~ IV1 * IV2 * IV3 + Error(PARTICIPANT_ID / (IV1 * IV2 * IV3)),
                 data = fulldata)
When I run this, I get the following warning:
Error() model is singular
Any ideas what I might be doing wrong?
Try using the lmer function in the lme4 package. The aov function is probably not appropriate here. Look for references from Douglas Bates, e.g. http://lme4.r-forge.r-project.org/book/Ch4.pdf (the other chapters are great too, but that is the repeated measures chapter; this is the intro: http://lme4.r-forge.r-project.org/book/Ch1.pdf). The R code is available at the same place. For longitudinal data, it seems to be generally considered wrong these days to just fit OLS instead of a components-of-variance model like those in the lme4 package, or in nlme, which to me seems to have been wildly overtaken by lme4 in popularity recently. You may note that Brian Ripley's post, referenced in the comments section above, also just recommends switching to lme.
By the way, a huge advantage off the jump is that you will be able to get estimates for the level of each effect as adjustments to the grand mean with the typical syntax:
lmer(DV ~ 1 + IV1*IV2*IV3 + (IV1*IV2*IV3 | Subject), data = dataset)
Note your random effects will be vector valued.
I know the answer has been chosen for this post. I still wish to point out how to specify a correct error term/random effect when fitting an aov or lmer model to multi-way repeated-measures data. I assume that both independent variables (IVs) are fixed, and are crossed with each other and with subjects, meaning all subjects are exposed to all combinations of the IVs. I am going to use data taken from Kirk's Experimental Design: Procedures for the Behavioral Sciences (2013).
library(lme4)
library(foreign)
library(lmerTest)
library(dplyr)
file_name <- "http://www.ats.ucla.edu/stat/stata/examples/kirk/rbf33.dta" #1
d <- read.dta(file_name) %>% #2
mutate(a_f = factor(a), b_f = factor(b), s_f = factor(s)) #3
head(d)
## a b s y a_f b_f s_f
## 1 1 1 1 37 1 1 1
## 2 1 2 1 43 1 2 1
## 3 1 3 1 48 1 3 1
## 4 2 1 1 39 2 1 1
## 5 2 2 1 35 2 2 1
In this study, 5 subjects (s) are exposed to 2 treatments, type of beat (a) and training duration (b), with 3 levels each. The outcome variable is the attitude toward minorities. In #3 I made a, b, and s into factor variables a_f, b_f, and s_f. Let p and q be the numbers of levels of a_f and b_f (3 each), and n be the sample size (5).
In this example the degrees of freedom (dfs) for the tests of a_f, b_f, and their interaction should be p-1=2, q-1=2, and (p-1)*(q-1)=4, respectively. The df for the s_f error term is (n-1) = 4, and the df for the Within (s_f:a_f:b_f) error term is (n-1)(pq-1)=32. So the correct model(s) should give you these dfs.
Using aov
Now let’s try different model specifications using aov:
aov(y ~ a_f*b_f + Error(s_f), data=d) %>% summary() # m1
aov(y ~ a_f*b_f + Error(s_f/a_f:b_f), data=d) %>% summary() # m2
aov(y ~ a_f*b_f + Error(s_f/a_f*b_f), data=d) %>% summary() # m3
Simply specifying the error as Error(s_f) in m1 gives you the correct dfs and F-ratios, matching the values in the book. m2 also gives the same values as m1, but also the infamous "Warning: Error() model is singular". m3 is doing something strange: it further partitions the Within residual in m1 (634.9) into residuals for three error terms: s_f:a_f (174.2), s_f:b_f (173.6), and s_f:a_f:b_f (287.1). This is wrong, since we would not get three error terms when we run a two-way between-subjects ANOVA! Multiple error terms are also contrary to the point of using block factorial designs, which allow us to use the same error term for the tests of A, B, and AB, unlike split-plot designs, which require 2 error terms.
Using lmer
How can we get the same dfs and F-values using lmer? If your data are balanced, the Kenward-Roger approximation available in the lmerTest package will give you exact dfs and F-ratios.
lmer(y ~ a_f*b_f + (1|s_f), data=d) %>% anova() # mem1
lmer(y ~ a_f*b_f + (1|s_f/a_f:b_f), data=d) %>% anova() # mem2
lmer(y ~ a_f*b_f + (1|s_f/a_f*b_f), data=d) %>% anova() # mem3
lmer(y ~ a_f*b_f + (1|s_f:a_f:b_f), data=d) %>% anova() # mem4
lmer(y ~ a_f*b_f + (a_f*b_f|s_f), data=d) %>% anova() # mem5
Again, simply specifying the random effect as (1|s_f) gives you the correct dfs and F-ratios (mem1). mem2-5 did not even give results, presumably because the number of random effects to be estimated was greater than the sample size.
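One note worth flagging as an assumption about current lmerTest: its anova() defaults to the Satterthwaite approximation, and Kenward-Roger can be requested explicitly. A minimal sketch, reusing the objects defined above:
## Kenward-Roger dfs for the balanced design above (needs pbkrtest
## installed); with balanced data this reproduces the textbook dfs.
lmer(y ~ a_f*b_f + (1|s_f), data=d) %>% anova(ddf = "Kenward-Roger")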
