I have a problem regarding my data analysis in R. One of my hypotheses is that my groups will differ in the spread of their scores, indicating a difference in extremity between the groups.
I decided to check this hypothesis with Levene's test, which turned out significant and should thus indicate that the standard deviations differ significantly between the groups.
But I do not know of any post hoc tests for Levene's test, and after reading up on possible post hoc analyses I decided to conduct an ANOVA on the residuals and then run a post hoc test on that ANOVA.
This is the code I've tried so far:
leveneTest(SS_mean ~ RA01, data = DF)
DF$residuals <- abs(DF$SS_mean - DF$SS_mean_big)  # SS_mean = participant's score,
                                                  # SS_mean_big = mean for each group
My test and post hoc test look like this:
levene.anova <- aov(residuals ~ RA01, data = DF)  # RA01 is the grouping factor; four groups in total
summary(levene.anova)
TukeyHSD(levene.anova)
The ANOVA on the residuals turned out significant as well, but the p-value changed from 0.04 (Levene's test) to 0.01 (ANOVA on the residuals).
When reading about it, it seemed like Levene's test is just an ANOVA on the residuals, so it should give me the same results. I am also unsure which post hoc test I should use; I thought about Dunnett's as well, as it includes a baseline, which corresponds to one of my groups.
Lastly, I ran Levene's test on the residuals as well, leveneTest(residuals ~ RA01), which also turned out significant. Would it be better for me to use a non-parametric test, e.g. the Kruskal-Wallis H-test, and conduct a post hoc test on that instead? And if so, what would be the appropriate test: pairwise Mann-Whitney U-tests or Dunn's test?
As this is the first time I'm doing something like this, I'm unsure whether this is a legitimate analysis. I would really appreciate your help or input!
Levene's test should indeed give the same p-value as an ANOVA on the absolute residuals from the group means.
See for example this code:
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
# Calculate means and add them to data
cyl_means <- aggregate(disp ~ cyl, data = mtcars, FUN = mean)
colnames(cyl_means)[2] <- "disp_mean"
mtcars2 <- merge(mtcars, cyl_means, by = "cyl")
# Residuals and anova
mtcars2$residuals <- abs(mtcars2$disp - mtcars2$disp_mean)
res.aov <- aov(residuals ~ cyl, data = mtcars2)
summary(res.aov)
# Levene's test
lawstat::levene.test(mtcars$disp, mtcars$cyl, location = "mean")
Maybe you accidentally ran the Brown–Forsythe test instead, which is the default in lawstat::levene.test (and in car::leveneTest), and which uses the median instead of the mean to calculate the residuals.
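You can check by forcing both implementations onto the same centring:
lawstat::levene.test(mtcars$disp, mtcars$cyl, location = "median")  # Brown–Forsythe (the default)
car::leveneTest(disp ~ cyl, data = mtcars, center = mean)  # mean-based; should match the ANOVA above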
Use Dunnett's test if you are only interested in comparing the groups to one baseline group.
Use TukeyHSD if you want all pairwise comparisons among groups.
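For example, with the multcomp package (a sketch assuming the aov fit on the absolute residuals from your question):
library(multcomp)
levene.anova <- aov(residuals ~ RA01, data = DF)
summary(glht(levene.anova, linfct = mcp(RA01 = "Dunnett")))  # each group vs. the baseline level
TukeyHSD(levene.anova)  # all pairwise comparisons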
I would like to perform post-hoc tests on imputed data using MICE in R.
Typically, MICE imputes the data, which is converted to long format to calculate total scores and can then be converted back into a mids object. The analysis is then conducted on a mira object, after which the results can be pooled.
However, I am not able to get it running for post hoc tests, including Tukey and Games-Howell. Would someone be able to help?
IMP <- mice(data, m=5, maxit=10)
IMP_long <- data.frame(complete(IMP, include = TRUE, action = 'long'))
IMP_mids <- as.mids(IMP_long, where = NULL, .imp = '.imp', .id = '.id')
fit <- with(IMP_mids, expr=lm(total_score ~ GroupingVariable))
The grouping variable consists of 3 groups which I would like to compare pairwise: 1 vs 2, 2 vs 3 and 1 vs 3.
summary(pool(fit))
-> this gives comparisons between two of the groups relative to the intercept; the same happens when I set contrasts before creating the model.
Does anyone know how to compare the three groups in one analysis with Tukey and/or Games-Howell?
Thanks in advance!!
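One possible workaround, sketched here only under the assumption that GroupingVariable is a factor with levels "1", "2" and "3" (this gives the raw pairwise contrasts, not Tukey- or Games-Howell-adjusted p-values): since summary(pool(fit)) compares each group to the reference level, releveling the factor and pooling again yields the remaining pairwise comparison.
library(mice)
# Refit with group 2 as the reference level; the pooled coefficients then
# give the 1 vs 2 and 3 vs 2 comparisons
fit2 <- with(IMP_mids, expr = lm(total_score ~ relevel(GroupingVariable, ref = "2")))
summary(pool(fit2))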
I'm not sure that this is the perfect place for such a question, but maybe you can help me.
I want to check for differences in a quantitative variable between 3 treatments, i.e. perform an ANOVA.
Unfortunately the residuals of my model aren't normally distributed.
I usually have two solutions here: transform my data or use a non-parametric equivalent of the test (here a Kruskal-Wallis rank test).
None of the transformations I tried managed to satisfy normality (log, 1/x, square root, Tukey and Box-Cox power), so I wanted to use a Kruskal-Wallis test and move on.
However, my project manager insisted on having only ANOVAs and talked about the ANOVA on ranks as a magic solution.
Working in R, I looked for some examples and found the function art from the ARTool package, which performs an ANOVA on ranks.
library(ARTool)
model <- art(variable~treatment,data)
anova(model)
Basically, it takes your variable and replaces it with its rank (dealing with ties by averaging the ranks), so that:
model2 <- lm(rank(variable, ties.method = "average")~treatment,data)
anova(model2)
gives exactly the same output.
I'm not an expert statistician, and I wonder how valid this solution/transformation is. It seems quite brutal to me, and not that far from the logic of the Kruskal-Wallis test, even though the statistic is not computed in the same way from the ranks. I find it very confusing to have an 'ANOVA on ranks' test that is different from the Kruskal-Wallis test (also known as the one-way ANOVA on ranks), and I don't know how to choose between those two tests.
I don't know if I've been very clear and whether someone can help me but, anyway,
Thanks for your attention and comments!
PS: here is an example on dummy data
library(ARTool)
# note that the dummy data are random, so you won't get exactly the same results
treatment <- as.factor(c(rep("A",100),rep("B",100),rep("C",100)))
variable <- as.numeric(c(sample(c(0:30),100,replace=T),sample(c(10:40),100,replace=T),sample(c(5:35),100,replace=T)))
dummy <- data.frame(treatment,variable)
model <- art(variable ~ treatment, data = dummy)
anova(model) #f.value = 30.746 and p = 7.312e-13
model2 <- lm(rank(variable, ties.method = "average")~treatment,dummy)
anova(model2) #f.value = 30.746 and p = 7.312e-13
kruskal.test(variable~treatment,dummy)
I am testing for a linear trend among several groups; however, my data violate the assumption of equal variance across groups (tested by Levene's test of homogeneity of variance).
In SPSS, along with the significance of the linear trend assuming equal variances, there is automatic output for the significance where equal variances are not assumed. What 'test' or 'adjustment' is being done? Can I do this in R, and how?
Image of SPSS output: (https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSZRs3EM3wJz5raHhav-LLBTmTyfLJO0z4xHDEzI-3uI15BoBQ5)
I'm struggling to find out exactly what SPSS is doing, but it could be some sort of Welch correction?
# TEST homogeneity of variance
leveneTest(ICECAP_A ~ SFMental_f, data = SCI)
p < 0.001, so we reject the null hypothesis of homogeneity of variance.
# Use built-in contr.poly() function: Tell R to get a polynomial contrast matrix for 5 levels/groups
contrasts(SCI$SFMental_f) <- contr.poly(n=5)
# call an ANOVA
anova.SFMental <- aov(ICECAP_A ~ SFMental_f, data = SCI)
# print output, show linear trend result
summary.aov(anova.SFMental, split = list(SFMental_f = list("Linear" = 1)))
Now I have the significance of the linear trend. How do I get the significance if we do NOT assume equal variances?
It seems that SPSS does a correction using the Welch-Satterthwaite equation (thanks to Andy Field for the tip). There is no direct R alternative, so I constructed the contrasts in the usual way and ran a robust model with lmRob() instead.
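A minimal sketch of that approach (assuming the SCI data and the five-level factor SFMental_f from the question; lmRob() lives in the robust package):
library(robust)
contrasts(SCI$SFMental_f) <- contr.poly(n = 5)  # polynomial contrasts as before
robust.SFMental <- lmRob(ICECAP_A ~ SFMental_f, data = SCI)
summary(robust.SFMental)  # the SFMental_f.L coefficient row tests the linear trend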
I have tried to run Tukey HSD on a model with multiple variables. However, when I run the same test with a single variable, the results are completely opposite.
While running with multiple variables, I observed the following message in the ANOVA output:
8 out of 87 effects not estimable
Estimated effects may be unbalanced
While running with a single variable, I observed the following message:
Estimated effects may be unbalanced
Is this in any way related to the completely opposite Tukey HSD output I received? Also, how do I go about solving this problem?
I used aov() and have close to 500,000 data points in my dataset.
To be more specific, the following code gave me different results:
code1:
lm_test1 <- lm(y ~ x1 + x2, data = data)
glht(lm_test1, linfct = mcp(x1 = "Tukey"))
code2:
lm_test1 <- lm(y ~ x1, data=data)
glht(lm_test1, linfct = mcp(x1 = "Tukey"))
Please tell me how this is possible...
After some more research, I found the answer to this, so I thought I should post it. anova() in R performs a Type I (sequential) ANOVA by default. That means the effects of the first variable we enter are assessed without controlling for any other factors, whereas for the later variables the results are shown after controlling for the effects of the variables entered before them. Since I was entering my variable as the second variable, the results shown were after controlling for the first variable, and they happened, by chance, to point in the completely opposite direction from the direct effect.
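A hypothetical sketch of the difference, reusing the names y, x1 and x2 from the code above:
anova(lm(y ~ x1 + x2, data = data))  # Type I: x1 tested first, ignoring x2
anova(lm(y ~ x2 + x1, data = data))  # Type I: x1 tested after controlling for x2
car::Anova(lm(y ~ x1 + x2, data = data), type = "II")  # Type II: each term after the others, order-independent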
I've been using var.test and bartlett.test to check basic ANOVA assumptions, among others homoscedasticity (homogeneity, equality of variances). The procedure is quite simple for a one-way ANOVA:
bartlett.test(x ~ g) # where x is numeric, and g is a factor
var.test(x ~ g)
But for 2x2 designs, i.e. two-way ANOVAs, I want to do something like this:
bartlett.test(x ~ c(g1, g2)) # or with list; see latter:
var.test(x ~ list(g1, g2))
Of course, ANOVA assumptions can be checked with graphical procedures, but what about an 'arithmetic option'? Is that at all manageable? How do you test homoscedasticity in a two-way ANOVA?
Hypothesis testing is the wrong tool to use to assess the validity of model assumptions. If the sample size is small, you have no power to detect any variance differences, even if the variance differences are large. If you have a large sample size, you have the power to detect even the most trivial deviations from equal variance, so you will almost always reject the null. Simulation studies have shown that preliminary testing of model assumptions leads to unreliable type I errors.
Looking at the residuals across all cells is a good indicator, or if your data are normal, you can use the AIC or BIC with/without equal variances as a selection procedure.
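For instance, with nlme::gls you can fit the model with and without a per-cell variance and compare by AIC; a sketch, assuming a data frame dat with response x and factors g1 and g2:
library(nlme)
dat$cell <- interaction(dat$g1, dat$g2)  # one variance stratum per design cell
m.equal <- gls(x ~ g1 * g2, data = dat)
m.unequal <- gls(x ~ g1 * g2, data = dat, weights = varIdent(form = ~ 1 | cell))
AIC(m.equal, m.unequal)  # the lower AIC points to the better variance model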
If you think there are unequal variances, drop the assumption with something like:
library(car)
model.lm <- lm(x ~ g1 * g2, data = dat, na.action = na.omit)
Anova(model.lm, type = 'II', white.adjust = 'hc3')
You don't lose much power with the robust method (heteroscedasticity-consistent covariance matrices), so if in doubt, go robust.
You can test for heteroscedasticity using the Fligner–Killeen test of homogeneity of variances. Supposing your model is something like
model<-aov(gain~diet*supplement)
fligner.test(gain~diet*supplement)
Fligner-Killeen test of homogeneity of variances
data: gain by diet by supplement
Fligner-Killeen:med chi-squared = 2.0236, df = 2, p-value = 0.3636
You could also have used bartlett.test (though it is so sensitive to non-normality that it can behave more like a test of non-normality than of equality of variances):
bartlett.test(gain~diet*supplement)
Bartlett test of homogeneity of variances
data: gain by diet by supplement
Bartlett's K-squared = 2.2513, df = 2, p-value = 0.3244
Moreover, you can perform Levene's test for equal group variances in both one-way and two-way ANOVA. Implementations of Levene's test can be found in the packages car, s20x and lawstat:
levene.test(gain ~ diet * supplement)  # car package version
Levene's Test for Homogeneity of Variance
      Df F value Pr(>F)
group 11  1.1034 0.3866
      36
For bartlett.test:
bartlett.test(split(x, list(g1, g2)))
var.test is not applicable, as it works only when there are two groups.