Get ANOVA summary statistics from each unique factor separately - r

Hi, I have a large dataset with isotope values for multiple species, measured per season at different locations, on which I ran an ANOVA:
Anova <- Isotopes %>%
group_by(Species) %>%
do(model = aov(d15N~Season+Location, data = Isotopes))
Anova$model
I would like to see the summary of this test for each unique species separately. How can I get this data?
Thank you in advance.

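If I understood your question, you can use lapply. Note that inside do() the model needs data = . so that each species is fitted on its own rows; with data = Isotopes every group is fitted on the full dataset.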
# Reproduce a similar dataset
library(dplyr)
Anova <- iris %>%
  mutate(new_var = sample(LETTERS[1:3], size = nrow(.), replace = TRUE)) %>%
  group_by(Species) %>%
  do(model = aov(Sepal.Length ~ Petal.Width + new_var, data = .))
# One summary per fitted model, named by species
summ <- lapply(Anova$model, summary)
names(summ) <- Anova$Species
summ
$setosa
Df Sum Sq Mean Sq F value Pr(>F)
Petal.Width 1 0.471 0.4709 3.919 0.0538 .
new_var 2 0.090 0.0450 0.374 0.6898
Residuals 46 5.527 0.1202
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$versicolor
Df Sum Sq Mean Sq F value Pr(>F)
Petal.Width 1 3.899 3.899 22.432 2.12e-05 ***
new_var 2 1.162 0.581 3.344 0.0441 *
Residuals 46 7.994 0.174
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$virginica
Df Sum Sq Mean Sq F value Pr(>F)
Petal.Width 1 1.566 1.5656 4.049 0.0501 .
new_var 2 0.459 0.2295 0.593 0.5566
Residuals 46 17.788 0.3867
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
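The same per-species idea can also be sketched in base R, without do() (which newer dplyr releases mark as superseded). This is only an illustration under assumptions: iris stands in for the Isotopes data, and new_var is an invented factor playing the role of a second predictor. The last step collects the p-value of each term into one table, which may be easier to scan than a list of printed summaries.

```r
# Base-R sketch: split the data by species and fit one model per piece.
# iris is a stand-in for the asker's data; new_var is invented for illustration.
set.seed(42)
dat <- iris
dat$new_var <- factor(sample(LETTERS[1:3], nrow(dat), replace = TRUE))

models <- lapply(split(dat, dat$Species), function(d) {
  aov(Sepal.Length ~ Petal.Width + new_var, data = d)
})
summ <- lapply(models, summary)

# Collect the p-value of every model term into a matrix (rows = species).
pvals <- t(sapply(summ, function(s) {
  tab <- s[[1]]                       # the ANOVA table of this model
  p <- tab[["Pr(>F)"]]
  names(p) <- trimws(rownames(tab))   # summary.aov pads row names with spaces
  p[names(p) != "Residuals"]          # the Residuals row has no p-value
}))
pvals
```

Here summ has the same shape as in the lapply answer above, so summ$setosa prints the table for a single species.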

Related

One-way ANOVA for loop: how do I iterate through multiple columns of a dataframe

I want to run more than 1000 different one-way ANOVAs.
I would like to see whether the number of reads from a single miRNA changes between four different groups, and I would like to do that for each of the more than 1000 miRNAs.
My tibble dataframe looks like this:
I have my 4 groups (YC, OC, YH, OH) and a different miRNA in each column.
I tried a for-loop in which I expect R to iterate over the names of the miRNAs and print an ANOVA table and a TukeyHSD test for each:
for(i in 2:ncol(test))
{column<-names(test[i])AVz<-summary(aov(test[,i]~Group,data =
test))tk<-TukeyHSD((aov(test[,i]~Group,data =
test)))print(column)print(AVz)print(tk)}
BUT this didn't work:
Error: unexpected symbol in "for(i in 2:ncol(test)){column<-names(test[i])AVz"
If you would like to run aov on every column at once, you can bind the responses together with cbind:
formula <- as.formula(paste0("cbind(", paste(names(iris)[-5], collapse = ","), ") ~ Species"))
fit <- aov(formula, data=iris)
summary(fit)
Response Sepal.Length :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 63.212 31.606 119.26 < 2.2e-16 ***
Residuals 147 38.956 0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Response Sepal.Width :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 11.345 5.6725 49.16 < 2.2e-16 ***
Residuals 147 16.962 0.1154
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Response Petal.Length :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 437.10 218.551 1180.2 < 2.2e-16 ***
Residuals 147 27.22 0.185
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Response Petal.Width :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 80.413 40.207 960.01 < 2.2e-16 ***
Residuals 147 6.157 0.042
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
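As for the error itself: the message shows several statements fused onto one line without a newline or semicolon between them, which R cannot parse. A minimal sketch of the loop with one statement per line follows. The toy data frame and its column names are invented stand-ins for the asker's tibble, and test[[i]] is used because it returns a plain vector for both data frames and tibbles.

```r
# Toy stand-in for the asker's data: a Group column plus one column per miRNA.
# Column names and values are invented for illustration.
set.seed(1)
test <- data.frame(
  Group = factor(rep(c("YC", "OC", "YH", "OH"), each = 5)),
  miR_a = rnorm(20),
  miR_b = rnorm(20)
)

results <- list()
for (i in 2:ncol(test)) {
  column <- names(test)[i]
  # Fit once and reuse the model for both the ANOVA table and the Tukey test
  fit <- aov(test[[i]] ~ Group, data = test)
  results[[column]] <- list(anova = summary(fit), tukey = TukeyHSD(fit))
}

print(results$miR_a$anova)
print(results$miR_a$tukey)
```

Storing the results in a named list, rather than printing inside the loop, keeps the output of 1000+ models manageable.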

r - emmeans pairwise analysis for multilevel repeated measures ANCOVA

I'm building a repeated measures ANCOVA using a multi-level framework through the aov() function. I have one continuous response variable, two factor predictors, and 3 continuous covariates. My script for the model is below:
ModelDV <- aov(DV ~ IV1 + IV2 + IV1*IV2 + CV1 + CV2 + CV3 + Error(PartID/(IV1 + IV2 + IV1:IV2)), data)
A snippet of my data set shows how it is formatted:
PartID  DV  IV1     IV2     CV1 CV2 CV3
1       56  CondA1  CondB1  Continuous values
2       45  CondA2  CondB2  -
3       32  CondA3  CondB1  -
4       21  CondA4  CondB2  -
1       10  CondA1  CondB1  -
2       19  CondA2  CondB2  -
3       35  CondA3  CondB1  -
4       45  CondA4  CondB2  -
My conditions are embedded in the error term of the participant ID since this is a fully within-subjects repeated measures model.
I am attempting to conduct a pairwise analysis on these values. My output provides omnibus F-tests:
Error: PartID
Df Sum Sq Mean Sq F value Pr(>F)
CV1 1 348 348 0.442 0.5308
CV2 1 9 9 0.011 0.9193
CV3 1 3989 3989 5.063 0.0654 .
Residuals 6 4727 788
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV1
Df Sum Sq Mean Sq F value Pr(>F)
IV1 1 6222 6222 17.41 0.0024 **
Residuals 9 3217 357
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV2
Df Sum Sq Mean Sq F value Pr(>F)
IV2 2 6215 3107.7 16.18 9.51e-05 ***
Residuals 18 3457 192.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV1:IV2
Df Sum Sq Mean Sq F value Pr(>F)
IV1:IV2 2 575.2 287.6 1.764 0.2
Residuals 18 2934.4 163.0
When calculating emmeans via:
emm <- emmeans(ModelDV, ~ IV1)
pairs(emm)
I get sensible output.
However, when using this for the covariates:
emm <- emmeans(ModelDV, ~ CV1)
pairs(emm)
I get the following output:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
Results are averaged over the levels of: IV1, IV2
What am I doing wrong here that a pairwise comparison is not working for the covariate?
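The short answer is that CV1, CV2 and CV3 are in the model as continuous covariates: they are there to be controlled for, not to be considered part of the explanation. A continuous covariate has no levels, so emmeans evaluates it at a single reference value (its mean, by default) and pairs() has nothing to contrast, hence the "(nothing) nonEst" row. You could of course do pairwise comparisons for the covariates outside the model, but not inside the model framework. Longer blogpost using these tools I wrote here...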
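To make the covariate behaviour concrete, here is a small sketch on a plain lm fitted to iris. It is not the asker's repeated-measures model, and the variable choices are assumptions made purely for illustration. It shows the single-row grid a numeric covariate gets by default, and two ways of asking a meaningful question about it: comparing predictions at chosen covariate values with at, or estimating the covariate's slope with emtrends.

```r
# Illustration on a simple lm, NOT the asker's aov model with Error() strata.
# Guarded so the sketch is skipped when the emmeans package is not installed.
if (requireNamespace("emmeans", quietly = TRUE)) {
  library(emmeans)
  fit <- lm(Sepal.Length ~ Species + Petal.Width, data = iris)

  # A numeric covariate is reduced to one reference value (its mean),
  # so there is a single estimate and nothing for pairs() to compare.
  emm_default <- emmeans(fit, ~ Petal.Width)
  print(emm_default)

  # Supplying explicit covariate values via `at` gives rows to contrast.
  emm_at <- emmeans(fit, ~ Petal.Width, at = list(Petal.Width = c(0.5, 1.5)))
  print(pairs(emm_at))

  # Alternatively, estimate the slope of the covariate itself.
  print(emtrends(fit, ~ 1, var = "Petal.Width"))
}
```

Whether either of these is appropriate for a within-subjects design depends on the model; the sketch only shows why the default call returns nothing to compare.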

How can I get ggpairs to display ancova results in upper panels?

I'm making a scatterplot matrix with ggpairs (GGally) as follows, but I'd like to display the p-values for each term in my aov results in the upper panels, rather than just the overall and by-species correlation values that the package shows by default.
How can I get the rightmost column of this aov result into my upper plots? Can I write a custom function to do this, and how? Is it even possible using ggpairs? Thanks.
library(GGally);library(ggplot2)
pm <- ggpairs(data = iris,
mapping = aes(color = Species),
columns = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
pm
result <- aov(Sepal.Length ~ Sepal.Width*Petal.Length, data = iris)
print(summary(result))
Df Sum Sq Mean Sq F value Pr(>F)
Sepal.Width 1 1.41 1.41 12.9 0.000447 ***
Petal.Length 1 84.43 84.43 771.4 < 2e-16 ***
Sepal.Width:Petal.Length 1 0.35 0.35 3.2 0.075712 .
Residuals 146 15.98 0.11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
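One part of this can be sketched with base R alone: pulling the Pr(>F) column out of the aov summary as a named vector and turning it into text labels. This is only the extraction step, under the assumption that such labels are what should end up in the panels; wiring them into a custom ggpairs panel function is not shown here.

```r
result <- aov(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)

# summary() of an aov fit is a list; its first element is the ANOVA table.
tab <- summary(result)[[1]]

# Name the p-values by model term; row names are padded, so trim them.
pvals <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))
pvals <- pvals[!is.na(pvals)]   # drop the Residuals row, which has no p-value

# Text labels that could be drawn in a plot panel
labels <- paste0(names(pvals), ": p = ", format.pval(pvals, digits = 3))
labels
```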

Grouping Regressors in Anova Table for Multiple Linear Regression

I am running a regression in R:
fbReg <- lm(y~x2+x7+x8,table.b1)
I then run an Anova table to analyze the significance of the regression
anova(fbReg)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x2 1 76.193 76.193 26.172 3.100e-05 ***
x7 1 139.501 139.501 47.918 3.698e-07 ***
x8 1 41.400 41.400 14.221 0.0009378 ***
Residuals 24 69.870 2.911
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Is there anything I can do to make my ANOVA table sum the sums of squares for x2, x7 and x8 instead of listing them separately?
Essentially, I would like the ANOVA table to look like this:
                 Df  Sum Sq   Mean Sq  F value  Pr(>F)
Regression        3  257.094  ETC....
Error (Residual) 24   69.870  ETC.....
Thanks
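To illustrate my comment: compare the full model against an intercept-only model, and anova() reports the regression sum of squares as a single row.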
> lm2 <- lm(Fertility ~ Catholic+Education+Agriculture, data = swiss)
> lm1 <- lm(Fertility ~ 1, data = swiss)
> anova(lm1,lm2)
Analysis of Variance Table
Model 1: Fertility ~ 1
Model 2: Fertility ~ Catholic + Education + Agriculture
Res.Df RSS Df Sum of Sq F Pr(>F)
1 46 7178.0
2 43 2567.9 3 4610.1 25.732 1.089e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
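If a table in the exact Regression/Residuals layout is wanted, the rows of the sequential ANOVA table can also be summed by hand. The sketch below does this for the swiss example; the object name grouped and its column names are illustrative choices, not output of any built-in function.

```r
lm2 <- lm(Fertility ~ Catholic + Education + Agriculture, data = swiss)
tab <- anova(lm2)
is_resid <- rownames(tab) == "Residuals"

# Pool all regressor rows into a single "Regression" row.
grouped <- data.frame(
  Df = c(sum(tab$Df[!is_resid]), tab$Df[is_resid]),
  SS = c(sum(tab$`Sum Sq`[!is_resid]), tab$`Sum Sq`[is_resid]),
  row.names = c("Regression", "Residuals")
)
grouped$MS <- grouped$SS / grouped$Df

# Overall F test of the regression against the residual mean square
grouped$F <- c(grouped$MS[1] / grouped$MS[2], NA)
grouped$p <- c(pf(grouped$F[1], grouped$Df[1], grouped$Df[2],
                  lower.tail = FALSE), NA)
grouped
```

The Regression row should agree with the second line of the anova(lm1, lm2) comparison, since both describe the same overall F test.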

How can I get aov to show me the F-statistic and p-value?

The following script
#!/usr/bin/Rscript --vanilla
x <- c(4.5,6.4,7.2,6.7,8.8,7.8,9.6,7.0,5.9,6.8,5.7,5.2)
fertilizer<- factor(c('A','A','A','A','B','B','B','B','C','C','C','C'))
crop <- factor(c('I','II','III','IV','I','II','III','IV','I','II','III','IV'))
av <- aov(x~fertilizer*crop)
summary(av)
yields
Df Sum Sq Mean Sq
fertilizer 2 13.6800 6.8400
crop 3 2.8200 0.9400
fertilizer:crop 6 6.5800 1.0967
For other data, aov usually gives the F-statistic and associated p-value. What is wrong/special about this data that causes R to omit the juicy parts?
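Should you be using + instead of * in the formula? With only one observation per fertilizer/crop combination, the interaction term uses up every remaining degree of freedom, so there are no residuals left and no F-statistic can be computed.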
> summary(aov(x~fertilizer + crop))
Df Sum Sq Mean Sq F value Pr(>F)
fertilizer 2 13.6800 6.8400 6.2371 0.03426 *
crop 3 2.8200 0.9400 0.8571 0.51218
Residuals 6 6.5800 1.0967
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
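The degrees-of-freedom argument can be checked directly. The short sketch below reuses the data from the question, tabulates how many observations fall into each fertilizer/crop cell, and compares the residual degrees of freedom of the two models.

```r
x <- c(4.5, 6.4, 7.2, 6.7, 8.8, 7.8, 9.6, 7.0, 5.9, 6.8, 5.7, 5.2)
fertilizer <- factor(rep(c("A", "B", "C"), each = 4))
crop <- factor(rep(c("I", "II", "III", "IV"), times = 3))

# Every fertilizer/crop cell holds exactly one observation
cell_counts <- table(fertilizer, crop)
print(cell_counts)

# The interaction model spends all 11 available degrees of freedom ...
df_full <- df.residual(aov(x ~ fertilizer * crop))
# ... while the additive model leaves 6 for the residuals
df_additive <- df.residual(aov(x ~ fertilizer + crop))
c(full = df_full, additive = df_additive)
```

With replicated cells (more than one observation per combination) the interaction model would have residual degrees of freedom and report F-statistics as usual.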
