I'm making a scatterplot matrix with ggpairs (GGally) as follows, but I'd like to display the p-values for each term in my aov results in the upper panels, rather than just the overall and by-species correlation values that come with the package.
How can I get the right column from this aov result into my upper plots? Can I write a custom function to do this, and if so, how? Is it even possible using ggpairs? Thanks.
pm <- ggpairs(data = iris,
mapping = aes(color = Species),
columns = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
result <- aov(Sepal.Length ~ Sepal.Width*Petal.Length, data = iris)
summary(result)
Df Sum Sq Mean Sq F value Pr(>F)
Sepal.Width 1 1.41 1.41 12.9 0.000447 ***
Petal.Length 1 84.43 84.43 771.4 < 2e-16 ***
Sepal.Width:Petal.Length 1 0.35 0.35 3.2 0.075712 .
Residuals 146 15.98 0.11
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
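One possible direction, sketched under assumptions rather than taken from the thread: the p-values sit in the `Pr(>F)` column of the first element of `summary(result)`, and ggpairs accepts user-written panel functions of the form `function(data, mapping, ...)` through its `upper` argument. The helper name `upper_aov_p` is hypothetical, and fitting a one-term `aov(y ~ x)` per panel is an assumption about what each panel should show; it is not the same thing as the terms of the full interaction model above.

```r
# Pull the per-term p-values out of the aov summary (base R only).
result <- aov(Sepal.Length ~ Sepal.Width * Petal.Length, data = iris)
aov_table <- summary(result)[[1]]            # one row per term, plus Residuals
p_values <- setNames(aov_table[["Pr(>F)"]], trimws(rownames(aov_table)))
p_values <- p_values[!is.na(p_values)]       # the Residuals row has no p-value
print(signif(p_values, 3))

# Hypothetical custom upper-panel function (an assumption, not part of GGally):
# fits aov(y ~ x) for the two variables of the panel and shows the p-value.
upper_aov_p <- function(data, mapping, ...) {
  x <- GGally::eval_data_col(data, mapping$x)
  y <- GGally::eval_data_col(data, mapping$y)
  p <- summary(aov(y ~ x))[[1]][["Pr(>F)"]][1]
  ggplot2::ggplot() +
    ggplot2::annotate("text", x = 0.5, y = 0.5,
                      label = paste("p =", format.pval(p, digits = 3))) +
    ggplot2::theme_void()
}

# Only build the plot matrix when GGally and ggplot2 are installed.
if (requireNamespace("GGally", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  pm <- GGally::ggpairs(
    data = iris,
    columns = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
    upper = list(continuous = upper_aov_p)
  )
}
```

Printing `pm` would then draw the matrix with the text panels in the upper triangle. If the p-values of the full model are wanted instead, the named vector `p_values` could be looked up inside the panel function, but how terms map to panels is a design decision the question leaves open.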
The default OLS regression in R gives me the p-value regarding whether or not the coefficient is different from zero.
Is there a way to change this default regarding coefficients that are different from one?
Thank you
Just carry out a linear hypothesis test. In R use the function car::linearHypothesis:
mod <- lm(Sepal.Width~., iris)
then run either of the following to test whether the coefficient for Petal.Length = 1
car::linearHypothesis(mod, "Petal.Length = 1")
car::lht(mod, "Petal.Length = 1")
Linear hypothesis test
Petal.Length = 1
Model 1: restricted model
Model 2: Sepal.Width ~ Sepal.Length + Petal.Length + Petal.Width + Species
Res.Df RSS Df Sum of Sq F Pr(>F)
1 145 24.837
2 144 10.328 1 14.509 202.31 < 2.2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
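For a single coefficient, the same test can be checked by hand in base R, which may help to see what the function is doing. This is a minimal sketch, assuming the same `iris` model as above: the t statistic is the estimate minus the hypothesised value (1), divided by its standard error, and its square should match the F value reported by the linear hypothesis test. The variable names are illustrative.

```r
# Same model as in the answer above.
mod <- lm(Sepal.Width ~ ., data = iris)

# Estimate and standard error of the Petal.Length coefficient.
coefs <- summary(mod)$coefficients
est <- coefs["Petal.Length", "Estimate"]
se  <- coefs["Petal.Length", "Std. Error"]

# t test of H0: coefficient = 1 (instead of the default H0: coefficient = 0).
t_stat  <- (est - 1) / se
p_value <- 2 * pt(abs(t_stat), df = df.residual(mod), lower.tail = FALSE)

# For one restriction, F = t^2, so this can be compared with the table above.
print(c(t = t_stat, F = t_stat^2, p = p_value))
```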
Hi, I have a large dataset with isotope values for multiple species per season on different locations from which I performed an ANOVA:
Anova <- Isotopes %>%
group_by(Species) %>%
do(model = aov(d15N~Season+Location, data = Isotopes))
I would like to see the summary of this test for each unique species separately. How can I get this data?
Thank you in advance.
If I understood your question, you can use lapply:
library(dplyr)
# Reproduce a similar dataset
Anova <- iris %>%
mutate(new_var = sample(LETTERS[1:3], size = nrow(.), replace = TRUE)) %>%
group_by(Species) %>%
do(model = aov(Sepal.Length~Petal.Width+new_var, data = .))
# One summary per species, named after the species
summ <- lapply(Anova$model,summary)
names(summ) <- Anova$Species
summ
Df Sum Sq Mean Sq F value Pr(>F)
Petal.Width 1 0.471 0.4709 3.919 0.0538 .
new_var 2 0.090 0.0450 0.374 0.6898
Residuals 46 5.527 0.1202
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Df Sum Sq Mean Sq F value Pr(>F)
Petal.Width 1 3.899 3.899 22.432 2.12e-05 ***
new_var 2 1.162 0.581 3.344 0.0441 *
Residuals 46 7.994 0.174
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Df Sum Sq Mean Sq F value Pr(>F)
Petal.Width 1 1.566 1.5656 4.049 0.0501 .
new_var 2 0.459 0.2295 0.593 0.5566
Residuals 46 17.788 0.3867
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
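If only the p-values are needed rather than the printed summaries, they can be collected into one data frame. The following is a base R sketch of the same per-group idea, assuming `iris` as the stand-in dataset and a simpler one-predictor model; the names `models` and `p_tab` are illustrative.

```r
# Fit one aov per species, using split() instead of group_by()/do().
models <- lapply(split(iris, iris$Species),
                 function(d) aov(Sepal.Length ~ Petal.Width, data = d))

# Stack the p-value column of each summary table into one data frame.
p_tab <- do.call(rbind, lapply(names(models), function(sp) {
  tab <- summary(models[[sp]])[[1]]
  data.frame(Species = sp,
             term = trimws(rownames(tab)),
             p.value = tab[["Pr(>F)"]],
             stringsAsFactors = FALSE)
}))

# Drop the Residuals rows, which carry no p-value.
p_tab <- p_tab[!is.na(p_tab$p.value), ]
rownames(p_tab) <- NULL
print(p_tab)
```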
I'm building a repeated measures ANCOVA using a multi-level framework with the aov function. I have one continuous response variable, two factor predictors, and three continuous covariates. My script for the model is below:
ModelDV <- aov(DV ~ IV1 + IV2 + IV1*IV2 + CV1 + CV2 + CV3 + Error(PartID/(IV1 + IV2 + IV1:IV2)), data)
A snippet of my data set shows how it is formatted:
1 56 CondA1 CondB1 Continuous values
2 45 CondA2 CondB2 -
3 32 CondA3 CondB1 -
4 21 CondA4 CondB2 -
1 10 CondA1 CondB1 -
2 19 CondA2 CondB2 -
3 35 CondA3 CondB1 -
4 45 CondA4 CondB2 -
My conditions are embedded in the error term of the participant ID since this is a fully within-subjects repeated measures model.
I am attempting to conduct a pairwise analysis on these values. My output provides omnibus F-tests:
Error: PartID
Df Sum Sq Mean Sq F value Pr(>F)
CV1 1 348 348 0.442 0.5308
CV2 1 9 9 0.011 0.9193
CV3 1 3989 3989 5.063 0.0654 .
Residuals 6 4727 788
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV1
Df Sum Sq Mean Sq F value Pr(>F)
IV1 1 6222 6222 17.41 0.0024 **
Residuals 9 3217 357
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV2
Df Sum Sq Mean Sq F value Pr(>F)
IV2 2 6215 3107.7 16.18 9.51e-05 ***
Residuals 18 3457 192.1
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV1:IV2
Df Sum Sq Mean Sq F value Pr(>F)
IV1:IV2 2 575.2 287.6 1.764 0.2
Residuals 18 2934.4 163.0
When calculating emmeans via:
emm <- emmeans(ModelDV, ~ IV1)
I get a sensible output.
However, when using this for the covariates:
emm <- emmeans(ModelDV, ~ CV1)
I get the following output:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
Results are averaged over the levels of: IV1, IV2
What am I doing wrong here that a pairwise comparison is not working for the covariate?
The short answer is that you included them as covariates to control for them, not to treat them as part of the explanation in your model. You could of course do pairwise comparisons for the covariates outside the model, but not inside the model framework. I wrote a longer blog post using these tools here...
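A related mechanical point may help here: by default emmeans reduces a numeric predictor to a single reference value (its mean), so there is only one "level" and nothing to compare pairwise. The sketch below uses a plain `lm` on `mtcars` as a stand-in, which is an assumption; it has not been checked against the `aov` model with an `Error()` term from the question. Supplying covariate values through `at`, or estimating the slope with `emtrends`, are two options that emmeans provides.

```r
# Stand-in data: cyl as a factor, wt playing the role of a continuous covariate.
cars <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ cyl + wt, data = cars)

if (requireNamespace("emmeans", quietly = TRUE)) {
  # Default: wt is held at its mean, giving a single row and no pairs.
  emm_default <- emmeans::emmeans(fit, ~ wt)
  print(emm_default)

  # Ask for specific covariate values (chosen arbitrarily here), then compare.
  emm_at <- emmeans::emmeans(fit, ~ wt, at = list(wt = c(2, 3, 4)))
  print(pairs(emm_at))

  # Or estimate the slope of the covariate directly.
  print(emmeans::emtrends(fit, ~ 1, var = "wt"))
}
```

Whether comparing predictions at a few covariate values is meaningful depends on the research question; for a continuous covariate the slope is usually the more natural summary.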
I am running a regression in R
fbReg <- lm(y~x2+x7+x8,table.b1)
I then produce an ANOVA table to analyze the significance of the regression
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x2 1 76.193 76.193 26.172 3.100e-05 ***
x7 1 139.501 139.501 47.918 3.698e-07 ***
x8 1 41.400 41.400 14.221 0.0009378 ***
Residuals 24 69.870 2.911
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Is there anything I can do to make my ANOVA table sum the sums of squares for x2, x7 and x8 into one row instead of listing them separately?
Essentially, I want the ANOVA table to look like this
Df SS MS F value Pr(>F)
Regression 3 257.094 ETC....
Error(Residual) 24 69.870 ETC.....
To illustrate my comment:
> lm2 <- lm(Fertility ~ Catholic+Education+Agriculture, data = swiss)
> lm1 <- lm(Fertility ~ 1, data = swiss)
> anova(lm1,lm2)
Analysis of Variance Table
Model 1: Fertility ~ 1
Model 2: Fertility ~ Catholic + Education + Agriculture
Res.Df RSS Df Sum of Sq F Pr(>F)
1 46 7178.0
2 43 2567.9 3 4610.1 25.732 1.089e-09 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
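The two-row layout asked for can also be assembled by hand from the sequential table. This is a minimal sketch on the same `swiss` model, assuming the goal is one pooled "Regression" row; the object names (`collapsed`, `f_value`) are illustrative. The pooled sum of squares and F value should agree with the model comparison above.

```r
lm2 <- lm(Fertility ~ Catholic + Education + Agriculture, data = swiss)
tab <- anova(lm2)                       # one row per term, plus Residuals
is_resid <- rownames(tab) == "Residuals"

# Pool the term rows into a single "Regression" row.
collapsed <- data.frame(
  Df = c(sum(tab[["Df"]][!is_resid]), tab[["Df"]][is_resid]),
  SS = c(sum(tab[["Sum Sq"]][!is_resid]), tab[["Sum Sq"]][is_resid]),
  row.names = c("Regression", "Residual")
)
collapsed$MS <- collapsed$SS / collapsed$Df

# Overall F test of the regression against the intercept-only model.
f_value <- collapsed$MS[1] / collapsed$MS[2]
p_value <- pf(f_value, collapsed$Df[1], collapsed$Df[2], lower.tail = FALSE)

print(collapsed)
print(c(F = f_value, p = p_value))
```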
The following script
#!/usr/bin/Rscript --vanilla
x <- c(4.5,6.4,7.2,6.7,8.8,7.8,9.6,7.0,5.9,6.8,5.7,5.2)
fertilizer<- factor(c('A','A','A','A','B','B','B','B','C','C','C','C'))
crop <- factor(c('I','II','III','IV','I','II','III','IV','I','II','III','IV'))
av <- aov(x~fertilizer*crop)
summary(av)
produces the following output:
Df Sum Sq Mean Sq
fertilizer 2 13.6800 6.8400
crop 3 2.8200 0.9400
fertilizer:crop 6 6.5800 1.0967
For other data, aov usually gives the F-statistic and associated p-value. What is wrong/special about this data that causes R to omit the juicy parts?
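Should you be using + instead of * in the formula? With only one observation per fertilizer/crop combination, the interaction term uses up all the remaining degrees of freedom, so there are no residuals left to compute an F test against.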
> summary(aov(x~fertilizer + crop))
Df Sum Sq Mean Sq F value Pr(>F)
fertilizer 2 13.6800 6.8400 6.2371 0.03426 *
crop 3 2.8200 0.9400 0.8571 0.51218
Residuals 6 6.5800 1.0967
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
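The degrees-of-freedom bookkeeping can be checked directly. The sketch below reuses the data from the question and compares the residual degrees of freedom of the two formulas; the names `full` and `additive` are illustrative.

```r
x <- c(4.5,6.4,7.2,6.7,8.8,7.8,9.6,7.0,5.9,6.8,5.7,5.2)
fertilizer <- factor(c('A','A','A','A','B','B','B','B','C','C','C','C'))
crop <- factor(c('I','II','III','IV','I','II','III','IV','I','II','III','IV'))

# Every fertilizer/crop cell holds exactly one observation.
cell_counts <- table(fertilizer, crop)
print(cell_counts)

full     <- aov(x ~ fertilizer * crop)   # 2 + 3 + 6 term df, nothing left over
additive <- aov(x ~ fertilizer + crop)   # the interaction df become residual df

# Residual degrees of freedom of each model.
print(c(full = df.residual(full), additive = df.residual(additive)))
```

With zero residual degrees of freedom there is no error mean square, which is why the F and p columns disappear for the interaction model.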