Why does Error() in aov() give three levels?

I'm trying to understand how to properly run a repeated-measures or nested ANOVA in R, without using mixed models. From consulting tutorials, the formula for a one-variable repeated-measures ANOVA is:
aov(Y ~ IV + Error(SUBJECT/IV))
where IV is the within-subjects factor and SUBJECT is the identity of the subjects. However, most examples show output with two strata: Error: subject and Error: subject:WS. Meanwhile I am getting three strata (Error: subject, Error: subject:WS, and Error: Within). Why do I have three strata, when I'm trying to specify only two (within and between)?
Here is a reproducible example:
data(beavers)
id <- rep(c("beaver1", "beaver2"), times = c(nrow(beaver1), nrow(beaver2)))
data <- data.frame(id = id, rbind(beaver1, beaver2))
data$activ <- factor(data$activ)
aov(temp ~ activ + Error(id/activ), data = data)
temp is a continuous measure of temperature, id is the identity of the beaver, and activ is a binary factor for activity. The output of the model is:
Error: id
Df Sum Sq Mean Sq
activ 1 28.74 28.74
Error: id:activ
Df Sum Sq Mean Sq F value Pr(>F)
activ 1 15.313 15.313 18.51 0.145
Residuals 1 0.827 0.827
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 210 7.85 0.03738
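The third stratum seems to come from replication within the id:activ cells: there are many temperature readings per beaver at each activity level, so aov() keeps a replicate-level Within stratum in addition to the two you asked for. A minimal sketch of this check, assuming (purely for illustration) that it is acceptable to collapse to one mean per id x activ cell:
cell_means <- aggregate(temp ~ id + activ, data = data, FUN = mean)  # one row per id x activ combination
summary(aov(temp ~ activ + Error(id/activ), data = cell_means))
With a single observation per cell, only the Error: id and Error: id:activ strata remain.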

Denominator degrees of freedom (when using lmer) are different in loop than outside loop

I am looping an lmer function over 20+ y variables, with 159 rows of observations. When I run an lmer function inside the loop, denominator degrees of freedom are lost. Outside the loop (or even if I specify one y variable inside the loop), denominator df is as expected.
I have a df with 20 y variables for plants in two chambers with 4 treatments (replicated in both treatments).
lmer_test <- lmer(leaves_mean~Treatment + (1|Chamber), data = df)
aov_test <- anova(lmer_test)
aov_test
This gives:
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Treatment 97.593 32.531 3 153 1.1966 0.3131
I have a loop:
for (u in colnames(df)[6:ncol(df)]) {
  Y_variable_Rex <- names(df[u])
  lmer_u_Rex <- lmer(get(u) ~ Treatment + (1|Chamber), data = df)
  aov_u_Rex <- anova(lmer_u_Rex)
  aov_u_Rex
}
That gives:
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Treatment 208420 69473 3 4.6406e-14 3.5911e+31 1
If I specify exactly the same code as outside the loop (replacing get(u) with leaves_mean), I get the correct result:
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Treatment 97.593 32.531 3 153 1.1966 0.3131
The get(u) specification should produce exactly the same result inside the loop. What is happening that makes the denominator degrees of freedom different (essentially 0)?
I answered my own question...
The loop was reporting denominator df for a different variable (the last one).
There are missing values in that column, because it was a y variable of sampled values with many missing rows.
I was able to reproduce the error and then resolve it within the loop.
I was approaching debugging here like you might in a MATLAB loop (stepping through each row), which is not a good method for fixing R loops.
Thank you for reading this question and considering answering!
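As a side note on the loop itself (not from the original post), building the formula explicitly for each column avoids relying on get() and makes each fitted model record the actual response name; a rough sketch, assuming lmerTest is loaded and df is the data frame described above:
library(lmerTest)
for (u in colnames(df)[6:ncol(df)]) {
  f <- as.formula(paste(u, "~ Treatment + (1 | Chamber)"))  # e.g. leaves_mean ~ Treatment + (1 | Chamber)
  fit <- lmer(f, data = df)
  print(anova(fit))  # an explicit print() is needed to see output inside a loop
}
Note that lmer() drops rows with missing values in the current response, so columns with many NAs will legitimately show fewer denominator degrees of freedom.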

Model simplification (two way ANOVA)

I am using ANOVA to analyse results from an experiment to see whether there are any effects of my explanatory variables (Heating and Dungfauna) on my response variable (Biomass). I started by looking at the main effects and interaction:
full.model <- lm(log(Biomass) ~ Heating*Dungfauna, data= df)
anova(full.model)
I understand that it is necessary to complete model simplification, removing non-significant interactions or effects to eventually reach the simplest model which still explains the results. I tried two ways of removing the interaction. However, when I manually remove the interaction (Heating*Dungfauna -> Heating+Dungfauna), the new ANOVA gives a different output from when I use this model simplification 'shortcut':
new.model <- update(full.model, . ~ . - Dungfauna:Heating)
anova(new.model)
Which way is the appropriate way to remove the interaction and simplify the model?
In both cases the data are log-transformed:
lm(log(CC_noAcari_EmergencePatSoil)~ Dungfauna*Heating, data= biomass)
ANOVA output from manually changing Heating*Dungfauna to Heating+Dungfauna:
Response: log(CC_noAcari_EmergencePatSoil)
Df Sum Sq Mean Sq F value Pr(>F)
Heating 2 4.806 2.403 5.1799 0.01012 *
Dungfauna 1 37.734 37.734 81.3432 4.378e-11 ***
Residuals 39 18.091 0.464
ANOVA output from using simplification 'shortcut':
Response: log(CC_noAcari_EmergencePatSoil)
Df Sum Sq Mean Sq F value Pr(>F)
Dungfauna 1 41.790 41.790 90.0872 1.098e-11 ***
Heating 2 0.750 0.375 0.8079 0.4531
Residuals 39 18.091 0.464
R's anova and aov functions compute Type I ("sequential") sums of squares, so the order in which the predictors are specified matters. A model specified as y ~ A + B tests A ignoring B, and then B adjusted for A; y ~ B + A tests B ignoring A, and then A adjusted for B. Notice that your first model specifies Dungfauna*Heating, while your manually simplified model uses Heating+Dungfauna, so the terms enter in a different order and the sequential tests differ.
Consider this simple example using the "mtcars" data set. Here I specify two additive models (no interactions). Both models specify the same predictors, but in different orders:
add.model <- lm(log(mpg) ~ vs + cyl, data = mtcars)
anova(add.model)
Df Sum Sq Mean Sq F value Pr(>F)
vs 1 1.22434 1.22434 48.272 1.229e-07 ***
cyl 1 0.78887 0.78887 31.103 5.112e-06 ***
Residuals 29 0.73553 0.02536
add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)
anova(add.model2)
Df Sum Sq Mean Sq F value Pr(>F)
cyl 1 2.00795 2.00795 79.1680 8.712e-10 ***
vs 1 0.00526 0.00526 0.2073 0.6523
Residuals 29 0.73553 0.02536
You could specify Type II or Type III sums of squares using car::Anova:
car::Anova(add.model, type = 2)
car::Anova(add.model2, type = 2)
Which gives the same result for both models:
Sum Sq Df F value Pr(>F)
vs 0.00526 1 0.2073 0.6523
cyl 0.78887 1 31.1029 5.112e-06 ***
Residuals 0.73553 29
summary() also gives results that are consistent regardless of the order of the predictors, though it's not quite a formal ANOVA table:
summary(add.model)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.92108 0.20714 18.930 < 2e-16 ***
vs -0.04414 0.09696 -0.455 0.652
cyl -0.15261 0.02736 -5.577 5.11e-06 ***
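As an aside (not part of the original answer), drop1() is a common way to decide whether the interaction can be removed, because it tests each droppable term against the full model regardless of the order in which the terms were written; a minimal sketch, assuming full.model is the lm() fit from the question:
drop1(full.model, test = "F")  # by marginality, only Heating:Dungfauna is offered for dropping
If the interaction term is not significant here, refitting with Heating + Dungfauna is the natural next step.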

R AOV will not do the interaction between two variables

I have in the past had R perform ANOVAs with an interaction between two variables; however, I am unable to get it to do so now.
Code:
x.aov <- aov(thesis_temp$`Transformed Time to Metamorphosis` ~ thesis_temp$Sex + thesis_temp$Mature + thesis_temp$Sex * thesis_temp$Mature)
Output:
Df Sum Sq Mean Sq F value Pr(>F)
thesis_temp$Sex 1 0.000332 0.0003323 1.370 0.2452
thesis_temp$Mature 1 0.000801 0.0008005 3.301 0.0729 .
Residuals 82 0.019886 0.0002425
I want it to also include a Sex x Mature interaction, but it will not produce this. Any suggestions on how to get R to run the interaction analysis as well?
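A rough sketch of the more idiomatic call, assuming thesis_temp contains the columns shown in the question; the * operator already expands to both main effects plus their interaction, and passing data = avoids repeating thesis_temp$:
x.aov <- aov(`Transformed Time to Metamorphosis` ~ Sex * Mature, data = thesis_temp)
summary(x.aov)
If the interaction row still does not appear, it is worth checking that both columns are factors and that more than one combination of their levels actually occurs in the data.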

How do I run Kruskal and post hoc tests on multiple variables in R?

Please excuse me if I have not formatted my code correctly, as I am new to the site. I also do not know how to provide sample data properly.
I have a data set of 42 obs. and 37 variables (the first column being the group, with 3 groups) of non-normally distributed data; I want to compare all 36 parameters across the 3 groups and do a subsequent post hoc test (pairwise.wilcox?).
The data are flow cell counts for three different patient groups. I have been able to perform the initial comparison by creating a formula and running an aov (though I would like to use Kruskal-Wallis), but I have not found a way to apply the post hoc test to all variables in the same way.
#Data
Type Neutrophils Monocytes NKC .....
------------------------------------------
IN 546 2663 545
IN 0797 7979 008
OUT 0899 3899 345
OUT 6868 44533 689
HC 9898 43443 563
# cbind all variables together to run the model on all of them
formula <- as.formula(paste0("cbind(", paste(names(LessCount)[-1], collapse = ","), ") ~ Type"))
print(formula)
# Run the test on the model
fit <- aov(formula, data = LessCount)
# Print the results
summary(fit)
Response Neutrophils :
Df Sum Sq Mean Sq F value Pr(>F)
Type 2 18173966 9086983 1.8099 0.1771
Residuals 39 195806220 5020672
Response Monocytes :
Df Sum Sq Mean Sq F value Pr(>F)
Type 2 694945 347472 0.7131 0.4964
Residuals 39 19004809 487303
Response Mono.Classic :
Df Sum Sq Mean Sq F value Pr(>F)
Type 2 1561778 780889 2.5842 0.08833 .
Residuals 39 11785116 302182
# Export the ANOVA results
capture.output(summary(fit), file = "test1.csv")
# If significant, check which groups differ (currently doing this by hand for each variable)
pairwise.wilcox.test(LessCount$pDCs, LessCount$Type,
                     p.adjust.method = "BH")
I get a table of aov results for every variable in my console, but I would like to do the same for the post hoc test, since I need every p-value.
Thank you in advance.
Maybe you can directly use the function kruskal.test() and get the p.values.
Here is an example with the iris dataset. I use the function apply() in order to apply the kruskal.test function to each variable (except Species, which is the variable with group information).
data(iris)
apply(iris[-5], 2, function(x) kruskal.test(x = x, g = iris$Species)$p.value)
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 8.918734e-22 1.569282e-14 4.803974e-29 3.261796e-29
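To apply the post hoc step to every variable in the same spirit, something along these lines should work (a sketch assuming LessCount is the data frame from the question, with the group variable Type in the first column):
pw <- lapply(LessCount[-1], function(x)
  pairwise.wilcox.test(x, LessCount$Type, p.adjust.method = "BH")$p.value)
pw  # a named list of matrices of BH-adjusted pairwise p-values, one per variable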

P-values from aov in R

I'm conducting a simulation study in R. Basically, I generate fake data sets and then run an ANOVA on the data using the aov function. But I'm having difficulty extracting p-values. Previous questions (Extract p-value from aov) do not help, because I am running a mixed ANOVA.
First I have an ANOVA:
results <- summary(aov(dv~(A*B*C*D*E)+Error(subj/(A*B*C*D)), data = mdata)) # conduct repeated measures ANOVA
which generates this output:
Error: subj
Df Sum Sq Mean Sq F value Pr(>F)
E 1 1039157 1039157 0.95 0.334
Residuals 58 63428016 1093586
Error: subj:A
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1996 1996 0.220 0.641
A:E 1 2294 2294 0.253 0.617
Residuals 58 526389 9076
...
I'm truncating the output for space. What I want is a list of p-values with the effect names (A or A:E). I have halfway succeeded, but it's messy. I can extract the p-values using this get_p function that I made:
# Function
get_p <- function(results, head) {
  results[[1]]$'Pr(>F)'
}
# Get p-values
p <- sapply(results, get_p)
I end up with this:
$`Error: subj`
[1] 0.3337094 NA
$`Error: subj:A`
[1] 0.6408826 0.6170181 NA
...
Any ideas on how to get a list of p-values (.6408, .6170) and effect names ('A', 'A:E')?
I found the answer, which seems to be:
get_p1 <- function(results) {
  results[[1]]$'Pr(>F)'[[1]]
}
get_p2 <- function(results) {
  results[[1]]$'Pr(>F)'[[2]]
}
pvals <- c(sapply(results, get_p1), sapply(results, get_p2))
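A slightly tidier sketch that keeps the effect names attached to the p-values, assuming results is the summary() object created above (each stratum of a summary.aovlist holds an ANOVA table whose row names are the effect names):
pvals <- do.call(rbind, lapply(names(results), function(stratum) {
  tab <- results[[stratum]][[1]]        # the ANOVA table for this error stratum
  data.frame(stratum = stratum,
             effect  = trimws(rownames(tab)),
             p.value = tab[["Pr(>F)"]],
             row.names = NULL)
}))
pvals <- pvals[pvals$effect != "Residuals", ]  # residual rows have no p-value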
