R One-Way ANOVA (getting only 1 DF and expecting 2 DFs) - r

I'm working through the examples of One-Way ANOVA on the UCLA website http://www.ats.ucla.edu/stat/r/faq/posthoc.htm.
When I run the command a1 <- aov(write ~ ses), my output differs from the example output. In particular, summary(a1) reports 1 DF for ses, but there are three ses categories (1, 2, 3), so I'm expecting 2 DFs, which is what the example on the website shows. I've checked the 'write' and 'ses' columns, and the counts and averages match the example, but the result from aov(write ~ ses) doesn't. Has something changed? Why am I getting only 1 DF?
hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep=",", header=TRUE)
a1 <- aov(write ~ ses, data = hsb2)
summary(a1)
# Df Sum Sq Mean Sq F value Pr(>F)
# ses 1 770 769.8 8.908 0.0032 **
# Residuals 198 17109 86.4

The page you are learning from has an error: it doesn't tell you how to enter the data correctly. The ses variable is supposed to be a factor, but as str() shows, it is read in as numeric (integer), so aov() fits a single slope for it and uses only 1 Df:
str(hsb2$ses)
If we convert it to a factor, we get the same answer as the example:
hsb2$ses <- as.factor(hsb2$ses)
a1 <- aov(write ~ ses, data=hsb2)
summary(a1)
Df Sum Sq Mean Sq F value Pr(>F)
ses 2 859 429.4 4.97 0.00784 **
Residuals 197 17020 86.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In addition, using attach is highly discouraged by most R users.
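The Df difference can be reproduced without downloading the file. The following is a minimal sketch on simulated data (the data frame d and its values are made up for illustration and are not the hsb2 data); it only shows that a numeric predictor takes 1 Df while the same column wrapped in factor() takes one Df fewer than its number of levels.

```r
# Simulated stand-in for hsb2 (hypothetical values, not the UCLA data)
set.seed(1)
d <- data.frame(
  ses   = sample(1:3, 200, replace = TRUE),  # integer codes, as read from a CSV
  write = rnorm(200, mean = 52, sd = 9)
)

# Numeric predictor: aov() fits one slope, so ses uses 1 Df
fit_num <- aov(write ~ ses, data = d)
df_num  <- summary(fit_num)[[1]][["Df"]]

# Factor predictor: one mean per level, so ses uses 3 - 1 = 2 Df
fit_fac <- aov(write ~ factor(ses), data = d)
df_fac  <- summary(fit_fac)[[1]][["Df"]]

df_num  # 1 198
df_fac  # 2 197
```

Wrapping the variable in factor() inside the formula is an alternative to converting the column in place; both give the same table.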

Related

R: Calculating ANOVA Sum sqr for a model with interacting numerical and categorical variables

I need to know how the Sum Sq column of the anova() function in R is calculated, for a linear model of the form:
modelXg <-lm(Y ~ X * group, data)
(which is equivalent to lm(Y~ X+group+X:group, data=dat) )
where: "X" is a numerical variable, and "group" is a categorical one.
The function anova(modelXg) returns a table like:
Analysis of Variance Table
Response: TMIN
Df Sum Sq Mean Sq F value Pr(>F)
X 1 6476 6476.1 282.9208 < 2.2e-16 ***
group 1 1176 1176.4 51.3956 7.666e-13 ***
X:group 1 64 64.2 2.8058 0.09393 .
Residuals 45130 1033029 22.9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
What I need is to know how to calculate all the terms of the Sum Sq column, described in a way as easy and reproducible as possible, because I need to implement it in C#.
I have already searched a lot across the Net, but I didn't find this exact case. I found some useful info in "Interpretation of Sum Sq in ANOVA with numeric independent variable", but it is incomplete for this case, because the model there does not involve the interaction between both variables.
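For what it's worth, anova() on a single lm fit reports sequential (Type I) sums of squares: each term's Sum Sq is the drop in residual sum of squares when that term is added to the model containing all earlier terms, and the Residuals row is the RSS of the full model. The sketch below checks this on simulated data (dat, its values, and the helper rss() are made up for illustration), so the only thing to port to C# would be an ordinary least-squares fit that returns the RSS.

```r
# Simulated data (hypothetical): numeric X, two-level factor group
set.seed(42)
n   <- 120
dat <- data.frame(
  X     = runif(n, 0, 10),
  group = factor(sample(c("a", "b"), n, replace = TRUE))
)
dat$Y <- 1 + 0.5 * dat$X + 2 * (dat$group == "b") + rnorm(n)

# Residual sum of squares of an OLS fit (illustrative helper)
rss <- function(formula) sum(residuals(lm(formula, data = dat))^2)

rss0 <- rss(Y ~ 1)          # intercept only: total SS about the mean
rss1 <- rss(Y ~ X)
rss2 <- rss(Y ~ X + group)
rss3 <- rss(Y ~ X * group)  # full model

# Sequential sums of squares, in the order the terms enter the formula
manual <- c(X = rss0 - rss1, group = rss1 - rss2,
            "X:group" = rss2 - rss3, Residuals = rss3)

builtin <- anova(lm(Y ~ X * group, data = dat))[["Sum Sq"]]
all.equal(unname(manual), builtin)
```

Because the sums of squares are sequential, writing the formula as group * X would generally change the first two rows.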

running multiple anova tests in r

I have seven groups that I want to run an ANOVA test on to see if there is a significant difference among them based on a trait. And I have about 600 traits.
I have already calculated the mean, standard deviation, and variance per group and per trait. The seven groups have different sample sizes. How can I arrange my data so that I will be able to run them all in R?
set.seed(2)
sampledata <- expand.grid(group = paste0("group", 1:7), trait = paste0("trait", 1:600), value = 1:5)
sampledata$value <- rnorm(nrow(sampledata))
sampledata.aov <- aov(value ~ group * trait, data = sampledata)
anova(sampledata.aov)
Analysis of Variance Table
Response: value
Df Sum Sq Mean Sq F value Pr(>F)
group 6 7.1 1.1784 1.1670 0.32072
trait 599 658.0 1.0985 1.0878 0.07096 .
group:trait 3594 3613.0 1.0053 0.9955 0.56604
Residuals 16800 16964.3 1.0098
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
A warning, though: even with random numbers, you're more likely than not to see at least one "significant" difference purely by chance when you test this many traits at once.
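If the goal is one test per trait rather than one big model, a possible arrangement is the same long format, split by trait, with the p-values adjusted afterwards. This is only a sketch on random data; the column name rep and the choice of the Benjamini-Hochberg method are my assumptions, not something from the question.

```r
# Long format: one row per observation (random data, as in the example above)
set.seed(2)
sampledata <- expand.grid(group = paste0("group", 1:7),
                          trait = paste0("trait", 1:600),
                          rep   = 1:5)  # hypothetical replicate index
sampledata$value <- rnorm(nrow(sampledata))

# One one-way ANOVA per trait; keep the p-value of the group effect
pvals <- sapply(split(sampledata, sampledata$trait), function(d) {
  summary(aov(value ~ group, data = d))[[1]][["Pr(>F)"]][1]
})

# Adjust for having run 600 tests (Benjamini-Hochberg, an assumed choice)
padj <- p.adjust(pvals, method = "BH")

sum(pvals < 0.05)  # raw "hits" on pure noise
sum(padj < 0.05)   # hits remaining after adjustment
```

Unequal group sizes are not a problem for this layout: each group simply contributes as many rows as it has observations.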

3-way ANOVA for reshaped data in R

I have just discovered reshaping in R and am unsure of how to proceed with an ANOVA once the data is reshaped. I found this site which has the data organized in a way very similar to my own data. If I were using this hypothetical data, how would I conduct a 3-way ANOVA say between race, program and subject? Now that the subjects have been reshaped into a single column I'm having trouble seeing how to include this variable using the typical ANOVA code. Any help would be much appreciated!
Assuming the data are in 'long format' and 'score' is your dependent variable you could do something like:
mymodel <- aov(score ~ prog + race + subj, data = l)
summary(mymodel)
Which in this case yields:
Df Sum Sq Mean Sq F value Pr(>F)
prog 1 2864 2864 31.32 2.82e-08 ***
race 1 5064 5064 55.39 2.14e-13 ***
subj 4 106 27 0.29 0.885
Residuals 993 90780 91
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
N.B. this model contains only the main effects, with no interaction terms.
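To get the two-way and three-way interactions as well, the terms can be joined with * instead of +. The sketch below uses a small simulated long-format data frame (long, its level names, and its scores are made up; it is not the data from the linked site) and, like the model above, treats all rows as independent.

```r
# Simulated long-format data (hypothetical): 40 students x 3 subjects
set.seed(3)
long <- expand.grid(id = 1:40, subj = c("read", "write", "math"))
long$prog  <- factor(ifelse(long$id %% 2 == 0, "academic", "general"))
long$race  <- factor(ifelse(long$id %% 4 < 2, "r1", "r2"))
long$score <- rnorm(nrow(long), mean = 50, sd = 10)

# prog * race * subj expands to all main effects plus
# prog:race, prog:subj, race:subj and prog:race:subj
full <- aov(score ~ prog * race * subj, data = long)
summary(full)
```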

lmerTest results in console but not displayed in knitted PDF

I have a question regarding lmerTest for approximating the degrees of freedom and p values for linear mixed model.
I just took a stats class in R to help me with my behavior experiments in the lab, so I am very new to this. In RStudio, I can run anova() after loading the lmerTest library and see the results in my console, shown below:
lmsocial <- lmer(social~ grps + stim + grps*stim + (1|cohort) +(1|subj))
library(lmerTest)
anova(lmsocial)
Analysis of Variance Table of type 3 with Satterthwaite
approximation for degrees of freedom
Sum Sq Mean Sq NumDF DenDF F.value Pr(>F)
grps 22471 22471 1 21 5.5922 0.027747 *
stim 54289 54289 1 22 13.5107 0.001326 **
grps:stim 40423 40423 1 22 10.0599 0.004416 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
However, below is the readout I get in my PDF when I knit in RStudio (same for knitting to HTML). There are no errors when knitting, just some info missing:
lmsocial <- lmer(social~ grps + stim + grps*stim + (1|cohort) +(1|subj))
library(lmerTest)
anova(lmsocial)
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## grps 1 22471 22471 5.5922
## stim 1 54289 54289 13.5107
## grps:stim 1 40423 40423 10.0599
Am I missing something? When generating a report, it would be nice to display the resulting ANOVA table from lmerTest.
How can I get it to knit properly?
lmerTest is doing something a bit sneaky; I haven't figured out what yet. In the meantime, it seems to work as long as you add an explicit print() statement:
library("lmerTest")
fm1 <- lmer(Reaction~Days + (Days|Subject),sleepstudy)
print(a1 <- anova(fm1))
(or print(anova(fm1)), or a1 <- anova(fm1); print(a1))
Alternatively, the lmerTest package needs to be attached before the lmer model is fitted, so that lmerTest's version of lmer() is the one used. The following should work:
library(lmerTest)
lmsocial <- lmer(social~ grps + stim + grps*stim + (1|cohort) +(1|subj))
anova(lmsocial)

"weighted" regression in R

I have created a script like the one below to do something I call "weighted" regression:
library(plyr)
set.seed(100)
temp.df <- data.frame(uid=1:200,
                      bp=sample(x=c(100:200),size=200,replace=TRUE),
                      age=sample(x=c(30:65),size=200,replace=TRUE),
                      weight=sample(c(1:10),size=200,replace=TRUE),
                      stringsAsFactors=FALSE)
temp.df.expand <- ddply(temp.df,
                        c("uid"),
                        function(df) {
                          data.frame(bp=rep(df[,"bp"],df[,"weight"]),
                                     age=rep(df[,"age"],df[,"weight"]),
                                     stringsAsFactors=FALSE)})
temp.df.lm <- lm(bp~age,data=temp.df,weights=weight)
temp.df.expand.lm <- lm(bp~age,data=temp.df.expand)
You can see that in temp.df each row has a weight. What I mean is that there are 1178 samples in total, but rows with the same bp and age are merged into one row, and the count is stored in the weight column.
I used the weights parameter of the lm function, then cross-checked the result against another data frame in which temp.df is "expanded". But I found that the lm outputs differ for the two data frames.
Did I misinterpret the weights parameter of the lm function? Can anyone let me know how to run the regression properly (i.e. without expanding the data frame manually) for a dataset presented like temp.df? Thanks.
The problem here is that the residual degrees of freedom are based on the number of rows rather than on the sum of the weights, so the Df and mean-square statistics come out wrong. This will correct the problem:
# Start from the ANOVA table of the weighted fit
temp.df.lm.aov <- anova(temp.df.lm)
n.terms <- length(temp.df.lm.aov$Df)
# Residual Df: total weight minus model Df minus 1 for the intercept
temp.df.lm.aov$Df[n.terms] <- sum(temp.df.lm$weights) -
  sum(temp.df.lm.aov$Df[-n.terms]) - 1
# Recompute the statistics that depend on the Df
temp.df.lm.aov$`Mean Sq` <- temp.df.lm.aov$`Sum Sq` / temp.df.lm.aov$Df
temp.df.lm.aov$`F value`[1] <- temp.df.lm.aov$`Mean Sq`[1] /
  temp.df.lm.aov$`Mean Sq`[2]
temp.df.lm.aov$`Pr(>F)`[1] <- pf(temp.df.lm.aov$`F value`[1],
                                 temp.df.lm.aov$Df[1], temp.df.lm.aov$Df[2],
                                 lower.tail = FALSE)
temp.df.lm.aov
Analysis of Variance Table
Response: bp
Df Sum Sq Mean Sq F value Pr(>F)
age 1 8741 8740.5 10.628 0.001146 **
Residuals 1176 967146 822.4
Compare with:
> anova(temp.df.expand.lm)
Analysis of Variance Table
Response: bp
Df Sum Sq Mean Sq F value Pr(>F)
age 1 8741 8740.5 10.628 0.001146 **
Residuals 1176 967146 822.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I am a bit surprised this has not come up more often on R-help. Either that or my search strategy development powers are weakening with old age.
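As a quick check of what does and does not differ between the two fits, here is a base-R sketch (no plyr) that expands the rows with rep(); the object names expanded, fit_w, and fit_e are mine. With integer frequency weights, the coefficients and the weighted residual sum of squares agree with the expanded fit, and only the residual degrees of freedom differ.

```r
set.seed(100)
temp.df <- data.frame(bp     = sample(100:200, 200, replace = TRUE),
                      age    = sample(30:65,   200, replace = TRUE),
                      weight = sample(1:10,    200, replace = TRUE))

# Repeat each row `weight` times (base-R equivalent of the ddply expansion)
expanded <- temp.df[rep(seq_len(nrow(temp.df)), temp.df$weight), c("bp", "age")]

fit_w <- lm(bp ~ age, data = temp.df, weights = weight)  # weighted fit
fit_e <- lm(bp ~ age, data = expanded)                   # expanded fit

all.equal(coef(fit_w), coef(fit_e))          # same coefficients
all.equal(deviance(fit_w), deviance(fit_e))  # same (weighted) residual SS

df.residual(fit_w)  # 198: based on the 200 rows
df.residual(fit_e)  # sum(temp.df$weight) - 2: based on the expanded rows
```

So the standard errors, F value, and p-value of the weighted fit are computed against 198 residual Df, which is what the correction above replaces.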
