3-way ANOVA for reshaped data in R

I have just discovered reshaping in R and am unsure of how to proceed with an ANOVA once the data is reshaped. I found this site which has the data organized in a way very similar to my own data. If I were using this hypothetical data, how would I conduct a 3-way ANOVA say between race, program and subject? Now that the subjects have been reshaped into a single column I'm having trouble seeing how to include this variable using the typical ANOVA code. Any help would be much appreciated!

Assuming the data are in 'long format' and 'score' is your dependent variable, you could do something like:
mymodel <- aov(score ~ prog + race + subj, data = l)
summary(mymodel)
Which in this case yields:
             Df Sum Sq Mean Sq F value   Pr(>F)
prog          1   2864    2864   31.32 2.82e-08 ***
race          1   5064    5064   55.39 2.14e-13 ***
subj          4    106      27    0.29    0.885
Residuals   993  90780      91
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
n.b. this model contains only the main effects
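To go beyond the main effects, the + in the formula can be replaced by *, which expands to all main effects plus every two-way and three-way interaction. The following is only a sketch on simulated data: the data frame l and the columns prog, race, subj and score mirror the names used above, but the values (and the id column) are made up for illustration.

```r
set.seed(42)

# Hypothetical long-format data: one row per student (id) per subject
l <- expand.grid(id = 1:40, subj = c("read", "write", "math"))
l$prog  <- factor(ifelse(l$id %% 2 == 0, "academic", "general"))
l$race  <- factor(ifelse(l$id %% 4 < 2, "a", "b"))
l$subj  <- factor(l$subj)
l$score <- rnorm(nrow(l), mean = 50, sd = 10)  # simulated scores

# prog * race * subj = main effects + all 2-way + the 3-way interaction
full_model <- aov(score ~ prog * race * subj, data = l)
summary(full_model)

# The terms that were fitted
attr(terms(full_model), "term.labels")
```

If each student contributes a score for every subject, the rows are not independent, and a repeated-measures specification (for example an Error() term in aov) may be more appropriate; that is not shown here.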

Related

R: Calculating ANOVA Sum sqr for a model with interacting numerical and categorical variables

I need to know how the Sum Sq column of the anova() function in R is calculated for a linear model of the form:
modelXg <- lm(Y ~ X * group, data = dat)
(which is equivalent to lm(Y ~ X + group + X:group, data = dat))
where "X" is a numerical variable and "group" is a categorical one.
The function anova(modelXg) returns a table like:
Analysis of Variance Table
Response: TMIN
             Df  Sum Sq Mean Sq  F value    Pr(>F)
X             1    6476  6476.1 282.9208 < 2.2e-16 ***
group         1    1176  1176.4  51.3956 7.666e-13 ***
X:group       1      64    64.2   2.8058   0.09393 .
Residuals 45130 1033029    22.9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
What I need is to know how to calculate every term of the Sum Sq column, described as simply and reproducibly as possible, because I need to implement it in C#.
I have already searched a lot across the Net, but did not find this exact case. I found some useful info in "Interpretation of Sum Sq in ANOVA with numeric independent variable", but it is incomplete for this case because the model there does not involve the interaction between the two variables.
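For what it is worth, anova() on a single lm fit reports sequential (Type I) sums of squares: each row is the drop in residual sum of squares when that term is added to the model containing all the terms listed before it. The sketch below checks this on simulated data; dat, its values and the helper rss() are made up for illustration and are not the asker's data.

```r
set.seed(1)

# Hypothetical data: numeric X, two-level factor group
dat <- data.frame(X = runif(60),
                  group = factor(rep(c("a", "b"), each = 30)))
dat$Y <- 2 + 3 * dat$X + ifelse(dat$group == "b", 1, 0) + rnorm(60)

# Residual sum of squares of a fitted model
rss <- function(fit) sum(residuals(fit)^2)

# Nested models, adding one term at a time in formula order
m0 <- lm(Y ~ 1, data = dat)
m1 <- lm(Y ~ X, data = dat)
m2 <- lm(Y ~ X + group, data = dat)
m3 <- lm(Y ~ X * group, data = dat)

manual_ss <- c(X         = rss(m0) - rss(m1),
               group     = rss(m1) - rss(m2),
               `X:group` = rss(m2) - rss(m3),
               Residuals = rss(m3))

table_ss <- anova(m3)[["Sum Sq"]]

# Compare the hand-computed values with the anova() table
all.equal(unname(manual_ss), table_ss)
```

So an implementation in another language only needs an ordinary least-squares fit for each of the nested models and the differences of their residual sums of squares. Note that the result depends on the order of the terms in the formula.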

How do I treat female and male as binary variables when I am working with binomial data?

I am a complete beginner in R/R Studio, coding and statistics in general.
In R, I am running a GLM where my Y variable is a no/yes (0/1) category and my X variable is a Sex category (female/male).
So I have run the following script:
hello <- read.csv(file.choose())
hello$sexbin <- ifelse(hello$Sex == 'm',0,ifelse(hello$Sex == 'f',1,NA))
modifhello <- subset(hello,hello$Combi_cag_long>=36)
model1 <- glm(modifhello$VAB~modifhello$Sex, family=binomial(link=logit),
na.action=na.exclude, data=modifhello)
summary.lm(model1)
However, in my output, R seems to have split male/female as two separate variables, suggesting that it is not treating them as proper binary variables:
Coefficients
                Estimate Std. Error t value Pr(>|t|)
(Intercept)       -3.689      1.009  -3.656 0.000258 ***
modifhello$Sexf    2.506      1.010   2.482 0.013084 *
modifhello$Sexm    2.922      1.010   2.894 0.003820 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
What do I need to add to my script to correct this?
FOUND THE SOLUTION
I simply need to use modifhello$VAB~modifhello$sexbin rather than modifhello$VAB~modifhello$Sex (as this is the old column).
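A note on why this happens: with an intercept in the model, a two-level factor produces only one coefficient, so seeing both Sexf and Sexm suggests the Sex column has a third level (blank entries, perhaps; that is a guess about the data). A sketch of an alternative that keeps Sex as a factor is below. The data are simulated, and the blank-string level is an assumption made for illustration. It also uses bare column names with data = and calls summary(), which dispatches to the glm method, instead of summary.lm().

```r
set.seed(2)

# Hypothetical data with some blank Sex entries
hello <- data.frame(
  Sex = sample(c("f", "m", ""), 200, replace = TRUE,
               prob = c(0.45, 0.45, 0.10)),
  VAB = rbinom(200, 1, 0.4)
)

# Treat blanks as missing and keep exactly two levels ("m" is the reference)
hello$Sex[hello$Sex == ""] <- NA
hello$Sex <- factor(hello$Sex, levels = c("m", "f"))

# Bare column names in the formula; rows with NA are dropped by default
model1 <- glm(VAB ~ Sex, family = binomial(link = "logit"), data = hello)
summary(model1)

names(coef(model1))  # "(Intercept)" "Sexf"
```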

Is there a way to get a 10% significance level in the results of Anova in R?

I need to use the aov (Anova) command in R to test for an interaction between two predictors in a data frame as follows:
av1<-aov(Index~Country*Period,cpi)
but I need the results at a 10% significance level (rather than the default of 5%). Is there a way to do this using the aov function in R?
Imagine the scenario of:
The Consumer Price Index (expressed as a percentage) is randomly sampled in Ireland, UK and France over Period 1 = {Jan – Apr}, Period 2 = {May – Aug} and Period 3 = {Sept – Dec}. Results are given below. It is of interest to know whether the Index varies by Country and by Period.
av1<-aov(Index~Country*Period,cpi)
# To produce the 2-way ANOVA table we use the function summary(…) in R
summary(av1)
The results from this Anova command use a 5% significance level (according to the lecturer on our course).
Is there a way to get a significance level of 10%?
Any advice gratefully received!
This is the result at 5% (including a note from the Lecturer afterwards):
summary(av1)
               Df Sum Sq Mean Sq F value   Pr(>F)
Country         3  3.669  1.2230 117.788 1.54e-07 ***
Period          2  0.053  0.0267   2.569   0.1310
Country:Period  3  0.121  0.0402   3.869   0.0498 *
Residuals       9  0.093  0.0104
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Lecturer's question: Is there evidence of an interaction effect between the factors ‘Country’ and ‘Period’? Explain.
Lecturer's answer: The interaction effect between the factors ‘Country’ and ‘Period’ is marginally significant at 5% (p=0.049)
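As far as I know, aov() does not apply any significance level itself: summary() reports p-values, and the 5% or 10% threshold only enters when you compare a p-value against it (the "Signif. codes" legend already marks p < 0.1 with a "."). A sketch of that comparison is below; the cpi data frame here is simulated, and the rep column and the name alpha are made up for illustration.

```r
set.seed(3)

# Hypothetical data: two observations per Country x Period cell
cpi <- expand.grid(Country = c("Ireland", "UK", "France"),
                   Period  = c("P1", "P2", "P3"),
                   rep     = 1:2)
cpi$Index <- rnorm(nrow(cpi), mean = 2, sd = 0.3)

av1 <- aov(Index ~ Country * Period, data = cpi)

# summary() returns a list; the first element is the ANOVA table
tab <- summary(av1)[[1]]
p <- tab[["Pr(>F)"]]
names(p) <- trimws(rownames(tab))  # row names are padded with spaces

alpha <- 0.10                      # chosen significance level
significant <- p[!is.na(p)] < alpha
significant
```

The ANOVA table is the same whatever level is chosen; only the decision rule changes.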

Using RDA in R with just one dataset

This is possibly a stupid question, but I was told to do a Redundancy Analysis in R (using the package Vegan) to test the differences between different groups in my data. However I only have one dataset (roughly comparable to the Iris dataset (https://en.wikipedia.org/wiki/Iris_flower_data_set)), and everything I have found on RDA seems to need two matching sets. Did I mishear or misunderstand, or is there something else going on here?
As far as the underlying statistics are concerned, you have two data matrices:
the four morphological variables in the iris data set, and
a single categorical predictor variable or constraint.
Using rda() in vegan with the iris example data, you would do:
library("vegan")
iris.d <- iris[, 1:4]
ord <- rda(iris.d ~ Species, data = iris)
ord
set.seed(1)
anova(ord)
The permutation test checks for differences between species.
> anova(ord)
Permutation test for rda under reduced model
Permutation: free
Number of permutations: 999
Model: rda(formula = iris.d ~ Species, data = iris)
          Df Variance      F Pr(>F)
Model      2   3.9736 487.33  0.001 ***
Residual 147   0.5993
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You might also look at adonis(), which should do the same thing here as RDA but from a different viewpoint (in recent vegan releases adonis() has been superseded by adonis2()):
> adonis(iris.d ~ Species, data = iris)
Call:
adonis(formula = iris.d ~ Species, data = iris)
Permutation: free
Number of permutations: 999
Terms added sequentially (first to last)
           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
Species     2   2.31730 1.15865  532.74 0.87876  0.001 ***
Residuals 147   0.31971 0.00217         0.12124
Total     149   2.63701                 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(For some reason that is a lot slower...)
Also see betadisper() as you might detect a difference in means (centroids) using these methods where that may be due at least in part to differences in variance (dispersion).
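A minimal sketch of that dispersion check on the same iris data, assuming vegan is installed; the choice of Euclidean distance here is mine, made only for illustration:

```r
library("vegan")

iris.d <- iris[, 1:4]

# Pairwise Euclidean distances between the 150 flowers
d <- dist(iris.d)

# Distances of each observation to its group centroid (spatial median by default)
disp <- betadisper(d, iris$Species)
disp

# Classical F test for equal dispersion among species
anova(disp)
```

If this test indicates unequal dispersions, a significant result from rda() or adonis() may partly reflect differences in spread and not only differences in location.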

R One-Way ANOVA (getting only 1 DF and expecting 2 DFs)

I'm working through the examples of One-Way ANOVA on the UCLA website http://www.ats.ucla.edu/stat/r/faq/posthoc.htm.
When I run the command a1 <-aov(write ~ ses), my output differs from the example output. I'm particularly bothered by the fact that when I run the command summary(a1), my DF on ses is 1 and there are three ses categories (1,2,3) so I'm expecting 2 DFs which is what the example on the website shows. I've checked the data for the 'write' column and 'ses' column and the counts and averages seem to match with the example, but the result from aov(write ~ ses) doesn't. Has something changed? Why am I getting only 1 DF?
hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep=",", header=TRUE)
a1 <- aov(write ~ ses, data = hsb2)
summary(a1)
#              Df Sum Sq Mean Sq F value Pr(>F)
# ses           1    770   769.8   8.908 0.0032 **
# Residuals   198  17109    86.4
The page you are learning from has an error: it doesn't tell you how to enter the data correctly. The ses variable is supposed to be a factor, but as we can see from the data they give us, it is read in as numeric:
str(hsb2$ses)
If we convert it to a factor, we get the same answer as the example:
hsb2$ses <- as.factor(hsb2$ses)
a1 <- aov(write ~ ses, data=hsb2)
summary(a1)
             Df Sum Sq Mean Sq F value  Pr(>F)
ses           2    859   429.4    4.97 0.00784 **
Residuals   197  17020    86.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In addition, using attach is highly discouraged by most R users.
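Since the numeric-versus-factor point does not depend on the UCLA file, here is a self-contained sketch on simulated data (the data frame d and its values are made up). A numeric ses is fitted as a single slope, which uses 1 DF; a factor with three levels uses 2. Passing data = to aov() also avoids the need for attach.

```r
set.seed(4)

# Hypothetical data: three ses categories coded 1, 2, 3
d <- data.frame(ses = rep(1:3, each = 20))
d$write <- 50 + c(0, 2, 5)[d$ses] + rnorm(60, sd = 8)

fit_num <- aov(write ~ ses, data = d)          # ses treated as numeric
fit_fac <- aov(write ~ factor(ses), data = d)  # ses treated as categorical

summary(fit_num)[[1]][["Df"]]  # 1 for ses, 58 for residuals
summary(fit_fac)[[1]][["Df"]]  # 2 for ses, 57 for residuals
```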
