R, programmatically give name of column

I've got a function to do ANOVA for a specific column (this code is simplified, my code does some other related things to that column too, and I do this set of calculations for different columns, so it deserves a function). alz is my dataframe.
analysis <- function(column) {
print(anova(lm(alz[[column]] ~ alz$Category)))
}
I call it e.g.:
analysis("VariableX")
And then in the output I get:
Analysis of Variance Table
Response: alz[[column]]
Df Sum Sq Mean Sq F value Pr(>F)
alz$Category 2 4.894 2.44684 9.3029 0.0001634 ***
Residuals 136 35.771 0.26302
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can I make the output show the column name instead of alz[[column]]?

Here is an example:
> f <- function(n) {
+ fml <- as.formula(paste(n, "~cyl"))
+ print(anova(lm(fml, data = mtcars)))
+ }
>
> f("mpg")
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
cyl 1 817.71 817.71 79.561 6.113e-10 ***
Residuals 30 308.33 10.28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

analysis <- function(column) {
  afit <- anova(lm(alz[[column]] ~ alz$Category))
  # swap "alz[[column]]" in the "Response: ..." heading line for the column name
  attr(afit, "heading") <- sub(": .+$", paste0(": ", column), attr(afit, "heading"))
  print(afit)
}
The anova object stores its "Response:" line in an attribute named "heading". You would be better advised to use the data argument to lm in the manner @kohske illustrated.
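Following @kohske's data-argument approach, here is a minimal sketch that builds the formula with stats::reformulate() instead of paste(). The alz data frame below is a made-up stand-in for the question's data; only the term name Category comes from the question. As a side effect, the row label becomes Category rather than alz$Category.

```r
# Hypothetical stand-in for the question's `alz` data frame
alz <- data.frame(
  VariableX = c(1.2, 2.3, 1.9, 3.1, 2.8, 3.5, 4.0, 4.4, 3.9),
  Category  = factor(rep(c("a", "b", "c"), each = 3))
)

analysis <- function(column, data = alz) {
  # build e.g. VariableX ~ Category from the column name
  fml <- reformulate("Category", response = column)
  afit <- anova(lm(fml, data = data))
  print(afit)
  invisible(afit)  # return the table so it can be reused
}

res <- analysis("VariableX")
```

Because the response is now a real variable in data, the heading reads "Response: VariableX" without editing any attributes.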

Related

ANOVA Error: $ operator is invalid for atomic vectors

I'm using this code to run an ANOVA with type II sums of squares, but the last line throws the error: Error: $ operator is invalid for atomic vectors
library(tidyverse)
programmers <- read_table("http://tofu.byu.edu/stat230/programmers.txt")
programmers$LargeSystemExp <-
as_factor(programmers$LargeSystemExp)
programmers$YearsOfExp <-
as_factor(programmers$YearsOfExp)
prog.lm <- lm(TimePredictionError ~ LargeSystemExp + YearsOfExp + LargeSystemExp:YearsOfExp, data=programmers)
anova(prog.lm)
anova(prog.lm,type=2)
How can I run the last line of code without error?
For a type II ANOVA, car::Anova() works:
car::Anova(prog.lm, type = 2)
Anova Table (Type II tests)
Response: TimePredictionError
Sum Sq Df F value Pr(>F)
LargeSystemExp 34504 1 358.59 2.469e-13 ***
YearsOfExp 41720 2 216.79 2.540e-13 ***
LargeSystemExp:YearsOfExp 24234 2 125.93 2.614e-11 ***
Residuals 1732 18
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
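Some background on why base anova() is not enough here: on an lm fit it reports sequential (type I) sums of squares, so each term is adjusted only for the terms listed before it. A type II sum of squares for a main effect is instead the extra sum of squares from adding it to a model that already contains the other main effects. The sketch below uses mtcars (not the programmers data) and a model without interactions to show the difference; car::Anova() does this bookkeeping for you, including interaction terms.

```r
# Same two predictors, entered in opposite orders
fit_am_first <- lm(mpg ~ am + wt, data = mtcars)
fit_am_last  <- lm(mpg ~ wt + am, data = mtcars)

# Sequential (type I) SS for `am` depends on its position in the formula
ss_seq_first <- anova(fit_am_first)["am", "Sum Sq"]
ss_seq_last  <- anova(fit_am_last)["am", "Sum Sq"]

# Type II SS for `am`: drop in RSS when am is added to a model that already has wt
rss <- function(fit) sum(residuals(fit)^2)
ss_type2 <- rss(lm(mpg ~ wt, data = mtcars)) - rss(fit_am_first)

c(type1_am_first = ss_seq_first, type1_am_last = ss_seq_last, type2 = ss_type2)
```

In a model without interactions, the type II value coincides with the sequential value for the term entered last, whatever order you wrote the formula in.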

Undefined columns error when performing TukeyHSD

I'm extremely new to R and need your help!
I performed a factorial ANOVA and wanted to run a Tukey test, but I got this error:
Error in `[.data.frame`(mf, mf.cols[[i]]) : undefined columns selected
Here is what I did for the ANOVA (I removed the section testing for normality):
> data.aov<- aov(`FREQUENCY OF INGESTION` ~ `HYDROLOGY REGIME`*`DEPTH ZONE`*`ST. LOCATION`)
> anova(data.aov)
Analysis of Variance Table
Response: FREQUENCY OF INGESTION
Df Sum Sq Mean Sq F value Pr(>F)
`HYDROLOGY REGIME` 1 0.0002 0.0001530 0.0218 0.88274
`DEPTH ZONE` 3 0.0147 0.0049134 0.6990 0.55288
`ST. LOCATION` 1 0.0202 0.0201579 2.8677 0.09085 .
`HYDROLOGY REGIME`:`DEPTH ZONE` 2 0.0229 0.0114514 1.6291 0.19691
`DEPTH ZONE`:`ST. LOCATION` 1 0.0018 0.0017877 0.2543 0.61422
Residuals 651 4.5761 0.0070293
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> TukeyHSD(data.aov)
Error in `[.data.frame`(mf, mf.cols[[i]]) : undefined columns selected
> library(multcompView)
> multcompLetters(extract_p(TukeyHSD(aov(`FREQUENCY OF INGESTION`~`HYDROLOGY REGIME`*`DEPTH ZONE`*`ST. LOCATION`))))
Try the TukeyC package. Compared with other packages, it offers several conveniences for factorial experiments, split-plot designs, etc. See its reference manual: https://cran.r-project.org/web/packages/TukeyC/TukeyC.pdf
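Another thing worth trying with base TukeyHSD(), assuming the error comes from the backtick-quoted column names containing spaces (a commonly reported cause of this message, though not checked against the question's data): convert the names to syntactic ones with make.names() and fit with an explicit data argument. The data frame below is a made-up stand-in with two of the question's factors.

```r
# Hypothetical stand-in data with non-syntactic names like the question's
set.seed(1)
dat <- data.frame(
  `FREQUENCY OF INGESTION` = runif(48),
  `HYDROLOGY REGIME` = factor(rep(c("wet", "dry"), each = 24)),
  `DEPTH ZONE` = factor(rep(c("z1", "z2", "z3", "z4"), times = 12)),
  check.names = FALSE
)

# Make the names syntactic, e.g. "DEPTH ZONE" -> "DEPTH.ZONE"
names(dat) <- make.names(names(dat))

fit <- aov(FREQUENCY.OF.INGESTION ~ HYDROLOGY.REGIME * DEPTH.ZONE, data = dat)
tk <- TukeyHSD(fit)
print(tk)
```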

Error comparing linear mixed effects models

I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this, and while this reproducible example uses lm and update, for lmer objects the same approach should work:
# 1st approach
# define a convenience wrapper
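update_nested <- function(object, formula., ..., evaluate = TRUE){
  # refit on the model frame of `object`, i.e. exactly the rows it used;
  # model.frame() also works for lmer (merMod) fits, which have no $model
  update(object = object, formula. = formula., data = model.frame(object), ..., evaluate = evaluate)
}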
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If, however, you only want to test a single variable (e.g. Group2), car's Anova() or linearHypothesis() may also work for this use case.
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then pass its model frame, Resp.model@frame, as the data argument:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model@frame,REML=FALSE)

Format custom summary output to match with ANOVA output in R

I'm new to R. We have an assignment that I'm working on: creating an R package that mimics the ANOVA table. I have created all the functions the assignment requires. They calculate the correct values, but I couldn't make the result display like the ANOVA table that R's built-in anova() function prints. This is my summary.oneway function:
summary.oneway <- function(object, ...){
#model <- oneway(object)
fval <- object$FValue
TAB <- list(t(object$AOV), "Mean Sq."= rbind(object$MSB, object$MSW),
"F Value" = fval, p.value = object$p.value)
res <- list(call=object$call, onewayAnova = TAB)
class(res) <- "summary.oneway"
res
}
This is the output:
Analysis of Variance:
oneway.formula(formula = coag ~ diet, data = coagdata)
[[1]]
Sum of Squares Deg. of Freedom
diet 228 3
Residual 112 20
$`Mean Sq.`
1
[1,] 76.0
[2,] 5.6
$`F Value`
1
13.57143
$p.value
1
4.658471e-05
Actual ANOVA output:
Analysis of Variance Table
Response: coag
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 228 76.0 13.571 4.658e-05 ***
Residuals 20 112 5.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How can I achieve this format? Where and what am I missing?
Thank you so much for your help.
Kuni
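The ANOVA output is produced by the print method print.anova. You may want to take a look at methods(print), and specifically at stats:::print.anova.
You will most likely want to write your own print method. Because summary.oneway() returns an object of class "summary.oneway", the method should be called print.summary.oneway, and print methods take x as their first argument:
print.summary.oneway <- function(x, ...) {
  # format the table stored in x and cat() it here
  invisible(x)
}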
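A minimal sketch of one way to get that exact layout, assuming you can pull the between/within sums of squares and degrees of freedom out of your own object: build a data frame with the same column names anova.lm() uses, give it class c("anova", "data.frame") plus a "heading" attribute, and let the existing print method do the formatting. as_anova_table() is a hypothetical helper; the numbers are the coag ~ diet values from the question.

```r
# Hypothetical helper: turn one-way ANOVA sums of squares and degrees of
# freedom into an object that the built-in anova print method can display
as_anova_table <- function(term, ss, df, response) {
  ms <- ss / df                          # mean squares: between, within
  f  <- ms[1] / ms[2]                    # F statistic
  p  <- pf(f, df[1], df[2], lower.tail = FALSE)
  tab <- data.frame(
    Df        = df,
    `Sum Sq`  = ss,
    `Mean Sq` = ms,
    `F value` = c(f, NA),                # NA cells are printed as blanks
    `Pr(>F)`  = c(p, NA),
    check.names = FALSE,
    row.names = c(term, "Residuals")
  )
  structure(tab,
            heading = c("Analysis of Variance Table\n",
                        paste("Response:", response)),
            class = c("anova", "data.frame"))
}

tab <- as_anova_table("diet", ss = c(228, 112), df = c(3, 20), response = "coag")
print(tab)
```

Inside print.summary.oneway you could then call print(as_anova_table(...)) with the values stored in x, so the significance stars and legend come for free.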

How to hide anova significance levels on the bottom of the table

Suppose I compared two models of nested random effects using anova(), and the result is below:
new.model: new
current.model: new
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
new.model 8 299196 299259 -149590
current.model 9 299083 299154 -149533 115.19 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would like to use only the table part (see below):
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
new.model 8 299196 299259 -149590
current.model 9 299083 299154 -149533 115.19 1 < 2.2e-16 ***
I know I can get rid of the heading part (see below) by setting the heading to NULL with attributes(anova.object)$heading = NULL, but I don't know how to get rid of the bottom part (Signif. codes: ...).
new.model: new
current.model: new
I specifically do not want to use data.frame() (see below), as it turns the blank cells into NAs:
data.frame(anova(new.model, current.model))
Df AIC BIC logLik Chisq Chi.Df Pr..Chisq.
new.model 8 299196.4 299258.9 -149590.2 NA NA NA
current.model 9 299083.2 299153.6 -149532.6 115.1851 1 7.168247e-27
I wonder if you guys know a way to deal with this situation.
[UPDATE]: I ended up writing a wrapper using print.anova:
anova.print = function(object, signif.stars = TRUE, heading = TRUE){
  if(!heading)
    attributes(object)$heading = NULL
  # print() dispatches to print.anova for objects of class "anova"
  print(object, signif.stars = signif.stars)
}
Example:
dv = c(rnorm(20), rnorm(20, mean=2), rnorm(20))
iv = factor(rep(letters[1:3], each=20))
anova.object = anova(lm(dv~iv))
anova.object
Analysis of Variance Table
Response: dv
Df Sum Sq Mean Sq F value Pr(>F)
iv 2 46.360 23.1798 29.534 1.578e-09 ***
Residuals 57 44.737 0.7849
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova.print(anova.object, signif.stars = FALSE, heading = FALSE)
Df Sum Sq Mean Sq F value Pr(>F)
iv 2 46.360 23.1798 29.534 1.578e-09
Residuals 57 44.737 0.7849
EDIT: the print method for anova objects has a signif.stars parameter:
print(anova(new.model, current.model), signif.stars = FALSE)
> x <- anova(lm(hp~mpg+am, data=mtcars))
> print(x, signif.stars=F)
Analysis of Variance Table
Response: hp
Df Sum Sq Mean Sq F value Pr(>F)
mpg 1 87791 87791 54.5403 3.888e-08
am 1 11255 11255 6.9924 0.01307
Residuals 29 46680 1610
We had a similar post the other day about not showing NAs. You could do:
x <- as.matrix(anova(new.model, current.model))
print(x, na.print="", quote=FALSE)
A more reproducible example using the mtcars data set:
x <- as.matrix(anova(lm(hp~mpg+am, data=mtcars)))
print(x, na.print="", quote=FALSE)
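If you would rather switch the stars off for a whole session or report than pass signif.stars on every call, print.anova() takes its default from the show.signif.stars option. A small sketch on mtcars that also drops the heading and restores the option afterwards:

```r
fit <- lm(hp ~ mpg + am, data = mtcars)
tab <- anova(fit)
attr(tab, "heading") <- NULL                 # drop "Analysis of Variance Table / Response: hp"

old <- options(show.signif.stars = FALSE)    # default used by the anova print method
out <- capture.output(print(tab))            # stars and "Signif. codes" legend are omitted
options(old)                                 # restore the previous setting

cat(out, sep = "\n")
```

Unlike the as.matrix() route, this keeps the anova class, so blank cells stay blank and numbers keep their usual formatting.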
