I'm new to R and am working on an assignment: creating an R package that mimics an ANOVA table. I have written all the functions the assignment requires, and they calculate the correct values, but I can't make the output display like the ANOVA table that R's built-in anova() function produces. This is my summary.oneway function:
summary.oneway <- function(object, ...){
#model <- oneway(object)
fval <- object$FValue
TAB <- list(t(object$AOV), "Mean Sq."= rbind(object$MSB, object$MSW),
"F Value" = fval, p.value = object$p.value)
res <- list(call=object$call, onewayAnova = TAB)
class(res) <- "summary.oneway"
res
}
This is the output:
Analysis of Variance:
oneway.formula(formula = coag ~ diet, data = coagdata)
[[1]]
Sum of Squares Deg. of Freedom
diet 228 3
Residual 112 20
$`Mean Sq.`
1
[1,] 76.0
[2,] 5.6
$`F Value`
1
13.57143
$p.value
1
4.658471e-05
Actual ANOVA output:
Analysis of Variance Table
Response: coag
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 228 76.0 13.571 4.658e-05 ***
Residuals 20 112 5.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How can I achieve this format? Where and what am I missing?
Thank you so much for your help.
Kuni
The anova output is produced by the print method print.anova. Take a look at methods(print), and specifically at stats:::print.anova.
You will most likely want to create your own print function:
print.oneway <- function(x, ...) {
  # format and print the table here, e.g. with cat() or printCoefmat()
  invisible(x)
}
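One way to get the built-in layout without writing the formatting by hand is to store the numbers in a data frame that carries the class "anova", so that print() dispatches to the existing method. The following is only a sketch: make_anova_table is a hypothetical helper, and its inputs are the values shown in the question's output rather than fields read from the asker's oneway object.

```r
# Hypothetical helper: assemble an "anova"-classed data frame so that
# print() dispatches to R's own print method for anova tables.
make_anova_table <- function(df, ss, response, term) {
  ms   <- ss / df                      # mean squares
  fval <- ms[1] / ms[2]                # F statistic
  pval <- pf(fval, df[1], df[2], lower.tail = FALSE)
  tab <- data.frame(
    "Df"      = df,
    "Sum Sq"  = ss,
    "Mean Sq" = ms,
    "F value" = c(fval, NA),           # NA prints as a blank cell
    "Pr(>F)"  = c(pval, NA),
    row.names = c(term, "Residuals"),
    check.names = FALSE                # keep the column names as written
  )
  structure(tab,
            heading = c("Analysis of Variance Table\n",
                        paste("Response:", response)),
            class = c("anova", "data.frame"))
}

# Numbers taken from the output shown in the question
tab <- make_anova_table(df = c(3, 20), ss = c(228, 112),
                        response = "coag", term = "diet")
print(tab)
```

A print method for the summary class could then build such a table from the object's own fields and call print() on it.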
Related
In R, I am trying to change the variables inside my linear model dynamically. I have saved a character vector of variable names that I want to use in my lm as moderating variables. This works well for numeric variables; however, it does not seem to work for factor variables, as R does not appear to know they are factors with levels.
My problem is outlined below with a simple example, say I have some data here...
yVar <- c(1,2,3,4,5)
xVar <- c(2,1,2,1,2)
numVar1 <- c(1,2,2,3,4)
numVar2 <- c(1,1,2,2,3)
facVar1 <-c(1,2,3,4,5)
facVar2 <-c(1,2,1,2,1)
xVar <- factor(xVar,levels=c(1:2),labels=c("Condition1","Condition2"))
facVar1 <-factor(facVar1,levels=c(1:5),labels=c("red","blue","green","black","yellow"))
facVar2 <-factor(facVar2, levels=c(1:2), labels=c("dog","cat"))
studyData <- data.frame(yVar,xVar,numVar1,numVar2,facVar1,facVar2)
The standard model would look like:
standardModel <- lm(data=studyData, yVar ~ xVar)
summary.aov(standardModel)
I would like to dynamically include a list of moderating variables to use with this model from zList. As so:
zList <- c("numVar1","numVar2","facVar1","facVar2")
And then call variables from the Z list
for (z in zList) {
lmfit <- lm(as.formula(paste("yVar ~ xVar*",z)), data=studyData)
print(z)
print(typeof(z))
print(levels(z))
print(summary.aov(lmfit))
}
This gives the output below:
[1] "numVar1"
[1] "character"
NULL
Df Sum Sq Mean Sq F value Pr(>F)
xVar 1 0.000 0.000 0.000 1.000
numVar1 1 9.484 9.484 33.194 0.109
xVar:numVar1 1 0.230 0.230 0.806 0.534
Residuals 1 0.286 0.286
[1] "numVar2"
[1] "character"
NULL
Df Sum Sq Mean Sq F value Pr(>F)
xVar 1 0 0 2.200e-02 0.906
numVar2 1 10 10 1.781e+31 <2e-16 ***
xVar:numVar2 1 0 0 7.560e-01 0.544
Residuals 1 0 0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[1] "facVar1"
[1] "character"
NULL
Df Sum Sq Mean Sq
xVar 1 0 0.000
facVar1 3 10 3.333
[1] "facVar2"
[1] "character"
NULL
Df Sum Sq Mean Sq F value Pr(>F)
xVar 1 0 0.000 0 1
Residuals 3 10 3.333
As can be seen, this seems to work for the numeric variables (levels() is NULL, as it should be, and the lm output looks fine). However, for the factor variables levels() is also NULL, so R doesn't seem to know that these variables are factors with levels.
What could I do so that I could run my linear model, and allow variables to change dynamically on the fly, whereby R knows what type the variable is? Is there an alternative better way of solving this problem?
Thank you in advance for any replies.
If you want the loop to print information about z while it fits the models, the following code will do it. zList is a character vector, so z is a character string (which is why levels(z) is NULL); the variable it names can be retrieved with get(z).
The fitted models will be in list lm_list. Then a sequence of simpler lapply instructions can produce aov objects (in a list, aov_list) or summary statistics.
lm_list <- lapply(zList, function(z) {
cat("\n", "name:", z, "\n")
zvar <- get(z)
cat("typeof:", typeof(zvar), "\n")
cat("class:", class(zvar), "\n")
if(is.factor(zvar)) cat("levels:", levels(zvar), "\n")
fmla <- as.formula(paste("yVar ~ xVar *", z))
lm(fmla, data = studyData)
})
lm_smry <- lapply(lm_list, summary)
lm_smry
aov_list <- lapply(lm_list, aov)
lapply(aov_list, summary)
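As a variation on the code above, the column can be looked up in the data frame itself instead of with get(), which avoids depending on copies of the variables in the workspace. This is a minimal sketch under that assumption; the data are a cut-down rebuild of the question's studyData so the block runs on its own, and var_info and fits are illustrative names.

```r
# Cut-down rebuild of the question's data so the block is self-contained
studyData <- data.frame(
  yVar    = c(1, 2, 3, 4, 5),
  xVar    = factor(c(2, 1, 2, 1, 2), levels = 1:2,
                   labels = c("Condition1", "Condition2")),
  numVar1 = c(1, 2, 2, 3, 4),
  facVar2 = factor(c(1, 2, 1, 2, 1), levels = 1:2, labels = c("dog", "cat"))
)
zList <- c("numVar1", "facVar2")

# levels() of the name itself is NULL: it is just a character string
levels("facVar2")

# Look the column up in the data frame to inspect the real variable
var_info <- lapply(setNames(zList, zList), function(z) {
  col <- studyData[[z]]
  list(class = class(col), levels = levels(col))
})

# reformulate() builds yVar ~ xVar * <z> from the name held in z
fits <- lapply(setNames(zList, zList), function(z) {
  lm(reformulate(paste("xVar *", z), response = "yVar"), data = studyData)
})

var_info$facVar2$levels
```

The models treat facVar2 as a factor either way, because lm() finds the column in studyData; only the diagnostic prints in the original loop were inspecting the string rather than the variable.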
I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NA like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this. This reproducible example uses lm and update; for lmer objects the same idea should apply, although the model frame of an lmer fit is stored in the @frame slot rather than in $model:
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE){
update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
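To see where the error comes from, it can help to compare the number of observations each fit actually used. This sketch rebuilds the same NA pattern in a copy of mtcars (dat, full, reduced and reduced_same are illustrative names) and uses only base R:

```r
# Rebuild the example data: one NA per column, placed on the diagonal
dat <- mtcars
for (i in seq_len(ncol(dat))) dat[i, i] <- NA

full    <- lm(mpg ~ cyl + disp, data = dat)
reduced <- lm(mpg ~ disp, data = dat)   # naive refit on the raw data

# The naive refit keeps the row in which only cyl is missing
nobs(full)      # 29
nobs(reduced)   # 30

# Refit on exactly the rows the full model used
reduced_same <- lm(mpg ~ disp, data = model.frame(full))
nobs(reduced_same)   # 29
anova(reduced_same, full)
```

Because each model drops incomplete rows only for the variables in its own formula, removing a term that has missing values silently enlarges the dataset, and anova() refuses to compare the two fits.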
If, however, you're only interested in testing a single variable (e.g. Group2), then Anova() or linearHypothesis() from the car package may work for this use case as well.
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then use Resp.model@frame as the data argument.
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model@frame,REML=FALSE)
I want to extract the standard errors from the output of the tsls command in the sem R package.
Using some generic code as an example:
fit = tsls(Y ~ X, ~Z)
summary(fit)
The summary function outputs several things besides the regression estimates (e.g., model formulas, summary of the residuals).
I want an equivalent to fit$coef that outputs standard errors. But that doesn't seem to be an option. All the code used to do the equivalent for glm and lm output doesn't seem to work here. Is there any way to hack the output?
Sometimes it takes a little bit of digging to find where these values are coming from. The best place to look, if you don't get any clues from str(fit), would be to look at what summary.tsls is doing.
With some help from getAnywhere("summary.tsls"), we see:
getAnywhere("summary.tsls")
# A single object matching ‘summary.tsls’ was found
# It was found in the following places
# registered S3 method for summary from namespace sem
# namespace:sem
# with value
#
# function (object, digits = getOption("digits"), ...)
# {
# ###
# ### \\\SNIP///
# ###
# std.errors <- sqrt(diag(object$V))
# ###
# ### \\\SNIP///
# ###
# }
# <bytecode: 0x503c530>
# <environment: namespace:sem>
So, to get the value you are looking for, you need to calculate it yourself with:
sqrt(diag(fit$V))
A reproducible example:
library(sem)
fit <- tsls(Q ~ P + D, ~ D + F + A, data=Kmenta)
summary(fit)
#
# 2SLS Estimates
#
# Model Formula: Q ~ P + D
#
# Instruments: ~D + F + A
#
# Residuals:
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -3.4300 -1.2430 -0.1895 0.0000 1.5760 2.4920
#
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 94.63330387 7.92083831 11.94738 1.0762e-09 ***
# P -0.24355654 0.09648429 -2.52431 0.021832 *
# D 0.31399179 0.04694366 6.68869 3.8109e-06 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.9663207 on 17 degrees of freedom
sqrt(diag(fit$V))
# (Intercept) P D
# 7.92083831 0.09648429 0.04694366
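The same pattern, standard errors as the square root of the diagonal of the coefficient covariance matrix, can be checked on an ordinary lm fit, which needs no extra packages. This is only an illustration of the technique: lm stores its covariance matrix differently and exposes it through vcov(), and whether vcov() has a method for tsls objects is not confirmed here, so fit$V remains the route shown above.

```r
# Base-R illustration with lm (not tsls): standard errors from the
# coefficient covariance matrix match those reported by summary()
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

se_manual  <- sqrt(diag(vcov(fit_lm)))
se_summary <- coef(summary(fit_lm))[, "Std. Error"]

all.equal(se_manual, se_summary)
```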
Suppose I compared two models of nested random effects using anova(), and the result is below:
new.model: new
current.model: new
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
new.model 8 299196 299259 -149590
current.model 9 299083 299154 -149533 115.19 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would like to use only the table part (see below):
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
new.model 8 299196 299259 -149590
current.model 9 299083 299154 -149533 115.19 1 < 2.2e-16 ***
I know I can get rid of the heading part (see below) by setting the heading to NULL with attributes(anova.object)$heading = NULL, but I don't know how to get rid of the bottom part: Signif. codes: .....
new.model: new
current.model: new
I crucially do not want to use data.frame (see below) as it changes the blank cells to NAs
data.frame(anova(new.model, current.model))
Df AIC BIC logLik Chisq Chi.Df Pr..Chisq.
new.model 8 299196.4 299258.9 -149590.2 NA NA NA
current.model 9 299083.2 299153.6 -149532.6 115.1851 1 7.168247e-27
I wonder if you guys know a way to deal with this situation.
[UPDATE]: I ended up writing a wrapper using print.anova:
anova.print = function(object, signif.stars = TRUE, heading = TRUE){
  if(!heading)
    attributes(object)$heading = NULL
  # print() dispatches to the anova method; print.anova itself is not exported from stats
  print(object, signif.stars = signif.stars)
}
Example:
dv = c(rnorm(20), rnorm(20, mean=2), rnorm(20))
iv = factor(rep(letters[1:3], each=20))
anova.object = anova(lm(dv~iv))
Analysis of Variance Table
Response: dv
Df Sum Sq Mean Sq F value Pr(>F)
iv 2 46.360 23.1798 29.534 1.578e-09 ***
Residuals 57 44.737 0.7849
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova.print(anova.object, FALSE, FALSE)
Df Sum Sq Mean Sq F value Pr(>F)
iv 2 46.360 23.1798 29.534 1.578e-09
Residuals 57 44.737 0.7849
EDIT: the print method for anova objects has signif.stars as a parameter:
print(anova(new.model, current.model), signif.stars=FALSE)
> x <- anova(lm(hp~mpg+am, data=mtcars))
> print(x, signif.stars=F)
Analysis of Variance Table
Response: hp
Df Sum Sq Mean Sq F value Pr(>F)
mpg 1 87791 87791 54.5403 3.888e-08
am 1 11255 11255 6.9924 0.01307
Residuals 29 46680 1610
We had a similar post the other day about not showing NAs. You could do:
x <- as.matrix(anova(new.model, current.model))
print(x, na.print="", quote=FALSE)
A more reproducible example using the mtcars data set:
x <- as.matrix(anova(lm(hp~mpg+am, data=mtcars)))
print(x, na.print="", quote=FALSE)
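If the result should stay a data frame rather than a matrix, one possible variation is to format the columns as text and blank out the cells that were NA. This is a sketch of that idea, not what print.anova does internally; tab and out are illustrative names, and the formatted columns are character, so they are only suitable for display.

```r
# Keep a data frame, but show NA cells as blanks (display only)
tab <- as.data.frame(anova(lm(hp ~ mpg + am, data = mtcars)))

out <- format(tab, digits = 4)   # every column becomes character
out[is.na(tab)] <- ""            # blank the cells that were NA
print(out)
```

Unlike data.frame(), as.data.frame() leaves the column names such as "F value" and "Pr(>F)" unchanged.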
I've got a function to do ANOVA for a specific column (this code is simplified, my code does some other related things to that column too, and I do this set of calculations for different columns, so it deserves a function). alz is my dataframe.
analysis <- function(column) {
print(anova(lm(alz[[column]] ~ alz$Category)))
}
I call it e.g.:
analysis("VariableX")
And then in the output I get:
Analysis of Variance Table
Response: alz[[column]]
Df Sum Sq Mean Sq F value Pr(>F)
alz$Category 2 4.894 2.44684 9.3029 0.0001634 ***
Residuals 136 35.771 0.26302
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can I make the output show the column name instead of alz[[column]]?
Here is an example:
> f <- function(n) {
+ fml <- as.formula(paste(n, "~cyl"))
+ print(anova(lm(fml, data = mtcars)))
+ }
>
> f("mpg")
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
cyl 1 817.71 817.71 79.561 6.113e-10 ***
Residuals 30 308.33 10.28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
analysis <- function(column) {
afit <- anova(lm( alz[[column]] ~ alz$Category))
attr(afit, "heading") <- sub("\\: .+$", paste0(": ", column) , attr( afit, "heading") )
print(afit)
}
The anova object carries its "Response:" value in an attribute named "heading". You would be better advised to use the 'data' argument to lm in the manner @kohske illustrated.
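Applied to a function shaped like the asker's, the 'data' approach might look as follows. The sketch assumes a stand-in data frame, alz_demo, with made-up values, since the real alz data are not shown; reformulate() is used here in place of as.formula(paste(...)) and is a substitution, not what either answer used.

```r
# Stand-in for the asker's `alz` data frame (hypothetical values)
alz_demo <- data.frame(
  Category  = factor(rep(c("a", "b", "c"), each = 4)),
  VariableX = c(1.1, 0.9, 1.3, 1.0, 2.1, 1.8, 2.2, 2.0, 1.4, 1.6, 1.5, 1.2)
)

analysis <- function(column, data) {
  # Build <column> ~ Category so the heading shows the real column name
  fml <- reformulate("Category", response = column)
  anova(lm(fml, data = data))
}

res <- analysis("VariableX", alz_demo)
attr(res, "heading")
print(res)
```

With the formula built from names and the data passed separately, both the "Response:" line and the row label come out clean without editing the heading attribute afterwards.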