Suppose I compared two models of nested random effects using anova(), and the result is below:
new.model: new
current.model: new
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
new.model 8 299196 299259 -149590
current.model 9 299083 299154 -149533 115.19 1 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would like to use only the table part (see below):
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
new.model 8 299196 299259 -149590
current.model 9 299083 299154 -149533 115.19 1 < 2.2e-16 ***
I know I am able to get rid of the heading part (see blow) by setting the heading to null using attributes(anova.object)$heading = NULL, but I don't know how to get rid of the bottom part: Signif. codes: .....
new.model: new
current.model: new
I crucially do not want to use data.frame (see below) as it changes the blank cells to NAs
data.frame(anova(new.model, current.model))
Df AIC BIC logLik Chisq Chi.Df Pr..Chisq.
new.model 8 299196.4 299258.9 -149590.2 NA NA NA
current.model 9 299083.2 299153.6 -149532.6 115.1851 1 7.168247e-27
I wonder if you guys know a way to deal with this situation.
[UPDATE]: I ended up writing a wrapper using print.anova:
anova.print = function(object, signif.stars = TRUE, heading = TRUE){
if(!heading)
attributes(object)$heading = NULL
print.anova(object, signif.stars = signif.stars)
}
Example:
dv = c(rnorm(20), rnorm(20, mean=2), rnorm(20))
iv = factor(rep(letters[1:3], each=20))
anova.object = anova(lm(dv~iv))
Analysis of Variance Table
Response: dv
Df Sum Sq Mean Sq F value Pr(>F)
iv 2 46.360 23.1798 29.534 1.578e-09 ***
Residuals 57 44.737 0.7849
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova.print(anova.object, F, F)
Df Sum Sq Mean Sq F value Pr(>F)
iv 2 46.360 23.1798 29.534 1.578e-09
Residuals 57 44.737 0.7849
EDIT: anova has a print method with signif.stars as a parameter
anova(new.model, current.model, signif.stars=FALSE)
> x <- anova(lm(hp~mpg+am, data=mtcars))
> print(x, signif.stars=F)
Analysis of Variance Table
Response: hp
Df Sum Sq Mean Sq F value Pr(>F)
mpg 1 87791 87791 54.5403 3.888e-08
am 1 11255 11255 6.9924 0.01307
Residuals 29 46680 1610
We had a similar post the other day about not showing NAs. You could do:
x <- as.matrix(anova(new.model, current.model))
print(x, na.print="", quote=FALSE)
A more reproducible example using the mtcars data set:
x <- as.matrix(anova(lm(hp~mpg+am, data=mtcars)))
print(x, na.print="", quote=FALSE)
Related
I am performing an ANCOVA so as to test what is the relationship between body size (covariate, logLCC) and different head measures (response variable, logLP) in each sex (cathegorical variable, sexo).
I got the slopes for each sex in the lm and I would like to compare them to 1. More specifically, I would like to know if the slopes are significantly higher or less than 1, or if they are equal to 1, as this would have different biological meanings in their allometric relationships.
Here is my code:
#Modelling my lm#
> lm.logLP.sexo.adu<-lm(logLP~logLCC*sexo, data=ADU)
> anova(lm.logLP.sexo.adu)
Analysis of Variance Table
Response: logLP
Df Sum Sq Mean Sq F value Pr(>F)
logLCC 1 3.8727 3.8727 3407.208 < 2.2e-16 ***
sexo 1 0.6926 0.6926 609.386 < 2.2e-16 ***
logLCC:sexo 1 0.0396 0.0396 34.829 7.563e-09 ***
Residuals 409 0.4649 0.0011
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Obtaining slopes#
> lm.logLP.sexo.adu$coefficients
(Intercept) logLCC sexoM logLCC:sexoM
-0.1008891 0.6725818 -1.0058962 0.2633595
> lm.logLP.sexo.adu1<-lstrends(lm.logLP.sexo.adu,"sexo",var="logLCC")
> lm.logLP.sexo.adu1
sexo logLCC.trend SE df lower.CL upper.CL
H 0.6725818 0.03020017 409 0.6132149 0.7319487
M 0.9359413 0.03285353 409 0.8713585 1.0005241
Confidence level used: 0.95
#Comparing slopes#
> pairs(lm.logLP.sexo.adu1)
contrast estimate SE df t.ratio p.value
H - M -0.2633595 0.04462515 409 -5.902 <.0001
#Checking whether the slopes are different than 1#
#Computes Summary with statistics
> s1<-summary(lm.logLP.sexo.adu)
> s1
Call:
lm(formula = logLP ~ logLCC * sexo, data = ADU)
Residuals:
Min 1Q Median 3Q Max
-0.13728 -0.02202 -0.00109 0.01880 0.12468
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10089 0.12497 -0.807 0.42
logLCC 0.67258 0.03020 22.271 < 2e-16 ***
sexoM -1.00590 0.18700 -5.379 1.26e-07 ***
logLCC:sexoM 0.26336 0.04463 5.902 7.56e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03371 on 409 degrees of freedom
Multiple R-squared: 0.9083, Adjusted R-squared: 0.9076
F-statistic: 1350 on 3 and 409 DF, p-value: < 2.2e-16
#Computes t-student H0: intercept=1. The estimation of coefficients and their s.d. are in s1$coefficients
> t1<-(1-s1$coefficients[2,1])/s1$coefficients[2,2]
#Calculates two tailed probability
> pval<- 2 * pt(abs(t1), df = df.residual(lm.logLP.sexo.adu), lower.tail = FALSE)
> print(pval)
[1] 3.037231e-24
I saw this whole process in several threads here. But all that I can understand is that my slopes are just different from 1.
How could I check that they are greater or smaller than 1?
EDITED
Solved!
#performs one-side test H0=slope bigger than 1
pval<-pt(t1, df = df.residual(lm.logLP.sexo.adu), lower.tail = FALSE)
#performs one-side test H0=slope smaller than 1
pval<-pt(t1, df = df.residual(lm.logLP.sexo.adu), lower.tail = TRUE)
Also, tests should be performed in single-sex models.
How could I check that they are greater or smaller than 1?
As in this post, this post, and as your in question, you can make Wald test which you compute by
t1<-(1-s1$coefficients[2,1])/s1$coefficients[2,2]
Alternatively, use the vcov and coef function to make the code more readable
fit <- lm.logLP.sexo.adu
t1<-(1-coef(fit)[1])/vcov(fit)[1, 1]
The Wald test gives you t-statistics which can be used to make both a two-sided or one-sided test. Thus, you can drop the abs and set the lower.tail argument according to which tail you want to test in.
I want to see whether the fixed effect Group2 in my model is significant. The model is:
Response ~ Group1 + Group2 + Gender + Age + BMI + (1 | Subject)
To check the significance I create a null model not containing the effect Group2:
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
and the full model containing the effect Group2:
Resp.model = lmer(Response~Group1+Group2+Gender+Age+BMI+(1|Subject),
data=mydata,REML=FALSE)
Then I use anova() to compare the two, but I get an error:
anova(Resp.null, Resp.model)
## Error in anova.merMod(Resp.null, Resp.model) :
## models were not all fitted to the same size of dataset
I think that the problem is that Group1 contains NaN, but I thought that linear mixed models were robust to missing data.
How can I solve this problem and compare the two models?
Do I have to delete the rows corresponding to NaN and fit Resp.null without these rows?
The data can be downloaded here.
Please note that you should replace "<undefined>" with NaN like this:
mydata = read.csv("mydata.csv")
mydata[mydata == "<undefined>"] <- NA
To avoid the "models were not all fitted to the same size of dataset" error in anova, you must fit both models on the exact same subset of data.
There are two simple ways to do this, and while this reproducible example uses lm and update, for lmer objects the same approach should work:
# 1st approach
# define a convenience wrapper
update_nested <- function(object, formula., ..., evaluate = TRUE){
update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}
# prepare data with NAs
data(mtcars)
for(i in 1:ncol(mtcars)) mtcars[i,i] <- NA
xa <- lm(mpg~cyl+disp, mtcars)
xb <- update_nested(xa, .~.-cyl)
anova(xa, xb)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 2nd approach
xc <- update(xa, .~.-cyl, data=na.omit(mtcars[ , all.vars(formula(xa))]))
anova(xa, xc)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + disp
## Model 2: mpg ~ disp
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 256.91
## 2 27 301.32 -1 -44.411 4.4945 0.04371 *
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If however you're only interested in testing a single variable (e.g. Group2), then perhaps the Anova() or linearHypothesis() in car would work as well for this usecase.
See also:
How to update `lm` or `glm` model on same subset of data?
R error which says "Models were not all fitted to the same size of dataset"
Fit Resp.model first, then use Resp.model#frame as data argument.
Resp.null = lmer(Response~Group1+Gender+Age+BMI+(1|Subject),
data=Resp.model#frame,REML=FALSE)
I'm new to R. We have an assignment that i'm working on. The assignment is on creating R package to mimic Anova table. I have created all the necessary function that is mandated in the assignment. The function calculates the correct values, but I couldn't make it display like ANOVA table that R's built in anova() function can. This is my summary.oneway function
summary.oneway <- function(object, ...){
#model <- oneway(object)
fval <- object$FValue
TAB <- list(t(object$AOV), "Mean Sq."= rbind(object$MSB, object$MSW),
"F Value" = fval, p.value = object$p.value)
res <- list(call=object$call, onewayAnova = TAB)
class(res) <- "summary.oneway"
res
}
This is the output:
Analysis of Variance:
oneway.formula(formula = coag ~ diet, data = coagdata)
[[1]]
Sum of Squares Deg. of Freedom
diet 228 3
Residual 112 20
$`Mean Sq.`
1
[1,] 76.0
[2,] 5.6
$`F Value`
1
13.57143
$p.value
1
4.658471e-05
Actual ANOVA output:
Analysis of Variance Table
Response: coag
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 228 76.0 13.571 4.658e-05 ***
Residuals 20 112 5.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
How can I achieve this format? Where and what am I missing?
Thank you so much for your help.
Kuni
The Anova output uses the print method print.anova you may want to take look at methods(print) and specifically stats:::print.anova
You will most likely want to create your own print function
print.oneway <- function(object, ...) {
foo
bar
}
I've got a function to do ANOVA for a specific column (this code is simplified, my code does some other related things to that column too, and I do this set of calculations for different columns, so it deserves a function). alz is my dataframe.
analysis <- function(column) {
print(anova(lm(alz[[column]] ~ alz$Category)))
}
I call it e.g.:
analysis("VariableX")
And then in the output I get:
Analysis of Variance Table
Response: alz[[column]]
Df Sum Sq Mean Sq F value Pr(>F)
alz$Category 2 4.894 2.44684 9.3029 0.0001634 ***
Residuals 136 35.771 0.26302
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How to make the output show the column name instead of alz[[column]]?
Here is an example:
> f <- function(n) {
+ fml <- as.formula(paste(n, "~cyl"))
+ print(anova(lm(fml, data = mtcars)))
+ }
>
> f("mpg")
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
cyl 1 817.71 817.71 79.561 6.113e-10 ***
Residuals 30 308.33 10.28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
analysis <- function(column) {
afit <- anova(lm( alz[[column]] ~ alz$Category))
attr(afit, "heading") <- sub("\\: .+$", paste(": ", column) , attr( afit, "heading") )
print(afit)
}
The anova object carries its "Response:" value in an attribute named "heading". You would be better advised to use the 'data' argument to lm in the manner #kohske illustrated.
By default lm summary test slope coefficient equal to zero. My question is very basic. I want to know how to test slope coefficient equal to non-zero value. One approach could be to use confint but this does not provide p-value. I also wonder how to do one-sided test with lm.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
summary(lm.D9)
Call:
lm(formula = weight ~ group)
Residuals:
Min 1Q Median 3Q Max
-1.0710 -0.4938 0.0685 0.2462 1.3690
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
groupTrt -0.3710 0.3114 -1.191 0.249
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6964 on 18 degrees of freedom
Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
confint(lm.D9)
2.5 % 97.5 %
(Intercept) 4.56934 5.4946602
groupTrt -1.02530 0.2833003
Thanks for your time and effort.
as #power says, you can do by your hand.
here is an example:
> est <- summary.lm(lm.D9)$coef[2, 1]
> se <- summary.lm(lm.D9)$coef[2, 2]
> df <- summary.lm(lm.D9)$df[2]
>
> m <- 0
> 2 * abs(pt((est-m)/se, df))
[1] 0.2490232
>
> m <- 0.2
> 2 * abs(pt((est-m)/se, df))
[1] 0.08332659
and you can do one-side test by omitting 2*.
UPDATES
here is an example of two-side and one-side probability:
> m <- 0.2
>
> # two-side probability
> 2 * abs(pt((est-m)/se, df))
[1] 0.08332659
>
> # one-side, upper (i.e., greater than 0.2)
> pt((est-m)/se, df, lower.tail = FALSE)
[1] 0.9583367
>
> # one-side, lower (i.e., less than 0.2)
> pt((est-m)/se, df, lower.tail = TRUE)
[1] 0.0416633
note that sum of upper and lower probabilities is exactly 1.
Use the linearHypothesis function from car package. For instance, you can check if the coefficient of groupTrt equals -1 using.
linearHypothesis(lm.D9, "groupTrt = -1")
Linear hypothesis test
Hypothesis:
groupTrt = - 1
Model 1: restricted model
Model 2: weight ~ group
Res.Df RSS Df Sum of Sq F Pr(>F)
1 19 10.7075
2 18 8.7292 1 1.9782 4.0791 0.05856 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The smatr package has a slope.test() function with which you can use OLS.
In addition to all the other good answers, you could use an offset. It's a little trickier with categorical predictors, because you need to know the coding.
lm(weight~group+offset(1*(group=="Trt")))
The 1* here is unnecessary but is put in to emphasize that you are testing against the hypothesis that the difference is 1 (if you want to test against a hypothesis of a difference of d, then use d*(group=="Trt")
You can use t.test to do this for your data. The mu parameter sets the hypothesis for the difference of group means. The alternative parameter lets you choose between one and two-sided tests.
t.test(weight~group,var.equal=TRUE)
Two Sample t-test
data: weight by group
t = 1.1913, df = 18, p-value = 0.249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
t.test(weight~group,var.equal=TRUE,mu=-1)
Two Sample t-test
data: weight by group
t = 4.4022, df = 18, p-value = 0.0003438
alternative hypothesis: true difference in means is not equal to -1
95 percent confidence interval:
-0.2833003 1.0253003
sample estimates:
mean in group Ctl mean in group Trt
5.032 4.661
Code up your own test. You know the estimated coeffiecient and you know the standard error. You could construct your own test stat.