Grouping Regressors in Anova Table for Multiple Linear Regression - r

I am running a regression in R:
fbReg <- lm(y ~ x2 + x7 + x8, data = table.b1)
I then produce an ANOVA table to assess the significance of the regression:
anova(fbReg)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x2 1 76.193 76.193 26.172 3.100e-05 ***
x7 1 139.501 139.501 47.918 3.698e-07 ***
x8 1 41.400 41.400 14.221 0.0009378 ***
Residuals 24 69.870 2.911
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Is there anything I can do to make the anova table add up the sums of squares for x2, x7, and x8 instead of listing them separately?
Essentially, I would like the anova table to look like this:
Df Sum Sq Mean Sq F value Pr(>F)
Regression 3 257.094 etc.
Error (Residual) 24 69.870 etc.
Thanks

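To illustrate my comment: compare the full model against an intercept-only model with anova(). The second row then tests all regressors jointly, with their combined degrees of freedom and sum of squares: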
> lm2 <- lm(Fertility ~ Catholic+Education+Agriculture, data = swiss)
> lm1 <- lm(Fertility ~ 1, data = swiss)
> anova(lm1,lm2)
Analysis of Variance Table
Model 1: Fertility ~ 1
Model 2: Fertility ~ Catholic + Education + Agriculture
Res.Df RSS Df Sum of Sq F Pr(>F)
1 46 7178.0
2 43 2567.9 3 4610.1 25.732 1.089e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
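If a single table in the layout requested in the question is preferred, the per-term rows of anova() can also be collapsed by hand. The following is a minimal sketch, assuming the built-in swiss data from the example above; the names tab, is_resid, and grouped are illustrative only and do not come from the original post.

```r
# Fit the model and get the usual sequential ANOVA table
fit <- lm(Fertility ~ Catholic + Education + Agriculture, data = swiss)
tab <- anova(fit)

# Separate the regressor rows from the residual row
is_resid <- rownames(tab) == "Residuals"
reg_df <- sum(tab[["Df"]][!is_resid])
reg_ss <- sum(tab[["Sum Sq"]][!is_resid])
res_df <- tab[["Df"]][is_resid]
res_ss <- tab[["Sum Sq"]][is_resid]

# Overall F statistic: regression mean square over residual mean square
f_val <- (reg_ss / reg_df) / (res_ss / res_df)

grouped <- data.frame(
  Df      = c(reg_df, res_df),
  Sum.Sq  = c(reg_ss, res_ss),
  Mean.Sq = c(reg_ss / reg_df, res_ss / res_df),
  F.value = c(f_val, NA),
  p.value = c(pf(f_val, reg_df, res_df, lower.tail = FALSE), NA),
  row.names = c("Regression", "Residuals")
)
print(grouped)
```

The Regression row should agree with row 2 of the anova(lm1, lm2) comparison and with the overall F statistic reported by summary(fit).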

Related

One-way ANOVA for loop: how do I iterate through multiple columns of a dataframe

I want to run more than 1000 different one-way ANOVAs.
I would like to see whether the number of reads from a single microRNA changes between four different groups, and I would like to do that for each of the more than 1000 miRNAs.
My tibble dataframe looks like this:
I have my 4 groups (YC, OC, YH, OH) and a different miRNA in each column.
I tried a for loop that I expected to iterate over the miRNA column names and print an ANOVA table and a TukeyHSD test for each:
for(i in 2:ncol(test))
{column<-names(test[i])AVz<-summary(aov(test[,i]~Group,data =
test))tk<-TukeyHSD((aov(test[,i]~Group,data =
test)))print(column)print(AVz)print(tk)}
BUT this didn't work:
Error: unexpected symbol in "for(i in 2:ncol(test)){column<-names(test[i])AVz"
If you would like to run aov on every column at once, you can bind the response columns together with cbind in the formula:
formula <- as.formula(paste0("cbind(", paste(names(iris)[-5], collapse = ","), ") ~ Species"))
fit <- aov(formula, data=iris)
summary(fit)
Response Sepal.Length :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 63.212 31.606 119.26 < 2.2e-16 ***
Residuals 147 38.956 0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Response Sepal.Width :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 11.345 5.6725 49.16 < 2.2e-16 ***
Residuals 147 16.962 0.1154
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Response Petal.Length :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 437.10 218.551 1180.2 < 2.2e-16 ***
Residuals 147 27.22 0.185
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Response Petal.Width :
Df Sum Sq Mean Sq F value Pr(>F)
Species 2 80.413 40.207 960.01 < 2.2e-16 ***
Residuals 147 6.157 0.042
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
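The error in the question comes from several statements being run together on one line without newlines or semicolons between them. If a loop is still preferred (for example, to get TukeyHSD for each column), a corrected version might look like the sketch below. It assumes iris as a stand-in for the question's test table, with Species playing the role of Group; the names response_cols and results are illustrative.

```r
# Stand-in data: Species plays the role of the question's Group column
test <- iris
group_col <- "Species"
response_cols <- setdiff(names(test), group_col)

results <- list()
for (col in response_cols) {
  # Build the formula `<col> ~ Species`; as.name() keeps column names
  # containing characters such as "-" from being parsed as expressions
  f <- reformulate(group_col, response = as.name(col))
  fit <- aov(f, data = test)
  results[[col]] <- list(anova = summary(fit), tukey = TukeyHSD(fit))
}

# Inspect one response at a time
print(results[["Sepal.Length"]]$anova)
print(results[["Sepal.Length"]]$tukey)
```

Storing the results in a named list instead of printing inside the loop makes it easier to pull out p-values afterwards, which matters when there are more than 1000 columns.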

r - emmeans pairwise analysis for multilevel repeated measures ANCOVA

I'm building a repeated measures ANCOVA in a multi-level framework using the aov() function. I have one continuous response variable, two factor predictors, and three continuous covariates. My script for the model is below:
ModelDV <- aov(DV ~ IV1 + IV2 + IV1*IV2 + CV1 + CV2 + CV3 + Error(PartID/(IV1 + IV2 + IV1:IV2)), data)
A snippet of my data set shows how it is formatted:
PartID DV IV1 IV2 CV1 CV2 CV3
1 56 CondA1 CondB1 Continuous values
2 45 CondA2 CondB2 -
3 32 CondA3 CondB1 -
4 21 CondA4 CondB2 -
1 10 CondA1 CondB1 -
2 19 CondA2 CondB2 -
3 35 CondA3 CondB1 -
4 45 CondA4 CondB2 -
My conditions are nested within the participant ID in the error term, since this is a fully within-subjects repeated measures model.
I am attempting to conduct a pairwise analysis on these values. My output provides omnibus F-tests:
Error: PartID
Df Sum Sq Mean Sq F value Pr(>F)
CV1 1 348 348 0.442 0.5308
CV2 1 9 9 0.011 0.9193
CV3 1 3989 3989 5.063 0.0654 .
Residuals 6 4727 788
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV1
Df Sum Sq Mean Sq F value Pr(>F)
IV1 1 6222 6222 17.41 0.0024 **
Residuals 9 3217 357
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV2
Df Sum Sq Mean Sq F value Pr(>F)
IV2 2 6215 3107.7 16.18 9.51e-05 ***
Residuals 18 3457 192.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: PartID:IV1:IV2
Df Sum Sq Mean Sq F value Pr(>F)
IV1:IV2 2 575.2 287.6 1.764 0.2
Residuals 18 2934.4 163.0
When calculating emmeans via:
emm <- emmeans(ModelDV, ~ IV1)
pairs(emm)
I get a sensible output.
However, when using this for the covariates:
emm <- emmeans(ModelDV, ~ CV1)
pairs(emm)
I get the following output:
contrast estimate SE df z.ratio p.value
(nothing) nonEst NA NA NA NA
Results are averaged over the levels of: IV1, IV2
What am I doing wrong here that a pairwise comparison is not working for the covariate?
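The short answer is that you entered them as covariates in order to control for them, not to treat them as part of the explanation in your model. A continuous covariate has no levels to compare: emmeans evaluates it at a single reference value (its mean by default), so pairs() has nothing to contrast. You could of course do pairwise comparisons for the covariates outside the model, but not inside the model framework. I wrote a longer blog post using these tools here...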

Convert mgcv or gamm4 gam/bam output to dataframe

The broom package has a great tidy() function for the summary results of simple linear models such as those generated by lm(). However, tidy() does not work for mgcv::bam(), mgcv::gam() or gamm4::gamm4. The bam below produces the following:
library(mgcv)
library(broom)
set.seed(3)
dat <- gamSim(1,n=25000,dist="normal",scale=20)
bs <- "cr";k <- 12
b <- bam(y ~ s(x0,bs=bs)+s(x1,bs=bs)+s(x2,bs=bs,k=k)+
s(x3,bs=bs),data=dat)
summary(b)
tidy(b)
glance(b)
Output of above code:
> summary(b)
Family: gaussian
Link function: identity
Formula:
y ~ s(x0, bs = bs) + s(x1, bs = bs) + s(x2, bs = bs, k = k) +
s(x3, bs = bs)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.8918 0.1275 61.88 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x0) 3.113 3.863 6.667 3.47e-05 ***
s(x1) 2.826 3.511 63.015 < 2e-16 ***
s(x2) 8.620 9.905 52.059 < 2e-16 ***
s(x3) 1.002 1.004 3.829 0.0503 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.0295 Deviance explained = 3.01%
fREML = 1.1057e+05 Scale est. = 406.15 n = 25000
> tidy(b)
data frame with 0 columns and 0 rows
> glance(b)
Error in `$<-.data.frame`(`*tmp*`, "logLik", value = -110549.163197452) :
replacement has 1 row, data has 0
How can I convert the summary to a dataframe so I can access outputs like the coefficients?
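One possible approach is to pull the coefficient tables directly out of the summary object, which stores them as matrices. The sketch below assumes mgcv is installed and uses a small gam() fit to keep it fast; the same components should be available for bam() fits, since both are summarised by summary.gam. The names param_df and smooth_df are illustrative.

```r
library(mgcv)

set.seed(3)
# Small simulated data set (n is reduced here purely for speed)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

s <- summary(b)

# p.table holds the parametric coefficients, s.table the smooth terms
param_df <- as.data.frame(s$p.table)
smooth_df <- as.data.frame(s$s.table)

# Keep the term names as an ordinary column as well as row names
param_df$term <- rownames(param_df)
smooth_df$term <- rownames(smooth_df)

print(param_df)
print(smooth_df)
```

Individual values can then be accessed in the usual way, for example smooth_df[smooth_df$term == "s(x2)", "edf"].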

Proportion of variance of outcome explained by each variable in a linear regression

In the example data set found below I want to calculate the proportion of variance in science explained by each independent variable using a linear regression model. How could I achieve that in R?
hsb2 <- read.table('http://www.ats.ucla.edu/stat/r/modules/hsb2.csv', header=T, sep=",")
m1<-lm(science ~ math+female+ socst+ read, data =hsb2)
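One way is to use the anova() function from the stats package.
It gives the sequential sum of squares attributed to each variable, in the order the variables enter the model, along with the residual sum of squares. The Sum Sq column adds up to the total sum of squares, so each entry divided by that total is the proportion of variance attributed to that term.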
anova(m1)
Analysis of Variance Table
Response: science
Df Sum Sq Mean Sq F value Pr(>F)
math 1 7760.6 7760.6 151.8810 < 2.2e-16 ***
female 1 233.0 233.0 4.5599 0.033977 *
socst 1 465.6 465.6 9.1128 0.002878 **
read 1 1084.5 1084.5 21.2254 7.363e-06 ***
Residuals 195 9963.8 51.1
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
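The proportions can be computed from the table by dividing each sum of squares by the column total. This is a minimal sketch that assumes the built-in swiss data as a stand-in for hsb2, so the numbers differ from the table above; the name prop is illustrative.

```r
# Stand-in model on a built-in data set (not the hsb2 data from the question)
m <- lm(Fertility ~ Catholic + Education + Agriculture, data = swiss)
tab <- anova(m)

# Each term's sequential sum of squares as a share of the total
prop <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(prop) <- rownames(tab)
print(round(prop, 3))

# The regressor shares add up to the model's R-squared
r2_from_table <- sum(prop[names(prop) != "Residuals"])
print(c(from_table = r2_from_table, from_summary = summary(m)$r.squared))
```

Because anova() on a single lm fit uses sequential sums of squares, the share attributed to each variable depends on the order of the terms in the formula whenever the predictors are correlated.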

How can I get aov to show me the F-statistic and p-value?

The following script
#!/usr/bin/Rscript --vanilla
x <- c(4.5,6.4,7.2,6.7,8.8,7.8,9.6,7.0,5.9,6.8,5.7,5.2)
fertilizer<- factor(c('A','A','A','A','B','B','B','B','C','C','C','C'))
crop <- factor(c('I','II','III','IV','I','II','III','IV','I','II','III','IV'))
av <- aov(x~fertilizer*crop)
summary(av)
yields
Df Sum Sq Mean Sq
fertilizer 2 13.6800 6.8400
crop 3 2.8200 0.9400
fertilizer:crop 6 6.5800 1.0967
For other data, aov usually gives the F-statistic and associated p-value. What is wrong/special about this data that causes R to omit the juicy parts?
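Should you be using + instead of * in the formula? The data have exactly one observation for each fertilizer/crop combination, so the interaction term uses up all the remaining degrees of freedom, leaving none for the residual, and no F-test can be computed.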
> summary(aov(x~fertilizer + crop))
Df Sum Sq Mean Sq F value Pr(>F)
fertilizer 2 13.6800 6.8400 6.2371 0.03426 *
crop 3 2.8200 0.9400 0.8571 0.51218
Residuals 6 6.5800 1.0967
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
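The missing F column can be traced to the residual degrees of freedom, which can be checked directly. The sketch below reuses the data from the question; the names full and additive are illustrative.

```r
x <- c(4.5, 6.4, 7.2, 6.7, 8.8, 7.8, 9.6, 7.0, 5.9, 6.8, 5.7, 5.2)
fertilizer <- factor(c('A','A','A','A','B','B','B','B','C','C','C','C'))
crop <- factor(c('I','II','III','IV','I','II','III','IV','I','II','III','IV'))

# One observation per fertilizer/crop cell
print(table(fertilizer, crop))

full <- aov(x ~ fertilizer * crop)      # main effects plus interaction
additive <- aov(x ~ fertilizer + crop)  # main effects only

# 12 observations and 12 parameters in the full model: nothing left over
print(df.residual(full))      # 0
print(df.residual(additive))  # 6
```

With zero residual degrees of freedom there is no error mean square to divide by, which is why summary() prints only Df, Sum Sq, and Mean Sq for the full model.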
