Generate beautiful summary table output for lmer models in R Markdown?

I have a lot of lmer models with a summary() output that looks like this (usually with more variables; this is just a quick nonsense example I generated):
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: mpg ~ cyl + hp + disp * drat + (1 | gear) + (1 | carb)
   Data: mtcars

REML criterion at convergence: 171.8

Scaled residuals:
    Min      1Q  Median      3Q     Max
-1.4583 -0.5671 -0.2118  0.3912  2.1303

Random effects:
 Groups   Name        Variance Std.Dev.
 carb     (Intercept) 2.07353  1.4400
 gear     (Intercept) 0.04659  0.2158
 Residual             7.63829  2.7637
Number of obs: 32, groups:  carb, 6; gear, 3

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)
(Intercept)  6.27454   11.75344 25.17470   0.534   0.5981
cyl         -0.30519    0.84443 25.55852  -0.361   0.7208
hp          -0.01412    0.01798 10.10227  -0.785   0.4503
disp         0.03101    0.03649 23.25810   0.850   0.4041
drat         5.97395    2.65430 24.44199   2.251   0.0337 *
disp:drat   -0.01380    0.01172 24.05299  -1.178   0.2503
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
          (Intr) cyl    hp     disp   drat
cyl       -0.512
hp        -0.087 -0.337
disp      -0.710 -0.021  0.341
drat      -0.965  0.317  0.142  0.823
disp:drat  0.698 -0.131 -0.461 -0.950 -0.778
convergence code: 0
Model failed to converge with max|grad| = 0.0112263 (tol = 0.002, component 1)
Is there a way to generate a summary() output with all of this information, but nicely formatted for R Markdown? I've found a variety of solutions for Markdown (e.g. kable, huxtable, pander), but most of them can only display, for example, the fixed-effects table. In those examples, the significance stars (* to ***) are usually also not shown.
I'm looking for a function that gives me an output just like this, but formatted for R Markdown and including the significance codes.
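One minimal sketch of a possible approach (an assumption on my part, not a full summary() replica): tidy the fixed and random effects with broom.mixed and render them with knitr::kable(), adding the significance stars by hand, since kable() does not produce them itself. Packages such as sjPlot or modelsummary may get closer to a complete table, but the idea is the same.

library(lmerTest)      # lmer() with Satterthwaite t-tests, as in the output above
library(broom.mixed)   # tidy() methods for mixed models
library(knitr)

m <- lmer(mpg ~ cyl + hp + disp * drat + (1 | gear) + (1 | carb), data = mtcars)

# Fixed effects as a data frame, with hand-made significance stars
fixed <- tidy(m, effects = "fixed")
fixed$sig <- cut(fixed$p.value,
                 breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                 labels = c("***", "**", "*", ".", ""),
                 include.lowest = TRUE)
kable(fixed, digits = 3)

# Random-effect variance components can be rendered the same way
kable(tidy(m, effects = "ran_pars"), digits = 3)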

Related

How to explain the result of ANCOVA and linear regression

Currently I am learning ANCOVA, but I'm confused by the result.
I created a linear regression model using mtcars like this:
summary(lm(qsec ~ wt+factor(am), data = mtcars))
The output is:
Call:
lm(formula = qsec ~ wt + factor(am), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-2.6898 -1.3063 0.0167 1.1398 3.9917
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.5990 1.5596 14.490 8.17e-15 ***
wt -1.1716 0.4025 -2.911 0.00685 **
factor(am)1 -2.4141 0.7892 -3.059 0.00474 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.582 on 29 degrees of freedom
Multiple R-squared: 0.267, Adjusted R-squared: 0.2165
F-statistic: 5.283 on 2 and 29 DF, p-value: 0.01106
As you can see, the p-value for wt is 0.00685, which indicates a strong linear relationship between wt and qsec, and the same goes for am.
But when I ran the aov code:
summary(aov(qsec ~ wt+factor(am), data = mtcars))
With the output:
Df Sum Sq Mean Sq F value Pr(>F)
wt 1 3.02 3.022 1.208 0.28081
factor(am) 1 23.41 23.413 9.358 0.00474 **
Residuals 29 72.55 2.502
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
It seems that there was no effect of wt on qsec.
Does this mean that a strong linear correlation between wt and qsec can be confirmed, but that wt has no great effect on qsec?
Is my interpretation appropriate?
First, drop the factor(): since am only takes two values, converting it to a factor has no effect on the results.
Now, regarding the tests used to obtain the p-values: they are different. For lm, the wt p-value is based on the comparison of these two models:
qsec ~ am
qsec ~ wt + am
so we have
anova(lm(qsec ~ am, mtcars), lm(qsec ~ wt + am, mtcars))
## Model 1: qsec ~ am
## Model 2: qsec ~ wt + am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 93.758
## 2 29 72.554 1 21.204 8.4753 0.006854 ** <-- 0.00685
summary(lm(qsec ~ wt + am, mtcars))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.5990 1.5596 14.490 8.17e-15 ***
## wt -1.1716 0.4025 -2.911 0.00685 ** <-- 0.00685
## am -2.4141 0.7892 -3.059 0.00474 **
whereas aov is really meant for balanced designs in which the terms are orthogonal. If they are not orthogonal, as here, it conceptually orthogonalizes them sequentially, so in that case the comparison is effectively between these two models:
qsec ~ r.am
qsec ~ r.wt + r.am
where r.wt is the portion of wt orthogonal to the intercept, and r.am is the portion of am orthogonal to wt and the intercept, so we have:
r.wt <- resid(lm(wt ~ 1, mtcars))
r.am <- resid(lm(am ~ wt, mtcars))
anova(lm(qsec ~ r.am, mtcars), lm(qsec ~ r.wt + r.am, mtcars))
## Model 1: qsec ~ r.am
## Model 2: qsec ~ r.wt + r.am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 75.576
## 2 29 72.554 1 3.0217 1.2078 0.2808 <------- 0.2808
summary(aov(qsec ~ wt + am, mtcars))
## Df Sum Sq Mean Sq F value Pr(>F)
## wt 1 3.02 3.022 1.208 0.28081 <------- 0.28081
## am 1 23.41 23.413 9.358 0.00474 **
## Residuals 29 72.55 2.502
It would also be possible to demonstrate this by performing Gram-Schmidt orthogonalization on the cbind(1, wt, am) model matrix to make the columns orthogonal. The pracma package has a Gram-Schmidt routine.
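A minimal sketch of that demonstration, assuming pracma is installed (its gramSchmidt() returns an orthonormal Q and an upper-triangular R):

library(pracma)
X <- cbind(1, mtcars$wt, mtcars$am)   # model matrix: intercept, wt, am
Q <- gramSchmidt(X)$Q                 # columns orthogonalized left to right
# With mutually orthogonal regressors, the marginal t-tests coincide with the
# sequential tests, so the p-values should reproduce the aov table above
# (0.28081 for the wt column, 0.00474 for the am column).
summary(lm(mtcars$qsec ~ Q[, 2] + Q[, 3]))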

Fixed effect counts in modelsummary

I have a modelsummary of three fixed-effects regressions, like so:
remotes::install_github("lrberge/fixest")
remotes::install_github("vincentarelbundock/modelsummary")
library(fixest)
library(modelsummary)
mod1 <- feols(mpg ~ hp | cyl, data = mtcars)
mod2 <- feols(mpg ~ wt | cyl, data = mtcars)
mod3 <- feols(mpg ~ drat | cyl, data = mtcars)
modelsummary(list(mod1, mod2, mod3), output = "markdown")
                 Model 1   Model 2   Model 3
hp               -0.024
                 (0.015)
wt                         -3.206
                           (1.188)
drat                                  1.793
                                     (1.564)
Num.Obs.         32        32        32
R2               0.754     0.837     0.745
R2 Adj.          0.727     0.820     0.718
R2 Within        0.080     0.392     0.048
R2 Within Adj.   0.047     0.371     0.014
AIC              167.9     154.6     169.0
BIC              173.8     160.5     174.9
RMSE             2.94      2.39      2.99
Std.Errors       by: cyl   by: cyl   by: cyl
FE: cyl          X         X         X
Instead of the table merely showing whether certain fixed effects are present, is it possible to show the number of fixed effects that were estimated?
The raw models do contain this information:
> mod1
OLS estimation, Dep. Var.: mpg
Observations: 32
Fixed-effects: cyl: 3
Standard-errors: Clustered (cyl)
Estimate Std. Error t value Pr(>|t|)
hp -0.024039 0.015344 -1.56664 0.25771
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 2.94304 Adj. R2: 0.727485
Within R2: 0.07998
Yes, you’ll need to define a glance_custom.fixest() method. See this section of the docs for detailed instructions and many examples:
https://vincentarelbundock.github.io/modelsummary/articles/modelsummary.html#customizing-existing-models-part-i
And here’s an example with fixest:
library(fixest)
library(tibble)
library(modelsummary)
models <- list(
  feols(mpg ~ hp | cyl, data = mtcars),
  feols(mpg ~ hp | am, data = mtcars),
  feols(mpg ~ hp | cyl + am, data = mtcars)
)

glance_custom.fixest <- function(x, ...) {
  tibble::tibble(`# FE` = paste(x$fixef_sizes, collapse = " + "))
}
modelsummary(models, gof_map = c("nobs", "# FE"))
            (1)       (2)       (3)
hp          -0.024    -0.059    -0.044
            (0.015)   (0.000)   (0.016)
Num.Obs.    32        32        32
# FE        3         2         3 + 2

Is it possible to add multiple variables in the same regression model?

These are the regression models that I want to fit. I want to select many variables at the same time to build a multivariate model, since my data frame has 357 variables.
summary(lm(formula = bci_bci ~ bti_acp, data = qog))
summary(lm(formula = bci_bci ~ wdi_pop, data = qog))
summary(lm(formula = bci_bci ~ ffp_sl, data = qog))
Instead of listing all your variables joined by + signs, you can use the shorthand notation . to add all variables in data as explanatory variables (except, of course, the target variable on the left-hand side).
data("mtcars")
mod <- lm(mpg ~ ., data = mtcars)
summary(mod)
#>
#> Call:
#> lm(formula = mpg ~ ., data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.4506 -1.6044 -0.1196 1.2193 4.6271
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 12.30337 18.71788 0.657 0.5181
#> cyl -0.11144 1.04502 -0.107 0.9161
#> disp 0.01334 0.01786 0.747 0.4635
#> hp -0.02148 0.02177 -0.987 0.3350
#> drat 0.78711 1.63537 0.481 0.6353
#> wt -3.71530 1.89441 -1.961 0.0633 .
#> qsec 0.82104 0.73084 1.123 0.2739
#> vs 0.31776 2.10451 0.151 0.8814
#> am 2.52023 2.05665 1.225 0.2340
#> gear 0.65541 1.49326 0.439 0.6652
#> carb -0.19942 0.82875 -0.241 0.8122
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.65 on 21 degrees of freedom
#> Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
#> F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
par(mfrow=c(2,2))
plot(mod)
par(mfrow=c(1,1))
Created on 2021-12-21 by the reprex package (v2.0.1)
If you want to include all two-way interactions, the notation would be this:
lm(mpg ~ (.)^2, data = mtcars)
If you want to include all three-way interactions, the notation would be this:
lm(mpg ~ (.)^3, data = mtcars)
If you create very large models (with many variables or interactions), make sure that you also perform some model-size reduction afterwards, e.g. using the function step(), as sketched below. It's very likely that not all of your predictors are actually informative, and many could be correlated, which causes problems in multivariate models. One way out is to remove predictors that are highly correlated with other predictors from the model.
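For instance, a minimal sketch of AIC-based backward elimination with step(), starting from the main-effects model (the two-way interaction model above would have more coefficients than mtcars has rows):

full <- lm(mpg ~ ., data = mtcars)
reduced <- step(full, direction = "backward", trace = 0)  # drop terms while AIC improves
summary(reduced)  # for mtcars this typically retains wt, qsec, and am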

How do I access the R squared value for the entire model generated by vegan function envfit?

I am analyzing some microbiome data by using unconstrained ordination (PCA or NMDS) followed by environmental vector fitting with the envfit function in the vegan package. The output of envfit includes an r2 value for each vector or factor included in the envfit model, but I am interested in the total amount of variation explained by all the vectors/factors, rather than just stand-alone variables. I presume I cannot simply add up the R2 values assigned to each environmental variable, because there may be overlap in the microbiome variation that is "explained" by each environmental variable. However, there does not seem to be any way of accessing the total r2 value for the model.
Using an example dataset, this is what I have tried so far:
library(vegan)
library(MASS)
data(varespec, varechem)
ord <- metaMDS(varespec)
fit <- envfit(ord, varechem, perm = 999)
fit
This shows r2 for each environmental variable, but how do I extract the r2 value for the entire model?
I have tried running fit$r, attributes(fit)$r, and Rsquare.Adj(fit), but these all return NULL.
R-squared = explained variation / total variation, i.e. R^2 = 1 - SSE/SST. For two different responses, the residuals will be on different scales, so calculating a combined R^2 for two responses does not make sense.
For example, in classical lm, they are calculated separately:
> summary(lm(cbind(mpg,wt) ~.,data=mtcars))
Response mpg :
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + qsec + vs + am +
gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.6453 -1.2655 -0.4199 1.6320 5.0843
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.57062 19.81294 0.786 0.4403
cyl 0.11982 1.10348 0.109 0.9145
disp -0.01361 0.01212 -1.122 0.2738
hp -0.01122 0.02246 -0.500 0.6223
drat 1.32726 1.71312 0.775 0.4467
qsec 0.09428 0.66944 0.141 0.8893
vs 0.66770 2.22845 0.300 0.7673
am 2.90074 2.17590 1.333 0.1961
gear 1.18650 1.56061 0.760 0.4552
carb -1.32912 0.63321 -2.099 0.0475 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.816 on 22 degrees of freedom
Multiple R-squared: 0.845, Adjusted R-squared: 0.7816
F-statistic: 13.33 on 9 and 22 DF, p-value: 5.228e-07
Response wt :
Call:
lm(formula = wt ~ cyl + disp + hp + drat + qsec + vs + am + gear +
carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-0.40769 -0.18831 0.00012 0.15204 0.50382
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.879401 2.098183 -0.419 0.679189
cyl -0.062246 0.116858 -0.533 0.599603
disp 0.007252 0.001284 5.649 1.11e-05 ***
hp -0.002763 0.002378 -1.162 0.257792
drat -0.145385 0.181419 -0.801 0.431483
qsec 0.195613 0.070893 2.759 0.011445 *
vs -0.094189 0.235992 -0.399 0.693653
am -0.102418 0.230427 -0.444 0.661045
gear -0.142945 0.165268 -0.865 0.396411
carb 0.304068 0.067056 4.535 0.000163 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2983 on 22 degrees of freedom
Multiple R-squared: 0.9341, Adjusted R-squared: 0.9071
F-statistic: 34.63 on 9 and 22 DF, p-value: 5.944e-11
For your example, you likewise have to work with the R^2 of each environmental variable separately; there is no meaningful combined value.
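As a minimal sketch (plain lm here, not envfit-specific), the per-response R^2 values can be computed directly from the definition R^2 = 1 - SSE/SST:

fit <- lm(cbind(mpg, wt) ~ ., data = mtcars)
sse <- colSums(resid(fit)^2)                                    # residual SS per response
sst <- colSums(scale(mtcars[c("mpg", "wt")], scale = FALSE)^2)  # total SS about each mean
1 - sse / sst  # should match the Multiple R-squared values above (0.845 and 0.9341)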

How is 95% CI calculated using confint in R?

I am using the example provided on the confint help page:
> fit <- lm(100/mpg ~ disp + hp + wt + am, data=mtcars)
> summary(fit)
Call:
lm(formula = 100/mpg ~ disp + hp + wt + am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-1.6923 -0.3901 0.0579 0.3649 1.2608
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.740648 0.738594 1.003 0.32487
disp 0.002703 0.002715 0.996 0.32832
hp 0.005275 0.003253 1.621 0.11657
wt 1.001303 0.302761 3.307 0.00267 **
am 0.155815 0.375515 0.415 0.68147
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6754 on 27 degrees of freedom
Multiple R-squared: 0.8527, Adjusted R-squared: 0.8309
F-statistic: 39.08 on 4 and 27 DF, p-value: 7.369e-11
> confint(fit)
2.5 % 97.5 %
(Intercept) -0.774822875 2.256118188
disp -0.002867999 0.008273849
hp -0.001400580 0.011949674
wt 0.380088737 1.622517536
am -0.614677730 0.926307310
> confint(fit, "wt")
        2.5 %   97.5 %
wt  0.3800887 1.622518
> confint.default(fit, "wt")
        2.5 %   97.5 %
wt  0.4079023 1.594704
> 1.001303 + 1.96*0.302761
[1] 1.594715
> 1.001303 - 1.96*0.302761
[1] 0.4078914
So the 95% CI obtained from confint.default is based on asymptotic normality. What about for confint?
Thanks
You can check out the code for each method.
# View the code for the default method
confint.default
# View the code for lm objects
getAnywhere(confint.lm)
The difference is that the default method uses normal quantiles, while the method for linear models uses t quantiles (based on the residual degrees of freedom) instead.
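A minimal sketch reproducing confint(fit, "wt") by hand, using t quantiles with the model's residual degrees of freedom:

est <- coef(fit)["wt"]
se  <- sqrt(diag(vcov(fit)))["wt"]
est + qt(c(0.025, 0.975), df = df.residual(fit)) * se
# 0.3800887 1.6225175 -- matches confint(); qt(0.975, 27) is about 2.05, not 1.96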
