How to reduce the regression output in markdown - R

I want to show a regression output in markdown, but the model contains a lot of character variables, which results in a lot of independent variables. Is there any way to show only the first 5 variables in the summary? The summary() function in combination with options(max.print = 80) does not provide the solution I want.

You can use the tidy() function from the broom package:
library(broom)
library(magrittr)
lm(mpg ~ ., data = mtcars) %>% tidy() %>% head(n = 5)
#> # A tibble: 5 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 12.3 18.7 0.657 0.518
#> 2 cyl -0.111 1.05 -0.107 0.916
#> 3 disp 0.0133 0.0179 0.747 0.463
#> 4 hp -0.0215 0.0218 -0.987 0.335
#> 5 drat 0.787 1.64 0.481 0.635
Created on 2022-07-08 by the reprex package (v2.0.1)
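Since the goal is a markdown document, one further option (not part of the original answer, and assuming knitr is available) is to pipe the truncated tibble through knitr::kable() to get a markdown table:
library(knitr)
# Render only the first five coefficient rows as a markdown table
lm(mpg ~ ., data = mtcars) %>%
  tidy() %>%
  head(n = 5) %>%
  kable(digits = 3)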

If I understand you correctly, you could, for example, subset the coefficients of the variables you want, like this (I use the mtcars dataset as an example):
model = lm(mpg ~ ., data=mtcars)
smy = summary(model)
smy$coefficients[1:5,]
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 12.30337416 18.71788443 0.6573058 0.5181244
#> cyl -0.11144048 1.04502336 -0.1066392 0.9160874
#> disp 0.01333524 0.01785750 0.7467585 0.4634887
#> hp -0.02148212 0.02176858 -0.9868407 0.3349553
#> drat 0.78711097 1.63537307 0.4813036 0.6352779
Created on 2022-07-07 by the reprex package (v2.0.1)
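If you also want the familiar summary()-style formatting (significance stars, formatted p-values) for the subset, a minimal sketch is to pass the subsetted matrix to stats::printCoefmat(), the helper that summary.lm() itself uses for printing:
# Reuses `smy` from above; prints only the first five rows with significance stars
printCoefmat(smy$coefficients[1:5, ], signif.stars = TRUE)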

Related

Creating data frame with CIs from matching and poisson model

After doing PS matching, I'm running a Poisson model like so:
model <- glm(outcome ~ x1 + x2 + x3 ...,
             data = d,
             weights = psweights$weights,
             family = "poisson")
I then want to create a new data frame with the variable names, coefficients, and upper and lower confidence intervals. Just doing:
d2 <- summary(model)$coef
gets me the variable names, coefficients, standard errors, and z values. What is the easiest way to compute the confidence intervals, convert them into columns, and bind it all into one data frame?
How about this, using the broom package:
library(broom)
mod <- glm(hp ~ disp + drat + cyl, data=mtcars, family=poisson)
tidy(mod, conf.int=TRUE)
#> # A tibble: 4 × 7
#> term estimate std.error statistic p.value conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 2.40 0.196 12.3 1.30e-34 2.02 2.79
#> 2 disp 0.000766 0.000259 2.96 3.07e- 3 0.000258 0.00127
#> 3 drat 0.240 0.0386 6.22 4.89e-10 0.164 0.315
#> 4 cyl 0.236 0.0195 12.1 1.21e-33 0.198 0.274
Created on 2022-06-30 by the reprex package (v2.0.1)
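If you prefer to stay in base R, a minimal sketch (assuming your fitted object is called model, as in the question) is to bind confint() onto the coefficient matrix; for a glm, confint() returns profile-likelihood intervals:
d2 <- summary(model)$coef            # estimates, std. errors, z values, p values
ci <- confint(model)                 # profile-likelihood CIs, rows in the same order
d2 <- data.frame(term = rownames(d2), d2,
                 conf.low = ci[, 1], conf.high = ci[, 2],
                 row.names = NULL)
d2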

Summarize results of multiple regressions in a data table

I want to summarize the results of multiple regressions in a data table.
Packages used in the example:
library(data.table)
library(fixest)
library(broom)
library(tidyr)
Example data
dt <- data.table(mtcars)
First I create all the formulas that will be used.
y_vars <- c("mpg","cyl")
x_vars <- c("disp", "hp")
vars <- tidyr::crossing(y_vars, x_vars)
vars$formula <- paste(vars$y_vars, "~", vars$x_vars)
formulas <- vars$formula
Then I estimate all the models and summarize the results using tidy():
res <- lapply(formulas, function(i) tidy(feols(as.formula(i), data = dt)))
data.table::rbindlist(res)
Here is the resulting data table :
term estimate std.error statistic p.value
1: (Intercept) 3.18856797 0.296387718 10.758097 8.121618e-12
2: disp 0.01299804 0.001135649 11.445474 1.802838e-12
3: (Intercept) 3.00679525 0.425485225 7.066744 7.405351e-08
4: hp 0.02168354 0.002635142 8.228604 3.477861e-09
5: (Intercept) 29.59985476 1.229719515 24.070411 3.576586e-21
6: disp -0.04121512 0.004711833 -8.747152 9.380327e-10
7: (Intercept) 30.09886054 1.633920950 18.421246 6.642736e-18
8: hp -0.06822828 0.010119304 -6.742389 1.787835e-07
The problem is that I cannot identify the y variable in this summary table.
Ideally, I'd like to have one more column taking the value of the y variable.
I looked in the tidy() documentation but did not find how to add it.
Any idea how to do this, please?
Consider either Map from base R (which can take multiple arguments)
library(data.table)
rbindlist(Map(function(fmla, yvar)
  transform(tidy(feols(as.formula(fmla), data = dt)), yvar = yvar),
  formulas, vars$y_vars))
term estimate std.error statistic p.value yvar
1: (Intercept) 3.18856797 0.296387718 10.758097 8.121618e-12 cyl
2: disp 0.01299804 0.001135649 11.445474 1.802838e-12 cyl
3: (Intercept) 3.00679525 0.425485225 7.066744 7.405351e-08 cyl
4: hp 0.02168354 0.002635142 8.228604 3.477861e-09 cyl
5: (Intercept) 29.59985476 1.229719515 24.070411 3.576586e-21 mpg
6: disp -0.04121512 0.004711833 -8.747152 9.380327e-10 mpg
7: (Intercept) 30.09886054 1.633920950 18.421246 6.642736e-18 mpg
8: hp -0.06822828 0.010119304 -6.742389 1.787835e-07 mpg
or use map2 from purrr
library(dplyr)
library(purrr)
library(tidyr)
vars %>%
  transmute(out = map2(y_vars, formula,
                       ~ tidy(feols(as.formula(.y), data = dt)) %>%
                         mutate(y_var = .x))) %>%
  unnest(out)
Output:
# A tibble: 8 x 6
term estimate std.error statistic p.value y_var
<chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 (Intercept) 3.19 0.296 10.8 8.12e-12 cyl
2 disp 0.0130 0.00114 11.4 1.80e-12 cyl
3 (Intercept) 3.01 0.425 7.07 7.41e- 8 cyl
4 hp 0.0217 0.00264 8.23 3.48e- 9 cyl
5 (Intercept) 29.6 1.23 24.1 3.58e-21 mpg
6 disp -0.0412 0.00471 -8.75 9.38e-10 mpg
7 (Intercept) 30.1 1.63 18.4 6.64e-18 mpg
8 hp -0.0682 0.0101 -6.74 1.79e- 7 mpg
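Another option that stays close to the original lapply() call is to name the list of results and let data.table::rbindlist() add an identifier column via its idcol argument; here the extra column holds the whole formula rather than just the y variable, which may already be enough to tell the models apart:
res <- lapply(formulas, function(i) tidy(feols(as.formula(i), data = dt)))
names(res) <- formulas
# idcol turns the list names into an extra column in the stacked table
data.table::rbindlist(res, idcol = "formula")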

How to extract the coefficients of a linear model and store in a variable in R?

I have a data frame and I fit a linear model. I want to extract the coefficients and store each coefficient in a variable using R.
This is my data frame:
df <- mtcars
fit <- lm(mpg~., data = df)
This is how I extract one coefficient:
beta_0 = fit$coefficients[1]
I want to do this automatically for all the coefficients in my model. I tried to use a loop, but it is not working. I know this is not the right code, but it is what I found:
for (i in fit$coefficients(1:11)) {
d["s{0}".format(x)] = variable1
}
df <- mtcars
fit <- lm(mpg~., data = df)
beta_0 = fit$coefficients[1]
#base R approach
coef_base <- coef(fit)
coef_base
#> (Intercept) cyl disp hp drat wt
#> 12.30337416 -0.11144048 0.01333524 -0.02148212 0.78711097 -3.71530393
#> qsec vs am gear carb
#> 0.82104075 0.31776281 2.52022689 0.65541302 -0.19941925
#tidyverse approach with the broom package
coef_tidy <- broom::tidy(fit)
coef_tidy
#> # A tibble: 11 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 12.3 18.7 0.657 0.518
#> 2 cyl -0.111 1.05 -0.107 0.916
#> 3 disp 0.0133 0.0179 0.747 0.463
#> 4 hp -0.0215 0.0218 -0.987 0.335
#> 5 drat 0.787 1.64 0.481 0.635
#> 6 wt -3.72 1.89 -1.96 0.0633
#> 7 qsec 0.821 0.731 1.12 0.274
#> 8 vs 0.318 2.10 0.151 0.881
#> 9 am 2.52 2.06 1.23 0.234
#> 10 gear 0.655 1.49 0.439 0.665
#> 11 carb -0.199 0.829 -0.241 0.812
for (i in coef_base) {
  # do work on i
  print(i)
}
#> [1] 12.30337
#> [1] -0.1114405
#> [1] 0.01333524
#> [1] -0.02148212
#> [1] 0.787111
#> [1] -3.715304
#> [1] 0.8210407
#> [1] 0.3177628
#> [1] 2.520227
#> [1] 0.655413
#> [1] -0.1994193
In most cases, as.numeric(coef(fit)[i]) is sufficient to isolate the coefficients:
fit <- lm(mpg~.,mtcars)
for (i in 1:length(coef(fit))) {
  print(as.numeric(coef(fit)[i]))
}
#[1] 12.30337
#[1] -0.1114405
#[1] 0.01333524
#[1] -0.02148212
#[1] 0.787111
#[1] -3.715304
#[1] 0.8210407
#[1] 0.3177628
#[1] 2.520227
#[1] 0.655413
#[1] -0.1994193
If you need to put the coefficients into a data frame, this code will put each coefficient into a separate variable (variable1, variable2, ...) within a data frame (vars):
fit <- lm(mpg~.,mtcars)
ce <- coef(fit)
vars <- data.frame(col = NA)
for (i in 1:length(ce)) {
  new_col <- as.numeric(ce[i])
  vars[1, i] <- new_col
  colnames(vars)[i] <- paste0("variable", i)
}
vars
# variable1 variable2 variable3 variable4 variable5 variable6 variable7 variable8 variable9 variable10 variable11
# 1 12.30337 -0.1114405 0.01333524 -0.02148212 0.787111 -3.715304 0.8210407 0.3177628 2.520227 0.655413 -0.1994193
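As a side note, if the goal is simply a one-row data frame with one column per coefficient, the loop can arguably be replaced by transposing the coefficient vector; a minimal sketch using the same fit as above:
ce <- coef(fit)
vars <- as.data.frame(t(ce))                      # one row, one column per coefficient
names(vars) <- paste0("variable", seq_along(ce))  # match the naming used in the loop above
vars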

How to run many regressions across rows and columns with vectorization

I want to run a series of linear regressions for multiple groups across columns. For the group stratification across rows, I can use the idea suggested here (Fitting several regression models with dplyr). In addition to that, I also need to regress them across different columns. Below is the code I wrote with a loop. I wonder whether I can do both in a vectorized manner, using the map function from the purrr package together with group_by from the dplyr package, and export the estimated beta coefficients and p-values accordingly.
library(dplyr)
library(broom)
head(mtcars)
vec <- names(mtcars)[3:9]
data <- NULL
for (i in 1:length(vec)) {
  df <- mtcars %>%
    group_by(cyl) %>%
    do(fit = lm(paste('mpg ~ disp +', vec[i]), data = .))
  dfCoef <- tidy(df, fit)
  res <- dfCoef %>%
    filter(term == 'disp')
  res$con <- vec[i]
  data <- bind_rows(data, res)
}
data
Using tidyr::nest()/unnest() to perform the regressions by group, together with a helper function, this could be achieved like so:
library(dplyr)
library(broom)
library(tidyr)
library(purrr)
vec <- names(mtcars)[3:9]

lm_help <- function(vec) {
  mtcars %>%
    tidyr::nest(data = -cyl) %>%
    mutate(con = vec,
           fit = purrr::map(data, lm,
                            formula = as.formula(paste0("mpg ~ disp + ", vec))),
           tidy = purrr::map(fit, tidy)) %>%
    select(cyl, con, tidy) %>%
    tidyr::unnest(tidy) %>%
    filter(term == "disp")
}

purrr::map(vec, lm_help) %>%
  bind_rows()
#> # A tibble: 21 x 7
#> cyl con term estimate std.error statistic p.value
#> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 6 disp disp 0.00361 0.0156 0.232 0.826
#> 2 4 disp disp -0.135 0.0332 -4.07 0.00278
#> 3 8 disp disp -0.0196 0.00932 -2.11 0.0568
#> 4 6 hp disp 0.00180 0.0202 0.0890 0.933
#> 5 4 hp disp -0.120 0.0369 -3.24 0.0120
#> 6 8 hp disp -0.0186 0.00946 -1.97 0.0746
#> 7 6 drat disp 0.0224 0.0292 0.770 0.484
#> 8 4 drat disp -0.133 0.0406 -3.27 0.0114
#> 9 8 drat disp -0.0196 0.00977 -2.01 0.0697
#> 10 6 wt disp 0.0191 0.0109 1.75 0.154
#> # ... with 11 more rows
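As a small follow-up, the final map() plus bind_rows() pair could be collapsed into purrr::map_dfr(), which row-binds the results as it maps (the output should be the same tibble as above):
# Same result as purrr::map(vec, lm_help) %>% bind_rows()
purrr::map_dfr(vec, lm_help)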

Linear regression between dependent variable with multiple independent variables

I would like to create a function where the dependent variable (y) is regressed on individual independent variables (x1, x2, etc.), but not in the form of a multiple regression. I would also like to include another function in the same formula to calculate the AIC value, so both of these functions are in the same formula. Does anybody have an idea how to do it? I have a huge dataset and I need to fit a regression of an individual dependent variable on multiple independent variables. I would really appreciate it if somebody could guide me here.
The following code will give you the results of the dependent variable (y) regressed on each independent variable individually:
data(mtcars)
x = names(mtcars[,-1])
out <- unlist(lapply(1, function(n) combn(x, 1, FUN=function(row) paste0("mpg ~ ", paste0(row, collapse = "+")))))
out
#> [1] "mpg ~ cyl" "mpg ~ disp" "mpg ~ hp" "mpg ~ drat" "mpg ~ wt"
#> [6] "mpg ~ qsec" "mpg ~ vs" "mpg ~ am" "mpg ~ gear" "mpg ~ carb"
library(broom)
#> Warning: package 'broom' was built under R version 3.5.3
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# To get the regression coefficients
tmp1 <- bind_rows(lapply(out, function(frml) {
  a <- tidy(lm(frml, data = mtcars))
  a$frml <- frml
  return(a)
}))
head(tmp1)
#> # A tibble: 6 x 6
#> term estimate std.error statistic p.value frml
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 (Intercept) 37.9 2.07 18.3 8.37e-18 mpg ~ cyl
#> 2 cyl -2.88 0.322 -8.92 6.11e-10 mpg ~ cyl
#> 3 (Intercept) 29.6 1.23 24.1 3.58e-21 mpg ~ disp
#> 4 disp -0.0412 0.00471 -8.75 9.38e-10 mpg ~ disp
#> 5 (Intercept) 30.1 1.63 18.4 6.64e-18 mpg ~ hp
#> 6 hp -0.0682 0.0101 -6.74 1.79e- 7 mpg ~ hp
# To get the regression results, i.e. R2, AIC, BIC
tmp2 <- bind_rows(lapply(out, function(frml) {
  a <- glance(lm(frml, data = mtcars))
  a$frml <- frml
  return(a)
}))
head(tmp2)
#> # A tibble: 6 x 12
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.726 0.717 3.21 79.6 6.11e-10 2 -81.7 169. 174.
#> 2 0.718 0.709 3.25 76.5 9.38e-10 2 -82.1 170. 175.
#> 3 0.602 0.589 3.86 45.5 1.79e- 7 2 -87.6 181. 186.
#> 4 0.464 0.446 4.49 26.0 1.78e- 5 2 -92.4 191. 195.
#> 5 0.753 0.745 3.05 91.4 1.29e-10 2 -80.0 166. 170.
#> 6 0.175 0.148 5.56 6.38 1.71e- 2 2 -99.3 205. 209.
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, frml <chr>
write.csv(tmp1, "Try_lm_coefficients.csv")
write.csv(tmp2, "Try_lm_results.csv")
Created on 2019-11-11 by the reprex package (v0.3.0)
The results can be found in the "Try_lm_coefficients.csv" and "Try_lm_results.csv" files.
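Since each formula here contains only a single predictor, the combn()/lapply() construction could also be replaced by a plain vectorised paste0(); a minimal sketch that reproduces the same out vector:
x <- names(mtcars[, -1])
out <- paste0("mpg ~ ", x)  # one "mpg ~ <predictor>" string per column
out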
