Multiple fixest_multi models and shape parameter - modelsummary package - r

Again, thanks to Laurent for answering questions and supporting the modelsummary package.
library(tidyverse)
library(fixest)
library(modelsummary)
fit<-mtcars %>% feols(c(mpg,hp )~1)
fit_1 <- mtcars %>% feols(c(mpg,hp,wt )~1)
fit_2 <- mtcars %>% feols(c(mpg,hp,gear, wt )~1)
modelsummary(c(fit, fit_1, fit_2), shape=model + statistic ~ term, output="flextable")
We obtain one long column of estimates from all 3 models (which are simple averages).
Is there a way to rearrange the columns and rows, using either modelsummary's own functions or external post-processing, so that each model gets its own column?
The biggest problem is moving the terms around so that they are aligned on the same row (note that the order of terms differs between fit_1 and fit_2), with the remaining cells filled with NA. I would really appreciate any help! It's part of a larger problem I've been trying to solve unsuccessfully for the last 3 weeks.

One option is to output to a data frame and reshape manually:
library(tidyverse)
library(fixest)
library(modelsummary)
library(flextable)
fit<-mtcars %>% feols(c(mpg,hp )~1)
fit_1 <- mtcars %>% feols(c(mpg,hp,wt )~1)
fit_2 <- mtcars %>% feols(c(mpg,hp,gear, wt )~1)
models <- c(fit, fit_1, fit_2)
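modelsummary(
  models,
  output = "dataframe",
  shape = model + statistic ~ term) |>
  mutate(
    # label each row with its source model: 2 rows (estimate, std. error)
    # per dependent variable, so 4, 6 and 8 rows for the three fits
    fit = c(rep(1, 4), rep(2, 6), rep(3, 8)),
    model = trimws(model)) |>
  # one column per fit; terms missing from a fit become NA
  pivot_wider(names_from = "fit", values_from = "(Intercept)") |>
  select(-statistic, -part, ` ` = model) |>
  flextable()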
This is a very customized shape and modeling context, so I can't currently think of a way to achieve it purely with modelsummary's own arguments.

Related

Add multiple model statistics with add_glance_table() horizontally (and not vertically) in tbl_regression

I used tbl_regression() and add_glance_table() from gtsummary to build a table with a model statistic:
library(survival)
library(gtsummary)
coxph(Surv(time, event) ~ score, data = dat) %>%
tbl_regression(exponentiate = TRUE) %>%
add_glance_table(concordance)
1st question: How can I move the model statistic horizontally, to the right?
In the end, I want to display multiple model statistics, with the C-index in the last column, like this:
tbl_uvregression(
dat_score,
method=survival::coxph,
y = Surv(time, event),
exponentiate = TRUE)
2nd question: How do I use add_glance_table() with tbl_uvregression()?
You can merge any additional columns/statistics into a gtsummary table using the modify_table_body() function (the table_body is an internal data frame that is styled and printed as the summary table).
It's possible to add the c-index in a tbl_uvregression() setting, but I think it requires a deeper understanding of the internals of a tbl_uvregression() object. In the example below, I estimate each univariable model separately, summarize the model with tbl_regression(), merge in the c-index, then stack all the tbls with tbl_stack().
Happy Programming!
library(gtsummary)
library(tidyverse)
library(survival)
packageVersion("gtsummary")
#> [1] '1.5.2'
covariates <- c("age", "marker")
# iterate over the covariates
tbl <-
  covariates %>%
  map(
    function(varname) {
      # build regression model
      mod <-
        str_glue("Surv(ttdeath, death) ~ {varname}") %>%
        as.formula() %>%
        coxph(data = trial)
      # calculate and format c-index. adding variable column to merge in the next step
      df_cindex <-
        broom::glance(mod) %>%
        select(concordance) %>%
        mutate(
          concordance = style_sigfig(concordance, digits = 3),
          variable = varname
        )
      # summarize model
      tbl_regression(mod, exponentiate = TRUE) %>%
        # merge in the c-index
        modify_table_body(~left_join(.x, df_cindex, by = "variable")) %>%
        modify_header(concordance = "**c-index**") # assigning a header label unhides the column
    }
  ) %>%
  # stack all tbls
  tbl_stack()
Created on 2022-04-09 by the reprex package (v2.0.1)

How to run ggpredict() in a loop following multiple regression models?

The aim is to get the predicted probabilities from several regression models. First I run several regression models using the following code:
library(dplyr)
library(tidyr)
library(broom)
library(ggeffects)
mtcars$cyl=as.factor(mtcars$cyl)
df <- mtcars %>%
group_by(cyl) %>%
do(model1 = tidy(lm(mpg ~ wt + gear + am , data = .), conf.int=TRUE)) %>%
gather(model_name, model, -cyl) %>% ## make it long format
unnest()
I would like to get the predicted probabilities for my predictor weight (wt). If I run the code manually for each cylinder count (cyl), it looks like this:
#Filter by number of cylinders
df=filter(mtcars, cyl==4)
#Save the regression
mod= lm(mpg ~ wt + gear + am, data = df)
#Run the predictive probabilities
pred <- ggpredict(mod, terms = c("wt"))
This is the code for the first cylinder group only (cyl==4); we would then have to run the same code for the second (cyl==6) and the third (cyl==8), which is cumbersome. My aim is to automate this, as I do for the regression analyses in the first code block above. I would also like the results in the same format as the first code block, that is, a format that can be plotted afterwards. Can someone help me with that?
Rerun the models with ggpredict() on the inside:
df <- mtcars %>%
group_by(cyl) %>%
do(model1 = ggpredict(lm(mpg ~ wt + gear + am, data= .), terms = c("wt"))) %>%
gather(model_name, model, -cyl) %>% unnest_legacy()
You can then plot wt (in the 'x' column) against 'predicted'. Note that you'll get a warning message on these data.

R two different code chunks to get a p-value but the code evaluates differently and I can't figure out the difference

I'm trying to figure out why these two code chunks give me different p-values for Welch's t-test. I just tried to write a tidy version of the base R code and create a table with both statistics, but the tidy version returns a very small p-value and I'm confused as to why.
t.test(mpg ~ vs, data = mtcars) # p-value = 0.0001098
t.test(mpg ~ am, data = mtcars) # p-value = 0.001374
options(scipen = 999)
mtcars %>%
dplyr::select(mpg, vs, am) %>%
pivot_longer(names_to = 'names', values_to = 'values', 2:3) %>%
nest(data = -names) %>%
mutate(
test = map(data, ~ t.test(.x$mpg, .x$values)), # S3 list-col
tidied = map(test, tidy)
) %>%
unnest(tidied) # vs = 0.000000000000000010038009 and am = 0.000000000000000009611758
If you run simply:
t.test(mtcars$mpg, mtcars$vs)
You'll get the same values as in your nested data example.
So the issue is not the nesting; it's that you're performing a different kind of t-test. The formula version treats vs or am as a grouping variable with two groups (0, 1) and compares mpg between those groups. The two-vector version instead compares the mean of mpg with the mean of the 0/1 column itself.
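To reproduce the base R p-values inside the nested pipeline, one option is to keep the formula interface within map(). The following is a minimal sketch, assuming the dplyr, tidyr, purrr and broom packages are installed; the object name `welch_tidy` is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

welch_tidy <- mtcars %>%
  select(mpg, vs, am) %>%
  pivot_longer(cols = c(vs, am), names_to = "names", values_to = "values") %>%
  nest(data = -names) %>%
  mutate(
    # formula interface: mpg is split into the two groups coded in `values`
    test = map(data, ~ t.test(mpg ~ values, data = .x)),
    tidied = map(test, tidy)
  ) %>%
  unnest(tidied)

# one row per grouping variable (vs, am) with its Welch p-value
welch_tidy %>% select(names, p.value)
```

With this change the p.value column should agree with the t.test(mpg ~ vs, data = mtcars) and t.test(mpg ~ am, data = mtcars) calls from the question.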

Grouped linear regression prediction on different grouped by group in R

I'm trying to build models for specific groups in a dataset and use those models to predict fitted values on a different dataset, respecting the same groups. In other words, using the example below, the model built on the subset cyl==4 of the original data should be used only to predict the subset cyl==4 of the new dataset (data1). Can anyone help with this interesting problem?
I tried to use data1 %>% group_by(cyl) to specify the new data, but that didn't help.
Thank you
library(broom)
library(dplyr)
library(purrr)
data1 <- head(mtcars,20)
x<-mtcars %>%
group_by(cyl) %>%
summarise(fit = list(lm(wt ~ mpg)),
data = list(cur_data())) %>%
mutate(col = map(fit, augment, newdata = data1%>% group_by(cyl)))
Here is a quick way to do this
library(dplyr)
models = mtcars %>% group_by(cyl) %>% do(model = lm(wt ~ mpg, data = .))
Then access the individual models with
library(broom)
tidy(models$model[[1]])
Another way to fit one model per group uses nest_by() (note that this example fits mpg ~ disp rather than wt ~ mpg):
models <- mtcars %>%
nest_by(cyl) %>%
mutate(mod = list(lm(mpg ~ disp, data = data)))
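Neither snippet above shows the prediction step on the new data. One possible way to pair each group's model with the matching rows of data1 is to nest both datasets by cyl and join them. This is only a sketch, assuming dplyr, tidyr, purrr and broom are available; the names `train`, `newdata`, `pred` and `preds` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

data1 <- head(mtcars, 20)

# one model per cyl group, fitted on the original data
models <- mtcars %>%
  nest(train = -cyl) %>%
  mutate(fit = map(train, ~ lm(wt ~ mpg, data = .x)))

# nest the new data by the same key and join, so that each model
# only ever sees the rows of its own cyl group
preds <- data1 %>%
  nest(newdata = -cyl) %>%
  inner_join(select(models, cyl, fit), by = "cyl") %>%
  mutate(pred = map2(fit, newdata, ~ augment(.x, newdata = .y))) %>%
  select(cyl, pred) %>%
  unnest(pred)
```

The predictions end up in the .fitted column of `preds`, one row per row of data1. The inner join silently drops groups of data1 that have no fitted model, which may or may not be the desired behavior.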

Tidy output from many single-variable models using purrr, broom

I have a dataframe that comprises a binary outcome column (y) and multiple independent predictor columns (x1, x2, x3...).
I would like to run many single-variable logistic regression models (e.g. y ~ x1, y ~ x2, y ~ x3), and extract the exponentiated coefficients (odds ratios), 95% confidence intervals and p-values for each model into rows of a dataframe/tibble. It seems to me that a solution should be possible using a combination of purrr and broom.
This question is similar, but I can't work out the next steps of:
extracting only the values I need and
tidying into a dataframe/tibble.
Working from the example in the referenced question:
library(tidyverse)
library(broom)
df <- mtcars
df %>%
names() %>%
paste('am~',.) %>%
map(~glm(as.formula(.x), data= df, family = "binomial"))
After sleeping on it, the solution occurred to me. It uses map_df to run each model and bind the results into one data frame, and tidy to extract the values from each model.
Hopefully this will be useful for others:
library(tidyverse)
library(broom)
df <- mtcars
output <- df %>%
  select(-am) %>%
  names() %>%
  paste('am~',.) %>%
  map_df(~tidy(glm(as.formula(.x),
                   data= df,
                   family = "binomial"),
               conf.int=TRUE,
               exponentiate=TRUE)) %>%
  filter(term !="(Intercept)")
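To keep only the odds ratio, confidence interval and p-value for each model, a select() step can follow. The sketch below is a variation rather than the answer's exact code: it assumes dplyr, purrr and broom are installed, restricts the predictors to three columns (mpg, wt, hp) purely to keep the example small, and uses the made-up names `predictors`, `or_table` and `odds_ratio`.

```r
library(dplyr)
library(purrr)
library(broom)

# illustrative subset of predictors (an assumption, not from the question)
predictors <- c("mpg", "wt", "hp")

or_table <- predictors %>%
  set_names() %>%  # name each element after itself so .id carries the predictor
  map_dfr(
    ~ tidy(
      glm(reformulate(.x, response = "am"), data = mtcars, family = binomial),
      conf.int = TRUE,
      exponentiate = TRUE
    ),
    .id = "predictor"
  ) %>%
  filter(term != "(Intercept)") %>%
  select(predictor, odds_ratio = estimate, conf.low, conf.high, p.value)
```

Each row of `or_table` then corresponds to one single-variable model.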
