Using gtsummary I want to display my adjusted linear regression model without
displaying the covariates. So far I have not found a solution for this. Does anyone know how best to do this?
For example, using the code below, I would like to display the first row, which shows the cylinder variable, and omit the subsequent rows (disp and hp).
# download pacman package if not installed, otherwise load it
if(!require(pacman)) install.packages("pacman")
# loads relevant packages using the pacman package
pacman::p_load(
magrittr, # for pipes
gtsummary) # for tables
mtcars %>%
lm(mpg ~ cyl + disp + hp, data = .) %>%
tbl_regression()
The table currently looks like this...
As per Daniel's suggestion:
library(gtsummary)
mtcars %>%
lm(mpg ~ cyl + disp + hp, data = .) %>%
tbl_regression(include = "cyl") # show only cyl; disp and hp remain in the model
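The `include` argument only controls which rows are displayed; the model is still fitted with every covariate. A minimal sketch to check this follows. The object names `fit` and `tbl` are illustrative, and the check assumes that the table's underlying `table_body` data frame stores the unrounded estimate, which is how gtsummary currently appears to behave.

```r
library(gtsummary)

# Fit the fully adjusted model; disp and hp stay in the model as covariates
fit <- lm(mpg ~ cyl + disp + hp, data = mtcars)

# Display only the cyl row in the table
tbl <- tbl_regression(fit, include = "cyl")

# Variables present in the displayed table
unique(tbl$table_body$variable)

# Compare the displayed estimate with the adjusted coefficient of the full model
all.equal(as.numeric(tbl$table_body$estimate), unname(coef(fit)["cyl"]))
```

If the comparison holds, the single row shown is the covariate-adjusted effect of cyl rather than an unadjusted one.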
My dataset has a dummy variable that divides it into two groups. I would like to display the descriptive statistics for both groups next to each other using stargazer. Is this possible?
For example, if there is the mtcars data set and the variable $am divides the dataset into two groups, how can I display the one group on the left side and the other group on the other side?
Thank you!
I was able to display the two statistics below each other (I had to make two separate datasets for each group), but never next to each other.
library(stargazer)
treated <- mtcars[mtcars$am == 1,]
control <- mtcars[mtcars$am == 0,]
stargazer(treated, control, keep=c("mpg", "cyl", "disp", "hp"),
header=FALSE, title="Descriptive statistics", digits=1, type="text")
Someone should point out if I'm mistaken, but I don't believe that stargazer will allow for the kind of nested tables you are looking for. However, there are other packages like modelsummary, gtsummary, and flextable that can produce tables similar to stargazer. I have included examples below using select mtcars variables summarized by am. Personally, I prefer gtsummary due to its flexibility.
library(tidyverse)
data(mtcars)
### modelsummary
# not great since it treats `cyl` as a continuous variable
# https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html
library(modelsummary)
datasummary_balance(~am, data = mtcars, dinm = FALSE)
### gtsummary
# based on example 3 from here
# https://www.danieldsjoberg.com/gtsummary/reference/add_stat_label.html
library(gtsummary)
mtcars %>%
select(am, mpg, cyl, disp, hp) %>%
tbl_summary(
by = am,
missing = "no",
type = list(mpg ~ 'continuous2',
cyl ~ 'categorical',
disp ~ 'continuous2',
hp ~ 'continuous2'),
statistic = all_continuous2() ~ c("{mean} ({sd})", "{median}")
) %>%
add_stat_label(label = c(mpg, disp, hp) ~ c("Mean (SD)", "Median")) %>%
modify_footnote(everything() ~ NA)
### flextable
# this function only works on continuous vars, so I removed `cyl`
# https://davidgohel.github.io/flextable/reference/continuous_summary.html
library(flextable)
mtcars %>%
select(am, mpg, disp, hp) %>%
continuous_summary(
by = "am",
hide_grouplabel = FALSE,
digits = 3
)
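If the table has to come from stargazer itself, one possible workaround is to compute the per-group statistics first and then print the resulting data frame with `summary = FALSE`. A minimal sketch in base R, assuming mean and SD are the statistics of interest; `group_stats` and `side_by_side` are hypothetical names introduced here:

```r
vars <- c("mpg", "cyl", "disp", "hp")

# Mean and SD of each variable within one level of am
group_stats <- function(am_value) {
  d <- mtcars[mtcars$am == am_value, vars]
  out <- data.frame(Mean = sapply(d, mean), SD = sapply(d, sd))
  names(out) <- paste0(names(out), " (am = ", am_value, ")")
  out
}

# Control (am = 0) on the left, treated (am = 1) on the right
side_by_side <- cbind(group_stats(0), group_stats(1))
print(round(side_by_side, 1))

# Print the precomputed table as-is with stargazer, if it is installed
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(side_by_side, summary = FALSE, type = "text",
                       title = "Descriptive statistics", digits = 1)
}
```

With `summary = FALSE`, stargazer prints the data frame contents instead of computing its own summary, so the side-by-side layout is decided entirely by how the data frame is built.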
You can use the modelsummary package and its datasummary function, which offers a formula-based language to describe the specific table you need. (Disclaimer: I am the maintainer.)
In addition to the super flexible datasummary function, there are many other functions to summarize data in easier ways. See in particular the datasummary_balance() function here:
https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html
library(modelsummary)
dat <- mtcars[, c("mpg", "cyl", "disp", "hp", "am")]
datasummary(
All(dat) ~ Factor(am) * (N + Mean + SD + Min + Max),
data = dat)
The aim is to get the predicted probabilities from several regression models. First I run several regression models using the following code:
library(dplyr)
library(tidyr)
library(broom)
library(ggeffects)
mtcars$cyl=as.factor(mtcars$cyl)
df <- mtcars %>%
group_by(cyl) %>%
do(model1 = tidy(lm(mpg ~ wt + gear + am , data = .), conf.int=TRUE)) %>%
gather(model_name, model, -cyl) %>% ## make it long format
unnest()
I would like to get the predicted probabilities for my predictor weight (wt). If I run the code manually for each number of cylinders (cyl), it looks like this:
#Filter by number of cylinders
df=filter(mtcars, cyl==4)
#Save the regression
mod= lm(mpg ~ wt + gear + am, data = df)
#Run the predictive probabilities
pred <- ggpredict(mod, terms = c("wt"))
This is the code for the first group only (cyl==4); the same code would then have to be run for the second (cyl==6) and the third (cyl==8), which is cumbersome. My aim is to automate this, as I do for the regression analyses in the first code block above. I would also like the results in the same format as in the first code block, that is, a format that can be plotted afterwards. Can someone help me with that?
Rerun the models with ggpredict() on the inside:
df <- mtcars %>%
group_by(cyl) %>%
do(model1 = ggpredict(lm(mpg ~ wt + gear + am, data= .), terms = c("wt"))) %>%
gather(model_name, model, -cyl) %>% unnest_legacy()
You can then plot wt (in the 'x' column) against 'predicted'. Note that you'll get a warning message on these data.
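For readers without ggeffects, roughly the same long-format result can be sketched in base R with `split()` and `predict()`. This is a swapped-in approach, not what `ggpredict()` does internally: it assumes the other covariates (gear, am) are held at their within-group means, and `predict_wt` and `pred_df` are hypothetical names. Depending on the R version, the 8-cylinder group may emit a rank-deficiency warning, because gear and am are collinear there.

```r
# Predict mpg over a grid of wt within one cyl group,
# holding gear and am at their group means (an assumption of this sketch)
predict_wt <- function(d, n_points = 10) {
  mod <- lm(mpg ~ wt + gear + am, data = d)
  grid <- data.frame(
    wt   = seq(min(d$wt), max(d$wt), length.out = n_points),
    gear = mean(d$gear),
    am   = mean(d$am)
  )
  ci <- predict(mod, newdata = grid, interval = "confidence")
  data.frame(
    cyl       = d$cyl[1],
    x         = grid$wt,
    predicted = ci[, "fit"],
    conf.low  = ci[, "lwr"],
    conf.high = ci[, "upr"]
  )
}

# One model per cyl group, stacked into a single long data frame
pred_df <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), predict_wt))
rownames(pred_df) <- NULL
head(pred_df)
```

The column names mirror those mentioned above (`x`, `predicted`), so the result can be plotted the same way, for example with one line per `cyl`.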
When creating a table using tbl_regression() from the gtsummary R package, how do you add the number of events using add_nevent() ? When I run the example code from the add_nevent() help file, I am unable to get the N's to appear:
library(gtsummary)
data(trial)
add_nevent_ex <-
glm(response ~ trt, trial, family = binomial) %>%
tbl_regression() %>%
add_nevent()
add_nevent_ex
In contrast, when I run the example code from the help file for add_nevent.tbl_uvregression(), the N's appear correctly in the table. Unfortunately, I need to use tbl_regression (and not tbl_uvregression) though because I need to adjust for multiple covariates for the actual problem I'm working on.
Starting in gtsummary v1.4.0, the add_nevent() function has been generalized to A) work with both tbl_regression and tbl_uvregression objects, and B) place Ns on the label and variable level rows. Review the help file here: http://www.danieldsjoberg.com/gtsummary/dev/reference/add_nevent_regression.html (FYI, add_n() has been updated similarly for tbl_regression objects in the dev version.)
library(gtsummary)
add_nevent_ex <-
glm(response ~ trt, trial, family = binomial) %>%
tbl_regression() %>%
add_n() %>%
add_nevent()
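Since the actual problem involves several covariates, here is a hedged sketch of the same call on an adjusted model, assuming gtsummary v1.4.0 or later and R 4.1 or later (for the native pipe). The covariates `age` and `grade` and the name `adj_tbl` are chosen only for illustration; `location = "level"` asks for the event counts on the level rows of categorical variables.

```r
library(gtsummary)

# Adjusted logistic regression on the trial data shipped with gtsummary
adj_tbl <-
  glm(response ~ trt + age + grade, data = trial, family = binomial) |>
  tbl_regression(exponentiate = TRUE) |>
  add_nevent(location = "level")

# Print the table
adj_tbl
```

Using `location = "label"` instead (or omitting the argument) places the count on the variable label row.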
Here is some sample code from the official package documentation.
#Package preload
library(dotwhisker)
library(broom)
library(dplyr)
# run a regression compatible with tidy
m1 <- lm(mpg ~ wt + cyl + disp + gear, data = mtcars)
m2 <- update(m1, . ~ . + hp) # add another predictor
m1_df <- tidy(m1) %>% filter(term != "(Intercept)") %>% mutate(model = "Model 1")
m2_df <- tidy(m2) %>% filter(term != "(Intercept)") %>% mutate(model = "Model 2")
two_models <- rbind(m1_df, m2_df)
dwplot(two_models)
which produces this:
The most logical order inside the plot would be to have the coefficients from model 1 above those from model 2. In any case, I would like to know how to control the order of coefficients from distinct models (not the order of the variables themselves). I tried sorting the tidy data frame with order() and converting the model column with factor(). Neither approach works. Any advice would be most welcome.
You can change the order of the coefficients by reordering your tidy data frame. One side effect is that the legend order changes as well, but this can be fixed by setting the breaks explicitly.
dwplot(arrange(two_models, desc(model))) +
scale_color_discrete(breaks=c("Model 1","Model 2"))
I am working on a problem that I found on Reddit and was experimenting with how to solve it using the mtcars data set.
This was the problem:
He has a list that looks like this: https://gyazo.com/0637f2226d8f53db4c90716bd3fb698c with 150 different "selskapsid".
He wants to run a linear regression with "Return12" as the dependent variable and "SROE", "MktCap", and "y" as independent variables for each "Selskapsid". (Basically a separate regression for each id; even where an id is repeated, I want a separate model.)
I read the comments there but did not find a good solution, so I tried dplyr and related packages that I am somewhat comfortable with. The issue is that the cyl values are factors, so when I build the model, the cyl value is not repeated.
Does anyone know a simple loop to achieve this? I want to do training and testing in the same loop, and I was not getting the training results properly either.
Using these libraries, I was doing this:
library(tidyverse)
library(broom)
mtcars %>%
nest(data = -cyl) %>%
mutate(fit = map(data, ~ lm(mpg ~ hp + wt + disp, data = .)), # use `=`, not `<-`, to create the fit column
results = map(fit, augment)) %>%
unnest(results)