How to run ggpredict() in a loop following multiple regression models?

The aim is to get the predicted values from several regression models (for a linear model such as lm(), these are predicted outcomes, not probabilities). First, I run several regression models using the following code:
library(dplyr)
library(tidyr)
library(broom)
library(ggeffects)
mtcars$cyl <- as.factor(mtcars$cyl)
df <- mtcars %>%
  group_by(cyl) %>%
  do(model1 = tidy(lm(mpg ~ wt + gear + am, data = .), conf.int = TRUE)) %>%
  gather(model_name, model, -cyl) %>% ## make it long format
  unnest(model)
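If you would rather not depend on do(), the same long table of per-group coefficients can be sketched in base R with split() and lapply(). This is a swapped-in base-R technique, not broom itself; the column names only imitate broom::tidy(conf.int = TRUE). Note that in the cyl == 8 group gear and am are perfectly collinear (gear is 5 exactly when am is 1, and 3 otherwise), so lm() cannot estimate am there and summary() omits that row.

```r
# Base-R sketch: one lm() per cylinder group, coefficients stacked in long format.
# Column names imitate broom::tidy(conf.int = TRUE); this is not broom itself.
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

coef_by_cyl <- lapply(split(cars, cars$cyl), function(d) {
  fit <- lm(mpg ~ wt + gear + am, data = d)
  co <- coef(summary(fit))                          # aliased terms are dropped here
  ci <- confint(fit)[rownames(co), , drop = FALSE]  # keep only the same terms
  data.frame(
    cyl       = d$cyl[1],
    term      = rownames(co),
    estimate  = co[, "Estimate"],
    std.error = co[, "Std. Error"],
    p.value   = co[, "Pr(>|t|)"],
    conf.low  = ci[, 1],
    conf.high = ci[, 2],
    row.names = NULL
  )
})

# Stack the per-group tables into one long data frame
coef_df <- do.call(rbind, coef_by_cyl)
rownames(coef_df) <- NULL
coef_df
```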
I would like to get the predicted values across my predictor weight (wt). If I run the code manually for each number of cylinders (cyl), it looks like this:
# Filter by number of cylinders
df <- filter(mtcars, cyl == 4)
# Fit the regression
mod <- lm(mpg ~ wt + gear + am, data = df)
# Compute the predicted values over wt
pred <- ggpredict(mod, terms = c("wt"))
This covers only the first group (cyl == 4); the same code would then have to be repeated for cyl == 6 and cyl == 8, which is cumbersome. My aim is to automate this, as I do for the regression models in the first code block above, and to get the results in the same long format so that they can be plotted afterwards. Can someone help me with that?

Rerun the models with ggpredict() inside do():
df <- mtcars %>%
  group_by(cyl) %>%
  do(model1 = ggpredict(lm(mpg ~ wt + gear + am, data = .), terms = c("wt"))) %>%
  gather(model_name, model, -cyl) %>%
  unnest_legacy()
You can then plot wt (in the x column) against predicted. Note that you'll get a warning message on these data, most likely because in the cyl == 8 group gear and am are perfectly collinear (gear is 5 exactly when am is 1), so lm() cannot estimate a separate am coefficient there.
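For comparison, here is a base-R sketch of what the loop computes: one lm() per cyl group and predictions over a grid of wt. This is a stand-in, not ggpredict() itself; it assumes the non-focal numeric terms (gear, am) are held at their group means, which is our understanding of ggpredict()'s default for numeric covariates. The 10-point grid is an arbitrary choice.

```r
# Base-R sketch: per-group predictions over wt, stacked into one long data frame.
# Assumption: gear and am held at their within-group means (ggpredict-like).
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

pred_by_cyl <- lapply(split(cars, cars$cyl), function(d) {
  fit <- lm(mpg ~ wt + gear + am, data = d)
  # Grid over the focal term wt; the other covariates fixed at group means
  grid <- data.frame(
    wt   = seq(min(d$wt), max(d$wt), length.out = 10),
    gear = mean(d$gear),
    am   = mean(d$am)
  )
  p <- predict(fit, newdata = grid, interval = "confidence")
  data.frame(
    cyl       = d$cyl[1],
    x         = grid$wt,       # column names mirror some of ggpredict()'s output
    predicted = p[, "fit"],
    conf.low  = p[, "lwr"],
    conf.high = p[, "upr"]
  )
})

pred_df <- do.call(rbind, pred_by_cyl)
rownames(pred_df) <- NULL
head(pred_df)
```

The result has one row per grid point and group, so it can be plotted directly, for example with x on the horizontal axis, predicted on the vertical axis and one line per cyl.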

Related

Multiple fixest_multi models and shape parameter - modelsummary package

Again, thanks to Laurent for answering questions and supporting the modelsummary package.
library(tidyverse)
library(fixest)
library(modelsummary)
fit <- mtcars %>% feols(c(mpg, hp) ~ 1)
fit_1 <- mtcars %>% feols(c(mpg, hp, wt) ~ 1)
fit_2 <- mtcars %>% feols(c(mpg, hp, gear, wt) ~ 1)
modelsummary(c(fit, fit_1, fit_2), shape = model + statistic ~ term, output = "flextable")
We obtain one long column of estimates from all three models (which are simple averages).
Is there a way to rearrange the rows and columns, using either internal modelsummary functions or external work, so that fit, fit_1 and fit_2 each get their own column?
The biggest problem is moving the terms around so that they are aligned on the same line (note that the order of terms between fit_1 and fit_2 is changed) and the rest is filled with NA. Would really appreciate any help! It's a part of a larger problem I've been trying to solve unsuccessfully for the last 3 weeks.
One option is to output to a data frame and reshape manually:
library(tidyverse)
library(fixest)
library(modelsummary)
library(flextable)
fit <- mtcars %>% feols(c(mpg, hp) ~ 1)
fit_1 <- mtcars %>% feols(c(mpg, hp, wt) ~ 1)
fit_2 <- mtcars %>% feols(c(mpg, hp, gear, wt) ~ 1)
models <- c(fit, fit_1, fit_2)
modelsummary(
  models,
  output = "dataframe",
  shape = model + statistic ~ term) |>
  mutate(
    # 2, 3 and 4 models; each has an estimate row and a std.error row
    fit = c(rep(1, 4), rep(2, 6), rep(3, 8)),
    model = trimws(model)) |>
  pivot_wider(names_from = "fit", values_from = "(Intercept)") |>
  select(-statistic, -part, ` ` = model) |>
  flextable()
This is a very customized shape and modeling context, so I can't currently think of a way to achieve it purely with internal modelsummary functions or arguments.
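The key step in the answer is a full outer join on the row label, so that each response lines up on one row and missing combinations become NA. A minimal base-R sketch of that idea, assuming intercept-only lm() fits as a stand-in for the feols() models (fixest is not used here); the list `groups` and the helper `one_column()` are hypothetical names.

```r
# Hypothetical re-creation of the three model groups from the question.
# Intercept-only models, so each estimate is just the mean of the response.
groups <- list(
  fit   = c("mpg", "hp"),
  fit_1 = c("mpg", "hp", "wt"),
  fit_2 = c("mpg", "hp", "gear", "wt")
)

one_column <- function(group_name) {
  vars <- groups[[group_name]]
  est <- vapply(vars, function(v) unname(coef(lm(mtcars[[v]] ~ 1))), numeric(1))
  out <- data.frame(response = vars, estimate = round(unname(est), 3))
  names(out)[2] <- group_name          # one column per model group
  out
}

# Full outer join on the response name: rows line up, gaps become NA
wide <- Reduce(function(a, b) merge(a, b, by = "response", all = TRUE),
               lapply(names(groups), one_column))
wide
```

Unlike the hard-coded rep() counts in the answer, the join does not depend on how many models each group contains.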

Obtain P-Value of Fixed Value in Anova Table of many Linear Regressions with Broom Package

In the multiple linear regression lm(FE_FCE2 ~ Trial + .x, data = DF_FCE3) there is one fixed variable (Trial) and many x variables. I am analysing each x variable against FE_FCE2 with Trial as a fixed effect. I then use the broom package for the many regressions and collect the results in one table. I have obtained the regression results; however, I cannot add the data from the ANOVA table using broom with map().
Is it possible, and if so, how?
I have used the following code to obtain the regression results:
library(tidyverse)
library(broom)
DF_FCE3 %>%
  select(-FE_FCE2, -Trial) %>% # exclude outcome and Trial, leave only predictors
  map(~ lm(FE_FCE2 ~ Trial + .x, data = DF_FCE3)) %>%
  map(summary) %>%
  map_df(glance) %>%
  round(3) -> rsme
However, I would like to obtain the p-value of Trial from the ANOVA table (4.26e-08 ***), to see whether Trial has a significant effect in each model:
$x1
Analysis of Variance Table
Response: FE_FCE2
          Df  Sum Sq  Mean Sq F value    Pr(>F)
Trial      3 0.84601 0.282002 15.0653 4.26e-08 ***
.x         1 0.00716 0.007161  0.3826    0.5377
Residuals 95 1.77827 0.018719
---
Is it possible to use the broom package with map function to obtain a table which contains all the many p values of the anova regressions?
Like this (using the mpg data from ggplot2)?
This returns a data frame with one column per remaining predictor and a single row containing the p-value of the first ANOVA term. The outcome and the target are excluded (hwy and cyl in this example, FE_FCE2 and Trial in your case).
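library(tidyverse) # attaches ggplot2 (mpg data), dplyr and purrr
library(broom)
mpg %>%
  select(-hwy, -cyl) %>% # exclude outcome and target, leave only predictors
  map(~ lm(hwy ~ cyl + .x, data = mpg)) %>%
  map(anova) %>%
  map(broom::tidy) %>%
  map_df(~ .$p.value[1]) # first ANOVA row = cyl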
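The same idea can be sketched without broom or purrr, returning a long table (one row per predictor) instead of a single wide row. This is a base-R substitute on mtcars, with mpg as the outcome and factor(cyl) playing the role of Trial. Keep in mind that anova() on an lm fit gives sequential (Type I) tests, so because the fixed factor is entered first, its p-value does not adjust its sum of squares for .x.

```r
# Base-R sketch: outcome mpg, fixed factor cyl (like Trial), and every other
# column tried in turn as the extra predictor (like .x).
cars <- mtcars
cars$cyl <- factor(cars$cyl)
predictors <- setdiff(names(cars), c("mpg", "cyl"))

p_table <- do.call(rbind, lapply(predictors, function(v) {
  fit <- lm(reformulate(c("cyl", v), response = "mpg"), data = cars)
  tab <- anova(fit)                      # sequential (Type I) ANOVA table
  data.frame(predictor = v,
             p_cyl = tab[["Pr(>F)"]][1], # row 1 = cyl, entered first
             p_x   = tab[["Pr(>F)"]][2]) # row 2 = the extra predictor
}))
p_table
```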

How to change the order of coefficients in a coefficient plot in R (package dotwhisker)

Here is some sample code from the official package documentation.
#Package preload
library(dotwhisker)
library(broom)
library(dplyr)
# run a regression compatible with tidy
m1 <- lm(mpg ~ wt + cyl + disp + gear, data = mtcars)
m2 <- update(m1, . ~ . + hp) # add another predictor
m1_df <- tidy(m1) %>% filter(term != "(Intercept)") %>% mutate(model = "Model 1")
m2_df <- tidy(m2) %>% filter(term != "(Intercept)") %>% mutate(model = "Model 2")
two_models <- rbind(m1_df, m2_df)
dwplot(two_models)
which produces a dot-and-whisker plot of both models.
The most logical order inside the plot would be to have the coefficients from Model 1 above those from Model 2. In any case, I would like to know how to control the order of coefficients from distinct models (not the order of the variables themselves). I tried sorting the tidy data frame with order() and converting the model column to a factor with factor(). Neither approach worked. Any advice would be most welcome.
You can change the order of the coefficients by reordering your tidy data frame. This may also change the order of the legend, but that can be fixed with scale_color_discrete():
dwplot(arrange(two_models, desc(model))) +
scale_color_discrete(breaks=c("Model 1","Model 2"))
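The answer relies on the row order of the tidy data frame. A minimal base-R sketch of just the data step (no plotting), assuming a small hypothetical helper `tidy_like()` as a stand-in for broom::tidy() and an order() call as a stand-in for arrange(desc(model)):

```r
# Base-R stand-in for the two tidy data frames (intercept dropped)
m1 <- lm(mpg ~ wt + cyl + disp + gear, data = mtcars)
m2 <- update(m1, . ~ . + hp)                  # add another predictor

# Hypothetical helper imitating a few broom::tidy() columns
tidy_like <- function(fit, label) {
  co <- coef(summary(fit))[-1, , drop = FALSE] # drop "(Intercept)"
  data.frame(term = rownames(co), estimate = co[, "Estimate"],
             std.error = co[, "Std. Error"], model = label, row.names = NULL)
}
two_models <- rbind(tidy_like(m1, "Model 1"), tidy_like(m2, "Model 2"))

# Same reordering as arrange(two_models, desc(model)): Model 2 rows first
reordered <- two_models[order(two_models$model, decreasing = TRUE), ]
head(reordered$model)
```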

Writing loop on linear regression

I am working on a problem I found on Reddit and experimenting with how to solve it using the mtcars data set.
This was the problem:
The poster has a list that looks like this: https://gyazo.com/0637f2226d8f53db4c90716bd3fb698c with 150 different "selskapsid".
He wants to run a linear regression with "Return12" as the dependent variable and "SROE", "MktCap" and "y" as independent variables for each "Selskapsid" (basically a separate model for each id, even though each id appears on several rows).
I read the comments there but didn't find a good solution, so I tried it with dplyr and other packages I am somewhat comfortable with. The issue I ran into is that the cyl values are factors, so when I try to build the model the cyl value is not repeated.
Does anyone know a simple loop to achieve this? I also want to do training and testing in the same loop, but I wasn't getting the training results properly either.
Using these libraries, I was doing this:
library(tidyverse)
library(broom)
mtcars %>%
  nest(data = -cyl) %>%
  mutate(fit = map(data, ~ lm(mpg ~ hp + wt + disp, data = .x)),
         results = map(fit, augment)) %>%
  unnest(results)
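Since the question also asks for training and testing in the same loop, here is a base-R sketch that splits each cyl group into a training and a test set, fits on the training rows and scores the held-out rows. The 70/30 split, the seed and the smaller model mpg ~ hp + wt are assumptions chosen so that the smallest group (7 cars with cyl == 6) still leaves residual degrees of freedom.

```r
set.seed(1)                                  # reproducible train/test split
train_frac <- 0.7                            # assumed split; adjust as needed

results <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
  n_train <- ceiling(train_frac * nrow(d))
  idx   <- sample(nrow(d), n_train)          # random training rows
  train <- d[idx, ]
  test  <- d[-idx, ]
  fit  <- lm(mpg ~ hp + wt, data = train)    # smaller model than in the question
  pred <- predict(fit, newdata = test)
  data.frame(cyl        = d$cyl[1],
             n_train    = nrow(train),
             n_test     = nrow(test),
             train_rmse = sqrt(mean(residuals(fit)^2)),
             test_rmse  = sqrt(mean((test$mpg - pred)^2)))
}))
results
```

For the Reddit data the same pattern would split on the id column instead of cyl.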

Data Munging Challenge. How do I join the correct coefficients to the correct observation in a summarized table

Before I start, a basic answer to this question can be found here:
Correctly binding coefficients to summarized table
This question is different in that I need to join the correct coefficients to the correct position in the summary table based on where a knot is placed. I use the I(pmax(0, variable - knot)) technique to place my splines. The end result is a table of unique values of each variable, a summarized measure and the correct model statistics (see my final, still unfinished, table in the example code below).
library(tidyverse)
library(broom)
#pull in and gather data
mtcars1 <- as_tibble(mtcars)
mtcars1$cyl <- as.factor(mtcars$cyl)
#run model and produce model-summary table
model <- glm(mpg ~ cyl + hp + I(pmax(0, hp - 100)), data = mtcars1)
model_summary <- tidy(model)
#produce final summary table
summary_table <- mtcars1 %>%
  select(cyl, hp, wt) %>%
  gather(key = variable, value = level, -wt) %>%
  group_by(variable, level) %>%
  summarise("sum_wt" = sum(wt)) %>%
  mutate(term = paste0(variable, level)) %>%
  left_join(model_summary, by = c("term" = "term"))
The challenge is to take the I(pmax(0, hp - 100)) term from the model_summary table and correctly join its estimate, std.error, statistic and p.value to each hp observation in summary_table that is <= 100, and in addition to join the other hp estimate statistics to each hp observation in summary_table that is > 100.
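One way to approach this is to build an explicit key column that says which model term each hp level should receive, and join on that key instead of on paste0(variable, level). A base-R sketch for the hp part only: summary() stands in for broom::tidy(), aggregate() for the dplyr summary, and `hinge` is a hypothetical variable name. The rule in the ifelse() follows the question as written; since pmax(0, hp - 100) is zero for hp <= 100, you may want to swap the two branches.

```r
cars <- mtcars
cars$cyl <- factor(cars$cyl)
model <- glm(mpg ~ cyl + hp + I(pmax(0, hp - 100)), data = cars)

# Tidy-like coefficient table (base-R stand-in for broom::tidy)
co <- summary(model)$coefficients
model_summary <- data.frame(term = rownames(co), estimate = co[, 1],
                            std.error = co[, 2], statistic = co[, 3],
                            p.value = co[, 4], row.names = NULL)

# Sum of wt for each unique hp value
hp_tab <- aggregate(wt ~ hp, data = cars, FUN = sum)
names(hp_tab)[2] <- "sum_wt"

# Key column: which model term each hp level is matched to.
# Rule as stated in the question; swap the branches to flip it.
hinge <- "I(pmax(0, hp - 100))"
hp_tab$term <- ifelse(hp_tab$hp <= 100, hinge, "hp")

hp_joined <- merge(hp_tab, model_summary, by = "term", all.x = TRUE)
hp_joined <- hp_joined[order(hp_joined$hp), ]
hp_joined
```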
