Same Recipe and Model with Different Outcomes

I have a dataset with multiple columns for the outcome variables that I would like to predict with the same preprocessing steps and models. Is there a way to run the same recipe and models (with tuning - I'm using workflow_map()) on multiple outcome variables (separate models for each outcome)?
Essentially, I want to loop through the same preprocessing steps and models for each outcome, and avoid having to do this:
model_recipe1 <- recipe(outcome_1 ~ ., data) %>%
  step_1
model_recipe2 <- recipe(outcome_2 ~ ., data) %>%
  step_1
model_recipe3 <- recipe(outcome_3 ~ ., data) %>%
  step_1
and would instead like to do something like this:
model_recipe <- recipe(outcome[i] ~ ., data) %>%
  step_1

Try running this once before the rest of your code
set.seed(123)
If that doesn't solve it, try running this once at the start of your script:
addTaskCallback(function(...) {set.seed(123);TRUE})
Both of these methods try to ensure any random processes provide the same outcomes each time you run your script, allowing reproducibility.

I'm not sure if we 100% recommend the approach you are trying, but it will work in some circumstances:
library(tidymodels)
folds <- bootstraps(mtcars, times = 5)
wf_set <- workflow_set(list(mpg ~ ., wt ~ ., disp ~ .), list(linear_reg()))
workflow_map(wf_set, "fit_resamples", resamples = folds)
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 formula_1_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 2 formula_2_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 3 formula_3_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
Created on 2022-08-04 by the reprex package (v2.0.1)
To make many recipes in an iterative fashion, you'll need a bit of metaprogramming such as with rlang. You can write a function to take (in this case) a string and create a recipe:
library(rlang)
my_recipe <- function(outcome) {
  form <- new_formula(ensym(outcome), expr(.))
  recipe(form, data = mtcars) %>%
    step_normalize(all_numeric_predictors())
}
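A quick sanity check on a single outcome should print a recipe with that variable as the outcome and the normalization step:
my_recipe("mpg")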
And then you can use this function with purrr::map() across your outcomes:
library(tidymodels)
library(rlang)
folds <- bootstraps(mtcars, times = 5)
wf_set <- workflow_set(
  map(c("mpg", "wt", "disp"), my_recipe),
  list(linear_reg())
)
workflow_map(wf_set, "fit_resamples", resamples = folds)
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 recipe_1_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 2 recipe_2_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
#> 3 recipe_3_linear_reg <tibble [1 × 4]> <opts[1]> <rsmp[+]>
Created on 2022-08-04 by the reprex package (v2.0.1)
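Once the set has been fitted, collect_metrics() from workflowsets gathers the resampling results for every outcome into one tibble; roughly something like this (keep in mind that error metrics are on different scales for different outcomes, so they are not directly comparable across rows):
fitted_set <- workflow_map(wf_set, "fit_resamples", resamples = folds)
# one row per workflow/metric combination, across all outcomes
collect_metrics(fitted_set)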

Related

purrr::map returns multiple rows instead of one

I am trying to run linear regression through functional programming. However, I am not able to get the output successfully: purrr::map returns multiple rows per nested list instead of one row.
# perform linear regression for each cylinder
mtcars_result <- mtcars %>%
  nest(-cyl) %>%
  mutate(model = map(data, ~ lm(as.formula("mpg~disp"), data = .)),
         n = map(data, ~ nrow(.)))
#predict values
mtcars_result$predict <- 1:3
#helper function to obtain predict values
get_prediction <- function(m, varname, predict){
  predictdata <- data.frame(predict)
  names(predictdata) <- c(varname)
  predict(m, newdata = predictdata, interval = "confidence", level = 0.95)
}
#prediction, notice it returns three rows per nested list
mtcars_result2 <- mtcars_result %>% mutate(predicted_values = map(model, get_prediction, "disp", predict))
mtcars_result2$predicted_values
[[1]]
fit lwr upr
1 19.08559 11.63407 26.53712
2 19.08920 11.67680 26.50160
3 19.09280 11.71952 26.46609
[[2]]
fit lwr upr
1 40.73681 32.68945 48.78418
2 40.60167 32.62715 48.57619
3 40.46653 32.56482 48.36824
[[3]]
fit lwr upr
1 22.01316 14.74447 29.28186
2 21.99353 14.74479 29.24227
3 21.97390 14.74511 29.20268
My attempt:
I notice the main issue is probably due to the predict argument in get_prediction(). When I run this version of get_prediction()
get_prediction <- function(m, varname, predict){
  predict_global <<- predict
  predictdata <- data.frame(predict)
  names(predictdata) <- c(varname)
  predict(m, newdata = predictdata, interval = "confidence", level = 0.95)
}
> predict_global
[1] 1 2 3
Therefore, my instinct is to use rowwise(), but it ends up with an error:
mtcars_result2 <- mtcars_result %>% rowwise() %>% mutate(predicted_values = map(model, get_prediction, "disp", predict))
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "c('double', 'numeric')"
Can anyone shed some light on this for me? Maybe we can use purrr::pmap instead of purrr::map?
One option is to use imap and subset the predict column with the index .y.
mtcars_result %>%
  mutate(predicted_values = imap(model, ~ get_prediction(.x, "disp", predict[.y])))
Alternatively we can use rowwise()
mtcars_result %>%
  rowwise() %>%
  mutate(predicted_values = list(get_prediction(model, "disp", predict)))
#> # A tibble: 3 × 6
#> # Rowwise:
#> cyl data model n predict predicted_values
#> <dbl> <list> <list> <list> <int> <list>
#> 1 6 <tibble [7 × 10]> <lm> <int [1]> 1 <dbl [1 × 3]>
#> 2 4 <tibble [11 × 10]> <lm> <int [1]> 2 <dbl [1 × 3]>
#> 3 8 <tibble [14 × 10]> <lm> <int [1]> 3 <dbl [1 × 3]>
Created on 2022-12-01 with reprex v2.0.2
Really similar to the accepted answer but we can also use purrr::map2 and switch the order of arguments in get_prediction
get_prediction <- function(m, predict, varname){
  predictdata <- data.frame(predict)
  names(predictdata) <- c(varname)
  predict(m, newdata = predictdata, interval = "confidence", level = 0.95)
}
# prediction; notice it now returns just one row per nested list
mtcars_result2 <- mtcars_result %>% mutate(predicted_values = map2(model, predict, get_prediction, "disp"))
mtcars_result2$predicted_values
[[1]]
fit lwr upr
1 19.08559 11.63407 26.53712
[[2]]
fit lwr upr
1 40.60167 32.62715 48.57619
[[3]]
fit lwr upr
1 21.9739 14.74511 29.20268
Another possibility could be splitting by cyl and using map2:
library(tidyverse)
options(pillar.sigfig = 7)
mtcars %>%
  split(f = .$cyl) %>%
  map2_dfr(c(2, 1, 3),
           ~ lm(mpg ~ disp, data = .) %>%
             get_prediction("disp", .y) %>%
             as_tibble(),
           .id = "cyl")
This returns a tibble of the predicted values for predict = 2, 1, 3 (paired with cyl = 4, 6, 8):
# A tibble: 3 × 4
cyl fit lwr upr
<chr> <dbl> <dbl> <dbl>
1 4 40.60167 32.62715 48.57619
2 6 19.08559 11.63407 26.53712
3 8 21.97390 14.74511 29.20268
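Since the question also asks about purrr::pmap(), that works here too; roughly, using the original get_prediction(m, varname, predict) signature:
mtcars_result %>%
  mutate(predicted_values = pmap(list(model, predict),
                                 function(m, p) get_prediction(m, "disp", p)))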

How to create nested training and testing sets?

I'm working with the ChickWeight data set in R. I'm looking to create multiple models, each trained for an individual chick. As such, I am nesting the data so that a dataframe is created for each individual chick and stored within the list column.
Here is the start:
library(tidyverse)
library(datasets)
data("ChickWeight")
ChickWeightNest <- ChickWeight %>%
  group_by(Chick) %>%
  nest()
From here, training a linear regression model on all dataframes simultaneously is very easy: simply build the model as a function, then mutate a new column and map. However, building a more sophisticated model (e.g. xgboost) requires first splitting the data into testing and training sets. How can I split all my nested data frames at once to create training and testing sets so that I can train multiple models simultaneously?
As a side note, info on training/tuning multiple models seems to be relatively sparse; any related resources or past Stack Overflow questions would be much appreciated.
Maybe you want something like this, where you first randomly assign train or test to each row in a new column to use later, and then group again to nest the data per chick:
library(dplyr)
library(tidyr)
library(datasets)
data("ChickWeight")
ChickWeight %>%
  group_by(Chick) %>%
  rowwise() %>%
  mutate(split = sample(c("train", "test"), n(), replace = FALSE)) %>%
  group_by(Chick) %>%
  nest()
#> # A tibble: 50 × 2
#> # Groups: Chick [50]
#> Chick data
#> <ord> <list>
#> 1 1 <tibble [12 × 4]>
#> 2 2 <tibble [12 × 4]>
#> 3 3 <tibble [12 × 4]>
#> 4 4 <tibble [12 × 4]>
#> 5 5 <tibble [12 × 4]>
#> 6 6 <tibble [12 × 4]>
#> 7 7 <tibble [12 × 4]>
#> 8 8 <tibble [11 × 4]>
#> 9 9 <tibble [12 × 4]>
#> 10 10 <tibble [12 × 4]>
#> # … with 40 more rows
Created on 2022-06-29 by the reprex package (v2.0.1)
The key here is realizing that the nested data column is a list, so you have to use list functions on it, for example lapply from base R or map from purrr.
Here's an example of how that would work using the rsample package to do the split (75% for training)
ChickWeightNest_example <- ChickWeightNest %>%
  mutate(data_split = purrr::map(data,
                                 ~ rsample::initial_split(.x, prop = .75))) %>%
  mutate(data_training_only = purrr::map(data_split,
                                         ~ rsample::training(.x)),
         data_testing_only = purrr::map(data_split,
                                        ~ rsample::testing(.x))
  )
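From there, fitting one model per chick on the training portion (and predicting on the held-out rows) is just another map step. A minimal sketch of the idea, with lm(weight ~ Time) standing in for a heavier model such as xgboost:
ChickWeightNest_example %>%
  mutate(model = purrr::map(data_training_only,
                            ~ lm(weight ~ Time, data = .x)),
         test_pred = purrr::map2(model, data_testing_only,
                                 ~ predict(.x, newdata = .y)))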

How to store both function and its input data inside designated tibble columns, then iterate over rows to execute?

I'm trying to run a data wrangling procedure inside a tibble using tools from the {purrr} package. My method is to organize everything I need inside a tibble: the input data goes inside one column, and the function to apply to that data gets its own column too.
My problem: how can I use purrr's mapping functions to say "take the function stored in column x and apply it over the data in column y"?
Below is a minimal example, based on mtcars and iris. I want to summarise each data set in the same workflow: first subset columns, then do some aggregation. For the aggregation part, I preemptively set up two functions, one for each dataset:
summarise_iris()
summarise_mtcars()
Then I organize all I need inside a tibble (see trb object below).
The first part, the subsetting, works well. As can be seen in trb_1 below, dat_selected is a new column I mutated, which stores the output of the subset step.
However, the second part is not working. I want to take the function in column summarise_func and apply it to the data stored in column dat_selected, but it fails. Why? I purposely used map() because it maps only one input to the function.
library(purrr)
library(tibble)
library(dplyr, warn.conflicts = FALSE)
summarise_iris <- function(.dat) {
  .dat %>%
    group_by(Species) %>%
    summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))
}
# to test: iris %>% summarise_iris()
summarise_mtcars <- function(.dat) {
  .dat %>%
    group_by(am) %>%
    summarise(mpg_median = median(mpg))
}
# to test: mtcars %>% summarise_mtcars()
trb <-
  tribble(~original_data, ~cols_to_select, ~summarise_func,
          mtcars, c("am", "disp", "mpg"), ~summarise_mtcars(.),
          iris, c("Species", "Sepal.Length", "Sepal.Width"), ~summarise_iris(.)
  )
trb_1 <-
  trb %>%
  mutate(dat_selected = map2(.x = original_data, .y = cols_to_select, .f = ~ select(.x, all_of(.y))))
trb_1
#> # A tibble: 2 x 4
#> original_data cols_to_select summarise_func dat_selected
#> <list> <list> <list> <list>
#> 1 <df [32 x 11]> <chr [3]> <formula> <df [32 x 3]>
#> 2 <df [150 x 5]> <chr [3]> <formula> <df [150 x 3]>
trb_1 %>%
  mutate(dat_summarised = map(.x = dat_selected, .f = summarise_func))
#> Error: Problem with `mutate()` column `dat_summarised`.
#> i `dat_summarised = map(.x = dat_selected, .f = summarise_func)`.
#> x Index 1 must have length 1, not 2
Created on 2021-12-02 by the reprex package (v2.0.1.9000)
How can I achieve the desired output (see below) using the in-table method I'm trying to incorporate? I.e.:
trb_1 %>%
  mutate(dat_summarised = map(.x = dat_selected, .f = summarise_func))
## to give the desired output that's equivalent to what we get if we run:
summar_mtcars <- mtcars %>% summarise_mtcars()
summar_iris <- iris %>% summarise_iris()
trb_1 %>%
  tibble::add_column(dat_summarised = list(summar_mtcars, summar_iris))
## # A tibble: 2 x 5
## original_data cols_to_select summarise_func dat_selected dat_summarised
## <list> <list> <list> <list> <list>
## 1 <df [32 x 11]> <chr [3]> <formula> <df [32 x 3]> <tibble [2 x 2]>
## 2 <df [150 x 5]> <chr [3]> <formula> <df [150 x 3]> <tibble [3 x 3]>
UPDATE
I don't know if the following is in the right direction, but based on this answer, I thought to utilize rlang::as_function() such that:
trb_1 %>%
  mutate(dat_summarised = map(.x = dat_selected, .f = ~ rlang::as_function(summarise_func)))
But it gives a different error now:
x Can't convert a list to function
I think you can take a simpler approach. First, we don't need to select columns; that's inherent to summarize anyway. Let's create columns that define the columns to group by, the columns to summarize, and the functions to use.
library(purrr)
library(tibble)
library(dplyr, warn.conflicts = FALSE)
trb <-
  tribble(~original_data, ~cols_to_group, ~cols_to_summarize, ~summarise_func,
          mtcars, "am", "mpg", \(x) mean(x, na.rm = T),
          iris, "Species", ~starts_with("Sepal"), median
  )
The \(x) mean(x, na.rm = TRUE) syntax is the new anonymous function syntax in R 4.1. If using an earlier version, just change to function(x) mean(...)
Now we can define a function (to eventually use in pmap) that accepts the data, grouping columns, columns to analyse, and the summarize functions.
summarize_fun <- function(.dat, .group_cols, .summ_cols, .funs) {
  .dat %>%
    group_by(across(!!.group_cols)) %>%
    summarize(across(!!.summ_cols, .funs))
}
And now we can just use these within mutate(pmap(...)) to get the result we want. I rely on !! for unquoting expressions because that works for passing in things like ~starts_with("Sepal"), which don't work with {{ }} to my knowledge.
trb_final <- trb %>%
  mutate(dat_summarized = pmap(
    list(
      .dat = original_data,
      .group_cols = cols_to_group,
      .summ_cols = cols_to_summarize,
      .funs = summarise_func
    ),
    summarize_fun
  ))
trb_final
#> # A tibble: 2 × 5
#> original_data cols_to_group cols_to_summarize summarise_func dat_summarized
#> <list> <chr> <list> <list> <list>
#> 1 <df [32 × 11]> am <chr [1]> <fn> <tibble [2 × 2]>
#> 2 <df [150 × 5]> Species <formula> <fn> <tibble [3 × 3]>
trb_final$dat_summarized
#> [[1]]
#> # A tibble: 2 × 2
#> am mpg
#> <dbl> <dbl>
#> 1 0 17.1
#> 2 1 24.4
#>
#> [[2]]
#> # A tibble: 3 × 3
#> Species Sepal.Length Sepal.Width
#> <fct> <dbl> <dbl>
#> 1 setosa 5 3.4
#> 2 versicolor 5.9 2.8
#> 3 virginica 6.5 3
General functions
If instead, as in the comments, we want to just apply generic functions to summarize, then rely on pmap with two arguments: the data and the summarizing function.
summarize_mtcars <- function(.dat) {
  .dat %>%
    group_by(am) %>%
    summarise(mpg_median = median(mpg))
}
summarize_iris <- function(.dat) {
  .dat %>%
    group_by(Species) %>%
    summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))
}
Now we can just define our data frame to analyze using the original data and the two summarize_... functions we defined for the datasets.
trb <-
  tribble(~original_data, ~summarize_func,
          mtcars, summarize_mtcars,
          iris, summarize_iris
  )
And then just use pmap as before (map2 also works, of course; see the sketch after the output below).
trb_final <- trb %>%
  mutate(dat_summarized = pmap(
    list(
      original_data,
      summarize_func
    ),
    \(.d, .f) .f(.d)
  ))
trb_final
#> # A tibble: 2 × 3
#> original_data summarize_func dat_summarized
#> <list> <list> <list>
#> 1 <df [32 × 11]> <fn> <tibble [2 × 2]>
#> 2 <df [150 × 5]> <fn> <tibble [3 × 3]>
trb_final$dat_summarized
#> [[1]]
#> # A tibble: 2 × 2
#> am mpg_median
#> <dbl> <dbl>
#> 1 0 17.3
#> 2 1 22.8
#>
#> [[2]]
#> # A tibble: 3 × 3
#> Species Sepal.Length Sepal.Width
#> <fct> <dbl> <dbl>
#> 1 setosa 5.01 3.43
#> 2 versicolor 5.94 2.77
#> 3 virginica 6.59 2.97
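For completeness, a sketch of the map2() variant mentioned above, where .x is the data frame and .y is the function stored in that row:
trb %>%
  mutate(dat_summarized = map2(original_data, summarize_func, ~ .y(.x)))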
I would store the functions as strings:
trb <-
  tribble(~original_data, ~cols_to_select, ~summarise_func,
          mtcars, c("am", "disp", "mpg"), "summarise_mtcars",
          iris, c("Species", "Sepal.Length", "Sepal.Width"), "summarise_iris"
  )
Then you can simply use do.call in your map call, or you can convert your functions to strings on the fly with mutate:
trb_2 <- trb_1 %>%
  mutate(summarise_func = as.character(summarise_func)) %>%
  mutate(dat_summarised = map2(summarise_func, original_data, ~ do.call(what = .x, args = list(.dat = .y))))
trb_2
#> # A tibble: 2 × 5
#> original_data cols_to_select summarise_func dat_selected dat_summarised
#> <list> <list> <chr> <list> <list>
#> 1 <df [32 × 11]> <chr [3]> summarise_mtcars <df [32 × 3]> <tibble [2 × 2]>
#> 2 <df [150 × 5]> <chr [3]> summarise_iris <df [150 × 3]> <tibble [3 × 3]>
Created on 2021-12-02 by the reprex package (v2.0.1)
Update: Storing functions, or rather function names, as strings can be problematic if the underlying function changes (I get that now). The problem is getting the function into the tibble in the first place. What you do in the question is store it as a formula. A better way is (imo) to store it in a list column:
trb <-
  tribble(~original_data, ~cols_to_select, ~summarise_func,
          mtcars, c("am", "disp", "mpg"), list(fun = summarise_mtcars),
          iris, c("Species", "Sepal.Length", "Sepal.Width"), list(fun = summarise_iris)
  )
With a slight adaptation, this original answer then works like this:
trb_3 <- trb_1 %>%
  mutate(dat_summarised = map2(summarise_func, original_data, ~ do.call(what = .x$fun, args = list(.dat = .y))))
trb_3
#> # A tibble: 2 × 5
#> original_data cols_to_select summarise_func dat_selected dat_summarised
#> <list> <list> <list> <list> <list>
#> 1 <df [32 × 11]> <chr [3]> <named list [1]> <df [32 × 3]> <tibble [2 × 2]>
#> 2 <df [150 × 5]> <chr [3]> <named list [1]> <df [150 × 3]> <tibble [3 × 3]>
Created on 2021-12-02 by the reprex package (v2.0.1)
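As an aside on the rlang::as_function() attempt in the question's update: the "Can't convert a list to function" error comes from passing the whole summarise_func list column to as_function(). Converting each formula individually, for example with map2(), should work; a sketch:
trb_1 %>%
  mutate(dat_summarised = map2(dat_selected, summarise_func,
                               ~ rlang::as_function(.y)(.x)))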

Tune recipe in workflow set with custom range (or value)

I'm trying to use the workflow_set() function in tidymodels to evaluate a batch of models.
I understand that it is possible to modify a model specification in order to change the search range, so, for example, given this specification:
spec_lin <- linear_reg(penalty = tune(),
                       mixture = tune()) %>%
  set_engine('glmnet')
I can modify the range using:
rec_base <- recipe(price ~ feat_1) %>%
  step_novel(feat_1) %>%
  step_other(feat_1, threshold = .2) %>%
  step_dummy(feat_1)
rec_adv_param <- rec_base %>%
  parameters() %>%
  update(mixture = mixture(c(0.1, 0.01)))
My attempt is to do the same but with the parameters in the recipe. For example:
rec_tuned <- recipe(price ~ feat_1) %>%
  step_novel(feat_1) %>%
  step_other(feat_1, threshold = tune()) %>%
  step_dummy(feat_1)
followed by
rec_adv_param <- rec_tuned %>%
  parameters() %>%
  update(threshold = threshold(c(0.1, 0.2)))
However, when I try to use this in the workflow_set() definition with something like
wf_set <- workflow_set(recipes, models, cross = TRUE) %>%
  option_add(param_info = rec_adv_param, id = "rec_tuned_spec_lin")
the final wf_set loses its original tuning parameters; they have been replaced by
threshold = threshold(c(0.1, 0.2))
Is there a way to add the parameter specification for the recipe to all workflow_set models?
Thanks
You can add the parameters for a recipe via option_add(), either for a single workflow by id, or for all workflows if you leave id = NULL. When you go to tune or fit on resampled data, these options will be used.
For example, if we want to try 0 to 20 PCA components (instead of the default):
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(Chicago)
data("chi_features_set")
time_val_split <-
  sliding_period(
    Chicago,
    date,
    "month",
    lookback = 38,
    assess_stop = 1
  )
## notice that there are no options; defaults will be used
chi_features_set
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 date_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 3 plus_pca_lm <tibble [1 × 4]> <opts[0]> <list [0]>
## make new params
pca_param <-
  parameters(num_comp()) %>%
  update(num_comp = num_comp(c(0, 20)))
## add new params to workflowset like this:
chi_features_set %>%
  option_add(param_info = pca_param, id = "plus_pca_lm")
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 date_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[0]> <list [0]>
#> 3 plus_pca_lm <tibble [1 × 4]> <opts[1]> <list [0]>
## now these new parameters can be used by `workflow_map()`:
chi_features_set %>%
  option_add(param_info = pca_param, id = "plus_pca_lm") %>%
  workflow_map(resamples = time_val_split, grid = 21, seed = 1)
#> # A workflow set/tibble: 3 × 4
#> wflow_id info option result
#> <chr> <list> <list> <list>
#> 1 date_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 2 plus_holidays_lm <tibble [1 × 4]> <opts[2]> <rsmp[+]>
#> 3 plus_pca_lm <tibble [1 × 4]> <opts[3]> <tune[+]>
Created on 2021-07-30 by the reprex package (v2.0.0)
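Applying the same pattern to the threshold() parameter from the question, one way to keep the model's tuning parameters as well might be to pull the full parameter set from the workflow and only update the threshold range (a sketch; the id string depends on how the workflow set was actually named):
wf_param <-
  wf_set %>%
  extract_workflow("rec_tuned_spec_lin") %>%
  parameters() %>%
  update(threshold = threshold(c(0.1, 0.2)))
wf_set %>%
  option_add(param_info = wf_param, id = "rec_tuned_spec_lin")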

Calling broom functions (tidy, glance, augment) with exec

I see from the purrr documentation that it should be possible to map a list of functions onto arguments using the map(list(fn1, fn2, fn3), exec, !!!args) syntax or something similar. How would this work for the broom functions tidy, glance, and augment, which usually must be supplemented with do? These are three functions I almost always like to execute at the same time on the same data and model. Of course I can do this explicitly:
# works but is repetitive
MY_MODEL <- hp ~ cyl
my_glance <- mtcars %>% do(glance(lm(data = ., formula = MY_MODEL)))
my_tidy <- mtcars %>% do(tidy(lm(data = ., formula = MY_MODEL)))
my_augment <- mtcars %>% do(augment(lm(data = ., formula = MY_MODEL)))
I suspect there is a better, more compact way to do this without having to retype ...lm(data = ., formula = MY_MODEL... every time, but I couldn't figure it out. I tried
# doesn't work
omnibroom <- function(df, model){
  map(list(glance, tidy, augment),
      exec,
      ~ {(do(.x(lm(data = df, formula = model))))}
  )
}
omnibroom(mtcars, MY_MODEL)
but I think I don't understand the !!! syntax appropriately.
Is there a compact idiom for calling these three broom functions on the same model and data?
It's possible to do this in two lines with simple re-factoring. No do or !!! necessary.
mdl <- mtcars %>% lm(data=., formula=MY_MODEL)
res1 <- map( list(glance, tidy, augment), exec, mdl )
If you really want to squish it down into a single line, use { to help guide pipe input to the correct place in lm:
res2 <- mtcars %>%
  {map( list(glance, tidy, augment), exec, lm(data=., formula=MY_MODEL) )}
Verification:
identical( res1, list(my_glance, my_tidy, my_augment) ) # TRUE
identical( res1, res2 ) # TRUE
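If named output is more convenient, naming the input list carries the names through map(); for example:
res_named <- map(list(glance = glance, tidy = tidy, augment = augment), exec, mdl)
res_named$tidy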
EDIT to address grouping
Arbitrary functions like lm don't respect data frame groups. While do is a popular approach to handle grouping in this case, I personally think that tidyr::nest() is more intuitive because it places all intermediates and results alongside the data:
## "Listify" broom functions: f -> map( ..., f )
omnibroom <- map( list(glance, tidy, augment), ~function(l) map(l, .x) ) %>%
set_names( c("glance","tidy","augment") )
result <- mtcars %>% nest( data = -gear ) %>%
mutate( model = map(data, lm, formula=MY_MODEL) ) %>%
mutate_at( "model", omnibroom )
# # A tibble: 3 x 6
# gear data model glance tidy augment
# <dbl> <list> <list> <list> <list> <list>
# 1 4 <tibble [12 × 10… <lm> <tibble [1 × 11… <tibble [2 × … <tibble [12 × …
# 2 3 <tibble [15 × 10… <lm> <tibble [1 × 11… <tibble [2 × … <tibble [15 × …
# 3 5 <tibble [5 × 10]> <lm> <tibble [1 × 11… <tibble [2 × … <tibble [5 × 9…
This format also naturally lends itself to unnesting, since broom functions produce data frames:
result %>% select( gear, tidy ) %>% unnest( tidy )
# # A tibble: 6 x 6
# gear term estimate std.error statistic p.value
# <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 4 (Intercept) -5.00 25.3 -0.198 0.847
# 2 4 cyl 20.2 5.30 3.82 0.00339
# 3 3 (Intercept) -47.5 56.1 -0.847 0.412
# 4 3 cyl 30.0 7.42 4.04 0.00142
# 5 5 (Intercept) -101. 51.9 -1.94 0.148
# 6 5 cyl 49.4 8.28 5.96 0.00944
