Error: No glance method recognized for this list

I'm trying to write a function that can flexibly group by a variable number of arguments and fit a linear model to each subset. The output should be a table with each row showing the grouping variable(s) and corresponding lm call results that broom::glance provides. But I can't figure out how to structure the output. Code that produces the same error is as follows:
library(dplyr)
library(broom)
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
mutate(mod = list(lm(hp ~ !!sym(var1), data = .))) %>%
summarize(broom::glance(mod))
}
test_fcn('qsec', 'cyl', 'carb')
I'm pushing my R/dplyr comfort zone by mixing static and dynamic variable arguments, so I've left them here in case that's a contributing factor. Thanks for any input!

You were nearly there.
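library(dplyr)
library(purrr)  # map()
library(tidyr)  # unnest()
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
summarise(
# note: `.` is the whole of mtcars here, not the current group (see Update)
mod = list(lm(hp ~ !!sym(var1), data = .)),
mod = map(mod, broom::glance),
.groups = "drop")
}
test_fcn('qsec', 'cyl', 'carb') %>% unnest(mod)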
## A tibble: 12 × 15
# gear cyl carb r.squared adj.r.sq…¹ sigma stati…² p.value df logLik AIC BIC devia…³ df.re…⁴
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 3 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 2 3 6 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 3 3 8 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 4 3 8 3 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 5 3 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 6 4 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 7 4 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 8 4 6 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 9 5 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#10 5 6 6 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#11 5 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#12 5 8 8 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
## … with 1 more variable: nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic,
## ³deviance, ⁴df.residual
## ℹ Use `colnames()` to see all variable names
Because you are storing the lm fit objects in a list column, broom::glance receives a whole list rather than a single model, so you need to loop over the entries using purrr::map.
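A minimal sketch of that difference, outside of the grouping code: `fits` is a hypothetical list of two models on mtcars. broom::glance() expects one model object, so the list has to be mapped over and the one-row results stacked.

```r
library(dplyr)
library(broom)
library(purrr)

# A plain list of lm fits, like the list column created by list(lm(...))
fits <- list(
  lm(hp ~ qsec, data = mtcars),
  lm(hp ~ wt, data = mtcars)
)

# broom::glance(fits) would dispatch on the list itself and fail with
# "No glance method recognized for this list"; map() calls glance() per model
glances <- map(fits, broom::glance)

# Each element is a one-row tibble; stack them into a single table
glance_tbl <- bind_rows(glances)
```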
You might want to put the unnest into the test_fcn: a slightly more compact version would be
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
summarise(
# caveat: `.` is the whole of mtcars here, so every group gets the same fit (see Update)
mod = map(list(lm(hp ~ !!sym(var1), data = .)), broom::glance),
.groups = "drop") %>%
unnest(mod)
}
Update
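Until your comment, I hadn't realised that the grouping was ignored: inside the pipe, `.` refers to the whole mtcars data frame that was piped in, not to the rows of the current group, so every group gets the same model. Here is a nest-unnest-type solution.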
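library(dplyr)
library(purrr)  # map()
library(tidyr)  # nest(), unnest()
test_fcn <- function(var1, ...) {
x <- unlist(list(...))  # character vector of the extra grouping columns
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
nest() %>%  # one row per group; the group's rows go into list column `data`
ungroup() %>%
mutate(mod = map(
data,
~ lm(hp ~ !!sym(var1), data = .x) %>% broom::glance())) %>%  # .x is one group's data
unnest(mod)
}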
test_fcn('qsec', 'cyl', 'carb')
## A tibble: 12 × 16
# cyl gear carb data r.squared adj.r.s…¹ sigma statis…² p.value df logLik
# <dbl> <dbl> <dbl> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 6 4 4 <tibble> 0.911 0.867 2.74e+ 0 20.5 0.0454 1 -8.32
# 2 4 4 1 <tibble> 0.525 0.287 1.15e+ 1 2.21 0.276 1 -14.1
# 3 6 3 1 <tibble> 1 NaN NaN NaN NaN 1 Inf
# 4 8 3 2 <tibble> 0.0262 -0.461 1.74e+ 1 0.0538 0.838 1 -15.7
# 5 8 3 4 <tibble> 0.869 0.825 7.48e+ 0 19.9 0.0210 1 -15.9
# 6 4 4 2 <tibble> 0.0721 -0.392 3.18e+ 1 0.155 0.732 1 -18.1
# 7 8 3 3 <tibble> 0.538 0.0769 2.63e-14 1.17 0.475 1 91.2
# 8 4 3 1 <tibble> 0 0 NaN NA NA NA Inf
# 9 4 5 2 <tibble> 1 NaN NaN NaN NaN 1 Inf
#10 8 5 4 <tibble> 0 0 NaN NA NA NA Inf
#11 6 5 6 <tibble> 0 0 NaN NA NA NA Inf
#12 8 5 8 <tibble> 0 0 NaN NA NA NA Inf
## … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>,
## nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic
## ℹ Use `colnames()` to see all variable names
Explanation: tidyr::nest stores each group's rows in a list column (named data by default). We then loop over the data entries with purrr::map, fit the model to each one, and store the broom::glance summary in a new column mod; unnesting mod gives the desired one-row-per-group structure. If you don't need the data column, you can remove it with select(-data).
PS. The example produces some warnings (and NA/NaN values in the model summaries) for groups with only one or two observations: a straight-line fit has two parameters, so such groups leave no residual degrees of freedom.
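As an alternative sketch (not part of the answer above), dplyr::group_modify() hands each group's rows to a function as `.x`, so the fit and the glance can happen in one step without nesting. The helper name `glance_by_group` is hypothetical, and the `filter(n() >= 3)` step is an assumption that you would rather drop the degenerate one- and two-row groups than get NA summaries for them.

```r
library(dplyr)
library(broom)

# Hypothetical helper: per-group glance() of lm(hp ~ <var1>)
glance_by_group <- function(var1, ...) {
  x <- c(...)  # extra grouping columns as a character vector
  mtcars %>%
    group_by(across(all_of(c("gear", x)))) %>%
    # keep groups with at least 3 rows, i.e. >= 1 residual df for a 2-parameter fit
    filter(n() >= 3) %>%
    # .x holds only the current group's rows; reformulate() builds hp ~ <var1>
    group_modify(~ broom::glance(lm(reformulate(var1, "hp"), data = .x))) %>%
    ungroup()
}

res <- glance_by_group("qsec", "cyl", "carb")
```

With mtcars, 6 of the 12 gear/cyl/carb groups have three or more rows. One of them has a constant hp, so summary.lm may still warn about an essentially perfect fit.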

Related

How to get handle of function output with sapply in R?

Let's suppose that this is my code:
library(magrittr)
library(dplyr)
set.seed(123,kind="Mersenne-Twister",normal.kind="Inversion")
y = runif(20,0,50)
simulation <- function(y){
x <- rnorm(length(y),3,0.125)
lm(y ~ x)
}
fit <- lapply(1:10, function(dummy) simulation(y))
coef <- sapply(fit, coef) %>%
t() %>%
as.data.frame()
How can I collect the 10 simulated x variables generated from the function simulation in a data frame?
They are stored in the fit object. A single one can be extracted with fit[[1]]$model$x. Programmatically we can get all of them in a data frame like this:
xs = lapply(fit, \(x) x$model$x)
data.frame(i = rep(seq_along(xs), lengths(xs)), x = unlist(xs))
# i x
# 1 1 3.153010
# 2 1 3.044977
# 3 1 3.050096
# 4 1 3.013835
# 5 1 2.930520
# 6 1 3.223364
# 7 1 3.062231
# ...
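Because every simulation uses the same number of observations, a wide layout is another option: sapply() can simplify the extracted vectors into a matrix with one column per simulation. A sketch reusing the question's setup; the object name `xs_wide` and the column names `sim1`, `sim2`, ... are arbitrary choices.

```r
set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
y <- runif(20, 0, 50)
simulation <- function(y) {
  x <- rnorm(length(y), 3, 0.125)
  lm(y ~ x)
}
fit <- lapply(1:10, function(dummy) simulation(y))

# Each model frame stores the x used for that fit; all have 20 rows,
# so sapply() simplifies the result to a 20 x 10 matrix
xs_wide <- as.data.frame(sapply(fit, function(m) m$model$x))
names(xs_wide) <- paste0("sim", seq_along(fit))
```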
Or, if you want more information, you can use broom::augment instead. This will return the full data, along with predicted values, residuals, etc.
result = bind_rows(lapply(fit, broom::augment), .id = "sim")
head(result)
# A tibble: 6 × 9
# sim y x .fitted .resid .hat .sigma .cooksd .std.resid
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 14.4 3.15 19.2 -4.78 0.141 15.1 0.0101 -0.350
# 2 1 39.4 3.04 24.6 14.8 0.0612 14.7 0.0353 1.04
# 3 1 20.4 3.05 24.3 -3.89 0.0633 15.1 0.00252 -0.273
# 4 1 44.2 3.01 26.2 18.0 0.0525 14.5 0.0437 1.26
# 5 1 47.0 2.93 30.4 16.7 0.0603 14.6 0.0438 1.17
# 6 1 2.28 3.22 15.6 -13.3 0.234 14.7 0.164 -1.04
You might also like bind_rows(lapply(fit, broom::tidy), .id = "sim") which will give you one row per coefficient and bind_rows(lapply(fit, broom::glance), .id = "sim") which will give you one row per model.
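A self-contained sketch of those two calls, rebuilding `fit` exactly as in the question; the names `coefs` and `model_stats` are only illustrative. With 10 models of the form y ~ x, tidy() gives two terms per model and glance() one row per model.

```r
library(dplyr)
library(broom)

set.seed(123, kind = "Mersenne-Twister", normal.kind = "Inversion")
y <- runif(20, 0, 50)
simulation <- function(y) {
  x <- rnorm(length(y), 3, 0.125)
  lm(y ~ x)
}
fit <- lapply(1:10, function(dummy) simulation(y))

# One row per coefficient: 10 models x 2 terms
coefs <- bind_rows(lapply(fit, broom::tidy), .id = "sim")

# One row per model
model_stats <- bind_rows(lapply(fit, broom::glance), .id = "sim")
```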

Nested loop through function with multiple arguments and stack the output

I wrote a function that runs a linear model and outputs a data frame. I would like to run the function several times over two grouping variables and stack the output. Here is a hypothetical dataset and function:
data = data.frame(grade_level = rep(1:4, each = 3),
x = rnorm(12, mean = 21, sd = 7.5),
y = rnorm(12, mean = 20, sd = 7),
cut_set = rep(c("low", "med", "high"), each = 4))
func = function(grade, set){
model = lm(y ~ x, data=data[data$grade_level == grade & data$cut_set == set,])
fitted.values = model$fitted.values
final = data.frame(grade_level = data$grade_level[data$grade_level == grade & data$cut_set == set],
predicted_values = fitted.values)
final
}
I can run it multiple times and then bind the output, but I know this isn't the best approach:
g1.low <- func(grade = 1, set = "low")
g1.med <- func(grade = 1, set = "med")
pred.values = rbind(g1.low, g1.med)
I would like to loop through all grades (1 to 4) and sets ("low", "med", "high"). I've tried this loop, but it doesn't work. I wonder if there is a purrr solution.
for (i in grades) {
for(c in 1:length(cut_sets)) {
temp <- func(grade = i, set = cut_sets[c])
predicted.values <- rbind(predicted.values, temp)
}
}
If I've understood correctly, you can manage it with dplyr and broom:
library(dplyr)
library(broom)
library(tidyr)
mods <- data %>%
group_by(grade_level, cut_set) %>%
do(model = augment(lm(y ~ x, data = .)) )
mods
# A tibble: 6 x 3
# Rowwise:
# grade_level cut_set model
# <int> <chr> <list>
# 1 1 low <tibble [3 x 8]>
# 2 2 low <tibble [1 x 8]>
# 3 2 med <tibble [2 x 8]>
# 4 3 high <tibble [1 x 8]>
# 5 3 med <tibble [2 x 8]>
# 6 4 high <tibble [3 x 8]>
mods %>% unnest(cols = c(model))
# A tibble: 12 x 10
# grade_level cut_set y x .fitted .resid .hat .sigma .cooksd .std.resid
# <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 low 27.5 20.9 27.4 1.12e- 1 0.992 NaN 60.9 1.
# 2 1 low 24.8 30.4 24.0 8.15e- 1 0.567 Inf 0.656 1.00
# 3 1 low 23.5 29.3 24.4 -9.26e- 1 0.441 NaN 0.394 -1
# 4 2 low 31.6 18.6 31.6 0. 1 0 NaN NaN
# 5 2 med 19.3 20.9 19.3 3.55e-15 1 0 NaN NaN
# 6 2 med 16.9 14.7 16.9 0. 1 0 NaN NaN
# 7 3 high 20.1 22.9 20.1 0. 1 0 NaN NaN
# 8 3 med 21.6 13.2 21.6 3.55e-15 1 0 NaN NaN
# 9 3 med 20.9 26.5 20.9 0. 1 0 NaN NaN
# 10 4 high 26.4 20.0 20.9 5.49e+ 0 0.369 NaN 0.293 1.
# 11 4 high 15.2 15.6 19.0 -3.88e+ 0 0.685 NaN 1.09 -1.
# 12 4 high 23.7 30.8 25.3 -1.61e+ 0 0.946 NaN 8.71 -1.
To get the coefficients (intercepts and slopes), use tidy() instead:
data %>%
group_by(grade_level, cut_set) %>%
do(model = tidy(lm(y ~ x, data = .)) ) %>% unnest(cols = c(model))
# A tibble: 12 x 7
# grade_level cut_set term estimate std.error statistic p.value
# <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 low (Intercept) 14.8 7.05 2.09 0.284
# 2 1 low x 0.339 0.371 0.913 0.529
# 3 2 low (Intercept) 23.1 NaN NaN NaN
# 4 2 low x NA NA NA NA
# 5 2 med (Intercept) 1.27 NaN NaN NaN
# 6 2 med x 0.561 NaN NaN NaN
# 7 3 high (Intercept) 14.7 NaN NaN NaN
# 8 3 high x NA NA NA NA
# 9 3 med (Intercept) 7.29 NaN NaN NaN
# 10 3 med x 0.229 NaN NaN NaN
# 11 4 high (Intercept) 13.8 4.18 3.30 0.187
# 12 4 high x 0.106 0.210 0.505 0.702
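Since the question also asks for a purrr solution, here is a sketch that keeps the question's func() and loops over the grade/set combinations with pmap_dfr(). Two assumptions: the seed is arbitrary (the question sets none), and only combinations that actually occur are used. With this data only 6 of the 12 grade/set pairs exist, and an empty subset would make lm() fail. The added `cut_set` column is a convenience, not part of the original func().

```r
library(dplyr)
library(purrr)

set.seed(1)  # arbitrary seed, only to make this sketch reproducible
data <- data.frame(grade_level = rep(1:4, each = 3),
                   x = rnorm(12, mean = 21, sd = 7.5),
                   y = rnorm(12, mean = 20, sd = 7),
                   cut_set = rep(c("low", "med", "high"), each = 4))

# func() as in the question: fit lm(y ~ x) on one grade/set subset
func <- function(grade, set) {
  model <- lm(y ~ x, data = data[data$grade_level == grade & data$cut_set == set, ])
  data.frame(grade_level = data$grade_level[data$grade_level == grade & data$cut_set == set],
             predicted_values = model$fitted.values)
}

# Only the combinations present in the data
combos <- distinct(data, grade_level, cut_set)

# pmap_dfr() passes each row of `combos` as named arguments and row-binds the results
pred_values <- pmap_dfr(combos, function(grade_level, cut_set) {
  out <- func(grade = grade_level, set = cut_set)
  out$cut_set <- cut_set  # record which set each prediction came from
  out
})
```

Note that `do()`, used in the answer above, is marked as superseded in current dplyr, so a purrr loop or a nest/map pattern is the more future-proof choice.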

Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

It might not be very clear from the title but what I wish to do is:
I have a dataframe df with, say, 200 columns: the first 80 columns are response variables (y1, y2, y3, ...) and the remaining 120 are predictors (x1, x2, x3, ...).
I wish to compute a linear model for each pair: lm(yi ~ xi, data = df).
Many problems and solutions I have found online have either a fixed response with many predictors or the other way around, using lapply() and its related functions.
Could anyone who is familiar with it point me to the right step?
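Use the tidyverse: expand_grid() builds every response/predictor combination, and rowwise() fits one model per row: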
library(tidyverse)
library(broom)
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]
result <- expand_grid(x, y) %>%
rowwise() %>%
mutate(frm = list(reformulate(x, y)),
model = list(lm(frm, data = df)))
result$model <- purrr::set_names(result$model, nm = paste0(result$y, " ~ ", result$x))
result$model[1:2]
#> $`mpg ~ hp`
#>
#> Call:
#> lm(formula = frm, data = df)
#>
#> Coefficients:
#> (Intercept) hp
#> 30.09886 -0.06823
#>
#>
#> $`cyl ~ hp`
#>
#> Call:
#> lm(formula = frm, data = df)
#>
#> Coefficients:
#> (Intercept) hp
#> 3.00680 0.02168
map_df(result$model, tidy)
#> # A tibble: 24 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.1 1.63 18.4 6.64e-18
#> 2 hp -0.0682 0.0101 -6.74 1.79e- 7
#> 3 (Intercept) 3.01 0.425 7.07 7.41e- 8
#> 4 hp 0.0217 0.00264 8.23 3.48e- 9
#> 5 (Intercept) 21.0 32.6 0.644 5.25e- 1
#> 6 hp 1.43 0.202 7.08 7.14e- 8
#> 7 (Intercept) -7.52 5.48 -1.37 1.80e- 1
#> 8 drat 7.68 1.51 5.10 1.78e- 5
#> 9 (Intercept) 14.6 1.58 9.22 2.93e-10
#> 10 drat -2.34 0.436 -5.37 8.24e- 6
#> # ... with 14 more rows
map_df(result$model, glance)
#> # A tibble: 12 x 12
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.602 0.589 3.86 45.5 1.79e- 7 1 -87.6 181. 186.
#> 2 0.693 0.683 1.01 67.7 3.48e- 9 1 -44.6 95.1 99.5
#> 3 0.626 0.613 77.1 50.1 7.14e- 8 1 -183. 373. 377.
#> 4 0.464 0.446 4.49 26.0 1.78e- 5 1 -92.4 191. 195.
#> 5 0.490 0.473 1.30 28.8 8.24e- 6 1 -52.7 111. 116.
#> 6 0.504 0.488 88.7 30.5 5.28e- 6 1 -188. 382. 386.
#> 7 0.753 0.745 3.05 91.4 1.29e-10 1 -80.0 166. 170.
#> 8 0.612 0.599 1.13 47.4 1.22e- 7 1 -48.3 103. 107.
#> 9 0.789 0.781 57.9 112. 1.22e-11 1 -174. 355. 359.
#> 10 0.175 0.148 5.56 6.38 1.71e- 2 1 -99.3 205. 209.
#> 11 0.350 0.328 1.46 16.1 3.66e- 4 1 -56.6 119. 124.
#> 12 0.188 0.161 114. 6.95 1.31e- 2 1 -196. 398. 402.
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Created on 2020-12-11 by the reprex package (v0.3.0)
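Since the question mentions lapply(), here is a base R sketch of the same idea (not from the answer above). It builds every response/predictor combination with expand.grid(), fits one model per row, and names each model by its formula. `grid`, `models` and `slopes` are illustrative names.

```r
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]

# Every response/predictor combination, one row per model
grid <- expand.grid(y = y, x = x, stringsAsFactors = FALSE)

models <- lapply(seq_len(nrow(grid)), function(i) {
  lm(reformulate(grid$x[i], grid$y[i]), data = df)
})
names(models) <- paste(grid$y, "~", grid$x)

# Slope (second coefficient) of each model, named by formula
slopes <- sapply(models, function(m) coef(m)[[2]])
```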

pmap_ variants operating on data.frames as lists

I have a recollection that purrr::pmap_* can treat a data.frame as a list but the syntax eludes me.
Imagine we wanted to fit a separate lm object for each combination of mtcars$vs and mtcars$am:
library(tidyverse)
library(broom)
d1 <- mtcars %>%
group_by(
vs, am
) %>%
nest %>%
mutate(
coef = data %>%
map(
~lm(mpg ~ wt, data =.) %>%
tidy
)
)
If I wanted to extract the coefficient estimates as an un-nested data.frame, and append the values of am and vs, I might try
d1[, -3] %>%
pmap_dfr(
function(i, j, k)
k %>%
mutate(
vs = i,
am = j
)
)
But this results in an error. More explicitly declaring these variables as separate lists has the desired effect
list(
d1$vs,
d1$am,
d1$coef
) %>%
pmap_dfr(
function(i, j, k)
k %>%
mutate(
vs = i,
am = j
)
)
Is there a succinct way for pmap_* to treat a data.frame as a list?
We can use the formula shorthand, which refers to the components by position (..1, ..2, etc.):
d1[, -3] %>%
pmap_dfr(~ ..3 %>%
mutate(vs = ..1, am = ..2))
# A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
#2 wt -7.91 1.14 -6.93 0.00227 0 1
#3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
#4 wt -7.77 3.36 -2.31 0.0689 1 1
#5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
#6 wt -3.38 2.80 -1.21 0.281 1 0
#7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
#8 wt -2.44 0.842 -2.90 0.0159 0 0
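The first attempt fails because d1[, -3] has names (vs, am, coef): pmap passes each element as a named argument, and those names do not match i, j, k. The list in the second attempt has no names attribute, so its elements are matched by position. If you unname d1 it works. Using list() in the second example makes no other difference, because both objects are lists (a data frame is a list of columns).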
d1[, -3] %>%
unname %>%
pmap_dfr(
function(i, j, k)
k %>%
mutate(
vs = i,
am = j
)
)
# # A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
# 2 wt -7.91 1.14 -6.93 0.00227 0 1
# 3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
# 4 wt -7.77 3.36 -2.31 0.0689 1 1
# 5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
# 6 wt -3.38 2.80 -1.21 0.281 1 0
# 7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
# 8 wt -2.44 0.842 -2.90 0.0159 0 0
You can also name the arguments of the function in your first code block to match the column names (or use ..1 etc.) for the same result:
d1[, -3] %>%
pmap_dfr(
function(vs, am, coef)
coef %>%
mutate(
vs = vs,
am = am
)
)
# # A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
# 2 wt -7.91 1.14 -6.93 0.00227 0 1
# 3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
# 4 wt -7.77 3.36 -2.31 0.0689 1 1
# 5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
# 6 wt -3.38 2.80 -1.21 0.281 1 0
# 7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
# 8 wt -2.44 0.842 -2.90 0.0159 0 0
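The matching rule behind both fixes can be seen on a toy example. This is a sketch with a made-up two-column data frame `df`; only pmap_dbl() and base R are used.

```r
library(purrr)

df <- data.frame(a = 1:2, b = c(10, 20))

# A data frame is a named list, so pmap() passes each column as a named
# argument: the formals must use the column names (their order does not matter)
by_name <- pmap_dbl(df, function(b, a) a - b)

# Once the names are removed, elements are matched by position instead
by_position <- pmap_dbl(unname(as.list(df)), function(first, second) first - second)

# Mismatched names fail: pmap() calls f(a = ..., b = ...), and f has no `a` or `b`
mismatch <- tryCatch(
  pmap_dbl(df, function(i, j) i - j),
  error = function(e) "error"
)
```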
You could also use wap() from the experimental rap package:
library(rap)
d1[, -3] %>%
wap( ~ coef %>%
mutate(
vs = vs,
am = am)) %>%
bind_rows
# # A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
# 2 wt -7.91 1.14 -6.93 0.00227 0 1
# 3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
# 4 wt -7.77 3.36 -2.31 0.0689 1 1
# 5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
# 6 wt -3.38 2.80 -1.21 0.281 1 0
# 7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
# 8 wt -2.44 0.842 -2.90 0.0159 0 0

setting list element names based on argument to `pmap`

I am trying to figure out if I can use the list of arguments provided to purrr::pmap() to also name the elements of the output list from this function using purrr::set_names().
For example, here is a simple case where I am using pmap to create summaries for some variables from different dataframes across grouping variables.
# setup
library(tidyverse)
library(groupedstats)
set.seed(123)
# creating the dataframes
data_1 <- tibble::as_tibble(iris)
data_2 <- tibble::as_tibble(mtcars)
data_3 <- tibble::as_tibble(airquality)
# creating a list
purrr::pmap(
.l = list(
data = list(data_1, data_2, data_3),
grouping.vars = alist(Species, c(am, cyl), Month),
measures = alist(c(Sepal.Length, Sepal.Width), wt, c(Ozone, Solar.R, Wind))
),
.f = groupedstats::grouped_summary
) %>% # assigning names to each element of the list
purrr::set_names(x = ., nm = alist(data_1, data_2, data_3))
# output
#> $data_1
#> # A tibble: 6 x 16
#> Species type variable missing complete n mean sd min p25
#> <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa nume~ Sepal.L~ 0 50 50 5.01 0.35 4.3 4.8
#> 2 setosa nume~ Sepal.W~ 0 50 50 3.43 0.38 2.3 3.2
#> 3 versic~ nume~ Sepal.L~ 0 50 50 5.94 0.52 4.9 5.6
#> 4 versic~ nume~ Sepal.W~ 0 50 50 2.77 0.31 2 2.52
#> 5 virgin~ nume~ Sepal.L~ 0 50 50 6.59 0.64 4.9 6.23
#> 6 virgin~ nume~ Sepal.W~ 0 50 50 2.97 0.32 2.2 2.8
#> # ... with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> # std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
#>
#> $data_2
#> # A tibble: 6 x 17
#> am cyl type variable missing complete n mean sd min p25
#> <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 6 nume~ wt 0 3 3 2.75 0.13 2.62 2.7
#> 2 1 4 nume~ wt 0 8 8 2.04 0.41 1.51 1.78
#> 3 0 6 nume~ wt 0 4 4 3.39 0.12 3.21 3.38
#> 4 0 8 nume~ wt 0 12 12 4.1 0.77 3.44 3.56
#> 5 0 4 nume~ wt 0 3 3 2.94 0.41 2.46 2.81
#> 6 1 8 nume~ wt 0 2 2 3.37 0.28 3.17 3.27
#> # ... with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> # std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
#>
#> $data_3
#> # A tibble: 15 x 16
#> Month type variable missing complete n mean sd min p25
#> <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 5 inte~ Ozone 5 26 31 23.6 22.2 1 11
#> 2 5 inte~ Solar.R 4 27 31 181. 115. 8 72
#> 3 5 nume~ Wind 0 31 31 11.6 3.53 5.7 8.9
#> 4 6 inte~ Ozone 21 9 30 29.4 18.2 12 20
#> 5 6 inte~ Solar.R 0 30 30 190. 92.9 31 127
#> 6 6 nume~ Wind 0 30 30 10.3 3.77 1.7 8
#> 7 7 inte~ Ozone 5 26 31 59.1 31.6 7 36.2
#> 8 7 inte~ Solar.R 0 31 31 216. 80.6 7 175
#> 9 7 nume~ Wind 0 31 31 8.94 3.04 4.1 6.9
#> 10 8 inte~ Ozone 5 26 31 60.0 39.7 9 28.8
#> 11 8 inte~ Solar.R 3 28 31 172. 76.8 24 107
#> 12 8 nume~ Wind 0 31 31 8.79 3.23 2.3 6.6
#> 13 9 inte~ Ozone 1 29 30 31.4 24.1 7 16
#> 14 9 inte~ Solar.R 0 30 30 167. 79.1 14 117.
#> 15 9 nume~ Wind 0 30 30 10.2 3.46 2.8 7.55
#> # ... with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> # std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
Created on 2018-10-31 by the reprex package (v0.2.1)
As can be seen here, the contents of the data argument to purrr::pmap and the nm argument to purrr::set_names are exactly identical (data_1, data_2, data_3). I want to avoid this repetition (which seems unnecessary here with 3 elements, but I have a much bigger list of arguments). I can't assign this list to a separate object because in one case it is a list, while in the other it is entered as an alist.
How can I do this?
From the tidyverse, you can also use the lst function. lst creates a list in the same way that tibble creates a tibble. One difference from base list() is that it automatically names the elements after the expressions that created them.
It is available in dplyr, which re-exports it from tibble.
For the example, I also replaced base alist with rlang::exprs, which is equivalent here; either works.
library(tidyverse)
library(groupedstats)
set.seed(123)
# creating the dataframes
data_1 <- tibble::as_tibble(iris)
data_2 <- tibble::as_tibble(mtcars)
data_3 <- tibble::as_tibble(airquality)
# creating a list
purrr::pmap(
.l = list(
data = lst(data_1, data_2, data_3),
grouping.vars = rlang::exprs(Species, c(am, cyl), Month),
measures = rlang::exprs(c(Sepal.Length, Sepal.Width), wt, c(Ozone, Solar.R, Wind))
),
.f = groupedstats::grouped_summary
) %>%
str(1)
#> List of 3
#> $ data_1:Classes 'tbl_df', 'tbl' and 'data.frame': 6 obs. of 16 variables:
#> $ data_2:Classes 'tbl_df', 'tbl' and 'data.frame': 6 obs. of 17 variables:
#> $ data_3:Classes 'tbl_df', 'tbl' and 'data.frame': 15 obs. of 16 variables:
Created on 2018-11-02 by the reprex package (v0.2.1)
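The answer relies on two behaviours: lst() names its elements after the expressions, and pmap() takes the names of its output from the first element of `.l`. A minimal sketch showing both without groupedstats; `inputs`, `out` and the `n` argument are illustrative.

```r
library(tibble)
library(purrr)

data_1 <- head(iris)
data_2 <- head(mtcars)

# lst() names each element after the expression that created it
inputs <- lst(data_1, data_2)

# pmap() names its output after the first element of .l,
# so naming `inputs` is enough to name the result
out <- pmap(
  list(data = inputs, n = list(2, 3)),
  function(data, n) head(data, n)
)
```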
