I have a recollection that purrr::pmap_* can treat a data.frame as a list but the syntax eludes me.
Imagine we wanted to fit a separate lm object for each combination of mtcars$vs and mtcars$am:
library(tidyverse)
library(broom)
d1 <- mtcars %>%
group_by(
vs, am
) %>%
nest %>%
mutate(
coef = data %>%
map(
~lm(mpg ~ wt, data =.) %>%
tidy
)
)
If I wanted to extract the coefficient estimates as an un-nested data.frame, and append the values of am and vs, I might try
d1[, -3] %>%
pmap_dfr(
function(i, j, k)
k %>%
mutate(
vs = i,
am = j
)
)
But this results in an error. More explicitly declaring these variables as separate lists has the desired effect
list(
d1$vs,
d1$am,
d1$coef
) %>%
pmap_dfr(
function(i, j, k)
k %>%
mutate(
vs = i,
am = j
)
)
Is there a succinct way for pmap_* to treat a data.frame as a list?
We can refer to the arguments by position inside a formula-style lambda, using ..1, ..2, and so on:
d1[, -3] %>%
pmap_dfr(~ ..3 %>%
mutate(vs = ..1, am = ..2))
# A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
#2 wt -7.91 1.14 -6.93 0.00227 0 1
#3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
#4 wt -7.77 3.36 -2.31 0.0689 1 1
#5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
#6 wt -3.38 2.80 -1.21 0.281 1 0
#7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
#8 wt -2.44 0.842 -2.90 0.0159 0 0
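The first call fails because d1 has names (vs, am, coef), and pmap_* passes named list elements as named arguments; none of them match i, j or k. The second list has no names attribute, so its elements are matched by position. If you unname d1, it works. Calling list() in the second example makes no other difference (apart from dropping the names), because both objects are lists (data frames are lists).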
d1[, -3] %>%
unname %>%
pmap_dfr(
function(i, j, k)
k %>%
mutate(
vs = i,
am = j
)
)
# # A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
# 2 wt -7.91 1.14 -6.93 0.00227 0 1
# 3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
# 4 wt -7.77 3.36 -2.31 0.0689 1 1
# 5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
# 6 wt -3.38 2.80 -1.21 0.281 1 0
# 7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
# 8 wt -2.44 0.842 -2.90 0.0159 0 0
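The name-versus-position matching described above can be seen on a much smaller input. This is only a sketch: the toy data frame and its column names a and b are made up for illustration, and the unnamed list is built with unname(as.list(df)).

```r
library(purrr)

# Hypothetical toy data frame; the column names are illustrative only.
df <- data.frame(a = 1:2, b = c(10, 20))

# Argument names match the column names, so each row is passed as a = , b = .
by_name <- pmap_dbl(df, function(a, b) a + b)

# With the names removed, the elements are matched by position instead.
by_position <- pmap_dbl(unname(as.list(df)), function(i, j) i + j)

# Names that match nothing fail: the function receives a = and b = ,
# which it has no arguments for.
mismatch <- tryCatch(
  pmap_dbl(df, function(i, j) i + j),
  error = function(e) "error"
)
```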
You can also rename the arguments of the function in your first code block to match the column names (or use ..1 etc.) for the same result:
d1[, -3] %>%
pmap_dfr(
function(vs, am, coef)
coef %>%
mutate(
vs = vs,
am = am
)
)
# # A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
# 2 wt -7.91 1.14 -6.93 0.00227 0 1
# 3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
# 4 wt -7.77 3.36 -2.31 0.0689 1 1
# 5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
# 6 wt -3.38 2.80 -1.21 0.281 1 0
# 7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
# 8 wt -2.44 0.842 -2.90 0.0159 0 0
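If the data frame has columns the function does not need, matching by name still works as long as the function accepts `...` to absorb them, so the input does not have to be subset first. A minimal sketch, again on a made-up toy data frame (the column unused is hypothetical):

```r
library(purrr)

# Hypothetical data frame with one column the function does not use.
df <- data.frame(a = 1:2, b = c(10, 20), unused = c("x", "y"))

# a and b are matched by name; the extra column is swallowed by `...`.
res <- pmap_dbl(df, function(a, b, ...) a + b)
```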
You could also use wap from the experimental rap package
library(rap)
d1[, -3] %>%
wap( ~ coef %>%
mutate(
vs = vs,
am = am)) %>%
bind_rows
# # A tibble: 8 x 7
# term estimate std.error statistic p.value vs am
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 42.4 3.30 12.8 0.000213 0 1
# 2 wt -7.91 1.14 -6.93 0.00227 0 1
# 3 (Intercept) 44.1 6.96 6.34 0.00144 1 1
# 4 wt -7.77 3.36 -2.31 0.0689 1 1
# 5 (Intercept) 31.5 8.98 3.51 0.0171 1 0
# 6 wt -3.38 2.80 -1.21 0.281 1 0
# 7 (Intercept) 25.1 3.51 7.14 0.0000315 0 0
# 8 wt -2.44 0.842 -2.90 0.0159 0 0
I'm trying to write a function that can flexibly group by a variable number of arguments and fit a linear model to each subset. The output should be a table with each row showing the grouping variable(s) and corresponding lm call results that broom::glance provides. But I can't figure out how to structure the output. Code that produces the same error is as follows:
library(dplyr)
library(broom)
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
mutate(mod = list(lm(hp ~ !!sym(var1), data = .))) %>%
summarize(broom::glance(mod))
}
test_fcn('qsec', 'cyl', 'carb')
I'm pushing my R/dplyr comfort zone by mixing static and dynamic variable arguments, so I've left them here in case that's a contributing factor. Thanks for any input!
You were nearly there.
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
summarise(
mod = list(lm(hp ~ !!sym(var1), data = .)),
mod = map(mod, broom::glance),
.groups = "drop")
}
test_fcn('qsec', 'cyl', 'carb') %>% unnest(mod)
## A tibble: 12 × 15
# gear cyl carb r.squared adj.r.sq…¹ sigma stati…² p.value df logLik AIC BIC devia…³ df.re…⁴
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 3 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 2 3 6 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 3 3 8 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 4 3 8 3 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 5 3 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 6 4 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 7 4 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 8 4 6 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 9 5 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#10 5 6 6 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#11 5 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#12 5 8 8 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
## … with 1 more variable: nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic,
## ³deviance, ⁴df.residual
## ℹ Use `colnames()` to see all variable names
Because you are storing the lm fit objects in a list, you need to loop over the entries using purrr::map.
You might want to put the unnest into the test_fcn: a slightly more compact version would be
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
summarise(
mod = map(list(lm(hp ~ !!sym(var1), data = .)), broom::glance),
.groups = "drop") %>%
unnest(mod)
}
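The point about looping over a list of fits can be shown in isolation. The sketch below assumes one model per cyl value, built with split() purely for illustration; it is not the grouping used in the function above.

```r
library(purrr)
library(broom)

# One lm fit per cylinder count, stored in a plain list.
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(hp ~ qsec, data = d))

# glance() summarises a single model, so it is applied to each list element.
summaries <- map(fits, glance)
```

Each element of summaries is a one-row tibble, which is why a later unnest() or bind_rows() gives one row per model.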
Update
Until your comment, I hadn't realised that the grouping was ignored. Here is a nest-unnest-type solution.
test_fcn <- function(var1, ...) {
x <- unlist(list(...))
mtcars %>%
group_by(across(all_of(c('gear', x)))) %>%
nest() %>%
ungroup() %>%
mutate(mod = map(
data,
~ lm(hp ~ !!sym(var1), data = .x) %>% broom::glance())) %>%
unnest(mod)
}
test_fcn('qsec', 'cyl', 'carb')
## A tibble: 12 × 16
# cyl gear carb data r.squared adj.r.s…¹ sigma statis…² p.value df logLik
# <dbl> <dbl> <dbl> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 6 4 4 <tibble> 0.911 0.867 2.74e+ 0 20.5 0.0454 1 -8.32
# 2 4 4 1 <tibble> 0.525 0.287 1.15e+ 1 2.21 0.276 1 -14.1
# 3 6 3 1 <tibble> 1 NaN NaN NaN NaN 1 Inf
# 4 8 3 2 <tibble> 0.0262 -0.461 1.74e+ 1 0.0538 0.838 1 -15.7
# 5 8 3 4 <tibble> 0.869 0.825 7.48e+ 0 19.9 0.0210 1 -15.9
# 6 4 4 2 <tibble> 0.0721 -0.392 3.18e+ 1 0.155 0.732 1 -18.1
# 7 8 3 3 <tibble> 0.538 0.0769 2.63e-14 1.17 0.475 1 91.2
# 8 4 3 1 <tibble> 0 0 NaN NA NA NA Inf
# 9 4 5 2 <tibble> 1 NaN NaN NaN NaN 1 Inf
#10 8 5 4 <tibble> 0 0 NaN NA NA NA Inf
#11 6 5 6 <tibble> 0 0 NaN NA NA NA Inf
#12 8 5 8 <tibble> 0 0 NaN NA NA NA Inf
## … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>,
## nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic
## ℹ Use `colnames()` to see all variable names
Explanation: tidyr::nest nests data in a list column (with name data by default); we can then loop through the data entries, fit the model and extract model summaries with broom::glance in a new column mod; unnesting mod then gives the desired structure. If not needed, you can remove the data column with select(-data).
PS. The example produces some warnings (leading to NAs in the model summaries) from those groups where you have only a single observation.
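One way to avoid those warnings, if dropping small groups is acceptable, is to filter on group size before nesting. This is a sketch only: the threshold of 3 rows is an arbitrary assumption, and the grouping and model variables are hard-coded here rather than passed in as in test_fcn.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

res <- mtcars %>%
  group_by(gear, cyl, carb) %>%
  filter(n() >= 3) %>%      # keep groups with at least 3 rows (arbitrary cut-off)
  nest() %>%
  ungroup() %>%
  mutate(mod = map(data, ~ glance(lm(hp ~ qsec, data = .x)))) %>%
  select(-data) %>%         # drop the nested data once the model is summarised
  unnest(mod)
```

Groups in which the response is constant can still trigger a "perfect fit" warning, so this reduces the noise rather than removing it entirely.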
Let's suppose that this is my code:
library(magrittr)
library(dplyr)
set.seed(123,kind="Mersenne-Twister",normal.kind="Inversion")
y = runif(20,0,50)
simulation <- function(y){
x <- rnorm(length(y),3,0.125)
lm(y ~ x)
}
fit <- lapply(1:10, function(dummy) simulation(y))
coef <- sapply(fit, coef) %>%
t() %>%
as.data.frame()
How can I collect the 10 simulated x variables generated from the function simulation in a data frame?
They are stored in the fit object. A single one can be extracted with fit[[1]]$model$x. Programmatically we can get all of them in a data frame like this:
xs = lapply(fit, \(x) x$model$x)
data.frame(i = rep(seq_along(xs), lengths(xs)), x = unlist(xs))
# i x
# 1 1 3.153010
# 2 1 3.044977
# 3 1 3.050096
# 4 1 3.013835
# 5 1 2.930520
# 6 1 3.223364
# 7 1 3.062231
# ...
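If a wide layout is preferred, with one column per simulation, the same model frames can be bound column-wise. A sketch that repeats the question's setup (simplified to a plain set.seed(123)) so it runs on its own; the column names sim1, sim2, ... are an assumption.

```r
set.seed(123)
y <- runif(20, 0, 50)

simulation <- function(y) {
  x <- rnorm(length(y), 3, 0.125)
  lm(y ~ x)
}
fit <- lapply(1:10, function(dummy) simulation(y))

# sapply() returns a 20 x 10 matrix: one column of x values per simulation.
xs_wide <- as.data.frame(sapply(fit, function(m) m$model$x))
names(xs_wide) <- paste0("sim", seq_along(fit))
```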
Or, if you want lots of info you can use broom::augment instead. This will return the full data, along with predicted values, residuals, etc.
result = bind_rows(lapply(fit, broom::augment), .id = "sim")
head(result)
# A tibble: 6 × 9
# sim y x .fitted .resid .hat .sigma .cooksd .std.resid
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 14.4 3.15 19.2 -4.78 0.141 15.1 0.0101 -0.350
# 2 1 39.4 3.04 24.6 14.8 0.0612 14.7 0.0353 1.04
# 3 1 20.4 3.05 24.3 -3.89 0.0633 15.1 0.00252 -0.273
# 4 1 44.2 3.01 26.2 18.0 0.0525 14.5 0.0437 1.26
# 5 1 47.0 2.93 30.4 16.7 0.0603 14.6 0.0438 1.17
# 6 1 2.28 3.22 15.6 -13.3 0.234 14.7 0.164 -1.04
You might also like bind_rows(lapply(fit, broom::tidy), .id = "sim") which will give you one row per coefficient and bind_rows(lapply(fit, broom::glance), .id = "sim") which will give you one row per model.
I am trying to update my function using the new version of dplyr.
First, I had this function (old version):
slope.k <- function(data, Treatment, Replicate, Day, Ln.AFDMrem){
fitted_models <- data %>% group_by(Treatment, Replicate) %>%
do(model = lm(Ln.AFDMrem ~ Day, data = .))
broom::tidy(fitted_models,model) %>% print(n = Inf)
}
However, the do() function was superseded. Now, I am trying to update with this (new) version:
slope.k <- function(data, Treatment, Replicate, Day, Ln.AFDMrem){
mod_t <- data %>% nest_by(Treatment, Replicate) %>%
mutate(model = list(lm(Ln.AFDMrem ~ Day, data = data))) %>%
summarise(tidy_out = list(tidy(model)))
unnest(select(mod_t, Treatment, tidy_out)) %>% print(n = Inf)
}
However, it doesn't work properly, because I have the following warnings:
Warning messages:
1: `cols` is now required when using unnest().
Please use `cols = c(tidy_out)`
2: `...` is not empty.
We detected these problematic arguments:
* `needs_dots`
These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
Thanks in advance!!!
The issue is the use of select() inside unnest(). It can be fixed by replacing the select() with c():
library(dplyr)
library(broom)
library(tidyr)
mtcars %>%
nest_by(carb, gear) %>%
mutate(model = list(lm(mpg ~ disp + drat, data = data))) %>%
summarise(tidy_out = list(tidy(model)), .groups = 'drop') %>%
unnest(c(tidy_out))
-output
# A tibble: 33 x 7
# carb gear term estimate std.error statistic p.value
# <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 3 (Intercept) -8.50 NaN NaN NaN
# 2 1 3 disp 0.0312 NaN NaN NaN
# 3 1 3 drat 7.10 NaN NaN NaN
# 4 1 4 (Intercept) -70.5 302. -0.234 0.854
# 5 1 4 disp -0.0445 0.587 -0.0757 0.952
# 6 1 4 drat 25.5 62.4 0.408 0.753
# 7 2 3 (Intercept) -3.72 8.57 -0.434 0.739
# 8 2 3 disp 0.0437 0.0123 3.54 0.175
# 9 2 3 drat 1.90 2.88 0.661 0.628
#10 2 4 (Intercept) -10.0 226. -0.0443 0.972
# … with 23 more rows
Also, after the mutate step, we can use unnest directly on the 'tidy_out' column.
If we wrap this in a function, assuming that the column names are passed as unquoted arguments:
slope.k <- function(data, Treatment, Replicate, Day, Ln.AFDMrem){
ln_col <- rlang::as_string(ensym(Ln.AFDMrem))
day_col <- rlang::as_string(ensym(Day))
data %>%
nest_by({{Treatment}}, {{Replicate}}) %>%
mutate(model = list(lm(reformulate(day_col, ln_col), data = data))) %>%
summarise(tidy_out = list(tidy(model)), .groups = 'drop') %>%
unnest(tidy_out)
}
slope.k(mtcars, carb, gear, disp, mpg)
# A tibble: 22 x 7
carb gear term estimate std.error statistic p.value
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 3 (Intercept) 22.0 5.35 4.12 0.152
2 1 3 disp -0.00841 0.0255 -0.329 0.797
3 1 4 (Intercept) 52.6 8.32 6.32 0.0242
4 1 4 disp -0.279 0.0975 -2.86 0.104
5 2 3 (Intercept) 1.25 3.49 0.357 0.755
6 2 3 disp 0.0460 0.0100 4.59 0.0443
7 2 4 (Intercept) 36.6 6.57 5.57 0.0308
8 2 4 disp -0.0978 0.0529 -1.85 0.206
9 2 5 (Intercept) 47.0 NaN NaN NaN
10 2 5 disp -0.175 NaN NaN NaN
# … with 12 more rows
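The function above builds its model formula from strings with reformulate(). In isolation, and with the column names hard-coded for illustration, that step looks like this:

```r
# reformulate() takes the right-hand-side terms first, then the response.
f <- reformulate(termlabels = "disp", response = "mpg")

# The result is an ordinary formula and can be passed straight to lm().
fit <- lm(f, data = mtcars)
```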
I am extracting the regression results for two different groups, as shown in the example below. In the temp data.frame I get the estimate, std.error, statistic and p-value. However, I don't get the confidence intervals. Is there a simple way to extract them as well?
df <- tibble(
a = rnorm(1000),
b = rnorm(1000),
c = rnorm(1000),
d = rnorm(1000),
group = rbinom(n=1000, size=1, prob=0.5)
)
df$group = as.factor(df$group)
temp <- df %>%
group_by(group) %>%
do(model1 = tidy(lm(a ~ b + c + d, data = .))) %>%
gather(model_name, model, -group) %>%
unnest()
You are calling tidy on an lm object. If you check the help page, there is an option to include the confidence interval, conf.int=TRUE:
temp <- df %>%
group_by(group) %>%
do(model1 = tidy(lm(a ~ b + c + d, data = . ), conf.int=TRUE)) %>%
gather(model_name, model, -group) %>%
unnest()
# A tibble: 8 x 9
group model_name term estimate std.error statistic p.value conf.low conf.high
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 model1 (Int… 0.0616 0.0423 1.46 0.146 -0.0215 0.145
2 0 model1 b 0.00178 0.0421 0.0424 0.966 -0.0808 0.0844
3 0 model1 c -0.00339 0.0431 -0.0787 0.937 -0.0881 0.0813
4 0 model1 d -0.0537 0.0445 -1.21 0.228 -0.141 0.0337
5 1 model1 (Int… -0.0185 0.0454 -0.408 0.683 -0.108 0.0707
6 1 model1 b 0.00128 0.0435 0.0295 0.976 -0.0842 0.0868
7 1 model1 c -0.0972 0.0430 -2.26 0.0244 -0.182 -0.0126
8 1 model1 d 0.0734 0.0457 1.60 0.109 -0.0165 0.163
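The interval width is controlled by the conf.level argument of tidy(), which defaults to 0.95. A small sketch on a single model (mpg ~ wt on mtcars is chosen only for illustration):

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)

# Default 95% intervals, returned in the conf.low and conf.high columns.
ci95 <- tidy(fit, conf.int = TRUE)

# Narrower 90% intervals.
ci90 <- tidy(fit, conf.int = TRUE, conf.level = 0.90)
```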
If your version of dplyr is 1.0.0 or higher, you can use:
df %>%
group_by(group) %>%
summarise(tidy(lm(a ~ b + c + d), conf.int = TRUE), .groups = "drop")
#> # A tibble: 8 x 8
#> group term estimate std.error statistic p.value conf.low conf.high
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 (Intercept) 0.0734 0.0468 1.57 0.117 -0.0185 0.165
#> 2 0 b -0.101 0.0461 -2.19 0.0292 -0.191 -0.0102
#> 3 0 c 0.0337 0.0464 0.726 0.468 -0.0575 0.125
#> 4 0 d -0.101 0.0454 -2.23 0.0265 -0.190 -0.0118
#> 5 1 (Intercept) -0.0559 0.0468 -1.20 0.232 -0.148 0.0360
#> 6 1 b -0.0701 0.0474 -1.48 0.140 -0.163 0.0230
#> 7 1 c 0.0319 0.0477 0.668 0.504 -0.0619 0.126
#> 8 1 d -0.0728 0.0466 -1.56 0.119 -0.164 0.0188
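In dplyr 1.1.0 and later, returning several rows per group from summarise() gives a deprecation warning, and reframe() is the intended replacement. A sketch assuming dplyr >= 1.1.0, with a smaller made-up data set and the group's data passed explicitly through pick():

```r
library(dplyr)
library(broom)

set.seed(1)
df <- tibble(
  a = rnorm(200),
  b = rnorm(200),
  group = factor(rbinom(200, size = 1, prob = 0.5))
)

# reframe() allows any number of rows per group and returns an ungrouped tibble.
res <- df %>%
  group_by(group) %>%
  reframe(tidy(lm(a ~ b, data = pick(everything())), conf.int = TRUE))
```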
Is it possible to change the contrasts of interaction terms which have been specified in an lm using the colon : notation?
In the example below, the reference category defaults to the last of the six terms generated by gear:vs (i.e., gear5:vs1). I'd instead like it to use the first of the six as the reference (i.e., gear3:vs0).
mtcars.1 <- mtcars %>%
mutate(gear = as.factor(gear)) %>%
mutate(vs = as.factor(vs))
lm(data=mtcars.1, mpg ~ gear:vs) %>%
tidy
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.4 4.13 7.36 0.0000000824
#> 2 gear3:vs0 -15.4 4.30 -3.57 0.00143
#> 3 gear4:vs0 -9.40 5.06 -1.86 0.0747
#> 4 gear5:vs0 -11.3 4.62 -2.44 0.0218
#> 5 gear3:vs1 -10.1 4.77 -2.11 0.0447
#> 6 gear4:vs1 -5.16 4.33 -1.19 0.245
Specifying contrasts for gear and vs separately doesn't seem to have an effect:
lm(data=mtcars.1, mpg ~ gear:vs,
contrasts = list(gear = contr.treatment(n=3,base=3),
vs = contr.treatment(n=2,base=2))) %>%
tidy
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.4 4.13 7.36 0.0000000824
#> 2 gear3:vs0 -15.4 4.30 -3.57 0.00143
#> 3 gear4:vs0 -9.40 5.06 -1.86 0.0747
#> 4 gear5:vs0 -11.3 4.62 -2.44 0.0218
#> 5 gear3:vs1 -10.1 4.77 -2.11 0.0447
#> 6 gear4:vs1 -5.16 4.33 -1.19 0.245
And I'm not sure how to specify a contrast for gear:vs directly:
lm(data=mtcars.1, mpg ~ gear:vs,
contrasts = list("gear:vs" = contr.treatment(n=6,base=6))) %>%
tidy
#> Warning in model.matrix.default(mt, mf, contrasts): variable 'gear:vs' is
#> absent, its contrast will be ignored
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.4 4.13 7.36 0.0000000824
#> 2 gear3:vs0 -15.4 4.30 -3.57 0.00143
#> 3 gear4:vs0 -9.40 5.06 -1.86 0.0747
#> 4 gear5:vs0 -11.3 4.62 -2.44 0.0218
#> 5 gear3:vs1 -10.1 4.77 -2.11 0.0447
#> 6 gear4:vs1 -5.16 4.33 -1.19 0.245
Created on 2019-01-21 by the reprex package (v0.2.1)
One way around this is to pre-calculate the interaction term before regression.
To demonstrate, we can create a factor column GV in mtcars with the same levels as observed in your lm output. It generates the same values:
mtcars %>%
mutate(GV = interaction(factor(gear), factor(vs)),
GV = factor(GV, levels = c("5.1", "3.0", "4.0", "5.0", "3.1", "4.1"))) %>%
lm(mpg ~ GV, .) %>%
tidy()
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 30.4 4.13 7.36 0.0000000824
2 GV3.0 -15.4 4.30 -3.57 0.00143
3 GV4.0 -9.4 5.06 -1.86 0.0747
4 GV5.0 -11.3 4.62 -2.44 0.0218
5 GV3.1 -10.1 4.77 -2.11 0.0447
6 GV4.1 -5.16 4.33 -1.19 0.245
Now we omit the second mutate term, so the levels are 3.0, 4.0, 5.0, 3.1, 4.1, 5.1.
mtcars %>%
mutate(GV = interaction(factor(gear), factor(vs))) %>%
lm(mpg ~ GV, .) %>%
tidy()
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 15.1 1.19 12.6 1.38e-12
2 GV4.0 5.95 3.16 1.88 7.07e- 2
3 GV5.0 4.08 2.39 1.71 9.96e- 2
4 GV3.1 5.28 2.67 1.98 5.83e- 2
5 GV4.1 10.2 1.77 5.76 4.61e- 6
6 GV5.1 15.4 4.30 3.57 1.43e- 3
Use interaction(factor(gear), factor(vs), lex.order = TRUE) to get the levels 3.0, 3.1, 4.0, 4.1, 5.0, 5.1.
mtcars %>%
mutate(GV = interaction(factor(gear), factor(vs), lex.order = TRUE)) %>%
lm(mpg ~ GV, .) %>%
tidy()
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 15.0 1.19 12.6 1.38e-12
2 GV3.1 5.28 2.67 1.98 5.83e- 2
3 GV4.0 5.95 3.16 1.88 7.07e- 2
4 GV4.1 10.2 1.77 5.76 4.61e- 6
5 GV5.0 4.07 2.39 1.71 9.96e- 2
6 GV5.1 15.3 4.30 3.57 1.43e- 3
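To pick a single reference cell without spelling out all six levels, relevel() can be applied to the pre-computed interaction factor. In this sketch the reference "4.1" is an arbitrary choice for illustration; any existing level works the same way.

```r
# Build the combined factor, then move one level to the front as the reference.
d <- transform(mtcars, GV = interaction(factor(gear), factor(vs)))
d$GV <- relevel(d$GV, ref = "4.1")

fit <- lm(mpg ~ GV, data = d)

# With treatment contrasts, the intercept is the mean of the reference cell.
ref_mean <- mean(mtcars$mpg[mtcars$gear == 4 & mtcars$vs == 1])
```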