change contrasts of interaction term specified with colon in lm()

change contrasts of interaction term specified with colon in lm() - r

Is it possible to change the contrasts of interaction terms which have been specified in an lm using the colon : notation?
In the example below, the reference category defaults to the last of the six terms generated by gear:vs (i.e., gear5:vs1). I'd instead like it to use the first of the six as the reference (i.e., gear3:vs0).
mtcars.1 <- mtcars %>%
mutate(gear = as.factor(gear)) %>%
mutate(vs = as.factor(vs))
lm(data=mtcars.1, mpg ~ gear:vs) %>%
tidy
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.4 4.13 7.36 0.0000000824
#> 2 gear3:vs0 -15.4 4.30 -3.57 0.00143
#> 3 gear4:vs0 -9.40 5.06 -1.86 0.0747
#> 4 gear5:vs0 -11.3 4.62 -2.44 0.0218
#> 5 gear3:vs1 -10.1 4.77 -2.11 0.0447
#> 6 gear4:vs1 -5.16 4.33 -1.19 0.245
Specifying contrasts for gear and vs separately doesn't seem to have an effect:
lm(data=mtcars.1, mpg ~ gear:vs,
contrasts = list(gear = contr.treatment(n=3,base=3),
vs = contr.treatment(n=2,base=2))) %>%
tidy
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.4 4.13 7.36 0.0000000824
#> 2 gear3:vs0 -15.4 4.30 -3.57 0.00143
#> 3 gear4:vs0 -9.40 5.06 -1.86 0.0747
#> 4 gear5:vs0 -11.3 4.62 -2.44 0.0218
#> 5 gear3:vs1 -10.1 4.77 -2.11 0.0447
#> 6 gear4:vs1 -5.16 4.33 -1.19 0.245
And I'm not sure how to specify a contrast for gear:vs directly:
lm(data=mtcars.1, mpg ~ gear:vs,
contrasts = list("gear:vs" = contr.treatment(n=6,base=6))) %>%
tidy
#> Warning in model.matrix.default(mt, mf, contrasts): variable 'gear:vs' is
#> absent, its contrast will be ignored
#> # A tibble: 6 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.4 4.13 7.36 0.0000000824
#> 2 gear3:vs0 -15.4 4.30 -3.57 0.00143
#> 3 gear4:vs0 -9.40 5.06 -1.86 0.0747
#> 4 gear5:vs0 -11.3 4.62 -2.44 0.0218
#> 5 gear3:vs1 -10.1 4.77 -2.11 0.0447
#> 6 gear4:vs1 -5.16 4.33 -1.19 0.245
Created on 2019-01-21 by the reprex package (v0.2.1)

One way around this is to pre-calculate the interaction term before regression.
To demonstrate, we can create a factor column GV in mtcars with the same levels as observed in your lm output. It generates the same values:
mtcars %>%
mutate(GV = interaction(factor(gear), factor(vs)),
GV = factor(GV, levels = c("5.1", "3.0", "4.0", "5.0", "3.1", "4.1"))) %>%
lm(mpg ~ GV, .) %>%
tidy()
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 30.4 4.13 7.36 0.0000000824
2 GV3.0 -15.4 4.30 -3.57 0.00143
3 GV4.0 -9.4 5.06 -1.86 0.0747
4 GV5.0 -11.3 4.62 -2.44 0.0218
5 GV3.1 -10.1 4.77 -2.11 0.0447
6 GV4.1 -5.16 4.33 -1.19 0.245
Now we omit the second mutate term, so the levels are 3.0, 4.0, 5.0, 3.1, 4.1, 5.1.
mtcars %>%
mutate(GV = interaction(factor(gear), factor(vs))) %>%
lm(mpg ~ GV, .) %>%
tidy()
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 15.1 1.19 12.6 1.38e-12
2 GV4.0 5.95 3.16 1.88 7.07e- 2
3 GV5.0 4.08 2.39 1.71 9.96e- 2
4 GV3.1 5.28 2.67 1.98 5.83e- 2
5 GV4.1 10.2 1.77 5.76 4.61e- 6
6 GV5.1 15.4 4.30 3.57 1.43e- 3
Use interaction(factor(gear), factor(vs), lex.order = TRUE) to get the levels 3.0, 3.1, 4.0, 4.1, 5.0, 5.1.
mtcars %>%
mutate(GV = interaction(factor(gear), factor(vs), lex.order = TRUE)) %>%
lm(mpg ~ GV, .) %>%
tidy()
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 15.0 1.19 12.6 1.38e-12
2 GV3.1 5.28 2.67 1.98 5.83e- 2
3 GV4.0 5.95 3.16 1.88 7.07e- 2
4 GV4.1 10.2 1.77 5.76 4.61e- 6
5 GV5.0 4.07 2.39 1.71 9.96e- 2
6 GV5.1 15.3 4.30 3.57 1.43e- 3

Related

how to add key variables to `dplyr::group_map()`?

I have the following, but want to add the group_by() key Species to the resulting tibble:
MWE
iris %>%
group_by(Species) %>%
group_map(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
bind_rows()
Output
# How do I add the grouping key `Species` to this?
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.64 0.310 8.51 3.74e-11
2 Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 (Intercept) 3.54 0.563 6.29 9.07e- 8
4 Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 (Intercept) 3.91 0.757 5.16 4.66e- 6
6 Sepal.Width 0.902 0.253 3.56 8.43e- 4

You can use group_modify() instead of group_map().
library(purrr)
library(dplyr)
iris %>%
group_by(Species) %>%
group_modify(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data = .x)))
# A tibble: 6 x 6
# Groups: Species [3]
Species term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4

We may do this in summarise, return a list column and unnest the output
library(dplyr)
library(tidyr)
iris %>%
group_by(Species) %>%
summarise(out = list(broom::tidy(lm(Sepal.Length ~ Sepal.Width,
data = cur_data())))) %>%
unnest(out)
-output
# A tibble: 6 x 6
Species term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4
In group_map, according to documentation, the .y is the key, which we can add as a column
iris %>%
group_by(Species) %>%
group_map(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data = .x)) %>%
mutate(Species = .y$Species, .before = 1)) %>%
bind_rows()
-output
# A tibble: 6 x 6
Species term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4

Using split + map_df -
library(dplyr)
library(purrr)
iris %>%
split(.$Species) %>%
map_df(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data=.x)),.id = 'Species')
# Species term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
#2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
#3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
#4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
#5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
#6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4

Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

It might not be very clear from the title but what I wish to do is:
I have a dataframe df with, say, 200 columns and the first 80 columns are response variables (y1, y2, y3, ...) and the rest of 120 are predictors (x1, x2, x3, ...).
I wish to compute a linear model for each pair – lm(yi ~ xi, data = df).
Many problems and solutions I have looked through online have a either a fixed response vs many predictors or the other way around, using lapply() and its related functions.
Could anyone who is familiar with it point me to the right step?

use tidyverse
library(tidyverse)
library(broom)
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]
result <- expand_grid(x, y) %>%
rowwise() %>%
mutate(frm = list(reformulate(x, y)),
model = list(lm(frm, data = df)))
result$model <- purrr::set_names(result$model, nm = paste0(result$y, " ~ ", result$x))
result$model[1:2]
#> $`mpg ~ hp`
#>
#> Call:
#> lm(formula = frm, data = df)
#>
#> Coefficients:
#> (Intercept) hp
#> 30.09886 -0.06823
#>
#>
#> $`cyl ~ hp`
#>
#> Call:
#> lm(formula = frm, data = df)
#>
#> Coefficients:
#> (Intercept) hp
#> 3.00680 0.02168
map_df(result$model, tidy)
#> # A tibble: 24 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.1 1.63 18.4 6.64e-18
#> 2 hp -0.0682 0.0101 -6.74 1.79e- 7
#> 3 (Intercept) 3.01 0.425 7.07 7.41e- 8
#> 4 hp 0.0217 0.00264 8.23 3.48e- 9
#> 5 (Intercept) 21.0 32.6 0.644 5.25e- 1
#> 6 hp 1.43 0.202 7.08 7.14e- 8
#> 7 (Intercept) -7.52 5.48 -1.37 1.80e- 1
#> 8 drat 7.68 1.51 5.10 1.78e- 5
#> 9 (Intercept) 14.6 1.58 9.22 2.93e-10
#> 10 drat -2.34 0.436 -5.37 8.24e- 6
#> # ... with 14 more rows
map_df(result$model, glance)
#> # A tibble: 12 x 12
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.602 0.589 3.86 45.5 1.79e- 7 1 -87.6 181. 186.
#> 2 0.693 0.683 1.01 67.7 3.48e- 9 1 -44.6 95.1 99.5
#> 3 0.626 0.613 77.1 50.1 7.14e- 8 1 -183. 373. 377.
#> 4 0.464 0.446 4.49 26.0 1.78e- 5 1 -92.4 191. 195.
#> 5 0.490 0.473 1.30 28.8 8.24e- 6 1 -52.7 111. 116.
#> 6 0.504 0.488 88.7 30.5 5.28e- 6 1 -188. 382. 386.
#> 7 0.753 0.745 3.05 91.4 1.29e-10 1 -80.0 166. 170.
#> 8 0.612 0.599 1.13 47.4 1.22e- 7 1 -48.3 103. 107.
#> 9 0.789 0.781 57.9 112. 1.22e-11 1 -174. 355. 359.
#> 10 0.175 0.148 5.56 6.38 1.71e- 2 1 -99.3 205. 209.
#> 11 0.350 0.328 1.46 16.1 3.66e- 4 1 -56.6 119. 124.
#> 12 0.188 0.161 114. 6.95 1.31e- 2 1 -196. 398. 402.
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
Created on 2020-12-11 by the reprex package (v0.3.0)

Get summary of the model using purrr::map within dplyr piping

Using mtcars data, I am testing map() to build some lm() models:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
nest()%>%
mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x)))
#> # A tibble: 3 x 3
#> cyl data fit
#> <dbl> <list> <list>
#> 1 6 <tibble [7 x 10]> <S3: lm>
#> 2 4 <tibble [11 x 10]> <S3: lm>
#> 3 8 <tibble [14 x 10]> <S3: lm>
The output shows that I have a new column, fit.
Now I wish to see the summary of each lm
When I try:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
nest()%>%
mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
map(fit,summary)
#> Error in as_mapper(.f, ...): object 'fit' not found
It gives the error:
Error in as_mapper(.f, ...) : object 'fit' not found
If I wish to calculate R2 or aic then I can using the following code without any problem:
library(tidyverse)
library(modelr)
mtcars %>%
group_by(cyl) %>%
nest()%>%
mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
mutate(r2 = map_dbl(fit, ~rsquare(., data = mtcars)),
aic = map_dbl(fit, ~AIC(.))) %>%
arrange(aic)
#> # A tibble: 3 x 5
#> cyl data fit r2 aic
#> <dbl> <list> <list> <dbl> <dbl>
#> 1 6 <tibble [7 x 10]> <S3: lm> -8.96 -Inf
#> 2 4 <tibble [11 x 10]> <S3: lm> -26.4 56.4
#> 3 8 <tibble [14 x 10]> <S3: lm> -1.000 67.3
Created on 2019-06-18 by the reprex package (v0.3.0)
What am I missing?

As IceCreamToucan's comment laid out, purrr::map does not look into the data which has been made within your piping.
If you use it with dplyr::mutate then it has access to fit which you have created in the previous piping.
Another option would be explicitly referring to fit which you can see below, as my second suggestion.
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
nest()%>%
mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
mutate(fit_sum = map(fit,summary))
#> # A tibble: 3 x 4
#> cyl data fit fit_sum
#> <dbl> <list> <list> <list>
#> 1 6 <tibble [7 x 10]> <lm> <smmry.lm>
#> 2 4 <tibble [11 x 10]> <lm> <smmry.lm>
#> 3 8 <tibble [14 x 10]> <lm> <smmry.lm>
mtcars %>%
group_by(cyl) %>%
nest()%>%
mutate(fit = map(.x=data,~lm(mpg ~ ., data = .x))) %>%
{map(.$fit, summary)} #or using pull: `pull(fit) %>% map(summary)`
#> [[1]]
#>
#> Call:
#> lm(formula = mpg ~ ., data = .x)
#>
#> Residuals:
#> ALL 7 residuals are 0: no residual degrees of freedom!
#>
#> Coefficients: (3 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 32.78649 NA NA NA
#> disp 0.07456 NA NA NA
#> hp -0.04252 NA NA NA
#> drat 1.52367 NA NA NA
#> wt 5.12418 NA NA NA
#> qsec -2.33333 NA NA NA
#> vs -1.75289 NA NA NA
#> am NA NA NA NA
#> gear NA NA NA NA
#> carb NA NA NA NA
#>
#> Residual standard error: NaN on 0 degrees of freedom
#> Multiple R-squared: 1, Adjusted R-squared: NaN
#> F-statistic: NaN on 6 and 0 DF, p-value: NA
####truncated the results for the sake of space####
Created on 2019-06-17 by the reprex package (v0.3.0)

From the latest release of dplyr, tidyverse seems to be encouraging using group_modify functions instead of using purrr + nested dataframes.
In that workflow, here is how you can get both model summaries and estimates in the same dataframe via broom package:
# setup
set.seed(123)
library(tidyverse)
options(tibble.width = Inf)
# joining dataframes with regression estimates and model summaries
dplyr::full_join(
# to get a tidy dataframe of regression estimates
x = mtcars %>%
group_by(cyl) %>%
group_modify(.f = ~ broom::tidy(lm(mpg ~ ., data = .x), conf.int = TRUE)),
# to get a tidy dataframe of model summaries
y = mtcars %>%
group_by(cyl) %>%
group_modify(.f = ~ broom::glance(lm(mpg ~ ., data = .x))),
by = "cyl"
) %>%
dplyr::ungroup(x = .)
#> Warning in qt(a, object$df.residual): NaNs produced
#> # A tibble: 25 x 20
#> cyl term estimate std.error statistic.x p.value.x conf.low
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 (Intercept) 60.9 180. 0.338 0.793 -2229.
#> 2 4 disp -0.345 0.469 -0.735 0.596 -6.31
#> 3 4 hp -0.0332 0.364 -0.0915 0.942 -4.65
#> 4 4 drat -4.19 46.4 -0.0903 0.943 -594.
#> 5 4 wt 4.48 29.7 0.151 0.905 -373.
#> 6 4 qsec -0.106 7.82 -0.0136 0.991 -99.4
#> 7 4 vs -3.64 34.0 -0.107 0.932 -435.
#> 8 4 am -6.33 45.2 -0.140 0.912 -581.
#> 9 4 gear 4.07 29.1 0.140 0.912 -366.
#> 10 4 carb 3.22 28.2 0.114 0.928 -355.
#> conf.high r.squared adj.r.squared sigma statistic.y p.value.y df
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2351. 0.928 0.276 3.84 1.42 0.576 9
#> 2 5.62 0.928 0.276 3.84 1.42 0.576 9
#> 3 4.59 0.928 0.276 3.84 1.42 0.576 9
#> 4 586. 0.928 0.276 3.84 1.42 0.576 9
#> 5 382. 0.928 0.276 3.84 1.42 0.576 9
#> 6 99.2 0.928 0.276 3.84 1.42 0.576 9
#> 7 428. 0.928 0.276 3.84 1.42 0.576 9
#> 8 568. 0.928 0.276 3.84 1.42 0.576 9
#> 9 374. 0.928 0.276 3.84 1.42 0.576 9
#> 10 362. 0.928 0.276 3.84 1.42 0.576 9
#> logLik AIC BIC deviance df.residual nobs
#> <dbl> <dbl> <dbl> <dbl> <int> <int>
#> 1 -17.2 56.4 60.8 14.7 1 11
#> 2 -17.2 56.4 60.8 14.7 1 11
#> 3 -17.2 56.4 60.8 14.7 1 11
#> 4 -17.2 56.4 60.8 14.7 1 11
#> 5 -17.2 56.4 60.8 14.7 1 11
#> 6 -17.2 56.4 60.8 14.7 1 11
#> 7 -17.2 56.4 60.8 14.7 1 11
#> 8 -17.2 56.4 60.8 14.7 1 11
#> 9 -17.2 56.4 60.8 14.7 1 11
#> 10 -17.2 56.4 60.8 14.7 1 11
#> # ... with 15 more rows
Created on 2019-06-17 by the reprex package (v0.3.0)

Calculate all possible interactions in model_matrix

I'm simulating data with a fluctuating number of variables. As part of the situation, I am needing to calculate a model matrix with all possible combinations. See the following reprex for an example. I am able to get all two-interactions by specifying the formula as ~ .*.. However, this particular dataset has 3 variables (ndim <- 3). I can get all two- and three-way interactions by specifying the formula as ~ .^3. The issue is that there may be 4+ variables that I need to calculate, so I would like to be able to generalize this. I have tried specifying the formula as ~ .^ndim, but this throws an error.
Is there a way define the power in the formula with a variable?
library(tidyverse)
library(mvtnorm)
library(modelr)
ndim <- 3
data <- rmvnorm(100, mean = rep(0, ndim)) %>%
as_tibble(.name_repair = ~ paste0("dim_", seq_len(ndim)))
model_matrix(data, ~ .*.)
#> # A tibble: 100 x 7
#> `(Intercept)` dim_1 dim_2 dim_3 `dim_1:dim_2` `dim_1:dim_3`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -0.775 0.214 0.111 -0.166 -0.0857
#> 2 1 1.25 -0.0636 1.40 -0.0794 1.75
#> 3 1 1.07 -0.361 0.976 -0.384 1.04
#> 4 1 2.08 0.381 0.593 0.793 1.24
#> 5 1 -0.197 0.382 -0.257 -0.0753 0.0506
#> 6 1 0.266 -1.82 0.00411 -0.485 0.00109
#> 7 1 3.09 2.57 -0.612 7.96 -1.89
#> 8 1 2.03 0.247 0.112 0.501 0.226
#> 9 1 -0.397 0.204 1.55 -0.0810 -0.614
#> 10 1 0.597 0.335 0.533 0.200 0.319
#> # … with 90 more rows, and 1 more variable: `dim_2:dim_3` <dbl>
model_matrix(data, ~ .^3)
#> # A tibble: 100 x 8
#> `(Intercept)` dim_1 dim_2 dim_3 `dim_1:dim_2` `dim_1:dim_3`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 -0.775 0.214 0.111 -0.166 -0.0857
#> 2 1 1.25 -0.0636 1.40 -0.0794 1.75
#> 3 1 1.07 -0.361 0.976 -0.384 1.04
#> 4 1 2.08 0.381 0.593 0.793 1.24
#> 5 1 -0.197 0.382 -0.257 -0.0753 0.0506
#> 6 1 0.266 -1.82 0.00411 -0.485 0.00109
#> 7 1 3.09 2.57 -0.612 7.96 -1.89
#> 8 1 2.03 0.247 0.112 0.501 0.226
#> 9 1 -0.397 0.204 1.55 -0.0810 -0.614
#> 10 1 0.597 0.335 0.533 0.200 0.319
#> # … with 90 more rows, and 2 more variables: `dim_2:dim_3` <dbl>,
#> # `dim_1:dim_2:dim_3` <dbl>
model_matrix(data, ~.^ndim)
#> Error in terms.formula(object, data = data): invalid power in formula
Created on 2019-02-15 by the reprex package (v0.2.1)

You can use use as.formula with paste in model_matrix:
model_matrix(data, as.formula(paste0("~ .^", ndim)))

replacing `.x` with column name after running `map::purrr()` function

I run lm() for every column of a dataset with one of the column as the dependent variable, using purrr:map() function.
The results are almost perfect except for this - I want to replace .x in the results with the variable that i run lm() for.
The post R purrr map show column names in output is related but I want to avoid creating a function.
Below, are the codes using the mtcars dataset. I know, for example that .x for the first output refers to $mpg. I am not sure if setNames() will work.
library(tidyverse)
library(broom)
mod3 <- map(mtcars, ~ lm(mpg ~ .x, data = mtcars)) %>%
map(~tidy(.x))
#> Warning in summary.lm(x): essentially perfect fit: summary may be
#> unreliable
mod3
#> $mpg
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -5.02e-15 9.94e-16 -5.06e 0 0.0000198
#> 2 .x 1.00e+ 0 4.74e-17 2.11e16 0
#>
#> $cyl
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 37.9 2.07 18.3 8.37e-18
#> 2 .x -2.88 0.322 -8.92 6.11e-10
#>
#> $disp
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 29.6 1.23 24.1 3.58e-21
#> 2 .x -0.0412 0.00471 -8.75 9.38e-10
#>
#> $hp
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 30.1 1.63 18.4 6.64e-18
#> 2 .x -0.0682 0.0101 -6.74 1.79e- 7
#>
#> $drat
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -7.52 5.48 -1.37 0.180
#> 2 .x 7.68 1.51 5.10 0.0000178
#>
#> $wt
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 37.3 1.88 19.9 8.24e-19
#> 2 .x -5.34 0.559 -9.56 1.29e-10
#>
#> $qsec
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -5.11 10.0 -0.510 0.614
#> 2 .x 1.41 0.559 2.53 0.0171
#>
#> $vs
#> # A tibble: 2 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 16.6 1.08 15.4 8.85e-16
#> 2 .x 7.94 1.63 4.86 3.42e- 5

Here is one way to do it
library(tidyverse)
library(broom)
names(mtcars)[-1] %>%
set_names() %>%
map(~ lm(as.formula(paste0('mpg ~ ', .x)), data = mtcars)) %>%
map_dfr(., broom::tidy, .id = "variable")
#> # A tibble: 20 x 6
#> variable term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 cyl (Intercept) 37.9 2.07 18.3 8.37e-18
#> 2 cyl cyl -2.88 0.322 -8.92 6.11e-10
#> 3 disp (Intercept) 29.6 1.23 24.1 3.58e-21
#> 4 disp disp -0.0412 0.00471 -8.75 9.38e-10
#> 5 hp (Intercept) 30.1 1.63 18.4 6.64e-18
#> 6 hp hp -0.0682 0.0101 -6.74 1.79e- 7
#> 7 drat (Intercept) -7.52 5.48 -1.37 1.80e- 1
#> 8 drat drat 7.68 1.51 5.10 1.78e- 5
#> 9 wt (Intercept) 37.3 1.88 19.9 8.24e-19
#> 10 wt wt -5.34 0.559 -9.56 1.29e-10
#> 11 qsec (Intercept) -5.11 10.0 -0.510 6.14e- 1
#> 12 qsec qsec 1.41 0.559 2.53 1.71e- 2
#> 13 vs (Intercept) 16.6 1.08 15.4 8.85e-16
#> 14 vs vs 7.94 1.63 4.86 3.42e- 5
#> 15 am (Intercept) 17.1 1.12 15.2 1.13e-15
#> 16 am am 7.24 1.76 4.11 2.85e- 4
#> 17 gear (Intercept) 5.62 4.92 1.14 2.62e- 1
#> 18 gear gear 3.92 1.31 3.00 5.40e- 3
#> 19 carb (Intercept) 25.9 1.84 14.1 9.22e-15
#> 20 carb carb -2.06 0.569 -3.62 1.08e- 3
Created on 2019-02-10 by the reprex package (v0.2.1.9000)

Hi you can use purrr::imap() like so:
mod3 <- map(mtcars, ~ lm(mpg ~ .x, data = mtcars)) %>%
map(tidy) %>%
imap( ~ {.x[2, 1] <- .y ; return(.x)} )
imap sends two things to the function/ formula : .x the item and .y which is either the name of the item (name in this case) or the index. I had to wrap everything in {} in this case to get the assignment to work

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

change contrasts of interaction term specified with colon in lm() - r

Related

how to add key variables to `dplyr::group_map()`?

Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

Get summary of the model using purrr::map within dplyr piping

Calculate all possible interactions in model_matrix

replacing `.x` with column name after running `map::purrr()` function

Categories

Resources