R mutate_at on a subset of rows

My question is similar to this post (Applying mutate_at conditionally to specific rows in a dataframe in R), and I could reproduce its result. But when I tried to apply it to my problem, which is wrapping the cell values of selected rows and columns in parentheses, I ran into error messages. Here's a reproducible example.
df <- structure(list(dep = c("cyl", "cyl", "disp", "disp", "drat",
"drat", "hp", "hp", "mpg", "mpg"), name = c("estimate", "t_stat",
"estimate", "t_stat", "estimate", "t_stat", "estimate", "t_stat",
"estimate", "t_stat"), dat1 = c(1.151, 6.686, 102.902, 12.107,
-0.422, -5.237, 37.576, 5.067, -5.057, -8.185), dat2 = c(1.274,
8.423, 106.429, 12.148, -0.394, -5.304, 38.643, 6.172, -4.843,
-10.622), dat3 = c(1.078, 5.191, 103.687, 7.79, -0.194, -2.629,
36.777, 4.842, -4.539, -7.91)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Given the data frame above, I want to put parentheses around the cell values of columns dat1, dat2 and dat3 when name == "t_stat". Here's what I've tried, but it seems that paste0 is not accepted inside case_when in this case.
require(tidyverse)
df %>% mutate_at(vars(matches("dat")),
  funs( case_when(name == 't_stat' ~ paste0("(", ., ")"), TRUE ~ .) ))
Error: must be a character vector, not a double vector
When I use brute force and mutate each column separately, it works, but my actual problem has more than 10 columns, so this is not really practical.
df %>% mutate(dat1 = ifelse(name == "t_stat", paste0("(", dat1, ")"), dat1),
              dat2 = ifelse(name == "t_stat", paste0("(", dat2, ")"), dat2),
              dat3 = ifelse(name == "t_stat", paste0("(", dat3, ")"), dat3))
# A tibble: 10 x 5
dep name dat1 dat2 dat3
<chr> <chr> <chr> <chr> <chr>
1 cyl estimate 1.151 1.274 1.078
2 cyl t_stat (6.686) (8.423) (5.191)
3 disp estimate 102.902 106.429 103.687
4 disp t_stat (12.107) (12.148) (7.79)
5 drat estimate -0.422 -0.394 -0.194
6 drat t_stat (-5.237) (-5.304) (-2.629)
7 hp estimate 37.576 38.643 36.777
8 hp t_stat (5.067) (6.172) (4.842)
9 mpg estimate -5.057 -4.843 -4.539
10 mpg t_stat (-8.185) (-10.622) (-7.91)

case_when is type-strict, meaning every branch must return the same type. Your original columns are numeric, but adding "(" and ")" around the values turns them into character, so the fallback branch has to be converted with as.character() as well.
Also, funs has long been deprecated, and mutate_at is superseded by across (dplyr 1.0.0 and later).
library(dplyr)
df %>%
mutate_at(vars(matches("dat")),
~case_when(name == 't_stat' ~ paste0("(", ., ")"), TRUE ~ as.character(.)))
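Since across is mentioned above, here is a minimal sketch of the same idea written with it. It assumes dplyr 1.0.0 or later and uses a two-row stand-in for the question's data; the name out is only for illustration.

```r
library(dplyr)

# Small stand-in for the question's data (two rows, two dat columns)
df <- tibble(
  name = c("estimate", "t_stat"),
  dat1 = c(1.151, 6.686),
  dat2 = c(1.274, 8.423)
)

# Both branches return character, so the result type is consistent
out <- df %>%
  mutate(across(
    matches("dat"),
    ~ if_else(name == "t_stat", paste0("(", .x, ")"), as.character(.x))
  ))
```

if_else is used here because there are only two cases; case_when would work the same way as long as both branches are character.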

The error message is ... unhelpful.
Your problem is that you're mixing numeric and character data in a column. The dat variables are numeric, so the fallback branch has to be converted to character too.
df %>% mutate_at(vars(matches("dat")),
  ~ case_when(name == 't_stat' ~ paste0("(", ., ")"),
              TRUE ~ as.character(.)))
# A tibble: 10 x 5
dep name dat1 dat2 dat3
<chr> <chr> <chr> <chr> <chr>
1 cyl estimate 1.151 1.274 1.078
2 cyl t_stat (6.686) (8.423) (5.191)
3 disp estimate 102.902 106.429 103.687
4 disp t_stat (12.107) (12.148) (7.79)
5 drat estimate -0.422 -0.394 -0.194
6 drat t_stat (-5.237) (-5.304) (-2.629)
7 hp estimate 37.576 38.643 36.777
8 hp t_stat (5.067) (6.172) (4.842)
9 mpg estimate -5.057 -4.843 -4.539
10 mpg t_stat (-8.185) (-10.622) (-7.91)

Basically, you need to convert dbl to chr first, which is also what the error message is saying: Error: must be a character vector, not a double vector
As Rohan rightly said, case_when is type-strict, meaning it expects every branch to return the same class.
df %>% mutate_at(vars(matches("dat")),
~case_when(name =='t_stat'~ paste0("(",as.character(.x),")"),
TRUE ~ as.character(.x))
)
Output:
# A tibble: 10 x 5
dep name dat1 dat2 dat3
<chr> <chr> <chr> <chr> <chr>
1 cyl estimate 1.151 1.274 1.078
2 cyl t_stat (6.686) (8.423) (5.191)
3 disp estimate 102.902 106.429 103.687
4 disp t_stat (12.107) (12.148) (7.79)
5 drat estimate -0.422 -0.394 -0.194
6 drat t_stat (-5.237) (-5.304) (-2.629)
7 hp estimate 37.576 38.643 36.777
8 hp t_stat (5.067) (6.172) (4.842)
9 mpg estimate -5.057 -4.843 -4.539
10 mpg t_stat (-8.185) (-10.622) (-7.91)

Related

Running single linear regressions across multiple variables, in groups

I'm trying to run a simple single linear regression over a large number of variables, grouped according to another variable. Using the mtcars dataset as an example, I'd like to run a separate linear regression between mpg and each other variable (mpg ~ disp, mpg ~ hp, etc.), grouped by another variable (for example, cyl).
Running lm over each variable independently can easily be done using purrr::map (modified from this great tutorial - https://sebastiansauer.github.io/EDIT-multiple_lm_purrr_EDIT/):
library(dplyr)
library(tidyr)
library(purrr)
library(broom) # provides glance()
mtcars %>%
select(-mpg) %>% #exclude outcome, leave predictors
map(~ lm(mtcars$mpg ~ .x, data = mtcars)) %>%
map_df(glance, .id='variable') %>%
select(variable, r.squared, p.value)
# A tibble: 10 x 3
variable r.squared p.value
<chr> <dbl> <dbl>
1 cyl 0.726 6.11e-10
2 disp 0.718 9.38e-10
3 hp 0.602 1.79e- 7
4 drat 0.464 1.78e- 5
5 wt 0.753 1.29e-10
6 qsec 0.175 1.71e- 2
7 vs 0.441 3.42e- 5
8 am 0.360 2.85e- 4
9 gear 0.231 5.40e- 3
10 carb 0.304 1.08e- 3
And running a linear model over grouped variables is also easy using map:
mtcars %>%
split(.$cyl) %>% #split by grouping variable
map(~ lm(mpg ~ wt, data = .)) %>%
map_df(broom::glance, .id='cyl') %>%
select(cyl, r.squared, p.value)
# A tibble: 3 x 3
cyl r.squared p.value
<chr> <dbl> <dbl>
1 4 0.509 0.0137
2 6 0.465 0.0918
3 8 0.423 0.0118
So I can run by variable, or by group. However, I can't figure out how to combine these two (grouping everything by cyl, then running lm of mpg on each other variable separately). I'd hoped to do something like this:
mtcars %>%
select(-mpg) %>% #exclude outcome, leave predictors
split(.$cyl) %>% # group by grouping variable
map(~ lm(mtcars$mpg ~ .x, data = mtcars)) %>% #run lm across all variables
map_df(glance, .id='cyl') %>%
select(cyl, variable, r.squared, p.value)
and get a result that gives me cyl(group), variable, r.squared, and p.value (a combination of 3 groups * 10 variables = 30 model outputs).
But split() turns the dataframe into a list, which the construction from part 1 [ map(~ lm(mtcars$mpg ~ .x, data = mtcars)) ] can't handle. I have tried to modify it so that it doesn't explicitly refer to the original data structure, but can't figure out a working solution. Any help is greatly appreciated!

If I understand correctly, you can use group_by and group_modify, with a map inside that iterates over the predictors.
If you can isolate your predictor variables in advance, it'll make it easier, as with ivs in this solution.
library(tidyverse)
ivs <- colnames(mtcars)[3:ncol(mtcars)]
names(ivs) <- ivs
mtcars %>%
group_by(cyl) %>%
group_modify(function(data, key) {
map_df(ivs, function(iv) {
frml <- as.formula(paste("mpg", "~", iv))
lm(frml, data = data) %>% broom::glance()
}, .id = "iv")
}) %>%
select(cyl, iv, r.squared, p.value)
# A tibble: 27 × 4
# Groups: cyl [3]
cyl iv r.squared p.value
<dbl> <chr> <dbl> <dbl>
1 4 disp 0.648 0.00278
2 4 hp 0.274 0.0984
3 4 drat 0.180 0.193
4 4 wt 0.509 0.0137
5 4 qsec 0.0557 0.485
6 4 vs 0.00238 0.887
7 4 am 0.287 0.0892
8 4 gear 0.115 0.308
9 4 carb 0.0378 0.567
10 6 disp 0.0106 0.826
11 6 hp 0.0161 0.786
# ...
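An alternative sketch, in case you would rather not build formulas from strings: reshape the predictors to long form and nest by group and predictor. This assumes reasonably recent versions of tidyr and broom; the names iv, x, fit and res are placeholders, not anything from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

res <- mtcars %>%
  # one row per car and predictor; mpg (outcome) and cyl (group) stay as columns
  pivot_longer(-c(mpg, cyl), names_to = "iv", values_to = "x") %>%
  group_by(cyl, iv) %>%
  nest() %>%
  # fit one simple regression per (cyl, predictor) pair and summarise it
  mutate(fit = map(data, ~ glance(lm(mpg ~ x, data = .x)))) %>%
  select(-data) %>%
  unnest(fit) %>%
  ungroup() %>%
  select(cyl, iv, r.squared, p.value)
```

This should give the same 3 groups * 9 predictors = 27 rows as the group_modify version, just in a different row order.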

R round number to different number of digits based on value on multiple columns

I'm trying to round numbers in multiple columns using different thresholds based on the value. Specifically, I want to round to an integer if the absolute value is larger than 1 and to three decimal places if not. I've tried a few different strategies by following answers to similar questions, but none of them seems to work. Here's a reproducible example.
df <- structure(list(dep = c("cyl", "cyl", "disp", "disp", "drat",
"drat", "hp", "hp", "mpg", "mpg"), name = c("estimate", "t_stat",
"estimate", "t_stat", "estimate", "t_stat", "estimate", "t_stat",
"estimate", "t_stat"), dat1 = c(1.15052520023357, 6.68591106097725,
102.901631449292, 12.1072688820387, -0.422439347353398, -5.23657414425551,
37.5762984208224, 5.06741973124599, -5.05739510901596, -8.18496613472796
), dat2 = c(1.27442224382304, 8.42316433209027, 106.428896001266,
12.147509560065, -0.393755429958381, -5.30373672190043, 38.64345279421,
6.17204732384094, -4.84272702226804, -10.6216411092441), dat3 = c(1.07794895749739,
5.1912094236003, 103.687423254053, 7.78976856569243, -0.19357672324514,
-2.62921011406252, 36.7770360009548, 4.84248650357675, -4.53918562415258,
-7.91010248086649)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
According to these criteria, every number in columns dat1 to dat3 should become an integer except for the values in the fifth row. I've tried the following two approaches, but couldn't get it done.
df %>% mutate_if( is.numeric(.) == T & abs(.) > 10, round, 0)
Error in Math.data.frame(.) :
non-numeric variable(s) in data frame: dep, name
In the second approach, everything seems to work, but the fifth row is also rounded to 0 digits.
df %>% mutate_if( ~ is.numeric(.) == T && abs(.) > 1, round, 0)
# A tibble: 10 x 5
dep name dat1 dat2 dat3
<chr> <chr> <dbl> <dbl> <dbl>
1 cyl estimate 1 1 1
2 cyl t_stat 7 8 5
3 disp estimate 103 106 104
4 disp t_stat 12 12 8
5 drat estimate 0 0 0
6 drat t_stat -5 -5 -3
7 hp estimate 38 39 37
8 hp t_stat 5 6 5
9 mpg estimate -5 -5 -5
10 mpg t_stat -8 -11 -8
My real problem involves many columns to mutate, so combining round with mutate_if (or something similar) is strongly preferred. Thanks!

Try the case_when function from the dplyr package for complex condition handling:
library(dplyr)
df %>%
mutate_at(.vars = vars(dat1, dat2, dat3),
.funs = ~ case_when(abs(.x) > 1 ~ round(.x, digits = 0),
TRUE ~ round(.x, digits = 3)))
# A tibble: 10 x 5
dep name dat1 dat2 dat3
<chr> <chr> <dbl> <dbl> <dbl>
1 cyl estimate 1 1 1
2 cyl t_stat 7 8 5
3 disp estimate 103 106 104
4 disp t_stat 12 12 8
5 drat estimate -0.422 -0.394 -0.194
6 drat t_stat -5 -5 -3
7 hp estimate 38 39 37
8 hp t_stat 5 6 5
9 mpg estimate -5 -5 -5
10 mpg t_stat -8 -11 -8
What we do here is apply mutate_at to the three variables dat1 to dat3 (specified in the .vars argument) and pass case_when as a purrr-style lambda function. This rounds each value to an integer (i.e., digits = 0) if its absolute value is larger than 1 and to three decimal places otherwise.
Side note:
While this approach is somewhat more verbose, it lets you flexibly adjust which variables the function is applied to and add more complex conditions. If you are sure that you really only want to apply the function to numeric variables, you can of course use mutate_if combined with an is.numeric predicate, but keep case_when for the condition handling part:
df %>%
mutate_if(.predicate = is.numeric,
.funs = ~ case_when(abs(.x) > 1 ~ round(.x, digits = 0),
TRUE ~ round(.x, digits = 3)))

A correct syntax would be:
df %>%
mutate_if(
is.numeric,
~ ifelse(abs(.x) > 1, round(.x), round(.x, 3))
)
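(The second argument of mutate_if, here is.numeric, is a predicate function that selects whole columns; the per-value condition belongs inside the function applied to those columns.)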
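For newer dplyr versions (1.0.0 or later, which is an assumption about your setup), the same per-value rounding can be sketched with across and where. The two-row data frame and the name out are made up for illustration.

```r
library(dplyr)

# Small stand-in: one value below the threshold, one above
df <- tibble(
  name = c("estimate", "t_stat"),
  dat1 = c(-0.422439347353398, 12.1072688820387)
)

# where(is.numeric) picks the columns; if_else decides per value
out <- df %>%
  mutate(across(
    where(is.numeric),
    ~ if_else(abs(.x) > 1, round(.x), round(.x, 3))
  ))
```

Column selection and the per-value condition are kept separate here, which is exactly what the mutate_if attempts in the question mixed up.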

R Loop Regressions

data=mtcars
data$group = rep(seq(from=1, to=4, by=1), 8)
model1 <- glm(vs ~ mpg + cyl + disp + hp, data = subset(data, group == 1), family = "binomial")
model2 <- glm(vs ~ mpg + cyl + disp + hp, data = subset(data, group == 2), family = "binomial")
model3 <- glm(vs ~ mpg + cyl + disp + hp, data = subset(data, group == 3), family = "binomial")
model4 <- glm(vs ~ mpg + cyl + disp + hp, data = subset(data, group == 4), family = "binomial")
model5 <- glm(am ~ mpg + cyl + disp + hp, data = subset(data, group == 1), family = "binomial")
model6 <- glm(am ~ mpg + cyl + disp + hp, data = subset(data, group == 2), family = "binomial")
model7 <- glm(am ~ mpg + cyl + disp + hp, data = subset(data, group == 3), family = "binomial")
model8 <- glm(am ~ mpg + cyl + disp + hp, data = subset(data, group == 4), family = "binomial")
Say you want to estimate a bunch of stratified models that are identical in every way except the stratified group (models 1-4) and also that you want to repeat this series of models for different outcomes (models 5-8).
That is what I have for the code above. However, is there a more efficient way to run this in terms of it not taking up as many lines of code? For example to specify the covariates, outcomes, and groups, and then loop over them?

You can for instance use data.table to run the model fitting by group, e.g.:
library(data.table)
dt = as.data.table(data)
models = dt[, .(fit_vs = list(glm(vs ~ mpg + cyl + disp + hp, family = "binomial")),
fit_am = list(glm(am ~ mpg + cyl + disp + hp, family = "binomial"))),
by = .(group)]
The result is then:
print(models)
# group fit_vs fit_am
# 1: 2 <glm> <glm>
# 2: 1 <glm> <glm>
# 3: 3 <glm> <glm>
# 4: 4 <glm> <glm>
You can access the fit for vs and group 3 using:
models[group == 3, fit_vs]
# [[1]]
#
# Call: glm(formula = vs ~ mpg + cyl + disp + hp, family = "binomial")
#
# Coefficients:
# (Intercept) mpg cyl disp hp
# 180.970664 -0.384760 -24.366394 -0.008435 -0.010799
#
# Degrees of Freedom: 9 Total (i.e. Null); 5 Residual
# Null Deviance: 13.46
# Residual Deviance: 3.967e-10 AIC: 10

First of all, seq(from=1, to=4, length=T) returns 1, so your code only creates 1 group. I thus modified your code as follows.
data=mtcars
data$group = rep(1:4, each = 8)
We can use the functions to apply glm to each combination as follows.
library(tidyverse)
data2 <- data %>%
gather(Y, Value, vs, am) %>%
group_split(Y, group) %>%
set_names(nm = map_chr(., ~str_c(unique(.x$Y), unique(.x$group), sep = "-"))) %>%
map(~glm(Value ~ mpg + cyl + disp + hp, data = .x, family = "binomial"))
We can access the result by names
data2[["am-1"]]
# Call: glm(formula = Value ~ mpg + cyl + disp + hp, family = "binomial",
# data = .x)
#
# Coefficients:
# (Intercept) mpg cyl disp hp
# 4.9180 -0.5335 17.2521 -0.7975 0.5192
#
# Degrees of Freedom: 7 Total (i.e. Null); 3 Residual
# Null Deviance: 10.59
# Residual Deviance: 2.266e-10 AIC: 10
Alternatively, we can nest the data by outcome and group and store the models in a list column.
data3 <- data %>%
gather(Y, Value, vs, am) %>%
group_by(Y, group) %>%
nest() %>%
mutate(Model = map(data, ~glm(Value ~ mpg + cyl + disp + hp, data = .x, family = "binomial")))
data3
# # A tibble: 8 x 4
# # Groups: group, Y [8]
# group Y data Model
# <int> <chr> <list<df[,10]>> <list>
# 1 1 vs [8 x 10] <glm>
# 2 2 vs [8 x 10] <glm>
# 3 3 vs [8 x 10] <glm>
# 4 4 vs [8 x 10] <glm>
# 5 1 am [8 x 10] <glm>
# 6 2 am [8 x 10] <glm>
# 7 3 am [8 x 10] <glm>
# 8 4 am [8 x 10] <glm>
data3 %>%
filter(group == 1, Y == "am") %>%
pull(Model)
# [[1]]
#
# Call: glm(formula = Value ~ mpg + cyl + disp + hp, family = "binomial",
# data = .x)
#
# Coefficients:
# (Intercept) mpg cyl disp hp
# 4.9180 -0.5335 17.2521 -0.7975 0.5192
#
# Degrees of Freedom: 7 Total (i.e. Null); 3 Residual
# Null Deviance: 10.59
# Residual Deviance: 2.266e-10 AIC: 10
You can extract the information with mutate and map, like below.
data4 <- data3 %>% mutate(Coef = map(Model, coef))
data4 %>%
filter(group == 1, Y == "am") %>%
pull(Coef)
# [[1]]
# (Intercept) mpg cyl disp hp
# 4.9179574 -0.5334823 17.2520829 -0.7974839 0.5191961
Or use the functions from the broom package.
library(broom)
data5 <- data3 %>%
mutate(Info = map(Model, tidy)) %>%
select(-Model, -data) %>%
unnest(cols = "Info")
data5
# # A tibble: 40 x 7
# # Groups: group, Y [8]
# group Y term estimate std.error statistic p.value
# <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 vs (Intercept) 397. 4682905. 0.0000849 1.000
# 2 1 vs mpg -8.95 176775. -0.0000507 1.000
# 3 1 vs cyl -41.9 141996. -0.000295 1.000
# 4 1 vs disp 0.525 1510. 0.000348 1.000
# 5 1 vs hp -0.610 8647. -0.0000705 1.000
# 6 2 vs (Intercept) 126. 2034044. 0.0000619 1.000
# 7 2 vs mpg -0.965 69501. -0.0000139 1.000
# 8 2 vs cyl 25.6 398854. 0.0000642 1.000
# 9 2 vs disp 0.266 3917. 0.0000680 1.000
# 10 2 vs hp -2.29 19162. -0.000120 1.000
# # ... with 30 more rows
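If you prefer to stay in base R, here is a rough sketch of what the question literally asks for: specify the covariates, outcomes and groups up front, then loop over the combinations. The names covariates, outcomes, grid and models are placeholders, and the grouping is the one from the question.

```r
data <- mtcars
data$group <- rep(1:4, times = 8)  # same grouping as in the question

covariates <- c("mpg", "cyl", "disp", "hp")
outcomes <- c("vs", "am")

# One row per outcome/group combination
grid <- expand.grid(outcome = outcomes, group = 1:4, stringsAsFactors = FALSE)

# Fit one model per combination; with only 8 rows per group, glm may warn
# about fitted probabilities of 0 or 1
models <- Map(function(y, g) {
  glm(reformulate(covariates, response = y),
      data = data[data$group == g, ],
      family = "binomial")
}, grid$outcome, grid$group)
names(models) <- paste(grid$outcome, grid$group, sep = "-")
```

A single fit is then available by name, for example models[["am-1"]].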

lapply and for loop to run a function through a list of data.frames in R

I have a list of data.frame and I'd like to run cor.test through each data.frame.
The data.frame has 8 columns, I would like to run cor.test for each of the first 7 columns against the 8th column.
I first set up the lists for storing the data
estimates = list()
pvalues = list()
Then here's the loop combining with lapply
for (i in 1:7){
corr <- lapply(datalist, function(x) {cor.test(x[,i], x[,8], alternative="two-sided", method="spearman", exact=FALSE, continuity=TRUE)})
estimates= corr$estimate
pvalues= corr$p.value
}
It ran without any errors, but estimates shows NULL.
Which part of this went wrong? I used to run a for loop over cor.test or run it with lapply, but never put them together. I wonder if there's a solution to this or an alternative. Thank you.

We can use sapply. Here is an example on mtcars where cor.test is performed for each of the other columns against the eighth column (vs).
lst <- list(mtcars, mtcars)
lapply(lst, function(x) t(sapply(x[-8], function(y) {
val <- cor.test(y, x[[8]], alternative ="two.sided",
method="spearman", exact=FALSE, continuity=TRUE)
c(val$estimate, pval = val$p.value)
})))
#[[1]]
# rho pval
#mpg 0.7065968 6.176953e-06
#cyl -0.8137890 1.520674e-08
#disp -0.7236643 2.906504e-06
#hp -0.7515934 7.247490e-07
#drat 0.4474575 1.021422e-02
#wt -0.5870162 4.163577e-04
#qsec 0.7915715 6.843882e-08
#am 0.1683451 3.566025e-01
#gear 0.2826617 1.168159e-01
#carb -0.6336948 9.977275e-05
#[[2]]
# rho pval
#mpg 0.7065968 6.176953e-06
#cyl -0.8137890 1.520674e-08
#.....
This returns a list of two-column matrices holding the estimate and the p-value, respectively.
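As for why the original loop gives NULL: corr is a list of test results, so corr$estimate looks for a list element called estimate, which does not exist, and the plain assignment also overwrites the result on every iteration. A minimal sketch of the loop with both points fixed, using a made-up datalist built from the first 8 columns of mtcars:

```r
# Hypothetical stand-in for the list of data frames (8 columns each)
datalist <- list(a = mtcars[, 1:8], b = mtcars[, 1:8])

estimates <- list()
pvalues <- list()

for (i in 1:7) {
  corr <- lapply(datalist, function(x) {
    cor.test(x[, i], x[, 8], alternative = "two.sided",
             method = "spearman", exact = FALSE, continuity = TRUE)
  })
  # corr is a list of htest objects, so extract from each element
  # and store under index i instead of overwriting
  estimates[[i]] <- sapply(corr, function(ct) unname(ct$estimate))
  pvalues[[i]] <- sapply(corr, function(ct) ct$p.value)
}
```

estimates[[i]] then holds one rho per data frame for column i against column 8.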

Disclaimer: This answer uses the developer version of manymodelr that I also wrote.
EDIT: You can map it to your list of data frames with Map or lapply for instance:
lst <- list(mtcars, mtcars) # Line copied and pasted from Ronak Shah's answer
Map(function(x) manymodelr::get_var_corr(x, "mpg",get_all = TRUE,
alternative="two.sided",
method="spearman",
continuity=TRUE,exact=FALSE),lst)
For a single data.frame object, we can use get_var_corr:
manymodelr::get_var_corr(mtcars, "mpg",get_all = TRUE,
alternative="two.sided",
method="spearman",
continuity=TRUE,exact=FALSE)
# Comparison_Var Other_Var p.value Correlation
# 1 mpg cyl 4.962301e-13 -0.9108013
# 2 mpg disp 6.731078e-13 -0.9088824
# 3 mpg hp 5.330559e-12 -0.8946646
# 4 mpg drat 5.369227e-05 0.6514555
# 5 mpg wt 1.553261e-11 -0.8864220
# 6 mpg qsec 7.042244e-03 0.4669358
# 7 mpg vs 6.176953e-06 0.7065968
# 8 mpg am 8.139885e-04 0.5620057
# 9 mpg gear 1.325942e-03 0.5427816
# 10 mpg carb 4.385340e-05 -0.6574976

purrr has some convenience functions that could make this operation a little simpler (although it's debatable whether this is actually simpler than the Map/lapply way). Using Ronak's example list lst:
library(purrr)
library(magrittr) # provides extract()
lst <- list(mtcars, mtcars)
map2(map(lst, ~.[-8]), map(lst, 8), ~
map(.x, cor.test, y = .y,
alternative = "two.sided",
method = "spearman",
exact = FALSE,
continuity = TRUE) %>%
map_dfr(extract, c('estimate', 'p.value'), .id = 'var'))
# [[1]]
# # A tibble: 10 x 3
# var estimate p.value
# <chr> <dbl> <dbl>
# 1 mpg 0.707 0.00000618
# 2 cyl -0.814 0.0000000152
# 3 disp -0.724 0.00000291
# 4 hp -0.752 0.000000725
# 5 drat 0.447 0.0102
# 6 wt -0.587 0.000416
# 7 qsec 0.792 0.0000000684
# 8 am 0.168 0.357
# 9 gear 0.283 0.117
# 10 carb -0.634 0.0000998
#
# [[2]]
# # A tibble: 10 x 3
# var estimate p.value
# <chr> <dbl> <dbl>
# 1 mpg 0.707 0.00000618
# 2 cyl -0.814 0.0000000152
# 3 disp -0.724 0.00000291
# 4 hp -0.752 0.000000725
# 5 drat 0.447 0.0102
# 6 wt -0.587 0.000416
# 7 qsec 0.792 0.0000000684
# 8 am 0.168 0.357
# 9 gear 0.283 0.117
# 10 carb -0.634 0.0000998

loop or apply multiple regressions, extract coefficients and p-values into data frame

I have a data frame with 3 dependent (LHS) variables and 4 independent (RHS) variables. I'd like to run a linear regression of each LHS variable on each RHS variable and store the results of each regression as a row in a data frame with the columns: lhs, rhs, Estimate, Std. Error, t value, Pr(>|t|).
For example, using mtcars, I considered a nested loop:
lhs <- c('mpg', 'cyl', 'disp')
rhs <- c('hp', 'drat', 'wt', 'qsec')
reg_count <- 1
for (i in lhs){
for (j in rhs){
model <- lm(i ~ j, data = mtcars)
results[reg_count] <- coef(summary(model))
reg_count <- reg_count + 1
}
}
However, this fails for a number of reasons. Is there a simple way I can do this? Ideally using an apply() function rather than a loop?

Here's how I would do it. I shortened your example a little, but that won't matter:
lhs <- c('mpg', 'cyl', 'disp')
rhs <- c('hp', 'drat')
models = list()
for (i in lhs){
for (j in rhs){
models[[paste(i, "vs", j)]] <- lm(as.formula(paste(i, "~", j)), data = mtcars)
}
}
If you want to use apply, you'll need to start with a matrix. The difference in runtime will be negligible.
# with apply:
coefs_mat = expand.grid(lhs, rhs)
mods = apply(coefs_mat, 1, function(row) {
lm(as.formula(paste(row[1], "~", row[2])), data = mtcars)
})
names(mods) = with(coefs_mat, paste(Var1, "vs", Var2))
Both methods give the same results. Now we can pull the coefficients, etc. with broom::tidy
# get coefs
library(broom)
coefs = lapply(mods, tidy)
# combine
dplyr::bind_rows(coefs, .id = "mod")
# mod term estimate std.error statistic p.value
# 1 mpg vs hp (Intercept) 30.09886054 1.633921e+00 18.4212465 6.642736e-18
# 2 mpg vs hp hp -0.06822828 1.011930e-02 -6.7423885 1.787835e-07
# 3 cyl vs hp (Intercept) 3.00679525 4.254852e-01 7.0667442 7.405351e-08
# 4 cyl vs hp hp 0.02168354 2.635142e-03 8.2286042 3.477861e-09
# 5 disp vs hp (Intercept) 20.99248341 3.260662e+01 0.6438104 5.245902e-01
# 6 disp vs hp hp 1.42977003 2.019414e-01 7.0801224 7.142679e-08
# 7 mpg vs drat (Intercept) -7.52461844 5.476663e+00 -1.3739423 1.796391e-01
# 8 mpg vs drat drat 7.67823260 1.506705e+00 5.0960421 1.776240e-05
We can also pull out model summary stats:
# get summary stats
summ = lapply(mods, glance)
dplyr::bind_rows(summ, .id = "mod")
# mod r.squared adj.r.squared sigma statistic p.value df logLik
# 1 mpg vs hp 0.6024373 0.5891853 3.862962 45.45980 1.787835e-07 2 -87.61931
# 2 cyl vs hp 0.6929688 0.6827344 1.005944 67.70993 3.477861e-09 2 -44.56307
# 3 disp vs hp 0.6255997 0.6131197 77.089503 50.12813 7.142679e-08 2 -183.41236
# 4 mpg vs drat 0.4639952 0.4461283 4.485409 25.96964 1.776240e-05 2 -92.39996
# 5 cyl vs drat 0.4899134 0.4729105 1.296596 28.81354 8.244636e-06 2 -52.68517
# 6 disp vs drat 0.5044038 0.4878839 88.693360 30.53315 5.282022e-06 2 -187.89934
# AIC BIC deviance df.residual
# 1 181.23863 185.63584 447.67431 30
# 2 95.12614 99.52335 30.35771 30
# 3 372.82473 377.22194 178283.74604 30
# 4 190.79993 195.19714 603.56673 30
# 5 111.37033 115.76754 50.43482 30
# 6 381.79868 386.19588 235995.36410 30

You can start with expand.grid to give a nice dataframe of dependent/independent variable pairs. Then add the formulae and models to the data.
pairings <- expand.grid(
lhs = c('mpg', 'cyl', 'disp'),
rhs = c('hp', 'drat', 'wt', 'qsec')
)
pairings[["formula"]] <- lapply(
X = paste(pairings[["lhs"]], "~", pairings[["rhs"]]),
FUN = as.formula
)
pairings[["model"]] <- lapply(
X = pairings[["formula"]],
FUN = lm,
data = mtcars
)
The results:
str(pairings, max.level = 1)
# 'data.frame': 12 obs. of 4 variables:
# $ lhs : Factor w/ 3 levels "mpg","cyl","disp": 1 2 3 1 2 3 1 2 3 1 ...
# $ rhs : Factor w/ 4 levels "hp","drat","wt",..: 1 1 1 2 2 2 3 3 3 4 ...
# $ formula:List of 12
# $ model :List of 12
# - attr(*, "out.attrs")=List of 2
pairings[["model"]][[1]]
# Call:
# FUN(formula = X[[i]], data = ..1)
#
# Coefficients:
# (Intercept) hp
# 30.09886 -0.06823
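To get from there to the layout asked for in the question (one row per regression with lhs, rhs, Estimate, Std. Error, t value, Pr(>|t|)), a base R sketch could look like this. It keeps only the slope row of each coefficient table, which is an assumption about what is wanted; rows and results are placeholder names.

```r
pairings <- expand.grid(
  lhs = c("mpg", "cyl", "disp"),
  rhs = c("hp", "drat", "wt", "qsec"),
  stringsAsFactors = FALSE
)

rows <- Map(function(l, r) {
  fit <- lm(reformulate(r, response = l), data = mtcars)
  est <- coef(summary(fit))[r, ]  # slope row only, as a named vector
  data.frame(lhs = l, rhs = r, t(est), check.names = FALSE)
}, pairings$lhs, pairings$rhs)

# Stack the one-row data frames into a single result table
results <- do.call(rbind, unname(rows))
rownames(results) <- NULL
```

check.names = FALSE keeps the column names exactly as summary() reports them, spaces included.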
