R - passing object to function based on the NAME of the object

Suppose in R I have multiple GLM objects from multiple glm() function calls.
glm_01
glm_02
...
glm_nn
...and suppose that I want to do all possible pairwise comparisons using a chi-squared or F ANOVA test.
anova(glm_01, glm_02, test = "F")
anova(glm_01, glm_03, test = "F")
anova(glm_01, glm_04, test = "F")
...
I don't want to do this manually because the list of models is quite long. Instead I'd like to grab a list of relevant model objects (anything starting with "glm_") and do all pairwise comparisons automatically. However I'm unsure how to pass the model objects (rather than their names in string form) to the anova() function.
As a simple example:
data(mtcars)
# create some models
glm_01 <- glm(mpg ~ cyl , mtcars, family = gaussian())
glm_02 <- glm(mpg ~ cyl + disp , mtcars, family = gaussian())
glm_03 <- glm(mpg ~ cyl + disp + hp , mtcars, family = gaussian())
glm_04 <- glm(mpg ~ cyl + disp + hp + wt, mtcars, family = gaussian())
# get list of relevant model objects from the R environment
model_list <- ls()
model_list <- model_list[substr(model_list, 1, 4) == "glm_"]
# create a table to store the pairwise ANOVA results
n_models <- length(model_list)
anova_table <- matrix(0, nrow = n_models, ncol = n_models)
# loop through twice and do pairwise comparisons
for(row_index in 1:n_models) {
for(col_index in 1:n_models) {
anova_table[row_index, col_index] <- anova(model_list[row_index], model_list[col_index], test = "F")$'Pr(>F)'[2]
}
}
...but of course this loop at the end doesn't work because I'm not passing model objects to anova(), I'm passing the names of the objects as strings instead. How do I tell anova() to use the object that the string refers to, instead of the string itself?
Thank you.
======================
Possible solution:
data(mtcars)
glm_list <- list()
glm_list$glm_01 <- glm(mpg ~ cyl , mtcars, family = gaussian())
glm_list$glm_02 <- glm(mpg ~ cyl + disp , mtcars, family = gaussian())
glm_list$glm_03 <- glm(mpg ~ cyl + disp + hp , mtcars, family = gaussian())
glm_list$glm_04 <- glm(mpg ~ cyl + disp + hp + wt, mtcars, family = gaussian())
# create a table to store the pairwise ANOVA results
n_models <- length(glm_list)
anova_table <- matrix(0, nrow = n_models, ncol = n_models)
# loop through twice and do pairwise comparisons
row_idx <- 0
col_idx <- 0
for(row_glm in glm_list)
{
row_idx <- row_idx + 1
for(col_glm in glm_list)
{
col_idx <- col_idx + 1
anova_table[row_idx, col_idx] <- anova(row_glm, col_glm, test = "F")$'Pr(>F)'[2]
}
col_idx <- 0
}
row_idx <- 0

The easiest way to do this would be to keep all your models in a list. This makes it simple to iterate over them. For example, you can create all of your models and do a pairwise comparison between all of them like this:
data(mtcars)
f_list <- list(mpg ~ cyl,
mpg ~ cyl + disp,
mpg ~ cyl + disp + hp,
mpg ~ cyl + disp + hp + wt)
all_glms <- lapply(f_list, glm, data = mtcars, family = gaussian)
all_pairs <- as.data.frame(combn(length(all_glms), 2))
result <- lapply(all_pairs, function(i) anova(all_glms[[i[1]]], all_glms[[i[2]]]))
Which gives you:
result
#> $V1
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 29 270.74 1 37.594
#>
#> $V2
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp + hp
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 28 261.37 2 46.965
#>
#> $V3
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 27 170.44 3 137.89
#>
#> $V4
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp
#> Model 2: mpg ~ cyl + disp + hp
#> Resid. Df Resid. Dev Df Deviance
#> 1 29 270.74
#> 2 28 261.37 1 9.3709
#>
#> $V5
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 29 270.74
#> 2 27 170.44 2 100.3
#>
#> $V6
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp + hp
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 28 261.37
#> 2 27 170.44 1 90.925
Created on 2020-08-25 by the reprex package (v0.3.0)
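If what you want is the table of p-values from the question rather than the full printouts, the same list can feed a matrix. This is only a sketch under some assumptions: the names glm_01 ... glm_04 and the object p_table are illustrative, test = "F" is taken from the question, and only the upper triangle is filled because each pair is compared once.

```r
data(mtcars)

# Fit the four nested models from the answer above and name them.
f_list <- list(mpg ~ cyl,
               mpg ~ cyl + disp,
               mpg ~ cyl + disp + hp,
               mpg ~ cyl + disp + hp + wt)
all_glms <- lapply(f_list, glm, data = mtcars, family = gaussian)
names(all_glms) <- sprintf("glm_%02d", seq_along(all_glms))  # illustrative names

# Empty matrix for the pairwise p-values (NA where no comparison is made).
n_models <- length(all_glms)
p_table <- matrix(NA_real_, nrow = n_models, ncol = n_models,
                  dimnames = list(names(all_glms), names(all_glms)))

# Each column of `pairs` holds the indices of one pair (i < j).
pairs <- combn(n_models, 2)
for (k in seq_len(ncol(pairs))) {
  i <- pairs[1, k]
  j <- pairs[2, k]
  p_table[i, j] <- anova(all_glms[[i]], all_glms[[j]], test = "F")[["Pr(>F)"]][2]
}

p_table
```

Row i, column j then holds the p-value for comparing model i against the larger model j.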

If you want to reference arbitrary objects in an accessible environment by symbol without putting them into a list object, the standard way to return the top object on the search list whose symbol is equal to a string is get(), or the vector equivalent mget(). I.e. get("glm_01") gets you the top object on the search list that has the symbol glm_01. The most minimal modification to your approach would be to wrap your calls to model_list[row_index] and model_list[col_index] in get().
You can be more precise about where to look for objects by assigning the models in a named environment and only getting from that environment (using the envir parameter to get()).
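As a rough illustration of both points, the sketch below stores three models in a dedicated environment and looks them up by name. The environment name model_env and the other variable names are made up for this example; get(), mget(), ls() and their envir/pattern arguments are base R.

```r
data(mtcars)

# A dedicated environment, so the lookup never picks up unrelated objects.
model_env <- new.env()
model_env$glm_01 <- glm(mpg ~ cyl, data = mtcars, family = gaussian())
model_env$glm_02 <- glm(mpg ~ cyl + disp, data = mtcars, family = gaussian())
model_env$glm_03 <- glm(mpg ~ cyl + disp + hp, data = mtcars, family = gaussian())

# Names of everything in that environment starting with "glm_".
model_names <- ls(model_env, pattern = "^glm_")

# get() turns one name into the object it refers to ...
first_model <- get(model_names[1], envir = model_env)

# ... and mget() does the same for a whole character vector, returning a named list.
models <- mget(model_names, envir = model_env)

# The names can now drive anova() directly.
p_value <- anova(get(model_names[1], envir = model_env),
                 get(model_names[2], envir = model_env),
                 test = "F")[["Pr(>F)"]][2]
p_value
```

Once the models are in a list (as mget() returns), the list-based approaches above apply unchanged.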

Related

F test for β1=β2 in R

If my model looks like this, Y=β0+β1X1+β2X2+β3X3+β4X4, and I want to perform an F test (5%) in R for β1=β2, how do I do it?
The only tutorials I can find online deal with β1=β2=0, but that's not what I'm looking for here.
Here's an example in R testing whether the coefficient for vs is the same as the coefficient for am:
data(mtcars)
mod <- lm(mpg ~ hp + disp + vs + am, data=mtcars)
library(car)
linearHypothesis(mod, "vs=am")
# Linear hypothesis test
#
# Hypothesis:
# vs - am = 0
#
# Model 1: restricted model
# Model 2: mpg ~ hp + disp + vs + am
#
# Res.Df RSS Df Sum of Sq F Pr(>F)
# 1 28 227.07
# 2 27 213.52 1 13.547 1.7131 0.2016
The glht function from the multcomp package can do this (among other things). For example, if your model is
mod1 <-lm( y ~ x1 + x2 + x3 + x4)
then you can use:
summary(multcomp::glht(mod1, "x1-x2=0"))
Run the model with and without the constraint and then use anova to compare them. No packages are used.
mod1 <- lm(mpg ~ cyl + disp + hp + drat, mtcars)
mod2 <- lm(mpg ~ I(cyl + disp) + hp + drat, mtcars) # constraint imposed
anova(mod2, mod1)
giving:
Analysis of Variance Table
Model 1: mpg ~ I(cyl + disp) + hp + drat
Model 2: mpg ~ cyl + disp + hp + drat
Res.Df RSS Df Sum of Sq F Pr(>F)
1 28 252.95
2 27 244.90 1 8.0513 0.8876 0.3545
The underlying calculation is the following. It gives the same result as above.
L <- matrix(c(0, 1, -1, 0, 0), 1) # hypothesis is L %*% beta == 0
q <- nrow(L) # 1
co <- coef(mod1)
resdf <- df.residual(mod1) # = nobs(mod1) - length(co) = 32 - 5 = 27
SSH <- t(L %*% co) %*% solve(L %*% vcov(mod1) %*% t(L)) %*% L %*% co
SSH/q # F value
## [,1]
## [1,] 0.8876363
pf(SSH/q, q, resdf, lower.tail = FALSE) # p value
## [,1]
## [1,] 0.3544728

map() model output to a dataframe

I've been using map() to calculate and extract certain statistics from multiple lm() models.
To give a reproducible example, using the mtcars dataset, I start with an input vector of formulae to be estimated using lm() models:
library(tidyverse)
df <- mtcars
input_char <- c("mpg ~ disp",
"mpg ~ disp + hp")
input_formula <- map(input_char, formula)
I've then got a function that calculates and extracts the relevant statistics for each model. For simplicity and reproducibility, here's a simplified function that just extracts the R-squared of the model.
get_rsquared <- function(a_formula) {
model1 <- lm(a_formula, data = df)
rsquared <- summary(model1)$r.squared
c(model = a_formula, rsquared = rsquared)
}
I've then used map to iterate through the formulae and extract the R-squared from each model.
models <- map(input_formula, get_rsquared)
models
which gives the output:
[[1]]
[[1]]$model
mpg ~ disp
<environment: 0x7f98987f4000>
[[1]]$rsquared
[1] 0.7183433
[[2]]
[[2]]$model
mpg ~ disp + hp
<environment: 0x7f98987f4000>
[[2]]$rsquared
[1] 0.7482402
My question is regarding the output being a list.
Is there a simple way to make the output a dataframe?
My desired output is:
#> model rsquared
#> 1 mpg ~ disp 0.7183433
#> 2 mpg ~ disp + hp 0.7482402
Keep the formulas as character strings and call as.formula() inside the get_rsquared() function; character strings are easier to work with than formula objects.
library(purrr)
library(dplyr)
df <- mtcars
input_char <- c("mpg ~ disp",
"mpg ~ disp + hp")
get_rsquared <- function(a_formula) {
model1 <- lm(as.formula(a_formula), data = df)
rsquared <- summary(model1)$r.squared
list(model = a_formula, rsquared = rsquared)
}
map_df(input_char, get_rsquared)
# A tibble: 2 x 2
model rsquared
<chr> <dbl>
1 mpg ~ disp 0.718
2 mpg ~ disp + hp 0.748
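If you would rather avoid the tidyverse dependency, the same idea can be sketched in base R: return a one-row data frame per formula and bind the rows. The function name get_rsquared_row and the object rsq_table are made up for this example.

```r
input_char <- c("mpg ~ disp",
                "mpg ~ disp + hp")

# Fit one model and return its formula and R-squared as a one-row data frame.
get_rsquared_row <- function(a_formula, data) {
  fit <- lm(as.formula(a_formula), data = data)
  data.frame(model = a_formula,
             rsquared = summary(fit)$r.squared,
             stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single data frame.
rsq_table <- do.call(rbind, lapply(input_char, get_rsquared_row, data = mtcars))
rsq_table
```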

How can I display only significant path lines on a path diagram? [R: lavaan, semPlot]

I would like to make changes to my path diagram that I made with the lavaan and semPlot packages.
require(lavaan); require(semPlot)
head(mtcars)
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
semPaths(fit, "std", fade = F, residuals = F)
Because mpg <- gear and mpg <- cyl are not significant, I would like the diagram to make that visible (e.g., by adding * to the significant path lines or by preventing non-significant path lines from showing up on the path diagram). Is there any way to do that?
Thank you for your support!
I know it's an old thread but I found it while looking for this, and figured I should provide my solution for others.
require(lavaan); require(semPlot) ; require(tidyverse)
#> Loading required package: lavaan
#> This is lavaan 0.6-3
#> lavaan is BETA software! Please report any bugs.
#> Loading required package: semPlot
#> Registered S3 methods overwritten by 'huge':
#> method from
#> plot.sim BDgraph
#> print.sim BDgraph
#> Loading required package: tidyverse
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
# got this warning, but simply ignored it.
#> Warning in lav_partable_check(lavpartable, categorical =
#> lavoptions$categorical, : lavaan WARNING: parameter table does not contain
#> thresholds
lavaan::standardizedSolution(fit) %>% dplyr::filter(!is.na(pvalue)) %>% arrange(desc(pvalue)) %>% mutate_if("is.numeric","round",3) %>% select(-ci.lower,-ci.upper,-z)
#> lhs op rhs est.std se pvalue
#> 1 mpg ~ gear 0.022 0.087 0.801
#> 2 mpg ~ cyl -0.166 0.260 0.524
#> 3 mpg ~ hp -0.694 0.242 0.004
#> 4 hp ~~ hp 0.101 0.034 0.003
#> 5 hp ~1 -2.674 0.600 0.000
#> 6 hp ~ disp 0.444 0.094 0.000
#> 7 hp ~ cyl 0.529 0.098 0.000
#> 8 mpg ~1 4.514 0.751 0.000
#> 9 mpg ~~ mpg 0.258 0.039 0.000
pvalue_cutoff <- 0.05
obj <- semPlot:::semPlotModel(fit)
# save a copy of the original, so we can compare it later and be sure we removed only what we intended to remove
original_Pars <- obj@Pars
check_Pars <- obj@Pars %>% dplyr::filter(!(edge %in% c("int","<->") | lhs == rhs)) # these are the parameters to sift through
keep_Pars <- obj@Pars %>% dplyr::filter(edge %in% c("int","<->") | lhs == rhs) # these are the parameters to keep as-is
test_against <- lavaan::standardizedSolution(fit) %>% dplyr::filter(pvalue < pvalue_cutoff, rhs != lhs)
test_against_rev <- test_against %>% rename(rhs2 = lhs, # for some reason, the rhs and lhs are reversed in the standardizedSolution() output, for some of the values
lhs = rhs) %>% # I'll have to reverse it myself, and test against both orders
rename(rhs = rhs2)
checked_Pars <-
check_Pars %>% semi_join(test_against, by = c("lhs", "rhs")) %>% bind_rows(
check_Pars %>% semi_join(test_against_rev, by = c("lhs", "rhs"))
)
obj@Pars <- keep_Pars %>% bind_rows(checked_Pars)
#let's verify by looking at the list of the edges we removed from the object
anti_join(original_Pars,obj@Pars)
#> Joining, by = c("label", "lhs", "edge", "rhs", "est", "std", "group", "fixed", "par")
#> label lhs edge rhs est std group fixed par
#> 1 gear ~> mpg 0.1582792 0.0218978 FALSE 2
#> 2 cyl ~> mpg -0.4956938 -0.1660012 FALSE 3
# great, let's plot
semPlot::semPaths(obj, "std",fade = F, residuals = F)
Note this is highly tinkered, and the criterion for exclusion should be modified to your needs (especially the edge %in% c("int","<->") parts).
Created on 2019-07-09 by the reprex package (v0.3.0)
redacted session_info()
#> lavaan * 0.6-3 2018-09-22 [1] CRAN (R 3.6.0)
#> semPlot * 1.1.1 2019-04-05 [1] CRAN (R 3.6.0)
#> tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.0)
I have recently discovered the lavaanPlot package, which can show only the coefficients that meet a specified significance criterion. The code is:
require(lavaan); require(lavaanPlot)
head(mtcars)
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
sem.model <- lavaanPlot(model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, sig = 0.05)
(The image of the resulting path diagram is not reproduced here.)
I believe it can be further customized.

anova on a sequence of models stored in a list

I am running a series of models and storing them in a list:
fm0 <- list()
for(i in 1:3){
m <- formula(mpg ~ disp)
if(i > 1)
m <- update.formula(m, ~ . + gear)
if(i > 2)
m <- update.formula(m, ~ . + qsec)
fm1 <- lm(m, data = mtcars)
fm0[[i]] <- fm1
names(fm0)[i] <- paste0("m",i)
}
I want to run anova on the sequence of models like this:
anova(fm0$m1, fm0$m2, fm0$m3)
# Analysis of Variance Table
#
# Model 1: mpg ~ disp
# Model 2: mpg ~ disp + gear
# Model 3: mpg ~ disp + gear + qsec
# Res.Df RSS Df Sum of Sq F Pr(>F)
# 1 30 317.16
# 2 29 317.01 1 0.1443 0.0130 0.9099
# 3 28 309.83 1 7.1839 0.6492 0.4272
but I want something generic where I do not need to type out each named component of the list as the number of models is varying (depending on the data, which is set up in another loop, in which the loop above sits).
I tried lapply(fm0, anova), but it runs anova on each model on its own, which is not what I am after.
Here is an absolutely inelegant solution:
eval(parse(text=paste("anova(",paste("fm0[[",1:length(fm0),"]]",sep="",collapse=","),")")))
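A possibly tidier alternative to building the call as text is do.call(), which passes the list elements as the arguments of anova(). This is a sketch rather than a tested general solution: the list is rebuilt here so the block is self-contained, and unname() is used so the models are passed positionally rather than as named arguments.

```r
# Rebuild the three nested models from the question as a named list.
fm0 <- list(
  m1 = lm(mpg ~ disp, data = mtcars),
  m2 = lm(mpg ~ disp + gear, data = mtcars),
  m3 = lm(mpg ~ disp + gear + qsec, data = mtcars)
)

# Equivalent to anova(fm0[[1]], fm0[[2]], fm0[[3]]), for any list length.
seq_tab <- do.call(anova, unname(fm0))
seq_tab
```

Because the call is built from the list itself, it keeps working when the number of models changes from one run to the next.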

In R, when creating a model, is there an equivalent to the by statement in SAS?

Say I have a data set for which I'd like to fit an lm for each combination of variables A and B, where A has two values ('a' and 'b') and B has three values (1, 2, 3). This leaves me with six possible combinations of A and B.
That said, I would like to create six (6) models. For example, the first model would use the subset of the data where A = 'a' and B = 1.
In SAS, for example, the code would be as follows (please note the by statement):
proc glm data = mydate;
by A B;
class Cat1 Cat2;
model Y = X + Cat1 + Cat2;
run;
The by statement will generate one model for each combination of A and B.
This is really just a split-apply step:
Split the data into chunks:
smydate <- split(mydate, list(A = mydate$A, B = mydate$B))
Each component of smydate represents the data for a particular combination of A and B. You may need to add drop = TRUE to the split call if your data doesn't have all combinations of the levels of A and B.
Apply the lm() function over the components of the list smydate:
lmFun <- function(dat) {
lm(Y ~ X + Cat1 + Cat2, data = dat)
}
models <- lapply(smydate, lmFun)
Now you have a list, models, where each component contains a lm object for the particular combination of A and B.
An example (based on the one shown by rawr in the comments) is:
models <- lapply(split(mtcars, list(mtcars$am, mtcars$gear), drop = TRUE),
function(x) {lm(mpg ~ wt + disp, data = x)})
str(models, max = 1)
models
which gives:
> str(models, max = 1)
List of 4
$ 0.3:List of 12
..- attr(*, "class")= chr "lm"
$ 0.4:List of 12
..- attr(*, "class")= chr "lm"
$ 1.4:List of 12
..- attr(*, "class")= chr "lm"
$ 1.5:List of 12
..- attr(*, "class")= chr "lm"
> models
$`0.3`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
27.994610 -2.384834 -0.007983
$`0.4`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
219.1047 -106.8075 0.9953
$`1.4`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
43.27860 -3.03114 -0.09481
$`1.5`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
41.779042 -7.230952 -0.006731
As rawr notes in the comments, you can do this in fewer steps using by(), or any one of a number of other higher-level functions in say the plyr package, but doing things by hand at least once illustrates the generality of the approach; you can always use the short cuts once you are familiar with the general idea.
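For reference, the by() shortcut might look roughly like the sketch below. The object names by_fits, n_fitted and fit_am0_gear3 are made up for this example. Unlike split(..., drop = TRUE), by() keeps a slot for every am/gear combination, so combinations with no rows come back as NULL.

```r
# Fit one lm per am/gear combination in a single by() call.
by_fits <- by(mtcars,
              list(am = mtcars$am, gear = mtcars$gear),
              function(d) lm(mpg ~ wt + disp, data = d))

# Empty combinations are NULL, so count only the models that were fitted.
n_fitted <- sum(!vapply(unclass(by_fits), is.null, logical(1)))
n_fitted

# Pull out a single fit by its am and gear labels.
fit_am0_gear3 <- by_fits[["0", "3"]]
coef(fit_am0_gear3)
```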
Using group_by in the dplyr package will run an analysis for each subgroup combination. Using the mtcars dataset:
library(dplyr)
res <- mtcars %>%
group_by(am, gear) %>%
do(mod = lm(mpg ~ wt + disp, data = .))
res$mod
Will give you the list of lm objects.
Other packages will make this more elegant. You could do this in-line with the magrittr package and go straight to the list of lm objects:
library(magrittr)
mtcars %>%
group_by(am, gear) %>%
do(mod = lm(mpg ~ wt + disp, data = .)) %>%
use_series(mod)
Or use the broom package to extract model-level summary statistics (R-squared, AIC, and so on) from the lm objects:
library(broom)
mtcars %>%
group_by(am, gear) %>%
do(mod = lm(mpg ~ wt + disp, data = .)) %>%
glance(mod)
Source: local data frame [4 x 13]
Groups: am, gear
am gear r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
1 0 3 0.6223489 0.5594070 2.2379851 9.887679 0.00290098 3 -31.694140 71.38828 74.22048 60.102926 12
2 0 4 0.9653343 0.8960028 0.9899495 13.923469 0.18618733 3 -2.862760 13.72552 11.27070 0.980000 1
3 1 4 0.7849464 0.6989249 2.9709337 9.125006 0.02144702 3 -18.182504 44.36501 44.68277 44.132234 5
4 1 5 0.9827679 0.9655358 1.2362092 57.031169 0.01723212 3 -5.864214 19.72843 18.16618 3.056426 2
More specifically, you can use lmList to fit linear models to categories, after using @bjoseph's strategy of generating an interaction variable:
mydate <- transform(mydate, ABcat=interaction(A,B,drop=TRUE))
library("lme4") ## or library("nlme")
lmList(Y~X+Cat1+Cat2|ABcat,mydate)
You could try several different things.
Let's say our data is:
x <- structure(list(A = structure(c(1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"), B = structure(c(1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor"), x = c(1, 2, 3, 4), y = c(2, 2, 2, 2)), .Names = c("A", "B", "x", "y"), row.names = c(NA, -4L), class = "data.frame")
x
#>   A B x y
#> 1 A A 1 2
#> 2 A B 2 2
#> 3 B A 3 2
#> 4 B B 4 2
by()
This returns a list-type object. Notice that it doesn't return results in the order we might have expected. It's trying to keep the second factor as stable as possible when iterating. You could adjust this by using list(x$B,x$A)
by(x[c("x","y")],list(x$A,x$B),function(x){x[1]*x[2]})
[1] 2
-------------------------------------------------------------------------------------
[1] 6
-------------------------------------------------------------------------------------
[1] 4
-------------------------------------------------------------------------------------
[1] 8
expand.grid()
This is a simple for loop where we pre-generated the combinations of interest, subset the data in the loop and perform the function of interest. expand.grid() can be slow with large sets of combinations and for loops aren't necessarily fast but you have a lot of control in the middle.
combinations = expand.grid(levels(x$A),levels(x$B))
for(i in 1:nrow(combinations)){
d = x[x$A==combinations[i,1] & x$B==combinations[i,2],c("x","y")]
print(d[1]*d[2])
}
#>   x
#> 1 2
#>   x
#> 3 6
#>   x
#> 2 4
#>   x
#> 4 8
If you want the fit/predictions instead of summary stats(t-tests, etc), it's easier to fit an interaction model of Y~(A:B)*(X + Cat1 + Cat2) - 1 - X - Cat1 - Cat2; by subtracting out the main effects, R will reparameterize and place all the variance on the interactions. Here's an example:
> mtcars <- within(mtcars, {cyl = as.factor(cyl); am=as.factor(am)})
> model <- lm(mpg~(cyl:am)*(hp+wt)-1-hp-wt, mtcars)
> summary(model)
Call:
lm(formula = mpg ~ (cyl:am) * (hp + wt) - 1 - hp - wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-2.6685 -0.9071 0.0000 0.7705 4.1879
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
cyl4:am0 2.165e+01 2.252e+01 0.961 0.3517
cyl6:am0 6.340e+01 4.245e+01 1.494 0.1560
cyl8:am0 2.746e+01 5.000e+00 5.492 6.20e-05 ***
cyl4:am1 4.725e+01 5.144e+00 9.184 1.51e-07 ***
cyl6:am1 2.320e+01 3.808e+01 0.609 0.5515
cyl8:am1 1.877e+01 1.501e+01 1.251 0.2302
cyl4:am0:hp -4.635e-02 1.107e-01 -0.419 0.6815
cyl6:am0:hp 7.425e-03 1.650e-01 0.045 0.9647
cyl8:am0:hp -2.110e-02 2.531e-02 -0.834 0.4175
cyl4:am1:hp -7.288e-02 4.457e-02 -1.635 0.1228
cyl6:am1:hp -2.000e-02 4.733e-02 -0.423 0.6786
cyl8:am1:hp -1.127e-02 4.977e-02 -0.226 0.8240
cyl4:am0:wt 1.762e+00 5.341e+00 0.330 0.7460
cyl6:am0:wt -1.332e+01 1.303e+01 -1.022 0.3231
cyl8:am0:wt -2.025e+00 1.099e+00 -1.843 0.0851 .
cyl4:am1:wt -6.465e+00 2.467e+00 -2.621 0.0193 *
cyl6:am1:wt -4.926e-15 1.386e+01 0.000 1.0000
cyl8:am1:wt NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.499 on 15 degrees of freedom
Multiple R-squared: 0.9933, Adjusted R-squared: 0.9858
F-statistic: 131.4 on 17 and 15 DF, p-value: 3.045e-13
compare with a cyl4:am1 submodel:
> summary(lm(mpg~wt+hp, mtcars, subset=cyl=='4' & am=='1'))
Call:
lm(formula = mpg ~ wt + hp, data = mtcars, subset = cyl == "4" &
am == "1")
Residuals:
Datsun 710 Fiat 128 Honda Civic Toyota Corolla Fiat X1-9 Porsche 914-2
-2.66851 4.18787 -2.61455 3.25523 -2.62538 -0.77799
Lotus Europa Volvo 142E
1.17181 0.07154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.24552 6.57304 7.188 0.000811 ***
wt -6.46508 3.15205 -2.051 0.095512 .
hp -0.07288 0.05695 -1.280 0.256814
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.193 on 5 degrees of freedom
Multiple R-squared: 0.6378, Adjusted R-squared: 0.493
F-statistic: 4.403 on 2 and 5 DF, p-value: 0.07893
The estimates of the coefficients are exactly the same, and the standard errors are higher/more conservative here, because s is being estimated only from the subset rather than pooling across all the models. Pooling may or may not be an appropriate assumption for your use case, statistically.
It's also much easier to get predictions: predict(model, X) vs having to split-apply-combine again.
