Export coefficients from a loop for multiple cox regression


I need your help! I have a data set with 100,000 cases and 81 variables, and I run a loop in R that fits a separate Cox regression for each variable, adjusted for age and sex:
covariates <- c(var1, var2, ... var81)
purrr:: map(covariates, ~coxph(as.formula(paste("Surv(Time,Event) ~ Age + Sex +", .x)), data=mydata))
The output includes the coefficients for age, sex and for each variable, like that:
coef exp(coef) se(coef) z p
Age 0.0000 0.0000
Sex
Var1
I was wondering whether there is a way to export to Excel only the coefficient row for each variable (i.e. only the third line of each model's output) rather than all three rows.
Thank you so much for your help in advance!

Using mtcars as an example -
library(dplyr)
library(survival)
covariates <- c('mpg', 'cyl')
purrr::map_df(covariates, ~{
  # fit one model per covariate, always adjusting for hp
  mod <- coxph(as.formula(paste("Surv(disp, am) ~ hp +", .x)), data = mtcars)
  # keep only the row of the coefficient table that belongs to the covariate
  summary(mod)$coefficients[.x, ]
}) %>%
  mutate(covariate = covariates, .before = 1) -> result
result
#   covariate   coef `exp(coef)` `se(coef)`     z `Pr(>|z|)`
#   <chr>      <dbl>       <dbl>      <dbl> <dbl>      <dbl>
# 1 mpg        0.614       1.85       0.167  3.68   0.000238
# 2 cyl       -2.17        0.114      0.704 -3.08   0.00208
Write the output to Excel -
writexl::write_xlsx(result, 'data.xlsx')
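If hazard ratios with confidence limits are more useful in the spreadsheet than the raw coefficient row, the same loop can be sketched in base R with only the survival package. The column names of `hr_table` and the CSV file name are illustrative choices, not part of the answer above, and a CSV file (which Excel opens directly) stands in for the .xlsx output.

```r
library(survival)

covariates <- c("mpg", "cyl")

# One model per covariate, each adjusted for hp; keep only the covariate's row
rows <- lapply(covariates, function(v) {
  fml <- as.formula(paste("Surv(disp, am) ~ hp +", v))
  s <- summary(coxph(fml, data = mtcars))
  data.frame(
    covariate = v,
    HR        = s$conf.int[v, "exp(coef)"],   # hazard ratio
    lower95   = s$conf.int[v, "lower .95"],   # lower 95% limit
    upper95   = s$conf.int[v, "upper .95"],   # upper 95% limit
    p         = s$coefficients[v, "Pr(>|z|)"],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
})

hr_table <- do.call(rbind, rows)

# Hypothetical output location; replace with a real path as needed
write.csv(hr_table, file.path(tempdir(), "hazard_ratios.csv"), row.names = FALSE)
hr_table
```

Subsetting the summary matrices by the covariate name, rather than by row position, keeps the extraction correct even if the number of adjustment variables changes.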

Related

How can I extract, label and data.frame values from Console in a loop?

I made a nls loop and get values calculated in console. Now I want to extract those values, specify which values are from which group and put everything in a dataframe to continue working.
my loop so far:
for (i in seq_along(trtlist2)) { loopmm.nls <-
nls(rate ~ (Vmax * conc /(Km + conc)),
data=subset(M3, M3$trtlist==trtlist2[i]),
start=list(Km=200, Vmax=2), trace=TRUE )
summary(loopmm.nls)
print(summary(loopmm.nls))
}
the output in console: (this is what I want to extract and put in a dataframe, I have this same "parameters" thing like 20 times)
Parameters:
Estimate Std. Error t value Pr(>|t|)
Km 23.29820 9.72304 2.396 0.0228 *
Vmax 0.10785 0.01165 9.258 1.95e-10 ***
---
different ways of extracting data from the console that work but not in the loop (so far!)
#####extract data in diff ways from nls#####
## extract coefficients as matrix
Kinall <- summary(mm.nls)$parameters
## extract coefficients save as dataframe
Kin <- as.data.frame(Kinall)
colnames(Kin) <- c("values", "SE", "T", "P")
###create Km Vmax df
Kms <- Kin[1, ]
Vmaxs <- Kin[2, ]
#####extract coefficients each manually
Km <- unname(coef(summary(mm.nls))["Km", "Estimate"])
Vmax <- unname(coef(summary(mm.nls))["Vmax", "Estimate"])
KmSE <- unname(coef(summary(mm.nls))["Km", "Std. Error"])
VmaxSE <- unname(coef(summary(mm.nls))["Vmax", "Std. Error"])
KmP <- unname(coef(summary(mm.nls))["Km", "Pr(>|t|)"])
VmaxP <- unname(coef(summary(mm.nls))["Vmax", "Pr(>|t|)"])
KmT <- unname(coef(summary(mm.nls))["Km", "t value"])
VmaxT <- unname(coef(summary(mm.nls))["Vmax", "t value"])
one thing that works if you extract data through append, but somehow that only works for "estimates" not the rest
Kms <- append(Kms, unname(coef(loopmm.nls)["Km"] ))
Vmaxs <- append(Vmaxs, unname(coef(loopmm.nls)["Vmax"] ))
}
Kindf <- data.frame(trt = trtlist2, Vmax = Vmaxs, Km = Kms)

I would just keep everything in the dataframe for ease. You can nest by the group and then run the regression then pull the coefficients out. Just make sure you have tidyverse and broom installed on your computer.
library(tidyverse)
#example
mtcars |>
nest(data = -cyl) |>
mutate(model = map(data, ~nls(mpg~hp^b,
data = .x,
start = list(b = 1))),
clean_mod = map(model, broom::tidy)) |>
unnest(clean_mod) |>
select(-c(data, model))
#> # A tibble: 3 x 6
#> cyl term estimate std.error statistic p.value
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 6 b 0.618 0.0115 53.6 2.83e- 9
#> 2 4 b 0.731 0.0217 33.7 1.27e-11
#> 3 8 b 0.504 0.0119 42.5 2.46e-15
#what I expect will work for your data
All_M3_models <- M3 |>
nest(data = -trtlist) |>
mutate(model = map(data, ~nls(rate ~ (Vmax * conc /(Km + conc)),
data=.x,
start=list(Km=200, Vmax=2))),
clean_mod = map(model, broom::tidy))|>
unnest(clean_mod) |>
select(-c(data, model))
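For readers who prefer to keep the original for loop, a minimal base R sketch is to store each group's coefficient table in a pre-allocated list and bind the pieces afterwards. It reuses the mtcars model from the example above; the names `results`, `all_fits` and the column labels are illustrative assumptions.

```r
groups  <- sort(unique(mtcars$cyl))
results <- vector("list", length(groups))   # one slot per group

for (i in seq_along(groups)) {
  fit <- nls(mpg ~ hp^b,
             data  = subset(mtcars, cyl == groups[i]),
             start = list(b = 1))
  # coef(summary(fit)) is the "Parameters" matrix that print() shows in the console
  est <- as.data.frame(coef(summary(fit)))
  names(est) <- c("estimate", "std_error", "t_value", "p_value")
  results[[i]] <- data.frame(group = groups[i], term = rownames(est), est,
                             row.names = NULL, stringsAsFactors = FALSE)
}

all_fits <- do.call(rbind, results)   # one row per group and parameter
all_fits
```

Because the whole coefficient matrix is stored, standard errors, t values and p values come along with the estimates, which is what the append() approach in the question was missing.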

How to use results from different regression models in a scatterplot built using group_by in R?

I would like to add 2 different regression curves, coming from different models, in a scatter plot.
Let's use the example below:
Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)
The scatter plot looks like this:
plot_df <- ggplot(df, aes(x = Weight, y = Increment, color=Id)) + geom_point()
I tested a linear and an exponential regression model and could extract the results following loki's answer there:
linear_df <- df %>% group_by(Id) %>% do(model = glance(lm(Increment ~ Weight,data = .))) %>% unnest(model)
exp_df <- df %>% group_by(Id) %>% do(model = glance(lm(log(Increment) ~ Weight,data = .))) %>% unnest(model)
The linear model fits better for the Ga group, the exponential one for the Aa group, and nothing for the Za one:
> linear_df
# A tibble: 3 x 13
Id r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Aa 0.656 0.637 15.1 34.4 1.50e- 5 1 -81.6 169. 172. 4106. 18 20
2 Ga 1.00 1.00 0.243 104113. 6.10e-32 1 1.01 3.98 6.65 0.942 16 18
3 Za 0.0471 -0.0262 26.7 0.642 4.37e- 1 1 -69.5 145. 147. 9283. 13 15
> exp_df
# A tibble: 3 x 13
Id r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 Aa 0.999 0.999 0.0624 24757. 1.05e-29 1 28.2 -50.3 -47.4 0.0700 18 20
2 Ga 0.892 0.885 0.219 132. 3.86e- 9 1 2.87 0.264 2.94 0.766 16 18
3 Za 0.00444 -0.0721 0.941 0.0580 8.14e- 1 1 -19.3 44.6 46.7 11.5 13 15
Now, how can I draw the linear regression line for the Aa group, the exponential regression curve for the Ga group, and no curve for the Za group? There is this, but it applies for different regressions built inside the same model type. How can I combine my different objects?

The formula shown below gives the same fitted values as three separate fits, one per Id, so create one lm object for each of the two models and then plot the points and the lines for each. The straight solid lines are the linear model and the curved dashed lines are the exponential model.
library(dplyr)    # provides %>%
library(ggplot2)
fm.lin <- lm(Increment ~ Id/Weight + 0, df)
fm.exp <- lm(log(Increment) ~ Id/Weight + 0, df)
df %>%
ggplot(aes(Weight, Increment, color=Id)) +
geom_point() +
geom_line(aes(y = fitted(fm.lin))) +
geom_line(aes(y = exp(fitted(fm.exp))), lty = 2, lwd = 1)
To only show the Aa fitted lines for the linear model and Ga fitted lines for the exponential model NA out the portions not wanted. In this case we used solid lines for the fitted models.
df %>%
ggplot(aes(Weight, Increment, color=Id)) +
geom_point() +
geom_line(aes(y = ifelse(Id == "Aa", fitted(fm.lin), NA))) +
geom_line(aes(y = ifelse(Id == "Ga", exp(fitted(fm.exp)), NA)))
Added
Regarding the questions in the comments, the formula used above nests Weight within Id and effectively uses a model matrix which, modulo column order, is a block diagonal matrix whose blocks are the model matrices of the 3 individual models. Look at this to understand it.
model.matrix(fm.lin)
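The claim that the nested formula reproduces the separate per-group fits can be checked directly. The sketch below uses the built-in iris data instead of df so that it is self-contained; the variable names are illustrative.

```r
# Single model with a separate intercept and slope for each Species
nested <- lm(Sepal.Length ~ Species/Sepal.Width + 0, data = iris)

# Three independent models, one per Species
separate <- lapply(split(iris, iris$Species),
                   function(d) lm(Sepal.Length ~ Sepal.Width, data = d))

# Put the per-group fitted values back into the original row order
fitted_separate <- unsplit(lapply(separate, fitted), iris$Species)

same <- isTRUE(all.equal(unname(fitted(nested)), unname(fitted_separate)))
same
```

Only the fitted values and coefficients agree; the residual standard error of the single model is pooled across groups, which is the point made in the next paragraph.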
Since this is a single model rather than three models the summary statistics will be pooled. To get separate summary statistics use lmList from the nlme package (which comes with R so it does not have to be installed -- just issue a library statement). The statements below will give objects of class lmList that can be used in place of the ones above as they have a fitted method that will return the same fitted values.
library(nlme)
fm.lin2 <- lmList(Increment ~ Weight | Id, df, pool = FALSE)
fm.exp2 <- lmList(log(Increment) ~ Weight | Id, df, pool = FALSE)
In addition, they can be used to get individual summary statistics. Internally the lmList objects consist of a list of 3 lm objects with attributes in this case so we can extract the summary statistics by extracting the summary statistics from each component.
library(broom)
sapply(fm.lin2, glance)
sapply(fm.exp2, glance)
One caveat is that common statistical tests between models using different dependent variables, Increment vs. log(Increment), are invalid.

possible solution
Weight=c(12.6,12.6,16.01,17.3,17.7,10.7,17,10.9,15,14,13.8,14.5,17.3,10.3,12.8,14.5,13.5,14.5,17,14.3,14.8,17.5,2.9,21.4,15.8,40.2,27.3,18.3,10.7,0.7,42.5,1.55,46.7,45.3,15.4,25.6,18.6,11.7,28,35,17,21,41,42,18,33,35,19,30,42,23,44,22)
Increment=c(0.55,0.53,16.53,55.47,80,0.08,41,0.1,6.7,2.2,1.73,3.53,64,0.05,0.71,3.88,1.37,3.8,40,3,26.3,29.7,10.7,35,27.5,60,43,31,21,7.85,63,9.01,67.8,65.8,27,40.1,31.2,22.3,35,21,74,75,12,19,4,20,65,46,9,68,74,57,57)
Id=c(rep("Aa",20),rep("Ga",18),rep("Za",15))
df=data.frame(Id,Weight,Increment)
library(tidyverse)
df_model <- df %>%
group_nest(Id) %>%
mutate(
formula = c(
"lm(log(Increment) ~ Weight, data = .x)",
"lm(Increment ~ Weight,data = .x)",
"lm(Increment ~ 0,data = .x)"
),
transform = c("exp(fitted(.x))",
"fitted(.x)",
"fitted(.x)")
) %>%
mutate(model = map2(data, formula, .f = ~ eval(parse(text = .y)))) %>%
mutate(fit = map2(model, transform, ~ eval(parse(text = .y)))) %>%
select(Id, data, fit) %>%
unnest(c(data, fit))
ggplot(df_model) +
geom_point(aes(Weight, Increment, color = Id)) +
geom_line(aes(Weight, fit, color = Id))
Created on 2021-10-06 by the reprex package (v2.0.1)

How do I extract the coefficient names from the lm coefficients?

I have the following code which displays some coefficients from lm
fit <-lm(Petal.Width ~ Petal.Length, data=iris)
cf <-coef(summary(fit,complete = TRUE))
colnames(cf)[4] <- "pval"
cf<- data.frame(cf)
cf <-cf[cf$pval < 0.05,]
cf <-cf[order(-cf$pval), ]
head(cf)
cf[1,1]
I want to extract the names in the left column, i.e. (Intercept) and Petal.Length.
I thought I could use cf[1,1], but that returns the estimate.

Those are extracted using rownames :
fit <-lm(Petal.Width ~ Petal.Length, data=iris)
cf <-coef(summary(fit,complete = TRUE))
rownames(cf)
#[1] "(Intercept)" "Petal.Length"
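Since the question also filters on the p-value, the two steps can be combined while the coefficient table is still a matrix. This is a small sketch; `sig_terms` is an illustrative name.

```r
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
cf  <- coef(summary(fit))

# Row names of the terms whose p-value is below 0.05
sig_terms <- rownames(cf)[cf[, "Pr(>|t|)"] < 0.05]
sig_terms
```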

The tidyverse solution would be to use broom:
library(broom)
tidy_fit <- tidy(fit)
Results:
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -0.363 0.0398 -9.13 4.70e-16
2 Petal.Length 0.416 0.00958 43.4 4.68e-86
Then it's easy to extract the components that you want and the resulting code is more readable, e.g. tidy_fit$term to get the list of variables ((Intercept) and Petal.Length).

How to change unit increment in hazard ratio from coxph and frailty model in R?

I ran a coxph model and a frailty model, but now I would like the hazard ratio for a continuous variable (age) to be reported per 5-unit increment instead of per 1-unit increment. Is there a function in R that can perform such a task? If so, does the function also work for a frailty model? I used the package frailtypack.
library('survival')
data(veteran)
cox <- coxph(Surv(time, status) ~ age, data = veteran)
summary(cox)
# Call:
# coxph(formula = Surv(time, status) ~ age, data = veteran)
#
# n= 137, number of events= 128
#
# coef exp(coef) se(coef) z Pr(>|z|)
# age 0.007500 1.007528 0.009565 0.784 0.433
#
# exp(coef) exp(-coef) lower .95 upper .95
# age 1.008 0.9925 0.9888 1.027
#
# Concordance= 0.515 (se = 0.029 )
# Likelihood ratio test= 0.63 on 1 df, p=0.4
# Wald test = 0.61 on 1 df, p=0.4
# Score (logrank) test = 0.62 on 1 df, p=0.4

Just add a new variable that represents the age group each subject belongs to; for example 1: 0-4, 2: 5-9, 3: 10-14, etc.
This is an example using the veteran dataset in the survival package. The data has a continuous variable age. Adding this as a predictor to the model will give you the relative risk (hazard ratio) for a one-year increase or increment in age. If you are interested in the x-year increment, you should generate a new variable which groups subjects accordingly. For these data, I applied the following grouping; group 1: younger than 40, group 2: 40 - <50, group 3: 50 - <60, group 4: 60 - <70, and group 5: 70 or older. As such, the HR for a 10-year increment is 1.049. In other words, the risk increases by 5% for every 10-year increase in age. Note that the association is not statistically significant.
library(survival)
data(veteran)
veteran$ageCat <- 5
veteran$ageCat[veteran$age < 70] <- 4
veteran$ageCat[veteran$age < 60] <- 3
veteran$ageCat[veteran$age < 50] <- 2
veteran$ageCat[veteran$age < 40] <- 1
table(veteran$ageCat)
1 2 3 4 5
11 20 22 72 12
cox <- coxph(Surv(time, status) ~ ageCat, data = veteran)
summary(cox)
Call:
coxph(formula = Surv(time, status) ~ ageCat, data = veteran)
n= 137, number of events= 128
coef exp(coef) se(coef) z Pr(>|z|)
ageCat 0.04793 1.04910 0.09265 0.517 0.605
exp(coef) exp(-coef) lower .95 upper .95
ageCat 1.049 0.9532 0.8749 1.258
Concordance= 0.509 (se = 0.028 )
Rsquare= 0.002 (max possible= 0.999 )
Likelihood ratio test= 0.27 on 1 df, p=0.6024
Wald test = 0.27 on 1 df, p=0.6049
Score (logrank) test = 0.27 on 1 df, p=0.6048

@milan's post answers a similar question but not the one that was asked. Since age was split into decades and modeled as a continuous variable, the hazard ratio compares a subject in one age decade with a subject in the next-youngest decade. That is, the HR for subjects aged 51 vs 49 or 59 vs 41 would be the same despite 2 or 18 years between them.
Anyway, the default as you suggest is for a 1-unit increment in the continuous variable, age in this case. It's not always useful to compare subjects by 1-unit change especially when the range gets to be much larger.
You can do the following, which is naive to the model, so this should work for lm, glm, survival::coxph, frailtypack::frailtyPenal, etc.
library('survival')
data(veteran)
## 1-year increase in age
cox <- coxph(Surv(time, status) ~ age, data = veteran)
exp(coef(cox))
# age
# 1.007528
For a multiplicative model like Cox regressions, you can get the x-unit change after the model is fit:
## 5-year increase in age
exp(coef(cox)) ^ 5
# age
# 1.038211
## or equivalently
exp(coef(cox) * 5)
# age
# 1.038211
However, it's easier to create a variable for the age transformation then fit the model:
## or you can create a variable to model
veteran <- within(veteran, {
age5 <- age / 5
})
cox5_1 <- coxph(Surv(time, status) ~ age5, data = veteran)
exp(coef(cox5_1))
# age5
# 1.038211
cox5_2 <- coxph(Surv(time, status) ~ I(age / 5), data = veteran)
exp(coef(cox5_2))
# I(age/5)
# 1.038211
Note you need to use I here in the formula interface since some operators have special meanings in formulae. For example, lm(mpg ~ wt - 1, mtcars) and lm(mpg ~ I(wt - 1), mtcars) are two different models.
You can use these methods in other models, for example frailtyPenal if that is indeed the one you are using:
library('frailtypack')
fp <- frailtyPenal(Surv(time, status) ~ age, data = veteran, n.knots = 12, kappa = 1e5)
exp(fp$coef)
exp(fp$coef) ^ 5
fp5_1 <- frailtyPenal(Surv(time, status) ~ age5, data = veteran, n.knots = 12, kappa = 1e5)
fp5_2 <- frailtyPenal(Surv(time, status) ~ I(age / 5), data = veteran, n.knots = 12, kappa = 1e5)
exp(fp5_1$coef)
exp(fp5_2$coef)
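The post-fit rescaling shown earlier extends to a confidence interval, because the interval limits on the log scale are multiplied by the same factor. The following is a minimal sketch for the coxph fit using a Wald interval; `k`, `hr_k` and the other names are illustrative, and the veteran data is taken from the survival package after library(survival) without a data() call.

```r
library(survival)

cox <- coxph(Surv(time, status) ~ age, data = veteran)

k  <- 5                          # size of the increment, in years
b  <- coef(cox)[["age"]]         # log hazard ratio per 1 year
se <- sqrt(vcov(cox)[1, 1])      # standard error of the only coefficient
z  <- qnorm(0.975)               # multiplier for a 95% Wald interval

hr_k <- c(estimate = exp(k * b),
          lower95  = exp(k * (b - z * se)),
          upper95  = exp(k * (b + z * se)))
hr_k
```

Refitting with I(age / 5) and reading the interval from summary() should give the same three numbers, so this is a convenience rather than a different method.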

In R, when creating a model, is there an equivalent to the by statement in SAS?

Say I have a data set for which I'd like to fit an lm for each combination of variables A and B, where A has two values ('a' and 'b') and B has three values (1, 2, 3). This leaves me with six possible combinations of A and B.
That said, I would like to create six (6) models. For example, the first model would use the subset of the data where A = a and B = 1.
In SAS, for example, the code would be as follows (please note the by statement):
proc glm data = mydate;
by A B;
class Cat1 Cat2;
model Y = X + Cat1 + Cat2;
run;
The by statement will generate one model for each combination of A and B.

This is really just a split-apply step:
First, split the data into chunks:
smydate <- split(mydate, list(mydate$A, mydate$B))
Each component of smydate represents the data for a particular combination of A and B. You may need to add drop = TRUE to the split call if your data doesn't have all combinations of the levels of A and B.
Then apply the lm() function over the components of the list smydate:
lmFun <- function(dat) {
  lm(Y ~ X + Cat1 + Cat2, data = dat)
}
models <- lapply(smydate, lmFun)
Now you have a list, models, where each component contains a lm object for the particular combination of A and B.
An example (based on the one shown by rawr in the comments) is:
models <- lapply(split(mtcars, list(mtcars$am, mtcars$gear), drop = TRUE),
                 function(x) {lm(mpg ~ wt + disp, data = x)})
str(models, max = 1)
models
which gives:
> str(models, max = 1)
List of 4
$ 0.3:List of 12
..- attr(*, "class")= chr "lm"
$ 0.4:List of 12
..- attr(*, "class")= chr "lm"
$ 1.4:List of 12
..- attr(*, "class")= chr "lm"
$ 1.5:List of 12
..- attr(*, "class")= chr "lm"
> models
$`0.3`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
27.994610 -2.384834 -0.007983
$`0.4`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
219.1047 -106.8075 0.9953
$`1.4`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
43.27860 -3.03114 -0.09481
$`1.5`
Call:
lm(formula = mpg ~ wt + disp, data = x)
Coefficients:
(Intercept) wt disp
41.779042 -7.230952 -0.006731
As rawr notes in the comments, you can do this in fewer steps using by(), or any one of a number of other higher-level functions in say the plyr package, but doing things by hand at least once illustrates the generality of the approach; you can always use the short cuts once you are familiar with the general idea.
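Once the list of models exists, the coefficients can be collected into a single table, which is often the next step after a SAS by-group run. This is a minimal sketch continuing the mtcars example; `coef_table` is an illustrative name.

```r
models <- lapply(split(mtcars, list(mtcars$am, mtcars$gear), drop = TRUE),
                 function(d) lm(mpg ~ wt + disp, data = d))

# sapply() returns one column per model; transpose to get one row per model
coef_table <- t(sapply(models, coef))
coef_table
```

The row names ("0.3", "0.4", "1.4", "1.5") are the am.gear combinations produced by split().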

Using group_by in the dplyr package will run an analysis for each subgroup combination. Using the mtcars dataset:
library(dplyr)
res <- mtcars %>%
group_by(am, gear) %>%
do(mod = lm(mpg ~ wt + disp, data = .))
res$mod
Will give you the list of lm objects.
Other packages will make this more elegant. You could do this in-line with the magrittr package and go straight to the list of lm objects:
library(magrittr)
mtcars %>%
group_by(am, gear) %>%
do(mod = lm(mpg ~ wt + disp, data = .)) %>%
use_series(mod)
Or use the broom package to extract model-level summary statistics from the lm objects:
library(broom)
mtcars %>%
group_by(am, gear) %>%
do(mod = lm(mpg ~ wt + disp, data = .)) %>%
glance(mod)
Source: local data frame [4 x 13]
Groups: am, gear
am gear r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
1 0 3 0.6223489 0.5594070 2.2379851 9.887679 0.00290098 3 -31.694140 71.38828 74.22048 60.102926 12
2 0 4 0.9653343 0.8960028 0.9899495 13.923469 0.18618733 3 -2.862760 13.72552 11.27070 0.980000 1
3 1 4 0.7849464 0.6989249 2.9709337 9.125006 0.02144702 3 -18.182504 44.36501 44.68277 44.132234 5
4 1 5 0.9827679 0.9655358 1.2362092 57.031169 0.01723212 3 -5.864214 19.72843 18.16618 3.056426 2
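The glance() call on a do() result relies on behaviour of older broom releases and may not work with current versions. A hedged sketch of an equivalent, assuming a recent dplyr that provides group_modify() and a recent broom, is:

```r
library(dplyr)
library(broom)

fit_stats <- mtcars %>%
  group_by(am, gear) %>%
  # .x is the data for one am/gear combination, without the grouping columns
  group_modify(~ glance(lm(mpg ~ wt + disp, data = .x))) %>%
  ungroup()

fit_stats
```

The result has one row per group; the exact set of columns depends on the installed broom version, so it may differ slightly from the output printed above.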

More specifically, you can use lmList to fit linear models to categories, after using @bjoseph's strategy of generating an interaction variable:
mydate <- transform(mydate, ABcat=interaction(A,B,drop=TRUE))
library("lme4") ## or library("nlme")
lmList(Y~X+Cat1+Cat2|ABcat,mydate)

You could try several different things.
Let's say our data is:
x <- structure(list(A = structure(c(1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"), B = structure(c(1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor"), x = c(1, 2, 3, 4), y = c(2, 2, 2, 2)), .Names = c("A", "B", "x", "y"), row.names = c(NA, -4L), class = "data.frame")
x
#> A B x y
1 A A 1 2
2 A B 2 2
3 B A 3 2
4 B B 4 2
by()
This returns a list-type object. Notice that it doesn't return results in the order we might have expected. It's trying to keep the second factor as stable as possible when iterating. You could adjust this by using list(x$B,x$A)
by(x[c("x","y")],list(x$A,x$B),function(x){x[1]*x[2]})
[1] 2
-------------------------------------------------------------------------------------
[1] 6
-------------------------------------------------------------------------------------
[1] 4
-------------------------------------------------------------------------------------
[1] 8
expand.grid()
This is a simple for loop where we pre-generated the combinations of interest, subset the data in the loop and perform the function of interest. expand.grid() can be slow with large sets of combinations and for loops aren't necessarily fast but you have a lot of control in the middle.
combinations = expand.grid(levels(x$A),levels(x$B))
for(i in 1:nrow(combinations)){
d = x[x$A==combinations[i,1] & x$B==combinations[i,2],c("x","y")]
print(d[1]*d[2])
}
#> x
1 2
x
3 6
x
2 4
x
4 8

If you want the fit/predictions instead of summary statistics (t-tests, etc.), it's easier to fit an interaction model of Y~(A:B)*(X + Cat1 + Cat2) - 1 - X - Cat1 - Cat2; by subtracting out the main effects, R will reparameterize and place all the variance on the interactions. Here's an example:
> mtcars <- within(mtcars, {cyl = as.factor(cyl); am=as.factor(am)})
> model <- lm(mpg~(cyl:am)*(hp+wt)-1-hp-wt, mtcars)
> summary(model)
Call:
lm(formula = mpg ~ (cyl:am) * (hp + wt) - 1 - hp - wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-2.6685 -0.9071 0.0000 0.7705 4.1879
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
cyl4:am0 2.165e+01 2.252e+01 0.961 0.3517
cyl6:am0 6.340e+01 4.245e+01 1.494 0.1560
cyl8:am0 2.746e+01 5.000e+00 5.492 6.20e-05 ***
cyl4:am1 4.725e+01 5.144e+00 9.184 1.51e-07 ***
cyl6:am1 2.320e+01 3.808e+01 0.609 0.5515
cyl8:am1 1.877e+01 1.501e+01 1.251 0.2302
cyl4:am0:hp -4.635e-02 1.107e-01 -0.419 0.6815
cyl6:am0:hp 7.425e-03 1.650e-01 0.045 0.9647
cyl8:am0:hp -2.110e-02 2.531e-02 -0.834 0.4175
cyl4:am1:hp -7.288e-02 4.457e-02 -1.635 0.1228
cyl6:am1:hp -2.000e-02 4.733e-02 -0.423 0.6786
cyl8:am1:hp -1.127e-02 4.977e-02 -0.226 0.8240
cyl4:am0:wt 1.762e+00 5.341e+00 0.330 0.7460
cyl6:am0:wt -1.332e+01 1.303e+01 -1.022 0.3231
cyl8:am0:wt -2.025e+00 1.099e+00 -1.843 0.0851 .
cyl4:am1:wt -6.465e+00 2.467e+00 -2.621 0.0193 *
cyl6:am1:wt -4.926e-15 1.386e+01 0.000 1.0000
cyl8:am1:wt NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.499 on 15 degrees of freedom
Multiple R-squared: 0.9933, Adjusted R-squared: 0.9858
F-statistic: 131.4 on 17 and 15 DF, p-value: 3.045e-13
compare with a cyl4:am1 submodel:
> summary(lm(mpg~wt+hp, mtcars, subset=cyl=='4' & am=='1'))
Call:
lm(formula = mpg ~ wt + hp, data = mtcars, subset = cyl == "4" &
am == "1")
Residuals:
Datsun 710 Fiat 128 Honda Civic Toyota Corolla Fiat X1-9 Porsche 914-2
-2.66851 4.18787 -2.61455 3.25523 -2.62538 -0.77799
Lotus Europa Volvo 142E
1.17181 0.07154
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.24552 6.57304 7.188 0.000811 ***
wt -6.46508 3.15205 -2.051 0.095512 .
hp -0.07288 0.05695 -1.280 0.256814
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.193 on 5 degrees of freedom
Multiple R-squared: 0.6378, Adjusted R-squared: 0.493
F-statistic: 4.403 on 2 and 5 DF, p-value: 0.07893
The estimates of the coefficients are exactly the same, and the standard errors are higher/more conservative here, because s is being estimated only from the subset rather than pooling across all the models. Pooling may or may not be an appropriate assumption for your use case, statistically.
It's also much easier to get predictions: predict(model, X) vs having to split-apply-combine again.
