I have trained an lm model on a dataset and generated a summary of the model using the summary() function. How can I get that summary as a table?
You can use broom::tidy:
model <- lm(mpg~cyl, mtcars)
broom::tidy(model)
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 (Intercept) 37.9 2.07 18.3 8.37e-18
#2 cyl -2.88 0.322 -8.92 6.11e-10
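The tidy output is already an ordinary data frame, so if you need the table outside of R you can write it straight to a file. A minimal sketch (the file name is just a placeholder):
library(broom)
model <- lm(mpg ~ cyl, data = mtcars)
coef_table <- tidy(model)   # columns: term, estimate, std.error, statistic, p.value
write.csv(coef_table, "model_summary.csv", row.names = FALSE)   # placeholder file name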
I tried doing this
fits = list(fit0)
for(i in 1:5)
{
  temp = assign(paste0("fit", i), lm(formula = y ~ poly(x, degree = i, raw = TRUE)))
  fits = append(fits, temp)
}
which seems like it should work, and I don't get any errors initially. The problem, though, is that instead of creating a list of length 6 where each element is itself a list (as lm objects are lists), it seems to be taking the elements of each lm object and making each of them a separate element of fits. When I do length(fits) it gives 61, and View(fits) shows dozens of individual components rather than six model objects, which certainly looks like it took all the elements of each individual lm object and made them the elements of fits, though I don't understand why.
Oddly, though, if I just type fits[1] in the console, it prints the exact same output I get if I type fit0, so it seems like it is in some way storing each lm object as one thing.
The problem, though, is that if I then try to get the R^2 value for, e.g., fit0, it works fine with summary(fit0)$r.squared, but if I try to do the same for fits[1] I get an error instead.
I don't understand what's going on here. I thought maybe the problem was using append, since I'd previously only used it with vectors, so I Googled "how to create list of lists in R", but the examples I found used append, so that doesn't seem to be the issue.
I assume it's something to do with the intricacy of lm objects, but the documentation isn't actually helpful (on a side note, why IS R's documentation so terrible anyway? Compared to Python, or even C++ (which is a far more complicated language to work with overall), it's so much harder to glean the details of how the different functions and data types work because the documentation always seems to give the bare minimum, if that, of information), so I don't know what I'm doing wrong.
I've tried Googling how to create a list of lm objects and I found the lmList documentation, but that seems to be for when you want to fit a single regression to data grouped by categories in a data.frame, which isn't what I'm trying to do here. I also found this post: Populating a list with lm objects, but I don't really understand the example code the OP asks about, as I'm unsure what they mean by a "random name" or how it even makes sense for them to be trying to access a named element in what looks like an empty list, and the only answer does the same thing. I did make note of the comment mentioning using double brackets, but I get the same error whether I use double brackets or not.
I'm quite confused here, so any guidance would be greatly appreciated.
Showing how to use a for loop for this:
DF <- data.frame(x = rnorm(100), y = rnorm(100))
fits <- list(fit0 = lm(y ~ 1, data = DF))
for(i in 1:5)
{
fits[[paste0("fit", i)]] <- lm(formula = y ~ poly(x, degree = i, raw = TRUE), data = DF)
}
sapply(fits, \(x) summary(x)$r.squared)
# fit0 fit1 fit2 fit3 fit4 fit5
#0.00000000 0.06441347 0.07915820 0.08547018 0.08547089 0.08569820
From the perspective of a statistician, you should not do this.
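As an aside on why the original append approach flattened everything: append (like c) concatenates the elements of its arguments, and an lm object is itself a list of about 12 components, so each fit gets spliced in piece by piece (which is where length 61 comes from: 1 + 5 × 12). A minimal sketch of a fix that keeps the append pattern, reusing the DF defined above, is to wrap each fit in a one-element list before appending:
fit0 <- lm(y ~ 1, data = DF)
fits2 <- list(fit0)
for(i in 1:5)
{
  fit_i <- lm(y ~ poly(x, degree = i, raw = TRUE), data = DF)
  fits2 <- append(fits2, list(fit_i))   # list() keeps each lm object as a single element
}
length(fits2)                  # 6
summary(fits2[[2]])$r.squared  # use [[ ]] to extract the lm object itself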
lm objects in R are indeed complicated. The broom package offers a consistent way to convert model objects into a "tidy" output format that can be easier to work with downstream.
For instance, we can use broom::glance to get a table with the lm stats as a data frame:
fit <- lm(mpg ~ wt, data = mtcars)
broom::glance(fit)
Result
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 0.753 0.745 3.05 91.4 1.29e-10 1 -80.0 166. 170. 278. 30 32
We could extend this to an example where we group the mtcars dataset by gear, nest the associated data for each gear group, run lm on each one, glance at each of those fits, and finally unnest to get the results into a single table. That seems to demonstrate what you're describing -- we can see how r.squared varies for the lm run on each group.
library(tidyverse); library(broom)
mtcars %>%
group_by(gear) %>%
nest() %>%
mutate(fit = map(data, ~lm(.x$mpg~.x$wt)),
tidy = map(fit, glance)) %>%
unnest(tidy)
# Groups: gear [3]
gear data fit r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <list> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 4 <tibble [12 × 10]> <lm> 0.677 0.645 3.14 21.0 0.00101 1 -29.7 65.4 66.8 98.9 10 12
2 3 <tibble [15 × 10]> <lm> 0.608 0.578 2.19 20.2 0.000605 1 -32.0 69.9 72.1 62.3 13 15
3 5 <tibble [5 × 10]> <lm> 0.979 0.972 1.11 141. 0.00128 1 -6.34 18.7 17.5 3.69 3 5
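The same nesting pattern works if you want per-group coefficient tables rather than model-level statistics; just map broom::tidy over the fits instead of glance (a sketch using the same grouping):
mtcars %>%
  group_by(gear) %>%
  nest() %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .x)),
         coefs = map(fit, tidy)) %>%
  unnest(coefs)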
Or, if you already have your list of lm objects, you could feed those into map_dfr(glance) to get a table with r.squared:
fit1 <- lm(mpg~wt, mtcars)
fit2 <- lm(mpg~cyl+wt, mtcars)
list(fit1, fit2) %>%
map_dfr(glance)
# A tibble: 2 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 0.753 0.745 3.05 91.4 1.29e-10 1 -80.0 166. 170. 278. 30 32
2 0.830 0.819 2.57 70.9 6.81e-12 2 -74.0 156. 162. 191. 29 32
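If you name the list, map_dfr can also add an identifier column so each row is labelled with the model it came from (the names here are arbitrary):
list(wt_only = fit1, wt_and_cyl = fit2) %>%
  map_dfr(glance, .id = "model")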
After doing propensity score matching, I'm running a Poisson model like so:
model <- glm(outcome ~ x1 + x2 + x3 ... ,
data = d,
weights = psweights$weights,
family = "poisson")
And then I want to create a new data frame with the variable names, coefficients, and upper and lower confidence intervals. Just doing:
d2 <- summary(model)$coef
gets me the variable names, coefficients, standard errors, and z values. What is the easiest way to compute confidence intervals, convert them into columns and bind it all into one data frame?
How about this, using the broom package:
library(broom)
mod <- glm(hp ~ disp + drat + cyl, data=mtcars, family=poisson)
tidy(mod, conf.int=TRUE)
#> # A tibble: 4 × 7
#> term estimate std.error statistic p.value conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 2.40 0.196 12.3 1.30e-34 2.02 2.79
#> 2 disp 0.000766 0.000259 2.96 3.07e- 3 0.000258 0.00127
#> 3 drat 0.240 0.0386 6.22 4.89e-10 0.164 0.315
#> 4 cyl 0.236 0.0195 12.1 1.21e-33 0.198 0.274
Created on 2022-06-30 by the reprex package (v2.0.1)
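If you would rather stay in base R, a minimal sketch of the same idea (using the mod fitted above) binds the coefficient matrix to the profile-likelihood intervals from confint:
ci <- confint(mod)                                 # profile-likelihood confidence intervals
d2 <- cbind(as.data.frame(coef(summary(mod))), ci)
d2$term <- rownames(d2)                            # keep the variable names as a column
d2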
I'm fitting many models like this example:
model<-lm(vl~sex+race+gene1+gene2)
anova(model)
model<-lm(vl~sex+race+gene3+gene4)
anova(model)
model<-lm(vl~sex+race+gene5+gene6)
anova(model)
model<-lm(vl~sex+race+gene7+gene8)
anova(model)
model<-lm(vl~sex+race+gene9+gene10)
anova(model)
I want a function or R package that can extract all the p values from those models and put them in one table. I have so many models that I cannot copy and paste every p value. Can you help me, please?
Here is an example with the mtcars dataset:
library(tidyquant)
library(tidyverse)
library(broom)
table <- mtcars %>%
mutate(cyl = as_factor(cyl)) %>%
group_by(cyl) %>%
group_split() %>%
map_dfr(.f = function(df) {
lm(mpg ~ am+disp+gear, data = df) %>%
glance() %>%
add_column(cyl = unique(df$cyl), .before = 1)
})
table
Output:
cyl r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 4 0.866 0.808 1.98 15.0 0.00196 3 -20.6 51.2 53.2 27.3 7 11
2 6 0.609 0.218 1.29 1.56 0.362 3 -8.72 27.4 27.2 4.96 3 7
3 8 0.272 0.139 2.38 2.05 0.175 2 -30.3 68.6 71.1 62.1 11 14
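Since the question is specifically about the anova p values, another option (a sketch, assuming the outcome and gene variables live in a data frame called d) is to loop over the model formulas and tidy each anova table with broom, which gives one row per term including its p.value:
library(broom)
library(purrr)
formulas <- list(vl ~ sex + race + gene1 + gene2,
                 vl ~ sex + race + gene3 + gene4)   # extend with the remaining gene pairs
map_dfr(formulas,
        \(f) tidy(anova(lm(f, data = d))),
        .id = "model")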
Is there any package that can help me export the results of a multinomial logit to Excel, for example as a table?
The broom package does a reasonable job of tidying multinomial output.
library(broom)
library(nnet)
fit.gear <- multinom(gear ~ mpg + factor(am), data = mtcars)
summary(fit.gear)
Call:
multinom(formula = gear ~ mpg + factor(am), data = mtcars)
Coefficients:
(Intercept) mpg factor(am)1
4 -11.15154 0.5249369 11.90045
5 -18.39374 0.3662580 22.44211
Std. Errors:
(Intercept) mpg factor(am)1
4 5.317047 0.2680456 66.895845
5 67.931319 0.2924021 2.169944
Residual Deviance: 28.03075
AIC: 40.03075
tidy(fit.gear)
# A tibble: 6 x 6
y.level term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 4 (Intercept) 1.44e-5 5.32 -2.10 3.60e- 2
2 4 mpg 1.69e+0 0.268 1.96 5.02e- 2
3 4 factor(am)1 1.47e+5 66.9 0.178 8.59e- 1
4 5 (Intercept) 1.03e-8 67.9 -0.271 7.87e- 1
5 5 mpg 1.44e+0 0.292 1.25 2.10e- 1
6 5 factor(am)1 5.58e+9 2.17 10.3 4.54e-25
Then use the openxlsx package to send that to Excel.
library(openxlsx)
write.xlsx(file="E:/.../fitgear.xlsx", tidy(fit.gear))
(Note that the tidy function exponentiates the coefficients by default, although the help page incorrectly says the default is FALSE. So these are relative risk ratios, which is why they don't match the output of summary. And if you want confidence intervals, you have to ask for them.)
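Because that default has varied across broom versions, it may be safer to set it explicitly. A small sketch, assuming a broom version whose tidy method for multinom accepts exponentiate and conf.int (the output file name is a placeholder):
tab <- tidy(fit.gear, exponentiate = FALSE, conf.int = TRUE)   # log-odds scale, with confidence intervals
write.xlsx(tab, file = "fitgear_tidy.xlsx")                    # placeholder file name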
I have a bunch of series to forecast using the forecast::auto.arima function. I'd like to save what type of model auto.arima fit. If you run the following code:
library(forecast)
set.seed(123)
y <- sin(seq(-pi,pi,0.05))+(rnorm(length(seq(-pi,pi,0.05)))/4)
arima.model <- auto.arima(y)
arima.model
executing the last line shows
Series: y
**ARIMA(1,1,2)**
Coefficients:
ar1 ma1 ma2
0.9594 -1.7285 0.7740
s.e. 0.0380 0.0745 0.0658
sigma^2 estimated as 0.06534: log likelihood=-6.1
AIC=20.2 AICc=20.53 BIC=31.51
How can I capture ARIMA(1,1,2) and save the result? I was hoping to do something like arima.model$ and capture what I need, but I could not figure it out.
You can try summary(arima.model), arima.model$coef, arima.model$aic, arima.model$bic.
If you want a tidy format, you can use the broom package like this:
library(broom)
tidy(arima.model) #ar/ma terms
glance(arima.model) #information criteria
tidy(arima.model)
# A tibble: 3 x 3
term estimate std.error
<fct> <dbl> <dbl>
1 ar1 0.959 0.0380
2 ma1 -1.73 0.0745
3 ma2 0.774 0.0658
glance(arima.model)
# A tibble: 1 x 4
sigma logLik AIC BIC
<dbl> <dbl> <dbl> <dbl>
1 0.256 -6.10 20.2 31.5
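If the goal is to capture the "ARIMA(1,1,2)" label itself, one option (a sketch relying on forecast::arimaorder, which returns the non-seasonal p, d, q order) is to build the string yourself:
ord <- arimaorder(arima.model)   # p, d, q (plus seasonal terms when present)
paste0("ARIMA(", ord[1], ",", ord[2], ",", ord[3], ")")
# [1] "ARIMA(1,1,2)"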