I have the following numeric data frame dataset:
x1 x2 x3 ...
1 2 3
...
I applied the Shapiro-Wilk test to every column as follows:
lshap <- lapply(dataset, shapiro.test)
lres <- t(sapply(lshap, `[`, c("statistic","p.value")))
The output of lres looks like this:
statistic p.value
Strong 0.8855107 6.884855e-14
Hardworking 0.9360735 8.031421e-10
Focused 0.9350827 6.421583e-10
Now, when I do:
class(lres)
It gives me "matrix" "array"
My question is: how do I convert lres to a data frame?
I want this output as a data frame:
variable statistic p.value
Strong 0.8855107 6.884855e-14
Hardworking 0.9360735 8.031421e-10
Focused 0.9350827 6.421583e-10
...
When I do to_df <- as.data.frame(lres) I get the following weird output:
statistic p.value
Strong <dbl [1]> <dbl [1]>
Hardworking <dbl [1]> <dbl [1]>
Focused <dbl [1]> <dbl [1]>
Gritty <dbl [1]> <dbl [1]>
Adaptable <dbl [1]> <dbl [1]>
...
What is wrong with this?
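In base R, the issue with OP's 'lres' is that it is a list matrix: every cell holds a list element rather than a plain number, so as.data.frame() produces list columns. Instead, we could build one small data frame per column and bind them together: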
out <- do.call(rbind, lapply(mtcars, function(x)
as.data.frame(shapiro.test(x)[c('statistic', 'p.value')])))
out <- cbind(variable = row.names(out), out)
row.names(out) <- NULL
-output
out
# variable statistic p.value
#1 mpg 0.9475647 1.228814e-01
#2 cyl 0.7533100 6.058338e-06
#3 disp 0.9200127 2.080657e-02
#4 hp 0.9334193 4.880824e-02
#5 drat 0.9458839 1.100608e-01
#6 wt 0.9432577 9.265499e-02
#7 qsec 0.9732509 5.935176e-01
#8 vs 0.6322635 9.737376e-08
#9 am 0.6250744 7.836354e-08
#10 gear 0.7727856 1.306844e-05
#11 carb 0.8510972 4.382405e-04
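To see why as.data.frame(lres) gives list columns, it may help to inspect the matrix directly. The sketch below uses mtcars as stand-in data (an assumption, since the OP's dataset is not shown) and also shows one possible alternative with vapply(), which keeps the values as plain numbers from the start. The names res and out are only illustrative.

```r
# Reproduce the matrix from the question, using mtcars as stand-in data
lshap <- lapply(mtcars, shapiro.test)
lres  <- t(sapply(lshap, `[`, c("statistic", "p.value")))

typeof(lres)      # "list": each cell is a list element, not a bare number
is.numeric(lres)  # FALSE

# Alternative: have vapply() return plain numbers so the matrix is numeric
res <- t(vapply(mtcars, function(x) {
  st <- shapiro.test(x)
  c(statistic = unname(st$statistic), p.value = st$p.value)
}, FUN.VALUE = c(statistic = 0, p.value = 0)))

# A numeric matrix converts to an ordinary data frame
out <- data.frame(variable = rownames(res), res)
rownames(out) <- NULL
str(out)
```

Because vapply() checks the type and length of every result, a column that cannot be tested fails loudly instead of silently producing a list.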
Or we can use as_tibble
library(dplyr)
library(tidyr)
as_tibble(lres, rownames = 'variable') %>%
unnest(cols = -variable)
-output
# A tibble: 11 x 3
# variable statistic p.value
# <chr> <dbl> <dbl>
# 1 mpg 0.948 0.123
# 2 cyl 0.753 0.00000606
# 3 disp 0.920 0.0208
# 4 hp 0.933 0.0488
# 5 drat 0.946 0.110
# 6 wt 0.943 0.0927
# 7 qsec 0.973 0.594
# 8 vs 0.632 0.0000000974
# 9 am 0.625 0.0000000784
#10 gear 0.773 0.0000131
#11 carb 0.851 0.000438
Or can be done in a single step
library(purrr)
library(broom)
imap_dfr(mtcars, ~ shapiro.test(.x) %>%
tidy %>%
select(-method), .id = 'variable')
-output
# A tibble: 11 x 3
# variable statistic p.value
# <chr> <dbl> <dbl>
# 1 mpg 0.948 0.123
# 2 cyl 0.753 0.00000606
# 3 disp 0.920 0.0208
# 4 hp 0.933 0.0488
# 5 drat 0.946 0.110
# 6 wt 0.943 0.0927
# 7 qsec 0.973 0.594
# 8 vs 0.632 0.0000000974
# 9 am 0.625 0.0000000784
#10 gear 0.773 0.0000131
#11 carb 0.851 0.000438
data
lshap <- lapply(mtcars, shapiro.test)
lres <- t(sapply(lshap, `[`, c("statistic","p.value")))
Related
I am doing a Shapiro-Wilk test for multiple variables.
I do this as follows:
list= lapply(mtcars, shapiro.test)
I want to save the output of list as a .txt file.
I have tried doing this:
write.table(paste(list), "SW List.txt")
That produces this:
When what I want is a .txt file with the variable names, as shown in the console when I run list:
What if, instead, you map all the statistics and p-values to a data frame and then save that data frame to a text file?
library(tidyverse)
imap_dfr(mtcars,
~ shapiro.test(.x) |>
(\(st) tibble(var = .y,
W = st$statistic,
p.value = st$p.value))())
#> # A tibble: 11 x 3
#> var W p.value
#> <chr> <dbl> <dbl>
#> 1 mpg 0.948 0.123
#> 2 cyl 0.753 0.00000606
#> 3 disp 0.920 0.0208
#> 4 hp 0.933 0.0488
#> 5 drat 0.946 0.110
#> 6 wt 0.943 0.0927
#> 7 qsec 0.973 0.594
#> 8 vs 0.632 0.0000000974
#> 9 am 0.625 0.0000000784
#> 10 gear 0.773 0.0000131
#> 11 carb 0.851 0.000438
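The answer above builds the table but stops before the saving step. A minimal base R sketch of that last step follows; the file names and the temporary directory are hypothetical choices for illustration, not something the original posts specify. The second part uses capture.output() in case the goal is a text file that looks like the console printout.

```r
# Build one row per variable with the W statistic and p-value
res <- do.call(rbind, lapply(names(mtcars), function(v) {
  st <- shapiro.test(mtcars[[v]])
  data.frame(var = v, W = unname(st$statistic), p.value = st$p.value)
}))

# Hypothetical output path; replace with your own location
out_file <- file.path(tempdir(), "sw_results.txt")
write.table(res, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the variable names survived
back <- read.delim(out_file)
head(back)

# Alternative: save the console-style printout of the list itself
sw <- lapply(mtcars, shapiro.test)
console_file <- file.path(tempdir(), "sw_console.txt")  # hypothetical path
capture.output(sw, file = console_file)
```

The tab-separated table is easier to reuse later, while the capture.output() file keeps the variable names as list headings exactly as printed.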
For the purposes of this question, let's create the following setup:
mtcars %>%
group_split(carb) %>%
map(select, mpg) -> criterion
mtcars %>%
group_split(carb) %>%
map(select, qsec) -> predictor
This code will create two lists of length 6. What I want to do is to perform 6 linear regressions within each of these 6 groups. I read about the map2 function and I thought that the code should look like this:
map2(criterion, predictor, lm(criterion ~ predictor))
But that doesn't seem to work. So in which way could this be done?
Use simplify2array (you need a list of vectors, not a list of data frames), and pass the model as a lambda function with ~ so that .x and .y refer to the paired elements:
map2(simplify2array(criterion), simplify2array(predictor), ~ lm(.x ~ .y))
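As a variation on the same idea, the two lists can be built as lists of vectors in the first place, so no conversion is needed. This is only a sketch: it relies on purrr's extraction shorthand (map(x, "mpg") pulls the element named mpg from each list item), and the name models is illustrative.

```r
library(dplyr)
library(purrr)

# One numeric vector per carb group, instead of one-column data frames
groups    <- group_split(mtcars, carb)
criterion <- map(groups, "mpg")
predictor <- map(groups, "qsec")

# map2() walks both lists in parallel: .x is the criterion, .y the predictor
models <- map2(criterion, predictor, ~ lm(.x ~ .y))

length(models)    # one model per carb group
coef(models[[2]])
```

Groups with a single row still return an lm object, but the slope is NA because one observation cannot determine a line.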
While the direct answer to your question is already given, note that we can also use dplyr::nest_by() and then proceed automatically rowwise.
With the code below, your models are stored in the mod column, and we can use broom::tidy etc. to work with them.
library(dplyr)
library(tidyr)
mtcars %>%
nest_by(carb) %>%
mutate(mod = list(lm(mpg ~ qsec, data = data)),
res = list(broom::tidy(mod))) %>%
unnest(res) %>%
filter(term != "(Intercept)")
#> # A tibble: 6 x 8
#> # Groups: carb [6]
#> carb data mod term estimate std.error statistic p.value
#> <dbl> <list<tibble[,10]>> <list> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 [7 x 10] <lm> qsec -1.26 4.51 -0.279 0.791
#> 2 2 [10 x 10] <lm> qsec 0.446 0.971 0.460 0.658
#> 3 3 [3 x 10] <lm> qsec -2.46 2.41 -1.02 0.493
#> 4 4 [10 x 10] <lm> qsec 0.0597 0.991 0.0602 0.953
#> 5 6 [1 x 10] <lm> qsec NA NA NA NA
#> 6 8 [1 x 10] <lm> qsec NA NA NA NA
Created on 2022-09-30 by the reprex package (v2.0.1)
I want to calculate the pair-wise correlations between "mpg" and all other numeric variables of interest for each cyl in the mtcars dataset. I would like to adopt the tidy data principle.
It's rather easy with corrr::correlate().
library(dplyr)
library(tidyr)
library(purrr)
library(corrr)
data(mtcars)
mtcars2 <- mtcars[,1:7] %>%
group_nest(cyl) %>%
mutate(cors = map(data, corrr::correlate),
stretch = map(cors, corrr::stretch)) %>%
unnest(stretch)
mtcars2 %>%
filter(x == "mpg")
By using corrr::correlate(), all available pair-wise correlations have been calculated. I could use dplyr::filter() to select the correlations of interest.
However, when datasets are large, a lot of calculations go to the unwanted correlations, making this approach very time-consuming. So I tried to calculate only mpg vs. others. I'm not very familiar with purrr, and the following code doesn't work.
mtcars2 <- mtcars[,1:7] %>%
group_nest(cyl) %>%
mutate(comp = map(data, ~colnames),
corr = map(comp, ~cor.test(data[["mpg"]], data[[.]])))
If you need to use cor.test, below is an option using broom:
library(broom)
library(tidyr)
library(dplyr)
mtcars[,1:7] %>%
pivot_longer(-c(mpg,cyl)) %>%
group_by(cyl,name) %>%
do(tidy(cor.test(.$mpg,.$value)))
# A tibble: 15 x 10
# Groups: cyl, name [15]
cyl name estimate statistic p.value parameter conf.low conf.high method
<dbl> <chr> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <chr>
1 4 disp -0.805 -4.07 0.00278 9 -0.947 -0.397 Pears…
2 4 drat 0.424 1.41 0.193 9 -0.236 0.816 Pears…
3 4 hp -0.524 -1.84 0.0984 9 -0.855 0.111 Pears…
4 4 qsec -0.236 -0.728 0.485 9 -0.732 0.424 Pears…
5 4 wt -0.713 -3.05 0.0137 9 -0.920 -0.198 Pears…
6 6 disp 0.103 0.232 0.826 5 -0.705 0.794 Pears…
7 6 drat 0.115 0.258 0.807 5 -0.699 0.799 Pears…
If you just need the correlation, then for big datasets the nesting etc. might be costly and unnecessary, because you can simply call cor() on mpg and the other columns and collect the result in a data frame:
#define columns to correlate
cor_vars = setdiff(colnames(mtcars)[1:7],"cyl")
split(mtcars[,1:7],mtcars$cyl) %>%
map_dfr(~data.frame(x="mpg",y=cor_vars,
cyl=unique(.x$cyl),rho=as.numeric(cor(.x$mpg,.x[,cor_vars]))))
x y cyl rho
1 mpg mpg 4 1.00000000
2 mpg disp 4 -0.80523608
3 mpg hp 4 -0.52350342
4 mpg drat 4 0.42423947
5 mpg wt 4 -0.71318483
6 mpg qsec 4 -0.23595389
7 mpg mpg 6 1.00000000
8 mpg disp 6 0.10308269
9 mpg hp 6 -0.12706785
10 mpg drat 6 0.11471598
11 mpg wt 6 -0.68154982
12 mpg qsec 6 -0.41871779
13 mpg mpg 8 1.00000000
14 mpg disp 8 -0.51976704
15 mpg hp 8 -0.28363567
16 mpg drat 8 0.04793248
17 mpg wt 8 -0.65035801
18 mpg qsec 8 -0.10433602
Would this work for you? I have done this in the past, but only on smallish datasets, and I have not benchmarked it, so I am not sure about performance. I use pivot_longer to reshape the data prior to nesting. The variables you pass to it essentially act as the filtering step.
mtcars2 <- mtcars[,1:7] %>%
pivot_longer(c(-mpg, -cyl), names_to = "y.var", values_to = "value" ) %>%
group_nest(cyl, y.var) %>%
mutate(x.var = "mpg", #just so you can see this in the output
cor = map_dbl(data, ~ {cor <- cor.test(.x$mpg, .x$value)
cor$estimate})) %>%
select(data, cyl, x.var , y.var, cor) %>%
arrange(cyl, y.var)
I have a list of data.frame and I'd like to run cor.test through each data.frame.
The data.frame has 8 columns, I would like to run cor.test for each of the first 7 columns against the 8th column.
I first set up the lists for storing the data
estimates = list()
pvalues = list()
Then here's the loop combining with lapply
for (i in 1:7){
corr <- lapply(datalist, function(x) {cor.test(x[,i], x[,8], alternative="two-sided", method="spearman", exact=FALSE, continuity=TRUE)})
estimates= corr$estimate
pvalues= corr$p.value
}
It ran without any errors, but estimates shows NULL.
Which part of this went wrong? I used to run a for loop over cor.test or run it with lapply, but never put them together. I wonder if there's a solution to this or an alternative. Thank you.
We can use sapply. Here is an example on mtcars where cor.test is performed for every other column against the eighth column (vs).
lst <- list(mtcars, mtcars)
lapply(lst, function(x) t(sapply(x[-8], function(y) {
val <- cor.test(y, x[[8]], alternative ="two.sided",
method="spearman", exact=FALSE, continuity=TRUE)
c(val$estimate, pval = val$p.value)
})))
#[[1]]
# rho pval
#mpg 0.7065968 6.176953e-06
#cyl -0.8137890 1.520674e-08
#disp -0.7236643 2.906504e-06
#hp -0.7515934 7.247490e-07
#drat 0.4474575 1.021422e-02
#wt -0.5870162 4.163577e-04
#qsec 0.7915715 6.843882e-08
#am 0.1683451 3.566025e-01
#gear 0.2826617 1.168159e-01
#carb -0.6336948 9.977275e-05
#[[2]]
# rho pval
#mpg 0.7065968 6.176953e-06
#cyl -0.8137890 1.520674e-08
#.....
This returns a list of two-column matrices, holding the estimate and the p-value respectively.
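As for why the loop in the question ends with NULL: lapply() returns a list of test results, so corr$estimate looks for a list element called estimate, finds none, and yields NULL. The assignment also overwrites estimates on every pass. The sketch below shows both points; datalist here is a hypothetical stand-in built from mtcars, since the original data is not shown.

```r
# Hypothetical stand-in for the OP's list of 8-column data frames
datalist <- list(a = mtcars[, 1:8], b = mtcars[, 1:8])

# One pass of the original loop body, for column 1
corr <- lapply(datalist, function(x) {
  cor.test(x[, 1], x[, 8], alternative = "two.sided",
           method = "spearman", exact = FALSE, continuity = TRUE)
})
is.null(corr$estimate)  # TRUE: corr has elements a and b, not estimate

# Pull the value out of each test, and index the result lists by i
estimates <- list()
pvalues   <- list()
for (i in 1:7) {
  corr <- lapply(datalist, function(x) {
    cor.test(x[, i], x[, 8], alternative = "two.sided",
             method = "spearman", exact = FALSE, continuity = TRUE)
  })
  estimates[[i]] <- sapply(corr, function(ct) unname(ct$estimate))
  pvalues[[i]]   <- sapply(corr, function(ct) ct$p.value)
}
estimates[[1]]
```

Each element of estimates is then a named vector with one correlation per data frame in datalist.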
Disclaimer: This answer uses the developer version of manymodelr that I also wrote.
EDIT: You can map it to your list of data frames with Map or lapply for instance:
lst <- list(mtcars, mtcars) # Line copied and pasted from @Ronak Shah's answer
Map(function(x) manymodelr::get_var_corr(x, "mpg",get_all = TRUE,
alternative="two.sided",
method="spearman",
continuity=TRUE,exact=FALSE),lst)
For a single data.frame object, we can use get_var_corr:
manymodelr::get_var_corr(mtcars, "mpg",get_all = TRUE,
alternative="two.sided",
method="spearman",
continuity=TRUE,exact=FALSE)
# Comparison_Var Other_Var p.value Correlation
# 1 mpg cyl 4.962301e-13 -0.9108013
# 2 mpg disp 6.731078e-13 -0.9088824
# 3 mpg hp 5.330559e-12 -0.8946646
# 4 mpg drat 5.369227e-05 0.6514555
# 5 mpg wt 1.553261e-11 -0.8864220
# 6 mpg qsec 7.042244e-03 0.4669358
# 7 mpg vs 6.176953e-06 0.7065968
# 8 mpg am 8.139885e-04 0.5620057
# 9 mpg gear 1.325942e-03 0.5427816
# 10 mpg carb 4.385340e-05 -0.6574976
purrr has some convenience functions that could make this operation a little simpler (although it's debatable whether this is actually simpler than the Map/lapply way). Using Ronak's example list lst:
library(purrr)
lst <- list(mtcars, mtcars)
map2(map(lst, ~.[-8]), map(lst, 8), ~
map(.x, cor.test, y = .y,
alternative = "two.sided",
method = "spearman",
exact = FALSE,
continuity = TRUE) %>%
map_dfr(`[`, c('estimate', 'p.value'), .id = 'var'))  # `[` stands in for magrittr::extract, which library(purrr) does not attach
# [[1]]
# # A tibble: 10 x 3
# var estimate p.value
# <chr> <dbl> <dbl>
# 1 mpg 0.707 0.00000618
# 2 cyl -0.814 0.0000000152
# 3 disp -0.724 0.00000291
# 4 hp -0.752 0.000000725
# 5 drat 0.447 0.0102
# 6 wt -0.587 0.000416
# 7 qsec 0.792 0.0000000684
# 8 am 0.168 0.357
# 9 gear 0.283 0.117
# 10 carb -0.634 0.0000998
#
# [[2]]
# # A tibble: 10 x 3
# var estimate p.value
# <chr> <dbl> <dbl>
# 1 mpg 0.707 0.00000618
# 2 cyl -0.814 0.0000000152
# 3 disp -0.724 0.00000291
# 4 hp -0.752 0.000000725
# 5 drat 0.447 0.0102
# 6 wt -0.587 0.000416
# 7 qsec 0.792 0.0000000684
# 8 am 0.168 0.357
# 9 gear 0.283 0.117
# 10 carb -0.634 0.0000998
I have created a loop which creates all possible model combinations for a given data set. There are 63 possible models and I need to put them into a tibble with model number, subset of explanatory variables, model formula, and outcome (specifically r-squared value).
Cols <- names(finalprojectdata3)
Cols <- Cols[! Cols %in% 'debt']
n <- length(Cols)
id <- unlist(
lapply(1:n,
function(i)combn(1:n, i, simplify = FALSE)
),
recursive = FALSE)
Formulas <- sapply(id, function(i)
paste('debt~', paste(Cols[i],collapse="+")))
models <- lapply(Formulas, function(i)
summary(lm(as.formula(i), data = finalprojectdata3)))
models
The output is the summary for each model, but I need it in an easy-to-read tibble.
It's not perfectly clear how you want your output, but here's a suggested path, following tidyr nested objects.
Make some fake data, following your methodology above:
dat <- mtcars[,1:5]
Cols <- names(dat)
Cols <- Cols[! Cols %in% 'mpg']
n <- length(Cols)
id <- unlist(
lapply(1:n,
function(i)combn(1:n, i, simplify = FALSE)
),
recursive = FALSE)
str(id)
# List of 15
# $ : int 1
# $ : int 2
# $ : int 3
# $ : int 4
# $ : int [1:2] 1 2
# $ : int [1:2] 1 3
# $ : int [1:2] 1 4
# $ : int [1:2] 2 3
# $ : int [1:2] 2 4
# $ : int [1:2] 3 4
# $ : int [1:3] 1 2 3
# $ : int [1:3] 1 2 4
# $ : int [1:3] 1 3 4
# $ : int [1:3] 2 3 4
# $ : int [1:4] 1 2 3 4
Formulas <- sapply(id, function(i)
paste('mpg ~', paste(Cols[i], collapse=" + ")))
head(Formulas)
# [1] "mpg ~ cyl" "mpg ~ disp" "mpg ~ hp" "mpg ~ drat"
# [5] "mpg ~ cyl + disp" "mpg ~ cyl + hp"
This is where I diverge from your path.
library(dplyr)
library(tidyr)
library(purrr)
x <- tibble(Formulas) %>%  # tibble() replaces the deprecated data_frame()
mutate(
lms = map(Formulas, ~ lm(as.formula(.), data = dat)),
summaries = map(lms, ~ summary(.)),
coefs = map(summaries, ~ as.data.frame(coef(.)))
)
x
# # A tibble: 15 × 4
# Formulas lms summaries coefs
# <chr> <list> <list> <list>
# 1 mpg ~ cyl <S3: lm> <S3: summary.lm> <data.frame [2 × 4]>
# 2 mpg ~ disp <S3: lm> <S3: summary.lm> <data.frame [2 × 4]>
# 3 mpg ~ hp <S3: lm> <S3: summary.lm> <data.frame [2 × 4]>
# 4 mpg ~ drat <S3: lm> <S3: summary.lm> <data.frame [2 × 4]>
# 5 mpg ~ cyl + disp <S3: lm> <S3: summary.lm> <data.frame [3 × 4]>
# 6 mpg ~ cyl + hp <S3: lm> <S3: summary.lm> <data.frame [3 × 4]>
# 7 mpg ~ cyl + drat <S3: lm> <S3: summary.lm> <data.frame [3 × 4]>
# 8 mpg ~ disp + hp <S3: lm> <S3: summary.lm> <data.frame [3 × 4]>
# 9 mpg ~ disp + drat <S3: lm> <S3: summary.lm> <data.frame [3 × 4]>
# 10 mpg ~ hp + drat <S3: lm> <S3: summary.lm> <data.frame [3 × 4]>
# 11 mpg ~ cyl + disp + hp <S3: lm> <S3: summary.lm> <data.frame [4 × 4]>
# 12 mpg ~ cyl + disp + drat <S3: lm> <S3: summary.lm> <data.frame [4 × 4]>
# 13 mpg ~ cyl + hp + drat <S3: lm> <S3: summary.lm> <data.frame [4 × 4]>
# 14 mpg ~ disp + hp + drat <S3: lm> <S3: summary.lm> <data.frame [4 × 4]>
# 15 mpg ~ cyl + disp + hp + drat <S3: lm> <S3: summary.lm> <data.frame [5 × 4]>
I did this piece-wise, keeping the models and the summaries, primarily for demonstration and in case you re-use lm (perhaps for predict). If you know you never need to keep the raw lm output, you could combine them into a single function call.
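If the intermediate lm and summary objects are not needed, the three steps can be folded into one call per formula. This is a minimal sketch under that assumption; it uses a shortened Formulas vector so that it runs on its own.

```r
library(dplyr)
library(purrr)

dat <- mtcars[, 1:5]
Formulas <- c("mpg ~ cyl", "mpg ~ disp", "mpg ~ cyl + disp")

# Fit, summarise and extract the coefficient table in one step
x <- tibble(Formulas) %>%
  mutate(
    coefs = map(Formulas,
                ~ as.data.frame(coef(summary(lm(as.formula(.x), data = dat)))))
  )

x$coefs[[3]]
```

The trade-off is that the fitted models are discarded, so anything that needs them later (predict, residuals) would require refitting.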
I believe you are asking for a data.frame of the coefficients, in which case:
x$summaries[[1]]
# Call:
# lm(formula = as.formula(.), data = dat)
# Residuals:
# Min 1Q Median 3Q Max
# -4.9814 -2.1185 0.2217 1.0717 7.5186
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
# cyl -2.8758 0.3224 -8.92 6.11e-10 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 3.206 on 30 degrees of freedom
# Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
# F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
coef(x$summaries[[1]])
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
# cyl -2.87579 0.3224089 -8.919699 6.112687e-10
Unfortunately, if you try to combine all of these coefficient summaries into a single data.frame, the row names are lost in dplyr::bind_rows:
bind_rows(map(x$summaries[1:2], ~ as.data.frame(coef(.))))
# Estimate Std. Error t value Pr(>|t|)
# 1 37.88457649 2.073843606 18.267808 8.369155e-18
# 2 -2.87579014 0.322408883 -8.919699 6.112687e-10
# 3 29.59985476 1.229719515 24.070411 3.576586e-21
# 4 -0.04121512 0.004711833 -8.747152 9.380327e-10
One could always use base R, though you are lacking the "which model" component:
do.call(rbind.data.frame, map(x$summaries[1:2], ~ as.data.frame(coef(.))))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 37.88457649 2.073843606 18.267808 8.369155e-18
# cyl -2.87579014 0.322408883 -8.919699 6.112687e-10
# (Intercept)1 29.59985476 1.229719515 24.070411 3.576586e-21
# disp -0.04121512 0.004711833 -8.747152 9.380327e-10
We can re-introduce that by using tibble::rownames_to_column in the original pipeline:
x <- tibble(Formulas) %>%  # tibble() replaces the deprecated data_frame()
mutate(
lms = map(Formulas, ~ lm(as.formula(.), data = dat)),
summaries = map(lms, ~ summary(.)),
coefs = map(summaries, ~ tibble::rownames_to_column(as.data.frame(coef(.))))
)
select(x, Formulas, coefs) %>% unnest(coefs)
# # A tibble: 47 × 6
# Formulas rowname Estimate `Std. Error` `t value` `Pr(>|t|)`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 mpg ~ cyl (Intercept) 37.88457649 2.073843606 18.267808 8.369155e-18
# 2 mpg ~ cyl cyl -2.87579014 0.322408883 -8.919699 6.112687e-10
# 3 mpg ~ disp (Intercept) 29.59985476 1.229719515 24.070411 3.576586e-21
# 4 mpg ~ disp disp -0.04121512 0.004711833 -8.747152 9.380327e-10
# 5 mpg ~ hp (Intercept) 30.09886054 1.633920950 18.421246 6.642736e-18
# 6 mpg ~ hp hp -0.06822828 0.010119304 -6.742389 1.787835e-07
# 7 mpg ~ drat (Intercept) -7.52461844 5.476662574 -1.373942 1.796391e-01
# 8 mpg ~ drat drat 7.67823260 1.506705108 5.096042 1.776240e-05
# 9 mpg ~ cyl + disp (Intercept) 34.66099474 2.547003876 13.608536 4.022869e-14
# 10 mpg ~ cyl + disp cyl -1.58727681 0.711844271 -2.229809 3.366495e-02
# # ... with 37 more rows
Consider staying in base R by adjusting your last lapply call to return dataframes:
df_list <- lapply(seq_along(Formulas), function(i) {
mod <- summary(lm(as.formula(Formulas[[i]]), data = finalprojectdata3))
data.frame(model_num = i,
formula = Formulas[[i]],
r2 = mod$r.squared,
adjr2 = mod$adj.r.squared
)
})
final_df <- do.call(rbind, df_list)
final_tibble <- tibble::as_tibble(final_df) # requires the tibble package
Using mtcars (borrowing from @r2evans's reproducible example)
final_tibble
# A tibble: 15 x 4
# model_num formula r2 adjr2
# * <int> <fctr> <dbl> <dbl>
# 1 1 mpg ~ cyl 0.7261800 0.7170527
# 2 2 mpg ~ disp 0.7183433 0.7089548
# 3 3 mpg ~ hp 0.6024373 0.5891853
# 4 4 mpg ~ drat 0.4639952 0.4461283
# 5 5 mpg ~ cyl + disp 0.7595658 0.7429841
# 6 6 mpg ~ cyl + hp 0.7407084 0.7228263
# 7 7 mpg ~ cyl + drat 0.7402482 0.7223343
# 8 8 mpg ~ disp + hp 0.7482402 0.7308774
# 9 9 mpg ~ disp + drat 0.7310094 0.7124583
# 10 10 mpg ~ hp + drat 0.7411716 0.7233214
# 11 11 mpg ~ cyl + disp + hp 0.7678877 0.7430186
# 12 12 mpg ~ cyl + disp + drat 0.7650941 0.7399256
# 13 13 mpg ~ cyl + hp + drat 0.7693992 0.7446920
# 14 14 mpg ~ disp + hp + drat 0.7750131 0.7509073
# 15 15 mpg ~ cyl + disp + hp + drat 0.7825119 0.7502914
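The question also asked for the subset of explanatory variables as its own column. One possible way to add it, sketched here on the mtcars example rather than on finalprojectdata3 (which is not available), is to build the table from the id list directly. The column names variables and r2 are illustrative.

```r
library(dplyr)
library(purrr)

dat  <- mtcars[, 1:5]
Cols <- setdiff(names(dat), "mpg")

# All non-empty subsets of the predictor indices
id <- unlist(
  lapply(seq_along(Cols),
         function(i) combn(seq_along(Cols), i, simplify = FALSE)),
  recursive = FALSE)

models <- tibble(
  model_num = seq_along(id),
  variables = map_chr(id, ~ paste(Cols[.x], collapse = ", ")),
  formula   = paste("mpg ~", map_chr(id, ~ paste(Cols[.x], collapse = " + ")))
) %>%
  # Fit each model and keep only its R-squared
  mutate(r2 = map_dbl(formula,
                      ~ summary(lm(as.formula(.x), data = dat))$r.squared))

models
```

With six predictors, as in the question, the same code would produce the 63 rows mentioned there.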