Extract restricted regression coefficients without using tidy - r

I'm using the restriktor package to perform restricted regressions; however, at the same time I'm doing the restricted regressions by group using the dplyr. In order to extract the coefficients and have them formatted into a nice panel format, I use tidy and broom but the tidy packaged doesn't work on the restriktor so I'm not sure how to go about extracting the coefficients:
library(restriktor)
library(dplyr)
reg =
mtcars %>%
group_by(cyl) %>%
do(model = restriktor(lm(mpg ~ wt + hp, data =.), constraints = ' wt < -4 '))
I would like to have the b.restr which is the restricted model coefficients to be extracted for each group and formatted together into a panel normally I would use the following:
reg =
mtcars %>%
group_by(cyl) %>%
do({model = restriktor(lm(mpg ~ wt + hp, data =.), constraints = ' wt < -4 ') # create your model
data.frame(tidy(model), # get coefficient info
glance(model))})
But I get the following error:
Error: No tidy method for objects of class restriktor
All I want is to extract the following elements from the lists and put them altogether with their group identifier in one panel format:
reg[[2]][[1]][["b.restr"]]

Use group_modify (which is preferred over do now) with coef/as.list/as_tibble.
library(dplyr)
library(restriktor)
# coefficients and R2's or NAs if too few rows for restriktor
co <- function(fo, data) {
fm <- lm(fo, data)
coef_lm <- coef(fm)
min_rows <- length(coef_lm)
if (nrow(data) <= min_rows) NA * c(coef_lm, R2.org = NA, R2.reduced = NA)
else {
r <- restriktor(fm, constraints = ' wt < -4 ')
c(coef(r), R2.org = r$R2.org, R2.reduced = r$R2.reduced)
}
}
mtcars %>%
group_by(cyl) %>%
group_modify(~ {
.x %>%
co(mpg ~ wt + hp, .) %>%
as.list %>%
as_tibble
}) %>%
ungroup
giving:
tibble: 3 x 6
cyl `(Intercept)` wt hp R2.org R2.reduced
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 45.8 -5.12 -0.0905 0.681 0.681
2 6 35.3 -4 -0.0256 0.589 0.667
3 8 33.9 -4 -0.0132 0.497 0.652

Related

Extracting the coefficients of each model by using a series of map functions

I am creating the following model:
models <- mtcars %>%
split(.$cyl) %>%
map(function(df) lm(mpg ~ wt, data = df))
Based on the results you get from that, I am trying to extract the coefficients by using a series of map functions.
The results should look like this:
4 6 8
-5.647025 -2.780106 -2.192438
I am pulling my hair out trying to figure this out. Any help is appreciated.
You can use map_dbl with the coef function to pick out the "wt" coefficients:
coefs <- mtcars %>%
split(.$cyl) %>%
map(function(df) lm(mpg ~ wt, data = df)) %>%
map_dbl(~coef(.)[["wt"]])
It looks like
coefs <- (mtcars
%>% split(.$cyl)
%>% map(lm, formula = mpg~wt)
%>% map_dbl(~coef(.)[["wt"]])
)
should do what you want? If you want to get more information, ending with map_dfr(broom::tidy) instead of the map_dbl will be helpful (you can use the .id= argument too, although this is less useful when the list doesn't have named arguments).
This is very similar to #henryn's answer, although the map syntax (using the named formula argument means that the data get substituted as the next argument implicitly, so you don't have to use an anonymous function function(df) lm(mpg ~ wt, data = df) or (with R >= 4.1.0) \(df) lm(mpg ~ wt, data = df): I think the usual way of doing this, ~ lm(mpg ~ wt, data = .) might get messed up by the tilde in the formula, but I'm nto sure ...
Does this work:
mtcars %>% split(.$cyl) %>% map(function(x) {
c = lm(mpg ~ wt, data = x)
c$coefficients[2]
}) %>% unlist
4.wt 6.wt 8.wt
-5.647025 -2.780106 -2.192438
1) This could be done in straight dplyr:
mtcars %>%
group_by(cyl) %>%
summarize(wt = coef(lm(mpg ~ wt))[[2]], .groups = "drop")
giving:
# A tibble: 3 x 2
cyl wt
<dbl> <dbl>
1 4 -5.65
2 6 -2.78
3 8 -2.19
2) This variation also works:
mtcars %>%
group_by(cyl) %>%
summarize(wt = cov(mpg, wt) / var(wt), .groups = "drop")
3) Also consider this -- omit the [2] to get both coefficients.
library(nlme)
coef(lmList(mpg ~ wt | cyl, mtcars))[2]
giving:
wt
4 -5.647025
6 -2.780106
8 -2.192438

fit an `lm` model for every level of a factor

I am trying to write a function that iterates (or uses purrr::map()) through every level of a factor, and fits an lm() model for the subset of the data where the factor is equal to that level.
To make a simple reproducable example with mtcars, just say that I'd like a different lm model for each value of mtcars$gear. I'll start by making it a factor, because my real problem involves iteration through a factor:
library(tidyverse)
mtcars <- mtcars %>%
mutate(factor_gear = factor(gear))
I'd like the function to fit every level of factor_gear. The levels are given by:
levels(mtcars$factor_gear)
i.e.
[1] "3" "4" "5"
So the output I would be looking for would be:
fit1 <- lm(mpg ~ cyl, data = mtcars %>% filter(factor_gear=="3"))
fit2 <- lm(mpg ~ cyl, data = mtcars %>% filter(factor_gear=="4"))
fit3 <- lm(mpg ~ cyl, data = mtcars %>% filter(factor_gear=="5"))
fits <- list(fit1, fit2, fit3)
I've made a start on the function, but wasn't able to get it to work.
I thought that a function should:
get every level of of the factor into a vector
run an lm model for each level.
fit_each_level <- function(factor_variable) {
# trying to: 1. get every level of of the factor into a vector
factor_levels <- levels(df_cars$factor_variable)
# trying to: 2. run an lm model for each level.
for i in factor_levels {
fit <- mtcars %>% filter(factor_variable==i [# every value of segment_levels]) %>%
lm(mpg ~ cyl, data = . )
}
}
fit_each_level(factor_gear)
If the function worked well, I'd ultimately be able to do do it on another factor, eg:
mtcars <- mtcars %>%
mutate(factor_carb = factor(carb))
fit_each_level(factor_carb)
You can nest the dataframe and use map to apply lm for each factor_gear.
library(dplyr)
mtcars %>%
group_by(factor_gear) %>%
tidyr::nest() %>%
mutate(model = map(data, ~lm(mpg ~ cyl, data = .x)))
# factor_gear data model
# <fct> <list> <list>
#1 4 <tibble [12 × 11]> <lm>
#2 3 <tibble [15 × 11]> <lm>
#3 5 <tibble [5 × 11]> <lm>
In the new dplyr you can use cur_data to refer to current data in group which avoids the need of nest and map.
mtcars %>%
group_by(factor_gear) %>%
summarise(model = list(lm(mpg ~ cyl, data = cur_data())))
Make sure you have the latest version of dplyr (1.0.0). Then you can use:
model_coefs <- function(formula, data) {
coefs <- lm(formula, data)$coefficients
data.frame(coef = names(coefs), value = coefs)
}
mtcars %>%
dplyr::mutate(factor_gear = factor(gear)) %>%
dplyr::nest_by(factor_gear) %>%
dplyr::summarise(model_coefs(mpg ~ cyl, data)) %>%
tidyr::pivot_wider(names_from = coef, values_from = value)
# A tibble: 3 x 3
# Groups: factor_gear [3]
factor_gear `(Intercept)` cyl
<fct> <dbl> <dbl>
1 3 29.8 -1.83
2 4 41.3 -3.59
3 5 40.6 -3.2

Pass a list of variable names to a function using {{foo}}

Problem
I would like to know how to pass a list of variable names to a purrr::map2 function for the purpose of iterating over a separate data frame.
The input_table$key variable below contains mpg and disp from the mtcars dataset. I think the names of the variables are being passed as character strings rather than variable names. The question is how I can change that so that my function recognises that they are variable names(?).
In this example I am trying to sum all of the values in the mtcars variables mpg and disp that fall below a set of numeric thresholds. Those variables from mtcars and the relevant thresholds are contained in input_table (below).
Ideal result
percentile key value sum_y
<fct> <chr> <dbl> <dbl>
1 0.5 mpg 19.2 266.5
2 0.9 mpg 30.1 515.8
3 0.99 mpg 33.4 609.0
4 1 mpg 33.9 642.9
5 ... ... ... ...
Attempt
library(dplyr)
library(purrr)
library(tidyr)
# Arrange a generic example
# Replicating my data structure
input_table <- mtcars %>%
as_tibble() %>%
select(mpg, disp) %>%
map_df(quantile, probs = c(0.5, 0.90, 0.99, 1)) %>%
mutate(
percentile = factor(c(0.5, 0.90, 0.99, 1))
) %>%
select(
percentile, mpg, disp
) %>%
gather(key, value, -percentile)
# Defining the function
test_func <- function(label_desc, threshold) {
mtcars %>%
select({{label_desc}}) %>%
filter({{label_desc}} <= {{threshold}}) %>%
summarise(
sum_y = sum(as.numeric({{label_desc}}), na.rm = T)
)
}
# Demo'ing that it works for a single variable and threshold value
test_func(label_desc = mpg, threshold = 19.2)
# This is where I am having trouble
# Trying to iterate over multiple (mpg, disp) variables
map2(input_table$key, input_table$value, ~test_func(label_desc = .x, threshold = .y))
The issue is curly-curly ({{}}) is used for unquoted variables as you are using in your first attempt. In your second attempt you are passing quoted variables to which the curly-curly operator does not work. A simple fix would be to use _at variants of dplyr which accepts quoted arguments.
test_func <- function(label_desc, threshold) {
mtcars %>%
filter_at(label_desc, any_vars(. <= threshold)) %>%
summarise_at(label_desc, sum)
}
purrr::map2(input_table$key, input_table$value, test_func)
#[[1]]
# mpg
#1 266.5
#[[2]]
# mpg
#1 515.8
#[[3]]
# mpg
#1 609
#[[4]]
# mpg
#1 642.9
#[[5]]
# disp
#1 1956.7
#.....

ANOVA F statistic value in dplyr group by

I can summarize the mean by groups using
t(mtcars %>%
group_by(gear) %>%
dplyr::summarize(Mean_Mpg = mean(mpg, na.rm=TRUE),
StdD_Mpg = sd(mpg, na.rm=TRUE)
))
gear 3 4 5
Mean_Mpg 16.106667 24.533333 21.380000
StdD_Mpg 3.371618 5.276764 6.658979
I know summary(aov(gear ~ mpg , mtcars)) will output the results from ANOVA test includign the F Statistic.
Df Sum Sq Mean Sq F value Pr(>F)
mpg 1 3.893 3.893 8.995 0.0054 **
Residuals 30 12.982 0.433
Also chisq.test(table(mtcars$gear,mtcars$carb)) will output the results from Chi.Square test.
Pearson's Chi-squared test
X-squared = 16.518, df = 10, p-value = 0.08573
What I am trying to do is produce an output like this below, where I am combining the mean, standard deviation and F Statistic value from ANOVA, X-Squared test statistic.
gear 3 4 5 Test-Statistic Test
Mpg (Mean) 16.106667 24.533333 21.380000 8.995 ANOVA
(StdD) 3.371618 5.276764 6.658979
Carb(N) 16.518 Chi.Square
3 4 0
4 4 2
3 0 0
5 4 1
0 0 1
0 0 1
I am not sure how to do put together a table like this this by combining the mean,standard deviation, F Statistic, Chiq.Square statistic values etc. I would welcome any help from the community on formatting the results like this.
One option is to think about all the results you want, and how to manipulate them in order to have a same structure. Then, use bind_rows() for instance, to gather all results in a same table.
The functions group_by() and summarise() able to calculate mean (and others) for severals variables (and the result is a data.frame), whereas the function apply() allow to apply a same function, or a combinaison of functions (like summary(aov(...))) to several variables. The result of the second is a vector.
library(tidyverse)
# mean (± sd) of x per group
mtcars %>%
group_by(gear) %>%
summarise_at(
vars(mpg, carb),
funs(paste0(round(mean(.), 2), '(±', round(sd(.) / sqrt(n()), 1), ')'))
) %>%
mutate(gear = as.character(gear)) %>%
# add ANOVA: gear ~ x
bind_rows(
c(gear = 'ANOVA',
apply(mtcars %>% select(mpg, carb), 2,
function(x) summary(aov(mtcars$gear ~ x))[[1]]$`F value`[1] %>% round(3) %>% as.character()
))
) %>%
# add Chi-Square: gear ~ x
bind_rows(
c(gear = 'CHI-SQUARE',
apply(mtcars %>% select(mpg, carb), 2,
function(x) chisq.test(table(mtcars$gear, x))$statistic %>% round(3) %>% as.character()
))
)
# # A tibble: 5 x 3
# gear mpg carb
# <chr> <chr> <chr>
# 1 3 16.11(±0.9) 2.67(±0.3)
# 2 4 24.53(±1.5) 2.33(±0.4)
# 3 5 21.38(±3) 4.4(±1.2)
# 4 ANOVA 8.995 2.436
# 5 CHI-SQUARE 54.667 16.518

Using dplyr() to retrieve model object created via group_by() and do()

I'm trying to use dplyr and the pipe operator (%>%) to retrieve model objects stored in a dataframe.
With example data
library(dplyr)
set.seed(256)
dat <-
data.frame(x = rnorm(100),
y = rnorm(100, 10),
spec = sample(c("1", "2"), 100, TRUE)) %>%
group_by(spec) %>%
do(lm = lm(y ~ x, data = .))
I can subset and retrieve an actual model object
> dat$lm[dat$spec == "1"][[1]]
Call:
lm(formula = y ~ x, data = .)
Coefficients:
(Intercept) x
9.8171 -0.2292
> dat$lm[dat$spec == "1"][[1]] %>% class()
[1] "lm
But I think this is an inelegant way of retrieving the lm() model object contained therein, especially given that the rest of my code is structured the "dplyr way". I'd like to use dplyr but I can't figure out how. For example, using
dat %>% filter(spec == "1") %>% select(lm)
doesn't work as it returns
Source: local data frame [1 x 1]
Groups: <by row>
# A tibble: 1 x 1
lm
<list>
1 <S3: lm>
and
dat %>% filter(spec == "1") %>% .$lm
only gets me to the first object in list, e.g.,
> dat %>% filter(spec == "1") %>% .$lm
[[1]]
Call:
lm(formula = y ~ x, data = .)
Coefficients:
(Intercept) x
10.01495 -0.07438
I can't figure out a way to get to the actual model object in the dat with dplyr. Certainly, I could use broom and tidy() to condense everything
library(broom)
tidy(dat, lm)
but this still doesn't return the actual model object:
> tidy(dat, lm)
# A tibble: 4 x 6
# Groups: spec [2]
spec term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 (Intercept) 10.0 0.120 83.3 1.91e-54
2 1 x - 0.0744 0.111 - 0.671 5.05e- 1
3 2 (Intercept) 9.86 0.131 75.0 1.42e-50
4 2 x - 0.0793 0.148 - 0.535 5.95e- 1
I can even use dplyr to summarise() the output from a do() call and retrieve the coefficients from the models, but this still doesn't give me the model object itself:
dat %>%
select(spec) %>%
bind_cols(dat %>%
summarize(lm_i = coefficients(lm)[[1]],
lm_s = coefficients(lm)[[2]]))
Is there a dplyr way to retrieve the actual model object from models created with do()?
do returns a list column, so to extract its individual elements, you need to use list subsetting. There are various ways to do that, but in the tidyverse, purrr::pluck is a nice option to extract a single [possibly deeply nested] element:
library(tidyverse)
dat %>% pluck('lm', 1)
#>
#> Call:
#> lm(formula = y ~ x, data = .)
#>
#> Coefficients:
#> (Intercept) x
#> 10.01495 -0.07438
It's mostly equivalent to [[ subsetting, i.e.
dat[['lm']][[1]]
To get what you have to work, you need to keep subsetting, as .$lm returns the list column, which in this case is a list of a model. .[[1]] (akin to the 1 above) extracts the model from the list:
dat %>% filter(spec == "1") %>% .$lm %>% .[[1]]
or a hybrid approach, if you like:
dat %>% filter(spec == "1") %>% pluck('lm', 1)
or use pull to extract the column with NSE semantics:
dat %>% filter(spec == "1") %>% pull(lm) %>% pluck(1)
All return the same thing.

Resources