Error in Grouped regression using map function - r

Grouped regression is running well model1 with "do". But recently, it is told that do is superseded and suggested to use "across" but no example is given in the help file. Model2 is given in "do" help, and it is running well without "map" or "across". I don't understand how the regression is looping over those groups without map. When I tried using map in model3, I am getting errors. Model4 is given in Hadley's book, R for data science using split and working well. How to tell map function to consider the list "data". Any suggestions?
library(purrr)
#> Warning: package 'purrr' was built under R version 3.6.3
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.3
#> Warning: package 'ggplot2' was built under R version 3.6.3
#> Warning: package 'tidyr' was built under R version 3.6.3
#> Warning: package 'dplyr' was built under R version 3.6.3
#> Warning: package 'stringr' was built under R version 3.6.3
#> Warning: package 'forcats' was built under R version 3.6.3
model1 = mtcars %>%
group_by(cyl) %>%
do(mod = lm(mpg ~ disp, data = .))
model1
#> # A tibble: 3 x 2
#> # Rowwise:
#> cyl mod
#> <dbl> <list>
#> 1 4 <lm>
#> 2 6 <lm>
#> 3 8 <lm>
## from "do" help file
model2 = mtcars %>%
nest_by(cyl) %>%
mutate(mod = list(lm(mpg ~ disp, data = data)))
model2
#> # A tibble: 3 x 3
#> # Rowwise: cyl
#> cyl data mod
#> <dbl> <list<tbl_df[,10]>> <list>
#> 1 4 [11 x 10] <lm>
#> 2 6 [7 x 10] <lm>
#> 3 8 [14 x 10] <lm>
## using map
model3 = mtcars %>% nest_by(cyl) %>%
mutate(fit = map(data, ~lm(mpg ~ disp, data = .)))
#> Error: Problem with `mutate()` input `fit`.
#> x numeric 'envir' arg not of length one
#> i Input `fit` is `map(data, ~lm(mpg ~ disp, data = .))`.
#> i The error occured in row 1.
##model4
model4 = mtcars %>%
split(.$cyl) %>%
map(~lm(mpg ~ disp, data = .))
model4
#> $`4`
#>
#> Call:
#> lm(formula = mpg ~ disp, data = .)
#>
#> Coefficients:
#> (Intercept) disp
#> 40.8720 -0.1351
#>
#>
#> $`6`
#>
#> Call:
#> lm(formula = mpg ~ disp, data = .)
#>
#> Coefficients:
#> (Intercept) disp
#> 19.081987 0.003605
#>
#>
#> $`8`
#>
#> Call:
#> lm(formula = mpg ~ disp, data = .)
#>
#> Coefficients:
#> (Intercept) disp
#> 22.03280 -0.01963
Created on 2020-08-02 by the reprex package (v0.3.0)

It could be an issue with rowwise attribute, we could ungroup
library(dplyr)
library(purrr)
mtcars %>%
nest_by(cyl) %>% # // creates the rowwise attribute
ungroup %>% # // remove the rowwise
mutate(fit = map(data, ~lm(mpg ~ disp, data = .)))
# A tibble: 3 x 3
# cyl data fit
# <dbl> <list<tbl_df[,10]>> <list>
#1 4 [11 × 10] <lm>
#2 6 [7 × 10] <lm>
#3 8 [14 × 10] <lm>

Related

Mean of a variable after nesting the dataframe

I am trying to find the mean of the variable disp in mtcars dataset after nesting it by cyl. I am able to get the result after nest_by but not with group_nest. Please explain what the rowwise is doing it differently here.
require(tidyverse)
# working
mtcars %>% nest_by(cyl) %>% mutate(avg = mean(data$disp))
#> # A tibble: 3 × 3
#> # Rowwise: cyl
#> cyl data avg
#> <dbl> <list<tibble[,10]>> <dbl>
#> 1 4 [11 × 10] 105.
#> 2 6 [7 × 10] 183.
#> 3 8 [14 × 10] 353.
# not working
mtcars %>% group_nest(cyl) %>%
mutate(avg = mean(data$disp))
#> Error in `mutate()`:
#> ! Problem while computing `avg = mean(data$disp)`.
#> Caused by error:
#> ! Corrupt x: no names
#> Backtrace:
#> ▆
#> 1. ├─mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
#> 2. ├─dplyr::mutate(., avg = mean(data$disp))
#> 3. ├─dplyr:::mutate.data.frame(., avg = mean(data$disp))
#> 4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
#> 5. │ ├─base::withCallingHandlers(...)
#> 6. │ └─mask$eval_all_mutate(quo)
#> 7. ├─base::mean(data$disp)
#> 8. ├─data$disp
#> 9. ├─vctrs:::`$.vctrs_list_of`(data, disp)
#> 10. └─base::.handleSimpleError(`<fn>`, "Corrupt x: no names", base::quote(NULL))
#> 11. └─dplyr (local) h(simpleError(msg, call))
#> 12. └─rlang::abort(...)
Created on 2022-10-26 with reprex v2.0.2
rowwise changes the behavior of subsequent verbs, namely instead of operating on an entire column they will now operate only on values in a given row.
This works because the data in mutate refers to a single dataframe (due to rowwise provided by nest_by)
library(dplyr)
library(purrr)
mtcars %>% nest_by(cyl) %>% mutate(avg = mean(data$disp))
#> # A tibble: 3 × 3
#> # Rowwise: cyl
#> cyl data avg
#> <dbl> <list<tibble[,10]>> <dbl>
#> 1 4 [11 × 10] 105.
#> 2 6 [7 × 10] 183.
#> 3 8 [14 × 10] 353.
This will not work because data refers to a list of dataframes, and disp is not a name in that list
mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
#> Error in `mutate()`:
#> ! Problem while computing `avg = mean(data$disp)`.
#> Caused by error:
#> ! Corrupt x: no names
#> Backtrace:
#> ▆
#> 1. ├─mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
#> 2. ├─dplyr::mutate(., avg = mean(data$disp))
#> 3. ├─dplyr:::mutate.data.frame(., avg = mean(data$disp))
#> 4. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
#> 5. │ ├─base::withCallingHandlers(...)
#> 6. │ └─mask$eval_all_mutate(quo)
#> 7. ├─base::mean(data$disp)
#> 8. ├─data$disp
#> 9. ├─vctrs:::`$.vctrs_list_of`(data, disp)
#> 10. └─base::.handleSimpleError(`<fn>`, "Corrupt x: no names", base::quote(NULL))
#> 11. └─dplyr (local) h(simpleError(msg, call))
#> 12. └─rlang::abort(...)
You may obtain an equivalent calculation by e.g. mapping over the list of dataframes, to apply a function to each dataframe in the list
mtcars %>% group_nest(cyl) %>% mutate(avg = map_dbl(data, ~ mean(.x$disp)))
#> # A tibble: 3 × 3
#> cyl data avg
#> <dbl> <list<tibble[,10]>> <dbl>
#> 1 4 [11 × 10] 105.
#> 2 6 [7 × 10] 183.
#> 3 8 [14 × 10] 353.
Created on 2022-10-26 with reprex v2.0.2
We could use map to loop over the list as there is no rowwise grouping with group_nest
library(dplyr)
library(purrr)
mtcars %>%
group_nest(cyl) %>%
mutate(avg = map_dbl(data, ~ mean(.x$disp)))
-output
# A tibble: 3 × 3
cyl data avg
<dbl> <list<tibble[,10]>> <dbl>
1 4 [11 × 10] 105.
2 6 [7 × 10] 183.
3 8 [14 × 10] 353.
According to ?group_nest
The primary use case for group_nest() is with already grouped data frames, typically a result of group_by().
where as with ?nest_by
nest_by() is closely related to group_by(). However, instead of storing the group structure in the metadata, it is made explicit in the data, giving each group key a single row along with a list-column of data frames that contain all the other data.
> library(pacman)
> p_load(tidyverse)
> # working
> mtcars %>% nest_by(cyl) %>% class()
[1] "rowwise_df" "tbl_df" "tbl" "data.frame"
> mtcars %>% nest_by(cyl) %>% mutate(avg = mean(data$disp))
# A tibble: 3 × 3
# Rowwise: cyl
cyl data avg
<dbl> <list<tibble[,10]>> <dbl>
1 4 [11 × 10] 105.
2 6 [7 × 10] 183.
3 8 [14 × 10] 353.
> # not working
> mtcars %>% group_nest(cyl) %>% class()
[1] "tbl_df" "tbl" "data.frame"
> mtcars %>% group_nest(cyl) %>% mutate(avg = mean(data$disp))
Error in `mutate()`:
! Problem while computing `avg = mean(data$disp)`.
Caused by error:
! Corrupt x: no names
Run `rlang::last_error()` to see where the error occurred.
The nest_by call yields a rowwise_df which is amenable to the next step in the pipe, whereas group_nest yields a plain old tbl_df, hence the difference

Using purrr:map2 to perform regression where the predictor and the criterion are stored in different objects

for the purposes of this question, let's create the following setup:
mtcars %>%
group_split(carb) %>%
map(select, mpg) -> criterion
mtcars %>%
group_split(carb) %>%
map(select, qsec) -> predictor
This code will create two lists of length 6. What I want to do is to perform 6 linear regressions within each of these 6 groups. I read about the map2 function and I thought that the code should look like this:
map2(criterion, predictor, lm(criterion ~ predictor))
But that doesn't seem to work. So in which way could this be done?
simplify2array (you need a list of vectors, not a list of data frames) and use a lambda-function with ~:
map2(simplify2array(criterion), simplify2array(predictor), ~ lm(.x ~ .y))
While the direct answer to your question is already given, note that we can also use dplyr::nest_by() and then proceed automatically rowwise.
Now your models are stored in the mod column and we can use broom::tidy etc. to work with the models.
library(dplyr)
library(tidyr)
mtcars %>%
nest_by(carb) %>%
mutate(mod = list(lm(mpg ~ qsec, data = data)),
res = list(broom::tidy(mod))) %>%
unnest(res) %>%
filter(term != "(Intercept)")
#> # A tibble: 6 x 8
#> # Groups: carb [6]
#> carb data mod term estimate std.error statistic p.value
#> <dbl> <list<tibble[,10]>> <list> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 [7 x 10] <lm> qsec -1.26 4.51 -0.279 0.791
#> 2 2 [10 x 10] <lm> qsec 0.446 0.971 0.460 0.658
#> 3 3 [3 x 10] <lm> qsec -2.46 2.41 -1.02 0.493
#> 4 4 [10 x 10] <lm> qsec 0.0597 0.991 0.0602 0.953
#> 5 6 [1 x 10] <lm> qsec NA NA NA NA
#> 6 8 [1 x 10] <lm> qsec NA NA NA NA
Created on 2022-09-30 by the reprex package (v2.0.1)

No tidy method for objects of class LiblineaR

I have fitted text data based on regression and LiblineaR engine. And I want to `tidy()` my results. I have also installed the dev version of `broom`.
But I always get an error. `ERROR: No tidy method for objects of class LiblineaR`
> svm_fit %>%
+ pull_workflow_fit() %>%
+ tidy()
ERROR: No tidy method for objects of class LiblineaR
We just merged in support for the tidy() method for parsnip models fitted with the LiblineaR engine, so if you install from GitHub, you should be able to have this feature now:
devtools::install_github("tidymodels/parsnip")
Here is a demo of how it works:
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
data(two_class_dat, package = "modeldata")
example_split <- initial_split(two_class_dat, prop = 0.99)
example_train <- training(example_split)
example_test <- testing(example_split)
rec <- recipe(Class ~ ., data = example_train) %>%
step_normalize(all_numeric_predictors())
spec1 <- svm_linear() %>%
set_engine("LiblineaR") %>%
set_mode("classification")
spec2 <- logistic_reg(penalty = 0.1, mixture = 1) %>%
set_engine("LiblineaR") %>%
set_mode("classification")
wf <- workflow() %>%
add_recipe(rec)
wf %>%
add_model(spec1) %>%
fit(example_train) %>%
tidy()
#> # A tibble: 3 x 2
#> term estimate
#> <chr> <dbl>
#> 1 A 0.361
#> 2 B -0.966
#> 3 Bias 0.113
wf %>%
add_model(spec2) %>%
fit(example_train) %>%
tidy()
#> # A tibble: 3 x 2
#> term estimate
#> <chr> <dbl>
#> 1 A 1.06
#> 2 B -2.76
#> 3 Bias 0.329
svm_linear() %>%
set_engine("LiblineaR") %>%
set_mode("regression") %>%
fit(mpg ~ ., data = mtcars) %>%
tidy()
#> # A tibble: 11 x 2
#> term estimate
#> <chr> <dbl>
#> 1 cyl 0.141
#> 2 disp -0.0380
#> 3 hp 0.0415
#> 4 drat 0.226
#> 5 wt 0.0757
#> 6 qsec 1.06
#> 7 vs 0.0648
#> 8 am 0.0479
#> 9 gear 0.219
#> 10 carb 0.00861
#> 11 Bias 0.0525
Created on 2021-04-22 by the reprex package (v2.0.0)

PCA - how to visualize that all the variable are in different / same scale

I am working with the dataset uscrime but this question applied to any well-known dataset like cars.
After to googling I found extremely useful to standardize my data, considering that PCA finds new directions based on covariance matrix of original variables, and covariance matrix is sensitive to standardization of variables.
Nevertheless, I found "It is not necessary to standardize the variables, if all the variables are in same scale."
To standardize the variable I am using the function:
z_uscrime <- (uscrime - mean(uscrime)) / sd(uscrime)
Prior to standardize my data, how to check if all the variables are in the same scale or not?
Proving my point that you can standardize your data however many times you want
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
simple_recipe <- recipe(mpg ~ .,data = mtcars) %>%
step_center(everything()) %>%
step_scale(everything())
mtcars2 <- simple_recipe %>%
prep() %>%
juice()
simple_recipe2 <- recipe(mpg ~ .,data = mtcars2) %>%
step_center(everything()) %>%
step_scale(everything())
mtcars3 <- simple_recipe2 %>%
prep() %>%
juice()
all.equal(mtcars2,mtcars3)
#> [1] TRUE
mtcars2 %>%
summarise(across(everything(),.fns = list(mean = ~ mean(.x),sd = ~sd(.x)))) %>%
pivot_longer(everything(),names_pattern = "(.*)_(.*)",names_to = c("stat", ".value"))
#> # A tibble: 11 x 3
#> stat mean sd
#> <chr> <dbl> <dbl>
#> 1 cyl -1.47e-17 1
#> 2 disp -9.08e-17 1
#> 3 hp 1.04e-17 1
#> 4 drat -2.92e-16 1
#> 5 wt 4.68e-17 1.00
#> 6 qsec 5.30e-16 1
#> 7 vs 6.94e-18 1.00
#> 8 am 4.51e-17 1
#> 9 gear -3.47e-18 1.00
#> 10 carb 3.17e-17 1.00
#> 11 mpg 7.11e-17 1
mtcars3 %>%
summarise(across(everything(),.fns = list(mean = ~ mean(.x),sd = ~sd(.x)))) %>%
pivot_longer(everything(),names_pattern = "(.*)_(.*)",names_to = c("stat", ".value"))
#> # A tibble: 11 x 3
#> stat mean sd
#> <chr> <dbl> <dbl>
#> 1 cyl -1.17e-17 1
#> 2 disp -1.95e-17 1
#> 3 hp 9.54e-18 1
#> 4 drat 1.17e-17 1
#> 5 wt 3.26e-17 1
#> 6 qsec 1.37e-17 1
#> 7 vs 4.16e-17 1
#> 8 am 4.51e-17 1
#> 9 gear 0. 1
#> 10 carb 2.60e-18 1
#> 11 mpg 4.77e-18 1
Created on 2020-06-07 by the reprex package (v0.3.0)

Extract elements from nested list only using functions from purrr package

How do I extract elements from a nested list only using the purrr package? In this case I want to get a vector of intercepts after splitting a data.frame. I have accomplished what I need using lapply(), but I would like to use only functions purrr package.
library(purrr)
mtcars %>%
split(.$cyl) %>%
map( ~lm(mpg ~ wt, data = .)) %>% # shorthand NOTE: ~ lm
lapply(function(x) x[[1]] [1]) %>% # extract intercepts <==is there a purrr function for this line?
as_vector() # convert to vector
I have tried map() and at_depth() but nothing seemed to work for me.
The map functions have some shorthand coding for indexing nested lists. A helpful snippet from the help page:
To index deeply into a nested list, use multiple values; c("x", "y")
is equivalent to z[["x"]][["y"]].
So using code for nested indexes along with map_dbl, which reduces to a vector, you can simply do:
mtcars %>%
split(.$cyl) %>%
map(~lm(mpg ~ wt, data = .)) %>%
map_dbl(c(1, 1))
4 6 8
39.57120 28.40884 23.86803
I also found this blog post introducing purrr 0.1.0 useful, as it gave a few more example of the shorthand coding that I ended up using.
using tidy function from broom
library(purrr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(broom)
cyl_group<-mtcars %>% group_by(cyl) %>%
nest()
cyl_lm<-cyl_group %>% mutate(
mod=map(data,~lm(mpg ~ wt, data = .x))
) %>% mutate(coef=map(mod,~tidy(.x))) %>% unnest(coef)
cyl_lm
#> # A tibble: 6 x 8
#> # Groups: cyl [3]
#> cyl data mod term estimate std.error statistic p.value
#> <dbl> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 6 <tibble [7 x 10~ <lm> (Interce~ 28.4 4.18 6.79 1.05e-3
#> 2 6 <tibble [7 x 10~ <lm> wt -2.78 1.33 -2.08 9.18e-2
#> 3 4 <tibble [11 x 1~ <lm> (Interce~ 39.6 4.35 9.10 7.77e-6
#> 4 4 <tibble [11 x 1~ <lm> wt -5.65 1.85 -3.05 1.37e-2
#> 5 8 <tibble [14 x 1~ <lm> (Interce~ 23.9 3.01 7.94 4.05e-6
#> 6 8 <tibble [14 x 1~ <lm> wt -2.19 0.739 -2.97 1.18e-2
Created on 2020-08-19 by the reprex package (v0.3.0)

Resources