How to access/select nested data frame column with dplyr

I have the following data frame:
library(tidyverse)
iris %>%
dplyr::select(Species, Petal.Width) %>%
as_tibble()
Then I group by Species to get mean_se with the following line of code:
df <- iris %>%
dplyr::select(Species, Petal.Width) %>%
as_tibble() %>%
group_by(Species) %>%
mutate(ms = mean_se(Petal.Width))
It looks like this:
df
# A tibble: 150 x 3
# Groups: Species [3]
Species Petal.Width ms$y $ymin $ymax
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 0.2 0.246 0.231 0.261
2 setosa 0.2 0.246 0.231 0.261
3 setosa 0.2 0.246 0.231 0.261
4 setosa 0.2 0.246 0.231 0.261
5 setosa 0.2 0.246 0.231 0.261
6 setosa 0.4 0.246 0.231 0.261
7 setosa 0.3 0.246 0.231 0.261
8 setosa 0.2 0.246 0.231 0.261
9 setosa 0.2 0.246 0.231 0.261
10 setosa 0.1 0.246 0.231 0.261
However, when I try to select the ms$y and ms$ymax columns like this:
> df %>% dplyr::select(Species, ms$y, $ymax)
Error: unexpected '$' in "df %>% dplyr::select(Species, ms$y, $"
it fails. What is the correct way to do it?

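mean_se() returns a one-row data frame, so mutate() stores the result as a data frame column nested inside df (that is what ms$y, $ymin and $ymax mean in the printout). You can convert it to a regular, flat data frame as follows: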
library(tidyverse)
iris %>%
select(Species, Petal.Width) %>%
as_tibble() %>%
group_by(Species) %>%
mutate(ms = mean_se(Petal.Width)) %>%
ungroup -> tmp
df <- bind_cols(tmp %>% select(-ms), tmp$ms)
df
# A tibble: 150 x 5
# Species Petal.Width y ymin ymax
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 0.2 0.246 0.231 0.261
# 2 setosa 0.2 0.246 0.231 0.261
# 3 setosa 0.2 0.246 0.231 0.261
# 4 setosa 0.2 0.246 0.231 0.261
# 5 setosa 0.2 0.246 0.231 0.261
# 6 setosa 0.4 0.246 0.231 0.261
# 7 setosa 0.3 0.246 0.231 0.261
# 8 setosa 0.2 0.246 0.231 0.261
# 9 setosa 0.2 0.246 0.231 0.261
#10 setosa 0.1 0.246 0.231 0.261
# … with 140 more rows
Select the columns you need.
df %>% select(Species, y, ymax)
# A tibble: 150 x 3
# Species y ymax
# <fct> <dbl> <dbl>
# 1 setosa 0.246 0.261
# 2 setosa 0.246 0.261
# 3 setosa 0.246 0.261
# 4 setosa 0.246 0.261
# 5 setosa 0.246 0.261
# 6 setosa 0.246 0.261
# 7 setosa 0.246 0.261
# 8 setosa 0.246 0.261
# 9 setosa 0.246 0.261
#10 setosa 0.246 0.261
# … with 140 more rows
Another way, without creating the temporary variable tmp, is to wrap the result in a list and unnest it:
iris %>%
select(Species, Petal.Width) %>%
as_tibble() %>%
group_by(Species) %>%
mutate(ms = list(mean_se(Petal.Width))) %>%
unnest(ms) %>%
ungroup
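As a possible alternative, a data frame column such as ms can be flattened with tidyr::unpack(), or its inner columns can be reached with $ inside a dplyr verb. The following is a minimal sketch, assuming tidyr 1.0.0 or later (where unpack() is available) and ggplot2 for mean_se(); the names flat, picked and picked2 are only illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides mean_se()

df <- iris %>%
  as_tibble() %>%
  select(Species, Petal.Width) %>%
  group_by(Species) %>%
  mutate(ms = mean_se(Petal.Width)) %>%
  ungroup()

# Option 1: spread the inner columns of ms (y, ymin, ymax) into top-level columns
flat <- df %>% unpack(ms)
picked <- flat %>% select(Species, y, ymax)

# Option 2: pull the inner columns out directly with $
picked2 <- df %>% transmute(Species, y = ms$y, ymax = ms$ymax)
```

Both options should give the same three columns; unpack() is handier when all inner columns are wanted, while $ is enough for picking one or two.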

Related

Map mean_se group_by combination can't handle a factor group_by

I'd like to get grouped means and corresponding se using the mean_se function from ggplot, but adding a group_by breaks the function. I've had to resort to pivoting and a longer summarize/mutate pipe, but would like to figure out how to do it in one go without all the data manipulation up-front. My dataset has 14 columns of interest + 1 group_by, but I'll use all of iris (hence skipping a select() pipe) for reproducibility.
Compare the first (doesn't work but is what I want) to the rest:
iris %>% group_by(Species) %>% #Error
map(~(mean_se(.)))
iris %>% select(-Species) %>%
map(~(mean_se(.))) #global mean+se
iris %>% select(-Species) %>%
map_dfr(~(mean_se(.)), id = "Species") %>% broom::tidy() #Gives 3 but which is which and how so if Species is unselected?
iris %>%
map_dfr(~(mean_se(.))) %>% broom::tidy() #Error
iris %>%
map_dfr(~(mean_se(.)), id = "Species") %>% broom::tidy() #Also error
iris %>% #Runs but the
group_by(Species) %>% #output doesn't make sense,
group_modify(~ #there should only be 3 columns (y, ymin, ymax) not 13
.x %>%
map_dfc(mean_se))
map_dfr isn't respecting the .id command and I don't understand how I am getting 3 groups if the grouping variable has to be removed to avoid an Error in stats::var(x) : Calling var(x) on a factor x is defunct. Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

Alternatively:
iris %>%
group_by(Species) %>%
summarise(across(everything(), mean_se))
# A tibble: 3 x 5
Species Sepal.Length$y $ymin $ymax Sepal.Width$y $ymin $ymax Petal.Length$y $ymin $ymax Petal.Width$y $ymin $ymax
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.01 4.96 5.06 3.43 3.37 3.48 1.46 1.44 1.49 0.246 0.231 0.261
2 versicolor 5.94 5.86 6.01 2.77 2.73 2.81 4.26 4.19 4.33 1.33 1.30 1.35
3 virginica 6.59 6.50 6.68 2.97 2.93 3.02 5.55 5.47 5.63 2.03 1.99 2.06
or if you want that in longer form:
iris %>%
group_by(Species) %>%
summarise(across(everything(), mean_se)) %>%
pivot_longer(-Species)
# A tibble: 12 x 3
Species name value$y $ymin $ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Sepal.Length 5.01 4.96 5.06
2 setosa Sepal.Width 3.43 3.37 3.48
3 setosa Petal.Length 1.46 1.44 1.49
4 setosa Petal.Width 0.246 0.231 0.261
5 versicolor Sepal.Length 5.94 5.86 6.01
6 versicolor Sepal.Width 2.77 2.73 2.81
7 versicolor Petal.Length 4.26 4.19 4.33
8 versicolor Petal.Width 1.33 1.30 1.35
9 virginica Sepal.Length 6.59 6.50 6.68
10 virginica Sepal.Width 2.97 2.93 3.02
11 virginica Petal.Length 5.55 5.47 5.63
12 virginica Petal.Width 2.03 1.99 2.06
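The columns produced by across() with mean_se are data frame columns (hence the $y, $ymin, $ymax headers). If flat columns are preferred, one option is tidyr::unpack() with names_sep. This is a sketch under the assumption that tidyr 1.0.0 or later is installed; the name wide is illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides mean_se()

wide <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean_se)) %>%
  # every column except Species is a data frame column; flatten them and
  # prefix the inner names with the outer one, e.g. Petal.Width_ymax
  unpack(-Species, names_sep = "_")
```

The result has one row per species and ordinary numeric columns such as Sepal.Length_y or Petal.Width_ymax, which can then be selected by name as usual.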

You can do it as follows
library(tidyverse)
iris %>%
pivot_longer(!contains("Species"), names_to = "key", values_to="val") %>%
group_by(Species, key) %>%
group_modify(~ mean_se(.x$val))
output
# A tibble: 12 x 5
# Groups: Species, key [12]
Species key y ymin ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Petal.Length 1.46 1.44 1.49
2 setosa Petal.Width 0.246 0.231 0.261
3 setosa Sepal.Length 5.01 4.96 5.06
4 setosa Sepal.Width 3.43 3.37 3.48
5 versicolor Petal.Length 4.26 4.19 4.33
6 versicolor Petal.Width 1.33 1.30 1.35
7 versicolor Sepal.Length 5.94 5.86 6.01
8 versicolor Sepal.Width 2.77 2.73 2.81
9 virginica Petal.Length 5.55 5.47 5.63
10 virginica Petal.Width 2.03 1.99 2.06
11 virginica Sepal.Length 6.59 6.50 6.68
12 virginica Sepal.Width 2.97 2.93 3.02

We can split the data by Species and use nested map calls: the outer one loops over the list of per-species data frames, and the inner one applies mean_se to each column. Inside split, the . is used twice, once to drop the grouping column (.[-5] removes the last column, Species) and once to extract the grouping vector (.$Species), so the call is wrapped in {} to stop the pipe from also inserting . as the first argument.
library(dplyr)
library(purrr)
iris %>%
{split(.[-5], .$Species)} %>%
map_dfr(map_dfr, mean_se, .id = 'Species')
-output
Species y ymin ymax
1 setosa 5.006 4.9561504 5.0558496
2 setosa 3.428 3.3743922 3.4816078
3 setosa 1.462 1.4374402 1.4865598
4 setosa 0.246 0.2310962 0.2609038
5 versicolor 5.936 5.8630024 6.0089976
6 versicolor 2.770 2.7256222 2.8143778
7 versicolor 4.260 4.1935446 4.3264554
8 versicolor 1.326 1.2980335 1.3539665
9 virginica 6.588 6.4980730 6.6779270
10 virginica 2.974 2.9283921 3.0196079
11 virginica 5.552 5.4739503 5.6300497
12 virginica 2.026 1.9871586 2.0648414
Or using by
do.call(rbind, by(iris[-5], iris$Species, FUN = Vectorize(mean_se)))
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width
y 5.006 3.428 1.462 0.246
ymin 4.95615 3.374392 1.43744 0.2310962
ymax 5.05585 3.481608 1.48656 0.2609038
y 5.936 2.77 4.26 1.326
ymin 5.863002 2.725622 4.193545 1.298034
ymax 6.008998 2.814378 4.326455 1.353966
y 6.588 2.974 5.552 2.026
ymin 6.498073 2.928392 5.47395 1.987159
ymax 6.677927 3.019608 5.63005 2.064841
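The split/map output above does not say which row belongs to which variable. A small variation, sketched here, passes .id to the inner map_dfr as well so that the column names are kept. It assumes purrr and dplyr are installed (map_dfr() relies on dplyr and is marked as superseded in purrr 1.0.0, though still available); the column name "variable" and the object res are illustrative choices.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # provides mean_se()

res <- iris %>%
  split(.$Species) %>%
  map_dfr(
    # inner loop: one row per numeric column, labelled with the column name
    ~ map_dfr(.x[-5], mean_se, .id = "variable"),
    # outer loop: label each block with the species it came from
    .id = "Species"
  )
```

This should give twelve rows with the columns Species, variable, y, ymin and ymax.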

Slightly different syntax, but I would do the following:
iris %>%
pivot_longer(contains("."), names_to = "key", values_to = "value") %>%
group_by(Species, key) %>%
summarize(mean_se(value))
`summarise()` has grouped output by 'Species'. You can override using the `.groups` argument.
# A tibble: 12 × 5
# Groups: Species [3]
Species key y ymin ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Petal.Length 1.46 1.44 1.49
2 setosa Petal.Width 0.246 0.231 0.261
3 setosa Sepal.Length 5.01 4.96 5.06
4 setosa Sepal.Width 3.43 3.37 3.48
5 versicolor Petal.Length 4.26 4.19 4.33
6 versicolor Petal.Width 1.33 1.30 1.35
7 versicolor Sepal.Length 5.94 5.86 6.01
8 versicolor Sepal.Width 2.77 2.73 2.81
9 virginica Petal.Length 5.55 5.47 5.63
10 virginica Petal.Width 2.03 1.99 2.06
11 virginica Sepal.Length 6.59 6.50 6.68
12 virginica Sepal.Width 2.97 2.93 3.02

Why does map %>% as.data.frame give different result than map_df?

I'm new to R and trying to understand how to use map and map_df. Consider the following:
iris %>% split(.$Species) %>% map_df(function (x) apply(x[, 1:4], 2, mean))
And compare to
iris %>% split(.$Species) %>% map(function (x) apply(x[, 1:4], 2, mean)) %>% as.data.frame
The former gives the following output:
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.01 3.43 1.46 0.246
2 5.94 2.77 4.26 1.33
3 6.59 2.97 5.55 2.03
The latter gives:
setosa versicolor virginica
Sepal.Length 5.006 5.936 6.588
Sepal.Width 3.428 2.770 2.974
Petal.Length 1.462 4.260 5.552
Petal.Width 0.246 1.326 2.026
My question is: why? I would expect these two commands to give the same output. How can I get the second output with map_df function?

map_df() binds the list elements by rows, the same as map_dfr(), although this is not explicitly stated in its documentation. If you would like to bind by column, use map_dfc() instead. Note that the output is a tibble, in which the use of row names is discouraged, so the variable names are lost in the result below. The tibble documentation suggests ways to work with row names.
iris %>%
split(.$Species) %>%
map_dfc(function (x) apply(x[, 1:4], 2, mean))
# # A tibble: 4 x 3
# setosa versicolor virginica
# <dbl> <dbl> <dbl>
# 1 5.01 5.94 6.59
# 2 3.43 2.77 2.97
# 3 1.46 4.26 5.55
# 4 0.246 1.33 2.03
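If the variable names should survive, one possibility is to keep the base data.frame route and turn the row names into a real column. This is only a sketch: it assumes the tibble package for rownames_to_column(), swaps apply(..., 2, mean) for the equivalent colMeans(), and uses the illustrative names means and "variable".

```r
library(purrr)
library(tibble)

means <- iris %>%
  split(.$Species) %>%
  map(~ colMeans(.x[, 1:4])) %>%   # named list of named numeric vectors
  as.data.frame() %>%              # binds column-wise, vector names become row names
  rownames_to_column("variable")   # move the row names into a regular column
```

The result has the columns variable, setosa, versicolor and virginica, and can be passed to as_tibble() without losing the labels.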

First, you used split rather than group_split, which means you created a named list (one data frame per species, named after the species).
Second, for each of the three items in that list, you took the mean of the first four columns using apply.
After the mean calculation, what remains is again a three-item list, each item being a named vector holding the means of the four columns.
Now here comes the difference.
Because the result is a list, if you use map_df or map_dfr (the result is the same for both), the map function simply row-binds the output.
Using plain map, however, leaves the result as a named list, which as.data.frame binds column-wise.
library(tidyverse)
#1st case
iris %>% split(.$Species) %>% map_dfr(function (x) apply(x[, 1:4], 2, mean))
#> # A tibble: 3 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.01 3.43 1.46 0.246
#> 2 5.94 2.77 4.26 1.33
#> 3 6.59 2.97 5.55 2.03
#OR
iris %>% split(.$Species) %>% map_df(function (x) apply(x[, 1:4], 2, mean))
#> # A tibble: 3 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.01 3.43 1.46 0.246
#> 2 5.94 2.77 4.26 1.33
#> 3 6.59 2.97 5.55 2.03
#2nd case
iris %>% split(.$Species) %>% map(function (x) apply(x[, 1:4], 2, mean)) %>% as.data.frame
#> setosa versicolor virginica
#> Sepal.Length 5.006 5.936 6.588
#> Sepal.Width 3.428 2.770 2.974
#> Petal.Length 1.462 4.260 5.552
#> Petal.Width 0.246 1.326 2.026
#OR (with a slight difference of dropping row names)
iris %>% split(.$Species) %>% map_dfc(function (x) apply(x[, 1:4], 2, mean))
#> # A tibble: 4 x 3
#> setosa versicolor virginica
#> <dbl> <dbl> <dbl>
#> 1 5.01 5.94 6.59
#> 2 3.43 2.77 2.97
#> 3 1.46 4.26 5.55
#> 4 0.246 1.33 2.03
#If group_split would have been used instead
iris %>% group_split(Species) %>% map(function (x) apply(x[, 1:4], 2, mean)) %>% as.data.frame
#> c.Sepal.Length...5.006..Sepal.Width...3.428..Petal.Length...1.462..
#> Sepal.Length 5.006
#> Sepal.Width 3.428
#> Petal.Length 1.462
#> Petal.Width 0.246
#> c.Sepal.Length...5.936..Sepal.Width...2.77..Petal.Length...4.26..
#> Sepal.Length 5.936
#> Sepal.Width 2.770
#> Petal.Length 4.260
#> Petal.Width 1.326
#> c.Sepal.Length...6.588..Sepal.Width...2.974..Petal.Length...5.552..
#> Sepal.Length 6.588
#> Sepal.Width 2.974
#> Petal.Length 5.552
#> Petal.Width 2.026
#OR
iris %>% group_split(Species) %>% map_dfc(function (x) apply(x[, 1:4], 2, mean))
#> New names:
#> * NA -> ...1
#> * NA -> ...2
#> * NA -> ...3
#> # A tibble: 4 x 3
#> ...1 ...2 ...3
#> <dbl> <dbl> <dbl>
#> 1 5.01 5.94 6.59
#> 2 3.43 2.77 2.97
#> 3 1.46 4.26 5.55
#> 4 0.246 1.33 2.03
Thus, the default conversion of a list into a data frame is column-wise, and if the list is named, the item names are used as the column names of the output. If, on the other hand, you explicitly request row binding through map_df or map_dfr, the list items are bound row-wise, and the names of the list items are therefore not required.
iris %>% group_split(Species) %>% map_dfr(function (x) apply(x[, 1:4], 2, mean))
# A tibble: 3 x 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.01 3.43 1.46 0.246
2 5.94 2.77 4.26 1.33
3 6.59 2.97 5.55 2.03
Hope this is clear.
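When row-binding a named list, the names do not have to be thrown away: map_dfr() accepts an .id argument that stores them in a column. The sketch below assumes dplyr 1.0.0 or later (for across() and where()) and returns a one-row data frame per species instead of a named vector; by_species is an illustrative name.

```r
library(dplyr)
library(purrr)

by_species <- iris %>%
  split(.$Species) %>%
  map_dfr(
    ~ summarise(.x, across(where(is.numeric), mean)),  # one-row data frame per species
    .id = "Species"                                    # list names become a column
  )
```

With group_split() the list is unnamed, so .id would only record the positions 1, 2, 3 rather than the species names.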

Pass arguments to R custom function using map (purrr)

Using iris for reproducibility
library(tidyverse)
mean_by <- function(data,by,var1) {
data %>%
group_by({{by}}) %>%
summarise(avg=mean({{var1}}))
}
iris %>% mean_by(Species,Petal.Width) # This works
map_dfr(iris,mean_by,Species) # This doesn't work; I want to run this on all numeric columns in iris. How do I do that?
More importantly, a fundamental question: how do I pass arguments to a custom function using map_dfr in R?

Ideally you should use across :
library(dplyr)
library(purrr)
iris %>%
group_by(Species) %>%
summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) # extra arguments go inside the lambda; passing them via ... is deprecated in dplyr 1.1.0
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 setosa 5.01 3.43 1.46 0.246
#2 versicolor 5.94 2.77 4.26 1.33
#3 virginica 6.59 2.97 5.55 2.03
To use map_dfr for numeric variables we can change the function to :
mean_by <- function(data,by,var1) {
data %>%
group_by({{by}}) %>%
summarise(avg = mean(.data[[var1]]))
}
map_dfr(names(select(iris, where(is.numeric))),
mean_by, data = iris, by = Species)
# Species avg
# <fct> <dbl>
# 1 setosa 5.01
# 2 versicolor 5.94
# 3 virginica 6.59
# 4 setosa 3.43
# 5 versicolor 2.77
# 6 virginica 2.97
# 7 setosa 1.46
# 8 versicolor 4.26
# 9 virginica 5.55
#10 setosa 0.246
#11 versicolor 1.33
#12 virginica 2.03
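In the stacked output above it is not visible which rows belong to which variable. One way to keep that information, sketched here, is to name the input vector and pass .id to map_dfr. The sketch wraps the call in a lambda so that all arguments are spelled out explicitly; the names num_vars, res and the column "variable" are illustrative, and mean_by is repeated so the block runs on its own.

```r
library(dplyr)
library(purrr)

# same function as above: `by` is a bare column name, `var1` a column name as a string
mean_by <- function(data, by, var1) {
  data %>%
    group_by({{ by }}) %>%
    summarise(avg = mean(.data[[var1]]))
}

num_vars <- names(select(iris, where(is.numeric)))

res <- num_vars %>%
  set_names() %>%                                   # name each element after itself
  map_dfr(~ mean_by(iris, Species, .x), .id = "variable")
```

Each of the twelve rows is then labelled with the variable it summarises, in addition to Species and avg.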

tidyr: Gathering two values per key [duplicate]

I have a dataset with the mean and sd of each variable as columns, but I want to convert it into "long" format as so:
library(tidyverse)
iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))
#> # A tibble: 3 x 9
#> Species Sepal.Length_me~ Sepal.Width_mean Petal.Length_me~
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 5.01 3.43 1.46
#> 2 versic~ 5.94 2.77 4.26
#> 3 virgin~ 6.59 2.97 5.55
#> # ... with 5 more variables: Petal.Width_mean <dbl>,
#> # Sepal.Length_sd <dbl>, Sepal.Width_sd <dbl>, Petal.Length_sd <dbl>,
#> # Petal.Width_sd <dbl>
# Desired output:
#
# tribble(~Species, ~Variable, ~Mean, ~SD
# #-------------------------------
# ... )
I feel like tidyr::gather would be good to use here, however, I am not sure how the syntax would work for having two values per key. Or perhaps I need to use two gathers and column bind them?

To convert your post-summarise_all data you can do the following
df %>%
gather(key, val, -Species) %>%
separate(key, into = c("Variable", "metric"), sep = "_") %>%
spread(metric, val)
## A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
But it's actually faster & shorter to transform the data from wide to long right from the start
iris %>%
gather(Variable, val, -Species) %>%
group_by(Species, Variable) %>%
summarise(Mean = mean(val), SD = sd(val))
## A tibble: 12 x 4
## Groups: Species [?]
# Species Variable Mean SD
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322

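Here is an option with pivot_longer, which at the time of writing was only in the dev version of tidyr (it has been part of released versions since tidyr 1.0.0).
library(dplyr)
library(tidyr) #tidyr_0.8.3.9000
library(stringr) # for str_replace()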
df %>%
rename_at(-1, ~ str_replace(., "(.*)_(.*)", "\\2_\\1")) %>%
pivot_longer(-Species, names_to = c(".value", "Variable"), names_sep = "_")
# A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Sepal.Length 5.01 0.352
# 2 setosa Sepal.Width 3.43 0.379
# 3 setosa Petal.Length 1.46 0.174
# 4 setosa Petal.Width 0.246 0.105
# 5 versicolor Sepal.Length 5.94 0.516
# 6 versicolor Sepal.Width 2.77 0.314
# 7 versicolor Petal.Length 4.26 0.470
# 8 versicolor Petal.Width 1.33 0.198
# 9 virginica Sepal.Length 6.59 0.636
#10 virginica Sepal.Width 2.97 0.322
#11 virginica Petal.Length 5.55 0.552
#12 virginica Petal.Width 2.03 0.275
data
data(iris)
df <- iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))
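With a released tidyr (1.0.0 or later), the renaming step can probably be skipped, because the ".value" marker may sit in either position of names_to. The following sketch assumes the summary columns are named like Sepal.Length_mean, with exactly one underscore; it rebuilds df with across() rather than summarize_all(), and long is an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

long <- df %>%
  pivot_longer(
    -Species,
    # the part before "_" becomes the Variable column,
    # the part after "_" (mean / sd) becomes the output column names
    names_to = c("Variable", ".value"),
    names_sep = "_"
  )
```

This should give twelve rows with the columns Species, Variable, mean and sd.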

How to unpack the group_by() do() output from dplyr pipe

I have the following code that performs the summary() for the iris$Petal.Width grouped by species
library(tidyverse)
dat <- iris %>%
as.tibble() %>%
select(Petal.Width, Species) %>%
group_by(Species) %>%
do(fn = summary(.$Petal.Width))
dat
#> Source: local data frame [3 x 2]
#> Groups: <by row>
#>
#> # A tibble: 3 x 2
#> Species fn
#> * <fct> <list>
#> 1 setosa <S3: summaryDefault>
#> 2 versicolor <S3: summaryDefault>
#> 3 virginica <S3: summaryDefault>
What I want to do is to unpack the fn column into the following ( I do this by hand)
Species Min. 1st Qu. Median Mean 3rd Qu. Max.
setosa 0.100 0.200 0.200 0.246 0.300 0.600
versicolor 1.000 1.200 1.300 1.326 1.500 1.800
virginica 1.400 1.800 2.000 2.026 2.300 2.500
How can I do it?
I tried dat %>% ungroup(fn), but that failed.

Presently, explicit list columns are preferred as an idiom over do. In this case, it might look like
library(tidyverse)
iris %>%
group_by(Species) %>%
summarise(summary = list(broom::tidy(summary(Petal.Width)))) %>%
unnest(summary)
#> # A tibble: 3 x 7
#> Species minimum q1 median mean q3 maximum
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 0.100 0.200 0.200 0.246 0.300 0.600
#> 2 versicolor 1.00 1.20 1.30 1.33 1.50 1.80
#> 3 virginica 1.40 1.80 2.00 2.03 2.30 2.50
If you like, this is one of those cases where the base R idiom is more concise and readable:
aggregate(Petal.Width ~ Species, iris, summary)
#> Species Petal.Width.Min. Petal.Width.1st Qu. Petal.Width.Median
#> 1 setosa 0.100 0.200 0.200
#> 2 versicolor 1.000 1.200 1.300
#> 3 virginica 1.400 1.800 2.000
#> Petal.Width.Mean Petal.Width.3rd Qu. Petal.Width.Max.
#> 1 0.246 0.300 0.600
#> 2 1.326 1.500 1.800
#> 3 2.026 2.300 2.500
However, note that if you call str on the result, it shows Petal.Width is actually a matrix column (which isn't possible in tibbles, but is in data.frames). To extract it, tack on %>% {cbind(.[1], .[[2]])} or equivalent.
skimr::skim is another option which respects dplyr grouping:
library(dplyr)
iris %>% group_by(Species) %>% skimr::skim(Petal.Width)
#> Skim summary statistics
#> n obs: 150
#> n variables: 5
#> group variables: Species
#>
#> Variable type: numeric
#> Species variable missing complete n mean sd p0 p25 p50 p75 p100
#> setosa Petal.Width 0 50 50 0.25 0.11 0.1 0.2 0.2 0.3 0.6
#> versicolor Petal.Width 0 50 50 1.33 0.2 1 1.2 1.3 1.5 1.8
#> virginica Petal.Width 0 50 50 2.03 0.27 1.4 1.8 2 2.3 2.5
#> hist
#> ▂▇▁▂▂▁▁▁
#> ▆▃▇▅▆▂▁▁
#> ▂▁▇▃▃▆▅▃
What it displays is actually a print method for underlying long data. skimr is built to keep working with dplyr methods, but at some point you may need to hack the underlying data out. The documentation explains this pretty well.

Try
dat <- iris %>%
as_tibble() %>%
select(Petal.Width, Species) %>%
group_by(Species) %>%
do(fn = summary(.$Petal.Width) %>% as.matrix() %>% t() %>% as.data.frame())
dat %>% unnest(fn)
# # A tibble: 3 x 7
# Species Min. `1st Qu.` Median Mean `3rd Qu.` Max.
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 0.100 0.200 0.200 0.246 0.300 0.600
# 2 versicolor 1.00 1.20 1.30 1.33 1.50 1.80
# 3 virginica 1.40 1.80 2.00 2.03 2.30 2.50
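The same unpacking can also be sketched without do(), by letting summarise() return a one-row data frame per group. This assumes dplyr 1.0.0 or later, where an unnamed data frame result in summarise() is spread into columns; unclass() drops the summaryDefault class so that t() and as.data.frame() see a plain named vector, and out is an illustrative name.

```r
library(dplyr)

out <- iris %>%
  group_by(Species) %>%
  # summary() gives a named vector (Min., 1st Qu., ...); turn it into a
  # one-row data frame so each statistic becomes its own column
  summarise(as.data.frame(t(unclass(summary(Petal.Width)))))
```

The result has one row per species and one column per summary statistic, so no unnest() step is needed.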
