Pass arguments to R custom function using map (purrr) - r

Using iris for reproducibility
library(tidyverse)
mean_by <- function(data,by,var1) {
data %>%
group_by({{by}}) %>%
summarise(avg=mean({{var1}}))
}
iris %>% mean_by(Species,Petal.Width) # This works
map_dfr(iris,mean_by,Species) # This doesn't work. I want to run mean_by on all numeric columns in iris; how do I do that?
More importantly, a fundamental question: how do I pass arguments to a custom function using map_dfr in R?

Ideally you should use across :
library(dplyr)
library(purrr)
iris %>%
group_by(Species) %>%
summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 setosa 5.01 3.43 1.46 0.246
#2 versicolor 5.94 2.77 4.26 1.33
#3 virginica 6.59 2.97 5.55 2.03
To use map_dfr for numeric variables we can change the function to :
mean_by <- function(data,by,var1) {
data %>%
group_by({{by}}) %>%
summarise(avg = mean(.data[[var1]]))
}
map_dfr(names(select(iris, where(is.numeric))),
mean_by, data = iris, by = Species)
# Species avg
# <fct> <dbl>
# 1 setosa 5.01
# 2 versicolor 5.94
# 3 virginica 6.59
# 4 setosa 3.43
# 5 versicolor 2.77
# 6 virginica 2.97
# 7 setosa 1.46
# 8 versicolor 4.26
# 9 virginica 5.55
#10 setosa 0.246
#11 versicolor 1.33
#12 virginica 2.03
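On the more general question of passing arguments: map_dfr(iris, mean_by, Species) fails because map iterates over the columns of iris and hands each column vector to mean_by as data. A minimal sketch of two ways to supply the fixed arguments, assuming the string-based mean_by from above (the names num_cols, via_dots and via_lambda are only illustrative):

```r
library(dplyr)
library(purrr)

mean_by <- function(data, by, var1) {
  data %>%
    group_by({{ by }}) %>%
    summarise(avg = mean(.data[[var1]]))
}

# Names of the numeric columns; set_names() makes the vector name itself,
# so .id can record which variable each block of rows came from.
num_cols <- iris %>% select(where(is.numeric)) %>% names() %>% set_names()

# 1. Extra arguments after the function are forwarded unchanged on every call.
via_dots <- map_dfr(num_cols, mean_by, data = iris, by = Species, .id = "variable")

# 2. A lambda spells out where the varying element (.x) goes.
via_lambda <- map_dfr(num_cols, ~ mean_by(iris, Species, .x), .id = "variable")
```

Both calls should give the same 12-row result; the lambda form is usually easier to read when the varying element is not the first argument of the function.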

Related

Map mean_se group_by combination can't handle a factor group_by

I'd like to get grouped means and corresponding se using the mean_se function from ggplot, but adding a group_by breaks the function. I've had to resort to pivoting and a longer summarize/mutate pipe, but would like to figure out how to do it in one go without all the data manipulation up-front. My dataset has 14 columns of interest + 1 group_by, but I'll use all of iris (hence skipping a select() pipe) for reproducibility.
Compare the first (doesn't work but is what I want) to the rest:
iris %>% group_by(Species) %>% #Error
map(~(mean_se(.)))
iris %>% select(-Species) %>%
map(~(mean_se(.))) #global mean+se
iris %>% select(-Species) %>%
map_dfr(~(mean_se(.)), id = "Species") %>% broom::tidy() #Gives 3 but which is which and how so if Species is unselected?
iris %>%
map_dfr(~(mean_se(.))) %>% broom::tidy() #Error
iris %>%
map_dfr(~(mean_se(.)), id = "Species") %>% broom::tidy() #Also error
iris %>% #Runs but the
group_by(Species) %>% #output doesn't make sense,
group_modify(~ #there should only be 3 columns (y, ymin, ymax) not 13
.x %>%
map_dfc(mean_se))
map_dfr isn't respecting the .id argument, and I don't understand how I am getting 3 groups when the grouping variable has to be removed to avoid this error: Error in stats::var(x) : Calling var(x) on a factor x is defunct. Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
Alternatively:
iris %>%
group_by(Species) %>%
summarise(across(everything(), mean_se))
# A tibble: 3 x 5
Species Sepal.Length$y $ymin $ymax Sepal.Width$y $ymin $ymax Petal.Length$y $ymin $ymax Petal.Width$y $ymin $ymax
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.01 4.96 5.06 3.43 3.37 3.48 1.46 1.44 1.49 0.246 0.231 0.261
2 versicolor 5.94 5.86 6.01 2.77 2.73 2.81 4.26 4.19 4.33 1.33 1.30 1.35
3 virginica 6.59 6.50 6.68 2.97 2.93 3.02 5.55 5.47 5.63 2.03 1.99 2.06
or if you want that in longer form:
iris %>%
group_by(Species) %>%
summarise(across(everything(), mean_se)) %>%
pivot_longer(-Species)
# A tibble: 12 x 3
Species name value$y $ymin $ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Sepal.Length 5.01 4.96 5.06
2 setosa Sepal.Width 3.43 3.37 3.48
3 setosa Petal.Length 1.46 1.44 1.49
4 setosa Petal.Width 0.246 0.231 0.261
5 versicolor Sepal.Length 5.94 5.86 6.01
6 versicolor Sepal.Width 2.77 2.73 2.81
7 versicolor Petal.Length 4.26 4.19 4.33
8 versicolor Petal.Width 1.33 1.30 1.35
9 virginica Sepal.Length 6.59 6.50 6.68
10 virginica Sepal.Width 2.97 2.93 3.02
11 virginica Petal.Length 5.55 5.47 5.63
12 virginica Petal.Width 2.03 1.99 2.06
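The $y, $ymin and $ymax headers show that each summarised column is itself a data frame (a "packed" column). If flat columns are preferred, one option is tidyr::unpack(). A minimal sketch, assuming tidyr 1.0.0 or later and using variable as an illustrative column name:

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # provides mean_se()

res <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean_se)) %>%
  pivot_longer(-Species, names_to = "variable") %>%
  unpack(value)  # spreads the packed column into y, ymin, ymax
```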
You can do it as follows
library(tidyverse)
iris %>%
pivot_longer(!contains("Species"), names_to = "key", values_to="val") %>%
group_by(Species, key) %>%
group_modify(~ mean_se(.x$val))
output
# A tibble: 12 x 5
# Groups: Species, key [12]
Species key y ymin ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Petal.Length 1.46 1.44 1.49
2 setosa Petal.Width 0.246 0.231 0.261
3 setosa Sepal.Length 5.01 4.96 5.06
4 setosa Sepal.Width 3.43 3.37 3.48
5 versicolor Petal.Length 4.26 4.19 4.33
6 versicolor Petal.Width 1.33 1.30 1.35
7 versicolor Sepal.Length 5.94 5.86 6.01
8 versicolor Sepal.Width 2.77 2.73 2.81
9 virginica Petal.Length 5.55 5.47 5.63
10 virginica Petal.Width 2.03 1.99 2.06
11 virginica Sepal.Length 6.59 6.50 6.68
12 virginica Sepal.Width 2.97 2.93 3.02
We can split by 'Species', loop over the resulting list with map_dfr, and apply mean_se to each column with an inner map_dfr. In the split, the . is used twice: to select the columns (.[-5] drops the last column, i.e. the grouping column Species) and to extract the grouping vector (.$Species). Because . appears in more than one place, we wrap the call in {} so that %>% does not also insert . as the first argument.
library(dplyr)
library(purrr)
iris %>%
{split(.[-5], .$Species)} %>%
map_dfr(map_dfr, mean_se, .id = 'Species')
-output
Species y ymin ymax
1 setosa 5.006 4.9561504 5.0558496
2 setosa 3.428 3.3743922 3.4816078
3 setosa 1.462 1.4374402 1.4865598
4 setosa 0.246 0.2310962 0.2609038
5 versicolor 5.936 5.8630024 6.0089976
6 versicolor 2.770 2.7256222 2.8143778
7 versicolor 4.260 4.1935446 4.3264554
8 versicolor 1.326 1.2980335 1.3539665
9 virginica 6.588 6.4980730 6.6779270
10 virginica 2.974 2.9283921 3.0196079
11 virginica 5.552 5.4739503 5.6300497
12 virginica 2.026 1.9871586 2.0648414
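The output above no longer says which row belongs to which measurement. A possible variation, offered as a sketch rather than as part of the original answer, passes .id at both levels (note the leading dot; a plain id = would just be forwarded to the mapped function). The column name variable is an assumption:

```r
library(purrr)    # map_dfr() also needs dplyr to be installed
library(ggplot2)  # provides mean_se()

res <- split(iris[-5], iris$Species) %>%
  # inner map_dfr: one row per column, labelled with the column name
  # outer map_dfr: one block per species, labelled with the list name
  map_dfr(~ map_dfr(.x, mean_se, .id = "variable"), .id = "Species")
```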
Or using by
do.call(rbind, by(iris[-5], iris$Species, FUN = Vectorize(mean_se)))
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width
y 5.006 3.428 1.462 0.246
ymin 4.95615 3.374392 1.43744 0.2310962
ymax 5.05585 3.481608 1.48656 0.2609038
y 5.936 2.77 4.26 1.326
ymin 5.863002 2.725622 4.193545 1.298034
ymax 6.008998 2.814378 4.326455 1.353966
y 6.588 2.974 5.552 2.026
ymin 6.498073 2.928392 5.47395 1.987159
ymax 6.677927 3.019608 5.63005 2.064841
Slightly different syntax, but I would do the following:
iris %>%
pivot_longer(contains("."), names_to = "key", values_to = "value") %>%
group_by(Species, key) %>%
summarize(mean_se(value))
`summarise()` has grouped output by 'Species'. You can override using the `.groups` argument.
# A tibble: 12 × 5
# Groups: Species [3]
Species key y ymin ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Petal.Length 1.46 1.44 1.49
2 setosa Petal.Width 0.246 0.231 0.261
3 setosa Sepal.Length 5.01 4.96 5.06
4 setosa Sepal.Width 3.43 3.37 3.48
5 versicolor Petal.Length 4.26 4.19 4.33
6 versicolor Petal.Width 1.33 1.30 1.35
7 versicolor Sepal.Length 5.94 5.86 6.01
8 versicolor Sepal.Width 2.77 2.73 2.81
9 virginica Petal.Length 5.55 5.47 5.63
10 virginica Petal.Width 2.03 1.99 2.06
11 virginica Sepal.Length 6.59 6.50 6.68
12 virginica Sepal.Width 2.97 2.93 3.02

Why does map %>% as.data.frame give different result than map_df?

I'm new to R and trying to understand how to use map and map_df. Consider the following:
iris %>% split(.$Species) %>% map_df(function (x) apply(x[, 1:4], 2, mean))
And compare to
iris %>% split(.$Species) %>% map(function (x) apply(x[, 1:4], 2, mean)) %>% as.data.frame
The former gives the following output:
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.01 3.43 1.46 0.246
2 5.94 2.77 4.26 1.33
3 6.59 2.97 5.55 2.03
The latter gives:
setosa versicolor virginica
Sepal.Length 5.006 5.936 6.588
Sepal.Width 3.428 2.770 2.974
Petal.Length 1.462 4.260 5.552
Petal.Width 0.246 1.326 2.026
My question is: why? I would expect these two commands to give the same output. How can I get the second output with map_df function?
map_df() seems to bind the list elements by rows (same as map_dfr()) although it is not explicitly stated in its documentation. If you would like to bind by column, use map_dfc() instead. Note that the output is a tibble in which the use of rownames is discouraged. This document suggests ways to work with rownames in tibbles.
iris %>%
split(.$Species) %>%
map_dfc(function (x) apply(x[, 1:4], 2, mean))
# # A tibble: 4 x 3
# setosa versicolor virginica
# <dbl> <dbl> <dbl>
# 1 5.01 5.94 6.59
# 2 3.43 2.77 2.97
# 3 1.46 4.26 5.55
# 4 0.246 1.33 2.03
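If the variable names should survive the conversion to a tibble, one option is to move them into a regular column. A minimal sketch, assuming the tibble package is available; the column name variable is illustrative:

```r
library(purrr)
library(tibble)

res <- iris %>%
  split(.$Species) %>%
  map(function(x) apply(x[, 1:4], 2, mean)) %>%
  as.data.frame() %>%               # column-binds and keeps row names
  rownames_to_column("variable") %>%
  as_tibble()
```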
First, you used split rather than group_split, so you are creating a named list.
Second, once you have the named list (three items), apply takes the mean of the first four columns in each item.
After the mean calculation, what remains is again a three-item list, each item holding the means of the four columns.
Now here comes the difference:
If you use map_df or map_dfr (the result is the same for both), the map function simply row-binds the output.
However, map leaves the result as a named list, which as.data.frame then binds column-wise.
library(tidyverse)
#1st case
iris %>% split(.$Species) %>% map_dfr(function (x) apply(x[, 1:4], 2, mean))
#> # A tibble: 3 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.01 3.43 1.46 0.246
#> 2 5.94 2.77 4.26 1.33
#> 3 6.59 2.97 5.55 2.03
#OR
iris %>% split(.$Species) %>% map_df(function (x) apply(x[, 1:4], 2, mean))
#> # A tibble: 3 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.01 3.43 1.46 0.246
#> 2 5.94 2.77 4.26 1.33
#> 3 6.59 2.97 5.55 2.03
#2nd case
iris %>% split(.$Species) %>% map(function (x) apply(x[, 1:4], 2, mean)) %>% as.data.frame
#> setosa versicolor virginica
#> Sepal.Length 5.006 5.936 6.588
#> Sepal.Width 3.428 2.770 2.974
#> Petal.Length 1.462 4.260 5.552
#> Petal.Width 0.246 1.326 2.026
#OR (with a slight difference of dropping row names)
iris %>% split(.$Species) %>% map_dfc(function (x) apply(x[, 1:4], 2, mean))
#> # A tibble: 4 x 3
#> setosa versicolor virginica
#> <dbl> <dbl> <dbl>
#> 1 5.01 5.94 6.59
#> 2 3.43 2.77 2.97
#> 3 1.46 4.26 5.55
#> 4 0.246 1.33 2.03
#If group_split would have been used instead
iris %>% group_split(Species) %>% map(function (x) apply(x[, 1:4], 2, mean)) %>% as.data.frame
#> c.Sepal.Length...5.006..Sepal.Width...3.428..Petal.Length...1.462..
#> Sepal.Length 5.006
#> Sepal.Width 3.428
#> Petal.Length 1.462
#> Petal.Width 0.246
#> c.Sepal.Length...5.936..Sepal.Width...2.77..Petal.Length...4.26..
#> Sepal.Length 5.936
#> Sepal.Width 2.770
#> Petal.Length 4.260
#> Petal.Width 1.326
#> c.Sepal.Length...6.588..Sepal.Width...2.974..Petal.Length...5.552..
#> Sepal.Length 6.588
#> Sepal.Width 2.974
#> Petal.Length 5.552
#> Petal.Width 2.026
#OR
iris %>% group_split(Species) %>% map_dfc(function (x) apply(x[, 1:4], 2, mean))
#> New names:
#> * NA -> ...1
#> * NA -> ...2
#> * NA -> ...3
#> # A tibble: 4 x 3
#> ...1 ...2 ...3
#> <dbl> <dbl> <dbl>
#> 1 5.01 5.94 6.59
#> 2 3.43 2.77 2.97
#> 3 1.46 4.26 5.55
#> 4 0.246 1.33 2.03
Thus, as.data.frame binds a list into a data frame column-wise by default, and if the list is named, the item names become the output's column names. If you instead request row binding through map_df or map_dfr, the list items are bound row-wise, so the names of the list items are not needed.
iris %>% group_split(Species) %>% map_dfr(function (x) apply(x[, 1:4], 2, mean))
# A tibble: 3 x 4
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.01 3.43 1.46 0.246
2 5.94 2.77 4.26 1.33
3 6.59 2.97 5.55 2.03
Hope this is clear.
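The row-wise versus column-wise distinction can also be seen without purrr's data-frame helpers. A small base R sketch (by_row and by_col are illustrative names):

```r
# One named vector of column means per species
means <- lapply(split(iris[, 1:4], iris$Species), colMeans)

by_row <- do.call(rbind, means)  # 3 x 4: one row per species
by_col <- do.call(cbind, means)  # 4 x 3: one column per species
```

The two results are transposes of each other, which mirrors the difference between the map_dfr and as.data.frame outputs in the question.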

How to get quantiles to work with summarise_at and group_by (dplyr)

When using dplyr to create a table of summary statistics that is organized by levels of a variable, I cannot figure out the syntax for calculating quartiles without having to repeat the column name. That is, calls such as vars() and list() work with other functions, such as mean() and median(), but not with quantile().
Searches have produced antiquated solutions that no longer work because they use deprecated calls, such as do() and/or funs().
data(iris)
library(tidyverse)
#This works: Notice I have not attempted to calculate quartiles yet
summary_stat <- iris %>%
group_by(Species) %>%
summarise_at(vars(Sepal.Length),
list(min=min, median=median, max=max,
mean=mean, sd=sd)
)
# A tibble: 3 x 6
Species min median max mean sd
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 4.3 5 5.8 5.01 0.352
2 versicolor 4.9 5.9 7 5.94 0.516
3 virginica 4.9 6.5 7.9 6.59 0.636
##########################################################################
#Does NOT work:
five_number_summary <- iris %>%
group_by(Species) %>%
summarise_at(vars(Sepal.Length),
list(min=min, Q1=quantile(.,probs = 0.25),
median=median, Q3=quantile(., probs = 0.75),
max=max))
Error: Must use a vector in `[`, not an object of class matrix.
Call `rlang::last_error()` to see a backtrace
###########################################################################
#This works: Remove the vars() argument, remove the list() argument,
#replace summarise_at() with summarise()
#but the code requires repeating the column name (Sepal.Length)
five_number_summary <- iris %>%
group_by(Species) %>%
summarise(min=min(Sepal.Length),
Q1=quantile(Sepal.Length,probs = 0.25),
median=median(Sepal.Length),
Q3=quantile(Sepal.Length, probs = 0.75),
max=max(Sepal.Length))
# A tibble: 3 x 6
Species min Q1 median Q3 max
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 4.3 4.8 5 5.2 5.8
2 versicolor 4.9 5.6 5.9 6.3 7
3 virginica 4.9 6.22 6.5 6.9 7.9
This last piece of code produces exactly what I am looking for, but I am wondering why there isn't a shorter syntax that doesn't force me to repeat the variable.
You're missing the ~ in front of the quantile function in the summarise_at call that failed. Try the following:
five_number_summary <- iris %>%
group_by(Species) %>%
summarise_at(vars(Sepal.Length),
list(min=min, Q1=~quantile(., probs = 0.25),
median=median, Q3=~quantile(., probs = 0.75),
max=max))
five_number_summary
# A tibble: 3 x 6
Species min Q1 median Q3 max
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 4.3 4.8 5 5.2 5.8
2 versicolor 4.9 5.6 5.9 6.3 7
3 virginica 4.9 6.22 6.5 6.9 7.9
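Without the ~, quantile(., probs = 0.25) is evaluated immediately, while the list is being built and with . standing for the whole grouped data frame coming through the pipe, instead of being stored as a function to apply to the column later. Another way to hand over a ready-made function is a small function factory. This is a sketch using across() (dplyr 1.0.0 or later); q_at is a hypothetical helper, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: returns a one-argument function computing the p-th quantile.
q_at <- function(p) function(x) quantile(x, probs = p, names = FALSE)

res <- iris %>%
  group_by(Species) %>%
  summarise(across(Sepal.Length,
                   list(min = min, Q1 = q_at(0.25), median = median,
                        Q3 = q_at(0.75), max = max),
                   .names = "{.fn}"))
```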
You can create a list column and then use unnest_wider, which requires tidyr 1.0.0
library(tidyverse)
iris %>%
group_by(Species) %>%
summarise(q = list(quantile(Sepal.Length))) %>%
unnest_wider(q)
# # A tibble: 3 x 6
# Species `0%` `25%` `50%` `75%` `100%`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 4.3 4.8 5 5.2 5.8
# 2 versicolor 4.9 5.6 5.9 6.3 7
# 3 virginica 4.9 6.22 6.5 6.9 7.9
There's a names_repair argument, but apparently that changes the name of all the columns, and not just the ones being unnested (??)
iris %>%
group_by(Species) %>%
summarise(q = list(quantile(Sepal.Length))) %>%
unnest_wider(q, names_repair = ~paste0('Q_', sub('%', '', .)))
# # A tibble: 3 x 6
# Q_Species Q_0 Q_25 Q_50 Q_75 Q_100
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 4.3 4.8 5 5.2 5.8
# 2 versicolor 4.9 5.6 5.9 6.3 7
# 3 virginica 4.9 6.22 6.5 6.9 7.9
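To prefix only the unnested columns, one possibility is to rename after unnesting rather than through names_repair. A sketch assuming dplyr 1.0.0 or later for rename_with():

```r
library(dplyr)
library(tidyr)

res <- iris %>%
  group_by(Species) %>%
  summarise(q = list(quantile(Sepal.Length))) %>%
  unnest_wider(q) %>%
  # rename every column except Species: "25%" becomes "Q_25", and so on
  rename_with(~ paste0("Q_", sub("%", "", .x)), .cols = -Species)
```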
Another option is group_modify
iris %>%
group_by(Species) %>%
group_modify(~as.data.frame(t(quantile(.$Sepal.Length))))
# # A tibble: 3 x 6
# # Groups: Species [3]
# Species `0%` `25%` `50%` `75%` `100%`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 4.3 4.8 5 5.2 5.8
# 2 versicolor 4.9 5.6 5.9 6.3 7
# 3 virginica 4.9 6.22 6.5 6.9 7.9
Or you could use data.table
library(data.table)
irisdt <- as.data.table(iris)
irisdt[, as.list(quantile(Sepal.Length)), Species]
# Species 0% 25% 50% 75% 100%
# 1: setosa 4.3 4.800 5.0 5.2 5.8
# 2: versicolor 4.9 5.600 5.9 6.3 7.0
# 3: virginica 4.9 6.225 6.5 6.9 7.9
A more up-to-date version of @arienrhod's answer, using across():
library(dplyr,quietly = TRUE,verbose = FALSE, warn.conflicts = FALSE)
five_number_summary <- iris %>%
group_by(Species) %>%
summarise(across(Sepal.Length, list(min=min, Q1=~quantile(., probs = 0.25),
median=median, Q3=~quantile(., probs = 0.75),
max=max), .names = "{.fn}"))
five_number_summary
#> # A tibble: 3 x 6
#> Species min Q1 median Q3 max
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 4.3 4.8 5 5.2 5.8
#> 2 versicolor 4.9 5.6 5.9 6.3 7
#> 3 virginica 4.9 6.22 6.5 6.9 7.9
Created on 2022-02-21 by the reprex package (v2.0.1)

How do I add summary-statistics values to each row of a data frame? [duplicate]

I would like to know a tidyverse way to add summary statistics back to each row of a dataframe.
The code below works, but there should be a quicker way, right?
library("tidyverse")
data <- (iris)
means <- iris %>%
group_by(Species) %>%
summarise(
Sepal.Length = mean(Sepal.Length),
Sepal.Width = mean(Sepal.Width)
)
data <- merge(data, means, by = "Species")
One way to do this would be to use mutate.
library("tidyverse")
data <- (iris)
data<-data %>%
group_by(Species) %>%
mutate(Sepal.Length.y=mean(Sepal.Length), Sepal.Width.y=mean(Sepal.Width))
This is very similar to what you had but cuts out a few steps. You can reorder the columns afterwards if you want a different order. I would also recommend giving the new columns names that differ from Sepal.Length and Sepal.Width, as above: in your merge() version the clashing names only stay unique because R appends .x and .y to them.
Hope this helps.
You can do this with dplyr::mutate_at:
iris %>% group_by(Species) %>%
mutate_at(.vars = vars(Sepal.Length,Sepal.Width),
.funs = list(mean = mean))
We need the named list(mean = mean) rather than just .funs = mean so that new columns with a _mean suffix are added instead of overwriting the original ones.
# A tibble: 150 x 7
# Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mean Sepal.Width_mean
<dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 5.01 3.43
2 4.9 3 1.4 0.2 setosa 5.01 3.43
3 4.7 3.2 1.3 0.2 setosa 5.01 3.43
4 4.6 3.1 1.5 0.2 setosa 5.01 3.43
5 5 3.6 1.4 0.2 setosa 5.01 3.43
6 5.4 3.9 1.7 0.4 setosa 5.01 3.43
7 4.6 3.4 1.4 0.3 setosa 5.01 3.43
8 5 3.4 1.5 0.2 setosa 5.01 3.43
9 4.4 2.9 1.4 0.2 setosa 5.01 3.43
10 4.9 3.1 1.5 0.1 setosa 5.01 3.43
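With dplyr 1.0.0 or later, the same result can be sketched with across(), which supersedes the _at variants. The .names pattern below is chosen to reproduce the column names shown above:

```r
library(dplyr)

res <- iris %>%
  group_by(Species) %>%
  # add one *_mean column per selected variable, keeping the originals
  mutate(across(c(Sepal.Length, Sepal.Width), mean, .names = "{.col}_mean")) %>%
  ungroup()
```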

tidyr: Gathering two values per key [duplicate]

I have a dataset with the mean and sd of each variable as columns, but I want to convert it into "long" format as so:
library(tidyverse)
iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))
#> # A tibble: 3 x 9
#> Species Sepal.Length_me~ Sepal.Width_mean Petal.Length_me~
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 5.01 3.43 1.46
#> 2 versic~ 5.94 2.77 4.26
#> 3 virgin~ 6.59 2.97 5.55
#> # ... with 5 more variables: Petal.Width_mean <dbl>,
#> # Sepal.Length_sd <dbl>, Sepal.Width_sd <dbl>, Petal.Length_sd <dbl>,
#> # Petal.Width_sd <dbl>
# Desired output:
#
# tribble(~Species, ~Variable, ~Mean, ~SD
# #-------------------------------
# ... )
I feel like tidyr::gather would be good to use here, however, I am not sure how the syntax would work for having two values per key. Or perhaps I need to use two gathers and column bind them?
To convert your post-summarise_all data you can do the following
df %>%
gather(key, val, -Species) %>%
separate(key, into = c("Variable", "metric"), sep = "_") %>%
spread(metric, val)
## A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
But it's actually faster & shorter to transform the data from wide to long right from the start
iris %>%
gather(Variable, val, -Species) %>%
group_by(Species, Variable) %>%
summarise(Mean = mean(val), SD = sd(val))
## A tibble: 12 x 4
## Groups: Species [?]
# Species Variable Mean SD
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
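Here is an option with pivot_longer, which was in the dev version of tidyr when this was written and has been available since tidyr 1.0.0.
library(dplyr)
library(tidyr) #tidyr_0.8.3.9000
library(stringr) # for str_replace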
df %>%
rename_at(-1, ~ str_replace(., "(.*)_(.*)", "\\2_\\1")) %>%
pivot_longer(-Species, names_to = c(".value", "Variable"), names_sep = "_")
# A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Sepal.Length 5.01 0.352
# 2 setosa Sepal.Width 3.43 0.379
# 3 setosa Petal.Length 1.46 0.174
# 4 setosa Petal.Width 0.246 0.105
# 5 versicolor Sepal.Length 5.94 0.516
# 6 versicolor Sepal.Width 2.77 0.314
# 7 versicolor Petal.Length 4.26 0.470
# 8 versicolor Petal.Width 1.33 0.198
# 9 virginica Sepal.Length 6.59 0.636
#10 virginica Sepal.Width 2.97 0.322
#11 virginica Petal.Length 5.55 0.552
#12 virginica Petal.Width 2.03 0.275
data
data(iris)
df <- iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))
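With a released tidyr (1.0.0 or later), the renaming step can arguably be skipped by putting ".value" second in names_to, since the summary columns are already named variable_statistic. A sketch under that assumption, rebuilding df with across() (the name long is illustrative):

```r
library(dplyr)
library(tidyr)

# Same summary table as above; across() names columns "{.col}_{.fn}" by default
df <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

# The part before "_" goes to Variable; the part after names the value columns
long <- df %>%
  pivot_longer(-Species, names_to = c("Variable", ".value"), names_sep = "_")
```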
