I have the following code that runs summary() on iris$Petal.Width, grouped by species:
library(tidyverse)
dat <- iris %>%
as_tibble() %>%
select(Petal.Width, Species) %>%
group_by(Species) %>%
do(fn = summary(.$Petal.Width))
dat
#> Source: local data frame [3 x 2]
#> Groups: <by row>
#>
#> # A tibble: 3 x 2
#> Species fn
#> * <fct> <list>
#> 1 setosa <S3: summaryDefault>
#> 2 versicolor <S3: summaryDefault>
#> 3 virginica <S3: summaryDefault>
What I want to do is unpack the fn column into the following (I did this by hand):
Species Min. 1st Qu. Median Mean 3rd Qu. Max.
setosa 0.100 0.200 0.200 0.246 0.300 0.600
versicolor 1.000 1.200 1.300 1.326 1.500 1.800
virginica 1.400 1.800 2.000 2.026 2.300 2.500
How can I do it?
I tried this, but it failed: dat %>% ungroup(fn)
Presently, explicit list columns are preferred as an idiom over do(). In this case, it might look like:
library(tidyverse)
iris %>%
group_by(Species) %>%
summarise(summary = list(broom::tidy(summary(Petal.Width)))) %>%
unnest(summary)
#> # A tibble: 3 x 7
#> Species minimum q1 median mean q3 maximum
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 0.100 0.200 0.200 0.246 0.300 0.600
#> 2 versicolor 1.00 1.20 1.30 1.33 1.50 1.80
#> 3 virginica 1.40 1.80 2.00 2.03 2.30 2.50
If you like, this is one of those cases where the base R idiom is more concise and readable:
aggregate(Petal.Width ~ Species, iris, summary)
#> Species Petal.Width.Min. Petal.Width.1st Qu. Petal.Width.Median
#> 1 setosa 0.100 0.200 0.200
#> 2 versicolor 1.000 1.200 1.300
#> 3 virginica 1.400 1.800 2.000
#> Petal.Width.Mean Petal.Width.3rd Qu. Petal.Width.Max.
#> 1 0.246 0.300 0.600
#> 2 1.326 1.500 1.800
#> 3 2.026 2.300 2.500
However, note that if you call str on the result, it shows Petal.Width is actually a matrix column (which isn't possible in tibbles, but is in data.frames). To extract it, tack on %>% {cbind(.[1], .[[2]])} or equivalent.
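To make the matrix-column point concrete, here is a minimal base R sketch (the object names res and flat are only illustrative, not part of the answer above). It shows that the aggregate() result has just two columns, one of which is a matrix, and that cbind() spreads it into ordinary columns:

```r
# aggregate() with a multi-value FUN stores the results in one matrix column
res <- aggregate(Petal.Width ~ Species, data = iris, FUN = summary)
dim(res)                    # 3 rows, 2 columns: Species plus one matrix column
is.matrix(res$Petal.Width)

# cbind() on a data frame and a matrix spreads the matrix into ordinary columns
flat <- cbind(res[1], res[[2]])
dim(flat)                   # 3 rows, 7 columns
names(flat)
```

The flattened result can then be passed to tibble::as_tibble() if a tibble is needed.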
skimr::skim is another option which respects dplyr grouping:
library(dplyr)
iris %>% group_by(Species) %>% skimr::skim(Petal.Width)
#> Skim summary statistics
#> n obs: 150
#> n variables: 5
#> group variables: Species
#>
#> Variable type: numeric
#> Species variable missing complete n mean sd p0 p25 p50 p75 p100
#> setosa Petal.Width 0 50 50 0.25 0.11 0.1 0.2 0.2 0.3 0.6
#> versicolor Petal.Width 0 50 50 1.33 0.2 1 1.2 1.3 1.5 1.8
#> virginica Petal.Width 0 50 50 2.03 0.27 1.4 1.8 2 2.3 2.5
#> hist
#> ▂▇▁▂▂▁▁▁
#> ▆▃▇▅▆▂▁▁
#> ▂▁▇▃▃▆▅▃
What it displays is actually a print method for the underlying long data. skimr is built to keep working with dplyr methods, but at some point you may need to hack the underlying data out. The documentation explains this pretty well.
Try
dat <- iris %>%
as_tibble() %>%
select(Petal.Width, Species) %>%
group_by(Species) %>%
do(fn = summary(.$Petal.Width) %>% as.matrix() %>% t() %>% as.data.frame())
dat %>% unnest(fn)
# # A tibble: 3 x 7
# Species Min. `1st Qu.` Median Mean `3rd Qu.` Max.
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 0.100 0.200 0.200 0.246 0.300 0.600
# 2 versicolor 1.00 1.20 1.30 1.33 1.50 1.80
# 3 virginica 1.40 1.80 2.00 2.03 2.30 2.50
Related
I have the following, but want to add the group_by() key Species to the resulting tibble:
MWE
iris %>%
group_by(Species) %>%
group_map(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
bind_rows()
Output
# How do I add the grouping key `Species` to this?
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.64 0.310 8.51 3.74e-11
2 Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 (Intercept) 3.54 0.563 6.29 9.07e- 8
4 Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 (Intercept) 3.91 0.757 5.16 4.66e- 6
6 Sepal.Width 0.902 0.253 3.56 8.43e- 4
You can use group_modify() instead of group_map().
library(purrr)
library(dplyr)
iris %>%
group_by(Species) %>%
group_modify(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data = .x)))
# A tibble: 6 x 6
# Groups: Species [3]
Species term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4
We may do this in summarise, return a list column and unnest the output
library(dplyr)
library(tidyr)
iris %>%
group_by(Species) %>%
summarise(out = list(broom::tidy(lm(Sepal.Length ~ Sepal.Width,
data = cur_data())))) %>%
unnest(out)
-output
# A tibble: 6 x 6
Species term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4
In group_map, according to documentation, the .y is the key, which we can add as a column
iris %>%
group_by(Species) %>%
group_map(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data = .x)) %>%
mutate(Species = .y$Species, .before = 1)) %>%
bind_rows()
-output
# A tibble: 6 x 6
Species term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4
Using split + map_df -
library(dplyr)
library(purrr)
iris %>%
split(.$Species) %>%
map_df(~ broom::tidy(lm(Sepal.Length ~ Sepal.Width, data=.x)),.id = 'Species')
# Species term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 setosa (Intercept) 2.64 0.310 8.51 3.74e-11
#2 setosa Sepal.Width 0.690 0.0899 7.68 6.71e-10
#3 versicolor (Intercept) 3.54 0.563 6.29 9.07e- 8
#4 versicolor Sepal.Width 0.865 0.202 4.28 8.77e- 5
#5 virginica (Intercept) 3.91 0.757 5.16 4.66e- 6
#6 virginica Sepal.Width 0.902 0.253 3.56 8.43e- 4
I'd like to get grouped means and the corresponding standard errors using the mean_se function from ggplot2, but adding a group_by breaks the function. I've had to resort to pivoting and a longer summarize/mutate pipe, but would like to figure out how to do it in one go without all the data manipulation up front. My dataset has 14 columns of interest + 1 group_by column, but I'll use all of iris (hence skipping a select() step) for reproducibility.
Compare the first (doesn't work but is what I want) to the rest:
iris %>% group_by(Species) %>% #Error
map(~(mean_se(.)))
iris %>% select(-Species) %>%
map(~(mean_se(.))) #global mean+se
iris %>% select(-Species) %>%
map_dfr(~(mean_se(.)), id = "Species") %>% broom::tidy() #Gives 3 but which is which and how so if Species is unselected?
iris %>%
map_dfr(~(mean_se(.))) %>% broom::tidy() #Error
iris %>%
map_dfr(~(mean_se(.)), id = "Species") %>% broom::tidy() #Also error
iris %>% #Runs but the
group_by(Species) %>% #output doesn't make sense,
group_modify(~ #there should only be 3 columns (y, ymin, ymax) not 13
.x %>%
map_dfc(mean_se))
map_dfr isn't respecting the .id argument, and I don't understand how I am getting 3 groups if the grouping variable has to be removed to avoid this error: Error in stats::var(x) : Calling var(x) on a factor x is defunct. Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
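A hedged note on why the attempts above behave this way: a data frame is a list of columns, so map() and lapply() iterate over columns and, as far as I know, take no notice of dplyr grouping. The factor column Species is therefore passed to mean_se like any other column, which is where the var() error comes from. The sketch below uses base R only; mean_se_base is a hypothetical stand-in for ggplot2::mean_se (mean minus/plus one standard error), written here so the example runs without extra packages:

```r
# Hypothetical base R stand-in for ggplot2::mean_se(): mean -/+ one standard error
mean_se_base <- function(x) {
  x <- x[!is.na(x)]
  se <- sqrt(var(x) / length(x))
  data.frame(y = mean(x), ymin = mean(x) - se, ymax = mean(x) + se)
}

# A data frame is a list of columns, so lapply()/map() visit columns, not groups
names(lapply(iris[-5], mean_se_base))

# To work per group, split the rows first, then loop over the columns of each piece
by_species <- split(iris[-5], iris$Species)
res <- lapply(by_species, function(d) do.call(rbind, lapply(d, mean_se_base)))
res$setosa
```

Each element of res has one row per measured column and the three columns y, ymin and ymax, which is the shape the question asks for.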
Alternatively:
iris %>%
group_by(Species) %>%
summarise(across(everything(), mean_se))
# A tibble: 3 x 5
Species Sepal.Length$y $ymin $ymax Sepal.Width$y $ymin $ymax Petal.Length$y $ymin $ymax Petal.Width$y $ymin $ymax
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 5.01 4.96 5.06 3.43 3.37 3.48 1.46 1.44 1.49 0.246 0.231 0.261
2 versicolor 5.94 5.86 6.01 2.77 2.73 2.81 4.26 4.19 4.33 1.33 1.30 1.35
3 virginica 6.59 6.50 6.68 2.97 2.93 3.02 5.55 5.47 5.63 2.03 1.99 2.06
or if you want that in longer form:
iris %>%
group_by(Species) %>%
summarise(across(everything(), mean_se)) %>%
pivot_longer(-Species)
# A tibble: 12 x 3
Species name value$y $ymin $ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Sepal.Length 5.01 4.96 5.06
2 setosa Sepal.Width 3.43 3.37 3.48
3 setosa Petal.Length 1.46 1.44 1.49
4 setosa Petal.Width 0.246 0.231 0.261
5 versicolor Sepal.Length 5.94 5.86 6.01
6 versicolor Sepal.Width 2.77 2.73 2.81
7 versicolor Petal.Length 4.26 4.19 4.33
8 versicolor Petal.Width 1.33 1.30 1.35
9 virginica Sepal.Length 6.59 6.50 6.68
10 virginica Sepal.Width 2.97 2.93 3.02
11 virginica Petal.Length 5.55 5.47 5.63
12 virginica Petal.Width 2.03 1.99 2.06
You can do it as follows
library(tidyverse)
iris %>%
pivot_longer(!contains("Species"), names_to = "key", values_to="val") %>%
group_by(Species, key) %>%
group_modify(~ mean_se(.x$val))
output
# A tibble: 12 x 5
# Groups: Species, key [12]
Species key y ymin ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Petal.Length 1.46 1.44 1.49
2 setosa Petal.Width 0.246 0.231 0.261
3 setosa Sepal.Length 5.01 4.96 5.06
4 setosa Sepal.Width 3.43 3.37 3.48
5 versicolor Petal.Length 4.26 4.19 4.33
6 versicolor Petal.Width 1.33 1.30 1.35
7 versicolor Sepal.Length 5.94 5.86 6.01
8 versicolor Sepal.Width 2.77 2.73 2.81
9 virginica Petal.Length 5.55 5.47 5.63
10 virginica Petal.Width 2.03 1.99 2.06
11 virginica Sepal.Length 6.59 6.50 6.68
12 virginica Sepal.Width 2.97 2.93 3.02
We can split by 'Species', then use nested map calls: the outer one loops over the list of per-species data frames and the inner one applies mean_se to each column. In the split step, . is used twice, once to drop the grouping column (.[-5] removes the last column, Species) and once to extract the grouping values (.$Species), so the expression is wrapped in {} to stop the pipe from also inserting . as the first argument.
library(dplyr)
library(purrr)
iris %>%
{split(.[-5], .$Species)} %>%
map_dfr(map_dfr, mean_se, .id = 'Species')
-output
Species y ymin ymax
1 setosa 5.006 4.9561504 5.0558496
2 setosa 3.428 3.3743922 3.4816078
3 setosa 1.462 1.4374402 1.4865598
4 setosa 0.246 0.2310962 0.2609038
5 versicolor 5.936 5.8630024 6.0089976
6 versicolor 2.770 2.7256222 2.8143778
7 versicolor 4.260 4.1935446 4.3264554
8 versicolor 1.326 1.2980335 1.3539665
9 virginica 6.588 6.4980730 6.6779270
10 virginica 2.974 2.9283921 3.0196079
11 virginica 5.552 5.4739503 5.6300497
12 virginica 2.026 1.9871586 2.0648414
Or using by
do.call(rbind, by(iris[-5], iris$Species, FUN = Vectorize(mean_se)))
-output
Sepal.Length Sepal.Width Petal.Length Petal.Width
y 5.006 3.428 1.462 0.246
ymin 4.95615 3.374392 1.43744 0.2310962
ymax 5.05585 3.481608 1.48656 0.2609038
y 5.936 2.77 4.26 1.326
ymin 5.863002 2.725622 4.193545 1.298034
ymax 6.008998 2.814378 4.326455 1.353966
y 6.588 2.974 5.552 2.026
ymin 6.498073 2.928392 5.47395 1.987159
ymax 6.677927 3.019608 5.63005 2.064841
Slightly different syntax, but I would do the following:
iris %>%
pivot_longer(contains("."), names_to = "key", values_to = "value") %>%
group_by(Species, key) %>%
summarize(mean_se(value))
`summarise()` has grouped output by 'Species'. You can override using the `.groups` argument.
# A tibble: 12 × 5
# Groups: Species [3]
Species key y ymin ymax
<fct> <chr> <dbl> <dbl> <dbl>
1 setosa Petal.Length 1.46 1.44 1.49
2 setosa Petal.Width 0.246 0.231 0.261
3 setosa Sepal.Length 5.01 4.96 5.06
4 setosa Sepal.Width 3.43 3.37 3.48
5 versicolor Petal.Length 4.26 4.19 4.33
6 versicolor Petal.Width 1.33 1.30 1.35
7 versicolor Sepal.Length 5.94 5.86 6.01
8 versicolor Sepal.Width 2.77 2.73 2.81
9 virginica Petal.Length 5.55 5.47 5.63
10 virginica Petal.Width 2.03 1.99 2.06
11 virginica Sepal.Length 6.59 6.50 6.68
12 virginica Sepal.Width 2.97 2.93 3.02
Using iris for reproducibility
library(tidyverse)
mean_by <- function(data,by,var1) {
data %>%
group_by({{by}}) %>%
summarise(avg=mean({{var1}}))
}
iris %>% mean_by(Species,Petal.Width) # This works
map_dfr(iris,mean_by,Species) # This doesn't work. I want to run this on all numeric columns in iris; how do I do that?
More importantly, a fundamental question: how do I pass arguments to a custom function using map_dfr in R?
Ideally you should use across:
library(dplyr)
library(purrr)
iris %>%
group_by(Species) %>%
summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 setosa 5.01 3.43 1.46 0.246
#2 versicolor 5.94 2.77 4.26 1.33
#3 virginica 6.59 2.97 5.55 2.03
To use map_dfr for numeric variables we can change the function to :
mean_by <- function(data,by,var1) {
data %>%
group_by({{by}}) %>%
summarise(avg = mean(.data[[var1]]))
}
map_dfr(names(select(iris, where(is.numeric))),
mean_by, data = iris, by = Species)
# Species avg
# <fct> <dbl>
# 1 setosa 5.01
# 2 versicolor 5.94
# 3 virginica 6.59
# 4 setosa 3.43
# 5 versicolor 2.77
# 6 virginica 2.97
# 7 setosa 1.46
# 8 versicolor 4.26
# 9 virginica 5.55
#10 setosa 0.246
#11 versicolor 1.33
#12 virginica 2.03
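Note that the stacked output above does not say which variable each block of three rows belongs to. As a rough base R sketch of the same argument-passing pattern (mean_by_base is a hypothetical helper, not the mean_by from the question), extra named arguments given to lapply() are forwarded to every call, much like the extra arguments given to map_dfr(), and the helper adds a variable column so the rows stay identifiable:

```r
# Hypothetical helper: per-group mean of one column, chosen by name (a string)
mean_by_base <- function(var, data, by) {
  out <- aggregate(data[[var]], by = data[by], FUN = mean)
  names(out)[2] <- "avg"
  cbind(variable = var, out)   # label the rows with the summarised column
}

num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# lapply() forwards `data` and `by` to every call, like map_dfr(x, f, ...)
res <- do.call(rbind, lapply(num_cols, mean_by_base, data = iris, by = "Species"))
head(res, 3)
```

The element being iterated over fills the first unmatched argument (var here), which is why the fixed arguments are passed by name.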
I want to get a summary of multiple columns in a data frame group-wise. I'm using dplyr::group_by and dplyr::summarise_if to get the results, but I'm unable to name the result columns after the columns that are being summarised.
The following example illustrates this:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tibble)
library(tidyr)
iris %>%
group_by(Species) %>%
summarise_if(.predicate = is.numeric,
.funs = ~ list(enframe(x = summary(object = .)))) %>%
unnest() %>%
select(which(x = !duplicated(x = lapply(X = .,
FUN = summary))))
#> # A tibble: 18 x 6
#> Species name value value1 value2 value3
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa Min. 4.3 2.3 1 0.1
#> 2 setosa 1st Qu. 4.8 3.2 1.4 0.2
#> 3 setosa Median 5 3.4 1.5 0.2
#> 4 setosa Mean 5.01 3.43 1.46 0.246
#> 5 setosa 3rd Qu. 5.2 3.68 1.58 0.3
#> 6 setosa Max. 5.8 4.4 1.9 0.6
#> 7 versicolor Min. 4.9 2 3 1
#> 8 versicolor 1st Qu. 5.6 2.52 4 1.2
#> 9 versicolor Median 5.9 2.8 4.35 1.3
#> 10 versicolor Mean 5.94 2.77 4.26 1.33
#> 11 versicolor 3rd Qu. 6.3 3 4.6 1.5
#> 12 versicolor Max. 7 3.4 5.1 1.8
#> 13 virginica Min. 4.9 2.2 4.5 1.4
#> 14 virginica 1st Qu. 6.22 2.8 5.1 1.8
#> 15 virginica Median 6.5 3 5.55 2
#> 16 virginica Mean 6.59 2.97 5.55 2.03
#> 17 virginica 3rd Qu. 6.9 3.18 5.88 2.3
#> 18 virginica Max. 7.9 3.8 6.9 2.5
Created on 2019-05-15 by the reprex package (v0.2.1)
As you can see, the columns are named value, value1, etc, whereas I'd like them to be Sepal.Length, Sepal.Width, etc. After I get this result, of course it is possible to name the columns manually, but I guess there's a better way to do it using the value argument of tibble::enframe.
As an alternative, I'm currently using the following method. It requires fake data, which is also not preferable.
iris %>%
group_by(Species) %>%
summarise_if(.predicate = is.numeric,
.funs = ~ list(summary(object = .))) %>%
unnest() %>%
group_by(Species) %>%
mutate(Statistic = names(x = summary(object = rnorm(n = 1)))) %>%
ungroup() %>%
select(Species, Statistic, everything())
Any help will be appreciated.
Might be this way? I didn't sort it according to the name within each Species, but I think it isn't important.
library(tidyverse)
iris %>%
group_by(Species) %>%
summarise_if(is.numeric, ~ list(enframe(summary(.)))) %>%
gather('key', 'value', -Species) %>%
unnest(value) %>%
spread(key, value)
## A tibble: 18 x 6
# Species name Petal.Length Petal.Width Sepal.Length Sepal.Width
# <fct> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 1st Qu. 1.4 0.2 4.8 3.2
# 2 setosa 3rd Qu. 1.58 0.3 5.2 3.68
# 3 setosa Max. 1.9 0.6 5.8 4.4
# 4 setosa Mean 1.46 0.246 5.01 3.43
# 5 setosa Median 1.5 0.2 5 3.4
# 6 setosa Min. 1 0.1 4.3 2.3
# 7 versicolor 1st Qu. 4 1.2 5.6 2.52
# 8 versicolor 3rd Qu. 4.6 1.5 6.3 3
# 9 versicolor Max. 5.1 1.8 7 3.4
#10 versicolor Mean 4.26 1.33 5.94 2.77
#11 versicolor Median 4.35 1.3 5.9 2.8
#12 versicolor Min. 3 1 4.9 2
#13 virginica 1st Qu. 5.1 1.8 6.22 2.8
#14 virginica 3rd Qu. 5.88 2.3 6.9 3.18
#15 virginica Max. 6.9 2.5 7.9 3.8
#16 virginica Mean 5.55 2.03 6.59 2.97
#17 virginica Median 5.55 2 6.5 3
#18 virginica Min. 4.5 1.4 4.9 2.2
I have a dataset with the mean and sd of each variable as columns, but I want to convert it into "long" format as so:
library(tidyverse)
iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))
#> # A tibble: 3 x 9
#> Species Sepal.Length_me~ Sepal.Width_mean Petal.Length_me~
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 5.01 3.43 1.46
#> 2 versic~ 5.94 2.77 4.26
#> 3 virgin~ 6.59 2.97 5.55
#> # ... with 5 more variables: Petal.Width_mean <dbl>,
#> # Sepal.Length_sd <dbl>, Sepal.Width_sd <dbl>, Petal.Length_sd <dbl>,
#> # Petal.Width_sd <dbl>
# Desired output:
#
# tribble(~Species, ~Variable, ~Mean, ~SD
# #-------------------------------
# ... )
I feel like tidyr::gather would be good to use here, however, I am not sure how the syntax would work for having two values per key. Or perhaps I need to use two gathers and column bind them?
To convert your post-summarise_all data (stored as df), you can do the following:
df %>%
gather(key, val, -Species) %>%
separate(key, into = c("Variable", "metric"), sep = "_") %>%
spread(metric, val)
## A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
But it's actually faster & shorter to transform the data from wide to long right from the start
iris %>%
gather(Variable, val, -Species) %>%
group_by(Species, Variable) %>%
summarise(Mean = mean(val), SD = sd(val))
## A tibble: 12 x 4
## Groups: Species [?]
# Species Variable Mean SD
# <fct> <chr> <dbl> <dbl>
# 1 setosa Petal.Length 1.46 0.174
# 2 setosa Petal.Width 0.246 0.105
# 3 setosa Sepal.Length 5.01 0.352
# 4 setosa Sepal.Width 3.43 0.379
# 5 versicolor Petal.Length 4.26 0.470
# 6 versicolor Petal.Width 1.33 0.198
# 7 versicolor Sepal.Length 5.94 0.516
# 8 versicolor Sepal.Width 2.77 0.314
# 9 virginica Petal.Length 5.55 0.552
#10 virginica Petal.Width 2.03 0.275
#11 virginica Sepal.Length 6.59 0.636
#12 virginica Sepal.Width 2.97 0.322
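Here is an option with pivot_longer, which was in the dev version of tidyr when this was written and was released in tidyr 1.0.0.
library(dplyr)
library(tidyr) #tidyr_0.8.3.9000
library(stringr) # for str_replace()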
df %>%
rename_at(-1, ~ str_replace(., "(.*)_(.*)", "\\2_\\1")) %>%
pivot_longer(-Species, names_to = c(".value", "Variable"), names_sep = "_")
# A tibble: 12 x 4
# Species Variable mean sd
# <fct> <chr> <dbl> <dbl>
# 1 setosa Sepal.Length 5.01 0.352
# 2 setosa Sepal.Width 3.43 0.379
# 3 setosa Petal.Length 1.46 0.174
# 4 setosa Petal.Width 0.246 0.105
# 5 versicolor Sepal.Length 5.94 0.516
# 6 versicolor Sepal.Width 2.77 0.314
# 7 versicolor Petal.Length 4.26 0.470
# 8 versicolor Petal.Width 1.33 0.198
# 9 virginica Sepal.Length 6.59 0.636
#10 virginica Sepal.Width 2.97 0.322
#11 virginica Petal.Length 5.55 0.552
#12 virginica Petal.Width 2.03 0.275
data
data(iris)
df <- iris %>%
group_by(Species) %>%
summarize_all(list(mean = mean, sd = sd))