Using pmap with c(...) part 2 - r

I have recently been exploring the pmap function and its variants, and I am particularly interested in using c(...) to pass all the arguments through. The following data set comes from another question that we discussed earlier today with a number of very knowledgeable users.
The task was to repeat the values in the Weight column according to the values in the Days column, row by row, for this data set:
df <- tribble(
~Name, ~School, ~Weight, ~Days,
"Antoine", "Bach", 0.03, 5,
"Antoine", "Ken", 0.02, 7,
"Barbara", "Franklin", 0.04, 3
)
Output:
df %>%
mutate(map2_dfr(Weight, Days, ~ set_names(rep(.x, .y), 1:.y))) %>%
select(-c(Weight, Days))
# A tibble: 3 x 9
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
This output is achievable in several ways, but the following solution, proposed by one of the contributors, caught my attention. I would like to know how I could rewrite it using c(...).
# This is not my code and it works:
pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days)))
# And I can also rewrite it in the following way which also works:
df %>%
mutate(data = pmap(list(Weight, Days), ~ setNames(rep(.x, .y), 1:.y))) %>%
unnest_wider(data)
But I would like to know why neither of these works:
df %>%
mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))
df %>%
pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days)))
Thank you very much in advance and so sorry for the long description.

The issue seems to be a mix-up between a custom anonymous function (function(Weight, Days, ...), whose arguments are named after the columns) and the formula-style lambda (~), whose arguments are .x and .y when there are two inputs, or ..1, ..2, ..3, etc. when there are more. In the OP's code
library(dplyr)
library(purrr)
df %>%
mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))
'Weight' and 'Days' return the full column values from the original dataset, not the values of the current row. To keep using a formula lambda, capture the values of each row with list(...), convert them to a tibble, and evaluate the expression inside with():
df %>%
pmap_dfr(., ~ with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days))))
# A tibble: 3 x 7
# `1` `2` `3` `4` `5` `6` `7`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.03 0.03 0.03 0.03 0.03 NA NA
#2 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 0.04 0.04 0.04 NA NA NA NA
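To see why list(...) helps, here is a minimal sketch (not the answerer's code) that captures each row as a named list inside the formula lambda and then indexes it by column name. The variable name row is only for illustration.

```r
library(purrr)
library(tibble)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# In a formula lambda the row values arrive only through `...`.
# Capturing them as a named list makes each column addressable by name.
rows <- pmap(df, ~ {
  row <- list(...)  # hypothetical helper name: one named element per column
  setNames(rep(row$Weight, row$Days), seq_len(row$Days))
})

# One vector per input row, with as many elements as that row's Days value
lengths(rows)
```

Each element of rows is a named numeric vector, which is what pmap_dfr then binds into one row of the result.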
To keep the other columns as well:
df %>%
pmap_dfr(., ~ c(list(...)[-(3:4)], with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days)))))
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Or use rowwise
library(tidyr)
df %>%
rowwise %>%
mutate(out = list(setNames(rep(Weight, Days), seq_len(Days)))) %>%
ungroup %>%
unnest_wider(c(out)) %>%
select(-Weight, -Days)
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA

This may not add much, but it may help in understanding how arguments work in lambda functions.
pmap_df(df, ~ c(setNames(c(..1, ..2), names(df[1:2])), setNames(rep(..3, ..4), seq_len(..4))))
# A tibble: 3 x 9
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
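Note that pmap_df is an alias of pmap_dfr, so either name works, and that specific arguments can be referenced by position as ..1, ..2, etc. Because c() combines the character and numeric values into one character vector, the numeric columns come back as <chr> above.
Alternatively, keep the first two columns as a list so that the numeric columns stay <dbl>: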
pmap_df(df, ~ c(list(...)[1:2], setNames(rep(..3, ..4), seq_len(..4))))
# A tibble: 3 x 9
Name School `1` `2` `3` `4` `5` `6` `7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA

Related

Bring excel-table in tidy format

I have some struggles converting the following data (from an Excel-sheet) into a tidy format:
input <- structure(list(...11 = c(
NA, NA, "<1000", ">=1000 and <2000",
"2000", ">2000 and < 3000", ">=3000"
), ...13 = c(
"male", "female",
NA, NA, NA, NA, NA
), ...14 = c(
"<777", "<555", "0.3", "0.1",
"0.15", "0.13", "0.15"
), ...15 = c(
"888-999", "555-999", "0.23",
"0.21", "0", "0.21", "0.36"
), ...16 = c(
"556-899", "1020-1170",
"0.13", "0.29", "0.7", "0.8", "0.2"
), ...17 = c(
">960", ">11000",
"0.58", "0.31", "0.22", "0.65", "0.7"
)), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 7 × 6
...11 ...13 ...14 ...15 ...16 ...17
<chr> <chr> <chr> <chr> <chr> <chr>
1 NA male <777 888-999 556-899 >960
2 NA female <555 555-999 1020-1170 >11000
3 <1000 NA 0.3 0.23 0.13 0.58
4 >=1000 and <2000 NA 0.1 0.21 0.29 0.31
5 2000 NA 0.15 0 0.7 0.22
6 >2000 and < 3000 NA 0.13 0.21 0.8 0.65
7 >=3000 NA 0.15 0.36 0.2 0.7
I would like to bring it into the following structure:
output <- tibble::tribble(
~gender, ~x, ~y, ~share,
"male", "<777", "<1000", 0.3,
"female", "<555", "<1000", 0.3,
"male", "<777", ">=1000 and <2000", 0.1,
"female", "<555", ">=1000 and <2000", 0.1,
)
# A tibble: 4 × 4
gender x y share
<chr> <chr> <chr> <dbl>
1 male <777 <1000 0.3
2 female <555 <1000 0.3
3 male <777 >=1000 and <2000 0.1
4 female <555 >=1000 and <2000 0.1
Any hints are much appreciated!
As outlined in the comments, here's a suggested approach.
Import the Excel sheet twice with readxl's read_excel(), using a different skip value each time so that each import picks up a different header row:
library(readxl)
df1 <- read_excel(path = "yourfile.xlsx", skip = 2)
df2 <- read_excel(path = "yourfile.xlsx", skip = 1)
That should give you (note X1 might be called ...1):
df1 <- read_table("NA male <777 888-999 556-899 >960
<1000 NA 0.3 0.23 0.13 0.58
>=1000and<2000 NA 0.1 0.21 0.29 0.31
2000 NA 0.15 0 0.7 0.22
>2000and<3000 NA 0.13 0.21 0.8 0.65
>=3000 NA 0.15 0.36 0.2 0.7")
df2 <- read_table("NA female <555 555-999 1020-1170 >11000
<1000 NA 0.3 0.23 0.13 0.58
>=1000and<2000 NA 0.1 0.21 0.29 0.31
2000 NA 0.15 0 0.7 0.22
>2000and<3000 NA 0.13 0.21 0.8 0.65
>=3000 NA 0.15 0.36 0.2 0.7")
Then do a little wrangling; most importantly turn into a long format:
library(dplyr)
library(tidyr)
df1 <- df1 |>
select(-male) |>
rename(y = X1) |>
mutate(gender = "male") |>
pivot_longer(-c("gender", "y"), names_to = "x", values_to = "share")
df2 <- df2 |>
select(-female) |>
rename(y = X1) |>
mutate(gender = "female") |>
pivot_longer(-c("gender", "y"), names_to = "x", values_to = "share")
And voila, a tidy frame:
bind_rows(df1, df2) |> arrange(y)
Output:
# A tibble: 40 × 4
y gender x share
<chr> <chr> <chr> <dbl>
1 <1000 male <777 0.3
2 <1000 male 888-999 0.23
3 <1000 male 556-899 0.13
4 <1000 male >960 0.58
5 <1000 female <555 0.3
6 <1000 female 555-999 0.23
7 <1000 female 1020-1170 0.13
8 <1000 female >11000 0.58
9 >=1000and<2000 male <777 0.1
10 >=1000and<2000 male 888-999 0.21
# … with 30 more rows
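It's a bit unclear, but I think you'd need to do something like this:
body <- input[3:nrow(input), ] # rows holding y and the shares
header <- input[1:2, 2:3] # gender and the first x column
# Repeat the header rows and the body rows to the same length, then bind them column-wise
header_rep <- header[rep(1:nrow(header), nrow(body)), ]
body_rep <- body[rep(1:nrow(body), 2), ]
result <- cbind(header_rep, body_rep)
Repeat as needed if you need this for multiple columns.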
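As a possible alternative that works directly on the imported table, the sketch below pivots the two header rows and the body rows separately and then matches them on the source column. It uses a cut-down copy of the data with simplified, hypothetical column names (y, gender, c1, c2) instead of ...11, ...13, etc., so treat it as an outline rather than a drop-in solution.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down version of `input` with simplified column names
input <- data.frame(
  y      = c(NA, NA, "<1000", ">=1000 and <2000"),
  gender = c("male", "female", NA, NA),
  c1     = c("<777", "<555", "0.3", "0.1"),
  c2     = c("888-999", "555-999", "0.23", "0.21"),
  stringsAsFactors = FALSE
)

# Header rows: one x label per gender and source column
headers <- input %>%
  filter(!is.na(gender)) %>%
  select(-y) %>%
  pivot_longer(-gender, names_to = "col", values_to = "x")

# Body rows: one share per y label and source column
body <- input %>%
  filter(is.na(gender)) %>%
  select(-gender) %>%
  pivot_longer(-y, names_to = "col", values_to = "share") %>%
  mutate(share = as.numeric(share))

# Pair every header entry with every body entry from the same source column
tidy <- merge(as.data.frame(headers), as.data.frame(body), by = "col") %>%
  select(gender, x, y, share)

tidy
```

Base merge() is used for the last step because each source column appears several times on both sides, and merge() returns all pairings without complaint.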

A way to indicate all possible Likert response options for a particular column so that those not used have a 0 by them using pivot longer in R?

I have numerous likert-type questions in my data and am using pivot longer to get percentages of how often each option is used. For some questions, however, certain options are never indicated by a respondent (e.g., they never answered with a 1). However, I would still like to see each possible response for each item with a 0/0% if it wasn't used. For instance, let's say I have a data frame d1.
names(d1)
"Course" "likert_1" "likert_2" "likert_3" "likert_4"
d1_long <- d1 %>%
pivot_longer(-Course, names_to = "items", values_to = "val") %>%
group_by(items, Course) %>%
mutate(N = sum(!is.na(val)),
val = as.character(val)) %>%
group_by(val, .add = TRUE) %>%
summarise(n = n(),
percent = round((n/N), digits = 2)) %>%
distinct()
head(d1_long)
# A tibble: 6 × 5
# Groups: items, Course, val [6]
items Course val n percent
<chr> <chr> <chr> <int> <dbl>
1 likert_1 A765 2 2 0.04
2 likert_1 A765 3 1 0.02
3 likert_1 A765 4 50 0.88
4 likert_1 B768 1 2 0.04
5 likert_1 B768 3 24 0.48
6 likert_1 B768 4 26 0.52
So, we can see that response option 1 wasn't used in course "A765", and option 2 wasn't used in course B768. What I am hoping to see is something like this:
head(d1_long)
# A tibble: 7 × 5
# Groups: items, Course, val [7]
items Course val n percent
<chr> <chr> <chr> <int> <dbl>
1 likert_1 A765 1 0 0.00
2 likert_1 A765 2 2 0.04
3 likert_1 A765 3 1 0.02
4 likert_1 A765 4 50 0.88
5 likert_1 B768 1 2 0.04
6 likert_1 B768 2 0 0.00
7 likert_1 B768 3 24 0.48
Any help is greatly appreciated- thanks!
Edited:
dput(d1_long)
structure(list(items = c("likert_1", "likert_1", "likert_1",
"likert_1", "likert_1", "likert_1"), Course = c("A765", "A765",
"A765", "B768", "B768", "B768"), val = c(2L, 3L, 4L, 1L, 3L,
4L), n = c(2L, 1L, 50L, 2L, 24L, 26L), percent = c(0.04, 0.02,
0.88, 0.04, 0.48, 0.52)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -6L), groups = structure(list(
items = c("likert_1", "likert_1", "likert_1", "likert_1",
"likert_1", "likert_1"), Course = c("A765", "A765", "A765",
"B768", "B768", "B768"), val = c(2L, 3L, 4L, 1L, 3L, 4L),
.rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), .drop = TRUE))
Edit 2: I should have noted -- not all items have the same response scheme. For instance, some are 1-5 others are 1-7. Thanks
Here is a way. Group by items and Course, then complete based on a vector of all possible responses. Columns n and percent are filled with zeros (the default is NA).
suppressPackageStartupMessages(library(tidyverse))
all_possible_resp <- 1:4
d1_long %>%
ungroup() %>%
group_by(items, Course) %>%
complete(val = all_possible_resp,
fill = list(n = 0, percent = 0)) %>%
ungroup()
#> # A tibble: 8 × 5
#> items Course val n percent
#> <chr> <chr> <int> <int> <dbl>
#> 1 likert_1 A765 1 0 0
#> 2 likert_1 A765 2 2 0.04
#> 3 likert_1 A765 3 1 0.02
#> 4 likert_1 A765 4 50 0.88
#> 5 likert_1 B768 1 2 0.04
#> 6 likert_1 B768 2 0 0
#> 7 likert_1 B768 3 24 0.48
#> 8 likert_1 B768 4 26 0.52
Created on 2022-06-22 by the reprex package (v2.0.1)
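For the case in Edit 2, where items use different response schemes, one possible variation is to build the full grid of valid responses per item and join the counts onto it. This is only a sketch: the scheme lookup table, its max_val column, and the small d1_long sample are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up sample in the same shape as d1_long (ungrouped)
d1_long <- tibble::tibble(
  items   = c("likert_1", "likert_1", "likert_2", "likert_2"),
  Course  = c("A765", "A765", "A765", "A765"),
  val     = c(2L, 4L, 1L, 5L),
  n       = c(2L, 50L, 3L, 7L),
  percent = c(0.04, 0.96, 0.3, 0.7)
)

# Hypothetical lookup: the highest response option available for each item
scheme <- tibble::tibble(
  items   = c("likert_1", "likert_2"),
  max_val = c(4L, 5L)
)

# Every valid (items, val) pair
all_opts <- scheme %>%
  mutate(val = lapply(max_val, seq_len)) %>%
  unnest(val) %>%
  select(-max_val)

# Cartesian product of courses and valid responses (merge with by = NULL)
grid <- merge(as.data.frame(distinct(d1_long, Course)),
              as.data.frame(all_opts), by = NULL)

# Attach the observed counts and fill the unused options with zeros
result <- grid %>%
  left_join(d1_long, by = c("items", "Course", "val")) %>%
  mutate(n = coalesce(n, 0L), percent = coalesce(percent, 0)) %>%
  arrange(items, Course, val)

result
```

The join-based form is used here because the set of valid values depends on the item, which a single all_possible_resp vector cannot express.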

Combine list elements into a dataframe r

I currently have a list with columns as individual elements.
I would like to combine list elements with the same column names (i.e. bind rows) and merge across the different columns (i.e. bind columns) into a single data frame. I'm having difficulty finding examples of how to do this.
l = list(est = c(0, 0.062220390087795, 1.1020213968139, 0.0359939361491544
), se = c(0.0737200634874046, 0.237735179934829, 0.18105632705918,
0.111359438298789), rf = structure(c(NA, NA, NA, 4L), levels = c("Never\nsmoker",
"Occasional\nsmoker", "Ex-regular\nsmoker", "Smoker"), class = "factor"),
n = c(187L, 18L, 32L, 82L), model = c("Crude", "Crude", "Crude",
"Crude"), est = c(0, 0.112335510453586, 0.867095253670329,
0.144963556944891), se = c(0.163523775933409, 0.237039485900481,
0.186247776987999, 0.119887623484768), rf = structure(c(NA,
NA, NA, 4L), levels = c("Never\nsmoker", "Occasional\nsmoker",
"Ex-regular\nsmoker", "Smoker"), class = "factor"), n = c(187L,
18L, 32L, 82L), model = c("Model 1", "Model 1", "Model 1",
"Model 1"), est = c(0, 0.107097305324242, 0.8278765140371,
0.0958220447859447), se = c(0.164787596943329, 0.237347836229364,
0.187201880036661, 0.120882616647714), rf = structure(c(NA,
NA, NA, 4L), levels = c("Never\nsmoker", "Occasional\nsmoker",
"Ex-regular\nsmoker", "Smoker"), class = "factor"), n = c(187L,
18L, 32L, 82L), model = c("Model 2", "Model 2", "Model 2",
"Model 2"))
I would like the data to have the following format:
data.frame(
est = c(),
se = c(),
rf = c(),
model = c()
)
Any help would be appreciated. Thank you!
In this solution, first the elements of l are grouped by name and then are combined using c. Finally, the resulting list is converted to a dataframe using map_dfc.
library(dplyr)
library(purrr)
cols <- c("est", "se", "rf", "model")
setNames(cols,cols) |>
map(~l[names(l) == .x]) |>
map_dfc(~do.call(c, .x))
#> # A tibble: 12 × 4
#> est se rf model
#> <dbl> <dbl> <fct> <chr>
#> 1 0 0.0737 NA Crude
#> 2 0.0622 0.238 NA Crude
#> 3 1.10 0.181 NA Crude
#> 4 0.0360 0.111 Smoker Crude
#> 5 0 0.164 NA Model 1
#> 6 0.112 0.237 NA Model 1
#> 7 0.867 0.186 NA Model 1
#> 8 0.145 0.120 Smoker Model 1
#> 9 0 0.165 NA Model 2
#> 10 0.107 0.237 NA Model 2
#> 11 0.828 0.187 NA Model 2
#> 12 0.0958 0.121 Smoker Model 2
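Another option: since the list repeats the same five elements (est, se, rf, n, model) for each model, split it into blocks of five consecutive elements and row-bind the blocks: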
library(purrr)
grp <- (seq(length(l)) - 1) %/% 5
l_split <- split(l, grp)
map_df(l_split, c)
#> # A tibble: 12 × 5
#> est se rf n model
#> <dbl> <dbl> <fct> <int> <chr>
#> 1 0 0.0737 <NA> 187 Crude
#> 2 0.0622 0.238 <NA> 18 Crude
#> 3 1.10 0.181 <NA> 32 Crude
#> 4 0.0360 0.111 Smoker 82 Crude
#> 5 0 0.164 <NA> 187 Model 1
#> 6 0.112 0.237 <NA> 18 Model 1
#> 7 0.867 0.186 <NA> 32 Model 1
#> 8 0.145 0.120 Smoker 82 Model 1
#> 9 0 0.165 <NA> 187 Model 2
#> 10 0.107 0.237 <NA> 18 Model 2
#> 11 0.828 0.187 <NA> 32 Model 2
#> 12 0.0958 0.121 Smoker 82 Model 2
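If the block size should not be hard-coded, a base R sketch along the same lines can derive it from the number of distinct element names. It assumes the list repeats the same names in the same order for every block; the small list below is made up for illustration.

```r
# Made-up list with the same repeating layout as `l`, but fewer elements
l <- list(
  est = c(0, 0.1), se = c(0.2, 0.3), model = c("Crude", "Crude"),
  est = c(0, 0.4), se = c(0.5, 0.6), model = c("Model 1", "Model 1")
)

# Assumption: every block contains the same names in the same order
block_size <- length(unique(names(l)))
grp <- (seq_along(l) - 1) %/% block_size

# One data frame per block, then stack the blocks
blocks <- lapply(split(l, grp), as.data.frame, stringsAsFactors = FALSE)
out <- do.call(rbind, blocks)

out
```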

Make connections between two datasets

I would like to make a connection between the x and df2 datasets. The dataset x contains a percentage value, which is 0.1 for the day 03-01-2021 and 0.45 for the days 01-02-2021 and 01-01-2022. Since the percentage for 03-01-2021 is 0.1, it falls into category I of df2 (whose values range from 0.1 to 0.2). The days 01-02-2021 and 01-01-2022 correspond to category F of df2, whose values range from 0.4 to 0.5. So, I would like to generate an output table as follows:
library(dplyr)
df1<- structure(
list(date2= c("01-01-2022","01-01-2022","03-01-2021","03-01-2021","01-02-2021","01-02-2021"),
Category= c("ABC","CDE","ABC","CDE","ABC","CDE"),
coef= c(5,4,0,2,4,5)),
class = "data.frame", row.names = c(NA, -6L))
x<-df1 %>%
group_by(date2) %>%
summarize(across("coef", sum),.groups = 'drop')%>%
arrange(date2 = as.Date(date2, format = "%d-%m-%Y"))
number<-20
x$Percentage<-x$coef/number
date2 coef Percentage
<chr> <dbl> <dbl>
1 03-01-2021 2 0.1
2 01-02-2021 9 0.45
3 01-01-2022 9 0.45
df2 <- structure(
list(
Category = c("A", "B", "C", "D",
"E", "F", "G", "H", "I", "J"),
From = c(0.9,
0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0),
Until = c(
1,
0.8999,
0.7999,
0.6999,
0.5999,
0.4999,
0.3999,
0.2999,
0.1999,
0.0999
),
`1 Val` = c(
2222,
2017.8,
1793.6,
1621.5,
1522.4,
1457.3,
1325.2,
1229.15,
1223.1,
1177.05
),
`2 Val` = c(3200, 2220, 2560,
2200, 2220, 2080, 1220, 1240, 1720, 1620),
`3 Val` = c(
4665,
4122.5,
3732,
3498.75,
3265.5,
3032.25,
2799,
2682.375,
2565.75,
2449.125
),
`4 Val` = c(
6112,
5222.8,
4889.6,
4224,
4278.4,
3972.8,
3667.2,
3224.4,
3361.6,
3222.8
)
),
row.names = c(NA,-10L),
class = c("tbl_df",
"tbl", "data.frame")
)
Category From Until 1 Val 2 Val 3 Val 4 Val
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0.9 1 2222 3200 4665 6112
2 B 0.8 0.900 2018 2220 4122 5223
3 C 0.7 0.800 1794 2560 3732 4890
4 D 0.6 0.700 1622 2200 3499 4224
5 E 0.5 0.600 1522 2220 3266 4278
6 F 0.4 0.500 1457 2080 3032 3973
7 G 0.3 0.400 1325 1220 2799 3667
8 H 0.2 0.300 1229 1240 2682 3224
9 I 0.1 0.200 1223 1720 2566 3362
10 J 0 0.0999 1177 1620 2449 3223
Using the tidyverse, go rowwise over the 'x' dataset, slice the first row of 'df2' whose 'From' to 'Until' range contains 'Percentage', drop the first three columns, and unpack the resulting data.frame/tibble column:
library(dplyr)
library(tidyr)
x %>%
rowwise %>%
mutate(out = df2 %>%
slice(which(Percentage>= From &
Percentage <= Until)[1]) %>%
select(-(1:3)) ) %>%
ungroup %>%
unpack(out)
-output
# A tibble: 3 × 7
date2 coef Percentage `1 Val` `2 Val` `3 Val` `4 Val`
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03-01-2021 2 0.1 1223. 1720 2566. 3362.
2 01-02-2021 9 0.45 1457. 2080 3032. 3973.
3 01-01-2022 9 0.45 1457. 2080 3032. 3973.
Or this could be done with a non-equi join
library(data.table)
nm1 <- names(df2)[endsWith(names(df2), 'Val')]
setDT(x)[setDT(df2), (nm1) := mget(nm1),
on = .(Percentage >= From, Percentage <= Until)]
-output
> x
date2 coef Percentage 1 Val 2 Val 3 Val 4 Val
1: 03-01-2021 2 0.10 1223.1 1720 2565.75 3361.6
2: 01-02-2021 9 0.45 1457.3 2080 3032.25 3972.8
3: 01-01-2022 9 0.45 1457.3 2080 3032.25 3972.8
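The same range lookup can probably also be written as a dplyr non-equi join. The sketch below assumes dplyr 1.1.0 or later (for join_by() and its between() helper) and uses a cut-down copy of x and df2 with only one value column.

```r
library(dplyr)

# Cut-down copies of the question's data, for illustration only
x <- data.frame(
  date2 = c("03-01-2021", "01-02-2021", "01-01-2022"),
  Percentage = c(0.1, 0.45, 0.45),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Category = c("F", "I", "J"),
  From = c(0.4, 0.1, 0),
  Until = c(0.4999, 0.1999, 0.0999),
  `1 Val` = c(1457.3, 1223.1, 1177.05),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Match each row of x to the df2 row whose [From, Until] range contains Percentage
out <- x %>%
  left_join(df2, by = join_by(between(Percentage, From, Until))) %>%
  select(-From, -Until)

out
```

A left join keeps every row of x, so a percentage that falls in a gap between two ranges would show up with NA values instead of being dropped.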

How can I easily combine the output of grouped summaries with an overall output for the data

I've used group_by with the summarise command in dplyr to generate some summaries for my data. I would like to get the same summaries for the overall data set and combine it as one tibble.
Is there a straightforward way of doing this? My solution below feels like it has 4X the amount of code required to do this efficiently!
Thanks in advance.
# reprex
library(tidyverse)
tidy_data <- tibble::tribble(
~drug, ~gender, ~condition, ~value,
"control", "f", "work", 0.06,
"treatment", "m", "work", 0.42,
"treatment", "f", "work", 0.22,
"control", "m", "work", 0.38,
"treatment", "m", "work", 0.57,
"treatment", "f", "work", 0.24,
"control", "f", "work", 0.61,
"control", "f", "play", 0.27,
"treatment", "m", "play", 0.3,
"treatment", "f", "play", 0.09,
"control", "m", "play", 0.84,
"control", "m", "play", 0.65,
"treatment", "m", "play", 0.98,
"treatment", "f", "play", 0.38
)
tidy_summaries <- tidy_data %>%
# Group by the required variables
group_by(drug, gender, condition) %>%
summarise(mean = mean(value),
median = median(value),
min = min(value),
max = max(value)) %>%
# Bind rows will bind this output to the following one
bind_rows(
# Now for the overall version
tidy_data %>%
# Generate the overall summary values
mutate(mean = mean(value),
median = median(value),
min = min(value),
max = max(value)) %>%
# We need to know what the structure of the 'grouped_by' tibble first
# as the overall output format needs to match that
select(drug, gender, condition, mean:max) %>% # Keep columns of interest
# The same information will be appended to all rows, so we just need to retain one
filter(row_number() == 1) %>%
# Change the values in drug, gender, condition to "overall"
mutate_at(vars(drug:condition),
list(~ifelse(is.character(.), "overall", .)))
)
This is the output I want, but it wasn't as simple as I might have hoped.
tidy_summaries
#> # A tibble: 9 x 7
#> # Groups: drug, gender [5]
#> drug gender condition mean median min max
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 control f play 0.27 0.27 0.27 0.27
#> 2 control f work 0.335 0.335 0.06 0.61
#> 3 control m play 0.745 0.745 0.65 0.84
#> 4 control m work 0.38 0.38 0.38 0.38
#> 5 treatment f play 0.235 0.235 0.09 0.38
#> 6 treatment f work 0.23 0.23 0.22 0.24
#> 7 treatment m play 0.64 0.64 0.3 0.98
#> 8 treatment m work 0.495 0.495 0.42 0.570
#> 9 overall overall overall 0.429 0.38 0.06 0.98
Try
tidy_data %>%
group_by(drug, gender, condition) %>%
summarise(mean = mean(value), median = median(value), min = min(value), max = max(value)) %>%
bind_rows(.,
tidy_data %>%
summarise(drug = "Overall", gender = "Overall", condition = "Overall", mean = mean(value), median = median(value), min = min(value), max = max(value))
)
This gives:
# A tibble: 9 x 7
# Groups: drug, gender [5]
drug gender condition mean median min max
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 control f play 0.27 0.27 0.27 0.27
2 control f work 0.335 0.335 0.06 0.61
3 control m play 0.745 0.745 0.65 0.84
4 control m work 0.38 0.38 0.38 0.38
5 treatment f play 0.235 0.235 0.09 0.38
6 treatment f work 0.23 0.23 0.22 0.24
7 treatment m play 0.64 0.64 0.3 0.98
8 treatment m work 0.495 0.495 0.42 0.570
9 Overall Overall Overall 0.429 0.38 0.06 0.98
The code summarizes it via groupings first, and then creates the final summary row from the original data and binds it at the very bottom.
Interesting question. My take is basically the same answer as @sumshyftw but uses mutate_if and summarise_at.
Code
library(hablar)
funs <- list(mean = ~mean(.),
median = ~median(.),
min = ~min(.),
max = ~max(.))
tidy_data %>%
group_by(drug, gender, condition) %>%
summarise_at(vars(value), funs) %>%
ungroup() %>%
bind_rows(., tidy_data %>% summarise_at(vars(value), funs)) %>%
mutate_if(is.character, ~if_na(., "Overall"))
Result
drug gender condition mean median min max
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 control f play 0.27 0.27 0.27 0.27
2 control f work 0.335 0.335 0.06 0.61
3 control m play 0.745 0.745 0.65 0.84
4 control m work 0.38 0.38 0.38 0.38
5 treatment f play 0.235 0.235 0.09 0.38
6 treatment f work 0.23 0.23 0.22 0.24
7 treatment m play 0.64 0.64 0.3 0.98
8 treatment m work 0.495 0.495 0.42 0.570
9 Overall Overall Overall 0.429 0.38 0.06 0.98
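Another way to cut the repetition, sketched here with plain dplyr, is to wrap the summary statistics in a small helper and call it once on the grouped data and once on the whole data. The helper name summarise_value and the tiny data set are made up for illustration.

```r
library(dplyr)

# Made-up miniature version of tidy_data
tidy_data <- data.frame(
  drug      = c("control", "control", "treatment", "treatment"),
  gender    = c("f", "f", "m", "m"),
  condition = c("work", "work", "play", "play"),
  value     = c(0.1, 0.3, 0.5, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same statistics for whatever grouping is active
summarise_value <- function(data) {
  data %>%
    summarise(mean = mean(value),
              median = median(value),
              min = min(value),
              max = max(value)) %>%
    ungroup()
}

result <- bind_rows(
  # Per-group summaries
  tidy_data %>% group_by(drug, gender, condition) %>% summarise_value(),
  # Overall summary, labelled so that it lines up with the grouping columns
  tidy_data %>%
    summarise_value() %>%
    mutate(drug = "Overall", gender = "Overall", condition = "Overall")
)

result
```

bind_rows() matches columns by name, so the overall row can add its label columns after summarising.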
