Getting quosures to work inside a map call - r

I'm struggling to get quosures to work inside a map call.
Some toy data:
library(tidyverse)
df <- tibble(
g1 = letters[1:2] %>%
rep(each = 3),
g2 = letters[3:5] %>%
rep(times = 2),
y = runif(6)
)
I can get this function to work, where I enquo a variable before I pass it to group_by:
sum1 <- function(df, g){
g <- enquo(g)
df %>%
group_by(!! g) %>%
summarize(
mu = y %>%
mean
)
}
Calling this function
sum1(df, g2)
gets me the expected result. But if I want to map over multiple grouping variables (i.e. g1 and g2)
str_c("g", 1:2) %>%
map(
function(i)
sum1(df, i)
)
I get the error
Error in grouped_df_impl(data, unname(vars), drop) :
Column `i` is unknown
How can I set up quosures in a map call?

We can use group_by_at, which accepts column names as strings
library(tidyverse)
sum1 <- function(df, grps){
map(grps, ~
df %>%
group_by_at(.x) %>%
summarise(mu = mean(y))
)
}
sum1(df, str_c("g", 1:2))
#[[1]]
# A tibble: 2 x 2
# g1 mu
# <chr> <dbl>
#1 a 0.440
#2 b 0.469
#[[2]]
# A tibble: 3 x 2
# g2 mu
# <chr> <dbl>
#1 c 0.528
#2 d 0.592
#3 e 0.243
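For reference, group_by_at has since been superseded in dplyr by across(). Here is a minimal sketch of the same idea, assuming dplyr 1.0 or later. The name sum_by is hypothetical, and y holds fixed values (instead of runif) so the group means are easy to check by hand:

```r
library(dplyr)
library(purrr)

# Toy data with fixed y values (assumption: replaces runif(6) from the question)
df <- tibble(
  g1 = rep(letters[1:2], each = 3),
  g2 = rep(letters[3:5], times = 2),
  y  = c(1, 2, 3, 4, 5, 6)
)

# Hypothetical helper: one summary per grouping column, given as strings
sum_by <- function(df, grps) {
  map(grps, function(g) {
    df %>%
      group_by(across(all_of(g))) %>%
      summarise(mu = mean(y), .groups = "drop")
  })
}

res <- sum_by(df, c("g1", "g2"))
```

res is a list with one tibble per grouping column, in the same order as the names passed in.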
Regarding the use of quosure parameters in the function, it is not clear whether it should take a single parameter or multiple parameters.
If we pass strings as arguments, convert each one to a symbol (sym) and then unquote it (!!)
sum2 <- function(df, grps){
map(grps, ~
df %>%
group_by(!! rlang::sym(.x)) %>%
summarise(mu = mean(y))
)
}
sum2(df, str_c("g", 1:2))
#[[1]]
# A tibble: 2 x 2
# g1 mu
# <chr> <dbl>
#1 a 0.440
#2 b 0.469
#[[2]]
# A tibble: 3 x 2
# g2 mu
# <chr> <dbl>
#1 c 0.528
#2 d 0.592
#3 e 0.243
Another option, using quosures to pass multiple groups, would be
sum3 <- function(df, ...){
gs <- enquos(...)
map(gs, ~
df %>%
group_by(!! .x) %>%
summarise(mu = mean(y)))
}
sum3(df, g1, g2)
#[[1]]
# A tibble: 2 x 2
# g1 mu
# <chr> <dbl>
#1 a 0.440
#2 b 0.469
#[[2]]
# A tibble: 3 x 2
# g2 mu
# <chr> <dbl>
#1 c 0.528
#2 d 0.592
#3 e 0.243

str_c("g", 1:2) %>%
syms() %>%
map(sum1, df = df)
syms() turns characters into symbols (expected by sum1).
Rewriting map(function(i) sum1(df, i)) as map(sum1, df = df) passes the symbols g1 and g2 directly to sum1(), rather than the symbol i that sum1() sees when it is wrapped in another function.
Alternatively, str_c("g", 1:2) %>% syms() %>% map(function(i) sum1(df, !! i)) or str_c("g", 1:2) %>% map(function(i) sum1(df, !! sym(i))) work, where !! unquotes i before passing it to sum1().
(Actually this is a bit oversimplified: unquoting doesn't happen before the call, but when you do enquo(g) in the body of sum1.)
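To see what enquo() actually captures in each case, here is a small sketch that only uses rlang. The functions show_arg, wrapper and unquoting are hypothetical names made up for illustration; show_arg stands in for sum1() and simply reports the expression it received:

```r
library(rlang)

# Stand-in for sum1(): report the expression captured by enquo()
show_arg <- function(g) as_label(enquo(g))

# Wrapping the call: enquo() sees the wrapper's argument name, not its value
wrapper <- function(i) show_arg(i)

# Unquoting with !!: the symbol built from the string is what gets captured
unquoting <- function(i) show_arg(!!sym(i))

captured_plain    <- wrapper("g1")    # the label is "i"
captured_unquoted <- unquoting("g1")  # the label is "g1"
```

This is the same reason the original call fails with "Column `i` is unknown": group_by receives the symbol i rather than g1 or g2.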

Related

How to combine function argument with group_by in R

I would like to use the group_by( ) function inside my customised function, with the column names that go within group_by defined by my function argument.
See a hypothetical example of what my data would look like:
data <- data.frame(ind = rep(c("A", "B", "C"), 4),
gender = rep(c("F", "M"), each = 6),
value = sample(1:100, 12))
And this is the result I would like to have:
result <- data %>%
group_by(ind, gender) %>%
mutate(value = mean(value)) %>%
distinct()
This is how I was trying to make my function to work:
myFunction <- function(data, set_group, variable){
result <- data %>%
group_by(get(set_group)) %>%
mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
distinct()
}
result3 <- myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
result3
I want to allow the user to define as many set_group columns and as many variable columns as needed. I tried using the get( ), all_of( ) and mget( ) functions within group_by, but none of them worked.
Does anyone know how I can code it?
Thank you!
We could use across within group_by
myFunction <- function(data, set_group, variable){
data %>%
group_by(across(all_of(set_group))) %>%
mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
ungroup %>%
distinct()
}
-testing
> myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
# A tibble: 6 × 3
ind gender value
<chr> <chr> <dbl>
1 A F 43.5
2 B F 87.5
3 C F 67.5
4 A M 13
5 B M 43.5
6 C M 37.5
Another option is to convert the strings to symbols and unquote-splice them (!!!)
myFunction <- function(data, set_group, variable){
data %>%
group_by(!!! rlang::syms(set_group)) %>%
mutate(across(all_of(variable), ~ mean(.x, na.rm = TRUE))) %>%
ungroup %>%
distinct()
}
-testing
> myFunction(data, set_group = c("ind", "gender"), variable = c("value"))
# A tibble: 6 × 3
ind gender value
<chr> <chr> <dbl>
1 A F 43.5
2 B F 87.5
3 C F 67.5
4 A M 13
5 B M 43.5
6 C M 37.5
NOTE: get is used when there is a single object; for multiple objects mget can be used. But it is better to use tidyverse functions here.
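If only one row per group is wanted, summarise can replace the mutate + distinct pair. This is a sketch under a few assumptions: dplyr 1.0 or later, a hypothetical function name summarise_means, and value set to 1:12 instead of a random sample so the means can be checked by hand. Unlike mutate + distinct, it drops any column that is neither a grouping column nor listed in variable:

```r
library(dplyr)

# Hypothetical variant of myFunction that collapses each group to one row
summarise_means <- function(data, set_group, variable) {
  data %>%
    group_by(across(all_of(set_group))) %>%
    summarise(across(all_of(variable), ~ mean(.x, na.rm = TRUE)),
              .groups = "drop")
}

# Same layout as the question's data, with fixed values (assumption)
data <- data.frame(ind = rep(c("A", "B", "C"), 4),
                   gender = rep(c("F", "M"), each = 6),
                   value = 1:12)

out <- summarise_means(data, set_group = c("ind", "gender"), variable = "value")
```

Both set_group and variable can be character vectors of any length.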

add_column applies to a matrix

I have a 2-by-2 tibble, and use add_column to add a 2-by-2 matrix. I want the result to be a 2-by-4 tibble, but I end up with a 2-by-3 tibble.
Here is a sample code:
A <- tibble( x = c(-1,1), y = c(-2,2))
A <- A %>% add_column( z = matrix( rnorm(4), 2,2 ) )
and dim(A) returns 2 3.
So my question is: how can I get a 2-by-4 result instead?
Maybe you can try
> A %>% add_column(as_tibble(matrix(rnorm(4), 2, 2)))
# A tibble: 2 x 4
x y V1 V2
<dbl> <dbl> <dbl> <dbl>
1 -1 -2 -0.647 -0.982
2 1 2 -0.264 -1.25
If you want to add columns with names starting with z, we can use
A %>% add_column(setNames(as_tibble(matrix(rnorm(4), 2, 2)),paste0("z",1:2)))
or (thanks to akrun's comment)
A %>% add_column(as_tibble(matrix(rnorm(4), 2, 2, dimnames = list(NULL, c('z1', 'z2')))))
We can flatten the matrix column 'z' into regular columns with do.call and data.frame
library(dplyr)
library(tibble)
A %>%
add_column( z = matrix( rnorm(4), 2,2 ) ) %>%
do.call(data.frame, .) %>%
as_tibble
-output
# A tibble: 2 x 4
# x y z.1 z.2
# <dbl> <dbl> <dbl> <dbl>
#1 -1 -2 -1.41 -1.93
#2 1 2 1.80 1.01
Or another option is to split the matrix into a list with asplit and use unnest_wider
library(tidyr)
A %>%
add_column( z = matrix( rnorm(4), 2,2 ) ) %>%
mutate(z = asplit(z, 2)) %>%
unnest_wider(c(z))
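The 2-by-3 result comes from the matrix being stored as a single matrix column z. One more way to get four regular columns is to name the matrix columns first and then bind. This is only a sketch: the fixed matrix values and the names m and B are assumptions chosen so the result is reproducible:

```r
library(tibble)
library(dplyr)

A <- tibble(x = c(-1, 1), y = c(-2, 2))

# Fixed values instead of rnorm(4); column names are set up front
m <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2, ncol = 2,
            dimnames = list(NULL, c("z1", "z2")))

# as_tibble() turns each matrix column into its own tibble column
B <- bind_cols(A, as_tibble(m))
dim(B)
```

Naming the columns before calling as_tibble() also avoids relying on automatically generated names.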

Is there some function to keep unique values in R dplyr with group_by?

I have a data.frame (or tibble or whatever) with an id variable. I often perform some operation per id with dplyr::group_by, like so
data %>%
group_by(id) %>%
summarise/mutate/...()
Often, I have other non-numeric variables that are unique for each id, such as the project or country to which the id belongs and other characteristics of the id (such as gender, etc.). When I use the summarise function above, these other variables are lost unless I specify them, either with
data %>%
group_by(id) %>%
summarise(across(c(project, country, gender, ...), unique),...)
or
data %>%
group_by(id, project, country, gender, ...) %>%
summarise()
Is there a function that detects the variables which are unique for each id, so that one does not have to specify them?
Thank you!
PS: I am asking mainly about dplyr and group_by related functions, but other approaches like base R or data.table are welcome too.
I did not test it extensively, yet it should do the job
library(dplyr)
myData <- tibble(X = c(1, 1, 2, 2, 2, 3),
Y = LETTERS[c(1, 1, 2, 2, 2, 3)],
R = rnorm(6))
myData
#> # A tibble: 6 x 3
#> X Y R
#> <dbl> <chr> <dbl>
#> 1 1 A 0.463
#> 2 1 A -0.965
#> 3 2 B -0.403
#> 4 2 B -0.417
#> 5 2 B -2.28
#> 6 3 C 0.423
group_by_id_vars <- function(.data, ...) {
# group by the prespecified ID variables
.data <- .data %>% group_by(...)
# how many groups do these ID determine
ID_groups <- .data %>% n_groups()
# Get the number of groups if the initial grouping variables are combined
# with other variables
groupVars <- sapply(substitute(list(...))[-1], deparse) #specified grouping Variable
nms <- names(.data) # all variables in .data
res <- sapply(nms[!nms %in% groupVars],
function(x) {
.data %>%
# important to specify .add = TRUE to combine the variable
# with the IDs
group_by(across(all_of(x)), .add = TRUE) %>%
n_groups()})
# which combinations are identical, i.e. this variable does not increase the
# number of groups in the data if combined with IDvars
v <- names(res)[which(res == ID_groups)]
# group the data accordingly
.data <- .data %>% ungroup() %>% group_by(across(all_of(c(groupVars, v))))
return(.data)
}
myData %>%
group_by_id_vars(X) %>%
summarise(n = n())
#> `summarise()` regrouping output by 'X' (override with `.groups` argument)
#> # A tibble: 3 x 3
#> # Groups: X [3]
#> X Y n
#> <dbl> <chr> <int>
#> 1 1 A 2
#> 2 2 B 3
#> 3 3 C 1
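A more compact way to find such columns is to count distinct values per id and keep the columns where every count is 1. This is a minimal sketch, assuming dplyr 1.0 or later; constant_within is a hypothetical helper name, and R is given fixed values instead of rnorm(6) so the result is reproducible:

```r
library(dplyr)

# Hypothetical helper: names of columns that take a single value within each id
constant_within <- function(data, id) {
  counts <- data %>%
    group_by(across(all_of(id))) %>%
    summarise(across(everything(), n_distinct), .groups = "drop")
  others <- setdiff(names(counts), id)
  # keep a column only if it has exactly one distinct value in every group
  others[vapply(counts[others], function(n) all(n == 1L), logical(1))]
}

myData <- tibble(X = c(1, 1, 2, 2, 2, 3),
                 Y = LETTERS[c(1, 1, 2, 2, 2, 3)],
                 R = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

id_vars <- constant_within(myData, "X")

res <- myData %>%
  group_by(across(all_of(c("X", id_vars)))) %>%
  summarise(n = n(), .groups = "drop")
```

Note that n_distinct counts NA as a value, so a column that mixes NA with one other value inside a group is not treated as constant.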
This is a bit more advanced in application, but what you are looking for are linear combinations of your grouping variables. You can convert these to factors and then use some linear algebra.
You can use findLinearCombos() from caret to locate these. It takes a bit of work to get it all organized how I think you want it though.
Something like this may do the trick. I also have not extensively tested this.
Packages
library(dplyr)
library(caret)
library(purrr)
Function
group_by_lc <- function(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data)) {
# capture the ... and convert to a character vector
.groups <- rlang::ensyms(...)
.groups_chr <- map_chr(.groups, rlang::as_name)
# convert all character and factor variables to a numeric
d <- .data %>%
mutate(across(where(is.factor), as.character),
across(where(is.character), as.factor),
across(where(is.factor), as.integer))
# find linear combinations of the character / factor variables
lc <- caret::findLinearCombos(d)
# see if any of your grouping variables have linear combinations
find_group_match <- function(known_groups, lc_pair) {
if (any(lc_pair %in% known_groups)) unique(c(lc_pair, known_groups)) else NULL
}
# convert column indices to names
lc_pairs <- map(lc$linearCombos, ~ names(d)[.x])
# iteratively look for linear combinations of known grouping variables
lc_cols <- reduce(lc_pairs, find_group_match, .init = .groups_chr)
# find new grouping variables
added_groups <- rlang::syms(lc_cols[!(lc_cols %in% .groups_chr)])
# apply the grouping to your groups and the linear combinations
group_by(.data, !!!.groups, !!!added_groups, .add = .add, .drop = .drop)
}
Usage
data <- tibble(V = LETTERS[1:10], W = letters[1:10], X = paste0(V, W), Y = rep(LETTERS[1:5], each = 2), Z = runif(10))
group_by_lc(data, W)
Result
You can see how it added in all the other grouping variables. You can rework this all in other ways, the key part is building that added_groups list to find them.
# A tibble: 10 x 5
# Groups: W, X, V [10]
V W X Y Z
<chr> <chr> <chr> <chr> <dbl>
1 A a Aa A 0.884
2 B b Bb A 0.133
3 C c Cc B 0.194
4 D d Dd B 0.407
5 E e Ee C 0.256
6 F f Ff C 0.0976
7 G g Gg D 0.635
8 H h Hh D 0.0542
9 I i Ii E 0.0104
10 J j Jj E 0.464

Using quotations inside mutate: an alternative to mutate_(.dots = ...)

I want to apply different functions to the same column in a tibble. These functions are stored as strings in a character vector. I used to do this with mutate_ and the .dots argument like this:
library(dplyr)
myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
tibble(a = 1:3) %>%
mutate_(.dots = myfuns)
This approach still works fine but mutate_ is deprecated. I tried to achieve the same result with mutate and the rlang package but did not get very far.
In my real example myfuns contains about 200 functions so typing them one by one is not an option.
Thanks in advance.
For simple equations that take a single input, it’s sufficient to supply the function itself, e.g.
iris %>% mutate_at(vars(-Species), sqrt)
Or, when using an equation rather than a simple function, via a formula:
iris %>% mutate_at(vars(-Species), ~ . ^ 2)
When using equations that access more than a single variable, you need to use rlang quosures instead:
area = quo(Sepal.Length * Sepal.Width)
iris %>% mutate(Sepal.Area = !! area)
Here, quo creates a “quosure” — i.e. a quoted representation of your equation, same as your use of strings, except, unlike strings, this one is properly scoped, is directly usable by dplyr, and is conceptually cleaner: It is like any other R expression, except not yet evaluated. The difference is as follows:
1 + 2 is an expression with value 3.
quo(1 + 2) is an unevaluated expression with value 1 + 2 that evaluates to 3, but it needs to be explicitly evaluated. So how do we evaluate an unevaluated expression?
!! (pronounced “bang bang”) unquotes the previously-quoted expression, i.e. evaluates it, inside the context of mutate. This is important, because Sepal.Length and Sepal.Width are only known inside the mutate call, not outside of it.
In all the cases above, the expressions can be inside a list, too. The only difference is that for lists you need to use !!! instead of !!:
funs = list(
Sepal.Area = quo(Sepal.Length * Sepal.Width),
Sepal.Ratio = quo(Sepal.Length / Sepal.Width)
)
iris %>% mutate(!!! funs)
The !!! operation is known as “unquote-splice”. The idea is that it “splices” the list elements of its arguments into the parent call. That is, it seems to modify the call as if it contained the list elements verbatim as arguments (this only works in functions, such as mutate, that support it, though).
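To make the quoting and splicing steps concrete outside of dplyr, here is a small sketch using only rlang. The names q, area, dims and spliced are made up for illustration:

```r
library(rlang)

# A quosure holds the expression unevaluated until asked
q <- quo(1 + 2)
value <- eval_tidy(q)

# With a data argument, column names are looked up in the data,
# which is roughly what mutate does when it evaluates its arguments
area <- quo(w * h)
dims <- data.frame(w = c(1, 2), h = c(3, 4))
areas <- eval_tidy(area, data = dims)

# !!! splices list elements into a call as separate arguments
spliced <- expr(c(!!!list(1, 2)))
spliced_value <- eval(spliced)
```

Printing spliced shows the call c(1, 2), as if the list elements had been typed in as arguments.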
Convert your strings to expressions
myexprs <- purrr::map( myfuns, rlang::parse_expr )
then pass those expressions to regular mutate using quasiquotation:
tibble(a = 1:3) %>% mutate( !!!myexprs )
# # A tibble: 3 x 4
# a f1 f2 f3
# <int> <dbl> <dbl> <dbl>
# 1 1 1 2.72 1
# 2 2 4 7.39 1.41
# 3 3 9 20.1 1.73
Note that this will also work with strings / expressions involving multiple columns.
You have only one column, so both approaches below will give you the same result.
You only have to modify your functions' list.
library(dplyr)
myfuns <- c(f1 = ~.^2, f2 = ~exp(.), f3 = ~sqrt(.))
tibble(a = 1:3) %>% mutate_at(vars(a), myfuns)
tibble(a = 1:3) %>% mutate_all(myfuns)
# # A tibble: 3 x 4
# a f1 f2 f3
# <int> <dbl> <dbl> <dbl>
# 1 1 1 2.72 1
# 2 2 4 7.39 1.41
# 3 3 9 20.1 1.73
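In current dplyr the scoped verbs mutate_at and mutate_all are superseded by across(). A sketch of the equivalent call, assuming a dplyr version that supports the "{.fn}" placeholder in .names; the variable out is just for illustration:

```r
library(dplyr)

# Named list of formulas; the names become the new column names
myfuns <- list(f1 = ~ .x^2, f2 = ~ exp(.x), f3 = ~ sqrt(.x))

out <- tibble(a = 1:3) %>%
  mutate(across(a, myfuns, .names = "{.fn}"))
```

Without .names the new columns would be called a_f1, a_f2 and a_f3.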
A base R alternative:
myfuns <- c(f1 = "a^2", f2 = "exp(a)", f3 = "sqrt(a)")
df <- data.frame(a = 1:3)
df[names(myfuns)] <- lapply(myfuns , function(x) eval(parse(text= x), envir = df))
df
#> a f1 f2 f3
#> 1 1 1 2.718282 1.000000
#> 2 2 4 7.389056 1.414214
#> 3 3 9 20.085537 1.732051
Created on 2019-07-08 by the reprex package (v0.3.0)
One way using parse_expr from rlang
library(tidyverse)
library(rlang)
tibble(a = 1:3) %>%
mutate(ans = map(myfuns, ~eval(parse_expr(.)))) %>%
#OR mutate(ans = map(myfuns, ~eval(parse(text = .)))) %>%
unnest() %>%
group_by(a) %>%
mutate(temp = row_number()) %>%
spread(a, ans) %>%
select(-temp) %>%
rename_all(~names(myfuns))
# A tibble: 3 x 3
# f1 f2 f3
# <dbl> <dbl> <dbl>
#1 1 2.72 1
#2 4 7.39 1.41
#3 9 20.1 1.73
You can also try a purrr approach
# define the functions
f1 <- function(a) a^2
f2 <- function(a, b) a + b
f3 <- function(b) sqrt(b)
# put all functions in one list
tibble(funs=list(f1, f2, f3)) %>%
# give each function a name
mutate(fun_id=paste0("f", row_number())) %>%
# add to each row/function the matching column profile
# first extract the column names you specified in each function
#mutate(columns=funs %>%
# toString() %>%
# str_extract_all(., "function \\(.*?\\)", simplify = T) %>%
# str_extract_all(., "(?<=\\().+?(?=\\))", simplify = T) %>%
# gsub(" ", "", .) %>%
# str_split(., ",")) %>%
# with the help of Konrad we can use fn_fmls_names
mutate(columns=map(funs, ~ rlang::fn_fmls_names(.))) %>%
# select the columns and add to our tibble/data.frame
mutate(params=map(columns, ~select(df, .))) %>%
# invoke the functions
mutate(results = invoke_map(.f = funs, .x = params)) %>%
# transform to desired output
unnest(results) %>%
group_by(fun_id) %>%
mutate(n=row_number()) %>%
spread(fun_id, results) %>%
left_join(mutate(df, n=row_number()), .) %>%
select(-n)
Joining, by = "n"
# A tibble: 5 x 5
a b f1 f2 f3
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 1 4 3 1
2 4 1 16 5 1
3 5 2 25 7 1.41
4 7 2 49 9 1.41
5 8 2 64 10 1.41
Some data (df must be defined before running the pipeline above; tibble() replaces the deprecated data_frame())
df <- tibble(
a = c(2, 4, 5, 7, 8),
b = c(1, 1, 2, 2, 2))

dplyr: passing column name to summarize inside function

I have the following example, where I pass a simple dataframe to a function that summarizes a column. The name of the summarizing column, s, I would like to have as a parameter to the function:
df <- data.frame(id = c(1,1,1,1,1,2,2,2,2,2),
a=c(1:10),
b=c(10:19))
sum <- function(df, s){
df <- df %>%
group_by(id) %>%
summarize(s = sum(a))
return(df)
}
sum(df = df, s = "summarizing.column.label")
However, regardless of the value I set, the summarizing column always gets the same name, s. Is there a way to alter it?
EDIT: The output I would like is:
sum(df = df, s = "summarizing.column.label")
id summarizing.column.label
<dbl> <int>
1 1.00 15
2 2.00 40
sum(df = df, s = "a")
id a
<dbl> <int>
1 1.00 15
2 2.00 40
If we are passing the name as a string, then one option is to use rename_at after the summarise
sumf <- function(df, s){
df %>%
group_by(id) %>%
summarize(a = sum(a))%>%
rename_at("a", ~ s)
}
sumf(df, s ="summarizing.column.label" )
# A tibble: 2 x 2
# id summarizing.column.label
# <dbl> <int>
#1 1.00 15
#2 2.00 40
sumf(df, s ="a" )
# A tibble: 2 x 2
# id a
# <dbl> <int>
#1 1.00 15
#2 2.00 40
Or another option is to make use of := with !!
sumf <- function(df, s){
df %>%
group_by(id) %>%
summarize(a = sum(a))%>%
rename(!! (s) := a)
}
sumf(df, s ="summarizing.column.label" )
# A tibble: 2 x 2
# id summarizing.column.label
# <dbl> <int>
#1 1.00 15
#2 2.00 40
Or within summarise
sumf <- function(df, s){
df %>%
group_by(id) %>%
summarise(!!(s) := sum(a))
}
sumf(df, s ="summarizing.column.label" )
Try this:
sum <- function(df, s){
df <- df %>%
group_by(id) %>%
# base::sum is called explicitly because this function is itself named sum
summarize(!!s := base::sum(a))
return(df)
}
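Recent versions of dplyr and rlang also accept a glue-style string on the left-hand side of :=, which avoids the explicit !!. A minimal sketch, assuming dplyr 1.0 or later; sum_by_id is a hypothetical name, chosen so that it does not mask base sum():

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 a = 1:10,
                 b = 10:19)

# Hypothetical helper: s is the output column name, given as a string
sum_by_id <- function(df, s) {
  df %>%
    group_by(id) %>%
    summarise("{s}" := sum(a), .groups = "drop")
}

out <- sum_by_id(df, s = "total")
```

Any string works for s, including "summarizing.column.label" from the question.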

Resources