Using quasiquotation in a function with summarize in dplyr

I'm trying to write a function that takes column names as strings stored in variables and produces a summarized output for them, like so:
my_function <- function(my_df, columnA, columnB) {
  summ_df <- my_df %>%
    group_by(cyl) %>%
    summarise(base_mean = mean(columnA),
              contrast_mean = mean(columnB))
  return(summ_df)
}
base = "drat"
cont = "wt"
my_function(mtcars, base, cont)
What I want is for the function above to return the same thing as:
mtcars %>%
  group_by(cyl) %>%
  summarise(base_mean = mean(drat),
            contrast_mean = mean(wt))
I'm sure it's some combination of enexpr or ensym and !!, but I keep getting NA values.

Use ensym() with !! so that the function can take both unquoted and quoted (string) arguments:
my_function <- function(my_df, columnA, columnB) {
  my_df %>%
    group_by(cyl) %>%
    summarise(base_mean = mean(!! ensym(columnA)),
              contrast_mean = mean(!! ensym(columnB)), .groups = 'drop')
}
Testing:
> my_function(mtcars, !!base, !!cont)
# A tibble: 3 × 3
cyl base_mean contrast_mean
<dbl> <dbl> <dbl>
1 4 4.07 2.29
2 6 3.59 3.12
3 8 3.23 4.00
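In the question's version, mean(columnA) receives the character string "drat" itself rather than the column, which is why it returns NA. If the column names are always strings, a variant built on the .data pronoun avoids the need for !! at the call site. The following is only a sketch under that assumption; my_function2 is a hypothetical name, not part of the answer above.

```r
library(dplyr)

# Hypothetical variant of my_function: columnA and columnB must be strings.
my_function2 <- function(my_df, columnA, columnB) {
  my_df %>%
    group_by(cyl) %>%
    summarise(
      # .data[[...]] looks the column up by its name within each group
      base_mean = mean(.data[[columnA]]),
      contrast_mean = mean(.data[[columnB]]),
      .groups = "drop"
    )
}

base <- "drat"
cont <- "wt"
res <- my_function2(mtcars, base, cont)
```

Unlike the ensym() version, this one does not accept unquoted column names, so the choice depends on how the function will be called.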

Related

In R: How do I send a global variable (char string), into the mutate function [duplicate]

I would like to write a function that uses dplyr inside and to which I supply variable names as strings. Unfortunately, dplyr's use of NSE makes this rather complicated. From Programming with dplyr I get the following example:
my_summarise <- function(df, var) {
var <- enquo(var)
df %>%
group_by(!!var) %>%
summarise(a = mean(a))
}
my_summarise(df, g1)
However, I would like to write a function where, instead of g1, I could provide "g1", and I am not able to wrap my head around how to do that.
dplyr >= 1.0
Use a combination of double braces and the across function:
my_summarise2 <- function(df, group_var) {
df %>% group_by(across({{ group_var }})) %>%
summarise(mpg = mean(mpg))
}
my_summarise2(mtcars, "cyl")
# A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.7
# 2 6 19.7
# 3 8 15.1
# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
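When the column names live in a character vector, wrapping the vector in all_of() inside across() makes the intent explicit and allows several grouping columns at once. This is a minimal sketch assuming dplyr >= 1.0; my_summarise3 and the choice of cyl and am as grouping columns are illustrative only.

```r
library(dplyr)

# Hypothetical helper: group_vars is a character vector of column names.
my_summarise3 <- function(df, group_vars) {
  df %>%
    # all_of() selects exactly the columns named in the character vector
    group_by(across(all_of(group_vars))) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

groups <- c("cyl", "am")
res <- my_summarise3(mtcars, groups)
```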
dplyr < 1.0
As far as I know, you could use as.name (base R) or sym (from the rlang package; current dplyr versions re-export it):
library(dplyr)
my_summarise <- function(df, var) {
var <- rlang::sym(var)
df %>%
group_by(!!var) %>%
summarise(mpg = mean(mpg))
}
or
my_summarise <- function(df, var) {
var <- as.name(var)
df %>%
group_by(!!var) %>%
summarise(mpg = mean(mpg))
}
my_summarise(mtcars, "cyl")
# A tibble: 3 × 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.66364
# 2 6 19.74286
# 3 8 15.10000
Using the .data pronoun from rlang is another option that works directly with column names stored as strings.
The function with .data would look like
my_summarise <- function(df, var) {
df %>%
group_by(.data[[var]]) %>%
summarise(mpg = mean(mpg))
}
my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
cyl mpg
<dbl> <dbl>
1 4 26.7
2 6 19.7
3 8 15.1
This is how to do it using only dplyr and the very useful as.name function from base R:
my_summarise <- function(df, var) {
varName <- as.name(var)
enquo_varName <- enquo(varName)
df %>%
group_by(!!enquo_varName) %>%
summarise(a = mean(a))
}
my_summarise(df, "g1")
Basically, as.name() generates a name object (a symbol) that matches var (here var is a string). Then, following Programming with dplyr, enquo() takes that name and returns it as a quosure, which is unquoted inside the group_by() call using !!.

Pass a list of variable names to a function using {{foo}}

Problem
I would like to know how to pass a list of variable names to a purrr::map2 function for the purpose of iterating over a separate data frame.
The input_table$key variable below contains mpg and disp from the mtcars dataset. I think the names of the variables are being passed as character strings rather than variable names. The question is how I can change that so that my function recognises that they are variable names(?).
In this example I am trying to sum all of the values in the mtcars variables mpg and disp that fall below a set of numeric thresholds. Those variables from mtcars and the relevant thresholds are contained in input_table (below).
Ideal result
percentile key value sum_y
<fct> <chr> <dbl> <dbl>
1 0.5 mpg 19.2 266.5
2 0.9 mpg 30.1 515.8
3 0.99 mpg 33.4 609.0
4 1 mpg 33.9 642.9
5 ... ... ... ...
Attempt
library(dplyr)
library(purrr)
library(tidyr)
# Arrange a generic example
# Replicating my data structure
input_table <- mtcars %>%
as_tibble() %>%
select(mpg, disp) %>%
map_df(quantile, probs = c(0.5, 0.90, 0.99, 1)) %>%
mutate(
percentile = factor(c(0.5, 0.90, 0.99, 1))
) %>%
select(
percentile, mpg, disp
) %>%
gather(key, value, -percentile)
# Defining the function
test_func <- function(label_desc, threshold) {
  mtcars %>%
    select({{label_desc}}) %>%
    filter({{label_desc}} <= {{threshold}}) %>%
    summarise(
      sum_y = sum(as.numeric({{label_desc}}), na.rm = TRUE)
    )
}
# Demo'ing that it works for a single variable and threshold value
test_func(label_desc = mpg, threshold = 19.2)
# This is where I am having trouble
# Trying to iterate over multiple (mpg, disp) variables
map2(input_table$key, input_table$value, ~test_func(label_desc = .x, threshold = .y))
The issue is that curly-curly ({{ }}) is meant for unquoted variables, as in your first call. In the map2() call you are passing the column names as strings, and curly-curly does not turn a string into a column reference. A simple fix is to use the _at variants of the dplyr verbs, which accept quoted column names (they have been superseded by across() since dplyr 1.0 but still work):
test_func <- function(label_desc, threshold) {
mtcars %>%
filter_at(label_desc, any_vars(. <= threshold)) %>%
summarise_at(label_desc, sum)
}
purrr::map2(input_table$key, input_table$value, test_func)
#[[1]]
# mpg
#1 266.5
#[[2]]
# mpg
#1 515.8
#[[3]]
# mpg
#1 609
#[[4]]
# mpg
#1 642.9
#[[5]]
# disp
#1 1956.7
#.....
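With dplyr >= 1.0, the same idea can be written without the _at variants by looking the column up through the .data pronoun. The sketch below is an assumption-labelled alternative rather than the answerer's method; test_func2 is a hypothetical name, and the disp threshold of 200 is an arbitrary illustrative value.

```r
library(dplyr)
library(purrr)

# Hypothetical rewrite of test_func: label_desc is a column name given as a string.
test_func2 <- function(label_desc, threshold) {
  mtcars %>%
    # keep rows where the named column is at or below the threshold
    filter(.data[[label_desc]] <= threshold) %>%
    summarise(sum_y = sum(.data[[label_desc]], na.rm = TRUE))
}

# Two illustrative (column, threshold) pairs.
keys <- c("mpg", "disp")
values <- c(19.2, 200)
res <- map2(keys, values, test_func2)
```

Because the summary column is always called sum_y here, the per-pair results could also be row-bound into a single table next to the key and value columns.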

Renaming a column name, by using the data frame title/name

I have a data frame called "Something". I am doing an aggregation on one of the numeric columns using summarise, and I want the name of that column to contain "Something" - data frame title in the column name.
Example:
temp <- Something %>%
group_by(Month) %>%
summarise(avg_score=mean(score))
But I would like to name the aggregate column "avg_Something_score". Does that make sense?
We can do this with quosures, available in dplyr 0.7.0 and later (the release originally announced as 0.6.0, which was the devel version when this was written):
library(dplyr)
myFun <- function(data, group, value){
dataN <- quo_name(enquo(data))
group <- enquo(group)
value <- enquo(value)
newName <- paste0("avg_", dataN, "_", quo_name(value))
data %>%
group_by(!!group) %>%
summarise(!!newName := mean(!!value))
}
myFun(mtcars, cyl, mpg)
# A tibble: 3 × 2
# cyl avg_mtcars_mpg
# <dbl> <dbl>
#1 4 26.66364
#2 6 19.74286
#3 8 15.10000
myFun(iris, Species, Petal.Width)
# A tibble: 3 × 2
# Species avg_iris_Petal.Width
# <fctr> <dbl>
#1 setosa 0.246
#2 versicolor 1.326
#3 virginica 2.026
Here, enquo captures the input arguments, much like substitute from base R, and converts them to quosures. quo_name converts a quosure to a string, and unquoting with !! (or UQ) evaluates the quosure inside group_by/summarise/mutate etc. The column name on the lhs of the assignment (:=) can also be evaluated by unquoting, which gives the new column name.
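On recent dplyr and rlang versions, the same function can be sketched more compactly with the embrace operator and a glue-style name on the left of :=. This assumes a dplyr >= 1.0 installation with glue-name support; my_fun2 is a hypothetical name, and deparse(substitute()) is swapped in for quo_name(enquo()) to recover the data frame name.

```r
library(dplyr)

# Hypothetical compact variant of myFun using {{ }} and glue-style names.
my_fun2 <- function(data, group, value) {
  # name of the data frame as written by the caller, e.g. "mtcars"
  data_name <- deparse(substitute(data))
  data %>%
    group_by({{ group }}) %>%
    summarise(
      # {data_name} inserts the string; {{ value }} inserts the column's label
      "avg_{data_name}_{{ value }}" := mean({{ value }}),
      .groups = "drop"
    )
}

res <- my_fun2(mtcars, cyl, mpg)
```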
You can use rename_ from dplyr with deparse(substitute(Something)) like this (note that rename_ and the other underscore verbs have been deprecated since dplyr 0.7.0):
Something %>%
group_by(Month) %>%
summarise(avg_score=mean(score))%>%
rename_(.dots = setNames("avg_score",
paste0("avg_",deparse(substitute(Something)),"_score") ))
It seems like it makes more sense to generate the new column name dynamically so that you don't have to hard-code the name of the data frame inside setNames. Maybe something like the function below, which takes a data frame, a grouping variable, and a numeric variable:
library(dplyr)
library(lazyeval)
my_fnc = function(data, group, value) {
df.name = deparse(substitute(data))
data %>%
group_by_(group) %>%
summarise_(avg = interp(~mean(v), v=as.name(value))) %>%
rename_(.dots = setNames("avg", paste0("avg_", df.name, "_", value)))
}
Now let's run the function on two different data frames:
my_fnc(mtcars, "cyl", "mpg")
cyl avg_mtcars_mpg
<dbl> <dbl>
1 4 26.66364
2 6 19.74286
3 8 15.10000
my_fnc(iris, "Species", "Petal.Width")
Species avg_iris_Petal.Width
1 setosa 0.246
2 versicolor 1.326
3 virginica 2.026
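The underscore verbs and lazyeval::interp used above are deprecated in current dplyr. A string-based equivalent can be sketched with the .data pronoun and an unquoted name on the left of :=. This is an illustrative alternative assuming dplyr >= 1.0, not the answerer's code; my_fnc2 is a hypothetical name.

```r
library(dplyr)

# Hypothetical string-based variant of my_fnc without the underscore verbs.
my_fnc2 <- function(data, group, value) {
  df_name <- deparse(substitute(data))
  new_name <- paste0("avg_", df_name, "_", value)
  data %>%
    group_by(.data[[group]]) %>%
    # !!new_name := ... uses the string as the name of the new column
    summarise(!!new_name := mean(.data[[value]]), .groups = "drop")
}

res <- my_fnc2(iris, "Species", "Petal.Width")
```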
library(dplyr)
# Take mtcars as an example
# Calculate the mean of mpg using cyl as group
data(mtcars)
Something <- mtcars
# Create a list of expression
dots <- list(~mean(mpg))
# Apply the function, Use setNames to name the column
temp <- Something %>%
group_by(cyl) %>%
summarise_(.dots = setNames(dots,
paste0("avg_", as.character(quote(Something)), "_score")))
You could use colnames(Something)<-c("score","something_avg_score")
