Programming with dplyr using string as input - r

I would like to write a function that uses dplyr inside and to which I supply variable names as strings. Unfortunately, dplyr's use of NSE (non-standard evaluation) makes this rather complicated. From Programming with dplyr I get the following example
my_summarise <- function(df, var) {
var <- enquo(var)
df %>%
group_by(!!var) %>%
summarise(a = mean(a))
}
my_summarise(df, g1)
However, I would like to write a function where, instead of g1, I could provide "g1", and I am not able to wrap my head around how to do that.

dplyr >= 1.0
Use a combination of double braces ({{ }}) and the across() function:
my_summarise2 <- function(df, group_var) {
df %>% group_by(across({{ group_var }})) %>%
summarise(mpg = mean(mpg))
}
my_summarise2(mtcars, "cyl")
# A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.7
# 2 6 19.7
# 3 8 15.1
# same result as above, passing cyl without quotes
my_summarise2(mtcars, cyl)
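When the column name is held in a variable rather than typed as a literal string, wrapping it in all_of() makes the intent explicit to tidyselect. The following is only a sketch under that assumption; my_summarise3 and grp are hypothetical names that do not appear in the answer above.

```r
library(dplyr)

# Hypothetical variant: the grouping column(s) arrive as a character vector
my_summarise3 <- function(df, group_var) {
  df %>%
    group_by(across(all_of(group_var))) %>%
    summarise(mpg = mean(mpg), .groups = "drop")
}

grp <- "cyl"                       # column name stored in a variable
res <- my_summarise3(mtcars, grp)

# all_of() also accepts several column names at once
res2 <- my_summarise3(mtcars, c("cyl", "am"))
```

all_of() raises an error if a requested column is missing, which tends to be easier to debug than a silently wrong grouping.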
dplyr < 1.0
As far as I know, you could use as.name or sym (from the rlang package - I don't know if dplyr will import it eventually):
library(dplyr)
my_summarise <- function(df, var) {
var <- rlang::sym(var)
df %>%
group_by(!!var) %>%
summarise(mpg = mean(mpg))
}
or
my_summarise <- function(df, var) {
var <- as.name(var)
df %>%
group_by(!!var) %>%
summarise(mpg = mean(mpg))
}
my_summarise(mtcars, "cyl")
# A tibble: 3 × 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.66364
# 2 6 19.74286
# 3 8 15.10000

Using the .data pronoun from rlang is another option that works directly with column names stored as strings.
The function with .data would look like
my_summarise <- function(df, var) {
df %>%
group_by(.data[[var]]) %>%
summarise(mpg = mean(mpg))
}
my_summarise(mtcars, "cyl")
# A tibble: 3 x 2
# cyl mpg
# <dbl> <dbl>
# 1 4 26.7
# 2 6 19.7
# 3 8 15.1
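The same pronoun can be used for the summarised column too, so that both names come in as strings. This is a minimal sketch, assuming dplyr >= 1.0; my_summarise_chr, group_var and value_var are made-up names for illustration.

```r
library(dplyr)

# Hypothetical variant: both the grouping and the value column are strings
my_summarise_chr <- function(df, group_var, value_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    summarise(mean_value = mean(.data[[value_var]]), .groups = "drop")
}

res <- my_summarise_chr(mtcars, "cyl", "mpg")
```

Since .data[[ ]] only ever looks inside the data frame, a variable of the same name in the calling environment cannot be picked up by accident.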

This is how to do it using only dplyr and the very useful as.name function from base R:
my_summarise <- function(df, var) {
varName <- as.name(var)
enquo_varName <- enquo(varName)
df %>%
group_by(!!enquo_varName) %>%
summarise(a = mean(a))
}
my_summarise(df, "g1")
Basically, as.name() turns the string in var into a name (symbol) object. Then, following Programming with dplyr, enquo() wraps that symbol in a quosure, which is unquoted inside the group_by() call using !!. The enquo() step is not strictly required here: unquoting the symbol directly with !!varName also works, as in the as.name answer above.
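To see what as.name() actually produces, it can help to inspect the object before using it. A small sketch on mtcars (the variable names sym_base and sym_rlang are only for illustration):

```r
library(dplyr)

var <- "cyl"
sym_base  <- as.name(var)        # base R
sym_rlang <- rlang::sym(var)     # rlang equivalent

is.symbol(sym_base)              # a symbol (name) object, not a string
identical(sym_base, sym_rlang)   # both calls build the same symbol
identical(sym_base, quote(cyl))  # the same as typing the bare name

# Unquote the symbol directly, without an intermediate enquo()
res <- mtcars %>%
  group_by(!!sym_base) %>%
  summarise(mpg = mean(mpg))
```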

Related

using quasiquotation in a function with summarize in dplyr

I'm trying to write a function that can take column names as strings assigned to variables and produce a summarized output for them like so
my_function <- function(my_df, columnA,columnB){
summ_df <- my_df %>%
group_by(cyl) %>%
summarise(base_mean = mean(columnA),
contrast_mean = mean(columnB))
return(summ_df)
}
base = "drat"
cont = "wt"
my_function(mtcars,base,cont)
What I would want is that the above function would return the same thing as
mtcars %>%
group_by(cyl) %>%
summarise(base_mean = mean(drat),
contrast_mean = mean(wt))
I'm sure it's some combination of enexpr or ensym and !!, but I keep getting NA values.
Use ensym with !! so that it can take both unquoted and quoted actual arguments
my_function <- function(my_df, columnA,columnB){
my_df %>%
group_by(cyl) %>%
summarise(base_mean = mean(!! ensym(columnA)),
contrast_mean = mean(!! ensym(columnB)), .groups = 'drop' )
}
Testing:
my_function(mtcars, !!base, !!cont)
# A tibble: 3 × 3
# cyl base_mean contrast_mean
# <dbl> <dbl> <dbl>
# 1 4 4.07 2.29
# 2 6 3.59 3.12
# 3 8 3.23 4.00
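The three calling styles that ensym() allows can be compared side by side. This is a reduced sketch with a single column argument; mean_by_cyl is a hypothetical helper, not part of the answer above.

```r
library(dplyr)

# Hypothetical one-column version of my_function
mean_by_cyl <- function(df, col) {
  df %>%
    group_by(cyl) %>%
    summarise(m = mean(!!rlang::ensym(col)), .groups = "drop")
}

cont <- "wt"
r_bare <- mean_by_cyl(mtcars, wt)       # bare column name
r_chr  <- mean_by_cyl(mtcars, "wt")     # literal string
r_var  <- mean_by_cyl(mtcars, !!cont)   # string in a variable, unquoted at the call site
```

Note that mean_by_cyl(mtcars, cont) without !! would make ensym() look for a column literally named cont, which is why the call site needs the unquote.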

Group_by inside a function

I am trying to use the group_by function inside a function, but it doesn't seem to work. I found the example below in another post (this works):
dat <- mtcars[c(2:4,11)]
grp <- function(x) {
group_by(dat,!!as.name(x)) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
lapply(colnames(dat), grp)
What I don't understand is why I need the data frame name in the group_by call. Doesn't group_by work this way:
data %>% group_by(lgID) %>% summarise(mean_run = mean(HR))
where the data is piped to the group_by function?
Also, why do I need '!!as.name(x)' - what does this do?
Further, why does the version shown above work and this version shown below doesn't?
grp <- function(x) {
group_by(x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
lapply(colnames(dat), grp)
Obviously I am missing something here!
Best regards
Deepak
If we need to pass either column indices or column names (strings) as 'x', wrap it in across(all_of(x)) inside group_by:
library(dplyr) # version >= 1.0.0
f1 <- function(data, x) {
data %>%
group_by(across(all_of(x))) %>%
summarise(n=n(), .groups = 'drop') %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>%
head()
}
If we have an older version, use group_by_at(x).
Apply the function:
out1 <- lapply(colnames(dat), function(x) f1(dat, x))
Or use the column index:
out2 <- lapply(seq_along(dat), function(i) f1(dat, i))
identical(out1, out2)
#[1] TRUE
Output:
out1[[1]]
# A tibble: 3 x 3
# cyl n pc
# <dbl> <int> <chr>
#1 8 14 43.8%
#2 4 11 34.4%
#3 6 7 21.9%
out2[[1]]
# A tibble: 3 x 3
# cyl n pc
# <dbl> <int> <chr>
#1 8 14 43.8%
#2 4 11 34.4%
#3 6 7 21.9%
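As for why the second grp() in the question fails and what !!as.name(x) does, the three cases can be checked directly. This is a sketch of how I read group_by's behaviour, reusing dat from the question:

```r
library(dplyr)

dat <- mtcars[c(2:4, 11)]
x <- "cyl"

# 1. With no data argument, the string itself is taken as the data,
#    and group_by() has no method for a character vector
err <- tryCatch(group_by(x), error = function(e) e)
inherits(err, "error")

# 2. With the data but no unquoting, x is just a constant string,
#    so every row ends up in the same single group
n_const <- n_groups(group_by(dat, x))

# 3. as.name() turns the string into a symbol and !! injects it,
#    so group_by() sees the column cyl
n_col <- n_groups(group_by(dat, !!as.name(x)))
```

So the data frame has to be supplied either as the first argument or through a pipe, and the string has to become a column reference before group_by() evaluates it.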

Renaming a column name, by using the data frame title/name

I have a data frame called "Something". I am doing an aggregation on one of the numeric columns using summarise, and I want the name of that column to contain "Something" - data frame title in the column name.
Example:
temp <- Something %>%
group_by(Month) %>%
summarise(avg_score=mean(score))
But I would like to name the aggregate column "avg_Something_score". Does that make sense?
We can use the devel version of dplyr (soon to be released 0.6.0) that does this with quosures
library(dplyr)
myFun <- function(data, group, value){
dataN <- quo_name(enquo(data))
group <- enquo(group)
value <- enquo(value)
newName <- paste0("avg_", dataN, "_", quo_name(value))
data %>%
group_by(!!group) %>%
summarise(!!newName := mean(!!value))
}
myFun(mtcars, cyl, mpg)
# A tibble: 3 × 2
# cyl avg_mtcars_mpg
# <dbl> <dbl>
#1 4 26.66364
#2 6 19.74286
#3 8 15.10000
myFun(iris, Species, Petal.Width)
# A tibble: 3 × 2
# Species avg_iris_Petal.Width
# <fctr> <dbl>
#1 setosa 0.246
#2 versicolor 1.326
#3 virginica 2.026
Here, enquo() captures the input arguments (much like substitute() from base R) and converts them to quosures. With quo_name() we can convert a quosure to a string, and the quosure is evaluated by unquoting it (!! or UQ) inside group_by/summarise/mutate etc. The column name on the lhs of the assignment (:=) can also be built by unquoting, which gives the column name of interest.
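With more recent releases the same idea can be written more compactly. The following is only a sketch, assuming dplyr >= 1.0 and an rlang version that supports glue-style names on the lhs of :=; my_fun2 and data_name are hypothetical names.

```r
library(dplyr)

my_fun2 <- function(data, group, value) {
  # Text of the expression passed as `data`, e.g. "mtcars"
  data_name <- rlang::as_label(rlang::enquo(data))
  data %>%
    group_by({{ group }}) %>%
    # {data_name} interpolates a string, {{ value }} the label of the argument
    summarise("avg_{data_name}_{{ value }}" := mean({{ value }}), .groups = "drop")
}

res <- my_fun2(mtcars, cyl, mpg)
names(res)
```

If the data frame is piped into the function, the captured name depends on the pipe implementation, so the function is best called with the data frame as an explicit first argument.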
You can use rename_ from dplyr with deparse(substitute(Something)) like this:
Something %>%
group_by(Month) %>%
summarise(avg_score=mean(score))%>%
rename_(.dots = setNames("avg_score",
paste0("avg_",deparse(substitute(Something)),"_score") ))
It seems like it makes more sense to generate the new column name dynamically so that you don't have to hard-code the name of the data frame inside setNames. Maybe something like the function below, which takes a data frame, a grouping variable, and a numeric variable:
library(dplyr)
library(lazyeval)
my_fnc = function(data, group, value) {
df.name = deparse(substitute(data))
data %>%
group_by_(group) %>%
summarise_(avg = interp(~mean(v), v=as.name(value))) %>%
rename_(.dots = setNames("avg", paste0("avg_", df.name, "_", value)))
}
Now let's run the function on two different data frames:
my_fnc(mtcars, "cyl", "mpg")
# cyl avg_mtcars_mpg
# <dbl> <dbl>
# 1 4 26.66364
# 2 6 19.74286
# 3 8 15.10000
my_fnc(iris, "Species", "Petal.Width")
# Species avg_iris_Petal.Width
# 1 setosa 0.246
# 2 versicolor 1.326
# 3 virginica 2.026
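The underscore verbs (group_by_, summarise_, rename_) and lazyeval have since been deprecated in dplyr. A possible string-based equivalent without them, as a sketch assuming dplyr >= 1.0 (my_fnc2 is a hypothetical name):

```r
library(dplyr)

my_fnc2 <- function(data, group, value) {
  df_name  <- deparse(substitute(data))            # e.g. "iris"
  new_name <- paste0("avg_", df_name, "_", value)  # e.g. "avg_iris_Petal.Width"
  data %>%
    group_by(.data[[group]]) %>%
    # !!new_name := sets the output column name from a string
    summarise(!!new_name := mean(.data[[value]]), .groups = "drop")
}

res <- my_fnc2(iris, "Species", "Petal.Width")
```

Here the column is named at creation time, so no separate rename step is needed.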
library(dplyr)
# Take mtcars as an example
# Calculate the mean of mpg using cyl as group
data(mtcars)
Something <- mtcars
# Create a list of expression
dots <- list(~mean(mpg))
# Apply the function, Use setNames to name the column
temp <- Something %>%
group_by(cyl) %>%
summarise_(.dots = setNames(dots,
paste0("avg_", as.character(quote(Something)), "_score")))
You could use colnames(Something)<-c("score","something_avg_score")
