enquo() inside a magrittr pipeline - r

I just would like to understand what's going wrong here.
In the first case (working), I assign the enquo()-ted argument to a variable, in the second case, I use the enquoted argument directly in my call to mutate.
library("dplyr")
library("tidyr") # gather() is exported by tidyr, not dplyr
df <- tibble(x = 1:5, y= 1:5, z = 1:5)
# works
myfun <- function(df, transformation) {
  my_transformation <- rlang::enquo(transformation)
  df %>%
    gather("key", "value", x, y, z) %>%
    mutate(value = UQ(my_transformation))
}
myfun(df,exp(value))
# does not work
myfun_2 <- function(df, transformation) {
  df %>%
    gather("key", "value", x, y, z) %>%
    mutate(value = UQ(rlang::enquo(transformation)))
}
myfun_2(df,exp(value))
#>Error in mutate_impl(.data, dots) : Column `value` is of unsupported type closure
Edit
Here are some more lines to think about :)
Wrapping the call into quo() it looks as if the expression to evaluate is "built" correctly
# looks as if the whole thing should be working
myfun_2_1 <- function(df, transformation) {
  quo(df %>%
    gather("key", "value", x, y, z) %>%
    mutate(value = UQ(rlang::enquo(transformation))))
}
myfun_2_1(df,exp(value))
If you tell this to eval_tidy, it works (it doesn't work without quo())
# works
myfun_2_2 <- function(df, transformation) {
  # eval_tidy() is not re-exported by dplyr, so it is namespaced here
  rlang::eval_tidy(quo(df %>%
    gather("key", "value", x, y, z) %>%
    mutate(value = UQ(rlang::enquo(transformation)))))
}
myfun_2_2(df,exp(value))
If you don't use the pipe, it also works
# works
myfun_2_3 <- function(df, transformation) {
  mutate(gather(df, "key", "value", x, y, z), value = UQ(rlang::enquo(transformation)))
}
myfun_2_3(df,exp(value))
Regarding the error message: this is what you get when you try to pass a type that data frames do not support, e.g. a function:
mutate(df, value = function(x) x)
# Error in mutate_impl(.data, dots) : Column value is of unsupported type closure
To me it looks as if the quosure in myfun_2 isn't evaluated by mutate, which is somehow interesting/non-intuitive behaviour. Do you think I should report this to the developers?

This limitation is solved in rlang 0.2.0.
Technically, the core of the issue was that magrittr evaluates its arguments in a child of the current environment; it is this child environment that contains the . pronoun. As of rlang 0.2.0, enquo() and its variants capture arguments with lexical scoping, which means they look up the chain of parent environments to find the argument to capture. This solves the magrittr problem.
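As a minimal sketch of what this fix allows, assuming rlang 0.2.0 or later and that tidyr is attached for gather(): the argument can be captured directly inside the piped mutate() call. The name myfun_3 is hypothetical, and !! is used here in place of UQ().

```r
library(dplyr)
library(tidyr)

df <- tibble(x = 1:5, y = 1:5, z = 1:5)

# Hypothetical variant of myfun_2: enquo() is called inside the pipeline.
# Assumes rlang >= 0.2.0, where enquo() finds `transformation` lexically.
myfun_3 <- function(df, transformation) {
  df %>%
    gather("key", "value", x, y, z) %>%
    mutate(value = !!rlang::enquo(transformation))
}

res <- myfun_3(df, exp(value))
```

On older rlang versions, capturing the argument before the pipeline (as in myfun from the question) remains the safe pattern.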

Related

Passing enquo expression to subfunction

This question is related to Passing variables to functions that use `enquo()`.
I have a higher function with arguments of a tibble (dat) and the columns of interest in dat (variables_of_interest_in_dat). Within that function, there is a call to another function to which I want to pass variables_of_interest_in_dat.
higher_function <- function(dat, variables_of_interest_in_dat){
  variables_of_interest_in_dat <- enquos(variables_of_interest_in_dat)
  lower_function(dat, ???variables_of_interest_in_dat???)
}
lower_function <- function(dat, variables_of_interest_in_dat){
  variables_of_interest_in_dat <- enquos(variables_of_interest_in_dat)
  dat %>%
    select(!!!variables_of_interest_in_dat)
}
What is the recommended way to pass variables_of_interest_in_dat to lower_function?
I have tried lower_function(dat, !!!variables_of_interest_in_dat) but when I run higher_function(mtcars, cyl) this returns "Error: Can't use !!! at top level."
In the related post, the higher_function did not enquo the variables before passing them to lower function.
Thank you
Is this what you want?
library(tidyverse)
LF <- function(df, var){
  newdf <- df %>% select({{var}})
  return(newdf)
}
HF <- function(df, var){
  LF(df, {{var}})
}
LF(mtcars,disp)
HF(mtcars,disp)
The {{ }} operator (aka 'curly curly', available since rlang 0.4.0) replaces the pattern of quoting with enquo() and then unquoting with !!.
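If several columns need to be passed down, one possible sketch is to forward them through `...` instead of a single named argument. The names LF_many and HF_many are hypothetical and not part of the answer above.

```r
library(dplyr)

# Hypothetical multi-column variants of LF and HF: `...` is forwarded as-is,
# so no quoting or unquoting is needed at either level.
LF_many <- function(df, ...) {
  df %>% select(...)
}
HF_many <- function(df, ...) {
  LF_many(df, ...)
}

res_many <- HF_many(mtcars, cyl, disp)
```

Here select() does the capturing itself, which is why neither wrapper has to call enquos().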

Access result later in pipe

I am trying to create functions which print the number of rows excluded in a dataset at each step in a pipe.
Something like this:
iris %>%
  function_which_save_nrows_and_return_the_data() %>%
  filter(exclude some rows) %>%
  function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data() %>%
  function_which_save_nrows_and_return_the_data() %>%
  function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data() ...etc
These are the functions I have attempted:
n_before = function(x) {
  assign("rows", nrow(x), .GlobalEnv)
  return(x)
}
n_excluded = function(x) {
  print(rows - nrow(x))
  return(x)
}
This successfully saves the object rows:
But if I add two more links, the object is NOT saved:
So how can I create the rows object and access it later in the pipe?
This is due to R's lazy evaluation, and it occurs even if pipes are not used; see the code below. There, the argument to n_excluded is filter(n_before(iris), Species != 'setosa'). When rows is used in the print statement, x has not yet been referenced from within n_excluded, so the argument has not been evaluated, n_before has not run, and rows does not yet exist.
if (exists("rows")) rm(rows) # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'x' in selecting a method for function
## 'print': object 'rows' not found
To fix this
1) we can force x before the print statement.
n_excluded = function(x) {
  force(x)
  print(rows - nrow(x))
  return(x)
}
2) Alternatively, we can use magrittr's sequential (eager) pipe, which guarantees that the legs are run in order. magrittr exports the function but does not provide an operator for it, so we assign it to an operator ourselves like this.
`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
  n_before() %>%
  filter(Species != 'setosa') %s>% # note use of %s>% on this line
  n_excluded()
The magrittr developer has stated that he will add it as an operator if there is sufficient demand, so you might want to add such a request to magrittr issue #247 on GitHub.
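The lazy-evaluation behaviour behind both fixes can be seen in base R alone. The following is only an illustrative sketch; every name in it (log_env, note, make_arg, lazy_fun, eager_fun) is made up for the demonstration.

```r
# Record the order in which things happen (all names here are hypothetical).
log_env <- new.env()
log_env$events <- character()
note <- function(label) {
  log_env$events <- c(log_env$events, label)
}

# Stand-in for an argument expression with a side effect, like n_before(iris).
make_arg <- function() {
  note("argument evaluated")
  1
}

# The body starts before x is touched, so the argument is evaluated late.
lazy_fun <- function(x) {
  note("body started")
  x
}

# force(x) evaluates the argument before the rest of the body runs.
eager_fun <- function(x) {
  force(x)
  note("body started")
  x
}

lazy_fun(make_arg())
lazy_order <- log_env$events

log_env$events <- character()
eager_fun(make_arg())
eager_order <- log_env$events
```

In lazy_order the body entry comes first, which mirrors why rows did not exist yet when print() ran; in eager_order the argument entry comes first.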
You can also use the extended capabilities of pipeR.
library(dplyr)
library(pipeR)
n_excluded = function(x) {
  print(rows - nrow(x))
  return(x)
}
p <- iris %>>%
  (~rows=nrow(.)) %>>%
  filter(Species != "setosa") %>>%
  n_excluded()

Is there a way to pass arguments with logical operators (!=, >,<) to a function?

create_c <- function(df, line_number = NA, prior_trt, line_name, biomarker, ...) {
  if (!"data.frame" %in% class(df)) {
    stop("First input must be dataframe")
  }
  # handle extra arguments
  args <- enquos(...)
  names(args) <- tolower(names(args))
  # check for unknown argument - cols that do not exist in df
  check_args_exist(df, args)
  # argument to expression
  ex_args <- unname(imap(args, function(expr, name) quo(!!sym(name) == !!expr)))
  # special case arguments
  if (!missing(line_number)) {
    df <- df %>% filter(line_number %in% (!!line_number))
    if (!missing(prior_trt)) {
      df <- filter_arg(df. = df, arg = prior_trt, col = "prior_trt_", val = "y")
    }
  }
  if (!missing(biomarker)) {
    df <- filter_arg(df. = df, arg = biomarker, col = "has_", val = "positive")
  }
  if (!missing(line_name)) {
    ln <- list()
    if (!!str_detect(line_name[1], "or")) {
      line_name <- str_split(line_name, " or ", simplify = TRUE)
    }
    for (i in 1:length(line_name)) {
      ln[[i]] <- paste(tolower(sort(strsplit(line_name[i], "\\+")[[1]])), collapse = ",")
    }
    df <- df %>% filter(line_name %in% (ln))
  }
  df <- df %>%
    group_by(patient_id) %>%
    slice(which.min(line_number)) %>%
    ungroup()
  df <- df %>% filter(!!!ex_args)
  invisible(df)
}
I have this function, which filters various columns based on the parameters users pass. I want users to be able to pass operators such as >, < and != for some of the parameters. Right now my function cannot handle any operator besides '='. Is there a way to accomplish this?
create_c(df = bsl_all_nsclc,
         line_number > 2)
create_c(df, biomarker != "positive")
Error in tolower(arg) : object 'biomarker' not found
Certainly there is a way: operators are regular functions in R, you can pass them around like any other function.
The only complication is that the operators are non-syntactic names so you can’t just pass them “as is”, this would confuse the parser. Instead, you need to wrap them in backticks, to make their use syntactically valid where a name would be expected:
filter_something = function (value, op) {
  op(value, 13)
}
filter_something(cars$speed, `>`)
filter_something(cars$speed, `<`)
filter_something(cars$speed, `==`)
And since R also supports non-standard evaluation of function arguments, you can also pass unevaluated expressions — this gets slightly more complicated, since you’d want to evaluate them in the correct context. ‘rlang’/‘dplyr’ uses data masking for this.
How exactly you need to apply this depends entirely on the context in which the expression is to be used. In many cases, you can simply dispatch them to the corresponding ‘dplyr’ functions, e.g.
filter_something2 = function (.data, expr) {
  .data %>%
    filter({{expr}})
}
filter_something2(cars, speed < 13)
The “secret sauce” here is the {{…}} syntax. This works because filter from ‘dplyr’ accepts unevaluated arguments and handles {{expr}} specially by transforming it into (effectively) !! enquo(expr). That is: expr is first “defused”: the argument is captured unevaluated, together with the environment it came from, so the name expr is replaced by the expression it binds to (speed < 13 in the above). Next, this defused expression is unquoted: the wrapper is “peeled off”, and the expression itself is handled inside filter as if it had been passed as filter(.data, speed < 13). In other words, the name expr is substituted with speed < 13 in the call expression.
For a more thorough explanation, please refer to the Programming with dplyr vignette.
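To make the equivalence concrete, here is a small sketch comparing the two spellings on the built-in cars data. The function names filter_embrace and filter_enquo are hypothetical, and the sketch assumes a dplyr/rlang version recent enough to support {{ }}.

```r
library(dplyr)

# Hypothetical wrapper using the embrace operator.
filter_embrace <- function(.data, expr) {
  .data %>% filter({{ expr }})
}

# The same wrapper written with an explicit enquo() and !!.
filter_enquo <- function(.data, expr) {
  .data %>% filter(!!enquo(expr))
}

slow_a <- filter_embrace(cars, speed < 13)
slow_b <- filter_enquo(cars, speed < 13)

# Any comparison or combination of comparisons can be passed the same way.
fast <- filter_embrace(cars, speed >= 13 & dist != 34)
```

Both wrappers return the same rows, and the caller is free to use >, <, != or any other operator in the expression.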

How to evaluate empty quosure programmatically?

In my dataset, I have a few possible grouping variables a, b, c. How do I programmatically tell dplyr to not group by any variables?
For example:
granularity <- NA
if(isTRUE(granularity == 'all')){
  # all group variables
  group_variables <- quos(a, b, c)
}else if(isTRUE(granularity == 'no_c')){
  # all except c
  group_variables <- quos(a, b)
}else{
  # no group variables
  group_variables <- quo()
}
data_summary <- mydata %>%
  group_by(!!! group_variables) %>%
  summarise(
    x_mean = mean(x)
  )
This will run correctly if I set granularity to 'all' or 'no_c', but it fails when I assign group_variables to the empty quosure. Does anyone know how to make this work?
Edit: This question also applies to functions like select, so assume I wanted to run
data_select <- mydata %>%
select(!!! select_variables, d, e, f)
How do I set select_variables to sometimes be quos(a, b, c) or sometimes be empty?
Thanks!
Use group_variables <- NULL in that clause; splicing NULL with !!! adds no arguments, so group_by() receives no grouping variables:
}else{
# no group variables
group_variables <- NULL
}
Also note the long warning that accompanies the error from the original quo() version:
Error in grouped_df_impl(data, unname(vars), drop) :
Column `<empty>` is unknown
In addition: Warning message:
Unquoting language objects with `!!!` is soft-deprecated as of rlang 0.3.0.
Please use `!!` instead.
# Bad:
dplyr::select(data, !!!enquo(x))
# Good:
dplyr::select(data, !!enquo(x)) # Unquote single quosure
dplyr::select(data, !!!enquos(x)) # Splice list of quosures
You might want to consider not using packages with unstable APIs.
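Another option, sketched here under the assumption of dplyr 1.0.0 or later (for the .groups argument), is to use quos() with no arguments. Unlike quo(), which builds a single quosure around an empty expression, quos() returns an empty list of quosures, which !!! can splice. The data and the function name summarise_at_granularity are made up for illustration.

```r
library(dplyr)

# Made-up data standing in for `mydata`.
mydata <- tibble(
  a = c(1, 1, 2, 2),
  b = c("u", "v", "u", "v"),
  c = c(TRUE, TRUE, FALSE, FALSE),
  x = c(10, 20, 30, 40)
)

# Hypothetical wrapper around the question's if/else chain.
summarise_at_granularity <- function(data, granularity = NA) {
  group_variables <- if (isTRUE(granularity == "all")) {
    quos(a, b, c)
  } else if (isTRUE(granularity == "no_c")) {
    quos(a, b)
  } else {
    quos()  # empty list of quosures: `!!!` splices in zero arguments
  }
  data %>%
    group_by(!!!group_variables) %>%
    summarise(x_mean = mean(x), .groups = "drop")
}

overall <- summarise_at_granularity(mydata)
by_ab <- summarise_at_granularity(mydata, "no_c")
```

The same empty quos() can be spliced into select(!!!select_variables, d, e, f), where it simply contributes no columns.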

Non standard evaluation in R for loop: Unquoted input variable in a function containing dplyr summarise always returns NA, but filter function works

SHORT SUMMARY
dplyr unquoting fails inside summarise when the quoted object is the argument of a wrapper function that calls summarise, and that argument is assigned in a for loop.
For Loop
for(j in 1:1){
  sumvar <- paste0("randnum",j)
  chkfunc(sumvar)
}
Function (abbreviated here, shown in full below)
chkfunc <- function(sumvar) {
sumvar <- enquo(sumvar)
[...]
summarise(mn = mean(!!sumvar))
LONG SUMMARY
I have two columns that sometimes contain NAs and I want to use dplyr non-standard evaluation and its famous unquoting (AKA bang bang !!) to summarise each column in one for loop.
library(dplyr)
set.seed(3)
randnum1 <- rnorm(10)
randnum1[randnum1<0] <- NA
randnum2 <- rnorm(10)
randnum2[randnum2<0] <- NA
randfrm <- data.frame(cbind(randnum1, randnum2))
print(randfrm)
We see below that the filter function processes the unquoting (!!) just fine but the summarise function fails, returning an "argument is not numeric or logical" error. The same occurs when I use := in the summarise function call (not shown here), which appeared in the "Programming with dplyr" vignette. Finally, I confirmed that the class of !!sumvar is numeric within function chkfunc.
chkfunc <- function(sumvar) {
  sumvar <- enquo(sumvar)
  message("filter function worked with !!sumvar")
  outfrm <- randfrm %>%
    filter(!is.na(!!sumvar))
  print(outfrm)
  message("summarise function failed with !!sumvar")
  outfrm <- randfrm %>%
    filter(!is.na(!!sumvar)) %>%
    summarise(mn = mean(!!sumvar))
}
# Just one iteration to avoid confusion
for(j in 1:1){
  sumvar <- paste0("randnum",j)
  chkfunc(sumvar)
}
While I would like an answer using dplyr, the following works with substitute and eval rather than using dplyr functions (answer adapted from Akrun's answer to StackOverflow question "Unquote string in R's substitute command"):
chkfunc <- function(sumvar) {
  outfrm <- eval(substitute(randfrm %>%
    filter(!is.na(y)) %>%
    summarise(mn = mean(y)),
    list(y=as.name(sumvar))))
  print(outfrm)
}
for(j in 1:2){
  sumvar <- paste0("randnum",j)
  chkfunc(sumvar)
}
print(outfrm)
Finally, I'll note that while the pull function on !!sumvar showed the resulting class to be numeric (i.e., the same class and values as randfrm$randnum1), I figured out that !!sumvar is treated as a character string (i.e., "randnum1") in both my use of filter and summarise, hence the "argument is not numeric or logical" warning.
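Given that the column name arrives as a character string, one possible dplyr-only sketch is to index the .data pronoun with that string (converting it with !!sym(sumvar) is another common spelling). The function name chkfunc_str and the small data frame below are made up for illustration and are not from the original post.

```r
library(dplyr)

# Small made-up stand-in for randfrm.
randfrm <- data.frame(
  randnum1 = c(1, NA, 3),
  randnum2 = c(NA, 2, 4)
)

# Hypothetical string-based variant of chkfunc: `colname` is a character
# string, so the column is looked up with the .data pronoun.
chkfunc_str <- function(data, colname) {
  data %>%
    filter(!is.na(.data[[colname]])) %>%
    summarise(mn = mean(.data[[colname]]))
}

results <- list()
for (j in 1:2) {
  sumvar <- paste0("randnum", j)
  results[[sumvar]] <- chkfunc_str(randfrm, sumvar)
}
```

Because no enquo() is involved, the value of sumvar set in the loop is what selects the column, rather than the symbol sumvar itself.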

Resources