NSE lazyeval::lazy vs. substitute when referring to variable names - r

I'm still trying to wrap my head around non-standard evaluation and how it's used in dplyr. I'm having trouble understanding why lazy evaluation is important when the function arguments are variable names, and so the original context's environment doesn't seem important.
In the code below, the function select3() uses lazy evaluation, but fails (I believe) because it tries to follow the variable name order all the way to base::order.
Is it okay to use substitute as I have in my select4(), or is there some other way I should implement this function? When would it actually be important to save the original environment, when I really want those arguments to refer to variables?
Thank you!
library(dplyr)
library(lazyeval)
# Same as dplyr::select
select2 <- function(.data, ...) {
select_(.data, .dots = lazy_dots(...))
}
# I want to have two capture groups of variables, so I need named arguments.
select3 <- function(.data, group1, group2) {
out1 <- select_(.data, .dots = lazy(group1))
out2 <- select_(.data, .dots = lazy(group2))
list(out1, out2)
}
df <- data.frame(x = 1:2, y = 3:4, order = 5:6)
# select3 seems okay at first...
df %>% select2(x, y)
df %>% select3(x, y)
# But fails when the variable is a function defined in the namespace
df %>% select2(x, order)
df %>% select3(x, order)
# Error in eval(expr, envir, enclos) : object 'datafile' not found
# Using substitute instead of lazy works. But I'm not sure I understand the
# implications of doing this.
select4 <- function(.data, group1, group2) {
out1 <- select_(.data, .dots = substitute(group1))
out2 <- select_(.data, .dots = substitute(group2))
list(out1, out2)
}
df %>% select4(x, order)
PS on a related note, is this a bug or intended behavior?
select(df, z)
# Error in eval(expr, envir, enclos) : object 'z' not found
# But if I define z as a numeric variable it works.
z <- 1
select(df, z)
Update
A. Webb points out below that the environment is important for select because the special functions like one_of can use objects from it.
Update 2
I used to have an ugly hack as a fix, but here's a much better way. I should have known that even lazy has a standard-evaluation counterpart, lazy_:
select6 <- function(.data, group1, group2) {
g1 <- lazy_(substitute(group1), env = parent.frame())
g2 <- lazy_(substitute(group2), env = parent.frame())
out1 <- select_(.data, .dots = g1)
out2 <- select_(.data, .dots = g2)
list(out1, out2)
}
# Or even more like the original...
lazy_parent <- function(expr) {
# Need to go up twice, because lazy_parent creates an environment for itself
e1 <- substitute(expr)
e2 <- do.call("substitute", list(e1), envir = parent.frame(1))
lazy_(e2, parent.frame(2))
}
select7 <- function(.data, group1, group2) {
out1 <- select_(.data, .dots = lazy_parent(group1))
out2 <- select_(.data, .dots = lazy_parent(group2))
list(out1, out2)
}

The problem here is that lazy by default follows promises, and order is a promise due to lazy loading of packages.
library(pryr)
is_promise(order)
#> TRUE
The default for lazy_dots, as used in select, is the opposite.
But there is something else going on here too, where the nature of the special ... is used to extract unevaluated expressions. While your use of substitute will work in many situations, attempts at renaming as available via select will fail.
select4(df,foo=x,bar=order)
#> Error in select4(df, foo = x, bar = order) :
#> unused arguments (foo = x, bar = order)
However, this works
select5 <- function(.data, ...) {
dots<-lazy_dots(...)
out1 <- select_(.data, .dots=dots[1])
out2 <- select_(.data, .dots=dots[2])
list(out1, out2)
}
select5(df,foo=x,bar=order)
#> [[1]]
#> foo
#> 1 1
#> 2 2
#>
#> [[2]]
#> bar
#> 1 5
#> 2 6
As another example, where substitute fails more directly, due to lack of carrying an environment, consider
vars<-c("x","y")
select4(df,one_of(vars),order)
#> Error in one_of(vars, ...) : object 'vars' not found
select5(df,one_of(vars),order)
#> [[1]]
#> x y
#> 1 1 3
#> 2 2 4
#>
#> [[2]]
#> order
#> 1 5
#> 2 6
The select4 version fails because it cannot find vars, where select5 succeeds due to lazy_dots carrying around the environment. Note select4(df,one_of(c("x","y")),order) is okay, as it uses literals.
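To see in isolation why carrying the environment matters, here is a minimal base R sketch. The helper names (capture_expr, capture_with_env, caller, wanted_cols) are made up for illustration and are not part of lazyeval or dplyr. capture_with_env only mimics the idea behind lazy(): it stores the expression together with the environment it came from, and it makes no attempt to reproduce promise following.

```r
# Hypothetical helpers, base R only.
# capture_expr keeps just the expression, as substitute() does.
capture_expr <- function(arg) substitute(arg)

# capture_with_env also records the environment the expression came from.
capture_with_env <- function(arg) {
  list(expr = substitute(arg), env = parent.frame())
}

caller <- function() {
  wanted_cols <- c("x", "y")  # exists only inside caller()
  list(
    bare     = capture_expr(toupper(wanted_cols)),
    with_env = capture_with_env(toupper(wanted_cols))
  )
}

captured <- caller()

# Evaluating the bare expression somewhere else cannot find wanted_cols.
bare_result <- tryCatch(
  eval(captured$bare, baseenv()),
  error = function(e) "lookup failed"
)

# Evaluating it in the recorded environment works.
env_result <- eval(captured$with_env$expr, captured$with_env$env)
```

This is the same situation as one_of(vars) above: the expression alone is not enough once it refers to an object that lives in the caller's environment.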

Related

How to pass a variable to a function that already implements non-standard evaluation in its argument in R?

I am trying to wrap a function from an R package. From the source code, it appears to use non-standard evaluation for some arguments. How can I write my function so that it passes the value through to the argument that uses non-standard evaluation?
Here is a toy example
data <- data.frame(name = 1:10)
#Suppose the one below is the function from that package
toy.fun <- function(dat, var) {
eval(substitute(var), dat)
}
> toy.fun(data, name)
[1] 1 2 3 4 5 6 7 8 9 10
Here is my attempt to wrap it:
toy.fun2 <- function(dat, var2) {
var_name <- deparse(substitute(var2))
#example, but for similar purpose.
data_subset <- dat[var_name]
toy.fun(data_subset, var2)
}
> toy.fun2(data, name)
Error in eval(substitute(var), dat) : object 'var2' not found
Edit
I should make the question clearer: I want to pass different variable names to the wrapper's var2 argument, so that whatever the column is called in the data, the wrapper can use that name both to subset the data and to pass it on to the function I am wrapping. The source code of the exceedance function from heatwaveR already captures the input variable y with ts_y <- eval(substitute(y), data), which is equivalent to my toy.fun.
I have modified the toy.fun2 for clarity.
Edit 2
It turns out the solution is quite easy, just by substituting the entire function including arguments, and evaluating it.
toy.fun2 <- function(dat, var2) {
var_name <- deparse(substitute(var2))
#example, but for similar purpose.
data_subset <- dat[var_name]
exprs <- substitute(toy.fun(data_subset, var2))
eval(exprs, parent.frame())
#include `envir` argument if this is to be worked in data.table
}
Grab the call using match.call, replace the function name and evaluate it.
It would also be possible to modify the other arguments as needed. Look at the source of lm to see another example of this approach.
toy.fun2 <- function(dat, var) {
cl <- match.call()
cl[[1L]] <- quote(toy.fun)
eval.parent(cl)
}
toy.fun2(data, name)
## [1] 1 2 3 4 5 6 7 8 9 10
Added
The question was revised after this answer was already posted. This addresses the new question. The line setting cl[[2]] could alternately be written as cl[[2]] <- dat[deparse(cl[[3]])] .
toy.fun2 <- function(dat, var2) {
cl <- match.call()
cl[[1]] <- quote(toy.fun)
cl[[2]] <- dat[deparse(substitute(var2))]
names(cl)[3] <- "var"
eval.parent(cl)
}
toy.fun2(data, name)
## [1] 1 2 3 4 5 6 7 8 9 10
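It can help to look at the rewritten call before it is evaluated. The sketch below reuses toy.fun from the question; rewrite_call is a hypothetical helper that returns the modified call object instead of passing it to eval.parent, so the result of the match.call edits can be inspected.

```r
# Same toy function as in the question.
toy.fun <- function(dat, var) eval(substitute(var), dat)

# Hypothetical helper: returns the rewritten call instead of evaluating it.
rewrite_call <- function(dat, var2) {
  cl <- match.call()          # rewrite_call(dat = data, var2 = name)
  cl[[1]] <- quote(toy.fun)   # swap the function being called
  names(cl)[3] <- "var"       # rename var2 to toy.fun's formal, var
  cl
}

data <- data.frame(name = 1:10)
cl <- rewrite_call(data, name)

call_text <- deparse(cl)  # the call as text, for inspection
result <- eval(cl)        # evaluate it where `data` is defined
```

Because the call still contains the unevaluated symbol name, toy.fun's own substitute() sees exactly what it would have seen if it had been called directly.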

How to insert function argument that is string to abs() inside dplyr::filter()?

I wanted to pass PCnumber to the abs() function inside dplyr::filter(), but it resulted in an error.
For example:
df <- data.frame(PC=rnorm(n = 25, mean = 0, sd = 1))
foo <- function(x, PCnumber) {
PCnumber_load0 <- select(x, PCnumber)
PCnumber_load <-
filter(PCnumber_load0, (abs(PCnumber) >= mean(abs(PCnumber))))
PCnumber_load
}
foo(df, PCnumber="PC")
Resulting in the following error message.
Error: Problem with `filter()` input `..1`.
x non-numeric argument to mathematical function
i Input `..1` is `(abs(PCnumber) >= mean(abs(PCnumber)))`.
I already tried replacing PCnumber inside abs() with cat(PCnumber), {{PCnumber}}, and !!PCnumber, but none of them worked.
Thank you.
Currently, your function interprets the PCnumber variable as a simple string, which it is. You want it to be converted into the variable contained within the data.frame (x).
You need to use sym() to convert the string into a symbol, which can then be inserted using !!:
foo <- function(x, PCnumber) {
PCnumber <- sym(PCnumber)
PCnumber_load0 <- select(x, !!PCnumber)
PCnumber_load <-
filter(PCnumber_load0, (abs(!!PCnumber) >= mean(abs(!!PCnumber))))
PCnumber_load
}
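For intuition about what sym() and !! are doing, here is a rough base R analogue. It is only a sketch: filter_by_abs_mean is a hypothetical name, as.name() stands in for sym(), bquote() with .() stands in for !!, and eval() on the data frame stands in for dplyr's data masking. It is not how dplyr implements filter().

```r
# as.name() turns a string into a symbol, much like sym().
# bquote() splices that symbol into an expression via .(), much like !!.
filter_by_abs_mean <- function(x, PCnumber) {
  col <- as.name(PCnumber)
  cond <- bquote(abs(.(col)) >= mean(abs(.(col))))
  keep <- eval(cond, x)  # evaluate with the data frame's columns in scope
  x[keep, , drop = FALSE]
}

df <- data.frame(PC = c(-3, -1, 0.5, 2))
kept <- filter_by_abs_mean(df, "PC")
```

With this small data frame the mean absolute value is 1.625, so only the rows holding -3 and 2 are kept.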
You first have to take the user-provided argument and turn it into a symbol:
library(rlang)
df <- data.frame(PC=rnorm(n = 25, mean = 0, sd = 1))
foo <- function(x, PCnumber) {
PCnumber <- sym(PCnumber)
# Unquote the symbol with the bang-bang operator (!!). This technique is widely
# used with data masking, which is what tidyverse functions are built around.
PCnumber_load <-
filter(x, (abs(!!PCnumber) >= mean(abs(!!PCnumber))))
PCnumber_load
}
foo(df, PCnumber= "PC")
PC
1 1.1658969
2 0.9449877
3 -2.3434366
4 -1.3040914
5 -1.4638784
6 -0.8324823
7 0.8094797
8 1.4789942
9 1.0667956
10 -0.9972897
11 0.7548202
If you have any other questions, I would be glad to explain more.

programming with dplyr::arrange in dplyr v.0.7

I am trying to get my head around the new implementation of programming and non-standard evaluation in dplyr. As I understand it, the verb_ functions are replaced by calling enquo on the argument and then applying !! in the regular verb function. Translating select from old to new works fine; the following functions give similar results:
select_old <- function(x, ...) {
vars <- as.character(match.call())[-(1:2)]
x %>% select(vars)
}
select_new <- function(x, ...) {
vars <- as.character(match.call())[-(1:2)]
vars_enq <- enquo(vars)
x %>% select(!!vars_enq)
}
However when I try to use arrange in the new programming style I'll get an error:
arrange_old <- function(x, ...) {
vars <- as.character(match.call())[-(1:2)]
x %>% arrange_(vars)
}
arrange_new <- function(x, ...){
vars <- as.character(match.call())[-(1:2)]
vars_enq <- enquo(vars)
x %>% arrange(!!vars_enq)
}
mtcars %>% arrange_new(cyl)
# Error in arrange_impl(.data, dots) :
# incorrect size (1) at position 1, expecting : 32
32 is obviously the number of rows of mtcars; the inner function of dplyr apparently expects a vector of this length. My questions are: why does the new programming style not translate to arrange, and how do I do it in the new style?
You are overthinking it. Use the appropriate function to deal with .... No need to use match.call at all (also not in the old versions, really).
arrange_new <- function(x, ...){
dots <- quos(...)
x %>% arrange(!!!dots)
}
Of course this function does the exact same as the normal arrange, but I guess you are just using this as an example.
You can write a select function in the same way.
The arrange_old should probably have looked something like:
arrange_old <- function(x, ...){
dots <- lazyeval::lazy_dots(...)
x %>% arrange_(.dots = dots)
}
You don't actually need rlang in this situation. This will work:
my_arrange <- function(x, ...) arrange(x, ...)
# test
DF <- data.frame(a = c(2, 2, 1, 1), b = 4:1)
DF %>% my_arrange(a, b)
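One plausible reading of the "incorrect size (1)" error in the question is that as.character(match.call()) produces plain strings, and a string is a length-one value where arrange expects one value per row. The base R sketch below illustrates that reading. capture_names is a hypothetical helper copying the question's approach, and the last line is just a base R way to sort by columns named in a character vector, not a dplyr idiom.

```r
# Hypothetical helper reproducing how the question collects its arguments.
capture_names <- function(x, ...) as.character(match.call())[-(1:2)]

vars <- capture_names(mtcars, cyl, mpg)  # plain strings, not columns

# Sorting "by" the string itself gives a single index, not one per row.
n_from_string <- length(order("cyl"))
n_rows <- nrow(mtcars)

# A base R way to sort by the columns named in a character vector.
sorted <- mtcars[do.call(order, unname(as.list(mtcars[vars]))), ]
```

Capturing the arguments as expressions, as quos(...) does in the answer above, avoids the detour through strings.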

R object not found if defined within a function when using data.table dplyr

Note: The described behaviour has been fixed in the dev version of dplyr. You can install it using devtools::install_github("hadley/dplyr").
Please see this minimal example; I am using dplyr v0.3.0.2 and data.table v1.9.4
library(dplyr)
library(data.table)
f <- function(x, y, bad) {
z <- data.table(x,y, key = "x")
z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
z2
}
f(rnorm(100), rnorm(100) < 0, bad = FALSE)
When I run the above I get
Error in `[.data.table`(dt, , list(sum.bad = sum(y == bad)), by = vars) :
object 'bad' not found
However bad is clearly defined and in scope.
If I just run this outside of a function it works
x <- rnorm(100)
y <- rnorm(100) <0
bad <- FALSE
z <- data.table(x,y, key = "x")
z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
z2
What is the issue here? Is it a bug with either data.table or dplyr?
Seems like this is a problem with how dplyr is setting up the environment to the data.table call. The problem appears in the dplyr:::summarise_.grouped_dt function. It currently looks like
function (.data, ..., .dots)
{
dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
for (i in seq_along(dots)) {
if (identical(dots[[i]]$expr, quote(n()))) {
dots[[i]]$expr <- quote(.N)
}
}
list_call <- lazyeval::make_call(quote(list), dots)
call <- substitute(dt[, list_call, by = vars], list(list_call = list_call$expr))
env <- dt_env(.data, parent.frame())
out <- eval(call, env)
grouped_dt(out, drop_last(groups(.data)), copy = FALSE)
}
<environment: namespace:dplyr>
and if we debug that function and look at the trace when it's called, we see
where 1: summarise_.grouped_dt(.data, .dots = lazyeval::lazy_dots(...))
where 2: summarise_(.data, .dots = lazyeval::lazy_dots(...))
where 3: summarise(., sum.bad = sum(y == bad))
where 4: function_list[[k]](value)
where 5: withVisible(function_list[[k]](value))
where 6: freduce(value, `_function_list`)
where 7: `_fseq`(`_lhs`)
where 8: eval(expr, envir, enclos)
where 9: eval(quote(`_fseq`(`_lhs`)), env, env)
where 10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
where 11 at #3: z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
where 12: f(rnorm(100), rnorm(100) < 0, bad = FALSE)
So the important line is the
env <- dt_env(.data, parent.frame())
one. Here it's setting up the environment path which specifies where to look up all variables in the call. It's just using parent.frame(), which looks to where the function was called from, but since you actually jump through a few hoops to get to this function from your summarise call inside f(), this doesn't seem to be the right parent frame. If, instead, you run
env <- dt_env(.data, parent.frame(2))
in debug mode, that seems to actually get at the correct parent frame. So I think the problem is the jump from summarize() to summarize_(), because this
ff <- function(x, y, bad) {
z <- data.table(x,y, key = "x")
z2 <- z %>% group_by(x) %>% summarise_(.dots=list(sum.bad = quote(sum(y == bad))))
z2
}
ff(rnorm(100), rnorm(100) < 0, bad = FALSE)
seems to work. So it's really dplyr that needs to set up the correct environment. The tricky part is that the correct frame appears to differ depending on whether you call summarize or summarize_ directly. Perhaps summarise() could change the environment when it calls summarise_ to have the same parent.frame via eval(). But I'd probably file this as a bug report and let Hadley decide how to fix it. Something like
summarise <- function(.data, ...) {
call <- match.call()
call <- as.call(c(as.list(call)[1:2], list(.dots=as.list(call)[-(1:2)])))
call[[1]] <- quote(summarise_)
eval(call, envir=parent.frame())
}
would be a "traditional" way to do it. Not sure if the lazyeval package has nicer ways to do this or not.
Tested with data.table_1.9.2 and dplyr_0.3.0.2
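The effect of the extra wrapper layer can be reproduced without dplyr or data.table. In the simplified sketch below, verb_ and verb are hypothetical stand-ins for summarise_() and summarise(), and the depth argument exists only so both lookups can be compared. It is an illustration of the parent.frame issue described above, not dplyr's actual code, and it ignores the additional frames the pipe adds.

```r
# Hypothetical stand-ins for summarise_() and summarise().
verb_ <- function(expr, depth = 1) {
  env <- parent.frame(depth)  # where to look up variables in expr
  eval(expr, env)
}
verb <- function(arg, depth = 1) verb_(substitute(arg), depth = depth)

f <- function(bad, depth) {
  verb(!bad, depth = depth)   # `bad` exists only inside f()
}

# One frame up from verb_() is verb(), which knows nothing about `bad`.
one_up <- tryCatch(f(FALSE, depth = 1), error = function(e) "object not found")

# Two frames up is f(), where `bad` is defined.
two_up <- f(FALSE, depth = 2)
```

Calling verb_ directly from f() would make parent.frame(1) the right choice, which is why the correct depth depends on the entry point.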

How to write a function that calls a function that calls data.table?

The package data.table has some special syntax that requires one to use expressions as the i and j arguments.
This has some implications for how one writes functions that accept and pass arguments to data tables, as is explained really well in section 1.16 of the FAQs.
But I can't figure out how to take this one additional level.
Here is an example. Say I want to write a wrapper function foo() that makes a specific summary of my data, and then a second wrapper plotfoo() that calls foo() and plots the result:
library(data.table)
foo <- function(data, by){
by <- substitute(by)
data[, .N, by=list(eval(by))]
}
DT <- data.table(mtcars)
foo(DT, gear)
OK, this works, because I get my tabulated results:
by N
1: 4 12
2: 3 15
3: 5 5
Now, I try to do just the same when writing plotfoo(), but I fail miserably:
plotfoo <- function(data, by){
by <- substitute(by)
foo(data, eval(by))
}
plotfoo(DT, gear)
But this time I get an error message:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
OK, so the eval() is causing a problem. Let's remove it:
plotfoo <- function(data, by){
by <- substitute(by)
foo(data, by)
}
plotfoo(DT, gear)
Oh no, I get a new error message:
Error in `[.data.table`(data, , .N, by = list(eval(by))) :
column or expression 1 of 'by' or 'keyby' is type symbol. Do not quote column names. Useage: DT[,sum(colC),by=list(colA,month(colB))]
And here is where I remain stuck.
Question: How to write a function that calls a function that calls data.table?
This will work:
plotfoo <- function(data, by) {
by <- substitute(by)
do.call(foo, list(quote(data), by))
}
plotfoo(DT, gear)
# by N
# 1: 4 12
# 2: 3 15
# 3: 5 5
Explanation:
The problem is that your call to foo() in plotfoo() looks like one of the following:
foo(data, eval(by))
foo(data, by)
When foo processes those calls, it dutifully substitutes for the second formal argument (by), getting as by's value the expression eval(by) or the symbol by. But you want by's value to be gear, as in the call foo(data, gear).
do.call() solves this problem by evaluating the elements of its second argument before constructing the call that it then evaluates. As a result, when you pass it by, it evaluates it to its value (the symbol gear) before constructing a call that looks (essentially) like this:
foo(data, gear)
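The difference between the direct call and do.call() can be checked without data.table. In this sketch, show_arg is a hypothetical stand-in for foo() that simply returns the expression it was given, and the two wrappers correspond to the failing and working versions of plotfoo().

```r
# Hypothetical stand-in for foo(): reports the expression it received.
show_arg <- function(by) substitute(by)

wrapper_direct <- function(by) {
  by <- substitute(by)
  show_arg(by)                 # show_arg sees the symbol `by`
}

wrapper_docall <- function(by) {
  by <- substitute(by)
  do.call(show_arg, list(by))  # list(by) is evaluated first, giving `gear`
}

direct <- wrapper_direct(gear)
via_docall <- wrapper_docall(gear)
```

Here direct is the symbol by, while via_docall is the symbol gear, which is what foo() needs to see.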
I think you might be tying yourself up in knots. This works:
library(data.table)
foo <- function(data, by){
by <- by
data[, .N, by=by]
}
DT <- data.table(mtcars)
foo(DT, 'gear')
plotfoo <- function(data, by){
foo(data, by)
}
plotfoo(DT, 'gear')
And that method supports passing in character values:
> gg <- 'gear'
> plotfoo <- function(data, by){
+ foo(data, by)
+ }
> plotfoo(DT, gg)
gear N
1: 4 12
2: 3 15
3: 5 5
