Why must local({...}) be defined using two rounds of expression quoting? - r

I'm trying to understand how R's local function is working. With it, you can open a temporary local scope, which means what happens in local (most notably, variable definitions), stays in local. Only the last value of the block is returned to the outside world. So:
x <- local({
a <- 2
a * 2
})
x
## [1] 4
a
## Error: object 'a' not found
local is defined like this:
local <- function(expr, envir = new.env()){
eval.parent(substitute(eval(quote(expr), envir)))
}
As I understand it, two rounds of expression quoting and subsequent evaluation happen:
1. eval(quote([whatever expr input]), [whatever envir input]) is generated as an unevaluated call by substitute.
2. The call is evaluated in local's caller frame (which is in our case, the Global Environment), so
3. [whatever expr input] is evaluated in [whatever envir input]
However, I do not understand why step 2 is necessary. Why can't I simply define local like this:
local2 <- function(expr, envir = new.env()){
eval(quote(expr), envir)
}
I would think it evaluates the expression expr in an empty environment? So any variable defined in expr should exist in envir and therefore vanish after the end of local2?
However, if I try this, I get:
x <- local2({
a <- 2
a * 2
})
x
## [1] 4
a
## [1] 2
So a leaks to the Global Environment. Why is this?
EDIT: Even more mysterious: Why does it not happen for:
eval(quote({a <- 2; a*2}), new.env())
## [1] 4
a
## Error: object 'a' not found

Parameters to R functions are passed as promises. They are not evaluated unless the value is specifically requested. So look at
# clean up first
if (exists("a")) rm(a)
f <- function(x) print(1)
f(a<-1)
# [1] 1
a
# Error: object 'a' not found
g <- function(x) print(x)
g(a<-1)
# [1] 1
a
# [1] 1
Note that the g() function uses the value passed to it. That value is the assignment to a, so evaluating it creates a in the global environment. With f(), the variable is never created because the function parameter remained a promise and was never evaluated.
If you want to access a parameter without evaluating it, you need to use something like match.call() or substitute(). The local() function does the latter.
If you remove the eval.parent(), you'll see that the substitute puts the un-evaluated expression from the parameter into a new call to eval().
h <- function(expr, envir = new.env()){
substitute(eval(quote(expr), envir))
}
h(a<-1)
# eval(quote(a <- 1), new.env())
Whereas if you do
j<- function(x) {
quote(x)
}
j(a<-1)
# x
you are not really creating a new function call. Furthermore, when you eval() that expression, you are triggering the evaluation of x from its original calling environment (triggering the evaluation of the promise), not evaluating the expression in a new environment.
local() then uses the eval.parent() so that you can use existing variables in the environment within your block. For example
b<-5
local({
a <- b
a * 2
})
# [1] 10
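As a rough sketch of the same idea, here is an alternative that captures the block with substitute() and evaluates it in a fresh environment whose parent is the caller's frame. The name local3 is hypothetical, and this is not how base R defines local(); it only illustrates why the new environment has to hang off the calling frame for existing variables such as b to be visible.

```r
# local3 is a hypothetical name; this is a sketch, not base R's definition of local().
local3 <- function(expr, envir = new.env(parent = parent.frame())) {
  # substitute() returns the unevaluated code block instead of forcing the promise
  eval(substitute(expr), envir)
}

f <- function() {
  b <- 5
  res <- local3({
    a <- b      # b is found through the parent of the fresh environment (f's frame)
    a * 2
  })
  # a was created in the fresh environment, so it must not exist in f's frame
  list(value = res, a_leaked = exists("a", inherits = FALSE))
}

out <- f()
out$value
## [1] 10
out$a_leaked
## [1] FALSE
```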
Look at the behaviors here
local2 <- function(expr, envir = new.env()){
eval(quote(expr), envir)
}
local2({a<-5; a})
# [1] 5
local2({a<-5; a}, list(a=100, expr="hello"))
# [1] "hello"
Notice that when we use a non-empty environment, eval() looks up the symbol expr in that environment; it does not evaluate your code block there.
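To make the leak visible without touching the global environment, the following sketch calls local2() from an explicit environment. The name caller_env is an assumption introduced only for this illustration. The promise for expr is forced in the environment the call came from, so that is where a ends up.

```r
# local2 is the simplified definition from the question.
local2 <- function(expr, envir = new.env()) {
  eval(quote(expr), envir)
}

# caller_env is a hypothetical stand-in for "wherever local2() is called from".
caller_env <- new.env()
result <- evalq(local2({a <- 2; a * 2}), caller_env)

result
## [1] 4

# The assignment ran in the calling environment, not in the fresh envir.
leaked <- exists("a", envir = caller_env, inherits = FALSE)
leaked
## [1] TRUE
```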

Related

Manipulating enclosing environment of a function

I'm trying to get a better understanding of closures, in particular details on a function's scope and how to work with its enclosing environment(s)
Based on the Description section of the help page on rlang::fn_env(), my understanding was that a function always has access to all variables in its scope and that its enclosing environment belongs to that scope.
But then, why isn't it possible to manipulate the contents of the closure environment "after the fact", i.e. after the function has been created?
By means of R's lexical scoping, shouldn't bar() be able to find x when I put it into its enclosing environment?
foo <- function(fun) {
env_closure <- rlang::fn_env(fun)
env_closure$x <- 5
fun()
}
bar <- function(x) x
foo(bar)
#> Error in fun(): argument "x" is missing, with no default
Ah, I think I got it down now.
It has to do with the structure of a function's formal arguments:
If an argument is defined without a default value, R will complain when you call the function without specifying it, even though it might technically be able to look it up in its scope.
One way to kick off lexical scoping even though you don't want to define a default value would be to set the defaults "on the fly" at run time via rlang::fn_fmls().
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fun()
}
# No argument at all -> lexical scoping takes over
baz <- function() x
foo(baz)
#> [1] 5
# Set defaults to desired values on the fly at run time of `foo()`
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fmls <- rlang::fn_fmls(fun)
fmls$x <- substitute(get("x", envir = env_enclosing, inherits = FALSE))
rlang::fn_fmls(fun) <- fmls
fun()
}
bar <- function(x) x
foo(bar)
#> [1] 5
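The same distinction can be reproduced in base R without rlang. This is a minimal sketch in which environment<-() and list2env() stand in for the rlang helpers: a formal argument shadows the enclosing environment, while a free variable is resolved through it.

```r
# x is a formal argument: it is looked up as an argument, never in the enclosure
bar <- function(x) x
environment(bar) <- list2env(list(x = 5))
res_bar <- tryCatch(bar(), error = function(e) e)
inherits(res_bar, "error")   # TRUE: the argument is missing, the enclosure is not consulted

# x is a free variable here: lexical scoping finds it in the enclosing environment
baz <- function() x
environment(baz) <- list2env(list(x = 5))
res_baz <- baz()
res_baz
```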
I can't really follow your example as I am unfamiliar with the rlang library but I think a good example of a closure in R would be:
bucket <- function() {
n <- 1
foo <- function(x) {
assign("n", n+1, envir = parent.env(environment()))
n
}
foo
}
bar <- bucket()
Because bar() is defined inside bucket(), its enclosing environment is the evaluation environment of that bucket() call, so you can carry some data there. Each time you run it, you modify that environment:
bar()
[1] 2
bar()
[1] 3
bar()
[1] 4
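A common way to write the same counter uses the superassignment operator <<- instead of assign(). The sketch below uses the hypothetical name make_counter and also shows that each call creates its own enclosing environment, so two counters do not share state.

```r
make_counter <- function() {
  n <- 1
  function() {
    n <<- n + 1   # modifies n in the enclosing environment, not a local copy
    n
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

first  <- counter_a()   # 2
second <- counter_a()   # 3
other  <- counter_b()   # 2: counter_b has its own n
```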

R eval(): changed behavior when argument 'envir' is explicitly set to default value

Consider the function fun1(). Calling it does not assign the value 2 to xx in .GlobalEnv.
fun1 <- function(x) eval(expr=substitute(x))
fun1({xx <- 2; xx})
## [1] 2
xx
## Error: object 'xx' not found
The default value of the argument envir of eval() is:
formals(eval)$envir
## parent.frame()
In fun2() the argument envir is explicitly set to its default value parent.frame(). Calling fun2() does assign the value 2 to xx in .GlobalEnv.
fun2 <- function(x) eval(expr=substitute(x), envir=parent.frame())
fun2({xx <- 2; xx})
## [1] 2
xx
## [1] 2
(Tested with R version 3.5.0)
Why is that? Is that behavior intended?
Default argument values are evaluated in the evaluation frame of the function. Explicitly supplied arguments are evaluated in the calling frame. (Both of these can be changed by non-standard evaluation tricks, but you're not using those.)
So in your first example, parent.frame() is the parent of the call to eval(), i.e. the evaluation frame of fun1(). In your second example, parent.frame() is the parent of the call to fun2().
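The rule can be checked with a small sketch. The function names show_frame, outer_default and outer_explicit are made up for this illustration; show_frame() simply returns whatever environment its env argument evaluates to.

```r
show_frame <- function(env = parent.frame()) env

outer_default <- function() {
  here <- environment()
  # default used: parent.frame() is evaluated inside show_frame(),
  # so it refers to show_frame()'s caller, i.e. this frame
  identical(show_frame(), here)
}

outer_explicit <- function() {
  here <- environment()
  # explicit argument: parent.frame() is evaluated in this frame,
  # so it refers to the caller of outer_explicit()
  identical(show_frame(parent.frame()), here)
}

default_is_here  <- outer_default()    # TRUE
explicit_is_here <- outer_explicit()   # FALSE
```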

Finding the origin environment of the ... (dots) arguments of a call

I want to be able to find the environment from which the ... (dots) arguments of a call originate.
Scenario
For example, consider a function
foo <- function(x, ...) {
# do something
}
We want a function env_dots(), which we invoke from within foo(), that finds the originating environment of the ... in a call to foo(), even when the call to foo() is deeply nested. That is, if we define
foo <- function(x, ...) {
# find the originating environment of '...'
env <- env_dots()
# do something
}
and nest a call to foo, like so,
baz <- function(...) {
a <- "You found the dots"
bar(1, 2)
}
bar <- function(...)
foo(...)
then calling baz() should return the environment in which the ... in the (nested) call to foo(...) originates: this is the environment where the call bar(1, 2) is made, since the 2 (but not the 1) gets passed to the dots of foo. In particular, we should get
baz()$a
#> [1] "You found the dots"
Naive implementation of env_dots()
Update — env_dots(), as defined here, will not work in general, because the final ... may be populated by arguments that are called at multiple levels of the call stack.
Here's one possibility for env_dots():
# mc: match.call() of function from which env_dots() is called
env_dots <- function(mc) {
# Return NULL if initial call invokes no dots
if (!rlang::has_name(mc, "...")) return(NULL)
# Otherwise, climb the call stack until the dots origin is found
stack <- rlang::call_stack()[-1]
l <- length(stack)
i <- 1
while (i <= l && has_dots(stack[[i]]$expr)) i <- i + 1
# return NULL if no dots invoked
if (i <= l) stack[[i + 1]]$env else NULL
}
# Does a call have dots?
has_dots <- function(x) {
if (is.null(x))
return(FALSE)
args <- rlang::lang_tail(x)
any(vapply(args, identical, logical(1), y = quote(...)))
}
This seems to work: with
foo <- function(x, ...)
env_dots(match.call(expand.dots = FALSE))
we get
baz()$a
#> [1] "You found the dots"
bar(1, 2) # 2 gets passed down to the dots of foo()
#> <environment: R_GlobalEnv>
bar(1) # foo() captures no dots
#> NULL
Questions
The above implementation of env_dots() is not very efficient.
Is there a more skillful way to implement env_dots() in rlang and/or base R?
How can I move the match.call() invocation to within env_dots()?
match.call(sys.function(-1), call = sys.call(-1), expand.dots = FALSE) will indeed work.
Remark — One can't infer the origin environment of the dots from rlang::quos(...), because some quosures won't be endowed with the calling environment (e.g., when an expression is a literal object).
I'm sorry to dig up an old question, but I'm not sure the desired behavior is well-defined. ... is not a single expression; it's a list of expressions. In case of rlang quosures, each of those expressions has their own environment. So what should the environment of the list be?
Furthermore, the ... list itself can be modified. Consider the following example, where g takes its ..., prepends it with an (unevaluated) expression x+3 and passes it onto f.
f <- function(...) {rlang::enquos( ... )}
g <- function(...) {
a <- rlang::quo( x + 3 )
l <- rlang::list2( a, ... )
f(!!!l)
}
b <- rlang::quo( 5 * y )
g( b, 10 )
# [[1]]
# <quosure>
# expr: ^x + 3
# env: 0x7ffd1eca16f0
# [[2]]
# <quosure>
# expr: ^5 * y
# env: global
# [[3]]
# <quosure>
# expr: ^10
# env: empty
Notice that each of the three quosures that make it over to f has its own environment. (As you noted in your question, literals like 10 have an empty environment. This is because the value is the same independent of which environment it's evaluated in.)
Given this scenario, what should the hypothetical env_dots() return when called inside f()?

Environments in R, mapply and get

Let x<-2 in the global env:
x <-2
x
[1] 2
Let a be a function that defines another x locally and uses get:
a<-function(){
x<-1
get("x")
}
This function correctly gets x from the local environment:
a()
[1] 1
Now let's define a function b as below, that uses mapply with get:
b<-function(){
x<-1
mapply(get,"x")
}
If I call b, it seems that mapply makes get not search the function environment first. Instead, it tries to get x directly from the global environment, and if x is not defined in the global env, it gives an error message:
b()
x
2
rm(x)
b()
Error in (function (x, pos = -1L, envir = as.environment(pos), mode = "any", :
object 'x' not found
The solution to this is to explicitly define envir=environment().
c<-function(){
x<-1
mapply(get,"x", MoreArgs = list(envir=environment()))
}
c()
x
1
But I would like to know what exactly is going on here. What is mapply doing? (And why? Is this the expected behavior?) Is this "pitfall" common in other R functions?
The problem is that get looks into the environment it is called from, but here we are passing get to mapply, so get is called from the local environment within mapply. If x is not found in that local environment, the search continues in its parent, i.e. environment(mapply) (the lexical environment in which mapply was defined, which is the base namespace environment); if it is not there either, it looks into the parent of that, which is the global environment, i.e. your R workspace.
This is because R uses lexical scoping, as opposed to dynamic scoping.
We can show this by getting a variable that exists within mapply.
x <- 2
b2<-function(){
x<-1
mapply(get, "USE.NAMES")
}
b2() # it finds USE.NAMES in mapply
## USE.NAMES
## TRUE
In addition to the workaround involving MoreArgs shown in the question, this also works, since it causes the search to look into the local environment within b3 after failing to find x in mapply. (This is just for illustrating what is going on; in actual practice we would prefer the workaround shown in the question.)
x <- 2
b3 <-function(){
x<-1
environment(mapply) <- environment()
mapply(get, "x")
}
b3()
## 1
ADDED Expanded explanation. Also note that we can view the chain of environments like this:
> debug(get)
> b()
debugging in: (function (x, pos = -1L, envir = as.environment(pos), mode = "any",
inherits = TRUE)
.Internal(get(x, envir, mode, inherits)))(dots[[1L]][[1L]])
debug: .Internal(get(x, envir, mode, inherits))
Browse[2]> envir
<environment: 0x0000000021ada818>
Browse[2]> ls(envir) ### this shows that envir is the local env in mapply
[1] "dots" "FUN" "MoreArgs" "SIMPLIFY" "USE.NAMES"
Browse[2]> parent.env(envir) ### the parent of envir is the base namespace env
<environment: namespace:base>
Browse[2]> parent.env(parent.env(envir)) ### and grandparent of envir is the global env
<environment: R_GlobalEnv>
Thus, the ancestry of environments potentially followed is this (where arrow points to parent):
local environment within mapply --> environment(mapply) --> .GlobalEnv
where environment(mapply) equals asNamespace("base"), the base namespace environment.
R is lexically scoped, not dynamically scoped, meaning that when you search through parent environments to find a value, you are searching through the lexical parents (as written in the source code), not through the dynamic parents (as invoked). Consider this example:
x <- "Global!"
fun1 <- function() print(x)
fun2 <- function() {
x <- "Local!"
fun1a <- function() print(x)
fun1() # fun2() is dynamic but not lexical parent of fun1()
fun1a() # fun2() is both dynamic and lexical parent of fun1a()
}
fun2()
outputs:
[1] "Global!"
[1] "Local!"
In this case fun2 is the lexical parent of fun1a, but not of fun1. Since mapply is not defined inside your functions, your functions are not the lexical parents of mapply and the xs defined therein are not directly accessible to mapply.
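The lookup rule can be reproduced without mapply. In the sketch below, call_get is a hypothetical stand-in for mapply: it is defined at top level and calls get() from its own frame. The second function shows one possible fix, capturing the local environment in a closure and passing it to get() explicitly.

```r
x <- 2

# call_get() is a hypothetical stand-in for mapply: defined at top level,
# it calls get() from its own frame, so the search starts there.
call_get <- function(name) get(name)

b <- function() {
  x <- 1
  call_get("x")
}
b_value <- b()   # finds the top-level x, not b's local x

# One possible fix: capture the local environment and hand it to get().
b_fixed <- function() {
  x <- 1
  env <- environment()
  mapply(function(name) get(name, envir = env), "x")
}
fixed_value <- b_fixed()   # b_fixed's local x, returned as a named vector
```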
The issue is an interplay with built-in C code. Namely, considering the following:
fx <- function(x) environment()
env <- NULL; fn <- function() { env <<- environment(); mapply(fx, 1)[[1]] }
Then
env2 <- fn()
identical(env2, env)
# [1] FALSE
identical(parent.env(env2), env)
# [1] FALSE
identical(parent.env(env2), globalenv())
# [1] TRUE
More specifically, the problem lies in the underlying C code, which fails to consider executing environment, and hands it off to an as-is underlying C eval call which creates a temp environment branching directly off of R_GlobalEnv.
Note this really is what is going on, since no level of stack nesting fixes the issue:
env <- NULL; fn2 <- function() { env <<- environment(); (function() { mapply(fx, 1)[[1]] })() }
identical(parent.env(fn2()), globalenv())
# [1] TRUE

How do I query for values of symbols in a closure in R?

How can I query the value of x for foo in the R code below?
make.foo <- function() {
x <- 123
function() x * 3
}
foo <- make.foo()
# now get foo's x
A function will have an environment
from ?`function`
A closure has three components, its formals (its argument list), its body (expr in the ‘Usage’ section) and its environment which provides the enclosure of the evaluation frame when the closure is used.
so you can get from that environment (or list the objects using ls)
get('x', envir = environment(foo))
## [1] 123
or if you want to know all the objects in the environment
ls(envir = environment(foo))
## 'x'
and if you want to assign to that environment (ie change x)
assign('x', 24, envir = environment(foo))
foo()
## 72
You can even remove it from the environment
rm(x, envir = environment(foo))
foo()
## Error in foo() : object 'x' not found
and then use a globally assigned x
x <- 3
foo()
# [1] 9
and reassign to the function's environment
assign('x', 123, envir = environment(foo))
foo()
## [1] 369
If you want to look for something in an object's environment and nowhere else then use get with inherits=FALSE. Otherwise you'll risk finding things in the function's parent environment. Example using your make.foo above:
> z=999
> get("x",environment(foo))
[1] 123
> get("z",environment(foo))
[1] 999
> get("x",environment(foo),inherits=FALSE)
[1] 123
> get("z",environment(foo),inherits=FALSE)
Error in get("z", environment(foo), inherits = FALSE) :
object 'z' not found
The second get shows that you might not get an error if you try and get something that isn't in the closure's environment if it appears in the parent environment. This may cause odd bugs. With inherits=FALSE you get an immediate error.
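If an error is not wanted either, exists() and get0() accept the same inherits argument. A small sketch, reusing make.foo from the question; the NA fallback value is just an arbitrary choice for this example:

```r
make.foo <- function() {
  x <- 123
  function() x * 3
}
foo <- make.foo()
z <- 999
env <- environment(foo)

# With inheritance, z is found in the parent of the closure environment
found_inherited <- exists("z", envir = env)

# Without inheritance, only the closure environment itself is checked
found_local <- exists("z", envir = env, inherits = FALSE)

# get0() returns a fallback instead of raising an error
x_value <- get0("x", envir = env, inherits = FALSE, ifnotfound = NA)
z_value <- get0("z", envir = env, inherits = FALSE, ifnotfound = NA)
```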

Resources