Environments in R, mapply and get - r

Let x<-2 in the global env:
x <-2
x
[1] 2
Let a be a function that defines another x locally and uses get:
a<-function(){
x<-1
get("x")
}
This function correctly gets x from the local environment:
a()
[1] 1
Now let's define a function b as below, that uses mapply with get:
b<-function(){
x<-1
mapply(get,"x")
}
If I call b, it seems that mapply makes get skip the function environment. Instead, it tries to get x directly from the global environment, and if x is not defined in the global env, it gives an error message:
b()
x
2
rm(x)
b()
Error in (function (x, pos = -1L, envir = as.environment(pos), mode = "any", :
object 'x' not found
The solution to this is to explicitly define envir=environment().
c<-function(){
x<-1
mapply(get,"x", MoreArgs = list(envir=environment()))
}
c()
x
1
But I would like to know what exactly is going on here. What is mapply doing? (And why? Is this the expected behavior?) Is this "pitfall" common in other R functions?

The problem is that get looks in the environment it is called from, but here we are passing get to mapply, so get is called from the local environment within mapply. If x is not found in mapply's local environment, the search moves to the parent of that environment, i.e. environment(mapply) (the lexical environment in which mapply was defined, which is the base namespace environment); if it is not there either, it looks in the parent of that, which is the global environment, i.e. your R workspace.
This is because R uses lexical scoping, as opposed to dynamic scoping.
We can show this by getting a variable that exists within mapply.
x <- 2
b2<-function(){
x<-1
mapply(get, "USE.NAMES")
}
b2() # it finds USE.NAMES in mapply
## USE.NAMES
## TRUE
In addition to the workaround involving MoreArgs shown in the question, this also works, since it causes the search to look into the local environment within b3 after failing to find x in mapply. (This is just for illustrating what is going on; in actual practice we would prefer the workaround shown in the question.)
x <- 2
b3 <-function(){
x<-1
environment(mapply) <- environment()
mapply(get, "x")
}
b3()
## 1
ADDED Expanded explanation. Also note that we can view the chain of environments like this:
> debug(get)
> b()
debugging in: (function (x, pos = -1L, envir = as.environment(pos), mode = "any",
inherits = TRUE)
.Internal(get(x, envir, mode, inherits)))(dots[[1L]][[1L]])
debug: .Internal(get(x, envir, mode, inherits))
Browse[2]> envir
<environment: 0x0000000021ada818>
Browse[2]> ls(envir) ### this shows that envir is the local env in mapply
[1] "dots" "FUN" "MoreArgs" "SIMPLIFY" "USE.NAMES"
Browse[2]> parent.env(envir) ### the parent of envir is the base namespace env
<environment: namespace:base>
Browse[2]> parent.env(parent.env(envir)) ### and grandparent of envir is the global env
<environment: R_GlobalEnv>
Thus, the ancestry of environments potentially followed is this (where the arrow points to the parent):
local environment within mapply --> environment(mapply) --> .GlobalEnv
where environment(mapply) equals asNamespace("base"), the base namespace environment.
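One more workaround follows from the chain above. It is offered only as a sketch and is not part of the original answer: wrap get in an anonymous function created inside the calling function. The wrapper's enclosing environment is then the frame of b4 (a hypothetical name used only here), so the lookup reaches the local x before it can reach the global one.

```r
x <- 2  # global x; this should NOT be the value returned

b4 <- function() {  # b4 is a hypothetical name for this illustration
  x <- 1
  # The anonymous function is defined here, so its enclosure is b4's frame.
  # get() starts in the wrapper's frame and then moves on to b4's frame.
  mapply(function(nm) get(nm), "x")
}

res <- b4()
res
## x
## 1
```

Unlike the MoreArgs version, nothing has to be passed explicitly; the price is one extra function call per element.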

R is lexically scoped, not dynamically scoped, meaning that when you search through parent environments to find a value, you are searching through the lexical parents (as written in the source code), not through the dynamic parents (as invoked). Consider this example:
x <- "Global!"
fun1 <- function() print(x)
fun2 <- function() {
x <- "Local!"
fun1a <- function() print(x)
fun1() # fun2() is dynamic but not lexical parent of fun1()
fun1a() # fun2() is both dynamic and lexical parent of fun1a()
}
fun2()
outputs:
[1] "Global!"
[1] "Local!"
In this case fun2 is the lexical parent of fun1a, but not of fun1. Since mapply is not defined inside your functions, your functions are not the lexical parents of mapply and the xs defined therein are not directly accessible to mapply.
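If the dynamic parent really is what you want, you have to ask for it explicitly. The following is a minimal sketch, assuming a hypothetical variant fun1_dyn that looks x up starting from its caller's frame via parent.frame():

```r
x <- "Global!"

# fun1_dyn is a hypothetical variant of fun1 that asks for the caller's
# frame explicitly instead of relying on lexical scoping.
fun1_dyn <- function() get("x", envir = parent.frame())

fun2 <- function() {
  x <- "Local!"
  fun1_dyn()  # parent.frame() inside fun1_dyn is fun2's frame
}

fun2()
## [1] "Local!"
```

This mirrors what the envir = environment() workaround does for mapply: the caller's environment is handed over by hand because lexical scoping will not find it.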

The issue is an interplay with built-in C code. Namely, considering the following:
fx <- function(x) environment()
env <- NULL; fn <- function() { env <<- environment(); mapply(fx, 1)[[1]] }
Then
env2 <- fn()
identical(env2, env)
# [1] FALSE
identical(parent.env(env2), env)
# [1] FALSE
identical(parent.env(env2), globalenv())
# [1] TRUE
More specifically, the problem lies in the underlying C code, which fails to consider executing environment, and hands it off to an as-is underlying C eval call which creates a temp environment branching directly off of R_GlobalEnv.
Note this really is what is going on, since no level of stack nesting fixes the issue:
env <- NULL; fn2 <- function() { env <<- environment(); (function() { mapply(fx, 1)[[1]] })() }
identical(parent.env(fn2()), globalenv())
# [1] TRUE

Related

Manipulating enclosing environment of a function

I'm trying to get a better understanding of closures, in particular details on a function's scope and how to work with its enclosing environment(s).
Based on the Description section of the help page on rlang::fn_env(), I had the understanding, that a function always has access to all variables in its scope and that its enclosing environment belongs to that scope.
But then, why isn't it possible to manipulate the contents of the closure environment "after the fact", i.e. after the function has been created?
By means of R's lexical scoping, shouldn't bar() be able to find x when I put it into its enclosing environment?
foo <- function(fun) {
env_closure <- rlang::fn_env(fun)
env_closure$x <- 5
fun()
}
bar <- function(x) x
foo(bar)
#> Error in fun(): argument "x" is missing, with no default
Ah, I think I got it down now.
It has to do with the structure of a function's formal arguments:
If an argument is defined without a default value, R will complain when you call the function without specifying it, even though it might technically be able to look it up in its scope.
One way to kick off lexical scoping even though you don't want to define a default value would be to set the defaults "on the fly" at run time via rlang::fn_fmls().
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fun()
}
# No argument at all -> lexical scoping takes over
baz <- function() x
foo(baz)
#> [1] 5
# Set defaults to desired values on the fly at run time of `foo()`
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fmls <- rlang::fn_fmls(fun)
fmls$x <- substitute(get("x", envir = env_enclosing, inherits = FALSE))
rlang::fn_fmls(fun) <- fmls
fun()
}
bar <- function(x) x
foo(bar)
#> [1] 5
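For comparison, here is a base R sketch of the "no argument at all" case that does not need rlang. It assumes a hypothetical helper foo_base and, as a deliberate difference from the code above, puts x into a fresh child environment instead of writing into the function's original enclosing environment:

```r
# foo_base is a hypothetical name; it only illustrates the idea.
foo_base <- function(fun) {
  # New environment whose parent is the function's original enclosure
  env_enclosing <- new.env(parent = environment(fun))
  env_enclosing$x <- 5
  # Only the local copy `fun` is re-pointed; the caller's function is untouched
  environment(fun) <- env_enclosing
  fun()
}

baz <- function() x
foo_base(baz)
## [1] 5
```

A function such as bar <- function(x) x would still fail here, for the reason given above: x is a formal argument without a default, so it is never looked up in the enclosure.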
I can't really follow your example as I am unfamiliar with the rlang library but I think a good example of a closure in R would be:
bucket <- function() {
n <- 1
foo <- function(x) {
assign("n", n+1, envir = parent.env(environment()))
n
}
foo
}
bar <- bucket()
Because bar() is defined inside the function environment of bucket, its parent environment is that of bucket, and you can therefore carry some data there. Each time you run it, you modify the bucket environment:
bar()
[1] 2
bar()
[1] 3
bar()
[1] 4
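The same counter is often written with the superassignment operator <<-, which assigns to the nearest existing binding in the enclosing environments. A minimal sketch (make_counter is a hypothetical name) that also shows each closure getting its own private n:

```r
make_counter <- function() {  # hypothetical name for this illustration
  n <- 1
  function() {
    n <<- n + 1  # updates n in make_counter's environment, not a local copy
    n
  }
}

counter_a <- make_counter()
counter_b <- make_counter()

counter_a()
## [1] 2
counter_a()
## [1] 3
counter_b()  # independent state: its own n starts from 1 again
## [1] 2
```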

Why is this simple function not working?

I first defined a new variable x, then created a function that requires x within its body (not as an argument). See the code below:
x <- c(1,2,3)
f1 <- function() {
x^2
}
rm(x)
f2 <- function() {
x <- c(1,2,3)
f1()
}
f2()
Error in f1() : object 'x' not found
When I removed x and defined a new function f2 that first defines x and then executes f1, it reports that object x was not found.
I just wanted to know why this is not working and how I can overcome this problem. I do not want x to be name as argument in f1.
Please provide appropriate title because I do not know what kind of problem is this.
You could use a closure to make an f1 with the desired properties:
makeF <- function(){
x <- c(1,2,3)
f1 <- function() {
x^2
}
f1
}
f1 <- makeF()
f1() #returns 1 4 9
There is no x in the global scope but f1 still knows about the x in the environment that it was defined in.
In short: You are expecting dynamic scoping but are a victim of R's lexical scoping:
dynamic scoping = the enclosing environment of a command is determined during run-time
lexical scoping = the enclosing environment of a command is determined at "compile time"
To understand the lookup path of your variable x in the current and parent environments try this code.
It shows that the two functions do not share the environment in which x is defined in f2, so x can never be found:
# list all parent environments of an environment to show the "search path"
parents <- function(env) {
while (TRUE) {
name <- environmentName(env)
txt <- if (nzchar(name)) name else format(env)
cat(txt, "\n")
if (txt == "R_EmptyEnv") break
env <- parent.env(env)
}
}
x <- c(1,2,3)
f1 <- function() {
print("f1:")
parents(environment())
x^2
}
f1() # works
# [1] "f1:"
# <environment: 0x4ebb8b8>
# R_GlobalEnv
# ...
rm(x)
f2 <- function() {
print("f2:")
parents(environment())
x <- c(1,2,3)
f1()
}
f2() # does not find "x"
# [1] "f2:"
# <environment: 0x47b2d18>
# R_GlobalEnv
# ...
# [1] "f1:"
# <environment: 0x4765828>
# R_GlobalEnv
# ...
Possible solutions:
1. Declare x in the global environment (bad programming style due to lack of encapsulation).
2. Use function parameters (this is what functions are made for).
3. Use a closure if x always has the same value for each call of f1 (not for beginners). See the other answer from @JohnColeman...
I strongly recommend option 2 (add x as a parameter; why do you want to avoid this?).
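A minimal sketch of the parameter-based solution, reusing the names from the question:

```r
# x is passed in explicitly, so f1 no longer depends on where it is called from
f1 <- function(x) {
  x^2
}

f2 <- function() {
  x <- c(1, 2, 3)
  f1(x)  # hand the local x to f1
}

f2()
## [1] 1 4 9
```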

Why must local({...}) be defined using two rounds of expression quoting?

I'm trying to understand how R's local function is working. With it, you can open a temporary local scope, which means what happens in local (most notably, variable definitions), stays in local. Only the last value of the block is returned to the outside world. So:
x <- local({
a <- 2
a * 2
})
x
## [1] 4
a
## Error: object 'a' not found
local is defined like this:
local <- function(expr, envir = new.env()){
eval.parent(substitute(eval(quote(expr), envir)))
}
As I understand it, two rounds of expression quoting and subsequent evaluation happen:
1. eval(quote([whatever expr input]), [whatever envir input]) is generated as an unevaluated call by substitute.
2. The call is evaluated in local's caller frame (which is, in our case, the Global Environment), so
3. [whatever expr input] is evaluated in [whatever envir input]
However, I do not understand why step 2 is necessary. Why can't I simply define local like this:
local2 <- function(expr, envir = new.env()){
eval(quote(expr), envir)
}
I would think it evaluates the expression expr in an empty environment? So any variable defined in expr should exist in envir and therefore vanish after the end of local2?
However, if I try this, I get:
x <- local2({
a <- 2
a * 2
})
x
## [1] 4
a
## [1] 2
So a leaks to the Global Environment. Why is this?
EDIT: Even more mysterious: Why does it not happen for:
eval(quote({a <- 2; a*2}), new.env())
## [1] 4
a
## Error: object 'a' not found
Parameters to R functions are passed as promises. They are not evaluated unless the value is specifically requested. So look at
# clean up first
if (exists("a")) rm(a)
f <- function(x) print(1)
f(a<-1)
# [1] 1
a
# Error: object 'a' not found
g <- function(x) print(x)
g(a<-1)
# [1] 1
a
# [1] 1
Note that in the g() function, we are using the value passed to the function, which is that assignment to a, so that creates a in the global environment. With f(), that variable is never created because that function parameter remained a promise and was never evaluated.
If you want to access a parameter without evaluating it, you need to use something like match.call() or substitute(). The local() function does the latter.
If you remove the eval.parent(), you'll see that the substitute puts the un-evaluated expression from the parameter into a new call to eval().
h <- function(expr, envir = new.env()){
substitute(eval(quote(expr), envir))
}
h(a<-1)
# eval(quote(a <- 1), new.env())
Whereas if you do
j<- function(x) {
quote(x)
}
j(a<-1)
# x
you are not really creating a new function call. Furthermore, when you eval() that expression, you are triggering the evaluation of x from its original calling environment (triggering the evaluation of the promise), not evaluating the expression in a new environment.
local() then uses the eval.parent() so that you can use existing variables in the environment within your block. For example
b<-5
local({
a <- b
a * 2
})
# [1] 10
Look at the behaviors here
local2 <- function(expr, envir = new.env()){
eval(quote(expr), envir)
}
local2({a<-5; a})
# [1] 5
local2({a<-5; a}, list(a=100, expr="hello"))
# [1] "hello"
See how when we use a non-empty environment, the eval() is looking up expr in the environment, it's not evaluating your code block in the environment.
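To make the contrast concrete, here is a sketch of a one-step variant that does keep assignments contained. It is not how base R defines local(); local3 is a hypothetical name, and the default envir is assumed to be a fresh environment whose parent is the caller's frame, so that existing variables remain visible:

```r
# local3 is a hypothetical name; this is NOT the base R definition of local().
local3 <- function(expr, envir = new.env(parent = parent.frame())) {
  code <- substitute(expr)  # capture the unevaluated block, not the promise's value
  eval(code, envir)         # run the block in envir, so assignments land there
}

b <- 5
res <- local3({
  tmp_a <- b  # b is found through envir's parent, the calling environment
  tmp_a * 2
})
res
## [1] 10
exists("tmp_a")  # the assignment did not leak into the workspace
## [1] FALSE
```

The difference from local2 is that substitute(expr) yields the code itself, whereas quote(expr) yields only the symbol expr, whose lookup forces the promise in the caller's environment.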

Get the attribute of a packaged function from within itself

Suppose we have these functions in an R package.
prova <- function() {
print(attr(prova, 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
'myattr<-' <- function(x, value) {
attr(x, 'myattr') <- value
x
}
myattr <- function(x) attr(x, 'myattr')
So, I install the package and then I test it. This is the result:
prova()
# NULL
# NULL
myattr(prova) <- 'ciao' # setting 'ciao' for 'myattr' attribute
prova()
# NULL
# NULL # Why NULL here ?
myattr(prova)
# [1] "ciao"
attr(prova, 'myattr')
# [1] "ciao"
The question is: how to get the attribute of the function from within itself?
Inside the function itself I cannot get its attribute, as demonstrated by the example.
I suppose that the solution will be of the "computing on the language" kind (match.call()[[1L]], substitute, environments and friends). Am I wrong?
I think that the important point here is that this function is in a package (so, it has its environment and namespace) and I need its attribute inside itself, in the package, not outside.
You can use get with the envir argument.
prova <- function() {
print(attr(get("prova", envir=envir.prova), 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
eg:
envir.prova <- environment()
prova()
# NULL
# NULL
myattr(prova) <- 'ciao'
prova()
# [1] "ciao"
# [1] "ciao"
Where envir.prova is a variable whose value you set to the environment in which prova is defined.
Alternatively you can use get(.. envir=parent.frame()), but that is less reliable as then you have to track the calls too, and ensure against another object with the same name between the target environment and the calling environment.
Update regarding question in the comments:
Regarding using parent.frame() versus using an explicit environment name: parent.frame, as the name suggests, goes "up one level." Often, that is exactly where you want to go, so that works fine. And even when your goal is to get an object in an environment further up, R keeps searching upward from there until it finds an object with the matching name. So very often, parent.frame() is just fine.
HOWEVER if there are multiple calls between where you are invoking parent.frame() and where the object is located AND in one of the intermediary environments there exists another object with the same name, then R will stop at that intermediary environment and return its object, which is not the object you were looking for.
Therefore, parent.frame() has an argument n (which defaults to 1), so that you can tell R to begin its search n levels back.
This is the "keeping track" that I refer to, where the developer has to be mindful of the number of calls in between. The straightforward way to go about this is to have an n argument in every function that is calling the function in question, and have that value default to 1. Then for the envir argument, you use: get/assign/eval/etc (.. , envir=parent.frame(n=n) )
Then if you call Func2 from Func1, (both Func1 and Func2 have an n argument), and Func2 is calling prova, you use:
Func1 <- function(x, y, ..., n=1) {
... some stuff ...
Func2( <some, parameters, etc,> n=n+1)
}
Func2 <- function(a, b, c, ..., n=1) {
.... some stuff....
eval(quote(prova()), envir=parent.frame(n=n) )
}
As you can see, it is not complicated, but it is tedious, and sometimes what seems like a bug creeps in, which is simply forgetting to carry the n over.
Therefore, I prefer to use a fixed variable with the environment name.
The solution that I found is:
myattr <- function(x) attr(x, 'myattr')
'myattr<-' <- function(x, value) {
# check that x is a function (e.g. the prova function)
# checks on value (e.g. also value is a function with a given precise signature)
attr(x, 'myattr') <- value
x
}
prova <- function(..., env = parent.frame()) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], env)
# print(eval(as.call(c(myattr, this)), env)) # alternative
print(myattr(this))
# print(attr(this, 'myattr'))
invisible(TRUE)
}
I want to thank @RicardoSaporta for the help and the clarification about keeping track of the calls.
This solution doesn't work when e.g. myattr(prova) <- function() TRUE is nested in func1 while prova is called in func2 (which is called by func1), unless you properly update its env parameter.
For completeness, following the suggestion of @RicardoSaporta, I slightly modified the prova function:
prova <- function(..., pos = 1L) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], parent.frame(n = pos))
print(myattr(this))
# ...
}
This way, it also works when nested, if the correct pos parameter is passed in.
With this modification it is easier to fish out the environment in which you set the attribute on the function prova.
myfun1 <- function() {
myattr(prova) <- function() print(FALSE)
myfun2(n = 2)
}
myfun2 <- function(n) {
prova(pos = n)
}
myfun1()
# function() print(FALSE)
# <environment: 0x22e8208>
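Another option that avoids tracking environments altogether is sys.function(), which returns the function object currently being evaluated. The sketch below is not from the answers above; prova2 is a hypothetical variant, and it relies on sys.function() handing back the called object together with its attributes:

```r
myattr <- function(x) attr(x, "myattr")
`myattr<-` <- function(x, value) {
  attr(x, "myattr") <- value
  x
}

# prova2 is a hypothetical variant of prova used only for this sketch.
prova2 <- function() {
  this <- sys.function()  # the function object that is being called right now
  myattr(this)
}

prova2()
## NULL
myattr(prova2) <- "ciao"
prova2()
## [1] "ciao"
```

Because the attribute is read from whichever copy was actually called, it does not matter in which environment the modified copy lives.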

How do I query for values of symbols in a closure in R?

How can I query the value of x for foo in the R code below?
make.foo <- function() {
x <- 123
function() x * 3
}
foo <- make.foo()
# now get foo's x
A function will have an environment
from ?`function`
A closure has three components, its formals (its argument list), its body (expr in the ‘Usage’ section) and its environment which provides the enclosure of the evaluation frame when the closure is used.
so you can get from that environment (or list the objects using ls)
get('x', envir = environment(foo))
## [1] 123
or if you want to know all the objects in the environment
ls(envir = environment(foo))
## 'x'
and if you want to assign to that environment (ie change x)
assign('x', 24, envir = environment(foo))
foo()
## 72
You can even remove it from the environment
rm(x, envir = environment(foo))
foo()
## Error in foo() : object 'x' not found
and then use a globally assigned x
x <- 3
foo()
# [1] 9
and reassign to the function's environment
assign('x', 123, envir = environment(foo))
foo()
## [1] 369
If you want to look for something in an object's environment and nowhere else then use get with inherits=FALSE. Otherwise you'll risk finding things in the function's parent environment. Example using your make.foo above:
> z=999
> get("x",environment(foo))
[1] 123
> get("z",environment(foo))
[1] 999
> get("x",environment(foo),inherits=FALSE)
[1] 123
> get("z",environment(foo),inherits=FALSE)
Error in get("z", environment(foo), inherits = FALSE) :
object 'z' not found
The second get shows that you might not get an error if you try and get something that isn't in the closure's environment if it appears in the parent environment. This may cause odd bugs. With inherits=FALSE you get an immediate error.
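The two calls can be combined into a small helper that looks only in the closure's own environment and returns a fallback instead of an error. This is a sketch; closure_value and its default argument are hypothetical names, not base R functions:

```r
make.foo <- function() {
  x <- 123
  function() x * 3
}
foo <- make.foo()

# closure_value is a hypothetical helper, not part of base R.
closure_value <- function(fn, name, default = NULL) {
  env <- environment(fn)
  # inherits = FALSE restricts both checks to the closure's own environment
  if (exists(name, envir = env, inherits = FALSE)) {
    get(name, envir = env, inherits = FALSE)
  } else {
    default
  }
}

z <- 999
closure_value(foo, "x")
## [1] 123
closure_value(foo, "z", default = NA)  # the global z is deliberately ignored
## [1] NA
```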

Resources