R: Could I make the execution environment of a function permanent? - r

I'm studying R environments but I have got a question reading the "Advanced R" book by Hadley Wickham. Is it possible to make the execution environment of a function permanent?
I will try to explain the why of my question.
When Wickham explains how the execution environment of a function works the following example is shown:
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
}
j()
I have understood why every time the function j gets called the returned value is 1.
In another part of the text he says:
When you create a function inside another function, the enclosing
environment of the child function is the execution environment of the
parent, and the execution environment is no longer ephemeral.
So I have thought to create the following function:
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
g <- function() {}
g()
}
j()
But the returned value is always 1, so I suppose it's because the execution environment continues to be destroyed every time. What does that "no longer ephemeral" mean?

Based on the book there is also possible to use function factory structure (a function that creates another function) to capture the ephemeral execution environment of the first function. The following example is just a simple way of how we could capture it:
library(rlang)
j <- function() {
print(current_env())
a <- 1
k <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
}
}
plus <- j()
> plus <- j()
<environment: 0x000001ca98bc1598>
Now no matter how many times you use the function plus, its environment will always be the then execution environment of the first function:
library(rlang)
env_print(plus)
<environment: 000001CA98BC1598>
parent: <environment: global>
bindings:
* k: <fn>
* a: <dbl>
plus()
[1] 2
env_print(plus)
<environment: 000001CA98BC1598>
parent: <environment: global>
bindings:
* k: <fn>
* a: <dbl>
I hope this to some extent answered your question, however there might be better answers out there too.

A permanent environment within a function is called a "closure". Here a toy example to demonstrate this. Check it out and then modify your code accordingly.
closure <- function(j) {
i <- 1
function(x) {
i <<- i + 1
j * i + x
}
}
i <- 12345
instance <- closure(11)
instance(3)
#[1] 25
instance(3)
#[1] 36
instance(3)
#[1] 47
otherObj <- closure(2)
otherObj(3)
#[1] 7
instance(2)
#[1] 57

Related

Accessing variables in closure in R

In the following example, why do f$i and f$get_i() return different results?
factory <- function() {
my_list <- list()
my_list$i <- 1
my_list$increment <- function() {
my_list$i <<- my_list$i + 1
}
my_list$get_i <- function() {
my_list$i
}
my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
f$i # returns 1
The way you code is very similar to the functional paradigm. R is more often used as a script language. So unless you exactly know what you are doing, it is bad practice to use <<- or to include functions in a functions.
You can find the explanation here at the function environment chapter.
Environment is a space/frame where your code is executed. Environment can be nested, in the same way functions are.
When creating a function, you have an enclosure environment attached which can be called by environment. This is the enclosing environment.
The function is executed in another environment, the execution environment with the fresh start principle. The execution environment is a children environment of the enclosing environment.
For exemple, on my laptop:
> environment()
<environment: R_GlobalEnv>
> environment(f$increment)
<environment: 0x0000000022365d58>
> environment(f$get_i)
<environment: 0x0000000022365d58>
f is an object located in the global environment.
The function increment has the enclosing environment 0x0000000022365d58 attached, the execution environment of the function factory.
I quote from Hadley:
When you create a function inside another function, the enclosing
environment of the child function is the execution environment of the
parent, and the execution environment is no longer ephemeral.
When the function f is executed, the enclosing environments are created with the my_list object in it.
That can be assessed with the ls command:
> ls(envir = environment(f$increment))
[1] "my_list"
> ls(envir = environment(f$get_i))
[1] "my_list"
The <<- operator is searching in the parents environments for the variables used. In that case, the my_list object found is the one in the immediate upper environment which is the enclosing environment of the function.
So when an increment is made, it is made only in that environment and not in the global.
You can see it by replacing the increment function by that:
my_list$increment <- function() {
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i <<- my_list$i + 1
}
It give me:
> f$increment()
[1] "environment"
<environment: 0x0000000013c18538>
[1] "Parent environment"
<environment: 0x0000000022365d58>
You can use get to access to your result once you have stored the environment name:
> my_main_env <- environment(f$increment)
> get("my_list", env = my_main_env)
$i
[1] 2
$increment
function ()
{
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i <<- my_list$i + 1
}
<environment: 0x0000000022365d58>
$get_i
function ()
{
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i
}
<environment: 0x0000000022365d58>
f <- factory()
creates my_list object with my_list$i = 1 and assigns it to f. So now f$i = 1.
f$increment()
increments my_list$i only. It does not affect f.
Now
f$get_i()
returns (previously incremented) my_list$i while
f$i
returns unaffected f$i
It' because you used <<- operator that operates on global objects. If you change your code to
my_list$increment <- function(inverse) {
my_list$i <- my_list$i + 1
}
my_list will be incremented only inside increment function. So now you get
> f$get_i()
[1] 1
> f$i
[1] 1
Let me add a one more line to your code, so we could see increment's intestines:
my_list$increment <- function(inverse) {
my_list$i <- my_list$i + 1
return(my_list$i)
}
Now, you can see that <- operates only inside increment while <<- operated outside of it.
> f <- factory()
> f$increment()
[1] 2
> f$get_i()
[1] 1
> f$i
[1] 1
Based on comments from #Cath on "value by reference", I was inspired to come up with this.
library(data.table)
factory <- function() {
my_list <- list()
my_list$i <- data.table(1)
my_list$increment <- function(inverse) {
my_list$i[ j = V1:=V1+1]
}
my_list$get_i <- function() {
my_list$i
}
my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
V1
1: 2
f$i # returns 1
V1
1: 2
f$increment()
f$get_i() # returns 2
V1
1: 3
f$i # returns 1
V1
1: 3

Understanding R function lazy evaluation

I'm having a little trouble understanding why, in R, the two functions below, functionGen1 and functionGen2 behave differently. Both functions attempt to return another function which simply prints the number passed as an argument to the function generator.
In the first instance the generated functions fail as a is no longer present in the global environment, but I don't understand why it needs to be. I would've thought it was passed as an argument, and is replaced with aNumber in the namespace of the generator function, and the printing function.
My question is: Why do the functions in the list list.of.functions1 no longer work when a is not defined in the global environment? (And why does this work for the case of list.of.functions2 and even list.of.functions1b)?
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
functionGen2 <- function(aNumber) {
thisNumber <- aNumber
printNumber <- function() {
print(thisNumber)
}
return(printNumber)
}
list.of.functions1 <- list.of.functions2 <- list()
for (a in 1:2) {
list.of.functions1[[a]] <- functionGen1(a)
list.of.functions2[[a]] <- functionGen2(a)
}
rm(a)
# Throws an error "Error in print(aNumber) : object 'a' not found"
list.of.functions1[[1]]()
# Prints 1
list.of.functions2[[1]]()
# Prints 2
list.of.functions2[[2]]()
# However this produces a list of functions which work
list.of.functions1b <- lapply(c(1:2), functionGen1)
A more minimal example:
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#Error in print(aNumber) : object 'a' not found
Your question is not about namespaces (that's a concept related to packages), but about variable scoping and lazy evaluation.
Lazy evaluation means that function arguments are only evaluated when they are needed. Until you call myfun it is not necessary to evaluate aNumber = a. But since a has been removed then, this evaluation fails.
The usual solution is to force evaluation explicitly as you do with your functionGen2 or, e.g.,
functionGen1 <- function(aNumber) {
force(aNumber)
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#[1] 1

R programming functions

I'm learning R programming. I'm unable to understand how function within function works in R. Example:
f <- function(y) {
function() { y }
}
f()
f(2)()
I'm not able to understand why $f() is not working and showing following message:
function() { y }
<environment: 0x0000000015e8d470>
but when I use $f(4)() then it is showing answer as 4.
Please explain your answer in brief so that I can understand it easily.
For a more general case, let's change the "inner" function a little bit:
f <- function(y) {
function(x) { y + x}
}
Now f(2) returns a function that adds 2 to its argument. The value 2 is kept in the function's environment:
> f(2)
function(x) { y + x}
<environment: 0x0000000015a690f0>
> environment(f(2))$y
[1] 2
.. and you can change it if you really want to but for this you need to assign a name to the output of f():
> g <- f(2)
> environment(g)$y
[1] 2
> environment(g)$y <- 3
> g(1)
[1] 4
Why do you need a g here? Because otherwise, the environment created with f(2) is garbage collected immediately, so there's no way to access it:
> environment(f(2))$y<-4
Error in environment(f(2))$y <- 4 :
target of assignment expands to non-language object
This won't affect the case when you use, say, f(2) only once:
> f(2)(3)
[1] 5
The reason is that the inner function behave as the answer of the outer one son by just doing f(), the result is a function. The best way is to call it double f(3)() as the inner function doesn't take any argument.

R specify function environment

I have a question about function environments in the R language.
I know that everytime a function is called in R, a new environment E
is created in which the function body is executed. The parent link of
E points to the environment in which the function was created.
My question: Is it possible to specify the environment E somehow, i.e., can one
provide a certain environment in which function execution should happen?
A function has an environment that can be changed from outside the function, but not inside the function itself. The environment is a property of the function and can be retrieved/set with environment(). A function has at most one environment, but you can make copies of that function with different environments.
Let's set up some environments with values for x.
x <- 0
a <- new.env(); a$x <- 5
b <- new.env(); b$x <- 10
and a function foo that uses x from the environment
foo <- function(a) {
a + x
}
foo(1)
# [1] 1
Now we can write a helper function that we can use to call a function with any environment.
with_env <- function(f, e=parent.frame()) {
stopifnot(is.function(f))
environment(f) <- e
f
}
This actually returns a new function with a different environment assigned (or it uses the calling environment if unspecified) and we can call that function by just passing parameters. Observe
with_env(foo, a)(1)
# [1] 6
with_env(foo, b)(1)
# [1] 11
foo(1)
# [1] 1
Here's another approach to the problem, taken directly from http://adv-r.had.co.nz/Functional-programming.html
Consider the code
new_counter <- function() {
i <- 0
function() {
i <<- i + 1
i
}
}
(Updated to improve accuracy)
The outer function creates an environment, which is saved as a variable. Calling this variable (a function) effectively calls the inner function, which updates the environment associated with the outer function. (I don't want to directly copy Wickham's entire section on this, but I strongly recommend that anyone interested read the section entitled "Mutable state". I suspect you could get fancier than this. For example, here's a modification with a reset option:
new_counter <- function() {
i <- 0
function(reset = FALSE) {
if(reset) i <<- 0
i <<- i + 1
i
}
}
counter_one <- new_counter()
counter_one()
counter_one()
counter_two <- new_counter()
counter_two()
counter_two()
counter_one(reset = TRUE)
I am not sure I completely track the goal of the question. But one can set the environment that a function executes in, modify the objects in that environment and then reference them from the global environment. Here is an illustrative example, but again I do not know if this answers the questioners question:
e <- new.env()
e$a <- TRUE
testFun <- function(){
print(a)
}
testFun()
Results in: Error in print(a) : object 'a' not found
testFun2 <- function(){
e$a <- !(a)
print(a)
}
environment(testFun2) <- e
testFun2()
Returns: FALSE
e$a
Returns: FALSE

add on.exit expr to parent call?

Is it possible to add an on.exit expr to the parent call? If so, how?
For example, say that parentOnExit(expr) is a function implementing this. Then for the following code:
f <- function() {
parentOnExit(print("B"))
print("A")
}
I want to see "A" printed, then "B".
Background: What brought this to mind was the following... we have a collection of functions, some of which call others, which require a resource that should be shared from the topmost call down and which also should be closed upon exiting the topmost function. Eg, a connection to a remote server which is expensive to open. One pattern for this is:
foo <- function(r=NULL) {
if (is.null(r)) { # If we weren't passed open connection, open one
r <- openR()
on.exit(close(r))
}
bar(r=r) # Pass the open connection down
}
I was hoping to abstract those three lines down to:
r <- openIfNull(r) # Magically call on.exit(close(r)) in scope of caller
Now that I think about it though, perhaps it's worth some repeated code to avoid anything too magical. But still I'm curious about the answer to my original question. Thank you!
I have seen in this recent mail discussion (https://stat.ethz.ch/pipermail/r-devel/2013-November/067874.html) that you can use do.call for this:
f <- function() { do.call("on.exit", list(quote(cat('ONEXIT!\n'))), envir = parent.frame()); 42 }
g <- function() { x <- f(); cat('Not yet!\n'); x }
g()
#Not yet!
#ONEXIT!
#[1] 42
Using this feature and an additional ugly trick to pass the R connection object to the caller environment, it seems to solve the problem:
openR <- function(id = "connection1") {
message('openR():', id)
list(id)
}
closeR <- function(r) {
message('closeR():', r[[1]])
}
openRIfNull <- function(r) {
if (length(r)) return(r)
# create the connection
r <- openR("openRIfNull")
# save it in the parent call environment
parent_env <- parent.frame()
assign("..openRIfNull_r_connection..", r, envir = parent_env)
do.call("on.exit", list(quote(closeR(..openRIfNull_r_connection..))), envir = parent_env)
r
}
foo <- function(r = NULL) {
message('entered foo()')
r <- openRIfNull(r)
bar(r = r) # Pass the open connection down
message('exited foo()')
}
bar <- function(r) {
message('bar()')
}
example use:
foo()
# entered foo()
# openR():openRIfNull
# bar()
# exited foo()
# closeR():openRIfNull
foo(openR('before'))
# entered foo()
# openR():before
# bar()
# exited foo()
I was intrigued by the problem and tried a couple of ways to solve it. Unfortunately, they didn't work. I'm therefore inclined to believe that it can't be done. ...But someone else might be able to prove me wrong!
Anyway, I though I'd post my failed attempts so that they are recorded. I made them so that they would print "ONEXIT!" after "Not yet!" if they worked...
1 - First, simply try to evaluate the on.exit in the parent environment:
f <- function() { eval(on.exit(cat('ONEXIT!\n')), parent.frame()); 42 }
g <- function() { x<-f(); cat('Not yet!\n'); x }
g() # Nope, doesn't work!
This doesn't work, probably because the on.exit function adds stuff to the current stack frame, not the current environment.
2 - Step up the game and try to return an expression that is evaluated by the caller:
f <- function() { quote( {on.exit(cat('ONEXIT!\n')); 42}) }
g <- function() { x<-eval(f()); cat('Not yet!\n'); x }
g() # Nope, doesn't work!
This doesn't work either, probably because eval has its own stack frame, different from g.
3 - Bring my A-game, and try to rely on lazy evaluation:
h <- function(x) sys.frame(sys.nframe())
f <- function() { h({cat('Registering\n');on.exit(cat("ONEXIT!\n"));42}) }
g <- function() { x<-f()$x; cat('Not yet!\n'); x }
g() # Worse, "ONEXIT!" is never printed...
This one returns an environment to the caller, and when the caller accesses "x" in it, the expression including on.exit is evaluated. ...But it seems on.exit does not register at all in this case.
4 - Hmm. There is one way that might still work: a .Call to some C code that calls on.exit. It might be that calling C won't add another stack frame... This is a bit too complex for me to test now, but maybe some RAPI/RCpp guru could give it a shot?
I remain confused, but if Tommy can't do it, I suspect I won't be able to either. This does the first task and since it seemed so simple I thought I must be missing something:
f <- function() {
on.exit(print("B"))
print("A")
}
Second effort:
txtB <- textConnection("test b")
txt <-textConnection("test A")
f <- function(con) { df <- read.table(con);
if( isOpen(txtB)){ print("B open")
eval( close(txtB), env=.GlobalEnv ) }
return(df) }
txtB #just to make sure it's still open
# description class mode text
# "\"test b\"" "textConnection" "r" "text"
# opened can read can write
# "opened" "yes" "no"
dat <- f(txt); dat
#[1] "B open"
# V1 V2
#1 test A
txtB
#Error in summary.connection(x) : invalid connection
(OK, I edited it to close a connection within the calling environment.)
So what am I missing? (It wasn't clear to me as I tested this that connections actually have environments.)
Though this question is quite old, there is a simple fix for any future visitors:
Use add=TRUE (I don't find the documentation very clear.)
f <- function() {
on.exit(expr = print("B"),
add = TRUE)
print("A")
}
A another solution is using withr::defer() which has more options and better documentation.
The vignette is especially helpful.

Resources