Accessing variables in closure in R

In the following example, why do f$i and f$get_i() return different results?
factory <- function() {
my_list <- list()
my_list$i <- 1
my_list$increment <- function() {
my_list$i <<- my_list$i + 1
}
my_list$get_i <- function() {
my_list$i
}
my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
f$i # returns 1

Your code is written in a style close to the functional paradigm, whereas R is more often used as a scripting language. So unless you know exactly what you are doing, it is bad practice to use <<- or to define functions inside other functions.
You can find the explanation in the function environments chapter of Advanced R.
An environment is a space/frame where your code is executed. Environments can be nested, in the same way functions are.
When a function is created, it has an enclosing environment attached, which you can retrieve with environment().
The function is executed in another environment, the execution environment, which follows the fresh start principle: a new one is created on every call. The execution environment is a child of the enclosing environment.
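As a minimal sketch of the two kinds of environment described above (the function name g is hypothetical and used only for illustration), the following checks that every call gets a fresh execution environment and that its parent is the enclosing environment:

```r
# g is a hypothetical helper: it returns the execution environment of the call
g <- function() environment()

e1 <- g()
e2 <- g()

# Each call creates a new execution environment ("fresh start")
identical(e1, e2)                          # FALSE

# The parent of the execution environment is the enclosing environment of g
identical(parent.env(e1), environment(g))  # TRUE
```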
For example, on my laptop:
> environment()
<environment: R_GlobalEnv>
> environment(f$increment)
<environment: 0x0000000022365d58>
> environment(f$get_i)
<environment: 0x0000000022365d58>
f is an object located in the global environment.
The function increment has the enclosing environment 0x0000000022365d58 attached, the execution environment of the function factory.
I quote from Hadley:
When you create a function inside another function, the enclosing
environment of the child function is the execution environment of the
parent, and the execution environment is no longer ephemeral.
When the function factory is executed, its execution environment is created with the my_list object in it, and that environment becomes the enclosing environment of increment and get_i.
That can be checked with the ls command:
> ls(envir = environment(f$increment))
[1] "my_list"
> ls(envir = environment(f$get_i))
[1] "my_list"
The <<- operator searches the parent environments for the variable being assigned. In this case, the my_list object it finds is the one in the immediately enclosing environment of the function.
So when an increment is made, it is made only in that environment. The list f in the global environment is a separate copy returned by factory(), so f$i is unchanged.
You can see this by replacing the increment function with:
my_list$increment <- function() {
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i <<- my_list$i + 1
}
It gives me:
> f$increment()
[1] "environment"
<environment: 0x0000000013c18538>
[1] "Parent environment"
<environment: 0x0000000022365d58>
You can use get to access your result once you have stored the environment:
> my_main_env <- environment(f$increment)
> get("my_list", envir = my_main_env)
$i
[1] 2
$increment
function ()
{
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i <<- my_list$i + 1
}
<environment: 0x0000000022365d58>
$get_i
function ()
{
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i
}
<environment: 0x0000000022365d58>
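If the goal is for f$i and f$get_i() to agree, one possible approach (a sketch, not the only option) is to return an environment instead of a list, since environments are not copied on assignment. The names factory_env, self and f_env are hypothetical:

```r
# Sketch: store the state in an environment, which has reference semantics
factory_env <- function() {
  self <- new.env()
  self$i <- 1
  self$increment <- function() {
    # self refers to the same environment object everywhere, so no <<- is needed
    self$i <- self$i + 1
    invisible(self$i)
  }
  self$get_i <- function() self$i
  self
}

f_env <- factory_env()
f_env$increment()
f_env$get_i()  # 2
f_env$i        # 2
```

Here f_env and self are the same object, so the increment is visible through both access paths.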

f <- factory()
creates my_list object with my_list$i = 1 and assigns it to f. So now f$i = 1.
f$increment()
increments my_list$i only. It does not affect f.
Now
f$get_i()
returns (previously incremented) my_list$i while
f$i
returns unaffected f$i
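It's because you used the <<- operator, which assigns to a variable found in the enclosing environments (here the my_list created inside factory) rather than creating a local one. If you change your code to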
my_list$increment <- function(inverse) {
my_list$i <- my_list$i + 1
}
my_list will be incremented only inside increment function. So now you get
> f$get_i()
[1] 1
> f$i
[1] 1
Let me add one more line to your code so we can see inside increment:
my_list$increment <- function(inverse) {
my_list$i <- my_list$i + 1
return(my_list$i)
}
Now, you can see that <- operates only inside increment while <<- operated outside of it.
> f <- factory()
> f$increment()
[1] 2
> f$get_i()
[1] 1
> f$i
[1] 1

Based on comments from @Cath on "value by reference", I was inspired to come up with this.
library(data.table)
factory <- function() {
my_list <- list()
my_list$i <- data.table(1)
my_list$increment <- function(inverse) {
my_list$i[ j = V1:=V1+1]
}
my_list$get_i <- function() {
my_list$i
}
my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
V1
1: 2
f$i # also 2: := updated the shared data.table by reference
V1
1: 2
f$increment()
f$get_i() # returns 3
V1
1: 3
f$i # also 3
V1
1: 3

Related

R: Could I make the execution environment of a function permanent?

I'm studying R environments but I have got a question reading the "Advanced R" book by Hadley Wickham. Is it possible to make the execution environment of a function permanent?
I will try to explain the why of my question.
When Wickham explains how the execution environment of a function works the following example is shown:
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
}
j()
I have understood why every time the function j gets called the returned value is 1.
In another part of the text he says:
When you create a function inside another function, the enclosing
environment of the child function is the execution environment of the
parent, and the execution environment is no longer ephemeral.
So I have thought to create the following function:
j <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
g <- function() {}
g()
}
j()
But the returned value is always 1, so I suppose it's because the execution environment continues to be destroyed every time. What does that "no longer ephemeral" mean?
Based on the book, it is also possible to use a function factory (a function that creates another function) to capture the otherwise ephemeral execution environment of the first function. The following example is just a simple way to capture it:
library(rlang)
j <- function() {
print(current_env())
a <- 1
k <- function() {
if (!exists("a")) {
a <- 1
} else {
a <- a + 1
}
print(a)
}
}
> plus <- j()
<environment: 0x000001ca98bc1598>
Now, no matter how many times you call the function plus, its enclosing environment will always be that execution environment of the first function:
library(rlang)
env_print(plus)
<environment: 000001CA98BC1598>
parent: <environment: global>
bindings:
* k: <fn>
* a: <dbl>
plus()
[1] 2
env_print(plus)
<environment: 000001CA98BC1598>
parent: <environment: global>
bindings:
* k: <fn>
* a: <dbl>
I hope this to some extent answered your question, however there might be better answers out there too.
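Note that in the example above plus() prints 2 on every call, because a <- a + 1 creates a new local a in each execution environment and leaves the captured a untouched. As a sketch of a variant where the captured value does change between calls (make_counter and plus2 are hypothetical names), <<- can be used to update the variable in the retained environment:

```r
# Sketch: the inner function updates `a` in the captured execution environment
make_counter <- function() {
  a <- 1
  function() {
    a <<- a + 1  # modifies the `a` that lives in make_counter's environment
    a
  }
}

plus2 <- make_counter()
plus2()  # 2
plus2()  # 3

# The retained environment now holds the updated value
environment(plus2)$a  # 3
```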
A function that keeps access to the environment in which it was created is called a "closure". Here is a toy example to demonstrate this. Check it out and then modify your code accordingly.
closure <- function(j) {
i <- 1
function(x) {
i <<- i + 1
j * i + x
}
}
i <- 12345
instance <- closure(11)
instance(3)
#[1] 25
instance(3)
#[1] 36
instance(3)
#[1] 47
otherObj <- closure(2)
otherObj(3)
#[1] 7
instance(2)
#[1] 57

How can I source a file into a list in the global environment

I have the following requirement: I have a list of variables and functions defined in a config.R file:
# config.R
x <- 1
foo <- function(y) {
2
}
z <- x + 1
I want the above to be "sourced" into a list defined in .GlobalEnv.
I have a way to do this by creating a local environment:
source_in_list <- function(path) {
e <- new.env()
source(path, local = e)
return(as.list(e))
}
p <- source_in_list("config.R")
p
$x
[1] 1
$z
[1] 2
$foo
function(y) {
2
}
<environment: 0x2f99d90>
My problem is that foo is linked to <environment: 0x2f99d90>, which means that if I were to redefine foo in .GlobalEnv, p$foo would be unaffected, and this is not what I want.
Essentially, I would like it to behave as if I were:
creating p in .GlobalEnv
executing every line within p
so the result would be like:
p
$x
[1] 1
$z
[1] 2
$foo
function(y) {
2
}
How can I do this?
EDIT:
I realized that what I wanted was to define the functions from the source file in globalenv() and the rest in a list:
source_in_list <- function(path) {
e <- new.env()
source(path, local = e)
# types
is_fun <- sapply(e, FUN = function(x) inherits(x, "function"))
# define functions from e into globalenv
if(any(is_fun)) {
for(fun_name in names(which(is_fun))) {
# assign in globalenv
assign(x = fun_name, value = eval(parse(text = deparse(get(fun_name, envir = e))), envir = globalenv()), envir = globalenv())
# remove from local env
rm(list = fun_name, envir = e)
}
}
return(as.list(e))
}
p1 <- source_in_list("config.R")
p1
$x
[1] 1
$z
[1] 2
foo
function (y)
{
2
}
>
I think you have a misconception: if foo is stored in your list p, then redefining foo in .GlobalEnv won't have any effect on p$foo. Those will be separate objects.
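A small sketch of this point, independent of source() and of the environment attached to the function (the function bodies here are placeholders chosen for illustration):

```r
# Placeholder function, stored both globally and inside a list
foo <- function() "original"
p <- list(foo = foo)

# Redefining the global foo rebinds the name; the list keeps its own copy
foo <- function() "redefined"

p$foo()  # "original"
foo()    # "redefined"
```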
The purpose of the environment associated with a function is to tell R where to look for non-local variables used in the function. Your original version will end up with two copies of everything you sourced, one in the list and one in the local environment you created. If foo referred to x, it would see the one in the local environment. For example, look at this change to your code where foo() returns x:
# config.R
x <- 1
foo <- function() {
x
}
z <- x + 1
and then
source_in_list <- function(path) {
e <- new.env()
source(path, local = e)
return(as.list(e))
}
p <- source_in_list("config.R")
x <- 42 # Set a global variable x
p$foo()
# [1] 1 # It is ignored
p$x <- 123 # Set x in p
p$foo()
# [1] 1 # It is ignored
You probably don't want two copies of everything. But then it's not clear that what you want to do is possible. A list can't act as the environment of a function, so there's no way to make p$x be the target of references from within foo.
I'd suggest that instead of returning a list, you just return the local environment you created. Then things will work as you'd expect:
source_to_local <- function(path) {
e <- new.env()
source(path, local = e)
return(e)
}
e <- source_to_local("config.R")
x <- 42 # set a global
e$foo()
# [1] 1 # it is ignored
e$x <- 123 # set x in e
e$foo()
# [1] 123 # it responds
The main disadvantage of returning the environment is that environments don't print the way lists do, but you could probably write a function to print an environment and make everything in it visible.
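One way such a printing function might look is sketched below. The helper name print_env is hypothetical, and the environment is built by hand here instead of being sourced from config.R:

```r
# Hypothetical helper: print an environment's contents the way a list prints
print_env <- function(e) {
  contents <- mget(ls(envir = e), envir = e)  # named list of all visible objects
  print(contents)
  invisible(contents)
}

# Stand-in for the environment returned by source_to_local()
e <- new.env()
e$x <- 1
e$z <- e$x + 1

shown <- print_env(e)
```

Because ls() returns names in sorted order, the printed list is ordered alphabetically by name.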

How does an environment remember that it exists?

Take
adder <- local({
x <- 0
function() {x <<- x+1; x}
})
or equivalently
adderGen <- function(){
x <- 0
function() {x <<- x+1; x}
}
adder<-adderGen()
Calling adder() will return 1, calling it again returns 2, and so on. But how does adder keep count of this? I can't see any variables hitting the global environment, so what is actually being used to store these? Particularly in the second case, you'd expect adder to forget that it was made inside of a function call.
Every function retains the environment in which it was defined as part of the function. If f is a function then environment(f) shows it. Normally the execution environment within adderGen would be discarded when it exited but because adderGen passes a function out whose environment is the execution environment within adderGen that environment is retained as part of the function that is passed out. We can verify that by displaying the execution environment within adderGen and then verify that it is the same as the environment of adder. The trace function will insert the print statement at the beginning of the body of adderGen and will show the execution environment each time adderGen runs. environment(adder) is the same environment.
trace(adderGen, quote(print(environment())))
## [1] "adderGen"
adder <- adderGen()
## Tracing adderGen() on entry
## <environment: 0x0000000014e77780>
environment(adder)
## <environment: 0x0000000014e77780>
To see what is happening, let us define the function as follows:
adderGen <- function(){
print("Initialize")
x <- 0
function() {x <<- x+1; x}
}
When we evaluate it, we obtain:
adder <- adderGen()
# [1] "Initialize"
The object that has been assigned to adder is the inside function of adderGen (which is the output of adderGen). Note that, adder does not print "Initialize" any more.
adderGen
# function(){
# print("Initialize")
# x <- 0
# function() {x <<- x+1; x}
# }
adder
# function() {x <<- x+1; x}
# <environment: 0x55cd4ebd3390>
We can see that adder also carries an enclosing environment, the execution environment of that adderGen() call, which contains the variable x.
ls(environment(adder))
# [1] "x"
get("x",environment(adder))
# [1] 0
The first time adder is executed, it finds x, with value 0, in its enclosing environment, and <<- updates that x in place; no global variable is created. This stored x is the one used in subsequent executions. Since x <- 0 is not part of the function adder, x is not reset to 0 when adder is executed; each call increments the current value of x by one.
adder()
# [1] 1
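To check where the count is stored, one can inspect the enclosing environment of adder after a few calls. This is a sketch that redefines adderGen without the print statement; adder2 is a hypothetical second instance:

```r
adderGen <- function() {
  x <- 0
  function() { x <<- x + 1; x }
}

adder <- adderGen()
adder()
adder()

# The counter lives in the environment attached to adder
get("x", envir = environment(adder))  # 2

# A second generator call creates a separate environment with its own x
adder2 <- adderGen()
adder2()                              # 1
get("x", envir = environment(adder))  # still 2
```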

Why is this simple function not working?

I first defined a new variable x, then created a function that requires x within its body (not as an argument). See the code below:
x <- c(1,2,3)
f1 <- function() {
x^2
}
rm(x)
f2 <- function() {
x <- c(1,2,3)
f1()
}
f2()
Error in f1() : object 'x' not found
When I removed x and defined a new function f2 that first defines x and then executes f1, it reports that object x is not found.
I just wanted to know why this is not working and how I can overcome this problem. I do not want x to be name as argument in f1.
Please provide appropriate title because I do not know what kind of problem is this.
You could use a closure to make an f1 with the desired properties:
makeF <- function(){
x <- c(1,2,3)
f1 <- function() {
x^2
}
f1
}
f1 <- makeF()
f1() #returns 1 4 9
There is no x in the global scope but f1 still knows about the x in the environment that it was defined in.
In short: You are expecting dynamic scoping but are a victim of R's lexical scoping:
dynamic scoping = the enclosing environment of a function is determined at run time, by where it is called
lexical scoping = the enclosing environment of a function is determined by where it is defined in the code ("compile time")
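A minimal sketch of lexical scoping in action (show_x and caller are hypothetical names): the function looks up x where it was defined, not where it is called from.

```r
x <- "global"

# show_x is defined at the top level, so its enclosing environment is there
show_x <- function() x

caller <- function() {
  x <- "local to caller"  # not visible to show_x under lexical scoping
  show_x()
}

caller()  # "global"
```

Under dynamic scoping the call would have returned "local to caller" instead.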
To understand the lookup path of your variable x in the current and parent environments try this code.
It shows that the two functions do not share the environment in which x is defined in f2, so it can never be found:
# list all parent environments of an environment to show the "search path"
parents <- function(env) {
while (TRUE) {
name <- environmentName(env)
txt <- if (nzchar(name)) name else format(env)
cat(txt, "\n")
if (txt == "R_EmptyEnv") break
env <- parent.env(env)
}
}
x <- c(1,2,3)
f1 <- function() {
print("f1:")
parents(environment())
x^2
}
f1() # works
# [1] "f1:"
# <environment: 0x4ebb8b8>
# R_GlobalEnv
# ...
rm(x)
f2 <- function() {
print("f2:")
parents(environment())
x <- c(1,2,3)
f1()
}
f2() # does not find "x"
# [1] "f2:"
# <environment: 0x47b2d18>
# R_GlobalEnv
# ...
# [1] "f1:"
# <environment: 0x4765828>
# R_GlobalEnv
# ...
Possible solutions:
1. Declare x in the global environment (bad programming style due to lack of encapsulation)
2. Use function parameters (this is what functions are made for)
3. Use a closure if x always has the same value for each call of f1 (not for beginners). See the other answer from @JohnColeman...
I strongly propose using 2. (add x as a parameter - why do you want to avoid this?).
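A sketch of the parameter-based solution, reusing the function names from the question:

```r
# f1 receives x explicitly instead of looking it up in an outer scope
f1 <- function(x) {
  x^2
}

f2 <- function() {
  x <- c(1, 2, 3)
  f1(x)  # pass the local x along
}

f2()  # 1 4 9
```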

Difference between <- and <<- [duplicate]

CASE 1:
rm(list = ls())
foo <- function(x = 6){
set <- function(){
x <- x*x}
set()
x}
foo()
# [1] 6
CASE 2:
rm(list = ls())
foo <- function(x = 6){
set <- function(){
x <<- x*x}
set()
x}
foo()
# [1] 36
I read that the <<- operator can be used to assign a value to an object in an environment that is different from the current environment. It says that <<- can initialize objects that are not in the current environment. I want to ask which environment's objects can be initialized using <<-. In my case the environment is the environment of the foo function; can <<- initialize objects outside the function, or objects in the current environment? I am totally confused about when to use <- and when to use <<-.
The operator <<- is the parent scope assignment operator. It is used to make assignments to a variable in the nearest enclosing scope that already contains a variable of that name; if none does, the variable is created in the global environment. These assignments therefore "stick" in the scope outside of function calls. Consider the following code:
fun1 <- function() {
x <- 10
print(x)
}
> x <- 5 # x is defined in the outer (global) scope
> fun1()
[1] 10 # x was assigned to 10 in fun1()
> x
[1] 5 # but the global value of x is unchanged
In the function fun1(), a local variable x is assigned to the value 10, but in the global scope the value of x is not changed. Now consider rewriting the function to use the parent scope assignment operator:
fun2 <- function() {
x <<- 10
print(x)
}
> x <- 5
> fun2()
[1] 10 # x was assigned to 10 in fun2()
> x
[1] 10 # the global value of x changed to 10
Because the function fun2() uses the <<- operator, the assignment of x "sticks" after the function has finished evaluating. What R actually does is to go through all scopes outside fun2() and look for the first scope containing a variable called x. In this case, the only scope outside of fun2() is the global scope, so it makes the assignment there.
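When there is more than one scope outside the function, the search stops at the first one that contains the variable. A small sketch with a nested function (outer_fn, inner and result are hypothetical names):

```r
x <- 5  # defined at the top level

outer_fn <- function() {
  x <- 1
  inner <- function() {
    x <<- x + 1  # the first x found going outward is outer_fn's, not the top-level one
  }
  inner()
  x
}

result <- outer_fn()
result  # 2
x       # 5, the top-level x is untouched
```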
As a few have already commented, the <<- operator is frowned upon by many because it can break the encapsulation of your R scripts. If we view an R function as an isolated piece of functionality, then it should not be allowed to interfere with the state of the code which calls it. Abusing the <<- assignment operator runs the risk of doing just this.
The <<- operator can be used to assign a variable to the global environment. It's better to use the assign function than <<-. You probably shouldn't need to use <<- though - outputs needed from functions should be returned as objects in R.
Here's an example
f <- function(x) {
y <<- x * 2 # y outside the function
}
f(5) # y = 10
This is equivalent to
f <- function(x) {
x * 2
}
y <- f(5) # y = 10
With the assign function,
f <- function(x) {
assign('y', x*2, envir=.GlobalEnv)
}
f(5) # y = 10

Resources