This question already has answers here:
How do you use "<<-" (scoping assignment) in R?
(7 answers)
Closed 7 years ago.
CASE 1:
rm(list = ls())
foo <- function(x = 6){
set <- function(){
x <- x*x}
set()
x}
foo()
# [1] 6
CASE 2:
rm(list = ls())
foo <- function(x = 6){
set <- function(){
x <<- x*x}
set()
x}
foo()
# [1] 36
I read that <<- operator can be used to assign a value to an object in an environment that is different from the current environment. It says that object initialization using <<- can be done to the objects that is not in the current environment. I want to ask which environment's object can be initialized using <<- . In my case the environment is environment of foo function, can <<-initialize the objects outside the function or the object in the current environment? Totally confused when to use <- and when to use <<-.
The operator <<- is the parent scope assignment operator. It is used to make assignments to variables in the nearest parent scope to the scope in which it is evaluated. These assignments therefore "stick" in the scope outside of function calls. Consider the following code:
fun1 <- function() {
x <- 10
print(x)
}
> x <- 5 # x is defined in the outer (global) scope
> fun1()
[1] 10 # x was assigned to 10 in fun1()
> x
[1] 5 # but the global value of x is unchanged
In the function fun1(), a local variable x is assigned to the value 10, but in the global scope the value of x is not changed. Now consider rewriting the function to use the parent scope assignment operator:
fun2 <- function() {
x <<- 10
print(x)
}
> x <- 5
> fun2()
[1] 10 # x was assigned to 10 in fun2()
> x
[1] 10 # the global value of x changed to 10
Because the function fun2() uses the <<- operator, the assignment of x "sticks" after the function has finished evaluating. What R actually does is to go through all scopes outside fun2() and look for the first scope containing a variable called x. In this case, the only scope outside of fun2() is the global scope, so it makes the assignment there.
As a few have already commented, the <<- operator is frowned upon by many because it can break the encapsulation of your R scripts. If we view an R function as an isolated piece of functionality, then it should not be allowed to interfere with the state of the code which calls it. Abusing the <<- assignment operator runs the risk of doing just this.
The <<- operator can be used to assign a variable to the global environment. It's better to use the assign function than <<-. You probably shouldn't need to use <<- though - outputs needed from functions should be returned as objects in R.
Here's an example
f <- function(x) {
y <<- x * 2 # y outside the function
}
f(5) # y = 10
This is equivalent to
f <- function(x) {
x * 2
}
y <- f(5) # y = 10
With the assign function,
f <- function(x) {
assign('y', x*2 envir=.GlobalEnv)
}
f(5) # y = 10
Related
Take
adder <- local({
x <- 0
function() {x <<- x+1; x}
})
or equivalently
adderGen <- function(){
x <- 0
function() {x <<- x+1; x}
}
adder<-adderGen()
Calling adder() will return 1 calling it again returns 2, and so on. But how does adder keep count of this? I can't see any variables hitting the global environment, so what is actually being used to store these? Particularly in the second case, you'd expect adder to forget that it was made inside of a function call.
Every function retains the environment in which it was defined as part of the function. If f is a function then environment(f) shows it. Normally the execution environment within adderGen would be discarded when it exited but because adderGen passes a function out whose environment is the execution environment within adderGen that environment is retained as part of the function that is passed out. We can verify that by displaying the execution environment within adderGen and then verify that it is the same as the environment of adder. The trace function will insert the print statement at the beginning of the body of adderGen and will show the execution environment each time adderGen runs. environment(adder) is the same environment.
trace(adderGen, quote(print(environment())))
## [1] "adderGen"
adder <- adderGen()
## Tracing adderGen() on entry
## <environment: 0x0000000014e77780>
environment(adder)
## <environment: 0x0000000014e77780>
To see what is happening, let us define the function as follows:
adderGen <- function(){
print("Initialize")
x <- 0
function() {x <<- x+1; x}
}
When we evaluate it, we obtain:
adder <- adderGen()
# [1] "Initialize"
The object that has been assigned to adder is the inside function of adderGen (which is the output of adderGen). Note that, adder does not print "Initialize" any more.
adderGen
# function(){
# print("Initialize")
# x <- 0
# a <- function() {x <<- x+1; x}
# }
adder
# function() {x <<- x+1; x}
# <environment: 0x55cd4ebd3390>
We can see that it also creates a new calling environment, which inherits the variable x in the environment of adderGen.
ls(environment(adder))
# [1] "x"
get("x",environment(adder))
# [1] 0
The first time adder is executed, it uses the inherited value of x, i.e. 0, to redefine x as a global variable (in its calling environment). And this global variable is the one that it is used in the next executions. Since x <-0 is not part of the function adder, when adder is executed, the variable x is not initialized to 0 and it increments by one the current value of x.
adder()
# [1] 1
I'm trying to get a better understanding of closures, in particular details on a function's scope and how to work with its enclosing environment(s)
Based on the Description section of the help page on rlang::fn_env(), I had the understanding, that a function always has access to all variables in its scope and that its enclosing environment belongs to that scope.
But then, why isn't it possible to manipulate the contents of the closure environment "after the fact", i.e. after the function has been created?
By means of R's lexical scoping, shouldn't bar() be able to find x when I put into its enclosing environment?
foo <- function(fun) {
env_closure <- rlang::fn_env(fun)
env_closure$x <- 5
fun()
}
bar <- function(x) x
foo(bar)
#> Error in fun(): argument "x" is missing, with no default
Ah, I think I got it down now.
It has to do with the structure of a function's formal arguments:
If an argument is defined without a default value, R will complain when you call the function without specifiying that even though it might technically be able to look it up in its scope.
One way to kick off lexical scoping even though you don't want to define a default value would be to set the defaults "on the fly" at run time via rlang::fn_fmls().
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fun()
}
# No argument at all -> lexical scoping takes over
baz <- function() x
foo(baz)
#> [1] 5
# Set defaults to desired values on the fly at run time of `foo()`
foo <- function(fun) {
env_enclosing <- rlang::fn_env(fun)
env_enclosing$x <- 5
fmls <- rlang::fn_fmls(fun)
fmls$x <- substitute(get("x", envir = env_enclosing, inherits = FALSE))
rlang::fn_fmls(fun) <- fmls
fun()
}
bar <- function(x) x
foo(bar)
#> [1] 5
I can't really follow your example as I am unfamiliar with the rlang library but I think a good example of a closure in R would be:
bucket <- function() {
n <- 1
foo <- function(x) {
assign("n", n+1, envir = parent.env(environment()))
n
}
foo
}
bar <- bucket()
Because bar() is define in the function environment of bucket then its parent environment is bucket and therefore you can carry some data there. Each time you run it you modify the bucket environment:
bar()
[1] 2
bar()
[1] 3
bar()
[1] 4
I first defined new variable x, then created function that require x within its body (not as argument). See code below
x <- c(1,2,3)
f1 <- function() {
x^2
}
rm(x)
f2 <- function() {
x <- c(1,2,3)
f1()
}
f(2)
Error in f1() : object 'x' not found
When I removed x, and defined new function f2 that first define x and then execute f1, it shows objects x not found.
I just wanted to know why this is not working and how I can overcome this problem. I do not want x to be name as argument in f1.
Please provide appropriate title because I do not know what kind of problem is this.
You could use a closure to make an f1 with the desired properties:
makeF <- function(){
x <- c(1,2,3)
f1 <- function() {
x^2
}
f1
}
f1 <- makeF()
f1() #returns 1 4 9
There is no x in the global scope but f1 still knows about the x in the environment that it was defined in.
In short: Your are expecting dynamic scoping but are a victim of R's lexical scoping:
dynamic scoping = the enclosing environment of a command is determined during run-time
lexical scoping = the enclosing environment of a command is determined at "compile time"
To understand the lookup path of your variable x in the current and parent environments try this code.
It shows that both functions do not share the environment in with x is defined in f2 so it can't never be found:
# list all parent environments of an environment to show the "search path"
parents <- function(env) {
while (TRUE) {
name <- environmentName(env)
txt <- if (nzchar(name)) name else format(env)
cat(txt, "\n")
if (txt == "R_EmptyEnv") break
env <- parent.env(env)
}
}
x <- c(1,2,3)
f1 <- function() {
print("f1:")
parents(environment())
x^2
}
f1() # works
# [1] "f1:"
# <environment: 0x4ebb8b8>
# R_GlobalEnv
# ...
rm(x)
f2 <- function() {
print("f2:")
parents(environment())
x <- c(1,2,3)
f1()
}
f2() # does not find "x"
# [1] "f2:"
# <environment: 0x47b2d18>
# R_GlobalEnv
# ...
# [1] "f1:"
# <environment: 0x4765828>
# R_GlobalEnv
# ...
Possible solutions:
Declare x in the global environment (bad programming style due to lack of encapsulation)
Use function parameters (this is what functions are made for)
Use a closure if x has always the same value for each call of f1 (not for beginners). See the other answer from #JohnColeman...
I strongly propose using 2. (add x as parameter - why do you want to avoid this?).
I have a question about function environments in the R language.
I know that everytime a function is called in R, a new environment E
is created in which the function body is executed. The parent link of
E points to the environment in which the function was created.
My question: Is it possible to specify the environment E somehow, i.e., can one
provide a certain environment in which function execution should happen?
A function has an environment that can be changed from outside the function, but not inside the function itself. The environment is a property of the function and can be retrieved/set with environment(). A function has at most one environment, but you can make copies of that function with different environments.
Let's set up some environments with values for x.
x <- 0
a <- new.env(); a$x <- 5
b <- new.env(); b$x <- 10
and a function foo that uses x from the environment
foo <- function(a) {
a + x
}
foo(1)
# [1] 1
Now we can write a helper function that we can use to call a function with any environment.
with_env <- function(f, e=parent.frame()) {
stopifnot(is.function(f))
environment(f) <- e
f
}
This actually returns a new function with a different environment assigned (or it uses the calling environment if unspecified) and we can call that function by just passing parameters. Observe
with_env(foo, a)(1)
# [1] 6
with_env(foo, b)(1)
# [1] 11
foo(1)
# [1] 1
Here's another approach to the problem, taken directly from http://adv-r.had.co.nz/Functional-programming.html
Consider the code
new_counter <- function() {
i <- 0
function() {
i <<- i + 1
i
}
}
(Updated to improve accuracy)
The outer function creates an environment, which is saved as a variable. Calling this variable (a function) effectively calls the inner function, which updates the environment associated with the outer function. (I don't want to directly copy Wickham's entire section on this, but I strongly recommend that anyone interested read the section entitled "Mutable state". I suspect you could get fancier than this. For example, here's a modification with a reset option:
new_counter <- function() {
i <- 0
function(reset = FALSE) {
if(reset) i <<- 0
i <<- i + 1
i
}
}
counter_one <- new_counter()
counter_one()
counter_one()
counter_two <- new_counter()
counter_two()
counter_two()
counter_one(reset = TRUE)
I am not sure I completely track the goal of the question. But one can set the environment that a function executes in, modify the objects in that environment and then reference them from the global environment. Here is an illustrative example, but again I do not know if this answers the questioners question:
e <- new.env()
e$a <- TRUE
testFun <- function(){
print(a)
}
testFun()
Results in: Error in print(a) : object 'a' not found
testFun2 <- function(){
e$a <- !(a)
print(a)
}
environment(testFun2) <- e
testFun2()
Returns: FALSE
e$a
Returns: FALSE
Say I have a matlab function:
function y = myfunc(x)
persistent a
a = x*10
...
What is the equivalent statement in R for the persistent a statement? <<- or assign()?
Here's one way:
f <- local({ x<-NULL; function(y) {
if (is.null(x)) { # or perhaps !missing(y)
x <<- y+1
}
x
}})
f(3) # First time, x gets assigned
#[1] 4
f() # Second time, old value is used
#[1] 4
What happens is that local creates a new environment around the x<-NULL and the function declaration. So inside the function, it can get to the x variable and assign to it using <<-.
You can find the environment for a function like this:
e <- environment(f)
ls(e) # "x"