Into the R console, type:
#First code snippet
x <- 0
x <- x+1
x
You'll get '1'. That makes sense: the idea is that the 'x' in 'x+1' is the current value of x, namely 0, and this is used to compute the value of x+1, namely 1, which is then shoveled into the container x. So far, so good.
Now type:
#Second code snippet
f <- function(n) {n^2}
f <- function(n) {if (n >= 1) {n*f(n-1)} else {1}}
f(5)
You'll get '120', which is 5 factorial.
I find this perplexing. Following the logic of the first code snippet, we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f, namely
function(n) {n^2}
Following this reasoning, the value of f(5) should be 5*(5-1)^2 = 80. But that's not what we get.
Question. What's really going on here? How does R know not to use the old 'f'?
we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f
— Yes, we might expect that. And we would be correct.
But what is the “current value of f”? Or, more precisely, what is “current”?
“Current” is when the function is executed, not when it is defined. That is, by the time you execute f(5), it has already been redefined. So now the execution enters the function, looks up inside the function what f refers to — and also finds the current (= new) definition, not the old one.
In other words: the objects associated with names are looked up when they are actually accessed. And inside a function this means that names are accessed when the function is executed, not when it’s defined.
The same is true for all objects. Let’s say f is using a global object that’s not a function:
n = 5
f = function() n ^ 2
n = 1
f() # = 1
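As a small sketch of the same rule applied to the question's own f: if the old definition is kept under a separate name (sq and g below are names introduced only for this illustration), a function that refers to that name really does return 80, while the self-referencing f finds its own new definition at call time.

```r
# 'sq' and 'g' are illustrative names, not part of the original question.
sq <- function(n) n^2
f <- sq
f <- function(n) if (n >= 1) n * f(n - 1) else 1   # 'f' is looked up when called
g <- function(n) if (n >= 1) n * sq(n - 1) else 1  # 'sq' still names the old function

f(5)  # 120
g(5)  # 80
```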
To understand the difference between your first and second example, consider the following case, which involves functions yet behaves like your first case (i.e. it uses the “old” value of f).
To make the example work, we need a little helper: a function that modifies other functions. In the following, twice is a function which takes a function as an argument and returns a new function. That new function is the same as the old function, only it runs twice when invoked:
twice = function (original_function) {
force(original_function)
function (...) {
original_function(original_function(...))
}
}
To illustrate what twice does, let’s invoke it on an example function:
plus1 = function (n) n + 1
plus2 = twice(plus1)
plus2(3) # = 5
Neat — R allows us to handle functions like any other object!
Now let’s modify your f:
f = function(n) {n^2}
f = twice(f)
f(5) # 625
… and here we have it: in the statement f = twice(f), the second f refers to the current (= old) definition. Only after that line does f refer to the new, modified function.
Here's a simple example illustrating my comment on Konrad's excellent answer:
a <- 2
f <- function() a*b
e <- new.env()
assign("b",5,e)
environment(f) <- e
> f()
[1] 10
b <- 10
> f()
[1] 10
So we've manually altered the environment of f so that it always looks in e first for b. Theoretically, one could even lock that binding (see ?lockBinding) so that any attempt to change it throws an error.
This sort of thing could get complicated, though, as in general you'd want to make sure that you set the parent environment of e correctly based on where the function f is actually being created. In this example f is created in the global environment, but if f were being created inside another function, you'd want e's parent environment to reflect that.
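A minimal sketch of the locking idea, assuming everything is created in the same environment as in the example above. The tryCatch wrapper and the outcome variable are additions for illustration only.

```r
a <- 2
e <- new.env()              # parent defaults to the calling environment, where 'a' lives
assign("b", 5, envir = e)
lockBinding("b", e)         # further assignments to 'b' in e now fail

f <- function() a * b
environment(f) <- e
f()  # 10

# Attempting to overwrite the locked binding raises an error, caught here.
outcome <- tryCatch({
  assign("b", 99, envir = e)
  "changed"
}, error = function(err) "locked")
outcome  # "locked"
```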
I am trying to understand how R scopes variables inside a function. Why is the output 12? Why not 4? How are a and b assigned here?
I am learning R. Please explain with some references
f1 <- function(a = {b <- 10; 2}, b = 2) {
a+b
}
f1()
This is explained in section 4.3.3 of the R Language manual.
When a function is called, each formal argument is assigned a promise
in the local environment of the call with the expression slot
containing the actual argument (if it exists) and the environment slot
containing the environment of the caller. If no actual argument for a
formal argument is given in the call and there is a default
expression, it is similarly assigned to the expression slot of the
formal argument, but with the environment set to the local
environment.
The process of filling the value slot of a promise by evaluating the
contents of the expression slot in the promise’s environment is called
forcing the promise. A promise will only be forced once, the value
slot content being used directly later on.
Nothing has a value until the sum starts being computed. First a is required, so its expression is evaluated. The promise for b is lost because b is assigned a value directly during the forcing of a, so the default expression for b in the function definition is never evaluated.
If the order is the other way round, you see a different result:
f2 <- function(a = 2, b = {a <- 10; 2}) {
a+b
}
f2()
[1] 4
However, note that the value of a will be 10 at end of the function, but 2 when it is required during the sum. Both promises get evaluated here.
If the order of the sum is reversed in f1 to instead be b+a you would find similar behaviour to f2.
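The three variants can be compared side by side. This is only a sketch: f1_rev is a name introduced here for the reversed-sum version of f1.

```r
f1     <- function(a = {b <- 10; 2}, b = 2) a + b  # a forced first; it overwrites b's promise
f1_rev <- function(a = {b <- 10; 2}, b = 2) b + a  # b forced first (2), then a (2)
f2     <- function(a = 2, b = {a <- 10; 2}) a + b  # a forced first (2), then b (2)

f1()      # 12
f1_rev()  # 4
f2()      # 4
```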
Earlier in that section there is a general warning that side effects should be avoided in arguments because there is no guarantee they will be evaluated.
R has a form of lazy evaluation of function arguments. Arguments are
not evaluated until needed. It is important to realize that in some
cases the argument will never be evaluated. Thus, it is bad style to
use arguments to functions to cause side-effects. While in C it is
common to use the form, foo(x = y) to invoke foo with the value of y
and simultaneously to assign the value of y to x this same style
should not be used in R. There is no guarantee that the argument will
ever be evaluated and hence the assignment may not take place.
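A short sketch of what "never evaluated" means in practice. The names first_only and lazy_demo_flag are hypothetical, and the last check assumes no variable called lazy_demo_flag already exists in the session.

```r
first_only <- function(x, y) x   # 'y' is never used, so its promise is never forced

r1 <- first_only(1, stop("never reached"))   # no error is raised
r2 <- first_only(1, lazy_demo_flag <- TRUE)  # the assignment is never carried out

r1                        # 1
r2                        # 1
exists("lazy_demo_flag")  # FALSE
```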
Refer to https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/assignOps
Try this
f1 <- function(a = {b <= 10; 2}, b = 2) {
a+b
}
f1()
or
f1 <- function(a = {b <<- 10; 2}, b = 2) {
a+b
}
f1()
One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what is meant by the calling function. Is it the function that is invoked (as in an interactive session, where you type the name of a function that has been assigned and hit enter)? If so, how does the evaluation frame of the calling function differ from the evaluation frame of the function?
First change to standard terms. The arguments that are used in the function definition are the formal arguments and the arguments that are passed to the function when calling it are the actual arguments. (The quoted passage in the question is referring to the actual arguments when it uses the nonstandard term, supplied arguments.)
Consider two cases via example.
Case 1
Below f has the formal argument x and when f is called in the last line of code there are no actual arguments.
Now when f is called in the last line of code x gets the value 2 because x is not set until it is used and when it is used a is looked up within the function where it has the value 2, not in the caller where it has the value 1.
a <- 1
f <- function(x = a) {
a <- 2
x
}
f()
## [1] 2
Case 2
On the other hand the actual arguments are evaluated in the caller. In the last line of code below x is set to 1 because that is the value of b in the caller. Again, x is not evaluated until it is used but now even though b has been set to 2 in the function itself this has no effect on x. x is set to 1, not 2.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
Other
Although this covers the two cases in the quote note that there exists another case which is the situation that occurs when x is referred to in a function but is not defined in the function. In the code below a is a free variable in g since a is not an argument or otherwise defined in g. In this case when gg (which equals g) is called R attempts to look up a in the function g and fails but the next place it looks is not the caller (where a is 1) but the environment in which the function was defined, i.e. the environment where the word function appears and a is 2 in that environment.
a <- 1
f <- function() {
a <- 2
g <- function() a
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.
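To make the contrast with the caller explicit, here is a small sketch in which the function is called from a place that has its own a. The names make_g and caller are introduced only for this illustration.

```r
a <- 1
make_g <- function() {
  a <- 2
  function() a        # 'a' is free here; it resolves where this function was defined
}
g <- make_g()

caller <- function() {
  a <- 3              # the caller's 'a' is not consulted
  g()
}

caller()                          # 2
get("a", envir = environment(g))  # 2
```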
I've looked at the other lexical scoping questions in R and I can't find the answer. Consider this code:
f <- function(x) {
g <- function(y) {
y + z
}
z <- 4
x + g(x)
}
f(3)
f(3) will return an answer of 10. My question is why? At the point g() is defined in the code, z has not been assigned any value. At what point is the closure for g() created? Does it "look ahead" to the rest of the function body? Is it created when the g(x) is evaluated? If so, why?
When f is run, the first thing that happens is that a function g is created in f's local environment. Next, the variable z is created by assignment.
Finally, x is added to the result of g(x) and returned. At the point that g(x) is called, x = 3 and g exists in f's local environment. When the free variable z is encountered while executing g(x), R looks in the next environment up: g's enclosing environment, i.e. the environment in which g was defined, which is f's local environment. (Here that happens to coincide with the calling environment, but it is the enclosing environment that R searches.) It finds z there and proceeds, returning 7. It then adds this to x, which is 3, giving 10.
(Since this answer is attracting more attention, I should add that my language was a bit loose when talking about what x "equals" at various points that probably do not accurately reflect R's delayed evaluation of arguments. x will be equal to 3 once the value is needed.)
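One way to see that z is looked up when g runs, not when g is defined, is to change z between two calls. This is a sketch; the variables first and second are additions for illustration.

```r
f <- function(x) {
  g <- function(y) y + z
  z <- 4
  first <- g(x)     # z is 4 at this point
  z <- 100
  second <- g(x)    # the same g now sees z = 100
  c(first, second)
}

f(3)  # 7 103
```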
I just finished reading about scoping in the R intro, and am very curious about the <<- assignment.
The manual showed one (very interesting) example for <<-, which I feel I understood. What I am still missing is the context of when this can be useful.
So what I would love to read from you are examples (or links to examples) of when the use of <<- can be interesting/useful. What might be the dangers of using it (it looks easy to lose track of), and any tips you might feel like sharing?
<<- is most useful in conjunction with closures to maintain state. Here's a section from a recent paper of mine:
A closure is a function written by another function. Closures are
so-called because they enclose the environment of the parent
function, and can access all variables and parameters in that
function. This is useful because it allows us to have two levels of
parameters. One level of parameters (the parent) controls how the
function works. The other level (the child) does the work. The
following example shows how we can use this idea to generate a family of
power functions. The parent function (power) creates child functions
(square and cube) that actually do the hard work.
power <- function(exponent) {
function(x) x ^ exponent
}
square <- power(2)
square(2) # -> [1] 4
square(4) # -> [1] 16
cube <- power(3)
cube(2) # -> [1] 8
cube(4) # -> [1] 64
The ability to manage variables at two levels also makes it possible to maintain the state across function invocations by allowing a function to modify variables in the environment of its parent. The key to managing variables at different levels is the double arrow assignment operator <<-. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
This makes it possible to maintain a counter that records how many times a function has been called, as the following example shows. Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
new_counter <- function() {
i <- 0
function() {
# do something useful, then ...
i <<- i + 1
i
}
}
The new function is a closure, and its environment is the enclosing environment. When the closures counter_one and counter_two are run, each one modifies the counter in its enclosing environment and then returns the current count.
counter_one <- new_counter()
counter_two <- new_counter()
counter_one() # -> [1] 1
counter_one() # -> [1] 2
counter_two() # -> [1] 1
It helps to think of <<- as equivalent to assign (if you set the inherits parameter in that function to TRUE). The benefit of assign is that it allows you to specify more parameters (e.g. the environment), so I prefer to use assign over <<- in most cases.
Using <<- and assign(x, value, inherits=TRUE) means that "enclosing environments of the supplied environment are searched until the variable 'x' is encountered." In other words, it will keep going through the environments in order until it finds a variable with that name, and it will assign it to that. This can be within the scope of a function, or in the global environment.
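A rough side-by-side sketch of the two approaches follows. One detail worth noting: <<- begins its search in the enclosing environment, whereas assign starts in whatever environment it is given (the current one by default), so the assign version below names the enclosing environment explicitly. The function names are hypothetical.

```r
make_acc <- function() {
  total <- 0
  function(x) {
    total <<- total + x   # search starts in the enclosing environment
    total
  }
}

make_acc_assign <- function() {
  total <- 0
  function(x) {
    # Same effect with assign(): the target environment is stated explicitly.
    assign("total", total + x, envir = parent.env(environment()))
    total
  }
}

acc1 <- make_acc()
acc2 <- make_acc_assign()
a1 <- c(acc1(1), acc1(2))
a2 <- c(acc2(1), acc2(2))
a1  # 1 3
a2  # 1 3
```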
In order to understand what these functions do, you need to also understand R environments (e.g. using search).
I regularly use these functions when I'm running a large simulation and I want to save intermediate results. This allows you to create the object outside the scope of the given function or apply loop. That's very helpful, especially if you have any concern about a large loop ending unexpectedly (e.g. a database disconnection), in which case you could lose everything in the process. This would be equivalent to writing your results out to a database or file during a long running process, except that it's storing the results within the R environment instead.
My primary warning with this: be careful because you're now working with global variables, especially when using <<-. That means that you can end up with situations where a function is using an object value from the environment, when you expected it to be using one that was supplied as a parameter. This is one of the main things that functional programming tries to avoid (see side effects). I avoid this problem by assigning my values to unique variable names (using paste with a set of unique parameters) that are never used within the function, but just used for caching and in case I need to recover later on (or do some meta-analysis on the intermediate results).
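A minimal sketch of the caching pattern described above. As an assumption of this example, the results go into a dedicated environment (results_cache) rather than the global environment, and run_step stands in for one iteration of a long simulation.

```r
results_cache <- new.env()   # hypothetical container for intermediate results

run_step <- function(i) {
  value <- i^2               # stand-in for an expensive computation
  # Store under a unique name built with paste0(), outside the function's scope.
  assign(paste0("step_", i), value, envir = results_cache)
  value
}

invisible(lapply(1:3, run_step))

sort(ls(results_cache))               # "step_1" "step_2" "step_3"
get("step_2", envir = results_cache)  # 4
```

If the loop stops early, whatever was stored before the failure is still available in results_cache.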
One place where I used <<- was in simple GUIs using tcl/tk. Some of the initial examples have it -- as you need to make a distinction between local and global variables for statefulness. See for example
library(tcltk)
demo(tkdensity)
which uses <<-. Otherwise I concur with Marek :) -- a Google search can help.
On this subject I'd like to point out that the <<- operator will behave strangely when applied (incorrectly) within a for loop (there may be other cases too). Given the following code:
fortest <- function() {
mySum <- 0
for (i in c(1, 2, 3)) {
mySum <<- mySum + i
}
mySum
}
you might expect that the function would return the expected sum, 6, but instead it returns 0, with a global variable mySum being created and assigned the value 3. I can't fully explain what is going on here but certainly the body of a for loop is not a new scope 'level'. Instead, it seems that R looks outside of the fortest function, can't find a mySum variable to assign to, so creates one and assigns the value 1, the first time through the loop. On subsequent iterations, the RHS in the assignment must be referring to the (unchanged) inner mySum variable whereas the LHS refers to the global variable. Therefore each iteration overwrites the value of the global variable to that iteration's value of i, hence it has the value 3 on exit from the function.
Hope this helps someone - this stumped me for a couple of hours today! (BTW, just replace <<- with <- and the function works as expected).
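The behaviour can be reproduced without touching the global environment by wrapping the function in another one. In this sketch, outer and its mySum play the role of the global environment; the wrapper is an addition for illustration only.

```r
outer <- function() {
  mySum <- -1                   # stands in for a "global" mySum
  fortest <- function() {
    mySum <- 0                  # local variable, read on the right-hand side
    for (i in c(1, 2, 3)) {
      mySum <<- mySum + i       # writes to outer's mySum; the local one stays 0
    }
    mySum
  }
  c(returned = fortest(), enclosing = mySum)
}

outer()
# returned enclosing
#        0         3
```

The for loop does not create a new scope, so <<- skips the function's own environment and starts at the enclosing one.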
f <- function(n, x0) {x <- x0; replicate(n, (function(){x <<- x+rnorm(1)})())}
plot(f(1000, 0), type = "l")
The <<- operator can also be useful for Reference Classes when writing Reference Methods. For example:
myRFclass <- setRefClass(Class = "RF",
fields = list(A = "numeric",
B = "numeric",
C = function() A + B))
myRFclass$methods(show = function() cat("A =", A, "B =", B, "C =",C))
myRFclass$methods(changeA = function() A <<- A*B) # note the <<-
obj1 <- myRFclass(A = 2, B = 3)
obj1
# A = 2 B = 3 C = 5
obj1$changeA()
obj1
# A = 6 B = 3 C = 9
I use it in order to change inside map() an object in the global environment.
a = c(1,0,0,1,0,0,0,0)
Say I want to obtain the vector c(1,2,3,1,2,3,4,5); that is, wherever there is a 1, keep it as 1, otherwise add 1 to the previous value until the next 1.
library(purrr)  # provides map()
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] <<- ifelse(a[x]==1, a[x], a[x-1]+1)
})
a
[1] 1 2 3 1 2 3 4 5
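For comparison, the same vector can be computed without modifying a global object. This is an alternative technique (base R cumsum plus ave), not the approach above, and it assumes the vector starts with a 1. The names run_id and counted are introduced for the sketch.

```r
a <- c(1, 0, 0, 1, 0, 0, 0, 0)

run_id  <- cumsum(a == 1)                          # 1 1 1 2 2 2 2 2: one id per run
counted <- ave(seq_along(a), run_id, FUN = seq_along)  # position within each run

counted  # 1 2 3 1 2 3 4 5
```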
Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
'g<-' <- function(x, value){print(x);print(substitute(value))}
(the single quotes should be backticks)
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. Name a should be printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well the question is: bug or feature? What is going on here? I hope some guru users will shed some light about promises and lazy evaluation on R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value) x
`g<-` <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
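The difference between the two call forms can be sketched with a replacement function that never touches value. The name keep<- is hypothetical, and tryCatch is used only to capture the error message.

```r
`keep<-` <- function(x, value) x   # hypothetical replacement function; ignores 'value'
x <- 3

# Direct call: 'value' is an ordinary lazy argument and is never forced.
direct <- `keep<-`(x, stop("rhs was evaluated"))

# Assignment form: the RHS is evaluated before the replacement function runs.
assigned <- tryCatch({
  keep(x) <- stop("rhs was evaluated")
  "no error"
}, error = function(e) conditionMessage(e))

direct    # 3
assigned  # "rhs was evaluated"
```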
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
'g<-' <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
'g<-'(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
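The R Language Definition describes the assignment form g(x) <- v as roughly equivalent to x <- `g<-`(x, value = v), carried out through a temporary variable. A sketch with a hypothetical replacement function setfirst<- shows the difference from a direct call:

```r
`setfirst<-` <- function(x, value) {
  x[1] <- value
  x
}

x <- c(1, 2, 3)

res <- `setfirst<-`(x, 10)  # direct call: returns a modified copy
x_after_direct <- x         # x itself is untouched: 1 2 3

setfirst(x) <- 10           # assignment form: the result is stored back into x
x                           # 10 2 3
```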
Speculation:
So R seems to distinguish between calling g<- in the course of making an actual assignment and simply calling g<- directly.