I've looked at the other lexical scoping questions in R and I can't find the answer. Consider this code:
f <- function(x) {
g <- function(y) {
y + z
}
z <- 4
x + g(x)
}
f(3)
f(3) will return an answer of 10. My question is why? At the point g() is defined in the code, z has not been assigned any value. At what point is the closure for g() created? Does it "look ahead" to the rest of the function body? Is it created when the g(x) is evaluated? If so, why?
When f is run, the first thing that happens is that a function g is created in f's local environment. Next, the variable z is created by assignment.
Finally, x is added to the result of g(x) and returned. At the point that g(x) is called, x = 3 and g exists in f's local environment. When the free variable z is encountered while executing g(x), R looks in the next environment up, the calling environment, which is f's local environment. It finds z there and proceeds, returning 7. It then adds this to x which is 3.
(Since this answer is attracting more attention, I should add that my language was a bit loose when talking about what x "equals" at various points that probably do not accurately reflect R's delayed evaluation of arguments. x will be equal to 3 once the value is needed.)
Related
One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what it is meant by calling function. Is it the function that is invoked (like in interactive sesion with function that has named assigned you type name and hit enter). If yes how evaluation frame of the callinig function differs from evaluation frame of the function?
First change to standard terms. The arguments that are used in the function definition are the formal arguments and the arguments that are passed to the function when calling it are the actual arguments. (The quoted passage in the question is referring to the actual arguments when it uses the nonstandard term, supplied arguments.)
Consider two cases via example.
Case 1
Below f has the formal argument x and when f is called in the last line of code there are no actual arguments.
Now when f is called in the last line of code x gets the value 2 because x is not set until it is used and when it is used a is looked up within the function where it has the value 2, not in the caller where it has the value 1.
a <- 1
f <- function(x = a) {
a <- 2
x
}
f()
## [1] 2
Case 2
On the other hand the actual arguments are evaluated in the caller. In the last line of code below x is set to 1 because that is the value of b in the caller. Again, x is not evaluated until it is used but now even though b has been set to 2 in the function itself this has no effect on x. x is set to 1, not 2.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
Other
Although this covers the two cases in the quote note that there exists another case which is the situation that occurs when x is referred to in a function but is not defined in the function. In the code below a is a free variable in g since a is not an argument or otherwise defined in g. In this case when gg (which equals g) is called R attempts to look up a in the function g and fails but the next place it looks is not the caller (where a is 1) but the environment in which the function was defined, i.e. the environment where the word function appears and a is 2 in that environment.
a <- 1
f <- function() {
a <- 2
g <- function() a
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.
Into the R console, type:
#First code snippet
x <- 0
x <- x+1
x
You'll get '1'. That makes sense: the idea is that the 'x' in 'x+1' is the current value of x, namely 0, and this is used to compute the value of x+1, namely 1, which is then shoveled into the container x. So far, so good.
Now type:
#Second code snippet
f <- function(n) {n^2}
f <- function(n) {if (n >= 1) {n*f(n-1)} else {1}}
f(5)
You'll get '120', which is 5 factorial.
I find this perplexing. Following the logic of the first code snippet, we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f, namely
function(n) {n^2}
Following this reasoning, the value of f(5) should be 5*(5-1)^2 = 80. But that's not what we get.
Question. What's really going on here? How does R know not to use the old 'f'?
we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f
— Yes, we might expect that. And we would be correct.
But what is the “current value of f”? Or, more precisely, what is “current”?
“Current” is when the function is executed, not when it is defined. That is, by the time you execute f(5), it has already been redefined. So now the execution enters the function, looks up inside the function what f refers to — and also finds the current (= new) definition, not the old one.
In other words: the objects associated with names are looked up when they are actually accessed. And inside a function this means that names are accessed when the function is executed, not when it’s defined.
The same is true for all objects. Let’s say f is using a global object that’s not a function:
n = 5
f = function() n ^ 2
n = 1
f() # = 1
To understand the difference between your first and second example, consider the following case which involved functions, yet behaves like your first case (i.e. it uses the “old” value of f).
To make the example work, we need a little helper: a function that modifies other functions. In the following, twice is a function which takes a function as an argument and returns a new function. That new function is the same as the old function, only it runs twice when invoked:
twice = function (original_function) {
force(original_function)
function (...) {
original_function(original_function(...))
}
}
To illustrate what twice does, let’s invoke it on an example function:
plus1 = function (n) n + 1
plus2 = twice(plus1)
plus2(3) # = 5
Neat — R allows us to handle functions like any other object!
Now let’s modify your f:
f = function(n) {n^2}
f = twice(f)
f(5) # 625
… and here we have it: in the statement f = twice(f), the second f refers to the current (= old) definition. Only after that line does f refer to the new, modified function.
Here's a simple example illustrating my comment on Konrad's excellent answer:
a <- 2
f <- function() a*b
e <- new.env()
assign("b",5,e)
environment(f) <- e
> f()
[1] 10
b <- 10
> f()
[1] 10
So we've manually altered the environment for f so that it always first looks in e for b. Theoretically, one could even lock that binding ?lockBinding to make sure it never changes without throwing an error.
This sort of thing could get complicated, though, as in general you'd want to make sure that you set the parent environment of e correctly based on where the function f is actually being created. In this example f is created in the global environment, but if f were being created inside another function, you'd want e's parent environment to reflect that.
The following returns 1, indicating a local x is created.
x = 1
bar() = (x = 2)
bar() # 2
x # 1
This returns 5, indicating both x refer to the global one.
x = 1
for i = 1:5
x = i
end
x # 5
A reference example: this time for loop fails to update the global.
x = 10
function foo(n)
for i = 1:n
x = i
end
1
end
foo(2), x # 1, 10
Update
The link from #matt-b is very useful. This is in fact the result of Soft Scope vs Hard Scope, see here. To wrap up, function scope used to work like loop scope, until there was a break change with the introduction of soft scope & hard scope. The documentation is not quite up to speed.
If you want to use global x in function scope you must declare it global
x = 10
function foo(n)
global x
for i = 1:n
x = i
end
1
end
foo(2), x # 2
As #colinfang have commented, in Julia function scope and loop scope are treat differently and I think the following sentence from documentation try to address this fact:
Julia uses lexical scoping, meaning that a function’s scope does not
inherit from its caller’s scope, but from the scope in which the
function was defined.
The link from #matt-b is very useful. This is in fact the result of Soft Scope vs Hard Scope, see here. To wrap up, function scope used to work like loop scope, until there was a break change with the introduction of soft scope & hard scope. The documentation is not quite up to speed.
Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
'g<-' <- function(x, value){print(x);print(substitute(value))}
(the single quotes should be fancy quotes)
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. Name a should be printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well the question is: bug or feature? What is going on here? I hope some guru users will shed some light about promises and lazy evaluation on R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value)
'g<-' <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
'g<-' <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
'g<-'(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So the parser is distinguishing between whether you call g<- in the course of making an actual assignment, and when you simply call g<- directly.
I just ran into something odd, which hopefully someone here can shed some light on. Basically, when a function has an argument whose default value is the argument's name, strange things happen (well, strange to me anyway).
For example:
y <- 5
f <- function(x=y) x^2
f2 <- function(y=y) y^2
I would consider f and f2 to be equivalent; although they use different variable names internally, they should both pick up the y object in the global environment to use as the default. However:
> f()
[1] 25
> f2()
Error in y^2 : 'y' is missing
Not sure why that is happening.
Just to make things even more interesting:
f3 <- function(y=y) y$foo
> f3()
Error in f3() :
promise already under evaluation: recursive default argument reference or earlier problems?
I expected f3 to throw an error, but not that one!
This was tested on R 2.11.1, 2.12.2, and 2.14, on 32-bit Windows XP SP3. Only the standard packages loaded.
Default arguments are evaluated inside the scope of the function. Your f2 is similar (almost equivalent) to the following code:
f2 = function(y) {
if (missing(y)) y = y
y^2
}
This makes the scoping clearer and explains why your code doesn’t work.
Note that this is only true for default arguments; arguments that are explicitly passed are (of course) evaluated in the scope of the caller.
Lazy evaluation, on the other hand, has nothing to do with this: all arguments are lazily evaluated, but calling f2(y) works without complaint. To show that lazy evaluation always happens, consider this:
f3 = function (x) {
message("x has not been evaluated yet")
x
}
f3(message("NOW x has been evaluated")
This will print, in this order:
x has not been evaluated yet
NOW x has been evaluated