Scoping order of '=' and '<-' inside a function in R

I am trying to understand how R scopes variables inside a function. Why is the output 12 and not 4? How are a and b assigned here?
I am learning R, so please explain with some references.
f1 <- function(a = {b <- 10; 2}, b = 2) {
  a + b
}
f1()

This is explained in section 4.3.3 of the R Language Definition manual.
When a function is called, each formal argument is assigned a promise
in the local environment of the call with the expression slot
containing the actual argument (if it exists) and the environment slot
containing the environment of the caller. If no actual argument for a
formal argument is given in the call and there is a default
expression, it is similarly assigned to the expression slot of the
formal argument, but with the environment set to the local
environment.
The process of filling the value slot of a promise by evaluating the
contents of the expression slot in the promise’s environment is called
forcing the promise. A promise will only be forced once, the value
slot content being used directly later on.
Nothing has a value until the sum starts being computed. First a is required, so its default expression is evaluated. The promise for b is lost because b is assigned a value directly while a is being forced, so the default expression for b in the function definition is never evaluated.
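One way to watch this happen is to make each default record when it is forced. The sketch below is a hypothetical instrumented copy of f1 (the names `forced` and `f1_traced` are illustrative, not from the manual); it assumes the code runs at top level, so `<<-` updates the `forced` vector defined there.

```r
# Hypothetical log of which default promises get forced, in order
forced <- character(0)

# Hypothetical instrumented copy of f1
f1_traced <- function(a = {forced <<- c(forced, "a"); b <- 10; 2},
                      b = {forced <<- c(forced, "b"); 2}) {
  a + b  # forcing a runs b <- 10 in this frame, replacing b's promise
}

result <- f1_traced()
result  # 12
forced  # "a": the default expression for b never ran
```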
If the order is the other way round, you see a different result:
f2 <- function(a = 2, b = {a <- 10; 2}) {
  a + b
}
f2()
[1] 4
Note, however, that a is 10 at the end of the function but 2 when it is required during the sum. Both promises are evaluated here.
If the order of the sum is reversed in f1 to instead be b+a you would find similar behaviour to f2.
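Both claims can be checked with instrumented copies of f2 and f1 (`f2_check` and `f1_rev` are illustrative names). Each returns the sum together with the value of the variable that the other default overwrote, so the difference between the value used during the sum and the value at the end is visible.

```r
# Hypothetical copy of f2 that also reports a after the sum
f2_check <- function(a = 2, b = {a <- 10; 2}) {
  total <- a + b  # a is forced first (2); then b's default runs a <- 10
  c(total = total, a_after = a)
}

# Hypothetical copy of f1 with the operands swapped
f1_rev <- function(a = {b <- 10; 2}, b = 2) {
  total <- b + a  # b is forced first (2); forcing a afterwards sets b to 10
  c(total = total, b_after = b)
}

f2_check()  # total = 4, a_after = 10
f1_rev()    # total = 4, b_after = 10
```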
Earlier in that section there is a general warning that side effects in function arguments should be avoided because there is no guarantee that they will be evaluated:
R has a form of lazy evaluation of function arguments. Arguments are
not evaluated until needed. It is important to realize that in some
cases the argument will never be evaluated. Thus, it is bad style to
use arguments to functions to cause side-effects. While in C it is
common to use the form, foo(x = y) to invoke foo with the value of y
and simultaneously to assign the value of y to x this same style
should not be used in R. There is no guarantee that the argument will
ever be evaluated and hence the assignment may not take place.
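A small sketch of this warning: `ignore_arg` and `use_arg` are hypothetical helpers, identical except that the second calls base R's `force()` on its argument. The assignment `y <- 99` written in the call only happens if the promise is forced, and when it does happen it runs in the caller's environment.

```r
# Hypothetical helper that never touches its argument
ignore_arg <- function(x) "argument ignored"
# Hypothetical helper that forces its argument
use_arg <- function(x) {
  force(x)  # evaluates the promise, running its side effect in the caller
  "argument forced"
}

y <- 1
ignore_arg(y <- 99)
y_after_ignore <- y  # 1: the promise was never forced
use_arg(y <- 99)
y_after_use <- y     # 99: forcing the promise performed the assignment
```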

Refer to https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/assignOps
Try this:
# note: <= is the less-than-or-equal comparison, not an assignment; b is only read here
f1 <- function(a = {b <= 10; 2}, b = 2) {
  a + b
}
f1()  # 4
or
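# <<- assigns b outside the function (here the global environment),
# so the local b keeps its default value 2
f1 <- function(a = {b <<- 10; 2}, b = 2) {
  a + b
}
f1()  # 4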
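To confirm what the two variants above do, the sketch below runs both under hypothetical names (`f_cmp`, `f_super`) and checks the side effect. It assumes it is run at top level, so `<<-` from inside the function creates or updates b in the global environment.

```r
# Hypothetical copy of the first variant: <= only compares b with 10
f_cmp <- function(a = {b <= 10; 2}, b = 2) a + b
# Hypothetical copy of the second variant: <<- skips the function frame
f_super <- function(a = {b <<- 10; 2}, b = 2) a + b

r_cmp <- f_cmp()      # 4: a = 2 and b keeps its default 2
r_super <- f_super()  # 4: the local b is untouched
b                     # 10: the outer b was created or overwritten
```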

Related

What is the calling function in R?

One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what is meant by the calling function. Is it the function that is invoked (as in an interactive session, where you type the name of a function and hit Enter)? If so, how does the evaluation frame of the calling function differ from the evaluation frame of the function?
First, let's switch to standard terms. The arguments that appear in the function definition are the formal arguments, and the arguments passed to the function when it is called are the actual arguments. (The quoted passage uses the nonstandard term "supplied arguments" for the actual arguments.)
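The formal arguments of a function, including their still-unevaluated default expressions, can be inspected with base R's `formals()`. A minimal sketch using the same f as in Case 1:

```r
f <- function(x = a) {
  a <- 2
  x
}
names(formals(f))  # "x": the only formal argument
formals(f)$x       # the default expression: the symbol a, not yet evaluated
```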
Consider two cases, each with an example.
Case 1
Below, f has the formal argument x, and when f is called in the last line of code there are no actual arguments, so the default expression a is used for x.
When f is called, x gets the value 2 because x is not evaluated until it is used, and when it is used, a is looked up within the function, where it has the value 2, not in the caller, where it has the value 1.
a <- 1
f <- function(x = a) {
  a <- 2
  x
}
f()
## [1] 2
Case 2
On the other hand, actual arguments are evaluated in the caller. In the last line of code below, x is set to 1 because that is the value of b in the caller. Again, x is not evaluated until it is used, but even though b has been set to 2 in the function itself, this has no effect on x: x is 1, not 2, so x + b is 1 + 2 = 3.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
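The two cases can be put side by side with a single function. Here `h` is a hypothetical copy of f from Case 1: when x is left to its default, a is looked up in h's frame; when the caller supplies a, it is looked up in the caller.

```r
a <- 1
# h is a hypothetical copy of f from Case 1
h <- function(x = a) {
  a <- 2
  x
}
h()   # 2: default argument, evaluated in h's evaluation frame
h(a)  # 1: actual argument, evaluated in the caller (here the global env)
```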
Other
Although this covers the two cases in the quote, there is another case: a variable is referred to in a function but is not defined in it. In the code below, a is a free variable in g, since a is neither an argument of g nor otherwise defined in g. When gg (which equals g) is called, R looks up a in g and fails. The next place it looks is not the caller (where a is 1) but the environment in which g was defined, i.e. the environment where the word function appears, and there a is 2.
a <- 1
f <- function() {
  a <- 2
  g <- function() a  # f returns the value of this assignment, i.e. g
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.
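A sketch that separates the defining environment from the caller (`make_getter` and `caller` are hypothetical names). `caller` has its own a, but the function returned by `make_getter` still finds the a from the environment where it was created.

```r
# Hypothetical function playing the role of f above
make_getter <- function() {
  a <- 2
  function() a  # a is a free variable, resolved in make_getter's frame
}
gg <- make_getter()

# Hypothetical caller with its own a
caller <- function() {
  a <- 99  # not visible to gg: R does not search the caller
  gg()
}
caller()                           # 2
get("a", envir = environment(gg))  # 2: the enclosing frame kept by gg
```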

R: Scoping, timing, and <<-

In the code below, I expect both f and the final a to return 3. But in fact they both return 2. Why is this? Hasn't 3 replaced 2 in the enclosing environment at the time the promise is evaluated?
a <- 1
f <- function(a){
  a <<- 3
  cat(a)
}
f(a <- 2)
a
Note that if I use = instead of <- in the call to f, the final a is 3 as expected, but f still prints 2.
Let's walk through the code:
a <- 1
assigns the value 1 to the name a in the global environment.
f <- function(a){...}
creates a function saved to the name f in the global environment.
f(a <- 2)
Now we are calling the function f with the expression a <- 2 as its argument. This expression is not evaluated immediately; it is passed as a promise. The global value of a remains 1.
Now we enter the body of f. The expression we passed in is bound to the local variable a in the function scope (still unevaluated), and a in the global environment remains 1. The fact that they both use the symbol a is irrelevant: there is no direct connection between the two a variables.
a <<- 3
This assigns the value 3 to a in a parent scope via <<-, rather than in the local scope as <- would. This means that the a referred to here is not the local a that holds the argument passed to the function, so this changes the value of a in the global scope to 3. And finally
cat(a)
Now we finally use the value that was passed to the function, since the a here refers to the a in the local function scope. This forces the promise, so a <- 2 is run in the calling scope (which happens to be the global scope). Thus the global value of a is set to 2. The assignment expression returns its right-hand-side value, so cat() prints 2.
The function exits and
a
shows the value of a in the global environment, which is now 2. It held the value 3 only briefly, between the two expressions in f.
If you were to call
f(a = 2)
the situation is very different. Now we are not passing an assignment expression to the function; we are passing the value 2 to the named parameter a. (If you tried f(x = 2) you would get an error, because f has no parameter named x.) Evaluating the constant 2 has no side effect, so the global value is left at 3 after the function call. By contrast, f(a <- 2) and f(a = a <- 2) behave the same way: both pass the expression a <- 2 as the argument.
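The three calls discussed above can be compared directly. This sketch redefines f from the question (printing a newline after the value) and resets the outer a before each call; the `after_*` names are illustrative.

```r
# f from the question, with a newline added to the output
f <- function(a) {
  a <<- 3       # updates the a in the environment where f was defined
  cat(a, "\n")  # forces the promise for the local a
}

a <- 1; f(a <- 2);     after_expr <- a   # promise runs a <- 2 after a <<- 3
a <- 1; f(a = 2);      after_named <- a  # constant argument: a stays 3
a <- 1; f(a = a <- 2); after_both <- a   # same as f(a <- 2)
c(after_expr, after_named, after_both)   # 2 3 2
```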

Understanding functions with variable scoping

For this code:
A <- 100; B <- 20
f1 <- function(a) {
  B <- 100
  f2 <- function(b) {
    A <<- 200
    B <<- 1000
  }
  f2(a)
}
f1(B)
cat(A)
cat(B)
The following is the output:
> cat(A)
200
> cat(B)
20
Here is my understanding of the above code:
The function f1 is invoked with parameter B that has value 20. Within f1 a local variable B is created (B <- 100); f1.B does not affect the variable B initialized outside the call to f1, as f1.B is locally scoped to f1. A new function f2 is created within f1 that accepts a single argument b.
Within f1, f2 is invoked with a as its argument. f2 does not make use of its argument b. f2 modifies A using the global assignment operator <<- and sets it to 200. This is why cat(A) outputs 200.
Where is my understanding incorrect? B is 20 when I expect 1000. Since A is set to 200 in f2 using <<-, shouldn't the same happen for B?
The function f1 is invoked with parameter B that has value 20.
No, I don't think so. It is invoked with a parameter a that has the same value as the B in the global environment. B is not directly involved at this point.
You then assign 100 to a different B, which you call f1.B in your post. (Note that, following the previous statement, this B is created here, not overwritten.)
Then when using the <<- operator, it traverses up in scope, going from f2 (where no B exists) to f1, where it finds this "f1.B" and assigns a 1000.
Similarly, when using the <<- on A, it traverses up. It finds no A in either f2 or f1, but does in the global environment and assigns it there.
You then print the original B, which has never been altered.
From the help:
<<- and ->> (...) cause a search to be made through parent environments for an existing definition of the variable being
assigned. If such a variable is found (and its binding is not locked)
then its value is redefined, otherwise assignment takes place in the
global environment.
So for B, "such a variable is found", while for A "assignment takes place in the global environment."
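To see where B <<- 1000 actually lands, the sketch below is the question's code with one hypothetical change: f1 returns its own B after calling f2.

```r
A <- 100; B <- 20
f1 <- function(a) {
  B <- 100
  f2 <- function(b) {
    A <<- 200   # no A in f2 or f1, so the outer A is updated
    B <<- 1000  # f1's B is found first and updated
  }
  f2(a)
  B             # hypothetical addition: return f1's local B
}
inner_B <- f1(B)
c(inner_B = inner_B, A = A, B = B)  # 1000, 200, 20
```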
Conclusion: <<- is confusing, and often better to be avoided.

Understanding evaluation of input arguments of functions

I am reading Advanced R by Hadley Wickham where some very good exercises are provided. One of them asks for description of this function:
f1 <- function(x = {y <- 1; 2}, y = 0) {
  x + y
}
f1()
Can someone help me to understand why it returns 3? I know there is something called lazy evaluation of the input arguments, and e.g. another exercise asks for description of this function
f2 <- function(x = z) {
  z <- 100
  x
}
f2()
and I correctly predicted it to return 100: x gets the value of z, which is evaluated inside the function, and then x is returned. I cannot figure out what happens in f1(), though.
Thanks.
See this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Evaluation:
When a function is called or invoked a new evaluation frame is
created. In this frame the formal arguments are matched with the
supplied arguments according to the rules given in Argument matching.
The statements in the body of the function are evaluated sequentially
in this environment frame.
...
R has a form of lazy evaluation of function arguments. Arguments are not evaluated until needed.
and this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Arguments:
Default values for arguments can be specified using the special form
‘name = expression’. In this case, if the user does not specify a
value for the argument when the function is invoked the expression
will be associated with the corresponding symbol. When a value is
needed the expression is evaluated in the evaluation frame of the
function.
In summary, if a parameter does not have a user-specified value, its default value is evaluated in the function's evaluation frame. So y is not evaluated at first. When the default of x is evaluated in the function's evaluation frame, y is set to 1, and then x is set to 2. Because y already has a value, its default argument never gets a chance to be evaluated. If you try f1(y = 1) and f1(y = 2), the result is still 3.
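A quick check of these claims, plus one extra call that supplies x (a hypothetical variation, not from the exercise), showing that y keeps its default 0 when the default of x never runs:

```r
f1 <- function(x = {y <- 1; 2}, y = 0) {
  x + y
}
f1()         # 3: forcing x's default sets y to 1
f1(y = 100)  # 3: y <- 1 replaces the supplied y before y is used
f1(x = 5)    # 5: x's default is never evaluated, so y is 0
```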

How does R know not to use the old 'f'?

Type the following into the R console:
#First code snippet
x <- 0
x <- x+1
x
You'll get '1'. That makes sense: the idea is that the 'x' in 'x+1' is the current value of x, namely 0, and this is used to compute the value of x+1, namely 1, which is then shoveled into the container x. So far, so good.
Now type:
#Second code snippet
f <- function(n) {n^2}
f <- function(n) {if (n >= 1) {n*f(n-1)} else {1}}
f(5)
You'll get '120', which is 5 factorial.
I find this perplexing. Following the logic of the first code snippet, we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f, namely
function(n) {n^2}
Following this reasoning, the value of f(5) should be 5*(5-1)^2 = 80. But that's not what we get.
Question. What's really going on here? How does R know not to use the old 'f'?
we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f
— Yes, we might expect that. And we would be correct.
But what is the “current value of f”? Or, more precisely, what is “current”?
“Current” is when the function is executed, not when it is defined. That is, by the time you execute f(5), it has already been redefined. So now the execution enters the function, looks up inside the function what f refers to — and also finds the current (= new) definition, not the old one.
In other words: the objects associated with names are looked up when they are actually accessed. And inside a function this means that names are accessed when the function is executed, not when it’s defined.
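Lookup at run time can be made visible by giving the factorial a second name and then rebinding f. The name `fact` is illustrative; the point is that the body still refers to f, which is looked up each time the body runs.

```r
f <- function(n) {if (n >= 1) {n*f(n-1)} else {1}}
fact <- f           # hypothetical second name for the same function
r1 <- fact(5)       # 120: the recursive call finds the factorial f
f <- function(n) 0  # rebind the global name f
r2 <- fact(5)       # 0: 5 * f(4), and f(4) now returns 0
```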
The same is true for all objects. Let’s say f is using a global object that’s not a function:
n = 5
f = function() n ^ 2
n = 1
f() # = 1
To understand the difference between your first and second examples, consider the following case, which involves functions yet behaves like your first case (i.e. it uses the “old” value of f).
To make the example work, we need a little helper: a function that modifies other functions. In the following, twice is a function which takes a function as an argument and returns a new function. That new function is the same as the old function, only it runs twice when invoked:
twice = function (original_function) {
  force(original_function)
  function (...) {
    original_function(original_function(...))
  }
}
To illustrate what twice does, let’s invoke it on an example function:
plus1 = function (n) n + 1
plus2 = twice(plus1)
plus2(3) # = 5
Neat — R allows us to handle functions like any other object!
Now let’s modify your f:
f = function(n) {n^2}
f = twice(f)
f(5) # 625
… and here we have it: in the statement f = twice(f), the second f refers to the current (= old) definition. Only after that line does f refer to the new, modified function.
Here's a simple example illustrating my comment on Konrad's excellent answer:
a <- 2
f <- function() a*b
e <- new.env()
assign("b", 5, e)
environment(f) <- e
f()
## [1] 10
b <- 10
f()
## [1] 10
So we've manually altered the environment of f so that it always looks in e for b first. In principle, one could even lock that binding (see ?lockBinding) so that any attempt to change it throws an error.
This sort of thing could get complicated, though, as in general you'd want to make sure that you set the parent environment of e correctly based on where the function f is actually being created. In this example f is created in the global environment, but if f were being created inside another function, you'd want e's parent environment to reflect that.
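A sketch of both points, under assumed names (`make_f`, `res` and `status` are hypothetical): f is created inside another function, so the new environment e is given f's original enclosure as its parent instead of the default, and base R's `lockBinding()` then turns a later change to b into an error.

```r
# Hypothetical factory, so that f is not created in the global environment
make_f <- function() {
  a <- 2
  function() a * b  # a and b are free variables
}
f <- make_f()

# Put e between f and its original enclosure, so a is still found
e <- new.env(parent = environment(f))
assign("b", 5, e)
environment(f) <- e

lockBinding("b", e)
res <- f()  # 10
status <- tryCatch({
  assign("b", 6, e)
  "changed"
}, error = function(err) "locked")
status  # "locked"
```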

Resources