One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what it is meant by calling function. Is it the function that is invoked (like in interactive sesion with function that has named assigned you type name and hit enter). If yes how evaluation frame of the callinig function differs from evaluation frame of the function?
First change to standard terms. The arguments that are used in the function definition are the formal arguments and the arguments that are passed to the function when calling it are the actual arguments. (The quoted passage in the question is referring to the actual arguments when it uses the nonstandard term, supplied arguments.)
Consider two cases via example.
Case 1
Below f has the formal argument x and when f is called in the last line of code there are no actual arguments.
Now when f is called in the last line of code x gets the value 2 because x is not set until it is used and when it is used a is looked up within the function where it has the value 2, not in the caller where it has the value 1.
a <- 1
f <- function(x = a) {
a <- 2
x
}
f()
## [1] 2
Case 2
On the other hand the actual arguments are evaluated in the caller. In the last line of code below x is set to 1 because that is the value of b in the caller. Again, x is not evaluated until it is used but now even though b has been set to 2 in the function itself this has no effect on x. x is set to 1, not 2.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
Other
Although this covers the two cases in the quote note that there exists another case which is the situation that occurs when x is referred to in a function but is not defined in the function. In the code below a is a free variable in g since a is not an argument or otherwise defined in g. In this case when gg (which equals g) is called R attempts to look up a in the function g and fails but the next place it looks is not the caller (where a is 1) but the environment in which the function was defined, i.e. the environment where the word function appears and a is 2 in that environment.
a <- 1
f <- function() {
a <- 2
g <- function() a
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.
Related
I am trying to understand how R scopes variables inside a function. Why is the output 12? Why not 4? How are a & b assigned here
I am learning R. Please explain with some references
f1 <- function(a = {b <- 10; 2}, b = 2) {
a+b
}
f1()
This is explained in section 4.3.3 of the R Language manual.
When a function is called, each formal argument is assigned a promise
in the local environment of the call with the expression slot
containing the actual argument (if it exists) and the environment slot
containing the environment of the caller. If no actual argument for a
formal argument is given in the call and there is a default
expression, it is similarly assigned to the expression slot of the
formal argument, but with the environment set to the local
environment.
The process of filling the value slot of a promise by evaluating the
contents of the expression slot in the promise’s environment is called
forcing the promise. A promise will only be forced once, the value
slot content being used directly later on.
Nothing has a value until the sum starts getting computed. First a is required and so it's expression is evaluated. The promise for b is lost as it gets assigned a value directly during the forcing of a and so the actual b assignment promise in the function definition is not evaluated at all.
If the order is the other way round, you see a different result:
f2 <- function(a = 2, b = {a <- 10; 2}) {
a+b
}
f2()
[1] 4
However, note that the value of a will be 10 at end of the function, but 2 when it is required during the sum. Both promises get evaluated here.
If the order of the sum is reversed in f1 to instead be b+a you would find similar behaviour to f2.
Earlier in that section there is a general warning that side-effects should be avoided in assignments because they is no guarantee they will be evaluated.
R has a form of lazy evaluation of function arguments. Arguments are
not evaluated until needed. It is important to realize that in some
cases the argument will never be evaluated. Thus, it is bad style to
use arguments to functions to cause side-effects. While in C it is
common to use the form, foo(x = y) to invoke foo with the value of y
and simultaneously to assign the value of y to x this same style
should not be used in R. There is no guarantee that the argument will
ever be evaluated and hence the assignment may not take place.
Refer https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/assignOpsenter link description here
Try this
f1 <- function(a = {b <= 10; 2}, b = 2) {
a+b
}
f1()
or
f1 <- function(a = {b <<- 10; 2}, b = 2) {
a+b
}
f1()
In the code below, I expect both f and the final a to return 3. But in fact they both return 2. Why is this? Hasn't 3 replaced 2 in the enclosing environment at the time the promise is evaluated?
a <- 1
f <- function(a){
a <<- 3
cat(a)
}
f(a <- 2)
a
Note that If I use an = instead of a <- in the call to f, the final a is 3 as expected, but f remains 2.
Let's walk through the code
a <- 1
assigns the value 1 to the name a in the global environment.
f <- function(a){...}
creates a function saved to the name f in the global environment.
f(a <- 2)
Now we are calling the function f with the expression a<-2 as a parameter. This expression is not evaluated immediately. It is passed as a promise. The global value of a remains 1.
Now we enter the body of the function f. The expression we've passed in is assigned to the local variable a in the function scope (still un-evaluated) and a in the global remains 1. The fact they they both involve the symbol a is irrelevant. There is no direct connection between the two a variables here.
a <<- 3
This assigns the value of 3 to a in a parent scope via <<- rather than the local scope as <- would do. This means that the a refered to here is not the local a that now hold the parameter passed to the function. So this changes the value of a in the global scope to 3. And finally
cat(a)
Now we are finally using the value that was passed to the function since the a here refers to the a in the local function scope. This triggers the promise a <- 2 to be run in the calling scope (which happens to be the global scope). Thus the global value of a is set to 2. This assignment expression returns the right-hand-side value so "2" is displayed from cat().
The function exits and
a
shows the value of the a in the global environment which is now a. It was only the value 3 in the brief moment between the two expressions in f.
If you where to call
f( a=2 )
This is very different. Now we are not passing an expression to the function anymore, we are passing the value 2 to the named function parameter a. If you tried f(x=2) you would get an error that the function doesn't recognize the parameter named "x". There is no fancy lazy expression/promise evaluation in this scenario since 2 is a constant. This would leave the global value set to 3 after the function call. f(a <- 2) and f(a = a <- 2) would behave the same way.
I am reading Advanced R by Hadley Wickham where some very good exercises are provided. One of them asks for description of this function:
f1 <- function(x = {y <- 1; 2}, y = 0) {
x + y
}
f1()
Can someone help me to understand why it returns 3? I know there is something called lazy evaluation of the input arguments, and e.g. another exercise asks for description of this function
f2 <- function(x = z) {
z <- 100
x
}
f2()
and I correctly predicted to be 100; x gets value of z which is evaluated inside a function, and then x is returned. I cannot figure out what happens in f1(), though.
Thanks.
See this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Evaluation:
When a function is called or invoked a new evaluation frame is
created. In this frame the formal arguments are matched with the
supplied arguments according to the rules given in Argument matching.
The statements in the body of the function are evaluated sequentially
in this environment frame.
...
R has a form of lazy evaluation of function arguments. Arguments are not evaluated until needed.
and this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Arguments:
Default values for arguments can be specified using the special form
‘name = expression’. In this case, if the user does not specify a
value for the argument when the function is invoked the expression
will be associated with the corresponding symbol. When a value is
needed the expression is evaluated in the evaluation frame of the
function.
In summary, if the parameter does not have user-specified value, its default value will be evaluated in the function's evaluation frame. So y is not evalulated at first. When the default of x is evaluated in the function's evaluation frame, y will be modified to 1, then x will be set to 2. As y is already found, the default argument has no change to be evaluated. if you try f1(y = 1) and f1(y = 2), the results are still 3.
Into the R console, type:
#First code snippet
x <- 0
x <- x+1
x
You'll get '1'. That makes sense: the idea is that the 'x' in 'x+1' is the current value of x, namely 0, and this is used to compute the value of x+1, namely 1, which is then shoveled into the container x. So far, so good.
Now type:
#Second code snippet
f <- function(n) {n^2}
f <- function(n) {if (n >= 1) {n*f(n-1)} else {1}}
f(5)
You'll get '120', which is 5 factorial.
I find this perplexing. Following the logic of the first code snippet, we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f, namely
function(n) {n^2}
Following this reasoning, the value of f(5) should be 5*(5-1)^2 = 80. But that's not what we get.
Question. What's really going on here? How does R know not to use the old 'f'?
we might expect the 'f' in the expression
if (n >= 1) {n*f(n-1)} else {1}
to be interpreted as the current value of f
— Yes, we might expect that. And we would be correct.
But what is the “current value of f”? Or, more precisely, what is “current”?
“Current” is when the function is executed, not when it is defined. That is, by the time you execute f(5), it has already been redefined. So now the execution enters the function, looks up inside the function what f refers to — and also finds the current (= new) definition, not the old one.
In other words: the objects associated with names are looked up when they are actually accessed. And inside a function this means that names are accessed when the function is executed, not when it’s defined.
The same is true for all objects. Let’s say f is using a global object that’s not a function:
n = 5
f = function() n ^ 2
n = 1
f() # = 1
To understand the difference between your first and second example, consider the following case which involved functions, yet behaves like your first case (i.e. it uses the “old” value of f).
To make the example work, we need a little helper: a function that modifies other functions. In the following, twice is a function which takes a function as an argument and returns a new function. That new function is the same as the old function, only it runs twice when invoked:
twice = function (original_function) {
force(original_function)
function (...) {
original_function(original_function(...))
}
}
To illustrate what twice does, let’s invoke it on an example function:
plus1 = function (n) n + 1
plus2 = twice(plus1)
plus2(3) # = 5
Neat — R allows us to handle functions like any other object!
Now let’s modify your f:
f = function(n) {n^2}
f = twice(f)
f(5) # 625
… and here we have it: in the statement f = twice(f), the second f refers to the current (= old) definition. Only after that line does f refer to the new, modified function.
Here's a simple example illustrating my comment on Konrad's excellent answer:
a <- 2
f <- function() a*b
e <- new.env()
assign("b",5,e)
environment(f) <- e
> f()
[1] 10
b <- 10
> f()
[1] 10
So we've manually altered the environment for f so that it always first looks in e for b. Theoretically, one could even lock that binding ?lockBinding to make sure it never changes without throwing an error.
This sort of thing could get complicated, though, as in general you'd want to make sure that you set the parent environment of e correctly based on where the function f is actually being created. In this example f is created in the global environment, but if f were being created inside another function, you'd want e's parent environment to reflect that.
The following function is used to multiply a sequence 1:x by y
f1<-function(x,y){return (lapply(1:x, function(a,b) b*a, b=y))}
Looks like a is used to represent the element in the sequence 1:x, but I do not know how to understand this parameter passing mechanism. In other OO languages, like Java or C++, there have call by reference or call by value.
Short answer: R is call by value. Long answer: it can do both.
Call By Value, Lazy Evaluation, and Scoping
You'll want to read through: the R language definition for more details.
R mostly uses call by value but this is complicated by its lazy evaluation:
So you can have a function:
f <- function(x, y) {
x * 3
}
If you pass in two big matrixes to x and y, only x will be copied into the callee environment of f, because y is never used.
But you can also access variables in parent environments of f:
y <- 5
f <- function(x) {
x * y
}
f(3) # 15
Or even:
y <- 5
f <- function() {
x <- 3
g <- function() {
x * y
}
}
f() # returns function g()
f()() # returns 15
Call By Reference
There are two ways for doing call by reference in R that I know of.
One is by using Reference Classes, one of the three object oriented paradigms of R (see also: Advanced R programming: Object Oriented Field Guide)
The other is to use the bigmemory and bigmatrix packages (see The bigmemory project). This allows you to create matrices in memory (underlying data is stored in C), returning a pointer to the R session. This allows you to do fun things like accessing the same matrix from multiple R sessions.
To multiply a vector x by a constant y just do
x * y
The (some prefix)apply functions works very similar to each other, you want to map a function to every element of your vector, list, matrix and so on:
x = 1:10
x.squared = sapply(x, function(elem)elem * elem)
print(x.squared)
[1] 1 4 9 16 25 36 49 64 81 100
It gets better with matrices and data frames because you can now apply a function over all rows or columns, and collect the output. Like this:
m = matrix(1:9, ncol = 3)
# The 1 below means apply over rows, 2 would mean apply over cols
row.sums = apply(m, 1, function(some.row) sum(some.row))
print(row.sums)
[1] 12 15 18
If you're looking for a simple way to multiply a sequence by a constant, definitely use #Fernando's answer or something similar. I'm assuming you're just trying to determine how parameters are being passed in this code.
lapply calls its second argument (in your case function(a, b) b*a) with each of the values of its first argument 1, 2, ..., x. Those values will be passed as the first parameter to the second argument (so, in your case, they will be argument a).
Any additional parameters to lapply after the first two, in your case b=y, are passed to the function by name. So if you called your inner function fxn, then your invocation of lapply is making calls like fxn(1, b=4), fxn(2, b=4), .... The parameters are passed by value.
You should read the help of lapply to understand how it works. Read this excellent answer to get and a good explanation of different xxpply family functions.
From the help of laapply:
lapply(X, FUN, ...)
Here FUN is applied to each elementof X and ... refer to:
... optional arguments to FUN.
Since FUN has an optional argument b, We replace the ... by , b=y.
You can see it as a syntax sugar and to emphasize the fact that argument b is optional comparing to argument a. If the 2 arguments are symmetric maybe it is better to use mapply.