Understanding evaluation of input arguments of functions - r

I am reading Advanced R by Hadley Wickham, which provides some very good exercises. One of them asks for a description of this function:
f1 <- function(x = {y <- 1; 2}, y = 0) {
x + y
}
f1()
Can someone help me understand why it returns 3? I know there is something called lazy evaluation of the input arguments, and, e.g., another exercise asks for a description of this function
f2 <- function(x = z) {
z <- 100
x
}
f2()
and I correctly predicted that it returns 100: x gets the value of z, which is evaluated inside the function, and then x is returned. I cannot figure out what happens in f1(), though.
Thanks.

See this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Evaluation:
When a function is called or invoked a new evaluation frame is
created. In this frame the formal arguments are matched with the
supplied arguments according to the rules given in Argument matching.
The statements in the body of the function are evaluated sequentially
in this environment frame.
...
R has a form of lazy evaluation of function arguments. Arguments are not evaluated until needed.
and this from https://cran.r-project.org/doc/manuals/r-patched/R-lang.html#Arguments:
Default values for arguments can be specified using the special form
‘name = expression’. In this case, if the user does not specify a
value for the argument when the function is invoked the expression
will be associated with the corresponding symbol. When a value is
needed the expression is evaluated in the evaluation frame of the
function.
In summary, if an argument has no user-specified value, its default expression is evaluated in the function's evaluation frame, and only when the argument is first needed. In x + y, x is needed first, so its default {y <- 1; 2} is evaluated in the function's frame: this assigns 1 to y (replacing y's not-yet-evaluated default) and gives x the value 2. By the time y is needed it already has the value 1, so its default of 0 never gets a chance to be evaluated. If you try f1(y = 1) or f1(y = 2), the result is still 3, because the assignment y <- 1 overwrites y before its value is used.
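To make the forcing order visible, here is a small sketch of f1 with a message() added to each default. The messages are my addition for illustration, not part of the exercise; only the default of x should report being evaluated.

```r
# f1 from the question, with a message() in each default so we can see
# which default expressions R actually evaluates
f1 <- function(x = {message("forcing default of x"); y <- 1; 2},
               y = {message("forcing default of y"); 0}) {
  x + y
}

f1()       # only "forcing default of x" is printed; returns 3
f1(y = 2)  # forcing x runs y <- 1, overwriting the supplied y; returns 3
```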

Related

Scoping order of '=' and '<-' inside a function in R

I am trying to understand how R scopes variables inside a function. Why is the output 12? Why not 4? How are a and b assigned here?
I am learning R. Please explain with some references
f1 <- function(a = {b <- 10; 2}, b = 2) {
a+b
}
f1()
This is explained in section 4.3.3 of the R Language manual.
When a function is called, each formal argument is assigned a promise
in the local environment of the call with the expression slot
containing the actual argument (if it exists) and the environment slot
containing the environment of the caller. If no actual argument for a
formal argument is given in the call and there is a default
expression, it is similarly assigned to the expression slot of the
formal argument, but with the environment set to the local
environment.
The process of filling the value slot of a promise by evaluating the
contents of the expression slot in the promise’s environment is called
forcing the promise. A promise will only be forced once, the value
slot content being used directly later on.
Nothing has a value until the sum starts being computed. First a is required, so its default expression is evaluated. That evaluation runs b <- 10 in the function's frame, which replaces the promise for b with the value 10. As a result, b's own default expression (b = 2) in the function definition is never evaluated, and the sum is 2 + 10 = 12.
If the order is the other way round, you see a different result:
f2 <- function(a = 2, b = {a <- 10; 2}) {
a+b
}
f2()
[1] 4
However, note that the value of a will be 10 at the end of the function, but 2 when it is required during the sum. Both promises get evaluated here.
If the order of the sum is reversed in f1 to instead be b+a you would find similar behaviour to f2.
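A quick sketch to check both claims. The names f1_rev and f2_trace are hypothetical; f1_rev is f1 with the sum written as b + a, and f2_trace is f2 extended to also report a after the sum.

```r
# f1 with the sum reversed: b's default (2) is forced first; forcing a
# then runs b <- 10, but b's value has already been used in the sum
f1_rev <- function(a = {b <- 10; 2}, b = 2) {
  b + a
}
f1_rev()  # 4

# f2 from the answer, also reporting the value of a after the sum
f2_trace <- function(a = 2, b = {a <- 10; 2}) {
  total <- a + b   # a is forced (2), then b is forced, which sets a to 10
  c(total = total, a_after = a)
}
f2_trace()  # total = 4, a_after = 10
```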
Earlier in that section there is a general warning that side effects should be avoided in function arguments, because there is no guarantee the arguments will be evaluated.
R has a form of lazy evaluation of function arguments. Arguments are
not evaluated until needed. It is important to realize that in some
cases the argument will never be evaluated. Thus, it is bad style to
use arguments to functions to cause side-effects. While in C it is
common to use the form, foo(x = y) to invoke foo with the value of y
and simultaneously to assign the value of y to x this same style
should not be used in R. There is no guarantee that the argument will
ever be evaluated and hence the assignment may not take place.
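The quoted manual text says a promise is forced only once and that an argument may never be evaluated at all. A minimal sketch of both points, using message() purely as a visible marker (g is a hypothetical function; the second function is the question's f1 with a message added to b's default):

```r
# g references x twice, but its default is forced only once
g <- function(x = {message("evaluating default of x"); 5}) {
  x + x
}
g()  # prints the message once, returns 10

# Forcing a runs b <- 10, which replaces b's promise,
# so b's default expression is never evaluated
f1 <- function(a = {b <- 10; 2}, b = {message("default of b evaluated"); 2}) {
  a + b
}
f1()  # returns 12, and no message is printed
```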
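Refer to https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/assignOps
Try the following variants. Both return 4: b <= 10 is a comparison rather than an assignment, so b keeps its default of 2, and b <<- 10 assigns b in the enclosing (here, global) environment instead of the function's own frame, so the local b again keeps its default of 2.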
f1 <- function(a = {b <= 10; 2}, b = 2) {
a+b
}
f1()
or
f1 <- function(a = {b <<- 10; 2}, b = 2) {
a+b
}
f1()
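A sketch that makes the side effect of <<- visible. The names f1_cmp and f1_super are hypothetical, and the global b <- 0 exists only so you can see it change.

```r
b <- 0  # a global b, so the effect of <<- can be observed

f1_cmp <- function(a = {b <= 10; 2}, b = 2) {
  a + b  # b <= 10 is only a comparison; its result is discarded
}
f1_super <- function(a = {b <<- 10; 2}, b = 2) {
  a + b  # b <<- 10 assigns in the enclosing (global) environment
}

f1_cmp()    # 4
f1_super()  # 4, since the local b still has its default of 2
b           # 10: the global b was changed as a side effect
```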

Dynamic scoping questions in R

I'm reading Advanced R by Hadley Wickham and testing the following code from this URL
subset2 = function(df, condition){
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
condition = 3
subset2(df, a < condition)
Then I got the following error message:
Error in eval(substitute(condition), df) : object 'a' not found
I read the explanation as follows but don't quite understand:
If eval() can’t find the variable inside the data frame (its second argument), it looks in the environment of subset2(). That’s obviously not what we want, so we need some way to tell eval() where to look if it can’t find the variables in the data frame.
As I understand it, when eval(substitute(condition), df) runs, the variable that cannot be found should be condition, so why is it object "a" that cannot be found?
On the other hand, why doesn't the following code produce an error?
subset2 = function(df, condition){
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
y = 3
subset2(df, a < y)
This more stripped down example may make it easier for you to see what's going on in Hadley's example. The first thing to note is that the symbol condition appears here in four different roles, each of which I've marked with a numbered comment.
## Role of symbol `condition`
f <- function(condition) { #1 -- formal argument
a <- 100
condition + a #2 -- symbol bound to formal argument
}
condition <- 3 #3 -- symbol in global environment
f(condition = condition + a) #4 -- supplied argument (on RHS)
## Error in f(condition = condition + a) (from #1) : object 'a' not found
The other important thing to understand is that symbols in supplied arguments (here the right hand side part of condition = condition + a at #4) are searched for in the evaluation frame of the calling function. From Section 4.3.3 Argument Evaluation of the R Language Definition:
One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
In the example above, the evaluation frame of the call to f() is the global environment, .GlobalEnv.
Taking this step by step, here is what happens when you call f(condition = condition + a). During function evaluation, R comes across the expression condition + a in the function body (at #2). It searches for values of a and condition and finds a locally assigned symbol a. It finds that the symbol condition is bound to the formal argument named condition (at #1). The value of that formal argument, supplied during the function call, is condition + a (at #4).
As noted in the R Language Definition, the values of the symbols in the expression condition + a are searched for in the environment of the calling function, here the global environment. Since the global environment contains a variable named condition (assigned at #3) but no variable named a, it is unable to evaluate the expression condition + a (at #4), and fails with the error that you see.
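As a check on that reasoning, here is a sketch of the same example where the caller does have an a. The global a <- 1 is an assumption added for illustration: the supplied argument should pick it up, while the body still uses the local a.

```r
f <- function(condition) {
  a <- 100
  condition + a
}
condition <- 3
a <- 1  # give the caller (the global environment) its own a

# The supplied argument condition + a is evaluated in the caller: 3 + 1 = 4.
# The body then adds the local a: 4 + 100 = 104.
f(condition = condition + a)
```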
I want to add some details in case someone stumbles on this question. The problematic line is
condition_call = eval(substitute(condition),df )
The condition object in subset2() is a promise whose expression slot is a < condition; substitute(condition) extracts that expression and returns it as a call object, a < condition.
Then eval() starts evaluating a < condition in the df environment; it needs to find both a and condition.
a is found in df successfully, so this is not where the error arises.
Then R starts searching condition in df and cannot find it.
So R goes up to the execution environment of subset2, and finds condition in the execution environment.
The variable it finds is actually the promise object mentioned before with expression slot as "a < condition".
To evaluate this promise, R has to find a again, but a promise is evaluated in the environment where it was created, here the global environment (the caller of subset2()), not in df. There is no a there, and this is what actually produces the error.
To summarize the problem here:
R does find a in the df for once.
The bug arises when R looks for condition: it takes the promise object condition (the function's argument) instead of the 3 assigned outside, and tries to evaluate it.
Then R runs into the problem:
it tries to evaluate a < condition in the global environment, where the promise was created, and cannot find a there.
For my second example, R cannot find y in df or in the execution environment of subset2(), so it continues to the global environment (where subset2() was defined and also called) and finds y there as 3, so there is no error. Because y has a different name from the formal argument condition, R never forces the promise, never tries to evaluate a < y outside df, and no error occurs.
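The passage quoted in the question says eval() needs to be told where to look when a variable is not in the data frame. A minimal sketch of that, assuming base eval()'s third argument, enclos, is the mechanism you want: passing parent.frame() makes names missing from df be looked up in subset2()'s caller, skipping subset2()'s own frame where the promise condition lives.

```r
subset2 <- function(df, condition) {
  condition_call <- substitute(condition)
  # Look up names in df first, then in the environment subset2() was
  # called from, rather than in subset2()'s own frame
  rows <- eval(condition_call, df, parent.frame())
  df[rows, ]
}

df <- data.frame(a = 1:10, b = 2:11)
condition <- 3
subset2(df, a < condition)  # the rows where a is 1 or 2
```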

What is the calling function in R?

One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
I don't quite understand what is meant by calling function. Is it the function that is invoked (for example, in an interactive session, a function you call by typing its name and pressing Enter)? If so, how does the evaluation frame of the calling function differ from the evaluation frame of the function?
First change to standard terms. The arguments that are used in the function definition are the formal arguments and the arguments that are passed to the function when calling it are the actual arguments. (The quoted passage in the question is referring to the actual arguments when it uses the nonstandard term, supplied arguments.)
Consider two cases via example.
Case 1
Below f has the formal argument x and when f is called in the last line of code there are no actual arguments.
Now when f is called in the last line of code, x gets the value 2: x is not set until it is used, and when it is used, a is looked up within the function, where it has the value 2, not in the caller, where it has the value 1.
a <- 1
f <- function(x = a) {
a <- 2
x
}
f()
## [1] 2
Case 2
On the other hand the actual arguments are evaluated in the caller. In the last line of code below x is set to 1 because that is the value of b in the caller. Again, x is not evaluated until it is used but now even though b has been set to 2 in the function itself this has no effect on x. x is set to 1, not 2.
b <- 1
g <- function(x) { b <- 2; x + b }
g(b)
## [1] 3
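The two cases can also be compared side by side in one function. In this sketch (h is a hypothetical name), the same expression a gives different results depending on whether it arrives as a default or as an actual argument.

```r
a <- 1
h <- function(x = a) {
  a <- 2
  x
}
h()   # 2: the default a is evaluated in h's own evaluation frame
h(a)  # 1: the actual argument a is evaluated in the caller (global environment)
```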
Other
Although this covers the two cases in the quote, note that there is another case: a variable that is referred to in a function but is not defined in it. In the code below, a is a free variable in g, since a is neither an argument of g nor otherwise defined in g. When gg (which equals g) is called, R looks up a in the function g and fails, but the next place it looks is not the caller (where a is 1) but the environment in which the function was defined, i.e. the environment where the word function appears, and a is 2 in that environment.
a <- 1
f <- function() {
a <- 2
g <- function() a
}
gg <- f()
gg()
## [1] 2
This is referred to as lexical scoping since one can tell where the free variables are looked up by simply looking at the function definitions.
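To see that the caller really is ignored for free variables, here is a sketch where gg is called from inside another function that has its own a. The function caller is hypothetical; environment() and get() are base R.

```r
a <- 1
f <- function() {
  a <- 2
  function() a  # a is a free variable in this inner function
}
gg <- f()

caller <- function() {
  a <- 3  # lives in the caller's frame, which lexical scoping does not consult
  gg()
}
caller()                           # 2, not 3
get("a", envir = environment(gg))  # 2: the a in gg's defining environment
```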

What's the difference between substitute and quote in R

In the official docs, it says:
substitute returns the parse tree for the (unevaluated) expression
expr, substituting any variables bound in env.
quote simply returns its argument. The argument is not evaluated and
can be any R expression.
But when I try:
> x <- 1
> substitute(x)
x
> quote(x)
x
It looks like both quote and substitute return the expression that's passed to them as an argument.
So my question is, what's the difference between substitute and quote, and what does it mean to "substituting any variables bound in env"?
Here's an example that may help you to easily see the difference between quote() and substitute(), in one of the settings (processing function arguments) where substitute() is most commonly used:
f <- function(argX) {
list(quote(argX),
substitute(argX),
argX)
}
suppliedArgX <- 100
f(argX = suppliedArgX)
# [[1]]
# argX
#
# [[2]]
# suppliedArgX
#
# [[3]]
# [1] 100
R has lazy evaluation, so the identity of a variable name token is a little less clear than in other languages. This is used in libraries like dplyr where you can write, for instance:
summarise(mtcars, total_cyl = sum(cyl))
We can ask what each of these tokens means: summarise and sum are defined functions, mtcars is a defined data frame, total_cyl is a keyword argument for the function summarise. But what is cyl?
> cyl
Error: object 'cyl' not found
It isn't anything! Well, not yet. R doesn't evaluate it right away; instead it treats it as an expression to be evaluated later in an environment other than the global environment your command line works in, specifically one where the columns of mtcars are defined. Somewhere in the guts of dplyr, something like this is happening:
> substitute(cyl, mtcars)
[1] 6 6 4 6 8 ...
Suddenly cyl means something. That's what substitute is for.
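A minimal sketch of the idea in base R. my_summarise is a hypothetical helper, not how dplyr is actually implemented: it captures the unevaluated argument with substitute() and evaluates it with the data frame's columns in scope.

```r
# Hypothetical helper, not dplyr's implementation
my_summarise <- function(data, expr) {
  # Capture the unevaluated argument, then evaluate it with data's columns
  # in scope, falling back to the caller's environment for other names
  eval(substitute(expr), data, parent.frame())
}

my_summarise(mtcars, sum(cyl))  # same as sum(mtcars$cyl)
```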
So what is quote for? Well sometimes you want your lazily-evaluated expression to be represented somewhere else before it's evaluated, i.e. you want to display the actual code you're writing without any (or only some) values substituted. The docs you quoted explain this is common for "informative labels for data sets and plots".
So, for example, you could create a quoted expression and then both print the unevaluated expression in your chart, to show how you calculated a value, and actually evaluate it.
expr <- quote(x + y)
print(expr) # x + y
eval(expr, list(x = 1, y = 2)) # 3
Note that substitute can do this expression trick too, while giving you the option to substitute values for only some of the variables. So its features are a superset of quote's.
expr <- substitute(x + y, list(x = 1))
print(expr) # 1 + y
eval(expr, list(y = 2)) # 3
Maybe this section of the documentation will help somewhat:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
Note the final bit, and consider this example:
e <- new.env()
assign(x = "a",value = 1,envir = e)
> substitute(a,env = e)
[1] 1
Compare that with:
> quote(a)
a
So there are two basic situations in which substitution occurs: when substitute is applied to an argument of a function, and when env is some environment other than .GlobalEnv. That's why your particular example was confusing.
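A short sketch covering all three outcomes described in the documentation excerpt. It passes globalenv() explicitly so the top-level behaviour does not depend on where the code happens to run; e and f are hypothetical names.

```r
x <- 1
# With env = .GlobalEnv the symbol is left alone, so substitute acts like quote
identical(substitute(x, globalenv()), quote(x))  # TRUE

# In any other environment, an ordinary variable is replaced by its value
e <- new.env()
assign("x", 1, envir = e)
substitute(x, e)  # 1

# Inside a function, a formal argument (a promise) is replaced by its expression
f <- function(arg) substitute(arg)
f(x + 1)  # x + 1
```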
For another comparison with quote, consider modifying the myplot function in the examples section to be:
myplot <- function(x, y)
plot(x, y, xlab = deparse(quote(x)),
ylab = deparse(quote(y)))
and you'll see that quote really doesn't do any substitution.
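If you want to compare the labels without drawing a plot, here is a sketch with two hypothetical functions that only build the label strings. Neither forces x or y, so theta never needs to exist.

```r
# Labels built with substitute(): the caller's expressions
labels_sub <- function(x, y) {
  c(xlab = deparse(substitute(x)), ylab = deparse(substitute(y)))
}
# Labels built with quote(): just the formal argument names
labels_quote <- function(x, y) {
  c(xlab = deparse(quote(x)), ylab = deparse(quote(y)))
}

labels_sub(sin(theta), cos(theta))    # xlab "sin(theta)", ylab "cos(theta)"
labels_quote(sin(theta), cos(theta))  # xlab "x", ylab "y"
```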
Regarding your question why .GlobalEnv is treated as an exception for substitute, it is simply inherited from S. From The R Language Definition (https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Substitutions):
The special exception for substituting at the top level is admittedly peculiar. It has been inherited from S and the rationale is most likely that there is no control over which variables might be bound at that level so that it would be better to just make substitute act as quote.

R: Passing parameters from a wrapper function to internal functions

I am not surprised that this function doesn't work, but I cannot quite understand why.
computeMeans <- function(data,dv,fun) {
x <- with(data,aggregate(dv,
list(
method=method,
hypo=hypothesis,
pre.group=pre.group,
pre.smooth=pre.smooth
),
fun ) )
return(x)
}
computeMeans(df.basic,dprime,mean)
Where df.basic is a dataframe with factors method, hypothesis, etc, and several dependent variables (and I specify one with the dv parameter, dprime).
I have multiple dependent variables and several dataframes all of the same form, so I wanted to write this little function to keep things "simple". The error I get is:
Error in aggregate(dv, list(method = method, hypo = hypothesis,
pre.group = pre.group, :
object 'dprime' not found
But dprime does exist in df.basic, which is referenced with with(). Can anyone explain the problem? Thank you!
EDIT: This is the R programming language. http://www.r-project.org/
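Although dprime exists in df.basic, the dv argument is a promise that R evaluates in the environment computeMeans was called from (here the global environment), not inside df.basic, so dprime cannot be found unless you reference it explicitly. For example,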
computeMeans(df.basic,df.basic$dprime,mean)
will work.
Alternatively
computeMeans <- function(data,dv,fun) {
dv <- eval(substitute(dv), envir=data)
x <- with(data,aggregate(dv,
list(
method=method,
hypo=hypothesis,
pre.group=pre.group,
pre.smooth=pre.smooth
),
fun ) )
return(x)
}
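df.basic is not shown in the question, so here is a sketch that exercises the eval(substitute()) version on a small made-up data frame. Only the column names come from the question; the values are invented purely for illustration.

```r
# Hypothetical toy data using the column names from the question
df.basic <- data.frame(
  method     = rep(c("A", "B"), each = 4),
  hypothesis = rep(c("h1", "h2"), times = 4),
  pre.group  = "g1",
  pre.smooth = "s1",
  dprime     = 1:8
)

computeMeans <- function(data, dv, fun) {
  # Evaluate the unevaluated dv expression with data's columns in scope
  dv <- eval(substitute(dv), envir = data)
  with(data, aggregate(dv,
                       list(method = method, hypo = hypothesis,
                            pre.group = pre.group, pre.smooth = pre.smooth),
                       fun))
}

computeMeans(df.basic, dprime, mean)  # one mean per method/hypo combination
```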
You might think that since dv is in the with(data, (.)) call, it gets evaluated within the environment of data. It does not.
When a function is called the arguments are matched and then each of
the formal arguments is bound to a promise. The expression that was
given for that formal argument and a pointer to the environment the
function was called from are stored in the promise.
Until that argument is accessed there is no value associated with the
promise. When the argument is accessed, the stored expression is
evaluated in the stored environment, and the result is returned. The
result is also saved by the promise.
source
A promise is therefore evaluated within the environment in which it was created (i.e., the environment where the function was called), regardless of the environment in which the promise is first accessed. Observe:
delayedAssign("x", y)
local({
y <- 10
x
})
Error in eval(expr, envir, enclos) : object 'y' not found
w <- 10
delayedAssign("z", w)
local({
w <- 11
z
})
[1] 10
Note that delayedAssign creates a promise. In the first example, x is assigned the value of y via a promise in the global environment, but y has not been defined in the global environment. x is accessed in an environment where y has been defined, yet accessing x still results in an error indicating that y does not exist. This demonstrates that x is evaluated in the environment in which the promise was defined, not in the environment where it is accessed.
In the second example, z is assigned the value of w via a promise in the global environment, and w is defined in the global environment. z is then accessed in an environment where w has been assigned a different value, yet z still returns the value of w in the environment where the promise was created.
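delayedAssign() also lets you choose the promise's evaluation environment explicitly through its eval.env argument, which makes the "stored environment" easy to see. A minimal sketch, where e is a hypothetical environment created for the demonstration:

```r
e <- new.env()
assign("y", 10, envir = e)

# Store a promise for y in the current environment, but have it
# evaluate y in e rather than where delayedAssign() was called
delayedAssign("x", y, eval.env = e)

local({
  y <- 99  # ignored: the promise looks up y in e
  x
})  # 10
```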
Passing in the dprime argument as a character string would allow you to sidestep any consideration of the scoping and evaluation rules discussed in Michael's answer:
computeMeans <- function(data, dv, fun) {
x <- aggregate(data[[dv]],
list(
method = data[["method"]],
hypo = data[["hypothesis"]],
pre.group = data[["pre.group"]],
pre.smooth = data[["pre.smooth"]]
),
fun )
return(x)
}
computeMeans(df.basic, "dprime", mean)

Resources