Unevaluated argument in R

I am still a novice in R, and still trying to understand lazy evaluation. I have read quite a few threads on SO (R functions that pass on unevaluated arguments to other functions), but I am still not sure.
Question 1:
Here's my code:
f <- function(x = ls()) {
a<-1
#x ##without x
}
f(x=ls())
When I execute this code i.e. f(), nothing returns. Specifically, I don't see the value of a. Why is it so?
Question 2:
Moreover, I do see the value of a in this code:
f <- function(x = ls()) {
a<-1
x ##with x
}
f(x=ls())
When I execute the function by f() I get :
[1] "a" "x"
Why is it so? Can someone please help me?

Question 1
This has nothing to do with lazy evaluation.
A function returns the result of the last statement it executed. In this case the last statement was a <- 1. The result of a <- 1 is 1. You could, for example, do b <- a <- 1, which would result in b being equal to 1. So, in this case your function returns 1.
> f <- function(x = ls()) {
+ a<-1
+ }
> b <- f(x=ls())
> print(b)
[1] 1
The argument x is nowhere used, and so doesn't play any role.
Functions can return values visibly (the default) or invisibly. In order to return invisibly the function invisible can be used. An example:
> f1 <- function() {
+ 1
+ }
> f1()
[1] 1
>
> f2 <- function() {
+ invisible(1)
+ }
> f2()
>
In this case f2 doesn't seem to return anything. However, it still returns the value 1. What invisible does is suppress printing when the function is called and the result is not assigned to anything. The relevance to your example is that a <- 1 also returns invisibly. That is the reason that your function doesn't seem to return anything. But when assigned to b above, b still gets the value 1.
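To make the invisible return value observable, here is a minimal sketch using withVisible() from base R, applied to the function from the question. The name res is only illustrative.

```r
# The question's first function: its last statement is an assignment
f <- function(x = ls()) {
  a <- 1
}

# withVisible() returns the value together with its visibility flag
res <- withVisible(f())
res$value    # 1
res$visible  # FALSE

# Wrapping the call in parentheses makes the value visible again
(f())        # prints [1] 1
```

Parentheses are a common shortcut at the console when you want to assign and print in one step.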
Question 2
First, I'll explain why you see the results you see. The a you see in your result was caused by some previous code. If we first clean the workspace, we only see f. This makes sense, as we create a variable f (a function is also a variable in R) and then call ls().
> rm(list = ls())
>
> f <- function(x = ls()) {
+ a<-1
+ x
+ }
> f(x=ls())
[1] "f"
What this code does (at least what you would expect) is first list all variables with ls(), then pass the result to the function as x. The function then returns x, which is the list of all variables, which then gets printed.
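One detail worth checking against the question: the output "a" "x" is what a call with no argument, f(), produces. A default argument is evaluated inside the function's own frame, whereas a supplied argument such as f(x = ls()) is evaluated in the caller's frame. The sketch below illustrates the difference; caller and only_here are hypothetical names used to keep the example independent of whatever is in your workspace.

```r
f <- function(x = ls()) {
  a <- 1
  x
}

# Default argument: ls() runs in f's own frame, which holds a and x
default_result <- f()
default_result   # "a" "x"

# Supplied argument: ls() runs in the frame of the caller
caller <- function() {
  only_here <- 0
  f(x = ls())
}
supplied_result <- caller()
supplied_result  # "only_here"
```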
How this can be modified to show lazy evaluation at work
> rm(list = ls())
>
> f <- function(x) {
+ a <<- 1
+ x
+ }
>
> f(x = ls())
[1] "a" "f"
>
In this case the global assignment is used (a <<- 1), which creates a new variable a in the global workspace (not something you normally want to do).
In this case, one would still expect the result of the function call to be just f. The fact that it also shows a is caused by lazy evaluation.
Without lazy evaluation, it would first evaluate ls() (at that time only f exists in the workspace), copy that into the function with the name x. The function then returns x. In this case the ls() is evaluated before a is created.
However, with lazy evaluation, the expression ls() is only evaluated when the result of the expression is needed. In this case that is when x is evaluated on the last line of the function. By then the global environment has changed (a has been created), which means that ls() also shows a.
(This is also one of the reasons why you don't want functions to change the global workspace using <<-.)
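The order of events under lazy evaluation can be made visible with a small sketch that records when the argument expression actually runs. The names events, record, g and h are illustrative only, and <<- is used here purely to keep the log short.

```r
events <- character(0)
record <- function(msg) events <<- c(events, msg)

g <- function(x) {
  record("body started")
  x   # the promise for x is forced here, not at call time
}

value <- g({ record("argument evaluated"); 42 })
events  # "body started" "argument evaluated"

# An argument that is never used is never evaluated at all
h <- function(x) "done"
h(stop("never evaluated"))  # returns "done" without raising the error
```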

Related

assigning delayed variables in R

I've just read about delayedAssign(), but the way you have to do it is by passing the name of the delayed variable as the first parameter. Is there a way to do it via direct assignment?
e.g.:
x <- delayed_variable("Hello World")
rather than
delayedAssign("x","Hello World")
I want to create a variable that will throw an error if accessed (use-case is obviously more complex), so for example:
f <- function(x){
y <- delayed_variable(stop("don't use y"))
x
}
f(10)
> 10
f <- function(x){
y <- delayed_variable(stop("don't use y"))
y
}
f(10)
> Error in f(10) : don't use y
No, you can't do it that way. Your example would be fine with the current setup, though:
f <- function(x){
delayedAssign("y", stop("don't use y"))
y
}
f(10)
which gives exactly the error you want. The reason for this limitation is that delayed_variable(stop("don't use y")) would create a value which would trigger the error when evaluated, and assigning it to y would evaluate it.
Another version of the same thing would be
f <- function(x, y = stop("don't use y")) {
...
}
Internally it's very similar to the delayedAssign version.
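As a small sketch of the delayedAssign() approach, the function below only raises the error on the code path that actually touches y. The use_y parameter and the result name are assumptions added for illustration.

```r
f <- function(use_y) {
  # The stop() call is stored as a promise and not run yet
  delayedAssign("y", stop("don't use y"))
  if (use_y) y else "y was never touched"
}

f(FALSE)  # "y was never touched"

# Touching y forces the promise and raises the error
result <- tryCatch(f(TRUE), error = function(e) conditionMessage(e))
result    # "don't use y"
```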
I reached a solution using makeActiveBinding() which works provided it is being called from within a function (so it doesn't work if called directly and will throw an error if it is). The main purpose of my use-case is a smaller part of this, but I generalised the code a bit for others to use.
Importantly for my use-case, this function can allow other functions to use delayed assignment within functions and can also pass R CMD Check with no Notes.
Here is the function and it gives the desired outputs from my question.
delayed_variable <- function(call){
#Get the current call
prev.call <- sys.call()
attribs <- attributes(prev.call)
# If srcref isn't there, then we're not coming from a function
if(is.null(attribs) || !"srcref" %in% names(attribs)){
stop("delayed_variable() can only be used as an assignment within a function.")
}
# Extract the call including the assignment operator
this_call <- parse(text=as.character(attribs$srcref))[[1]]
# Check if this is an assignment `<-` or `=`
if(!(identical(this_call[[1]],quote(`<-`)) ||
identical(this_call[[1]],quote(`=`)))){
stop("delayed_variable() can only be used as an assignment within a function.")
}
# Get the variable being assigned to as a symbol and a string
var_sym <- this_call[[2]]
var_str <- deparse(var_sym)
#Get the parent frame that we will be assigning into
p_frame <- parent.frame()
var_env <- new.env(parent = p_frame)
#Create a random string to be an identifier
var_rand <- paste0(sample(c(letters,LETTERS),50,replace=TRUE),collapse="")
#Put the variables into the environment
var_env[["p_frame"]] <- p_frame
var_env[["var_str"]] <- var_str
var_env[["var_rand"]] <- var_rand
# Create the function that will be bound to the variable.
# Since this is an Active Binding (AB), we have three situations
# i) It is run without input, and thus the AB is
# being called on its own (missing(input)),
# and thus it should evaluate and return the output of `call`
# ii) It is being run as the lhs of an assignment
# as part of the initial assignment phase, in which case
# we do nothing (i.e. input is the output of this function)
# iii) It is being run as the lhs of a regular assignment,
# in which case, we want to overwrite the AB
fun <- function(input){
if(missing(input)){
# No assignment: variable is being called on its own
# So, we activate the delayed assignment call:
res <- eval(call,p_frame)
rm(list=var_str,envir=p_frame)
assign(var_str,res,p_frame)
res
} else if(!inherits(input,"assign_delay") &&
input != var_rand){
# Attempting to assign to the variable
# and it is not the initial definition
# So we overwrite the active binding
res <- eval(substitute(input),p_frame)
rm(list=var_str,envir=p_frame)
assign(var_str,res,p_frame)
invisible(res)
}
# Else: We are assigning and the assignee is the output
# of this function, in which case, we do nothing!
}
#Fix the call in the above eval to be the exact call
# rather than a variable (useful for debugging)
# This is in the line res <- eval(call,p_frame)
body(fun)[[c(2,3,2,3,2)]] <- substitute(call)
#Put the function inside the environment with
# all of the variables above
environment(fun) <- var_env
# Check if the variable already exists in the calling
# environment and if so, remove it
if(exists(var_str,envir=p_frame)){
rm(list=var_str,envir=p_frame)
}
# Create the AB
makeActiveBinding(var_sym,fun,p_frame)
# Return a specific object to check for
structure(var_rand,call="assign_delay")
}
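For the narrower goal of a variable that throws when accessed, a much smaller sketch with makeActiveBinding() may be enough. This is not the generalised function above, only an illustration of the underlying mechanism; the touch_y parameter is hypothetical.

```r
f <- function(x, touch_y = FALSE) {
  # Bind y to a function that is called every time y is read
  makeActiveBinding("y", function() stop("don't use y"), environment())
  if (touch_y) y else x
}

f(10)  # 10

msg <- tryCatch(f(10, touch_y = TRUE), error = function(e) conditionMessage(e))
msg    # "don't use y"
```

Unlike a promise created by delayedAssign(), an active binding runs its function on every access rather than caching the first result.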

How can one make visible the difference in the outputs of quote() and substitute()?

As applied to the same R code or objects, quote and substitute typically return different objects. How can one make this difference apparent?
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
> tmc <- function(X){
out <- list(typ = typeof(X), mod = mode(X), cls = class(X))
out
}
> df1 <- data.frame(a = 1, b = 2)
Here the printed output of quote and substitute are the same.
> quote(df1)
df1
> substitute(df1)
df1
And the structure of the two are the same.
> str(quote(df1))
symbol df1
> str(substitute(df1))
symbol df1
And the type, mode and class are all the same.
> tmc(quote(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
> tmc(substitute(df1))
$typ
[1] "symbol"
$mod
[1] "name"
$cls
[1] "name"
And yet, the outputs are not the same.
> is.identical(df1)
[1] FALSE
Note that this question shows some inputs that cause the two functions to display different outputs. However, the outputs are different even when they appear the same, and are the same by most of the usual tests, as shown by the output of is.identical() above. What is this invisible difference, and how can I make it appear?
note on the tags: I am guessing that the Common LISP quote and the R quote are similar
The reason is that the behavior of substitute() is different based on where you call it, or more precisely, what you are calling it on.
Understanding what will happen requires a very careful parsing of the (subtle) documentation for substitute(), specifically:
Substitution takes place by examining each component of the parse tree
as follows: If it is not a bound symbol in env, it is unchanged. If it
is a promise object, i.e., a formal argument to a function or
explicitly created using delayedAssign(), the expression slot of the
promise replaces the symbol. If it is an ordinary variable, its value
is substituted, unless env is .GlobalEnv in which case the symbol is
left unchanged.
So there are essentially three options.
In this case:
> df1 <- data.frame(a = 1, b = 2)
> identical(quote(df1),substitute(df1))
[1] TRUE
df1 is an "ordinary variable", but substitute() is called in .GlobalEnv, since the env argument defaults to the current evaluation environment. Hence we're in the very last case, where the symbol df1 is left unchanged, and so it is identical to the result of quote(df1).
In the context of the function:
is.identical <- function(X){
out <- identical(quote(X), substitute(X))
out
}
The important distinction is that now we're calling these functions on X, not df1. For most R users, this is a silly, trivial distinction, but when playing with subtle tools like substitute it becomes important. X is a formal argument of a function, so that implies we're in a different case of the documented behavior.
Specifically, it says that now "the expression slot of the promise replaces the symbol". We can see what this means if we debug() the function and examine the objects in the context of the function environment:
> debugonce(is.identical)
> is.identical(X = df1)
debugging in: is.identical(X = df1)
debug at #1: {
out <- identical(quote(X), substitute(X))
out
}
Browse[2]>
debug at #2: out <- identical(quote(X), substitute(X))
Browse[2]> str(quote(X))
symbol X
Browse[2]> str(substitute(X))
symbol df1
Browse[2]> Q
Now we can see that what happened is precisely what the documentation said would happen (Ha! So obvious! ;) )
X is a formal argument, or a promise, which according to R is not the same thing as df1. For most people writing functions they are effectively the same, but the internal implementation disagrees. X is a promise object, and substitute replaces the symbol X with the expression it "points to", namely df1. This is what the docs mean by the "expression slot of the promise": it is what R sees in the X = df1 part of the function call.
To round things out, try to guess what will happen in this case:
is.identical <- function(X){
out <- identical(quote(A), substitute(A))
out
}
is.identical(X = df1)
(Hint: now A is not a "bound symbol in the environment".)
A final example illustrating more directly the final case in the docs with the confusing exception:
#Ordinary variable, but in .GlobalEnv
> a <- 2
> substitute(a)
a
#Ordinary variable, but NOT in .GlobalEnv
> e <- new.env()
> e$a <- 2
> substitute(a,env = e)
[1] 2
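The three documented cases can also be shown without the debugger by returning both results from inside a function. This is a sketch; show_both, unbound and res are illustrative names, and df1 does not even need to exist because the argument is never evaluated.

```r
# Promise case: quote() keeps the formal's name, substitute() recovers
# the expression the caller supplied
show_both <- function(X) {
  list(quoted = quote(X), substituted = substitute(X))
}
res <- show_both(df1)
res$quoted       # X
res$substituted  # df1

# Unbound symbol case: A is not bound in the function's frame,
# so substitute() leaves it unchanged
unbound <- function(X) identical(quote(A), substitute(A))
unbound(1)       # TRUE

# Ordinary variable in a non-global environment: the value is substituted
e <- new.env()
e$a <- 2
substitute(a + 1, e)  # 2 + 1
```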

Using the same argument names for a function defined inside another function

Why does
f <- function(a) {
g <- function(a=a) {
return(a + 2)
}
return(g())
}
f(3) # Error in a + 2: 'a' is missing
cause an error? It has something to do with the a=a argument, particularly with the fact that the variable names are the same. What exactly is going on?
Here are some similar pieces of code that work as expected:
f <- function(a) {
g <- function(a) {
return(a + 2)
}
return(g(a))
}
f(3) # 5
f <- function(a) {
g <- function(g_a=a) {
return(g_a + 2)
}
return(g())
}
f(3) # 5
g <- function(a) a + 2
f <- function(a) g(a)
f(3) # 5
The problem is that, as explained in the R language definition:
The default arguments to a function are evaluated in the evaluation frame of the function.
In your first code block, when you call g() without any arguments, it falls back on its default value of a, which is a. Evaluating that in the "frame of the function" (i.e. the environment created by the call to g()), it finds an argument whose name matches the symbol a, and its value is a. When it looks for the value of that a, it finds an argument whose name matches that symbol, and whose value is a. When...
As you can see, you're stuck in a loop, which is what the error message is trying to tell you:
Error in g() :
promise already under evaluation: recursive default argument reference or
earlier problems?
Your second attempt, which calls g(a) works as you expected, because you've supplied an argument, and, as explained in the same section of R-lang:
The supplied arguments to a function are evaluated in the evaluation frame of the calling function.
There it finds a symbol a, which is bound to whatever value you passed in to the outer function's formal argument a, and all is well.
The problem is the a=a part. An argument can't be its own default. That is a circular reference.
This example may help clarify how it works:
x <- 1
f <- function(a = x) { x <- 2; a }
f()
## [1] 2
Note that a does not have the default 1; it has the default 2. It looks first in the function itself for the default. In a similar way a=a would cause a to be its own default which is circular.
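A compact sketch of the failing and working variants side by side; f_bad, f_ok and the inner argument name b are illustrative. The exact wording of the error may differ between R versions, so the example only checks that an error is raised.

```r
# Self-referential default: forcing a requires a, which is circular
f_bad <- function(a) {
  g <- function(a = a) a + 2
  g()
}
outcome <- tryCatch(f_bad(3), error = function(e) "error")
outcome  # "error"

# Renaming the inner argument breaks the cycle: the default a is now
# looked up in g's frame, not found there, and resolved in f_ok's frame
f_ok <- function(a) {
  g <- function(b = a) b + 2
  g()
}
f_ok(3)  # 5
```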

Getting the parse tree for a predefined function in R

I feel as if this is a fairly basic question, but I can't figure it out.
If I define a function in R, how do I later use the name of the function to get its parse tree? I can't just use substitute, as that will just return the parse tree of its argument, in this case just the function name.
For example,
> f <- function(x){ x^2 }
> substitute(f)
f
How should I access the parse tree of the function using its name? For example, how would I get the value of substitute(function(x){ x^2 }) without explicitly writing out the whole function?
I'm not exactly sure which of these meets your desires:
eval(f)
#function(x){ x^2 }
identical(eval(f), get("f"))
#[1] TRUE
identical(eval(f), substitute( function(x){ x^2 }) )
#[1] FALSE
deparse(f)
#[1] "function (x) " "{" " x^2" "}"
body(f)
#------
{
x^2
}
#---------
eval(parse(text=deparse(f)))
#---------
function (x)
{
x^2
}
#-----------
parse(text=deparse(f))
#--------
expression(function (x)
{
x^2
})
#--------
get("f")
# function(x){ x^2 }
The print representation may not display the full features of the values returned.
class(substitute(function(x){ x^2 }) )
#[1] "call"
class( eval(f) )
#[1] "function"
The function substitute can substitute in values bound to an environment. The odd thing is that its env argument does not possess a default value, but it defaults to the evaluation environment. This behavior seems to make it fail when the evaluation environment is the global environment, but works fine otherwise.
Here is an example:
> a = new.env()
> a$f = function(x){x^2}
> substitute(f, a)
function(x){x^2}
> f = function(x){x^2}
> environment()
<environment: R_GlobalEnv>
> substitute(f, environment())
f
> substitute(f, globalenv())
f
As demonstrated, when using the global environment as the second argument the functionality fails.
A further demonstration that it works correctly using a but not the global environment:
> evalq(substitute(f), a)
function(x){x^2}
> evalq(substitute(f), environment())
f
Quite puzzling.
Apparently that's indeed a quirk of substitute, and it is mentioned in this comment on do_substitute:
/* do_substitute has two arguments, an expression and an
environment (optional). Symbols found in the expression are
substituted with their values as found in the environment. There is
no inheritance so only the supplied environment is searched. If no
environment is specified the environment in which substitute was
called is used. If the specified environment is R_GlobalEnv it is
converted to R_NilValue, for historical reasons. In substitute(),
R_NilValue signals that no substitution should be done, only
extraction of promise expressions. Arguments to do_substitute
should not be evaluated.
*/
And you have already found a way of circumventing it:
e = new.env()
e$fn = f
substitute(fn, e)
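If the goal is to inspect or rebuild the pieces of the function, body() and formals() from base R give direct access to them. The sketch below walks the body and then reassembles an unevaluated call to `function`; fn_call and rebuilt are illustrative names, and the rebuilt call is not claimed to be identical() to substitute(function(x){ x^2 }), which may also carry source references.

```r
f <- function(x){ x^2 }

# The body is a call to `{`; its second element is the expression x^2
expr <- body(f)[[2]]
expr            # x^2
as.list(expr)   # `^`, x, 2

# The formals are a pairlist keyed by argument name
names(formals(f))  # "x"

# Reassemble an unevaluated `function` call from the two parts
fn_call <- as.call(list(as.name("function"), formals(f), body(f)))
class(fn_call)  # "call"

# Evaluating that call yields a working function again
rebuilt <- eval(fn_call)
rebuilt(3)      # 9
```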

R specify function environment

I have a question about function environments in the R language.
I know that every time a function is called in R, a new environment E
is created in which the function body is executed. The parent link of
E points to the environment in which the function was created.
My question: Is it possible to specify the environment E somehow, i.e., can one
provide a certain environment in which function execution should happen?
A function has an environment that can be changed from outside the function, but not inside the function itself. The environment is a property of the function and can be retrieved/set with environment(). A function has at most one environment, but you can make copies of that function with different environments.
Let's set up some environments with values for x.
x <- 0
a <- new.env(); a$x <- 5
b <- new.env(); b$x <- 10
and a function foo that uses x from the environment
foo <- function(a) {
a + x
}
foo(1)
# [1] 1
Now we can write a helper function that we can use to call a function with any environment.
with_env <- function(f, e=parent.frame()) {
stopifnot(is.function(f))
environment(f) <- e
f
}
This actually returns a new function with a different environment assigned (or it uses the calling environment if unspecified) and we can call that function by just passing parameters. Observe
with_env(foo, a)(1)
# [1] 6
with_env(foo, b)(1)
# [1] 11
foo(1)
# [1] 1
Here's another approach to the problem, taken directly from http://adv-r.had.co.nz/Functional-programming.html
Consider the code
new_counter <- function() {
i <- 0
function() {
i <<- i + 1
i
}
}
(Updated to improve accuracy)
The outer function creates an environment, which is saved as a variable. Calling this variable (a function) effectively calls the inner function, which updates the environment associated with the outer function. (I don't want to directly copy Wickham's entire section on this, but I strongly recommend that anyone interested read the section entitled "Mutable state".) I suspect you could get fancier than this. For example, here's a modification with a reset option:
new_counter <- function() {
i <- 0
function(reset = FALSE) {
if(reset) i <<- 0
i <<- i + 1
i
}
}
counter_one <- new_counter()
counter_one()
counter_one()
counter_two <- new_counter()
counter_two()
counter_two()
counter_one(reset = TRUE)
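To see that each counter carries its own private environment, one can inspect it with environment(). This is a sketch using the simpler counter without the reset option; c1 and c2 are illustrative names.

```r
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1
    i
  }
}

c1 <- new_counter()
c2 <- new_counter()
c1()
c1()
c2()

# Each closure has its own enclosing environment with its own i
environment(c1)$i  # 2
environment(c2)$i  # 1
identical(environment(c1), environment(c2))  # FALSE
```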
I am not sure I completely follow the goal of the question. But one can set the environment that a function executes in, modify the objects in that environment and then reference them from the global environment. Here is an illustrative example, but again I do not know if this answers the questioner's question:
e <- new.env()
e$a <- TRUE
testFun <- function(){
print(a)
}
testFun()
Results in: Error in print(a) : object 'a' not found
testFun2 <- function(){
e$a <- !(a)
print(a)
}
environment(testFun2) <- e
testFun2()
Returns: FALSE
e$a
Returns: FALSE
