Understanding R function lazy evaluation - r

I'm having a little trouble understanding why, in R, the two functions below, functionGen1 and functionGen2 behave differently. Both functions attempt to return another function which simply prints the number passed as an argument to the function generator.
In the first instance the generated functions fail as a is no longer present in the global environment, but I don't understand why it needs to be. I would've thought it was passed as an argument, and is replaced with aNumber in the namespace of the generator function, and the printing function.
My question is: Why do the functions in the list list.of.functions1 no longer work when a is not defined in the global environment? (And why does this work for the case of list.of.functions2 and even list.of.functions1b)?
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
functionGen2 <- function(aNumber) {
thisNumber <- aNumber
printNumber <- function() {
print(thisNumber)
}
return(printNumber)
}
list.of.functions1 <- list.of.functions2 <- list()
for (a in 1:2) {
list.of.functions1[[a]] <- functionGen1(a)
list.of.functions2[[a]] <- functionGen2(a)
}
rm(a)
# Throws an error "Error in print(aNumber) : object 'a' not found"
list.of.functions1[[1]]()
# Prints 1
list.of.functions2[[1]]()
# Prints 2
list.of.functions2[[2]]()
# However this produces a list of functions which work
list.of.functions1b <- lapply(c(1:2), functionGen1)

A more minimal example:
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#Error in print(aNumber) : object 'a' not found
Your question is not about namespaces (that's a concept related to packages), but about variable scoping and lazy evaluation.
Lazy evaluation means that function arguments are only evaluated when they are needed. Until you call myfun it is not necessary to evaluate aNumber = a. But since a has been removed then, this evaluation fails.
The usual solution is to force evaluation explicitly as you do with your functionGen2 or, e.g.,
functionGen1 <- function(aNumber) {
force(aNumber)
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#[1] 1

Related

Using the '->' operator in R to define a function requires parentheses to work

I was playing with R and its assignment operators a little and I found out that if I want to use the -> operator to write a function into a variable, I need to enclose the definition inside parentheses like so:
(function(x) {
return(x)
}) -> my_func
my_func("hi")
The following doesn't work as expected:
function(x) {
return(x)
} -> my_func
my_func("hi")
The output seems also rather strange to me:
function (x) my_func <- {
return(x) }
Error in eval(expr, envir, enclos): could not find function "my_func" Traceback:
Does anyone know why that is? My guess is that it will have something to do with the precedence and associativity of operators, but I can't put my finger on it...
The documentation for "function" in R show the following usage
function( arglist ) expr
So basically everything after function() will be interpreted as an expression and used as the function body. It just so happens that the assignment operator can be interpreted as part of that expression. Note that this is a valid expression in R
{1 - 2} -> x
So you are basically defining a function that does the assignment. You are not creating by evaluating that statement -- only defining an unnamed function. So for your example
function(x) {
x
} -> my_func
This line nerver creates a variable named my_func. It creates a function that will assign to a variable named my_func but that variable will only exist in the function scope and will not be available after the function finished running.
In your actual example, you used return so that local variable will never actually be created because the return() happens in the code block before the assignment completes. But if you did a global assignment, that would be more clear
if (exists("my_func")) rm(my_func)
exists("my_func")
# [1] FALSE
f <- function(x) {
x
} ->> my_func
f(5)
exists("my_func")
# [1] TRUE
But you can see that if you run, that variable is never created
if (exists("my_func")) rm(my_func)
function(x) {
x
} -> my_func
exists("my_func")
# [1] FALSE
When you use x <- function() {} with a new line after, these is a clear break so the parser knows where the expression ends. Adding in the () also makes is clear to the parse where the function expression ends so R doesn't gobble the assignment into the expression (function body) as well. Functions in R don't require curly braces or explicit return() calls: function(x) x is a valid function definition.
Without parenthesis the assignment -> is in the body of the function. So when you do
function(x) {
return(x)
} -> my_func
it is like
function(x) {
my_func <- {return(x)}
}
You can see that for instance by creating a variable:
a <- function(x) {
return(x)
} -> my_func
debug(a)
a(1)
Then it prompts the function as understood by R:
> a(1)
debugging in: a(1)
debug: my_fun <- {
return(x)
}

Defining a new class of functions in R

So I'm changing the class of some functions that I'm building in R in order to add a description attribute and because I want to use S3 generics to handle everything for me. Basically, I have a structure like
foo <- function(x) x + 1
addFunction <- function(f, description) {
class(f) <- c("addFunction", "function")
attr(f, "description") <- description
f
}
foo <- addFunction(foo, "Add one")
and then I do stuff like
description <- function(x) UseMethod("description")
description.default <- function(x) deparse(substitute(x))
description.addFunction <- function(x) attr(x, "description")
This works fine, but it's not that elegant. I'm wondering if it is possible to define a new class of functions such that instances of this class can be defined in a syntax similar to the function syntax. In other words, is it possible to define addFunction such that foo is generated in the following way:
foo <- addFunction(description = "Add one", x) {
x + 1
}
(or something similar, I have no strong feelings about where the attribute should be added to the function)?
Thanks for reading!
Update: I have experimented a bit more with the idea, but haven't really reached any concrete results yet - so this is just an overview of my current (updated) thoughts on the subject:
I tried the idea of just copying the function()-function, giving it a different name and then manipulating it afterwards. However, this does not work and I would love any inputs on what is happening here:
> function2 <- `function`
> identical(`function`, function2)
[1] TRUE
> function(x) x
function(x) x
> function2(x) x
Error: unexpected symbol in "function2(x) x"
> function2(x)
Error: incorrect number of arguments to "function"
As function() is a primitive function, I tried looking at the C-code defining it for more clues. I was particularly intrigued by the error message from the function2(x) call. The C-code underlying function() is
/* Declared with a variable number of args in names.c */
SEXP attribute_hidden do_function(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP rval, srcref;
if (TYPEOF(op) == PROMSXP) {
op = forcePromise(op);
SET_NAMED(op, 2);
}
if (length(args) < 2) WrongArgCount("function");
CheckFormals(CAR(args));
rval = mkCLOSXP(CAR(args), CADR(args), rho);
srcref = CADDR(args);
if (!isNull(srcref)) setAttrib(rval, R_SrcrefSymbol, srcref);
return rval;
}
and from this, I conclude that for some reason, at least two of the four arguments call, op, args and rho are now required. From the signature of do_function() I am guessing that the four arguments passed to do_function should be a call, a promise, a list of arguments and then maybe an environment. I tried a lot of different combinations for function2 (including setting up to two of these arguments to NULL), but I keep getting the same (new) error message:
> function2(call("sum", 2, 1), NULL, list(x=NULL), baseenv())
Error: invalid formal argument list for "function"
> function2(call("sum", 2, 1), NULL, list(x=NULL), NULL)
Error: invalid formal argument list for "function"
This error message is returned from the C-function CheckFormals(), which I also looked up:
/* used in coerce.c */
void attribute_hidden CheckFormals(SEXP ls)
{
if (isList(ls)) {
for (; ls != R_NilValue; ls = CDR(ls))
if (TYPEOF(TAG(ls)) != SYMSXP)
goto err;
return;
}
err:
error(_("invalid formal argument list for \"function\""));
}
I'm not fluent in C at all, so from here on I'm not quite sure what to do next.
So these are my updated questions:
Why do function and function2 not behave in the same way? Why
do I need to call function2 using a different syntax when they are
deemed identical in R?
What are the proper arguments of function2
such that function2([arguments]) will actually define a function?
Some keywords in R such as if and function have special syntax in the way that the underlying functions get called. It's quite easy to use if as a function if desired, e.g.
`if`(1 == 1, "True", "False")
is equivalent to
if (1 == 1) {
"True"
} else {
"False"
}
function is trickier. There's some help on this at a previous question.
For your current problem here's one solution:
# Your S3 methods
description <- function(x) UseMethod("description")
description.default <- function(x) deparse(substitute(x))
description.addFunction <- function(x) attr(x, "description")
# Creates the pairlist for arguments, handling arguments with no defaults
# properly. Also brings in the description
addFunction <- function(description, ...) {
args <- eval(substitute(alist(...)))
tmp <- names(args)
if (is.null(tmp)) tmp <- rep("", length(args))
names(args)[tmp==""] <- args[tmp==""]
args[tmp==""] <- list(alist(x=)$x)
list(args = as.pairlist(args), description = description)
}
# Actually creates the function using the structure created by addFunction and the body
`%{%` <- function(args, body) {
stopifnot(is.pairlist(args$args), class(substitute(body)) == "{")
f <- eval(call("function", args$args, substitute(body), parent.frame()))
class(f) <- c("addFunction", "function")
attr(f, "description") <- args$description
f
}
# Example. Note that the braces {} are mandatory even for one line functions
foo <- addFunction(description = "Add one", x) %{% {
x + 1
}
foo(1)
#[1] 2

Run code when calling a missing/undefined function or when evaluating a missing symbol?

Is it possible in R to run some code when calling a missing (yet undefined) function or when evaluating an inexistent symbol?
Or: is there any way to load a library in such a situation?
In the end, I would like to have something like this:
autoload.table <- list(foo = source("foo.R"), bar = library("bar"))
foo()
#=> load "foo.R" and evaluate `foo()`
edit:
Building on the solution by #Miff, I came up with this function, which avoids the string mangling:
tAutoload <- function (name, expr) {
cl <- as.list(match.call())
sname <- as.character(cl$name)
if (!exists(sname)) {
assign(sname,
eval(substitute(function (...) {
rm(name)
expr
name(...)
})), envir = .GlobalEnv)
}
}
This can be used as follows:
tAutoload(foo, source("foo.R"))
tAutoload(bar, library("bar"))
Upon first invocation, e.g., foo() will remove itself and then execute the assigned action.
I'm not sure how generally applicable this code is - I think it may not be robust to a different types of argument matching in foo and bar, but how about something like this:
at <- list(foo = 'source("foo.R")', qplot = 'library(ggplot2)') #too lazy to type autoloader.tablF
for (i in 1:length(at))
assign(names(at)[i], eval(parse(text=paste0("function(...){ rm(",names(at)[i],",envir=.GlobalEnv);",at[[i]],"; ",names(at)[i],"(...) }")),envir=.GlobalEnv))
What does that mess do? For each element in the at list, create a function in the global environment, which deletes itself, runs the code from at[[i]], then runs the function again, with the arguments originally used, which should now call the new version loaded. So foo now has the value:
function(...){ rm(foo,envir=.GlobalEnv);source("foo.R"); foo(...) }
Example:
> foo
function(...){ rm(foo,envir=.GlobalEnv);source("foo.R"); foo(...) }
> foo(1)
fooing 1
> foo
function(x) cat("fooing", x, "\n") #Now imported from foo.R
or for qplot:
> qplot
function(...){ rm(qplot,envir=.GlobalEnv);library(ggplot2); qplot(...) }
> qplot(diamonds$cut, diamonds$carat) #produces a plot
> qplot #now prints definition from ggplot2
Create a blank function!
foo <- function(){ }

R function which returns a function... and variable scope

I am learning about functions returning other functions. For example:
foo1 <- function()
{
bar1 <- function()
{
return(constant)
}
}
foo2 <- function()
{
constant <- 1
bar2 <- function()
{
return(constant)
}
}
Suppose, now, I declare functions f1 and f2 as follows:
constant <- 2
f1 <- foo1()
f2 <- foo2()
Then it appears they have the same function definition:
> f1
function()
{
return(constant)
}
<environment: 0x408f048>
> f2
function()
{
return(constant)
}
<environment: 0x4046d78>
>
BUT the two functions are different. For example:
> constant <- 2
> f1()
[1] 2
> f2()
[1] 1
My question: Why is it legal for two functions, with identical function definitions, to produce different results?
I understand that foo1 treats constant as a global variable and foo2 as a constant variable, but it is impossible to tell this from the function definition surely?
(I am probably missing something fundamental.)
Sure they're different, the environments are different. Try ls(environment(f1)) and then ls(environment(f2)) and then get('constant', environment (f1)) and same for f2
Scope in R
Lev's answer is correct. To describe in more detail. When you you call f1 or pass f1 around you also have a reference to the original lexical environment in which the function was defined.
#since R is interpreted.. the variable constant doesn't have to be defined in the lexical environment... this all gets checked and evaluated at runtime
foo1ReturnedThisFunction <- foo1()
#outputs "Error in foo1ReturnedThisFunction() : object 'constant' not found"
foo1ReturnedThisFunction()
#defined the variable constant in the lexical environment
constant <- 5
#outputs 5
foo1ReturnedThisFunction()
in foo2... there is a definition of the variable constant in the "closer" (not sure if this is the right term) lexical environment so it uses that and doesn't look for the variable constant in the "global" environment

Get the attribute of a packaged function from within itself

Suppose we have this functions in a R package.
prova <- function() {
print(attr(prova, 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
'myattr<-' <- function(x, value) {
attr(x, 'myattr') <- value
x
}
myattr <- function(x) attr(x, 'myattr')
So, I install the package and then I test it. This is the result:
prova()
# NULL
# NULL
myattr(prova) <- 'ciao' # setting 'ciao' for 'myattr' attribute
prova()
# NULL
# NULL # Why NULL here ?
myattr(prova)
# [1] "ciao"
attr(prova, 'myattr')
# [1] "ciao"
The question is: how to get the attribute of the function from within itself?
Inside the function itself I cannot get its attribute, as demonstrated by the example.
I suppose that the solution will be of the serie "computing on the language" (match.call()[[1L]], substitute, environments and friends). Am I wrong?
I think that the important point here is that this function is in a package (so, it has its environment and namespace) and I need its attribute inside itself, in the package, not outside.
you can use get with the envir argument.
prova <- function() {
print(attr(get("prova", envir=envir.prova), 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
eg:
envir.prova <- environment()
prova()
# NULL
# NULL
myattr(prova) <- 'ciao'
prova()
# [1] "ciao"
# [1] "ciao"
Where envir.prova is a variable whose value you set to the environment in which prova is defined.
Alternatively you can use get(.. envir=parent.frame()), but that is less reliable as then you have to track the calls too, and ensure against another object with the same name between the target environment and the calling environment.
Update regarding question in the comments:
regarding using parent.frame() versus using an explicit environment name: parent.frame, as the name suggests, goes "up one level." Often, that is exactly where you want to go, so that works fine. And yet, even when your goal is get an object in an environment further up, R searches up the call stack until it finds the object with the matching name. So very often, parent.frame() is just fine.
HOWEVER if there are multiple calls between where you are invoking parent.frame() and where the object is located AND in one of the intermediary environments there exists another object with the same name, then R will stop at that intermediary environment and return its object, which is not the object you were looking for.
Therefore, parent.frame() has an argument n (which defaults to 1), so that you can tell R to begin it's search at n levels back.
This is the "keeping track" that I refer to, where the developer has to be mindful of the number of calls in between. The straightforward way to go about this is to have an n argument in every function that is calling the function in question, and have that value default to 1. Then for the envir argument, you use: get/assign/eval/etc (.. , envir=parent.frame(n=n) )
Then if you call Func2 from Func1, (both Func1 and Func2 have an n argument), and Func2 is calling prova, you use:
Func1 <- function(x, y, ..., n=1) {
... some stuff ...
Func2( <some, parameters, etc,> n=n+1)
}
Func2 <- function(a, b, c, ..., n=1) {
.... some stuff....
eval(quote(prova()), envir=parent.frame(n=n) )
}
As you can see, it is not complicated but it is * tedious* and sometimes what seems like a bug creeps in, which is simply forgetting to carry the n over.
Therefore, I prefer to use a fixed variable with the environment name.
The solution that I found is:
myattr <- function(x) attr(x, 'myattr')
'myattr<-' <- function(x, value) {
# check that x is a function (e.g. the prova function)
# checks on value (e.g. also value is a function with a given precise signature)
attr(x, 'myattr') <- value
x
}
prova <- function(..., env = parent.frame()) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], env)
# print(eval(as.call(c(myattr, this)), env)) # alternative
print(myattr(this))
# print(attr(this, 'myattr')
invisible(TRUE)
}
I want to thank #RicardoSaporta for the help and the clarification about keeping tracks of the calls.
This solution doesn't work when e.g. myattr(prova) <- function() TRUE is nested in func1 while prova is called in func2 (that it's called by func1). Unless you do not properly update its parameter env ...
For completeness, following the suggestion of #RicardoSaporta, I slightly modified the prova function:
prova <- function(..., pos = 1L) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], parent.frame(n = pos)
print(myattr(this))
# ...
}
This way, it works also when nested, if the the correct pos parameter is passed in.
With this modification it is easier to go to fish out the environment in which you set the attribute on the function prova.
myfun1 <- function() {
myattr(prova) <- function() print(FALSE)
myfun2(n = 2)
}
myfun2 <- function(n) {
prova(pos = n)
}
myfun1()
# function() print(FALSE)
# <environment: 0x22e8208>

Resources