Why do wrapper functions not work as expected? - r

Here are four functions; each later one wraps the one before it.
a <- 0
f1 <- function(expr) {
  a1 <- 1
  eval(expr)
}
f2 <- function(expr) {
  a2 <- 2
  f1(expr)
}
f3 <- function(expr) {
  a3 <- 3
  f2(expr)
}
f4 <- function(expr) {
  a4 <- 4
  f3(expr)
}
Run the following experiments:
> f4(a)
0
which works as expected. But if we call
f4(a4)
Error in eval(expr) : object 'a4' not found
> f4(a3)
Error in eval(expr) : object 'a3' not found
...
> f2(a2)
Error in eval(expr) : object 'a2' not found
> f2(a1)
Error in eval(expr) : object 'a1' not found
> f1(a1)
Error in eval(expr) : object 'a1' not found
I inspected the local environment and parent frame of each function body: f3's parent frame is f4's local environment, ... , and f1's parent frame is f2's local environment. Is there a clear explanation of why this happens? And how can I make the code work so that the nested calls (like f3) can find symbols defined further up the call chain (e.g. a4)?

I strongly recommend you spend some time reading Advanced R: Environments.
First of all, when I run f1(a1) I get "object 'a1' not found" as well; not "1" as you get above.
The issue is that by default R resolves variables using the enclosing environment of a function. The enclosing environment of a function is determined when the function is defined, not when the function is called. Therefore it doesn't go up the call chain to resolve variable names. You can explicitly look in a calling parent with the parent.frame() environment, but these environments do not chain together in nested function calls.
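A minimal sketch of that distinction, using hypothetical helper names (inner, outer, defined_in) that are not part of the question. It only illustrates that the enclosing environment is fixed by where a function is defined, while parent.frame() reports who called it:

```r
# Environment in which both functions below are defined.
defined_in <- environment()

inner <- function() {
  caller <- parent.frame()                 # frame of whoever called inner
  enclosing <- parent.env(environment())   # where inner was defined
  list(enclosing = enclosing, caller = caller)
}

outer <- function() {
  res <- inner()
  res$outer_frame <- environment()         # outer's own execution frame
  res
}

res <- outer()
identical(res$enclosing, defined_in)     # TRUE: lookup starts from the definition site
identical(res$caller, res$outer_frame)   # TRUE: the caller is outer's frame
identical(res$enclosing, res$caller)     # FALSE: variables local to outer are not on inner's lookup path
```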
In the same way that get() will look up a variable by walking up the enclosing parent environments, you can make your own function to walk up the calling environments and see which variables are available.
call.get <- function(val) {
  # walk the call stack from the outermost frame inwards
  for (i in 1:sys.nframe()) {
    if (exists(val, envir = sys.frame(i), inherits = FALSE)) {
      return(get(val, envir = sys.frame(i)))
    }
  }
  return(NULL)
}
call.ls <- function(val) {
  # list the names visible in each calling frame
  vars <- lapply(1:sys.nframe(), function(i) ls(envir = parent.frame(i)))
  return(sort(unique(unlist(vars))))
}
Then if you do something like
f1 <- function(expr) {
a1 <- 1
call.ls()
}
f2 <- function(expr) {
a2 <- 2
f1(expr)
}
f3 <- function(expr) {
a3 <- 3
f2(expr)
}
f4 <- function(expr) {
a4 <- 4
f3(expr)
}
f4(1)
You will get
"a1" "a2" "a3" "expr" "FUN" "val" "X"
and you can use
call.get("a3")
to get one of those variables from a parent calling frame.
But another problem you have is you are triggering evaluation of the expr argument when you call the sub-function. When you do
f2 <- function(expr) {
a2 <- 2
f1(expr)
}
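Here f1 no longer sees the expression you originally typed: it receives a promise for the symbol expr, which is resolved back through f2 when it is forced. The unevaluated expression is lost at that point. The easiest way to pass a lazily evaluated argument through is to use "...". Something like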
f1 <- function(...) {
a1 <- 1
expr<-deparse(substitute(...))
call.get(expr)
}
f2 <- function(...) {
a2 <- 2
f1(...)
}
f2(a1)
# [1] 1
f2(a2)
# [1] 2
Otherwise you need to pass the expression along more explicitly with do.call:
f1 <- function(expr) {
a1 <- 1
expr<-deparse(substitute(expr))
call.get(expr)
}
f2 <- function(expr) {
expr<-substitute(expr)
a2 <- 2
do.call(f1, list(expr))
}
f2(a1)
# [1] 1
f2(a2)
# [1] 2
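The loss of the original expression in a plain wrapper, and its survival when forwarded through "...", can be checked directly. In this small sketch show_expr, wrap_plain and wrap_dots are hypothetical names introduced only for illustration:

```r
show_expr <- function(expr) deparse(substitute(expr))

# Plain wrapper: the inner function only sees the symbol `expr`.
wrap_plain <- function(expr) show_expr(expr)

# Wrapper forwarding through `...`: substitute() in the inner function
# recovers the expression written by the original caller.
wrap_dots <- function(...) show_expr(...)

show_expr(a + b)    # "a + b"
wrap_plain(a + b)   # "expr"
wrap_dots(a + b)    # "a + b"
```

Nothing is evaluated here, so a and b do not need to exist.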

Related

Can I access the last computed result before a function 'stop's?

Consider this code:
bad_function <- function() {
# a lot of code
x <- 1
stop("error")
}
tryCatch(bad_function(), error = function(cond) {x})
Obviously, x is not accessible in the error handler. But is there another way to access the value of x without changing bad_function? Alternatively, is there a way to patch bad_function to skip over stop("error") and return x without having to copy all that # a lot of code?
This works if the result you are looking for is named (and you know the name, here x):
bad_function <- function() {
  # a lot of code
  x <- 1
  stop("error")
}
.old_stop <- base::stop  # keep the real stop() so it can be restored afterwards
.new_stop <- function(...) {
  parent.frame()$x  # return x from the calling function instead of signalling an error
}
assignInNamespace("stop", .new_stop, "base")
bad_function()
assignInNamespace("stop", .old_stop, "base")
I still wonder if there are better solutions.
You could assign the value simultaneously to x in the function environment and to another x in an external (say, debug) environment that you defined beforehand.
ev1 <- new.env()
bad_function <- function() {
env <- new.env(parent=baseenv())
# a lot of code
x <- ev1$x <- 1
stop("error")
}
tryCatch(bad_function(), error = function(e) ev1$x)
# [1] 1
The advantage is that .GlobalEnv stays clear (apart from the environment of course).
ls()
# [1] "bad_function" "ev1"
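Another possibility that leaves bad_function untouched is a calling handler: it runs before the stack unwinds, so the frame of the failing function can still be inspected. The following is only a sketch; value_at_error is a hypothetical helper, and it simply returns the value from the innermost active frame that defines the requested name, which may not be the intended frame in deeper call stacks.

```r
bad_function <- function() {
  # a lot of code
  x <- 1
  stop("error")
}

# Hypothetical helper: run f() and, if it fails, return the value that
# `name` had in the innermost active frame defining it (NULL if none).
value_at_error <- function(f, name) {
  found <- NULL
  tryCatch(
    withCallingHandlers(
      f(),
      error = function(cond) {
        # The handler runs while f's frame is still on the stack.
        frames <- sys.frames()
        for (fr in rev(as.list(frames))) {
          if (exists(name, envir = fr, inherits = FALSE)) {
            found <<- get(name, envir = fr, inherits = FALSE)
            break
          }
        }
      }
    ),
    error = function(cond) NULL  # swallow the error after recording the value
  )
  found
}

value_at_error(bad_function, "x")
# [1] 1
```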

Why is this simple function not working?

I first defined a new variable x, then created a function that requires x within its body (not as an argument). See the code below:
x <- c(1,2,3)
f1 <- function() {
  x^2
}
rm(x)
f2 <- function() {
  x <- c(1,2,3)
  f1()
}
f2()
Error in f1() : object 'x' not found
When I removed x and defined a new function f2 that first defines x and then executes f1, I get an "object 'x' not found" error.
I just want to know why this does not work and how I can overcome the problem. I do not want x to be passed as an argument to f1.
Please provide an appropriate title, because I do not know what kind of problem this is.
You could use a closure to make an f1 with the desired properties:
makeF <- function(){
x <- c(1,2,3)
f1 <- function() {
x^2
}
f1
}
f1 <- makeF()
f1() #returns 1 4 9
There is no x in the global scope but f1 still knows about the x in the environment that it was defined in.
In short: you are expecting dynamic scoping but are a victim of R's lexical scoping:
dynamic scoping = free variables are looked up in the environment of the caller, which is determined at run time
lexical scoping = free variables are looked up in the environment where the function was defined, which is determined at "compile time"
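A small sketch of the difference, with hypothetical function names (show_x, show_x_dynamic, caller). R's default lookup is lexical; a lookup in the caller has to be requested explicitly, here via parent.frame():

```r
x <- "defined next to show_x"

# Lexical lookup (R's default): x is taken from where show_x was defined.
show_x <- function() x

# Explicit, opt-in lookup in the caller's frame, emulating dynamic scoping.
show_x_dynamic <- function() get("x", envir = parent.frame())

caller <- function() {
  x <- "local to caller"
  c(lexical = show_x(), dynamic = show_x_dynamic())
}

res <- caller()
res[["lexical"]]   # "defined next to show_x"
res[["dynamic"]]   # "local to caller"
```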
To understand the lookup path of your variable x in the current and parent environments, try this code.
It shows that the two functions do not share the environment in which x is defined in f2, so x can never be found:
# list all parent environments of an environment to show the "search path"
parents <- function(env) {
while (TRUE) {
name <- environmentName(env)
txt <- if (nzchar(name)) name else format(env)
cat(txt, "\n")
if (txt == "R_EmptyEnv") break
env <- parent.env(env)
}
}
x <- c(1,2,3)
f1 <- function() {
print("f1:")
parents(environment())
x^2
}
f1() # works
# [1] "f1:"
# <environment: 0x4ebb8b8>
# R_GlobalEnv
# ...
rm(x)
f2 <- function() {
print("f2:")
parents(environment())
x <- c(1,2,3)
f1()
}
f2() # does not find "x"
# [1] "f2:"
# <environment: 0x47b2d18>
# R_GlobalEnv
# ...
# [1] "f1:"
# <environment: 0x4765828>
# R_GlobalEnv
# ...
Possible solutions:
1. Declare x in the global environment (bad programming style due to lack of encapsulation).
2. Use function parameters (this is what functions are made for).
3. Use a closure if x always has the same value for each call of f1 (not for beginners). See the other answer from @JohnColeman...
I strongly recommend option 2 (add x as a parameter; why do you want to avoid this?).

Accessing variables in closure in R

In the following example, why do f$i and f$get_i() return different results?
factory <- function() {
my_list <- list()
my_list$i <- 1
my_list$increment <- function() {
my_list$i <<- my_list$i + 1
}
my_list$get_i <- function() {
my_list$i
}
my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
f$i # returns 1
The way you code is very similar to the functional paradigm, whereas R is more often used as a scripting language. So unless you know exactly what you are doing, it is bad practice to use <<- or to define functions inside functions.
You can find the explanation in the function environments chapter.
An environment is a space/frame in which your code is executed. Environments can be nested, in the same way functions can.
When a function is created, it has an enclosing environment attached, which can be retrieved with environment().
The function is executed in another environment, the execution environment, following the fresh start principle. The execution environment is a child of the enclosing environment.
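These two properties can be verified with a short sketch. The name make_env is hypothetical and used only for illustration:

```r
# Returns its own execution environment.
make_env <- function() environment()

e1 <- make_env()
e2 <- make_env()

identical(e1, e2)                                  # FALSE: each call gets a fresh execution environment
identical(parent.env(e1), environment(make_env))   # TRUE: it is a child of the enclosing environment
identical(parent.env(e2), environment(make_env))   # TRUE
```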
For example, on my laptop:
> environment()
<environment: R_GlobalEnv>
> environment(f$increment)
<environment: 0x0000000022365d58>
> environment(f$get_i)
<environment: 0x0000000022365d58>
f is an object located in the global environment.
The function increment has the enclosing environment 0x0000000022365d58 attached, the execution environment of the function factory.
I quote from Hadley:
When you create a function inside another function, the enclosing
environment of the child function is the execution environment of the
parent, and the execution environment is no longer ephemeral.
When factory is executed, its execution environment is created with the my_list object in it; this environment becomes the enclosing environment of both functions.
That can be checked with the ls command:
> ls(envir = environment(f$increment))
[1] "my_list"
> ls(envir = environment(f$get_i))
[1] "my_list"
The <<- operator searches the parent environments for the variable being assigned. In this case, the my_list object it finds is the one in the immediately enclosing environment of the function.
So when an increment is made, it is made only in that environment and not in the global one.
You can see this by replacing the increment function with the following:
my_list$increment <- function() {
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i <<- my_list$i + 1
}
It gives me:
> f$increment()
[1] "environment"
<environment: 0x0000000013c18538>
[1] "Parent environment"
<environment: 0x0000000022365d58>
You can use get to access your result once you have stored the environment:
> my_main_env <- environment(f$increment)
> get("my_list", envir = my_main_env)
$i
[1] 2
$increment
function ()
{
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i <<- my_list$i + 1
}
<environment: 0x0000000022365d58>
$get_i
function ()
{
print("environment")
print(environment())
print("Parent environment")
print(parent.env(environment()))
my_list$i
}
<environment: 0x0000000022365d58>
f <- factory()
creates a my_list object with my_list$i = 1 and assigns a copy of it to f. So now f$i = 1.
f$increment()
increments my_list$i only. It does not affect f.
Now
f$get_i()
returns (previously incremented) my_list$i while
f$i
returns unaffected f$i
It's because you used the <<- operator, which modifies the my_list object in the enclosing environment rather than a local copy. If you change your code to
my_list$increment <- function(inverse) {
my_list$i <- my_list$i + 1
}
my_list will be incremented only inside the increment function. So now you get
> f$get_i()
[1] 1
> f$i
[1] 1
Let me add one more line to your code, so we can see what happens inside increment:
my_list$increment <- function(inverse) {
my_list$i <- my_list$i + 1
return(my_list$i)
}
Now, you can see that <- operates only inside increment while <<- operated outside of it.
> f <- factory()
> f$increment()
[1] 2
> f$get_i()
[1] 1
> f$i
[1] 1
Based on comments from @Cath on "value by reference", I was inspired to come up with this.
library(data.table)
factory <- function() {
my_list <- list()
my_list$i <- data.table(1)
my_list$increment <- function(inverse) {
my_list$i[ j = V1:=V1+1]
}
my_list$get_i <- function() {
my_list$i
}
my_list
}
f <- factory()
f$increment()
f$get_i() # returns 2
#    V1
# 1:  2
f$i # also returns 2, because the data.table is modified by reference
#    V1
# 1:  2
f$increment()
f$get_i() # returns 3
#    V1
# 1:  3
f$i # also returns 3
#    V1
# 1:  3
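If the goal is simply for f$i and f$get_i() to stay in sync, reference semantics are also available in base R by returning an environment instead of a list. This is a sketch under that assumption, not what any of the answers above propose; factory_env and self are hypothetical names.

```r
# Variant of factory() that returns an environment, which is never copied.
factory_env <- function() {
  self <- new.env()
  self$i <- 1
  self$increment <- function() {
    self$i <- self$i + 1   # modifies the shared environment in place
    invisible(self$i)
  }
  self$get_i <- function() self$i
  self
}

f <- factory_env()
f$increment()
f$get_i()   # 2
f$i         # 2: same object, so both views agree
```

Because f and self refer to the same environment, no <<- is needed.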

Understanding R function lazy evaluation

I'm having a little trouble understanding why, in R, the two functions below, functionGen1 and functionGen2 behave differently. Both functions attempt to return another function which simply prints the number passed as an argument to the function generator.
In the first instance the generated functions fail as a is no longer present in the global environment, but I don't understand why it needs to be. I would've thought it was passed as an argument, and is replaced with aNumber in the namespace of the generator function, and the printing function.
My question is: Why do the functions in the list list.of.functions1 no longer work when a is not defined in the global environment? (And why does this work for the case of list.of.functions2 and even list.of.functions1b)?
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
functionGen2 <- function(aNumber) {
thisNumber <- aNumber
printNumber <- function() {
print(thisNumber)
}
return(printNumber)
}
list.of.functions1 <- list.of.functions2 <- list()
for (a in 1:2) {
list.of.functions1[[a]] <- functionGen1(a)
list.of.functions2[[a]] <- functionGen2(a)
}
rm(a)
# Throws an error "Error in print(aNumber) : object 'a' not found"
list.of.functions1[[1]]()
# Prints 1
list.of.functions2[[1]]()
# Prints 2
list.of.functions2[[2]]()
# However this produces a list of functions which work
list.of.functions1b <- lapply(c(1:2), functionGen1)
A more minimal example:
functionGen1 <- function(aNumber) {
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#Error in print(aNumber) : object 'a' not found
Your question is not about namespaces (that's a concept related to packages), but about variable scoping and lazy evaluation.
Lazy evaluation means that function arguments are only evaluated when they are needed. Until you call myfun it is not necessary to evaluate aNumber = a. But since a has been removed by then, this evaluation fails.
The usual solution is to force evaluation explicitly as you do with your functionGen2 or, e.g.,
functionGen1 <- function(aNumber) {
force(aNumber)
printNumber <- function() {
print(aNumber)
}
return(printNumber)
}
a <- 1
myfun <- functionGen1(a)
rm(a)
myfun()
#[1] 1
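The timing of argument evaluation can be made visible with a counter. In this sketch calls, noisy_value and make_getter are hypothetical names; the counter records how often the argument expression is actually evaluated:

```r
calls <- 0
noisy_value <- function() {
  calls <<- calls + 1   # count evaluations of the argument expression
  42
}

# Like functionGen1: the argument is not touched until the closure uses it.
make_getter <- function(value) function() value

getter <- make_getter(noisy_value())
calls      # 0: the argument has not been evaluated yet
getter()   # 42: first use forces the promise
calls      # 1
getter()   # 42
calls      # still 1: a promise is evaluated at most once
```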

How can I manage environments to wrap functions and avoid environment issues?

Here are 4 functions which mimic a family of functions from basic to advanced.
f1 <- function(expr,envir) {
eval(expr,envir)
}
f2 <- function(expr) {
expr <- substitute(expr)
f1(expr,parent.frame())
}
f3 <- function(x) {
lapply(1:3, function(i) {
f2(x+i)
})
}
f4 <- function(...) {
f3(...)
}
f1 is the fundamental one, f2 calls f1 with an expression, f3 iteratively calls f2 with x defined in its body frame.
Calling f3 yields
> f3(1)
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
which has no problem because the x is evaluated correctly in parent.frame() of f2. However, if a wrapper function f4 is called, an error occurs:
> f4()
Error in eval(expr, envir, enclos) :
argument "x" is missing, with no default
I inspected the parent frames from inside f2 and found that parent.frame(3) contains x, but it is missing.
Is there a way or good practice to manage the environments so that I can wrap functions like f4 without worrying too much about the parent environment issues?
