R: get the environment created by a function call

I would like to get the environment created by a function when it is run, WITHOUT modifying the function's source (i.e. from outside the function). Is this possible?
fn = function()
{
  # A new environment is created at each call; how can I get it?
  # It can be accessed with environment(), but only (as far as I know)
  # from inside the function.
  ...
}
I would like something like this:
env=some_function(fn())
where env is the environment id created by fn at the call.

You could trace the function to bind the call environment to a symbol in the global environment:
fn <- function() {x <- 2; 1}
trace(fn, quote(efn <<- environment()), at = 1)
fn()
#Tracing fn() step 1
#[1] 1
untrace(fn)
efn$x
#[1] 2
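If tracing feels too intrusive, another option is to work on a modified copy of the function and leave the original untouched. The following is only a sketch under stated assumptions, not part of the answer above: capture_env and store are hypothetical names, and it assumes fn is an ordinary closure rather than a primitive.

```r
# Hypothetical helper (not from the answer above): returns a copy of `f`
# that records its call environment in `store` every time it runs.
capture_env <- function(f) {
  store <- new.env()
  g <- f
  # Prepend a statement that saves the environment of the running call.
  body(g) <- bquote({
    assign("env", environment(), envir = .(store))
    .(body(f))
  })
  list(fn = g, store = store)
}

fn <- function() {x <- 2; 1}
cap <- capture_env(fn)
result <- cap$fn()        # runs the copy; the original fn is unchanged
result                    # 1
captured_x <- cap$store$env$x
captured_x                # 2
```

Because an environment is a reference object, the saved environment still reflects assignments made after the capturing statement ran, which is why x is visible once the call has finished.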

Related

R: environment diagram for decorator function

I want to draw an environment diagram for the following code which contains an error to understand how R works exactly when evaluating a function.
# emphasize text
emph <- function(f, style = '**') {
  function(...) {
    if (length(style) == 1) {
      paste(style, f(...), style)
    } else {
      paste(style[1], f(...), style[2])
    }
  }
}
# function to be decorated
tmbg <- function() {
  'tmbg are okay'
}
# a decorator function with self-referencing name
tmbg <- emph(tmbg)
I got an error when evaluating the call to the decorated function:
tmbg()
> Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
I could understand this is related to the lazy evaluation of function parameter in R. It feels like when evaluating tmbg() in global frame, the name of f used in the returned anonymous function binds again to tmbg in global frame which again returns the anonymous function and calls f, thus leads to infinite recursive calls. But this image is not so clear to me because I don't exactly know what is the evaluation model used in R especially with this "lazy evaluation".
Below I draw the essential parts of the environment diagrams and explain the evaluation rule used in Python for the equivalent code. I hope to get such environment diagrams for R as well, or at least get the same level of clarity for the environmental model used in R.
# This is the equivalent python code
def emph(f, style = ['**']):
    def wrapper(*args):
        if len(style) == 1:
            return style[0] + f(*args) + style[0]
        else:
            return style[0] + f(*args) + style[1]
    return wrapper
def tmbg():
    return 'tmbg are okay'
tmbg = emph(tmbg)
tmbg()
When evaluating the assignment statement at line 12 tmbg = emph(tmbg), the call expression emph(tmbg) needs to be evaluated first. When evaluating the operator of the call expression, its formal parameter f binds to name tmbg in global frame which binds to a function we defined in global frame, as shown in the picture below.
Next, once the call expression emph(tmbg) has been evaluated, the function wrapper that it returns is bound to the name tmbg in the global frame. However, the binding between f and the original function tmbg is still held in the local frame created by emph (f1 in the diagram below).
Therefore, when evaluating tmbg() in the global frame, there is no confusion about which is the decorator function (tmbg in global) and which is the function being decorated (f in the local frame). This is where R differs.
It looks as if R changes the binding from f -> function tmbg() to f -> name tmbg in the global frame, which in turn is bound to function wrapper(*args) calling f itself, and this leads to infinite recursion. But it might also be a completely different model in which R does not bind f to any object, only to the name tmbg, ignoring what that name represents. When it starts to evaluate, it looks up the name tmbg, finds the global one created by tmbg <- emph(tmbg), and recurses infinitely. This sounds really weird, though, because the local scope created by the function call would no longer count (or would only partially count) for the purpose of "lazy evaluation" as soon as we pass an expression as an argument of that function. There would then have to be a system, running in parallel to the environments created by function calls, that manages the namespaces and scopes.
In either case, the environment model and evaluation rules of R are not clear to me. I want to understand them and, if possible, draw an environment diagram for the R code that is as clear as the one below.
The problem is not understanding environments. The problem is understanding lazy evaluation.
Due to lazy evaluation, f is just a promise that is not evaluated until the anonymous function is run, and by that time tmbg has been redefined. To make f be evaluated when emph runs, add the force(f) statement marked ### below. No other lines are changed.
In terms of environments the anonymous function gets f from emph and in emph f is a promise which is not looked up in the caller until the anonymous function is run unless we add the force statement.
emph <- function(f, style = '**') {
  force(f) ###
  function(...) {
    if (length(style) == 1) {
      paste(style, f(...), style)
    } else {
      paste(style[1], f(...), style[2])
    }
  }
}
# function to be decorated
tmbg <- function() {
  'tmbg are okay'
}
# a decorator function with self-referencing name
tmbg <- emph(tmbg)
tmbg()
## [1] "** tmbg are okay **"
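The same effect can be seen in a smaller setting without decorators. The sketch below is an illustration added here, not taken from the answer; make_adder_lazy, make_adder_forced and k are hypothetical names chosen for the example.

```r
# Without force(): n stays an unevaluated promise until the inner
# function first uses it.
make_adder_lazy <- function(n) {
  function(x) x + n
}

# With force(): n is evaluated while make_adder_forced is still running.
make_adder_forced <- function(n) {
  force(n)
  function(x) x + n
}

k <- 1
add_lazy   <- make_adder_lazy(k)
add_forced <- make_adder_forced(k)

k <- 10                       # rebind k before either adder is called

lazy_result   <- add_lazy(1)    # promise evaluated now, sees k = 10
forced_result <- add_forced(1)  # n was fixed at 1 when the adder was made
lazy_result                   # 11
forced_result                 # 2
```

Once a promise has been evaluated its value is cached, so later changes to k no longer affect add_lazy either.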
We can look at the promise using the pryr package.
library(pryr)
emph <- function(f, style = '**') {
  str(promise_info(f))
  force(f)
  cat("--\n")
  str(promise_info(f))
  function(...) {
    if (length(style) == 1) {
      paste(style, f(...), style)
    } else {
      paste(style[1], f(...), style[2])
    }
  }
}
# function to be decorated
tmbg <- function() {
  'tmbg are okay'
}
tmbg <- emph(tmbg)
which results in this output that shows that f is at first unevaluated but after force is invoked it contains the value of f. Had we not used force the anonymous function would have accessed f in the state shown in the first promise_info() output so all it would know is a symbol tmbg and where to look for it (Global Environment).
List of 4
$ code : symbol tmbg
$ env :<environment: R_GlobalEnv>
$ evaled: logi FALSE
$ value : NULL
--
List of 4
$ code : symbol tmbg
$ env : NULL
$ evaled: logi TRUE
$ value :function ()
..- attr(*, "srcref")= 'srcref' int [1:8] 1 13 3 5 13 5 1 3
.. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x00000000102c3730>

Modifying calls in function arguments

How can a function inspect and modify the arguments of a call that it received as argument?
Application: A user feeds a call to function a as an argument to function b, but they forget to specify one of the required arguments of a. How can function b detect the problem and fix it?
In this minimal example, function a requires two arguments:
a <- function(arg1, arg2) {
  return(arg1 + arg2)
}
Function b accepts a call and an argument. The commented lines indicate what I need to do:
b <- function(CALL, arg3) {
  # 1. check if `arg2` is missing from CALL
  # 2. if `arg2` is missing, plug `arg3` in its place
  # 3. return evaluated call
  CALL
}
Expected behavior:
b(CALL = a(arg1 = 1, arg2 = 2), arg3 = 3)
> 3
b(CALL = a(arg1 = 1), arg3 = 3)
> 4
The second call currently fails because the user forgot to specify the required arg2 argument. How can function b fix this mistake automatically?
Can I exploit lazy evaluation to modify the call to a before it is evaluated? I looked into rlang::modify_call but couldn't figure it out.
Here's a method that would work
b <- function(CALL, arg3) {
  scall <- substitute(CALL)
  stopifnot(is.call(scall)) #check that it's a call
  lcall <- as.list(scall)
  if (!"arg2" %in% names(lcall)) {
    lcall <- c(lcall, list(arg2 = arg3))
  }
  eval.parent(as.call(lcall))
}
We use substitute() to grab the unevaluated version of the CALL parameter. We convert it to a list so we can modify it. Then we append to the list another list with the parameter name/value we want. Finally, we turn the list back into a call and evaluate that call in the environment of the caller rather than in the function body itself.
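To make the list round trip described above concrete, here is a small sketch run outside of b. It is an added illustration: the variable names are hypothetical, and quote() stands in for what substitute() would return inside the function. The last lines show one possible way to handle positional arguments, which carry no names and would therefore slip past a names-only check.

```r
a <- function(arg1, arg2) arg1 + arg2

cl <- quote(a(arg1 = 1))            # an unevaluated call
parts <- as.list(cl)                # function name first, then the arguments
part_names <- names(parts)          # "" "arg1"
parts <- c(parts, list(arg2 = 3))   # append the missing argument
new_cl <- as.call(parts)            # back to a call
new_cl_text <- deparse(new_cl)      # "a(arg1 = 1, arg2 = 3)"
result <- eval(new_cl)              # 4

# Positional arguments have no names; match.call() can normalise the
# call against a's formal arguments before checking for arg2.
full <- match.call(a, quote(a(1, 2)))
full_names <- names(full)           # "" "arg1" "arg2"
```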
If you wanted to use rlang instead (the function is rlang::call_modify, not modify_call), you could use
b <- function(CALL, arg3) {
  scall <- rlang::enquo(CALL)
  stopifnot(rlang::quo_is_call(scall))
  if (!"arg2" %in% names(rlang::quo_get_expr(scall))) {
    scall <- rlang::call_modify(scall, arg2=arg3)
  }
  rlang::eval_tidy(scall, env = rlang::caller_env())
}
I don't see why fancy language manipulation is needed. The problem is what to do when a, which requires 2 arguments, is supplied only 1. Wrapping it with b, which has a default value for the 2nd argument, solves this.
b <- function(arg1, arg2=42)
{
  a(arg1, arg2)
}
b(1)
# [1] 43
b(1, 2)
# [1] 3

Reading global variables

All I can find is how to write to global variables, but not how to read them.
Example of incorrect code:
v = 0;
test <- function(v) {
  v ->> global_v;
  v <<- global_v + v;
}
test(1);
print(v);
This yields 2 because v ->> global_v treats v as the local variable v which is equal to 1. What can I replace that line with for global_v to get the 0 from the global v?
I'm asking of course about solutions different to "use different variable names".
You can use with(globalenv(), v) to evaluate v in the global environment rather than the function. with constructs an environment from its first argument, and evaluates the subsequent arguments in that environment. globalenv() returns the global environment. Putting those together, your function would become this:
test <- function(v) {
  v <<- with(globalenv(), v) + v;
}
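An equivalent way to read the global value is get() with an explicit envir argument, paired with assign() for the write. This is only a sketch of an alternative, not something the answer above proposes; the first line uses assign() so the example does not depend on where it is run, and is the same as typing v <- 0 at top level.

```r
# Same as `v <- 0` typed at top level; written with assign() so the
# example behaves the same wherever it is run.
assign("v", 0, envir = globalenv())

test <- function(v) {
  # get() with envir = globalenv() skips the argument v and reads the global one
  global_v <- get("v", envir = globalenv())
  assign("v", global_v + v, envir = globalenv())
  invisible(NULL)
}

test(1)
final_v <- get("v", envir = globalenv())
final_v   # 1
```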

R - Define an object in a function which is inside another function

I have one function inside another like this:
func2 <- function(x=1) {ko+x+1}
func3= function(l=1){
  ko=2
  func2(2)+l
}
func3(1)
It shows the error: Error in func2(2) : object 'ko' not found. Basically, I want func2 to use the object ko, which will not be defined until func3 is called. Is there any fix for this?
Yes, it can be fixed:
func2 <- function(x=1) {ko+x+1}
func3= function(l=1){
  ko=2
  assign("ko", ko, environment(func2))
  res <- func2(2)+l
  rm("ko", envir = environment(func2))
  res
}
func3(1)
#[1] 6
As you see this is pretty complicated. That's often a sign that you are not following good practice. Good practice would be to pass ko as a parameter:
func2 <- function(x=1, ko) {ko+x+1}
func3= function(l=1){
  ko=2
  func2(2, ko)+l
}
func3(1)
#[1] 6
You don't really have one function "inside" the other currently (you are just calling a function within a different function). If you did move the one function inside the other function, then this would work
func3 <- function(l=1) {
  func2 <- function(x=1) {ko+x+1}
  ko <- 2
  func2(2)+l
}
func3(1)
Functions retain information about the environment in which they were defined. This is called "lexical scoping" and it's how R operates.
But in general I agree with @Roland that it's better to write functions that have explicit arguments.
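The lexical-scoping point can be checked directly: a function defined inside another function has that call's frame as its enclosing environment. The sketch below is an added illustration with hypothetical names (outer_fn, inner).

```r
outer_fn <- function() {
  ko <- 2
  inner <- function() ko   # defined here, so free variables resolve in this frame
  list(
    enclosure = environment(inner),  # where inner looks up free variables
    frame     = environment(),       # the frame of this call to outer_fn
    value     = inner()
  )
}

res <- outer_fn()
same_env <- identical(res$enclosure, res$frame)
same_env    # TRUE
res$value   # 2
```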
This is a good case for learning about closures and using a factory.
func3_factory <- function (y) {
  ko <- y
  func2 <- function (x = 1) { ko + x + 1 }
  function (l = 1) { func2(2) + l }
}
ko <- 1
func3_ko_1 <- func3_factory(ko)
ko <- 7
func3_ko_7 <- func3_factory(ko)
# each function stores its own value for ko
func3_ko_1(1) # 5
func3_ko_7(1) # 11
# changing ko in the global scope doesn't affect the internal ko values in the closures
ko <- 100
func3_ko_1(1) # 5
func3_ko_7(1) # 11
When func3_factory returns a function, that new function is coupled with the environment in which it was created, which in this case includes a variable named ko, holding whatever value was passed into the factory, and a function named func2, which can also access that fixed value of ko. This combination of a function and the environment it was defined in is called a closure. Anything that happens inside the returned function can access these values, and they stay the same even if the ko variable is changed outside the closure.
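The environment carried by the closure can be inspected with environment(). The sketch below repeats the factory from the answer so that it runs on its own; f and enclosure are hypothetical names added for illustration.

```r
func3_factory <- function(y) {
  ko <- y
  func2 <- function(x = 1) { ko + x + 1 }
  function(l = 1) { func2(2) + l }
}

f <- func3_factory(1)
enclosure <- environment(f)      # the frame of the func3_factory(1) call
env_names <- ls(enclosure)       # "func2" "ko" "y"
stored_ko <- enclosure$ko        # 1
call_result <- f(1)              # 5
```

Each call to func3_factory creates a fresh frame, so two closures made by separate calls never share ko.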

Difference(s) between named function & anonymous function (Lua)

What are the differences between these myFuncs?
Code 1
function wrapper()
  local someVariable = 0;
  function myFunc(n)
    if n > 0 then return myFunc(n-1) end
  end
  return myFunc;
end
Code 2
function wrapper()
  local someVariable = 0;
  local myFunc = function(n)
    if n > 0 then return myFunc(n-1) end
  end
  return myFunc;
end
Code 3
function wrapper()
  local someVariable = 0;
  local myFunc;
  myFunc = function(n)
    if n > 0 then return myFunc(n-1) end
  end;
  return myFunc;
end
I ask because when I refer to the name myFunc inside myFunc itself, their behavior is not the same (e.g. the upvalue someVariable... problematic :-S).
[edit: I misread your code #2.]
Code #1 sets the global value of myFunc to the function. So every time you call wrapper, you will be setting this one global to a new value. Furthermore, any references to your myFunc call will be to this global (which is modifiable), not to a local (which would be an upvalue of the closure).
Code #2 sets a local variable myFunc. However, because of the rules of Lua, that local variable only comes into scope after the statement defining it is complete. This allows you to do things like this:
local x = x or 5
The x in the expression is a previously declared local or global. The new x doesn't come into scope until after the x or 5 expression has been evaluated.
The same goes for your function definition. Therefore any references to myFunc will be to a global variable, not a local.
Code #3 creates a local variable myFunc. Then it sets into that variable a function. Because the function is created after the local variable comes into scope, references to myFunc in the function will refer to the local variable, not to a global one.
Note that local function X is equivalent to local X; X = function.... Not to local X = function....
Nicol's answer is mostly correct, but there is one thing that is worth pointing out:
In Code 2, the myFunc referenced in the function body doesn't need to be a global variable; it can be a local variable in some outer scope, which will become an upvalue for the function you are creating (the same comment also applies to Code 1). For example, this will print 100:
local function myFunc(n) return 100 end
function wrapper()
  local someVariable = 0;
  local myFunc = function(n)
    if n > 0 then return myFunc(n-1) end
  end
  return myFunc;
end
print(wrapper()(1))
So, to summarize, there are four ways you can define myFunc:
1. local myFunc; myFunc = function(n) ... return myFunc(n-1) end
2. local function myFunc(n) ... return myFunc(n-1) end
3. local myFunc = function(n) ... return myFunc(n-1) end
4. myFunc = function(n) ... return myFunc(n-1) end
1 and 2 are fully equivalent. 3 will not do what you expect, as it will use whatever definition of myFunc is in scope when local myFunc is executed (which may be an upvalue for myFunc or a global variable). 4 will work, but only because it assigns the newly created function to (again) either an upvalue or a global variable, and references that same variable in the body of the function.
