Why is this simple function not working? - r

I first defined a new variable x, then created a function that requires x within its body (not as an argument). See the code below:
x <- c(1,2,3)
f1 <- function() {
x^2
}
rm(x)
f2 <- function() {
x <- c(1,2,3)
f1()
}
f2()
Error in f1() : object 'x' not found
When I removed x and defined a new function f2 that first defines x and then executes f1, it says object x is not found.
I just want to know why this is not working and how I can overcome this problem. I do not want x to be an argument of f1.
Please suggest an appropriate title, because I do not know what kind of problem this is.

You could use a closure to make an f1 with the desired properties:
makeF <- function(){
x <- c(1,2,3)
f1 <- function() {
x^2
}
f1
}
f1 <- makeF()
f1() #returns 1 4 9
There is no x in the global scope but f1 still knows about the x in the environment that it was defined in.
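To check this, here is a minimal sketch that inspects the closure's enclosing environment. It repeats makeF from above so it runs on its own; ls(), get() and exists() are standard base R.

```r
makeF <- function() {
  x <- c(1, 2, 3)
  f1 <- function() {
    x^2
  }
  f1
}
f1 <- makeF()
# The enclosing environment of f1 is the frame of the makeF() call,
# which holds both the local x and the local binding f1.
ls(environment(f1))                # "f1" "x"
get("x", envir = environment(f1))  # 1 2 3
# No global x is needed (assuming none was defined earlier in the session).
exists("x", envir = globalenv(), inherits = FALSE)
```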

In short: you are expecting dynamic scoping but are a victim of R's lexical scoping:
dynamic scoping = the enclosing environment of a command is determined during run-time
lexical scoping = the enclosing environment of a command is determined at "compile time"
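A small sketch of the difference. Ordinary variable lookup in R is lexical; the function names lexical, dynamic and caller below are hypothetical, and the "dynamic" one only emulates dynamic scoping by asking explicitly for the caller's frame via parent.frame().

```r
x <- "global"
# Lexical lookup: x is resolved where the function was defined (the global env).
lexical <- function() x
# Emulated dynamic lookup: x is looked up in the calling frame instead.
dynamic <- function() get("x", envir = parent.frame())
caller <- function() {
  x <- "caller"
  c(lexical = lexical(), dynamic = dynamic())
}
caller()  # lexical = "global", dynamic = "caller"
```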
To understand the lookup path of your variable x through the current and parent environments, try this code.
It shows that the two functions do not share the environment in which x is defined in f2, so x can never be found:
# list all parent environments of an environment to show the "search path"
parents <- function(env) {
while (TRUE) {
name <- environmentName(env)
txt <- if (nzchar(name)) name else format(env)
cat(txt, "\n")
if (txt == "R_EmptyEnv") break
env <- parent.env(env)
}
}
x <- c(1,2,3)
f1 <- function() {
print("f1:")
parents(environment())
x^2
}
f1() # works
# [1] "f1:"
# <environment: 0x4ebb8b8>
# R_GlobalEnv
# ...
rm(x)
f2 <- function() {
print("f2:")
parents(environment())
x <- c(1,2,3)
f1()
}
f2() # does not find "x"
# [1] "f2:"
# <environment: 0x47b2d18>
# R_GlobalEnv
# ...
# [1] "f1:"
# <environment: 0x4765828>
# R_GlobalEnv
# ...
Possible solutions:
1. Declare x in the global environment (bad programming style due to lack of encapsulation).
2. Use function parameters (this is what functions are made for).
3. Use a closure if x always has the same value for each call of f1 (not for beginners). See the other answer from @JohnColeman.
I strongly recommend option 2 (add x as a parameter; why do you want to avoid this?).
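A minimal sketch of option 2, reusing the names from the question: f1 takes x as an argument, so it no longer matters where f1 was defined or who calls it.

```r
# Option 2: pass x explicitly, so f1 does not depend on any outside variable.
f1 <- function(x) {
  x^2
}
f2 <- function() {
  x <- c(1, 2, 3)
  f1(x)  # the caller hands its local x to f1
}
f2()  # 1 4 9
```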

Related

Difference between <- and <<- [duplicate]

This question already has answers here:
How do you use "<<-" (scoping assignment) in R?
(7 answers)
Closed 7 years ago.
CASE 1:
rm(list = ls())
foo <- function(x = 6){
  set <- function(){
    x <- x*x
  }
  set()
  x
}
foo()
# [1] 6
CASE 2:
rm(list = ls())
foo <- function(x = 6){
  set <- function(){
    x <<- x*x
  }
  set()
  x
}
foo()
# [1] 36
I read that the <<- operator can be used to assign a value to an object in an environment other than the current one, i.e. to objects that are not in the current environment. Which environment's objects can be assigned with <<-? In my case the environment is that of the foo function: can <<- assign to objects outside the function, or to objects in the current environment? I am totally confused about when to use <- and when to use <<-.
The operator <<- is the parent scope assignment operator. It searches the enclosing (parent) scopes of the scope in which it is evaluated, starting with the nearest one, and assigns to the first existing variable with that name; if no such variable exists, the assignment is made in the global environment. These assignments therefore "stick" in the scope outside of function calls. Consider the following code:
fun1 <- function() {
x <- 10
print(x)
}
> x <- 5 # x is defined in the outer (global) scope
> fun1()
[1] 10 # x was assigned to 10 in fun1()
> x
[1] 5 # but the global value of x is unchanged
In the function fun1(), a local variable x is assigned to the value 10, but in the global scope the value of x is not changed. Now consider rewriting the function to use the parent scope assignment operator:
fun2 <- function() {
x <<- 10
print(x)
}
> x <- 5
> fun2()
[1] 10 # x was assigned to 10 in fun2()
> x
[1] 10 # the global value of x changed to 10
Because the function fun2() uses the <<- operator, the assignment of x "sticks" after the function has finished evaluating. What R actually does is to go through all scopes outside fun2() and look for the first scope containing a variable called x. In this case, the only scope outside of fun2() is the global scope, so it makes the assignment there.
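When the function is nested, the "first scope containing x" need not be the global one; this is exactly what happens in CASE 2 of the question, where x <<- x*x rebinds the x in foo's frame. A minimal sketch with a hypothetical make_counter:

```r
make_counter <- function() {
  count <- 0                # lives in the frame of the make_counter() call
  function() {
    count <<- count + 1     # finds `count` one level up and rebinds it there
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2
# Nothing named `count` was created in the global environment.
exists("count", envir = globalenv(), inherits = FALSE)  # FALSE
```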
As a few have already commented, the <<- operator is frowned upon by many because it can break the encapsulation of your R scripts. If we view an R function as an isolated piece of functionality, then it should not be allowed to interfere with the state of the code which calls it. Abusing the <<- assignment operator runs the risk of doing just this.
The <<- operator can be used to assign a variable in the global environment (provided no enclosing function already has a variable of that name). It's better to use the assign function than <<-. You probably shouldn't need <<- at all, though: outputs needed from functions should be returned as objects in R.
Here's an example
f <- function(x) {
y <<- x * 2 # y outside the function
}
f(5) # y = 10
This is equivalent to
f <- function(x) {
x * 2
}
y <- f(5) # y = 10
With the assign function,
f <- function(x) {
assign('y', x*2, envir=.GlobalEnv)
}
f(5) # y = 10
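One practical difference between the two: <<- rebinds the nearest existing variable, while assign() with an explicit envir always writes to the environment you name. A sketch with hypothetical function names:

```r
x <- "global"
wrapper <- function() {
  x <- "wrapper"
  # `<<-` finds the x in wrapper()'s frame and rebinds it there.
  set_with_superassign <- function() x <<- "changed by <<-"
  # assign() with an explicit envir always targets that environment.
  set_with_assign <- function() assign("x", "changed by assign", envir = .GlobalEnv)
  set_with_superassign()
  set_with_assign()
  x  # wrapper()'s own x
}
res <- wrapper()
res  # "changed by <<-"
x    # "changed by assign"
```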

R function which returns a function... and variable scope

I am learning about functions returning other functions. For example:
foo1 <- function()
{
bar1 <- function()
{
return(constant)
}
}
foo2 <- function()
{
constant <- 1
bar2 <- function()
{
return(constant)
}
}
Suppose, now, I declare functions f1 and f2 as follows:
constant <- 2
f1 <- foo1()
f2 <- foo2()
Then it appears they have the same function definition:
> f1
function()
{
return(constant)
}
<environment: 0x408f048>
> f2
function()
{
return(constant)
}
<environment: 0x4046d78>
>
BUT the two functions are different. For example:
> constant <- 2
> f1()
[1] 2
> f2()
[1] 1
My question: Why is it legal for two functions, with identical function definitions, to produce different results?
I understand that foo1 treats constant as a global variable and foo2 as a constant variable, but it is impossible to tell this from the function definition surely?
(I am probably missing something fundamental.)
Sure they're different: the environments are different. Try ls(environment(f1)) and then ls(environment(f2)), and then get('constant', environment(f1)) and the same for f2.
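Spelled out, using foo1 and foo2 exactly as defined in the question (repeated so the block runs on its own):

```r
foo1 <- function() {
  bar1 <- function() {
    return(constant)
  }
}
foo2 <- function() {
  constant <- 1
  bar2 <- function() {
    return(constant)
  }
}
constant <- 2
f1 <- foo1()
f2 <- foo2()
# Each returned function carries the frame of the call that created it.
ls(environment(f1))  # "bar1": foo1() created no local `constant`
ls(environment(f2))  # "bar2" "constant"
# get() inherits by default, so for f1 the lookup falls through to the global env.
get("constant", envir = environment(f1))  # 2
get("constant", envir = environment(f2))  # 1
```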
Scope in R
Lev's answer is correct. To describe it in more detail: when you call f1 or pass f1 around, you also have a reference to the original lexical environment in which the function was defined.
# Since R is interpreted, the variable constant doesn't have to exist in the lexical
# environment when the function is created; the lookup happens at run time.
# (This assumes no global `constant` has been defined yet.)
foo1ReturnedThisFunction <- foo1()
# outputs "Error in foo1ReturnedThisFunction() : object 'constant' not found"
foo1ReturnedThisFunction()
# define the variable constant in the lexical (here: global) environment
constant <- 5
# outputs 5
foo1ReturnedThisFunction()
In foo2, there is a definition of the variable constant in the "closer" lexical environment (the enclosing environment created by the call to foo2), so it uses that one and doesn't look for constant in the "global" environment.

Environments in R, mapply and get

Let x<-2 in the global env:
x <-2
x
[1] 2
Let a be a function that defines another x locally and uses get:
a<-function(){
x<-1
get("x")
}
This function correctly gets x from the local environment:
a()
[1] 1
Now let's define a function b as below, that uses mapply with get:
b<-function(){
x<-1
mapply(get,"x")
}
If I call b, it seems that mapply makes get not search the function environment first. Instead, it tries to get x directly from the global environment, and if x is not defined in the global env, it gives an error message:
b()
x
2
rm(x)
b()
Error in (function (x, pos = -1L, envir = as.environment(pos), mode = "any", :
object 'x' not found
The solution to this is to pass envir=environment() explicitly.
c<-function(){
x<-1
mapply(get,"x", MoreArgs = list(envir=environment()))
}
c()
x
1
But I would like to know what exactly is going on here. What is mapply doing? (And why? Is this the expected behavior?) Is this "pitfall" common in other R functions?
The problem is that get looks in the environment it is called from, but here we are passing get to mapply, which then calls get from its own local environment. If x is not found in mapply's local environment, get looks in the parent environment of that, i.e. environment(mapply) (the lexical environment in which mapply was defined, which is the base namespace environment); if it is not there either, it looks in the parent of that, which is the global environment, i.e. your R workspace.
This is because R uses lexical scoping, as opposed to dynamic scoping.
We can show this by getting a variable that exists within mapply.
x <- 2
b2<-function(){
x<-1
mapply(get, "USE.NAMES")
}
b2() # it finds USE.NAMES in mapply
## USE.NAMES
## TRUE
In addition to the workaround involving MoreArgs shown in the question, this also works, since it causes the search to look into the local environment within b3 after failing to find x in mapply. (This is just to illustrate what is going on; in actual practice we would prefer the workaround shown in the question.)
x <- 2
b3 <-function(){
x<-1
environment(mapply) <- environment()
mapply(get, "x")
}
b3()
## 1
ADDED: expanded explanation. Also note that we can view the chain of environments like this:
> debug(get)
> b()
debugging in: (function (x, pos = -1L, envir = as.environment(pos), mode = "any",
inherits = TRUE)
.Internal(get(x, envir, mode, inherits)))(dots[[1L]][[1L]])
debug: .Internal(get(x, envir, mode, inherits))
Browse[2]> envir
<environment: 0x0000000021ada818>
Browse[2]> ls(envir) ### this shows that envir is the local env in mapply
[1] "dots" "FUN" "MoreArgs" "SIMPLIFY" "USE.NAMES"
Browse[2]> parent.env(envir) ### the parent of envir is the base namespace env
<environment: namespace:base>
Browse[2]> parent.env(parent.env(envir)) ### and grandparent of envir is the global env
<environment: R_GlobalEnv>
Thus, the ancestry of environments potentially followed is this (where each arrow points to the parent):
local environment within mapply --> environment(mapply) --> .GlobalEnv
where environment(mapply) equals asNamespace("base"), the base namespace environment.
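The same chain can be checked without the debugger; a small sketch using only base functions:

```r
# environment(mapply) is the base namespace ...
identical(environment(mapply), asNamespace("base"))      # TRUE
# ... and the enclosing environment of the base namespace is the global env.
identical(parent.env(asNamespace("base")), globalenv())  # TRUE
```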
R is lexically scoped, not dynamically scoped, meaning that when you search through parent environments to find a value, you are searching through the lexical parents (as written in the source code), not through the dynamic parents (as invoked). Consider this example:
x <- "Global!"
fun1 <- function() print(x)
fun2 <- function() {
x <- "Local!"
fun1a <- function() print(x)
fun1() # fun2() is dynamic but not lexical parent of fun1()
fun1a() # fun2() is both dynamic and lexical parent of fun1a()
}
fun2()
outputs:
[1] "Global!"
[1] "Local!"
In this case fun2 is the lexical parent of fun1a, but not of fun1. Since mapply is not defined inside your functions, your functions are not the lexical parents of mapply and the xs defined therein are not directly accessible to mapply.
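The flip side is that a function written inside b does have b as its lexical parent. As a sketch (b4 is a hypothetical name; this is an alternative to the MoreArgs workaround, not taken from the question), wrapping get in an anonymous function defined inside b4 lets the lookup reach b4's local x:

```r
x <- 2
b4 <- function() {
  x <- 1
  # The anonymous function is created inside b4(), so b4()'s frame is its
  # enclosing environment. get() starts in the anonymous function's own frame
  # and then inherits into b4()'s frame, where it finds the local x.
  mapply(function(name) get(name), "x")
}
b4()  # named vector: x = 1
```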
The issue is an interplay with built-in C code. Namely, consider the following:
fx <- function(x) environment()
env <- NULL; fn <- function() { env <<- environment(); mapply(fx, 1)[[1]] }
Then
env2 <- fn()
identical(env2, env)
# [1] FALSE
identical(parent.env(env2), env)
# [1] FALSE
identical(parent.env(env2), globalenv())
# [1] TRUE
More specifically, the problem lies in the underlying C code, which does not consider the executing environment and hands the call off as-is to an underlying C eval call, which creates a temporary environment branching directly off R_GlobalEnv.
Note this really is what is going on, since no level of stack nesting fixes the issue:
env <- NULL; fn2 <- function() { env <<- environment(); (function() { mapply(fx, 1)[[1]] })() }
identical(parent.env(fn2()), globalenv())
# [1] TRUE

Anonymous passing of variables from current environment to subfunction calls

The function testfun1, defined below, does what I want it to do. (For the reasoning behind all this, see the background info below the code example.) The question I want to ask is why what I tried in testfun2 doesn't work. To me, both appear to do exactly the same thing. As shown by the print in testfun2, the evaluation of the helper function inside testfun2 takes place in the correct environment, but the variables from the main function environment get magically passed to the helper function in testfun1 and not in testfun2. Does anyone know why?
helpfun <- function(){
x <- x^2 + y^2
}
testfun1 <- function(x,y){
xy <- x*y
environment(helpfun) <- sys.frame(sys.nframe())
x <- eval(as.call(c(as.symbol("helpfun"))))
return(list(x=x,xy=xy))
}
testfun1(x = 2,y = 1:3)
## works as intended
eval.here <- function(fun){
environment(fun) <- parent.frame()
print(environment(fun))
eval(as.call(c(as.symbol(fun))))
}
testfun2 <- function(x,y){
print(sys.frame(sys.nframe()))
xy <- x*y
x <- eval.here("helpfun")
return(list(x=x,xy=xy))
}
testfun2(x = 2,y = 1:3)
## helpfun can't find variable 'x' despite having the same environment as in testfun1...
Background info: I have a large R code base in which I want to call helper functions inside my main function. They alter variables of the main function environment. The purpose of all this is mainly to unclutter my code. (The main function is currently over 2000 lines long, with many calls to various helper functions which are themselves 40-150 lines long...)
Note that the number of arguments to my helper functions is very high, so the traditional explicit passing of function arguments ("helpfun(arg1 = arg1, arg2 = arg2, ... , arg50 = arg50)") would be cumbersome and wouldn't yield the uncluttering of the code that I am aiming for. Therefore, I need to pass the variables from the parent frame to the helper functions anonymously.
Use this instead. In testfun2, fun is the character string "helpfun", not the function itself, so environment(fun) <- parent.frame() only attaches an environment attribute to the string (which is why the print looks right); the call then finds the global helpfun with its original environment. Retrieve the function object with get() first:
eval.here <- function(fun){
fun <- get(fun)
environment(fun) <- parent.frame()
print(environment(fun))
fun()
}
Result:
> testfun2(x = 2,y = 1:3)
<environment: 0x0000000013da47a8>
<environment: 0x0000000013da47a8>
$x
[1] 5 8 13
$xy
[1] 2 4 6
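A variant sketch, assuming you would rather pass the function itself than its name: base R's match.fun() accepts either and returns the function object. eval.here2 and testfun3 are hypothetical names.

```r
helpfun <- function() {
  x <- x^2 + y^2
}
# Hypothetical variant of eval.here that accepts a function or its name.
eval.here2 <- function(fun) {
  fun <- match.fun(fun)               # resolve "helpfun" (or helpfun) to the function object
  environment(fun) <- parent.frame()  # make the caller's frame the function's enclosure
  fun()
}
testfun3 <- function(x, y) {
  xy <- x * y
  x <- eval.here2(helpfun)
  list(x = x, xy = xy)
}
testfun3(x = 2, y = 1:3)  # $x: 5 8 13, $xy: 2 4 6
```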

Get the attribute of a packaged function from within itself

Suppose we have these functions in an R package.
prova <- function() {
print(attr(prova, 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
'myattr<-' <- function(x, value) {
attr(x, 'myattr') <- value
x
}
myattr <- function(x) attr(x, 'myattr')
So, I install the package and then I test it. This is the result:
prova()
# NULL
# NULL
myattr(prova) <- 'ciao' # setting 'ciao' for 'myattr' attribute
prova()
# NULL
# NULL # Why NULL here ?
myattr(prova)
# [1] "ciao"
attr(prova, 'myattr')
# [1] "ciao"
The question is: how to get the attribute of the function from within itself?
Inside the function itself I cannot get its attribute, as demonstrated by the example.
I suppose that the solution will be of the "computing on the language" kind (match.call()[[1L]], substitute, environments and friends). Am I wrong?
I think that the important point here is that this function is in a package (so, it has its environment and namespace) and I need its attribute inside itself, in the package, not outside.
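You can use get with the envir argument. (The NULL appears because myattr(prova) <- 'ciao' at top level creates a modified copy of prova in the global environment, while the body of prova looks up the name prova starting from its enclosing environment, the package namespace, where the original function without the attribute still lives.)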
prova <- function() {
print(attr(get("prova", envir=envir.prova), 'myattr'))
print(myattr(prova))
invisible(TRUE)
}
eg:
envir.prova <- environment()
prova()
# NULL
# NULL
myattr(prova) <- 'ciao'
prova()
# [1] "ciao"
# [1] "ciao"
Where envir.prova is a variable whose value you set to the environment in which prova is defined.
Alternatively you can use get(.. envir=parent.frame()), but that is less reliable as then you have to track the calls too, and ensure against another object with the same name between the target environment and the calling environment.
Update regarding question in the comments:
Regarding using parent.frame() versus using an explicit environment name: parent.frame, as the name suggests, goes "up one level". Often, that is exactly where you want to go, so that works fine. And even when your goal is to get an object in an environment further up, R searches up the call stack until it finds the object with the matching name. So very often, parent.frame() is just fine.
HOWEVER if there are multiple calls between where you are invoking parent.frame() and where the object is located AND in one of the intermediary environments there exists another object with the same name, then R will stop at that intermediary environment and return its object, which is not the object you were looking for.
Therefore, parent.frame() has an argument n (which defaults to 1), so that you can tell R to begin its search n levels back.
This is the "keeping track" that I refer to, where the developer has to be mindful of the number of calls in between. The straightforward way to go about this is to have an n argument in every function that is calling the function in question, and have that value default to 1. Then for the envir argument, you use: get/assign/eval/etc (.. , envir=parent.frame(n=n) )
Then if you call Func2 from Func1 (both Func1 and Func2 have an n argument), and Func2 is calling prova, you use:
Func1 <- function(x, y, ..., n = 1) {
  # ... some stuff ...
  Func2(n = n + 1)  # plus whatever other arguments Func2 needs
}
Func2 <- function(a, b, c, ..., n = 1) {
  # ... some stuff ...
  eval(quote(prova()), envir = parent.frame(n = n))
}
As you can see, it is not complicated, but it is tedious, and sometimes what seems like a bug creeps in, which is simply forgetting to carry the n over.
Therefore, I prefer to use a fixed variable with the environment name.
The solution that I found is:
myattr <- function(x) attr(x, 'myattr')
'myattr<-' <- function(x, value) {
# check that x is a function (e.g. the prova function)
# checks on value (e.g. also value is a function with a given precise signature)
attr(x, 'myattr') <- value
x
}
prova <- function(..., env = parent.frame()) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], env)
# print(eval(as.call(c(myattr, this)), env)) # alternative
print(myattr(this))
# print(attr(this, 'myattr'))
invisible(TRUE)
}
I want to thank @RicardoSaporta for the help and the clarification about keeping track of the calls.
This solution doesn't work when, e.g., myattr(prova) <- function() TRUE is set inside func1 while prova is called in func2 (which is called by func1), unless you update its env parameter properly.
For completeness, following the suggestion of @RicardoSaporta, I slightly modified the prova function:
prova <- function(..., pos = 1L) {
# get the current function object (in its environment)
this <- eval(match.call()[[1L]], parent.frame(n = pos))
print(myattr(this))
# ...
}
This way, it also works when nested, if the correct pos parameter is passed in.
With this modification it is easier to fish out the environment in which you set the attribute on the function prova.
myfun1 <- function() {
myattr(prova) <- function() print(FALSE)
myfun2(n = 2)
}
myfun2 <- function(n) {
prova(pos = n)
}
myfun1()
# function() print(FALSE)
# <environment: 0x22e8208>
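For comparison, a sketch of a different approach that none of the answers above use: base R's sys.function() returns the function object actually being evaluated, attributes included. It therefore sees whichever copy of prova the call resolved to (for example the modified global copy when called from top level), not necessarily the copy set in some other frame. The stand-alone prova below is hypothetical, not the package code from the question.

```r
# Hypothetical stand-alone version of prova (not the package code).
prova <- function() {
  # sys.function() returns the function object currently being evaluated,
  # including any attributes it carries.
  attr(sys.function(), "myattr")
}
prova()                          # NULL
attr(prova, "myattr") <- "ciao"  # modifies the global binding `prova`
prova()                          # "ciao"
```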

Resources