Scoping assignment and local, bound and global variable in R - r

I am new to R and trying to figure out behavior of local,bound and global variables. I am confused with the following problem. If I write the function in the following way, then what are the local, bound and global variables of function f?
f <- function(a ="") {
return_a <- function() a
set_a <- function(x)
a <<- x
list(return_a,set_a)
}

return_a is a function. set_a is a function. They are both functional objects (with associated environments, but using the word "variable" to describe them seems prone to confusion. If you call f, you get a list of twofunctions. When you create a list, there are not necessarily names to the list so p$set_a("Carl") throws an error because there is no p[['set_a']].
> p <- f("Justin"); p$set_a("Carl")
Error: attempt to apply non-function
But p[[2]] now returns a function and you need to call it:
> p[[2]]
function(x)
a <<- x
<environment: 0x3664f6a28>
> p[[2]]("Carl")
That did change the value of the symbol-a in the environment of p[[1]]:
> p[[1]]()
[1] "Carl"

Related

In R, why is passing ls as a function argument differ from passing ls() as a function argument?

f <- function(x) {
a <- 1
x
}
f(ls())
In the above code, the call to f(ls()) will print out the variables in the global environment.
But:
f <- function(x) {
a <- 1
x()
}
f(ls)
will print out the variables in the environment of the function f, namely "a" and "x".
In the first case you pass the results of ls() - which is a vector of all objects in the environment. And your function just prints what was passed with x, namely - the results of ls().
In the second case you pass a function and the function get's executed within the function body.
Basically you can think about the first version of your call as:
x <- ls()
f(x)
As an additional example: look at the difference between print(ls()) and print(ls).

Why does my R function have knowledge of variables that are not given as arguments? [duplicate]

Is there any way to throw a warning (and fail..) if a global variable is used within a R function? I think that is much saver and prevents unintended behaviours...e.g.
sUm <- 10
sum <- function(x,y){
sum = x+y
return(sUm)
}
due to the "typo" in return the function will always return 10. Instead of returning the value of sUm it should fail.
My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.
To ensure that your function is not using global variables when it shouldn't be, use the codetools package.
library(codetools)
sUm <- 10
f <- function(x, y) {
sum = x + y
return(sUm)
}
checkUsage(f)
This will print the message:
<anonymous> local variable ‘sum’ assigned but may not be used (:1)
To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.
> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"
> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"
That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
There is no way to permanently change how variables are resolved because that would break a lot of functions. The behavior you don't like is actually very useful in many cases.
If a variable is not found in a function, R will check the environment where the function was defined for such a variable. You can change this environment with the environment() function. For example
environment(sum) <- baseenv()
sum(4,5)
# Error in sum(4, 5) : object 'sUm' not found
This works because baseenv() points to the "base" environment which is empty. However, note that you don't have access to other functions with this method
myfun<-function(x,y) {x+y}
sum <- function(x,y){sum = myfun(x+y); return(sUm)}
environment(sum)<-baseenv()
sum(4,5)
# Error in sum(4, 5) : could not find function "myfun"
because in a functional language such as R, functions are just regular variables that are also scoped in the environment in which they are defined and would not be available in the base environment.
You would manually have to change the environment for each function you write. Again, there is no way to change this default behavior because many of the base R functions and functions defined in packages rely on this behavior.
Using get is a way:
sUm <- 10
sum <- function(x,y){
sum <- x+y
#with inherits = FALSE below the variable is only searched
#in the specified environment in the envir argument below
get('sUm', envir = environment(), inherits=FALSE)
}
Output:
> sum(1,6)
Error in get("sUm", envir = environment(), inherits = FALSE) :
object 'sUm' not found
Having the right sum in the get function would still only look inside the function's environment for the variable, meaning that if there were two variables, one inside the function and one in the global environment with the same name, the function would always look for the variable inside the function's environment and never at the global environment:
sum <- 10
sum2 <- function(x,y){
sum <- x+y
get('sum', envir = environment(), inherits=FALSE)
}
> sum2(1,7)
[1] 8
You can check whether the variable's name appears in the list of global variables. Note that this is imperfect if the global variable in question has the same name as an argument to your function.
if (deparse(substitute(var)) %in% ls(envir=.GlobalEnv))
stop("Do not use a global variable!")
The stop() function will halt execution of the function and display the given error message.
Another way (or style) is to keep all global variables in a special environment:
with( globals <- new.env(), {
# here define all "global variables"
sUm <- 10
mEan <- 5
})
# or add a variable by using $
globals$another_one <- 42
Then the function won't be able to get them:
sum <- function(x,y){
sum = x+y
return(sUm)
}
sum(1,2)
# Error in sum(1, 2) : object 'sUm' not found
But you can always use them with globals$:
globals$sUm
[1] 10
To manage the discipline, you can check if there is any global variable (except functions) outside of globals:
setdiff(ls(), union(lsf.str(), "globals")))

Which function will identify the name of an R variable's enclosing environment?

I've been reading about R environments, and I'm trying to test my understanding with a simple example:
> f <- function() {
+ x <- 1
+ environment(x)
+ }
>
> f()
NULL
I'm assuming this means that the object x is enclosed by the environment named NULL, but when I try to list all the objects in that environment, R displays an error message:
> ls(NULL)
Error in as.environment(pos) : using 'as.environment(NULL)' is defunct
So I'm wondering if there's a built-in function I can use on the command line that will return the environment name given the object name. I tried this:
> environment(x)
Error in environment(x) : object 'x' not found
but that returned an error as well. Any help will be greatly appreciated.
Variables created in function calls are destroyed when the function finishes executing (unless you specifically create them in other persistent environments). As #joran pointed out, when a function is called, a temporary environment is created where local variables are defined, and is destroyed when the function is done executing (that memory is freed). However, as #MrFlick pointed out, if the function returns a function, the returned function maintains a reference to the environment it was created in. You can read more about 'scope', 'stack', and 'heap'. In R there are various ways you can define your variables into specified environments.
f <- function() {
x <<- 1 # create x in the global environment (or change it if it's there)
## or `assign` x to a value
## assign(x, value=1, envir=.GlobalEnv)
}
environment(f) # where was f defined?
exists("x", envir=.GlobalEnv)
# [1] TRUE
The package pryr has some nice functions to do these kind of things. For example, there is a function called where which will give you the environment of an object:
library(pryr)
f <- function() {
x <- 1
where("x")
}
f()
<environment: 0x0000000013356f50>
So the environment of x was the temporary enviroment created by function f(). As people have said before, this enviroment is detroyed after you run the function, so it will give you a different result each time you run f().

R scoping: disallow global variables in function

Is there any way to throw a warning (and fail..) if a global variable is used within a R function? I think that is much saver and prevents unintended behaviours...e.g.
sUm <- 10
sum <- function(x,y){
sum = x+y
return(sUm)
}
due to the "typo" in return the function will always return 10. Instead of returning the value of sUm it should fail.
My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.
To ensure that your function is not using global variables when it shouldn't be, use the codetools package.
library(codetools)
sUm <- 10
f <- function(x, y) {
sum = x + y
return(sUm)
}
checkUsage(f)
This will print the message:
<anonymous> local variable ‘sum’ assigned but may not be used (:1)
To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.
> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"
> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"
That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
There is no way to permanently change how variables are resolved because that would break a lot of functions. The behavior you don't like is actually very useful in many cases.
If a variable is not found in a function, R will check the environment where the function was defined for such a variable. You can change this environment with the environment() function. For example
environment(sum) <- baseenv()
sum(4,5)
# Error in sum(4, 5) : object 'sUm' not found
This works because baseenv() points to the "base" environment which is empty. However, note that you don't have access to other functions with this method
myfun<-function(x,y) {x+y}
sum <- function(x,y){sum = myfun(x+y); return(sUm)}
environment(sum)<-baseenv()
sum(4,5)
# Error in sum(4, 5) : could not find function "myfun"
because in a functional language such as R, functions are just regular variables that are also scoped in the environment in which they are defined and would not be available in the base environment.
You would manually have to change the environment for each function you write. Again, there is no way to change this default behavior because many of the base R functions and functions defined in packages rely on this behavior.
Using get is a way:
sUm <- 10
sum <- function(x,y){
sum <- x+y
#with inherits = FALSE below the variable is only searched
#in the specified environment in the envir argument below
get('sUm', envir = environment(), inherits=FALSE)
}
Output:
> sum(1,6)
Error in get("sUm", envir = environment(), inherits = FALSE) :
object 'sUm' not found
Having the right sum in the get function would still only look inside the function's environment for the variable, meaning that if there were two variables, one inside the function and one in the global environment with the same name, the function would always look for the variable inside the function's environment and never at the global environment:
sum <- 10
sum2 <- function(x,y){
sum <- x+y
get('sum', envir = environment(), inherits=FALSE)
}
> sum2(1,7)
[1] 8
You can check whether the variable's name appears in the list of global variables. Note that this is imperfect if the global variable in question has the same name as an argument to your function.
if (deparse(substitute(var)) %in% ls(envir=.GlobalEnv))
stop("Do not use a global variable!")
The stop() function will halt execution of the function and display the given error message.
Another way (or style) is to keep all global variables in a special environment:
with( globals <- new.env(), {
# here define all "global variables"
sUm <- 10
mEan <- 5
})
# or add a variable by using $
globals$another_one <- 42
Then the function won't be able to get them:
sum <- function(x,y){
sum = x+y
return(sUm)
}
sum(1,2)
# Error in sum(1, 2) : object 'sUm' not found
But you can always use them with globals$:
globals$sUm
[1] 10
To manage the discipline, you can check if there is any global variable (except functions) outside of globals:
setdiff(ls(), union(lsf.str(), "globals")))

Nested function environment selection

I am writing some functions for doing repeated tasks, but I am trying to minimize the amount of times I load the data. Basically I have one function that takes some information and makes a plot. Then I have a second function that will loop through and output multiple plots to a .pdf. In both functions I have the following line of code:
if(load.dat) load("myworkspace.RData")
where load.dat is a logical and the data I need is stored in myworkspace.RData. When I am calling the wrapper function that loops through and outputs multiple plots I do not want to reload the workspace in every call to the inner function. I thought I could just load the workspace once in the wrapper function, then the inner function could access that data, but I got an error stating otherwise.
So my understanding was when a function cannot find the variable in its local environment (created when the function gets called), the function will look to the parent environment for the variable.
I assumed the parent environment to the inner function call would be the outer function call. Obviously this is not true:
func1 <- function(...){
print(var1)
}
func2 <- function(...){
var1 <- "hello"
func1(...)
}
> func2()
Error in print(var1) : object 'var1' not found
After reading numerous questions, the language manual, and this really helpful blog post, I came up with the following:
var1 <- "hello"
save(list="var1",file="test.RData")
rm(var1)
func3 <- function(...){
attach("test.RData")
func1(...)
detach("file:test.RData")
}
> func3()
[1] "hello"
Is there a better way to do this? Why doesn't func1 look for undefined variables in the local environment created by func2, when it was func2 that called func1?
Note: I did not know how to name this question. If anyone has better suggestions I will change it and edit this line out.
To illustrate lexical scoping, consider the following:
First let's create a sandbox environment, only to avoid the oh-so-common R_GlobalEnv:
sandbox <-new.env()
Now we put two functions inside it: f, which looks for a variable named x; and g, which defines a local x and calls f:
sandbox$f <- function()
{
value <- if(exists("x")) x else "not found."
cat("This is function f looking for symbol x:", value, "\n")
}
sandbox$g <- function()
{
x <- 123
cat("This is function g. ")
f()
}
Technicality: entering function definitions in the console causes then to have the enclosing environment set to R_GlobalEnv, so we manually force the enclosures of f and g to match the environment where they "belong":
environment(sandbox$f) <- sandbox
environment(sandbox$g) <- sandbox
Calling g. The local variable x=123 is not found by f:
> sandbox$g()
This is function g. This is function f looking for symbol x: not found.
Now we create a x in the global environment and call g. The function f will look for x first in sandbox, and then in the parent of sandbox, which happens to be R_GlobalEnv:
> x <- 456
> sandbox$g()
This is function g. This is function f looking for symbol x: 456
Just to check that f looks for x first in its enclosure, we can put a x there and call g:
> sandbox$x <- 789
> sandbox$g()
This is function g. This is function f looking for symbol x: 789
Conclusion: symbol lookup in R follows the chain of enclosing environments, not the evaluation frames created during execution of nested function calls.
EDIT: Just adding a link to this very interesting answer from Martin Morgan on the related subject of parent.frame() vs parent.env()
You could use closures:
f2 <- function(...){
f1 <- function(...){
print(var1)
}
var1 <- "hello"
f1(...)
}
f2()

Resources