I am new to R, and I am quite confused by the use of local and global variables in R.
I read some posts on the internet that say if I use = or <- I will assign the variable in the current environment, and with <<- I can access a global variable when inside a function.
However, as I remember, in C++ a local variable arises whenever you declare a variable inside braces {}, so I'm wondering whether it is the same in R. Or is it only for functions in R that we have the concept of local variables?
I did a little experiment, which seems to suggest that braces alone are not enough. Am I getting anything wrong?
{
x=matrix(1:10,2,5)
}
print(x[2,2])
[1] 4
Variables declared inside a function are local to that function. For instance:
foo <- function() {
bar <- 1
}
foo()
bar
gives the following error: Error: object 'bar' not found.
If you want to make bar a global variable, you should do:
foo <- function() {
bar <<- 1
}
foo()
bar
In this case bar is accessible from outside the function.
However, unlike C, C++ or many other languages, brackets do not determine the scope of variables. For instance, in the following code snippet:
if (x > 10) {
y <- 0
} else {
y <- 1
}
y remains accessible after the if-else statement.
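The same holds for loops. A minimal sketch (the names total, tmp and i are made up for illustration) showing that variables created inside the braces of a for loop are still there afterwards:

```r
total <- 0
for (i in 1:3) {
  tmp <- i * 2          # created inside the braces of the loop body
  total <- total + tmp
}
# The loop variable and tmp both survive the loop, because the braces
# did not open a new scope.
print(i)
print(tmp)
print(total)
```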
As you well say, you can also create nested environments. You can have a look at these two links for understanding how to use them:
http://stat.ethz.ch/R-manual/R-devel/library/base/html/environment.html
http://stat.ethz.ch/R-manual/R-devel/library/base/html/get.html
Here you have a small example:
test.env <- new.env()
assign('var', 100, envir=test.env)
# or simply
test.env$var <- 100
get('var') # does not return 100: test.env is not on the search path (this finds the stats function var instead)
get('var', envir=test.env) # now it can be found
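To check where a name is actually bound, exists() with inherits = FALSE can help. A small sketch, assuming a made-up variable name answer that is not already defined in your workspace:

```r
test.env <- new.env()
assign("answer", 100, envir = test.env)

# inherits = FALSE restricts the lookup to the given environment only,
# without walking up through its parents.
in_env    <- exists("answer", envir = test.env, inherits = FALSE)
in_global <- exists("answer", envir = globalenv(), inherits = FALSE)

value  <- get("answer", envir = test.env)
listed <- ls(test.env)   # names bound in test.env

print(in_env)
print(in_global)
print(value)
```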
<- does assignment in the current environment.
When you're inside a function R creates a new environment for you. Its parent is the environment in which the function was created, so you can read the variables defined there as well, but anything new you create will not get written to the global environment.
In most cases <<- will assign to variables already in the global environment or create a variable in the global environment even if you're inside a function. However, it isn't quite as straightforward as that. What it actually does is check the parent (enclosing) environment for a variable with the name of interest. If it doesn't find one there, it goes to the parent of the parent environment (as determined when the function was created) and looks there. It continues upward to the global environment, and if the variable isn't found there either, it creates the variable in the global environment.
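One common place where this lookup rule is useful is a closure that keeps private state. The following is only a sketch with made-up names (make_counter, count), but it shows <<- updating a variable in the enclosing function's environment rather than in the global one:

```r
make_counter <- function() {
  count <- 0
  function() {
    # count is not local to this inner function, so <<- finds it in
    # the environment of the make_counter() call and updates it there.
    count <<- count + 1
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()   # separate call, separate count
counter_a()
counter_a()
counter_b()
```

Each call to make_counter() creates its own environment, so the two counters do not interfere, and no variable called count appears in the global environment.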
This might illustrate what is going on.
bar <- "global"
foo <- function(){
bar <- "in foo"
baz <- function(){
bar <- "in baz - before <<-"
bar <<- "in baz - after <<-"
print(bar)
}
print(bar)
baz()
print(bar)
}
> bar
[1] "global"
> foo()
[1] "in foo"
[1] "in baz - before <<-"
[1] "in baz - after <<-"
> bar
[1] "global"
The first time we print bar we haven't called foo yet so it should still be global - this makes sense. The second time we print it's inside of foo before calling baz so the value "in foo" makes sense. The following is where we see what <<- is actually doing. The next value printed is "in baz - before <<-" even though the print statement comes after the <<-. This is because <<- doesn't look in the current environment (unless you're in the global environment, in which case <<- acts like <-). So inside of baz the value of bar stays as "in baz - before <<-". Once baz has been called, the copy of bar inside of foo has been changed to "in baz - after <<-", which is what the last print inside foo shows, but as we can see the global bar is unchanged. This is because the copy of bar that is defined inside of foo is in the parent environment of baz, so it is the first copy of bar that <<- sees and thus the copy it assigns to. So <<- isn't just directly assigning to the global environment.
<<- is tricky and I wouldn't recommend using it if you can avoid it. If you really want to assign to the global environment you can use the assign function and tell it explicitly that you want to assign globally.
Now I change the <<- to an assign statement and we can see what effect that has:
bar <- "global"
foo <- function(){
bar <- "in foo"
baz <- function(){
assign("bar", "in baz", envir = .GlobalEnv)
}
print(bar)
baz()
print(bar)
}
bar
#[1] "global"
foo()
#[1] "in foo"
#[1] "in foo"
bar
#[1] "in baz"
So both times we print bar inside of foo the value is "in foo" even after calling baz. This is because assign never even considered the copy of bar inside of foo because we told it exactly where to look. However, this time the value of bar in the global environment was changed because we explicitly assigned there.
Now you also asked about creating local variables and you can do that fairly easily as well without creating a function... We just need to use the local function.
bar <- "global"
# local will create a new environment for us to play in
local({
bar <- "local"
print(bar)
})
#[1] "local"
bar
#[1] "global"
A bit more along the same lines
attrs <- {}
attrs.a <- 1
f <- function(d) {
attrs.a <- d
}
f(20)
print(attrs.a)
This will print 1, because the assignment inside f only creates a local attrs.a.
attrs <- {}
attrs.a <- 1
f <- function(d) {
attrs.a <<- d
}
f(20)
print(attrs.a)
This will print 20, because <<- modifies the existing global attrs.a.
The Problem
I would like to check whether a function factory in R is "safe". Here "safe" means the results of functions created by the factory depend only on their arguments, not on Global Variables.
Description
This is an unsafe factory:
funfac_bad = function(){
newfun = function()
return(foo)
return(newfun)
}
The return value of newfun will depend on the value of foo at the time newfun is executed. It may even throw an error if foo happens to be undefined.
Now - quite obviously - this factory can be made safe by binding foo to a value inside the factory
funfac_good = function(){
foo = 4711
newfun = function()
return(foo)
return(newfun)
}
I thought I could validate safety by checking for Global Variables in the factory. And indeed:
> codetools::findGlobals(funfac_bad)
[1] "{" "=" "foo" "return"
> codetools::findGlobals(funfac_good)
[1] "{" "=" "return"
But my actual use case is (much) more complicated. The functions of the factory depend on subfunctions and variables with hundreds of lines of code. Hence I sourced the definition and my factories in principle look like this:
funfac_my = function(){
sys.source("file_foo.R", envir = environment())
newfun = function()
return(foo)
return(newfun)
}
This is a safe factory if and only if code executed in "file_foo.R" binds the name "foo" to a value.
But (quite logically) codetools::findGlobals will always report "foo" as global variable.
Question
How can I detect unsafe behaviour of such a function factory when definitions are sourced?
Why not just ensure you define a default value for foo locally before sourcing the external files? For example, suppose I have this file:
foo.R
foo <- "file foo"
and this file
bar.R
bar <- "bar"
If I write my function factory like this:
funfac_my <- function(my_path) {
foo <- "fun fac foo"
if(!missing(my_path)) sys.source(my_path, envir = environment())
function() foo
}
Then I get the following results:
foo <- "global foo"
funfac_my("foo.R")()
#> [1] "file foo"
funfac_my("bar.R")()
#> [1] "fun fac foo"
funfac_my()()
#> [1] "fun fac foo"
So the output will never depend on whether there is an object called "foo" in the global environment (unless the scripts you are running deliberately look for a global called "foo" to copy, but then that would presumably be what you wanted by sourcing that file anyway).
Note that you could set this up to throw an error instead of returning a default value by adding the line if(foo == "fun fac foo") stop("object 'foo' not found") just before the final line. This will therefore complain that foo is not found even though you have a wrong object called foo in the global workspace.
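A variation on that idea, sketched under the assumption that you only care whether the sourced file bound foo at all: instead of comparing against a sentinel value, ask whether foo exists in the factory's own environment with inherits = FALSE. The name funfac_checked and the two temporary files are made up for this illustration.

```r
# Hypothetical helper: source `path` into the factory's environment and
# insist that `foo` was bound there, not merely visible through a parent.
funfac_checked <- function(path) {
  sys.source(path, envir = environment())
  if (!exists("foo", envir = environment(), inherits = FALSE)) {
    stop("sourced file did not define 'foo'")
  }
  function() foo
}

# Two throwaway files standing in for the real sourced definitions.
good_file <- tempfile(fileext = ".R")
bad_file  <- tempfile(fileext = ".R")
writeLines('foo <- "file foo"', good_file)
writeLines('bar <- "bar"', bad_file)

foo <- "global foo"   # must never leak into the created functions

good_result <- funfac_checked(good_file)()
bad_result  <- tryCatch(funfac_checked(bad_file),
                        error = function(e) conditionMessage(e))
print(good_result)
print(bad_result)
```

This catches a missing definition when the factory runs, but it cannot tell whether the sourced code itself read from the global environment while computing foo.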
You ask "How can I detect unsafe behaviour of such a function factory when definitions are sourced?" I think the answer is that you can't, but changing it slightly would make it easy.
For example, suppose currently you have
foo <- undefined_value
as the only line in "file_foo.R", and you want to be warned about the use of undefined_value. My suggestion is that you don't do that. Instead, put the whole definition of funfac_my into "file_foo.R", wrapping that one line:
funfac_my = function(){
foo <- undefined_value
newfun = function()
return(foo)
return(newfun)
}
Now you can source that file, and have a function funfac_my to pass to codetools::findGlobals:
codetools::findGlobals(funfac_my)
#> [1] "{" "<-" "=" "return"
#> [5] "undefined_value"
Is there a way to prevent a "promise already under evaluation" error when
you want the name of a function argument to be the name of an existing function
and you want to set a default for this particular argument
and you want to be able to call the outer function with its defaults only (i.e. without the need to pass an explicit value to each argument)?
In my example below, while foo(1:5, bar) works, foo(1:5) throws such an error.
Of course I could go and change the argument name from bar to, say, bar_fun, but I would rather stick with the actual function's name if possible.
foo <- function(x, bar = bar) {
bar(x)
}
bar <- function(x) {
UseMethod("bar")
}
bar.default <- function(x) {
sum(x)
}
foo(1:5)
#> Error in foo(1:5): promise already under evaluation: recursive default argument reference or earlier problems?
foo(1:5, bar)
#> [1] 15
Motivation (first order)
The real-world use case is that bar() is actually settings(), a function which returns a list of settings. I'd like to version those settings. So there'd be e.g. methods like settings.v1, settings.v2, ..., and settings.default.
And I thought of using settings.default to define the "runtime version" of settings to use, so e.g.:
settings <- function(x) {
UseMethod("settings")
}
settings.v1 <- function(x) {
list(system = "dev")
}
settings.v2 <- function(x) {
list(system = "production")
}
settings.default <- function(x) {
settings.v2(x)
}
foo <- function(x, settings = settings) {
settings()
}
foo()
#> Error in foo(): promise already under evaluation: recursive default argument reference or earlier problems?
foo(settings = settings)
#> $system
#> [1] "production"
Since settings.default() calls the settings method I want to use, it'd be great if I could just call foo() with its defaults (which would then always pick up the settings.default() method).
Motivation (second order)
I'm experimenting with adhering more to principles of functional programming (see e.g. chapter from Advanced R or wikipedia link) and its distinction of pure and effectful/side-effecty functions.
Previously, I probably would have implemented settings via some sort of a global variable that thus each foo() had access to, so I could be lazy and not define it as an argument of function foo(), but foo() then would depend on things outside of its scope - which is a very bad thing in FP.
Now I want to at least state the dependency of foo() on my settings by handing it a function that returns the settings values, which means my laziness at least complies to some extent with top-level FP principles.
Of course the non-lazy (and arguably best) solution would be to carefully state all actual settings dependencies one by one as function arguments in foo(), e.g. foo(settings_system = settings()$system) ;-)
1) Try explicitly getting it from the global environment (position 1 on the search path):
foo <- function(x, bar = get("bar", 1)) {
bar(x)
}
2) Another possibility is to use an argument name such as bar. (with a trailing dot). Thanks to partial argument matching, the user can still write foo(1:5, bar = whatever), so all three of these calls work:
foo <- function(x, bar. = bar) {
bar.(x)
}
foo(1:5)
foo(1:5, bar)
foo(1:5, bar = bar)
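The error arises because a default argument is evaluated inside the function's own environment, where the name bar refers to the argument itself. A third possibility, sketched here as an assumption rather than a recommended idiom, is to drop the default and resolve the function explicitly when the argument is missing:

```r
bar <- function(x) UseMethod("bar")
bar.default <- function(x) sum(x)

foo <- function(x, bar) {
  if (missing(bar)) {
    # Look up the function called "bar" in the environment where foo
    # was defined, skipping foo's own (missing) argument of that name.
    bar <- get("bar", envir = parent.env(environment()), mode = "function")
  }
  bar(x)
}

foo(1:5)                              # falls back to the existing bar()
foo(1:5, bar = function(x) max(x))    # caller supplies a replacement
```

The argument keeps the name bar, at the cost of the default no longer being visible in the function signature.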
I have here a question about why I don't have a problem (it's a nice change of pace). Consider the following
MyFuncs <- (function(){
hidden <- function(){return('ninja')}
foo <- function(){paste(hidden(), 'foo')}
bar <- function(){paste(hidden(), 'bar')}
return(list(foo = foo, bar = bar))
})()
So after sourcing this I have a list object that contains 2 custom functions, foo and bar. Both of these functions reference another function hidden, which is not part of that list. I cannot call hidden, yet foo and bar both work perfectly. Since R has lazy evaluation, I would have expected these to not work. foo's definition references hidden, and when I try to evaluate foo I would expect it to throw an error because hidden cannot be evaluated.
> print(MyFuncs$foo())
[1] "ninja foo"
> print(MyFuncs$bar())
[1] "ninja bar"
> print(MyFuncs$hidden())
Error in print(MyFuncs$hidden()) : attempt to apply non-function
> foo2 <- function(){paste(hidden(), 'foo')}
> print(foo2())
Error in paste(hidden(), "foo") : could not find function "hidden"
As near as I can tell, the functions under MyFuncs are defined as being in their own environment, which is unnamed and not in the search path. Am I running into one of the finer differences between an environment and a frame?
Note that the environment of all internal functions is the local scope of MyFuncs:
MyFuncs <- (function(){
hidden <- function(){return('ninja')}
foo <- function(){paste(hidden(), 'foo')}
bar <- function(){paste(hidden(), 'bar')}
print(environment()) ## note I added this line
return(list(foo = foo, bar = bar))
})()
Will print (in this case where I've run it):
<environment: 0x7fb74acd00d8>
Additionally:
> environment(MyFuncs$foo)
<environment: 0x7fb74acd00d8>
> environment(MyFuncs$bar)
<environment: 0x7fb74acd00d8>
> environment(get("hidden", environment(MyFuncs$foo)))
<environment: 0x7fb74acd00d8>
> get("hidden", environment(MyFuncs$foo))()
[1] "ninja"
hidden is not evaluated until called by MyFuncs$foo() in the first instance, but since everything is contained in that local function scope there's no reason it can't exist.
Edit: I didn't address the lazy evaluation issue explicitly, but as @MrFlick says this is usually applied to function arguments unless you invoke delayedAssign explicitly. hidden is assigned, just not evaluated until it's called from foo or bar. The environment of the function MyFuncs is indeed "hidden" in the sense that it's not on the search path, but this can be changed.
We can create an object that represents this namespace:
> env <- environment(MyFuncs$foo)
> foo()
Error: could not find function "foo"
> get("foo", env)()
[1] "ninja foo"
We can attach it to the search() path:
> attach(env, name="Myfuncs.foo")
> search()
[1] ".GlobalEnv" "Myfuncs.foo" [...]
> foo()
[1] "ninja foo"
> hidden()
[1] "ninja"
And detach it using the name we assigned:
> detach("Myfuncs.foo")
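The points above can also be checked without attaching anything. A short sketch that rebuilds MyFuncs and inspects the shared environment directly (the variable names env, shared, names_in_env, has_hidden and in_list are made up here):

```r
MyFuncs <- (function() {
  hidden <- function() "ninja"
  foo <- function() paste(hidden(), "foo")
  bar <- function() paste(hidden(), "bar")
  list(foo = foo, bar = bar)
})()

env <- environment(MyFuncs$foo)

# foo and bar were created in the same call, so they share one environment.
shared <- identical(env, environment(MyFuncs$bar))

# That environment holds all three functions, including hidden.
names_in_env <- sort(ls(env))
has_hidden   <- exists("hidden", envir = env, inherits = FALSE)

# hidden is simply not an element of the returned list.
in_list <- "hidden" %in% names(MyFuncs)

print(names_in_env)
print(MyFuncs$foo())
```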
I'm working in a call stack of variable depth that looks like
TopLevelFunction
-> <SomeOtherFunction(s), 1 or more>
-> AssignmentFunction
Now, my goal is to assign a variable created in AssignmentFunction, to the environment of TopLevelFunction. I know I can extract the stack with sys.calls, so my current approach is
# get the call stack and search for TopLevelFunction
depth <- which(stringr::str_detect(as.character(sys.calls()), "TopLevelFunction"))
# assign in TopLevelFunction's environment
assign(varName, varValue, envir = sys.frame(depth))
I'm more or less fine with that, though I am not sure if that's a good idea to convert call objects to character vectors. Is that approach error-prone? More generally, how would you search for a specific parent environment, knowing only the name of the function?
A fn like this
get_toplevel_env <- function(env) {
if (identical(parent.env(env), globalenv())) {
env
} else {
get_toplevel_env(parent.env(env))
}
}
And use it within any level of your nested-functions like this?
get_toplevel_env(as.environment(-1))
I'm not sure I understood correctly what you want to do, but wouldn't it work to use parent.env(as.environment(-1))?
In this example it seems to work.
fn1 <- function() {
fn1.1 <- function(){
assign("parentvar", "PARENT",
envir = parent.env(as.environment(-1)))
}
fn1.1()
print(parentvar)
}
fn1()
Another possibility might be <<-, which assigns to the nearest enclosing environment that already has a variable of that name and falls back to the global environment if none does. But maybe that's not what you want.
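On the question of converting calls to character vectors: one alternative is to compare the head of each call with the function name directly. This is only a sketch; find_frame, MiddleFunction and result are hypothetical names, and it assumes the target function was called by its plain name (not via do.call or pkg::fun).

```r
# Hypothetical helper: return the frame of the innermost active call
# whose function was invoked under the plain name `fname`.
find_frame <- function(fname) {
  calls <- sys.calls()
  for (i in rev(seq_along(calls))) {
    head <- calls[[i]][[1]]
    if (is.name(head) && identical(as.character(head), fname)) {
      return(sys.frame(i))
    }
  }
  stop("no active call to ", fname)
}

AssignmentFunction <- function() {
  assign("result", 42, envir = find_frame("TopLevelFunction"))
  invisible(NULL)
}
MiddleFunction <- function() AssignmentFunction()
TopLevelFunction <- function() {
  MiddleFunction()
  # result now exists in this function's own frame
  get("result", envir = environment(), inherits = FALSE)
}

TopLevelFunction()
```

Matching on the call head avoids false positives from arguments that merely contain the text "TopLevelFunction", which a string search over the deparsed call would pick up.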
I have a function within a loaded library that stops the evaluation of its arguments using the substitute function. This function then calls another within that same library, which calls another function from that library, and so forth, until several calls later when I want to evaluate that initial argument in the original environment in which it was provided. The problem I have is that the search path for functions in loaded libraries includes namespace:base before the global environment. For example, let's say that foo and bar are both functions in the library lib. As such, the environment in which they are defined is namespace:lib. Consider the following:
> require(lib)
> foo
function (x)
{
x <- substitute(x)
bar(x)
}
<environment: namespace:lib>
> bar
function (x)
{
eval(x)
}
<environment: namespace:lib>
> length = 2
> foo(length)
function (x) .Primitive("length")
Because bar is a function within a loaded library, it searches namespace:base first and finds the above. However, if bar had been defined by the user in the interactive session, it would have returned 2. I am looking for a way to cause these functions to behave as if I never halted evaluation, in which case 2 would be returned regardless of the environment in which the functions are defined.
I can't simply use mget to evaluate length starting at .GlobalEnv because then the following would not work:
> baz = function()
+ {
+ length <- 3
+ foo(length)
+ }
> baz()
function (x) .Primitive("length")
I could instead add an extra argument to all involved functions that tracks how many frames ago the evaluation was halted. However, this is pretty messy and not ideal.
I could also call sys.function inside the last function, bar, and trace my way back through the previous calls and evaluate my argument in the environment above the function that halted the evaluation. For example, if I call sys.function(1) within the body of bar after calling foo(length) then I get the following:
function (x)
{
x <- substitute(x)
bar(x)
}
This is indeed identical to foo. I can then use eval with sys.frames. This seems more general but less than perfect. I would have to know which functions stop evaluation.
Does anyone have a more general solution?
Does adding a default environment argument to these functions help with the problem?
lib<-new.env()
assign("foo", function(x, env=parent.frame()) {
x<-substitute(x);
bar(x, env)
}, envir=lib)
assign("bar", function(x, env=parent.frame()) {
eval(x, env)
}, envir=lib)
attach(lib)
length = 2
foo(length)
# [1] 2
baz <- function() {
length <- 3
foo(length)
}
baz()
# [1] 3
bar(expression(baz()))
# [1] 3
If not, perhaps you could make a clearer, reproducible example with function calls and your expected output. Otherwise it's unclear where you are having trouble.
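A variation on the answer above, offered as a sketch rather than a drop-in fix: instead of threading a separate env argument through every function, bundle the unevaluated expression together with the caller's environment once, and pass that single object down the chain. The names middle and inner are made up and stand in for the intermediate library functions.

```r
# The innermost function evaluates the expression where it was written.
inner <- function(captured) eval(captured$expr, captured$env)

# Stand-in for any number of intermediate calls; it only passes the bundle on.
middle <- function(captured) inner(captured)

foo <- function(x) {
  # Record the expression and the environment foo was called from.
  captured <- list(expr = substitute(x), env = parent.frame())
  middle(captured)
}

length <- 2
foo(length)          # evaluated in the global environment

baz <- function() {
  length <- 3
  foo(length)        # evaluated in baz's environment
}
baz()
```

Only the function that calls substitute needs to know about the caller's environment; everything in between treats the bundle as an opaque value.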