I have the function below.
test_fun <- function() {
a <- 1
b <- 2
}
Is there a way to run this function so that a and b are assigned in the parent environment (in this case globalenv)? I don't want to modify the function (no assign with envir or <<-), but to call it in a way that achieves this.
A better pattern, from the point of view of encapsulation, might be to have your function return a two-element vector containing the a and b values:
test_fun <- function() {
a <- 1
b <- 2
return(c(a, b))
}
result <- test_fun()
a <- result[1]
b <- result[2]
The preferred method for passing information/work done in a function is via the return value, rather than trying to manage the issue of different scopes.
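If the goal is still to end up with a and b as separate variables, one possible variation (a sketch only; it changes test_fun to return a named list, which the question did not want) is to unpack the return value with list2env. Here environment() is simply the environment the call is made from.

```r
# Sketch: return a named list and unpack it where the call is made
test_fun <- function() {
  a <- 1
  b <- 2
  list(a = a, b = b)   # names become variable names when unpacked
}

result <- test_fun()
# Copy each list element into the current environment as a variable
invisible(list2env(result, envir = environment()))

a + b
# [1] 3
```

This keeps the function free of side effects; the caller decides where the values land.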
Normally R functions return their outputs. Functions such as test_fun are discouraged, but if you want to do it anyway, use trace as shown below. This causes the code given in the exit argument to run at function exit. No packages are used.
trace("test_fun", exit = quote(list2env(mget(ls()), globalenv())), print = FALSE)
test_fun()
a;b
## [1] 1
## [1] 2
Alternatives
Some alternatives for the exit= argument are the following.
(a) below is similar to above except that rather than leaving the objects loose in the global environment it first creates an environment env in the global environment and puts them there.
In (b) below, the objects are copied to environment(test_fun) which is the environment in which test_fun is defined. (If test_fun is defined in the global environment then it gives the same result as the code at the top of the answer.)
In (c) below, the parent frame of test_fun is the environment of the caller of test_fun and can vary from one call to another if called from different places. (If called from the global environment then it gives the same result as the code at the top of the answer.)
It would also be possible to add the current frame within test_fun to the search path using attach but there are a number of gotchas associated with that so it is not shown.
# (a) copy objects to environment env located in global environment
assign("env", new.env(), globalenv())
trace("test_fun", exit = quote(list2env(mget(ls()), env)), print = FALSE)
# (b) copy objects to environment in which test_fun is defined
trace("test_fun",
exit = quote(list2env(mget(ls()), environment(test_fun))), print = FALSE)
# (c) copy objects to parent.frame of test_fun; 5 needed as trace adds layers
trace("test_fun", exit = quote(list2env(mget(ls()), parent.frame(5))),
print = FALSE)
Related
Is there any way to set the working environment or workspace in R to a defined environment? I'd like to call the variables in an environment without constantly referring to the environment. I understand that the attach function can be used to access the variables in this manner, but any variables created don't get placed back in the attached environment. The goal is to have all the operations take place in the other environment.
For example:
original.env <- .GlobalEnv
other.env <- new.env()
other.env$A <- 12; other.env$B <- 1.5
set.env(other.env)
C <- A + B
set.env(original.env)
other.env$C
[1] 13.5
It's the step with set.env which I can't figure out if it exists, or there's some other trick to doing this.
The goal of this approach is to allow the same code to be used on different data sets with identical structures in several non-nested environments without constantly calling other environments with the Environment$ prefix, which gets really verbose in many cases.
If the results can also be assigned back to the set environment (as in the global environment, everything has an implicit .GlobalEnv$ in front of any variable) it would make it way easier to access and return multiple values from inside of a function.
Any help is appreciated. Thanks.
You are looking for eval / evalq
From help("evalq")
Evaluate an R expression in a specified environment.
Specifically evalq; the help page notes:
The evalq form is equivalent to eval(quote(expr), ...). eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this.
# Therefore you want something like this
evalq(C <- A + B, envir = other.env)
If you want to wrap more than one expression use {} eg
evalq({C <-A + B
d <- 5
}, envir = other.env)
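Since the stated goal is to run the same code against several environments with identical structure, one way to sketch that is to capture the expression once with quote and pass it to eval for each environment. The names env1, env2 and step below are made up for illustration.

```r
# Hypothetical data: two environments with the same structure
env1 <- new.env(); env1$A <- 12; env1$B <- 1.5
env2 <- new.env(); env2$A <- 1;  env2$B <- 2

# Capture the code once, unevaluated
step <- quote(C <- A + B)

# Evaluate it in each environment; C is created inside that environment
for (e in list(env1, env2)) eval(step, envir = e)

env1$C
# [1] 13.5
env2$C
# [1] 3
```

eval is used here rather than evalq because step already holds a quoted expression; evalq(step, e) would just look up the symbol step.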
How about using with?
original.env <- .GlobalEnv
other.env <- new.env()
other.env$A <- 12; other.env$B <- 1.5
with(other.env, { C <- A + B ; FOO <- 'bar' })
other.env$C
[1] 13.5
other.env$FOO
[1] "bar"
I've been reading about R environments, and I'm trying to test my understanding with a simple example:
> f <- function() {
+ x <- 1
+ environment(x)
+ }
>
> f()
NULL
I'm assuming this means that the object x is enclosed by the environment named NULL, but when I try to list all the objects in that environment, R displays an error message:
> ls(NULL)
Error in as.environment(pos) : using 'as.environment(NULL)' is defunct
So I'm wondering if there's a built-in function I can use on the command line that will return the environment name given the object name. I tried this:
> environment(x)
Error in environment(x) : object 'x' not found
but that returned an error as well. Any help will be greatly appreciated.
Variables created in function calls are destroyed when the function finishes executing (unless you specifically create them in other persistent environments). As @joran pointed out, when a function is called, a temporary environment is created where local variables are defined, and it is destroyed when the function is done executing (that memory is freed). However, as @MrFlick pointed out, if the function returns a function, the returned function maintains a reference to the environment it was created in. You can read more about 'scope', 'stack', and 'heap'. In R there are various ways you can define your variables in specified environments.
f <- function() {
x <<- 1 # create x in the global environment (or change it if it's there)
## or `assign` x to a value
## assign("x", value=1, envir=.GlobalEnv)
}
environment(f) # where was f defined?
exists("x", envir=.GlobalEnv)
# [1] TRUE
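As for why environment(x) gave NULL in the question: environment() reports the environment attached to a function (or formula), and a plain number has none. Called with no argument inside a function, it returns the frame of the current call, which is where x actually lives. A small sketch, with e and res as illustrative names:

```r
f <- function() {
  x <- 1
  e <- environment()   # the temporary environment created for this call
  list(has_x = exists("x", envir = e, inherits = FALSE),
       names = ls(e))
}

res <- f()
res$has_x
# [1] TRUE
res$names
# [1] "e" "x"

# A number carries no environment, hence the NULL in the question
is.null(environment(1))
# [1] TRUE
```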
The package pryr has some nice functions for this kind of thing. For example, there is a function called where which will give you the environment of an object:
library(pryr)
f <- function() {
x <- 1
where("x")
}
f()
<environment: 0x0000000013356f50>
So the environment of x was the temporary environment created by function f(). As people have said before, this environment is destroyed after you run the function, so it will give you a different result each time you run f().
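If you would rather not install a package, the same idea can be sketched in base R by walking up the chain of enclosing environments. find_env below is a hypothetical helper written for this answer, not pryr's implementation.

```r
# Hypothetical helper: return the first environment, starting at `env`,
# in which `name` is bound
find_env <- function(name, env = parent.frame()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) return(env)
    env <- parent.env(env)   # move one level up
  }
  stop("object '", name, "' not found")
}

g <- function() {
  x <- 1
  find_env("x")   # found in g's own call frame
}
e <- g()
get("x", envir = e)
# [1] 1

environmentName(find_env("paste"))
# [1] "base"
```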
Is there any way to throw a warning (and fail) if a global variable is used within an R function? I think that would be much safer and would prevent unintended behaviour, e.g.
sUm <- 10
sum <- function(x,y){
sum = x+y
return(sUm)
}
Due to the "typo" in return, the function will always return 10. Instead of returning the value of sUm it should fail.
My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.
To ensure that your function is not using global variables when it shouldn't be, use the codetools package.
library(codetools)
sUm <- 10
f <- function(x, y) {
sum = x + y
return(sUm)
}
checkUsage(f)
This will print the message:
<anonymous> local variable ‘sum’ assigned but may not be used (:1)
To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.
> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"
> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"
That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
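To turn that check into a hard failure, you could wrap it in a small helper. assert_no_globals is a made-up name, and the sketch assumes codetools is installed; it compares the variables reported by findGlobals with whatever is defined in the function's enclosing environment (the global environment for a function defined at top level).

```r
library(codetools)

# Hypothetical helper: stop if `fn` refers to variables defined in `env`
assert_no_globals <- function(fn, env = environment(fn)) {
  used <- findGlobals(fn, merge = FALSE)$variables  # non-function globals
  bad  <- intersect(used, ls(envir = env))
  if (length(bad) > 0) {
    stop("function uses outside variable(s): ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

sUm <- 10
f <- function(x, y) { sum = x + y; return(sUm) }   # the typo from the question
g <- function(x, y) { total <- x + y; total }      # clean version

msg <- tryCatch(assert_no_globals(f), error = function(e) conditionMessage(e))
msg                         # error text naming sUm
ok <- assert_no_globals(g)  # passes silently
```

Like findGlobals itself, this is a static check, so names built at run time (for example with get or assign) will not be seen.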
There is no way to permanently change how variables are resolved because that would break a lot of functions. The behavior you don't like is actually very useful in many cases.
If a variable is not found in a function, R will check the environment where the function was defined for such a variable. You can change this environment with the environment() function. For example
environment(sum) <- baseenv()
sum(4,5)
# Error in sum(4, 5) : object 'sUm' not found
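This works because baseenv() is the environment of the base package: it contains only base R objects, and its enclosure is the empty environment, so the lookup never reaches the global environment where sUm lives. However, note that you don't have access to other functions with this method: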
myfun<-function(x,y) {x+y}
sum <- function(x,y){sum = myfun(x+y); return(sUm)}
environment(sum)<-baseenv()
sum(4,5)
# Error in sum(4, 5) : could not find function "myfun"
because in a functional language such as R, functions are just regular variables that are also scoped in the environment in which they are defined and would not be available in the base environment.
You would manually have to change the environment for each function you write. Again, there is no way to change this default behavior because many of the base R functions and functions defined in packages rely on this behavior.
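A middle ground, sketched here under the assumption that you know which helpers the function needs, is to give it a small environment of its own whose parent is baseenv() and copy only those helpers in. The names sandbox and add are illustrative (add is used instead of sum to avoid masking the base function).

```r
myfun <- function(x, y) x + y
sUm   <- 10

add <- function(x, y) {
  total <- myfun(x, y)
  sUm                      # the typo: should have been total
}

# Environment that sees base R plus the helpers we copy in, nothing else
sandbox <- new.env(parent = baseenv())
sandbox$myfun <- myfun
environment(add) <- sandbox

res <- tryCatch(add(4, 5), error = function(e) conditionMessage(e))
res
# [1] "object 'sUm' not found"
```

myfun is now found, but the stray reference to sUm still fails, which is the behaviour the question asked for.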
Using get is a way:
sUm <- 10
sum <- function(x,y){
sum <- x+y
#with inherits = FALSE below the variable is only searched
#in the specified environment in the envir argument below
get('sUm', envir = environment(), inherits=FALSE)
}
Output:
> sum(1,6)
Error in get("sUm", envir = environment(), inherits = FALSE) :
object 'sUm' not found
Even with the correct name passed to get, the lookup is still confined to the function's environment. That means that if two variables share a name, one inside the function and one in the global environment, the function always finds the local one and never looks at the global one:
sum <- 10
sum2 <- function(x,y){
sum <- x+y
get('sum', envir = environment(), inherits=FALSE)
}
> sum2(1,7)
[1] 8
You can check whether the variable's name appears in the list of global variables. Note that this is imperfect if the global variable in question has the same name as an argument to your function.
if (deparse(substitute(var)) %in% ls(envir=.GlobalEnv))
stop("Do not use a global variable!")
The stop() function will halt execution of the function and display the given error message.
Another way (or style) is to keep all global variables in a special environment:
with( globals <- new.env(), {
# here define all "global variables"
sUm <- 10
mEan <- 5
})
# or add a variable by using $
globals$another_one <- 42
Then the function won't be able to get them:
sum <- function(x,y){
sum = x+y
return(sUm)
}
sum(1,2)
# Error in sum(1, 2) : object 'sUm' not found
But you can always use them with globals$:
globals$sUm
[1] 10
To manage the discipline, you can check if there is any global variable (except functions) outside of globals:
setdiff(ls(), union(lsf.str(), "globals"))
I just finished reading about scoping in the R intro, and am very curious about the <<- assignment.
The manual showed one (very interesting) example for <<-, which I feel I understood. What I am still missing is the context of when this can be useful.
So what I would love to read from you are examples (or links to examples) of when the use of <<- can be interesting/useful. What might be the dangers of using it (it looks easy to lose track of), and any tips you might feel like sharing.
<<- is most useful in conjunction with closures to maintain state. Here's a section from a recent paper of mine:
A closure is a function written by another function. Closures are
so-called because they enclose the environment of the parent
function, and can access all variables and parameters in that
function. This is useful because it allows us to have two levels of
parameters. One level of parameters (the parent) controls how the
function works. The other level (the child) does the work. The
following example shows how we can use this idea to generate a family of
power functions. The parent function (power) creates child functions
(square and cube) that actually do the hard work.
power <- function(exponent) {
function(x) x ^ exponent
}
square <- power(2)
square(2) # -> [1] 4
square(4) # -> [1] 16
cube <- power(3)
cube(2) # -> [1] 8
cube(4) # -> [1] 64
The ability to manage variables at two levels also makes it possible to maintain the state across function invocations by allowing a function to modify variables in the environment of its parent. The key to managing variables at different levels is the double arrow assignment operator <<-. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
This makes it possible to maintain a counter that records how many times a function has been called, as the following example shows. Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
new_counter <- function() {
i <- 0
function() {
# do something useful, then ...
i <<- i + 1
i
}
}
The new function is a closure, and its environment is the enclosing environment. When the closures counter_one and counter_two are run, each one modifies the counter in its enclosing environment and then returns the current count.
counter_one <- new_counter()
counter_two <- new_counter()
counter_one() # -> [1] 1
counter_one() # -> [1] 2
counter_two() # -> [1] 1
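To see where the state actually lives, you can look inside each closure's enclosing environment with environment(). This is just an inspection sketch that repeats the definition above so it runs on its own.

```r
new_counter <- function() {
  i <- 0
  function() {
    i <<- i + 1   # updates i in the environment created by new_counter()
    i
  }
}

counter_one <- new_counter()
counter_two <- new_counter()
counter_one()
counter_one()
counter_two()

# Each closure carries its own copy of i
environment(counter_one)$i
# [1] 2
environment(counter_two)$i
# [1] 1
```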
It helps to think of <<- as equivalent to assign (if you set the inherits parameter in that function to TRUE). The benefit of assign is that it allows you to specify more parameters (e.g. the environment), so I prefer to use assign over <<- in most cases.
Using <<- and assign(x, value, inherits=TRUE) means that "enclosing environments of the supplied environment are searched until the variable 'x' is encountered." In other words, it will keep going through the environments in order until it finds a variable with that name, and it will assign it to that. This can be within the scope of a function, or in the global environment.
In order to understand what these functions do, you need to also understand R environments (e.g. using search).
I regularly use these functions when I'm running a large simulation and I want to save intermediate results. This allows you to create the object outside the scope of the given function or apply loop. That's very helpful, especially if you have any concern about a large loop ending unexpectedly (e.g. a database disconnection), in which case you could lose everything in the process. This would be equivalent to writing your results out to a database or file during a long running process, except that it's storing the results within the R environment instead.
My primary warning with this: be careful because you're now working with global variables, especially when using <<-. That means that you can end up with situations where a function is using an object value from the environment, when you expected it to be using one that was supplied as a parameter. This is one of the main things that functional programming tries to avoid (see side effects). I avoid this problem by assigning my values to unique variable names (using paste with a set of unique parameters) that are never used within the function, but are just used for caching and in case I need to recover later on (or do some meta-analysis on the intermediate results).
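A variation on this idea, which is my own swap rather than what the answer describes, is to keep the intermediate results in a dedicated environment instead of the global one. The sketch below uses assign with paste0-built names; cache and simulate_step are hypothetical stand-ins.

```r
cache <- new.env()                 # holds intermediate results only

simulate_step <- function(k) k^2   # hypothetical stand-in for real work

for (k in 1:5) {
  # unique name per iteration, stored outside the loop's own variables
  assign(paste0("step_", k), simulate_step(k), envir = cache)
}

sort(ls(cache))
# [1] "step_1" "step_2" "step_3" "step_4" "step_5"
cache$step_3
# [1] 9
```

If the loop dies halfway through, everything computed so far is still in cache, and nothing in the global environment can be overwritten by accident.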
One place where I used <<- was in simple GUIs using tcl/tk. Some of the initial examples have it -- as you need to make a distinction between local and global variables for statefulness. See for example
library(tcltk)
demo(tkdensity)
which uses <<-. Otherwise I concur with Marek :) -- a Google search can help.
On this subject I'd like to point out that the <<- operator will behave strangely when applied (incorrectly) within a for loop (there may be other cases too). Given the following code:
fortest <- function() {
mySum <- 0
for (i in c(1, 2, 3)) {
mySum <<- mySum + i
}
mySum
}
you might expect the function to return the sum, 6, but instead it returns 0, and a global variable mySum is created with the value 3. The body of a for loop is not a new scope 'level', so that is not the cause. Rather, <<- starts its search in the enclosing environment of fortest, skipping the function's own local variables; it finds no mySum there, so on the first iteration it creates one in the global environment and assigns it the value 1. On every iteration the RHS reads the (unchanged) local mySum, which is still 0, while the LHS writes to the global variable. Each iteration therefore overwrites the global variable with that iteration's value of i, which is why it holds 3 when the function exits.
Hope this helps someone - this stumped me for a couple of hours today! (BTW, just replace <<- with <- and the function works as expected).
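For a side-by-side comparison, here is a small sketch of both versions; fortest_bad and fortest_good are just illustrative names for the function above and its corrected form.

```r
fortest_bad <- function() {
  mySum <- 0
  for (i in c(1, 2, 3)) {
    mySum <<- mySum + i   # writes outside the function; local mySum stays 0
  }
  mySum
}

fortest_good <- function() {
  mySum <- 0
  for (i in c(1, 2, 3)) {
    mySum <- mySum + i    # updates the local variable
  }
  mySum
}

fortest_bad()
# [1] 0
fortest_good()
# [1] 6
```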
f <- function(n, x0) {x <- x0; replicate(n, (function(){x <<- x+rnorm(1)})())}
plot(f(1000,0),type="l")
The <<- operator can also be useful for Reference Classes when writing Reference Methods. For example:
myRFclass <- setRefClass(Class = "RF",
fields = list(A = "numeric",
B = "numeric",
C = function() A + B))
myRFclass$methods(show = function() cat("A =", A, "B =", B, "C =",C))
myRFclass$methods(changeA = function() A <<- A*B) # note the <<-
obj1 <- myRFclass(A = 2, B = 3)
obj1
# A = 2 B = 3 C = 5
obj1$changeA()
obj1
# A = 6 B = 3 C = 9
I use it in order to change inside map() an object in the global environment.
a = c(1,0,0,1,0,0,0,0)
Say I want to obtain a vector which is c(1,2,3,1,2,3,4,5), that is, if there is a 1, keep it as 1, otherwise add 1 to the previous value until the next 1.
library(purrr)  # provides map()
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] <<- ifelse(a[x]==1, a[x], a[x-1]+1)
})
a
[1] 1 2 3 1 2 3 4 5
I have a function in R that structures my raw data. I create a dataframe called output and then want to make a dynamic variable name depending on the function value block.
The output object does contain a dataframe as I want, and to rename it dynamically, at the end of the function I do this (within the function):
a = assign(paste("output", block, sep=""), output)
... but after running the function there is no object output1 (if block = 1). I simply cannot retrieve the output object, neither merely output nor the dynamic output1 version.
I tried this then:
a = assign(paste("output", block, sep=""), output)
return(a)
... but still - no success.
How can I retrieve the dynamic output variable? Where is my mistake?
Environments.
assign will by default create a variable in the environment in which it's called. Read about environments here: http://adv-r.had.co.nz/Environments.html
I assume you're doing something like:
foo <- function(x){ assign("b", x); b}
If you run foo(5), you'll see it returns 5 as expected (implying that b was created successfully somewhere), but b won't exist in your current environment.
Compare that with something like this:
foo <- function(x){ assign("b", x, envir=parent.frame()); b}
Here, you're assigning not to the current environment at the time assign is called (which happens to be foo's environment). Instead, you're assigning into the parent environment (which, since you're calling this function directly, will be your environment).
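The difference between the two can be checked from inside a calling function. This is only a sketch; foo_local, foo_parent and caller are made-up names.

```r
foo_local  <- function(x) assign("b_local", x)                            # default: foo_local's own frame
foo_parent <- function(x) assign("b_parent", x, envir = parent.frame())   # the caller's frame

caller <- function() {
  foo_local(5)
  foo_parent(5)
  here <- environment()
  # Which of the two variables ended up in caller's environment?
  c(b_local  = exists("b_local",  envir = here, inherits = FALSE),
    b_parent = exists("b_parent", envir = here, inherits = FALSE))
}

found <- caller()
found
#  b_local b_parent 
#    FALSE     TRUE 
```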
All of this should tell you that the approach is fragile and a nightmare to maintain. You'd surely be better off with something like:
foo <- function(x) { return(x) };
b <- foo(5)
Or if you need multiple items returned:
foo <- function(x) { return(list(df=data.frame(col1=x), b=x)) }
results <- foo(5)
df <- results$df
b <- results$b
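For the dynamic-name part of the question (output1, output2, ...), a common alternative to assign is a named list keyed by block. A minimal sketch, where structure_block is a hypothetical stand-in for your data-structuring function:

```r
# Hypothetical stand-in for the function that builds `output`
structure_block <- function(block) {
  data.frame(block = block, value = block * 10)
}

outputs <- list()
for (block in 1:3) {
  # the "dynamic name" becomes a list name instead of a variable name
  outputs[[paste0("output", block)]] <- structure_block(block)
}

names(outputs)
# [1] "output1" "output2" "output3"
outputs$output1$value
# [1] 10
```

Everything stays in one object, so it is easy to loop over or combine later.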
But ours is not to reason why...