I am writing some functions for doing repeated tasks, but I am trying to minimize the amount of times I load the data. Basically I have one function that takes some information and makes a plot. Then I have a second function that will loop through and output multiple plots to a .pdf. In both functions I have the following line of code:
if(load.dat) load("myworkspace.RData")
where load.dat is a logical and the data I need is stored in myworkspace.RData. When I am calling the wrapper function that loops through and outputs multiple plots I do not want to reload the workspace in every call to the inner function. I thought I could just load the workspace once in the wrapper function, then the inner function could access that data, but I got an error stating otherwise.
So my understanding was when a function cannot find the variable in its local environment (created when the function gets called), the function will look to the parent environment for the variable.
I assumed the parent environment to the inner function call would be the outer function call. Obviously this is not true:
func1 <- function(...){
print(var1)
}
func2 <- function(...){
var1 <- "hello"
func1(...)
}
> func2()
Error in print(var1) : object 'var1' not found
After reading numerous questions, the language manual, and this really helpful blog post, I came up with the following:
var1 <- "hello"
save(list="var1",file="test.RData")
rm(var1)
func3 <- function(...){
attach("test.RData")
func1(...)
detach("file:test.RData")
}
> func3()
[1] "hello"
Is there a better way to do this? Why doesn't func1 look for undefined variables in the local environment created by func2, when it was func2 that called func1?
Note: I did not know how to name this question. If anyone has better suggestions I will change it and edit this line out.
To illustrate lexical scoping, consider the following:
First let's create a sandbox environment, only to avoid the oh-so-common R_GlobalEnv:
sandbox <-new.env()
Now we put two functions inside it: f, which looks for a variable named x; and g, which defines a local x and calls f:
sandbox$f <- function()
{
value <- if(exists("x")) x else "not found."
cat("This is function f looking for symbol x:", value, "\n")
}
sandbox$g <- function()
{
x <- 123
cat("This is function g. ")
f()
}
Technicality: entering function definitions in the console causes then to have the enclosing environment set to R_GlobalEnv, so we manually force the enclosures of f and g to match the environment where they "belong":
environment(sandbox$f) <- sandbox
environment(sandbox$g) <- sandbox
Calling g. The local variable x=123 is not found by f:
> sandbox$g()
This is function g. This is function f looking for symbol x: not found.
Now we create a x in the global environment and call g. The function f will look for x first in sandbox, and then in the parent of sandbox, which happens to be R_GlobalEnv:
> x <- 456
> sandbox$g()
This is function g. This is function f looking for symbol x: 456
Just to check that f looks for x first in its enclosure, we can put a x there and call g:
> sandbox$x <- 789
> sandbox$g()
This is function g. This is function f looking for symbol x: 789
Conclusion: symbol lookup in R follows the chain of enclosing environments, not the evaluation frames created during execution of nested function calls.
EDIT: Just adding a link to this very interesting answer from Martin Morgan on the related subject of parent.frame() vs parent.env()
You could use closures:
f2 <- function(...){
f1 <- function(...){
print(var1)
}
var1 <- "hello"
f1(...)
}
f2()
Related
I have the function below.
test_fun <- function() {
a <- 1
b <- 2
}
Is there a way I run this function and a and b will be assigned in the parent environment (in this case globalenv)? I don't want to modify the function (no assign with envir or <<-), but call it in a way that what I want will be achieved.
A better pattern to use, from the point of encapsulation, might be to have your custom function return a 2D vector containing the a and b values:
test_fun <- function() {
a <- 1
b <- 2
return(c(a, b))
}
result <- test_fun()
a <- result[1]
b <- result[2]
The preferred method for passing information/work done in a function is via the return value, rather than trying to manage the issue of different scopes.
Normally R functions return their outputs. Functions such as test_fun
are discouraged but if you want to do it anyways use trace as shown below. This will cause the code given in the exit argument to run at function exit. No packages are used.
trace("test_fun", exit = quote(list2env(mget(ls()), globalenv())), print = FALSE)
test_fun()
a;b
## [1] 1
## [1] 2
Alternatives
Some alternatives for the exit= argument are the following.
(a) below is similar to above except that rather than leaving the objects loose in the global environment it first creates an environment env in the global environment and puts them there.
In (b) below, the objects are copied to environment(test_fun) which is the environment in which test_fun is defined. (If test_fun is defined in the global environment then it gives the same result as the code at the top of the answer.)
In (c) below, the parent frame of test_fun is the environment of the caller of test_fun and can vary from one call to another if called from different places. (If called from the global environment then it gives the same result as the code at the top of the answer.)
It would also be possible to add the current frame within test_fun to the search path using attach but there are a number of gotchas associated with that so it is not shown.
# (a) copy objects to environment env located in global environment
assign("env", new.env(), globalenv())
trace("test_fun", exit = quote(list2env(mget(ls()), env)), print = FALSE)
# (b) copy objects to environment in which test_fun is defined
trace("test_fun",
exit = quote(list2env(mget(ls()), environment(test_fun)), print = FALSE)
# (c) copy object to parent.frame of test_fun; 5 needed as trace adds layers
trace("test_fun", exit = quote(list2env(mget(ls()), parent.frame(5))),
print = FALSE)
I've been reading about R environments, and I'm trying to test my understanding with a simple example:
> f <- function() {
+ x <- 1
+ environment(x)
+ }
>
> f()
NULL
I'm assuming this means that the object x is enclosed by the environment named NULL, but when I try to list all the objects in that environment, R displays an error message:
> ls(NULL)
Error in as.environment(pos) : using 'as.environment(NULL)' is defunct
So I'm wondering if there's a built-in function I can use on the command line that will return the environment name given the object name. I tried this:
> environment(x)
Error in environment(x) : object 'x' not found
but that returned an error as well. Any help will be greatly appreciated.
Variables created in function calls are destroyed when the function finishes executing (unless you specifically create them in other persistent environments). As #joran pointed out, when a function is called, a temporary environment is created where local variables are defined, and is destroyed when the function is done executing (that memory is freed). However, as #MrFlick pointed out, if the function returns a function, the returned function maintains a reference to the environment it was created in. You can read more about 'scope', 'stack', and 'heap'. In R there are various ways you can define your variables into specified environments.
f <- function() {
x <<- 1 # create x in the global environment (or change it if it's there)
## or `assign` x to a value
## assign(x, value=1, envir=.GlobalEnv)
}
environment(f) # where was f defined?
exists("x", envir=.GlobalEnv)
# [1] TRUE
The package pryr has some nice functions to do these kind of things. For example, there is a function called where which will give you the environment of an object:
library(pryr)
f <- function() {
x <- 1
where("x")
}
f()
<environment: 0x0000000013356f50>
So the environment of x was the temporary enviroment created by function f(). As people have said before, this enviroment is detroyed after you run the function, so it will give you a different result each time you run f().
I am new to R and trying to figure out behavior of local,bound and global variables. I am confused with the following problem. If I write the function in the following way, then what are the local, bound and global variables of function f?
f <- function(a ="") {
return_a <- function() a
set_a <- function(x)
a <<- x
list(return_a,set_a)
}
return_a is a function. set_a is a function. They are both functional objects (with associated environments, but using the word "variable" to describe them seems prone to confusion. If you call f, you get a list of twofunctions. When you create a list, there are not necessarily names to the list so p$set_a("Carl") throws an error because there is no p[['set_a']].
> p <- f("Justin"); p$set_a("Carl")
Error: attempt to apply non-function
But p[[2]] now returns a function and you need to call it:
> p[[2]]
function(x)
a <<- x
<environment: 0x3664f6a28>
> p[[2]]("Carl")
That did change the value of the symbol-a in the environment of p[[1]]:
> p[[1]]()
[1] "Carl"
I have a function in R that structures my raw data. I create a dataframe called output and then want to make a dynamic variable name depending on the function value block.
The output object does contain a dataframe as I want, and to rename it dynamically, at the end of the function I do this (within the function):
a = assign(paste("output", block, sep=""), output)
... but after running the function there is no object output1 (if block = 1). I simply cannot retrieve the output object, neither merely output nor the dynamic output1 version.
I tried this then:
a = assign(paste("output", block, sep=""), output)
return(a)
... but still - no success.
How can I retrieve the dynamic output variable? Where is my mistake?
Environments.
assign will by default create a variable in the environment in which it's called. Read about environments here: http://adv-r.had.co.nz/Environments.html
I assume you're doing something like:
foo <- function(x){ assign("b", x); b}
If you run foo(5), you'll see it returns 5 as expected (implying that b was created successfully somewhere), but b won't exist in your current environment.
If, however, you do something like this
foo <- function(x){ assign("b", x, envir=parent.frame()); b}
Here, you're assigning not to the current environment at the time assign is called (which happens to be foo's environment). Instead, you're assigning into the parent environment (which, since you're calling this function directly, will be your environment).
All this complexity should reveal to you that this will be fairly complex, a nightmare to maintain, and a really bad idea from a maintenance perspective. You'd surely be better off with something like:
foo <- function(x) { return(x) };
b <- foo(5)
Or if you need multiple items returned:
foo <- function(x) { return(list(df=data.frame(col1=x), b=x)) }
results <- foo(5)
df <- results$df
b <- results$b
But ours is not to reason why...
For the moment, at least, this is an exercise in learning for me, so the actual functions or their complexity is not the issue. Suppose I write a function whose argument list includes some input variables and a function name, passed as a string. This function then calculates some variables internally and "decides" how to feed them to the function name I've passed in.
For nonprimitive functions, I can do (for this example, assume non of my funcname functions have any arguments other than at most (x,y,z). If they did, I'd have to write some code to search for matching names(formals(get(funcname))) so as not to delete the other arguments):
foo <- function (a,b,funcname) {
x <- 2*a
y <- a+3*b
z <- -b
formals(get(funcname)) <- list(x=x, y=y, z=z)
bar <- get(funcname)()
return(bar)
}
And the nice thing is, even if the function funcname will execute without error even if it doesn't use x, y or z (so long as there are no other args that don't have defaults) .
The problem with "primitive" functions is I don't know any way to find or modify their formals. Other than writing a wrapper, e.g. foosin <-function(x) sin(x), is there a way to set up my foo function to work with both primitive and nonprimitive function names as input arguments?
formals(args(FUN)) can be used to get the formals of a primitive function.
You could add an if statement to your existing function.
> formals(sum)
# NULL
> foo2 <- function(x) {
if(is.primitive(x)) formals(args(x)) else formals(x)
## formals(if(is.primitive(x)) args(x) else x) is another option
}
> foo2(sum)
# $...
#
#
# $na.rm
# [1] FALSE
#
> foo2(with)
# $data
#
#
# $expr
#
#
# $...
Building on Richard S' response, I ended up doing the following. Posted just in case anyone else ever tries do things as weird as I do.
EDIT: I think more type-checking needs to be done. It's possible that coleqn could be
the name of an object, in which case get(coleqn) will return some data. Probably I need
to add a if(is.function(rab)) right after the if(!is.null(rab)). (Of course, given that I wrote the function for my own needs, if I was stupid enough to pass an object, I deserve what I get :-) ).
# "coleqn" is the input argument, which is a string that could be either a function
# name or an expression.
rab<-tryCatch(get(coleqn),error=function(x) {} )
#oops, rab can easily be neither NULL nor a closure. Damn.
if(!is.null(rab)) {
# I believe this means it must be a function
# thanks to Richard Scriven of SO for this fix to handle primitives
# we are not allowed to redefine primitive's formals.
qq <- list(x=x,y=y,z=z)
# matchup the actual formals names
# by building a list of valid arguments to pass to do.call
argk<-NULL
argnames<-names(formals(args(coleqn)))
for(j in 1:length(argnames) )
argk[j]<-which(names(qq)==argnames[1] )
arglist<-list()
for(j in 1:length(qq) )
if(!is.na(argk[j])) arglist[[names(qq)[j]]]<-qq[[j]]
colvar<- do.call(coleqn,arglist)
} else {
# the input is just an expression (string), not a function
colvar <- eval(parse(text=coleqn))
}
The result is an object generated either by the expression or the function just created, using variables internal to the main function (which is not shown in this snippet)