R function for obtaining a reference to a variable - r

In Advanced R, environments are advertised as a useful way to get pass-by-reference semantics in R: instead of passing a list, which gets copied, I can pass an environment, which is not. This is useful to know.
But it assumes that whoever is calling my function is happy to agree on an "environment"-based data type, with named slots corresponding to the variables we want to modify.
Hasn't someone made a class which allows me to just refer to a single variable by reference? For example,
v = 1:5
r <- ref(v)
(function() {
getRef(r) # same as v
setRef(r, 1:6) # same as v <<- 1:6, in this case
})()
It would seem to be pretty easy to do this, by storing the character name of v together with the environment where it is bound.
Is there a standard library which accomplishes this semantics, or can someone provide a short snippet of code? (I haven't finished reading "Advanced R"; apologies if this is covered later in the book)

As you have already mentioned in your question, you can store the variable name and its environment and access it with get and assign what will be somehow like a reference to a single variable.
v <- 1:5
r <- list(name="v", env=environment())
(function() {
get(r$name, envir = r$env)
assign(r$name, 1:6, envir = r$env)
})()
v
#[1] 1 2 3 4 5 6
Alternatively you can store the reference to an environment but then you can access everything in this referenced environment.
v <- 1:5
r <- globalenv() #reference to everything in globalenv
(function() {
r$v
r$v <- 1:6
})()
v
#[1] 1 2 3 4 5 6
You can also create an environment with only one variable and make a reference to it.
v <- new.env(parent=emptyenv())
v$v <- 1:5
r <- v
(function() {
r$v
r$v <- 1:6
})()
v$v
#[1] 1 2 3 4 5 6
Implemented as functions using find or set the environment during creation. Have also a look at How to get environment of a variable in R.
ref <- function(name, envir = NULL) {
name <- substitute(name)
if (!is.character(name)) name <- deparse(name)
if(length(envir)==0) envir <- as.environment(find(name))
list(name=name, envir=envir)
}
getRef <- function(r) {
get(r$name, envir = r$envir, inherits = FALSE)
}
setRef <- function(r, x) {
assign(r$name, x, envir = r$envir, inherits = FALSE)
}
x <- 1
r1 <- ref(x) #x from Global Environment
#x from Function Environment
r2 <- (function() {x <- 2; ref(x, environment())})()
#But simply returning x might here be better
r2b <- (function() {x <- 2; x})()
a <- new.env(parent=emptyenv())
a$x <- 3
r3 <- ref(x, a) #x from Environment a

This is based on GKi's answer, thanks to him for stepping up.
It includes pryr::where so you don't have to install the whole library
Note that we need to point "where" to parent.frame() in the definition of "ref"
Added some test cases which I used to check correctness
The code:
# copy/modified from pryr::where
where = function(name, env=parent.frame()) {
if (identical(env, emptyenv())) {
stop("Can't find ", name, call. = FALSE)
}
if (exists(name, env, inherits = FALSE)) {
env
} else {
where(name, parent.env(env))
}
}
ref <- function(v) {
arg <- deparse(substitute(v))
list(name=arg, env=where(arg, env=parent.frame()))
}
getRef <- function(r) {
get(r$name, envir = r$env, inherits = FALSE)
}
setRef <- function(r, x) {
assign(r$name, x, envir = r$env)
}
if(1) { # tests
v <- 1:5
r <- ref(v)
(function() {
stopifnot(identical(getRef(r),1:5))
setRef(r, 1:6)
})()
stopifnot(identical(v,1:6))
# this refers to v in the global environment
v=2; r=(function() {ref(v)})()
stopifnot(getRef(r)==2)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==5)
# same as above
v=2; r=(function() {v <<- 3; ref(v)})()
stopifnot(getRef(r)==3)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==5)
# this creates a local binding first, and refers to that. the
# global binding is unaffected
v=2; r=(function() {v=3; ref(v)})()
stopifnot(getRef(r)==3)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==2)
# additional tests
r=(function() {v=4; (function(v1) { ref(v1) })(v)})()
stopifnot(r$name=="v1")
stopifnot(getRef(r)==4)
setRef(r,5)
stopifnot(getRef(r)==5)
# check that outer v is not modified
v=2; r=(function() {(function(v1) { ref(v1) })(v)})()
stopifnot(getRef(r)==2)
setRef(r,5)
stopifnot(getRef(r)==5)
stopifnot(v==2)
}
I imagine there may be some garbage collection inefficiency if you're creating a reference to a small variable in a temporary environment with a different large variable, since the reference must retain the whole environment - although the same problem could arise with other uses of lexical scoping.
I will probably use this code next time I need pass-by-reference semantics.

Related

Can I access the last computed result before a function 'stop's?

Consider this code:
bad_function <- function() {
# a lot of code
x <- 1
stop("error")
}
tryCatch(bad_function(), error = function(cond) {x})
Obviously, x is not accessible in the error handler. But is there another way to access the value of x without changing bad_function? Alternatively, is there a way to patch bad_function to skip over stop("error") and return x without having to copy all that # a lot of code?
This works if the result you are looking for is named (and the you know the name - here, x):
bad_function <- function() {
# a lot of code
x <- 1
stop("error")
}
.old_stop <- base::stopifnot
.new_stop <- function(...) {
parent.frame()$x
}
assignInNamespace("stop", .new_stop, "base")
bad_function()
assignInNamespace("stop", .old_stop, "base")
I still wonder if there are better solutions.
You could assign the value simultaneously to x in the function environment, as well to another x in an external say debug environment that you defined beforehand.
ev1 <- new.env()
bad_function <- function() {
env <- new.env(parent=baseenv())
# a lot of code
x <- ev1$x <- 1
stop("error")
}
tryCatch(bad_function(), error = function(e) ev1$x)
# [1] 1
The advantage is that .GlobalEnv stays clear (apart from the environment of course).
ls()
# [1] "bad_function" "ev1"

Define function in environment that changes an object in the environment

I would like to write a function that returns an environment containing a function which assigns the value of an object inside the environment. For example, what I want to do is:
makeenv <- function() {
e <- new.env(parent = .GlobalEnv)
e$x <- 0
e$setx <- function(k) { e$x <- k } # NOT OK
e
}
I would like to fix the e$setx function above. The behavior of the above is weird to me:
e1 <- makeenv()
e1$x
## [1] 0
e1$setx
## function(k) e$x <- k
## <environment: 0x7f96144d8240>
e1$setx(3) # Strangely, this works.
e1$x
## [1] 3
# --------- clone ------------
e2 <- new.env(parent = .GlobalEnv)
e2$x <- e1$x
e2$setx <- e1$setx
e2$x
## [1] 3
# ----- e2$setx() changes e1$x -----
e2$setx(7) # HERE
e2$x # e2$x is not changed.
## [1] 3
e1$x # e1$x is changed instead.
## [1] 7
Could someone please help me understand what is going on here? I especially don't understand why e2$setx(7) sets e1$x to 7 rather than issuing an error. I think I am doing something very wrong here.
I would also like to write a correct function e$setx inside the makeenv function that correctly assigns a value to the x object in the environment e. Would it be possible to have one without using S4 or R6 classes? I know that a function like setx <- function(e,k) { e$x <- k } works, but to me e1$setx(5) looks more intuitive than setx(e1,5) and I would like to investigate this possibility first. Is it possible to have something like e$setx <- function(k) { self$x <- k }, say, where self refers to the e preceding the $?
This page The equivalent of 'this' or 'self' in R looks relevant, but I like to have the effect without using S4 or R6. Or am I trying to do something impossible? Thank you.
You can use local to evaluate the function definition in the environment:
local(etx <- function (k) e$x <- k, envir = e)
Alternatively, you can also change the function’s environment after the fact:
e$setx <- function(k) e$x <- k
environment(e$setx) <- e
… but neither is strictly necessary in your case, since you probably don’t need to create a brand new environment. Instead, you can reuse the current calling environment. Doing this is a very common pattern:
makeenv <- function() {
e <- environment()
x <- 0
setx <- function(k) e$x <- k
e
}
Instead of e$x <- k you could also write e <<- k; that way, you don’t need the e variable at all:
makeenv <- function() {
x <- 0
setx <- function(k) x <<- k
environment()
}
… however, I actually recommend against this: <<- is error-prone because it looks for assignment targets in all parent environments; and if it can’t find any, it creates a new variable in the global environment. It’s better to explicitly specify the assignment target.
However, note that none of the above changes the observed semantics of your code: when you copy the function into a new environment, it retains its old environment! If you want to “move over” the function, you explicitly need to reassign its environment:
e2$setx <- e1$setx
environment(e2$setx) <- e2
… of course writing that entire code manually is pretty error-prone. If you want to create value semantics (“deep copy” semantics) for an environment in R, you should wrap this functionality into a function:
copyenv <- function (e) {
new_env <- list2env(as.list(e, all.names = TRUE), parent = parent.env(e), hash = TRUE)
new_env$e <- new_env
environment(new_env$setx) <- new_env
new_env
}
e2 <- copyenv(e1)
Note that the copyenv function is not trying to be general; it needs to be adapted for other environment structures. There is no good, general way of writing a deep-copy function for environments that handles all cases, since a general function can’t know how to handle self-references (i.e. the e in the above): in your case, you want to preserve self-references (i.e. change them to point to the new environment). But in other cases, the reference might need to point to something else.
This is a general problem of deep copying. That’s why the R serialize function, for instance, has refHook parameter that tells the function how to serialise environment references.

Access enclosed variables through outer function called within closure

The following code is supposed to change the value of the enclosed variable some.var calling the function set.var. The latter calls itself the outer function g, whose environment is changed to the parent environment of set.var
new.obj <- function(){
some.var = NULL
set.var <- function(...) {
environment(g) <- parent.frame()
g(x="some.var", ...)
}
get.var <- function(){some.var}
return(list(set.var=set.var, get.var=get.var))
}
g <- function(x) assign(x,1)
However, obj<-new.obj(); obj$set.var(); obj$get.var() returns NULL. What goes wrong here and how can the behaviour be fixed?
The reason why I am considering this construction is, that I would like to reuse the code within g in different closures. Hence, it should be placed outside of these.
I suspect that you are making things more complicated than necessary. Take a look at this example and comment if you need to achieve something that this cannot do:
do_the_job <- function(x) {
return(x * 1000)
}
wrapper_function <- function(y) {
return( do_the_job(y) )
}
my_value <- 5
my_new_value <- wrapper_function(my_value)
Not entirely sure what I'm doing here myself.
I hope it helps/inspires you:
new.obj <- function(){
env1 <- new.env()
env1$some.var = NULL
f <- function() {
environment(g) <- parent.frame()
g(x="some.var", envir = env1)
}
get.var <- function(){ env1$some.var }
return(list(f=f, get.var=get.var))
}
g <- function(x, ...) assign(x, "hihi_changed", envir = ...)
obj<-new.obj(); obj$get.var() null is returned as we expect.
obj$f(); obj$get.var() function g is called eventually that changes some.var.
My trick is to add the variable some.var to a new environment and always refer to that env1 environment.
So in function g(), always use the ellipsis to refer to the new.obj environment env1, where currently some.var lives.
Hope this keeps you going.

Evaluate an expression into a given environment

I would like to call a function inside a given environment so that the execution of the function does not mess the global environment. I have something like that:
f <- function() { x <<- 5 }
e <- new.env()
evalq(f(), envir = e)
I want to evaluate f() inside the environment e, but it seems it does not work since the variable x is accessible from the global env instead of the env e. I mean I want to access x by typing e$x, not x.
I tried to build the function f inside e like this:
e <- new.env()
evalq(f <- function() { x <<- 5 }, envir = e)
evalq(f(), envir = e)
But it still does not work, x is in the global env, not in e.
EDIT with more informations:
From the answers I received, I understand that <<- should not be used. The problem is that, in practice, I don't know what f is, since it is obtained by sourcing a file which is downloaded through a shiny app. The purpose of this app is to evaluate a classifier function f written by students. So in case they used the <<- operator in their function, I don't want to pollute my global env (which seem to be impossible from the description of the <<- operator).
I think I could remove the potentially created variables with something like
vars <- ls()
evalq(f(), envir = e)
new_vars <- ls()
rm(list = new_vars[!new_vars %in% vars])
But is there a better solution?
You can use assign to define the environment where the object should be created:
e <- new.env()
f <- function() {
assign("x", 5, envir = e)
}
f()
x
# Error: object 'x' not found
e$x
# [1] 5
As #lmo says, the simplest way would be to not use <<- to do this. Also, the function runs in its own environment so if you were to use <- you would need to assign the output of the function to a variable (otherwise it is only created in the f's temporary environment. Something like this would work:
f <- function() { x <- 5 }
e <- new.env()
evalq(x <- f(), envir = e)
e$x
#[1] 5

Convenience function for exporting objects to the global environment

UPDATE: I have added a variant
of Roland's implementation to the kimisc package.
Is there a convenience function for exporting objects to the global environment, which can be called from a function to make objects available globally?
I'm looking for something like
export(obj.a, obj.b)
which would behave like
assign("obj.a", obj.a, .GlobalEnv)
assign("obj.b", obj.b, .GlobalEnv)
Rationale
I am aware of <<- and assign. I need this to refactor oldish code which is simply a concatenation of scripts:
input("script1.R")
input("script2.R")
input("script3.R")
script2.R uses results from script1.R, and script3.R potentially uses results from both 1 and 2. This creates a heavily polluted namespace, and I wanted to change each script
pollute <- the(namespace)
useful <- result
to
(function() {
pollute <- the(namespace)
useful <- result
export(useful)
})()
as a first cheap countermeasure.
Simply write a wrapper:
myexport <- function(...) {
arg.list <- list(...)
names <- all.names(match.call())[-1]
for (i in seq_along(names)) assign(names[i],arg.list[[i]],.GlobalEnv)
}
fun <- function(a) {
ttt <- a+1
ttt2 <- a+2
myexport(ttt,ttt2)
return(a)
}
print(ttt)
#object not found error
fun(2)
#[1] 2
print(ttt)
#[1] 3
print(ttt2)
#[1] 4
Not tested thoroughly and not sure how "safe" that is.
You can create an environment variable and use it within your export function. For example:
env <- .GlobalEnv ## better here to create a new one :new.env()
exportx <- function(x)
{
x <- x+1
env$y <- x
}
exportx(3)
y
[1] 4
For example , If you want to define a global options(emulate the classic R options) in your package ,
my.options <- new.env()
setOption1 <- function(value) my.options$Option1 <- value
EDIT after OP clarification:
You can use evalq which take 2 arguments :
envir the environment in which expr is to be evaluated
enclos where R looks for objects not found in envir.
Here an example:
env.script1 <- new.env()
env.script2 <- new.env()
evalq({
x <- 2
p <- 3
z <- 5
} ,envir = env.script1,enclos=.GlobalEnv)
evalq({
h <- x +2
} ,envir = env.script2,enclos=myenv.script1)`
You can see that all variable are created within the environnment ( like local)
env.script2$h
[1] 4
env.script1$p
[1] 3
> env.script1$x
[1] 2
First, given your use case, I don't see how an export function is any better than using good (?) old-fashioned <<-. You could just do
(function() {
pollute <- the(namespace)
useful <<- result
})()
which will give the same result as what's in your example.
Second, rather than anonymous functions, it seems better form to use local, which allows you to run involved computations without littering your workspace with various temporary objects.
local({
pollute <- the(namespace)
useful <<- result
})
ETA: If it's important for whatever reason to avoid modifying an existing variable called useful, put an exists check in there. The same applies to the other solutions presented.
local({
.....
useful <- result
if(!exists("useful", globalenv())) useful <<- useful
})

Resources