Mutating a variable in a closure [duplicate] - r

This question already has answers here:
Global and local variables in R
(3 answers)
Closed 8 years ago.
I'm pretty new to R, but coming from Scheme—which is also lexically scoped and has closures—I would expect being able to mutate outer variables in a closure.
E.g., in
foo <- function() {
s <- 100
add <- function() {
s <- s + 1
}
add()
s
}
cat(foo(), "\n") # prints 100 and not 101
I would expect foo() to return 101, but it actually returns 100:
$ Rscript foo.R
100
I know that Python has the global keyword to declare scope of variables (doesn't work with this example, though). Does R need something similar?
What am I doing wrong?
Update
Ah, is the problem that in add I am creating a new, local variable s that shadows the outer s? If so, how can I mutate s without creating a local variable?

Use the <<- operator for assignment in the add() function.
From ?"<<-":
The operators <<- and ->> are normally only used in functions, and cause a search to made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment. Note that their semantics differ from that in the S language, but are useful in conjunction with the scoping rules of R. See ‘The R Language Definition’ manual for further details and examples.

You can also use assign and define the scope precisely using the envir argument, works the same way as <<- in your add function in this case but makes your intention a little more clear:
foo <- function() {
s <- 100
add <- function() {
assign("s", s + 1, envir = parent.frame())
}
add()
s
}
cat(foo(), "\n")
Of course the better way for this kind of thing in R is to have your function return the variable (or variables) it modifies and explicitly reassigning them to the original variable:
foo <- function() {
s <- 100
add <- function(x) x + 1
s <- add(s)
s
}
cat(foo(), "\n")

Here is one more approach that can be a little safer than the assign or <<- approaches:
foo <- function() {
e <- environment()
s <- 100
add <- function() {
e$s <- e$s + 1
}
add()
s
}
foo()
The <<- assignment can cause problems if you accidentally misspell your variable name, it will still do something, but it will not be what you are expecting and can be hard to find the source of the problem. The assign approach can be tricky if you then want to move your add function to inside another function, or call it from another function. The best approach overall is to not have the functions modify variables outside their own scope and have the function return anything that is important. But when that is not possible, the above method uses lexical scoping to access the environment e, then assigns into the environment so it will always assign specifically into that function, never above or below.

Related

R: IF object is TRUE then assign object NOT WORKING

I am trying to write a very basic IF statement in R and am stuck. I thought I'd find someone with the same problem, but I cant. Im sorry if this has been solved before.
I want to check if a variable/object has been assigned, IF TRUE I want to execute a function that is part of a R-package. First I wrote
FileAssignment <- function(x){
if(exists("x")==TRUE){
print("yes!")
x <- parse.vdjtools(x)
} else { print("Nope!")}
}
I assign a filename as x
FILENAME <- "FILENAME.txt"
I run the function
FileAssignment(FILENAME)
I use print("yes!") and print("Nope!") to check if the IF-Statement works, and it does. However, the parse.vdjtools(x) part is not assigned. Now I tested the same IF-statement outside of the function:
if(exists("FILENAME1")==TRUE){
FILENAME1 <- parse.vdjtools(FILENAME1)
}
This works. I read here that it might be because the function uses {} and the if-statement does too. So I should remove the brackets from the if-statement.
FileAssignment <- function(x){
if(exists("x")==TRUE)
x <- parse.vdjtools(x)
else { print("Nope!")
}
Did not work either.
I thought it might be related to the specific parse.vdjtools(x) function, so I just tried assigning a normal value to x with x <- 20. Also did not work inside the function, however, it does outside.
I dont really know what you are trying to acheive, but I wpuld say that the use of exists in this context is wrong. There is no way that the x cannot exist inside the function. See this example
# All this does is report if x exists
f <- function(x){
if(exists("x"))
cat("Found x!", fill = TRUE)
}
f()
f("a")
f(iris)
# All will be found!
Investigate file.exists instead? This is vectorised, so a vector of files can be investigated at the same time.
The question that you are asking is less trivial than you seem to believe. There are two points that should be addressed to obtain the desired behavior, and especially the first one is somewhat tricky:
As pointed out by #NJBurgo and #KonradRudolph the variable x will always exist within the function since it is an argument of the function. In your case the function exists() should therefore not check whether the variable x is defined. Instead, it should be used to verify whether a variable with a name corresponding to the character string stored in x exists.
This is achieved by using a combination of deparse() and
substitute():
if (exists(deparse(substitute(x)))) { …
Since x is defined only within the scope of the function, the superassignment operator <<- would be required to make a value assigned to x visible outside the function, as suggested by #thothai. However, functions should not have such side effects. Problems with this kind of programming include possible conflicts with another variable named x that could be defined in a different context outside the function body, as well as a lack of clarity concerning the operations performed by the function.
A better way is to return the value instead of assigning it to a variable.
Combining these two aspects, the function could be rewritten like this:
FileAssignment <- function(x){
if (exists(deparse(substitute(x)))) {
print("yes!")
return(parse.vdjtools(x))
} else {
print("Nope!")
return(NULL)}
}
In this version of the function, the scope of x is limited to the function body and the function has no side effects. The return value of FileAssignment(a) is either parse.vdjtools(a) or NULL, depending on whether a exists or not.
Outside the function, this value can be assigned to x with
x <- FileAssignment(a)

lapply() emptied list step by step while processing

First of all, excuse me for the bad title. I'm still so confused about this behavior, that I wasn't able to describe it; however I was able to reproduce it and broke it down to an (goofy) example.
Please, could you be so kind and explain why other.list appears to be full of NULLs after calling lapply()?
some.list <- rep(list(rnorm(1)),33)
other.list <- rep(list(), length = 33)
lapply(seq_along(some.list), function(i, other.list) {
other.list[[i]] <- some.list[[i]]
browser()
}, other.list)
I watched this in debugging mode in RStudio. For certain i, other.list[[i]] gets some.list[[i]] assigned, but it will be NULLed for the next iteration. I want to understand this behavior so bad!
The reason is that the assignment is taking place inside a function, and you've used the normal assignment operator <-, rather than the superassignment operator <<-. When inside a function scope, IOW when a function is executed, the normal assignment operator always assigns to a local variable in the evaluation environment that is created for that particular evaluation of that function (returned by a call to environment() from inside the function with fun=NULL). Thus, your global other.list variable, which is defined in the global environment (returned by globalenv()), will not be touched by such an assignment. The superassignment operator, on the other hand, will follow the closure environment chain (can be followed recursively via parent.env()) back until it finds a variable with the name on the LHS of the assignment, and then it assigns to that. The global environment is always at the base of the closure environment chain. If no such variable is found, the superassignment operator creates one in the global environment.
Thus, if you change <- to <<- in the assignment that takes place inside the function, you will be able to modify the global other.list variable.
See https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html.
Here, I tried to make a little demo to demonstrate these concepts. In all my assignments, I'm assigning the actual environment that contains the variable being assigned to:
oldGlobal <- environment(); ## environment() is same as globalenv() in global scope
(function() {
newLocal1 <- environment(); ## creates a new local variable in this function evaluation's evaluation environment
print(newLocal1); ## <environment: 0x6014cbca8> (different for every evaluation)
oldGlobal <<- parent.env(environment()); ## target search hits oldGlobal in closure environment; RHS is same as globalenv()
newGlobal1 <<- globalenv(); ## target search fails; creates a new variable in the global environment
(function() {
newLocal2 <- environment(); ## creates a new local variable in this function evaluation's evaluation environment
print(newLocal2); ## <environment: 0x6014d2160> (different for every evaluation)
newLocal1 <<- parent.env(environment()); ## target search hits the existing newLocal1 in closure environment
print(newLocal1); ## same value that was already in newLocal1
oldGlobal <<- parent.env(parent.env(environment())); ## target search hits oldGlobal two closure environments up in the chain; RHS is same as globalenv()
newGlobal2 <<- globalenv(); ## target search fails; creates a new variable in the global environment
})();
})();
oldGlobal; ## <environment: R_GlobalEnv>
newGlobal1; ## <environment: R_GlobalEnv>
newGlobal2; ## <environment: R_GlobalEnv>
I haven't run your code, but two observations:
I usually avoid putting browser() as the last line inside a function because that gets treated as the return value
other.list does not get modified by your lapply. You need to understand the basics of environments and that any bindings you make inside lapply do not hold outside of it. It's a design feature and the whole point is that lapply can't have side effects - you should only use its return value. You can either use the <<- operator instead of <- though I don't recommend that, or you can use the assign function instead. Or you can do it properly the way lapply is meant to be used:
others.list <- lapply(seq_along(some.list), function(i, other.list) {
some.list[[i]]
})
Note that it's generally recommended to not make assignments inside lapply that change variables outside of it. lapply is meant to perform a function on every element and return a list, and that list should be all that lapply is used for

R: how to replicate the <<- assignment with assign()?

I need to assign a variable in a function, whose name is a parameter of the function, and I need to access it later on, outside the function. I think <<- would do it in another situation, but since the name of the variable is dynamic, I think I need assign().
What I have at the moment
assignVar = function(varname) {
assign(varname,"blabla")
}
assignVar("foo")
foo
returns the usual Error: object 'foo' not found.
Is there some option to the assign function, or any other secret weapon, that I could use to do that? I have looked at the documentation, but I am still very confused about environments... (R beginner).
It's really important to understand the behavior of <<- before you use it. Once you understand that behavior, you will also note that you cannot use assign to achieve its behavior (at least not very easily). Here's a simple example:
a <- 1
f1 <- function(){
a <<- 2
NULL
}
f1()
a
# [1] 2
a <- 1
f2 <- function(){
a <- 2
f3 <- function(){
a <<- 3
}
f3()
NULL
}
f2()
a
# [1] 1
<<- works by assigning into the parent environment, if possible. If there is no existing a object (in this example) there, it goes up another level and repeats this until it reaches the global environment, where it will ultimately assign if no other lower environment worked. So in the above examples, f1() results in a change to the global environment but f2() does not. If you comment out the a<-2 line of f2, you do get a change in the global environment.
To achieve the same behavior using assign, you would need to write a much more complex function that loops through parent environments until it reaches the global environment. Regardless, having functions produce side effects is generally discouraged (due to introducing often unnecessary complexity to code and particularly when those side effects occur in the global environment).
I highly suggest you don't do this:
assignVar = function(varname, envir = globalenv()) {
assign(varname, "blabla", envir=envir)
invisible(NULL)
}
assignVar("foo", envir=globalenv())
foo
[1] "blabla"

R: Storing data within a function and retrieving without using "return"

The following simple example will help me address a problem in my program implementation.
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod<-prod(x,y)
return(Sum)
}
j=1:10
Try<-lapply(j,fun2)
#
I want to store "Prod" at each iteration so I can access it after running the function fun2. I tried using assign() to create space assign("Prod",numeric(10),pos=1)
and then assigning Prod at j-th iteration to Prod[j] but it does not work.
#
Any idea how this can be done?
Thank you
You can add anything you like in the return() command. You could return a list return(list(Sum,Prod)) or a data frame return(data.frame("In"=j,"Sum"=Sum,"Prod"=Prod))
I would then convert that list of data.frames into a single data.frame
Try2 <- do.call(rbind,Try)
Maybe re-think the problem in a more vectorized way, taking advantage of the implied symmetry to represent intermediate values as a matrix and operating on that
ni = 10; nj = 20
x = matrix(rnorm(ni * nj), ni)
y = matrix(runif(ni * nj), ni)
sums = colSums(x + y)
prods = apply(x * y, 2, prod)
Thinking about the vectorized version is as applicable to whatever your 'real' problem is as it is to the sum / prod example; in practice and when thinking in terms of vectors fails I've never used the environment or concatenation approaches in other answers, but rather the simple solution of returning a list or vector.
I have done this before, and it works. Good for a quick fix, but its kind of sloppy. The <<- operator assigns outside the function to the global environment.
fun2<-function(j){
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j]<<-prod(x,y)
}
j=1:10
Prod <- numeric(length(j))
Try<-lapply(j,fun2)
Prod
thelatemail and JeremyS's solutions are probably what you want. Using lists is the normal way to pass back a bunch of different data items and I would encourage you to use it. Quoted here so no one thinks I'm advocating the direct option.
return(list(Sum,Prod))
Having said that, suppose that you really don't want to pass them back, you could also put them directly in the parent environment from within the function using either assign or the superassignment operator. This practice can be looked down on by functional programming purists, but it does work. This is basically what you were originally trying to do.
Here's the superassignment version
fun2<-function(j)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
Prod[j] <<- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2)
Note that the superassignment searches back for the first environment in which the variable exists and modifies it there. It's not appropriate for creating new variables above where you are.
And an example version using the environment directly
fun2<-function(j,env)
{
x<-rnorm(10)
y<-runif(10)
Sum<-sum(x,y)
env$Prod[j] <- prod(x,y)
return(Sum)
}
j=1:10
Prod <- numeric(10)
Try<-lapply(j,fun2,env=parent.frame())
Notice that if you had called parent.frame() from within the function you would need to go back two frames because lapply() creates its own. This approach has the advantage that you could pass it any environment you want instead of parent.frame() and the value would be modified there. This is the seldom-used R implementation of writeable passing by reference. It's safer than superassignment because you know where the variable is that is being modified.

Get variables that have been created inside a function

I have created a function (which is quite long) that I have saved in a .txt file.
It works well (I use source(< >) to access it).
My problem is that I have created a few variables in that function
ie:
myfun<-function(a,b) {
Var1=....
Var2=Var1 + ..
}
Now I want to get those variables.
When I include return() inside the function, its fine: the value comes up on the screen, but when I type Var1 outside the function, I have an error message "the object cannot be found".
I am new to R, but I was thinking it might be because "myfun" operates in a different envrionment than the global one, but when I did
environment()
environment: R_GlobalEnv>
environment(myfun1)
environment: R_GlobalEnv>
It seems to me the problem is elsewhere...
Any idea?
Thanks
I realize this answer is more than 3 years old but I believe the option you are looking for is as follows:
myfun <- function(a,b) {
Var1 = (a + b) / 2 # do whatever logic you have to do here...
Var2 <<- Var1 + a # then output result to Global Environment with the "<<-" object.
}
The double "<<-" assignment operator will output "Var2" to the global environment and you can then use or reference it however you like without having to use "return()" inside your function.
If you want to do it in a nice way, write a class and than provide a print method. Within this class it is possible to return variables invisible. A nice book which covers such topics is "The Art of R programming".
An easy fix would be save each variable you need later on an list and than return a list
(as Peter pointed out):
return(list(VAR1=VAR1, .....))

Resources