What is the difference between assign() and <<- in R? - r

The normal approach to writing functions in R (as I understand) is to avoid side-effects and return a value from a function.
contained <- function(x) {
x_squared <- x^2
return(x_squared)
}
In this case, the value computed from the input into the function is returned. But the variable x_squared is not available.
But if you need to violate this basic functional programming tenet (and I'm not sure how serious R is about this issue) and return an object from a function, you have two choices.
escape <- function(x){
x_squared <<- x^2
assign("x_times_x", x*x, envir = .GlobalEnv)
}
Both objects x_squared and x_times_x end up in the global environment. Is one method preferable to the other, and why?

Thomas Lumley answers this in a superb post on r-help the other day. <<- is about the enclosing environment, so you can do things like this (and again, I quote his post from April 22 in this thread):
make.accumulator<-function(){
a <- 0
function(x) {
a <<- a + x
a
}
}
> f<-make.accumulator()
> f(1)
[1] 1
> f(1)
[1] 2
> f(11)
[1] 13
> f(11)
[1] 24
This is a legitimate use of <<- as "super-assignment" with lexical scope, and not simply a way to assign in the global environment. For that, Thomas has these choice words:
The Evil and Wrong use is to modify
variables in the global environment.
Very good advice.

According to the manual page here,
The operators <<- and ->> cause a search to be made through the environment for an existing definition of the variable being assigned.
I've never had to do this in practice, but to my mind, assign wins a lot of points for specifying the environment exactly, without even having to think about R's scoping rules. The <<- performs a search through environments and is therefore a little bit harder to interpret.
EDIT: In deference to @Dirk and @Hadley, it sounds like assign is the appropriate way to actually assign to the global environment (when that's what you know you want), while <<- is the appropriate way to "bump up" to a broader scope.
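To make the distinction concrete, here is a minimal sketch (the names target, outer_fn and inner_fn are made up for illustration, and a fresh environment stands in for .GlobalEnv so that nothing in the workspace is touched). <<- updates the nearest existing binding it finds in the enclosing environments, while assign() writes into exactly the environment it is given:

```r
# A throwaway environment used as the explicit target for assign()
target <- new.env()

outer_fn <- function() {
  val <- "original"
  inner_fn <- function() {
    # <<- searches the enclosing environments and finds val in outer_fn's frame
    val <<- "changed by <<-"
    # assign() does no searching here: it writes straight into `target`
    assign("val", "set by assign()", envir = target)
  }
  inner_fn()
  val
}

outer_result <- outer_fn()
target_value <- get("val", envir = target)
print(outer_result)
print(target_value)
```

Reading the code, the destination of the assign() call is visible at the call site, whereas the destination of <<- depends on where inner_fn was defined.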

As pointed out by @John in his answer, assign lets you specify the environment explicitly. A specific application would be the following:
testfn <- function(x){
x_squared <- NULL
escape <- function(x){
x_squared <<- x^2
assign("x_times_x", x*x, envir = parent.frame(n = 1))
}
escape(x)
print(x_squared)
print(x_times_x)
}
where we use both <<- and assign. Notice that if you want to use <<- to assign to the environment of the top-level function, you need to declare/initialise the variable there first. With assign, however, you can use parent.frame(1) to target the calling environment, which here is that of testfn.
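One caveat worth illustrating: in testfn the enclosing environment and the calling environment happen to be the same, because escape is both defined and called inside testfn. When a function is defined in one place and called from another, the two differ. A small sketch, with the hypothetical names home, setter and caller:

```r
# `home` plays the role of the environment where setter() is defined
home <- new.env()
assign("flag", "initial", envir = home)

setter <- function() {
  flag <<- "set via <<-"                                    # lexical: enclosing env
  assign("flag", "set via assign", envir = parent.frame())  # dynamic: the caller
}
environment(setter) <- home

caller <- function() {
  setter()
  # look only in caller's own frame, without inheriting
  get("flag", inherits = FALSE)
}

caller_result <- caller()
home_value <- get("flag", envir = home)
print(caller_result)
print(home_value)
```

Here <<- follows where the function was defined, while parent.frame() follows where it was called from.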

Related

equivalent of within(), attach() etc. for working within an environment?

I want to add an environment to a search path and modify the values of variables within that environment, in a limited chunk of code, without having to specify the name of the environment every time I refer to a variable: for example, given the environment
ee <- list2env(list(x=1,y=2))
Now I would like to do stuff like
ee$x <- ee$x+1
ee$y <- ee$y*2
ee$z <- 6
but without prefixing everything with ee$ (or using assign("x", ee$x+1, ee) ... etc.): something like
in_environment(ee, {
x <- x+1
y <- y+2
z <- 6
})
Most of the solutions I can think of are explicitly designed not to modify the environment, e.g.
?attach: "The database is not actually attached. Rather, a new environment
is created on the search path ..."
within(): takes lists or data frames (not environments) "... and makes the corresponding modifications to a copy of ‘data’"
There are two problems with <<-: (1) using it will cause NOTEs in CRAN checks (I think? can't find direct evidence of this, but e.g. see here — maybe this only happens because of the appearance of assigning to a locally undefined symbol? I guess I could put this in a package and test it with --as-cran to confirm ...); (2) it will try to assign in the parent environment, which in a package context [which this is] will be locked ...
I suppose I could use a closure as described in section 10.7 of the Introduction to R by doing
clfun <- function() {
x <- 1
y <- 2
function(...) {
x <<- x + 1
y <<- y * 2
}
}
myfun <- clfun()
This seems convoluted (but I guess not too bad?) but:
will still incur problem #1 (CRAN check?).
I think (??) it won't work with variables that don't already exist in the environment (would need an explicit assign() for that ...)
doesn't allow a choice of which environment to operate in - it's necessarily going to work in the enclosing environment, not with arbitrary environment ee
Am I missing something obvious and idiomatic?
Thanks to @Nuclear03020704! I think with() was what I wanted all along; I was incorrectly assuming that it would also create a local copy of the environment, but it only does this if the data argument is not already an environment.
ee <- list2env(list(x=1,y=2))
with(ee, {
x <- x+1
y <- y+2
z <- 6
})
does exactly what I want.
Just had another idea, which also seems to have some drawbacks: using a big eval clause. Rather than make my question a long laundry list of unsatisfactory solutions, I'll add it here.
myfun <- function() {
eval(quote( {
x <- x+1
y <- y*2
z <- 3
}), envir=ee)
}
This does seem to work, but also seems very weird/mysterious! I hate to think about explaining it to someone who's been using R for less than 10 years ... I suppose I could write an in_environment() based on this, but I'd have to be very careful to capture the expression properly without evaluating it ...
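For what it's worth, a minimal sketch of such an in_environment() helper might look like the following. The function name comes from the question; the implementation is only one possible approach, using substitute() to capture the unevaluated expression (base R's evalq() does much the same job):

```r
# Hypothetical helper: evaluate `expr` inside `env` without evaluating it first
in_environment <- function(env, expr) {
  eval(substitute(expr), envir = env)
}

ee <- list2env(list(x = 1, y = 2))
in_environment(ee, {
  x <- x + 1
  y <- y * 2
  z <- 6
})

# Inspect the environment after the call
print(mget(c("x", "y", "z"), envir = ee))
```

Because env is a real environment, the assignments inside the braces modify it in place, including the creation of the new variable z.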
What about with()? From here,
with(data, expr)
data is the data to use for constructing an environment. For the default with method this may be an environment, a list, a data frame, or an integer.
expr is the expression to evaluate.
with is a generic function that evaluates expr in a local environment constructed from data. The environment has the caller's environment as its parent. This is useful for simplifying calls to modeling functions. (Note: if data is already an environment then this is used with its existing parent.)
Note that assignments within expr take place in the constructed environment and not in the user's workspace.
with() returns value of the evaluated expr.
ee <- list2env(list(x=1,y=2))
with(ee, {
x <- x+1
y <- y+2
z <- 6
})
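The difference between passing an environment and passing a list can be checked directly. This is a small sketch with illustrative variable names (ee_env, ee_list): with() modifies the environment in place, but leaves the list untouched because the assignments happen in a temporary environment built from it.

```r
# Case 1: data is an environment, so assignments persist in it
ee_env <- list2env(list(x = 1, y = 2))
with(ee_env, {
  x <- x + 1
  z <- 6
})

# Case 2: data is a list, so assignments happen in a throwaway environment
ee_list <- list(x = 1, y = 2)
with(ee_list, {
  x <- x + 1
  z <- 6
})

print(ee_env$x)       # modified
print(ee_list$x)      # unchanged
print(names(ee_list)) # no z was added
```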

Mixing standard and lazy evaluation of function arguments

Actual question
How can I mix standard and lazy evaluation of function arguments while giving the user a unified and simple syntax when calling my functions?
Background
I'm a huge fan of dplyr, but what I don't quite like about it is that it makes you use distinct function names (e.g. select vs. select_) and that it makes you think too much about how to write function calls when you want to express your arguments as a "mixed bag": some are expressed as character strings, for others you want lazy evaluation, for yet others you want standard evaluation. Also see John Mount's blog post on wrapr for another example of where it becomes overly complex to do a simple thing due to standard vs. lazy evaluation.
Example
This is the simplest way of writing my dplyr::select expression that I know of
x <- "disp"
select_(mtcars, "mpg", ~cyl, x)
After playing around, here's a draft of the solution I'm after:
select2 <- function(dat, ...) {
args <- substitute(list(...))
## Express names as character //
idx <- which(sapply(args, class) == "name")[-1]
## We don't care about the first one as it's going to be
## substituted anyway
if (length(idx)) {
for (ii in idx) args[[ii]] <- as.character(args[[ii]])
}
## Ensure `c()` //
args[[1]] <- quote(c)
## Standard eval for variables containing actual column name //
idx <- which(!eval(args) %in% names(dat)) + 1
if (length(idx)) {
for (ii in idx) args[[ii]] <- eval(as.name(args[[ii]]))
}
## Indexing expression //
exprsn <- substitute(dat[, J], list(J = eval(args)))
eval(exprsn)
}
x <- "disp"
(select2(mtcars, "mpg", cyl, x))
It works, but of course it's very poorly implemented with regard to efficiency ;-)
To make it better and to understand more about evaluation in R, I'd like to know in particular how to get rid of the for loops and how I could best leverage existing functionality of the dplyr and lazyeval packages as well as base-R functionality like do.call("[.data.frame", ...), with() or the like. Especially the indexing and assignment methods ("[.*" and "<-.*") and how to call them directly are still kind of a mystery for me.
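As one possible direction, here is a base-R sketch that replaces the two for loops with a single vapply() pass. The name select3 is hypothetical, it does not use dplyr or lazyeval at all, and it deliberately handles only strings and bare names (formulas such as ~cyl are not supported):

```r
select3 <- function(dat, ...) {
  caller <- parent.frame()                       # where plain variables live
  exprs <- as.list(substitute(list(...)))[-1]    # unevaluated arguments
  cols <- vapply(exprs, function(e) {
    if (is.character(e)) return(e)               # "mpg": already a string
    if (is.name(e) && as.character(e) %in% names(dat)) {
      return(as.character(e))                    # cyl: bare column name
    }
    as.character(eval(e, caller))                # x: variable holding a name
  }, character(1))
  dat[, cols, drop = FALSE]
}

x <- "disp"
selected <- select3(mtcars, "mpg", cyl, x)
print(names(selected))
```

Note the design choice that a bare name matching a column wins over a variable of the same name in the calling environment; whether that precedence is desirable depends on the use case.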

Stop function evaluation using another function in R

I tried using a nested return call in R, but without success. I come from Mathematica, where this code works well. Here is a toy example:
fstop <- function(x){
if(x>0) return(return("Positive Number"))
}
f <- function(x){
fstop(x)
"Negative or Zero Number"
}
If I evaluate f(1), I get:
[1] "Negative or Zero Number"
When I expected just:
[1] "Positive Number"
The question is: is there some non-standard evaluation I can do in fstop so that I get just the fstop result, without changing the f function?
PS: I know I can put the if directly inside f, but in my real case the structure is not so simple, and this structure would make my code simpler.
Going to stick my neck out and say...
No.
Making a function return not to its caller but to its caller's caller would involve changing its execution context. This is how things like return and other control-flow things are implemented in the source. See:
https://github.com/wch/r-source/blob/trunk/src/main/context.c
Now, I don't think R-level code has access to execution contexts like this. Maybe you could write some C-level code that could do it, but it's not clear. You could always write a do_return_return function in the style of do_return in eval.c and build a custom version of R... It's not worth it.
So the answer is most likely "no".
I think Spacedman is right, but if you're willing to evaluate your expressions in a wrapper, then it is possible by leveraging the tryCatch mechanism to break out of the evaluation stack.
First, we need to define a special RETURN function:
RETURN <- function(x) {
cond <- simpleCondition("") # dummy message required
class(cond) <- c("specialReturn", class(cond))
attr(cond, "value") <- x
signalCondition(cond)
}
Then we re-write your functions to use our new RETURN:
f <- function(x) {
fstop(x)
"Negative or Zero"
}
fstop <- function(x) if(x > 0) RETURN("Positive Number") # Note `RETURN` not `return`
Finally, we need the wrapper function (wsr here stands for "with special return") to evaluate our expressions:
wsr <- function(x) {
tryCatch(
eval(substitute(x), envir=parent.frame()),
specialReturn=function(e) attr(e, "value")
) }
Then:
wsr(f(-5))
# [1] "Negative or Zero"
wsr(f(5))
# [1] "Positive Number"
Obviously this is a little hacky, but in day to day use would be not much different than evaluating expressions in with or calling code with source. One shortcoming is this will always return to the level you call wsr from.
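For comparison, base R also ships callCC(), which gives a function an explicit "exit" it can hand to its callees. The sketch below is an alternative to the wrapper above, not part of it, and it does require changing f (which the question hoped to avoid); the exit argument name is made up for illustration:

```r
# fstop receives the escape function and calls it to leave f early
fstop <- function(x, exit) {
  if (x > 0) exit("Positive Number")
}

f <- function(x) {
  callCC(function(exit) {
    fstop(x, exit)               # may jump straight out of callCC()
    "Negative or Zero Number"    # reached only if exit() was not called
  })
}

positive_result <- f(1)
negative_result <- f(-1)
print(positive_result)
print(negative_result)
```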

Use of the <<- operator in R [duplicate]

I just finished reading about scoping in the R intro, and am very curious about the <<- assignment.
The manual showed one (very interesting) example for <<-, which I feel I understood. What I am still missing is the context of when this can be useful.
So what I would love to read from you are examples (or links to examples) on when the use of <<- can be interesting/useful. What might be the dangers of using it (it looks easy to lose track of), and any tips you might feel like sharing.
<<- is most useful in conjunction with closures to maintain state. Here's a section from a recent paper of mine:
A closure is a function written by another function. Closures are
so-called because they enclose the environment of the parent
function, and can access all variables and parameters in that
function. This is useful because it allows us to have two levels of
parameters. One level of parameters (the parent) controls how the
function works. The other level (the child) does the work. The
following example shows how we can use this idea to generate a family of
power functions. The parent function (power) creates child functions
(square and cube) that actually do the hard work.
power <- function(exponent) {
function(x) x ^ exponent
}
square <- power(2)
square(2) # -> [1] 4
square(4) # -> [1] 16
cube <- power(3)
cube(2) # -> [1] 8
cube(4) # -> [1] 64
The ability to manage variables at two levels also makes it possible to maintain the state across function invocations by allowing a function to modify variables in the environment of its parent. The key to managing variables at different levels is the double arrow assignment operator <<-. Unlike the usual single arrow assignment (<-) that always works on the current level, the double arrow operator can modify variables in parent levels.
This makes it possible to maintain a counter that records how many times a function has been called, as the following example shows. Each time new_counter is run, it creates an environment, initialises the counter i in this environment, and then creates a new function.
new_counter <- function() {
i <- 0
function() {
# do something useful, then ...
i <<- i + 1
i
}
}
The new function is a closure, and its environment is the enclosing environment. When the closures counter_one and counter_two are run, each one modifies the counter in its enclosing environment and then returns the current count.
counter_one <- new_counter()
counter_two <- new_counter()
counter_one() # -> [1] 1
counter_one() # -> [1] 2
counter_two() # -> [1] 1
It helps to think of <<- as equivalent to assign (if you set the inherits parameter in that function to TRUE). The benefit of assign is that it allows you to specify more parameters (e.g. the environment), so I prefer to use assign over <<- in most cases.
Using <<- and assign(x, value, inherits=TRUE) means that "enclosing environments of the supplied environment are searched until the variable 'x' is encountered." In other words, it will keep going through the environments in order until it finds a variable with that name, and it will assign it to that. This can be within the scope of a function, or in the global environment.
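A minimal sketch of that equivalence, using a hypothetical make_counter() written with assign(..., inherits = TRUE) in place of <<-. One difference to keep in mind, as far as I understand the documentation: assign() starts its search in the environment you supply, whereas <<- starts one level up, in the enclosing environment.

```r
make_counter <- function() {
  count <- 0
  function() {
    # count is not defined in this frame, so the search moves to the
    # enclosing environment, finds it there, and updates it in place
    assign("count", count + 1, envir = environment(), inherits = TRUE)
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()
print(c(first, second))
```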
In order to understand what these functions do, you need to also understand R environments (e.g. using search).
I regularly use these functions when I'm running a large simulation and I want to save intermediate results. This allows you to create the object outside the scope of the given function or apply loop. That's very helpful, especially if you have any concern about a large loop ending unexpectedly (e.g. a database disconnection), in which case you could lose everything in the process. This would be equivalent to writing your results out to a database or file during a long running process, except that it's storing the results within the R environment instead.
My primary warning with this: be careful because you're now working with global variables, especially when using <<-. That means that you can end up with situations where a function is using an object value from the environment, when you expected it to be using one that was supplied as a parameter. This is one of the main things that functional programming tries to avoid (see side effects). I avoid this problem by assigning my values to unique variable names (using paste with a set of unique parameters) that are never used within the function, but just used for caching and in case I need to recover later on (or do some meta-analysis on the intermediate results).
One place where I used <<- was in simple GUIs using tcl/tk. Some of the initial examples have it -- as you need to make a distinction between local and global variables for statefulness. See for example
library(tcltk)
demo(tkdensity)
which uses <<-. Otherwise I concur with Marek :) -- a Google search can help.
On this subject I'd like to point out that the <<- operator will behave strangely when applied (incorrectly) within a for loop (there may be other cases too). Given the following code:
fortest <- function() {
mySum <- 0
for (i in c(1, 2, 3)) {
mySum <<- mySum + i
}
mySum
}
you might expect that the function would return the expected sum, 6, but instead it returns 0, with a global variable mySum being created and assigned the value 3. I can't fully explain what is going on here but certainly the body of a for loop is not a new scope 'level'. Instead, it seems that R looks outside of the fortest function, can't find a mySum variable to assign to, so creates one and assigns the value 1, the first time through the loop. On subsequent iterations, the RHS in the assignment must be referring to the (unchanged) inner mySum variable whereas the LHS refers to the global variable. Therefore each iteration overwrites the value of the global variable to that iteration's value of i, hence it has the value 3 on exit from the function.
Hope this helps someone - this stumped me for a couple of hours today! (BTW, just replace <<- with <- and the function works as expected).
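The behaviour follows from <<- skipping the current function's frame and starting its search in the enclosing environment. The same effect can be reproduced without touching the global environment; in this sketch (outer_sum and add_all are made-up names) the right-hand side keeps reading the local total while the left-hand side keeps writing to the outer one:

```r
outer_sum <- function() {
  total <- 0                       # binding in outer_sum's frame
  add_all <- function(values) {
    total <- 0                     # a separate, local binding
    for (v in values) {
      total <<- total + v          # reads local total (0), writes outer total
    }
    total                          # the local binding, never modified
  }
  local_result <- add_all(c(1, 2, 3))
  c(local = local_result, outer = total)
}

sums <- outer_sum()
print(sums)
```

The outer total ends up holding only the last value of v, for the same reason the global mySum ends up as 3 above.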
f <- function(n, x0) {x <- x0; replicate(n, (function(){x <<- x+rnorm(1)})())}
plot(f(1000,0),typ="l")
The <<- operator can also be useful for Reference Classes when writing Reference Methods. For example:
myRFclass <- setRefClass(Class = "RF",
fields = list(A = "numeric",
B = "numeric",
C = function() A + B))
myRFclass$methods(show = function() cat("A =", A, "B =", B, "C =",C))
myRFclass$methods(changeA = function() A <<- A*B) # note the <<-
obj1 <- myRFclass(A = 2, B = 3)
obj1
# A = 2 B = 3 C = 5
obj1$changeA()
obj1
# A = 6 B = 3 C = 9
I use it to modify an object in the global environment from inside map() (from the purrr package).
a = c(1,0,0,1,0,0,0,0)
Say I want to obtain the vector c(1,2,3,1,2,3,4,5): wherever there is a 1, keep it; otherwise add 1 to the previous value, until the next 1.
library(purrr)
map(
.x = seq(1,(length(a))),
.f = function(x) {
a[x] <<- ifelse(a[x]==1, a[x], a[x-1]+1)
})
a
[1] 1 2 3 1 2 3 4 5
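For this particular task a side-effect-free alternative exists in base R. The sketch below is offered only as a comparison: it groups the vector by the running count of 1s and numbers the elements within each group, so neither <<- nor purrr is needed. The names a_input and a_counted are illustrative.

```r
a_input <- c(1, 0, 0, 1, 0, 0, 0, 0)

# cumsum(a_input == 1) labels each run that starts at a 1;
# seq_along() then counts 1, 2, 3, ... within each run
a_counted <- ave(a_input, cumsum(a_input == 1), FUN = seq_along)
print(a_counted)
```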

Saving output from a function.

I am minimizing the function below using the optim function, which works really well. My only problem is that I can't save the W matrix that I am computing inside the function during the minimization. Is there a way to save the W matrix somehow?
W<-c()
GMM_1_stage <- function(beta) {for (i in 1:(nrow(gmm_i))){
gmm_i[i,]=g_beta(i,beta)}
gmm_N=t(colSums(gmm_i))%*%colSums(gmm_i)
W<-solve((1/(nrow(A)/5))*t(gmm_i)%*%gmm_i)
return(gmm_N)
}
GMM_1<-optim(beta_MLE,GMM_1_stage)
Best regards
Here is a safer version of @mrip's answer that uses a temporary environment rather than <<-:
tempenv <- new.env()
tempenv$xx <- c()
fun<-function(x){
tempenv$xx[ length(tempenv$xx) + 1 ] <- x
x^2
}
optimize(fun,c(-1,1))
tempenv$xx
By using the temporary environment you don't need to worry about accidentally writing over an object in the global environment or <<- assigning in an unexpected place.
You can assign to an object in the global environment (or in the closest ancestor environment where the variable is defined) using <<-. So, for example, if I wanted to keep track of every value of x during a simple optimization I could do this.
xx<-c()
fun<-function(x){
xx[length(xx)+1]<<-x
x^2
}
optimize(fun,c(-1,1))
xx
## [1] -2.360680e-01 2.360680e-01 5.278640e-01 -2.775558e-17 4.069010e-05
## [6] -4.069010e-05 -2.775558e-17
In your case, if you only want the last value of W you can replace that line in your code with:
W<<-solve((1/(nrow(A)/5))*t(gmm_i)%*%gmm_i)
If you want them all, then first set Wlist<-list(), and then in your function set
Wlist[[length(Wlist)+1]]<<-solve((1/(nrow(A)/5))*t(gmm_i)%*%gmm_i)
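Putting the pieces together for an optim() call, here is a self-contained sketch. The objective and compute_W() are hypothetical stand-ins (the real g_beta, gmm_i and A from the question are not available), and an environment is used for storage instead of <<-. Note that the last W stored may not correspond to the returned optimum, since the optimizer's final function evaluation need not be at fit$par, so recomputing W at fit$par is the safer choice.

```r
# Hypothetical stand-in for the W computation in the question
compute_W <- function(beta) tcrossprod(beta)

store <- new.env()

objective <- function(beta) {
  store$W <- compute_W(beta)      # overwritten on every evaluation
  sum((beta - c(1, 2))^2)         # toy criterion, minimised at c(1, 2)
}

fit <- optim(c(0, 0), objective)

W_last <- store$W                 # W from the last evaluation
W_opt <- compute_W(fit$par)       # W recomputed at the reported optimum
print(fit$par)
print(W_opt)
```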
