R: environment lookup

R: environment lookup - r

I am a bit confused by R's lookup mechanism. When I have the following code
# create chain of empty environments
e1 <- new.env()
e2 <- new.env(parent=e1)
e3 <- new.env(parent=e2)
# set key/value pairs
e1[["x"]] <- 1
e2[["x"]] <- 2
then I would expect to get "2" if I look for "x" in environment e3.
This works if I do
> get(x="x", envir=e3)
[1] 2
but not if I use
> e3[["x"]]
NULL
Could somebody explain the difference? It seems, that
e3[["x"]]
is not just syntactic sugar for
get(x="x", envir=e3)
Thanks in advance,
Sven

These functions are different.
get performs a search for an object in an environemnt, as well as the enclosing frames (by default):
From ?get:
This function looks to see if the name x has a value bound to it in the specified environment. If inherits is TRUE and a value is not found for x in the specified environment, the enclosing frames of the environment are searched until the name x is encountered. See environment and the ‘R Language Definition’ manual for details about the structure of environments and their enclosures.
In contrast, the [ operator does not search enclosing environments, by default.
From ?'[':
Both $ and [[ can be applied to environments. Only character indices are allowed and no partial matching is done. The semantics of these operations are those of get(i, env=x, inherits=FALSE).

Related

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function I cannot figure out in which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)

If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value — and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!1 You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
1 Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function exectuion.

The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.

Auto assignment

Recently I have been working with a set of R scripts that I inherited from a colleague. It is for me a trusted source but more than once I found in his code auto-assignments like
x <<- x
Is there any scope where such an operation could make sense?

This is a mechanism for copying a value defined within a function into the global environment (or at least, somewhere within the stack of parent of environments): from ?"<<-"
The operators ‘<<-’ and ‘->>’ are normally only used in functions,
and cause a search to be made through parent environments for an
existing definition of the variable being assigned. If such a
variable is found (and its binding is not locked) then its value
is redefined, otherwise assignment takes place in the global
environment.
I don't think it's particularly good practice (R is a mostly-functional language, and it's generally better to avoid function side effects), but it does do something. (#Roland points out in comments and #BrianO'Donnell in his answer [quoting Thomas Lumley] that using <<- is good practice if you're using it to modify a function closure, as in demo(scoping). In my experience it is more often misused to construct global variables than to work cleanly with function closures.)
Consider this example, starting in an empty/clean environment:
f <- function() {
x <- 1 ## assignment
x <<- x ## global assignment
}
Before we call f():
x
## Error: object 'x' not found
Now call f() and try again:
f()
x
## [1] 1

<<-
is a global assignment operator and I would imagine there should hardly ever be a reason to use it because it effectively causes side effects.
The scope to use it would be in any case when one wants to define a global variable or a variable one level up from current environment.

Alan gives a good answer: Use the superassignment operator <<- to write upstairs.
Hadley also gives a good answer: How do you use "<<-" (scoping assignment) in R?.
For details on the 'superassignment' operator see Scope.
Here is some critical information on the operator from the section on Assignment Operators in the R manual:
"The operators <<- and ->> are normally only used in functions, and cause a search to be made through parent environments for an existing definition of the variable being assigned. If such a variable is found (and its binding is not locked) then its value is redefined, otherwise assignment takes place in the global environment."
Thomas Lumley sums it up nicely: "The good use of superassignment is in conjuction with lexical scope, where an environment stores state for a function or set of functions that modify the state by using superassignment."

For example:
x <- NA
test <- function(x) {
x <<- x
}
> test(5)
> x
#[1] 5
That's a simple use here, <<- will do a parent environment search (case of nested functions declarations) and if not found assign in the global environment.
Usually this is a really bad ideaTM as you have no real control on where the variable will be assigned and you have chances it will overwrite a variable used for another purpose somewhere.

How to use reflection to intercept an expression prior to evaluation?

I was hoping to use R's reflection capabilities to intercept the current expression under evaluation before it is evaluated.
For instance, to create some syntax sugar, given the following:
> Server <- setRefClass("Server",
> methods = list(
> handler = function(expr) submitExpressionToRemoteServer(expr)
> )
> )
> server <- Server()
> server$foo$bar$baz #... should be map to... server$handler("foo$bar$baz")
I want the expression server$foo$bar$baz to be intercepted by the server$handlermethod and get mapped to server$handler("foo$bar$baz").
Note that I want this call to succeed even though server$foo is not defined: I am interested only in the expression itself (so I can do stuff with the expression), not that it evaluates to a valid local object.
Is this possible?

I don't think this is possible to redefine the $ behavior with Reference Classes (R5) objects in R. However, this is something that you can do with S4 classes. The main problem is that an expression like
server$foo$bar$baz
would get translated to a series of calls like
$($($(server,"foo"),"bar"),"baz")
but unlike normal function nesting, each inner call appears to be fully evaluated before going to the next level of nesting. This it's not really possible just to split up everything after the first $ because that's not how it's parsed. However you can have the $ function return another object and append all the values sent to the object. Here's a sample S4 class
setClass("Server", slots=list(el="character"))
setMethod("$", signature(x="Server"),
function(x,name) {
xx <- append(slot(x,"el"),name)
new("Server", el=xx)
}
)
server <- new("Server")
server$foo$bar$baz
# An object of class "Server"
# Slot "el":
# [1] "foo" "bar" "baz"
the only problem is there's no way i've found to know when you're at the end of a list if you wanted to do anything with those parameters.

Confused by ...()?

In another question, sapply(substitute(...()), as.character) was used inside a function to obtain the names passed to the function. The as.character part sounds fine, but what on earth does ...() do?
It's not valid code outside of substitute:
> test <- function(...) ...()
> test(T,F)
Error in test(T, F) : could not find function "..."
Some more test cases:
> test <- function(...) substitute(...())
> test(T,F)
[[1]]
T
[[2]]
F
> test <- function(...) substitute(...)
> test(T,F)
T

Here's a sketch of why ...() works the way it does. I'll fill in with more details and references later, but this touches on the key points.
Before performing substitution on any of its components, substitute() first parses an R statement.
...() parses to a call object, whereas ... parses to a name object.
... is a special object, intended only to be used in function calls. As a consequence, the C code that implements substitution takes special measures to handle ... when it is found in a call object. Similar precautions are not taken when ... occurs as a symbol. (The relevant code is in the functions do_substitute, substitute, and substituteList (especially the latter two) in R_SRCDIR/src/main/coerce.c.)
So, the role of the () in ...() is to cause the statement to be parsed as a call (aka language) object, so that substitution will return the fully expanded value of the dots. It may seem surprising that ... gets substituted for even when it's on the outside of the (), but: (a) calls are stored internally as list-like objects and (b) the relevant C code seems to make no distinction between the first element of that list and the subsequent ones.
Just a side note: for examining behavior of substitute or the classes of various objects, I find it useful to set up a little sandbox, like this:
f <- function(...) browser()
f(a = 4, 77, B = "char")
## Then play around within the browser
class(quote(...)) ## quote() parses without substituting
class(quote(...()))
substitute({...})
substitute(...(..., X, ...))
substitute(2 <- (makes * list(no - sense))(...))

Why the "=" R operator should not be used in functions?

The manual states:
The operator ‘<-’ can be used anywhere,
whereas the operator ‘=’ is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
The question here mention the difference when used in the function call. But in the function definition, it seems to work normally:
a = function ()
{
b = 2
x <- 3
y <<- 4
}
a()
# (b and x are undefined here)
So why the manual mentions that the operator ‘=’ is only allowed at the top level??
There is nothing about it in the language definition (there is no = operator listed, what a shame!)

The text you quote says at the top level OR in a braced list of subexpressions. You are using it in a braced list of subexpressions. Which is allowed.
You have to go to great lengths to find an expression which is neither toplevel nor within braces. Here is one. You sometimes want to wrap an assignment inside a try block: try( x <- f() ) is fine, but try( x = f(x) ) is not -- you need to either change the assignment operator or add braces.

Expressions not at the top level include usage in control structures like if. For example, the following programming error is illegal.
> if(x = 0) 1 else x
Error: syntax error
As mentioned here: https://stackoverflow.com/a/4831793/210673
Also see http://developer.r-project.org/equalAssign.html

Other than some examples such as system.time as others have shown where <- and = have different results, the main difference is more philisophical. Larry Wall, the creater of Perl, said something along the lines of "similar things should look similar, different things should look different", I have found it interesting in different languages to see what things are considered "similar" and which are considered "different". Now for R assignment let's compare 2 commands:
myfun( a <- 1:10 )
myfun( a = 1:10 )
Some would argue that in both cases we are assigning 1:10 to a so what we are doing is similar.
The other argument is that in the first call we are assigning to a variable a that is in the same environment from which myfun is being called and in the second call we are assigning to a variable a that is in the environment created when the function is called and is local to the function and those two a variables are different.
So which to use depends on whether you consider the assignments "similar" or "different".
Personally, I prefer <-, but I don't think it is worth fighting a holy war over.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

R: environment lookup - r

Related

Where are function constants stored if a function is created inside another function?

Auto assignment

How to use reflection to intercept an expression prior to evaluation?

Confused by ...()?

Why the "=" R operator should not be used in functions?

Categories

Resources