How to use reflection to intercept an expression prior to evaluation? - r

I was hoping to use R's reflection capabilities to intercept the current expression under evaluation before it is evaluated.
For instance, to create some syntax sugar, given the following:
> Server <- setRefClass("Server",
> methods = list(
> handler = function(expr) submitExpressionToRemoteServer(expr)
> )
> )
> server <- Server()
> server$foo$bar$baz #... should be map to... server$handler("foo$bar$baz")
I want the expression server$foo$bar$baz to be intercepted by the server$handlermethod and get mapped to server$handler("foo$bar$baz").
Note that I want this call to succeed even though server$foo is not defined: I am interested only in the expression itself (so I can do stuff with the expression), not that it evaluates to a valid local object.
Is this possible?

I don't think this is possible to redefine the $ behavior with Reference Classes (R5) objects in R. However, this is something that you can do with S4 classes. The main problem is that an expression like
server$foo$bar$baz
would get translated to a series of calls like
$($($(server,"foo"),"bar"),"baz")
but unlike normal function nesting, each inner call appears to be fully evaluated before going to the next level of nesting. This it's not really possible just to split up everything after the first $ because that's not how it's parsed. However you can have the $ function return another object and append all the values sent to the object. Here's a sample S4 class
setClass("Server", slots=list(el="character"))
setMethod("$", signature(x="Server"),
function(x,name) {
xx <- append(slot(x,"el"),name)
new("Server", el=xx)
}
)
server <- new("Server")
server$foo$bar$baz
# An object of class "Server"
# Slot "el":
# [1] "foo" "bar" "baz"
the only problem is there's no way i've found to know when you're at the end of a list if you wanted to do anything with those parameters.

Related

Where are function constants stored if a function is created inside another function?

I am using a parent function to generate a child function by returning the function in the parent function call. The purpose of the parent function is to set a constant (y) in the child function. Below is a MWE. When I try to debug the child function I cannot figure out in which environment the variable is stored in.
power=function(y){
return(function(x){return(x^y)})
}
square=power(2)
debug(square)
square(3)
debugging in: square(3)
debug at #2: {
return(x^y)
}
Browse[2]> x
[1] 3
Browse[2]> y
[1] 2
Browse[2]> ls()
[1] "x"
Browse[2]> find('y')
character(0)
If you inspect the type of an R function, you’ll observe the following:
> typeof(square)
[1] "closure"
And that is, in fact, exactly the answer to your question: a closure is a function that carries an environment around.
R also tells you which environment this is (albeit not in a terribly useful way):
> square
function(x){return(x^y)}
<environment: 0x7ffd9218e578>
(The exact number will differ with each run — it’s just a memory address.)
Now, which environment does this correspond to? It corresponds to a local environment that was created when we executed power(2) (a “stack frame”). As the other answer says, it’s now the parent environment of the square function (in fact, in R every function, except for certain builtins, is associated with a parent environment):
> ls(environment(square))
[1] "y"
> environment(square)$y
[1] 2
You can read more about environments in the chapter in Hadley’s Advanced R book.
Incidentally, closures are a core feature of functional programming languages. Another core feature of functional languages is that every expression is a value — and, by implication, a function’s (return) value is the value of its last expression. This means that using the return function in R is both unnecessary and misleading!1 You should therefore leave it out: this results in shorter, more readable code:
power = function (y) {
function (x) x ^ y
}
There’s another R specific subtlety here: since arguments are evaluated lazily, your function definition is error-prone:
> two = 2
> square = power(two)
> two = 10
> square(5)
[1] 9765625
Oops! Subsequent modifications of the variable two are reflected inside square (but only the first time! Further redefinitions won’t change anything). To guard against this, use the force function:
power = function (y) {
force(y)
function (x) x ^ y
}
force simply forces the evaluation of an argument name, nothing more.
1 Misleading, because return is a function in R and carries a slightly different meaning compared to procedural languages: it aborts the current function exectuion.
The variable y is stored in the parent environment of the function. The environment() function returns the current environment, and we use parent.env() to get the parent environment of a particular environment.
ls(envir=parent.env(environment())) #when using the browser
The find() function doesn't seem helpful in this case because it seems to only search objects that have been attached to the global search path (search()). It doesn't try to resolve variable names in the current scope.

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

Why is R capricious in its use of attributes on reference class objects?

I am having some trouble achieving consistent behavior accessing attributes attached to reference class objects. For example,
testClass <- setRefClass('testClass',
methods = list(print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
attr(testInstance, 'testAttribute') <- 1
testInstance$print_attribute('testAttribute')
And the R console cheerily prints NULL. However, if we try another approach,
testClass <- setRefClass('testClass',
methods = list(initialize = function() attr(.self, 'testAttribute') <<- 1,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$print_attribute('testAttribute')
and now we have 1 as expected. Note that the <<- operator is required, presumably because assigning to .self has the same restrictions as assigning to reference class fields. Note that if we had tried to assign outside of the constructor, say
testClass <- setRefClass('testClass',
methods = list(set_attribute = function(name, value) attr(.self, name) <<- value,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$set_attribute('testAttribute', 1)
we would be slapped with
Error in attr(.self, name) <<- value :
cannot change value of locked binding for '.self'
Indeed, the documentation ?setRefClass explains that
The entire object can be referred to in a method by the reserved name .self ... These fields are read-only (it makes no sense to
modify these references), with one exception. In principal, the
.self field can be modified in the $initialize method, because
the object is still being created at this stage.
I am happy with all of this, and agree with author's decisions. However, what I am concerned about is the following. Going back to the first example above, if we try asking for attr(testInstance, 'testAttribute'), we see from the global environment that it is 1!
Presumably, the .self that is used in the methods of the reference class object is stored in the same memory location as testInstance--it is the same object. Thus, by setting an attribute on testInstance successfully in the global environment, but not as a .self reference (as demonstrated in the first example), have we inadvertently triggered a copy of the entire object in the global environment? Or is the way attributes are stored "funny" in some way that the object can reside in the same memory, but its attributes are different depending on the calling environment?
I see no other explanation for why attr(.self, 'testAttribute') is NULL but attr(testInstance, 'testAttribute') is 1. The binding .self is locked once and for all, but that does not mean the object it references cannot change. If this is the desired behavior, it seems like a gotcha.
A final question is whether or not the preceding results imply attr<- should be avoided on reference class objects, at least if the resulting attributes are used from within the object's methods.
I think I may have figured it out. I began by digging into the implementation of reference classes for references to .self.
bodies <- Filter(function(x) !is.na(x),
structure(sapply(ls(getNamespace('methods'), all.names = TRUE), function(x) {
fn <- get(x, envir = getNamespace('methods'))
if (is.function(fn)) paste(deparse(body(fn)), collapse = "\n") else NA
}), .Names = ls(getNamespace('methods'), all.names = TRUE))
)
Now bodies holds a named character vector of all the functions in the methods package. We now look for .self:
goods <- bodies[grepl("\\.self", bodies)]
length(goods) # 4
names(goods) # [1] ".checkFieldsInMethod" ".initForEnvRefClass" ".makeDefaultBinding" ".shallowCopy"
So there are four functions in the methods package that contain the string .self. Inspecting them shows that .initForEnvRefClass is our culprit. We have the statement selfEnv$.self <- .Object. But what is selfEnv? Well, earlier in that same function, we have .Object#.xData <- selfEnv. Indeed, looking at the attributes on our testInstance from example one gives
$.xData
<environment: 0x10ae21470>
$class
[1] "testClass"
attr(,"package")
[1] ".GlobalEnv"
Peeking into attributes(attr(testInstance, '.xData')$.self) shows that we indeed can access .self directly using this approach. Notice that after executing the first two lines of example one (i.e. setting up testInstance), we have
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] TRUE
Yes! They are equal. Now, if we perform
attr(testInstance, 'testAttribute') <- 1
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] FALSE
so that adding an attribute to a reference class object has forced a creation of a copy, and .self is no longer identical to the object. However, if we check that
identical(attr(testInstance, '.xData'), attr(attr(testInstance, '.xData')$.self, '.xData'))
# [1] TRUE
we see that the environment attached to the reference class object remains the same. Thus, the copying was not very consequential in terms of memory footprint.
The end result of this foray is that the final answer is yes, you should avoid setting attributes on reference classes if you plan to use them within that object's methods. The reason for this is that the .self object in a reference class object's environment should be considered fixed once and for all after the object has been initialized--and this includes the creation of additional attributes.
Since the .self object is stored in an environment that is attached as an attribute to the reference class object, it does not seem possible to avoid this problem without using pointer yoga--and R does not have pointers.
Edit
It appears that if you are crazy, you can do
unlockBinding('.self', attr(testInstance, '.xData'))
attr(attr(testInstance, '.xData')$.self, 'testAttribute') <- 1
lockBinding('.self', attr(testInstance, '.xData'))
and the problems above magically go away.

How to avoid prepending .self when using eval in a reference class in R?

I need to use eval to call a reference class method. Below is a toy example:
MyClass <- setRefClass("MyClass",
fields = c("my_field"),
methods = list(
initialize = function(){
my_field <<- 3
},
hello = function(){
"hello"
},
run = function(user_defined_text){
eval(parse(text = user_defined_text))
}
)
)
p <- MyClass$new()
p$run("hello()") # Error: could not find function "hello" - doesn't work
p$run(".self$hello()") # "hello" - it works
p$run("hello()") # "hello" - now it works?!
p <- MyClass$new()
p$run("my_field") # 3 - no need to add .self
I guess I could do eval(parse(text = paste0(".self$", user_defined_text))), but I don't really understand:
why is .self needed to eval methods, but not fields?
why is .self no longer needed after it has been used once?
'Why' questions are always challenging to answer; usually the answer is 'because'. On ?setRefClass we eventually have
Only methods actually used will be included in the environment
corresponding to an individual object. To declare that a method
requires a particular other method, the first method should
include a call to '$usingMethods()' with the name of the other
method as an argument. Declaring the methods this way is essential
if the other method is used indirectly (e.g., via 'sapply()' or
'do.call()'). If it is called directly, code analysis will find
it. Declaring the method is harmless in any case, however, and may
aid readability of the source code.
I'm not sure this is entirely helpful in your case, where the user is apparently able to specify any method. Offering a little unasked editorial comment, I'm not sure 'why' you'd want to write a method that would parse input text to methods; I've never used that paradigm myself.

Confused by ...()?

In another question, sapply(substitute(...()), as.character) was used inside a function to obtain the names passed to the function. The as.character part sounds fine, but what on earth does ...() do?
It's not valid code outside of substitute:
> test <- function(...) ...()
> test(T,F)
Error in test(T, F) : could not find function "..."
Some more test cases:
> test <- function(...) substitute(...())
> test(T,F)
[[1]]
T
[[2]]
F
> test <- function(...) substitute(...)
> test(T,F)
T
Here's a sketch of why ...() works the way it does. I'll fill in with more details and references later, but this touches on the key points.
Before performing substitution on any of its components, substitute() first parses an R statement.
...() parses to a call object, whereas ... parses to a name object.
... is a special object, intended only to be used in function calls. As a consequence, the C code that implements substitution takes special measures to handle ... when it is found in a call object. Similar precautions are not taken when ... occurs as a symbol. (The relevant code is in the functions do_substitute, substitute, and substituteList (especially the latter two) in R_SRCDIR/src/main/coerce.c.)
So, the role of the () in ...() is to cause the statement to be parsed as a call (aka language) object, so that substitution will return the fully expanded value of the dots. It may seem surprising that ... gets substituted for even when it's on the outside of the (), but: (a) calls are stored internally as list-like objects and (b) the relevant C code seems to make no distinction between the first element of that list and the subsequent ones.
Just a side note: for examining behavior of substitute or the classes of various objects, I find it useful to set up a little sandbox, like this:
f <- function(...) browser()
f(a = 4, 77, B = "char")
## Then play around within the browser
class(quote(...)) ## quote() parses without substituting
class(quote(...()))
substitute({...})
substitute(...(..., X, ...))
substitute(2 <- (makes * list(no - sense))(...))

Resources