Why is R capricious in its use of attributes on reference class objects? - r

I am having some trouble achieving consistent behavior accessing attributes attached to reference class objects. For example,
testClass <- setRefClass('testClass',
methods = list(print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
attr(testInstance, 'testAttribute') <- 1
testInstance$print_attribute('testAttribute')
And the R console cheerily prints NULL. However, if we try another approach,
testClass <- setRefClass('testClass',
methods = list(initialize = function() attr(.self, 'testAttribute') <<- 1,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$print_attribute('testAttribute')
and now we have 1 as expected. Note that the <<- operator is required, presumably because assigning to .self has the same restrictions as assigning to reference class fields. Note that if we had tried to assign outside of the constructor, say
testClass <- setRefClass('testClass',
methods = list(set_attribute = function(name, value) attr(.self, name) <<- value,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$set_attribute('testAttribute', 1)
we would be slapped with
Error in attr(.self, name) <<- value :
cannot change value of locked binding for '.self'
Indeed, the documentation ?setRefClass explains that
The entire object can be referred to in a method by the reserved name .self ... These fields are read-only (it makes no sense to
modify these references), with one exception. In principal, the
.self field can be modified in the $initialize method, because
the object is still being created at this stage.
I am happy with all of this, and agree with author's decisions. However, what I am concerned about is the following. Going back to the first example above, if we try asking for attr(testInstance, 'testAttribute'), we see from the global environment that it is 1!
Presumably, the .self that is used in the methods of the reference class object is stored in the same memory location as testInstance--it is the same object. Thus, by setting an attribute on testInstance successfully in the global environment, but not as a .self reference (as demonstrated in the first example), have we inadvertently triggered a copy of the entire object in the global environment? Or is the way attributes are stored "funny" in some way that the object can reside in the same memory, but its attributes are different depending on the calling environment?
I see no other explanation for why attr(.self, 'testAttribute') is NULL but attr(testInstance, 'testAttribute') is 1. The binding .self is locked once and for all, but that does not mean the object it references cannot change. If this is the desired behavior, it seems like a gotcha.
A final question is whether or not the preceding results imply attr<- should be avoided on reference class objects, at least if the resulting attributes are used from within the object's methods.

I think I may have figured it out. I began by digging into the implementation of reference classes for references to .self.
bodies <- Filter(function(x) !is.na(x),
structure(sapply(ls(getNamespace('methods'), all.names = TRUE), function(x) {
fn <- get(x, envir = getNamespace('methods'))
if (is.function(fn)) paste(deparse(body(fn)), collapse = "\n") else NA
}), .Names = ls(getNamespace('methods'), all.names = TRUE))
)
Now bodies holds a named character vector of all the functions in the methods package. We now look for .self:
goods <- bodies[grepl("\\.self", bodies)]
length(goods) # 4
names(goods) # [1] ".checkFieldsInMethod" ".initForEnvRefClass" ".makeDefaultBinding" ".shallowCopy"
So there are four functions in the methods package that contain the string .self. Inspecting them shows that .initForEnvRefClass is our culprit. We have the statement selfEnv$.self <- .Object. But what is selfEnv? Well, earlier in that same function, we have .Object#.xData <- selfEnv. Indeed, looking at the attributes on our testInstance from example one gives
$.xData
<environment: 0x10ae21470>
$class
[1] "testClass"
attr(,"package")
[1] ".GlobalEnv"
Peeking into attributes(attr(testInstance, '.xData')$.self) shows that we indeed can access .self directly using this approach. Notice that after executing the first two lines of example one (i.e. setting up testInstance), we have
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] TRUE
Yes! They are equal. Now, if we perform
attr(testInstance, 'testAttribute') <- 1
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] FALSE
so that adding an attribute to a reference class object has forced a creation of a copy, and .self is no longer identical to the object. However, if we check that
identical(attr(testInstance, '.xData'), attr(attr(testInstance, '.xData')$.self, '.xData'))
# [1] TRUE
we see that the environment attached to the reference class object remains the same. Thus, the copying was not very consequential in terms of memory footprint.
The end result of this foray is that the final answer is yes, you should avoid setting attributes on reference classes if you plan to use them within that object's methods. The reason for this is that the .self object in a reference class object's environment should be considered fixed once and for all after the object has been initialized--and this includes the creation of additional attributes.
Since the .self object is stored in an environment that is attached as an attribute to the reference class object, it does not seem possible to avoid this problem without using pointer yoga--and R does not have pointers.
Edit
It appears that if you are crazy, you can do
unlockBinding('.self', attr(testInstance, '.xData'))
attr(attr(testInstance, '.xData')$.self, 'testAttribute') <- 1
lockBinding('.self', attr(testInstance, '.xData'))
and the problems above magically go away.

Related

R: Where are formals for a function stored in memory?

When a function has been defined but has not yet been called, do the formals that do not have default values exist? If they do, do they exist in the execution environment, or in the environment where the function definition is located, or somewhere else?
If a function has been defined but not yet called, and a formal has been assigned a default value, does that value exist? If it does, in what environment does it exist? If the default expression evaluates to a constant, has the formal been assigned to that value, to be overwritten when the function is called if a value is supplied? If not, in what environment is that (fixed) default value located between the moment of definition and the time the function is called?
After the function has been called and actual or default values have been assigned to the formals, passed into the body, and if necessary scoped and/or evaluated, do the formals continue to exist? If so, in what environment do they then exist?
The formals for a function exist as objects within the environment of a function once an instance of the function is loaded into memory by being called. In Advanced R, Hadley Wickham calls this environment the execution environment. The memory locations of the objects can be accessed via pryr::address().
As an example I'll use a modified version of code that I previously wrote to illustrate memory locations in the makeVector() function from the second programming assignment for the Johns Hopkins R Programming course on coursera.org.
makeVector <- function(x = 200) {
library(pryr)
message(paste("Address of x argument is:",address(x)))
message(paste("Number of references to x is:",refs(x)))
m <- NULL
set <- function(y) {
x <<- y
message(paste("set() address of x is:",address(x)))
message(paste("Number of references to x is:",refs(x)))
m <<- NULL
}
get <- function() x
setmean <- function(mean) m <<- mean
getmean <- function() m
list(set = set, get = get,
setmean = setmean,
getmean = getmean)
}
As coded above, makeVector() is an S3 object, which means we can access objects within its environment via getters and setters, also known as mutator methods.
We can load an instance of the makeVector() object into memory and query the address and value of x with the following code.
makeVector()$get()
...and the result:
> makeVector()$get()
Address of x argument is: 0x1103df4e0
Number of references to x is: 0
[1] 200
>
As we can see from the output, x does have a memory location, but there are no other objects that contain references to it. Also, x was set to its default value of a vector of length 1 with the value 200.
I provide a detailed walkthrough of the objects in the makeVector() environment in my answer to Caching the Mean of a Vector in R.
Regarding the question about how long the formals exist in memory, they exist as long as the environment created to store the called instance of the function is in memory. Since the garbage collector operates on objects that have no external references, if the function instance is not saved to an object, it is eligible for garbage collection as soon as the function call returns a result to the parent environment.

Why does rm inside a function not delete objects?

rel.mem <- function(nm) {
rm(nm)
}
I defined the above function rel.mem -- takes a single argument and passes it to rm
> ls()
[1] "rel.mem"
> x<-1:10
> ls()
[1] "rel.mem" "x"
> rel.mem(x)
> ls()
[1] "rel.mem" "x"
Now you can see what I call rel.mem x is not deleted -- I know this is due to the incorrect environment on which rm is being attempted.
What is a good fix for this?
Criteria for a good fix:
The caller should not have to pass the environment
The callee (rel.mem) should be able to determine the environment by using an R language facility (call stack inspection, aspects, etc.)
The interface of the function rel.mem should be kept simple -- idiot proof: call rel.mem -- then rel.mem takes it from there -- no need to pass environments.
NOTES:
As many commenters have pointed out that one easy fix is to pass the environment.
What I meant by a good fix [and I should have clarified it] is that the callee function (in this case rel.mem) is able to calculate/find out the environment when the caller was referring to and then remove the object from the right environment.
The type of reasoning in "2" can be done in other languages by inspecting the call stack -- for example in Java I would throw a dummy exception -- catch it and then parse the call stack. In other languages still I could use Aspect Oriented techniques. The question is can something like that be done in R?
As one commenter has suggested that there may be multiple objects with the same name and thus the "right" environment is meaningless -- as I've stated above that in other languages it is possible (sometimes with some creative trickery) to interpret the call-stack -- this may not be possible in R
As one commenter has suggested that rm(list=nm, envir = parent.frame()) will remove this from the parent environment. This is correct -- however I'm looking for something that will work for an arbitrary call depth.
The quick answer is that you're in a different environment - essentially picture the variables in a box: you have a box for the function and one for the Global Environment. You just need to tell rm where to find that box.
So
rel_mem <- function(nm) {
# State the environment
rm(list=nm, envir = .GlobalEnv )
}
x = 10
rel_mem("x")
Alternatively, you can use the pos argument, e.g.
rel_mem <- function(nm) {
rm(list=nm, pos=1 )
}
If you type search() you will see a vector of environments, the global is number 1.
Another two options are
envir = parent.frame() if you want to go one level up the call stack
Use inherits = TRUE to go up the call stack until you find something
In the above code, notice that I'm passing the object as a character - I'm passing the "x" not x. We can be clever and avoid this using the substitute function
rel_mem <- function(nm) {
rm(list = as.character(substitute(nm)), envir = .GlobalEnv )
}
To finish I'll just add that deleting things in the .GlobalEnv from a function is generally a bad idea.
Further resources:
Environments:http://adv-r.had.co.nz/Environments.html
Substitute function: http://adv-r.had.co.nz/Computing-on-the-language.html#capturing-expressions
If you are using another function to find the global objects within your function such as ls(), you must state the environment in it explicitly too:
rel_mem <- function(nm) {
# State the environment in both functions
rm(list = ls(envir = .GlobalEnv) %>% .[startsWith(., "plot_")], envir = .GlobalEnv)
}

How to use reflection to intercept an expression prior to evaluation?

I was hoping to use R's reflection capabilities to intercept the current expression under evaluation before it is evaluated.
For instance, to create some syntax sugar, given the following:
> Server <- setRefClass("Server",
> methods = list(
> handler = function(expr) submitExpressionToRemoteServer(expr)
> )
> )
> server <- Server()
> server$foo$bar$baz #... should be map to... server$handler("foo$bar$baz")
I want the expression server$foo$bar$baz to be intercepted by the server$handlermethod and get mapped to server$handler("foo$bar$baz").
Note that I want this call to succeed even though server$foo is not defined: I am interested only in the expression itself (so I can do stuff with the expression), not that it evaluates to a valid local object.
Is this possible?
I don't think this is possible to redefine the $ behavior with Reference Classes (R5) objects in R. However, this is something that you can do with S4 classes. The main problem is that an expression like
server$foo$bar$baz
would get translated to a series of calls like
$($($(server,"foo"),"bar"),"baz")
but unlike normal function nesting, each inner call appears to be fully evaluated before going to the next level of nesting. This it's not really possible just to split up everything after the first $ because that's not how it's parsed. However you can have the $ function return another object and append all the values sent to the object. Here's a sample S4 class
setClass("Server", slots=list(el="character"))
setMethod("$", signature(x="Server"),
function(x,name) {
xx <- append(slot(x,"el"),name)
new("Server", el=xx)
}
)
server <- new("Server")
server$foo$bar$baz
# An object of class "Server"
# Slot "el":
# [1] "foo" "bar" "baz"
the only problem is there's no way i've found to know when you're at the end of a list if you wanted to do anything with those parameters.

How to avoid prepending .self when using eval in a reference class in R?

I need to use eval to call a reference class method. Below is a toy example:
MyClass <- setRefClass("MyClass",
fields = c("my_field"),
methods = list(
initialize = function(){
my_field <<- 3
},
hello = function(){
"hello"
},
run = function(user_defined_text){
eval(parse(text = user_defined_text))
}
)
)
p <- MyClass$new()
p$run("hello()") # Error: could not find function "hello" - doesn't work
p$run(".self$hello()") # "hello" - it works
p$run("hello()") # "hello" - now it works?!
p <- MyClass$new()
p$run("my_field") # 3 - no need to add .self
I guess I could do eval(parse(text = paste0(".self$", user_defined_text))), but I don't really understand:
why is .self needed to eval methods, but not fields?
why is .self no longer needed after it has been used once?
'Why' questions are always challenging to answer; usually the answer is 'because'. On ?setRefClass we eventually have
Only methods actually used will be included in the environment
corresponding to an individual object. To declare that a method
requires a particular other method, the first method should
include a call to '$usingMethods()' with the name of the other
method as an argument. Declaring the methods this way is essential
if the other method is used indirectly (e.g., via 'sapply()' or
'do.call()'). If it is called directly, code analysis will find
it. Declaring the method is harmless in any case, however, and may
aid readability of the source code.
I'm not sure this is entirely helpful in your case, where the user is apparently able to specify any method. Offering a little unasked editorial comment, I'm not sure 'why' you'd want to write a method that would parse input text to methods; I've never used that paradigm myself.

Setting Global variables inside reference class in R

I'm a bit confused on global variable assignments after reading quite a lot of stack overflow questions. I have gone through
Global variables in R and other similar questions
I have the following situation. I have 2 global variables current_idx and previous_idx. These 2 global variables are being set by a method in a reference class.
Essentially, using <<- assignment operator should work right ? But, I get this warning
Non-local assignment to non-field names (possibly misspelled?)
Where am I going wrong ?
EDIT
Using assign(current_idx, index, envir = .GlobalEnv) works i.e. I do not get the warning. Can some one shed some light on this.
You are confusing "global variables" and Reference Classes which are a type of environment. Executing <<- will assign to a variable with that name in the parent.frame of the function. If you are only one level down from the .GlobalEnv, it will do the same thing as your assign statement.
If you have a Reference Class item you can assign items inside it by name with:
ref_item$varname <- value
Easier said than done, though. First you need to set up the ReferenceClass properly:
http://www.inside-r.org/r-doc/methods/ReferenceClasses
This is happening because the default method for modifying fields of a reference class from within a reference class method is to use <<-. For example, in:
setRefClass(
"myClass",
fields=list(a="integer"),
methods=list(setA=function(x) a <<- x)
)
You can modify the a field of your reference class via the setA method. Because this is the canonical way of setting fields via methods in reference classes, R assumes that any other use of <<- within a reference method is a mistake. So if you try to assign to a variable that exists in an environment other than the reference class, R "helpfully" warns you that maybe you have a typo since it thinks the only probably use of <<- in a reference method is to modify a reference field.
You can still just assign to global objects with <<-. The warning is just a warning that maybe you are doing something you didn't intend to do. If you intended to write to an object in the global environment, then the warning doesn't apply.
By using assign you are bypassing the check that reference methods carry out to make sure you are not accidentally typoing a field name in an assignment within the reference method, so you don't get the warning. Also, note that assign actually targets the environment you supply, whereas <<- will just find the first object of that name in the lexical search path.
All this said, there are really rare instances where you actually want a reference method do be writing directly to the global environment. You may need to rethink what you are doing. You should ask yourself why those two variables are not just fields in the reference class instead of global variables.

Resources