How to avoid prepending .self when using eval in a reference class in R?

I need to use eval to call a reference class method. Below is a toy example:
MyClass <- setRefClass("MyClass",
  fields = c("my_field"),
  methods = list(
    initialize = function(){
      my_field <<- 3
    },
    hello = function(){
      "hello"
    },
    run = function(user_defined_text){
      eval(parse(text = user_defined_text))
    }
  )
)
p <- MyClass$new()
p$run("hello()") # Error: could not find function "hello" - doesn't work
p$run(".self$hello()") # "hello" - it works
p$run("hello()") # "hello" - now it works?!
p <- MyClass$new()
p$run("my_field") # 3 - no need to add .self
I guess I could do eval(parse(text = paste0(".self$", user_defined_text))), but I don't really understand:
why is .self needed to eval methods, but not fields?
why is .self no longer needed after it has been used once?

'Why' questions are always challenging to answer; usually the answer is 'because'. On ?setRefClass we eventually have
Only methods actually used will be included in the environment
corresponding to an individual object. To declare that a method
requires a particular other method, the first method should
include a call to '$usingMethods()' with the name of the other
method as an argument. Declaring the methods this way is essential
if the other method is used indirectly (e.g., via 'sapply()' or
'do.call()'). If it is called directly, code analysis will find
it. Declaring the method is harmless in any case, however, and may
aid readability of the source code.
I'm not sure this is entirely helpful in your case, where the user is apparently able to specify any method. Offering a little unasked editorial comment, I'm not sure 'why' you'd want to write a method that would parse input text to methods; I've never used that paradigm myself.
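For the toy example in the question, the '$usingMethods()' declaration quoted above might be applied as follows. This is a minimal sketch, assuming the set of methods the user may call is known in advance (here only hello); the class name MyClass2 is made up for illustration. Declaring hello inside run should cause hello to be installed in the object's environment together with run, so the evaluated text can find it without the .self$ prefix.

```r
library(methods)

# Hypothetical variant of the question's class: run() declares that it may
# use hello() indirectly, so hello() is installed alongside run().
MyClass2 <- setRefClass("MyClass2",
  fields = c("my_field"),
  methods = list(
    initialize = function() {
      my_field <<- 3
    },
    hello = function() {
      "hello"
    },
    run = function(user_defined_text) {
      usingMethods("hello")  # declaration only; does nothing at run time
      eval(parse(text = user_defined_text))
    }
  )
)

p <- MyClass2$new()
result <- p$run("hello()")   # no .self$ prefix in the user text
field_value <- p$run("my_field")
```

Every method the user text may reach has to be listed this way, so the approach fits a fixed menu of methods better than truly arbitrary input.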

Related

My defined R function does not 'save' the changes made to a matrix [duplicate]

I'm just getting my feet wet in R and was surprised to see that a function doesn't modify an object, at least it seems that's the default. For example, I wrote a function just to stick an asterisk on one label in a table; it works inside the function but the table itself is not changed. (I'm coming mainly from Ruby)
So, what is the normal, accepted way to use functions to change objects in R? How would I add an asterisk to the table title?
Replace the whole object: myTable = title.asterisk(myTable)
Use a work-around to call by reference (as described, for example, in Call by reference in R by TszKin Julian)?
Use some structure other than a function? An object method?
The reason you're having trouble is that you are passing the object into the function's local environment. This is one of the great / terrible things about R: it allows implicit variable declarations, and a name defined in an inner environment masks the same name in the enclosing ones.
This affects you because calling a function creates a new environment nested inside the one where the function was defined. The object 'myTable' was, I assume, originally created in the global environment, but when it is passed into the function 'title.asterisk', the function's local environment gets its own object with the same value. This works like so:
title.asterisk <- function(myTable){
  # ... do some stuff to 'myTable' ...
}
In this case, the function 'title.asterisk' does not make any changes to the global object 'myTable'. Instead, a local object is created with the same name, so the local object masks the global one. If we call title.asterisk(myTable) in this way, the function changes only the local variable.
There are two direct ways to modify the global object (and many indirect ways).
Option 1: The first, as you mention, is to have the function return the object and overwrite the global object, like so:
title.asterisk <- function(myTable){
  # ... do some stuff to 'myTable' ...
  return(myTable)
}
myTable <- title.asterisk(myTable)
This is okay, but you are still making your code a little difficult to understand, since there are really two different 'myTable' objects, one global and one local to the function. A lot of coders clear this up by adding a period '.' in front of argument names, like so:
title.asterisk <- function(.myTable){
  # ... do some stuff to '.myTable' ...
  return(.myTable)
}
myTable <- title.asterisk(myTable)
Okay, now we have a visual cue that the two variables are different. This is good, because we don't want to rely on invisible things like name masking between environments when we're trying to debug our code later. It just makes things harder than they have to be.
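Option 1 can be made concrete with a small runnable sketch. The helper add_asterisk and the matrix m are made-up names for illustration, and the "table" is assumed to be a matrix with column labels; the point is only that the change is lost unless the returned copy is assigned back.

```r
# Hypothetical helper: appends "*" to the first column label of a matrix.
add_asterisk <- function(.m) {
  colnames(.m)[1] <- paste0(colnames(.m)[1], "*")
  .m  # return the modified copy
}

m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("a", "b")))

add_asterisk(m)               # result is discarded, m is untouched
labels_before <- colnames(m)

m <- add_asterisk(m)          # overwrite m with the returned copy
labels_after <- colnames(m)
```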
Option 2: You could just modify the object from within the function. This is the better option when you want to do destructive edits to an object and don't want memory inflation. If you are doing destructive edits, you don't need to save an original copy. Also, if your object is suitably large, you don't want to be copying it when you don't have to. To edit an object in the global environment, don't pass it into or declare it within the function, and assign to it with the superassignment operator <<- (a plain <- would create a local copy instead).
title.asterisk <- function(){
  # ... do some stuff to 'myTable', assigning with <<- ...
}
Now we are making direct edits to the object 'myTable' from within the function. Because we aren't passing the object, the function looks to the enclosing environments to resolve the variable name. Lo and behold, it finds a 'myTable' object higher up! With <<-, the code in the function changes that object.
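The difference between the two assignment operators is easy to miss, so here is a minimal sketch of it. The names my_table, local_edit and global_edit are invented for the example, and a named character vector stands in for the table.

```r
my_table <- c(title = "Counts", note = "raw")

# Plain `<-` inside a function modifies a local copy only.
local_edit <- function() {
  my_table["title"] <- paste0(my_table["title"], "*")
  invisible(NULL)
}

# `<<-` assigns to the existing binding found in an enclosing environment.
global_edit <- function() {
  my_table["title"] <<- paste0(my_table["title"], "*")
  invisible(NULL)
}

local_edit()
after_local <- my_table[["title"]]   # outer object is unchanged

global_edit()
after_global <- my_table[["title"]]  # outer object now carries the asterisk
```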
A note to consider: I hate debugging. I mean I really hate debugging. This means a few things for me in R:
I wrap almost everything in a function. As I write my code, as soon as I get a piece working, I wrap it in a function and set it aside. I make heavy use of the '.' prefix for all my function arguments and use no prefix for anything that is native to the namespace it exists in.
I try not to modify global objects from within functions. I don't like where this leads. If an object needs to be modified, I modify it from within the function that declared it. This often means I have layers of functions calling functions, but it makes my work both modular and easy to understand.
I comment all of my code, explaining what each line or block is intended to do. It may seem a bit unrelated, but I find that these three things go together for me. Once you start wrapping code in functions, you will find yourself wanting to reuse more of your old code. That's where good commenting comes in. For me, it's a necessary piece.
The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as
`updt<-` <- function(x, ..., value) {
## x is the object to be manipulated, value the object to be assigned
x$lbl <- paste0(x$lbl, value)
x
}
with
> d <- data.frame(x=1:5, lbl=letters[1:5])
> d
  x lbl
1 1   a
2 2   b
3 3   c
4 4   d
5 5   e
> updt(d) <- "*"
> d
  x lbl
1 1  a*
2 2  b*
3 3  c*
4 4  d*
5 5  e*
This is the behavior of, for instance, $<-, which updates in place the element accessed by $. One could think of replacement functions as syntactic sugar for
updt1 <- function(x, ..., value) {
x$lbl <- paste0(x$lbl, value)
x
}
d <- updt1(d, value="*")
but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm that is involved. It is enabling convenient in-place updates, which is different from the copy-on-change illusion that R usually maintains, and it is really the 'R' way of updating objects (rather than using ?ReferenceClasses, for instance, which have more of the feel of other languages but will surprise R users expecting copy-on-change semantics).
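To make the contrast with reference classes concrete, here is a small sketch. The class Label and all variable names are hypothetical; the example only illustrates that ordinary objects behave as copies on assignment while reference class objects do not.

```r
library(methods)

# Ordinary R objects: assignment behaves like a copy.
d1 <- data.frame(lbl = "a", stringsAsFactors = FALSE)
d2 <- d1
d2$lbl <- paste0(d2$lbl, "*")
copy_original <- d1$lbl      # unchanged by the edit to d2

# Reference class objects: both names refer to the same object.
# `Label` is a hypothetical class used only for this illustration.
Label <- setRefClass("Label", fields = list(lbl = "character"))
r1 <- Label$new(lbl = "a")
r2 <- r1
r2$lbl <- paste0(r2$lbl, "*")
ref_original <- r1$lbl       # changed, because r1 and r2 are one object
```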
For anybody in the future looking for a simple way (do not know if it is the more appropriate one) to get this solved:
Inside the function, create an object to temporarily hold the modified version of the one you want to change. Use deparse(substitute()) to get the name of the variable that has been passed to the function argument and then use assign() to overwrite your object. You will need to use envir = parent.frame() inside assign() to let your object be defined in the environment outside the function.
(MyTable <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
title.asterisk <- function(table) {
tmp.table <- paste0(table, "*")
name <- deparse(substitute(table))
assign(name, tmp.table, envir = parent.frame())
}
(title.asterisk(MyTable))
[1] "1*" "2*" "3*" "4*" "5*" "6*" "7*" "8*" "9*" "10*"
Wrapping an assignment in parentheses prints the result, which is a little more concise (and to me, better looking) than defining then printing.
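One caveat with the deparse(substitute()) approach is that it only makes sense when the argument is a plain variable name. A hedged variant that checks for this might look as follows; add_asterisk_in_place is a made-up name, and the guard is an addition not present in the answer above.

```r
# Hypothetical variant of the function above that refuses anything
# other than a plain variable name.
add_asterisk_in_place <- function(x) {
  arg <- substitute(x)
  if (!is.name(arg)) {
    stop("'x' must be a plain variable name")
  }
  assign(deparse(arg), paste0(x, "*"), envir = parent.frame())
  invisible(NULL)
}

v <- 1:3
add_asterisk_in_place(v)   # v in the calling environment is overwritten

# An expression such as v[1:2] is not a name, so the call is rejected.
outcome <- tryCatch(
  add_asterisk_in_place(v[1:2]),
  error = function(e) "rejected"
)
```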

How to use reflection to intercept an expression prior to evaluation?

I was hoping to use R's reflection capabilities to intercept the current expression under evaluation before it is evaluated.
For instance, to create some syntax sugar, given the following:
> Server <- setRefClass("Server",
> methods = list(
> handler = function(expr) submitExpressionToRemoteServer(expr)
> )
> )
> server <- Server()
> server$foo$bar$baz # ... should map to ... server$handler("foo$bar$baz")
I want the expression server$foo$bar$baz to be intercepted by the server$handler method and mapped to server$handler("foo$bar$baz").
Note that I want this call to succeed even though server$foo is not defined: I am interested only in the expression itself (so I can do stuff with the expression), not that it evaluates to a valid local object.
Is this possible?
I don't think it is possible to redefine the $ behavior of Reference Class (R5) objects in R. However, this is something that you can do with S4 classes. The main problem is that an expression like
server$foo$bar$baz
would get translated to a series of calls like
`$`(`$`(`$`(server, "foo"), "bar"), "baz")
but unlike normal function nesting, each inner call appears to be fully evaluated before going to the next level of nesting. Thus it's not really possible just to split off everything after the first $, because that's not how the expression is parsed. However, you can have the $ function return another object and append all the values sent to the object. Here's a sample S4 class:
setClass("Server", slots=list(el="character"))
setMethod("$", signature(x="Server"),
  function(x, name) {
    xx <- append(slot(x, "el"), name)
    new("Server", el=xx)
  }
)
server <- new("Server")
server$foo$bar$baz
# An object of class "Server"
# Slot "el":
# [1] "foo" "bar" "baz"
The only problem is that there's no way I've found to know when you're at the end of the list, if you wanted to do anything with those parameters.
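If wrapping the expression in a function call is acceptable (an assumption, since the question asks for a bare server$foo$bar$baz), the whole chain can be captured unevaluated with substitute(), which also sidesteps the end-of-chain problem. The function capture_path below is a hypothetical helper written for this sketch, not part of any package.

```r
# Hypothetical helper: captures an unevaluated chain of `$` accesses and
# splits it into the root object name and the remaining path.
capture_path <- function(expr) {
  e <- substitute(expr)            # the expression, not its value
  parts <- character(0)
  while (is.call(e) && identical(e[[1]], as.name("$"))) {
    parts <- c(as.character(e[[3]]), parts)  # right-hand name of this `$`
    e <- e[[2]]                              # descend into the left-hand side
  }
  list(root = deparse(e), path = paste(parts, collapse = "$"))
}

# `server` does not need to exist, because nothing is evaluated.
res <- capture_path(server$foo$bar$baz)
```

The resulting res$path string could then be handed to something like the question's handler method.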

Why is R capricious in its use of attributes on reference class objects?

I am having some trouble achieving consistent behavior accessing attributes attached to reference class objects. For example,
testClass <- setRefClass('testClass',
methods = list(print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
attr(testInstance, 'testAttribute') <- 1
testInstance$print_attribute('testAttribute')
And the R console cheerily prints NULL. However, if we try another approach,
testClass <- setRefClass('testClass',
methods = list(initialize = function() attr(.self, 'testAttribute') <<- 1,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$print_attribute('testAttribute')
and now we have 1 as expected. Note that the <<- operator is required, presumably because assigning to .self has the same restrictions as assigning to reference class fields. Note that if we had tried to assign outside of the constructor, say
testClass <- setRefClass('testClass',
methods = list(set_attribute = function(name, value) attr(.self, name) <<- value,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$set_attribute('testAttribute', 1)
we would be slapped with
Error in attr(.self, name) <<- value :
cannot change value of locked binding for '.self'
Indeed, the documentation ?setRefClass explains that
The entire object can be referred to in a method by the reserved name .self ... These fields are read-only (it makes no sense to
modify these references), with one exception. In principal, the
.self field can be modified in the $initialize method, because
the object is still being created at this stage.
I am happy with all of this, and agree with the author's decisions. However, what I am concerned about is the following. Going back to the first example above, if we try asking for attr(testInstance, 'testAttribute'), we see from the global environment that it is 1!
Presumably, the .self that is used in the methods of the reference class object is stored in the same memory location as testInstance--it is the same object. Thus, by setting an attribute on testInstance successfully in the global environment, but not as a .self reference (as demonstrated in the first example), have we inadvertently triggered a copy of the entire object in the global environment? Or is the way attributes are stored "funny" in some way that the object can reside in the same memory, but its attributes are different depending on the calling environment?
I see no other explanation for why attr(.self, 'testAttribute') is NULL but attr(testInstance, 'testAttribute') is 1. The binding .self is locked once and for all, but that does not mean the object it references cannot change. If this is the desired behavior, it seems like a gotcha.
A final question is whether or not the preceding results imply attr<- should be avoided on reference class objects, at least if the resulting attributes are used from within the object's methods.
I think I may have figured it out. I began by digging into the implementation of reference classes for references to .self.
bodies <- Filter(function(x) !is.na(x),
  structure(sapply(ls(getNamespace('methods'), all.names = TRUE), function(x) {
    fn <- get(x, envir = getNamespace('methods'))
    if (is.function(fn)) paste(deparse(body(fn)), collapse = "\n") else NA
  }), .Names = ls(getNamespace('methods'), all.names = TRUE))
)
Now bodies holds a named character vector of all the functions in the methods package. We now look for .self:
goods <- bodies[grepl("\\.self", bodies)]
length(goods) # 4
names(goods) # [1] ".checkFieldsInMethod" ".initForEnvRefClass" ".makeDefaultBinding" ".shallowCopy"
So there are four functions in the methods package that contain the string .self. Inspecting them shows that .initForEnvRefClass is our culprit. We have the statement selfEnv$.self <- .Object. But what is selfEnv? Well, earlier in that same function, we have .Object@.xData <- selfEnv. Indeed, looking at the attributes on our testInstance from example one gives
$.xData
<environment: 0x10ae21470>
$class
[1] "testClass"
attr(,"package")
[1] ".GlobalEnv"
Peeking into attributes(attr(testInstance, '.xData')$.self) shows that we indeed can access .self directly using this approach. Notice that after executing the first two lines of example one (i.e. setting up testInstance), we have
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] TRUE
Yes! They are equal. Now, if we perform
attr(testInstance, 'testAttribute') <- 1
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] FALSE
so that adding an attribute to a reference class object has forced a creation of a copy, and .self is no longer identical to the object. However, if we check that
identical(attr(testInstance, '.xData'), attr(attr(testInstance, '.xData')$.self, '.xData'))
# [1] TRUE
we see that the environment attached to the reference class object remains the same. Thus, the copying was not very consequential in terms of memory footprint.
The end result of this foray is that the final answer is yes, you should avoid setting attributes on reference classes if you plan to use them within that object's methods. The reason for this is that the .self object in a reference class object's environment should be considered fixed once and for all after the object has been initialized--and this includes the creation of additional attributes.
Since the .self object is stored in an environment that is attached as an attribute to the reference class object, it does not seem possible to avoid this problem without using pointer yoga--and R does not have pointers.
Edit
It appears that if you are crazy, you can do
unlockBinding('.self', attr(testInstance, '.xData'))
attr(attr(testInstance, '.xData')$.self, 'testAttribute') <- 1
lockBinding('.self', attr(testInstance, '.xData'))
and the problems above magically go away.
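A less drastic alternative, in line with the advice to avoid attributes here, is to keep such metadata in a field, which lives in the object's environment and is therefore visible both from methods and from outside. The sketch below is only an illustration; the class Tagged, the field meta and the method names are all made up.

```r
library(methods)

# Hypothetical class that stores arbitrary metadata in a list field
# instead of in attributes on the object.
Tagged <- setRefClass("Tagged",
  fields = list(meta = "list"),
  methods = list(
    set_meta = function(name, value) {
      meta[[name]] <<- value   # fields are modified with <<-
      invisible(.self)
    },
    get_meta = function(name) {
      meta[[name]]
    }
  )
)

obj <- Tagged$new()
obj$set_meta("testAttribute", 1)
from_method <- obj$get_meta("testAttribute")   # read inside a method
from_outside <- obj$meta[["testAttribute"]]    # read from the caller
```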

set method initialize S4 class vs. using function

I now have the class construction working in two ways:
The first,
setMethod("initialize", signature(.Object = "BondCashFlows"),
  function(.Object, x, y, ...){
    # do some things, e.g. .Object@foo = array[,m]
    .Object  # an initialize method must return the object
  }
)
The second,
BondCashFlows <- function(){
  # do some things
  new("BondCashFlows", ...)
}
So, my question is why do I even have to bother with the first, since the second is a much more user-friendly way of creating the object BondCashFlows?
I understand that the first is a method on a class, but I am not sure why I have to do this.
One of the advantages of using an S4 method over a simple R function is that the method is strongly typed.
Having a signature is a guard that methods aren't exposed to types
that don't meet their signature requirements. If the types don't
match, the call throws an error.
It's often the case that you want to differentiate method behavior
depending on the parameter type passed. Strong typing makes that
very easy and simple.
Strongly typed code is more human readable (even if in R this argument can be debated; the S4 syntax is not very intuitive, especially for a beginner).
Here is an example, where I define a simple function and then wrap it in a method:
show.vector <- function(.object,name,...).object[,name]
## you should first define a generic
setGeneric("returnVector", function(.object,name,...)
  standardGeneric("returnVector")
)
## the method here is just calling the show.vector function.
## Note that the function argument types are explicitly defined.
setMethod("returnVector", signature(.object="data.frame", name="character"),
  def = function(.object, name, ...) show.vector(.object,name,...),
  valueClass = "data.frame"
)
Now if you test this:
show.vector(mtcars,'cyl') ## works
show.vector(mtcars,1:10) ## DANGER!! works, but not the desired behavior
show.vector(mtcars,-1) ## DANGER!! works, but not the desired behavior
Compare this to the method call:
returnVector(mtcars,'cyl') ## works
returnVector(mtcars,1:10) ## SAFER: throws an error
returnVector(mtcars,-1) ## SAFER: throws an error
Hence, if you will expose your functions to others, it is better to encapsulate them in a method.
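The point about differentiating behavior by argument type can be sketched in the same style. The generic describe and its two methods are hypothetical names chosen for this illustration; the example assumes nothing beyond the methods package.

```r
library(methods)

# Hypothetical generic with one method per argument type.
setGeneric("describe", function(x, ...) standardGeneric("describe"))

setMethod("describe", signature(x = "numeric"),
  function(x, ...) paste("numeric vector of length", length(x))
)
setMethod("describe", signature(x = "character"),
  function(x, ...) paste("character vector of length", length(x))
)

num_result <- describe(c(1.5, 2.5))
chr_result <- describe(c("a", "b", "c"))

# No method is defined for logical input, so dispatch fails with an error.
lgl_result <- tryCatch(
  describe(TRUE),
  error = function(e) "no applicable method"
)
```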
