setMethod("initialize") on an S4 class vs. using a constructor function - r

I now have the class construction working in two ways:
The first,
setMethod("initialize", signature(.Object = "BondCashFlows"),
function(.Object, x, y, ...){
do some things .Object#foo = array[,m]
}
The second,
BondCashFlows <- function(...){ # do some things, then
  new("BondCashFlows", ...) }
So, my question is why do I even have to bother with the first, since the second is a much more user-friendly way of creating the object BondCashFlows?
I understand that the first is a method on the class, but I am not sure why I have to do this.

One of the advantages of using an S4 method over a simple R function is that the method is strongly typed.
The signature guards a method from being called with types that
don't meet its requirements; otherwise an exception is thrown.
It's often the case that you want to differentiate method behavior
depending on the parameter type passed. Strong typing makes that
very easy and simple.
Strongly typed code is also more human readable (even if in R this argument can be debated; the S4 syntax is not very intuitive, especially for a beginner).
Here is an example, where I define a simple function and then wrap it in a method:
show.vector <- function(.object, name, ...) .object[, name]
## you first need to define a generic
setGeneric("returnVector", function(.object, name, ...)
  standardGeneric("returnVector")
)
## the method here just calls the show.vector function.
## Note that the function argument types are explicitly defined.
setMethod("returnVector", signature(.object="data.frame", name="character"),
def = function(.object, name, ...) show.vector(.object,name,...),
valueClass = "data.frame"
)
Now if you test this:
show.vector(mtcars, 'cyl') ## works
show.vector(mtcars, 1:10)  ## DANGER!! works, but not the desired behavior
show.vector(mtcars, -1)    ## DANGER!! works, but not the desired behavior
Compare this to the method call:
returnVector(mtcars, 'cyl') ## works
returnVector(mtcars, 1:10)  ## SAFER: throws an exception
returnVector(mtcars, -1)    ## SAFER: throws an exception
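The same generic can also carry a second method for a different signature, which is the "differentiate behavior by parameter type" point above. A small sketch building on returnVector (nothing here is required by the original example):
setMethod("returnVector", signature(.object = "matrix", name = "character"),
  function(.object, name, ...) .object[, name]
)
m <- as.matrix(mtcars)
returnVector(m, 'cyl')   ## dispatches to the matrix method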
Hence, if you will expose your functions to others, it is better to encapsulate them in methods.
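Coming back to the original question: the two approaches are not mutually exclusive. A common pattern is a user-friendly constructor function that is just a thin wrapper around new(), while the initialize method holds validation and derived-slot logic that runs for every new() call. A minimal sketch, with made-up slot names (the real BondCashFlows class presumably has others):
setClass("BondCashFlows", representation(coupon = "numeric", maturity = "numeric"))
setMethod("initialize", "BondCashFlows", function(.Object, ...) {
  .Object <- callNextMethod(.Object, ...)  # fill slots the usual way
  stopifnot(.Object@coupon >= 0)           # validation runs for every new() call
  .Object
})
## user-friendly constructor: a thin wrapper around new()
BondCashFlows <- function(coupon, maturity) {
  new("BondCashFlows", coupon = coupon, maturity = maturity)
}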

Related

Why is there a difference between length(f) and length(g) in this example?

f <- function() 1
g <- function() 2
class(g) <- "function"
class(f) ## "function"
class(g) ## "function"
length.function <- function(x) "function"
length(f) ## 1
length(g) ## "function"
First, length is not a typical generic function, but rather an "Internal Generic Function". You can see this by looking at its definition:
> length
function (x) .Primitive("length")
Compare this to a typical generic function:
> print
function (x, ...)
UseMethod("print")
<bytecode: 0x116ca6f90>
<environment: namespace:base>
length calls straight into .Primitive which then can do dispatch if it does not handle the call itself; the typical approach is directly calling UseMethod which only handles dispatch. Also note that there is no length.default function because the code in the .Primitive call does that:
> methods("length")
[1] length.function length.pdf_doc* length.POSIXlt
I am not sure it is completely defined when an internal generic will look at user-defined methods and when it will use only internal ones; I think the general idea is that for a user- or package-defined (effectively, non-core) class, the provided methods will be used, but overriding for internal classes may or may not work.
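For instance, for a non-core class the internal generic does consult a user-defined method (a small sketch; the class name "mylist" is made up for illustration):
x <- structure(list(1, 2, 3), class = "mylist")
length.mylist <- function(x) "user method"
length(x)            ## "user method" -- the internal generic dispatched to it
length(unclass(x))   ## 3 -- without the class attribute, the internal code runs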
Additionally (though not strictly relevant for this case), even for a typical generic method, the documentation is ambiguous as to what should happen when the class is derived implicitly rather than given as an attribute. First, what class() reports is an amalgamation of things. From the class help page:
Many R objects have a class attribute, a character vector giving the names of the classes from which the object inherits. If the object does not have a class attribute, it has an implicit class, "matrix", "array" or the result of mode(x) (except that integer vectors have implicit class "integer").
So despite class returning the same thing for f and g, they are not the same.
> attributes(f)
$srcref
function() 1
> attributes(g)
$srcref
function() 2
$class
[1] "function"
Now, here is where it gets ambiguous. Method dispatch is talked about in (at least) 2 places: the class help page and the UseMethod help page. UseMethod says:
When a function calling UseMethod("fun") is applied to an object with class attribute c("first", "second"), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used, if it exists, or an error results.
While class says:
When a generic function fun is applied to an object with class attribute c("first", "second"), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found, a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used (if it exists). If there is no class attribute, the implicit class is tried, then the default method.
The real difference is in the last sentence, which the class page has and UseMethod doesn't. UseMethod does not say what happens if there is no class attribute; class says that the implicit class is used to dispatch. Your code seems to indicate that what is documented in class is not correct, as length.function would have been called for f were that the case.
What really happens in method dispatch when there is no class attribute will probably require examining the source code as the documentation does not seem to help.

Why is R capricious in its use of attributes on reference class objects?

I am having some trouble achieving consistent behavior accessing attributes attached to reference class objects. For example,
testClass <- setRefClass('testClass',
methods = list(print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
attr(testInstance, 'testAttribute') <- 1
testInstance$print_attribute('testAttribute')
And the R console cheerily prints NULL. However, if we try another approach,
testClass <- setRefClass('testClass',
methods = list(initialize = function() attr(.self, 'testAttribute') <<- 1,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$print_attribute('testAttribute')
and now we have 1 as expected. Note that the <<- operator is required, presumably because assigning to .self has the same restrictions as assigning to reference class fields. If we had instead tried to assign outside of the constructor, say
testClass <- setRefClass('testClass',
methods = list(set_attribute = function(name, value) attr(.self, name) <<- value,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$set_attribute('testAttribute', 1)
we would be slapped with
Error in attr(.self, name) <<- value :
cannot change value of locked binding for '.self'
Indeed, the documentation ?setRefClass explains that
The entire object can be referred to in a method by the reserved name
.self ... These fields are read-only (it makes no sense to
modify these references), with one exception. In principle, the
.self field can be modified in the $initialize method, because
the object is still being created at this stage.
I am happy with all of this, and agree with the author's decisions. However, what I am concerned about is the following. Going back to the first example above, if we ask for attr(testInstance, 'testAttribute') from the global environment, we see that it is 1!
Presumably, the .self that is used in the methods of the reference class object is stored in the same memory location as testInstance--it is the same object. Thus, by setting an attribute on testInstance successfully in the global environment, but not as a .self reference (as demonstrated in the first example), have we inadvertently triggered a copy of the entire object in the global environment? Or is the way attributes are stored "funny" in some way that the object can reside in the same memory, but its attributes are different depending on the calling environment?
I see no other explanation for why attr(.self, 'testAttribute') is NULL but attr(testInstance, 'testAttribute') is 1. The binding .self is locked once and for all, but that does not mean the object it references cannot change. If this is the desired behavior, it seems like a gotcha.
A final question is whether or not the preceding results imply attr<- should be avoided on reference class objects, at least if the resulting attributes are used from within the object's methods.
I think I may have figured it out. I began by digging into the implementation of reference classes for references to .self.
bodies <- Filter(function(x) !is.na(x),
  structure(sapply(ls(getNamespace('methods'), all.names = TRUE), function(x) {
    fn <- get(x, envir = getNamespace('methods'))
    if (is.function(fn)) paste(deparse(body(fn)), collapse = "\n") else NA
  }), .Names = ls(getNamespace('methods'), all.names = TRUE))
)
Now bodies holds a named character vector of all the functions in the methods package. We now look for .self:
goods <- bodies[grepl("\\.self", bodies)]
length(goods) # 4
names(goods) # [1] ".checkFieldsInMethod" ".initForEnvRefClass" ".makeDefaultBinding" ".shallowCopy"
So there are four functions in the methods package that contain the string .self. Inspecting them shows that .initForEnvRefClass is our culprit. We have the statement selfEnv$.self <- .Object. But what is selfEnv? Well, earlier in that same function, we have .Object@.xData <- selfEnv. Indeed, looking at the attributes on our testInstance from example one gives
$.xData
<environment: 0x10ae21470>
$class
[1] "testClass"
attr(,"package")
[1] ".GlobalEnv"
Peeking into attributes(attr(testInstance, '.xData')$.self) shows that we indeed can access .self directly using this approach. Notice that after executing the first two lines of example one (i.e. setting up testInstance), we have
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] TRUE
Yes! They are equal. Now, if we perform
attr(testInstance, 'testAttribute') <- 1
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] FALSE
so that adding an attribute to a reference class object has forced the creation of a copy, and .self is no longer identical to the object. However, if we check that
identical(attr(testInstance, '.xData'), attr(attr(testInstance, '.xData')$.self, '.xData'))
# [1] TRUE
we see that the environment attached to the reference class object remains the same. Thus, the copying was not very consequential in terms of memory footprint.
The end result of this foray is that, yes, you should avoid setting attributes on reference classes if you plan to use them within that object's methods. The reason is that the .self object in a reference class object's environment should be considered fixed once and for all after the object has been initialized--and this includes the creation of additional attributes.
Since the .self object is stored in an environment that is attached as an attribute to the reference class object, it does not seem possible to avoid this problem without using pointer yoga--and R does not have pointers.
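A hedged alternative (testClass2 below is only for illustration) is to keep such per-object data in a field rather than in an attribute, since fields live in the shared environment and are therefore visible both inside and outside the object's methods:
testClass2 <- setRefClass('testClass2',
  fields = list(testAttribute = 'ANY'),
  methods = list(print_attribute = function() print(testAttribute)))
testInstance2 <- testClass2$new()
testInstance2$testAttribute <- 1
testInstance2$print_attribute() ## 1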
Edit
It appears that if you are crazy, you can do
unlockBinding('.self', attr(testInstance, '.xData'))
attr(attr(testInstance, '.xData')$.self, 'testAttribute') <- 1
lockBinding('.self', attr(testInstance, '.xData'))
and the problems above magically go away.
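For anyone going down that road, the three steps can be wrapped in a small helper; the name set_self_attr is hypothetical, and it inherits every caveat of unlocking a locked binding:
set_self_attr <- function(object, name, value) {
  env <- attr(object, '.xData')    # the object's environment, shared with .self
  unlockBinding('.self', env)
  attr(env$.self, name) <- value   # rebinds .self to a copy carrying the new attribute
  lockBinding('.self', env)
  invisible(object)
}
## usage with the first testClass definition above:
testInstance <- testClass$new()
set_self_attr(testInstance, 'testAttribute', 1)
testInstance$print_attribute('testAttribute') ## 1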

How to force dispatch to an internal generic in R?

I have a class 'myClass' in R that is essentially a list. It has an assignment operator which is going to do some things and then should assign the value using the regular list assignment operator
`$<-.myClass`<-function(x,i,value){
# do some pre-processing stuff
# make the assignment using the default list assignment
x[[i]]<-value
x
}
But I can't actually use x[[i]]<-value as it will dispatch to the already existing [[<-.myClass method.
In similar S3 dispatching cases, I've been able to use UseMethod or to specifically call [[<-.list or [[<-.default, but those don't seem to exist because $<- and [[<- are primitive generics, right? And I'm sure I'll be sent to a special R hell if I try to call .Primitive("$<-"). What is the correct way to dispatch the assignment to the default assignment method?
It doesn't look like there is a particularly elegant way to do this. The data.frame method for $<- looks like this:
`$<-.data.frame` <- function (x, name, value) {
cl <- oldClass(x)
class(x) <- NULL
x[[name]] <- value
class(x) <- cl
x
}
(with error checking code omitted)
This should only create one copy of x, because class<- modifies in place, and so does the default method for [[<-.
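The same unclass-and-restore trick can be applied to the $<-.myClass method from the question (a sketch; the pre-processing is left as a placeholder):
`$<-.myClass` <- function(x, name, value) {
  # do some pre-processing stuff
  cl <- oldClass(x)
  class(x) <- NULL    # drop the class so the default [[<- is used
  x[[name]] <- value
  class(x) <- cl      # restore the class before returning
  x
}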

How to avoid prepending .self when using eval in a reference class in R?

I need to use eval to call a reference class method. Below is a toy example:
MyClass <- setRefClass("MyClass",
fields = c("my_field"),
methods = list(
initialize = function(){
my_field <<- 3
},
hello = function(){
"hello"
},
run = function(user_defined_text){
eval(parse(text = user_defined_text))
}
)
)
p <- MyClass$new()
p$run("hello()") # Error: could not find function "hello" - doesn't work
p$run(".self$hello()") # "hello" - it works
p$run("hello()") # "hello" - now it works?!
p <- MyClass$new()
p$run("my_field") # 3 - no need to add .self
I guess I could do eval(parse(text = paste0(".self$", user_defined_text))), but I don't really understand:
why is .self needed to eval methods, but not fields?
why is .self no longer needed after it has been used once?
'Why' questions are always challenging to answer; usually the answer is 'because'. On the ?setRefClass help page we eventually have:
Only methods actually used will be included in the environment
corresponding to an individual object. To declare that a method
requires a particular other method, the first method should
include a call to '$usingMethods()' with the name of the other
method as an argument. Declaring the methods this way is essential
if the other method is used indirectly (e.g., via 'sapply()' or
'do.call()'). If it is called directly, code analysis will find
it. Declaring the method is harmless in any case, however, and may
aid readability of the source code.
I'm not sure this is entirely helpful in your case, where the user is apparently able to specify any method. Offering a little unasked editorial comment, I'm not sure 'why' you'd want to write a method that would parse input text to methods; I've never used that paradigm myself.
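That said, the quoted passage does point at a fix for the toy example: declare the indirect use with usingMethods() inside run, so that hello is installed in the object's environment before eval() goes looking for it. A sketch, assuming usingMethods() behaves as the documentation above describes:
MyClass <- setRefClass("MyClass",
  fields = c("my_field"),
  methods = list(
    initialize = function(){
      my_field <<- 3
    },
    hello = function(){
      "hello"
    },
    run = function(user_defined_text){
      usingMethods("hello")  # declares that run() uses hello() indirectly
      eval(parse(text = user_defined_text))
    }
  )
)
p <- MyClass$new()
p$run("hello()") # should now find hello() on the first call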

named parameters with same name

I'm using the 'caret' library to do some cross validation on some trees.
The library provides a function called train that takes a named argument "method". Via its ellipsis it's supposed to let other arguments fall through to another function that it calls. This other function (rpart) takes an argument of the same name, "method".
Therefore I want to pass two arguments with the same name... and it's clearly failing. I tried to work around things as shown below but I get the error:
"Error in train.default(x = myx, y = myy, method = "rpart2", preProcess = NULL, :
formal argument "method" matched by multiple actual arguments"
any help is much appreciated! thanks!
train.wrapper = function(myx, myy, mytrControl, mytuneLenght, ...){
result = train(
x=myx,
y=myy,
method="rpart2",
preProcess=NULL,
...,
weights=NULL,
metric="Accuracy",
trControl=mytrControl,
tuneLength=mytuneLenght
)
return (result)
}
dtree.train.cv = train.wrapper(training.matrix[,2:1777],
training.matrix[,1],
2, method="class")
Here's a mock-up of your problem with a tr (train) function that calls an rp (rpart) function, passing it ...:
rp <- function(method, ...) method
tr <- function(method, ...) rp(...)
# we want to pass 2 to rp:
tr(method=1, method=2) # Error
tr(1, method=2) # 1, (wrong value!)
tr(method=1, metho=2) # 2 (Yay!)
What magic is this? And why does the last case actually work?! Well, we need to understand how argument matching works in R. A function f <- function(foo, bar) is said to have formal parameters "foo" and "bar", and the call f(foo=3, ba=13) is said to have (actual) arguments "foo" and "ba".
R first matches all arguments that have exactly the same name as a formal parameter. This is why the first "method" argument gets passed to train. Two identical argument names cause an error.
Then, R matches any argument names that partially match a (yet unmatched) formal parameter. But if two argument names partially match the same formal parameter, that also causes an error. Also, partial matching only applies to formal parameters before ..., so formal parameters after ... must be specified using their full names.
Then the unnamed arguments are matched in positional order to the remaining formal arguments.
Finally, if the formal arguments include ..., the remaining arguments are put into the ....
PHEW! So in this case, the call to tr fully matches method and then passes the rest into .... When tr then calls rp, the metho argument partially matches its formal parameter method, and all is well!
...Still, I'd try to contact the author of train and point out this problem so he can fix it properly! Since "rpart" and "rpart2" are supposed to be supported, he must have missed this use case!
I think he should rename his method parameter to "method." or similar (anything longer than "method"). This would still be backward compatible, but would allow another method parameter to be passed correctly through to rpart, as sketched below.
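In terms of the mock-up above, the renaming would look roughly like this (tr2 and method. are made up to illustrate the idea, not caret's actual API):
rp <- function(method, ...) method
tr2 <- function(method., ...) rp(...)     # wrapper's own parameter renamed to "method."
tr2(method. = "rpart2", method = "class") # "class" now falls through ... and reaches rp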
Generally wrappers will pass their parameters in a named list. In the case of train, provision for control is passed in the trControl argument. Perhaps you should try:
dtree.train.cv = train.wrapper(training.matrix[,2:1777],
training.matrix[,1],
2, # will be positionally matched, probably to 'myTuneLenght'
myTrControl=list(method="class") )
After your comment I reviewed the train and rpart help pages again. You could well be correct in thinking that trControl has a different purpose. I suspect that you may need to construct your call with a formula, since rpart only has a formula method. If the y argument is a factor, then method="class" will be assumed by rpart. And ... running modelLookup:
modelLookup("rpart2")
model parameter label seq forReg forClass probModel
154 rpart2 maxdepth Max Tree Depth TRUE TRUE TRUE TRUE
... suggests to me that a "class" method would be assumed by default as well. You may also need to edit your question to include a data example (perhaps from the rpart help page?) if you want further advice.
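For reference, the classification example from the rpart help page has exactly that shape; because Kyphosis is a factor, rpart assumes method = "class" even though it is not given explicitly:
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit$method ## "class"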
