I have been playing a little with R's R5 class system to see what it can and can't do. In that process I have stumbled upon what looks like static class field members (which does not appear to be in the documentation - but i could have missed it)
[2014 update]
Warning !!! : The following code does not work with R version >= 3.0
This post, its answers and particularly the comments provide useful insights and reminders about the R5 OO-system and the R language in general. However it is probably a bad idea to cultivate the idiom of using the environment of the R5 class instances directly.
[end 2014 update]
In the following code, the first field is the classic definition of an instance field variable. The second definition appears to create a static class field using an accessor method. I would like to know if this use is kosher (or is my code example simply coincidental). The third field use creates a quasi-private instant field variable using an accessor method.
assertClass <- function(x, className, R5check=FALSE) {
# simple utility function
stopifnot(class(x)[1] == className)
if(R5check) stopifnot(is(x, 'envRefClass'))
}
A <- setRefClass('A',
fields = list(
# 1. public, typed, instance field
myPublicInstanceVar = 'character',
# 2. this assignment appears static
# but if the field me.static.private
# was declared in the field list
# it would be a local instance var
myPrivateStaticVar = function(x) {
if (!missing(x)) {
assertClass(x, 'character')
me.static.private <<- x
}
me.static.private
},
# 3. quasi-private, typed, instance field
myPrivateInstanceVar = function(x) {
if (!missing(x)) {
assertClass(x, 'character')
.self$me.private <<- x
}
.self$me.private
}
),
methods = list(
initialize = function (c='default') {
myPublicInstanceVar <<- c
myPrivateStaticVar <<- c
myPrivateInstanceVar <<- c
}
)
)
# test instantiation
instance1.of.A <- A$new('first instance')
str(instance1.of.A)
instance2.of.A <- A$new('second instance')
str(instance1.of.A)
str(instance2.of.A)
instance3.of.A <- getRefClass('A')$new('third instance')
instance3.of.A$myPrivateStaticVar <- 'Third instance - changed'
print(instance1.of.A$myPrivateStaticVar)
print(instance2.of.A$myPrivateStaticVar)
print(instance3.of.A$myPrivateStaticVar)
str(instance1.of.A)
str(instance2.of.A)
str(instance3.of.A)
# but not really private ...
instance1.of.A$myPublicInstanceVar # works
instance1.of.A$me.static.private # DOES NOT WORK - where is this variable stored
instance1.of.A$me.private # works
# till death do us part
instance3.of.A <- NULL
gc()
str(instance1.of.A)
str(instance2.of.A)
str(instance3.of.A)
If you run this code - you can see that the second field variable appears to operate as a static class member. What is less clear to me is where the reference class keeps this field (hence my comment in the penultimate line above).
The short answer - based on Hadley's comments above - is no. R's reference classes do not have static variables.
Related
I am trying to change a variable in a function but even tho the function is producing the right values, when I go to use them in the next sections, R is still using the initial values.
I created a function to update my variables NetN and NetC:
Reproduction=function(NetN,NetC,cnrep=20){
if(NetC/NetN<=cnrep) {
DeltaC=NetC*p;
DeltaN=DeltaC/cnrep;
Crep=Crep+DeltaC;
Nrep=Nrep+DeltaN;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN-DeltaN; #/* Update N, C values */
NetC=NetC*(1-p)
print ("'Using C to allocate'")
}
else {
print("Using N to allocate");
DeltaN=NetN*p;
DeltaC=DeltaN*cnrep;
Nrep=Nrep+DeltaN;
Crep=Crep+DeltaC;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN*(1-p);
NetC=NetC-DeltaC;
} } return(c(NetC=NetC,NetN=NetN,NewB=NewB,Crep=Crep,Nrep=Nrep,Brep=Brep))}
When I use my function by say doing:
Reproduction(NetN=1.07149,NetC=0.0922349,cnrep=20)
I get the desired result printed out which includes:
NetC=7.378792e-02
However, when I go to use NetC in the next section of my code, R is still using NetC=0.0922349.
Can I make R update NetC without having to define a new variable?
In R, in general, functions shouldn't change things outside of the function. It's possible to do so using <<- or assign(), but this generally makes your function inflexible and very surprising.
Instead, functions should return values (which yours does nicely), and if you want to keep those values, you explicitly use <- or = to assign them to objects outside of the function. They way your function is built now, you can do that like this:
updates = Reproduction(NetN = 1.07149, NetC = 0.0922349, cnrep = 20)
NetC = updates["NetC"]
This way, you (a) still have all the other results of the function stored in updates, (b) if you wanted to run Reproduction() with a different set of inputs and compare the results, you can do that. (If NetC updated automatically, you could never see two different values), (c) You can potentially change variable names and still use the same function, (d) You can run the function to experiment/see what happens without saving/updating the values.
If you generally want to keep NetN, NetC, and cnrep in sync, I would recommend keeping them together in a named vector or list, and rewriting your function to take that list as input and return that list as output. Something like this:
params = list(NetN = 1.07149, NetC = 0.0922349, cnrep = 20)
Reproduction=function(param_list){
NetN = param_list$NetN
NetC = param_list$NetC
cnrep = param_list$cnrep
if(NetC/NetN <= cnrep) {
DeltaC=NetC*p;
DeltaN=DeltaC/cnrep;
Crep=Crep+DeltaC;
Nrep=Nrep+DeltaN;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN-DeltaN; #/* Update N, C values */
NetC=NetC*(1-p)
print ("'Using C to allocate'")
}
else {
print("Using N to allocate");
DeltaN=NetN*p;
DeltaC=DeltaN*cnrep;
Nrep=Nrep+DeltaN;
Crep=Crep+DeltaC;
Brep=(Nrep*14+Crep*12)*2/1e6;
NetN=NetN*(1-p);
NetC=NetC-DeltaC;
}
## Removed extra } and ) ??
return(list(NetC=NetC, NetN=NetN, NewB=NewB, Crep=Crep, Nrep=Nrep, Brep=Brep))
}
This way, you can use the single line params <- Reproduction(params) to update everything in your list. You can access individual items in the list with either params$Netc or params[["NetC"]].
I'm coming from C++ background, trying to make use of it for R OOP programming with R6 package.
Consider the following typical situation when writing a large OOP code. -
You have a class, in which you have several (possibly many) functions, each of which may also be quite complex and with many lines of code:
# file CTest.R
cTest <- R6Class(
"CTest",
public = list(
z = 10,
fDo1 = function() {
# very long and complex code goes here
self$z <- self$z*2; self$z
},
fDo2 = function() {
# another very long and complex code goes here
print(self)
}
)
) #"CTest"
Naturally, you don't want to put ALL your long and various functions in the same (CTest.R) file - it will become messy and unmanageable.
If you program in C++, normal way to program such code is : first, you declare you functions in .h file, then you create .c files for each you complex function, where you define your function. This makes it possible to do collaborative code writing, including efficient source-control.
So, I've tried to do something similar in R, like: first, declaring a function as in code above, and then, trying to assign the "actual long and complex" code to it later (which later I would put in a separate file CTest-Do1.R):
cTest$f <- function() {
self$z <- self$z*100000; self$z
}
Now I test if it works:
> tt <- cTest$new(); tt; tt$fDo1(); tt
<CTest>
Public:
clone: function (deep = FALSE)
fDo1: function ()
fDo2: function ()
z: 10
[1] 20
<CTest>
Public:
clone: function (deep = FALSE)
fDo1: function ()
fDo2: function ()
z: 20
No, it does not.- As seen from output above, the function has not been changed.
Any advice?
Thanks to Grothendieck's comment above, there's a reasonable workaround to make it work.
Instead of this:
# CTest-Do1_doesnotwork.R
cTest$fDo1 <- function() {
...
}
write this:
# CTest-Do1_works.R
cTest$set(
overwrite = TRUE, "public", "fDo1",
function() {
...
}
)
This code can now be written in separate file, as originally desired.
I still wonder though - Is the above describe way actually the common(best) practice for writing large OOP codes in R community? (looks a bit strange to me).
If not, what is it (beyond just using source()) ? - so that to enable collaborative coding and source control for separate parts (functions) of a class ?
I came here also searching for R6 best practice. One way that I've seen (here) is to define the functions elsewhere as normal R functions and pass in self, private etc as required
cTest<- R6::R6Class("CTest",
public = list(
fDo1 = function()
cTestfDo1(self),
fDo2 = function(x)
cTestfDo2(self, private, x)
))
and else where have
cTestfDo1 <- function(self) {
self$z <- self$z*2; self$z
}
and somewhere else
cTestfDo2 <- function(self, private, x) {
self$z * private$q + x
}
etc
I don't know if it's best practice, or efficient, but the class definition looks neat with it and if the cTestfDo1 functions are not exported then it's relatively neat in the namespace too.
Im trying to manipulate a large data table (~37 MB) but in a special way: for other (unrelated) reasons I have implemented a 'hook' like structure meaning that the overall process is like
1) load the data.table from disk
2) fire a certain hook
3) the hook structure looks for this name ans checks whether the user (=me :)) has bound a function to this hook and if so, it is called
4) the data is processed further
The functions look like this:
data = readRDS(pathToFile)
data = data.table(data)
fireHook("After_data_read", data, [some other parameters])
some_more_processing(data)
and the region around fireHook looks like
hooksRegistered = list(
"After_data_read" = function(data, ...) {
# do some stuff
}
)
fireHook = function(hookName, ...) {
for (hookNameRegistered in names(hooksRegistered)) {
if (hookName == hookNameRegistered) {
func = .global.hooksRegistered[[hookName]]
func(hookName, ...)
}
}
}
Observe that one needs to cast an object that already is a data.table into it again (otherwise the pass-by-reference does not work), see Adding new columns to a data.table by-reference within a function not always working and Pass by reference bug?
Problem: this line: func(hookName, ...) takes like forever (> 5 minutes).
The debugger never really gets into the function (so its not the code in the function that takes a long time) and I've tested it with small data.tables and it worked. Also, I noted that the following seems to work:
fireHook = function(hookName, ...) {
args = list(...)
for (hookNameRegistered in names(.global.hooksRegistered)) {
if (hookName == hookNameRegistered) {
func = .global.hooksRegistered[[hookName]]
func(hookName, args)
}
}
}
(notice that I substituted ... by list(...)). To me, it seems as if R is trying to copy the whole table when using .... Is this right/desired? Or am I using it wrong?
regards,
FW
I'm looking for a way to tell an instance of a reference class to forget one of its method definitions. For example, I create the class MyReferenceClass and an instance called my_object I can call the method print_hello and everything works:
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello")
}
)
)
my_object <- MyReferenceClass$new()
my_object$print_hello() # "hello"
If I update the class definition by adding a new method (print_goodbye) my existing object will be able to use it. But if I change a previously defined method (print_hello), it won't update:
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello_again")
},
print_goodbye = function(){
print("goodbye")
}
)
)
my_object$print_goodbye() # "goodbye" => it works
my_object$print_hello() # "hello" => it doesn't work
Is there a way to tell my_object to forget about its definition of print_hello? This doesn't work: my_object$print_hello <<- NULL
AFAIK the answer is no when trying to inform the object about the class def change "after the fact", i.e. after it has been instantiated/created.
Once you created an instance of a S4 Class, that object is "bound" to the class def as it was when you created the object. And in my opinion this makes perfect sense. Not sure if the "successful" update for formerly missing methods (i.e. print_goodbye()) simply works "by accident" or actually is the desired behavior.
Recommended way to deal with updated class defs
My recommendation: if you decide you want/need to update your class defs, you're just safer off by re-sourcing your entire project code. That way you make sure everything is in place before you create actual instances. I'd consider anything else to be quite a hack that stands on very shaky grounds.
If you decide to hack anyway
There might be some dirty way to hack the hidden .refClassDef object field of an Reference Class instance that actually contains the class def (see my_object$.refClassDef). But setting this field (i.e. using <- on it) didn't work:
my_object$.refClassDef <- MyReferenceClass
Error in envRefSetField(x, what, refObjectClass(x), selfEnv, value) :
'.refClassDef' is not a field in class "MyReferenceClass"
Neither did an explicit assignment via assign():
assign(".refClassDef", MyReferenceClass, my_object)
Error in assign(".refClassDef", MyReferenceClass, my_object) :
cannot change value of locked binding for '.refClassDef'
An even deeper hack would probably involve looking at attributes(my_object$.refClassDef).
There you might find the actual pieces that make up the ref class def. However, I don't know if even changing anything there would be "immediately" reflected.
Also, resetClass() might give you some more insights.
UPDATE: 2014-03-19
For handling your caching-approach two approaches come to mind:
1. The most evident way: use copy()
See ?setRefClass
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello")
}
)
)
my_object <- MyReferenceClass$new()
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello_again")
},
print_goodbye = function(){
print("goodbye")
}
)
)
Before copying:
my_object$print_hello()
[1] "hello"
After copying:
my_object <- my_object$copy()
my_object$print_hello()
[1] "hello_again"
2. Hacking at attributes(my_object$.refClassDef)$refMethods (OUTLINE, NOT WORKING YET)
Even though I wouldn't recommend actually relying on something like this, hacks are always a great way to get a deeper understanding of how things work.
In this case, we could try to modify attributes(my_object$.refClassDef)$refMethods which is an environment that contains the actual method defs as I'm guessing that this is where the object "looks" when a method is called.
It's no problem overwriting the actual method defs, yet it seems to have no immediate effect. I'm guessing that there are more "links" to the "old" class def involved that would need to be updated manually in a similar way.
Note that my_object still features the method print_hello that prints "hello":
attributes(my_object$.refClassDef)$refMethods$print_hello
Class method definition for method print_hello()
function ()
{
print("hello")
}
This is how an overwriting function might look like:
ensureRecentMethods <- function(obj, classname) {
## Get generator //
gen <- getRefClass(classname)
## Get names of methods belonging to the class of 'obj' //
## This will serve as an index for the update
idx1 <- names(Filter(function(x) {attr(x, "refClassName") == class(obj)},
as.list(attributes(obj$.refClassDef)$refMethods))
)
#idx2 <- names(Filter(function(x) {attr(x, "refClassName")==gen$className},
# as.list(gen$def#refMethods)
#))
## Note:
## 'idx2' could be used to enforce some validity checks such as
## "all old methods must also be present in the updated class def"
## Overwrite //
for (ii in idx1) {
## Note how we are overwriting the old method defs in environment
## 'attributes(obj$.refClassDef)$refMethods' with the updated
## definitions taken from the generator of the updated class
## 'gen$def#refMethods[[ii]]' by making use of the index retrieved
## one step before ('idx1')
expr <- substitute(
assign(x=X, value=VALUE, envir=ENVIR),
list(
X=ii,
VALUE=gen$def#refMethods[[ii]],
ENVIR=attributes(obj$.refClassDef)$refMethods
)
)
eval(expr)
}
## As at the end of the day ref class objects are nothing more than
## environments, there is no need to explicitly return the actual
## ref class object 'obj' as the original object has already
## been updated (pass-by-reference vs. pass-by-value)
return(TRUE)
}
Applying it:
ensureRecentMethods(obj=my_object, classname="MyReferenceClass")
Even though the def of print_hello was indeed overwritten, the object still grabs the "old" version somehow:
attributes(my_object$.refClassDef)$refMethods$print_hello
## Note the updated method def!
Class method definition for method print_hello()
function ()
{
print("hello_again")
}
my_object$print_hello()
[1] "hello"
Take advantage of my_class#generator$def#refMethods
How about including an update method in the original class, as did here,
Manual modifications of the class definition of a Reference Class instance
Currently, I'm reading a lot about Software Engineering, Software Design, Design Patterns etc. Coming from a totally different background, that's all new fascinating stuff to me, so please bear with me in case I'm not using the correct technical terminology to describe certain aspects ;-)
I ended up using Reference Classes (a way of OOP in R) most of the time because object orientation seems to be the right choice for a lot of the stuff that I'm doing.
Now, I was wondering if anyone has some good advice or some experience with respect to implementing the MVC (Model View Controller; also known as MVP: Model View Presenter) pattern in R, preferably using Reference Classes.
I'd also be very interested in info regarding other "standard" design patterns such as observer, blackboard etc., but I don't want to make this too broad of a question. I guess the coolest thing would be to see some minimal example code, but any pointer, "schema", diagram or any other idea will also be greatly appreciated!
For those interested in similar stuff, I can really recommend the following books:
The Pragmatic Programmer
Design Patterns
UPDATE 2012-03-12
I did eventually come up with a small example of my interpretation of MVC (which might not be totally correct ;-)).
Package Dependencies
require("digest")
Class Definition Observer
setRefClass(
"Observer",
fields=list(
.X="environment"
),
methods=list(
notify=function(uid, ...) {
message(paste("Notifying subscribers of model uid: ", uid, sep=""))
temp <- get(uid, .self$.X)
if (length(temp$subscribers)) {
# Call method updateView() for each subscriber reference
sapply(temp$subscribers, function(x) {
x$updateView()
})
}
return(TRUE)
}
)
)
Class Definition Model
setRefClass(
"Model",
fields=list(
.X="data.frame",
state="character",
uid="character",
observer="Observer"
),
methods=list(
initialize=function(...) {
# Make sure all inputs are used ('...')
.self <- callSuper(...)
# Ensure uid
.self$uid <- digest(c(.self, Sys.time()))
# Ensure hash key of initial state
.self$state <- digest(.self$.X)
# Register uid in observer
assign(.self$uid, list(state=.self$state), .self$observer$.X)
.self
},
multiply=function(x, ...) {
.self$.X <- .X * x
# Handle state change
statechangeDetect()
return(TRUE)
},
publish=function(...) {
message(paste("Publishing state change for model uid: ",
.self$uid, sep=""))
# Publish current state to observer
if (!exists(.self$uid, .self$observer$.X)) {
assign(.self$uid, list(state=.self$state), .self$observer$.X)
} else {
temp <- get(.self$uid, envir=.self$observer$.X)
temp$state <- .self$state
assign(.self$uid, temp, .self$observer$.X)
}
# Make observer notify all subscribers
.self$observer$notify(uid=.self$uid)
return(TRUE)
},
statechangeDetect=function(...) {
out <- TRUE
# Hash key of current state
state <- digest(.self$.X)
if (length(.self$state)) {
out <- .self$state != state
if (out) {
# Update state if it has changed
.self$state <- state
}
}
if (out) {
message(paste("State change detected for model uid: ",
.self$uid, sep=""))
# Publish state change to observer
.self$publish()
}
return(out)
}
)
)
Class Definition Controller and Views
setRefClass(
"Controller",
fields=list(
model="Model",
views="list"
),
methods=list(
multiply=function(x, ...) {
# Call respective method of model
.self$model$multiply(x)
},
subscribe=function(...) {
uid <- .self$model$uid
envir <- .self$model$observer$.X
temp <- get(uid, envir)
# Add itself to subscribers of underlying model
temp$subscribers <- c(temp$subscribers, .self)
assign(uid, temp, envir)
},
updateView=function(...) {
# Call display method of each registered view
sapply(.self$views, function(x) {
x$display(.self$model)
})
return(TRUE)
}
)
)
setRefClass(
"View1",
methods=list(
display=function(model, x=1, y=2, ...) {
plot(x=model$.X[,x], y=model$.X[,y])
}
)
)
setRefClass(
"View2",
methods=list(
display=function(model, ...) {
print(model$.X)
}
)
)
Class Definition For Representing Dummy Data
setRefClass(
"MyData",
fields=list(
.X="data.frame"
),
methods=list(
modelMake=function(...){
new("Model", .X=.self$.X)
}
)
)
Create Instances
x <- new("MyData", .X=data.frame(a=1:3, b=10:12))
Investigate model characteristics and observer state
mod <- x$modelMake()
mod$.X
> mod$uid
[1] "fdf47649f4c25d99efe5d061b1655193"
# Field value automatically set when initializing object.
# See 'initialize()' method of class 'Model'.
> mod$state
[1] "6d95a520d4e3416bac93fbae88dfe02f"
# Field value automatically set when initializing object.
# See 'initialize()' method of class 'Model'.
> ls(mod$observer$.X)
[1] "fdf47649f4c25d99efe5d061b1655193"
> get(mod$uid, mod$observer$.X)
$state
[1] "6d95a520d4e3416bac93fbae88dfe02f"
Note that the object's uid has automatically been registered in the observer upon initialization. That way, controllers/views can subscribe to notifications and we have a 1:n relationship.
Instantiate views and controller
view1 <- new("View1")
view2 <- new("View2")
cont <- new("Controller", model=mod, views=list(view1, view2))
Subscribe
Controller subscribes to notifications of underlying model
cont$subscribe()
Note that the subscription has been logged in the observer
get(mod$uid, mod$observer$.X)
Display Registered Views
> cont$updateView()
a b
1 1 10
2 2 11
3 3 12
[1] TRUE
There's also a plot window that is opened.
Modify Model
> cont$model$multiply(x=10)
State change detected for model uid: fdf47649f4c25d99efe5d061b1655193
Publishing state change for model uid: fdf47649f4c25d99efe5d061b1655193
Notifying subscribers of model uid: fdf47649f4c25d99efe5d061b1655193
a b
1 10 100
2 20 110
3 30 120
[1] TRUE
Note that both registered views are automatically updated as the underlying model published its state change to the observer, which in turn notified all subscribers (i.e., the controller).
Open Questions
Here's what I feel like I'm not fully understanding yet:
Is this a somewhat correct implementation of the MVC pattern? If not, what did I do wrong?
Should "processing" methods (e.g. aggregate data, take subsets etc.) for the model "belong" to the model or the controller class . So far, I always defined everything a specific object can "do" as methods of this very object.
Should the controller be sort of a "proxy" controlling every interaction between model and views (sort of "both ways"), or is it only responsible for propagating user input to the model (sort of "one way"?
It looks quite good, but I'm not so sure why you have an Observer additional to your other classes (maybe you can tell me) Usually the Controller IS an Observer. It's a really good idea to do this in R because when I learned it in Java it was not so easy to understand (Java hides some of the good parts)
Yes and No. There are many different interpretations of this pattern. I like to have the methods in the Object, I'd say it belongs to the model.
A simple example would be a sudoku solver that shows the solving steps in a GUI. Let's split it into some parts that can be separated into M, V and C: the raw data (2D array maybe), the sudoku functions (calc next step, ...), the GUI, someone who tells the GUI that a new step was calculated
I'd put it like this: M: raw data + sudoku functions, C: who tells the GUI about changes / the model about GUI inputs, V: GUI without any logic
others put the sudoku function into the Controller, is also right and might work better for some problems
It's possible to have a "one way" controller like you call it and the View is an Observer of the model
It's also possible to let the Controller do everything and Model and View don't know each other (have a look at Model View Presenter, that's about that)