Currently, I'm reading a lot about Software Engineering, Software Design, Design Patterns etc. Coming from a totally different background, that's all new fascinating stuff to me, so please bear with me in case I'm not using the correct technical terminology to describe certain aspects ;-)
I ended up using Reference Classes (a way of OOP in R) most of the time because object orientation seems to be the right choice for a lot of the stuff that I'm doing.
Now, I was wondering whether anyone has good advice or experience with implementing the MVC (Model View Controller; closely related to MVP: Model View Presenter) pattern in R, preferably using Reference Classes.
I'd also be very interested in info regarding other "standard" design patterns such as observer, blackboard etc., but I don't want to make this too broad a question. I guess the coolest thing would be to see some minimal example code, but any pointer, "schema", diagram or any other idea will also be greatly appreciated!
For those interested in similar stuff, I can really recommend the following books:
The Pragmatic Programmer
Design Patterns
UPDATE 2012-03-12
I did eventually come up with a small example of my interpretation of MVC (which might not be totally correct ;-)).
Package Dependencies
require("digest")
Class Definition Observer
setRefClass(
"Observer",
fields=list(
.X="environment"
),
methods=list(
notify=function(uid, ...) {
message(paste("Notifying subscribers of model uid: ", uid, sep=""))
temp <- get(uid, .self$.X)
if (length(temp$subscribers)) {
# Call method updateView() for each subscriber reference
sapply(temp$subscribers, function(x) {
x$updateView()
})
}
return(TRUE)
}
)
)
Class Definition Model
setRefClass(
"Model",
fields=list(
.X="data.frame",
state="character",
uid="character",
observer="Observer"
),
methods=list(
initialize=function(...) {
# Make sure all inputs are used ('...')
.self <- callSuper(...)
# Ensure uid
.self$uid <- digest(c(.self, Sys.time()))
# Ensure hash key of initial state
.self$state <- digest(.self$.X)
# Register uid in observer
assign(.self$uid, list(state=.self$state), .self$observer$.X)
.self
},
multiply=function(x, ...) {
.self$.X <- .X * x
# Handle state change
statechangeDetect()
return(TRUE)
},
publish=function(...) {
message(paste("Publishing state change for model uid: ",
.self$uid, sep=""))
# Publish current state to observer
if (!exists(.self$uid, .self$observer$.X)) {
assign(.self$uid, list(state=.self$state), .self$observer$.X)
} else {
temp <- get(.self$uid, envir=.self$observer$.X)
temp$state <- .self$state
assign(.self$uid, temp, .self$observer$.X)
}
# Make observer notify all subscribers
.self$observer$notify(uid=.self$uid)
return(TRUE)
},
statechangeDetect=function(...) {
out <- TRUE
# Hash key of current state
state <- digest(.self$.X)
if (length(.self$state)) {
out <- .self$state != state
if (out) {
# Update state if it has changed
.self$state <- state
}
}
if (out) {
message(paste("State change detected for model uid: ",
.self$uid, sep=""))
# Publish state change to observer
.self$publish()
}
return(out)
}
)
)
Class Definition Controller and Views
setRefClass(
"Controller",
fields=list(
model="Model",
views="list"
),
methods=list(
multiply=function(x, ...) {
# Call respective method of model
.self$model$multiply(x)
},
subscribe=function(...) {
uid <- .self$model$uid
envir <- .self$model$observer$.X
temp <- get(uid, envir)
# Add itself to subscribers of underlying model
temp$subscribers <- c(temp$subscribers, .self)
assign(uid, temp, envir)
},
updateView=function(...) {
# Call display method of each registered view
sapply(.self$views, function(x) {
x$display(.self$model)
})
return(TRUE)
}
)
)
setRefClass(
"View1",
methods=list(
display=function(model, x=1, y=2, ...) {
plot(x=model$.X[,x], y=model$.X[,y])
}
)
)
setRefClass(
"View2",
methods=list(
display=function(model, ...) {
print(model$.X)
}
)
)
Class Definition For Representing Dummy Data
setRefClass(
"MyData",
fields=list(
.X="data.frame"
),
methods=list(
modelMake=function(...){
new("Model", .X=.self$.X)
}
)
)
Create Instances
x <- new("MyData", .X=data.frame(a=1:3, b=10:12))
Investigate model characteristics and observer state
mod <- x$modelMake()
mod$.X
> mod$uid
[1] "fdf47649f4c25d99efe5d061b1655193"
# Field value automatically set when initializing object.
# See 'initialize()' method of class 'Model'.
> mod$state
[1] "6d95a520d4e3416bac93fbae88dfe02f"
# Field value automatically set when initializing object.
# See 'initialize()' method of class 'Model'.
> ls(mod$observer$.X)
[1] "fdf47649f4c25d99efe5d061b1655193"
> get(mod$uid, mod$observer$.X)
$state
[1] "6d95a520d4e3416bac93fbae88dfe02f"
Note that the object's uid has automatically been registered in the observer upon initialization. That way, controllers/views can subscribe to notifications and we have a 1:n relationship.
Instantiate views and controller
view1 <- new("View1")
view2 <- new("View2")
cont <- new("Controller", model=mod, views=list(view1, view2))
Subscribe
Controller subscribes to notifications of underlying model
cont$subscribe()
Note that the subscription has been logged in the observer:
get(mod$uid, mod$observer$.X)
Display Registered Views
> cont$updateView()
a b
1 1 10
2 2 11
3 3 12
[1] TRUE
A plot window (from View1) is also opened.
Modify Model
> cont$model$multiply(x=10)
State change detected for model uid: fdf47649f4c25d99efe5d061b1655193
Publishing state change for model uid: fdf47649f4c25d99efe5d061b1655193
Notifying subscribers of model uid: fdf47649f4c25d99efe5d061b1655193
a b
1 10 100
2 20 110
3 30 120
[1] TRUE
Note that both registered views are automatically updated as the underlying model published its state change to the observer, which in turn notified all subscribers (i.e., the controller).
Open Questions
Here's what I feel like I'm not fully understanding yet:
Is this a somewhat correct implementation of the MVC pattern? If not, what did I do wrong?
Should "processing" methods for the model (e.g. aggregating data, taking subsets, etc.) "belong" to the model class or to the controller class? So far, I have always defined everything a specific object can "do" as methods of that very object.
Should the controller be a sort of "proxy" that controls every interaction between model and views (sort of "both ways"), or is it only responsible for propagating user input to the model (sort of "one way")?
It looks quite good, but I'm not sure why you have an Observer in addition to your other classes (maybe you can tell me). Usually the Controller IS an Observer. It's a really good idea to do this in R, because when I learned it in Java it was not so easy to understand (Java hides some of the good parts).
Yes and no. There are many different interpretations of this pattern. I like to have the methods in the object, so I'd say they belong to the model.
A simple example would be a sudoku solver that shows the solving steps in a GUI. Let's split it into parts that can be assigned to M, V and C: the raw data (maybe a 2D array), the sudoku functions (calculate the next step, ...), the GUI, and someone who tells the GUI that a new step was calculated.
I'd put it like this: M = raw data + sudoku functions; C = whoever tells the GUI about changes and the model about GUI inputs; V = the GUI without any logic.
Others put the sudoku functions into the Controller, which is also valid and might work better for some problems.
It's possible to have a "one way" controller, as you call it, where the View is an Observer of the Model.
It's also possible to let the Controller do everything, so that Model and View don't know each other (have a look at Model View Presenter, which is about exactly that).
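A minimal sketch of the answer's suggestion that the Controller can itself be the Observer, so that no separate Observer class is needed. It uses Reference Classes from the methods package; the class names SimpleModel, PrintView and SimpleController and their methods are hypothetical, chosen for illustration, and this is only one of the many possible interpretations of MVC mentioned above.

```r
library(methods)

# Hypothetical model: keeps its own list of listeners (no Observer class)
SimpleModel <- setRefClass("SimpleModel",
  fields = list(values = "data.frame", listeners = "list"),
  methods = list(
    addListener = function(listener) {
      # Any object with a modelChanged() method can be registered
      listeners[[length(listeners) + 1]] <<- listener
      invisible(.self)
    },
    multiply = function(x) {
      values <<- values * x
      # Notify every registered listener (here: the controller)
      for (l in listeners) l$modelChanged(.self)
      invisible(.self)
    }
  )
)

# Hypothetical view: only renders, contains no logic about the data
PrintView <- setRefClass("PrintView",
  fields = list(lastTotal = "numeric"),
  methods = list(
    display = function(model) {
      lastTotal <<- sum(model$values)
      print(model$values)
    }
  )
)

# Hypothetical controller that doubles as the model's observer
SimpleController <- setRefClass("SimpleController",
  fields = list(model = "SimpleModel", views = "list"),
  methods = list(
    subscribe = function() {
      # Register itself as a listener of the model
      model$addListener(.self)
      invisible(.self)
    },
    multiply = function(x) {
      # User input is forwarded to the model
      model$multiply(x)
    },
    modelChanged = function(m) {
      # Model changes are forwarded to every view
      for (v in views) v$display(m)
      invisible(TRUE)
    }
  )
)

m <- SimpleModel$new(values = data.frame(a = 1:3, b = 10:12))
v <- PrintView$new()
ctrl <- SimpleController$new(model = m, views = list(v))
ctrl$subscribe()
ctrl$multiply(10)  # prints the updated data frame via the view
```

In this sketch the controller works "both ways": it forwards user input to the model and forwards model changes to the views, while the model only relies on its listeners having a modelChanged() method.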
Related
I'm writing some functions for inventory management. I recently wanted to add a "photo URL" column to my spreadsheet, using an API I had already used successfully while building my inventory. My spreadsheet header looks like this:
SKU | NAME | OTHER STUFF
I have a getProductInfo function that returns a list of product info from an API I'm calling.
getProductInfo <- function(barcode) {
# Input: UPC
# Output: list of product info
info <- CallAPI(barcode)
# ... process API return, remove garbage ...
return(info)
}
I made a new function that takes my inventory CSV (read in as a data frame) as input and attempts to add a new column with the product photo URL.
get_photo_url_from_product_info_output <- function(in_list){
#Input GetProductInfo Output. Returns Photo URL, or nothing if
#it doesn't exist
if(in_list$DisplayStockPhotos == TRUE){
return(in_list$StockPhotoURL)
} else {
return("")
}
}
library(dplyr) # for mutate()

add_Photo_URL <- function(in_csv){
#Input CSV data frame, appends photourl column
#Requires SKU (UPC) assumes no photourl column
out_csv <- mutate(in_csv, photo =
get_photo_url_from_product_info_output(
getProductInfo(SKU)
)
)
return(out_csv)
}
#Call it
new <- add_Photo_URL(old)
My thinking was that R would simply take the SKU from each row and pass it through the nested function call "as is", and that the vectorized dplyr function mutate would take care of vectorizing it. Unfortunately, I ran into all sorts of problems I couldn't understand. Eventually I figured out that the API call was crashing because the SKU argument was all messed up when it was passed in. I put in a breakpoint and found that the function wasn't receiving a single SKU, but an entire vector of SKUs: every row at once. Something like this:
#Variable 'barcode' inside getProductInfo function contains:
[1] 7.869368e+11 1.438175e+10 1.256983e+10 2.454357e+10 3.139814e+10 1.256983e+10 1.313260e+10 4.339643e+10 2.454328e+10
[10] 1.313243e+10 6.839046e+11 2.454367e+10 2.454363e+10 2.454367e+10 2.454348e+10 8.418870e+11 2.519211e+10 2.454375e+10
[19] 2.454381e+10 2.454381e+10 2.454383e+10 2.454384e+10 7.869368e+11 2.454370e+10 2.454390e+10 1.913290e+11 2.454397e+10
[28] 2.454399e+10 2.519202e+10 2.519205e+10 7.742121e+11 8.839291e+11 8.539116e+10 2.519211e+10 2.519211e+10 2.519211e+10
Obviously my initial getProductInfo function can't handle that, so it'll crash.
How should I modify my code, either in the input or in the API call, to avoid this vectorized operation issue?
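The breakpoint observation has a simple cause: inside mutate(), a bare column name such as SKU refers to the whole column vector, so the expression is evaluated once for all rows, not once per row. A minimal base-R illustration, using transform() as a stand-in for dplyr::mutate(), a few SKUs in the style of the output above, and a hypothetical function seen_length() that only reports how many values it received:

```r
# Hypothetical helper: for every element it receives, returns the number of
# values it received in that single call
seen_length <- function(barcode) rep(length(barcode), length(barcode))

inventory <- data.frame(SKU = c(786936800000, 14381750000, 12569830000))

# transform() evaluates seen_length(SKU) once, with SKU bound to the whole
# column, just as a column reference inside dplyr::mutate() would be
inventory <- transform(inventory, n_seen = seen_length(SKU))
inventory$n_seen
```

Every row reports 3, i.e. the function was called once with all three SKUs.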
Well, it's not totally elegant but it works.
I figured out I need to use lapply, which is usually not my strong suit. Initially I tried to nest the calls like so:
lapply(SKU, get_photo_url_from_product_info_output(getProductInfo()))
But that didn't work, because it calls getProductInfo() immediately (with no barcode) instead of giving lapply a function to apply to each SKU. So I came up with the idea of writing another function:
get_photo_url_from_sku <- function(barcode){
return(get_photo_url_from_product_info_output(getProductInfo(barcode)))
}
Call that in the lapply:
out_csv<- mutate(in_csv, photocolumn = lapply(SKU, get_photo_url_from_sku))
And it works great. My speed is only limited by my API calls.
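A sketch of the same fix, with two assumptions: getProductInfo() below is a hypothetical stub (the real one calls the API), and vapply() is swapped in for lapply(). Inside mutate(), lapply() produces a list column, whereas vapply(..., character(1)) guarantees a plain character vector with exactly one string per row, which is easier to write back to a CSV. The example.com URLs are placeholders.

```r
# Hypothetical stand-in for the real API wrapper: like the original,
# it can only handle ONE barcode per call
getProductInfo <- function(barcode) {
  stopifnot(length(barcode) == 1)
  list(DisplayStockPhotos = barcode %% 2 == 0,
       StockPhotoURL = paste0("https://example.com/img/", barcode, ".jpg"))
}

get_photo_url_from_product_info_output <- function(in_list) {
  # isTRUE() also treats a missing (NULL) or NA flag as "no photo"
  if (isTRUE(in_list$DisplayStockPhotos)) in_list$StockPhotoURL else ""
}

get_photo_url_from_sku <- function(barcode) {
  get_photo_url_from_product_info_output(getProductInfo(barcode))
}

inventory <- data.frame(SKU = c(10, 11, 12), NAME = c("a", "b", "c"))

# vapply() calls the scalar function once per SKU and checks that each
# call returns exactly one string
inventory$photo <- vapply(inventory$SKU, get_photo_url_from_sku, character(1))
inventory$photo
```

With dplyr, the same call can go inside mutate(): mutate(in_csv, photo = vapply(SKU, get_photo_url_from_sku, character(1))).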
ann <- Person$new("Ann", "black")
In the example above (which is from this Introduction), how would I get "ann"?
For instance, I would need a method ann$getName that would return "ann".
I'm trying to do the same thing, so perhaps I can clarify the question. The goal (for me) is to give the user of the class some feedback. Something like this:
library(R6)
Person <- R6Class("Person",
public = list(
name = NULL,
hair = NULL,
initialize = function(name = NA, hair = NA) {
self$name <- name
self$hair <- hair
},
do_something_very_long=function(){
whoami <- self$getInstanceName() ## $getInstanceName() is the method I need to write !
message(paste("Please wait, processing object",whoami))
# Do a very long calculation...
}
)
)
I'd then use it in a script that does something like this:
#File batch_processing.R
first_in_line<-Person$new("Alice","Black")
next_customer<-Person$new("Bob","Red")
VIP<-Person$new("Charlie","Brown")
# etc ...
first_in_line$do_something_very_long()
next_customer$do_something_very_long()
VIP$do_something_very_long()
# etc ...
So, my (notional) user will start batch_processing.R, perhaps with
$ Rscript ~/batch_processing.R
or
R> source("batch_processing.R")
and see nothing much happening while the script runs. I would therefore like some feedback, so that when the user comes back after his coffee, he can look at the screen and see that the computer is busy processing VIP or next_customer.
Obviously, one way is to explicitly give each object a unique identifier ($name in this case). In my real application, however, this would not be very meaningful, or would just duplicate the object name ("model 1", "model 2", ...), which is a bit wasteful!
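As far as I know, an R object cannot reliably know "its" name, since several variables (or none, e.g. a list element) can refer to the same reference object. A common workaround is to capture the name at the call site with deparse(substitute()). The sketch below uses a Reference Class instead of R6 to stay dependency-free, and the process() wrapper is a hypothetical helper, not part of any package:

```r
library(methods)

Person <- setRefClass("Person",
  fields = list(name = "character", hair = "character"),
  methods = list(
    do_something_very_long = function() {
      # ... the very long calculation would go here ...
      invisible(TRUE)
    }
  )
)

# Hypothetical wrapper: substitute() returns the expression the caller
# supplied for 'obj' (e.g. the symbol VIP); deparse() turns it into text
process <- function(obj) {
  whoami <- deparse(substitute(obj))
  message("Please wait, processing object ", whoami)
  obj$do_something_very_long()
  invisible(whoami)
}

VIP <- Person$new(name = "Charlie", hair = "Brown")
process(VIP)  # message: Please wait, processing object VIP
```

If the object is passed as people[[1]], the label becomes the expression text "people[[1]]", which is often still informative enough for progress messages.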
I'm coming from a C++ background and trying to apply it to OOP programming in R with the R6 package.
Consider the following typical situation when writing a large OOP code base:
You have a class, in which you have several (possibly many) functions, each of which may also be quite complex and with many lines of code:
# file CTest.R
library(R6)
cTest <- R6Class(
"CTest",
public = list(
z = 10,
fDo1 = function() {
# very long and complex code goes here
self$z <- self$z*2; self$z
},
fDo2 = function() {
# another very long and complex code goes here
print(self)
}
)
) #"CTest"
Naturally, you don't want to put ALL your long and various functions in the same (CTest.R) file - it will become messy and unmanageable.
In C++, the usual way to organize such code is to declare your functions in a .h file and then define each complex function in its own .cpp file. This makes collaborative development possible, including efficient source control.
So I tried to do something similar in R: first declare a function as in the code above, and then assign the "actual long and complex" code to it later (which I would later put in a separate file, CTest-Do1.R):
cTest$fDo1 <- function() {
self$z <- self$z*100000; self$z
}
Now I test if it works:
> tt <- cTest$new(); tt; tt$fDo1(); tt
<CTest>
Public:
clone: function (deep = FALSE)
fDo1: function ()
fDo2: function ()
z: 10
[1] 20
<CTest>
Public:
clone: function (deep = FALSE)
fDo1: function ()
fDo2: function ()
z: 20
It does not: as the output above shows, the function has not been changed.
Any advice?
Thanks to Grothendieck's comment above, there's a reasonable workaround to make it work.
Instead of this:
# CTest-Do1_doesnotwork.R
cTest$fDo1 <- function() {
...
}
write this:
# CTest-Do1_works.R
cTest$set(
overwrite = TRUE, "public", "fDo1",
function() {
...
}
)
This code can now be written in a separate file, as originally desired.
I still wonder, though: is the approach described above actually the common (best) practice for writing large OOP code in the R community? It looks a bit strange to me.
If not, what is (beyond just using source()), so as to enable collaborative coding and source control for separate parts (functions) of a class?
I came here also searching for R6 best practices. One way that I've seen (here) is to define the functions elsewhere as normal R functions and pass in self, private, etc. as required:
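For comparison, Reference Classes (methods package) offer a similar split without an extra package: the generator's $methods() adds methods to an existing class definition, so each method can live in its own file that is sourced after the class declaration. A minimal sketch with hypothetical file and class names; as with $set(), it is safest to source all method files before creating any objects.

```r
library(methods)

# File CTestRC.R: the "declaration", with fields only (hypothetical names)
CTestRC <- setRefClass("CTestRC", fields = list(z = "numeric"))

# File CTestRC-Do1.R: add one method to the existing generator
CTestRC$methods(fDo1 = function() {
  z <<- z * 2  # <<- modifies the field, not a local copy
  z
})

# File CTestRC-Do2.R: another method, defined independently
CTestRC$methods(fDo2 = function(x) {
  z + x
})

tt <- CTestRC$new(z = 10)
r1 <- tt$fDo1()   # z is now 20
r2 <- tt$fDo2(1)  # 20 + 1
```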
cTest<- R6::R6Class("CTest",
public = list(
z = 10,
fDo1 = function()
cTestfDo1(self),
fDo2 = function(x)
cTestfDo2(self, private, x)
),
private = list(
q = 2 # illustrative value; cTestfDo2() below uses private$q
))
and elsewhere have:
cTestfDo1 <- function(self) {
self$z <- self$z*2; self$z
}
and somewhere else
cTestfDo2 <- function(self, private, x) {
self$z * private$q + x
}
etc
I don't know if it's best practice, or efficient, but the class definition looks neat with it and if the cTestfDo1 functions are not exported then it's relatively neat in the namespace too.
I'm looking for a way to tell an instance of a reference class to forget one of its method definitions. For example, I create the class MyReferenceClass and an instance called my_object; I can call the method print_hello and everything works:
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello")
}
)
)
my_object <- MyReferenceClass$new()
my_object$print_hello() # "hello"
If I update the class definition by adding a new method (print_goodbye) my existing object will be able to use it. But if I change a previously defined method (print_hello), it won't update:
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello_again")
},
print_goodbye = function(){
print("goodbye")
}
)
)
my_object$print_goodbye() # "goodbye" => it works
my_object$print_hello() # "hello" => it doesn't work
Is there a way to tell my_object to forget about its definition of print_hello? This doesn't work: my_object$print_hello <<- NULL
AFAIK the answer is no when trying to inform the object about the class def change "after the fact", i.e. after it has been instantiated/created.
Once you have created an instance of an S4 class (Reference Classes are built on S4), that object is "bound" to the class definition as it was when you created the object, and in my opinion this makes perfect sense. I'm not sure whether the "successful" update for formerly missing methods (i.e. print_goodbye()) simply works "by accident" or is actually the intended behavior.
Recommended way to deal with updated class defs
My recommendation: if you decide you want or need to update your class definitions, you're safer off re-sourcing your entire project code. That way you make sure everything is in place before you create actual instances. I'd consider anything else a hack that stands on very shaky ground.
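A minimal sketch of that recommendation for objects that already exist in a session: after the class definition is re-sourced, rebuild each instance from the current generator, carrying over its field values. The rebuild() helper is hypothetical and assumes the class uses the default initializer (fields passed by name to $new()); the Counter class is only for illustration.

```r
library(methods)

# Original definition, and an object created (and used) from it
Counter <- setRefClass("Counter",
  fields = list(n = "numeric"),
  methods = list(describe = function() paste("count:", n))
)
obj <- Counter$new(n = 3)
obj$describe()  # "count: 3"

# The class definition is later re-sourced with a changed method
Counter <- setRefClass("Counter",
  fields = list(n = "numeric"),
  methods = list(describe = function() paste("current count is", n))
)

# Hypothetical helper: create a fresh object from the current generator,
# passing the old object's field values by name to $new()
rebuild <- function(obj, generator) {
  field_names <- names(generator$fields())
  values <- lapply(field_names, function(f) obj$field(f))
  names(values) <- field_names
  do.call(generator$new, values)
}

obj <- rebuild(obj, Counter)
obj$describe()  # now uses the updated method
```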
If you decide to hack anyway
There might be some dirty way to hack the hidden .refClassDef field of a Reference Class instance, which actually contains the class definition (see my_object$.refClassDef). But setting this field (i.e. using <- on it) didn't work:
my_object$.refClassDef <- MyReferenceClass
Error in envRefSetField(x, what, refObjectClass(x), selfEnv, value) :
'.refClassDef' is not a field in class "MyReferenceClass"
Neither did an explicit assignment via assign():
assign(".refClassDef", MyReferenceClass, my_object)
Error in assign(".refClassDef", MyReferenceClass, my_object) :
cannot change value of locked binding for '.refClassDef'
An even deeper hack would probably involve looking at attributes(my_object$.refClassDef).
There you might find the actual pieces that make up the ref class def. However, I don't know if even changing anything there would be "immediately" reflected.
Also, resetClass() might give you some more insights.
UPDATE: 2014-03-19
For handling your caching approach, two options come to mind:
1. The most evident way: use copy()
See ?setRefClass
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello")
}
)
)
my_object <- MyReferenceClass$new()
MyReferenceClass <- setRefClass("MyReferenceClass",
methods = list(
print_hello = function(){
print("hello_again")
},
print_goodbye = function(){
print("goodbye")
}
)
)
Before copying:
my_object$print_hello()
[1] "hello"
After copying:
my_object <- my_object$copy()
my_object$print_hello()
[1] "hello_again"
2. Hacking at attributes(my_object$.refClassDef)$refMethods (OUTLINE, NOT WORKING YET)
Even though I wouldn't recommend actually relying on something like this, hacks are always a great way to get a deeper understanding of how things work.
In this case, we could try to modify attributes(my_object$.refClassDef)$refMethods, which is an environment containing the actual method definitions; my guess is that this is where the object "looks" when a method is called.
It's no problem overwriting the actual method defs, yet it seems to have no immediate effect. I'm guessing that there are more "links" to the "old" class def involved that would need to be updated manually in a similar way.
Note that my_object still features the method print_hello that prints "hello":
attributes(my_object$.refClassDef)$refMethods$print_hello
Class method definition for method print_hello()
function ()
{
print("hello")
}
This is what an overwriting function might look like:
ensureRecentMethods <- function(obj, classname) {
## Get generator //
gen <- getRefClass(classname)
## Get names of methods belonging to the class of 'obj' //
## This will serve as an index for the update
idx1 <- names(Filter(function(x) {attr(x, "refClassName") == class(obj)},
as.list(attributes(obj$.refClassDef)$refMethods))
)
#idx2 <- names(Filter(function(x) {attr(x, "refClassName")==gen$className},
#   as.list(gen$def@refMethods)
#))
## Note:
## 'idx2' could be used to enforce some validity checks such as
## "all old methods must also be present in the updated class def"
## Overwrite //
for (ii in idx1) {
## Note how we are overwriting the old method defs in environment
## 'attributes(obj$.refClassDef)$refMethods' with the updated
## definitions taken from the generator of the updated class
## 'gen$def@refMethods[[ii]]' by making use of the index retrieved
## one step before ('idx1')
expr <- substitute(
assign(x=X, value=VALUE, envir=ENVIR),
list(
X=ii,
VALUE=gen$def@refMethods[[ii]],
ENVIR=attributes(obj$.refClassDef)$refMethods
)
)
eval(expr)
}
## As at the end of the day ref class objects are nothing more than
## environments, there is no need to explicitly return the actual
## ref class object 'obj' as the original object has already
## been updated (pass-by-reference vs. pass-by-value)
return(TRUE)
}
Applying it:
ensureRecentMethods(obj=my_object, classname="MyReferenceClass")
Even though the def of print_hello was indeed overwritten, the object still grabs the "old" version somehow:
attributes(my_object$.refClassDef)$refMethods$print_hello
## Note the updated method def!
Class method definition for method print_hello()
function ()
{
print("hello_again")
}
my_object$print_hello()
[1] "hello"
Take advantage of my_class@generator$def@refMethods
How about including an update method in the original class, as was done here:
Manual modifications of the class definition of a Reference Class instance
I have been playing a little with R's R5 class system to see what it can and can't do. In the process, I have stumbled upon what looks like static class field members (which do not appear to be documented, but I could have missed it).
[2014 update]
Warning: The following code does not work with R versions >= 3.0.
This post, its answers and particularly the comments provide useful insights and reminders about the R5 OO-system and the R language in general. However it is probably a bad idea to cultivate the idiom of using the environment of the R5 class instances directly.
[end 2014 update]
In the following code, the first field is the classic definition of an instance field variable. The second definition appears to create a static class field using an accessor method. I would like to know whether this use is kosher (or whether my code example simply works by coincidence). The third field creates a quasi-private instance field variable using an accessor method.
assertClass <- function(x, className, R5check=FALSE) {
# simple utility function
stopifnot(class(x)[1] == className)
if(R5check) stopifnot(is(x, 'envRefClass'))
}
A <- setRefClass('A',
fields = list(
# 1. public, typed, instance field
myPublicInstanceVar = 'character',
# 2. this assignment appears static
# but if the field me.static.private
# was declared in the field list
# it would be a local instance var
myPrivateStaticVar = function(x) {
if (!missing(x)) {
assertClass(x, 'character')
me.static.private <<- x
}
me.static.private
},
# 3. quasi-private, typed, instance field
myPrivateInstanceVar = function(x) {
if (!missing(x)) {
assertClass(x, 'character')
.self$me.private <<- x
}
.self$me.private
}
),
methods = list(
initialize = function (c='default') {
myPublicInstanceVar <<- c
myPrivateStaticVar <<- c
myPrivateInstanceVar <<- c
}
)
)
# test instantiation
instance1.of.A <- A$new('first instance')
str(instance1.of.A)
instance2.of.A <- A$new('second instance')
str(instance1.of.A)
str(instance2.of.A)
instance3.of.A <- getRefClass('A')$new('third instance')
instance3.of.A$myPrivateStaticVar <- 'Third instance - changed'
print(instance1.of.A$myPrivateStaticVar)
print(instance2.of.A$myPrivateStaticVar)
print(instance3.of.A$myPrivateStaticVar)
str(instance1.of.A)
str(instance2.of.A)
str(instance3.of.A)
# but not really private ...
instance1.of.A$myPublicInstanceVar # works
instance1.of.A$me.static.private # DOES NOT WORK - where is this variable stored?
instance1.of.A$me.private # works
# till death do us part
instance3.of.A <- NULL
gc()
str(instance1.of.A)
str(instance2.of.A)
str(instance3.of.A)
If you run this code, you can see that the second field variable appears to behave as a static class member. What is less clear to me is where the reference class keeps this field (hence my comment on the instance1.of.A$me.static.private line above).
The short answer - based on Hadley's comments above - is no. R's reference classes do not have static variables.
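A common workaround, sketched here under the assumption that a top-level shared environment is acceptable: keep the "static" data in one environment that every instance refers to. Environments have reference semantics, so a change made through any instance is visible to all of them. tracked_shared, Tracked and instanceCount() are hypothetical names. As far as I know, Reference Class methods are evaluated in the object's own environment rather than their defining closure, which is why the shared environment lives at top level rather than inside local().

```r
library(methods)

# One shared environment, created once at top level (hypothetical name)
tracked_shared <- new.env()
tracked_shared$count <- 0

Tracked <- setRefClass("Tracked",
  fields = list(label = "character"),
  methods = list(
    initialize = function(...) {
      initFields(...)
      # Modify the shared environment in place; every instance sees it
      assign("count", tracked_shared$count + 1, envir = tracked_shared)
      invisible(.self)
    },
    instanceCount = function() tracked_shared$count
  )
)

a <- Tracked$new(label = "a")
b <- Tracked$new(label = "b")
a$instanceCount()  # same value as b$instanceCount()
```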