I have created an S4 class ("card") that resembles a record with several fields. Now I want to define a collection class ("cat") to hold many "card" objects. The cat class will include methods for searching, editing, and adding cards.
Here is a simplified version of what I'm trying to create:
Card <- setClass("Card",
representation(dsOwner = "character", dsFile = "character", dsUrl = "character"))
Cat <- setClass("Cat", representation(cardlist = "list"))
setGeneric("addcard",
function(catObj, owner, file, url)
standardGeneric("addcard"))
setMethod("addcard",
signature(catObj = "Cat"),
function(catObj, owner, file, url){
index <- length(catObj) + 1
catObj[[index]] <- new("Card",
dsOwner = owner,
dsFile = file,
dsUrl = url)
return(index)
})
catalog <- new("Cat")
addcard(catalog, owner = "some online resource", file = "some file name", url = "http://some.url")
Unfortunately, executing the addcard method throws an error I don't understand:
Error in `[[<-`(`*tmp*`, index, value = <S4 object of class "Card">) :
[[<- defined for objects of type "S4" only for subclasses of environment
Did I not define the cat class correctly?
R does not have a coherent Java/C++-like container framework for arrays, lists, sets, maps, etc. (in the OOP sense).
R vectors and lists are in many ways similar to Java lists or C++ arrays. I don't know much about their speed and efficiency, though; I suspect R makes many more copies than Java or C++ do.
Simple set operations on vectors can be done with functions such as union, setdiff, and unique. There is also the "sets" package, which has handled the simple objects I have thrown at it very well.
Named vectors (including named lists) are in many ways like maps. The key is the name, and with some tricks you can make a string key out of any type of object. I suspect, though, that these are not hashed, and the performance does not seem to be good. You can also implement maps via environments (see Natural way to represent hash tables/dictionaries/maps in R) and data.table (see Dictionaries and pairs in R).
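As a rough illustration of the environment-based approach, here is a minimal sketch of a string-keyed map. The names scores and lookup are made up for this example; only base R functions are used.

```r
# An environment used as a mutable, string-keyed map (base R only).
scores <- new.env(hash = TRUE)

# Insert or update entries.
assign("alice", 10, envir = scores)
scores[["bob"]] <- 7

# Hypothetical helper: look up a key, returning a default when it is missing.
lookup <- function(map, key, default = NA) {
  if (exists(key, envir = map, inherits = FALSE)) {
    get(key, envir = map, inherits = FALSE)
  } else {
    default
  }
}

lookup(scores, "alice")   # 10
lookup(scores, "carol")   # NA

# Remove a key, then list the remaining keys.
rm("bob", envir = scores)
ls(scores)                # "alice"

# A named list also works as a small map, with the usual copy semantics.
m <- list()
m[["x"]] <- 100
"x" %in% names(m)         # TRUE
```

Unlike a named list, the environment is modified in place, so it can be passed to functions without being copied.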
Usually you get pretty far with these simple measures, but I would love to see a more rigorous and efficient implementation. I would be happy to be wrong, though :-)
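For the Cat class in the question specifically, the error arises because [[<- is applied to the S4 object itself rather than to its list slot. A minimal sketch of one possible fix, assuming you are happy to have addcard return the updated catalog (S4 objects are copied on modification, so the caller has to reassign the result):

```r
library(methods)

Card <- setClass("Card",
  representation(dsOwner = "character", dsFile = "character", dsUrl = "character"))
Cat <- setClass("Cat", representation(cardlist = "list"))

setGeneric("addcard",
  function(catObj, owner, file, url) standardGeneric("addcard"))

setMethod("addcard", signature(catObj = "Cat"),
  function(catObj, owner, file, url) {
    # Index into the list slot, not into the S4 object itself.
    index <- length(catObj@cardlist) + 1
    catObj@cardlist[[index]] <- new("Card",
      dsOwner = owner, dsFile = file, dsUrl = url)
    # Return the modified copy; the caller's object is not changed in place.
    catObj
  })

catalog <- new("Cat")
catalog <- addcard(catalog, owner = "some online resource",
                   file = "some file name", url = "http://some.url")
length(catalog@cardlist)  # 1
```

If in-place modification is wanted instead, a reference class or an environment-backed slot would be the alternative; this sketch keeps the original S4 design.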
Background: I am trying to build a mini spreadsheet application in Julia that could hold potentially thousands of columns/rows. I found that GTK seems to be a fairly mature graphics library for this task (though no spreadsheet application that I know of exists in Julia). GTK's default TreeModel/ListStore works, but writing data to it would almost certainly be a huge pain and very slow, as it requires tuples to pass each row to it...
My attempt based on the documentation from Gtk.jl (does not work as intended, I also created the various methods I want to override):
mutable struct CustomTreeModel <: ListStore
handle::Ptr{Gtk.GObject}
_num_rows::Int
_n_columns::Int
cell_matrix::Matrix{Cell}
function CustomTreeModel(cell_matrix)
ls = ListStore(String)
tm_handle = Base.unsafe_convert(Ptr{Gtk.GObject}, ls)
new_obj = new(tm_handle, size(cell_matrix, 1), size(cell_matrix, 2), cell_matrix)
return gobject_move_ref(new_obj, ls)
end
#Methods to override
get_value(self::CustomTreeModel, iter_, column) = string(self.cell_matrix[iter_.user_data][column])
get_n_columns(self::CustomTreeModel) = self._n_columns
get_column_type(self::CustomTreeModel, column) = String
get_flags(self::CustomTreeModel) = Gtk.GtkTreeModelFlags.GTK_TREE_MODEL_ITERS_PERSIST
...
#I want to override the tree model though
tv = GtkTreeView(GtkTreeModel(CustomTreeModel(cells)))
Based on the GTK documentation, it appears I have two choices: override the TreeModel class's virtual methods (which I can't do from Julia) or register the type. Unfortunately, I have no idea how to do the latter.
Any thoughts on how to implement a custom model using Gtk.jl?
I'm trying to create a "node" class in R, like in Java,
but the code produces an error saying that the class "node" doesn't exist when I declare the field next = "node".
Is it not possible to define a recursive class in R? If it is, how can I do this?
node <- setRefClass("node", fields = list(value="numeric", next="node"))
Error: unexpected '=' in "node <- node <- setRefClass("node", fields = list(value="numeric", next=""
The problem here is that you are using the control-flow keyword next. Try help("next") and this description comes up:
These are the basic control-flow constructs of the R language. They function in much the same way as control statements in any Algol-like language. They are all reserved words.
As such, they cannot be used as variable names, and if they are used as list names they should be quoted. For example, this will work:
setRefClass('node', fields = list(value = 'numeric', 'next' = 'node'))
Note that I wrote 'next' (quoted) and not next.
As a side note I would suggest checking out the R6 package, which provides a simpler interface to OOP than reference classes while also being much faster than reference classes.
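To show how a quoted field name is then used when building and walking nodes, here is a minimal sketch. It is an assumption of this example, not part of the answer above, that the link field is declared as "ANY" so that NULL can mark the end of the chain; the class name Node and the variables a and b are made up.

```r
library(methods)

# The field name must be quoted (or backticked) because next is a reserved word.
# Declaring it as "ANY" is an assumption here, so that NULL can serve as the tail.
Node <- setRefClass("Node",
  fields = list(value = "numeric", `next` = "ANY"))

a <- Node$new(value = 1, `next` = NULL)  # last node in the chain
b <- Node$new(value = 2, `next` = a)     # points at a

# Access also needs backticks around the reserved word.
b$`next`$value       # 1
is.null(a$`next`)    # TRUE
```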
One advantage of Reason ML over JavaScript is that it provides a Map type that uses structural equality rather than reference equality.
However, I cannot find usage examples of this.
For example, how would I declare a type scores that is a map of strings to integers?
/* Something like this */
type scores = Map<string, int>;
And how would I construct an instance?
/* Something like this */
let myMap = scores();
let myMap2 = myMap.set('x', 100);
The standard library Map is actually quite unique in the programming language world in that it is a module functor which you must use to construct a map module for your specific key type (and the API reference documentation is therefore found under Map.Make):
module StringMap = Map.Make({
type t = string;
let compare = compare
});
type scores = StringMap.t(int);
let myMap = StringMap.empty;
let myMap2 = StringMap.add("x", 100, myMap);
There are other data structures you can use to construct map-like functionality, particularly if you need a string key specifically. There's a comparison of different methods in the BuckleScript Cookbook. All except Js.Dict are available outside BuckleScript. BuckleScript also ships with a new Map data structure in its beta standard library which I haven't tried yet.
If you're just dealing with a Map<string, int>, Belt's Map.String would do the trick.
module MS = Belt.Map.String;
let foo: MS.t(int) = [|("a", 1), ("b", 2), ("c", 3)|]->MS.fromArray;
The ergonomics around the Belt version are a little less painful, and they're immutable maps to boot! There's also Map.Int within Belt. For other key types, you'll have to define your own comparator, which takes you back to something similar to the two-step process @glennsl detailed above.
In MATLAB, the properties function sounds like a possible equivalent to the R commands that acquaint you with a particular object in the working environment by providing information about its structure (data.frame, matrix, list, vector) and the types of its variables (character, numeric) (for example, the R command str()), its dimensions (perhaps using dim()), and the names of its variables (names()).
However, this function is not operational in Octave:
>> properties(data)
warning: the 'properties' function is not yet implemented in Octave
I installed the dataframe package, as suggested in a comment on the post linked above, with pkg install -forge dataframe, and loaded it with pkg load dataframe.
But I can't find a way of getting a summary of the structure and dimensions of a dataset, data.mat, in the workspace.
I believe it's a structure consisting of a 4 x 372,550 numerical matrix; two 4 x 46,568 numerical matrices, and a 256 x 1 character matrix. To get this info I had to scroll through many pages of the printout of data.
This info is not available on the Octave IDE, where I get:
Name Class Dimensions
data struc 1 x 1
a far cry from the complexity of the object data.
What is the smart way of getting some of this information about an object in the workspace in Octave?
Following up on the first answer provided, here is what I get with whos:
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
data 1x1 7452040 struct
Total is 1 element using 7452040 bytes
This is not particularly informative about what data really contains. In fact, I just found out a way to extract the names inside data:
>> fieldnames(data)
ans =
{
[1,1] = testData
[2,1] = trainData
[3,1] = validData
[4,1] = vocab
}
Now if I call
>> size(data)
ans =
1 1
the output is not very useful. On the other hand, knowing the names of the matrices within data I can do
>> size(data.trainData)
ans =
4 372550
which is indeed informative.
If you type the name of the variable, you'll see information about it. In your case it's a struct, so it'll tell you the field names. Relevant functions are: size, ndims, class, fieldnames, etc.
size(var)
class(var)
etc.
You refer to .mat. Maybe you have a MAT-file, which you can load with load filename. Once loaded you can examine and use the variables in the file.
whos
prints simple information on the variables in memory, most useful to see what variables exist.
Following up on your edited question. This works in Octave:
for s=fieldnames(data)'
s=s{1};
tmp=data.(s);
disp([s,' - ',class(tmp),' - ',mat2str(size(tmp))])
end
It prints basic information of each of the members of the struct. It does assume that data is a 1x1 struct array. Note that a struct can be an array:
data(2).testData = [];
This causes your data struct to become a 1x2 array, which is why size(data) is relevant. class is also important (it's shown in the output of whos). Variables can be of type double (normal arrays) or another numeric type, logical, struct, cell (an array of arrays), or a custom class that you can write yourself.
I highly recommend reading an introductory text on MATLAB/Octave, as it works very differently from R. It's not just a different flavor of language, it's a whole different world.
I am having some trouble achieving consistent behavior accessing attributes attached to reference class objects. For example,
testClass <- setRefClass('testClass',
methods = list(print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
attr(testInstance, 'testAttribute') <- 1
testInstance$print_attribute('testAttribute')
And the R console cheerily prints NULL. However, if we try another approach,
testClass <- setRefClass('testClass',
methods = list(initialize = function() attr(.self, 'testAttribute') <<- 1,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$print_attribute('testAttribute')
and now we have 1 as expected. Note that the <<- operator is required, presumably because assigning to .self has the same restrictions as assigning to reference class fields. Note that if we had tried to assign outside of the constructor, say
testClass <- setRefClass('testClass',
methods = list(set_attribute = function(name, value) attr(.self, name) <<- value,
print_attribute = function(name) print(attr(.self, name))))
testInstance <- testClass$new()
testInstance$set_attribute('testAttribute', 1)
we would be slapped with
Error in attr(.self, name) <<- value :
cannot change value of locked binding for '.self'
Indeed, the documentation ?setRefClass explains that
The entire object can be referred to in a method by the reserved name .self ... These fields are read-only (it makes no sense to
modify these references), with one exception. In principal, the
.self field can be modified in the $initialize method, because
the object is still being created at this stage.
I am happy with all of this, and agree with the authors' decisions. However, what I am concerned about is the following. Going back to the first example above, if we ask for attr(testInstance, 'testAttribute') from the global environment, we see that it is 1!
Presumably, the .self that is used in the methods of the reference class object is stored in the same memory location as testInstance--it is the same object. Thus, by setting an attribute on testInstance successfully in the global environment, but not as a .self reference (as demonstrated in the first example), have we inadvertently triggered a copy of the entire object in the global environment? Or is the way attributes are stored "funny" in some way that the object can reside in the same memory, but its attributes are different depending on the calling environment?
I see no other explanation for why attr(.self, 'testAttribute') is NULL but attr(testInstance, 'testAttribute') is 1. The binding .self is locked once and for all, but that does not mean the object it references cannot change. If this is the desired behavior, it seems like a gotcha.
A final question is whether or not the preceding results imply attr<- should be avoided on reference class objects, at least if the resulting attributes are used from within the object's methods.
I think I may have figured it out. I began by digging into the implementation of reference classes for references to .self.
bodies <- Filter(function(x) !is.na(x),
structure(sapply(ls(getNamespace('methods'), all.names = TRUE), function(x) {
fn <- get(x, envir = getNamespace('methods'))
if (is.function(fn)) paste(deparse(body(fn)), collapse = "\n") else NA
}), .Names = ls(getNamespace('methods'), all.names = TRUE))
)
Now bodies holds a named character vector of all the functions in the methods package. We now look for .self:
goods <- bodies[grepl("\\.self", bodies)]
length(goods) # 4
names(goods) # [1] ".checkFieldsInMethod" ".initForEnvRefClass" ".makeDefaultBinding" ".shallowCopy"
So there are four functions in the methods package that contain the string .self. Inspecting them shows that .initForEnvRefClass is our culprit. We have the statement selfEnv$.self <- .Object. But what is selfEnv? Well, earlier in that same function, we have .Object@.xData <- selfEnv. Indeed, looking at the attributes on our testInstance from example one gives
$.xData
<environment: 0x10ae21470>
$class
[1] "testClass"
attr(,"package")
[1] ".GlobalEnv"
Peeking into attributes(attr(testInstance, '.xData')$.self) shows that we indeed can access .self directly using this approach. Notice that after executing the first two lines of example one (i.e. setting up testInstance), we have
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] TRUE
Yes! They are equal. Now, if we perform
attr(testInstance, 'testAttribute') <- 1
identical(attributes(testInstance)$.xData$.self, testInstance)
# [1] FALSE
so that adding an attribute to a reference class object has forced a creation of a copy, and .self is no longer identical to the object. However, if we check that
identical(attr(testInstance, '.xData'), attr(attr(testInstance, '.xData')$.self, '.xData'))
# [1] TRUE
we see that the environment attached to the reference class object remains the same. Thus, the copying was not very consequential in terms of memory footprint.
The end result of this foray is that the final answer is yes, you should avoid setting attributes on reference classes if you plan to use them within that object's methods. The reason for this is that the .self object in a reference class object's environment should be considered fixed once and for all after the object has been initialized--and this includes the creation of additional attributes.
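One way to sidestep the issue, sketched here as a suggestion rather than as anything the methods package prescribes, is to keep such metadata in an ordinary field instead of an attribute. The field name meta and the methods set_meta and get_meta are hypothetical names chosen for this example.

```r
library(methods)

# Metadata lives in a list field, so methods and outside callers
# read and write the same storage inside the object's environment.
TestClass <- setRefClass("TestClass",
  fields = list(meta = "list"),
  methods = list(
    set_meta = function(name, value) {
      meta[[name]] <<- value   # fields are modified with <<- inside methods
      invisible(.self)
    },
    get_meta = function(name) meta[[name]]
  ))

obj <- TestClass$new()
obj$set_meta("testAttribute", 1)

obj$get_meta("testAttribute")  # 1, seen from inside a method
obj$meta$testAttribute         # 1, seen from the calling environment
```

Because the field is stored in the object's environment, no copy of the outer object is made and .self never goes out of sync.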
Since the .self object is stored in an environment that is attached as an attribute to the reference class object, it does not seem possible to avoid this problem without using pointer yoga--and R does not have pointers.
Edit
It appears that if you are crazy, you can do
unlockBinding('.self', attr(testInstance, '.xData'))
attr(attr(testInstance, '.xData')$.self, 'testAttribute') <- 1
lockBinding('.self', attr(testInstance, '.xData'))
and the problems above magically go away.