Using multiple constructors for R classes and subclasses

Using multiple constructors for R classes and subclasses - r

I would like to use multiple constructors in my R S4 class.
I have an object that has three slots. To make that object, sometimes I want to just give the values for the three slots outright. But sometimes I'd like to provide a matrix, and I have a function that can take a matrix and return what those three slots should be.
At first, it seems like I could write a function as a constructor. So I could write objectFromMatrix(matrix) --> object with three slots. The problem is that I also have sub-classes that inherit from that main class, and I want to be able to use that constructor with them as well.
So I could just write functions as extra constructors for each of the subclasses, but that would be a bit tedious and not super OO-like.
To make my problem a little more tangible, I'll try to write a minimal example below. I'll write it in Java, but I'm a bit rusty so let me know if it doesn't make sense.
Desired structure, in Java:
// An abode is a place where you live and it has a size
class Abode {
int size = 1;
// Main constructor that just assigns args to fields
Abode(int size) {
this.size = size;
}
// Alternative constructor that takes in a different datatype
// and computes args to assign to fields
Abode(string description) {
if(description eq "Large") {
this.size = 5;
}
if(description eq "Small") {
this.size = 1;
}
}
// To keep it simple, a house is just an abode with a name
class House extends Abode {
String name;
House(int size, String name) {
super(size);
this.name = name;
}
House(string size, String name) {
super(size);
this.name = name;
}
}
This implementation works nicely because I can call Abode("big") or House("big", "Casa de me"), and both of those get passed to the extra constructor I built in the Abode class.
Keeping up with the house analogy, this is the best I've been able to do in R:
# An abode is a place you live and it has a size
setClass("Abode",
slots =
list(size = "numeric")
)
# Alternative constructor that takes in a different datatype
# and computes args to assign to fields
abode_constructor_2 <- function(sizeString) {
if (sizeString == "big") {return new("Abode", size = 5)}
if (sizeString == "small") {return new("Abode", size = 1)}
}
# A house is an abode with a name
setClass("House",
slots =
list(name = "string"),
contains = "Abode"
)
# I already defined this constructor but I have to do it again
house_constructor_2 <- function(sizeString, name) {
if (sizeString == "big") {return new("House", size = 5, name = name)}
if (sizeString == "small") {return new("House", size = 1, name = name)}
}
In case it helps, here is a minimal example of the real context where this problem is coming up. I define an extra constructor for the Sensor class, sensor_constructor_2, as a function. But then, when I have a class that inherits from Sensor, I have to make that constructor over again.
# A sensor has three parameters
setClass("Sensor",
slots =
list(Rmin = "numeric", Rmax = "numeric", delta = "numeric")
)
# I also like to make sensors from a matrix
sensor_constructor_2 <- function(matrix) {
params <- matrix_to_params(matrix)
return (new("Sensor", Rmin = params[1], Rmax = params[2], delta = params[3]))
}
# A redoxSensor is just a sensor with an extra field
setClass("redoxSensor",
slots =
list(e0 = "numeric"),
contains = "Sensor"
)
# Goal: make this extra constructor unnecessary by making sensor_constructor_2 a property of the sensor class
extraConstructor_redox <- function(matrix, e0) {
params <- matrix_to_params(matrix)
return (new("redoxSensor", Rmin = params[1], Rmax = params[2], delta = params[3]), e0 = e0)
}

There is no reason why you can't do this with one S4 constructor by using default arguments and a little extra logic, along the lines of
setClass("Abode",
slots = list(size = "numeric")
) -> Abode
setClass("House",
slots = list(name = "character"),
contains = "Abode"
) -> House
createDwelling <- function(size=0,name,sizeString){
if(!missing(sizeString)){
if(sizeString == "Large") size <- 5
else if(sizeString == "Small") size <- 1
else stop("invalid sizeString")
}
if(missing(name)) return(Abode(size=size))
else return(House(size=size,name=name))
}
example usage:
> createDwelling(size=3)
An object of class "Abode"
Slot "size":
[1] 3
> createDwelling(sizeString="Small")
An object of class "Abode"
Slot "size":
[1] 1
> createDwelling(sizeString="Small",name="my house")
An object of class "House"
Slot "name":
[1] "my house"
Slot "size":
[1] 1

Related

Using R, how to scope internal functions within a MAIN function?

My young son and I were playing a board game called Snails Pace. Simple enough, so I sat down to show him the game on the R-console.
Helper function
num.round = function(n, by=5)
{
byidx = (n %% by == 0); # these already are indexed well
new = by * as.integer((n + by) / by);
res = n;
res[!byidx] = new[!byidx];
res;
}
Primary function
snails.pace = function(moves = 200, finish.line = 8,
snail.x = NULL,
snail.y = NULL,
snail.col = NULL
)
{
if(is.null(snail.x)) { snail.x = 0*(1:6); }
if(is.null(snail.y)) { snail.y = 1*(1:6); }
if(is.null(snail.col)) { snail.col = c("orange", "blue", "pink", "green", "yellow", "red"); }
snail.rank = 0*snail.x;
crank = 1; # current rank
move.number = 0;
snails.plot = function(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank)
{
xmax = max(10, max(snail.x) );
ymax = max(8, max(snail.y) );
plot(snail.x, snail.y,
col=snail.col,
pch=16, cex=5,
xlim=c(0, num.round(xmax, 5) ),
ylim=c(0, num.round(ymax, 4) ),
axes=FALSE,
frame.plot=FALSE,
xlab="", ylab="",
main=paste0("Move #", move.number, " of ", moves)
);
#axis(gr.side("bottom"));
axis(1);
has.rank = (snail.rank != 0);
snails.lab = paste0(snail.x, "*", snail.rank);
snails.lab[!has.rank] = snail.x[!has.rank];
text(snail.x, y=snail.y, labels=snails.lab, col="black");
abline(v = finish.line, col="gray", lty="dashed");
}
snails.update = function(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank)
{
x = readline(prompt="Press [enter] to continue, [ESC] to quit");
n = sample(1:6, 1);
snail.x[n] = 1 + snail.x[n];
if( (snail.rank[n] == 0) && (snail.x[n] >= finish.line) )
{
snail.rank[n] = crank;
crank = 1 + crank;
# update to MAIN environment
assign("snail.rank", snail.rank, envir=parent.frame() );
assign("crank", crank, envir=parent.frame() );
}
snail.x;
}
snails.plot(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank);
while(move.number < moves)
{
move.number = 1 + move.number;
snail.x = snails.update(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank);
snails.plot(snail.x, snail.y, snail.rank, move.number, moves, finish.line, crank);
}
}
Game play
snails.pace();
Question: how to scope internal functions within MAIN environoment?
The MAIN function is snails.pace(). You will notice in the internal function snails.update, I update two variables and assign them back to the MAIN scope using assign.
Is there a way at the MAIN level I can define all the variables and just USE them within all internal functions without having to assign them back or returning the updating values?
As you can see in my CODE, I call all of the variables into the functions and either "back assign" or return any changes. I would prefer to just set a new env() or something and have MAIN work like R-Global seems to. Any suggestions on how to do that?
That is, my internal functions would not pass anything in: snails.plot = function() and snails.update = function() AS they would get the LOCAL environment variables (defined as within MAIN defined as snails.pace()). And ideally update the LOCAL environment variables by updating the value within the internal function.
Update
So it appears that I can drop the function passing. See:
snails.pace2 = function(moves = 200, finish.line = 8,
snail.x = NULL,
snail.y = NULL,
snail.col = NULL
)
{
if(is.null(snail.x)) { snail.x = 0*(1:6); }
if(is.null(snail.y)) { snail.y = 1*(1:6); }
if(is.null(snail.col)) { snail.col = c("orange", "blue", "pink", "green", "yellow", "red"); }
snail.rank = 0*snail.x;
crank = 1; # current rank
move.number = 0;
snails.plot = function()
{
xmax = max(10, max(snail.x) );
ymax = max(8, max(snail.y) );
plot(snail.x, snail.y,
col=snail.col,
pch=16, cex=5,
xlim=c(0, num.round(xmax, 5) ),
ylim=c(0, num.round(ymax, 4) ),
axes=FALSE,
frame.plot=FALSE,
xlab="", ylab="",
main=paste0("Move #", move.number, " of ", moves)
);
#axis(gr.side("bottom"));
axis(1);
has.rank = (snail.rank != 0);
snails.lab = paste0(snail.x, "*", snail.rank);
snails.lab[!has.rank] = snail.x[!has.rank];
text(snail.x, y=snail.y, labels=snails.lab, col="black");
abline(v = finish.line, col="gray", lty="dashed");
}
snails.update = function()
{
x = readline(prompt="Press [enter] to continue, [ESC] to quit");
n = sample(1:6, 1);
snail.x[n] = 1 + snail.x[n];
if( (snail.rank[n] == 0) && (snail.x[n] >= finish.line) )
{
snail.rank[n] = crank;
crank = 1 + crank;
# update to MAIN environment
assign("snail.rank", snail.rank, envir=parent.frame() );
assign("crank", crank, envir=parent.frame() );
}
snail.x;
}
snails.plot();
while(move.number < moves)
{
move.number = 1 + move.number;
snail.x = snails.update();
snails.plot();
}
}
#MrFlick is correct about the lexical scoping, if I understand the above correctly. If an internal updates something from MAIN, it has to assign it back to MAIN I guess <<- or assign ... parent. Is there not a way to tell the internal SUBFUNCTIONS to SCOPE at the same level of MAIN?

There are two completely different concepts called "parent" in R: the parent.frame() of a call, and the parent.env() of an environment.
parent.frame() walks up the chain of the stack of calls. If you have a recursive function that calls itself, it will appear multiple times in that chain.
In general, it's dangerous to use parent.frame(), because even if the context in which you use it now makes it clear which environment will be the parent.frame(), at some future time you might change your program (e.g. make the internal function into a recursive one, or call it from another internal function), and then parent.frame() will refer to something different.
The parent.env() function applies to an environment; parent.env(environment()) gives you the enclosing environment of the current one. If you call parent.env(environment()) it will always refer to the environment where your current function was defined. It doesn't matter how you called it, just how you defined it. So you always know what will happen if you assign there, and it's much safer in the long term than using parent.frame().
The <<- "super-assignment" works with enclosing environments, not the stack of calls. If you do var <<- value, then as long as you are sure that var was defined in the enclosing function, you can be sure that's what gets modified.
One flaw in R is that it doesn't enforce the existence of var there, so that's why some people say <<- is "sloppy". If you accidentally forget to define it properly, or spell it wrong, R will search back through the whole chain of environments to try to do what you asked, and if it never finds a matching variable, it will do the assignment in the global environment. You almost never want to do that: keep side effects minimal.
So, to answer the question "Is there a way at the MAIN level I can define all the variables and just USE them within all internal functions without having to assign them back or returning the updating values?": as you found in your edit, the nested function can read the value of any variable in the MAIN function without requiring any special code. To modify those variables, be sure both snail.rank and crank are defined in MAIN, then use <<- in the nested function to assign new values to them.

To have a function f defined within another function main such that f has the same scope as main surround the entire body of f with eval.parent(substitute({...})) like this:
main <- function() {
f <- function() eval.parent(substitute({
a <- a + 1
b <- 0.5
}))
a <- 1
f()
f()
10 * a + b
}
main()
## [1] 30.5
The gtools package has defmacro which allows the same thing and uses the same technique internally. Also see the wrapr package.

Invert Map<K, List<V>> to Map<V, K>

map = mapOf((2: [3,4,5]), (7: [22,33,44]))
need to convert this to
mapOf(3:2, 4:2, 5:2, 22:7, 33:7, 44:7)
tried using associate with forEach, not sure of the syntax

There might be some nicer syntax, but this should work well enough.
fun main() {
val map = mapOf(
2 to listOf(3, 4, 5),
7 to listOf(22, 33, 44)
)
val transformedMap = map.flatMap { entry ->
entry.value.map { it to entry.key }
}.toMap()
println(transformedMap)
}
Prints
{3=2, 4=2, 5=2, 22=7, 33=7, 44=7}
Note that the toMap function states
The returned map preserves the entry iteration order of the original collection. If any of two pairs would have the same key the last one gets added to the map.
So if you have the same value in two different lists, only the last one will be included in the map.
fun main() {
val map = mapOf(
2 to listOf(3, 4, 5),
7 to listOf(22, 33, 44),
8 to listOf(3)
)
val transformedMap = map.flatMap { entry ->
entry.value.map { it to entry.key }
}.toMap()
println(transformedMap)
}
Prints {3=8, 4=2, 5=2, 22=7, 33=7, 44=7}

Zymus' answer is correct, and is also what I would probably write.
However, if this is something that will be called often, you might want to extract it to a separate function that is more efficient.
fun <K, V> Map<K, Iterable<V>>.invert(): Map<V, K> {
val newMap = mutableMapOf<V, K>()
for ((key, iterable) in this) {
for (value in iterable) {
newMap[value] = key
}
}
return newMap
}
Usage:
fun main() {
val map = mapOf((2 to listOf(3, 4, 5)), (7 to listOf(22, 33, 44)))
val inverted = map.invert()
println(inverted)
}
Output:
{3=2, 4=2, 5=2, 22=7, 33=7, 44=7}
This is functionally equivalent to
map.flatMap { (key, values) -> values.map { it to key } }.toMap()
including the behaviour where if there are duplicate values in the original input, only the last one will be preserved as a new key. However, the flatMap version creates many temporary Lists (the number of original keys + 1) and many temporary Pairs (the number of original values), whereas this iterative version creates no extra objects.

C5.0 package: Error in paste(apply(x, 1, paste, collapse = ","), collapse = "\n") : result would exceed 2^31-1 bytes

When trying to train a model with a dataset of around 3 million rows and 600 columns using the C5.0 CRAN package I get the following error:
Error in paste(apply(x, 1, paste, collapse = ","), collapse = "\n") : result would exceed 2^31-1 bytes
From what the owner of the repository answered to a similar issue, it is due to an R limitation in the number of bytes in a character string, which is limited to 2^31 - 1.

Long answer ahead:
So, as stated in the question, the error occurs in the last line of the makeDataFile function from the Cubist package, used in C5.0, which concatenates all rows into one string. As this string is needed to pass the data to the C5.0 function in C, but is not needed to make any operations in R, and C has no memory limitation aside from those of the machine itself, the approach I have taken is to create such string in C instead. In order to do this, the R code will pass the information in a character vector containing various strings that don’t surpass the length limit, instead of one, so that once in C these elements can be concatenated.
However, instead of leaving all rows as separate elements in the character vector to be concatenated in C using strcat in a loop, I have found that the strcat function is quite slow, so I have chosen to create another R function (create_max_len_strings) in order to concatenate the rows into the longest (~or close~) strings possible without reaching the memory limit so that strcat only needs to be applied a few times to concatenate these longer strings.
So, the last line of the original makeDataFile() function will be replaced so that each row is left separately as an element of a character vector, only adding a line break at the end of each string row so that when concatenating some of these elements into longer strings, using create_max_len_strings(), they will be differentiated:
makeDataFile.R:
create_max_len_strings <- function(original_vector) {
vector_length = length(original_vector)
nchars = sum(nchar(original_vector, type = "chars"))
## Check if the length of the string would reach 1900000000, which is close to the memory limitation
if(nchars >= 1900000000){
## Calculate how many strings we could create of the maximum length
nchunks = 0
while(nchars > 0){
nchars = nchars - 1900000000
nchunks = nchunks + 1
}
## Get the number of rows that would be contained in each string
chunk_size = vector_length/nchunks
## Get the rounded number of rows in each string
chunk_size = floor(chunk_size)
index = chunk_size
## Create a vector with the indexes of the rows that delimit each string
indexes_vector = c()
indexes_vector = append(indexes_vector, 0)
n = nchunks
while(n > 0){
indexes_vector = append(indexes_vector, index)
index = index + chunk_size
n = n - 1
}
## Get the last few rows if the division had remainder
remainder = vector_length %% nchunks
if (remainder != 0){
indexes_vector = append(indexes_vector, vector_length)
nchunks = nchunks + 1
}
## Create the strings pasting together the rows from the indexes in the indexes vector
strings_vector = c()
i = 2
while (i <= length(indexes_vector)){
## Sum 1 to the index_init so that the next string does not contain the last row of the previous string
index_init = indexes_vector[i-1] + 1
index_end = indexes_vector[i]
## Paste the rows from the vector from index_init to index_end
string <- paste0(original_vector[index_init:index_end], collapse="")
## Create vector containing the strings that were created
strings_vector <- append(strings_vector, string)
i = i + 1
}
}else {
strings_vector = paste0(original_vector, collapse="")
}
strings_vector
}
makeDataFile <- function(x, y, w = NULL) {
## Previous code stays the same
...
x = apply(x, 1, paste, collapse = ",")
x = paste(x, "\n", sep="")
char_vec = create_max_len_strings(x)
}
CALLING C5.0
Now, in order to create the final string to pass to the c50() function in C, an intermediate function is created and called instead. In order to do this, the .C() statement that calls c50() in R is replaced with a .Call() statement calling this function, as .Call() allows for complex objects such as vectors to be passed to C. Also, it allows for the result to be returned in the variable result instead of having to pass back the variables tree, rules and output by reference. The result of calling C5.0 will be received in the character vector result containing the strings corresponding to the tree, rules and output in the first three positions:
C5.0.R:
C5.0.default <- function(x,
y,
trials = 1,
rules = FALSE,
weights = NULL,
control = C5.0Control(),
costs = NULL,
...) {
## Previous code stays the same
...
dataString <- makeDataFile(x, y, weights)
num_chars = sum(nchar(dataString, type = "chars"))
result <- .Call(
"call_C50",
as.character(namesString),
dataString,
as.character(num_chars), ## The length of the resulting string is passed as character because it is too long for an integer
as.character(costString),
as.logical(control$subset),
# -s "use the Subset option" var name: SUBSET
as.logical(rules),
# -r "use the Ruleset option" var name: RULES
## for the bands option, I'm not sure what the default should be.
as.integer(control$bands),
# -u "sort rules by their utility into bands" var name: UTILITY
## The documentation has two options for boosting:
## -b use the Boosting option with 10 trials
## -t trials ditto with specified number of trial
## I think we should use -t
as.integer(trials),
# -t : " ditto with specified number of trial", var name: TRIALS
as.logical(control$winnow),
# -w "winnow attributes before constructing a classifier" var name: WINNOW
as.double(control$sample),
# -S : use a sample of x% for training
# and a disjoint sample for testing var name: SAMPLE
as.integer(control$seed),
# -I : set the sampling seed value
as.integer(control$noGlobalPruning),
# -g: "turn off the global tree pruning stage" var name: GLOBAL
as.double(control$CF),
# -c: "set the Pruning CF value" var name: CF
## Also, for the number of minimum cases, I'm not sure what the
## default should be. The code looks like it dynamically sets the
## value (as opposed to a static, universal integer
as.integer(control$minCases),
# -m : "set the Minimum cases" var name: MINITEMS
as.logical(control$fuzzyThreshold),
# -p "use the Fuzzy thresholds option" var name: PROBTHRESH
as.logical(control$earlyStopping)
)
## Get the first three positions of the character vector that contain the tree, rules and output returned by C5.0 in C
result_tree = result[1]
result_rules = result[2]
result_output = result[3]
modelContent <- strsplit(
if (rules)
result_rules
else
result_tree, "\n"
)[[1]]
entries <- grep("^entries", modelContent, value = TRUE)
if (length(entries) > 0) {
actual <- as.numeric(substring(entries, 10, nchar(entries) - 1))
} else
actual <- trials
if (trials > 1) {
boostResults <- getBoostResults(result_output)
## This next line is here to avoid a false positive warning in R
## CMD check:
## * checking R code for possible problems ... NOTE
## C5.0.default: no visible binding for global variable 'Data'
Data <- NULL
size <-
if (!is.null(boostResults))
subset(boostResults, Data == "Training Set")$Size
else
NA
} else {
boostResults <- NULL
size <- length(grep("[0-9])$", strsplit(result_output, "\n")[[1]]))
}
out <- list(
names = namesString,
cost = costString,
costMatrix = costs,
caseWeights = !is.null(weights),
control = control,
trials = c(Requested = trials, Actual = actual),
rbm = rules,
boostResults = boostResults,
size = size,
dims = dim(x),
call = funcCall,
levels = levels(y),
output = result_output,
tree = result_tree,
predictors = colnames(x),
rules = result_rules
)
class(out) <- "C5.0"
out
}
Now onto the C code, the function call_c50() basically acts as an intermediate between the R code and the C code, concatenating the elements in the dataString array to obtain the string needed by the C function c50(), by accessing each position of the array using CHAR(STRING_ELT(x, i)) and concatenating (strcat) them together. Then the rest of the variables are casted to their respective types and the c50() function in file top.c (where this function should also be placed) is called. The result of calling c50() will be returned to the R routine by creating a character vector and placing the strings corresponding to the tree, rules and output in each position.
Lastly, the c50() function is basically left as is, except for the variables treev, rulesv and outputv, as these are the values that are going to be returned by .Call() instead of being passed by reference, they no longer need to be in the arguments of the function. As they are all strings they can be returned in a single array, by setting each string to a position in the array c50_return.
top.c:
SEXP call_C50(SEXP namesString, SEXP data_vec, SEXP datavec_len, SEXP costString, SEXP subset, SEXP rules, SEXP bands, SEXP trials, SEXP winnow, SEXP sample,
SEXP seed, SEXP noGlobalPruning, SEXP CF, SEXP minCases, SEXP fuzzyThreshold, SEXP earlyStopping){
char* string;
char* concat;
long n = 0;
long size;
int i;
char* eptr;
// Get the length of the data vector
n = length(data_vec);
// Get the string indicating the length of the final string
char* size_str = malloc((strlen(CHAR(STRING_ELT(datavec_len, 0)))+1)*sizeof(char));
strcpy(size_str, CHAR(STRING_ELT(datavec_len, 0)));
// Turn the string to long
size = strtol(size_str, &eptr, 10);
// Allocate memory for the number of characters indicated by datavec_len
string = malloc((size+1)*sizeof(char));
// Copy the first element of data_vec into the string variable
strcpy(string, CHAR(STRING_ELT(data_vec, 0)));
// Loop over the data vector until all elements are concatenated in the string variable
for (i = 1; i < n; i++) {
strcat(string, CHAR(STRING_ELT(data_vec, i)));
}
// Copy the value of namesString into a char*
char* namesv = malloc((strlen(CHAR(STRING_ELT(namesString, 0)))+1)*sizeof(char));
strcpy(namesv, CHAR(STRING_ELT(namesString, 0)));
// Copy the value of costString into a char*
char* costv = malloc((strlen(CHAR(STRING_ELT(costString, 0)))+1)*sizeof(char));
strcpy(costv, CHAR(STRING_ELT(costString, 0)));
// Call c50() function casting the rest of arguments into their respective C types
char** c50_return = c50(namesv, string, costv, asLogical(subset), asLogical(rules), asInteger(bands), asInteger(trials), asLogical(winnow), asReal(sample), asInteger(seed), asInteger(noGlobalPruning), asReal(CF), asInteger(minCases), asLogical(fuzzyThreshold), asLogical(earlyStopping));
free(string);
free(namesv);
free(costv);
// Create a character vector to be returned to the C5.0 R function
SEXP out = PROTECT(allocVector(STRSXP, 3));
SET_STRING_ELT(out, 0, mkChar(c50_return[0]));
SET_STRING_ELT(out, 1, mkChar(c50_return[1]));
SET_STRING_ELT(out, 2, mkChar(c50_return[2]));
UNPROTECT(1);
return out;
}
static char** c50(char *namesv, char *datav, char *costv, int subset,
int rules, int utility, int trials, int winnow,
double sample, int seed, int noGlobalPruning, double CF,
int minCases, int fuzzyThreshold, int earlyStopping) {
int val; /* Used by setjmp/longjmp for implementing rbm_exit */
char ** c50_return = malloc(3 * sizeof(char*));
// Initialize the globals to the values that the c50
// program would have at the start of execution
initglobals();
// Set globals based on the arguments. This is analogous
// to parsing the command line in the c50 program.
setglobals(subset, rules, utility, trials, winnow, sample, seed,
noGlobalPruning, CF, minCases, fuzzyThreshold, earlyStopping,
costv);
// Handles the strbufv data structure
rbm_removeall();
// Deallocates memory allocated by NewCase.
// Not necessary since it's also called at the end of this function,
// but it doesn't hurt, and I'm feeling paranoid.
FreeCases();
// XXX Should this be controlled via an option?
// Rprintf("Calling setOf\n");
setOf();
// Create a strbuf using *namesv as the buffer.
// Note that this is a readonly strbuf since we can't
// extend *namesv.
STRBUF *sb_names = strbuf_create_full(namesv, strlen(namesv))
// Register this strbuf using the name "undefined.names"
if (rbm_register(sb_names, "undefined.names", 0) < 0) {
error("undefined.names already exists");
}
// Create a strbuf using *datav and register it as "undefined.data"
STRBUF *sb_datav = strbuf_create_full(datav, strlen(datav));
// XXX why is sb_datav copied? was that part of my debugging?
// XXX or is this the cause of the leak?
if (rbm_register(strbuf_copy(sb_datav), "undefined.data", 0) < 0) {
error("undefined data already exists");
}
// Create a strbuf using *costv and register it as "undefined.costs"
if (strlen(costv) > 0) {
// Rprintf("registering cost matrix: %s", *costv);
STRBUF *sb_costv = strbuf_create_full(costv, strlen(costv));
// XXX should sb_costv be copied?
if (rbm_register(sb_costv, "undefined.costs", 0) < 0) {
error("undefined.cost already exists");
}
} else {
// Rprintf("no cost matrix to register\n");
}
/*
* We need to initialize rbm_buf before calling any code that
* might call exit/rbm_exit.
*/
if ((val = setjmp(rbm_buf)) == 0) {
// Real work is done here
c50main();
if (rules == 0) {
// Get the contents of the the tree file
STRBUF *treebuf = rbm_lookup("undefined.tree");
if (treebuf != NULL) {
char *treeString = strbuf_getall(treebuf);
c50_return[0] = R_alloc(strlen(treeString) + 1, 1);
strcpy(c50_return[0], treeString);
c50_return[1] = "";
} else {
// XXX Should *treev be assigned something in this case?
// XXX Throw an error?
}
} else {
// Get the contents of the the rules file
STRBUF *rulesbuf = rbm_lookup("undefined.rules");
if (rulesbuf != NULL) {
char *rulesString = strbuf_getall(rulesbuf);
c50_return[1] = R_alloc(strlen(rulesString) + 1, 1);
strcpy(c50_return[1], rulesString);
c50_return[0] = "";
} else {
// XXX Should *rulesv be assigned something in this case?
// XXX Throw an error?
}
}
} else {
Rprintf("c50 code called exit with value %d\n", val - JMP_OFFSET);
}
// Close file object "Of", and return its contents via argument outputv
char *outputString = closeOf();
c50_return[2] = R_alloc(strlen(outputString) + 1, 1);
strcpy(c50_return[2], outputString);
// Deallocates memory allocated by NewCase
FreeCases();
// We reinitialize the globals on exit out of general paranoia
initglobals();
return c50_return;
}
***IMPORTANT: if the string created is longer than 2147483647, you also will need to change the definition of the variables i and j in the function strbuf_gets() in strbuf.c. This function basically iterates through each position of the string, so trying to increase their value above the INT limit to access those positions in the array will cause a segmentation fault. I suggest changing the declaration type to long in order to avoid this issue.
C5.0 PREDICTIONS
However, as the makeDataFile function is not only used to create the model but also to pass the data to the predictions() function, this function will also have to be modified. Just like previously, the .C() statement in predict.C5.0() used to call predictions() will be replaced with a .Call() statement in order to be able to pass the character vector to C, and the result will be returned in the result variable instead of being passed by reference:
predict.C5.0.R:
predict.C5.0 <- function (object,
newdata = NULL,
trials = object$trials["Actual"],
type = "class",
na.action = na.pass,
...) {
## Previous code stays the same
...
caseString <- makeDataFile(x = newdata, y = NULL)
num_chars = sum(nchar(caseString, type = "chars"))
## When passing trials to the C code, convert to
## zero if the original version of trials is used
if (trials <= 0)
stop("'trials should be a positive integer", call. = FALSE)
if (trials == object$trials["Actual"])
trials <- 0
## Add trials (not object$trials) as an argument
results <- .Call(
"call_predictions",
caseString,
as.character(num_chars),
as.character(object$names),
as.character(object$tree),
as.character(object$rules),
as.character(object$cost),
pred = integer(nrow(newdata)),
confidence = double(length(object$levels) * nrow(newdata)),
trials = as.integer(trials)
)
predictions = as.numeric(unlist(results[1]))
confidence = as.numeric(unlist(results[2]))
output = as.character(results[3])
if(any(grepl("Error limit exceeded", output)))
stop(output, call. = FALSE)
if (type == "class") {
out <- factor(object$levels[predictions], levels = object$levels)
} else {
out <-
matrix(confidence,
ncol = length(object$levels),
byrow = TRUE)
if (!is.null(rownames(newdata)))
rownames(out) <- rownames(newdata)
colnames(out) <- object$levels
}
out
}
In the file top.c, the predictions() function will be modified to receive the variables passed by the .Call() statement, so that just like previously, the caseString array will be concatenated into a single string and the rest of the variables casted to their respective types. In this case the variables pred and confidence will be also received as vectors of integer and double types and so they will need to be casted to int* and double*. The rest of the function is left as it was in order to create the predictions and the resulting variables predv, confidencev and output variables will be placed in the first three positions of a vector respectively.
top.c:
SEXP call_predictions(SEXP caseString, SEXP case_len, SEXP names, SEXP tree, SEXP rules, SEXP cost, SEXP pred, SEXP confidence, SEXP trials){
char* casev;
char* outputv = "";
char* eptr;
char* size_str = malloc((strlen(CHAR(STRING_ELT(case_len, 0)))+1)*sizeof(char));
strcpy(size_str, CHAR(STRING_ELT(case_len, 0)));
long size = strtol(size_str, &eptr, 10);
casev = malloc((size+1)*sizeof(char));
strcpy(casev, CHAR(STRING_ELT(caseString, 0)));
int n = length(caseString);
for (int i = 1; i < n; i++) {
strcat(casev, CHAR(STRING_ELT(caseString, i)));
}
char* namesv = malloc((strlen(CHAR(STRING_ELT(names, 0)))+1)*sizeof(char));
strcpy(namesv, CHAR(STRING_ELT(names, 0)));
char* treev = malloc((strlen(CHAR(STRING_ELT(tree, 0)))+1)*sizeof(char));
strcpy(treev, CHAR(STRING_ELT(tree, 0)));
char* rulesv = malloc((strlen(CHAR(STRING_ELT(rules, 0)))+1)*sizeof(char));
strcpy(rulesv, CHAR(STRING_ELT(rules, 0)));
char* costv = malloc((strlen(CHAR(STRING_ELT(cost, 0)))+1)*sizeof(char));
strcpy(costv, CHAR(STRING_ELT(cost, 0)));
int variable;
int* predv = &variable;
int npred = length(pred);
predv = malloc((npred+1)*sizeof(int));
for (int i = 0; i < npred; i++) {
predv[i] = INTEGER(pred)[i];
}
double variable1;
double* confidencev = &variable1;
int nconf = length(confidence);
confidencev = malloc((nconf+1)*sizeof(double));
for (int i = 0; i < nconf; i++) {
confidencev[i] = REAL(confidence)[i];
}
int* trialsv = &variable;
*trialsv = asInteger(trials);
/* Original code for predictions starts */
int val;
// Announce ourselves for testing
// Rprintf("predictions called\n");
// Initialize the globals
initglobals();
// Handles the strbufv data structure
rbm_removeall();
// XXX Should this be controlled via an option?
// Rprintf("Calling setOf\n");
setOf();
STRBUF *sb_cases = strbuf_create_full(casev, strlen(casev));
if (rbm_register(sb_cases, "undefined.cases", 0) < 0) {
error("undefined.cases already exists");
}
STRBUF *sb_names = strbuf_create_full(namesv, strlen(namesv));
if (rbm_register(sb_names, "undefined.names", 0) < 0) {
error("undefined.names already exists");
}
if (strlen(treev)) {
STRBUF *sb_treev = strbuf_create_full(treev, strlen(treev));
if (rbm_register(sb_treev, "undefined.tree", 0) < 0) {
error("undefined.tree already exists");
}
} else if (strlen(rulesv)) {
STRBUF *sb_rulesv = strbuf_create_full(rulesv, strlen(rulesv));
if (rbm_register(sb_rulesv, "undefined.rules", 0) < 0) {
error("undefined.rules already exists");
}
setrules(1);
} else {
error("either a tree or rules must be provided");
}
// Create a strbuf using *costv and register it as "undefined.costs"
if (strlen(costv) > 0) {
// Rprintf("registering cost matrix: %s", *costv);
STRBUF *sb_costv = strbuf_create_full(costv, strlen(costv));
// XXX should sb_costv be copied?
if (rbm_register(sb_costv, "undefined.costs", 0) < 0) {
error("undefined.cost already exists");
}
} else {
// Rprintf("no cost matrix to register\n");
}
if ((val = setjmp(rbm_buf)) == 0) {
// Real work is done here
// Rprintf("\n\nCalling rpredictmain\n");
rpredictmain(trialsv, predv, confidencev);
// Rprintf("predict finished\n\n");
} else {
// Rprintf("predict code called exit with value %d\n\n", val - JMP_OFFSET);
}
// Close file object "Of", and return its contents via argument outputv
char *outputString = closeOf();
char *output = R_alloc(strlen(outputString) + 1, 1);
strcpy(output, outputString);
// We reinitialize the globals on exit out of general paranoia
initglobals();
/* Original code for predictions ends */
free(namesv);
free(treev);
free(rulesv);
free(costv);
SEXP predx = PROTECT(allocVector(INTSXP, npred));
for (int i = 0; i < npred; i++) {
INTEGER(predx)[i] = predv[i];
}
SEXP confidencex = PROTECT(allocVector(REALSXP, nconf));
for (int i = 0; i < npred; i++) {
REAL(confidencex)[i] = confidencev[i];
}
SEXP outputx = PROTECT(allocVector(STRSXP, 1));
SET_STRING_ELT(outputx, 0, mkChar(output));
SEXP vector = PROTECT(allocVector(VECSXP, 3));
SET_VECTOR_ELT(vector, 0, predx);
SET_VECTOR_ELT(vector, 1, confidencex);
SET_VECTOR_ELT(vector, 2, outputx);
UNPROTECT(4);
free(predv);
free(confidencev);
return vector;
}

Gouraud Shading in JavaFX

Each face has 3 colors (one for each vertex). I want to use Gouraud shading to blend these colors. So far, I have taken some inspiration from the FXyz library.
My current approach uses the setTextureModeVertices3D from the FXyz library. But this uses density maps, which does not work in my case because the colors don't come from a mathematical formula. My initial idea was to implement it as follows:
Calculate the color for each vertex in the mesh.
Extract a list of all the unique colors
Make a palette of the unique colors
val palette = object : ColorPalette {
override fun getNumColors() = colors.size
override fun getColor(i: Int) = colors.getOrNull(i)?: Color.BLACK
}
Make a map with as key the point value (x,y,z) and with as value the index of the color
Make a density function that returns the color index from the aforementioned map.
{ point3F ->
val key = Triple(point3F.x.toInt(), point3F.y.toInt(), point3F.z.toInt())
pointColorIndexMap[key]!!
}
I have a feeling my only option is to create one large image where I put in all shaded triangles, and then references those. But I am unsure what the best technique would be here. Any help is appreciated!
Edit:
This is the code I currently use (this is written in Kotlin):
override fun updateMesh() {
val definition = model.modelDefinition
val (colors1, colors2, colors3) = definition.calculateFaceColors()
val uniqueColorHSBValues = (colors1 + colors2 + colors3).toSet().toList()
val uniqueColors = uniqueColorHSBValues.map { ModelUtil.hsbToColor(it, null) }
val palette = object : ColorPalette {
override fun getNumColors() = uniqueColors.size
override fun getColor(i: Int) = uniqueColors.getOrNull(i)?: Color.BLACK
}
val pointColorIndexMap = HashMap<Triple<Int, Int, Int>, Int>()
for (face in 0 until definition.getFaceCount()) {
val type = definition.getFaceTypes()?.get(face)?.toInt()?.let { it and 3}?:0
val (p1, p2, p3) = definition.getPoints(face)
if (type == RENDER_SHADED_TRIANGLE) {
pointColorIndexMap[p1] = uniqueColorHSBValues.indexOf(colors1[face])
pointColorIndexMap[p2] = uniqueColorHSBValues.indexOf(colors2[face])
pointColorIndexMap[p3] = uniqueColorHSBValues.indexOf(colors3[face])
} else if (type == RENDER_FLAT_TRIANGLE) {
pointColorIndexMap[p1] = uniqueColorHSBValues.indexOf(colors1[face])
pointColorIndexMap[p2] = uniqueColorHSBValues.indexOf(colors1[face])
pointColorIndexMap[p3] = uniqueColorHSBValues.indexOf(colors1[face])
}
}
setTextureModeVertices3D(palette) { point3F ->
val key = Triple(point3F.x.toInt(), point3F.y.toInt(), point3F.z.toInt())
pointColorIndexMap[key]!!
}
val meshHelper = MeshHelper(atlas)
updateMesh(meshHelper)
}
This is how my current implementation of the shading looks like.
This is how my current implementation looks like without the shading.
This is how I want my implementation of the shading to roughly look like
This is how my palette looks like (9x10 pixels) (

Is there a destructor in R reference class?

Just as a test:
myclass = setRefClass("myclass",
fields = list(
x = "numeric",
y = "numeric"
))
myclass$methods(
dfunc = function(i) {
message("In dfunc, I save x and y...")
obj = .self
base::save(obj, file="/tmp/obj.rda")
}
)
myclass$methods(
print = function() {
if (.self$x > 10) {
stop("x is too large!")
}
message(paste("x: ", .self$x))
message(paste("y: ", .self$y))
}
)
myclass$methods(
initialize = function(x=NULL, y=NULL, obj=NULL) {
if(is.null(obj)) {
.self$x = x
.self$y = y
}
else {
.self$x = obj$x
.self$y = obj$y
}
}
)
myclass$methods(
finalize = function() {
message("I am finalizing this thing...")
}
)
Then try to create and remove an object:
u = myclass(15, 6)
u$print()
rm(u)
The finalize function is not called at all...

When you call rm you just remove the object reference from the enviroment, but you don't destroy the element.
That is the work of the garbage collector that is designed to automatically destroy objects when they have nomore reference (like in this case). Anyway, the garbage collector is triggered by some special events (e.g. too much memory used etc.), so it is not automatically invoked when you call rm (it will be probably called later later).
Anyway, you can force the garbage collector, even if this is usually discouraged, by calling gc().
u = myclass(15, 6)
rm(u)
gc()
# > I am finalizing this thing...
As you can see by running the above code, your finalize method is indeed called after gc()

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Using multiple constructors for R classes and subclasses - r

Related

Using R, how to scope internal functions within a MAIN function?

Invert Map<K, List<V>> to Map<V, K>

C5.0 package: Error in paste(apply(x, 1, paste, collapse = ","), collapse = "\n") : result would exceed 2^31-1 bytes

Gouraud Shading in JavaFX

Is there a destructor in R reference class?

Categories

Resources