How to speed up writing to a matrix in a reference class in R - r

Here is a piece of R code that writes to each element of a matrix in a reference class. It runs incredibly slowly, and I’m wondering if I’ve missed a simple trick that will speed this up.
nx = 2000
ny = 10
ref_matrix <- setRefClass(
"ref_matrix",fields = list(data = "matrix"),
)
out <- ref_matrix(data = matrix(0.0,nx,ny))
#tracemem(out$data)
for (iy in 1:ny) {
for (ix in 1:nx) {
out$data[ix,iy] <- ix + iy
}
}
It seems that each write to an element of the matrix triggers a check that involves a copy of the entire matrix. (Uncommenting the tracemen() call shows this.) Now, I’ve found a discussion that seems to confirm this:
https://r-devel.r-project.narkive.com/8KtYICjV/rd-copy-on-assignment-to-large-field-of-reference-class
and this also seems to be covered by Speeding up field access in R reference classes
but in both of these this behaviour can be bypassed by not declaring a class for the field, and this works for the example in the first link which uses a 1D vector, b, which can just be set as b <<- 1:10000. But I’ve not found an equivalent way of creating a 2D array without using a explicit “matrix” instance.
Am I just missing something simple, or is this actually not possible?
Let me add a couple of things. First, I’m very new to R, so could easily have missed something. Second, I’m really just curious about the way reference classes work in this case and whether there’s a simple way to use them efficiently; I’m not looking for a really fast way to set the elements of a matrix - I can do that by not having the matrix in a reference class at all, and if I really care about speed I can write a C routine to do it and call it from R.
Here’s some background that might explain why I’m interested in this, which you’re welcome to ignore.
I got here by wanting to see how different languages, and even different compiler options and different ways of coding the same operation, compared for efficiency when accessing 2D rectangular arrays. I’ve been playing with a test program that creates two 2D arrays of the same size, and calls a subroutine that sets the first to the elements of the second plus their index values. (Almost any operation would do, but this one isn’t completely trivial to optimise.) I have this in a number of languages now, C, C++, Julia, Tcl, Fortran, Swift, etc., even hand-coded assembler (spoiler alert: assembler isn’t worth the effort any more) and thought I’d try R. The obvious implementation in R passes the two arrays to a subroutine that does the work, but because R doesn’t normally pass by reference, that routine has to make a copy of the modified array and return that as the function value. I thought using a reference class would avoid the relatively minor overhead of that copy, so I tried that and was surprised to discover that, far from speeding things up, it slowed them down enormously.

Use outer:
out$data <- outer(1:ny, 1:nx, `+`)
Also, don't use reference classes (or R6 classes) unless you actually need reference semantics. KISS and all that.

Related

Rf_allocVector only allocates and does not zero out memory

Original motivation behind this is that I have a dynamically sized array of floats that I want to pass to R through Rcpp without either incurring the cost of a zeroing out nor the cost of a deep copy.
Originally I had thought that there might be some way to take heap allocated array, make it aware to R's gc system and then wrap it with other data to create a "Rcpp::NumericVector" but it seems like that that's not possible - or doable with my current knowledge.
However and correct me if I'm wrong it looks like simply constructing a NumericVector with a size N and then using it as an N sized allocation will call R.h's Rf_allocVector and that itself does not either zero out the allocated array - I tested it on a small C program that gets dyn.loaded into R and it looks like garbage values. I also took a peek at the assembly and there doesn't seem to be any zeroing out.
Can anyone confirm this or offer any alternate solution?
Welcome to StackOverflow.
You marked this rcpp but that is a function from the C API of R -- whereas the Rcpp API offers you its constructors which do in fact set the memory tp zero:
> Rcpp::cppFunction("NumericVector goodVec(int n) { return NumericVector(n); }")
> sum(goodVec(1e7))
[1] 0
>
This creates a dynamically allocated vector using R's memory functions. The vector is indistinguishable from R's own. And it has the memory set to zero
as we use R_Calloc, which is documented in Writing R Extension to setting the memory to zero. (We may also use memcpy() explicitly, you can check the sources.)
So in short, you just have yourself confused over what the C API of R, as well as Rcpp offer, and what is easiest to use when. Keep reading documentation, running and writing examples, and studying existing code. It's all out there!

matrix multiplication order PVM vs MVP in graphics programming

hi there I was wondering why most tutorials and programming code use MVP to describe the Model-View-Projection matrix. Instead of PVM which is the actual order of implementation in the code:
mat4 MVP = ProjectionMatrix * ViewMatrix * ModelMatrix;
gl_Position = MVP * VertexInModelSpace;
seems much more understandable to me to write PVM instead of MVP.
Matrices don't actually have a fixed meaning, just relations between rows and columns. The meaning is freely definable by the developers. The MVP order follows from standard mathematical conventions. But since nothing says you can not define the vectors as columns instead of rows nothing precludes this ordering.
Clarification: Since changing notation transposes the meaning. Then following applies:
MmvpT = Mpvm
Due to the definition of matrix multiplication following rule kicks in:
(AB)T = BTAT
Since B can be recursively another matrix multiplication a infinite chain of these are possible. Which means essentially that you have swapped the multiplication order, by changing notation.
Its a bit like looking at the problem from the outside or the problem from the inside. In this case your thinking as a outside observer. Whereas the other way around one would observe the thing from the standpoint of the first operator in the chain. Personally I think the notation you use may be more intuitive for this specific task, the other is just way more common. Mainly due to the fact that all mathematics books I have ever seen use this convention, so blame the mathematicians.
So better stick with the more common way, makes things more generally understandable. For example: Nothing stops me from typing the answer in Finnish but the convention of stackoverflow is to answer in English, making answers more understandable to most users. Use the more common form since others may not grasp the difference, and this leads to errors.
The other problem is that matrix multiplication is not necessarily commutative:
AB != BA
So it's a good idea to stick with the convention.

How can you efficiently check values of large vectors in R?

One thing I want to do all the time in my R code is to test whether certain conditions hold for a vector, such as whether it contains any or all values equal to some specified value. The Rish way to do this is to create a boolean vector and use any or all, for example:
any(is.na(my_big_vector))
all(my_big_vector == my_big_vector[[1]])
...
It seems really inefficient to me to allocate a big vector and fill it with values, just to throw it away (especially if any() or all() call can be short-circuited after testing only a couple of the values. Is there a better way to do this, or should I just hand in my desire to write code that is both efficient and succinct when working in R?
"Cheap, fast, reliable: pick any two" is a dry way of saying that you sometimes need to order your priorities when building or designing systems.
It is rather similar here: the cost of the concise expression is the fact that memory gets allocated behind the scenes. If that really is a problem, then you can always write a (compiled ?) routines to runs (quickly) along the vectors and uses only pair of values at a time.
You can trade off memory usage versus performance versus expressiveness, but is difficult to hit all three at the same time.
which(is.na(my_big_vector))
which(my_big_vector == 5)
which(my_big_vector < 3)
And if you want to count them...
length(which(is.na(my_big_vector)))
I think it is not a good idea -- R is a very high-level language, so what you should do is to follow standards. This way R developers know what to optimize. You should also remember that while R is functional and lazy language, it is even possible that statement like
any(is.na(a))
can be recognized and executed as something like
.Internal(is_any_na,a)

Multidimensional vectors in Scheme?

I earlier asked a question about arrays in scheme (turns out they're called vectors but are basically otherwise the same as you'd expect).
Is there an easy way to do multidimensional arrays vectors in PLT Scheme though? For my purposes I'd like to have a procedure called make-multid-vector or something.
By the way if this doesn't already exist, I don't need a full code example of how to implement it. If I have to roll this myself I'd appreciate some general direction though. The way I'd probably do it is to just iterate through each element of the currently highest dimension of the vector to add another dimension, but I can see that being a bit ugly using scheme's recursive setup.
Also, this seems like something I should have been able to find myself so please know that I did actually google it and nothing came up.
The two common approaches are the same as in many languages, either use a vector of vectors, or (more efficiently) use a single vector of X*Y and compute the location of each reference. But there is a library that does that -- look in the docs for srfi/25, which you can get with (require srfi/25).

Efficiency of stack-based expression evaluation for math parsing

I have to write, for academic purposes, an application that plots user-input expressions like: f(x) = 1 - exp(3^(5*ln(cosx)) + x)
The approach I've chosen to write the parser is to convert the expression in RPN with the Shunting-Yard algorithm, treating primitive functions like "cos" as unary operators. This means the function written above would be converted in a series of tokens like:
1, x, cos, ln, 5, *,3, ^, exp, -
The problem is that to plot the function I have to evaluate it LOTS of times, so applying the stack evaluation algorithm for each input value would be very inefficient.
How can I solve this? Do I have to forget the RPN idea?
How much is "LOTS of times"? A million?
What kind of functions could be input? Can we assume they are continuous?
Did you try measuring how well your code performs?
(Sorry, started off with questions!)
You could try one of the two approaches (or both) described briefly below (there are probably many more):
1) Parse Trees.
You could create a Parse Tree. Then do what most compilers do to optimize expressions, constant folding, common subexpression elimination (which you could achieve by linking together the common expression subtrees and caching the result), etc.
Then you could use lazy evaluation techniques to avoid whole subtrees. For instance if you have a tree
*
/ \
A B
where A evaluates to 0, you could completely avoid evaluating B as you know the result is 0. With RPN you would lose out on the lazy evaluation.
2) Interpolation
Assuming your function is continuous, you could approximate your function to a high degree of accuracy using Polynomial Interpolation. This way you can do the complicated calculation of the function a few times (based on the degree of polynomial you choose), and then do fast polynomial calculations for the rest of the time.
To create the initial set of data, you could just use approach 1 or just stick to using your RPN, as you would only be generating a few values.
So if you use Interpolation, you could keep your RPN...
Hope that helps!
Why reinvent the wheel? Use a fast scripting language instead.
Integrating something like lua into your code will take very little time and be very fast.
You'll usually be able byte compile your expression, and that should result in code that runs very fast, certainly fast enough for simple 1D graphs.
I recommend lua as its fast, and integrates with C/C++ easier than any other scripting language. Another good options would be python, but while its better known I found it trickier to integrate.
Why not keep around a parse tree (I use "tree" loosely, in your case it's a sequence of operations), and mark input variables accordingly? (e.g. for inputs x, y, z, etc. annotate "x" with 0 to signify the first input variable, "y" with 1 to signify the 2nd input variable, etc.)
That way you can parse the expression once, keep the parse tree, take in an array of inputs, and apply the parse tree to evaluate.
If you're worrying about the performance aspects of the evaluation step (vs. the parsing step), I don't think you'd do much better unless you get into vectorizing (applying your parse tree on a vector of inputs at once) or hard-coding the operations into a fixed function.
What I do is use the shunting algorithm to produce the RPN. I then "compile" the RPN into a tokenised form that can be executed (interpretively) repeatedly without re-parsing the expression.
Michael Anderson suggested Lua. If you want to try Lua for just this task, see my ae library.
Inefficient in what sense? There's machine time and programmer time. Is there a standard for how fast it needs to run with a particular level of complexity? Is it more important to finish the assignment and move on to the next one (perfectionists sometimes never finish)?
All those steps have to happen for each input value. Yes, you could have a heuristic that scans the list of operations and cleans it up a bit. Yes, you could compile some of it down to assembly instead of calling +, * etc. as high level functions. You can compare vectorization (doing all the +'s then all the *'s etc, with a vector of values) to doing the whole procedure for one value at a time. But do you need to?
I mean, what do you think happens if you plot a function in gnuplot or Mathematica?
Your simple interpretation of RPN should work just fine, especially since it contains
math library functions like cos, exp, and ^(pow, involving logs)
symbol table lookup
Hopefully, your symbol table (with variables like x in it) will be short and simple.
The library functions will most likely be your biggest time-takers, so unless your interpreter is poorly written, it will not be a problem.
If, however, you really gotta go for speed, you could translate the expression into C code, compile and link it into a dll on-the-fly and load it (takes about a second). That, plus memoized versions of the math functions, could give you the best performance.
P.S. For parsing, your syntax is pretty vanilla, so a simple recursive-descent parser (about a page of code, O(n) same as shunting-yard) should work just fine. In fact, you might just be able to compute the result as you parse (if math functions are taking most of the time), and not bother with parse trees, RPN, any of that stuff.
I think this RPN based library can serve the purpose: http://expressionoasis.vedantatree.com/
I used it with one of my calculator project and it works well. It is small and simple, but extensible.
One optimization would be to replace the stack with an array of values and implement the evaluator as a three address mechine where each operation loads from two (or one) location and saves to a third. This can make for very tight code:
struct Op {
enum {
add, sub, mul, div,
cos, sin, tan,
//....
} op;
int a, b, d;
}
void go(Op* ops, int n, float* v) {
for(int i = 0; i < n; i++) {
switch(ops[i].op) {
case add: v[op[i].d] = v[op[i].a] + v[op[i].b]; break;
case sub: v[op[i].d] = v[op[i].a] - v[op[i].b]; break;
case mul: v[op[i].d] = v[op[i].a] * v[op[i].b]; break;
case div: v[op[i].d] = v[op[i].a] / v[op[i].b]; break;
//...
}
}
}
The conversion from RPN to 3-address should be easy as 3-address is a generalization.

Resources