Caching intermediate results in Rcpp objects

I'm currently trying to speed up an optimisation procedure which uses Rcpp to calculate the objective function. My current setup is similar to this:
largeConstantVector <- readVector()
result <- optim(..., eval=function(par) evalRcpp(par, largeConstantVector))
and the evalRcpp function
double evalRcpp(NumericVector par, NumericVector constVector){
NumericVector parT = transform(par);
NumericVector constVectorT = transform(constVector);
return aggregate(parT, constVectorT);
}
What I would like to do is to calculate NumericVector constVectorT = transform(constVector) only once, keep the result in a C++ object, and only use a reference to that object on R's side. So the procedure would be similar to this:
largeConstantVector <- readVector()
objReference <- calculateCommonStuff(largeConstantVector)
result <- optim(..., eval=function(par) evalRcpp(par, objReference))
and the evalRcpp function
double evalRcpp(NumericVector par, const SomeClass& objRef){
NumericVector parT = transform(par);
NumericVector constVectorT = objRef.constVectorT;
return aggregate(parT, constVectorT);
}
Is such an approach possible using Rcpp? Will it be possible to prevent unnecessary computation and data copying (that is, keep the intermediate data "on the C++ side")?
Thanks in advance.

Yes, it is possible to keep the intermediate data "on the C++ side" as you say, but that is more of a C++ program design issue than anything particular to Rcpp.
Create a class with private state data, use a function to create the class object, then have it update during the iterations.
Rcpp will help to easily call those member functions, but it doesn't create the rest of the framework for you.
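The design described above can be sketched in plain C++, without any Rcpp types. This is only an illustration under stated assumptions: transformValues() and aggregateValues() are hypothetical stand-ins for the question's transform() and aggregate(), and CachedObjective is a made-up class name. The constant vector is transformed once in the constructor and reused on every evaluation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's transform(): element-wise log1p.
std::vector<double> transformValues(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = std::log1p(x[i]);
    return out;
}

// Hypothetical stand-in for the question's aggregate(): a dot product.
double aggregateValues(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// Holds the transformed constant vector as private state.
class CachedObjective {
public:
    // The expensive transform of the constant data runs exactly once, here.
    explicit CachedObjective(const std::vector<double>& constVector)
        : constVectorT_(transformValues(constVector)) {}

    // Called once per optimiser iteration; only par is transformed again.
    double eval(const std::vector<double>& par) const {
        return aggregateValues(transformValues(par), constVectorT_);
    }

private:
    std::vector<double> constVectorT_;
};
```

On the R side, one option would be to allocate such an object once and hand R an external pointer to it (Rcpp provides Rcpp::XPtr for this), so that only the pointer crosses the R/C++ boundary on each call. The exact wiring depends on how the package is built.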

Related

Is there a way of passing an R function object (`CLOSXP`) to C level code and register it as a callback?

Problem
I am wrapping a fraction of the GLFW C library in R.
One functionality I'd like to wrap is the registration of callbacks.
For example, to set an error callback in C, GLFW provides the function:
GLFWerrorfun glfwSetErrorCallback(GLFWerrorfun callback)
whose callback signature is:
void callback_name(int error_code, const char* description)
Could you provide any pointers on how to let the user define a function in R, i.e. a CLOSXP object, pass it to C, and somehow get it transformed into a C function that can be passed to glfwSetErrorCallback()?
I am looking for a solution based on R's plain C internal API, i.e. not something based on Rcpp, please.
I have read this blog post, and I got the feeling I could perhaps use some combination of Rf_install() and R_tryEval()... but on the other hand, in Hadley's R internals, I only see Rf_install() associated with symbols (SYMSXP). I could not find much information about Rf_install() either, except for some appearances in code snippets in An external pointer example in Writing R Extensions.
Edit
After reading the answer to Calling R from C from R in mcmc, I think I might be able to register an R function as a callback if I resort to using a global variable, SEXP rcallback:
R code
#' @export
glfw_set_error_callback <- function(callback) {
.Call("glfw_set_error_callback_", callback)
}
C code
SEXP rcallback;
static void error_cb(int error, const char* description)
{
SEXP r_error = PROTECT(Rf_ScalarInteger(error));
SEXP r_description = PROTECT(Rf_mkString(description));
SEXP call = PROTECT(Rf_lang3(rcallback, r_error, r_description));
// `call` is evaluated for its side effects, so no need to store and/or
// PROTECT the result from its evaluation.
Rf_eval(call, R_GlobalEnv);
UNPROTECT(3);
return;
}
SEXP glfw_set_error_callback_(SEXP callback) {
rcallback = callback;
glfwSetErrorCallback(error_cb);
return R_NilValue;
}
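The pattern in the edit above (a fixed C-level function that forwards to a user callback stored in a global) can be sketched in standalone C++. Everything here is a hypothetical stand-in: setErrorCallback() and simulateError() imitate a C library and are not GLFW functions, and a std::function takes the place of the stored SEXP. In the R version, a SEXP kept in a global is not protected from garbage collection by itself; R's C API offers R_PreserveObject() and R_ReleaseObject() for objects that must outlive a single .Call.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Callback signature expected by the (hypothetical) C library,
// mirroring void callback_name(int error_code, const char* description).
using ErrorFun = void (*)(int, const char*);

// Stand-in for the library's registration slot; this is NOT GLFW.
static ErrorFun g_registered = nullptr;
void setErrorCallback(ErrorFun cb) { g_registered = cb; }

// The user's callback lives in a global, like SEXP rcallback in the question.
static std::function<void(int, const std::string&)> g_userCallback;

// Trampoline: a plain function with the C signature that forwards
// to whatever callable is currently stored.
static void errorTrampoline(int code, const char* description) {
    if (g_userCallback) {
        g_userCallback(code, description ? description : "");
    }
}

// Store the user's callable, then register the trampoline with the library.
void registerUserCallback(std::function<void(int, const std::string&)> cb) {
    g_userCallback = std::move(cb);
    setErrorCallback(errorTrampoline);
}

// Stand-in for the library reporting an error through the registered callback.
void simulateError(int code, const char* description) {
    if (g_registered) g_registered(code, description);
}
```

The library only ever sees errorTrampoline, so the user callback can be replaced at any time without registering again.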

How to achieve type stability when assigning values with StaticArrays?

I have the following struct (simplified), and some calculations done with this struct:
mutable struct XX{VecType}
v::VecType
end
long_calculation(x::XX) = sum(x.v)
As part of the program I need to update the v value. The struct is callable and mainly used as a cache. Here, the use of static arrays helps a lot in speeding up calculations, but the type of v is ultimately defined by a user. My problem arises when assigning new values to XX.v:
function (f::XX)(w)
f.v .= w #here lies the problem
return long_calculation(f)
end
This works if v <: Array and w is of any value, but it doesn't work when v <: StaticArrays.StaticArray, as setindex! is not defined on that type.
How can I write f.v .= w in a way that performs an in-place modification when v allows it, and otherwise just creates a new value and stores it in the XX struct?
There's a package for exactly this use case: BangBang.jl. From there, you can use setindex!!:
f.v = setindex!!(f.v, w)
Here I propose a simple solution that should be enough in most cases. Use multiple dispatch and define the following function:
my_assign!(f::XX, w) = (f.v .= w)
my_assign!(f::XX{<:StaticArray}, w) = (f.v = w)
and then simply call it in your code like this:
function (f::XX)(w)
my_assign!(f, w)
return long_calculation(f)
end
Then, if you (or your users) get an error with the default implementation, it is easy enough to add another method to my_assign! to cover the other special cases in which it throws an error.
Would such a solution be enough for you?

time_ns() result not saved by writedlm() in julia

I am working with a program which includes many function calls inside a for loop. For short, it is something like this:
function something()
....
....
timer = zeros(NSTEP);
for it = 1:NSTEP # time steps
tic = time_ns();
Threads.@threads for p in 1:2 # Start parallel of two sigma functions
Threads.lock(l);
Threads.unlock(l);
arg_in_sig[p] = func_sig[p](arg_in_sig[p]);
end
.....
.....
Threads.@threads for p in 1:2
Threads.lock(l)
Threads.unlock(l)
arg_in_vel[p] = func_vel[p](arg_in_vel[p])
end
toc=time_ns();
timer[i] = toc-tic;
end # time loop
writedlm("timer.txt",timer)
return
end
What I am trying to do is measure the time each loop iteration takes, saving the result in an output file called "timer.txt". The thing is that it doesn't work.
It saves a file with all zeros on it (except two or three values, which is more confusing).
I made a toy example like:
using DelimitedFiles;
function test()
a=zeros(1000)
for i=1:1000
tic = time_ns();
C = rand(20,20)*rand(20,20);
toc = time_ns();
a[i] = toc-tic;
end
writedlm("aaa.txt",a);
return a;
end
and this actually works (it saves fine!). Does it have something to do with the fact that I am using Threads.@threads? What could be happening between writedlm() and time_ns() in my program?
Any help would be much appreciated!
You are iterating over it but try to save by:
timer[i] = toc-tic;
while it should be
timer[it] = toc-tic;
Perhaps you have some i in global scope and hence the code still works.
Additionally, locking the thread and immediately unlocking it does not seem to make much sense. Moreover, when you iterate over p, which also happens to be the index of the Vector cell where you save the results, there is no need to use the locking mechanism at all (unless you are calling some functions that depend on a global state).
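For comparison, the same per-iteration timing pattern can be sketched in C++. This is an analogue rather than a translation of the Julia program: timeSteps() and the dummy workload are hypothetical. The point is that the slot written is indexed by the loop variable itself (it), so every iteration lands in its own cell.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Times each of nSteps iterations and returns the durations in nanoseconds.
std::vector<long long> timeSteps(std::size_t nSteps) {
    std::vector<long long> timerNs(nSteps, 0);
    volatile double sink = 0.0;  // keeps the dummy workload from being optimised away

    for (std::size_t it = 0; it < nSteps; ++it) {
        const auto tic = std::chrono::steady_clock::now();

        // Dummy workload standing in for the real time step.
        for (int k = 0; k < 1000; ++k) sink = sink + k * 0.5;

        const auto toc = std::chrono::steady_clock::now();

        // Index with the loop variable `it`, not with some unrelated `i`.
        timerNs[it] =
            std::chrono::duration_cast<std::chrono::nanoseconds>(toc - tic).count();
    }
    return timerNs;
}
```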

Why does this Rcpp function alter variables outside of its own scope?

I am running the following sample code from http://markovjumps.blogspot.com/2011/12/r-array-to-rcpparmadillo-cube.html which illustrates how one can transform an R array to an RcppArmadillo cube. The code is as follows:
require(inline)
require(RcppArmadillo)
src <- '
using namespace Rcpp;
NumericVector vecArray(myArray);
IntegerVector arrayDims = vecArray.attr("dim");
arma::cube cubeArray(vecArray.begin(), arrayDims[0], arrayDims[1], arrayDims[2], false);
//change one element in the array/cube
cubeArray(0,0,0) = 518;
return(wrap(cubeArray));
'
readCube = cxxfunction(signature(myArray="numeric"),body=src, plugin="RcppArmadillo")
set.seed(345)
testArray = array(rnorm(8), dim=c(2,2,2))
print(testArray[1,1,1])
# -0.7849082
readCube(testArray)[1,1,1]
# 518
print(testArray[1,1,1])
# 518
As can be seen, the testArray has been altered. However, I don't quite understand why this happens.
I did some search on the issue and found in http://arma.sourceforge.net/docs.html#Cube that "Create a cube using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory. By default the cube allocates its own memory and copies data from the auxiliary memory (for safety). However, if copy_aux_mem is set to false, the cube will instead directly use the auxiliary memory (ie. no copying); this is faster, but can be dangerous unless you know what you are doing!"
So I changed false to true and the problem was gone. However, I am still confused, since the original code creates a new NumericVector vecArray, and vecArray.begin() should refer to the memory of that new NumericVector object instead of the function input myArray. I feel that changing cubeArray should only change vecArray but not myArray.
Thanks to Dirk, now I see where I was trapped. Below are my own solutions:
The problem can be fixed either by using clone() or by creating a new Armadillo cube object that copies the data. The two solutions are as follows.
# Using clone()
src <- '
using namespace Rcpp;
NumericVector vecArray(clone(myArray));
IntegerVector arrayDims = vecArray.attr("dim");
arma::cube cubeArray(vecArray.begin(), arrayDims[0], arrayDims[1], arrayDims[2], false);
//change one element in the array/cube
cubeArray(0,0,0) = 518;
return(wrap(cubeArray));
'
readCube = cxxfunction(signature(myArray="numeric"),body=src, plugin="RcppArmadillo")
or
# Not using clone()
src <- '
using namespace Rcpp;
NumericVector vecArray(myArray);
IntegerVector arrayDims = vecArray.attr("dim");
arma::cube cubeArray(vecArray.begin(), arrayDims[0], arrayDims[1], arrayDims[2]);
//change one element in the array/cube
cubeArray(0,0,0) = 518;
return(wrap(cubeArray));
'
readCube = cxxfunction(signature(myArray="numeric"),body=src, plugin="RcppArmadillo")
Note that you don't want to use clone() and at the same time also create a new cube object, since that copies the data twice and wastes memory.
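The underlying behaviour can be reproduced in plain C++, with no Rcpp or Armadillo involved. An Rcpp NumericVector constructed from a numeric R object is a thin wrapper around the same memory, not a copy, so a cube that reuses vecArray.begin() without copying writes straight into the R array. In the sketch below, Buffer is a hypothetical class whose copyAuxMem flag mimics Armadillo's copy_aux_mem argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container that either aliases external memory or copies it,
// mimicking the copy_aux_mem flag of the arma::cube constructor.
class Buffer {
public:
    Buffer(double* auxMem, std::size_t n, bool copyAuxMem)
        : owned_(copyAuxMem ? std::vector<double>(auxMem, auxMem + n)
                            : std::vector<double>()),
          data_(copyAuxMem ? owned_.data() : auxMem),
          size_(n) {}

    // Copying is disabled because data_ may point into owned_.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::vector<double> owned_;  // used only when the data was copied
    double* data_;               // points at owned_ or at the caller's memory
    std::size_t size_;
};
```

With copyAuxMem set to false, a write through the Buffer changes the caller's data, which is what happened to testArray. With true, the write stays local to the Buffer.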

Calling user specified R function in inline C++ body

I have been working with the R package "RcppArmadillo". I have already used it to define two cxxfunctions (they have been debugged and are fine to use):
calc1 <- cxxfunction(signature(A="integer", B="integer"),...)
calc2 <- cxxfunction(signature(A="integer", K="integer"),...)
Now I'm writing the body part of another cxxfunction main and wish to call calc1 and calc2 within the for loops there, like:
body_main = '
...
for(int i=0; i<N; i++){
// This is where I want to call calc1.
// (?)
for(int j=0; j<N; j++){
// This is where I want to call calc2.
// (?)
}
}
'
Is there any way that I can achieve that? Can that be done in an inline fashion?
I haven't seen an example of inline usage of RcppArmadillo (or Rcpp, RcppGSL) in which people write a subroutine within the body part. Specifically, I mean code that looks like this:
body_example = '
// Subroutine
SEXP(/*or something else*/) func_0(SEXP A, SEXP B){
...
return ...;
}
// Then call it from the main part
...
AB = func_0(A, B);
...
'
My question probably looks naive but it haunts me nevertheless. Can anyone help explain this? I'd appreciate that a lot!
You could switch from using cxxfunction() from package inline to using Rcpp attributes and its sourceCpp(). That way you get predictable function headers at the C++ level; see the Rcpp attributes vignette.
Or split calc1 and calc2 into 'worker' and 'wrapper', have cxxfunction() around the wrapper allowing you to call the worker.
The key issue here really is that cxxfunction() exists to create an R-callable function, and it generates internal randomized function headers.
Lastly, a package would help too.

Resources