Why does this Rcpp function alter variables outside of its own scope?

I am running the following sample code from http://markovjumps.blogspot.com/2011/12/r-array-to-rcpparmadillo-cube.html, which illustrates how to turn an R array into an RcppArmadillo cube. The code is as follows:
require(inline)
require(RcppArmadillo)
src <- '
using namespace Rcpp;
NumericVector vecArray(myArray);
IntegerVector arrayDims = vecArray.attr("dim");
arma::cube cubeArray(vecArray.begin(), arrayDims[0], arrayDims[1], arrayDims[2], false);
//change one element in the array/cube
cubeArray(0,0,0) = 518;
return(wrap(cubeArray));
'
readCube = cxxfunction(signature(myArray="numeric"),body=src, plugin="RcppArmadillo")
set.seed(345)
testArray = array(rnorm(8), dim=c(2,2,2))
print(testArray[1,1,1])
# -0.7849082
readCube(testArray)[1,1,1]
# 518
print(testArray[1,1,1])
# 518
As can be seen, the testArray has been altered. However, I don't quite understand why this happens.
I searched for the issue and found the following in http://arma.sourceforge.net/docs.html#Cube: "Create a cube using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory. By default the cube allocates its own memory and copies data from the auxiliary memory (for safety). However, if copy_aux_mem is set to false, the cube will instead directly use the auxiliary memory (ie. no copying); this is faster, but can be dangerous unless you know what you are doing!"
So I changed false to true and the problem went away. However, I am still confused: the original code creates a new NumericVector vecArray, so vecArray.begin() should refer to the memory of that new NumericVector object rather than to the function input myArray. I would expect changing cubeArray to change only vecArray, not myArray.

Thanks to Dirk, now I see where I was trapped. Below are my own solutions:
The problem can be fixed either by using clone() or by letting Armadillo create a new cube that copies the data. The two solutions are as follows.
# Using clone()
src <- '
using namespace Rcpp;
NumericVector vecArray(clone(myArray));
IntegerVector arrayDims = vecArray.attr("dim");
arma::cube cubeArray(vecArray.begin(), arrayDims[0], arrayDims[1], arrayDims[2], false);
//change one element in the array/cube
cubeArray(0,0,0) = 518;
return(wrap(cubeArray));
'
readCube = cxxfunction(signature(myArray="numeric"),body=src, plugin="RcppArmadillo")
or
# Not using clone(): the cube copies the data instead (copy_aux_mem defaults to true)
src <- '
using namespace Rcpp;
NumericVector vecArray(myArray);
IntegerVector arrayDims = vecArray.attr("dim");
arma::cube cubeArray(vecArray.begin(), arrayDims[0], arrayDims[1], arrayDims[2]);
//change one element in the array/cube
cubeArray(0,0,0) = 518;
return(wrap(cubeArray));
'
readCube = cxxfunction(signature(myArray="numeric"),body=src, plugin="RcppArmadillo")
Note that you should not use clone() and also let the cube copy the data, since that copies the array twice and wastes memory.
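The sharing behaviour discussed above can be mimicked in plain C++ without R. This is only a sketch: `Handle`, `overwrite_first`, `first_after_shallow_copy` and `first_after_clone` are hypothetical names, and a `std::shared_ptr` stands in for the R object that an Rcpp vector wraps. It shows why building a second handle from the input does not protect the input, while an explicit deep copy does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a thin handle around shared storage.
// Copying the handle copies the pointer, not the data.
class Handle {
public:
    explicit Handle(std::vector<double> values)
        : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

    double* begin() { return data_->data(); }

    double operator[](std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }

    // Deep copy, playing the role that clone() plays in the thread.
    Handle clone() const { return Handle(*data_); }

private:
    std::shared_ptr<std::vector<double>> data_;
};

// Writes through a raw pointer, as the cube does when copy_aux_mem is false.
void overwrite_first(double* ptr) { ptr[0] = 518.0; }

// Mirrors `NumericVector vecArray(myArray)`: a new handle on the same storage.
double first_after_shallow_copy(Handle input) {
    Handle alias(input);
    overwrite_first(alias.begin());
    return input[0];  // the caller's data has changed
}

// Mirrors `NumericVector vecArray(clone(myArray))`: separate storage.
double first_after_clone(Handle input) {
    Handle copy = input.clone();
    overwrite_first(copy.begin());
    return input[0];  // the caller's data is untouched
}
```

Calling `first_after_shallow_copy` on a handle leaves 518 in the caller's first element, whereas `first_after_clone` leaves the original value in place.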

Related

Is there a way of passing an R function object (`CLOSXP`) to C level code and register it as a callback?

Problem
I am wrapping a fraction of the GLFW C library in R.
One functionality I'd like to wrap is the registration of callbacks.
For example, to set an error callback in C, GLFW provides the function:
GLFWerrorfun glfwSetErrorCallback(GLFWerrorfun callback)
whose callback signature is:
void callback_name(int error_code, const char* description)
Could you provide any pointers on how to let the user define a function in R, i.e. a CLOSXP object, pass it to C, and somehow turn it into a C function that can be passed to glfwSetErrorCallback()?
I am looking for a solution based on R's plain C internal API, not something based on Rcpp, please.
I have read this blog post, and I got the feeling I could perhaps use some combination of Rf_install() and R_tryEval(). On the other hand, in Hadley's R internals, I only see Rf_install() associated with symbols (SYMSXP). I could not find much information about Rf_install() either, except for some appearances in code snippets in the "An external pointer example" section of Writing R Extensions.
Edit
After reading the answer to Calling R from C from R in mcmc, I think I might be able to register an R function as a callback if I resort to using a global variable, SEXP rcallback:
R code
#' @export
glfw_set_error_callback <- function(callback) {
  .Call("glfw_set_error_callback_", callback)
}
C code
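#include <Rinternals.h>
#include <GLFW/glfw3.h>

// The registered R closure; NULL until a callback has been set.
static SEXP rcallback = NULL;
static void error_cb(int error, const char* description)
{
    SEXP r_error = PROTECT(Rf_ScalarInteger(error));
    SEXP r_description = PROTECT(Rf_mkString(description));
    SEXP call = PROTECT(Rf_lang3(rcallback, r_error, r_description));
    // `call` is evaluated for its side effects, so no need to store and/or
    // PROTECT the result from its evaluation.
    Rf_eval(call, R_GlobalEnv);
    UNPROTECT(3);
    return;
}
SEXP glfw_set_error_callback_(SEXP callback) {
    // Keep the closure alive for as long as it is registered; otherwise the
    // garbage collector may free it once the .Call() returns.
    if (rcallback != NULL) R_ReleaseObject(rcallback);
    R_PreserveObject(callback);
    rcallback = callback;
    glfwSetErrorCallback(error_cb);
    return R_NilValue;
}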
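The pattern in the edit above, a fixed C function that forwards to a user-supplied function held in a global, can be tried out in isolation. The following is a rough sketch in plain C++ with no R or GLFW involved: `set_error_callback` and `raise_error` are hypothetical stand-ins for the library side, and `user_callback` plays the role of `rcallback`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical C-style API standing in for glfwSetErrorCallback():
// it accepts only a plain function pointer, with no room for user data.
typedef void (*error_fun)(int, const char*);
static error_fun registered_c_callback = nullptr;

error_fun set_error_callback(error_fun cb) {
    error_fun previous = registered_c_callback;
    registered_c_callback = cb;
    return previous;
}

// Hypothetical: what the library would do when an error occurs.
void raise_error(int code, const char* description) {
    if (registered_c_callback != nullptr) registered_c_callback(code, description);
}

// Stand-in for the stored R closure (`rcallback` in the question).
static std::function<void(int, const std::string&)> user_callback;

// Fixed trampoline with exactly the signature the C API expects;
// it looks up the user's function at call time and forwards to it.
static void trampoline(int code, const char* description) {
    assert(description != nullptr);
    if (user_callback) user_callback(code, description);
}

// Store the user's function, then register the trampoline with the library.
void register_user_callback(std::function<void(int, const std::string&)> cb) {
    user_callback = std::move(cb);
    set_error_callback(trampoline);
}
```

Because the C API cannot carry state, the user's function has to live somewhere the trampoline can reach, which is why a global (or file-level static) is hard to avoid with this kind of callback signature.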

How to copy Rcpp::DateVector to std::vector<boost::gregorian::date>

I want to pass data from a zoo object into my C++ program using RInside,
but I don't know how to pass the dates.
Here is an example
RInside R(argc, argv); // create an embedded R instance
std::string cmd = "suppressMessages(library(zoo)); "
"z <- zoo(rnorm(10), as.Date('2000-01-01') - 0:10);";
R.parseEvalQ(cmd);
std::vector<double> v = Rcpp::as< std::vector< double > >(R.parseEval("coredata(z)"));
Rcpp::DateVector d ( (SEXP) R.parseEval("index(z)") );
std::vector<boost::gregorian::date> dt; // How do I assign d to dt?
You need simple converters such as this one from the RcppBDT package:
template <> boost::gregorian::date as( SEXP dtsexp ) {
    Rcpp::Date dt(dtsexp);
    return boost::gregorian::date(dt.getYear(), dt.getMonth(), dt.getDay());
}
which you then need to vectorise. Alternatively, you could use an integer vector (with days since the epoch).
Edit: And there is an entire Rcpp Gallery post devoted to this, as well as several more dealing with related topics.
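As a rough illustration of the "days since epoch" alternative and of vectorising a scalar converter, here is a sketch in plain C++. `Ymd`, `from_days_since_epoch` and `convert_all` are hypothetical names; `Ymd` merely stands in for boost::gregorian::date, which real code would construct from the same year, month and day values. The arithmetic is the widely used civil-from-days calculation for the proleptic Gregorian calendar.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for boost::gregorian::date.
struct Ymd {
    int year;
    int month;
    int day;
};

// Convert days since 1970-01-01 (how R stores a Date) to year/month/day.
Ymd from_days_since_epoch(long days) {
    long z = days + 719468;                                   // shift epoch to 0000-03-01
    long era = (z >= 0 ? z : z - 146096) / 146097;            // 400-year cycle
    long doe = z - era * 146097;                              // day of era
    long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);       // day of year, March-based
    long mp = (5 * doy + 2) / 153;                            // month, March = 0
    int day = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    int month = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    int year = static_cast<int>(yoe + era * 400 + (month <= 2 ? 1 : 0));
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    return Ymd{year, month, day};
}

// Vectorised version: apply the scalar converter to every element.
std::vector<Ymd> convert_all(const std::vector<long>& days) {
    std::vector<Ymd> out;
    out.reserve(days.size());
    std::transform(days.begin(), days.end(), std::back_inserter(out),
                   from_days_since_epoch);
    return out;
}
```

For example, day number 10957 corresponds to 2000-01-01, the first date in the zoo index of the question.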

Within C++ functions, how are Rcpp objects passed to other functions (by reference or by copy)?

I've just finished writing a new version of the ABCoptim package using Rcpp. With speed-ups of around 30x, I'm very happy with the new version's performance (vs. the old version), but I still wonder whether there is room to improve performance without modifying the code too much.
Within the main function of ABCoptim (written in C++) I'm passing around an Rcpp::List object containing "bees positions" (NumericMatrix) and some NumericVectors with important information for the algorithm itself. My question is what happens when I pass an Rcpp::List object to other functions, e.g.
#include <Rcpp.h>
using namespace Rcpp;
List ABCinit([some input]){[some code here]};
void ABCfun2(List x){[some code here]};
void ABCfun3(List x){[some code here]};
List ABCmain([some input])
{
    List x = ABCinit([some input]);
    while ([some statement])
    {
        ABCfun2(x);
        ABCfun3(x);
    }
    ...
    return List::create(x["results"]);
}
What does Rcpp do within the while loop? Is the x object passed by reference or by deep copy to the functions ABCfun2 and ABCfun3? I've seen the usage of 'const List& x', which tells me that I can pass Rcpp objects by reference, but I need this list to be modifiable (not constant). Is there any way to improve this? I'm afraid that repeatedly copying this x List may be slowing down my code.
PS: I'm still new to C++, furthermore I'm using Rcpp to learn C++.
There is no deep copy in Rcpp unless you ask for it with clone. When you pass by value, you are making a new List object but it uses the same underlying R object.
So the difference between pass by value and pass by reference is small.
However, when you pass by value, you pay the price of protecting the underlying object one more time. This may incur extra cost because Rcpp relies for this on R_PreserveObject, which is recursive and not very efficient.
My guideline would be to pass by reference whenever possible so that you don't pay extra protecting price. If you know that ABCfun2 won't change the object, I'd advise passing by reference to const : ABCfun2( const List& ). If you are going to make changes to the List, then I'd recommend using ABCfun2( List& ).
Consider this code:
#include <Rcpp.h>
using namespace Rcpp ;
#define DBG(MSG,X) Rprintf("%20s SEXP=<%p>. List=%p\n", MSG, (SEXP)X, &X ) ;
void fun_copy( List x, const char* idx ){
    x[idx] = "foo" ;
    DBG( "in fun_copy: ", x) ;
}
void fun_ref( List& x, const char* idx ){
    x[idx] = "bar" ;
    DBG( "in fun_ref: ", x) ;
}
// [[Rcpp::export]]
void test_copy(){
    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2 ) ;
    DBG( "initial: ", data) ;
    // alter the 1st component of the list, passed by value
    fun_copy( data, "a") ;
    DBG( "\nafter fun_copy (1): ", data) ;
    // try to add a new component "d" to the list, passed by value
    fun_copy( data, "d") ;
    DBG( "\nafter fun_copy (2): ", data) ;
}
// [[Rcpp::export]]
void test_ref(){
    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2 ) ;
    DBG( "initial: ", data) ;
    // alter the 1st component of the list, passed by reference
    fun_ref( data, "a") ;
    DBG( "\nafter fun_ref (1): ", data) ;
    // add a new component "d" to the list, passed by reference
    fun_ref( data, "d") ;
    DBG( "\nafter fun_ref (2): ", data) ;
}
All I'm doing is passing a list to a function, updating it, and printing some information about the pointer to the underlying R object and the pointer to the List object ( this ) .
Here are the results of what happens when I call test_copy and test_ref:
> test_copy()
initial: SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
in fun_copy: SEXP=<0x7ff97c26c278>. List=0x7fff5b909f30
after fun_copy (1): SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
$a
[1] "foo"
$b
[1] 2
in fun_copy: SEXP=<0x7ff97b2b3ed8>. List=0x7fff5b909f20
after fun_copy (2): SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
$a
[1] "foo"
$b
[1] 2
We start with an existing list associated with an R object.
initial: SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0
We pass it by value to fun_copy so we get a new List but using the same underlying R object:
in fun_copy: SEXP=<0x7fda4926d278>. List=0x7fff5bb5ef30
We exit fun_copy. Same underlying R object again, and we are back to our original List:
after fun_copy (1): SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0
Now we call fun_copy again, but this time updating a component that was not in the list: x["d"]="foo".
in fun_copy: SEXP=<0x7fda48989120>. List=0x7fff5bb5ef20
List had no choice but to create a new underlying R object for itself, but this object only underlies the local List. Therefore, when we get out of fun_copy, we are back to our original List with its original underlying SEXP.
after fun_copy (2): SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0
The key thing here is that the first time, "a" was already in the list, so we updated the data directly. Because the object local to fun_copy and the outer object from test_copy share the same underlying R object, the modifications made inside fun_copy were propagated.
The second time, fun_copy grows its local List object, associating it with a brand new SEXP which does not propagate to the outer function.
Now consider what happens when you pass by reference :
> test_ref()
initial: SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
in fun_ref: SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
after fun_ref(1): SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
$a
[1] "bar"
$b
[1] 2
in fun_ref: SEXP=<0x7ff97b5254c8>. List=0x7fff5b909fd0
after fun_ref(2): SEXP=<0x7ff97b5254c8>. List=0x7fff5b909fd0
$a
[1] "bar"
$b
[1] 2
$d
[1] "bar"
There is only one List object 0x7fff5b909fd0. When we have to get a new SEXP in the second call, it correctly gets propagated to the outer level.
To me, the behavior you get when passing by reference is much easier to reason about.
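The difference between the two runs can be reproduced with a small model in plain C++. This is a sketch under loud assumptions: `NamedList` is a hypothetical stand-in for Rcpp::List, a `std::shared_ptr` plays the role of the underlying SEXP, and "growing" is modelled as allocating fresh storage and rebinding only the handle that was used.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for Rcpp::List: a handle on shared storage.
class NamedList {
private:
    typedef std::map<std::string, std::string> Store;
    std::shared_ptr<Store> store_;

public:
    NamedList() : store_(std::make_shared<Store>()) {}

    void set(const std::string& name, const std::string& value) {
        if (store_->count(name) == 0) {
            // Growing: allocate new storage and rebind THIS handle only,
            // like the local List getting a brand new SEXP.
            store_ = std::make_shared<Store>(*store_);
        }
        // Existing names are updated in place, in the shared storage.
        (*store_)[name] = value;
    }

    bool has(const std::string& name) const { return store_->count(name) != 0; }

    std::string get(const std::string& name) const {
        assert(has(name));
        return store_->at(name);
    }
};

// The handle is copied, the storage is shared (until the copy grows).
void set_by_value(NamedList x, const std::string& name) { x.set(name, "foo"); }

// There is only one handle, so a rebind is visible to the caller.
void set_by_reference(NamedList& x, const std::string& name) { x.set(name, "bar"); }
```

With this model, `set_by_value(data, "a")` changes the caller's "a", `set_by_value(data, "d")` leaves the caller without a "d", and `set_by_reference(data, "d")` adds it, matching the three cases in the output above.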
Briefly:
1. void ABCfun(List x) passes by value, but then again List is an Rcpp object wrapping a SEXP, which is a pointer, so the cost here is less than what a C++ programmer would suspect and it is in fact lightweight. (But as Romain rightly points out, there is cost in an extra protection layer.)
2. void ABCfun(const List x) promises not to change x, but again because it is a pointer...
3. void ABCfun(const List & x) looks most normal to a C++ programmer and is supported in Rcpp since last year.
Ipso facto, in the Rcpp context all three are about the same. But you should think along the lines of best C++ practice and prefer 3., as one day you may use a std::list<....> instead, in which case the const reference is clearly preferable (Scott Meyers has an entire item about this in Effective C++, or maybe in the companion More Effective C++).
But the most important lesson is that you should not just believe what people tell you on the internet, but rather measure and profile whenever possible.
I'm new to Rcpp, so I figured I'd answer @Dirk's request for a measurement of the cost of the two passing styles (copy and reference).
There is surprisingly little difference between the two approaches.
I get the following:
microbenchmark(test_copy(), test_ref(), times = 1e6)
Unit: microseconds
        expr   min    lq     mean median    uq        max neval cld
 test_copy() 5.102 5.566 7.518406  6.030 6.494 106615.653 1e+06   a
  test_ref() 4.639 5.566 7.262655  6.029 6.494   5794.319 1e+06   a
I used a cut-down version of @Romain's code, with the DBG calls removed.
#include <Rcpp.h>
using namespace Rcpp;
void fun_copy( List x, const char* idx){
    x[idx] = "foo";
}
void fun_ref( List& x, const char* idx){
    x[idx] = "bar";
}
// [[Rcpp::export]]
List test_copy(){
    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2);
    // alter the 1st component of the list, passed by value
    fun_copy( data, "a");
    // try to add a 3rd component to the list (lost, since passed by value)
    fun_copy( data, "d");
    return(data);
}
// [[Rcpp::export]]
List test_ref(){
    // create a list of 2 components
    List data = List::create( _["a"] = 1, _["b"] = 2);
    // alter the 1st component of the list, passed by reference
    fun_ref( data, "a");
    // add a 3rd component to the list
    fun_ref( data, "d");
    return(data);
}
/*** R
# benchmark copy v. ref functions
require(microbenchmark)
microbenchmark(test_copy(), test_ref(), times = 1e6)
*/

Caching intermediate results in Rcpp objects

I'm currently trying to speed up an optimisation procedure which uses Rcpp to calculate the objective function. My current setup is similar to this:
largeConstantVector <- readVector()
result <- optim(..., fn=function(par) evalRcpp(par, largeConstantVector))
and the evalRcpp function
double evalRcpp(NumericVector par, NumericVector constVector){
    NumericVector parT = transform(par);
    NumericVector constVectorT = transform(constVector);
    return aggregate(parT, constVectorT);
}
What I would like to do is to calculate NumericVector constVectorT = transform(constVector) only once, keep the result in a C++ object, and only use a reference to that object on R's side. So the procedure would be similar to this:
largeConstantVector <- readVector()
objReference <- calculateCommonStuff(largeConstantVector)
result <- optim(..., fn=function(par) evalRcpp(par, objReference))
and the evalRcpp function
double evalRcpp(NumericVector par, const SomeClass& objRef){
    NumericVector parT = transform(par);
    NumericVector constVectorT = objRef.constVectorT;
    return aggregate(parT, constVectorT);
}
Is such an approach possible using Rcpp? Will it be possible to prevent unnecessary computation and data copying (that is, keep the intermediate data "on the C++ side")?
Thanks in advance.
Yes, it is possible to keep the intermediate data "on the C++ side" as you say, but that is more of a C++ program design issue than anything particular to Rcpp.
Create a class with private state data, use a function to create the class object, then have it update during the iterations.
Rcpp will help to easily call those member functions, but it doesn't create the rest of the framework for you.
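A class of the kind described above might look roughly like the following sketch in plain C++. Everything here is an assumption made for illustration: `Evaluator` and `transform_values` are hypothetical names, squaring stands in for the real transform, and a dot product stands in for the real aggregate. Exposing such an object to R (for instance through an external pointer or an Rcpp module) is a separate step that is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the expensive transform in the question.
std::vector<double> transform_values(const std::vector<double>& v) {
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] * v[i];
    return out;
}

class Evaluator {
public:
    // The constant vector is transformed once, at construction time.
    explicit Evaluator(const std::vector<double>& constant)
        : constant_t_(transform_values(constant)) {}

    // Called once per optimiser iteration; only `par` is transformed here.
    double eval(const std::vector<double>& par) const {
        assert(par.size() == constant_t_.size());
        const std::vector<double> par_t = transform_values(par);
        double total = 0.0;  // placeholder aggregate: a dot product
        for (std::size_t i = 0; i < par_t.size(); ++i) {
            total += par_t[i] * constant_t_[i];
        }
        return total;
    }

private:
    std::vector<double> constant_t_;  // cached intermediate result
};
```

The object is built once before the optimisation starts, and each objective evaluation then reuses the cached `constant_t_` instead of recomputing it.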

Calling user specified R function in inline C++ body

I have been working with the R package "RcppArmadillo". I have already used it to define two cxxfunctions (they have been debugged and are fine to use):
calc1 <- cxxfunction(signature(A="integer", B="integer"),...)
calc2 <- cxxfunction(signature(A="integer", K="integer"),...)
Now I'm writing the body part of another cxxfunction main and wish to call calc1 and calc2 within the for loops there, like:
body_main = '
...
for(int i=0; i<N; i++){
    // This is where I want to call calc1.
    // (?)
    for(int j=0; j<N; j++){
        // This is where I want to call calc2.
        // (?)
    }
}
'
Is there any way I can achieve that? Can it be done in an inline fashion?
I haven't seen an example of inline usage of RcppArmadillo (or Rcpp, RcppGSL) in which people write a subroutine within the body part. Specifically, I mean code that looks like this:
body_example = '
// Subroutine
SEXP(/*or something else*/) func_0(SEXP A, SEXP B){
    ...
    return ...;
}
// Then call it from the main part
...
AB = func_0(A, B);
...
'
My question probably looks naive but it haunts me nevertheless. Can anyone help explain this? I'd appreciate that a lot!
You could switch from using cxxfunction() from package inline to using Rcpp attributes and sourceCpp(). That way you get predictable function headers at the C++ level; see the Rcpp attributes vignette.
Or split calc1 and calc2 each into a 'worker' and a 'wrapper', and put cxxfunction() around the wrapper, which leaves the worker callable from other C++ code.
The key issue here really is that cxxfunction() exists to create an R-callable function, and it generates internal randomized function headers.
Lastly, a package would help too.
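The worker/wrapper split can be sketched in plain C++ as follows. The names `calc1_worker`, `calc2_worker` and `run_main` are hypothetical, and the arithmetic is a placeholder for the real calculations. The point is only that workers with ordinary signatures can be called freely from the loops, while the R-facing wrappers (not shown) would do nothing but convert arguments and delegate.

```cpp
#include <cassert>

// Hypothetical workers: plain C++ functions with ordinary, predictable signatures.
int calc1_worker(int a, int b) { return a + b; }  // placeholder computation
int calc2_worker(int a, int k) { return a * k; }  // placeholder computation

// What the body of the third function could do: call the workers directly
// inside the nested loops, with no round trip through R.
long run_main(int n) {
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i < n; ++i) {
        total += calc1_worker(i, 1);
        for (int j = 0; j < n; ++j) {
            total += calc2_worker(i, j);
        }
    }
    return total;
}
```

With inline, helper definitions like these would normally be supplied through the includes argument of cxxfunction() rather than inside the body, since the body ends up inside a generated function.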
