Rcpp parallelize functions that return XPtr? [duplicate]

This question already has answers here:
Using Rcpp within parallel code via snow to make a cluster
Taking an example XPtr function:
test.cpp
#include <Rcpp.h>
// [[Rcpp::export]]
SEXP funx()
{
    /* create a pointer to a vector<int> */
    std::vector<int>* v = new std::vector<int> ;
    v->push_back( 1 ) ;
    v->push_back( 2 ) ;
    /* wrap the pointer as an external pointer; this automatically
       protects the external pointer from R garbage collection
       until p goes out of scope */
    Rcpp::XPtr< std::vector<int> > p(v, true) ;
    /* return it back to R: since p goes out of scope after the return,
       the external pointer is no longer protected by p, but it gets
       protected by being on the R side */
    return( p ) ;
}
R
library(Rcpp)
sourceCpp("test.cpp")
xp <- funx()
xp
<pointer: 0x9618cc0>
But if I try to parallelize this, I get null pointers:
library(parallel)
out <- mclapply(1:2, function(x) funx())
out
[[1]]
<pointer: (nil)>
[[2]]
<pointer: (nil)>
Is it possible to achieve this kind of functionality?
Edit
It is worth noting that despite being marked as a duplicate, there appears to be no true solution to this problem. From what I understand now, an XPtr cannot be shared across worker processes: the address it holds is only valid in the process that created it, and it serializes back as a null pointer. So essentially this cannot be done in R.
For example, when I put the function inside the package test and try to use snow, it still fails to return the pointers.
library(test)
library(snow)
fun <- function(){
  library(test)
  test:::funx()
}
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, 'fun')
clusterCall(cl, fun)
[[1]]
<pointer: (nil)>
[[2]]
<pointer: (nil)>

Regarding
Is it possible to achieve this kind of functionality?
I would say the answer is a pretty firm 'nope', as the First Rule of Fight Club applies here: you simply cannot parallelise the underlying R instance merely by hoping it will work. Packages such as RcppParallel are very careful to use only non-R data structures for multithreaded work.
I may be too pessimistic but I would place the 'collection level' one level deeper, and only return its aggregated result to R.
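To make that suggestion concrete, here is a minimal sketch (not from the answer above; the name funx_values is made up for illustration): do the per-task work in C++ as before, but return the data itself rather than an external pointer, so the result survives the serialization the parallel workers perform.
#include <Rcpp.h>

// [[Rcpp::export]]
std::vector<int> funx_values()
{
    // build the vector exactly as before, but hand the values back to R;
    // plain data is copied into an R integer vector and can be sent
    // between processes, unlike an external pointer
    std::vector<int> v ;
    v.push_back( 1 ) ;
    v.push_back( 2 ) ;
    return v ;
}

/*** R
library(parallel)
mclapply(1:2, function(x) funx_values())
*/
If each worker genuinely needs a large C++ object, it has to be created (or rebuilt) inside that worker; only serializable R data can come back.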

Related

Rcpp override summary method for custom class

Suppose I have the following function:
List foo(List x)
{
  x.attr("class") = "myOwnClass";
  return(x);
}
I want to override the R summary method for the output of foo. However, the following R-style approach does not work:
List summary.myOwnClass(List x)
{
  return(x)
}
During compilation I get an error which says "expected initializer before '.' token".
Please help me understand how to implement a summary override within the Rcpp framework for my custom class.
I would be very grateful for any help!
I feel like this is likely a duplicate, but my initial search didn't pull one up. I'll add a quick answer for now, but if I later find one I'll delete this answer and propose a duplicate.
The way to solve this issue is to use the export attribute to specify the function's R-side name as summary.myOwnClass while using something else for the C++-side name; you can't have dots in the middle of a C++ function name (think about, e.g., how member functions are called; it would be unworkable). So we do the following:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List foo(List x)
{
  x.attr("class") = "myOwnClass";
  return(x);
}

// [[Rcpp::export(summary.myOwnClass)]]
List summary(List x)
{
  return(x);
}
/*** R
l <- foo(1:3)
summary(l)
*/
Then we get the output we expect
> l <- foo(1:3)
> summary(l)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
attr(,"class")
[1] "myOwnClass"

Rcpp function to construct a function

In R it is possible to have a function that creates another function, e.g.
create_ax2 <- function(a) {
  ax2 <- function(x) {
    y <- a * x^2
    return(y)
  }
  return(ax2)
}
The result of which is
> fun <- create_ax2(3)
> fun(1)
[1] 3
> fun(2)
[1] 12
> fun(2.5)
[1] 18.75
I have such a complicated creator function in R which takes a couple of arguments, sets some of the constants used in the returned function, does some intermediate computations, etc. But the resulting function is way too slow. Hence I tried to translate the code to C++ to use with Rcpp. However, I can't figure out a way to construct a function inside a C++ function and return it for use in R.
This is what I have so far:
Rcpp::Function createax2Rcpp(int a) {
  double ax2(double x) {
    return(a * pow(x, 2));
  };
  return (ax2);
}
This gives me the error 'function definition is not allowed here', and I am stuck on how to create the function.
EDIT: The question RcppArmadillo pass user-defined function comes close, but as far as I can tell, it only provides a way to pass a C++ function to R. It does not provide a way to initialise some values in the C++ function before it is passed to R.
OK, as far as I understand, you want a function that returns a function with a closure, i.e. "the function defined in the closure 'remembers' the environment in which it was created."
In C++11 and later it is quite possible to define such a function, along the lines of
std::function<double(double)> createax2Rcpp(int a) {
  auto ax2 = [a](double x) { return(double(a) * pow(x, 2)); };
  return ax2;
}
What happens is that an anonymous class with an overloaded operator() is created; the object captures a in the closure and is moved out of the creator function, and the return value is captured into an instance of std::function via type erasure, etc.
But! A C/C++ function callable from R has to be of a certain type, which is narrower (you can capture a narrow object into a wide one, but not vice versa).
Thus, I don't know how to turn a std::function into a proper R function; it looks like it is impossible.
Perhaps an emulation of the closure like the one below might help:
static int __a;

double ax2(double x) {
  return(__a * pow(x, 2));
}

Rcpp::Function createax2Rcpp(int a) {
  __a = a;
  return (ax2);
}
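For completeness, another pattern that is sometimes used (this sketch is mine, not the answerer's; the names func_t and callFunc are made up): keep the std::function behind an Rcpp::XPtr and export a small caller, so the captured state lives on the C++ side and R only passes the handle around.
#include <Rcpp.h>
#include <cmath>
#include <functional>
using namespace Rcpp;

typedef std::function<double(double)> func_t;

// [[Rcpp::export]]
XPtr<func_t> createax2Rcpp(int a) {
  // the lambda captures 'a'; the heap-allocated std::function keeps it alive
  return XPtr<func_t>(new func_t([a](double x) { return a * std::pow(x, 2.0); }), true);
}

// [[Rcpp::export]]
double callFunc(XPtr<func_t> f, double x) {
  return (*f)(x);
}

/*** R
fun <- createax2Rcpp(3)
callFunc(fun, 2)    # 12
callFunc(fun, 2.5)  # 18.75
*/
Note that the handle returned by createax2Rcpp is an external pointer, so (as discussed in the first question above) it is only valid within the R process that created it.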

Within C++ functions, how are Rcpp objects passed to other functions (by reference or by copy)?

I've just finished writing a new version of the ABCoptim package using Rcpp. With around 30x speed-ups, I'm very happy with the new version's performance (versus the old version), but I still have some concerns about whether there is room to improve performance without modifying the code too much.
Within the main function of ABCoptim (written in C++) I'm passing around an Rcpp::List object containing "bee positions" (a NumericMatrix) and some NumericVectors with important information for the algorithm itself. My question is, when I'm passing an Rcpp::List object around to other functions, e.g.
#include <Rcpp.h>
using namespace Rcpp;
List ABCinit([some input]){[some code here]};
void ABCfun2(List x){[some code here]};
void ABCfun3(List x){[some code here]};

List ABCmain([some input])
{
  List x = ABCinit([some input]);
  while ([some statement])
  {
    ABCfun2(x);
    ABCfun3(x);
  }
  ...
  return List::create(x["results"]);
}
What does Rcpp do within the while loop? Is the x object passed by reference or by deep copy to the functions ABCfun2 and ABCfun3? I've seen the usage of 'const List& x', which tells me that I can pass Rcpp objects by reference, but the thing is that I need this list to be variable (not constant). Is there any way to improve this? I'm afraid that iteratively copying this x List may be slowing down my code.
PS: I'm still new to C++; in fact I'm using Rcpp to learn C++.
There is no deep copy in Rcpp unless you ask for it with clone. When you pass by value, you are making a new List object but it uses the same underlying R object.
So the difference between pass by value and pass by reference is small.
However, when you pass by value, you have to pay the price of protecting the underlying object one more time. This can incur extra cost, since for this Rcpp relies on the recursive, not very efficient R_PreserveObject.
My guideline would be to pass by reference whenever possible so that you don't pay the extra protection price. If you know that ABCfun2 won't change the object, I'd advise passing by reference to const: ABCfun2( const List& ). If you are going to make changes to the List, then I'd recommend using ABCfun2( List& ).
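(As an aside, and not part of the original answer: a minimal sketch of the difference between sharing the underlying R object and asking for a deep copy with clone; the function name copy_semantics is illustrative.)
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List copy_semantics() {
  List x = List::create( _["a"] = 1 );
  List shallow = x;            // new List object, same underlying R object
  List deep    = clone(x);     // explicit deep copy, independent R object
  shallow["a"] = 2;            // also visible through x
  deep["a"]    = 3;            // only visible through deep
  return List::create( _["x"] = x, _["deep"] = deep );
}

/*** R
copy_semantics()   # x$a is 2, deep$a is 3
*/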
Consider this code:
#include <Rcpp.h>
using namespace Rcpp ;
#define DBG(MSG,X) Rprintf("%20s SEXP=<%p>. List=%p\n", MSG, (SEXP)X, &X ) ;
void fun_copy( List x, const char* idx ){
  x[idx] = "foo" ;
  DBG( "in fun_copy: ", x) ;
}
void fun_ref( List& x, const char* idx ){
  x[idx] = "bar" ;
  DBG( "in fun_ref: ", x) ;
}

// [[Rcpp::export]]
void test_copy(){
  // create a list of 2 components
  List data = List::create( _["a"] = 1, _["b"] = 2 ) ;
  DBG( "initial: ", data) ;

  // alter the 1st component of the list, passed by value
  fun_copy( data, "a") ;
  DBG( "\nafter fun_copy (1): ", data) ;

  // add a new component "d" to the list, passed by value
  fun_copy( data, "d") ;
  DBG( "\nafter fun_copy (2): ", data) ;
}

// [[Rcpp::export]]
void test_ref(){
  // create a list of 2 components
  List data = List::create( _["a"] = 1, _["b"] = 2 ) ;
  DBG( "initial: ", data) ;

  // alter the 1st component of the list, passed by reference
  fun_ref( data, "a") ;
  DBG( "\nafter fun_ref (1): ", data) ;

  // add a new component "d" to the list, passed by reference
  fun_ref( data, "d") ;
  DBG( "\nafter fun_ref (2): ", data) ;
}
All I'm doing is passing a list to a function, updating it, and printing some information about the pointer to the underlying R object and the pointer to the List object (this).
Here are the results of what happens when I call test_copy and test_ref:
> test_copy()
initial: SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
in fun_copy: SEXP=<0x7ff97c26c278>. List=0x7fff5b909f30
after fun_copy (1): SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
$a
[1] "foo"
$b
[1] 2
in fun_copy: SEXP=<0x7ff97b2b3ed8>. List=0x7fff5b909f20
after fun_copy (2): SEXP=<0x7ff97c26c278>. List=0x7fff5b909fd0
$a
[1] "foo"
$b
[1] 2
We start with an existing list associated with an R object.
initial: SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0
We pass it by value to fun_copy so we get a new List but using the same underlying R object:
in fun_copy: SEXP=<0x7fda4926d278>. List=0x7fff5bb5ef30
We exit fun_copy: same underlying R object again, and back to our original List:
after fun_copy (1): SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0
Now we call fun_copy again, but this time updating a component that was not in the list: x["d"] = "foo".
in fun_copy: SEXP=<0x7fda48989120>. List=0x7fff5bb5ef20
The List had no choice but to create a new underlying R object for itself, but this object only underlies the local List. Therefore when we get out of fun_copy, we are back to our original List with its original underlying SEXP.
after fun_copy (2): SEXP=<0x7fda4926d278>. List=0x7fff5bb5efd0
The key thing here is that the first time, "a" was already in the list, so we updated the data directly. Because the local object in fun_copy and the outer object from test_copy share the same underlying R object, modifications made inside fun_copy were propagated.
The second time, fun_copy grows its local List object, associating it with a brand new SEXP which does not propagate to the outer function.
Now consider what happens when you pass by reference:
> test_ref()
initial: SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
in fun_ref: SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
after fun_ref(1): SEXP=<0x7ff97c0e0f80>. List=0x7fff5b909fd0
$a
[1] "bar"
$b
[1] 2
in fun_ref: SEXP=<0x7ff97b5254c8>. List=0x7fff5b909fd0
after fun_ref(2): SEXP=<0x7ff97b5254c8>. List=0x7fff5b909fd0
$a
[1] "bar"
$b
[1] 2
$d
[1] "bar"
There is only one List object 0x7fff5b909fd0. When we have to get a new SEXP in the second call, it correctly gets propagated to the outer level.
To me, the behavior you get when passing by references is much easier to reason with.
Briefly:
1. void ABCfun(List x) passes by value, but then again List is an Rcpp object wrapping a SEXP, which is a pointer, so the cost here is less than what a C++ programmer would suspect; it is in fact lightweight. (But as Romain rightly points out, there is a cost in the extra protection layer.)
2. void ABCfun(const List x) promises not to change x, but again because it is a pointer...
3. void ABCfun(const List & x) looks most normal to a C++ programmer and has been supported in Rcpp since last year.
Ipso facto, in the Rcpp context all three are about the same. But you should think along the lines of best C++ practice and prefer 3., as one day you may use a std::list<....> instead, in which case the const reference is clearly preferable (Scott Meyers has an entire item about this in Effective C++, or maybe in the companion More Effective C++).
But the most important lesson is that you should not just believe what people tell you on the internet, but rather measure and profile whenever possible.
I'm new to Rcpp, so I figured I'd answer @Dirk's request for a measurement of the cost of the two passing styles (copy and reference).
There is surprisingly little difference between the two approaches.
I get the results below:
microbenchmark(test_copy(), test_ref(), times = 1e6)
Unit: microseconds
        expr   min    lq     mean median    uq        max neval cld
 test_copy() 5.102 5.566 7.518406  6.030 6.494 106615.653 1e+06   a
  test_ref() 4.639 5.566 7.262655  6.029 6.494   5794.319 1e+06   a
I used a cut-down version of @Romain's code, removing the DBG calls.
#include <Rcpp.h>
using namespace Rcpp;
void fun_copy( List x, const char* idx){
  x[idx] = "foo";
}
void fun_ref( List& x, const char* idx){
  x[idx] = "bar";
}

// [[Rcpp::export]]
List test_copy(){
  // create a list of 2 components
  List data = List::create( _["a"] = 1, _["b"] = 2);
  // alter the 1st component of the list, passed by value
  fun_copy( data, "a");
  // add a 3rd component to the list
  fun_copy( data, "d");
  return(data);
}

// [[Rcpp::export]]
List test_ref(){
  // create a list of 2 components
  List data = List::create( _["a"] = 1, _["b"] = 2);
  // alter the 1st component of the list, passed by reference
  fun_ref( data, "a");
  // add a 3rd component to the list
  fun_ref( data, "d");
  return(data);
}
/*** R
# benchmark copy v. ref functions
require(microbenchmark)
microbenchmark(test_copy(), test_ref(), times = 1e6)
*/

Caching intermediate results in Rcpp objects

I'm currently trying to speed up an optimisation procedure which uses Rcpp to calculate the objective function. My current setup is similar to this:
largeConstantVector <- readVector()
result <- optim(..., eval=function(par) evalRcpp(par, largeConstantVector))
and the evalRcpp function
double evalRcpp(NumericVector par, NumericVector constVector){
  NumericVector parT = transform(par)
  NumericVector constVectorT = transform(constVector)
  return aggregate(parT, constVectorT)
}
What I would like to do is calculate NumericVector constVectorT = transform(constVector) only once, keep the result in a C++ object, and only use a reference to that object on the R side. So the procedure would be similar to this:
largeConstantVector <- readVector()
objReference <- calculateCommonStuff(largeConstantVector)
result <- optim(..., eval=function(par) evalRcpp(par, objReference))
and the evalRcpp function
double evalRcpp(NumericVector par, const SomeClass& objRef){
  NumericVector parT = transform(par)
  NumericVector constVectorT = objRef.constVectorT
  return aggregate(parT, constVectorT)
}
Is such an approach possible using Rcpp? Will it be possible to prevent unnecessary computation and data copying (that is, keep the intermediate data "on the C++ side")?
Thanks in advance.
Yes, it is possible to keep the intermediate data "on the C++ side" as you say, but that is more of a C++ program design issue than anything particular to Rcpp.
Create a class with private state data, use a function to create the class object, then have it update during the iterations.
Rcpp will help to easily call those member functions, but it doesn't create the rest of the framework for you.
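A minimal sketch of that design, with an illustrative Cache class, an XPtr handle, and placeholder transform/aggregate steps (none of these bodies come from the original question):
#include <Rcpp.h>
using namespace Rcpp;

// holds the expensive intermediate result, computed once at construction
class Cache {
public:
  Cache(NumericVector constVector) : constVectorT(transform(constVector)) {}
  NumericVector constVectorT;
private:
  static NumericVector transform(NumericVector v) {
    return v * 2.0;                        // placeholder for the real transform()
  }
};

// [[Rcpp::export]]
XPtr<Cache> calculateCommonStuff(NumericVector constVector) {
  return XPtr<Cache>(new Cache(constVector), true);
}

// [[Rcpp::export]]
double evalRcpp(NumericVector par, XPtr<Cache> objRef) {
  NumericVector parT = par * par;           // placeholder for transform(par)
  return sum(parT * objRef->constVectorT);  // placeholder for aggregate()
}

/*** R
largeConstantVector <- runif(10)
objReference <- calculateCommonStuff(largeConstantVector)
evalRcpp(runif(10), objReference)
*/
Because objReference is an external pointer, the transformed constant vector is computed once and stays on the C++ side; only the small handle crosses the R/C++ boundary on each call to evalRcpp.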

Forcing specific data types as arguments to a function

I was just wondering if there is a way to force a function to only accept certain data types, without having to check for them within the function; or is this not possible because R's type-checking is done at runtime (as opposed to languages such as Java, where type-checking is done during compilation)?
For example, in Java, you have to specify a data type:
class t2 {
  public int addone (int n) {
    return n+1;
  }
}
In R, a similar function might be
addone <- function(n)
{
  return(n+1)
}
but if a vector is supplied, a vector will (obviously) be returned. If you only want a single integer to be accepted, is the only way to do this to have a condition within the function, along the lines of
addone <- function(n)
{
  if(is.vector(n) && length(n)==1)
  {
    return(n+1)
  } else
  {
    return("You must enter a single integer")
  }
}
Thanks,
Chris
This is entirely possible using S3 classes. Your example is somewhat contrived in the context of R, since I can't think of a practical reason why one would want to create a class of a single value. Nonetheless, this is possible. As an added bonus, I demonstrate how the function addone can be used to add the value of one to numeric vectors (trivial) and character vectors (so A turns to B, etc.):
Start by creating a generic S3 method for addone, utilising the S3 dispatch mechanism UseMethod:
addone <- function(x){
  UseMethod("addone", x)
}
Next, create the contrived class single, defined as the first element of whatever is passed to it:
as.single <- function(x){
  ret <- unlist(x)[1]
  class(ret) <- "single"
  ret
}
Now create methods to handle the various classes. The default method will be called unless a method for a specific class exists:
addone.default <- function(x) x + 1
addone.character <- function(x) rawToChar(as.raw(as.numeric(charToRaw(x)) + 1))
addone.single <- function(x) x + 1
Finally, test it with some sample data:
addone(1:5)
[1] 2 3 4 5 6
addone(as.single(1:5))
[1] 2
attr(,"class")
[1] "single"
addone("abc")
[1] "bcd"
Some additional information:
Hadley's devtools wiki is a valuable source of information on all things R, including the S3 object system.
The S3 mechanism doesn't provide strict typing, and it can quite easily be abused. For stricter object orientation, have a look at S4 classes, reference-based classes, or the proto package for prototype-based object programming.
You could write a wrapper like the following:
check.types = function(classes, func) {
  n = as.name
  params = formals(func)
  param.names = lapply(names(params), n)
  handler = function() { }
  formals(handler) = params
  checks = lapply(seq_along(param.names), function(I) {
    as.call(list(n('assert.class'), param.names[[I]], classes[[I]]))
  })
  body(handler) = as.call(c(
    list(n('{')),
    checks,
    list(as.call(list(n('<-'), n('.func'), func))),
    list(as.call(c(list(n('.func')), lapply(param.names, as.name))))
  ))
  handler
}

assert.class = function(x, cls) {
  stopifnot(cls %in% class(x))
}
And use it like
f = check.types(c('numeric', 'numeric'), function(x, y) {
  x + y
})
> f(1, 2)
[1] 3
> f("1", "2")
Error: cls %in% class(x) is not TRUE
Made somewhat inconvenient by R not having decorators. This is kind of hacky and it suffers from some serious problems:
1. You lose lazy evaluation, because you must evaluate an argument to determine its type.
2. You still can't check the types until call time; real static type checking lets you check the types even of a call that never actually happens.
Since R uses lazy evaluation, (2) might make type checking not very useful, because the call might not actually occur until very late, or never.
The answer to (2) would be to add static type information. You could probably do this by transforming expressions, but I don't think you want to go there.
I've found stopifnot() to be highly useful for these situations as well.
x <- function(n) {
  stopifnot(is.vector(n) && length(n)==1)
  print(n)
}
The reason it is so useful is that it provides a pretty clear error message to the user if the condition is false.
