Passing large matrices to RcppArmadillo function without creating copy (advanced constructors)

I want to pass a large matrix (about 30,000*30,000) to an RcppArmadillo function and have the feeling that this passing alone eats up all the performance gains. The question was also raised here, with the suggested solution to use advanced constructors with the copy_aux_mem = false argument. This seems to be a good solution, also because I only need to read rows from the matrix without changing anything. I am having problems implementing the solution correctly though. This is probably just a simple syntax question.
Here is my current set-up of the function call (simplified, of course):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(arma::mat M) {
return(M.row(0));
}
This is pretty slow with a large matrix M (e.g. M=matrix(rnorm(30000*30000), nrow=30000, ncol=30000)). So I would like to use an advanced constructor as documented here. The syntax is mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) and copy_aux_mem should be set to false to 'pass-by-reference'. I am just not sure about the syntax in the function definition. How do I use this in arma::vec test(arma::mat M) {?

This has been discussed extensively in the Rcpp mailing list. See this thread. The solution that has been implemented in RcppArmadillo is to pass the arma::mat by reference. Internally this will call the advanced constructor for you.
So with this version, you would do something like this:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(const arma::mat& M) {
// do whatever with M
...
}
And the data from the R matrix is not copied but rather borrowed. More details in the thread.
Here are some benchmarks comparing the time it takes to copy or pass by reference:
expr                        min        lq    median        uq      max neval
arma_test_value(m)     3540.369 3554.4105 3572.3305 3592.5795 4168.671   100
arma_test_ref(m)          4.046    4.3205    4.7770   15.5855   16.671   100
arma_test_const_ref(m)    3.994    4.3660    5.5125   15.7355   34.874   100
With these functions:
#include <RcppArmadillo.h>
using namespace Rcpp ;
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
void arma_test_value( arma::mat x){}
// [[Rcpp::export]]
void arma_test_ref( arma::mat& x){}
// [[Rcpp::export]]
void arma_test_const_ref( const arma::mat& x){}

With the CRAN version of RcppArmadillo, you would use this sort of syntax:
void foo( NumericMatrix x_ ){
arma::mat M( x_.begin(), x_.nrow(), x_.ncol(), false ) ;
// do whatever with M
}
This has been used in many places, including several articles in the Rcpp gallery.
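To make the borrowing idea concrete outside of Rcpp, here is a minimal standard C++ sketch. BorrowedMatrix and read_row are hypothetical names, not part of Armadillo or Rcpp. The struct only mimics, conceptually, what a matrix built with copy_aux_mem = false does: it keeps a pointer to memory owned elsewhere and indexes it in column-major order, so reading a row touches n_cols values instead of duplicating the whole matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a borrowed, column-major matrix.
// This is an illustration only, not Armadillo's actual class.
struct BorrowedMatrix {
    const double* data;   // memory owned by someone else (e.g. the R matrix)
    std::size_t n_rows;
    std::size_t n_cols;

    // Element (i, j) in column-major storage, the layout R and Armadillo share.
    double at(std::size_t i, std::size_t j) const {
        assert(i < n_rows && j < n_cols);
        return data[i + j * n_rows];
    }
};

// Read one row out of the borrowed memory; only n_cols doubles are copied.
std::vector<double> read_row(const BorrowedMatrix& m, std::size_t i) {
    std::vector<double> row(m.n_cols);
    for (std::size_t j = 0; j < m.n_cols; ++j) {
        row[j] = m.at(i, j);
    }
    return row;
}
```

Because the view does not own the buffer, the buffer must outlive the view. The same caveat applies to the advanced constructor: the R matrix has to stay alive while the arma::mat built on top of it is in use.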

Related

RcppArmadillo: conflicting declaration of C function 'SEXPREC* sourceCpp_1_hh(SEXP, SEXP, SEXP)'

My code is the following
#include <RcppArmadillo.h>
#include <Rcpp.h>
using namespace std;
using namespace Rcpp;
using namespace arma;
//RNGScope scope;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat hh(arma::mat Z, int n, int m){
if(Z.size()==0){
Z = arma::randu<mat>(n,m); # if matrix Z is null, then generate random numbers to fill in it
return Z;
}else{
return Z;
}
}
Error reported:
conflicting declaration of C function 'SEXPREC* sourceCpp_1_hh(SEXP, SEXP, SEXP)'
Do you have any idea about this question?
Thank you in advance!
Let's slow down and clean up, following other examples:
Never ever include both Rcpp.h and RcppArmadillo.h. It errors. And RcppArmadillo.h pulls in Rcpp.h for you, and at the right time. (This matters for the generated code.)
No need to mess with RNGScope unless you really know what you are doing.
I recommend against flattening namespaces.
For reasons discussed elsewhere at length, you probably want R's RNGs.
The code doesn't compile as posted: C++ uses // for comments, not #.
The code doesn't compile as posted: Armadillo uses different matrix creation.
The code doesn't run as intended as size() is not what you want there. We also do not let a 'zero element' matrix in---maybe a constraint on our end.
That said, once repaired, we now get correct behavior for a slightly changed spec:
Output
R> Rcpp::sourceCpp("~/git/stackoverflow/63984142/answer.cpp")
R> hh(2, 2)
[,1] [,2]
[1,] 0.359028 0.775823
[2,] 0.645632 0.563647
R>
Code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat hh(int n, int m) {
arma::mat Z = arma::mat(n,m,arma::fill::randu);
return Z;
}
/*** R
hh(2, 2)
*/

RcppArmadillo's sample() is ambiguous after updating R

I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?
There are two things happening here, and two parts to your problem and hence the answer.
The first is "meta": why now? Well, we had a bug left in the sample() code / setup which Christian kindly fixed for the most recent RcppArmadillo release (and it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed as it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say what compiler or version you use, but mine (currently g++-9.3) is actually pretty helpful with the error. It is still C++, so some interpretative dance is needed, but in essence it is clearly stating that you called with Rcpp::Matrix<14>::Row and no interface is provided for that type. Which is correct. sample() offers a few interfaces, but none for a Row object. So the fix is, once again, simple. Add a line to aid the compiler by making the row a NumericVector and all is good.
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
Rcpp::NumericVector z(x(i, _));
result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
}
return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>
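For readers who want to see the per-row sampling logic without the Rcpp types, here is a rough standard C++ sketch. sample_rows is a hypothetical helper, and it uses std::mt19937 with std::discrete_distribution rather than R's RNG, so it will not reproduce RcppArmadillo::sample() draws. It only illustrates the same pattern: turn each row into a concrete probability object first, then draw one label from the choice set.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: for each row of probabilities, draw one label
// from choice_set. Uses the C++ standard library RNG, not R's RNG.
std::vector<int> sample_rows(const std::vector<std::vector<double>>& probs,
                             const std::vector<int>& choice_set,
                             std::mt19937& rng) {
    std::vector<int> result;
    result.reserve(probs.size());
    for (const std::vector<double>& row : probs) {
        assert(row.size() == choice_set.size());
        // Build a concrete distribution from the row, much as the fix above
        // materialises the row as a NumericVector before sampling.
        std::discrete_distribution<int> pick(row.begin(), row.end());
        const std::size_t k = static_cast<std::size_t>(pick(rng));
        result.push_back(choice_set[k]);
    }
    return result;
}
```

Inside an R package you would normally stay with R's RNG, as recommended elsewhere in this thread. The sketch is only meant to show the shape of the loop.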

sample in Rcpp Armadillo

I am currently struggling with the sample() command provided in RcppArmadillo. When I try to run the code below I get the error "no matching function for call to sample", even though I already added the extra Rcpp:: namespace in front, since this worked out well in another post.
I also tried several other container classes, but I am always stuck with this error. Below is some code, which produces the error.
Any help would be greatly appreciated :)
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix example(arma::mat fprob,
int K) {
int t = fprob.n_rows;
IntegerVector choice_set = seq_len(K);
arma::mat states(t,1); states.fill(0);
arma::rowvec p0(K);
arma::rowvec alph(K);
double fit;
p0 = fprob.row(t-1);
fit = accu(p0);
alph = p0/fit;
states(t-1,1) = Rcpp::RcppArmadillo::sample(choice_set, 1, false, alph)[0];
return wrap(states);
}
Here is the definition of that function from the header:
// Enables supplying an arma probability
template <class T>
T sample(const T &x, const int size, const bool replace, arma::vec &prob_){
return sample_main(x, size, replace, prob_);
}
Note that it expects an arma::vec == arma::colvec, while you are providing an arma::rowvec. So it should work if you change p0 and alph to arma::vec. Untested because of missing sample data ...
BTW, there is now also an Rcpp::sample() function in case you do not really need Armadillo for other tasks.
Concerning the performance questions raised by @JosephWood in the comments:
I have the impression that both Rcpp::sample() and Rcpp::RcppArmadillo::sample() are based on do_sample(). So they should be quite similar in most cases, but I have not benchmarked them. The higher performance of R for unweighted sampling without replacement for larger numbers comes from the hash algorithm, which is selected at R level in such cases. It is also interesting to note that R 3.6 will have a new method for sampling in order to remove a bias present in the current method.

Rcpp: passing native c++ functions as arguments? [duplicate]

I'm trying to run something like
R
my_r_function <- function(input_a) {return(input_a**3)}
RunFunction(c(1,2,3), my_r_function)
CPP
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector RunFunction(NumericVector a, Function func)
{
NumericVector b = NumericVector(a.size());
for(int i=0; i<a.size(); i++)
b[i] = func(a[i]);
return b;
}
How would I make "Function func" actually work in Rcpp?
P.S. I understand there are ways to do this without Rcpp (apply comes to mind for this example) but I'm just using this as an example to demonstrate what I'm looking for.
You should be able to use the example in the link I provided above to get your code working; but you should also take note of Dirk's warning,
Calling a function is simple and tempting. It is also slow as there
are overheads involved. And calling it repeatedly from inside your C++
code, possibly buried within several loops, is outright silly.
which can be demonstrated by modifying your above code slightly and benchmarking the two versions:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector RunFunction(Rcpp::NumericVector a, Rcpp::Function func)
{
Rcpp::NumericVector b = func(a);
return b;
}
// [[Rcpp::export]]
Rcpp::NumericVector RunFunction2(Rcpp::NumericVector a, Rcpp::Function func)
{
Rcpp::NumericVector b(a.size());
for(int i = 0; i < a.size(); i++){
b[i] = Rcpp::as<double>(func(a[i]));
}
return b;
}
/*** R
my_r_function <- function(input_a) {return(input_a**3)}
x <- 1:10
##
RunFunction(x,my_r_function)
RunFunction2(x,my_r_function)
##
library(microbenchmark)
microbenchmark(
RunFunction(rep(1:10,10),my_r_function),
RunFunction2(rep(1:10,10),my_r_function))
# Unit: microseconds
#                                       expr     min       lq       mean   median       uq      max neval
#  RunFunction(rep(1:10, 10), my_r_function)  21.390  22.9985   25.74988  24.0840   26.464   43.722   100
# RunFunction2(rep(1:10, 10), my_r_function) 843.864 903.0025 1048.13175 951.2405 1057.899 2387.550   100
*/
Notice that RunFunction is ~40x faster than RunFunction2: in the former we only incur the overhead of calling func from inside the C++ code once, whereas in the latter case we have to make the exchange for each element of the input vector. If you tried running this on even longer vectors, I'm sure you would see a substantially worse performance from RunFunction2 relative to RunFunction. So, if you are going to be calling R functions from inside of your C++ code, you should try to take advantage of R's native vectorization (if possible) rather than repeatedly making calls to the R function in a loop, at least for reasonably simple calculations like x**3.
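The difference between the two calling patterns can be sketched in plain C++ as well. In the snippet below, apply_once and apply_each are hypothetical names and std::function merely stands in for the R function. Real Rcpp::Function calls carry far more overhead per call than this, so treat it as an illustration of how often the boundary is crossed, not as a benchmark.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Pattern 1 (like RunFunction): hand the whole vector to the callback once.
Vec apply_once(const Vec& a, const std::function<Vec(const Vec&)>& f) {
    return f(a);
}

// Pattern 2 (like RunFunction2): cross the call boundary once per element.
Vec apply_each(const Vec& a, const std::function<double(double)>& f) {
    Vec b(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        b[i] = f(a[i]);
    }
    return b;
}
```

For an input of length n, the first pattern pays the call overhead once and the second pays it n times, which is why the gap in the benchmark above should widen as the vector grows.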
Also, if you were wondering why your code wasn't compiling, it was because of this line:
b[i] = func(a[i]);
You were presumably getting the error
cannot convert ‘SEXP’ to ‘Rcpp::traits::storage_type<14>::type {aka
double}’ in assignment
which I resolved by wrapping the return value of func(a[i]) in Rcpp::as<double>() above. However, this clearly isn't worth the trouble because you end up with a much slower function overall anyhow.
You can use std::transform() and avoid writing the loop yourself. Try the following code:
List RunFunction(List input, Function f) {
List output(input.size());
std::transform(input.begin(), input.end(), output.begin(), f);
output.names() = input.names();
return output;
}

Rcpp - Use multiple C++ functions in file referenced by sourceCpp?

I hope this isn't too obvious, as I've searched all day and can't find the answer.
Say I have the following R file:
library(Rcpp)
sourceCpp("cfile.cpp")
giveOutput(c(1,2,3))
And it compiles the following C++ file:
#include <Rcpp>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector plusTwo(NumericVector x){
NumericVector out = x + 2.0;
return out;
}
NumericVector giveOutput(NumericVector a){
NumericVector b = plusTwo(a);
return b;
}
No matter what I try, the Rcpp preprocessor makes plusTwo() available, and giveOutput() not at all. The documentation I've been able to find says that this is the point at which one should create a package, but after reading the package vignette it seems an order of magnitude more complicated than what I need.
Short of explicitly defining plusTwo() inside giveOutput(), what can I do?
You are expected to use the export attribute in front of every function you want exported. So by correcting your file to
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector plusTwo(NumericVector x){
NumericVector out = x + 2.0;
return out;
}
// [[Rcpp::export]]
NumericVector giveOutput(NumericVector a){
NumericVector b = plusTwo(a);
return b;
}
I get the desired behaviour:
R> sourceCpp("/tmp/patrick.cpp")
R> giveOutput(1:3)
[1] 3 4 5
R> plusTwo(1:3)
[1] 3 4 5
R>
Oh, and creating a package is as easy as calling Rcpp.package.skeleton() (but read its help page, particularly for the attributes argument). I know of at least one CRAN package that started how you started here and clearly went via Rcpp.package.skeleton()...
