sample in Rcpp Armadillo - r

I am currently struggeling with the sample() command provided in RcppArmadillo. When I try to run the code below I get the error no matching function for call to sample and I already add the extra Rcpp:: namespace in front since this worked out well in another post.
I also tried several other container classes, but I am always stuck with this error. Below is some code, which produces the error.
Any help would be greatly appreciated :)
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix example(arma::mat fprob,
int K) {
int t = fprob.n_rows;
IntegerVector choice_set = seq_len(K);
arma::mat states(t,1); states.fill(0);
arma::rowvec p0(K);
arma::rowvec alph(K);
double fit;
p0 = fprob.row(t-1);
fit = accu(p0);
alph = p0/fit;
states(t-1,1) = Rcpp::RcppArmadillo::sample(choice_set, 1, false, alph)[0];
return wrap(states);
}

Here the definition of that function from the header:
// Enables supplying an arma probability
template <class T>
T sample(const T &x, const int size, const bool replace, arma::vec &prob_){
return sample_main(x, size, replace, prob_);
}
Note that it expects a arma::vec == arma::colvec, while you are providing a arma::rowvec. So it should work if you change p0 and alph to arma::vec. Untested because of missing sample data ...
BTW, there is meanwhile also a Rcpp:::sample() function in case you are not really needing Armadillo for other tasks.
Concerning the performance questions raised by #JosephWood in the comments:
I have the impression that both Rcpp::sample() and Rcpp::RcppArmadillo::sample() are based on do_sample(). So they should be quite similar in most cases, but I have not benchmarked them. The higher performance of R for unweighted sampling without replacement for larger numbers comes from the hash algorithm, which is selected at R level in such cases. It is also interesting to note that R 3.6 will have a new method for sampling in order to remove a bias present in the current method.

Related

RcppArmadillo: conflicting declaration of C function 'SEXPREC* sourceCpp_1_hh(SEXP, SEXP, SEXP)'

My code is the following
#include <RcppArmadillo.h>
#include <Rcpp.h>
using namespace std;
using namespace Rcpp;
using namespace arma;
//RNGScope scope;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat hh(arma::mat Z, int n, int m){
if(Z.size()==0){
Z = arma::randu<mat>(n,m); # if matrix Z is null, then generate random numbers to fill in it
return Z;
}else{
return Z;
}
}
Error reported:
conflicting declaration of C function 'SEXPREC* sourceCpp_1_hh(SEXP, SEXP, SEXP)'
Do you have any idea about this question?
Thank you in advance!
Let's slow down and clean up, following other examples:
Never ever include both Rcpp.h and RcppArmadillo.h. It errors. And RcppArmadillo.h pulls in Rcpp.h for you, and at the right time. (This matters for the generated code.)
No need to mess with RNGScope unless you really know what your are doing.
I recommend against flattening namespaces.
For reasons discussed elsewhere at length, you probably want R's RNGs.
The code doesn't compile as posted: C++ uses // for comments, not #.
The code doesn't compile as posted: Armadillo uses different matrix creation.
The code doesn't run as intended as size() is not what you want there. We also do not let a 'zero element' matrix in---maybe a constraint on our end.
That said, once repaired, we now get correct behavior for a slightly changed spec:
Output
R> Rcpp::sourceCpp("~/git/stackoverflow/63984142/answer.cpp")
R> hh(2, 2)
[,1] [,2]
[1,] 0.359028 0.775823
[2,] 0.645632 0.563647
R>
Code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat hh(int n, int m) {
arma::mat Z = arma::mat(n,m,arma::fill::randu);
return Z;
}
/*** R
hh(2, 2)
*/

RcppArmadillo's sample() is ambiguous after updating R

I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?
There are two things happening here, and two parts to your problem and hence the answer.
The first is "meta": why now? Well we had a bug let in the sample() code / setup which Christian kindly fixed for the most recent RcppArmadillo release (and it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed as it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say what compiler or version you use but mine (currently g++-9.3) is actually pretty helpful with the error. It is still C++ so some interpretative dance is needed but in essence it clearly stating you called with Rcpp::Matrix<14>::Row and no interface is provided for that type. Which is correct. sample() offers a few interface, but none for a Row object. So the fix is, once again, simple. Add a line to aid the compiler by making the row a NumericVector and all is good.
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
Rcpp::NumericVector z(x(i, _));
result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
}
return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>

Call R functions in RcppArmadillo with armadillo data type

I am translating my R code with some prepared functions to RcppArmadillo. I want to use some of these functions directly in my Rcpp code,instead of translating. For example, I want to call the sigma2 function:
sigma2<- function(xi.vec,w.vec,log10lambda,n,q){
lambda <- 10^log10lambda
(1/(n-q))*sum((lambda*xi.vec*(w.vec^2))/(lambda*xi.vec+1))
}
A typical Rcpp code is as below:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
SEXP myS(){
Rcpp::Environment myEnv = Rcpp::Environment::global_env();
Rcpp::Function myS = myEnv["sigma2"];
arma::vec xvec = myEnv["xi.vec"];
arma::vec wvec = myEnv["w.vec"];
double l = myEnv["log10lambda"];
int n = myEnv["n"];
int q = myEnv["q"];
return myS(Rcpp::Named("xi.vec",xvec),
Rcpp::Named("w.vec",wvec),
Rcpp::Named("l",l),
Rcpp::Named("n",n),
Rcpp::Named("q",q));
}
Of course it works. But my problem is that in my case, the parameters of sigma2 function should be defined before as output of another function(say func1) in RcppArmadillo and they have armadillo data type. For instance, xi.vec and w.vec have vec type. Now I want to know how can I modified this code to call sigma2? Do I need to change my environment?
First, just say no to embedding R functions and environments into C++ routines. There is no speedup in this case; only a considerable slowdown. Furthermore, there is a greater potential for things to go cockeye if the variables are not able to be retrieved in the global.env scope.
In your case, you seem to be calling myS() from within myS() with no terminating condition. Thus, your function will never end.
e.g.
SEXP myS(){
Rcpp::Function myS = myEnv["sigma2"];
return myS(Rcpp::Named("xi.vec",xvec),
Rcpp::Named("w.vec",wvec),
Rcpp::Named("l",l),
Rcpp::Named("n",n),
Rcpp::Named("q",q));
}
Switch one to be myS_R and myS_cpp.
Regarding environment hijacking, you would need to pass down to C++ the values. You cannot reach into an R function to obtain values specific passed to it before it is called.
e.g.
SEXP myS_cpp(arma::vec xvec, arma::vec wvec, double l, int n, int q){
// code here
}

Passing large matrices to RcppArmadillo function without creating copy (advanced constructors)

I want to pass a large matrix to a RcppArmadillo function (about 30,000*30,000) and have the feeling that this passing alone eats up all the performance gains. The question was also raised here with the suggested to solution to use advanced constructors with the copy_aux_mem = false argument. This seems to be a good solution also because I only need to read rows from the matrix without changing anything. I am having problems implementing the solution correctly though. This is probably just a simply syntax question.
Here is my current set-up of the function call (simplified, of course):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(arma::mat M) {
return(M.row(0))
}
this is pretty slow with large a matrix M (e.g. M=matrix(rnorm(30000*30000), nrow=30000, ncol=30000). So I would like to use an advanced constructor as documented here. The syntax is mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) and copy_aux_mem should be set to false to 'pass-by-reference'. I just not sure about the syntax in the function definition. How do I use this in arma::vec test(arma::mat M) {?
This has been discussed extensively in the Rcpp mailing list. See this thread. The solution that has been implemented in RcppArmadillo is to pass the arma::mat by reference. Internally this will call the advanced constructor for you.
So with this version, you would do something like this:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(const arma::mat& M) {
// do whatever with M
...
}
And the data from the R matrix is not copied but rather borrowed. More details in the thread.
Here are some benchmarks comparing the time it takes to copy or pass by reference:
expr min lq median uq max neval
arma_test_value(m) 3540.369 3554.4105 3572.3305 3592.5795 4168.671 100
arma_test_ref(m) 4.046 4.3205 4.7770 15.5855 16.671 100
arma_test_const_ref(m) 3.994 4.3660 5.5125 15.7355 34.874 100
With these functions:
#include <RcppArmadillo.h>
using namespace Rcpp ;
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
void arma_test_value( arma::mat x){}
// [[Rcpp::export]]
void arma_test_ref( arma::mat& x){}
// [[Rcpp::export]]
void arma_test_const_ref( const arma::mat& x){}
With the CRAN version of RcppArmadillo, you would use that sort of syntax:
void foo( NumericMatrix x_ ){
arma::mat M( x_.begin(), x_.nrow(), x_.ncol(), false ) ;
// do whatever with M
}
This has been used in many places, including several articles in the Rcpp gallery.

Rcpp - Use multiple C++ functions in file referenced by sourceCpp?

I hope this isn't too obvious, as I've searched all day and can't find the answer.
Say I have the following R file:
library(Rcpp)
sourceCpp("cfile.cpp")
giveOutput(c(1,2,3))
And it compiles the following C++ file:
#include <Rcpp>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector plusTwo(NumericVector x){
NumericVector out = x + 2.0;
return out;
}
NumericVector giveOutput(NumericVector a){
NumericVector b = plusTwo(a);
return b;
}
No matter what I try, the Rcpp preprocessor makes plusTwo() available, and giveOutput() not at all. The documentation I've been able to find says that this is the point at which one should create a package, but after reading the package vignette it seems an order of magnitude more complicated than what I need.
Short of explicitly defining plusTwo() inside giveOutput(), what can I do?
You are expected to use the export attribute in front of every function you wanted exported. So by correcting your file to
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector plusTwo(NumericVector x){
NumericVector out = x + 2.0;
return out;
}
// [[Rcpp::export]]
NumericVector giveOutput(NumericVector a){
NumericVector b = plusTwo(a);
return b;
}
I get the desired behaviour:
R> sourceCpp("/tmp/patrick.cpp")
R> giveOutput(1:3)
[1] 3 4 5
R> plusTwo(1:3)
[1] 3 4 5
R>
Oh, and creating a package is as easy as calling Rcpp.package.skeleton() (but read its help page, particularly for the attributes argument). I know of at least one CRAN package that started how you started here and clearly went via Rcpp.package.skeleton()...

Resources