RcppArmadillo's sample() is ambiguous after updating R - r

I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?

There are two things happening here, and two parts to your problem and hence the answer.
The first is "meta": why now? Well we had a bug let in the sample() code / setup which Christian kindly fixed for the most recent RcppArmadillo release (and it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed as it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say what compiler or version you use but mine (currently g++-9.3) is actually pretty helpful with the error. It is still C++ so some interpretative dance is needed but in essence it clearly stating you called with Rcpp::Matrix<14>::Row and no interface is provided for that type. Which is correct. sample() offers a few interface, but none for a Row object. So the fix is, once again, simple. Add a line to aid the compiler by making the row a NumericVector and all is good.
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
Rcpp::NumericVector z(x(i, _));
result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
}
return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>

Related

how to create a Rcpp NumericVector from Eigen::Tensor without copying underlying data

If I create a large Tensor in Eigen, and I like to return the Tensor back to R as multi-dimension array. I know how to do it with data copy like below. Question: is it possible to do it without the data-copy step?
#include <Rcpp.h>
#include <RcppEigen.h>
#include <unsupported/Eigen/CXX11/Tensor>
// [[Rcpp::depends(RcppEigen)]]
using namespace Rcpp;
template <typename T>
NumericVector copyFromTensor(const T& x)
{
int n = x.size();
NumericVector ans(n);
IntegerVector dim(x.NumDimensions);
for (int i = 0; i < x.NumDimensions; ++i) {
dim[i] = x.dimension(i);
}
memcpy((double*)ans.begin(), (double*)x.data(), n * sizeof(double));
ans.attr("dim") = dim;
return ans;
}
// [[Rcpp::export]]
NumericVector getTensor() {
Eigen::Tensor<double, 3> x(4, 3, 1);
x.setRandom();
return copyFromTensor(x);
}
/*** R
getTensor()
*/
As a general rule you can zero-copy one the way into your C++ code with data coming from R and already managed by R.
On the way out of your C++ code with data returning to R anything that is not created used the R allocator has to be copied.
Here your object x is a stack-allocated so you need a copy. See Writing R Extensions about the R allocator; Eigen may let you use it when you create a new Tensor object. Not a trivial step. I think I would just live with the copy.

RcppArmadillo: conflicting declaration of C function 'SEXPREC* sourceCpp_1_hh(SEXP, SEXP, SEXP)'

My code is the following
#include <RcppArmadillo.h>
#include <Rcpp.h>
using namespace std;
using namespace Rcpp;
using namespace arma;
//RNGScope scope;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat hh(arma::mat Z, int n, int m){
if(Z.size()==0){
Z = arma::randu<mat>(n,m); # if matrix Z is null, then generate random numbers to fill in it
return Z;
}else{
return Z;
}
}
Error reported:
conflicting declaration of C function 'SEXPREC* sourceCpp_1_hh(SEXP, SEXP, SEXP)'
Do you have any idea about this question?
Thank you in advance!
Let's slow down and clean up, following other examples:
Never ever include both Rcpp.h and RcppArmadillo.h. It errors. And RcppArmadillo.h pulls in Rcpp.h for you, and at the right time. (This matters for the generated code.)
No need to mess with RNGScope unless you really know what your are doing.
I recommend against flattening namespaces.
For reasons discussed elsewhere at length, you probably want R's RNGs.
The code doesn't compile as posted: C++ uses // for comments, not #.
The code doesn't compile as posted: Armadillo uses different matrix creation.
The code doesn't run as intended as size() is not what you want there. We also do not let a 'zero element' matrix in---maybe a constraint on our end.
That said, once repaired, we now get correct behavior for a slightly changed spec:
Output
R> Rcpp::sourceCpp("~/git/stackoverflow/63984142/answer.cpp")
R> hh(2, 2)
[,1] [,2]
[1,] 0.359028 0.775823
[2,] 0.645632 0.563647
R>
Code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat hh(int n, int m) {
arma::mat Z = arma::mat(n,m,arma::fill::randu);
return Z;
}
/*** R
hh(2, 2)
*/

sample in Rcpp Armadillo

I am currently struggeling with the sample() command provided in RcppArmadillo. When I try to run the code below I get the error no matching function for call to sample and I already add the extra Rcpp:: namespace in front since this worked out well in another post.
I also tried several other container classes, but I am always stuck with this error. Below is some code, which produces the error.
Any help would be greatly appreciated :)
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix example(arma::mat fprob,
int K) {
int t = fprob.n_rows;
IntegerVector choice_set = seq_len(K);
arma::mat states(t,1); states.fill(0);
arma::rowvec p0(K);
arma::rowvec alph(K);
double fit;
p0 = fprob.row(t-1);
fit = accu(p0);
alph = p0/fit;
states(t-1,1) = Rcpp::RcppArmadillo::sample(choice_set, 1, false, alph)[0];
return wrap(states);
}
Here the definition of that function from the header:
// Enables supplying an arma probability
template <class T>
T sample(const T &x, const int size, const bool replace, arma::vec &prob_){
return sample_main(x, size, replace, prob_);
}
Note that it expects a arma::vec == arma::colvec, while you are providing a arma::rowvec. So it should work if you change p0 and alph to arma::vec. Untested because of missing sample data ...
BTW, there is meanwhile also a Rcpp:::sample() function in case you are not really needing Armadillo for other tasks.
Concerning the performance questions raised by #JosephWood in the comments:
I have the impression that both Rcpp::sample() and Rcpp::RcppArmadillo::sample() are based on do_sample(). So they should be quite similar in most cases, but I have not benchmarked them. The higher performance of R for unweighted sampling without replacement for larger numbers comes from the hash algorithm, which is selected at R level in such cases. It is also interesting to note that R 3.6 will have a new method for sampling in order to remove a bias present in the current method.

Rcpp Error Null value passed as symbol address

I am new to Rcpp.
I created an rcpp function which takes a dataframe with 2 columns and a vector as input, and returns a vector.
My data are as below
set.seed(10)
min= sort(rnorm(1000,800,sd=0.1))
max= min+0.02
k=data.frame(min,max)
explist= sort(rnorm(100,800,sd=0.2))
Then I call the cfilter.cpp
k$output <- cfilter(k,explist)
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector cfilter(DataFrame k, NumericVector explist) {
NumericVector col1 = k["min"];
NumericVector col2 = k["max"];
NumericVector exp = explist ;
int n = col1.size();
int j = 0;
CharacterVector out(n);
for (int i=0; i<n ; i++){
out[i]=NA_STRING;
while(exp[j]<= col2[i]){
if( exp[j]>= col1[i] && exp[j]<= col2[i] ){
out[i]="Y";
break;
}
else if(exp[j]>col2[i]){
break;
}
else {
j++ ;
}
}
}
return out;
}
It run perfectly fine for 16171 times I called it. And then suddenly, in the loop 16172 it just stops with an error:
> myfile$output<- cfilter(k,explist2)
Error in .Call(<pointer: (nil)>, k, explist) :
NULL value passed as symbol address
I checked k and explist for NA values but there aren't any, there is no problem whatsoever with the input.
I have no clue how to fix this and what causes this error.
Thanks in advance for any response
I came across the same problem. I'm not an Rcpp expert, nor C++ nor, a backend coding expert.
I have circumvented this problem by re-sourcing my cpp file every time I want to make a call of the function. So, for example if following is your for loop:
for(i in 1:SampleSize){
out[[I]]<-cfilter(k,explist)
}
Do something like:
for(i in 1:SampleSize){
sourceCpp("cfilter.cpp")
out[[i]]<-cfilter(k,explist)
}
Again, I don't know exactly why this worked for me, but it worked. Based on my shallow knowledge of C++, it might be related to memory allocation and that every time you source, memory is released and hence there is no mis-allocation. But I think this is a very wild guess.
Best

RcppArmadillo::sample function causes RStudio and R to crash

While using RcppArmadillo::sample function I discovered that using a large input vector causes RStudio to crash. I provide the entire code below:
#include<iostream>
#include <armadillo>
#include <RcppArmadilloExtensions/sample.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace std;
using namespace Rcpp;
using namespace arma;
//[[Rcpp::export]]
IntegerVector test_func(int N) {
IntegerVector frame = Range(1, N);
NumericVector wts = runif(N, 0, 1);
NumericVector Wts = wts / sum(wts);
IntegerVector y = RcppArmadillo::sample(frame, N,TRUE, Wts );
return y;
}
Calling test_func(N=100) produces the right results. But N greater that 200, for instance test_func(N=210), crashes RStudio as well as RConsole. Is there a mistake I am making?
I cannot replicate this. On either a straight R ression or inside RStudio it just works for me.
I made small corrections to your code:
// the following header also include Rcpp and Armadillo headers
#include <RcppArmadilloExtensions/sample.h>
//[[Rcpp::depends(RcppArmadillo)]]
//[[Rcpp::export]]
Rcpp::IntegerVector test_func(int N) {
Rcpp::IntegerVector frame = Rcpp::Range(1, N);
Rcpp::NumericVector wts = Rcpp::runif(N, 0.0, 1.0);
return Rcpp::RcppArmadillo::sample(frame, N, true, wts / Rcpp::sum(wts));
}
but none of these should be material.
Note how the code of the sample() function throws an excpetion if N gets big:
if (walker_test < 200) {
ProbSampleReplace<IntegerVector>(index, nOrig, size, prob);
} else {
throw std::range_error("Walker Alias method not implemented. [...]");
}
so I think you may be seeing a garden variety error of a mismatch between R, Rcpp, RcppArmadillo. What platform are you on? For it is Linux where packages are recompiled.

Resources