RcppParallel Programming Error Crashing R

I have been trying to parallelize one of my Rcpp routines, following the Parallel Distance Calculation example from JJ Allaire. Unfortunately, once I got everything coded up and started to play around, my R session would crash, sometimes after the first execution, sometimes after the third. To be honest, it was a crapshoot as to when R would crash when I ran the routine. So I have pared my code down to a small reproducible example to play with.
Rcpp File (mytest.cpp)
#include <Rcpp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
struct MyThing : public Worker {
RVector<double> _pc;
RVector<double> _pcsd;
MyThing(Rcpp::NumericVector _pc, Rcpp::NumericVector _pcsd) : _pc(_pc), _pcsd(_pcsd){}
void operator()(std::size_t begin, std::size_t end) {
for(int j = begin; j <= end; j++) {
_pc[j] = 1;
// _pcsd[j] = 1;
}
}
};
// [[Rcpp::export]]
void calculateMyThingParallel() {
NumericVector _pc(100);
NumericVector _pcsd(100);
MyThing mt(_pc, _pcsd);
parallelFor(0, 100, mt);
}
R Compilation and Execution Script (mytest.R)
library(Rcpp)
library(inline)
sourceCpp('mytest.cpp')
testmything = function() {
calculateMyThingParallel()
}
if(TRUE) {
for(i in 1:20) {
testmything()
}
}
The error seems to be directly related to my setting of the _pc and _pcsd variables in the operator() method. If I take those out, things improve dramatically. Based on the Parallel Distance Calculation example, I am not sure what I have done wrong here. I was under the impression that RVector was thread safe, but I suspect this is an issue with threads somehow. Can anybody help me understand why the above code randomly crashes my R sessions?
For information I am running the following:
Windows 7
R: 3.1.2
Rtools: 3.1
Rcpp: 0.11.3
inline: 0.3.13
RStudio: 0.99.62

After cross-posting this question on the rcpp-devel list, a user responded and informed me that my loop over j in the operator() method should cover begin <= j < end and not begin <= j <= end, which is what I had.
I made that change and, sure enough, everything seems to be working right now.
It seems that reaching past allocated memory still results in unintended consequences...
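The fix works because each worker is handed a half-open range [begin, end), so `end` itself belongs to the next chunk (or lies past the vector for the last chunk). A minimal sketch in plain C++, without Rcpp, illustrates the idea. The names fill_chunk and fill_all are made up for illustration, and the chunking loop is only a serial stand-in for what parallelFor does, not its actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a worker's operator(): writes 1.0 into the
// half-open range [begin, end). Using j <= end here would write one element
// past the chunk, and past the end of the vector for the last chunk.
void fill_chunk(std::vector<double>& out, std::size_t begin, std::size_t end) {
    for (std::size_t j = begin; j < end; ++j) {
        out[j] = 1.0;
    }
}

// Hypothetical serial stand-in for a parallel scheduler: splits [0, n) into
// chunks of at most `grain` elements (grain must be > 0) and processes each.
std::vector<double> fill_all(std::size_t n, std::size_t grain) {
    std::vector<double> out(n, 0.0);
    for (std::size_t begin = 0; begin < n; begin += grain) {
        std::size_t end = (begin + grain < n) ? begin + grain : n;
        fill_chunk(out, begin, end);
    }
    return out;
}
```

With n = 100 and a chunk size of 30, the last chunk is [90, 100), so index 100 is never touched. With `j <= end` the same call would write to out[100], which is outside the allocation.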

Related

How do I resolve compile error in RcppParallel function which points to an RcppParallel header file

I'm trying to speed up numeric computation in my R code using RcppParallel and am attempting to edit an example that uses the Cpp sqrt() function to take the square root of each element of a matrix. My edited code replaces matrices with vectors and multiplies the sqrt() by a constant. (In actual use I will have 3 constants and my own operator function.)
The example comes from
https://gallery.rcpp.org/articles/parallel-matrix-transform/
The compiler identifies the error as in the 'algorithm' file on a comment line:
Line 7 no matching function for call to object of type 'SquareRootPlus::sqrtWrapper'
This is my initial attempt to use RcppParallel and I've not used C++ for several years.
Edit: running macOS Ventura on apple silicon,
Rcpp ver 1.0.10,
RcppParallel ver 5.1.6,
and R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
It would be called like this (if it compiled), where c is a numeric constant (a double) and res is a numeric vector:
res <- parallelMatrixSqrt(someNumericalVector, c)
My testing code is:
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace RcppParallel;
using namespace Rcpp;
struct SquareRootPlus : public Worker
{
// source vector etc
const RVector<double> input;
const double constParam;
// destination vector
RVector<double> output;
// initialize with source and destination
// get the data type correctly unless auto promoted/cast
SquareRootPlus(const Rcpp::NumericVector input, const double constParam,
Rcpp::NumericVector output)
: input(input), constParam(constParam), output(output) {}
struct sqrt_wrapper { // describe worker function
public: double operator()(double a, double cp) {
return ::sqrt(a) * cp;
}
};
// take the square root of the range of elements requested
// (and multiply it by the constant)
void operator()(std::size_t begin, std::size_t end) {
std::transform(input.begin() + begin,
input.begin() + end,
output.begin() + begin,
sqrt_wrapper());
}
};
// public called routine
// [[Rcpp::export]]
Rcpp::NumericVector paralleVectorSqrt(Rcpp::NumericVector x, double c) {
// allocate the output matrix
Rcpp::NumericVector output(x.length());
// SquareRoot functor (pass input and output matrixes)
SquareRootPlus squareRoot(x, c, output);
// call parallelFor to do the work
parallelFor(0, x.length(), squareRoot);
// return the output matrix
return output;
}
That still works fine for me (Ubuntu 22.10, g++-12), modulo the same warnings we often get from libraries like Boost, and here now from the included TBB library (the repo should have a newer one, so you can try that).
I just did (straight from the Rcpp Gallery source directory):
> library(Rcpp)
> sourceCpp("2014-06-29-parallel-matrix-transform.cpp")
In file included from /usr/local/lib/R/site-library/RcppParallel/include/tbb/tbb.h:41,
from /usr/local/lib/R/site-library/RcppParallel/include/RcppParallel/TBB.h:10,
from /usr/local/lib/R/site-library/RcppParallel/include/RcppParallel.h:21,
from 2014-06-29-parallel-matrix-transform.cpp:59:
/usr/local/lib/R/site-library/RcppParallel/include/tbb/concurrent_hash_map.h:343:23: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
[... more like this omitted for brevity ...]
> # allocate a matrix
> m <- matrix(as.numeric(c(1:1000000)), nrow = 1000, ncol = 1000)
> # ensure that serial and parallel versions give the same result
> stopifnot(identical(matrixSqrt(m), parallelMatrixSqrt(m)))
> # compare performance of serial and parallel
> library(rbenchmark)
> res <- benchmark(matrixSqrt(m),
+ parallelMatrixSqrt(m),
+ order="relative")
> res[,1:4]
                   test replications elapsed relative
2 parallelMatrixSqrt(m)          100   0.496    1.000
1         matrixSqrt(m)          100   0.565    1.139
>
and as you can see it not only builds but also runs the example call from R.
You would have to give us more detail about how you call it and what OS and package versions you use. And I won't have time now to dig into your code and do a code review for you but given that (still relatively simple) reference example works maybe you can reduce your currently-not-working approach down to something simpler that works.
Edit: Your example appears to have switched from a unary function to one with two arguments in its signature. Sadly it ain't that easy. The fuller error message is (on my side with g++-12)
/usr/include/c++/12/bits/stl_algo.h:4263:31: error: no match for call to ‘(SquareRootPlus::sqrt_wrapper) (const double&)’
4263 | *__result = __unary_op(*__first);
| ~~~~~~~~~~^~~~~~~~~~
question.cpp:25:20: note: candidate: ‘double SquareRootPlus::sqrt_wrapper::operator()(double, double)’
25 | public: double operator()(double a, double cp) {
| ^~~~~~~~
question.cpp:25:20: note: candidate expects 2 arguments, 1 provided
So you need to rework / extend the example framework for this.
Edit 2: The gory details about std::transform() and its unary function are e.g. here at cppreference.com.
Edit 3: Building on the previous comment, when you step back a bit and look at what is happening here, you may see that RcppParallel excels at parceling up a large data structure, submitting all the pieces in parallel, and finally reassembling the result. That still works. You simply cannot apply a function with a richer signature via std::transform(). No more, no less. You need to rework the guts of the worker, that is, the part which applies your function to the chunk it sees. Check the other RcppParallel examples for inspiration.
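One possible way to rework it, sketched here in plain C++ without Rcpp, is to keep the function object unary and let it carry the constant itself, for example via a lambda capture. The function name sqrt_times_chunk and its parameters are made up for illustration, and std::vector stands in for RVector:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ analogue of the worker's operator(): applies
// sqrt(a) * cp to the elements in [begin, end) of `input` and writes the
// results to the same positions of `output`.
void sqrt_times_chunk(const std::vector<double>& input,
                      std::vector<double>& output,
                      double cp,
                      std::size_t begin, std::size_t end) {
    // The unary form of std::transform calls its function object with a
    // single argument, so the constant is captured rather than passed.
    std::transform(input.begin() + begin, input.begin() + end,
                   output.begin() + begin,
                   [cp](double a) { return std::sqrt(a) * cp; });
}
```

Inside a Worker, the constant would presumably come from a data member such as constParam, copied into a local variable before the capture. An explicit loop over [begin, end) that calls a two-argument function directly would work just as well.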

RcppArmadillo's sample() is ambiguous after updating R

I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?
There are two things happening here, and two parts to your problem and hence the answer.
The first is "meta": why now? Well, we had a bug left in the sample() code / setup which Christian kindly fixed for the most recent RcppArmadillo release (and it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed as it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say what compiler or version you use, but mine (currently g++-9.3) is actually pretty helpful with the error. It is still C++, so some interpretative dance is needed, but in essence it is clearly stating that you called with Rcpp::Matrix<14>::Row and no interface is provided for that type. Which is correct. sample() offers a few interfaces, but none for a Row object. So the fix is, once again, simple. Add a line to aid the compiler by making the row a NumericVector and all is good.
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
Rcpp::NumericVector z(x(i, _));
result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
}
return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>
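For readers wondering how a missing interface can surface as an "ambiguous" call, here is a toy model in plain C++. It is not RcppArmadillo's real overload set: VecA, VecB, RowProxy, and pick are all invented names. It only shows the general mechanism, where an argument converts equally well to the parameter types of two overloads, and why materialising it as one concrete type first resolves the call:

```cpp
#include <cassert>
#include <vector>

// Toy types, not the real Rcpp or Armadillo classes.
struct VecA { std::vector<double> data; };
struct VecB { std::vector<double> data; };

// A proxy that converts implicitly to either vector type, loosely playing
// the role of a matrix row view.
struct RowProxy {
    std::vector<double> data;
    operator VecA() const { return VecA{data}; }
    operator VecB() const { return VecB{data}; }
};

// Two overloads; neither takes a RowProxy directly.
int pick(const VecA&) { return 1; }
int pick(const VecB&) { return 2; }

int pick_via_a(const RowProxy& row) {
    // pick(row);    // would not compile: both conversions are equally good
    VecA z = row;    // materialise the proxy as one concrete type first
    return pick(z);  // now exactly one overload matches
}
```

The extra line in the fixed code above plays the same role as `VecA z = row;` here: it hands the compiler a concrete type so that only one candidate remains.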

Passing R functions to C routines using rcpp

I have a C function from a down-stream library that I call in C like this
result = cfunction(input_function)
input_function is a callback that needs to have the following structure
double input_function(const double &x)
{
return(x*x);
}
Where x*x is a user-defined computation that is usually much more complicated. I'd like to wrap cfunction using Rcpp so that the R user could call it on arbitrary R functions.
NumericVector rfunction(Function F){
NumericVector result(1);
// MAGIC THAT I DON'T KNOW HOW TO DO
// SOMEHOW TURN F INTO COMPATIBLE input_function
result[0] = cfunction(input_function);
return(result);
}
The R user then might do rfunction(function(x) {x*x}) and get the right result.
I am aware that calling R functions within cfunction will kill the speed but I figure that I can figure out how to pass compiled functions later on. I'd just like to get this part working.
The closest thing I can find that does what I need is this https://sites.google.com/site/andrassali/computing/user-supplied-functions-in-rcppgsl which wraps a function whose callback has an oh-so-useful second parameter into which I could stuff the R function.
Advice would be gratefully received.
One possible solution would be saving the R-function into a global variable and defining a function that uses that global variable. Example implementation where I use an anonymous namespace to make the variable known only within the compilation unit:
#include <Rcpp.h>
#include <memory> // std::unique_ptr, std::make_unique (needs C++14 or later)
extern "C" {
double cfunction(double (*input_function)(const double&)) {
return input_function(42);
}
}
namespace {
std::unique_ptr<Rcpp::Function> func;
}
double input_function(const double &x) {
Rcpp::NumericVector result = (*func)(x);
return result(0);
}
// [[Rcpp::export]]
double rfunction(Rcpp::Function F){
func = std::make_unique<Rcpp::Function>(F);
return cfunction(input_function);
}
/*** R
rfunction(sqrt)
rfunction(log)
*/
Output:
> Rcpp::sourceCpp('57137507/code.cpp')
> rfunction(sqrt)
[1] 6.480741
> rfunction(log)
[1] 3.73767
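The same pattern can be tried out without R at all. The sketch below is plain C++ and swaps Rcpp::Function for std::function, so it is an analogue of the answer rather than the answer's code. The names stored_callable, trampoline, and call_with are made up for illustration, and cfunction is a stand-in that evaluates the callback at 42 like the one above:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-in for the library routine: accepts only a plain function pointer,
// with no extra "user data" argument.
double cfunction(double (*input_function)(const double&)) {
    return input_function(42.0);
}

namespace {
// File-local slot holding whatever callable the user supplied.
std::function<double(double)> stored_callable;
}

// Trampoline with the exact signature cfunction expects; it forwards to
// the callable currently stored in the slot.
double trampoline(const double& x) {
    return stored_callable(x);
}

// Hypothetical wrapper: stores the callable, then passes the trampoline.
double call_with(std::function<double(double)> f) {
    stored_callable = std::move(f);
    return cfunction(trampoline);
}
```

Because the callable lives in a single shared slot, this approach is neither reentrant nor thread safe: a nested or concurrent call to call_with would overwrite the slot while the outer call is still using it.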

Rcpp Error Null value passed as symbol address

I am new to Rcpp.
I created an Rcpp function which takes a data frame with 2 columns and a vector as input, and returns a vector.
My data are as below
set.seed(10)
min= sort(rnorm(1000,800,sd=0.1))
max= min+0.02
k=data.frame(min,max)
explist= sort(rnorm(100,800,sd=0.2))
Then I call the cfilter.cpp
k$output <- cfilter(k,explist)
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector cfilter(DataFrame k, NumericVector explist) {
NumericVector col1 = k["min"];
NumericVector col2 = k["max"];
NumericVector exp = explist ;
int n = col1.size();
int j = 0;
CharacterVector out(n);
for (int i=0; i<n ; i++){
out[i]=NA_STRING;
while(j < exp.size() && exp[j]<= col2[i]){ // stop before reading past the end of exp
if( exp[j]>= col1[i] && exp[j]<= col2[i] ){
out[i]="Y";
break;
}
else if(exp[j]>col2[i]){
break;
}
else {
j++ ;
}
}
}
return out;
}
It ran perfectly fine for the first 16171 times I called it. Then suddenly, on call 16172, it just stopped with an error:
> myfile$output<- cfilter(k,explist2)
Error in .Call(<pointer: (nil)>, k, explist) :
NULL value passed as symbol address
I checked k and explist for NA values but there aren't any, there is no problem whatsoever with the input.
I have no clue how to fix this and what causes this error.
Thanks in advance for any response
I came across the same problem. I'm not an Rcpp expert, nor a C++ or backend coding expert.
I have circumvented this problem by re-sourcing my cpp file every time I want to call the function. So, for example, if the following is your for loop:
for(i in 1:SampleSize){
out[[i]]<-cfilter(k,explist)
}
Do something like:
for(i in 1:SampleSize){
sourceCpp("cfilter.cpp")
out[[i]]<-cfilter(k,explist)
}
Again, I don't know exactly why this worked for me, but it worked. Based on my shallow knowledge of C++, it might be related to memory allocation and that every time you source, memory is released and hence there is no mis-allocation. But I think this is a very wild guess.
Best

How to call R function as a worker thread from cpp code? using Rcpp package

I have found a strange problem when using Rcpp, maybe it is a known limitation in Rcpp package, but I failed to find any hints by searching related documents, hope someone can help or explain this problem.
Here is my code:
// [[Rcpp::export]]
void set_r_cb(Function f) {
Environment env = Environment::global_env();
env["place_f"] = f;
}
void __test_thread(void* data) {
Rprintf("in thread body\n");
Function f("place_f");
f(*((NumericVector*)data));
}
// [[Rcpp::export]]
NumericVector use_r_callback(NumericVector x) {
Environment env = Environment::global_env();
Function f = env["place_f"];
{ // test thread
tthread::thread t(__test_thread, x);
t.join();
}
return f(x);
}
where in R code:
> x = runif(100)
> set_r_cb(fivenum)
When there is no thread call, everything is OK, and it returns something like this:
> use_r_callback(x)
[1] 0.01825808 0.24010829 0.37492796 0.58618216 0.93935818
When using the thread code, I got this error:
> use_r_callback(x)
in thread body
Error: C stack usage 237426928 is too close to the limit
BTW, I use tinythread, https://gitorious.org/tinythread, but same error occurs when use boost::thread.
R itself is single-threaded, so you simply cannot use your R instance from multiple threads.
You can call C++ from R which
sets things up (if needed)
sets a mutex
multithreads to its heart's content, never calling R, never touching R data structures (and here you can use OpenMP, Boost threads, C++ threads, standard pthreads, ...)
collects results
clears the mutex
prepares return (if needed)
and returns. Pretty much everything else will get you errors.
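The steps above can be sketched in plain C++ with std::thread. This is only an illustration under assumptions: threaded_sum is a made-up helper, the std::vector stands in for values copied out of an R vector before any thread starts, and the mutex here guards the shared accumulator, which is one of several reasonable places to put it:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `n_threads` worker threads
// (n_threads must be > 0). Nothing inside the workers touches R.
double threaded_sum(const std::vector<double>& data, std::size_t n_threads) {
    double total = 0.0;
    std::mutex total_mutex;            // guards `total`
    std::vector<std::thread> workers;
    const std::size_t n = data.size();

    for (std::size_t t = 0; t < n_threads; ++t) {
        // Each worker gets the half-open slice [begin, end).
        const std::size_t begin = n * t / n_threads;
        const std::size_t end = n * (t + 1) / n_threads;
        workers.emplace_back([&data, &total, &total_mutex, begin, end] {
            double partial = 0.0;
            for (std::size_t i = begin; i < end; ++i) partial += data[i];
            std::lock_guard<std::mutex> lock(total_mutex);
            total += partial;          // only plain C++ data is touched
        });
    }
    for (std::thread& w : workers) w.join();  // collect before returning
    return total;
}
```

In an Rcpp setting, the copy into std::vector and the conversion of the result back to an R object would both happen on the main thread, before the workers start and after they have all been joined.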

Resources