Rcpp Error Null value passed as symbol address - r

I am new to Rcpp.
I created an Rcpp function that takes a data frame with 2 columns and a vector as input, and returns a vector.
My data are as follows:
set.seed(10)
min= sort(rnorm(1000,800,sd=0.1))
max= min+0.02
k=data.frame(min,max)
explist= sort(rnorm(100,800,sd=0.2))
Then I call the function defined in cfilter.cpp:
k$output <- cfilter(k,explist)
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector cfilter(DataFrame k, NumericVector explist) {
NumericVector col1 = k["min"];
NumericVector col2 = k["max"];
NumericVector exp = explist ;
int n = col1.size();
int j = 0;
CharacterVector out(n);
for (int i=0; i<n ; i++){
out[i]=NA_STRING;
while(exp[j]<= col2[i]){
if( exp[j]>= col1[i] && exp[j]<= col2[i] ){
out[i]="Y";
break;
}
else if(exp[j]>col2[i]){
break;
}
else {
j++ ;
}
}
}
return out;
}
It ran perfectly fine the first 16171 times I called it. Then, on iteration 16172 of the loop, it suddenly stopped with an error:
> myfile$output<- cfilter(k,explist2)
Error in .Call(<pointer: (nil)>, k, explist) :
NULL value passed as symbol address
I checked k and explist for NA values, but there aren't any; there is no problem whatsoever with the input.
I have no clue what causes this error or how to fix it.
Thanks in advance for any response.

I came across the same problem. I'm not an expert in Rcpp, C++, or backend coding.
I have circumvented this problem by re-sourcing my cpp file every time I want to call the function. So, for example, if the following is your for loop:
for(i in 1:SampleSize){
out[[i]]<-cfilter(k,explist)
}
Do something like:
for(i in 1:SampleSize){
sourceCpp("cfilter.cpp")
out[[i]]<-cfilter(k,explist)
}
Again, I don't know exactly why this worked for me, but it worked. Based on my shallow knowledge of C++, it might be related to memory allocation and that every time you source, memory is released and hence there is no mis-allocation. But I think this is a very wild guess.
Best
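
Separately from the error discussed above, the inner while loop in cfilter reads exp[j] without ever checking j against the length of explist, so it can read past the end of the vector once every remaining value lies below the current interval. Whether this is related to the reported error is not established here. The following is only a minimal sketch of a bounds-checked scan in plain C++ (std::vector instead of Rcpp types, a boolean hit flag instead of "Y"/NA); the function name interval_hits and its parameters are hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: for each interval [lo[i], hi[i]], report whether any
// value of the sorted vector `e` falls inside it. Intervals are assumed to be
// sorted as well, so the cursor `j` only ever moves forward.
std::vector<bool> interval_hits(const std::vector<double>& lo,
                                const std::vector<double>& hi,
                                const std::vector<double>& e) {
    const std::size_t n = lo.size();
    const std::size_t m = e.size();
    std::vector<bool> out(n, false);
    std::size_t j = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // The `j < m` test is the bounds check missing from the original loop.
        while (j < m && e[j] <= hi[i]) {
            if (e[j] >= lo[i]) {
                out[i] = true;   // e[j] lies inside the interval
                break;
            }
            ++j;                 // e[j] is below the interval; skip it
        }
    }
    return out;
}
```

With this guard, the scan simply stops reporting hits once the values are exhausted, for example interval_hits({1.0}, {2.0}, {0.5}) yields a single false entry instead of reading beyond the vector.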

Related

Exception in thread "main" java.lang.StackOverflowError at Solution.recur(File.java:58)

class Solution{
ArrayList subsetSums(ArrayList arr, int N){
int sum=0;
ArrayList<Integer> temparr = new ArrayList<>();
for(int i=1;i<=arr.size();i++)
{
for(int j = 0; i < arr.size()-i+1 ; j++)
temparr.add(recur(arr,i,j,sum));
}
return temparr;
}
int recur(ArrayList<Integer> arr,int i,int j,int sum)
{
int index = j;
int len = i;
int Sum = sum;
if(len==0)
{
return Sum;
}
Sum += arr.get(index);
return recur(arr,len--,index++,Sum);
}
}
I'm getting a stack overflow error at 'return recur(arr,len--,index++,Sum);'.
I think, the main problem here (see comments for potential other problems) is the way you are trying to pass changed arguments to the recursive invocation:
recur(arr,len--,index++,Sum)
Actually this will call recur with the unchanged values of len and index because the operators ++ and -- (when written on the right side of a variable) are defined to return the original value of the variable and then update the variable's value.
Use (I would prefer this)
recur(arr, len-1, index+1, Sum)
or (okay, but the assignment is not needed)
recur(arr, --len, ++index, Sum)
to actually pass the modified value to the function.
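
The difference between the postfix and prefix forms can be checked in isolation. The sketch below is written in C++ rather than Java, since C++ is the language used for the other examples on this page; the operators ++ and -- yield the same values in both languages. The helper names post_value, pre_value, and sum_from are hypothetical and only illustrate the point.

```cpp
#include <vector>

// Postfix: the expression yields the ORIGINAL value, then x is decremented.
int post_value(int x) { return x--; }

// Prefix: x is decremented first, and the expression yields the NEW value.
int pre_value(int x) { return --x; }

// Hypothetical recursive sum of `len` elements starting at `index`.
// Passing len - 1 and index + 1 guarantees progress toward the base case.
int sum_from(const std::vector<int>& arr, int len, int index, int sum) {
    if (len == 0) {
        return sum;
    }
    sum += arr[index];
    return sum_from(arr, len - 1, index + 1, sum);
}
```

Here post_value(3) yields 3 while pre_value(3) yields 2, which is why a recursive call written with len-- never reaches len == 0.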
Java has a recursion limit: the call stack is finite. One way to fix this is to replace the recursion with a loop. (Or change the function so it does not recurse as deeply. Infinite loops are a problem just as infinite recursion is.)
A few tips for the future:
Google the documentation for errors
State the language with a tag in posts
Don't use formatting of line 1
Debug with print statements

how to create a Rcpp NumericVector from Eigen::Tensor without copying underlying data

Suppose I create a large Tensor in Eigen and would like to return it to R as a multi-dimensional array. I know how to do it with a data copy, as shown below. Question: is it possible to do it without the data-copy step?
#include <Rcpp.h>
#include <RcppEigen.h>
#include <unsupported/Eigen/CXX11/Tensor>
// [[Rcpp::depends(RcppEigen)]]
using namespace Rcpp;
template <typename T>
NumericVector copyFromTensor(const T& x)
{
int n = x.size();
NumericVector ans(n);
IntegerVector dim(x.NumDimensions);
for (int i = 0; i < x.NumDimensions; ++i) {
dim[i] = x.dimension(i);
}
memcpy((double*)ans.begin(), (double*)x.data(), n * sizeof(double));
ans.attr("dim") = dim;
return ans;
}
// [[Rcpp::export]]
NumericVector getTensor() {
Eigen::Tensor<double, 3> x(4, 3, 1);
x.setRandom();
return copyFromTensor(x);
}
/*** R
getTensor()
*/
As a general rule, you can go zero-copy on the way into your C++ code with data coming from R and already managed by R.
On the way out of your C++ code, with data returning to R, anything that was not created using the R allocator has to be copied.
Here your object x is a local (stack-allocated) object whose memory R does not manage, so you need a copy. See Writing R Extensions about the R allocator; Eigen may let you use it when you create a new Tensor object. That is not a trivial step. I think I would just live with the copy.

RcppArmadillo's sample() is ambiguous after updating R

I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?
There are two things happening here, and two parts to your problem and hence the answer.
The first is "meta": why now? Well, we had a bug left in the sample() code / setup, which Christian kindly fixed for the most recent RcppArmadillo release (and it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed, as it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say what compiler or version you use, but mine (currently g++-9.3) is actually pretty helpful with the error. It is still C++, so some interpretative dance is needed, but in essence it is clearly stating that you called with Rcpp::Matrix<14>::Row and no interface is provided for that type. Which is correct. sample() offers a few interfaces, but none for a Row object. So the fix is, once again, simple: add a line to aid the compiler by making the row a NumericVector, and all is good.
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
Rcpp::NumericVector z(x(i, _));
result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
}
return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>
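
For readers wondering how such an ambiguity arises in general, here is a small standalone sketch. It does not reproduce RcppArmadillo's actual overload set; the types NumVec, ArmaVec, and Row and the function pick are hypothetical stand-ins. It only shows the mechanism: an argument that converts equally well to several parameter types makes the call ambiguous, and materializing it as one concrete type first resolves it.

```cpp
#include <vector>

// Two hypothetical vector-like types accepted by the overloads below.
struct NumVec  { std::vector<double> v; };
struct ArmaVec { std::vector<double> v; };

// A hypothetical proxy type that converts implicitly to either one.
struct Row {
    std::vector<double> v;
    operator NumVec() const  { return NumVec{v}; }
    operator ArmaVec() const { return ArmaVec{v}; }
};

// Overloads: neither accepts a Row directly.
int pick(const NumVec&)  { return 1; }
int pick(const ArmaVec&) { return 2; }

int call_with_row(const Row& r) {
    // pick(r);      // would not compile: both conversions are equally good
    NumVec z = r;    // make the intended type explicit first
    return pick(z);  // now exactly one overload matches
}
```

This mirrors the extra line in the fixed code above, where the row is first turned into a NumericVector before it is passed on.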

Rcpp keeps running for a seemingly simple task

I've been thinking about it all day and still cannot figure out why this happens. My objective is simple: STEP1, generate a function S(h,p); STEP2, numerically integrate S(h,p) with respect to p by the trapezoidal rule and obtain a new function SS(h). I wrote the code and sourced it with sourceCpp, and it successfully generated the two functions S(h,p) and SS(h) in R. But when I tried to test it by calculating SS(1), R just kept running and never gave a result, which is weird because the amount of computation is not that big. Any idea why this would happen?
My code is here:
#include <Rcpp.h>
using namespace Rcpp;
//generate the first function that gives S(h,p)
// [[Rcpp::export]]
double S(double h, double p){
double out=2*(h+p+h*p);
return out;
}
//generate the second function that gives the numerical integration of S(h,p) w.r.t p
//[[Rcpp::export]]
double SS(double h){
double out1=0;
double sum=0;
for (int i=0;i<1;i=i+0.01){
sum=sum+S(h,i);
}
out1=0.01/2*(2*sum-S(h,0)-S(h,1));
return out1;
}
The problem is that you are treating i as if it were not an int in this statement:
for (int i=0;i<1;i=i+0.01){
sum=sum+S(h,i);
}
After each iteration you are attempting to add 0.01 to an integer, which is of course immediately truncated towards 0, meaning that i is always equal to zero, and you have an infinite loop. A minimal example highlighting the problem, with a couple of possible solutions:
#include <Rcpp.h>
#include <cstdio> // for std::printf
// [[Rcpp::export]]
void bad_loop() {
for (int i = 0; i < 1; i += 0.01) {
std::printf("i = %d\n", i);
Rcpp::checkUserInterrupt();
}
}
// [[Rcpp::export]]
void good_loop() {
for (int i = 0; i < 100; i++) {
std::printf("i = %d\n", i);
Rcpp::checkUserInterrupt();
}
}
// [[Rcpp::export]]
void good_loop2() {
for (double j = 0.0; j < 1.0; j += 0.01) {
std::printf("j = %.2f\n", j);
Rcpp::checkUserInterrupt();
}
}
The first alternative (good_loop) is to scale your step size appropriately -- looping from 0 through 99 by 1 takes the same number of iterations as looping from 0 to 0.99 by 0.01. Additionally, you could just use a double instead of an int, as in good_loop2. At any rate, the main takeaway here is that you need to be more careful about choosing your variable types in C++. Unlike R, when you declare i to be an int it will be treated like an int, not a floating point number.
As @nrussell pointed out very expertly, there is a problem with declaring i as an int while incrementing it by a double. The goal of posting this answer is to stress the need to avoid using a double or float as a loop counter. I've opted to post it as an answer instead of a comment for readability.
Please note that a loop counter should never be a double or a float, because of precision issues: repeatedly adding 0.01 accumulates rounding error, so it is hard to hit i = 0.99 exactly and the number of iterations becomes unreliable.
Instead, I would opt to keep the loop counter an int and convert it to a double / float as soon as possible, e.g.
for (int i=0; i < 100; i++){
// Make sure to use double division
// (e.g. either numerator or denominator is a floating / double)
sum += S(h, i/100.0);
}
Further notes:
RcppArmadillo and C++ division issue
Using float / double as a loop variable

RCppParallel Programming Error Crashing R

I have been trying to parallelize one of my Rcpp routines. In doing so I have been trying to follow the Parallel Distance Calculation example from jjallaire. Unfortunately, once I got everything coded up and started to play around, my R session would crash. Sometimes after the first execution, sometimes after the third. To be honest, it was a crap shoot as to when R would crash when I ran the routine. So, I have pared down my code to a small reproducible example to play with.
Rcpp File (mytest.cpp)
#include <Rcpp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace std;
using namespace Rcpp;
using namespace RcppParallel;
struct MyThing : public Worker {
RVector<double> _pc;
RVector<double> _pcsd;
MyThing(Rcpp::NumericVector _pc, Rcpp::NumericVector _pcsd) : _pc(_pc), _pcsd(_pcsd){}
void operator()(std::size_t begin, std::size_t end) {
for(int j = begin; j <= end; j++) {
_pc[j] = 1;
// _pcsd[j] = 1;
}
}
};
// [[Rcpp::export]]
void calculateMyThingParallel() {
NumericVector _pc(100);
NumericVector _pcsd(100);
MyThing mt(_pc, _pcsd);
parallelFor(0, 100, mt);
}
R Compilation and Execution Script (mytest.R)
library(Rcpp)
library(inline)
sourceCpp('mytest.cpp')
testmything = function() {
calculateMyThingParallel()
}
if(TRUE) {
for(i in 1:20) {
testmything()
}
}
The error seems to be directly related to my setting of the _pc and _pcsd variables in the operator() method. If I take those out things dramatically improve. Based on the Parallel Distance Calculation example, I am not sure what it is that I have done wrong here. I was under the impression that RVector was thread safe. Although that is my impression, I know this is an issue with threads somehow. Can anybody help me to understand why the above code randomly crashes my R sessions?
For information I am running the following:
Windows 7
R: 3.1.2
Rtools: 3.1
Rcpp: 0.11.3
inline: 0.3.13
RStudio: 0.99.62
After cross-posting this question on the rcpp-devel list, a user responded and informed me that my loop over j in the operator() method should run over begin <= j < end and not begin <= j <= end, which is what I had.
I made that change and, sure enough, everything seems to be working right now.
It seems that overextending one's reach past allocated memory still results in unintended consequences...
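
The half-open convention can be illustrated without RcppParallel. The sketch below is plain C++ with hypothetical helpers (fill_range, fill_in_chunks) that split the work sequentially into chunks; it is not the RcppParallel API, only a model of a worker being handed [begin, end) ranges.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Worker-style function: processes the half-open range [begin, end).
// Using `j <= end` here would write one element past the chunk, and
// past the end of the vector for the last chunk.
void fill_range(std::vector<double>& out, std::size_t begin, std::size_t end) {
    for (std::size_t j = begin; j < end; ++j) {
        out[j] = 1.0;
    }
}

// Hypothetical driver: splits [0, n) into consecutive chunks of size `chunk`.
std::vector<double> fill_in_chunks(std::size_t n, std::size_t chunk) {
    std::vector<double> out(n, 0.0);
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        fill_range(out, begin, end);  // adjacent chunks share no index
    }
    return out;
}
```

With half-open ranges, the end of one chunk is the begin of the next, so every index is written exactly once and the final chunk stops at n rather than at n + 1.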
