Rcpp keeps running for a seemingly simple task

I've been thinking about it all day and still cannot figure out why this happens. My objective is simple: STEP 1, define a function S(h,p); STEP 2, numerically integrate S(h,p) with respect to p using the trapezoidal rule to obtain a new function SS(h). I wrote the code and sourced it with sourceCpp, which successfully created the two functions S(h,p) and SS(h) in R. But when I tried to test it by calculating SS(1), R just kept running and never returned a result, which is weird because the amount of computation is small. Any idea why this would happen?
My code is here:
#include <Rcpp.h>
using namespace Rcpp;
// generate the first function that gives S(h,p)
// [[Rcpp::export]]
double S(double h, double p){
    double out=2*(h+p+h*p);
    return out;
}
// generate the second function that gives the numerical integration of S(h,p) w.r.t. p
// [[Rcpp::export]]
double SS(double h){
    double out1=0;
    double sum=0;
    for (int i=0;i<1;i=i+0.01){
        sum=sum+S(h,i);
    }
    out1=0.01/2*(2*sum-S(h,0)-S(h,1));
    return out1;
}

The problem is that you are treating i as if it were not an int in this statement:
for (int i=0;i<1;i=i+0.01){
    sum=sum+S(h,i);
}
After each iteration you are attempting to add 0.01 to an integer. The expression i + 0.01 is computed as a double, and assigning it back to the int i truncates it toward zero, so i is always equal to zero and you have an infinite loop. A minimal example highlighting the problem, with a couple of possible solutions:
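#include <Rcpp.h>
#include <cstdio>
// Infinite loop: i += 0.01 truncates back to 0 on every iteration.
// checkUserInterrupt() at least lets you stop it from the R console.
// [[Rcpp::export]]
void bad_loop() {
    for (int i = 0; i < 1; i += 0.01) {
        std::printf("i = %d\n", i);
        Rcpp::checkUserInterrupt();
    }
}
// Fix 1: keep an int counter and scale the step (100 iterations).
// [[Rcpp::export]]
void good_loop() {
    for (int i = 0; i < 100; i++) {
        std::printf("i = %d\n", i);
        Rcpp::checkUserInterrupt();
    }
}
// Fix 2: use a double loop variable.
// [[Rcpp::export]]
void good_loop2() {
    for (double j = 0.0; j < 1.0; j += 0.01) {
        std::printf("j = %.2f\n", j);
        Rcpp::checkUserInterrupt();
    }
}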
The first alternative (good_loop) is to scale your step size appropriately: looping from 0 through 99 by 1 takes the same number of iterations as looping from 0 to 0.99 by 0.01. Alternatively, you could use a double instead of an int, as in good_loop2. Either way, the main takeaway is that you need to be more careful about choosing your variable types in C++. Unlike in R, when you declare i to be an int it is treated as an int, not as a floating-point number.
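The truncation is easy to reproduce outside Rcpp. This minimal standard C++ sketch applies the loop's update `i += 0.01` once to an int and once to a double. The helper names step_int_counter and step_double_counter are hypothetical and do not come from the question.

```cpp
// Hypothetical helper: apply the loop's update step once to an int counter.
int step_int_counter(int i) {
    // i + 0.01 is evaluated as a double, then converted back to int on
    // assignment; that conversion truncates toward zero, so 0 stays 0
    // (and 5 stays 5).
    i += 0.01;
    return i;
}

// Hypothetical helper: the same update applied to a double counter
// actually advances it.
double step_double_counter(double j) {
    j += 0.01;
    return j;
}
```

Compilers can usually flag this implicit double-to-int conversion when extra warnings are enabled. GCC's -Wconversion is one example, and it is a cheap way to catch the mistake early.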

As @nrussell pointed out very expertly, there is an issue with treating i as an int when the value being added to it is a double. The goal of this answer is to stress the need to avoid using a double or float as a loop counter. I've opted to post it as an answer instead of a comment for readability.
Please note that the loop counter should never be a double or a float, because of precision issues. 0.01 has no exact binary representation, so repeatedly adding it accumulates rounding error, and the counter may drift away from the value you expect (e.g. it may not be exactly 0.99 after 99 steps). A condition such as j < 1.0 can then run one iteration more or fewer than intended.
Instead, I would opt to have the loop be processed as an int and convert it to a double / float as soon as possible, e.g.
for (int i=0; i < 100; i++){
    // Make sure to use double division
    // (i.e. either the numerator or the denominator is a float / double)
    sum += S(h, i/100.0);
}
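The following standalone C++ sketch puts the integer counter to work for the whole SS(h) computation. It uses plain C++ rather than Rcpp, and the number of intervals n is a hypothetical parameter added for illustration. Note one detail in the question's code: the loop runs over p = 0, 0.01, ..., 0.99, so its sum never includes p = 1. As a result, the end-point correction `2*sum-S(h,0)-S(h,1)` subtracts S(h,1) where it should add it. The sketch avoids this by summing only the interior points and giving both end points weight 1/2.

```cpp
#include <cmath>

// The integrand from the question: S(h, p) = 2 * (h + p + h * p).
double S(double h, double p) {
    return 2.0 * (h + p + h * p);
}

// Composite trapezoidal rule for the integral of S(h, p) over p in [0, 1].
// The int counter i is converted to p only at the point of use, so every
// grid point is computed as i / n, not by repeated addition.
double SS(double h, int n = 100) {
    const double dp = 1.0 / n;
    // Interior points p = 1/n, ..., (n-1)/n get full weight.
    double sum = 0.0;
    for (int i = 1; i < n; ++i) {
        const double p = static_cast<double>(i) / n;
        sum += S(h, p);
    }
    // End points p = 0 and p = 1 get half weight.
    return dp * (sum + 0.5 * (S(h, 0.0) + S(h, 1.0)));
}
```

S is linear in p, so the trapezoidal rule is exact here up to rounding. The integral of 2(h + p + hp) over p in [0, 1] is 3h + 1, so SS(1.0) should come out as 4 within floating-point error.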
Further notes:
RcppArmadillo and C++ division issue
Using float / double as a loop variable
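The precision point about double loop counters can be checked directly. In this minimal C++ sketch, accumulate_step mimics a double counter by repeated addition. scaled_index derives the grid point from an int counter the way `i/100.0` does. Both helper names are hypothetical. Assuming IEEE-754 doubles, as on mainstream platforms, adding 0.1 three times does not give exactly 0.3, but 3 / 10.0 does, because a single division is correctly rounded.

```cpp
// Add `step` to 0.0 `count` times, as a double loop counter would.
double accumulate_step(double step, int count) {
    double x = 0.0;
    for (int k = 0; k < count; ++k) {
        x += step;  // each addition can introduce a rounding error
    }
    return x;
}

// Derive the same grid point from an integer counter instead:
// one division, one rounding.
double scaled_index(int i, int n) {
    return static_cast<double>(i) / n;
}
```

Steps that are exactly representable, such as 0.25, accumulate without error. The issue is specific to values like 0.1 or 0.01 that have no finite binary expansion.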

Related

Exception in thread "main" java.lang.StackOverflowError at Solution.recur(File.java:58)

class Solution{
    ArrayList subsetSums(ArrayList arr, int N){
        int sum=0;
        ArrayList<Integer> temparr = new ArrayList<>();
        for(int i=1;i<=arr.size();i++)
        {
            for(int j = 0; i < arr.size()-i+1 ; j++)
                temparr.add(recur(arr,i,j,sum));
        }
        return temparr;
    }
    int recur(ArrayList<Integer> arr,int i,int j,int sum)
    {
        int index = j;
        int len = i;
        int Sum = sum;
        if(len==0)
        {
            return Sum;
        }
        Sum += arr.get(index);
        return recur(arr,len--,index++,Sum);
    }
}
I'm getting a StackOverflowError at 'return recur(arr,len--,index++,Sum);'
I think the main problem here (see the comments for other potential problems) is the way you are trying to pass changed arguments to the recursive invocation:
recur(arr,len--,index++,Sum)
This actually calls recur with the unchanged values of len and index. The postfix operators ++ and -- (written after a variable) are defined to return the original value of the variable and only then update the variable's value.
Use (I would prefer this)
recur(arr, len-1, index+1, Sum)
or (okay, but the assignment is not needed)
recur(arr, --len, ++index, Sum)
to actually pass the modified value to the function.
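C++ has the same prefix/postfix semantics as Java for this purpose, so the effect can be sketched in C++. The names received and window_sum are hypothetical. window_sum mirrors recur, but it passes len - 1 and index + 1 down to the next call.

```cpp
#include <vector>

// Returns its argument unchanged, so a test can see which value a call receives.
int received(int x) { return x; }

// Recursive sum of `len` consecutive elements starting at `index`,
// mirroring recur() from the question with the corrected arguments.
int window_sum(const std::vector<int>& arr, int len, int index, int sum) {
    if (len == 0) {
        return sum;  // base case: no elements left to add
    }
    return window_sum(arr, len - 1, index + 1, sum + arr[index]);
}
```

With len-- in place of len - 1, every call would receive the same len and never reach the len == 0 base case. That unbounded recursion is what ends in the StackOverflowError.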
Java's recursion depth is limited by the thread's stack size. The way to fix this is to replace the recursion with a loop (or to change the function so it does not recurse as deeply). Infinite loops are a problem just as infinite recursion is.
A few tips for the future:
Look up errors in the documentation (or search for them)
State the language with a tag in your posts
Use proper code formatting instead of markers like ,,, or '''
Debug with print statements
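Following the advice to replace recursion with a loop, here is a minimal iterative sketch in C++. The name contiguous_sums is hypothetical. The sketch assumes subsetSums is meant to collect the sum of every contiguous window of length 1..n, which is what the nested loops suggest once the inner condition tests j instead of i. As written, `i < arr.size()-i+1` never changes inside that inner loop.

```cpp
#include <cstddef>
#include <vector>

// Iterative version: for every window length len = 1..n and every start
// index j, append the sum of arr[j .. j + len - 1]. There is no recursion,
// so stack depth stays constant regardless of the input size.
std::vector<int> contiguous_sums(const std::vector<int>& arr) {
    std::vector<int> out;
    const std::size_t n = arr.size();
    for (std::size_t len = 1; len <= n; ++len) {
        for (std::size_t j = 0; j + len <= n; ++j) {  // same as j < n - len + 1
            int sum = 0;
            for (std::size_t k = j; k < j + len; ++k) {
                sum += arr[k];
            }
            out.push_back(sum);
        }
    }
    return out;
}
```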

RcppArmadillo's sample() is ambiguous after updating R

I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
    int n = x.nrow();
    IntegerVector result(n);
    for ( int i = 0; i < n; ++i ) {
        result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
    }
    return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?
There are two things happening here, so there are two parts to your problem and hence to the answer.
The first is "meta": why now? Well, a bug had crept into the sample() code / setup, and Christian kindly fixed it for the most recent RcppArmadillo release (it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed because it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say which compiler or version you use, but mine (currently g++-9.3) is actually pretty helpful here. It is still C++, so some interpretive dance is needed. In essence, though, it clearly states that you called sample() with an Rcpp::Matrix<14>::Row and that no interface is provided for that type. Which is correct: sample() offers a few interfaces, but none for a Row object. So the fix is, once again, simple. Add a line that helps the compiler by turning the row into a NumericVector, and all is good.
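The ambiguity itself can be reproduced in plain C++. This is a generic illustration under stated assumptions, not RcppArmadillo's actual code. RowProxy, pick and pick_row are hypothetical names: RowProxy stands in for a row type that converts implicitly to more than one vector type, and pick stands in for an overloaded function like sample(). Calling pick directly with the proxy would be ambiguous because both user-defined conversions rank equally. Converting to one concrete type first, as the NumericVector line does, leaves a single viable overload.

```cpp
#include <valarray>
#include <vector>

// Hypothetical stand-in for a matrix-row proxy that converts implicitly
// to two different vector types.
struct RowProxy {
    std::vector<double> values;
    operator std::vector<double>() const { return values; }
    operator std::valarray<double>() const {
        return std::valarray<double>(values.data(), values.size());
    }
};

// Two overloads accepting different probability-vector types
// (assumption: loosely analogous, not RcppArmadillo's real signatures).
int pick(const std::vector<double>&) { return 1; }
int pick(const std::valarray<double>&) { return 2; }

// pick(row) would not compile: "call of overloaded 'pick' is ambiguous".
// Copy-initializing a concrete vector first only considers conversions
// that yield std::vector<double>, so the call below is unambiguous.
int pick_row(const RowProxy& row) {
    std::vector<double> z = row;
    return pick(z);
}
```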
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
    int n = x.nrow();
    IntegerVector result(n);
    for ( int i = 0; i < n; ++i ) {
        // Copy the row into a NumericVector so the compiler can pick one sample() overload
        Rcpp::NumericVector z(x(i, _));
        result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
    }
    return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>
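For comparison, the same per-row sampling can be sketched with the C++ standard library alone. This is a swapped-in technique, not what RcppArmadillo::sample does internally. std::discrete_distribution draws an index with probability proportional to the row's weights. Because std::mt19937 is not R's RNG, the draws will not match R's even with the same seed. The name sample_matrix_std is hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// For each row of probabilities x[i], draw one element of choice_set,
// using the row's entries as (unnormalized) weights.
std::vector<int> sample_matrix_std(const std::vector<std::vector<double>>& x,
                                   const std::vector<int>& choice_set,
                                   std::mt19937& rng) {
    std::vector<int> result;
    result.reserve(x.size());
    for (const auto& row : x) {
        // Index k is drawn with probability row[k] / sum(row).
        std::discrete_distribution<int> dist(row.begin(), row.end());
        result.push_back(choice_set[static_cast<std::size_t>(dist(rng))]);
    }
    return result;
}
```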

Rcpp error: NULL value passed as symbol address

I am new to Rcpp.
I created an Rcpp function which takes a data frame with 2 columns and a vector as input, and returns a vector.
My data are as follows:
set.seed(10)
min= sort(rnorm(1000,800,sd=0.1))
max= min+0.02
k=data.frame(min,max)
explist= sort(rnorm(100,800,sd=0.2))
Then I call the function from cfilter.cpp:
k$output <- cfilter(k,explist)
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector cfilter(DataFrame k, NumericVector explist) {
    NumericVector col1 = k["min"];
    NumericVector col2 = k["max"];
    NumericVector exp = explist ;
    int n = col1.size();
    int j = 0;
    CharacterVector out(n);
    for (int i=0; i<n ; i++){
        out[i]=NA_STRING;
        while(exp[j]<= col2[i]){
            if( exp[j]>= col1[i] && exp[j]<= col2[i] ){
                out[i]="Y";
                break;
            }
            else if(exp[j]>col2[i]){
                break;
            }
            else {
                j++ ;
            }
        }
    }
    return out;
}
It ran perfectly fine the first 16171 times I called it. Then suddenly, on call 16172, it stopped with an error:
> myfile$output<- cfilter(k,explist2)
Error in .Call(<pointer: (nil)>, k, explist) :
NULL value passed as symbol address
I checked k and explist for NA values but there aren't any, there is no problem whatsoever with the input.
I have no clue how to fix this and what causes this error.
Thanks in advance for any response
I came across the same problem. I'm not an expert in Rcpp, C++, or backend coding.
I have circumvented this problem by re-sourcing my cpp file every time I call the function. So, for example, if the following is your for loop:
for(i in 1:SampleSize){
    out[[i]]<-cfilter(k,explist)
}
Do something like:
for(i in 1:SampleSize){
    sourceCpp("cfilter.cpp")
    out[[i]]<-cfilter(k,explist)
}
Again, I don't know exactly why this works, but it worked for me. Based on my shallow knowledge of C++, it might be related to memory allocation: every time you source the file, memory is released, and hence there is no mis-allocation. But this is a very wild guess.
Best
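Separately from the re-sourcing workaround, note that the inner while loop in cfilter only tests exp[j] <= col2[i] and never j < explist.size(). Once every remaining value of explist lies below col1[i], j runs past the end of the vector, and that out-of-bounds read is undefined behavior (Rcpp's .at() is the bounds-checked accessor). Whether this is what triggers the NULL symbol-address error is not established here. The standard C++ sketch below only shows the same sweep with the missing bound added. interval_hits is a hypothetical name, and std::vector<bool> stands in for the "Y"/NA output.

```cpp
#include <cstddef>
#include <vector>

// For each interval [mins[i], maxs[i]] (both sorted ascending), report whether
// any value of the sorted vector exps falls inside it. This is the same sweep
// as cfilter, but the inner loop also stops when j reaches exps.size().
std::vector<bool> interval_hits(const std::vector<double>& mins,
                                const std::vector<double>& maxs,
                                const std::vector<double>& exps) {
    std::vector<bool> hit(mins.size(), false);
    std::size_t j = 0;
    for (std::size_t i = 0; i < mins.size(); ++i) {
        while (j < exps.size() && exps[j] <= maxs[i]) {
            if (exps[j] >= mins[i]) {
                hit[i] = true;  // cfilter stores "Y" here
                break;          // j is kept: the next interval may contain it too
            }
            ++j;  // exps[j] < mins[i] <= every later min, so it can be skipped
        }
    }
    return hit;
}
```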

Accuracy: C++11's binomial_distribution<int> does not agree with what R returns

I need to generate samples in C++ that follow the hypergeometric distribution. In my case, though, I can approximate it with the binomial distribution without any problem.
Thus I'd like to use the std implementation in C++11. If I generate many samples and calculate the probabilities, I get different values from the ones R gives me. What is more, the difference does not get any smaller as the number of samples increases. The parameters are the same for R and C++.
Thus the question: why do I not get the same results, and what can I do / which should I trust?
See below for the R and C++ code. The C++ program calculates the difference from the R values. Even if I let the program run for quite a while, these numbers don't get smaller but just wiggle around the E-5, E-6, E-7 magnitude.
R:
dbinom(0:2, 2, 0.48645948945615974379)
#0.26372385596962805154 0.49963330914842424280 0.23664283488194759464
C++:
#include <iostream>
#include <iomanip>
#include <random>
using namespace std;
class Generator {
public:
    Generator();
    virtual ~Generator();
    int binom();
private:
    std::random_device randev;
    std::mt19937_64 gen;
    std::binomial_distribution<int> dist;
};
Generator::Generator() : randev(), gen(randev()), dist(2,0.48645948945615974379) { }
Generator::~Generator() {}
int Generator::binom() { return dist(gen); }
int main() {
    Generator rd;
    const double nrolls = 10000000; // number of experiments
    double p[3]={};
    for (int k=1; k<100; ++k) {
        for (int i=0; i<nrolls; ++i) {
            int number = rd.binom();
            ++p[number];
        }
        cout << "Samples=" << setw(8) << nrolls*k <<
            " dP(0)="<<setw(13)<<p[0]/(nrolls*k)-0.26372385596962805154<<
            " dP(1)="<<setw(13)<<p[1]/(nrolls*k)-0.49963330914842424280<<
            " dP(2)="<<setw(13)<<p[2]/(nrolls*k)-0.23664283488194759464<<endl;
    }
    cout<<"end";
    return 0;
}
A selective output:
Samples= 1e+07 dP(0)= -2.0056e-05 dP(1)= 9.49909e-05 dP(2)= -7.49349e-05
Samples= 1e+08 dP(0)= 1.5064e-05 dP(1)= 3.43609e-05 dP(2)= -4.94249e-05
Samples= 9.9e+08 dP(0)= -2.06449e-05 dP(1)= 5.93429e-06 dP(2)= 1.47106e-05
This should really be a comment.
I don't see anything wrong with your numbers. You are doing about 10**9 repetitions, so by the central limit theorem the sampling error of each estimated probability is on the order of 1/sqrt(10**9), i.e. around 10**(-4.5). That is indeed what you are seeing. That the signs of dP(0) and dP(2) fluctuate is another good sign. If you run your program multiple times, do the signs on the last line always show the same pattern? If not, that is another good sign.
Btw, R is giving you way too many digits in my opinion. With doubles you only have about 15 to 16 significant digits of accuracy.
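This small C++ sketch puts numbers on the central-limit argument. binom2_pmf evaluates the exact Binomial(2, p) probabilities that dbinom(0:2, 2, p) returns. frequency_std_error gives the standard error sqrt(p(1 - p) / n) of a frequency estimated from n independent draws. Both function names are hypothetical.

```cpp
#include <cmath>

// Exact Binomial(2, p) probabilities, i.e. the values dbinom(0:2, 2, p) gives.
double binom2_pmf(int k, double p) {
    const double q = 1.0 - p;
    switch (k) {
        case 0: return q * q;
        case 1: return 2.0 * p * q;
        case 2: return p * p;
        default: return 0.0;  // outside the support {0, 1, 2}
    }
}

// Standard error of an empirical frequency estimated from n independent
// draws when the true probability is prob.
double frequency_std_error(double prob, double n) {
    return std::sqrt(prob * (1.0 - prob) / n);
}
```

For dP(1) after 9.9e8 draws, frequency_std_error(0.4996, 9.9e8) is roughly 1.6e-5. Deviations of order 1e-5 like those in the selective output are therefore what sampling noise alone predicts.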

Using a user-defined R function in Rcpp (cppFunction)

I have a user-defined function in R:
blacksch<-function(s_0,k,sigma,r,t)
{
    d1=(log(s_0/k) + (r + (sigma^2)/2)*(t))/(sigma*sqrt(t))
    d2=(log(s_0/k) + (r - (sigma^2)/2)*(t))/(sigma*sqrt(t))
    p=(pnorm(-d2)*k*exp(-r*t))-pnorm(-d1)*s_0
}
And I would like to use this function in c++ code that I have written using Rcpp and cppFunction. I have been through the documentation and examples a few times, but have not been successful.
bs_martin<-cppFunction('NumericMatrix compMartin (NumericMatrix st, NumericMatrix dv, double s_0, double k,
double t, double sigma, double r, int steps, int paths, Function blacksch(fun)) {
    // Ensure RNG scope set
    RNGScope scope;
    int min_bs_step=0;
    double minbsvalue=0;
    vector<double> u[0]=100.0;
    for(int i=1;i<=paths; i++)
    {
        min_bs_step=0;
        for(int j=1;j<=steps;j++)
        {
            if (dv[i,j]>0 && min_bs_step==0)
            {
                min_bs_step=i;
                minbsvalue=blacksch(s_0,k,sigma,r,t);
            }
            else if (min_bs_step!=0)
            {
                dv[i,j]=1 - minbsvalue;
            }
        }
    }
    return dv;
}')
I would suggest the following:
Study our documentation and examples. We show how to pass functions around too, even if we do not recommend it (for obvious performance reasons: calling R from C++ ain't speedy).
If your somewhat complex example does not work, try a smaller one. At the end of the day you may just want a tester which receives two numbers and passes those to a supplied function.
And lastly: you really want blacksch in C++ too. All the statistical functions (pnorm and friends) are available in Rcpp under the same names.
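Following that last suggestion, here is a minimal standalone C++ sketch of blacksch. It uses only the standard library so it compiles without R, and norm_cdf is a hypothetical helper. Inside Rcpp code, R::pnorm(x, 0.0, 1.0, 1, 0) computes the same standard normal CDF, so norm_cdf could be replaced by it. The formula is transcribed from the question's R function, which gives the Black-Scholes price of a European put.

```cpp
#include <cmath>

// Standard normal CDF via the complementary error function.
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// C++ port of blacksch() from the question: European put price with spot s_0,
// strike k, volatility sigma, risk-free rate r and maturity t.
double blacksch(double s_0, double k, double sigma, double r, double t) {
    const double vol_sqrt_t = sigma * std::sqrt(t);
    const double d1 = (std::log(s_0 / k) + (r + sigma * sigma / 2.0) * t) / vol_sqrt_t;
    const double d2 = (std::log(s_0 / k) + (r - sigma * sigma / 2.0) * t) / vol_sqrt_t;
    return norm_cdf(-d2) * k * std::exp(-r * t) - norm_cdf(-d1) * s_0;
}
```

For s_0 = k = 100, sigma = 0.2, r = 0.05 and t = 1, this gives about 5.57. Once blacksch is plain C++, compMartin can call it directly instead of going through an R Function object.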
