Inconsistent results between Rcpp and R code - r

UPDATE
Previous example is complicated, hence please allow me to use a simpler example as shown below:
Here is the Rcpp code:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rmath.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double chooseC(double n, double k) {
return Rf_choose(n, k);
}
// [[Rcpp::export]]
double function3(double n, double m, double beta) {
double prob;
NumericVector k(m);
NumericVector k_vec(m);
if(n<m){prob=0;}
else{
if(chooseC(n,m)==R_PosInf){
k=seq_len(m)-1;
k_vec= (n-k)/(m-k)*std::pow((1-beta),(n-m)/m)*beta;
prob=std::accumulate(k_vec.begin(),k_vec.end(), 1, std::multiplies<double>())*beta;
}
else{
prob = beta * chooseC(n,m) * std::pow(beta,m) * std::pow((1-beta),(n-m));
}
}
return(prob);
}
Here is the R code:
function4 <- function ( n , m , beta )
{
if ( n < m )
{
prob <- 0.0
}
else
{
if (is.infinite(choose(n,m))){
k<-0:(m-1)
prob <- beta *prod((n-k)/(m-k)*(1-beta)^((n-m)/m)*beta)
}
else{
prob <- beta * choose(n,m) * beta^m * (1-beta)^(n-m)
}
}
prob
}
Comparison:
input<-619
beta<-0.09187495
x<-seq(0, (input+1)/beta*3)
yy<-sapply(x,function(n)function3(n,input, beta=beta))
yy2<-sapply(x,function(n)function4(n,input, beta=beta))
sum(yy)=0
sum(yy2)=1
However, with other input:
input<-1
beta<-0.08214248
Both results are the same, sum(yy)=sum(yy2)=0.9865887.
I used double in Rcpp code, I don't know what else could cause the inconsistent precision between Rcpp and R code.
Thanks a lot!

I think I fix the Rcpp code, so right now both Rcpp and R code produce the same results when the results are very small values. The solution is shown as below:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rmath.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double chooseC(double n, double k) {
return Rf_choose(n, k);
}
// [[Rcpp::export]]
double function3(double n, double m, double beta) {
double prob;
arma::vec k = arma::linspace<vec>(0, m-1, m);
arma::vec k_vec;
if(n<m){prob=0;}
else{
if(chooseC(n,m)==R_PosInf){
k_vec= (n-k)/(m-k)*pow((1-beta),(n-m)/m)*beta;
prob=arma::prod(k_vec)*beta;
}
else{
prob = beta * chooseC(n,m) * pow(beta,m) * pow((1-beta),(n-m));
}
}
return(prob);
}
However, I still do not understand why by writing code in this way will fix the precision inconsistent. Rcpp and RcppArmadillo still look like black boxes to me.

Related

Using Rcpp for faster extraction and summary of lm output

I'm trying to speed up extraction from R's summary lm object when contained in an extremely large loop. The following is my attempt as a Rcpp to speed it up.
Toy-Data
synthetic_LMsummary<-lapply(1:100000L,function(UU){summary(lm(rnorm(500)~replicate(2,{rnorm(500)})))})
R Version
tidy.train <- function(s,SelectedRow=3) {
out<-data.frame(
estimate=s$coefficients[, "Estimate"][SelectedRow],
std.error=s$coefficients[, "Std. Error"][SelectedRow],
statistic=s$coefficients[, "t value"][SelectedRow],
p.value=s$coefficients[, "Pr(>|t|)"][SelectedRow],
rsquared=s$r.squared
)
row.names(out) <- NULL
out
}
synthetic_LMsummary %>% purrr::map_dfr(~tidy.train(.,SelectedRow=3))
Attempted Rcpp Version
Rcpp::sourceCpp(code='
#include <Rcpp.h>
// [[Rcpp::export]]
using namespace std;
using namespace Rcpp;
Rcpp::NumericMatrix RcppTidy(Rcpp::List Summary_List,int mat_cols, int select) {
int mat_rows = Summary_List.length();
Rcpp::NumericMatrix Output_Mat(mat_rows,mat_cols);
for (int i = 0; i < Summary_List.length(); ++i) {
Rcpp::List SubSet=Summary_List[i];
Rcpp::NumericMatrix CoefDF=SubSet["coefficients"];
Rcpp::NumericVector Coef=CoefDF.row(select);
Rcpp::NumericVector rsquared=SubSet["adj.r.squared"];
Output_Mat(i,_) = cbind(Coef,rsquared);
}
return Output_Mat;
}
')
Tidy_synLM<-RcppTidy(synthetic_LMsummary,5L,3L)

Why do I get the error for using "pnorm" in Rcpp

I need to involve variable from arma::in my Rcpp code. But I ran into a problem when trying to use the sugar function pnorm. Here is a demo:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double pget(NumericVector x, NumericVector beta) {
arma::colvec xx = Rcpp::as<arma::colvec>(x) ;
arma::colvec bb = Rcpp::as<arma::colvec>(beta) ;
double tt = as_scalar( arma::trans(xx) * bb);
double temp = Rcpp::pnorm(tt);
return temp;
}
Then I got an error: no matching function for call to 'pnorm5'
Does that mean I cannot use Rcpp::pnorm???
The Rcpp sugar functions are meant for vector type arguments like Rcpp::NumericVector. For scalar arguments you can use the functions in the R namespace:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double pget(NumericVector x, NumericVector beta) {
arma::colvec xx = Rcpp::as<arma::colvec>(x) ;
arma::colvec bb = Rcpp::as<arma::colvec>(beta) ;
double tt = as_scalar( arma::trans(xx) * bb);
double temp = R::pnorm(tt, 0.0, 1.0, 1, 0);
return temp;
}
/*** R
x <- rnorm(5)
beta <- rnorm(5)
pget(x, beta)
*/
BTW, here two variants. First variant uses arma instead of Rcpp vectors as arguments. Since these are const references, no data is copied. In addition, arma::dot is used:
// [[Rcpp::export]]
double pget2(const arma::colvec& xx, const arma::colvec& bb) {
double tt = arma::dot(xx, bb);
return R::pnorm(tt, 0.0, 1.0, 1, 0);
}
The second variant calculates the scalar product without resorting to Armadillo:
// [[Rcpp::export]]
double pget3(NumericVector x, NumericVector beta) {
double tt = Rcpp::sum(x * beta);
return R::pnorm(tt, 0.0, 1.0, 1, 0);
}
I'm much less of an expert than #RalfStubner at Rcpp, so I had to hack around (with help from StackOverflow and the Rcpp cheat sheat) to get the following code. Instead of using the R-namespace versions on scalars, I converted back to a NumericVector ... this can almost certainly be done more efficiently/skipping a few steps by someone who actually knows what they're doing ... e.g. it's possible that the arma-to-NumericVector conversion could be done directly without going through as_scalar ... ?
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
NumericVector pget(NumericVector x, NumericVector beta) {
colvec xx = as<colvec>(x) ;
colvec bb = as<colvec>(beta) ;
double tt = as_scalar(trans(xx) * bb);
NumericVector tt2 = NumericVector::create( tt );
NumericVector temp = Rcpp::pnorm(tt2);
return temp;
}

Rcpp sugar commands in armadillo

I'm trying to use ifelse() command of Rcpp sugar with arma::vec. The code fails with error
'ifelse' was not declared in this scope
I could not find a solution. A simple example code (resulted with error) is below.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(arma::vec x, arma::vec y) {
arma::vec res1 = Rcpp::ifelse(x < y, x, y);
arma::vec res = trans(res1)*y;
return res;
}
/*** R
f(c(1,2,3),c(3,2,1))
*/
Using Armadillo's advanced constructors you can have Rcpp::NumericVector and arma::vec that refer to the same memory location. Then you can use both Rcpp functions and arma functions by using the correct front-end object for that piece of memory:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(Rcpp::NumericVector xr, Rcpp::NumericVector yr) {
arma::vec x(xr.begin(), xr.size(), false, true);
arma::vec y(yr.begin(), yr.size(), false, true);
Rcpp::NumericVector res1r(xr.size());
arma::vec res1(res1r.begin(), res1r.size(), false, true);
res1r = Rcpp::ifelse(xr < yr, xr, yr);
arma::vec res = trans(res1)*y;
return res;
}
/*** R
f(c(1,2,3),c(3,2,1))
*/
I am not 100% sure that this does not have any unwanted side-effects.
This is the solution that I found I hope will work for you.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(arma::vec x, arma::vec y) {
int n = x.size();
arma::vec res(n);
for(int i = 0; i < n; i++){
if (x[i] < y[i]){res[i] = x[i];} else{res[i] = y[i];}
}
return trans(res)*y;
}
The output is
/*** R
f(c(1,2,3),c(3,2,1))
*/
[,1]
[1,] 8

RcppParallel RVector push_back or something similar?

I am using RcppParallel to speed up some calculations. However, I am running out of memory in the process, so I would like to save results within the Parallel loop that are pass some relevance threshold. Below is a toy example to illustrate my point:
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]
struct Example : public RcppParallel::Worker {
RcppParallel::RVector<double> xvals, xvals_output, yvals;
Example(const NumericVector & xvals, NumericVector & yvals, NumericVector & xvals_output) :
xvals(xvals), xvals_output(xvals_output), yvals(yvals) {}
void operator()(std::size_t begin, size_t end) {
for(std::size_t i=begin; i < end; i++) {
double y = xvals[i] * (xvals[i] - 1);
// if(y < 0) {
// xvals_output.push_back(xvals[i]);
// yvals.push_back(y);
// }
xvals_output[i] = xvals[i];
yvals[i] = y;
}
}
};
// [[Rcpp::export]]
List find_values(NumericVector xvals) {
NumericVector xvals_output(xvals.size());
NumericVector yvals(xvals.size());
Example ex(xvals, yvals, xvals_output);
parallelFor(0, xvals.size(), ex);
List L = List::create(xvals_output, yvals);
return(L);
}
The R code would be:
find_values(seq(-10,10, by=0.5))
The commented out code is what I would like to do.
That is, I would like to initialize an empty vector, and append only the y-values that pass a certain threshold and also the associated x-values.
In my real usage, I am calculating a MxN matrix, so memory is an issue.
What is the correct way to approach this issue?
If anyone ever comes across a similar problem, here's a solution using "concurrent_vector" from TBB (which RcppParallel uses under the hood and is available as a header).
#include <Rcpp.h>
#include <RcppParallel.h>
#include <tbb/concurrent_vector.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]
struct Example : public RcppParallel::Worker {
RcppParallel::RVector<double> xvals;
tbb::concurrent_vector< std::pair<double, double> > &output;
Example(const NumericVector & xvals, tbb::concurrent_vector< std::pair<double, double> > &output) :
xvals(xvals), output(output) {}
void operator()(std::size_t begin, size_t end) {
for(std::size_t i=begin; i < end; i++) {
double y = xvals[i] * (xvals[i] - 1);
if(y < 0) {
output.push_back( std::pair<double, double>(xvals[i], y) );
}
}
}
};
// [[Rcpp::export]]
List find_values(NumericVector xvals) {
tbb::concurrent_vector< std::pair<double, double> > output;
Example ex(xvals,output);
parallelFor(0, xvals.size(), ex);
NumericVector xout(output.size());
NumericVector yout(output.size());
for(int i=0; i<output.size(); i++) {
xout[i] = output[i].first;
yout[i] = output[i].second;
}
List L = List::create(xout, yout);
return(L);
}
Output:
> find_values(seq(-10,10, by=0.5))
[[1]]
[1] 0.5
[[2]]
[1] -0.25

Rcpp gamma integral

I am trying to rewrite into (R)cpp an original R function that makes use of the gamma function (from double input). Below the original source. When comping with sourceCpp the following error is raised "no matching function for call to 'gamma(Rcpp::traits::storage_type(<14>:.type)'"
The gamma function should has been put within sugar (as the mean below use) so I expect there should be easily called.
#include <Rcpp.h>
#include <math.h>
using namespace Rcpp;
// original R function
// function (y_pred, y_true)
// {
// eps <- 1e-15
// y_pred <- pmax(y_pred, eps)
// Poisson_LogLoss <- mean(log(gamma(y_true + 1)) + y_pred -
// log(y_pred) * y_true)
// return(Poisson_LogLoss)
// }
// [[Rcpp::export]]
double poissonLogLoss(NumericVector predicted, NumericVector actual) {
NumericVector temp, y_pred_new;
double out;
const double eps=1e-15;
y_pred_new=pmax(predicted,eps);
long n = predicted.size();
for (long i = 0; i < n; ++i) {
temp[i] = log( gamma(actual[i]+1)+y_pred_new[i]-log(y_pred_new[i])*actual[i]);
}
out=mean(temp); // using sugar implementation
return out;
}
You are making this too complicated as the point of Rcpp Sugar is work vectorized. So the following compiles as well:
#include <Rcpp.h>
#include <math.h>
using namespace Rcpp;
// [[Rcpp::export]]
double poissonLogLoss(NumericVector predicted, NumericVector actual) {
NumericVector temp, y_pred_new;
double out;
const double eps=1e-15;
y_pred_new=pmax(predicted,eps);
temp = log(gamma(actual + 1)) + y_pred_new - log(y_pred_new)*actual;
out=mean(temp); // using sugar implementation
return out;
}
Now, you didn't supply any test data so I do not know if this computes correctly or not. Also, because your R expression is already vectorized, this will not be much faster.
Lastly, your compile error is likely due to the Sugar function gamma() expecting an Rcpp object whereas you provided a double.

Resources