Rcpp sugar commands in armadillo - r

I'm trying to use the ifelse() function from Rcpp sugar with arma::vec. The code fails with the error
'ifelse' was not declared in this scope
and I could not find a solution. A simple example that reproduces the error is below.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(arma::vec x, arma::vec y) {
arma::vec res1 = Rcpp::ifelse(x < y, x, y);
arma::vec res = trans(res1)*y;
return res;
}
/*** R
f(c(1,2,3),c(3,2,1))
*/

Using Armadillo's advanced constructors you can have Rcpp::NumericVector and arma::vec that refer to the same memory location. Then you can use both Rcpp functions and arma functions by using the correct front-end object for that piece of memory:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(Rcpp::NumericVector xr, Rcpp::NumericVector yr) {
arma::vec x(xr.begin(), xr.size(), false, true);
arma::vec y(yr.begin(), yr.size(), false, true);
Rcpp::NumericVector res1r(xr.size());
arma::vec res1(res1r.begin(), res1r.size(), false, true);
res1r = Rcpp::ifelse(xr < yr, xr, yr);
arma::vec res = trans(res1)*y;
return res;
}
/*** R
f(c(1,2,3),c(3,2,1))
*/
I am not 100% sure that this does not have any unwanted side-effects.

This is the solution I found; I hope it works for you.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(arma::vec x, arma::vec y) {
int n = x.size();
arma::vec res(n);
for(int i = 0; i < n; i++){
if (x[i] < y[i]){res[i] = x[i];} else{res[i] = y[i];}
}
return trans(res)*y;
}
The output is
/*** R
f(c(1,2,3),c(3,2,1))
*/
[,1]
[1,] 8
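To check the arithmetic without Rcpp: the loop above is an elementwise minimum followed by a dot product. Here is a minimal standard-C++ sketch of the same computation. The names `elementwise_min` and `min_dot` are made up for illustration, and `std::vector<double>` stands in for `arma::vec`.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Elementwise minimum of two equally sized vectors: the plain-C++
// counterpart of ifelse(x < y, x, y).
std::vector<double> elementwise_min(const std::vector<double>& x,
                                    const std::vector<double>& y) {
    std::vector<double> res(x.size());
    std::transform(x.begin(), x.end(), y.begin(), res.begin(),
                   [](double a, double b) { return a < b ? a : b; });
    return res;
}

// Dot product of the elementwise minimum with y, i.e. trans(res) * y.
double min_dot(const std::vector<double>& x, const std::vector<double>& y) {
    const std::vector<double> res = elementwise_min(x, y);
    // The 0.0 initial value keeps the accumulation in double.
    return std::inner_product(res.begin(), res.end(), y.begin(), 0.0);
}
```

For x = (1, 2, 3) and y = (3, 2, 1) the minimum is (1, 2, 1), and its dot product with y is 3 + 4 + 1 = 8, which matches the output above.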

Related

"inner_product" was not declared in this scope

Hi, I am new to Rcpp. I am computing the inner product of two vectors but get the error "inner_product was not declared in this scope" for the following code:
#include <math.h>
#include <RcppCommon.h>
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector polynomial_kernel(NumericVector x, NumericMatrix Y, double scale = 1, double offset =
1, int d=1){
int n = Y.nrow();
NumericVector kernel(n);
for (int j = 0; j < n; j++){
NumericVector v = Y( j,_ );
double crossProd =innerProduct(x,v);
kernel[j]= pow((scale*crossProd+offset),2);
}
return kernel;
}
Please help me to resolve this problem.
Below is a simpler, repaired version of your code that actually compiles. It uses Armadillo types for consistency and, instead of calling a non-existent "inner_product" routine, computes the inner product of two vectors the standard way via multiplication.
#include <RcppArmadillo.h> // also pulls in Rcpp.h and cmath
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec polynomial_kernel(arma::vec x, arma::mat Y,
double scale = 1, double offset = 1, int d=1) {
int n = Y.n_rows;
arma::vec kernel(n);
for (int j = 0; j < n; j++){
arma::rowvec v = Y.row(j);
double crossProd = arma::as_scalar(v * x);
kernel[j] = std::pow((scale*crossProd+offset),2);
}
return kernel;
}
Your example was not a minimal, complete, verifiable example, so I cannot show it with the data you would have supplied. On some made-up data it seems to work:
R> set.seed(123)
R> polynomial_kernel(runif(4), matrix(rnorm(16),4))
[,1]
[1,] 3.317483
[2,] 3.055690
[3,] 1.208345
[4,] 0.301834
R>
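The standard library does have a routine for this, `std::inner_product` in `<numeric>`, which works on iterator ranges rather than on Rcpp or Armadillo objects. Here is a minimal sketch of the kernel for a single pair of vectors in plain C++. The function name `polynomial_kernel_one` is hypothetical, and unlike the answer above, which hardcodes the exponent 2, this sketch assumes the otherwise unused argument `d` is meant to be the degree.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Polynomial kernel (scale * <x, v> + offset)^d for one pair of vectors.
// Assumption: x and v have the same length, and d is the exponent.
double polynomial_kernel_one(const std::vector<double>& x,
                             const std::vector<double>& v,
                             double scale = 1.0, double offset = 1.0,
                             int d = 1) {
    // The 0.0 initial value keeps the accumulation in double precision.
    const double cross =
        std::inner_product(x.begin(), x.end(), v.begin(), 0.0);
    return std::pow(scale * cross + offset, d);
}
```

With x = (1, 2) and v = (3, 4) the inner product is 11, so the default arguments give 12 and `d = 2` gives 144.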

Why do I get the error for using "pnorm" in Rcpp

I need to use variables from arma:: in my Rcpp code, but I ran into a problem when trying to use the sugar function pnorm. Here is a demo:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double pget(NumericVector x, NumericVector beta) {
arma::colvec xx = Rcpp::as<arma::colvec>(x) ;
arma::colvec bb = Rcpp::as<arma::colvec>(beta) ;
double tt = as_scalar( arma::trans(xx) * bb);
double temp = Rcpp::pnorm(tt);
return temp;
}
Then I got an error: no matching function for call to 'pnorm5'
Does that mean I cannot use Rcpp::pnorm???
The Rcpp sugar functions are meant for vector type arguments like Rcpp::NumericVector. For scalar arguments you can use the functions in the R namespace:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double pget(NumericVector x, NumericVector beta) {
arma::colvec xx = Rcpp::as<arma::colvec>(x) ;
arma::colvec bb = Rcpp::as<arma::colvec>(beta) ;
double tt = as_scalar( arma::trans(xx) * bb);
double temp = R::pnorm(tt, 0.0, 1.0, 1, 0);
return temp;
}
/*** R
x <- rnorm(5)
beta <- rnorm(5)
pget(x, beta)
*/
BTW, here are two variants. The first variant uses arma instead of Rcpp vectors as arguments. Since these are const references, no data is copied. In addition, arma::dot is used:
// [[Rcpp::export]]
double pget2(const arma::colvec& xx, const arma::colvec& bb) {
double tt = arma::dot(xx, bb);
return R::pnorm(tt, 0.0, 1.0, 1, 0);
}
The second variant calculates the scalar product without resorting to Armadillo:
// [[Rcpp::export]]
double pget3(NumericVector x, NumericVector beta) {
double tt = Rcpp::sum(x * beta);
return R::pnorm(tt, 0.0, 1.0, 1, 0);
}
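For a scalar argument, the standard normal CDF can also be written directly with the complementary error function from `<cmath>`. The sketch below uses a made-up function name, `std_normal_cdf`. It should agree with `R::pnorm(tt, 0.0, 1.0, 1, 0)` (mean 0, standard deviation 1, lower tail, not on the log scale) up to floating-point rounding, and it is useful mainly as a cross-check outside of R.

```cpp
#include <cmath>

// Standard normal CDF for a scalar argument, via the identity
// Phi(x) = 0.5 * erfc(-x / sqrt(2)).
double std_normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}
```

Inside an Rcpp function, `R::pnorm` remains the more natural choice, since it also covers other means, standard deviations, tails and the log scale.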
I'm much less of an expert at Rcpp than @RalfStubner, so I had to hack around (with help from StackOverflow and the Rcpp cheat sheet) to get the following code. Instead of using the R-namespace versions on scalars, I converted back to a NumericVector ... this can almost certainly be done more efficiently/skipping a few steps by someone who actually knows what they're doing ... e.g. it's possible that the arma-to-NumericVector conversion could be done directly without going through as_scalar ... ?
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
NumericVector pget(NumericVector x, NumericVector beta) {
colvec xx = as<colvec>(x) ;
colvec bb = as<colvec>(beta) ;
double tt = as_scalar(trans(xx) * bb);
NumericVector tt2 = NumericVector::create( tt );
NumericVector temp = Rcpp::pnorm(tt2);
return temp;
}

Rcpp - generate multiple random observations from custom distribution

This question is related to a previous one on calling functions within functions in Rcpp.
I need to generate a large number of random draws from a custom distribution, in a way similar to rnorm() or rbinom(), with the additional complication that my function produces a vector output.
As a solution, I thought about defining a function that generates observations from the custom distribution, and then a main function that draws n times from the generating function via a for loop. Here below is a much simplified working version of the code:
#include <Rcpp.h>
using namespace Rcpp;
// generating function
NumericVector gen(NumericVector A, NumericVector B){
NumericVector out = no_init_vector(2);
out[0] = R::runif(A[0],A[1]) + R::runif(B[0],B[1]);
out[1] = R::runif(A[0],A[1]) - R::runif(B[0],B[1]);
return out;
}
// [[Rcpp::export]]
// draw n observations
NumericVector rdraw(int n, NumericVector A, NumericVector B){
NumericMatrix out = no_init_matrix(n, 2);
for (int i = 0; i < n; ++i) {
out(i,_) = gen(A, B);
}
return out;
}
I am looking for ways to speed up the draws. My questions are: is there any more efficient alternative to the for loop? Would parallelization help in this case?
Thank you for any help!
There are different ways to speed this up:
1. Use inline on gen(), reducing the number of function calls.
2. Use Rcpp::runif instead of a loop with R::runif to remove even more function calls.
3. Use a faster RNG that allows for parallel execution.
Here are points 1. and 2.:
#include <Rcpp.h>
using namespace Rcpp;
// generating function
inline NumericVector gen(NumericVector A, NumericVector B){
NumericVector out = no_init_vector(2);
out[0] = R::runif(A[0],A[1]) + R::runif(B[0],B[1]);
out[1] = R::runif(A[0],A[1]) - R::runif(B[0],B[1]);
return out;
}
// [[Rcpp::export]]
// draw n observations
NumericVector rdraw(int n, NumericVector A, NumericVector B){
NumericMatrix out = no_init_matrix(n, 2);
for (int i = 0; i < n; ++i) {
out(i,_) = gen(A, B);
}
return out;
}
// [[Rcpp::export]]
// draw n observations
NumericVector rdraw2(int n, NumericVector A, NumericVector B){
NumericMatrix out = no_init_matrix(n, 2);
out(_, 0) = Rcpp::runif(n, A[0],A[1]) + Rcpp::runif(n, B[0],B[1]);
out(_, 1) = Rcpp::runif(n, A[0],A[1]) - Rcpp::runif(n, B[0],B[1]);
return out;
}
/*** R
set.seed(42)
system.time(rdraw(1e7, c(0,2), c(1,3)))
system.time(rdraw2(1e7, c(0,2), c(1,3)))
*/
Result:
> set.seed(42)
> system.time(rdraw(1e7, c(0,2), c(1,3)))
user system elapsed
1.576 0.034 1.610
> system.time(rdraw2(1e7, c(0,2), c(1,3)))
user system elapsed
0.458 0.139 0.598
For comparison, your original code took about 1.8s for 10^7 draws. For point 3. I am adapting code from the parallel vignette of my dqrng package:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <xoshiro.h>
#include <dqrng_distribution.h>
// [[Rcpp::plugins(openmp)]]
#include <omp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::NumericMatrix rdraw3(int n, Rcpp::NumericVector A, Rcpp::NumericVector B, int seed, int ncores) {
dqrng::uniform_distribution distA(A(0), A(1));
dqrng::uniform_distribution distB(B(0), B(1));
dqrng::xoshiro256plus rng(seed);
Rcpp::NumericMatrix res = Rcpp::no_init_matrix(n, 2);
RcppParallel::RMatrix<double> output(res);
#pragma omp parallel num_threads(ncores)
{
dqrng::xoshiro256plus lrng(rng); // make thread local copy of rng
lrng.jump(omp_get_thread_num() + 1); // advance rng by 1 ... ncores jumps
auto genA = std::bind(distA, std::ref(lrng));
auto genB = std::bind(distB, std::ref(lrng));
#pragma omp for
for (int i = 0; i < n; ++i) {
output(i, 0) = genA() + genB();
output(i, 1) = genA() - genB();
}
}
return res;
}
/*** R
system.time(rdraw3(1e7, c(0,2), c(1,3), 42, 2))
*/
Result:
> system.time(rdraw3(1e7, c(0,2), c(1,3), 42, 2))
user system elapsed
0.276 0.025 0.151
So with a faster RNG and moderate parallelism, we can gain an order of magnitude in execution time. The results will be different, of course, but summary statistics should be the same.
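The idea behind the parallel version, one independent random stream per worker, can be sketched in standard C++ without dqrng or OpenMP. In this sketch the names `fill_chunk` and `rdraw_chunked` are hypothetical, the driver is sequential, and each chunk gets its own `std::mt19937_64` seeded from the pair (seed, chunk index). Seeding separate engines is a weaker guarantee than the `jump()` call used above, which is designed to give non-overlapping subsequences, so treat this as an illustration of the structure only.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fill rows [begin, end) of two output columns with
// U(a0, a1) + U(b0, b1) and U(a0, a1) - U(b0, b1).
// Assumption: each chunk owns an engine seeded from (seed, stream), so
// the result does not depend on the order in which chunks are run.
void fill_chunk(std::vector<double>& col0, std::vector<double>& col1,
                std::size_t begin, std::size_t end,
                double a0, double a1, double b0, double b1,
                std::uint32_t seed, std::uint32_t stream) {
    std::seed_seq seq{seed, stream};
    std::mt19937_64 rng(seq);
    std::uniform_real_distribution<double> distA(a0, a1);
    std::uniform_real_distribution<double> distB(b0, b1);
    for (std::size_t i = begin; i < end; ++i) {
        // Draw into named locals so the order of the draws is fixed.
        const double s1 = distA(rng);
        const double s2 = distB(rng);
        col0[i] = s1 + s2;
        const double d1 = distA(rng);
        const double d2 = distB(rng);
        col1[i] = d1 - d2;
    }
}

// Sequential driver. The chunks touch disjoint rows, so each call could
// be handed to a separate thread. Assumes nchunks > 0 and that col0 and
// col1 have the same size.
void rdraw_chunked(std::vector<double>& col0, std::vector<double>& col1,
                   double a0, double a1, double b0, double b1,
                   std::uint32_t seed, std::size_t nchunks) {
    const std::size_t n = col0.size();
    for (std::size_t c = 0; c < nchunks; ++c) {
        const std::size_t begin = c * n / nchunks;
        const std::size_t end = (c + 1) * n / nchunks;
        fill_chunk(col0, col1, begin, end, a0, a1, b0, b1, seed,
                   static_cast<std::uint32_t>(c));
    }
}
```

Because the seed of each chunk depends only on the global seed and the chunk index, the output is reproducible for a fixed number of chunks, but it changes when the number of chunks changes.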

RcppParallel RVector push_back or something similar?

I am using RcppParallel to speed up some calculations. However, I am running out of memory in the process, so I would like to save only those results within the parallel loop that pass some relevance threshold. Below is a toy example to illustrate my point:
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]
struct Example : public RcppParallel::Worker {
RcppParallel::RVector<double> xvals, xvals_output, yvals;
Example(const NumericVector & xvals, NumericVector & yvals, NumericVector & xvals_output) :
xvals(xvals), xvals_output(xvals_output), yvals(yvals) {}
void operator()(std::size_t begin, size_t end) {
for(std::size_t i=begin; i < end; i++) {
double y = xvals[i] * (xvals[i] - 1);
// if(y < 0) {
// xvals_output.push_back(xvals[i]);
// yvals.push_back(y);
// }
xvals_output[i] = xvals[i];
yvals[i] = y;
}
}
};
// [[Rcpp::export]]
List find_values(NumericVector xvals) {
NumericVector xvals_output(xvals.size());
NumericVector yvals(xvals.size());
Example ex(xvals, yvals, xvals_output);
parallelFor(0, xvals.size(), ex);
List L = List::create(xvals_output, yvals);
return(L);
}
The R code would be:
find_values(seq(-10,10, by=0.5))
The commented out code is what I would like to do.
That is, I would like to initialize an empty vector, and append only the y-values that pass a certain threshold and also the associated x-values.
In my real usage, I am calculating a MxN matrix, so memory is an issue.
What is the correct way to approach this issue?
If anyone ever comes across a similar problem, here's a solution using "concurrent_vector" from TBB (which RcppParallel uses under the hood and is available as a header).
#include <Rcpp.h>
#include <RcppParallel.h>
#include <tbb/concurrent_vector.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]
struct Example : public RcppParallel::Worker {
RcppParallel::RVector<double> xvals;
tbb::concurrent_vector< std::pair<double, double> > &output;
Example(const NumericVector & xvals, tbb::concurrent_vector< std::pair<double, double> > &output) :
xvals(xvals), output(output) {}
void operator()(std::size_t begin, size_t end) {
for(std::size_t i=begin; i < end; i++) {
double y = xvals[i] * (xvals[i] - 1);
if(y < 0) {
output.push_back( std::pair<double, double>(xvals[i], y) );
}
}
}
};
// [[Rcpp::export]]
List find_values(NumericVector xvals) {
tbb::concurrent_vector< std::pair<double, double> > output;
Example ex(xvals,output);
parallelFor(0, xvals.size(), ex);
NumericVector xout(output.size());
NumericVector yout(output.size());
for(int i=0; i<output.size(); i++) {
xout[i] = output[i].first;
yout[i] = output[i].second;
}
List L = List::create(xout, yout);
return(L);
}
Output:
> find_values(seq(-10,10, by=0.5))
[[1]]
[1] 0.5
[[2]]
[1] -0.25
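An alternative to a shared concurrent container is to give every worker its own buffer and merge the buffers afterwards. This is the split/join pattern that RcppParallel's `parallelReduce` is built around. The sketch below is plain C++ and is not wired to RcppParallel: the struct `LocalCollector` and the sequential driver `find_values_chunked` are hypothetical names that only mirror the shape such a worker would have.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Collects (x, y) pairs with y = x * (x - 1) < 0 into a buffer owned by
// this worker, so no concurrent container is needed.
struct LocalCollector {
    const std::vector<double>& xvals;
    std::vector<std::pair<double, double>> hits;

    explicit LocalCollector(const std::vector<double>& xvals)
        : xvals(xvals) {}

    // Process the half-open index range [begin, end).
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            const double y = xvals[i] * (xvals[i] - 1.0);
            if (y < 0) hits.emplace_back(xvals[i], y);
        }
    }

    // Append another worker's results to this one.
    void join(const LocalCollector& rhs) {
        hits.insert(hits.end(), rhs.hits.begin(), rhs.hits.end());
    }
};

// Sequential stand-in for a parallel driver: one collector per chunk,
// merged in chunk order. Assumes nchunks > 0.
std::vector<std::pair<double, double>> find_values_chunked(
        const std::vector<double>& xvals, std::size_t nchunks) {
    LocalCollector total(xvals);
    const std::size_t n = xvals.size();
    for (std::size_t c = 0; c < nchunks; ++c) {
        LocalCollector part(xvals);
        part(c * n / nchunks, (c + 1) * n / nchunks);
        total.join(part);
    }
    return total.hits;
}
```

Memory use is then proportional to the number of hits rather than to the input size. Because the buffers are joined in chunk order, the results come out in input order, which `push_back` on a shared concurrent vector does not guarantee.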

Inconsistent results between Rcpp and R code

UPDATE
The previous example was complicated, so please allow me to use a simpler example, shown below:
Here is the Rcpp code:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rmath.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double chooseC(double n, double k) {
return Rf_choose(n, k);
}
// [[Rcpp::export]]
double function3(double n, double m, double beta) {
double prob;
NumericVector k(m);
NumericVector k_vec(m);
if(n<m){prob=0;}
else{
if(chooseC(n,m)==R_PosInf){
k=seq_len(m)-1;
k_vec= (n-k)/(m-k)*std::pow((1-beta),(n-m)/m)*beta;
prob=std::accumulate(k_vec.begin(),k_vec.end(), 1, std::multiplies<double>())*beta;
}
else{
prob = beta * chooseC(n,m) * std::pow(beta,m) * std::pow((1-beta),(n-m));
}
}
return(prob);
}
Here is the R code:
function4 <- function ( n , m , beta )
{
if ( n < m )
{
prob <- 0.0
}
else
{
if (is.infinite(choose(n,m))){
k<-0:(m-1)
prob <- beta *prod((n-k)/(m-k)*(1-beta)^((n-m)/m)*beta)
}
else{
prob <- beta * choose(n,m) * beta^m * (1-beta)^(n-m)
}
}
prob
}
Comparison:
input<-619
beta<-0.09187495
x<-seq(0, (input+1)/beta*3)
yy<-sapply(x,function(n)function3(n,input, beta=beta))
yy2<-sapply(x,function(n)function4(n,input, beta=beta))
sum(yy)=0
sum(yy2)=1
However, with other input:
input<-1
beta<-0.08214248
Both results are the same, sum(yy)=sum(yy2)=0.9865887.
I used double in the Rcpp code, so I don't know what else could cause the inconsistent precision between the Rcpp and R code.
Thanks a lot!
I think I fixed the Rcpp code, so now the Rcpp and R code produce the same results even when the results are very small values. The solution is shown below:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rmath.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double chooseC(double n, double k) {
return Rf_choose(n, k);
}
// [[Rcpp::export]]
double function3(double n, double m, double beta) {
double prob;
arma::vec k = arma::linspace<vec>(0, m-1, m);
arma::vec k_vec;
if(n<m){prob=0;}
else{
if(chooseC(n,m)==R_PosInf){
k_vec= (n-k)/(m-k)*pow((1-beta),(n-m)/m)*beta;
prob=arma::prod(k_vec)*beta;
}
else{
prob = beta * chooseC(n,m) * pow(beta,m) * pow((1-beta),(n-m));
}
}
return(prob);
}
However, I still do not understand why writing the code this way fixes the inconsistency in precision. Rcpp and RcppArmadillo still look like black boxes to me.
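A likely explanation lies in the standard library rather than in Rcpp. The first version calls `std::accumulate(k_vec.begin(), k_vec.end(), 1, std::multiplies<double>())`, and `std::accumulate` takes its accumulator type from the initial value. The literal `1` is an `int`, so every partial product is converted back to `int`, and as soon as one drops below 1 it becomes 0. `arma::prod` accumulates in `double`, as R's `prod` does. The effect can be reproduced in plain C++ (the helper names below are made up for illustration):

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Product with an int initial value: the accumulator is an int, so each
// partial product is truncated toward zero.
double product_int_init(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 1,
                           std::multiplies<double>());
}

// Product with a double initial value: the accumulator stays double.
double product_double_init(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 1.0,
                           std::multiplies<double>());
}
```

For the vector (0.5, 0.5) the first function returns 0 and the second returns 0.25. Under this reading, writing the initial value as `1.0` in the original `std::accumulate` call should fix it as well.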
