Convert RcppArmadillo vector to Rcpp vector - r

I am trying to convert RcppArmadillo vector (e.g. arma::colvec) to a Rcpp vector (NumericVector). I know I can first convert arma::colvec to SEXP and then convert SEXP to NumericVector (e.g. as<NumericVector>(wrap(temp)), assuming temp is an arma::colvec object). But what is a good way to do that?
I want to do that simply because I am unsure if it is okay to pass arma::colvec object as a parameter to an Rcpp::Function object.

I was trying to Evaluate a Rcpp::Function with argument arma::vec, it seems that it takes the argument in four forms without compilation errors. That is, if f is a Rcpp::Function and a is a arma::vec, then
f(a)
f(wrap(a))
f(as<NumericVector>(wrap(a)))
f(NumericVector(a.begin(),a.end()))
produce no compilation and runtime errors, at least apparently.
For this reason, I have conducted a little test for the four versions of arguments. Since I suspect that somethings will go wrong in garbage collection, I test them again gctorture.
gctorture(on=FALSE)
Rcpp::sourceCpp(code = '
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double foo1(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(a, b));
}
return sum;
}
// [[Rcpp::export]]
double foo2(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(wrap(a),wrap(b)));
}
return sum;
}
// [[Rcpp::export]]
double foo3(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(as<NumericVector>(wrap(a)),as<NumericVector>(wrap(b))));
}
return sum;
}
// [[Rcpp::export]]
double foo4(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(NumericVector(a.begin(),a.end()),NumericVector(b.begin(),b.end())));
}
return sum;
}
')
# note that when gctorture is on, the program will be very slow as it
# tries to perfrom GC for every allocation.
# gctorture(on=TRUE)
f = function(x,y) {
mean(x) + mean(y)
}
# all three functions should return 700
foo1(c(1,2,3), c(4,5,6), f) # error
foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
foo3(c(1,2,3), c(4,5,6), f) # correct answer
foo4(c(1,2,3), c(4,5,6), f) # correct answer
As a result, the first method produces an error, the second method produces a wrong answer and only the third and the fourth method return the correct answer.
> # they should return 700
> foo1(c(1,2,3), c(4,5,6), f) # error
Error: invalid multibyte string at '<80><a1><e2>'
> foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
[1] 712
> foo3(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
> foo4(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
Note that, if gctorture is set FALSE, then all functions will return a correct result.
> foo1(c(1,2,3), c(4,5,6), f) # error
[1] 700
> foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
[1] 700
> foo3(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
> foo4(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
It means that method 1 and method 2 are subjected to break when garbage is collected during runtime and we don't know when it happens. Thus, it is dangerous to not wrap the parameter properly.
Edit: as of 2017-12-05, all four conversions produce the correct result.
f(a)
f(wrap(a))
f(as<NumericVector>(wrap(a)))
f(NumericVector(a.begin(),a.end()))
and this is the benchmark
> microbenchmark(foo1(c(1,2,3), c(4,5,6), f), foo2(c(1,2,3), c(4,5,6), f), foo
3(c(1,2,3), c(4,5,6), f), foo4(c(1,2,3), c(4,5,6), f))
Unit: milliseconds
expr min lq mean median uq
foo1(c(1, 2, 3), c(4, 5, 6), f) 2.575459 2.694297 2.905398 2.734009 2.921552
foo2(c(1, 2, 3), c(4, 5, 6), f) 2.574565 2.677380 2.880511 2.731615 2.847573
foo3(c(1, 2, 3), c(4, 5, 6), f) 2.582574 2.701779 2.862598 2.753256 2.875745
foo4(c(1, 2, 3), c(4, 5, 6), f) 2.378309 2.469361 2.675188 2.538140 2.695720
max neval
4.186352 100
5.336418 100
4.611379 100
3.734019 100
And f(NumericVector(a.begin(),a.end())) is marginally faster than other methods.

This should works with arma::vec, arma::rowvec and arma::colvec:
template <typename T>
Rcpp::NumericVector arma2vec(const T& x) {
return Rcpp::NumericVector(x.begin(), x.end());
}

I had the same question. I used wrap to do the conversion at the core of several layers of for loops and it was very slow. I think the wrap function is to blame for dragging the speed down so I wish to know if there is an elegant way to do this.
As for Raymond's question, you might want to try including the namespace like: Rcpp::as<Rcpp::NumericVector>(wrap(A)) instead or include a line using namespace Rcpp; at the beginning of your code.

Related

Is there any way in which to make an Infix function using sourceCpp()

I was wondering whether it is possible to make an infix function, e.g. A %o% B with Rcpp.
I know that this is possible using the inline package, but have yet been able to find a method for doing this when using sourceCpp().
I have made the following infix implementation of %o% / outer() when arguments are sure to be vectors using RcppEigen and inline:
`%op%` <- cxxfunction(signature(v1="NumericVector",
v2="NumericVector"),
plugin = "RcppEigen",
body = c("
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
"))
This can easily be implemented in to be imported using sourceCpp(), however not as an infix function.
My current attempt is as follows:
#include <Rcpp.h>
using namespace Rcpp;
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
NumericMatrix outerProd(NumericVector v1, NumericVector v2) {
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
}
So to summarize my question.. Is it possible to make an infix function available through sourceCpp?
Is it possible to make an infix function available through sourceCpp?
Yes.
As always, one should read the Rcpp vignettes!
In particular here, if you look in Section 1.6 of the Rcpp attributes vignette, you'd see you can modify the name of a function using the name parameter for Rcpp::export.
For example, we could do:
#include <Rcpp.h>
// [[Rcpp::export(name = `%+%`)]]
Rcpp::NumericVector add(Rcpp::NumericVector x, Rcpp::NumericVector y) {
return x + y;
}
/*** R
1:3 %+% 4:6
*/
Then we'd get:
Rcpp::sourceCpp("~/infix-test.cpp")
> 1:3 %+% 4:6
[1] 5 7 9
So, you still have to name C++ functions valid C++ names in the code, but you can export it to R through the name parameter of Rcpp::export without having to do anything further on the R side.
John Chambers states three principles on page four of the (highly recommended) "Extending R" book:
Everything that exists in R is an object.
Everything that happens in R is a function call.
Interfaces to other software are part of R.
So per point two, you can of course use sourceCpp() to create your a compiled function and hang that at any odd infix operator you like.
Code Example
library(Rcpp)
cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
`%+%` <- function(a,b) cc(a,b)
cc("Hello", "World")
"hello" %+% "world"
Output
R> library(Rcpp)
R> cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
R> `%+%` <- function(a,b) cc(a,b)
R>
R> cc("Hello", "World")
[1] "HelloWorld"
R>
R> "hello" %+% "world"
[1] "helloworld"
R>
Summary
Rcpp is really just one cog in the machinery.
Edit
It also works with your initial function, with some minor simplification. For
`%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
as.numeric(1:3) %op% as.numeric(3:1)
we get
R> `%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
R> as.numeric(1:3) %op% as.numeric(3:1)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 4 2
[3,] 9 6 3
R>
(modulo some line noise from the compiler).

Rcpp Function slower than Rf_eval

I have been working on a package that uses Rcpp to apply arbitrary R code over a group of large medical imaging files. I noticed that my Rcpp implementation is considerably slower than the original pure C version. I traced the difference to calling a function via Function, vs the original Rf_eval. My question is why is there a close to 4x performance degradation, and is there a way to speed up the function call to be closer in performance to Rf_eval?
Example:
library(Rcpp)
library(inline)
library(microbenchmark)
cpp_fun1 <-
'
Rcpp::List lots_of_calls(Function fun, NumericVector vec){
Rcpp::List output(1000);
for(int i = 0; i < 1000; ++i){
output[i] = fun(NumericVector(vec));
}
return output;
}
'
cpp_fun2 <-
'
Rcpp::List lots_of_calls2(SEXP fun, SEXP env){
Rcpp::List output(1000);
for(int i = 0; i < 1000; ++i){
output[i] = Rf_eval(fun, env);
}
return output;
}
'
lots_of_calls <- cppFunction(cpp_fun1)
lots_of_calls2 <- cppFunction(cpp_fun2)
microbenchmark(lots_of_calls(mean, 1:1000),
lots_of_calls2(quote(mean(1:1000)), .GlobalEnv))
Results
Unit: milliseconds
expr min lq mean median uq max neval
lots_of_calls(mean, 1:1000) 38.23032 38.80177 40.84901 39.29197 41.62786 54.07380 100
lots_of_calls2(quote(mean(1:1000)), .GlobalEnv) 10.53133 10.71938 11.08735 10.83436 11.03759 18.08466 100
Rcpp is great because it makes things look absurdly clean to the programmer. The cleanliness has a cost in the form of templated responses and a set of assumptions that weigh down the execution time. But, such is the case with a generalized vs. specific code setup.
Take for instance the call route for an Rcpp::Function. The initial construction and then outside call to a modified version of Rf_reval requires a special Rcpp specific eval function given in Rcpp_eval.h. In turn, this function is wrapped in protections to protect against a function error when calling into R via a Shield associated with it. And so on...
In comparison, Rf_eval has neither. If it fails, you will be up the creek without a paddle. (Unless, of course, you implement error catching via R_tryEval for it.)
With this being said, the best way to speed up the calculation is to simply write everything necessary for the computation in C++.
Besides the points made by #coatless, you aren't even comparing apples with apples. Your Rf_eval example does not pass the vector to the function, and, more importantly, plays tricks on the function via quote().
In short, it is all a little silly.
Below is a more complete example using the sugar function mean().
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List callFun(Function fun, NumericVector vec) {
List output(1000);
for(int i = 0; i < 1000; ++i){
output[i] = fun(NumericVector(vec));
}
return output;
}
// [[Rcpp::export]]
List callRfEval(SEXP fun, SEXP env){
List output(1000);
for(int i = 0; i < 1000; ++i){
output[i] = Rf_eval(fun, env);
}
return output;
}
// [[Rcpp::export]]
List callSugar(NumericVector vec) {
List output(1000);
for(int i = 0; i < 1000; ++i){
double d = mean(vec);
output[i] = d;
}
return output;
}
/*** R
library(microbenchmark)
microbenchmark(callFun(mean, 1:1000),
callRfEval(quote(mean(1:1000)), .GlobalEnv),
callSugar(1:1000))
*/
You can just sourceCpp() this:
R> sourceCpp("/tmp/ch.cpp")
R> library(microbenchmark)
R> microbenchmark(callFun(mean, 1:1000),
+ callRfEval(quote(mean(1:1000)), .GlobalEnv),
+ callSugar(1:1000))
Unit: milliseconds
expr min lq mean median uq max neval
callFun(mean, 1:1000) 14.87451 15.54385 18.57635 17.78990 18.29127 114.77153 100
callRfEval(quote(mean(1:1000)), .GlobalEnv) 3.35954 3.57554 3.97380 3.75122 4.16450 6.29339 100
callSugar(1:1000) 1.50061 1.50827 1.62204 1.51518 1.76683 1.84513 100
R>

Remove NA values efficiently

I need to remove NA values efficiently from vectors inside a function which is implemented with RcppEigen. I can of course do it using a for loop, but I wonder if there is a more efficient way.
Here is an example:
library(RcppEigen)
library(inline)
incl <- '
using Eigen::Map;
using Eigen::VectorXd;
typedef Map<VectorXd> MapVecd;
'
body <- '
const MapVecd x(as<MapVecd>(xx)), y(as<MapVecd>(yy));
VectorXd x1(x), y1(y);
int k(0);
for (int i = 0; i < x.rows(); ++i) {
if (x.coeff(i)==x.coeff(i) && y.coeff(i)==y.coeff(i)) {
x1(k) = x.coeff(i);
y1(k) = y.coeff(i);
k++;
};
};
x1.conservativeResize(k);
y1.conservativeResize(k);
return Rcpp::List::create(Rcpp::Named("x") = x1,
Rcpp::Named("y") = y1);
'
na.omit.cpp <- cxxfunction(signature(xx = "Vector", yy= "Vector"),
body, "RcppEigen", incl)
na.omit.cpp(c(1.5, NaN, 7, NA), c(7.0, 1, NA, 3))
#$x
#[1] 1.5
#
#$y
#[1] 7
In my use case I need to do this about one million times in a loop (inside the Rcpp function) and the vectors could be quite long (let's assume 1000 elements).
PS: I've also investigated the route to find all NA/NaN values using x.array()==x.array(), but was unable to find a way to use the result for subsetting with Eigen.
Perhaps I am not understanding the question correctly, but within Rcpp, I don't see how you could possibly do this more efficiently than a for loop. for loops are generally inefficient in R only because iterating through a loop in R requires a lot of heavy interpreted machinery. But this is not the case once you are down at the C++ level. Even natively vectorized R functions ultimately are implemented with for loops in C. So the only way I can think to make this more efficient is to try to do it in parallel.
For example, here's a simple na.omit.cpp function that omits NA values from a single vector:
rcppfun<-"
Rcpp::NumericVector naomit(Rcpp::NumericVector x){
std::vector<double> r(x.size());
int k=0;
for (int i = 0; i < x.size(); ++i) {
if (x[i]==x[i]) {
r[k] = x[i];
k++;
}
}
r.resize(k);
return Rcpp::wrap(r);
}"
na.omit.cpp<-cppFunction(rcppfun)
This runs even more quickly than R's built in na.omit:
> set.seed(123)
> x<-1:10000
> x[sample(10000,1000)]<-NA
> y1<-na.omit(x)
> y2<-na.omit.cpp(x)
> all(y1==y2)
[1] TRUE
> require(microbenchmark)
> microbenchmark(na.omit(x),na.omit.cpp(x))
Unit: microseconds
expr min lq median uq max neval
na.omit(x) 290.157 363.9935 376.4400 401.750 6547.447 100
na.omit.cpp(x) 107.524 168.1955 173.6035 210.524 222.564 100
I do not know if I understand the problem correctly or not but you can use the following arguments:
a = c(1.5, NaN, 7, NA)
a[-which(is.na(a))]
[1] 1.5 7.0
It might be useful to use `rinside' if you want to use it in C++.

Passing a `data.table` to c++ functions using `Rcpp` and/or `RcppArmadillo`

Is there a way to pass a data.table objects to c++ functions using Rcpp and/or RcppArmadillo without manually transforming to data.table to a data.frame? In the example below test_rcpp(X2) and test_arma(X2) both fail with c++ exception (unknown reason).
R code
X=data.frame(c(1:100),c(1:100))
X2=data.table(X)
test_rcpp(X)
test_rcpp(X2)
test_arma(X)
test_arma(X2)
c++ functions
NumericMatrix test_rcpp(NumericMatrix X) {
return(X);
}
mat test_arma(mat X) {
return(X);
}
Building on top of other answers, here is some example code:
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
double do_stuff_with_a_data_table(DataFrame df){
CharacterVector x = df["x"] ;
NumericVector y = df["y"] ;
IntegerVector z = df["v"] ;
/* do whatever with x, y, v */
double res = sum(y) ;
return res ;
}
So, as Matthew says, this treats the data.table as a data.frame (aka a Rcpp::DataFrame in Rcpp).
require(data.table)
DT <- data.table(
x=rep(c("a","b","c"),each=3),
y=c(1,3,6),
v=1:9)
do_stuff_with_a_data_table( DT )
# [1] 30
This completely ignores the internals of the data.table.
Try passing the data.table as a DataFrame rather than NumericMatrix. It is a data.frame anyway, with the same structure, so you shouldn't need to convert it.
Rcpp sits on top of native R types encoded as SEXP. This includes eg data.frame or matrix.
data.table is not native, it is an add-on. So someone who wants this (you?) has to write a converter, or provide funding for someone else to write one.
For reference, I think the good thing is to output a list from rcpp as data.table allow update via lists.
Here is a dummy example:
cCode <-
'
DataFrame DT(DTi);
NumericVector x = DT["x"];
int N = x.size();
LogicalVector b(N);
NumericVector d(N);
for(int i=0; i<N; i++){
b[i] = x[i]<=4;
d[i] = x[i]+1.;
}
return Rcpp::List::create(Rcpp::Named("b") = b, Rcpp::Named("d") = d);
';
require("data.table");
require("rcpp");
require("inline");
DT <- data.table(x=1:9,y=sample(letters,9)) #declare a data.table
modDataTable <- cxxfunction(signature(DTi="data.frame"), plugin="Rcpp", body=cCode)
DT_add <- modDataTable(DT) #here we get the list
DT[, names(DT_add):=DT_add] #here we update by reference the data.table

Efficient subsetting in Rcpp (equivalent of the R "which" command)

In Rcpp, there are various "Rcpp sugar" commands that permit nice vectorised operations in the code. In the code below I move across a data frame, break it into vectors, then use the "ifelse" and "sum" sugar commands to compute the mean of v over the rows where x equals either y or y+1. All seems to work correctly.
Just wondering if there is a neater way than this - e.g. an equivalent of the "which" command that gives index points satisfying a particular condition? There seems to be a facility available as "find" in Armadillo but that means using incompatible object types (you can't use "find" and "ifelse" together).
On the same topic, is it possible to get "ifelse" to accept a compound logical condition? In the example below, for instance, the definition of indic is formed of two "ifelse" commands, and it would obviously be cleaner as one. Any thoughts would be much appreciated.
Look forward to hearing your responses :)
require(Rcpp)
require(inline)
set.seed(42)
df = data.frame(x = rpois(1000,3), y = rpois(1000,3), v = rnorm(1000),
stringsAsFactors=FALSE)
myfunc1 = cxxfunction(
signature(DF = "data.frame"),
plugin = "Rcpp",
body = '
using namespace Rcpp;
DataFrame df(DF);
IntegerVector x = df["x"];
IntegerVector y = df["y"];
NumericVector v = df["v"];
LogicalVector indic = ifelse(x==y,true,ifelse(x==y+1,true,false));
double subsum = sum(ifelse(indic,v,0));
int subsize = sum(indic);
double mn = ((subsize>0) ? subsum/subsize : 0.0);
return(Rcpp::List::create(_["subsize"] = subsize,
_["submean"] = mn
));
'
)
myfunc1(df)
### OUTPUT:
#
# $subsize
# [1] 300
#
# $submean
# [1] 0.1091555
#
Rcpp (>= 0.10.0) implements the | operator between two logical sugar expressions. So you can do:
require( Rcpp )
cppFunction( code = '
List subsum( IntegerVector x, IntegerVector y, NumericVector v){
using namespace Rcpp ;
LogicalVector indic = (x==y) | (x==y+1) ;
int subsize = sum(indic) ;
double submean = subsize == 0 ? 0.0 : sum(ifelse(indic,v,0)) / subsize ;
return List::create( _["subsize"] = subsize, _["submean"] = submean ) ;
}
' )
subsum( rpois(1000,3), rpois(1000,3), rnorm(1000) )
# $subsize
# [1] 320
#
# $submean
# [1] -0.05708866

Resources