Sparse matrix - sparse matrix multiplication algorithm: performance improvement in R

R on my Mac is linked against OpenBLAS. Judging by the "% CPU" usage, sparse-sparse multiplication in R, or in Rcpp using Armadillo, does not seem to use multithreading, unlike dense-dense multiplication. The single-threaded sparse-sparse multiplication in R or Armadillo also seems slower than Matlab's.
To address this, I have implemented F. G. Gustavson's algorithm (https://dl.acm.org/citation.cfm?id=355796) for sparse-sparse matrix multiplication in Rcpp, using Armadillo's SpMat container.
I see an improvement (see below) if I ignore the ordering of the rows, which is the direct implementation of the algorithm; restoring the standard ordering, however, makes it slower than R's (edited as per mtall's comment). I am not an expert in Rcpp/RcppArmadillo/C++ and I am looking for help with two specific things:
1. Programmatically, how can I make the sp_sp_gc_ord function more efficient and faster as a single-threaded application?
2. My lame attempt at multithreading sp_sp_gc_ord with OpenMP causes R to crash. I have commented out the omp commands below. I have looked at the Rcpp Gallery discussions on OpenMP (http://gallery.rcpp.org/tags/openmp/) but couldn't figure out the problem.
I would appreciate any help. Below is a reproducible example of the code and the corresponding microbenchmark:
#### Rcpp functions
#include <RcppArmadillo.h>  // already includes Rcpp.h
#ifdef _OPENMP
#include <omp.h>
#endif
using namespace Rcpp;
using namespace arma;
// [[Rcpp::plugins(openmp)]]
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
sp_mat sp_sp_gc_ord(const arma::sp_mat &A, const arma::sp_mat &B, double p){
  // This function evaluates A * B where both A & B are sparse and the resultant
  // product is also sparse
  // define matrix sizes
  const int mA = A.n_rows;
  const int nB = B.n_cols;
  // upper bound on the number of non-zeros in the resultant matrix; p must be
  // large enough, since .at() below performs no bounds checking
  const int nnzC = std::ceil(static_cast<double>(mA) * nB * p);
  // initialize colptr, row_index and value vectors for the resultant sparse matrix
  urowvec colptrC(nB+1);
  colptrC.zeros();
  uvec rowvalC(nnzC);
  rowvalC.zeros();
  colvec nzvalC(nnzC);
  //setenv("OMP_STACKSIZE","500M",1);
  // counters and other variables
  unsigned int i, jp, j, kp, k, vp;
  unsigned int ip = 0;
  double nzB, nzA;
  ivec xb(mA);
  xb.fill(-1);
  vec x(mA);
  // loop logic: outer loop over columns of B and inner loop over columns of A and then aggregate
  // #pragma omp parallel for shared(colptrC,rowvalC,nzvalC,x,xb,ip,A,B) private(j,nzA,nzB,kp,i,jp,kp,k,vp) default(none) schedule(auto)
  for(i=0; i< nB; i++) {
    colptrC.at(i) = ip;
    for ( jp = B.col_ptrs[i]; jp < B.col_ptrs[i+1]; jp++) {
      j = B.row_indices[jp];
      nzB = B.values[jp];
      for ( kp = A.col_ptrs[j]; kp < A.col_ptrs[j+1]; kp++ ){
        k = A.row_indices[kp];
        nzA = A.values[kp];
        if (xb.at(k) != i){
          rowvalC.at(ip) = k;
          ip +=1;
          // Rcpp::print(wrap(ip));
          xb.at(k) = i;
          x.at(k) = nzA * nzB;
        } else {
          x.at(k) += nzA * nzB;
        }
      }
    }
    // put in the value vector of resultant matrix
    if(ip>0){
      for ( vp= colptrC.at(i); vp <= (ip-1); vp++ ) {
        nzvalC.at(vp) = x(rowvalC.at(vp));
      }
    }
  }
  // resize and put in the spMat container
  colptrC.at(nB) = ip;
  // an empty product has nothing to copy; subvec(0, ip-1) would underflow
  if (ip == 0) return sp_mat(mA, nB);
  sp_mat C(rowvalC.subvec(0,(ip-1)),colptrC,nzvalC.subvec(0,(ip-1)),mA,nB);
  // Gustavson's algorithm produces unordered rows for each column: a standard way to address this is: (X.t()).t()
  return (C.t()).t();
}
// [[Rcpp::export]]
sp_mat sp_sp_arma(const sp_mat &A, const sp_mat &B){
  return A * B;
}
// [[Rcpp::export]]
mat dense_dense_arma(const mat &A, const mat &B){
  return A * B;
}
#### End
The corresponding microbenchmark part in R:
#### Microbenchmark
library(Matrix)
library(microbenchmark)
## define two matrices
m<- 1000
n<- 6000
p<- 2000
A<- matrix(runif(m*n),m,n)
B<- matrix(runif(n*p),n,p)
A[abs(A)> .01] = B[abs(B)> .01] = 0
A <- as(A,'dgCMatrix')
B<- as(B,'dgCMatrix')
Adense<- as.matrix(A)
Bdense<- as.matrix(B)
## sp_sp_gc is the function without ordering
microbenchmark(sp_sp_gc(A,B,.5),sp_sp_arma(A,B),A%*%B,
dense_dense_arma(Adense,Bdense),Adense %*% Bdense,Adense %*% B, times=100)
Unit: milliseconds
expr min lq mean median uq max neval
sp_sp_gc(A, B, 0.5) 16.09809 21.75001 25.76436 24.44657 26.96300 99.30778 100
sp_sp_gc_ord(A, B, 0.5) 36.78781 44.64558 49.82102 47.64348 51.87361 116.85013 100
sp_sp_arma(A, B) 47.45203 52.77132 59.37077 59.24010 62.41710 86.15647 100
A %*% B 23.64307 28.99649 32.88566 32.10017 35.21816 59.16251 100
dense_dense_arma(Adense, Bdense) 286.22358 302.95170 345.66766 317.75786 340.50143 862.15116 100
Adense %*% Bdense 292.32099 317.10795 342.48345 329.80950 342.21333 697.56468 100
Adense %*% B 167.87248 186.63499 219.11872 195.19197 212.50286 843.17172 100
####
sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /usr/local/Cellar/openblas/0.3.3/lib/libopenblas_haswellp-r0.3.3.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Matrix_1.2-14 RcppArmadillo_0.8.500.0 Rcpp_0.12.18
loaded via a namespace (and not attached):
[1] compiler_3.5.1 grid_3.5.1 lattice_0.20-35
Rcpp and RcppArmadillo are installed from source after installing clang4 for mac following coatless's link https://github.com/coatless/r-macos-rtools
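On the single-thread question, one option is to sort each output column's row indices as soon as that column is finished, instead of transposing the whole result twice. The sketch below shows the idea with plain std::vector arrays in compressed sparse column layout. The Csc struct and spgemm_sorted are hypothetical names, not Armadillo's API, and whether this beats (C.t()).t() on the matrices above is untested. It also avoids guessing nnzC by letting the output vectors grow. As in the posted loop, the accumulator and the running output position are carried from one column to the next, so the columns cannot simply be handed to separate threads without per-thread buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSC container (not Armadillo's sp_mat).
struct Csc {
    std::size_t n_rows;
    std::size_t n_cols;
    std::vector<std::size_t> col_ptr;  // size n_cols + 1
    std::vector<std::size_t> row_idx;  // size nnz
    std::vector<double> val;           // size nnz
};

// C = A * B, built column by column (Gustavson). Each output column's
// row indices are sorted before the values are gathered.
Csc spgemm_sorted(const Csc& A, const Csc& B) {
    assert(A.n_cols == B.n_rows);
    Csc C{A.n_rows, B.n_cols,
          std::vector<std::size_t>(B.n_cols + 1, 0), {}, {}};

    const std::size_t none = static_cast<std::size_t>(-1);
    std::vector<std::size_t> mark(A.n_rows, none);  // last column that touched row k
    std::vector<double> acc(A.n_rows, 0.0);         // dense accumulator

    for (std::size_t i = 0; i < B.n_cols; ++i) {
        const std::size_t start = C.row_idx.size();
        for (std::size_t jp = B.col_ptr[i]; jp < B.col_ptr[i + 1]; ++jp) {
            const std::size_t j = B.row_idx[jp];
            const double b = B.val[jp];
            for (std::size_t kp = A.col_ptr[j]; kp < A.col_ptr[j + 1]; ++kp) {
                const std::size_t k = A.row_idx[kp];
                if (mark[k] != i) {        // first contribution to C(k, i)
                    mark[k] = i;
                    acc[k] = A.val[kp] * b;
                    C.row_idx.push_back(k);
                } else {                   // accumulate further contributions
                    acc[k] += A.val[kp] * b;
                }
            }
        }
        // Order only this column's slice, then gather its values.
        std::sort(C.row_idx.begin() + static_cast<std::ptrdiff_t>(start),
                  C.row_idx.end());
        for (std::size_t vp = start; vp < C.row_idx.size(); ++vp)
            C.val.push_back(acc[C.row_idx[vp]]);
        C.col_ptr[i + 1] = C.row_idx.size();
    }
    return C;
}
```

The per-column sort costs O(nnz_col log nnz_col), so it should pay off mainly when columns of the product are short; for very dense columns a scan over the marker array may be the better way to emit rows in order.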

Related

Speeding up Rcpp `anyNA` equivalent

This question is related to this old question and this old question.
R has the nice wrapper-ish function anyNA for quicker evaluation of any(is.na(x)). When working in Rcpp a similar minimal implementation could be given by:
// CharacterVector example
#include <Rcpp.h>
using namespace Rcpp;
template<typename T, typename S>
bool any_na(S x){
  T xx = as<T>(x);
  for(auto i : xx){
    if(T::is_na(i))
      return true;
  }
  return false;
}
// [[Rcpp::export(rng = false)]]
LogicalVector any_na(SEXP x){
  return any_na<CharacterVector>(x);
}
// [[Rcpp::export(rng = false)]]
SEXP overhead(SEXP x){
  CharacterVector xx = as<CharacterVector>(x);
  return wrap(xx);
}
/***R
library(microbenchmark)
vec <- sample(letters, 1e6, TRUE)
vec[1e6] <- NA_character_
any_na(vec)
# [1] TRUE
*/
But comparing its performance to that of anyNA, I was surprised by the benchmark below:
library(microbenchmark)
microbenchmark(
Rcpp = any_na(vec),
R = anyNA(vec),
overhead = overhead(vec),
unit = "ms"
)
Unit: milliseconds
expr min lq mean median uq max neval cld
Rcpp 2.647901 2.8059500 3.243573 3.0435010 3.675051 5.899100 100 c
R 0.800300 0.8151005 0.952301 0.8577015 0.961201 3.467402 100 b
overhead 0.001300 0.0029010 0.011388 0.0122510 0.015751 0.048401 100 a
where the last line is the "overhead" incurred from converting back and forth between SEXP and CharacterVector (which turns out to be negligible). As is immediately evident, the Rcpp version is roughly 3.5 times slower than the R version. I was curious, so I checked the source of Rcpp's is_na. Finding no obvious reason for the slow performance, I went on to check the source of anyNA for R's own character vectors and reimplemented the function using R's C API, hoping to speed it up:
// Added after SEXP overhead(SEXP x){ --- }
inline bool anyNA2(SEXP x){
  R_xlen_t n = Rf_length(x);
  for(R_xlen_t i = 0; i < n; i++){
    if(STRING_ELT(x, i) == NA_STRING)
      return true;
  }
  return false;
}
// [[Rcpp::export(rng = false)]]
SEXP any_na2(SEXP x){
  bool xx = anyNA2(x);
  return wrap(xx);
}
// [[Rcpp::export(rng = false)]]
SEXP any_na3(SEXP x){
  Function anyNA("anyNA");
  return anyNA(x);
}
/***R
microbenchmark(
  Rcpp = any_na(vec),
  R = anyNA(vec),
  R_C_api = any_na2(vec),
  Rcpp_Function = any_na3(vec),
  overhead = overhead(vec),
  unit = "ms"
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# Rcpp 2.654901 2.8650515 3.54936501 3.2392510 3.997901 8.074201 100 d
# R 0.803701 0.8303015 1.01017200 0.9400015 1.061751 2.019902 100 b
# R_C_api 2.336402 2.4536510 3.01576302 2.7220010 3.314951 6.905101 100 c
# Rcpp_Function 0.844001 0.8862510 1.09259990 0.9597505 1.120701 3.011801 100 b
# overhead 0.001500 0.0071005 0.01459391 0.0146510 0.017651 0.101401 100 a
*/
Note that I've included a simple wrapper calling anyNA through Rcpp::Function as well. Once again, this implementation of anyNA is not just a little but a lot slower than the base implementation.
So the question becomes twofold:
1. Why is the Rcpp version so much slower?
2. Derived from 1: how could this be "changed" to speed up the code?
The questions are not very interesting in themselves, but they become interesting if this affects multiple parts of Rcpp implementations that could, in aggregate, gain significant performance boosts.
Session info:
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_Denmark.1252 LC_CTYPE=English_Denmark.1252 LC_MONETARY=English_Denmark.1252 LC_NUMERIC=C LC_TIME=English_Denmark.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.4-7 cmdline.arguments_0.0.1 glue_1.4.2 R6_2.5.0 Rcpp_1.0.6
loaded via a namespace (and not attached):
[1] codetools_0.2-18 lattice_0.20-41 mvtnorm_1.1-1 zoo_1.8-8 MASS_7.3-53 grid_4.0.3 multcomp_1.4-15 Matrix_1.2-18 sandwich_3.0-0 splines_4.0.3
[11] TH.data_1.0-10 tools_4.0.3 survival_3.2-7 compiler_4.0.3
Edit (Not only a windows problem):
I wanted to make sure this is not a "Windows problem", so I reran the benchmark in a Docker container running Linux. The result is shown below and is very similar:
# Unit: milliseconds
# expr min lq mean median uq max neval
# Rcpp 2.3399 2.62155 4.093380 3.12495 3.92155 26.2088 100
# R 0.7635 0.84415 1.459659 1.10350 1.42145 12.1148 100
# R_C_api 2.3358 2.56500 3.833955 3.11075 3.65925 14.2267 100
# Rcpp_Function 0.8163 0.96595 1.574403 1.27335 1.56730 11.9240 100
# overhead 0.0009 0.00530 0.013330 0.01195 0.01660 0.0824 100
Session info:
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] microbenchmark_1.4-7 Rcpp_1.0.5
loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2
This is an interesting question, but the answer is pretty simple: there are two versions of STRING_ELT, one in Rinlinedfuns.h that is used internally by R (or if you set the USE_RINTERNALS macro), and one for plebs in memory.c.
Comparing the two versions, you can see that the pleb version has more checks, which fully accounts for the difference in speed.
If you really want speed and don't care about safety, you can usually beat R by at least a little bit.
// [[Rcpp::export(rng = false)]]
bool any_na_unsafe(SEXP x) {
  SEXP* ptr = STRING_PTR(x);
  R_xlen_t n = Rf_xlength(x);
  for(R_xlen_t i=0; i<n; ++i) {
    if(ptr[i] == NA_STRING) return true;
  }
  return false;
}
Bench:
> microbenchmark(
+ R = anyNA(vec),
+ R_C_api = any_na2(vec),
+ unsafe = any_na_unsafe(vec),
+ unit = "ms"
+ )
Unit: milliseconds
expr min lq mean median uq max neval
R 0.5058 0.52830 0.553696 0.54000 0.55465 0.7758 100
R_C_api 1.9990 2.05170 2.214136 2.06695 2.10220 12.2183 100
unsafe 0.3170 0.33135 0.369585 0.35270 0.37730 1.2856 100
As written this is unsafe, but if you add a few checks before the loop it would be fine.
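To illustrate the "checks before the loop" idea without R's headers, here is a plain C++ analogy. Everything in it is hypothetical: Obj stands in for an R object, Kind for its type tag, and nullptr for NA_STRING; checked_elt plays the role of an accessor that validates on every call. The sketch only shows the structure (validate per element versus validate once, then scan a raw pointer) and makes no timing claim.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an R object: a type tag plus string pointers,
// with nullptr playing the role of NA_STRING.
enum class Kind { Character, Other };
struct Obj {
    Kind kind;
    std::vector<const char*> elts;
};

// Accessor that validates the type tag and the index on every call.
const char* checked_elt(const Obj& x, std::size_t i) {
    if (x.kind != Kind::Character)
        throw std::invalid_argument("not a character vector");
    return x.elts.at(i);  // bounds-checked access
}

// Pays for the checks once per element.
bool any_na_checked(const Obj& x) {
    for (std::size_t i = 0; i < x.elts.size(); ++i)
        if (checked_elt(x, i) == nullptr) return true;
    return false;
}

// Validates once, then scans the raw pointer with no per-element checks.
bool any_na_hoisted(const Obj& x) {
    if (x.kind != Kind::Character)
        throw std::invalid_argument("not a character vector");
    const char* const* p = x.elts.data();
    const std::size_t n = x.elts.size();
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] == nullptr) return true;
    return false;
}
```

In the R version the corresponding up-front check would be on the object's type before taking the pointer; that part is left to the reader since it depends on R's C API.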
This question turns out to be a good example of why some people rail and rant against microbenchmarks.
Baseline is a built-in primitive
The function that is supposed to be beaten here is actually a primitive, which already makes it a little tricky:
> anyNA
function (x, recursive = FALSE) .Primitive("anyNA")
>
ALTREP puts a performance floor down
Next, a little experiment shows that the baseline function anyNA() never loops. We define a very short vector srt and a long vector lng, both contain a NA value. Turns out ... R is optimised via ALTREP keeping a matching bit in the data structure headers and the cost of checking is independent of length:
> srt <- c("A",NA_character_); lng <- c(rep("A", 1e6), NA_character_)
> microbenchmark(short=function(srt) { anyNA(srt) },
+ long=function(lng) { anyNA(lng) }, times=1000)
Unit: nanoseconds
expr min lq mean median uq max neval cld
short 48 50 69.324 51 53 5293 1000 a
long 48 50 92.166 51 52 15494 1000 a
>
Note the units here (nanoseconds) and time spent. We are measuring looking at single bit.
(Edit: Scratch that. Thinko of mine in a rush, see comments.)
Rcpp functions have some small overhead
This is not new, and it is documented. If you look at the code generated by Rcpp Attributes, which conveniently gives us an R function of the same name as the C++ function we designate, you see that at least one other function call is involved, plus a baked-in try/catch layer, RNG setting (here turned off) and so on. That cannot be zero, but if amortized against anything reasonable it neither matters nor shows up in measurements.
Here, however, the exercise was set up to match a primitive function looking at one bit. It's a race one cannot win. So here is my final table:
> microbenchmark(anyNA = anyNA(vec), Rcpp_plain = rcpp_c_api(vec),
+ Rcpp_tmpl = rcpp_any_na(vec), Rcpp_altrep = rcpp_altrep(vec),
+ times = .... [TRUNCATED]
Unit: microseconds
expr min lq mean median uq max neval cld
anyNA 643.993 658.43 827.773 700.729 819.78 6280.85 5000 a
Rcpp_plain 1916.188 1952.55 2168.708 2022.017 2191.64 8506.71 5000 d
Rcpp_tmpl 1709.380 1743.04 1933.043 1798.788 1947.83 8176.10 5000 c
Rcpp_altrep 1501.148 1533.88 1741.465 1590.572 1744.74 10584.93 5000 b
It contains the primitive R function; the original (templated) C++ function, which still looks pretty good; something using Rcpp (and its small overhead) with just C API use (plus the automatic wrappers in/out), which is a little slower; and then, for comparison, a function from Michel's checkmate package which does look at the ALTREP bit. And it is barely faster.
So really what we are looking at here is overhead from function calls getting in the way of measuring a micro-operation. So no, Rcpp cannot be made faster than a highly optimised primitive. The question looked interesting, but was, at the end of the day, somewhat ill-posed. Sometimes it is worth working through that.
My code version follows below.
// CharacterVector example
#include <Rcpp.h>
using namespace Rcpp;
template<typename T, typename S>
bool any_na(S x){
  T xx = as<T>(x);
  for (auto i : xx){
    if (T::is_na(i))
      return true;
  }
  return false;
}
// [[Rcpp::export(rng = false)]]
LogicalVector rcpp_any_na(SEXP x){
  return any_na<CharacterVector>(x);
}
// [[Rcpp::export(rng = false)]]
SEXP overhead(SEXP x){
  CharacterVector xx = as<CharacterVector>(x);
  return wrap(xx);
}
// [[Rcpp::export(rng = false)]]
bool rcpp_c_api(SEXP x) {
  R_xlen_t n = Rf_length(x);
  for (R_xlen_t i = 0; i < n; i++) {
    if(STRING_ELT(x, i) == NA_STRING)
      return true;
  }
  return false;
}
// [[Rcpp::export(rng = false)]]
SEXP any_na3(SEXP x){
  Function anyNA("anyNA");
  return anyNA(x);
}
// courtesy of the checkmate package
// [[Rcpp::export(rng=false)]]
R_xlen_t rcpp_altrep(SEXP x) {
#if defined(R_VERSION) && R_VERSION >= R_Version(3, 5, 0)
  if (STRING_NO_NA(x))
    return 0;
#endif
  const R_xlen_t nx = Rf_xlength(x);
  for (R_xlen_t i = 0; i < nx; i++) {
    if (STRING_ELT(x, i) == NA_STRING)
      return i + 1;
  }
  return 0;
}
/***R
library(microbenchmark)
srt <- c("A",NA_character_)
lng <- c(rep("A", 1e6), NA_character_)
microbenchmark(short = function(srt) { anyNA(srt) },
long = function(lng) { anyNA(lng) },
times=1000)
N <- 1e6
vec <- sample(letters, N, TRUE)
vec[N] <- NA_character_
anyNA(vec) # to check
microbenchmark(
anyNA = anyNA(vec),
Rcpp_plain = rcpp_c_api(vec),
Rcpp_tmpl = rcpp_any_na(vec),
Rcpp_altrep = rcpp_altrep(vec),
#Rcpp_Function = any_na3(vec),
#overhead = overhead(vec),
times = 5000
# unit="relative"
)
*/

Why is my Rcpp function crashing when using a large input matrix?

I made a simple Rcpp function to calculate all Pearson correlation coefficients that can be computed from all row combinations of an input matrix E. The results are stored with 4 decimals of precision (in integer format) in a vector v. The function works fine if the dimensions of E aren't too large, but it just crashes when I test it with a data size similar to that of the real data that I want to process.
Here is the Rcpp code:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void pearson(NumericMatrix E, IntegerVector v){
  int rows = E.nrow();
  int cols = E.ncol();
  int j, irow, jrow;
  double rowsum;
  NumericVector means(rows);
  int k = 0;
  double cov, varx, vary;
  double pearson;
  for(irow = 0; irow < rows; irow++){
    rowsum = 0;
    for(j = 0; j < cols; j++){
      rowsum += E(irow, j);
    }
    means[irow] = rowsum / cols;
  }
  for(irow = 0; irow < rows - 1; irow++){
    for(jrow = irow + 1; jrow < rows; jrow++){
      cov = 0;
      varx = 0;
      vary = 0;
      for(j = 0; j < cols; j++) {
        cov += (E(irow, j) - means[irow]) * (E(jrow, j) - means[jrow]);
        varx += std::pow(E(irow, j) - means[irow], 2);
        vary += std::pow(E(jrow, j) - means[jrow], 2);
      }
      pearson = cov / std::sqrt(varx * vary);
      v[k] = (int) (pearson * 10000);
      k++;
    }
  }
}
And then for testing it in R I started with the following:
library(Rcpp)
sourceCpp("pearson.cpp")
testin <- matrix(rnorm(1000 * 1100), nrow = 1000, ncol = 1100)
testout <- integer( (nrow(testin) * (nrow(testin) - 1)) / 2 )
pearson(testin, testout) # success!
However when increasing input size the R session crashes after executing the last line in the following script:
library(Rcpp)
sourceCpp("pearson.cpp")
testin <- matrix(rnorm(16000 * 17000), nrow = 16000, ncol = 17000)
testout <- integer( (nrow(testin) * (nrow(testin) - 1)) / 2 )
pearson(testin, testout) # sad
I feel like this is strange, since I'm able to allocate the input and the output just fine before executing the function. Inside the function the output vector is modified by reference. I can't figure out what is wrong. Currently I'm working on a machine with 16GB RAM.
EDIT: output of sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Mexico.1252
[2] LC_CTYPE=Spanish_Mexico.1252
[3] LC_MONETARY=Spanish_Mexico.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Spanish_Mexico.1252
attached base packages:
[1] stats graphics grDevices
[4] utils datasets methods
[7] base
other attached packages:
[1] Rcpp_1.0.5
loaded via a namespace (and not attached):
[1] compiler_4.0.4
Just for the sake of giving closure to this question: I tried running the function with just the inputs allocated and without running the actual algorithm, as suggested in the comments, and it returns just fine. It seems that on Windows, when the input reaches a certain size, the window dims and shows "not responding" next to the R console's window name. However, the function is still running: it eventually finishes if given enough time, and the R console's window returns to normal. Because the process took so long and the window looked the way it does when Rcpp crashes, I assumed the process was not running and that it was some sort of crash.
What I ended up doing is programming a parallel version of the algorithm with the aid of this very helpful tutorial by some of the creators of RcppParallel. Since I cannot afford using the base R cor() function due to memory constraints, making the parallel version suited my needs perfectly.
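For a sense of scale: with 16000 rows there are 16000 * 15999 / 2 = 127,992,000 row pairs, and each pair loops over 17000 columns, so the inner loop body runs about 2.18e12 times. Independent of parallelism, the per-pair work can be reduced by centering and scaling every row once, after which each correlation is a single dot product. The sketch below shows that restructuring in plain C++; pearson_scaled is a hypothetical name, and the row-major input layout is an assumption of the sketch (R matrices are column-major). Rows with zero variance are not handled, the same as in the posted code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// E holds rows * cols values in row-major order (an assumption of this
// sketch). It is taken by value because rows are centered and scaled in place.
// Returns the truncated value of r * 10000 for every pair (i, j) with i < j.
std::vector<int> pearson_scaled(std::vector<double> E,
                                std::size_t rows, std::size_t cols) {
    assert(E.size() == rows * cols);
    std::vector<int> out;
    if (rows < 2) return out;

    // Pass 1: subtract each row's mean and divide by its root sum of squares.
    for (std::size_t r = 0; r < rows; ++r) {
        double* row = E.data() + r * cols;
        double mean = 0.0;
        for (std::size_t c = 0; c < cols; ++c) mean += row[c];
        mean /= static_cast<double>(cols);
        double ss = 0.0;
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] -= mean;
            ss += row[c] * row[c];
        }
        const double inv = 1.0 / std::sqrt(ss);  // zero-variance rows not handled
        for (std::size_t c = 0; c < cols; ++c) row[c] *= inv;
    }

    // Pass 2: each correlation is now a plain dot product of two rows.
    out.reserve(rows * (rows - 1) / 2);
    for (std::size_t i = 0; i + 1 < rows; ++i) {
        const double* a = E.data() + i * cols;
        for (std::size_t j = i + 1; j < rows; ++j) {
            const double* b = E.data() + j * cols;
            double dot = 0.0;
            for (std::size_t c = 0; c < cols; ++c) dot += a[c] * b[c];
            out.push_back(static_cast<int>(dot * 10000));
        }
    }
    return out;
}
```

Because of floating-point rounding, results can differ from the posted function by one unit in the last stored digit. The sketch copies the input, which would not be affordable at the sizes in the question; it is meant to show the loop structure, not a memory-efficient implementation.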

Is there any way in which to make an Infix function using sourceCpp()

I was wondering whether it is possible to make an infix function, e.g. A %o% B with Rcpp.
I know that this is possible using the inline package, but I have not yet been able to find a method for doing this when using sourceCpp().
I have made the following infix implementation of %o% / outer() when arguments are sure to be vectors using RcppEigen and inline:
`%op%` <- cxxfunction(signature(v1="NumericVector",
v2="NumericVector"),
plugin = "RcppEigen",
body = c("
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
"))
This can easily be adapted for import using sourceCpp(), but not as an infix function.
My current attempt is as follows:
#include <Rcpp.h>
using namespace Rcpp;
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
NumericMatrix outerProd(NumericVector v1, NumericVector v2) {
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
}
So to summarize my question.. Is it possible to make an infix function available through sourceCpp?
Is it possible to make an infix function available through sourceCpp?
Yes.
As always, one should read the Rcpp vignettes!
In particular here, if you look in Section 1.6 of the Rcpp attributes vignette, you'd see you can modify the name of a function using the name parameter for Rcpp::export.
For example, we could do:
#include <Rcpp.h>
// [[Rcpp::export(name = `%+%`)]]
Rcpp::NumericVector add(Rcpp::NumericVector x, Rcpp::NumericVector y) {
return x + y;
}
/*** R
1:3 %+% 4:6
*/
Then we'd get:
Rcpp::sourceCpp("~/infix-test.cpp")
> 1:3 %+% 4:6
[1] 5 7 9
So, you still have to name C++ functions valid C++ names in the code, but you can export it to R through the name parameter of Rcpp::export without having to do anything further on the R side.
John Chambers states three principles on page four of the (highly recommended) "Extending R" book:
Everything that exists in R is an object.
Everything that happens in R is a function call.
Interfaces to other software are part of R.
So per point two, you can of course use sourceCpp() to create a compiled function and hang it on any odd infix operator you like.
Code Example
library(Rcpp)
cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
`%+%` <- function(a,b) cc(a,b)
cc("Hello", "World")
"hello" %+% "world"
Output
R> library(Rcpp)
R> cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
R> `%+%` <- function(a,b) cc(a,b)
R>
R> cc("Hello", "World")
[1] "HelloWorld"
R>
R> "hello" %+% "world"
[1] "helloworld"
R>
Summary
Rcpp is really just one cog in the machinery.
Edit
It also works with your initial function, with some minor simplification. For
`%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
as.numeric(1:3) %op% as.numeric(3:1)
we get
R> `%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
R> as.numeric(1:3) %op% as.numeric(3:1)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 4 2
[3,] 9 6 3
R>
(modulo some line noise from the compiler).

RcppEigen cannot return matrix to R with more than 2^31 elements [duplicate]

This question already has answers here:
Is there a limit on working with matrix in R with Rcpp?
I've been using the Rcpp and RcppEigen packages to do some matrix calculations and have noticed that an error is produced if the length of the matrix to be returned to R exceeds .Machine$integer.max. Here is a reproducible example:
test_rcpp.cpp
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP testM(const Eigen::Map<Eigen::MatrixXd> A) {
  Eigen::MatrixXd C = A * A.transpose();
  // the second half takes the remaining rows, so even row counts also work
  return List::create(Rcpp::Named("first") = C.block(0,0,C.rows()/2,C.cols()),
                      Rcpp::Named("second") = C.block(C.rows()/2,0,C.rows()-C.rows()/2,C.cols()));
}
// [[Rcpp::export]]
SEXP testM2(const Eigen::Map<Eigen::MatrixXd> A) {
  Eigen::MatrixXd C = A * A.transpose();
  return wrap(C);
}
test_rcpp.R
library(Rcpp)
sourceCpp("./test_rcpp.cpp")
A <- matrix(rep(1, ceiling(sqrt((.Machine$integer.max)))), nrow=ceiling(sqrt(.Machine$integer.max)))
tm <- do.call(rbind, testM(A))
tm2 <- testM2(A)
Running testM2(A) returns the error "Error in testM2(A) : negative length vectors are not allowed". Currently, testM(A) is my workaround, which splits the matrix in half and returns a list of the two halves.
Is this intended behavior? And if so, what other workarounds are there?
This link had some information but didn't help me specifically with this problem. A similar post suggests problems are encountered when the dimensions of the matrix exceed 2^31. In this case, the matrix I'm returning has dimension c(46341, 46341), well under the 2^31 limit imposed on matrix indices, and contains 2147488281 elements, well under the 2^52 limit imposed on long vectors.
A subset of sessionInfo() information:
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.3 (Santiago)
Note: I get the same problem on R version 3.4.2.
This is a limitation in the current implementation of RcppEigen. Example:
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
// [[Rcpp::export]]
void get_length_rcpp(Rcpp::IntegerMatrix m){
  Rcpp::Rcout << m.nrow() << ' ' << m.ncol() << ' '
              << (m.nrow() * m.ncol()) << ' ' << m.size();
}
// [[Rcpp::export]]
void get_length_eigen(Eigen::Map<Eigen::MatrixXi> m){
  Rcpp::Rcout << m.rows() << ' ' << m.cols() << ' '
              << (m.rows() * m.cols()) << ' ' << m.size();
}
/*** R
N <- 5e4
A <- matrix(1L, ncol = N, nrow = N)
get_length_rcpp(A)
get_length_eigen(A)
*/
Output:
> N <- 50000
> A <- matrix(1, ncol = N, nrow = N)
> get_length_rcpp(A)
50000 50000 -1794967296 2500000000
> get_length_eigen(A)
Error in get_length_eigen(A) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:519
Calls: <Anonymous> ... withVisible -> eval -> eval -> get_length_eigen -> .Call
Execution halted
I have opened an issue and a pull request on github for this.
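The negative number in the output above is what a 32-bit product wraps to. The following standalone sketch, which uses no Rcpp or Eigen code and whose function names are hypothetical, reproduces the arithmetic: the element count is computed in 64 bits, checked against the 32-bit limit, and the wrapped 32-bit value is derived through unsigned arithmetic so that the sketch itself avoids signed overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Element count of an nrow x ncol matrix, computed in 64 bits.
std::int64_t element_count(std::int32_t nrow, std::int32_t ncol) {
    return static_cast<std::int64_t>(nrow) * ncol;
}

// True when the count is representable as a signed 32-bit integer.
bool fits_in_int32(std::int64_t n) {
    return n <= std::numeric_limits<std::int32_t>::max();
}

// The value a two's-complement 32-bit product wraps to. The multiplication
// is done on unsigned operands (well defined), then mapped back to the
// signed range by subtracting 2^32 when the top bit is set.
std::int32_t wrapped_int32(std::int32_t nrow, std::int32_t ncol) {
    const std::uint32_t u =
        static_cast<std::uint32_t>(nrow) * static_cast<std::uint32_t>(ncol);
    const std::int64_t s =
        (u > static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max()))
            ? static_cast<std::int64_t>(u) - (std::int64_t{1} << 32)
            : static_cast<std::int64_t>(u);
    return static_cast<std::int32_t>(s);
}
```

For the question's 46341 x 46341 result, element_count gives 2147488281, which is just above the 32-bit maximum of 2147483647, while a 46340 x 46340 matrix still fits. For the 50000 x 50000 case, wrapped_int32 gives -1794967296, matching the printed value.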

memory efficient method to calculate distance matrix [duplicate]

I have an object of class big.matrix in R with dimension 778844 x 2. The values are all integers (kilometres). My objective is to calculate the Euclidean distance matrix using the big.matrix and have as a result an object of class big.matrix. I would like to know if there is an optimal way of doing that.
The reason for my choice of the class big.matrix is memory limitation. I could transform my big.matrix to an object of class matrix and calculate the Euclidean distance matrix using dist(). However, dist() would return an object too large to be allocated in memory.
Edit
The following answer was given by John W. Emerson, author and maintainer of the bigmemory package:
You could use big algebra I expect, but this would also be a very nice use case for Rcpp via sourceCpp(), and very short and easy. But in short, we don't even attempt to provide high-level features (other than the basics which we implemented as proof-of-concept). No single algorithm could cover all use cases once you start talking out-of-memory big.
Here is a way using RcppArmadillo. Much of this is very similar to the RcppGallery example. This will return a big.matrix with the associated pairwise (by row) Euclidean distances. I like to wrap my big.matrix functions in a wrapper function to create a cleaner syntax (i.e. avoid the @address and other initializations).
Note: as we are using bigmemory (and are therefore concerned with RAM usage), I have this example return the N-1 x N-1 matrix of only the lower triangular elements. You could modify this, but this is what I threw together.
euc_dist.cpp
// To enable the functionality provided by Armadillo's various macros,
// simply include them before you include the RcppArmadillo headers.
#define ARMA_NO_DEBUG
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]
using namespace Rcpp;
using namespace arma;
// The following header file provides the definitions for the BigMatrix
// object
#include <bigmemory/BigMatrix.h>
// C++11 plugin
// [[Rcpp::plugins(cpp11)]]
template <typename T>
void BigArmaEuclidean(const Mat<T>& inBigMat, Mat<T> outBigMat) {
  int W = inBigMat.n_rows;
  for(int i = 0; i < W - 1; i++){
    for(int j=i+1; j < W; j++){
      outBigMat(j-1,i) = sqrt(sum(pow((inBigMat.row(i) - inBigMat.row(j)),2)));
    }
  }
}
// [[Rcpp::export]]
void BigArmaEuc(SEXP pInBigMat, SEXP pOutBigMat) {
  // First we tell Rcpp that the object we've been given is an external
  // pointer.
  XPtr<BigMatrix> xpMat(pInBigMat);
  XPtr<BigMatrix> xpOutMat(pOutBigMat);
  int type = xpMat->matrix_type();
  switch(type) {
    case 1:
      BigArmaEuclidean(
        arma::Mat<char>((char *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
        arma::Mat<char>((char *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
      );
      return;
    case 2:
      BigArmaEuclidean(
        arma::Mat<short>((short *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
        arma::Mat<short>((short *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
      );
      return;
    case 4:
      BigArmaEuclidean(
        arma::Mat<int>((int *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
        arma::Mat<int>((int *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
      );
      return;
    case 8:
      BigArmaEuclidean(
        arma::Mat<double>((double *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
        arma::Mat<double>((double *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
      );
      return;
    default:
      // We should never get here, but it resolves compiler warnings.
      throw Rcpp::exception("Undefined type for provided big.matrix");
  }
}
My little wrapper
bigMatrixEuc <- function(bigMat){
  zeros <- big.matrix(nrow = nrow(bigMat)-1,
                      ncol = nrow(bigMat)-1,
                      init = 0,
                      type = typeof(bigMat))
  BigArmaEuc(bigMat@address, zeros@address)
  return(zeros)
}
The test
library(Rcpp)
sourceCpp("euc_dist.cpp")
library(bigmemory)
set.seed(123)
mat <- matrix(rnorm(16), 4)
bm <- as.big.matrix(mat)
# Call new euclidean function
bm_out <- bigMatrixEuc(bm)[]
# pull out the matrix elements for our purposes
distMat <- as.matrix(dist(mat))
distMat[upper.tri(distMat, diag=TRUE)] <- 0
distMat <- distMat[2:4, 1:3]
# check if identical
all.equal(bm_out, distMat, check.attributes = FALSE)
[1] TRUE
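The storage scheme used above, where entry (j-1, i) of an (N-1) x (N-1) matrix holds the distance between rows i and j for i < j, can be sketched without Armadillo or bigmemory as follows. The function name lower_tri_dist and the use of std::vector are assumptions for illustration; the input is taken as column-major, which is how R stores matrices, and the output is always double rather than matching the input type.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// X: n points with d coordinates each, stored column-major (n * d values).
// Returns an (n-1) x (n-1) column-major buffer in which entry (j-1, i)
// holds the Euclidean distance between rows i and j for i < j.
// All other entries stay 0.
std::vector<double> lower_tri_dist(const std::vector<double>& X,
                                   std::size_t n, std::size_t d) {
    assert(n >= 2 && X.size() == n * d);
    const std::size_t m = n - 1;
    std::vector<double> out(m * m, 0.0);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double ss = 0.0;
            for (std::size_t c = 0; c < d; ++c) {
                const double diff = X[i + c * n] - X[j + c * n];
                ss += diff * diff;
            }
            out[(j - 1) + i * m] = std::sqrt(ss);  // row j-1, column i
        }
    }
    return out;
}
```

For the three points (0,0), (3,4) and (6,8), the returned 2 x 2 buffer is 5, 10, 0, 5 in column-major order. Note that for the 778844 rows in the question, the (N-1) x (N-1) output has roughly 6e11 entries, so the result would need to be file-backed or computed in blocks.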
