Allocate Rcpp List of n NumericMatrix - r

Is there a way to allocate an Rcpp List of length n, where each element of the List will be filled with a NumericMatrix, but the size of each NumericMatrix can change?
I have an idea for doing this using std::list and push_back(), but the size of the list may be quite large and I want to avoid the overhead of creating an extra copy of the list when I return from the function.
The below R code gives an idea of what I hope to do:
myvec = function(n) {
x = vector("list", n)
for (i in seq_len(n)) {
nc = sample(1:3, 1)
nr = sample(1:3, 1)
x[[i]] = matrix(rbinom(nc * nr, size = 1, prob = 0.5),
nrow = nr, ncol = nc)
}
x
}
This could result in something like:
> myvec(2)
[[1]]
[,1]
[1,] 0
[2,] 1
[[2]]
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 0 1 1
Update: based on the comments of #Dirk and #Ralf, I created functions based on Rcpp::List and std::list with a wrap at the end. Speed comparisons don't seem to favor one version over the other, but perhaps there's an inefficiency I'm not aware of.
src = '
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
Rcpp::RNGScope rngScope;
Rcpp::List x(n);
// Rcpp::IntegerVector choices = {1, 2 ,3};
Rcpp::IntegerVector choices = Rcpp::seq_len(50);
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
}
return x;
}
// [[Rcpp::export]]
Rcpp::List myvec2(int n) {
Rcpp::RNGScope scope;
std::list< Rcpp::NumericMatrix > x;
// Rcpp::IntegerVector choices = {1, 2 ,3};
Rcpp::IntegerVector choices = Rcpp::seq_len(50);
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x.push_back( Rcpp::NumericMatrix(nc, nr, entries.begin()));
}
return Rcpp::wrap(x);
}
'
sourceCpp(code = src)
Resulting benchmarks on my computer are:
> library(microbenchmark)
> rcpp_list = function() {
+ set.seed(10);myvec(105)
+ }
> std_list = function() {
+ set.seed(10);myvec2(105)
+ }
> microbenchmark(rcpp_list(), std_list(), times = 1000)
Unit: milliseconds
expr min lq mean median uq
rcpp_list() 1.8901 1.92535 2.205286 1.96640 2.22380
std_list() 1.9164 1.95570 2.224941 2.00555 2.32315
max neval cld
7.1569 1000 a
7.1194 1000 a

The fundamental issue that Rcpp objects are R objects governed my R's memory management where resizing is expensive: full copies.
So when I have tasks similar to yours where sizes may change, or are unknown, I often work with different data structures -- the STL gives us plenty -- and only convert to R(cpp) at the return step at the end.
The devil in the detail here (as always). Profile, experiment, ...
Edit: And in the narrower sense of "can we return a List of NumericMatrix objects with varying sizes" the answer is of course we can because that is what List objects do. You can also insert other types.

As Dirk said, it is of course possible to create a list with matrices of different size. To make it a bit more concrete, here a translation of your R function:
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
Rcpp::List x(n);
Rcpp::IntegerVector choices = {1, 2 ,3};
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
}
return x;
}
/***R
myvec(2)
*/
The main difference to the R code are the explicitly named vectors choices and entries, which are only implicit in the R code.

Related

How to concatenate Lists in Rcpp

I want to c() 2 lists in Rcpp, but I'm struggling to get the same structure as I would in R.
Here is some simple data + example:
rlist = list(a = "123")
listadd = list(typ = "fdb")
c(rlist, listadd)
which gives me this:
$a
[1] "123"
$typ
[1] "fdb"
With Rcpp I only found push_back to do more or less what I want but the structure is a bit different. I also tried to use emplace_back based on this reference but it doesnt seem to be implemented in Rcpp.
cppFunction('
List cLists(List x, List y) {
x.push_back(y);
return(x);
}')
which gives me:
cLists(rlist, listadd)
$a
[1] "123"
[[2]]
[[2]]$typ
[1] "fdb"
Based on this question I know that I could use Language("c",x,y).eval(); to use R's c() function and get the correct result, but that doesn't seem to be the right way.
So I was wondering how can I concatenate lists in Rcpp correctly?
EDIT:
Based on #Dirk's comment, I tried to create a new list and fill them with the other lists elements, but then I loose the element names.
cppFunction('
List cLists(List x, List y) {
int nsize = x.size();
int msize = y.size();
List out(nsize + msize);
for(int i = 0; i < nsize; i++) {
out[i] = x[i];
}
for(int i = 0; i < msize; i++) {
out[nsize+i] = y[i];
}
return(out);
}')
Output:
cLists(rlist, listadd)
[[1]]
[1] "123"
[[2]]
[1] "fdb"
The performance hit for your implementation seems to come from copying the name attribute to stl string vectors. You can avoid it like so:
library(Rcpp)
library(microbenchmark)
cppFunction('
List cLists(List x, List y) {
int nsize = x.size();
int msize = y.size();
List out(nsize + msize);
CharacterVector xnames = x.names();
CharacterVector ynames = y.names();
CharacterVector outnames(nsize + msize);
out.attr("names") = outnames;
for(int i = 0; i < nsize; i++) {
out[i] = x[i];
outnames[i] = xnames[i];
}
for(int i = 0; i < msize; i++) {
out[nsize+i] = y[i];
outnames[nsize+i] = ynames[i];
}
return(out);
}')
x <- as.list(runif(1e6)); names(x) <- sample(letters, 1e6, T)
y <- as.list(runif(1e6)); names(y) <- sample(letters, 1e6, T)
microbenchmark(cLists(x,y), c(x,y), times=3)
Unit: milliseconds
expr min lq mean median uq max neval cld
cLists(x, y) 31.70104 31.86375 32.09983 32.02646 32.29922 32.57198 3 a
c(x, y) 47.31037 53.21409 56.41159 59.11781 60.96220 62.80660 3 b
Note: by copying to std::string you're also losing possible character encoding information, whereas working with just R/Rcpp preserves.
This is what I came up with. The output is correct, but unfortunately it is also much less performant than the R version.
library(Rcpp)
cppFunction('
List cLists(List x, List y) {
int nsize = x.size();
int msize = y.size();
List out(nsize + msize);
CharacterVector xnames = x.names();
CharacterVector ynames = y.names();
for(int i = 0; i < nsize; i++) {
out[i] = x[i];
}
for(int i = 0; i < msize; i++) {
out[nsize+i] = y[i];
}
std::vector<std::string> z(x.size() + y.size());
std::copy(xnames.begin(), xnames.end(), z.begin());
std::copy(ynames.begin(), ynames.end(), z.begin() + x.size());
out.attr("names") = z;
return(out);
}')
Output:
cLists(rlist, listadd)
$a
[1] "123"
$typ
[1] "fdb"

Interleaving results from many objects in Rcpp

I need to write to a file row by row of matrices and sparse matrices that appears in a list and I am doing something like this:
#include <RcppArmadillo.h>
// [[Rcpp::export]]
bool write_rows (Rcpp::List data, Rcpp::CharacterVector clss, int n) {
int len = data.length();
for(int i = 0; i<n; i++) {
for(int j=0; j<len; j++) {
if (clss[j] == "matrix") {
Rcpp::NumericMatrix x = data[j];
auto row = x.row(i);
// do something with row i
} else if (clss[j] == "dgCMatrix") {
arma::sp_mat x = data[j];
auto row = x.row(i);
// do something different with row i
}
}
}
return true;
}
This function can be called in R with:
data <- list(
x = Matrix::rsparsematrix(nrow = 1000, ncol = 1000, density = 0.3),
y = matrix(1:10000, nrow = 1000, ncol = 10)
)
clss <- c("dgCMatrix", "matrix")
write_rows(data, clss, 1000)
The function receives a list of matrices or sparse matrices with the same number of rows and writes those matrices row by row, ie. first writes first rows of all elements in data then the second row of all elements and etc.
My problem is that it seems that this line arma::sp_mat x = data[i]; seems to have a huge impact in performance since it seems that I am implicitly casting the list element data[j] to an Armadillo Sparse Matrix n times.
My question is: is there anyway I could avoid this? Is there a more efficient solution? I tried to find a solution by looking into readr's source code, since they also write list elements row by row, but they also do a cast for each row (in this line for example, but maybe this doesn't impact the performance because they deal with SEXPS?
With the clarification, it seems that the result should interleave the rows from each matrix. You can still do this while avoiding multiple conversions.
This is the original code, modified to generate some actual output:
// [[Rcpp::export]]
arma::mat write_rows(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
int len = data.length();
arma::mat result(nrows*len, ncols);
for (int i = 0, k = 0; i < nrows; i++) {
for (int j = 0; j < len; j++) {
arma::rowvec r;
if (clss[j] == "matrix") {
Rcpp::NumericMatrix x = data[j];
r = x.row(i);
}
else {
arma::sp_mat x = data[j];
r = x.row(i);
}
result.row(k++) = r;
}
}
return result;
}
The following code creates a vector of converted objects, and then extracts the rows from each object as required. The conversion is only done once per matrix. I use a struct containing a dense and sparse mat because it's a lot simpler than dealing with unions; and I don't want to drag in boost::variant or require C++17. Since there's only 2 classes we want to deal with, the overhead is minimal.
struct Matrix_types {
arma::mat m;
arma::sp_mat M;
};
// [[Rcpp::export]]
arma::mat write_rows2(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
const int len = data.length();
std::vector<Matrix_types> matr(len);
std::vector<bool> is_dense(len);
arma::mat result(nrows*len, ncols);
// populate the structs
for (int j = 0; j < len; j++) {
is_dense[j] = (clss[j] == "matrix");
if (is_dense[j]) {
matr[j].m = Rcpp::as<arma::mat>(data[j]);
}
else {
matr[j].M = Rcpp::as<arma::sp_mat>(data[j]);
}
}
// populate the result
for (int i = 0, k = 0; i < nrows; i++) {
for (int j = 0; j < len; j++, k++) {
if (is_dense[j]) {
result.row(k) = matr[j].m.row(i);
}
else {
arma::rowvec r(matr[j].M.row(i));
result.row(k) = r;
}
}
}
return result;
}
Running on some test data:
data <- list(
a=Matrix(1.0, 1000, 1000, sparse=TRUE),
b=matrix(2.0, 1000, 1000),
c=Matrix(3.0, 1000, 1000, sparse=TRUE),
d=matrix(4.0, 1000, 1000)
)
system.time(z <- write_rows(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 185.75 35.04 221.38
system.time(z2 <- write_rows2(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 4.21 0.05 4.25
identical(z, z2)
# [1] TRUE

Rcpp Matrix row column permutations

I am trying to mimic the R function that allows to run column and row matrix permutations based on a vector of indices. Like in the following code:
m=matrix(sample(c(0:9),5*5,T),ncol=5,nrow=5)
diag(m)=0
rand=sample(c(1:5))
m[rand,rand]
I tried the following code in c++:
Library(Rcpp)
cppFunction(‘
NumericMatrix test(NumericMatrix& M, int col, IntegerVector& rand) {
NumericMatrix M2(col,col);
for(int a=0;a<col;a++){
for(int b=a+1;b<col;b++){
M2(b,a)=M(rand(b),rand(a));
M2(a,b)=M(rand(a),rand(b));
}
}
return M2;
}
‘)
But it is very slow:
microbenchmark::microbenchmark(test(m,5,(rand-1)),m2[rand,rand])
Any ideas how I could speed up the process?
Using a simpler loop:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix test(NumericMatrix& M, int col, IntegerVector& rand) {
NumericMatrix M2(col,col);
for(int a=0;a<col;a++){
for(int b=a+1;b<col;b++){
M2(b,a)=M(rand(b),rand(a));
M2(a,b)=M(rand(a),rand(b));
}
}
return M2;
}
// [[Rcpp::export]]
NumericMatrix test2(const NumericMatrix& M, const IntegerVector& ind) {
int col = M.ncol();
NumericMatrix M2(col, col);
for (int j = 0; j < col; j++)
for (int i = 0; i < col; i++)
M2(i, j) = M(ind[i], ind[j]);
return M2;
}
/*** R
N <- 500
m <- matrix(sample(c(0:9), N * N, TRUE), ncol = N, nrow = N)
diag(m) <- 0
rand <- sample(N)
all.equal(test(m, ncol(m), rand - 1), m[rand, rand], test2(m, rand - 1))
microbenchmark::microbenchmark(
test(m, ncol(m), rand - 1),
m[rand, rand],
test2(m, rand - 1)
)
*/
For N = 5, the R version is faster, but in terms of nanoseconds..
For example, with N = 500, you get:
Unit: microseconds
expr min lq mean median uq max neval
test(m, ncol(m), rand - 1) 2092.474 2233.020 2843.145 2360.654 2548.050 7412.057 100
m[rand, rand] 1422.352 1506.117 2064.500 1578.129 1718.345 6700.219 100
test2(m, rand - 1) 698.595 769.944 1161.747 838.811 928.535 5379.841 100

Rcpp version of tabulate is slower; where is this from, how to understand

In the process of creating some sampling functions for already aggregated data I found that table was rather slow on the size data I am working with. I tried two improvements, first an Rcpp function as follows
// [[Rcpp::export]]
IntegerVector getcts(NumericVector x, int m) {
IntegerVector cts(m);
int t;
for (int i = 0; i < x.length(); i++) {
t = x[i] - 1;
if (0 <= t && t < m)
cts[t]++;
}
return cts;
}
And then while trying to understand why table was rather slow I found it being based on tabulate. Tabulate works well for me, and is faster than the Rcpp version. The code for tabulate is at:
https://github.com/wch/r-source/blob/545d365bd0485e5f0913a7d609c2c21d1f43145a/src/main/util.c#L2204
With the key line being:
for(R_xlen_t i = 0 ; i < n ; i++)
if (x[i] != NA_INTEGER && x[i] > 0 && x[i] <= nb) y[x[i] - 1]++;
Now the key parts of tabulate and my Rcpp version seem pretty close (I have not bothered dealing with NA).
Q1: why is my Rcpp version 3 times slower?
Q2: how can I find out where this time goes?
I would very much appreciate knowing where the time went, but even better would be a good way to profile the code. My C++ skills are only so so, but this seems simple enough that I should (cross my fingers) have been able to avoid any silly stuff that would triple my time.
My timing code:
max_x <- 100
xs <- sample(seq(max_x), size = 50000000, replace = TRUE)
system.time(getcts(xs, max_x))
system.time(tabulate(xs))
This gives 0.318 for getcts and 0.126 for tabulate.
Your function calls a length method in each loop iteration. Seems compiler don't cache it. To fix this store size of the vector in a separate variable or use range based loop. Also note that we don't really need explicit missing values check because in C++ all comparisons involving a NaN always return false.
Let's compare performance:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector tabulate1(const IntegerVector& x, const unsigned max) {
IntegerVector counts(max);
for (std::size_t i = 0; i < x.size(); i++) {
if (x[i] > 0 && x[i] <= max)
counts[x[i] - 1]++;
}
return counts;
}
// [[Rcpp::export]]
IntegerVector tabulate2(const IntegerVector& x, const unsigned max) {
IntegerVector counts(max);
std::size_t n = x.size();
for (std::size_t i = 0; i < n; i++) {
if (x[i] > 0 && x[i] <= max)
counts[x[i] - 1]++;
}
return counts;
}
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector tabulate3(const IntegerVector& x, const unsigned max) {
IntegerVector counts(max);
for (auto& now : x) {
if (now > 0 && now <= max)
counts[now - 1]++;
}
return counts;
}
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector tabulate4(const IntegerVector& x, const unsigned max) {
IntegerVector counts(max);
for (auto it = x.begin(); it != x.end(); it++) {
if (*it > 0 && *it <= max)
counts[*it - 1]++;
}
return counts;
}
/***R
library(microbenchmark)
x <- sample(10, 1e5, rep = TRUE)
microbenchmark(
tabulate(x, 10), tabulate1(x, 10),
tabulate2(x, 10), tabulate3(x, 10), tabulate4(x, 10)
)
x[sample(10e5, 10e3)] <- NA
microbenchmark(
tabulate(x, 10), tabulate1(x, 10),
tabulate2(x, 10), tabulate3(x, 10), tabulate4(x, 10)
)
*/
tabulate1 is the original version.
Benchmark results:
Without NA:
Unit: microseconds
expr min lq mean median uq max neval
tabulate(x, 10) 143.557 146.8355 169.2820 156.1970 177.327 286.370 100
tabulate1(x, 10) 390.706 392.6045 437.7357 416.5655 443.065 748.767 100
tabulate2(x, 10) 108.149 111.4345 139.7579 118.2735 153.118 337.647 100
tabulate3(x, 10) 107.879 111.7305 138.2711 118.8650 139.598 300.023 100
tabulate4(x, 10) 391.003 393.4530 436.3063 420.1915 444.048 777.862 100
With NA:
Unit: microseconds
expr min lq mean median uq max neval
tabulate(x, 10) 943.555 1089.5200 1614.804 1333.806 2042.320 3986.836 100
tabulate1(x, 10) 4523.076 4787.3745 5258.490 4929.586 5624.098 7233.029 100
tabulate2(x, 10) 765.102 931.9935 1361.747 1113.550 1679.024 3436.356 100
tabulate3(x, 10) 773.358 914.4980 1350.164 1140.018 1642.354 3633.429 100
tabulate4(x, 10) 4241.025 4466.8735 4933.672 4717.016 5148.842 8603.838 100
The tabulate4 function which uses an iterator also slower than tabulate. We can improve it:
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector tabulate4(const IntegerVector& x, const unsigned max) {
IntegerVector counts(max);
auto start = x.begin();
auto end = x.end();
for (auto it = start; it != end; it++) {
if (*(it) > 0 && *(it) <= max)
counts[*(it) - 1]++;
}
return counts;
}

sorting columns of Rcpp NumericMatrix for median calculations

I've been testing Rcpp and RcppArmadillo for calculating summary stats on big matrices. This was a lot faster (5 or 10 times faster) than the base R colMeans or the the Armadillo on ~4million rows, 45 columns.
colMeansRcpp <- cxxfunction(signature(X_="integer"),
plugin='Rcpp',
body='
Rcpp::IntegerMatrix X = X_;
int ncol = X.ncol(); int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for(int col = 0; col < ncol; col++){
out[col]=Rcpp::sum(X(_, col));
}
return wrap(out/nrow);
')
I really want to calculate the median and maybe other quantiles for plotting - and because it requires a sort its even more needy of C++ outsourcing. The armadillo seems a bit slow so I wanted to do an in place sort on code similar to above but I just cant get the syntax right... here is what I'm trying..
# OK I'm aware this floor(nrow/2) is not **absolutely** correct
# I'm simplifying here
colMedianRcpp <- cxxfunction(signature(X_="integer"),
plugin='Rcpp',
body='
Rcpp::IntegerMatrix X = clone(X_);
int ncol = X.ncol(); int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for(int col = 0; col < ncol; col++){
X(_,col)= std::sort((X_,col).begin, (X_,col).end));
out[col]=X(floor(nrow/2), col));
}
return wrap(out);
')
Basically it's the line
X(_,col)= std::sort((X_,col).begin, (X_,col).end));
I don't know how to express "sort a column in place" with this mixture of Rcpp sugar and std C++. Sorry I can see what I'm doing is wrong but a hint on the right syntax would be lovely.
ps Am I right I need to do this clone() so I don't change the R object?
EDIT
I add the RcppArmadillo code and a benchmark comparison to address the answer/comment below. the benchmark was only on 50k rows for a quick reply but I recall it was similar with many more. I realise you are the Rcpp author.. so many thanks for your time!
The thought occurs that perhaps I'm doing something daft with the RcppArmadillo code to make it run so much slower than the base colMeans or Rcpp version?
colMeansRcppArmadillo <- cxxfunction(signature(X_="integer"),
plugin="RcppArmadillo",
body='
arma::mat X = Rcpp::as<arma::mat > (X_);
arma::rowvec MD= arma::mean(X, 0);
return wrap(MD);
')
And the benchmark is ...
(mb = microbenchmark(
+ colMeans(fqSmallMatrix),
+ colMeansRcpp(fqSmallMatrix),
+ colMeansRcppArmadillo(fqSmallMatrix),
+ times=50))
Unit: milliseconds
expr min lq median uq max neval
colMeans(fqSmallMatrix) 10.620919 10.63289 10.640819 10.648882 10.907145 50
colMeansRcpp(fqSmallMatrix) 2.649038 2.66832 2.676709 2.700839 2.841012 50
colMeansRcppArmadillo(fqSmallMatrix) 25.687067 26.23488 33.168589 33.792489 113.832495 50
You can copy the column into a new vector with
NumericVector y = x(_,j);
Complete example:
library(Rcpp)
cppFunction('
NumericVector colMedianRcpp(NumericMatrix x) {
int nrow = x.nrow();
int ncol = x.ncol();
int position = nrow / 2; // Euclidian division
NumericVector out(ncol);
for (int j = 0; j < ncol; j++) {
NumericVector y = x(_,j); // Copy the column -- the original will not be modified
std::nth_element(y.begin(), y.begin() + position, y.end());
out[j] = y[position];
}
return out;
}
')
x <- matrix( sample(1:12), 3, 4 )
x
colMedianRcpp(x)
x # Unchanged
You are not actually showing RcppArmadillo code -- I have been quite happy with the performance of RcppArmadillo code where I needed row/col column subsetting.
You can instantiate Armadillo matrices via Rcpp just about as efficiently (no copy, re-using R object memory) so I would try that.
And you: you want clone() for a distinct copy, and I think you'd get that for free if you use the default RcppArmadillo ctor (rather than the more efficient two-step).
Edit a few hours later
You had left an open question about why your Armadillo was slow. In the meantime, Vincent solved the issue for you but here is a revisited, cleaner solution using your code as well as Vincent's.
Now how it instantiates the Armadillo matrix without copy -- so it is faster. And it also avoids mixing integer and numeric matrices. The code first:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
NumericVector colMedianRcpp(NumericMatrix x) {
int nrow = x.nrow();
int ncol = x.ncol();
int position = nrow / 2; // Euclidian division
NumericVector out(ncol);
for (int j = 0; j < ncol; j++) {
NumericVector y = x(_,j); // Copy column -- original will not be mod
std::nth_element(y.begin(), y.begin() + position, y.end());
out[j] = y[position];
}
return out;
}
// [[Rcpp::export]]
arma::rowvec colMeansRcppArmadillo(NumericMatrix x){
arma::mat X = arma::mat(x.begin(), x.nrow(), x.ncol(), false);
return arma::mean(X, 0);
}
// [[Rcpp::export]]
NumericVector colMeansRcpp(NumericMatrix X) {
int ncol = X.ncol();
int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for (int col = 0; col < ncol; col++){
out[col]=Rcpp::sum(X(_, col));
}
return wrap(out/nrow);
}
/*** R
set.seed(42)
X <- matrix(rnorm(100*10), 100, 10)
library(microbenchmark)
mb <- microbenchmark(colMeans(X), colMeansRcpp(X), colMeansRcppArmadillo(X),
colMedianRcpp(X), times=50)
print(mb)
*/
And here is the result on my machine, with the concise Armadillo version about as fast as yours, and median a little slower as it has to do more work:
R> sourceCpp("/tmp/stephen.cpp")
R> set.seed(42)
R> X <- matrix(rnorm(100*10), 100, 10)
R> library(microbenchmark)
R> mb <- microbenchmark(colMeans(X), colMeansRcpp(X), colMeansRcppArmadillo(X),
+ colMedianRcpp(X), times=50)
R> print(mb)
Unit: microseconds
expr min lq median uq max neval
colMeans(X) 9.469 10.422 11.5810 12.421 30.597 50
colMeansRcpp(X) 3.922 4.281 4.5245 5.306 18.020 50
colMeansRcppArmadillo(X) 4.196 4.549 4.9295 5.927 11.159 50
colMedianRcpp(X) 15.615 16.291 16.7290 17.971 27.026 50
R>

Resources