In Rcpp I want to find the maximum of a vector, but I want to omit one element.
I have working code, but I'm sure my approach is quite bad as it involves the full copy of the vector. Is there a much better way to accomplish what I want?
In R:
vec <- 1:10
ele <- 3
max(vec[-ele])
My (terrible) version in Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double my_fun(NumericVector vec, int ele) {
NumericVector vec_no_ele = clone(vec);
vec_no_ele.erase(ele - 1); // ele is 1-based as in R; erase() takes a 0-based index
return max(vec_no_ele);
}
Under the hood, max is implemented as a humble for loop. You shouldn't shy away from for loops in C++, since they carry much less overhead than in R. In this case, a plain for loop does significantly better than the built-in max:
// Coatless answer:
// [[Rcpp::export]]
double max_no_copy(NumericVector vec, int ele) {
double temp = vec[ele-1];
vec[ele-1] = vec[0];
double result_max = max(vec);
vec[ele-1] = temp;
return result_max;
}
// humble for loop
// [[Rcpp::export]]
double max_except_for(NumericVector vec, int ele) {
int vs = vec.size();
double res = R_NegInf; // start below every possible value so all-negative input works
for(int i=0; i<vs; i++) {
if( i == ele-1 ) continue;
if(vec[i] > res) res = vec[i];
}
return res;
}
R side:
x <- rnorm(1e8)
x[1000] <- 1e9
microbenchmark(max_except_for(x, 1000), max_no_copy(x, 1000), times=5)
Unit: milliseconds
                    expr       min        lq     mean    median       uq       max neval cld
 max_except_for(x, 1000)  87.58906  93.56962  92.5092  93.59754  93.6262  94.16361     5  a
    max_no_copy(x, 1000) 284.46662 292.57627 296.3772 296.78390 300.5345 307.52455     5   b
identical(max_except_for(x, 1000), max_no_copy(x, 1000)) # TRUE
@Spacemen suggested the following approach in the comments:
Save the value you want to omit in a temp variable. Set that element to zero or some small value or the same as another value in the vector. Compute the max. Reset the element from the temp variable.
This would be implemented like so:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double max_no_copy(NumericVector vec, int ele) {
double temp = vec[ele];
// Use a value already in vector
vec[ele] = vec[0];
// Find max value
double result_max = max(vec);
// Restore the original value
vec[ele] = temp;
return result_max;
}
Test:
vec <- 1:10
ele <- 2 # C++ indices start at 0 not 1. So, subtract.
max_no_copy(vec, ele)
# [1] 10
Benchmark to be added later...
The answer from @coatless is great and builds on the other generous commenters above. However, it can be generalized further. @coatless uses the value in vec[0] as the placeholder for the element to be omitted, but this fails when the element to be omitted is element 0!
Here's a slightly more general solution: I use the element adjacent to ele as the placeholder, and I check that ele is a valid index into vec and that vec.length() is greater than 1:
// calculate the max of a vector after omitting one element
double max_except(NumericVector vec, int ele) {
if (vec.length() < 2) stop("vec too short");
if (ele < 0 || ele > vec.length()-1) stop("ele out of range");
double temp = vec[ele];
int idx = (ele > 0) ? ele-1 : ele+1;
vec[ele] = vec[idx];
double res = max(vec);
vec[ele] = temp;
return res;
}
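For comparison, here is a minimal standalone sketch of the skip-one-element loop, which needs no placeholder and therefore never writes to the input. It uses std::vector instead of NumericVector so that it compiles without R. The name max_skipping and the zero-based skip argument are choices made for this illustration and are not part of any answer above.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Maximum of v with the element at zero-based position `skip` left out.
// Hypothetical helper for illustration; not an Rcpp API.
double max_skipping(const std::vector<double>& v, std::size_t skip) {
    if (v.size() < 2) throw std::invalid_argument("need at least two elements");
    if (skip >= v.size()) throw std::out_of_range("skip out of range");
    // Start from -infinity so that all-negative input is handled correctly.
    double best = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i == skip) continue;  // omit the chosen element
        if (v[i] > best) best = v[i];
    }
    return best;
}
```

Because the input is taken by const reference, the compiler rejects any accidental write to it, which is the property the placeholder approach has to restore by hand.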
Usually I have to work with big spatial data, and high speed and memory efficiency are expected.
Supposing I want to modify some numeric columns of a data frame with a self-defined function in Rcpp, I am confused about the reference and copy mechanisms of C++ and Rcpp. With the three minimal code examples below, could you please help me clarify the following questions:
Is updateDF3 the best function to do such a task with the highest speed and lowest memory required?
This function is modified from a similar question here, but I do not understand the warning given by the author: "There are issues associated with this approach. Your original data frame and the one you created share the same vectors and so bad things can happen." If I only use this function as a helper, like updateDF3, called from R, is it safe?
Why is the difference of performance of updateDF1 and updateDF2 not significant?
What is the difference between passing the parameter with or without reference (&)?
Is the function coded poorly, and is there a better way to write things such as DataFrame out=clone(df) or tmpstr=as<std::string>(colnames[v])?
Thanks in advance.
#include <Rcpp.h>
#include <iostream>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
bool contains(CharacterVector x, std::string y) {
return std::find(x.begin(), x.end(), y)!=x.end();
}
// [[Rcpp::export]]
DataFrame updateDF1(DataFrame df, Nullable<Rcpp::CharacterVector> vars=R_NilValue) {
DataFrame out=clone(df);
string tmpstr;
NumericVector tmpv;
if(vars.isNotNull()){
CharacterVector selvars(vars);
for(int v=0;v<selvars.size();v++){
tmpstr=as<std::string>(selvars[v]);
tmpv=df[tmpstr];
tmpv=tmpv+1.0;
out[tmpstr]=tmpv;
}
}
return out;
}
// [[Rcpp::export]]
DataFrame updateDF2(DataFrame& df, Nullable<Rcpp::CharacterVector> vars=R_NilValue) {
DataFrame out=clone(df);
string tmpstr;
NumericVector tmpv;
if(vars.isNotNull()){
CharacterVector selvars(vars);
for(int v=0;v<selvars.size();v++){
tmpstr=as<std::string>(selvars[v]);
tmpv=df[tmpstr];
tmpv=tmpv+1.0;
out[tmpstr]=tmpv;
}
}
return out;
}
// [[Rcpp::export]]
List updateDF3(DataFrame& df, Nullable<Rcpp::CharacterVector> vars=R_NilValue) {
List out(df.size());
CharacterVector colnames=df.attr("names");
string tmpstr;
NumericVector tmpv;
for(int v=0;v<df.size();v++){
tmpstr=as<std::string>(colnames[v]); // set the column name before any branch uses it
if(vars.isNotNull()){
CharacterVector selvars(vars);
if(contains(selvars,tmpstr)){
tmpv=df[tmpstr];
tmpv=tmpv+1.0;
out[v]=tmpv;
}else{
out[v]=df[tmpstr];
}
}else{
out[v]=df[tmpstr];
}
}
out.attr("class") = df.attr("class") ;
out.attr("row.names") = df.attr("row.names") ;
out.attr("names") = df.attr("names") ;
return out;
}
/*** R
df=as.data.frame(matrix(1:120000000,nrow=10000000))
names(df)=paste("band",1:ncol(df),sep="_")
df=cbind(x="charcol",df)
microbenchmark::microbenchmark(
x1<<-updateDF1(df,vars=names(df)[-1]),
x2<<-updateDF2(df,vars=names(df)[-1]),
x3<<-updateDF3(df,vars=names(df)[-1]),
times=10
)
identical(x1,x2)
identical(x1,x3)
*/
##performance
#Unit: milliseconds
# expr min lq mean median
# x1 <<- updateDF1(df, vars = names(df)[-1]) 587.6023 604.9242 711.8981 651.1242
# x2 <<- updateDF2(df, vars = names(df)[-1]) 581.7129 641.2876 882.9999 766.9354
# x3 <<- updateDF3(df, vars = names(df)[-1]) 406.1824 417.5892 542.2559 420.8485
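Following the suggestion of @Roland, the fastest way is to modify the data frame by reference, adapting updateDF2 so that it no longer clones its input (note that this changes df itself rather than a copy). The code is as below: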
// [[Rcpp::export]]
DataFrame updateDF(DataFrame& df, Nullable<Rcpp::CharacterVector> vars=R_NilValue) {
string tmpstr;
NumericVector tmpv;
if(vars.isNotNull()){
CharacterVector selvars(vars);
for(int v=0;v<selvars.size();v++){
tmpstr=selvars[v];
tmpv=df[tmpstr];
tmpv=tmpv+1.0;
df[tmpstr]=tmpv;
}
}
return df;
}
with the performance of:
Unit: milliseconds
                                       expr      min       lq     mean   median
 x1 <<- updateDF1(df, vars = names(df)[-1]) 573.8246 728.4211 990.8680 951.3108
 x2 <<- updateDF2(df, vars = names(df)[-1]) 595.7339 694.0645 935.4226 941.7450
 x3 <<- updateDF3(df, vars = names(df)[-1]) 197.7855 206.4767 377.4378 225.0290
  x4 <<- updateDF(df, vars = names(df)[-1]) 148.5119 149.7321 247.1329 152.3744
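On the question of why updateDF1 and updateDF2 perform alike: Rcpp objects such as DataFrame are thin wrappers around a pointer to the underlying R object, so passing one by value copies the pointer, not the columns. The expensive step in both functions is clone(). The standalone sketch below imitates this with a hypothetical Handle type built on std::shared_ptr. It is an analogy only and does not show how Rcpp is implemented internally.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a small handle that points at
// shared storage. Copying the handle does not copy the data.
struct Handle {
    std::shared_ptr<std::vector<double>> data;
    explicit Handle(std::vector<double> v)
        : data(std::make_shared<std::vector<double>>(std::move(v))) {}
};

// Deep copy, playing the role of clone(): new storage, same values.
Handle deep_copy(const Handle& h) {
    return Handle(*h.data);
}

// Pass by value: the handle is copied but the storage is still shared,
// so the caller sees the change.
void add_one_by_value(Handle h) {
    for (double& x : *h.data) x += 1.0;
}

// Pass by reference: no handle copy at all, same visible effect.
void add_one_by_ref(Handle& h) {
    for (double& x : *h.data) x += 1.0;
}
```

In this model the only operation whose cost grows with the data is deep_copy, which matches the observation that dropping clone() is what makes updateDF faster.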
What is a faster way to get the nonnegative component of a double vector? That is,
pmax(x, 0)
My attempt is using Rcpp:
//' @title Parallel maximum
//' @description A faster \code{pmax()}.
//'
//' @name pmaxC
//' @param x A numeric vector.
//' @param a A single numeric value.
//' @return The parallel maximum of the input values.
//' @note This function will always be faster than \code{pmax(x, a)} when \code{a} is a single value, but can be slower than \code{pmax.int(x, a)} when \code{x} is short. Use this function when comparing a numeric vector with a single value.
//' @export pmaxC
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector pmaxC(NumericVector x, double a) {
int n = x.length();
NumericVector out(n);
for (int i = 0; i < n; ++i) {
double xi = x[i];
if (xi < a) {
out[i] = a;
} else {
out[i] = xi;
}
}
return out;
}
This is a modest improvement:
set.seed(5)
x <- rnorm(1e6)
microbenchmark(pmax(x, 0), pmaxC(x, 0))
Unit: milliseconds
        expr      min       lq     mean    median        uq      max neval cld
  pmax(x, 0) 8.500419 8.621341 11.09672 10.132045 10.791020 58.44972   100   a
 pmaxC(x, 0) 5.624480 5.709262  8.83968  7.598093  7.907853 53.91339   100   a
Neither is unacceptably slow, but given that this is a common scenario, I was wondering whether a package had developed a faster approach.
The operation you are performing is fairly simple, so I'm not sure there is much room for improvement with regard to your algorithm above. However, if you really need to squeeze out extra performance, this seems like a good candidate for parallelization. Here is a possible implementation using RcppParallel:
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <Rcpp.h>
struct Pmax : public RcppParallel::Worker {
struct Apply {
double mx;
Apply(double mx_)
: mx(mx_)
{}
double operator()(const double x) const
{
return x > mx ? x : mx;
}
};
const RcppParallel::RVector<double> input;
RcppParallel::RVector<double> output;
Apply f;
Pmax(const Rcpp::NumericVector input_,
Rcpp::NumericVector output_,
double mx_)
: input(input_), output(output_), f(mx_)
{}
void operator()(std::size_t begin, std::size_t end)
{
std::transform(
input.begin() + begin,
input.begin() + end,
output.begin() + begin,
f
);
}
};
// [[Rcpp::export]]
Rcpp::NumericVector par_pmax(Rcpp::NumericVector x, double y)
{
Rcpp::NumericVector res = Rcpp::no_init_vector(x.size());
Pmax p(x, res, y);
RcppParallel::parallelFor(0, x.size(), p);
return res;
}
Testing this with your example data, I get a reasonable improvement:
set.seed(5)
x <- rnorm(1e6)
all.equal(pmax(x, 0), par_pmax(x, 0))
#[1] TRUE
microbenchmark::microbenchmark(
pmax(x, 0),
pmaxC(x, 0),
par_pmax(x, 0),
times = 500L
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# pmax(x, 0) 11.843528 12.193126 14.972588 13.030448 16.799250 102.09895 500
# pmaxC(x, 0) 7.804883 8.036879 10.462070 8.772635 12.407587 69.08290 500
# par_pmax(x, 0) 2.244691 2.443971 4.552169 2.624008 6.359027 65.99233 500
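The worker above never sees the whole vector at once: parallelFor calls operator() with half-open [begin, end) ranges, and because each output element depends only on the matching input element, the ranges can be processed in any order. The sketch below shows that idea in plain C++ by running the chunks one after another. The chunk size and the function names are assumptions for illustration, and a real scheduler such as RcppParallel would hand the chunks to different threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Process one half-open range [begin, end), like Worker::operator().
void pmax_range(const std::vector<double>& in, std::vector<double>& out,
                double a, std::size_t begin, std::size_t end) {
    std::transform(in.begin() + begin, in.begin() + end, out.begin() + begin,
                   [a](double x) { return x > a ? x : a; });
}

// Split the input into fixed-size chunks and process each independently.
// Here the chunks run sequentially; they share no state, so they could
// equally be dispatched to separate threads.
std::vector<double> pmax_chunked(const std::vector<double>& in, double a,
                                 std::size_t chunk = 4) {
    std::vector<double> out(in.size());
    if (chunk == 0) chunk = 1;  // guard against a zero step
    for (std::size_t begin = 0; begin < in.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, in.size());
        pmax_range(in, out, a, begin, end);
    }
    return out;
}
```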
I have a very long vector of parameters (approximately 4^10 elements) and a vector of indices. My aim is to add together all of the values of the parameters that are indexed in the indices vector.
For instance, if I had paras = [1,2,3,4,5,5,5] and indices = [3,3,1,6] then I would want to find the cumulative sum of the third value (3) twice, the first value (1) and the sixth (5), to get 12. There is additionally the option of warping the parameter values according to their location.
I am trying to speed up an R implementation, as I am calling it millions of times.
My current code always returns NA, and I can't see where it is going wrong.
Here's the Rcpp function:
double dot_prod_c(NumericVector indices, NumericVector paras,
NumericVector warp = NA_REAL) {
int len = indices.size();
LogicalVector indices_ok;
for (int i = 0; i < len; i++){
indices_ok.push_back(R_IsNA(indices[i]));
}
if(is_true(any(indices_ok))){
return NA_REAL;
}
double counter = 0;
if(NumericVector::is_na(warp[1])){
for (int i = 0; i < len; i++){
counter += paras[indices[i]];
}
} else {
for (int i = 0; i < len; i++){
counter += paras[indices[i]] * warp[i];
}
}
return counter;
}
And here is the working R version:
dot_prod <- function(indices, paras, warp = NA){
if(is.na(warp[1])){
return(sum(sapply(indices, function(ind) paras[ind + 1])))
} else {
return(sum(sapply(1:length(indices), function(i){
ind <- indices[i]
paras[ind + 1] * warp[i]
})))
}
}
Here is some code for testing, and benchmarking using the microbenchmark package:
# testing
library(Rcpp)
library(microbenchmark)
parameters <- list()
indices <- list()
indices_trad <- list()
set.seed(2)
for (i in 4:12){
size <- 4^i
window_size <- 100
parameters[[i-3]] <- runif(size)
indices[[i-3]] <- floor(runif(window_size)*size)
temp <- rep(0, size)
for (j in 1:window_size){
temp[indices[[i-3]][j] + 1] <- temp[indices[[i-3]][j] + 1] + 1
}
indices_trad[[i-3]] <- temp
}
microbenchmark(
x <- sapply(1:9, function(i) dot_prod(indices[[i]], parameters[[i]])),
x_c <- sapply(1:9, function(i) dot_prod_c(indices[[i]], parameters[[i]])),
x_base <- sapply(1:9, function(i) indices_trad[[i]] %*% parameters[[i]])
)
all.equal(x, x_base) # is true, does work
all.equal(x_c, x_base) # not true - C++ version returns only NAs
I was having a little trouble trying to interpret your overall goal through your code, so I'm just going to go with this explanation
For instance, if I had paras = [1,2,3,4,5,5,5] and indices = [3,3,1,6]
then I would want to find the cumulative sum of the third value (3)
twice, the first value (1) and the sixth (5), to get 12. There is
additionally the option of warping the parameter values according to
their location.
since it was most clear to me.
There are some issues with your C++ code. To start, instead of doing this - NumericVector warp = NA_REAL - use the Rcpp::Nullable<> template (shown below). This will solve a few problems:
It's more readable. If you're not familiar with the Nullable class, it's pretty much exactly what it sounds like - an object that may or may not be null.
You won't have to make any awkward initializations, such as NumericVector warp = NA_REAL. Frankly I was surprised that the compiler accepted this.
You won't have to worry about accidentally forgetting that C++ uses zero-based indexing, unlike R, as in this line: if(NumericVector::is_na(warp[1])){. That has undefined behavior written all over it.
Here's a revised version, going off of your quoted description of the problem above:
#include <Rcpp.h>
typedef Rcpp::Nullable<Rcpp::NumericVector> nullable_t;
// [[Rcpp::export]]
double DotProd(Rcpp::NumericVector indices, Rcpp::NumericVector params, nullable_t warp_ = R_NilValue) {
R_xlen_t i = 0, n = indices.size();
double result = 0.0;
if (warp_.isNull()) {
for ( ; i < n; i++) {
result += params[indices[i]];
}
} else {
Rcpp::NumericVector warp(warp_);
for ( ; i < n; i++) {
result += params[indices[i]] * warp[i];
}
}
return result;
}
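For readers without an R session at hand, the same logic can be sketched in standalone C++. A null pointer plays the role that Nullable<> plays above, and an explicit range check is added because an out-of-range index would otherwise be undefined behavior. The function name dot_prod_idx and the use of std::vector are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sum params[indices[i]] (zero-based), optionally weighted by warp[i].
// `warp == nullptr` means "no warping", mirroring a NULL argument in R.
double dot_prod_idx(const std::vector<std::size_t>& indices,
                    const std::vector<double>& params,
                    const std::vector<double>* warp = nullptr) {
    if (warp && warp->size() != indices.size())
        throw std::invalid_argument("warp must match indices in length");
    double result = 0.0;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        double value = params.at(indices[i]);  // throws if out of range
        result += warp ? value * (*warp)[i] : value;
    }
    return result;
}
```

With indices {3, 3, 1, 6} and params {1, 2, 3, 4, 5, 5, 5} the zero-based lookups are 4, 4, 2 and 5, so the unweighted sum is 15.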
You had some elaborate code to generate sample data. I didn't take the time to go through this because it wasn't necessary, nor was the benchmarking. You stated yourself that the C++ version wasn't producing the correct results. Your first priority should be to get your code working on simple data. Then feed it some more complex data. Then benchmark. The revised version above works on simple data:
args <- list(
indices = c(3, 3, 1, 6),
params = c(1, 2, 3, 4, 5, 5, 5),
warp = c(.25, .75, 1.25, 1.75)
)
all.equal(
DotProd(args[[1]], args[[2]]),
dot_prod(args[[1]], args[[2]]))
#[1] TRUE
all.equal(
DotProd(args[[1]], args[[2]], args[[3]]),
dot_prod(args[[1]], args[[2]], args[[3]]))
#[1] TRUE
It's also faster than the R version on this sample data. I have no reason to believe it wouldn't be for larger, more complex data either - there's nothing magical or particularly efficient about the *apply functions; they are just more idiomatic / readable R.
microbenchmark::microbenchmark(
"Rcpp" = DotProd(args[[1]], args[[2]]),
"R" = dot_prod(args[[1]], args[[2]]))
#Unit: microseconds
#expr min lq mean median uq max neval
#Rcpp 2.463 2.8815 3.52907 3.3265 3.8445 18.823 100
#R 18.869 20.0285 21.60490 20.4400 21.0745 66.531 100
#
microbenchmark::microbenchmark(
"Rcpp" = DotProd(args[[1]], args[[2]], args[[3]]),
"R" = dot_prod(args[[1]], args[[2]], args[[3]]))
#Unit: microseconds
#expr min lq mean median uq max neval
#Rcpp 2.680 3.0430 3.84796 3.701 4.1360 12.304 100
#R 21.587 22.6855 23.79194 23.342 23.8565 68.473 100
I omitted the NA checks from the example above, but that too can be revised into something more idiomatic by using a little Rcpp sugar. Previously, you were doing this:
LogicalVector indices_ok;
for (int i = 0; i < len; i++){
indices_ok.push_back(R_IsNA(indices[i]));
}
if(is_true(any(indices_ok))){
return NA_REAL;
}
It's a little aggressive - you are testing a whole vector of values (with R_IsNA), and then applying is_true(any(indices_ok)) - when you could just break prematurely and return NA_REAL on the first instance of R_IsNA(indices[i]) resulting in true. Also, the use of push_back will slow down your function quite a bit - you would have been better off initializing indices_ok to the known size and filling it by index access in your loop. Nevertheless, here's one way to condense the operation:
if (Rcpp::na_omit(indices).size() != indices.size()) return NA_REAL;
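The early-exit idea mentioned above can be sketched in standalone C++ with std::any_of, which stops at the first match. Note the assumption: plain C++ has no notion of R's NA, so this sketch treats any NaN as missing. R's NA_real_ is itself a NaN with a special payload, so std::isnan catches it, but it cannot tell NA and NaN apart. The name any_missing is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// True if any element is NaN; std::any_of returns as soon as one is found,
// so no intermediate vector of flags is built.
bool any_missing(const std::vector<double>& v) {
    return std::any_of(v.begin(), v.end(),
                       [](double x) { return std::isnan(x); });
}
```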
For completeness, here's a fully sugar-ized version which allows you to avoid loops entirely:
#include <Rcpp.h>
typedef Rcpp::Nullable<Rcpp::NumericVector> nullable_t;
// [[Rcpp::export]]
double DotProd3(Rcpp::NumericVector indices, Rcpp::NumericVector params, nullable_t warp_ = R_NilValue) {
if (Rcpp::na_omit(indices).size() != indices.size()) return NA_REAL;
if (warp_.isNull()) {
Rcpp::NumericVector tmp = params[indices];
return Rcpp::sum(tmp);
} else {
Rcpp::NumericVector warp(warp_), tmp = params[indices];
return Rcpp::sum(tmp * warp);
}
}
/*** R
all.equal(
DotProd3(args[[1]], args[[2]]),
dot_prod(args[[1]], args[[2]]))
#[1] TRUE
all.equal(
DotProd3(args[[1]], args[[2]], args[[3]]),
dot_prod(args[[1]], args[[2]], args[[3]]))
#[1] TRUE
*/
I wanted to check whether there is any pre-existing trick for na.locf (from the zoo package), rle and inverse.rle in Rcpp.
I wrote a loop to implement them; for example, I implemented na.locf(x, na.rm=FALSE, fromLast=FALSE) as follows:
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
NumericVector naLocf(NumericVector x) {
int n=x.size();
for (int i=1;i<n;i++) {
if (R_IsNA(x[i]) & !R_IsNA(x[i-1])) {
x[i]=x[i-1];
}
}
return x;
}
I was just wondering whether, since these are quite basic functions, someone might have already implemented them in Rcpp in a better way (maybe avoiding the loop) or a faster way?
The only thing I'd say is that you are testing for NA twice for each value when you only need to do it once. Testing for NA is not a free operation. Perhaps something like this:
//[[Rcpp::export]]
NumericVector naLocf2(NumericVector x) {
int n = x.size() ;
double v = x[0] ;
for( int i=1; i<n; i++){
if( NumericVector::is_na(x[i]) ) {
x[i] = v ;
} else {
v = x[i] ;
}
}
return x;
}
This still does unnecessary work, however, such as updating v on every non-NA value when it only needs to be set at the last non-NA value before a run of NAs. We can try something like this:
//[[Rcpp::export]]
NumericVector naLocf3(NumericVector x) {
double *p=x.begin(), *end = x.end() ;
double v = *p ; p++ ;
while( p < end ){
while( p<end && !NumericVector::is_na(*p) ) p++ ;
v = *(p-1) ;
while( p<end && NumericVector::is_na(*p) ) {
*p = v ;
p++ ;
}
}
return x;
}
Now, we can try some benchmarks:
x <- rnorm(1e6)
x[sample(1:1e6, 1000)] <- NA
require(microbenchmark)
microbenchmark( naLocf1(x), naLocf2(x), naLocf3(x) )
# Unit: milliseconds
# expr min lq median uq max neval
# naLocf1(x) 6.296135 6.323142 6.339132 6.354798 6.749864 100
# naLocf2(x) 4.097829 4.123418 4.139589 4.151527 4.266292 100
# naLocf3(x) 3.467858 3.486582 3.507802 3.521673 3.569041 100
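One thing to keep in mind with the versions above is that a NumericVector created from an R double vector shares its memory, so they fill x in place. A minimal standalone sketch of the same carry-forward logic that works on a copy instead is shown below. It uses std::vector so that it compiles without R, treats NaN as the missing marker (an assumption, since plain C++ has no NA), and the name locf is chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Last observation carried forward on a copy of the input.
// NaN marks a missing value; leading NaNs are left as they are.
std::vector<double> locf(std::vector<double> x) {  // by value: works on a copy
    bool have_last = false;
    double last = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i])) {
            if (have_last) x[i] = last;  // fill from the last seen value
        } else {
            last = x[i];
            have_last = true;
        }
    }
    return x;
}
```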
I've been testing Rcpp and RcppArmadillo for calculating summary stats on big matrices. The Rcpp version below was a lot faster (5 to 10 times) than base R colMeans or the Armadillo version on ~4 million rows and 45 columns.
colMeansRcpp <- cxxfunction(signature(X_="integer"),
plugin='Rcpp',
body='
Rcpp::IntegerMatrix X = X_;
int ncol = X.ncol(); int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for(int col = 0; col < ncol; col++){
out[col]=Rcpp::sum(X(_, col));
}
return wrap(out/nrow);
')
I really want to calculate the median and maybe other quantiles for plotting, and because that requires a sort it is even more in need of C++ outsourcing. The Armadillo version seems a bit slow, so I wanted to do an in-place sort in code similar to the above, but I just can't get the syntax right. Here is what I'm trying:
# OK I'm aware this floor(nrow/2) is not **absolutely** correct
# I'm simplifying here
colMedianRcpp <- cxxfunction(signature(X_="integer"),
plugin='Rcpp',
body='
Rcpp::IntegerMatrix X = clone(X_);
int ncol = X.ncol(); int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for(int col = 0; col < ncol; col++){
X(_,col)= std::sort((X_,col).begin, (X_,col).end));
out[col]=X(floor(nrow/2), col));
}
return wrap(out);
')
Basically it's the line
X(_,col)= std::sort((X_,col).begin, (X_,col).end));
I don't know how to express "sort a column in place" with this mixture of Rcpp sugar and std C++. Sorry, I can see that what I'm doing is wrong, but a hint on the right syntax would be lovely.
PS: Am I right that I need this clone() so I don't change the R object?
EDIT
I have added the RcppArmadillo code and a benchmark comparison to address the answer/comment below. The benchmark was only on 50k rows for a quick reply, but I recall it was similar with many more. I realise you are the Rcpp author, so many thanks for your time!
The thought occurs that perhaps I'm doing something daft with the RcppArmadillo code to make it run so much slower than the base colMeans or Rcpp version?
colMeansRcppArmadillo <- cxxfunction(signature(X_="integer"),
plugin="RcppArmadillo",
body='
arma::mat X = Rcpp::as<arma::mat > (X_);
arma::rowvec MD= arma::mean(X, 0);
return wrap(MD);
')
And the benchmark is ...
(mb = microbenchmark(
+ colMeans(fqSmallMatrix),
+ colMeansRcpp(fqSmallMatrix),
+ colMeansRcppArmadillo(fqSmallMatrix),
+ times=50))
Unit: milliseconds
                                 expr       min       lq    median        uq        max neval
              colMeans(fqSmallMatrix) 10.620919 10.63289 10.640819 10.648882  10.907145    50
          colMeansRcpp(fqSmallMatrix)  2.649038  2.66832  2.676709  2.700839   2.841012    50
 colMeansRcppArmadillo(fqSmallMatrix) 25.687067 26.23488 33.168589 33.792489 113.832495    50
You can copy the column into a new vector with
NumericVector y = x(_,j);
Complete example:
library(Rcpp)
cppFunction('
NumericVector colMedianRcpp(NumericMatrix x) {
int nrow = x.nrow();
int ncol = x.ncol();
int position = nrow / 2; // Euclidean (integer) division
NumericVector out(ncol);
for (int j = 0; j < ncol; j++) {
NumericVector y = x(_,j); // Copy the column -- the original will not be modified
std::nth_element(y.begin(), y.begin() + position, y.end());
out[j] = y[position];
}
return out;
}
')
x <- matrix( sample(1:12), 3, 4 )
x
colMedianRcpp(x)
x # Unchanged
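The question noted that taking element floor(nrow/2) is not quite the median when the number of rows is even. A minimal standalone sketch of one way to handle both cases with std::nth_element follows. The name column_median is hypothetical, and the input is taken by value so that the caller's data is left untouched, as in the answer above.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Median via partial sorting. For even sizes, average the two middle values.
double column_median(std::vector<double> y) {  // by value: reorders a copy
    if (y.empty()) throw std::invalid_argument("empty input");
    const std::size_t mid = y.size() / 2;
    std::nth_element(y.begin(), y.begin() + mid, y.end());
    const double upper = y[mid];
    if (y.size() % 2 == 1) return upper;
    // After nth_element, everything before mid is <= y[mid], so the lower
    // middle value is the largest element of that first half.
    const double lower = *std::max_element(y.begin(), y.begin() + mid);
    return (lower + upper) / 2.0;
}
```

For {4, 1, 3, 2} the two middle values are 2 and 3, so the result is 2.5, whereas picking element nrow / 2 alone would give 3.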
You are not actually showing RcppArmadillo code -- I have been quite happy with the performance of RcppArmadillo code where I needed row/column subsetting.
You can instantiate Armadillo matrices via Rcpp just about as efficiently (no copy, re-using R object memory) so I would try that.
And yes: you want clone() for a distinct copy, and I think you'd get that for free if you use the default RcppArmadillo ctor (rather than the more efficient two-step).
Edit a few hours later
You had left an open question about why your Armadillo was slow. In the meantime, Vincent solved the issue for you but here is a revisited, cleaner solution using your code as well as Vincent's.
Note how it instantiates the Armadillo matrix without a copy, so it is faster. It also avoids mixing integer and numeric matrices. The code first:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
NumericVector colMedianRcpp(NumericMatrix x) {
int nrow = x.nrow();
int ncol = x.ncol();
int position = nrow / 2; // Euclidean (integer) division
NumericVector out(ncol);
for (int j = 0; j < ncol; j++) {
NumericVector y = x(_,j); // Copy column -- original will not be modified
std::nth_element(y.begin(), y.begin() + position, y.end());
out[j] = y[position];
}
return out;
}
// [[Rcpp::export]]
arma::rowvec colMeansRcppArmadillo(NumericMatrix x){
arma::mat X = arma::mat(x.begin(), x.nrow(), x.ncol(), false);
return arma::mean(X, 0);
}
// [[Rcpp::export]]
NumericVector colMeansRcpp(NumericMatrix X) {
int ncol = X.ncol();
int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for (int col = 0; col < ncol; col++){
out[col]=Rcpp::sum(X(_, col));
}
return wrap(out/nrow);
}
/*** R
set.seed(42)
X <- matrix(rnorm(100*10), 100, 10)
library(microbenchmark)
mb <- microbenchmark(colMeans(X), colMeansRcpp(X), colMeansRcppArmadillo(X),
colMedianRcpp(X), times=50)
print(mb)
*/
And here is the result on my machine, with the concise Armadillo version about as fast as yours, and median a little slower as it has to do more work:
R> sourceCpp("/tmp/stephen.cpp")
R> set.seed(42)
R> X <- matrix(rnorm(100*10), 100, 10)
R> library(microbenchmark)
R> mb <- microbenchmark(colMeans(X), colMeansRcpp(X), colMeansRcppArmadillo(X),
+ colMedianRcpp(X), times=50)
R> print(mb)
Unit: microseconds
                     expr    min     lq  median     uq    max neval
              colMeans(X)  9.469 10.422 11.5810 12.421 30.597    50
          colMeansRcpp(X)  3.922  4.281  4.5245  5.306 18.020    50
 colMeansRcppArmadillo(X)  4.196  4.549  4.9295  5.927 11.159    50
         colMedianRcpp(X) 15.615 16.291 16.7290 17.971 27.026    50
R>
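The column loops above are cheap partly because R stores matrices in column-major order, so each column is one contiguous block of memory. A minimal standalone sketch of column means over such a layout follows, assuming a flat std::vector holds the data. The name col_means and its parameters are chosen for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Column means of an nrow x ncol matrix stored column by column,
// i.e. element (i, j) lives at data[i + j * nrow].
std::vector<double> col_means(const std::vector<double>& data,
                              std::size_t nrow, std::size_t ncol) {
    if (nrow == 0 || data.size() != nrow * ncol)
        throw std::invalid_argument("dimensions do not match data");
    std::vector<double> out(ncol);
    for (std::size_t j = 0; j < ncol; ++j) {
        double sum = 0.0;
        const std::size_t offset = j * nrow;  // start of column j
        for (std::size_t i = 0; i < nrow; ++i) sum += data[offset + i];
        out[j] = sum / static_cast<double>(nrow);
    }
    return out;
}
```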