Translate outer() from base R to RcppArmadillo

Is there any way to efficiently translate the outer() function for multiplication of two vectors from R base to RcppArmadillo? I attempted to do so but it is not efficient at all.
Take the following example:
library(Rcpp)
library(RcppArmadillo)
library(microbenchmark)
#Outer attempt
cppFunction(depends = "RcppArmadillo",
'
arma::mat outer_rcpp(arma::vec x, arma::vec y) {
int x_length = x.n_elem;
int y_length = y.n_elem;
arma::mat final(x_length, y_length);
// And use loops instead of outer
for(int i = 0; i < x_length; i++) {
final.col(i) = x[i] * y;
}
return(final);
}
'
)
#Test for equal results
a <- rnorm(5)
base <- base::outer(a, a)
rcpp <- outer_rcpp(a, a)
all.equal(base, rcpp)
#Test for speed
b <- rnorm(5000)
microbenchmark(base = base::outer(b, b),
rcpp = outer_rcpp(b, b), times = 10)
The Rcpp version is about 2 times slower than base R. I am sure that this can be done through matrix multiplication; any idea how?

As @thelatemail pointed out in the comments, outer() already relies on a heavily optimized C routine:
src/library/base/R/outer.R: tcrossprod usage.
src/main/array.c: underlying C routine powering the tcrossprod computation.
Armadillo has its own optimizations for matrix multiplication, built on the dgemm and dgemv routines from BLAS:
armadillo_bits/mul_gemm.hpp: C := alpha*op(A)*op(B) + beta*C,
armadillo_bits/mul_gemv.hpp: y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y.
Playing around with the outer product calculation suggests a few possible optimizations. Mainly, we move the outer product into Armadillo operations instead of loops:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat outer_rcpp(const arma::vec& x, const arma::vec& y) {
int x_length = x.n_elem;
int y_length = y.n_elem;
arma::mat final(x_length, y_length);
// Fill one column per element of y, so vectors of unequal length also work
for(int j = 0; j < y_length; j++) {
final.col(j) = x * y[j];
}
return final;
}
// [[Rcpp::export]]
arma::mat outer_with_armadillo(const arma::vec& x, const arma::vec& y) {
arma::mat final = x*y.t();
return final;
}
// [[Rcpp::export]]
arma::mat outer_with_armadillo_transposed(const arma::vec& x, const arma::rowvec& y) {
arma::mat final = x*y;
return final;
}
Revisiting the benchmarking code, we have:
b = rnorm(5000)
b_transposed = t(b)
bench_results = microbenchmark::microbenchmark(base = base::outer(b, b),
outer_armadillo_loop = outer_rcpp(b, b),
outer_armadillo_optimized = outer_with_armadillo(b, b),
outer_armadillo_optimized_transposed = outer_with_armadillo_transposed(b, b_transposed), times = 10)
bench_results
expr                                      min       lq     mean   median       uq      max neval
base                                 132.8601 141.3532 156.9979 146.7993 154.8954 234.2619    10
outer_armadillo_loop                 278.4115 279.9204 317.7907 288.4212 329.0769 451.6872    10
outer_armadillo_optimized            272.4348 283.3380 347.7913 304.1181 339.3282 728.2264    10
outer_armadillo_optimized_transposed 269.7855 270.7108 297.9580 279.8099 312.3488 386.4270    10
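From the results, the lowest timing I could achieve with Armadillo comes from passing in a b vector that has already been transposed from column-vector form into row-vector form: (n x 1) * (1 x m). Even so, none of the Armadillo variants beats base::outer() in this benchmark.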
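As a rough illustration of what any of these routines has to do, here is a minimal sketch in plain C++ (no Rcpp or Armadillo) that writes every entry of a column-major n x m result exactly once. The name outer_colmajor is hypothetical, and this is not how R or Armadillo implement the operation internally. Since every variant has to fill the same n*m doubles, it seems plausible that the timing differences above come mostly from allocation and copying rather than from the multiplications themselves.

```cpp
#include <cstddef>
#include <vector>

// Outer product of x (length n) and y (length m), stored column-major
// like an R matrix: element (i, j) lives at index i + j * n.
// outer_colmajor is a hypothetical name used only for this sketch.
std::vector<double> outer_colmajor(const std::vector<double>& x,
                                   const std::vector<double>& y) {
    const std::size_t n = x.size();
    const std::size_t m = y.size();
    std::vector<double> out(n * m);
    for (std::size_t j = 0; j < m; ++j) {      // one column per element of y
        for (std::size_t i = 0; i < n; ++i) {  // contiguous writes within a column
            out[i + j * n] = x[i] * y[j];
        }
    }
    return out;
}
```

Calling outer_colmajor({1, 2, 3}, {4, 5}) fills a 3 x 2 result whose first column is x * 4 and whose second column is x * 5.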

Related

Lasso solution with Rcpp: A self study

I'm very new to Rcpp. I'm trying to write a coordinate descent algorithm for the lasso in Rcpp as a self-study exercise. The code returns an error:
Mat::init(): requested size is not compatible with column vector layout
I also ran into some problems while writing the code:
(1) no matching function for call to 'sign'
(2) no matching function for call to 'ifelse'
(3) pow(X.col(j),2) : no viable conversion
As workarounds, I wrote
(S1>0)-(S1<0) for (1), to compute the sign of S1,
an if(){}else{} statement for (2), and
X.col(j)%X.col(j) for (3).
Any suggestions, please?
Here is the code.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat betamat(NumericMatrix Xr, NumericVector yr, NumericVector lambdar, double tol=0.0000001) {
int N = Xr.nrow(), p = Xr.ncol(), l = lambdar.size();
mat X(Xr.begin(), N, p, false);
colvec y(yr.begin(), yr.size(), false);
vec lambda(lambdar.begin(), lambdar.size(),false);
colvec ols = solve(X,y);
mat betas = zeros<mat>(p,l);
//
bool converged = false;
for (int i = 0; i < l; ++i) {
colvec b = zeros<vec>(p);
colvec r = y-X*b;
while(converged == false){
colvec beta_old = betas;
for(int j = 0; j < p; ++j){
r = r + X.col(j)*b(j);
double xr = dot(X.col(j),r);
double S1 = xr/N;
double xx = sum(X.col(j)%X.col(j))/N;
b(j) =((S1>0)-(S1<0))*(abs(S1)-lambda(i))/xx;
if(b(j)>0){
b(j)=b(j);
}else{
b(j)=0;
}
r = r - X.col(j)*b(j);
}
converged = (sum(abs(betas - beta_old)) < tol);
}
betas.col(i) = b;
}
return betas;
}
In R, I'm calling this with
library(Rcpp)
sourceCpp("filename.cpp")
set.seed(1)
X <- matrix(rnorm(100*3),100)
y <- rnorm(100)
coefficients <- betamat(X,y,seq(0,1,0.0005))
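For problems (1) and (2), a minimal sketch of the sign and soft-thresholding steps in plain C++ (no Rcpp or Armadillo) may help. The names sign_of and soft_threshold are hypothetical, and the sketch assumes the usual lasso soft-thresholding operator is what was intended. It clamps the magnitude at zero before restoring the sign, so negative coefficients survive, which is not the same as the b(j) > 0 check in the posted code.

```cpp
#include <cmath>

// Sign of s as -1, 0 or +1, using the comparison trick from the question.
// sign_of is a hypothetical helper name.
inline int sign_of(double s) {
    return (s > 0.0) - (s < 0.0);
}

// Soft-thresholding: shrink s toward zero by lambda, and return 0
// whenever |s| <= lambda. soft_threshold is a hypothetical helper name.
inline double soft_threshold(double s, double lambda) {
    const double shrunk = std::fabs(s) - lambda;
    return shrunk > 0.0 ? sign_of(s) * shrunk : 0.0;
}
```

With lambda = 1, an input of 3 becomes 2, an input of -3 becomes -2, and an input of 0.5 becomes 0.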

Euclidean distance matrix performance between two shapes

The problem I am having is that I have to calculate a Euclidean distance matrix between shapes that can range from 20,000 up to 60,000 points, which produces 10-20GB of data. I have to run each of these calculations thousands of times, so 20GB x 7,000 (each calculation is a different point cloud). The shapes can be either 2D or 3D.
EDITED (Updated questions)
1. Is there a more efficient way to calculate the forward and backward distances without using two separate nested loops? I know I could save the data matrix and calculate the minimum distances in each direction, but then there is a huge memory issue with large point clouds.
2. Is there a way to speed up this calculation and/or clean up the code to trim off time?
The irony is that I only need the matrix to calculate a very simple metric, but it requires the entire matrix to find that metric (Average Hausdorff distance).
Data example where each column represents a dimension of the shape and each row is a point in the shape:
first_configuration <- matrix(1:6,2,3)
second_configuration <- matrix(6:11,2,3)
colnames(first_configuration) <- c("x","y","z")
colnames(second_configuration) <- c("x","y","z")
This code calculates the Euclidean distances between the coordinates:
m <- nrow(first_configuration)
n <- nrow(second_configuration)
D <- sqrt(pmax(matrix(rep(apply(first_configuration * first_configuration, 1, sum), n), m, n, byrow = F) + matrix(rep(apply(second_configuration * second_configuration, 1, sum), m), m, n, byrow = T) - 2 * first_configuration %*% t(second_configuration), 0))
D
Output:
[,1] [,2]
[1,] 8.660254 10.392305
[2,] 6.928203 8.660254
EDIT: included hausdorff average code
d1 <- mean(apply(D, 1, min))
d2 <- mean(apply(D, 2, min))
average_hausdorff <- mean(c(d1, d2)) # mean(d1, d2) would pass d2 as the trim argument
EDIT (Rcpp solution):
Here is my attempt to implement it in Rcpp so the matrix is never saved to memory. Working now but very slow.
sourceCpp(code='
#include <Rcpp.h>
#include <limits>
using namespace Rcpp;
// [[Rcpp::export]]
double edist_rcpp(NumericVector x, NumericVector y){
double d = sqrt( sum( pow(x - y, 2) ) );
return d;
}
// [[Rcpp::export]]
double avg_hausdorff_rcpp(NumericMatrix x, NumericMatrix y){
int nrowx = x.nrow();
int nrowy = y.nrow();
// start each running minimum at the largest representable double
double new_low_x = std::numeric_limits<double>::max();
double new_low_y = std::numeric_limits<double>::max();
double mean_forward = 0;
double mean_backward = 0;
double mean_hd;
double td;
//forward
for(int i = 0; i < nrowx; i++) {
for(int j = 0; j < nrowy; j++) {
NumericVector v1 = x.row(i);
NumericVector v2 = y.row(j);
td = edist_rcpp(v1, v2);
if(td < new_low_x) {
new_low_x = td;
}
}
mean_forward = mean_forward + new_low_x;
new_low_x = std::numeric_limits<double>::max();
}
//backward
for(int i = 0; i < nrowy; i++) {
for(int j = 0; j < nrowx; j++) {
NumericVector v1 = y.row(i);
NumericVector v2 = x.row(j);
td = edist_rcpp(v1, v2);
if(td < new_low_y) {
new_low_y = td;
}
}
mean_backward = mean_backward + new_low_y;
new_low_y = std::numeric_limits<double>::max();
}
//hausdorff mean
mean_hd = (mean_forward / nrowx + mean_backward / nrowy) / 2;
return mean_hd;
}
')
EDIT (RcppParallel solution):
Definitely faster than the serial Rcpp solution and most certainly the R solution. If anyone has tips on how to improve my RcppParallel code to trim off some extra time it would be much appreciated!
sourceCpp(code='
#include <Rcpp.h>
#include <RcppParallel.h>
#include <limits>
// [[Rcpp::depends(RcppParallel)]]
struct minimum_euclidean_distances : public RcppParallel::Worker {
//Input
const RcppParallel::RMatrix<double> a;
const RcppParallel::RMatrix<double> b;
//Output
RcppParallel::RVector<double> medm;
minimum_euclidean_distances(const Rcpp::NumericMatrix a, const Rcpp::NumericMatrix b, Rcpp::NumericVector medm) : a(a), b(b), medm(medm) {}
void operator() (std::size_t begin, std::size_t end) {
for(std::size_t i = begin; i < end; i++) {
double new_low = std::numeric_limits<double>::max();
for(std::size_t j = 0; j < b.nrow(); j++) {
double dsum = 0;
for(std::size_t z = 0; z < b.ncol(); z++) {
dsum = dsum + pow(a(i,z) - b(j,z), 2);
}
dsum = pow(dsum, 0.5);
if(dsum < new_low) {
new_low = dsum;
}
}
medm[i] = new_low;
}
}
};
// [[Rcpp::export]]
double mean_directional_hausdorff_rcpp(Rcpp::NumericMatrix a, Rcpp::NumericMatrix b){
Rcpp::NumericVector medm(a.nrow());
// named "worker" so the variable does not hide the struct name
minimum_euclidean_distances worker(a, b, medm);
RcppParallel::parallelFor(0, a.nrow(), worker);
double results = Rcpp::sum(medm);
results = results / a.nrow();
return results;
}
// [[Rcpp::export]]
double max_directional_hausdorff_rcpp(Rcpp::NumericMatrix a, Rcpp::NumericMatrix b){
Rcpp::NumericVector medm(a.nrow());
minimum_euclidean_distances worker(a, b, medm);
RcppParallel::parallelFor(0, a.nrow(), worker);
double results = Rcpp::max(medm);
return results;
}
')
Benchmarks using large point clouds of sizes 37,775 and 36,659:
# Rcpp serial solution
system.time(avg_hausdorff_rcpp(ll,rr))
   user  system elapsed
409.143   0.000 409.105
# RcppParallel solution
system.time(mean(c(mean_directional_hausdorff_rcpp(ll,rr), mean_directional_hausdorff_rcpp(rr,ll))))
   user  system elapsed
260.712   0.000  33.265
I tried using JuliaCall to calculate the average Hausdorff distance.
JuliaCall embeds Julia in R.
I have only tried a serial solution in JuliaCall so far. It seems to be faster than both the RcppParallel and the serial Rcpp solutions in the question, but I don't have the benchmark data. Since support for parallel computation is built into Julia, a parallel version should be possible to write without much difficulty. I will update my answer after finding that out.
Below is the Julia file I wrote:
# Calculate the min distance from the k-th point in as to the points in bs
function min_dist(k, as, bs)
n = size(bs, 1)
p = size(bs, 2)
dist = Inf
for i in 1:n
r = 0.0
for j in 1:p
r += (as[k, j] - bs[i, j]) ^ 2
## if r is already greater than the upper bound,
## then there is no need to continue doing the calculation
if r > dist
break
end
end
if r < dist
dist = r
end
end
sqrt(dist)
end
function avg_min_dist_from(as, bs)
distsum = 0.0
n1 = size(as, 1)
for k in 1:n1
distsum += min_dist(k, as, bs)
end
distsum / n1
end
function hausdorff_avg_dist(as, bs)
(avg_min_dist_from(as, bs) + avg_min_dist_from(bs, as)) / 2
end
And this is the R code to use the julia function:
first_configuration <- matrix(1:6,2,3)
second_configuration <- matrix(6:11,2,3)
colnames(first_configuration) <- c("x","y","z")
colnames(second_configuration) <- c("x","y","z")
m <- nrow(first_configuration)
n <- nrow(second_configuration)
D <- sqrt(matrix(rep(apply(first_configuration * first_configuration, 1, sum), n), m, n, byrow = F) + matrix(rep(apply(second_configuration * second_configuration, 1, sum), m), m, n, byrow = T) - 2 * first_configuration %*% t(second_configuration))
D
d1 <- mean(apply(D, 1, min))
d2 <- mean(apply(D, 2, min))
average_hausdorff <- mean(c(d1, d2)) # mean(d1, d2) would pass d2 as the trim argument
library(JuliaCall)
## the first time of julia_setup could be quite time consuming
julia_setup()
## source the julia file which has our hausdorff_avg_dist function
julia_source("hausdorff.jl")
## check if the julia function is correct with the example
average_hausdorff_julia <- julia_call("hausdorff_avg_dist",
first_configuration,
second_configuration)
## generate some large random point clouds
n1 <- 37775
n2 <- 36659
as <- matrix(rnorm(n1 * 3), n1, 3)
bs <- matrix(rnorm(n2 * 3), n2, 3)
system.time(julia_call("hausdorff_avg_dist", as, bs))
The time on my laptop was less than 20 seconds; note that this is the performance of the serial Julia version called through JuliaCall! I used the same data to test the serial Rcpp solution from the question, which took more than 10 minutes to run. I don't have RcppParallel on my laptop right now, so I can't try that. And as I said, Julia has built-in support for parallel computation.
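On the question of avoiding two separate nested loops, one possible approach is to visit each pair of points once and update a running row minimum and a running column minimum at the same time. Below is a minimal sketch of that idea in plain C++ (no Rcpp), not a drop-in replacement for the code above. The names Point and average_hausdorff are hypothetical, and the sketch assumes both point sets are non-empty and share the same dimension. It stores only one extra value per point of the second set rather than the full distance matrix, and it compares squared distances so that sqrt is taken only on the minima.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical alias: one point, any dimension

// Average Hausdorff distance between a and b in a single pass over all pairs.
// Assumes a and b are non-empty and all points have the same dimension.
double average_hausdorff(const std::vector<Point>& a, const std::vector<Point>& b) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> col_min(b.size(), inf);  // squared min distance from each b[j] to a
    double forward_sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double row_min = inf;  // squared min distance from a[i] to b
        for (std::size_t j = 0; j < b.size(); ++j) {
            double d2 = 0.0;
            for (std::size_t k = 0; k < a[i].size(); ++k) {
                const double diff = a[i][k] - b[j][k];
                d2 += diff * diff;
            }
            row_min = std::min(row_min, d2);
            col_min[j] = std::min(col_min[j], d2);
        }
        forward_sum += std::sqrt(row_min);
    }
    double backward_sum = 0.0;
    for (double d2 : col_min) backward_sum += std::sqrt(d2);
    return (forward_sum / a.size() + backward_sum / b.size()) / 2.0;
}
```

For the two-point example configurations from the question, the rows are (1,3,5), (2,4,6) and (6,8,10), (7,9,11), and the function returns the average of the row-wise and column-wise mean minima of the matrix D shown there.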

Vector mean in Rcpp

I'm trying to convert an R function into Rcpp to speed things up, since it involves a for loop. Along the way I need to calculate the mean of the entries of a vector, which in R would be as simple as mean(x), but it does not appear to work in Rcpp, giving me 0 0 as the result every time.
My code looks like this:
cppFunction(
"NumericVector fun(int n, double lambda, ...) {
...
NumericVector y = rpois(n, lambda);
NumericVector w = dpois(y, lambda);
NumericVector x = w*y;
double z = mean(x);
return z;
}")
Edit: So I thought my error was due to what was mentioned above, and the return of a single double of z is just me trying to isolate the issue. The following code however still does not work:
cppFunction(
"NumericVector zstat(int n, double lambda, double lambda0, int m) {
NumericVector z(m);
for (int i=1; i<m; ++i){
NumericVector y = rpois(n, lambda0);
NumericVector w = dpois(y, lambda)/dpois(y,lambda0);
double x = mean(w*y);
z[i] = (x-2)/(sqrt(2/n));
}
return z;
}")
The return type of your function is NumericVector, but Rcpp::mean returns a scalar value convertible to double. Fixing this will correct the issue:
library(Rcpp)
cppFunction(
"double fun(int n, double lambda) {
NumericVector y = rpois(n, lambda);
NumericVector w = dpois(y, lambda);
NumericVector x = w*y;
double z = mean(x);
return z;
}")
set.seed(123)
fun(50, 1.5)
# [1] 0.2992908
What happens in your code is that, since NumericVector was specified as the return type, this constructor is called,
template <typename T>
Vector(T size,
typename Rcpp::traits::enable_if<traits::is_arithmetic<T>::value, void>::type* = 0) {
Storage::set__( Rf_allocVector( RTYPE, size) ) ;
init() ;
}
which casts the double to an integral type and creates a NumericVector with length equal to the truncated value of the double. To demonstrate,
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector from_double(double x) {
return x;
}
/*** R
sapply(0.5:4.5, from_double)
# [[1]]
# numeric(0)
#
# [[2]]
# [1] 0
#
# [[3]]
# [1] 0 0
#
# [[4]]
# [1] 0 0 0
#
# [[5]]
# [1] 0 0 0 0
*/
Edit: Regarding your second question, you are dividing by sqrt(2 / n), where 2 and n are both integers, which ends up causing a division by zero in most cases -- hence all of the Inf values in the result vector. You can fix this by using 2.0 instead of 2:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector zstat(int n, double lambda, double lambda0, int m) {
NumericVector z(m);
for (int i=1; i<m; ++i){
NumericVector y = rpois(n, lambda0);
NumericVector w = dpois(y, lambda)/dpois(y,lambda0);
double x = mean(w * y);
// z[i] = (x - 2) / sqrt(2 / n);
// ^^^^^
z[i] = (x - 2) / sqrt(2.0 / n);
// ^^^^^^^
}
return z;
}
/*** R
set.seed(123)
zstat(25, 2, 3, 10)
# [1] 0.0000000 -0.4427721 0.3199805 0.1016661 0.4078687 0.4054078
# [7] -0.1591861 0.9717596 0.6325110 0.1269779
*/
C++ is not R -- you need to be more careful about the types of your variables.
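The integer-division pitfall can be reproduced without Rcpp. The following is a minimal sketch in plain C++; the function names scale_int and scale_double are hypothetical and exist only to put the two expressions side by side.

```cpp
#include <cmath>

// Integer division: 2 / n is evaluated in int arithmetic,
// so it is 0 for every n > 2 and the square root is 0 as well.
inline double scale_int(int n) {
    return std::sqrt(2 / n);
}

// Floating-point division: the literal 2.0 makes the compiler
// convert n to double before dividing.
inline double scale_double(int n) {
    return std::sqrt(2.0 / n);
}
```

Dividing by scale_int(25) is a division by zero, whereas scale_double(25) is the square root of 0.08.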

Fastest way to get nonnegative component

What is a faster way to get the nonnegative component of a double vector? That is,
pmax(x, 0)
My attempt is using Rcpp:
//' @title Parallel maximum
//' @description A faster \code{pmax()}.
//'
//' @name pmaxC
//' @param x A numeric vector.
//' @param a A single numeric value.
//' @return The parallel maximum of the input values.
//' @note This function will always be faster than \code{pmax(x, a)} when \code{a} is a single value, but can be slower than \code{pmax.int(x, a)} when \code{x} is short. Use this function when comparing a numeric vector with a single value.
//' @export pmaxC
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector pmaxC(NumericVector x, double a) {
int n = x.length();
NumericVector out(n);
for (int i = 0; i < n; ++i) {
double xi = x[i];
if (xi < a) {
out[i] = a;
} else {
out[i] = xi;
}
}
return out;
}
This is a modest improvement:
set.seed(5)
x <- rnorm(1e6)
microbenchmark(pmax(x, 0), pmaxC(x, 0))
Unit: milliseconds
expr min lq mean median uq max neval cld
pmax(x, 0) 8.500419 8.621341 11.09672 10.132045 10.791020 58.44972 100 a
pmaxC(x, 0) 5.624480 5.709262 8.83968 7.598093 7.907853 53.91339 100 a
Neither is unacceptably slow, but given that this is a common scenario, I was wondering whether a package had developed a faster approach.
The operation you are performing is fairly simple, so I'm not sure there is much room for improvement with regard to your algorithm above. However, if you really need to squeeze out extra performance, this seems like a good candidate for parallelization. Here is a possible implementation using RcppParallel:
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <Rcpp.h>
#include <algorithm> // std::transform
struct Pmax : public RcppParallel::Worker {
struct Apply {
double mx;
Apply(double mx_)
: mx(mx_)
{}
double operator()(const double x) const
{
return x > mx ? x : mx;
}
};
const RcppParallel::RVector<double> input;
RcppParallel::RVector<double> output;
Apply f;
Pmax(const Rcpp::NumericVector input_,
Rcpp::NumericVector output_,
double mx_)
: input(input_), output(output_), f(mx_)
{}
void operator()(std::size_t begin, std::size_t end)
{
std::transform(
input.begin() + begin,
input.begin() + end,
output.begin() + begin,
f
);
}
};
// [[Rcpp::export]]
Rcpp::NumericVector par_pmax(Rcpp::NumericVector x, double y)
{
Rcpp::NumericVector res = Rcpp::no_init_vector(x.size());
Pmax p(x, res, y);
RcppParallel::parallelFor(0, x.size(), p);
return res;
}
Testing this with your example data, I get a reasonable improvement:
set.seed(5)
x <- rnorm(1e6)
all.equal(pmax(x, 0), par_pmax(x, 0))
#[1] TRUE
microbenchmark::microbenchmark(
pmax(x, 0),
pmaxC(x, 0),
par_pmax(x, 0),
times = 500L
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# pmax(x, 0) 11.843528 12.193126 14.972588 13.030448 16.799250 102.09895 500
# pmaxC(x, 0) 7.804883 8.036879 10.462070 8.772635 12.407587 69.08290 500
# par_pmax(x, 0) 2.244691 2.443971 4.552169 2.624008 6.359027 65.99233 500
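To make the worker pattern easier to see in isolation, here is a minimal sketch in plain C++ of a worker that, like the Pmax struct above, only knows how to process a half-open index range. The names ClampBelow and chunked_for are hypothetical, and chunked_for is a sequential stand-in that assumes grain > 0; it is not RcppParallel's parallelFor and runs the chunks one after another. Because each chunk writes to a disjoint slice of the output, the chunks could be handed to separate threads without locking.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Worker in the style of the Pmax struct: it processes the index
// range [begin, end) and writes only to that slice of the output.
// ClampBelow is a hypothetical name used only for this sketch.
struct ClampBelow {
    const std::vector<double>& input;
    std::vector<double>& output;
    double floor_value;

    void operator()(std::size_t begin, std::size_t end) const {
        std::transform(input.begin() + begin, input.begin() + end,
                       output.begin() + begin,
                       [this](double x) { return std::max(x, floor_value); });
    }
};

// Sequential stand-in for a parallel-for (an assumption of this sketch):
// split [0, n) into chunks of `grain` elements (grain must be > 0)
// and hand each chunk to the worker in turn.
template <typename Worker>
void chunked_for(std::size_t n, std::size_t grain, const Worker& worker) {
    for (std::size_t begin = 0; begin < n; begin += grain) {
        worker(begin, std::min(begin + grain, n));
    }
}
```

A worker would be built as ClampBelow w{in, out, 0.0}, with out sized to match in, and then run with chunked_for(in.size(), 2, w).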

sorting columns of Rcpp NumericMatrix for median calculations

I've been testing Rcpp and RcppArmadillo for calculating summary stats on big matrices. The Rcpp function below was a lot faster (5 to 10 times) than base R colMeans or the Armadillo version on ~4 million rows and 45 columns.
colMeansRcpp <- cxxfunction(signature(X_="integer"),
plugin='Rcpp',
body='
Rcpp::IntegerMatrix X = X_;
int ncol = X.ncol(); int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for(int col = 0; col < ncol; col++){
out[col]=Rcpp::sum(X(_, col));
}
return wrap(out/nrow);
')
I really want to calculate the median and maybe other quantiles for plotting, and because that requires a sort it is even more in need of C++ outsourcing. The Armadillo version seems a bit slow, so I wanted to do an in-place sort in code similar to the above, but I just can't get the syntax right. Here is what I'm trying:
# OK I'm aware this floor(nrow/2) is not **absolutely** correct
# I'm simplifying here
colMedianRcpp <- cxxfunction(signature(X_="integer"),
plugin='Rcpp',
body='
Rcpp::IntegerMatrix X = clone(X_);
int ncol = X.ncol(); int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for(int col = 0; col < ncol; col++){
X(_,col)= std::sort((X_,col).begin, (X_,col).end));
out[col]=X(floor(nrow/2), col));
}
return wrap(out);
')
Basically it's the line
X(_,col)= std::sort((X_,col).begin, (X_,col).end));
I don't know how to express "sort a column in place" with this mixture of Rcpp sugar and std C++. Sorry I can see what I'm doing is wrong but a hint on the right syntax would be lovely.
ps Am I right I need to do this clone() so I don't change the R object?
EDIT
I add the RcppArmadillo code and a benchmark comparison to address the answer/comment below. the benchmark was only on 50k rows for a quick reply but I recall it was similar with many more. I realise you are the Rcpp author.. so many thanks for your time!
The thought occurs that perhaps I'm doing something daft with the RcppArmadillo code to make it run so much slower than the base colMeans or Rcpp version?
colMeansRcppArmadillo <- cxxfunction(signature(X_="integer"),
plugin="RcppArmadillo",
body='
arma::mat X = Rcpp::as<arma::mat > (X_);
arma::rowvec MD= arma::mean(X, 0);
return wrap(MD);
')
And the benchmark is ...
(mb = microbenchmark(
+ colMeans(fqSmallMatrix),
+ colMeansRcpp(fqSmallMatrix),
+ colMeansRcppArmadillo(fqSmallMatrix),
+ times=50))
Unit: milliseconds
expr min lq median uq max neval
colMeans(fqSmallMatrix) 10.620919 10.63289 10.640819 10.648882 10.907145 50
colMeansRcpp(fqSmallMatrix) 2.649038 2.66832 2.676709 2.700839 2.841012 50
colMeansRcppArmadillo(fqSmallMatrix) 25.687067 26.23488 33.168589 33.792489 113.832495 50
You can copy the column into a new vector with
NumericVector y = x(_,j);
Complete example:
library(Rcpp)
cppFunction('
NumericVector colMedianRcpp(NumericMatrix x) {
int nrow = x.nrow();
int ncol = x.ncol();
int position = nrow / 2; // Integer division
NumericVector out(ncol);
for (int j = 0; j < ncol; j++) {
NumericVector y = x(_,j); // Copy the column -- the original will not be modified
std::nth_element(y.begin(), y.begin() + position, y.end());
out[j] = y[position];
}
return out;
}
')
x <- matrix( sample(1:12), 3, 4 )
x
colMedianRcpp(x)
x # Unchanged
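The same selection idea can be sketched in plain C++ (no Rcpp), including the even-length case that nrow / 2 alone does not handle. The name median_copy is hypothetical, and the sketch assumes a non-empty input. Taking the vector by value plays the role of the column copy above, so the caller's data stays unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of a non-empty vector. The argument is taken by value, so
// the partial reordering done by std::nth_element happens on a copy.
// median_copy is a hypothetical name used only for this sketch.
double median_copy(std::vector<double> v) {
    const std::size_t mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    const double upper = v[mid];
    if (v.size() % 2 == 1) return upper;
    // Even length: after nth_element, everything left of mid is <= v[mid],
    // so the lower middle value is the largest element in that left part.
    const double lower = *std::max_element(v.begin(), v.begin() + mid);
    return (lower + upper) / 2.0;
}
```

For an odd number of values this returns the middle one; for an even number it averages the two middle values, as R's median() does.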
You are not actually showing RcppArmadillo code -- I have been quite happy with the performance of RcppArmadillo code where I needed row/column subsetting.
You can instantiate Armadillo matrices via Rcpp just about as efficiently (no copy, re-using R object memory), so I would try that.
And yes: you want clone() for a distinct copy, and I think you'd get that for free if you use the default RcppArmadillo ctor (rather than the more efficient two-step).
Edit a few hours later
You had left an open question about why your Armadillo version was slow. In the meantime, Vincent solved the issue for you, but here is a revisited, cleaner solution using your code as well as Vincent's.
Note how it instantiates the Armadillo matrix without a copy -- so it is faster. And it also avoids mixing integer and numeric matrices. The code first:
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
NumericVector colMedianRcpp(NumericMatrix x) {
int nrow = x.nrow();
int ncol = x.ncol();
int position = nrow / 2; // Integer division
NumericVector out(ncol);
for (int j = 0; j < ncol; j++) {
NumericVector y = x(_,j); // Copy column -- original will not be modified
std::nth_element(y.begin(), y.begin() + position, y.end());
out[j] = y[position];
}
return out;
}
// [[Rcpp::export]]
arma::rowvec colMeansRcppArmadillo(NumericMatrix x){
arma::mat X = arma::mat(x.begin(), x.nrow(), x.ncol(), false);
return arma::mean(X, 0);
}
// [[Rcpp::export]]
NumericVector colMeansRcpp(NumericMatrix X) {
int ncol = X.ncol();
int nrow = X.nrow();
Rcpp::NumericVector out(ncol);
for (int col = 0; col < ncol; col++){
out[col]=Rcpp::sum(X(_, col));
}
return wrap(out/nrow);
}
/*** R
set.seed(42)
X <- matrix(rnorm(100*10), 100, 10)
library(microbenchmark)
mb <- microbenchmark(colMeans(X), colMeansRcpp(X), colMeansRcppArmadillo(X),
colMedianRcpp(X), times=50)
print(mb)
*/
And here is the result on my machine, with the concise Armadillo version about as fast as yours, and median a little slower as it has to do more work:
R> sourceCpp("/tmp/stephen.cpp")
R> set.seed(42)
R> X <- matrix(rnorm(100*10), 100, 10)
R> library(microbenchmark)
R> mb <- microbenchmark(colMeans(X), colMeansRcpp(X), colMeansRcppArmadillo(X),
+ colMedianRcpp(X), times=50)
R> print(mb)
Unit: microseconds
expr min lq median uq max neval
colMeans(X) 9.469 10.422 11.5810 12.421 30.597 50
colMeansRcpp(X) 3.922 4.281 4.5245 5.306 18.020 50
colMeansRcppArmadillo(X) 4.196 4.549 4.9295 5.927 11.159 50
colMedianRcpp(X) 15.615 16.291 16.7290 17.971 27.026 50
R>
