In R we can use unique method to find unique rows
> data <- matrix(c(1,1,0,1,1,1,0,1),ncol = 2)
> data
[,1] [,2]
[1,] 1 1
[2,] 1 1
[3,] 0 0
[4,] 1 1
> unique(data)
[,1] [,2]
[1,] 1 1
[2,] 0 0
How can we do it for arma::mat in Rcpp?
Here unique function returns unique elements not unique rows.
I don't think there is a built-in way to do this in the Armadillo library, but here is a simple approach:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
template <typename T>
inline bool rows_equal(const T& lhs, const T& rhs, double tol = 0.00000001) {
return arma::approx_equal(lhs, rhs, "absdiff", tol);
}
// [[Rcpp::export]]
arma::mat unique_rows(const arma::mat& x) {
unsigned int count = 1, i = 1, j = 1, nr = x.n_rows, nc = x.n_cols;
arma::mat result(nr, nc);
result.row(0) = x.row(0);
for ( ; i < nr; i++) {
bool matched = false;
if (rows_equal(x.row(i), result.row(0))) continue;
for (j = i + 1; j < nr; j++) {
if (rows_equal(x.row(i), x.row(j))) {
matched = true;
break;
}
}
if (!matched) result.row(count++) = x.row(i);
}
return result.rows(0, count - 1);
}
/*** R
data <- matrix(c(1,1,0,1,1,1,0,1), ncol = 2)
all.equal(unique(data), unique_rows(data))
#[1] TRUE
data2 <- matrix(1:9, nrow = 3)
all.equal(unique(data2), unique_rows(data2))
#[1] TRUE
data3 <- matrix(0, nrow = 3, ncol = 3)
all.equal(unique(data3), unique_rows(data3))
#[1] TRUE
data4 <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2)
all.equal(unique(data4), unique_rows(data4))
#[1] TRUE
*/
As suggested by mtall in the comments, rows_equal is using arma::approx_equal to test for equality, rather than operator==, to avoid some of the comparison issues inherent to floating point numbers. The options used in this function were chosen somewhat arbitrarily and can of course be changed as needed; but the value of tol is roughly equal to the default tolerance used by R's all.equal, which is .Machine$double.eps^0.5 (~0.00000001490116 on my machine).
Same approach inspired by #nrussell, slightly shorter:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
template <typename T>
inline bool approx_equal_cpp(const T& lhs, const T& rhs, double tol = 0.00000001) {
return arma::approx_equal(lhs, rhs, "absdiff", tol);
}
// [[Rcpp::export]]
arma::mat unique_rows(const arma::mat& m) {
arma::uvec ulmt = arma::zeros<arma::uvec>(m.n_rows);
for (arma::uword i = 0; i < m.n_rows; i++) {
for (arma::uword j = i + 1; j < m.n_rows; j++) {
if (approx_equal_cpp(m.row(i), m.row(j))) { ulmt(j) = 1; break; }
}
}
return m.rows(find(ulmt == 0));
}
// [[Rcpp::export]]
arma::mat unique_cols(const arma::mat& m) {
arma::uvec vlmt = arma::zeros<arma::uvec>(m.n_cols);
for (arma::uword i = 0; i < m.n_cols; i++) {
for (arma::uword j = i + 1; j < m.n_cols; j++) {
if (approx_equal_cpp(m.col(i), m.col(j))) { vlmt(j) = 1; break; }
}
}
return m.cols(find(vlmt == 0));
}
/*** R
data <- matrix(c(1,1,0,1,1,1,0,1), ncol = 2)
all.equal(unique(data), unique_rows(data))
#[1] TRUE
data2 <- matrix(1:9, nrow = 3)
all.equal(unique(data2), unique_rows(data2))
#[1] TRUE
data3 <- matrix(0, nrow = 3, ncol = 3)
all.equal(unique(data3), unique_rows(data3))
#[1] TRUE
data4 <- matrix(c(0, 0, 0, 1, 1, 0, 1, 1), ncol = 2)
all.equal(unique(data4), unique_rows(data4))
#[1] TRUE
*/
Related
Is there a way to allocate an Rcpp List of length n, where each element of the List will be filled with a NumericMatrix, but the size of each NumericMatrix can change?
I have an idea for doing this using std::list and push_back(), but the size of the list may be quite large and I want to avoid the overhead of creating an extra copy of the list when I return from the function.
The below R code gives an idea of what I hope to do:
myvec = function(n) {
x = vector("list", n)
for (i in seq_len(n)) {
nc = sample(1:3, 1)
nr = sample(1:3, 1)
x[[i]] = matrix(rbinom(nc * nr, size = 1, prob = 0.5),
nrow = nr, ncol = nc)
}
x
}
This could result in something like:
> myvec(2)
[[1]]
[,1]
[1,] 0
[2,] 1
[[2]]
[,1] [,2] [,3]
[1,] 0 1 0
[2,] 0 1 1
Update: based on the comments of #Dirk and #Ralf, I created functions based on Rcpp::List and std::list with a wrap at the end. Speed comparisons don't seem to favor one version over the other, but perhaps there's an inefficiency I'm not aware of.
src = '
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
Rcpp::RNGScope rngScope;
Rcpp::List x(n);
// Rcpp::IntegerVector choices = {1, 2 ,3};
Rcpp::IntegerVector choices = Rcpp::seq_len(50);
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
}
return x;
}
// [[Rcpp::export]]
Rcpp::List myvec2(int n) {
Rcpp::RNGScope scope;
std::list< Rcpp::NumericMatrix > x;
// Rcpp::IntegerVector choices = {1, 2 ,3};
Rcpp::IntegerVector choices = Rcpp::seq_len(50);
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x.push_back( Rcpp::NumericMatrix(nc, nr, entries.begin()));
}
return Rcpp::wrap(x);
}
'
sourceCpp(code = src)
Resulting benchmarks on my computer are:
> library(microbenchmark)
> rcpp_list = function() {
+ set.seed(10);myvec(105)
+ }
> std_list = function() {
+ set.seed(10);myvec2(105)
+ }
> microbenchmark(rcpp_list(), std_list(), times = 1000)
Unit: milliseconds
expr min lq mean median uq
rcpp_list() 1.8901 1.92535 2.205286 1.96640 2.22380
std_list() 1.9164 1.95570 2.224941 2.00555 2.32315
max neval cld
7.1569 1000 a
7.1194 1000 a
The fundamental issue that Rcpp objects are R objects governed my R's memory management where resizing is expensive: full copies.
So when I have tasks similar to yours where sizes may change, or are unknown, I often work with different data structures -- the STL gives us plenty -- and only convert to R(cpp) at the return step at the end.
The devil in the detail here (as always). Profile, experiment, ...
Edit: And in the narrower sense of "can we return a List of NumericMatrix objects with varying sizes" the answer is of course we can because that is what List objects do. You can also insert other types.
As Dirk said, it is of course possible to create a list with matrices of different size. To make it a bit more concrete, here a translation of your R function:
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
Rcpp::List x(n);
Rcpp::IntegerVector choices = {1, 2 ,3};
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
}
return x;
}
/***R
myvec(2)
*/
The main difference to the R code are the explicitly named vectors choices and entries, which are only implicit in the R code.
In R, we can use Matrix::nearPD() to calculate nearest positive definite matrix.
I have written a Rcpp-version, nearPD_c, myself as follows (c++ file),
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace arma;
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
vec rep_each(const vec& x, const int each) {
std::size_t n=x.n_elem;
std::size_t n_out=n*each;
vec res(n_out);
auto begin = res.begin();
for (std::size_t i = 0, ind = 0; i < n; ind += each, ++i) {
auto start = begin + ind;
auto end = start + each;
std::fill(start, end, x[i]);
}
return res;
}
mat mat_vec_same_len(mat mt1, vec v1){
//do not check the input...
int t=0;
for(int i=0;i<mt1.n_cols;i++){
for(int j=0;j<mt1.n_rows;j++){
mt1(j,i)=mt1(j,i)*v1(t);
t++;
}
}
return(mt1);
}
// [[Rcpp::export]]
vec pmax_c(double a, vec b){
vec c(b.n_elem);
for(int i=0;i<b.n_elem;i++){
c(i)=std::max(a,b(i));
}
return c;
}
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
mat nearPD_c(mat x,
bool corr = false, bool keepDiag = false
,bool do2eigen = true // if TRUE do a sfsmisc::posdefify() eigen step
,bool doSym = false // symmetrize after tcrossprod()
, bool doDykstra = true // do use Dykstra's correction
,bool only_values = false // if TRUE simply return lambda[j].
, double eig_tol = 1e-6 // defines relative positiveness of eigenvalues compared to largest
, double conv_tol = 1e-7 // convergence tolerance for algorithm
,double posd_tol = 1e-8 // tolerance for enforcing positive definiteness
, int maxit = 100 // maximum number of iterations allowed
, bool trace = false // set to TRUE (or 1 ..) to trace iterations
){
int n = x.n_cols;
vec diagX0;
if(keepDiag) {
diagX0 = x.diag();
}
mat D_S;
if(doDykstra) {
//D_S should be like x, but filled with '0' -- following also works for 'Matrix':
D_S = x;
D_S.zeros(); //set all element
}
mat X = x;
int iter = 0 ;
bool converged = false;
double conv = R_PosInf;
mat Y;
mat R;
mat B;
while (iter < maxit && !converged) {
Y = X;
if(doDykstra){
R = Y - D_S;
}
vec d;
mat Q;
if(doDykstra){
B=R;
}else{
B=Y;
}
eig_sym(d, Q, B);
// create mask from relative positive eigenvalues
uvec p= (d>eig_tol*d[1]);
if(sum(p)==0){
//stop("Matrix seems negative semi-definite")
break;
}
// use p mask to only compute 'positive' part
uvec p_indexes(sum(p));
int p_i_i=0;
for(int i=0;i<p.n_elem;i++){
if(p(i)){
p_indexes(p_i_i)=i;
p_i_i++;
}
}
Q=Q.cols(p_indexes);
X=mat_vec_same_len(Q,rep_each(d.elem(p_indexes),Q.n_rows))*Q.t();
// update Dykstra's correction D_S = \Delta S_k
if(doDykstra){
D_S = X - R;
}
// project onto symmetric and possibly 'given diag' matrices:
if(doSym){
X = (X + X.t())/2;
}
if(corr){
X.diag().ones(); //set diagnols as ones
}
else if(keepDiag){
X.diag() = diagX0;
}
conv = norm(Y-X,"inf")/norm(Y,"inf");
iter = iter + 1;
if (trace){
// cat(sprintf("iter %3d : #{p}=%d, ||Y-X|| / ||Y||= %11g\n",
// iter, sum(p), conv))
Rcpp::Rcout << "iter " << iter <<" : #{p}= "<< sum(p) << std::endl;
}
converged = (conv <= conv_tol);
// force symmetry is *NEVER* needed, we have symmetric X here!
//X <- (X + t(X))/2
if(do2eigen || only_values) {
// begin from posdefify(sfsmisc)
eig_sym(d, Q, X);
double Eps = posd_tol * std::abs(d[1]);
// if (d[n] < Eps) { //should be n-1?
if (d(n-1) < Eps) {
uvec d_comp = d < Eps;
for(int i=0;i<sum(d_comp);i++){
if(d_comp(i)){
d(i)=Eps;
}
}
// d[d < Eps] = Eps; //how to assign values likes this?
if(!only_values) {
vec o_diag = X.diag();
X = Q * (d *Q.t());
vec D = sqrt(pmax_c(Eps, o_diag)/X.diag());
x=D * X * rep_each(D, n);
}
}
if(only_values) return(d);
// unneeded(?!): X <- (X + t(X))/2
if(corr) {
X.diag().ones(); //set diag as ones
}
else if(keepDiag){
X.diag()= diagX0;
}
} //end from posdefify(sfsmisc)
}
if(!converged){ //not converged
Rcpp::Rcout << "did not converge! " <<std::endl;
}
return X;
// return List::create(_["mat"] = X,_["eigenvalues"]=d,
//
// _["corr"] = corr, _["normF"] = norm(x-X, "fro"), _["iterations"] = iter,
// _["rel.tol"] = conv, _["converged"] = converged);
}
However, although nearPD and nearPD_c give similar results, they are not identical. For example (in R):
> mt0=matrix(c(0.5416, -0.0668 , -0.1538, -0.2435,
+ -0.0668 , 0.9836 , -0.0135 , -0.0195,
+ -0.1538 , -0.0135 , 0.0226 , 0.0334,
+ -0.2435, -0.0195 , 0.0334 , 0.0487),4,byrow = T)
> nearPD(mt0)$mat
4 x 4 Matrix of class "dpoMatrix"
[,1] [,2] [,3] [,4]
[1,] 0.55417390 -0.06540967 -0.14059121 -0.22075966
[2,] -0.06540967 0.98375373 -0.01203943 -0.01698557
[3,] -0.14059121 -0.01203943 0.03650733 0.05726836
[4,] -0.22075966 -0.01698557 0.05726836 0.08983952
> nearPD_c(mt0)
[,1] [,2] [,3] [,4]
[1,] 0.55417390 -0.06540967 -0.14059123 -0.22075967
[2,] -0.06540967 0.98375373 -0.01203944 -0.01698557
[3,] -0.14059123 -0.01203944 0.03650733 0.05726837
[4,] -0.22075967 -0.01698557 0.05726837 0.08983952
There are some differences in 7th or 8th decimal, which make nearPD(mt0) positive define while nearPD_c(mt0) not.
> chol(nearPD(mt0)$mat)
4 x 4 Matrix of class "Cholesky"
[,1] [,2] [,3] [,4]
[1,] 7.444286e-01 -8.786561e-02 -1.888579e-01 -2.965491e-01
[2,] . 9.879440e-01 -2.898297e-02 -4.356729e-02
[3,] . . 1.029821e-04 1.014128e-05
[4,] . . . 1.071201e-04
> chol(nearPD_c(mt0))
Error in chol.default(nearPD_c(mt0)) :
the leading minor of order 3 is not positive definite
I sense that there might be some rounding issue in Rcpp. But I couldn't identify it. Any insights of what goes wrong?
There is at least one logic error in your post-processing. In R we have:
e <- eigen(X, symmetric = TRUE)
d <- e$values
Eps <- posd.tol * abs(d[1])
if (d[n] < Eps) {
d[d < Eps] <- Eps
[...]
While you have:
eig_sym(d, Q, X);
double Eps = posd_tol * std::abs(d[1]);
// if (d[n] < Eps) { //should be n-1?
if (d(n-1) < Eps) {
uvec d_comp = d < Eps;
for(int i=0;i<sum(d_comp);i++){
if(d_comp(i)){
d(i)=Eps;
}
}
According to the Armadillo docs, eigen values are in ascending order, while they are in decreasing order in R. So R builds Eps based on the largest eigen value, while you use the second(!) smallest. Then R compares with the smallest eigen value, while you compare with the largest. Something like this should give the same results as R (untested):
eig_sym(d, Q, X);
double Eps = posd_tol * std::abs(d[n-1]);
if (d(0) < Eps) {
uvec d_comp = d < Eps;
for(int i=0;i<sum(d_comp);i++){
if(d_comp(i)){
d(i)=Eps;
}
}
BTW, you only need // [[Rcpp::export]] for functions that you want to call from R.
I need to write to a file row by row of matrices and sparse matrices that appears in a list and I am doing something like this:
#include <RcppArmadillo.h>
// [[Rcpp::export]]
bool write_rows (Rcpp::List data, Rcpp::CharacterVector clss, int n) {
int len = data.length();
for(int i = 0; i<n; i++) {
for(int j=0; j<len; j++) {
if (clss[j] == "matrix") {
Rcpp::NumericMatrix x = data[j];
auto row = x.row(i);
// do something with row i
} else if (clss[j] == "dgCMatrix") {
arma::sp_mat x = data[j];
auto row = x.row(i);
// do something different with row i
}
}
}
return true;
}
This function can be called in R with:
data <- list(
x = Matrix::rsparsematrix(nrow = 1000, ncol = 1000, density = 0.3),
y = matrix(1:10000, nrow = 1000, ncol = 10)
)
clss <- c("dgCMatrix", "matrix")
write_rows(data, clss, 1000)
The function receives a list of matrices or sparse matrices with the same number of rows and writes those matrices row by row, ie. first writes first rows of all elements in data then the second row of all elements and etc.
My problem is that it seems that this line arma::sp_mat x = data[i]; seems to have a huge impact in performance since it seems that I am implicitly casting the list element data[j] to an Armadillo Sparse Matrix n times.
My question is: is there anyway I could avoid this? Is there a more efficient solution? I tried to find a solution by looking into readr's source code, since they also write list elements row by row, but they also do a cast for each row (in this line for example, but maybe this doesn't impact the performance because they deal with SEXPS?
With the clarification, it seems that the result should interleave the rows from each matrix. You can still do this while avoiding multiple conversions.
This is the original code, modified to generate some actual output:
// [[Rcpp::export]]
arma::mat write_rows(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
int len = data.length();
arma::mat result(nrows*len, ncols);
for (int i = 0, k = 0; i < nrows; i++) {
for (int j = 0; j < len; j++) {
arma::rowvec r;
if (clss[j] == "matrix") {
Rcpp::NumericMatrix x = data[j];
r = x.row(i);
}
else {
arma::sp_mat x = data[j];
r = x.row(i);
}
result.row(k++) = r;
}
}
return result;
}
The following code creates a vector of converted objects, and then extracts the rows from each object as required. The conversion is only done once per matrix. I use a struct containing a dense and sparse mat because it's a lot simpler than dealing with unions; and I don't want to drag in boost::variant or require C++17. Since there's only 2 classes we want to deal with, the overhead is minimal.
struct Matrix_types {
arma::mat m;
arma::sp_mat M;
};
// [[Rcpp::export]]
arma::mat write_rows2(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
const int len = data.length();
std::vector<Matrix_types> matr(len);
std::vector<bool> is_dense(len);
arma::mat result(nrows*len, ncols);
// populate the structs
for (int j = 0; j < len; j++) {
is_dense[j] = (clss[j] == "matrix");
if (is_dense[j]) {
matr[j].m = Rcpp::as<arma::mat>(data[j]);
}
else {
matr[j].M = Rcpp::as<arma::sp_mat>(data[j]);
}
}
// populate the result
for (int i = 0, k = 0; i < nrows; i++) {
for (int j = 0; j < len; j++, k++) {
if (is_dense[j]) {
result.row(k) = matr[j].m.row(i);
}
else {
arma::rowvec r(matr[j].M.row(i));
result.row(k) = r;
}
}
}
return result;
}
Running on some test data:
data <- list(
a=Matrix(1.0, 1000, 1000, sparse=TRUE),
b=matrix(2.0, 1000, 1000),
c=Matrix(3.0, 1000, 1000, sparse=TRUE),
d=matrix(4.0, 1000, 1000)
)
system.time(z <- write_rows(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 185.75 35.04 221.38
system.time(z2 <- write_rows2(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 4.21 0.05 4.25
identical(z, z2)
# [1] TRUE
This question is related to this and this. The difference here is that I'm not passing an Rcpp type like NumericVector or NumericMatrix, but an arma::sp_mat.
Is there any way to pass an sp_mat to C++, modify its values, and have the changes show up in the original object in R?
This can be done with a NumericMatrix, for example:
cppFunction("void frob(NumericMatrix& x)
{
for(NumericMatrix::iterator it = x.begin(); it != x.end(); ++it)
{
if(*it != 0) *it = *it + 5;
}
}")
M <- Matrix(0, 5, 1, sparse=TRUE)
M[1] <- 1.2345
m <- as.matrix(M)
frob(m)
m
#[,1]
#[1,] 6.2345
#[2,] 0.0000
#[3,] 0.0000
#[4,] 0.0000
#[5,] 0.0000
The same technique works for an arma::mat dense matrix. But for a sparse matrix, it doesn't work:
cppFunction("void frob2(arma::sp_mat& x)
{
for(arma::sp_mat::iterator it = x.begin(); it != x.end(); ++it)
{
*it = *it + 5;
}
}", depends="RcppArmadillo")
frob2(M)
M
#5 x 1 sparse Matrix of class "dgCMatrix"
#[1,] 1.2345
#[2,] .
#[3,] .
#[4,] .
#[5,] .
Unfortunately there is no auxiliary memory constructor for sparse matrices in Armadillo.
However you can construct sparse matrix like structure in C++ using pointers to R objects. Here is example:
template< typename T>
class MappedCSC {
public:
MappedCSC();
MappedCSC(std::uint32_t n_rows,
std::uint32_t n_cols,
size_t nnz,
std::uint32_t * row_indices,
std::uint32_t * col_ptrs,
T * values):
n_rows(n_rows), n_cols(n_cols), nnz(nnz), row_indices(row_indices), col_ptrs(col_ptrs), values(values) {};
const std::uint32_t n_rows;
const std::uint32_t n_cols;
const size_t nnz;
const std::uint32_t * row_indices;
const std::uint32_t * col_ptrs;
T * values;
};
using dMappedCSC = MappedCSC<double>;
Here is how you can extract it:
dMappedCSC extract_mapped_csc(Rcpp::S4 input) {
Rcpp::IntegerVector dim = input.slot("Dim");
Rcpp::NumericVector values = input.slot("x");
uint32_t nrows = dim[0];
uint32_t ncols = dim[1];
Rcpp::IntegerVector row_indices = input.slot("i");
Rcpp::IntegerVector col_ptrs = input.slot("p");
return dMappedCSC(nrows, ncols, values.length(), (uint32_t *)row_indices.begin(), (uint32_t *)col_ptrs.begin(), values.begin());
}
And here is example on how to iterate column by column:
Rcpp::NumericMatrix dense_csc_prod(const Rcpp::NumericMatrix &x_r, const Rcpp::S4 &y_csc_r) {
const arma::dmat x = arma::dmat((double *)&x_r[0], x_r.nrow(), x_r.ncol(), false, false);
const dMappedCSC y_csc = extract_mapped_csc(y_csc_r);
Rcpp::NumericMatrix res(x.n_rows, y_csc.n_cols);
arma::dmat res_arma_map = arma::dmat(res.begin(), res.nrow(), res.ncol(), false, false);
for (uint32_t i = 0; i < y_csc.n_cols; i++) {
const uint32_t p1 = y_csc.col_ptrs[i];
const uint32_t p2 = y_csc.col_ptrs[i + 1];
// mapped indices are uint32_t, but arma only allows indices be uvec = vec<uword> = vec<size_t>
// so we need to construct these indices by copying from uint32_t to uword
const arma::Col<uint32_t> idx_temp = arma::Col<uint32_t>(&y_csc.row_indices[p1], p2 - p1);
const arma::uvec idx = arma::conv_to<arma::uvec>::from(idx_temp);
const arma::colvec y_csc_col = arma::colvec(&y_csc.values[p1], p2 - p1, false, false);
res_arma_map.col(i) = x.cols(idx) * y_csc_col;
}
return res;
}
I am trying to find an optimal way for finding the index of the maximum value in each row. The problem is that I cannot find a really efficient way in doing it.
An example:
Dummy <- matrix(runif(500000000,0,3), ncol = 10000)
> system.time(max.col(Dummy, "first"))
user system elapsed
5.532 0.075 5.599
> system.time(apply(Dummy,1,which.max))
user system elapsed
14.638 0.210 14.828
> system.time(rowRanges(Dummy))
user system elapsed
2.083 0.029 2.109
My main question is, why is it more than 2 times so slow to calculate the indices of the max value in comparison with calculating the max and the min with the rowRanges function. Is there a way how I can improve the performance of calculating the index of the max of each row?
Expanding on krlmlr's answer, some benchmarks:
On dataset:
set.seed(007); Dummy <- matrix(runif(50000000,0,3), ncol = 1000)
maxCol_R is an R by-column loop, maxCol_col is a C by-column loop, maxCol_row is a C by-row loop.
microbenchmark::microbenchmark(max.col(Dummy, "first"), maxCol_R(Dummy), maxCol_col(Dummy), maxCol_row(Dummy), times = 30)
#Unit: milliseconds
# expr min lq median uq max neval
# max.col(Dummy, "first") 1209.28408 1245.24872 1268.34146 1291.26612 1504.0072 30
# maxCol_R(Dummy) 1060.99994 1084.80260 1099.41400 1154.11213 1436.2136 30
# maxCol_col(Dummy) 86.52765 87.22713 89.00142 93.29838 122.2456 30
# maxCol_row(Dummy) 577.51613 583.96600 598.76010 616.88250 671.9191 30
all.equal(max.col(Dummy, "first"), maxCol_R(Dummy))
#[1] TRUE
all.equal(max.col(Dummy, "first"), maxCol_col(Dummy))
#[1] TRUE
all.equal(max.col(Dummy, "first"), maxCol_row(Dummy))
#[1] TRUE
And the functions:
maxCol_R = function(x)
{
ans = rep_len(1L, nrow(x))
mx = x[, 1L]
for(j in 2:ncol(x)) {
tmp = x[, j]
wh = which(tmp > mx)
ans[wh] = j
mx[wh] = tmp[wh]
}
ans
}
maxCol_col = inline::cfunction(sig = c(x = "matrix"), body = '
int nr = INTEGER(getAttrib(x, R_DimSymbol))[0], nc = INTEGER(getAttrib(x, R_DimSymbol))[1];
double *px = REAL(x), *buf = (double *) R_alloc(nr, sizeof(double));
for(int i = 0; i < nr; i++) buf[i] = R_NegInf;
SEXP ans = PROTECT(allocVector(INTSXP, nr));
int *pans = INTEGER(ans);
for(int j = 0; j < nc; j++) {
for(int i = 0; i < nr; i++) {
if(px[i + j*nr] > buf[i]) {
buf[i] = px[i + j*nr];
pans[i] = j + 1;
}
}
}
UNPROTECT(1);
return(ans);
', language = "C")
maxCol_row = inline::cfunction(sig = c(x = "matrix"), body = '
int nr = INTEGER(getAttrib(x, R_DimSymbol))[0], nc = INTEGER(getAttrib(x, R_DimSymbol))[1];
double *px = REAL(x), *buf = (double *) R_alloc(nr, sizeof(double));
for(int i = 0; i < nr; i++) buf[i] = R_NegInf;
SEXP ans = PROTECT(allocVector(INTSXP, nr));
int *pans = INTEGER(ans);
for(int i = 0; i < nr; i++) {
for(int j = 0; j < nc; j++) {
if(px[i + j*nr] > buf[i]) {
buf[i] = px[i + j*nr];
pans[i] = j + 1;
}
}
}
UNPROTECT(1);
return(ans);
', language = "C")
EDIT Jun 10 '16
With slight changes to find the indices of both max and min:
rangeCol = inline::cfunction(sig = c(x = "matrix"), body = '
int nr = INTEGER(getAttrib(x, R_DimSymbol))[0], nc = INTEGER(getAttrib(x, R_DimSymbol))[1];
double *px = REAL(x),
*maxbuf = (double *) R_alloc(nr, sizeof(double)),
*minbuf = (double *) R_alloc(nr, sizeof(double));
memcpy(maxbuf, &(px[0 + 0*nr]), nr * sizeof(double));
memcpy(minbuf, &(px[0 + 0*nr]), nr * sizeof(double));
SEXP ans = PROTECT(allocMatrix(INTSXP, nr, 2));
int *pans = INTEGER(ans);
for(int i = 0; i < LENGTH(ans); i++) pans[i] = 1;
for(int j = 1; j < nc; j++) {
for(int i = 0; i < nr; i++) {
if(px[i + j*nr] > maxbuf[i]) {
maxbuf[i] = px[i + j*nr];
pans[i] = j + 1;
}
if(px[i + j*nr] < minbuf[i]) {
minbuf[i] = px[i + j*nr];
pans[i + nr] = j + 1;
}
}
}
UNPROTECT(1);
return(ans);
', language = "C")
set.seed(007); m = matrix(sample(24) + 0, 6, 4)
m
# [,1] [,2] [,3] [,4]
#[1,] 24 7 23 6
#[2,] 10 17 21 11
#[3,] 3 22 20 14
#[4,] 2 18 1 15
#[5,] 5 19 12 8
#[6,] 16 4 9 13
rangeCol(m)
# [,1] [,2]
#[1,] 1 4
#[2,] 3 1
#[3,] 2 1
#[4,] 2 3
#[5,] 2 1
#[6,] 1 2
Here's a pretty basic Rcpp implementation:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector MaxCol(Rcpp::NumericMatrix m) {
R_xlen_t nr = m.nrow(), nc = m.ncol(), i = 0;
Rcpp::NumericVector result(nr);
for ( ; i < nr; i++) {
double current = m(i, 0);
R_xlen_t idx = 0, j = 1;
for ( ; j < nc; j++) {
if (m(i, j) > current) {
current = m(i, j);
idx = j;
}
}
result[i] = idx + 1;
}
return result;
}
/*** R
microbenchmark::microbenchmark(
"Rcpp" = MaxCol(Dummy),
"R" = max.col(Dummy, "first"),
times = 200L
)
#Unit: milliseconds
# expr min lq mean median uq max neval
# Rcpp 221.7777 224.7442 242.0089 229.6407 239.6339 455.9549 200
# R 513.4391 524.7585 562.7465 539.4829 562.3732 944.7587 200
*/
I had to scale your sample data down by an order of magnitude since my laptop did not have enough memory, but the results should translate on your original sample data:
Dummy <- matrix(runif(50000000,0,3), ncol = 10000)
all.equal(MaxCol(Dummy), max.col(Dummy, "first"))
#[1] TRUE
This can be changed slightly to return the indices of the min and max in each row:
// [[Rcpp::export]]
Rcpp::NumericMatrix MinMaxCol(Rcpp::NumericMatrix m) {
R_xlen_t nr = m.nrow(), nc = m.ncol(), i = 0;
Rcpp::NumericMatrix result(nr, 2);
for ( ; i < nr; i++) {
double cmin = m(i, 0), cmax = m(i, 0);
R_xlen_t min_idx = 0, max_idx = 0, j = 1;
for ( ; j < nc; j++) {
if (m(i, j) > cmax) {
cmax = m(i, j);
max_idx = j;
}
if (m(i, j) < cmin) {
cmin = m(i, j);
min_idx = j;
}
}
result(i, 0) = min_idx + 1;
result(i, 1) = max_idx + 1;
}
return result;
}
R stores matrices in column-major order. Therefore, iterating over the columns will be usually faster because the values for one column are close to each other in memory and will travel through the cache hierarchy in one go:
Dummy <- matrix(runif(100000000,0,3), ncol = 10000)
system.time(apply(Dummy,1,function(x) NULL))
## user system elapsed
## 1.360 0.160 1.519
system.time(apply(Dummy,2,function(x) NULL))
## user system elapsed
## 0.94 0.12 1.06
This should be close to the minimal time even the fastest Rcpp solution will be able to obtain. Any solution that uses apply() will have to copy each column/row, this can be saved when using Rcpp. You decide if the potential speed-up by a factor of 2 is worth the effort to you.
Generally, the fastest way to do things in R is to call C, C++, or FORTRAN.
It appears that matrixStats::rowRanges is implemented in C which explains why it is the fastest.
If you want to improve performance even more, there is presumably a little bit of speed to gain in modifying the rowRanges.c code to ignore the minimum and just get the maximum, but I think the gains will be very small.
Tried with STL algorithms and RcppArmadillo.
microbenchmark::microbenchmark(MaxColArmadillo(Dummy), #Using RcppArmadillo
MaxColAlgorithm(Dummy), #Using STL algorithm max_element
maxCol_col(Dummy), #Column processing
maxCol_row(Dummy)) #Row processing
Unit: milliseconds
expr min lq mean median uq max neval
MaxColArmadillo(Dummy) 227.95864 235.01426 261.4913 250.17897 276.7593 399.6183 100
MaxColAlgorithm(Dummy) 292.77041 345.84008 392.1704 390.66578 433.8009 552.2349 100
maxCol_col(Dummy) 40.64343 42.41487 53.7250 48.10126 61.3781 128.4968 100
maxCol_row(Dummy) 146.96077 158.84512 173.0941 169.20323 178.7959 272.6261 100
STL implementation
#include <Rcpp.h>
// [[Rcpp::export]]
// Argument is a matrix ansd returns a
// vector of max of each of the rows of the matrix
Rcpp::NumericVector MaxColAlgorithm(Rcpp::NumericMatrix m) {
//int numOfRows = m.rows();
//Create vector with 0 of size numOfRows
Rcpp::NumericVector total(m.rows());
for(int i = 0; i < m.rows(); ++i)
{
//Create vector of the rows of matrix
Rcpp::NumericVector rVec = m.row(i);
//Apply STL max of elemsnts on the vector and store in a vector
total(i) = *std::max_element(rVec.begin(), rVec.end());
}
return total;
}
RcppArmadillo implementation
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::mat MaxColArmadillo(arma::mat x)
{
//RcppArmadillo max function where dim = 1 means max of each row
// of the matrix
return(max(x,1));
}