Matrix indexing via integer vector - r

I want to access non-consecutive matrix elements and then pass the sub-selection to (for instance) the sum() function. In the example below I get a compile error about invalid conversion.
I am relatively new to Rcpp, so I am sure the answer is simple. Perhaps I am missing some type of cast?
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins("cpp11")]]
double sumExample() {
// these are the matrix row elements I want to sum (the column in this example will be fixed)
IntegerVector a = {2,4,6};
// create 10x10 matrix filled with random numbers [0,1]
NumericVector v = runif(100);
NumericMatrix x(10, 10, v.begin());
// sum the row elements 2,4,6 from column 0
double result = sum( x(a,0) );
return(result);
}

You were close. Indexing uses [] only -- see this write up at the Rcpp Gallery -- and you missed the export tag. The main issue is that compound expresssion are sometimes too much for the compiler and the template programming. So it works if you take it apart.
Corrected Code
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins("cpp11")]]
// [[Rcpp::export]]
double sumExample() {
// these are the matrix row elements I want to sum
// (the column in this example will be fixed)
IntegerVector a = {2,4,6};
// create 10x10 matrix filled with random numbers [0,1]
NumericVector v = runif(100);
NumericMatrix x(10, 10, v.begin());
// sum the row elements 2,4,6 from column 0
NumericVector z1 = x.column(0);
NumericVector z2 = z1[a];
double result = sum( z2 );
return(result);
}
/*** R
sumExample()
*/
Demo
R> Rcpp::sourceCpp("~/git/stackoverflow/56739765/question.cpp")
R> sumExample()
[1] 0.758416
R>

Related

Using sample() from within Rcpp

I have a matrix containing probabilities, with each of the four columns corresponding to a score (an integer in sequence from 0 to 4). I want to sample a single score for each row using the probabilities contained in that row as sampling weights. In rows where some columns do not contain probabilities (NAs instead), the sampling frame is limited to the columns (and their corresponding scores) which do (e.g. for a row with 0.45,0.55,NA,NA, either 0 or 1 would be sampled). However, I get this error (followed by several others), so how can I make it work?:
error: no matching function for call to 'as<Rcpp::IntegerVector>(Rcpp::Matrix<14>::Sub&)'
score[i] = sample(scrs,1,true,as<IntegerVector>(probs));
Existing answers suggest RcppArmadillo is the solution but I can't get that to work either. If I add:
require(RcppArmadillo)
before the cppFunction and
score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs);
in place of the existing sample() statement, I get:
error: 'Rcpp::RcppArmadillo' has not been declared
score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs);
Or if I also include,
#include <RcppArmadilloExtensions/sample.h>
at the top, I get:
fatal error: RcppArmadilloExtensions/sample.h: No such file or directory
#include <RcppArmadilloExtensions/sample.h>
Reproducible code:
p.vals <- matrix(c(0.44892077,0.55107923,NA,NA,
0.37111195,0.62888805,NA,NA,
0.04461714,0.47764478,0.303590351,1.741477e-01,
0.91741642,0.07968127,0.002826406,7.589714e-05,
0.69330800,0.24355559,0.058340934,4.795468e-03,
0.43516823,0.43483784,0.120895859,9.098067e-03,
0.73680809,0.22595438,0.037237525,NA,
0.89569365,0.10142719,0.002879163,NA),nrow=8,ncol=4,byrow=TRUE)
step.vals <- c(1,1,3,3,3,3,2,2)
require(Rcpp)
cppFunction('IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){
int prows = p.nrow();
IntegerVector score(prows);
for(int i=0;i<prows;i++){
int step = steps[i];
IntegerVector scrs = seq(0,step);
int start = 0;
int end = step;
NumericMatrix::Sub probs = p(Range(i,i),Range(start,end));
score[i] = sample(scrs,1,true,probs);
}
return score;
}')
test <- scores_cpp(p.vals,step.vals)
test
Note: the value of step.vals for each row is always equal to the number of columns containing probabilities in that row -1. So passing the step.values to the function may be extraneous.
You may be having a 'forest for the trees' moment here. The RcppArmadillo unit tests actually provide a working example. If you look at the source file inst/tinytest/test_sample.R, it has a simple
Rcpp::sourceCpp("cpp/sample.cpp")
and in the that file inst/tinytest/cpp/sample.cpp we have the standard
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
to a) tell R to look at RcppArmadillo header directories and b) include the sampler extensions. This is how it works, and this has been documented to work for probably close to a decade.
As an example I can just do (in my $HOME directory containing git/rcpparmadillo)
> Rcpp::sourceCpp("git/rcpparmadillo/inst/tinytest/cpp/sample.cpp")
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
[1] 1 3 2 3 4 1 2 3 2 2
>
The later Rcpp addition works the same way, but I find working with parts of matrices to be more expressive and convenient with RcppArmadillo.
Edit: Even simpler for anybody with the RcppArmadillo package installed:
< library(Rcpp)
> sourceCpp(system.file("tinytest","cpp","sample.cpp", package="RcppArmadillo"))
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
[1] 1 3 2 3 4 1 2 3 2 2
>
Many thanks for the pointers. I also had some problems with indexing the matrix, so that part is changed, too. The following code works as intended (using sourceCpp()):
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){
int prows = p.nrow();
IntegerVector score(prows);
for(int i=0;i<prows;i++){
int step = steps[i];
IntegerVector scrs = seq(0,step);
NumericMatrix probs = p(Range(i,i),Range(0,step));
IntegerVector sc = RcppArmadillo::sample(scrs,1,true,probs);
score[i] = sc[0];
}
return score;
}

setting element name Rcpp error stack usage

In rcpp I want to create characterVector, with the vector variable set as character element
I tried with
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector assignName(){
CharacterVector rn={"a","b","c"};
rn.names()=rn;
return rn;
}
/***R
assignName()
m <- assignName()
m
*/
For example i have a CharacterVector rn as a,b,c.
rn should be set : a="a", b="b", c="c"
then in R after the call of this function as :
m<-assignName()
An error occurr :
Error: C stack usage 7969212 is too close to the limit
But if i do not assign the function to a variable all works, for example if i do :
>assignName()
a b c
"a""b""c"
I am not sure why this is the case, but it seems it is not a good idea to use the vector itself as name. You can fix this by using Rcpp::clone:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector assignName(){
CharacterVector rn={"a","b","c"};
// original rn.names()=rn;
rn.names()=clone(rn);
return rn;
}
/***R
assignName()
m <- assignName()
m
*/

Sequence of Integers in Rcpp

I want to create a sequence of integer numbers for indexing within a matrix. The R pendant would be:
indexRow <- max(0,1):min(2,12)
matrix1[indexRow, ]
This is what i have tried in Rcpp to create the sequence of integers:
#include <Rcpp.h>
#include <algorithm>
#include <vector>
#include <numeric>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
NumericVector test(NumericVector x) {
IntegerVector indexRow = Rcpp::seq_along(max(0, 1), min(1, 12));
}
However I get the Error message:
no matching function for call to 'seq_along(const int&, const int&)'
How can I create a sequence of integers in Rcpp?
Here is a possible Rcpp implementation :
library(Rcpp)
cppFunction(plugins='cpp11','NumericVector myseq(int &first, int &last) {
NumericVector y(abs(last - first) + 1);
if (first < last)
std::iota(y.begin(), y.end(), first);
else {
std::iota(y.begin(), y.end(), last);
std::reverse(y.begin(), y.end());
}
return y;
}')
#> myseq(max(0,1), min(13,17))
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13
This code generates a function myseq which takes two arguments: The first and the last number in an integer series. It is similar to R's seq function called with two integer arguments seq(first, last).
A documentation on the C++11 function std::iota is given here.
seq_along takes in a vector, what you want to use is seq combined with min and max, both take vectors. seq returns an IntegerVector. Here is an example.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector test(IntegerVector a, IntegerVector b) {
IntegerVector vec = seq(max(a), min(b));
return vec;
}
In R you use
> test(c(0,1), c(2,12))
[1] 1 2

memory efficient method to calculate distance matrix [duplicate]

I have an object of class big.matrix in R with dimension 778844 x 2. The values are all integers (kilometres). My objective is to calculate the Euclidean distance matrix using the big.matrix and have as a result an object of class big.matrix. I would like to know if there is an optimal way of doing that.
The reason for my choice of using the class big.matrix is memory limitation. I could transform my big.matrix to an object of class matrix and calculate the Euclidean distance matrix using dist(). However, dist() would return an object of size that would not be allocated in the memory.
Edit
The following answer was given by John W. Emerson, author and maintainer of the bigmemory package:
You could use big algebra I expect, but this would also be a very nice use case for Rcpp via sourceCpp(), and very short and easy. But in short, we don't even attempt to provide high-level features (other than the basics which we implemented as proof-of-concept). No single algorithm could cover all use cases once you start talking out-of-memory big.
Here is a way using RcppArmadillo. Much of this is very similar to the RcppGallery example. This will return a big.matrix with the associated pairwise (by row) euclidean distances. I like to wrap my big.matrix functions in a wrapper function to create a cleaner syntax (i.e. avoid the #address and other initializations.
Note - as we are using bigmemory (and therefore concerned with RAM usage) I have this example returned the N-1 x N-1 matrix of only lower triangular elements. You could modify this but this is what I threw together.
euc_dist.cpp
// To enable the functionality provided by Armadillo's various macros,
// simply include them before you include the RcppArmadillo headers.
#define ARMA_NO_DEBUG
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]
using namespace Rcpp;
using namespace arma;
// The following header file provides the definitions for the BigMatrix
// object
#include <bigmemory/BigMatrix.h>
// C++11 plugin
// [[Rcpp::plugins(cpp11)]]
template <typename T>
void BigArmaEuclidean(const Mat<T>& inBigMat, Mat<T> outBigMat) {
int W = inBigMat.n_rows;
for(int i = 0; i < W - 1; i++){
for(int j=i+1; j < W; j++){
outBigMat(j-1,i) = sqrt(sum(pow((inBigMat.row(i) - inBigMat.row(j)),2)));
}
}
}
// [[Rcpp::export]]
void BigArmaEuc(SEXP pInBigMat, SEXP pOutBigMat) {
// First we tell Rcpp that the object we've been given is an external
// pointer.
XPtr<BigMatrix> xpMat(pInBigMat);
XPtr<BigMatrix> xpOutMat(pOutBigMat);
int type = xpMat->matrix_type();
switch(type) {
case 1:
BigArmaEuclidean(
arma::Mat<char>((char *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<char>((char *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
case 2:
BigArmaEuclidean(
arma::Mat<short>((short *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<short>((short *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
case 4:
BigArmaEuclidean(
arma::Mat<int>((int *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<int>((int *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
case 8:
BigArmaEuclidean(
arma::Mat<double>((double *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<double>((double *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
default:
// We should never get here, but it resolves compiler warnings.
throw Rcpp::exception("Undefined type for provided big.matrix");
}
}
My little wrapper
bigMatrixEuc <- function(bigMat){
zeros <- big.matrix(nrow = nrow(bigMat)-1,
ncol = nrow(bigMat)-1,
init = 0,
type = typeof(bigMat))
BigArmaEuc(bigMat#address, zeros#address)
return(zeros)
}
The test
library(Rcpp)
sourceCpp("euc_dist.cpp")
library(bigmemory)
set.seed(123)
mat <- matrix(rnorm(16), 4)
bm <- as.big.matrix(mat)
# Call new euclidean function
bm_out <- bigMatrixEuc(bm)[]
# pull out the matrix elements for out purposes
distMat <- as.matrix(dist(mat))
distMat[upper.tri(distMat, diag=TRUE)] <- 0
distMat <- distMat[2:4, 1:3]
# check if identical
all.equal(bm_out, distMat, check.attributes = FALSE)
[1] TRUE

Rcpp: extract subset of matrix using indexmatrix

I have a question about subsetting from a matrix to a vector. The user has the possibility to explicitly give the indexmatrix (which is a matrix of the same size as M, with 0 if the entry is not wanted, and 1 if the entry has to be extracted). If the indexmatrix is provided, then we just subset it, and if the indexmatrix is not provided (indexmatrix = NULL), then we build it using type1 (which takes true or false). Only two types of indexmatrices are possible.
I used the subsetting technique provided in
Subset of a Rcpp Matrix that matches a logical statement
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
arma::colvec extractElementsRcpp(arma::mat M,
Rcpp::Nullable<Rcpp::NumericMatrix> indexmatrix = R_NilValue,
bool type1 = false) {
unsigned int D = M.n_rows; // dimension of the data
arma::mat indmatrix(D, D); // initialize indexmatrix
if (indexmatrix.isNotNull()) {
// copy indexmatrix to numericmatrix
Rcpp::NumericMatrix indexmatrixt(indexmatrix);
// make indexmatrix into arma matrix indmatrix
indmatrix = Rcpp::as<arma::mat>(indexmatrixt);
} //else {
// get indexmatrix
// Rcpp::NumericMatrix indexmatrixt = getindexmatrix(D, type1)["indexmatrix"];
// // make indexmatrix into arma matrix
// indmatrix = Rcpp::as<arma::mat>(indexmatrixt);
// }
arma::colvec unM = M.elem(find(indmatrix == 1)); // extract wanted elements
return(unM);
}
It works, great! However, the speed is not what I was hoping for. Whenever the indexmatrix is provided, the C++ code is slower than the normal R code, while I was aiming for a nice improvement in speed. I have the feeling I'm copying the matrices around too much. But I am new to C++ and did not find a way to avoid it yet.
The speed comparison is as follows:
test replications elapsed relative
2 extractElementsR(M, indexmatrix = ind) 100 0.084 1.00
1 extractElementsRcpp(M, indexmatrix = ind) 100 0.142 1.69
EDIT: The R function is defined as
extractElementsR <- function (M, indexmatrix, type1 = FALSE) {
D <- nrow(M)
# # get indexmatrix, if necessary
# if(is.null(indexmatrix)) indexmatrix <- getindexmatrix(D, type1 = type1)$indexmatrix
# extract wanted elements
return (M[which(indexmatrix > 0)])
}
One could for example take the matrices
M <- matrix(rnorm(1000^2), ncol = 1000)
indexmatrix <- matrix(1, 1000, 1000)
indexmatrix[lower.tri(indexmatrix)] <- 0
as M and indexmatrix.
EDIT2: I commented the else statement in the Rcpp function and omitted the default NULL value in the R function as it is not important for my question. I want to improve the speed of the Rcpp function when indexmatrix is provided. However, I want to keep the default NULL value (and create and indexmatrix when necessary).
Can you show
the function extractElementR() as well and
example data so that this become a reproducible example?
And at first blush, you are mixing Rcpp and RcppArmadillo types in order to subset with the latter. That will create lots of copies. We can now index with both Rcpp (and Kevin has some answers here) and RcppArmadillo (several older answers) so you could even try two different ways.

Resources