I'm new to C++ programming and apologize if the solution is in plain sight. I am attempting to use Rcpp to speed up a slow R function, and I think I've narrowed the issue down to a nested for loop. I've simplified the function and provided one R and one Rcpp version for comparison. Can someone explain why my Rcpp function yields different results? Thanks!
## Data ##
set.seed(666)
input <- rmultinom(10,2,c(.4,.5,.6)) + 1
## R ##
testR <- \(input){
  M1 <- matrix(c(0.5,0.4,0.0,0.3,0.5,0.0,0.2,0.1,1.0),3,3)
  M2 <- matrix(c(0.75,0.0,0.0,0.0,0.6,0.0,0.25,0.4,1.0),3,3)
  Mrows <- nrow(M1)
  tmsteps <- ncol(input)
  N <- nrow(input)
  alphas <- NULL; tmp <- NULL; out <- NULL
  for(i in 1:N){
    alphas = c(0,-1e6,-1e6)
    for(j in 1:tmsteps){
      for(k in 1:Mrows){
        tmp[k] = sum(alphas + M1[,k] + M2[k, input[i,j] ])
      }
      alphas <- tmp
    }
    out[i] <- sum(alphas)
  }
  sum(out)
}
## RCPP ##
cppFunction('double testRCPP(IntegerMatrix input){
  NumericVector v1 = {0.5,0.4,0.0,0.3,0.5,0.0,0.2,0.1,1.0};
  v1.attr("dim") = Dimension(3, 3);
  NumericMatrix M1 = as<NumericMatrix>(v1);
  NumericVector v2 = {0.75,0.0,0.0,0.0,0.6,0.0,0.25,0.4,1.0};
  v2.attr("dim") = Dimension(3, 3);
  NumericMatrix M2 = as<NumericMatrix>(v2);
  int Mrows = M1.nrow();
  int tmsteps = input.ncol();
  int N = input.nrow();
  NumericVector alphas(3);
  NumericVector tmp(3);
  NumericVector out(N);
  for(int i=0; i<N; i++){
    alphas = {0,-1e6,-1e6};
    for(int j=0; j<tmsteps; j++){
      for(int k=0; k<Mrows; k++){
        tmp[k] = sum(alphas + M1(_,k) + M2(k, (input(i,j) - 1) ));
      }
      alphas = tmp;
    }
    out += alphas;
  }
  return(sum(out));
}')
> testRCPP(input)
[1] -2.273726e+14
> testR(input)
[1] -354293536945
I have figured out how to get the Rcpp function to behave like the R function. I think my issue has to do with C++ variable scoping.
I had previously been initializing the tmp variable outside the nested for loop:
NumericVector tmp(3);
for(int i=0; i<N; i++){
  alphas = {0,-1e6,-1e6};
  ...
All is good when I declare the tmp variable inside the loop, although I don't yet understand why:
for(int i=0; i<N; i++){
  alphas = {0,-1e6,-1e6};
  for(int j=0; j<tmsteps; j++){
    NumericVector tmp(3);
    for(int k=0; k<Mrows; k++){
      tmp[k] = sum(alphas + M1(_,k) + M2(k, (input(i,j) - 1) ));
    }
    alphas = tmp;
  }
  ...
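My best guess as to why (happy to be corrected): Rcpp vectors have reference semantics, so alphas = tmp; does not copy the data -- after that assignment both names wrap the same underlying memory. On the next pass through the k loop, every write to tmp[k] therefore also changes alphas[k] mid-calculation, whereas in R alphas keeps the previous time step's values because R copies on modification. Declaring tmp inside the j loop creates a fresh vector each iteration, so alphas keeps referring to the untouched values from the previous step. An equivalent fix, sketched below but not verified against my real data (testRCPP2 is just a placeholder name), is to keep tmp outside the loops and break the aliasing with an explicit deep copy via clone():
cppFunction('double testRCPP2(IntegerMatrix input){
  NumericVector v1 = {0.5,0.4,0.0,0.3,0.5,0.0,0.2,0.1,1.0};
  v1.attr("dim") = Dimension(3, 3);
  NumericMatrix M1 = as<NumericMatrix>(v1);
  NumericVector v2 = {0.75,0.0,0.0,0.0,0.6,0.0,0.25,0.4,1.0};
  v2.attr("dim") = Dimension(3, 3);
  NumericMatrix M2 = as<NumericMatrix>(v2);
  int Mrows = M1.nrow();
  int tmsteps = input.ncol();
  int N = input.nrow();
  NumericVector alphas(3);
  NumericVector tmp(3);     // declared once, outside the loops
  NumericVector out(N);
  for(int i=0; i<N; i++){
    alphas = {0,-1e6,-1e6};
    for(int j=0; j<tmsteps; j++){
      for(int k=0; k<Mrows; k++){
        tmp[k] = sum(alphas + M1(_,k) + M2(k, (input(i,j) - 1) ));
      }
      alphas = clone(tmp);  // deep copy: later writes to tmp no longer touch alphas
    }
    out += alphas;
  }
  return(sum(out));
}')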
I have the following R code, which is not efficient. I would like to make it efficient using Rcpp. In particular, I am not used to dealing with arrays in Rcpp. Any help would be appreciated.
myfunc <- function(n = 1600,
                   m = 400,
                   p = 3,
                   time = runif(n, min = 0.05, max = 4),
                   qi21 = rnorm(n),
                   s0c = rnorm(n),
                   zc_min_ecox_multi = array(rnorm(n*n*p), dim = c(n,n,p)),
                   qi = matrix(0, n, n),
                   qi11 = rnorm(p),
                   iIc_mat = matrix(rnorm(p*p), p, p)) {
  for (j in 1:n) {
    u <- time[j]
    ind <- 1*(u <= time)
    locu <- which(time == u)
    qi2 <- sum(qi21*ind) / s0c[locu]
    for (i in 1:n) {
      qi1 <- qi11 %*% iIc_mat %*% matrix(zc_min_ecox_multi[i,j,], p, 1)
      qi[i,j] <- -(qi1 + qi2)/m
    }
  }
}
Computing time is about 7.35 seconds, and I need to call this function over and over again, maybe 20 times.
system.time(myfunc())
   user  system elapsed 
   7.34    0.00    7.35 
The first thing to do would be to profile your code: profvis::profvis({myfunc()}).
What you can do is precompute qi11 %*% iIc_mat once.
You get (with minor improvements):
precomp <- qi11 %*% iIc_mat

for (j in 1:n) {
  u <- time[j]
  qi2 <- sum(qi21[u <= time]) / s0c[time == u]
  for (i in 1:n) {
    qi1 <- precomp %*% zc_min_ecox_multi[i, j, ]
    qi[i, j] <- -(qi1 + qi2) / m
  }
}
That is twice as fast (8 sec -> 4 sec).
Vectorizing the i loop then seems straightforward:
q1_all_i <- tcrossprod(precomp, zc_min_ecox_multi[, j, ])
qi[, j] <- -(q1_all_i + qi2) / m
(12 times as fast now)
And if you want to try it in Rcpp, you will first need a function to multiply the matrices...
#include <Rcpp.h>
#include <numeric>

// [[Rcpp::plugins("cpp11")]]

Rcpp::NumericMatrix mult(const Rcpp::NumericMatrix& lhs,
                         const Rcpp::NumericMatrix& rhs)
{
  if (lhs.ncol() != rhs.nrow())
    Rcpp::stop("Incompatible matrices");

  Rcpp::NumericMatrix out(lhs.nrow(), rhs.ncol());
  Rcpp::NumericVector rowvec, colvec;

  for (int i = 0; i < lhs.nrow(); ++i)
  {
    rowvec = lhs(i, Rcpp::_);
    for (int j = 0; j < rhs.ncol(); ++j)
    {
      colvec = rhs(Rcpp::_, j);
      out(i, j) = std::inner_product(rowvec.begin(), rowvec.end(),
                                     colvec.begin(), 0.);
    }
  }
  return out;
}
Then port your function...
// [[Rcpp::export]]
Rcpp::NumericMatrix myfunc_rcpp(int n, int m, int p,
                                const Rcpp::NumericVector& time,
                                const Rcpp::NumericVector& qi21,
                                const Rcpp::NumericVector& s0c,
                                const Rcpp::NumericVector& zc_min_ecox_multi,
                                const Rcpp::NumericMatrix& qi11,
                                const Rcpp::NumericMatrix& iIc_mat)
{
  Rcpp::NumericMatrix qi(n, n);
  Rcpp::NumericMatrix outermat = mult(qi11, iIc_mat);

  for (int j = 0; j < n; ++j)
  {
    double qi2 = 0;
    for (int k = 0; k < n; ++k)
    {
      if (time[j] <= time[k]) qi2 += qi21[k];
    }
    qi2 /= s0c[j];

    for (int i = 0; i < n; ++i)
    {
      Rcpp::NumericMatrix tmpmat(p, 1);
      for (int z = 0; z < p; ++z)
      {
        tmpmat(z, 0) = zc_min_ecox_multi[i + n*j + z*n*n];
      }
      Rcpp::NumericMatrix qi1 = mult(outermat, tmpmat);
      qi(i, j) -= (qi1(0, 0) + qi2)/m;
    }
  }
  return qi;
}
Then in R:
my_rcpp_func <- function(n = 1600,
                         m = 400,
                         p = 3,
                         time = runif(n, min = 0.05, max = 4),
                         qi21 = rnorm(n),
                         s0c = rnorm(n),
                         zc_min_ecox_multi = array(rnorm(n*n*p), dim = c(n,n,p)),
                         qi11 = rnorm(p),
                         iIc_mat = matrix(rnorm(p*p), p, p))
{
  myfunc_rcpp(n, m, p, time, qi21, s0c, as.vector(zc_min_ecox_multi),
              matrix(qi11, 1, p), iIc_mat)
}
This is certainly faster, and it gives the same results as your own function, but it's no quicker than the in-R optimizations suggested by F. Privé. Maybe optimizing the C++ code could get things even faster, but ultimately you are multiplying two reasonably large matrices together over 2.5 million times, so it's never going to be all that fast. R is optimized pretty well for this kind of calculation, after all...
Is there a way to allocate an Rcpp List of length n, where each element of the List will be filled with a NumericMatrix, but the size of each NumericMatrix can change?
I have an idea for doing this using std::list and push_back(), but the size of the list may be quite large and I want to avoid the overhead of creating an extra copy of the list when I return from the function.
The below R code gives an idea of what I hope to do:
myvec = function(n) {
  x = vector("list", n)
  for (i in seq_len(n)) {
    nc = sample(1:3, 1)
    nr = sample(1:3, 1)
    x[[i]] = matrix(rbinom(nc * nr, size = 1, prob = 0.5),
                    nrow = nr, ncol = nc)
  }
  x
}
This could result in something like:
> myvec(2)
[[1]]
     [,1]
[1,]    0
[2,]    1

[[2]]
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    1    1
Update: based on the comments of @Dirk and @Ralf, I created functions based on Rcpp::List and std::list with a wrap at the end. Speed comparisons don't seem to favor one version over the other, but perhaps there's an inefficiency I'm not aware of.
src = '
#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::List myvec(int n) {
  Rcpp::RNGScope rngScope;
  Rcpp::List x(n);
  // Rcpp::IntegerVector choices = {1, 2 ,3};
  Rcpp::IntegerVector choices = Rcpp::seq_len(50);
  for (int i = 0; i < n; ++i) {
    int nc = Rcpp::sample(choices, 1).at(0);
    int nr = Rcpp::sample(choices, 1).at(0);
    Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
    x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
  }
  return x;
}

// [[Rcpp::export]]
Rcpp::List myvec2(int n) {
  Rcpp::RNGScope scope;
  std::list< Rcpp::NumericMatrix > x;
  // Rcpp::IntegerVector choices = {1, 2 ,3};
  Rcpp::IntegerVector choices = Rcpp::seq_len(50);
  for (int i = 0; i < n; ++i) {
    int nc = Rcpp::sample(choices, 1).at(0);
    int nr = Rcpp::sample(choices, 1).at(0);
    Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
    x.push_back( Rcpp::NumericMatrix(nc, nr, entries.begin()) );
  }
  return Rcpp::wrap(x);
}
'
sourceCpp(code = src)
Resulting benchmarks on my computer are:
> library(microbenchmark)
> rcpp_list = function() {
+ set.seed(10);myvec(105)
+ }
> std_list = function() {
+ set.seed(10);myvec2(105)
+ }
> microbenchmark(rcpp_list(), std_list(), times = 1000)
Unit: milliseconds
        expr    min      lq     mean  median      uq    max neval cld
 rcpp_list() 1.8901 1.92535 2.205286 1.96640 2.22380 7.1569  1000   a
  std_list() 1.9164 1.95570 2.224941 2.00555 2.32315 7.1194  1000   a
The fundamental issue is that Rcpp objects are R objects governed by R's memory management, where resizing is expensive: full copies.
So when I have tasks similar to yours where sizes may change, or are unknown, I often work with different data structures -- the STL gives us plenty -- and only convert to R(cpp) at the return step at the end.
The devil is in the details here (as always). Profile, experiment, ...
Edit: And in the narrower sense of "can we return a List of NumericMatrix objects with varying sizes" the answer is of course we can, because that is what List objects do. You can also insert other types.
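To make the copying cost concrete, here is a rough sketch (my own illustration with made-up names grow_rcpp and grow_stl, not benchmarked). Every push_back on an Rcpp::List allocates a new R list one element longer and copies the existing elements over, so growing it element by element is quadratic; growing an STL container and converting once with Rcpp::wrap() at the return step pays for a single conversion instead.
#include <Rcpp.h>
#include <vector>

// Growing the R-side container directly: each push_back re-allocates the
// whole list and copies the old elements, so this loop is O(n^2) overall.
// [[Rcpp::export]]
Rcpp::List grow_rcpp(int n) {
  Rcpp::List x;
  for (int i = 0; i < n; ++i) {
    x.push_back(Rcpp::NumericMatrix(2, 2));
  }
  return x;
}

// Growing an STL container and converting once at the end: amortized O(n)
// growth, plus one copy into an R list inside Rcpp::wrap().
// [[Rcpp::export]]
Rcpp::List grow_stl(int n) {
  std::vector<Rcpp::NumericMatrix> x;
  x.reserve(n);
  for (int i = 0; i < n; ++i) {
    x.push_back(Rcpp::NumericMatrix(2, 2));
  }
  return Rcpp::wrap(x);
}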
As Dirk said, it is of course possible to create a list with matrices of different size. To make it a bit more concrete, here a translation of your R function:
#include <Rcpp.h>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
  Rcpp::List x(n);
  Rcpp::IntegerVector choices = {1, 2, 3};
  for (int i = 0; i < n; ++i) {
    int nc = Rcpp::sample(choices, 1).at(0);
    int nr = Rcpp::sample(choices, 1).at(0);
    Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
    x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
  }
  return x;
}
/***R
myvec(2)
*/
The main difference from the R code is the explicitly named vectors choices and entries, which are only implicit in the R code.
I need to write to a file, row by row, matrices and sparse matrices that appear in a list, and I am doing something like this:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
bool write_rows(Rcpp::List data, Rcpp::CharacterVector clss, int n) {
  int len = data.length();
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < len; j++) {
      if (clss[j] == "matrix") {
        Rcpp::NumericMatrix x = data[j];
        auto row = x.row(i);
        // do something with row i
      } else if (clss[j] == "dgCMatrix") {
        arma::sp_mat x = data[j];
        auto row = x.row(i);
        // do something different with row i
      }
    }
  }
  return true;
}
This function can be called in R with:
data <- list(
  x = Matrix::rsparsematrix(nrow = 1000, ncol = 1000, density = 0.3),
  y = matrix(1:10000, nrow = 1000, ncol = 10)
)
clss <- c("dgCMatrix", "matrix")
write_rows(data, clss, 1000)
The function receives a list of matrices and sparse matrices with the same number of rows and writes those matrices row by row, i.e. it first writes the first row of every element of data, then the second row of every element, and so on.
My problem is that the line arma::sp_mat x = data[j]; seems to have a huge impact on performance, since I am implicitly casting the list element data[j] to an Armadillo sparse matrix n times.
My question is: is there any way I could avoid this? Is there a more efficient solution? I tried to find a solution by looking into readr's source code, since they also write list elements row by row, but they also do a cast for each row (in this line, for example), though maybe this doesn't impact performance because they deal with SEXPs?
With the clarification, it seems that the result should interleave the rows from each matrix. You can still do this while avoiding multiple conversions.
This is the original code, modified to generate some actual output:
// [[Rcpp::export]]
arma::mat write_rows(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
  int len = data.length();
  arma::mat result(nrows*len, ncols);

  for (int i = 0, k = 0; i < nrows; i++) {
    for (int j = 0; j < len; j++) {
      arma::rowvec r;
      if (clss[j] == "matrix") {
        Rcpp::NumericMatrix x = data[j];
        r = x.row(i);
      }
      else {
        arma::sp_mat x = data[j];
        r = x.row(i);
      }
      result.row(k++) = r;
    }
  }
  return result;
}
The following code creates a vector of converted objects, and then extracts the rows from each object as required. The conversion is only done once per matrix. I use a struct containing a dense and a sparse mat because it's a lot simpler than dealing with unions, and I don't want to drag in boost::variant or require C++17. Since there are only two classes we want to deal with, the overhead is minimal.
struct Matrix_types {
  arma::mat m;
  arma::sp_mat M;
};

// [[Rcpp::export]]
arma::mat write_rows2(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
  const int len = data.length();
  std::vector<Matrix_types> matr(len);
  std::vector<bool> is_dense(len);
  arma::mat result(nrows*len, ncols);

  // populate the structs
  for (int j = 0; j < len; j++) {
    is_dense[j] = (clss[j] == "matrix");
    if (is_dense[j]) {
      matr[j].m = Rcpp::as<arma::mat>(data[j]);
    }
    else {
      matr[j].M = Rcpp::as<arma::sp_mat>(data[j]);
    }
  }

  // populate the result
  for (int i = 0, k = 0; i < nrows; i++) {
    for (int j = 0; j < len; j++, k++) {
      if (is_dense[j]) {
        result.row(k) = matr[j].m.row(i);
      }
      else {
        arma::rowvec r(matr[j].M.row(i));
        result.row(k) = r;
      }
    }
  }
  return result;
}
Running on some test data:
data <- list(
  a = Matrix(1.0, 1000, 1000, sparse = TRUE),
  b = matrix(2.0, 1000, 1000),
  c = Matrix(3.0, 1000, 1000, sparse = TRUE),
  d = matrix(4.0, 1000, 1000)
)
system.time(z <- write_rows(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 185.75 35.04 221.38
system.time(z2 <- write_rows2(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 4.21 0.05 4.25
identical(z, z2)
# [1] TRUE
Today I was trying to debug my code and stumbled across something that renders my solutions useless. What I am generally trying to calculate is the multidimensional L2 norm for the following two matrices. As long as I am not using scale(), everything works fine. However, as soon as I scale the matrices, the solutions of the three approaches below are no longer the same. What am I missing here?
set.seed(655)
df.a <- data.frame(A = sample(100:124, 24), B = sample(1:24, 24), C = sample(1:24, 24), D = rep(0, times=24))
df.b <- data.frame(A = sample(125:148, 24), B = sample(25:48, 24), C = sample(1:24, 24), D = sample(1:100, 24))
For this I have three different approaches:
A sapply function and sqrt of rowSums:
sse <- function(x1, x2) sum((x1 - x2) ^ 2)
distanceChangeByTech <- function(x) {
  sse(df.a[,x], df.b[,x])
}
help1 <- t(data.frame(sapply(colnames(df.a), distanceChangeByTech)))
dist_sap <- sqrt(rowSums(help1))
Multidimensional Euclidean distance using Rcpp:
multiEucl <- cxxfunction(signature(x="matrix", y="matrix"), plugin="Rcpp",
  body='
    Rcpp::NumericMatrix dx(x);
    Rcpp::NumericMatrix dy(y);
    const int N = dx.nrow();
    const int M = dx.ncol();
    double sum = 0;
    for(int i=0; i<N; i++){
      for(int j=0; j<M; j++){
        sum = sum + pow(dx(i,j) - dy(i,j), 2);
      }
    }
    return wrap(sqrt(sum));
  ')
Multidimensional Lp-norm using Rcpp:
multiPNorm <- cxxfunction(signature(x="matrix", y="matrix", p="numeric"), plugin="Rcpp",
  body='
    Rcpp::NumericMatrix dx(x);
    Rcpp::NumericMatrix dy(y);
    double dp = Rcpp::as<double>(p);
    const int N = dx.nrow();
    const int M = dx.ncol();
    double sum = 0;
    double rsum = 0;
    for(int i=0; i<N; i++){
      for(int j=0; j<M; j++){
        sum = sum + pow(abs(dx(i,j) - dy(i,j)), dp);
      }
    }
    rsum = pow(sum, 1/dp);
    return wrap(rsum);
  ')
When I tried this at first, all worked well:
> multiEucl(as.matrix(df.a), as.matrix(df.b))
[1] 366.1543
> multiPNorm(as.matrix(df.a), as.matrix(df.b), 2)
[1] 366.1543
> sqrt(rowSums(help1))
sapply.colnames.df.a...distanceChangeByTech. 
                                     366.1543 
But as soon as I scale the matrices, which I want to do because I will do clustering based on these distance measures, there is a fault: the solutions are not the same anymore. What is causing this? I am using these commands to scale:
df.a <- as.data.frame(scale(df.a))
df.a[is.na(df.a)] <- 0
df.b <- as.data.frame(scale(df.b))
df.b[is.na(df.b)] <- 0
> multiEucl(as.matrix(df.a), as.matrix(df.b))
[1] 12.51781
> multiPNorm(as.matrix(df.a), as.matrix(df.b), 2)
[1] 8.944272
> sqrt(rowSums(help1))
sapply.colnames.df.a...distanceChangeByTech. 
                                     12.51781 
You used abs(), which is documented e.g. here, but you meant to use fabs(), which is documented here.
The <cmath> header provides overloaded versions of abs() as well, but you probably didn't include that.
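To spell out the consequence (a minimal sketch, not part of the original answer): the int abs() truncates the double difference toward zero before taking the absolute value, so after scale() the differences, which mostly fall in (-1, 1), contribute nothing to the sum; on the unscaled data the column differences happen to be whole numbers, which is presumably why all three approaches agreed before scaling. Keeping everything else in multiPNorm the same, the one-line change would be:
for(int i=0; i<N; i++){
  for(int j=0; j<M; j++){
    // fabs() works on doubles; plain abs() first truncates the difference
    // to an int, turning most scaled differences into 0
    sum = sum + pow(fabs(dx(i,j) - dy(i,j)), dp);
  }
}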
It seems that abs() is not doing the right thing here. Instead, I changed my coding of multiPNorm, and the changes seem to work.
multiPNorm <- cxxfunction(signature(x="matrix", y="matrix", p="numeric"), plugin="Rcpp",
  body='
    Rcpp::NumericMatrix dx(x);
    Rcpp::NumericMatrix dy(y);
    double dp = Rcpp::as<double>(p);
    const int N = dx.nrow();
    const int M = dx.ncol();
    double sum = 0;
    double rsum = 0;
    double help = 0;
    for(int i=0; i<N; i++){
      for(int j=0; j<M; j++){
        help = dx(i,j) - dy(i,j);
        if (help < 0) {
          help = -help;
        }
        sum = sum + pow(help, dp);
      }
    }
    rsum = pow(sum, 1/dp);
    return wrap(rsum);
  ')