How to make Rcpp code efficient with multiple for loops? - r

I am trying to implement following Rcpp code by calling from R. The computing time is extremely slow. There are lots of for loops involved.
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat qpart(
const int& n,
const int& p,
const int& m,
arma::vec& G,
arma::vec& ftime,
arma::vec& cause,
arma::mat& covs,
arma::mat& S1byS0hat,
arma::vec& S0hat,
arma::vec& expz){
arma::mat q(n,p);
q.zeros();
for(int u=0;u<n;++u){
arma::mat q1(1,p);
q1.zeros();
for(int iprime=0;iprime<n;++iprime){
for(int i=0;i<n;++i){
if(cause(iprime)==1 & cause(i)>1 & (ftime(i) < ftime(u)) & (ftime(u) <= ftime(iprime))){
q1 += (covs.row(i) - S1byS0hat.row(iprime))*G(iprime)/G(i)*expz(i)/S0hat(iprime);
}
}
}
q.row(u) = q1/(m*m);
}
return q;
}
Following is an example in R.
#### In R ########
n = 2000
m = 500
p=3
G = runif(n)
ftime = runif(n,0.01,5)
cause = c(rep(0,600),rep(1,1000),rep(2,400))
covs = matrix(rnorm(n*p),n,p)
S1byS0hat = matrix(rnorm(n*p),p,n)
S0hat = rnorm(n)
expz = rnorm(n)
system.time( qpart(n,p,m,G,ftime,cause,covs,t(S1byS0hat),S0hat,expz))
user system elapsed
21.5 0.0 21.5
As we can see, the computing time is very high.
Same code implemented in R and the computing time is very high.
q = matrix(0,n,p)
for(u in 1 : n){
q1 <- matrix(0,p,1)
for(iprime in 1 : n){
for(i in 1 : n){
if(cause[iprime]==1 & cause[i]>1 & (time[i]<time[u]) & (time[u] <= time[iprime])){
q1 = q1 + (covs[i,] - S1byS0hat[,iprime])*G[iprime]/G[i]*expz[i]/S0hat[iprime]
}
}
}
q[u,] = q1/(m*m)
}
Following is the formula that I am trying to implement.

Some conditions depends only on u and iprime so you can check them much before.
You can also precompute some stuff. This gives:
arma::mat qpart2(
double m,
arma::vec& ftime,
arma::vec& cause,
arma::mat& covs,
arma::mat& S1byS0hat,
arma::vec& G_div_S0hat,
arma::vec& expz_div_G){
double m2 = m * m;
int n = covs.n_rows;
int p = covs.n_cols;
arma::mat q(n, p, arma::fill::zeros);
for (int u = 0; u < n; u++) {
double ftime_u = ftime(u);
for (int iprime = 0; iprime < n; iprime++) {
if (cause(iprime) == 1 && ftime_u <= ftime(iprime)) {
for (int i = 0; i < n; i++) {
if (cause(i) > 1 && ftime(i) < ftime_u) {
double coef = G_div_S0hat(iprime) * expz_div_G(i);
for (int j = 0; j < p; j++) {
q(u, j) += (covs(i, j) - S1byS0hat(iprime, j)) * coef;
}
}
}
}
}
for (int j = 0; j < p; j++) q(u, j) /= m2;
}
return q;
}
Using qpart2(m, ftime, cause, covs, t(S1byS0hat), G / S0hat, expz / G) takes 3.7 sec (vs 32 sec for your code).
**
Small remarks:
Is there a reason why you are using arma structures instead of Rcpp ones?
You should access matrices by columns, not by rows, it should be a bit faster because they are stored column-wise.

Related

Is this correct in Rcpp?

I want to compare each columns, and return all the results after calculating. I try to write the codes, but the outcome was not resonable. Because if there are 5 columns in a matrix, the number of result will will be 5*4/2=10 rather than 5. I think the problem is the m in codes. I don't know whether it is correct. Thanks.
library(Rcpp)
sourceCpp(code='
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
double KS(arma::colvec x, arma::colvec y) {
int n = x.n_rows;
arma::colvec w = join_cols(x, y);
arma::uvec z = arma::sort_index(w);
w.fill(-1); w.elem( find(z <= n-1) ).ones();
return max(abs(cumsum(w)))/n;
}
// [[Rcpp::export]]
Rcpp::NumericVector K_S(arma::mat mt) {
int n = mt.n_cols;
int m = 1;
Rcpp::NumericVector results(n);
for (int i = 0; i < n-1; i++) {
for (int j = i+1; j < n; j++){
arma::colvec x=mt.col(i);
arma::colvec y=mt.col(j);
results[m] = KS(x, y);
m ++;
}
}
return results;
}
')
set.seed(1)
mt <- matrix(rnorm(400*5), ncol=5)
result <- K_S(t(mt))
> result
[1] 0.0000 0.1050 0.0675 0.0475 0.0650
You had a couple of small errors. In fixing it, an intermediate version I had just filled a similar n by n matrix -- that made indexing errors obvious. Returning an arma::rowvec also helps with possible out-of-bounds index errors (it errors by default) but lastly you (in this case !!) can actually just grow a std::vector instead.
Code
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
double KS(arma::colvec x, arma::colvec y) {
int n = x.n_rows;
arma::colvec w = join_cols(x, y);
arma::uvec z = arma::sort_index(w);
w.fill(-1); w.elem( find(z <= n-1) ).ones();
return max(abs(cumsum(w)))/n;
}
// [[Rcpp::export]]
std::vector<double> K_S(arma::mat mt) {
int n = mt.n_cols;
std::vector<double> res;
for (int i = 0; i < n; i++) {
for (int j = i+1; j < n; j++){
arma::colvec x=mt.col(i);
arma::colvec y=mt.col(j);
res.push_back(KS(x, y));
}
}
return res;
}
/*** R
set.seed(1)
mt <- matrix(rnorm(400*5), ncol=5)
result <- K_S(mt)
result
*/
Output
> Rcpp::sourceCpp("~/git/stackoverflow/73916783/answer.cpp")
> set.seed(1)
> mt <- matrix(rnorm(400*5), ncol=5)
> result <- K_S(mt)
> result
[1] 0.1050 0.0675 0.0475 0.0650 0.0500 0.0775 0.0575 0.0500 0.0475 0.0600
>

How to make R code with an array to be more efficient?

I have a following R code which is not efficient. I would like to make this efficient using Rcpp. Particularly, I am not used to dealing with array in Rcpp. Any help would be appreciated.
myfunc <- function(n=1600,
m=400,
p = 3,
time = runif(n,min=0.05,max=4),
qi21 = rnorm(n),
s0c = rnorm(n),
zc_min_ecox_multi = array(rnorm(n*n*p),dim=c(n,n,p)),
qi=matrix(0,n,n),
qi11 = rnorm(p),
iIc_mat = matrix(rnorm(p*p),p,p)){
for (j in 1:n){
u<-time[j]
ind<-1*(u<=time)
locu<-which(time==u)
qi2<- sum(qi21*ind) /s0c[locu]
for (i in 1:n){
qi1<- qi11%*%iIc_mat%*%matrix(zc_min_ecox_multi[i,j,],p,1)
qi[i,j]<- -(qi1+qi2)/m
}
}
}
Computing time is about 7.35 secs. I need to call this function over and over again, maybe 20 times.
system.time(myfunc())
user system elapsed
7.34 0.00 7.35
First thing to do would be to profile your code: profvis::profvis({myfunc()}).
What you can do is precompute qi11 %*% iIc_mat once.
You get (with minor improvements):
precomp <- qi11 %*% iIc_mat
for (j in 1:n) {
u <- time[j]
qi2 <- sum(qi21[u <= time]) / s0c[time == u]
for (i in 1:n) {
qi1 <- precomp %*% zc_min_ecox_multi[i, j, ]
qi[i, j] <- -(qi1 + qi2) / m
}
}
that is twice as fast (8 sec -> 4 sec).
Vectorizing the i loop then seems straightforward:
q1_all_i <- tcrossprod(precomp, zc_min_ecox_multi[, j, ])
qi[, j] <- -(q1_all_i + qi2) / m
(12 times as fast now)
And if you want to try it in Rcpp, you will first need a function to multiply the matrices...
#include<Rcpp.h>
#include<numeric>
// [[Rcpp::plugins("cpp11")]]
Rcpp::NumericMatrix mult(const Rcpp::NumericMatrix& lhs,
const Rcpp::NumericMatrix& rhs)
{
if (lhs.ncol() != rhs.nrow())
Rcpp::stop ("Incompatible matrices");
Rcpp::NumericMatrix out(lhs.nrow(),rhs.ncol());
Rcpp::NumericVector rowvec, colvec;
for (int i = 0; i < lhs.nrow(); ++i)
{
rowvec = lhs(i,Rcpp::_);
for (int j = 0; j < rhs.ncol(); ++j)
{
colvec = rhs(Rcpp::_,j);
out(i, j) = std::inner_product(rowvec.begin(), rowvec.end(),
colvec.begin(), 0.);
}
}
return out;
}
Then port your function...
// [[Rcpp::export]]
Rcpp::NumericMatrix myfunc_rcpp( int n, int m, int p,
const Rcpp::NumericVector& time,
const Rcpp::NumericVector& qi21,
const Rcpp::NumericVector& s0c,
const Rcpp::NumericVector& zc_min_ecox_multi,
const Rcpp::NumericMatrix& qi11,
const Rcpp::NumericMatrix& iIc_mat)
{
Rcpp::NumericMatrix qi(n, n);
Rcpp::NumericMatrix outermat = mult(qi11, iIc_mat);
for (int j = 0; j < n; ++j)
{
double qi2 = 0;
for(int k = 0; k < n; ++k)
{
if(time[j] <= time[k]) qi2 += qi21[k];
}
qi2 /= s0c[j];
for (int i = 0; i < n; ++i)
{
Rcpp::NumericMatrix tmpmat(p, 1);
for(int z = 0; z < p; ++z)
{
tmpmat(z, 0) = zc_min_ecox_multi[i + n*j + z*n*n];
}
Rcpp::NumericMatrix qi1 = mult(outermat, tmpmat);
qi(i,j) -= (qi1(0,0) + qi2)/m;
}
}
return qi;
}
Then in R:
my_rcpp_func <- function(n=1600,
m=400,
p = 3,
time = runif(n,min=0.05,max=4),
qi21 = rnorm(n),
s0c = rnorm(n),
zc_min_ecox_multi = array(rnorm(n*n*p),dim=c(n,n,p)),
qi11 = rnorm(p),
iIc_mat = matrix(rnorm(p*p),p,p))
{
myfunc_rcpp(n, m, p, time, qi21, s0c, as.vector(zc_min_ecox_multi),
matrix(qi11,1,p), iIc_mat)
}
This is certainly faster, and gives the same results as your own function, but it's no quicker than the in-R optimizations suggested by F Privé. Maybe optimizing the C++ code could get things even faster, but ultimately you are multiplying 2 reasonably large matrices together over 2.5 million times, so it's never going to be all that fast. R is optimized pretty well for this kind of calculation after all...

Returning bunch of matrices using RCPP in C++ in an efficient way using a list

I am trying to return a bunch of matrices using RCPP. My code below is extremely inefficient. I would like to know if the following code can be efficient.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List hello(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
Rcpp::List ht(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.fill(0.0);
for(int i = 0;i < n; ++i){
arma::mat h(p,1);
h.fill(0.0);
if (t > i){
for(int u=i;u <= t; ++u){
arma::rowvec zr = zc.rows(i,i);
h += exp(arma::as_scalar(g*zr.t())) * (zr.t() - S.cols(u,u))*dl(u);
}
}
hhat.cols(i,i) = h;
}
ht[t] = hhat;
}
// Specify list length
Rcpp::List res(1);
res[0] = ht;
return(res);
}
Here is the example.
g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(4800),nrow=3,ncol=1600)
dl=runif(1600)
z=matrix(runif(4800),nrow=1600,ncol=3)
ptm=proc.time();kkk= hello(g=g,n=n,p=p,S = S,zc=z,dl = dl);proc.time()-ptm;
user system elapsed
31.25 0.00 31.30
Any help would be appreciated.
Following the updated code. Initially I was returning list of a list. Now it returns a list. This reduces the computing time by 10 seconds. I hope this code can be improved further.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List hello(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
Rcpp::List ht(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
for(int i = 0;i < n; ++i){
arma::mat h(p,1);
// h.fill(0.0);
h.zeros();
if (t > i){
for(int u=i;u <= t; ++u){
//arma::rowvec zr = zc.rows(i,i);
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
}
}
hhat.col(i) = h;
}
ht[t] = hhat;
}
// Specify list length
// Rcpp::List res(1);
// res[0] = ht;
return(ht);
}
The formula that I am trying to implement is given below.
In my other answer I looked at the efficiency of returning data and at simple optimizations. Here I want to look at something different: Optimization of the algorithm.
You want to compute hhat(i, t) for 0 <= i, t < n and i < t. Looking at your formula we see that the dependency of hhat on i and t is very different. In particular, hhat(i, t + 1) can be written as hhat(i, t) + something. Right now your outer loop is over t and you are recomputing all these intermediate values. By switching the loop order, it is easy to do each such computation only once, bringing the algorithm down to a two nested loops. This means you have to generate the resulting matrices separately. And since you cannot store an arma::mat inside a Rcpp::List, I need an additional std::vector for storage:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::List hello(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
std::vector<arma::mat> foo(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
foo[t] = hhat;
}
for(int i = 0;i < n; ++i){
arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
for(int t=i+1; t < n;++t){
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
foo[t].col(i) = h;
}
}
Rcpp::List ht(n);
for(int t=0; t < n;++t){
ht[t] = foo[t];
}
return(ht);
}
// [[Rcpp::export]]
Rcpp::List hello_orig(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
Rcpp::List ht(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
for(int i = 0;i < n; ++i){
arma::mat h(p,1);
h.zeros();
if (t > i){
for(int u=i;u <= t; ++u){
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
}
}
hhat.col(i) = h;
}
ht[t] = hhat;
}
return(ht);
}
/***R
g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(p*n),nrow=p,ncol=n)
dl=runif(n)
z=matrix(runif(p*n),nrow=n,ncol=p)
bench::mark(hello_orig(g=g,n=n,p=p,S = S,zc=z,dl = dl),
hello(g=g,n=n,p=p,S = S,zc=z,dl = dl))
*/
Result:
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:> <bch:> <dbl> <bch:byt>
1 hello_orig(g = g, n = n, p = p, S = S, zc = z, dl = dl) 14.2s 14.2s 0.0703 58.7MB
2 hello(g = g, n = n, p = p, S = S, zc = z, dl = dl) 53.9ms 85.9ms 11.1 58.7MB
# … with 8 more variables: `gc/sec` <dbl>, n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
# result <list>, memory <list>, time <list>, gc <list>
More than a factor 100 faster!
You can get cleaner (and maybe even a bit faster code) by floowing #coatless' suggestions in the comments to use an arma::cube. The most compact form will give you a different return structure, though. Instead of a list of p x n you will get a p x n x n array:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::cube coatless(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
arma::cube ht(p, n, n);
ht.zeros();
for(int i = 0;i < n; ++i){
arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
for(int t=i+1; t < n;++t){
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
ht.slice(t).col(i) = h;
}
}
return(ht);
}
Your question title makes one think you see the problem in returning the data to R. Rest assured that this is not an issue. You can easily check this by calling a function that returns matrices of zeros in the required size:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List minimal(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
Rcpp::List ht(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
ht[t] = hhat;
}
return(ht);
}
On my system this function takes about 0.01 s with your input data. In other words, your real function spends most of its time on computing the actual results.
As for optimizing that part, it would be helpful if you could provide an idea of what you are trying to implement, e.g. with the help of mathematical formulas. As it stands, I can only do some simple changes:
In the i loop you only do something for t > i. Therefore it is sufficient to let the loop run till i < t.
The u loop can be formulated as a matrix-vector product, for which efficient implementations exist.
With changes like this I end up with
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::List hello(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
Rcpp::List ht(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
for(int i = 0;i < t; ++i){
arma::mat Sit = S.cols(i,t);
hhat.col(i) = - exp(arma::as_scalar(g*zc.row(i).t())) *
(Sit.each_col() - zc.row(i).t()) * dl.subvec(i,t).t();
}
ht[t] = hhat;
}
return(ht);
}
On my system this is about a factor of two faster than your code. It might well be possible to get even faster, though.

Unexpected behaviour in Rcpp

Please note that this error was taken from a bigger context, which I cannot obviously report here entirely.
I have the following functions in the file fun.cpp
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
arma::vec colMeans(arma::mat data){
int n_0 = data.n_rows;
arma::vec xbar(data.n_cols);
for(int i = 0; i < data.n_rows; i++){
for(int j = 0; j < data.n_cols; j++){
xbar[j] += data(i,j) /n_0;
}
}
return xbar;
}
// [[Rcpp::export]]
List PosteriorNIW(arma::mat data, arma::vec mu0, double lambda0,
double df0, arma::mat V){
// Compute posterior
int n = data.n_rows;
arma::vec xbar = colMeans(data);
double lambdan = lambda0 + n;
arma::vec mun = (lambda0 * mu0 + n * xbar) / lambdan;
arma::mat S;
S.zeros(data.n_cols, data.n_cols);
for(int i = 0; i < n; i++){
S += (arma::conv_to<arma::vec>::from(data.row(i)) - xbar) * arma::trans(arma::conv_to<arma::vec>::from(data.row(i)) - xbar);
}
arma::mat Vn = V + S + ((lambda0*n)/(lambda0 + n)) * (xbar - mu0) * arma::trans(xbar - mu0);
return List::create(_["mun"] = mun,
_["Vn"] = Vn,
_["lambdan"] = lambdan);
}
Calling now:
library(Rcpp); library(RcppArmadillo)
mu0 <- c(3,3)
V0 <- matrix(c(2.5,0.0,0.0,2.5), nrow = 2)
sourceCpp("fun.cpp")
data <- cbind(rep(5,15),rep(0,15))
PosteriorNIW(data, mu0, 1, 1, V0)
gives the expected result.
$mun
[,1]
[1,] 4.8750
[2,] 0.1875
$Vn
[,1] [,2]
[1,] 6.250 -5.6250
[2,] -5.625 10.9375
$lambdan
[1] 16
Now if I add to the file fun.cpp the following functions (again, these are taken from a bigger context so don't bother trying to understand but just paste them) strange things happens:
// [[Rcpp::export]]
NumericMatrix myFun(arma::mat t_dish, arma::cube data){
int l = 0;
for(int j = 0; j < data.n_rows; j++){
l++;
}
NumericMatrix Dk(l, 2);
return Dk;
}
// [[Rcpp::export]]
int myFun2(arma::cube n_cust){
arma::mat temp = n_cust.subcube(arma::span(0), arma::span(), arma::span());
int i;
for(i = 0; i < n_cust.n_cols; i++){
arma::rowvec temp2 = temp.row(i);
}
return i + 1;
}
// [[Rcpp::export]]
arma::vec myFun3(arma::mat k_tables){
arma::vec temp(k_tables.n_cols * k_tables.n_rows);
int l = 0;
if(!R_IsNA(k_tables(0,0))){
l++;
}
arma::vec temp2(l);
arma::vec tmp3 = sort(temp2);
return tmp3;
}
double myFun4(arma::vec x, double nu, arma::vec mu, arma::mat Sigma){
arma::vec product = (arma::trans(x - mu) * arma::inv(Sigma) * (x - mu));
double num = pow(1 + (1 / nu) * product[0], - ( nu + 2 ) / 2);
double den = pow(sqrt(M_PI * nu),2) * sqrt(arma::det(Sigma));
return num / den;
}
bool myFun5(NumericVector X, double z) {
return std::find(X.begin(), X.end(), z)!=X.end();
}
calling PosteriorNIW(data, mu0, 1, 1, V0) repeatedly starts giving different results every time. Note that there is no randomness in the functions and that obviously those functions have got no impact as they are not called in the original function.
I have tried on a different machine to make sure it was not a problem of my compiler but the error keeps happening.
I know that removing those function (even just one of them) fixes the problem but clearly this is not a feasible solution when I am working with more functions.
I would like to know if other users are able to replicate this behavior and if yes if there is a fix for it.
Thank you in advance
EDIT:
The version of R is 3.3.2 and Rtools is 3.4. Both Rcpp and RcppArmadillo are up-to-date
You're not zeroing xbar in your colMeans function. If I do do that:
arma::vec colMeans(arma::mat data){
int n_0 = data.n_rows;
arma::vec xbar;
xbar.zeros(data.n_cols);
for(int i = 0; i < data.n_rows; i++){
for(int j = 0; j < data.n_cols; j++){
xbar[j] += data(i,j) /n_0;
}
}
return xbar;
}
I get this everytime:
> PosteriorNIW(data, mu0, 1, 1.1, V0)
$mun
[,1]
[1,] 4.8750
[2,] 0.1875
$Vn
[,1] [,2]
[1,] 6.250 -5.6250
[2,] -5.625 10.9375
$lambdan
[1] 16
Even when I do add your extra block of code.
I don't know if these vectors are documented to be initialised to zero by their constructor (in which case this might be a bug there) or not, in which case its your bug!

RcppArmadillo on several cpu cores

I have the following RccpArmadillo function that runs fine if I execute it on one cpu core. But if I use several cores, then R will crash. All the other Rcpp functions I created so far run fine on several cores (with foreach), only RccpArmadillo seems to be problematic. Any ideas how to fix that?
cppFunction('double augmentedDickeyFullerCpp(NumericVector a, NumericVector b, double gamma, double mu, int lags) {
if (gamma < 0) {
return 0;
}
int n = a.size()-1;
int lags2 = lags + 1;
// first rows, then columns
NumericMatrix x(n-lags2,lags2);
NumericMatrix zdifflag(n-lags2+1,lags2);
NumericVector diff(n);
NumericVector zdiff(n-lags2+1);
NumericVector residuals(n+1);
residuals[0] = a[0] - gamma * b[0] - mu;
// residuals a is y and b is x
for(int i = 1; i < n+1; i++) {
residuals[i] = a[i] - gamma * b[i] - mu;
diff[i-1] = residuals[i] - residuals[i-1];
}
for(int i = 0; i < n-lags2+1; i++) {
zdifflag[0,i] = residuals[i+lags2-1];
}
for(int j = 0; j < n-lags2+1; j++) {
for(int i = 0; i < lags2; i++) {
x(j,i) = diff[j+lags2-1-i];
if (i > 0) {
zdifflag(j,i) = x(j,i);
}
}
zdiff[j] = x(j,0);
}
int length = zdifflag.nrow(), k = zdifflag.ncol();
arma::mat X(zdifflag.begin(), length, k, false); // reuses memory and avoids extra copy
arma::colvec y(zdiff.begin(), zdiff.size(), false);
arma::colvec coef = arma::solve(X, y); // fit model y ~ X
arma::colvec res = y - X*coef; // residuals
// std.errors of coefficients
//arma::colvec res = y - X*coef[0];
// sqrt(sum(residuals^2)/(length - k))
double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/(length - k);
arma::colvec std_err = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X)*X)));
return coef[0]/std_err[0];
}',depends = "RcppArmadillo", includes="#include <RcppArmadillo.h>")
I generally recommend putting the code into a small package, and having each parallel worker load the package. That is known to work, both in serial and parallel, whereas relying on cppFunction() for an ad-hoc function may be too fragile for parallel execution.

Resources