R: Fast cartesian product calculation of two numeric matrices - r

I have two large numeric matrices and want to calculate their cartesian product in R. Is there a way to do it with higher performance and lower memory usage than with my current approach?
EDIT: I added a Rcpp version, which already performs a lot better than my first only-R approach. Since I'm not experienced with Rcpp or RcppArmadillo: Is there a faster/more standardized way to write this Rcpp-function?
m1 <- matrix(sample(0:9, size=110000, replace = TRUE), ncol = 110)
m2 <- matrix(sample(0:9, size=110000, replace = TRUE), ncol = 110)
#Current approach:
m3 <- apply(m1, 1, function(x) x * t(m2))
matrix(m3, ncol = 110, byrow = TRUE)
#EDIT - Rcpp approach
library(Rcpp)
#assuming ncol(m1) == ncol(m2)
cppFunction('IntegerMatrix cartProd(IntegerMatrix m1, IntegerMatrix m2) {
int nrow1 = m1.nrow(), ncol = m1.ncol(), nrow2 = m2.nrow();
int orow = 0;
IntegerMatrix out(nrow1 * nrow2, ncol);
for (int r1 = 0; r1 < nrow1; r1++) {
for (int r2 = 0; r2 < nrow2; r2++) {
for (int c = 0; c < ncol; c++){
out(orow, c) = m1(r1, c) * m2(r2, c);
}
orow++;
}
}
return out;
}')
m5 <- cartProd(m1, m2)

The best approach as you have surmised is to use C++ to perform the cartesian product that you desire. Trying to port the code over to Armadillo will yield a minor speed up compared to the pure Rcpp version, which is significantly faster than the written R version. For details on how well each method did, see the benchmark section at the end.
The first version is almost a direct port into armadillo and actually performs slightly worse than initial pure Rcpp function. The second version uses armadillo's built in submatrix views and each_row() functions to exploit in place evaluation. To achieve parity with the Rcpp version, note the use of the pass-by-reference and the use of a signed integer type yielding const arma::imat&. This avoids a deep copy of the two large integer matrices as the types are matched and a reference is established.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]
// --- Version 1
// [[Rcpp::export]]
arma::imat cartProd_arma(const arma::imat& m1, const arma::imat& m2) {
int nrow1 = m1.n_rows, ncol = m1.n_cols, nrow2 = m2.n_rows, orow = 0;
arma::imat out(nrow1 * nrow2, ncol);
for (int r1 = 0; r1 < nrow1; ++r1) {
for (int r2 = 0; r2 < nrow2; ++r2) {
out.row(orow) = m1.row(r1) % m2.row(r2);
orow++;
}
}
return out;
}
// --- Version 2
// [[Rcpp::export]]
arma::imat cartProd_arma2(const arma::imat& m1, const arma::imat& m2) {
int nrow1 = m1.n_rows, ncol = m1.n_cols, nrow2 = m2.n_rows, orow = 0;
arma::imat out(nrow1 * nrow2, ncol);
for (int r1 = 0; r1 < nrow1; ++r1) {
out.submat(orow, 0, orow + nrow2 - 1, ncol - 1) = m1.row(r1) % m2.each_row();
orow += nrow2;
}
return out;
}
Quick verification of implementation details matching the initial product
all.equal( cartProd(m1, m2), cartProd_arma(m1, m2))
# [1] TRUE
all.equal( cartProd(m1, m2), cartProd_arma2(m1, m2))
# [1] TRUE
To generate the benchmarks, I tidied up the initial function slightly by pre-transposing the matrix to avoid multiple transpose calls each time apply was called per row. Furthermore, I've included the function trick showed by #user20650.
# OP's initial R only solution with slight modifications
op_R = function(m1, m2){
m2 <- t(m2)
m3 <- matrix(apply(m1, 1, function(x) x * m2), ncol = ncol(m1), byrow = TRUE)
}
# user20650's comment
so_comment <- function(m1, m2){
m4 <- matrix(rep(t(m1), each=nrow(m1)) * c(m2), ncol=nrow(m1))
}
As a result, we have the following microbenchmark
library("microbenchmark")
out <- microbenchmark(op_r = op_R(m1, m2), so_comment_r = so_comment(m1, m2),
rcpp = cartProd(m1, m2), arma_v1 = cartProd_arma(m1, m2),
arma_v2 = cartProd_arma2(m1, m2),
times = 50)
out
# Unit: milliseconds
# expr min lq mean median uq max neval
# op_r 1615.6572 1693.0526 1793.0515 1771.7353 1886.0988 2053.7050 50
# so_comment_r 2778.0971 2856.6429 2950.5837 2936.7459 3021.4249 3344.4401 50
# rcpp 463.6743 482.3118 565.0525 582.1660 614.3714 699.3516 50
# arma_v1 597.9004 620.5888 713.4101 726.7572 783.4225 820.3770 50
# arma_v2 384.7205 401.9744 490.5118 503.5007 574.6840 622.9876 50
So, from this, we can see that the cartProd_arma2, the submatrix armadillo implementation, is the best function followed closely by cartProd, the pure Rcpp implementation.

Related

How can I speed up my Rcpp code, which only carries out simple operations?

I'm trying to write a function that takes in a matrix and computes a value for every pair of columns. The matrix always has 2000 rows, but can potentially have a very large number of columns (up to 100,000 or so). The R code I started with is as follows:
x_dist <- data.frame(array(0,dim=c(ncol(x),ncol(x))))
cs <- colSums(x)
for (i in 1:ncol(x)) {
p_i <- x[,i]
for (j in 1:ncol(x)) {
p_j <- x[,j]
s <- p_i+p_j
fac <- cs[i]/(cs[i]+cs[j])
N1 <- fac*s
N2 <- (1-fac)*s
d1 <- (p_i+1)/(N1+1)
d2 <- (p_j+1)/(N2+1)
x_dist[i,j] <- sum(N1+N2-N1*d1-N2*d2+p_i*log(d1)+p_j*log(d2))
}
}
This function is quite slow. When there are only 400 columns in the matrix x, it takes about 32 seconds, and obviously grows quadratically in the number of columns.
Since I've heard Rcpp is good for speeding up for loops and matrix operations, I decided to give that a try. I am completely new to it, but ended up putting together the following function:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix wdist(NumericMatrix x) {
int nrow = x.nrow(),ncol=x.ncol();
NumericMatrix m = no_init_matrix(ncol,ncol);
NumericVector v1 = no_init_vector(nrow);
NumericVector v2 = no_init_vector(nrow);
NumericVector s = no_init_vector(nrow);
NumericVector N1 = no_init_vector(nrow);
NumericVector N2 = no_init_vector(nrow);
NumericVector d1 = no_init_vector(nrow);
NumericVector d2 = no_init_vector(nrow);
for(int i=0; i<ncol; ++i){
v1 = x(_,i);
for(int j=0; j<i; ++j){
v2 = x(_,j);
s = v1+v2;
N1 = sum(v1)*s/(sum(v1)+sum(v2));
N2 = s-N1;
d1 = (v1+1)/(N1+1);
d2 = (v2+1)/(N2+1);
m(i,j) = sum(N1+N2-N1*d1-N2*d2+v1*log(d1)+v2*log(d2));
}
}
return m;
}
This certainly makes a big difference. Now with 400 columns, this takes about 8 seconds. I am pleased by the improvement, but this is still intractably slow for my current test case of interest, which is 32,000 columns. I feel like I am doing some relatively simple operations, so it's confusing to me why my code is still this slow. I've tried to do some reading on writing efficient Rcpp code, but haven't found anything that helps address my issue. Please let me know if there is anything I'm doing wrong or any improvements I can look into to make my code faster (or even the R code itself, if that can be made faster than the Rcpp code!)
Some example data could be:
set.seed(121220)
x <- array(rpois(2000*400,3),dim=c(2000,400))
I refactored your base R code and hope it could speed up somewhat
f <- function(...) {
p <- x[, t(...)]
N <- matrix(rowSums(p), ncol = 1) %*% colSums(p) / sum(p)
d <- (p + 1) / (N + 1)
sum(N - N * d + p * log(d))
}
x_dist <- diag(0, ncol(x))
x_dist[lower.tri(x_dist)] <- combn(ncol(x), 2, FUN = f)
x_dist <- pmax(x_dist, t(x_dist))
To speed up your Rcpp code, you can try the following nested for loops after initializing your matrix m as a all-zero matrix:
for(int i=0; i<ncol-1; ++i){
v1 = x(_,i);
for(int j=i+1; j<ncol; ++j){
v2 = x(_,j);
s = v1+v2;
N1 = sum(v1)*s/sum(s);
N2 = s-N1;
d1 = (v1+1)/(N1+1);
d2 = (v2+1)/(N2+1);
val = sum(N1+N2-N1*d1-N2*d2+v1*log(d1)+v2*log(d2));
m(i,j) = val;
m(j,i) = val;
}
}
which applies the property that the matrix is symmetry and thus reduce computational complexity by half.

Rcpp use outer with pmax

I have got an R function which I need to calculate approximately one million times for vectors of length ~ 5000. Is there any possibily to speed it up by implementing it in Rcpp? I hardly worked with Rcpp before and the code below does not to work:
set.seet(1)
a <- rt(5e3, df = 2)
b <- rt(5e3, df = 2.5)
c <- rt(5e3, df = 3)
d <- rt(5e3, df = 3.5)
sum((1 - outer(a, b, pmax)) * (1 - outer(c, d, pmax)))
#[1] -367780.1
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double f_outer(NumericVector u, NumericVector v, NumericVector x, NumericVector y) {
double result = sum((1 - Rcpp::outer(u, v, Rcpp::pmax)) * (1 - Rcpp::outer(x, y, Rcpp::pmax)));
return(result);
}
Thank you very much!
F. Privé is right -- we'll want to go with loops here; I've got the following C++ code in a file so-answer.cpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double f_outer(NumericVector u, NumericVector v, NumericVector x, NumericVector y) {
// We'll use the size of the first and second vectors for our for loops
int n = u.size();
int m = v.size();
// Make sure the vectors are appropriately sized for what we're doing
if ( (n != x.size() ) || ( m != y.size() ) ) {
::Rf_error("Vectors not of compatible sizes.");
}
// Initialize a result variable
double result = 0.0;
// And use loops instead of outer
for ( int i = 0; i < n; ++i ) {
for ( int j = 0; j < m; ++j ) {
result += (1 - std::max(u[i], v[j])) * (1 - std::max(x[i], y[j]));
}
}
// Then return the result
return result;
}
Then we see in R that the C++ code gives the same answer as your R code, but runs much faster:
library(Rcpp) # for sourceCpp()
library(microbenchmark) # for microbenchmark() (for benchmarking)
sourceCpp("so-answer.cpp") # compile our C++ code and make it available in R
set.seed(1) # for reproducibility
a <- rt(5e3, df = 2)
b <- rt(5e3, df = 2.5)
c <- rt(5e3, df = 3)
d <- rt(5e3, df = 3.5)
sum((1 - outer(a, b, pmax)) * (1 - outer(c, d, pmax)))
#> [1] -69677.99
f_outer(a, b, c, d)
#> [1] -69677.99
# Same answer, so looking good. Which one's faster?
microbenchmark(base = sum((1 - outer(a, b, pmax)) * (1 - outer(c, d, pmax))),
rcpp = f_outer(a, b, c, d))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> base 3978.9201 4119.6757 4197.9292 4131.3300 4144.4524 10121.5558 100
#> rcpp 118.8963 119.1531 129.4071 119.4767 122.5218 909.2744 100
#> cld
#> b
#> a
Created on 2018-12-13 by the reprex package (v0.2.1)

R recursive function or loop in loop

simple problem.
I want to check if the difference of two points (i, j) is greater than a threshold (diff).
If the difference between the points exceeds the threshold the index should be returned and the next distance is measured but from the new datapoint. It is a simple cutofffilter where all datapoints under a predefined threshold are filtered. The only trick is, that the measurement is performed from always the "last" point (that was "far enough away" from the point before).
I first wrote it as two nested loops like:
x <- sample(1:100)
for(i in 1:(length(x)-1)){
for(j in (i+1):length(x)){
if(abs(x[i] - x[j]) >= cutoff) {
print(j)
i <- j # set the index to the current datapoint
break }
}}
This solution is kind of intuitive. But does not work proper. I think the assignment of i and j is not valid. The first loop just ignores to jump and loops through all datapoints.
Well, I did not want to waste time with debugging and just thought I can do the same with a recursive function.
So I wrote it like:
checkCutOff.f <- function(x,cutoff,i = 1) {
options(expressions=500000)
# Loops through the data and comperes the temporally fixed point 'i with the looping points 'j
for(j in (i+1):length(x)){
if( abs(x[i] - x[j]) >= cutoff ){
break
}
}
# Recursive function to update the new 'i - stops at the end of the dataset
if( j<length(x) ) return(c(j,checkCutOff.f(x,cutoff,j)))
else return(j)
}
x<-sample(1:100000)
checkCutOff.f(x,1)
This code works. But I get a stack overflow with big datasets. That's why I ask myself if this code is efficient.
For me is increasing limits etc. always a hint for inefficient code...
So my question is:
What kind of solution is really efficient?
Thanks!
You should avoid growing your return value with c. That's inefficient. Allocate to the maximum size and subset to the needed size in the end.
Note that your function always includes length(x) in your result, which is wrong:
set.seed(42)
x<-sample(1:10)
checkCutOff.f(x, 100)
#[1] 10
Here is an R solution with a loop:
checkCutOff.f1 <- function(x,cutoff) {
i <- 1
j <- 1
k <- 1
result <- integer(length(x))
while(j < length(x)) {
j <- j + 1
if (abs(x[i] - x[j]) >= cutoff) {
result[k] <- j
k <- k + 1
i <- j
}
}
result[seq_len(k - 1)]
}
all.equal(checkCutOff.f(x, 4), checkCutOff.f1(x, 4))
#[1] TRUE
#the correct solution includes length(x) here (by chance)
It's easy to translate to Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector checkCutOff_f1cpp(NumericVector x, double cutoff) {
int i = 0;
int j = 1;
int k = 0;
IntegerVector result(x.size());
while(j < x.size()) {
if (std::abs(x[i] - x[j]) >= cutoff) {
result[k] = j + 1;
k++;
i = j;
}
j++;
}
result = result[seq_len(k)-1];
return result;
}
Then in R:
all.equal(checkCutOff.f(x, 4), checkCutOff_f1cpp(x, 4))
#[1] TRUE
Benchmarks:
library(microbenchmark)
y <- sample(1:1000)
microbenchmark(
checkCutOff.f(y, 4),
checkCutOff.f1(y, 4),
checkCutOff_f1cpp(y, 4)
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# checkCutOff.f(y, 4) 3665.105 4681.6005 7798.41776 5323.068 6635.9205 41028.930 100 c
# checkCutOff.f1(y, 4) 1384.524 1507.2635 1831.43236 1769.031 2070.7225 3012.279 100 b
# checkCutOff_f1cpp(y, 4) 8.765 10.7035 26.40709 14.240 18.0005 587.958 100 a
I'm sure this can be improved further and more testing should be done.

Efficiency and Speed of R code using Rcpp

This post is about speeding up R code using Rcpp package to avoid recursive loops.
My input is define by the following example (length 7) which is part of the data.frame (length 51673) that I used :
S=c(906.65,906.65,906.65,906.65,906.65,906.65,906.65)
T=c(0.1371253,0.1457896,0.1248953,0.1261278,0.1156931,0.0985253,0.1332596)
r=c(0.013975,0.013975,0.013975,0.013975,0.013975,0.013975,0.013975)
h=c(0.001332596,0.001248470,0.001251458,0.001242143,0.001257921,0.001235755,0.001238440)
P=c(3,1,5,2,1,4,2)
A= data.frame(S=S,T=T,r=r,h=h,P=P)
S T r h Per
1 906.65 0.1971253 0.013975 0.001332596 3
2 906.65 0.1971253 0.013975 0.001248470 1
3 906.65 0.1971253 0.013975 0.001251458 5
4 906.65 0.1971253 0.013975 0.001242143 2
5 906.65 0.1971253 0.013975 0.001257921 1
6 906.65 0.1971253 0.013975 0.001235755 4
7 906.65 0.1971253 0.013975 0.001238440 2
The parameters are :
w=0.001; b=0.2; a=0.0154; c=0.0000052; neta=-0.70
I have the following code of the function that I want to use :
F<-function(x,w,b,a,c,neta,S,T,r,P){
u=1i*x
nu=(1/(neta^2))*(((1-2*neta)^(1/2))-1)
# Recursion back to time t
# Terminal condition for the A and B
A_Q=0
B_Q=0
steps<-round(T*250,0)
for (j in 1:steps){
A_Q= A_Q+ r*u + w*B_Q-(1/2)*log(1-2*a*(neta^4)*B_Q)
B_Q= b*B_Q+u*nu+ (1/neta^2)*(1-sqrt((1-2*a*(neta^4)*B_Q)*( 1- 2*c*B_Q - 2*u*neta)))
}
F= exp(log(S)*u + A_Q + B_Q*h[P])
return(F)
}
S = A$S ; r= A$r ; T= A$T ; P=A$P; h= A$h
Then I want to apply the previous function using my Data.set a the vector of length N= 100000 :
Z=length(S); N=100000 ; alpha=2 ; delta= 0.25
lambda=(2*pi)/(N*delta)
res = matrix(nrow=N, ncol=Z)
for (i in 1:N){
for (j in 1:Z){
res[i,j]= Re(F(((delta*(i-1))-(alpha+1)*1i),w,b,a,c,neta,S[j],T[j],r[j],P[j]))
}
}
But it is taking a lot of time: it takes 20 seconds to execute this line of code for N=100 but I want to execute it for N= 100000 times, the overall run time can take hours. How to fine tune the above code using Rcpp, to reduce the execution time and to obtain an Efficient program?
Is it possible to reduce the execution time and if so, please suggest me a solution even with out Rcpp.
Thanks.
Your function F can be converted to C++ pretty easily by taking advantage of the vec and cx_vec classes in the Armadillo library (accessed through the RcppArmadillo package) - which has great support for vectorized calculations.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::cx_vec Fcpp(const arma::cx_vec& x, double w, double b, double a, double c,
double neta, const arma::vec& S, const arma::vec& T,
const arma::vec& r, Rcpp::IntegerVector P, Rcpp::NumericVector h) {
arma::cx_vec u = x * arma::cx_double(0.0,1.0);
double nu = (1.0/std::pow(neta,2.0)) * (std::sqrt(1.0-2.0*neta)-1.0);
arma::cx_vec A_Q(r.size());
arma::cx_vec B_Q(r.size());
arma::vec steps = arma::round(T*250.0);
for (size_t j = 0; j < steps.size(); j++) {
for (size_t k = 0; k < steps[j]; k++) {
A_Q = A_Q + r*u + w*B_Q -
0.5*arma::log(1.0 - 2.0*a*std::pow(neta,4.0)*B_Q);
B_Q = b*B_Q + u*nu + (1.0/std::pow(neta,2.0)) *
(1.0 - arma::sqrt((1.0 - 2.0*a*std::pow(neta,4.0)*B_Q) *
(1.0 - 2.0*c*B_Q - 2.0*u*neta)));
}
}
arma::vec hP = Rcpp::as<arma::vec>(h[P-1]);
arma::cx_vec F = arma::exp(arma::log(S)*u + A_Q + B_Q*hP);
return F;
}
Just a couple of minor changes to note:
I'm using arma:: functions for vectorized calculations, such as arma::log, arma::exp, arma::round, arma::sqrt, and various overloaded operators (*, +, -); but using std::pow and std::sqrt for scalar calculations. In R, this is abstracted away from us, but here we have to distinguish between the two situations.
Your function F has one loop - for (i in 1:steps) - but the C++ version has two, just due to the differences in loop semantics between the two languages.
Most of the input vectors are arma:: classes (as opposed to using Rcpp::NumericVector and Rcpp::ComplexVector), the exception being P and h, since Rcpp vectors offer R-like element access - e.g. h[P-1]. Also notice that P needs to be offset by 1 (0-based indexing in C++), and then converted to an Armadillo vector (hP) using Rcpp::as<arma::vec>, since your compiler will complain if you try to multiply a cx_vec with a NumericVector (B_Q*hP).
I added a function parameter h - it's not a good idea to rely on the existence of a global variable h, which you were doing in F. If you need to use it in the function body, you should pass it into the function.
I changed the name of your function to Fr, and to make benchmarking a little easier, I just wrapped your double loop that populates the matrix res into the functions Fr and Fcpp:
loop_Fr <- function(mat = res) {
for (i in 1:N) {
for (j in 1:Z) {
mat[i,j]= Re(Fr(((delta*(i-1))-(alpha+1)*1i),w,b,a,c,neta,S[j],T[j],r[j],P[j],h))
}
}
return(mat)
}
loop_Fcpp <- function(mat = res) {
for (i in 1:N) {
for (j in 1:Z) {
mat[i,j]= Re(Fcpp(((delta*(i-1))-(alpha+1)*1i),w,b,a,c,neta,S[j],T[j],r[j],P[j],h))
}
}
return(mat)
}
##
R> all.equal(loop_Fr(),loop_Fcpp())
[1] TRUE
I compared the two functions for N = 100, N = 1000, and N = 100000 (which took forever) - adjusting lambda and res accordingly, but keeping everything else the same. Generally speaking, Fcpp is about 10x faster than Fr on my computer:
N <- 100
lambda <- (2*pi)/(N*delta)
res <- matrix(nrow=N, ncol=Z)
##
R> microbenchmark::microbenchmark(loop_Fr(), loop_Fcpp(),times=50L)
Unit: milliseconds
expr min lq median uq max neval
loop_Fr() 142.44694 146.62848 148.97571 151.86318 186.67296 50
loop_Fcpp() 14.72357 15.26384 15.58604 15.85076 20.19576 50
N <- 1000
lambda <- (2*pi)/(N*delta)
res <- matrix(nrow=N, ncol=Z)
##
R> microbenchmark::microbenchmark(loop_Fr(), loop_Fcpp(),times=50L)
Unit: milliseconds
expr min lq median uq max neval
loop_Fr() 1440.8277 1472.4429 1491.5577 1512.5636 1565.6914 50
loop_Fcpp() 150.6538 153.2687 155.4156 158.0857 181.8452 50
N <- 100000
lambda <- (2*pi)/(N*delta)
res <- matrix(nrow=N, ncol=Z)
##
R> microbenchmark::microbenchmark(loop_Fr(), loop_Fcpp(),times=2L)
Unit: seconds
expr min lq median uq max neval
loop_Fr() 150.14978 150.14978 150.33752 150.52526 150.52526 2
loop_Fcpp() 15.49946 15.49946 15.75321 16.00696 16.00696 2
Other variables, as presented in your question:
S <- c(906.65,906.65,906.65,906.65,906.65,906.65,906.65)
T <- c(0.1371253,0.1457896,0.1248953,0.1261278,0.1156931,0.0985253,0.1332596)
r <- c(0.013975,0.013975,0.013975,0.013975,0.013975,0.013975,0.013975)
h <- c(0.001332596,0.001248470,0.001251458,0.001242143,0.001257921,0.001235755,0.001238440)
P <- c(3,1,5,2,1,4,2)
w <- 0.001; b <- 0.2; a <- 0.0154; c <- 0.0000052; neta <- (-0.70)
Z <- length(S)
alpha <- 2; delta <- 0.25

%dopar% or alternative method to speed up sequential stochastic calculation

I have written a stochastic process simulator but I would like to speed it up since it's pretty slow.
The main part of the simulator is made of a for loop which I would like to re-write as a foreach with `%dopar%.
I have tried doing so with a simplified loop but I'm running into some problems. Suppose my for loop looks like this
library(foreach)
r=0
t<-rep(0,500)
for(n in 1:500){
s<-1/2+r
u<-runif(1, min = 0, max = 1)
if(u<s){
t[n]<-u
r<-r+0.001
}else{r<-r-0.001}
}
which means that at each iteration I update the value of r and s and, in one of the two outcomes, populate my vector t. I have tried several different ways of re-writing it as a foreach loop but it seems like with each iteration my values don't get updated and I get some pretty strange results. I have tried using return but it doesn't seem to work!
This is an example of what I have come up with.
rr=0
tt<-foreach(i=1:500, .combine=c) %dopar% {
ss<-1/2+rr
uu<-runif(1, min = 0, max = 1)
if(uu<=ss){
return(uu)
rr<-rr+0.001
}else{
return(0)
rr<-rr-0.001}
}
If it is impossible to use foreach what other way is there for me to re-write the loop so to be able to use all cores and speed up things?
Since your comments, about turning to C, were encouraging and -mostly- to prove that this isn't a hard task (especially for such operations) and it's worth looking into, here is a comparison of two sample functions that accept a number of iterations and perform the steps of your loop:
ffR = function(n)
{
r = 0
t = rep(0, n)
for(i in 1:n) {
s = 1/2 + r
u = runif(1)
if(u < s) {
t[i] = u
r = r + 0.001
} else r = r - 0.001
}
return(t)
}
ffC = inline::cfunction(sig = c(R_n = "integer"), body = '
int n = INTEGER(AS_INTEGER(R_n))[0];
SEXP ans;
PROTECT(ans = allocVector(REALSXP, n));
double r = 0.0, s, u, *pans = REAL(ans);
GetRNGstate();
for(int i = 0; i < n; i++) {
s = 0.5 + r;
u = runif(0.0, 1.0);
if(u < s) {
pans[i] = u;
r += 0.001;
} else {
pans[i] = 0.0;
r -= 0.001;
}
}
PutRNGstate();
UNPROTECT(1);
return(ans);
', includes = "#include <Rmath.h>")
A comparison of results:
set.seed(007); ffR(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939
set.seed(007); ffC(5)
#[1] 0.00000000 0.39774545 0.11569778 0.06974868 0.24374939
A comparison of speed:
microbenchmark::microbenchmark(ffR(1e5), ffC(1e5), times = 20)
#Unit: milliseconds
# expr min lq median uq max neval
# ffR(1e+05) 497.524808 519.692781 537.427332 668.875402 692.598785 20
# ffC(1e+05) 2.916289 3.019473 3.133967 3.445257 4.076541 20
And for the sake of completeness:
set.seed(101); ans1 = ffR(1e5)
set.seed(101); ans2 = ffC(1e5)
all.equal(ans1, ans2)
#[1] TRUE
Hope any of this could be helpful in some way.
What you are trying to do, since every iteration is dependent on the previous steps of the loop, doesn't seem to be parallelizable. You are updating the variable r and expecting other branches that are running simultaneously to know about it, and in fact wait for the update to happen, which
1) Doesn't happen. They won't wait, they'll just take r's current value whatever that is at the time they are running
2) If it did it would be same as running it without %dopar%

Resources