How to vectorize triple nested loops in R?

I've searched for similar problems and I have a vague idea of what I should do: vectorize everything or use the apply() family. But I'm a beginner at R programming and both of these methods are quite confusing.
Here is my source code:
x <- rlnorm(100, 0, 1.6)
j <- 0
k <- 0
i <- 0
h <- 0
lambda <- rep(0, 200)
sum1 <- rep(0, 200)
constjk <- 0
wj <- 0
wk <- 0
for (h in 1:200) {
  lambda[h] <- 2 + h/12.5
  N <- ceiling(lambda[h]*max(x))
  for (j in 0:N) {
    wj <- (sum(x <= (j+1)/lambda[h]) - sum(x <= j/lambda[h]))/100
    for (k in 0:N) {
      constjk <- dbinom(k, j + k, 0.5)
      wk <- (sum(x <= (k+1)/lambda[h]) - sum(x <= k/lambda[h]))/100
      sum1[h] <- sum1[h] + (lambda[h]/2)*constjk*wk*wj
    }
  }
}
Let me explain a bit. I want to collect 200 sum1 values (hence the outer loop), and each sum1 value is the sum of (lambda[h]/2)*constjk*wk*wj over j and k, hence the two inner loops. The tedious part is that N changes with h, so I have no idea how to vectorize the j-loop and the k-loop. Of course I can vectorize the h-loop with lambda<-seq() and N<-ceiling(), but that's the best I can do. Is there a way to simplify the code further?

Your code can be perfectly vectorized with three nested sapply calls. It might be a bit hard to read for the untrained eye, but the essence of it is that instead of adding one value at a time to sum1[h], we calculate all the terms produced by the innermost loop in one go and sum them up.
Although this vectorized solution is faster than your triple for loop, the improvement is not dramatic. If you plan to use it many times, I suggest you implement it in C or Fortran (with regular for loops), which improves the speed a lot. Beware though that the algorithm has high time complexity and will scale badly with increasing values of lambda, ultimately reaching a point where it cannot be computed within reasonable time regardless of the implementation.
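To make the compiled route concrete, here is a minimal sketch using Rcpp in place of hand-written C or Fortran (my addition, not part of the original answer; sum1Cpp is a made-up name, and caching the bin weights w per h is an extra optimization on top of the plain triple loop):
library(Rcpp)
cppFunction('
NumericVector sum1Cpp(NumericVector x, NumericVector lambda) {
  int H = lambda.size(), n = x.size();
  double xmax = max(x);                      // Rcpp sugar max()
  NumericVector out(H);
  for (int h = 0; h < H; ++h) {
    double l = lambda[h];
    int N = (int) std::ceil(l * xmax);
    // w[m] = fraction of x in (m/l, (m+1)/l], the same quantity as wj/wk
    std::vector<double> w(N + 1, 0.0);
    for (int i = 0; i < n; ++i) {
      int m = (int) std::ceil(x[i] * l) - 1; // bin index, exact at boundaries
      if (m >= 0 && m <= N) w[m] += 1.0;
    }
    for (int m = 0; m <= N; ++m) w[m] /= n;
    double s = 0.0;
    for (int j = 0; j <= N; ++j)
      for (int k = 0; k <= N; ++k)
        s += 0.5 * l * R::dbinom(k, j + k, 0.5, 0) * w[k] * w[j];
    out[h] = s;
  }
  return out;
}')
sum1 <- sum1Cpp(x, 2 + 1:200/12.5)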
lambda <- 2 + 1:200/12.5
sum1 <- sapply(lambda, function(l) {
  N <- ceiling(l*max(x))
  sum(sapply(0:N, function(j) {
    wj <- (sum(x <= (j+1)/l) - sum(x <= j/l))/100
    sum(sapply(0:N, function(k) {
      constjk <- dbinom(k, j + k, 0.5)
      wk <- (sum(x <= (k+1)/l) - sum(x <= k/l))/100
      l/2*constjk*wk*wj
    }))
  }))
})
By the way, you don't need to predefine variables like h, j, k, wj and wk, especially not when vectorizing: assignments made inside the functions fed to sapply create local variables of the same name that shadow the ones you predefined.
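A tiny illustration of that shadowing point (my example, not from the answer):
wj <- 99
invisible(sapply(1:3, function(j) { wj <- j; wj }))  # this wj is local to the function
wj
# [1] 99  -- the predefined wj was never touched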

Let's wrap your simulation in a function and time it:
sim1 <- function(num = 20) {
  set.seed(42)
  x <- rlnorm(100, 0, 1.6)
  j <- 0
  k <- 0
  i <- 0
  h <- 0
  lambda <- rep(0, num)
  sum1 <- rep(0, num)
  constjk <- 0
  wj <- 0
  wk <- 0
  for (h in 1:num) {
    lambda[h] <- 2 + h/12.5
    N <- ceiling(lambda[h]*max(x))
    for (j in 0:N) {
      wj <- (sum(x <= (j+1)/lambda[h]) - sum(x <= j/lambda[h]))/100
      for (k in 0:N) {
        set.seed(42)
        constjk <- dbinom(k, j + k, 0.5)
        wk <- (sum(x <= (k+1)/lambda[h]) - sum(x <= k/lambda[h]))/100
        sum1[h] <- sum1[h] + (lambda[h]/2)*constjk*wk*wj
      }
    }
  }
  sum1
}
system.time(res1 <- sim1())
# user system elapsed
# 5.4 0.0 5.4
Now let's make it faster:
sim2 <- function(num = 20) {
  set.seed(42) # to make it reproducible
  x <- rlnorm(100, 0, 1.6)
  h <- 1:num
  sum1 <- numeric(num)
  lambda <- 2 + 1:num/12.5
  N <- ceiling(lambda*max(x))
  # functions for wj and wk
  wjfun <- function(x, j, lambda, h) {
    (sum(x <= (j+1)/lambda[h]) - sum(x <= j/lambda[h]))/100
  }
  wkfun <- function(x, k, lambda, h) {
    (sum(x <= (k+1)/lambda[h]) - sum(x <= k/lambda[h]))/100
  }
  # function to calculate values of sum1
  fun1 <- function(N, h, x, lambda) {
    sum1 <- 0
    set.seed(42) # to make it reproducible
    # calculate constants using outer
    const <- outer(0:N[h], 0:N[h], FUN = function(j, k) dbinom(k, j + k, 0.5))
    wk <- numeric(N[h] + 1)
    # loop only once to calculate wk
    for (k in 0:N[h]) {
      wk[k+1] <- (sum(x <= (k+1)/lambda[h]) - sum(x <= k/lambda[h]))/100
    }
    for (j in 0:N[h]) {
      wj <- (sum(x <= (j+1)/lambda[h]) - sum(x <= j/lambda[h]))/100
      for (k in 0:N[h]) {
        sum1 <- sum1 + (lambda[h]/2)*const[j+1, k+1]*wk[k+1]*wj
      }
    }
    sum1
  }
  for (h in 1:num) {
    sum1[h] <- fun1(N, h, x, lambda)
  }
  sum1
}
system.time(res2 <- sim2())
#user system elapsed
#1.25 0.00 1.25
all.equal(res1,res2)
#[1] TRUE
Timings for @Backlin's code (with 20 iterations) for comparison:
# user system elapsed
# 3.30 0.00 3.29
If this is still too slow and you cannot or don't want to use another language, there is also the possibility of parallelization. As far as I can see, the outer loop is embarrassingly parallel. There are some nice and easy packages for parallelization; a sketch with the base parallel package follows.
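For instance, the outer loop could be distributed across cores like this (a sketch only, not part of the original answer: sim2_par is a made-up name, and it assumes fun1 from sim2 above has been pulled out and defined at top level so the workers can see it):
library(parallel)
sim2_par <- function(num = 20, cores = 2) {
  set.seed(42)
  x <- rlnorm(100, 0, 1.6)
  lambda <- 2 + 1:num/12.5
  N <- ceiling(lambda*max(x))
  # one forked worker per h; on Windows use parLapply with a cluster instead
  unlist(mclapply(1:num, function(h) fun1(N, h, x, lambda), mc.cores = cores))
}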

Related

R: fastest way to set up matrix of integrals?

I have a three-parameter function f(x, y, z) and two limits L, U.
Given a vector v, I want to set up a matrix with elements M[i, j] = INTEGRAL( f(x, v[i], v[j]) ), where the integration limits go from x = L to x = U.
So the problem has two elements:
We need to be able to calculate the integrals. I don't care how this is done, as long as it's FAST and reasonably accurate. Fast, fast, fast!! What's the fastest way?
We need to set up the matrix M[i, j]. What's the fastest way?
Please don't make this an issue of "dO yOu WaNt GauSsIan QuaDraTure oR SimPsoNs ruLe?". I don't care. Speed is the only thing relevant here. Whatever's faster, I'll take it, as long as the integrals are accurate to at least 1-2 digits or so.
A potentially fast solution is given below:
library(pracma)
M <- matrix(0, nrow = length(v), ncol = length(v))
p <- sapply(seq(length(v)-1), function(k) integral(f, v[k], v[k+1]))
u <- unlist(sapply(rev(seq_along(p)), function(k) cumsum(tail(p, k))))
M[lower.tri(M)] <- u
M <- t(M - t(M))
Regarding the two elements requested by the OP:
integral from package pracma should be fast enough.
To build the matrix M, I did not use nested for loops. The idea is explained in the "Principle" section at the bottom, and I believe it speeds up the computation remarkably.
Benchmark
I wrote down some of the possible solutions so you can compare their performance (my "fastest" solution is in method1()).
set.seed(1)
library(pracma)
# dummy data: function f and vector v
f <- function(x) x**3 + cos(x**2)
v <- rnorm(500)
# my "fastest" solution
method1 <- function() {
  m1 <- matrix(0, nrow = length(v), ncol = length(v))
  p <- sapply(seq(length(v)-1), function(k) integral(f, v[k], v[k+1]))
  u <- unlist(sapply(rev(seq_along(p)), function(k) cumsum(tail(p, k))))
  m1[lower.tri(m1)] <- u
  t(m1 - t(m1))
}
# faster than brute-force solution
method2 <- function() {
  m2 <- matrix(0, nrow = length(v), ncol = length(v))
  for (i in 1:(length(v)-1)) {
    for (j in i:length(v)) {
      m2[i, j] <- integral(f, v[i], v[j])
    }
  }
  m2 + t(m2)
}
# slowest, brute-force solution
method3 <- function() {
  m3 <- matrix(0, nrow = length(v), ncol = length(v))
  for (i in 1:length(v)) {
    for (j in 1:length(v)) {
      m3[i, j] <- integral(f, v[i], v[j])
    }
  }
  m3
}
# timing for compare
system.time(method1())
system.time(method2())
system.time(method3())
such that
> system.time(method1())
user system elapsed
0.17 0.01 0.19
> system.time(method2())
user system elapsed
25.72 0.07 25.81
> system.time(method3())
user system elapsed
41.84 0.03 41.89
Principle
The idea in method1() is that you only need to calculate the integrals over the intervals between adjacent points of v. Note the following properties of integrals:
integral(f,v[i],v[j]) is equal to integral(f,v[i],v[i+1]) + integral(f,v[i+1],v[i+2]) + ... + integral(f,v[j-1],v[j])
integral(f,v[j],v[i]) is equal to -integral(f,v[i],v[j])
In this sense, given n <- length(v), you only need to run the integral operation (which is rather computationally expensive compared to a matrix transpose or a vector cumulative sum) n-1 times (far fewer than the choose(n,2) times in method2() or n**2 times in method3(), particularly when n is large).
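A quick sanity check of the telescoping identity on a tiny vector (my addition, reusing the f defined above):
v5 <- sort(rnorm(5))
direct <- integral(f, v5[1], v5[4])
pieces <- sum(sapply(1:3, function(k) integral(f, v5[k], v5[k+1])))
all.equal(direct, pieces)  # TRUE, up to the integrator's tolerance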

Nested rolling sum in vector

I am struggling to produce efficient code that computes the result vector r from an input vector v using this function:
r(i) = sum_{j=i-N}^{i} [v(i) - v(j)] * exp(v(i) - v(j))
where i loops (from N to M) over the vector v. The size of v is M, with M >> N.
Of course this is feasible with two nested for loops, but it is too slow for computational purposes, and probably an out-of-fashion, deprecated style...
A MWE:
for (i in (N+1):length(v)) {
  csum <- 0
  for (j in i:(i-N)) {
    csum <- csum + (v[i]-v[j])*exp(v[i]-v[j])
  }
  r[i] <- csum
}
In my real application M > 10^5, and there are actually several such v vectors.
I have been trying with nested applications of lapply and rollapply without success.
Any suggestion is welcome.
Thanks!
I don't know if it is any more efficient but something you can try:
r[N:M] <- sapply(N:M, function(i) tail(cumsum((v[i]-v[1:N])*exp(v[i]-v[1:N])), 1))
Checking that both computations give the same results: I computed r your way and r2 mine (initializing r2 to rep(NA, M)) and assessed their similarity:
all((r-r2)<1e-12, na.rm=TRUE)
# [1] TRUE
NOTE: as in @lmo's answer, tail(cumsum(...), 1) can be replaced more efficiently by just sum(...):
r[N:M] <- sapply(N:M, function(i) sum((v[i]-v[1:N])*exp(v[i]-v[1:N])))
Here is a method with a single for loop.
# create new blank vector
rr <- rep(NA,M)
for (i in N:length(v)) {
  rr[i] <- sum((v[i] - v[seq_len(N)]) * exp(v[i] - v[seq_len(N)]))
}
Check for equality:
all.equal(r, rr)
[1] TRUE
You could reduce the number of operations by storing the difference instead of computing it twice. This should add a little speed-up.
for (i in N:length(v)) {
  x <- v[i] - v[seq_len(N)]
  rr[i] <- sum(x * exp(x))
}

R: How could I change this loop to apply?

I'm currently working on an R program in which one part computes, in a loop, two values that are interdependent. Since I have to do 100,000 iterations, it takes a very long time.
So I would like to replace this for loop with an apply call or some more efficient construct, but I don't know how to do it. Could someone help me?
p <- c()
for (i in 1:n) {
  if (i == 1) {
    x <- b[i]
  } else {
    x <- c(x, max(h[i - 1], p[i]))
  }
  h <- c(h, x[i] + y[i])
}
Thank you very much!!
You don't seem to have a full working example here, but the main problem is that building up the x and h vectors with the c() function is very slow. It's better to preallocate them:
x <- numeric(n) # allocate vector of size n
h <- numeric(n)
and then fill them in as you go by assigning to x[i] and h[i]. For example, the following loop:
x <- c(); for (i in 1:100000) x <- c(x,1)
takes about 10 seconds to run on my laptop, but this version:
x <- numeric(100000); for (i in 1:100000) x[i] <- 1
does the same thing while running almost instantly.
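Applied to your loop, the preallocated version would look roughly like this (a sketch; b, p, and y are assumed to be length-n inputs, since your excerpt doesn't define them):
x <- numeric(n)
h <- numeric(n)
x[1] <- b[1]
h[1] <- x[1] + y[1]
for (i in 2:n) {
  x[i] <- max(h[i - 1], p[i])
  h[i] <- x[i] + y[i]
}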

How to make a loop run faster in R?

I want to use arms() to get one sample each time and make a loop like the following one in my function. It runs very slowly. How could I make it run faster? Thanks.
library(HI)
dmat <- matrix(0, nrow = 100, ncol = 30)
system.time(
  for (d in 1:100) {
    for (j in 1:30) {
      y <- rep(0, 101)
      for (i in 2:100) {
        y[i] <- arms(0.3, function(x) (3.5 + 0.000001*d*j*y[i-1])*log(x) - x,
                     function(x) (x > 1e-4)*(x < 20), 1)
      }
      dmat[d, j] <- sum(y)
    }
  }
)
This is a version based on Tommy's answer but avoiding all loops:
library(multicore) # or library(parallel) in 2.14.x
set.seed(42)
m <- 100
n <- 30
system.time({
  arms.C <- getNativeSymbolInfo("arms")$address
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x > 1e-4)*(x < 20))
  if (diff(bounds) < 1e-07) stop("pointless!")
  # create the vector of z values
  zval <- 0.00001 * rep(seq.int(n), m) * rep(seq.int(m), each = n)
  # apply the inner function to each grid point and return the matrix
  dmat <- matrix(unlist(mclapply(zval, function(z)
    sum(unlist(lapply(seq.int(100), function(i)
      .Call(arms.C, bounds, function(x) (3.5 + z * i) * log(x) - x,
            0.3, 1L, parent.frame())
    )))
  )), m, byrow = TRUE)
})
On a multicore machine this will be really fast since it spreads the load across the cores. On a single-core machine (or for poor Windows users) you can replace mclapply above with lapply and get only a slight speedup compared to Tommy's answer. But note that the result will differ for parallel versions, since they will use different RNG sequences.
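If reproducibility matters, one option (my note, not from the answer) is the L'Ecuyer-CMRG generator, which lets mclapply give each worker an independent, reproducible stream:
RNGkind("L'Ecuyer-CMRG")
set.seed(42)  # with mc.set.seed = TRUE (the default), workers derive their own substreams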
Note that any C code that needs to evaluate R functions will be inherently slow (because interpreted code is slow). I have added the arms.C just to remove all R->C overhead to make moli happy ;), but it doesn't make any difference.
You could squeeze out a few more milliseconds by using column-major processing (the question's code was row-major, which requires re-copying, since R matrices are always column-major).
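A small illustration of the column-major point (mine): filling a matrix by columns writes contiguous memory, filling by rows is strided, and the gap widens with matrix size.
m <- matrix(0, 1000, 1000)
system.time(for (j in 1:1000) m[, j] <- rnorm(1000))  # column-wise: contiguous writes
system.time(for (i in 1:1000) m[i, ] <- rnorm(1000))  # row-wise: strided writes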
Edit: I noticed that moli changed the question slightly since Tommy answered - so instead of the sum(...) part you have to use a loop, since the y[i] are dependent. The function(z) would then look like:
function(z) {
  y <- 0
  for (i in seq.int(99))
    y <- y + .Call(arms.C, bounds, function(x) (3.5 + z * y) * log(x) - x,
                   0.3, 1L, parent.frame())
  y
}
Well, one effective way is to get rid of the overhead inside arms. It does some checks and calls the indFunc every time, even though the result is always the same in your case.
Some other evaluations can also be done outside the loop. These optimizations bring the time down from 54 secs to around 6.3 secs on my machine. ...and the answer is identical.
set.seed(42)
#dmat2 <- ##RUN ORIGINAL CODE HERE##
# Now try this:
set.seed(42)
dmat <- matrix(0, nrow = 100, ncol = 30)
system.time({
  e <- new.env()
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x > 1e-4)*(x < 20))
  f <- function(x) (3.5 + z*i)*log(x) - x
  if (diff(bounds) < 1e-07) stop("pointless!")
  for (d in seq_len(nrow(dmat))) {
    for (j in seq_len(ncol(dmat))) {
      y <- 0
      z <- 0.00001*d*j
      for (i in 1:100) {
        y <- y + .Call("arms", bounds, f, 0.3, 1L, e)
      }
      dmat[d, j] <- y
    }
  }
})
all.equal(dmat, dmat2) # TRUE
Why not like this?
dat <- expand.grid(d = 1:10, j = 1:3, i = 1:10)
arms.func <- function(vec) {
  require(HI)
  dji <- vec[1]*vec[2]*vec[3]
  arms.out <- arms(0.3,
                   function(x, params) (3.5 + 0.00001*params)*log(x) - x,
                   function(x, params) (x > 1e-4)*(x < 20),
                   n.sample = 1,
                   params = dji)
  return(arms.out)
}
dat$arms <- apply(dat, 1, arms.func)
library(plyr)
out <- ddply(dat, .(d, j), summarise, arms = sum(arms))
matrix(out$arms, nrow = length(unique(out$d)), ncol = length(unique(out$j)))
However, it's still single-core and time-consuming. But that isn't R being slow, it's the arms function.

vector binding in R

I would like to implement a simulation program, which requires the following structure:
It has a for loop, and the program generates a vector in each iteration. I need each generated vector to be appended to the existing vector.
I do not know how to do this in R. Thanks for the help.
These answers work, but they all require a call to a non-deterministic function like sample() inside the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument of the random-number functions and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick example with a random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)
steps <- 30
n <- 100
directions <- c(-1, 1)
results <- vector('list', n)
for (i in seq_len(n)) {
  walk <- numeric(steps)
  for (s in seq_len(steps)) {
    walk[s] <- sample(directions, 1)
  }
  results[[i]] <- sum(walk)
}
We can rewrite this with one call to sample():
all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)
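As an aside (my note, not the original author's), apply(X, 1, sum) has a dedicated compiled equivalent that is usually faster still:
walks <- rowSums(all.steps)  # same result as apply(all.steps, 1, sum)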
Proof of speed increase (n=10000):
> system.time({
+ for (i in seq_len(n)) {
+ walk <- numeric(steps)
+ for (s in seq_len(steps)) {
+ walk[s] <- sample(directions, 1)
+ }
+ results[[i]] <- sum(walk)
+ }})
user system elapsed
4.231 0.332 4.758
> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
user system elapsed
0.010 0.001 0.012
If your simulation needs just one random variable per simulation function call, use sapply(), or better yet the multicore package's mclapply(). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).
Take home points:
Preallocate with numeric(n) or vector('list', n)
Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
Try hard for sapply() or lapply(), or better yet mclapply() (see the sketch after this list).
Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.
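To make the sapply() point concrete, here is the random walk from above written as one call per walk (my sketch):
walks <- sapply(seq_len(n), function(i) sum(sample(directions, steps, replace = TRUE)))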
Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.
n <- 10
start <- vector("list", n)
for (i in 1:n) {
  start[[i]] <- sample(10)
}
start <- unlist(start)
You could do it the old nasty way. This may be slow for larger vectors.
start <- c()
for (i in 1:n) {
  add <- sample(10)
  start <- c(start, add)
}
x <- rnorm(100)
for (i in 1:100) {
  x <- c(x, rnorm(100))
}
This link should be useful: http://www.milbo.users.sonic.net/ra/
Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:
sim1 <- function(reps, func) {
  first <- func()
  # the matrix() call already fills column 1 with the first result
  result <- matrix(first, nrow = length(first), ncol = reps)
  for (i in seq.int(from = 2, to = reps)) {
    result[, i] <- func()
  }
  return(as.vector(result))
}
Or you could express it as follows using replicate:
sim2 <- function(reps, func) {
return(as.vector(replicate(reps, func(), simplify=TRUE)))
}
> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3
