Rewriting loops with apply functions - r

I have the following 3 functions, which I would like to make faster. I assume apply functions are the best way to go, but I have never used them, so I have no idea what to do. Any hints, ideas and code snippets will be much appreciated.
n, T, dt are global parameters and par is a vector of parameters.
Function 1: creates an (m+1) x n matrix containing Poisson-distributed jumps with exponentially distributed jump sizes. My trouble here is that I have 3 loops and I am not sure how to incorporate the if statement in the inner loop. I also have no idea whether it is at all possible to use apply functions on the outer loops only.
jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n) # initializing output matrix
  U <- replicate(n,runif(100,t,T)) #matrix used to decide when the jumps will happen
  Y <-replicate(n,rexp(100,1/par[6])) #matrix with jump sizes
  for (l in 1:n){
    NT <- rpois(1,par[5]*T) #number of jumps
    k=0
    for (j in seq(t,T,dt)){
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}
Function 2: calculates a default intensity based on Brownian motions and the jumps from function 1. My trouble here is how to use apply functions when each row of the output matrix is computed from the row above it, AND how to get the right values from the external matrices used in the calculation (BMz_C and J).
lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intensity path output
  lambda[1,] <- par[4] #initializing start value of the intensity path.
  J <- jump(t,T,par) #matrix containing jumps
  for(i in 2:(m+1)){
    # pmax() floors each path at 0 separately; max() would collapse the whole row to one number
    dlambda <- par[1]*(par[2]-pmax(lambda[i-1,],0))*dt+par[3]*sqrt(pmax(lambda[i-1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
    lambda[i,] <- lambda[i-1,]+dlambda
  }
  return(lambda)
}
Function 3: calculates a survival probability based on the intensity from function 2. Here a() and B() are functions that return numerical values. My problem here is that both i and j are used, because i is not always an integer and therefore cannot be used to index the external matrix. I previously tried to use i/dt, but sometimes it would overwrite one row and skip the next one in the matrix, most likely due to rounding errors.
S <- function(t=0,T=T,par,plot=0, fit=0){
  S <- matrix(0,(T-t)/dt+1,n)
  if (fit > 0) S.fit <- matrix(0,1,length(mat)) else S.fit <- 0
  l=lambda(t,T,par,fit)
  j=0
  for (i in seq(t,T,dt)){
    j=j+1
    S[j,] <- a(i,T,par)*exp(B(i,T,par)*l[j,])
  }
  return(S)
}
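A minimal sketch for Function 3 (not from the thread): loop over positions with seq_along() instead of over the time values themselves, so that indexing never depends on dividing by dt. The a() and B() below are hypothetical stand-ins for the asker's functions, assumed to take a single time value, and all parameter values are made up. Because a(i) and B(i) are the same for every column of row j, the loop can also be dropped: R recycles a vector with one entry per row down each column of the matrix.

```r
# Hypothetical stand-ins for the asker's a() and B(); only their role matters here
a <- function(s, T, par) exp(-par[1] * (T - s))
B <- function(s, T, par) -(T - s)

t <- 0; T <- 1; dt <- 0.1; n <- 3; par <- 0.5
ts <- seq(t, T, dt)                        # time grid, one entry per row of S
l <- matrix(0.02, length(ts), n)           # made-up intensity paths

# Why i/dt fails as an index: 0.3/0.1 is slightly below 3 in floating point
as.integer(0.3 / 0.1)                      # truncates to 2, not 3
round(0.3 / 0.1)                           # rounding first gives 3

# Loop over positions, not over time values
S_loop <- matrix(0, length(ts), n)
for (j in seq_along(ts)) {
  S_loop[j, ] <- a(ts[j], T, par) * exp(B(ts[j], T, par) * l[j, ])
}

# Vectorized: evaluate a() and B() once per time, then let recycling do the rest
a_vals <- vapply(ts, a, numeric(1), T = T, par = par)
B_vals <- vapply(ts, B, numeric(1), T = T, par = par)
S <- a_vals * exp(B_vals * l)              # row j uses a_vals[j] and B_vals[j]
all.equal(S, S_loop)
```

If the real a() and B() already accept a vector of times, the two vapply() calls can be replaced by a(ts, T, par) and B(ts, T, par).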
Sorry for the long post, any help for any of the functions will be much appreciated.
EDIT:
First of all, thanks to digEmAll for the great reply.
I have now worked on vectorising function 2. First I tried
lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intensity path output
  J <- jump(t,T,par)
  lambda[1,] <- par[4]
  lambda[2:(m+1),] <- sapply(2:(m+1), function(i){
    lambda[i-1,]+par[1]*(par[2]-pmax(lambda[i-1,],0))*dt+par[3]*sqrt(pmax(lambda[i-1,],0))*BMz_C[i,]+(J[i,]-J[i-1,])
  })
  return(lambda)
}
but it only produced the first column, so I tried a two-step (nested) apply function:
lambda <- function(t=0,T=T,par,fit=0){
  lambda <- matrix(0,m+1,n) # matrix to hold intensity path output
  J <- jump(t,T,par)
  lambda[1,] <- par[4]
  lambda[2:(m+1),] <- sapply(1:n, function(l){
    sapply(2:(m+1), function(i){
      lambda[i-1,l]+par[1]*(par[2]-max(lambda[i-1,l],0))*dt+par[3]*sqrt(max(lambda[i-1,l],0))*BMz_C[i,l]+(J[i,l]-J[i-1,l])
    })
  })
  return(lambda)
}
This seems to work, but only for the first row: all rows after that have an identical non-zero value, as if lambda[i-1] were not used in the calculation of lambda[i]. Does anyone have an idea how to handle that?
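Why this happens, and one way around it (a sketch, not from the thread): inside the anonymous function, lambda refers to the matrix as it was before sapply() started, because the assignment to lambda[2:(m+1),] only happens after sapply() returns. Row 2 is therefore computed from the starting value, and every later row from a row of zeros. A recursion in which row i depends on row i-1 cannot be written with sapply(); base R's Reduce() with accumulate = TRUE carries each result into the next call. The parameters, BMz_C and J below are made-up stand-ins for the asker's globals.

```r
# Made-up stand-ins for the asker's globals and inputs
set.seed(1)
n <- 4; m <- 5; dt <- 0.1
par <- c(0.5, 0.02, 0.1, 0.03)                        # par[4] is the starting intensity
BMz_C <- matrix(rnorm((m + 1) * n, sd = sqrt(dt)), m + 1, n)
J <- matrix(0, m + 1, n)                              # no jumps, to keep the sketch short

# One Euler step for all n paths at once, given the previous row
step <- function(prev, i) {
  pos <- pmax(prev, 0)                                # floor each path at 0
  prev + par[1] * (par[2] - pos) * dt + par[3] * sqrt(pos) * BMz_C[i, ] + (J[i, ] - J[i - 1, ])
}

# Reduce() feeds each result into the next call; accumulate = TRUE keeps every row
rows <- Reduce(step, 2:(m + 1), accumulate = TRUE, init = rep(par[4], n))
lambda_reduce <- do.call(rbind, rows)                 # (m + 1) x n matrix

# The plain loop over time, vectorized across paths, gives the same matrix
lambda_loop <- matrix(0, m + 1, n)
lambda_loop[1, ] <- par[4]
for (i in 2:(m + 1)) lambda_loop[i, ] <- step(lambda_loop[i - 1, ], i)
all.equal(lambda_loop, lambda_reduce)
```

Reduce() still loops internally, so it is not faster than the explicit loop; the speed-up in both versions comes from handling all n paths in one vectorized step with pmax().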

I'm going to explain, step by step, how to vectorize the first function (this is one possible way of vectorizing it, maybe not the best one for your case).
For the other 2 functions, you can apply the same concepts and you should be able to do it.
Here, the key concept is: start to vectorize from the innermost loop.
1) First of all, rpois can generate more than one random value at a time, but you are calling it n times, asking for one random value each time. So, let's take it out of the loop, obtaining this:
jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n)
  U <- replicate(n,runif(100,t,T))
  Y <-replicate(n,rexp(100,1/par[6]))
  NTs <- rpois(n,par[5]*T) # note the change
  for (l in 1:n){
    NT <- NTs[l] # note the change
    k=0
    for (j in seq(t,T,dt)){
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}
2) Similarly, it is useless and inefficient to call seq(t,T,dt) n times in the loop, since it always generates the same sequence. So, let's take it out of the loop and store it in a vector, obtaining this:
jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n)
  U <- replicate(n,runif(100,t,T))
  Y <-replicate(n,rexp(100,1/par[6]))
  NTs <- rpois(n,par[5]*T)
  js <- seq(t,T,dt) # note the change
  for (l in 1:n){
    NT <- NTs[l]
    k=0
    for (j in js){ # note the change
      k=k+1
      if (NT>0){
        temp=0
        for (i in 1:NT){
          u <- vector("numeric",NT)
          if (U[i,l]>j){ u[i]=0
          }else u[i]=1
          temp=temp+Y[i,l]*u[i]
        }
        jump[k,l]=temp
      }else jump[k,l]=0
    }
  }
  return(jump)
}
3) Now, let's have a look at the innermost loop:
for (i in 1:NT){
  u <- vector("numeric",NT)
  if (U[i,l]>j){ u[i]=0
  }else u[i]=1
  temp=temp+Y[i,l]*u[i]
}
this is equivalent to:
u <- as.integer(U[1:NT,l]<=j)
temp <- sum(Y[1:NT,l]*u)
or, in one-line:
temp <- sum(Y[1:NT,l] * as.integer(U[1:NT,l] <= j))
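A quick check of this equivalence on made-up U and Y (a sketch; the sizes and the values of NT, l and j are arbitrary). It also shows a variant that is not in the original answer: logical indexing selects the jumps that have already occurred, so the as.integer() multiplication is not needed.

```r
set.seed(1)
NT <- 5; l <- 1; j <- 0.5
U <- matrix(runif(10), ncol = 1)   # made-up jump times in [0, 1]
Y <- matrix(rexp(10), ncol = 1)    # made-up jump sizes

# The original inner loop
temp <- 0
for (i in 1:NT) {
  u <- vector("numeric", NT)
  if (U[i, l] > j) { u[i] <- 0 } else u[i] <- 1
  temp <- temp + Y[i, l] * u[i]
}

# The vectorized replacement, and a variant using logical indexing
temp_vec <- sum(Y[1:NT, l] * as.integer(U[1:NT, l] <= j))
temp_idx <- sum(Y[1:NT, l][U[1:NT, l] <= j])
c(temp, temp_vec, temp_idx)
```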
hence, the function can now be written as:
jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n)
  U <- replicate(n,runif(100,t,T))
  Y <-replicate(n,rexp(100,1/par[6]))
  NTs <- rpois(n,par[5]*T)
  js <- seq(t,T,dt)
  for (l in 1:n){
    NT <- NTs[l]
    k=0
    for (j in js){
      k=k+1
      if (NT>0){
        jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
      }else jump[k,l]=0
    }
  }
  return(jump)
}
4) Again, let's have a look at the current innermost loop:
for (j in js){
  k=k+1
  if (NT>0){
    jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
  }else jump[k,l]=0
}
as you can see, NT does not depend on the iteration of this loop, so the inner if can be moved outside, as follows:
if (NT>0){
  for (j in js){
    k=k+1
    jump[k,l] <- sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) # note the change
  }
}else{
  for (j in js){
    k=k+1
    jump[k,l]=0
  }
}
This looks worse than before, and it is, but now each of the 2 branches can be turned into a one-liner (note the use of sapply¹):
if (NT>0){
  jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
}else{
  jump[1:length(js),l] <- 0
}
obtaining the following jump function:
jump <- function(t=0,T=T,par){
  jump <- matrix(0,T/dt+1,n)
  U <- replicate(n,runif(100,t,T))
  Y <-replicate(n,rexp(100,1/par[6]))
  NTs <- rpois(n,par[5]*T)
  js <- seq(t,T,dt)
  for (l in 1:n){
    NT <- NTs[l]
    if (NT>0){
      jump[1:length(js),l] <- sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else{
      jump[1:length(js),l] <- 0
    }
  }
  return(jump)
}
5) Finally, we can get rid of the last loop, again using the sapply¹ function, obtaining the final jump function:
jump <- function(t=0,T=T,par){
  U <- replicate(n,runif(100,t,T))
  Y <-replicate(n,rexp(100,1/par[6]))
  js <- seq(t,T,dt)
  NTs <- rpois(n,par[5]*T)
  jump <- sapply(1:n,function(l){
    NT <- NTs[l]
    if (NT>0){
      sapply(js,function(j){ sum(Y[1:NT,l]*as.integer(U[1:NT,l]<=j)) })
    }else {
      rep(0,length(js))
    }
  })
  return(jump)
}
(¹)
The sapply function is pretty easy to use. For each element of the list or vector passed in the X parameter, it applies the function passed in the FUN parameter, e.g.:
vect <- 1:3
sapply(X=vect,FUN=function(el){el+10})
# [1] 11 12 13
Since the simplify parameter is TRUE by default, the result is coerced to the simplest possible object. So, in the previous case the result becomes a vector, while in the following example it becomes a matrix (since for each element we return a vector of the same length):
vect <- 1:3
sapply(X=vect,FUN=function(el){rep(el,5)})
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 3
# [3,] 1 2 3
# [4,] 1 2 3
# [5,] 1 2 3
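For comparison, a short sketch of sapply()'s siblings in base R (not part of the original answer): lapply() always returns a list, and vapply() behaves like sapply() but requires you to declare the type and length of each result, which also keeps the result type predictable when the input is empty.

```r
vect <- 1:3
res_list <- lapply(vect, function(el) el + 10)                # list of 3 numbers
res_sapply <- sapply(vect, function(el) el + 10)              # simplified to a vector
res_vapply <- vapply(vect, function(el) el + 10, numeric(1))  # each result must be one number

# The difference shows up on empty input
empty_sapply <- sapply(integer(0), function(el) el + 10)              # list()
empty_vapply <- vapply(integer(0), function(el) el + 10, numeric(1))  # numeric(0)
```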
Benchmark:
The following benchmark just gives you an idea of the speed gain; actual performance may differ depending on your input parameters.
As you can imagine, jump_old corresponds to your original function 1, while jump_new is the final vectorized version.
# let's use some random parameters
n = 10
m = 3
T = 13
par = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
dt <- 3
set.seed(123)
system.time(for(i in 1:5000) old <- jump_old(T=T,par=par))
# user system elapsed
# 12.39 0.00 12.41
set.seed(123)
system.time(for(i in 1:5000) new <- jump_new(T=T,par=par))
# user system elapsed
# 4.49 0.00 4.53
# check if last results of the 2 functions are the same:
isTRUE(all.equal(old,new))
# [1] TRUE
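Going one step further (a sketch that is not part of the original answer): the remaining sapply() over js can be replaced by outer(), which builds the whole NT x length(js) indicator matrix at once, followed by a single matrix product. jump_outer is a hypothetical name, and n and dt are passed as arguments instead of globals. As in the original, U and Y hold only 100 candidate jumps, so NT is assumed to stay below 100.

```r
# Hypothetical variant of the final jump function: n and dt are arguments, not globals
jump_outer <- function(t = 0, T, par, n, dt) {
  U <- replicate(n, runif(100, t, T))        # candidate jump times, one column per path
  Y <- replicate(n, rexp(100, 1 / par[6]))   # candidate jump sizes
  js <- seq(t, T, dt)                        # time grid
  NTs <- rpois(n, par[5] * T)                # number of jumps in each path (assumed <= 100)
  sapply(seq_len(n), function(l) {
    NT <- NTs[l]
    if (NT == 0) return(rep(0, length(js)))
    idx <- seq_len(NT)
    # hit[i, k] is 1 if jump i has happened by time js[k], else 0
    hit <- outer(U[idx, l], js, "<=") * 1
    drop(Y[idx, l] %*% hit)                  # total jump size up to each grid time
  })
}

par <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
set.seed(123)
J <- jump_outer(T = 13, par = par, n = 10, dt = 3)
dim(J)   # 5 rows (times 0, 3, 6, 9, 12) by 10 paths
```

It draws its random numbers in the same order as the final jump function above, so for the same seed it should reproduce jump_new's output.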

Related

Cooley-Tukey FFT in R radix-2 DIT case

So I've been trying to (manually) implement the Cooley-Tukey FFT algorithm in R (for inputs of size N=2^n). I tried:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    s[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])
    s[(N/2+1):N] <- myfft(s[(1:(N/2))*2])
    for (k in 1:(N/2)){
      t <- s[k]
      s[k] <- t + exp(-1i*2*pi*(k-1)/N) * s[k+N/2]
      s[k+N/2] <- t - exp(-1i*2*pi*(k-1)/N) * s[k+N/2]
    }
  }
  s
}
This runs, but for n>1 (N=2^n) it does not compute the right values. I implemented a DFT function and also used the fft() function for comparison; both give the same values (when normalized), but they disagree with my algorithm above.
If anyone is interested and sees where I went wrong, help would be greatly appreciated. I'm going mad searching for the mistake and am starting to question whether I ever understood this FFT algorithm.
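The DFT function mentioned in the question is not shown; a minimal sketch of what such a reference implementation could look like is below. It uses the same sign convention as base R's fft(), which is unnormalized, so the two agree without any scaling.

```r
# Direct O(N^2) DFT: X[k+1] = sum over m of x[m+1] * exp(-2*pi*i*k*m/N), k, m = 0..N-1
dft <- function(x) {
  N <- length(x)
  idx <- 0:(N - 1)
  W <- exp(-2i * pi * outer(idx, idx) / N)   # N x N matrix of twiddle factors
  as.vector(W %*% x)
}

x <- c(1, 2, 3, 4, 0, -1, 2, 5)
max(Mod(dft(x) - fft(x)))   # should be at the level of floating-point rounding error
```

The same comparison, max(Mod(myfft(x) - fft(x))), is a quick way to test any candidate FFT implementation.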
UPDATE: I fixed it. I'm not 100% sure where exactly the problem was, but here is the working implementation:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    t <- s
    t[1:(N/2)] <- myfft(s[(1:(N/2))*2-1]) # 1 3 5 7 ...
    t[(N/2+1):N] <- myfft(s[(1:(N/2))*2]) # 2 4 6 8 ...
    s[1:(N/2)] <- t[1:(N/2)] + exp(-1i*2*pi*(0:(N/2-1))/N) * t[(N/2+1):N]
    s[(N/2+1):N] <- t[1:(N/2)] - exp(-1i*2*pi*(0:(N/2-1))/N) * t[(N/2+1):N]
  }
  return(s)
}
The problem was with the following line
s[1:(N/2)] <- myfft(s[(1:(N/2))*2-1])
which was overwriting part of the untransformed values that were needed on the subsequent line:
s[(N/2+1):N] <- myfft(s[(1:(N/2))*2])
For example, when N=4, the second call to myfft uses s[2] and s[4], but the assignment from the first call to myfft writes into s[1] and s[2] (thus overwriting the required original value in s[2]).
Your solution of copying the entire array prevents this overwrite.
An alternate solution commonly used is to copy the even and odd parts separately:
myfft <- function(s){
  N <- length(s)
  if (N != 1){
    odd  <- myfft(s[(1:(N/2))*2-1])   # transform of elements 1, 3, 5, ...
    even <- myfft(s[(1:(N/2))*2])     # transform of elements 2, 4, 6, ...
    w <- exp(-1i*2*pi*(0:(N/2-1))/N)  # twiddle factors
    # both halves are computed from the saved copies, so nothing is overwritten too early
    s[1:(N/2)]   <- odd + w * even
    s[(N/2+1):N] <- odd - w * even
  }
  return(s)
}

How to indicate the level of the for loop in R

Suppose I have a function that includes a for loop. This for loop runs for, say, 10 iterations. How can I tell from the output that the function is currently working on iteration number, say, 5?
That is, I would like my function to tell me the current iteration number.
For example,
I would like the result to look like this:
iteration 1 starts
some result
iteration 1 ends
iteration 2 starts
some result
iteration 2 ends
...
...
Please note this is not my original function. In my original function I use the optim function over a list of models, and I really need to know which model is currently being processed.
Here is a general example:
Myfun <- function(x,y){
  v <- list()
  for(i in 1:100){
    v[[i]] <- sum(x[[i]], y[[i]])
    cat(v, "\n")
  }
  v
}
x <- rnorm(100)
y <- rnorm(100)
Myfun(x=x, y=y)
Method 1
Output the current iteration step inside the for loop.
Myfun <- function(x,y) {
  v <- list()
  for (i in 1:100) {
    v[[i]] <- sum(x[[i]], y[[i]])
    cat(sprintf("Step %i / 100 done\n", i))
  }
  v
}
Method 2
Use a progress bar (see ?txtProgressBar for details).
Myfun <- function(x,y) {
  v <- list()
  pb <- txtProgressBar(min = 0, max = 100, style = 3)
  for (i in 1:100) {
    v[[i]] <- sum(x[[i]], y[[i]])
    setTxtProgressBar(pb, i)
  }
  close(pb)
  v
}
Note that the line cat(v, "\n") in your original Myfun does not report the iteration number: it attempts to print the whole list v built so far, on every iteration.
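Closer to the output format asked for, and to the real use case of running optim() over several models: a sketch in which models is a hypothetical named list of objective functions, made up for illustration. message() writes to the standard error stream, so the progress lines stay separate from any printed results.

```r
# Hypothetical "models": each element is an objective function to minimize
models <- list(
  quad  = function(p) sum((p - c(1, 2))^2),
  shift = function(p) sum((p - c(-3, 0.5))^2)
)

fits <- vector("list", length(models))
names(fits) <- names(models)
for (i in seq_along(models)) {
  message(sprintf("iteration %d (%s) starts", i, names(models)[i]))
  fits[[i]] <- optim(par = c(0, 0), fn = models[[i]])
  message(sprintf("iteration %d ends, minimum found: %.4g", i, fits[[i]]$value))
}
```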

Poisson Distribution Function infinite loop

I have the following code that is trying to simulate the deposition of metallic atoms onto a cold substrate; however, it runs in an infinite loop.
Can anyone see where I'm making a mistake?
l <- 20
n <- 2000
e <- 1000
lsize <- matrix(0,l,l)
deposits <- rep(0,n)
avg.deposits <- rep(0,n)
prob <- rep(0,n)
n.deposits <- rep(0,n)
for(m in 1:e){
  for(j in 1:l){
    for(k in 1:l){
      lsize[j,k] <- 0
    }
  }
  for(i in 1:n){
    ra <- runif(1)
    x <- floor(1+l*ra)
    ra <- runif(1)
    y <- floor(1+l*ra)
    lsize[x,y] <- lsize[x,y]+1
    s <- 0
    for(j in 1:l){
      for(k in 1:l){
        if(lsize[j,k] == 1){ # originally lsize[j,k] <- 1, an assignment that is always TRUE
          s <- s+1
        }
      }
    }
    n.deposits[i] <- n.deposits[i]+s
  }
}
for(i in 1:n){
  avg.deposits[i] <- n.deposits[i]/e
  prob[i] <- avg.deposits[i]/(l*l)
  deposits[i] <- i
}
plot(deposits, prob)
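There is no infinite loop problem.
This is easy to check if you run your code with smaller l, n and e. With the values above, the double loop over the l x l grid runs e x n = 2,000,000 times, i.e. about 8 x 10^8 cell checks in total, which in interpreted R looks like an infinite loop. Your code scales badly (super-linearly) when any of these arguments is increased.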
Obvious points:
Preallocate matrices, and do not reset lsize element by element in every experiment: lsize[] <- 0 clears it in one step.
Limit your function calls (to runif() in this case). You do not have to call the same function thousands of times. Call it once outside the loop to generate all the random numbers you need, and then within the loop just access the next element in line.
Use print or cat statements to print the loop counters you use. Try small values to make sure the program does what you want, and only then set your counters to thousands.
Look to vectorize your code when possible. E.g. if a = runif(100) and you want to set all elements where a < 0.5 to 4, there is no reason to loop over the elements of a one by one: a[a < 0.5] <- 4 is enough.
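A vectorized sketch of the whole simulation (not from the thread). It assumes the test inside the double loop is meant to be lsize[j,k] == 1, i.e. s counts sites holding exactly one atom (the question's code uses <-, which assigns 1 to every cell and is always TRUE). All n landing sites of one experiment are drawn with a single sample.int() call, and ave(..., FUN = seq_along) numbers each atom by its order of arrival on its site: a site starts holding exactly one atom when k == 1 and stops when k == 2, so a cumulative sum gives s after every deposit without scanning the grid.

```r
deposit_prob <- function(l = 20, n = 2000, e = 1000) {
  one_run <- function() {
    cells <- sample.int(l * l, n, replace = TRUE)   # all n landing sites at once
    k <- ave(cells, cells, FUN = seq_along)          # 1 for the first atom on a site, 2 for the second, ...
    # sites holding exactly one atom: +1 when k == 1, -1 when k == 2
    cumsum((k == 1) - (k == 2))
  }
  avg.deposits <- rowMeans(replicate(e, one_run()))  # average over e experiments
  avg.deposits / (l * l)                             # same quantity as prob in the question
}

set.seed(1)
prob <- deposit_prob(e = 100)   # fewer experiments for a quick run
```

plot(seq_along(prob), prob) then draws the same kind of figure as the original code.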

Loop inside a loop in R

I am trying to write R code that wraps another loop around the one I've already created. Here is my code:
t <- rep(1,1000)
omega <- seq(from=1,to=12,by=1)
for(i in 1:1000){
  omega <- setdiff(omega,sample(1:12,1))
  t[i] <- length(omega)
  remove <- 0
  f <- length(t [! t %in% remove]) + 1
}
When I run this code, I get the number of trials it takes to reach the empty vector (stored in f), but I want to run this experiment 10000 times.
replicate is probably how you want to run the outer loop. There's also no need for the f assignment to be inside the loop. Here I've moved it outside and changed it to simply count the elements of t that are greater than 0, plus 1.
result <- replicate(10000, {
  t <- rep(1, 1000)
  omega <- 1:12
  for(i in seq_along(t)) {
    omega <- setdiff(omega,sample(1:12,1))
    t[i] <- length(omega)
  }
  sum(t > 0) + 1
})
I suspect your code could be simplified in other ways as well, and also that you could just write down the distribution that you're looking for without simulation. I believe your variable of interest is just how long until you get at least one of each of the numbers 1:12, yes?
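If that reading is right, this is the classic coupon collector problem, and the expected number of draws has a closed form, 12 * (1 + 1/2 + ... + 1/12), roughly 37.24. A sketch comparing it with a simulation; the while loop below is a simpler stand-in for the question's 1000-slot vector t, not the asker's code.

```r
n_faces <- 12
exact <- n_faces * sum(1 / seq_len(n_faces))   # expected draws to see every value at least once

set.seed(42)
draws_needed <- replicate(10000, {
  seen <- logical(n_faces)
  k <- 0
  while (!all(seen)) {
    seen[sample.int(n_faces, 1)] <- TRUE        # one draw from 1:12
    k <- k + 1
  }
  k
})
c(simulated = mean(draws_needed), exact = exact)
```

The mean of result from the replicate() answer above should land close to the same value.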
Are you just looking to run your existing loop 10,000 times, like below?
f <- rep(NA, 10000)
for(j in 1:10000) {
  t <- rep(1,1000)                 # reset both vectors at the start of every experiment
  omega <- seq(from=1,to=12,by=1)
  for(i in 1:1000){
    omega <- setdiff(omega,sample(1:12,1))
    t[i] <- length(omega)
  }
  remove <- 0
  f[j] <- length(t [! t %in% remove]) + 1
}

vector binding in R

I would like to implement a simulation program, which requires the following structure:
It has a for loop, and the program generates a vector in each iteration. I need each generated vector to be appended to the existing vector.
I do not know how to do this in R. Thanks for the help.
These answers work, but they all require a call to a non-deterministic function like sample() in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for loop. The trick is to use the n argument (size for sample()) and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n calls, which matters if your n is large. Here is a quick random-walk example (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)
steps <- 30
n <- 100
directions <- c(-1, 1)
results <- vector('list', n)
for (i in seq_len(n)) {
  walk <- numeric(steps)
  for (s in seq_len(steps)) {
    walk[s] <- sample(directions, 1)
  }
  results[[i]] <- sum(walk)
}
We can rewrite this with one call to sample():
all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)
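A small follow-up to the vectorized version (not from the original answer): apply(all.steps, 1, sum) still calls sum() once per row from R, while rowSums() does the same job in compiled code. Both return the final position of each walk.

```r
steps <- 30
n <- 10000
directions <- c(-1, 1)

set.seed(1)
all.steps <- matrix(sample(directions, n * steps, replace = TRUE), nrow = n, ncol = steps)
walks_apply <- apply(all.steps, 1, sum)
walks_rowsums <- rowSums(all.steps)   # same numbers, without one R-level call per row
```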
Proof of speed increase (n=10000):
> system.time({
+ for (i in seq_len(n)) {
+ walk <- numeric(steps)
+ for (s in seq_len(steps)) {
+ walk[s] <- sample(directions, 1)
+ }
+ results[[i]] <- sum(walk)
+ }})
user system elapsed
4.231 0.332 4.758
> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
user system elapsed
0.010 0.001 0.012
If your simulation needs just one random variable per simulation function call, use sapply(), or better yet mclapply() from the parallel package (it originally came from the multicore package). Revolution Analytics's foreach package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR (I can't find the video, but I'm sure someone will know).
Take home points:
Preallocate with numeric(n) or vector('list', n)
Push invariant code out of for loops. Cleverly push stochastic functions out of code with their n argument.
Try hard for sapply() or lapply(), or better yet mclapply.
Don't use x <- c(x, rnorm(100)). Every time you do this, a member of R-core kills a puppy.
Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.
n <- 10
start <- vector("list", n)
for (i in 1:n) {
  start[[i]] <- sample(10)
}
start <- unlist(start)
You could do it the old nasty way. This may be slow for larger vectors.
start <- c()
for (i in 1:n) {
  add <- sample(10)
  start <- c(start, add)
}
x <- rnorm(100)
for (i in 1:100) {
  x <- c(x, rnorm(100))
}
This link should be useful: http://www.milbo.users.sonic.net/ra/
Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:
sim1 <- function(reps, func) {
  first <- func()
  # every column starts as a copy of the first result
  result <- matrix(first, nrow=length(first), ncol=reps)
  for (i in seq_len(reps)[-1]) {  # columns 2..reps; no iterations when reps == 1
    result[, i] <- func()
  }
  return(as.vector(result))
}
Or you could express it as follows using replicate:
sim2 <- function(reps, func) {
  return(as.vector(replicate(reps, func(), simplify=TRUE)))
}
> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3

Resources