I need to speed up my R code. My bottleneck is a function that needs to use the choose() function. It looks like this:
P_ni <- function(Pn, Pi, eta1, eta2, p, d = NA)
{
  if (is.na(d)) d <- 1 - p
  if (Pn == Pi) output <- p^Pn
  else
  {
    if (Pi == 1) seq1 <- seq_len(Pn - 1)
    if (Pi > 1)  seq1 <- seq_len(Pn - 1)[-seq_len(Pi - 1)]
    output <- sum(choose((Pn - Pi - 1), c(seq1 - Pi)) * choose(Pn, seq1) *
                    (eta1/(eta1 + eta2))^c(seq1 - Pi) *
                    (eta2/(eta1 + eta2))^c(Pn - seq1) * p^seq1 * d^c(Pn - seq1)
    )
  }
  return(output)
}
This function needs to be called several times with different Pn and Pi. The problem is that Pn and Pi can only take a single number each, not vectors. This is caused by the choose() function.
I do this with a for-loop at the moment and it works perfectly, but it is slow.
The for-loop looks like this:
for(i in 1:nrow(n_k_matrix_p))
{
  n_k_matrix_p[i, 4] <- P_ni(n_k_matrix_p[i, 1], n_k_matrix_p[i, 2], eta1, eta2, p)
}
To make it reproducible:
eta1 <- 10
eta2 <- 5
p <- 0.4
n_k_matrix <- expand.grid(c(1:20),c(1:20))
n_k_matrix <- n_k_matrix[n_k_matrix[,1] >=n_k_matrix[,2],]
n_k_matrix <- n_k_matrix[order(n_k_matrix[,1]),]
The n_k_matrix contains my numbers for Pn and Pi.
Unfortunately the loop is still faster than using apply.
Does anyone have any idea how to speed things up?
You can regroup terms and precompute some of the quantities.
P_ni2 <- function(n, eta1, eta2, p, d = 1 - p) {
  res <- matrix(0, n, n)
  diag(res) <- p^seq_len(n)
  C1 <- eta1 / eta2 * p / d
  C2 <- eta2 / (eta1 + eta2) * d
  C3 <- eta1 / (eta1 + eta2)
  C2_n <- C2^seq_len(n)
  C3_n <- C3^seq_len(n)
  precomputed <- outer(0:n, 0:n, choose)
  for (j in seq_len(n)) {
    for (i in seq_len(j - 1)) {
      seq1 <- seq(i, j - 1)
      res[i, j] <- sum(
        precomputed[j - i, seq1 - i + 1] * precomputed[j + 1, seq1 + 1] * C1^seq1
      ) * C2_n[j] / C3_n[i]
    }
  }
  res
}
Verification:
> system.time({
+ n_k_matrix[[3]] <- sapply(1:nrow(n_k_matrix), function(i) {
+ P_ni(n_k_matrix[i,1], n_k_matrix[i,2], eta1, eta2, p)
+ })
+ })
user system elapsed
11.799 0.000 11.797
> system.time({
+ test <- P_ni2(400, eta1, eta2, p)
+ n_k_matrix[[4]] <- test[as.matrix(n_k_matrix[, 2:1])]
+ })
user system elapsed
2.328 0.003 2.341
> all.equal(n_k_matrix[[3]], n_k_matrix[[4]])
[1] TRUE
Note that I first store the results in the upper triangle of a square matrix. Then I convert it to your data-frame format (which you call a matrix, by the way, even though expand.grid() returns a data frame).
This solution is about 5 times faster for n = 400. I think you could improve it further by recoding (only) the double loop in Rcpp.
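For illustration, the double loop might translate to Rcpp roughly as follows. This is a sketch of my own, not tested code from the answer; it assumes the same precomputed, C1, C2_n and C3_n as in P_ni2, and the diagonal p^seq_len(n) would still be filled in R afterwards.
library(Rcpp)
cppFunction('
// Hypothetical helper, not part of the original answer.
NumericMatrix fill_upper(NumericMatrix precomputed, double C1,
                         NumericVector C2_n, NumericVector C3_n) {
  int n = C2_n.size();
  NumericMatrix res(n, n);
  // Upper triangle only; res(i-1, j-1) corresponds to res[i, j] in R.
  for (int j = 2; j <= n; ++j) {
    for (int i = 1; i < j; ++i) {
      double s = 0.0;
      double C1_pow = std::pow(C1, (double) i);   // C1^seq1, starting at seq1 = i
      for (int k = i; k <= j - 1; ++k) {          // k plays the role of seq1
        s += precomputed(j - i - 1, k - i) *      // choose(j - i - 1, k - i)
             precomputed(j, k) *                  // choose(j, k)
             C1_pow;
        C1_pow *= C1;
      }
      res(i - 1, j - 1) = s * C2_n[j - 1] / C3_n[i - 1];
    }
  }
  return res;
}
')
Inside P_ni2 this would be used as res <- fill_upper(precomputed, C1, C2_n, C3_n) followed by diag(res) <- p^seq_len(n), replacing the R double loop.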
I need to run a coverage probability test on different sample sizes and censoring proportions. I need to replicate 1000 bootstrap samples using the boot function in R. I have run the code for 3-8 hours at a time and still have no idea how long it will take.
set.seed(20)
lambda <- 0.02
beta <- 0.5
alpha <- 0.05
n <- 140
N <- 1000
lambda_hat <- NULL
beta_hat <- NULL
cp <- NULL
bp_lambda <- matrix(NA, nrow=N, ncol=2)
bp_beta <- matrix(NA, nrow=N, ncol=2)
for (i in 1:N) {
  u <- runif(n)
  c_i <- rexp(n, 0.0001)
  t_i <- (log(1 - (1/lambda)*log(1 - u)))^(1/beta)
  s_i <- 1*(t_i < c_i)
  t <- pmin(t_i, c_i)
  data <- data.frame(u, t_i, c_i, s_i, t)
  estimates.boot <- function(data, j){
    data <- data[j, ]
    data0 <- data[which(data$s_i == 0), ] #uncensored data
    data1 <- data[which(data$s_i == 1), ] #right censored data
    data
    library(maxLik)
    LLF <- function(para) {
      t1 <- data$t_i
      lambda <- para[1]
      beta <- para[2]
      e <- s_i*log(lambda*t1^(beta - 1)*beta*exp(t1^beta)*exp(lambda*(1 - exp(t1^beta))))
      r <- (1 - s_i)*log(exp(lambda*(1 - exp(t1^beta))))
      f <- sum(e + r)
      return(f)
    }
    mle <- maxLik(LLF, start=c(para=c(0.02, 0.5)))
    lambda_hat[i] <- mle$estimate[1]
    beta_hat[i] <- mle$estimate[2]
    return(c(lambda_hat[i], beta_hat[i]))
  }
  library(boot)
  bootstrap <- boot(data, estimates.boot, 1000)
  bootlambda <- bootstrap$t[, 1]
  klambda <- bootlambda[order(bootlambda)]
  bp_lambda[i, ] <- c(klambda[25], klambda[975])
  bootbeta <- bootstrap$t[, 2]
  kbeta <- bootbeta[order(bootbeta)]
  bp_beta[i, ] <- c(kbeta[25], kbeta[975])
}
left_lambda <- sum(bp_lambda[, 1]>lambda)/N
right_lambda <- sum(bp_lambda[, 2]<lambda)/N
total_lambda <- left_lambda + right_lambda
left_beta <- sum(bp_beta[, 1] > beta)/N
right_beta <- sum(bp_beta[, 2]<beta)/N
total_beta <- left_beta + right_beta
sealphahat <- sqrt(alpha*(1 - alpha)/N)
antilambda <- total_lambda>(alpha + 2.58*sealphahat)
conlambda <- total_lambda<(alpha - 2.58*sealphahat)
asymlambda <- (max(left_lambda, right_lambda)/min(left_lambda, right_lambda)) > 1.5
antibeta <- total_beta > (alpha + 2.58*sealphahat)
conbeta <- total_beta < (alpha - 2.58*sealphahat)
asymbeta <- (max(left_beta, right_beta)/min(left_beta, right_beta)) > 1.5
anti <- antilambda + antibeta
con <- conlambda + conbeta
asym <- asymlambda + asymbeta
cbind(anti, con, asym)
Does anyone have an idea how to speed up this operation?
Basically, you want to apply a random sampling to an estimation function (inner bootstrap) and randomly sample the entire process again (outer bootstrap).
Consequently, we could write an estimation function estimate() using replicate() (to avoid boot::boot) and a function for the inner bootstrap, innerBoot(). In the latter we can use matrixStats::rowQuantiles for fast computation of the quantiles you want.
I essentially used your code, just fixed a few issues that prevented the code from running.
estimate <- function() {
  u <- runif(n)
  c_i <- rexp(n, 0.0001)
  t_i <- (log(1 - (1/lambda)*log(1 - u)))^(1/beta)
  s_i <- 1*(t_i < c_i)
  t <- pmin(t_i, c_i)
  LLF <- function(para) {
    lambda <- para[1]
    beta <- para[2]
    e <- s_i*log(lambda*t_i^(beta - 1)*beta*exp(t_i^beta)*exp(lambda*(1 - exp(t_i^beta))))
    r <- (1 - s_i)*log(exp(lambda*(1 - exp(t_i^beta))))
    return(sum(e + r))
  }
  mle <- maxLik::maxLik(LLF, start=c(para=c(0.02, 0.5)))
  return(setNames(mle$estimate, c('lambda_hat', 'beta_hat')))
}
innerBoot <- function() {
  boot <- replicate(N, estimate())
  return(matrixStats::rowQuantiles(boot, p=c(.025, .975)))
}
We also perform the outer bootstrap with replicate(). I wrap it here in system.time() to get a time measurement.
lambda <- 0.02
beta <- 0.5
alpha <- 0.05
n <- 140
# N <- 1000
N <- 10 ## for testing
seed <- 42
set.seed(seed)
tm <- system.time(
BA <- replicate(N, innerBoot())
)
I got these measurements,
tm
# user system elapsed ## N = 10
# 1.055 0.000 1.057
# user system elapsed ## N = 100
# 102.012 0.227 102.489
which indicates that for N <- 1000 about 167 minutes are to be expected (both the inner and the outer replicate() scale with N, so the runtime grows roughly quadratically).
The result is an array of dim 2x2xN.
> dim(BA)
[1] 2 2 100
To calculate the summaries we can simply index the respective cells.
boot_sum <- function(BA) {
  left_lambda <- sum(BA[1, 1, ] > lambda)/N
  right_lambda <- sum(BA[1, 2, ] < lambda)/N
  left_beta <- sum(BA[2, 1, ] > beta)/N
  right_beta <- sum(BA[2, 2, ] < beta)/N
  total_lambda <- left_lambda + right_lambda
  total_beta <- left_beta + right_beta
  sealphahat <- sqrt(alpha*(1 - alpha)/N)
  antilambda <- total_lambda > (alpha + 2.58*sealphahat)
  conlambda <- total_lambda < (alpha - 2.58*sealphahat)
  asymlambda <- (max(left_lambda, right_lambda)/min(left_lambda, right_lambda)) > 1.5
  antibeta <- total_beta > (alpha + 2.58*sealphahat)
  conbeta <- total_beta < (alpha - 2.58*sealphahat)
  asymbeta <- (max(left_beta, right_beta)/min(left_beta, right_beta)) > 1.5
  anti <- antilambda + antibeta
  con <- conlambda + conbeta
  asym <- asymlambda + asymbeta
  return(cbind(anti, con, asym))
}
boot_sum(BA)
# anti con asym
# [1,] 2 0 2
Note: You should definitely check the code in the body of estimate() (i.e., run it manually several times without bootstrapping), as it throws warnings every now and then; probably there is a mistake in how you define LLF().
Warning messages:
1: In log(lambda * t_i^(beta - 1) * beta * exp(t_i^beta) * exp(lambda * :
NaNs produced
2: In log(lambda * t_i^(beta - 1) * beta * exp(t_i^beta) * exp(lambda * :
NaNs produced
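One common source of such NaN warnings is the optimiser stepping to negative values of lambda or beta, which makes the log() arguments negative. A hypothetical reparameterisation on the log scale (my own sketch, not part of this answer) keeps both parameters positive; it would replace LLF inside estimate():
# Hypothetical variant, not from the original post: optimise over log(lambda) and
# log(beta) so both stay positive; the likelihood is expanded algebraically so that
# exp(t_i^beta) never sits inside a log(). Uses s_i and t_i as defined in estimate().
LLF_log <- function(par) {
  lambda <- exp(par[1])
  beta <- exp(par[2])
  e <- s_i * (log(lambda) + log(beta) + (beta - 1) * log(t_i) +
              t_i^beta + lambda * (1 - exp(t_i^beta)))
  r <- (1 - s_i) * lambda * (1 - exp(t_i^beta))
  sum(e + r)
}
mle <- maxLik::maxLik(LLF_log, start = c(loglambda = log(0.02), logbeta = log(0.5)))
exp(mle$estimate)  # back-transform to lambda_hat and beta_hat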
Also, I'm not sure the summary calculation currently makes much sense.
My advice is to check your likelihood function and the summary by (1) running the lines manually and (2) starting with a very small N, say 10, to see whether the calculations make sense.
Once you've checked that, it's worth waiting the ~167 minutes for the result.
Or parallelize innerBoot(), which is about 80% faster overall (using 7 cores), as follows:
innerBootParallel <- function() {
  boot <- parSapply(cl, 1:N, function(i) estimate())
  return(matrixStats::rowQuantiles(boot, p=c(.025, .975)))
}
library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c('estimate', 'n', 'lambda', 'N', 'beta'))
clusterSetRNGStream(cl, seed)
BA <- replicate(N, innerBootParallel())
stopCluster(cl)
boot_sum(BA)
The price of the Asian option is approximated, using Monte Carlo simulation, by:
delta <- 1/12
T <- 2
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4
m <- T/delta
S <- S0
for(i in 1:n) {
  for(j in 1:m) {
    W <- rnorm(1)
    Si <- S[length(S)]*exp((r-0.5*sigma^2)*delta + sigma*sqrt(delta)*W)
    S <- c(S, Si)
  }
  Si.bar <- mean(S[-1])
  Ci <- exp(-r*T)*max(Si.bar - K, 0)
}
mean(Ci)
The for(j in 1:m) loop runs perfectly, I think... But when I run it n times, using for(i in 1:n), the values in S get smaller and smaller, decreasing to almost zero as n grows. This leads to a mean (Si.bar <- mean(S[-1])) well below the strike price, K = 100.
I can't figure out what is wrong with the last two lines of code. I'm getting a value of 0 for the Asian call option, due to the payoff function. The correct value of this option is approximately 7 (mean(Ci)).
There are a couple of issues with your code. Firstly, it's inefficient in R to build a vector by repeated concatenation. Instead, you should allocate the vector up front and then assign to its members.
Secondly, as I understand it, the aim is to repeat the inner loop n times and store the output into members of a vector C before taking the mean. That's not what you're doing at the moment: each iteration of the outer loop makes S longer and overwrites Ci, so the last statement, mean(Ci), is meaningless.
Here's an amended version of the code. I've used plyr partly to make the code neater, and partly for its progress bar functionality.
library(plyr)
delta <- 1/12
T <- 2
S0 <- 100
sigma <- 0.20
K <- 100
r <- 0.01
n <- 10^4
m <- T/delta
S <- numeric(m + 1)
S[1] <- S0
asian_price <- function() {
  for(j in 1:m) {
    W <- rnorm(1)
    S[j + 1] <- S[j] * exp((r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W)
  }
  Si.bar <- mean(S[-1])
  exp(-r * T) * max(Si.bar - K, 0)
}
C <- raply(n, asian_price(), .progress = "text")
mean(C)
# [1] 7.03392
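Since the m log-returns of a path are independent draws, the inner loop can itself be vectorised with a single rnorm(m) call and cumprod(); a possible variant under the same parameters (my own sketch, not part of the original answer):
# Sketch: build the whole price path at once instead of looping over j.
asian_price_vec <- function() {
  W <- rnorm(m)                         # all m increments in one draw
  S_path <- S0 * cumprod(exp((r - 0.5 * sigma^2) * delta + sigma * sqrt(delta) * W))
  exp(-r * T) * max(mean(S_path) - K, 0)
}
C_vec <- raply(n, asian_price_vec(), .progress = "text")
mean(C_vec)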
I am working on a research project where I want to determine equivalence of two distributions. I am currently using the Mann-Whitney Test for Equivalence and the code I am running (below) was provided with the book Testing Statistical Hypotheses of Equivalence and Noninferiority by Stefan Wellek (2010). Before running my data I am testing this code with random normal distributions which have the same mean and standard deviation. My problem is that there are three nested for loops and when running larger distributions sizes (as in the example below) the code takes forever to run. If I only had to run it once that would not be such a problem, but I am doing a simulation test and creating power curves so I need to run many iterations of this code (around 10,000). At the moment, depending on how I alter the distribution sizes, it takes days to run 10,000 iterations.
Any help in a way to increase the performance of this would be greatly appreciated.
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)
alpha <- 0.05
m <- length(x)
n <- length(y)
eps1_ <- 0.2 #0.1382 default
eps2_ <- 0.2 #0.2602 default
eqctr <- 0.5 + (eps2_-eps1_)/2
eqleng <- eps1_ + eps2_
wxy <- 0
pihxxy <- 0
pihxyy <- 0
for (i in 1:m)
  for (j in 1:n)
    wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))
for (i in 1:m)
  for (j1 in 1:(n-1))
    for (j2 in (j1+1):n)
      pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))
for (i1 in 1:(m-1))
  for (i2 in (i1+1):m)
    for (j in 1:n)
      pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))
wxy <- wxy / (m*n)
pihxxy <- pihxxy*2 / (m*(m-1)*n)
pihxyy <- pihxyy*2 / (n*(n-1)*m)
sigmah <- sqrt((wxy-(m+n-1)*wxy**2+(m-1)*pihxxy+(n-1)*pihxyy)/(m*n))
crit <- sqrt(qchisq(alpha,1,(eqleng/2/sigmah)**2))
if (abs((wxy-eqctr)/sigmah) >= crit) rej <- 1
if (abs((wxy-eqctr)/sigmah) < crit) rej <- 0
if (is.na(sigmah) || is.na(crit)) rej <- 1
MW_Decision <- rej
cat(" ALPHA =",alpha," M =",m," N =",n," EPS1_ =",eps1_," EPS2_ =",eps2_,
"\n","WXY =",wxy," SIGMAH =",sigmah," CRIT =",crit," REJ=",MW_Decision)
See the edit below for an even better suggestion.
One simple suggestion to get a bit of a speed boost is to byte-compile your code.
For example, I wrapped your code into a function starting from the alpha <- 0.05 line and ran it on my laptop. Simply byte-compiling your current code makes it run twice as fast.
set.seed(1234)
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)
# f1 <- function(x,y){ ...your code...}
system.time(f1(x, y))
# user system elapsed
# 33.249 0.008 33.278
library(compiler)
f2 <- cmpfun(f1)
system.time(f2(x, y))
# user system elapsed
# 17.162 0.002 17.170
EDIT
I should add that this is the type of thing a different language would do much better than R. Have you looked at the Rcpp and inline packages?
I've been curious to learn how to use them so I figured this was a good chance.
Here's a tweak of your code using the inline package and Fortran (since I'm more comfortable with that than C). It wasn't hard at all (provided you know Fortran or C); I just followed the examples listed in cfunction.
First, let's re-write your loops and compile them:
library(inline)
# Fortran code for first loop
loop1code <- "
  integer i, j1, j2
  real*8 tmp
  do i = 1, m
    do j1 = 1, n-1
      do j2 = j1+1, n
        tmp = x(i) - max(y(j1),y(j2))
        if (tmp > 0.) pihxyy = pihxyy + 1
      end do
    end do
  end do
"
# Compile the code and turn loop into a function
loop1fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxyy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop1code, language="F95")
# Fortran code for second loop
loop2code <- "
  integer i1, i2, j
  real*8 tmp
  do i1 = 1, m-1
    do i2 = i1+1, m
      do j = 1, n
        tmp = min(x(i1), x(i2)) - y(j)
        if (tmp > 0.) pihxxy = pihxxy + 1
      end do
    end do
  end do
"
# Compile the code and turn loop into a function
loop2fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxxy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop2code, language="F95")
Now let's create a new function that uses these. So that this answer isn't too long, I'll just sketch the key parts I modified from your code:
f3 <- function(x, y){
  # ... code ...
  # Remove old loop
  ## for (i in 1:m)
  ##   for (j1 in 1:(n-1))
  ##     for (j2 in (j1+1):n)
  ##       pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))
  # Call new function from compiled code instead
  pihxyy <- loop1fun(x, y, pihxyy, m, n)$pihxyy
  # Remove second loop
  ## for (i1 in 1:(m-1))
  ##   for (i2 in (i1+1):m)
  ##     for (j in 1:n)
  ##       pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))
  # Call new compiled function for second loop
  pihxxy <- loop2fun(x, y, pihxxy, m, n)$pihxxy
  # ... code ...
}
And now we run it and voila, we get a huge speed boost! :)
system.time(f3(x, y))
# user system elapsed
# 0.12 0.00 0.12
I did check that it got the same results as your code, but you probably want to run some additional tests just in case.
You can use outer instead of the first double loop:
set.seed(42)
f1 <- function(x,y) {
  wxy <- 0
  for (i in 1:m)
    for (j in 1:n)
      wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))
  wxy
}
f2 <- function(x,y) sum(outer(x,y, function(x,y) trunc(0.5*(sign(x-y)+1))))
f1(x,y)
[1] 32041
f2(x,y)
[1] 32041
You get roughly 50x speedup:
library(microbenchmark)
microbenchmark(f1(x,y),f2(x,y))
Unit: milliseconds
expr min lq median uq max neval
f1(x, y) 138.223841 142.586559 143.642650 145.754241 183.0024 100
f2(x, y) 1.846927 2.194879 2.677827 3.141236 21.1463 100
The other loops are trickier.
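That said, the triple loops can also be reduced to counting: x[i] > max(y[j1], y[j2]) for a pair j1 < j2 exactly when both y values lie below x[i], so each x[i] contributes choose(k, 2) such pairs, where k is the number of y values below x[i] (and symmetrically for pihxxy). A possible sketch (my own, not from the original answers):
# Hypothetical helper, not part of the original answers; returns the raw counts
# that the two triple loops accumulate.
pih_fast <- function(x, y) {
  k <- colSums(outer(y, x, "<"))   # k[i] = number of y's strictly below x[i]
  l <- colSums(outer(x, y, ">"))   # l[j] = number of x's strictly above y[j]
  list(pihxyy = sum(choose(k, 2)), pihxxy = sum(choose(l, 2)))
}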
The problem with my R script is that it takes too much time, and the main solution I am considering is to parallelize it. I don't know where to start.
My code looks like this:
n <- nrow(aa)
output <- matrix(0, n, n)
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(ddi)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  return(b + M)
}
for (i in 1:n)
{
  A <- akl(dist(aa[i, ]))
  dVarX <- sqrt(mean(A * A))
  for (j in i:n)
  {
    B <- akl(dist(aa[j, ]))
    V <- sqrt(dVarX * (sqrt(mean(B * B))))
    output[i, j] <- (sqrt(mean(A * B))) / V
  }
}
I would like to parallelize it across different CPUs. How can I do that?
I saw the SNOW package; is it suitable for my purpose?
Thank you for suggestions,
Gab
There are two ways I can think of in which your code could be made to run faster:
First: as @Dwin was saying (with a small twist), you could precompute akl (yes, not necessarily dist, but the whole of akl).
# a random square matrix
aa <- matrix(runif(100), ncol=10)
n <- nrow(aa)
output <- matrix (0, n, n)
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(m) # mean(ddi) == mean(m)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  return(b + M)
}
# precompute akl here
require(plyr)
akl.list <- llply(1:nrow(aa), function(i) {
  akl(dist(aa[i, ]))
})
# Now, apply your function, but index the list instead of computing every time
for (i in 1:n) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  for (j in i:n) {
    B <- akl.list[[j]]
    V <- sqrt(dVarX * (sqrt(mean(B * B))))
    output[i, j] <- (sqrt(mean(A * B))) / V
  }
}
This should already make your code run faster than before on larger matrices (since previously akl was computed every time in the inner loop).
Second: In addition to that, you can get it faster by parallelising as follows:
# now, the parallelisation you require can be achieved as follows
# with the help of `plyr` and `doMC`.
# First step of parallelisation is to compute akl in parallel
require(plyr)
require(doMC)
registerDoMC(10) # 10 Cores/CPUs
akl.list <- llply(1:nrow(aa), function(i) {
  akl(dist(aa[i, ]))
}, .parallel = TRUE)
# then, you could write your for-loop using plyr again as follows
output <- laply(1:n, function(i) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  t <- laply(i:n, function(j) {
    B <- akl.list[[j]]
    V <- sqrt(dVarX * (sqrt(mean(B*B))))
    sqrt(mean(A * B))/V
  })
  c(rep(0, n-length(t)), t)
}, .parallel = TRUE)
Note that I have added .parallel = TRUE only on the outer loop. This is because you already assign 10 cores to the outer loop. If you were to add it to both the outer and inner loops, the total number of processes would be 10 * 10 = 100. Please take care of this.
I have R code that computes the convolution of two functions...
convolveSlow <- function(x, y) {
  nx <- length(x); ny <- length(y)
  xy <- numeric(nx + ny - 1)
  for(i in seq(length = nx)) {
    xi <- x[[i]]
    for(j in seq(length = ny)) {
      ij <- i+j-1
      xy[[ij]] <- xy[[ij]] + xi * y[[j]]
    }
  }
  xy
}
Is there a way to remove the two for loops and make the code run faster?
Thank you
San
Since R is very fast at computing vector operations, the most important thing to keep in mind when programming for performance is to vectorise as many of your operations as possible.
This means thinking hard about replacing loops with vector operations. Here is my solution for fast convolution (50 times faster with input vectors of length 1000 each):
convolveFast <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1
  xy <- rep(0, xy)
  for(i in (1:nx)){
    j <- 1:ny
    ij <- i + j - 1
    xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
  }
  xy
}
You will notice that the inner loop (for j in ...) has disappeared. Instead, I replaced it with a vector operation. j is now defined as a vector (j <- 1:ny). Notice also that I refer to the entire vector y, rather than subsetting it (i.e. y instead of y[j]).
j <- 1:ny
ij <- i + j - 1
xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
I wrote a small function to measure performance:
measure.time <- function(fun1, fun2, ...){
  ptm <- proc.time()
  x1 <- fun1(...)
  time1 <- proc.time() - ptm
  ptm <- proc.time()
  x2 <- fun2(...)
  time2 <- proc.time() - ptm
  ident <- all(x1==x2)
  cat("Function 1\n")
  cat(time1)
  cat("\n\nFunction 2\n")
  cat(time2)
  if(ident) cat("\n\nFunctions return identical results")
}
For two vectors of length 1000 each, I get a 98% performance improvement:
x <- runif(1000)
y <- runif(1000)
measure.time(convolveSlow, convolveFast, x, y)
Function 1
7.07 0 7.59 NA NA
Function 2
0.14 0 0.16 NA NA
Functions return identical results
For vectors, you index with [], not [[]], so use xy[ij] etc
Convolution doesn't vectorise easily, but one common trick is to switch to compiled code. The Writing R Extensions manual uses convolution as a running example and shows several alternatives; we also use it a lot in the Rcpp documentation.
As Dirk says, compiled code can be a lot faster. I had to do this for one of my projects and was surprised at the speedup: ~40x faster than Andrie's solution.
> a <- runif(10000)
> b <- runif(10000)
> system.time(convolveFast(a, b))
user system elapsed
7.814 0.001 7.818
> system.time(convolveC(a, b))
user system elapsed
0.188 0.000 0.188
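The author's convolveC itself is not shown in the post; a minimal Rcpp sketch of what such a compiled convolution might look like (a hypothetical example of mine, not the original C code):
library(Rcpp)
cppFunction('
// Hypothetical compiled convolution, for illustration only.
NumericVector convolveRcpp(NumericVector x, NumericVector y) {
  int nx = x.size(), ny = y.size();
  NumericVector xy(nx + ny - 1);          // zero-initialised result
  for (int i = 0; i < nx; ++i)
    for (int j = 0; j < ny; ++j)
      xy[i + j] += x[i] * y[j];
  return xy;
}
')
xx <- runif(1000); yy <- runif(1000)
all.equal(convolveRcpp(xx, yy), convolveSlow(xx, yy))  # should be TRUE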
I made several attempts to speed this up in R before I decided that using C code couldn't be that bad (note: it really wasn't). All of mine were slower than Andrie's, and were variants on adding up the cross-product appropriately. A rudimentary version can be done in just three lines.
convolveNotAsSlow <- function(x, y) {
  xyt <- x %*% t(y)
  ds <- row(xyt)+col(xyt)-1
  tapply(xyt, ds, sum)
}
This version only helps a little.
> a <- runif(1000)
> b <- runif(1000)
> system.time(convolveSlow(a, b))
user system elapsed
6.167 0.000 6.170
> system.time(convolveNotAsSlow(a, b))
user system elapsed
5.800 0.018 5.820
My best version was this:
convolveFaster <- function(x,y) {
  foo <- if (length(x)<length(y)) { y %*% t(x) } else { x %*% t(y) }
  foo.d <- dim(foo)
  bar <- matrix(0, sum(foo.d)-1, foo.d[2])
  bar.rc <- row(bar)-col(bar)
  bar[bar.rc>=0 & bar.rc<foo.d[1]] <- foo
  rowSums(bar)
}
This was quite a bit better, but still not nearly as fast as Andrie's:
> system.time(convolveFaster(a, b))
user system elapsed
0.280 0.038 0.319
The convolveFast function can be optimized a little by carefully using integer math only and replacing (1:ny)-1L with seq.int(0L, ny-1L):
convolveFaster <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1L
  xy <- rep(0L, xy)
  for(i in seq_len(nx)){
    j <- seq_len(ny)
    ij <- i + j - 1L
    xy[i+seq.int(0L, ny-1L)] <- xy[ij] + x[i] * y
  }
  xy
}
How about convolve(x, rev(y), type = "open") in stats?
> x <- runif(1000)
> y <- runif(1000)
> system.time(a <- convolve(x, rev(y), type = "o"))
user system elapsed
0.032 0.000 0.032
> system.time(b <- convolveSlow(x, y))
user system elapsed
11.417 0.060 11.443
> identical(a,b)
[1] FALSE
> all.equal(a,b)
[1] TRUE
(convolve() works via FFTs, so the result agrees with convolveSlow() only up to floating-point rounding; hence identical() is FALSE while all.equal() is TRUE.)
Some say the apply() and sapply() functions are faster than for() loops in R. You could convert the convolution to a function and call it from within apply().
However, there is evidence to the contrary:
http://yusung.blogspot.com/2008/04/speed-issue-in-r-computing-apply-vs.html
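To make that concrete, here is a sketch of my own (not from the original post) of what wrapping the convolution in sapply() might look like, computing each output position as a vectorised cross-product sum; whether this beats the explicit loops depends on the input sizes:
convolveSapply <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sapply(seq_len(nx + ny - 1), function(k) {
    i <- max(1L, k - ny + 1L):min(nx, k)   # x indices that contribute to output k
    sum(x[i] * y[k - i + 1L])
  })
}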