So I am trying to calculate the Pareto front (http://en.wikipedia.org/wiki/Pareto_efficiency) in R. I am able to do it, but not efficiently: as the number of pairs of points increases, the computations slow down considerably.
In general, what I want to do is check for all non-dominated (or dominated) pairs. The way I have been doing this is to find all pairs of points such that xi > X and yi > Y, where (xi, yi) is a single pair and X and Y represent all points x and y. This part works very fast and is easy to implement. However, there is the additional possibility that multiple x values are the same but have different y values; in that case I want to identify, among the points sharing that x value, the one with the lowest y value (and vice versa for points that have identical y values but different x values).
To illustrate this point, here is a picture from Wikipedia:
Basically, I want to be able to identify all points that lie on the red line.
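For a concrete toy case of the tie handling (made-up numbers, just to fix ideas):
x <- c(1, 1, 2, 3)
y <- c(5, 4, 2, 2)
# (1,5) and (1,4) share x = 1: only (1,4), the one with the smaller y, can be on the front.
# (2,2) and (3,2) share y = 2: only (2,2), the one with the smaller x, can be on the front.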
Here is my code that does work but is very inefficient for large datasets:
#Example Data that actually runs quickly
x = runif(10000)
y = runif(10000)
pareto = 1:length(x)
for(i in 1:length(x)){
  cond1 = y[i] != min(y[which(x == x[i])])
  cond2 = x[i] != min(x[which(y == y[i])])
  for(n in 1:length(x)){
    if((x[i] > x[n] & y[i] > y[n]) | (x[i] == x[n] & cond1) | (y[i] == y[n] & cond2)){
      pareto[i] = NA
      break
    }
  }
}
#All points not on the red line should be marked as NA in the pareto variable
The slowdown definitely comes from evaluating (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2), but I cannot find a way around it or a better Boolean expression to capture everything that I want. Any suggestions are greatly appreciated!
Following @BrodieG's answer:
system.time({
  d = data.frame(x, y)
  D = d[order(d$x, d$y, decreasing = FALSE), ]
  front = D[which(!duplicated(cummin(D$y))), ]
})
user system elapsed
0.02 0.00 0.02
which is 0.86/0.02 = 43 times faster!
EDIT: new version:
system.time({
  pareto.2 <- logical(length(x))
  x.sort <- sort(x)
  y.sort <- y[order(x)]
  y.min <- max(y)
  for(i in 1:length(x.sort)) {
    if(pareto.2[i] <- y.sort[i] <= y.min) y.min <- y.sort[i]
  }
})
# user system elapsed
# 0.036 0.000 0.035
OLD VERSION:
This is about 6x faster on my system. You can probably do better with a better algorithm, as well as with Rcpp, but this was straightforward. The trick here is to sort by x: a point is then on the frontier if and only if every point that comes before it in the sorted order (i.e., every point with a smaller or equal x) has a y value at least as large as its own.
system.time({
  pareto.2 <- logical(length(x))
  x.sort <- sort(x)
  y.sort <- y[order(x)]
  for(i in 1:length(x.sort)) {
    pareto.2[i] <- all(y.sort[1:i] >= y.sort[i])
  }
})
# user system elapsed
# 0.86 0.00 0.88
The original:
pareto = 1:length(x)
system.time(
  for(i in 1:length(x)){
    cond1 = y[i] != min(y[which(x == x[i])])
    cond2 = x[i] != min(x[which(y == y[i])])
    for(n in 1:length(x)){
      if((x[i] > x[n] & y[i] > y[n]) | (x[i] == x[n] & cond1) | (y[i] == y[n] & cond2)){
        pareto[i] = NA
        break
      }
    }
  }
)
# user system elapsed
# 5.32 0.00 5.33
And showing the two methods produce the same result (a bit tricky because I need to re-order pareto.2 to the original order of x):
all.equal(pareto.2[match(1:length(x), order(x))], !is.na(pareto))
# [1] TRUE
Wanted to share with you my solution as a function. It's been tested and works pretty well for N Pareto fronts. Set fronts = Inf to calculate all fronts.
pareto_front <- function(x, y, fronts = 1, sort = TRUE) {
  stopifnot(length(x) == length(y))
  d <- data.frame(x, y)
  Dtemp <- D <- d[order(d$x, d$y, decreasing = FALSE), ]
  df <- data.frame()
  i <- 1
  while (nrow(Dtemp) >= 1 & i <= max(fronts)) {
    these <- Dtemp[which(!duplicated(cummin(Dtemp$y))), ]
    these$pareto_front <- i
    df <- rbind(df, these)
    Dtemp <- Dtemp[!row.names(Dtemp) %in% row.names(these), ]
    i <- i + 1
  }
  ret <- merge(x = d, y = df, by = c("x", "y"), all.x = TRUE, sort = sort)
  return(ret)
}
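A quick usage sketch (random points are just for illustration; per the merge above, pareto_front is NA for points beyond the requested number of fronts):
set.seed(1)
pts <- pareto_front(runif(20), runif(20), fronts = 2)
head(pts)                       # x, y, and the assigned front number
subset(pts, pareto_front == 1)  # the non-dominated (first) front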
Related
I need to calculate entries of a vector whose length I do not know beforehand. How to do so efficiently?
A trivial solution is to "grow" it: start with a small or empty vector and successively append new entries until the stopping criterion is reached. For example:
foo <- numeric(0)
while ( sum(foo) < 100 ) foo <- c(foo,runif(1))
length(foo)
# 195
However, "growing" vectors is frowned upon in R for performance reasons.
Of course, I could "grow it in chunks": pre-allocate a "good-sized" vector, fill it, double its length when it is full, and finally cut it down to size. But this feels error-prone and will make for inelegant code.
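For concreteness, a rough sketch of what I mean (same toy stopping rule as above; hypothetical and unpolished):
foo <- numeric(16)      # pre-allocate a "good-sized" vector
n <- 0                  # number of entries actually filled
s <- 0                  # running sum for the stopping criterion
while (s < 100) {
  if (n == length(foo)) foo <- c(foo, numeric(length(foo)))  # double the length when full
  n <- n + 1
  foo[n] <- runif(1)
  s <- s + foo[n]
}
foo <- foo[seq_len(n)]  # finally cut it down to size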
Is there a better or canonical way to do this? (In my actual application, the calculation and the stopping criterion are a bit more complicated, of course.)
In reply to some useful comments
Even if you don't know the length beforehand, do you know the maximum possible length it can theoretically have? In such cases I tend to initialize the vector with that length and after the loop cut the NAs or remove the unused entries based on the latest index value.
No, the maximum length is not known in advance.
Do you need to keep all values as the vector grows?
Yes, I do.
What about something like rand_num <- runif(300); rand_num[cumsum(rand_num) < 100], where you choose a vector large enough that the condition is met with high probability? You can of course check it and use an even bigger number if it's not met. I've tested up to runif(10000) and it's still faster than "growing".
My actual use case involves a dynamic calculation, which I can't simply vectorize (otherwise I would not be asking).
Specifically, to approximate the convolution of negative binomial random variables, I need to calculate the probability masses of the integer random variable $K$ in Theorem 2 in Furman, 2007 up to a high cumulative probability. These masses $pr_k$ involve some intricate recursive sums.
I could "grow it in chunks": pre-allocate a "good-sized" vector, fill it, double its length when it is full, and finally cut it down to size. But this feels error-prone and will make for inelegant code.
Sounds like you are referring to the accepted answer of Collecting an unknown number of results in a loop. Have you coded it up and tried it? The idea of length doubling is more than sufficient (see the end of this answer), as the length will grow geometrically. I will demonstrate my method in the following.
For testing purposes, wrap your code in a function. Note how I avoid doing sum(z) for every while test.
ref <- function (stop_sum, timing = TRUE) {
  set.seed(0)                    ## fix a seed to compare performance
  if (timing) t1 <- proc.time()[[3]]
  z <- numeric(0)
  sum_z <- 0
  while ( sum_z < stop_sum ) {
    z_i <- runif(1)
    z <- c(z, z_i)
    sum_z <- sum_z + z_i
  }
  if (timing) {
    t2 <- proc.time()[[3]]
    return(t2 - t1)              ## return execution time
  } else {
    return(z)                    ## return result
  }
}
Chunking is necessary to reduce the operational costs of concatenation.
template <- function (chunk_size, stop_sum, timing = TRUE) {
  set.seed(0)                             ## fix a seed to compare performance
  if (timing) t1 <- proc.time()[[3]]
  z <- vector("list")                     ## store all segments in a list
  sum_z <- 0                              ## cumulative sum
  while ( sum_z < stop_sum ) {
    segmt <- numeric(chunk_size)          ## initialize a segment
    i <- 1
    while (i <= chunk_size) {
      z_i <- runif(1)                     ## call a function & get a value
      sum_z <- sum_z + z_i                ## update cumulative sum
      segmt[i] <- z_i                     ## fill in the segment
      if (sum_z >= stop_sum) break        ## ready to break at any time
      i <- i + 1
    }
    ## grow the list
    if (sum_z < stop_sum) z <- c(z, list(segmt))
    else z <- c(z, list(segmt[1:i]))
  }
  if (timing) {
    t2 <- proc.time()[[3]]
    return(t2 - t1)                       ## return execution time
  } else {
    return(unlist(z))                     ## return result
  }
}
Let's check correctness first.
z <- ref(1e+4, FALSE)
z1 <- template(5, 1e+4, FALSE)
z2 <- template(1000, 1e+4, FALSE)
range(z - z1)
#[1] 0 0
range(z - z2)
#[1] 0 0
Let's then compare speed.
## reference implementation
t0 <- ref(1e+4, TRUE)
## unrolling implementation
trial_chunk_size <- seq(5, 1000, by = 5)
tm <- sapply(trial_chunk_size, template, stop_sum = 1e+4, timing = TRUE)
## visualize timing statistics
plot(trial_chunk_size, tm, type = "l", ylim = c(0, t0), col = 2, bty = "l")
abline(h = t0, lwd = 2)
Looks like chunk_size = 200 is sufficiently good, and the speedup factor is
t0 / tm[trial_chunk_size == 200]
#[1] 16.90598
Let's finally see how much time is spent growing the vector with c, via profiling.
Rprof("a.out")
z0 <- ref(1e+4, FALSE)
Rprof(NULL)
summaryRprof("a.out")$by.self
# self.time self.pct total.time total.pct
#"c" 1.68 90.32 1.68 90.32
#"runif" 0.12 6.45 0.12 6.45
#"ref" 0.06 3.23 1.86 100.00
Rprof("b.out")
z1 <- template(200, 1e+4, FALSE)
Rprof(NULL)
summaryRprof("b.out")$by.self
# self.time self.pct total.time total.pct
#"runif" 0.10 83.33 0.10 83.33
#"c" 0.02 16.67 0.02 16.67
Adaptive chunk_size with linear growth
ref has O(N * N) operational complexity where N is the length of the final vector. template in principle has O(M * M) complexity, where M = N / chunk_size. To attain linear complexity O(N), chunk_size needs to grow with N, but a linear growth is sufficient: chunk_size <- chunk_size + 1.
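(A rough way to see why: with chunk sizes c, c + 1, c + 2, ..., after m chunks roughly N ≈ c*m + m^2/2 values have been stored, so m = O(sqrt(N)); the cost of growing the list by concatenation is then on the order of 1 + 2 + ... + m = O(m^2) = O(N).)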
template1 <- function (chunk_size, stop_sum, timing = TRUE) {
  set.seed(0)                             ## fix a seed to compare performance
  if (timing) t1 <- proc.time()[[3]]
  z <- vector("list")                     ## store all segments in a list
  sum_z <- 0                              ## cumulative sum
  while ( sum_z < stop_sum ) {
    segmt <- numeric(chunk_size)          ## initialize a segment
    i <- 1
    while (i <= chunk_size) {
      z_i <- runif(1)                     ## call a function & get a value
      sum_z <- sum_z + z_i                ## update cumulative sum
      segmt[i] <- z_i                     ## fill in the segment
      if (sum_z >= stop_sum) break        ## ready to break at any time
      i <- i + 1
    }
    ## grow the list
    if (sum_z < stop_sum) z <- c(z, list(segmt))
    else z <- c(z, list(segmt[1:i]))
    ## increase chunk_size
    chunk_size <- chunk_size + 1
  }
  ## remove this line if you want
  cat(sprintf("final chunk size = %d\n", chunk_size))
  if (timing) {
    t2 <- proc.time()[[3]]
    return(t2 - t1)                       ## return execution time
  } else {
    return(unlist(z))                     ## return result
  }
}
A quick test verifies that we have attained linear complexity.
template1(200, 1e+4)
#final chunk size = 283
#[1] 0.103
template1(200, 1e+5)
#final chunk size = 664
#[1] 1.076
template1(200, 1e+6)
#final chunk size = 2012
#[1] 10.848
template1(200, 1e+7)
#final chunk size = 6330
#[1] 108.183
Suppose I have the following data frame
set.seed(36)
n <- 300
dat <- data.frame(x = round(runif(n,0,200)), y = round(runif(n, 0, 500)))
d <- dat[order(dat$y),]
For each value of d$y <= 300, I have to create a variable res in which the numerator is the sum of the indicator (d$x <= d$y[i]) and the denominator is the sum of the indicator (d$y >= d$y[i]). I have written the code as a for loop:
res <- NULL
for( i in seq_len(sum(d$y <= 300)) ){
  numerator <- sum(d$x <= d$y[i])
  denominator <- sum(d$y >= d$y[i])
  res[i] <- numerator / denominator
}
But my concern is that when the number of observations of x and y is large, that is, when the number of rows of the data frame increases, the for loop becomes slow. Additionally, if I simulate the data 1000 times and run the for loop each time, the program becomes inefficient.
What would be a more efficient solution?
This depends on d already being sorted as it is:
# example data
set.seed(36)
n <- 1e5
dat <- data.frame(x = round(runif(n,0,200)), y = round(runif(n, 0, 500)))
d <- dat[order(dat$y),]
My suggestion (thanks to #alexis_laz for the denominator):
system.time(res3 <- {
  xs <- sort(d$x)          # sorted x
  yt <- d$y[d$y <= 300]    # truncated y
  num = findInterval(yt, xs)
  den = length(d$y) - match(yt, d$y) + 1L
  num/den
})
# user system elapsed
# 0 0 0
OP's approach:
system.time(res <- {
  res <- NULL
  for( i in seq_len(sum(d$y <= 300)) ){
    numerator <- sum(d$x <= d$y[i])
    denominator <- sum(d$y >= d$y[i])
    res[i] <- numerator / denominator
  }
  res
})
# user system elapsed
# 50.77 1.13 52.10
# verify it matched
all.equal(res,res3) # TRUE
#d.b's approach:
system.time(res2 <- {
  numerator = rowSums(outer(d$y, d$x, ">="))
  denominator = rowSums(outer(d$y, d$y, "<="))
  res2 = numerator/denominator
  res2 = res2[d$y <= 300]
  res2
})
# Error: cannot allocate vector of size 74.5 Gb
# ^ This error is common when using outer() on large-ish problems
Vectorization. Generally, tasks are faster in R if they can be vectorized. The key functions related to ordered vectors have confusing names (findInterval, sort, order and cut), but fortunately they all work on vectors.
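As a tiny made-up illustration of what findInterval is doing for the numerator above:
xs <- c(1, 3, 5, 7)            # a sorted "x" vector
findInterval(c(2, 5, 8), xs)   # how many values of xs are <= each query value
# [1] 1 3 4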
Continuous vs discrete. The match above should be a fast way to compute the denominator whether the data is continuous or has mass points / repeating values. If the data is continuous (and so has no repeats), the denominator can just be seq(length(xs), length = length(yt), by=-1). If it is fully discrete and has a lot of repetition (like the example here), there might be some way to make that faster as well, maybe like one of these:
den2 <- inverse.rle(with(rle(yt), list(
  values = length(xs) - length(yt) + rev(cumsum(rev(lengths))),
  lengths = lengths)))

tab <- unname(table(yt))
den3 <- rep(rev(cumsum(rev(tab))) + length(xs) - length(yt), tab)
# verify
all.equal(den,den2) # TRUE
all.equal(den,den3) # TRUE
findInterval will still work for the numerator for continuous data. It's not ideal for the repeated-values case considered here I guess (since we're redundantly finding the interval for many repeated yt values). Similar ideas for speeding that up likely apply.
Other options. As #chinsoon suggested, the data.table package might be a good fit if findInterval is too slow, since it has a lot of features focused on sorted data, but it's not obvious to me how to apply it here.
Instead of running a loop, generate all the numerators and denominators at once. This also allows you to keep track of which res is associated with which x and y. Later, you can keep only the ones you want.
You can use outer for element wise comparison between vectors.
numerator = rowSums(outer(d$y, d$x, ">=")) #Compare all y against all x
denominator = rowSums(outer(d$y, d$y, "<=")) #Compare all y against itself
res2 = numerator/denominator #Obtain 'res' for all rows
#I would first 'cbind' res2 to d and only then remove the ones for 'y <=300'
res2 = res2[d$y <= 300] #Keep only those 'res' that you want
Since this is using rowSums, this should be faster.
I am trying to build a function that creates a vector in which no item is the sum of any combination of the other items in the list (without duplication).
This function does the job but is quite slow... any bright thoughts on how to improve it?
sum_fun <- function(k)
{
  out_list <- c(2,3,4)
  new_num <- 4
  while(length(out_list) < k)
  {
    new_num <- new_num + 1
    #Check if new_num can be written as a sum of the terms in out_list
    new_valid <- T
    for (i in 2:(length(out_list) - 1)){
      if (new_num %in% (apply(combn(out_list, i), FUN = sum, MAR = 2)))
      {
        new_valid <- F
        break
      }
    }
    if (new_valid)
    {
      out_list <- c(out_list, new_num)
    }
  }
  return(out_list)
}
This was a good question. I made some changes to your original function and got mine to run a bit quicker than your function. On a side note, how many are you trying to find?
The main idea is that we shouldn't calculate more things more often than we absolutely have to. I think the for loop was probably slowing things down a bit, plus, how many of the column sums were repeated? If we can "de-dup" the list, maybe we can search through it more quickly [reduce, reuse, recycle :) ].
sum_fun2 <- function(k)
{
  out_list <- c(2,3,4)       #dummy list
  new_num <- 4               #dummy number
  calc_big_sum <- T          #calculate big sum on the first go
  while(length(out_list) < k)
  {
    new_num <- new_num + 1   #dummy number to add
    #calculate big sum, and then find unique values
    if(calc_big_sum){
      big_sum <- unique(unlist(lapply(lapply(2:(length(out_list) - 1),
                                             FUN = function(x) combn(out_list, m = x)),
                                      FUN = function(y) apply(y, 2, sum))))
    }
    if(new_num %in% big_sum){
      calc_big_sum = F       #don't make it calculate the sum again
    } else {
      out_list <- c(out_list, new_num)  #add number to list
      calc_big_sum = T       #make it calculate a new sum
    }
  }
  return(out_list)
}
> system.time(sum_fun2(10))
user system elapsed
0.03 0.00 0.03
> system.time(sum_fun(10))
user system elapsed
1.30 0.00 1.27
> system.time(sum_fun2(14))
user system elapsed
3.35 0.07 3.47
> system.time(sum_fun(14))
## I ended it
Timing stopped at: 39.86 0 40.02
So I am trying to figure out if there is a better way to add multiple conditional statements to an if clause in R in order to speed up the process. Below is some code that I wrote that runs very fast on large datasets in the simple case and not so fast in the not-so-simple case. Any suggestions are greatly appreciated! Also, the tic-toc function is at the very bottom of the question in case you would like to run it yourself and see how fast the function runs.
Also, to give some intuition of what the code is doing: the first chunk simply determines whether there are any pairs of x's and y's that have larger values than all of the other x's and y's.
The second chunk of code does the same thing, however, it adds the condition that if any of the x values are equal to each other, then check which one has the lowest y value. Likewise, if any of the y values are equal to each other, then check which one has the lowest x value.
So, running the code in the simple case I have the following:
tic()
x = runif(10000)
y = runif(10000)
front = 1:length(x)
for(i in 1:length(x)){
  for(n in 1:length(x)){
    if((x[i] > x[n] & y[i] > y[n])){
      front[i] = NA
      break
    }
  }
}
toc()
So as you can see, I am only evaluating the single condition that x[i]>x[n] & y[i]>y[n]
toc()
elapsed
1.28
and the code above runs in 1.28 seconds.
Now, running the code when I have three conditions to check I have the following:
tic()
x = runif(10000)
y = runif(10000)
front = 1:length(x)
for(i in 1:length(x)){
  for(n in 1:length(x)){
    if((x[i] > x[n] & y[i] > y[n]) | (x[i] == x[n] & y[i] != min(y[which(x == x[i])])) | (y[i] == y[n] & x[i] != min(x[which(y == y[i])]))){
      front[i] = NA
      break
    }
  }
}
toc()
So as you can see, I now have to check three conditions inside my if statement, namely,
(x[i]>x[n] & y[i]>y[n]) | (x[i]==x[n] & y[i]!=min(y[which(x==x[i])])) | (y[i]==y[n] & x[i]!=min(x[which(y==y[i])]))
However, this leads to a huge computational burden in R and makes the code much slower.
> toc()
elapsed
74.47
We see that the newly adapted code has now slowed down considerably, to 74.47 seconds. I am looking for either alternative function calls that would speed up my code, or simply a "better" way to rewrite it so that the code is not so slow.
Here is the code for the tic-toc function if needed:
tic <- function(gcFirst = TRUE, type = c("elapsed", "user.self", "sys.self"))
{
  type <- match.arg(type)
  assign(".type", type, envir = baseenv())
  if(gcFirst) gc(FALSE)
  tic <- proc.time()[type]
  assign(".tic", tic, envir = baseenv())
  invisible(tic)
}

toc <- function()
{
  type <- get(".type", envir = baseenv())
  toc <- proc.time()[type]
  tic <- get(".tic", envir = baseenv())
  print(toc - tic)
  invisible(toc)
}
EDIT for sashkello
So my code now looks like this:
library(mvtnorm)
#Here are the variables I will be working with
> x
[1] 0.53137100 0.75357474 0.87904120 0.29727488 0.00000000 0.00000000
[7] 0.00000000 0.00000000 0.00000000 0.04059217
> y
[1] 4.873500 3.896917 1.258215 5.776484 12.475491 5.273784 13.803158
[8] 4.472204 2.629839 6.689242
> front
[1] NA NA 3 NA NA NA NA NA 9 NA
> all.preds
[1] 0.596905183 0.027696850 1.005666896 0.007688514 3.900000000
x = x[!is.na(front)]
y = y[!is.na(front)]
mu = c(all.preds[1],all.preds[3])
sigma = matrix(c(all.preds[2],0,0,all.preds[4]),nrow=2)
z = rmvnorm(10000,mu,sigma)
z[,1] = sapply(z[,1],function(x){max(x,0)})
points(z,col="black",pch=19,cex=.01)
temp = 1:nrow(z)
for(i in 1:length(temp)){
  cond1 = z[i,2] != min(z[which(z[,1] == z[i,1]), 2])
  cond2 = z[i,1] != min(z[which(z[,2] == z[i,2]), 1])
  for(n in 1:length(x)){
    if((z[i,1] > x[n] & z[i,2] > y[n]) | (z[i,1] == x[n] & cond1) | (z[i,2] == y[n] & cond2)){
      temp[i] = NA
      break
    }
  }
}
prop = sum(!is.na(temp))/length(temp)
and the cond1 and cond2 statements still take horribly long. Any suggestions?
You can put y[i]!=min(y[which(x==x[i])]) and x[i]!=min(x[which(y==y[i])]) before the second loop, because they both only involve i.
for(i in 1:length(x)){
  cond1 = y[i] != min(y[which(x == x[i])])
  cond2 = x[i] != min(x[which(y == y[i])])
  for(n in 1:length(x)){
    if((x[i] > x[n] & y[i] > y[n]) | (x[i] == x[n] & cond1) | (y[i] == y[n] & cond2)){
      front[i] = NA
      break
    }
  }
}
This should speed things up significantly because both min and which are extremely slow and you are running them every time in the second loop.
Since you asked for it, here is an efficient way to calculate cond1 outside of a for loop (which you probably don't need at all):
#some data
set.seed(42)
z <- matrix(sample(1:5, 200, TRUE), ncol = 2)

#your loop
cond1 <- logical(100)
for (i in 1:100) {
  cond1[i] = z[i,2] != min(z[which(z[,1] == z[i,1]), 2])
}
#alternative
library(data.table)
DT <- data.table(z)
DT[, id:=.I]
DT[, cond1:=V2!=min(V2), by=V1]
#compare results
identical(DT[, cond1], cond1)
#[1] TRUE
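If cond2 is needed as well, the same pattern should work by grouping on the other column (a sketch along the same lines, not benchmarked here):
DT[, cond2 := V1 != min(V1), by = V2]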
I have R code that computes the convolution of two functions...
convolveSlow <- function(x, y) {
  nx <- length(x); ny <- length(y)
  xy <- numeric(nx + ny - 1)
  for(i in seq(length = nx)) {
    xi <- x[[i]]
    for(j in seq(length = ny)) {
      ij <- i + j - 1
      xy[[ij]] <- xy[[ij]] + xi * y[[j]]
    }
  }
  xy
}
Is there a way to remove the two for loops and make the code run faster?
Thank you
San
Since R is very fast at computing vector operations, the most important thing to keep in mind when programming for performance is to vectorise as many of your operations as possible.
This means thinking hard about replacing loops with vector operations. Here is my solution for fast convolution (50 times faster with input vectors of length 1000 each):
convolveFast <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1
  xy <- rep(0, xy)
  for(i in (1:nx)){
    j <- 1:ny
    ij <- i + j - 1
    xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
  }
  xy
}
You will notice that the inner loop (for j in ...) has disappeared. Instead, I replaced it with a vector operation. j is now defined as a vector (j <- 1:ny). Notice also that I refer to the entire vector y, rather than subsetting it (i.e. y instead of y[j]).
j <- 1:ny
ij <- i + j - 1
xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
I wrote a small function to measure performance:
measure.time <- function(fun1, fun2, ...){
  ptm <- proc.time()
  x1 <- fun1(...)
  time1 <- proc.time() - ptm

  ptm <- proc.time()
  x2 <- fun2(...)
  time2 <- proc.time() - ptm

  ident <- all(x1 == x2)
  cat("Function 1\n")
  cat(time1)
  cat("\n\nFunction 2\n")
  cat(time2)
  if(ident) cat("\n\nFunctions return identical results")
}
For two vectors of length 1000 each, I get a 98% performance improvement:
x <- runif(1000)
y <- runif(1000)
measure.time(convolveSlow, convolveFast, x, y)
Function 1
7.07 0 7.59 NA NA
Function 2
0.14 0 0.16 NA NA
Functions return identical results
For vectors, you index with [], not [[]], so use xy[ij] etc
Convolution doesn't vectorise easily, but one common trick is to switch to compiled code. The Writing R Extensions manual uses convolution as a running example and shows several alternatives; we also use it a lot in the Rcpp documentation.
As Dirk says, compiled code can be a lot faster. I had to do this for one of my projects and was surprised at the speedup: ~40x faster than Andrie's solution.
> a <- runif(10000)
> b <- runif(10000)
> system.time(convolveFast(a, b))
user system elapsed
7.814 0.001 7.818
> system.time(convolveC(a, b))
user system elapsed
0.188 0.000 0.188
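convolveC itself is not shown in this answer; as a rough sketch of what such a compiled version might look like (using Rcpp::cppFunction as an assumed tool, not the answerer's original code):
library(Rcpp)
cppFunction('
NumericVector convolveC(NumericVector x, NumericVector y) {
  int nx = x.size(), ny = y.size();
  NumericVector xy(nx + ny - 1);     // result, initialized to zero
  for (int i = 0; i < nx; i++)
    for (int j = 0; j < ny; j++)
      xy[i + j] += x[i] * y[j];      // same recurrence as convolveSlow
  return xy;
}')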
I made several attempts to speed this up in R before I decided that using C code couldn't be that bad (note: it really wasn't). All of mine were slower than Andrie's, and were variants on adding up the cross-product appropriately. A rudimentary version can be done in just three lines.
convolveNotAsSlow <- function(x, y) {
  xyt <- x %*% t(y)
  ds <- row(xyt) + col(xyt) - 1
  tapply(xyt, ds, sum)
}
This version only helps a little.
> a <- runif(1000)
> b <- runif(1000)
> system.time(convolveSlow(a, b))
user system elapsed
6.167 0.000 6.170
> system.time(convolveNotAsSlow(a, b))
user system elapsed
5.800 0.018 5.820
My best version was this:
convolveFaster <- function(x, y) {
  foo <- if (length(x) < length(y)) { y %*% t(x) } else { x %*% t(y) }
  foo.d <- dim(foo)
  bar <- matrix(0, sum(foo.d) - 1, foo.d[2])
  bar.rc <- row(bar) - col(bar)
  bar[bar.rc >= 0 & bar.rc < foo.d[1]] <- foo
  rowSums(bar)
}
This was quite a bit better, but still not nearly as fast as Andrie's:
> system.time(convolveFaster(a, b))
user system elapsed
0.280 0.038 0.319
The convolveFast function can be optimized a little by carefully using integer math only and replacing (1:ny)-1L with seq.int(0L, ny-1L):
convolveFaster <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1L
  xy <- rep(0L, xy)
  for(i in seq_len(nx)){
    j <- seq_len(ny)
    ij <- i + j - 1L
    xy[i + seq.int(0L, ny - 1L)] <- xy[ij] + x[i] * y
  }
  xy
}
How about convolve(x, rev(y), type = "open") in stats?
> x <- runif(1000)
> y <- runif(1000)
> system.time(a <- convolve(x, rev(y), type = "o"))
user system elapsed
0.032 0.000 0.032
> system.time(b <- convolveSlow(x, y))
user system elapsed
11.417 0.060 11.443
> identical(a,b)
[1] FALSE
> all.equal(a,b)
[1] TRUE
(convolve() works via the fast Fourier transform, so its result differs from the direct loop only by floating-point rounding, which is why identical() is FALSE but all.equal() is TRUE.)
Some say the apply() and sapply() functions are faster than for() loops in R. You could convert the convolution to a function and call it from within apply() or sapply(); a sketch is at the end of this answer.
However, there is evidence to the contrary
http://yusung.blogspot.com/2008/04/speed-issue-in-r-computing-apply-vs.html
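For completeness, a hedged sketch of that approach with sapply() (convolveApply is a hypothetical name; as the link above suggests, don't expect it to beat the vectorised or compiled versions):
convolveApply <- function(x, y) {
  nx <- length(x); ny <- length(y)
  # output index k runs over 1..(nx + ny - 1); xy[k] = sum over valid i of x[i] * y[k - i + 1]
  sapply(seq_len(nx + ny - 1), function(k) {
    i <- max(1, k - ny + 1):min(k, nx)
    sum(x[i] * y[k - i + 1])
  })
}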