Avoid two for loops in R

I have R code that computes the convolution of two vectors:
convolveSlow <- function(x, y) {
  nx <- length(x); ny <- length(y)
  xy <- numeric(nx + ny - 1)
  for(i in seq(length = nx)) {
    xi <- x[[i]]
    for(j in seq(length = ny)) {
      ij <- i+j-1
      xy[[ij]] <- xy[[ij]] + xi * y[[j]]
    }
  }
  xy
}
Is there a way to remove the two for loops and make the code run faster?
Thank you

Since R is very fast at computing vector operations, the most important thing to keep in mind when programming for performance is to vectorise as many of your operations as possible.
This means thinking hard about replacing loops with vector operations. Here is my solution for fast convolution (50 times faster with input vectors of length 1000 each):
convolveFast <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1
  xy <- rep(0, xy)
  for(i in (1:nx)){
    j <- 1:ny
    ij <- i + j - 1
    xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
  }
  xy
}
You will notice that the inner loop (for j in ...) has disappeared. Instead, I replaced it with a vector operation. j is now defined as a vector (j <- 1:ny). Notice also that I refer to the entire vector y, rather than subsetting it (i.e. y instead of y[j]).
j <- 1:ny
ij <- i + j - 1
xy[i+(1:ny)-1] <- xy[ij] + x[i] * y
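A minimal sketch of the same idea, with two small changes that are my own assumptions rather than part of the answer: the offset vector is computed once outside the loop, and the loop runs over whichever input is shorter (convolution is symmetric, so swapping x and y does not change the result). The name convolveShortLoop is hypothetical.

```r
# Hypothetical helper: vectorised inner step, loop over the shorter input only.
convolveShortLoop <- function(x, y) {
  # Convolution is symmetric, so put the shorter vector in x.
  if (length(x) > length(y)) {
    tmp <- x
    x <- y
    y <- tmp
  }
  nx <- length(x)
  ny <- length(y)
  xy <- numeric(nx + ny - 1)
  offsets <- seq_len(ny) - 1L       # hoisted: positions of y relative to i
  for (i in seq_len(nx)) {
    idx <- i + offsets              # the ny output slots that x[i] contributes to
    xy[idx] <- xy[idx] + x[i] * y   # one vector operation replaces the inner loop
  }
  xy
}

# Usage: convolving c(1, 2) with c(1, 2, 3) gives c(1, 4, 7, 6)
convolveShortLoop(c(1, 2), c(1, 2, 3))
```

With nx much smaller than ny, this keeps the number of R-level iterations at min(nx, ny), which is where the interpreter overhead is paid.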
I wrote a small function to measure performance:
measure.time <- function(fun1, fun2, ...){
  ptm <- proc.time()
  x1 <- fun1(...)
  time1 <- proc.time() - ptm
  ptm <- proc.time()
  x2 <- fun2(...)
  time2 <- proc.time() - ptm
  ident <- all(x1==x2)
  cat("Function 1\n")
  cat(time1)
  cat("\n\nFunction 2\n")
  cat(time2)
  if(ident) cat("\n\nFunctions return identical results")
}
For two vectors of length 1000 each, I get a 98% reduction in run time:
x <- runif(1000)
y <- runif(1000)
measure.time(convolveSlow, convolveFast, x, y)
Function 1
7.07 0 7.59 NA NA
Function 2
0.14 0 0.16 NA NA
Functions return identical results
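The equality check in measure.time uses ==, which works here because both functions add the products in the same order. When two implementations sum in a different order (or go through an FFT), results can differ in the last few bits, so a tolerance-based comparison is safer. A minimal sketch using base R's system.time() and all.equal(); the name compare_funs is hypothetical.

```r
# Hypothetical helper: time two functions and compare their results with a tolerance.
compare_funs <- function(fun1, fun2, ...) {
  time1 <- system.time(res1 <- fun1(...))
  time2 <- system.time(res2 <- fun2(...))
  list(
    time1 = time1,
    time2 = time2,
    # all.equal() tolerates tiny floating-point differences; isTRUE() turns its
    # character description of any mismatch into FALSE
    same = isTRUE(all.equal(res1, res2))
  )
}

# Usage: summing in a different order need not be bit-identical, but passes all.equal()
v <- c(0.1, 0.2, 0.3)
cmp <- compare_funs(function(z) sum(z), function(z) sum(rev(z)), v)
cmp$same
```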

For atomic vectors, index with [] rather than [[]] (which only ever selects a single element), so use xy[ij] etc.
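A short illustration of the difference: on an atomic vector, [[ ]] selects exactly one element, while [ ] accepts a vector of indices, which is what the vectorised solutions rely on.

```r
v <- c(10, 20, 30, 40)
v[[2]]        # a single element: 20
v[2:3]        # single brackets take an index vector: 20 30
# v[[2:3]] is an error: [[ ]] on an atomic vector selects exactly one element
bad <- tryCatch(v[[2:3]], error = function(e) "error")
bad
```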
Convolution doesn't vectorise easily, but one common trick is to switch to compiled code. The Writing R Extensions manual uses convolution as a running example and shows several alternatives; we also use it a lot in the Rcpp documentation.

As Dirk says, compiled code can be a lot faster. I had to do this for one of my projects and was surprised at the speedup: ~40x faster than Andrie's solution.
> a <- runif(10000)
> b <- runif(10000)
> system.time(convolveFast(a, b))
user system elapsed
7.814 0.001 7.818
> system.time(convolveC(a, b))
user system elapsed
0.188 0.000 0.188
I made several attempts to speed this up in R before I decided that using C code couldn't be that bad (note: it really wasn't). All of mine were slower than Andrie's, and were variants on adding up the cross-product appropriately. A rudimentary version can be done in just three lines.
convolveNotAsSlow <- function(x, y) {
  xyt <- x %*% t(y)
  ds <- row(xyt)+col(xyt)-1
  tapply(xyt, ds, sum)
}
This version only helps a little.
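One possible tweak to convolveNotAsSlow, offered as an untimed sketch: tapply() calls sum() once per group from R, whereas base R's rowsum() adds up a vector by group in compiled code. outer(x, y) computes the same outer product as x %*% t(y). The name convolveRowsum is hypothetical.

```r
# Hypothetical variant: anti-diagonal sums of the outer product via rowsum()
convolveRowsum <- function(x, y) {
  xyt <- outer(x, y)                 # xyt[i, j] = x[i] * y[j]
  ds <- row(xyt) + col(xyt) - 1L     # output position i + j - 1 of each product
  # rowsum() adds up the products sharing a position; groups come back sorted
  as.vector(rowsum(as.vector(xyt), as.vector(ds)))
}

convolveRowsum(c(1, 2), c(1, 2, 3))  # 1 4 7 6
```

It still materialises the full nx x ny product matrix, so memory grows with the product of the lengths.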
> a <- runif(1000)
> b <- runif(1000)
> system.time(convolveSlow(a, b))
user system elapsed
6.167 0.000 6.170
> system.time(convolveNotAsSlow(a, b))
user system elapsed
5.800 0.018 5.820
My best version was this:
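convolveFaster <- function(x,y) {
  # outer product, with the longer vector along the rows
  foo <- if (length(x)<length(y)) {y %*% t(x)} else { x %*% t(y) }
  foo.d <- dim(foo)
  # shift column c of foo down by c - 1 rows, so each row of bar collects one anti-diagonal
  bar <- matrix(0, sum(foo.d)-1, foo.d[2])
  bar.rc <- row(bar)-col(bar)
  bar[bar.rc>=0 & bar.rc<foo.d[1]]<-foo
  rowSums(bar)
}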
This was quite a bit better, but still not nearly as fast as Andrie's:
> system.time(convolveFaster(a, b))
user system elapsed
0.280 0.038 0.319

The convolveFast function can be optimized a little by carefully using integer arithmetic for the index computations and replacing (1:ny)-1 with seq.int(0L, ny-1L):
convolveFaster <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  xy <- nx + ny - 1L
  xy <- rep(0L, xy)
  for(i in seq_len(nx)){
    j <- seq_len(ny)
    ij <- i + j - 1L
    xy[i+seq.int(0L, ny-1L)] <- xy[ij] + x[i] * y
  }
  xy
}

How about convolve(x, rev(y), type = "open") in stats?
> x <- runif(1000)
> y <- runif(1000)
> system.time(a <- convolve(x, rev(y), type = "o"))
user system elapsed
0.032 0.000 0.032
> system.time(b <- convolveSlow(x, y))
user system elapsed
11.417 0.060 11.443
> identical(a,b)
> all.equal(a,b)
[1] TRUE
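Why rev(y) is needed: according to the stats::convolve documentation, type = "open" computes r[k] = sum(x[k-m+i] * y[i]) with m = length(y), and the usual convolution is obtained as convolve(x, rev(y), type = "o"). Because the computation goes through fft(), the result generally agrees with the loop only up to rounding, so all.equal() is the appropriate check. A small check with an asymmetric y:

```r
x <- c(1, 2)
y <- c(1, 2, 3)
# Direct definition: xy[k] = sum over i of x[i] * y[k - i + 1]
direct <- c(1 * 1, 1 * 2 + 2 * 1, 1 * 3 + 2 * 2, 2 * 3)   # 1 4 7 6
fft_based <- convolve(x, rev(y), type = "open")
all.equal(fft_based, direct)   # compare with a tolerance, not identical()
```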

Some say the apply() and sapply() functions are faster than for() loops in R. You could convert the convolution to a function and call it from within apply().
However, there is evidence to the contrary.
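For reference, here is how the convolution could be phrased with the apply family, as a sketch (convolveVapply is a hypothetical name): one vapply() call per output element, each summing over the valid range of i. It avoids an explicit double loop, but vapply() still calls an R function once per element, so there is no guarantee it beats a plain for loop; time it on your own data.

```r
# Hypothetical apply-family version: one vapply() call per output element.
convolveVapply <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  vapply(seq_len(nx + ny - 1L), function(k) {
    # the indices i with 1 <= i <= nx and 1 <= k - i + 1 <= ny
    i <- max(1L, k - ny + 1L):min(k, nx)
    sum(x[i] * y[k - i + 1L])
  }, numeric(1))
}

convolveVapply(c(1, 2), c(1, 2, 3))   # 1 4 7 6
```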


for loop in R when we have LOOCV

I have two nested for loops in R over data with around 150,000 observations. I tried the apply() family of functions, but they were slower than the for loops in my case. Here is my code:
xs <- 0:200
result <- matrix(0, k, N)
for (j in 1:N){
  for (i in 1:k){
    a <- sum(dnorm(xs[i], xm[-j], bx))
    b <- sum(dnorm(xs[i], x[-ind[j]], bx))
    result[i,j] <- a/b
  }
}
Here k = 500 and N = 150000, x is the location at each time t (for all observations), and xm is a specific x at a specific coordinate that I filtered here. At each time j we observe xm, so we remove it from the data and fit the model with the rest of the dataset. I had an if-else condition here, which I removed to make the loop faster.
It's very slow; I would be very thankful for your help!
Using dummy values ind, x, and xm, here is a solution that runs in about 10 seconds on my machine (>1000 times faster than the original code).
# start with a small N for verification
N <- 15e2L
xm <- runif(N)
x <- runif(N)
ind <- sample(N)
k <- 501L
xs <- 0:500
bx <- 2
# proposed solution
system.time({
  a <- outer(xs, xm, function(x, y) dnorm(x, y, bx))
  b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx))
  result1 <- (rowSums(a) - a)/(rowSums(b) - b)
})
#> user system elapsed
#> 0.08 0.02 0.10
# OP's solution
system.time({
  result2 <- matrix(0, k, N)
  for (j in 1:N){
    for (i in 1:k){
      a <- sum(dnorm(xs[i], xm[-j], bx))
      b <- sum(dnorm(xs[i], x[-ind[j]], bx))
      result2[i,j] <- a/b
    }
  }
})
#> user system elapsed
#> 109.42 0.80 110.90
# check that the results are the same
all.equal(result1, result2)
#> [1] TRUE
# use a large N
N <- 15e4L
xm <- runif(N)
x <- runif(N)
ind <- sample(N)
system.time({
  a <- outer(xs, xm, function(x, y) dnorm(x, y, bx))
  b <- outer(xs, x[ind], function(x, y) dnorm(x, y, bx))
  result1 <- (rowSums(a) - a)/(rowSums(b) - b)
})
#> user system elapsed
#> 8.62 1.10 9.73
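The key identity behind result1 is that a leave-one-out sum equals the full sum minus the left-out term: the sum over j' != j of a[i, j'] is rowSums(a)[i] - a[i, j]. In rowSums(a) - a the length-k vector is recycled down each column, so every column j gets its own subtraction. A small check against a loop that literally drops column j (both function names are hypothetical). Keep in mind that for k = 501 and N = 150000 each double matrix needs roughly 600 MB, and that subtracting a term that dominates the total can lose floating-point precision.

```r
# Hypothetical helpers illustrating the leave-one-out identity used in result1.
loo_rowsums <- function(a) {
  # rowSums(a) has one entry per row; subtracting the matrix recycles that
  # vector down every column, so element [i, j] becomes sum(a[i, ]) - a[i, j].
  rowSums(a) - a
}

# Reference version that literally drops column j, as the OP's x[-ind[j]] does.
loo_rowsums_loop <- function(a) {
  out <- matrix(0, nrow(a), ncol(a))
  for (j in seq_len(ncol(a))) {
    out[, j] <- rowSums(a[, -j, drop = FALSE])
  }
  out
}

a <- matrix(runif(12), nrow = 3, ncol = 4)
all.equal(loo_rowsums(a), loo_rowsums_loop(a))
```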

R: fastest way to set up matrix of integrals?

I have a three-parameter function f(x, y, z), and two limits L, U.
Given a vector v, I want to set up a matrix with element M[i, j] = INTEGRAL( f(x, v[i], v[j]) ), where the integrals limits go from x = L to x = U.
So the problem has two elements:
We need to be able to calculate the integrals. I don't care how this is done, as long as it's FAST and reasonably accurate. Fast, fast, fast!! What's the fastest way?
We need to set up the matrix M[i, j]. What's the fastest way?
Please don't make this an issue of "dO yOu WaNt GauSsIan QuaDraTure oR SimPsoNs ruLe?". I don't care. Speed is the only thing relevant here. Whatever's faster, I'll take it, as long as the integrals are accurate to at least 1-2 digits or something.
A candidate for the fastest solution is given below:
M <- matrix(0,nrow = length(v),ncol = length(v))
p <- sapply(seq(length(v)-1), function(k) integral(f,v[k],v[k+1]))
u <- unlist(sapply(rev(seq_along(p)), function(k) cumsum(tail(p,k))))
M[lower.tri(M)] <- u
M <- t(M-t(M))
Regarding the two elements requested by OP
I guess integral() from the pracma package is fast enough.
To build the matrix M, I did not use nested for loops. The idea is explained at the bottom, and I believe it speeds up the computation remarkably.
I wrote down some of the possible solutions and you can compare their performance (my "fastest" solution is in method1()).
library(pracma)   # provides integral()
# dummy data: function f and vector v
f <- function(x) x**3 + cos(x**2)
v <- rnorm(500)
# my "fastest" solution
method1 <- function() {
  m1 <- matrix(0,nrow = length(v),ncol = length(v))
  p <- sapply(seq(length(v)-1), function(k) integral(f,v[k],v[k+1]))
  u <- unlist(sapply(rev(seq_along(p)), function(k) cumsum(tail(p,k))))
  m1[lower.tri(m1)] <- u
  t(m1-t(m1))
}
# faster than brute-force solution
method2 <- function() {
  m2 <- matrix(0,nrow = length(v),ncol = length(v))
  for (i in 1:(length(v)-1)) {
    for (j in i:length(v)) {
      m2[i,j] <- integral(f,v[i],v[j])
    }
  }
  # the integral from v[j] to v[i] is minus the integral from v[i] to v[j]
  m2 - t(m2)
}
# slowest, brute-force solution
method3 <- function() {
  m3 <- matrix(0,nrow = length(v),ncol = length(v))
  for (i in 1:length(v)) {
    for (j in 1:length(v)) {
      m3[i,j] <- integral(f,v[i],v[j])
    }
  }
  m3
}
Timing the three methods gives:
> system.time(method1())
user system elapsed
0.17 0.01 0.19
> system.time(method2())
user system elapsed
25.72 0.07 25.81
> system.time(method3())
user system elapsed
41.84 0.03 41.89
The idea in method1() is that you only need to calculate the integrals over intervals between adjacent points in v. Note the following properties of the integral:
integral(f,v[i],v[j]) is equal to integral(f,v[i],v[i+1]) + integral(f,v[i+1],v[i+2]) + ... + integral(f,v[j-1],v[j])
integral(f,v[j],v[i]) is equal to -integral(f,v[i],v[j])
In this sense, given n <- length(v), you only need to run integral() (which is computationally expensive compared to a matrix transpose or a cumulative sum) n-1 times, far fewer than the roughly choose(n,2) calls in method2() or the n**2 calls in method3(), particularly when n is large.
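A compact sketch of the same idea that skips the lower.tri() bookkeeping (my variant, not the answer's code): compute the n - 1 adjacent integrals, take their cumulative sum to get cumI[k] = integral from v[1] to v[k], and then M[i, j] = cumI[j] - cumI[i] for every pair via outer(). It uses stats::integrate() instead of pracma's integral() to stay in base R; signed_integral and integral_matrix are hypothetical names, and f must be vectorised, as integrate() requires.

```r
# Hypothetical helpers: M[i, j] = integral of f from v[i] to v[j]
# using only n - 1 numerical integrations (stats::integrate stands in for
# pracma's integral()).
signed_integral <- function(f, a, b) {
  if (a == b) return(0)
  # integrate over the ordered interval, then restore the direction's sign
  sign(b - a) * integrate(f, min(a, b), max(a, b))$value
}

integral_matrix <- function(f, v) {
  n <- length(v)
  # p[k] = integral from v[k] to v[k + 1]: the only quadratures needed
  p <- vapply(seq_len(n - 1), function(k) signed_integral(f, v[k], v[k + 1]),
              numeric(1))
  # cumI[k] = integral from v[1] to v[k], by additivity of the integral
  cumI <- cumsum(c(0, p))
  # M[i, j] = cumI[j] - cumI[i]; antisymmetric with a zero diagonal
  outer(cumI, cumI, function(ci, cj) cj - ci)
}

f <- function(x) x^2                      # antiderivative x^3 / 3
v <- c(0.5, -1, 2, 0.25)
M <- integral_matrix(f, v)
exact <- outer(v, v, function(a, b) (b^3 - a^3) / 3)
all.equal(M, exact)
```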

R: function choose, efficient

I need to speed up my R-code. My bottleneck is a function that needs to use the choose function. It looks like this:
P_ni <- function(Pn,Pi,eta1,eta2,p,d=NA)
if(is.na(d)) d <- 1-p
if(Pn==Pi) output <- p^Pn
if(Pi==1)seq1 <- seq_len(Pn-1)
if(Pi>1)seq1 <- seq_len(Pn-1)[-seq_len(Pi-1)]
output <- sum(choose((Pn-Pi-1),c(seq1-Pi))*choose(Pn,seq1)*
This function needs to be called many times with different Pn and Pi. The problem is that Pn and Pi can only be single numbers and do not work with vectors. This is caused by the choose() function.
I do this with a for-loop at the moment and it works perfectly, but it is slow.
The for-loop looks like this:
for(i in 1:nrow(n_k_matrix_p))
n_k_matrix_p[i,4] <- P_ni(n_k_matrix_p[i,1],n_k_matrix_p[i,2],eta1,eta2,p)
To make it reproducible:
eta1 <- 10
eta2 <- 5
p <- 0.4
n_k_matrix <- expand.grid(c(1:20),c(1:20))
n_k_matrix <- n_k_matrix[n_k_matrix[,1] >=n_k_matrix[,2],]
n_k_matrix <- n_k_matrix[order(n_k_matrix[,1]),]
The n_k_matrix contains my numbers for Pn and Pi.
Unfortunately the loop is still faster than using apply.
Does anyone have any idea how to speed things up?
You can regroup or precompute some computations.
P_ni2 <- function(n, eta1, eta2, p, d = 1 - p) {
  res <- matrix(0, n, n)
  diag(res) <- p^seq_len(n)
  C1 <- eta1 / eta2 * p / d
  C2 <- eta2 / (eta1 + eta2) * d
  C3 <- eta1 / (eta1 + eta2)
  C2_n <- C2^seq_len(n)
  C3_n <- C3^seq_len(n)
  # lookup table: precomputed[a + 1, b + 1] == choose(a, b)
  precomputed <- outer(0:n, 0:n, choose)
  for (j in seq_len(n)) {
    for (i in seq_len(j - 1)) {
      seq1 <- seq(i, j - 1)
      res[i, j] <- sum(
        precomputed[j-i, seq1-i+1] * precomputed[j+1, seq1+1] * C1^seq1
      ) * C2_n[j] / C3_n[i]
    }
  }
  res
}
> system.time({
+ n_k_matrix[[3]] <- sapply(1:nrow(n_k_matrix), function(i) {
+ P_ni(n_k_matrix[i,1], n_k_matrix[i,2], eta1, eta2, p)
+ })
+ })
user system elapsed
11.799 0.000 11.797
> system.time({
+ test <- P_ni2(400, eta1, eta2, p)
+ n_k_matrix[[4]] <- test[as.matrix(n_k_matrix[, 2:1])]
+ })
user system elapsed
2.328 0.003 2.341
> all.equal(n_k_matrix[[3]], n_k_matrix[[4]])
[1] TRUE
Note that I first store the results in the upper triangle of a square matrix. Then, I convert them to your data frame format (which you call a matrix, by the way).
This solution is 5 times faster for n = 400. I think you could improve it by recoding the double-loop (only) in Rcpp.
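Two details behind P_ni2, shown on a small case. First, choose() itself accepts vectors; what keeps P_ni scalar-only is its if() branches and seq_len() calls. Second, because R indices start at 1, the table built with outer(0:n, 0:n, choose) stores choose(a, b) at precomputed[a + 1, b + 1], which explains the + 1 offsets in the loop.

```r
n <- 6
# Lookup table: precomputed[a + 1, b + 1] == choose(a, b) for 0 <= a, b <= n
precomputed <- outer(0:n, 0:n, choose)

choose(5, 0:5)                    # vectorised: 1 5 10 10 5 1
precomputed[5 + 1, (0:5) + 1]     # the same values read from the table
choose(2, 3)                      # 0 when b > a, so the table holds 0 there
```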

Making nested for loops in R more efficient

I am working on a research project where I want to determine the equivalence of two distributions. I am currently using the Mann-Whitney test for equivalence, and the code I am running (below) was provided with the book Testing Statistical Hypotheses of Equivalence and Noninferiority by Stefan Wellek (2010). Before running my data, I am testing this code with random normal distributions which have the same mean and standard deviation. My problem is that there are three nested for loops, and with larger distribution sizes (as in the example below) the code takes forever to run. If I only had to run it once, that would not be such a problem, but I am doing a simulation test and creating power curves, so I need to run many iterations of this code (around 10,000). At the moment, depending on how I alter the distribution sizes, it takes days to run 10,000 iterations.
Any help in a way to increase the performance of this would be greatly appreciated.
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)
alpha <- 0.05
m <- length(x)
n <- length(y)
eps1_ <- 0.2 #0.1382 default
eps2_ <- 0.2 #0.2602 default
eqctr <- 0.5 + (eps2_-eps1_)/2
eqleng <- eps1_ + eps2_
wxy <- 0
pihxxy <- 0
pihxyy <- 0
for (i in 1:m)
for (j in 1:n)
wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))
for (i in 1:m)
for (j1 in 1:(n-1))
for (j2 in (j1+1):n)
pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))
for (i1 in 1:(m-1))
for (i2 in (i1+1):m)
for (j in 1:n)
pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))
wxy <- wxy / (m*n)
pihxxy <- pihxxy*2 / (m*(m-1)*n)
pihxyy <- pihxyy*2 / (n*(n-1)*m)
sigmah <- sqrt((wxy-(m+n-1)*wxy**2+(m-1)*pihxxy+(n-1)*pihxyy)/(m*n))
crit <- sqrt(qchisq(alpha,1,(eqleng/2/sigmah)**2))
if (abs((wxy-eqctr)/sigmah) >= crit) rej <- 1
if (abs((wxy-eqctr)/sigmah) < crit) rej <- 0
if (is.na(sigmah) || is.na(crit)) rej <- 1
MW_Decision <- rej
cat(" ALPHA =",alpha," M =",m," N =",n," EPS1_ =",eps1_," EPS2_ =",eps2_,
"\n","WXY =",wxy," SIGMAH =",sigmah," CRIT =",crit," REJ=",MW_Decision)
See edit below for an even better suggestion
One simple suggestion to get a bit of a speed boost is to byte compile your code.
For example, I wrapped your code into a function starting from the alpha <- 0.05 line and ran it on my laptop. Simply byte compiling your current code makes it run twice as fast.
x <- rnorm(n=125, m=3, sd=1)
y <- rnorm(n=500, m=3, sd=1)
# f1 <- function(x,y){ ...your code...}
system.time(f1(x, y))
# user system elapsed
# 33.249 0.008 33.278
library(compiler)
f2 <- cmpfun(f1)
system.time(f2(x, y))
# user system elapsed
# 17.162 0.002 17.170
I should add that this is the kind of thing a compiled language would do much better than R. Have you looked at the Rcpp and inline packages?
I've been curious to learn how to use them so I figured this was a good chance.
Here's a tweak of your code using the inline package and Fortran (since I'm more comfortable with that than C). It wasn't hard at all (provided you know Fortran or C); I just followed the examples in the cfunction documentation.
First, let's re-write your loops and compile them:
library(inline)   # provides cfunction()
# Fortran code for first loop
loop1code <- "
integer i, j1, j2
real*8 tmp
do i = 1, m
do j1 = 1, n-1
do j2 = j1+1, n
tmp = x(i) - max(y(j1),y(j2))
if (tmp > 0.) pihxyy = pihxyy + 1
end do
end do
end do
"
# Compile the code and turn loop into a function
loop1fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxyy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop1code, language="F95")
# Fortran code for second loop
loop2code <- "
integer i1, i2, j
real*8 tmp
do i1 = 1, m-1
do i2 = i1+1, m
do j = 1, n
tmp = min(x(i1), x(i2)) - y(j)
if (tmp > 0.) pihxxy = pihxxy + 1
end do
end do
end do
"
# Compile the code and turn loop into a function
loop2fun <- cfunction(sig = signature(x="numeric", y="numeric", pihxxy="integer", m="integer", n="integer"), dim=c("(m)", "(n)", "", "", ""), loop2code, language="F95")
Now let's create a new function that uses these. So it's not too long, I'll just sketch the key parts I modified from your code:
f3 <- function(x, y){
# ... code ...
# Remove old loop
## for (i in 1:m)
## for (j1 in 1:(n-1))
## for (j2 in (j1+1):n)
## pihxyy <- pihxyy + trunc(0.5*(sign(x[i] - max(y[j1],y[j2])) + 1))
# Call new function from compiled code instead
pihxyy <- loop1fun(x, y, pihxyy, m, n)$pihxyy
# Remove second loop
## for (i1 in 1:(m-1))
## for (i2 in (i1+1):m)
## for (j in 1:n)
## pihxxy <- pihxxy + trunc(0.5*(sign(min(x[i1],x[i2]) - y[j]) + 1))
# Call new compiled function for second loop
pihxxy <- loop2fun(x, y, pihxxy, m, n)$pihxxy
# ... code ...
}
And now we run it and voila, we get a huge speed boost! :)
system.time(f3(x, y))
# user system elapsed
# 0.12 0.00 0.12
I did check that it got the same results as your code, but you probably want to run some additional tests just in case.
You can use outer instead of the first double loop:
f1 <- function(x,y) {
  m <- length(x)
  n <- length(y)
  wxy <- 0
  for (i in 1:m)
    for (j in 1:n)
      wxy <- wxy + trunc(0.5*(sign(x[i] - y[j]) + 1))
  wxy
}
f2 <- function(x,y) sum(outer(x,y, function(x,y) trunc(0.5*(sign(x-y)+1))))
f1(x, y)
[1] 32041
f2(x, y)
[1] 32041
You get roughly a 50x speedup:
library(microbenchmark)
microbenchmark(f1(x, y), f2(x, y))
Unit: milliseconds
expr min lq median uq max neval
f1(x, y) 138.223841 142.586559 143.642650 145.754241 183.0024 100
f2(x, y) 1.846927 2.194879 2.677827 3.141236 21.1463 100
The other loops are trickier.
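A sketch for the other loops, based on a counting argument rather than anything in the answers above: because trunc(0.5*(sign(d) + 1)) is 1 only when d > 0, the triple loop for pihxyy counts, for each x[i], the pairs j1 < j2 whose y values are both below x[i], which is choose(c_i, 2) with c_i the number of y values below x[i]. Likewise pihxxy is the sum over j of choose(d_j, 2), with d_j the number of x values above y[j]. One n x m logical matrix from outer() gives all three counts. The name mw_counts is hypothetical; please verify it against your loops before relying on it.

```r
# Hypothetical helper: the three raw counts from Wellek's loops, before the
# normalisation lines (wxy / (m*n) etc.).
mw_counts <- function(x, y) {
  # less[j, i] is TRUE when y[j] < x[i]; strict, like trunc(0.5*(sign(d) + 1))
  less <- outer(y, x, "<")
  c_i <- colSums(less)   # for each x[i]: how many y values lie below it
  d_j <- rowSums(less)   # for each y[j]: how many x values lie above it
  list(
    wxy    = sum(c_i),
    # pairs j1 < j2 with both y values below x[i], i.e. x[i] > max(y[j1], y[j2])
    pihxyy = sum(choose(c_i, 2)),
    # pairs i1 < i2 with both x values above y[j], i.e. min(x[i1], x[i2]) > y[j]
    pihxxy = sum(choose(d_j, 2))
  )
}

# Usage inside the original code, replacing the three loops:
x <- rnorm(n = 125, mean = 3, sd = 1)
y <- rnorm(n = 500, mean = 3, sd = 1)
counts <- mw_counts(x, y)
wxy <- counts$wxy
pihxyy <- counts$pihxyy
pihxxy <- counts$pihxxy
```

The work drops from O(m * n^2 + m^2 * n) to O(m * n), at the cost of holding one logical matrix of size n x m.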

Using apply in R with an additional vector argument

I have a matrix of size 10000 x 100 and a vector of length 100. I'd like to apply a custom function, percentile, which takes in a vector argument and a scalar argument, to each column of the matrix such that on iteration j, the arguments used with percentile are column j of the matrix and entry j of the vector. Is there a way to use one of the apply functions to do this?
Here's my code. It runs, but doesn't return the correct result.
percentile <- function(x, v){
  length(x[x <= v]) / length(x)
}
X <- matrix(runif(10000 * 100), nrow = 10000, ncol = 100)
y <- runif(100)
result <- apply(X, 2, percentile, v = y)
The workaround that I've been using has been to just append y to X, and re-write the percentile function, as shown below.
X <- rbind(X, y)
percentile2 <- function(x){
  v <- x[length(x)]
  x <- x[-length(x)]
  length(x[x <= v]) / length(x)
}
result <- apply(X, 2, percentile2)
This code does return the correct result, but I would prefer something a bit more elegant.
If you understand that R is vectorised and know the right functions you can avoid loops entirely, and do the whole thing in one relatively simple line...
colSums( t( t( X ) <= y ) ) / nrow( X )
Through vectorisation, R will recycle each element of y across each column of X. By default it would do this across the rows, so we use the transpose function t to turn the columns into rows, apply the logical comparison <=, and then transpose back again.
Since TRUE and FALSE evaluate to 1 and 0 respectively, we can use colSums to get the number of rows in each column that meet the condition and then divide each column by the total number of rows (remember the recycling rule!). It gives exactly the same result:
X2 <- rbind(X, y)
res1 <- apply(X2, 2, percentile2)
res2 <- colSums( t( t( X ) <= y ) ) / nrow( X )
identical( res1 , res2 )
[1] TRUE
Obviously as this doesn't use any R loops it's a lot quicker (~10 times on this small matrix).
Even better would be to use rowMeans like this (thanks to @flodel):
rowMeans( t(X) <= y )
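Another way to express the same column-wise comparison without the double transpose, as a sketch: sweep() applies a binary function between each column of X and the matching entry of a statistics vector (MARGIN = 2), here the comparison "<=", and colMeans() turns the logical matrix into proportions.

```r
X <- matrix(runif(10000 * 100), nrow = 10000, ncol = 100)
y <- runif(100)
# sweep(X, 2, y, "<=") compares column j of X with y[j]
res_sweep <- colMeans(sweep(X, 2, y, "<="))
res_rowmeans <- rowMeans(t(X) <= y)
all.equal(res_sweep, res_rowmeans)
```

Which form reads more clearly is a matter of taste; sweep() states the column-to-entry pairing explicitly instead of relying on recycling after a transpose.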
I think the easiest and clearest way is to use a for loop:
result2 <- numeric(ncol(X))
for (i in seq_len(ncol(X))) {
  result2[i] <- sum(X[,i] <= y[i])
}
result2 <- result2 / nrow(X)
The fastest and shortest solution I can think of is:
result1 <- rowSums(t(X) <= y) / nrow(X)
SimonO101's answer explains how this works. As I said, it is fast. However, the disadvantage is that it is less clear what exactly is calculated here, although you could solve this by placing this piece of code in a well-named function.
flodel also suggested a solution using mapply, which is an apply that can work on multiple vectors. However, for that to work you first need to put the columns of your matrix in a list or data.frame:
result3 <- mapply(percentile, as.data.frame(X), y)
Speed-wise (see below for some benchmarking), the for loop doesn't do that badly, and it's faster than using apply (in this case at least). The trick with rowSums and vector recycling is faster still: over 10 times as fast as the solution using apply.
> X <- matrix(rnorm(10000 * 100), nrow = 10000, ncol = 100)
> y <- runif(100)
> system.time({result1 <- rowSums(t(X) <= y) / nrow(X)})
user system elapsed
0.020 0.000 0.018
> system.time({
+ X2 <- rbind(X, y)
+ percentile2 <- function(x){
+ v <- x[length(x)]
+ x <- x[-length(x)]
+ length(x[x <= v]) / length(x)
+ }
+ result <- apply(X2, 2, percentile2)
+ })
user system elapsed
0.252 0.000 0.249
> system.time({
+ result2 <- numeric(ncol(X))
+ for (i in seq_len(ncol(X))) {
+ result2[i] <- sum(X[,i] <= y[i])
+ }
+ result2 <- result2 / nrow(X)
+ })
user system elapsed
0.024 0.000 0.024
> system.time({
+ result3 <- mapply(percentile, as.data.frame(X), y)
+ })
user system elapsed
0.076 0.000 0.073
> all(result2 == result1)
[1] TRUE
> all(result2 == result)
[1] TRUE
> all(result3 == result)
[1] TRUE
