Suggestions on speeding up my 5 loops in R?

I have 5 tables for which I would like to find combinations that fulfill some requirements. I could easily solve the small example below by using matrix multiplication to create all possible combinations and then selecting the rows that meet my needs. The problem is that my original problem consists of 5 tables with 200 rows each, which would require a couple of hundred GB of RAM if I generated all possible combinations.
So I tried this instead:
x1 <- seq(1,10,1)
x2 <- seq(5,15,3)
x3 <- seq(2,11,1)
x4 <- seq(1,5,1)
x5 <- seq(1,20,2)
The combinations should fulfill x1 < x2 and x1 < x3.
nm <- data.frame(matrix(NA, 1, 5))
for (a in 1:length(x1)) {
  for (s in 1:length(x2)) {
    for (d in 1:length(x3)) {
      for (f in 1:length(x4)) {
        for (g in 1:length(x5)) {
          l1 <- x1[a]
          l2 <- x2[s]
          if (l1 < l2) {
            l3 <- x3[d]
            if (l1 < l3) {
              l4 <- x4[f]
              l5 <- x5[g]
              fy <- c()
              fy[1] <- l1
              fy[2] <- l2
              fy[3] <- l3
              fy[4] <- l4
              fy[5] <- l5
              nm <- rbind(nm, fy)
            }
          }
        }
      }
    }
  }
}
In my original problem I have more if statements, which I hope will cut down the number of combinations and increase speed. But it has now been running for about 24 hours and is still not done. The example above takes me about 10 s, which makes me think it is stuck.

Two problems:
The huge problem is that you grow an object in a loop. This is one of the slowest things you can do in R because the object is copied and reallocated on every iteration. You need to preallocate the object and only grow it in chunks as necessary.
The medium problem is that you use a data.frame to store the results. Data frames are useful, but slow. Use a matrix instead.
nm1 <- matrix(nrow = 1e3, ncol = 5)  # adjust the chunk size to a reasonable estimate
rx <- 1
for (a in 1:length(x1)) {
  for (s in 1:length(x2)) {
    for (d in 1:length(x3)) {
      for (f in 1:length(x4)) {
        for (g in 1:length(x5)) {
          l1 <- x1[a]
          l2 <- x2[s]
          if (l1 < l2) {
            l3 <- x3[d]
            if (l1 < l3) {
              l4 <- x4[f]
              l5 <- x5[g]
              if (rx > nrow(nm1)) nm1 <- rbind(nm1, matrix(nrow = 1e3, ncol = 5))
              nm1[rx, 1] <- l1
              nm1[rx, 2] <- l2
              nm1[rx, 3] <- l3
              nm1[rx, 4] <- l4
              nm1[rx, 5] <- l5
              rx <- rx + 1
            }
          }
        }
      }
    }
  }
}
nm1 <- nm1[seq_len(rx - 1), ]
Timings:
Unit: milliseconds
expr min lq mean median uq max neval cld
mod() 589.2437 591.1576 594.4138 593.3678 595.0909 603.2087 5 a
original() 4934.4981 4952.4502 4980.6414 4953.3183 4985.7943 5077.1463 5 b
We get a factor-10 performance improvement without even starting to think about the algorithm. The factor gets bigger the more often the data.frame would otherwise have been grown. If this is still too slow, you could try byte-compiling the code with the compiler package, and it would also be straightforward to implement as actual compiled code with Rcpp. However, you should benchmark with an increasing number of iterations and extrapolate the timings to your actual problem. You might need a better algorithm than brute force, or to reconsider whether you actually need to do this.
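For completeness, a minimal sketch of the byte-compile suggestion, assuming the preallocated-matrix loop above has been wrapped in a function mod() as in the timings (note that since R 3.4 the JIT byte-compiles functions by default, so the extra gain may be modest):
library(compiler)
mod_cmp <- cmpfun(mod)  # byte-compile the loop-based function
microbenchmark::microbenchmark(mod(), mod_cmp(), times = 5)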

Related

Improve time consumption within matrix column calculation in R

I have to calculate the values of a matrix in which one column depends on the others. In short, this code is part of a simulation where I want to see how the values of m1[,1] change as a function of the number of columns in the matrix and of other parameters.
My problem is that this simulation takes around 10 days (or more) to complete, and I was wondering whether it is possible to do these calculations more efficiently.
Here is the code (this is only an example for timing purposes; the numerical results themselves have no significance):
library(microbenchmark)
number_of_columns <- 6  # in my simulation I'm using 10,000 columns
microbenchmark({
  m1 <- matrix(1.12345678912356789, nrow = 6, ncol = number_of_columns)
  m2 <- matrix(1.12345678912356789, nrow = 6, ncol = number_of_columns)
  v1 <- rnorm(6)
  v2 <- rnorm(6)
  v3 <- rnorm(6)
  for (j in 1:10000) {  # I need to loop 1e7 times
    m1[, 1] <- m2[, 2] + m2[, 1] * v1
    m2[, 1] <- m2[, 1] + m1[, 1]
    for (i in 2:(ncol(m1) - 1)) {
      m1[, i] <- (m2[, i + 1] - m2[, i]) * v2
      m2[, i] <- m2[, i] + (m1[, i] - m1[, i - 1]) * v3
    }
    m2[, 6] <- m2[, 5]
  }
})
In each simulation I want to change the values of v1, v2 and v3. Also, the number of columns of m1 and m2 would be 10,000.
The microbenchmark timings on my computer are:
min lq mean median uq max neval
133.823 144.2911 154.8575 151.2269 157.7208 232.0194 100
I'm using an i5-1135G7 and also have an NVIDIA GeForce MX350 in my laptop. I tried to use the gpuR package to run my simulation on the GPU, but I did not understand how to install OpenCL.
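One observation that may help before reaching for the GPU: within a single j iteration, every m1[, i] is computed from m2 columns that have not yet been updated, so the inner i loop can be replaced by whole-matrix operations. A sketch under that assumption (not a tested answer; compare its output to the original loop with all.equal()):
# Sketch: remove the inner column loop; assumes 6 rows and nc >= 3 columns as in the question.
step_vectorised <- function(m1, m2, v1, v2, v3, iters = 10000) {
  nc  <- ncol(m1)
  idx <- 2:(nc - 1)
  for (j in seq_len(iters)) {
    m1[, 1] <- m2[, 2] + m2[, 1] * v1
    m2[, 1] <- m2[, 1] + m1[, 1]
    # every m1[, i] uses m2 columns that the original loop only updates later,
    # so all of them can be computed in one vectorised step
    m1[, idx] <- (m2[, idx + 1] - m2[, idx]) * v2
    # m2[, i] then uses the freshly computed m1[, i] and m1[, i - 1]
    m2[, idx] <- m2[, idx] + (m1[, idx] - m1[, idx - 1]) * v3
    m2[, nc] <- m2[, nc - 1]
  }
  list(m1 = m1, m2 = m2)
}
With 10,000 columns this turns roughly 10,000 small column operations per j iteration into a handful of large matrix operations, which R handles far more efficiently.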

How to optimize my correlation problem in R?

I have three dataframes in R, let's call them A, B, and C.
Dataframe C contains two columns: the first contains various row names from dataframe A and the second contains row names from dataframe B:
C <- data.frame(col1 = c("a12", "a9"), col2 = c("b6","b54"))
I want to calculate the correlation coefficient and p-value for each row of table C, using the corresponding rows of tables A and B (i.e. correlating the values of row a12 of table A with row b6 of table B, row a9 of table A with row b54 of table B, and so on), and put the resulting values in additional columns of table C. This is my current naive and highly inefficient code:
for (i in 1:nrow(C)) {
  correlation <- cor.test(unlist(A[C[i, 1], ]), unlist(B[C[i, 2], ]), method = "spearman")
  C[i, 3] <- correlation$estimate
  C[i, 4] <- correlation$p.value
}
The main problem is that with my current large datasets this analysis can literally take months, so I'm looking for a more efficient way to accomplish the task. I also tried the following code using the Hmisc package, but the server I'm working on can't handle the resulting large objects:
library(Hmisc)
A <- t(A)
B <- t(B)
ind.A <- match(C[, 1], colnames(A))
A <- A[, ind.A]
ind.B <- match(C[, 2], colnames(B))
B <- B[, ind.B]
C[, 3] <- diag(rcorr(as.matrix(A), as.matrix(B), type = "spearman")$r[c(1:ncol(A)), c(1:ncol(A))])
C[, 4] <- diag(rcorr(as.matrix(A), as.matrix(B), type = "spearman")$P[c(1:ncol(A)), c(1:ncol(A))])
Based on the comment by @HYENA, I tried parallel processing. This approach accelerated the process approximately 4-fold (with 8 cores). The code:
library(foreach)
library(doParallel)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
cor.res <- foreach(i = 1:nrow(C)) %dopar% {
  a <- C[i, 1]
  b <- C[i, 2]
  correlation <- cor.test(unlist(A[a, ]), unlist(B[b, ]), method = "spearman")
  c(correlation$estimate, correlation$p.value)
}
cor.res <- data.frame(Reduce("rbind", cor.res))
C[, c(3, 4)] <- cor.res
Extract just the part you need from cor.test, giving cor_test1, and use that instead. Or, in addition, create a lookup table for the p values, giving cor_test2, which is slightly faster than cor_test1.
Based on the median column, with vectors of length 10 these run about 3x faster than cor.test. Although cor_test2 is only slightly faster than cor_test1 here, it is included because the speed-up could depend on the size of your input, which we don't have; you can try it out yourself with whatever sizes you have.
# given correlation and degrees of freedom, output the p value
r2pval <- function(r, dof) {
  tval <- sqrt(dof) * r / sqrt(1 - r^2)
  min(pt(tval, dof), pt(tval, dof, lower.tail = FALSE))
}

# faster version of cor.test
cor_test1 <- function(x, y) {
  r <- cor(x, y)
  dof <- length(x) - 2
  tval <- sqrt(dof) * r / sqrt(1 - r^2)
  pval <- min(pt(tval, dof), pt(tval, dof, lower.tail = FALSE))
  c(r, pval)
}

# even faster version of cor.test:
# given x, y and the pvals table, return a 2-vector of r and p value
cor_test2 <- function(x, y, pvals) {
  r <- cor(x, y)
  c(r, pvals[100 * round(r, 2) + 101])
}

# test
set.seed(123)
n <- 10
x <- rnorm(n); y <- rnorm(n)
dof <- n - 2

# pvals holds the 201 p values for r = -1, -0.99, -0.98, ..., 1
pvals <- sapply(seq(-1, 1, 0.01), r2pval, dof = dof)

library(microbenchmark)
microbenchmark(cor.test(x, y), cor_test1(x, y), cor_test2(x, y, pvals))
giving:
Unit: microseconds
expr min lq mean median uq max neval cld
cor.test(x, y) 253.7 256.7 346.278 266.05 501.45 650.6 100 a
cor_test1(x, y) 84.8 87.2 346.777 89.10 107.40 22974.4 100 a
cor_test2(x, y, pvals) 72.4 75.0 272.030 79.45 91.25 17935.8 100 a
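As a usage sketch (not part of the answer above), cor_test1 can be dropped straight into the row-wise loop over C. Note that cor_test1 as written uses cor()'s default Pearson method; pass method = "spearman" to cor() if rank correlation is required, in which case the t-based p value is only an approximation.
# Sketch: plug cor_test1() into the loop over C from the question
# (assumes A, B and C have the structure described above).
res <- t(vapply(
  seq_len(nrow(C)),
  function(i) cor_test1(unlist(A[C[i, 1], ]), unlist(B[C[i, 2], ])),
  numeric(2)
))
C[, 3] <- res[, 1]  # correlation estimate
C[, 4] <- res[, 2]  # p value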

Faster matrix multiplication by replacing a double loop

I have a dataframe that looks a bit like the one produced by the following code (but much larger):
set.seed(10)
mat <- matrix(rbinom(200, size=1, prob = .5), ncol = 10)
The columns are issues, and a 1 indicates that an observation is interested in that issue. I want to generate a network comparing all observations, with a count of the issues each dyad is jointly interested in.
I have produced the following code, which seems to work fine:
mat2 <- matrix(NA, 20, 20)
for (i in 1:nrow(mat)) {
  for (j in 1:nrow(mat)) {
    mat2[i, j] <- sum(as.numeric(mat[i, ] == 1) + as.numeric(mat[j, ] == 1) == 2)
  }
}
So I compare every entry with every other entry, and only if both have a 1 (i.e. both are interested) does the sum equal 2 and get counted as a joint interest in a topic.
My problem is that my dataset is very large, and the loop has already been running for hours.
Does anyone have an idea how to do this while avoiding the loop?
This should be faster:
tmat <- t(mat==1)
mat4 <- apply(tmat, 2, function(x) colSums(tmat & x))
Going ahead and promoting @jogo's comment, as it is by far the fastest (thanks for the hint, I will use that in production as well).
set.seed(10)
mat <- matrix(rbinom(200, size = 1, prob = .5), ncol = 10)
mat2 <- matrix(NA, 20, 20)
binary_mat <- mat == 1
tmat <- t(mat == 1)

microbenchmark::microbenchmark(
  "loop" = for (i in 1:nrow(mat)) {
    for (j in 1:nrow(mat)) {
      mat2[i, j] <- sum(as.numeric(mat[i, ] == 1) + as.numeric(mat[j, ] == 1) == 2)
    }
  },
  "apply" = mat4 <- apply(tmat, 2, function(x) colSums(tmat & x)),
  "matrix multiplication" = mat5 <- mat %*% t(mat),
  "tcrossprod" = tcrossprod(mat),
  "tcrossprod binary" = tcrossprod(binary_mat)
)
On my machine this benchmark results in
Unit: microseconds
expr min lq mean median uq max neval cld
loop 16699.634 16972.271 17931.82535 17180.397 17546.1545 31502.706 100 b
apply 322.942 330.046 395.69045 357.886 368.8300 4299.228 100 a
matrix multiplication 21.889 28.801 36.76869 39.360 43.9685 50.689 100 a
tcrossprod 7.297 8.449 11.20218 9.984 14.4005 18.433 100 a
tcrossprod binary 7.680 8.833 11.08316 9.601 12.0970 35.713 100 a
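The reason the matrix products work here is that the count of jointly held issues for rows i and j is simply the inner product of the two 0/1 rows, which is exactly what mat %*% t(mat) and its symmetric shortcut tcrossprod(mat) compute. A quick sanity check (a sketch using the objects defined above):
# the loop result and the matrix-product results should agree element-wise
all(mat2 == tcrossprod(mat))   # expected TRUE
all(mat2 == mat %*% t(mat))    # expected TRUE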

Why does sapply scale slower than for loop with sample size?

Let's say I want to take the vector X = 2*1:N and raise e to the power of each element. (Yes, I recognize the best way to do that is simply the vectorized exp(X), but the point of this is to compare a for loop with sapply.) I tested three methods (one with a for loop, two with sapply applied in different ways) with different sample sizes and measured the corresponding times. I then plot sample size N against time t for each method.
Each method is indicated by "#####".
k <- 20
t1 <- rep(0, k)
t2 <- rep(0, k)
t3 <- rep(0, k)
L <- round(10^seq(4, 7, length = k))

for (i in 1:k) {
  X <- 2 * 1:L[i]
  Y1 <- rep(0, L[i])
  t <- system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####
  t1[i] <- t
}

for (i in 1:k) {
  X <- 2 * 1:L[i]
  t <- system.time(Y2 <- sapply(1:L[i], function(q) exp(X[q])))[3] #####
  t2[i] <- t
}

for (i in 1:k) {
  X <- 2 * 1:L[i]
  t <- system.time(Y3 <- sapply(X, function(x) exp(x)))[3] #####
  t3[i] <- t
}

plot(L, t3, type = 'l', col = 'green')
lines(L, t2, col = 'red')
lines(L, t1, col = 'blue')

plot(log(L), log(t1), type = 'l', col = 'blue')
lines(log(L), log(t2), col = 'red')
lines(log(L), log(t3), col = 'green')
We get the following results:
[Plot of N vs t]
[Plot of log(N) vs log(t)]
The blue curve is the for-loop method, and the red and green curves are the sapply methods. In the regular plot you can see that, as the sample size gets larger, the for-loop method is heavily favoured over the sapply methods, which is not what I would have expected at all. If you look at the log-log plot (to distinguish the smaller-N results more easily), we see the expected result of sapply being more efficient than the for loop for small N.
Does anybody know why sapply scales more slowly than a for loop with sample size? Thanks.
You're not accounting for the time it takes to allocate space for the resulting vector Y1. As the sample size increases, the time it takes to allocate Y1 becomes a larger share of the execution time, and the time it takes to do the replacement becomes a smaller share.
sapply always allocates memory for the result, so that's one reason it becomes less efficient as the sample size increases. gagolews also has a very good point about sapply calling simplify2array; that (likely) adds another copy.
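To make that concrete, here is a rough, simplified sketch of what sapply() effectively does (the real implementation differs in detail, but the extra list-building and simplification passes are the point):
# a simplified illustration of sapply()'s two-step behaviour, not the real source
my_sapply <- function(X, FUN, ...) {
  res <- lapply(X, FUN, ...)  # builds a list of many length-1 results
  simplify2array(res)         # a second pass (and copy) to produce the vector
}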
After some more testing, it looks like lapply is still about the same or slower than a byte-compiled function containing a for loop, as the objects get larger. I'm not sure how to explain this, other than possibly this line in do_lapply:
if (MAYBE_REFERENCED(tmp)) tmp = lazy_duplicate(tmp);
Or possibly something with how lapply constructs the function call... but I'm mostly speculating.
Here's the code I used to test:
k <- 20
t1 <- rep(0, k)
t2 <- rep(0, k)
t3 <- rep(0, k)
L <- round(10^seq(4, 7, length = k))
L <- round(10^seq(4, 6, length = k))  # overrides the previous line; sizes capped at 1e6

# put the loop in a function
fun <- function(X, L) {
  Y1 <- rep(0, L)
  for (j in 1:L)
    Y1[j] <- exp(X[j])
  Y1
}

# for loops often benefit from compiling
library(compiler)
cfun <- cmpfun(fun)

for (i in 1:k) {
  X <- 2 * 1:L[i]
  t1[i] <- system.time(Y1 <- fun(X, L[i]))[3]
}
for (i in 1:k) {
  X <- 2 * 1:L[i]
  t2[i] <- system.time(Y2 <- cfun(X, L[i]))[3]
}
for (i in 1:k) {
  X <- 2 * 1:L[i]
  t3[i] <- system.time(Y3 <- lapply(X, exp))[3]
}

identical(Y1, Y2)          # TRUE
identical(Y1, unlist(Y3))  # TRUE

plot(L, t1, type = 'l', col = 'blue', log = "xy", ylim = range(t1, t2, t3))
lines(L, t2, col = 'red')
lines(L, t3, col = 'green')
Most of the points have been made before, but...
1. sapply() uses lapply() and then pays a one-time cost of formatting the result using simplify2array().
2. lapply() creates a long vector, and then a large number of short (length 1) vectors, whereas the for loop generates a single long vector.
3. The sapply() as written has an extra function call compared to the for loop.
4. Using gcinfo(TRUE) lets us see the garbage collector in action, and each approach results in the garbage collector running several times -- this can be quite expensive, and not completely deterministic.
Points 1-3 need to be interpreted in the artificial context of the example -- exp() is a fast function, exaggerating the relative contribution of memory allocation (2), function evaluation (3), and one-time costs (1). Point 4 emphasizes the need to replicate timings in a systematic way.
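As a small illustration of point 4, garbage-collector reporting can be switched on around a single run (a sketch; the exact number of collections will vary between sessions):
old <- gcinfo(TRUE)                   # report every garbage collection
invisible(sapply(seq_len(1e6), exp))  # several collections typically fire here
gcinfo(old)                           # restore the previous setting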
I started by loading the compiler and microbenchmark packages. I focused on the largest size only.
library(compiler)
library(microbenchmark)
n <- 10^7
In my first experiment I replaced exp() with simple assignment, and tried different ways of representing the result in the for loop -- a vector of numeric values, or list of numeric vectors as implied by lapply().
fun0n <- function(n) {
  Y1 <- numeric(n)
  for (j in seq_len(n)) Y1[j] <- 1
}
fun0nc <- compiler::cmpfun(fun0n)

fun0l <- function(n) {
  Y1 <- vector("list", n)
  for (j in seq_len(n)) Y1[[j]] <- 1
}
fun0lc <- compiler::cmpfun(fun0l)

microbenchmark(fun0n(n), fun0nc(n), fun0lc(n), times = 5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun0n(n) 5.620521 6.350068 6.487850 6.366029 6.933915 7.168717 5
## fun0nc(n) 1.852048 1.974962 2.028174 1.984000 2.035380 2.294481 5
## fun0lc(n) 1.644120 2.706605 2.743017 2.998258 3.178751 3.187349 5
So it pays to compile the for loop, and there's a fairly substantial cost to generating a list of vectors. Again this memory cost is amplified by the simplicity of the body of the for loop.
My next experiment explored different *apply() functions.
fun2s <- function(n)
  sapply(raw(n), function(i) 1)
fun2l <- function(n)
  lapply(raw(n), function(i) 1)
fun2v <- function(n)
  vapply(raw(n), function(i) 1, numeric(1))

microbenchmark(fun2s(n), fun2l(n), fun2v(n), times = 5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun2s(n) 4.847188 4.946076 5.625657 5.863453 6.130287 6.341282 5
## fun2l(n) 1.718875 1.912467 2.024325 2.141173 2.142004 2.207105 5
## fun2v(n) 1.722470 1.829779 1.847945 1.836187 1.845979 2.005312 5
There is a large cost to the simplification step in sapply(); vapply() is more robust than lapply() (I am guaranteed the type of the return) without performance penalty, so it should be my go-to function in this family.
Finally, I compared the for iteration to vapply() where the result is a list-of-vectors.
fun1 <- function(n) {
  Y1 <- vector("list", n)
  for (j in seq_len(n)) Y1[[j]] <- exp(0)
}
fun1c <- compiler::cmpfun(fun1)

fun3 <- function(n)
  vapply(numeric(n), exp, numeric(1))
fun3fun <- function(n)
  vapply(numeric(n), function(i) exp(i), numeric(1))

microbenchmark(fun1c(n), fun3(n), fun3fun(n), times = 5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun1c(n) 2.265282 2.391373 2.610186 2.438147 2.450145 3.505986 5
## fun3(n) 2.303728 2.324519 2.646558 2.380424 2.384169 3.839950 5
## fun3fun(n) 4.782477 4.832025 5.165543 4.893481 4.973234 6.346498 5
microbenchmark(fun1c(10^3), fun1c(10^4), fun1c(10^5),
fun3(10^3), fun3(10^4), fun3(10^5),
times=50)
## Unit: microseconds
## expr min lq mean median uq max neval
## fun1c(10^3) 199 215 230 228 241 279 50
## fun1c(10^4) 1956 2016 2226 2296 2342 2693 50
## fun1c(10^5) 19565 20262 21671 20938 23410 24116 50
## fun3(10^3) 227 244 254 254 264 295 50
## fun3(10^4) 2165 2256 2359 2348 2444 2695 50
## fun3(10^5) 22069 22796 23503 23251 24393 25735 50
The compiled for loop and vapply() are neck and neck; the extra function call almost doubles the execution time of vapply() (again, this effect is exaggerated by the simplicity of the example). There does not seem to be much change in relative speed across a range of sizes.
Try taking out the excess function(x) call that runs on every iteration; it must have a lot of overhead. I didn't separate the two, but for an apples-to-apples comparison the for loop should also include all associated work, like this:
t <- system.time(Y1 <- rep(0, L[i]))[3] + system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####
A much faster sapply:
t4 <- rep(0, k)
for (i in 1:k) {
  X <- 2 * 1:L[i]
  t <- system.time(Y4 <- sapply(X, exp))[3] #####
  t4[i] <- t
}
It's still slower, but much closer than the first two sapply versions.

How to make a loop run faster in R?

I want to use arms() to get one sample each time and make a loop like the following one in my function. It runs very slowly. How could I make it run faster? Thanks.
library(HI)
dmat <- matrix(0, nrow = 100, ncol = 30)
system.time(
  for (d in 1:100) {
    for (j in 1:30) {
      y <- rep(0, 101)
      for (i in 2:100) {
        y[i] <- arms(0.3, function(x) (3.5 + 0.000001 * d * j * y[i - 1]) * log(x) - x,
                     function(x) (x > 1e-4) * (x < 20), 1)
      }
      dmat[d, j] <- sum(y)
    }
  }
)
This is a version based on Tommy's answer but avoiding all loops:
library(multicore)  # or library(parallel) in 2.14.x
set.seed(42)
m <- 100
n <- 30
system.time({
  arms.C <- getNativeSymbolInfo("arms")$address
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x > 1e-4) * (x < 20))
  if (diff(bounds) < 1e-07) stop("pointless!")
  # create the vector of z values
  zval <- 0.00001 * rep(seq.int(n), m) * rep(seq.int(m), each = n)
  # apply the inner function to each grid point and return the matrix
  dmat <- matrix(unlist(mclapply(zval, function(z)
    sum(unlist(lapply(seq.int(100), function(i)
      .Call(arms.C, bounds, function(x) (3.5 + z * i) * log(x) - x,
            0.3, 1L, parent.frame())
    )))
  )), m, byrow = TRUE)
})
On a multicore machine this will be really fast, since it spreads the load across cores. On a single-core machine (or for poor Windows users) you can replace mclapply above with lapply and get only a slight speedup compared to Tommy's answer. Note, though, that the result of the parallel version will be different because it uses different RNG sequences.
Note that any C code that needs to evaluate R functions will be inherently slow (because interpreted code is slow). I added arms.C just to remove all R->C overhead to make moli happy ;), but it doesn't make any difference.
You could squeeze out a few more milliseconds by using column-major processing (the question code was row-major, which requires re-copying since R matrices are always column-major).
Edit: I noticed that moli changed the question slightly since Tommy answered. Instead of the sum(...) part you now have to use a loop, because the y[i] are dependent, so function(z) would look like:
function(z) {
  y <- 0
  for (i in seq.int(99))
    y <- y + .Call(arms.C, bounds, function(x) (3.5 + z * y) * log(x) - x,
                   0.3, 1L, parent.frame())
  y
}
Well, one effective way is to get rid of the overhead inside arms. It does some checks and calls the indFunc every time, even though the result is always the same in your case.
Some other evaluations can also be done outside the loop. These optimizations bring the time down from 54 seconds to around 6.3 seconds on my machine, and the answer is identical.
set.seed(42)
#dmat2 <- ##RUN ORIGINAL CODE HERE##

# Now try this:
set.seed(42)
dmat <- matrix(0, nrow = 100, ncol = 30)
system.time({
  e <- new.env()
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x > 1e-4) * (x < 20))
  f <- function(x) (3.5 + z * i) * log(x) - x
  if (diff(bounds) < 1e-07) stop("pointless!")
  for (d in seq_len(nrow(dmat))) {
    for (j in seq_len(ncol(dmat))) {
      y <- 0
      z <- 0.00001 * d * j
      for (i in 1:100) {
        y <- y + .Call("arms", bounds, f, 0.3, 1L, e)
      }
      dmat[d, j] <- y
    }
  }
})

all.equal(dmat, dmat2)  # TRUE
Why not like this?
dat <- expand.grid(d = 1:10, j = 1:3, i = 1:10)

arms.func <- function(vec) {
  require(HI)
  dji <- vec[1] * vec[2] * vec[3]
  arms.out <- arms(0.3,
                   function(x, params) (3.5 + 0.00001 * params) * log(x) - x,
                   function(x, params) (x > 1e-4) * (x < 20),
                   n.sample = 1,
                   params = dji)
  return(arms.out)
}

dat$arms <- apply(dat, 1, arms.func)

library(plyr)
out <- ddply(dat, .(d, j), summarise, arms = sum(arms))
matrix(out$arms, nrow = length(unique(out$d)), ncol = length(unique(out$j)))
However, it's still single-core and time-consuming. But that isn't R being slow, it's the arms function.
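If the remaining bottleneck is that this runs on a single core, the row-wise apply() above is straightforward to parallelise; a minimal sketch, assuming the dat and arms.func objects defined above:
# run arms.func over the rows of dat on a PSOCK cluster
library(parallel)
cl <- makeCluster(detectCores())
clusterEvalQ(cl, library(HI))  # make arms() available on each worker
dat$arms <- parApply(cl, dat, 1, arms.func)
stopCluster(cl)
As noted in the earlier answer, parallel workers use their own RNG streams, so the draws will not match a serial run; clusterSetRNGStream() at least makes the parallel result reproducible.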
