The problem with my R script is that it takes too much time, and the main solution I'm considering is to parallelize it. I don't know where to start.
My code looks like this:
n <- nrow(aa)
output <- matrix(0, n, n)
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(ddi)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  return(b + M)
}
for (i in 1:n) {
  A <- akl(dist(aa[i, ]))
  dVarX <- sqrt(mean(A * A))
  for (j in i:n) {
    B <- akl(dist(aa[j, ]))
    V <- sqrt(dVarX * (sqrt(mean(B * B))))
    output[i, j] <- (sqrt(mean(A * B))) / V
  }
}
I would like to parallelize it across different CPUs. How can I do that?
I saw the SNOW package; is it suitable for my purpose?
Thank you for your suggestions,
Gab
There are two ways I can think of in which your code could be made to run faster:
First: as @Dwin was saying (with a small twist), you could precompute akl (yes, not necessarily just dist, but the whole of akl).
# a random square matrix
aa <- matrix(runif(100), ncol=10)
n <- nrow(aa)
output <- matrix(0, n, n)
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(m) # mean(ddi) == mean(m)
  r <- sweep(ddi, 1, m)
  b <- sweep(r, 2, m)
  return(b + M)
}
# precompute akl here
require(plyr)
akl.list <- llply(1:nrow(aa), function(i) {
  akl(dist(aa[i, ]))
})
# Now apply your function, but index the list instead of recomputing every time
for (i in 1:n) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  for (j in i:n) {
    B <- akl.list[[j]]
    V <- sqrt(dVarX * (sqrt(mean(B * B))))
    output[i, j] <- (sqrt(mean(A * B))) / V
  }
}
This alone should already make your code run faster than before on larger matrices, since you currently recompute akl every time in the inner loop.
Second: in addition to that, you can make it faster still by parallelising as follows:
# now, the parallelisation you require can be achieved as follows
# with the help of `plyr` and `doMC`.
# First step of parallelisation is to compute akl in parallel
require(plyr)
require(doMC)
registerDoMC(10) # 10 Cores/CPUs
akl.list <- llply(1:nrow(aa), function(i) {
  akl(dist(aa[i, ]))
}, .parallel = TRUE)
# then, you could write your for-loop using plyr again as follows
output <- laply(1:n, function(i) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  t <- laply(i:n, function(j) {
    B <- akl.list[[j]]
    V <- sqrt(dVarX * (sqrt(mean(B * B))))
    sqrt(mean(A * B)) / V
  })
  c(rep(0, n - length(t)), t)
}, .parallel = TRUE)
Note that I have added .parallel = TRUE only to the outer loop. This is because you assign the 10 processors to the outer loop. If you added it to both the outer and inner loops, the total number of processes would be 10 * 10 = 100. Please take care of this.
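Since you ask about SNOW specifically: yes, a snow-style cluster would also work for the precomputation step. Here is a minimal sketch using the snow-derived API in the parallel package (the worker count and the objects exported to the workers are my assumptions; adjust them to your setup):
library(parallel)
cl <- makeCluster(4)                # snow-style socket cluster; pick your own worker count
clusterExport(cl, c("aa", "akl"))   # ship the data and the helper function to the workers
akl.list <- parLapply(cl, seq_len(nrow(aa)), function(i) akl(dist(aa[i, ])))
stopCluster(cl)
This is a drop-in replacement for the llply(..., .parallel = TRUE) step above and, unlike doMC, it also works on Windows.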
I'm trying to write a function that builds a large, complicated formula recursively. Basically, what I would love to just work is the following:
f <- function(x) {
  g <- function(y) y
  for (i in 1:4) {
    h <- g
    g <- function(y) h(y)^2
  }
  g(x)
}
Please refrain from laughing at this insane motivation. What I would like to get is a function that returns ((((x^2)^2)^2)^2), but what actually happens is that my R session just crashes immediately, probably because of some sort of call to an unreferenced function, since I'm overwriting the expression for g every time (obviously I don't really know how R works in this scenario).
How can I achieve this idea of retaining the information from the older g references?
1) Recursion We can use recursion like this:
h1 <- function(f, n) if (n == 1) f else function(x) f(h1(f, n-1)(x))
# test, using g <- function(x) x^2 (the single squaring step from the question)
g <- function(x) x^2
h1(g, 4)(3)
## [1] 43046721
(((3^2)^2)^2)^2
## [1] 43046721
2) Reduce This uses Reduce to compose a function f with itself iteratively n times.
h2 <- function(f, n) function(y) Reduce(function(x, f) f(x), rep(list(f), n), y)
h2(g, 4)(3)
## [1] 43046721
3) for
h3 <- function(f, n) {
function(x) {
for(i in 1:n) x <- f(x)
x
}
}
h3(g, 4)(3)
## [1] 43046721
4) Fixed If there is a small, fixed number of compositions, we could just write it out explicitly:
h4 <- function(x) g(g(g(g(x))))
h4(3)
## [1] 43046721
5) Compose We could slightly simplify any of the above using Compose from the functional package. (The purrr package also has a compose function. Use that if you are already using purrr; otherwise, functional has a smaller footprint.)
library(functional)
h1a <- function(f, n) if (n == 1) f else Compose(f, h1a(f, n - 1))
h2a <- function(f, n) Reduce(Compose, rep(list(f), n))
h2b <- function(f, n) do.call(Compose, rep(list(f), n))
h3a <- function(f, n) {
  for (i in 1:n) ff <- if (i == 1) f else Compose(ff, f)
  ff
}
h4a <- Compose(g, g, g, g)
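As an aside (my addition, not part of the original answer): the loop in the question can also be made to work as written by giving each iteration its own environment, e.g. with local(), so that every generated function captures its own copy of h rather than the shared, last one:
f <- function(x) {
  g <- function(y) y
  for (i in 1:4) {
    g <- local({
      h <- g                 # capture the current g in this iteration's own environment
      function(y) h(y)^2
    })
  }
  g(x)
}
f(3)
## [1] 43046721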
I need to speed up my R code. My bottleneck is a function that needs to use the choose() function. It looks like this:
P_ni <- function(Pn, Pi, eta1, eta2, p, d = NA) {
  if (is.na(d)) d <- 1 - p
  if (Pn == Pi) {
    output <- p^Pn
  } else {
    if (Pi == 1) seq1 <- seq_len(Pn - 1)
    if (Pi > 1)  seq1 <- seq_len(Pn - 1)[-seq_len(Pi - 1)]
    output <- sum(choose((Pn - Pi - 1), c(seq1 - Pi)) * choose(Pn, seq1) *
                    (eta1 / (eta1 + eta2))^c(seq1 - Pi) *
                    (eta2 / (eta1 + eta2))^c(Pn - seq1) * p^seq1 * d^c(Pn - seq1))
  }
  return(output)
}
This function needs to be called many times with different Pn and Pi. The problem is that Pn and Pi can only be single numbers; the function does not work with vectors. This is caused by the choose() function.
I do this with a for-loop at the moment and it works perfectly, but it is slow.
The for-loop looks like this:
for (i in 1:nrow(n_k_matrix_p)) {
  n_k_matrix_p[i, 4] <- P_ni(n_k_matrix_p[i, 1], n_k_matrix_p[i, 2], eta1, eta2, p)
}
To make it reproducible:
eta1 <- 10
eta2 <- 5
p <- 0.4
n_k_matrix <- expand.grid(c(1:20),c(1:20))
n_k_matrix <- n_k_matrix[n_k_matrix[,1] >=n_k_matrix[,2],]
n_k_matrix <- n_k_matrix[order(n_k_matrix[,1]),]
The n_k_matrix contains my numbers for Pn and Pi.
Unfortunately the loop is still faster than using apply.
Does anyone have any idea how to speed things up?
You can regroup terms and precompute some of the quantities.
P_ni2 <- function(n, eta1, eta2, p, d = 1 - p) {
  res <- matrix(0, n, n)
  diag(res) <- p^seq_len(n)
  C1 <- eta1 / eta2 * p / d
  C2 <- eta2 / (eta1 + eta2) * d
  C3 <- eta1 / (eta1 + eta2)
  C2_n <- C2^seq_len(n)
  C3_n <- C3^seq_len(n)
  precomputed <- outer(0:n, 0:n, choose)
  for (j in seq_len(n)) {
    for (i in seq_len(j - 1)) {
      seq1 <- seq(i, j - 1)
      res[i, j] <- sum(
        precomputed[j - i, seq1 - i + 1] * precomputed[j + 1, seq1 + 1] * C1^seq1
      ) * C2_n[j] / C3_n[i]
    }
  }
  res
}
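To spell out the regrouping (my reading of the code above, writing n for Pn, i for Pi and k for the summation index seq1): each term of the original sum factors into a k-dependent part and constants that depend only on n or only on i, which is what lets them be precomputed and pulled out of the inner loop.

$$
\binom{n-i-1}{k-i}\binom{n}{k}\left(\frac{\eta_1}{\eta_1+\eta_2}\right)^{k-i}\left(\frac{\eta_2}{\eta_1+\eta_2}\right)^{n-k}p^{k}d^{\,n-k}
=\binom{n-i-1}{k-i}\binom{n}{k}\,C_1^{\,k}\,\frac{C_2^{\,n}}{C_3^{\,i}},
\qquad
C_1=\frac{\eta_1}{\eta_2}\cdot\frac{p}{d},\quad
C_2=\frac{\eta_2}{\eta_1+\eta_2}\,d,\quad
C_3=\frac{\eta_1}{\eta_1+\eta_2}.
$$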
Verification:
> system.time({
+ n_k_matrix[[3]] <- sapply(1:nrow(n_k_matrix), function(i) {
+ P_ni(n_k_matrix[i,1], n_k_matrix[i,2], eta1, eta2, p)
+ })
+ })
   user  system elapsed
11.799 0.000 11.797
> system.time({
+ test <- P_ni2(400, eta1, eta2, p)
+ n_k_matrix[[4]] <- test[as.matrix(n_k_matrix[, 2:1])]
+ })
   user  system elapsed
2.328 0.003 2.341
> all.equal(n_k_matrix[[3]], n_k_matrix[[4]])
[1] TRUE
Note that I first store the results in the upper triangle of a square matrix. Then I convert it to your data frame format (which you call a matrix, by the way).
This solution is 5 times faster for n = 400. I think you could improve it further by recoding the double loop (only) in Rcpp.
I have a function that takes i and j as parameters and returns a single value, and I currently have a nested loop that computes a value for each entry of a square matrix. In essence, each individual value can be computed in parallel. Is there a way I can apply lapply in this situation? The resulting matrix must be N x N, and the function depends on i and j. Thanks
for (i in 1:matrixRowLength) {
  for (j in 1:matrixColLength) {
    result_matrix[i, j] <- func(i, j)   # func is my function of i and j
  }
}
The foreach package has a nesting operator that can be useful when parallelizing nested for loops. Here's an example:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
matrixRowLength <- 5
matrixColLength <- 5
fun <- function(i, j) 10 * i + j
result_matrix.1 <-
  foreach(j=1:matrixColLength, .combine='cbind') %:%
    foreach(i=1:matrixRowLength, .combine='c') %dopar% {
      fun(i, j)
    }
Note that I reversed the order of the loops so that the matrix is computed column by column. This is generally preferable since matrices in R are stored in column-major order.
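(A quick illustration of the column-major point, in case it is unfamiliar: matrix() fills its data column by column.)
matrix(1:6, nrow = 2)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6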
The nesting operator is useful if you have large tasks and at least one of the loops may have a small number of iterations. But in many cases, it's safer to only parallelize the outer loop:
result_matrix.2 <-
  foreach(j=1:matrixColLength, .combine='cbind') %dopar% {
    x <- double(matrixRowLength)
    for (i in 1:matrixRowLength) {
      x[i] <- fun(i, j)
    }
    x
  }
Note that it can also be useful to use chunking in the outer loop to decrease the amount of post-processing performed by the master process. Unfortunately, this technique is a bit trickier:
library(itertools)
nw <- getDoParWorkers()
result_matrix.3 <-
  foreach(jglobals=isplitIndices(matrixColLength, chunks=nw),
          .combine='cbind') %dopar% {
    localColLength <- length(jglobals)
    m <- matrix(0, nrow=matrixRowLength, ncol=localColLength)
    for (j in 1:localColLength) {
      for (i in 1:matrixRowLength) {
        m[i,j] <- fun(i, jglobals[j])
      }
    }
    m
  }
In my experience, this method often gives the best performance.
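One small housekeeping note that is not in the original answer: when you are done, shut down the workers that makeSOCKcluster() started:
stopCluster(cl)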
Thanks for an interesting question / use case. Here's a solution using the future package (I'm the author):
First, define (*):
future_array_call <- function(dim, FUN, ..., simplify = TRUE) {
  args <- list(...)
  idxs <- arrayInd(seq_len(prod(dim)), .dim = dim)
  idxs <- apply(idxs, MARGIN = 1L, FUN = as.list)
  y <- future::future_lapply(idxs, FUN = function(idx_list) {
    do.call(FUN, args = c(idx_list, args))
  })
  if (simplify) y <- simplify2array(y)
  dim(y) <- dim
  y
}
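(For reference, and not part of the original answer: arrayInd() is what generates the (i, j) argument lists above; it maps linear indices to array indices in column-major order.)
arrayInd(seq_len(6), .dim = c(2, 3))
##      [,1] [,2]
## [1,]    1    1
## [2,]    2    1
## [3,]    1    2
## [4,]    2    2
## [5,]    1    3
## [6,]    2    3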
This function does not make any assumptions about the data type your function returns, but with the default simplify = TRUE it will try to simplify the returned data type if possible (similar to how sapply() works).
Then with your matrix dimensions (**):
matrixRowLength <- 5
matrixColLength <- 5
dim <- c(matrixRowLength, matrixColLength)
and function:
slow_fun <- function(i, j, ..., a = 1.0) {
  Sys.sleep(0.1)
  a * i + j
}
you can calculate slow_fun(i, j, a = 10) for all elements as:
y <- future_array_call(dim, FUN = slow_fun, a = 10)
To do it in parallel on your local machine, use:
library("future")
plan(multiprocess)
y <- future_array_call(dim, FUN = slow_fun, a = 10)
On a cluster of machines (for which you have SSH access with SSH-key authentication), use:
library("future")
plan(cluster, workers = c("machine1", "machine2"))
y <- future_array_call(dim, FUN = slow_fun, a = 10)
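As a quick sanity check (my addition, not in the original answer), the result should agree with a plain nested loop:
y_loop <- matrix(NA_real_, matrixRowLength, matrixColLength)
for (i in seq_len(matrixRowLength))
  for (j in seq_len(matrixColLength))
    y_loop[i, j] <- slow_fun(i, j, a = 10)
stopifnot(isTRUE(all.equal(y, y_loop)))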
Footnotes:
(*) If you wonder how it works, just replace the future::future_lapply() statement with a regular lapply().
(**) future_array_call(dim, FUN) should work for any length(dim), not just for two (= matrices).
I am building a movie recommendation engine and the below code computes the similarity matrix.
data <- read.csv('movie_test.csv')
similarity <- matrix(NA, nrow(data), nrow(data))
for (i in 1:nrow(data)) {
  for (j in 1:nrow(data)) {
    if (i != j) {
      similarity[i, j] <- sum((data[i,] * data[j,]), na.rm = TRUE) /
        (sqrt((sum(((data[i,] - data[j,] + data[j,]) * data[i,]), na.rm = TRUE))) *
           sqrt((sum(((data[j,] - data[i,] + data[i,]) * data[j,]), na.rm = TRUE))))
    }
  }
}
For a small dataset this works perfectly, but for 900 users and 1000 movies it does not scale. I have heard that the apply family of functions is faster, but I doubt even that will scale. Is there any other way I can achieve the above task without using a for loop?
Thank you so much for your suggestions!!!
This should be fast:
m <- as.matrix(data)
m[is.na(m)] <- 0                # treat missing ratings as zero
z <- m %*% t(m)                 # all pairwise dot products in one matrix product
d <- sqrt(diag(z))              # the norm of each row
similarity <- t(t(z) / d) / d   # divide each entry by both row norms (cosine similarity)
The diagonal will contain 1, which seems more appropriate than NA, but if you prefer you can always do:
diag(similarity) <- NA
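A small extra tweak (my suggestion, not in the original answer): base R's tcrossprod() computes m %*% t(m) in a single call and is usually a bit faster:
z <- tcrossprod(m)   # same result as m %*% t(m)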
I want to use arms() to get one sample each time and make a loop like the following one in my function. It runs very slowly. How could I make it run faster? Thanks.
library(HI)
dmat <- matrix(0, nrow = 100, ncol = 30)
system.time(
  for (d in 1:100) {
    for (j in 1:30) {
      y <- rep(0, 101)
      for (i in 2:100) {
        y[i] <- arms(0.3, function(x) (3.5 + 0.000001 * d * j * y[i - 1]) * log(x) - x,
                     function(x) (x > 1e-4) * (x < 20), 1)
      }
      dmat[d, j] <- sum(y)
    }
  }
)
This is a version based on Tommy's answer but avoiding all loops:
library(multicore) # or library(parallel) in 2.14.x
set.seed(42)
m = 100
n = 30
system.time({
  arms.C <- getNativeSymbolInfo("arms")$address
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x > 1e-4) * (x < 20))
  if (diff(bounds) < 1e-07) stop("pointless!")
  # create the vector of z values
  zval <- 0.00001 * rep(seq.int(n), m) * rep(seq.int(m), each = n)
  # apply the inner function to each grid point and return the matrix
  dmat <- matrix(unlist(mclapply(zval, function(z)
    sum(unlist(lapply(seq.int(100), function(i)
      .Call(arms.C, bounds, function(x) (3.5 + z * i) * log(x) - x,
            0.3, 1L, parent.frame())
    )))
  )), m, byrow = TRUE)
})
On a multicore machine this will be really fast, since it spreads the load across cores. On a single-core machine (or for poor Windows users) you can replace mclapply above with lapply and get only a slight speedup compared to Tommy's answer. But note that the result will be different for the parallel version, since it will use different RNG sequences.
Note that any C code that needs to evaluate R functions will be inherently slow (because interpreted code is slow). I have added the arms.C call just to remove all R->C overhead to make moli happy ;), but it doesn't make any difference.
You could squeeze out a few more milliseconds by using column-major processing (the question's code is row-major, which requires re-copying, as R matrices are always column-major).
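If reproducibility of the parallel version matters, one option (my suggestion, based on the parallel package's semantics, not part of the original answer) is to switch to the L'Ecuyer-CMRG generator, so each worker draws from its own reproducible stream (given the same settings and number of workers):
RNGkind("L'Ecuyer-CMRG")
set.seed(42)
# mclapply(..., mc.set.seed = TRUE) then derives a separate, reproducible stream per worker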
Edit: I noticed that moli changed the question slightly after Tommy answered, so instead of the sum(...) part you have to use a loop, since the y[i] are dependent; function(z) would then look like this:
function(z) {
  y <- 0
  for (i in seq.int(99))
    y <- y + .Call(arms.C, bounds, function(x) (3.5 + z * y) * log(x) - x,
                   0.3, 1L, parent.frame())
  y
}
Well, one effective way is to get rid of the overhead inside arms. It does some checks and calls indFunc every time, even though the result is always the same in your case.
Some other evaluations can also be done outside the loop. These optimizations bring the time down from 54 seconds to around 6.3 seconds on my machine, and the answer is identical.
set.seed(42)
#dmat2 <- ##RUN ORIGINAL CODE HERE##
# Now try this:
set.seed(42)
dmat <- matrix(0, nrow = 100, ncol = 30)
system.time({
  e <- new.env()
  bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x > 1e-4) * (x < 20))
  f <- function(x) (3.5 + z * i) * log(x) - x
  if (diff(bounds) < 1e-07) stop("pointless!")
  for (d in seq_len(nrow(dmat))) {
    for (j in seq_len(ncol(dmat))) {
      y <- 0
      z <- 0.00001 * d * j
      for (i in 1:100) {
        y <- y + .Call("arms", bounds, f, 0.3, 1L, e)
      }
      dmat[d, j] <- y
    }
  }
})
all.equal(dmat, dmat2) # TRUE
Why not like this?
dat <- expand.grid(d=1:10, j=1:3, i=1:10)
arms.func <- function(vec) {
  require(HI)
  dji <- vec[1] * vec[2] * vec[3]
  arms.out <- arms(0.3,
                   function(x, params) (3.5 + 0.00001 * params) * log(x) - x,
                   function(x, params) (x > 1e-4) * (x < 20),
                   n.sample = 1,
                   params = dji)
  return(arms.out)
}
dat$arms <- apply(dat, 1, arms.func)
library(plyr)
out <- ddply(dat, .(d, j), summarise, arms = sum(arms))
matrix(out$arms, nrow = length(unique(out$d)), ncol = length(unique(out$j)))
However, it's still single-core and time-consuming. But that isn't R being slow; it's the arms function.
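If you do want to spread that row-wise apply() across cores, one option (a sketch of mine, assuming a Unix-alike; on Windows you would use a PSOCK cluster with parLapply() instead; the core count here is arbitrary) is parallel::mclapply() over the rows:
library(parallel)
dat$arms <- unlist(mclapply(seq_len(nrow(dat)),
                            function(r) arms.func(as.numeric(dat[r, c("d", "j", "i")])),
                            mc.cores = 4))
The per-sample cost of arms() itself stays the same, of course; you just run several samples at once.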