I've inherited some R code and it runs incredibly slowly. Most of the time is spent evaluating functions of the form below (there are about 15 such functions with different integrands G):
TMin <- 0.5
F <- function(t, d) {
  result <- ifelse((d > 0) & (t > TMin),
                   mapply(function(t, d) integrate(G, lower = 0, upper = t, t, d)$value, t, d),
                   0)
  return(result)
}
For testing, I'm using the following dummy function, but in the real code the Gs are much more complicated, involving exp(), log(), dlnorm(), plnorm(), etc.
G <- function(x, t, d) {
  mean(rnorm(1e5))  # stand-in for an expensive computation; the value is discarded
  x + t - d
}
F will be calculated around 2 million times in the worst case.
The function gets called in 3 different ways, either:
t is a single number and d is a numeric vector or,
t is a numeric vector and d is a single number or,
t is a numeric vector and d is a numeric vector
Is there a (simple) way to speed up this function?
So far I've tried variations along the lines of the following (to get rid of the ifelse):
F2 <- function(t, d) {
  TempRes <- mapply(function(t, d) integrate(G, lower = 0, upper = t, t, d)$value, t, d)
  TempRes[(d <= 0) | (t <= TMin)] <- 0
  result <- TempRes
  return(result)
}
and
F3 <- function(t, d) {
  result <- rep(0, max(length(t), length(d)))
  test <- ((d > 0) & (t > TMin))
  result[test] <- mapply(function(t, d) integrate(G, lower = 0, upper = t, t, d)$value, t, d)[test]
  return(result)
}
but they take almost exactly the same time.
You are performing a large number of independent integrations. You can speed things up by performing these integrations on separate cores simultaneously (if you have a multicore processor available). The problem is that R performs its calculations in a single-threaded manner by default. However, there are a number of packages available that allow multithreading support. I have recently answered a few similar questions here and here, with some additional info regarding the relevant packages and functions.
Additionally, as @Mike Dunlavey already mentioned, you should avoid performing the integrations for values of t and d that do not match your criteria. (You are currently performing unneeded function evaluations for these values and then overwriting the results with 0 afterwards.)
I have added a possible improvement below. Note that you will have to create a separate file with your function G included in order to evaluate it on the cluster nodes. In the code below it is assumed that this file is called functionG.R
The snippet:
library(doParallel)
F4 <- function(t, d) {
  results <- vector(mode = "numeric", length = max(length(t), length(d)))  # zero vector
  logicalVector <- ((d > 0) & (t > TMin))
  relevantT <- t[logicalVector]
  relevantD <- d[logicalVector]  # when d is a single element, NA values are created
  if (length(relevantT) > 1 | length(relevantD) > 1)
  {
    if (length(d) == 1)  # d is only one element instead of a vector --> replicate it
      relevantD <- rep(d, length(relevantT))
    if (length(t) == 1)  # t is only one element instead of a vector --> replicate it
      relevantT <- rep(t, length(relevantD))
    cl <- makeCluster(detectCores())
    registerDoParallel(cl)
    clusterEvalQ(cl, eval(parse("functionG.R")))
    integrationResults <- foreach(i = 1:length(relevantT), .combine = "c") %dopar%
    {
      integrate(G, lower = 0, upper = relevantT[i], relevantT[i], relevantD[i])$value
    }
    stopCluster(cl)
    results[logicalVector] <- integrationResults
  }
  else if (length(relevantT) == 1)  # cluster overhead not needed
  {
    results[logicalVector] <- integrate(G, lower = 0, upper = relevantT, relevantT, relevantD)$value
  }
  return(results)
}
My CPU contains 6 physical cores with hyperthreading enabled (x2). These are the results:
> t = -5000:20000
> d = -5000:20000
>
> start = Sys.time()
> testF3 = F3(t,d)
> timeNeededF3 = Sys.time()-start
>
> start = Sys.time()
> testF4 = F4(t,d)
> timeNeededF4 = Sys.time()-start;
> timeNeededF3
Time difference of 3.452825 mins
> timeNeededF4
Time difference of 29.52558 secs
> identical(testF3,testF4)
[1] TRUE
It seems that the cores are constantly in use while running this code. However, you could potentially optimize this code further by presplitting the data more evenly across the cores and then using an apply-type function on each core, as sketched below.
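One possible shape of that presplitting (a sketch only; it reuses the cl, relevantT, relevantD and G objects from the F4 snippet above):
chunksT <- clusterSplit(cl, relevantT)  # one consecutive piece per worker
chunksD <- clusterSplit(cl, relevantD)
pieces <- clusterMap(cl, function(tt, dd)
  mapply(function(t, d) integrate(G, lower = 0, upper = t, t, d)$value, tt, dd),
  chunksT, chunksD)
integrationResults <- unlist(pieces)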
If more optimization is required you could also take a deeper look at the integrate function. You can potentially play around with the settings and obtain a performance gain by allowing a less strict numerical approximation. As an alternative you could implement your own simple version of adaptive Simpson quadrature and play around with the discrete stepsizes. Most likely you could obtain massive performance increases like this (if you are able/willing to allow more error in the approximation).
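For example, a rough sketch of the tolerance idea (the rel.tol value of 1e-4 is purely illustrative; you would need to check that the resulting error is acceptable for your real G):
F_loose <- function(t, d, tol = 1e-4) {
  mapply(function(t, d) {
    if (d > 0 && t > TMin) {
      integrate(G, lower = 0, upper = t, t, d, rel.tol = tol)$value
    } else {
      0
    }
  }, t, d)
}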
EDIT:
Updated the code in order for it to work in all scenarios: d and/or t valid/invalid numbers or vectors.
REPLY TO COMMENTS
@mawir: you are correct. ifelse(test, yes, no) will return the corresponding yes values for the rows for which test evaluates to TRUE, and the respective no values for the rows for which test evaluates to FALSE. However, it WILL first have to evaluate your yes expression in order to create the yes vector of length(test). This piece of code demonstrates this:
> t = -5000:5
> d = -5000:5
>
> start = Sys.time()
> testF1 = F(t,d)
> timeNeededF1 = Sys.time()-start
> timeNeededF1
Time difference of 43.31346 secs
>
> start = Sys.time()
> testF4 = F4(t,d)
> timeNeededF4 = Sys.time()-start
> timeNeededF4
Time difference of 2.284134 secs
Only the last 5 values of t and d are relevant in this scenario.
However, inside the original F function the ifelse evaluates the mapply on all t and d values first in order to create the yes vector. This is why the function execution takes so long. It then selects the elements for which the conditions are met, and 0 otherwise. The F4 function works around this issue.
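Here is another tiny illustration of that evaluation behaviour (a sketch; the sleep merely stands in for expensive work):
slow_identity <- function(x) { Sys.sleep(0.1); x }
system.time(res <- ifelse(1:10 > 8, sapply(1:10, slow_identity), 0))
# roughly 1 second: all ten slow evaluations run, although only two results are kept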
Furthermore, you say that you obtain a speedup in the scenario where t and d are non-vectors. However, in this scenario no parallelisation is used. You should normally obtain the maximum speedup in the scenarios where one or both of t/d are vectors.
ANOTHER EDIT, in response to Roland's comment:
You could potentially replace clusterEvalQ(cl,eval(parse("functionG.R"))) with clusterExport(cl,"G") if you prefer not to create a separate function file(s).
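For reference, a short sketch of the two alternatives side by side (assuming the cluster object cl from the snippet above; you would use one or the other, not both):
clusterEvalQ(cl, eval(parse("functionG.R")))  # read G's definition from a file on each worker
clusterExport(cl, "G")                        # or copy G from the master workspace instead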
As a generality, the place to look is in the innermost loop, and you can speed it up either by making it take less time or by calling it fewer times. You have an inner loop running mapply, but then you extract element [test] from it. Does that mean all the other elements are discarded? If so, why bother spending time to calculate the extra elements?
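A minimal sketch of that subset-first idea (reusing G and TMin from the question), so the integrations only run where the result is actually kept:
F_subset <- function(t, d) {
  n <- max(length(t), length(d))
  t <- rep_len(t, n)
  d <- rep_len(d, n)
  out <- numeric(n)
  keep <- (d > 0) & (t > TMin)
  if (any(keep)) {
    out[keep] <- mapply(function(t, d) integrate(G, lower = 0, upper = t, t, d)$value,
                        t[keep], d[keep])
  }
  out
}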
I'm trying to create a function to solve this puzzle:
An Arithmetic Progression is defined as one in which there is a constant difference between the consecutive terms of a given series of numbers. You are provided with consecutive elements of an Arithmetic Progression. There is however one hitch: exactly one term from the original series is missing from the set of numbers which have been given to you. The rest of the given series is the same as the original AP. Find the missing term.
You have to write the function findMissing(list); list will always contain at least 3 numbers. The missing term will never be the first or last one.
The next section of code shows my attempt at this function. The site I'm on runs tests against the function, all of which passed, in that they output the correct missing integer.
The problem I'm facing is that it gives me a timeout error because it takes too long to run all the tests. There are 102 tests, and it says they take over 12 seconds to complete. Taking more than 12 seconds means the function isn't efficient enough.
After running my own timing tests in RStudio, it seems the function takes considerably less than 12 seconds to run, but regardless I need to make it more efficient to be able to complete the puzzle.
I asked on the site forum and someone said "Sorting is expensive, think of another way of doing it without it." I took this to mean I shouldn't be using the sort() function. Is this what they mean?
I've since found a few different ways of getting my_diff, which is calculated using the sort() function. All of these ways are even less efficient than the original way of doing it.
Can anyone give me a more efficient way of finding my_diff without the sort, or maybe make other parts of the code more efficient? It's the sort() part which is apparently the inefficient part of the code, though.
find_missing <- function(sequence){
  len <- length(sequence)
  if (len > 3){
    my_diff <- as.integer(names(sort(table(diff(sequence)), decreasing = TRUE))[1])
    complete_seq <- seq(sequence[1], sequence[len], my_diff)
  } else {
    differences <- diff(sequence)
    complete_seq_1 <- seq(sequence[1], sequence[len], differences[1])
    complete_seq_2 <- seq(sequence[1], sequence[len], differences[2])
    if (length(complete_seq_1) == 4){
      complete_seq <- complete_seq_1
    } else {
      complete_seq <- complete_seq_2
    }
  }
  complete_seq[!complete_seq %in% sequence]
}
Here are a couple of sample sequences to check the code works:
find_missing(c(1,3,5,9,11))
find_missing(c(1,5,7))
Here are some of the other things I tried instead of sort():
1:
library(pracma)
Mode(diff(sequence))
2:
library(dplyr)
(data.frame(diff_1 = diff(sequence)) %>%
group_by(diff_1) %>%
summarise(count = n()) %>%
ungroup() %>%
filter(count==max(count)))[1]
3:
MaxTable <- function(sequence, mult = FALSE) {
  differences <- diff(sequence)
  if (!is.factor(differences)) differences <- factor(differences)
  A <- tabulate(differences)
  if (isTRUE(mult)) {
    as.integer(levels(differences)[A == max(A)])
  } else {
    as.integer(levels(differences)[which.max(A)])
  }
}
Here is one way to do this using seq. We can create a sequence from the minimum value to the maximum value of the input with length length(x) + 1, since exactly one term is missing from the sequence.
find_missing <- function(x) {
  setdiff(seq(min(x), max(x), length.out = length(x) + 1), x)
}
find_missing(c(1,3,5,9,11))
#[1] 7
find_missing(c(1,5,7))
#[1] 3
This approach takes the diff() of the vector; the gap at the missing term will always be larger than the other differences.
find_missing <- function(x) {
  diffs <- diff(x)
  x[which.max(diffs)] + min(diffs)
}
find_missing(c(1,3,5,9,11))
[1] 7
find_missing(c(1,5,7))
[1] 3
There is actually a simple formula for this, which will work even if your vector is not sorted...
find_missing <- function(x) {
  (length(x) + 1) * (min(x) + max(x)) / 2 - sum(x)
}
find_missing(c(1,5,7))
[1] 3
find_missing(c(1,3,5,9,11,13,15))
[1] 7
find_missing(c(2,8,6))
[1] 4
It is based on the fact that the sum of the full series should be the average value times the length.
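For example, with c(1,5,7) the complete series would have length(x) + 1 = 4 terms averaging (1 + 7)/2 = 4, so its sum should be 16; subtracting the observed sum 13 leaves the missing term 3.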
I tried to look for a duplicate question and I know many people have asked about parLapply in R so I apologize if I missed one that is applicable to my situation.
Problem: I have the following function that runs correctly in R, but when I try to run it in parallel using parLapply (I'm on a Windows machine) I get the error that the $ operator is invalid for atomic vectors. The error mentions that 3 nodes produced the errors, no matter how many nodes I set my cluster to; for example, I have 8 cores on my desktop, so I set the cluster to 7 nodes.
Here is example code showing where the problem is:
library(parallel)
library(doParallel)
library(arrangements)
#Function
perms <- function(inputs)
{
  x <- 0
  L <- 2^length(inputs$w)
  ip <- inputs$ip
  for (i in 1:L)
  {
    y <- ip$getnext() %*% inputs$w
    if (inputs$t >= y)
    {
      x <- x + 1
    }
  }
  return(x)
}
#Inputs is a list of several other variables that are created before this
#function runs (w, t_obs and iperm), here is a reproducible example of them
#W is derived from my data, this is just an easy way to make a reproducible example
set.seed(1)
m <- 15
W <- matrix(runif(15,0,1))
iperm <- arrangements::ipermutations(0:1, m, replace = T)
t_obs <- 5
inputs <- list(W,t_obs, iperm)
names(inputs) <- c("w", "t", "ip")
#If I run the function not in parallel
perms(inputs)
#It gives a value of 27322 for this example data
This runs exactly as it should; however, when I try the following to run it in parallel, I get an error.
#make the cluster
cor <- detectCores()
cl<-makeCluster(cor-1,type="SOCK")
#passing library and arguments
clusterExport(cl, c("inputs"))
clusterEvalQ(cl, {
library(arrangements)
})
results <- parLapply(cl, inputs, perms)
I get the error:
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: $ operator is invalid for atomic vectors
However I've checked to see if anything is an atomic vector using is.atomic(), and using is.recursive(inputs) it says this is TRUE.
My question is: why am I getting this error when I try to run this using parLapply when the function otherwise runs correctly, and is there a reason it says "3 nodes produced errors" even when I have 7 nodes?
It says "3 nodes" because, as you're passing it to parLapply, you are only activating three nodes. The first argument to parLapply should be a list of things, each element to pass to each node. In your case, your inputs is a list, correct, but it is being broken down, such that your three nodes are effectively seeing:
# node 1
perms(inputs[[1]]) # effectively inputs$w
# node 2
perms(inputs[[2]]) # effectively inputs$t
# node 3
perms(inputs[[3]]) # effectively inputs$ip
# nodes 4-7 idle
You could replicate this on the local host (not parallel) with:
lapply(inputs, perms)
and when you see it like that, perhaps it becomes a little more obvious what is being passed to your nodes. (If you want to see it further, do debug(perms), then run the lapply above, and see what the input inside that function call looks like.)
To get this to work once on one node (I think not what you're trying to do), you could do
parLapply(cl, list(inputs), perms)
But that's only going to run one instance on one node. Perhaps you would prefer to do something like:
parLapply(cl, replicate(7, inputs, simplify=FALSE), perms)
I'm adding an answer in case anyone with a similar problem comes across this. @r2evans answered my original question, which led to a realization that even fixing the above problems would not get me the desired result (see the comments to his answer).
Problem: Using the package arrangements to generate a large number of combinations and apply a function to them. This becomes very time consuming as the number of combinations gets huge. What we need to do is split the combinations into chunks depending on the number of cores you will be using to run in parallel, and then do the calculations in each node only on that specific chunk of the combinations.
Solution:
cor <- detectCores() - 1
cl <- makeCluster(cor, type = "SOCK")
set.seed(1)
m <- 15
W <- matrix(runif(15, 0, 1))
#iperm <- arrangements::ipermutations(0:1, m, replace = T)
t_obs <- 5

chunk_list <- list()
for (i in 1:cor)
{
  chunk_list[i] <- i
}
chunk_size <- floor((2^m)/(cor))
chunk_size <- c(rep(chunk_size, cor - 1), (2^m) - chunk_size*(cor - 1))

inputs_list <- Map(list, t = list(t_obs), w = list(W), chunk_list = chunk_list, chunk_size = list(chunk_size))
#inputs <- list(W,t_obs, iperm)
#names(inputs) <- c("w", "t", "ip", "chunk_it")

perms <- function(inputs)
{
  x <- 0
  L <- 2^length(inputs$w)
  ip <- arrangements::ipermutations(0:1, m, replace = T)
  chunk_size <- floor((2^m)/(cor))
  chunk_size <- c(rep(chunk_size, cor - 1), (2^m) - chunk_size*(cor - 1))
  if (inputs$chunk_list != 1)
  {
    ip$getnext(sum(chunk_size[1:(inputs$chunk_list - 1)]))
  }
  for (i in 1:chunk_size[inputs$chunk_list])
  {
    y <- ip$getnext() %*% inputs$w
    if (inputs$t >= y)
    {
      x <- x + 1
    }
  }
  return(x)
}

clusterExport(cl, c("inputs_list", "m", "cor"))
clusterEvalQ(cl, {
  library(arrangements)
})

system.time(results <- parLapply(cl, inputs_list, perms))
Reduce(`+`, results)
What I did was split the total number of combinations up into different chunks, i.e. the first 4681 (I have 7 nodes assigned to cor), then the second chunk, and so on, and made sure I didn't miss any combinations. Then I changed my original function to generate the permutations in each node but to basically skip ahead to the combination it should start calculating on, so for node 1 it starts with the first combination, but for node 2 it starts with the 4682nd, and so on. I'm still working on optimizing this because it's currently only about 4 times as fast as running it serially, even though I'm using 7 cores. I think the skip in the permutation option will speed this up, but I haven't checked yet. Hopefully this is helpful to someone else; it speeds up my estimated time to run a simulation (with m = 25, not 15) from about 10 days to about 2.5 days.
You need to pass dplyr to the nodes to solve this
clusterEvalQ(clust,{library (dplyr)})
The above code should solve your issue.
This is a rather simple question, but somehow my code either takes a long time or consumes too many resources. It is a question asked on www.codewars.com, which I use for R programming practice.
Below are the two versions of the problem I coded:
Version 1 :
f <- function(n, m){
  # Your code here
  if (n <= 0) return(0) else return((n %% m) + f(n - 1, m))
}
Version 2:
# Function created to calculate the sum of the first half of the vector created
calculate_sum <- function(x, y){
  sum = 0
  for (i in x){
    sum = sum + i %% y
  }
  return(sum)
}
# Main function to be called with a test call
f <- function(n, m){
  # Your code here
  # Trying to create two vectors from the number to calculate the sum
  # separately for each half
  if (n %% 2 == 0){
    first_half = 1:(n/2)
    second_half = ((n/2) + 1):n
  } else {
    first_half = 1:floor(n/2)
    second_half = (ceiling(n/2)):n
  }
  sum_first_half = calculate_sum(first_half, m)
  sum_second_half = 0
  for (j in second_half){
    sum_second_half = sum_second_half + (j %% m)
  }
  return(sum_first_half + sum_second_half)
}
I am trying to figure out a way to optimize the code. For the first version it gives the following error message:
Error: C stack usage 7971184 is too close to the limit
Execution halted
For the second version it says my code took more than 7000 ms and hence was killed.
Can someone give me a few pointers on how to optimize the code in R??
The optimisation is mathematical, not programmatic (though, as others have mentioned, loops are slow in R).
Firstly, note that sum(0:(m-1)) = m*(m-1)/2.
You are being asked to add up this complete cycle n %/% m times, plus a remainder of r*(r+1)/2, where r = n - m*(n %/% m), i.e. n %% m. So you might try
f <- function(n, m){
  k <- n %/% m
  r <- n - k*m
  return(k*m*(m-1)/2 + r*(r+1)/2)
}
which is a much less complex calculation, and will not take very long regardless of how large n or m is.
There is a risk that, if n is greater than 2^53 and m does not have enough powers of 2 in its factorisation, there will not be enough precision to calculate k and r accurately enough.
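As a quick sanity check against the brute-force sum (a sketch with arbitrary small inputs, using the f defined above):
n <- 123; m <- 7
f(n, m) == sum((1:n) %% m)  # TRUE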
EDIT: Just Realized it is actually a trick question
I would do
n %/% m * sum(1:(m-1)) + sum(0:(n %% m))
Loops are really slow in R. Also, in my experience, recursive functions in R don't help much with speed and they use a lot of memory.
I am trying to implement a block bootstrap procedure, but I haven't figured out a way of doing this efficiently.
My data.frame has the following structure:
CHR POS var_A var_B
1 192 0.9 0.7
1 2000 0.8 0.3
2 3 0.21 0.76
2 30009 0.36 0.15
...
The first column is the chromosome identification, the second column is the position, and the last two columns are variables for which I want to calculate a correlation. The problem is that the rows are not entirely independent of one another; the closer two rows are, the more dependent they are, and so I cannot simply do cor(df$var_A, df$var_B).
The way out of this problem that is commonly used with this type of data is performing a block bootstrap. That is, I need to divide my data into blocks of length X, randomly select one row inside that block, and then calculate my statistic of interest. Note, however, that these blocks need to be defined based on the column POS, and not based on the row number. Also, this procedure needs to be done for each chromosome.
I tried to implement this, but I came up with the slowest code possible (it didn't even finish running) and I am not 100% sure it works.
x = 1000
cors = numeric()
iter = 1000

for (j in 1:iter) {
  df = freq[0, ]
  for (i in unique(freq$CHR)) {
    t = freq[freq$CHR == i, ]
    fim = t[nrow(t), 2]
    i = t[1, 2]
    f = i + x
    while (f < fim) {
      rows = which(t$POS >= i & t$POS < f)
      s = sample(rows)
      df = rbind(df, t[s, ])
      i = f
      f = f + x
    }
  }
  cors = c(cors, cor(df$var_A, df$var_B))
}
Could anybody help me out? I am sure there is a more efficient way of doing this.
Thank you in advance.
One efficient way to try would be to use the 'boot' package, of which functions include parallel processing capabilities.
In particular, the 'tsboot', or time series boot function, will select ordered blocks of data. This could work if your POS variable is some kind of ordered observation.
The boot package functions are great, but they need a little help first. To use bootstrap functions in the boot package, one must first wrap the statistic of interest in a function which includes an index argument. This is the device the bootstrap generated index will use to pass sampled data to your statistic.
cor_hat <- function(data, index) cor(y = data[index,]$var_A, x = data[index,]$var_B)
Note cor_hat in the arguments below. The sim = "fixed", l = 1000 arguments indicate that you want fixed blocks of length (l) 1000. However, you could use blocks of any size, e.g. 5 or 10, if you're trying to capture nearest-neighbour dynamics moving over time. The parallel = "multicore" argument speaks for itself, but it may need to be "snow" if you are using Windows.
library(boot)
tsboot(data, cor_hat, R = 1000, sim = "fixed", l = 1000, parallel = "multicore", ncpus = 4)
In addition, page 194 of Elements of Statistical Learning provides a good example of the framework using the traditional boot function, all of which is relevant to tsboot.
Hope that helps, good luck.
Justin
I hope I understood you right:
# needed for round_any()
library(plyr)
res <- lapply(unique(freq$CHR), function(x){
  freq_sel <- freq[freq$CHR == x, ]
  blocks <- lapply(seq(1, round_any(max(freq_sel$POS), 1000, ceiling), 1000),
                   function(ix) freq_sel[freq_sel$POS > ix & freq_sel$POS <= ix + 999, ])
  do.call(rbind, lapply(blocks, function(x) if (nrow(x) > 1) x[sample(1:nrow(x), 1), ] else x))
})
This should return a list with an entry for each chromosome. Within each entry, there's an observation per 1kb-block if present. The number of blocks is determined by the maximum POS value.
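A possible follow-up, as a sketch (it assumes the var_A/var_B columns from the question): bind the per-chromosome samples together and compute the statistic of interest.
sampled <- do.call(rbind, res)  # one sampled row per block, across all chromosomes
cor(sampled$var_A, sampled$var_B)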
EDIT:
library(doParallel)
library(foreach)
library(plyr)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
res <- foreach(x = unique(freq$CHR), .packages = 'plyr') %dopar% {
  freq_sel <- freq[freq$CHR == x, ]
  blocks <- lapply(seq(1, round_any(max(freq_sel$POS), 1000, ceiling), 1000),
                   function(ix) freq_sel[freq_sel$POS > ix & freq_sel$POS <= ix + 999, ])
  do.call(rbind, lapply(blocks, function(x) if (nrow(x) > 1) x[sample(1:nrow(x), 1), ] else x))
}
stopCluster(cl)
This is a simple parallelisation with foreach on each Chromosome. It could be better to restructure the function and base the parallel processing on another level (such as the 1000 iterations or maybe the blocks). In any case, I can just stress again what I was saying in my comment: Before you work on parallelising your code, you should be sure that it's as efficient as possible. Meaning you might want to look into the boot package or similar to get an increase in efficiency. That said, with the number of iterations you're planning, parallel processing might be useful once you're comfortable with your function.
So, after a while I came up with an answer to my problem. Here it goes.
You'll need the package dplyr.
library(dplyr)

l = 1000
teste = freq %>%
  mutate(w = ceiling(POS/l)) %>%
  group_by(CHR, w) %>%
  sample_n(1)
This code creates a new variable named w based on the position in the genome (POS). This variable w is the window to which each row was assigned, and it depends on l, which is the length of your window.
You can repeat this code several times, each time sampling one row per window/CHR (with the sample_n(1)) and apply whatever statistic of interest that you want.
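For instance, a sketch of that repetition (it reuses freq, l and the dplyr pipeline from above; 1000 replicates is an arbitrary choice):
boot_cors <- replicate(1000, {
  samp <- freq %>%
    mutate(w = ceiling(POS/l)) %>%
    group_by(CHR, w) %>%
    sample_n(1)            # one row per window per chromosome
  cor(samp$var_A, samp$var_B)
})
quantile(boot_cors, c(0.025, 0.975))  # e.g. a 95% bootstrap interval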
I have a very large list X and a vectorized function f. I want to calculate f(X), but this will take a long time if I do it with a single core. I have (access to) a 48-core server. What is the easiest way to parallelize the calculation of f(X)? The following is not the right answer:
library(foreach)
library(doMC)
registerDoMC()
foreach(x=X, .combine=c) %dopar% f(x)
The above code will indeed parallelize the calculation of f(X), but it will do so by applying f separately to every element of X. This ignores the vectorized nature of f and will probably make things slower as a result, not faster. Rather than applying f elementwise to X, I want to split X into reasonably-sized chunks and apply f to those.
So, should I just manually split X into 48 equal-sized sublists and then apply f to each in parallel, then manually put together the result? Or is there a package designed for this?
In case anyone is wondering, my specific use case is here.
Although this is an older question, this might be interesting for everyone who stumbles upon it via Google (like me): have a look at the pvec function in the multicore package. I think it does exactly what you want.
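A minimal sketch of what that call might look like (note that multicore's functionality now lives in the base parallel package; pvec relies on forking, so it will not run in parallel on Windows, and it assumes f can be applied to arbitrary chunks of X and the results concatenated):
library(parallel)                    # pvec is available from the base parallel package
result <- pvec(X, f, mc.cores = 8)   # the mc.cores value is only an example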
Here's my implementation. It's a function chunkapply that takes a vectorized function, a list of arguments that should be vectorized, and a list of arguments that should not be vectorized (i.e. constants), and returns the same result as calling the function on the arguments directly, except that the result is calculated in parallel.
For a function f, vector arguments v1, v2, v3, and scalar
arguments s1, s2, the following should return identical results:
f(a=v1, b=v2, c=v3, d=s1, e=s2)
f(c=v3, b=v2, e=s2, a=v1, d=s1)
chunkapply(FUN=f, VECTOR.ARGS=list(a=v1, b=v2, c=v3), SCALAR.ARGS=list(d=s1, e=s2))
chunkapply(FUN=f, SCALAR.ARGS=list(e=s2, d=s1), VECTOR.ARGS=list(a=v1, c=v3, b=v2))
Since it is impossible for the chunkapply function to know which
arguments of f are vectorized and which are not, it is up to you to
specify when you call it, or else you will get the wrong results. You
should generally name your arguments to ensure that they get bound
correctly.
library(foreach)
library(iterators)
# Use your favorite doPar backend here
library(doMC)
registerDoMC()
get.chunk.size <- function(vec.length,
                           min.chunk.size = NULL, max.chunk.size = NULL,
                           max.chunks = NULL) {
  if (is.null(max.chunks)) {
    max.chunks <- getDoParWorkers()
  }
  size <- vec.length / max.chunks
  if (!is.null(max.chunk.size)) {
    size <- min(size, max.chunk.size)
  }
  if (!is.null(min.chunk.size)) {
    size <- max(size, min.chunk.size)
  }
  num.chunks <- ceiling(vec.length / size)
  actual.size <- ceiling(vec.length / num.chunks)
  return(actual.size)
}
ichunk.vectors <- function(vectors = NULL,
                           min.chunk.size = NULL,
                           max.chunk.size = NULL,
                           max.chunks = NULL) {
  ## Calculate number of chunks
  recycle.length <- max(sapply(vectors, length))
  actual.chunk.size <- get.chunk.size(recycle.length, min.chunk.size, max.chunk.size, max.chunks)
  num.chunks <- ceiling(recycle.length / actual.chunk.size)

  ## Make the chunk iterator
  i <- 1
  it <- idiv(recycle.length, chunks = num.chunks)
  nextEl <- function() {
    n <- nextElem(it)
    ix <- seq(i, length = n)
    i <<- i + n
    vchunks <- foreach(v = vectors) %do% v[1 + (ix - 1) %% length(v)]
    names(vchunks) <- names(vectors)
    vchunks
  }
  obj <- list(nextElem = nextEl)
  class(obj) <- c("ichunk", "abstractiter", "iter")
  obj
}
chunkapply <- function(FUN, VECTOR.ARGS, SCALAR.ARGS = list(), MERGE = TRUE, ...) {
  ## Check that the arguments make sense
  stopifnot(is.list(VECTOR.ARGS))
  stopifnot(length(VECTOR.ARGS) >= 1)
  stopifnot(is.list(SCALAR.ARGS))
  ## Choose appropriate combine function
  if (MERGE) {
    combine.fun <- append
  } else {
    combine.fun <- foreach:::defcombine
  }
  ## Chunk and apply, and maybe merge
  foreach(vchunk = ichunk.vectors(vectors = VECTOR.ARGS, ...),
          .combine = combine.fun) %dopar%
  {
    do.call(FUN, args = append(vchunk, SCALAR.ARGS))
  }
}
## Only do chunkapply if it will run in parallel
maybe.chunkapply <- function(FUN, VECTOR.ARGS, SCALAR.ARGS = list(), ...) {
  if (getDoParWorkers() > 1) {
    chunkapply(FUN, VECTOR.ARGS, SCALAR.ARGS, ...)
  } else {
    do.call(FUN, append(VECTOR.ARGS, SCALAR.ARGS))
  }
}
Here are some examples showing that chunkapply(f,list(x)) produces identical results to f(x). I have set the max.chunk.size extremely small to ensure that the chunking algorithm is actually used.
> # Generate all even integers from 2 to 100 inclusive
> identical(chunkapply(function(x,y) x*y, list(1:50), list(2), max.chunk.size=10), 1:50 * 2)
[1] TRUE
> ## Sample from a standard normal distribution, then discard values greater than 1
> a <- rnorm(n=100)
> cutoff <- 1
> identical(chunkapply(function(x,limit) x[x<=limit], list(x=a), list(limit=cutoff), max.chunk.size=10), a[a<cutoff])
[1] TRUE
If anyone has a better name than "chunkapply", please suggest it.
Edit:
As another answer points out, there is a function called pvec in the multicore package that has very similar functionality to what I have written. For simple cases, you should use that, and you should vote up Jonas Rauch's answer for it. However, my function is a bit more general, so if any of the following apply to you, you might want to consider using my function instead:
You need to use a parallel backend other than multicore (e.g. MPI). My function uses foreach, so you can use any parallelization framework that provides a backend for foreach.
You need to pass multiple vectorized arguments. pvec only vectorizes over a single argument, so you couldn't easily implement parallel vectorized addition with pvec, for example. My function allows you to specify arbitrary arguments.
The itertools package was designed to address this kind of problem. In this case, I would use isplitVector:
n <- getDoParWorkers()
foreach(x=isplitVector(X, chunks=n), .combine='c') %dopar% f(x)
For this example, pvec is undoubtably faster and simpler, but this can be used on Windows with the doParallel package, for example.
Map-Reduce might be what you're looking for; it's been ported to R
How about something like this? R will take advantage of all the available memory and multicore will parallelize over all available cores.
library(multicore)
result <- mclapply(X, f, mc.preschedule = FALSE, mc.set.seed = FALSE)