I am running the following for loop over the gwr.basic function in the GWmodel package in R. What I need to do is collect the mean of the estimated parameter for each given bandwidth.
The code looks like this:
library(GWmodel)
data("DubVoter")
#Dub.voter
LARentMean = list()
for (i in 20:21)
{
gwr.res <- gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 + Unempl + LowEduc + Age18_24 + Age25_44 + Age45_64, data = Dub.voter, bw = i, kernel = "bisquare", adaptive = TRUE, F123.test = TRUE)
a <- mean(gwr.res$SDF$LARent)
LARentMean[i] <- a
}
outcome = unlist(LARentMean)
> outcome
[1] -0.1117668 -0.1099969
However, it is terribly slow at returning the result, and I need a much wider range such as 20:200. Is there a way to speed the process up? If not, how can I use a stepped range, say 20 to 200 in steps of 5, to reduce the number of operations?
I am a Python user new to R. I read on SO that R is well known for being slow at for loops and that there are more efficient alternatives. More clarity on this point would be welcome.
I got the same impression as @musically_ut. The for loop and the traditional for-vs.-apply debate are unlikely to help you here. Try parallelization if you have more than one core. There are several packages for this, such as parallel or snowfall. Which package is ultimately the best and fastest depends on your machine and operating system.
Best does not always equal fastest here. Code that works cross-platform can be worth more than a bit of extra performance, and transparency and ease of use can outweigh maximum speed. That being said, I like the standard solution a lot and would recommend parallel, which ships with R and works on Windows, OS X and Linux.
EDIT: here's a fully reproducible example based on the OP's code.
library(GWmodel)
data("DubVoter")
library(parallel)
bwlist <- list(bw1 = 20, bw2 = 21)
cl <- makeCluster(detectCores())
# load 'GWmodel' for each node
clusterEvalQ(cl, library(GWmodel))
# export data to each node
clusterExport(cl, varlist = c("bwlist","Dub.voter"))
out <- parLapply(cl, bwlist, function(e){
try(gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 +
Unempl + LowEduc + Age18_24 + Age25_44 +
Age45_64, data = Dub.voter,
bw = e, kernel = "bisquare",
adaptive = TRUE, F123.test = TRUE ))
} )
LArent_l <- lapply(lapply(out,"[[","SDF"),"[[","LARent")
unlist(lapply(LArent_l,"mean"))
# finally, stop the cluster
stopCluster(cl)
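To cover the full stepped range from the question rather than just two bandwidths, bwlist can be built with seq() before the cluster is started; a small sketch (the bw names are optional and only label the output):
# bandwidths 20 to 200 in steps of 5
bws <- seq(20, 200, by = 5)
bwlist <- as.list(bws)
names(bwlist) <- paste0("bw", bws)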
Besides using parallelization as Matt Bannert suggests, you should preallocate the vector LARentMean. Often it's not the for loop itself that is slow, but the fact that the loop tempts you into doing slow things like growing a vector element by element.
Consider the following example to see the impact of a growing vector as compared to preallocating the memory:
library(microbenchmark)
growing <- function(x) {
mylist <- list()
for (i in 1:x) {
mylist[[i]] <- i
}
}
allocate <- function(x) {
mylist <- vector(mode = "list", length = x)
for (i in 1:x) {
mylist[[i]] <- i
}
}
microbenchmark(growing(1000), allocate(1000), times = 1000)
# Unit: microseconds
# expr min lq mean median uq max neval
# growing(1000) 3055.134 4284.202 4743.4874 4433.024 4655.616 47977.236 1000
# allocate(1000) 867.703 917.738 998.0719 956.441 995.143 2564.192 1000
The growing list is about 5 times slower than the version that preallocates the memory.
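Applied to the original loop, a serial sketch that combines a stepped bandwidth range with a preallocated result vector (no parallelization yet) could look like this:
bws <- seq(20, 200, by = 5)          # stepped bandwidth range
LARentMean <- numeric(length(bws))   # preallocate the result vector
for (j in seq_along(bws)) {
  gwr.res <- gwr.basic(GenEl2004 ~ DiffAdd + LARent + SC1 + Unempl + LowEduc +
                         Age18_24 + Age25_44 + Age45_64,
                       data = Dub.voter, bw = bws[j], kernel = "bisquare",
                       adaptive = TRUE, F123.test = TRUE)
  LARentMean[j] <- mean(gwr.res$SDF$LARent)
}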
Related
A few days ago I asked a question about calling a custom-made function within a loop, which was well resolved by a combination of
eval(parse(text = Function text))
Here is the link: Automatic creation and use of custom made function in R.
This allowed me to work with a for loop and automatically call the function I need from a data frame storing the body of the function to create.
Now I would like to take the question to the next level. My problem is computation time. I need to evaluate something like 52 indices from a hyperspectral image. This means that in R my hyperspectral image is loaded as a 3D array of 512x512x204 bands.
What I would like to do is run the evaluation of the indices in parallel to reduce the computation time.
Here is a dummy example of what I would like to emulate, but not yet in parallel:
# create a fake array representing my hyperspectral image
library(fields)   # provides image.plot
HYPR_IMG=array(NA,dim=c(5,3,4))
HYPR_IMG[,,1]=1
HYPR_IMG[,,2]=2
HYPR_IMG[,,3]=3
HYPR_IMG[,,4]=4
image.plot(HYPR_IMG[,,1], zlim=c(0,20))
image.plot(HYPR_IMG[,,2], zlim=c(0,20))
image.plot(HYPR_IMG[,,3], zlim=c(0,20))
image.plot(HYPR_IMG[,,4], zlim=c(0,20))
#create a fake DF for simulating my indices stored in the dataframe
IDXname=c("IDX1","IDX2","IDX3","IDX4")
IDXFunc=c("HYPR_IMG[,,1] + 3*HYPR_IMG[,,2]",
"HYPR_IMG[,,3] + HYPR_IMG[,,2]",
"HYPR_IMG[,,4] + HYPR_IMG[,,2] - HYPR_IMG[,,3]",
"HYPR_IMG[,,1] + HYPR_IMG[,,4] + 4*HYPR_IMG[,,2] + HYPR_IMG[,,3]")
IDX_DF=as.data.frame(cbind(IDXname,IDXFunc))
# that was what I did before
Store_DF=data.frame(NA)
for (i in 1: length(IDX_DF$IDXname)) {
IDX_ID=IDX_DF$IDXname[i]
IDX_Fun_tmp=IDX_DF$IDXFunc[which(IDX_DF$IDXname==IDX_ID)] #use for extra care to select the right function
IDXFunc_call=paste("IDXfun_tmp=function(HYPR_IMG){",IDX_Fun_tmp,"}",sep="")
eval(parse(text = IDXFunc_call))
IDX_VAL=IDXfun_tmp (HYPR_IMG)
image.plot(IDX_VAL,zlim=c(0,20)); title(main=IDX_ID)
temp_DF=as.vector(IDX_VAL)
Store_DF=cbind(Store_DF,temp_DF)
names(Store_DF)[i+1] <- as.vector(IDX_ID)
}
My final goal is to end up with the very same Store_DF, storing all the index values. Here I have a for loop, but using a foreach loop things should speed up. If it matters, I am working with Windows 8 or later as my OS.
Is it really possible?
Will I be able, in the end, to reduce the overall computation time and still get the same Store_DF data frame, or something similar such as a matrix with column names?
Thanks a lot!!!
For this specific example, using either the built-in parallelization of a package like data.table or a parallel apply might be more beneficial.
Below is a minimal example of how to achieve the result using parApply from the parallel package. Note that the output is a matrix, which actually yields slightly better performance in base R (not necessarily the case in the tidyverse or data.table). If the data.frame structure is vital, you will have to convert it with as.data.frame.
cl <- parallel::makeCluster( parallel::detectCores() )
result <- parallel::parApply(cl = cl, X = IDX_DF, MARGIN = 1, FUN = function(x, IMAGES){
IDX_ID <- x[["IDXname"]]
eval(parse(text = paste0("IDXfun_tmp <- function(HYPR_IMG){", x[["IDXFunc"]], "}")))
IDX_VAL <- as.vector(IDXfun_tmp(IMAGES))
names(IDX_VAL) <- IDX_ID
IDX_VAL
}, IMAGES = HYPR_IMG)
colnames(result) = IDXname
parallel::stopCluster(cl)
Please note the stopCluster(cl) which is important to shut down any loose R sessions.
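If the Store_DF-style data.frame from the question is needed, the matrix returned by parApply can simply be converted afterwards; a minimal sketch:
Store_DF <- as.data.frame(result)   # columns are already named via colnames(result)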
Benchmark results (4 tiny cores):
Unit: milliseconds
expr min lq mean median uq max neval
Loop 8.420432 9.027583 10.426565 9.272444 9.943783 26.58623 100
Parallel 1.382324 1.491634 2.038024 1.554690 1.907728 18.23942 100
For replication of the benchmarks, the code is provided below:
cl <- parallel::makeCluster( parallel::detectCores() )
microbenchmark::microbenchmark(
Loop = {
Store_DF=data.frame(NA)
for (i in 1: length(IDX_DF$IDXname)) {
IDX_ID = IDX_DF$IDXname[i]
IDX_Fun_tmp = IDX_DF$IDXFunc[which(IDX_DF$IDXname == IDX_ID)] #use for extra care to select the right function
eval(parse(text = paste0("IDXfun_tmp = function(HYPR_IMG){", IDX_Fun_tmp, "}")))
IDX_VAL = IDXfun_tmp(HYPR_IMG)
#Plotting in parallel is not a good idea: it will most often not work, and it may crash the R session or slow it down significantly (at best the latter, at worst the former)
#image.plot(IDX_VAL, zlim = c(0,20)); title(main = IDX_ID)
temp_DF = as.vector(IDX_VAL)
Store_DF = cbind(Store_DF,temp_DF)
names(Store_DF)[i+1] <- as.vector(IDX_ID)
}
rm(Store_DF)
},
Parallel = {
result <- parallel::parApply(cl = cl, X = IDX_DF, MARGIN = 1, FUN = function(x, IMAGES){
IDX_ID <- x[["IDXname"]]
eval(parse(text = paste0("IDXfun_tmp <- function(HYPR_IMG){", x[["IDXFunc"]], "}")))
IDX_VAL <- as.vector(IDXfun_tmp(IMAGES))
names(IDX_VAL) <- IDX_ID
IDX_VAL
}, IMAGES = HYPR_IMG)
colnames(result) = IDXname
rm(result)
}
)
parallel::stopCluster(cl)
Edit: Using the foreach package
After a few comments about performance issues (maybe due to memory), I decided to illustrate how one could obtain the same result using the foreach package. A few notes:
The foreach package uses iterators. By default it can be used like a for loop, where it will iterate over each column of a data.frame.
Like other parallel implementations in R, if you are on Windows you will often have to export the data used for the calculations. This can sometimes be avoided with some fiddling, and foreach will sometimes let you get away without exporting data; when exactly this is the case is unclear from the documentation.
The output of foreach will be combined either into a list or as defined by the .combine argument, which can be rbind, cbind or any other function.
There are a lot of comments, making the code seem a lot longer than it actually is. Removing comments and blank lines, it is 9 lines longer.
Below is the code, which will yield the same output as above. Note I have used the data.table package; for more information about it I suggest their wiki on GitHub.
cl <- parallel::makeCluster( parallel::detectCores() )
#foreach uses doParallel for the parallelization
doParallel::registerDoParallel(cl)
#To iterate over the rows, we need to use iterators
# if foreach is given a matrix it will be converted to a column iterator
rowIterator <- iterators::iter(IDX_DF, by = "row")
library(foreach)
result <-
foreach(
#Supply the iterator
row = rowIterator,
#Specify if the calculations needs to be in order. If not then we can get better performance not doing so
.inorder = FALSE,
#In most foreach loops you will have to export the data you need for the calculations
# it worked without doing so for me, in which case it is faster if the exported stuff is large
#.export = c("HYPR_IMG"),
#We need to say how the output should be merged. If nothing is given it will be output as a list
#data.table rbindlist is faster than rbind (returns a data.table)
.combine = function(...)data.table::rbindlist(list(...)) ,
#otherwise we could've used:
#.combine = rbind
#if we don't use rbind or cbind (I used data.table::rbindlist for speed)
# we have to say whether it can take more than one argument
.multicombine = TRUE
) %dopar% #%dopar% runs the loop in parallel; %do% runs it sequentially; %:% nests foreach loops
{
IDX_ID <- row[["IDXname"]]
eval(parse(text = paste0("IDXfun_tmp <- function(HYPR_IMG){", row[["IDXFunc"]], "}")))
IDX_VAL <- as.vector(IDXfun_tmp(HYPR_IMG))
data.frame(ID = IDX_ID, IDX_VAL)
}
#output is saved in result
result
library(data.table) # needed for dcast and the := operator
result_reformatted <- dcast(result[, indx := 1:.N, by = ID],
                            indx ~ ID,
                            value.var = "IDX_VAL")
#if we don't want to use data.table we could use unstack instead
unstack(result, IDX_VAL ~ ID)
There are several packages in R to simplify running code in parallel, like foreach and future. Most of these have constructs which are like lapply or a for loop: they carry on until all the tasks have finished.
Is there a simple parallel version of Find? That is, I would like to run several tasks in parallel. I don't need all of them to finish, I just need to get the first one that finishes (maybe with a particular result). After that the other tasks can be killed, or left to finish on their own.
Conceptual code:
hunt_needle <- function (x, y) x %in% (y-1000):y
x <- sample.int(1000000, 1)
result <- parallel_find(seq(1000, 1000000, 1000), hunt_needle)
# should return the first value for which hunt_needle is true
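For reference, the serial equivalent using base R's Find (a sketch of the intended semantics; it stops at the first hit but uses only a single core) would be:
result <- Find(function(y) hunt_needle(x, y), seq(1000, 1000000, 1000))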
You can use shared memory so that processes can communicate with one another.
For that, you can use package bigstatsr (disclaimer: I'm the author).
Choose a block size and do:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
# Data example
cond <- logical(1e6)
cond[sample(length(cond), size = 1)] <- TRUE
ind.block <- bigstatsr:::CutBySize(length(cond), block.size = 1000)
cl <- parallel::makeCluster(nb_cores())
doParallel::registerDoParallel(cl)
# This value (in an on-disk matrix) is shared by processes
found_it <- FBM(1, 1, type = "integer", init = 0L)
library(foreach)
res <- foreach(ic = sample(rows_along(ind.block)), .combine = 'c') %dopar% {
if (found_it[1]) return(NULL)
ind <- bigstatsr:::seq2(ind.block[ic, ])
find <- which(cond[ind])
if (length(find)) {
found_it[1] <- 1L
return(ind[find[1]])
} else {
return(NULL)
}
}
parallel::stopCluster(cl)
# Verification
all.equal(res, which(cond))
Basically, when a solution is found, you no longer need to do the remaining computations, and the other processes know it because you put a 1 in found_it, which is shared between all of them.
As your question is not reproducible and I don't understand everything you need, you may have to adapt this solution a little bit.
I have written a program that generates a very large amount of multivariate normally distributed random data (25 x 30 x 10 000 000) using mvtnorm, and then does some simple calculations and manipulations on the matrices.
I am using the foreach and doParallel packages to run operations in parallel to reduce time. A completely arbitrary example, just to demonstrate the packages is:
foreach (x = matr) %dopar% {
x[time_horizon + 1] <- x[time_horizon]
x <- cbind(100,x)
for (m in 2:(time_horizon + 1)) {
# loop through each row of matrix to apply function
x[,m] <- x[,m-1] + x[,m]
}
return(x)
}
I have created an implicit cluster of cores to run these foreach functions on:
registerDoParallel(4)
The problem
When I run with multiple cores, it appears to multiply or duplicate the memory used when I monitor performance on Task Manager (i.e. 2 cores uses more memory than 1 core, 4 cores uses more memory than 2).
When I run my program for (25 x 30 x 1 000 000), running in parallel helps the speed of execution (i.e. 4 cores is faster than 1 core). However, when I run my program for (25 x 30 x 2 500 000) and above, too much memory is used and that appears to slow it down.
One friend said it could potentially be a page fault and the hard drive must be accessed when I run out of RAM.
Why is the duplication of memory across cores happening? Is it supposed to happen? Can I stop it? Are there other solutions?
Edit (Full Code):
library(mvtnorm)
library(foreach)
library(doParallel)
library(ggplot2)
library(reshape2)
library(plyr)
# Calculate the number of cores
no_cores <- detectCores()
# Create an implicit cluster and regular cluster
registerDoParallel(no_cores)
daily_pnl <- function() {
time_horizon <- 30
paths <- 2500000
asset <- 25
path_split <- 100
corr_mat <- diag(asset)
expected_returns <- runif(asset,0.0, 0.25)
# Create a list of vectors to store pnl information for each asset
foreach(icount(time_horizon), .packages = "mvtnorm") %dopar% {
average_matrix <- matrix(, (paths/path_split), asset)
split_start <- 1
my_day <- rmvnorm(paths, expected_returns, corr_mat, method="chol")
for (n in 1:(paths/path_split)) {
average_matrix[n,] <- colMeans(my_day[split_start:(split_start + path_split - 1),])
split_start <- split_start + path_split
}
return(average_matrix)
}
}
matrix_splitter <- function(matr) {
time_horizon <- 30
paths <- 2500000
path_split <- 100
asset <- 25
alply(array(unlist(matr), c(paths/path_split, time_horizon, asset)), 3)
}
cum_returns <- function(matr) {
time_horizon <- 30
paths <- 2500000
asset <- 25
foreach (x = matr) %dopar% {
x[time_horizon + 1] <- x[time_horizon]
x <- cbind(100,x)
for (m in 2:(time_horizon + 1)) {
# loop through each row of matrix to apply function
x[,m] <- x[,m-1] + x[,m]
}
return(x)
}
}
plotting <- function(path_matr) {
security_paths <- as.data.frame(t(path_matr))
security_paths$id <- 1:nrow(security_paths)
plot_paths <- melt(security_paths, id.var="id")
ggplot(plot_paths, aes(x=id, y=value,group=variable,colour=variable)) +
geom_line(aes(lty=variable))
}
system.time(daily <- daily_pnl())
system.time(daily_by_security <- matrix_splitter(daily))
rm(daily)
gc()
system.time(security_paths <- cum_returns(daily_by_security))
rm(daily_by_security)
gc()
plot_list <- foreach(x = security_paths, .packages = c("reshape2", "ggplot2")) %dopar% {
if (nrow(x) > 100) {
plotting(head(x,100))
} else {
plotting(x)
}
}
#Stop implicit cluster and regular cluster
stopImplicitCluster()
gc()
This seems to be a really old problem, and I am having a similar issue. I don't need compute parallelization; I actually need memory parallelization (if such a thing can exist).
What works for me is doAzureParallel. Instead of registering system cores, register cores from the cloud using registerDoAzureParallel(cluster).
Your JSON will define the size of the machines (memory) you hire for the job. Make sure each worker has enough memory to get a copy of your R environment. This will probably hammer your network: you will be sending data from your machine to 30-40 workers (depending on how many you have asked for).
More documentation is here:
https://github.com/Azure/doAzureParallel
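As a rough sketch of that workflow (assuming an Azure Batch account and the credential/cluster JSON files generated by the package's helper functions; check the doAzureParallel documentation for the current API):
library(doAzureParallel)
generateCredentialsConfig("credentials.json")  # fill in Azure Batch / storage keys
setCredentials("credentials.json")
generateClusterConfig("cluster.json")          # set VM size and number of nodes here
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)
# ... run the foreach(...) %dopar% { ... } loop as before ...
stopCluster(cluster)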
Can we do something with sparklyr to address such issues?
I'm trying to move from a serial to parallel approach to accomplish some multivariate time series analysis tasks on a large data.table. The table contains data for many different groups and I'm trying to move from a for loop to a foreach loop using the doParallel package to take advantage of the multicore processor installed.
The problem I am experiencing relates to memory and how the new R processes seem to consume large quantities of it. I think that what is happening is that the large data.table containing ALL data is copied into each new process, hence I run out of RAM and Windows starts swapping to disk.
I've created a simplified reproducible example which replicates my problem, but with less data and less analysis inside the loop. It would be ideal if a solution existed which could farm the data out to the worker processes only on demand, or share the memory already in use between cores. Alternatively, some kind of solution may already exist to split the big data into 4 chunks and pass these to the cores so they have a subset to work with.
A similar question has previously been posted here on Stackoverflow however I cannot make use of the bigmemory solution offered as my data contains a character field. I will look further into the iterators package, however I'd appreciate any suggestions from members with experience of this problem in practice.
rm(list=ls())
library(data.table)
num.series = 40 # can customise the size of the problem (x10 eats my RAM)
num.periods = 200 # can customise the size of the problem (x10 eats my RAM)
dt.all = data.table(
grp = rep(1:num.series,each=num.periods),
pd = rep(1:num.periods, num.series),
y = rnorm(num.series * num.periods),
x1 = rnorm(num.series * num.periods),
x2 = rnorm(num.series * num.periods)
)
dt.all[,y_lag := c(NA, head(y, -1)), by = c("grp")]
f_lm = function(dt.sub, grp) {
my.model = lm("y ~ y_lag + x1 + x2 ", data = dt.sub)
coef = summary(my.model)$coefficients
data.table(grp, variable = rownames(coef), coef)
}
library(doParallel)
registerDoParallel(4)
foreach(grp=unique(dt.all$grp), .packages="data.table", .combine="rbind") %dopar%
{
dt.sub = dt.all[grp == grp]
f_lm(dt.sub, grp)
}
detach(package:doParallel)
Iterators can help to reduce the amount of memory that needs to be passed to the workers of a parallel program. Since you're using the data.table package, it's a good idea to use iterators and combine functions that are optimized for data.table objects. For example, here is a function like isplit that works on data.table objects:
isplitDT <- function(x, colname, vals) {
colname <- as.name(colname)
ival <- iter(vals)
nextEl <- function() {
val <- nextElem(ival)
list(value=eval(bquote(x[.(colname) == .(val)])), key=val)
}
obj <- list(nextElem=nextEl)
class(obj) <- c('abstractiter', 'iter')
obj
}
Note that it isn't completely compatible with isplit, since the arguments and return value are slightly different. There may also be a better way to subset the data.table, but I think this is more efficient than using isplit.
Here is your example using isplitDT and a combine function that uses rbindlist which combines data.tables faster than rbind:
dtcomb <- function(...) {
rbindlist(list(...))
}
results <-
foreach(dt.sub=isplitDT(dt.all, 'grp', unique(dt.all$grp)),
.combine='dtcomb', .multicombine=TRUE,
.packages='data.table') %dopar% {
f_lm(dt.sub$value, dt.sub$key)
}
Update
I wrote a new iterator function called isplitDT2 which performs much better than isplitDT but requires that the data.table have a key:
isplitDT2 <- function(x, vals) {
ival <- iter(vals)
nextEl <- function() {
val <- nextElem(ival)
list(value=x[val], key=val)
}
obj <- list(nextElem=nextEl)
class(obj) <- c('abstractiter', 'iter')
obj
}
This is called as:
setkey(dt.all, grp)
results <-
foreach(dt.sub=isplitDT2(dt.all, levels(dt.all$grp)),
.combine='dtcomb', .multicombine=TRUE,
.packages='data.table') %dopar% {
f_lm(dt.sub$value, dt.sub$key)
}
This uses a binary search to subset dt.all rather than a vector scan, and so is more efficient. I don't know why isplitDT would use more memory, however. Since you're using doParallel, which doesn't call the iterator on-the-fly as it sends out tasks, you might want to experiment with splitting dt.all and then removing it to reduce your memory usage:
dt.split <- as.list(isplitDT2(dt.all, levels(dt.all$grp)))
rm(dt.all)
gc()
results <-
foreach(dt.sub=dt.split,
.combine='dtcomb', .multicombine=TRUE,
.packages='data.table') %dopar% {
f_lm(dt.sub$value, dt.sub$key)
}
This may help by reducing the amount of memory needed by the master process during the execution of the foreach loop, while still only sending the required data to the workers. If you still have memory problems, you could also try using doMPI or doRedis, both of which get iterator values as needed, rather than all at once, making them more memory efficient.
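For example, a doMPI registration is sketched below (assuming a working MPI installation and the Rmpi package; the foreach loop itself stays the same):
library(doMPI)
cl <- startMPIcluster(count = 4)
registerDoMPI(cl)
# ... run the foreach loop with isplitDT2 as above ...
closeCluster(cl)
mpi.quit()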
The answer requires the iterators package and use of isplit which is similar to split in that it breaks the main data object into chunks based on one or more factor columns. The foreach loop iterates through the chunks of data, passing only the subset out to the worker process rather than the whole table.
So the differences in the code are as follows:
library(iterators)
dt.all = data.table(
grp = factor(rep(1:num.series, each =num.periods)), # grp column is a factor
pd = rep(1:num.periods, num.series),
y = rnorm(num.series * num.periods),
x1 = rnorm(num.series * num.periods),
x2 = rnorm(num.series * num.periods)
)
results =
  foreach(dt.sub = isplit(dt.all, dt.all$grp),
          .packages = "data.table", .combine = "rbind") %dopar%
  {
    f_lm(dt.sub$value, dt.sub$key[[1]])
  }
The result of isplit is that dt.sub is now a list with two elements: key is itself a list of the values used to split, and value contains the subset as a data.table.
Credit for this solution goes to an SO answer by David and a response by Russell to my question on an excellent blog post about iterators.
------------------------------------ EDIT ------------------------------------
To test the performance of isplitDT v isplit and rbindlist v rbind the following code was used:
rm(list=ls())
library(data.table) ; library(iterators) ; library(doParallel)
num.series = 400
num.periods = 2000
dt.all = data.table(
grp = factor(rep(1:num.series,each=num.periods)),
pd = rep(1:num.periods, num.series),
y = rnorm(num.series * num.periods),
x1 = rnorm(num.series * num.periods),
x2 = rnorm(num.series * num.periods)
)
dt.all[,y_lag := c(NA, head(y, -1)), by = c("grp")]
f_lm = function(dt.sub, grp) {
my.model = lm("y ~ y_lag + x1 + x2 ", data = dt.sub)
coef = summary(my.model)$coefficients
data.table(grp, variable = rownames(coef), coef)
}
registerDoParallel(8)
isplitDT <- function(x, colname, vals) {
colname <- as.name(colname)
ival <- iter(vals)
nextEl <- function() {
val <- nextElem(ival)
list(value=eval(bquote(x[.(colname) == .(val)])), key=val)
}
obj <- list(nextElem=nextEl)
class(obj) <- c('abstractiter', 'iter')
obj
}
dtcomb <- function(...) {
rbindlist(list(...))
}
# isplit/rbind
st1 = system.time(results <- foreach(dt.sub=isplit(dt.all,dt.all$grp),
.combine="rbind",
.packages="data.table") %dopar% {
f_lm(dt.sub$value, dt.sub$key[[1]])
})
# isplit/rbindlist
st2 = system.time(results <- foreach(dt.sub=isplit(dt.all,dt.all$grp),
.combine='dtcomb', .multicombine=TRUE,
.packages="data.table") %dopar% {
f_lm(dt.sub$value, dt.sub$key[[1]])
})
# isplitDT/rbind
st3 = system.time(results <- foreach(dt.sub=isplitDT(dt.all, 'grp', unique(dt.all$grp)),
.combine='rbind',
.packages='data.table') %dopar% {
f_lm(dt.sub$value, dt.sub$key)
})
# isplitDT/rbindlist
st4 = system.time(results <- foreach(dt.sub=isplitDT(dt.all, 'grp', unique(dt.all$grp)),
.combine='dtcomb', .multicombine=TRUE,
.packages='data.table') %dopar% {
f_lm(dt.sub$value, dt.sub$key)
})
rbind(st1, st2, st3, st4)
This gives the following timings:
user.self sys.self elapsed user.child sys.child
st1 12.08 1.53 14.66 NA NA
st2 12.05 1.41 14.08 NA NA
st3 45.33 2.40 48.14 NA NA
st4 45.00 3.30 48.70 NA NA
------------------------------------ EDIT 2 ------------------------------------
Thanks to Steve's updated answer and the function isplitDT2, which makes use of the keys on the data.table, we have a clear new winner in terms of speed. Running microbenchmark to compare my original solution (in this answer) shows around 7-fold improvement from isplitDT2 with rbindlist. Memory usage has not yet been compared directly but the performance gain leads me to accept the answer at last.
Holding everything in memory is one of those (aargh, annoying) things that R programmers have to learn to deal with. It's pretty easy to imagine your code example as either memory-bound or CPU-bound, and you'll need to figure that out before trying to apply workarounds.
Assuming the memory is being consumed by your dataset (dt.all) and not during the actual model run, it is possible you might be able to release enough memory for the worker processes to parallelize:
foreach(grp=unique(dt.all$grp), .packages="data.table", .combine="rbind") %dopar%
{
dt.sub = dt.all[grp == grp]
rm(dt.all)
gc()
f_lm(dt.sub, grp)
}
However, this assumes that your working set (dt.sub) is small enough that you can fit more than one of them in memory at a time. It isn't hard to imagine a problem set too large for that. Also, and this is really annoying, all the workers are going to fire up at one time and kill your machine anyway, so you might need to make them pause for a couple seconds to allow other children to load up and release memory.
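One rough way to do that (a sketch, not tested on your data; the two-second delay is arbitrary) is to stagger only the first wave of tasks by sleeping based on the task index:
n.workers <- 4
grps <- unique(dt.all$grp)
foreach(i = seq_along(grps), .packages = "data.table", .combine = "rbind") %dopar% {
  # stagger the first wave so the workers do not all copy data at the same moment
  if (i <= n.workers) Sys.sleep(2 * (i - 1))
  dt.sub <- dt.all[grp == grps[i]]
  f_lm(dt.sub, grps[i])
}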
Though desperately stupid and brute-force, I have handled this exact problem by writing the subsets out to disk as individual data files, and then used a batch script to run my computations in parallel.
I have the following for loop in R:
v = c(1,2,3,4)
s = create.some.complex.object()
for (i in v){
print(i)
s = some.complex.function.that.updates.s(s)
}
# s here has the right content.
Needless to say, this loop is horribly slow in R.
I tried to write it in functional style:
lapply(v, function(i){
print(i)
s = some.complex.function.that.updates.s(s)
})
# s wasn't updated.
But this doesn't work, because s is passed by value and not by reference.
I only need the result of the last iteration, not all of the intermediate steps.
How do I formulate the first loop in R-style?
lapply(v, function(i){
print(i)
s = some.complex.function.that.updates.s(s)
return(s)
})
The result will be a list of the s objects created for each value of v. (Even without the explicit return(s), the function would have returned the value of s anyway, since the assignment was the last operation it performed.)
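If only the final s matters, a functional alternative is Reduce, which threads the state through the iterations without keeping the intermediate objects (a sketch using the placeholder functions from the question):
s <- Reduce(function(s, i) {
  print(i)
  some.complex.function.that.updates.s(s)
}, v, init = create.some.complex.object())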
If you can't afford to create the object many times, then there are not a lot of options. It is also hard to say without seeing the object you are operating on. If the object is growing/appending, you could collect the intermediate results and do the appending at the end. If it is actually mutating, you should try to get away from pass-by-value and use reference classes (http://www.inside-r.org/r-doc/methods/ReferenceClasses). Then the function that modifies it becomes a method that you simply call n times.
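A minimal reference-class sketch of that pattern (the class, field and method names here are invented for illustration):
ComplexState <- setRefClass("ComplexState",
  fields = list(s = "numeric"),
  methods = list(
    update = function() {
      s <<- s + 1   # modify the object's own field in place
    }
  )
)
obj <- ComplexState$new(s = 0)
for (i in 1:4) obj$update()
obj$s   # 4 -- the same object was mutated throughout, no copies returned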
Is the loop itself really the problem? Or is it rather the time that each execution of some.complex.function.that.updates.s needs?
Some R programmers will jump through hoops to avoid loops, but have a look at this example:
f <- function(a) a/1.001
loop <- function(n) { s = (1/f(1)^n); for (i in 1:n) s <- f(s); s}
system.time(loop(1E7))
user system elapsed
7.011 0.030 7.008
That is about 0.7 microseconds (on a MacBook Pro) per call of a very trivial function in a loop.
v = c(1,2,3,4)
s = create.some.complex.object()
lapply(v, function(i){
print(i)
s <<- some.complex.function.that.updates.s(s)
}) |> invisible()
Use of the <<- operator can sometimes get you into trouble and is (somewhat) discouraged, but when I want to mimic a for loop with side-effects this is a pattern I have found useful.
v = c(1,2,3,4)
s = create.some.complex.object()
lapply(v, function(i){
print(i)
assign('s', some.complex.function.that.updates.s(s), envir = .GlobalEnv)
}) |> invisible()
Using assign allows you to avoid the <<- operator, although <<- is significantly faster than invoking the assign function. For performance reasons, in more intensive applications it can be well worth replacing a sequential for loop with an lapply-based (or properly vectorized) alternative, as the median execution time of lapply can be several orders of magnitude faster. Here are some toy benchmarks to support this assertion:
v <- c(1, 2, 3, 4)
microbenchmark::microbenchmark({
s <- 1
lapply(v, function(i) {
s <<- s + i
})
}, times = 1e4, unit = 'microseconds')
Median: ~ 4 microseconds
v <- c(1, 2, 3, 4)
microbenchmark::microbenchmark({
s <- 1
for(i in v) {
s <- s + i
}
}, times = 1e4, unit = 'microseconds')
Median: ~ 1488 microseconds