I am currently running a simulation using a for loop in R, but want to switch over to a foreach loop since it is faster. I use set.seed() in the for loop, and would like to use this again with foreach so I can obtain identical results.
For example, suppose I have
x <- c()
for (i in 1:10) {
  set.seed(i)
  x[i] <- rnorm(1)
}
How can I do this same thing using foreach? I don't think this works:
x <- foreach(i = 1:10, ...) %dopar% {set.seed(i) ... }
This works:
library(foreach)
# Seed inside the function so each iteration is reproducible on its own
fn <- function(i) {
  set.seed(i)
  rnorm(1)
}
# %do% returns a list by default
x <- foreach(i = 1:10) %do% fn(i)
print(x)
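To check that seeding inside the loop body reproduces the for loop exactly, here is a minimal sketch. It assumes a sequential backend registered with registerDoSEQ() so that it runs without a cluster, and the names x_for and x_foreach are only illustrative. With a real parallel backend the same per-iteration seeding should give the same values, provided the workers use the same RNG settings as the main session.

```r
library(foreach)

# Assumption: sequential backend, so %dopar% runs without a cluster
registerDoSEQ()

# Reference result from the plain for loop
x_for <- numeric(10)
for (i in 1:10) {
  set.seed(i)
  x_for[i] <- rnorm(1)
}

# Same computation with foreach; .combine = "c" returns a numeric vector
x_foreach <- foreach(i = 1:10, .combine = "c") %dopar% {
  set.seed(i)
  rnorm(1)
}

# Compare the two results element by element
print(identical(x_for, x_foreach))
```

Using .combine = "c" is a design choice here: without it, foreach returns a list, which would need unlist() before comparing against the vector from the for loop.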
The following is a parallel loop I am trying to run in R:
cl <- makeCluster(30,type="SOCK")
registerDoSNOW(cl)
results <- foreach(i = 1:30, .combine = 'bindlist', .multicombine = TRUE) %dopar% {
  test <- i
  test <- as.list(test)
  list(test)
}
stopCluster(cl)
The output of my code is always a list and I want to combine the list into one large list. Thus I wrote the following .combine function:
bindlist <- function(x,y,...){
append(list(x),list(y),list(...))
}
As I am doing multiple runs and the number of variables changes, I tried to use the ... (dots) argument. However, it does not work. How can I rewrite the .combine function so that it works with a changing number of variables?
Have you considered using 'c'?
results <- foreach(i = 1:4, .combine = 'c', .multicombine = TRUE) %dopar% {
  test <- i
  test <- as.list(test)
  list(test)
}
If this adds an additional unwanted 'level' to your results, you could use 'unlist' to remove that level.
unlist(results, recursive = FALSE)
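If a custom combine function is still wanted, one way to handle a changing number of results is to accept only ... and pass everything on to c. The sketch below is an assumption about what the asker needs, not their original function: this bindlist is a hypothetical rewrite, and the loop runs sequentially with %do% so that no cluster is required.

```r
library(foreach)

# Hypothetical rewrite of bindlist: takes any number of results
# and concatenates them into one list
bindlist <- function(...) {
  do.call(c, list(...))
}

results <- foreach(i = 1:4, .combine = bindlist, .multicombine = TRUE) %do% {
  test <- as.list(i)
  list(test)
}

# One element per iteration, each holding that iteration's list
print(length(results))
```

Because each iteration wraps its value in list(), c joins the results into one list with one element per iteration instead of flattening them.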
I'd like to know whether the cpv function within the trotter package works with %dopar%? I'm getting the following error:
task 1 failed - "object of type 'S4' is not subsettable"
Here's a small example:
library(doParallel)
library(trotter)
registerDoParallel(cores = 2)
x <- letters
combos <- cpv(2, 1:4)
print(combos)
num_combos <- length(combos)
results_list <- foreach(combo_num = 1:num_combos) %dopar% { # many iterations
  y <- x[combos[combo_num]]
  # time consuming stuff follows that involves using y
}
Replacing %dopar% with %do% (or simply using a for loop) makes it work fine.
Depending on the cluster type, the packages used inside the loop must be specified explicitly via the .packages argument so that they are loaded on each worker. The following should work:
library(doParallel)
library(trotter)
cl <- makePSOCKcluster(2)
registerDoParallel(cl=cl)
x <- letters
combos <- cpv(2, 1:4)
num_combos <- length(combos)
rl <- foreach(combo_num = 1:num_combos, .packages = "trotter") %dopar% {
  x[combos[combo_num]]
}
I have the following code, where I get an error like:
object 'a' could not be found
This file is inside a package.
It works when I use %do% instead of %dopar%.
a <- 2
fun1 <- function(x) {
  y <- x * a
  return(y)
}
fun2 <- function(n) {
  foreach(data = 1:n, .combine = rbind, .multicombine = TRUE, .export = c("a", "fun1")) %dopar% {
    load_all() # Works, get error if line is removed
    y <- fun1(data)
    return(y)
  }
}
In my main file I load the package with devtools::load_all() and use doParallel to run foreach on multiple cores. The error occurs when I execute fun2(5).
If I use a directly in the foreach body, everything works. But when I call a function that uses the variable a, I get the error.
My main issue is that I want to be able to call fun1 both from fun2 and on its own from the package.
Cluster is created as
cl <- makeCluster(16)
registerDoParallel(cl)
clusterCall(cl, function(x) .libPaths(x), .libPaths())
# %dopar% code
stopCluster(cl)
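One possible workaround, offered as a sketch rather than a diagnosis of the package setup: pass the value to fun1 as an argument instead of having fun1 look it up, so that the loop body only depends on variables local to fun2. The parameter name a_value is hypothetical, and the sketch assumes a sequential backend (registerDoSEQ()) so it runs without a cluster. Whether this removes the need for load_all() on the workers in the original package has not been verified here.

```r
library(foreach)

# Assumption: sequential backend so the sketch runs without a cluster
registerDoSEQ()

a <- 2

# fun1 receives the multiplier explicitly (a_value is a hypothetical name),
# so it can still be called on its own: fun1(3, a)
fun1 <- function(x, a_value) {
  x * a_value
}

fun2 <- function(n, a_value = a) {
  # a_value is local to fun2 and is referenced in the loop body
  foreach(data = 1:n, .combine = rbind, .export = "fun1") %dopar% {
    fun1(data, a_value)
  }
}

print(fun2(5))
```

The trade-off is a slightly longer signature for fun1 in exchange for a function that no longer depends on where the variable a happens to live.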
This works normally on my computer:
registerDoSNOW(makeCluster(2, type = "SOCK"))
foreach(i = 1:M,.combine = "c") %dopar% {
sum(rnorm(M))
}
So I can say that I can run parallelized code on this computer, right?
Ok. I have a piece of code that I wish to run in parallel with foreach. It runs perfectly when it's written with %do%, but doesn't work properly when I change it to %dopar%. (PS: I have already initialized the cluster with registerDoSNOW(makeCluster(2, type = "SOCK")) in the same way as before.)
My main interest in the code is getting the vector u.varpred. I get it nicely with %do%, but when I run it with %dopar%, the vector comes back as NULL.
Here is the loop, together with the code needed to run it properly. It uses functions from the geoR package.
#you can pretty much ignore all this, it's just preparation for the loop
N=20
NN=10
set.seed(111);
datap <- grf(N, cov.pars=c(20, 5),nug=1)
grid.o <- expand.grid(seq(0, 1, l=100), seq(0, 1, l=100))
grid.c <- expand.grid(seq(0, 1, l=NN), seq(0,1, l=NN))
beta1=mean(datap$data)
emv<- likfit(datap, ini=c(10,0.4), nug=1)
krieging <- krige.conv(datap, loc=grid.o,
krige=krige.control(type.krige="SK", trend.d="cte",
beta =beta1, cov.pars=emv$cov.pars))
names(grid.c) = names(as.data.frame(datap$coords))
list.geodatas<-list()
valores<-c(datap$data,0)
list.dataframes<-list()
list.krigings<-list(); i=0; u.varpred=NULL;
#here is the foreach code
t<-proc.time()
foreach(i = 1:length(grid.c[,1]), .packages = 'geoR') %do% {
  list.dataframes[[i]] <- rbind(datap$coords, grid.c[i,])
  list.geodatas[[i]] <- as.geodata(data.frame(cbind(list.dataframes[[i]], valores)))
  list.krigings[[i]] <- krige.conv(list.geodatas[[i]], loc = grid.o,
                                   krige = krige.control(type.krige = "SK", trend.d = "cte",
                                                         beta = beta1, cov.pars = emv$cov.pars))
  u.varpred[i] <- mean(krieging$krige.var - list.krigings[[i]]$krige.var)
  # I don't need these objects anymore, but since they are lists I don't
  # want to assign NULL, as that would ruin their ordering
  list.dataframes[[i]] <- 0
  list.krigings[[i]] <- 0
  list.geodatas[[i]] <- 0
}
t<-proc.time()-t
t
You can check that this runs nicely (provided you have the following packages: geoR, foreach and doSNOW). But once I use registerDoSNOW(......) and %dopar%, u.varpred comes back as NULL.
Could you please check whether I made a mistake in the foreach statement, or whether this code simply can't be run in parallel? (I thought it could, because no iteration depends on any of the iterations before it.)
I am sorry that both the code and this question are so long. Thanks in advance for taking the time to read it.
A friend helped me with this directly. Here is a version that works:
u.varpred <- foreach(i = 1:length(grid.c[,1]), .packages = 'geoR', .combine = "c") %dopar% {
  list.dataframes[[i]] <- rbind(datap$coords, grid.c[i,])
  list.geodatas[[i]] <- as.geodata(data.frame(cbind(list.dataframes[[i]], valores)))
  list.krigings[[i]] <- krige.conv(list.geodatas[[i]], loc = grid.o,
                                   krige = krige.control(type.krige = "SK", trend.d = "cte",
                                                         beta = beta1, cov.pars = emv$cov.pars))
  u.varpred <- mean(krieging$krige.var - list.krigings[[i]]$krige.var)
  list.dataframes[[i]] <- 0
  list.krigings[[i]] <- 0
  list.geodatas[[i]] <- 0
  u.varpred # this makes the results go into u.varpred
}
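He gave me an example of why this works: assignments made inside the loop body are not visible outside it, so the result has to come from the value the loop returns.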
a <- NULL
foreach(i = 1:10) %dopar% {
  a <- 5
}
print(a)
# a is still NULL
a <- NULL
a <- foreach(i = 1:10) %dopar% {
  a <- 5
  a
}
print(a)
# now it works
Hope this helps someone.
I've looked through much of the documentation and done a fair amount of Googling, but can't find an answer to the following question: Is there a way to induce 'next-like' functionality in a parallel foreach loop using the foreach package?
Specifically, I'd like to do something like (this doesn't work with next but does without):
foreach(i = 1:10, .combine = "c") %dopar% {
  n <- i + floor(runif(1, 0, 9))
  if (n %% 3) {next}
  n
}
I realize I can nest my brackets, but if I want to have a few next conditions over a long loop this very quickly becomes a syntax nightmare.
Is there an easy workaround here (either next-like functionality or a different way of approaching the problem)?
You could put your code in a function and call return. It's not clear from your example what you want it to do when n %% 3 is nonzero, so I'll return NA.
funi <- function(i) {
  n <- i + floor(runif(1, 0, 9))
  if (n %% 3) return(NA)
  n
}
foreach(i = 1:10, .combine = "c") %dopar% { funi(i) }
Although it seems strange, you can use a return in the body of a foreach loop, without the need for an auxiliary function like the one demonstrated by @Aaron:
r <- foreach(i = 1:10, .combine = 'c') %dopar% {
  n <- i + floor(runif(1, 0, 9))
  if (n %% 3) return(NULL)
  n
}
NULL is returned in this example because the c function drops it when combining the results, which makes it a convenient way to filter.
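To make the filtering effect easy to see, here is a small sketch with a deterministic condition in place of the random one. It assumes a sequential backend registered with registerDoSEQ() so that it runs without a cluster; the choice of keeping multiples of 3 is only for illustration.

```r
library(foreach)

# Assumption: sequential backend so the sketch runs without a cluster
registerDoSEQ()

r <- foreach(i = 1:10, .combine = "c") %dopar% {
  # Skip values that are not multiples of 3, similar to 'next'
  if (i %% 3) return(NULL)
  i
}

# Only the iterations that were not skipped remain
print(r)
```

Returning NULL rather than NA means the skipped iterations leave no placeholder in the combined vector, so the result can be shorter than the number of iterations.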
Also, although it doesn't work well for your example, the when function can take the place of next at times, and is useful for preventing the computation from taking place at all:
r <- foreach(i = 1:5, .combine = 'c') %:%
  foreach(j = 1:5, .combine = 'c') %:%
    when(i != j) %dopar% {
      10 * i + j
    }
The inner expression is only evaluated 20 times, not 25. This is particularly useful with nested foreach loops, since when has access to all of the upstream iterator values.
Update
If you want to filter out NULLs when returning the results in a list, you need to write your own combine function. Here's a complete example that demonstrates a combine function that works like the default combine function but includes a filtering mechanism:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
# Like the default list combine, but drops NULL results
filteredlist <- function(a, ...) {
  values <- list(...)
  c(a, values[! sapply(values, is.null)])
}
r <- foreach(i = 1:200, .combine = 'filteredlist', .init = list(),
             .multicombine = TRUE) %dopar% {
  # filter out odd values of i
  if (i %% 2) return(NULL)
  i
}
stopCluster(cl)
Note that this code works correctly when there are more than 100 task results (100 is the default value of the .maxcombine option).
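The same combine function can be checked without a cluster. The sketch below assumes a sequential backend (registerDoSEQ()) in place of doSNOW and swaps sapply for vapply, which is a stylistic choice rather than part of the original answer. With 200 tasks the combine function is called more than once, so the check also exercises the accumulation across batches.

```r
library(foreach)

# Assumption: sequential backend instead of the doSNOW cluster
registerDoSEQ()

# Keeps the accumulated list 'a' and appends only non-NULL results
filteredlist <- function(a, ...) {
  values <- list(...)
  c(a, values[!vapply(values, is.null, logical(1))])
}

r <- foreach(i = 1:200, .combine = filteredlist, .init = list(),
             .multicombine = TRUE) %dopar% {
  # filter out odd values of i
  if (i %% 2) return(NULL)
  i
}

# Number of results that survived the filter
print(length(r))
```

Starting from .init = list() matters here: it guarantees that the first argument of filteredlist is always the accumulated list, so only the new results in ... need to be filtered.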