Output two objects using foreach - r

I would like to know whether it is possible to output two different objects from a foreach %dopar% loop.
I will try to explain what I am looking for. Let's suppose I have two data.frames as a result of several operations inside the loop:
library(doMC)
library(parallel)
registerDoMC(cores=4)
result <- foreach(i=1:100) %dopar% {
#### some code here
#### some code here
vec1 <- result_from_previous code # It would be the 1st object I'd like to output
vec2 <- result_from_previous code # It would be the 2nd object I'd like to output
}
My desired output would be a list of data.frames of length 2, such as:
dim(result[[1]]) # equals to nrow=length(vec1) and ncol=100
dim(result[[2]]) # equals to nrow=length(vec2) and ncol=100
I have tried with this from a previous post Saving multiple outputs of foreach dopar loop:
comb <- function(x, ...) {
lapply(seq_along(x), function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}
result <- foreach(i=1:100, .comb='comb', .multicombine=TRUE) %dopar% {
#### some code here
#### some code here
vec1 <- result_from_previous code
vec2 <- result_from_previous code
list(vec1, vec2)
}
But it doesn't give the expected result.
When I do the following:
result <- foreach(i=1:100, .comb=cbind) %dopar% {
#### some code here
#### some code here
vec1 <- result_from_previous code
vec2 <- result_from_previous code
}
I obtain only the data.frame of vec2. Is there any way of returning or saving both outputs?
Thanks

If you need to return two objects from the body of the foreach loop, you must bundle them into one object somehow or other, and a list is the most general way to do that. The trick is to provide an appropriate combine function to achieve the desired final result. If you want to combine all of the vec1 objects with cbind, and also all of the vec2 objects with cbind, the mapply function is quite handy. I think this is what you want:
comb <- function(...) {
mapply('cbind', ..., SIMPLIFY=FALSE)
}
Here's a little test program for this combine function:
result <- foreach(i=1:100, .combine='comb', .multicombine=TRUE) %dopar% {
vec1 <- rep(i, 10)
vec2 <- rep(2*i, 10)
list(vec1, vec2)
}
This will return a list containing two 10 x 100 matrices. The same combine function can also be used if vec1 and vec2 are data frames.
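To see what this combine function does without registering a parallel backend, a minimal sketch can call it directly on a few hand-built results. This only approximates how foreach invokes the function when .multicombine=TRUE; the name `results` and the choice of five iterations are assumptions made for illustration.

```r
# Combine function from the answer: cbind the i-th elements across all results.
comb <- function(...) {
  mapply('cbind', ..., SIMPLIFY = FALSE)
}

# Stand-in for what five loop iterations would each return (made-up values).
results <- lapply(1:5, function(i) list(rep(i, 10), rep(2 * i, 10)))

# With .multicombine=TRUE, foreach hands many results to the combine
# function in one call; do.call imitates that here.
combined <- do.call(comb, results)

dim(combined[[1]])  # 10 rows, one column per iteration
dim(combined[[2]])
```

Each element of `combined` collects one of the two per-iteration objects, with one column per iteration.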

Related

Apply a function to objects in my global environment R

This code chunk creates 10 objects, one for each element of alpha.
alpha <- seq(.1,1,by=.1)
for (i in 1:length(alpha)){
assign(paste0("list_ts_ses_tune", i),NULL)
}
How do I store the result of each function call in the new list_ts_ses_tune1 ... NULL objects I've created? Each call returns a list, and it works if I set list_ts_ses_tune1 <- lapply ...
for (i in 1:length(alpha))
{
list_ts_ses_tune[i] <- lapply(list_ts, function(x)
forecast::forecast(ses(x,h=24,alpha=alpha[i])))
list_ts_ses_tune[i] <- lapply(list_ts_ses_tune[i], "[", c("mean"))
}
Maybe this is a better way to do this? I need each individual output in a list of values.
Edit:
for (i in 1:length(alpha))
{
list_ts_ses_tune[[i]] <- lapply(list_ts[1:(length(list_ts)/2)],
function(x)
forecast::forecast(ses(x,h=24,alpha=alpha[i])))
list_ts_ses_tune[[i]] <- lapply(list_ts_ses_tune[[i]], "[", c("mean"))
}
We can use mget to collect all of the objects into a list:
mget(ls(pattern = '^list_ts_ses_tune\\d+'))
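As a small illustration of the mget approach, the sketch below creates three numbered objects in a private environment and gathers them into one named list. The environment `env` and the stored values are assumptions chosen so the example does not touch the global environment.

```r
# Use a private environment so the sketch leaves the global one alone.
env <- new.env()
for (i in 1:3) {
  assign(paste0("list_ts_ses_tune", i), i * 10, envir = env)
}

# Find the matching object names, then fetch them all into one named list.
object_names <- ls(envir = env, pattern = "^list_ts_ses_tune\\d+$")
collected <- mget(object_names, envir = env)

names(collected)
```

Note that ls() returns names in alphabetical order, so with ten objects list_ts_ses_tune10 would come before list_ts_ses_tune2.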
Also, rather than creating 10 NULL objects in the global environment, it is easier to create a single list of NULLs:
list_ts_ses_tune <- vector('list', length(alpha))
Now, we can just use the OP's code
for (i in 1:length(alpha))
{
list_ts_ses_tune[[i]] <- lapply(list_ts, function(x)
forecast::forecast(ses(x,h=24,alpha=alpha[i])))
}
If we want to create a single data.frame
for(i in seq_along(alpha)) {
list_ts_ses_tune[[i]] <- data.frame(Mean = do.call(rbind, lapply(list_ts, function(x)
forecast::forecast(ses(x,h=24,alpha=alpha[i]))$mean)))
}
You could simply accomplish everything by doing:
library(forecast)
list_ts_ses_tune <- Map(function(x)
lapply(alpha, function(y)forecast(ses(x,h=24,alpha=y))['mean']), list_ts)

Using foreach instead of for loop

I am trying to learn foreach to parallelise my task
My for-loop looks like this:
# create an empty matrix to store results
mat <- matrix(-9999, nrow = length(unique(dat$mun)), ncol = 2)
for(mun in unique(dat$mun)) {
dat <- read.csv(paste0("data",mun,".csv"))
tot.dat <- sum(dat$x)
mat[mat[,1]== mun,2] <- tot.dat
}
unique(dat$mun) has a length of 5563.
I want to use foreach to parallelise my task.
library(foreach)
library(doParallel)
# number of iterations
iters <- 5563
foreach(icount(iters)) %dopar% {
mun <- unique(dat$mun)[mun] # this is where I cannot figure out how to assign mun so that it reads the data for mun
dat <- read.csv(paste0("data",mun,".csv"))
tot.dat <- sum(dat$x)
mat[mat[,1]== mun,2] <- tot.dat
}
This could be one solution.
Do note that I'm using Windows here, and I called registerDoParallel() for it to work.
library(foreach)
library(doParallel)
# number of iterations
iters <- 5563
registerDoParallel()
mun <- unique(dat$mun)
tableList <- foreach(i=1:iters) %dopar% {
dat <- read.csv(paste0("data",mun[i],".csv"))
tot.dat <- sum(dat$x)
}
unlist(tableList)
Essentially, the value of the last expression inside {...} is stored as one element of a list.
In this case, the result (tot.dat, which is a single number) is collected in tableList, and calling unlist() converts it to a vector for further use.
The result inside {...} can be anything: a single number, a vector, a data frame, or any other object.
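The read-and-sum pattern can be tried end to end with a few temporary files. In this sketch lapply stands in for foreach ... %dopar% so that it runs without a registered backend; the loop body is the same. The file contents, the temporary directory, and the three municipality codes are made up for illustration.

```r
# Write three small CSV files to a temporary directory (made-up data).
tmp <- tempfile("mun_demo")
dir.create(tmp)
mun <- c(11, 12, 13)
for (m in mun) {
  write.csv(data.frame(x = c(m, 1)),
            file.path(tmp, paste0("data", m, ".csv")),
            row.names = FALSE)
}

# One list element per iteration: the value of the last expression in the body.
tableList <- lapply(seq_along(mun), function(i) {
  dat <- read.csv(file.path(tmp, paste0("data", mun[i], ".csv")))
  sum(dat$x)
})

# Flatten the list and pair each total with its municipality code.
totals <- unlist(tableList)
mat <- cbind(mun = mun, total = totals)
```

Building `mat` after the loop, instead of writing into it from inside the loop body, avoids relying on side effects that parallel workers cannot propagate back.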
Another approach for your problem would be to combine all existing data together, labelling it with its appropriate source file, so the middle component will look something like
library(plyr)
tableAll <- foreach(i=1:iters) %dopar% {
dat <- read.csv(paste0("data",mun[i],".csv"))
dat$source <- mun[i]
dat # return the labelled data frame; the assignment alone would return only mun[i]
}
rbind.fill(tableAll)
Then we can use it for further analysis.

Putting user-defined functions on a list in for loop

I have problems storing user-defined functions in an R list when they are added to it in a for loop.
I have to define some segment-specific functions based on some parameters, so I create functions and put them on a list, looping through segments with a for loop. The problem is that I get the same function everywhere in the resulting list.
The code looks like this:
n <- 100
segmenty <- 1:n
segment_functions <- list()
for (i in segmenty){
segment_functions[[i]] <- function(){return(i)}
}
When I run the code, what I get is the same function (the last one created in the loop) for all indexes:
## for all k
segment_functions[[k]]()
[1] 100
There is no problem when I put the functions on list manually e.g.
segment_functions[[1]] <- function(){return(1)}
segment_functions[[2]] <- function(){return(2)}
segment_functions[[3]] <- function(){return(3)}
works just fine.
I honestly have no idea what's wrong. Could you help?
You need to use the force function to ensure that the evaluation of i is done during the assignment into the list:
n <- 100
segmenty <- 1:n
segment_functions <- list()
f <- function(i) { force(i); function() return(i) }
for (i in segmenty){
segment_functions[[i]] <- f(i)
}
I'd use lapply and capture i in a closure of the wrapper:
segment_functions <- lapply(1:100, function(i) function() i)
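The difference between the two behaviours can be checked side by side. This is a minimal sketch with n reduced to 3; the names `loop_functions`, `make_segment_function`, and `factory_functions` are introduced here for illustration only.

```r
n <- 3

# Closures created directly in a for loop all look up the same variable `i`,
# so once the loop has finished they all see its final value.
loop_functions <- list()
for (i in 1:n) {
  loop_functions[[i]] <- function() i
}
loop_results <- sapply(loop_functions, function(f) f())

# A factory gives each closure its own environment; force() evaluates the
# argument immediately instead of leaving it as a lazy promise.
make_segment_function <- function(i) {
  force(i)
  function() i
}
factory_functions <- list()
for (i in 1:n) {
  factory_functions[[i]] <- make_segment_function(i)
}
factory_results <- sapply(factory_functions, function(f) f())
```

Here `loop_results` repeats the last index for every function, while `factory_results` holds one distinct value per function.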

Saving multiple outputs of foreach dopar loop

I would like to know if/how it would be possible to return multiple outputs as part of foreach dopar loop.
Let's take a very simplistic example. Let's suppose I would like to do 2 operations as part of the foreach loop, and would like to return or save the results of both operations for each value of i.
For only one output to return, it would be as simple as:
library(foreach)
library(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)
oper1 <- foreach(i=1:100000) %dopar% {
i+2
}
oper1 would be a list with 100000 elements, each element is the result of the operation i+2 for each value of i.
Suppose now I would like to return or save the results of two different operations separately, e.g. i+2 and i+3. I tried the following:
oper1 = list()
oper2 <- foreach(i=1:100000) %dopar% {
oper1[[i]] = i+2
return(i+3)
}
hoping that the results of i+2 will be saved in the list oper1, and that the results of the second operation i+3 will be returned by foreach. However, nothing gets populated in the list oper1! In this case, only the result of i+3 gets returned from the loop.
Is there any way of returning or saving both outputs in two separate lists?
Don't try to use side effects with foreach or any other parallel programming package: each worker modifies its own copy of oper1, so the assignment never reaches the calling session. Instead, return all of the values from the body of the foreach loop in a list. If you want your final result to be a list of two lists rather than a list of 100,000 lists, then specify a combine function that transposes the results:
comb <- function(x, ...) {
lapply(seq_along(x),
function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}
oper <- foreach(i=1:10, .combine='comb', .multicombine=TRUE,
.init=list(list(), list())) %dopar% {
list(i+2, i+3)
}
oper1 <- oper[[1]]
oper2 <- oper[[2]]
Note that this combine function requires the use of the .init argument to set the value of x for the first invocation of the combine function.
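The role of .init can be seen by calling the combine function by hand. The sketch below approximates a single combine call as foreach might make it with .multicombine=TRUE: the .init value arrives as x and the per-iteration results follow as the remaining arguments. The names `results` and `init` are assumptions for illustration.

```r
comb <- function(x, ...) {
  lapply(seq_along(x),
         function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# Stand-in for the per-iteration return values list(i+2, i+3).
results <- lapply(1:10, function(i) list(i + 2, i + 3))

# The .init value becomes x on the first call; the results follow as `...`.
init <- list(list(), list())
oper <- do.call(comb, c(list(init), results))

oper1 <- oper[[1]]  # all of the i+2 values
oper2 <- oper[[2]]  # all of the i+3 values
```

Because seq_along(x) drives the outer lapply, the two empty lists in `init` are what fix the output at two elements.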
I prefer to use a class to hold multiple results for a %dopar% loop.
This example spins up 3 cores, calculates multiple results on each core, then returns the list of results to the calling thread.
Tested under RStudio, Windows 10, and R v3.3.2.
library(foreach)
library(doParallel)
# Create class which holds multiple results for each loop iteration.
# Each loop iteration populates two properties: $result1 and $result2.
# For a great tutorial on S3 classes, see:
# http://www.cyclismo.org/tutorial/R/s3Classes.html#creating-an-s3-class
multiResultClass <- function(result1=NULL,result2=NULL)
{
me <- list(
result1 = result1,
result2 = result2
)
## Set the name for the class
class(me) <- append(class(me),"multiResultClass")
return(me)
}
cl <- makeCluster(3)
registerDoParallel(cl)
oper <- foreach(i=1:10) %dopar% {
result <- multiResultClass()
result$result1 <- i+1
result$result2 <- i+2
return(result)
}
stopCluster(cl)
# oper[[1]] holds the results of the first iteration only.
oper1 <- oper[[1]]$result1
oper2 <- oper[[1]]$result2
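Indexing with oper[[1]] reaches only the first iteration's object. To gather one field across every iteration, something like the following sketch could be used; lapply stands in for the %dopar% loop and a plain named list stands in for the class constructor, both assumptions made so the example runs without a cluster.

```r
# Plain-list stand-in for the objects returned by each iteration.
oper <- lapply(1:10, function(i) list(result1 = i + 1, result2 = i + 2))

# Pull the same field out of every iteration's result.
all_result1 <- sapply(oper, function(r) r$result1)
all_result2 <- sapply(oper, function(r) r$result2)
```

sapply simplifies to a vector here because each field is a single number; for larger objects such as plots, lapply would keep them in a list.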
This toy example shows how to return multiple results from a %dopar% loop.
This example:
Spins up 3 cores.
Renders a graph on each core.
Returns the graph and an attached message.
Prints each graph and its attached message.
I found this really useful to speed up using Rmarkdown to print 1,800 graphs into a PDF document.
Tested under Windows 10, RStudio, and R v3.3.2.
R code:
# Demo of returning multiple results from a %dopar% loop.
library(foreach)
library(doParallel)
library(ggplot2)
cl <- makeCluster(3)
registerDoParallel(cl)
# Create class which holds multiple results for each loop iteration.
# Each loop iteration populates two properties: $resultPlot and $resultMessage.
# For a great tutorial on S3 classes, see:
# http://www.cyclismo.org/tutorial/R/s3Classes.html#creating-an-s3-class
plotAndMessage <- function(resultPlot=NULL,resultMessage="?")
{
me <- list(
resultPlot = resultPlot,
resultMessage = resultMessage
)
# Set the name for the class
class(me) <- append(class(me),"plotAndMessage")
return(me)
}
oper <- foreach(i=1:5, .packages=c("ggplot2")) %dopar% {
x <- c(i:(i+2))
y <- c(i:(i+2))
df <- data.frame(x,y)
p <- ggplot(df, aes(x,y))
p <- p + geom_point()
message <- paste("Hello, world! i=",i,"\n",sep="")
result <- plotAndMessage()
result$resultPlot <- p
result$resultMessage <- message
return(result)
}
# Print resultant plots and messages. Despite running on multiple cores,
# 'foreach' guarantees that the plots arrive back in the original order.
foreach(i=1:5) %do% {
# Print message attached to plot.
cat(oper[[i]]$resultMessage)
# Print plot.
print(oper[[i]]$resultPlot)
}
stopCluster(cl)

*apply in r to repeat a function

I've written a function that is a simulation, that outputs a vector of 100 elements, and I want to use the *apply functions to run the function many times and store the repeated output in a new vector for each time the simulation is run.
The function looks like:
J <- c(1:100)
species_richness <- function(J){
a <- table(J)
return(NROW(a))
}
simulation <- function(J,gens,ploton=FALSE,v=0.1){
species_richness_output <- rep(NA,gens)
for(rep in 1:gens){
index1 <- sample(1:length(J),1)
if(runif(1,0,1) < v){
J[index1] <- (rep+100)
}
else{
index2 <- sample(1:length(J),1)
while(index1==index2) {
index2 <- sample(1:length(J),1)
}
J[index1] <- J[index2]
}
species_richness_output[rep] <- species_richness(J)
}
species_abundance <- function(J){
a <- table(J)
return(a)
}
abuntable <- species_abundance(J)
print(abuntable)
octaves <- function(abuntable){
oct <- (rep(0,log2(sum(abuntable))))
for(i in 1:length(abuntable)){
oct2 <- floor(log2(abuntable[i])+1)
oct[oct2] <- oct[oct2]+1
}
print(oct)
}
# octaves(c(100,64,63,5,4,3,2,2,1,1,1,1))
if(ploton==TRUE){
hist(octaves(abuntable))
}
print(species_richness(J))
return(J)
}
simulation(J, 10000,TRUE,v=0.1)
So that's my function. It takes J, a vector I defined earlier, manipulates it, then returns:
the newly simulated vector J of 100 elements
a function called octave that categorises the new vector
a histogram corresponding to the above "octave"
I have tried a number of variations: using lapply, mapply
putting args=args_from_original_simulation
simulation_repeated <- c(mapply(list, FUN=simulation(args),times=10000))
but I keep getting an error with the match.fun part of the mapply function
Error in match.fun(FUN) :
'simulation(J, 10000, FALSE, 0.1)' is not a function, character or symbol
This is despite the simulation I have written showing as being saved as a function in the workspace.
Does anyone know what this error is pointing to?
In this line:
simulation_repeated <- c(mapply(list, FUN=simulation(args),times=10000))
You are not giving a function to mapply. You are (essentially) passing the result of calling simulation(args), and simulation does not return a function. FUN has to be the function itself, with its arguments supplied separately.
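One way to repeat a simulation is to pass a function, not the result of calling it, to lapply, or to use replicate. This is a sketch under assumptions: `toy_simulation` is a hypothetical, simplified stand-in for the simulation in the question, and the run counts are arbitrary.

```r
# Hypothetical stand-in for the simulation: returns a vector of 100 elements.
toy_simulation <- function(J, gens) {
  for (g in seq_len(gens)) {
    idx <- sample(length(J), 2)  # two distinct positions
    J[idx[1]] <- J[idx[2]]
  }
  J
}

J <- 1:100
set.seed(1)

# Pass a function (here an anonymous wrapper that fixes the arguments),
# not the value returned by calling the simulation once.
runs <- lapply(1:5, function(k) toy_simulation(J, gens = 50))

# replicate() re-evaluates its expression on every repetition.
runs2 <- replicate(5, toy_simulation(J, gens = 50), simplify = FALSE)
```

Both `runs` and `runs2` are lists with one simulated vector per repetition.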
