I'm hoping to convert the second lapply function (# Make the new list) into a foreach loop, using the foreach package.
## Example data
lst <- lapply(1:30, function(x) lapply(1:5, function(y) rnorm(10)))
## Make the new list
res <- lapply(1:5, function(x) lapply(1:10, function(y) sapply(lst, function(z) z[[x]][[y]])))
I'm not sure if this is possible. I'm not concerned about lapply performing better than the foreach loops. For context, I'm re-organizing a list of lists of vectors in such a way:
new_thing[[5]][[10]][30] <- daily_by_security[[30]][[5]][10]
Thanks!
To figure out how to solve your problem, I looked at the foreach examples, and the second one does exactly what you are looking for:
library("foreach")
example(foreach)
# equivalent to lapply(1:3, sqrt)
foreach(i=1:3) %do% sqrt(i)
I then adapted this to your problem:
lst <- lapply(1:30, function(x) lapply(1:5, function(y) rnorm(10)))
resFE <- foreach(i = 1:5) %do%
lapply(1:10, function(y) sapply(lst, function(z) z[[i]][[y]]))
Edit: The OP was able to figure out a solution based upon my work. Here is the solution:
resFE <- foreach(i = 1:5, .packages = "foreach") %dopar%
{ foreach(m = 1:10) %dopar%
{ foreach(t = lst, .combine = c) %do%
{ t[[i]][[m]] } } }
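A quick way to check a rewrite like this is to compare it against the original lapply result. The sketch below assumes a fixed seed (added here only for reproducibility) and uses the sequential %do% operator, so no parallel backend has to be registered.

```r
library(foreach)

set.seed(1)  # assumption: seed added only so the check is reproducible
lst <- lapply(1:30, function(x) lapply(1:5, function(y) rnorm(10)))

# Original nested lapply version
res <- lapply(1:5, function(x) lapply(1:10, function(y) sapply(lst, function(z) z[[x]][[y]])))

# foreach replaces only the outer lapply
resFE <- foreach(i = 1:5) %do%
  lapply(1:10, function(y) sapply(lst, function(z) z[[i]][[y]]))

# Compare the two results element by element
same <- isTRUE(all.equal(res, resFE))
```

If `same` is TRUE, both versions produce the 5 x 10 x 30 layout described in the question.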
In R, how to make each dataframe generated in foreach loop available as individual dataframes in global environment
I was only able to save them into a list (x), but the list is of 3 layers; there are over 40,000 dataframes, and unpacking them is very time consuming.
x <- foreach(q=1:countq, .export = ls(globalenv())) %do% {
foreach(p=1:countp, .export = ls(globalenv())) %do% {
foreach(o=1:countero, .export = ls(globalenv())) %dopar% {
n<-rbind(df_o, df_p, df_q)
}
}
}
It would be nice to have dataframes n1, n2, n3, ... till n40000 from this nested foreach loop.
Your data should be easy to use once you convert your list of lists of lists of data.frames into a single data.frame, e.g., with data.table::rbindlist.
library(doParallel)
registerDoParallel( cores = 2 )
countq <- countp <- countero <- 30
d <- mtcars
x <-
foreach(q=1:countq) %do% {
foreach(p=1:countp) %do% {
foreach(o=1:countero) %dopar% {
data.frame( q=q, p=p, o=o, d )
}
}
}
x <- lapply( x, function(u) lapply(u, data.table::rbindlist) )
x <- lapply( x, data.table::rbindlist )
x <- data.table::rbindlist(x)
x <- as.data.frame(x)
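If separate objects really are required, one option is to flatten the nested list and name its elements. This is only a sketch: the loop sizes are small and hypothetical, and the data frame is a placeholder. `list2env()` can then copy the named elements into an environment, although keeping them in a named list is usually easier to work with.

```r
library(foreach)

# Small hypothetical sizes so the example runs quickly
x <- foreach(q = 1:2) %do% {
  foreach(p = 1:3) %do% {
    foreach(o = 1:4) %do% {
      data.frame(q = q, p = p, o = o)  # placeholder for the real data frame
    }
  }
}

# Remove the two outer list levels: 2 * 3 * 4 = 24 data frames in one list
flat <- unlist(unlist(x, recursive = FALSE), recursive = FALSE)
names(flat) <- paste0("n", seq_along(flat))

# Optional: create n1, n2, ... as variables in a dedicated environment
env <- new.env()
list2env(flat, envir = env)
```

Passing `envir = globalenv()` instead would create the variables in the global environment, at the cost of cluttering it with thousands of objects.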
I am trying to convert the following for loop to foreach to take advantage of parallel processing.
dt = data.frame(t(data.frame(a=sample(1:10,10), b=sample(1:10,10), c=sample(1:10,10), d=sample(1:10,10))))
X = as.matrix(dt)
c = ncol(X)
itemnames=names(dt)
sm=matrix(0,c,c)
colnames(sm)=itemnames
row.names(sm)=itemnames
for (j in 1:c){
ind=setdiff(1:c,j)
print(ind)
print(j)
sm[j,ind]=sign(X[j]-X[ind])
print(sm[j,ind])
}
cvec = 1:c
r = foreach(d = cvec, .combine = rbind) %dopar% {
ind = setdiff(1:10,d)
sm[d,ind]=sign(X[d]-X[ind])
}
With the for loop I get a 10x10 matrix in which the sign function fills the off-diagonal elements and the diagonal elements stay 0.
With foreach, however, I get a 10x9 matrix: the diagonal elements are missing and everything else is the same.
Please help me get the same output as the for loop. Thanks in advance.
I am not sure what you are trying to achieve here, since you are only using the first ten elements of your matrix. This can be done without any loops:
sign(outer(X[1:10], X[1:10], FUN = "-"))
In addition, I am not sure that parallel processing will be faster for this kind of problem, even assuming that the real case is much bigger. But if you want to use foreach, you should not assign to the global sm within the loop and instead return a suitable vector in the end:
foreach(d = cvec, .combine = rbind) %dopar% {
ind <- setdiff(cvec,d)
res <- rep(0, 10)
res[ind] <- sign(X[d]-X[ind])
res
}
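To confirm that the three approaches agree, they can be run side by side. The sketch below uses illustrative data (a seeded 4 x 10 matrix of distinct values, which is an assumption and not the data from the question) and the sequential %do% operator so that it runs without a registered backend.

```r
library(foreach)

set.seed(42)                         # assumption: seed and data are illustrative
X <- matrix(sample(1:40), nrow = 4)  # 4 x 10 matrix of distinct values
n <- 10
cvec <- seq_len(n)

# Reference: the original for loop
sm <- matrix(0, n, n)
for (j in cvec) {
  ind <- setdiff(cvec, j)
  sm[j, ind] <- sign(X[j] - X[ind])
}

# foreach: each iteration returns a full row, including the 0 on the diagonal
r <- foreach(d = cvec, .combine = rbind) %do% {
  ind <- setdiff(cvec, d)
  res <- rep(0, n)
  res[ind] <- sign(X[d] - X[ind])
  res
}

# Loop-free version
o <- sign(outer(X[1:n], X[1:n], FUN = "-"))

# unname() drops any row names added while combining with rbind
all_match <- all(unname(r) == sm) && all(o == sm)
```

Replacing %do% with %dopar% after registering a backend should give the same matrix, since each iteration only returns a value and never writes to shared state.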
If you want to assign to a matrix in parallel, you'll need a shared matrix:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
sm <- FBM(c, c, init = 0)  # initialise to 0 so the diagonal is defined
library(foreach)
cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)
r = foreach(d = cvec, .combine = c) %dopar% {
ind = setdiff(1:10,d)
sm[d,ind]=sign(X[d]-X[ind])
NULL
}
parallel::stopCluster(cl)
sm[]
Let's say I have two functions f1 and f2. f2 is designed to take the output of f1 as an argument, and f1 is designed to take its own output to update it. Before the loop starts, output from f1 is initialized. Then within each iteration, f2 takes the previous output from f1 and executes, then f1 executes to update its own output. Two vectors will gather the sequential output from f1 and f2 respectively. The following code is a simple working example:
f1 <- function(x) return(x + pi)
f2 <- function(x) return(log(x))
f1.result <- res1 <- f1(1)
f2.result <- NULL
for(i in 2:100) { ## Need to parallelize these two lines ##
res2 <- f2(res1); f2.result <- c(f2.result, res2)
res1 <- f1(res1); f1.result <- c(f1.result, res1)
}
I am looking to parallelize the two executions inside the loop, i.e. to get them to run at the same time. How do I achieve this in R? I am familiar with the basics of foreach but can't figure this out. Thanks.
OK I think I figured this out. It's actually pretty simple. I use the doParallel package:
f1 <- function(x) return(x + pi)
f2 <- function(x) return(log(x))
f1.result <- res1 <- f1(1)
f2.result <- NULL
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
getDoParWorkers()
for(j in 2:100) {
res <- foreach(i = 1:2, .combine = c) %dopar% {
if(i==1) res <- f1(res1)
else res <- f2(res1)
}
res1 <- res[1]; f1.result <- c(f1.result, res1)
res2 <- res[2]; f2.result <- c(f2.result, res2)
}
stopCluster(cl)
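Dispatching two tiny tasks to the workers on every iteration adds scheduling overhead. An alternative sketch, assuming (as in this example) that f2 has no side effects and depends only on the output of f1, is to build the f1 chain sequentially first and then apply f2 to all of its values in one independent pass. The use of `Reduce()` and the variable `n` are choices made for this illustration.

```r
library(foreach)

f1 <- function(x) x + pi
f2 <- function(x) log(x)
n <- 100

# Step 1: the f1 chain is inherently sequential (each value needs the last)
f1.result <- Reduce(function(acc, i) f1(acc), x = seq_len(n - 1),
                    init = f1(1), accumulate = TRUE)

# Step 2: every f2 call is independent, so this loop can be split across
# workers; swap %do% for %dopar% once a backend is registered
f2.result <- foreach(v = f1.result[-n], .combine = c) %do% f2(v)
```

This only pays off when f2 is the expensive part; if f1 dominates the run time, the chain itself remains the bottleneck.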
How can I define something similar to for(i in nums) in case of foreach? It seems that foreach allows i=1:nums, but in my case numbers in nums are not sequential.
nums <- c(1,2,5,8)
prob <- foreach(i in nums, .combine = rbind, .packages = "randomForest") %dopar% {
#...
}
You don't use in with foreach(). You just use named parameters. Try
nums <- c(1,2,5,8)
prob <- foreach(i = nums, .combine = rbind, .packages = "randomForest") %dopar% {
#...
}
The parameters will accept a vector without problem. The 1:n syntax is just an easy way to create a vector of elements from 1 to n. But you can pass in your own vector directly.
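A minimal runnable sketch of iterating over a non-sequential vector follows. The loop body is a placeholder for the real work, and %do% is used so that no backend or extra package is needed.

```r
library(foreach)

nums <- c(1, 2, 5, 8)  # non-sequential values

# i takes each value of nums in turn; %do% runs sequentially
prob <- foreach(i = nums, .combine = rbind) %do% {
  c(i = i, squared = i^2)  # placeholder for the real per-item work
}

# The values that the loop variable actually took
visited <- unname(prob[, "i"])
```

Here `visited` holds the same values as `nums`, in the same order.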
I would like to transform the following nested for loop
first <- c(1, 2, 3)
second <- c(1, 2, 3)
dummy = matrix(double(), length(first), length(second))
c <- list()
c$sum <- dummy
c$times <- dummy
for (i in seq_along(first)) {
for (j in seq_along(second)) {
c$sum[i, j] <- first[i] + second[j]
c$times[i, j] <- first[i] * second[j]
}
}
c
into code using foreach and get the same list of matrices as a result. I tried many different things but the closest "result" is this:
x <- foreach(b = second, .combine = "cbind") %:% foreach(a = first, .combine = "c") %do% {
c <- list()
c$sum <- a+b
c$times <- a*b
out <- c
}
x
How to get this list of matrices right using foreach?
EDIT: One possibility is to take the result and transform it after calling foreach:
res <- list()
res$sum <- x[rownames(x)=="sum", ]
rownames(res$sum) <- NULL
colnames(res$sum) <- NULL
res$times <- x[rownames(x)=="times", ]
rownames(res$times) <- NULL
colnames(res$times) <- NULL
res
How to "parametrize" foreach so there is no need to transform results?
You "just" have to provide the correct .combine function.
If you only have numbers, you can return an array rather than a list.
library(foreach)
library(abind)
first <- 1:3
second <- 4:5
x <-
foreach(b = second, .combine = function(...) abind(..., along=3)) %:%
foreach(a = first, .combine = rbind) %do% {
c( sum=a+b, times=a*b )
}
If you really need lists, writing the combining functions is much harder.
Instead, you can build a data.frame, and reshape it afterwards, if needed.
x <-
foreach(b = second, .combine = rbind) %:%
foreach(a = first, .combine = rbind) %do% {
data.frame(a=a, b=b, sum=a+b, times=a*b )
}
library(reshape2)
list(
sum = dcast(x, a ~ b, value.var="sum" )[,-1],
times = dcast(x, a ~ b, value.var="times")[,-1]
)
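For completeness, here is a sketch of what list-aware combining functions could look like. The names `combine_inner` and `combine_outer` are hypothetical helpers written for this example, not part of foreach. They rely on foreach calling a custom .combine function pairwise, so if either vector has length 1 the corresponding function is never called and the result has a different shape.

```r
library(foreach)

first <- 1:3
second <- 4:5

# Inner loop: glue the scalar results for one b into a vector per component
combine_inner <- function(x, y) Map(c, x, y)
# Outer loop: bind those vectors (or the matrices built so far) as columns
combine_outer <- function(x, y) Map(cbind, x, y)

x <- foreach(b = second, .combine = combine_outer) %:%
  foreach(a = first, .combine = combine_inner) %do% {
    list(sum = a + b, times = a * b)
  }

# Cross-check against the loop-free equivalent
expected <- list(sum = outer(first, second, "+"),
                 times = outer(first, second, "*"))
```

For purely numeric work like this, the `outer()` calls in `expected` are the simpler route; the foreach version is mainly useful when the loop body is too complex to vectorise.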