I'm using the doSNOW package for parallelizing tasks that differ in length. When one thread is finished, I want:
- some information generated by earlier threads to be passed on to the next thread
- the next thread to start immediately (load balancing, like in clusterApplyLB)
It works single-threaded (see makeCluster(spec = 1)):
# Register Snow and doSNOW
require(doSNOW)
# CHANGE spec to 4 or more to see what my problem is
registerDoSNOW(cl <- makeCluster(spec = 1, type = "SOCK", outfile = ""))
numbersProcessed <- c() # init processed vector
x <- foreach(i = 1:10, .export = "numbersProcessed") %dopar% {
  # Do working stuff
  cat(format(Sys.time(), "%X"), ": ", "Starting", i, "(Numbers processed so far:", numbersProcessed, ")\n")
  Sys.sleep(time = i)
  # Append this number to the general vector
  numbersProcessed <- append(numbersProcessed, i)
  cat(format(Sys.time(), "%X"), ": ", "Ending", i, "\n")
  cat("--------------------\n")
}
# End it all
stopCluster(cl)
Now change the spec in "makeCluster" to 4. The output is something like this:
[..]
Type: EXEC
18:12:21 : Starting 9 (Numbers processed so far: 1 5 )
18:12:23 : Ending 6
--------------------
Type: EXEC
18:12:23 : Starting 10 (Numbers processed so far: 2 6 )
18:12:25 : Ending 7
At 18:12:21, thread 9 knew that threads 1 and 5 had been processed. Two seconds later, thread 6 ends. The next thread should know at least about 1, 5, and 6, right? But thread 10 only knows about 2 and 6.
I realized this has something to do with the number of cores specified in makeCluster: 9 knows about 1, 5, and 9 (1 + 4 + 4), and 10 knows about 2, 6, and 10 (2 + 4 + 4).
Is there a better way to pass "processed" stuff to further generations of threads?
Bonus points: Is there a way to "print" to the master node during parallel processing, without having these "Type: EXEC" etc. messages from the snow package? :)
Thanks!
Marc
My bad. Damn.
I thought foreach with %dopar% was load-balanced. It isn't, which makes my question obsolete: nothing can be executed on the host side while parallel processing is running. That explains why global variables are only manipulated on the worker side and never reach the host.
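(For reference: if plain load balancing is all that's needed, the sketch below uses parallel::clusterApplyLB, which hands the next task to whichever worker frees up first. The worker function and sleep times are illustrative only, and this still cannot share state between running workers.)
library(parallel)

cl <- makeCluster(4, type = "PSOCK")
# clusterApplyLB dispatches the next task as soon as any worker finishes,
# so short tasks are not stuck behind long ones
res <- clusterApplyLB(cl, 1:10, function(i) {
  Sys.sleep(i)  # simulate tasks that differ in length
  i
})
stopCluster(cl)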
Related
Julia can parallelize threads but doesn't run them in order (i.e. from 1 to 10). Is there any way to have Julia run them linearly with a defined maximum parallelism? Maybe by pulling from a queue?
julia> Threads.@threads for i = 1:10
           println("i = $i on thread $(Threads.threadid())")
       end
i = 1 on thread 1
i = 7 on thread 3
i = 2 on thread 1
i = 8 on thread 3
i = 3 on thread 1
i = 9 on thread 4
i = 10 on thread 4
i = 4 on thread 2
i = 5 on thread 2
i = 6 on thread 2
You can start threads in order using @spawn but there is no guarantee that they will finish in order, which I think is what you are looking for.
But if you must, you could send the results to a central place which will make it print sequentially.
c = Channel(10)

function channel_printer(c)
    res = []
    for i in 1:10
        push!(res, take!(c))
    end
    sort!(res, by=first)
    for (_, r) in res
        println(r)
    end
end

Threads.@threads for i = 1:10
    # println("i = $i on thread $(Threads.threadid())")
    str = "i = $i on thread $(Threads.threadid())"
    put!(c, (i, str))
end
channel_printer(c)
Note, the code is definitely not efficient but is meant to illustrate the idea.
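If the goal is specifically to pull jobs from a queue with a defined maximum parallelism, a sketch along the following lines should work (max_parallel and the jobs Channel are names I made up for illustration): a fixed number of spawned tasks each take the next job from a shared Channel, so jobs start in order even though they may finish out of order.
jobs = Channel{Int}(10)
for i in 1:10
    put!(jobs, i)
end
close(jobs)  # iteration over the Channel stops once it is drained

max_parallel = 4
@sync for _ in 1:max_parallel
    # each worker task pulls the next job off the queue, in order
    Threads.@spawn for i in jobs
        println("i = $i on thread $(Threads.threadid())")
    end
end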
I'm trying to implement a BFS (breadth-first search) algorithm in R. I know about the graph::bfs function and do_bfs from DiagrammeR. I think my problem is in the "for" of the bfs function.
The input would be a graph like the following:
1
2 3
4 5 6 7
The output should be the path; in this case, if I start from 1: 1, 2, 3, 4, 5, 6, 7.
library(igraph)
library(foreach)
library(flifo)
library(digest)
# devtools::install_github("rdpeng/queue")
These packages seemed useful for the implementation, especially the queue one.
t <- make_tree(7, children = 2, mode = "out")
plot.igraph(t)
bfsg(t, 1)
bfsg <- function(g, n) {
  m <- c(replicate(length(V(g)), 0))   # visited markers
  q <- flifo::fifo()
  m[n] <- 1
  push(q, n)
  pr <- c(replicate(length(V(g)), 0))
}
At this point, 1 should be in the queue; after this, it gets printed and popped off the queue. After the pop, the algorithm should go to 2 and 3.
while (size(q) != 0) {
  print(n)
  pop(q)
}
for (i in unlist(adjacent_vertices(g, n, mode = "out"))) {
  if (m[i] == 0) {
    push(q, i)
    m[i] = 2
  }
}
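For comparison, here is a minimal working sketch of the traversal. It departs from the code above in two ways (both my own choices, not from the question): it uses a plain R vector as the FIFO queue instead of flifo, and it keeps the while loop inside the function so the queue and marker state are shared.
library(igraph)

bfsg <- function(g, n) {
  visited <- rep(FALSE, length(V(g)))
  queue <- c(n)                 # plain vector used as a FIFO queue
  visited[n] <- TRUE
  visit_order <- c()
  while (length(queue) > 0) {
    v <- queue[1]               # take the front of the queue
    queue <- queue[-1]          # pop it
    visit_order <- c(visit_order, v)
    for (i in unlist(adjacent_vertices(g, v, mode = "out"))) {
      if (!visited[i]) {
        visited[i] <- TRUE
        queue <- c(queue, i)    # enqueue unvisited neighbours
      }
    }
  }
  visit_order
}

t <- make_tree(7, children = 2, mode = "out")
bfsg(t, 1)  # 1 2 3 4 5 6 7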
I am receiving some values in my R process and I want to compute them asynchronously. I am using the promises and future packages.
This is what my current code looks like:
arr = list()
i = 0
while (i < 10)
{
  a = read messages from KAFKA topic
  arr[[i]] = future(DoSomething(a))
  i = i + 1
}
Now, arr contains a list of promises
How do I get value() of the promise that has resolved first (and so on)?
Something like Promise.race in JavaScript.
Edit: I just re-read your question and saw that you were asking about getting the first result, not just all results. Below is the code for that: a while loop that waits until any result is ready and then moves forward.
There is also a function called promise_race in the promises package, but the issue with the promises package is that it can only output results; you can't get the produced value back into a variable for further computation in the main thread.
require(future)
plan(multiprocess)

longRunningFunction <- function(value) {
  random1 <- runif(n = 1, min = 5, max = 30)
  Sys.sleep(random1)
  return(value)
}

arr = list()
# changed starting number to 1 since R lists start at 1, not 0
i = 1
# If the number of futures generated is more than the number of cores available,
# the main thread will block until the first future completes and allows more
# futures to be started
while (i < 6)
{
  arr[[i]] = future(longRunningFunction(i))
  i = i + 1
}

# spin until at least one future has resolved
while (all(!resolved(arr))) { }

raceresults_from_future <- lapply(arr[resolved(arr)], value)
print(paste("raceresults_from_future: ", raceresults_from_future))
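Extending that idea (a sketch of my own, not part of the answer above): to consume every value in completion order rather than only the first, you can repeatedly race whatever futures are still unresolved:
remaining <- arr
results_in_order <- list()
while (length(remaining) > 0) {
  done <- resolved(remaining)        # logical vector: which futures finished?
  while (!any(done)) {
    Sys.sleep(0.1)                   # avoid a tight busy-wait
    done <- resolved(remaining)
  }
  results_in_order <- c(results_in_order, lapply(remaining[done], value))
  remaining <- remaining[!done]      # keep racing the rest
}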
I am working on the problem from Cracking the Coding Interview:
A child is running up a staircase with n steps, and can hop either 1 step, 2 steps, or 3 steps at a time.
Implement a method to count how many possible ways the child can run up the stairs.
I came up with a dynamic programming solution:
def dynamic_prog(N):
    store_values = {1: 1, 2: 2, 3: 3}
    return dynamic_prog_helper(N, store_values)

def dynamic_prog_helper(N, map_n):
    if N in map_n:
        return map_n[N]
    map_n[N] = dynamic_prog_helper(N-1, map_n) + dynamic_prog_helper(N-2, map_n) + dynamic_prog_helper(N-3, map_n)
    return map_n[N]
I am not sure why it does not compute correctly.
dynamic_prog(5) = 11, but should be 13
dynamic_prog(4) = 6, but should be 7
Can someone point me in the right direction?
The critical problem is that your initial value for store_values[3] is wrong. From 3 steps down, you have 4 possibilities:
3
2 1
1 2
1 1 1
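With that base case, the recurrence f(n) = f(n-1) + f(n-2) + f(n-3) produces the expected values: f(4) = 4 + 2 + 1 = 7 and f(5) = 7 + 4 + 2 = 13.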
Fixing that error gets the expected results:
def dynamic_prog(N):
    store_values = {1: 1, 2: 2, 3: 4}
    return dynamic_prog_helper(N, store_values)
...
for stair_count in range(3, 6):
    print(dynamic_prog(stair_count))
Output:
4
7
13
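For completeness, here is the corrected solution as a self-contained program (the helper is copied unchanged from the question):
def dynamic_prog(N):
    store_values = {1: 1, 2: 2, 3: 4}  # f(3) = 4, per the four possibilities above
    return dynamic_prog_helper(N, store_values)

def dynamic_prog_helper(N, map_n):
    if N in map_n:
        return map_n[N]
    map_n[N] = dynamic_prog_helper(N-1, map_n) + dynamic_prog_helper(N-2, map_n) + dynamic_prog_helper(N-3, map_n)
    return map_n[N]

for stair_count in range(3, 6):
    print(dynamic_prog(stair_count))  # 4, 7, 13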
Let's say I've created a cluster using cl <- parallel::makeCluster(2) and I send a call to the first node using parallel:::sendCall(cl[[1]], f, arg).
I want to get the results of a specific node (in this case the first node). I can do that using parallel:::recvResult(cl[[1]]). However, this call blocks until a result is received. Is there any way to check the status of a specific node, i.e. a status like "is processing" or "is finished"?
I'd recommend using the standard socketSelect function. For example:
library(parallel)
cl <- makePSOCKcluster(3, outfile="")
# Send task to worker 1
x <- 2
parallel:::sendCall(cl[[1]], sqrt, list(x), tag=1)
# Wait up to 5 seconds for worker 1 to send the result back
ready <- socketSelect(list(cl[[1]]$con), timeout=5)
if (ready > 0) {
result <- parallel:::recvData(cl[[1]])
cat(sprintf("sqrt(%f) = %f\n", x, result$value))
} else {
cat("result not ready after five seconds\n")
}
See the source for the recvOneData.SOCKcluster function in the file snowSOCK.R for a more complete example.
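Building on that example (a sketch of my own that loosely mirrors what recvOneData.SOCKcluster does internally), you can also poll all workers at once and read from whichever is ready first:
# collect each worker's socket connection
socklist <- lapply(cl, function(node) node$con)
# one TRUE/FALSE per worker: is there data to read?
ready <- socketSelect(socklist, timeout = 5)
if (any(ready)) {
  w <- which(ready)[1]  # first ready worker
  result <- parallel:::recvData(cl[[w]])
  print(result$value)
} else {
  cat("no worker ready after five seconds\n")
}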