Run two pmap with different workers asynchronously in Julia - asynchronous

I have a function that I parallelize with pmap. I would like to run this function 4 times asynchronously, using 10 workers each time, but I can't run two or more pmap calls at the same time.
I'm using Julia v1.1 on a 40-CPU Linux machine.
using Distributed
addprocs(4)
@everywhere function TestParallel(x)
    a = 0
    while a < 4
        println("Value = ",x, " in worker = ", myid())
        sleep(1)
        a += 1
    end
end
a = WorkerPool([2,3])
b = WorkerPool([4,5])
c = [i for i = 1:10]
@sync @async for i in c
    pmap(x-> TestParallel(x), a, c)
    pmap(x-> TestParallel(x), b, c)
end
I expect to have:
From worker 2: Value = 1 in worker = 2
From worker 3: Value = 2 in worker = 3
From worker 4: Value = 3 in worker = 4
From worker 5: Value = 4 in worker = 5
So the first two elements of c go to the first pmap and the next two elements go to the second pmap; then whichever finishes first gets the next two elements.
Now I'm obtaining:
From worker 2: Value = 1 in worker = 2
From worker 3: Value = 2 in worker = 3
From worker 2: Value = 1 in worker = 2
From worker 3: Value = 2 in worker = 3
After the first pmap completes all elements of c the second pmap starts over solving all elements again.
From worker 2: Value = 9 in worker = 2
From worker 3: Value = 10 in worker = 3
From worker 5: Value = 2 in worker = 5
From worker 4: Value = 1 in worker = 4

There are some problems with your question: @sync and @async use green threads (tasks), whereas you want to distribute your computations across worker processes. The syntax @sync @async [some code] spawns the code asynchronously and then waits for it to complete, so effectively it has the same meaning as [some code].
While your question is not entirely clear, I will assume that you want to launch 2 pmaps in parallel using separate worker pools (this seems the most likely thing you are trying to do).
In that case here is the code:
using Distributed
addprocs(4)
@everywhere function testpar2(x)
    for a in 0:3
        println("Value = $x [$a] in worker = $(myid())")
        sleep(0.2)
    end
    return 1000*myid()+x*x # I assume you want to return some value
end
a = WorkerPool([2,3])
b = WorkerPool([4,5])
c = collect(1:10)
@sync begin
    @async begin
        res1 = pmap(x-> testpar2(x), a, c)
        println("Got res1=$res1")
    end
    @async begin
        res2 = pmap(x-> testpar2(x), b, c)
        println("Got res2=$res2")
    end
end
When running the above code you will see something like:
...
From worker 5: Value = 10 [3] in worker = 5
From worker 2: Value = 10 [3] in worker = 2
From worker 3: Value = 9 [3] in worker = 3
Got res2=[4001, 5004, 5009, 4016, 5025, 4036, 4049, 5064, 4081, 5100]
Got res1=[2001, 3004, 2009, 3016, 2025, 3036, 3049, 2064, 3081, 2100]
Task (done) @0x00000000134076b0
You can clearly see that both pmaps have been run in parallel on different worker pools.
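To illustrate the point about @sync and @async without needing any worker processes, here is a minimal sketch that times two sleeps run one after the other against the same two sleeps wrapped in separate @async tasks inside one @sync block. The function names run_sequential and run_concurrent are made up for this example, and sleep only stands in for a blocking call such as pmap.

```julia
# Two half-second waits executed one after the other.
function run_sequential()
    return @elapsed begin
        sleep(0.5)
        sleep(0.5)
    end
end

# The same two waits, each in its own task; @sync waits for both tasks.
function run_concurrent()
    return @elapsed @sync begin
        @async sleep(0.5)
        @async sleep(0.5)
    end
end

t_seq = run_sequential()
t_con = run_concurrent()
println("sequential: ", round(t_seq; digits=2), " s")
println("concurrent: ", round(t_con; digits=2), " s")
```

The concurrent version should take roughly half as long, because each @async block yields while it waits. A single @sync @async around the whole body gives no such overlap, since there is only one task.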

Related

Execution time orders of magnitude longer depending upon global definition location?

I finished writing the following program and began to do some cleanup after the debugging stage:
using BenchmarkTools
function main()
    global solution = 0
    global a = big"1"
    global b = big"1"
    global c = big"0"
    global total = 0
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                global solution = total
            end
            global b = b + 1
        end
        global b = 1
        global a = a + 1
    end
end
@elapsed begin
    main()
end
# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round.(t; sigdigits=5), " milliseconds.")
The runtime for my machine was around 150ms.
I decided to rearrange the globals to better match the typical layout of a program, where globals are defined at the top:
using BenchmarkTools
global solution = 0
global a = big"1"
global b = big"1"
global c = big"0"
global total = 0
function main()
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                global solution = total
            end
            global b = b + 1
        end
        global b = 1
        global a = a + 1
    end
end
@elapsed begin
    main()
end
# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round.(t; sigdigits=5), " milliseconds.")
Making that one change for where the globals were defined reduced the runtime on my machine to 0.0042ms.
Why is the runtime so drastically reduced?
Don't use globals.
Don't. Use. Globals. They are bad.
When you define your globals outside the main function, then the second time you run your function, a already equals 100, and main() bails out before doing anything at all.
Global variables are a bad idea, not just in Julia, but in programming in general. You can use them when defining proper constants, like π, and maybe some other specialized cases, but not for things like this.
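The effect described above can be reproduced in a few lines. This is a simplified sketch, not the original program: counter, count_up_global and count_up_local are hypothetical names, and the loop only counts how many iterations it performs.

```julia
# Global state survives between calls, like `a` in the question.
global counter = 0

function count_up_global(limit)
    global counter
    steps = 0
    while counter < limit
        counter += 1
        steps += 1
    end
    return steps
end

# The same loop with its state kept local, so every call starts fresh.
function count_up_local(limit)
    n = 0
    steps = 0
    while n < limit
        n += 1
        steps += 1
    end
    return steps
end

first_global = count_up_global(100)   # does 100 iterations
second_global = count_up_global(100)  # counter is already 100: does nothing
first_local = count_up_local(100)
second_local = count_up_local(100)
println("global: ", first_global, " then ", second_global)
println("local:  ", first_local, " then ", second_local)
```

The second call of the global version returns immediately, which is why timing it says nothing about the real cost of the work.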
Let me rewrite your function without globals:
function main_locals()
    solution = 0
    a = 1
    while a < 100
        b = 1
        c = big(1)
        while b < 100
            c *= a
            s = string(c)
            total = sum(Int, s) - 48 * length(s)
            solution = max(solution, total)
            b += 1
        end
        a += 1
    end
    return solution
end
On my laptop this is >20x faster than your version with globals defined inside the function, that is, the version that actually works. The other one doesn't work as it should, so the comparison is not relevant.
Edit: I have even complicated this too much. The only thing you need to do is to remove all the globals from your first function, and return the solution, then it will work fine, and be almost as fast as the code I wrote:
function main_with_globals_removed()
    solution = 0
    a = big"1"
    b = big"1"
    c = big"0"
    total = 0
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                solution = total
            end
            b = b + 1
        end
        b = 1
        a = a + 1
    end
    return solution # remember return!
end
Don't use globals.
In the first case, you are always assigning the globals and possibly changing their types, so the compiler needs to do extra work. I assume that the two programs generate different answers after the 2nd run because of the failure to reset globals…
Globals are discouraged in Julia for performance reasons because of potential type instability.

How to resolve "UndefVarError: consume not defined" in JULIA

When I call consume(generator) I get this error. Is this a version problem?
function fib()
    a = 0
    produce(a)
    b = 1
    produce(b)
    while true
        a , b = b , a+b
        produce(b)
    end
end
generator = Task(fib)
consume(generator)
Here is a way to do something similar using a Channel:
global channel = Channel{Int}(1)
function fib()
    # global here is not needed because we aren't modifying the channel handle, just the contents
    # global channel
    a = 0
    put!(channel, a)
    b = 1
    put!(channel, b)
    while true
        a , b = b , a+b
        put!(channel,b)
        sleep(1)
    end
end
@async while true; item = take!(channel); println("item: $item"); end
@async fib()
Note that @async will hide errors, so you may want to wrap the body in a try/catch that calls showerror(stderr, e, catch_backtrace()) if things are not running.
Each @async produces a Task handle.
Also, put! will block when the channel is full and take! will block when it is empty. You may want to increase the channel size to get a larger buffer.
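For a generator-style replacement of produce and consume, the Channel can also own the producing task. The sketch below assumes Julia 1.3 or later, where the constructor accepts a do-block together with a buffer size; fib_channel is a name made up for this example.

```julia
# Returns a Channel whose bound task produces Fibonacci numbers on demand.
function fib_channel()
    return Channel{Int}(1) do ch
        a, b = 0, 1
        put!(ch, a)
        put!(ch, b)
        while true
            a, b = b, a + b
            put!(ch, b)   # blocks until a consumer takes the previous value
        end
    end
end

gen = fib_channel()
first_values = [take!(gen) for _ in 1:8]
println(first_values)
close(gen)   # stops the producer, which would otherwise stay blocked in put!
```

Each take!(gen) plays the role that consume(generator) had in old Julia versions.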

Julia - Linearly Parallelize

Julia can parallelize threads but doesn't run them in order (i.e. from 1 to 10 in order). Is there any way to have Julia run them linearly with a defined max parallelism? Maybe by pulling from a queue?
julia> Threads.@threads for i = 1:10
           println("i = $i on thread $(Threads.threadid())")
       end
i = 1 on thread 1
i = 7 on thread 3
i = 2 on thread 1
i = 8 on thread 3
i = 3 on thread 1
i = 9 on thread 4
i = 10 on thread 4
i = 4 on thread 2
i = 5 on thread 2
i = 6 on thread 2
You can start threads in order using @spawn, but there is no guarantee that they will finish in order, which I think is what you are looking for.
But if you must, you could send the results to a central place, which then prints them sequentially.
c = Channel(10)
function channel_printer(c)
    res = []
    for i in 1:10
        push!(res, take!(c))
    end
    sort!(res, by=first)
    for (_, r) in res
        println(r)
    end
end
Threads.@threads for i = 1:10
    # println("i = $i on thread $(Threads.threadid())")
    str = "i = $i on thread $(Threads.threadid())"
    put!(c, (i, str))
end
channel_printer(c)
Note that the code is definitely not efficient, but it is meant to illustrate the idea.
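A simpler variant of the same idea, sketched here as an alternative rather than as part of the original answer, is to preallocate a results vector and let each iteration write to its own slot. The work still runs in whatever order the scheduler chooses, but the output is read back in index order. The name results is hypothetical.

```julia
# One slot per iteration; each index is written by exactly one iteration,
# so no locking is needed.
results = Vector{String}(undef, 10)

Threads.@threads for i in 1:10
    results[i] = "i = $i on thread $(Threads.threadid())"
end

# Print in index order, regardless of which thread finished first.
foreach(println, results)
```

This avoids the sort step because the position in the vector already encodes the order.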

@distributed seems to work, function return is wonky

I'm just learning how to do parallel computing in Julia. I'm using @sync @distributed at the start of a 3x nested for loop to parallelize things (see code at bottom). From the line println(errCmp[row, col]) I can watch all the elements of the array errCmp be printed out. E.g.
From worker 3: 2.351134946074191e9
From worker 4: 2.3500830193505473e9
From worker 5: 2.3502416529551845e9
From worker 2: 2.3509105625656652e9
From worker 3: 2.3508352842971106e9
From worker 4: 2.3497049296121807e9
From worker 5: 2.35048428351797e9
From worker 2: 2.350742582031195e9
From worker 3: 2.350616273660934e9
From worker 4: 2.349709546599313e9
However, when the function returns, errCmp is the array of zeros I pre-allocated at the beginning.
Am I missing some closing term to collect everything?
function optimizeDragCalc(df::DataFrame)
    paramGrid = [cd*AoM for cd = range(1e-3, stop = 0.01, length = 50), AoM = range(2e-4, stop = 0.0015, length = 50)]
    errCmp = zeros(size(paramGrid))
    # totalSize = size(paramGrid, 1) * size(paramGrid, 2) * size(df.time, 1)
    @sync @distributed for row = 1:size(paramGrid, 1)
        for col = 1:size(paramGrid, 2)
            # Run the propagation here
            BC = 1/paramGrid[row, col]
            slns, _ = propWholeTraj(df, BC)
            for time = 1:size(df.time, 1)
                errDF = propError(slns[time], df, time)
                errCmp[row, col] += sum(errDF.totalErr)
            end # time
            # println("row: ", row, " of ",size(paramGrid, 1)," col: ", col, " of ", size(paramGrid, 2))
            println(errCmp[row, col])
        end # col
    end # row
    # plot(heatmap(z = errCmp))
    return errCmp, paramGrid
end
errCmp, paramGrid = @time optimizeDragCalc(df)
You did not provide a minimal working example (MWE), but I guess that might be hard. So here is my MWE. Let us assume that we want to use Distributed to calculate the means of an Array's columns:
using Distributed
addprocs(2)
@everywhere using StatsBase
data = rand(1000,2000)
res = zeros(2000)
@sync @distributed for col = 1:size(data)[2]
    res[col] = StatsBase.mean(data[:,col])
    # does not work!
    # ... because res is copied to each worker, modified there, and never returned!
end
In order to correct the above code you need to provide an aggregator function (I keep the example intentionally simplified - a further optimization is possible).
using Distributed
addprocs(2)
@everywhere using Distributed,StatsBase
data = rand(1000,2000)
@everywhere function t2(d1,d2)
    append!(d1,d2)
    d1
end
res = @sync @distributed (t2) for col = 1:size(data)[2]
    [(myid(),col, StatsBase.mean(data[:,col]))]
end
Now let us see the output. We can see that some of the values have been calculated on worker 2 while others on worker 3:
julia> res
2000-element Array{Tuple{Int64,Int64,Float64},1}:
(2, 1, 0.49703681326230276)
(2, 2, 0.5035341367791002)
(2, 3, 0.5050607022354537)
⋮
(3, 1998, 0.4975699181976122)
(3, 1999, 0.5009498778934444)
(3, 2000, 0.499671315490524)
Further possible improvements/modifications:
use @spawnat to generate values on the remote processes (instead of generating them on the master process and sending them)
use SharedArray - this lets you automatically distribute data among workers. In my experience it requires very careful programming.
use ParallelDataTransfer.jl to send data among workers. Very easy to use, but not efficient for a huge number of messages.
always consider Julia's threading mechanism (in some scenarios it makes life easier - again, it depends on the problem)
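As a variation on the aggregator approach, the built-in vcat can serve as the reducer, and the returned pairs can then be written into a preallocated array on the master process. This is a sketch under assumptions: it uses the Statistics standard library instead of StatsBase, a smaller matrix, and the hypothetical name col_means. Without addprocs the loop simply runs on process 1, so the code also works with no workers.

```julia
using Distributed, Statistics
# addprocs(2)                  # optional: add workers before the next line
@everywhere using Statistics

data = rand(100, 20)

# Each iteration returns a one-element vector of (column, mean);
# vcat concatenates the pieces coming back from the workers.
col_means = @distributed (vcat) for col in 1:size(data, 2)
    [(col, mean(data[:, col]))]
end

# Fill the result array on the master process, where it actually lives.
res = zeros(size(data, 2))
for (col, m) in col_means
    res[col] = m
end
println(res[1:3])
```

With a reducer, @distributed already blocks until the result is available, so no surrounding @sync is needed in this form.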

Is there a way to always start at 0 using math.sin() Lua

Edit: This question is about Roblox Lua.
I'm using math.sin(tick()) to get a variable number and would like for it to always start at 0. Is this possible using math.sin? Is there something else I can use other than tick() to make this work?
Example:
for i = 1, 10 do
    local a = math.sin(tick())+1
    print(a)
    wait()
end
wait(1)
for i = 1, 10 do
    local a = math.sin(tick())+1
    print(a)
    wait()
end
My goal is to have this number start at 0 every time and then increase from there. So, it would start at 0 then increase to 2 and then decrease back to zero and continue modulating between 0 and 2 for as long as I continue calling it. Using the example above the number starts at any arbitrary number between 0 and 2.
I took a different approach and came up with this. It does exactly what I wanted to do with math.sin(tick()). If anyone knows other ways to accomplish this I would like to know.
local n = 0
local m = 0
local Debounce = true
local function SmoothStep(num)
    return num * num * (3 - 2 * num)
end
while Debounce do
    -- ramp m up from 0 to 1
    for i = 1, 100 do
        wait()
        m = m+.01
        n = SmoothStep(m)
        print(n)
        if not Debounce then break end
    end
    -- ramp m back down from 1 to 0 (SmoothStep is only meaningful on [0, 1])
    for i = 1, 100 do
        wait()
        m = m-.01
        n = SmoothStep(m)
        print(n)
        if not Debounce then break end
    end
end
To non-Roblox users: tick() returns the local UNIX time. wait(t) yields the current thread for t seconds, the smallest possible interval being roughly 1/30th of a second.
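Given that math.sin(0) equals 0, what you have to do is subtract the time at which the loop began from tick() inside the loop. This makes the expression inside math.sin start at roughly 0 at the beginning of the loop. Note that with the +1 offset the printed value then starts at 1; if the value itself must start at 0 and swing between 0 and 2, 1 - math.cos(tick() - loopstart) does that.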
local loopstart = tick()
for i = 1, 10 do
    local a = math.sin(tick() - loopstart)+1
    print(a)
    wait()
end

Resources