Julia can parallelize threads but doesn't run them in order (i.e. from 1 to 10). Is there any way to have Julia run them in order with a defined maximum parallelism? Maybe by pulling from a queue?
julia> Threads.@threads for i = 1:10
           println("i = $i on thread $(Threads.threadid())")
       end
i = 1 on thread 1
i = 7 on thread 3
i = 2 on thread 1
i = 8 on thread 3
i = 3 on thread 1
i = 9 on thread 4
i = 10 on thread 4
i = 4 on thread 2
i = 5 on thread 2
i = 6 on thread 2
You can start tasks in order using Threads.@spawn, but there is no guarantee that they will finish in order, which I think is what you are looking for.
But if you must, you can send the results to a central place, which makes it possible to print them in order.
c = Channel(10)

function channel_printer(c)
    res = []
    for i in 1:10
        push!(res, take!(c))
    end
    sort!(res, by=first)
    for (_, r) in res
        println(r)
    end
end

Threads.@threads for i = 1:10
    # println("i = $i on thread $(Threads.threadid())")
    str = "i = $i on thread $(Threads.threadid())"
    put!(c, (i, str))
end

channel_printer(c)
Note, the code is definitely not efficient but is meant to illustrate the idea.
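To address the "pulling from a queue" part of the question more directly, a fixed pool of worker tasks pulling from a jobs channel gives in-order starts with a hard cap on parallelism. This is a minimal sketch, not from the original answer; the worker count of 3 and the channel sizes are arbitrary choices:

```julia
# Jobs are enqueued in order; 3 worker tasks pull from the queue, so jobs
# *start* in order 1..10 with at most 3 running at once (finish order may vary).
jobs = Channel{Int}(10)
foreach(i -> put!(jobs, i), 1:10)
close(jobs)                     # iteration below stops once the queue drains

results = Channel{String}(10)
workers = [Threads.@spawn begin
    for i in jobs               # take!s happen in queue order
        put!(results, "i = $i on thread $(Threads.threadid())")
    end
end for _ in 1:3]               # 3 == maximum parallelism

foreach(wait, workers)
close(results)
foreach(println, results)
```

Because each worker only takes a new job after finishing its previous one, this behaves like a load-balanced queue rather than the static chunking `Threads.@threads` does.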
I finished writing the following program and began to do some cleanup after the debugging stage:
using BenchmarkTools

function main()
    global solution = 0
    global a = big"1"
    global b = big"1"
    global c = big"0"
    global total = 0
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                global solution = total
            end
            global b = b + 1
        end
        global b = 1
        global a = a + 1
    end
end

@elapsed begin
    main()
end

# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round(t; sigdigits=5), " milliseconds.")
The runtime for my machine was around 150ms.
I decided to rearrange the globals to better match the typical layout of a program, where globals are defined at the top:
using BenchmarkTools

global solution = 0
global a = big"1"
global b = big"1"
global c = big"0"
global total = 0

function main()
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                global solution = total
            end
            global b = b + 1
        end
        global b = 1
        global a = a + 1
    end
end

@elapsed begin
    main()
end

# run @elapsed twice to ignore compilation overhead
t = @elapsed main()
print("Solution: ", solution)
t = t * 1000;
print("\n\nProgram completed in ", round(t; sigdigits=5), " milliseconds.")
Making that one change to where the globals were defined reduced the runtime on my machine to 0.0042ms.
Why is the runtime so drastically reduced?
Don't use globals.
Don't. Use. Globals. They are bad.
When you define your globals outside the main function, then the second time you run your function, a already equals 100, and main() bails out before doing anything at all.
Global variables are a bad idea, not just in Julia, but in programming in general. You can use them when defining proper constants, like π, and maybe some other specialized cases, but not for things like this.
Let me rewrite your function without globals:
function main_locals()
    solution = 0
    a = 1
    while a < 100
        b = 1
        c = big(1)
        while b < 100
            c *= a
            s = string(c)
            total = sum(Int, s) - 48 * length(s)
            solution = max(solution, total)
            b += 1
        end
        a += 1
    end
    return solution
end
On my laptop this is >20x faster than your version with globals defined inside the function, that is, the version that actually works. The other one doesn't work as it should, so the comparison is not relevant.
Edit: I have overcomplicated this. The only thing you need to do is remove all the globals from your first function and return the solution; then it will work fine and be almost as fast as the code I wrote:
function main_with_globals_removed()
    solution = 0
    a = big"1"
    b = big"1"
    c = big"0"
    total = 0
    while a < 100
        while b < 100
            c = a^b
            s = string(c)
            total = 0
            for i in 1:length(s)
                total = total + Int(s[i]) - 48
            end
            if total > solution
                solution = total
            end
            b = b + 1
        end
        b = 1
        a = a + 1
    end
    return solution  # remember to return!
end
Don't use globals.
In the first case, you are always assigning the globals and possibly changing their types, so the compiler needs to do extra work. I assume that the two programs generate different answers after the 2nd run because of the failure to reset the globals…
Globals are discouraged in Julia for performance reasons because of potential type instability.
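A rough sketch of that type instability (the function names here are mine, not from the question): a non-constant global can be rebound to a value of any type at any time, so every access inside a hot loop goes through runtime type checks, while a local's type is known to the compiler once and for all.

```julia
x = 1.0                        # untyped, non-constant global

function sum_untyped_global(n)
    s = 0.0
    for _ in 1:n
        s += x                 # type of x must be re-checked on every access
    end
    return s
end

function sum_local(n)
    x = 1.0                    # local: the compiler knows this is a Float64
    s = 0.0
    for _ in 1:n
        s += x
    end
    return s
end
```

Timing the two (e.g. with `@btime` from BenchmarkTools) shows the global version is drastically slower; declaring `const x = 1.0` instead would close most of the gap, since a `const` global has a fixed type.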
When I call consume(generator) I get this error. Is this a version problem?
function fib()
    a = 0
    produce(a)
    b = 1
    produce(b)
    while true
        a, b = b, a + b
        produce(b)
    end
end
generator = Task(fib)
consume(generator)
Here is a way to do something similar using a Channel (produce and consume were removed in Julia 1.0):
global channel = Channel{Int}(1)

function fib()
    # global is not needed here because we aren't modifying the channel handle, just the contents
    # global channel
    a = 0
    put!(channel, a)
    b = 1
    put!(channel, b)
    while true
        a, b = b, a + b
        put!(channel, b)
        sleep(1)
    end
end

@async while true; item = take!(channel); println("item: $item"); end
@async fib()
Note that @async will hide errors, so you may want to wrap the body in a try/catch with showerror(stderr, e, catch_backtrace()) if things are not running.
Each @async call produces a Task handle.
Also, put! and take! will block when the channel is full. You may want to expand the channel size to handle a larger buffer.
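For a generator-style API even closer to the old produce/consume, the Channel constructor also accepts a function, and the channel closes itself when that function returns. A bounded sketch (the name fib_channel is mine):

```julia
# Produces the first n Fibonacci numbers through an unbuffered channel;
# each put! blocks until a consumer takes the value, much like produce did.
fib_channel(n) = Channel{Int}() do ch
    a, b = 0, 1
    for _ in 1:n
        put!(ch, a)
        a, b = b, a + b
    end
end

collect(fib_channel(8))   # → [0, 1, 1, 2, 3, 5, 8, 13]
```

Because the producer runs in its own task, you can also consume lazily with `take!(ch)` one value at a time, or iterate the channel in a for loop.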
I have a function which I use with pmap to parallelize it. I would like to run this function 4 times asynchronously, using 10 workers each, but I can't run two or more pmaps at the same time.
I'm using Julia v1.1 with a 40-CPUs machine on linux.
using Distributed
addprocs(4)
@everywhere function TestParallel(x)
    a = 0
    while a < 4
        println("Value = ", x, " in worker = ", myid())
        sleep(1)
        a += 1
    end
end

a = WorkerPool([2, 3])
b = WorkerPool([4, 5])
c = [i for i = 1:10]

@sync @async for i in c
    pmap(x -> TestParallel(x), a, c)
    pmap(x -> TestParallel(x), b, c)
end
I expect to have:
From worker 2: Value = 1 in worker = 2
From worker 3: Value = 2 in worker = 3
From worker 4: Value = 3 in worker = 4
From worker 5: Value = 4 in worker = 5
So the first two elements of c go to the first pmap and the next two elements to the second pmap; then whoever finishes first gets the next two elements.
Now I'm obtaining:
From worker 2: Value = 1 in worker = 2
From worker 3: Value = 2 in worker = 3
From worker 2: Value = 1 in worker = 2
From worker 3: Value = 2 in worker = 3
After the first pmap completes all elements of c, the second pmap starts over, solving all elements again.
From worker 2: Value = 9 in worker = 2
From worker 3: Value = 10 in worker = 3
From worker 5: Value = 2 in worker = 5
From worker 4: Value = 1 in worker = 4
There are some problems with your question: @sync and @async use green threads, while you want to distribute your computations. The syntax @sync @async [some code] spawns the code asynchronously and waits for it to complete, so it effectively has the same meaning as [some code] alone.
Since your question is not entirely clear, I will assume that you want to launch 2 pmaps in parallel utilizing separate worker pools (this seems like the most likely thing you are trying to do).
In that case here is the code:
using Distributed
addprocs(4)
@everywhere function testpar2(x)
    for a in 0:3
        println("Value = $x [$a] in worker = $(myid())")
        sleep(0.2)
    end
    return 1000 * myid() + x * x  # I assume you want to return some value
end

a = WorkerPool([2, 3])
b = WorkerPool([4, 5])
c = collect(1:10)

@sync begin
    @async begin
        res1 = pmap(x -> testpar2(x), a, c)
        println("Got res1=$res1")
    end
    @async begin
        res2 = pmap(x -> testpar2(x), b, c)
        println("Got res2=$res2")
    end
end
When running the above code you will see something like:
...
From worker 5: Value = 10 [3] in worker = 5
From worker 2: Value = 10 [3] in worker = 2
From worker 3: Value = 9 [3] in worker = 3
Got res2=[4001, 5004, 5009, 4016, 5025, 4036, 4049, 5064, 4081, 5100]
Got res1=[2001, 3004, 2009, 3016, 2025, 3036, 3049, 2064, 3081, 2100]
Task (done) #0x00000000134076b0
You can clearly see that both pmaps have been run in parallel on different worker pools.
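If you want the values back rather than printing them inside the blocks, note that each @async block returns a Task whose result can be fetched. A self-contained sketch of the same pattern (the names f, pool1, pool2 and xs are mine, not from the question):

```julia
using Distributed
addprocs(4)

# A cheap stand-in for the real work; returns a value tagged with the worker id.
@everywhere f(x) = 1000 * myid() + x * x

pool1 = WorkerPool(workers()[1:2])
pool2 = WorkerPool(workers()[3:4])
xs = collect(1:10)

t1 = @async pmap(f, pool1, xs)     # runs on pool1
t2 = @async pmap(f, pool2, xs)     # runs on pool2, concurrently with t1
res1 = fetch(t1)                   # fetch waits for the task and returns its value
res2 = fetch(t2)
```

fetch also rethrows any exception that occurred inside the task, which avoids the error-swallowing problem @async otherwise has.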
Hi, I'm trying to understand how the macro @isdefined works.
I was expecting Chunk 1 to print out 1 2 3 4, but it is not printing anything.
Also related, I was expecting chunk 2 to print out 2 3 4 5, but it is throwing an error: "a is not defined".
# Chunk 1
for i = 1:5
    if @isdefined a
        print(a)
    end
    a = i
end

# Chunk 2
for i = 1:5
    if i > 1
        print(a)
    end
    a = i
end
Could someone help explain what is wrong about each chunk? Thank you.
The reason is that a is a local variable in the scope of the for loop. Now the crucial part is that the for loop follows the rule defined here:
for loops, while loops, and comprehensions have the following behavior: any new variables introduced in their body scopes are freshly allocated for each loop iteration
This means that the assignment to a at the end of the loop body does not carry over to the next iteration: when a new iteration starts, the old value of a is discarded because a is freshly allocated. It only becomes defined after the a = i assignment.
Therefore you have the following behavior:
julia> for i = 1:5
           if @isdefined a
               println("before: ", a)
           end
           a = i
           if @isdefined a
               println("after: ", a)
           end
       end
after: 1
after: 2
after: 3
after: 4
after: 5
However, if a is defined in an outer scope, then it is not local to the for loop and its value is preserved between iterations, so you have for instance:
julia> let a
           for i = 1:5
               if @isdefined a
                   println("before: ", a)
               end
               a = i
               if @isdefined a
                   println("after: ", a)
               end
           end
       end
after: 1
before: 1
after: 2
before: 2
after: 3
before: 3
after: 4
before: 4
after: 5
and
julia> let a
           for i = 1:5
               if i > 1
                   println(a)
               end
               a = i
           end
       end
1
2
3
4
I have used a let block, but it could be any kind of outer scope except global scope (in which case you would have to change a = i to global a = i to get the same effect).
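For completeness, here is a sketch of that global-scope variant as a script (the initial a = 0 is my addition so the variable exists before the loop):

```julia
a = 0                  # `a` lives in global scope, so it survives iterations
for i = 1:5
    if i > 1
        println(a)     # prints 1, 2, 3, 4 across iterations
    end
    global a = i       # required: assigning to a global from a local scope
end
```

Without the global keyword, the a inside the loop would be a fresh loop-local variable in a script, and you would be back to the behavior of Chunk 2.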
I'm using the doSNOW package for parallelizing tasks which differ in length. When one thread is finished, I want:
some information generated by old threads passed to the next thread
the next thread to start immediately (load balancing like in clusterApplyLB)
It works single-threaded (see makeCluster(spec=1)):
# Register Snow and doSNOW
require(doSNOW)

# CHANGE spec to 4 or more to see what my problem is
registerDoSNOW(cl <- makeCluster(spec=1, type="SOCK", outfile=""))

numbersProcessed <- c()  # init processed vector

x <- foreach(i = 1:10, .export="numbersProcessed") %dopar% {
  # Do working stuff
  cat(format(Sys.time(), "%X"), ": ", "Starting", i, "(Numbers processed so far:", numbersProcessed, ")\n")
  Sys.sleep(time=i)
  # Append this number to the general vector
  numbersProcessed <- append(numbersProcessed, i)
  cat(format(Sys.time(), "%X"), ": ", "Ending", i, "\n")
  cat("--------------------\n")
}

# End it all
stopCluster(cl)
Now change the spec in "makeCluster" to 4. Output is something like this:
[..]
Type: EXEC
18:12:21 : Starting 9 (Numbers processed so far: 1 5 )
18:12:23 : Ending 6
--------------------
Type: EXEC
18:12:23 : Starting 10 (Numbers processed so far: 2 6 )
18:12:25 : Ending 7
At 18:12:21, thread 9 knew that threads 1 and 5 had been processed. 2 seconds later, thread 6 ends. The next thread should know at least about 1, 5 and 6, right? But thread 10 only knows about 6 and 2.
I realized this has something to do with the number of cores specified in makeCluster: 9 knows about 1, 5 and 9 (1 + 4 + 4), and 10 knows about 2, 6 and 10 (2 + 4 + 4).
Is there a better way to pass "processed" stuff to further generations of threads?
Bonus points: Is there a way to "print" to the master node during parallel processing, without getting these "Type: EXEC" etc. messages from the snow package? :)
Thanks!
Marc
My bad. Damn.
I thought foreach with %dopar% was load-balanced. This isn't the case, which makes my question obsolete, because nothing can be executed on the host side while parallel processing is underway. That explains why global variables are only manipulated on the client side and never reach the host.