How to measure the @async runtime of functions in Julia?

My code is as follows:
function getImages()
    @async for i in 1:30
        load("path/to/image$i.jpg")
    end
end
@time @sync getImages()
Is this the correct way to measure asynchronous runtime in Julia? I don't want to believe it, because the synchronous run takes 0.1 seconds to finish while the @async version shows only 0.000017 seconds.

You need to reorganize your code a little:
julia> function getImages()
           @sync for i in 1:30
               @async begin
                   sleep(rand())
                   println("Image ", i)
               end
           end
       end
getImages (generic function with 1 method)
julia> @time getImages()
Image 2
Image 5
Image 15
Image 30
Image 4
Image 7
Image 20
Image 3
Image 22
Image 18
Image 11
Image 29
Image 9
Image 24
Image 16
Image 6
Image 8
Image 14
Image 19
Image 21
Image 13
Image 1
Image 17
Image 10
Image 27
Image 25
Image 23
Image 28
Image 26
Image 12
0.894566 seconds (650 allocations: 37.406 KiB)

Here is the correct code:
function getImages()
    myimages = Vector{Any}(undef, 30)
    # instead of Any use the correct type of whatever your load returns
    @sync for i in 1:30
        @async myimages[i] = load("path/to/image$i.jpg")
    end
    myimages
end
This approach is useful when your IO is slow.
Note, however, that this code will utilize only a single thread. Hence, if IO is not your performance bottleneck, such parallelization will not help. In that case you should consider using threads.
Before starting Julia, run (on Windows):
set JULIA_NUM_THREADS=4
or on Linux:
export JULIA_NUM_THREADS=4
And change your function to:
function getImages()
    myimages = Vector{Any}(undef, 30)
    # instead of Any use the correct type of whatever your load returns
    Threads.@threads for i in 1:30
        myimages[i] = load("path/to/image$i.jpg")
    end
    myimages
end
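After restarting Julia, you can verify that the setting took effect (a quick sanity check, not from the original answer):

```julia
# Number of threads available to Threads.@threads in this session;
# it reports 1 if JULIA_NUM_THREADS was not picked up at startup.
println(Threads.nthreads())
```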

Your intuition is right here! The short answer is that you're not using @sync correctly.
Quote from the doc of @sync:
Wait until all lexically-enclosed uses of @async, @spawn, @spawnat and @distributed are complete. All exceptions thrown by enclosed async operations are collected and thrown as a CompositeException.
Here the @async is NOT lexically enclosed in the @sync expression. Actually, there's no magic in the @sync macro. The @async, @spawn, @spawnat and @distributed expressions create tasks, and @sync simply waits for them to finish.
So you can do it manually like this:
julia> f() = @async sleep(1)
julia> @time wait(f())
  1.002095 seconds (9 allocations: 848 bytes)
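For comparison, here is a minimal sketch where the @async is lexically enclosed in @sync (sleep stands in for the real work), so @time reports the full elapsed time:

```julia
# @async is now inside @sync, so the created task is waited on
# and @time reports roughly 1 second instead of microseconds.
g() = @sync @async sleep(1)
@time g()
```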

Related

How to build a setInterval-like task?

This is my idea:
task = @task begin
    while true
        sleep(1)
        foo()
        if should_end
            # ?
        end
    end
end
But there are some problems:
are there any simpler ways to communicate with the task other than using a global should_end?
how to end the task from inside the expression?
While implementing something like this can be a good exercise, note that Timers are in the standard library and they are similar to the JS functions you may be used to because Julia and Node.js both use libuv internally. Whether you use Node.js setInterval or Timeout, or Julia Timer eventually uv_timer_start is called (although the creation and low-level management of timers is different in the respective runtimes).
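For example, a repeating Timer from Base already gives setInterval-like behavior directly (a minimal sketch; the delay and interval values are arbitrary):

```julia
# Fire the callback immediately, then every second;
# close(t) is the analogue of JS clearInterval.
t = Timer(0.0; interval=1.0) do timer
    println("tick")
end
sleep(3.5)  # let it tick a few times
close(t)
```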
To answer your 2nd question, you can just break in the if expression:
julia> should_end = false
false
julia> task = @task begin
           while true
               sleep(1)
               if should_end
                   break
               end
           end
           "Tasks can return a value"
       end
Task (runnable) #0x00007fca1c9c3990
julia> schedule(task)
Task (runnable) #0x00007fca1c9c3990
julia> task
Task (runnable) #0x00007fca1c9c3990
julia> should_end = true;sleep(1)
julia> task
Task (done) #0x00007fca1c9c3990
julia> fetch(task)
"Tasks can return a value"
As for the 1st question, there is a lot of information in the Julia docs, e.g. the Asynchronous Programming section. As described there, Channels can be used for communication between tasks (when suitable). Note that should_end doesn't have to be global: a task wraps a function, and that function can capture variables in enclosing scopes (@task begin a = 1 end is really just Task(() -> begin a = 1 end)).
Here is a short example using a Timer and a Channel to send data to a task:
function pinger(n)
    ping() = parse(Float64, match(r"time=(.*) ms", read(`ping -c 1 8.8.8.8`, String))[1])
    ch = Channel{Float64}(0)
    pinger = Timer(0.0; interval=1.0) do timer
        put!(ch, ping())
    end
    summer = @task begin
        total = 0.0
        count = 0
        while count < n
            total += take!(ch)
            count += 1
        end
        total / count
    end
    schedule(summer)
    result = fetch(summer)
    close(pinger)  # stop the timer, or its next put! would block forever
    result
end
julia> pinger(3)
19.5

Julia @btime cannot find internal function

I'm wondering if I found a bug in Julia's BenchmarkTools or if there's something deeper happening here that I don't understand. Running the following script
function test()
    function func1(n)
        sum(1:n)
    end
    function func2(n)
        ans = 0
        for i = 1:n
            ans += i
        end
        return ans
    end
    @time func1(100000)
    @time func2(100000)
end
works exactly as expected and times both functions. However, using @btime instead of @time gives me an undefined-variable error:
ERROR: UndefVarError: func1 not defined
If I move the internal functions outside test(), both timing versions work fine, but in my actual tests this is not something I can easily do. I prefer @btime to @time, as it's more accurate and robust, but here I clearly can't use it. Can someone explain if this is a bug or what's going on here?
Try adding $ to your @btime calls:
function test()
    function func1(n)
        sum(1:n)
    end
    function func2(n)
        ans = 0
        for i = 1:n
            ans += i
        end
        return ans
    end
    @btime $func1(100000)
    @btime $func2(100000)
end
This interpolates the function into the benchmark expression, so the inner function is now visible to the benchmarking code.
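The same `$` interpolation is what BenchmarkTools recommends for non-constant globals in general, so that you benchmark the operation itself rather than dynamic dispatch on the global binding (a small sketch; the array is arbitrary):

```julia
using BenchmarkTools  # third-party package

# Without $, the benchmark would also measure the cost of looking up
# the untyped global x on every evaluation.
x = rand(1000)
@btime sum($x)
```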

Call the more general method from a specific one

I am trying to call a general method from a specific one, but cannot figure out how.
function fn(x)
    # generic
    ...
end
function fn(x::String)
    # I want to call the generic version here
    val = fn(x)
    # do something with val and then return it
    ...
end
Is this possible?
A workaround is using a helper function that can be called from both the generic and the specific methods, e.g.
function helper(x)
    # main work is here
    ...
end
function fn(x)
    # generic
    helper(x)
end
function fn(x::String)
    val = helper(x)
    # now use the val to do something
    ...
end
Without using such helpers, is there a way to control dispatch so that a particular method is selected? Is there something like the :before and :after method qualifiers and call-next-method from Lisp's CLOS in Julia?
You can use the invoke function:
julia> function fn(x)
           @info "generic $x"
       end
fn (generic function with 1 method)
julia> function fn(x::String)
           @info "before"
           invoke(fn, Tuple{Any}, x)
           @info "after"
       end
fn (generic function with 2 methods)
julia> fn(10)
[ Info: generic 10
julia> fn("10")
[ Info: before
[ Info: generic 10
[ Info: after
(just to be clear - the printing of "before" and "after" is only to highlight what gets executed in what sequence - the only thing that is related to method dispatch here is the invoke function)
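On Julia 1.7 and later there is also the Base.@invoke macro, which expresses the same thing with call-site syntax (a sketch using a fresh illustrative function, fn2, rather than the fn from the answer above):

```julia
# Base.@invoke fn2(x::Any) expands to invoke(fn2, Tuple{Any}, x):
# the annotated signature selects which method is called.
fn2(x) = "generic"
function fn2(x::String)
    string("specific -> ", Base.@invoke fn2(x::Any))
end

fn2(10)    # "generic"
fn2("10")  # "specific -> generic"
```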

Julia: @sync not passing control to the rest of the Julia function

I am trying to understand the usage of the @sync and @async macros in Julia. I am trying to get this MWE to work, but the program does not terminate. Any help is appreciated.
function process_node(nodes, id)
    @show id
    sleep(1.0)
    nodes[id] = true
    return
end
function main()
    nodes = Dict(i => false for i in 1:10)
    jobs = Channel{Int}(15)
    for i in 1:10
        put!(jobs, i)
    end
    @sync for id in jobs
        @async process_node(nodes, id)
    end
    println("done")
end
main()
The program never gets to the line println("done"). I do not know why.
Thanks in advance.
There is nothing wrong with the use of @sync and @async in this example.
The loop for id in jobs never returns because it blocks forever, endlessly waiting for values that are no longer inserted into the Channel.
Directly from the docs:
The returned Channel can be used as an iterable object in a for loop, in which case the loop variable takes on all the produced values. The loop is terminated when the channel is closed.
One solution is to signal the end of the stream of values with a special Int value, for example -1, or, if that is not possible, with a nothing value, declaring jobs as Channel{Union{Int, Nothing}}.
function main()
    nodes = Dict(i => false for i in 1:10)
    jobs = Channel{Int}(15)
    for i in 1:10
        put!(jobs, i)
    end
    put!(jobs, -1)
    @sync for id in jobs
        if id == -1
            close(jobs)
        else
            @async process_node(nodes, id)
        end
    end
    println("done")
end
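An alternative to a sentinel value is simply to close the channel once all the jobs have been inserted; iterating a closed channel yields the remaining buffered values and then terminates. A self-contained sketch of the same MWE (with a shorter sleep, and returning nodes so the result can be inspected):

```julia
function process_node(nodes, id)  # as in the question, shorter sleep
    sleep(0.1)
    nodes[id] = true
    return
end

function main()
    nodes = Dict(i => false for i in 1:10)
    jobs = Channel{Int}(15)
    for i in 1:10
        put!(jobs, i)
    end
    close(jobs)  # iteration over jobs now ends once the buffer is drained
    @sync for id in jobs
        @async process_node(nodes, id)
    end
    println("done")
    nodes        # returned so the caller can check the results
end
```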

Using caching instead of memoization to speedup a function

While memoization of a function is a good idea, it could cause a program to crash because the program could potentially run out of memory.
Therefore it is NOT A SAFE OPTION to use in a production program.
Instead, I have developed the caching scheme below, with a fixed number of memory slots, a soft limit, and a hard limit. When the number of cached slots exceeds the hard limit, the least-used slots are deleted until the number of slots is reduced to the soft limit.
struct cacheType
    softlimit::Int
    hardlimit::Int
    memory::Dict{Any,Any}
    freq::Dict{Any,Int}
    cacheType(soft::Int, hard::Int) = new(soft, hard, Dict(), Dict())
end

function tidycache!(c::cacheType)
    memory_slots = length(c.memory)
    if memory_slots > c.hardlimit
        num_to_delete = memory_slots - c.softlimit
        # Now sort the freq dictionary into an array of key => AccessFrequency
        # where the first few items have the lowest AccessFrequency
        for item in sort(collect(c.freq), by = x -> x[2])[1:num_to_delete]
            delete!(c.freq, item[1])
            delete!(c.memory, item[1])
        end
    end
end

# Fibonacci function
function cachefib!(cache::cacheType, x)
    if haskey(cache.memory, x)
        # Increment the number of times this key has been accessed
        cache.freq[x] += 1
        return cache.memory[x]
    else
        # perform housekeeping and remove cache entries if over the hardlimit
        tidycache!(cache)
        if x < 3
            cache.freq[x] = 1
            return cache.memory[x] = 1
        else
            result = cachefib!(cache, x - 2) + cachefib!(cache, x - 1)
            cache.freq[x] = 1
            cache.memory[x] = result
            return result
        end
    end
end

c = cacheType(3, 4)
cachefib!(c, 3)
cachefib!(c, 4)
cachefib!(c, 5)
cachefib!(c, 6)
cachefib!(c, 4)
println("c.memory is ", c.memory)
println("c.freq is ", c.freq)
I think this would be more useful in a production environment than plain memoization, which puts no limit on memory consumption and could result in a program crashing.
In Python, there is
@functools.lru_cache(maxsize=128, typed=False)
Decorator to wrap a function with a memoizing callable that saves up to the maxsize most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.
Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.
Is there an equivalent in Julia language?
There is LRUCache.jl, which provides an LRU type which basically acts like a Dict. Unfortunately, this doesn't seem to work with the Memoize.jl package, but you can use my answer to your other question:
using LRUCache
const fibmem = LRU{Int,Int}(3) # store only 3 values
function fib(n)
    get!(fibmem, n) do
        n < 3 ? 1 : fib(n-1) + fib(n-2)
    end
end
