I have been headbutting on a wall for a few days around this code:
using Distributed
using SharedArrays
# Dimension size
M=10;
N=100;
z_ijw = zeros(Float64,M,N,M)
z_ijw_tmp = SharedArray{Float64}(M*M*N)
i2s = CartesianIndices(z_ijw)
#distributed for iall=1:(M*M*N)
# get index
i=i2s[iall][1]
j=i2s[iall][2]
w=i2s[iall][3]
# Assign function value
z_ijw_tmp[iall]=sqrt(i+j+w) # Any random function would do
end
# Print the last element of the array
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
println(z_ijw_tmp[end])
The first printed out number is always 0, the second number is either 0 or 10.95... (sqrt of 120, which is correct). The 3rd is either 0 or 10.95 (if the 2nd is 0)
So it appears that the print code (#mainthread?) is allowed to run before all the workers finish. Is there anyway for the print code to run properly the first time (without a wait command)
Without multiple println, I thought it was a problem with scope and spend a few days reading about it #.#
#distributed with a reducer function, i.e. #distributed (+), will be synced, whereas #distributed without a reducer function will be started asynchronously.
Putting a #sync in front of your #distributed should make the code behave the way you want it to.
This is also noted in the documentation here:
Note that without a reducer function, #distributed executes asynchronously, i.e. it spawns independent tasks on all available workers and returns immediately without waiting for completion. To wait for completion, prefix the call with #sync
Following the approach of this answer I am trying to understand what happens exactly and how expressions and generated functions work in Julia within the concept of metaprogramming.
The goal is to optimize a recursive function using expressions and generated functions (for a concrete example you can have a look at the question answered in the link provided above).
Consider the following modified fibonacci function, in which I want to compute the fibonacci series up to n and multiply it by a number p.
The straightforward, recursive implementation would be
function fib(n::Integer, p::Real)
if n <= 1
return 1 * p
else
return n * fib(n-1, p)
end
end
As a first step, I could define a function which returns an expression instead of the computed value
function fib_expr(n::Integer, p::Symbol)
if n <= 1
return :(1 * $p)
else
return :($n * $(fib_expr(n-1, p)))
end
end
which, e.g. returns something like
julia> ex = fib_expr(3, :myp)
:(3 * (2 * (1myp)))
In this way I get an expression which is fully expanded and depends on the value assigned to the symbol myp. In this way I do not see the recursion anymore, basically I am metaprogramming: I created a function that creates another "function" (in this case we call it expression though).
I can now set myp = 0.5 and call eval(ex) to compute the result.
However, this is slower than the first approach.
What I can do though, is to generate a parametric function in the following way
#generated function fib_gen{n}(::Type{Val{n}}, p::Real)
return fib_expr(n, :p)
end
And magically, calling fib_gen(Val{3}, 0.5) gets things done, and is incredibly fast.
So, what is going on?
To my understanding, in the first call to fib_gen(Val{3}, 0.5), the parametric function fib_gen{Val{3}}(...) gets compiled and its content is the fully expanded expression obtained through fib_expr(3, :p), i.e. 3*2*1*p with p substituted with the input value.
The reason why it is so fast then, is because fib_gen is basically just a series of multiplications, whereas the original fib has to allocate on the stack every single recursive call making it slower, am I correct?
To give some numbers, here is my short benchmark using BenchmarkTools.
julia> #benchmark fib(10, 0.5)
...
mean time: 26.373 ns
...
julia> p = 0.5
0.5
julia> #benchmark eval(fib_expr(10, :p))
...
mean time: 177.906 μs
...
julia> #benchmark fib_gen(Val{10}, 0.5)
...
mean time: 2.046 ns
...
I have many questions:
Why the second case is so slow?
What exactly is and means ::Type{Val{n}}? (I copied that from the answer linked above)
Because of the JIT compiler, sometimes I am lost in what happens at compile-time and at run-time, as it is the case here...
Furthermore, I tried to combine fib_expr and fib_gen in a single function according to
#generated function fib_tot{n}(::Type{Val{n}}, p::Real)
if n <= 1
return :(1 * p)
else
return :(n * fib_tot(Val{n-1}, p))
end
end
which however is slow
julia> #benchmark fib_tot(Val{10}, 0.5)
...
mean time: 4.601 μs
...
What am I doing wrong here? Is it even possible to combine fib_expr and fib_gen in a single function?
I realize this is more a monograph rather than a question, however, even though I read the metaprogramming section few times, I am having a hard time to grasp everything, in particular with an applied example such as this one.
A monograph in response:
Metaprogramming basics
It will be easier to start with "normal" macros first. I'll relax the definition you used a bit:
function fib_expr(n::Integer, p)
if n <= 1
return :(1 * $p)
else
return :($n * $(fib_expr(n-1, p)))
end
end
That allows to pass in more than just symbols for p, like integer literals or whole expressions. Given this, we can define a macro for the same functionality:
macro fib_macro(n::Integer, p)
fib_expr(n, p)
end
Now, if #fib_macro 45 1 is used anywhere in the code, at compile time it will first be replaced by a long nested expression
:(45 * (44 * ... * (1 * 1)) ... )
and then compiled normally -- to a constant.
That's all there is to macros, really. Replacing syntax during compile time; and by recursion, this can be an arbitrarily long alteration between compiling, and evaluating functions on expressions. And for things that are essentially constant, but tedious to write otherwise, it is very useful: a bood example example is Base.Math.#evalpoly.
Evaluation at runtime?
But it has the problem that you cannot inspect values which are only known at runtime: you can't implement fib(n) = #fib_macro n 1, since at compile time, n is a symbol representing the parameter, and not a number you can dispatch on.
The next best solution to this would be to use
fib_eval(n::Integer) = eval(fib_expr(n, 1))
which works, but will repeat the compilation process every time it is called -- and that is much more overhead than the original function, since now at runtime, we perform the whole recursion on the expression tree and then call the compiler on the result. Not good.
Method dispatch & compilation
So we need a way to intermingle runtime and compile time. Enter #generated functions. These will at runtime dispatch on a type, and then work like a macro defining the function body.
First about type dispatch. If we have
f(x) = x + 1
and have a function call f(1), about the following will happen:
The type of the argument is determined (Int)
The method table of the function is consulted to find the best matching method
The method body is compiled for the specific Int argument type, if that hasn't been done before
The compiled method is evaluated on the concrete argument
If we then enter f(1.0), the same will happen again, with a new, different specialized method being compiled for Float64, based on the same function body.
Value types & singleton types
Now, Julia has the peculiar feature that you can use numbers as types. That means that the dispatch process outlined above will also work on the following function:
g(::Type{Val{N}}) where N = N + 1
That's a bit tricky. Remember that types are themselves values in Julia: Int isa Type.
Here, Val{N} is for every N a so-called singleton type having exactly one instance, namely Val{N}() -- just like Int is a type having many instances 0, -1, 1, -2, ....
Type{T} is also a singleton type, having as its single instance the type T. Int is a Type{Int}, and Val{3} is a Type{Val{3}} -- in fact, both are the only values of their type.
So, for each N, there is a type Val{N}, being the single instance of Type{Val{N}}. Thus, g will be dispatched and compiled for each single N. This is how we can dispatch on numbers as types. This already allows for optimization:
julia> #code_llvm g(Val{1})
define i64 #julia_g_61158(i8**) #0 !dbg !5 {
top:
ret i64 2
}
julia> #code_llvm f(1)
define i64 #julia_f_61076(i64) #0 !dbg !5 {
top:
%1 = shl i64 %0, 2
%2 = or i64 %1, 3
%3 = mul i64 %2, %0
%4 = add i64 %3, 2
ret i64 %4
}
But remember that it requires compilation for each new N at the first call.
(And fkt(::T) is just short for fkt(x::T) if you don't use x in the body.)
Integrating generating functions and value types
Finally to generated functions. They work as a slight modification of the above dispatch pattern:
The type of the argument is determined (Int)
The method table of the function is consulted to find the best matching method
The method body is treated as a macro and called with the Int argument type as a parameter, if that hasn't been done before. The resulting expression is compiled into a method.
The compiled method is evaluated on the concrete argument
This pattern allows to change the implementation for each type which the function is dispatched on.
For our concrete setting, we want to dispatch on the Val types representing the arguments of the Fibonacci sequence:
#generated function fib_gen{n}(::Type{Val{n}}, p::Real)
return fib_expr(n, :p)
end
You now see that your explanation was exactly right:
in the first call to fib_gen(Val{3}, 0.5), the parametric function
fib_gen{Val{3}}(...) gets compiled and its content is the fully
expanded expression obtained through fib_expr(3, :p), i.e. 3*2*1*p
with p substituted with the input value.
I hope that the whole story has also answered all three of your listed questions:
The implementation using eval replicates the recursion every time, plus the overhead of compilation
Val is a trick to lift numbers to types, and Type{T} the singleton type containing only T -- but I hope the examples were helpful enough
Compile time is not before execution, because of JIT -- it is every time a method gets compiled first time, because it get's called.
First of all, I am joining myself to the comments: your question is very well written & constructive.
I have reproduced your results using Julia 0.7-beta.
Difference between #generated fib_tot (one piece of code) and fib_gen (that calls fib_expr)
With my julia version results are identicals:
julia> #btime fib_tot(Val{10},0.5)
0.042 ns (0 allocations: 0 bytes)
1.8144e6
julia> #btime fib_gen(Val{10},0.5)
0.042 ns (0 allocations: 0 bytes)
1.8144e6
Sometimes breaking a function into multiple parts see official doc:performance tips can be useful, however in your peculiar case I do not see why this could be useful. At compile time Julia has everything it needs to optimize fib_tot. There is a branch if n<=1 however n is known at "compile time" thanks to the Type{Val{n}} trick and this branch should be removed without problem in the generated (specialized) code.
The Type{Val{n}} trick
To specialize functions, Julia inference is performed according to argument type and not according to argument value.
For instance a compiled version of foo(n::Int) = ... is not generated for each n value. You must define a type that depends on n value to reach this goal. This is precisely how Type{Val{n}} works: Val{n} is simply a parametrized empty structure:
struct Val{T} end
Hence, each Val{1}, Val{2}, ... Val{100}, ... is a different type. By consequence, if foo is defined as:
foo(::Type{Val{n}}) where {n} = ...
Each foo(Val{1}), foo(Val{2}), ... foo(Val{100}) will trigger a specialized foo version (because argument type is different).
The eval(fib_expr(n, 1)) case
This
julia> #btime eval(fib_expr(10, :p))
401.651 μs (99 allocations: 6.45 KiB)
1.8144e6
is slow because your expression is (re-)compiled every time. The problem can be avoided if you use a macro instead (see phg answer).
The fib version
.
julia> #btime fib(10,0.5)
30.778 ns (0 allocations: 0 bytes)
1.8144e6
There is only one compiled version of this fib function. By consequence, it must contain all the runtime branch tests etc... This explains how slow it is.
Just a remark about:
foo{n}(::Type{Val{n}}) deprecated syntax
The foo{n}(::Type{Val{n}}) syntax is deprecated, the new one is foo(::Type{Val{n}}) where {n}. You can read Julia doc, parametric methods for further details.
My Julia version:
julia> versioninfo()
Julia Version 0.7.0-beta.0
Commit f41b1ecaec (2018-06-24 01:32 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2603 v3 # 1.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
I have written a recursive function for getting objects in larger arrays in julia. The following error occured:
ERROR: LoadError: StackOverflowError:
in cat_t at abstractarray.jl:831
in recGetObjChar at /home/user/Desktop/program.jl:1046
in recGetObjChar at /home/user/Desktop/program.jl:1075 (repeats 9179 times)
in getImChars at /home/user/Desktop/program.jl:968
in main at /home/user/Desktop/program.jl:69
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:304
in process_options at ./client.jl:308
in _start at ./client.jl:411
while loading /home/user/Desktop/program.jl, in expression starting on line 78
If you want to have a look at the code, I have already opened an issue (Assertion failed, process aborted). After debugging my code for julia v 0.4, it is more obvious, what causes the problem. The tupel locObj gets much bigger than 9000 entries, because one object can be e.g. 150 x 150 big.
That would result in a length of 22500 for locObj. How big can tupels get, and how can I avoid a stackoverflow? Is there another way to save my values?
As it's commented, I think better approaches exist to work with big arrays of data, and this answer is mainly belongs to this part of your question:
Is there another way to save my values?
I have prepared a test to show how using mmap is helpful when dealing with big array of data, following functions both do the same thing: they create a vector of 3*10E6 float64, then fill it, calculate sum and print result, in the first one (mmaptest()), a memory-map structure have been used to store Vector{Float64} while second one (ramtest()) do the work on machine ram:
function mmaptest()
s = open("./tmp/mmap.bin","w+") # tmp folder must exists in pwd() path
A = Mmap.mmap(s, Vector{Float64}, 3_000_000)
for j=1:3_000_000
A[j]=j
end
println("sum = $(sum(A))")
close(s)
end
function ramtest()
A = Vector{Float64}(3_000_000)
for j=1:3_000_000
A[j]=j
end
println("sum = $(sum(A))")
end
then both functions have been called and memory allocation size was calculated:
julia> gc(); # => remove old handles to closed stream
julia> #allocated mmaptest()
sum = 4.5000015e12
861684
julia> #allocated ramtest()
sum = 4.5000015e12
24072791
It's obvious from those tests that with a memory-map object, memory allocation is much smaller.
julia> gc()
julia> #time ramtest()
sum = 4.5000015e12
0.012584 seconds (29 allocations: 22.889 MB, 3.43% gc time)
julia> #time mmaptest()
sum = 4.5000015e12
0.019602 seconds (58 allocations: 2.277 KB)
as it's clear from #time test, using mmap makes the code slower while needs less memory.
I wish it helps you, regards.
In MATLAB, there are a pair of functions tic and toc
which can be used to start and stop a stopwatch timer.
An example taken from link:
tic
A = rand(12000, 4400);
B = rand(12000, 4400);
toc
C = A'.*B';
toc
I am aware that there is a macro #time in Julia
that has similar functionality.
julia> #time [sin(cos(i)) for i in 1:100000];
elapsed time: 0.00721026 seconds (800048 bytes allocated)
Is there a set of similar functions in Julia?
The #time macro works well for timing statements
that can be written in one or two lines.
For longer portions of code,
I would prefer to use tic-toc functions.
What I tried
When I googled "julia stopwatch",
I found one useful link and four unrelated links.
Introducing Julia/Metaprogramming - Wikibooks, open ...
Meta-programming is when you write Julia code to process and modify Julia code. ... The #time macro inserts a "start the stopwatch" command at the beginning ...
Our Invisible Stopwatch promo - YouTube
Video for julia stopwatch
Julia Larson on Twitter: "This #Mac OSX timer/stopwatch is ...
Timing episodes of The French Chef with a stopwatch
julia griffith | Oiselle Running Apparel for Women
I don't know why I hadn't thought of just trying tic() and toc().
tic() and toc() have been deprecated as of https://github.com/JuliaLang/julia/commit/1b023388f49e13e7a42a899c12602d0fd5d60b0a
You can use #elapsed and #time for longer chunks by wrapping them in an environment, like so:
t = #elapsed begin
...
end
There is also TickTock.jl, which has reimplemented tic() and toc() and tick() and tock().
using TickTock
tick()
# Started timer at 2017-12-13T22:30:59.632
tock()
# 55.052638936 ms: 55 seconds, 52 milliseconds
From a search of the Julia documentation
tic()
Set a timer to be read by the next call to toc() or toq(). The
macro call #time expr can also be used to time evaluation.
You can also use time() to get similar output. For example:
t = time()
# code block here
# ...
# ...
dt = time() - t
I have a piece of code in Julia in which a solver iterates many, many times as it seeks a solution to a very complex problem. At present, I have to provide a number of iterations for the code to do, set low enough that I don't have to wait hours for the code to halt in order to save the current state, but high enough that I don't have to keep activating the code every 5 minutes.
Is there a way, with the current state of Julia (0.2), to detect a keystroke instructing the code to either end without saving (in case of problems) or end with saving? I require a method such that the code will continue unimpeded unless such a keystroke event has happened, and that will interrupt on any iteration.
Essentially, I'm looking for a command that will read in a keystroke if a keystroke has occurred (while the terminal that Julia is running in has focus), and run certain code if the keystroke was a specific key. Is this possible?
Note: I'm running julia via xfce4-terminal on Xubuntu, in case that affects the required command.
You can you an asynchronous task to read from STDIN, blocking until something is available to read. In your main computation task, when you are ready to check for input, you can call yield() to lend a few cycles to the read task, and check a global to see if anything was read. For example:
input = ""
#async while true
global input = readavailable(STDIN)
end
for i = 1:10^6 # some long-running computation
if isempty(input)
yield()
else
println("GOT INPUT: ", input)
global input = ""
end
# do some other work here
end
Note that, since this is cooperative multithreading, there are no race conditions.
You may be able to achieve this by sending an interrupt (Ctrl+C). This should work from the REPL without any changes to your code – if you want to implement saving you'll have to handle the resulting InterruptException and prompt the user.
I had some trouble with the answer from steven-g-johnson, and ended up using a Channel to communicate between tasks:
function kbtest()
# allow 'q' pressed on the keyboard to break the loop
quitChannel = Channel(10)
#async while true
kb_input = readline(stdin)
if contains(lowercase(kb_input), "q")
put!(quitChannel, 1)
break
end
end
start_time = time()
while (time() - start_time) < 10
if isready(quitChannel)
break
end
println("in loop # $(time() - start_time)")
sleep(1)
end
println("out of loop # $(time() - start_time)")
end
This requires pressing and then , which works well for my needs.