How to store the output of @time in a variable? - julia

Is it possible to store the time displayed with @time in a variable?
For example, the following code
for i in 1:10
    @time my_function(i)
end
displays the wall time of my function my_function, but I would like to store the number of milliseconds in an array instead, in order to display it in a plot showing the evolution of the execution time as a function of the parameter i.

The simplest is to use @elapsed, e.g.:
julia> [@elapsed rand(5^i) for i in 1:10]
10-element Vector{Float64}:
3.96e-6
4.64e-7
7.55e-7
3.909e-6
4.43e-6
1.5367e-5
7.0791e-5
0.000402877
0.001831287
0.071062595
and if you use BenchmarkTools.jl then there is also the @belapsed macro for more accurate benchmarking than @elapsed.
EDIT:
@time: prints the time it took to execute, the number of allocations, and the total number of bytes its execution caused to be allocated, before returning the value of the expression. Any time spent garbage collecting (gc) or compiling is shown as a percentage.
@elapsed: discards the resulting value, instead returning the number of seconds it took to execute as a floating-point number.
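A minimal illustration of the difference (a sketch; the summed range is arbitrary):

```julia
v = @time sum(1:10^6)      # prints timing info, then returns the value of the expression
t = @elapsed sum(1:10^6)   # returns the elapsed seconds as a Float64, discarding the sum
```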

I would like to add another example using @elapsed begin to show how it can be used to time multiple lines of code:
dt = @elapsed begin
    x = 1
    y = 2
    z = x^2 + y
    print(z)
end
Additionally, if this is not for benchmarking code and you just want the time as an output, you can alternatively use time():
t = time()
x = 1
y = 2
z = x^2 + y
print(z)
dt = time() - t
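Tying this back to the original question, the per-iteration timings can be collected into a vector for plotting (a sketch: my_function here is a hypothetical stand-in for the asker's function, and the factor 1000 converts @elapsed's seconds into the milliseconds the asker wanted):

```julia
my_function(i) = sum(rand(10_000 * i))   # hypothetical stand-in for the asker's function

# one entry per value of i, in milliseconds, ready to plot against i
times_ms = [1000 * @elapsed(my_function(i)) for i in 1:10]
```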

Related

Schröder's big number sequence

I am implementing a recursive program to calculate certain values in the Schröder sequence, and I'm having two problems:
I need to calculate the number of calls in the program;
Past a certain number, the program will generate incorrect values (I think it's because the number is too big);
Here is the code:
let rec schroder n =
  if n <= 0 then 1
  else if n = 1 then 2
  else 3 * schroder (n-1) + sum n 1
and sum n k =
  if k > n-2 then 0
  else schroder k * schroder (n-k-1) + sum n (k+1)
When I try to return tuples (for problem 1), the function sum stops working because it's trying to return int when it has type int * int;
Regarding 2., when I do schroder 15 it returns:
-357364258
when it should be returning
3937603038.
EDIT:
Firstly, thanks for the tips. Secondly, after some hours of deep struggle, I managed to create the function. Now my problem is that I'm struggling to install zarith. I think I got it installed, but ..
In the terminal, when I do ocamlc -I +zarith test.ml I get an error saying Required module 'Z' is unavailable.
In utop, after doing #load "zarith.cma";; and #install_printer Z.pp_print;; I can compile and run the function, and it works. However, I'm trying to implement a Scanf.scanf so that I can print different values of the sequence. That said, whenever I try to run the scanf, I don't get a chance to write any number, as I get a message saying that '\\n' is not a decimal digit.
That said, I will most probably also have problems with printing the value, because I don't think I'm going to be able to print such a big number with a %d. The let r1,c1 = in the following code is an example of what I'm talking about.
Here's what i'm using :
(function)
..
let v1, v2 = Scanf.scanf "%d %d" (fun v1 v2-> v1,v2);;
let r1,c1 = schroder_a (Big_int_Z.of_int v1) in
Printf.printf "%d %d\n" (Big_int_Z.int_of_big_int r1) (Big_int_Z.int_of_big_int c1);
let r2,c2 = schroder_a v2 in
Printf.printf "%d %d\n" r2 c2;
P.S. 'r1' and 'r2' stand for result, and 'c1' and 'c2' stand for the number of calls of schroder's recursive function.
P.P.S. The prints are written differently because I was just testing, but I can't even get past the scanf, so..
This is the third time I've seen this problem here on StackOverflow, so I assume it's some kind of school assignment. As such, I'm just going to make some comments.
OCaml doesn't have a function named sum built in. If it's a function you've written yourself, the obvious suggestion would be to rewrite it so that it knows how to add up the tuples that you want to return. That would be one approach, at any rate.
It's true, ints in OCaml are subject to overflow. If you want to calculate larger values you need to use a "big number" package. The one to use with a modern OCaml is Zarith (I have linked to the description on ocaml.org).
However, none of the other people solving this assignment have mentioned overflow as a problem. It could be that you're OK if you just solve for representable OCaml int values.
3937603038 is larger than what a 32-bit int can hold, and will therefore overflow. You can fix this by using int64 instead (until you overflow that too). You'll have to use int64 literals, using the L suffix, and operations from the Int64 module. Here's your code converted to compute the value as an int64:
let rec schroder n =
  if n <= 0 then 1L
  else if n = 1 then 2L
  else Int64.add (Int64.mul 3L (schroder (n-1))) (sum n 1)
and sum n k =
  if k > n-2 then 0L
  else Int64.add (Int64.mul (schroder k) (schroder (n-k-1))) (sum n (k+1))
I need to calculate the number of calls in the program;
...
the function 'sum' stops working because it's trying to return 'int' when it has type 'int * int'
Make sure that you have updated all the recursive calls to schroder. Remember it now returns a pair, not a number, so you can't, for example, just add it; you need to unpack the pair first. E.g.,
...
else
  let r, i = schroder (n-1) (i+1) in
  3 * r + sum n 1
and ...
and so on.
Past a certain number, the program will generate incorrect values (I think it's because the number is too big);
You need to use arbitrary-precision numbers, e.g., zarith.

Julia: parallelize operations on complex data structures (e.g. DataFrames)

I would like to process a number of large datasets in parallel. Unfortunately the speedup I get from using Threads.@threads is very sublinear, as the following simplified example shows.
(I'm very new to Julia, so apologies if I missed something obvious)
Let's create some dummy input data - 8 dataframes with 2 integer columns each and 10 million rows:
using DataFrames
n = 8
dfs = Vector{DataFrame}(undef, n)
for i = 1:n
    dfs[i] = DataFrame(Dict("x1" => rand(1:Int64(1e7), Int64(1e7)), "x2" => rand(1:Int64(1e7), Int64(1e7))))
end
Now do some processing on each dataframe (group by x1 and sum x2)
function process(df::DataFrame)::DataFrame
    combine([:x2] => sum, groupby(df, :x1))
end
Finally, compare the speed of doing the processing on a single dataframe with doing it on all 8 dataframes in parallel. The machine I'm running this on has 50 cores and Julia was started with 50 threads, so ideally there should not be much of a time difference.
julia> dfs_res = Vector{DataFrame}(undef, n)
julia> @time for i = 1:1
    dfs_res[i] = process(dfs[i])
end
3.041048 seconds (57.24 M allocations: 1.979 GiB, 4.20% gc time)
julia> Threads.nthreads()
50
julia> @time Threads.@threads for i = 1:n
    dfs_res[i] = process(dfs[i])
end
5.603539 seconds (455.14 M allocations: 15.700 GiB, 39.11% gc time)
So the parallel run takes almost twice as long per dataset (and this gets worse with more datasets). I have a feeling this has something to do with inefficient memory management. GC time is pretty high for the second run. And I assume the preallocation with undef isn't efficient for DataFrames. Pretty much all the examples I've seen for parallel processing in Julia are done on numeric arrays with fixed, a-priori known sizes. However, here the datasets could have arbitrary sizes, columns etc. In R, workflows like that can be done very efficiently with mclapply. Is there something similar (or a different but efficient pattern) in Julia? I chose threads rather than multi-processing to avoid copying data (Julia doesn't seem to support the fork process model like R / mclapply).
Multithreading in Julia does not scale well beyond 16 threads.
Hence you need to use multiprocessing instead.
Your code might look like this:
using DataFrames, Distributed
addprocs(4) # or 50
@everywhere using DataFrames, Distributed
n = 8
dfs = Vector{DataFrame}(undef, n)
for i = 1:n
    dfs[i] = DataFrame(Dict("x1" => rand(1:Int64(1e7), Int64(1e7)), "x2" => rand(1:Int64(1e7), Int64(1e7))))
end
@everywhere function process(df::DataFrame)::DataFrame
    combine([:x2] => sum, groupby(df, :x1))
end
dfs_res = @distributed (vcat) for i = 1:n
    df = process(dfs[i])
    (i, myid(), df)
end
What is important in this type of code is that transferring data between processes takes time. So sometimes you might want to just keep separate DataFrames on separate workers. As always, it depends on your processing architecture.
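A closer analogue of the R mclapply workflow mentioned in the question is pmap from the Distributed standard library, which applies a function across worker processes and collects results in order (a sketch with a toy work function rather than the DataFrame pipeline above):

```julia
using Distributed
addprocs(2)                    # spawn two worker processes
@everywhere heavy(x) = x^2     # the work function must be defined on every worker
results = pmap(heavy, 1:8)     # distributes the calls across workers, returns results in order
```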
Edit: some notes on performance
For testing, put your code in functions and use consts (or use BenchmarkTools.jl):
using DataFrames
const dfs = [DataFrame(Dict("x1" => rand(1:Int64(1e7), Int64(1e7)), "x2" => rand(1:Int64(1e7), Int64(1e7)))) for i in 1:8 ]
function process(df::DataFrame)::DataFrame
    combine([:x2] => sum, groupby(df, :x1))
end
function p1!(res, d)
    for i = 1:8
        res[i] = process(d[i])
    end
end
function p2!(res, d)
    Threads.@threads for i = 1:8
        res[i] = process(d[i])
    end
end
const dres = Vector{DataFrame}(undef, 8)
And here are the results:
julia> GC.gc(); @time p1!(dres, dfs)
30.840718 seconds (507.28 M allocations: 16.532 GiB, 6.42% gc time)
julia> GC.gc(); @time p1!(dres, dfs)
30.827676 seconds (505.66 M allocations: 16.451 GiB, 7.91% gc time)
julia> GC.gc(); @time p2!(dres, dfs)
18.002533 seconds (505.77 M allocations: 16.457 GiB, 23.69% gc time)
julia> GC.gc(); @time p2!(dres, dfs)
17.675169 seconds (505.66 M allocations: 16.451 GiB, 23.64% gc time)
Why is the difference only approx 2x on an 8-core machine? Because we spent most of the time garbage collecting! (Look at the output in your question - the problem is the same.)
When your code uses less RAM you will see a better multithreading speed-up, up to about 3x.
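As an aside, the combine([:x2] => sum, groupby(df, :x1)) call in the question uses an older DataFrames.jl API; in current releases the grouped DataFrame comes first and the output column is named x2_sum by default. A minimal sketch:

```julia
using DataFrames

df = DataFrame(x1 = [1, 1, 2], x2 = [10, 20, 30])
res = combine(groupby(df, :x1), :x2 => sum)   # one row per group, columns :x1 and :x2_sum
```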

Interpreting Body expression from #code_warntype

If we run:
@code_warntype deepcopy(rand(2))
at the Julia REPL, the output contains flagged values in the Body expression. Specifically, the two Any at the end of:
Body:
begin # deepcopy.jl, line 8:
GenSym(0) = (Base.Array)(Base.Any,32)::Array{Any,1}
return (Base.deepcopy_internal)(x::Array{Float64,1},$(Expr(:new, :((top(getfield))(Base,:ObjectIdDict)::Type{ObjectIdDict}), GenSym(0))))::Any
end::Any
I understand from this question that we usually don't need to worry about flagged values in the Body expression if our primary concern is type instability. So instead, my question is this:
Why does a fairly simple function from Base generate any flagged values in @code_warntype? I'm sure there are good reasons, but I am new to interpreting the output from @code_warntype, and had some trouble understanding the discussion of the Body expression in the official docs.
This is an example of a situation where type inference is unable to figure out the return type of a function. (Note the ::Any on the return value!) It is a problem, not because the computation itself will be slower because of type instability, but because the return type cannot be inferred, and so future computations using the return type will suffer from type instability.
You can see this effect by looking at allocations below:
julia> function f()
    y = rand(10)
    @time y[1] + y[10]
    z = deepcopy(y)
    @time z[1] + z[10]
end
f (generic function with 1 method)
julia> f(); # ignore output here on first compile
julia> f();
0.000000 seconds
0.000002 seconds (3 allocations: 48 bytes)
Note that the second operation requires allocations and takes time, because unboxing and dynamic dispatch are involved.
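On versions where deepcopy's return type is not inferred, a manual type assertion at the call site restores inferability for the code that follows (a sketch):

```julia
function g()
    y = rand(10)
    z = deepcopy(y)::Vector{Float64}  # assert the return type so later uses are type-stable
    return z[1] + z[10]
end
```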
In the current nightly build of what will become 0.5 (which will likely be released within a few months), this has been fixed. Thus
julia> @code_warntype deepcopy(rand(2))
Variables:
#self#::Base.#deepcopy
x::Array{Float64,1}
Body:
begin # deepcopy.jl, line 8:
# meta: location dict.jl Type # dict.jl, line 338:
SSAValue(1) = (Core.ccall)(:jl_alloc_array_1d,(Core.apply_type)(Core.Array,Any,1)::Type{Array{Any,1}},(Core.svec)(Core.Any,Core.Int)::SimpleVector,Array{Any,1},0,32,0)::Array{Any,1}
# meta: pop location
return (Core.typeassert)((Base.deepcopy_internal)(x::Array{Float64,1},$(Expr(:new, :(Base.ObjectIdDict), SSAValue(1))))::Any,Array{Float64,1})::Array{Float64,1}
end::Array{Float64,1}
which has no type instability, and
julia> f()
0.000000 seconds
0.000000 seconds
which has no dynamic dispatch and no allocations.

Command timeouts in Julia

I have a Julia script that repeatedly calls a C++ program to perform an optimization. The C++ program writes a text file, then I have Julia read the results and decide what to do next. The problem is that occasionally (maybe 1 in 1000+ times) the C++ program freezes (the optimization probably gets stuck), and my entire script hangs indefinitely, making it very difficult for the script to make it through all necessary program calls. Is there a way I can add a timeout, so that if the program has not finished within 10 minutes I can restart with a new guess value?
Simplified example:
for k = 1:10
    run(`program inputs`)
end
Desired:
max_runtime = 10*60 # 10 minutes
for k = 1:10
    run(`program inputs`, max_runtime)
end
Alternative:
max_runtime = 10*60 # 10 minutes
for k = 1:10
    deadline(function, max_runtime)
end
How about something like:
max_runtime = 10*60 # 10 minutes
for k = 1:10
    proc = spawn(`program inputs`)
    timedwait(() -> process_exited(proc), max_runtime)
    if process_running(proc)
        kill(proc)
    end
end
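In Julia 1.x, spawn was folded into run via the wait=false keyword; the same pattern then looks like the sketch below, with `sleep 600` standing in for the hanging program and a short deadline for demonstration (the asker would use 10 * 60.0 and their own command):

```julia
max_runtime = 2.0                         # seconds; shortened here for demonstration
proc = run(`sleep 600`; wait=false)       # start the external program without blocking
status = timedwait(() -> process_exited(proc), max_runtime)
if process_running(proc)                  # deadline passed and the process is still going
    kill(proc)                            # give up on this run; the caller can retry with a new guess
end
```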

Write a variable number of arguments for IF

I am trying to write a function that would solve any general version of this problem:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
The way I solved it for this instance was:
multiples = Array(Int, 0)
[ (i % 3 == 0 || i % 5 == 0) && push!(multiples, i) for i in 1:1000 ]
sum(multiples)
I want to write a function that will take an array of multiples (in this case, [3,5]) and the final number (in this case, 1000). The point is that the array can consist of arbitrarily many numbers, not just two (e.g., [3,5,6]). Then the function should run i % N == 0 for each N.
How do I do that most efficiently? Could that involve metaprogramming? (The code doesn't have to be in a list comprehension format.)
Thanks!
Very first thing that popped into my head was the following, using modulo division and a functional style:
v1(N,M) = sum(filter(k->any(j->k%j==0,M), 1:N))
But I wanted to investigate some alternatives, as this has two problems:
Two levels of anonymous function, something Julia doesn't optimize well (yet).
Creates a temporary array from the range, then sums.
So here is the most obvious alternative, the C-style version of the one-liner:
function v2(N,M)
    sum_so_far = 0
    for k in 1:N
        for j in M
            if k%j==0
                sum_so_far += k
                break
            end
        end
    end
    return sum_so_far
end
But then I thought about it some more, and remembered reading somewhere that modulo division is a slow operation. I wanted to see how IntSets perform - a set type specialized for integers. So here is another one-liner: IntSets without any modulo division, in a functional style!
v3(N,M) = sum(union(map(j->IntSet(j:j:N), M)...))
Expanding the map into a for loop and repeatedly applying union! to a single IntSet was not much better, so I won't include that here. To break this down:
IntSet(j:j:N) is all the multiples of j between j and N
j->IntSet(j:j:N) is an anonymous function that returns that IntSet
map(j->IntSet(j:j:N), M) applies that function to each j in M, and returns a Vector{IntSet}.
The ... "splats" the vector out into arguments of union
union creates an IntSet that is the union of its arguments - in this case, all multiples of the numbers in M
Then we sum it to finish
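One note for readers on current Julia: IntSet has since been renamed BitSet, so the same one-liner reads (a sketch):

```julia
# BitSet is the modern name for IntSet; the logic is unchanged:
# union the multiples of each element of M up to N, then sum the set.
v3(N, M) = sum(union(map(j -> BitSet(j:j:N), M)...))

v3(999, [3, 5])   # sums every multiple of 3 or 5 below 1000
```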
I benchmarked these with
N,M = 10000000, [3,4,5]
which gives you
One-liner: 2.857292874 seconds (826006352 bytes allocated, 10.49% gc time)
C-style: 0.190581908 seconds (176 bytes allocated)
IntSet no modulo: 0.121820101 seconds (16781040 bytes allocated)
So you can definitely even beat C-style code with higher-level objects - modulo is that expensive, I guess! The neat thing about the no-modulo version is that it parallelizes quite easily:
addprocs(3)
@everywhere worker(j,N) = IntSet(j:j:N)
v4(N,M) = sum(union(pmap(j->worker(j,N),M)...))
@show v4(1000, [3,5])
@time v3(1000000000,[3,4,5]); # bigger N than before
@time v4(1000000000,[3,4,5]);
which gives
elapsed time: 12.279323079 seconds (2147831540 bytes allocated, 0.94% gc time)
elapsed time: 10.322364457 seconds (1019935752 bytes allocated, 0.71% gc time)
which isn't much better, but it's something, I suppose.
Okay, here is my updated answer.
Based on the benchmarks in the answer of @IainDunning, the method to beat is his v2. My approach below appears to be much faster, but I'm not clever enough to generalize it to input vectors of length greater than 2. A good mathematician should be able to improve on my answer.
Quick intuition: For the case length(M) = 2, the problem reduces to the sum of all multiples of M[1] up to N added to the sum of all multiples of M[2] up to N, where, to avoid double-counting, we then need to subtract the sum of all multiples of lcm(M[1], M[2]) up to N (for coprime values like 3 and 5 this is just M[1]*M[2]). A similar algorithm could be implemented for length(M) > 2, but the double-counting issue becomes much more complicated very quickly. I suspect a general algorithm for this definitely exists (it is the kind of issue that crops up all the time in the field of combinatorics) but I don't know it off the top of my head.
Here is the test code for my approach (f1) versus v2:
function f1(N, M)
    if length(M) > 2
        error("I'm not clever enough for this case")
    end
    runningSum = 0
    for c = 1:length(M)
        runningSum += sum(M[c]:M[c]:N)
    end
    for c1 = 1:length(M)
        for c2 = c1+1:length(M)
            temp1 = lcm(M[c1], M[c2])  # lcm rather than the product, so non-coprime pairs are not mis-counted
            runningSum -= sum(temp1:temp1:N)
        end
    end
    return runningSum
end
function v2(N, M)
    sum_so_far = 0
    for k in 1:N
        for j in M
            if k%j==0
                sum_so_far += k
                break
            end
        end
    end
    return sum_so_far
end
f1(1000, [3,5])
v2(1000, [3,5])
N = 10000000
M = [3,5]
#time f1(N, M)
#time v2(N, M)
Timings are:
elapsed time: 4.744e-6 seconds (96 bytes allocated)
elapsed time: 0.201480996 seconds (96 bytes allocated)
Sorry, this is an interesting problem but I'm afraid I have to get back to work :-) I'll check back in later if I get a chance...
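For completeness, the double-counting that blocks the generalization above can be handled for any length(M) by inclusion-exclusion over the subsets of M, using lcm for the overlaps (a sketch, not part of the original answers; sum_multiples is a name I made up):

```julia
# Sum of all multiples of any element of M up to N, by inclusion-exclusion:
# each nonempty subset S of M contributes the sum of the multiples of lcm(S),
# added for odd-sized subsets and subtracted for even-sized ones.
function sum_multiples(N, M)
    total = 0
    for mask in 1:(2^length(M) - 1)                 # enumerate nonempty subsets as bitmasks
        chosen = [M[i] for i in 1:length(M) if (mask >> (i - 1)) & 1 == 1]
        l = reduce(lcm, chosen)
        total += (isodd(length(chosen)) ? 1 : -1) * sum(l:l:N)
    end
    return total
end
```

sum_multiples(999, [3, 5]) reproduces the Project Euler instance, and non-coprime inputs such as [4, 6] are handled correctly because lcm, not the product, measures the overlap.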
