Channel in Julia

I read about channels in Julia in the context of parallel computing.
I understood that a channel lets one task write while another reads at the same time. But when I tried it, I found that take! returns the first item in the channel and then removes it! How can another process still read that value? How does a channel make anything parallel? I also don't know when it is a good idea to use one, or how efficient it is.
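For reference, a minimal single-process sketch of these put!/take! semantics:

c = Channel{Int}(2)   # buffered channel with capacity 2
put!(c, 1)            # producer side: enqueue items
put!(c, 2)
take!(c)              # consumer side: removes and returns the oldest item -> 1
take!(c)              # -> 2; each item goes to exactly one taker
close(c)

Since take! blocks on an empty channel and put! blocks on a full one, several tasks can share one channel safely, but every queued item is handed to exactly one taker.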
EDIT:
This is a simple program showing how to use channels for parallelization:
julia> addprocs(4)
4-element Array{Int64,1}:
 2
 3
 4
 5

julia> const jobs = RemoteChannel(()->Channel{Int}(32))
RemoteChannel{Channel{Int64}}(1, 1, 6)

julia> const results = RemoteChannel(()->Channel{Tuple}(32))
RemoteChannel{Channel{Tuple}}(1, 1, 7)

julia> @everywhere function do_work(jobs, results)
           while true
               job_id = take!(jobs)
               exec_time = rand()
               sleep(exec_time)
               put!(results, (job_id, exec_time, myid()))
           end
       end

julia> function make_jobs(n)
           for i in 1:n
               put!(jobs, i)
           end
       end
make_jobs (generic function with 1 method)

julia> n = 12
12

julia> @schedule make_jobs(n)
Task (done) @0x00007fa873905a50

julia> for p in workers()
           @async remote_do(do_work, p, jobs, results)
       end

julia> @elapsed while n > 0
           job_id, exec_time, where = take!(results)
           println("$job_id finished in $(round(exec_time,2)) seconds on worker $where")
           n = n - 1
       end
4 finished in 0.13 seconds on worker 5
1 finished in 0.21 seconds on worker 4
2 finished in 0.64 seconds on worker 3
3 finished in 0.82 seconds on worker 2
6 finished in 0.72 seconds on worker 5
7 finished in 0.35 seconds on worker 3
5 finished in 0.89 seconds on worker 4
11 finished in 0.33 seconds on worker 4
10 finished in 0.61 seconds on worker 3
8 finished in 0.82 seconds on worker 2
9 finished in 0.71 seconds on worker 5
12 finished in 0.24 seconds on worker 4
0.082128167
I can't understand this! We could use a plain for loop instead of the do_work function, save the results into a SharedArray, and parallelize that with dynamic scheduling or something similar.
Why write it this way?
Actually my question is: when should we use this pattern, and why?
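For example, here is a sketch of the alternative I have in mind (written for Julia 1.x, where Distributed and SharedArrays must be loaded explicitly):

using Distributed, SharedArrays
addprocs(4)

n = 12
exec_times = SharedArray{Float64}(n)   # one slot per job, visible to every worker

@sync @distributed for job_id in 1:n
    t = rand()
    sleep(t)                # simulate the work
    exec_times[job_id] = t  # each iteration writes only its own slot
end

(One difference I can see: @distributed splits the 1:n range between the workers up front, while the jobs channel lets each idle worker pull the next job as soon as it is free.)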

Related

How to obtain the execution time of a function in Julia?

I want to obtain the execution time of a function in Julia. Here is a minimum working example:
function raise_to(n)
    for i in 1:n
        y = (1/7)^n
    end
end
How can I obtain the time it took to execute raise_to(10)?
The recommended way to benchmark a function is to use BenchmarkTools:
julia> function raise_to(n)
           y = (1/7)^n
       end
raise_to (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime raise_to(10)
  1.815 ns (0 allocations: 0 bytes)
Note that repeating the computation numerous times (like you did in your example) is a good idea to get more accurate measurements, but BenchmarkTools does that for you.
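If you want more detail than a single number, BenchmarkTools also provides the @benchmark macro, which prints minimum, median, and mean times together with allocation statistics (output omitted here, since it varies from machine to machine):

julia> @benchmark raise_to(10)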
Also note that BenchmarkTools avoids many pitfalls of merely using @time. Most notably, with @time you're likely to measure compilation time in addition to run time. This is why the first invocation of @time often displays larger times/allocations:
# First invocation: the method gets compiled
# Large resource consumption
julia> @time raise_to(10)
  0.007901 seconds (7.70 k allocations: 475.745 KiB)
3.5401331746414338e-9

# Subsequent invocations: stable and low timings
julia> @time raise_to(10)
  0.000003 seconds (5 allocations: 176 bytes)
3.5401331746414338e-9

julia> @time raise_to(10)
  0.000002 seconds (5 allocations: 176 bytes)
3.5401331746414338e-9

julia> @time raise_to(10)
  0.000001 seconds (5 allocations: 176 bytes)
3.5401331746414338e-9
@time
@time works as mentioned in the previous answers, but it will include compilation time if it is the first time you call the function in your Julia session.
https://docs.julialang.org/en/v1/manual/performance-tips/#Measure-performance-with-%5B%40time%5D%28%40ref%29-and-pay-attention-to-memory-allocation-1
@btime
You can also use @btime if you put using BenchmarkTools in your code.
https://github.com/JuliaCI/BenchmarkTools.jl
This will rerun your function many times after an initial compilation run, and then report the minimum time.
julia> using BenchmarkTools
julia> @btime sin(x) setup=(x=rand())
  4.361 ns (0 allocations: 0 bytes)
0.49587200950472454
@timeit
Another super useful library for profiling is TimerOutputs.jl:
https://github.com/KristofferC/TimerOutputs.jl
using TimerOutputs

# Create the `TimerOutput` object that accumulates the measurements
const to = TimerOutput()

# Time a section of code with the label "sleep" in the `TimerOutput` named "to"
@timeit to "sleep" sleep(0.02)

# ... several more calls to @timeit

print_timer(to)
 ──────────────────────────────────────────────────────────────────────
                                Time                    Allocations
                        ──────────────────────   ───────────────────────
   Tot / % measured:         5.09s / 56.0%            106MiB / 74.6%

 Section        ncalls     time   %tot      avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────
 sleep             101    1.17s  41.2%   11.6ms   1.48MiB  1.88%  15.0KiB
 nest 2              1    703ms  24.6%    703ms   2.38KiB  0.00%  2.38KiB
   level 2.2         1    402ms  14.1%    402ms      368B  0.00%   368.0B
   level 2.1         1    301ms  10.6%    301ms      368B  0.00%   368.0B
 throwing            1    502ms  17.6%    502ms      384B  0.00%     384B
 nest 1              1    396ms  13.9%    396ms   5.11KiB  0.01%  5.11KiB
   level 2.2         1    201ms  7.06%    201ms      368B  0.00%     368B
   level 2.1         3   93.5ms  3.28%   31.2ms   1.08KiB  0.00%   368.0B
 randoms             1   77.5ms  2.72%   77.5ms   77.3MiB  98.1%  77.3MiB
 funcdef             1   2.66μs  0.00%   2.66μs         -  0.00%        -
 ──────────────────────────────────────────────────────────────────────
Macros can take begin ... end blocks
As seen in the docs for these macros, they can cover multiple statements or function calls:
@my_macro begin
    statement1
    statement2
    # ...
    statement3
end
Hope that helps.
The @time macro can be used to tell you how long a function took to evaluate. It also reports how much memory was allocated.
julia> function raise_to(n)
           for i in 1:n
               y = (1/7)^n
           end
       end
raise_to (generic function with 1 method)

julia> @time raise_to(10)
  0.093018 seconds (26.00 k allocations: 1.461 MiB)
It would be nice to add that if you want to measure the run time of a code block, you can do the following:
@time begin
    # your code
end

Profiling/Memory allocation in Julia

I am running an empty double loop in Julia:
Ngal = 16000000

function get_vinz()
    for i in 1:5
        print(i, " ")
        for j in i:Ngal
        end
    end
end
and the outcome of @time get_vinz() gives me
1 2 3 4 5 5.332660 seconds (248.94 M allocations: 4.946 GiB, 7.12% gc time)
What is the 5GB of memory allocated for?
The culprit is the use of a global variable. Your function doesn't know the size of the inner loop, so it looks up Ngal in global scope on every single iteration, and each of those accesses allocates boxed values, about 64 bytes per iteration: 64 * 16000000 * 5 / 1024 / 1024 ≈ 4882.8 MiB, which matches the reported 4.946 GiB. Compare that with this implementation:
function get_vinz(Ngal)
    for i in 1:5
        print(i, " ")
        for j in i:Ngal
        end
    end
end

julia> @time get_vinz(Ngal)
1 2 3 4 5 0.043481 seconds (53.67 k allocations: 2.776 MiB)
Also, the first time a function is called in Julia it is compiled to machine code, so subsequent runs are fast. Measuring the time again:
julia> @time get_vinz(Ngal)
1 2 3 4 5 0.000639 seconds (50 allocations: 1.578 KiB)
Using global variables is bad practice in general; the recommended way is to pass such values to the function as arguments.
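Another option, if you really want a global, is to declare it const so the compiler knows its type; a minimal sketch of that variant (get_vinz_const is a name I'm introducing):

const Ngal = 16000000   # a const global has a fixed type, so access is cheap

function get_vinz_const()
    for i in 1:5
        print(i, " ")
        for j in i:Ngal
        end
    end
end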

how to change max recursion depth in Julia?

I was curious how quick and accurate the algorithm from Rosetta Code (https://rosettacode.org/wiki/Ackermann_function) could be for the parameters (4, 2), but I got a StackOverflowError.
julia> using Memoize

julia> @memoize ack3(m, n) =
           m == 0 ? n + 1 :
           n == 0 ? ack3(m-1, 1) :
                    ack3(m-1, ack3(m, n-1))

# WARNING! The next line would have to calculate and print a number with 19729 digits!
julia> ack3(4,2) # -> StackOverflowError
# should be -> 2003529930406846464979072351560255750447825475569751419265016973710894059556311
# ...
# 4717124577965048175856395072895337539755822087777506072339445587895905719156733
EDIT:
Oscar Smith is right that trying ack3(4,2) this way is unrealistic. Here is a version translated from Rosetta's C++:
module Ackermann
function ackermann(m::UInt, n::UInt)
    function ack(m::UInt, n::BigInt)
        if m == 0
            return n + 1
        elseif m == 1
            return n + 2
        elseif m == 2
            return 3 + 2 * n
        elseif m == 3
            return 5 + 8 * (BigInt(2) ^ n - 1)
        else
            if n == 0
                return ack(m - 1, BigInt(1))
            else
                return ack(m - 1, ack(m, n - 1))
            end
        end
    end
    return ack(m, BigInt(n))
end
end
julia> import Ackermann; Ackermann.ackermann(UInt(1),UInt(1)); @time(a4_2 = Ackermann.ackermann(UInt(4),UInt(2))); t = "$a4_2"; println("len = $(length(t)) first_digits=$(t[1:20]) last digits=$(t[end-20:end])")
0.000041 seconds (57 allocations: 33.344 KiB)
len = 19729 first_digits=20035299304068464649 last digits=445587895905719156733
Julia itself does not have an internal limit to the stack size, but your operating system does. The exact limits here (and how to change them) will be system dependent. On my Mac (and I assume other POSIX-y systems), I can check and change the stack size of programs that get called by my shell with ulimit:
$ ulimit -s
8192
$ julia -q
julia> f(x) = x > 0 ? f(x-1) : 0 # a simpler recursive function
f (generic function with 1 method)
julia> f(523918)
0
julia> f(523919)
ERROR: StackOverflowError:
Stacktrace:
[1] f(::Int64) at ./REPL[1]:1 (repeats 80000 times)
$ ulimit -s 16384
$ julia -q
julia> f(x) = x > 0 ? f(x-1) : 0
f (generic function with 1 method)
julia> f(1048206)
0
julia> f(1048207)
ERROR: StackOverflowError:
Stacktrace:
[1] f(::Int64) at ./REPL[1]:1 (repeats 80000 times)
I believe the exact number of recursive calls that will fit on your stack will depend upon both your system and the complexity of the function itself (that is, how much each recursive call needs to store on the stack); this f is about the bare minimum. I have no idea how big you'd need to make the stack limit in order to compute that Ackermann function.
Note that doubling the stack size more than doubled the number of recursive calls; this is because of a constant overhead:
julia> log2(523918)
18.998981503278365
julia> 2^19 - 523918
370
julia> log2(1048206)
19.99949084151746
julia> 2^20 - 1048206
370
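Put differently, these numbers give a back-of-the-envelope estimate of how much stack each call of this particular f consumes (a rough sketch that ignores the constant 370-call overhead):

(8192 * 1024) / 523918      # ≈ 16.01 bytes of stack per recursive call
(16384 * 1024) / 1048206    # ≈ 16.01 bytes again, consistent across both limits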
Just FYI, even if you change the max recursion depth, you won't get the right answer, since Julia uses 64-bit integers and integer overflow will make things not work. To get the right answer you will have to use BigInts. The next problem is that you probably don't want to memoize: almost none of the computations are repeated, and you would be computing the function for more than 10^19729 different inputs, which you really do not want to store.

Utilizing ndgrid/meshgrid functionality in Julia

I'm trying to find functionality in Julia similar to MATLAB's meshgrid or ndgrid. I know Julia has ndgrid defined in the examples, but when I try to use it I get the following error.
UndefVarError: ndgrid not defined
Does anyone know either how to get the built-in ndgrid function to work, or of another function or library I haven't found that provides these methods (the built-in would be preferred)? I'd rather not write my own in this case.
Thanks!
We prefer to avoid these functions, since they allocate arrays that usually aren't necessary. The values in these arrays have such a regular structure that they don't need to be stored; they can just be computed during iteration. For example, one alternative approach is to write an array comprehension:
julia> [ 10i + j for i=1:5, j=1:5 ]
5×5 Array{Int64,2}:
 11  12  13  14  15
 21  22  23  24  25
 31  32  33  34  35
 41  42  43  44  45
 51  52  53  54  55
Or, you can write for loops, or iterate over a product iterator:
julia> collect(Iterators.product(1:2, 3:4))
2×2 Array{Tuple{Int64,Int64},2}:
 (1, 3)  (1, 4)
 (2, 3)  (2, 4)
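For instance, you can reduce over the product lazily, without ever materializing a grid (a small illustrative example):

julia> sum(i * j for (i, j) in Iterators.product(1:3, 1:3))
36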
I do sometimes find it convenient to use a function like numpy's meshgrid. It's easy to write one with comprehensions:
function meshgrid(x, y)
    X = [i for i in x, j in 1:length(y)]
    Y = [j for i in 1:length(x), j in y]
    return X, Y
end
e.g.
x = 1:4
y = 1:3
X, Y = meshgrid(x, y)
now
julia> X
4×3 Array{Int64,2}:
 1  1  1
 2  2  2
 3  3  3
 4  4  4

julia> Y
4×3 Array{Int64,2}:
 1  2  3
 1  2  3
 1  2  3
 1  2  3
However, I did not find that this makes the code run faster than iteration. Here's what I mean:
After defining
x = 1:1000
y = x
X, Y = meshgrid(x, y)
I benchmarked the following two functions
using Statistics
function fun1()
    return mean(sqrt.(X.*X + Y.*Y))
end

function fun2()
    s = 0.0
    for i in 1:1000
        for j in 1:1000
            s += sqrt(i*i + j*j)
        end
    end
    return s / (1000*1000)
end
Here are the benchmark results:
julia> @btime fun1()
  8.310 ms (19 allocations: 30.52 MiB)

julia> @btime fun2()
  1.671 ms (0 allocations: 0 bytes)
The meshgrid method is both significantly slower and uses more memory. Does any Julia expert know why? I understand that Julia is a compiled language, unlike Python, so iteration won't be slower than vectorization, but I don't understand why the vectorized (array) calculation is many times slower than iteration. (For bigger N this difference is even larger.)
Edit: After reading this post, I have the following updated version of the meshgrid method. The idea is to not create the mesh grid beforehand, but to build it during the calculation via Julia's powerful elementwise array operations:
x = collect(1:1000)
y = x'

function fun1v2()
    mean(sqrt.(x .* x .+ y .* y))
end
The trick here is that .+ between a size-M column array and a size-N row array returns an M-by-N array. It does the meshgrid for you. This function is nearly 3 times faster than fun1, albeit not as fast as fun2.
julia> @btime fun1v2()
  3.189 ms (24 allocations: 7.63 MiB)
765.8435104896155
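For comparison, a generator-based variant avoids allocating any intermediate array at all, much like fun2 (a sketch; fun3 is a name I am introducing here):

using Statistics

fun3() = mean(sqrt(i * i + j * j) for i in 1:1000, j in 1:1000)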
Above, @ChrisRackauckas suggests that the "proper way" to do this is with a lazy operator, but he hadn't gotten around to it.
There is now a registered package with a lazy ndgrid in it:
https://github.com/JuliaArrays/LazyGrids.jl
It is more general than the version in VectorizedRoutines.jl because it can handle vectors of different types, e.g.,
ndgrid(1:3, Float16[0:2], ["x", "y", "z"])
There are Literate.jl examples in the docs that show that the lazy version's performance is pretty good.
Of course a lazy meshgrid is just one step away:
meshgrid(y,x) = (ndgrid_lazy(x,y)[[2,1]]...,)
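A usage sketch, assuming the API shown in the LazyGrids.jl README:

using LazyGrids

(xg, yg) = ndgrid(1:3, 1:2)   # lazy grids; nothing of size 3×2 is stored
size(xg)                      # (3, 2)
xg[2, 1]                      # behaves like the dense version: 2
yg[2, 1]                      # 1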

Why does my R use all CPU cores when running functions like step()?

My R session often shows more than 100% CPU in "top"; does that mean it's using more than one core? As I understand it, R uses one core by default unless certain parallel computing packages are involved, but here I am just using the step() function. The machine is a Dell T410 + Ubuntu Server 14.04 + R 3.3.2.
Is it R 3.3.2, the Dell server, or Ubuntu Server 14.04 that's helping? Or is it just a bug in "top"?
top - 17:42:39 up 11:09,  2 users,  load average: 16.00, 16.01, 15.98
Tasks: 282 total,   3 running, 279 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.9 us, 85.1 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  24668964 total, 23472468 used,  1196496 free,   229884 buffers
KiB Swap: 25145340 total,       60 used, 25145280 free.  1117020 cached Mem

  PID USER PR NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
17704 can  20  0 21.495g 0.020t 13016 R  1540 87.1  4458:52 rsession
17748 can  20  0   26632   1780  1172 S   0.7  0.0  0:50.62 top
 2528 can  20  0  105660   2276  1260 S   0.3  0.0  0:00.01 sshd
R often appears to be using more than one core even when your code is technically single-threaded. This often happens because the operating system switches the R process between different processors too quickly for "top" to attribute the load to a single core. I created the code below as an example; when I run it on my Windows 10 machine, I see two processors hard at work.
library(microbenchmark)

pb <- txtProgressBar(min = 0, max = 100, style = 3)
for (i in 1:100) {
    microbenchmark(rnorm(10000), runif(10000), rpois(10000, 1))
    setTxtProgressBar(pb, i)
}
close(pb)
If you're interested in learning more about the situations in which a computer will try to stick with the same logical processor, look up "Processor Affinity."
