Best way to subtract Matrix from Vector of Matrix - julia

I want to subtract a Matrix from a Vector of Matrix. For example, with the variables defined below, I want to subtract y from each element of xl:
xl = [Matrix{Float64}(undef, 2, 3) for i in 1:1000]
y = Matrix{Float64}(undef, 2, 3)
I know this can be done with the map function:
map(x -> x-y, xl)
But is there a better way to do this, or can we reduce the runtime with multithreading? I benchmarked plain map against ThreadsX.map and found ThreadsX.map slower. Any suggestions for reducing the runtime?
@btime map(x -> x-y, xl); # 55.558 μs (1010 allocations: 117.59 KiB)
@btime ThreadsX.map(x -> x-y, xl); # 99.764 μs (5414 allocations: 285.47 KiB)
Note: Matrix is just an example here; I want to do this on a large vector of a custom struct that supports subtraction.

First of all, map is not the fastest option; loops will almost always beat it. This array comprehension is about 70% faster than map (note, though, that it returns a Vector of flat Vectors rather than of Matrixes, since eachindex yields linear indices):
@btime [[x[i]-$y[i] for i in eachindex($y)] for x in $xl];
33.300 μs (1001 allocations: 117.31 KiB)
Now, as to why multithreading is slower: setting up multithreading for such a simple calculation has non-negligible overhead. You want a heavier calculation for multithreading to have a noticeable effect. Also, don't forget to interpolate ($) global variables when timing with @btime.
@btime map(x -> x-$y, $xl);
55.600 μs (1001 allocations: 117.31 KiB)
@btime $xl .- ($y,); # broadcasting
56.100 μs (1001 allocations: 117.31 KiB)
@btime ThreadsX.map(x -> x-$y, $xl);
53.600 μs (1602 allocations: 242.05 KiB)
See this heavier example; you get about a 5.5X speedup with multithreading:
julia> @btime ThreadsX.map(x -> exp.(sin.(x))-exp.(sin.($y)), $xl);
303.700 μs (3605 allocations: 460.89 KiB)
julia> @btime map(x -> exp.(sin.(x))-exp.(sin.($y)), $xl);
1.673 ms (3001 allocations: 336.06 KiB)
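If allocations are the bottleneck, you can also write the results into a preallocated output vector with an explicit threaded loop. A minimal sketch (not from the original answer; it assumes mutating a preallocated vector is acceptable):
using Base.Threads
# Each task writes only to its own indices, so no locking is needed.
function subtract_threaded!(out, xl, y)
    @threads for i in eachindex(xl, out)
        out[i] = xl[i] .- y  # still allocates one result matrix per element
    end
    return out
end
out = similar(xl)
subtract_threaded!(out, xl, y)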

By default, Julia starts with just one thread. To start Julia with more than one thread, use the --threads=n option:
PS C:\Users\Elias> julia --threads=4
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.2 (2022-02-06)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> xl = [rand(2, 3) for i in 1:1000];
julia> y = rand(2, 3);
julia> using ThreadsX
julia> using BenchmarkTools
julia> @btime ThreadsX.map(x -> x-$y, $xl);
42.700 μs (1353 allocations: 206.77 KiB)
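You can confirm how many threads the session actually got with Threads.nthreads():
julia> Threads.nthreads()
4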
For more details, see: https://docs.julialang.org/en/v1/manual/multi-threading/

Related

Generic maximum/minimum function for complex numbers

In julia, one can find (supposedly) efficient implementations of the min/minimum and max/maximum over collections of real numbers.
As these concepts are not uniquely defined for complex numbers, I was wondering if a parametrized version of these functions was already implemented somewhere.
I am currently sorting the elements of the array of interest and taking the last value, which is, as far as I know, much more costly than directly finding the value with the maximum absolute value (or some other criterion).
This is mostly to reproduce the Matlab behavior of the max function over complex arrays.
Here is my current code
a = rand(ComplexF64,4)
b = sort(a,by = (x) -> abs(x))
c = b[end]
The probable function call would look like
c = maximum/minimum(a,by=real/imag/abs/phase)
EDIT: Some performance tests in Julia 1.5.3 with the provided solutions
function maxby0(f, iter)
    b = sort(iter, by = (x) -> f(x))
    c = b[end]
end
function maxby1(f, iter)
    reduce(iter) do x, y
        f(x) > f(y) ? x : y
    end
end
function maxby2(f, iter; default = zero(eltype(iter)))
    isempty(iter) && return default
    res, rest = Iterators.peel(iter)
    fa = f(res)
    for x in rest
        fx = f(x)
        if fx > fa
            res = x
            fa = fx
        end
    end
    return res
end
compmax(CArray) = CArray[ (abs.(CArray) .== maximum(abs.(CArray))) .& (angle.(CArray) .== maximum( angle.(CArray))) ][1]
Main.isless(u1::ComplexF64, u2::ComplexF64) = abs2(u1) < abs2(u2)
function maxby5(f, arr)
    arr_max = arr[argmax(map(f, arr))]
end
a = rand(ComplexF64,10)
using BenchmarkTools
@btime maxby0(abs,$a)
@btime maxby1(abs,$a)
@btime maxby2(abs,$a)
@btime compmax($a)
@btime maximum($a)
@btime maxby5(abs,$a)
Output for a vector of length 10:
>841.653 ns (1 allocation: 240 bytes)
>214.797 ns (0 allocations: 0 bytes)
>118.961 ns (0 allocations: 0 bytes)
>Execution fails
>20.340 ns (0 allocations: 0 bytes)
>144.444 ns (1 allocation: 160 bytes)
Output for a vector of length 1000:
>315.100 μs (1 allocation: 15.75 KiB)
>25.299 μs (0 allocations: 0 bytes)
>12.899 μs (0 allocations: 0 bytes)
>Execution fails
>1.520 μs (0 allocations: 0 bytes)
>14.199 μs (1 allocation: 7.94 KiB)
Output for a vector of length 1000 (with all comparisons made with abs2):
>35.399 μs (1 allocation: 15.75 KiB)
>3.075 μs (0 allocations: 0 bytes)
>1.460 μs (0 allocations: 0 bytes)
>Execution fails
>1.520 μs (0 allocations: 0 bytes)
>2.211 μs (1 allocation: 7.94 KiB)
Some remarks :
Sorting clearly (and as expected) slows the operations
Using abs2 saves a lot of performance (expected as well)
To conclude :
Since a built-in function will provide this in 1.7, I will avoid the additional Main.isless method, even though it is, all things considered, the best-performing one, so as not to modify the core behavior of my Julia installation
The maxby1 and maxby2 allocate nothing
The maxby1 feels more readable
The winner is therefore Andrej Oskin!
EDIT 2: a new benchmark using the corrected compmax implementation
julia> @btime maxby0(abs2,$a)
36.799 μs (1 allocation: 15.75 KiB)
julia> @btime maxby1(abs2,$a)
3.062 μs (0 allocations: 0 bytes)
julia> @btime maxby2(abs2,$a)
1.160 μs (0 allocations: 0 bytes)
julia> @btime compmax($a)
26.899 μs (9 allocations: 12.77 KiB)
julia> @btime maximum($a)
1.520 μs (0 allocations: 0 bytes)
julia> @btime maxby5(abs2,$a)
2.500 μs (4 allocations: 8.00 KiB)
In Julia 1.7 you can use argmax
julia> a = rand(ComplexF64,4)
4-element Vector{ComplexF64}:
0.3443509906876845 + 0.49984979589871426im
0.1658370274750809 + 0.47815764287341156im
0.4084798173736195 + 0.9688268736875587im
0.47476987432458806 + 0.13651720575229853im
julia> argmax(abs2, a)
0.4084798173736195 + 0.9688268736875587im
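Julia 1.7 also added a two-argument findmax, in case you want the index of the maximizer as well. Note that the first element of the returned pair is the maximal abs2 value, not the complex number itself, so you index back into the array:
julia> fval, idx = findmax(abs2, a); a[idx]
0.4084798173736195 + 0.9688268736875587im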
Since it will take some time to get to 1.7, you can use the following analog:
maxby(f, iter) = reduce(iter) do x, y
    f(x) > f(y) ? x : y
end
julia> maxby(abs2, a)
0.4084798173736195 + 0.9688268736875587im
UPD: a slightly more efficient way to find such a maximum is to use something like:
function maxby(f, iter; default = zero(eltype(iter)))
    isempty(iter) && return default
    res, rest = Iterators.peel(iter)
    fa = f(res)
    for x in rest
        fx = f(x)
        if fx > fa
            res = x
            fa = fx
        end
    end
    return res
end
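Unlike the reduce one-liner, this version also gives a well-defined answer for empty input via the default keyword (the zero of the element type unless you override it):
julia> maxby(abs2, ComplexF64[])
0.0 + 0.0im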
According to Octave's documentation (which presumably mimics MATLAB's behaviour):
For complex arguments, the magnitude of the elements are used for
comparison. If the magnitudes are identical, then the results are
ordered by phase angle in the range (-pi, pi]. Hence,
max ([-1 i 1 -i])
=> -1
because all entries have magnitude 1, but -1 has the largest phase
angle with value pi.
Therefore, if you'd like to mimic matlab/octave functionality exactly, then based on this logic, I'd construct a 'max' function for complex numbers as:
function compmax(CArray)
    Absmax = CArray[ abs.(CArray) .== maximum(abs.(CArray)) ]
    Totalmax = Absmax[ angle.(Absmax) .== maximum(angle.(Absmax)) ]
    return Totalmax[1]
end
(adding appropriate typing as desired).
Examples:
Nums0 = [ 1, 2, 3 + 4im, 3 - 4im, 5 ]; compmax( Nums0 )
# 3 + 4im
Nums1 = [ -1, im, 1, -im ]; compmax( Nums1 )
# -1 + 0im
If this were code for my own computations, I would make my life much simpler with:
julia> Main.isless(u1::ComplexF64, u2::ComplexF64) = abs2(u1) < abs2(u2)
julia> maximum(rand(ComplexF64, 10))
0.9876138798492835 + 0.9267321874614858im
This adds a new implementation of an existing method in Main, so for library code it is not an elegant idea, but it will get you where you need to be with the least effort.
The "size" of a complex number is determined by the size of its modulus. You can use abs for that. Or get 1.7 as Andrej Oskin said.
julia> arr = rand(ComplexF64, 10)
10-element Array{Complex{Float64},1}:
0.12749588414783353 + 0.09918182087026373im
0.7486501790575264 + 0.5577981676269863im
0.9399200789666509 + 0.28339836191094747im
0.9695470502095325 + 0.9978696209350371im
0.6599207157942191 + 0.0999992072342546im
0.30059521996405425 + 0.6840859625686171im
0.22746651600614132 + 0.33739559003514885im
0.9212471084010287 + 0.2590484924393446im
0.74848598947588 + 0.41129443181449554im
0.8304447441317468 + 0.8014240389454632im
julia> arr_max = arr[argmax(map(abs, arr))]
0.9695470502095325 + 0.9978696209350371im
julia> arr_min = arr[argmin(map(abs, arr))]
0.12749588414783353 + 0.09918182087026373im

How to obtain the execution time of a function in Julia?

I want to obtain the execution time of a function in Julia. Here is a minimum working example:
function raise_to(n)
    for i in 1:n
        y = (1/7)^n
    end
end
How can I obtain the time it took to execute raise_to(10)?
The recommended way to benchmark a function is to use BenchmarkTools:
julia> function raise_to(n)
           y = (1/7)^n
       end
raise_to (generic function with 1 method)
julia> using BenchmarkTools
julia> @btime raise_to(10)
1.815 ns (0 allocations: 0 bytes)
Note that repeating the computation numerous times (like you did in your example) is a good idea to get more accurate measurements. But BenchmarkTools does it for you.
Also note that BenchmarkTools avoids many pitfalls of merely using @time. Most notably with @time, you're likely to measure compilation time in addition to run time. This is why the first invocation of @time often displays larger times/allocations:
# First invocation: the method gets compiled
# Large resource consumption
julia> @time raise_to(10)
0.007901 seconds (7.70 k allocations: 475.745 KiB)
3.5401331746414338e-9
# Subsequent invocations: stable and low timings
julia> @time raise_to(10)
0.000003 seconds (5 allocations: 176 bytes)
3.5401331746414338e-9
julia> @time raise_to(10)
0.000002 seconds (5 allocations: 176 bytes)
3.5401331746414338e-9
julia> @time raise_to(10)
0.000001 seconds (5 allocations: 176 bytes)
3.5401331746414338e-9
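If you want the time as a value rather than as printed output (say, to store it in a variable), there are also the @elapsed macro in Base and @belapsed in BenchmarkTools, both returning seconds as a Float64 — a small addition to the answer above:
julia> t = @elapsed raise_to(10);  # one run; includes compilation on the first call
julia> t = @belapsed raise_to(10); # minimum over many runs, like @btime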
@time
@time works as mentioned in previous answers, but it will include compile time if it is the first time you call the function in your Julia session.
https://docs.julialang.org/en/v1/manual/performance-tips/#Measure-performance-with-%5B%40time%5D%28%40ref%29-and-pay-attention-to-memory-allocation-1
@btime
You can also use @btime if you put using BenchmarkTools in your code.
https://github.com/JuliaCI/BenchmarkTools.jl
This will rerun your function many times after an initial compile run, and then report the minimum time.
julia> using BenchmarkTools
julia> @btime sin(x) setup=(x=rand())
4.361 ns (0 allocations: 0 bytes)
0.49587200950472454
@timeit
Another super useful library for profiling is TimerOutputs.jl
https://github.com/KristofferC/TimerOutputs.jl
using TimerOutputs
# Create the `TimerOutput` object that accumulates the timing data
to = TimerOutput()
# Time a section of code with the label "sleep", recording into `to`
@timeit to "sleep" sleep(0.02)
# ... several more calls to @timeit
print_timer(to)
──────────────────────────────────────────────────────────────────────
                               Time                   Allocations
                       ──────────────────────   ───────────────────────
   Tot / % measured:        5.09s / 56.0%            106MiB / 74.6%

Section        ncalls     time   %tot      avg     alloc   %tot      avg
──────────────────────────────────────────────────────────────────────
sleep             101    1.17s  41.2%   11.6ms   1.48MiB  1.88%  15.0KiB
nest 2              1    703ms  24.6%    703ms   2.38KiB  0.00%  2.38KiB
  level 2.2         1    402ms  14.1%    402ms      368B  0.00%   368.0B
  level 2.1         1    301ms  10.6%    301ms      368B  0.00%   368.0B
throwing            1    502ms  17.6%    502ms      384B  0.00%   384.0B
nest 1              1    396ms  13.9%    396ms   5.11KiB  0.01%  5.11KiB
  level 2.2         1    201ms  7.06%    201ms      368B  0.00%   368.0B
  level 2.1         3   93.5ms  3.28%   31.2ms   1.08KiB  0.00%   368.0B
randoms             1   77.5ms  2.72%   77.5ms   77.3MiB  98.1%  77.3MiB
funcdef             1   2.66μs  0.00%   2.66μs         -  0.00%        -
──────────────────────────────────────────────────────────────────────
Macros can take begin ... end blocks
As seen in the docs for these macros, they can cover multiple statements or whole functions:
@my_macro begin
    statement1
    statement2
    # ...
    statement3
end
Hope that helps.
The @time macro can be used to tell you how long the function took to evaluate. It also reports how much memory was allocated.
julia> function raise_to(n)
           for i in 1:n
               y = (1/7)^n
           end
       end
raise_to (generic function with 1 method)
julia> @time raise_to(10)
0.093018 seconds (26.00 k allocations: 1.461 MiB)
It is also worth adding that if you want to find the run time of a whole code block, you can do it as follows:
@time begin
    # your code
end

Convert type Array{Union{Missing, Float64},1} to Array{Float64,1} in Julia

I have an array of floats having some missing values, hence its type is Array{Union{Missing, Float64},1}. Is there a command to convert the non-missing part into Array{Float64,1}?
Here are three solutions, in order of preference (thanks to @BogumilKaminski for the first one):
f1(x) = collect(skipmissing(x))
f2(x) = Float64[ a for a in x if !ismissing(a) ]
f3(x) = x[.!ismissing.(x)]
f1 lazily wraps the array in skipmissing (useful e.g. for iteration) and then materializes it into an array via collect.
f2 uses a for loop but is likely to be slower than f1 since the final array length is not computed ahead of time.
f3 uses broadcasting, and allocates temporaries in the process, and so is likely to be the slowest of the three.
We can verify the above with a simple benchmark:
using BenchmarkTools
x = Array{Union{Missing,Float64}}(undef, 100);
inds = unique(rand(1:100, 50));
x[inds] = randn(length(inds));
@btime f1($x);
@btime f2($x);
@btime f3($x);
Resulting in:
julia> @btime f1($x);
377.186 ns (7 allocations: 1.22 KiB)
julia> @btime f2($x);
471.204 ns (8 allocations: 1.23 KiB)
julia> @btime f3($x);
732.726 ns (6 allocations: 4.80 KiB)
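As a side note (not part of the original answer): if you only need to narrow the element type of an array you know contains no missing values, the Missings.jl package provides disallowmissing, which returns an Array{Float64,1} and throws if a missing remains:
julia> using Missings
julia> disallowmissing(x[inds]) # Vector{Float64}; errors if any missing slips through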

Order of linear algebra operations in Julia

If I have a command y = A*B*x where A and B are large matrices and x and y are vectors, will Julia perform y = ((A*B)*x) or y = (A*(B*x))?
The second option should be the best as it only has to allocate an extra vector rather than a large matrix.
The best way to verify this kind of thing is to dump the lowered code via the @code_lowered macro:
julia> @code_lowered A * B * x
CodeInfo(:(begin
nothing
return (Core._apply)(Base.afoldl, (Core.tuple)(Base.*, (a * b) * c), xs)
end))
Like many other languages, Julia does y = (A*B)*x instead of y = A*(B*x), so it's up to you to explicitly use parens to reduce the allocation.
julia> using BenchmarkTools
julia> @btime $A * ($B * $x);
6.800 μs (2 allocations: 1.75 KiB)
julia> @btime $A * $B * $x;
45.453 μs (3 allocations: 79.08 KiB)
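For context, the timings above are consistent with something like 100×100 matrices; the exact sizes are not given in the original post, but a setup of this shape reproduces the pattern:
using BenchmarkTools
A = rand(100, 100); B = rand(100, 100); x = rand(100);
@btime $A * ($B * $x);  # two matrix-vector products: O(n^2) work, two small vectors
@btime $A * $B * $x;    # materializes A*B first: O(n^3) work plus an n-by-n temporary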

Unzip an array of tuples in julia

Suppose I have an array of tuples:
arr = [(1,2), (3,4), (5,6)]
With Python I can do zip(*arr) == [(1, 3, 5), (2, 4, 6)].
What is the equivalent of this in julia?
As an alternative to splatting (since that's pretty slow), you could do something like:
unzip(a) = map(x->getfield.(a, x), fieldnames(eltype(a)))
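For example, on the array from the question it returns a tuple of vectors (map over a tuple type's field indices yields a tuple):
julia> unzip([(1,2), (3,4), (5,6)])
([1, 3, 5], [2, 4, 6])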
This is pretty quick.
julia> using BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> @benchmark unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.45 KiB
allocs estimate: 6
--------------
minimum time: 25.260 μs (0.00% GC)
median time: 31.997 μs (0.00% GC)
mean time: 48.429 μs (25.03% GC)
maximum time: 36.130 ms (98.67% GC)
--------------
samples: 10000
evals/sample: 1
By comparison, I have yet to see this complete:
@time collect(zip(a...))
For larger arrays use @ivirshup's solution below.
For smaller arrays, you can use zip and splatting.
You can achieve the same thing in Julia by using the zip() function. zip() expects many tuples to work with, so you have to use the splatting operator ... to supply your arguments. Also, in Julia you have to use the collect() function to turn the resulting iterable into an array (if you want one).
Here are these functions in action:
arr = [(1,2), (3,4), (5,6)]
# without splatting
collect(zip((1,2), (3,4), (5,6)))
# Output is a vector of tuples:
> [(1, 3, 5), (2, 4, 6)]
# same result with splatting
collect(zip(arr...))
> [(1, 3, 5), (2, 4, 6)]
In Julia, use splatting with ...:
for r in zip(arr...)
    println(r)
end
There is also the Unzip.jl package:
julia> using Unzip
julia> unzip([(1,2), (3,4), (5,6)])
([1, 3, 5], [2, 4, 6])
which seems to work a bit faster than the selected answer:
julia> using Unzip, BenchmarkTools
julia> a = collect(zip(1:10000, 10000:-1:1));
julia> unzip_ivirshup(a) = map(x->getfield.(a, x), fieldnames(eltype(a))) ;
julia> @btime unzip_ivirshup($a);
18.439 μs (4 allocations: 156.41 KiB)
julia> @btime unzip($a); # unzip from Unzip.jl is faster
12.798 μs (4 allocations: 156.41 KiB)
julia> unzip(a) == unzip_ivirshup(a) # check output is the same
true
Following up on @ivirshup's answer, I would like to add a version that is still an iterator:
unzip(a) = (getfield.(a, x) for x in fieldnames(eltype(a)))
which keeps the result unevaluated until used. It even gives a (very slight) speed improvement when comparing
@benchmark a1, b1 = unzip(a)
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.185 μs (0.00% GC)
median time: 76.581 μs (0.00% GC)
mean time: 83.808 μs (18.35% GC)
maximum time: 7.679 ms (97.82% GC)
--------------
samples: 10000
evals/sample: 1
vs.
BenchmarkTools.Trial:
memory estimate: 156.52 KiB
allocs estimate: 8
--------------
minimum time: 33.914 μs (0.00% GC)
median time: 39.020 μs (0.00% GC)
mean time: 64.788 μs (16.52% GC)
maximum time: 7.853 ms (98.18% GC)
--------------
samples: 10000
evals/sample: 1
I will add a solution based on the following simple macro:
"""
#unzip xs, ys, ... = us
will expand the assignment into the following code
xs, ys, ... = map(x -> x[1], us), map(x -> x[2], us), ...
"""
macro unzip(args)
    args.head != :(=) && error("Expression needs to be of form `xs, ys, ... = us`")
    lhs, rhs = args.args
    items = isa(lhs, Symbol) ? [lhs] : lhs.args
    rhs_items = [:(map(x -> x[$i], $rhs)) for i in 1:length(items)]
    rhs_expand = Expr(:tuple, rhs_items...)
    esc(Expr(:(=), lhs, rhs_expand))
end
Since it's just a syntactic expansion, there shouldn't be any performance or type-instability issue. Compared to other solutions based on fieldnames, this has the advantage of also working when the array element type is abstract. For example, while
julia> unzip_get_field(a) = map(x->getfield.(a, x), fieldnames(eltype(a)));
julia> unzip_get_field(Any[("a", 3), ("b", 4)])
ERROR: ArgumentError: type does not have a definite number of fields
the macro version still works:
julia> @unzip xs, ys = Any[("a", 3), ("b",4)]
(["a", "b"], [3, 4])
