avoid memory allocation when using vcat in julia

Is there a way to avoid memory allocation when concatenating arrays in Julia? For example, with
const x = [1.0,2.0,3.0]
I preallocate
y = zeros(3,3)
and then get a new y with
y = hcat(x,x,x)
BenchmarkTools.Trial:
memory estimate: 256 bytes
allocs estimate: 4
--------------
minimum time: 62.441 ns (0.00% GC)
median time: 68.445 ns (0.00% GC)
mean time: 98.795 ns (18.76% GC)
maximum time: 40.485 μs (99.71% GC)
--------------
samples: 10000
evals/sample: 987
So how can I avoid allocation?

julia> using BenchmarkTools
julia> const y = zeros(3,3);
julia> const x = [1.0,2.0,3.0];
julia> @benchmark y[1:3,:] .= x
BenchmarkTools.Trial:
memory estimate: 64 bytes
allocs estimate: 1
--------------
minimum time: 17.066 ns (0.00% GC)
median time: 20.480 ns (0.00% GC)
mean time: 30.749 ns (24.95% GC)
maximum time: 38.536 μs (99.93% GC)
--------------
samples: 10000
evals/sample: 1000
julia> y
3×3 Array{Float64,2}:
1.0 1.0 1.0
2.0 2.0 2.0
3.0 3.0 3.0
Or you could iterate over rows; for a single-row assignment no allocation is made:
julia> @benchmark y[1,:] = x
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 12.373 ns (0.00% GC)
median time: 12.800 ns (0.00% GC)
mean time: 13.468 ns (0.00% GC)
maximum time: 197.547 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
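The two in-place approaches above can also be combined into a small helper that writes x into every column of a preallocated matrix with no allocation at all (a sketch; fill_cols! is a hypothetical name, not from the answers):

```julia
# Allocation-free fill of a preallocated matrix: every column of y
# receives a copy of x. Assumes size(y, 1) == length(x).
function fill_cols!(y::AbstractMatrix, x::AbstractVector)
    for j in axes(y, 2)       # loop over columns of y
        for i in axes(y, 1)   # copy x into column j element by element
            y[i, j] = x[i]
        end
    end
    return y
end

x = [1.0, 2.0, 3.0]
y = zeros(3, 3)
fill_cols!(y, x)              # y now holds x in every column, like hcat(x,x,x)
```

Wrapping the loop in a function matters: at global scope the same loop would allocate because of the non-constant global bindings.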

Related

Slow multiplication of transpose of sparse matrix

I'm having speed issues multiplying the transpose of a sparse matrix with a column vector.
In my code the matrix A is
501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries:
⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠈⠻⣾
These are the results I get from the multiplication with f = rand(Float64,501,1):
Method 1
A_tr = transpose(A)
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 350.083 μs … 9.066 ms ┊ GC (min … max): 0.00% … 95.44%
Time (median): 361.208 μs ┊ GC (median): 0.00%
Time (mean ± σ): 380.269 μs ± 355.997 μs ┊ GC (mean ± σ): 4.06% ± 4.15%
Memory estimate: 218.70 KiB, allocs estimate: 11736.
Method 2
A_tr = Matrix(transpose(A))
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 87.375 μs … 210.875 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 88.542 μs ┊ GC (median): 0.00%
Time (mean ± σ): 89.286 μs ± 3.266 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
Memory estimate: 4.06 KiB, allocs estimate: 1.
Method 3
A_tr = sparse(Matrix(transpose(A)))
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.102 μs … 1.017 ms ┊ GC (min … max): 0.00% … 99.40%
Time (median): 2.477 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.725 μs ± 13.428 μs ┊ GC (mean ± σ): 6.92% ± 1.41%
Memory estimate: 4.06 KiB, allocs estimate: 1.
Why doesn't Method 1 produce a similar performance as Method 3? I'm probably missing something basic here.
Thank you for your help!
501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries
Integer is an abstract type. This is what is slowing your code down. See the performance tips.
using the following MWE
using LinearAlgebra, BenchmarkTools, SparseArrays
A = sprand(501,501,0.005)
At1 = transpose(A)
At2 = sparse(Matrix(transpose(A)))
f = rand(Float64,501,1)
you will find no significant performance difference between
@benchmark $At1*$f
and
@benchmark $At2*$f
As was pointed out by @SGJ, the trick is to have a concrete type as the parameter of your container, i.e. SparseMatrixCSC{Float64, Int64} instead of SparseMatrixCSC{Float64, Integer}, which is what sprand(501,501,0.005) generates.
@CRJ
IIRC, transpose(A) makes a view of A through LinearAlgebra, which requires translating coordinates for every access. I don't think the fast ways of doing MV math will work through that interface. I'm not surprised that converting your transpose to a matrix object instead of trying to do math through the view is faster.
transpose(A) yields Transpose(A), where Transpose is a lazy transpose wrapper. For sparse-matrix-dense-vector multiplication there are tailored methods, which do not require materializing the transpose.
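A minimal sketch of the fix: convert the matrix once to a concrete index type, then multiply through the lazy transpose so the tailored sparse kernel applies (the generated A here stands in for the question's abstractly-typed matrix):

```julia
using SparseArrays, LinearAlgebra

# Convert a sparse matrix to a concrete index type once, up front;
# afterwards transpose(A_fixed) * f dispatches to the fast sparse method.
A = sprand(501, 501, 0.005)                        # MWE stand-in for the question's A
A_fixed = convert(SparseMatrixCSC{Float64, Int}, A)
f = rand(501)
y = transpose(A_fixed) * f                         # fast path, no materialization
```

The conversion cost is paid once, whereas the abstract Integer parameter taxes every element access inside the multiplication.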

Julia: writing matrix with NaN values to binary file

I have a matrix containing Float64s and possibly some NaN entries. I would like to save it to a file in binary format (as this is fastest, and speed is important here), e.g.,
io = open(string(saveto_dir,"/arr"),"w")
@time write(io,arr)
close(io)
However Julia gives the error message:
`write` is not supported on non-isbits arrays
Is there a workaround?
It sounds like this is due to (perhaps inadvertently) using an Array of type Any (i.e., Array{Any}). If your data is as described, you can use an Array{Float64} instead, in which case you will not have this problem.
To give a concrete example, writing
arr = Any[1.0, NaN]
io = open("./arr","w")
write(io,arr)
close(io)
gives exactly the error you describe, but if you just change that to
arr = Float64[1.0, NaN]
io = open("./arr","w")
write(io,arr)
close(io)
there is no error.
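Since the raw write stores only the element bytes, reading the matrix back requires knowing its size up front; a round-trip sketch (the file name arr.bin is illustrative):

```julia
# Write a Float64 matrix (including NaN) and read it back with read!.
# The binary file stores no dimensions, so the reader preallocates them.
arr = [1.0 NaN; 2.0 3.0]
open("arr.bin", "w") do io
    write(io, arr)
end
out = Array{Float64}(undef, 2, 2)
open("arr.bin", "r") do io
    read!(io, out)
end
isequal(out, arr)   # true; NaN == NaN is false, so compare with isequal
```

If the dimensions are not known at read time, write them as a small header (e.g. two Int64s) before the element data.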
In general, one should vehemently avoid Array{Any} (or really anything involving Any) when there is any other practicable option, effectively due to type instability. For example:
julia> A = rand(10000); # Array{Float64}
julia> B = Any[A...]; # Array{Any}
julia> using BenchmarkTools
julia> @benchmark sum($A)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.318 μs … 12.705 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.347 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.400 μs ± 269.376 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇█▅▂▂▂ ▃ ▂▂ ▁ ▂
██████▇▆▁▅▅███▇▁▃▃▁▁██▆▅▃▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▁▃▁▁▃▁▁▁▃▁▁▃▁▇█ █
1.32 μs Histogram: log(frequency) by time 2.62 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sum($B)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 185.967 μs … 1.664 ms ┊ GC (min … max): 0.00% … 75.01%
Time (median): 195.609 μs ┊ GC (median): 0.00%
Time (mean ± σ): 213.615 μs ± 62.394 μs ┊ GC (mean ± σ): 1.31% ± 4.62%
▅██▅▄▅▃▃▂▂▄▃▁▂▁▁▂▄▁▁ ▃▃ ▂ ▂
███████████████████████▇████▆▆▇▆▄▄▄▃▄▃▅▄▄▅▄▁▅▃▄▅▅▃▄▃▃▄▄▅▁▄▅█ █
186 μs Histogram: log(frequency) by time 413 μs <
Memory estimate: 156.23 KiB, allocs estimate: 9999.
About a 100x performance difference is fairly typical (notice also the difference in number of allocations).

Dictionary from two Arrays/Vectors in Julia

How can I create a Dict() from two arrays, one with the keys and one with the values:
a = ["a", "b", "c"] # keys
b = [1,2,3] # values
Solution 1
Dict(zip(a,b))
Solution 2
Dict(a .=> b)
To complement Georgery's answer, note that the first method (Dict(zip(a,b))) is much faster for small vectors, but the difference becomes negligible for larger ones:
julia> using BenchmarkTools
julia> a = rand(5); b = rand(5);
julia> @benchmark Dict(zip(a,b))
BenchmarkTools.Trial:
memory estimate: 672 bytes
allocs estimate: 6
--------------
minimum time: 344.143 ns (0.00% GC)
median time: 356.382 ns (0.00% GC)
mean time: 383.371 ns (6.12% GC)
maximum time: 8.124 μs (94.84% GC)
--------------
samples: 10000
evals/sample: 217
julia> @benchmark Dict(a .=> b)
BenchmarkTools.Trial:
memory estimate: 832 bytes
allocs estimate: 7
--------------
minimum time: 950.615 ns (0.00% GC)
median time: 1.013 μs (0.00% GC)
mean time: 1.051 μs (2.30% GC)
maximum time: 62.573 μs (97.09% GC)
--------------
samples: 10000
evals/sample: 26
julia> a = rand(50000);b = rand(50000);
julia> @benchmark Dict(zip(a,b))
BenchmarkTools.Trial:
memory estimate: 5.67 MiB
allocs estimate: 38
--------------
minimum time: 1.581 ms (0.00% GC)
median time: 1.611 ms (0.00% GC)
mean time: 1.675 ms (3.41% GC)
maximum time: 2.917 ms (25.30% GC)
--------------
samples: 2984
evals/sample: 1
julia> @benchmark Dict(a .=> b)
BenchmarkTools.Trial:
memory estimate: 6.43 MiB
allocs estimate: 40
--------------
minimum time: 1.624 ms (0.00% GC)
median time: 1.666 ms (0.00% GC)
mean time: 1.740 ms (3.79% GC)
maximum time: 3.762 ms (14.17% GC)
--------------
samples: 2873
evals/sample: 1
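Whichever construction you pick, the result is the same dictionary; a quick sanity check:

```julia
# Both constructions pair keys with values positionally.
a = ["a", "b", "c"]   # keys
b = [1, 2, 3]         # values
d1 = Dict(zip(a, b))  # Solution 1: iterate key/value pairs lazily
d2 = Dict(a .=> b)    # Solution 2: broadcast => to build a Pair array first
d1 == d2              # true
d1["b"]               # 2
```

The broadcast version allocates an intermediate array of Pairs, which is why it carries a little extra overhead for small inputs.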

Julia: How to return the number of unique elements in an Array

What's the function to return the number of unique elements in an Array in Julia?
In R you have length(unique(x)). I can do the same in Julia, but I think there should be a more efficient way.
If you want an exact answer, length(unique(x)) is about as efficient as it gets for general objects. If your values have a limited domain, e.g. UInt8, it may be more efficient to use a fixed-size table. If you can accept an approximation, you can use the HyperLogLog data structure / algorithm, which is implemented in the OnlineStats package:
https://joshday.github.io/OnlineStats.jl/latest/api/#OnlineStats.HyperLogLog
It appears that length(Set(x)) is somewhat faster than length(unique(x)).
julia> using StatsBase, BenchmarkTools
julia> num_unique(x) = length(Set(x));
julia> a = sample(1:100, 200);
julia> num_unique(a) == length(unique(a))
true
julia> @benchmark length(unique(x)) setup=(x = sample(1:10000, 20000))
BenchmarkTools.Trial:
memory estimate: 450.50 KiB
allocs estimate: 36
--------------
minimum time: 498.130 μs (0.00% GC)
median time: 570.588 μs (0.00% GC)
mean time: 579.011 μs (2.41% GC)
maximum time: 2.321 ms (63.03% GC)
--------------
samples: 5264
evals/sample: 1
julia> @benchmark num_unique(x) setup=(x = sample(1:10000, 20000))
BenchmarkTools.Trial:
memory estimate: 288.68 KiB
allocs estimate: 8
--------------
minimum time: 283.031 μs (0.00% GC)
median time: 393.317 μs (0.00% GC)
mean time: 397.878 μs (4.24% GC)
maximum time: 33.499 ms (98.80% GC)
--------------
samples: 6704
evals/sample: 1
And another benchmark for an array of strings:
julia> using Random
julia> @benchmark length(unique(x)) setup=(x = [randstring(3) for _ in 1:10000])
BenchmarkTools.Trial:
memory estimate: 450.50 KiB
allocs estimate: 36
--------------
minimum time: 818.024 μs (0.00% GC)
median time: 895.944 μs (0.00% GC)
mean time: 906.568 μs (1.61% GC)
maximum time: 1.964 ms (51.19% GC)
--------------
samples: 3049
evals/sample: 1
julia> @benchmark num_unique(x) setup=(x = [randstring(3) for _ in 1:10000])
BenchmarkTools.Trial:
memory estimate: 144.68 KiB
allocs estimate: 8
--------------
minimum time: 367.018 μs (0.00% GC)
median time: 378.666 μs (0.00% GC)
mean time: 384.486 μs (1.07% GC)
maximum time: 1.314 ms (70.80% GC)
--------------
samples: 4527
evals/sample: 1
If you don't need the x array afterwards, length(unique!(x)) is slightly faster.
With floats and integers, you can use mapreduce if your array is already sorted:
function count_unique_sorted(x)
    f(a) = (a, 0)
    function op(a, b)
        if a[1] == b[1]
            return (b[1], a[2])
        else
            return (b[1], a[2] + 1)
        end
    end
    return mapreduce(f, op, x)[2] + 1
end
If you don't care about the order of the array x, you can sort and count in one function:
count_unique_sorted!(x)=count_unique_sorted(sort!(x))
Some benchmarks:
using Random,StatsBase, BenchmarkTools
x = sample(1:100,200)
length(unique(x)) == count_unique_sorted(sort(x)) #true
Using length(unique(x)):
@benchmark length(unique(x))
BenchmarkTools.Trial:
memory estimate: 6.08 KiB
allocs estimate: 17
--------------
minimum time: 3.350 μs (0.00% GC)
median time: 3.688 μs (0.00% GC)
mean time: 5.352 μs (24.35% GC)
maximum time: 6.691 ms (99.90% GC)
--------------
samples: 10000
evals/sample: 8
Using Set:
@benchmark length(Set(x))
BenchmarkTools.Trial:
memory estimate: 2.82 KiB
allocs estimate: 8
--------------
minimum time: 2.256 μs (0.00% GC)
median time: 2.467 μs (0.00% GC)
mean time: 3.654 μs (26.04% GC)
maximum time: 5.297 ms (99.91% GC)
--------------
samples: 10000
evals/sample: 9
Using count_unique_sorted!:
x2 = copy(x)
@benchmark count_unique_sorted!(x2)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 948.387 ns (0.00% GC)
median time: 990.323 ns (0.00% GC)
mean time: 1.038 μs (0.00% GC)
maximum time: 2.481 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 31
Using count_unique_sorted with an already sorted array
x3 = sort(x)
@benchmark count_unique_sorted(x3)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 140.962 ns (0.00% GC)
median time: 146.831 ns (0.00% GC)
mean time: 154.121 ns (0.00% GC)
maximum time: 381.806 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 852
Using count_unique_sorted and sorting the array
@benchmark count_unique_sorted(sort(x))
BenchmarkTools.Trial:
memory estimate: 1.77 KiB
allocs estimate: 1
--------------
minimum time: 1.470 μs (0.00% GC)
median time: 1.630 μs (0.00% GC)
mean time: 2.367 μs (21.82% GC)
maximum time: 4.880 ms (99.94% GC)
--------------
samples: 10000
evals/sample: 10
For strings, sorting and counting is slower than making a Set.

How to check in julia if a vector of bool is all falses

I just wonder if there is some built-in and() function or something better than this one:
filter = [true,false,true,false]
length([i for i in filter if i]) > 0 # true
filter = [false,false,false]
length([i for i in filter if i]) > 0 # false
julia> x = [true,false,true,false]
4-element Array{Bool,1}:
true
false
true
false
julia> all(x)
false
Sorry, you said 'all falses'. Then:
julia> all(!, x)
or
julia> !any(x)
This isn't an answer to your question, but note that filter is an existing function, so you probably won't want to overwrite it.
julia> a = [true, false, true, false];
julia> filter(!, a)
2-element Array{Bool,1}:
false
false
julia> filter(!!, a)
2-element Array{Bool,1}:
true
true
sum() is actually the fastest (x is all-false iff sum(x) == 0):
julia> x = falses(1_000_000);
julia> @benchmark sum(x)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 2.834 μs (0.00% GC)
median time: 2.905 μs (0.00% GC)
mean time: 3.079 μs (0.00% GC)
maximum time: 12.648 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 9
julia> @benchmark all(!, x)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 546.055 μs (0.00% GC)
median time: 546.463 μs (0.00% GC)
mean time: 558.960 μs (0.00% GC)
maximum time: 1.709 ms (0.00% GC)
--------------
samples: 8928
evals/sample: 1
julia> @benchmark any(x)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.728 μs (0.00% GC)
median time: 5.752 μs (0.00% GC)
mean time: 6.044 μs (0.00% GC)
maximum time: 28.300 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 6
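The idioms from the answers above, collected as small helpers (a sketch; the names are hypothetical, and a Vector{Bool} or BitVector input is assumed):

```julia
# Three equivalent "is every entry false?" checks.
allfalse_any(x) = !any(x)       # short-circuits on the first true
allfalse_all(x) = all(!, x)     # same result, predicate form
allfalse_sum(x) = sum(x) == 0   # counts trues; fast on BitVector

x = falses(10)
allfalse_any(x)      # true
x[3] = true
allfalse_any(x)      # false
```

For early-exit behavior on mostly-true data, prefer the !any(x) form; sum wins on BitVectors because it can count whole words at a time.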
