I have a matrix containing Float64 values and possibly some NaN entries. I would like to save it to a binary file (as this is fastest, and speed is important here), e.g.:
io = open(string(saveto_dir,"/arr"),"w")
@time write(io, arr)
close(io)
However, Julia gives the error message:
`write` is not supported on non-isbits arrays
Is there a workaround?
It sounds like this is due to (perhaps inadvertently) using an array with element type Any (i.e., Array{Any}), but if your data is as described you can use an Array{Float64} instead, in which case you will not have this problem.
To give a concrete example, writing
arr = Any[1.0, NaN]
io = open("./arr","w")
write(io,arr)
close(io)
gives exactly the error you describe, but if you just change that to
arr = Float64[1.0, NaN]
io = open("./arr","w")
write(io,arr)
close(io)
there is no error.
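A minimal round-trip sketch (the matrix values and the temporary file path are made up for illustration): narrow the Matrix{Any} to a concrete Matrix{Float64} first, write the raw bytes, then read them back with read! into a preallocated array. Note that write stores only the element data, not the dimensions or element type, so whoever reads the file has to know the size in advance.

```julia
# Matrix{Any}, as produced e.g. by an untyped container
arr = Any[1.0 NaN; 3.0 4.0]

# Narrow to a concrete, isbits element type so that `write` accepts it
arr64 = Matrix{Float64}(arr)

path = tempname()              # hypothetical output location
open(path, "w") do io
    write(io, arr64)           # raw Float64 bytes in column-major order
end

# The file holds no shape information: allocate the target with known dims
back = Matrix{Float64}(undef, size(arr64))
open(path, "r") do io
    read!(io, back)
end

@assert isequal(back, arr64)   # isequal treats NaN as equal to NaN
rm(path)
```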
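In general, one should strongly avoid Array{Any} (or really anything involving Any) whenever there is any other practicable option, essentially because of type instability: the element type is not known at compile time, so each element is stored boxed and every operation on it has to be dispatched at run time. For example: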
julia> A = rand(10000); # Array{Float64}
julia> B = Any[A...]; # Array{Any}
julia> using BenchmarkTools
julia> @benchmark sum($A)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.318 μs … 12.705 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.347 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.400 μs ± 269.376 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇█▅▂▂▂ ▃ ▂▂ ▁ ▂
██████▇▆▁▅▅███▇▁▃▃▁▁██▆▅▃▄▃▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▁▃▁▁▃▁▁▁▃▁▁▃▁▇█ █
1.32 μs Histogram: log(frequency) by time 2.62 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark sum($B)
BenchmarkTools.Trial: 10000 samples with 1 evaluations.
Range (min … max): 185.967 μs … 1.664 ms ┊ GC (min … max): 0.00% … 75.01%
Time (median): 195.609 μs ┊ GC (median): 0.00%
Time (mean ± σ): 213.615 μs ± 62.394 μs ┊ GC (mean ± σ): 1.31% ± 4.62%
▅██▅▄▅▃▃▂▂▄▃▁▂▁▁▂▄▁▁ ▃▃ ▂ ▂
███████████████████████▇████▆▆▇▆▄▄▄▃▄▃▅▄▄▅▄▁▅▃▄▅▅▃▄▃▃▄▄▅▁▄▅█ █
186 μs Histogram: log(frequency) by time 413 μs <
Memory estimate: 156.23 KiB, allocs estimate: 9999.
About a 100x performance difference is fairly typical (notice also the difference in number of allocations).
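To connect this back to the original error: write on an array requires an isbits element type (that is what the error message says), and you can check that directly. A small sketch (the array size is arbitrary) showing the check, and how broadcasting identity narrows a Vector{Any} back to a concrete element type when all its entries share one:

```julia
A = rand(100)          # Vector{Float64}
B = Any[A...]          # same values, element type Any

# `write(io, X)` on an array requires an isbits element type
@assert isbitstype(eltype(A))
@assert !isbitstype(eltype(B))   # Any is abstract, hence not isbits

# Broadcasting `identity` recomputes the element type from the values
C = identity.(B)
@assert C isa Vector{Float64}
@assert C == A
```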
In Julia, you can easily multiply all values of a Vector, Matrix or Array a by a Float64, say 2, by:
a=ones(3,3)
a*=2
I wanted to know whether this is also easily achievable for dictionaries, for instance Dictionary{Int,Float64} or Dictionary{Tuple{Int,Int},Float64}.
I know it can be done by iterating over keys and values with for loops, but I want to do it "in place", like dict *= 2. Is that possible?
This in-place map! might be about 20x faster than replace, since it overwrites the existing values instead of building a new Dict:
map!(x->2x, values(d))
Testing:
julia> @benchmark map!(x->2x, values(d)) setup=(d = Dict(1:100 .=> rand(100)))
BenchmarkTools.Trial: 10000 samples with 985 evaluations.
Range (min … max): 59.391 ns … 154.315 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 60.508 ns ┊ GC (median): 0.00%
Time (mean ± σ): 61.706 ns ± 4.912 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▃██▆▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▂
59.4 ns Histogram: frequency by time 86.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
vs.
julia> @benchmark d_mult = replace(kv -> kv[1] => kv[2]*2, d) setup=(d = Dict(1:100 .=> rand(100)))
BenchmarkTools.Trial: 10000 samples with 30 evaluations.
Range (min … max): 916.667 ns … 274.520 μs ┊ GC (min … max): 0.00% … 99.13%
Time (median): 1.153 μs ┊ GC (median): 0.00%
Time (mean ± σ): 3.052 μs ± 11.919 μs ┊ GC (mean ± σ): 21.27% ± 5.50%
▆█▅▄▁ ▁▂▂▃▂▄▇▆▃▁ ▂
█████▇▇▆▄▄▁▁▄▁▄▅▅▃▃▁▃▁▁███████████▇▇▆▆▅▅▄▄▅▆▅▅▆▄▄▅▄▃▄▄▅▅▅▆▆▆█ █
917 ns Histogram: log(frequency) by time 7.68 μs <
Memory estimate: 4.72 KiB, allocs estimate: 5.
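A minimal check, using the Dict{Tuple{Int,Int},Float64} key type from the question (the keys and values here are made up), that the in-place map! on values(d) gives the same result as the replace-based version from the question's self-answer, without allocating a new Dict. As far as I know, map! on values(d) requires Julia 1.2 or later.

```julia
d = Dict((1, 1) => 1.5, (1, 2) => 2.0, (2, 1) => -0.5)

# Non-mutating: builds a new Dict with every value doubled
d_mult = replace(kv -> kv[1] => kv[2] * 2, d)

# Mutating: overwrites the values stored in `d`; keys are untouched
map!(x -> 2x, values(d))

@assert d == d_mult
@assert d[(1, 2)] == 4.0
```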
I have found a short way thanks to this answer:
d_mult = replace(kv -> kv[1] => kv[2]*2, d)
Maybe someone will find something even shorter (d_mult *= 2 or similar).
I assume you are actually referring to types like Base.Dict{Int,Float64}. There is an alternative implementation of dictionaries in Dictionaries.jl, providing the type Dictionary. This type, in contrast to Base.Dict, is treated as an iterable over values, not keys, and therefore can be used mostly like an array:
julia> using Dictionaries
julia> dict = Dictionary(["a", "b", "c"], [1, 2, 3]);
julia> dict .+ 1
3-element Dictionary{String,Int64}
"a" │ 2
"b" │ 3
"c" │ 4
I'm looking for the most efficient and fastest way to create a vector of SVectors from a 2D Matrix in Julia, dynamically. Suppose I have the following Matrix:
julia> r = rand(1000, 5);
Then I want to convert it to the exact type Vector{SVector{5, Float64}}. I tried the following, but I'm looking for a faster way:
julia> using StaticArrays
julia> function create_VSVec(data::Matrix{T}) where T
ldim, mdim = argmin(size(data)), argmax(size(data))
len_sv = size(data, ldim)
sv = Vector{SVector{len_sv, T}}(undef, size(data, mdim))
for idx in axes(sv, 1)
sv[idx] = SVector{len_sv, T}(selectdim(data, mdim, idx))
end
return sv
end;
julia> @benchmark create_VSVec($r)
BenchmarkTools.Trial: 2993 samples with 1 evaluation.
Range (min … max): 1.415 ms … 22.241 ms ┊ GC (min … max): 0.00% … 91.78%
Time (median): 1.518 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.665 ms ± 1.275 ms ┊ GC (mean ± σ): 4.76% ± 5.78%
█▃
▂▃▆▇██▇▆▅▄▃▄▃▃▃▃▄▃▃▃▃▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂ ▃
1.41 ms Histogram: frequency by time 2.31 ms <
Memory estimate: 703.19 KiB, allocs estimate: 14480.
julia> a = create_VSVec(r);
julia> a[1] == r[1, :]
true
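In create_VSVec the vector length len_sv is only known at run time, so SVector{len_sv, T} is not a concrete type as far as the compiler is concerned; that is likely where most of the ~14k allocations come from. A sketch of one common remedy, assuming the row length can be passed in as a Val (the helper name rows_to_svectors is hypothetical):

```julia
using StaticArrays

# N is a type parameter, so SVector{N, T} is concrete inside the function
function rows_to_svectors(data::AbstractMatrix{T}, ::Val{N}) where {T, N}
    size(data, 2) == N ||
        throw(DimensionMismatch("expected $N columns, got $(size(data, 2))"))
    # ntuple with Val(N) builds each row's tuple with a compile-time length
    return [SVector{N, T}(ntuple(j -> data[i, j], Val(N))) for i in axes(data, 1)]
end

r = rand(1000, 5)
sv = rows_to_svectors(r, Val(5))
@assert sv isa Vector{SVector{5, Float64}}
@assert sv[1] == r[1, :]
```

If the length comes from size(r, 2) at run time, writing Val(size(r, 2)) still costs one dynamic dispatch at the call, but the loop inside the function is compiled for the concrete N (a function barrier).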
An array comprehension is about 2x faster than create_VSVec2 (comparing mean times):
julia> r = rand(1000, 5);
julia> @benchmark create_VSVec2($r)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 12.500 μs … 4.741 ms ┊ GC (min … max): 0.00% … 98.95%
Time (median): 21.750 μs ┊ GC (median): 0.00%
Time (mean ± σ): 21.849 μs ± 96.395 μs ┊ GC (mean ± σ): 11.44% ± 2.93%
▆█▄ ▁▃▄▅▆▄▅▄▂▁ ▁ ▂ ▂
███▇▅▃▄▂▂▄▃▄▅▄▄▄▃▃▅▄█████████████▇▇▆▆▅▅▃▅▇▇▇███▆▅▆▅▆▆▆▇▆▆▆▇ █
12.5 μs Histogram: log(frequency) by time 39.9 μs <
Memory estimate: 39.78 KiB, allocs estimate: 15.
julia> @benchmark [SVector{5,Float64}(i) for i in eachrow($r)]
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
Range (min … max): 2.400 μs … 716.087 μs ┊ GC (min … max): 0.00% … 97.09%
Time (median): 6.094 μs ┊ GC (median): 0.00%
Time (mean ± σ): 12.261 μs ± 34.098 μs ┊ GC (mean ± σ): 19.12% ± 7.80%
▆█▅▅▂ ▄▅▆▅▄▄▄▃▃▃▂▂▂▁▂▁ ▂
▆▄▅██████▇▆▆▄▄▆▄▄▅▄▃▄▃▆▄▄▄▅▅▆██████████████████▇▇▇▆▇▆▆▆▅▄▅▄▄ █
2.4 μs Histogram: log(frequency) by time 24.8 μs <
Memory estimate: 39.11 KiB, allocs estimate: 2.
Following the approach in the OP, and taking into account the important note that @DNF made in the comments, using reinterpret in the following way makes it ~90x faster and needs about 18x less memory:
julia> function create_VSVec2(data::Matrix{T}) where T
ldim = argmin(size(data))
d = ldim == 1 ? data : data'
sv = identity.(reinterpret(SVector{5, Float64}, vec(d)))
return sv
end;
julia> @benchmark create_VSVec2($r)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
Range (min … max): 4.167 μs … 1.611 ms ┊ GC (min … max): 0.00% … 98.04%
Time (median): 22.283 μs ┊ GC (median): 0.00%
Time (mean ± σ): 21.380 μs ± 55.021 μs ┊ GC (mean ± σ): 16.17% ± 6.87%
▁█▆▄▁▁ ▃▄▄▄▄▆▆▆▅▃▂▁▁▂▂▂▁▁▁ ▂
██████▇▇▆▄▅▅▃▅▅▃▅▄▇███████████████████████▇▇▇▇▇▇▇▆▅▆▆▆▆▆▅▅▆ █
4.17 μs Histogram: log(frequency) by time 50 μs <
Memory estimate: 39.19 KiB, allocs estimate: 4.
julia> a = create_VSVec2(r);
julia> a[1] == r[1, :]
true
julia> typeof(a)
Vector{SVector{5, Float64}} (alias for Array{SArray{Tuple{5}, Float64, 1, 5}, 1})
However, if I use reinterpret alone, without any further processing, I get the best performance:
julia> r = rand(5, 1000);
julia> @benchmark reinterpret(SVector{5, Float64}, $r)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 9.600 ns … 51.800 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.800 ns ┊ GC (median): 0.00%
Time (mean ± σ): 9.898 ns ± 0.767 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▆ ▄ ▇ ▂ ▂
▆▁▁▁█▁▁▁█▁▁▁▁█▁▁▁█▁▁▁▁█▁▁▁█▁▁▁▁▆▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▃▁▁▁▇▁▁▁▇ █
9.6 ns Histogram: log(frequency) by time 10.9 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark [SVector{5,Float64}(i) for i in eachcol($r)]
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min … max): 3.500 μs … 915.714 μs ┊ GC (min … max): 0.00% … 96.56%
Time (median): 24.486 μs ┊ GC (median): 0.00%
Time (mean ± σ): 22.559 μs ± 47.537 μs ┊ GC (mean ± σ): 15.27% ± 7.51%
▄█▄▁▁▁ ▂▂▂▄▃▃▅▆▅█▆▃▁▁▁▁▁▁ ▁ ▂
▇███████▅▆▆▇▄▄▃▆▅▁▃▆▆▃▅▇▆▆██████████████████████▇█▇▇▆█▆█▇███ █
3.5 μs Histogram: log(frequency) by time 41.4 μs <
Memory estimate: 39.11 KiB, allocs estimate: 2.
julia> r = rand(1000, 5);
julia> @benchmark reinterpret(SVector{5, Float64}, $r)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 9.300 ns … 58.300 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.400 ns ┊ GC (median): 0.00%
Time (mean ± σ): 9.545 ns ± 1.078 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ █ ▁ ▇ ▅ ▂
█▁▁█▁▁▁█▁▁▁█▁▁▁█▁▁▁█▁▁▁█▁▁▁▆▁▁▃▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁█▁▁▁▅▁▁▁▄▁▁▆ █
9.3 ns Histogram: log(frequency) by time 10.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark [SVector{5,Float64}(i) for i in eachrow($r)]
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
Range (min … max): 3.814 μs … 867.043 μs ┊ GC (min … max): 0.00% … 96.02%
Time (median): 24.071 μs ┊ GC (median): 0.00%
Time (mean ± σ): 21.553 μs ± 47.004 μs ┊ GC (mean ± σ): 15.52% ± 7.48%
▂▇█▅▃▃▂ ▁▁▃▃▃▆▇▅▇▇▄▃▂▂▁▁▁▁ ▃
██████████▇▆▅▃▇▅▁▅▆▇▅▅▆▆▄▇████████████████████████▇▇▆▇▇█▇██▇ █
3.81 μs Histogram: log(frequency) by time 41.7 μs <
Memory estimate: 39.11 KiB, allocs estimate: 2.
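One caveat about the reinterpret timings: reinterpret returns a lazy ReinterpretArray, not a Vector{SVector{5, Float64}}, and because Julia arrays are column-major, reinterpreting the 1000×5 matrix groups five consecutive entries of a column rather than a row. A sketch, assuming Julia 1.6 or later for reinterpret(reshape, ...), of a layout that does give one SVector per row of r:

```julia
using StaticArrays

r = rand(1000, 5)                 # each row should become one SVector

# Copy into a 5×1000 layout so that each row of r is a contiguous column
rt = permutedims(r)

# Zero-copy view with one SVector per column; `reshape` drops the first dim
v = reinterpret(reshape, SVector{5, Float64}, rt)
@assert size(v) == (1000,)
@assert v[1] == r[1, :]

# Materialize only if an actual Vector{SVector{5, Float64}} is needed
sv = collect(v)
@assert sv isa Vector{SVector{5, Float64}}

# Pitfall: reinterpreting the 1000×5 matrix directly yields a 200×5 array
# whose elements are chunks of columns, not rows of r
w = reinterpret(SVector{5, Float64}, r)
@assert size(w) == (200, 5)
```

Only permutedims and the optional collect allocate here; if the data can be produced in the 5×N layout to begin with, the reinterpreted view itself is essentially free, which is consistent with the ~10 ns timings above.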
Suppose I have a matrix named mat:
julia> mat = rand(1:10, 5, 3)
5×3 Matrix{Int64}:
2 4 3
5 3 10
5 7 5
9 5 7
4 9 6
And I want to calculate the correlation between each pair of rows of mat (e.g., cor(mat[1, :], mat[2, :]), and so on) and, finally, obtain a correlation matrix. I wrote two scripts for it and provide their benchmarks below. However, I would like to make this much faster, because I need to run it on a larger dataset (say, of size 2000×20).
First approach
A pretty straightforward way: first, I create a matrix initialized with zeros and then fill it with the correlations of each pair of rows. This isn't a good approach, since it computes every value twice (e.g., cor(mat[1, :], mat[3, :]) is equal to cor(mat[3, :], mat[1, :])):
using Statistics
function calc_corr(matrix::Matrix)
n::Int64 = size(matrix, 1)
corr_mat = zeros(Float64, n, n)
for (idx1, idx2)=Iterators.product(1:n, 1:n)
@inbounds corr_mat[idx1, idx2] = cor(
view(matrix, idx1, :),
view(matrix, idx2, :)
)
end
return corr_mat
end
Second approach
Calculate only the upper triangular part and then wrap it in a Symmetric matrix to obtain the complete correlation matrix:
using LinearAlgebra
function calc_corr2(matrix::Matrix)
n::Int64 = size(matrix, 1)
corr_mat = ones(Float64, n, n)
# find upper triangular indices
upper_triang_idx = findall(==(1), triu(ones(Int8, n, n), 1))
for (idx1, idx2)=Tuple.(upper_triang_idx)
@inbounds corr_mat[idx1, idx2] = cor(
view(matrix, idx1, :),
view(matrix, idx2, :)
)
end
corr_mat = Symmetric(corr_mat)
return corr_mat
end
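The second approach still builds several temporary arrays (the ones matrix, its triu copy, the findall index vector and the broadcast Tuple. conversion) just to enumerate the upper triangle. A sketch of the same idea that loops over the upper triangle directly (calc_corr3 is a hypothetical name, and I have not benchmarked it against the versions here); the test matrix is the mat printed above:

```julia
using Statistics, LinearAlgebra

function calc_corr3(matrix::AbstractMatrix)
    n = size(matrix, 1)
    corr_mat = Matrix{Float64}(I, n, n)   # ones on the diagonal, zeros elsewhere
    # Fill only the strict upper triangle (i < j), column by column
    for j in 1:n, i in 1:j-1
        corr_mat[i, j] = cor(view(matrix, i, :), view(matrix, j, :))
    end
    return Symmetric(corr_mat, :U)        # lower half is mirrored lazily
end

mat = [2 4 3; 5 3 10; 5 7 5; 9 5 7; 4 9 6]
C3 = calc_corr3(mat)
@assert C3 ≈ cor(mat, dims=2)
```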
Benchmarking
First, on the small matrix declared above (mat):
using BenchmarkTools
@benchmark calc_corr($mat)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.950 μs … 6.210 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.160 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.178 μs ± 289.600 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇▇▂ ▁ ▅█▆▂▁▁▃▆▅▃▁ ▁▁▁▁ ▂
███████████████████████▆█▇▇█▇▇▆▇▆▇▆▆▄▆▄▄▅▃▁▄▄▄▃▅▃▄▄▄▁▁▁▃▄▁▄ █
1.95 μs Histogram: log(frequency) by time 3.62 μs <
Memory estimate: 256 bytes, allocs estimate: 1.
# ---------------------------------------------------------------------
@benchmark calc_corr2($mat)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.220 μs … 773.080 μs ┊ GC (min … max): 0.00% … 99.19%
Time (median): 1.420 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.698 μs ± 9.921 μs ┊ GC (mean ± σ): 8.15% ± 1.40%
█
█▇▃▂▃▄▂▂▄▄▃▂▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▁
1.22 μs Histogram: frequency by time 3.98 μs <
Memory estimate: 976 bytes, allocs estimate: 7.
Test if the results are identical:
julia> calc_corr(mat) == calc_corr2(mat)
true
On a big matrix:
test_mat = rand(1:10, 2_000, 20);
@benchmark calc_corr($test_mat)
BenchmarkTools.Trial: 8 samples with 1 evaluation.
Range (min … max): 632.258 ms … 680.094 ms ┊ GC (min … max): 0.33% … 1.30%
Time (median): 646.215 ms ┊ GC (median): 0.16%
Time (mean ± σ): 650.096 ms ± 16.089 ms ┊ GC (mean ± σ): 0.49% ± 0.60%
▁ ▁ ▁ █ ▁ ▁ ▁
█▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
632 ms Histogram: frequency by time 680 ms <
Memory estimate: 30.52 MiB, allocs estimate: 2.
# ---------------------------------------------------------------------
@benchmark calc_corr2($test_mat)
BenchmarkTools.Trial: 14 samples with 1 evaluation.
Range (min … max): 351.040 ms … 396.431 ms ┊ GC (min … max): 2.58% … 1.81%
Time (median): 357.403 ms ┊ GC (median): 2.86%
Time (mean ± σ): 360.863 ms ± 11.661 ms ┊ GC (mean ± σ): 2.75% ± 0.80%
█
▇▁█▇▁▇▇▁▁▁▇▁▁▁▇▇▁▇▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
351 ms Histogram: frequency by time 396 ms <
Memory estimate: 99.63 MiB, allocs estimate: 14.
Memory isn't my main concern for now; I'm looking for a way to make this procedure faster. The run time becomes a real problem for an even larger matrix such as 10_000×100 (and so does the memory 🥵). Hence, I'm looking for any advice that helps speed this procedure up.
Do not reinvent the wheel, just do
cor(test_mat, dims=2)
This is much faster than your code.
Setup:
test_mat = rand(1:10, 2_000, 20)
And now benchmark:
julia> @btime calc_corr2($test_mat);
709.354 ms (16 allocations: 99.63 MiB)
julia> @btime cor(test_mat, dims=2);
52.679 ms (16 allocations: 30.85 MiB)
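One way to see why a whole-matrix formulation can be so much faster: after standardizing each row (subtract its mean, divide by its sample standard deviation), all pairwise row correlations come out of a single matrix product, Z * Z' / (m - 1), where m is the number of columns, and BLAS computes that product very efficiently. A sketch checking this identity against cor (the 50×20 size is arbitrary; this illustrates the math and is not necessarily how Statistics implements cor internally):

```julia
using Statistics, LinearAlgebra

X = float.(rand(1:10, 50, 20))   # 50 rows (variables) × 20 columns (observations)
m = size(X, 2)

# Standardize each row: zero mean, unit sample standard deviation
Z = (X .- mean(X, dims=2)) ./ std(X, dims=2)

# All pairwise row correlations in one BLAS matrix product
C_blas = (Z * Z') ./ (m - 1)

C = cor(X, dims=2)
@assert size(C) == (50, 50)
@assert C ≈ C_blas
@assert C[1, 2] ≈ cor(X[1, :], X[2, :])
```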
I have a matrix (rand2) with element type Any, and I want to convert it to Float64. I have the following code:
for i in 1:size(rand2,1)
rand2[i,:]=convert(Array{Float64,1}, rand2[i,:])
end
This code does not change the element type. What's the issue here?
Use the dot operator to broadcast the type conversion.
Suppose you have
julia> m = Matrix{Any}(rand(2,2))
2×2 Matrix{Any}:
0.250737 0.0366769
0.240182 0.883665
Then you could do
julia> Float64.(m)
2×2 Matrix{Float64}:
0.250737 0.0366769
0.240182 0.883665
or you could explicitly broadcast convert:
julia> convert.(Float64, m)
2×2 Matrix{Float64}:
0.250737 0.0366769
0.240182 0.883665
Julia arrays, once created, cannot have their type changed; this is necessary for high performance. So, trying to change the type midway as you have tried won't work. You have to create a new array similar to the original one but with the new type.
You could do this:
m64 = similar(m, Float64)
m64 .= m
This will be 10X faster than direct conversion like Float64.(m).
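To make the explanation above concrete with the loop from the question (rand2 is rebuilt here with made-up random values): each converted row is stored back into the same Matrix{Any}, so the container's element type never changes; only creating a new array and rebinding the name does.

```julia
rand2 = Matrix{Any}(rand(3, 2))

for i in 1:size(rand2, 1)
    # convert succeeds, but the Float64 values are stored into a Matrix{Any}
    rand2[i, :] = convert(Array{Float64,1}, rand2[i, :])
end
@assert eltype(rand2) === Any

# Create a new array with the desired element type, copy, and rebind the name
rand2_64 = similar(rand2, Float64)
rand2_64 .= rand2
rand2 = rand2_64
@assert eltype(rand2) === Float64
```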
In addition to Przemyslaw Szufel's answer, you can use the identity function, which narrows the element type of your matrix. Example:
# I use the example of Przemyslaw Szufel
julia> m = Matrix{Any}(rand(2,2))
2×2 Matrix{Any}:
0.250737 0.0366769
0.240182 0.883665
julia> identity.(m)
2×2 Matrix{Float64}:
0.250737 0.0366769
0.240182 0.883665
You can use Matrix to convert to Float64.
m = Matrix{Any}([1. 2.; 3. 4.])
#2×2 Matrix{Any}:
# 1.0 2.0
# 3.0 4.0
Matrix{Float64}(m)
#Array{Float64}(m) #Alternative
#2×2 Matrix{Float64}:
# 1.0 2.0
# 3.0 4.0
It's also possible to use convert, as already shown by @przemyslaw-szufel, but without the dot (.):
convert(Matrix{Float64}, m)
#convert(Array{Float64}, m) #Alternative
#2×2 Matrix{Float64}:
# 1.0 2.0
# 3.0 4.0
The conversion using similar, shown by @AboAmmar, does not need an intermediate step:
similar(m, Float64) .= m
#2×2 Matrix{Float64}:
# 1.0 2.0
# 3.0 4.0
Benchmark
using BenchmarkTools
m = Matrix{Any}(rand(1000,1000))
@benchmark Float64.($m)
#BenchmarkTools.Trial: 32 samples with 1 evaluation.
# Range (min … max): 156.779 ms … 170.114 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 159.612 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 161.097 ms ± 4.111 ms ┊ GC (mean ± σ): 0.03% ± 0.07%
#
# ▁ ▁ ▁▁ ▁ ▄ █
# █▁█▆▁▆██▁█▆▁█▆█▆▆▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▁▁▁▁▁▁▁▆▁▁▆▁▆▁▆▆▁▁▆▁▁▁▆ ▁
# 157 ms Histogram: frequency by time 170 ms <
#
# Memory estimate: 7.63 MiB, allocs estimate: 2.
@benchmark convert.(Float64, $m)
#BenchmarkTools.Trial: 30 samples with 1 evaluation.
# Range (min … max): 168.258 ms … 177.510 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 170.444 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 170.794 ms ± 1.996 ms ┊ GC (mean ± σ): 0.05% ± 0.12%
#
# █ ██ ▃ ▃ ▃
# ▇▁▇▇▁█▁██▁▁▇▁▇█▁▇▇▁▇▇█▇▇█▁▇▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇ ▁
# 168 ms Histogram: frequency by time 178 ms <
#
# Memory estimate: 7.63 MiB, allocs estimate: 13.
@benchmark identity.($m)
#BenchmarkTools.Trial: 58 samples with 1 evaluation.
# Range (min … max): 84.857 ms … 91.658 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 85.980 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 86.873 ms ± 2.009 ms ┊ GC (mean ± σ): 0.10% ± 0.24%
#
# █ ▃▁▆ ▁ ▁
# █▄▇▇▄███▇▄█▇▁▄▁█▇▄▁▁▁▄▁▄▁▁▄▄▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▄▇▁▇▄▁▇▄▁▁▄▄ ▁
# 84.9 ms Histogram: frequency by time 90.9 ms <
#
# Memory estimate: 7.63 MiB, allocs estimate: 12.
@benchmark begin
m64 = similar($m, Float64)
m64 .= $m
end
#BenchmarkTools.Trial: 289 samples with 1 evaluation.
# Range (min … max): 15.963 ms … 21.972 ms ┊ GC (min … max): 0.00% … 3.92%
# Time (median): 17.319 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 17.332 ms ± 878.046 μs ┊ GC (mean ± σ): 2.79% ± 2.83%
#
# ▄ ▁ ▅ ▁▃▂▅▄█▂▂▁
# ▇███▆██▃▃▄▃▄▁▁▁▁▁▁▁▄▄▇█████████▇▅▄▄▆▃▃▁▆▄▅▆▅██▇▄█▄▄▅▄▄▄▁▃▄▃▃ ▄
# 16 ms Histogram: frequency by time 19 ms <
#
# Memory estimate: 22.88 MiB, allocs estimate: 999491.
@benchmark similar($m, Float64) .= $m
#BenchmarkTools.Trial: 299 samples with 1 evaluation.
# Range (min … max): 16.108 ms … 21.211 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 16.795 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 16.740 ms ± 500.870 μs ┊ GC (mean ± σ): 1.81% ± 1.82%
#
# ▂▃▄▆ ▃▁ ▄▃ █
# ▃▃▁▃▄█████▇██▅▇▄▆▆▆▃▅▃▃▁▅▃▄▃█▆████▅██▆█▆▅▆▅▅▃▃▃▄▃▃▁▃▁▄▁▁▁▁▁▃ ▃
# 16.1 ms Histogram: frequency by time 17.6 ms <
#
# Memory estimate: 22.88 MiB, allocs estimate: 999491.
@benchmark Matrix{Float64}($m)
#BenchmarkTools.Trial: 282 samples with 1 evaluation.
# Range (min … max): 16.243 ms … 23.092 ms ┊ GC (min … max): 0.00% … 7.39%
# Time (median): 18.299 ms ┊ GC (median): 4.62%
# Time (mean ± σ): 17.745 ms ± 1.196 ms ┊ GC (mean ± σ): 5.71% ± 5.45%
#
# ▃▆▆█▂ ▃▄
# ▃██████▄▆▃▄▄▅▁▃▃▄▃▃▃▁▁▁▁▁▁▁▁▁▁▁▃▃▄▅▆▇█▇███▇▇▇▆▆▃▃▃▃▁▃▁▄▃▃▁▃ ▃
# 16.2 ms Histogram: frequency by time 19.9 ms <
#
# Memory estimate: 22.88 MiB, allocs estimate: 999491.
@benchmark convert(Matrix{Float64}, $m)
#BenchmarkTools.Trial: 301 samples with 1 evaluation.
# Range (min … max): 15.912 ms … 21.628 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 16.719 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 16.622 ms ± 576.159 μs ┊ GC (mean ± σ): 2.43% ± 2.45%
#
# ▆█ ▁ ▃
# ▂▁▂▄▅▄███▇▅▅▇▃▂▃▃▆▃▁▂▂▁▁▂▃▅▇▇█▇▅██▅▆▅▄▄▃▃▃▃▂▂▂▃▂▁▂▁▂▁▂▂▁▁▁▁▃ ▃
# 15.9 ms Histogram: frequency by time 17.7 ms <
#
# Memory estimate: 22.88 MiB, allocs estimate: 999491.
In this case, similar, Matrix and convert are about 5 times faster than identity.(m) and about 10 times faster than convert.(Float64, m) or Float64.(m). However, they are less memory-efficient than identity.(m), convert.(Float64, m) and Float64.(m): here they use about 3 times more memory (22.88 MiB vs 7.63 MiB) and far more allocations.
I'm having speed issues multiplying the transpose of a sparse matrix with a column vector.
In my code the matrix A is
501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries:
⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠈⠻⣾
These are the results I get from the multiplication with f = rand(Float64, 501, 1):
Method 1
A_tr = transpose(A)
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 350.083 μs … 9.066 ms ┊ GC (min … max): 0.00% … 95.44%
Time (median): 361.208 μs ┊ GC (median): 0.00%
Time (mean ± σ): 380.269 μs ± 355.997 μs ┊ GC (mean ± σ): 4.06% ± 4.15%
Memory estimate: 218.70 KiB, allocs estimate: 11736.
Method 2
A_tr = Matrix(transpose(A))
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 87.375 μs … 210.875 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 88.542 μs ┊ GC (median): 0.00%
Time (mean ± σ): 89.286 μs ± 3.266 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
Memory estimate: 4.06 KiB, allocs estimate: 1.
Method 3
A_tr = sparse(Matrix(transpose(A)))
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.102 μs … 1.017 ms ┊ GC (min … max): 0.00% … 99.40%
Time (median): 2.477 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.725 μs ± 13.428 μs ┊ GC (mean ± σ): 6.92% ± 1.41%
Memory estimate: 4.06 KiB, allocs estimate: 1.
Why doesn't Method 1 produce a similar performance as Method 3? I'm probably missing something basic here.
Thank you for your help!
501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries
Integer is an abstract type. This is what is slowing your code down. See the performance tips.
Using the following MWE
using LinearAlgebra, BenchmarkTools, SparseArrays
A = sprand(501,501,0.005)
At1 = transpose(A)
At2 = sparse(Matrix(transpose(A)))
f = rand(Float64,501,1)
you will find no significant performance difference between
@benchmark $At1*$f
and
@benchmark $At2*$f
As @SGJ pointed out, the trick is to use a concrete (primitive) type as the index parameter of your container, i.e., SparseMatrixCSC{Float64, Int64} (which is what sprand(501,501,0.005) generates) instead of SparseMatrixCSC{Float64, Integer}.
@CRJ
IIRC, transpose(A) makes a view of A through LinearAlgebra, which requires translating coordinates for every access. I don't think the fast ways of doing MV math will work through that interface. I'm not surprised that converting your transpose to a matrix object instead of trying to do math through the view is faster.
transpose(A) yields Transpose(A), where Transpose is a lazy transpose wrapper. For sparse-matrix-dense-vector multiplication there are tailored methods, which do not require any mutations of A.
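A sketch tying these answers together (sizes and density are arbitrary): sprand already returns a matrix with the concrete index type Int; the lazy transpose and a materialized sparse transpose give the same product; and a matrix with another index type can be rebuilt with Int indices through the SparseMatrixCSC{Tv, Ti} conversion constructor. Int32 stands in below for the question's Integer index type, since I have not checked how SparseArrays handles an abstract index type. copy(transpose(A)) also avoids the dense Matrix(...) round-trip used in Method 3.

```julia
using SparseArrays, LinearAlgebra

A = sprand(501, 501, 0.005)
@assert A isa SparseMatrixCSC{Float64, Int}   # concrete index type
@assert !isconcretetype(Integer)              # the index type in the question

f = rand(501)

At_lazy = transpose(A)        # lazy wrapper, no data copied
At_sparse = copy(At_lazy)     # materialized sparse transpose, no dense step
@assert At_sparse isa SparseMatrixCSC{Float64, Int}
@assert At_lazy * f ≈ At_sparse * f

# Rebuild with a different index type; here Int32 -> Int (Int64 on 64-bit)
B = SparseMatrixCSC{Float64, Int32}(A)
A_int = SparseMatrixCSC{Float64, Int}(B)
@assert A_int == A
```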