How to efficiently initialize huge sparse arrays in Julia? - julia

There are two ways one can initialize a NXN sparse matrix, whose entries are to be read from one/multiple text files. Which one is faster? I need the more efficient one, as N is large, typically 10^6.
1). I could store the (x,y) indices in arrays x, y, the entries in an array v and declare
K = sparse(x,y,value);
2). I could declare
K = spzeros(N)
then read of the (i,j) coordinates and values v and insert them as
K[i,j]=v;
as they are being read.
I found no tips about this on Julia’s page on sparse arrays.

Don’t insert values one by one: that will be tremendously inefficient since the storage in the sparse matrix needs to be reallocated over and over again.
You can also use BenchmarkTools.jl to verify this:
julia> using SparseArrays
julia> using BenchmarkTools
julia> I = rand(1:1000, 1000); J = rand(1:1000, 1000); X = rand(1000);
julia> function fill_spzeros(I, J, X)
x = spzeros(1000, 1000)
#assert axes(I) == axes(J) == axes(X)
#inbounds for i in eachindex(I)
x[I[i], J[i]] = X[i]
end
x
end
fill_spzeros (generic function with 1 method)
julia> #btime sparse($I, $J, $X);
10.713 μs (12 allocations: 55.80 KiB)
julia> #btime fill_spzeros($I, $J, $X);
96.068 μs (22 allocations: 40.83 KiB)
Original post can be found here

Related

Fast summation of symmetric matrix in Julia

I want to sum all elements in a matrix A with dimension n times n. The matrix is symmetric and has 0s on the diagonal. The fastest way to do so that I have found is simply
sum(A). However this seems wasteful since it doesn't use the fact that I only need to calculate the lower triangle of the matrix. However, sum(tril(A, -1)) is significantly slower, and sum(A[i, j] for i = 1:n-1 for j = i+1:n) even more so. Is there a more efficient way to sum the matrix?
Edit: The solution by #AboAmmar performs well. Here is code (with summing the diagonal separately, something that can be removed if there is only zeros on the diagonal) to compare:
using BenchmarkTools
using LinearAlgebra
function sum_triu(A)
m, n = size(A)
#assert m == n
s = zero(eltype(A))
for j = 2:n
#simd for i = 1:j-1
s += #inbounds A[i,j]
end
end
s *= 2
for i = 1:n
s += A[i, i]
end
return s
end
N = 1000
A = Symmetric(rand(0:9,N,N))
A -= diagm(diag(A))
#btime sum(A)
#btime 2 * sum(tril(A))
#btime sum_triu(A)
This is 2.7X faster than sum for n = 1000 matrix. Make sure to add a #simd before the loop and use #inbounds. Also, use the correct loop order for fast memory access.
function sum_triu(A)
m, n = size(A)
#assert m == n
s = zero(eltype(A))
for j = 1:n
#simd for i = 1:j
s += #inbounds A[i,j]
end
end
return 2 * s
end
Example run on my PC:
sum_triu(A) = 499268.7328022966
sum(A) = 499268.73280229873
93.000 μs (0 allocations: 0 bytes)
249.900 μs (0 allocations: 0 bytes)
How about
2 * sum(LowerTriangular(A))
help?> LA.LowerTriangular
LowerTriangular(A::AbstractMatrix)
Construct a LowerTriangular view of the matrix A.
tril creates a new matrix, which allocates memory. Since a LowerTriangular is a view into the existing matrix, there's no memory allocation.

List of vectors slower in Julia than R?

I tried to speed up an R function by porting it to Julia, but to my surprise Julia was slower. The function sequentially updates a list of vectors (array of arrays in Julia). Beforehand the index of the list element to be updated is unknown and the length of the new vector is unknown.
I have written a test function that demonstrates the behavior.
Julia
function MyTest(n)
a = [[0.0] for i in 1:n]
for i in 1:n
a[i] = cumsum(ones(i))
end
a
end
R
MyTest <- function(n){
a <- as.list(rep(0, n))
for (i in 1:n)
a[[i]] <- cumsum(rep(1, i))
a
}
By setting n to 5000, 10000 and 20000, typical computing times are (median of 21 tests):
R: 0.14, 0.45, and 1.28 seconds
Julia: 0.31, 3.38, and 27.03 seconds
I used a windows-laptop with 64 bit Julia-1.3.1 and 64 bit R-3.6.1.
Both these functions use 64 bit floating-point types. My real problem involves integers and then R is even more favorable. But integer comparison isn’t fair since R uses 32 bit integers and Julia 64 bit.
Is it something I can do to speed up Julia or is really Julia much slower than R in this case?
I don't quite see how you get your test results. Assuming you want 32 bit integers, as you said, then we have
julia> function mytest(n)
a = Vector{Vector{Int32}}(undef, n)
for i in 1:n
a[i] = cumsum(ones(i))
end
return a
end
mytest (generic function with 1 method)
julia> #btime mytest(20000);
1.108 s (111810 allocations: 3.73 GiB)
When we only get rid of those allocations, we already get down to the following:
julia> function mytest(n)
a = Vector{Vector{Int32}}(undef, n)
#inbounds for i in 1:n
a[i] = collect(UnitRange{Int32}(1, i))
end
return a
end
mytest (generic function with 1 method)
julia> #btime mytest(20000);
115.702 ms (35906 allocations: 765.40 MiB)
Further devectorization does not even help:
julia> function mytest(n)
a = Vector{Vector{Int32}}(undef, n)
#inbounds for i in 1:n
v = Vector{Int32}(undef, i)
v[1] = 1
#inbounds for j = 2:i
v[j] = v[j-1] + 1
end
a[i] = v
end
return a
end
mytest (generic function with 1 method)
julia> #btime mytest(20000);
188.856 ms (35906 allocations: 765.40 MiB)
But with a couple of threads (I assume the inner arrays are independent), we get 2x speed-up again:
julia> Threads.nthreads()
4
julia> function mytest(n)
a = Vector{Vector{Int32}}(undef, n)
Threads.#threads for i in 1:n
v = Vector{Int32}(undef, i)
v[1] = 1
#inbounds for j = 2:i
v[j] = v[j-1] + 1
end
a[i] = v
end
return a
end
mytest (generic function with 1 method)
julia> #btime mytest(20000);
99.718 ms (35891 allocations: 763.13 MiB)
But this is only about as fast as the second variant above.
That is, for the specific case of cumsum. Other inner functions are slower, of course, but can be equally threaded, and optimized in the same ways, with possibly different results.
(This is on Julia 1.2, 12 GiB RAM, and an older i7.)
Perhaps R is doing some type of buffering for such simple functions?
Here is the Julia version with buffering:
using Memoize
#memoize function cumsum_ones(i)
cumsum(ones(i))
end
function MyTest2(n)
a = Vector{Vector{Float64}}(undef, n)
for i in 1:n
a[i] = cumsum_ones(i)
end
a
end
In a warmed-up function, the timings look the following:
julia> #btime MyTest2(5000);
442.500 μs (10002 allocations: 195.39 KiB)
julia> #btime MyTest2(10000);
939.499 μs (20002 allocations: 390.70 KiB)
julia> #btime MyTest2(20000);
3.554 ms (40002 allocations: 781.33 KiB)

How to find the index of the last maximum in julialang?

I have an array that contains repeated nonnegative integers, e.g., A=[5,5,5,0,1,1,0,0,0,3,3,0,0]. I would like to find the position of the last maximum in A. That is the largest index i such that A[i]>=A[j] for all j. In my example, i=3.
I tried to find the indices of all maximum of A then find the maximum of these indices:
A = [5,5,5,0,1,1,0,0,0,3,3,0,0];
Amax = maximum(A);
i = maximum(find(x -> x == Amax, A));
Is there any better way?
length(A) - indmax(#view A[end:-1:1]) + 1
should be pretty fast, but I didn't benchmark it.
EDIT: I should note that by definition #crstnbr 's solution (to write the algorithm from scratch) is faster (how much faster is shown in Xiaodai's response). This is an attempt to do it using julia's inbuilt array functions.
What about findlast(A.==maximum(A)) (which of course is conceptually similar to your approach)?
The fastest thing would probably be explicit loop implementation like this:
function lastindmax(x)
k = 1
m = x[1]
#inbounds for i in eachindex(x)
if x[i]>=m
k = i
m = x[i]
end
end
return k
end
I tried #Michael's solution and #crstnbr's solution and I found the latter much faster
a = rand(Int8(1):Int8(5),1_000_000_000)
#time length(a) - indmax(#view a[end:-1:1]) + 1 # 19 seconds
#time length(a) - indmax(#view a[end:-1:1]) + 1 # 18 seconds
function lastindmax(x)
k = 1
m = x[1]
#inbounds for i in eachindex(x)
if x[i]>=m
k = i
m = x[i]
end
end
return k
end
#time lastindmax(a) # 3 seconds
#time lastindmax(a) # 2.8 seconds
Michael's solution doesn't support Strings (ERROR: MethodError: no method matching view(::String, ::StepRange{Int64,Int64})) or sequences so I add another solution:
julia> lastimax(x) = maximum((j,i) for (i,j) in enumerate(x))[2]
julia> A="abžcdž"; lastimax(A) # unicode is OK
6
julia> lastimax(i^2 for i in -10:7)
1
If you more like don't catch exception for empty Sequence:
julia> lastimax(x) = !isempty(x) ? maximum((j,i) for (i,j) in enumerate(x))[2] : 0;
julia> lastimax(i for i in 1:3 if i>4)
0
Simple(!) benchmarks:
This is up to 10 times slower than Michael's solution for Float64:
julia> mlastimax(A) = length(A) - indmax(#view A[end:-1:1]) + 1;
julia> julia> A = rand(Float64, 1_000_000); #time lastimax(A); #time mlastimax(A)
0.166389 seconds (4.00 M allocations: 91.553 MiB, 4.63% gc time)
0.019560 seconds (6 allocations: 240 bytes)
80346
(I am surprised) it is 2 times faster for Int64!
julia> A = rand(Int64, 1_000_000); #time lastimax(A); #time mlastimax(A)
0.015453 seconds (10 allocations: 304 bytes)
0.031197 seconds (6 allocations: 240 bytes)
423400
it is 2-3 times slower for Strings
julia> A = ["A$i" for i in 1:1_000_000]; #time lastimax(A); #time mlastimax(A)
0.175117 seconds (2.00 M allocations: 61.035 MiB, 41.29% gc time)
0.077098 seconds (7 allocations: 272 bytes)
999999
EDIT2:
#crstnbr solution is faster and works with Strings too (doesn't work with generators). There difference between lastindmax and lastimax - first return byte index, second return character index:
julia> S = "1š3456789ž"
julia> length(S)
10
julia> lastindmax(S) # return value is bigger than length
11
julia> lastimax(S) # return character index (which is not byte index to String) of last max character
10
julia> S[chr2ind(S, lastimax(S))]
'ž': Unicode U+017e (category Ll: Letter, lowercase)
julia> S[chr2ind(S, lastimax(S))]==S[lastindmax(S)]
true

Order of linear algebra operations in Julia

If I have a command y = A*B*x where A & B are large matrices and x & y are vectors, will Julia preform y = ((A*B)*x) or y = (A*(B*x))?
The second option should be the best as it only has to allocate an extra vector rather than a large matrix.
The best way to verify this kind of thing is to dump the lowered code via #code_lowered macro:
julia> #code_lowered A * B * x
CodeInfo(:(begin
nothing
return (Core._apply)(Base.afoldl, (Core.tuple)(Base.*, (a * b) * c), xs)
end))
Like many other languages, Julia does y = (A*B)*x instead of y = A*(B*x), so it's up to you to explicitly use parens to reduce the allocation.
julia> using BenchmarkTools
julia> #btime $A * ($B * $x);
6.800 μs (2 allocations: 1.75 KiB)
julia> #btime $A * $B * $x;
45.453 μs (3 allocations: 79.08 KiB)

Updating a dense vector by a sparse vector in Julia is slow

I am using Julia version 0.4.5 and I am experiencing the following issue:
As far as I know, taking inner product between a sparse vector and a dense vector should be as fast as updating the dense vector by a sparse vector. The latter one is much slower.
A = sprand(100000,100000,0.01)
w = rand(100000)
#time for i=1:100000
w += A[:,i]
end
26.304380 seconds (1.30 M allocations: 150.556 GB, 8.16% gc time)
#time for i=1:100000
A[:,i]'*w
end
0.815443 seconds (921.91 k allocations: 1.540 GB, 5.58% gc time)
I created a simple sparse matrix type of my own, and the addition code was ~ the same as the inner product.
Am I doing something wrong? I feel like there should be a special function doing the operation w += A[:,i], but I couldn't find it.
Any help is appreciated.
I asked the same question on GitHub and we came to the following conclusion. The type SparseVector was added as of Julia 0.4 and with it the BLAS function LinAlg.axpy!, which updates in-place a (possibly dense) vector x by a sparse vector y multiplied by a scalar a, i.e. performs x += a*y efficiently. However, in Julia 0.4 it is not implemented properly. It works only in Julia 0.5
#time for i=1:100000
LinAlg.axpy!(1,A[:,i],w)
end
1.041587 seconds (799.49 k allocations: 1.530 GB, 8.01% gc time)
However, this code is still sub-optimal, as it creates the SparseVector A[:,i]. One can get an even faster version with the following function:
function upd!(w,A,i,c)
rowval = A.rowval
nzval = A.nzval
#inbounds for j = nzrange(A,i)
w[rowval[j]] += c* nzval[j]
end
return w
end
#time for i=1:100000
upd!(w,A,i,1)
end
0.500323 seconds (99.49 k allocations: 1.518 MB)
This is exactly what I needed to achieve, after some research we managed to get there, thanks everyone!
Assuming you want to compute w += c * A[:, i], there is an easy way to vectorize it:
>>> A = sprand(100000, 100000, 0.01)
>>> c = rand(100000)
>>> r1 = zeros(100000)
>>> #time for i = 1:100000
>>> r1 += A[:, i] * c[i]
>>> end
29.997412 seconds (1.90 M allocations: 152.077 GB, 12.73% gc time)
>>> #time r2 = sum(A .* c', 2);
1.191850 seconds (50 allocations: 1.493 GB, 0.14% gc time)
>>> all(r1 == r2)
true
First, create a vector c of the constants to multiply with. Then multiplay de columns of A element-wise by the values of c (A .* c', it does broadcasting inside). Last, reduce over the columns of A (the part sum(.., 2)).

Resources