How to find the index of the last maximum in Julia?

I have an array that contains repeated nonnegative integers, e.g., A = [5,5,5,0,1,1,0,0,0,3,3,0,0]. I would like to find the position of the last maximum in A, that is, the largest index i such that A[i] >= A[j] for all j. In my example, i = 3.
I tried finding the indices of all maxima of A and then taking the maximum of those indices:
A = [5,5,5,0,1,1,0,0,0,3,3,0,0];
Amax = maximum(A);
i = maximum(find(x -> x == Amax, A));
Is there any better way?

length(A) - indmax(@view A[end:-1:1]) + 1
should be pretty fast, but I didn't benchmark it.
EDIT: I should note that by definition @crstnbr's solution (writing the algorithm from scratch) is faster (how much faster is shown in Xiaodai's response). This is an attempt to do it using Julia's built-in array functions.

What about findlast(A .== maximum(A)) (which of course is conceptually similar to your approach)?
The fastest thing would probably be an explicit loop implementation like this:
function lastindmax(x)
    k = 1
    m = x[1]
    @inbounds for i in eachindex(x)
        if x[i] >= m
            k = i
            m = x[i]
        end
    end
    return k
end
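For example, on the array from the question both the one-liner and the loop return the expected index:
julia> A = [5,5,5,0,1,1,0,0,0,3,3,0,0];
julia> findlast(A .== maximum(A))
3
julia> lastindmax(A)
3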

I tried @Michael's solution and @crstnbr's solution and found the latter much faster:
a = rand(Int8(1):Int8(5), 1_000_000_000)
@time length(a) - indmax(@view a[end:-1:1]) + 1 # 19 seconds
@time length(a) - indmax(@view a[end:-1:1]) + 1 # 18 seconds
function lastindmax(x)
    k = 1
    m = x[1]
    @inbounds for i in eachindex(x)
        if x[i] >= m
            k = i
            m = x[i]
        end
    end
    return k
end
@time lastindmax(a) # 3 seconds
@time lastindmax(a) # 2.8 seconds

Michael's solution doesn't support Strings (ERROR: MethodError: no method matching view(::String, ::StepRange{Int64,Int64})) or general iterators, so I add another solution:
julia> lastimax(x) = maximum((j,i) for (i,j) in enumerate(x))[2]
julia> A = "abžcdž"; lastimax(A) # Unicode is OK
6
julia> lastimax(i^2 for i in -10:7)
1
If you'd rather get 0 than an exception for an empty sequence:
julia> lastimax(x) = !isempty(x) ? maximum((j,i) for (i,j) in enumerate(x))[2] : 0;
julia> lastimax(i for i in 1:3 if i>4)
0
Simple(!) benchmarks:
This is up to 10 times slower than Michael's solution for Float64:
julia> mlastimax(A) = length(A) - indmax(@view A[end:-1:1]) + 1;
julia> A = rand(Float64, 1_000_000); @time lastimax(A); @time mlastimax(A)
0.166389 seconds (4.00 M allocations: 91.553 MiB, 4.63% gc time)
0.019560 seconds (6 allocations: 240 bytes)
80346
(To my surprise) it is 2 times faster for Int64!
julia> A = rand(Int64, 1_000_000); @time lastimax(A); @time mlastimax(A)
0.015453 seconds (10 allocations: 304 bytes)
0.031197 seconds (6 allocations: 240 bytes)
423400
It is 2-3 times slower for Strings:
julia> A = ["A$i" for i in 1:1_000_000]; @time lastimax(A); @time mlastimax(A)
0.175117 seconds (2.00 M allocations: 61.035 MiB, 41.29% gc time)
0.077098 seconds (7 allocations: 272 bytes)
999999
EDIT2:
@crstnbr's solution is faster and works with Strings too (though not with generators). There is a difference between lastindmax and lastimax: the first returns a byte index, the second a character index:
julia> S = "1š3456789ž"
julia> length(S)
10
julia> lastindmax(S) # return value is bigger than length
11
julia> lastimax(S) # returns the character index (not the byte index into the String) of the last maximum character
10
julia> S[chr2ind(S, lastimax(S))]
'ž': Unicode U+017e (category Ll: Letter, lowercase)
julia> S[chr2ind(S, lastimax(S))]==S[lastindmax(S)]
true
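(A side note not in the original thread: chr2ind was deprecated in Julia 0.7 in favor of nextind, so on current versions the equivalent lookup would be:)
julia> S[nextind(S, 0, lastimax(S))] == S[lastindmax(S)]
true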

Related

Optimize looping over a large string to reduce allocations

I am trying to loop over a string in Julia to parse it. I have a DefaultDict inside a struct, containing the number of times I have seen a particular character.
@with_kw mutable struct Metrics
    ...
    nucleotides = DefaultDict{Char, Int64}(0)
    ...
end
I have written a function to loop over a string and increment the value of each character in the DefaultDict.
function compute_base_composition(sequence::String, metrics::Metrics)
    for i in 1:sizeof(sequence)
        metrics.nucleotides[sequence[i]] += 1
    end
end
This function is called in a for loop because I need to do this for multiple strings (which can be up to 2 billion characters long). When I run the @time macro, I get this result:
@time compute_base_composition(sequence, metrics)
0.167172 seconds (606.20 k allocations: 15.559 MiB, 78.00% compilation time)
0.099403 seconds (1.63 M allocations: 24.816 MiB)
0.032346 seconds (633.24 k allocations: 9.663 MiB)
0.171382 seconds (3.06 M allocations: 46.751 MiB, 4.64% gc time)
As you can see, there are a lot of memory allocations for such a simple function. I have tried to change the for loop to something like for c in sequence but that didn't change much. Would there be a way to reduce them and make the function faster?
Work on bytes, not on Unicode chars.
Use Vectors, not Dicts.
Avoid untyped fields in containers.
@with_kw struct MetricsB
    nucleotides::Vector{Int} = zeros(Int, 256)
end
function compute_base_composition(sequence::String, metrics::MetricsB)
    bs = Vector{UInt8}(sequence)
    for i in 1:length(bs)
        @inbounds metrics.nucleotides[bs[i]] += 1
    end
end
And a benchmark with a nice speedup of 90x:
julia> st = randstring(10_000_000);
julia> @time compute_base_composition(st, Metrics())
1.793991 seconds (19.94 M allocations: 304.213 MiB, 3.33% gc time)
julia> @time compute_base_composition(st, MetricsB())
0.019398 seconds (3 allocations: 9.539 MiB)
Actually you can almost totally avoid allocations with the following code:
function compute_base_composition2(sequence::String, metrics::MetricsB)
    pp = pointer(sequence)
    for i in 1:length(sequence)
        @inbounds metrics.nucleotides[Base.pointerref(pp, i, 1)] += 1
    end
end
and now:
julia> @time compute_base_composition2(st, MetricsB())
0.021161 seconds (1 allocation: 2.125 KiB)
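A side note not in the original answer: on Julia 0.7 and later, the same zero-allocation byte loop can be written without raw pointers by using codeunits, which returns a zero-copy byte view of the string. A minimal sketch, assuming the MetricsB struct defined above:
function compute_base_composition3(sequence::String, metrics::MetricsB)
    # codeunits gives an AbstractVector{UInt8} view of the string's bytes
    # without copying; like the answers above, this indexes by raw byte
    # value and assumes no zero bytes occur in the input.
    for b in codeunits(sequence)
        @inbounds metrics.nucleotides[b] += 1
    end
end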

List of vectors slower in Julia than R?

I tried to speed up an R function by porting it to Julia, but to my surprise Julia was slower. The function sequentially updates a list of vectors (array of arrays in Julia). Beforehand the index of the list element to be updated is unknown and the length of the new vector is unknown.
I have written a test function that demonstrates the behavior.
Julia
function MyTest(n)
    a = [[0.0] for i in 1:n]
    for i in 1:n
        a[i] = cumsum(ones(i))
    end
    a
end
R
MyTest <- function(n){
    a <- as.list(rep(0, n))
    for (i in 1:n)
        a[[i]] <- cumsum(rep(1, i))
    a
}
By setting n to 5000, 10000 and 20000, typical computing times are (median of 21 tests):
R: 0.14, 0.45, and 1.28 seconds
Julia: 0.31, 3.38, and 27.03 seconds
I used a windows-laptop with 64 bit Julia-1.3.1 and 64 bit R-3.6.1.
Both these functions use 64-bit floating-point types. My real problem involves integers, where R is even more favorable. But an integer comparison isn't fair, since R uses 32-bit integers and Julia 64-bit ones.
Is there something I can do to speed up Julia, or is Julia really much slower than R in this case?
I don't quite see how you get your test results. Assuming you want 32-bit integers, as you said, we have
julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           for i in 1:n
               a[i] = cumsum(ones(i))
           end
           return a
       end
mytest (generic function with 1 method)
julia> @btime mytest(20000);
1.108 s (111810 allocations: 3.73 GiB)
Just by getting rid of those intermediate allocations, we already get down to the following:
julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           @inbounds for i in 1:n
               a[i] = collect(UnitRange{Int32}(1, i))
           end
           return a
       end
mytest (generic function with 1 method)
julia> @btime mytest(20000);
115.702 ms (35906 allocations: 765.40 MiB)
Further devectorization does not even help:
julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           @inbounds for i in 1:n
               v = Vector{Int32}(undef, i)
               v[1] = 1
               @inbounds for j = 2:i
                   v[j] = v[j-1] + 1
               end
               a[i] = v
           end
           return a
       end
mytest (generic function with 1 method)
julia> @btime mytest(20000);
188.856 ms (35906 allocations: 765.40 MiB)
But with a couple of threads (I assume the inner arrays are independent), we get 2x speed-up again:
julia> Threads.nthreads()
4
julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           Threads.@threads for i in 1:n
               v = Vector{Int32}(undef, i)
               v[1] = 1
               @inbounds for j = 2:i
                   v[j] = v[j-1] + 1
               end
               a[i] = v
           end
           return a
       end
mytest (generic function with 1 method)
julia> @btime mytest(20000);
99.718 ms (35891 allocations: 763.13 MiB)
But this is only about as fast as the second variant above.
That is, for the specific case of cumsum. Other inner functions are slower, of course, but can be equally threaded, and optimized in the same ways, with possibly different results.
(This is on Julia 1.2, 12 GiB RAM, and an older i7.)
Perhaps R is doing some type of buffering for such simple functions?
Here is the Julia version with buffering:
using Memoize
@memoize function cumsum_ones(i)
    cumsum(ones(i))
end
function MyTest2(n)
    a = Vector{Vector{Float64}}(undef, n)
    for i in 1:n
        a[i] = cumsum_ones(i)
    end
    a
end
In a warmed-up function, the timings look the following:
julia> @btime MyTest2(5000);
442.500 μs (10002 allocations: 195.39 KiB)
julia> @btime MyTest2(10000);
939.499 μs (20002 allocations: 390.70 KiB)
julia> @btime MyTest2(20000);
3.554 ms (40002 allocations: 781.33 KiB)
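One caveat worth adding (not raised in the thread): @memoize caches and returns the same array object for each i, so results from different calls share storage, and mutating an entry mutates the cache:
r = MyTest2(3);
r[2][1] = 99.0;
MyTest2(3)[2]  # now [99.0, 2.0]: the cached vector was mutated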

How to efficiently initialize huge sparse arrays in Julia?

There are two ways one can initialize an N×N sparse matrix whose entries are to be read from one or more text files. Which one is faster? I need the more efficient one, as N is large, typically 10^6.
1) I could store the (x,y) indices in arrays x and y and the entries in an array v, and declare
K = sparse(x, y, v);
2) I could declare
K = spzeros(N, N)
then read the (i,j) coordinates and values v and insert them as
K[i,j] = v;
as they are being read.
I found no tips about this on Julia’s page on sparse arrays.
Don’t insert values one by one: that will be tremendously inefficient since the storage in the sparse matrix needs to be reallocated over and over again.
You can also use BenchmarkTools.jl to verify this:
julia> using SparseArrays
julia> using BenchmarkTools
julia> I = rand(1:1000, 1000); J = rand(1:1000, 1000); X = rand(1000);
julia> function fill_spzeros(I, J, X)
           x = spzeros(1000, 1000)
           @assert axes(I) == axes(J) == axes(X)
           @inbounds for i in eachindex(I)
               x[I[i], J[i]] = X[i]
           end
           x
       end
fill_spzeros (generic function with 1 method)
julia> @btime sparse($I, $J, $X);
10.713 μs (12 allocations: 55.80 KiB)
julia> @btime fill_spzeros($I, $J, $X);
96.068 μs (22 allocations: 40.83 KiB)
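Following that advice for the file-reading case in the question: collect all the triplets first, then call sparse once at the end. A minimal sketch; the function name and the file format (one whitespace-separated "i j v" triplet per line) are assumptions, not from the original answer:
using SparseArrays, DelimitedFiles

function read_sparse(path, N)
    data = readdlm(path)          # assumed format: one "i j v" row per entry
    I = Int.(data[:, 1])          # row indices
    J = Int.(data[:, 2])          # column indices
    V = Float64.(data[:, 3])      # values
    return sparse(I, J, V, N, N)  # single call; duplicate entries are summed
end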

Conditional closures in Julia

In many applications of map(f,X), it helps to create closures that, depending on parameters, apply different functions f to data X.
I can think of at least the following three ways to do this (note that the second for some reason does not work; a bug?):
f0(x,y) = x+y
f1(x,y,p) = x+y^p
function g0(power::Bool,X,y)
    if power
        f = x -> f1(x,y,2.0)
    else
        f = x -> f0(x,y)
    end
    map(f,X)
end
function g1(power::Bool,X,y)
    if power
        f(x) = f1(x,y,2.0)
    else
        f(x) = f0(x,y)
    end
    map(f,X)
end
abstract FunType
abstract PowerFun <: FunType
abstract NoPowerFun <: FunType
function g2{S<:FunType}(T::Type{S},X,y)
    f(::Type{PowerFun},x) = f1(x,y,2.0)
    f(::Type{NoPowerFun},x) = f0(x,y)
    map(x -> f(T,x),X)
end
X = 1.0:1000000.0
burnin0 = g0(true,X,4.0) + g0(false,X,4.0);
burnin1 = g1(true,X,4.0) + g1(false,X,4.0);
burnin2 = g2(PowerFun,X,4.0) + g2(NoPowerFun,X,4.0);
@time r0true = g0(true,X,4.0);   # 0.019515 seconds (12 allocations: 7.630 MB)
@time r0false = g0(false,X,4.0); # 0.002984 seconds (12 allocations: 7.630 MB)
@time r1true = g1(true,X,4.0);   # 0.004517 seconds (8 allocations: 7.630 MB, 26.28% gc time)
@time r1false = g1(false,X,4.0); # UndefVarError: f not defined
@time r2true = g2(PowerFun,X,4.0);    # 0.085673 seconds (2.00 M allocations: 38.147 MB, 3.90% gc time)
@time r2false = g2(NoPowerFun,X,4.0); # 0.234087 seconds (2.00 M allocations: 38.147 MB, 60.61% gc time)
What is the optimal way to do this in Julia?
There's no need to use map here at all. Using a closure doesn't make things simpler or faster. Just use "dot-broadcasting" to apply the functions directly:
function g3(X,y,power=1)
    if power != 1
        return f1.(X, y, power) # or simply X .+ y^power
    else
        return f0.(X, y) # or simply X .+ y
    end
end
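For example, with f0, f1, and g0 as defined in the question, g3 reproduces the same results:
julia> X = 1.0:1000000.0;
julia> g3(X, 4.0, 2.0) == g0(true, X, 4.0)
true
julia> g3(X, 4.0) == g0(false, X, 4.0)
true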

Updating a dense vector by a sparse vector in Julia is slow

I am using Julia version 0.4.5 and I am experiencing the following issue:
As far as I know, taking the inner product of a sparse vector and a dense vector should be about as fast as updating the dense vector by a sparse vector. Yet the latter is much slower:
A = sprand(100000,100000,0.01)
w = rand(100000)
@time for i=1:100000
    w += A[:,i]
end
26.304380 seconds (1.30 M allocations: 150.556 GB, 8.16% gc time)
@time for i=1:100000
    A[:,i]'*w
end
0.815443 seconds (921.91 k allocations: 1.540 GB, 5.58% gc time)
I created a simple sparse matrix type of my own, and there the addition code was about the same speed as the inner product.
Am I doing something wrong? I feel like there should be a special function for the operation w += A[:,i], but I couldn't find it.
Any help is appreciated.
I asked the same question on GitHub and we came to the following conclusion. The type SparseVector was added as of Julia 0.4, and with it the BLAS function LinAlg.axpy!, which updates a (possibly dense) vector x in place by a sparse vector y multiplied by a scalar a, i.e. performs x += a*y efficiently. However, in Julia 0.4 it is not implemented properly; it works only in Julia 0.5:
@time for i=1:100000
    LinAlg.axpy!(1, A[:,i], w)
end
1.041587 seconds (799.49 k allocations: 1.530 GB, 8.01% gc time)
However, this code is still sub-optimal, as it creates the SparseVector A[:,i]. One can get an even faster version with the following function:
function upd!(w,A,i,c)
    rowval = A.rowval
    nzval = A.nzval
    @inbounds for j = nzrange(A,i)
        w[rowval[j]] += c*nzval[j]
    end
    return w
end
@time for i=1:100000
    upd!(w,A,i,1)
end
0.500323 seconds (99.49 k allocations: 1.518 MB)
This is exactly what I needed to achieve; after some research we managed to get there. Thanks everyone!
Assuming you want to compute w += c * A[:, i], there is an easy way to vectorize it:
julia> A = sprand(100000, 100000, 0.01);
julia> c = rand(100000);
julia> r1 = zeros(100000);
julia> @time for i = 1:100000
           r1 += A[:, i] * c[i]
       end
29.997412 seconds (1.90 M allocations: 152.077 GB, 12.73% gc time)
julia> @time r2 = sum(A .* c', 2);
1.191850 seconds (50 allocations: 1.493 GB, 0.14% gc time)
julia> all(r1 == r2)
true
First, create a vector c of the constants to multiply by. Then multiply the columns of A element-wise by the values of c (A .* c' broadcasts internally). Last, reduce over the columns of A (the sum(..., 2) part).
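A further note not in the original answer: this reduction is mathematically just the matrix-vector product A * c, which dispatches to the optimized sparse matrix-vector kernel and avoids materializing A .* c' entirely:
julia> r3 = A * c;
julia> r3 ≈ vec(r2)
true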
