I'm attempting to translate the equivalent of the following Python code (from SMT GEKPLS) into Julia:
def differences(X, Y):
    D = X[:, np.newaxis, :] - Y[np.newaxis, :, :]
    return D.reshape((-1, X.shape[1]))
So, given an input like this:
X = np.array([[1.0,1.0,1.0], [2.0,2.0,2.0]])
Y = np.array([[1.0,2.0,3.0], [4.0,5.0,6.0], [7.0,8.0,9.0]])
diff = differences(X,Y)
We get an output (diff) that looks like this:
[[ 0. -1. -2.]
[-3. -4. -5.]
[-6. -7. -8.]
[ 1. 0. -1.]
[-2. -3. -4.]
[-5. -6. -7.]]
What is an efficient way to do this with Julia code? I expect the X and Y input matrices to be quite large.
After some thinking, I came to this function:
function differences(X, Y)
    Rx = repeat(X, inner=(size(Y, 1), 1))
    Ry = repeat(Y, size(X, 1))
    Rx - Ry
end
I hope this is helpful.
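A quick sanity check against the numpy output above (my own test, with X and Y entered as Julia matrices):

X = [1.0 1.0 1.0; 2.0 2.0 2.0]
Y = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 9.0]
differences(X, Y)  # 6×3 Matrix{Float64}; rows match the diff output shown above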
Here's a version that avoids repeat, which creates unnecessary data duplication:
function diffs_row(X, Y)
    N = size(X, 2)
    return reshape(reshape(X', 1, N, :) .- Y', N, :)'
end
The reason for all the adjoints (') is that it isn't really natural to operate row-wise in Julia: arrays are column-major, so reshape retrieves data column-wise. If you decide instead to change the orientation of the data, you could write
function diffs_col(X, Y)
    N = size(X, 1)
    return reshape(reshape(X, N, 1, :) .- Y, N, :)
end
instead.
One often sees this when translating numpy code to Julia. Numpy is natively row-major, so the translation becomes a bit awkward. In many cases you should consider changing your data layout to column-major.
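For example (my own illustration, reusing the X and Y matrices defined in the check above), under a column-major layout each point becomes a column and diffs_col returns the differences as columns:

Xc = permutedims(X)  # 3×2: each column is one point of X
Yc = permutedims(Y)  # 3×3: each column is one point of Y
diffs_col(Xc, Yc)    # 3×6: each column is one pairwise difference vector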
This might be faster than other alternatives, while still being easy to understand. (Note that here X and Y are treated as collections of points, e.g. vectors of vectors, rather than the matrices used in the question.)
[x .- y for x ∈ X for y ∈ Y]
6-element Vector{Vector{Float64}}:
[0.0, -1.0, -2.0]
[-3.0, -4.0, -5.0]
[-6.0, -7.0, -8.0]
[1.0, 0.0, -1.0]
[-2.0, -3.0, -4.0]
[-5.0, -6.0, -7.0]
One thing I dislike about numpy is that you have to remember each function together with its particular combination of input parameters. In Julia, a traditional loop can serve as an efficient drop-in replacement for most algorithms.
Addendum: The above might be the fastest solution as I said, provided that working with a Vector{Vector{Float64}} is not an issue. If it is, here is another solution that outputs a Matrix{Float64} while being fast as well.
function diffr(X, Y)
    i, l, m, n = 0, length(first(X)), length(X), length(Y)
    Z = Matrix{Float64}(undef, m*n, l)
    for x in X, y in Y
        Z[i+=1, :] .= x .- y
    end
    Z
end
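Both diffr and the comprehension above iterate X and Y as collections of points. If your data starts out as the matrices from the question, a small adapter converts them (my own sketch; eachrow requires Julia 1.1 or later):

Xrows = collect(eachrow(X))  # 2 row views, each of length 3
Yrows = collect(eachrow(Y))
diffr(Xrows, Yrows)          # 6×3 Matrix{Float64} matching the numpy output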
And here is a performance comparison of all posted solutions on my computer.
@btime [x .- y for x ∈ $X for y ∈ $Y]  # 312.245 ns (9 allocations: 656 bytes)
@btime diffr($X, $Y)                   # 73.868 ns (1 allocation: 208 bytes)
@btime differences($X, $Y)             # 439.000 ns (12 allocations: 896 bytes)
@btime diffs_row($X, $Y)               # 463.131 ns (11 allocations: 784 bytes)
I'm trying to use Optim in Julia to solve a two-variable minimization problem, similar to the following:
x = [1.0, 2.0, 3.0]
y = 1.0 .+ 2.0 .* x .+ [-0.3, 0.3, -0.1]
function sqerror(betas, X, Y)
    err = 0.0
    for i in 1:length(X)
        pred_i = betas[1] + betas[2] * X[i]
        err += (Y[i] - pred_i)^2
    end
    return err
end
res = optimize(b -> sqerror(b, x, y), [0.0,0.0])
res.minimizer
I do not quite understand what [0.0,0.0] means. Looking at the documentation (http://julianlsolvers.github.io/Optim.jl/v0.9.3/user/minimization/), my understanding is that it is the initial condition. However, if I change it to [0.0, 0.0, 0.0], the algorithm still works despite the fact that I only have two unknowns, and it gives me three minimizers instead of two. I was wondering if anyone knows what [0.0,0.0] really stands for.
It is the initial value. optimize by itself cannot know how many parameters your sqerror function takes; you specify that by passing this initial value.
For example, if you add a dimensionality check to sqerror, you get a proper error:
julia> function sqerror(betas::AbstractVector, X::AbstractVector, Y::AbstractVector)
           @assert length(betas) == 2
           err = 0.0
           for i in eachindex(X, Y)
               pred_i = betas[1] + betas[2] * X[i]
               err += (Y[i] - pred_i)^2
           end
           return err
       end
sqerror (generic function with 2 methods)
julia> optimize(b -> sqerror(b, x, y), [0.0,0.0,0.0])
ERROR: AssertionError: length(betas) == 2
Note that I also changed the loop condition to eachindex(X, Y), which makes the function check that the X and Y vectors have aligned indices.
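For example (my own illustration, not from the original answer), eachindex with several arguments throws if their index sets disagree:

a = [1, 2, 3]; b = [10, 20];
eachindex(a, b)  # throws a DimensionMismatch because the lengths differ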
Finally, if you want performance and want to reduce compilation cost (e.g. assuming you run this optimization many times), it would be better to define your objective function like this:
objective_factory(x, y) = b -> sqerror(b, x, y)
optimize(objective_factory(x, y), [0.0,0.0])
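Putting it together, a minimal runnable sketch (assuming the Optim package is installed; Optim.minimizer is the accessor for the solution vector):

using Optim

x = [1.0, 2.0, 3.0]
y = 1.0 .+ 2.0 .* x .+ [-0.3, 0.3, -0.1]

res = optimize(objective_factory(x, y), [0.0, 0.0])  # two unknowns, so a length-2 start
Optim.minimizer(res)  # approximately recovers intercept ≈ 1 and slope ≈ 2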
x = [1, 2, 3, 4]
y = [1, 2]
If I want to be able to operate on the two vectors with a default value filling in the shorter one, what are the strategies?
E.g. I would like to do the following, implicitly filling in with 0 or missing:
x + y # would like [2, 4, 3, 4]
Ideally I would like to do this in a generic way, so that I could perform arbitrary operations on the two.
Disregarding whether Julia has something built-in to do this, remember that Julia is fast. This means that you can write code to support this kind of need.
extend!(x, y::Vector, default=0) = extend!(x, length(y), default)

extend!(x, n::Int, default=0) = begin
    while length(x) < n
        push!(x, default)
    end
    x
end
Then when you have code such as you describe, you can symmetrically extend x and y:
x = [1, 2, 3, 4]
y = [1, 2]
extend!(x, y)
extend!(y, x)
x + y
==> [2, 4, 3, 4]
Note that this mutates y. In many cases, the desired length would come from outside the code and would be applied to both x and y. I can also imagine that 0 is a bad default in general (even though it is completely appropriate in your context of addition).
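If mutation is a concern, a non-mutating variant is easy to sketch (extend here is my own hypothetical helper, not part of the answer above):

extend(x, n::Int, default=0) = vcat(x, fill(default, max(0, n - length(x))))

x, y = [1, 2, 3, 4], [1, 2]
n = max(length(x), length(y))
extend(x, n) + extend(y, n)  # [2, 4, 3, 4]; neither x nor y is modified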
A comment below makes the worthy point that you should consider using append! instead of looping over push!. In fact, it is best to measure such differences if you care about them. I went ahead and tested:
julia> using BenchmarkTools
julia> extend1(x, n) = begin
           while length(x) < n
               push!(x, 0)
           end
           x
       end

julia> @btime begin
           x = rand(10)
           sum(x)
       end
59.815 ns (1 allocation: 160 bytes)
5.037723569560573
julia> @btime begin
           x = rand(10)
           extend1(x, 1000)
           sum(x)
       end
7.281 μs (8 allocations: 20.33 KiB)
6.079832879992913
julia> x = rand(10)
julia> @btime begin
           x = rand(10)
           append!(x, zeros(990))
           sum(x)
       end
1.290 μs (3 allocations: 15.91 KiB)
3.688526541987817
Pushing primitives in a loop is damned fast; allocating a vector of zeros so we can use append! is noticeably faster still.
But the real lesson here is that the loop version takes mere microseconds to append nearly 1000 values (reallocating the array several times along the way). Appending 10 values one by one takes just over 150 ns (and append! is slightly faster). This is blindingly fast. Literally doing nothing in R or Python can take longer than this.
This difference would matter in some situations and would be undetectable in many others. If it matters, measure. If it doesn't, do the simplest thing that comes to mind because Julia has your back (performance-wise).
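One more knob worth knowing about if the push! loop itself ever shows up in a profile: sizehint! reserves capacity up front so the array does not reallocate while growing. A hypothetical variant (my own, not benchmarked above):

extend_hinted!(x, n::Int, default=0) = begin
    sizehint!(x, n)        # reserve capacity once
    while length(x) < n
        push!(x, default)  # no reallocation during the loop
    end
    x
end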
FURTHER UPDATE
Taking a hint from another of Colin's comments, here are results where we use append! but don't allocate a vector. Instead, we pass a generator: a lazy object that produces values on demand through an interface much like an array's. The results are much better than what I showed above.
julia> @btime begin
           x = rand(10)
           append!(x, (0 for i in 1:990))
           sum(x)
       end
565.814 ns (2 allocations: 8.03 KiB)
Note the round brackets around 0 for i in 1:990; they are what make it a generator expression.
In the end, Colin was right: using append! is much faster if we can avoid the related overheads. Surprisingly, the base function Iterators.repeated(0, 990) is much slower.
But, no matter what, all of these options are pretty blazingly fast and all of them would probably be so fast that none of these subtle differences would matter.
Julia is fun!
Note that if you want to fill with missing, or with some other value whose type differs from the element type of your original vector, then you will need to widen the vector's element type to allow the new elements. The function below handles all of these cases.
function fillvectors(x, y, fillvalue=missing)
    xl = length(x)
    yl = length(y)
    if xl < yl
        x::Vector{Union{eltype(x), typeof(fillvalue)}} = x
        for i in xl+1:yl
            push!(x, fillvalue)
        end
    end
    if yl < xl
        y::Vector{Union{eltype(y), typeof(fillvalue)}} = y
        for i in yl+1:xl
            push!(y, fillvalue)
        end
    end
    return x, y
end
x = [1, 2, 3, 4]
y = [1, 2]
julia> (x, y) = fillvectors(x, y)
([1, 2, 3, 4], Union{Missing, Int64}[1, 2, missing, missing])
julia> y
4-element Vector{Union{Missing, Int64}}:
1
2
missing
missing
julia> y = [1, 2];

julia> (x, y) = fillvectors(x, y, 0)
([1, 2, 3, 4], [1, 2, 0, 0])
julia> y
4-element Vector{Int64}:
1
2
0
0
julia> y = [1, 2];

julia> (x, y) = fillvectors(x, y, 1.001)
([1, 2, 3, 4], Union{Float64, Int64}[1, 2, 1.001, 1.001])
julia> y
4-element Vector{Union{Float64, Int64}}:
1
2
1.001
1.001
Here is my Julia code, which I would like to speed up. Is there any way to make this faster? It takes 0.5 seconds for a dataset of 50k*50k. I was expecting Julia to be a lot faster than this, or perhaps I am doing a silly implementation.
ar = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]
SV = rand(10,5)
function h_score_0(ar, SV)
    m = length(ar)
    SC = Array{Float64,2}(undef, size(SV, 2), m)
    for iter = 1:m
        nodes = ar[iter]
        for jj = 1:size(SV, 2)
            mx = maximum(SV[nodes, jj])
            mn = minimum(SV[nodes, jj])
            term1 = (mx - mn)^2
            SC[jj, iter] = term1
        end
    end
    return sum(SC, dims = 1)
end
You have some unnecessary allocations in your code:
mx = maximum(SV[nodes, jj])
mn = minimum(SV[nodes, jj])
Slices allocate, so each of these lines makes a copy of the data; you're actually copying the data twice, once on each line. You can either make sure to copy only once, or, even better, use view so there is no copy at all (note that view is much faster on Julia v1.5 than on older versions, in case you are using one of those).
SC = Array{Float64,2}(undef, size(SV, 2), m)
And there is no reason to create a matrix here and sum over it afterwards; just accumulate while you iterate:
score[i] += (mx - mn)^2
Here's a function that is >5x as fast on my laptop for the input data you specified:
function h_score_1(ar, SV)
    score = zeros(eltype(SV), length(ar))
    @inbounds for i in eachindex(ar)
        nodes = ar[i]
        for j in axes(SV, 2)
            SVview = view(SV, nodes, j)
            mx = maximum(SVview)
            mn = minimum(SVview)
            score[i] += (mx - mn)^2
        end
    end
    return score
end
This function outputs a one-dimensional vector instead of the 1×N matrix returned by your original function.
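If downstream code expects the original 1×N shape, a one-line adapter recovers it (my own addition, not part of the answer):

score_row = reshape(h_score_1(ar, SV), 1, :)  # 1×5 Matrix{Float64}, like h_score_0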
In principle, this could be even faster if we replace
mx = maximum(SVview)
mn = minimum(SVview)
with
(mn, mx) = extrema(SVview)
which only traverses the vector once, instead of twice. Unfortunately, there is a performance issue with extrema, so it is currently not as fast as separate maximum/minimum calls: https://github.com/JuliaLang/julia/issues/31442
Finally, to get the absolute best performance at the cost of brevity, we can avoid creating a view altogether and turn the calls to maximum and minimum into a single explicit loop:
function h_score_2(ar, SV)
    score = zeros(eltype(SV), length(ar))
    @inbounds for i in eachindex(ar)
        nodes = ar[i]
        for j in axes(SV, 2)
            mx, mn = -Inf, +Inf
            for node in nodes
                x = SV[node, j]
                mx = ifelse(x > mx, x, mx)
                mn = ifelse(x < mn, x, mn)
            end
            score[i] += (mx - mn)^2
        end
    end
    return score
end
This also avoids the performance issue that extrema suffers from, and it looks up each SV element only once per node. Although this version is annoying to write, it's substantially faster, even on Julia 1.5 where views are free. Here are some benchmark timings with your test data:
julia> using BenchmarkTools
julia> @btime h_score_0($ar, $SV)
2.344 μs (52 allocations: 6.19 KiB)
1×5 Matrix{Float64}:
1.95458 2.94592 2.79438 0.709745 1.85877
julia> @btime h_score_1($ar, $SV)
392.035 ns (1 allocation: 128 bytes)
5-element Vector{Float64}:
1.9545848011260765
2.9459235098820167
2.794383144368953
0.7097448590904598
1.8587691646610984
julia> @btime h_score_2($ar, $SV)
118.243 ns (1 allocation: 128 bytes)
5-element Vector{Float64}:
1.9545848011260765
2.9459235098820167
2.794383144368953
0.7097448590904598
1.8587691646610984
So explicitly writing out the innermost loop is worth it here, reducing the time by another 3x or so. It's annoying that the Julia compiler isn't yet able to generate code this efficient on its own, but it gets smarter with every version. On the other hand, the explicit loop version will be fast forever, so if this code is really performance-critical, it's probably worth writing it out like this.
There are two ways one can initialize an N×N sparse matrix whose entries are to be read from one or more text files. Which one is faster? I need the more efficient one, as N is large, typically 10^6.
1) I could store the (x, y) indices in arrays x and y and the entries in an array v, and declare
K = sparse(x, y, v);
2) I could declare
K = spzeros(N, N)
then read the (i, j) coordinates and values v and insert them as
K[i, j] = v;
as they are being read.
I found no tips about this on Julia’s page on sparse arrays.
Don’t insert values one by one: that will be tremendously inefficient since the storage in the sparse matrix needs to be reallocated over and over again.
You can also use BenchmarkTools.jl to verify this:
julia> using SparseArrays
julia> using BenchmarkTools
julia> I = rand(1:1000, 1000); J = rand(1:1000, 1000); X = rand(1000);
julia> function fill_spzeros(I, J, X)
           x = spzeros(1000, 1000)
           @assert axes(I) == axes(J) == axes(X)
           @inbounds for i in eachindex(I)
               x[I[i], J[i]] = X[i]
           end
           x
       end
fill_spzeros (generic function with 1 method)
julia> @btime sparse($I, $J, $X);
10.713 μs (12 allocations: 55.80 KiB)
julia> @btime fill_spzeros($I, $J, $X);
96.068 μs (22 allocations: 40.83 KiB)
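For the file-reading part of the question, the same principle applies: collect all triplets first, then call sparse once. A hedged sketch, assuming a text file with one whitespace-separated "i j v" triplet per line (the file name and layout are made up for illustration):

using SparseArrays, DelimitedFiles

N = 10^6                      # matrix dimension from the question
data = readdlm("entries.txt") # hypothetical file: columns i, j, v
I = Int.(data[:, 1])
J = Int.(data[:, 2])
V = Float64.(data[:, 3])
K = sparse(I, J, V, N, N)     # build the N×N matrix in one shot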
Is there a generic way to zero out small values in an array?
By "small" I mean elements whose absolute value is less than some threshold like 10.0^-5.
Edit: For now, I loop with eachindex.
function sparsify(a, eps)
    for i in eachindex(a)
        if abs(a[i]) < eps
            a[i] = 0
        end
    end
end
Why not just apply a mask built with the element-wise less-than operator?
x = rand(Float32, 100)
eps = 0.5
x[abs(x) .< eps] = 0
or as a function (note that the function modifies the vector x in place):
sparsify!(x, eps) = x[abs(x) .< eps] = 0;
You could also replace 0 with zero(eltype(x)) to ensure it has the same type as x.
The temporary boolean mask created by abs(x) .< eps compares every element of x against eps; every element that satisfies the condition is then set to 0.
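Note that on Julia 1.x this answer's syntax no longer works: abs must be broadcast with a dot and the assignment must be dotted as well. A minimal sketch of the modern spelling (my own, under those assumptions):

sparsify!(x, eps) = (x[abs.(x) .< eps] .= zero(eltype(x)); x)

x = rand(Float32, 100)
sparsify!(x, 0.5f0)  # zeroes every entry with absolute value below 0.5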
To complete Imanol Luengo's answer, and to extend it to multiple dimensions:
x[abs.(x) .< eps(eltype(x))] .= zero(eltype(x))
I ended up with a vectorized method, which is much shorter.
sparsify(x, eps) = abs(x) < eps ? 0.0 : x
@vectorize_2arg Float64 sparsify
Disclaimer (2019): The below answer is badly out of date, and refers to an old version of Julia (<0.7). In version 1.x you should instead use x .= 0 or fill!(x, 0).
What approach to choose depends on what you need. If you just need a simple one-liner, then the vectorized version is fine. But if you want optimal performance, a loop will serve you better.
Here are a few alternatives compared by performance. Do keep in mind that map is slow on version 0.4. The timings here are done with version 0.5.
function zerofy!(x, vmin)
    for (i, val) in enumerate(x)
        if abs(val) < vmin
            x[i] = zero(eltype(x))
        end
    end
end

zerofy2!(x, vmin) = ( x[abs(x) .< vmin] = zero(eltype(x)) )

zerofy3(x, eps) = abs(x) < eps ? 0.0 : x
@vectorize_2arg Float64 zerofy3

zerofy4(y, vmin) = map(x -> abs(x) < vmin ? zero(x) : x, y)
zerofy4!(y, vmin) = map!(x -> abs(x) < vmin ? zero(x) : x, y)
function time_zerofy(n, vmin)
    x1 = rand(n)
    x2, x3, x4, x5 = copy(x1), copy(x1), copy(x1), copy(x1)
    @time zerofy!(x1, vmin)
    @time zerofy2!(x2, vmin)
    @time zerofy3(x3, vmin)
    @time zerofy4(x4, vmin)
    @time zerofy4!(x5, vmin)
    return nothing
end
julia> time_zerofy(10^8, 0.1)
0.122510 seconds
1.078589 seconds (73.25 k allocations: 778.590 MB, 5.42% gc time)
0.558914 seconds (2 allocations: 762.940 MB)
0.688640 seconds (5 allocations: 762.940 MB)
0.243921 seconds
There's a pretty big difference between the loop (fastest) and the naively vectorized one.
Edit: zerofy3! => zerofy3 since it's not in-place.
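For completeness, on Julia 1.x the vectorized variants above would be spelled with explicit dots and an explicit destination for map! (a sketch of the modern equivalents, not re-benchmarked):

zerofy2!(x, vmin) = (x[abs.(x) .< vmin] .= zero(eltype(x)); x)
zerofy3(x, eps) = abs(x) < eps ? 0.0 : x            # broadcast as zerofy3.(x, eps)
zerofy4(y, vmin) = map(x -> abs(x) < vmin ? zero(x) : x, y)
zerofy4!(y, vmin) = map!(x -> abs(x) < vmin ? zero(x) : x, y, y)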