I am working on an optimization problem with a huge number of variables (upwards of hundreds of millions). Each of them should be a 0-1 binary variable.
I can write it in the form (maximize x'Qx) where Q is positive semi-definite, and I am using Julia, so the package COSMO.jl seems like a great fit. However, there is a ton of sparsity in my problem. Q is 0 except on approximately sqrt(|Q|) entries, and for the constraints there are approximately sqrt(|Q|) linear constraints on the variables.
I can describe this system pretty easily using SparseArrays, but it appears the most natural way to input problems into COSMO uses standard arrays. Is there a way I can take advantage of the sparsity in this massive problem?
While there is no sample code in your question, perhaps this could help:
JuMP works with sparse arrays, so perhaps the easiest approach is to use them directly when constructing the objective function:
julia> using JuMP, SparseArrays, COSMO
julia> m = Model(with_optimizer(COSMO.Optimizer));
julia> q = sprand(Bool, 20, 20,0.05) # for readability I use a binary q
20×20 SparseMatrixCSC{Bool, Int64} with 21 stored entries:
⠀⠀⠀⡔⠀⠀⠀⠀⡀⠀
⠀⠀⠂⠀⠠⠀⠀⠈⠑⠀
⠀⠀⠀⠀⠀⠤⠀⠀⠀⠀
⠀⢠⢀⠄⠆⠀⠂⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠄⠀⠀⠌
julia> @variable(m, x[1:20], Bin);
julia> x'*q*x
x[1]*x[14] + x[14]*x[3] + x[15]*x[8] + x[16]*x[5] + x[18]*x[4] + x[18]*x[13] + x[19]*x[14] + x[20]*x[11]
You can see that the expression is correctly reduced to terms for the nonzero entries only.
Indeed you could check the performance with a very sparse q having 100M elements:
julia> q = sprand(10000, 10000,0.000001)
10000×10000 SparseMatrixCSC{Float64, Int64} with 98 stored entries:
...
julia> @variable(m, z[1:10000], Bin);
julia> @btime $z'*$q*$z
1.276 ms (51105 allocations: 3.95 MiB)
You can see that you get the expected performance when constructing the objective function.
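For the linear constraints you mention, sparse matrices can be passed to @constraint in the same way. A minimal sketch (the constraint data A and b, the sizes, and the densities below are made up for illustration, not taken from your problem):

using JuMP, SparseArrays, COSMO

n = 1_000                        # illustrative size only
q = sprand(n, n, 1e-3)           # sparse objective matrix
A = sprand(50, n, 1e-2)          # hypothetical sparse constraint matrix
b = rand(50)

m = Model(with_optimizer(COSMO.Optimizer))
@variable(m, x[1:n], Bin)
@objective(m, Max, x' * q * x)   # only the stored entries of q contribute terms
@constraint(m, A * x .<= b)      # sparse linear constraints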
In Julia, I would like to randomly generate a discrete Fourier transform matrix of size n by n. I am currently not sure how to do this. Does anyone know a way to do it in Julia?
As said in the comments, you can use the FFTW.jl package for this purpose:
julia> using FFTW
julia> n = 5;
julia> rnd = rand(1:100, n, n);
julia> fft(rnd)
5×5 Matrix{ComplexF64}:
1216.0+0.0im 65.8754+10.3181im 106.125+119.409im 106.125-119.409im 65.8754-10.3181im
160.529-95.3957im 177.376-31.8946im -28.6976+150.325im 52.8237+139.038im 82.2517-165.542im
-91.0288-22.1566im 136.676+28.1im -42.8763-97.2573im -97.7517+4.15021im 8.19756-13.5548im
-91.0288+22.1566im 8.19756+13.5548im -97.7517-4.15021im -42.8763+97.2573im 136.676-28.1im
160.529+95.3957im 82.2517+165.542im 52.8237-139.038im -28.6976-150.325im 177.376+31.8946im
And for a Real datatype, you can use the rfft function:
julia> let n = 5
rnd = rand(n, n)
rfft(rnd)
end
3×5 Matrix{ComplexF64}:
10.54+0.0im 1.15104+0.522166im -0.449373-0.686863im -0.449373+0.686863im 1.15104-0.522166im
-1.2319+0.3485im -0.622914-0.649385im 1.39743-0.733653im 1.66696+0.694317im -1.59092-0.578805im
0.501205+0.962713im 0.056338-0.207403im 0.0156042-0.181913im -1.87067-1.66951im -0.672603-0.969665im
This might raise the question of why the result is a 3×5 matrix:
According to the official doc about the rfft function:
"Multidimensional FFT of a real array A, exploiting the fact that the transform has conjugate symmetry in order to save roughly half the computational time and storage costs compared with fft. If A has size (n_1, ..., n_d), the result has size (div(n_1,2)+1, ..., n_d)."
It's also possible to first create a random n×n matrix with an eltype of ComplexF64 and perform the fft on it; for this, create the rnd variable with rand(ComplexF64, n, n) in the above let block and replace rfft with fft.
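To see the relationship between the two functions concretely, here is a small check (not part of the original answer) that rfft keeps exactly the first div(n, 2) + 1 rows of the full fft output, up to floating-point rounding:

julia> using FFTW

julia> let n = 5
           rnd = rand(n, n)
           rfft(rnd) ≈ fft(rnd)[1:div(n, 2) + 1, :]
       end
true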
For scalars, the \ (solve linear system) operator is equivalent to the division operator / with the operands swapped (a \ b == b / a). Is the performance similar?
I ask because currently my code has a line like
x = (1 / alpha) * averylongfunctionname(input1, input2, input3)
Visually, it is important that the division by alpha happens on the "left," so I am considering replacing this with
x = alpha \ averylongfunctionname(input1, input2, input3)
What is the best practice in this situation, from the standpoint of style and the standpoint of performance?
Here are some perplexing benchmarking results:
julia> using BenchmarkTools
[ Info: Precompiling BenchmarkTools [6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf]
julia> @btime x[1]\sum(x) setup=(x=rand(100))
15.014 ns (0 allocations: 0 bytes)
56.23358979466163
julia> @btime (1/x[1]) * sum(x) setup=(x=rand(100))
13.312 ns (0 allocations: 0 bytes)
257.4552413802698
julia> @btime sum(x)/x[1] setup=(x=rand(100))
14.929 ns (0 allocations: 0 bytes)
46.25209548841374
They are all about the same, but I'm surprised that the (1 / x) * foo approach has the best performance.
Scalar / and \ really should have the same meaning and performance. Let's define these two test functions:
f(a, b) = a / b
g(a, b) = b \ a
We can then see that they produce identical LLVM code:
julia> @code_llvm f(1.5, 2.5)
;  @ REPL[29]:1 within `f'
define double @julia_f_380(double %0, double %1) {
top:
; ┌ @ float.jl:335 within `/'
   %2 = fdiv double %0, %1
; └
  ret double %2
}
julia> @code_llvm g(1.5, 2.5)
;  @ REPL[30]:1 within `g'
define double @julia_g_382(double %0, double %1) {
top:
; ┌ @ operators.jl:579 within `\'
; │┌ @ float.jl:335 within `/'
    %2 = fdiv double %0, %1
; └└
  ret double %2
}
And the same machine code too. I'm not sure what is causing the differences in @btime results, but I'm pretty sure that the difference between / and \ is an illusion and not real.
As to x*(1/y), that does not compute the same thing as x/y: it will be potentially less accurate since there is rounding done when computing 1/y and then that rounded value is multiplied by x, which also rounds. For example:
julia> 17/0.7
24.28571428571429
julia> 17*(1/0.7)
24.285714285714285
Since floating-point division is guaranteed to be correctly rounded, doing the division directly is always going to be more accurate. If the divisor is shared across many loop iterations, however, you can get a speedup by rewriting the computation this way, since floating-point multiplication is usually faster than division (although timing on my current computer does not show this). Be aware that this comes at a loss of accuracy, and if the divisor is not shared there would be a loss of accuracy with no performance gain.
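To make the shared-divisor point concrete, here is a small sketch (the function names and the loop are illustrative only, not from the original question): the second version rounds 1/alpha once and then only multiplies.

# One correctly rounded division per element.
scale_div(xs, alpha) = [x / alpha for x in xs]

# Round 1/alpha once and reuse it; each multiplication rounds again,
# so results can differ from scale_div in the last bits.
function scale_mul(xs, alpha)
    inv_alpha = 1 / alpha
    return [x * inv_alpha for x in xs]
end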
I don't know, but I can suggest trying the BenchmarkTools
package: it can help you evaluate the performance of the two
different statements. Here you can find more details. Bye!
I think that the best choice is (1/x)*foo for two reasons:
it has the best performance (although not by much compared to the other options);
it is clearer for another person reading the code.
The well-known formula for OLS is (X'X)^(-1)X'y, where X is n×K and y is n×1.
One way to implement this in Julia is (X'*X)\X'*y.
But I found that X\y gives almost the same output, up to tiny numerical error.
Do they always compute the same thing (as long as n>k)? If so, which one should I use?
When X is square, there is a unique solution and LU-factorization (with pivoting) is a numerically-stable way to calculate this. That is the algorithm that backslash uses in this case.
When X is not square, which is the case in most regression problems, then there is no unique solution but there is a unique least square solution. The QR factorization method for solving Xβ = y is a numerically stable method for generating the least square solution, and in this case X\y uses the QR-factorization and thus gives the OLS solution.
Notice the words numerically stable. While (X'*X)\X'*y will theoretically always give the same result as backslash, in practice backslash (with the correct factorization choice) will be more precise. This is because the factorization algorithms are implemented to be numerically stable. Because of the chance for floating-point errors to accumulate when doing (X'*X)\X'*y, it's not recommended that you use this form for any real numerical work.
Instead, (X'*X)\X'*y is somewhat equivalent to an SVD factorization, which is the most numerically stable algorithm, but also the most expensive (in fact, it's basically writing out the Moore-Penrose pseudoinverse, which is how an SVD factorization is used to solve a linear system). To directly do an SVD factorization using a pivoted SVD, do svdfact(X) \ y on v0.6 or svd(X) \ y on v0.7. Doing this directly is more stable than (X'*X)\X'*y. Note that qrfact(X) \ y or qr(X) \ y (v0.7) does the same with QR. See the factorizations portion of the documentation for more details on all of the choices.
Following the documentation, the result of X\y is (there the notation \(A, B) is used rather than X and y):
For rectangular A the result is the minimum-norm least squares solution
This is your case, I guess, as you assume n>k (so your matrix is not square). So you can safely use X\y. Actually it is better to use it than the standard formula, as you will get a result even if the rank of X is less than min(n,k), whereas the standard formula (X'*X)^(-1)*X'*y will fail or produce a numerically unstable result if X'*X is nearly singular.
If X would be square (this is not your case) then we have a bit different rule in the documentation:
For input matrices A and B, the result X is such that A*X == B when A is square
This means that the \ algorithm would produce an error if your matrix were singular, or produce numerically unstable results if the matrix were nearly singular (in practice, most often the lu function that is called internally for general dense matrices may throw a SingularException).
If you want a catch-all solution (for square and non square matrices) then qr(X, Val(true)) \ y can be used.
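As a side note (not from the original answer): on newer Julia versions (1.7+) the pivoted-QR spelling uses ColumnNorm(), so the catch-all call might look like this:

using LinearAlgebra

X = rand(10, 3)       # rectangular example; works for square X as well
y = rand(10)

beta = qr(X, ColumnNorm()) \ y   # pivoted QR; older versions spell this qr(X, Val(true)) \ y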
Short answer: No, use the first one (the well-known one).
Long answer:
The linear regression model is Xβ = y, and it's easy to derive β = X \ y, which is your second method. However, in most cases (when X is not invertible), this reasoning is wrong, since you cannot simply left-multiply by X^-1. The correct way is to solve β = argmin ‖y − Xβ‖² instead, which leads to the first method.
To show they are not always the same, simply construct a case where X is not invertible:
julia> X = rand(10, 10)
10×10 Array{Float64,2}:
0.938995 0.32773 0.740556 0.300323 0.98479 0.48808 0.748006 0.798089 0.864154 0.869864
0.973832 0.99791 0.271083 0.841392 0.743448 0.0951434 0.0144092 0.785267 0.690008 0.494994
0.356408 0.312696 0.543927 0.951817 0.720187 0.434455 0.684884 0.72397 0.855516 0.120853
0.849494 0.989129 0.165215 0.76009 0.0206378 0.259737 0.967129 0.733793 0.798215 0.252723
0.364955 0.466796 0.227699 0.662857 0.259522 0.288773 0.691278 0.421251 0.593215 0.542583
0.126439 0.574307 0.577152 0.664301 0.60941 0.742335 0.459951 0.516649 0.732796 0.990509
0.430213 0.763126 0.737171 0.433884 0.85549 0.163837 0.997908 0.586575 0.257428 0.33239
0.28398 0.162054 0.481452 0.903363 0.780502 0.994575 0.131594 0.191499 0.702596 0.0967979
0.42463 0.142 0.705176 0.0481886 0.728082 0.709598 0.630134 0.139151 0.423227 0.942262
0.197805 0.526095 0.562136 0.648896 0.805806 0.168869 0.200355 0.557305 0.69514 0.227137
julia> y = rand(10, 1)
10×1 Array{Float64,2}:
0.7751785556478308
0.24185992335144801
0.5681904264574333
0.9134364924569847
0.20167825754443536
0.5776727022413637
0.05289808385359085
0.5841180308242171
0.2862768657856478
0.45152080383822746
julia> ((X' * X) ^ -1) * X' * y
10×1 Array{Float64,2}:
-0.3768345891121706
0.5900885565174501
-0.6326640292669291
-1.3922334538787071
0.06182039005215956
1.0342060710792016
0.045791973670925995
0.7237081408801955
1.4256831037950832
-0.6750765481219443
julia> X \ y
10×1 Array{Float64,2}:
-0.37683458911228906
0.5900885565176254
-0.6326640292676649
-1.3922334538790346
0.061820390052523294
1.0342060710793235
0.0457919736711274
0.7237081408802206
1.4256831037952566
-0.6750765481220102
julia> X[2, :] = X[1, :]
10-element Array{Float64,1}:
0.9389947787349187
0.3277301697101178
0.7405555185711721
0.30032257202572477
0.9847899425069042
0.48807977638742295
0.7480061513093117
0.79808859136911
0.8641540973071822
0.8698636291189576
julia> ((X' * X) ^ -1) * X' * y
10×1 Array{Float64,2}:
0.7456524759867015
0.06233042922132548
2.5600126098899256
0.3182206475232786
-2.003080524452619
0.272673133766017
-0.8550165639656011
0.40827327221785403
0.2994419115664999
-0.37876151249955264
julia> X \ y
10×1 Array{Float64,2}:
3.852193379477664e15
-2.097948470376586e15
9.077766998701864e15
5.112094484728637e15
-5.798433818338726e15
-2.0446050874148052e15
-3.300267174800096e15
2.990882423309131e14
-4.214829360472345e15
1.60123572911982e15
I am a newbie in the Julia programming language, so I don't know much about how to optimize code. I have heard that Julia should be faster than Python, but I've written a simple Julia code for solving the FitzHugh–Nagumo model, and it doesn't seem to be faster than Python.
The FitzHugh–Nagumo model equations are:
function FHN_equation(u,v,a0,a1,d,eps,dx)
u_t = u - u.^3 - v + laplacian(u,dx)
v_t = eps.*(u - a1 * v - a0) + d*laplacian(v,dx)
return u_t, v_t
end
where u and v are the variables, which are 2D fields (that is, 2-dimensional arrays), and a0, a1, d, eps are the model's parameters. Both the parameters and the variables are floating-point. dx is the parameter that controls the spacing between grid points, used by the laplacian function, which is an implementation of finite differences with periodic boundary conditions.
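The laplacian function itself is not shown here; a minimal sketch of what such a periodic finite-difference Laplacian might look like (for reference only, my actual implementation may differ in details):

# 5-point stencil with periodic boundary conditions (illustrative sketch).
function laplacian(a, dx)
    return (circshift(a, (1, 0)) .+ circshift(a, (-1, 0)) .+
            circshift(a, (0, 1)) .+ circshift(a, (0, -1)) .- 4 .* a) ./ dx^2
end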
If one of you expert Julia coders can give me a hint on how to do things better in Julia, I will be happy to hear it.
The Runge–Kutta step function is:
function uv_rk4_step(Vs,Ps, dt)
u = Vs.u
v = Vs.v
a0=Ps.a0
a1=Ps.a1
d=Ps.d
eps=Ps.eps
dx=Ps.dx
du_k1, dv_k1 = FHN_equation(u,v,a0,a1,d,eps,dx)
u_k1 = dt*du_k1
v_k1 = dt*dv_k1
du_k2, dv_k2 = FHN_equation((u+(1/2)*u_k1),(v+(1/2)*v_k1),a0,a1,d,eps,dx)
u_k2 = dt*du_k2
v_k2 = dt*dv_k2
du_k3, dv_k3 = FHN_equation((u+(1/2)*u_k2),(v+(1/2)*v_k2),a0,a1,d,eps,dx)
u_k3 = dt*du_k3
v_k3 = dt*dv_k3
du_k4, dv_k4 = FHN_equation((u+u_k3),(v+v_k3),a0,a1,d,eps,dx)
u_k4 = dt*du_k4
v_k4 = dt*dv_k4
u_next = u+(1/6)*u_k1+(1/3)*u_k2+(1/3)*u_k3+(1/6)*u_k4
v_next = v+(1/6)*v_k1+(1/3)*v_k2+(1/3)*v_k3+(1/6)*v_k4
return u_next, v_next
end
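For context, the step function is driven roughly like this (the struct definitions and parameter values below are a simplified sketch matching the field names above, not my exact setup):

# Hypothetical containers matching the field names used in uv_rk4_step.
struct Params
    a0::Float64; a1::Float64; d::Float64; eps::Float64; dx::Float64
end
mutable struct Vars
    u::Matrix{Float64}
    v::Matrix{Float64}
end

Ps = Params(-0.1, 2.0, 0.5, 0.05, 0.2)               # illustrative values only
Vs = Vars(0.1 .* rand(64, 64), 0.1 .* rand(64, 64))
dt = 0.01

for _ in 1:100
    Vs.u, Vs.v = uv_rk4_step(Vs, Ps, dt)             # advance 100 time steps
end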
And I've used imshow() from the PyPlot package to plot the u field.
This is not a complete answer, but a taste of an optimization attempt on the laplacian function. The original laplacian on a 10×10 matrix gave me the @time:
0.000038 seconds (51 allocations: 12.531 KB)
While this version:
function laplacian2(a,dx)
# Computes Laplacian of a matrix
# Usage: al=laplacian(a,dx)
# where dx is the grid interval
ns=size(a,1)
ns != size(a,2) && error("Input matrix must be square")
aa=zeros(ns+2,ns+2)
for i=1:ns
aa[i+1,1]=a[i,end]
aa[i+1,end]=a[i,1]
aa[1,i+1]=a[end,i]
aa[end,i+1]=a[1,i]
end
for i=1:ns,j=1:ns
aa[i+1,j+1]=a[i,j]
end
lap = Matrix{eltype(a)}(undef, ns, ns)   # uninitialized output array (undef is required on Julia ≥ 1.0)
scale = inv(dx*dx)
for i=1:ns,j=1:ns
lap[i,j]=(aa[i,j+1]+aa[i+2,j+1]+aa[i+1,j]+aa[i+1,j+2]-4*aa[i+1,j+1])*scale
end
return lap
end
gives @time:
0.000010 seconds (6 allocations: 2.250 KB)
Notice the reduction in allocations. Extra allocations usually indicate the potential for optimization.
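If you want to reproduce the comparison, a quick way to time both versions is BenchmarkTools (not part of the original timing above; laplacian here stands for whichever baseline implementation you are comparing against):

using BenchmarkTools

a = rand(10, 10)
dx = 0.1

@btime laplacian($a, $dx)    # baseline implementation
@btime laplacian2($a, $dx)   # reduced-allocation version above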
All,
I've just been starting to play around with the Julia language and am enjoying it quite a bit. At the end of the 3rd tutorial there's an interesting problem: genericize the quadratic formula such that it solves for the roots of any n-order polynomial equation.
This struck me as (a) an interesting programming problem and (b) an interesting Julia problem. Has anyone out there solved this one? For reference, here is the Julia code with a couple toy examples. Again, the idea is to make this generic for any n-order polynomial.
Cheers,
Aaron
function derivative(f)
return function(x)
# pick a small value for h
h = x == 0 ? sqrt(eps(Float64)) : sqrt(eps(Float64)) * x
# floating point arithmetic gymnastics
xph = x + h
dx = xph - x
# evaluate f at x + h
f1 = f(xph)
# evaluate f at x
f0 = f(x)
# divide the difference by h
return (f1 - f0) / dx
end
end
function quadratic(f)
f1 = derivative(f)
c = f(0.0)
b = f1(0.0)
a = f(1.0) - b - c
return (-b + sqrt(b^2 - 4a*c + 0im))/2a, (-b - sqrt(b^2 - 4a*c + 0im))/2a
end
quadratic((x) -> x^2 - x - 2)
quadratic((x) -> x^2 + 2)
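For reference, the first toy example factors as x^2 - x - 2 = (x - 2)(x + 1), so quadratic should return approximately (2.0 + 0.0im, -1.0 + 0.0im); the second has the purely imaginary roots ±sqrt(2)im.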
The package PolynomialRoots.jl provides the function roots() to find all (real and complex) roots of polynomials of any order. The only mandatory argument is the array with coefficients of the polynomial in ascending order.
For example, in order to find the roots of
6x^5 + 5x^4 + 4x^3 + 3x^2 + 2x + 1
after loading the package (using PolynomialRoots) you can use
julia> roots([1, 2, 3, 4, 5, 6])
5-element Array{Complex{Float64},1}:
0.294195-0.668367im
-0.670332+2.77556e-17im
0.294195+0.668367im
-0.375695-0.570175im
-0.375695+0.570175im
The package is a Julia implementation of the root-finding algorithm described in this paper: http://arxiv.org/abs/1203.1034
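A quick sanity check (not in the original answer) is to evaluate the polynomial at the returned roots with Base's evalpoly, which uses the same ascending-coefficient order:

using PolynomialRoots

coeffs = [1, 2, 3, 4, 5, 6]
rs = roots(coeffs)
maximum(abs, evalpoly.(rs, Ref(coeffs)))   # residuals should be close to zero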
PolynomialRoots.jl also has support for arbitrary-precision calculation. This is useful for solving equations that cannot be solved in double precision. For example,
julia> r = roots([94906268.375, -189812534, 94906265.625]);
julia> (r[1], r[2])
(1.0000000144879793 - 0.0im,1.0000000144879788 + 0.0im)
gives the wrong result for this polynomial; instead, passing the input array in arbitrary precision forces arbitrary-precision calculations that provide the right answer (see https://en.wikipedia.org/wiki/Loss_of_significance):
julia> r = roots([BigFloat(94906268.375), BigFloat(-189812534), BigFloat(94906265.625)]);
julia> (Float64(r[1]), Float64(r[2]))
(1.0000000289759583,1.0)
There are no algebraic formulae for general polynomials of degree five and above (in fact, there cannot be; see here). So theoretically, you could proceed using the same methodology for the solutions to cubics and quartics, but even that would be a lot of hard work given the very unwieldy formulae for the roots of quartics. You could also use a CAS like SymPy to find out those formulae.
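For the CAS route, a minimal sketch with the SymPy.jl wrapper (assuming that package is installed; shown only for the quadratic case, where it reproduces the familiar formula):

using SymPy

@syms a b c x
solve(a*x^2 + b*x + c, x)   # two symbolic roots of the general quadratic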