Solve a particular linear system efficiently in Julia

I make extensive use of Julia's linear equation solver res = X\b. I have to call it millions of times in my program because of parameter variation. This was working fine while I was using small dimensions (up to 30). Now that I want to analyse bigger systems, up to 1000, the linear solver is no longer efficient.
I think there may be a workaround. However, I must say that sometimes my X matrix is dense and sometimes it is sparse, so I need something that works well for both cases.
The b vector has all zero entries except for one, which is always 1 (in fact it is always the last entry). Moreover, I don't need the whole res vector, only its first entry.

If your problem is of the form (A - µI)x = b, where µ is a variable parameter and A, b are fixed, you might work with diagonalization.
Let A = PDP°, where P° denotes the inverse of P. Then A - µI = P(D - µI)P°, so (PDP° - µI)x = b can be transformed to
(D - µI)P°x = P°b,
P°x = P°b / (D - µI),
x = P(P°b / (D - µI)),
where the / operation denotes elementwise division of the vector entries by the scalars D_ii - µ.
After you have diagonalized A, computing a solution for any µ reduces to two matrix/vector products, or a single one if you can also precompute P°b.
Numerical instability will show up in the vicinity of the eigenvalues of A.
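For concreteness, here is a minimal NumPy sketch of the scheme (the function names are mine and it assumes A is diagonalizable; in Julia, eigen from the LinearAlgebra stdlib gives you the same pieces):

import numpy as np

def precompute(A, b):
    # one-time O(n^3) work: A = P @ diag(D) @ inv(P), plus P°b
    D, P = np.linalg.eig(A)          # D may be complex even for real A
    Pinv_b = np.linalg.solve(P, b)   # P°b, computed once
    return D, P, Pinv_b

def solve_for_mu(D, P, Pinv_b, mu):
    # x = P ((P°b) / (D - µ)): one elementwise division and one matvec, O(n^2)
    return P @ (Pinv_b / (D - mu))

Since the question only needs the first entry of x, that entry collapses further to a dot product, (P[0, :] * Pinv_b / (D - mu)).sum(), so each new µ costs only O(n) after the one-time diagonalization.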

Usually when people talk about speeding up linear solvers res = X \ b, it's for multiple right-hand sides b. But since your b isn't changing and you just keep changing X, none of those tricks apply.
The only way to speed this up, from a mathematical perspective, seems to be to ensure that Julia is picking the fastest solver for X \ b, i.e., if you know X is positive-definite, use Cholesky, etc. Matlab’s flowcharts for how it picks the solver to use for X \ b, for dense and sparse X, are available—most likely Julia implements something close to these flowcharts too, but again, maybe you can find some way to simplify or shortcut it.
All programming-related speedups can then be applied on top: multiple threads (each individual solver is probably already multithreaded, but it may be worth running several solvers in parallel when each one uses fewer threads than you have cores), @simd if you're willing to dive into the solvers themselves, OpenCL/CUDA libraries, etc.
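As an illustration of the "tell the solver what you know" point, SciPy exposes this choice explicitly through the assume_a keyword of scipy.linalg.solve (a sketch; the rough Julia analogue is wrapping X in Symmetric or Hermitian so that \ dispatches to a specialized factorization):

import numpy as np
from scipy.linalg import solve

n = 1000
X = np.random.rand(n, n)
X = X @ X.T + n * np.eye(n)   # make X positive-definite for the example
b = np.zeros(n); b[-1] = 1.0

x_generic = solve(X, b)                  # generic LU path
x_posdef  = solve(X, b, assume_a='pos')  # Cholesky path, roughly half the flops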

The best approach for efficiency would be to use JuliaMath/IterativeSolvers.jl. For A * x = b problems, I would recommend x = lsmr(A, b).
The second-best alternative would be to give a bit more information to the solver: instead of x = inv(A'A) * A' * b, do x = inv(cholfact(A'A)) * A' * b if a Cholesky decomposition works for you. Otherwise, you could try U, S, V = svd(A) and x = V * diagm(1 ./ S) * U' * b.
I am unsure whether x = pinv(A) * b is optimized, but it might be slightly more efficient than x = A \ b.
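A sketch of the iterative route using SciPy's implementation of the same algorithm (lsmr accepts dense arrays and sparse matrices alike, which matches the dense/sparse requirement in the question):

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsmr

n = 1000
A = sparse_random(n, n, density=0.01, format='csr')
b = np.zeros(n); b[-1] = 1.0

x = lsmr(A, b)[0]   # lsmr returns (x, istop, itn, ...); the solution comes first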

Related

Integral basis nf.zk versus nfbasis in Pari GP

I have been using the database of lmfdb.org to find the integral basis of a number field. Now, I want to utilize PARI/GP in multiplying algebraic integers. However, I have encountered a problem. PARI/GP uses the integral basis "nf.zk" in its computations, which apparently is not always the same as the "nfbasis(f)", which is the integral basis that lmfdb.org provides.
For example, we have the following code from PARI/GP:
? f = x^3 - x^2 + 2*x + 8
nf = nfinit(f)
nf.zk
%1 = [1, x, 1/2*x^2 - 1/2*x + 1]
? nfbasis(f)
%2 = [1, x, 1/2*x^2 - 1/2*x]
Now, my questions are:
Why are nf.zk and nfbasis(f) different?
Why does PARI/GP use nf.zk instead of nfbasis(f)?
Lastly, can I tell PARI/GP to use nfbasis(f) instead of nf.zk?
When we take the trouble to initialize an nf structure with nfinit, we perform precomputations to speed up later work. Here, nfinit first computes the integer basis by calling nfbasis, which returns the (canonical) HNF basis, then LLL-reduces it with respect to the T2 norm. The LLL-reduced basis is usually different from the HNF one, but it usually has smaller elements.
This LLL reduction can be expensive (in particular when the degree is large) but it ensures that time complexities are bounded in terms of the field discriminant instead of the size of the input polynomial.
I believe all polynomials defining number fields in the lmfdb were run through polredabs which ensures their coefficients are small (in terms of the field discriminant), but the HNF integer basis may still be much larger than the LLL one. Additionally, if an algebraic integer has small T2 norm, its expression in terms of the LLL-reduced basis is guaranteed to have small coefficients, whereas it can have much larger coefficients on the HNF basis.
In pari-2.14 (which is not released yet but available via git or through nightly snapshots on the PARI/GP website), you can use nfinit(, 4), which removes the LLL reduction step. This speeds up the initialization, but usually slows down every operation involving the resulting nf.
? f = x^3 - x^2 + 2*x + 8
? nfinit(f,4).zk
%2 = [1, x, 1/2*x^2 - 1/2*x]

Get branch points of equation

Suppose I have a general function f(z,a), where z and a are both real, and f takes on real values for all z except in some interval (z1, z2), where it becomes complex. How do I determine z1 and z2 (which will be in terms of a) using Mathematica (and is this possible)? What are the limitations?
For a test example, consider the function f[z_,a_]=Sqrt[(z-a)(z-2a)]. For real z and a, this takes on real values except in the interval (a,2a), where it becomes imaginary. How do I find this interval in Mathematica?
In general, I'd like to know how one would go about finding it mathematically for a general case. For a function with just two variables like this, it'd probably be straightforward to do a contour plot of the Riemann surface and observe the branch cuts. But what if it is a multivariate function? Is there a general approach that one can take?
What you have appears to be a Riemann surface parametrized by 'a'. Consider the algebraic (or analytic) relation g(a,z)=0 that would be spawned from this branch of a parametrized Riemann surface. In this case it is simply g^2 - (z - a)*(z - 2*a) == 0. More generally it might be obtained using GroebnerBasis, as below (no guarantee this will always work without some amount of user intervention).
grelation = First[GroebnerBasis[g - Sqrt[(z - a)*(z - 2*a)], {z, a, g}]]
Out[472]= 2 a^2 - g^2 - 3 a z + z^2
A necessary condition for the branch points, as functions of the parameter 'a', is that the zero set for 'g' not give a (single valued) function in a neighborhood of such points. This in turn means that the partial derivative of this relation with respect to g vanishes (this is from the implicit function theorem of multivariable calculus). So we find where grelation and its derivative both vanish, and solve for 'z' as a function of 'a'.
Solve[Eliminate[{grelation == 0, D[grelation, g] == 0}, g], z]
Out[481]= {{z -> a}, {z -> 2 a}}
Daniel Lichtblau
Wolfram Research
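A rough SymPy rendering of the same elimination, offered only as a sketch: the radical is removed by squaring by hand, since SymPy's polynomial tools want polynomial input.

from sympy import symbols, solve

a, z, g = symbols('a z g')
relation = g**2 - (z - a)*(z - 2*a)   # g = sqrt((z-a)(z-2a)) squared into a polynomial
print(solve([relation, relation.diff(g)], [g, z]))
# -> [(0, a), (0, 2*a)], i.e. branch points at z = a and z = 2a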
For polynomial systems (and some class of others), Reduce can do the job.
E.g.
In[1]:= Reduce[Element[{a, z}, Reals]
&& !Element[Sqrt[(z - a) (z - 2 a)], Reals], z]
Out[1]= (a < 0 && 2a < z < a) || (a > 0 && a < z < 2a)
This type of approach also works (often giving very complicated solutions for functions with many branch cuts) for other combinations of elementary functions I checked.
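For comparison, SymPy can do the analogous computation, though as far as I know only for a fixed numeric value of the parameter, not symbolically in the way Reduce handles it. A sketch with a = 1:

from sympy import symbols
from sympy.solvers.inequalities import solve_univariate_inequality

z = symbols('z', real=True)
a = 1
# the function is non-real exactly where the radicand is negative
print(solve_univariate_inequality((z - a)*(z - 2*a) < 0, z, relational=False))
# -> Interval.open(1, 2)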
To find the branch cuts (as opposed to the simple class of branch points you're interested in) in general, I don't know of a good approach. The best place to find the detailed conventions that Mathematica uses is at the functions.wolfram site.
I do remember reading a good paper on this a while back... I'll try to find it...
That's right! The easiest approach I've seen for branch cut analysis uses the unwinding number. There's a paper, "Reasoning about the elementary functions of complex analysis", about this in the journal "Artificial Intelligence and Symbolic Computation". It and similar papers can be found on one of the authors' homepages: http://www.apmaths.uwo.ca/~djeffrey/offprints.html.
For general functions you cannot make Mathematica calculate it. Even for polynomials, finding an exact answer takes time. I believe Mathematica uses some sort of quantifier elimination when it uses Reduce, which takes time. Without any restrictions on your functions (are they polynomials, continuous, smooth?) one can easily construct functions which Mathematica cannot simplify further:
f[x_, y_] := Abs[Zeta[y + 0.5 + x*I]]*I
This function is purely imaginary, so it is real exactly where it vanishes. If it were real for some x and some y with -0.5 < y < 0 or 0 < y < 0.5, you would have found a counterexample to the Riemann hypothesis, and I'm sure Mathematica cannot give a correct answer.

How to evaluate the derivative of a function in MATLAB?

This should be very simple. I have a function f(x), and I want to evaluate f'(x) for a given x in MATLAB.
All my searches have come up with symbolic math, which is not what I need; I need numerical differentiation.
E.g. if I define: fx = inline('x.^2')
I want to find, say, f'(3), which is 6. I don't want to find 2x.
If your function is known to be twice differentiable, use
f'(x) = (f(x + h) - f(x - h)) / (2h)
which is second order accurate in h. If it is only once differentiable, use
f'(x) = (f(x + h) - f(x)) / h (*)
which is first order in h.
This is theory. In practice, things are quite tricky. I'll take the second formula (first order) as the analysis is simpler. Do the second order one as an exercise.
The very first observation is that you must make sure that (x + h) - x = h, otherwise you get huge errors. Indeed, f(x + h) and f(x) are close to each other (say 2.0456 and 2.0467), and when you subtract them, you lose a lot of significant figures (here the difference is 0.0011, which has three fewer significant figures than f(x)). So any error on h is likely to have a huge impact on the result.
So, first step: fix a candidate h (I'll show you in a minute how to choose it), and take as the h for your computation the quantity h' = (x + h) - x. If you are using a language like C, you must take care to declare h or x as volatile so that this computation is not optimized away.
Next, the choice of h. The error in (*) has two parts: the truncation error and the roundoff error. The truncation error is because the formula is not exact:
(f(x + h) - f(x)) / h = f'(x) + e1(h)
where e1(h) = h / 2 * sup_{t in [x, x + h]} |f''(t)|.
The roundoff error comes from the fact that f(x + h) and f(x) are close to each other. It can be estimated roughly as
e2(h) ~ epsilon_f * |f(x)| / h
where epsilon_f is the relative precision in the computation of f(x) (or f(x + h), which is close). This has to be assessed from your problem. For simple functions, epsilon_f can be taken as the machine epsilon. For more complicated ones, it can be worse than that by orders of magnitude.
So you want h which minimizes e1(h) + e2(h). Plugging everything together and optimizing in h yields
h ~ sqrt(2 * epsilon_f * f / f'')
which has to be estimated from your function. You can take rough estimates. When in doubt, take h ~ sqrt(epsilon), where epsilon is the machine accuracy. For the optimal choice of h, the relative accuracy to which the derivative is known is sqrt(epsilon_f), i.e. half the significant figures are correct.
In short: too small a h => roundoff error, too large a h => truncation error.
For the second order formula, same computation yields
h ~ (6 * epsilon_f * f / f''')^(1/3)
and a fractional accuracy of (epsilon_f)^(2/3) for the derivative (which is typically one or two significant figures better than the first order formula, assuming double precision).
If this is too imprecise, feel free to ask for more methods, there are a lot of tricks to get better accuracy. Richardson extrapolation is a good start for smooth functions. But those methods typically compute f quite a few times, this may or not be what you want if your function is complex.
If you are going to use numerical derivatives a lot of times at different points, it becomes interesting to construct a Chebyshev approximation.
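Here is the first order recipe above as a Python sketch (the function name is mine; epsilon_f is taken to be machine epsilon, which is only right for simple, well-behaved f):

import math

def numdiff(f, x, eps_f=2.2e-16):
    # rule-of-thumb step h ~ sqrt(eps_f), scaled so it survives the floating point grid at x
    h = math.sqrt(eps_f) * max(abs(x), 1.0)
    # enforce (x + h) - x == h: the step we use is exactly the step we divide by
    h = (x + h) - x
    return (f(x + h) - f(x)) / h

print(numdiff(lambda t: t**2, 3.0))   # ~6.0, correct to about half the significant figures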
To get a numerical difference (symmetric difference), you calculate (f(x+dx)-f(x-dx))/(2*dx)
fx = @(x)x.^2;
fPrimeAt3 = (fx(3.1)-fx(2.9))/0.2;
Alternatively, you can create a vector of function values and apply DIFF, i.e.
xValues = 2:0.1:4;
fValues = fx(xValues);
df = diff(fValues)./0.1;
Note that diff takes the forward difference, and that it assumes a spacing dx of 1 between samples (hence the division by 0.1 above).
However, in your case, you may be better off defining fx as a polynomial and evaluating the derivative of the polynomial directly, rather than differencing function values.
Lacking the symbolic toolbox, nothing stops you from using Derivest, a tool for automatic adaptive numerical differentiation.
derivest(@sin,pi)
ans =
-1
For your example it does very nicely. In fact, it even provides an estimate of the error in the resulting approximation.
fx = inline('x.^2');
[fp,errest] = derivest(fx,3)
fp =
6
errest =
3.6308e-14
Did you try the diff (calculates differences and approximates a derivative), gradient, or polyder (calculates the derivative of a polynomial) functions?
You can read more on these functions by using help <commandname> on MATLAB console, or use the function browser in the Help menu.
For a given function in analytical form, you can evaluate the derivative at a desired point with the following code:
syms x
df = diff(x^2);
df3 = subs(df, 'x', 3);
fprintf('f''(3)=%f\n', df3);
For pure numerical derivatives use the already given solutions by Jonas and posdef.

How to find which subset of bitfields xor to another bitfield?

I have a somewhat math-oriented problem. I have a bunch of bitfields and would like to calculate which subset of them to xor together to achieve a certain other bitfield, or to discover that no such subset exists.
I'd like to do this using a free library, rather than original code, and I'd strongly prefer something with Python bindings (using Python's built-in math libraries would be acceptable as well, but I want to port this to multiple languages eventually). Also it would be good to not take the memory hit of having to expand each bit to its own byte.
Some further clarification: I only need a single solution. My matrices are the opposite of sparse. I'm very interested in keeping the runtime to an absolute minimum, so using algorithmically fancy methods for inverting matrices is strongly preferred. Also, it's very important that the specific given bitfield be the one outputted, so a technique which just finds a subset which xor to 0 doesn't quite cut it.
And I'm generally aware of Gaussian elimination; I'm trying to avoid coding this from scratch!
cross-posted to mathoverflow, because it isn't clear what the right place for this question is - https://mathoverflow.net/questions/41036/how-to-find-which-subset-of-bitfields-xor-to-another-bitfield
Mathematically speaking, XOR of two bits can be treated as addition in the field F_2.
You want to solve a system of equations over F_2. For bitfields (a_0, a_1, ..., a_n), (b_0, b_1, ..., b_n), (c_0, c_1, ..., c_n) and a target (r_0, r_1, ..., r_n), you get the equations:
x * a_0 + y * b_0 + z * c_0 = r_0
x * a_1 + y * b_1 + z * c_1 = r_1
...
x * a_n + y * b_n + z * c_n = r_n
(where you look for x, y, z).
You could program this as a simple integer linear problem with GLPK, or probably lp_solve (but I don't remember whether it will fit). These might work very slowly though, as they are trying to solve a much more general problem.
After googling for a while, it seems that this page might be a good start looking for code. From descriptions it seems that Dixon and LinBox could be a good fit.
Anyway, I think asking at mathoverflow might give you more precise answers. If you do, please link your question here.
Update: Sagemath uses M4RI for solving this problem. This makes it (for me) a very good recommendation.
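In Sage that looks roughly like the sketch below (the concrete numbers are made up; solve_right raises ValueError when the system is inconsistent, which covers the "no such subset exists" case):

# columns of A are the candidate bitfields, r is the target, all over GF(2)
A = matrix(GF(2), [[1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0]])
r = vector(GF(2), [0, 1, 1])
x = A.solve_right(r)   # x[i] == 1 means the i-th bitfield belongs to the subset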
For small instances that easily fit in memory, this is just solving a linear system over F_2, so try mod-2 Gaussian elimination. For very large sparse instances, like those that occur in factoring (sieve) algorithms, look up the Wiedemann algorithm.
It's possible to have multiple subsets xor to the same value; do you care about finding all subsets?
A perhaps heavy-handed approach would be to filter the powerset of bitfields. In Haskell:
import Data.Bits
xorsTo :: Int -> [Int] -> [[Int]]
xorsTo target fields = filter xorsToTarget (powerset fields)
  where xorsToTarget f = foldl xor 0 f == target

powerset :: [a] -> [[a]]
powerset [] = [[]]
powerset (x:xs) = powerset xs ++ map (x:) (powerset xs)
Not sure if there is a way to do this without generating the powerset. (In the worst case, it is possible for the solution to actually be the entire powerset).
Expanding on liori's answer above, we have a linear system of equations (modulo 2):
a0, b0, c0, ... | r0
a1, b1, c1, ... | r1
...             | ...
an, bn, cn, ... | rn
Gaussian elimination can be used to solve the system. Modulo 2, the row-addition operation becomes an XOR operation, and it is much simpler computationally to do this than to use a generic linear-system solver.
So, if a0 is zero, we swap up a row that has a 1 in the a position. Then we XOR row 0 into any other row whose "a" bit is a 1. Then repeat using row 1 and column b, then row 2 and column c, etc.
If you get a row of zeroes with a non-zero in the r column, then the subset DNE (does not exist).
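A compact Python sketch of exactly this procedure (the names are mine). Each bitfield is kept as a plain integer, so there is no byte-per-bit memory blowup, and a mask per reduced row remembers which original fields were combined into it:

def xor_subset(fields, target):
    # Gaussian elimination over GF(2): basis maps a pivot bit to (value, mask),
    # where mask records which original fields xor together to give value.
    basis = {}
    def reduce(value, mask):
        for pivot in sorted(basis, reverse=True):
            if value >> pivot & 1:
                bvalue, bmask = basis[pivot]
                value ^= bvalue
                mask ^= bmask
        return value, mask
    for i, f in enumerate(fields):
        value, mask = reduce(f, 1 << i)
        if value:
            basis[value.bit_length() - 1] = (value, mask)
    value, mask = reduce(target, 0)
    if value:
        return None   # the row of zeroes with a non-zero r: the subset DNE
    return [i for i in range(len(fields)) if mask >> i & 1]

print(xor_subset([0b101, 0b011, 0b110], 0b110))   # [0, 1]: 0b101 ^ 0b011 == 0b110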

Random number generation without using bit operations

I'm writing a vertex shader at the moment, and I need some random numbers. Vertex shader hardware doesn't have logical/bit operations, so I cannot implement any of the standard random number generators.
Is it possible to make a random number generator using only standard arithmetic? The randomness doesn't have to be particularly good!
If you don't mind crappy randomness, a classic method is
x[n+1] = (x[n] * x[n] + C) mod N
where C and N are constants, C != 0, C != -2, and N is prime. This is the typical pseudorandom generator used in Pollard rho factoring. Try C = 1 and N = 8051; those work ok.
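A sketch of that recurrence in Python, just to pin down the arithmetic (the scaling to [0, 1) is my own convention):

def make_rng(seed=1):
    state = seed
    def rng():
        nonlocal state
        state = (state * state + 1) % 8051   # C = 1, N = 8051
        return state / 8051.0                # scale to [0, 1)
    return rng

rng = make_rng()
samples = [rng() for _ in range(4)]

On shader hardware without a native integer mod, the reduction can be done with plain arithmetic as x - floor(x/N)*N.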
Vertex shaders sometimes have built-in noise generators for you to use, such as cg's noise() function.
Use a linear congruential generator:
X_(n+1) = (a * X_n + c) mod m
Those aren't that strong, but at least they are well known and can have long periods. The Wikipedia page also has good recommendations (a small sketch follows the quoted conditions):
The period of a general LCG is at most m, and for some choices of a much less than that. The LCG will have a full period if and only if:
1. c and m are relatively prime,
2. a - 1 is divisible by all prime factors of m,
3. a - 1 is a multiple of 4 if m is a multiple of 4.
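As a sketch, here is an LCG in Python with the well-known Numerical Recipes constants, which satisfy all three conditions for m = 2^32:

def make_lcg(seed=0):
    a, c, m = 1664525, 1013904223, 2**32   # Numerical Recipes constants
    state = seed
    def lcg():
        nonlocal state
        state = (a * state + c) % m
        return state / m    # float in [0, 1)
    return lcg

lcg = make_lcg(42)
samples = [lcg() for _ in range(4)]

For a vertex shader you would instead pick a, c, m small enough that a * x + c stays exact in floating point, and implement the mod arithmetically as x - floor(x/m)*m.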
Believe it or not, I used newx = oldx * 5 + 1 (or a slight variation of it) in several video games. The randomness is horrible: it's more of a scrambled sequence than a random generator. But sometimes that's all you need. If I recall correctly, it goes through all numbers before it repeats.
It has some terrible characteristics. It doesn't ever give you the same number twice in a row. A few of us ran a bunch of tests on variations of it, and we used some of those variations in other games.
We used it when there was no good modulo available to us. It's just a shift by two and two adds (or a multiply by 5 and one add). I would never use it nowadays for random numbers (I'd use an LCG), but maybe it would work OK for a shader where speed is crucial and your instruction set may be limited.
