Accuracy of ZHEEV and ZHEEVD - numerical

I am using LAPACK to diagonalize complex Hermitian matrices. I can choose between ZHEEV and ZHEEVD. Which one of these routines is more accurate for matrices of the size 40 and a range of eigenvalues from 1E-2 to 1E1?

ZHEEVD uses a divide-and-conquer method to compute eigenvalues.
If your matrices are 40 x 40 and the eigenvalues are within the range [1e-2, 1e1] then
you should have absolutely no numerical issues. You can use either routine.

I don't know the answer but,
It probably depends on which LAPACK library you're using. There are a number of them out there, optimized for various platforms. Are you using Netlib, MKL, ACML, ??
Why would you take a take a total stranger's word for this when you can measure it yourself?

Related

How many arithmetic operations should it take to calculate trig functions?

I'm trying to assess the expected performance of calculating trigonometry functions as a function of the required precision. Obviously the wall clock time depends on the speed of the underlying arithmetic, so factoring that out by just counting number of operations:
Using state-of-the-art algorithms, how many arithmetic operations (add, subtract, multiply, divide) should it take to calculate sin(x), as a function of the number of bits (or decimal digits) of precision required in the output?
... to assess the expected performance of calculating trigonometry functions as a function of the required precision.
Look as the first omitted term in the Taylor series sine for x = π/4 as the order of error.
Details: sin(x) usually has these phases:
Handling special cases: NaN, infinities.
Argument reduction to the primary range to say [-π/4...+π/4]. Real good reduction is hard as π is irrational and so involves code that reaches 50% of sin() time. Much time used to emulate the needed extended precision. (Research K.C. Ng's "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit")
Low quality reduction involves much less:/, truncate, -, *.
Calculation over a limited range. This is what many only consider. If done with a Taylor's series and needing 53 bits, then about 10-11 terms are needed: Taylor series sine. Yet quality code often uses a pair of crafted polynomials, each of about 4-5 terms, to form the quotient p(x)/q(x).
Of course dedicated hardware support in any of these steps greatly increases performance.
Note: code for sin() is often paired with cos() code as extensive use of trig identities simplify the calculation.
I'd expect a software solution for sin() to cost on the order of 25x a common *. This is a rough estimate.
To achieve a very low error rate in the ULP, code typically uses a tad more. sine_crap() could get by with only a few terms. So when assessing time performance, there is a trade-off with correctness. How good a sin() do you want?
assess the expected performance of calculating trigonometry functions as a function of the required precision
Using the Taylors series as a predictor of the number of ops, worst case x = π/4 (45°) and the error in the calculation on the order of the last term of the series:
For 32-bit float, order 6 float ops needed.
For 64-bit double, order 9 float ops needed.
So if time scales by the square of the FP width, double predicted to take 9/6*2*2 or 6 times as long.
We can calculate any trigonometric function using a simple right angled triangle or using the McLaurin\Taylor Series. So it really depends on which one you choose to implement. If you only pass an angle as an argument, and wish to calculate the sin of that particular angle, it would take about 4 to 6 steps to calculate the sin using an unit circle.

Which sparse linear solver is faster? SparseLU or BiCGSTAB?

I tested Eigen's SparseLU and BicGSTAB method on some sparse matrix, whose dense counterparts' size ranges from 3000*3000 to 16000*16000. All the cases shows that SparseLU is around 13% faster than BicGSTAB method.
I didn't feed the BiCGSTAB a RowMajor sparse matrix, or give it any pre-conditioner. That might be the reason of slow.
So I am wondering, if I do both methods well, which one should be faster?
How about if the matrix size goes up to millions*millions?
Thanks a lot!
You already mentioned the main reason of performance difference.
The iterative methods are getting much faster when you choose the "right" preconditioner.
An example list of preconditioners you might refer to is:
Jacobi
SOR
ILU
Multigrid
Each preconditioner has some parameters the should be tuned also.
Choice of linear solver has a lot to do with the distribution of eigenvalues/eigenvectors of the matrix. If you have a symmetric positive definite matrix then conjugate-gradient is a good option. Number of iterations depend on the condition number (max eigen value/min eigen value). For a matrix derived from an elliptic operator, condition number increases with the size of the matrix.
Check out this article by Jonathan Shewchuk for a great explanation on CG. (https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf).
For other matrix types, you can use GMRES etc. based on the eigen properties. Check out this paper http://www.sam.math.ethz.ch/~mhg/pub/biksm.pdf
Hope this helps.

How can you find the condition number in Eigen?

In Matlab there are cond and rcond, and in LAPACK too. Is there any routine in Eigen to find the condition number of a matrix?
I have a Cholesky decomposition of a matrix and I want to check if it is close to singularity, but cannot find a similar function in the docs.
UPDATE:
I think I can use something like this algorithm, which makes use of the triangular factorization. The method by Ilya is useful for more accurate answers, so I will mark it as correct.
Probably easiest way to compute the condition number is using the expression:
cond(A) = max(sigma) / min(sigma)
where sigma is an array of singular values, the result of SVD. Eigen author suggests this code:
JacobiSVD<MatrixXd> svd(A);
double cond = svd.singularValues()(0)
/ svd.singularValues()(svd.singularValues().size()-1);
Other ways are (less efficient)
cond(A) = max(lambda) / min(lambda)
cond(A) = norm2(A) * norm2(A^-1)
where lambda is an array of eigenvalues.
It looks like Cholesky decomposition does not directly help here, but I cant tell for sure at the moment.
You could use the Gershgorin circle theorem to get a rough estimate.
But as Ilya Popov has already pointed out, calculating the eigenvalues/singular values is more reliable. However, it doesn't make sense to calculate all eigenvalues, which gets very expensive. You only need the largest and the smallest eigenvalues, for that you can use the Power method for the largest and Inverse Iteration for the smallest eigenvalue.
Or you could use a library that can do this already, e.g. Spectra.
You can use norms. In my robotics experience, this is computationally faster than singular values:
pseudoInverse(matrix).norm() * matrix.norm()
I found this to be 2.6X faster than singular values for 6x6 matrices. It's also recommended in this book:
B. Siciliano, and O. Khatib, Springer Handbook of Robotics. Berlin: Springer Science and Business Media, 2008, p. 236.

OpenCL reduction result wrong with large floats

I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65 536 using floating point precision. Unfortunately, the result is not correct. However, when I modify my code, so that I compute the sum of 65 536 smaller numbers (for example 1), the result is correct.
I couldn't find any error in the code. Is it possible that I am getting wrong results, because of the float type? If this is the case, what is the best approach to solve the issue?
This is a "side effect" of summing floating point numbers using finite precision CPU's or GPU's. The accuracy depends the algorithm and the order the values are summed. The theory and practice behind is explained in Nicholas J, Higham's paper
The Accuracy of Floating Point Summation
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=7AECC0D6458288CD6E4488AD63A33D5D?doi=10.1.1.43.3535&rep=rep1&type=pdf
The fix is to use a smarter algorithm like the Kahan Summation Algorithm
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
And the Higham paper has some alternatives too.
This problem illustrates the nature of benchmarking, the first rule of the benchmark is to get the
right answer, using realistic data!
There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating point.
The correct sum is: 65537 * 32768 = 2147516416, and it takes 31 bits to represent it in binary (10000000000000001000000000000000). 32-bit floats can only hold integers accurately up to 2^24.
"Any integer with absolute value less than [2^24] can be exactly represented in the single precision format"
"Floating Point" article, wikipedia
This is why you are getting the correct sum when it is less than or equal to 2^24. If you are doing a complete sum using single-precision, you will eventually lose accuracy no matter which device you are executing the kernel on. There are a few things you can do to get the correct answer:
use double instead of float if your platform supports it
use int or unsigned int
sum a smaller set of numbers eg: 0+1+2+...+4095+4096 = (2^23 + 2^11)
Read more about single precision here.

How to check if m n-sized vectors are linearly independent?

Disclaimer
This is not strictly a programming question, but most programmers soon or later have to deal with math (especially algebra), so I think that the answer could turn out to be useful to someone else in the future.
Now the problem
I'm trying to check if m vectors of dimension n are linearly independent. If m == n you can just build a matrix using the vectors and check if the determinant is != 0. But what if m < n?
Any hints?
See also this video lecture.
Construct a matrix of the vectors (one row per vector), and perform a Gaussian elimination on this matrix. If any of the matrix rows cancels out, they are not linearly independent.
The trivial case is when m > n, in this case, they cannot be linearly independent.
Construct a matrix M whose rows are the vectors and determine the rank of M. If the rank of M is less than m (the number of vectors) then there is a linear dependence. In the algorithm to determine the rank of M you can stop the procedure as soon as you obtain one row of zeros, but running the algorithm to completion has the added bonanza of providing the dimension of the spanning set of the vectors. Oh, and the algorithm to determine the rank of M is merely Gaussian elimination.
Take care for numerical instability. See the warning at the beginning of chapter two in Numerical Recipes.
If m<n, you will have to do some operation on them (there are multiple possibilities: Gaussian elimination, orthogonalization, etc., almost any transformation which can be used for solving equations will do) and check the result (eg. Gaussian elimination => zero row or column, orthogonalization => zero vector, SVD => zero singular number)
However, note that this question is a bad question for a programmer to ask, and this problem is a bad problem for a program to solve. That's because every linearly dependent set of n<m vectors has a different set of linearly independent vectors nearby (eg. the problem is numerically unstable)
I have been working on this problem these days.
Previously, I have found some algorithms regarding Gaussian or Gaussian-Jordan elimination, but most of those algorithms only apply to square matrix, not general matrix.
To apply for general matrix, one of the best answers might be this:
http://rosettacode.org/wiki/Reduced_row_echelon_form#MATLAB
You can find both pseudo-code and source code in various languages.
As for me, I transformed the Python source code to C++, causes the C++ code provided in the above link is somehow complex and inappropriate to implement in my simulation.
Hope this will help you, and good luck ^^
If computing power is not a problem, probably the best way is to find singular values of the matrix. Basically you need to find eigenvalues of M'*M and look at the ratio of the largest to the smallest. If the ratio is not very big, the vectors are independent.
Another way to check that m row vectors are linearly independent, when put in a matrix M of size mxn, is to compute
det(M * M^T)
i.e. the determinant of a mxm square matrix. It will be zero if and only if M has some dependent rows. However Gaussian elimination should be in general faster.
Sorry man, my mistake...
The source code provided in the above link turns out to be incorrect, at least the python code I have tested and the C++ code I have transformed does not generates the right answer all the time. (while for the exmample in the above link, the result is correct :) -- )
To test the python code, simply replace the mtx with
[30,10,20,0],[60,20,40,0]
and the returned result would be like:
[1,0,0,0],[0,1,2,0]
Nevertheless, I have got a way out of this. It's just this time I transformed the matalb source code of rref function to C++. You can run matlab and use the type rref command to get the source code of rref.
Just notice that if you are working with some really large value or really small value, make sure use the long double datatype in c++. Otherwise, the result will be truncated and inconsistent with the matlab result.
I have been conducting large simulations in ns2, and all the observed results are sound.
hope this will help you and any other who have encontered the problem...
A very simple way, that is not the most computationally efficient, is to simply remove random rows until m=n and then apply the determinant trick.
m < n: remove rows (make the vectors shorter) until the matrix is square, and then
m = n: check if the determinant is 0 (as you said)
m < n (the number of vectors is greater than their length): they are linearly dependent (always).
The reason, in short, is that any solution to the system of m x n equations is also a solution to the n x n system of equations (you're trying to solve Av=0). For a better explanation, see Wikipedia, which explains it better than I can.

Resources