Solving linear equations during inverse iteration - OpenCL

I am using OpenCL to calculate the eigenvectors of a matrix. AMD has an example of eigenvalue calculation so I decided to use inverse iteration to get the eigenvectors.
I was following the algorithm described here and I noticed that in order to solve step 4 I need to solve a system of linear equations (or calculate the inverse of a matrix).
What is the best way to do this on a GPU using OpenCL? Are there any examples/references that I should look into?
EDIT: I'm sorry, I should have mentioned that my matrix is symmetric tridiagonal. From what I have been reading, this could be important and might simplify the whole process a lot.

The fact that the matrix is tridiagonal is VERY important - that reduces the complexity of the problem from O(N^3) to O(N). You can probably get some speedup from the fact that it's symmetric too, but that won't be as dramatic.
The method for solving a tridiagonal system is here: http://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm.
Also note that you don't need to store all N^2 elements of the matrix, since almost all of them will be zeroes. You just need one vector of length N (for the diagonal) and two of length N-1 for the sub- and superdiagonals. And since your matrix is symmetric, the sub- and superdiagonals are the same.
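For illustration, a straightforward serial version of that algorithm (the Thomas algorithm) looks roughly like the sketch below; variable names are my own. Note that the recurrence is inherently sequential, so GPU implementations usually switch to a parallel variant such as cyclic reduction, but the arithmetic is the same.

#include <vector>

// Rough sketch of the Thomas algorithm for T x = d, where T is tridiagonal
// with subdiagonal a, diagonal b and superdiagonal c (a[0] and c[n-1] unused).
// For a symmetric tridiagonal matrix, a and c hold the same values.
std::vector<double> solveTridiagonal(std::vector<double> a,
                                     std::vector<double> b,
                                     std::vector<double> c,
                                     std::vector<double> d) {
    const std::size_t n = b.size();
    // Forward sweep: eliminate the subdiagonal.
    for (std::size_t i = 1; i < n; ++i) {
        const double w = a[i] / b[i - 1];
        b[i] -= w * c[i - 1];
        d[i] -= w * d[i - 1];
    }
    // Back substitution.
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0; ) {
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    }
    return x;
}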
Hope that's helpful...

I suggest using LU decomposition.
Here's an example.
It's written in CUDA, but I think it's not too hard to rewrite it in OpenCL.

Related

Is there any possibility to invert a symmetric banded (7 diagonals) matrix in linear time?

I need to invert a p x p symmetric banded Hessian matrix H, which has 7 diagonals. p may be very large (e.g. 1000 or 10000).
H^{-1} can be considered banded as well, and thus I do not need to compute the complete inverse matrix, but rather an approximation of it (it could be assumed to have 11 or 13 diagonals, for example).
I am looking for a method that does not rely on parallelization.
Is it possible to build such an algorithm in R that runs in linear time?
There is no linear time algorithm for this, to the best of my knowledge.
But you're not totally without hope:
Your matrix is not really that large, so using a relatively optimised implementation might be reasonably fast for p < 10K. For example, a dense LU decomposition requires O(p^3) operations; with p = 1000, that would probably take less than a second. In practice, an implementation for sparse matrices should achieve much better performance by taking advantage of the sparsity.
Do you really, really, really need to compute the inverse? Explicitly computing the inverse can very often be replaced by solving an equivalent linear system. With some methods, such as iterative solvers (e.g. conjugate gradient), solving the linear system is significantly more efficient because the sparsity pattern of the source matrix is preserved, leading to reduced work. When computing the inverse, even if you know it's OK to approximate it with a banded matrix, there will still be a substantial amount of fill-in (added nonzero values).
Putting it all together, I'd suggest you try out the R Matrix package for your matrix. Try all the available signatures and ensure you have a high-performance BLAS implementation installed. Also try to rewrite your call that computes the inverse:
# e.g. rewrite...
A_inverse = solve(A)
x = y * A_inverse
# ... as
x = solve(A, y)
This may be more subtle for your use case, but there is a very good chance you can do it, as suggested in the package docs:
solve(a, b, ...) ## *the* two-argument version, almost always preferred to
solve(a) ## the *rarely* needed one-argument version
If all else fails, you may have to try more efficient implementations available in MATLAB, SuiteSparse, PETSc, Eigen, or Intel MKL.
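If you do end up in C++ with Eigen, a sketch like the one below (the matrix contents are purely illustrative placeholders) shows the usual pattern: build the banded matrix in sparse form, factorize once with SparseLU, and reuse the factorization for each right-hand side instead of ever forming the inverse.

#include <Eigen/Sparse>
#include <algorithm>
#include <vector>

int main() {
    const int p = 1000;  // size of the hypothetical Hessian
    std::vector<Eigen::Triplet<double>> entries;
    for (int i = 0; i < p; ++i) {
        for (int j = std::max(0, i - 3); j <= std::min(p - 1, i + 3); ++j) {
            // Placeholder band values, made diagonally dominant so the
            // factorization is well behaved.
            entries.emplace_back(i, j, i == j ? 8.0 : 1.0);
        }
    }
    Eigen::SparseMatrix<double> H(p, p);
    H.setFromTriplets(entries.begin(), entries.end());
    H.makeCompressed();

    Eigen::SparseLU<Eigen::SparseMatrix<double>> solver;
    solver.compute(H);                    // factorize once
    Eigen::VectorXd y = Eigen::VectorXd::Ones(p);
    Eigen::VectorXd x = solver.solve(y);  // repeat for each right-hand side
    return x.size() == p ? 0 : 1;
}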

Linear iterative solver vs direct solver stability

Are iterative solvers more stable than direct solvers based on LU factorization? For an LU-based solver, we always have cond(A) <= cond(L) * cond(U), so the factorization can amplify numerical inaccuracy. So for an ill-conditioned matrix A, whose condition number is larger than 1e10, would it be better to use an iterative solver for stability and numerical accuracy?
There are two factors involved in answering your question.
1) The physical system you are analyzing is ill-conditioned by itself (in mechanical terms, the system is pretty "loose", so its equilibrium state may vary greatly depending on just a small variation in the boundary conditions)
2) The physical system is OK, but the matrix has not been scaled properly before the solution process begins.
In the first case, there isn't much you can do: the physical system is inherently unstable. Consider applying different boundary conditions, for example.
In the second case, a preconditioner should be helpful; for example, the Jacobi preconditioner rescales the matrix so that all diagonal values are equal to 1. In this case, the iterations are more likely to converge. A condition number of 1e10 shouldn't cause too much trouble, provided preconditioning is used.
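As a concrete illustration, assuming the system is symmetric positive definite (which conjugate gradient requires; for a general matrix, BiCGSTAB would be the analogous choice), Eigen's ConjugateGradient uses exactly this diagonal (Jacobi) preconditioner by default:

#include <Eigen/Sparse>

// Sketch: Jacobi-preconditioned conjugate gradient on a sparse SPD matrix.
// DiagonalPreconditioner (Eigen's default) rescales by the inverse diagonal,
// i.e. the Jacobi preconditioner described above.
Eigen::VectorXd solveWithJacobiCG(const Eigen::SparseMatrix<double>& A,
                                  const Eigen::VectorXd& b) {
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper,
                             Eigen::DiagonalPreconditioner<double>> cg;
    cg.setTolerance(1e-10);  // stop when the relative residual drops below this
    cg.compute(A);
    return cg.solve(b);
}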

For a 3x3 symmetric positive definite linear system, is Cholesky still faster than Householder?

I am trying to solve a linear system Ax=b where A is 3x3 symmetric positive definite.
Though it is small in scale, I will have to repeat it for different A's millions of times, so efficiency is still important.
There are many solvers for linear systems (C++, via Eigen).
I personally prefer householderQr().solve(), llt().solve(), and ldlt().solve().
I know that when n is very large, solvers based on Cholesky decomposition are faster than those based on Householder's. But in my case, when n is only 3, how can I compare their relative efficiency? Is there a formula for the exact floating-point operation count?
thanks
Yes, Cholesky will still be faster. It will be about n^3/3 flops. The only reason to use QR would be if your matrix was very ill-conditioned.
If you need to solve these systems a million times and efficiency is important, I'd recommend calling LAPACK directly. You want the DPOSV function.
http://www.netlib.org/lapack/lug/node26.html#1272
http://www.netlib.org/clapack/what/double/dposv.c
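If you want to stay inside Eigen, a sketch with the fixed-size types (which keeps everything on the stack, useful when the solve is repeated millions of times) would look like this:

#include <Eigen/Dense>

// Sketch: Cholesky (LLT) solve of a 3x3 symmetric positive definite system.
Eigen::Vector3d solveSPD3x3(const Eigen::Matrix3d& A, const Eigen::Vector3d& b) {
    return A.llt().solve(b);
}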

Most efficient way to solve SEVERAL linear systems Ax=b with SMALL A (minimum 3x3 maximum 8x8)

I need to solve, thousands of times, a SMALL linear system of the type Ax=b. Here A is a matrix that is no smaller than 3x3 and no larger than 8x8. I am aware of this http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ so I don't think it is smart to invert the matrix even if the matrices are small, right? So what is the most efficient way to do that? I am programming in Fortran, so probably I should use the LAPACK library, right? My matrices are full and in general non-symmetric.
Thanks
A.
Caveat: I didn't look into this extensively, but I have some experience I am happy to share.
In my experience, the fastest way to solve a 3x3 system is to basically use Cramer's rule. If you need to solve multiple systems with the same matrix A, it pays to pre-compute the inverse of A. This is only true for 2x2 and 3x3.
If you have to solve multiple 4x4 systems with the same matrix, then again using the inverse is noticeably faster than the forward and back substitution of LU. I seem to remember that it uses fewer operations, and in practice the difference is even larger (again, in my experience). As the matrix size grows, the difference shrinks, and asymptotically it disappears. If you are solving systems with different matrices, then I don't think there is an advantage in computing the inverse.
In all cases, solving the system with the inverse can be much less accurate than using the LU decomposition if A is fairly ill-conditioned. So if accuracy is an issue, then LU factorization is definitely the way to go.
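To make the 3x3 case concrete, here is a sketch of Cramer's rule in C++ (a Fortran version is analogous). There is no pivoting and no safeguard beyond a zero-determinant check, so prefer the LU route if conditioning is a concern.

// Cramer's rule for a 3x3 system A x = b: each solution component is the
// determinant of A with one column replaced by b, divided by det(A).
bool solve3x3(const double A[3][3], const double b[3], double x[3]) {
    const double det =
          A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
        - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
        + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]);
    if (det == 0.0) return false;
    const double inv_det = 1.0 / det;
    x[0] = inv_det * ( b[0]    * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
                     - A[0][1] * (b[1]    * A[2][2] - A[1][2] * b[2])
                     + A[0][2] * (b[1]    * A[2][1] - A[1][1] * b[2]) );
    x[1] = inv_det * ( A[0][0] * (b[1]    * A[2][2] - A[1][2] * b[2])
                     - b[0]    * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
                     + A[0][2] * (A[1][0] * b[2]    - b[1]    * A[2][0]) );
    x[2] = inv_det * ( A[0][0] * (A[1][1] * b[2]    - b[1]    * A[2][1])
                     - A[0][1] * (A[1][0] * b[2]    - b[1]    * A[2][0])
                     + b[0]    * (A[1][0] * A[2][1] - A[1][1] * A[2][0]) );
    return true;
}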
The LU factorization sounds like just the ticket for you, and the LAPACK routine dgetrf will compute it for you, after which you can use dgetrs to solve that linear system. LAPACK has been optimized to the gills over the years, so in all likelihood you are better off using it than writing any of this code yourself.
The computational cost of computing the matrix inverse and then multiplying that by the right-hand side vector is the same if not more than computing the LU-factorization of the matrix and then forward- and back-solving to find your answer. Moreover, computing the inverse exhibits even more bizarre pathological behavior than computing the LU-factorization, the stability of which is still a fairly subtle issue. It can be useful to know the inverse for small matrices, but it sounds like you don't need that for your purpose, so why do it?
Moreover, provided there are no loop-carried dependencies, you can parallelize this using OpenMP without too much trouble.
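For reference, here is a sketch of that dgetrf/dgetrs pattern through the LAPACKE C interface (shown in C++ to match the other sketches here; the Fortran calls DGETRF and DGETRS take essentially the same arguments, with INFO returned as the last parameter):

#include <lapacke.h>
#include <vector>

// Sketch: LU-factorize an n x n matrix once with dgetrf, then solve A x = b
// with dgetrs; b is overwritten with the solution. Column-major layout
// matches the Fortran convention.
bool luSolve(std::vector<double>& A, std::vector<double>& b, int n) {
    std::vector<lapack_int> ipiv(n);
    lapack_int info = LAPACKE_dgetrf(LAPACK_COL_MAJOR, n, n, A.data(), n, ipiv.data());
    if (info != 0) return false;  // factorization failed (e.g. singular matrix)
    info = LAPACKE_dgetrs(LAPACK_COL_MAJOR, 'N', n, 1,
                          A.data(), n, ipiv.data(), b.data(), n);
    return info == 0;             // call dgetrs again for further right-hand sides
}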

How expensive is it to compute the eigenvalues of a matrix?

How expensive is it to compute the eigenvalues of a matrix?
What is the complexity of the best algorithms?
How long might it take in practice if I have a 1000 x 1000 matrix? I assume it helps if the matrix is sparse?
Are there any cases where the eigenvalue computation would not terminate?
In R, I can compute the eigenvalues as in the following toy example:
m<-matrix( c(13,2, 5,4), ncol=2, nrow=2 )
eigen(m, only.values=1)
$values
[1] 14 3
Does anyone know what algorithm it uses?
Are there any other (open-source) packages that compute the eigenvalue?
Most of the algorithms for eigenvalue computation scale as O(n^3), where n is the row/column dimension of the (symmetric, square) matrix.
To know the time complexity of the best algorithm to date, you would have to refer to the latest research papers in scientific computing/numerical methods.
But even assuming that O(n^3) scaling, you would still need on the order of 1000^3 operations for a 1000x1000 matrix.
R uses the LAPACK routines (DSYEVR, DGEEV, ZHEEV and ZGEEV) by default. However, you can pass EISPACK=TRUE as a parameter to use EISPACK's RS, RG, CH and CG routines.
The most popular and good open source packages for eigenvalue computation are LAPACK and EISPACK.
With big matrices you usually don't want all the eigenvalues. You just want the top few to do (say) a dimension reduction.
The canonical algorithm is the Arnoldi-Lanczos iterative algorithm implemented in ARPACK:
www.caam.rice.edu/software/ARPACK/
There is a MATLAB interface in eigs:
http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/eigs.html
eigs(A,k) and eigs(A,B,k) return the k largest magnitude eigenvalues.
And there is now an R interface as well:
http://igraph.sourceforge.net/doc-0.5/R/arpack.html
I assume it helps if the matrix is sparse?
Yes, there are algorithms that perform well on sparse matrices.
See for example: http://www.cise.ufl.edu/research/sparse/
How long might it take in practice if I have a 1000x1000 matrix?
MATLAB (based on LAPACK) computes all the eigenvalues of a 1000x1000 random matrix in roughly 5 seconds on a dual-core 1.83 GHz machine. When the matrix is symmetric, the computation can be done significantly faster and requires only about 1 second.
I would take a look at Eigenvalue algorithms, which link to a number of different methods. They'll all have different characteristics, and hopefully one will be suitable for your purposes.
You can use the GuessCompx package from CRAN to estimate the empirical complexity of your eigenvalue computation and predict the full running time (although it's still small in your example). You need a little helper function because the fitting process only subsets the rows, so you must make the matrix square:
library(GuessCompx)
m = matrix(rnorm(1e6), ncol=1000, nrow=1000)
# custom function to subset the increasing-size matrix to a square one:
eigen. = function(m) eigen(as.matrix(m[, 1:nrow(m)]))
CompEst(m, eigen.)
#### $`TIME COMPLEXITY RESULTS`
#### $`TIME COMPLEXITY RESULTS`$best.model
#### [1] "CUBIC"
#### $`TIME COMPLEXITY RESULTS`$computation.time.on.full.dataset
#### [1] "5.23S"
#### $`TIME COMPLEXITY RESULTS`$p.value.model.significance
#### [1] 1.784406e-34
You get a cubic complexity for time, and a Nlog(N) complexity for memory usage of the R base eigen() function. It takes 5.2 secs and 37Mb to run the whole computation.
Apache Mahout is an open-source framework built on map-reduce (i.e. it works for really, really big matrices). Note that for a lot of matrix stuff the question isn't "what's the big-O runtime" but rather "how parallelizable is it?" Mahout says they use Lanczos, which can essentially be run in parallel on as many processors as you care to give it.
It uses the QR algorithm. See Wilkinson, J. H. (1965), The Algebraic Eigenvalue Problem, Clarendon Press, Oxford. It does not exploit sparsity.
