What is the best way of calculating the diagonal of the inverse of a symmetric dense matrix (2000 x 2000)? Currently I calculate the inverse first using solve(x) and then extract the diagonal (diag(y)). It works, but I'm wondering whether there is a better way to do it so the code runs faster. I tried chol2inv(), but it didn't work since my matrix is not positive-definite.
Update:
For anyone who may be interested, I was able to speed up the matrix inversion by using an optimized math library, Intel MKL. It takes 3 seconds to invert a 2000 x 2000 matrix on my machine. Intel MKL is available with Microsoft R Open.
If your matrix has no nice properties like being symmetric, diagonal, or positive-definite, your only choice, sadly, is diag(solve(x)).
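For reference, a minimal sketch of that approach on a random symmetric test matrix (an assumption; your matrix will differ), with the chol2inv() route shown only for the positive-definite case:
set.seed(1)
A <- matrix(rnorm(2000 * 2000), nrow = 2000)
A <- (A + t(A)) / 2                   # symmetric, but in general not positive-definite
system.time(d <- diag(solve(A)))      # timing depends heavily on the BLAS in use
# Only if A happened to be positive-definite, this is usually faster:
# d <- diag(chol2inv(chol(A)))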
How long does that take to run on your matrix?
Related
I need to invert a p x p symmetric banded Hessian matrix H, which has 7 diagonals. p may be very large (1000 or 10000).
H^{-1} can be treated as banded, so I do not need to compute the complete inverse, only an approximation of it. (It could be assumed to have 11 or 13 diagonals, for example.)
I am looking for a method that does not involve parallelization.
Is it possible to build such an algorithm in R that runs in linear time?
There is no linear time algorithm for this, to the best of my knowledge.
But you're not totally without hope:
Your matrix is not really that large, so a relatively optimised implementation should be reasonably fast for p < 10K. For example, a dense LU decomposition requires at most O(p^3) operations; with p = 1000, that would probably take less than a second. In practice, an implementation for sparse matrices should achieve much better performance by taking advantage of the sparsity;
Do you really, really, really need to compute the inverse? Very often, explicitly computing the inverse can be replaced by solving an equivalent linear system. With some methods, such as iterative solvers (e.g. conjugate gradient), solving the linear system is significantly more efficient because the sparsity pattern of the source matrix is preserved, leading to less work. When computing the inverse, on the other hand, even if you know it is OK to approximate it with a banded matrix, there will still be a substantial amount of fill-in (added non-zero values).
Putting it all together, I'd suggest you try out the R Matrix package for your matrix. Try all the available solve() signatures and make sure you have a high-performance BLAS implementation installed. Also try to rewrite your code so that it solves a linear system instead of computing the inverse:
# e.g. rewrite...
A_inverse = solve(A)
x = A_inverse %*% y
# ... as
x = solve(A, y)
This may be more subtle for your purposes, but there is a very good chance you will be able to do it, as suggested in the package docs:
solve(a, b, ...) ## *the* two-argument version, almost always preferred to
solve(a) ## the *rarely* needed one-argument version
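For instance, a minimal sketch combining the Matrix package and the two-argument solve (with a made-up 7-diagonal symmetric matrix built via bandSparse; your actual Hessian and right-hand side will of course differ):
library(Matrix)
p <- 1000
# Hypothetical symmetric matrix with 7 diagonals: main diagonal plus 3 bands on each side
H <- bandSparse(p, k = 0:3,
                diagonals = list(rep(4, p), rep(-1, p - 1),
                                 rep(0.5, p - 2), rep(0.1, p - 3)),
                symmetric = TRUE)
b <- rnorm(p)
x <- solve(H, b)   # sparse factorization under the hood, no explicit inverse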
If all else fails, you may have to try more efficient implementations available in MATLAB, SuiteSparse, PETSc, Eigen, or Intel MKL.
I am trying to run the full SVD of a large (120k x 600k) and sparse (0.1% non-zero values) matrix M. Due to memory limitations all my previous attempts failed (with SVDLIBC, Octave, and R), and I am (almost) resigned to exploring other approaches to my problem (LSA).
However, at the moment, I am only interested in the eigenvalues of the diagonal matrix S and not in the left/right singular vectors (matrices U and V).
Is there a way to compute those singular values without storing in memory the dense matrix M and/or the singular vector matrices U and V?
Any help will be greatly appreciated.
[EDIT] My server configuration: 3.5 GHz/3.9 GHz (6 cores / 12 threads), 128 GB of RAM
Looking up the meaning of those values (the elements of the matrix S from an SVD decomposition) on Wikipedia, we find:
The non-zero singular values of M (found on the diagonal entries of Σ) are the square roots of the non-zero eigenvalues of both M*M and MM*.
So you can look for the eigenvalues of the matrix M M' (120k x 120k) without explicitly building that matrix, of course.
By the way, I don't think you are really interested in ALL the eigenvalues (or singular values) of a matrix with such dimensions; I doubt that any algorithm would give sufficiently accurate results for all of them.
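If only the leading singular values are needed, a minimal sketch along these lines uses the RSpectra package (an assumption; irlba is a similar alternative) and avoids forming U, V, or a dense M; the toy dimensions stand in for the real 120k x 600k case:
library(Matrix)
library(RSpectra)
M <- rsparsematrix(nrow = 1200, ncol = 6000, density = 0.001)  # toy stand-in
# Top 50 singular values only; nu = nv = 0 skips the singular vectors entirely
sv <- svds(M, k = 50, nu = 0, nv = 0)
sv$d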
How comfortable are you with Fortran? I think you should be able to complete the computations using prebuilt packages available here and/or here. Also, if you're open to C++ and a decomposition using randomized and re-orthonormalized matrices, you can try the code at the google code project called redsvd. (I can't post the link because I don't have the requisite reputation for three links, but you can search for redsvd and find it readily.)
I need to solve, thousands of times, a SMALL linear system of the type Ax=b. Here A is a matrix that is at least 3x3 and at most 8x8. I am aware of this http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ so I don't think it is smart to invert the matrix even if the matrices are small, right? So what is the most efficient way to do that? I am programming in Fortran, so probably I should use the LAPACK library, right? My matrices are full and in general non-symmetric.
Caveat: I didn't look into this extensively, but I have some experience I am happy to share.
In my experience, the fastest way to solve a 3x3 system is to basically use Cramer's rule. If you need to solve multiple systems with the same matrix A, it pays to pre-compute the inverse of A. This is only true for 2x2 and 3x3.
If you have to solve multiple 4x4 systems with the same matrix, then again using the inverse is noticeably faster than the forward and back-substitution of LU. I seem to remember that it uses fewer operations, and in practice the difference is even larger (again, in my experience). As the matrix size grows, the difference shrinks, and asymptotically it disappears. If you are solving systems with different matrices, then I don't think there is an advantage in computing the inverse.
In all cases, solving the system with the inverse can be much less accurate than using the LU decomposition if A is fairly ill-conditioned. So if accuracy is an issue, LU factorization is definitely the way to go.
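To make the Cramer's-rule point concrete, a small sketch in R (the question is about Fortran, but the arithmetic is identical; a tuned Fortran version would hard-code the cofactors):
# Cramer's rule for a 3x3 system A x = b: x_i = det(A_i) / det(A),
# where A_i is A with its i-th column replaced by b
cramer3 <- function(A, b) {
  dA <- det(A)
  sapply(1:3, function(i) { Ai <- A; Ai[, i] <- b; det(Ai) / dA })
}
A <- matrix(c(4, 1, 0, 1, 3, 1, 0, 1, 2), nrow = 3)
b <- c(1, 2, 3)
cramer3(A, b)        # matches solve(A, b)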
The LU factorization sounds like just the ticket for you: the LAPACK routine dgetrf will compute it, after which you can use dgetrs to solve the linear system. LAPACK has been optimized to the gills over the years, so in all likelihood you are better off using it than writing any of this code yourself.
The computational cost of computing the matrix inverse and then multiplying it by the right-hand side vector is the same, if not higher, than that of computing the LU factorization of the matrix and then forward- and back-solving to find your answer. Moreover, computing the inverse exhibits even more bizarre pathological behavior than computing the LU factorization, whose stability is itself a fairly subtle issue. It can be useful to know the inverse for small matrices, but it sounds like you don't need that for your purpose, so why do it?
Moreover, provided there are no loop-carried dependencies, you can parallelize this using OpenMP without too much trouble.
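To illustrate the factor-once idea in R terms (base R's solve() calls LAPACK's dgesv underneath): if the right-hand sides are all known up front, stacking them as columns makes LAPACK factor A once and back-solve for every column:
A <- matrix(rnorm(64), nrow = 8, ncol = 8)   # toy 8x8 system matrix
B <- matrix(rnorm(8 * 1000), nrow = 8)       # 1000 right-hand sides as columns
X <- solve(A, B)                             # one LU factorization, many back-solves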
I am using OpenCL to calculate the eigenvectors of a matrix. AMD has an example of eigenvalue calculation so I decided to use inverse iteration to get the eigenvectors.
I was following the algorithm described here and I noticed that in order to solve step 4 I need to solve a system of linear equations (or calculate the inverse of a matrix).
What is the best way to do this on a GPU using OpenCL? Are there any examples/references that I should look into?
EDIT: I'm sorry, I should have mentioned that my matrix is symmetric tridiagonal. From what I have been reading this could be important and maybe simplifies the whole process a lot
The fact that the matrix is tridiagonal is VERY important - that reduces the complexity of the problem from O(N^3) to O(N). You can probably get some speedup from the fact that it's symmetric too, but that won't be as dramatic.
The method for solving a tridiagonal system is here: http://en.wikipedia.org/wiki/Tridiagonal_matrix_algorithm.
Also note that you don't need to store all N^2 elements of the matrix, since almost all of them will be zeroes. You just need one vector of length N (for the diagonal) and two of length N-1 for the sub- and superdiagonals. And since your matrix is symmetric, the sub- and superdiagonals are the same.
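For reference, a sketch of that tridiagonal (Thomas) algorithm in R rather than OpenCL; note that the forward sweep is inherently sequential, so GPU implementations typically use a parallel variant such as cyclic reduction instead:
# a = sub-diagonal (length n-1), b = main diagonal (length n),
# c = super-diagonal (length n-1), d = right-hand side (length n).
# For a symmetric matrix, a and c are the same vector.
thomas_solve <- function(a, b, c, d) {
  n <- length(b)
  cp <- numeric(n - 1)
  dp <- numeric(n)
  cp[1] <- c[1] / b[1]
  dp[1] <- d[1] / b[1]
  for (i in 2:n) {                                       # forward elimination
    denom <- b[i] - a[i - 1] * cp[i - 1]
    if (i < n) cp[i] <- c[i] / denom
    dp[i] <- (d[i] - a[i - 1] * dp[i - 1]) / denom
  }
  x <- numeric(n)
  x[n] <- dp[n]
  for (i in (n - 1):1) x[i] <- dp[i] - cp[i] * x[i + 1]  # back substitution
  x
}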
Hope that's helpful...
I suggest using LU decomposition.
Here's an example.
It's written in CUDA, but I think it's not too hard to rewrite it in OpenCL.
How expensive is it to compute the eigenvalues of a matrix?
What is the complexity of the best algorithms?
How long might it take in practice if I have a 1000 x 1000 matrix? I assume it helps if the matrix is sparse?
Are there any cases where the eigenvalue computation would not terminate?
In R, I can compute the eigenvalues as in the following toy example:
m<-matrix( c(13,2, 5,4), ncol=2, nrow=2 )
eigen(m, only.values=1)
$values
[1] 14 3
Does anyone know what algorithm it uses?
Are there any other (open-source) packages that compute eigenvalues?
Most of the algorithms for eigenvalue computation scale as O(n^3), where n is the row/column dimension of the (square) matrix.
To find the time complexity of the best algorithm to date, you would have to refer to the latest research papers in scientific computing / numerical methods.
Even so, you should expect on the order of 1000^3 operations for a 1000 x 1000 matrix.
R uses the LAPACK routines DSYEVR, DGEEV, ZHEEV and ZGEEV by default. However, you can pass EISPACK=TRUE as a parameter to use EISPACK's RS, RG, CH and CG routines instead.
The most popular good open-source packages for eigenvalue computation are LAPACK and EISPACK.
With big matrices you usually don't want all the eigenvalues. You just want the top few to do (say) a dimension reduction.
The canonical algorithm is the Arnoldi-Lanczos iterative algorithm implemented in ARPACK:
www.caam.rice.edu/software/ARPACK/
There is a MATLAB interface, eigs:
http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/eigs.html
eigs(A,k) and eigs(A,B,k) return the k largest magnitude eigenvalues.
And there is now an R interface as well:
http://igraph.sourceforge.net/doc-0.5/R/arpack.html
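For instance, a minimal R sketch (using the RSpectra package, one of several ARPACK-style interfaces; the igraph one linked above works similarly) that extracts only the few largest eigenvalues of a sparse symmetric matrix:
library(Matrix)
library(RSpectra)
A <- rsparsematrix(5000, 5000, density = 0.001, symmetric = TRUE)  # toy sparse symmetric matrix
res <- eigs_sym(A, k = 10, which = "LM")   # 10 largest-magnitude eigenvalues via Lanczos-type iteration
res$values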
I assume it helps if the matrix is sparse?
Yes, there are algorithms that perform well on sparse matrices.
See for example: http://www.cise.ufl.edu/research/sparse/
How long might it take in practice if I have a 1000 x 1000 matrix?
MATLAB (based on LAPACK) computes all eigenvalues of a 1000 x 1000 random matrix in roughly 5 seconds on a dual-core 1.83 GHz machine. When the matrix is symmetric, the computation can be done significantly faster and requires only about 1 second.
I would take a look at Eigenvalue algorithms, which links to a number of different methods. They'll all have different characteristics, and hopefully one will be suitable for your purposes.
You can use the GuessCompx package from CRAN to estimate the empirical complexity of your eigenvalue computation and predict the full running time (although it's still small in your example). You need a little helper function, because the fitting process only subsets the rows, so you must make the matrix square:
library(GuessCompx)
m = matrix(rnorm(1e6), ncol=1000, nrow=1000)
# custom function to subset the increasing-size matrix to a square one:
eigen. = function(m) eigen(as.matrix(m[, 1:nrow(m)]))
CompEst(m, eigen.)
#### $`TIME COMPLEXITY RESULTS`
#### $`TIME COMPLEXITY RESULTS`$best.model
#### [1] "CUBIC"
#### $`TIME COMPLEXITY RESULTS`$computation.time.on.full.dataset
#### [1] "5.23S"
#### $`TIME COMPLEXITY RESULTS`$p.value.model.significance
#### [1] 1.784406e-34
You get a cubic complexity for time, and a Nlog(N) complexity for memory usage of the R base eigen() function. It takes 5.2 secs and 37Mb to run the whole computation.
Apache Mahout is an open-source framework built on MapReduce (i.e. it works for really, really big matrices). Note that for a lot of matrix problems the question isn't "what's the big-O runtime?" but rather "how parallelizable is it?" Mahout says they use Lanczos, which can essentially be run in parallel on as many processors as you care to give it.
It uses the QR algorithm. See Wilkinson, J. H. (1965) The Algebraic Eigenvalue Problem. Clarendon Press, Oxford. It does not exploit sparsity.