I have a general real matrix (i.e. not symmetric or Hermitian, etc.), and I would like to find its right eigenvectors and corresponding left eigenvectors in Julia.
Julia's eigen function returns the right eigenvectors only. I can find the left eigenvectors by doing
eigen(copy(M'))
but this requires copying the whole matrix and performing the eigendecomposition again, and there is no guarantee that the eigenvectors will be in the same order. (The copy is necessary because there is no eigen method for matrices of type Adjoint.)
In Python we have scipy.linalg.eig, which can compute the left and right eigenvectors simultaneously in a single call; this is more efficient and guarantees that they will be in the same order. Is there something similar in Julia?
The left eigenvectors can be computed by taking the inverse of the matrix formed by the right eigenvectors:
using LinearAlgebra
A = [1 0.1; 0.1 1]
F = eigen(A)
Q = eigvecs(F) # right eigenvectors
QL = inv(Q) # rows of QL are the left eigenvectors
Λ = Diagonal(eigvals(F))
# check the results
A * Q ≈ Q * Λ # returns true
QL * A ≈ Λ * QL # returns true, too
# in general we have:
A ≈ Q * Λ * inv(Q)
In the above example the rows of QL are the left eigenvectors.
If the left eigenvectors are applied to a vector, it is preferable to compute Q \ v instead of inv(Q) * v.
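For illustration, a minimal check that the two ways of applying the left eigenvectors to a vector agree (reusing the names from the example above; v is an arbitrary test vector):

v = [1.0, 2.0]
Q \ v ≈ QL * v # returns true: Q \ v solves Q * y = v, i.e. y = inv(Q) * v, without forming the inverse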
I use the SVD factorization, which decomposes a matrix into three matrices, M = U*S*V^T. The columns of U are the left singular vectors, the columns of V are the right singular vectors, and S is diagonal and contains the singular values, which are the square roots of the eigenvalues of M^T*M.
Note that U = V if and only if M is symmetric and positive semidefinite (as in PCA, where the SVD of a correlation matrix coincides with its eigendecomposition).
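As a quick sanity check of the relation between singular values and eigenvalues, here is a small Julia sketch (M is an arbitrary test matrix):

using LinearAlgebra
M = randn(4, 3)
F = svd(M)
# The singular values of M are the square roots of the eigenvalues of M' * M,
# sorted in decreasing order:
F.S ≈ sqrt.(reverse(eigvals(Symmetric(M' * M)))) # returns true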
Here are some sources where I confirmed my info:
https://www.cc.gatech.edu/~dellaert/pubs/svd-note.pdf
https://en.wikipedia.org/wiki/Singular_value_decomposition
I have a very large symmetric matrix called M. The size of the matrix M is 1000000 * 1000000. Let M[i,j] denote the element at the ith row and jth column of matrix M. The upper triangular part of the symmetric matrix M was saved as a vector called V. V = [M[1,1], M[1,2], M[2,2], M[1,3], M[2,3], M[3,3], M[1,4], M[2,4], M[3,4], M[4,4], ..., M[1000000, 1000000]]. I have three questions.
(1) How can I convert V to M efficiently?
(2) How can I convert V to the upper triangular part of the symmetric matrix M efficiently?
I mean convert V to another matrix W. The upper triangular part of W is the same as that of M, while the other elements of W are 0.
(3) How can I convert V to the lower triangular part of the symmetric matrix M efficiently?
I mean convert V to another matrix Q. The lower triangular part of Q is the same as that of M, while the other elements of Q are 0.
In this case the most efficient way to create M is to have a custom type that is <:AbstractMatrix. This should be almost zero overhead and use no extra memory.
The type would be something like:
struct MyMatrix{S, T<:AbstractVector{S}} <: AbstractMatrix{S}
v::T
end
(I am omitting a constructor, which should check that the length of v matches the "half" of some square matrix.)
Then you should define the appropriate methods for your type. Their list is given here in the Julia manual (depending on the exact kind of matrix you want, they should be implemented differently). In that section there is an example of how such an object can be implemented.
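A minimal sketch of such a type, assuming the column-by-column packing described in the question (the name PackedSymmetric and the explicitly stored side length n are illustrative choices; only size and getindex are implemented here):

struct PackedSymmetric{S, T<:AbstractVector{S}} <: AbstractMatrix{S}
    v::T
    n::Int # side length of the square matrix
end

Base.size(A::PackedSymmetric) = (A.n, A.n)

function Base.getindex(A::PackedSymmetric, i::Int, j::Int)
    i, j = minmax(i, j) # symmetry: only the upper triangle is stored
    return A.v[(j - 1) * j ÷ 2 + i] # column j of the upper triangle starts after j*(j-1)/2 entries
end

# e.g. PackedSymmetric([1.0, 2.0, 3.0], 2) behaves like the matrix [1.0 2.0; 2.0 3.0]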
Rao's QE is a weighted sum over a Euclidean distance matrix. I have the vectors for the elements of the d_ijs in a data table dt, one column per element (say there are x of them). p is the final column. nrow = S. The double sums are over the lower-left (or upper-right, since the matrix is symmetric) elements of the distance matrix.
If I only needed an unweighted distance matrix I could simply do dist() over the x columns. How do I weight the d_ijs by the product of p_i and p_j?
An example data set is at https://github.com/GeraldCNelson/nutmod/blob/master/RaoD_example.csv with the ps in the column called foodQ.ratio.
You still start with dist for the raw Euclidean distance matrix. Let it be D. As you will read from R - How to get row & column subscripts of matched elements from a distance matrix, a "dist" object is not a real matrix, but a 1D array. So first do D <- as.matrix(D) or D <- dist2mat(D) to convert it to a complete matrix before the following.
Now, let p be the vector of weights; Rao's QE is just the quadratic form p'Dp / 2:
c(crossprod(p, D %*% p)) / 2
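For reference, the 1/2 comes from writing the double sum over distinct pairs as half of the full quadratic form (D is symmetric with zero diagonal):

$Q = \sum_{i < j} d_{ij} \, p_i p_j = \frac{1}{2} \sum_{i,j} d_{ij} \, p_i p_j = \frac{1}{2} \, p^T D p$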
Note, I am not doing everything in the most efficient way. I have performed the symmetric matrix-vector multiplication D %*% p using the full D rather than just its lower triangular part. However, R does not have a routine for triangular matrix-vector multiplication. So I compute the full quadratic form and then divide by 2.
This doubles the amount of computation necessary; also, making D a full matrix doubles the memory cost. But if your problem is of small to medium size this is absolutely fine. For a large problem, if you are an R and C wizard, call the BLAS routine dtrmv, or even dtpmv, for the triangular matrix-vector computation.
Update
I just found this simple paper: Rao's quadratic entropy as a measure of functional diversity based on multiple traits, for the definition and use of Rao's QE. It mentions that we can replace Euclidean distance with Mahalanobis distance. In case we want to do this, use my code in Mahalanobis distance of each pair of observations for fast computation of the Mahalanobis distance matrix.
I'm trying to execute this equation in Scilab; however, I'm getting "error 59 of function %s_pow called ..." even though I define x.
n=0:1:3;
x=[0:0.1:2];
z = factorial(3); w = factorial(n);u = factorial(3-n);
y = z /(w.*u);
t = y.*x^n*(1-x)^(3-n)
(at this point I haven't added in the plot command, although I would assume it's plot(t)?)
Thanks for any input.
The powers x^n and (1-x)^(3-n) on the last line both cause the problem, because x and n are vectors and they are not the same size.
As mentioned in the documentation, the power operation can only be performed between:
(A:square)^(b:scalar): if A is a square matrix and b is a scalar, then A^b is the matrix A to the power b.
(A:matrix).^(b:scalar): if b is a scalar and A a matrix, then A.^b is the matrix formed by the elements of A to the power b (elementwise power). If A is a vector and b is a scalar, then A^b and A.^b perform the same operation (i.e. elementwise power).
(A:scalar).^(b:matrix): if A is a scalar and b is a matrix (or vector), A^b and A.^b are the matrices (or vectors) formed by a^(b(i,j)).
(A:matrix).^(b:matrix): if A and b are vectors (matrices) of the same size, A.^b is the A(i)^b(i) vector (A(i,j)^b(i,j) matrix).
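For illustration, here is the same calculation sketched in Julia, where broadcasting a row vector of exponents against a column of x values makes the elementwise powers conform (the 21x4 layout, one column per value of n, is an assumption about the intended result):

n = (0:3)' # 1x4 row vector of exponents
x = 0:0.1:2 # 21 values of x
y = factorial(3) ./ (factorial.(n) .* factorial.(3 .- n)) # binomial coefficients C(3, n)
t = y .* x .^ n .* (1 .- x) .^ (3 .- n) # 21x4 matrix: every combination of x and n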
I have two square matrices A and B. A is symmetric, B is symmetric positive definite. I would like to compute $trace(A.B^{-1})$. For now, I compute the Cholesky decomposition of B, solve for C in the equation $A=C.B$ and sum up the diagonal elements.
Is there a more efficient way of proceeding?
I plan on using Eigen. Could you provide an implementation if the matrices are sparse (A can often be diagonal, B is often band-diagonal)?
If B is sparse, it may be efficient (i.e., O(n), assuming good condition number of B) to solve for x_i in
B x_i = a_i
(sample Conjugate Gradient code is given on Wikipedia). Taking a_i to be the column vectors of A, you get the matrix B^{-1} A in O(n^2). Then you can sum the diagonal elements to get the trace. Generally, it's easier to do this sparse inverse multiplication than to get the full set of eigenvalues. For comparison, Cholesky decomposition is O(n^3). (see Darren Engwirda's comment below about Cholesky).
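A minimal sketch of that column-by-column approach, written in Julia for brevity, with a Cholesky solve standing in for conjugate gradients (swap in an iterative solver for a large sparse B):

using LinearAlgebra

# trace(A * B^{-1}) = trace(B^{-1} * A) = sum over i of (B^{-1} * a_i)[i],
# where a_i is the i-th column of A, so one linear solve per column suffices.
function trace_A_Binv(A, B)
    F = cholesky(Symmetric(B)) # B is symmetric positive definite
    s = 0.0
    for i in 1:size(A, 2)
        x = F \ A[:, i] # x = B^{-1} * a_i
        s += x[i]       # only the i-th entry contributes to the trace
    end
    return s
end

# check against the direct computation: trace_A_Binv(A, B) ≈ tr(A * inv(B))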
If you only need an approximation to the trace, you can actually reduce the cost to O(q n) by averaging
r^T (A B^{-1}) r
over q random vectors r. Usually q << n. This is an unbiased estimate provided that the components of the random vector r satisfy
< r_i r_j > = \delta_{ij}
where < ... > indicates an average over the distribution of r. For example, components r_i could be independent gaussian distributed with unit variance. Or they could be selected uniformly from +-1. Typically the trace scales like O(n) and the error in the trace estimate scales like O(sqrt(n/q)), so the relative error scales as O(sqrt(1/nq)).
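A rough sketch of that estimator in Julia, using +-1 (Rademacher) probe vectors; the number of probes q and the use of a Cholesky solve in place of an iterative solver are arbitrary choices here:

using LinearAlgebra

# Unbiased stochastic estimate of trace(A * B^{-1}): average r' * A * (B^{-1} * r)
# over q random vectors r with < r_i r_j > = delta_ij.
function trace_estimate(A, B, q)
    F = cholesky(Symmetric(B))
    est = 0.0
    for _ in 1:q
        r = rand([-1.0, 1.0], size(B, 1)) # entries are +1 or -1 with equal probability
        est += dot(r, A * (F \ r))
    end
    return est / q
end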
If generalized eigenvalues are more efficient to compute, you can compute the generalized eigenvalues, A*v = lambda*B*v, and then sum up all the lambdas.
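A quick Julia check of that identity, assuming (as in the question) that A is symmetric and B is symmetric positive definite:

using LinearAlgebra
X = randn(5, 5)
A = Symmetric(randn(5, 5)) # symmetric test matrix
B = Symmetric(X * X' + I)  # symmetric positive definite test matrix
sum(eigvals(A, B)) ≈ tr(A * inv(B)) # returns true: the generalized eigenvalues of A*v = lambda*B*v sum to trace(A * B^{-1})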
How does it actually reduce noise? Can you suggest some nice tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition of M is the product of three matrices, M = U*S*V^T, so w = U*S*V^T*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by the matrix M is to rotate/reflect v by M's orthonormal factor V^T, then scale/squash the result by the diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
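A small Julia sketch of that recipe (the synthetic low-rank-plus-noise matrix and the threshold are arbitrary choices):

using LinearAlgebra
M = randn(100, 5) * randn(5, 80) + 0.01 * randn(100, 80) # low-rank "signal" plus noise
F = svd(M)
S = [s > 1e-2 * maximum(F.S) ? s : 0.0 for s in F.S] # set near-zero singular values to exactly zero
M_denoised = F.U * Diagonal(S) * F.Vt # re-compose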
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an n x m matrix M and "decomposing" it into three matrices such that M = U*S*V^T. S is a diagonal square matrix (the only nonzero entries are on the diagonal from top-left to bottom-right) containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
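Concretely, by the Eckart–Young theorem: if $M_k$ keeps only the $k$ largest singular values, then

$\| M - M_k \|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$,

and no rank-$k$ matrix achieves a smaller Frobenius-norm error.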
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Say,
$X \in \mathbb{R}^{N \times p}$; then the SVD of X yields X = U D V^T, where D is diagonal and U and V are orthogonal matrices.
Now X^T X is a square matrix, and its decomposition is X^T X = V D^2 V^T, where the columns of V are the eigenvectors of X^T X and D^2 contains its eigenvalues.
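A quick numerical check of that correspondence in Julia (X is an arbitrary test matrix):

using LinearAlgebra
X = randn(10, 4)
F = svd(X)
# The squared singular values of X are the eigenvalues of X' * X (sorted decreasingly):
F.S .^ 2 ≈ sort(eigvals(Symmetric(X' * X)), rev = true) # returns true
# The columns of F.V are the corresponding eigenvectors of X' * X, up to sign flips.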
SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed in a formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * M^T, where D represents the possible states of a system and M represents its evolution with respect to some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y), and therefore D * M^T = U * S * V^T,
then D = U * S * V^T * (M^T)^+, where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M? (the ? indicates it is based on fitting).
M * (M?)^+ * V = V?, which allows residuals R * S^2 = V - V? to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S), with its attendant vectors U and V, does have a lower signal-to-noise ratio, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. In other other words, the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component is more distinct.