Maxima: Eigenvectors output - math

So I solve the eigenvectors for a matrix in Maxima.
a:matrix([10,10],[-4,-3]);
\\outputs matrix
vec:eigenvectors(a);
[[[5,2],[1,1]],[[[1,-1/2]],[[1,-4/5]]]]
I've hand calculated the eigenvalues, and vectors as (1x2) 5: [-2,1]. 2:[-5,4], which are correct. What is Maxima outputting?

Eigenvectors are only determined up to a multiplicative constant. That is, if x is an eigenvector, then so is a*x where a is a scalar. I think if you look at your result and Maxima's result, you'll see that they are equivalent in that sense.
There are different normalization schemes. Looks like Maxima makes the first element 1. Another common scheme is to make the norm of the eigenvector equal to 1. Or one can just leave them unnormalized.

Related

eigen() and the correct eigenvectors

My problem is the following:
I'm trying to use R in order to compute numerically this problem.
So I've correctly setup the problem in my console, and then I tried to compute the eigenvectors.
But I expect that the eigenvector associated with lambda = 1 is (1,2,1) instead of what I've got here. So, the scaling is correct (0.4082483 is effectively half of 0.8164966), but I would like to obtain a consistent result.
My original problem is to find a stationary distribution for a Markov Chain using R instead of doing it on paper. So from a probabilistic point of view, my stationary distribution is a vector whose sum of the components is equal to 1. For that reason I was trying to change the scale in order to obtain what I've defined "a consistent result".
How can I do that ?
The eigen vectors returned by R are normalized (for the square-norm). If V is a eigen vector then s * V is a eigen vector as well for any non-zero scalar s. If you want the stationary distribution as in your link, divide by the sum:
V / sum(V)
and you will get (1/4, 1/2, 1/4).
So:
ev <- eigen(t(C))$vectors
ev / colSums(ev)
to get all the solutions in one shot.
C <- matrix(c(0.5,0.25,0,0.5,0.5,0.5,0,0.25,0.5),
nrow=3)
ee <- eigen(t(C))$vectors
As suggested by #Stéphane Laurent in the comments, the scaling of eigenvectors is arbitrary; only the relative value is specified. The default in R is that the sum of squares of the eigenvectors (their norms) are equal to 1; colSums(ee^2) is a vector of 1s.
Following the link, we can see that you want each eigenvector to sum to 1.
ee2 <- sweep(ee,MARGIN=2,STATS=colSums(ee),FUN=`/`)
(i.e., divide each eigenvector by its sum).
(This is a good general solution, but in this case the sum of the second and third eigenvectors are both approximately zero [theoretically, they are exactly zero], so this only really makes sense for the first eigenvector.)

Mahalonobis distance in R, error: system is computationally singular

I'd like to calculate multivariate distance from a set of points to the centroid of those points. Mahalanobis distance seems to be suited for this. However, I get an error (see below).
Can anyone tell me why I am getting this error, and if there is a way to work around it?
If you download the coordinate data and the associated environmental data, you can run the following code.
require(maptools)
occ <- readShapeSpatial('occurrences.shp')
load('envDat.Rdata')
#standardize the data to scale the variables
dat <- as.matrix(scale(dat))
centroid <- dat[1547,] #let's assume this is the centroid in this case
#Calculate multivariate distance from all points to centroid
mahalanobis(dat,center=centroid,cov=cov(dat))
Error in solve.default(cov, ...) :
system is computationally singular: reciprocal condition number = 9.50116e-19
The Mahalanobis distance requires you to calculate the inverse of the covariance matrix. The function mahalanobis internally uses solve which is a numerical way to calculate the inverse. Unfortunately, if some of the numbers used in the inverse calculation are very small, it assumes that they are zero, leading to the assumption that it is a singular matrix. This is why it specifies that they are computationally singular, because the matrix might not be singular given a different tolerance.
The solution is to set the tolerance for when it assumes that they are zero. Fortunately, mahalanobis allows you to pass this parameter (tol) to solve:
mahalanobis(dat,center=centroid,cov=cov(dat),tol=1e-20)
# [1] 24.215494 28.394913 6.984101 28.004975 11.095357 14.401967 ...
mahalanobis uses the covariance matrix, cov, (more precisely the inverse of it) to transform the coordinate system, then compute Euclidian distance in the new coordinates. A standard reference is Duda & Hart "Pattern Classification and Scene Recognition"
Looks like your cov matrix is singular. Perhaps there are linearly-dependent columns in "dat" that are unnecessary? Setting the tolerance to zero won't help if
the covariance matrix is truly singular. The first thing to do, instead, is look for columns that might be a rescaling of some other column, or might be just a sum of 2 or more other columns and remove them. Such columns are redundant for the mahalanobis distance.
BTW, since mahalanobis distance is effectively a rescaling and rotation, calling the scaling function looks superfluous - any reason why you want that?

Normalizing a matrix with respect to a constraint

I am doing a project which requires me to normalize a sparse NxNmatrix. I read somewhere that we can normalize a matrix so that its eigen values lie between [-1,1] by multiplying it with a diagonal matrix D such that N = D^{-1/2}*A*D^{-1/2}.
But I am not sure what D is here. Also, is there a function in Matlab that can do this normalization for sparse matrices?
It's possible that I am misunderstanding your question, but as it reads it makes no sense to me.
A matrix is just a representation of a linear transformation. Given that a matrix A corresponds to a linear transformation T, any matrix of the form B^{-1} A B (called the conjugate of A by B) for an invertible matrix B corresponds to the same transformation, represented in a difference basis. In particular, the eigen values of a matrix correspond to the eigen values of the linear transformation, so conjugating by an invertible matrix cannot change the eigen values.
It's possible that you meant that you want to scale the eigen vectors so that each has unit length. This is a common thing to do since then the eigen values tell you how far a vector of unit length is magnified by the transformation.

efficient computation of Trace(AB^{-1}) given A and B

I have two square matrices A and B. A is symmetric, B is symmetric positive definite. I would like to compute $trace(A.B^{-1})$. For now, I compute the Cholesky decomposition of B, solve for C in the equation $A=C.B$ and sum up the diagonal elements.
Is there a more efficient way of proceeding?
I plan on using Eigen. Could you provide an implementation if the matrices are sparse (A can often be diagonal, B is often band-diagonal)?
If B is sparse, it may be efficient (i.e., O(n), assuming good condition number of B) to solve for x_i in
B x_i = a_i
(sample Conjugate Gradient code is given on Wikipedia). Taking a_i to be the column vectors of A, you get the matrix B^{-1} A in O(n^2). Then you can sum the diagonal elements to get the trace. Generally, it's easier to do this sparse inverse multiplication than to get the full set of eigenvalues. For comparison, Cholesky decomposition is O(n^3). (see Darren Engwirda's comment below about Cholesky).
If you only need an approximation to the trace, you can actually reduce the cost to O(q n) by averaging
r^T (A B^{-1}) r
over q random vectors r. Usually q << n. This is an unbiased estimate provided that the components of the random vector r satisfy
< r_i r_j > = \delta_{ij}
where < ... > indicates an average over the distribution of r. For example, components r_i could be independent gaussian distributed with unit variance. Or they could be selected uniformly from +-1. Typically the trace scales like O(n) and the error in the trace estimate scales like O(sqrt(n/q)), so the relative error scales as O(sqrt(1/nq)).
If generalized eigenvalues are more efficient to compute, you can compute the generalized eigenvalues, A*v = lambda* B *v and then sum up all the lambdas.

What is SVD(singular value decomposition)

How does it actually reduce noise..can you suggest some nice tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition M is the product of three matrices M=U*S*V, so w=U*S*V*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV. S is a diagonal square (the only nonzero entries are on the diagonal from top-left to bottom-right) matrix containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
To answer to the tittle question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Say,
$X \in N \times p$, then the SVD decomposition of X yields X=UDV^T where D is diagonal and U and V are orthogonal matrices.
Now X^TX is a square matrice, and the SVD decomposition of X^TX=VD^2V where V is equivalent to the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.
SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed in an formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * MT where D represents the possible states of a system and M represents its evolution wrt some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * VT(y) and therefore D * MT = U * S * VT
then D = U * S * VT * MT+ where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which are a linear combination the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M? (the ? indicates it is based on fitting).
M * M?+ * V = V? which allows residuals R * S2 = V - V? to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is import to note that although each successive singular value (element of the diagonal matrix S) with its attendant vectors U and V does have lower signal to noise, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. In other other words the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component are more distinct.

Resources