I'm trying to get the largest eigenvalue of a fully-connected right stochastic matrix in R and MATLAB.
From this link:
http://en.wikipedia.org/wiki/Stochastic_matrix
I understand that the largest eigenvalue will be 1. For example, we can see the eigenvalues are "1, 0" after running the following code in R:
> eigen(matrix(rep(0.5,4),ncol=2))
$values
[1] 1 0
$vectors
[,1] [,2]
[1,] 0.707107 -0.707107
[2,] 0.707107 0.707107
But recently, I found a very interesting result when I tried to get the largest eigenvalue of the following stochastic matrix:
> m = matrix(c(0.5, 0.995, 0.5, 0.005),ncol = 2 ,nrow=2);
> eigen(m)$value
[1] 1.000 -0.495
> eigen(m)$value[1] == 1
[1] FALSE
Notice that it shows "FALSE". That's weird! It should be equal to 1, right?
There must be some computation error. I also tried this matrix in MATLAB and got the same result. So far, I can only round it to 1. Any idea how to fix it?
Thank you,
Ken
In this particular case, I get TRUE on my machine, but generally, comparing floating point values for equality is a bad idea and will be unreliable because of rounding. For example, I get:
> (0.1+0.1+0.1)/3==0.1
[1] FALSE
Floating point operations almost always involve some rounding, and so you can't expect the result of two calculations which should yield algebraically equal quantities to produce the same floating point value.
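If you do need such a comparison, test against a tolerance rather than using ==. A minimal sketch, reusing the matrix m from the question (the 1e-8 tolerance is an arbitrary choice):
lambda <- eigen(m)$values[1]
abs(lambda - 1) < 1e-8        # TRUE: equal up to the chosen tolerance
isTRUE(all.equal(lambda, 1))  # TRUE: all.equal() applies a default numerical tolerance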
Related
I am trying to determine whether matrices are negative semi-definite or not. For this reason, I check whether all eigenvalues are smaller than or equal to zero. One example matrix is:
[,1] [,2] [,3] [,4]
[1,] -1.181830e-05 0.0001576663 -2.602332e-07 1.472770e-05
[2,] 1.576663e-04 -0.0116220027 3.249607e-04 -2.348050e-04
[3,] -2.602332e-07 0.0003249607 -2.616447e-05 3.492998e-05
[4,] 1.472770e-05 -0.0002348050 3.492998e-05 -9.103073e-05
The eigenvalues calculated by Stata are 1.045e-12, -0.00001559, -0.00009737, -0.01163805. However, the eigenvalues calculated by R are -1.207746e-20, -1.558760e-05, -9.737074e-05, -1.163806e-02. So the last three eigenvalues are very similar, but the first one, which is very close to zero, is not. With the eigenvalues obtained with Stata, the matrix is not semi-definite, but with the eigenvalues obtained with R it is semi-definite. Is there a way I can find out which calculation is more precise? Or might it even be possible to rescale the matrix in order to avoid such tiny eigenvalues?
Thank you very much in advance. Every hint will be highly appreciated.
You can't expect so much precision from a numerical algorithm using double precision floating point numbers.
You can expect no more than about 15 to 17 significant decimal digits, and relative precision loss around zero is not uncommon. That is, given numerical error, 1e-12 and -1e-20 are both essentially indistinguishable from 0.
For instance, for the smallest eigenvalue (using coefficients you give in your comment), I get:
R 3.4.1: 5.929231e-21,
MATLAB R2017a: 3.412972022812169e-19
Stata 15: 3.2998e-20 (matrix eigenvalues) or 4.464e-19 (matrix symeigen)
Intel Fortran with MKL (DSYEV function): 2.2608e-19
You may choose a threshold, say 1e-10, and force an eigenvalue to zero when its ratio to the largest eigenvalue is less than 1e-10.
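For illustration, a minimal R sketch of that thresholding (m stands for your matrix; the 1e-10 cutoff is an arbitrary choice, far above .Machine$double.eps, which is about 2.2e-16):
ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
tol <- 1e-10 * max(abs(ev))   # threshold relative to the largest-magnitude eigenvalue
ev[abs(ev) < tol] <- 0        # force near-zero eigenvalues to exactly zero
all(ev <= 0)                  # negative semi-definiteness check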
Anyway, your 1e-12 looks a bit large. You may have lost some precision when transferring data between Stata and R: a small relative error in the matrix can result in a large relative error for eigenvalues around zero.
With Stata and the data in your question (not in the comment), I get for instance 3.696e-12 for the smallest eigenvalue.
However, even with the same matrix, there may still be differences (there are, above), due to variations in:
the parser, if you enter your numbers as text
the algorithm used for eigenvalue computation
implementation details of the same algorithm (floating-point operators are not associative, for instance)
the compiler used to compile the computation routines, or compiler options
floating-point hardware
The traditional suggested reading for this kind of question:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
So it is a mathematical fact that if the determinant of a matrix is equal to zero, then the matrix must be singular (not invertible). Now, the problem I am running into is that when I calculate the determinant of my matrix it is equal to zero; however, when I calculate the inverse, it exists. I think it has to do with the way R calculates determinants that the two are not agreeing. Here is the code that I am trying (I won't print the results of solve because the matrix is 100 x 100).
> Rinv = solve(R)
>
> det(R)
[1] 0
>
> #Using a Cholesky Factorization
> L = chol(R)
> Q = t(L)
>
> det(L)*det(Q)
[1] 0
For large matrices the determinant can be too large or too small to represent in double precision (it overflows or underflows).
The determinant is the product of the eigenvalues: for instance, if your 100 x 100 matrix has all its eigenvalues equal to .0001, it is invertible, but its determinant is 1e-400, which is too small to represent and can only be stored as 0.
You can look at the logarithm of the determinant instead,
determinant(R, logarithm=TRUE)
or, directly, the eigenvalues
eigen(R, only.values=TRUE)
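To see the underflow concretely, a small sketch with a stand-in matrix (not your actual R): a 100 x 100 diagonal matrix whose eigenvalues are all 1e-4 is perfectly invertible, yet its determinant cannot be represented as a double:
M <- diag(1e-4, 100)                       # invertible, all eigenvalues 1e-4
det(M)                                     # 0: the true value 1e-400 underflows
determinant(M, logarithm = TRUE)$modulus   # about -921.03, i.e. 100 * log(1e-4)
range(eigen(M, only.values = TRUE)$values) # all 1e-4, clearly nonzero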
The signs of the eigenvectors in the eigen function change depending on the specification of the symmetric argument. Consider the following example:
set.seed(1234)
data <- matrix(rnorm(200),nrow=100)
cov.matrix <- cov(data)
vectors.1 <- eigen(cov.matrix,symmetric=TRUE)$vectors
vectors.2 <- eigen(cov.matrix,symmetric=FALSE)$vectors
#The second and third eigenvectors have opposite sign
all(vectors.1 == vectors.2)
FALSE
This also has implications for principal component analysis as the princomp function appears to calculate the eigenvectors for the covariance matrix using the eigen function with symmetric set to TRUE.
pca <- princomp(data)
#princomp uses vectors.1
pca$loadings
Loadings:
Comp.1 Comp.2
[1,] -0.366 -0.931
[2,] 0.931 -0.366
Comp.1 Comp.2
SS loadings 1.0 1.0
Proportion Var 0.5 0.5
Cumulative Var 0.5 1.0
vectors.1
[,1] [,2]
[1,] -0.3659208 -0.9306460
[2,] 0.9306460 -0.3659208
Can someone please explain the source or reasoning behind the discrepancy?
Eigenvectors remain eigenvectors after multiplication by a nonzero scalar (including -1).
The proof is simple:
If v is an eigenvector of matrix A with matching eigenvalue c, then by definition Av=cv.
Then, A(-v) = -(Av) = -(cv) = c(-v). So -v is also an eigenvector with the same eigenvalue.
The bottom line is that this does not matter and does not change anything.
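A quick numerical check in R, using cov.matrix from the question (the differences are at rounding-error level):
e <- eigen(cov.matrix, symmetric = TRUE)
v <- e$vectors[, 1]
lambda <- e$values[1]
max(abs(cov.matrix %*% v - lambda * v))         # ~0: v is an eigenvector
max(abs(cov.matrix %*% (-v) - lambda * (-v)))   # ~0: so is -v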
If you want a consistent sign for the eigenvector elements, then simply ensure $\mathbf{1}^T\mathbf{e} \ge 0$. In other words, sum all the elements in each eigenvector; if the sum is negative, flip the sign of every element of that eigenvector. This is the usual trick to get the sign of eigenvector elements, principal components, and loadings in PCA to come out the same as in most statistical software. A sketch of the idea follows below.
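A small R sketch of that convention (fix_signs is a hypothetical helper name; it flips every column whose element sum is negative), applied to the vectors from the question:
fix_signs <- function(V) {
  s <- colSums(V)                         # sum of the elements of each eigenvector
  sweep(V, 2, ifelse(s < 0, -1, 1), "*")  # flip the columns with a negative sum
}
max(abs(fix_signs(vectors.1) - fix_signs(vectors.2)))  # ~0 once the signs agree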
Linear algebra libraries like LAPACK contain multiple subroutines for carrying out operations like eigendecompositions. The particular subroutine used in any given case may depend on the type of matrix being decomposed, and the pieces of that decomposition needed by the user.
As you can see in this snippet from eigen's code, it dispatches different LAPACK subroutines depending on whether symmetric=TRUE or symmetric=FALSE (and also, on whether the matrix is real or complex).
if (symmetric) {
    z <- if (!complex.x)
        .Internal(La_rs(x, only.values))
    else .Internal(La_rs_cmplx(x, only.values))
    ord <- rev(seq_along(z$values))
}
else {
    z <- if (!complex.x)
        .Internal(La_rg(x, only.values))
    else .Internal(La_rg_cmplx(x, only.values))
    ord <- sort.list(Mod(z$values), decreasing = TRUE)
}
Based on pointers in ?eigen, La_rs() (used when symmetric=TRUE) appears to refer to dsyevr while La_rg() refers to dgeev.
To learn exactly why those two algorithms switch some of the signs of the eigenvectors of the matrix you've handed to eigen(), you'd have to dig into the FORTRAN code used to implement them. (Since, as others have noted, the sign is irrelevant, I'm guessing you won't want to dig quite that deep ;).
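As a quick check of which branch you are getting: in recent versions of R, when symmetric is not supplied, eigen() decides by testing the matrix for symmetry, so
isSymmetric(cov.matrix)   # TRUE here, so eigen(cov.matrix) takes the La_rs/dsyevr branch
tells you which pair of routines a plain eigen(cov.matrix) call will dispatch to.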
How do I normalize/scale matrices in R by column? For example, when I compute the eigenvectors of a matrix, R returns:
> eigen(matrix(c(2,-2,-2,5),2,2))$vectors
[,1] [,2]
[1,] -0.4472136 -0.8944272
[2,] 0.8944272 -0.4472136
# should be normalized to
[,1] [,2]
[1,] -1 -2
[2,] 2 -1
The function "scale" subtracts the means and divided by standard deviation by column which does not help in this case. How do I achieve this?
This produces the matrix you say you want:
> a <- eigen(matrix(c(2,-2,-2,5),2,2))$vectors
> a / min(abs(a))
[,1] [,2]
[1,] -1 -2
[2,] 2 -1
But I'm not sure I understand exactly what you want, so this may not do the right thing in general.
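If you want this per column in general (each column divided by its own smallest absolute entry), a sketch along the same lines:
sweep(a, 2, apply(abs(a), 2, min), "/")   # divide each column by its smallest absolute value
# gives (up to rounding) the matrix with columns (-1, 2) and (-2, -1)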
Wolfram Alpha gives the following result:
http://www.wolframalpha.com/input/?i=eigenvalues{{2,-2},{-2,5}}
Input: {{2, -2}, {-2, 5}}
Eigenvalues: 6 and 1
Eigenvectors: (-1, 2) and (2, 1) (up to sign and scale)
I'm not sure what you're talking about with means and standard deviations. A good iterative method like QR should get you the eigenvalues and eigenvectors you need. Check out Jacobi or Householder.
You normalize any vector by dividing every component by the square root of the sum of squares of its components. A unit vector will have magnitude equal to one.
In your case this is true: the vectors being presented by R have been normalized. If you normalize the two Wolfram eigenvectors, you'll see that both have a magnitude equal to the square root of 5. Divide each column vector by this value and you'll get the ones given to you by R. Both are correct.
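You can verify this in R by scaling integer eigenvectors of magnitude sqrt(5) down to unit length (the signs below are chosen to match R's output; the sign itself is arbitrary):
w <- cbind(c(-1, 2), c(-2, -1))        # integer eigenvectors, each of magnitude sqrt(5)
sweep(w, 2, sqrt(colSums(w^2)), "/")   # divide each column by its Euclidean norm
# matches eigen(matrix(c(2,-2,-2,5),2,2))$vectors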
I have a simple question:
given p points (non-collinear) in R^p, I find the hyperplane passing through these points (to help clarify, I type everything in R):
p<-2
x<-matrix(rnorm(p^2),p,p)
b<-solve(crossprod(cbind(1,x[,-2])))%*%crossprod(cbind(1,x[,-2]),x[,2])
then, given a (p+1)-th point not collinear with the first p points, I find the direction perpendicular to b:
x2<-matrix(rnorm(p),p,1)
b2<-solve(c(-b[-1],1)%*%t(c(-b[-1],1))+x2%*%t(x2))%*%x2
That is, b2 defines a hyperplane in R^p perpendicular to b and passing through x2.
Now, my questions are:
The formula comes from my interpretation of this Wikipedia entry ("solve(A)" is the R command for A^-1). Why doesn't this work for p>2? What am I doing wrong?
PS: I have seen this post (on Stack Overflow; edit: sorry, I cannot post more than one link) but somehow it doesn't help me.
Thanks in advance,
I have a problem implementing/understanding Victor Liu's solution when p>2:
shouldn't the dot product between the QR vectors of the swept matrix and the direction of the hyperplane be 0? (i.e. if the QR vectors are perpendicular to the hyperplane)
i.e., when p=2 this
c(-b[2:p],1)%*%c(a1)
gives 0. When p>2 it does not.
Here is my attempt to implement Victor Liu's solution.
a) given p linearly independent observations in R^p:
p<-2;x<-matrix(rnorm(p^2),p,p);x
[,1] [,2]
[1,] -0.4634923 -0.2978151
[2,] 1.0284040 -0.3165424
b) stack them in a matrix and subtract the first row:
a0<-sweep(x,2,x[1,],FUN="-");a0
[,1] [,2]
[1,] 0.000000 0.00000000
[2,] 1.491896 -0.01872726
c) perform a QR decomposition of the matrix a0. The vector in the nullspace is the direction I'm looking for:
qr(a0)
[,1] [,2]
[1,] -1.491896 0.01872726
[2,] 1.000000 0.00000000
Indeed, this direction is the same as the one given by applying the formula from the Wikipedia entry (using x2=(0.4965321, 0.6373157)):
[,1]
[1,] 2.04694853
[2,] -0.02569464
...with the advantage that it works in higher dimensions.
I have one last question: what is the meaning of the other p-1 QR vectors (i.e. (1,0) here) when p>2?
-thanks in advance,
A (p-1)-dimensional hyperplane is defined by a normal vector and a point that the plane passes through:
n.(x-x0) = 0
where n is the normal vector of length p, x0 is a point through which the hyperplane passes, . is a dot product, and the equation must be satisfied for any point x on the plane. We can also write this as
n.x = d
where d = n.x0 is just a number. This is a more compact representation of a hyperplane, which is parameterized by (n,d). To find your hyperplane, suppose your points are x1, ..., xp.
Form a matrix A with p-1 rows and p columns as follows. The rows of A are xi-x1, laid out as row vectors, for all i>1 (there are only p-1 of them). If your p points are not "collinear" as you say (they need to be affinely independent), then matrix A will have rank p-1, and a nullspace of dimension 1. The one vector in the nullspace is the normal vector of the hyperplane. Once you find it (call it n), then d = n.x1. In order to find the nullspace of a matrix, you can use a QR decomposition (see here for details).
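A minimal R sketch of this recipe (the hyperplane function name and layout are mine; the normal vector is read off the last column of the complete Q factor of t(A)):
hyperplane <- function(X) {
  # X: p x p matrix whose rows are the p affinely independent points x1, ..., xp
  p <- ncol(X)
  A <- sweep(X[-1, , drop = FALSE], 2, X[1, ], "-")  # rows xi - x1, for i > 1
  Q <- qr.Q(qr(t(A)), complete = TRUE)               # orthonormal basis of R^p
  n <- Q[, p]                                        # last column spans the nullspace of A
  list(normal = n, d = sum(n * X[1, ]))              # hyperplane: n . x = d
}
set.seed(1)
p <- 3
X <- matrix(rnorm(p^2), p, p)
h <- hyperplane(X)
X %*% h$normal - h$d   # ~0 for every point, up to rounding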