How is R able to find eigenvectors for the following matrix? The eigenvalues are 2, 2, so finding the eigenvectors seems to require solve(matrix(c(0,1,0,0),2,2)), which is a singular matrix with no solution.
> eigen(matrix(c(2,1,0,2),2,2))
$values
[1] 2 2
$vectors
[,1] [,2]
[1,] 0 4.440892e-16
[2,] 1 -1.000000e+00
> solve(matrix(c(0,1,0,0),2,2))
Error in solve.default(matrix(c(0, 1, 0, 0), 2, 2)) :
Lapack routine dgesv: system is exactly singular
Both routines essentially do the same thing: they find x such that (A - lambda*I)x = 0 without computing the inverse of A - lambda*I. Clearly (0, 1) is a solution, but I can't understand why solve did not come up with it, or how I would solve it manually.
You asked for an eigen decomposition, you got an eigen decomposition.
Had you asked for rcond(), the reciprocal condition number of the matrix, or for kappa(), the condition number, you would have gotten the appropriate response.
For your second example:
> mat <- matrix(c(0,1,0,0), 2, 2)
> kappa(mat)
[1] Inf
>
> rcond(mat)
[1] 0
>
For your first example, there is actually no problem:
> mat <- matrix(c(2,1,0,2), 2, 2)
> kappa(mat)
[1] 1.772727
>
> rcond(mat)
[1] 0.5714286
>
See e.g. this previous question on SO for more.
Maybe it's using one of the algorithms listed at http://en.wikipedia.org/wiki/List_of_numerical_analysis_topics#Eigenvalue_algorithms?
According to http://stat.ethz.ch/R-manual/R-devel/library/base/html/eigen.html, eigen seems to use the LAPACK routine at http://netlib.org/lapack/double/dgeev.f (if you have a square matrix which is not symmetric).
Note: you're right that A - lambda * I is singular if lambda is an eigenvalue, but in order to find eigenvectors, one does not need to invert A - lambda * I, nor solve an equation y = (A - lambda * I) * x (with y not being the null vector). It is sufficient to find non-zero vectors x which satisfy
(A - lambda * I) * x = 0
One strategy is to find a non-singular transformation matrix T such that (A - lambda * I) * T is an upper triangular matrix (i.e. all elements below the diagonal are zero). Because A-lambda*I is singular, T can be constructed such that the last element on the diagonal (or even more diagonal elements if the multiplicity of the eigenvalue is larger than one) is zero.
A vector z whose only non-zero element is the last one (i.e. z = (0, ..., 0, 1)) will then give the zero vector when multiplied by (A - lambda * I) * T. So one has:
0 = ((A - lambda * I) * T) * z
or in other words, T*z is an eigenvector of A.
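As a concrete illustration in R (a sketch; it uses the SVD of A - lambda * I to pick out a null-space vector rather than the triangular factor T described above, but like that strategy it never inverts the singular matrix):

A      <- matrix(c(2, 1, 0, 2), 2, 2)
lambda <- 2
M      <- A - lambda * diag(2)          # singular, so it has a non-trivial null space

s   <- svd(M)
tol <- 1e-10
x   <- s$v[, s$d < tol, drop = FALSE]   # right singular vectors for the zero singular values
x                                       # (0, 1) up to sign, matching eigen()'s answer
M %*% x                                 # ~ 0, i.e. A %*% x = lambda * x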
They are not doing the same thing.
Just take a look at the Wikipedia link, and look at the code of dgeev.f: you will see that a simple solve of (A - lambda*I)x = 0 is not performed.
I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html; my R implementation of the linear least squares example from page 94 is below.
I've tried implementing it in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum output by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# Generate random vector, random matrix, and set seed
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A & B, select X randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
  total_vector = A %*% x + b                 # this is the function that we want to minimize
  total = 0.5 * sum(abs(total_vector) ^ 2)   # L2 norm squared
  return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
  x = x - epsilon * value
  value = (t(A) %*% A) %*% x - t(A) %*% b
  L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
  print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
I am following the algorithm on page 94 of the pdf at https://www.deeplearningbook.org/contents/numerical.html, and I am trying to find the value of the vector x such that f(x) is minimized. However, as demonstrated by minimum and minimum_minus in my code, minimum is not the actual minimum, as it exceeds minimum_minus.
Any idea what the problem might be?
Original Problem
Finding the value of x that minimizes the quantity Ax - b is equivalent, for an invertible A, to finding the value of x such that Ax - b = 0, i.e. x = (A^-1)*b. This is because the L2 norm is the Euclidean norm, more commonly known as the distance formula; by definition a distance cannot be negative, so its minimum here is exactly zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of the repeated subtraction and rounding, one quickly runs into floating-point underflow, resulting in the massive oscillation shown below:
[Plot: value of the L2 norm as a function of step size]
[Output: above algorithm vs. the solve function in R]
Above we have the results of A %*% x followed by A %*% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
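A sketch of that comparison (reusing A, b, and the final x from the loop above; min_x is the direct solve() solution, matching the gradient used in the loop):

min_x <- solve(A, b)   # exact solution of A %*% x = b
A %*% x                # gradient-descent estimate: approximately equal to b
A %*% min_x            # equal to b up to rounding error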
The problem of underflow is well known to those familiar with numerical analysis, and is probably best left to the programmers of the lower-level libraries that are equipped to deal with it.
To summarize, the algorithm appears to work as implemented. Note, however, that not every function has a minimum (think of a straight line), and that gradient descent can in general only find a local, as opposed to a global, minimum.
I'm trying to solve the problem
d = 0.5 * ||X - Σ||_F + 0.01 * ||XX||_1,
where X is a symmetric positive definite matrix whose diagonal elements are all 1, and XX is the same as X except that its diagonal is 0. Σ is known; I want to minimize d over X.
My code is as follows:
using Convex
m = 5;
A = randn(m, m);
x = Semidefinite(5);
xx=x;
xx[diagind(xx)].=0;
obj=vecnorm(A-x,2)+sumabs(xx)*0.01;
pro= minimize(obj, [x >= 0]);
pro.constraints+=[x[diagind(x)].=1];
solve!(pro)
MethodError: no method matching diagind(::Convex.Variable)
I just want to solve the optimization problem while constraining the diagonal elements of the matrix, but it seems the diagind function does not work here. How can I solve this problem?
I think the following does what you want:
m = 5
Σ = randn(m, m)
X = Semidefinite(m)
XX = X - diagm(diag(X))
obj = 0.5 * vecnorm(X - Σ, 2) + 0.01 * sum(abs(XX))
constraints = [X >= 0, diag(X) == 1]
pro = minimize(obj, constraints)
solve!(pro)
Regarding the operations used:
diag extracts the diagonal of a matrix, as a vector
diagm constructs a diagonal matrix out of a vector
So, to make XX equal to X with a zero diagonal, we subtract diagm(diag(X)) from X. And to constrain X to have a unit diagonal, we compare its diagonal with 1, using ==.
It is a good idea to work with immutable values as far as possible, instead of trying to modify things in place; I don't know whether Convex even supports that.
I am trying to grasp the basic concept of invertible and non-invertible matrices.
I created a random non-singular square matrix
S <- matrix(rnorm(100, 0, 1), ncol = 10, nrow = 10)
I know that this matrix is positive definite (thus invertible) because when I decompose the matrix S into its eigenvalues, their product is positive.
eig_S <- eigen(S)
eig_S$values
[1] 3.0883683+0.000000i -2.0577317+1.558181i -2.0577317-1.558181i 1.6884120+1.353997i 1.6884120-1.353997i
[6] -2.1295086+0.000000i 0.1805059+1.942696i 0.1805059-1.942696i -0.8874465+0.000000i 0.8528495+0.000000i
solve(S)
According to this paper, we can compute the inverse of a non-singular matrix by its SVD too:

S = U D V'

(where U and V are eigenvectors and D eigenvalues; please do correct me if I am wrong). The inverse then is

S^(-1) = V D^(-1) U'
Indeed, I can run the formula in R:
s <- svd(S)
s$v%*%solve(diag(s$d))%*%t(s$u)
Which produces exactly the same result as solve(S).
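A quick numerical check of that claim (reusing s and S from above; the difference should be at rounding level):

max(abs(s$v %*% solve(diag(s$d)) %*% t(s$u) - solve(S)))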
My first question is:
1) Do s$d indeed represent the eigenvalues of S? Because s$d and eig_S$values are quite different.
Now for the second part.
If I create a singular matrix
I <- matrix(rnorm(100, 0, 1), ncol = 5, nrow = 20)
I <- I%*%t(I)
eig_I <- eigen(I)
eig_I$values
[1] 3.750029e+01 2.489995e+01 1.554184e+01 1.120580e+01 8.674039e+00 3.082593e-15 5.529794e-16 3.227684e-16
[9] 2.834454e-16 5.876634e-17 -1.139421e-18 -2.304783e-17 -6.636508e-17 -7.309336e-17 -1.744084e-16 -2.561197e-16
[17] -3.075499e-16 -4.150320e-16 -7.164553e-16 -3.727682e-15
The solve function will produce an error
solve(I)
system is computationally singular: reciprocal condition number =
1.61045e-19
So, again according to the same paper we can use the SVD
i <- svd(I)
solve(i$u %*% diag(i$d) %*% t(i$v))
which produces the same error.
Then I tried to use the Cholesky decomposition for matrix inversion
Conj(t(I))%*%solve(I%*%Conj(t(I)))
and again I get the same error.
Could someone please explain where I am using the equations wrong?
I know that for the matrix I %*% Conj(t(I)), the determinant of the eigenvalue matrix is positive, but the matrix is not of full rank due to the initial multiplication that I did.
j <- eigen(I%*%Conj(t(I)))
det(diag(j$values))
[1] 3.17708e-196
qr(I %*% Conj(t(I)))$rank
[1] 5
UPDATE 1: Following the comments below, and after going through the paper/Wikipedia page again, I used the two pieces of code below. They produce some results, but I am not sure about their validity; the first one seems more believable. The SVD solution
i$v%*%diag(1/i$d)%*%t(i$u)
and the Cholesky
Conj(t(I))%*%(I%*%Conj(t(I)))^(-1)
I am not sure if I interpreted the two sources correctly though.
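For what it's worth, the usual way to make the SVD route work for a singular matrix is the Moore-Penrose pseudoinverse, which inverts only the singular values above a tolerance. A sketch (MASS::ginv is used only for comparison and is not part of the original question):

i   <- svd(I)
tol <- max(dim(I)) * max(i$d) * .Machine$double.eps
d_inv  <- ifelse(i$d > tol, 1 / i$d, 0)      # invert only the "non-zero" singular values
I_pinv <- i$v %*% diag(d_inv) %*% t(i$u)     # Moore-Penrose pseudoinverse of I

library(MASS)
max(abs(I_pinv - ginv(I)))                   # ~ 0: agrees with MASS::ginv()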
y <- matrix(c(7, 9, -5, 0, 2, 6), ncol = 1)
try <- t(y)
tryy <- try %*% y
i <- solve(tryy)
h <- y %*% i %*% try
uniroot(as.vector(solve(((1-x) * diag(6)) + h)), c(-Inf, Inf))
Error in (1 - x) * diag(6) : non-conformable arrays
The purpose of the command uniroot(as.vector(solve(((1-x) * diag(6)) + h)), c(-Inf, Inf)) is to solve the characteristic equation det[(1-λ)I + h] = 0,
where λ = eigenvalues, I = identity matrix, and h = hat matrix = y(y'y)^(-1)y'.
Here λ is unknown; we have to solve for it.
I don't understand what the problem is here. I have tried:
as.vector(solve(6*diag(6)+h))
This is not non-conformable. But why is it not working inside the uniroot function?
Your question is a bit confusing, so I have to make a couple of assumptions. If you want the eigenvalues of h, then the characteristic equation is:
det(h - I*λ) = 0
not
det[(1-λ)I+h] = 0
So I used the former.
Given the above, the short answer is: do it this way.
f <- function(lambda) det(h -lambda*diag(6))
F <- Vectorize(f)
library(rootSolve)
uniroot.all(F,c(-1000,1000),n=2000)
# [1] 0 1
# or, much more simply
eigen(h)$values
# [1] 1.000000e+00 2.220446e-16 0.000000e+00 -2.731318e-18 -6.876381e-18 -7.365903e-17
So h has 2 eigenvalues, 0 and 1. Note that the built-in function eigen(...) finds 6 roots, but 5 of them are within the machine tolerance of 0.
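One way to see that at a glance (zapsmall rounds away values that are numerically zero):

zapsmall(eigen(h)$values)
# [1] 1 0 0 0 0 0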
The question about why your code fails is a bit more involved.
First, your code:
tryy <- try %*% y
is the dot product of y with itself (so, a scalar), returned as a matrix with one element. When you "invert" that using solve(...)
i <- solve(tryy)
you simply take the reciprocal, so i is also a matrix with 1 element. I'm not sure if this is what you had in mind.
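You can see this directly with your y (a quick illustration):

y    <- matrix(c(7, 9, -5, 0, 2, 6), ncol = 1)
tryy <- t(y) %*% y
tryy          # 1 x 1 matrix: 195
solve(tryy)   # 1 x 1 matrix: 1/195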
Second, uniroot(...) does not work this way. The first argument must be a function; you've passed an expression which depends on x, which in turn is undefined. You could try:
f <- function(x) det(h-x*diag(6))
uniroot(f,c(-Inf,Inf))
but this wouldn't work either because (a) uniroot(...) works on a finite interval, (b) it requires that the function f(...) have different sign at the ends of the interval, and (c) in any event it would return only one root (the smaller one).
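To make point (b) concrete (a small sketch reusing h and f from above; the commented-out call is the one that would fail):

f(-1000)                      # huge and positive
f(1000)                       # also positive: a degree-6 polynomial has the same sign far out on both sides
# uniroot(f, c(-1000, 1000))  # would fail: f() values at end points not of opposite sign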
So you could use uniroot.all(...) in package rootSolve. uniroot.all(...) also requires a function as its first argument, but there's a twist: the function must be "vectorized". This means that if you pass a vector of lambda values, f(...) should return a vector of the same length. Fortunately in R there is an easy way to "vectorize" a given function, as in:
F <- Vectorize(f).
Even this has its limits. uniroot.all(...) also requires a finite interval, so we have to guess what that is, and it evaluates F on n sub-intervals. So if your interval does not contain all the roots, or if the sub-intervals are not small enough, you will not find all the roots.
Using the built-in eigen(...) function is definitely the best option.
I have a matrix and I would like to know if it is diagonalizable. How do I do this in the R programming language?
If you have a given matrix, m, then one way is to take the eigenvector matrix times the diagonal matrix of eigenvalues times the inverse of the eigenvector matrix. That should give us back the original matrix. In R that looks like:
m <- matrix( c(1:16), nrow = 4)
p <- eigen(m)$vectors
d <- diag(eigen(m)$values)
p %*% d %*% solve(p)
m
so in that example p %*% d %*% solve(p) should be the same as m
You can implement the full algorithm to check whether the matrix reduces to a Jordan form or a diagonal one (see e.g. this document). Or you can take the quick and dirty way: for an n-dimensional square matrix, use eigen(M)$values and check that they are n distinct values. For random matrices this always suffices: degeneracy has probability 0.
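A possible version of that quick check (a hypothetical helper, with a tolerance via rounding since the eigenvalues are floating point; note that distinct eigenvalues are sufficient but not necessary for diagonalizability):

has_distinct_eigs <- function(M, digits = 8) {
  ev <- eigen(M)$values
  length(unique(round(ev, digits))) == length(ev)
}
has_distinct_eigs(matrix(c(1, 1, 0, 1), nrow = 2))   # FALSE: eigenvalue 1 is repeated
has_distinct_eigs(matrix(c(-1, 1, 0, 1), nrow = 2))  # TRUE: eigenvalues -1 and 1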
P.S.: based on a simple observation by JD Long below, I recalled that a necessary and sufficient condition for diagonalizability is that the eigenvectors span the original space. To check this, just verify that the eigenvector matrix has full rank (no zero eigenvalue). So here is the code:
diagflag = function(m, tol = 1e-10){
  x = eigen(m)$vectors
  y = min(abs(eigen(x)$values))
  return(y > tol)
}
# nondiagonalizable matrix
m1 = matrix(c(1,1,0,1),nrow=2)
# diagonalizable matrix
m2 = matrix(c(-1,1,0,1),nrow=2)
> m1
[,1] [,2]
[1,] 1 0
[2,] 1 1
> diagflag(m1)
[1] FALSE
> m2
[,1] [,2]
[1,] -1 0
[2,] 1 1
> diagflag(m2)
[1] TRUE
You might want to check out this page for some basic discussion and code. You'll need to search for "diagonalized" which is where the relevant portion begins.
All matrices that are symmetric across the diagonal are diagonalizable by orthogonal matrices. In fact, if you want diagonalizability only by orthogonal matrix conjugation, i.e. D = PAP' where P' just stands for the transpose of P, then symmetry across the diagonal, i.e. A_{ij} = A_{ji}, is exactly equivalent to diagonalizability.
If the matrix is not symmetric, then diagonalizability means not D = PAP' but merely D = PAP^(-1), and we do not necessarily have P' = P^(-1), which is the condition of orthogonality.
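A small numerical illustration of the symmetric case in R (a sketch with a random symmetric A; with eigen()'s convention A = P %*% D %*% t(P), i.e. D = t(P) %*% A %*% P):

set.seed(1)
A <- matrix(rnorm(9), 3, 3)
A <- (A + t(A)) / 2                 # symmetrize: A[i,j] = A[j,i]
e <- eigen(A)
P <- e$vectors                      # orthogonal for a symmetric matrix
D <- diag(e$values)
max(abs(P %*% D %*% t(P) - A))      # ~ 0: reconstruction works
max(abs(t(P) %*% P - diag(3)))      # ~ 0: P is orthogonal, so t(P) = solve(P)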
Otherwise you need to do something more substantial, and there is probably a better way, but you could just compute the eigenvectors and check that their rank equals the total dimension.
See this discussion for a more detailed explanation.