I am solving a simple optimization problem. The data set has 26 columns and over 3000 rows.
The source code looks like this:
Means <- colMeans(Returns)
Sigma <- cov(Returns)
invSigma1 <- solve(Sigma)
Everything works perfectly. But when I do the same for a shorter period (only 261 rows), the solve function throws the following error:
solve(Sigma)
Error in solve.default(Sigma) :
Lapack routine dgesv: system is exactly singular
It's weird, because when I do the same with some random numbers:
Returns <- matrix(runif(6786, -1, 1), nrow = 261)
Means <- colMeans(Returns)
Sigma <- cov(Returns)
invSigma <- solve(Sigma)
no error occurs at all. Could someone explain to me where the problem could be and how to fix it?
Thank you very much,
Alex
Using solve with a single parameter is a request to invert a matrix. The error message is telling you that your matrix is singular and cannot be inverted.
My guess is that in the second case your code somewhere produces a singular (i.e. non-invertible) matrix, which the solve function then needs to invert. This has nothing to do with the size of the data but with the fact that some of your vectors are (probably) collinear.
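For illustration, here is a minimal sketch (with made-up data) of how a single perfectly collinear column makes cov() produce a singular matrix:
Returns <- matrix(runif(261 * 25, -1, 1), nrow = 261)
Returns <- cbind(Returns, 2 * Returns[, 1])  # 26th column is an exact multiple of the 1st
Sigma <- cov(Returns)
solve(Sigma)  # fails with a singular-matrix error, as in the question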
LAPACK is the linear algebra package that R (like almost everything else) uses underneath solve(); its dgesv routine raises this kind of error when the matrix you pass in is singular.
As an addendum: dgesv performs an LU decomposition, which for your matrix forces a division by 0; since that is ill-defined, it throws this error. This happens only when the matrix is singular, or when it is singular on your machine (due to floating-point approximation, a really small number can be treated as 0).
If the matrix you're using contains mostly integers and is not big, I'd suggest you check its determinant. If it's big, then take a look at this link.
I can understand your question. The problem is that your matrix is singular. You can see that the first number and the last number of your matrix are the same.
I was writing this code to fit a skew-normal distribution to my data. Since I was getting a crazy value for alpha (alpha = 183), I wanted to choose a starting point and see if things go better. This is my code:
my.mle=selm(Y2~1,start = list(xi=1, omega=1, alpha=0))
but I get this error
Error in abs(alpha) : non-numeric argument to mathematical function
what's wrong?
If you choose alpha = 0, your Fisher information matrix becomes singular, and so your maximization doesn't work.
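A sketch of the suggested fix, reusing the call format from the question (whether selm accepts this start format depends on your version of the sn package; treat it as an assumption): start alpha slightly away from zero, e.g.
library(sn)
# assumption: same data and start format as in the question, with alpha moved off 0
my.mle <- selm(Y2 ~ 1, start = list(xi = 1, omega = 1, alpha = 0.1))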
This is probably straightforward and elementary, but I can't manage to get it. I have 2 N x 1 vectors u and w, which contain both negative and positive values. I am trying to compute w'u u'w, which should be a quadratic form. I should be able to write this as
t(w) %*% u %*% t(u) %*% w
However, sometimes I get a negative value, depending on the values in the two vectors. This should not be possible, since that expression is a quadratic form. I tried with
crossprod(w, u)%*%crossprod(u, w)
and
crossprod(w, u)*crossprod(u, w)
which give positive and equal results. However, since I am dealing with N x 1 vectors, I should also be able to write it as
sum(w * u)^2
which gives a positive value but different from the ones above.
So I guess I am doing something wrong somewhere. The question is: how can I express w'u u'w in a way that is valid for both vectors and matrices?
EDIT: here is a csv file with the original vectors to reproduce exactly the same issue.
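For reference, with plain numeric vectors all three forms should agree exactly (up to rounding), as this quick illustrative check with made-up data shows:
set.seed(1)
u <- runif(10, -1, 1)
w <- runif(10, -1, 1)
t(w) %*% u %*% t(u) %*% w            # 1x1 matrix containing (w'u)^2
crossprod(w, u) %*% crossprod(u, w)  # same value
sum(w * u)^2                         # same value again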
I have a difficult R computation to do, and I have a choice of 2 computers, called V and L, to run the code. V is supposed to be faster than L, but I did not experience this, so I decided to test it.
As a simple test, I decided to ask each of them to invert a 3000 x 3000 matrix 500 times and record the time.
set.seed(123)
I <- 500                               # number of repetitions
n <- 3000                              # matrix dimension
time <- matrix(NA, ncol = 3, nrow = I)
for (i in 1:I) {
  t0 <- proc.time()
  x <- solve(matrix(runif(n^2), n))    # invert a random n x n matrix
  t1 <- proc.time()
  time[i, ] <- (t1 - t0)[1:3]          # user, system, elapsed
}
The problem is that during one particular iteration it got stuck. I don't know why, but I suspect the matrix generated was near-singular. So I would like to improve the code. I can think of 3 ways:
make sure the matrix generated is easily invertible. But how do I enforce this? Of course, any solution needs to be computationally inexpensive, otherwise the exercise becomes meaningless.
ask R to skip an iteration if solve takes too long? But again, how do I do that? (see the sketch after this list)
assign them a different computation task instead; any recommendations?
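One way to implement the second option, sketched with base R's setTimeLimit (the 60-second cutoff is an arbitrary assumption):
solve_with_limit <- function(M, seconds = 60) {
  setTimeLimit(elapsed = seconds, transient = TRUE)       # abort if this call runs too long
  on.exit(setTimeLimit(elapsed = Inf, transient = TRUE))  # clear the limit afterwards
  tryCatch(solve(M), error = function(e) NULL)            # NULL marks a skipped iteration
}
for (i in 1:I) {
  t0 <- proc.time()
  x <- solve_with_limit(matrix(runif(n^2), n))
  time[i, ] <- (proc.time() - t0)[1:3]
}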
A random matrix is invertible with probability 1, meaning that, in practice, the probability of generating a singular (i.e. non-invertible) matrix is infinitesimally small.
Moreover, from the point of view of the algorithm that R uses to invert matrices, there is no such thing as an "easily invertible" matrix. Either the algorithm succeeds, or it determines that a matrix is singular and fails; there is no scenario under which it tries "really hard" and takes a long time to invert a matrix. It's a deterministic algorithm that either runs into a 0 (or a value smaller than some given epsilon), in which case it fails, or it doesn't.
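If you suspect a near-singular draw, you can inspect how close a given matrix is to that epsilon threshold yourself via the reciprocal condition number (a quick illustrative check; rcond is base R):
set.seed(123)
A <- matrix(runif(3000^2), 3000)
rcond(A)   # reciprocal condition number; values near 0 indicate a nearly singular matrix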
On which iteration do you get stuck? Are you sure you are getting stuck on the inversion of the matrix, and it's not something like garbage collection that is taking a long time?
I can't reproduce the problem you describe. Starting with random seed 123, I can invert 500 random 3000x3000 matrices in a row, using your code, without any significant timing discrepancies. Can you find a random seed that generates a "hard to invert matrix" directly?
Is there a way to calculate the determinant of a complex matrix?
F4 <- matrix(c(1, 1, 1, 1, 1, 1i, -1, -1i, 1, -1, 1, -1, 1, -1i, -1, 1i), nrow = 4)
det(F4)
Error in determinant.matrix(x, logarithm = TRUE, ...) :
determinant not currently defined for complex matrices
library(Matrix)
determinant(Matrix(F4))
Error in Matrix(F4) :
complex matrices not yet implemented in Matrix package
Error in determinant(Matrix(F4)) :
error in evaluating the argument 'x' in selecting a method for function 'determinant'
If you use prod(eigen(F4)$values), I'd recommend
prod(eigen(F4, only.values = TRUE)$values)
instead.
Note that qr() is advocated only if you are interested in the absolute value, or rather Mod():
prod(abs(Re(diag(qr(x)$qr))))
gives Mod(determinant(x)).
{In X = QR, |det(Q)| = 1 and the diagonal of R is real (in R, at least).}
BTW: did you note the caveat "Often, computing the determinant is not what you should be doing to solve a given problem." on the help(determinant) page?
If you know that the characteristic polynomial of a matrix A splits into linear factors, then det(A) is the product of the eigenvalues of A, and you can use eigenvalue functions like this to work around your problem. I suspect you'll still want something better, but this might be a start.
Disclaimer
This is not strictly a programming question, but most programmers sooner or later have to deal with math (especially algebra), so I think the answer could turn out to be useful to someone else in the future.
Now the problem
I'm trying to check if m vectors of dimension n are linearly independent. If m == n, you can just build a matrix from the vectors and check whether its determinant is != 0. But what if m < n?
Any hints?
See also this video lecture.
Construct a matrix of the vectors (one row per vector), and perform a Gaussian elimination on this matrix. If any of the matrix rows cancels out, they are not linearly independent.
The trivial case is when m > n; in this case, they cannot be linearly independent.
Construct a matrix M whose rows are the vectors and determine the rank of M. If the rank of M is less than m (the number of vectors), then there is a linear dependence. In the algorithm for determining the rank of M you can stop as soon as you obtain a row of zeros, but running the algorithm to completion has the added bonus of yielding the dimension of the span of the vectors. Oh, and the algorithm to determine the rank of M is merely Gaussian elimination.
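For instance, in R (an assumption for this thread, though it is the language used in the other questions here), the rank check is a one-liner via the QR decomposition:
M <- rbind(c(1, 2, 3, 4),
           c(2, 4, 6, 8))   # second row is twice the first
qr(M)$rank == nrow(M)       # FALSE: the rows are linearly dependent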
Watch out for numerical instability. See the warning at the beginning of chapter two in Numerical Recipes.
If m < n, you will have to perform some operation on them (there are multiple possibilities: Gaussian elimination, orthogonalization, etc.; almost any transformation that can be used for solving equations will do) and check the result (e.g. Gaussian elimination => a zero row or column, orthogonalization => a zero vector, SVD => a zero singular value).
However, note that this is a bad question for a programmer to ask, and a bad problem for a program to solve, because every linearly dependent set of m < n vectors has a set of linearly independent vectors arbitrarily close by (i.e. the problem is numerically unstable).
I have been working on this problem recently.
Previously, I found some algorithms for Gaussian or Gauss-Jordan elimination, but most of them only apply to square matrices, not general matrices.
For a general matrix, one of the best answers might be this:
http://rosettacode.org/wiki/Reduced_row_echelon_form#MATLAB
You can find both pseudo-code and source code in various languages.
As for me, I translated the Python source code to C++, because the C++ code provided at the above link is somewhat complex and awkward to use in my simulation.
Hope this will help you, and good luck ^^
If computing power is not a problem, probably the best way is to find the singular values of the matrix. Basically you need to find the eigenvalues of M'*M and look at the ratio of the largest to the smallest. If the ratio is not very big, the vectors are independent.
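A sketch of that check in R (the 1e-10 cutoff is an arbitrary assumption):
M <- rbind(c(1, 0, 1),
           c(0, 1, 1))       # two vectors in R^3
s <- svd(M)$d                # singular values, in decreasing order
s[length(s)] / s[1] > 1e-10  # TRUE: the ratio is far from zero, so independent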
Another way to check that m row vectors are linearly independent, when put in a matrix M of size m x n, is to compute
det(M * M^T)
i.e. the determinant of an m x m square matrix. It will be zero if and only if M has some dependent rows. However, Gaussian elimination should in general be faster.
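The same Gram-determinant check, sketched in R:
M <- rbind(c(1, 2, 3, 4),
           c(2, 4, 6, 8))   # dependent rows
det(M %*% t(M))             # 0 (up to rounding), so the rows are dependent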
Sorry man, my mistake...
The source code provided at the above link turns out to be incorrect; at least, the Python code I tested and the C++ code I transformed do not generate the right answer all the time (although for the example at the above link, the result is correct :) -- ).
To test the Python code, simply replace mtx with
[30,10,20,0],[60,20,40,0]
and the returned result will be:
[1,0,0,0],[0,1,2,0]
Nevertheless, I have found a way out of this: this time I translated the MATLAB source code of the rref function to C++. You can run MATLAB and use the command type rref to get the source code of rref.
Just note that if you are working with really large or really small values, make sure to use the long double data type in C++. Otherwise, the result will be truncated and inconsistent with the MATLAB result.
I have been conducting large simulations in ns2, and all the observed results are sound.
Hope this helps you and anyone else who has encountered the problem...
A very simple way, though not the most computationally efficient one, is to simply remove random rows until m = n and then apply the determinant trick.
m < n: remove rows (make the vectors shorter) until the matrix is square, and then
m = n: check if the determinant is 0 (as you said)
m > n (the number of vectors is greater than their length): they are linearly dependent (always).
The reason, in short, is that checking independence means solving Av = 0 (with the vectors as the columns of A); that is a system of n equations in m unknowns, and when m > n there are more unknowns than equations, so a nontrivial solution always exists. For a better explanation, see Wikipedia, which explains it better than I can.