I am trying to generate a positive definite matrix (A'*A) of dimensions 8x8, where A is 1x8.
I tried it for many randomly generated matrices A but was never able to produce a positive definite result.
octave-3.6.1.exe:166> A= (rand(1,8)+rand(1,8)*1i);
octave-3.6.1.exe:167> chol(A'*A);
error: chol: input matrix must be positive definite
Can anyone please tell me what is going wrong here? Thanks in advance for the help.
It's not possible to do that, since no matrix of that form is positive definite.
Claim: Given a 1xn (real, n>1) matrix A, the symmetric matrix M = A'A is not positive definite:
Proof: By definition, M is positive definite iff x'Mx > 0 for all non-zero x. That is, iff x'A'Ax = (Ax)'Ax = (Ax)^2 = (A_1 x_1 + ... + A_n x_n)^2 > 0 for all non-zero x.
Since n > 1, the real scalars A_1, ..., A_n are linearly dependent, so there exist x_i, not all zero, such that A_1 x_1 + ... + A_n x_n = 0. We have found a non-zero vector x such that x'Mx = 0, so M is not positive definite.
A different proof, which applies directly to the complex case, is this: Let A be a 1xn (complex, n>1) matrix. Positive definiteness implies invertibility, so M = A'A must have full rank to be positive definite. It clearly has rank at most 1, so it is not invertible and thus not positive definite.
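If it helps to see the rank argument concretely, here is a small R illustration (R rather than Octave, and with a real-valued A for simplicity):
A <- matrix(runif(8), 1, 8)   # a real 1x8 A, analogous to the Octave example
M <- crossprod(A)             # t(A) %*% A, an 8x8 matrix
qr(M)$rank                    # 1, so M is singular and cannot be positive definite
# chol(M) fails here for the same reason the Octave chol() call above does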
Here is how I routinely create an SPD matrix:
1) Create a random symmetric matrix.
2) Make sure that each diagonal value is greater than the sum of the other entries in its row (or column).
Usually for (1) I use random numbers between 0 and 1. It's then easy to figure out a number to use for each diagonal entry; a sketch is below.
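For example, a rough R sketch of that recipe (the helper name make_spd is mine):
make_spd <- function(n) {
  A <- matrix(runif(n * n), n, n)   # step 1: random entries in (0, 1)
  S <- (A + t(A)) / 2               # symmetrize
  diag(S) <- rowSums(S) + 1         # step 2: each diagonal entry exceeds its row sum
  S
}
M <- make_spd(8)
all(eigen(M, symmetric = TRUE, only.values = TRUE)$values > 0)   # TRUE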
Cheers,
I am creating an optimization algorithm where I need to plug in an initial value of A, which is a lower triangular matrix with a positive diagonal. My first question is how to generate a random lower triangular matrix with a positive diagonal as an initial value matrix in R. And is it a good idea to choose a random matrix? If not, what are better ways to make an initial guess for this type of matrix?
I can imagine there are much better solutions based on your specific applications, but we can set the diagonal to half-Normal (i.e. |e| where e ~ N(0,1)) and set the lower-triangular off-diagonal elements to standard Normal values. ...
n <- 10
M <- diag(abs(rnorm(n)))                             # half-Normal values on the diagonal
M[lower.tri(M, diag = FALSE)] <- rnorm(n*(n-1)/2)    # standard Normal below the diagonal
I'm trying to calculate this regression formula, but I have a problem with the dimensions; they do not match:
Where:
X - a matrix with dimensions 200x20, n = 200 samples, p = 20 predictors,
y - a matrix with dimensions 200x1,
beta^(k) - a sequence of coefficient vectors, each of dimensions 20x1, for k = 1, 2, 3, ...,
X^T - dimensions 20x200,
j - an index from 1...p, so from 1...20.
The problem is when I calculate y - X*beta^(k-1).
For example, for k = 20 and k-1 = 19, I have beta^(19), and if that is a single coefficient the dimensions do not match for the subtraction: 200x1 - 200x20 x 1x1 = 200x1 - 200x20 will not work.
If I take the whole beta vector then it is correct. Does beta^(19) mean to take the 19th value of beta and multiply it with the matrix X?
Source of the formula:
You should be using the entire beta vector at each stage of the calculation.
(Tibshirani has been a bit permissive with his use of notation, perhaps...)
The k is just a counter for which step of the algorithm we are on. Right at the start (k = 0 or "step 0") we initialise the entire beta vector to have all elements equal to zero: beta^(0) = (0, 0, ..., 0)^T, a 20x1 vector.
At each step of the algorithm (steps k = 1, 2, 3... and so on) we use our previous estimate of the vector beta (beta^(k-1), calculated in step k - 1) to calculate a new improved estimate of the vector beta (beta^(k)). The superscript number is not an index into the vector; rather, it is a label telling us at which stage of the algorithm that beta vector was produced.
I hope this makes sense. The important point is that each of the beta^(k) values is a different 20x1 vector.
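If it helps, here is a small R sketch of the idea. The update rule below is a generic gradient-descent style step chosen just for illustration (it is not claimed to be the exact update in the source); the point is only that each beta^(k) is a whole 20x1 vector:
set.seed(1)
n <- 200; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)
beta <- rep(0, p)                    # beta^(0): the entire 20x1 vector starts at zero
step <- 1e-4
for (k in 1:50) {
  r <- y - X %*% beta                # 200x1 residual: uses the whole beta^(k-1)
  beta <- beta + step * t(X) %*% r   # beta^(k): again a full 20x1 vector
}
dim(beta)                            # 20 1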
I would like to generate a random correlation matrix in R of 1000*1000 where the average correlation (excluding diagonal) is 0.3.
I looked at genPositiveDefMat from library clusterGeneration but I couldn't figure out how to specify a given correlation.
A boring example of such a matrix would be
C = (1-m)*I + m*U*U'
where I is the identity matrix, U a vector of all ones and m = 0.3. C is positive definite and the average (indeed every) off-diagonal element is m.
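A quick R check of that boring construction:
m <- 0.3; d <- 1000
C <- (1 - m) * diag(d) + m * tcrossprod(rep(1, d))   # (1-m)*I + m*U*U' with U all ones
range(C[upper.tri(C)])                               # 0.3 0.3: every off-diagonal is m
R <- chol(C)                                         # succeeds, so C is positive definite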
So we could try generating a matrix of the form
C = D + alpha*U*U'
where D is diagonal and positive definite, alpha a positive scalar and U a 'random' vector. Such a matrix will be positive definite. For this to have the correct average of the off-diagonal elements, a little algebra shows
alpha = dim*(dim-1)*m / (S*S-T)
where
S = Sum{ i | U[i] }
T = Sum{ i | U[i]*U[i]}
As long as all the elements of U are positive, we will have
S*S>T
and so alpha will be positive.
For the diagonal elements of C to be 1.0, we require
D[i] = 1 - alpha*U[i]*U[i] (i=1..dim)
and all of these must be non-negative.
Alas I have been unable to find theoretically how the elements of U should be chosen to guarantee this. However experimentally, if the elements of U are uniform random numbers between 1.0 and 5.0, I've not seen a case where any of the D[i] are negative.
The upper limit for the elements of U, 5.0 above, controls how different the various correlations are. With 5.0 they vary between around 0.03 and 0.8, while with an upper limit of 2.0 they vary between around 0.13 and 0.53.
Choosing the upper limit too high will increase the likelihood of the method failing (D not positive).
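Here is one possible R sketch of the construction (the function name random_corr and the error check are mine):
random_corr <- function(dim, m = 0.3, lo = 1, hi = 5) {
  U <- runif(dim, lo, hi)                 # 'random' positive vector
  S <- sum(U)
  T2 <- sum(U * U)
  alpha <- dim * (dim - 1) * m / (S * S - T2)
  D <- 1 - alpha * U * U                  # diagonal of D; must be non-negative
  if (any(D < 0)) stop("D not positive; try a smaller upper limit for U")
  diag(D) + alpha * tcrossprod(U)         # C = D + alpha*U*U'
}
C <- random_corr(1000)
mean(C[upper.tri(C)])                     # approximately 0.3
range(diag(C))                            # 1 1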
I have a non-negative matrix p whose elements sum to 1 in each row.
How can I find a non-negative vector x whose elements sum to 1, such that:
(i-t(p))*x = 0?
(where i is the identity matrix and t(p) is the matrix transpose)
??eigenvector suggests 'base::eigen'.
?eigen also suggests 'svd'.
svd gives you what you want, but will not be scaled correctly.
"Sum equal 1" is just a scale. Typically eigenvectors are scaled as to have a length of 1 instead.
Edited:
You're looking for a particular vector whose elements sum to 1, not just any vector. You want a vector that is an eigenvector of t(p) with eigenvalue 1.
Once you have such a (nonzero) vector x, any multiple of x will also be an eigenvector. In particular, the vector x/sum(x) has elements which sum to 1.
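For example, a small R sketch with a toy 2x2 row-stochastic p (the rescaling at the end is the key step):
p <- matrix(c(0.9, 0.1,
              0.4, 0.6), 2, 2, byrow = TRUE)          # rows sum to 1
e <- eigen(t(p))
x <- Re(e$vectors[, which.min(abs(e$values - 1))])    # eigenvector for eigenvalue 1
x <- x / sum(x)                                       # rescale so sum(x) == 1
x                                                     # 0.8 0.2
max(abs((diag(nrow(p)) - t(p)) %*% x))                # ~0, so (i - t(p)) %*% x = 0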
How does it actually reduce noise? Can you suggest some nice tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition of M is the product of three matrices, M=U*S*V, so w=U*S*V*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
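A small R check of that rotate/scale/rotate picture. Note that R's svd() returns factors with M = u %*% diag(d) %*% t(v), so the right-hand orthonormal factor is applied as t(v) here:
set.seed(3)
M <- matrix(rnorm(9), 3, 3)
v <- rnorm(3)
s <- svd(M)
w_direct <- M %*% v
w_steps  <- s$u %*% (diag(s$d) %*% (t(s$v) %*% v))   # rotate/reflect, scale, rotate/reflect
all.equal(c(w_direct), c(w_steps))                   # TRUE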
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
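In R, that might look something like this (the 10% threshold is an arbitrary choice for illustration):
set.seed(42)
M <- matrix(rnorm(100), 10, 10)                    # stand-in for a noisy data matrix
s <- svd(M)                                        # M = u %*% diag(d) %*% t(v)
d_clean <- ifelse(s$d < 0.1 * max(s$d), 0, s$d)    # zero out the near-zero singular values
M_clean <- s$u %*% diag(d_clean) %*% t(s$v)        # re-compose the denoised matrix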
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV. S is a diagonal square (the only nonzero entries are on the diagonal from top-left to bottom-right) matrix containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
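An illustrative rank-k approximation in R (keep the k largest singular values and rebuild; k = 3 is arbitrary here):
set.seed(1)
M <- matrix(rnorm(200), 20, 10)
k <- 3
s <- svd(M)                                              # s$d is sorted, largest first
M_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])   # best rank-k approximation
sum((M - M_k)^2)                                         # the smallest possible squared error for rank k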
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Say,
$X \in \mathbb{R}^{N \times p}$, then the SVD of X yields X=UDV^T where D is diagonal and U and V are orthogonal matrices.
Now X^TX is a square matrix, and X^TX = VD^2V^T, where the columns of V are the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.
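A quick R check of that relationship:
set.seed(2)
X <- matrix(rnorm(50 * 5), 50, 5)
s <- svd(X)
e <- eigen(crossprod(X))              # crossprod(X) is t(X) %*% X
all.equal(s$d^2, e$values)            # squared singular values = eigenvalues of X^T X
all.equal(abs(s$v), abs(e$vectors))   # same eigenvectors, up to sign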
SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed in a formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * M^T, where D represents the possible states of a system and M represents its evolution wrt some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y), and therefore D * M^T = U * S * V^T,
then D = U * S * V^T * (M^T)^+, where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M~ (the ~ indicates it is based on fitting).
M * M~^+ * V = V~, which allows the residuals R * S^2 = V - V~ to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S), with its attendant vectors in U and V, does have a lower signal-to-noise ratio, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. Put another way, the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component is more distinct.