I have a non-negative matrix p whose elements sum to 1 in each row.
How can I find a non-negative vector x whose elements sum to 1, such that:
(i-t(p))*x = 0?
(where i is the identity matrix and t(p) is the matrix transpose)
??eigenvector suggests 'base::eigen'.
?eigen also suggests 'svd'.
svd gives you what you want, but the result will not be scaled correctly.
"Sum equal 1" is just a scale. Typically eigenvectors are scaled to have a length of 1 instead.
Edited:
You're looking for a particular vector with sum of 1. Not just any vector. You want a vector that is an eigenvector with the eigenvalue of 1.
Once you have such a (nonzero) vector x, any multiple of x will also be an eigenvector. In particular, the vector x/sum(x) has elements which sum to 1.
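A minimal sketch of that recipe in R, assuming a small made-up row-stochastic matrix p (the numbers are purely illustrative). It picks the eigenvector of t(p) whose eigenvalue is closest to 1 and rescales it so the elements sum to 1:

```r
# Hypothetical 3x3 row-stochastic matrix (each row sums to 1).
p <- matrix(c(0.5, 0.3, 0.2,
              0.1, 0.8, 0.1,
              0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)

# (i - t(p)) %*% x = 0 means x is an eigenvector of t(p) with eigenvalue 1.
e <- eigen(t(p))

# Pick the eigenvalue closest to 1.
k <- which.min(abs(e$values - 1))

# eigen() may return complex storage and an arbitrary sign, so take the
# real part and rescale so that the elements sum to 1.
x <- Re(e$vectors[, k])
x <- x / sum(x)

x                                   # non-negative, sums to 1
max(abs((diag(3) - t(p)) %*% x))    # approximately 0
```

Dividing by sum(x) also takes care of the sign, since an eigenvector returned with all-negative entries becomes all-positive after the division.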
Related
I am currently in an online class in genomics, coming in as a wetlab physician, so my statistical knowledge is not the best. Right now we are working on PCA and SVD in R. I got a big matrix:
head(mat)
ALL_GSM330151.CEL ALL_GSM330153.CEL ALL_GSM330154.CEL ALL_GSM330157.CEL ALL_GSM330171.CEL ALL_GSM330174.CEL ALL_GSM330178.CEL ALL_GSM330182.CEL
ENSG00000224137 5.326553 3.512053 3.455480 3.472999 3.639132 3.391880 3.282522 3.682531
ENSG00000153253 6.436815 9.563955 7.186604 2.946697 6.949510 9.095092 3.795587 11.987291
ENSG00000096006 6.943404 8.840839 4.600026 4.735104 4.183136 3.049792 9.736803 3.338362
ENSG00000229807 3.322499 3.263655 3.406379 9.525888 3.595898 9.281170 8.946498 3.473750
ENSG00000138772 7.195113 8.741458 6.109578 5.631912 5.224844 3.260912 8.889246 3.052587
ENSG00000169575 7.853829 10.428492 10.512497 13.041571 10.836815 11.964498 10.786381 11.953912
Those are just the first few columns and rows; the full matrix has 60 columns and 1000 rows. Columns are cancer samples, rows are genes.
The task is to:
removing the eigenvectors and reconstructing the matrix using SVD, then we need to calculate the reconstruction error as the difference between the original and the reconstructed matrix. HINT: You have to use the svd() function and equalize the eigenvalue to $0$ for the component you want to remove.
I have been all over google, but can't find a way to solve this task, which might be because I don't really get the question itself.
So I performed SVD on my matrix mat:
d <- svd(mat)
This gives me 3 matrices (eigenassays, eigenvalues and eigenvectors), which I can access using d$u and so on.
How do I equalize the eigenvalue and ultimately calculate the error?
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/svd
The decomposition expresses your matrix mat as a product of 3 matrices:
mat = d$u %*% diag(d$d) %*% t(d$v)
So first confirm that you can do the matrix multiplications to get back mat.
Once you are able to do this, set the elements of d$d for the components you want to remove (for example, the last couple) to zero before doing the matrix multiplication.
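Those steps could look roughly like this. It is only a sketch: the matrix here is random stand-in data with the same shape as in the question, and drop_idx (which components to remove) is a hypothetical choice, not something the task specifies:

```r
set.seed(1)
# Stand-in for the real data: 1000 "genes" x 60 "samples" (values are made up).
mat <- matrix(rnorm(1000 * 60, mean = 6, sd = 2), nrow = 1000, ncol = 60)

d <- svd(mat)

# Step 1: confirm the three factors multiply back to the original matrix.
full <- d$u %*% diag(d$d) %*% t(d$v)
max(abs(mat - full))                 # close to machine precision

# Step 2: "remove" components by setting their singular values to zero.
drop_idx <- 59:60                    # hypothetical choice: the last two components
dd <- d$d
dd[drop_idx] <- 0
recon <- d$u %*% diag(dd) %*% t(d$v)

# Step 3: reconstruction error = original minus reconstruction.
err <- mat - recon
sqrt(sum(err^2))                     # one summary of it: the Frobenius norm
```

The Frobenius norm of the error equals the square root of the sum of the squared singular values that were zeroed, which is a handy sanity check.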
It helps to create a function that handles the singular values.
Here, for instance, is one that zeros out any singular value that is too small compared to the largest singular value:
zap <- function(d, digits = 3) ifelse(d < 10^(-digits) * max(abs(d)), 0, d)
Although mathematically all singular values are guaranteed non-negative, numerical issues with floating point algorithms can--and do--create negative singular values, so I have prophylactically wrapped the singular values in a call to abs.
Apply this function to the diagonal matrix in the SVD of a matrix X and reconstruct the matrix by multiplying the components:
X. <- with(svd(X), u %*% diag(zap(d)) %*% t(v))
There are many ways to assess the reconstruction error. One is the Frobenius norm of the difference,
sqrt(sum((X - X.)^2))
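To see the thresholding at work, here is a small illustration on made-up data: a matrix that is rank 2 up to tiny noise, so only two singular values should survive. The data and sizes are assumptions for the demo only:

```r
zap <- function(d, digits = 3) ifelse(d < 10^(-digits) * max(abs(d)), 0, d)

set.seed(2)
# Hypothetical example: a rank-2 10x6 matrix plus tiny noise.
X <- matrix(rnorm(10 * 2), 10, 2) %*% matrix(rnorm(2 * 6), 2, 6)
X <- X + 1e-6 * matrix(rnorm(10 * 6), 10, 6)

s <- svd(X)
kept <- sum(zap(s$d) != 0)           # number of singular values that survive
X. <- with(s, u %*% diag(zap(d)) %*% t(v))
frob <- sqrt(sum((X - X.)^2))        # Frobenius norm of the reconstruction error
```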
I'm trying to calculate this regression formula, but I have a problem with the dimensions; they do not match:
Where:
X - a matrix with dimensions 200x20 (n=200 samples, p=20 predictors),
y - a matrix with dimensions 200x1,
beta^(k) - a sequence of coefficient vectors, each with dimensions 20x1, and k=1,2,3...
X^T - dimensions 20x200,
j - any value from 1...p, so from 1...20.
The problem is when I calculate y - X*beta^(k-1).
For example, for k=20, k-1=19, I have beta^(19), and the dimensions do not match for the subtraction: 200x1 - 200x20 x 1x1 = 200x1 - 200x20 will not work.
If I take the whole beta vector then it is correct. Does beta^(k-1) mean to take the 19th value of beta and multiply it with the matrix X?
Source of the formula:
You should be using the entire beta vector at each stage of the calculation.
(Tibshirani has been a bit permissive with his use of notation, perhaps...)
The k is just a counter for which step of the algorithm we are on. Right at the start (k = 0 or "step 0") we initialise the entire beta vector to have all elements equal to zero: beta^(0) = (0, 0, ..., 0).
At each step of the algorithm (steps k = 1, 2, 3... and so on) we use our previous estimate of the vector beta (beta^(k-1), calculated in step k - 1) to calculate a new improved estimate for the vector beta (beta^(k)). The superscript number is not an index into the vector; rather, it is a label telling us at which stage of the algorithm that beta vector was produced.
I hope this makes sense. The important point is that each of the beta^(k) values is a different 20x1 vector.
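To make the dimension bookkeeping concrete, here is a sketch of an iteration of this general shape. Note that it is plain gradient descent on the least-squares loss, swapped in for illustration; it is not the formula from the question, and the step size and the random data are assumptions:

```r
set.seed(3)
n <- 200; p <- 20
X <- matrix(rnorm(n * p), n, p)          # 200 x 20
y <- matrix(rnorm(n), n, 1)              # 200 x 1

step <- 1 / max(svd(X)$d)^2              # hypothetical step size, small enough to be stable
K <- 50                                  # number of steps
beta <- matrix(0, p, 1)                  # beta^(0): all 20 elements are zero
rss <- numeric(K)

for (k in 1:K) {
  resid <- y - X %*% beta                # (200x1) - (200x20)(20x1) = 200x1
  beta <- beta + step * t(X) %*% resid   # (20x200)(200x1) = 20x1, this is beta^(k)
  rss[k] <- sum((y - X %*% beta)^2)      # residual sum of squares after step k
}

dim(beta)                                # 20 1
```

At every step the whole 20x1 vector from the previous step enters the product with X, so the subtraction is always 200x1 minus 200x1.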
I am trying to generate a positive definite matrix (A'*A) of dimensions 8x8.
where A is 1x8.
I tried it for many randomly generated matrices A but was not able to generate one.
octave-3.6.1.exe:166> A= (rand(1,8)+rand(1,8)*1i);
octave-3.6.1.exe:167> chol(A'*A);
error: chol: input matrix must be positive definite
Can anyone please tell me what is going wrong here. Thanks for the help in advance.
It's not possible to do that, since no matrix of that form is positive definite.
Claim: Given a 1xn (real, n>1) matrix A, the symmetric matrix M = A'A is not positive definite:
Proof: By definition, M is positive definite iff x'Mx > 0 for all nonzero x. That is, iff x'A'Ax = (Ax)'(Ax) = (Ax)^2 = (A_1 x_1 + ... + A_n x_n)^2 > 0 for all nonzero x.
Since n > 1, the real values A_i are linearly dependent, so there exist x_i, not all zero, such that A_1 x_1 + ... + A_n x_n = 0. We have found a nonzero vector x such that x'Mx = 0, so M is not positive definite.
A different proof, which can be applied directly to the complex case, is this: Let A be a 1xn (complex, n>1) matrix. Positive definiteness implies invertibility, so M = A'A (with ' denoting the conjugate transpose) must have full rank n to be positive definite. It clearly has rank at most 1, so it is not invertible and thus not positive definite.
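The rank argument is easy to check numerically. The sketch below uses R rather than Octave and a real A for simplicity (both are choices made for this illustration). It also shows that a matrix B with at least as many rows as columns does give a positive definite B'B, almost surely for random entries:

```r
set.seed(4)
A <- matrix(runif(8), nrow = 1)          # 1 x 8, real for simplicity
M <- t(A) %*% A                          # 8 x 8, rank 1

ev <- eigen(M, symmetric = TRUE)$values
n_pos <- sum(ev > 1e-10 * max(ev))       # number of numerically nonzero eigenvalues

# Contrast: with 10 rows and 8 columns, B'B is positive definite
# (almost surely for random entries), so chol() succeeds.
B <- matrix(runif(10 * 8), nrow = 10)
R <- chol(t(B) %*% B)
```

Only one eigenvalue of M is nonzero (it equals the sum of the squared entries of A); the other seven are zero up to rounding, which is why chol rejects it.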
Here is how I routinely create an SPD (symmetric positive definite) matrix:
1) Create a random symmetric matrix.
2) Make sure that each diagonal value is greater than the sum of the absolute values of the other entries in its row (by symmetry, the same then holds for its column).
Usually for (1) I use random numbers between 0 and 1. It's then easy to figure out a number to use for each diagonal entry.
Cheers,
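A sketch of that recipe in R (again not Octave; the margin of 1 added to each diagonal entry is an arbitrary choice for the illustration):

```r
set.seed(5)
n <- 8
S <- matrix(runif(n * n), n, n)
S <- (S + t(S)) / 2                      # 1) make it symmetric
diag(S) <- 0
diag(S) <- rowSums(abs(S)) + 1           # 2) diagonal exceeds the off-diagonal row sum

min_ev <- min(eigen(S, symmetric = TRUE)$values)   # at least 1 by Gershgorin's theorem
U <- chol(S)                             # succeeds because S is positive definite
```

A symmetric, strictly diagonally dominant matrix with a positive diagonal is always positive definite, so this construction does not rely on luck with the random numbers.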
I have a matrix built from scale, translate, and rotation transforms. I want to split this matrix into two matrices: one is the rotation+translation matrix, the other is the scale matrix.
I want to compute the correct normal vector transform, so I only need the orthogonal matrix to do the computation for the surface normal vector.
Any ideas?
I'm assuming this matrix you are talking about is a 4x4 matrix that is widely used by some, widely despised by others, with the fourth row being 0,0,0,1.
I'll call these two operations "scale" and "rotate+translate". Note well: These operations are not commutative. Scaling a 3-vector and then rotating/translating this scaled vector yields a different result than you would get by reversing the order of operations.
Case 1, operation is "rotate+translate", then "scale".
Let SR=S*R, where S is a 3x3 diagonal matrix with positive diagonal elements (a scaling matrix) and R is a 3x3 orthonormal rotation matrix. The rows of matrix SR will be orthogonal to one another, but in general the columns will not be. The scale factors are the Euclidean norms of the rows of the matrix SR (the square roots of the sums of squares of each row).
Algorithm:
Given 4x4 matrix A, produce 4x4 scaling matrix S, 4x4 rotation+translation matrix T
A = [ SR(3x3)  Sx(3x1) ]
    [ 0(1x3)   1       ]
Partition A into a 3x3 matrix SR and a 3 vector Sx as depicted above.
Construct the scaling matrix S. The first three diagonal elements are the vector norms of the rows of matrix SR; the last diagonal element is 1.
Construct the 4x4 rotation+translation matrix T by dividing each row of A by the corresponding scale factor.
Case 2, operation is "scale", then "rotate+translate".
Now consider the case RS=R*S. Here the columns of RS will be orthogonal to one another, but in general the rows will not be. In this case the scale factors are the Euclidean norms of the columns of the matrix RS.
Algorithm:
Given 4x4 matrix A, produce 4x4 rotation+translation matrix T, 4x4 scaling matrix S
A = [ RS(3x3)  x(3x1) ]
    [ 0(1x3)   1      ]
Partition A into a 3x3 matrix RS and a 3 vector x as depicted above.
Construct the scaling matrix S. The first three diagonal elements are the vector norms of the columns of matrix RS; the last diagonal element is 1.
Construct the 4x4 rotation+translation matrix T by dividing each column of A by the corresponding scale factor (the fourth column, which holds the translation, has scale factor 1 and is left unchanged).
If the scaling is not uniform (e.g., scale x by 2, y by 4, z by 1/2), you can tell the order of operations by looking at the inner products of the rows and columns of the upper 3x3 matrix with one another. Scaling last (my case 1) means the row inner products will be very close to zero but the column inner products will be non zero. Scaling first (my case 2) reverses the situation. If the scaling is uniform there is no way to tell which case is which. You need to know beforehand.
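Both cases can be sketched in a few lines of R. The function name split_affine and the test transform are made up for this illustration, and the sketch assumes positive scale factors and no shear:

```r
# Split a 4x4 affine matrix A into scaling S and rotation+translation T.
# "scale_first" assumes A = T %*% S (case 2); "scale_last" assumes A = S %*% T (case 1).
split_affine <- function(A, order = c("scale_first", "scale_last")) {
  order <- match.arg(order)
  M <- A[1:3, 1:3]
  if (order == "scale_first") {
    s <- sqrt(colSums(M^2))              # norms of the columns of RS
    Tm <- A %*% diag(c(1 / s, 1))        # divide each column by its scale factor
  } else {
    s <- sqrt(rowSums(M^2))              # norms of the rows of SR
    Tm <- diag(c(1 / s, 1)) %*% A        # divide each row by its scale factor
  }
  list(S = diag(c(s, 1)), T = Tm)
}

# Hypothetical test data: rotate 30 degrees about z, scale by (2, 4, 0.5), translate.
th <- pi / 6
R <- matrix(c(cos(th), -sin(th), 0,
              sin(th),  cos(th), 0,
              0,        0,       1), 3, 3, byrow = TRUE)
Tm <- rbind(cbind(R, c(1, 2, 3)), c(0, 0, 0, 1))
S <- diag(c(2, 4, 0.5, 1))

out2 <- split_affine(Tm %*% S, "scale_first")   # case 2: scale, then rotate+translate
out1 <- split_affine(S %*% Tm, "scale_last")    # case 1: rotate+translate, then scale
```

For transforming normals, the upper-left 3x3 block of the recovered T is the orthogonal matrix to use.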
Just an idea -
Multiply the matrix by a unit vector, e.g. (1/sqrt(3),1/sqrt(3),1/sqrt(3)),
check the length of the vector after the multiplication,
scale the matrix by the reciprocal of that value. If the scaling is uniform, you now have an orthogonal matrix;
create a new scale matrix with the scale you found.
Remove the translation to get a 3x3 matrix
Perform the polar decomposition via SVD.
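A minimal sketch of the polar decomposition via SVD in R. The helper name polar and the example transform are assumptions for illustration. It factors the 3x3 part as M = Q P, with Q orthogonal and P symmetric positive semi-definite:

```r
polar <- function(M) {
  s <- svd(M)
  Q <- s$u %*% t(s$v)                    # orthogonal factor
  P <- s$v %*% diag(s$d) %*% t(s$v)      # symmetric positive semi-definite stretch
  list(Q = Q, P = P)                     # M = Q %*% P
}

# Hypothetical 4x4 transform: rotation about z, non-uniform scale, translation.
th <- pi / 4
R <- matrix(c(cos(th), -sin(th), 0,
              sin(th),  cos(th), 0,
              0,        0,       1), 3, 3, byrow = TRUE)
A <- rbind(cbind(R %*% diag(c(2, 3, 0.5)), c(1, 2, 3)), c(0, 0, 0, 1))

M <- A[1:3, 1:3]                         # drop the translation
pd <- polar(M)
```

Here pd$Q recovers the rotation and pd$P the scale matrix. If det(M) is negative, Q will contain a reflection, so check the sign of the determinant if you need a proper rotation.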
How does it actually reduce noise? Can you suggest some nice tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition of M is the product of three matrices, M = U*S*V', so w = U*S*V'*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by V' (the transpose of M's orthonormal factor V), then scale/squash the result by the diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioning in the diagonal scaling matrix S.
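The three stages can be traced numerically. This is only an illustrative sketch on a random 3x3 matrix; it uses R's convention M = U diag(d) V^T, so the first stage multiplies by t(V):

```r
set.seed(6)
M <- matrix(rnorm(9), 3, 3)
v <- rnorm(3)
s <- svd(M)

len <- function(z) sqrt(sum(z^2))        # Euclidean length

a <- t(s$v) %*% v                        # rotate/reflect: length unchanged
b <- s$d * a                             # scale each axis by a singular value
w <- s$u %*% b                           # rotate/reflect again: length unchanged

c(len(v), len(a))                        # equal
c(len(b), len(w))                        # equal
max(abs(w - M %*% v))                    # approximately 0
cond <- max(s$d) / min(s$d)              # 2-norm condition number of M, from S alone
```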
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M = U S V^T. S is a square diagonal matrix (the only nonzero entries are on the diagonal from top-left to bottom-right) containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M = U S V^T, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (it has the least squared error among all matrices of rank at most k).
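A small illustration of the truncation in R, on made-up data (a rank-2 "signal" plus Gaussian noise; the sizes, noise level and k are all assumptions for the demo):

```r
set.seed(7)
# Hypothetical data: a rank-2 "signal" plus noise.
signal <- matrix(rnorm(50 * 2), 50, 2) %*% matrix(rnorm(2 * 20), 2, 20)
M <- signal + 0.1 * matrix(rnorm(50 * 20), 50, 20)

s <- svd(M)
k <- 2                                   # number of singular values to keep
Mk <- s$u[, 1:k] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k])

err_k <- sqrt(sum((M - Mk)^2))           # Frobenius error of the rank-k approximation
tail_norm <- sqrt(sum(s$d[-(1:k)]^2))    # norm of the discarded singular values

err_noisy <- sqrt(sum((M - signal)^2))       # distance of the noisy data from the signal
err_denoised <- sqrt(sum((Mk - signal)^2))   # distance after truncation
```

err_k and tail_norm agree, which is the "least squared error" statement in numbers. In this setup the truncated matrix also sits closer to the clean signal than the noisy data does, though how much you gain depends on how well the signal really is low rank.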
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Say,
$X \in \mathbb{R}^{N \times p}$, then the SVD of X yields X=UDV^T, where D is diagonal and U and V are orthogonal matrices.
Now X^TX is a square matrix, and its decomposition is X^TX=VD^2V^T, where the columns of V are the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.
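This relationship is easy to verify numerically. A sketch on a random matrix (the sizes are arbitrary):

```r
set.seed(8)
X <- matrix(rnorm(30 * 4), 30, 4)            # N = 30, p = 4
s <- svd(X)
e <- eigen(crossprod(X), symmetric = TRUE)   # crossprod(X) is t(X) %*% X

val_diff <- max(abs(e$values - s$d^2))       # approximately 0
# Eigenvectors match the columns of V up to sign.
vec_diff <- max(abs(abs(e$vectors) - abs(s$v)))   # approximately 0
```

The sign ambiguity is expected: an eigenvector (or singular vector) multiplied by -1 is still a valid one, so the two routines need not agree on it.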
SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed as a formula) to data (with respect to two variables and expressed as a matrix).
For example, data matrix A = D * M^T, where D represents the possible states of a system and M represents its evolution wrt some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y), and therefore D * M^T = U * S * V^T,
then D = U * S * V^T * (M^T)^+, where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This yields model parameters which generate M_fit (the subscript indicates it is based on fitting).
M * M_fit^+ * V = V_fit, which allows the residuals R * S^2 = V - V_fit to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S) with its attendant vectors in U and V does have lower signal to noise, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. Put yet another way, the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component is more distinct.