LU decomposition of rectangular matrices - r

The lu method in the Matrix package works fine for square matrices. However, I can't see why there is this restriction to square matrices. How can I perform LU decomposition on a rectangular matrix?

You can embed it into an identity matrix:
[ a11 a12 a13 ]
[ a21 a22 a23 ]
[ 0 0 1 ]
LU decomposition is for square matrices only. You may want to check Wikipedia for a refresher.

Non-square matrices mean different things.
If it has more rows than columns (more equations than unknowns), it means you need a least squares approximation. You can pre-multiply both sides by the transpose of A and use LU decomp on that. The result is the least squares "best" solution.
If it has fewer rows than columns (more unknowns than equations), you need Singular Value Decomposition (SVD). It'll give you the best solution and the null space as well.
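For the first case (more equations than unknowns), here is a minimal sketch in R; A and b are placeholder objects, and base solve() factors the square normal-equations matrix with LU internally:
set.seed(1)
A <- matrix(rnorm(30), 10, 3)   # hypothetical over-determined system: 10 equations, 3 unknowns
b <- rnorm(10)
AtA <- crossprod(A)             # t(A) %*% A, now square
Atb <- crossprod(A, b)          # t(A) %*% b
x_ls <- solve(AtA, Atb)         # least squares "best" solution
# should match lm.fit(A, b)$coefficients up to numerical error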

Related

Calculate the reconstruction error as the difference between the original and the reconstructed matrix

I am currently taking an online class in genomics, coming in as a wet-lab physician, so my statistical knowledge is not the best. Right now we are working on PCA and SVD in R. I have a big matrix:
head(mat)
ALL_GSM330151.CEL ALL_GSM330153.CEL ALL_GSM330154.CEL ALL_GSM330157.CEL ALL_GSM330171.CEL ALL_GSM330174.CEL ALL_GSM330178.CEL ALL_GSM330182.CEL
ENSG00000224137 5.326553 3.512053 3.455480 3.472999 3.639132 3.391880 3.282522 3.682531
ENSG00000153253 6.436815 9.563955 7.186604 2.946697 6.949510 9.095092 3.795587 11.987291
ENSG00000096006 6.943404 8.840839 4.600026 4.735104 4.183136 3.049792 9.736803 3.338362
ENSG00000229807 3.322499 3.263655 3.406379 9.525888 3.595898 9.281170 8.946498 3.473750
ENSG00000138772 7.195113 8.741458 6.109578 5.631912 5.224844 3.260912 8.889246 3.052587
ENSG00000169575 7.853829 10.428492 10.512497 13.041571 10.836815 11.964498 10.786381 11.953912
Those are just the first few rows and columns; the full matrix has 60 columns and 1000 rows. Columns are cancer samples, rows are genes.
The task is to:
Remove the eigenvectors and reconstruct the matrix using SVD; then calculate the reconstruction error as the difference between the original and the reconstructed matrix. HINT: You have to use the svd() function and equalize the eigenvalue to 0 for the component you want to remove.
I have been all over google, but can't find a way to solve this task, which might be because I don't really get the question itself.
So I performed SVD on my matrix mat:
d <- svd(mat)
This gives me three matrices (eigenassays, eigenvalues and eigenvectors), which I can access using d$u and so on.
How do I equalize the eigenvalue and ultimately calculate the error?
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/svd
The decomposition expresses your matrix mat as a product of three matrices:
mat = d$u %*% diag(d$d) %*% t(d$v)
So first confirm that you are able to do the matrix multiplication and get back mat.
Once you are able to do this, set the last couple of elements of d$d to zero before doing the matrix multiplication.
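A minimal sketch of that, using the objects from the question (the choice to zero only the last singular value is arbitrary):
d <- svd(mat)
dd <- d$d
dd[length(dd)] <- 0                          # "equalize" the last eigenvalue to 0
mat_rec <- d$u %*% diag(dd) %*% t(d$v)       # reconstructed matrix
recon_error <- sqrt(sum((mat - mat_rec)^2))  # Frobenius norm of the difference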
It helps to create a function that handles the singular values.
Here, for instance, is one that zeros out any singular value that is too small compared to the largest singular value:
zap <- function(d, digits = 3) ifelse(d < 10^(-digits) * max(abs(d)), 0, d)
Although mathematically all singular values are guaranteed non-negative, numerical issues with floating point algorithms can, and do, create negative singular values, so I have prophylactically wrapped the singular values in a call to abs.
Apply this function to the diagonal matrix in the SVD of a matrix X and reconstruct the matrix by multiplying the components:
X. <- with(svd(X), u %*% diag(zap(d)) %*% t(v))
There are many ways to assess the reconstruction error. One is the Frobenius norm of the difference,
sqrt(sum((X - X.)^2))
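As a small usage example with made-up data (a rank-2 matrix plus tiny noise, purely illustrative):
set.seed(1)
X <- tcrossprod(matrix(rnorm(40), 20, 2)) + matrix(rnorm(400, sd = 1e-6), 20, 20)
X. <- with(svd(X), u %*% diag(zap(d)) %*% t(v))
sqrt(sum((X - X.)^2))   # small: only the noise components were discarded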

Cholesky decomposition failure for my correlation matrix

I am trying to use chol() to find the Cholesky decomposition of the correlation matrix below. Is there a maximum size I can use that function on? I am asking because I get the following:
d <-chol(corrMat)
Error in chol.default(corrMat) :
the leading minor of order 61 is not positive definite
but I can decompose submatrices of up to 60 elements without a problem (even when they contain the 61st element of the original):
> d <-chol(corrMat[10:69, 10:69])
> d <-chol(corrMat[10:70, 10:70])
Error in chol.default(corrMat[10:70, 10:70]) :
the leading minor of order 61 is not positive definite
Here is the matrix:
https://drive.google.com/open?id=0B0F1yWDNKi2vNkJHMDVHLWh4WjA
The problem is not size, but numerical rank!
d <- chol(corrMat, pivot = TRUE)
dim(corrMat)
#[1] 72 72
attr(d, "rank")
#[1] 62
corrMat is not positive definite. Ordinary Cholesky factorization will fail, but the pivoted version works.
The correct Cholesky factor can then be obtained as follows (see Correct use of pivot in Cholesky decomposition of positive semi-definite matrix):
r <- attr(d, "rank")
reverse_piv <- order(attr(d, "pivot"))
d[-(1:r), -(1:r)] <- 0
R <- d[, reverse_piv]
Whether this is acceptable depends on your context. It might need corresponding adjustment to your other code.
Pivoted Cholesky factorization can do many things that sound impossible for a deficient, non-invertible covariance matrix, such as:
sampling (Generate multivariate normal r.v.'s with rank-deficient covariance via Pivoted Cholesky Factorization), sketched below;
least squares (linear regression by solving the normal equations).
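As a sketch of the sampling use, assuming R is the rank-revealing factor computed above (so that crossprod(R) reproduces corrMat up to numerical error):
set.seed(42)
z <- rnorm(nrow(R))
x <- drop(crossprod(R, z))   # t(R) %*% z; its covariance is t(R) %*% R = corrMat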

Calculate Rao's quadratic entropy

Rao's QE is a weighted Euclidean distance matrix. I have the vectors for the elements of the d_ijs in a data table dt, one column per element (say there are x of them). p is the final column. nrow = S. The double sums are over the lower-left (or upper-right, since it is symmetric) elements of the distance matrix.
If I only needed an unweighted distance matrix I could simply do dist() over the x columns. How do I weight the d_ijs by the product of p_i and p_j?
An example data set is at https://github.com/GeraldCNelson/nutmod/blob/master/RaoD_example.csv with the ps in the column called foodQ.ratio.
You still start with dist for the raw Euclidean distance matrix. Let it be D. As you will read from R - How to get row & column subscripts of matched elements from a distance matrix, a "dist" object is not a real matrix, but a 1D array. So first do D <- as.matrix(D) or D <- dist2mat(D) to convert it to a complete matrix before the following.
Now, let p be the vector of weights; Rao's QE is just the quadratic form p'Dp / 2:
c(crossprod(p, D %*% p)) / 2
Note that I am not doing everything in the most efficient way. I have performed a symmetric matrix-vector multiplication D %*% p using the full D rather than just its lower triangular part, because R does not have a routine for triangular matrix-vector multiplication. So I compute the full quadratic form and then divide by 2.
This doubles the amount of computation needed, and making D a full matrix also doubles the memory cost. But if your problem is of small to medium size this is absolutely fine. For a large problem, if you are an R and C wizard, call the BLAS routine dtrmv or even dtpmv for the triangular matrix-vector computation.
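Putting it together, a minimal sketch assuming dt is a data frame whose column foodQ.ratio holds the weights p and whose remaining columns hold the trait vectors:
traits <- setdiff(names(dt), "foodQ.ratio")
D <- as.matrix(dist(dt[traits]))        # raw Euclidean distance matrix d_ij
p <- dt$foodQ.ratio                     # weights p_i
raoQE <- c(crossprod(p, D %*% p)) / 2   # quadratic form p'Dp / 2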
Update
I just found this simple paper: Rao's quadratic entropy as a measure of functional diversity based on multiple traits, for the definition and use of Rao's QE. It mentions that we can replace the Euclidean distance with the Mahalanobis distance. In case we want to do this, use my code in Mahalanobis distance of each pair of observations for fast computation of the Mahalanobis distance matrix.

What is SVD (singular value decomposition)?

How does it actually reduce noise? Can you suggest some nice tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition of M is the product of three matrices, M = U*S*V, so w = U*S*V*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
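For instance, in R (M is any numeric matrix; the cutoff of 1e-10 times the largest singular value is an arbitrary choice):
s <- svd(M)
keep <- s$d > 1e-10 * max(s$d)   # drop singular values that are "near zero"
M_denoised <- s$u[, keep, drop = FALSE] %*% diag(s$d[keep], nrow = sum(keep)) %*% t(s$v[, keep, drop = FALSE])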
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an n x m matrix M and "decomposing" it into three matrices such that M = USV. S is a square diagonal matrix (the only nonzero entries are on the diagonal from top-left to bottom-right) containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
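A short R sketch of that truncation (M is a placeholder matrix and k = 2 an arbitrary choice):
k <- 2
s <- svd(M)
M_k <- s$u[, 1:k] %*% diag(s$d[1:k], nrow = k) %*% t(s$v[, 1:k])   # best rank-k approximation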
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Say X is an N x p matrix; then the SVD of X yields X = U D V^T, where D is diagonal and U and V are orthogonal matrices.
Now X^T X is a square matrix, and X^T X = V D^2 V^T, where V holds the eigenvectors of X^T X and D^2 contains its eigenvalues.
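A quick numeric check of that relationship in R, using an arbitrary random matrix:
set.seed(1)
X <- matrix(rnorm(50), 10, 5)
s <- svd(X)
e <- eigen(crossprod(X))       # eigen-decomposition of t(X) %*% X
all.equal(s$d^2, e$values)     # squared singular values = eigenvalues of X^T X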
SVD can also be used to greatly ease global fitting (i.e. to all observations simultaneously) of an arbitrary model (expressed in a formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * M^T, where D represents the possible states of a system and M represents its evolution with respect to some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y), and therefore D * M^T = U * S * V^T.
Then D = U * S * V^T * (M^T)^+, where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M? (the ? indicates it is based on fitting).
M * (M?)^+ * V = V?, which allows the residuals R * S^2 = V - V? to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S), with its attendant U and V vectors, has a lower signal-to-noise ratio, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. Put another way, the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component is more distinct.
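The pseudoinverse used above can itself be computed from an SVD; here is a minimal R sketch (essentially what MASS::ginv does, with an arbitrary tolerance):
pinv <- function(A, tol = 1e-8) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)   # invert only the non-negligible singular values
  s$v[, keep, drop = FALSE] %*% diag(1 / s$d[keep], nrow = sum(keep)) %*% t(s$u[, keep, drop = FALSE])
}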

Align point clouds via 3 points correlation?

Let's say I have two point clouds of 3 points each: the first has points {x1,y1,z1}, {x2,y2,z2}, {x3,y3,z3}, and the second has the same points as {xx1,yy1,zz1}, {xx2,yy2,zz2}, {xx3,yy3,zz3}... I assume that to align the second point cloud to the first I have to multiply the second one's points by a 3x3 transformation matrix T.
1) So how do I find this transformation matrix T? I tried to do the equations by hand but failed to solve them. Is there a solution somewhere? I'm pretty sure I'm not the first one to stumble into this problem.
2) I assume that matrix might include skewing and shearing. Is there a way to find a matrix with only 7 degrees of freedom (3 translation, 3 rotation, 1 scale)?
The transformation matrix T1 that takes the unit vectors {1, 0, 0}, {0, 1, 0}, and {0, 0, 1} to {x1, y1, z1}, {x2, y2, z2}, {x3, y3, z3} is simply
| x1 x2 x3 |
T1 = | y1 y2 y3 |
| z1 z2 z3 |
And likewise the transformation T2 that takes those 3 unit vectors to the second set of points is
| xx1 xx2 xx3 |
T2 = | yy1 yy2 yy3 |
| zz1 zz2 zz3 |
Therefore, the matrix that takes the first three points to the second three points is given by T2 * T1^-1. If T1 is non-singular, then this transformation is uniquely determined, so it has no degrees of freedom. If T1 is a singular matrix, then there could be no solutions, or there could be infinitely many solutions.
When you say you want 7 degrees of freedom, this is somewhat of a misuse of terminology. In the general case, this matrix is composed of 3 rotational degrees of freedom, 3 scaling degrees, and 3 shearing degrees, making a total of 9. You can figure out these parameters by performing a QR factorization. The Q matrix gives you the rotational parameters, and the R matrix gives you the scaling parameters (along the diagonal) and the shearing parameters (above the diagonal).
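A small R sketch of both steps; P1 and P2 are hypothetical 3x3 matrices whose columns are the corresponding points:
Tmap <- P2 %*% solve(P1)   # the matrix taking the first three points to the second three
qr_T <- qr(Tmap)
Q <- qr.Q(qr_T)            # rotation/reflection part
R <- qr.R(qr_T)            # scaling on the diagonal, shearing above it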
Adam Rosenfield's approach is correct, but the solution T2 * Inv(T1) is wrong: since matrix multiplication is not commutative (A * B != B * A), the result is Inv(T1) * T2.
The seven-parameter transformation that you are talking about is referred to as a 3D conformal transformation, or sometimes a 3D similarity transformation, given that the two clouds are similar. If the two shapes are identical, Adam Rosenfield's solution is good. Where there are small differences and you wish to get a best fit, the most commonly used solution is a Helmert transformation, which uses a least squares approach to minimise the residuals. The Wikipedia and Google material on this doesn't seem great at a glance. My reference is Ghilani & Wolf's Adjustment Computations, p. 345. This is also a great book on matrix math as applied to spatial problems and a good addition to the library.
Edit: Adam's 9-parameter version of this transformation is referred to as an affine transformation.
Here is an example of computing least-squares estimates of the parameters of a 2D affine transformation in R.

Resources