(all the) directions perpendicular to hyperplane through p data points - r

I have a simple question:
given p points (non-collinear) in R^p, I find the hyperplane passing through these points (to help clarify, I type everything in R):
p<-2
x<-matrix(rnorm(p^2),p,p)
b<-solve(crossprod(cbind(1,x[,-2])))%*%crossprod(cbind(1,x[,-2]),x[,2])
Then, given a (p+1)-th point not collinear with the first p points, I find the direction perpendicular to b:
x2<-matrix(rnorm(p),p,1)
b2<-solve(c(-b[-1],1)%*%t(c(-b[-1],1))+x2%*%t(x2))%*%x2
That is, b2 defines a p dimensional hyperplane perpendicular to b and passing through x2.
Now, my questions are:
The formula comes from my interpretation of this wikipedia entry ("solve(A)" is the R command for A^-1). Why doesn't this work for p>2? What am I doing wrong?
PS: I have seen this post (on Stack Overflow; edit: sorry, I cannot post more than one link), but somehow it doesn't help me.
Thanks in advance,
I have a problem implementing/understanding Liu's solution when p>2:
shouldn't the dot product between the QR decomposition of the swept matrix and the direction of the hyperplane be 0? (i.e. if the QR vectors are perpendicular to the hyperplane)
I.e., when p=2 this
c(-b[2:p],1)%*%c(a1)
gives 0. When p>2 it does not.
Here is my attempt to implement Victor Liu's solution.
a) given p linearly independent observations in R^p:
p<-2;x<-matrix(rnorm(p^2),p,p);x
[,1] [,2]
[1,] -0.4634923 -0.2978151
[2,] 1.0284040 -0.3165424
b) stack them in a matrix and subtract the first row:
a0<-sweep(x,2,x[1,],FUN="-");a0
[,1] [,2]
[1,] 0.000000 0.00000000
[2,] 1.491896 -0.01872726
c) perform a QR decomposition of the matrix a0. The vector in the nullspace is the direction I'm looking for:
qr(a0)
[,1] [,2]
[1,] -1.491896 0.01872726
[2,] 1.000000 0.00000000
Indeed; this direction is the same as the one given by application of the formula from wikipedia (using x2=(0.4965321,0.6373157)):
[,1]
[1,] 2.04694853
[2,] -0.02569464
...with the advantage that it works in higher dimensions.
I have one last question: what is the meaning of the other p-1 QR vectors (i.e. (1,0) here) when p>2?
-thanks in advance,

A p-1 dimensional hyperplane is defined by a normal vector and a point that the plane passes through:
n.(x-x0) = 0
where n is the normal vector of length p, x0 is a point through which the hyperplane passes, . is the dot product, and the equation must be satisfied for any point x on the plane. We can also write this as
n.x = d
where d = n.x0 is just a number (written d here to avoid a clash with the dimension p). This is a more compact representation of a hyperplane, which is parameterized by (n,d). To find your hyperplane, suppose your points are x1, ..., xp.
Form a matrix A with p-1 rows and p columns as follows. The rows of A are xi-x1, laid out as row vectors, for all i>1 (there are only p-1 of them). If your p points are not "collinear" as you say (they need to be affinely independent), then matrix A will have rank p-1, and a nullspace of dimension 1. The one vector in the nullspace (up to scaling) is the normal vector of the hyperplane. Once you find it (call it n), then d = n.x1. In order to find the nullspace of a matrix, you can use a QR decomposition (see here for details).
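The recipe above can be sketched in R roughly as follows. This is only a minimal sketch under a few assumptions: the points are stored one per row, the offset n.x1 is stored in a variable named d (a name chosen for this example), and the null space is read off the last column of a complete QR decomposition of t(A). That is one way to get a null space from QR, not necessarily the one described in the linked details.

```r
set.seed(1)
p <- 4
x <- matrix(rnorm(p^2), p, p)        # p points in R^p, one per row

# A: rows are x_i - x_1 for i > 1, giving a (p-1) x p matrix
A <- sweep(x[-1, , drop = FALSE], 2, x[1, ], FUN = "-")

# The null space of A is the orthogonal complement of the column space
# of t(A); with a complete QR of t(A) it is spanned by the last column of Q.
Q <- qr.Q(qr(t(A)), complete = TRUE)
n <- Q[, p]                          # unit normal vector (its sign is arbitrary)
d <- sum(n * x[1, ])                 # offset, so that n.x = d on the hyperplane

max(abs(A %*% n))                    # close to 0: n is orthogonal to every x_i - x_1
max(abs(x %*% n - d))                # close to 0: all p points satisfy n.x = d
```

The first p-1 columns of Q span the directions lying inside the hyperplane, which is why only the last column is kept as the normal.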

Related

How to construct the POE ensemble in julia

I'm having trouble building the POE ensemble in Julia. I am following this paper and part of this other paper.
In Julia, I calculate:
X = randn(dim, dim)
Q, R = qr(X)
Q = Q*diagm(sign(diag(R)))
ij = (irealiz-1)*dim
phases_ens[1+ij:ij+dim] = angle(eigvals(Q))
where dim is the matrix dimension and irealiz is just an index over the total number of realizations.
I am interested in the phases of the eigenvalues of Q, since I want Q to be an orthogonal matrix with the appropriate Haar measure. If dim=50 and the total number of realizations is 100000, and since I am correcting Q, I should expect a flat phases_ens distribution. However, I obtain a flat distribution except for a peak at zero and at pi. Is there something wrong with the code?
The code is actually correct; you just have the wrong field.
The eigenvalue result is true for unitary matrices (complex entries); based on the code from section 4.6 of the Edelman and Rao paper, if you replace the first line by
X = randn(dim, dim) + im*randn(dim, dim)
you get the result you want.
Orthogonal matrices (real entries) behave slightly differently (see remark 1, in section 3 of this paper):
When dim is odd, one eigenvalue will be +1 or -1 (each with probability 1/2); all others will occur as conjugate pairs.
When dim is even, with probability 1/2 both +1 and -1 will be eigenvalues; otherwise there are no real eigenvalues.
(Thanks for the links by the way: I wasn't aware of the Stewart paper)
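The odd/even behaviour of real orthogonal matrices can be checked numerically. The sketch below is written in R rather than Julia, and haar_orthogonal and count_real are hypothetical helper names introduced only for this illustration. It repeats the QR-with-sign-correction construction from the question and counts how many eigenvalues are real.

```r
set.seed(1)

# hypothetical helper: random orthogonal matrix via QR with sign correction
haar_orthogonal <- function(dim) {
  X <- matrix(rnorm(dim^2), dim, dim)
  qrX <- qr(X)
  # fix the column signs using the diagonal of R, as in the question's code
  qr.Q(qrX) %*% diag(sign(diag(qr.R(qrX))), dim, dim)
}

# hypothetical helper: number of eigenvalues with (numerically) zero imaginary part
count_real <- function(Q, tol = 1e-8) sum(abs(Im(eigen(Q)$values)) < tol)

n_real_odd  <- replicate(200, count_real(haar_orthogonal(5)))
n_real_even <- replicate(200, count_real(haar_orthogonal(6)))
table(n_real_odd)    # odd dim: one real eigenvalue per matrix
table(n_real_even)   # even dim: either none or two (+1 and -1)
```

Those real eigenvalues have phase 0 or pi, which is consistent with the peaks seen in the histogram for the real case.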

Sign of eigenvectors change depending on specification of the symmetric argument for symmetric matrices

The signs of the eigenvectors in the eigen function change depending on the specification of the symmetric argument. Consider the following example:
set.seed(1234)
data <- matrix(rnorm(200),nrow=100)
cov.matrix <- cov(data)
vectors.1 <- eigen(cov.matrix,symmetric=TRUE)$vectors
vectors.2 <- eigen(cov.matrix,symmetric=FALSE)$vectors
#The second and third eigenvectors have opposite sign
all(vectors.1 == vectors.2)
FALSE
This also has implications for principal component analysis as the princomp function appears to calculate the eigenvectors for the covariance matrix using the eigen function with symmetric set to TRUE.
pca <- princomp(data)
#princomp uses vectors.1
pca$loadings
Loadings:
Comp.1 Comp.2
[1,] -0.366 -0.931
[2,] 0.931 -0.366
Comp.1 Comp.2
SS loadings 1.0 1.0
Proportion Var 0.5 0.5
Cumulative Var 0.5 1.0
vectors.1
[,1] [,2]
[1,] -0.3659208 -0.9306460
[2,] 0.9306460 -0.3659208
Can someone please explain the source or reasoning behind the discrepancy?
Eigenvectors remain eigenvectors after multiplication by a scalar (including -1).
The proof is simple:
If v is an eigenvector of matrix A with matching eigenvalue c, then by definition Av=cv.
Then, A(-v) = -(Av) = -(cv) = c(-v). So -v is also an eigenvector with the same eigenvalue.
The bottom line is that this does not matter and does not change anything.
If you want a consistent sign for the eigenvector elements, one convention is to ensure $\mathbf{1}^T\mathbf{e}>0$. In other words, sum all the elements in each eigenvector, and check that the sum is greater than zero. If not, change the sign of each element to the opposite sign. This is one trick to get the signs of eigenvector elements, principal components, and loadings in PCA to agree across different software.
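As a small illustration of such a sign convention, the sketch below reuses the question's data and flips each eigenvector so that its elements sum to a positive number. The helper name fix_sign is hypothetical, and the rule shown (positive column sum) is just one possible convention.

```r
set.seed(1234)
data <- matrix(rnorm(200), nrow = 100)
cov.matrix <- cov(data)
vectors.1 <- eigen(cov.matrix, symmetric = TRUE)$vectors
vectors.2 <- eigen(cov.matrix, symmetric = FALSE)$vectors

# hypothetical helper: flip each column so that its elements sum to a positive number
fix_sign <- function(v) {
  s <- sign(colSums(v))
  s[s == 0] <- 1                     # leave columns with zero sum unchanged
  sweep(v, 2, s, FUN = "*")
}

# after fixing the signs, the two results should agree up to rounding error
all.equal(fix_sign(vectors.1), fix_sign(vectors.2))
```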
Linear algebra libraries like LAPACK contain multiple subroutines for carrying out operations like eigendecompositions. The particular subroutine used in any given case may depend on the type of matrix being decomposed, and the pieces of that decomposition needed by the user.
As you can see in this snippet from eigen's code, it dispatches different LAPACK subroutines depending on whether symmetric=TRUE or symmetric=FALSE (and also, on whether the matrix is real or complex).
if (symmetric) {
z <- if (!complex.x)
.Internal(La_rs(x, only.values))
else .Internal(La_rs_cmplx(x, only.values))
ord <- rev(seq_along(z$values))
}
else {
z <- if (!complex.x)
.Internal(La_rg(x, only.values))
else .Internal(La_rg_cmplx(x, only.values))
ord <- sort.list(Mod(z$values), decreasing = TRUE)
}
Based on pointers in ?eigen, La_rs() (used when symmetric=TRUE) appears to refer to dsyevr while La_rg() refers to dgeev.
To learn exactly why those two algorithms switch some of the signs of the eigenvectors of the matrix you've handed to eigen(), you'd have to dig into the FORTRAN code used to implement them. (Since, as others have noted, the sign is irrelevant, I'm guessing you won't want to dig quite that deep ;).

normalizing matrices in R

How do I normalize/scale matrices in R by column? For example, when I compute eigenvectors of a matrix, R returns:
> eigen(matrix(c(2,-2,-2,5),2,2))$vectors
[,1] [,2]
[1,] -0.4472136 -0.8944272
[2,] 0.8944272 -0.4472136
// should be normalized to
[,1] [,2]
[1,] -1 -2
[2,] 2 -1
The function "scale" subtracts the means and divides by the standard deviation by column, which does not help in this case. How do I achieve this?
This produces the matrix you say you want:
> a <- eigen(matrix(c(2,-2,-2,5),2,2))$vectors
> a / min(abs(a))
[,1] [,2]
[1,] -1 -2
[2,] 2 -1
But I'm not sure I understand exactly what you want, so this may not do the right thing in general.
Wolfram Alpha gives the following result:
http://www.wolframalpha.com/input/?i=eigenvalues{{2,-2},{-2,5}}
[Images: Wolfram Alpha input, eigenvalues, and eigenvectors]
I'm not sure what you're talking about with means and standard deviations. A good iterative method like QR should get you the eigenvalues and eigenvectors you need. Check out Jacobi or Householder.
You normalize any vector by dividing every component by the square root of the sum of squares of its components. A unit vector will have magnitude equal to one.
In your case this is true: the vectors being presented by R have been normalized. If you compute the magnitudes of the two Wolfram eigenvectors, you'll see that both have a magnitude equal to the square root of 5. Divide each column vector by this value and you'll get the ones given to you by R (up to sign). Both are correct.
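The relationship between the two sets of eigenvectors can be checked directly. In the sketch below, w holds the integer-valued eigenvectors from the question's desired output, and the comparison uses absolute values because the sign of an eigenvector is arbitrary.

```r
m <- matrix(c(2, -2, -2, 5), 2, 2)
v <- eigen(m)$vectors
colSums(v^2)                         # each column already has unit length

# integer-valued eigenvectors, as in the question's desired output
w <- cbind(c(-1, 2), c(-2, -1))
w_len <- sqrt(colSums(w^2))          # both lengths equal sqrt(5)
w_unit <- sweep(w, 2, w_len, FUN = "/")

# same vectors up to sign
all.equal(abs(v), abs(w_unit))
```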

Decompose complex matrix transformation into a series of simple transformations?

I wonder if it is possible (and if it is then how) to re-present an arbitrary M3 matrix transformation as a sequence of simpler transformations (such as translate, scale, skew, rotate)
In other words: how to calculate MTranslate, MScale, MRotate, MSkew matrices from the MComplex so that the following equation would be true:
MComplex = MTranslate * MScale * MRotate * MSkew (or in an other order)
Singular Value Decomposition (see also this blog and this PDF). It turns an arbitrary matrix into a composition of 3 matrices: orthogonal + diagonal + orthogonal. The orthogonal matrices are rotation matrices; the diagonal matrix represents skewing along the primary axes = scaling.
The translation throws a monkey wrench into the game, but what you should do is take out the translation part of the matrix so you have a 3x3 matrix, run SVD on that to give you the rotation+skewing, then add the translation part back in. That way you'll have a rotation + scale + rotation + translate composition of 4 matrices. It's probably possible to do this in 3 matrices (rotation + scaling along some set of axes + translation) but I'm not sure exactly how... maybe a QR decomposition (Q = orthogonal = rotation, but I'm not sure if the R is skew-only or has a rotational part.)
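A rough sketch of this idea in R, assuming the 3x3 matrix is a 2D transformation in homogeneous coordinates (linear part in the top-left 2x2 block, translation in the last column). The matrix M and the helper embed are made up for this example.

```r
# example 2D affine transformation in homogeneous coordinates (made-up numbers)
M <- rbind(c(2.0, 1, 5),
           c(0.5, 3, -2),
           c(0.0, 0, 1))

L  <- M[1:2, 1:2]                    # linear part
tr <- M[1:2, 3]                      # translation part
s  <- svd(L)                         # L = u %*% diag(d) %*% t(v)

# hypothetical helper: place a 2x2 block into a 3x3 homogeneous matrix
embed <- function(B) { H <- diag(3); H[1:2, 1:2] <- B; H }

MT <- diag(3); MT[1:2, 3] <- tr      # translation
MU <- embed(s$u)                     # orthogonal (rotation or reflection)
MS <- embed(diag(s$d, 2, 2))         # scaling along the axes
MV <- embed(t(s$v))                  # orthogonal (rotation or reflection)

max(abs(MT %*% MU %*% MS %*% MV - M))  # close to 0
```

Note that the orthogonal factors returned by svd() may contain a reflection, so they are not guaranteed to be pure rotations.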
Yes, but the solution will not be unique. Also you should rather put translation at the end (the order of the rest doesn't matter)
For any given square matrix A there exist infinitely many matrices B and C so that A = B*C. Choose any invertible matrix B (which means that B^-1 exists or det(B) != 0) and now C = B^-1*A.
So for your solution first decompose MC into MT and MS*MR*MSk*I, choosing MT to be some invertible translation matrix. Then decompose the rest into MS and MR*MSk*I so that MS is an arbitrary scaling matrix. And so on...
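A tiny numerical illustration of this non-uniqueness, with made-up matrices: any invertible B gives a valid factorization of A.

```r
A <- matrix(c(1, 2, 3, 4), 2, 2)
B <- matrix(c(2, 1, 1, 3), 2, 2)     # any invertible matrix; here det(B) = 5
C <- solve(B, A)                     # C = B^-1 %*% A, without forming the inverse
max(abs(B %*% C - A))                # close to 0, so A = B %*% C
```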
Now if at the end of the fun I is an identity matrix (with 1 on diagonal, 0 elsewhere) you're good. If it is not, start over, but choose different matrices ;-)
In fact, using the method above symbolically, you can create a set of equations that will yield parametrized formulas for all of these matrices.
How useful these decompositions would be for you, well - that's another story.
If you type this into Mathematica or Maxima they'll compute this for you in no time.

What is SVD(singular value decomposition)

How does it actually reduce noise? Can you suggest some nice tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition of M writes it as the product of three matrices, M=U*S*V, so w=U*S*V*v (many references write the last factor as V^T). U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV. S is a diagonal square (the only nonzero entries are on the diagonal from top-left to bottom-right) matrix containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
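A minimal sketch of this truncation in R, on synthetic data invented for the example: a rank-2 "signal" plus small Gaussian noise. The sizes, the noise level, and the choice k = 2 are all assumptions. In practice k has to be chosen, for instance by looking at where the singular values level off.

```r
set.seed(42)
n <- 50; m <- 40; k <- 2

# synthetic rank-k signal plus noise (assumed setup for illustration)
signal <- matrix(rnorm(n * k), n, k) %*% matrix(rnorm(k * m), k, m)
noisy  <- signal + matrix(rnorm(n * m, sd = 0.1), n, m)

s  <- svd(noisy)                     # noisy = u %*% diag(d) %*% t(v)
# keep only the k largest singular values and their vectors
Mk <- s$u[, 1:k] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k])

err_noisy <- sqrt(sum((noisy - signal)^2))   # error before truncation
err_trunc <- sqrt(sum((Mk - signal)^2))      # error after truncation
c(err_noisy, err_trunc)

# squared error of the rank-k approximation equals the sum of the
# discarded squared singular values
c(sum((noisy - Mk)^2), sum(s$d[-(1:k)]^2))
```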
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Say X is an N x p matrix; then the SVD of X yields X=UDV^T, where D is diagonal and U and V are orthogonal matrices.
Now X^TX is a square matrix, and the SVD of X^TX is VD^2V^T, where the columns of V are the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.
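This relationship is easy to check numerically. The sketch below uses a random 20 x 3 matrix (an arbitrary choice) and compares eigenvectors through their absolute values, since each one is only defined up to sign.

```r
set.seed(7)
X <- matrix(rnorm(20 * 3), 20, 3)

s <- svd(X)                                  # X = u %*% diag(d) %*% t(v)
e <- eigen(crossprod(X), symmetric = TRUE)   # crossprod(X) is t(X) %*% X

all.equal(e$values, s$d^2)                   # eigenvalues are squared singular values
all.equal(abs(e$vectors), abs(s$v))          # eigenvectors match V up to sign
```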
SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed in a formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * M^T where D represents the possible states of a system and M represents its evolution wrt some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y) and therefore D * M^T = U * S * V^T
then D = U * S * V^T * (M^T)+ where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M? (the ? indicates it is based on fitting).
M * M?+ * V = V? which allows residuals R * S2 = V - V? to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S) with its attendant vectors U and V does have lower signal to noise, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. In other other words, the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component is more distinct.

Resources