I have 2 sets of 3D Vectors with N entries. I am trying to calculate the Rotation matrix which best aligns the first set with the second set.
I believe I can use the Java library JAMA to accomplish this with Singular Value Decomposition (SVD) or Eigenvalue Decomposition (EVD).
1) Is SVD or EVD the correct algorithm to use?
2) SVD/EVD in JAMA requires a Matrix. How do I populate the matrix based on my two sets of Vectors?
Here is a 2-D version of what I believe you are describing (translating it to
3-D should be straightforward, except that the m-matrix will be 3x3 and there will be
shftx/y/z entries).
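An affine transformation of a point (x, y) to a point (u, v) can be written as:
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} \mathrm{shftx} \\ \mathrm{shfty} \end{pmatrix}$$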
You can rewrite this as:
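$$\begin{pmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \end{pmatrix} \begin{pmatrix} m_{11} \\ m_{12} \\ m_{21} \\ m_{22} \\ \mathrm{shftx} \\ \mathrm{shfty} \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix}$$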
The reason for doing it this way is that the matrix on the left (with the
x/y/0/1 values) can be extended with two rows for every point in your data
set, while the unknowns m stay the same. If you call this stacked matrix X, and the
stacked column vector of u/v values on the right U, then the
problem becomes finding the least-squares solution m to the equation X * m = U.
You can solve this via (new QRDecomposition(X)).solve(U). I should say that in
at least one version of QRDecomposition there was a bug in the code that was
assuming the wrong dimensions for the solution matrix, but I fixed it by
changing one line in the solve() method.
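Here is a minimal sketch of that least-squares fit, written in Python with NumPy rather than Java/JAMA (NumPy and the name `fit_affine_2d` are assumptions, not part of the answer). It stacks two rows per point exactly as in X * m = U and solves it through a QR factorization followed by a triangular solve, in the same spirit as `QRDecomposition.solve`.

```python
import numpy as np  # assumption: NumPy stands in for JAMA here

def fit_affine_2d(src, dst):
    """Least-squares 2D affine fit (hypothetical helper): dst ~ M @ src + shift.

    src, dst: (N, 2) arrays of corresponding points, N >= 3, not all collinear.
    Returns (M, shift) with M of shape (2, 2) and shift of shape (2,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    X = np.zeros((2 * n, 6))
    X[0::2, 0:2] = src   # even rows: x  y  0  0  1  0
    X[0::2, 4] = 1.0
    X[1::2, 2:4] = src   # odd rows:  0  0  x  y  0  1
    X[1::2, 5] = 1.0
    U = dst.reshape(-1)  # u1, v1, u2, v2, ...
    # Least-squares solution of X m = U via QR: X = QR, then R m = Q^T U
    Q, R = np.linalg.qr(X)
    m = np.linalg.solve(R, Q.T @ U)
    M = m[:4].reshape(2, 2)  # [[m11, m12], [m21, m22]]
    shift = m[4:]            # [shftx, shfty]
    return M, shift

# Usage: points mapped by a known affine transform are recovered exactly
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
dst = src @ np.array([[2.0, 0.5], [-0.3, 1.5]]).T + [3.0, -1.0]
M, shift = fit_affine_2d(src, dst)
```

With noisy correspondences the same call returns the least-squares estimate instead of an exact match.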
So we have a 2xN matrix of known (x, y) pairs and another 2xN matrix of the corresponding (z[1], z[2]) values, and there is some unknown function f(x, y) that returns z[1], z[2]. What interpolation formulas would help in such a case?
If you solve the problem for one return value, you can find two functions f_1(x,y) and f_2(x,y) by interpolation, and compose your function as f(x, y) = [f_1(x,y), f_2(x,y)]. Just pick any interpolation method suitable for your problem.
For the actual interpolation problem in two dimensions, there are a lot of ways you can handle this. If simple is what you require, you can go with linear interpolation. If you are OK with piecewise functions, you can go for Bézier curves or splines. Or, if the data is uniform, you could get away with simple polynomial interpolation (not quite trivial in 2D, but easy enough).
EDIT: More information and some links.
A piecewise solution is possible using Bilinear interpolation (wikipedia).
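A minimal bilinear interpolation sketch for a single grid cell, in plain Python (`bilinear` is a hypothetical helper; for a full grid you would first locate the cell that contains (x, y)). For the two-component case it is simply applied once per component, as described above; the corner values are made up for illustration.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation inside the cell [x0, x1] x [y0, y1] (hypothetical helper).

    f00 = f(x0, y0), f10 = f(x1, y0), f01 = f(x0, y1), f11 = f(x1, y1).
    """
    tx = (x - x0) / (x1 - x0)  # 0 at x0, 1 at x1
    ty = (y - y0) / (y1 - y0)  # 0 at y0, 1 at y1
    return (f00 * (1 - tx) * (1 - ty) + f10 * tx * (1 - ty)
            + f01 * (1 - tx) * ty + f11 * tx * ty)

# Vector-valued data: interpolate each component separately,
# f(x, y) = [f_1(x, y), f_2(x, y)]
z1 = bilinear(0.5, 0.5, 0, 1, 0, 1, 1.0, 3.0, 2.0, 4.0)
z2 = bilinear(0.5, 0.5, 0, 1, 0, 1, 10.0, 10.0, 20.0, 20.0)
```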
For polynomial interpolation, if your data is on a grid, you can use the following algorithm (I cannot find the reference for it, it is from memory).
If the data points are on a grid with l distinct x values and k distinct y values, rewrite your polynomial as follows:
f(x,y) = cx_1(x)*y^(k-1) + cx_2(x)*y^(k-2) + ... + cx_k(x)
Here, each coefficient cx_i(x) is itself a polynomial in x, of degree l-1. The first step is to interpolate along y in each of the l grid columns, which gives l polynomials of degree k-1 in y. The coefficients of the polynomial for column x_j are exactly the values cx_1(x_j), ..., cx_k(x_j), so each cx_i now has l interpolation points cx_i(x_1), ..., cx_i(x_l) (k*l values in total, one per data point). Interpolating each cx_i through its l points then gives the resulting f(x,y).
The same method is used for Bézier curves or splines. The only difference is that you use control points instead of polynomial coefficients. You first get a set of splines that will generate your data points, and then you interpolate the control points of these intermediate curves to get the control points of the surface.
Let me add an example to clarify the above algorithm. Let's have the following data points:
0,0 => 1
0,1 => 2
1,0 => 3
1,1 => 4
We start by fitting two polynomials in y: one through the data points (0,0) and (0,1), and another through (1,0) and (1,1):
f_0(y) = y + 1
f_1(y) = y + 3
Now we interpolate in the other direction (in x) to determine the coefficients. Reading the polynomial coefficients vertically, we need two polynomials: one that evaluates to 1 at both x=0 and x=1 (the coefficient of y), and another that evaluates to 1 at x=0 and 3 at x=1 (the constant term):
cx_1(x) = 1
cx_2(x) = 2*x + 1
If we combine these into f(x,y), we get:
f(x,y) = cx_1(x)*y + cx_2(x)
= 1*y + (2*x + 1)
= 2*x + y + 1
which reproduces all four data points.
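The two-step grid algorithm can be sketched in Python with NumPy (NumPy and the name `grid_poly_interp` are assumptions; the answer names no library). `np.polyfit` with degree equal to the number of points minus one performs exact polynomial interpolation. This is only reasonable for small grids: high-degree polynomial interpolation through many points is numerically ill-conditioned and oscillates (Runge's phenomenon).

```python
import numpy as np  # assumption: NumPy for polyfit/polyval

def grid_poly_interp(xs, ys, F):
    """Two-step polynomial interpolation on a grid (hypothetical helper).

    xs: l distinct x values; ys: k distinct y values;
    F[j][i] = f(xs[j], ys[i]).  Returns a callable f(x, y).
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    F = np.asarray(F, dtype=float)
    k, l = len(ys), len(xs)
    # Step 1: interpolate along y in each grid column x_j (degree k-1).
    # Row j holds cx_1(x_j), ..., cx_k(x_j) (highest power of y first).
    col_coeffs = np.array([np.polyfit(ys, F[j], k - 1) for j in range(l)])
    # Step 2: interpolate each coefficient cx_i through its l values (degree l-1).
    cx = [np.polyfit(xs, col_coeffs[:, i], l - 1) for i in range(k)]

    def f(x, y):
        c = [np.polyval(p, x) for p in cx]  # cx_1(x), ..., cx_k(x)
        return np.polyval(c, y)             # cx_1(x)*y^(k-1) + ... + cx_k(x)
    return f

# The 2 x 2 example: f(0,0)=1, f(0,1)=2, f(1,0)=3, f(1,1)=4
f = grid_poly_interp([0, 1], [0, 1], [[1, 2], [3, 4]])
print(f(0.5, 0.5))  # matches 2*x + y + 1 at (0.5, 0.5)
```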
I have a simple question:
Given p points (non-collinear) in R^p, I find the hyperplane passing through these points (to help clarify, I did everything in R).
Then, given a (p+1)-th point not collinear with the first p points, I find the direction perpendicular to b.
That is, b2 defines a hyperplane in R^p perpendicular to b and passing through x2.
Now, my question is:
The formula comes from my interpretation of this Wikipedia entry ("solve(A)" is the R command for A^-1). Why doesn't this work for p>2? What am I doing wrong?
PS: I have seen this post on Stack Overflow, but somehow it doesn't help me.
Thanks in advance,
I have a problem implementing/understanding Liu's solution when p>2:
shouldn't the dot product between the vectors from the QR decomposition of the swept matrix and the direction of the hyperplane be 0 (i.e. if the QR vectors are perpendicular to the hyperplane)?
When p=2 this dot product is 0; when p>2 it is not.
Here is my attempt to implement Victor Liu's solution.
a) given p linearly independent observations in R^p:
$$\begin{pmatrix} -0.4634923 & -0.2978151 \\ 1.0284040 & -0.3165424 \end{pmatrix}$$
b) stack them in a matrix and subtract the first row:
$$\begin{pmatrix} 0.000000 & 0.00000000 \\ 1.491896 & -0.01872726 \end{pmatrix}$$
c) perform a QR decomposition of the matrix a0. The vector in the nullspace is the direction I'm looking for:
$$\begin{pmatrix} -1.491896 & 0.01872726 \\ 1.000000 & 0.00000000 \end{pmatrix}$$
Indeed, this direction is the same as the one given by applying the formula from Wikipedia (using x2=(0.4965321, 0.6373157)):
$$\begin{pmatrix} 2.04694853 \\ -0.02569464 \end{pmatrix}$$
...with the advantage that it works in higher dimensions.
I have one last question: what is the meaning of the other p-1 QR vectors (i.e. (1,0) here) when p>2?
-thanks in advance,
A p-1 dimensional hyperplane is defined by a normal vector and a point that the plane passes through:
n.(x-x0) = 0
where n is the normal vector of length p, x0 is a point through which the hyperplane passes, . is a dot product, and the equation must be satisfied for any point x on the plane. We can also write this as
n.x = c
where c = n.x0 is just a number. This is a more compact representation of a hyperplane, which is parameterized by (n,c). To find your hyperplane, suppose your points are x1, ..., xp.
Form a matrix A with p-1 rows and p columns as follows. The rows of A are xi-x1, laid out as row vectors, for all i>1 (there are only p-1 of them). If your p points are not "collinear" as you say (strictly, they need to be affinely independent), then matrix A will have rank p-1 and a nullspace of dimension 1. The one vector in the nullspace is the normal vector of the hyperplane. Once you find it (call it n), then c = n.x1. To find the nullspace of A, you can use a QR decomposition of A^T: the last column of the full p x p Q factor is orthogonal to every row of A, so it spans the nullspace.
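A minimal sketch of this construction in Python with NumPy (NumPy and the name `hyperplane_through` are assumptions; the thread itself used R). The first p-1 columns of the complete Q span the directions x_i - x_1, i.e. directions lying inside the hyperplane, which is one way to read the "other" QR vectors; only the last column is the normal.

```python
import numpy as np  # assumption: NumPy instead of R's qr()

def hyperplane_through(points):
    """Hyperplane n.x = c through p affinely independent points in R^p.

    points: (p, p) array, one point per row (hypothetical helper).
    Returns a unit normal n and the offset c.
    """
    P = np.asarray(points, dtype=float)
    A = P[1:] - P[0]                             # (p-1) x p, rows x_i - x_1
    Q, _ = np.linalg.qr(A.T, mode="complete")    # Q is p x p orthogonal
    n = Q[:, -1]   # orthogonal to every row of A: spans the nullspace
    # Q[:, :-1] spans the in-plane directions x_i - x_1
    c = n @ P[0]
    return n, c

# Usage: the plane through the three unit points is x + y + z = 1
n, c = hyperplane_through([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
```

The sign of (n, c) is arbitrary: (-n, -c) describes the same hyperplane.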
I wonder if it is possible (and if so, how) to represent an arbitrary M3 matrix transformation as a sequence of simpler transformations (such as translate, scale, skew, rotate).
In other words: how to calculate MTranslate, MScale, MRotate, MSkew matrices from the MComplex so that the following equation would be true:
MComplex = MTranslate * MScale * MRotate * MSkew (or in an other order)
Singular Value Decomposition. It turns an arbitrary matrix into a composition of 3 matrices: orthogonal + diagonal + orthogonal. The orthogonal matrices are rotation matrices (possibly combined with a reflection); the diagonal matrix represents scaling along the primary axes.
The translation throws a monkey wrench into the game, but what you should do is take the translation part out of the matrix so that you have a 3x3 matrix, run SVD on that to give you the rotation + scaling, then add the translation part back in. That way you'll have a rotation + scale + rotation + translate composition of 4 matrices. It's probably possible to do this in 3 matrices (rotation + scaling along some set of axes + translation), but I'm not sure exactly how... maybe a QR decomposition (Q = orthogonal = rotation, but I'm not sure if the R is skew-only or has a rotational part).
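A minimal sketch of that recipe in Python with NumPy, assuming a 4x4 homogeneous matrix in the column-vector convention with the translation in the last column (NumPy and the helper names `decompose_affine`, `embed`, `translation` are assumptions). Note that U and Vt from the SVD may each contain a reflection (determinant -1).

```python
import numpy as np  # assumption: NumPy for the SVD

def decompose_affine(M):
    """Split a 4x4 affine matrix into t, U, S, Vt (hypothetical helper),
    with M[:3, :3] == U @ diag(S) @ Vt and t the translation column."""
    M = np.asarray(M, dtype=float)
    t = M[:3, 3].copy()                    # take the translation out
    U, S, Vt = np.linalg.svd(M[:3, :3])    # rotation, scale, rotation
    return t, U, S, Vt

def embed(A3):
    """Put a 3x3 linear map into a 4x4 homogeneous matrix."""
    E = np.eye(4)
    E[:3, :3] = A3
    return E

def translation(t):
    """4x4 homogeneous translation matrix."""
    E = np.eye(4)
    E[:3, 3] = t
    return E

M = np.eye(4)
M[:3, :3] = [[2.0, 0.3, 0.0], [0.1, 1.5, -0.4], [0.0, 0.2, 0.8]]
M[:3, 3] = [1.0, 2.0, 3.0]
t, U, S, Vt = decompose_affine(M)
# Recompose: translate * rotate * scale * rotate (translation applied last)
M_back = translation(t) @ embed(U) @ embed(np.diag(S)) @ embed(Vt)
```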
Yes, but the solution will not be unique. Also, you should rather put the translation at the end (the order of the rest is up to you, although the individual factors you get will depend on the order you pick).
For any given square matrix A there exist infinitely many matrices B and C such that A = B*C. Choose any invertible matrix B (which means that B^-1 exists, or det(B) != 0), and then C = B^-1*A.
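To make the non-uniqueness concrete, here is a tiny Python/NumPy sketch (NumPy and the helper name `split_product` are assumptions): two different invertible choices of B give two different C, and both products reproduce A.

```python
import numpy as np  # assumption: NumPy for the linear solve

def split_product(A, B):
    """Return C such that A == B @ C, for any invertible B (hypothetical helper)."""
    return np.linalg.solve(B, A)  # C = B^-1 A without forming B^-1 explicitly

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B1 = np.array([[1.0, 2.0], [0.0, 1.0]])  # an arbitrary invertible choice
B2 = np.array([[0.0, 1.0], [1.0, 0.0]])  # another one
C1 = split_product(A, B1)
C2 = split_product(A, B2)
# Both B1 @ C1 and B2 @ C2 reproduce A, with different factors
```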
So for your solution, first decompose MComplex into MT and MS*MR*MSk*I, choosing MT to be some invertible translation matrix. Then decompose the rest into MS and MR*MSk*I so that MS is an arbitrary scaling matrix. And so on...
Now, if at the end of the fun I is an identity matrix (with 1 on the diagonal, 0 elsewhere), you're good. If it is not, start over, but choose different matrices ;-)
In fact, by using the method above symbolically, you can create a set of equations that will yield parametrized formulas for all of these matrices.
How useful these decompositions would be for you, well - that's another story.
If you type this into Mathematica or Maxima they'll compute this for you in no time.
How does it actually reduce noise? Can you suggest some nice tutorials?
SVD can be understood in a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition writes M as the product of three matrices, M = U*S*V^T, so w = U*S*V^T*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of multiplying a vector v by a matrix M (computing M*v) is to rotate/reflect v by M's orthonormal factor V^T, then scale/squash the result by the diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
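To check the three-step picture numerically, here is a small Python/NumPy sketch (NumPy and the name `apply_via_svd` are assumptions). NumPy's `np.linalg.svd` returns the third factor already transposed (`Vt`), so M = U @ diag(S) @ Vt.

```python
import numpy as np  # assumption: NumPy for the SVD

def apply_via_svd(M, v):
    """Compute M @ v in three geometric steps (hypothetical helper)."""
    U, S, Vt = np.linalg.svd(M)
    a = Vt @ v     # rotate/reflect: length of v unchanged
    b = S * a      # scale each axis by its singular value
    return U @ b   # rotate/reflect again: length unchanged

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
v = rng.normal(size=3)
w = apply_via_svd(M, v)  # same vector as M @ v, up to rounding
```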
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
One way to use SVD to reduce noise is to do the decomposition, set the singular values that are near zero to exactly zero, then recompose.
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV^T. S is a diagonal matrix (the only nonzero entries are on the main diagonal, from top-left to bottom-right) containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV^T, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
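A minimal sketch of truncated-SVD denoising in Python with NumPy (NumPy, the helper name `truncated_svd` and the synthetic rank-1 data are assumptions for illustration). The optimality statement above is the Eckart-Young theorem.

```python
import numpy as np  # assumption: NumPy for the SVD

def truncated_svd(M, k):
    """Keep only the k largest singular values (hypothetical helper).

    This is the best rank-k approximation of M in the least-squares
    (Frobenius norm) sense.
    """
    U, S, Vt = np.linalg.svd(M, full_matrices=False)  # S sorted, largest first
    return (U[:, :k] * S[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
clean = np.outer(rng.normal(size=50), rng.normal(size=40))  # exactly rank 1
noisy = clean + 0.01 * rng.normal(size=clean.shape)         # add small noise
denoised = truncated_svd(noisy, 1)
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```

Choosing k is the hard part in practice; here it is known to be 1 only because the clean data was built that way.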
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
If $X \in \mathbb{R}^{N \times p}$, then the SVD of X yields $X = U D V^T$, where D is diagonal and U and V are orthogonal matrices.
Now $X^T X$ is a square matrix, and $X^T X = V D U^T U D V^T = V D^2 V^T$, so the columns of V are the eigenvectors of $X^T X$ and $D^2$ contains its eigenvalues.
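A quick numerical check of this relationship in Python with NumPy (NumPy and the random test matrix are assumptions for illustration):

```python
import numpy as np  # assumption: NumPy for svd/eigh

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                    # N = 6 observations, p = 3
U, d, Vt = np.linalg.svd(X, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)     # eigh returns ascending order
# Squared singular values of X are the eigenvalues of X^T X
print(np.sort(d**2), eigvals)
```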
SVD can also be used to greatly ease global (i.e. applied to all observations simultaneously) fitting of an arbitrary model (expressed as a formula) to data (depending on two variables and expressed as a matrix).
For example, take a data matrix $A = D M^T$, where D represents the possible states of a system and M represents its evolution with respect to some variable (e.g. time).
By SVD, $A(x,y) = U(x) S V^T(y)$, and therefore $D M^T = U S V^T$,
so $D = U S V^T (M^T)^{+}$, where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This yields model parameters which generate $\hat{M}$ (the hat indicates it is based on fitting).
$\hat{M} \hat{M}^{+} V = \hat{V}$, which allows the residuals $R S^2 = V - \hat{V}$ to be minimized, thus determining D and M.
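A minimal sketch of just the pseudoinverse step, D = U S V^T (M^T)^+, in Python with NumPy. It assumes the model M is known exactly (two made-up exponential decays), so no fitting is involved; NumPy and all data values here are illustrative assumptions.

```python
import numpy as np  # assumption: NumPy for svd/pinv

# Hypothetical example: 3 states, 2 exponential components, 60 time points
t = np.linspace(0.0, 5.0, 60)
rates = np.array([0.5, 2.0])
M = np.exp(-np.outer(t, rates))     # 60 x 2, column j = exp(-rates[j] * t)
D = np.array([[1.0, 0.2],
              [0.3, 0.8],
              [0.5, 0.5]])          # 3 x 2 state amplitudes
A = D @ M.T                         # 3 x 60 data matrix, A = D M^T

U, S, Vt = np.linalg.svd(A, full_matrices=False)
# D = U S V^T (M^T)^+ (exact here because M is known, not fitted)
D_rec = U @ np.diag(S) @ Vt @ np.linalg.pinv(M.T)
```

This works because M^T has full row rank, so M^T (M^T)^+ is the identity.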
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S), with its attendant vectors in U and V, does have a lower signal-to-noise ratio, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. In other other words, the later singular values have vectors which are less smooth (noisier) but in which the changes represented by each component are more distinct.
Let's say I have 2 point clouds: the first has 3 points {x1,y1,z1}, {x2,y2,z2}, {x3,y3,z3}, and the second has the same points as {xx1,yy1,zz1}, {xx2,yy2,zz2}, {xx3,yy3,zz3}... I assume that to align the second point cloud to the first, I have to multiply the second one's points by T [a 3x3 matrix].
1) So how do I find this transform matrix (T)? I tried to solve the equations by hand, but failed. Is there a solution somewhere? I'm pretty sure I'm not the first one to stumble onto this problem.
2) I assume that matrix might include skewing and shearing. Is there a way to find a matrix with only 7 degrees of freedom (3 translation, 3 rotation, 1 scale)?
The transformation matrix T1 that takes the unit vectors {1, 0, 0}, {0, 1, 0}, and {0, 0, 1} to {x1, y1, z1}, {x2, y2, z2}, {x3, y3, z3} is simply
$$T_1 = \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ z_1 & z_2 & z_3 \end{pmatrix}$$
And likewise the transformation T2 that takes those 3 unit vectors to the second set of points is
$$T_2 = \begin{pmatrix} xx_1 & xx_2 & xx_3 \\ yy_1 & yy_2 & yy_3 \\ zz_1 & zz_2 & zz_3 \end{pmatrix}$$
Therefore, the matrix that takes the first three points to the second three points is given by T2 * T1^-1. If T1 is non-singular, then this transformation is uniquely determined, so it has no degrees of freedom. If T1 is a singular matrix, then there could be no solutions, or there could be infinitely many solutions.
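A minimal sketch of computing T = T2 * T1^-1 in Python with NumPy (NumPy, the helper name `transform_between` and the example values are assumptions). It solves a linear system instead of forming the inverse explicitly, which is numerically preferable.

```python
import numpy as np  # assumption: NumPy for the linear solve

def transform_between(T1, T2):
    """Matrix T with T @ T1 == T2, points stored as columns (hypothetical helper)."""
    # T @ T1 = T2  <=>  T1^T @ T^T = T2^T
    return np.linalg.solve(T1.T, T2.T).T

# Columns are the three points of each cloud (made-up values)
T1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 1.0],
               [1.0, 1.0, 0.0]])
T_true = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 2.0]])
T2 = T_true @ T1
T = transform_between(T1, T2)  # recovers T_true
```

`np.linalg.solve` raises `LinAlgError` when T1 is singular, which corresponds to the no-solution or infinitely-many-solutions case.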
When you say you want 7 degrees of freedom, this is somewhat of a misuse of terminology. In the general case, this matrix is composed of 3 rotational degrees of freedom, 3 scaling degrees, and 3 shearing degrees, making a total of 9. You can figure out these parameters by performing a QR factorization. The Q matrix gives you the rotational parameters, and the R matrix gives you the scaling parameters (along the diagonal) and the shearing parameters (above the diagonal).
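A minimal sketch of that QR-based split in Python with NumPy (NumPy and the name `rotation_scale_shear` are assumptions). Signs are normalised so that the diagonal of R is positive; for a matrix with negative determinant, Q then contains a reflection. It assumes T is non-singular, otherwise a scale factor is zero and the shear terms are undefined.

```python
import numpy as np  # assumption: NumPy for the QR factorization

def rotation_scale_shear(T):
    """Factor T = Q @ diag(scale) @ (I + shear) (hypothetical helper).

    Q: orthogonal (rotation, or rotation plus reflection if det(T) < 0);
    scale: positive diagonal of R; shear: strictly upper triangular.
    """
    Q, R = np.linalg.qr(T)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    Q = Q * signs             # flip columns of Q ...
    R = signs[:, None] * R    # ... and matching rows of R; Q @ R is unchanged
    scale = np.diag(R).copy()                  # scaling parameters
    shear = np.triu(R, 1) / scale[:, None]     # shearing parameters
    return Q, scale, shear

T = np.array([[2.0, 0.5, 0.1],
              [-0.3, 1.2, 0.4],
              [0.2, -0.1, 0.9]])
Q, scale, shear = rotation_scale_shear(T)
```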
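Adam Rosenfield's approach is correct, but note that matrix multiplication is not commutative (A * B != B * A), so the order of the factors matters: with the points stored as columns (as in T1 and T2 above) the result is T2 * Inv(T1), while with the points stored as rows it becomes Inv(T1) * T2.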
The seven-parameter transformation that you are talking about is referred to as a 3D conformal transformation, or sometimes a 3D similarity transformation, given that the two clouds are similar. If the two shapes are identical, Adam Rosenfield's solution is good. Where there are small differences and you wish to get a best fit, the most commonly used solution is a Helmert transformation, which uses a least-squares approach to minimise the residuals. The Wikipedia and Google material on this doesn't seem great at a glance. My reference on this is Ghilani & Wolf's Adjustment Computations, p. 345. This is also a great book on matrix math as applied to spatial problems and a good addition to the library.
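For comparison, here is a minimal Python/NumPy sketch of a least-squares 7-parameter (scale, rotation, translation) fit. Note that this swaps in Umeyama's closed-form SVD solution; it is not the iterative least-squares adjustment from Ghilani & Wolf. NumPy, the name `fit_similarity` and the test data are assumptions. Fixing s = 1 gives the rotation-only (Kabsch) fit.

```python
import numpy as np  # assumption: NumPy for the SVD

def fit_similarity(src, dst):
    """Least-squares similarity transform dst ~ s * R @ src + t (hypothetical helper).

    src, dst: (N, 3) corresponding points. Uses Umeyama's closed form.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    X, Y = src - mu_s, dst - mu_d            # centred point sets
    C = Y.T @ X / len(src)                   # cross-covariance
    U, S, Vt = np.linalg.svd(C)
    d = np.sign(np.linalg.det(U @ Vt))       # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt                           # proper rotation, det(R) = +1
    var_s = (X ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_s     # isotropic scale
    t = mu_d - s * R @ mu_s                  # translation
    return s, R, t

a, b = np.deg2rad(30.0), np.deg2rad(45.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true = Rz @ Rx
rng = np.random.default_rng(2)
src = rng.normal(size=(10, 3))
dst = 2.5 * src @ R_true.T + np.array([1.0, -2.0, 0.5])
s, R, t = fit_similarity(src, dst)
```

With exact data the parameters are recovered; with noisy data they are the least-squares estimates for equally weighted points.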
edit: Adam's 9 parameter version of this transformation is referred to as an affine transformation
Here is an example of computing least-squares estimates of the parameters of a 2D affine transformation in R.