How to represent a 4x4 matrix rotation?

Given the following definitions for the X, Y, and Z rotation matrices, how do I represent this as one complete matrix? Do I simply multiply the X, Y, and Z matrices?
X Rotation:
[1 0 0 0]
[0 cos(-X Angle) -sin(-X Angle) 0]
[0 sin(-X Angle) cos(-X Angle) 0]
[0 0 0 1]
Y Rotation:
[cos(-Y Angle) 0 sin(-Y Angle) 0]
[0 1 0 0]
[-sin(-Y Angle) 0 cos(-Y Angle) 0]
[0 0 0 1]
Z Rotation:
[cos(-Z Angle) -sin(-Z Angle) 0 0]
[sin(-Z Angle) cos(-Z Angle) 0 0]
[0 0 1 0]
[0 0 0 1]
Edit: I have a separate rotation class that contains an x, y, z float value, which I later convert to a matrix in order to combine with other translations / scales / rotations.
Judging from the answers here, I can assume that if I do something like:
Rotation rotation;
rotation.SetX(45);
rotation.SetY(90);
rotation.SetZ(180);
Then is it actually really important which order the rotations are applied in? Or is it safe to assume that, when using the rotation class, you accept that they are applied in X, Y, Z order?

Yes, multiplying the three matrices in turn will compose them.
EDIT:
The order in which you multiply the matrices determines the order in which the rotations are applied to the point.
P × (X × Y × Z): rotations in X, Y, then Z will be performed
P × (Y × X × Z): rotations in Y, X, then Z will be performed
P × (Z × X × Y): rotations in Z, X, then Y will be performed
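For concreteness, here is a minimal C++ sketch of composing the three matrices. The Mat4 type and helpers are hypothetical, not from any particular library, and the composition follows the row-vector convention (P × M) used above:

#include <array>
#include <cmath>

// Hypothetical row-major 4x4 matrix type, just for illustration.
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// An X rotation in the same shape as the definition at the top; angle
// in radians (pass the negated angle for the question's exact convention).
Mat4 rotX(double a) {
    return {{{1, 0,            0,            0},
             {0, std::cos(a), -std::sin(a),  0},
             {0, std::sin(a),  std::cos(a),  0},
             {0, 0,            0,            1}}};
}
// rotY and rotZ follow the same pattern from the Y and Z matrices above.

// With a row vector P on the left (P × M), this applies X, then Y, then Z:
// Mat4 combined = multiply(rotX(ax), multiply(rotY(ay), rotZ(az)));

Reversing the order of the arguments to multiply reverses the order in which the rotations are applied.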

It actually is really important what order you apply your rotations in.
The order you want depends on what you want the rotations to do. For instance, if you are modeling an airplane, you might want to do the roll first (rotate along the long axis of the body), then the pitch (rotate along the other horizontal axis), then the heading (rotate along the vertical axis). This would be because, if you did the heading first, the plane would no longer be aligned along the other axes. Beyond that, you need to deal with your conventions: which of these axes correspond to X, Y, and Z?
Generally, you only want to choose a particular rotation order for specific applications. It doesn't make much sense to define a generic "XYZrotation" object; typically, you will have generic transformations (i.e., matrices that can be any concatenation of rotations, translations, etc.) and various ways to get them (e.g., rotX, rotY, translate, scale...), plus the ability to apply them in a particular order (by doing matrix multiplication).
If you want something that can only represent rotations and nothing else, you might consider quaternions (as anand suggests). However, you still need to decide which order to perform your rotations in, and, again, it doesn't really make sense to hardwire a required order for that.

As an aside, if you're early enough in your development activities here, you might want to consider using quaternion rotation. It has a number of advantages over matrix-based approaches.

Related

Are Geometric Translations Linear?

I'm studying linear algebra and I discovered that linear transformations are often used in video games.
I tried to calculate the matrix associated with the transformation that translates a point (x y z) by a vector (x y z), and I came to the conclusion that the transformation is not linear because, given v1, v2 and a translation vector p ∊ V:
T(v1 + v2) = v1 + v2 + p ≠ T(v1) + T(v2) = v1 + v2 + 2p
I searched online and found that 3D coordinates (x y z) are represented as a vector (x y z 1), but, given v1 and v2 ∊ V:
v1 + v2 = (x1 + x2, y1 + y2, z1 + z2, 2)
V is not even a vector space
My question is: why do I get these wrong results?
Thanks to all.
In a vector in homogeneous coordinate format, (x y z 1),
(x/1 y/1 z/1) are the 3D cartesian coordinates, and 1 is the scaling factor.
We divide the first three values by the scaling factor to get the vector in cartesian coordinate format, (x y z). Homogeneous coordinates are useful for, among other things, efficient arbitrary precision with rational coordinates, and elegant algebra with nice properties, like the linearity at issue here.
When we want to translate a point by adding a vector, that's natural with cartesian coordinates. With all point/vector coordinates in cartesian format,
addition of a vector, v, to a point, p1, is a translation, T1, to some point, p2, such that
T1(p1) = p1 + v = p2
You are correct that something like T1 isn't linear.
When we want to translate a point by multiplying a matrix, it's more natural to think in homogeneous coordinates. This matrix, A, would be the identity matrix, but its last column is the vector, v, in homogeneous form. With points in homogeneous format, we can represent the transform, T2, as
T2(p1) = A * p1 = p2
With T2, we do have a linear transform.
Your results are not wrong: the translation of a point by a vector is not a linear transformation.
Translating a point by a vector is an affine transformation and it's done in an affine space. An affine space can be loosely defined as a set of points together with a vector space, where you can add a vector to a point and get another point as a result. Adding points to points is not allowed.
One way to construct an affine space is by taking a projective space whose elements are represented with homogeneous coordinates. These concepts come from the beautiful field of projective geometry, but a full explanation does not fit in a stack overflow post.
A more direct way to construct an affine space is by taking a vector space and adding one extra bit of information: take vectors of the form (x y z 1) as the points, and vectors of the form (x y z 0) as the vectors. Note that the points do not form a vector space, but the vectors do, and that if you add a vector to a point, the result is another point.
With this representation of points and vectors, translation of a point p by a vector can be written as a matrix multiplication T*p. The matrix T for translating by vector (x y z 0) is:
1 0 0 x
0 1 0 y
0 0 1 z
0 0 0 1
Note that this is still not a linear transform because points do not form a vector space.
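To make the point/vector distinction concrete, here is a small C++ sketch; the Vec4 type is made up for illustration:

#include <array>

// Hypothetical homogeneous 4-vector: (x, y, z, w).
using Vec4 = std::array<double, 4>;

// Multiplying T (identity with (x, y, z) in the last column) by p
// works out to adding the translation scaled by p's w component.
Vec4 applyTranslation(double tx, double ty, double tz, const Vec4& p) {
    return {p[0] + tx * p[3],
            p[1] + ty * p[3],
            p[2] + tz * p[3],
            p[3]};
}

// A point (w == 1) is moved: (2, 3, 4, 1) -> (2+tx, 3+ty, 4+tz, 1).
// A vector (w == 0) is unchanged: (2, 3, 4, 0) -> (2, 3, 4, 0),
// matching the point/vector distinction described above.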

2D rotation matrix applied on 3D points

I have rotation matrices, translation vector and a set of 3D categorized points (category depends on the z-coordinates).
One 2x2 rotation matrix M and one 2x1 translation vector T are related to one category.
How can I apply my rotation matrix and translation vector to each point with coordinates (x, y, z)?
Is it simply the following, or do I misunderstand the principle of rotation matrices?
add to M a column and a row of zeros
add to T a 0 for the z-component
then: (x, y, z) = M * (xp, yp, zp) + T
If I understand you correctly you have an affine transformation on R^2 and you want to lift it to an affine transformation on R^3 in such a way that the effect when you apply it to (x,y,z) is to apply the original transformation to (x,y) and leave z unchanged.
If so -- you have to modify your matrix more carefully.
If your original matrix is
M = [a b]
[c d]
Then your new matrix should be
M' = [a b 0]
[c d 0]
[0 0 1]
Note the 1 in the lower right corner -- it is the missing ingredient in the approach that you described. Note that the added row and column are the third row and column of the 3x3 identity matrix, which makes sense, since you want the result to act like the identity matrix on z. It is easy to work out that
[a b 0] [x] [ax+by]
[c d 0] [y] = [cx+dy]
[0 0 1] [z] [ z ]
Which is what I think you want. (I don't think Stack Overflow has any mark-up for matrices, but my notation should be clear enough)
You are handling T correctly (adding a zero z-component).
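For what it's worth, here is the lift written out as a small C++ sketch (the matrix types are hypothetical placeholders):

#include <array>

using Mat2 = std::array<std::array<double, 2>, 2>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Embed a 2x2 matrix M into a 3x3 matrix M' that acts as the
// identity on z, exactly as shown above.
Mat3 lift(const Mat2& m) {
    return {{{m[0][0], m[0][1], 0},
             {m[1][0], m[1][1], 0},
             {0,       0,       1}}};
}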

Rotation matrix OpenCV

I would like to know how to find the rotation matrix for a set of features in a frame.
I will be more specific. I have 2 frames with 20 features, let's say frame 1 and frame 2. I can estimate the location of the features in both frames. For example, say a certain feature is at location (x, y) in frame 1, and I know exactly where it ends up in frame 2, say at (x', y').
My question is that the features have moved and probably rotated, so I want to know how to compute the rotation matrix. I know the standard rotation matrix for 2D:
[cos(θ) -sin(θ)]
[sin(θ) cos(θ)]
But I don't know how to compute the angle θ. I tried the OpenCV function cv2DRotationMatrix(), but there are two problems: as mentioned above, I don't know how to compute the angle, and the function returns a 2x3 matrix, which won't work out, because if I take my 20x2 matrix (20 is the number of features and 2 is the (x, y) location) and multiply it by that 2x3 matrix, I get a 20x3 matrix, which doesn't seem realistic since I'm working in 2D.
So what should I do? To be more specific: how do I compute the angle to use in the rotation matrix?
I'm not sure I've understood your question, but if you want to find out the angle of rotation resulting from an arbitrary transform...
A simple hack is to transform the points [0 0] and [1 0], and get the angle of the ray from the first transformed point to the second.
o = M • [0 0]
x = M • [1 0]
d = x - o
θ = atan2(d.y, d.x)
This doesn't consider skew and other non-orthogonal transforms, for which the notion of "angle" is vague.
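A sketch of that hack in C++, with a stand-in for applying M to a point:

#include <cmath>

struct Pt { double x, y; };

// Stand-in for applying the transform M to a point; replace with
// whatever your matrix type actually provides.
Pt apply(const double m[2][2], Pt p) {
    return {m[0][0] * p.x + m[0][1] * p.y,
            m[1][0] * p.x + m[1][1] * p.y};
}

// Angle of the ray from the image of (0, 0) to the image of (1, 0).
double rotationAngle(const double m[2][2]) {
    Pt o = apply(m, {0, 0});
    Pt x = apply(m, {1, 0});
    return std::atan2(x.y - o.y, x.x - o.x);  // radians
}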
Have a look at this function:
cvGetAffineTransform
You give it three points in the first frame and three in the second, and it computes the affine transformation matrix (translation + rotation).
If you want, you could also try
cvGetPerspectiveTransform
With that, you can get translation + rotation + skew + a lot of others.
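As a rough illustration of the first suggestion, using the modern C++ counterpart cv::getAffineTransform (the point values here are made up):

#include <opencv2/imgproc.hpp>

cv::Mat estimateFrameTransform() {
    // Three matched feature locations, one set per frame.
    cv::Point2f src[3] = {{10, 10}, {200, 50}, {50, 200}};  // frame 1
    cv::Point2f dst[3] = {{15, 12}, {205, 60}, {45, 210}};  // frame 2
    return cv::getAffineTransform(src, dst);  // 2x3 matrix, CV_64F
}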

How to always rotate from a particular orientation

(Apologies in advance. My math skills and powers of description seem to have left me for the moment)
Imagine a cube on screen with two sets of controls. One set of controls rotates the cube side to side (aka yaw or Y, or even Z, depending on one's mathematical leanings) and another set rotates it up and down (aka pitch or X).
What I would like to do is make it so that the two sets of controls always rotate the cube in relation to the viewer/screen, irrespective of how the cube is currently rotated.
A regular combination of either matrix or quaternion based rotations doesn't achieve this effect because the rotations get applied in a serial fashion (with each rotation "starting" from where the previous one left off).
e.g., the pseudocode
combinedRotation = RotateYaw(90) * RotatePitch(45)
will give me a cube that appears to be "rolling" to one side, because the pitch rotation has been rotated as well.
(Or, for a more dramatic example, RotateYaw(180) * RotatePitch(45) will produce a cube where it appears that the pitch is working in reverse relative to the screen.)
Can somebody either point me to or supply the correct way to make the two rotations independent of each other in effect, so that irrespective of how the cube is currently rotated, yaw and pitch work "as expected" in relation to the on-screen controls?
EDIT 3: It just occurred to me that the solution below, while correct, is unnecessarily complicated. You can achieve the same effect by simply multiplying the rotation matrix by the orientation matrix to compute the new orientation:
M = R * M
Though not relevant to the question, this would also correctly handle orientation matrices that aren't made up of pure rotation, but also contain translation, skew, etc.
(End of edit 3)
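As a sketch of what that incremental update might look like in code (assuming Mat4, multiply, rotX and rotZ helpers like those sketched earlier in this document, plus an identity() constructor):

// Stored orientation of the object, starting as the identity.
Mat4 orientation = identity();

// Called with each small change event from the pitch control.
void onPitchChanged(double deltaRadians) {
    orientation = multiply(rotX(deltaRadians), orientation);  // M = R * M
}

// Called with each small change event from the yaw control (Z here).
void onYawChanged(double deltaRadians) {
    orientation = multiply(rotZ(deltaRadians), orientation);  // M = R * M
}

Pre-multiplying each incremental rotation keeps it in fixed screen/world axes rather than the object's local axes, which is the behaviour the question asks for.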
You need a transform matrix comprising the current rotated axes of your object's local coordinate system. You then apply rotations to that matrix.
In mathematical terms, you start with an identity matrix as follows:
M = [1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
This matrix comprises three vectors, U, V and W, that represent — in world coordinates — the three unit vectors of your object's local coordinate system:
M = [Ux Vx Wx 0]
[Uy Vy Wy 0]
[Uz Vz Wz 0]
[0 0 0 1]
When you want to rotate the object, rotate each vector in situ. In other words, apply the rotation independently to each of U, V and W within the matrix.
When rendering, simply apply M as a single transform to your object. (In case you're wondering, don't apply the rotations themselves; just the matrix.)
EDIT 2: (Appears before the first edit, since it provides context for it.)
On revisiting this answer long after it was originally posted, I've realised that I might not have picked up on a misunderstanding that you might have about how to apply rotations from each control.
The idea is not to accumulate the rotation to be applied by each control and apply them separately. Rather, you should apply each incremental rotation (i.e., every time one of your slider controls fires a change event) immediately to the U, V and W vectors.
To put this in more concrete terms, don't say, "In total, the vertical control has moved 47° and the horizontal control has moved -21°" and apply them as two big rotations. That will exhibit the same problem that motivated your question. Instead, say, "The vertical slider just moved 0.23°", and rotate U, V and W about the X-axis by 0.23°.
In short, the 90° yaw followed by 45° pitch described below is probably not what you want.
EDIT: As requested, here's how the case of 90° yaw followed by 45° pitch pans out in practice...
Since you start with the identity matrix, the basis vectors will simply be your world unit vectors:
U = [1] V = [0] W = [0]
[0] [1] [0]
[0] [0] [1]
To apply the 90° yaw, rotate each basis vector around the Z-axis (reflecting my mathematical leaning), which is almost trivial:
U = [0] V = [-1] W = [0]
[1] [ 0] [0]
[0] [ 0] [1]
Thus, after the 90° yaw, the transform matrix will be:
M = [0 -1 0 0]
[1 0 0 0]
[0 0 1 0]
[0 0 0 1]
Applying this matrix to the subject will achieve the desired 90° rotation around Z.
To then apply a 45° pitch (which I'll take to be around the X-axis), we rotate our new basis vectors in the YZ-plane, this time by 45°:
U = [0 ] V = [-1] W = [ 0 ]
[0.7] [ 0] [-0.7]
[0.7] [ 0] [ 0.7]
Thus, the new transform matrix is:
M = [0 -1 0 0]
[0.7 0 -0.7 0]
[0.7 0 0.7 0]
[0 0 0 1]
If you multiply the two rotations together:
Yaw(90)*Pitch(45) = [0 -1 0 0]*[1 0 0 0] = [0 -0.7 0.7 0]
[1 0 0 0] [0 0.7 -0.7 0] [1 0 0 0]
[0 0 1 0] [0 0.7 0.7 0] [0 0.7 0.7 0]
[0 0 0 1] [0 0 0 1] [0 0 0 1]
Pitch(45)*Yaw(90) = [1 0 0 0]*[0 -1 0 0] = [0 -1 0 0]
[0 0.7 -0.7 0] [1 0 0 0] [0.7 0 -0.7 0]
[0 0.7 0.7 0] [0 0 1 0] [0.7 0 0.7 0]
[0 0 0 1] [0 0 0 1] [0 0 0 1]
You'll notice that the second form is the same as the transform matrix produced by manipulating the basis vectors, but this is just a coincidence (and quite a common one when mixing up 90° and 45° rotations). In the general case, neither order of application will match the basis-transform.
I've run out of steam, so I hope the explanation so far makes things clearer, not muddier.

Vertex shader world transform, why do we use 4 dimensional vectors?

From this site: http://www.toymaker.info/Games/html/vertex_shaders.html
We have the following code snippet:
// transformations provided by the app, constant Uniform data
float4x4 matWorldViewProj : WORLDVIEWPROJECTION;

// the format of our vertex data
struct VS_OUTPUT
{
    float4 Pos : POSITION;
};

// Simple Vertex Shader - carry out transformation
VS_OUTPUT VS(float4 Pos : POSITION)
{
    VS_OUTPUT Out = (VS_OUTPUT)0;
    Out.Pos = mul(Pos, matWorldViewProj);
    return Out;
}
My question is: why does the struct VS_OUTPUT have a 4 dimensional vector as its position? Isn't position just x, y and z?
Because you need the w coordinate for the perspective calculation. After you output from the vertex shader, DirectX performs a perspective divide by dividing by w.
Essentially, if you have (32768, -32768, 32768, 65536) as your output vertex position, then after the w-divide you get (0.5, -0.5, 0.5, 1). At this point the w can be discarded, as it is no longer needed. This information is then passed through the viewport matrix, which transforms it to usable 2D coordinates.
Edit: If you look at how a matrix multiplication is performed using the projection matrix you can see how the values get placed in the correct places.
Taking the projection matrix specified in D3DXMatrixPerspectiveLH
2*zn/w 0 0 0
0 2*zn/h 0 0
0 0 zf/(zf-zn) 1
0 0 zn*zf/(zn-zf) 0
And applying it to an arbitrary (x, y, z, 1) vertex input value (note that for a vertex position, w will always be 1), you get the following:
x' = ((2*zn/w) * x) + (0 * y) + (0 * z) + (0 * w)
y' = (0 * x) + ((2*zn/h) * y) + (0 * z) + (0 * w)
z' = (0 * x) + (0 * y) + ((zf/(zf-zn)) * z) + ((zn*zf/(zn-zf)) * w)
w' = (0 * x) + (0 * y) + (1 * z) + (0 * w)
Instantly you can see that w and z are different. The w coord now just contains the z coordinate passed to the projection matrix. z contains something far more complicated.
So, assume we have an input position of (2, 1, 5, 1), with zn (z-near) = 1, zf (z-far) = 10, w (width) = 1, and h (height) = 1.
Passing these values through, we get:
x' = ((2 * 1)/1) * 2
y' = ((2 * 1)/1) * 1
z' = (10/(10-1)) * 5 + ((10 * 1)/(1-10)) * 1
w' = 5
Expanding that, we then get:
x' = 4
y' = 2
z' = 4.4
w' = 5
We then perform the final perspective divide and get:
x'' = 0.8
y'' = 0.4
z'' = 0.88
w'' = 1
And now we have our final coordinate position. This assumes that x and y range from -1 to 1 and z ranges from 0 to 1. As you can see, the vertex is on-screen.
As a bizarre bonus, you can see that if |x'|, |y'|, or |z'| is larger than |w'|, or z' is less than 0, the vertex is off-screen. This information is used for clipping the triangle to the screen.
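Here is a rough C++ sketch of the divide and that clip test; the Vec4 type is made up for illustration, and the clip volume is the D3D-style one assumed above (-w ≤ x ≤ w, -w ≤ y ≤ w, 0 ≤ z ≤ w):

#include <cmath>

struct Vec4 { float x, y, z, w; };

// Clip-space test, per the rule described above.
bool insideClipVolume(const Vec4& v) {
    return std::fabs(v.x) <= v.w && std::fabs(v.y) <= v.w
        && v.z >= 0 && v.z <= v.w;
}

// The perspective divide performed after the vertex shader runs.
Vec4 perspectiveDivide(Vec4 v) {
    v.x /= v.w; v.y /= v.w; v.z /= v.w; v.w = 1;
    return v;
}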
Anyway, I think that's a pretty comprehensive answer. :D
Edit 2: Be warned, I am using row-major matrices; column-major matrices are the transpose.
Rotation is specified by a 3x3 matrix and translation by a vector. You can perform both transforms in a "single" operation by combining them into a single 3x4 matrix:
rx1 rx2 rx3 tx1
ry1 ry2 ry3 ty1
rz1 rz2 rz3 tz1
However, as this isn't square, there are various operations that can't be performed (inversion, for one). By adding an extra row (that does nothing):
0 0 0 1
all these operations become possible (if not easy).
As Goz explains in his answer, by making the "1" a non-identity value, the matrix becomes a perspective transformation.
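A quick sketch of that packing in C++ (the types are hypothetical placeholders):

#include <array>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec3 = std::array<double, 3>;

// Pack a 3x3 rotation and a translation vector into one 4x4 matrix,
// adding the "do nothing" bottom row so the result is square
// and operations like inversion become possible.
Mat4 makeTransform(const Mat3& r, const Vec3& t) {
    return {{{r[0][0], r[0][1], r[0][2], t[0]},
             {r[1][0], r[1][1], r[1][2], t[1]},
             {r[2][0], r[2][1], r[2][2], t[2]},
             {0,       0,       0,       1}}};
}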
Clipping is an important part of this process, as it helps to visualize what happens to the geometry. The clipping stage essentially discards any point in a primitive that is outside of a 2-unit cube centered around the origin (OK, you have to reconstruct primitives that are partially clipped but that doesn't matter here).
It would be possible to construct a matrix that directly mapped your world space coordinates to such a cube, but gradual movement from the far plane to the near plane would be linear. That is to say that a move of one foot (towards the viewer) when one mile away from the viewer would cause the same increase in size as a move of one foot when several feet from the camera.
However, if we have another coordinate in our vector (w), we can divide the vector component-wise by w, and our primitives won't exhibit the above behavior, but we can still make them end up inside the 2-unit cube above.
For further explanations see http://www.opengl.org/resources/faq/technical/depthbuffer.htm#0060 and http://en.wikipedia.org/wiki/Transformation_matrix#Perspective_projection.
A simple answer would be to say that if you don't tell the pipeline what w is then you haven't given it enough information about your projection. This can be verified directly without understanding what the pipeline does with it...
As you probably know the 4x4 matrix can be split into parts based on what each part does. The 3x3 matrix at the top left is altered when you do rotation or scale operations. The fourth column is altered when you do a translation. If you ever inspect a perspective matrix, it alters the bottom row of the matrix. If you then look at how a Matrix-Vector multiplication is done, you see that the bottom row of the matrix ONLY affects the resultant w component of the vector. So if you don't tell the pipeline about w it won't have all your information.
