Calculating a LookAt matrix - math

I'm in the midst of writing a 3d engine and I've come across the LookAt algorithm described in the DirectX documentation:
zaxis = normal(At - Eye)
xaxis = normal(cross(Up, zaxis))
yaxis = cross(zaxis, xaxis)
xaxis.x           yaxis.x           zaxis.x           0
xaxis.y           yaxis.y           zaxis.y           0
xaxis.z           yaxis.z           zaxis.z           0
-dot(xaxis, eye)  -dot(yaxis, eye)  -dot(zaxis, eye)  1
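For concreteness, here is a NumPy transcription of that pseudocode (a sketch I wrote for experimenting, not the actual D3DX source; eye, at, and up are float arrays):

import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def look_at_lh(eye, at, up):
    # Left-handed, row-major LookAt built exactly as the pseudocode above.
    zaxis = normalize(at - eye)
    xaxis = normalize(np.cross(up, zaxis))
    yaxis = np.cross(zaxis, xaxis)            # already unit length
    return np.array([
        [xaxis[0], yaxis[0], zaxis[0], 0.0],
        [xaxis[1], yaxis[1], zaxis[1], 0.0],
        [xaxis[2], yaxis[2], zaxis[2], 0.0],
        [-xaxis @ eye, -yaxis @ eye, -zaxis @ eye, 1.0],
    ])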
Now I get how it works on the rotation side, but what I don't quite get is why it puts the translation component of the matrix to be those dot products. Examining it a bit it seems that it's adjusting the camera position by a small amount based on a projection of the new basis vectors onto the position of the eye/camera.
The question is why does it need to do this? What does it accomplish?

Note the example given is a left-handed, row-major matrix.
So the operation is: translate to the origin first (move by -eye), then rotate so that the vector from eye to At lines up with +z.
Basically, you get the same result if you pre-multiply the rotation matrix by a translation by -eye:
[ 1      0      0      0 ]   [ xaxis.x  yaxis.x  zaxis.x  0 ]
[ 0      1      0      0 ] * [ xaxis.y  yaxis.y  zaxis.y  0 ]
[ 0      0      1      0 ]   [ xaxis.z  yaxis.z  zaxis.z  0 ]
[ -eye.x -eye.y -eye.z 1 ]   [ 0        0        0        1 ]

    [ xaxis.x          yaxis.x          zaxis.x          0 ]
  = [ xaxis.y          yaxis.y          zaxis.y          0 ]
    [ xaxis.z          yaxis.z          zaxis.z          0 ]
    [ dot(xaxis,-eye)  dot(yaxis,-eye)  dot(zaxis,-eye)  1 ]
Additional notes:
Note that a viewing transformation is (intentionally) inverted: you multiply every vertex by this matrix to "move the world" so that the portion you want to see ends up in the canonical view volume.
Also note that the rotation component of the LookAt matrix (call it R) is an inverted change-of-basis matrix, where the rows of R are the new basis vectors in terms of the old basis vectors (hence the variable names: xaxis is the new x axis after the change of basis occurs). Because of the inversion, the rows and columns are transposed relative to the forward change-of-basis matrix.

I build a look-at matrix by creating a 3x3 rotation matrix as you have done here and then expanding it to a 4x4 with zeros and the single 1 in the bottom right corner. Then I build a 4x4 translation matrix using the negative eye point coordinates (no dot products), and multiply the two matrices together. My guess is that this multiplication yields the equivalent of the dot products in the bottom row of your example, but I would need to work it out on paper to make sure.
The 3D rotation transforms your axes. Therefore, you cannot use the eye point directly without also transforming it into this new coordinate system. That's what the matrix multiplications -- or in this case, the 3 dot-product values -- accomplish.
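That guess is correct. A quick NumPy check (row-vector convention, translation in the bottom row; the rotation is just an arbitrary orthonormal example):

import numpy as np

eye = np.array([1.0, 2.0, 3.0])
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[  c, 0.0,   s, 0.0],      # any orthonormal basis works;
              [0.0, 1.0, 0.0, 0.0],      # this one is a rotation about Y
              [ -s, 0.0,   c, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
T = np.eye(4)
T[3, :3] = -eye                          # translate by -eye first

M = T @ R                                # translate, then rotate
xaxis, yaxis, zaxis = R[:3, 0], R[:3, 1], R[:3, 2]
assert np.allclose(M[3, :3], [xaxis @ -eye, yaxis @ -eye, zaxis @ -eye])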

That translation component helps you by creating an orthonormal basis with your "eye" at the origin and everything else expressed in terms of that origin (your "eye") and the three axes.
The concept isn't so much that the matrix is adjusting the camera position. Rather, it is trying to simplify the math: when you want to render a picture of everything that you can see from your "eye" position, it's easiest to pretend that your eye is the center of the universe.
So, the short answer is that this makes the math much easier.
Answering the question in the comment: the reason you don't just subtract the "eye" position from everything has to do with the order of the operations. Think of it this way: once you are in the new frame of reference (i.e., the head position represented by xaxis, yaxis and zaxis) you now want to express distances in terms of this new (rotated) frame of reference. That is why you use the dot product of the new axes with the eye position: that represents the same distance that things need to move but it uses the new coordinate system.

Just some general information:
The lookat matrix is a matrix that positions / rotates something to point to (look at) a point in space, from another point in space.
The method takes a desired "center" of the camera's view, an "up" vector, which represents the direction "up" for the camera (up is almost always (0,1,0), but it doesn't have to be), and an "eye" vector which is the location of the camera.
This is used mainly for the camera but can also be used for other techniques like shadows, spotlights, etc.
Frankly, I'm not entirely sure why the translation component is set the way it is in this method. For comparison, gluLookAt (from OpenGL) builds its matrix with a zero translation and then applies a separate glTranslated(-eyex, -eyey, -eyez); multiplying the two together yields exactly the dot products shown above.

The dot product simply projects the eye point onto each axis to get its x-, y-, or z-component in the new basis. You are moving the camera backwards, so looking at (0, 0, 0) from (10, 0, 0) and from (100000, 0, 0) would have different effects.

The lookat matrix does these two steps:
1. Translate your model to the origin.
2. Rotate it according to the orientation set up by the up-vector and the looking direction.
The dot products mean simply that you translate first and then rotate: instead of multiplying two full matrices, each dot product multiplies one row by one column of the product.

A 4x4 transformation matrix contains two or three components:
1. a rotation matrix,
2. a translation to add,
3. a scale (many engines do not use this directly in the matrix).
The combination of them transforms a point from space A to space B, hence this is a transformation matrix M_ab.
Now, the location of the camera is given in space A, so it is not a valid translation for space B; you need to multiply this location by the rotation transform first.
The only remaining open question is: why the dot products?
Well, if you write the three dot products out on paper, you'll discover that dotting with X, Y and Z is exactly a multiplication by the rotation matrix.
An example for that fourth row/column would be taking the zero point, (0,0,0), in world space. It is not the zero point in camera space, so you need to know its representation in camera space, since rotation and scale leave it at zero!
cheers

It is necessary to put the eye point in your axis space, not in world space. When you dot a vector with a coordinate unit basis vector (one of the x, y, z axes), it gives you the coordinate of the eye along that axis. You transform a location by applying the translations last, in this case in the last row. Then moving the eye backwards, with a negative sign, is equivalent to moving all the rest of the space forwards. Just like moving up in an elevator makes you feel like the rest of the world is dropping out from underneath you.
Using a left-handed matrix, with the translation in the last row instead of the last column, is a religious difference which has absolutely nothing to do with the answer. That said, it is a dogma I would strictly avoid: it is best to chain global-to-local (forward kinematic) transforms left-to-right, in a natural reading order, when drawing tree sketches, and left-handed matrices force you to write these right-to-left.

Related

View matrix: to invert rotation or to not invert rotation?

Edit: My question may be too complex for what I am really asking, so skip to the TLDR if you need it.
I have been programming 3D graphics for a while now and up until now I never seemed to have this issue, but maybe this is the first time I really understand things like I should (or not). So here's the question...
My 3D engine uses the typical OpenGL legacy convention of a RH coordinate system, which means X+ is right, Y+ is up and Z+ is towards the viewer, Z- goes into the screen. I did this so that I could test my 3D math against the one in OpenGL.
To cope with the coordinate convention of Blender 3D/Collada, I rotate every matrix during importing by -90 degrees about the X axis. (Collada uses X+ right, Y+ forward, Z+ up, if I am not mistaken.)
When I just use the projection matrix, an identity view matrix and a model matrix that transforms a triangle to position at (0, 0, -5), I will see it because Z- is into the screen.
In my (6DOF space) game, I have spaceships and asteroids. I use double-precision coordinates (because they are huge) and by putting the camera inside a spaceship, the coordinates are made relative every frame so they are precise enough to fit a single-precision coordinate for rendering on the GPU.
So, now I have a spaceship, the camera is inside, and its rotation quaternion is identity. This gives an identity matrix and, if I recall correctly, rows 1-3 represent the X, Y and Z axes of where the object is pointing. To move the ship, I use this Z axis to go forward. With the identity matrix, the Z-axis will be (0, 0, 1).
Edit: actually, I don't take the axes from the matrix, I extract them directly from the quaternion.
Now, when I put the camera in the spaceship, this means that its nose is pointing at (0, 0, 1), but OpenGL will render with -1 going into the screen because of its conventions.
I always heard that when you put the camera inside an object in your scene, you need to take the model matrix and invert it. It's logical: if the ship is at (0, 0, 1000) and an asteroid is at (0, 0, 1100), then it makes sense to put the camera at (0, 0, -1000) so that the ship will be at (0, 0, 0) and the asteroid at (0, 0, 100).
So when I do this, the ship is rendered with its nose looking at Z-, but now, when I start moving, my ship moves along its (still identity) rotation's Z axis, (0, 0, 1), and the ship backs up instead of going forward. Which makes sense if (0, 0, 1) is towards the viewer...
So now I am confused... how should I handle this correctly? Which convention did I use incorrectly? Which convention did I forget? It doesn't seem logical, for example, to invert the rotation of the ship when calculating the movement vectors...
Can someone clarify this for me? This has been bothering me for a week now and I don't seem to get it straight without doubting that I am making new errors.
Edit: isn't it very strange to invert the rotational part of the model's matrix for a view matrix? I understand that the translation part should be inverted, but the view should still look in the same direction as the object when it is rendered, no?
TLDR;
If you take legacy OpenGL, set a standard projection matrix and an identity modelview matrix and render a triangle at (0, 0, -5), you will see it because OpenGL looks at Z-.
But if you take the Z-axis from the view matrix (third row), which is (0, 0, 1) in an identity matrix, this means that going 'forward' takes you further away from that triangle, which looks illogical.
What am I missing?
Edit: As the answer is hidden in many comments below, I summarize it here: conventions! I chose to use the OpenGL legacy convention but I also chose to use my own physics convention and they collide, so I need to compensate for that.
Edit: After much consideration, I have decided to abandon the OpenGL legacy convention and use whatever looks most logical to me, which is the left-handed system.
I think the root cause of your confusion might lie here:
So, now I have a spaceship, the camera is inside, and its rotation quaternion is identity. This gives an identity matrix and, if I recall correctly, rows 1-3 represent the X, Y and Z axes of where the object is pointing. To move the ship, I use this Z axis to go forward. With the identity matrix, the Z-axis will be (0, 0, 1).
Since we can assume that a view matrix contains only rotations and translations (no scaling/shear or perspective tricks), we know that the upper-left 3x3 sub-matrix is a pure rotation, and rotations are orthogonal by definition, so inverse(mat3(view)) equals transpose(mat3(view)); that is where your rows are coming from. In a standard matrix used to transform objects in a fixed coordinate frame (as opposed to moving the frame of reference), the columns simply show where the unit vectors for x, y and z (as well as the origin (0,0,0,1)) are mapped by the matrix. By taking the rows, you are using the transpose, which, in this particular setup, is the inverse (not considering the last column containing the translation, of course).
The view matrix transforms from world space into eye space. As a result, inverse(view) transforms from eye space back to world space.
So, inverse(view) * (1,0,0,0) will give you the camera's right vector in world space and inverse(view) * (0,1,0,0) the up vector, but per convention the camera looks at -z in eye space, so the forward direction in world space is inverse(view) * (0,0,-1,0), which, in your setup, is just the third row of the matrix negated.
(Camera position will be inverse(view) * (0,0,0,1) of course, but we have to do a bit more than just transposing to get the fourth column of inverse(view) right).
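As a sketch of the above in NumPy (column-vector convention, translation in the last column, and the view assumed rigid):

import numpy as np

def camera_vectors(view):
    R = view[:3, :3]                              # orthogonal for a rigid view
    right    = R.T @ np.array([1.0, 0.0, 0.0])    # == first row of R
    up       = R.T @ np.array([0.0, 1.0, 0.0])    # == second row of R
    forward  = R.T @ np.array([0.0, 0.0, -1.0])   # camera looks down -z in eye space
    position = -R.T @ view[:3, 3]                 # the inverse must also undo the translation
    return right, up, forward, position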

OpenGL : equation of the line going through a point defined by a 4x4 matrix ? (camera for example)

I would like to know the set of 3 equations (in world coordinates) of the line going through my camera (perpendicular to the camera screen). The position and rotation of my camera in world coordinates are defined by a 4x4 matrix.
Any idea?
A parametric line is simple: just extract the Z-axis direction vector and the origin point O from the direct camera matrix (see the link below on how to do it). Then any point P on your line is defined as:
P(t) = O + t*Z
where t is your parameter. The camera view direction is usually -Z for an OpenGL perspective, in which case:
t ∈ (-inf, 0]
Depending on your projection you might want to use:
t ∈ [-z_far, -z_near]
The problem is that there are many combinations of conventions. You need to know whether your matrix uses row-major or column-major order (so you know whether the direction vectors and origin are in rows or columns). Also, the camera matrix in graphics is usually the inverse one, so you may need to invert it first. For more info about this see:
Understanding 4x4 homogenous transform matrices
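A minimal NumPy sketch, assuming a column-major direct camera matrix M (camera-to-world, translation in the last column); if you only have the view (world-to-camera) matrix, invert it first with np.linalg.inv:

import numpy as np

def point_on_camera_line(M, t):
    O = M[:3, 3]                # camera origin in world space
    Z = M[:3, 2]                # camera Z axis in world space
    return O + t * Z            # view direction is -Z for OpenGL, so visible
                                # points have t in [-z_far, -z_near]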

3D to 2D - moving camera

I'm trying to make a 3D engine to see how it works (I love to know how things exactly work). I heard that they put the camera somewhere and move and rotate the whole world; that was easy to make, and the only hard thing was writing a function to multiply matrices.
But I want to move the camera and keep the world at its position. I saw some people doing it and I actually prefer it. When I tried to make it myself, I faced a simple mathematical problem.
To understand the problem, I will convert 2D to 1D instead of 3D to 2D (same thing).
Look at this picture:
Now I have a camera position (x, y) and its direction vector (the blue point), normalized to length 1.
And I made another vector from the camera to the object and divided it by the distance to get a unit vector (the white point).
Now the distance between the two vectors (d) is the displacement I need to draw a point on the screen. But the problem is which direction: distance is always positive, so I have to determine whether the second point is to the right of the direction vector or to the left. It's very easy for the eye but very hard in code.
When I tried to compare (y2-y1)/(x2-x1), I got incorrect results when one vector is in one quadrant and the other vector is in another quadrant (quadrants of the coordinate plane).
So how do I compare those two vectors to see where one is relative to the other?
I also tried atan2 and got some incorrect results, and I think atan2 will be slow anyway because I'd have to call it twice for every 3D point.
If there is an easier way, tell me.
I used many words to describe my question because I only know the simple words in English.
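One common way to settle the left/right question without atan2 (not from the original post, just the standard trick) is the sign of the 2D cross product of the two direction vectors:

def side(dir_x, dir_y, to_obj_x, to_obj_y):
    # > 0: the object direction is to the left of the view direction,
    # < 0: to the right, == 0: straight ahead or directly behind.
    return dir_x * to_obj_y - dir_y * to_obj_x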

Finding absolute coordinates from relative coordinates in 3D space

My question is fairly difficult to explain, so please bear with me. I have a random object with Forward, Right, and Up vectors. Now, imagine this particular object is rotated randomly about all three axes. How would I go about finding the REAL coordinates of a point relative to the newly rotated object?
Example:
How would I, for instance, find the forward-most corner of the cube given its Forward, Right, and Up vectors (as well as its coordinates, obviously), assuming that the colored axis is the 'real' axis?
The best I could come up with is:
x=cube.x+pointToFind.x*(forward.x+right.x+up.x)
y=cube.y+pointToFind.y*(forward.y+right.y+up.y)
z=cube.z+pointToFind.z*(forward.z+right.z+up.z)
This worked sometimes, but failed when one of the coordinates for the point was 0 for obvious reasons.
In short, I don't know what to do, or really how to accurately describe what I'm trying to do... This is less of a programming question and more of a general math question.
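For the general relative-to-absolute conversion being asked about, each local coordinate should scale its own (unit) basis vector, rather than a sum of components as in the formula above. A minimal NumPy sketch with hypothetical names:

import numpy as np

def local_to_world(origin, right, up, forward, p_local):
    # Change of basis: sum each basis vector scaled by its local coordinate.
    return origin + p_local[0] * right + p_local[1] * up + p_local[2] * forward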
In general, you would have to project all corners of the object, one after the other, onto the target direction (i.e., compute the scalar or dot product of the two vectors) and keep the point delivering the maximum value.
Because of the special structure of the cube, several simplifications are possible. You can rotate the target direction vector into the local frame; then the maximal projection can be read off the signs of its local coordinates. If the sign of a coordinate is positive, the scalar product is maximized by maximizing the corresponding cube coordinate to 1. If the sign is negative, it is maximized by minimizing the cube coordinate to 0.
Inverse rotation is the same as forming dot products with the columns of the rotation matrix (forward, right, up), so
result = zero-vector;                 // start at the zero corner of the cube
if( dot( target, forward ) > 0 )
    result += forward;                // step one unit along each local axis
if( dot( target, up ) > 0 )
    result += up;                     // whose projection onto target is positive
if( dot( target, right ) > 0 )
    result += right;
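For example, in NumPy, with the cube yawed 45 degrees and a target direction tilted slightly upward (my own numbers, just to exercise the idea):

import numpy as np

s = np.sqrt(0.5)
right   = np.array([  s, 0.0,  -s])
up      = np.array([0.0, 1.0, 0.0])
forward = np.array([  s, 0.0,   s])
target  = np.array([0.0, 0.2, 1.0])    # need not be unit length for sign tests

corner = np.zeros(3)
for axis in (right, up, forward):
    if target @ axis > 0:
        corner += axis
print(corner)                          # [0.707.. 1. 0.707..], the forward-most corner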

Area of polygon - clockwise

From this thread, Determine the centroid of multiple points, I came to know that the area of a polygon can also be negative if we start in the clockwise direction. Why can it be negative?
It is a product of the maths. You can use the sign if you wish to, or use an absolute value for the area.
You often get a similar effect with dot products and cross products. This can be useful, for example, in determining the orientation of a polygon in 3D (does the 'outside' side of the polygon face towards me or away from me?).
The sign tells you some useful information that you can either use or discard. For example, what is the area below the curve sin(x) and above the x axis, for x over the interval [0,pi]? Yes, this is simply a definite integral. In MATLAB, I'd do it as:
>> quad(@sin,0,pi)
ans =
2
But suppose I computed that same definite integral, with limits of integration [pi,0]? Clearly, we would get -2.
>> quad(@sin,pi,0)
ans =
-2
And of course this makes sense. In either case, we can assure that we get the positive area by ignoring the sign. But the sign tells us something in that integral.
The sign computed for the area of a polygon is indeed useful in some problems. In the case of a triangle, a cross product will yield a vector that points in the direction orthogonal to the plane containing the two edge vectors. The magnitude of that vector will be twice the area of the triangle. Note that this vector can point in one of two directions orthogonal to a given plane; which one is indicated by the right-hand rule. You can think of the area's sign as indicating which way that vector pointed.
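For the polygon case, the shoelace formula shows the sign directly: vertices in counterclockwise order (x right, y up) give a positive area, clockwise order gives a negative one. A small NumPy sketch:

import numpy as np

def signed_area(pts):
    # Shoelace formula; pts is an (n, 2) array of polygon vertices in order.
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - y * np.roll(x, -1))

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])   # counterclockwise
print(signed_area(square))          #  1.0
print(signed_area(square[::-1]))    # -1.0 for the same square walked clockwise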
