View matrix: to invert rotation or to not invert rotation? - math

Edit: My question may be too complex for what I am really asking, so skip to the TLDR; if you need it.
I have been programming 3D graphics for a while now and up until now I never seemed to have this issue, but maybe this is the first time I really understand things like I should (or not). So here's the question...
My 3D engine uses the typical OpenGL legacy convention of a right-handed coordinate system, which means X+ is right, Y+ is up and Z+ is towards the viewer; Z- goes into the screen. I did this so that I could test my 3D math against OpenGL's.
To cope with the coordinate convention of Blender 3D/Collada, I rotate every matrix during import by -90 degrees about the X axis. (Collada uses X+ right, Y+ forward, Z+ up, if I am not mistaken.)
When I just use the projection matrix, an identity view matrix and a model matrix that transforms a triangle to position at (0, 0, -5), I will see it because Z- is into the screen.
In my (6DOF space) game, I have spaceships and asteroids. I use double-precision coordinates (because the coordinates get huge) and, by putting the camera inside a spaceship, the coordinates are made relative every frame so they are precise enough to fit single-precision floats for rendering on the GPU.
So, now I have a spaceship, the camera is inside, and its rotation quaternion is identity. This gives an identity matrix and, if I recall correctly, rows 1-3 represent the X, Y and Z axes of where the object is pointing. To move the ship, I use this Z axis to go forward. With the identity matrix, the Z axis will be (0, 0, 1).
Edit: actually, I don't take the columns from the matrix, I extract the axes directly from the quaternion.
Now, when I put the camera in the spaceship, this means that its nose is pointing at (0, 0, 1), but OpenGL will render with Z- going into the screen because of its conventions.
I always heard that when you put the camera inside an object in your scene, you need to take the model matrix and invert it. It's logical: if the ship is at (0, 0, 1000) and an asteroid is at (0, 0, 1100), then it makes sense to put the camera at (0, 0, -1000) so that the ship ends up at (0, 0, 0) and the asteroid at (0, 0, 100).
So when I do this, the ship will be rendered with its nose looking at Z-, but now, when I start moving, my ship moves along its rotation's Z axis, which (still being identity) is (0, 0, 1), and the ship backs up instead of going forward. Which makes sense if (0, 0, 1) is towards the viewer...
So now I am confused... how should I handle this correctly? Which convention did I use incorrectly? Which convention did I forget? It doesn't seem logical, for example, to invert the rotation of the ship when calculating the movement vectors...
Can someone clarify this for me? This has been bothering me for a week now and I don't seem to get it straight without doubting that I am making new errors.
Edit: isn't it very strange to invert the rotational part of the model's matrix for a view matrix? I understand that the translation part should be inverted, but the view should still look in the same direction as the object when it is rendered, no?
TLDR;
If you take legacy OpenGL, set a standard projection matrix and an identity modelview matrix and render a triangle at (0, 0, -5), you will see it because OpenGL looks at Z-.
But if you take the Z axis from the view matrix (the third row/column), which is (0, 0, 1) for an identity matrix, then going 'forward' along it means getting further away from that triangle, which looks illogical.
What am I missing?
Edit: As the answer is hidden in many comments below, I summarize it here: conventions! I chose to use the OpenGL legacy convention but I also chose to use my own physics convention and they collide, so I need to compensate for that.
Edit: After much consideration, I have decided to abandon the OpenGL legacy convention and use whatever looks most logical to me, which is the left-handed system.

I think the root cause of your confusion might lie here
So, now I have a spaceship, the camera is inside, and its rotation quaternion is identity. This gives an identity matrix and, if I recall correctly, rows 1-3 represent the X, Y and Z axes of where the object is pointing. To move the ship, I use this Z axis to go forward. With the identity matrix, the Z axis will be (0, 0, 1).
Since we can assume that a view matrix contains only rotations and translations (no scaling/shear or perspective tricks), we know that the upper-left 3x3 sub-matrix is a pure rotation, and rotation matrices are orthogonal by definition, so inverse(mat3(view)) equals transpose(mat3(view)), which is where your rows are coming from. In a standard matrix used to transform objects within a fixed coordinate frame (as opposed to moving the frame of reference itself), the columns simply show where the unit vectors for x, y and z (as well as the origin (0,0,0,1)) are mapped by that matrix. By taking the rows, you use the transpose, which, in this particular setup, is the inverse (not considering the last column containing the translation, of course).
The view matrix transforms from world space into eye space. As a result, inverse(view) transforms from eye space back to world space.
So, inverse(view) * (1,0,0,0) will give you the camera's right vector in world space, and inverse(view) * (0,1,0,0) the up vector, but by convention the camera looks along -z in eye space, so the forward direction in world space is inverse(view) * (0,0,-1,0), which, in your setup, is just the third row of the matrix negated.
(Camera position will be inverse(view) * (0,0,0,1) of course, but we have to do a bit more than just transposing to get the fourth column of inverse(view) right).
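To illustrate, here is a minimal sketch using GLM (an assumption on my part; the question's engine does not necessarily use GLM) that extracts the camera basis vectors the way described above:

#include <glm/glm.hpp>

// Given a view matrix (world -> eye), recover the camera's axes and position in world space.
// The camera looks along -Z in eye space (OpenGL convention), hence the negated Z for "forward".
void cameraAxesFromView(const glm::mat4& view,
                        glm::vec3& right, glm::vec3& up, glm::vec3& forward, glm::vec3& position)
{
    glm::mat4 invView = glm::inverse(view);                  // eye -> world
    right    = glm::vec3(invView * glm::vec4(1, 0, 0, 0));
    up       = glm::vec3(invView * glm::vec4(0, 1, 0, 0));
    forward  = glm::vec3(invView * glm::vec4(0, 0, -1, 0));  // -Z, i.e. the negated third row of view
    position = glm::vec3(invView * glm::vec4(0, 0, 0, 1));
}

Since the rotational part is orthogonal, right, up and forward are just the (possibly negated) rows of the view matrix; only the position needs the full inverse.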

Related

DirectX negative W

I was really trying to find an answer to this very basic (at first sight) question.
For simplicity, the depth test is disabled during the following discussion (it doesn't make a big difference).
For example, we have a triangle (after transformation) with the following float4 coordinates:
top CenterPoint: (0.0f, +0.6f, 0.6f, 1f)
basic point1: (+0.4f, -0.4f, 0.4f, 1f),
basic point2: (-0.4f, -0.4f, 0.4f, 1f),
I'm sending float4 as input and use a pass-through vertex shader (no transforms), so I'm sure about the input. And the result is reasonable.
But what will we get if we start to move CenterPoint towards the camera position? In our case we don't have a camera, so we will move this point towards minus infinity.
I'm getting quite reasonable results as long as w (along with z) is positive.
For example, (0.0f, +0.006f, 0.006f, .01f) – looks the same.
But what if I use the following coordinates: (0.0f, -0.6f, -1f, -1f)?
(Note: we have to swap the points or change the rasterizer state to prevent culling.)
According to a huge number of resources, there is a test like -w < z < w, so the GPU should clip that point. And yes, in principle, I don't see the point. But the triangle is still visible! OK, according to a huge number of other resources (and my own understanding), there will be a division like (x/w, y/w, z/w), so the result should be (0, 0.6, 1). But what I'm getting is something different.
And even if that result makes some sense (one point is somewhere far away behind us), how does DirectX (or rather the GPU) really work in such cases (infinite points and negative w)?
It seems that I'm missing something very basic, but nobody seems to know it either.
[Added]: I want to note that a point with w < 0 is not a real input.
In real life such points are the result of matrix transformations and, according to the math (the math used in the standard DirectX SDK and elsewhere), they correspond to points that end up behind the camera.
And yes, that point is clipped, but the question is rather about the strange triangle that contains such a point.
[Brief answer]: Clipping is essentially not just z/w checking and division (see details below).
Theoretically, NDC depth is divided into two distinct areas. The following diagram shows these areas for znear = 1, zfar = 3. The horizontal axis shows view-space z and the vertical axis shows the resulting NDC depth for a standard projective transform:
We can see that the part between view-space z of 1 and 3 (znear, zfar) gets mapped to NDC depth 0 to 1. This is the part that we are actually interested in.
However, the part where view-space z is negative also produces positive NDC depth. Those parts result from fold-over. I.e., if you take a corner of your triangle and slowly decrease z (along with w), starting in the area between znear and zfar, you would observe the following:
we start between znear and zfar, everything is good
as soon as we pass znear, the point gets clipped because NDC depth < 0.
when we are at view-space z = 0, the point also has w = 0 and no valid projection.
as we decrease view-space z further, the point gets a valid projection again (starting at infinity) and comes back in with positive NDC depth.
However, this last part is the area behind the camera. So homogeneous clipping is performed such that this part is also clipped away by the znear clipping.
Check the old D3D9 documentation for the formulas and some more illustrative explanations here.
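To make the clipping rule concrete, here is a small sketch (my own illustration, not taken from the D3D documentation) of the per-vertex test done in homogeneous clip space before any division by w, using the D3D depth convention 0 <= z <= w:

#include <cstdio>

struct Float4 { float x, y, z, w; };

// A vertex lies inside the clip volume (D3D convention) only if all of these hold.
// The test happens BEFORE the perspective divide, which is why a vertex with w < 0
// is rejected even when z/w alone would look like a "valid" depth.
bool insideClipVolume(const Float4& v)
{
    return -v.w <= v.x && v.x <= v.w
        && -v.w <= v.y && v.y <= v.w
        &&  0.0f <= v.z && v.z <= v.w;    // OpenGL would use -w <= z <= w instead
}

int main()
{
    Float4 p{0.0f, -0.6f, -1.0f, -1.0f};              // the problematic point from the question
    std::printf("inside: %d\n", insideClipVolume(p)); // prints 0: the vertex is outside
    return 0;
}

Note that the triangle as a whole is not simply discarded: the hardware clips it against the near plane, producing replacement vertices with positive w, and only those are divided and rasterized, so the fold-over region never reaches the screen.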

How to deal with negative depth in 3D perspective projection

Background
This question is very similar to this question asked 3 years ago. Basically, I'm wanting to re-create a rudimentary first-person graphics engine as a learning experience.
So, say for example, that we're in a 3D space where z is representative of depth - x and y map to the x and y coordinates of the 2D space. If this coordinate system's origin is the camera, then a point at (0, 0, 1) would be located directly in front of the camera and a point at (0, 0, -1) would be located directly behind the camera.
Adding depth to this projection simply requires us to divide our x and y components by the depth (in this case, z). In practice, this makes sense to me and it appears to work.
Until...
...the depth becomes negative. If the depth is negative and you divide x and y by the depth, x and y's signs will change. We know that logically, however, this shouldn't be the case.
I've tried a few things so far :
Using the absolute value of depth - this wasn't ideal. Say there are points (1, 1, 4) and (1, 1, -4). These points would then theoretically project onto the same location.
Trying to approximate negative depths as small positive decimals. So, if we have a negative depth, we map it to a positive decimal number (between 0 and 1), allowing our x and y coordinates to stretch towards infinity. The larger the negative number, the closer to zero the representative positive decimal we'd calculate. I feel like this might be a potential solution, but I'm still struggling a little bit with the concept.
So, how do you handle negative depths in your perspective projections?
I'm very new to graphics, so if I'm omitting any information that's needed to answer this question, feel free to ask. I wanted to keep this implementation agnostic since I feel like this question tends more towards the theoretical aspect of perspective projection.
EDIT
This video identifies the problem I'm trying to solve. It's a great video and is also what inspired me to start this little project - but I'm just wondering if there was a generally 'agreed-upon' way to handle this particular case.
You are doing a point projection, which means that your projected point in 2D is exactly the point where the line between 3D object and 3D camera would pass through the canvas. For positive depth, that intersection is between object and camera. For negative depth, the intersection is beyond the camera. But it's still the same line, hence swapping signs makes perfect sense.
Of course, actually drawing stuff with negative depth doesn't make that much sense, since usually you won't see things behind your camera. And if you do, then you have some extremely wide-angle lens, so modelling the canvas as a plane in space is no longer accurate, and you'll have to switch to more complex projections to simulate fish-eye lenses and the like.
It might however be that you want to draw a triangle or other geometric primitive where just one of the corners has negative depth while the others are positive. The usual approach in such scenarios is to clip the object to the frustum, more particularly to intersect it with the near plane of the frustum, thus getting rid of all points with negative depth. Usually your graphics pipeline can take care of this clipping.
I will try to provide a more math-y answer for anyone interested.
The mathematical theory behind this is called projective geometry. You start with a three-dimensional space and then split it into equivalence classes, where two points a and b are equivalent if there is a factor f such that f*a == b. So, for example, (4, 4, 4) would be in the same class as (1, 1, 1), and (3, 6, 9) would be in the same class as (100, 200, 300). Geometrically speaking, you look at the set of straight lines through (0, 0, 0).
If you pick the point with z == 1 from every equivalence class you basically get a 2D space. This is exactly what "perspective projection" is. However, the equivalence classes for points like (1, 1, 0) do not have such a point. So what you actually get is a 2D space + some additional "points at infinity".
You can think of these points as a circle that goes around your coordinate system, but with an infinite radius. Also, opposite points are identical, so stuff that goes out on one end wraps around and comes back in on the opposite side. This means that straight lines are actually just circles that contain a point at infinity.
To make a concrete example. If you want to render a straight line from (1, 1, 4) to (1, 1, -4) you first normalize both of them to z == 1: (0.25, 0.25, 1) and (-0.25, -0.25, 1). But now when you draw the line between them, you need to go "the other way around", i.e. leave the screen in one direction and come back in at the opposite side. (You can skip the "come back in" part though because it is behind the camera.)
For implementation it is unfortunately not sufficient to map (1, 1, -4) to (inf, inf, 1) because that way there would be no way to know the slope of the line. You can either fake it by using a very large number instead of infinity or you can do it properly and handle these special cases throughout your code.
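As a rough sketch of the clipping approach (my own illustration; the vector type and the znear value are assumptions), here is how a line segment can be clipped against the near plane before the divide:

struct Vec3 { float x, y, z; };

// Clip the segment a-b against the plane z = znear, keeping the part with z >= znear.
// Returns false if the whole segment lies behind the camera.
// After clipping, both endpoints have positive depth and dividing x and y by z is safe.
bool clipToNearPlane(Vec3& a, Vec3& b, float znear)
{
    if (a.z < znear && b.z < znear) return false;   // fully behind: nothing to draw
    if (a.z >= znear && b.z >= znear) return true;  // fully in front: nothing to clip

    float t = (znear - a.z) / (b.z - a.z);           // where the segment crosses the plane
    Vec3 hit{ a.x + t * (b.x - a.x), a.y + t * (b.y - a.y), znear };
    if (a.z < znear) a = hit; else b = hit;          // replace the endpoint behind the plane
    return true;
}

The same idea extends to triangles, where clipping one edge may introduce an extra vertex and turn the triangle into a quad.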

Linear Algebra in Games in a 2D space

I am currently teaching myself linear algebra in games and I almost feel ready to use my new-found knowledge in a simple 2D space. I plan on using a math library with vectors/matrices etc. to represent positions and directions, unlike my last game, which was simple enough not to need it.
I just want some clarification on this issue. First, is it valid to express a position in 2D space as a 4-component homogeneous coordinate, like this:
[400, 300, 0, 1]
Here, I am assuming, for simplicity that we are working in a fixed resolution (and in screen space) of 800 x 600, so this should be a point in the middle of the screen.
Is this valid?
Suppose that this position represents the position of the player. If I used a vector, I could represent the direction the player is facing:
[400, 400, 0, 0]
So this vector would represent that the player is facing the bottom of the screen (if we are working in screen space).
Is this valid?
Lastly, if I wanted to rotate the player by 90 degrees, I know I would multiply the vector by a matrix/quaternion, but this is where I get confused. I know that quaternions are more efficient, but I'm not exactly sure how I would go about rotating the direction my player is facing.
Could someone explain the math behind constructing a quaternion and multiplying it by my facing vector?
I also heard that OpenGL and D3D represent vectors in a different manner, how does that work? I don't exactly understand it.
I am trying to start getting a handle on basic linear algebra in games before I step into a 3D space in several months.
You can represent your position as a 4D coordinate, however, I would recommend using only the dimensions that are needed (i.e. a 2D vector).
The direction is usually expressed as a vector that starts at the player's position and points in the corresponding direction. So a direction vector of (0, 1) would be much easier to handle.
Given that vector, you can use a rotation matrix. Quaternions are not really necessary in that case because you don't want to rotate about arbitrary axes; you just want to rotate about the z-axis. Your helper library should provide methods to create such a matrix and transform the vector with it (transform as a normal).
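For example, a minimal sketch of such a rotation about the z-axis (the types and the function are made up for illustration, not from any particular library):

#include <cmath>

struct Vec2 { float x, y; };

// Rotate a direction vector by `angle` radians about the z-axis
// (counter-clockwise with y up; in screen space with y pointing down it appears clockwise).
Vec2 rotate(Vec2 v, float angle)
{
    float c = std::cos(angle);
    float s = std::sin(angle);
    return { c * v.x - s * v.y,
             s * v.x + c * v.y };
}

// Rotating the facing direction (0, 1) by 90 degrees gives roughly (-1, 0):
// Vec2 newFacing = rotate({0.0f, 1.0f}, 3.14159265f / 2.0f);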
I am not sure about the difference between OpenGL's and D3D's representation of vectors, but I think it is all about memory layout, which should not be something you need to worry about.
I cannot answer all of your questions, but in terms of what is 'valid' or not, it completely depends on whether it contains all of the information that you need and whether it makes sense to you.
Furthermore, it is a little strange to have the direction an object is facing be a non-unit vector. You do not need the length of the vector to figure out the direction it is facing; you simply need the radians or degrees it has rotated from zero. Therefore people usually encode the angle in radians or degrees directly, as many linear algebra libraries will let you do vector math with it.

Flipping a quaternion from right to left handed coordinates

I need to flip a quaternion from right-handed coordinates where:
x = left to right
y = front to back
z = top to bottom
to left-handed coordinates where:
x = left to right
y = top to bottom
z = front to back
How would I go about doing this?
I don't think any of these answers is correct.
Andres is correct that quaternions don't have handedness (*). Handedness (or what I'll call "axis conventions") is a property that humans apply; it's how we map our concepts of "forward, right, up" to the X, Y, Z axes.
These things are true:
Pure-rotation matrices (orthogonal, determinant 1, etc) can be converted to a unit quaternion and back, recovering the original matrix.
Matrices that are not pure rotations (ones that have determinant -1, for example matrices that flip a single axis) are also called "improper rotations", and cannot be converted to a unit quaternion and back. Your mat_to_quat() routine may not blow up, but it won't give you the right answer (in the sense that quat_to_mat(mat_to_quat(M)) == M).
A change-of-basis that swaps handedness has determinant -1. It is an improper rotation: equivalent to a rotation (maybe identity) composed with a mirroring about the origin.
To change the basis of a quaternion, say from ROS (right-handed) to Unity (left-handed), we can use the method of https://stackoverflow.com/a/39519079/194921:
mat3x3 ros_to_unity = /* construct this by hand */;
mat3x3 unity_to_ros = ros_to_unity.inverse();
quat q_ros = ...;
mat3x3 m_unity = ros_to_unity * mat3x3(q_ros) * unity_to_ros ;
quat q_unity = mat_to_quat(m_unity);
Lines 1-4 are simply the method of https://stackoverflow.com/a/39519079/194921: "How do you perform a change-of-basis on a matrix?"
Line 5 is interesting. We know mat_to_quat() only works on pure-rotation matrices. How do we know that m_unity is a pure rotation? It's certainly conceivable that it's not, because unity_to_ros and ros_to_unity both have determinant -1 (as a result of the handedness switch).
The hand-wavy answer is that the handedness switches twice, so the result has no handedness switch: det(ros_to_unity) and det(unity_to_ros) are both -1, so det(m_unity) = (-1) * (+1) * (-1) = +1, making it a proper rotation again. The deeper answer has to do with the fact that similarity transformations preserve certain properties of the operator, but I don't have enough math to make the proof.
Note that this will give you a correct result, but you can probably do it more quickly if unity_to_ros is a simple matrix (say, with just an axis swap). But you should probably derive that faster method by expanding the math done here.
(*) Actually, there is the distinction between Hamilton and JPL quaternions; but everybody uses Hamilton so there's no need to muddy the waters with that.
I think that the solution is:
Given: Right Hand: {w,x,y,z}
Convert: Left Hand: {-w,z,y,x}
In Unity:
new Quaternion(rhQz,rhQy,rhQx,-rhQw)
OK, just to be clear: quaternions don't actually have handedness. They have no handedness (see the Wikipedia article on quaternions). HOWEVER, the conversion from a quaternion to a matrix does have a handedness associated with it. See http://osdir.com/ml/games.devel.algorithms/2002-11/msg00318.html
If your code performs this conversion, you may have to have two separate functions to convert to a left handed matrix or a right handed matrix.
Hope that helps.
Once you do that, you no longer have a quaternion, i.e. the usual rules for multiplying them won't work. The identity i^2 = j^2 = k^2 = ijk = -1 will no longer hold if you swap j and k (y and z in your right-handed system).
http://www.gamedev.net/community/forums/topic.asp?topic_id=459925
To paraphrase, negate the axis.
I know this question is old, but the method below is tested and works.
I used pyquaternion to manipulate the quaternions.
To go from right-handed to left-handed:
Find the axis and angle of the right hand quaternion.
Then convert the axis to left hand coordinates.
Negate the right hand angle to get the left hand angle.
Construct quaternion with left handed axis and left hand angle.
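A rough sketch of those four steps (mine, not from the original answer; it assumes the axis conversion for this particular pair of conventions is simply swapping y and z):

#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double w, x, y, z; };   // Hamilton convention, w first

// Build a quaternion from a normalized axis and an angle in radians.
Quat fromAxisAngle(Vec3 axis, double angle)
{
    double s = std::sin(angle * 0.5);
    return { std::cos(angle * 0.5), axis.x * s, axis.y * s, axis.z * s };
}

// Right-handed -> left-handed: extract axis and angle, remap the axis into the
// left-handed frame (swap y and z for this convention pair), negate the angle, rebuild.
Quat rightToLeftHanded(Quat q)
{
    double angle = 2.0 * std::acos(q.w);
    double s = std::sqrt(1.0 - q.w * q.w);
    Vec3 axis = (s < 1e-9) ? Vec3{1.0, 0.0, 0.0}              // near-zero rotation: axis is arbitrary
                           : Vec3{q.x / s, q.y / s, q.z / s};
    Vec3 lhAxis{ axis.x, axis.z, axis.y };                     // swap y and z
    return fromAxisAngle(lhAxis, -angle);                      // negate the angle
}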

Help me with Rigid Body Physics/Transformations

I want to instance a slider constraint, that allows a body to slide between point A and point B.
To instance the constraint, I assign the two bodies to constrain; in this case, one dynamic body constrained to the static world. Think of a sliding door.
The third and fourth parameters are transformations, reference Frame A and reference Frame B.
To create and manipulate Transformations, the library supports Quaternions, Matrices and Euler angles.
The default slider constraint slides the body along the x-axis.
My question is:
How do I set up the two transformations, so that Body B slides along an axis given by its own origin and an additional point in space?
Naively I tried:
frameA.setOrigin(origin_of_point); //since the world itself has origin (0,0,0)
frameA.setRotation(Quaternion(directionToB, 0 rotation));
frameB.setOrigin(0,0,0); //axis goes through origin of object
frameB.setRotation(Quaternion(directionToPoint,0))
However, Quaternions don't seem to work as I expected. My mathematical knowledge of them is not good, so if someone could fill me in on why this doesn't work, I'd be grateful.
What happens is that the body slides along an axis orthogonal to the direction. When I vary the rotational part in the Quaternion constructor, the body is rotated around that sliding direction.
Edit:
The framework is bullet physics.
The two transformations are how the slider joint is attached at each body in respect to each body's local coordinate system.
Edit2
I could also set the transformations' rotational parts through an orthogonal basis, but then I'd have to reliably construct an orthogonal basis from a single vector. I hoped quaternions would prevent this.
Edit3
I'm having some limited success with the following procedure:
btTransform trafoA, trafoB;
trafoA.setIdentity();
trafoB.setIdentity();
vec3 bodyorigin(entA->getTrafo().col_t);
vec3 thisorigin(trafo.col_t);
vec3 dir=bodyorigin-thisorigin;
dir.Normalize();
mat4x4 dg=dgGrammSchmidt(dir);
mat4x4 dg2=dgGrammSchmidt(-dir);
btMatrix3x3 m(
dg.col_x.x, dg.col_y.x, dg.col_z.x,
dg.col_x.y, dg.col_y.y, dg.col_z.y,
dg.col_x.z, dg.col_y.z, dg.col_z.z);
btMatrix3x3 m2(
dg2.col_x.x, dg2.col_y.x, dg2.col_z.x,
dg2.col_x.y, dg2.col_y.y, dg2.col_z.y,
dg2.col_x.z, dg2.col_y.z, dg2.col_z.z);
trafoA.setBasis(m);
trafoB.setBasis(m2);
trafoA.setOrigin(btVector3(trafo.col_t.x,trafo.col_t.y,trafo.col_t.z));
btSliderConstraint* sc=new btSliderConstraint(*game.worldBody, *entA->getBody(), trafoA, trafoB, true);
However, the Gram-Schmidt step always flips some axes of the trafoB matrix, and the door appears upside down or mirrored left to right.
I was hoping for a more elegant way to solve this.
Edit4
I found a solution, but I'm not sure whether this will cause a singularity in the constraint solver if the top vector aligns with the sliding direction:
btTransform rbat = rba->getCenterOfMassTransform();
btVector3 up(rbat.getBasis()[0][0], rbat.getBasis()[1][0], rbat.getBasis()[2][0]);
btVector3 direction = (rbb->getWorldTransform().getOrigin() - btVector3(trafo.col_t.x, trafo.col_t.y, trafo.col_t.z)).normalize();
btScalar angle = acos(up.dot(direction));
btVector3 axis = up.cross(direction);
trafoA.setRotation(btQuaternion(axis, angle));
trafoB.setRotation(btQuaternion(axis, angle));
trafoA.setOrigin(btVector3(trafo.col_t.x,trafo.col_t.y,trafo.col_t.z));
Is it possible you're making this way too complicated? It sounds like a simple parametric translation (x = p*A+(1-p)*B) would do it. The whole rotation / orientation thing is a red herring if your sliding-door analogy is accurate.
If, on the other hand, you're trying to constrain to an interpolation between two orientations, you'll need to set additional limits 'cause there is no unique solution in the general case.
-- MarkusQ
It would help if you could say what framework or API you're using, or copy and paste the documentation for the function you're calling. Without that kind of detail I can only guess:
Background: a quaternion represents a 3-dimensional rotation combined with a scale. (Usually you don't want the complications involved in managing the scale, so you work with unit quaternions representing rotations only.) Matrices and Euler angles are two alternative ways of representing rotations.
A frame of reference is a position plus a rotation. Think of an object placed at a position in space and then rotated to face in a particular direction.
So frame A probably needs to be the initial position and rotation of the object (when the slider is at one end), and frame B the final position and rotation of the object (when the slider is at the other end). In particular, the two rotations probably ought to be the same, since you want the object to slide rigidly.
But as I say, this is just a guess.
Update: is this Bullet Physics? It doesn't seem to have much in the way of documentation, does it?
Perhaps you are looking for slerp?
Slerp is shorthand for spherical linear interpolation, introduced by Ken Shoemake in the context of quaternion interpolation for the purpose of animating 3D rotation. It refers to constant-speed motion along a unit-radius great circle arc, given the ends and an interpolation parameter between 0 and 1.
At the end of the day, you still need the traditional rotation matrix to get things rotated.
Edit: So, I am still guessing, but I assume that the framework takes care of the slerping and you want the two transformations that describe the begin state and the end state?
You can stack affine transformations on top of each other, except you have to think backwards. For example, let's say the sliding door is placed at (1, 1, 1) facing east in the begin state and you want to slide it towards north by (0, 1, 0). The door would end up at (1, 1, 1) + (0, 1, 0).
For the begin state, rotate the door to face east. Then, on top of that, apply a translation matrix to move the door to (1, 1, 1). For the end state, again rotate the door to face east and move it to (1, 1, 1) with the same translation matrix. Then apply a further translation by (0, 1, 0).
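A minimal sketch of that stacking with GLM (my own illustration; the 90-degree angle for 'facing east' is an assumption):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Build the begin and end transforms for the sliding-door example.
// Matrices apply right-to-left, which is the "think backwards" part.
void buildDoorTransforms(glm::mat4& begin, glm::mat4& end)
{
    // Begin state: rotate to face east (angle assumed), then move to (1, 1, 1).
    glm::mat4 faceEast = glm::rotate(glm::mat4(1.0f), glm::radians(90.0f), glm::vec3(0, 0, 1));
    begin = glm::translate(glm::mat4(1.0f), glm::vec3(1, 1, 1)) * faceEast;

    // End state: the same door, additionally translated by (0, 1, 0) towards north.
    end = glm::translate(glm::mat4(1.0f), glm::vec3(0, 1, 0)) * begin;
}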
