I am trying to do / understand all the basic mathematical computations needed in the graphics pipeline to render a simple 2D image from a 3D scene description like VRML. Is there a good example of the steps needed, like model transformation (object coordinates to world coordinates), view transformation (from world coordinate to view coordinate), calculation of vertex normals for lighting, clipping, calculating the screen coordinates of objects inside the view frustum and creating the 2D projection to calculate the individual pixels with colors.
I am used to OpenGL-style render math, so I will stick to it (all renderers use almost the same math).
First, some terms to explain:
Transform matrix
Represents a coordinate system in 3D space
double m[16]; // it is 4x4 matrix stored as 1 dimensional array for speed
m[0]=xx; m[4]=yx; m[ 8]=zx; m[12]=x0;
m[1]=xy; m[5]=yy; m[ 9]=zy; m[13]=y0;
m[2]=xz; m[6]=yz; m[10]=zz; m[14]=z0;
m[3]= 0; m[7]= 0; m[11]= 0; m[15]= 1;
X(xx,xy,xz) is unit vector of X axis in GCS (global coordinate system)
Y(yx,yy,yz) is unit vector of Y axis in GCS
Z(zx,zy,zz) is unit vector of Z axis in GCS
P(x0,y0,z0) is origin of represented coordinate system in GCS
Transformation matrix is used to transform coordinates between GCS and LCS (local coordinate system)
LCS -> GCS: Ag = m * Al;
GCS -> LCS: Al = (m^-1) * Ag;
Al (x,y,z,w=1) is a 3D point in LCS ... in homogeneous coordinates
Ag (x,y,z,w=1) is a 3D point in GCS ... in homogeneous coordinates
the homogeneous coordinate w=1 is added so we can multiply a 3D vector by a 4x4 matrix
m transformation matrix
m^-1 inverse transformation matrix
In most cases m is orthonormal, which means the X,Y,Z vectors are perpendicular to each other and have unit size. This can be used to restore matrix accuracy after many rotations, translations, etc ...
For more info see Understanding 4x4 homogenous transform matrices
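As a rough illustration of the layout above, here is a minimal sketch of both conversions. It assumes column vectors and that m maps local coordinates to global ones (its columns hold the LCS axes and origin expressed in GCS). The function names l2g and g2l are made up for this example, and the inverse shortcut is only valid when X,Y,Z are orthonormal.

```cpp
// Column-major 4x4 layout as in the text:
// m[0..2] = X axis, m[4..6] = Y axis, m[8..10] = Z axis, m[12..14] = origin.

// LCS -> GCS: ag = m * al for a point (w = 1 is implied).
void l2g(const double m[16], const double al[3], double ag[3])
{
    for (int i = 0; i < 3; ++i)
        ag[i] = m[i] * al[0] + m[4 + i] * al[1] + m[8 + i] * al[2] + m[12 + i];
}

// GCS -> LCS without computing a general inverse.
// Assumption: X, Y, Z are orthonormal, so the inverse rotation is the transpose.
void g2l(const double m[16], const double ag[3], double al[3])
{
    // Remove the origin offset first.
    const double d[3] = { ag[0] - m[12], ag[1] - m[13], ag[2] - m[14] };
    // Then take the dot product with each axis vector.
    for (int i = 0; i < 3; ++i)
        al[i] = m[4 * i] * d[0] + m[4 * i + 1] * d[1] + m[4 * i + 2] * d[2];
}
```

For a non-orthonormal m (scale or shear) a full 4x4 inverse would be needed instead of g2l.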
Render matrices
These matrices are usually used:
model - represents actual rendered object coordinate system
view - represents camera coordinate system (Z axis is the view direction)
modelview - model and view multiplied together
normal - the same as modelview but with x0,y0,z0 = 0, used for normal vector computations
texture - manipulates texture coordinates for easy texture animation and effects; usually a unit matrix
projection - represents the projection of the camera view (perspective, ortho, ...). It should not include any rotations or translations; it is more like camera sensor calibration (otherwise fog and other effects will fail ...)
The rendering math
To render a 3D scene you need 2D rendering routines, such as drawing a 2D textured triangle. The renderer converts the 3D scene data to 2D and renders it. There are more techniques out there, but the most usual is boundary model representation + boundary rendering (surface only). The 3D -> 2D conversion is done by projection (orthogonal or perspective) together with a Z-buffer or Z-sorting.
Z-buffer is easy and native to today's gfx HW
Z-sorting is done by the CPU instead, so it is slower and needs additional memory, but it is necessary for correct rendering of transparent surfaces.
So the pipeline looks like this:
obtain actual rendered data from model
Vertex v
Normal n
Texture coord t
Color,Fog coord, etc...
convert it to appropriate space
v=projection*view*model*v ... camera space + projection
n=normal*n ... global space
t=texture*t ... texture space
clip data to screen
This step is not necessary, but it prevents rendering off-screen stuff (for speed), and face culling is usually done here too. If the normal vector of the rendered 'triangle' is opposite to the polygon winding rule that is set, then the 'triangle' is ignored.
render the 3D/2D data
use only the v.x,v.y coordinates for screen rendering and v.z for the z-buffer test/value. The perspective division for perspective projections also goes here.
Z-buffer works like this: the Z-buffer (zed) is a 2D array with the same size (resolution) as the screen (scr). Any pixel scr[y][x] is rendered only if (zed[y][x]>=z); in that case scr[y][x]=color; zed[y][x]=z; The if condition can be different (it is changeable).
When triangles or higher primitives are used for rendering, the resulting 2D primitives are converted to pixels in a process called rasterization, for example like this:
Algorithm to fill triangle
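The Z-buffer test described above can be sketched roughly like this. The FrameBuffer struct and its plot method are hypothetical names for this example only; the depth buffer is assumed to start at +infinity and to use the zed>=z condition from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical software frame buffer: scr holds colors, zed holds depths.
struct FrameBuffer {
    int w, h;
    std::vector<std::uint32_t> scr;
    std::vector<float> zed;

    FrameBuffer(int w_, int h_)
        : w(w_), h(h_),
          scr(static_cast<std::size_t>(w_) * h_, 0u),
          zed(static_cast<std::size_t>(w_) * h_,
              std::numeric_limits<float>::infinity()) {}

    // Writes the pixel only if it passes the depth test; returns true if written.
    bool plot(int x, int y, float z, std::uint32_t color)
    {
        if (x < 0 || x >= w || y < 0 || y >= h) return false; // off screen
        const std::size_t i = static_cast<std::size_t>(y) * w + x;
        if (zed[i] >= z) {          // nearer (or equal) than what is stored
            scr[i] = color;
            zed[i] = z;
            return true;
        }
        return false;
    }
};
```

A rasterizer would call plot once per covered pixel with the interpolated z of the primitive.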
For more clarity here is how it looks like:
Transformation matrices are multiplicative, so if you need to transform N points by M matrices you can create a single matrix = m1*m2*...mM and transform the N points by this resulting matrix only (for speed). Sometimes a 3x3 transform matrix + shift vector is used instead of a 4x4 matrix. It is faster in some cases, but you cannot multiply several transformations together as easily. For transformation matrix manipulation, look for basic operations like Rotate or Translate. There are also matrices for rotations inside LCS, which are more suitable for human control input, but these are not native to renderers like OpenGL or DirectX (because they use the inverse matrix).
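To make the "single combined matrix" idea concrete, here is a small sketch, assuming the column-major layout and column vectors used earlier. mat_mul and mat_apply are names invented for this example.

```cpp
// c = a * b for column-major 4x4 matrices (c must not alias a or b).
// Element (row, col) lives at index 4*col + row.
void mat_mul(const double a[16], const double b[16], double c[16])
{
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            double s = 0.0;
            for (int k = 0; k < 4; ++k)
                s += a[4 * k + row] * b[4 * col + k];
            c[4 * col + row] = s;
        }
}

// out = m * p for a point p = (x, y, z, 1); assumes the last row of m is (0,0,0,1).
void mat_apply(const double m[16], const double p[3], double out[3])
{
    for (int i = 0; i < 3; ++i)
        out[i] = m[i] * p[0] + m[4 + i] * p[1] + m[8 + i] * p[2] + m[12 + i];
}
```

With column vectors the rightmost matrix is applied first, so translate*scale scales the point and then translates it, while scale*translate gives a different result. The order of multiplication matters.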
Now all the above stuff was for standard polygonal rendering (surface boundary representation of objects). There are also other renderers out there, like volumetric rendering or (back) ray-tracers and hybrid methods. Also, the scene can have any dimensionality, not just 3D. Here are some related QAs covering these topics:
GLSL 3D Volumetric back raytracer
GLSL 3D Mesh back raytracer
2D Doom/Wolfenstein techniques
4D Rendering techniques
Comanche Voxel space ray casting
You can have a look at Chapter 15 of the book Computer Graphics: Principles and Practice - Third Edition by Hughes et al. That chapter
Derives the ray-casting and rasterization algorithms and then builds
the complete source code for a software ray-tracer, software
rasterizer, and hardware-accelerated rasterization renderer.
I've recently been venturing into conversion of 3D points in space to a 2D pixel position on a screen, and almost every single answer I've found has been something like "do X with your world-to-camera matrix, and multiply by your viewport height to get it in pixels".
Now, that's all fine and good, but oftentimes these questions were about programming for video game engines, where a function to get a camera's view matrix is often built into a library and called on-command. But in my case, I can't do that - I need to know how to, given an FOV (say, 78 degrees) and a position and angle (of the format pitch = x, yaw = y, roll = z) it's facing, calculate the view matrix of a virtual camera.
Does anybody know what I need to do? I'm working with Lua (with built-in userdata for things like 3D vectors, angles, and 4x4 matrices exposed via the C interface), if that helps.
I am using gluPerspective
fovw,fovh // are FOV in width and height of screen angles [rad]
zn,zf // are znear,zfar distances from focal point of camera
When using FOVy notation from OpenGL then:
aspect = width/height
fovh = FOVy
fovw = FOVx, where tan(FOVx/2) = aspect*tan(FOVy/2) (FOVx = FOVy*aspect is only a small-angle approximation)
So just fill your 4x4 matrix with the values in the order defined by the notation you use (column or row major order).
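For reference, here is a sketch of the matrix that gluPerspective is documented to build, followed by the perspective divide. It assumes column-major storage, column vectors, and a camera looking down -Z. The function names are made up for this example, and fovy is taken in radians here (gluPerspective itself takes degrees).

```cpp
#include <cmath>

// Fills m with a gluPerspective-style projection matrix (column-major).
// fovy is the vertical field of view in radians; zn, zf are positive distances.
void perspective(double fovy, double aspect, double zn, double zf, double m[16])
{
    const double f = 1.0 / std::tan(0.5 * fovy);   // cotangent of half the FOV
    for (int i = 0; i < 16; ++i) m[i] = 0.0;
    m[0]  = f / aspect;
    m[5]  = f;
    m[10] = (zf + zn) / (zn - zf);
    m[11] = -1.0;                                  // copies -z into w
    m[14] = 2.0 * zf * zn / (zn - zf);
}

// Projects a camera-space point and applies the perspective divide.
// Returns false when w <= 0 (point at or behind the camera plane).
bool project(const double m[16], const double p[3], double ndc[3])
{
    double c[4];
    for (int i = 0; i < 4; ++i)
        c[i] = m[i] * p[0] + m[4 + i] * p[1] + m[8 + i] * p[2] + m[12 + i];
    if (c[3] <= 0.0) return false;
    for (int i = 0; i < 3; ++i) ndc[i] = c[i] / c[3];
    return true;
}
```

After the divide, visible points end up in the range -1..+1 on all three axes; mapping x,y to pixels is then just a scale and offset by the viewport size.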
I get the feeling you are writing a SW renderer on your own, so do not forget to do the perspective divide! Also take a look at the matrix link above and at:
3D graphic pipeline
I would like to know what is the set of 3 equations (in the world coordinates) of the line going through my camera (perpendicular to the camera screen). The position and rotation of my camera in the world coordinates being defined by a 4x4 matrix.
Any idea?
The parametric line is simple: just extract the Z axis direction vector and the origin point O from the direct camera matrix (see the link below on how to do it). Then any point P on your line is defined as:
P(t) = O + t*Z
where t is your parameter. The camera view direction is usually -Z for OpenGL perspective; in such a case:
t = (-inf,0>
Depending on your projection you might want to use:
t = <-z_far,-z_near>
The problem is there are many combinations of conventions. So you need to know if you have row major or column major order of your matrix (so you know if the direction vectors and origins are in rows or columns). Also camera matrix in gfx is usually inverse one so you need to invert it first. For more info about this see:
Understanding 4x4 homogenous transform matrices
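A minimal sketch of the extraction, assuming a column-major array holding the direct (camera to world) matrix rather than the inverted view matrix. Ray, view_axis and point_on_ray are hypothetical names for this example.

```cpp
// Line through the camera along its Z axis: P(t) = o + t*d.
struct Ray {
    double o[3];   // camera origin in world coordinates
    double d[3];   // camera Z axis in world coordinates
};

// cam is assumed column-major: Z axis in cam[8..10], origin in cam[12..14].
Ray view_axis(const double cam[16])
{
    Ray r;
    for (int i = 0; i < 3; ++i) {
        r.o[i] = cam[12 + i];
        r.d[i] = cam[8 + i];
    }
    return r;
}

// Evaluates P(t); use negative t when the camera looks down -Z.
void point_on_ray(const Ray& r, double t, double p[3])
{
    for (int i = 0; i < 3; ++i)
        p[i] = r.o[i] + t * r.d[i];
}
```

If your matrix is stored row-major, the same vectors sit in the rows instead, so the indices would change accordingly.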
This is related to a problem described in another question (images there):
Opengl shader problems - weird light reflection artifacts
I have a .obj importer that creates a data structure and calculates the tangents and bitangents. Here is the data for the first triangle in my object:
My understanding of tangent space is that the normal points outward from the vertex, the tangent is perpendicular (orthogonal?) to the normal vector and points in the direction of positive S in the texture, and the bitangent is perpendicular to both. I'm not sure what you call it but I thought that these 3 vectors formed what would look like a rotated or transformed x,y,z axis. They wouldn't be 3 randomly oriented vectors, right?
Also my understanding: The normals in a normal map provide a new normal vector. But in tangent space texture maps there is no built in orientation between the rgb encoded normal and the per vertex normal. So you use a TBN matrix to bridge the gap and get them in the same space (or get the lighting in the right space).
But then I saw the object data... My structure has 270 vertices and all of them have a 0 for the Tangent Y. Is that correct for tangent data? Are these tangents in like a vertex normal space or something? Or do they just look completely wrong? Or am I confused about how this works and my data is right?
To get closer to solving my problem in the other question I need to make sure my data is right and my understanding on how tangent space lighting math works.
The tangent and bitangent vectors point in the direction of the S and T components of the texture coordinate (U and V for people not used to OpenGL terms). So the tangent vector points along S and the bitangent points along T.
So yes, these do not have to be orthogonal to either the normal or each other. They follow the direction of the texture mapping. Indeed, that's their purpose: to allow you to transform normals from model space into the texture's space. They define a mapping from model space into the space of the texture.
The tangent and bitangent will only be orthogonal to each other if the S and T components at that vertex are orthogonal. That is, if the texture mapping has no shearing. And while most texture mapping algorithms will try to minimize shearing, they can't eliminate it. So if you want an accurate matrix, you need a non-orthogonal tangent and bitangent.
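One common way to get such vectors is to solve, per triangle, for the T and B that map the texture-coordinate deltas onto the triangle edges. The sketch below shows that approach; it is not necessarily what your importer does. tangent_basis is a made-up name, and the results are left unnormalized.

```cpp
#include <cmath>

// Per-triangle tangent T (direction of +S) and bitangent B (direction of +T).
// p0..p2 are model-space positions, uv0..uv2 are (S, T) texture coordinates.
// Solves  e1 = ds1*T + dt1*B,  e2 = ds2*T + dt2*B  for T and B.
// Returns false when the triangle has no area in texture space.
bool tangent_basis(const double p0[3], const double p1[3], const double p2[3],
                   const double uv0[2], const double uv1[2], const double uv2[2],
                   double T[3], double B[3])
{
    const double ds1 = uv1[0] - uv0[0], dt1 = uv1[1] - uv0[1];
    const double ds2 = uv2[0] - uv0[0], dt2 = uv2[1] - uv0[1];
    const double det = ds1 * dt2 - ds2 * dt1;
    if (std::fabs(det) < 1e-12) return false;      // degenerate UV mapping
    for (int i = 0; i < 3; ++i) {
        const double e1 = p1[i] - p0[i];
        const double e2 = p2[i] - p0[i];
        T[i] = (e1 * dt2 - e2 * dt1) / det;
        B[i] = (e2 * ds1 - e1 * ds2) / det;
    }
    return true;
}
```

With an unsheared mapping this returns perpendicular vectors; with a sheared mapping T and B come out non-orthogonal, as described above.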
I need to project a 3D object onto a sphere's surface (uhm.. like casting a shadow).
AFAIR this should be possible with a projection matrix.
If the "shadow receiver" was a plane, then my projection matrix would be a 3D to 2D-plane projection, but my receiver in this case is a 3D spherical surface.
So given sphere1(centerpoint,radius),sphere2(othercenter,otherradius) and an eyepoint how can I compute a matrix that projects all points from sphere2 onto sphere1 (like casting a shadow).
Do you mean that given a vertex v you want the following projection:
v'= centerpoint + (v - centerpoint) * (radius / |v - centerpoint|)
This is not possible with a projection matrix. You could easily do it in a shader though.
Matrices are commonly used to represent linear operations, like projection onto a plane.
In your case, the resulting vertices aren't deduced from input using a linear function, so this projection is not possible using a matrix.
If the sphere1 is sphere((0,0,0),1), that is, the sphere of radius 1 centered at the origin, then you're in effect asking for a way to convert any location (x,y,z) in 3D to a corresponding location (x', y', z') on the unit sphere. This is equivalent to vector renormalization: (x',y',z') = (x,y,z)/sqrt(x^2+y^2+z^2).
If sphere1 is not the unit sphere, but is say sphere((a,b,c),R) you can do mostly the same thing:
(x',y',z') = R*(x-a,y-b,z-c) / sqrt((x-a)^2+(y-b)^2+(z-c)^2) + (a,b,c). This is equivalent to changing coordinates so the first sphere is the unit sphere, solving the problem, then changing coordinates back.
As people have pointed out, these functions are nonlinear, so the projection cannot be called a "matrix." But if you prefer for some reason to start with a projection matrix, you could project first from 3D to a plane, then from a plane to the sphere. I'm not sure if that would be any better though.
Finally, let me point out that linear maps don't produce division-by-zero errors, but if you look closely at the formulas above, you'll see that this map can. Geometrically, that's because it's hard to project the center point of a sphere to its boundary.
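The formula for a general sphere((a,b,c),R) could be coded roughly as follows, including a guard for the division-by-zero case. project_to_sphere is a hypothetical name; note that this is the radial projection through the sphere center, so it does not use the eyepoint.

```cpp
#include <cmath>

// Radially projects v onto the sphere with center c and radius R.
// Returns false when v coincides with the center (direction undefined).
bool project_to_sphere(const double v[3], const double c[3], double R, double out[3])
{
    const double d[3] = { v[0] - c[0], v[1] - c[1], v[2] - c[2] };
    const double len = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    if (len == 0.0) return false;              // the center has no projection
    const double s = R / len;
    for (int i = 0; i < 3; ++i)
        out[i] = c[i] + d[i] * s;
    return true;
}
```

The same few lines would work per vertex in a shader, since the operation is just a renormalization around the sphere center.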
In HLSL there's a lot of matrix multiplication and while I understand how and where to use them I'm not sure about how they are derived or what their actual goals are.
So I was wondering if there was a resource online that explains this, I'm particularly curious about what is the purpose behind multiplying a world matrix by a view matrix and a world+view matrix by a projection matrix.
You can get some info, from a mathematical viewpoint, on this wikipedia article or on msdn.
Essentially, when you render a 3d model to the screen, you start with a simple collection of vertices scattered in 3d space. These vertices all have their own positions expressed in "object space". That is, they usually have coordinates which have no meaning in the scene that is being rendered, but only express the relations between one vertex and the other of the same model.
For instance, the positions of the vertices of a model could only range from -1 to 1 (or similar, it depends on how the model has been created).
In order to render the model in the correct position, you have to scale, rotate and translate it to the "real" position in your scene. This position you are moving to is expressed in "world space" coordinates which also express the real relationships between vertices in your scene. To do so, you simply multiply each vertex' position with its World matrix. This matrix must be created to include the translation/rotation/scale parameters you need to apply, in order for the object to appear in the correct position in the scene.
At this point (after multiplying all vertices of all your models with a world matrix) your vertices are expressed in world coordinates, but you still cannot render them correctly because their position is not relative to your "view" (i.e. your camera). So, this time you multiply everything using a View matrix which reflects the position and orientation of the viewpoint from which you are rendering the scene.
All vertices are now in the correct position, but in order to simulate perspective you still have to multiply everything with a Projection matrix. This last multiplication determines how the position of the vertices changes based on distance from the camera.
And now finally all vertices, starting from their position in "object space", have been moved to the final position on the screen, where they will be rendered, rasterized and then presented.
Online resources: Direct3D Matrices, Projection Matrices, Direct3D Transformation, The Importance of Matrices in the DirectX API.