I've recently been venturing into conversion of 3D points in space to a 2D pixel position on a screen, and almost every single answer I've found has been something like "do X with your world-to-camera matrix, and multiply by your viewport height to get it in pixels".
Now, that's all fine and good, but oftentimes these questions were about programming for video game engines, where a function to get a camera's view matrix is often built into a library and called on-command. But in my case, I can't do that - I need to know how to, given an FOV (say, 78 degrees) and a position and angle (of the format pitch = x, yaw = y, roll = z) it's facing, calculate the view matrix of a virtual camera.
Does anybody know what I need to do? I'm working with Lua (with built-in userdata for things like 3D vectors, angles, and 4x4 matrices exposed via the C interface), if that helps.
I am using gluPerspective
where:
fovw,fovh // are FOV in width and height of screen angles [rad]
zn,zf // are znear,zfar distances from focal point of camera
When using FOVy notation from OpenGL then:
aspect = width/height
fovh = FOVy
fovw = FOVx = 2*atan(aspect*tan(FOVy/2)) // FOVy*aspect is only a small-angle approximation
so just feed your 4x4 matrix with the values in order defined by notations you use (column or row major order).
I get the feeling you are doing a software render on your own, so do not forget to do the perspective divide! Also take a look at the matrix link above and at:
3D graphic pipeline
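As a rough illustration of the steps above, here is a minimal Python sketch, assuming the standard gluPerspective layout, column vectors (M * v), a camera looking down -Z, and pixel coordinates with the origin in the top-left corner. The function names `perspective` and `project_to_pixels` are made up for this example; a Lua version would follow the same arithmetic.

```python
import math

def perspective(fovy_rad, aspect, zn, zf):
    """gluPerspective-style 4x4 matrix, stored as rows, meant for column vectors."""
    f = 1.0 / math.tan(fovy_rad / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (zf + zn) / (zn - zf), 2.0 * zf * zn / (zn - zf)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def project_to_pixels(m, point, width, height):
    """Project a camera-space point to pixels; returns None if it is behind the camera."""
    v = (point[0], point[1], point[2], 1.0)
    cx, cy, cz, cw = (sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
    if cw <= 0.0:
        return None
    # Perspective divide: clip space -> normalized device coordinates in [-1, 1].
    ndc_x, ndc_y = cx / cw, cy / cw
    # Viewport mapping (assumption: pixel y grows downward).
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py
```

With a 78 degree vertical FOV and an 800x600 screen, a point straight ahead such as (0, 0, -10) lands in the middle of the screen. The point must already be in camera space, i.e. multiplied by the view (inverse camera) matrix first.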
I would like to know what is the set of 3 equations (in the world coordinates) of the line going through my camera (perpendicular to the camera screen). The position and rotation of my camera in the world coordinates being defined by a 4x4 matrix.
Any idea?
A parametric line is simple: just extract the Z axis direction vector Z and the origin point O from the direct camera matrix (see the link below on how to do it). Then any point P on your line is defined as:
P(t) = O + t*Z
where t is your parameter. The camera view direction is usually -Z for an OpenGL perspective; in that case:
t = (-inf,0>
Depending on your projection you might want to use:
t = <-z_far,-z_near>
The problem is that there are many combinations of conventions. You need to know whether your matrix is in row-major or column-major order (so you know whether the direction vectors and origin are in rows or columns). Also, the camera matrix used in graphics is usually the inverse one, so you need to invert it first. For more info about this see:
Understanding 4x4 homogenous transform matrices
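To make the extraction concrete, here is a small Python sketch. It assumes the direct (not inverted) camera matrix is indexed as cam[row][col] with the axis vectors and origin stored in columns; if your convention stores them in rows, swap the indices. The name `camera_axis_line` is hypothetical.

```python
def camera_axis_line(cam):
    """Return (origin, z_axis, point_at) for the line through the camera along its Z axis.

    cam is a 4x4 direct camera matrix as nested lists, cam[row][col],
    with basis vectors in columns 0..2 and the origin in column 3.
    """
    z_axis = (cam[0][2], cam[1][2], cam[2][2])
    origin = (cam[0][3], cam[1][3], cam[2][3])

    def point_at(t):
        # P(t) = O + t*Z
        return tuple(o + t * z for o, z in zip(origin, z_axis))

    return origin, z_axis, point_at
```

For a camera that looks along -Z, the visible part of the line corresponds to negative values of t.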
Is there a way to convert that data:
Object position which is a 3D point (X, Y, Z),
Camera position which is a 3D point (X, Y, Z),
Camera yaw, pitch, roll (-180:180, -90:90, 0)
Field of view (-45°:45°)
Screen width & height
into the 2D point on the screen (X, Y)?
I'm looking for proper math calculations according to this exact set of data.
It's difficult, but it's possible to do it for yourself.
There are lots of libraries that do this for you, but it is more satisfying if you do it yourself:
This problem is possible and I have written my own 3D engine to do this for objects in javascript using the HTML5 Canvas. You can see my code here and solve a 3D maze game I wrote here to try and understand what I will talk about below...
The basic idea is to work in steps. To start, forget about the camera angles (yaw, pitch and roll), as these come later, and just imagine you are looking down the y axis. Then calculate, using trig, the yaw and pitch angles to your object coordinate. By this I mean: imagine that you are looking through a letterbox; the yaw would be the angle left or right of the center line to your coordinate (so both positive and negative), and the pitch the angle up or down from it. Taking these angles, you can map them to the x and y 2D coordinate system.
The calculations for the angles are:
yaw = atan((coord.x - cam.x) / (coord.y - cam.y))
pitch = atan((coord.z - cam.z) / (coord.y - cam.y))
with coord.x, coord.y and coord.z being the coordinates of the object and the same for the cam (cam.x, cam.y and cam.z). These calculations also assume that you are using a Cartesian coordinate system with the axes being: z up, y forward and x right.
From here, the next step is to map these angles in the 3D world to a coordinate which you can use in a 2D graphical representation.
To map these angles onto your screen, you need to scale them up as distances from the mid line. This means multiplying them by your screen width / fov (with fov in the same angular unit as the angles). Finally, these distances will be positive or negative (as each is an angle from the mid line), so to actually draw the point on a canvas, you need to add it to half of the screen width (or height, for y).
So this would mean your canvas coordinate would be:
x = width / 2 + (yaw * (width / fov))
y = height / 2 + (pitch * (height / fov))
where width and height are the dimensions of your screen, fov is the camera's fov, and yaw and pitch are the respective angles of the object from the camera.
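The calculation above can be sketched in Python as follows. Here yaw is taken as the left/right angle and pitch as the up/down angle, everything is in radians, and the object is assumed to be in front of the camera (positive y difference). Two choices are assumptions of this sketch rather than part of the description above: `atan2` is used instead of a plain division to avoid dividing by zero, and the vertical offset is subtracted because canvas y usually grows downward. The name `angles_to_screen` is made up.

```python
import math

def angles_to_screen(coord, cam, width, height, fov):
    """Map a 3D point to canvas pixels via yaw/pitch angles (x right, y forward, z up)."""
    dx = coord[0] - cam[0]
    dy = coord[1] - cam[1]  # forward distance, assumed > 0
    dz = coord[2] - cam[2]
    yaw = math.atan2(dx, dy)    # angle left/right of the center line
    pitch = math.atan2(dz, dy)  # angle above/below the center line
    x = width / 2 + yaw * (width / fov)
    # Assumption: canvas y grows downward, so "up" in the world reduces y.
    y = height / 2 - pitch * (height / fov)
    return x, y
```

A point straight ahead maps to the center of the canvas, and a point at an angle of fov / 2 to the right maps to the right edge.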
You have now achieved the first big step, which is mapping a 3D coordinate down to 2D. If you have managed to get this all working, I would suggest trying multiple points and connecting them to form shapes. Also try moving your camera's position to see how the perspective changes; you will soon see how realistic it already looks.
In addition, if this worked fine for you, you can move on to having the camera be able to not only change its position in the 3D world but also change its perspective as in yaw, pitch and roll angles. I will not go into this entirely now, but the basic idea is to use 3D world transformation matrices. You can read up about them here but they do get quite complicated, however I can give you the calculations if you get this far.
It might help to read (old style) OpenGL specs:
https://www.khronos.org/registry/OpenGL/specs/gl/glspec14.pdf
See section 2.10
Also:
https://www.khronos.org/opengl/wiki/Vertex_Transformation
Might help with more concrete examples.
Also, for "proper math" look up 4x4 matrices, projections, and homogeneous coordinates.
https://en.wikipedia.org/wiki/Homogeneous_coordinates
I'm writing a .NET program that allows a user to register an image by identifying specific points on an image and then specifying the real world coordinates associated with each of those points.
http://www.ironbyte.ca/temp/mountain.jpg
The image registration process also requires the user to specify the coordinates of the camera.
What I'd like to be able to do after the image is registered is draw other points on the image based on their real-world coordinates.
I've done a great deal of reading on perspective projections but I'm struggling to get things working. I must admit that my math skills are not what they should be which is part of the struggle. Where I am getting stuck is trying to determine focal length and distance to the display surface:
Referred to as the Viewer's Position (e [x,y,z]) in this article: http://en.wikipedia.org/wiki/3D_projection#Perspective_projection
I've also been referring to this article as well:
http://www.shotlink.com/Tour/WebTemplate/shotlinknew.nsf/2c47cc31e412bc4985256e6e00287832/c1743b40acf6aa03852575b7007122b0/$FILE/Plotting%203D%20ShotLink%20Data%20on%202D%20Images.pdf
which extracts the focal length from the field of view, which appears to be known beforehand, but is not in my case.
So I guess my question then is, is there a way I can work in reverse to determine focal length and/or field of view based on the position of the known points on my image? Or am I looking at this the wrong way and maybe there is an easier way to accomplish the end goal?
EDIT: I got myself confused by mixing units on the schema. I thus reworked a bit my answer.
It sounds feasible to me, if we look at the maths behind the projection.
Here is a not-so-rigorous schema of the situation for the horizontal coordinate (I'm mixing real-world coordinates and pixel ones to try to illustrate your situation):
With:
D, one of the points given by the users, with (x,y,z) its position with respect to the relative coordinate system defined by the camera (so after applying its translation and rotation)
E the camera point - origin of the coordinate system described above.
B the resulting point in your picture plane, with u and v in pixels. The picture plane has for dimensions w x h pixels.
f the focal length in pixels, F its value in the real-world unit (same unit as for x, y, z...), and α the horizontal half-angle of view - the values you want to evaluate
You can see that the triangles ECD and EBM are similar, so using the Side-Splitter Theorem, we get:
EM / EC = MB / CD <=> f / z = u / x (both sides are ratios of a pixel value to a real-world value, so the units are consistent)
We thus get:
f = u / x * z
Now if you want F, I think you'll need to know the dimensions r_x x r_y (real-world unit) of your camera's sensor, since:
tan(α) = (r_x / 2) / F = (w / 2) / f <=> F = (r_x / w) * f
But as for α, you can get it through:
tan(α) = (w / 2) / f
If you want to draw the parallel with the Wikipedia article you're pointing to, the correspondence is:
(d_x,d_y,d_z) = (x,y,z), position of the point in the camera system
(s_x,s_y) = (w,h), size of your printable surface
(r_x,r_y,r_z) = (r_x,r_y,f), characteristics of your recording surface
(b_x,b_y) = (u,v), position on your printable surface
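As a small numeric illustration of f = u / x * z and tan(α) = (w / 2) / f, here is a Python sketch. It assumes u is measured in pixels from the image center, that (x, z) are already expressed in the camera's coordinate system, and that several registered points are simply averaged; the function names are hypothetical.

```python
import math

def focal_length_px(u, x, z):
    """Focal length in pixels from one point: f = u / x * z."""
    return u / x * z

def mean_focal_length_px(samples, min_x=1e-9):
    """Average f over (u, x, z) samples, skipping points with x too close to 0."""
    values = [focal_length_px(u, x, z) for u, x, z in samples if abs(x) > min_x]
    if not values:
        raise ValueError("no usable samples (all points lie on the optical axis)")
    return sum(values) / len(values)

def half_angle_of_view(w, f):
    """Horizontal half-angle of view in radians: tan(alpha) = (w / 2) / f."""
    return math.atan((w / 2) / f)
```

For example, a point 2 units to the right and 10 units in front of the camera that appears 100 pixels right of the image center gives f = 500 pixels; on a 1000 pixel wide image that corresponds to a half-angle of 45 degrees.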
I was able to solve this problem using an implementation of the Tsai algorithm (http://en.wikipedia.org/wiki/Camera_resectioning#Algorithms) which can compute a projection matrix using a minimum of 4 known points. Basically, I allow the user to specify the world coordinates of a point and then click on the image to specify the image coordinates. The algorithm uses these mappings (the more mappings, the more accurate the solution is), along with the image width and height to calculate a projection matrix. This projection matrix can then be used to project additional points onto the image using world coordinates.
I'm trying to calculate the modelview matrix of my 2D camera but I can't get the formula right. I use the Affine3f transform class so the matrix is compatible with OpenGL. This is the closest I got by trial and error. This code rotates and scales the camera OK, but if I apply translation and rotation at the same time the camera movement gets messed up: the camera moves in a rotated fashion, which is not what I want. (This is probably due to the fact that I first apply the rotation matrix and then the translation.)
Eigen::Affine3f modelview;
modelview.setIdentity();
modelview.translate(Eigen::Vector3f(camera_offset_x, camera_offset_y, 0.0f));
modelview.scale(Eigen::Vector3f(camera_zoom_x, camera_zoom_y, 0.0f));
modelview.rotate(Eigen::AngleAxisf(camera_angle, Eigen::Vector3f::UnitZ()));
modelview.translate(Eigen::Vector3f(camera_x, camera_y, 0.0f));
[loadmatrix_to_gl]
What I want is for the camera to rotate and scale around an offset position in screenspace {(0,0) is the middle of the screen in this case} and then be positioned along the global xy-axes in worldspace {(0,0) is also initially at the middle of the screen} at the final position. How would I do this?
Note that I have set up also an orthographic projection matrix, which may affect this problem.
If you want a 2D image, rendered in the XY plane with OpenGL, to (1) rotate counter-clockwise by a around point P, (2) scale by S, and then (3) translate so that pixels at C (in the newly scaled and rotated image) are at the origin, you would use this transformation:
translate by -P (this moves the pixels at P to the origin)
rotate by a
translate by P (this moves the origin back to where it was)
scale by S (if you did this earlier, your rotation would be messed up)
translate by -C
If the 2D image were being rendered at the origin, you'd also need to end by translating by some value along the negative z axis to be able to see it.
Normally, you'd just do this with OpenGL basics (glTranslatef, glScalef, glRotatef, etc.). And you would do them in the reverse order that I've listed them. Since you want to use glLoadMatrix, you'd do things in the order I described with Eigen. It's important to remember that OpenGL is expecting a Column Major matrix (but that seems to be the default for Eigen; so that's probably not a problem).
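To make the ordering concrete, here is a plain-Python sketch that composes the five steps as 3x3 homogeneous matrices acting on column vectors, so each later step multiplies on the left. It assumes a uniform scale S and uses made-up helper names; with Eigen or the OpenGL calls the same products apply, only the API differs.

```python
import math

def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(a):
    c, s = math.cos(a), math.sin(a)  # counter-clockwise for positive a
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def image_transform(a, p, s, c):
    """Rotate by a around p, scale by s, then move c to the origin."""
    m = translate(-p[0], -p[1])
    for step in (rotate(a), translate(p[0], p[1]), scale(s, s), translate(-c[0], -c[1])):
        m = matmul(step, m)  # later steps are applied after earlier ones
    return m

def apply(m, pt):
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])
```

With a = 90 degrees, P = (1, 0), S = 2 and C = (1, 1), the point (2, 0) ends up at (1, 1): it is rotated around P to (1, 1), scaled to (2, 2), then shifted by -C.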
JCooper did great explaining the steps to construct the initial matrix.
However, I eventually solved the problem a bit differently. There were a few additional things and steps that were not obvious to me at the time. See the comments on JCooper's answer. The first is to realize that all matrix operations are relative.
Thus if you want to position or move the camera with absolute xy-axes, you must first decompose the matrix to extract its absolute position with unchanged axes. Then you translate the matrix by the difference of the old and new position.
Here is a way to do this with Eigen:
First compute the scalar determinant D of the Affine2f matrix cmat. With Eigen this is done with D = cmat.linear().determinant();. Next compute the 'reverse' matrix matrev of the current rotation+scale matrix RS using D: matrev = (RS.array() / (1.0f / D)).matrix(); where RS is cmat.matrix().topLeftCorner(2,2)
The absolute camera position P is then given by P = matrev * -C where C is cmat.matrix().col(2).head<2>()
Now we can reposition the camera anywhere along the absolute axes while keeping the rotation+scaling the same: V = RS * (T - P) where RS is the same as before, T is the new position vec and P is the decomposed position vec.
The cmat is then simply translated by V to move the camera: cmat.pretranslate(V)
I'm playing around with OpenGL and I've got a question that I haven't been able to find an answer to, or at least haven't found the right way to ask search engines. I have a pretty simple setup: an 800x600 viewport and a projection matrix with a 45 degree field of view and near and far planes of 1.0 and 200.0. For the sake of discussion, the modelview matrix is the identity matrix.
What I'm trying to determine is the bounds of the view at a given depth. For example, (0,0,0) is the center of the screen. And I'm looking in the -Z direction.
I want to know, if I draw geometry on a plane 100 units into the screen (0,0,-100), what are the bounds of the view? How far in the x and y direction can I draw in this plane and the geometry still be visible.
More generically, Given a plane parallel to the near and far plane (and between them), what are the visible bounds of that plane?
Also, if what I'm trying to determine has a common name or is a common operation, what's it called? That way I can track down more reading material
Your view angle is 45 degrees, and you have a plane at a distance a away from the camera, with an unknown height h. The whole thing looks like this:
Note that the angle here is half of your field of view.
Dusting off the highschool maths books, we get:
tan(angle) = h/a
Rearrange for h and substitute the half field of view:
h = tan(FieldOfView / 2) * a;
This is how much your plane extends upwards along the Y axis.
Since screens aren't square, the width of your plane is different to the height. More exactly, the width is the aspect ratio times the height. I.e. w = h * aspectRatio
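Plugging in the numbers from the question gives a quick sanity check. This Python sketch assumes the 45 degree field of view is the vertical one (as with gluPerspective) and that the camera sits at the origin looking down -Z; `view_bounds` is a made-up name.

```python
import math

def view_bounds(fov_y_deg, aspect, distance):
    """Half-width and half-height of the visible rectangle at a given distance."""
    half_height = math.tan(math.radians(fov_y_deg) / 2.0) * distance
    half_width = half_height * aspect
    return half_width, half_height

# 800x600 viewport, 45 degree field of view, plane 100 units into the screen.
half_w, half_h = view_bounds(45.0, 800 / 600, 100.0)
```

Here half_h is about 41.42 and half_w about 55.23, so geometry on the plane z = -100 is visible for x between roughly -55.23 and 55.23 and y between roughly -41.42 and 41.42.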
I hope this answers your question.