Projection of 3D Coordinates onto a 2D image with known points - math

I'm writing a .NET program that allows a user to register an image by identifying specific points on an image and then specifying the real world coordinates associated with each of those points.
http://www.ironbyte.ca/temp/mountain.jpg
The image registration process also requires the user to specify the coordinates of the camera.
What I'd like to be able to do after the image is registered is draw other points on the image based on their real-world coordinates.
I've done a great deal of reading on perspective projections, but I'm struggling to get things working. I must admit that my math skills are not what they should be, which is part of the struggle. Where I'm getting stuck is in determining the focal length and the distance to the display surface:
Referred to as the Viewer's Position (e [x,y,z]) in this article: http://en.wikipedia.org/wiki/3D_projection#Perspective_projection
I've also been referring to this article as well:
http://www.shotlink.com/Tour/WebTemplate/shotlinknew.nsf/2c47cc31e412bc4985256e6e00287832/c1743b40acf6aa03852575b7007122b0/$FILE/Plotting%203D%20ShotLink%20Data%20on%202D%20Images.pdf
which extracts the focal length from the field of view. The field of view appears to be known beforehand there, but it is not in my case.
So I guess my question is: is there a way I can work in reverse to determine the focal length and/or field of view from the positions of the known points on my image? Or am I looking at this the wrong way, and maybe there is an easier way to accomplish the end goal?

EDIT: I got myself confused by mixing units on the schema, so I reworked my answer a bit.
It sounds feasible to me, if we look at the maths behind the projection.
Here is a not-so-rigorous schema of the situation for the horizontal coordinate (I'm mixing real-world coordinates and pixel ones to try to illustrate your situation):
With:
D, one of the points given by the user, with (x,y,z) its position with respect to the coordinate system defined by the camera (i.e. after applying the camera's translation and rotation)
E, the camera point: the origin of the coordinate system described above
B, the resulting point in your picture plane, with u and v in pixels. The picture plane has dimensions w x h pixels.
f, the focal length in pixels; F, its value in real-world units; and α, the horizontal half-angle of view: these are the values you want to evaluate
You can see that the triangles ECD and EBM are similar, so using the Side-Splitter Theorem, we get:
EM / EC = MB / CD <=> f / z = u / x
(we are comparing ratios, so there is no problem with the mixed units: both sides are pixel values divided by real-world ones)
We thus get:
f = u / x * z
Now if you want F, I think you'll need to know the dimensions r_x x r_y (in real-world units) of your camera's sensor, since:
tan(α) = (r_x / 2) / F = (w / 2) / f, which gives F = r_x / w * f
But as for α, you can get it through:
tan(α) = (w / 2) / f
If you want to draw the parallel with the Wikipedia article you pointed to, we have been using:
Where:
(d_x,d_y,d_z) = (x,y,z), position of the point in the camera system
(s_x,s_y) = (w,h), size of your printable surface
(r_x,r_y,r_z) = (r_x,r_y,f), characteristics of your recording surface
(b_x,b_y) = (u,v), position on your printable surface
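
To make the arithmetic concrete, here is a minimal Python sketch of that approach (the function names and the numbers are mine, purely illustrative): estimate f from one reference point already expressed in camera coordinates, then reuse it to project other points. It assumes u and v are measured from the image center, with v pointing up:

def estimate_focal_px(x, z, u):
    # from the similar triangles above: f / z = u / x  =>  f = u / x * z
    return u / x * z

def project(point_cam, f, w, h):
    # point_cam = (x, y, z) in camera coordinates; returns a pixel (column, row)
    x, y, z = point_cam
    u = f * x / z
    v = f * y / z
    # shift so (0, 0) is the top-left corner of the w-by-h image
    return w / 2 + u, h / 2 - v

# hypothetical reference point: x = 2.0, z = 10.0 (real-world units),
# observed 300 px right of the image center
f = estimate_focal_px(2.0, 10.0, 300.0)        # -> 1500 px
print(project((1.0, 0.5, 20.0), f, 800, 600))  # -> (475.0, 262.5)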

I was able to solve this problem using an implementation of the Tsai algorithm (http://en.wikipedia.org/wiki/Camera_resectioning#Algorithms) which can compute a projection matrix using a minimum of 4 known points. Basically, I allow the user to specify the world coordinates of a point and then click on the image to specify the image coordinates. The algorithm uses these mappings (the more mappings, the more accurate the solution is), along with the image width and height to calculate a projection matrix. This projection matrix can then be used to project additional points onto the image using world coordinates.
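
For reference, here is a rough Python/NumPy sketch of the projection-matrix idea. It is not Tsai's full algorithm (which also models lens distortion and uses the image dimensions); it is the plain direct linear transform (DLT), which needs at least 6 correspondences rather than 4, but it shows how a 3x4 projection matrix can be estimated from world-to-image mappings and then reused to plot new points:

import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    # world_pts: sequence of (X, Y, Z); image_pts: sequence of (u, v); need >= 6 pairs
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # the least-squares solution is the right singular vector
    # belonging to the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def project(P, world_point):
    # homogeneous projection followed by the perspective divide
    x = P @ np.append(world_point, 1.0)
    return x[:2] / x[2]

As with the answer above, the more (and better-spread) mappings you feed in, the more stable the estimate.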

Related

Calculating camera matrix, given position, angle and FOV

I've recently been venturing into conversion of 3D points in space to a 2D pixel position on a screen, and almost every single answer I've found has been something like "do X with your world-to-camera matrix, and multiply by your viewport height to get it in pixels".
Now, that's all well and good, but oftentimes these questions were about programming for video game engines, where a function to get a camera's view matrix is often built into a library and called on command. But in my case, I can't do that: I need to know how, given an FOV (say, 78 degrees), a position, and the angles it's facing (pitch = x, yaw = y, roll = z), to calculate the view matrix of a virtual camera.
Does anybody know what I need to do? I'm working with Lua (with built-in userdata for things like 3D vectors, angles, and 4x4 matrices exposed via the C interface), if that helps.
I am using gluPerspective
where:
fovw, fovh // FOV angles across the screen's width and height [rad]
zn, zf // znear, zfar distances from the camera's focal point
When using the FOVy notation from OpenGL:
aspect = width/height
fovh = FOVy
fovw = FOVx = FOVy*aspect
So just fill your 4x4 matrix with the values in the order defined by the notation you use (column- or row-major). I get the feeling you are doing a software renderer on your own, so do not forget to do the perspective divide! Also take a look at the matrix link above and at:
3D graphic pipeline
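
Since the question asked how to build these matrices without an engine, here is a minimal Python/NumPy sketch of both. The rotation-composition order (yaw about Y, then pitch about X, then roll about Z) is an assumption on my part; conventions vary, so reorder the multiplications to match yours:

import numpy as np

def rot_x(a):  # pitch
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):  # yaw
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):  # roll
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def view_matrix(eye, pitch, yaw, roll):
    # camera-to-world rotation; the view matrix is its inverse, plus translation
    R = rot_y(yaw) @ rot_x(pitch) @ rot_z(roll)
    V = np.eye(4)
    V[:3, :3] = R.T
    V[:3, 3] = -R.T @ np.asarray(eye, dtype=float)
    return V

def perspective(fov_y, aspect, zn, zf):
    # the matrix gluPerspective builds, written out in row-major layout
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (zf + zn) / (zn - zf), 2 * zf * zn / (zn - zf)],
        [0, 0, -1, 0]])

A point p = [x, y, z, 1] is then transformed as perspective(...) @ view_matrix(...) @ p and, as noted above, you must divide the result's x, y, z by its w.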

3D: Check point inside elliptical cone

I seem to have searched the whole internet trying to find an implementation of checking whether a 3D point is within an elliptical cone defined by (origin, length, horizontal angle, vertical angle). Unfortunately without success, as I only really found one math solution, which I did not understand.
Now I am aware of how to implement it for a circular cone:
inRange = magnitude(point - origin) <= length;
heading = normalized(point - origin);
return dot(forward, heading) >= cos(angle) && inRange;
However, the vertical extent of that check is far too tall. I would really like to implement a more realistic vision cone for a game's AI, but this requires the cone to be shaped more like the human field of view: wider than it is tall.
Thanks a lot for any help:)
Given a 3D elliptic cone with base at B=(x_B,y_B,z_B), height h along the cone axis k=(k_x,k_y,k_z), major base radius a, minor base radius b, and direction i=(i_x,i_y,i_z) along the major axis, you need to find whether a point P=(x,y,z) lies inside the cone. It is your choice how to parametrize the major-axis direction, and I think you are trying to use spherical coordinates with two angles.
Here are the steps to take:
Establish a coordinate system with origin on the base B and with the local x axis along your major axis i. The local z axis should be towards the tip along k. Finally the local y axis should be
j=cross(k,i)=(i_z*k_y-i_y*k_z, i_x*k_z-i_z*k_x, i_y*k_x-i_x*k_y)
j=normalize(j)
Your 3×3 rotation matrix is defined by the columns E=[i,j,k]
Transform your point P=(x,y,z) into the local coordinates with
P2 = transpose(E)*(P-B) = (x2,y2,z2)
Now establish how far along the cone axis the point is, with s=(h-z2)/h, where s=0 at the tip and s=1 at the base.
If s>1 or s<0 then the point is outside
Otherwise if s>0 you need to check that (x2/(s*a))^2+(y2/(s*b))^2<=1 for the point to be inside.
If s=0 then check that x2=0 and y2=0 for the point being exactly at the tip.
If you cannot do basic vector algebra, like cross products, 3D transformations, and normalization, then I suggest you have some reading to do before you can understand what is going on here.
Note:
// | i_x i_y i_z |
// transpose(E) = | j_x j_y j_z |
// | k_x k_y k_z |
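
A direct transcription of those steps into Python/NumPy might look like the sketch below (using the answer's own conventions; the tip sits at height h above the base along k):

import numpy as np

def point_in_elliptic_cone(P, B, k, i, h, a, b, eps=1e-9):
    P, B = np.asarray(P, float), np.asarray(B, float)
    k = np.asarray(k, float) / np.linalg.norm(k)   # axis, base towards tip
    i = np.asarray(i, float) / np.linalg.norm(i)   # major-axis direction
    j = np.cross(k, i)
    j = j / np.linalg.norm(j)
    E = np.column_stack((i, j, k))     # rotation matrix with columns [i, j, k]
    x2, y2, z2 = E.T @ (P - B)         # P in the cone's local coordinates
    s = (h - z2) / h                   # 0 at the tip, 1 at the base
    if s < 0 or s > 1:
        return False
    if s < eps:                        # essentially at the tip
        return abs(x2) < eps and abs(y2) < eps
    return (x2 / (s * a)) ** 2 + (y2 / (s * b)) ** 2 <= 1.0

For the vision-cone use case in the question, the eye is the tip, so (assuming the two given angles are half-angles) B = eye + length * forward, k = -forward, h = length, a = length * tan(horizontal_angle) and b = length * tan(vertical_angle).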

In OpenGL, How can I determine the bounds of the view at a given depth

I'm playing around with OpenGL and I've got a question that I haven't been able to find an answer to, or at least haven't found the right way to ask search engines. I have a pretty simple setup: an 800x600 viewport and a projection matrix with a 45-degree field of view and near and far planes of 1.0 and 200.0. For the sake of discussion, the modelview matrix is the identity matrix.
What I'm trying to determine is the bounds of the view at a given depth. For example, (0,0,0) is the center of the screen. And I'm looking in the -Z direction.
I want to know: if I draw geometry on a plane 100 units into the screen (0,0,-100), what are the bounds of the view? How far in the x and y directions can I draw in this plane with the geometry still visible?
More generically, Given a plane parallel to the near and far plane (and between them), what are the visible bounds of that plane?
Also, if what I'm trying to determine has a common name or is a common operation, what's it called? That way I can track down more reading material.
Your view angle is 45 degrees; you have a plane at a distance a away from the camera, with an unknown height h. The whole thing looks like this:
Note that the angle here is half of your field of view.
Dusting off the highschool maths books, we get:
tan(angle) = h/a
Rearrange for h and substitute the half field of view:
h = tan(FieldOfView / 2) * a;
This is how much your plane extends upwards along the Y axis.
Since screens aren't square, the width of your plane differs from the height. More exactly, the width is the aspect ratio times the height, i.e. w = h * aspectRatio.
I hope this answers your question.
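
In code (what you are computing is the cross-section of the view frustum at a given depth, which is the term to search for), a small Python sketch, assuming the 45-degree FOV is the vertical one as in gluPerspective:

import math

def view_bounds_at_depth(fov_y_deg, width, height, depth):
    half_h = math.tan(math.radians(fov_y_deg) / 2.0) * depth
    half_w = half_h * (width / height)   # scale by the aspect ratio
    return half_w, half_h

# 45-degree FOV, 800x600 viewport, plane at z = -100:
print(view_bounds_at_depth(45.0, 800, 600, 100.0))
# -> about (55.2, 41.4), i.e. x in [-55.2, 55.2] and y in [-41.4, 41.4]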

What are barycentric calculations used for?

I've been looking at XNA's Barycentric method and the descriptions I find online are pretty opaque to me. An example would be nice. Just an explanation in English would be great... what is the purpose and how could it be used?
From Wikipedia:
In geometry, the barycentric coordinate system is a coordinate system in which the location of a point is specified as the center of mass, or barycenter, of masses placed at the vertices of a simplex (a triangle, tetrahedron, etc).
They are used, I believe, for raytracing in game development.
When a ray intersects a triangle in a normal mesh, you just record it as either a hit or a miss. But if you want to implement a subsurf modifier (image below), which makes meshes much smoother, you will need the distance from the triangle's center at which the ray hit (which is much easier to work with in barycentric coordinates).
Subsurf modifiers are not that hard to visualize:
The cube is the original shape, and the smooth mesh inside is the "subsurfed" cube, I think with a recursion depth of three or four.
Actually, that might not be correct. Don't take my exact word for it, but I do know that they are used for texture mapping on geometric shapes.
Here's a little set of slides you can look at: http://www8.cs.umu.se/kurser/TDBC07/HT04/handouts/HO-lecture11.pdf
In practice, the barycentric coordinates of a point P with respect to a triangle ABC are just its weights (u,v,w) according to the triangle's vertices, such that P = u*A + v*B + w*C. If the point lies within the triangle, then u, v, w are all in [0,1] and u+v+w = 1.
They are used for any task involving knowledge of a point's location relative to the vertices of a triangle, e.g. interpolation of attributes across a triangle. For example, in raytracing you get a hit point inside a triangle. When you want to know that point's normal or other attributes, you compute its barycentric coordinates within the triangle. Then you can use these weights to form the corresponding weighted sum of the attributes of the triangle's vertices, and you get the interpolated attribute.
To compute a point P's barycentric coordinates (u,v,w) within a triangle ABC you can use:
u = [PBC] / [ABC]
v = [APC] / [ABC]
w = [ABP] / [ABC]
where [ABC] denotes the area of the triangle ABC.
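
As a concrete sketch (Python/NumPy, with function names of my choosing), those area ratios can be computed with cross products, since the length of a cross product is twice the triangle's area; the resulting weights then interpolate any per-vertex attribute:

import numpy as np

def barycentric(P, A, B, C):
    # weights (u, v, w) such that P = u*A + v*B + w*C
    n = np.cross(B - A, C - A)                       # |n| = 2 * [ABC]
    denom = np.dot(n, n)
    u = np.dot(np.cross(C - B, P - B), n) / denom    # [PBC] / [ABC]
    v = np.dot(np.cross(A - C, P - C), n) / denom    # [APC] / [ABC]
    return u, v, 1.0 - u - v

A = np.array([0.0, 0.0, 0.0])
B = np.array([1.0, 0.0, 0.0])
C = np.array([0.0, 1.0, 0.0])
u, v, w = barycentric(np.array([0.25, 0.25, 0.0]), A, B, C)  # (0.5, 0.25, 0.25)
# interpolate any per-vertex attribute the same way, e.g. normals:
# normal_at_P = u * nA + v * nB + w * nC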

Normal Vector of Three Points

Hey math geeks, I've got a problem that's been stumping me for a while now. It's for a personal project.
I've got three dots: red, green, and blue. They're positioned on a cardboard slip such that the red dot is in the lower left (0,0), the blue dot is in the lower right (1,0), and the green dot is in the upper left. Imagine stepping back and taking a picture of the card from an angle. If you were to find the center of each dot in the picture (let's say the units are pixels), how would you find the normal vector of the card's face in the picture (relative to the camera)?
Now a few things I've picked up about this problem:
The dots (in "real life") are always at a right angle. In the picture, they're only at a right angle if the camera has been rotated around the red dot along an "axis" (the axis being the line through the red and blue dots, or the red and green dots).
There are dots on only one side of the card. Thus, you know you'll never be looking at the back of it.
The distance of the card to the camera is irrelevant. If I knew the depth of each point, this would be a whole lot easier (just a simple cross product, no?).
The rotation of the card is irrelevant to what I'm looking for. In the tinkering that I've been doing to try to figure this one out, the rotation can be found with the help of the normal vector in the end. Whether or not the rotation is a part of (or product of) finding the normal vector is unknown to me.
Hope there's someone out there that's either done this or is a math genius. I've got two of my friends here helping me on it and we've--so far--been unsuccessful.
i worked it out in my old version of MathCAD:
Edit: The wording in the MathCAD screenshot is wrong; it should read: "Known: g and b are perpendicular to each other"
In MathCAD i forgot the final step of doing the cross-product, which i'll copy-paste here from my earlier answer:
Now that we've solved for the X-Y-Z of the translated g and b points, your original question wanted the normal of the plane.
If we cross g x b, we'll get the vector normal to both:
        | u1 u2 u3 |
g x b = | g1 g2 g3 |
        | b1 b2 b3 |

      = (g2b3 - b2g3)u1 + (b1g3 - b3g1)u2 + (g1b2 - b1g2)u3
All the values are known; plug them in (i won't write out the version with g3 and b3 substituted in, since it's just too long and ugly to be helpful).
But in practical terms, i think you'll have to solve it numerically, adjusting gz and bz so as to best fit the conditions:
g · b = 0
and
|g| = |b|
since the pixel measurements are not algebraically perfect.
Example
Using a picture of the Apollo 13 astronauts rigging one of the command module's square lithium hydroxide canisters to work in the LEM, i located the corners:
Using them as my basis for an X-Y plane:
i recorded the pixel locations using Photoshop, with positive X to the right and positive Y down (to keep the right-hand rule, with Z going "into" the picture):
g = (79.5, -48.5, gz)
b = (-110.8, -62.8, bz)
Punching the two starting formulas into Excel and using the Analysis ToolPak to minimize the error by adjusting gz and bz, it came up with two Z values:
g = (79.5, -48.5, 102.5)
b = (-110.8, -62.8, 56.2)
Which then lets me calculate other interesting values.
The length of g and b in pixels:
|g| = 138.5
|b| = 139.2
The normal vector:
g x b = (3710, -15827, -10366)
The unit normal (length 1):
uN = (0.1925, -0.8209, -0.5377)
Scaling the normal to the same length (in pixels) as g and b (138.9):
Normal = (26.7, -114.0, -74.7)
Now that i have the normal that is the same length as g and b, i plotted them on the same picture:
i think you're going to have a new problem: distortion introduced by the camera lens. The three dots are not perfectly projected onto the 2-dimensional photographic plane. There's a radial lens distortion that makes straight lines no longer straight, makes equal lengths no longer equal, and makes the normals slightly off.
Microsoft Research has an algorithm to figure out how to correct for the camera's distortion:
A Flexible New Technique for Camera Calibration
But it's beyond me:
We propose a flexible new technique to easily calibrate a camera. It is well suited for use without specialized knowledge of 3D geometry or computer vision. The technique only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form solution, followed by a nonlinear refinement based on the maximum likelihood criterion. Both computer simulation and real data have been used to test the proposed technique, and very good results have been obtained. Compared with classical techniques which use expensive equipments such as two or three orthogonal planes, the proposed technique is easy to use and flexible. It advances 3D computer vision one step from laboratory environments to real world use.
They have a sample image, where you can see the distortion:
(source: microsoft.com)
Note
you don't know if you're seeing the "top" of the cardboard, or the "bottom", so the normal could be mirrored vertically (i.e. z = -z)
Update
Guy found an error in the derived algebraic formulas. Fixing it leads to formulas that, i don't think, have a simple closed form. This isn't too bad, since the problem couldn't be solved exactly anyway, only numerically.
Here's a screenshot from Excel where i start with the two known rules:
g · b = 0
and
|g| = |b|
Writing the 2nd one as a difference (an "error" amount), you can then add both up and use that value as the number for Excel's solver to minimize:
This means you'll have to write your own numeric iterative solver. i'm staring over at my Numerical Methods for Engineers textbook from university; i know it contains algorithms to solve recursive equations with no simple closed form.
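
If you'd rather not use Excel, the same fit is a few lines with SciPy (a sketch using the pixel values from the example above; note the front-back ambiguity from the note earlier, so a positive starting guess picks the positive roots):

import numpy as np
from scipy.optimize import fsolve

gx, gy = 79.5, -48.5      # measured green-dot offsets, in pixels
bx, by = -110.8, -62.8    # measured blue-dot offsets, in pixels

def residuals(zs):
    gz, bz = zs
    g = np.array([gx, gy, gz])
    b = np.array([bx, by, bz])
    return [np.dot(g, b),                   # want g . b = 0
            np.dot(g, g) - np.dot(b, b)]    # want |g| = |b|

gz, bz = fsolve(residuals, [100.0, 50.0])
g = np.array([gx, gy, gz])
b = np.array([bx, by, bz])
n = np.cross(g, b)
print(n / np.linalg.norm(n))  # close to the uN = (0.19, -0.82, -0.54) above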
From the sounds of it, you have three points p1, p2, and p3 defining a plane, and you want to find the normal vector to the plane.
Representing the points as vectors from the origin, an equation for a normal vector would be
n = (p2 - p1)x(p3 - p1)
(where x is the cross-product of the two vectors)
If you want the vector to point outwards from the front of the card, then ala the right-hand rule, set
p1 = red (lower-left) dot
p2 = blue (lower-right) dot
p3 = green (upper-left) dot
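
A quick sketch of that formula in Python/NumPy (assuming, as this answer does, that the three points' 3D positions are known):

import numpy as np

def plane_normal(p1, p2, p3):
    n = np.cross(p2 - p1, p3 - p1)   # n = (p2 - p1) x (p3 - p1)
    return n / np.linalg.norm(n)

# red lower-left, blue lower-right, green upper-left, card facing the camera:
red = np.array([0.0, 0.0, 0.0])
blue = np.array([1.0, 0.0, 0.0])
green = np.array([0.0, 1.0, 0.0])
print(plane_normal(red, blue, green))  # -> [0. 0. 1.], out of the card's front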
@Ian Boyd: I liked your explanation, only I got stuck on step 2, when you said to solve for bz. You still had bz in your answer, and I don't think you should have bz in your answer...
bz should be +/- sqrt(gx^2 + gy^2 + gz^2 - bx^2 - by^2)
After I did this myself, I found it very difficult to substitute bz into the first equation when you solved for gz, because substituting bz you now get:
gz = -(gx*bx + gy*by) / sqrt(gx^2 + gy^2 + gz^2 - bx^2 - by^2)
The part that makes this difficult is that gz appears inside the square root, so you have to separate it out, combine the gz terms, and solve for gz. Which I did, only I don't think the way I solved it was correct, because when I wrote my program to calculate gz, I used your gx and gy values to see if my answer matched up with yours, and it did not.
So I was wondering if you could help me out, because I really need to get this to work for one of my projects. Thanks!
Just thinking on my feet here.
Your effective inputs are the apparent ratio RB/RG [+], the apparent angle BRG, and the angle that (say) RB makes with your screen coordinate y-axis (did I miss anything?). You need as output the components of the normalized normal (heh!) vector, which I believe has only two independent values (though you are left with a front-back ambiguity if the card is see-through).[++]
So I'm guessing that this is possible...
From here on I work on the assumption that the apparent angle of RB is always 0, and we can rotate the final solution around the z-axis later.
Start with the card positioned parallel to the viewing plane and oriented in the "natural" way (i.e. your upper-vs.-lower and left-vs.-right assignments are respected). We can reach all the interesting positions of the card by rotating by θ around the initial x-axis (for -π/2 < θ < π/2), then rotating by φ around the initial y-axis (for -π/2 < φ < π/2). Note that we have preserved the apparent direction of the RB vector.
Next, compute the apparent ratio and apparent angle in terms of θ and φ, and invert the result.[+++]
The normal will be R_y(φ)R_x(θ)(0, 0, 1), for R_i the primitive rotation matrix around axis i.
[+] The absolute lengths don't count, because they just tell you the distance to the card.
[++] One more assumption: the distance from the card to the view plane is much larger than the size of the card.
[+++] Here the projection you use from 3D space to the viewing plane matters. This is the hard part, but not something we can do for you unless you say what projection you are using. If you are using a real camera, then this is a perspective projection and is covered in essentially any book on 3D graphics.
Right, the normal vector does not change with distance, but the projection of the cardboard onto the picture does change with distance. (Simply put: if you have a small cardboard, nothing changes. If you have a cardboard 1 mile wide and 1 mile high and you rotate it so that one side is nearer and the other side farther away, the near side is magnified and the far side shortened in the picture. You can see immediately that a rectangle does not remain a rectangle, but becomes a trapezoid.)
The most accurate way, for small angles and with the camera centered on the middle, is to measure the ratio of the width/height between the "normal" image and the angled image along the middle lines (because they are not warped).
We define x as left to right, y as down to up, z as from far to near.
Then
x = arcsin(measuredWidth/normWidth) red-blue
y = arcsin(measuredHeight/normHeight) red-green
z = sqrt(1.0-x^2-y^2)
I'll calculate a more exact solution tomorrow, but I'm too tired now...
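
A small sketch of that idea in Python (my own function, not from the answer). Note that, taken literally, x = arcsin(measuredWidth/normWidth) would give π/2 for an unforeshortened card, so this sketch uses the component form sqrt(1 - (measured/norm)^2), i.e. the sine of the tilt angle whose cosine is the foreshortening ratio, which seems to be what was meant:

import math

def rough_normal(measured_w, norm_w, measured_h, norm_h):
    # foreshortening: measured/norm = cos(tilt); the normal's component
    # along each axis is then sin(tilt) = sqrt(1 - (measured/norm)^2)
    x = math.sqrt(max(1.0 - (measured_w / norm_w) ** 2, 0.0))
    y = math.sqrt(max(1.0 - (measured_h / norm_h) ** 2, 0.0))
    z = math.sqrt(max(1.0 - x * x - y * y, 0.0))
    return (x, y, z)

print(rough_normal(90.0, 100.0, 100.0, 100.0))  # -> roughly (0.436, 0.0, 0.9)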
You could use u,v,n coordinates. Set your viewpoint to the position of the "eye" or "camera", then translate your x,y,z coordinates to u,v,n. From there you can determine the normals, as well as perspective and visible surfaces if you want (u',v',n'). Also, bear in mind that 2D = 3D with z=0. Finally, make sure you use homogeneous coordinates.
