Can I use homography to project camera image points to the ground (2D plane)?

My problem is quite simple yet I struggle to solve it correctly.
I have a camera looking towards the ground and I know all the parameters of the shot. So, using some maths I was able to compute the 4 points defining the field of view of the camera (the coordinates on the ground of each image's corners).
Now, from the coordinates (x, y) of a pixel of the image, I would like to know its real coordinates projected on the ground.
I thought that homography was the way to go, but I read here and there that "homography maps a plane seen from a camera to the same plane seen from another" which is a slightly different problem.
What should I use, please?
Edit: Here is an example.
Given this image:
I know everything about the camera that took the picture (height, angles of view, orientation), so I could calculate the coordinates of the four corners forming its field of view on the ground, for example (in centimeters, relative to the camera position, clockwise from top-left): (-300, 500), (300, 500), (100, 50), (-100, 50).
Knowing that the coordinates on the image of the blade of grass are (1750, 480), how can I know its actual coordinates on the ground?
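(Aside: the image-to-ground mapping the title asks about is itself a plane-to-plane homography, so the four corner correspondences above are enough to estimate it directly. A minimal NumPy sketch, assuming a hypothetical 1920x1080 image so the corner pixel coordinates are known; the corner order must match the ground points given above.)

import numpy as np

def homography_from_4_points(src, dst):
    # Solve for H (3x3, with H[2][2] = 1) such that dst ~ H * src, from 4 correspondences.
    A, b = [], []
    for (x, y), (X, Y) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y]); b.append(X)
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y]); b.append(Y)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical 1920x1080 image: corner pixels listed clockwise from top-left,
# in the same order as the ground coordinates from the question (in cm).
img_corners = [(0, 0), (1919, 0), (1919, 1079), (0, 1079)]
ground_corners = [(-300, 500), (300, 500), (100, 50), (-100, 50)]
H = homography_from_4_points(img_corners, ground_corners)

# Map the blade-of-grass pixel (1750, 480) to ground coordinates.
p = H @ np.array([1750.0, 480.0, 1.0])
print(p[:2] / p[2])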

By "knowing everything" about the camera, do you mean you have the camera FOV, rotation and translation with respect to the ground plane? Then it's trivial, right?
Write the camera matrix K = [[f, 0, w/2],[0, f, h/2],[0, 0, 1]]. Let R be the 3x3 rotation matrix and t the 3x1 translation from camera to ground coordinates. A point on the ray through a given pixel p = [u, v, 1] has camera coordinates r = inv(K) * p. In world coordinates the camera center is at t and the ray is t + s * (R * r); intersect that ray with the ground plane and you are done.
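A minimal NumPy sketch of that recipe, assuming the ground is the plane z = 0 in world coordinates and that R, t take camera coordinates to world coordinates (the numbers in the example are made up):

import numpy as np

def pixel_to_ground(u, v, f, w, h, R, t):
    # Back-project pixel (u, v) onto the ground plane z = 0.
    # K = [[f, 0, w/2], [0, f, h/2], [0, 0, 1]]; a world point is X = R @ x_cam + t,
    # so the camera center sits at t.
    r_cam = np.array([(u - w / 2) / f, (v - h / 2) / f, 1.0])  # inv(K) @ [u, v, 1]
    d = R @ r_cam                      # ray direction in world coordinates
    s = -t[2] / d[2]                   # solve t_z + s * d_z = 0 (intersection with z = 0)
    return t + s * d

# Example: a camera 2 m above the ground looking straight down
# (camera x -> world x, camera y -> world -y, camera z -> world -z).
f, w, h = 1000.0, 1920.0, 1080.0
R = np.array([[1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, -1.0]])
t = np.array([0.0, 0.0, 2.0])
print(pixel_to_ground(960.0, 700.0, f, w, h, R, t))   # -> [0. -0.32  0.]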

Related

Converting XYZ to XY (world coords to screen coords)

Is there a way to convert that data:
Object position which is a 3D point (X, Y, Z),
Camera position which is a 3D point (X, Y, Z),
Camera yaw, pitch, roll (-180:180, -90:90, 0)
Field of view (-45°:45°)
Screen width & height
into the 2D point on the screen (X, Y)?
I'm looking for proper math calculations according to this exact set of data.
It's difficult, but entirely possible to do yourself.
There are lots of libraries that handle this for you, but it is more satisfying if you work it out on your own:
This problem is possible and I have written my own 3D engine to do this for objects in javascript using the HTML5 Canvas. You can see my code here and solve a 3D maze game I wrote here to try and understand what I will talk about below...
The basic idea is to work in steps. To start, forget about the camera angles (yaw, pitch and roll), as these come later, and imagine you are simply looking down the y axis. Then the idea is to calculate, using trig, the yaw and pitch angles to your object's coordinate. Imagine you are looking through a letterbox: the yaw is the angle, left or right of the center line, to your coordinate (so positive or negative), and the pitch is the angle up or down from it. Taking these angles, you can map them onto a 2D x and y coordinate system.
The calculations for the angles are:
yaw = atan((coord.x - cam.x) / (coord.y - cam.y))
pitch = atan((coord.z - cam.z) / (coord.y - cam.y))
with coord.x, coord.y and coord.z being the coordinates of the object, and cam.x, cam.y and cam.z those of the camera. These calculations also assume a Cartesian coordinate system with the axes being: z up, y forward and x right.
From here, the next step is to map this angle in the 3D world to a coordinate which you can use in a 2D graphical representation.
To map these angles onto your screen, you need to scale them into distances from the mid line, which means multiplying them by your screen width / fov. These distances will be positive or negative (since the angle is measured from the mid line), so to actually draw on a canvas you need to add them to half of the screen width (or height).
So this would mean your canvas coordinate would be:
x = width / 2 + yaw * (width / fov)
y = height / 2 + pitch * (height / fov)
where width and height are the dimensions of your screen, fov is the camera's field of view (in the same angular units as the angles), and yaw and pitch are the respective angles of the object from the camera.
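A minimal Python sketch of those two steps (angles first, then scaling to pixel distances from the mid line); atan2 is used instead of atan so the signs come out right, and fov is assumed to be in radians:

import math

def project(coord, cam, width, height, fov):
    # Axes as above: z up, y forward, x right; the camera looks straight down +y
    # (no camera rotation yet), and fov is given in radians.
    dx = coord[0] - cam[0]
    dy = coord[1] - cam[1]
    dz = coord[2] - cam[2]
    yaw = math.atan2(dx, dy)      # left/right angle from the center line
    pitch = math.atan2(dz, dy)    # up/down angle from the center line
    x = width / 2 + yaw * (width / fov)
    y = height / 2 + pitch * (height / fov)   # flip the sign if your screen's y grows downwards
    return x, y

# A point 10 units ahead of the camera, slightly right of and above it.
print(project((2.0, 10.0, 1.0), (0.0, 0.0, 0.0), 800, 600, math.radians(90)))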
You have now achieved the first big step which is mapping a 3D coordinate down to 2D. If you have managed to get this all working, I would suggest trying multiple points and connecting them to form shapes. Also try moving your cameras position to see how the perspective changes as you will soon see how realistic it already looks.
In addition, if this worked fine for you, you can move on to having the camera be able to not only change its position in the 3D world but also change its perspective as in yaw, pitch and roll angles. I will not go into this entirely now, but the basic idea is to use 3D world transformation matrices. You can read up about them here but they do get quite complicated, however I can give you the calculations if you get this far.
It might help to read (old style) OpenGL specs:
https://www.khronos.org/registry/OpenGL/specs/gl/glspec14.pdf
See section 2.10
Also:
https://www.khronos.org/opengl/wiki/Vertex_Transformation
Might help with more concrete examples.
Also, for "proper math" look up 4x4 matrices, projections, and homogeneous coordinates.
https://en.wikipedia.org/wiki/Homogeneous_coordinates

Formula for calculating camera x,y,z position to force 3D point to appear at left side of the screen and rightmost position on the globe

I'd need a formula to calculate 3D position and direction or orientation of a camera in a following situation:
The camera's starting position is looking directly into the center of the Earth; the green line goes straight up into the sky.
The position that the camera needs to move to looks like this:
The starting position probably shouldn't matter, but the question is:
How do I calculate the camera position and direction given the 3D coordinates of any point on the globe? In the camera's final position the distance from the Earth is always fixed, and from the desired camera point of view the chosen point should appear at the rightmost point of the globe.
I think what you want for the camera position is a point on the intersection of a plane parallel to the tangent plane at the location, but somewhat further from the center, with a sphere representing the fixed distance the camera should be from the center. The intersection will be a circle, so there are infinitely many camera positions that work.
The camera direction will be half determined by the location and half by how much of the earth you want in the picture.
Suppose (0,0,0) is the center of the earth, Re is the radius of the earth, and (a,b,c) is the location on the earth you want to look at. If it's given in latitude and longitude, converting to Cartesian coordinates is straightforward. Your camera should lie on a plane perpendicular to the vector (a,b,c), at a distance kRe from the center, where k > 1 is a number you can adjust. The equation for the plane is then ax + by + cz = d, where d = kRe^2. Note that the plane passes through the point (ka, kb, kc) in space, which is what we wanted.
Since you want the camera at a fixed distance from the center, say hRe where 1 < k < h, you need to find points on ax + by + cz = d for which x^2 + y^2 + z^2 = h^2 Re^2. So we need the intersection of the plane and a sphere. It will be easier to manage if we have a coordinate system on the plane, which we get from an orthogonal system that includes (a,b,c). A good candidate for the second vector is the projection of the z-axis (the polar axis, I assume) onto the plane. Projecting (0,0,1) onto (a,b,c),
proj_(a,b,c)(0,0,1) = [(a,b,c) · (0,0,1) / |(a,b,c)|^2] (a,b,c)
                    = (c/Re^2) (a,b,c)
Then the "horizontal component" of (0,0,1) is
u = proj_Plane(0,0,1) = (0,0,1) - (c/Re^2)(a,b,c)
                      = (-ac/Re^2, -bc/Re^2, 1 - c^2/Re^2)
You can normalize the vector to length 1 if you wish but there's no need. However, you do need to calculate and store the square of the length of the vector, namely
|u|^2 = ((ac)^2 + (bc)^2 + (Re^2 - c^2)^2) / Re^4
We could complicate this further by taking the cross product of (0,0,1) and the previous vector to get the third vector in the orthonormal system, then obtain a parametric equation for the intersection of the plane and sphere on which the camera lies, but if you just want the simplest answer we won't do that.
Now we need to solve for t such that
|(ka,kb,kc)+t(-ac/Re^2,-bc/Re^2,1-c^2/Re^2)|^2 = h^2 Re^2
|(ka,kb,kc)|^2 + 2tk (a,b,c)·u + t^2 |u|^2 = h^2 Re^2
Since (a,b,c) and u are perpendicular, the middle term drops out, and you have
t^2 = (h^2 Re^2 - k^2 Re^2)/|u|^2.
Substituting that value of t into
(ka,kb,kc)+t(-ac/Re^2,-bc/Re^2,1-c^2/Re^2)
gives the position of the camera in space.
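Putting the position formula into code, a minimal NumPy sketch (the function and variable names are mine, and the degenerate case where (a,b,c) is a pole, so u = 0, is not handled):

import numpy as np

def camera_position(p, Re, k, h, sign=1.0):
    # p = (a, b, c) is the point on the globe (|p| = Re); the camera lies on the
    # plane a x + b y + c z = k Re^2 at distance h Re from the center (1 < k < h).
    # `sign` picks one of the two solutions for t.
    p = np.asarray(p, dtype=float)
    u = np.array([0.0, 0.0, 1.0]) - (p[2] / Re**2) * p   # projection of the polar axis onto the plane
    t = sign * np.sqrt((h**2 - k**2) * Re**2 / np.dot(u, u))
    return k * p + t * u

# Example: Re = 6371 km, a point at 45 degrees north on the prime meridian, k = 1.1, h = 2.
lat = np.radians(45.0)
point = 6371.0 * np.array([np.cos(lat), 0.0, np.sin(lat)])
cam = camera_position(point, 6371.0, 1.1, 2.0)
print(cam, np.linalg.norm(cam) / 6371.0)   # the second value should come out as 2.0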
As for direction, you'll have to experiment with that a bit. Some vector that looks like
(a,b,c) + s(-ac/Re^2,-bc/Re^2,1-c^2/Re^2)
should work. It's hard to say a priori because it depends on the camera magnification, width of the view screen, etc. I'm not sure offhand whether you'll need positive or negative values for s. You may also need to rotate the camera viewport, possibly by 90 degrees, I'm not sure.
If this doesn't work out, it's possible I made an error. Let me know how it works out and I'll check.

Picking in true 3D isometric view

To view my 3D environment, I use a "true" 3D isometric projection (a flat square on the XZ plane, Y is "always" 0). I used the explanation on Wikipedia: http://en.wikipedia.org/wiki/Isometric_projection to work out how to do this transformation:
The projection matrix is an orthographic projection matrix between some minimum and maximum coordinate.
The view matrix is two rotations: one around the Y-axis (n * 45 degrees) and one around the X-axis (arctan(sin(45 degrees))).
The result looks ok, so I think I have done it correctly.
But now I want to be able to pick a coordinate with the mouse. I have successfully implemented this by rendering coordinates to an invisible framebuffer and then reading the pixel under the mouse cursor to get the coordinate. Although this works fine, I would really like to see a mathematical solution, because I will need it to calculate bounding boxes, frustums of the area on the screen, and things like that.
My instincts tell me to:
- go from screen coordinates to 2D projection coordinates (or however you say this; I mean transforming screen coordinates to coordinates between -1 and +1 for both axes, with y inverted)
- untransform the coordinate with the inverse of the view-matrix.
- yeah... untransform this coordinate with the inverse of the projection matrix, but as my instincts tell me, this won't work, as everything will end up with the same Z coordinate.
And yet all the information is there in the isometric view (I know that the Y value is always 0), so I should be able to convert the isometric 2D (x, y) coordinate to a calculated 3D (x, 0, z) coordinate without using scans or anything like that.
My math isn't bad, but this is something I can't seem to grasp.
Edit: IMO, every different (x, 0, z) coordinate corresponds to a different (x2, y2) coordinate in the isometric view. So I should be able to simply calculate a way back from (x2, y2) to (x, 0, z). But how?
Anyone?
There is something called project and unproject for transforming screen coordinates to world coordinates and vice versa...
You seem to be missing some core concepts here (it's been a while since I did this stuff, so minor errors may be included):
There are 3 kinds of coordinates involved here (there are more, these are the relevant ones): Scene, Projection and Window
Scene (3D) are the coordinates in your world
Projection (3D) are those coordinates after being transformed by camera position and projection
Window (2D) are the coordinates in your window. They are generated from projection by scaling x and y appropriately and discarding z (z is still used for “who’s in front?” calculations)
You cannot transform from window to scene with a matrix, because every point in window space corresponds to a whole line in scene space. If you want (x, 0, z) coordinates, you can generate this line and intersect it with the y = 0 plane.
If you want to do this by hand, generate two points in projection with the same (x,y) and different (arbitrary) z coordinates and transform them to scene by multiplying with the inverse of your projection transformation. Now intersect the line through those two points with your y-plane and you’re done.
Note that there should be a “static” solution (a single formula) to this problem – if you solve this all on paper, you should get to it.
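A minimal NumPy sketch of that two-point unprojection, using an isometric view built as the question describes; the orthographic bounds, screen size and OpenGL-style conventions (NDC in [-1, 1], screen y growing downwards) are assumptions:

import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def ortho(l, r, b, t, n, f):
    return np.array([[2/(r-l), 0, 0, -(r+l)/(r-l)],
                     [0, 2/(t-b), 0, -(t+b)/(t-b)],
                     [0, 0, -2/(f-n), -(f+n)/(f-n)],
                     [0, 0, 0, 1]])

# Isometric view/projection as described in the question (bounds are made up).
view = rot_x(np.arctan(np.sin(np.radians(45)))) @ rot_y(np.radians(45))
proj = ortho(-10, 10, -10, 10, -50, 50)
inv_vp = np.linalg.inv(proj @ view)

def pick_ground(sx, sy, screen_w, screen_h):
    # Screen pixel -> world point on the plane y = 0.
    nx = 2.0 * sx / screen_w - 1.0       # normalized device coordinates, y flipped
    ny = 1.0 - 2.0 * sy / screen_h
    near = inv_vp @ np.array([nx, ny, -1.0, 1.0])   # two points on the pick ray,
    far  = inv_vp @ np.array([nx, ny,  1.0, 1.0])   # same (x, y), different z
    p0, p1 = near[:3] / near[3], far[:3] / far[3]
    lam = -p0[1] / (p1[1] - p0[1])       # solve p0.y + lam * (p1.y - p0.y) = 0
    return p0 + lam * (p1 - p0)

print(pick_ground(400, 300, 800, 600))   # center of the screen -> the world origin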

3d Parabolic Trajectory

I'm trying to figure out some calculations using arcs in 3D space but am a bit lost. Let's say that I want to animate an arc in 3D space to connect two x,y,z coordinates (both coordinates have a z value of 0 and are just points on a plane). I'm controlling the arc by sending it a starting x,y,z position, a rotation, a velocity, and a gravity value. If I know both of the x,y,z coordinates that need to be connected, is there a way to calculate what rotation, velocity, and gravity values are necessary to connect the starting x,y,z coordinate to the ending one?
Thanks.
EDIT: Thanks tom10. To clarify, I'm making "arcs" by creating a parabola with particles. I'm trying to figure out how, starting a parabola formed by a series of particles with a beginning x, y, z, velocity, rotation, and gravity, to determine where it will end (the last x, y, z coordinates). So if these are the two coordinates that need to be connected:
x1=240;
y1=140;
z1=0;
x2=300;
y2=200;
z2=0;
how can the rotation, velocity, and gravity of this parabola be calculated, given only these variables to start the formation of the parabola:
x1=240;
y1=140;
z1=0;
rotation;
velocity;
gravity;
I am trying to keep the angle a constant value.
This link describes the ballistic trajectory needed to "hit a target at range x and altitude y when fired from (0,0) and with initial velocity v, the required angle(s) of launch θ", which is what you want, right? To get your variables into the right form, set the rotation angle (in the x-y plane) so you're pointing in the right direction, that is atan(y/x); from then on, to match the usual terminology for the 2D problem, rewrite your z as y and the horizontal distance to the target (which is sqrt(x^2 + y^2)) as x. Then you can use the formula in the link directly.
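A minimal Python sketch of that recipe applied to the example coordinates, assuming z is "up", the rotation is measured in the x-y plane, and using the standard launch-angle formula the link refers to:

import math

def aim(p1, p2, v, g=9.81):
    # Rotation (in the x-y plane) and launch angle needed to hit p2 from p1.
    # Both points are (x, y, z) with z = 0 on the ground plane; v is the launch
    # speed and g the gravity value. Returns None if v is too small to reach p2.
    dx, dy, dz = (p2[i] - p1[i] for i in range(3))
    rotation = math.atan2(dy, dx)                 # heading in the ground plane
    x = math.hypot(dx, dy)                        # horizontal distance to the target
    y = dz                                        # height difference (0 here)
    disc = v**4 - g * (g * x * x + 2 * y * v * v)
    if disc < 0:
        return None                               # target out of range at this speed
    theta = math.atan2(v * v - math.sqrt(disc), g * x)   # low ("direct") trajectory;
    # use v*v + sqrt(disc) instead for the high, lobbed trajectory
    return rotation, theta

print(aim((240, 140, 0), (300, 200, 0), v=40.0))  # heading ~0.79 rad, launch angle ~0.27 rad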
Do the same as you would in 2D. You just have to rotate the axes so that one of the coordinates becomes zero, then solve, and finally undo the rotation.

Show lat/lon points on screen, in 3d

It's been a while since my math at university, and now I've come to need it like I never thought I would.
So, this is what I want to achieve:
Having a set of 3D points (geographical points, latitude and longitude, altitude doesn't matter), I want to display them on a screen, taking the viewing direction into account.
This is going to be used along with a camera and a compass, so when I point the camera to the North, I want to display on my computer the points that the camera should "see". It's a kind of augmented reality.
Basically, what (I think) I need is a way of transforming the 3D points viewed from above (like viewing the points on Google Maps) into a set of 3D points viewed from the side.
The conversion of Latitude and longitude to 3-D cartesian (x,y,z) coordinates can be accomplished with the following (Java) code snippet. Hopefully it's easily converted to your language of choice. lat and lng are initially the latitude and longitude in degrees:
// lat and lng are in degrees; convert to radians first
lat *= Math.PI / 180.0;
lng *= Math.PI / 180.0;
z = Math.sin(-lat);
x = Math.cos(lat) * Math.sin(-lng);
y = Math.cos(lat) * Math.cos(-lng);
The vector (x,y,z) will always lie on a sphere of radius 1 (i.e. the Earth's radius has been scaled to 1).
From there, a 3D perspective projection is required to convert the (x,y,z) into (X,Y) screen coordinates, given a camera position and angle. See, for example, http://en.wikipedia.org/wiki/3D_projection
It really depends on the degree of precision you require. If you're working on a high-precision, close-in view of points anywhere on the globe, you will need to take the ellipsoidal shape of the earth into account. This is usually done using an algorithm similar to the one described here, on page 38 under 'Conversion between Geographical and Cartesian Coordinates':
http://www.icsm.gov.au/gda/gdatm/gdav2.3.pdf
If you don't need high precision the techniques mentioned above work just fine.
Could anyone explain to me exactly what these parameters mean?
I've tried, and the results were very weird, so I guess I'm misunderstanding some of the parameters for the perspective projection:
* a_{x,y,z} - the point in 3D space that is to be projected.
* c_{x,y,z} - the location of the camera.
* θ_{x,y,z} - the rotation of the camera. When c_{x,y,z} = <0,0,0> and θ_{x,y,z} = <0,0,0>, the 3D vector <1,2,0> is projected to the 2D vector <1,2>.
* e_{x,y,z} - the viewer's position relative to the display surface.
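For what it's worth, here is a minimal sketch of how those parameters are commonly wired together. This is a simplified pinhole form, not a line-by-line transcription of the Wikipedia formulas; the rotation order and signs below are just one convention, and e_z plays the role of the focal distance:

import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def project(a, c, theta, e):
    # a: 3D point, c: camera position, theta: camera rotation (rx, ry, rz),
    # e: viewer position relative to the display surface (e[2] acts as the focal distance).
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    # Move the point into camera coordinates: translate, then undo the camera
    # rotation (this x-y-z order is one of several possible conventions).
    d = rot_x(-theta[0]) @ rot_y(-theta[1]) @ rot_z(-theta[2]) @ (a - c)
    if d[2] <= 0:
        return None                  # the point is behind the camera
    return e[2] * d[0] / d[2] + e[0], e[2] * d[1] / d[2] + e[1]

# Camera at the origin, no rotation, viewer one unit behind the screen.
print(project((1.0, 2.0, 4.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # -> (0.25, 0.5)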
Well, you'll want some 3D vector arithmetic to move your origin, and probably some quaternion-based rotation functions to rotate the vectors to match your direction. There are any number of good tutorials on using quaternions to rotate 3D vectors (since they're used a lot for rendering and such), and the 3D vector stuff is pretty simple if you can remember how vectors are represented.
Well, just a piece of advice: you can plot these points in a 3D space (you can do this easily using OpenGL).
You have to transform the lat/long into another system, for example polar or Cartesian.
So, starting from lat/long, you put the origin of your space at the center of the Earth, then transform your data into Cartesian coordinates:
z = R * sin(lat)
x = R * cos(lat) * sin(long)
y = R * cos(lat) * cos(long)
R is the radius of the world; you can set it to 1 if you only need to catch the direction between your point of view and the points you need "to see".
Then put a virtual camera at a point in the space you've created, and link the data from your real camera (simply a vector) to the data of the virtual one.
The next step towards what you want is to plot the images from your camera overlaid on your "virtual space"; basically, the real camera acts as a control that moves the virtual one around the virtual space.
