Kinect intrinsic parameters from field of view - math

Microsoft states that the field-of-view angles for the Kinect are 43 degrees vertical and 57 degrees horizontal (stated here). Given these, can we calculate the intrinsic parameters, i.e. the focal length and centre of projection? I assume the centre of projection can be given as (0,0,0)?
Thanks
EDIT: some more information on what I'm trying to do
I have a dataset of images recorded with a Kinect, and I am trying to convert pixel positions (x_screen, y_screen, plus z_world in mm) to real-world coordinates.
If I know the camera is placed at point (x',y',z') in the real world coordinate system, is it sufficient to find the real world coordinates by doing the following:
x_world = (x_screen - c_x) * z_world / f_x
y_world = (y_screen - c_y) * z_world / f_y
where c_x = x' and c_y = y', and f_x, f_y are the focal lengths? Also, how can I find the focal length given just the field of view?
Thanks

If you equate the world origin (0,0,0) with the camera focus (center of projection, as you call it) and you assume the camera is pointing along the positive z-axis, consider the situation in the plane x=0, with axes z (horizontal) and y (vertical). The subscript v is for "viewport" or screen, and w is for world.
If I understand correctly, you know h, the screen height in pixels, as well as zw, yv and xv. You want to know yw and xw. Note this calculation has (0,0) in the center of the viewport; adjust appropriately for the usual screen coordinate system with (0,0) in the upper left corner. Apply a little trig:
tan(43/2) = (h/2) / f = h / (2f), so f = h / ( 2 tan(43/2) )
and similar triangles
yw / zw = yv / f also xw / zw = xv / f
Solve:
yw = zw * yv / f and xw = zw * xv / f
Note this assumes the "focal length" of the camera is the same in the x-direction. It doesn't have to be. For best accuracy in xw, you should recalculate with f = w / ( 2 tan(57/2) ), where w is the screen width. This is because f isn't a true focal length; it's just a conversion constant. If the camera's pixels are square and the optics have no aberrations, these two f calculations will give the same result.
NB: In a deleted (improper) answer the OP seemed to say that it isn't zw that's known but the length D of the hypotenuse from the origin to (xw, yw, zw). In that case just note zw = D * f / sqrt(xv² + yv² + f²) (assuming the camera pixels are square; some scaling is necessary if not). Then you can proceed as above.
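To make the arithmetic concrete, here is a small C++ sketch of the formulas above. The 640x480 resolution, the sample pixel and the 2000 mm depth are assumptions for illustration only, not values from the question:

#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;
    const double w = 640.0, h = 480.0;           // image size in pixels (assumed)
    const double fovH = 57.0 * PI / 180.0;       // horizontal field of view
    const double fovV = 43.0 * PI / 180.0;       // vertical field of view

    // f is a conversion constant, not a true focal length (see above)
    const double fx = w / (2.0 * std::tan(fovH / 2.0));
    const double fy = h / (2.0 * std::tan(fovV / 2.0));

    // example pixel with known depth, (0,0) in the upper left corner
    double xs = 400.0, ys = 300.0, zw = 2000.0;  // depth in mm (assumed)

    // shift so (0,0) is the center of the viewport, then apply similar triangles
    double xv = xs - w / 2.0, yv = h / 2.0 - ys; // screen y grows downward, so flip it
    double xw = zw * xv / fx;
    double yw = zw * yv / fy;
    std::printf("world (mm): x=%.1f y=%.1f z=%.1f\n", xw, yw, zw);
}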

I cannot add a comment since my reputation here is too low. But I would note that the camera angles of the Kinect are not generally the same as those of a normal photo camera, due to the video stream format and its sensor chip. The SDK's 57 degrees and 43 degrees might therefore refer to different angular resolutions for width and height.
It sends a bitmap of 320x240 pixels, and those pixels relate to:
Horizontal FOV: 58.5° (distributed over 320 horizontal pixels)
Vertical FOV: 45.6° (distributed over 240 vertical pixels)
Z is known and your angle is known, so I suppose the law of sines can then get you proper locations: https://en.wikipedia.org/wiki/Law_of_sines
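For what it's worth, here is a tiny sketch of the degrees-per-pixel reading of those numbers. The uniform spread over the pixels is this post's assumption, and the example pixel is made up:

#include <cstdio>

int main() {
    const double degPerPxH = 58.5 / 320.0;  // ~0.183 degrees per horizontal pixel
    const double degPerPxV = 45.6 / 240.0;  // ~0.19 degrees per vertical pixel
    int px = 200, py = 100;                 // example pixel (assumed)
    double angH = (px - 160) * degPerPxH;   // angle from the image center, 160 = 320/2
    double angV = (120 - py) * degPerPxV;   // 120 = 240/2, screen y flipped
    std::printf("pixel (%d,%d) -> %.2f deg horizontal, %.2f deg vertical\n",
                px, py, angH, angV);
}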

Related

How do I triangulate a 3d coordinate from two 2d points?

I'm working on a project with two infrared positioning cameras which output the (X,Y) coordinate of any IR source. I'm placing them next to each other and my goal is to measure the 3D coordinate (X,Y,Z) of the IR source, using the same technique our eyes use to measure depth.
I have drawn a (lousy) sketch here
which illustrates what I'm trying to calculate. The red dot is my IR source, which can also be seen on the 'views' of the camera to the right. I am trying to measure the length of the blue line.
I have a few known variables:
The cameras have a resolution of 1024x768 (which also means that this is the maximum of the (X,Y) coordinate mentioned earlier)
Horizontally the field of view is 41 deg, vertically 31 deg.
I have yet to decide on the distance between cameras (AB), but this will be a known variable. Let's make it 30 cm for now.
Sadly I cannot seem to find the focal length of the camera.
Ultimately I'm hoping for an (X,Y,Z) coordinate relative to the midpoint of AB. How would I go about measuring Z?
I am not sure how well aligned your cameras are, but from your pictures I am beginning to assume that camera A and camera B are so well aligned that the rectangle representing camera B's screen is simply a horizontal translation of the rectangle representing camera A's screen. What I mean by that is that the corresponding edges of the screens' rectangles are parallel to each other, and the two rectangular screens lie in a common vertical plane perpendicular to the ground.
Now, consider the plane parallel to the vertical plane that contains the two camera screens and passing through the focal points A and B of the two cameras. Call this latter plane the screen_plane. Also, the focal points A and B are at an equal height from the ground.
If that is the case, and if I assume that c = |AB| is the distance between the focal points of the two cameras, and if I put a coordinate system at A so that the x axis is horizontal to the ground, the y axis is perpendicular to the ground, and the z axis is parallel to the ground but perpendicular to the screen, then the focal point of camera B has coordinates (c, 0, 0). As an example, you have given c = 30 cm. The screen_plane is spanned by the x and y axes described above, and the z axis is perpendicular to the screen_plane.
If that is the setting you want to work with, then the red point P will appear on both screens with the same coordinate Y_A = Y_B but different coordinates X_A and X_B.
Then let us denote by theta the horizontal field-of-view angle, which you have determined as theta = 41 deg. Just to be clear, I am assuming the angle from the leftmost side of view to the rightmost side is 2 * theta = 82 deg.
If I understand correctly, you are trying to calculate the distance Z between the vertical plane screen_plane that contains both camera focal points and the plane parallel to screen_plane and passing through the red point P, i.e. you are trying to calculate the distance from P to the vertical plane screen_plane.
Then, here is how you calculate Z:
Step 1: From the image of point P on screen A calculate the distances (e.g. the number of pixels) from P to the vertical edges of the screen. Say they are dist_P_to_left_edge and dist_P_to_right_edge. Set
a_A = dist_P_to_left_edge / (dist_P_to_left_edge + dist_P_to_right_edge) (this one is not really necessary)
b_A = dist_P_to_right_edge / (dist_P_to_left_edge + dist_P_to_right_edge)
Step 2: Do the same with the image of point P on screen B:
a_B = dist_P_to_left_edge / (dist_P_to_left_edge + dist_P_to_right_edge)
b_B = dist_P_to_right_edge / (dist_P_to_left_edge + dist_P_to_right_edge) (this one is not really necessary)
Step 3: Apply the formula:
Z = c * cot(theta) / (2 * (1 - b_A - a_B) )
So, for example, from the pictures of the screens of cameras A and B you have provided, I measured with a ruler that
b_A = 4/38
a_B = 12.5/38
and from the data you have included
theta = 41 deg
c = 30 cm
so I have calculated that the length of the blue segment in your picture is
Z = 30 * cos(41*pi/180) / (2 * sin(41*pi/180) * (1 - 4/38 - 12.5/38))
= 30.498 cm
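Here is the same calculation as a small C++ sketch, so the steps are easy to re-run with your own measurements (the values below are the ones measured above):

#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;
    const double c = 30.0;                   // distance between the focal points, in cm
    const double theta = 41.0 * PI / 180.0;  // half the total view angle (2*theta = 82 deg)

    double b_A = 4.0 / 38.0;                 // measured on screen A
    double a_B = 12.5 / 38.0;                // measured on screen B

    // Step 3: Z = c * cot(theta) / (2 * (1 - b_A - a_B))
    double Z = c * (std::cos(theta) / std::sin(theta)) / (2.0 * (1.0 - b_A - a_B));
    std::printf("Z = %.3f cm\n", Z);         // ~30.498 cm
}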

Calculating the tilt angle using only accelerometers

I would like to know how to calculate, using the readings from my Android phone's accelerometer, the angle between the two segments connecting the accelerometer to the bottom of the tree (B) and to the top of the tree (T).
The accelerometer reports an acceleration value on 3 axes every second, so I calculated the averages and have:
For the phone pointed towards B: Ay1 = -9.69 m.s^-2 and Az1 = 0.71 m.s^-2
For the phone pointed towards T: Ay2 = -9.71 m.s^-2 and Az2 = 0.71 m.s^-2
I am located at a distance D = 20m from the tree.
In the end I would like to know the value of H, so I would like to know how to calculate the angle and then find the height of the tree.
Thanks for your help
The angles we need are the angles between world-up and device-up. Since the gravity vector points towards world-down, this is simply (assuming you are pointing with the device's y-axis):
cos angle = -a.y / sqrt(a.x^2 + a.y^2 + a.z^2)
The two angles we get from your readings are:
angle1 = 4.19065°
angle2 = 4.18205°
You can already see that the angles are very close as the two acceleration values are also extremely close. Btw, I wonder if you are really pointing with the y-axis because the gravity values suggest that you are holding the phone almost upright in both cases.
Anyway, if we assume that the two angles are correct, we can now calculate the height of the respective triangles assuming a length to target l. Then:
tan (90° - angle) = h / l
Assuming l=20 m, this gives us two height values:
h1 = 272.958 m
h2 = 273.521 m
These are heights above the height of the phone. In theory, one should be positive and the other should be negative. The height of the tree would be the difference of the two heights:
treeH = h2 - h1
treeH = 0.56338 m
As you have seen throughout the example, your readings must be pretty off. Nevertheless, this is how you would calculate the tree height.
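A short C++ sketch of this whole procedure; the x-axis reading is assumed to be 0 since it wasn't reported:

#include <cmath>
#include <cstdio>

// angle between world-up and the device y-axis, from an averaged gravity reading
static double tiltFromGravity(double ax, double ay, double az) {
    return std::acos(-ay / std::sqrt(ax * ax + ay * ay + az * az));
}

int main() {
    const double l = 20.0;                             // distance to the tree, in m
    double angle1 = tiltFromGravity(0.0, -9.69, 0.71); // phone aimed at B
    double angle2 = tiltFromGravity(0.0, -9.71, 0.71); // phone aimed at T

    // tan(90 deg - angle) = h / l  =>  h = l / tan(angle)
    double h1 = l / std::tan(angle1);
    double h2 = l / std::tan(angle2);
    std::printf("h1 = %.3f m, h2 = %.3f m, tree height = %.3f m\n",
                h1, h2, h2 - h1);                      // ~0.563 m
}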

Calculation of viewport coordinates

I read an article about normalized device coordinates (on the German DGL wiki), and the following example is provided:
"Let's consider a viewport with dimensions of 1024 pixels (width) and 768 pixels (height). A point P with absolute, non-normalized coordinates P(350/210) would be P(-0.32/-0.59) in normalized coordinates. These coordinates can now be projected onto a viewport (800x600) simply by multiplying the normalized device coordinates (similar to vector scaling) by the size of the viewport. In this case the result would be P(273/164)."
Somehow I can't understand how one gets to the results provided (I mean 273/164 and -0.32/-0.59). Could somebody explain to me how to calculate the coordinates?
P.S. : This is the article - https://wiki.delphigl.com/index.php/Normalisierte_Ger%C3%A4tekoordinate
Thank you!
That article is definitely lacking description. I can get you part of the way there; maybe someone with more math can help finish.
According to this answer, the formulas to convert non-normalized coords to normalized coords are:
Nx = (Cx / Sx) * 2 - 1
Ny = 1 - (Cy / Sy) * 2
(where Cx/y = Coordinate X/Y; Sx/y = Screen X/Y; and Nx/y = Normalized X/Y).
Plugging the example's numbers in:
Nx = (350/1024) * 2 - 1 = -0.31640625
Ny = 1 - (210/768) * 2 = 0.453125
...or (-0.32, 0.45).
Reversing this to get the new coords:
Cx = (1 + -0.31640625) / 2 * 800 = 273.4375
Cy = (1 - 0.453125) / 2 * 600 = 164.0625
Note that the Y value doesn't match. This is probably because my calculation doesn't account for the aspect ratio, and it should, since these screens have a 0.75 aspect ratio while NDC's is 1. This SO answer may help too.
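For reference, here is the whole round trip as a C++ sketch, with the same numbers as above:

#include <cstdio>

int main() {
    double sx = 1024.0, sy = 768.0;     // source viewport
    double cx = 350.0,  cy = 210.0;     // pixel coordinates, (0,0) in the upper left

    double nx = (cx / sx) * 2.0 - 1.0;  // -0.31640625
    double ny = 1.0 - (cy / sy) * 2.0;  //  0.453125

    double tx = 800.0, ty = 600.0;      // target viewport
    double px = (1.0 + nx) / 2.0 * tx;  // 273.4375
    double py = (1.0 - ny) / 2.0 * ty;  // 164.0625
    std::printf("NDC (%.4f, %.4f) -> pixels (%.2f, %.2f)\n", nx, ny, px, py);
}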

Estimate visible bounds of webcam using diagonal fov

I'm using a Logitech C920 webcam (specs here) and I need to estimate the visible bounds of it before installing it at the user place.
I see that it has a diagonal FOV of 78°. So, following the math described here, we have:
W = r * H and D² = W² + H², so H = D / sqrt(1 + r²)
where H is the vertical FOV, W is the horizontal FOV, D is the diagonal FOV, and r is the aspect ratio.
Considering an aspect ratio of 16/9, that gives me approx. W = 67.9829 and H = 38.2403
So I create a frustum using W and H.
The problem is: a slice of this frustum isn't 16:9. Is that due to numeric approximation, or am I doing something else wrong?
Does the camera crop a bigger image?
How can I compute effectively what will be the visible frustum?
Thank you very much!
The formulae you have are for distances, not for angles. You need to convert the angle to a distance using the tangent:
D = 2 * tan(diagonalFov / 2)
Then you can go ahead with your formula. H and W will again be distance values. If you need the corresponding angles, you can use the arctangent:
verticalFov = 2 * arctan(H / 2)
horizontalFov = 2 * arctan(W / 2)
For your values, you'll get
verticalFov = 43.3067°
horizontalFov = 70.428°
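Putting the whole conversion together as a C++ sketch:

#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979323846;
    double dFov = 78.0 * PI / 180.0;   // diagonal FOV
    double r = 16.0 / 9.0;             // aspect ratio

    // convert the diagonal angle to a distance on an image plane at depth 1
    double D = 2.0 * std::tan(dFov / 2.0);

    // split the diagonal into width and height (distances, not angles)
    double H = D / std::sqrt(1.0 + r * r);
    double W = r * H;

    // convert back to angles
    double vFov = 2.0 * std::atan(H / 2.0) * 180.0 / PI; // ~43.3067 deg
    double hFov = 2.0 * std::atan(W / 2.0) * 180.0 / PI; // ~70.428 deg
    std::printf("vertical %.4f deg, horizontal %.3f deg\n", vFov, hFov);
}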

Radius of projected Sphere

I want to refine a previous question:
How do i project a sphere onto the screen?
(2) gives a simple solution:
approximate radius on screen [CLIP SPACE] = world radius * cot(fov / 2) / Z
with:
fov = field of view angle
Z = z distance from camera to sphere
result is in clipspace, multiply by viewport size to get size in pixels
Now my problem is that I don't have the FOV. Only the view and projection matrices are known (and the viewport size, if that helps).
Does anyone know how to extract the FOV from the projection matrix?
Update:
This approximation works better in my case:
float angularRadius = glm::atan(radius / distance); // angular size of the sphere as seen from the camera
float radiusPx = angularRadius * glm::max(viewPort.width, viewPort.height) / glm::radians(fov); // pixels per radian
I'm a bit late to this party, but I came across this thread when I was looking into the same problem. I spent a day on it and worked through some excellent articles I found here:
http://www.antongerdelan.net/opengl/virtualcamera.html
I ended up starting with the projection matrix and working backwards. I got the same formula you mention in your post above. ( where cot(x) = 1/tan(x) )
radius_pixels = (radius_worldspace / {tan(fovy/2) * D}) * (screen_height_pixels / 2)
(where D is the distance from camera to the target's bounding sphere)
I'm using this approach to determine the radius of an imaginary trackball that I use to rotate my object.
Btw Florian, you can extract the fovy from the Projection matrix as follows:
If you take the Sy component from the Projection matrix as shown here:
Sx 0 0 0
0 Sy 0 0
0 0 Sz Pz
0 0 -1 0
where Sy = near / range
and where range = tan(fovy/2) x near
(you can find these definitions at the page I linked above)
if you substitute range in the Sy eqn above you get:
Sy = 1 / tan(fovy/2) = cot(fovy/2)
rearranging:
tan(fovy/2) = 1 / Sy
taking arctan (the inverse of tan) of both sides we get:
fovy/2 = arctan(1/Sy)
so,
fovy = 2 x arctan(1/Sy)
Not sure if you still care - it's been a while! - but maybe this will help someone else.
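In code, using glm (the example matrix here is made up; this assumes a standard symmetric perspective matrix like the one glm::perspective or gluPerspective builds):

#include <cmath>
#include <cstdio>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main() {
    glm::mat4 proj = glm::perspective(glm::radians(60.0f), 4.0f / 3.0f, 0.1f, 100.0f);

    float Sy = proj[1][1];                    // glm is column-major: column 1, row 1
    float fovy = 2.0f * std::atan(1.0f / Sy); // fovy = 2 * arctan(1/Sy)
    std::printf("recovered fovy = %.2f deg\n", glm::degrees(fovy)); // ~60.00
}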
Update: see below.
Since you have the view and projection matrices, here's one way to do it, though it's probably not the shortest:
transform the sphere's center into view space using the view matrix: call the result point C
transform a point on the surface of the sphere, e.g. C+(r, 0, 0) in world coordinates where r is the sphere's world radius, into view space; call the result point S
compute rv = distance from C to S (in view space)
let point S1 in view coordinates be C + (rv, 0, 0) - i.e. another point on the surface of the sphere in view space, for which the line C -> S1 is perpendicular to the "look" vector
project C and S1 into screen coords using the projection matrix as Cs and S1s
compute screen radius = distance between Cs and S1s
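A sketch of those steps with glm; the camera, sphere and 800x600 viewport are made-up example values:

#include <cstdio>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// project a view-space point to screen coordinates, (0,0) in the upper left
static glm::vec2 toScreen(const glm::mat4& proj, const glm::vec3& p, float w, float h) {
    glm::vec4 clip = proj * glm::vec4(p, 1.0f);
    glm::vec3 ndc = glm::vec3(clip) / clip.w;  // perspective divide
    return glm::vec2((ndc.x + 1.0f) * 0.5f * w, (1.0f - ndc.y) * 0.5f * h);
}

int main() {
    glm::mat4 view = glm::lookAt(glm::vec3(0, 0, 5), glm::vec3(0), glm::vec3(0, 1, 0));
    glm::mat4 proj = glm::perspective(glm::radians(60.0f), 4.0f / 3.0f, 0.1f, 100.0f);
    glm::vec3 centerW(0.0f);  // sphere center in world space
    float r = 1.0f;           // world radius

    glm::vec3 C = glm::vec3(view * glm::vec4(centerW, 1.0f));  // center in view space
    glm::vec3 S = glm::vec3(view * glm::vec4(centerW + glm::vec3(r, 0, 0), 1.0f));
    float rv = glm::distance(C, S);                // view-space radius
    glm::vec3 S1 = C + glm::vec3(rv, 0.0f, 0.0f);  // C -> S1 perpendicular to the look vector

    glm::vec2 Cs = toScreen(proj, C, 800.0f, 600.0f);
    glm::vec2 S1s = toScreen(proj, S1, 800.0f, 600.0f);
    std::printf("screen radius = %.2f px\n", glm::distance(Cs, S1s));
}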
But yeah, like Brandorf said, if you can preserve the camera variables, like FOVy, it would be a lot easier. :-)
Update:
Here's a more efficient variant on the above: make an inverse of the projection matrix. Use it to transform the viewport edges back into view space. Then you won't have to project every box into screen coordinates.
Even better, do the same with the view matrix and transform the camera frustum back into world space. That would be more efficient for comparing many boxes against; but harder to figure out the math.
The answer posted at your link, radiusClipSpace = radius * cot(fov / 2) / Z, where fov is the angle of the field of view and Z is the z-distance to the sphere, definitely works. However, keep in mind that radiusClipSpace must be multiplied by the viewport's width to get a pixel measure. radiusClipSpace will be between 0 and 1 if the object fits on the screen.
An alternative solution may be to use the solid angle of the sphere. The solid angle subtended by a sphere in a sky is basically the area it covers when projected to the unit sphere.
The formulae are given at this link but roughly what I'm doing is:
// NaN guard: both values zero, or the camera is inside the sphere
if ((radius == 0.f && distance == 0.f) || fabsf(radius) > fabsf(distance))
    ; // NAN conditions. do something special.

float theta = asinf(radius / distance);
float sphereSolidAngle = 1.f - cosf(theta); // not multiplying by 2*PI since only the ratio below is used
float frustumSolidAngle = (1.f - cosf(fovy / 2.f)) / M_PI; // I cheated here. I assumed
// the solid angle of a frustum is (conical), then divided by PI
// to turn it into a square (area of unit square = area of unit circle / PI)
float numPxCovered = 768.f * 768.f * sphereSolidAngle / frustumSolidAngle; // 768x768 screen
float radiusEstimate = sqrtf(numPxCovered / M_PI); // area = pi*r*r
This works out to roughly the same numbers as radius * cot(fov / 2) / Z. If you only want an estimate of the area covered by the sphere's projection in px, this may be an easy way to go.
I'm not sure if a better estimate of the solid angle of the frustum could be found easily. This method involves more computation than radius * cot(fov / 2) / Z.
The FOV is not directly stored in the projection matrix, but rather used when you call gluPerspective to build the resulting matrix.
The best approach would be to simply keep all of your camera variables in their own class, such as a frustum class, whose member variables are used when you call gluPerspective or similar.
It may be possible to get the FOVy back out of the matrix, but the math required eludes me.
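Along those lines, here is a minimal sketch of such a class; the names and layout are just one way to do it:

#include <cmath>

// keep the camera parameters around instead of digging them out of the matrix
struct Frustum {
    float fovyRadians; // vertical field of view
    float aspect;      // width / height
    float zNear, zFar;

    // same formula as above: projected radius in pixels for a sphere at distance D
    float projectedRadiusPx(float worldRadius, float D, float screenHeightPx) const {
        return worldRadius / (std::tan(fovyRadians / 2.0f) * D) * (screenHeightPx / 2.0f);
    }
};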
