I'm working on a project with two infrared positioning cameras which output the (X,Y) coordinate of any IR source. I'm placing them next to each other and my goal is to measure the 3D coordinate (X,Y,Z) of the IR source, using the same technique our eyes use to measure depth.
I have drawn a (lousy) sketch here, which illustrates what I'm trying to calculate. The red dot is my IR source, which can also be seen on the 'views' of the camera to the right. I am trying to measure the length of the blue line.
I have a few known variables:
The cameras have a resolution of 1024x768 (which also means that this is the maximum of the (X,Y) coordinate mentioned earlier)
Horizontally the field of view is 41deg, vertically 31deg.
I have yet to decide on the distance between cameras (AB), but this will be a known variable. Let's make it 30 cm for now.
Sadly I cannot seem to find the focal length of the camera.
Ultimately I'm hoping for an (X,Y,Z) coordinate relative to the middle point of AB. How would I go about measuring (Z)?
I am not sure how well aligned your cameras are, but from your pictures I will assume that cameras A and B are aligned so that the rectangle representing camera B's screen is simply a horizontal translation of the rectangle representing camera A's screen. That is, the corresponding edges of the two screen rectangles are parallel to each other, and the two screens lie in a common vertical plane perpendicular to the ground. I also assume that the focal points A and B are at equal height above the ground.
Now consider the plane that passes through the focal points A and B and is parallel to the vertical plane containing the two camera screens. Call this plane the screen_plane.
Let c = |AB| be the distance between the focal points of the two cameras, and put a coordinate system at A so that the x axis is horizontal to the ground, the y axis is perpendicular to the ground, and the z axis is parallel to the ground but perpendicular to the screens. Then the focal point of camera B has coordinates ( c, 0, 0 ); in your example, c = 30 cm. The screen_plane is spanned by the x and y axes, and the z axis is perpendicular to it.
If that is the setting you want to work with, then the red point P will appear on both screens with the same coordinate Y_A = Y_B but different coordinates X_A and X_B.
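Then let us denote by theta the horizontal half-angle of the field of view, which I take from your data as theta = 41 deg. Just to be clear, I am assuming that the angle from the leftmost side to the rightmost side of the view is 2 * theta = 82 deg. If the 41 deg you quote is instead the full angle from edge to edge, use theta = 20.5 deg in the formula below.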
If I understand correctly, you are trying to calculate the distance Z between the vertical plane screen_plane that contains both camera focal points and the plane parallel to screen_plane and passing through the red point P, i.e. you are trying to calculate the distance from P to the vertical plane screen_plane.
Then, here is how you calculate Z:
Step 1: From the image of point P on screen A calculate the distances (e.g. the number of pixels) from P to the vertical edges of the screen. Say they are dist_P_to_left_edge and dist_P_to_right_edge. Set
a_A = dist_P_to_left_edge / (dist_P_to_left_edge + dist_P_to_right_edge) (this one is not really necessary)
b_A = dist_P_to_right_edge / (dist_P_to_left_edge + dist_P_to_right_edge)
Step 2: Do the same with the image of point P on screen B:
a_B = dist_P_to_left_edge / (dist_P_to_left_edge + dist_P_to_right_edge)
b_B = dist_P_to_right_edge / (dist_P_to_left_edge + dist_P_to_right_edge) (this one is not really necessary)
Step 3: Apply the formula:
Z = c * cot(theta) / (2 * (1 - b_A - a_B) )
So for example, from the pictures of the screens of camera A and B you have provided, I measured with a ruler, that
b_A = 4/38
a_B = 12.5/38
and from the data you have included
theta = 41 deg
c = 30 cm
so I have calculated that the length of the blue segment on your picture is
Z = 30 * cos(41*pi/180) / (2 * sin(41*pi/180) * (1 - 4/38 - 12.5/38))
= 30.49814 cm
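As a rough sketch, the three steps can be put into a few lines of Python. It evaluates the Step 3 formula exactly as stated, with theta taken as the half-angle of the horizontal field of view. The function names, and the helper that turns a pixel column into the fractions a and b (assuming the camera reports X counted from the left edge of a 1024-pixel-wide image), are my own assumptions and not something given in the question.

```python
import math


def stereo_depth(b_A, a_B, c, theta_deg):
    """Depth from the Step 3 formula: Z = c * cot(theta) / (2 * (1 - b_A - a_B)).

    b_A       -- fraction of screen A's width between P and the right edge
    a_B       -- fraction of screen B's width between P and the left edge
    c         -- distance |AB| between the cameras (Z comes out in the same unit)
    theta_deg -- half of the horizontal field of view, in degrees (assumption)
    """
    disparity = 1.0 - b_A - a_B
    if disparity <= 0:
        raise ValueError("non-positive disparity: point too far away or cameras swapped")
    return c / (2.0 * math.tan(math.radians(theta_deg)) * disparity)


def edge_fractions(x_pixel, width=1024):
    """Hypothetical helper: (a, b) for a pixel column counted from the left edge."""
    a = x_pixel / width
    return a, 1.0 - a


# Values measured with a ruler in the answer above.
z = stereo_depth(b_A=4 / 38, a_B=12.5 / 38, c=30.0, theta_deg=41.0)
print(f"Z = {z:.3f} cm")
```

If 41 deg turns out to be the full field of view, passing theta_deg=20.5 is the only change needed.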
Related
In a 3D space (x,y,z), you are given two points with no restrictions.
Let's say Point 1 = (15,10,-5), Point 2 = (-1, 0, 11)
An arbitrary point (denoted X in the image) is made by finding the mid-point between point 1 and point 2, in this case (7,5,3), and then incrementing y by 10, which creates a third point
Point 3 = (7,15,3)
Attached is an image to better portray these points
The problem is to find an equation that creates the orange line that links the points 1, 2 and 3. The line doesn't necessarily have to link on the bottom, but I assume it is easier to create an ellipse with these points than an inverse parabola.
It is rather simple to build a circle through these three points (note they must be non-collinear).
Make a plane containing the given points and use an arbitrary coordinate system in this plane. For example, point P1 is the origin, the vector P2-P1 defines the OX axis, the vector product of P2-P1 and P3-P1 defines the normal N, and (P2-P1) x N defines the OY axis.
Solve the "circle through three points" problem in this plane to find the radius and the center.
Transform the center back into 3D.
Also note that there are infinitely many ellipses and parabolas through three points (until we define additional constraints).
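A minimal Python sketch of the circle construction, following the steps above. The helper names are mine, the collinearity threshold is a crude absolute tolerance, and the third point is computed in the code as the mid-point of points 1 and 2 raised by 10 in y, as described in the question.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def circle_through(p1, p2, p3):
    """Return (center, radius) of the circle through three non-collinear 3D points."""
    a, b = sub(p2, p1), sub(p3, p1)
    n = cross(a, b)                      # normal N of the plane
    if dot(n, n) < 1e-12:                # crude absolute tolerance (assumption)
        raise ValueError("points are collinear")
    ox = unit(a)                         # OX axis along P2 - P1
    oy = unit(cross(a, n))               # OY axis along (P2 - P1) x N
    d = math.sqrt(dot(a, a))             # in-plane coordinates: P1 = (0, 0), P2 = (d, 0)
    p, q = dot(b, ox), dot(b, oy)        # in-plane coordinates: P3 = (p, q)
    cx = d / 2.0                         # center lies on the bisector of P1P2
    cy = (p * p + q * q - d * p) / (2.0 * q)
    radius = math.hypot(cx, cy)
    center = tuple(p1[i] + cx * ox[i] + cy * oy[i] for i in range(3))
    return center, radius


point1 = (15.0, 10.0, -5.0)
point2 = (-1.0, 0.0, 11.0)
mid = tuple((u + v) / 2.0 for u, v in zip(point1, point2))
point3 = (mid[0], mid[1] + 10.0, mid[2])

center, radius = circle_through(point1, point2, point3)
print(center, radius)
```

The arc from point 1 over point 3 to point 2 can then be drawn by sweeping an angle around the center in the (OX, OY) basis.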
I am trying to create a game using one-point perspective. Everything works fine for points within the view but goes wrong with the negative depth. I understand the perspective as shown on the following picture (source).
In general, I take a point at some distance to the left of the right vertical edge of the frame along the lower horizontal line (5 points in this case) and join it with the point O' (line H'O'); where this line intersects the vertical line (at point H') is the depth line (of 5 in this case). This works well even for negative depth (as the line H'O' intersects the vertical line below the viewpoint). However, if the depth exceeds the distance of O' (meaning the point would be to the right of O'), the line flips and H' ends up above the viewpoint (although it should end up below).
How should I correct it, so that a point with negative depth is transformed correctly (that is, from 3D space to 2D space)?
EDIT
This image is probably better.
My question is how to handle points with negative depth (which should end up below the screen) that is greater than the distance of the transversal.
The points to the right of the point O', along the line determined by the lower edge of the frame, correspond to points that are behind the observer, so technically the observer cannot see them. To see the points behind you, you have to turn around, so you need to change the position of the screen. Draw a copy of the black square frame to the right of the point O', so that the new square is the mirror image of the original frame square with respect to the line orthogonal to the horizon line and passing through the point O'.
Edit: A point with negative depth to the right of point O' (i.e. a point behind the observer) is supposed to be mapped above the horizontal blue line. This is the right way to go.
I assume your coordinate system in three dimensions has its origin at the lower right corner of the square frame in your picture. The x axis (which I think is how you measure width) runs along the lower horizontal edge of the frame, while the y axis (what you call height) runs along the right vertical edge of the frame. The depth axis is perpendicular to the plane of the square frame (so it is parallel to the ground) and starts from the lower right corner of the frame square. Assume that the distance of point O' from the right vertical edge of the square is S and that the coordinates of the point C are {C1, C2} (C1 is the distance of point C from the right vertical edge and C2 is the distance of C from the lower horizontal edge of the square).
Given the coordinates {w, h, d} (w - width, h - height, d - depth) of a point in three dimensions, its representation on the two-dimensional square screen is given by the formulas:
x = (S*w + C1*d)/(S+d)
y = (S*h + C2*d)/(S+d)
So the points you gave as an example in the comments are
P1 = {h = 5, w = 5, d = 5} and P2 = {h = 5, w = 5, d = -10}
Their representation on the screen is
P1_screen = {(S*5 + C1*5)/(S+5), (S*5 + C2*5)/(S+5)}
P2_screen = {(S*5 - C1*10)/(S-10), (S*5 - C2*10)/(S-10)}
whatever your parameters S, C1 and C2 are. The (infinite) line connecting points P1 and P2 is represented on the screen by the (infinite) line connecting the points P1_screen and P2_screen. However, if you want the 2D representation of the visible part of the segment that connects P1 and P2, then you have to draw the (infinite) line between P1_screen and P2_screen and exclude the following two pieces: the segment [P1_screen, P2_screen] and the part of the line that runs from P2_screen up towards the top edge. You draw on the screen only the part of the infinite line that starts at P1_screen and goes down towards the lower horizontal edge of the screen.
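The two projection formulas can be sketched in Python as follows. The values of S, C1 and C2 are made up purely for illustration, and the `is_in_front` check (treating S + d <= 0 as "behind the observer") is my reading of the remark about points to the right of O', not something stated as a formula in the answer.

```python
def project(w, h, d, S, C1, C2):
    """Map a 3D point {w, h, d} to screen coordinates (x, y).

    S      -- distance of O' from the right vertical edge of the frame
    C1, C2 -- coordinates of the point C
    """
    denom = S + d
    if denom == 0:
        raise ZeroDivisionError("d = -S has no finite image on the screen")
    x = (S * w + C1 * d) / denom
    y = (S * h + C2 * d) / denom
    return x, y


def is_in_front(d, S):
    """Assumption: a point is in front of the observer only while S + d > 0."""
    return S + d > 0


# Hypothetical parameters, chosen only to have concrete numbers.
S, C1, C2 = 8.0, 10.0, 6.0

p1_screen = project(5, 5, 5, S, C1, C2)
p2_screen = project(5, 5, -10, S, C1, C2)
print(p1_screen, is_in_front(5, S))
print(p2_screen, is_in_front(-10, S))
```

A point with d = 0 lies in the frame itself and is mapped to (w, h) unchanged, which is a quick sanity check for any choice of parameters.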
Microsoft states that the field of view angles for the Kinect are 43 degrees vertical and 57 degrees horizontal (stated here). Given these, can we calculate the intrinsic parameters, i.e. focal point and centre of projection? I assume the centre of projection can be given as (0,0,0)?
Thanks
EDIT: some more information on what I'm trying to do
I have a dataset of images recorded with a Kinect, and I am trying to convert pixel positions (x_screen, y_screen and z_world (in mm)) to real world coordinates.
If I know the camera is placed at point (x',y',z') in the real world coordinate system, is it sufficient to find the real world coordinates by doing the following:
x_world = (x_screen - c_x) * z_world / f_x
y_world = (y_screen - c_y) * z_world / f_y
where c_x = x' and c_y = y' and f_x, f_y is the focal length? And also how can I find the focal length given just knowledge of the field of view?
Thanks
If you equate the world origin (0,0,0) with the camera focus (center of projection as you call it) and you assume the camera is pointing along the positive z-axis, then the situation looks like this in the plane x=0:
Here the axes are z (horizontal) and y (vertical). The subscript v is for "viewport" or screen, and w is for world.
If I get your meaning correctly, you know h, the screen height in pixels. You also know zw, yv and xv. You want to know yw and xw. Note that this calculation has (0,0) in the center of the viewport. Adjust appropriately for the usual screen coordinate system with (0,0) in the upper left corner. Apply a little trig:
tan(43/2) = (h/2) / f = h / (2f), so f = h / ( 2 tan(43/2) )
and similar triangles
yw / zw = yv / f also xw / zw = xv / f
Solve:
yw = zw * yv / f and xw = zw * xv / f
Note this assumes the "focal length" of the camera is the same in the x-direction. It doesn't have to be. For best accuracy in xw, you should recalculate with f = w / ( 2 tan(57/2) ), where w is the screen width. This is because f isn't a true focal length; it's just a constant of conversion. If the pixels of the camera are square and the optics have no aberrations, these two f calculations will give the same result.
NB: In a deleted (improper) article the OP seemed to say that it isn't zw that's known but the length D of the hypotenuse: origin to (xw,yw,zw). In this case just note zw = D * f / sqrt(xv² + yv² + f²) (assuming camera pixels are square; some scaling is necessary if not). Then you can proceed as above.
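Here is a small Python sketch of that conversion, with the angles in degrees. The 640x480 image size is only an assumed example, the function names are mine, and the screen origin is taken to be the upper left corner, so the resulting y_world grows downward.

```python
import math


def focal_length_px(size_px, fov_deg):
    """Conversion constant f = size / (2 * tan(fov / 2)), in pixels."""
    return size_px / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


def pixel_to_world(x_screen, y_screen, z_world, width, height,
                   h_fov_deg=57.0, v_fov_deg=43.0):
    """Convert a pixel plus known depth z_world into camera-centred coordinates."""
    f_x = focal_length_px(width, h_fov_deg)
    f_y = focal_length_px(height, v_fov_deg)
    x_v = x_screen - width / 2.0    # shift the origin to the viewport centre
    y_v = y_screen - height / 2.0   # still positive downward in this sketch
    return z_world * x_v / f_x, z_world * y_v / f_y, z_world


def depth_from_range(x_v, y_v, f, D):
    """zw from the distance D to the point, assuming square pixels (single f)."""
    return D * f / math.sqrt(x_v ** 2 + y_v ** 2 + f ** 2)


# Assumed example: a 640x480 frame, pixel (400, 300), depth 1500 mm.
print(pixel_to_world(400, 300, 1500.0, width=640, height=480))
```

If the camera sits at (x', y', z') in some other world frame, that offset (and any rotation) is applied to the result afterwards; it does not replace the image centre in the formulas.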
I cannot add a comment since my reputation here is too low.
But note that the camera angle of the Kinect isn't in general the same as in a normal photo camera, due to the video stream format and its sensor chip. Therefore the 57 degrees and 43 degrees mentioned by the SDK might refer to a different angular resolution for height and width.
It sends a bitmap of 320x240 pixels, and those pixels relate to
Horizontal FOV: 58.5° (as distributed over 320 pixels horizontally)
Vertical FOV: 45.6° (as distributed over 240 pixels vertically).
Z is known and your angle is known, so I suppose the law of sines can then get you the proper locations: https://en.wikipedia.org/wiki/Law_of_sines
I have 2 circles that collide in a certain collision point and under a certain collision angle which I calculate using this formula :
C1(x1,y1) C2(x2,y2)
and the angle between the line uniting their centre and the x axis is
X = arctg (|y2 - y1| / |x2 - x1|)
and what I want is to translate the circle on top along the same angle at which it collided with the other circle, that is, along the angle X. I don't know what translation coordinates I should give for a proper, straight translation!
For what I think you mean, here's how to do it cleanly.
Think in vectors.
Suppose the centre of the bottom circle has coordinates (x1,y1), and the centre of the top circle has coordinates (x2,y2). Then define two vectors
support = (x1,y1)
direction = (x2,y2) - (x1,y1)
Now, the line between the two centres is fully described by the parametric representation
line = support + k*direction
with k any value in (-inf,+inf). At the initial time, substituting k=1 in the equation above indeed gives the coordinates of the top circle. At some later time t, the value of k will have increased, and substituting that new value of k in the equation will give the new coordinates of the centre of the top circle.
How fast k increases with t is set by the speed of the circle, and I leave that entirely up to you :)
Doing it this way, you never need to mess around with any angles and/or coordinate transformations etc. It even works in 3D (provided you add in z-coordinates everywhere).
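A short Python sketch of the vector approach. The time law k = 1 + speed * t is only one possible choice (the answer leaves it open), and the function names are mine.

```python
def point_on_line(c1, c2, k):
    """Evaluate line = support + k * direction for the centres c1 and c2."""
    support = c1
    direction = tuple(b - a for a, b in zip(c1, c2))
    return tuple(s + k * d for s, d in zip(support, direction))


def top_circle_centre(c1, c2, speed, t):
    """Hypothetical time law: k starts at 1 and grows linearly with t."""
    return point_on_line(c1, c2, 1.0 + speed * t)


bottom = (2.0, 1.0)
top = (5.0, 5.0)

print(point_on_line(bottom, top, 1))            # the top circle's starting centre
print(top_circle_centre(bottom, top, 0.5, 2))   # centre after t = 2
```

Because the tuples are zipped component by component, passing 3D centres works without any change.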
I get a series of square binary images as in the picture below,
I want to find the red point, which is the point of intersection of four blocks (2 black and 2 white). To do so, I compute the sum of all pixel values along the diagonal directions of the square image, at 45 deg and 135 deg respectively. The intersection of the 45 deg line with the maximum pixel sum and the 135 deg line with the minimum pixel sum is where my red point is.
Now that I have the co-ordinates of the red point in the 45 deg-135 deg co-ordinate system, how do I transform them to earth co-ordinates?
In other words, say I have a point in the 45deg-135deg co-ordinate system; how do I find the corresponding co-ordinate values in the x-y co-ordinate system? What is the transformation matrix?
Some more information that might help:
1) If the image is a 60x60 image, I get 120 values in the 45deg-135deg system, since I scan each row followed by each column to add the pixels.
I don't know much about matlab, but in general all you need to do is rotate your grid by 45 degrees.
Here's a helpful link; shows you the rotation matrix you need
wikipedia rotation matrix article
The new coordinates for a point after 2D rotation look like this:
x' = x cos(theta) - y sin(theta)
y' = x sin(theta) + y cos(theta)
replace theta with 45 (or maybe -45) and you should be all set.
If your red dot starts out at (x,y), then after the -45 degree rotation it will have the new coordinates (x',y'), which are defined as follows:
x' = x cos(-45) - y sin (-45)
y' = x sin (-45) + y cos (-45)
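The rotation can be sketched in Python like this. Whether +45 or -45 degrees is the right sign depends on how the diagonal axes are defined, so treat the sign used here as an assumption to check against your own data.

```python
import math


def rotate(x, y, theta_deg):
    """Rotate the point (x, y) about the origin by theta_deg degrees."""
    t = math.radians(theta_deg)
    x_new = x * math.cos(t) - y * math.sin(t)
    y_new = x * math.sin(t) + y * math.cos(t)
    return x_new, y_new


# A point on the 45 degree diagonal lands on the x axis after a -45 degree rotation.
print(rotate(1.0, 1.0, -45.0))

# Rotating by the opposite angle undoes the transformation.
print(rotate(*rotate(3.0, 7.0, -45.0), 45.0))
```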
Sorry if I misunderstood your question, but why do you rotate the image? The x-value of your red point is just the point where the derivative in the x-direction has its maximum absolute value. The same holds for the y-value with the derivative in the y-direction.
Assume you have the following image
If you take the first row of the image, it starts with all 1s and is then 0 for most of the width. The plot of this first row looks like this.
Now you convolve this line with the kernel {-1,1}, which is just a single loop over your line, and you get
Going through this result and extracting the position of the point with the highest value gets you 72. Therefore the x-position of the red point is 73 (since the kernel of the convolution finds the derivative one point too soon).
Therefore, if data is the image matrix of the above binary image, then extracting your red point position is nearly a one-liner in Mathematica
Last[Transpose[Position[ListConvolve[{-1, 1}, #] & /@
   {data[[1]], Transpose[data][[1]]}, 1 | -1]]] + 1
Here you get {73, 86}, which is the correct position if y=0 is the top row. This method can be implemented in a few minutes in any language.
Remarks:
The approximated derivative that results from the convolution can be either negative or positive. This depends on whether it is a change from 0 to 1 or vice versa. If you want to search for the highest value, you have to take the absolute value of the convolution result.
Remember that the first row in the image matrix is not always in top position of the displayed image. This depends on the software you are using. If you get wrong y values be aware of that.
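For readers without Mathematica, here is a rough Python equivalent of the same idea using plain lists. The test image is synthetic and its corner position is my own choice; indices are 0-based here, so (72, 85) corresponds to the 1-based {73, 86} above.

```python
def edge_position(line):
    """Index of the first pixel after the strongest jump in a 1-D line.

    Uses the difference kernel {-1, 1} (line[i + 1] - line[i]) and the largest
    absolute response, so 0->1 and 1->0 edges are treated alike.
    """
    diffs = [abs(line[i + 1] - line[i]) for i in range(len(line) - 1)]
    return diffs.index(max(diffs)) + 1  # +1: the response sits one pixel too soon


def find_corner(data):
    """Return (x, y) of the block intersection, with y = 0 at the first row."""
    first_row = data[0]
    first_column = [row[0] for row in data]
    return edge_position(first_row), edge_position(first_column)


def make_test_image(width, height, corner_x, corner_y):
    """Hypothetical 2x2 block image: 1 in top-left and bottom-right, 0 elsewhere."""
    return [[1 if (x < corner_x) == (y < corner_y) else 0 for x in range(width)]
            for y in range(height)]


image = make_test_image(width=120, height=120, corner_x=72, corner_y=85)
corner = find_corner(image)
print(corner)
```

Only the first row and the first column are inspected, which works as long as both block boundaries reach the image border, as in the picture.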