Transforming captured co-ordinates into screen co-ordinates - math

I think this is probably a simple maths question but I have no idea what's going on right now.
I'm capturing the positions of "markers" on a webcam and I have a list of markers and their co-ordinates. Four of the markers are the outer corners of a work surface, and the fifth (green) marker is a widget. Like this:
Here's some example data:
Top left marker (a=98, b=86)
Top right marker (c=119, d=416)
Bottom left marker (e=583, f=80)
Bottom right marker (g=569, h=409)
Widget marker (x=452, y=318)
I'd like to somehow transform the webcam's widget position into a co-ordinate to display on the screen, where top left is 0,0 not 98,86 and somehow take into account the warped angles from the webcam capture.
Where would I even begin? Any help appreciated

In order to compute the warping, you need to compute a homography between the four corners of your input rectangle and the screen.
Since your webcam polygon seems to have an arbitrary shape, a full perspective homography can be used to convert it to a rectangle. It's not that complicated, and you can solve it with a mathematical function (should be easily available) known as Singular Value Decomposition or SVD.
Background information:
For planar transformations like this, you can easily describe them with a homography, which is a 3x3 matrix H such that if any point on or in your webcam polygon, say x1 were multiplied by H, i.e. H*x1, we would get a point on the screen (rectangular), i.e. x2.
Now, note that these points are represented by their homogeneous coordinates which is nothing but adding a third coordinate (the reason for which is beyond the scope of this post). So, suppose your coordinates for X1 were, (100,100), then the homogeneous representation would be a column vector x1 = [100;100;1] (where ; represents a new row).
Ok, so now we have 8 homogeneous vectors representing 4 points on the webcam polygon and the 4 corners of your screen - this is all we need to compute a homography.
Computing the homography:
A little math:
I'm not going to get into the math, but briefly this is how we solve it:
We know that 3x3 matrix H,
H =
h11 h12 h13
h21 h22 h23
h31 h32 h33
where hij represents the element in H at the ith row and the jth column
can be used to get the new screen coordinates by x2 = H*x1. Also, the result will be something like x2 = [12;23;0.1] so to get it in the screen coordinates, we normalize it by the third element or X2 = (120,230) which is (12/0.1,23/0.1).
So this means each point in your webcam polygon (WP) can be multiplied by H (and then normalized) to get your screen coordinates (SC), i.e.
SC1 = H*WP1
SC2 = H*WP2
SC3 = H*WP3
SC4 = H*WP4
where SCi refers to the ith point in screen coordinates and
WPi means the same for the webcam polygon
Computing H: (the quick and painless explanation)
Pseudocode:
for n = 1 to 4
{
// WP_n refers to the 4th point in the webcam polygon
X = WP_n;
// SC_n refers to the nth point in the screen coordinates
// corresponding to the nth point in the webcam polygon
// For example, WP_1 and SC_1 is the top-left point for the webcam
// polygon and the screen coordinates respectively.
x = SC_n(1); y = SC_n(2);
// A is the matrix which we'll solve to get H
// A(i,:) is the ith row of A
// Here we're stacking 2 rows per point correspondence on A
// X(i) is the ith element of the vector X (the webcam polygon coordinates, e.g. (120,230)
A(2*n-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];
A(2*n,:) = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
}
Once you have A, just compute svd(A) which will give decompose it into U,S,VT (such that A = USVT). The vector corresponding to the smallest singular value is H (once you reshape it into a 3x3 matrix).
With H, you can retrieve the "warped" coordinates of your widget marker location by multiplying it with H and normalizing.
Example:
In your particular example if we assume that your screen size is 800x600,
WP =
98 119 583 569
86 416 80 409
1 1 1 1
SC =
0 799 0 799
0 0 599 599
1 1 1 1
where each column corresponds to corresponding points.
Then we get:
H =
-0.0155 -1.2525 109.2306
-0.6854 0.0436 63.4222
0.0000 0.0001 -0.5692
Again, I'm not going into the math, but if we normalize H by h33, i.e. divide each element in H by -0.5692 in the example above,
H =
0.0272 2.2004 -191.9061
1.2042 -0.0766 -111.4258
-0.0000 -0.0002 1.0000
This gives us a lot of insight into the transformation.
[-191.9061;-111.4258] defines the translation of your points (in pixels)
[0.0272 2.2004;1.2042 -0.0766] defines the affine transformation (which is essentially scaling and rotation).
The last 1.0000 is so because we scaled H by it and
[-0.0000 -0.0002] denotes the projective transformation of your webcam polygon.
Also, you can check if H is accurate my multiplying SC = H*WP and normalizing each column with its last element:
SC = H*WP
0.0000 -413.6395 0 -411.8448
-0.0000 0.0000 -332.7016 -308.7547
-0.5580 -0.5177 -0.5554 -0.5155
Dividing each column, by it's last element (e.g. in column 2, -413.6395/-0.5177 and 0/-0.5177):
SC
-0.0000 799.0000 0 799.0000
0.0000 -0.0000 599.0000 599.0000
1.0000 1.0000 1.0000 1.0000
Which is the desired result.
Widget Coordinates:
Now, your widget coordinates can be transformed as well H*[452;318;1], which (after normalizing is (561.4161,440.9433).
So, this is what it would look like after warping:
As you can see, the green + represents the widget point after warping.
Notes:
There are some nice pictures in this article explaining homographies.
You can play with transformation matrices here
MATLAB Code:
WP =[
98 119 583 569
86 416 80 409
1 1 1 1
];
SC =[
0 799 0 799
0 0 599 599
1 1 1 1
];
A = zeros(8,9);
for i = 1 : 4
X = WP(:,i);
x = SC(1,i); y = SC(2,i);
A(2*i-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];
A(2*i,:) = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
end
[U S V] = svd(A);
H = transpose(reshape(V(:,end),[3 3]));
H = H/H(3,3);
A
0 0 0 -98 -86 -1 0 0 0
98 86 1 0 0 0 0 0 0
0 0 0 -119 -416 -1 0 0 0
119 416 1 0 0 0 -95081 -332384 -799
0 0 0 -583 -80 -1 349217 47920 599
583 80 1 0 0 0 0 0 0
0 0 0 -569 -409 -1 340831 244991 599
569 409 1 0 0 0 -454631 -326791 -799

Due to perspective effects linear or even bilinear transformations may not be accurate enough.
Look at correct perspective mapping and more from google on this phrase, may be this is what you need...

Since your input area isn't a rectangle of the same aspect-ratio as the screen, you'll have to apply some sort of transformation to do the mapping.
What I would do is take the proportions of where the inner point is with respect to the outer sides and map that to the same proportions of the screen.
To do this, calculate the amount of the free space above, below, to the left, and to the right of the inner point and use the ratio to find out where in the screen the point should be.
alt text http://img230.imageshack.us/img230/5301/mapkg.png
Once you have the measurements, place the inner point at:
x = left / (left + right)
y = above / (above + below)
This way, no matter how skewed the webcam frame is, you can still map to the full regular rectangle on the screen.

Try the following: split the original rectangle and this figure with 2 diagonals. Their crossing is (k, l). You have 4 distorted triangles (ab-cd-kl, cd-ef-kl, ef-gh-kl, gh-ab-kl) and the point xy is in one of them.
(4 triangles are better than 2, since the distortion doesn't depend on the diagonal chosen)
You need to find in which triangle point XY is. To do that you need only 2 checks:
Check if it's in ab-cd-ef. If true, go on with ab-cd-ef, (in your case it's not, so we proceed with cd-ef-gh).
We don't check cd-ef-gh, but already check a half of it: cd-gh-kl. The point is there. (Otherwise it would have been ef-gh-kl)
Here's an excellent algorythm to check if a point is in a polygon, using only it's points.
Now you need only to map the point to the original triangle cd-gh-kl. The point xy is a linear combination of the 3 points:
x = c * a1 + g * a2 + k * (1 - a1 - a2)
y = d * a1 + h * a2 + l * (1 - a1 - a2)
a1 + a2 <= 1
2 variables (a1, a2) with 2 equations. I guess you can derive the solution formulae on your own.
Then you just make a linear combinations of a1&a2 with the corresponding points' co-ordinates in the original rectangle. In this case with W (width) and H (height) it's
X = width * a1 + width * a2 + width / 2 * (1 - a1 - a2)
Y = 0 * a1 + height * a2 + height / 2 * (1 - a1 - a2)

More of how to do this in objective-c in xcode, related to jacobs post, you can find here: calculate the V from A = USVt in objective-C with SVD from LAPACK in xcode

The "Kabcsh Algorithm" does exactly this: it creates a rotation matrix between two spaces given N matched pairs of positions.
http://en.wikipedia.org/wiki/Kabsch_algorithm

Related

Calculate vector endpoint

I want to found the endpoints (X,Y,Z) coordinatas of yellow vector.
In two dimension is very simple, but i want to rotate 45 degrees around Z axis in 3D
in 2D:
lenght: 10
start point: 0, 0
end point X=lenght*COS(45deg)=7,07
end point Z=lenght*SIN(45deg)=7,07
How do i calculate X,Y,Z endpoints in 3D?
step, yellow vector end position in 10,0,0 ->
step, increment degrees from X axis, +45 degrees ->
step rotate the vector endpoint around Z axis with +45 degrees, what is the vector new endpoint coordinatas?
You use some unconventional terminology. Particularly it is not clear what "increment degrees from X axis, +45 degrees" means. Anyway this is probably solvable with rotation matricies. I assume that step #2 means "rotate along the Y-axis by 45 degrees".
So this gives us:
Step 1 v1 = (10,0,0)
Step 2 rotate along the Y-axis by 45 degrees. So we should multiply our vector by a matrix:
cos(45) 0 sin(45)
0 1 0
-sin(45) 0 cos(45)
which gives us v2 = (10*sqrt(2)/2, 0, 10*sqrt(2)/2) = (5*sqrt(2), 0, 5*sqrt(2))
Step 3 rotate the vector endpoint around Z axis with +45 degrees. So we should multiply our vector v2 by a matrix:
cos(45) -sin(45) 0
sin(45) cos(45) 0
0 0 1
which gives us v3 = (5, 5, 5*sqrt(2)).
P.S. Note that by doing those 2 rotations you don't get the "middle line" of the 3 axis as one might think, because that vector obviously has all 3 components equal.

R: Find the values of a 2D function (given by a matrix) along an arbitrary line

I have a matrix describing a 2D surface and I need to be able to calculate values along the surface for an arbitrary line.
This is best explained by an example
#x and y axes
x=c(1:100)
y=c(1:100)
# 2D Matrix function defined as 0 except for a middle box filled with 1
M=matrix(0,nrow=100,ncol=100)
M[40:60,40:60]=1
# define two points
x1=50
y1=50
x2=23
y2=80
# plot contour graph of M, add points (x1,y1) and (x2,y2)
# and a line connecting the two
contour(x,y,M)
points(x1,y1,col=2)
points(x2,y2,col=3)
lines(c(x1,x2),c(y1,y2),lty=2)
What I want to do is to get values of M along the line from (x1,y1) to (x2,y2), for instance at the values (xvec,yvec) where xvec=seq(x1,x2,length.out=N) and likewise for yvec.
Is there a simple way to do this in R?
Sincerely
It actually pretty easy using the capacity of the "[" function to accept a two column matrix (and here assuming N <- 20):
M[ cbind(xvec,yvec) ]
[1] 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Because indices are truncated, the values won't necessary be those of the nearest points when the arguments are not integers.

Very Simple Math Question

Very simple math question.
Say I have an image with a point being tracked in it. Here are my variables:
Image Height
Image Width
Point (pixles from left) coordinate X
Point (pixles from top) coordinate Y
For example the width, I want it to return a value of -0.5, which represents the distance from the center, such that 1 would be the total right, and -1 would be the total left.
So, how would I calculate so that
The point was (width) a quarter way across the entire frame, or a half way across the left SIDE of the frame. The variables would equal:
Image width: 40
Point X: 10
I know this is basic, but I seriously am having a mind cramp right now O_o.
Thanks,
Christian
Xnew = 2*X/Width - 1
Ynew = 2*Y/Height - 1
Explanation:
X/Width gives you value from 0 (total left) to 1 (total right). 2*X/Width then gives a value from 0 (total left) to 2 (total right). Subtract 1 to get a value from -1 (total left) to 1 (total right).
The same for Y.
If image width is 40, and Point x is 10, then in "your" coordinates PointX will be 0.5 (assuming that coordinates are from -20 to 20). So:
PointX = 1 - 2 * (X / ImageWidth)
PointY = 1 - 2 * (Y / ImageHeight)
Checkout:
PointX = 1 - 2 * (10 / 40) = 0.5 (or 10 pixels to the right side)

rotating a 2d square into another

I have two squares, S1 = (x1,y1,x2,y2) and S2 = (a1,b1,a2,b2)
I'm looking for the A transformation matrix with which
A * S1 = S2
As far as I see, A is an affine 3x3 matrix, so I have 9 unknown values.
How can I calculate these values?
thanks and best,
Viktor
There are really only four unknown values here. A rotation angle, a scale factor and an x and y translation. Of your three by three matrix the bottom row is always 0,0,1 which reduces you to six unknowns. The right hand column will be Tx,Ty,1 which are your translations (and the 1 we already know about).
The two by two "matrix" left will be your rotation and scaling. This will (off the top of my head) be something like:
ACos(B), -Asin(B)
ASin(B), aCos(B)
So in total:
ACos(B), -Asin(B), Tx
ASin(B), ACos(B), Ty
0 , 0 , 1
You extend your co-ordinate matrices with the 1 on the end of each co-ordinate to give 2x3 matrices and they then multiply to give you the four equations you need to solve for the four variables. That is left as an exercise for the reader.
A transformation matrix is a factor of scaling matrix Ss, transition matrix St and rotation matrix Sr.
Assume the old point is Po is (Xo,Yo) and as vector will be represented as (Xo Yo 1)' same for the new point Pn
Then Pnv =SsStSrPov
Where Sx is
Sx 0 0
0 Sy 0
0 0 1
St is
1 0 Tx
0 1 Ty
0 0 1
Sr is
Cos(th) -Sin(th) 0
Sin(th) Cos(th) 0
0 0 1
Now back to your question. if two point are giving to represent a rectangle we can just find the parameter of two matrix and the third one will be an identity matrix.
Rect1 is represented as Top-Left point P11 and Bottom-Right Point P12
Rect2 is represented as Top-Left point P21 and Bottom-Right Point P22
S=Ss*St
Sx 0 Tx
0 Sy Ty
0 0 1
Now you have 4 missing parameters and 4 set of equations
P21=S*P11
P22=S*P12
X[P21] =Sx*X[P11]+Tx
Y[P21] =Sy*Y[P11]+Ty
X[P22] =Sx*X[P12]+Tx
Y[P22] =Sy*Y[P12]+Ty
Solve it and you'll get your answer.
and if you have transition and rotation then
S=Sr*St.
Cos(th) -Sin(th) Tx
Sin(th) Cos(th) Ty
0 0 1
Now you have 3 missing parameters and 4 set of equations
P21=S*P11
P22=S*P12
X[P21] =Cos(th)*X[P11]-Sin(th)*Y[P11]+Tx
Y[P21] =Sin(th)*X[P11]+Cos(th)*Y[P11]+Ty
X[P22] =Cos(th)*X[P11]-Sin(th)*Y[P12]+Tx
Y[P22] =Sin(th)*X[P11]+Cos(th)*Y[P12]+Ty
Replace Cos(th) with A and Sin(th) With B and solve the equations.
X[P21] =A*X[P11]-B*Y[P11]+Tx
Y[P21] =B*X[P11]+A*Y[P11]+Ty
X[P22] =A*X[P11]-B*Y[P12]+Tx
Y[P22] =B*X[P11]+A*Y[P12]+Ty
Check if its correct A^2+B^2 =? 1 if is true then th = aCos(A)
The last part of the solution, if you'll have all three matrixes, then S=SrStSs is
Sx*sin(th) -Sx*cos(th) Tx
Sy*cos(th) Sy*sin(th) Ty
0 0 1
Now we have 5 missing variables and we need 6 different set of equations to solve it. which is mean 3 points from each rectangle.
You shouldn't have a 3x3 matrix if you're just looking to transform a 2D object. What you're looking for is a 2x2 matrix that solves A*S1=S2. This can be done in many different ways; in MATLAB, you'd do a S2/S1 (right matrix division), and generally this performs some kind of Gaussian elimination.
How can I calculate these values?
When applied to 2d/3d transformations, matrix can be represented a coordinate system, unless we are talking about projections.
Matrix rows (or columns, depending on notation) form axes of a new coordinate system, in which object will be placed placed if every object vertex is multiplied by the matrix. Last row (or columne, depending on notation) points to the center of the new coordinate system.
Standard OpenGL/DirectX transformation matrix (NOT a projection matrix):
class Matrix{//C++ code
public:
union{
float f[16];
float m[4][4];
};
};
Can be represented as combination of 4 vectors vx (x axis of the new coordinate system), vy(y axis of a new coordinate system), vz(z axis of a new coordinate system), and vp (center of the new system). Like this:
vx.x vx.y vx.z 0
vy.x vy.y vy.z 0
vz.x vz.y vz.z 0
vp.x vp.y vp.z 1
All "calculate rotation matrix", "calculate scale matrix", etc go down to this idea.
Thus, for 2d matrix, you'll have 3x3 matrix that consists of 3 vectors - vx, vy, vp, because there is no z vector in 2d. I.e.:
vx.x vx.y 0
vy.x vy.y 0
vp.x vp.y 1
To find a transform that would transform quad A into quad B, you need to find two transforms:
Transform that will move quad A into origin (i.e. at point zero), and convert it into quad of fixed size. Say, quad (rectangle) whose one vertex x = 0, y = 0, and whose vertices are located at (0, 1), (1, 0), (1, 1).
Transform that turns quad of fixed size into quad B.
You CANNOT do that it this way if opposite edges of quad are not parallel. I.e. parallelograms are fine, but random 4-sided polygons are not.
A quad can be represented by base point (vp) which can be any vertex of the quad and two vectors that define quad sizes (direction of the edge multiplied by edge's length). I.e. "up" vector and "side" vector. Which makes it a matrix:
side.x side.y 0
up.x up.y 0
vp.x vp.y 1
So, multiplying a quad (vp.x = 0, vp.y = 0, side.x = 1, side.y = 0, up.x = 0, up.y = 1) by this matrix will turn original quad into your quad. Which means, that in order to transform
quad A into quad B, you need to do this:
1) make a matrix that would transform "base 1unit quad" into quad A. Let's call it matA.
2) make a matrix that would transform "base 1 unit quad" into quad B. let's call it matB.
3) invert matA and store result into invMatA.
4) the result matrix is invMatA * matB.
Done. If you multiply quad A by result matrix, you'll get quad B. This won't work if quads have zero widths or heights, and it won't work if quads are not parallelograms.
This is hard to understand, but I cannot to make it simpler.
What do you mean by S1 = (x1,y1,x2,y2)?
Do they represent the top-left and bottom-right corners of the square?
Also, can you guarantee there's only rotation between the squares or do you need a full affine transformation which allows for scaling, skewing, and translation?
Or do you also need a perspective transformation?
Only if it's a perspective transformation, will you need 3x3 matrix with 8 dof as you've mentioned in your post.

Vertex shader world transform, why do we use 4 dimensional vectors?

From this site: http://www.toymaker.info/Games/html/vertex_shaders.html
We have the following code snippet:
// transformations provided by the app, constant Uniform data
float4x4 matWorldViewProj: WORLDVIEWPROJECTION;
// the format of our vertex data
struct VS_OUTPUT
{
float4 Pos : POSITION;
};
// Simple Vertex Shader - carry out transformation
VS_OUTPUT VS(float4 Pos : POSITION)
{
VS_OUTPUT Out = (VS_OUTPUT)0;
Out.Pos = mul(Pos,matWorldViewProj);
return Out;
}
My question is: why does the struct VS_OUTPUT have a 4 dimensional vector as its position? Isn't position just x, y and z?
Because you need the w coordinate for perspective calculation. After you output from the vertex shader than DirectX performs a perspective divide by dividing by w.
Essentially if you have 32768, -32768, 32768, 65536 as your output vertex position then after w divide you get 0.5, -0.5, 0.5, 1. At this point the w can be discarded as it is no longer needed. This information is then passed through the viewport matrix which transforms it to usable 2D coordinates.
Edit: If you look at how a matrix multiplication is performed using the projection matrix you can see how the values get placed in the correct places.
Taking the projection matrix specified in D3DXMatrixPerspectiveLH
2*zn/w 0 0 0
0 2*zn/h 0 0
0 0 zf/(zf-zn) 1
0 0 zn*zf/(zn-zf) 0
And applying it to a random x, y, z, 1 (Note for a vertex position w will always be 1) vertex input value you get the following
x' = ((2*zn/w) * x) + (0 * y) + (0 * z) + (0 * w)
y' = (0 * x) + ((2*zn/h) * y) + (0 * z) + (0 * w)
z' = (0 * x) + (0 * y) + ((zf/(zf-zn)) * z) + ((zn*zf/(zn-zf)) * w)
w' = (0 * x) + (0 * y) + (1 * z) + (0 * w)
Instantly you can see that w and z are different. The w coord now just contains the z coordinate passed to the projection matrix. z contains something far more complicated.
So .. assume we have an input position of (2, 1, 5, 1) we have a zn (Z-Near) of 1 and a zf (Z-Far of 10) and a w (width) of 1 and a h (height) of 1.
Passing these values through we get
x' = (((2 * 1)/1) * 2
y' = (((2 * 1)/1) * 1
z' = ((10/(10-1) * 5 + ((10 * 1/(1-10)) * 1)
w' = 5
expanding that we then get
x' = 4
y' = 2
z' = 4.4
w' = 5
We then perform final perspective divide and we get
x'' = 0.8
y'' = 0.4
z'' = 0.88
w'' = 1
And now we have our final coordinate position. This assumes that x and y ranges from -1 to 1 and z ranges from 0 to 1. As you can see the vertex is on-screen.
As a bizarre bonus you can see that if |x'| or |y'| or |z'| is larger than |w'| or z' is less than 0 that the vertex is offscreen. This info is used for clipping the triangle to the screen.
Anyway I think thats a pretty comprehensive answer :D
Edit2: Be warned i am using ROW major matrices. Column major matrices are transposed.
Rotation is specified by a 3 dimensional matrix and translation by a vector. You can perform both transforms in a "single" operation by combining them into a single 4 x 3 matrix:
rx1 rx2 rx3 tx1
ry1 ry2 ry3 ty1
rz1 rz2 rz3 tz1
However as this isn't square there are various operations that can't be performed (inversion for one). By adding an extra row (that does nothing):
0 0 0 1
all these operations become possible (if not easy).
As Goz explains in his answer by making the "1" a non identity value the matrix becomes a perspective transformation.
Clipping is an important part of this process, as it helps to visualize what happens to the geometry. The clipping stage essentially discards any point in a primitive that is outside of a 2-unit cube centered around the origin (OK, you have to reconstruct primitives that are partially clipped but that doesn't matter here).
It would be possible to construct a matrix that directly mapped your world space coordinates to such a cube, but gradual movement from the far plane to the near plane would be linear. That is to say that a move of one foot (towards the viewer) when one mile away from the viewer would cause the same increase in size as a move of one foot when several feet from the camera.
However, if we have another coordinate in our vector (w), we can divide the vector component-wise by w, and our primitives won't exhibit the above behavior, but we can still make them end up inside the 2-unit cube above.
For further explanations see http://www.opengl.org/resources/faq/technical/depthbuffer.htm#0060 and http://en.wikipedia.org/wiki/Transformation_matrix#Perspective_projection.
A simple answer would be to say that if you don't tell the pipeline what w is then you haven't given it enough information about your projection. This can be verified directly without understanding what the pipeline does with it...
As you probably know the 4x4 matrix can be split into parts based on what each part does. The 3x3 matrix at the top left is altered when you do rotation or scale operations. The fourth column is altered when you do a translation. If you ever inspect a perspective matrix, it alters the bottom row of the matrix. If you then look at how a Matrix-Vector multiplication is done, you see that the bottom row of the matrix ONLY affects the resultant w component of the vector. So if you don't tell the pipeline about w it won't have all your information.

Resources