Very Simple Math Question - math

Very simple math question.
Say I have an image with a point being tracked in it. Here are my variables:
Image Height
Image Width
Point (pixels from left) coordinate X
Point (pixels from top) coordinate Y
Taking the width as an example, I want it to return a value of -0.5, which represents the distance from the center, such that 1 would be the far right and -1 would be the far left.
So, how would I calculate this?
Say the point is a quarter of the way across the entire frame, or halfway across the left SIDE of the frame. The variables would equal:
Image width: 40
Point X: 10
I know this is basic, but I seriously am having a mind cramp right now O_o.

Xnew = 2*X/Width - 1
Ynew = 2*Y/Height - 1
X/Width gives you value from 0 (total left) to 1 (total right). 2*X/Width then gives a value from 0 (total left) to 2 (total right). Subtract 1 to get a value from -1 (total left) to 1 (total right).
The same for Y.
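As a minimal sketch of the formula above (the function name `to_centered` is made up for illustration), the same mapping can be applied to either axis by passing the matching image dimension:

```python
def to_centered(value, size):
    """Map a pixel coordinate in [0, size] to [-1, 1], with 0 at the center."""
    return 2 * value / size - 1


# The example from the question: width 40, point X at 10.
print(to_centered(10, 40))  # -0.5
```

Passing `Y` and the image height gives the vertical value in the same way, with -1 at the top and 1 at the bottom.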

If image width is 40, and Point x is 10, then in "your" coordinates PointX will be 0.5 (assuming that coordinates are from -20 to 20). So:
PointX = 1 - 2 * (X / ImageWidth)
PointY = 1 - 2 * (Y / ImageHeight)
PointX = 1 - 2 * (10 / 40) = 0.5 (or 10 pixels to the right side)


Given three boxes X, Y, Z. Let W denote white balls and B denote black balls. The contents of the boxes are : X (2W, 3B) , Y(3W, 1B) , Z(1W, 4B).

You need to select 1 box and from that draw 1 object at random. What is the probability that the object drawn is black ?
Approach 1 : Sample space = { (the box number, the ball) }
// Sample Space = {(X,W) (X,W) (X,B) (X,B) (X,B) .... similar for Y and Z }
Thus answer is (3+1+4)/(2+3+3+1+1+4) = 8/14
Approach 2: summation (probability of choosing the ith box * probability of choosing a black ball from it) = (1/3 * 3/5) + (1/3 * 1/4) + (1/3 * 4/5) = 11/20
Which approach is correct and why ?
The second approach is correct. Just consider the extreme case you had the following setup:
Box X: (0W, 1B) <- no white balls, just 1 black ball
Box Y: (99W, 0B) <- loads of white balls, no black ball
Your first approach would give you a probability of 1% to get a black ball, but obviously, since you pick a box at random first and Box X does not contain white balls and Box Y does not contain black balls, the probability must be 50%. So it's
(1/2 * 1) + (1/2 * 0) = 1/2
Second Approach is correct.
There are basically 2 events: selecting the box and selecting the ball.
In the first approach you are assuming that only 1 event is present (selecting the ball).
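To check the arithmetic, here is a small sketch that uses exact fractions and assumes each box is equally likely to be picked. The dictionary layout and the name `p_black` are illustrative choices, not part of the question:

```python
from fractions import Fraction


def p_black(boxes):
    """P(black) when a box is chosen uniformly, then a ball uniformly from it."""
    p_box = Fraction(1, len(boxes))
    return sum(p_box * Fraction(black, white + black)
               for white, black in boxes.values())


# (white, black) counts per box, as given in the question.
boxes = {"X": (2, 3), "Y": (3, 1), "Z": (1, 4)}
print(p_black(boxes))                            # 11/20
print(p_black({"X": (0, 1), "Y": (99, 0)}))      # 1/2, the extreme case above
```

Pooling all the balls, as in the first approach, would only be valid if every ball were equally likely to be drawn, which is not the case when the boxes hold different numbers of balls.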

Mathematical game wondering

Imagine an arm that is 50 px long.
It is placed at 100,100.
The rotation center is at 100, 100.
The arm rotates all the time.
On the arm there is a hook that travels back and forth the full distance of the arm.
My variables:
X = 100;
Y = 100;
RotationAngle = 120; // Loops up to 360.
HookDistanceFromCenter = 25; // Goes 0 -> 50 -> 0 by a loop.
How do I get the position (x,y) of the hook?
From your specific data:
x = 100 - HookDistanceFromCenter * cos(180 - RotationAngle)
y = 100 + HookDistanceFromCenter * sin(180 - RotationAngle)
These hold in every quadrant, provided the angle is converted to the unit your cos and sin functions expect (usually radians). This is basic trigonometry. You should be able to use the info here: except that the radius of your circle is HookDistanceFromCenter and you have to add your rotation center coordinates to the result to get the actual (x,y).
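Since cos(180° - θ) = -cos θ and sin(180° - θ) = sin θ, the two formulas are the same as adding `distance * cos(θ)` and `distance * sin(θ)` to the rotation center. A rough sketch under that reading (the function name `hook_position` is hypothetical, and angles are assumed to be in degrees as in the question):

```python
import math


def hook_position(cx, cy, angle_deg, distance):
    """Return (x, y) of a point `distance` away from (cx, cy) at `angle_deg`."""
    theta = math.radians(angle_deg)  # math.cos / math.sin expect radians
    return (cx + distance * math.cos(theta),
            cy + distance * math.sin(theta))


x, y = hook_position(100, 100, 120, 25)
print(round(x, 2), round(y, 2))  # 87.5 121.65
```

On a screen where y grows downward, increasing the angle will make the arm appear to turn clockwise; negate the sine term if the opposite direction is wanted.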

Convert arbitrary length to a value between -1.0 and 1.0?

How can I convert a length into a value in the range -1.0 to 1.0?
Example: my stage is 440px in length and accepts mouse events. I would like to click in the middle of the stage, and rather than an output of X = 220, I'd like it to be X = 0. Similarly, I'd like the real X = 0 to become X = -1.0 and the real X = 440 to become X = 1.0.
I don't have access to the stage, so i can't simply center-register it, which would make this process a lot easier. Also, it's not possible to dynamically change the actual size of my stage, so I'm looking for a formula that will translate the mouse's real X coordinate of the stage to evenly fit within a range from -1 to 1.
-1 + (2/440)*x
where x is the distance
So, to generalize it, if the minimum normalized value is a and the maximum normalized value is b (in your example a = -1.0, b = 1.0) and the maximum possible value is k (in your example k = 440):
a + x*(b-a)/k
where x is >= 0 and <= k
This is essentially two steps:
Center the range on 0, so for example a range from 400 to 800 moves so it's from -200 to 200. Do this by subtracting the center (average) of the min and max of the range
Divide by the absolute value of the range extremes to convert from a -n to n range to a -1 to 1 range. In the -200 to 200 example, you'd divide by 200
Doesn't answer your question, but for future googlers looking for a continuous monotone function that maps all real numbers to (-1, 1), any sigmoid curve will do, such as atan or a logistic curve:
f(x) = atan(x) / (pi/2)
f(x) = 2/(1 + e^(-x)) - 1
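For readers who want to try the two squashing functions, a short sketch follows. The names `squash_atan` and `squash_logistic` are invented here; note that the logistic form is algebraically the same as tanh(x/2):

```python
import math


def squash_atan(x):
    """Map any real x into (-1, 1) using arctangent."""
    return math.atan(x) / (math.pi / 2)


def squash_logistic(x):
    """Map any real x into (-1, 1) using a rescaled logistic curve."""
    return 2 / (1 + math.exp(-x)) - 1


for x in (-5, 0, 5):
    print(x, squash_atan(x), squash_logistic(x))
```

In floating point both functions saturate for large |x|, and `math.exp(-x)` can overflow for very negative inputs, so `math.tanh(x / 2)` may be the safer way to compute the second one.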
(x - 220) / 220 = new X
Is that what you're looking for?
You need to shift the origin and normalize the range. So the expression becomes
(XCoordinate - 220) / 220.0
handling arbitrary stage widths (no idea if you've got threads to consider, which might require mutexes or similar depending on your language?)
stageWidth = GetStageWidth(); // which may return 440 in your case
clickedX = MouseInput(); // should be 0 to 440
x = -1.0 + 2.0 * clickedX / stageWidth; // scale to -1.0 to +1.0 (multiply first so integer inputs are not truncated by integer division)
you may also want to limit x to the range [-1,1] here?
if ( x < -1 ) x = -1.0;
if ( x > 1 ) x = 1.0;
or provide some kind of feedback/warning/error if it's out of bounds (only if it really matters and simply clipping it to the range [-1,1] isn't good enough).
You have an interval [a,b] that you'd like to map to a new interval [c,d], and a value x in the original coordinates that you'd like to map to y in the new coordinates. Then:
y = c + (x-a)*(d-c)/(b-a)
And for your example with [a,b] = [0,440] and [c,d] = [-1,1], with x=220:
y = -1 + (220-0)*(1 - -1)/(440-0)
= 0
and so forth.
By the way, this works even if x is outside of [a,b]. So as long as you know any two values in both systems, you can convert any value in either direction.
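The interval mapping can be sketched as a single helper. The name `map_range` is made up for illustration; it scales by (d - c)/(b - a), which matches the worked example with [0, 440] and [-1, 1]:

```python
def map_range(x, a, b, c, d):
    """Linearly map x from the interval [a, b] to the interval [c, d]."""
    return c + (x - a) * (d - c) / (b - a)


print(map_range(220, 0, 440, -1, 1))  # 0.0
print(map_range(440, 0, 440, -1, 1))  # 1.0
print(map_range(0, -1, 1, 0, 440))    # 220.0, mapping back the other way
```

Swapping the two intervals gives the inverse conversion, and values outside [a, b] are extrapolated along the same line rather than clipped.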

Transforming captured co-ordinates into screen co-ordinates

I think this is probably a simple maths question but I have no idea what's going on right now.
I'm capturing the positions of "markers" on a webcam and I have a list of markers and their co-ordinates. Four of the markers are the outer corners of a work surface, and the fifth (green) marker is a widget. Like this:
Here's some example data:
Top left marker (a=98, b=86)
Top right marker (c=119, d=416)
Bottom left marker (e=583, f=80)
Bottom right marker (g=569, h=409)
Widget marker (x=452, y=318)
I'd like to somehow transform the webcam's widget position into a co-ordinate to display on the screen, where top left is 0,0 not 98,86 and somehow take into account the warped angles from the webcam capture.
Where would I even begin? Any help appreciated
In order to compute the warping, you need to compute a homography between the four corners of your input rectangle and the screen.
Since your webcam polygon seems to have an arbitrary shape, a full perspective homography can be used to convert it to a rectangle. It's not that complicated, and you can solve it with a mathematical function (should be easily available) known as Singular Value Decomposition or SVD.
Background information:
For planar transformations like this, you can easily describe them with a homography, which is a 3x3 matrix H such that if any point on or in your webcam polygon, say x1 were multiplied by H, i.e. H*x1, we would get a point on the screen (rectangular), i.e. x2.
Now, note that these points are represented by their homogeneous coordinates which is nothing but adding a third coordinate (the reason for which is beyond the scope of this post). So, suppose your coordinates for X1 were, (100,100), then the homogeneous representation would be a column vector x1 = [100;100;1] (where ; represents a new row).
Ok, so now we have 8 homogeneous vectors representing 4 points on the webcam polygon and the 4 corners of your screen - this is all we need to compute a homography.
Computing the homography:
A little math:
I'm not going to get into the math, but briefly this is how we solve it:
We know that 3x3 matrix H,
H =
h11 h12 h13
h21 h22 h23
h31 h32 h33
where hij represents the element in H at the ith row and the jth column
can be used to get the new screen coordinates by x2 = H*x1. Also, the result will be something like x2 = [12;23;0.1] so to get it in the screen coordinates, we normalize it by the third element or X2 = (120,230) which is (12/0.1,23/0.1).
So this means each point in your webcam polygon (WP) can be multiplied by H (and then normalized) to get your screen coordinates (SC), i.e.
SC1 = H*WP1
SC2 = H*WP2
SC3 = H*WP3
SC4 = H*WP4
where SCi refers to the ith point in screen coordinates and
WPi means the same for the webcam polygon
Computing H: (the quick and painless explanation)
for n = 1 to 4
    // WP_n refers to the nth point in the webcam polygon
    X = WP_n;
    // SC_n refers to the nth point in the screen coordinates
    // corresponding to the nth point in the webcam polygon
    // For example, WP_1 and SC_1 are the top-left points of the webcam
    // polygon and the screen coordinates respectively.
    x = SC_n(1); y = SC_n(2);
    // A is the matrix which we'll solve to get H
    // A(i,:) is the ith row of A
    // Here we're stacking 2 rows per point correspondence on A
    // X(i) is the ith element of the vector X (the webcam polygon coordinates, e.g. (120,230))
    A(2*n-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];
    A(2*n,:) = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
end
Once you have A, just compute svd(A), which will decompose it into U, S, V^T (such that A = U*S*V^T). The right singular vector corresponding to the smallest singular value (the last column of V) is H, once you reshape it into a 3x3 matrix.
With H, you can retrieve the "warped" coordinates of your widget marker location by multiplying it with H and normalizing.
In your particular example if we assume that your screen size is 800x600,
WP =
98 119 583 569
86 416 80 409
1 1 1 1
SC =
0 799 0 799
0 0 599 599
1 1 1 1
where each column corresponds to corresponding points.
Then we get:
H =
-0.0155 -1.2525 109.2306
-0.6854 0.0436 63.4222
0.0000 0.0001 -0.5692
Again, I'm not going into the math, but if we normalize H by h33, i.e. divide each element in H by -0.5692 in the example above,
H =
0.0272 2.2004 -191.9061
1.2042 -0.0766 -111.4258
-0.0000 -0.0002 1.0000
This gives us a lot of insight into the transformation.
[-191.9061;-111.4258] defines the translation of your points (in pixels)
[0.0272 2.2004;1.2042 -0.0766] defines the affine transformation (which is essentially scaling and rotation).
The last 1.0000 is so because we scaled H by it and
[-0.0000 -0.0002] denotes the projective transformation of your webcam polygon.
Also, you can check if H is accurate by multiplying SC = H*WP and normalizing each column with its last element:
0.0000 -413.6395 0 -411.8448
-0.0000 0.0000 -332.7016 -308.7547
-0.5580 -0.5177 -0.5554 -0.5155
Dividing each column by its last element (e.g. in column 2, -413.6395/-0.5177 and 0/-0.5177):
-0.0000 799.0000 0 799.0000
0.0000 -0.0000 599.0000 599.0000
1.0000 1.0000 1.0000 1.0000
Which is the desired result.
Widget Coordinates:
Now, your widget coordinates can be transformed as well: H*[452;318;1], which (after normalizing) is (561.4161,440.9433).
So, this is what it would look like after warping:
As you can see, the green + represents the widget point after warping.
There are some nice pictures in this article explaining homographies.
You can play with transformation matrices here
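% Webcam polygon corners, one homogeneous point per column
WP = [
98 119 583 569
86 416 80 409
1 1 1 1
];
% Matching screen corners, one homogeneous point per column
SC = [
0 799 0 799
0 0 599 599
1 1 1 1
];
A = zeros(8,9);
for i = 1 : 4
    X = WP(:,i);
    x = SC(1,i); y = SC(2,i);
    % Two rows of A per point correspondence
    A(2*i-1,:) = [0 0 0 -X(1) -X(2) -1 y*X(1) y*X(2) y];
    A(2*i,:) = [X(1) X(2) 1 0 0 0 -x*X(1) -x*X(2) -x];
end
[U, S, V] = svd(A);
% The last column of V, reshaped row by row, is H
H = transpose(reshape(V(:,end),[3 3]));
H = H/H(3,3);
% The resulting A is: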
0 0 0 -98 -86 -1 0 0 0
98 86 1 0 0 0 0 0 0
0 0 0 -119 -416 -1 0 0 0
119 416 1 0 0 0 -95081 -332384 -799
0 0 0 -583 -80 -1 349217 47920 599
583 80 1 0 0 0 0 0 0
0 0 0 -569 -409 -1 340831 244991 599
569 409 1 0 0 0 -454631 -326791 -799
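For readers without an SVD routine at hand, the sketch below solves the same four-point problem in plain Python. It is not the SVD method described above: it swaps in a simpler technique that fixes h33 = 1 (assuming h33 is not zero) and solves the resulting 8x8 linear system by Gaussian elimination. All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def homography(src, dst):
    """3x3 matrix H (with h33 = 1) mapping four src points onto four dst points."""
    A, b = [], []
    for (X, Y), (x, y) in zip(src, dst):
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y])
        b.append(x)
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y])
        b.append(y)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(H, p):
    """Multiply H by the homogeneous point and divide by the third element."""
    X, Y = p
    u, v, w = (row[0] * X + row[1] * Y + row[2] for row in H)
    return (u / w, v / w)


# Marker data from the question, screen size assumed to be 800x600 as above.
src = [(98, 86), (119, 416), (583, 80), (569, 409)]
dst = [(0, 0), (799, 0), (0, 599), (799, 599)]
H = homography(src, dst)
print(apply_h(H, (452, 318)))  # should be close to (561.4161, 440.9433) reported above
```

With exactly four correspondences the solution is unique up to scale, so this should agree with the SVD result after that one is divided by h33. With more than four noisy points the SVD (least-squares) route is the better choice.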
Due to perspective effects linear or even bilinear transformations may not be accurate enough.
Search for "correct perspective mapping"; that may be what you need.
Since your input area isn't a rectangle of the same aspect-ratio as the screen, you'll have to apply some sort of transformation to do the mapping.
What I would do is take the proportions of where the inner point is with respect to the outer sides and map that to the same proportions of the screen.
To do this, calculate the amount of the free space above, below, to the left, and to the right of the inner point and use the ratio to find out where in the screen the point should be.
Once you have the measurements, place the inner point at:
x = left / (left + right)
y = above / (above + below)
This way, no matter how skewed the webcam frame is, you can still map to the full regular rectangle on the screen.
Try the following: split the original rectangle and this figure with 2 diagonals. Their crossing is (k, l). You have 4 distorted triangles (ab-cd-kl, cd-ef-kl, ef-gh-kl, gh-ab-kl) and the point xy is in one of them.
(4 triangles are better than 2, since the distortion doesn't depend on the diagonal chosen)
You need to find in which triangle point XY is. To do that you need only 2 checks:
Check if it's in ab-cd-ef. If true, go on with ab-cd-ef, (in your case it's not, so we proceed with cd-ef-gh).
We don't check cd-ef-gh, but already check a half of it: cd-gh-kl. The point is there. (Otherwise it would have been ef-gh-kl)
Here's an excellent algorithm to check if a point is in a polygon, using only its points.
Now you need only to map the point to the original triangle cd-gh-kl. The point xy is a linear combination of the 3 points:
x = c * a1 + g * a2 + k * (1 - a1 - a2)
y = d * a1 + h * a2 + l * (1 - a1 - a2)
a1 + a2 <= 1
2 variables (a1, a2) with 2 equations. I guess you can derive the solution formulae on your own.
Then you just form the same linear combination, using a1 and a2, of the corresponding points' co-ordinates in the original rectangle. In this case, with width and height, it's
X = width * a1 + width * a2 + width / 2 * (1 - a1 - a2)
Y = 0 * a1 + height * a2 + height / 2 * (1 - a1 - a2)
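The two equations for a1 and a2 can be solved with Cramer's rule. Here is a sketch under that approach; the function name `triangle_map` and the sample triangles are invented for illustration, and the triangle is assumed not to be degenerate:

```python
def triangle_map(p, src_tri, dst_tri):
    """Map point p from triangle src_tri to dst_tri via its weights a1, a2, a3."""
    (x1, y1), (x2, y2), (x3, y3) = src_tri
    px, py = p
    # Solve p = a1*P1 + a2*P2 + (1 - a1 - a2)*P3 for a1 and a2.
    det = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    a1 = ((px - x3) * (y2 - y3) - (x2 - x3) * (py - y3)) / det
    a2 = ((x1 - x3) * (py - y3) - (px - x3) * (y1 - y3)) / det
    a3 = 1 - a1 - a2
    # Apply the same weights to the target triangle's corners.
    (u1, v1), (u2, v2), (u3, v3) = dst_tri
    return (a1 * u1 + a2 * u2 + a3 * u3,
            a1 * v1 + a2 * v2 + a3 * v3)


src = [(0, 0), (4, 0), (0, 4)]
dst = [(10, 10), (20, 10), (10, 30)]
print(triangle_map((1, 1), src, dst))  # (12.5, 15.0)
```

For the case described above, `src_tri` would be the webcam points cd, gh and kl, and `dst_tri` would be (width, 0), (width, height) and (width / 2, height / 2). If any of a1, a2 or 1 - a1 - a2 comes out negative, the point lies outside that triangle.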
More of how to do this in objective-c in xcode, related to jacobs post, you can find here: calculate the V from A = USVt in objective-C with SVD from LAPACK in xcode
The "Kabsch algorithm" does exactly this: it creates a rotation matrix between two spaces given N matched pairs of positions.

Vertex shader world transform, why do we use 4 dimensional vectors?

From this site:
We have the following code snippet:
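// transformations provided by the app, constant Uniform data
float4x4 matWorldViewProj: WORLDVIEWPROJECTION;
// the format of our vertex data
struct VS_OUTPUT
{
    float4 Pos : POSITION;
};
// Simple Vertex Shader - carry out transformation
// (the function header was missing from the snippet; the name VS and its signature are assumed)
VS_OUTPUT VS(float4 Pos : POSITION)
{
    VS_OUTPUT Out = (VS_OUTPUT)0;
    Out.Pos = mul(Pos, matWorldViewProj);
    return Out;
}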
My question is: why does the struct VS_OUTPUT have a 4 dimensional vector as its position? Isn't position just x, y and z?
Because you need the w coordinate for the perspective calculation. After you output from the vertex shader, DirectX then performs a perspective divide by dividing by w.
Essentially if you have 32768, -32768, 32768, 65536 as your output vertex position then after w divide you get 0.5, -0.5, 0.5, 1. At this point the w can be discarded as it is no longer needed. This information is then passed through the viewport matrix which transforms it to usable 2D coordinates.
Edit: If you look at how a matrix multiplication is performed using the projection matrix you can see how the values get placed in the correct places.
Taking the projection matrix specified in D3DXMatrixPerspectiveLH
2*zn/w 0 0 0
0 2*zn/h 0 0
0 0 zf/(zf-zn) 1
0 0 zn*zf/(zn-zf) 0
And applying it to a random x, y, z, 1 (Note for a vertex position w will always be 1) vertex input value you get the following
x' = ((2*zn/w) * x) + (0 * y) + (0 * z) + (0 * w)
y' = (0 * x) + ((2*zn/h) * y) + (0 * z) + (0 * w)
z' = (0 * x) + (0 * y) + ((zf/(zf-zn)) * z) + ((zn*zf/(zn-zf)) * w)
w' = (0 * x) + (0 * y) + (1 * z) + (0 * w)
Instantly you can see that w and z are different. The w coord now just contains the z coordinate passed to the projection matrix. z contains something far more complicated.
So, assume we have an input position of (2, 1, 5, 1), a zn (Z-Near) of 1, a zf (Z-Far) of 10, a w (width) of 1 and an h (height) of 1.
Passing these values through we get
x' = ((2 * 1)/1) * 2
y' = ((2 * 1)/1) * 1
z' = (10/(10-1)) * 5 + ((10 * 1)/(1-10)) * 1
w' = 5
expanding that we then get
x' = 4
y' = 2
z' = 4.4
w' = 5
We then perform final perspective divide and we get
x'' = 0.8
y'' = 0.4
z'' = 0.88
w'' = 1
And now we have our final coordinate position. This assumes that x and y ranges from -1 to 1 and z ranges from 0 to 1. As you can see the vertex is on-screen.
As a bizarre bonus you can see that if |x'| or |y'| or |z'| is larger than |w'| or z' is less than 0 that the vertex is offscreen. This info is used for clipping the triangle to the screen.
Anyway, I think that's a pretty comprehensive answer :D
Edit2: Be warned, I am using ROW major matrices. Column major matrices are transposed.
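The worked example can be reproduced with a few lines of code. This sketch uses the row-vector convention of this answer (vector times matrix) and the matrix layout quoted above; the helper names are made up:

```python
def perspective_lh(w, h, zn, zf):
    """Left-handed perspective matrix in the row-major layout shown above."""
    return [
        [2 * zn / w, 0, 0, 0],
        [0, 2 * zn / h, 0, 0],
        [0, 0, zf / (zf - zn), 1],
        [0, 0, zn * zf / (zn - zf), 0],
    ]


def transform(v, m):
    """Row vector times 4x4 matrix."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def perspective_divide(v):
    """Divide every component by w."""
    return [c / v[3] for c in v]


clip = transform([2, 1, 5, 1], perspective_lh(1, 1, 1, 10))
ndc = perspective_divide(clip)
print([round(c, 3) for c in clip])  # [4.0, 2.0, 4.444, 5.0]
print([round(c, 3) for c in ndc])   # [0.8, 0.4, 0.889, 1.0]
```

The clip-space w ends up equal to the input z, which is why dividing by it makes distant points shrink toward the center of the screen.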
Rotation is specified by a 3 x 3 matrix and translation by a vector. You can perform both transforms in a "single" operation by combining them into a single 3 x 4 matrix:
rx1 rx2 rx3 tx1
ry1 ry2 ry3 ty1
rz1 rz2 rz3 tz1
However as this isn't square there are various operations that can't be performed (inversion for one). By adding an extra row (that does nothing):
0 0 0 1
all these operations become possible (if not easy).
As Goz explains in his answer by making the "1" a non identity value the matrix becomes a perspective transformation.
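To illustrate the combined matrix, here is a small sketch that builds a 4 x 4 matrix from a rotation about the z axis plus a translation and applies it to a point with w = 1. It follows the layout shown in this answer (translation in the fourth column, matrix times column vector); the names `rigid_matrix` and `apply_matrix` are hypothetical:

```python
import math


def rigid_matrix(angle_deg, tx, ty, tz):
    """4x4 matrix: rotate about the z axis, then translate by (tx, ty, tz)."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [
        [c, -s, 0, tx],
        [s,  c, 0, ty],
        [0,  0, 1, tz],
        [0,  0, 0, 1],   # the extra row that makes the matrix square
    ]


def apply_matrix(m, p):
    """Multiply m by the column vector (x, y, z, 1) and drop w."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]
    return out[:3]


print([round(c, 6) for c in apply_matrix(rigid_matrix(90, 10, 0, 0), (1, 0, 0))])
```

The point (1, 0, 0) is rotated to roughly (0, 1, 0) and then shifted by 10 along x. The translation only takes effect because the input's fourth component is 1.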
Clipping is an important part of this process, as it helps to visualize what happens to the geometry. The clipping stage essentially discards any point in a primitive that is outside of a 2-unit cube centered around the origin (OK, you have to reconstruct primitives that are partially clipped but that doesn't matter here).
It would be possible to construct a matrix that directly mapped your world space coordinates to such a cube, but gradual movement from the far plane to the near plane would be linear. That is to say that a move of one foot (towards the viewer) when one mile away from the viewer would cause the same increase in size as a move of one foot when several feet from the camera.
However, if we have another coordinate in our vector (w), we can divide the vector component-wise by w, and our primitives won't exhibit the above behavior, but we can still make them end up inside the 2-unit cube above.
A simple answer would be to say that if you don't tell the pipeline what w is then you haven't given it enough information about your projection. This can be verified directly without understanding what the pipeline does with it...
As you probably know the 4x4 matrix can be split into parts based on what each part does. The 3x3 matrix at the top left is altered when you do rotation or scale operations. The fourth column is altered when you do a translation. If you ever inspect a perspective matrix, it alters the bottom row of the matrix. If you then look at how a Matrix-Vector multiplication is done, you see that the bottom row of the matrix ONLY affects the resultant w component of the vector. So if you don't tell the pipeline about w it won't have all your information.
