Greatest distance between set of longitude/latitude points


I have a set of lng/lat coordinates. What would be an efficient method of calculating the greatest distance between any two points in the set (the "maximum diameter" if you will)?
A naive way is to use the Haversine formula to calculate the distance between every pair of points and take the maximum, but this is O(n^2) and obviously doesn't scale well.
Edit: the points are located on a sufficiently small area, measuring the area in which a person carrying a mobile device was active in the course of a single day.

Theorem #1: The ordering of any two great-circle distances along the surface of the earth is the same as the ordering of the straight-line distances between the same points when you tunnel through the earth.
Hence turn your lat-long into x,y,z based either on a spherical earth of arbitrary radius or an ellipsoid of given shape parameters. That's a couple of sines/cosines per point (not per pair of points).
Now you have a standard 3-d problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3d). Needs a square-root and some squares, and you can leave out the square root if you only care about comparisons.
There may be fancy spatial tree data structures to help with this. Or algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3d methods). Or C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html
Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.
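As a minimal sketch of this approach in Python (a spherical Earth with a mean radius of 6371 km is assumed, and the function names are illustrative rather than taken from any library): each point is converted to a unit vector once, pairs are compared by squared chord length, and Haversine is evaluated only for the winning pair. The pair search here is still brute force; the tree or diameter algorithms linked above would replace that loop.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def to_unit_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def farthest_pair(latlon):
    """Return (i, j, distance_km) for the most distant pair of (lat, lon) points."""
    xyz = [to_unit_xyz(lat, lon) for lat, lon in latlon]

    def chord2(i, j):
        # Squared straight-line ("tunnel") distance; no square root needed.
        return sum((a - b) ** 2 for a, b in zip(xyz[i], xyz[j]))

    i, j = max(combinations(range(len(xyz)), 2), key=lambda ij: chord2(*ij))
    return i, j, haversine_km(latlon[i], latlon[j])
```

Calling `farthest_pair([(0.0, 0.0), (0.0, 1.0), (0.0, 10.0), (5.0, 5.0)])` picks the two points on the equator that are 10 degrees apart.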

I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:
1. Calculate the center of mass M of the points.
2. Find the point P0 that has the maximum distance to M.
3. Find the point P1 that has the maximum distance to P0.
4. Approximate the maximum diameter with the distance between P0 and P1.
This can be generalized by repeating step 3 N times, and taking the distance between PN-1 and PN.
Step 1 can be carried out efficiently approximating M as the average of longitudes and latitudes, which is OK when distances are "small" and the poles are sufficiently far away. The other steps could be carried out using the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be re-calculated with the exact formula.
An example of approximation could be the following: if φ(M) and λ(M) are latitude and longitude of the center of mass calculated as Σφ(P)/n and Σλ(P)/n,
x(P) = (λ(P) - λ(M) + C) cos(φ(P))
y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]
where C is usually 0, but can be ± 360° if the set of points crosses the λ=±180° line. To find the maximum distance you simply have to find
max((x(PN) - x(PN-1))^2 + (y(PN) - y(PN-1))^2)
(you don't need the square root because it is monotonic)
The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...
EDIT:
I hate building on others' solutions, but someone will have to.
Still keeping the above 4 steps, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3,
and following the solution of Spacedman,
doing calculations in 3D overcomes the limitations of closeness and distance from poles:
x(P) = sin(φ(P))
y(P) = cos(φ(P)) sin(λ(P))
z(P) = cos(φ(P)) cos(λ(P))
(the only approximation is that this holds only for a perfect sphere)
The center of mass is given by x(M) = Σx(P)/n, etc.,
and the maximum one has to look for is
max((x(PN) - x(PN-1))^2 + (y(PN) - y(PN-1))^2 + (z(PN) - z(PN-1))^2)
So: you first transform spherical to cartesian coordinates, then start from the center of mass, to find, in at least two steps (steps 2 and 3), the farthest point from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not of much help, either, if the points are spread all over the Earth.
EDIT 2:
I learned enough R to write down the core of the algorithm (nice language for data analysis!)
For the plane approximation, ignoring the problem around the λ=±180° line:
# input: lng, lat (vectors)
rad = pi / 180;
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i = which.max((x - mean(x))^2 + (y )^2)
j = which.max((x - x[i] )^2 + (y - y[i])^2)
# output: i, j (indices)
On my PC it takes less than a second to find the indices i and j for 1000000 points. The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):
# input: lng, lat
rad = pi / 180
x = sin(lat * rad)
f = cos(lat * rad)
y = sin(lng * rad) * f
z = cos(lng * rad) * f
i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
j = which.max((x - x[i] )^2 + (y - y[i] )^2 + (z - z[i] )^2)
k = which.max((x - x[j] )^2 + (y - y[j] )^2 + (z - z[j] )^2) # optional
# output: j, k (or i, j)
The calculation of k can be left out (i.e., the result could be given by i and j), depending on the data and on the requirements. On the other hand, my experiments have shown that calculating a further index is useless.
It should be remembered that, in any case, the distance between the resulting points is an estimate which is a lower bound of the "diameter" of the set, although it very often will be the diameter itself (how often depends on the data.)
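To illustrate the lower-bound remark, here is a rough Python port of the 3D version above, with a brute-force reference to compare against. This is a sketch: the helper names and the `demo_points` test data (points scattered around 48°N, 11°E) are made up for the illustration.

```python
import math
import random


def to_xyz(lat_deg, lon_deg):
    """Same convention as the R code: x = sin(lat), y and z from the longitude."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.sin(lat), math.cos(lat) * math.sin(lon), math.cos(lat) * math.cos(lon))


def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def farthest_from(pts, origin):
    """Index of the point farthest from origin."""
    return max(range(len(pts)), key=lambda idx: dist2(pts[idx], origin))


def heuristic_pair(latlon):
    """Return (j, k, squared chord) found by the centroid-start heuristic."""
    pts = [to_xyz(lat, lon) for lat, lon in latlon]
    n = len(pts)
    centroid = tuple(sum(p[c] for p in pts) / n for c in range(3))
    i = farthest_from(pts, centroid)
    j = farthest_from(pts, pts[i])
    k = farthest_from(pts, pts[j])  # optional extra step
    return j, k, dist2(pts[j], pts[k])


def brute_force_dist2(latlon):
    """Exact squared chord diameter, O(n^2); only for checking small inputs."""
    pts = [to_xyz(lat, lon) for lat, lon in latlon]
    return max(dist2(p, q) for a, p in enumerate(pts) for q in pts[a + 1:])


def demo_points(n=300, seed=0):
    """Hypothetical test data: points scattered around (48N, 11E)."""
    rng = random.Random(seed)
    return [(48 + rng.uniform(-0.2, 0.2), 11 + rng.uniform(-0.3, 0.3)) for _ in range(n)]
```

The heuristic result can never exceed the brute-force value, since it is the distance of an actual pair from the set.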
EDIT 3:
Unfortunately the relative error of the plane approximation can, in extreme cases, be as much as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if very rare. The algorithm can be modified in order to have an upper bound of approximately 20%, which I have derived by compass and straight-edge (the analytic solution is cumbersome). The modified algorithm finds a pair of points with a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:
# input: lng, lat
rad = pi / 180
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i.n_1 = 1 # n_1: n-1
x.n_1 = mean(x)
y.n_1 = 0 # = mean(y)
s.n_1 = 0 # s: square of distance
repeat {
  s = (x - x.n_1)^2 + (y - y.n_1)^2
  i.n = which.max(s)
  x.n = x[i.n]
  y.n = y[i.n]
  s.n = s[i.n]
  if (s.n <= s.n_1) break
  i.n_1 = i.n
  x.n_1 = x.n
  y.n_1 = y.n
  s.n_1 = s.n
}
i.m_1 = 1
x.m_1 = (x.n + x.n_1) / 2
y.m_1 = (y.n + y.n_1) / 2
s.m_1 = 0
m_ok = TRUE
repeat {
  s = (x - x.m_1)^2 + (y - y.m_1)^2
  i.m = which.max(s)
  if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
  x.m = x[i.m]
  y.m = y[i.m]
  s.m = s[i.m]
  if (s.m <= s.m_1) break
  i.m_1 = i.m
  x.m_1 = x.m
  y.m_1 = y.m
  s.m_1 = s.m
}
if (m_ok && s.m > s.n) {
  i = i.m
  j = i.m_1
} else {
  i = i.n
  j = i.n_1
}
# output: i, j
The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).
Comparison of the modified algorithm with the (too) simple algorithm has shown, for normal and for square uniform distributions, a near doubling of processing time, and a reduction of the average error from 0.6% to 0.03% (an order of magnitude). A further restart from the midpoint results in a just slightly better average error, but an almost equal maximum error.
EDIT 4:
I have yet to study this article, but it looks like the 20% I found with compass and straight-edge is in fact 1-1/√(5-2√3) ≅ 19.3%

Here's a naive example that doesn't scale well (as you say), but might help with building a solution in R.
## lonlat points
n <- 100
d <- cbind(runif(n, -180, 180), runif(n, -90, 90))
library(sp)
## distances on WGS84 ellipsoid
x <- spDists(d, longlat = TRUE)
## row, then column index of furthest points
ind <- c(row(x)[which.max(x)], col(x)[which.max(x)])
## maps
library(maptools)
data(wrld_simpl)
plot(as(wrld_simpl, "SpatialLines"), col = "grey")
points(d, pch = 16, cex = 0.5)
## draw the points and a line between on the page
points(d[ind, ], pch = 16)
lines(d[ind, ], lwd = 2)
## for extra credit, draw the great circle on which the furthest points lie
library(geosphere)
lines(greatCircle(d[ind[1], ], d[ind[2], ]), col = "firebrick")
The geosphere package provides more options for distance calculation if that's needed. See ?spDists in sp for the details used here.

You don't tell us whether these points will be located in a sufficiently small part of the globe. For truly global sets of points, my first guess would be running a naive O(n^2) algorithm, possibly getting a performance boost from some spatial indexing (R*-trees, octrees, etc.). The idea is to pre-generate the list of the n*(n-1)/2 pairs in the triangle of the distance matrix and feed it in chunks to a fast distance library to minimize I/O and process churn. Haversine is fine; you could also do it with Vincenty's method (the greatest contributor to running time is the quadratic complexity, not the (fixed number of) iterations in Vincenty's formula). As a side note, you don't actually need R for this stuff.
EDIT #2: The Barequet-Har-Peled algorithm (as pointed at by Spacedman in his reply) has O((n+1/(e^3))log(1/e)) complexity for e>0, and is worth exploring.
For the quasi-planar problem, this is known as the "diameter of a convex hull" and has three parts:
1. Computing the convex hull with Graham's scan, which is O(n*log(n)). In fact, one should first try transforming the points into a transverse Mercator projection (using the centroid of the points in the data set).
2. Finding antipodal points with the Rotating Calipers algorithm, which is linear, O(n).
3. Finding the largest distance among all antipodal pairs, which is a linear search, O(n).
The link with pseudo-code and discussion: http://fredfsh.com/2013/05/03/convex-hull-and-its-diameter/
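The three parts can be sketched in plain Python as follows. Two things are assumptions here: Andrew's monotone chain is swapped in for Graham's scan (same O(n log n) cost, simpler to write), and the points are taken to be planar (x, y) tuples that have already been projected. The function names are illustrative only.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def dist2(a, b):
    """Squared Euclidean distance between two planar points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Drop the last vertex while it does not make a strict left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def hull_diameter(points):
    """Return (squared diameter, (p, q)) using rotating calipers on the hull."""
    hull = convex_hull(points)
    n = len(hull)
    if n == 0:
        raise ValueError("need at least one point")
    best, pair = 0, (hull[0], hull[0])
    j = 1 % n
    for i in range(n):
        ni = (i + 1) % n
        # Advance j while the next vertex is farther from the edge (i, ni).
        while (abs(cross(hull[i], hull[ni], hull[(j + 1) % n]))
               > abs(cross(hull[i], hull[ni], hull[j]))):
            j = (j + 1) % n
        # hull[j] is antipodal to both endpoints of the current edge.
        for p in (hull[i], hull[ni]):
            d = dist2(p, hull[j])
            if d > best:
                best, pair = d, (p, hull[j])
    return best, pair
```

Working with squared distances keeps integer inputs exact; take the square root of the first return value only at the end.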
See also the discussion on a related question here: https://gis.stackexchange.com/questions/17358/how-can-i-find-the-farthest-point-from-a-set-of-existing-points
EDIT: Spacedman's solution pointed me to the Malandain-Boissonnat algorithm (see the paper in pdf here). However, this is the same as or worse than the naive brute-force O(n^2) algorithm.


Understanding & Deriving Jacobian Determinant Scaling Factor

I have been trying to understand Jacobian Determinant.
I hope someone is able to give me a pointer.
Most material that I found on Internet didn't provide
derivation of Jacobian Determinant.
One such web site is:
http://tutorial.math.lamar.edu
(Which I find quite good, otherwise.)
I spent a lot of time trying to deepen my understanding of
Jacobian Determinant.
I played with Transformations that define uv-axes and
how integration of a function over a Region/area would work
with the Transformations.
For example, when I started with simple Transformations of:
u = ( x - y )/√2
v = ( x + y )/2√2
which is uv-axes rotated -45° from Cartesian xy-axes,
and with v-axis at 2 times the scale,
that is, v = 1 maps to 2 units length in xy-coords.
So, I say that uscale = 1, vscale = 2,
for the above transformations.
With this uv-axes, I can simplify a 10x20 rectangle Region
which is rotated at 45° from x-axis,
such that the longer dimension points at 45° from x-axis.
With such examples, I begin to develop intuition
how Jacobian Determinant works.
I understand Jacobian Determinant to be a Scaling Factor
to convert area measurement in uv-axes to xy-dimensions.
Area measurement in uv-axes is given simply by formula
Δu x Δv, where Δu = 10 and Δv = 10 (because vscale = 2).
Jacobian Determinant Scaling Factor = uscale x vscale
(quite intuitively).
Area in xy-dimensions = Δu x Δv x (uscale x vscale)
= 10 x 10 x 1 x 2 = 200.
Integration of volume over such a simpler uv Square,
could be easier than over the same xy Region,
appearing at an angle.
With the above initial understanding,
I am trying to work out how Jacobian Determinant is derived.
Deriving from the above Transformations formula:
dx/du = √2 / 2
dx/dv = √2
dy/du = -√2 / 2
dy/dv = √2
I can also derive from Geometry that:
dx/du = uscale cos Θ
dy/du = uscale sin Θ
dx/dv = vscale cos (90° - Θ)
dy/dv = vscale sin (90° - Θ)
I could get:
areaInXY / areaInUV = uscale x vscale
which matches my understanding.
However, Jacobian Determinant formula is:
∂(x, y) / ∂(u, v) = ∂x/∂u ∂y/∂v - ∂x/∂v ∂y/∂u
= uscale * vscale * cos 2Θ
This leaves me quite puzzled why I have the extra cos 2Θ factor
which isn't making intuitive sense -- why would the
area Scaling Factor depend on how the rectangle is rotated
and thus how uv-axes are rotated?!
Can anybody see where my reasoning went wrong above?

Let me try to explain what the Jacobian determinant basically does. This is true in general for smooth functions mapping from R^n to R^n, but for the sake of simplicity, assume we are working on R^2. Let F(x,y) be a smooth R^2 to R^2 function. Then we can say that F(x,y) sends the x coordinate to f1(x,y) and the y coordinate to f2(x,y) at point (x,y). Then think about an infinitesimal rectangular area, defined by the points (x,y),(x+dx,y),(x,y+dy) and (x+dx,y+dy). Now, the area of this infinitesimal rectangle is dxdy. What happens to this rectangle when it goes through the F(x,y) transformation? We apply F(x,y) to each of the four coordinates and obtain the following points:
A:(x,y)->(f1(x,y),f2(x,y))
B:(x+dx,y) -> (f1(x+dx,y),f2(x+dx,y)) (approx.)= (f1(x,y) + (∂f1/∂x)dx,f2(x,y) + (∂f2/∂x)dx)
C:(x,y+dy) -> (f1(x,y+dy),f2(x,y+dy)) (approx.)= (f1(x,y) + (∂f1/∂y)dy,f2(x,y) + (∂f2/∂y)dy)
D:(x+dx,y+dy) -> (f1(x+dx,y+dy),f2(x+dx,y+dy)) (approx.)=(f1(x,y) + (∂f1/∂x)dx + (∂f1/∂y)dy,f2(x,y) + (∂f2/∂x)dx + (∂f2/∂y)dy)
These approximations hold exactly in the limit where dx and dy go to 0; they are the best linear approximation to the function F at the new points. (We obtain them from the first-order terms of the Taylor expansions of f1 and f2.)
If we look at the new (approximated) area under the transformation F(x,y), we see the following difference vectors between the transformed points:
B-A:((∂f1/∂x)dx,(∂f2/∂x)dx)
C-A:((∂f1/∂y)dy,(∂f2/∂y)dy)
D-C:((∂f1/∂x)dx,(∂f2/∂x)dx)
D-B:((∂f1/∂y)dy,(∂f2/∂y)dy)
As you can see, the newly transformed infinitesimal area is a parallelogram. Let:
u=((∂f1/∂x)dx,(∂f2/∂x)dx)
v=((∂f1/∂y)dy,(∂f2/∂y)dy)
These vectors constitute the edges of our parallelogram. It can be shown with the help of the cross product between u and v, that the area of the parallelogram is:
area^2 = (u1v2 - u2v1)^2 = ((∂f1/∂x)(∂f2/∂y)dxdy - (∂f2/∂x)(∂f1/∂y)dxdy)^2
area^2 = ((∂f1/∂x)(∂f2/∂y) - (∂f2/∂x)(∂f1/∂y))^2 (dxdy)^2
area = |(∂f1/∂x)(∂f2/∂y) - (∂f2/∂x)(∂f1/∂y)|dxdy (dx and dy are positive)
area = |det([∂f1/∂x, ∂f1/∂y],[∂f2/∂x, ∂f2/∂y])|dxdy
So, the matrix whose determinant we take is simply the Jacobian matrix. As I said at the beginning, this derivation can be extended to arbitrary dimension n, provided the coordinate transformation F is smooth and its Jacobian matrix is invertible, i.e. has a non-zero determinant.
A good visual explanation of this is given at: http://mathinsight.org/double_integral_change_variables_introduction
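As a quick numerical sanity check of the area-scaling interpretation (a sketch; the helper names are illustrative), the determinant can be estimated by finite differences for the rotated-and-stretched uv-axes from the question. Inverting u = (x - y)/√2, v = (x + y)/(2√2) gives x = u/√2 + √2·v and y = -u/√2 + √2·v, and the estimate comes out as 2 = uscale × vscale, with no dependence on the rotation angle.

```python
import math


def xy_from_uv(u, v):
    """Inverse of u = (x - y)/sqrt(2), v = (x + y)/(2*sqrt(2))."""
    x = u / math.sqrt(2) + math.sqrt(2) * v
    y = -u / math.sqrt(2) + math.sqrt(2) * v
    return x, y


def jacobian_det(f, u, v, h=1e-6):
    """Central-difference estimate of det d(x, y)/d(u, v) at the point (u, v)."""
    x_u = (f(u + h, v)[0] - f(u - h, v)[0]) / (2 * h)
    y_u = (f(u + h, v)[1] - f(u - h, v)[1]) / (2 * h)
    x_v = (f(u, v + h)[0] - f(u, v - h)[0]) / (2 * h)
    y_v = (f(u, v + h)[1] - f(u, v - h)[1]) / (2 * h)
    return x_u * y_v - x_v * y_u
```

Because this map is linear, the same value is obtained at every (u, v); for a nonlinear map the result would vary from point to point, which is exactly the local scaling factor described above.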

Generating random points on a surface of an n-dimensional torus

I'd like to generate random points being located on the surface of an n-dimensional torus. I have found formulas for how to generate the points on the surface of a 3-dimensional torus:
x = (c + a * cos(v)) * cos(u)
y = (c + a * cos(v)) * sin(u)
z = a * sin(v)
u, v ∈ [0, 2 * pi); c, a > 0.
My question is now: how can these formulas be extended to n dimensions? Any help on the matter would be much appreciated.

I guess that you can do this recursively. Start with a full orthonormal basis of your vector space, and let the current location be the origin. At each step, choose a point in the plane spanned by the first two coordinate vectors, i.e. take w1 = cos(t)*v1 + sin(t)*v2. Shift the other basis vectors, i.e. w2 = v3, w3 = v4, …. Also take a step from your current position in the direction w1, with the radius r1 chosen up front. When you only have a single basis vector remaining, then the current point is a point on the n-dimensional torus of the outermost recursive call.
Note that while the above may be used to choose points randomly, it won't choose them uniformly. That would likely be a much harder question, and you definitely should ask about the math of that on Math SE or perhaps on Cross Validated (Statistics SE) to get the math right before you worry about implementation.

An n-torus (n being the dimensionality of the surface of the torus; a bagel or doughnut is therefore a 2-torus, not a 3-torus) is a smooth mapping of an n-rectangle. One way to approach this is to generate points on the rectangle and then map them onto the torus. Aside from the problem of figuring out how to map a rectangle onto a torus (I don't know it off-hand), there is the problem that the resulting distribution of points on the torus is not uniform even if the distribution of points is uniform on the rectangle. But there must be a way to adjust the distribution on the rectangle to make it uniform on the torus.

Merely generating u and v uniformly will not necessarily sample uniformly from a torus surface. An additional step is needed.
J.F. Williamson, "Random selection of points distributed on curved surfaces", Physics in Medicine & Biology 32(10), 1987, describes a general method of choosing a uniformly random point on a parametric surface. It is an acceptance/rejection method that accepts or rejects each candidate point depending on its stretch factor (norm-of-gradient). To use this method for a parametric surface, several things have to be known about the surface, namely:
x(u, v), y(u, v) and z(u, v), which are functions that generate 3-dimensional coordinates from two dimensional coordinates u and v,
The ranges of u and v,
g(point), the norm of the gradient ("stretch factor") at each point on the surface, and
gmax, the maximum value of g for the entire surface.
For the 3-dimensional torus with the parameterization you give in your question, g and gmax are the following:
g(u, v) = a * (c + cos(v) * a).
gmax = a * (a + c).
The algorithm to generate a uniform random point on the surface of a 3-dimensional torus with torus radius c and tube radius a is then as follows (where RNDEXCRANGE(x,y) returns a number in [x,y) uniformly at random, and RNDRANGE(x,y) returns a number in [x,y] uniformly at random):
// Maximum stretch factor for torus
gmax = a * (a + c)
while true
    u = RNDEXCRANGE(0, pi * 2)
    v = RNDEXCRANGE(0, pi * 2)
    x = cos(u)*(c+cos(v)*a)
    y = sin(u)*(c+cos(v)*a)
    z = sin(v)*a
    // Norm of gradient (stretch factor)
    g = a*abs(c+cos(v)*a)
    if g >= RNDRANGE(0, gmax)
        // Accept the point
        return [x, y, z]
    end
end
If you have n-dimensional torus generating formulas, a similar approach can be used to generate uniform random points on that torus (accept a candidate point if norm-of-gradient equals or exceeds a random number in [0, gmax), where gmax is the maximum norm-of-gradient).
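The pseudocode for the 3-dimensional torus can be sketched in Python with the standard library only. Here `random.uniform` stands in for both RNDEXCRANGE and RNDRANGE (an assumption; it may occasionally return an endpoint, which is harmless in this case), and the function name is made up for the illustration.

```python
import math
import random


def random_torus_point(c, a, rng=random):
    """Uniform random point on a torus with torus radius c and tube radius a."""
    gmax = a * (a + c)  # maximum stretch factor over the whole surface
    while True:
        u = rng.uniform(0.0, 2.0 * math.pi)
        v = rng.uniform(0.0, 2.0 * math.pi)
        ring = c + a * math.cos(v)
        g = a * abs(ring)  # norm of gradient (stretch factor) at (u, v)
        if g >= rng.uniform(0.0, gmax):
            # Accept: candidates are kept with probability g / gmax.
            return (ring * math.cos(u), ring * math.sin(u), a * math.sin(v))
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible. With c = 3 and a = 1, roughly 61% of the points should land on the outer half of the tube (1/2 + a/(πc)), which is a convenient check that the sampling is area-weighted.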

Intersection of two moving line segments (or a moving line segment and a point)

I'm trying to design a 2D physics engine with continuous collision detection. Objects are stored as a list of non-rotating line-segments. Therefore I can detect collisions by finding the collision time between each pair of line segments between any two objects.
I want to find the exact time for an intersection between two moving line-segments that are moving in a constant direction, and it is proving to be difficult.
I have figured out that I can simplify the problem further by finding the collision time between each point on a line-segment and the other line-segment (and vice versa). It's possible that it is computationally inefficient, so a general solution for two line segments would be the ideal answer. I can also ignore the case in which lines are parallel (I want to treat a line/point sharing the same position and velocity as 'no collision').
If the answer is "not possible" to exactly find this intersection time, I would accept it as a solution. Any help on the subject would be appreciated.
EDIT: According to Wikipedia's article on a Line segment, for a line segment with endpoints A = (a_x, a_y) and C = (c_x, c_y), a general equation for the line segment looks like this:
For a line-segment--point intersection, would substituting
p_x + p_v * t for a_x (left-side only, right-side is just p_x)
p_y + p_v * t for a_y (left-side only, right-side is just p_y)
q_x + q_v * t for c_x (left-side only, right-side is just q_x)
q_y + q_v * t for c_y (left-side only, right-side is just q_y)
r_x + r_v * t for x
r_y + r_v * t for y
for a line segment pq [(p_x, p_y), (q_x, q_y)], point r (r_x, r_y), moving at rates of p_v == q_v != r_v be solvable for t? Here's the full equation:

The equation I have up above is incorrect in that it uses the same velocity for both its x and y components.
Since velocity is constant, I can simplify the equation such that the point is moving in reference to the line segment. The number of variables used for velocity is greatly reduced by using v = r_v - qp_v for the velocity of the point r, and 0 for the velocity of each line segment. The equation with the variables plugged in then becomes:
Thanks to WolframAlpha, the equation is then solved for t:
What's interesting is that if you analyze this, it's symmetrical for 3D. Cross product for [x1, y1, 0] and [x2, y2, 0] is [0, 0, x1*y2 - y1*x2]. This equation then translates into:

For a line-segment--point intersection, I can find an interval in which there is a collision (although this interval is larger than the actual time for the collision):
Given a line segment [p, q] moving at velocity v, and a point r with velocity w, direction(w) != direction(v), define three lines L1 = [p, p+v], L2 = [q, q+v], L3 = [r, r+w]. Let t1, t_p and t2, t_q be the intersection times between L1 and L3 and between L2 and L3, respectively. If the interval [t1, t2] overlaps [t_p, t_q], then there is an intersection within the overlap of the two intervals (e.g. the overlap of [-1, 10] and [2, 20] is [2, 10]). If the intervals are disjoint, then there is no collision.
Additionally, if the direction of v and w ARE the same, but not of equal length, then you can find the exact time of collision. Let s be the point r when projected onto the line [p, q]. If this point is in the line segment [p, q], there is a collision at time t1, which can be calculated by dividing the distance between point r and point s by the relative velocity between the point r and the line-segment [p, q].
Using the interval it's possible to get an estimate for the time by using a binary-search--like method of comparing distances between the segment and point at specific times. This is very inefficient, however.
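For comparison, the relative-velocity idea from the earlier answer can be turned into a closed-form time using the 2D cross product. This is a sketch under stated assumptions, not necessarily the formula that answer arrived at: the segment does not rotate, both velocities are constant 2D vectors, and the function and parameter names are hypothetical. In the segment's frame the point moves with v = pt_vel - seg_vel, and it lies on the segment's line when cross(q - p, r + v*t - p) = 0.

```python
def cross(a, b):
    """z-component of the cross product of two 2D vectors."""
    return a[0] * b[1] - a[1] * b[0]


def point_segment_hit_time(p, q, seg_vel, r, pt_vel, eps=1e-12):
    """Time t >= 0 at which the moving point r touches the moving segment [p, q].

    Returns None if there is no hit, or if the relative motion is parallel
    to the segment (treated as 'no collision', as in the question).
    """
    d = (q[0] - p[0], q[1] - p[1])                        # segment direction
    v = (pt_vel[0] - seg_vel[0], pt_vel[1] - seg_vel[1])  # relative velocity
    denom = cross(d, v)
    if abs(denom) < eps:
        return None
    # Solve cross(d, r - p) + t * cross(d, v) = 0 for t.
    t = cross(d, (p[0] - r[0], p[1] - r[1])) / denom
    if t < 0:
        return None
    # Position of the hit relative to p, then its parameter s along [p, q].
    hit = (r[0] + v[0] * t - p[0], r[1] + v[1] * t - p[1])
    s = (hit[0] * d[0] + hit[1] * d[1]) / (d[0] ** 2 + d[1] ** 2)
    return t if 0.0 <= s <= 1.0 else None
```

For two moving segments, one option is to call this for each endpoint of one segment against the other segment (and vice versa) and keep the smallest non-None time.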

Generate a random point within a circle (uniformly)

I need to generate a uniformly random point within a circle of radius R.
I realize that by just picking a uniformly random angle in the interval [0 ... 2π), and uniformly random radius in the interval (0 ... R) I would end up with more points towards the center, since for two given radii, the points in the smaller radius will be closer to each other than for the points in the larger radius.
I found a blog entry on this over here but I don't understand his reasoning. I suppose it is correct, but I would really like to understand from where he gets (2/R^2)×r and how he derives the final solution.
Update: 7 years after posting this question I still hadn't received a satisfactory answer on the actual question regarding the math behind the square root algorithm. So I spent a day writing an answer myself. Link to my answer.

How to generate a random point within a circle of radius R:
r = R * sqrt(random())
theta = random() * 2 * PI
(Assuming random() gives a value between 0 and 1 uniformly)
If you want to convert this to Cartesian coordinates, you can do
x = centerX + r * cos(theta)
y = centerY + r * sin(theta)
Why sqrt(random())?
Let's look at the math that leads up to sqrt(random()). Assume for simplicity that we're working with the unit circle, i.e. R = 1.
The average distance between points should be the same regardless of how far from the center we look. This means for example, that looking on the perimeter of a circle with circumference 2 we should find twice as many points as the number of points on the perimeter of a circle with circumference 1.
Since the circumference of a circle (2πr) grows linearly with r, it follows that the number of random points should grow linearly with r. In other words, the desired probability density function (PDF) grows linearly. Since a PDF should have an area equal to 1 and the maximum radius is 1, we have PDF(x) = 2x.
So we know what the desired density of our random values should look like.
Now: How do we generate such a random value when all we have is a uniform random value between 0 and 1?
We use a trick called inverse transform sampling
From the PDF, create the cumulative distribution function (CDF)
Mirror this along y = x
Apply the resulting function to a uniform value between 0 and 1.
Sounds complicated? Let me insert a blockquote with a little side track that conveys the intuition:
Suppose we want to generate a random point with the following distribution:
That is
1/5 of the points uniformly between 1 and 2, and
4/5 of the points uniformly between 2 and 3.
The CDF is, as the name suggests, the cumulative version of the PDF. Intuitively: While PDF(x) describes the number of random values at x, CDF(x) describes the number of random values less than x.
In this case the CDF would look like:
To see how this is useful, imagine that we shoot bullets from left to right at uniformly distributed heights. As the bullets hit the line, they drop down to the ground:
See how the density of the bullets on the ground corresponds to our desired distribution! We're almost there!
The problem is that for this function, the y axis is the output and the x axis is the input. We can only "shoot bullets from the ground straight up"! We need the inverse function!
This is why we mirror the whole thing; x becomes y and y becomes x:
We call this CDF^-1. To get values according to the desired distribution, we use CDF^-1(random()).
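For the side-track distribution (1/5 of the points between 1 and 2, 4/5 between 2 and 3), the mirrored CDF can be written down directly. A small sketch, with illustrative function names:

```python
import random


def inverse_cdf(u):
    """Mirrored CDF for: 1/5 of the mass uniform on [1, 2], 4/5 uniform on [2, 3]."""
    if u < 0.2:
        return 1.0 + u / 0.2      # the first fifth of u maps onto [1, 2]
    return 2.0 + (u - 0.2) / 0.8  # the remaining 4/5 of u maps onto [2, 3]


def sample(rng=random):
    """Draw one value by feeding a uniform number through the mirrored CDF."""
    return inverse_cdf(rng.random())
```

Drawing many samples and counting how many fall below 2 should give a fraction close to 1/5.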
…so, back to generating random radius values where our PDF equals 2x.
Step 1: Create the CDF:
Since we're working with reals, the CDF is expressed as the integral of the PDF.
CDF(x) = ∫ 2x dx = x^2
Step 2: Mirror the CDF along y = x:
Mathematically this boils down to swapping x and y and solving for y:
CDF: y = x^2
Swap: x = y^2
Solve: y = √x
CDF^-1: y = √x
Step 3: Apply the resulting function to a uniform value between 0 and 1
CDF^-1(random()) = √random()
Which is what we set out to derive :-)
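A short empirical check of the result, as a sketch using only the standard library (the function names and the seeded generator are choices made for this illustration): if the points are uniform over the disk, the fraction that lands within half the radius should be close to the area ratio, 1/4.

```python
import math
import random


def random_point_in_disk(R=1.0, rng=random):
    """Uniform random point in a disk of radius R centred at the origin."""
    r = R * math.sqrt(rng.random())
    theta = rng.random() * 2.0 * math.pi
    return r * math.cos(theta), r * math.sin(theta)


def fraction_within(radius, R=1.0, n=20000, seed=0):
    """Fraction of n sampled points whose distance from the centre is <= radius."""
    rng = random.Random(seed)
    hits = sum(math.hypot(*random_point_in_disk(R, rng)) <= radius for _ in range(n))
    return hits / n
```

Replacing `math.sqrt(rng.random())` with plain `rng.random()` pushes that fraction up to about 1/2, which is the clustering near the centre described in the question.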

Let's approach this like Archimedes would have.
How can we generate a point uniformly in a triangle ABC, where |AB|=|BC|? Let's make this easier by extending to a parallelogram ABCD. It's easy to generate points uniformly in ABCD. We uniformly pick a random point X on AB and Y on BC and choose Z such that XBYZ is a parallelogram. To get a uniformly chosen point in the original triangle we just fold any points that appear in ADC back down to ABC along AC.
Now consider a circle. In the limit we can think of it as infinitely many isosceles triangles ABC with B at the origin and A and C on the circumference vanishingly close to each other. We can pick one of these triangles simply by picking an angle theta. So we now need to generate a distance from the center by picking a point in the sliver ABC. Again, extend to ABCD, where D is now twice the radius from the circle center.
Picking a random point in ABCD is easy using the above method. Pick a random point on AB. Uniformly pick a random point on BC. I.e., pick a pair of random numbers x and y uniformly on [0,R] giving distances from the center. Our triangle is a thin sliver so AB and BC are essentially parallel. So the point Z is simply a distance x+y from the origin. If x+y>R we fold back down.
Here's the complete algorithm for R=1. I hope you agree it's pretty simple. It uses trig, but you can give a guarantee on how long it'll take, and how many random() calls it needs, unlike rejection sampling.
t = 2*pi*random()
u = random()+random()
r = if u>1 then 2-u else u
[r*cos(t), r*sin(t)]
Here it is in Mathematica.
f[] := Block[{u, t, r},
u = Random[] + Random[];
t = Random[] 2 Pi;
r = If[u > 1, 2 - u, u];
{r Cos[t], r Sin[t]}
]
ListPlot[Table[f[], {10000}], AspectRatio -> Automatic]

Here is a fast and simple solution.
Pick two random numbers in the range (0, 1), namely a and b. If b < a, swap them. Your point is (b*R*cos(2*pi*a/b), b*R*sin(2*pi*a/b)).
You can think about this solution as follows. If you took the circle, cut it, then straightened it out, you'd get a right-angled triangle. Scale that triangle down, and you'd have a triangle from (0, 0) to (1, 0) to (1, 1) and back again to (0, 0). All of these transformations change the density uniformly. What you've done is uniformly picked a random point in the triangle and reversed the process to get a point in the circle.
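A quick empirical comparison (a sketch with a seeded standard-library generator; all names are illustrative) of the radius produced by this trick, which is the larger of the two numbers, against the folded sum and sqrt(random()) from the other answers. For R = 1, all three should put about a quarter of the points within radius 0.5.

```python
import math
import random


def radius_sqrt(rng):
    """Inverse-transform radius: sqrt of one uniform number."""
    return math.sqrt(rng.random())


def radius_fold(rng):
    """Sum of two uniform numbers, folded back into [0, 1]."""
    u = rng.random() + rng.random()
    return 2.0 - u if u > 1.0 else u


def radius_max(rng):
    """Larger of two uniform numbers (the 'swap' trick)."""
    return max(rng.random(), rng.random())


def fraction_below(sampler, r, n=20000, seed=0):
    """Fraction of n sampled radii that are smaller than r."""
    rng = random.Random(seed)
    return sum(sampler(rng) < r for _ in range(n)) / n
```

This only compares the radius distributions; the angle still has to be drawn uniformly, as each answer describes.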

Note that with a uniformly chosen radius the point density is inversely proportional to the radius; hence, instead of picking r from [0, r_max], pick it from [0, r_max^2], then compute your coordinates as:
x = sqrt(r) * cos(angle)
y = sqrt(r) * sin(angle)
This will give you uniform point distribution on a disk.
http://mathworld.wolfram.com/DiskPointPicking.html

Think about it this way. Suppose you have a rectangle where one axis is radius and the other is angle, and you take the points inside this rectangle that are near radius 0. These will all fall very close to the origin (that is, close together on the circle). However, the points near radius R will all fall near the edge of the circle (that is, far apart from each other).
This might give you some idea of why you are getting this behavior.
The factor that's derived on that link tells you how much corresponding area in the rectangle needs to be adjusted to not depend on the radius once it's mapped to the circle.
Edit: So what he writes in the link you share is, "That’s easy enough to do by calculating the inverse of the cumulative distribution, and we get for r:".
The basic premise is here that you can create a variable with a desired distribution from a uniform by mapping the uniform by the inverse function of the cumulative distribution function of the desired probability density function. Why? Just take it for granted for now, but this is a fact.
Here's my somewhat intuitive explanation of the math. The density function f(r) with respect to r has to be proportional to r itself. This fact is covered in any basic calculus book; see the sections on polar area elements. Some other posters have mentioned this.
So we'll call it f(r) = C*r;
This turns out to be most of the work. Now, since f(r) should be a probability density, you can easily see that by integrating f(r) over the interval (0,R) you get that C = 2/R^2 (this is an exercise for the reader.)
Thus, f(r) = 2*r/R^2
OK, so that's how you get the formula in the link.
Then, the final part is going from the uniform random variable u in (0,1) you must map by the inverse function of the cumulative distribution function from this desired density f(r). To understand why this is the case you need to find an advanced probability text like Papoulis probably (or derive it yourself.)
Integrating f(r) you get F(r) = r^2/R^2
To find the inverse function of this you set u = r^2/R^2 and then solve for r, which gives you r = R * sqrt(u)
This totally makes sense intuitively too: u = 0 should map to r = 0, and u = 1 should map to r = R. Also, it goes by the square root function, which makes sense and matches the link.
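The two steps skipped above (the normalization left as an exercise and the inversion of the CDF) can be written out as a short worked derivation. It uses only the symbols already introduced: f(r) = C*r on (0, R), the CDF F(r), and the uniform variable u.

```latex
\int_0^R C\,r\,dr = \frac{C R^2}{2} = 1
\;\Longrightarrow\; C = \frac{2}{R^2},
\qquad
F(r) = \int_0^r \frac{2s}{R^2}\,ds = \frac{r^2}{R^2},
\qquad
u = F(r) \;\Longrightarrow\; r = R\sqrt{u}.
```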

Let ρ (radius) and φ (azimuth) be two random variables corresponding to polar coordinates of an arbitrary point inside the circle. If the points are uniformly distributed then what is the distribution function of ρ and φ?
For any r: 0 < r < R the probability of radius coordinate ρ to be less than r is
P[ρ < r] = P[point is within a circle of radius r] = S1 / S0 = (r/R)**2
Where S1 and S0 are the areas of circle of radius r and R respectively.
So the CDF can be given as:
CDF(r) = 0 if r <= 0
CDF(r) = (r/R)**2 if 0 < r <= R
CDF(r) = 1 if r > R
And PDF:
PDF = d/dr(CDF) = 2 * (r/R**2) (0 < r <= R).
Note that for R=1 the random variable sqrt(X), where X is uniform on [0, 1), has exactly this CDF (because P[sqrt(X) < y] = P[X < y**2] = y**2 for 0 < y <= 1).
The distribution of φ is obviously uniform from 0 to 2*π. Now you can create random polar coordinates and convert them to Cartesian using trigonometric equations:
x = ρ * cos(φ)
y = ρ * sin(φ)
I can't resist posting Python code for R=1.
from matplotlib import pyplot as plt
import numpy as np
rho = np.sqrt(np.random.uniform(0, 1, 5000))
phi = np.random.uniform(0, 2*np.pi, 5000)
x = rho * np.cos(phi)
y = rho * np.sin(phi)
plt.scatter(x, y, s = 4)
You will get a scatter plot of 5000 points filling the unit disc evenly.

The reason why the naive solution doesn't work is that it gives a higher probability density to the points closer to the circle center. In other words, with the naive method the circle of radius R/2 has probability 1/2 of containing the selected point, but it has area (number of points) pi*R^2/4, which is only 1/4 of the total area pi*R^2.
Therefore we want a radius probability density to have the following property:
The probability of choosing a radius smaller or equal to a given r has to be proportional to the area of the circle with radius r. (because we want to have a uniform distribution on the points and larger areas mean more points)
In other words we want the probability of choosing a radius between [0,r] to be equal to its share of the overall area of the circle. The total circle area is pi*R^2, and the area of the circle with radius r is pi*r^2. Thus we would like the probability of choosing a radius between [0,r] to be (pi*r^2)/(pi*R^2) = r^2/R^2.
Now comes the math:
The probability of choosing a radius between [0,r] is the integral of p(r) dr from 0 to r (that's just because we add all the probabilities of the smaller radii). Thus we want integral(p(r)dr) = r^2/R^2. We can clearly see that R^2 is a constant, so all we need to do is figure out which p(r), when integrated would give us something like r^2. The answer is clearly r * constant. integral(r * constant dr) = r^2/2 * constant. This has to be equal to r^2/R^2, therefore constant = 2/R^2. Thus you have the probability distribution p(r) = r * 2/R^2
Note: Another more intuitive way to think about the problem is to imagine that you are trying to give each circle of radius r a probability density equal to the proportion of the number of points it has on its circumference. Thus a circle which has radius r will have 2 * pi * r "points" on its circumference. The total number of points is pi * R^2. Thus you should give the circle r a probability equal to (2 * pi * r) / (pi * R^2) = 2 * r/R^2. This is much easier to understand and more intuitive, but it's not quite as mathematically sound.
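A quick numerical check of the argument above, as a minimal sketch assuming NumPy is available: it compares how many samples land inside the inner circle of radius R/2 for the naive radius (uniform on [0, R]) and for the radius with density 2r/R^2. The names `fraction_inside`, `naive_r` and `sqrt_r` are made up for this illustration.

```python
import numpy as np

def fraction_inside(radii, r):
    """Fraction of the sampled radii that are <= r."""
    return float(np.mean(radii <= r))

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
R = 1.0
n = 200_000
u = rng.uniform(0.0, 1.0, n)

naive_r = R * u            # radius uniform on [0, R]
sqrt_r = R * np.sqrt(u)    # radius with density 2r/R^2

naive_frac = fraction_inside(naive_r, R / 2)
sqrt_frac = fraction_inside(sqrt_r, R / 2)
print(naive_frac, sqrt_frac)
```

The inner circle holds 1/4 of the area, so the second fraction should come out near 0.25, while the naive one should come out near 0.5.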

It really depends on what you mean by 'uniformly random'. This is a subtle point and you can read more about it on the wiki page here: http://en.wikipedia.org/wiki/Bertrand_paradox_%28probability%29, where the same problem, giving different interpretations to 'uniformly random' gives different answers!
Depending on how you choose the points, the distribution could vary, even though they are uniformly random in some sense.
It seems like the blog entry is trying to make it uniformly random in the following sense: If you take a sub-circle of the circle, with the same center, then the probability that the point falls in that region is proportional to the area of the region. That, I believe, is attempting to follow the now standard interpretation of 'uniformly random' for 2D regions with areas defined on them: probability of a point falling in any region (with area well defined) is proportional to the area of that region.

Here is my Python code to generate num random points from a circle of radius rad:
import matplotlib.pyplot as plt
import numpy as np
rad = 10
num = 1000
t = np.random.uniform(0.0, 2.0*np.pi, num)
r = rad * np.sqrt(np.random.uniform(0.0, 1.0, num))
x = r * np.cos(t)
y = r * np.sin(t)
plt.plot(x, y, "ro", ms=1)
plt.axis([-15, 15, -15, 15])
plt.show()

I think that in this case using polar coordinates complicates the problem. It would be much easier to pick random points in a square with sides of length 2R and then keep only the points (x,y) such that x^2+y^2<=R^2.
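The square-then-filter idea could be sketched with NumPy roughly as follows. The function name `sample_disc_rejection` and the batch size are arbitrary choices for this illustration, not part of the answer above.

```python
import numpy as np

def sample_disc_rejection(n, R=1.0, rng=None):
    """Return an (n, 2) array of points uniform in the disc of radius R."""
    rng = np.random.default_rng() if rng is None else rng
    chunks = []
    have = 0
    while have < n:
        # Draw candidates uniformly from the square [-R, R] x [-R, R].
        cand = rng.uniform(-R, R, size=(2 * (n - have) + 16, 2))
        # Keep only the candidates that fall inside the circle.
        keep = cand[(cand ** 2).sum(axis=1) <= R * R]
        chunks.append(keep)
        have += len(keep)
    return np.concatenate(chunks)[:n]

pts = sample_disc_rejection(1000, R=10.0, rng=np.random.default_rng(1))
print(pts.shape)
```

Drawing candidates in batches keeps the loop short; about pi/4 of each batch survives the filter.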

Solution in Java and the distribution example (2000 points)
public void getRandomPointInCircle() {
double t = 2 * Math.PI * Math.random();
double r = Math.sqrt(Math.random());
double x = r * Math.cos(t);
double y = r * Math.sin(t);
System.out.println(x);
System.out.println(y);
}
based on previous solution https://stackoverflow.com/a/5838055/5224246 from #sigfpe

I once used this method:
This may be totally unoptimized (i.e. it uses an array of points, so it's unusable for big circles) but it gives a random enough distribution. You could skip the creation of the matrix and draw directly if you wish to. The method is to randomize all points in a rectangle that fall inside the circle.
bool[,] getMatrix(System.Drawing.Rectangle r) {
bool[,] matrix = new bool[r.Width, r.Height];
return matrix;
}
void fillMatrix(ref bool[,] matrix, Vector center) {
double radius = center.X;
Random r = new Random();
for (int y = 0; y < matrix.GetLength(0); y++) {
for (int x = 0; x < matrix.GetLength(1); x++)
{
double distance = (center - new Vector(x, y)).Length;
if (distance < radius) {
matrix[x, y] = r.NextDouble() > 0.5;
}
}
}
}
private void drawMatrix(Vector centerPoint, double radius, bool[,] matrix) {
var g = this.CreateGraphics();
Bitmap pixel = new Bitmap(1,1);
pixel.SetPixel(0, 0, Color.Black);
for (int y = 0; y < matrix.GetLength(0); y++)
{
for (int x = 0; x < matrix.GetLength(1); x++)
{
if (matrix[x, y]) {
g.DrawImage(pixel, new PointF((float)(centerPoint.X - radius + x), (float)(centerPoint.Y - radius + y)));
}
}
}
g.Dispose();
}
private void button1_Click(object sender, EventArgs e)
{
System.Drawing.Rectangle r = new System.Drawing.Rectangle(100,100,200,200);
double radius = r.Width / 2;
Vector center = new Vector(r.Left + radius, r.Top + radius);
Vector normalizedCenter = new Vector(radius, radius);
bool[,] matrix = getMatrix(r);
fillMatrix(ref matrix, normalizedCenter);
drawMatrix(center, radius, matrix);
}

First we generate a cdf[x] which is
The probability that a point is less than distance x from the centre of the circle. Assume the circle has a radius of R.
obviously if x is zero then cdf[0] = 0
obviously if x is R then the cdf[R] = 1
obviously if x = r then the cdf[r] = (Pi r^2)/(Pi R^2)
This is because each "small area" on the circle has the same probability of being picked, so the probability is proportional to the area in question. And the area within a distance x from the centre of the circle is Pi x^2
so cdf[x] = x^2/R^2 because the Pi cancel each other out
we have cdf[x]=x^2/R^2 where x goes from 0 to R
So we solve for x
R^2 cdf[x] = x^2
x = R Sqrt[ cdf[x] ]
We can now replace cdf with a random number from 0 to 1
x = R Sqrt[ RandomReal[{0,1}] ]
Finally
r = R Sqrt[ RandomReal[{0,1}] ];
theta = 360 deg * RandomReal[{0,1}];
{r,theta}
we get the polar coordinates
{0.601168 R, 311.915 deg}

This might help people interested in choosing an algorithm for speed; the fastest method is (probably?) rejection sampling.
Just generate a point within the square [-1, 1] x [-1, 1] and reject it until it is inside the unit circle. E.g. (in Python),
import random

def sample(r=1):
    while True:
        # candidate point in the square [-1, 1] x [-1, 1]
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x*x + y*y <= 1:
            # scale the accepted point to radius r
            return (x * r, y * r)
Although it may run more than once or twice sometimes (and it is not constant time or suited for parallel execution), it is much faster because it doesn't use expensive functions like sin or cos.
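How often the rejection loop above repeats can be estimated directly: a candidate is accepted with probability pi/4 (area of the unit disc over area of the square), so the expected number of tries is 4/pi, about 1.27. A small sketch that counts tries, with `sample_with_count` being a name invented for this example:

```python
import random

def sample_with_count(rng):
    """Rejection-sample one point in the unit disc; also return the number of tries."""
    tries = 0
    while True:
        tries += 1
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return (x, y), tries

rng = random.Random(0)   # fixed seed, only for reproducibility
n = 100_000
mean_tries = sum(sample_with_count(rng)[1] for _ in range(n)) / n
print(mean_tries)  # expected to be close to 4/pi
```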

The area element in a circle is dA = r dr dphi. That extra factor r is what breaks the idea of choosing r and phi uniformly. While phi is distributed flat, r is not: its density is proportional to r (i.e. you are more likely to land near the boundary than near "the bull's eye").
So to generate points evenly distributed over the circle pick phi from a flat distribution and r from a distribution with density proportional to r.
Alternatively use the Monte Carlo method proposed by Mehrdad.
EDIT
To pick a random r with density proportional to r you could pick a random x from the interval [0, 1] and calculate r=R*sqrt(x).
To calculate a random phi pick a random x from the interval [0, 1] and calculate phi=2*pi*x.

You can also use your intuition.
The area of a circle is pi*r^2
For r=1
This gives us an area of pi. Let us assume that we have some kind of function f that would uniformly distribute N=10 points inside a circle. The ratio here is 10 / pi
Now we double the area and the number of points
For r=2 and N=20
This gives an area of 4pi and the ratio is now 20/4pi or 10/2pi. The ratio will get smaller and smaller the bigger the radius is, because its growth is quadratic and the N scales linearly.
To fix this we can just say
x = r^2
sqrt(x) = r
If you would generate a vector in polar coordinates like this
length = random_0_1();
angle = random_0_2pi();
More points would land around the center.
length = sqrt(random_0_1());
angle = random_0_2pi();
length is not uniformly distributed anymore, but the vector will now be uniformly distributed.

There is a linear relationship between the radius and the number of points "near" that radius, so he needs to use a radius distribution that also makes the number of data points near a radius r proportional to r.

I don't know if this question is still open for a new solution with all the answer already given, but I happened to have faced exactly the same question myself. I tried to "reason" with myself for a solution, and I found one. It might be the same thing as some have already suggested here, but anyway here it is:
In order for two elements of the circle's surface to be equal, assuming equal dr's, we must have dtheta1/dtheta2 = r2/r1. Writing the expression of the probability for that element as P(r, theta) = P{ r1 < r < r1 + dr, theta1 < theta < theta1 + dtheta1 } = f(r,theta)*dr*dtheta1, and setting the two probabilities (for r1 and r2) equal, we arrive at (assuming r and theta are independent) f(r1)/r1 = f(r2)/r2 = constant, which gives f(r) = c*r. The rest, determining the constant c, follows from the condition that f(r) is a PDF.

I am still not sure about the exact '(2/R^2)×r' but what is apparent is the number of points required to be distributed in given unit 'dr' i.e. increase in r will be proportional to r^2 and not r.
Check it this way: the number of points at some angle theta and between r (0.1r to 0.2r) i.e. fraction of the r and number of points between r (0.6r to 0.7r) would be equal if you use standard generation, since the difference is only 0.1r between two intervals. But since the area covered between points (0.6r to 0.7r) will be much larger than the area covered between 0.1r to 0.2r, the equal number of points will be sparsely spaced in the larger area. This I assume you already know. So the function to generate the random points must not be linear but quadratic (since number of points required to be distributed in given unit 'dr' i.e. increase in r will be proportional to r^2 and not r), so in this case it will be inverse of quadratic, since the delta we have (0.1r) in both intervals must be square of some function so it can act as seed value for linear generation of points (since afterwards, this seed is used linearly in sin and cos function). So we know dr must be a quadratic value, and to make this seed quadratic we need to originate these values from the square root of r, not r itself. I hope this makes it a little more clear.

Such a fun problem.
The rationale of the probability of a point being chosen lowering as distance from the axis origin increases is explained multiple times above. We account for that by taking the root of U[0,1].
Here's a general solution for a positive r in Python 3.
import numpy
import math
import matplotlib.pyplot as plt
def sq_point_in_circle(r):
    """
    Generate a random point in an r radius circle
    centered around the start of the axis
    """
    t = 2*math.pi*numpy.random.uniform()
    R = (numpy.random.uniform(0,1) ** 0.5) * r
    return(R*math.cos(t), R*math.sin(t))
R = 200 # Radius
N = 1000 # Samples
points = numpy.array([sq_point_in_circle(R) for i in range(N)])
plt.scatter(points[:, 0], points[:,1])

A programmer solution:
Create a bit map (a matrix of boolean values). It can be as large as you want.
Draw a circle in that bit map.
Create a lookup table of the circle's points.
Choose a random index in this lookup table.
const int RADIUS = 64;
const int MATRIX_SIZE = RADIUS * 2;
bool matrix[MATRIX_SIZE][MATRIX_SIZE] = {0};
struct Point { int x; int y; };
Point lookupTable[MATRIX_SIZE * MATRIX_SIZE];
int numberOfOnBits = 0; // number of valid entries in lookupTable
void init()
{
    numberOfOnBits = 0;
    for (int x = 0 ; x < MATRIX_SIZE ; ++x)
    {
        for (int y = 0 ; y < MATRIX_SIZE ; ++y)
        {
            // offsets from the circle's centre at (RADIUS, RADIUS)
            int dx = x - RADIUS;
            int dy = y - RADIUS;
            if (dx * dx + dy * dy < RADIUS * RADIUS)
            {
                matrix[x][y] = true;
                lookupTable[numberOfOnBits].x = x;
                lookupTable[numberOfOnBits].y = y;
                ++numberOfOnBits;
            } // if
        } // for
    } // for
} // ()
Point choose()
{
    // randomInt(n) is assumed to return a uniform integer in [0, n)
    int randomIndex = randomInt(numberOfOnBits);
    return lookupTable[randomIndex];
} // ()
The bitmap is only necessary for the explanation of the logic. This is the code without the bitmap:
const int RADIUS = 64;
const int MATRIX_SIZE = RADIUS * 2;
struct Point { int x; int y; };
Point lookupTable[MATRIX_SIZE * MATRIX_SIZE];
int numberOfOnBits = 0; // number of valid entries in lookupTable
void init()
{
    numberOfOnBits = 0;
    for (int x = 0 ; x < MATRIX_SIZE ; ++x)
    {
        for (int y = 0 ; y < MATRIX_SIZE ; ++y)
        {
            // offsets from the circle's centre at (RADIUS, RADIUS)
            int dx = x - RADIUS;
            int dy = y - RADIUS;
            if (dx * dx + dy * dy < RADIUS * RADIUS)
            {
                lookupTable[numberOfOnBits].x = x;
                lookupTable[numberOfOnBits].y = y;
                ++numberOfOnBits;
            } // if
        } // for
    } // for
} // ()
Point choose()
{
    // randomInt(n) is assumed to return a uniform integer in [0, n)
    int randomIndex = randomInt(numberOfOnBits);
    return lookupTable[randomIndex];
} // ()
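For comparison, the same lookup-table idea could be sketched in Python. This is an illustrative rendering, not a translation of the code above: it centres the circle on the origin and uses `random.choice` in place of the unspecified `randomInt` helper.

```python
import random

RADIUS = 64

# All integer grid points whose offset from the centre lies inside the circle.
lookup_table = [
    (x, y)
    for x in range(-RADIUS, RADIUS + 1)
    for y in range(-RADIUS, RADIUS + 1)
    if x * x + y * y < RADIUS * RADIUS
]

def choose(rng=random):
    """Pick one precomputed grid point uniformly at random."""
    return rng.choice(lookup_table)

print(len(lookup_table), choose())
```

The table holds roughly pi * RADIUS^2 entries, so memory grows quadratically with the radius, and the points are restricted to the integer grid.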

1) Choose a random X between -1 and 1.
var X:Number = Math.random() * 2 - 1;
2) Using the circle formula, calculate the maximum and minimum values of Y given that X and a radius of 1:
var YMin:Number = -Math.sqrt(1 - X * X);
var YMax:Number = Math.sqrt(1 - X * X);
3) Choose a random Y between those extremes:
var Y:Number = Math.random() * (YMax - YMin) + YMin;
4) Incorporate your location and radius values in the final value:
var finalX:Number = X * radius + pos.x;
var finalY:Number = Y * radius + pos.y;
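One way to check whether steps 1-3 above give an area-uniform distribution is to compare the share of samples with |X| < 0.5 against the share of the unit disc's area in that strip. The sketch below does this in Python; `sample_chord` is a name made up here, and the strip width 0.5 is an arbitrary choice.

```python
import math
import random

def sample_chord(rng):
    """Steps 1-3 for the unit circle: X uniform, then Y uniform on the chord at X."""
    x = rng.uniform(-1.0, 1.0)
    half = math.sqrt(1.0 - x * x)
    y = rng.uniform(-half, half)
    return x, y

rng = random.Random(0)   # fixed seed, only for reproducibility
n = 200_000
inside_strip = sum(1 for _ in range(n) if abs(sample_chord(rng)[0]) < 0.5)
sampled_frac = inside_strip / n

# Area of the part of the unit disc with |x| < a, divided by the disc area pi.
a = 0.5
area_frac = 2.0 * (a * math.sqrt(1.0 - a * a) + math.asin(a)) / math.pi
print(sampled_frac, area_frac)
```

Because X is drawn uniformly, the sampled share is about 0.5, while the strip holds about 0.61 of the area. This suggests the method puts too many points near the left and right edges, where the chords are short.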

Projecting to a 2D Plane for Barycentric Calculations

I have three vertices which make up a plane/polygon in 3D Space, v0, v1 & v2.
To calculate barycentric co-ordinates for a 3D point upon this plane I must first project both the plane and point into 2D space.
After trawling the web I have a good understanding of how to calculate barycentric co-ordinates in 2D space, but I am stuck at finding the best way to project my 3D points into a suitable 2D plane.
It was suggested to me that the best way to achieve this was to "drop the axis with the smallest projection". Without testing the area of the polygon formed when projected on each world axis (xy, yz, xz) how can I determine which projection is best (has the largest area), and therefore is most suitable for calculating the most accurate barycentric co-ordinate?

Example of computation of barycentric coordinates in 3D space as requested by the OP. Given:
3D points v0, v1, v2 that define the triangle
3D point p that lies on the plane defined by v0, v1 and v2 and inside the triangle spanned by the same points.
"x" denotes the cross product between two 3D vectors.
"len" denotes the length of a 3D vector.
"u", "v", "w" are the barycentric coordinates belonging to v0, v1 and v2 respectively.
triArea = len((v1 - v0) x (v2 - v0)) * 0.5
u = ( len((v1 - p ) x (v2 - p )) * 0.5 ) / triArea
v = ( len((v0 - p ) x (v2 - p )) * 0.5 ) / triArea
w = ( len((v0 - p ) x (v1 - p )) * 0.5 ) / triArea
=> p == u * v0 + v * v1 + w * v2
The cross product is defined like this:
v0 x v1 := { v0.y * v1.z - v0.z * v1.y,
v0.z * v1.x - v0.x * v1.z,
v0.x * v1.y - v0.y * v1.x }
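The area-ratio formulas above could be sketched with NumPy as follows. The function name `barycentric_by_area` and the test triangle are made up for illustration; as stated above, p is assumed to lie inside the triangle, since the unsigned areas cannot distinguish outside points.

```python
import numpy as np

def barycentric_by_area(p, v0, v1, v2):
    """Barycentric coordinates (u, v, w) of p, assumed inside triangle v0 v1 v2."""
    tri_area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0))
    u = 0.5 * np.linalg.norm(np.cross(v1 - p, v2 - p)) / tri_area
    v = 0.5 * np.linalg.norm(np.cross(v0 - p, v2 - p)) / tri_area
    w = 0.5 * np.linalg.norm(np.cross(v0 - p, v1 - p)) / tri_area
    return u, v, w

# Hypothetical triangle and a point built from known weights.
v0 = np.array([1.0, 0.0, 0.0])
v1 = np.array([0.0, 2.0, 0.0])
v2 = np.array([0.0, 0.0, 3.0])
p = 0.2 * v0 + 0.3 * v1 + 0.5 * v2

u, v, w = barycentric_by_area(p, v0, v1, v2)
print(u, v, w)
```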

WARNING - Almost everything I know about using barycentric coordinates, and using matrices to solve linear equations, was learned last night because I found this question so interesting. So the following may be wrong, wrong, wrong - but some test values I have put in do seem to work.
Guys and girls, please feel free to rip this apart if I screwed up completely - but here goes.
Finding barycentric coords in 3D space (with a little help from Wikipedia)
Given:
v0 = (x0, y0, z0)
v1 = (x1, y1, z1)
v2 = (x2, y2, z2)
p = (xp, yp, zp)
Find the barycentric coordinates:
b0, b1, b2 of point p relative to the triangle defined by v0, v1 and v2
Knowing that:
xp = b0*x0 + b1*x1 + b2*x2
yp = b0*y0 + b1*y1 + b2*y2
zp = b0*z0 + b1*z1 + b2*z2
Which can be written as
[xp]      [x0]      [x1]      [x2]
[yp] = b0*[y0] + b1*[y1] + b2*[y2]
[zp]      [z0]      [z1]      [z2]
or
[xp]   [x0 x1 x2]   [b0]
[yp] = [y0 y1 y2] . [b1]
[zp]   [z0 z1 z2]   [b2]
re-arranged as
                 -1
[b0]   [x0 x1 x2]     [xp]
[b1] = [y0 y1 y2]   . [yp]
[b2]   [z0 z1 z2]     [zp]
the determinant of the 3x3 matrix is:
det = x0(y1*z2 - y2*z1) + x1(y2*z0 - z2*y0) + x2(y0*z1 - y1*z0)
its adjoint is
[y1*z2-y2*z1 x2*z1-x1*z2 x1*y2-x2*y1]
[y2*z0-y0*z2 x0*z2-x2*z0 x2*y0-x0*y2]
[y0*z1-y1*z0 x1*z0-x0*z1 x0*y1-x1*y0]
giving:
[b0] [y1*z2-y2*z1 x2*z1-x1*z2 x1*y2-x2*y1] [xp]
[b1] = ( [y2*z0-y0*z2 x0*z2-x2*z0 x2*y0-x0*y2] . [yp] ) / det
[b2] [y0*z1-y1*z0 x1*z0-x0*z1 x0*y1-x1*y0] [zp]
If you need to test a number of points against the triangle, stop here. Calculate the above 3x3 matrix once for the triangle (dividing it by the determinant as well), and then dot product that result to each point to get the barycentric coords for each point.
If you are only doing it once per triangle, then here is the above multiplied out (courtesy of Maxima):
b0 = ((x1*y2-x2*y1)*zp+xp*(y1*z2-y2*z1)+yp*(x2*z1-x1*z2)) / det
b1 = ((x2*y0-x0*y2)*zp+xp*(y2*z0-y0*z2)+yp*(x0*z2-x2*z0)) / det
b2 = ((x0*y1-x1*y0)*zp+xp*(y0*z1-y1*z0)+yp*(x1*z0-x0*z1)) / det
That's quite a few additions, subtractions and multiplications - three divisions - but no sqrts or trig functions. It obviously does take longer than the pure 2D calcs, but depending on the complexity of your projection heuristics and calcs, this might end up being the fastest route.
As I mentioned - I have no idea what I'm talking about - but maybe this will work, or maybe someone else can come along and correct it.
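The matrix equation above can also be handed to a linear solver instead of expanding the adjoint by hand. A minimal sketch with NumPy, where `barycentric_by_solve` and the sample triangle are names and values invented for this example:

```python
import numpy as np

def barycentric_by_solve(p, v0, v1, v2):
    """Solve [v0 v1 v2] . b = p for b = (b0, b1, b2)."""
    # The vertices are the columns of the 3x3 matrix, as in the equations above.
    m = np.column_stack((v0, v1, v2))
    return np.linalg.solve(m, p)

# Hypothetical triangle and a point built from known weights.
v0 = np.array([1.0, 0.0, 0.0])
v1 = np.array([0.0, 2.0, 0.0])
v2 = np.array([0.0, 0.0, 3.0])
p = 0.2 * v0 + 0.3 * v1 + 0.5 * v2

b = barycentric_by_solve(p, v0, v1, v2)
print(b)
```

One caveat worth keeping in mind: the determinant is zero whenever v0, v1 and v2 are linearly dependent, which happens when the triangle's plane passes through the origin. In that case `np.linalg.solve` raises `LinAlgError`, and the hand-expanded formulas divide by zero.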

Update: Disregard, this approach does not work in all cases
I think I have found a valid solution to this problem.
NB: I require a projection to 2D space rather than working with 3D Barycentric co-ordinates as I am challenged to make the most efficient algorithm possible. The additional overhead incurred by finding a suitable projection plane should still be smaller than the overhead incurred when using more complex operations such as sqrt or sin() cos() functions (I guess I could use lookup tables for sin/cos but this would increase the memory footprint and defeats the purpose of this assignment).
My first attempts found the delta between the min/max values on each axis of the polygon, then eliminated the axis with the smallest delta. However, as suggested by #PeterTaylor there are cases where dropping the axis with the smallest delta can yield a straight line rather than a triangle when projected into 2D space. THIS IS BAD.
Therefore my revised solution is as follows...
Find each sub delta on each axis for the polygon { abs(v1.x-v0.x), abs(v2.x-v1.x), abs(v0.x-v2.x) }, this results in 3 scalar values per axis.
Next, multiply these scalar values to compute a score. Repeat this, calculating a score for each axis. (This way any 0 deltas force the score to 0, automatically eliminating this axis, avoiding triangle degeneration)
Eliminate the axis with the lowest score to form the projection, e.g. If the lowest score is in the x-axis, project onto the y-z plane.
I have not had time to unit test this approach but after preliminary tests it seems to work rather well. I would be eager to know if this is in-fact the best approach?

After much discussion there is actually a pretty simple way to solve the original problem of knowing which axis to drop when projecting to 2D space. The answer is described in 3D Math Primer for Graphics and Game Development as follows...
"A solution to this dilemma is to choose the plane of projection so as to maximize the area of the projected triangle. This can be done by examining the plane normal; the coordinate that has the largest absolute value is the coordinate that we will discard. For example, if the normal is [–1, 0, 0], then we would discard the x values of the vertices and p, projecting onto the yz plane."
My original solution, which involved computing a score per axis (using sub deltas), is flawed as it is possible to generate a zero score for all three axes, in which case the axis to drop remains undetermined.
Using the normal of the collision plane (which can be precomputed for efficiency) to determine which axis to drop when projecting into 2D is therefore the best approach.
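The rule quoted above (discard the coordinate where the normal is largest in absolute value) could be combined with a standard 2D barycentric computation roughly like this. The function name `barycentric_by_projection`, the helper `cross2` and the sample triangle are assumptions of this sketch, and p is assumed to lie in the triangle's plane.

```python
import numpy as np

def barycentric_by_projection(p, v0, v1, v2):
    """Drop the axis where the triangle normal is largest, then solve in 2D."""
    normal = np.cross(v1 - v0, v2 - v0)
    drop = int(np.argmax(np.abs(normal)))       # axis to discard
    keep = [i for i in range(3) if i != drop]   # the two axes that remain
    a, b, c, q = v0[keep], v1[keep], v2[keep], p[keep]

    def cross2(s, t):
        """z component of the 2D cross product (twice a signed area)."""
        return s[0] * t[1] - s[1] * t[0]

    denom = cross2(b - a, c - a)
    b1 = cross2(q - a, c - a) / denom
    b2 = cross2(b - a, q - a) / denom
    return 1.0 - b1 - b2, b1, b2

# Hypothetical triangle and a point built from known weights.
v0 = np.array([1.0, 0.0, 0.0])
v1 = np.array([0.0, 2.0, 0.0])
v2 = np.array([0.0, 0.0, 3.0])
p = 0.2 * v0 + 0.3 * v1 + 0.5 * v2

coords = barycentric_by_projection(p, v0, v1, v2)
print(coords)
```

The projected triangle's signed area is half the dropped normal component, so `denom` is non-zero for any non-degenerate triangle. No square roots or trig functions are needed.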

To project a point p onto the plane defined by the vertices v0, v1 & v2 you must calculate a rotation matrix. Let us call the projected point pd
e1 = v1-v0
e2 = v2-v0
r = normalise(e1)
n = normalise(cross(e1,e2))
u = normalise(cross(n, r))
temp = p-v0
pd.x = dot(temp, r)
pd.y = dot(temp, u)
pd.z = dot(temp, n)
Now pd can be projected onto the plane by setting pd.z=0
Also pd.z is the signed distance between the point and the plane defined by the 3 vertices, i.e. if the projected point lies within the triangle, the absolute value of pd.z is the distance to the triangle.
Another point to note is that after rotation and projection onto this plane, the vertex v0 lies at the origin and v1 lies along the x axis.
HTH
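The change of basis described above could be sketched with NumPy as below. The function name `to_triangle_frame` and the sample triangle are made up for illustration; the steps follow the pseudo-code (r along v0 to v1, n the unit normal, u the cross product of n and r).

```python
import numpy as np

def to_triangle_frame(p, v0, v1, v2):
    """Express p in the frame (r, u, n) attached to triangle v0 v1 v2."""
    e1 = v1 - v0
    e2 = v2 - v0
    r = e1 / np.linalg.norm(e1)      # unit vector along the edge v0 -> v1
    n = np.cross(e1, e2)
    n = n / np.linalg.norm(n)        # unit normal of the triangle's plane
    u = np.cross(n, r)               # already unit length: n and r are orthonormal
    temp = p - v0
    return np.array([temp @ r, temp @ u, temp @ n])

# Hypothetical triangle.
v0 = np.array([1.0, 0.0, 0.0])
v1 = np.array([0.0, 2.0, 0.0])
v2 = np.array([0.0, 0.0, 3.0])

n_hat = np.cross(v1 - v0, v2 - v0)
n_hat = n_hat / np.linalg.norm(n_hat)

pd_v0 = to_triangle_frame(v0, v0, v1, v2)
pd_v1 = to_triangle_frame(v1, v0, v1, v2)
pd_off = to_triangle_frame(v0 + 2.0 * n_hat, v0, v1, v2)
print(pd_v0, pd_v1, pd_off)
```

In this frame v0 maps to the origin, v1 lands on the x axis, and a point lifted 2 units along the normal gets pd.z = 2.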

I'm not sure that the suggestion is actually the best one. It's not too hard to project to the plane containing the triangle. I assume here that p is actually in that plane.
Let d1 = sqrt((v1-v0).(v1-v0)) - i.e. the distance v0-v1.
Similarly let d2 = sqrt((v2-v0).(v2-v0))
v0 -> (0,0)
v1 -> (d1, 0)
What about v2? Well, you know the distance v0-v2 = d2. All you need is the angle v1-v0-v2. (v1-v0).(v2-v0) = d1 d2 cos(theta). Wlog you can take v2 as having positive y.
Then apply a similar process to p, with one exception: you can't necessarily take it as having positive y. Instead you can check whether it has the same sign of y as v2 by taking the sign of (v1-v0)x(v2-v0) . (v1-v0)x(p-v0).
As an alternative solution, you could use a linear algebra solver on the matrix equation for the tetrahedral case, taking as the fourth vertex of the tetrahedron v0 + (v1-v0)x(v2-v0) and normalising if necessary.

You shouldn't need to determine the optimal area to find a decent projection.
It's not strictly necessary to find the "best" projection at all, just one that's good enough, and that doesn't degenerate to a line when projected into 2D.
EDIT - algorithm deleted due to degenerate case I hadn't thought of
