I have a set of lng/lat coordinates. What would be an efficient method of calculating the greatest distance between any two points in the set (the "maximum diameter" if you will)?
A naive way is to use the Haversine formula to calculate the distance between every pair of points and take the maximum, but this obviously doesn't scale well.
Edit: the points are located on a sufficiently small area, measuring the area in which a person carrying a mobile device was active in the course of a single day.
Theorem #1: The ordering of any two great circle distances along the surface of the earth is the same as the ordering of the straight-line distances between the same points when you tunnel through the earth.
Hence turn your lat-long into x,y,z based either on a spherical earth of arbitrary radius or an ellipsoid of given shape parameters. That's a couple of sines/cosines per point (not per pair of points).
Now you have a standard 3-d problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3d). Needs a square-root and some squares, and you can leave out the square root if you only care about comparisons.
There may be fancy spatial tree data structures to help with this. Or algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3d methods). Or C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html
Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.
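As a rough illustration of the approach above, here is a minimal Python sketch. It assumes a spherical earth with a mean radius of 6371 km and points given as (lat, lng) tuples in degrees; the function names are made up for this example. The pair search is still brute force, but it compares cheap squared chord lengths and calls Haversine only once, on the winning pair.

```python
import math
from itertools import combinations

def to_unit_xyz(lat_deg, lng_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat, lng = math.radians(lat_deg), math.radians(lng_deg)
    return (math.cos(lat) * math.cos(lng),
            math.cos(lat) * math.sin(lng),
            math.sin(lat))

def chord_sq(p, q):
    """Squared straight-line (tunnel) distance; no square root needed to compare."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance on a sphere (radius is an assumed mean value)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def farthest_pair(coords):
    """Return (i, j, surface distance in km) for the pair with the longest chord."""
    xyz = [to_unit_xyz(lat, lng) for lat, lng in coords]
    i, j = max(combinations(range(len(xyz)), 2),
               key=lambda ij: chord_sq(xyz[ij[0]], xyz[ij[1]]))
    return i, j, haversine_km(*coords[i], *coords[j])
```

The trigonometry is done once per point, so the inner comparison is only a few multiplications and additions; a spatial tree or one of the linked diameter algorithms could replace the `combinations` loop.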
I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:
calculate the center of mass M of the points
find the point P0 that has the maximum distance to M
find the point P1 that has the maximum distance to P0
approximate the maximum diameter with the distance between P0 and P1
This can be generalized by repeating step 3 N times,
and taking the distance between PN-1 and PN
Step 1 can be carried out efficiently approximating M as the average of longitudes and latitudes, which is OK when distances are "small" and the poles are sufficiently far away. The other steps could be carried out using the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be re-calculated with the exact formula.
An example of approximation could be the following: if φ(M) and λ(M) are latitude and longitude of the center of mass calculated as Σφ(P)/n and Σλ(P)/n,
x(P) = (λ(P) - λ(M) + C) cos(φ(P))
y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]
where C is usually 0, but can be ± 360° if the set of points crosses the λ=±180° line. To find the maximum distance you simply have to find
max((x(PN) - x(PN-1))^2 + (y(PN) - y(PN-1))^2)
(you don't need the square root because it is monotonic)
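One way to pick the offset C automatically is sketched below in Python. This is only an assumption-laden illustration: it presumes the whole set spans less than 180° of longitude, so a raw longitude range wider than that can only mean the set straddles the λ=±180° line. The helper names `unwrap_longitudes` and `plane_coords` are invented for this example.

```python
import math

def unwrap_longitudes(lngs_deg):
    """Shift negative longitudes by +360 if the set appears to cross +/-180.

    Assumes the points really span less than 180 degrees of longitude.
    """
    if max(lngs_deg) - min(lngs_deg) > 180.0:
        return [lng + 360.0 if lng < 0.0 else lng for lng in lngs_deg]
    return list(lngs_deg)

def plane_coords(lats_deg, lngs_deg):
    """Approximate planar x, y (in degrees) relative to the mean position."""
    lngs = unwrap_longitudes(lngs_deg)
    lat_m = sum(lats_deg) / len(lats_deg)
    lng_m = sum(lngs) / len(lngs)
    xs = [(lng - lng_m) * math.cos(math.radians(lat))
          for lat, lng in zip(lats_deg, lngs)]
    ys = [lat - lat_m for lat in lats_deg]
    return xs, ys
```

Unwrapping before averaging also keeps the mean longitude meaningful: the mean of 179.5 and -179.5 would otherwise be 0 instead of 180.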
The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...
EDIT:
I hate building on others' solutions, but someone will have to.
Still keeping the above 4 steps, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3,
and following the solution of Spacedman,
doing calculations in 3D overcomes the limitations of closeness and distance from poles:
x(P) = sin(φ(P))
y(P) = cos(φ(P)) sin(λ(P))
z(P) = cos(φ(P)) cos(λ(P))
(the only approximation is that this holds only for a perfect sphere)
The center of mass is given by x(M) = Σx(P)/n, etc.,
and the maximum one has to look for is
max((x(PN) - x(PN-1))^2 + (y(PN) - y(PN-1))^2 + (z(PN) - z(PN-1))^2)
So: you first transform spherical to cartesian coordinates, then start from the center of mass, to find, in at least two steps (steps 2 and 3), the farthest point from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not of much help, either, if the points are spread all over the Earth.
EDIT 2:
I learned enough R to write down the core of the algorithm (nice language for data analysis!)
For the plane approximation, ignoring the problem around the λ=±180° line:
# input: lng, lat (vectors)
rad = pi / 180;
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i = which.max((x - mean(x))^2 + (y )^2)
j = which.max((x - x[i] )^2 + (y - y[i])^2)
# output: i, j (indices)
On my PC it takes less than a second to find the indices i and j for 1000000 points. The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):
# input: lng, lat
rad = pi / 180
x = sin(lat * rad)
f = cos(lat * rad)
y = sin(lng * rad) * f
z = cos(lng * rad) * f
i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
j = which.max((x - x[i] )^2 + (y - y[i] )^2 + (z - z[i] )^2)
k = which.max((x - x[j] )^2 + (y - y[j] )^2 + (z - z[j] )^2) # optional
# output: j, k (or i, j)
The calculation of k can be left out (i.e., the result could be given by i and j), depending on the data and on the requirements. On the other hand, my experiments have shown that calculating a further index is useless.
It should be remembered that, in any case, the distance between the resulting points is an estimate which is a lower bound of the "diameter" of the set, although it very often will be the diameter itself (how often depends on the data.)
EDIT 3:
Unfortunately the relative error of the plane approximation can, in extreme cases, be as much as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if very rare. The algorithm can be modified in order to have an upper bound of approximately 20%, which I have derived by compass and straight-edge (the analytic solution is cumbersome). The modified algorithm finds a pair of points with a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:
# input: lng, lat
rad = pi / 180
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i.n_1 = 1 # n_1: n-1
x.n_1 = mean(x)
y.n_1 = 0 # = mean(y)
s.n_1 = 0 # s: square of distance
repeat {
s = (x - x.n_1)^2 + (y - y.n_1)^2
i.n = which.max(s)
x.n = x[i.n]
y.n = y[i.n]
s.n = s[i.n]
if (s.n <= s.n_1) break
i.n_1 = i.n
x.n_1 = x.n
y.n_1 = y.n
s.n_1 = s.n
}
i.m_1 = 1
x.m_1 = (x.n + x.n_1) / 2
y.m_1 = (y.n + y.n_1) / 2
s.m_1 = 0
m_ok = TRUE
repeat {
s = (x - x.m_1)^2 + (y - y.m_1)^2
i.m = which.max(s)
if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
x.m = x[i.m]
y.m = y[i.m]
s.m = s[i.m]
if (s.m <= s.m_1) break
i.m_1 = i.m
x.m_1 = x.m
y.m_1 = y.m
s.m_1 = s.m
}
if (m_ok && s.m > s.n) {
i = i.m
j = i.m_1
} else {
i = i.n
j = i.n_1
}
# output: i, j
The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).
Comparison of the modified algorithm with the (too) simple algorithm has shown, for normal and for square uniform distributions, a near doubling of processing time, and a reduction of the average error from .6% to .03% (order of magnitude). A further restart from the midpoint results in a just slightly better average error, but an almost equal maximum error.
EDIT 4:
I have to study this article yet, but it looks like the 20% I found with compass and straight-edge is in fact 1-1/√(5-2√3) ≅ 19.3%
Here's a naive example that doesn't scale well (as you say), but might help with building a solution in R.
## lonlat points
n <- 100
d <- cbind(runif(n, -180, 180), runif(n, -90, 90))
library(sp)
## distances on WGS84 ellipsoid
x <- spDists(d, longlat = TRUE)
## row, then column index of furthest points
ind <- c(row(x)[which.max(x)], col(x)[which.max(x)])
## maps
library(maptools)
data(wrld_simpl)
plot(as(wrld_simpl, "SpatialLines"), col = "grey")
points(d, pch = 16, cex = 0.5)
## draw the points and a line between on the page
points(d[ind, ], pch = 16)
lines(d[ind, ], lwd = 2)
## for extra credit, draw the great circle on which the furthest points lie
library(geosphere)
lines(greatCircle(d[ind[1], ], d[ind[2], ]), col = "firebrick")
The geosphere package provides more options for distance calculation if that's needed. See ?spDists in sp for the details used here.
You don't tell us whether these points will be located in a sufficiently small part of the globe. For truly global sets of points, my first guess would be running a naive O(n^2) algorithm, possibly getting a performance boost with some spatial indexing (R*-trees, octrees etc.). The idea is to pre-generate the list of n*(n-1)/2 pairs in the lower triangle of the distance matrix and feed it in chunks to a fast distance library to minimize I/O and process churn. Haversine is fine; you could also do it with Vincenty's method (the greatest contributor to running time is the quadratic complexity, not the (fixed number of) iterations in Vincenty's formula). As a side note, you don't need R for this stuff.
EDIT #2: The Barequet-Har-Peled algorithm (as pointed at by Spacedman in his reply) has O((n+1/(e^3))log(1/e)) complexity for e>0, and is worth exploring.
For the quasi-planar problem, this is known as "diameter of convex hull" and has three parts:
Computing convex hull with Graham's scan which is O(n*log(n)) - in fact, one should try transforming points into a transverse Mercator projection (using the centroid of the points in data set).
Finding antipodal points by Rotating Calipers algorithm - linear O(n).
Finding the largest distance among all antipodal pairs - linear search, O(n).
The link with pseudo-code and discussion: http://fredfsh.com/2013/05/03/convex-hull-and-its-diameter/
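The three parts listed above can be sketched in Python roughly as follows. This is an illustrative version, not the code from the linked page: it assumes the points have already been projected to planar (x, y) tuples, and it uses Andrew's monotone chain for the hull (also O(n log n)) in place of Graham's scan because it is shorter to write down.

```python
import math

def cross(o, a, b):
    """Twice the signed area of triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def dist_sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def convex_hull(points):
    """Monotone chain hull, counter-clockwise, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]
    return half(pts) + half(reversed(pts))

def diameter_pair(points):
    """Farthest pair of planar points via rotating calipers on the hull."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return hull[0], hull[-1]
    best, pair, j = -1.0, None, 1
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        # Advance j while the next vertex is farther from edge a-b.
        while abs(cross(a, b, hull[(j + 1) % n])) > abs(cross(a, b, hull[j])):
            j = (j + 1) % n
        # hull[j] is antipodal to the edge: test both edge endpoints against it.
        for p in (a, b):
            d = dist_sq(p, hull[j])
            if d > best:
                best, pair = d, (p, hull[j])
    return pair
```

Only hull vertices can be endpoints of the diameter, so interior points are discarded by the hull step and the caliper sweep is linear in the hull size.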
See also the discussion on a related question here: https://gis.stackexchange.com/questions/17358/how-can-i-find-the-farthest-point-from-a-set-of-existing-points
EDIT: Spacedman's solution pointed me to the Malandain-Boissonnat algorithm (see the paper in pdf here). However, this is worse or the same as the bruteforce naive O(n^2) algorithm.
The magnitude of the cross product describes the signed area of the parallelogram spanned by the two vectors (u, v) used to build the cross product; it has its uses. This same magnitude can be calculated as the magnitude of u times the magnitude of v times the sine of the angle between u and v:
||u||||v||sin(theta).
Now the dot product of u (normalized) and v (normalized) gives the cosine of the angle between u and v:
cos(theta)==dot(normalize(u), normalize(v))
I want to be able to get the signed sine value that is related to the cosine value. It is related because the sine and cosine waves are PI/2 out of sync. I know that the square root of 1 less the cosine value squared gives the unsigned sine value:
sin(theta)==sqrt(1 - (cos(theta) * cos(theta)))
Where by cos(theta) I mean the dot product not the angle.
But the attendant sign calculation (+/-) requires theta as an angle:
(cos(theta + PI / 2)) > or == or < 0
If I have to perform an acos function I might as well just do the cross product and find the magnitude.
Is there a known ratio or step that can be added to a cosine value to get its related sine value?
For each possible cosine, both signs are possible for the sine if the corresponding angle is unrestricted.
If you know the angle is between [0,pi], then the sine must be positive or zero.
If you want to know the area of a parallelogram, always take the positive branch sin(x) = sqrt(1 - cos(x)^2). Negative area rarely makes sense (only to define orientation w.r.t. a plane, such as for backface culling).
If you have the two vectors, use a cross product or dot product directly, not the other one and convert.
Seems to me like a complicated way to get to atan2 identities:
d = 𝐚·𝐛 = |𝐚||𝐛|cosθ
c = |𝐚×𝐛| = |𝐚||𝐛|sinθ (with 0° < θ < 180°)
tanθ = |𝐚×𝐛| / 𝐚·𝐛
θ = atan2(c·sgn(c|z), d) (= four quadrant)
where sgn(c|z) is the sign of the z-component in c (unless 𝐚 and 𝐛 both run exactly parallel with the xz or yz plane, then it's the sign of the y-component and x-component, respectively).
Now, from basic trig identities,
r = √(x²+y²)
cos(atan2(y,x)) = x/r
sin(atan2(y,x)) = y/r
Therefore,
sinθ = c·sgn(c|z)/√(c²+d²)
cosθ = d/√(c²+d²)
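The identities above translate into a few lines of Python with no trigonometric calls at all. One deviation in this sketch is an assumption of mine: instead of taking the sign from the z-component, it takes the sign relative to a caller-supplied reference normal n, which plays the same role but works for any orientation. The function name is made up for the example.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def sin_cos_between(a, b, n):
    """Signed sine and cosine of the angle from a to b, counter-clockwise about n.

    a and b must be non-zero; n only decides which side counts as positive.
    """
    d = dot(a, b)                      # |a||b| cos(theta)
    axb = cross(a, b)
    c = math.sqrt(dot(axb, axb))       # |a||b| |sin(theta)|
    if dot(axb, n) < 0:
        c = -c                         # a x b points away from n: negative angle
    r = math.hypot(c, d)               # equals |a||b|
    return c / r, d / r
```

Note that neither input has to be normalized, because the common factor |a||b| cancels in c/r and d/r.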
I think I have found a solution.
cos(b) == sin(a)
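u_hat = normalize(u)
v_hat = normalize(v)
v_parallel = dot(u_hat, v_hat) * u_hat // the projection of v_hat on u
v_perp = v_hat - v_parallel // has length |sin(a)|, so it still needs normalizing
cos(b) = dot(v_hat, normalize(v_perp)) // unsigned: this is |sin(a)|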
Therefore, the magnitude of
u cross v = magnitude(u) * magnitude(v) * cos(b)
I'm trying to design a 2D physics engine with continuous collision detection. Objects are stored as a list of non-rotating line-segments. Therefore I can detect collisions by finding the collision time between each pair of line segments between any two objects.
I want to find the exact time for an intersection between two moving line-segments that are moving in a constant direction, and it is proving to be difficult.
I have figured out that I can simplify the problem further by finding the collision time between each point on a line-segment and the other line-segment (and vice versa). It's possible that it is computationally inefficient, so a general solution for two line segments would be the ideal answer. I can also ignore the case in which lines are parallel (I want to treat a line/point sharing the same position and velocity as 'no collision').
If the answer is "not possible" to exactly find this intersection time, I would accept it as a solution. Any help on the subject would be appreciated.
EDIT: According to Wikipedia's article on a Line segment, for a line segment with endpoints A = (a_x, a_y) and C = (c_x, c_y), a general equation for the line segment looks like this:
For a line-segment--point intersection, would substituting
p_x + p_v * t for a_x (left-side only, right-side is just p_x)
p_y + p_v * t for a_y (left-side only, right-side is just p_y)
q_x + q_v * t for c_x (left-side only, right-side is just q_x)
q_y + q_v * t for c_y (left-side only, right-side is just q_y)
r_x + r_v * t for x
r_y + r_v * t for y
for a line segment pq [(p_x, p_y), (q_x, q_y)], point r (r_x, r_y), moving at rates of p_v == q_v != r_v be solvable for t? Here's the full equation:
The equation I have up above is incorrect in that it uses the same velocity for both its x and y components.
Since velocity is constant, I can simplify the equation such that the point is moving in reference to the line segment. The number of variables used for velocity reduces greatly, by using v = r_v - qp_v for the velocity of the point r, and 0 for the velocity of each line segment. The equation with the variables plugged in then becomes:
Thanks to WolframAlpha, the equation is then solved for t:
What's interesting is that if you analyze this, it's symmetrical for 3D. Cross product for [x1, y1, 0] and [x2, y2, 0] is [0, 0, x1*y2 - y1*x2]. This equation then translates into:
For a line-segment--point intersection, I can find an interval in which there is a collision (although this interval is larger than the actual time for the collision):
Given a line segment [p, q] moving at velocity v, and a point r with velocity w, direction(w) != direction(v), define three lines L1 = [p, p+v], L2 = [q, q+v], L3 = [r, r+w]. Let t1, t_p and t2, t_q be the intersection times between L1 and L3 and between L2 and L3, respectively. If the interval [t1, t2] is not mutually exclusive with [t_p, t_q], then there is an intersection in the intersection of these two intervals (e.g. intersection between [-1, 10] and [2, 20] is [2, 10]). If these intervals are mutually exclusive, then there is no collision.
Additionally, if the direction of v and w ARE the same, but not of equal length, then you can find the exact time of collision. Let s be the point r when projected onto the line [p, q]. If this point is in the line segment [p, q], there is a collision at time t1, which can be calculated by dividing the distance between point r and point s by the relative velocity between the point r and the line-segment [p, q].
Using the interval it's possible to get an estimate for the time by using a binary-search--like method of comparing distances between the segment and point at specific times. This is very inefficient, however.
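For the point-versus-segment case, working in the segment's reference frame as described earlier gives a closed-form time without any search. The Python sketch below is my own illustration under stated assumptions, not the equation from the question: the segment [p, q] does not rotate, both velocities are constant 2D vectors, and motion parallel to the segment is reported as no collision (as the question requests). The point hits the segment's supporting line when cross(q - p, r + v t - p) = 0, which is linear in t.

```python
def cross2(u, v):
    """z-component of the cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]

def point_segment_hit_time(p, q, seg_vel, r, pt_vel):
    """Time t >= 0 at which moving point r meets moving segment [p, q], or None."""
    # Velocity of the point relative to the segment.
    v = (pt_vel[0] - seg_vel[0], pt_vel[1] - seg_vel[1])
    e = (q[0] - p[0], q[1] - p[1])
    denom = cross2(e, v)
    if denom == 0:
        return None                    # moving parallel to the segment
    # Solve cross2(e, r + v*t - p) = 0 for t.
    t = cross2(e, (p[0] - r[0], p[1] - r[1])) / denom
    if t < 0:
        return None                    # the crossing lies in the past
    hit = (r[0] + v[0] * t, r[1] + v[1] * t)
    # Parameter along the segment: 0 at p, 1 at q.
    s = ((hit[0] - p[0]) * e[0] + (hit[1] - p[1]) * e[1]) / (e[0] ** 2 + e[1] ** 2)
    return t if 0.0 <= s <= 1.0 else None
```

A segment-versus-segment test can then be assembled by running this for each endpoint of one segment against the other segment (and vice versa) and keeping the smallest non-None time.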
I have a 3D Plane defined by two 3D Vectors:
P = a Point which lies on the Plane
N = The Plane's surface Normal
And I want to calculate any vector that lies on the plane.
Take any vector v not parallel to N; its cross product with N (w1 = v x N) is a vector that is parallel to the plane.
You can also take w2 = v - N (v.N)/(N.N), which is the projection of v onto the plane.
A point in the plane can then be given by x = P + a w. In fact, all points in the plane can be expressed as
x = P + a w2 + b ( w2 x N )
So long as the v from which w2 is derived is "suitable"... can't remember the exact conditions and too lazy to work it out ;)
If you want to determine if a point lies in the plane rather than find a point in the plane, you can use
x.N = P.N
for all x in the plane.
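A small Python sketch of this construction follows. The choice of the helper vector v is my own assumption: it picks the coordinate axis along which N has its smallest component, which can never be parallel to a non-zero N. The function names are invented for the example, and the second basis vector is computed as N x w1, which is a positive multiple of the projection w2 described above.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_basis(n):
    """Two perpendicular vectors lying in the plane with normal n (n non-zero)."""
    # Helper v: the axis least aligned with n, so v x n cannot vanish.
    k = min(range(3), key=lambda i: abs(n[i]))
    v = tuple(1.0 if i == k else 0.0 for i in range(3))
    w1 = cross(v, n)       # perpendicular to n, hence parallel to the plane
    w2 = cross(n, w1)      # perpendicular to both n and w1
    return w1, w2

def point_on_plane(p, n, a, b):
    """The point P + a*w1 + b*w2 for the plane through p with normal n."""
    w1, w2 = plane_basis(n)
    return tuple(p[i] + a * w1[i] + b * w2[i] for i in range(3))
```

The basis vectors are not normalized here; divide each by its length if unit vectors are needed.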
If N = (xn, yn, zn) and P = (xp, yp, zp), then the plane's equation is given by:
(x-xp, y-yp, z-zp) * (xn, yn, zn) = 0
where (x, y, z) is any point of the plane and * denotes the inner product.
And I want to calculate any vector
that lies on the plane.
If I understand correctly, you need to check whether a point belongs to the plane?
http://en.wikipedia.org/wiki/Plane_%28geometry%29
You must check if this equation: nx(x − x0) + ny(y − y0) + nz(z − z0) = 0 is true for your point.
where: [nx,ny,nz] is normal vector,[x0,y0,z0] is given point, [x,y,z] is point you are checking.
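In code the check is a single dot product. The Python sketch below adds one thing that is not in the equation: a tolerance `tol`, which is an assumption on my part because floating-point results are rarely exactly zero. Its default value is arbitrary and should be scaled to the magnitude of the coordinates in use.

```python
def is_on_plane(x, p0, n, tol=1e-9):
    """True if point x satisfies n . (x - p0) = 0 to within tol."""
    residual = sum((xi - pi) * ni for xi, pi, ni in zip(x, p0, n))
    return abs(residual) <= tol
```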
//edit
Now I understand your question. You need two linearly independent vectors that form a basis of the plane. So you need to follow Michael Anderson's answer, but you must add a second vector and use a linear combination of those vectors. More: http://en.wikipedia.org/wiki/Basis_%28linear_algebra%29
Let's say I have two points in 3D space (a and b) and a fixed axis/unit vector called n.
I want to create a rotation matrix that minimizes the Euclidean distance between point a (unrotated) and the rotated point b.
E.g:
Q := matrix_from_axis_and_angle (n, alpha);
find the unknown alpha that minimizes sqrt(|a - b*Q|)
Btw - If a solution/algorithm can be easier expressed with unit-quaternions go ahead and use them. I just used matrices to formulate my question because they're more widely used.
Oh - I know there are some degenerate cases (a or b lying exactly in line with n, etc.). These can be ignored. I'm just looking for the case where a single solution can be calculated.
Sounds fairly easy. Assume unit vector n implies rotation around a line parallel to n through point x0. If x0 is not the origin, translate the coordinate system by -x0 to get points a' and b' relative to the new coordinate system origin 0, and use those 2 points instead of a and b.
1) calculate vector ry = n x a
2) calculate unit vector uy = unit vector in direction ry
3) calculate unit vector ux = uy x n
You now have a triplet of mutually perpendicular unit vectors ux, uy, and n, which form a right-handed coordinate system. It can be shown that:
a = dot(a,n) * n + dot(a,ux) * ux
This is because unit vector uy is parallel to ry which is perpendicular to both a and n. (from step 1)
4) Calculate components of b along unit vectors ux, uy. a's components are (ax,0) where ax = dot(a,ux). b's components are (bx,by) where bx = dot(b,ux), by = dot(b,uy). Because of the right-handed coordinate system, ax is always positive so you don't actually need to calculate it.
5) Calculate theta = atan2(by, bx).
Your rotation matrix is the one which rotates by angle -theta relative to coordinate system (ux,uy,n) around the n-axis.
This yields degenerate answers if a is parallel to n (steps 1 and 2) or if b is parallel to n (steps 4, 5).
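Steps 1 to 5 can be sketched in Python as below. The `rotate` helper is an addition of mine: it applies Rodrigues' rotation formula directly instead of building a matrix, and it assumes n is a unit vector and that the axis passes through the origin. Degenerate inputs (a or b parallel to n) are not handled.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    length = math.sqrt(dot(u, u))
    return tuple(x / length for x in u)

def best_angle(a, b, n):
    """Angle about unit axis n that brings b as close as possible to a."""
    uy = unit(cross(n, a))            # steps 1 and 2
    ux = cross(uy, n)                 # step 3: direction of a's projection
    theta = math.atan2(dot(b, uy), dot(b, ux))   # steps 4 and 5
    return -theta                     # rotate b back onto the ux half-plane

def rotate(v, n, angle):
    """Rodrigues' formula: rotate v counter-clockwise by angle about unit axis n."""
    c, s = math.cos(angle), math.sin(angle)
    nxv = cross(n, v)
    k = dot(n, v) * (1.0 - c)
    return tuple(v[i] * c + nxv[i] * s + n[i] * k for i in range(3))
```

Usage would look like `rotate(b, n, best_angle(a, b, n))`, which returns the rotated copy of b closest to a.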
I think you can rephrase the question to:
what is the distance from a point to a 2d circle in 3d space.
the answer can be found here
so the steps needed are as following:
rotating the point b around a vector n gives you a 2d circle in 3d space
using the above, find the distance to that circle (and the point on the circle)
the point on the circle is the rotated point b you are looking for.
deduce the rotated angle
...or something ;^)
The distance will be minimized when the vector from a to the line along n lines up with the vector from b to the line along n.
Project a and b into the plane perpendicular to n and solve the problem in 2 dimensions. The rotation you get there is the rotation you need to minimize the distance.
Let P be the plane that is perpendicular to n.
We can find the projection of a into the P-plane, (and similarly for b):
a' = a - (dot(a,n)) n
b' = b - (dot(b,n)) n
where dot(a,n) is the dot-product of a and n
a' and b' lie in the P-plane.
We've now reduced the problem to 2 dimensions. Yay!
The angle (of rotation) between a' and b' equals the angle (of rotation) needed to swing b around the n-axis so as to be closest to a. (Think about the shadows b would cast on the P-plane).
The angle between a' and b' is easy to find:
dot(a',b') = |a'| * |b'| * cos(theta)
Solve for theta.
Now you can find the rotation matrix given theta and n here:
http://en.wikipedia.org/wiki/Rotation_matrix
Jason S rightly points out that once you know theta, you must still decide to rotate b clockwise or counterclockwise about the n-axis.
The quantity dot((a x b), n) will be positive if (a x b) points to the same side as n, and negative if (a x b) points to the opposite side. (It is zero only when a, b and n are coplanar, in which case theta is 0 or 180° and the direction of rotation does not matter.)
If (a x b) lies in the same direction as n, then b has to be rotated clockwise by the angle theta about the n-axis.
If (a x b) lies in the opposite direction, then b has to be rotated clockwise by the angle -theta about the n-axis.
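The angle and its sign can also be obtained in one step with atan2, which avoids the separate acos and sign test. The Python sketch below is an illustration under the assumption that n is a unit vector; the function name is made up. It returns the counter-clockwise angle from a' to b' about n, so b has to be rotated by the negative of the returned value, which matches the clockwise rule stated above.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def signed_angle_about_axis(a, b, n):
    """Counter-clockwise angle (radians) from a' to b' about unit axis n.

    a' and b' are the projections of a and b onto the plane perpendicular to n.
    """
    a_p = tuple(ai - dot(a, n) * ni for ai, ni in zip(a, n))
    b_p = tuple(bi - dot(b, n) * ni for bi, ni in zip(b, n))
    sin_part = dot(cross(a_p, b_p), n)   # |a'||b'| sin(theta), with sign
    cos_part = dot(a_p, b_p)             # |a'||b'| cos(theta)
    return math.atan2(sin_part, cos_part)
```

Because atan2 only needs the two quantities up to a common positive factor, a' and b' do not have to be normalized.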