Alternative metric for Hausdorff distance - math

For my project, I need to measure the distance between two 3D meshes based on OBJ-Files.
I have to implement two different metrics and compare them.
In the course of my literature research, I have so far found only the Hausdorff distance as a metric. Apparently, the Hausdorff distance can be used to calculate the distance of 3D meshes.
Is there an adequate alternative for the Hausdorff distance?
This topic is similar to mine, but I want to implement two different metrics.
Measure distance between meshes

Many; it depends on your case.
Wikipedia defines the Hausdorff distance as "the greatest of all the distances from a point in one set to the closest point in the other set."
Consider the below example of two sets (u and v) in 2 dimensions:
from scipy.spatial.distance import directed_hausdorff
from sklearn.metrics.pairwise import euclidean_distances  # used further below
import numpy as np
u = np.array([(1.0, 0.0),
(0.0, 1.0),
(-1.0, 0.0),
(0.0, -1.0)])
v = np.array([(2.0, 0.0),
(0.0, 2.0),
(-2.0, 0.0),
(0.0, -4.0)])
print(directed_hausdorff(u, v))
(2.23606797749979, 3, 0)
Depending on the direction you get 2.23606797749979 (from u to v) or 3 (from v to u); the symmetric Hausdorff distance is the larger of the two, 3.
Going back to the definition, I can easily reproduce these results using the Euclidean distance.
print(euclidean_distances(u, v).min(axis = 0).max(axis = 0))
print(euclidean_distances(u, v).min(axis = 1).max(axis = 0))
3.0
2.23606797749979
Let's have a look at all the distances between the points of the two sets:
print(euclidean_distances(u, v))
[[1. 2.23606798 3. 4.12310563]
[2.23606798 1. 2.23606798 5. ]
[3. 2.23606798 1. 4.12310563]
[2.23606798 3. 2.23606798 3. ]]
As you can see, the shortest distance is 1 and the longest is 5. I could formalize that as follows:
print(np.max(euclidean_distances(u, v)))
print(np.min(euclidean_distances(u, v)))
5.0
1.0
I could take the average, too:
print(np.mean(euclidean_distances(u, v)))
2.603913694764629
As you see, you have different alternatives there.
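For meshes, the same ideas are usually applied to points sampled from each surface. Below is a minimal sketch of two candidate metrics side by side, assuming each mesh has already been reduced to a list of vertex coordinates (for example the `v` lines of an OBJ file). The function names are illustrative and not taken from any library, and the brute-force search is quadratic in the number of points.

```python
import math

def directed_hausdorff_distance(a, b):
    """Largest distance from a point of a to its closest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed values."""
    return max(directed_hausdorff_distance(a, b),
               directed_hausdorff_distance(b, a))

def mean_closest_point_distance(a, b):
    """Average closest-point distance, taken over both directions."""
    ab = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    ba = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return (ab + ba) / 2

# Same point sets as in the example above
u = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
v = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -4.0)]

print(hausdorff_distance(u, v))           # 3.0
print(mean_closest_point_distance(u, v))  # about 1.40
```

The Hausdorff distance reacts to the single worst outlier, while the mean closest-point distance averages over all points, so the two can rank a pair of meshes differently.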

Related

R: distance between point and line in n-dimensions

I'd like to calculate the distance between a point and a line in any number (i.e., n) of dimensions.
An excellent example for 2- and 3- dimensions is found here.
Is there a way to generalize this solution to a greater number of dimensions? I have seen other solutions posted previously, but I am not sure exactly how to apply this in R.
Many thanks,
Ken
I have figured out an answer, working from the solution linked in the original question. Posting the R code here for future readers.
Two points, A and B, define the line of interest (here in 10 dimensions):
A <- runif(10, 0.0, 1.0)
B <- runif(10, 0.0, 1.0)
Determine the distance of the following point, P:
P <- runif(10, 0.0, 1.0)
Then work through the solution posted in the original question:
pa = P - A
ba = B - A
t = as.vector((pa %*% ba) / (ba %*% ba))
d = (pa - t * ba)
Last, determine the length of d, the vector of interest, by taking the sum of squares of its elements and then its square root:
dist = sqrt(sum(d^2))
dist # the solution
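For readers not using R, the same projection can be sketched in Python. This is only a translation of the steps above under the assumption that A and B are distinct points; the function name `point_line_distance` is made up for this example.

```python
import math

def point_line_distance(p, a, b):
    """Distance from point p to the infinite line through a and b (any dimension)."""
    pa = [pi - ai for pi, ai in zip(p, a)]
    ba = [bi - ai for bi, ai in zip(b, a)]
    # Scalar projection parameter of pa onto ba (assumes a != b)
    t = sum(x * y for x, y in zip(pa, ba)) / sum(x * x for x in ba)
    # Component of pa perpendicular to the line
    d = [x - t * y for x, y in zip(pa, ba)]
    return math.sqrt(sum(x * x for x in d))

print(point_line_distance((0, 1, 0, 0), (0, 0, 0, 0), (1, 0, 0, 0)))  # 1.0
```

Note that this measures the distance to the infinite line, not to the segment between A and B; for a segment, t would have to be clamped to [0, 1] first.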

Generating random points on a surface of an n-dimensional torus

I'd like to generate random points being located on the surface of an n-dimensional torus. I have found formulas for how to generate the points on the surface of a 3-dimensional torus:
x = (c + a * cos(v)) * cos(u)
y = (c + a * cos(v)) * sin(u)
z = a * sin(v)
u, v ∈ [0, 2 * pi); c, a > 0.
My question is now: how can these formulas be extended to n dimensions? Any help on the matter would be much appreciated.
I guess that you can do this recursively. Start with a full orthonormal basis of your vector space, and let the current location be the origin. At each step, choose a point in the plane spanned by the first two coordinate vectors, i.e. take w1 = cos(t)*v1 + sin(t)*v2. Shift the other basis vectors, i.e. w2 = v3, w3 = v4, …. Also take a step from your current position in the direction w1, with the radius r1 chosen up front. When you only have a single basis vector remaining, then the current point is a point on the n-dimensional torus of the outermost recursive call.
Note that while the above may be used to choose points randomly, it won't choose them uniformly. That would likely be a much harder question, and you definitely should ask about the math of that on Math SE or perhaps on Cross Validated (Statistics SE) to get the math right before you worry about implementation.
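A minimal sketch of that recursive construction, written as a loop, might look like the following. It assumes the standard unit vectors as the starting basis and one angle and one radius per step; the name `torus_point` is hypothetical. As noted above, drawing the angles uniformly does not give uniformly distributed points.

```python
import math

def torus_point(angles, radii):
    """Map len(angles) angles to a point in len(angles) + 1 dimensions."""
    n = len(angles) + 1
    # Orthonormal starting basis: the standard unit vectors
    basis = [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]
    position = [0.0] * n
    for t, r in zip(angles, radii):
        v1, v2 = basis[0], basis[1]
        # New direction in the plane spanned by the first two basis vectors
        w1 = [math.cos(t) * x + math.sin(t) * y for x, y in zip(v1, v2)]
        # Shift the remaining basis vectors down by one
        basis = [w1] + basis[2:]
        # Step from the current position along w1
        position = [p + r * w for p, w in zip(position, w1)]
    return position

# With two angles (u, v) and radii (c, a) this reproduces the formulas
# from the question: x = (c + a*cos(v))*cos(u), and so on.
print(torus_point([0.7, 1.3], [3.0, 1.0]))
```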
An n-torus (n being the dimensionality of the surface of the torus; a bagel or doughnut is therefore a 2-torus, not a 3-torus) is a smooth mapping of an n-rectangle. One way to approach this is to generate points on the rectangle and then map them onto the torus. Aside from the problem of figuring out how to map a rectangle onto a torus (I don't know it off-hand), there is the problem that the resulting distribution of points on the torus is not uniform even if the distribution of points is uniform on the rectangle. But there must be a way to adjust the distribution on the rectangle to make it uniform on the torus.
Merely generating u and v uniformly will not necessarily sample uniformly from a torus surface. An additional step is needed.
J.F. Williamson, "Random selection of points distributed on curved surfaces", Physics in Medicine & Biology 32(10), 1987, describes a general method of choosing a uniformly random point on a parametric surface. It is an acceptance/rejection method that accepts or rejects each candidate point depending on its stretch factor (norm-of-gradient). To use this method for a parametric surface, several things have to be known about the surface, namely—
x(u, v), y(u, v) and z(u, v), which are functions that generate 3-dimensional coordinates from two dimensional coordinates u and v,
The ranges of u and v,
g(point), the norm of the gradient ("stretch factor") at each point on the surface, and
gmax, the maximum value of g for the entire surface.
For the 3-dimensional torus with the parameterization you give in your question, g and gmax are the following:
g(u, v) = a * (c + cos(v) * a).
gmax = a * (a + c).
The algorithm to generate a uniform random point on the surface of a 3-dimensional torus with torus radius c and tube radius a is then as follows (where RNDEXCRANGE(x,y) returns a number in [x,y) uniformly at random, and RNDRANGE(x,y) returns a number in [x,y] uniformly at random):
// Maximum stretch factor for torus
gmax = a * (a + c)
while true
   u = RNDEXCRANGE(0, pi * 2)
   v = RNDEXCRANGE(0, pi * 2)
   x = cos(u)*(c+cos(v)*a)
   y = sin(u)*(c+cos(v)*a)
   z = sin(v)*a
   // Norm of gradient (stretch factor)
   g = a*abs(c+cos(v)*a)
   if g >= RNDRANGE(0, gmax)
      // Accept the point
      return [x, y, z]
   end
end
If you have n-dimensional torus generating formulas, a similar approach can be used to generate uniform random points on that torus (accept a candidate point if norm-of-gradient equals or exceeds a random number in [0, gmax), where gmax is the maximum norm-of-gradient).
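The pseudocode for the 3-dimensional torus can be sketched in Python as follows. This is an illustration rather than a reference implementation: `random_torus_point` is a made-up name, and `random.uniform` stands in for both RNDEXCRANGE and RNDRANGE, so the handling of interval endpoints is only approximately the same.

```python
import math
import random

def random_torus_point(c, a, rng=random):
    """Rejection-sample a point on a torus with torus radius c and tube radius a."""
    gmax = a * (a + c)  # maximum stretch factor
    while True:
        u = rng.uniform(0.0, 2.0 * math.pi)
        v = rng.uniform(0.0, 2.0 * math.pi)
        # Norm of gradient (stretch factor) at this candidate
        g = a * abs(c + a * math.cos(v))
        if g >= rng.uniform(0.0, gmax):
            ring = c + a * math.cos(v)
            return (ring * math.cos(u), ring * math.sin(u), a * math.sin(v))

rng = random.Random(42)  # seeded generator, for reproducibility
print(random_torus_point(3.0, 1.0, rng))
```

Candidates on the outer rim (where the stretch factor is largest) are almost always accepted, while candidates on the inner rim are rejected more often, which compensates for the parameterization bunching points together there.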

Greatest distance between set of longitude/latitude points

I have a set of lng/lat coordinates. What would be an efficient method of calculating the greatest distance between any two points in the set (the "maximum diameter" if you will)?
A naive way is to use Haversine formula to calculate the distance between each 2 points and get the maximum, but this doesn't scale well obviously.
Edit: the points are located on a sufficiently small area, measuring the area in which a person carrying a mobile device was active in the course of a single day.
Theorem #1: The ordering of any two great-circle distances along the surface of the earth is the same as the ordering of the straight-line distances between the same points when you tunnel through the earth.
Hence turn your lat-long into x,y,z based either on a spherical earth of arbitrary radius or an ellipsoid of given shape parameters. That's a couple of sines/cosines per point (not per pair of points).
Now you have a standard 3-d problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3d). Needs a square-root and some squares, and you can leave out the square root if you only care about comparisons.
There may be fancy spatial tree data structures to help with this. Or algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3d methods). Or C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html
Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.
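As a rough sketch of this idea, assuming a spherical earth: convert each point once, compare squared chord lengths, and only compute a Haversine distance for the winning pair. The pair search below is still the naive quadratic one (a spatial structure would replace it), the function names are illustrative, and the mean radius of 6371 km is an assumption.

```python
import math
from itertools import combinations

def to_unit_xyz(lat_deg, lng_deg):
    """Point on the unit sphere for a latitude/longitude in degrees."""
    lat, lng = math.radians(lat_deg), math.radians(lng_deg)
    return (math.cos(lat) * math.cos(lng),
            math.cos(lat) * math.sin(lng),
            math.sin(lat))

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

def farthest_pair(points):
    """Indices of the pair with the largest chord (and so great-circle) distance."""
    xyz = [to_unit_xyz(lat, lng) for lat, lng in points]
    def chord2(pair):
        i, j = pair
        return sum((a - b) ** 2 for a, b in zip(xyz[i], xyz[j]))  # no sqrt needed
    return max(combinations(range(len(points)), 2), key=chord2)

pts = [(0, 0), (0, 1), (0, 90), (10, 10)]
i, j = farthest_pair(pts)
print(i, j, haversine_km(pts[i], pts[j]))
```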
I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:
1. calculate the center of mass M of the points
2. find the point P0 that has the maximum distance to M
3. find the point P1 that has the maximum distance to P0
4. approximate the maximum diameter with the distance between P0 and P1
This can be generalized by repeating step 3 N times, and taking the distance between PN-1 and PN.
Step 1 can be carried out efficiently approximating M as the average of longitudes and latitudes, which is OK when distances are "small" and the poles are sufficiently far away. The other steps could be carried out using the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be re-calculated with the exact formula.
An example of approximation could be the following: if φ(M) and λ(M) are latitude and longitude of the center of mass calculated as Σφ(P)/n and Σλ(P)/n,
x(P) = (λ(P) - λ(M) + C) cos(φ(P))
y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]
where C is usually 0, but can be ± 360° if the set of points crosses the λ=±180° line. To find the maximum distance you simply have to find
max((x(PN) - x(PN-1))^2 + (y(PN) - y(PN-1))^2)
(you don't need the square root because it is monotonic)
The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...
EDIT:
I hate building on others' solutions, but someone will have to.
Still keeping the above 4 steps, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3,
and following the solution of Spacedman,
doing calculations in 3D overcomes the limitations of closeness and distance from poles:
x(P) = sin(φ(P))
y(P) = cos(φ(P)) sin(λ(P))
z(P) = cos(φ(P)) cos(λ(P))
(the only approximation is that this holds only for a perfect sphere)
The center of mass is given by x(M) = Σx(P)/n, etc.,
and the maximum one has to look for is
max((x(PN) - x(PN-1))^2 + (y(PN) - y(PN-1))^2 + (z(PN) - z(PN-1))^2)
So: you first transform spherical to cartesian coordinates, then start from the center of mass, to find, in at least two steps (steps 2 and 3), the farthest point from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not of much help, either, if the points are spread all over the Earth.
EDIT 2:
I learned enough R to write down the core of the algorithm (nice language for data analysis!)
For the plane approximation, ignoring the problem around the λ=±180° line:
# input: lng, lat (vectors)
rad = pi / 180;
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i = which.max((x - mean(x))^2 + (y )^2)
j = which.max((x - x[i] )^2 + (y - y[i])^2)
# output: i, j (indices)
On my PC it takes less than a second to find the indices i and j for 1000000 points. The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):
# input: lng, lat
rad = pi / 180
x = sin(lat * rad)
f = cos(lat * rad)
y = sin(lng * rad) * f
z = cos(lng * rad) * f
i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
j = which.max((x - x[i] )^2 + (y - y[i] )^2 + (z - z[i] )^2)
k = which.max((x - x[j] )^2 + (y - y[j] )^2 + (z - z[j] )^2) # optional
# output: j, k (or i, j)
The calculation of k can be left out (i.e., the result could be given by i and j), depending on the data and on the requirements. On the other hand, my experiments have shown that calculating a further index is useless.
It should be remembered that, in any case, the distance between the resulting points is an estimate which is a lower bound of the "diameter" of the set, although it very often will be the diameter itself (how often depends on the data.)
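For comparison, here is a hedged Python sketch of the 3D variant, using the same coordinate convention as the R code. The name `approx_farthest_pair` and the `extra_steps` parameter are inventions for this example; like the R version, it returns a pair whose distance is only a lower bound on the true diameter.

```python
import math

def approx_farthest_pair(points, extra_steps=1):
    """points: list of (lat_deg, lng_deg). Returns indices of a distant pair."""
    xyz = []
    for lat, lng in points:
        f = math.cos(math.radians(lat))
        xyz.append((math.sin(math.radians(lat)),
                    math.sin(math.radians(lng)) * f,
                    math.cos(math.radians(lng)) * f))
    n = len(xyz)
    centre = tuple(sum(p[k] for p in xyz) / n for k in range(3))

    def farthest_from(ref):
        # Index of the point with the largest squared distance to ref
        return max(range(n),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(xyz[i], ref)))

    i = farthest_from(centre)   # step 2: farthest from the center of mass
    j = farthest_from(xyz[i])   # step 3: farthest from that point
    for _ in range(extra_steps):  # optional repetitions of step 3
        i, j = j, farthest_from(xyz[j])
    return i, j

pts = [(0, 0), (0, 1), (0, 2), (0, 10), (1, 5)]
print(approx_farthest_pair(pts))
```

Each pass is a single scan over the points, so the running time stays linear in the number of points for a fixed number of steps.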
EDIT 3:
Unfortunately the relative error of the plane approximation can, in extreme cases, be as much as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if very rare. The algorithm can be modified in order to have an upper bound of approximately 20%, which I have derived by compass and straight-edge (the analytic solution is cumbersome). The modified algorithm finds a pair of points with a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:
# input: lng, lat
rad = pi / 180
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
i.n_1 = 1 # n_1: n-1
x.n_1 = mean(x)
y.n_1 = 0 # = mean(y)
s.n_1 = 0 # s: square of distance
repeat {
    s = (x - x.n_1)^2 + (y - y.n_1)^2
    i.n = which.max(s)
    x.n = x[i.n]
    y.n = y[i.n]
    s.n = s[i.n]
    if (s.n <= s.n_1) break
    i.n_1 = i.n
    x.n_1 = x.n
    y.n_1 = y.n
    s.n_1 = s.n
}
i.m_1 = 1
x.m_1 = (x.n + x.n_1) / 2
y.m_1 = (y.n + y.n_1) / 2
s.m_1 = 0
m_ok = TRUE
repeat {
    s = (x - x.m_1)^2 + (y - y.m_1)^2
    i.m = which.max(s)
    if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
    x.m = x[i.m]
    y.m = y[i.m]
    s.m = s[i.m]
    if (s.m <= s.m_1) break
    i.m_1 = i.m
    x.m_1 = x.m
    y.m_1 = y.m
    s.m_1 = s.m
}
if (m_ok && s.m > s.n) {
    i = i.m
    j = i.m_1
} else {
    i = i.n
    j = i.n_1
}
# output: i, j
The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).
Comparison of the modified algorithm with the (too) simple algorithm has shown, for normal and for square uniform distributions, a near doubling of processing time, and a reduction of the average error from .6% to .03% (order of magnitude). A further restart from the midpoint results in just a slightly better average error, but an almost equal maximum error.
EDIT 4:
I have to study this article yet, but it looks like the 20% I found with compass and straight-edge is in fact 1-1/√(5-2√3) ≅ 19.3%
Here's a naive example that doesn't scale well (as you say), but might help with building a solution in R.
## lonlat points
n <- 100
d <- cbind(runif(n, -180, 180), runif(n, -90, 90))
library(sp)
## distances on WGS84 ellipsoid
x <- spDists(d, longlat = TRUE)
## row, then column index of furthest points
ind <- c(row(x)[which.max(x)], col(x)[which.max(x)])
## maps
library(maptools)
data(wrld_simpl)
plot(as(wrld_simpl, "SpatialLines"), col = "grey")
points(d, pch = 16, cex = 0.5)
## draw the points and a line between on the page
points(d[ind, ], pch = 16)
lines(d[ind, ], lwd = 2)
## for extra credit, draw the great circle on which the furthest points lie
library(geosphere)
lines(greatCircle(d[ind[1], ], d[ind[2], ]), col = "firebrick")
The geosphere package provides more options for distance calculation if that's needed. See ?spDists in sp for the details used here.
You don't tell us whether these points will be located in a sufficiently small part of the globe. For truly global sets of points, my first guess would be running a naive O(n^2) algorithm, possibly getting a performance boost from some spatial indexing (R*-trees, octrees etc.). The idea is to pre-generate the list of n*(n-1)/2 pairs in the upper triangle of the distance matrix and feed it in chunks to a fast distance library to minimize I/O and process churn. Haversine is fine; you could also do it with Vincenty's method (the greatest contributor to running time is the quadratic complexity, not the (fixed number of) iterations in Vincenty's formula). As a side note, you don't need R for this stuff.
EDIT #2: The Barequet-Har-Peled algorithm (as pointed at by Spacedman in his reply) has O((n+1/(e^3))log(1/e)) complexity for e>0, and is worth exploring.
For the quasi-planar problem, this is known as "diameter of convex hull" and has three parts:
Computing convex hull with Graham's scan which is O(n*log(n)) - in fact, one should try transforming points into a transverse Mercator projection (using the centroid of the points in data set).
Finding antipodal points by Rotating Calipers algorithm - linear O(n).
Finding the largest distance among all antipodal pairs - linear search, O(n).
The link with pseudo-code and discussion: http://fredfsh.com/2013/05/03/convex-hull-and-its-diameter/
See also the discussion on a related question here: https://gis.stackexchange.com/questions/17358/how-can-i-find-the-farthest-point-from-a-set-of-existing-points
EDIT: Spacedman's solution pointed me to the Malandain-Boissonnat algorithm (see the paper in pdf here). However, this is worse or the same as the bruteforce naive O(n^2) algorithm.
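To illustrate the quasi-planar "diameter of convex hull" approach, here is a small Python sketch. It assumes the points have already been projected to planar (x, y) tuples, and it builds the hull with Andrew's monotone chain rather than Graham's scan (same O(n*log(n)) cost, simpler to write); the function names are made up for this example.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Counter-clockwise hull without collinear points (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def diameter_squared(points):
    """Squared diameter via rotating calipers over the hull vertices."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 2:
        return 0
    if n == 2:
        return dist2(hull[0], hull[1])
    best, k = 0, 1
    for i in range(n):
        j = (i + 1) % n
        # Advance k while the next vertex lies farther from edge (i, j)
        while cross(hull[i], hull[j], hull[(k + 1) % n]) > cross(hull[i], hull[j], hull[k]):
            k = (k + 1) % n
        best = max(best, dist2(hull[i], hull[k]), dist2(hull[j], hull[k]))
    return best

print(diameter_squared([(0, 0), (2, 0), (2, 1), (0, 1), (1, 0.5)]))  # 5
```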

Minimum Weight Triangulation Dynamic Programming Algorithm

So, I'm trying to understand the dynamic programming algorithm for finding the minimum weight triangulation of a convex polygon. For those of you who don't know, triangulation is where we take a convex polygon and break it up into triangles. The minimum weight triangulation is the triangulation of a polygon where the sum of all the edges (or the perimeter of every triangle) is the smallest.
It's actually a fairly common algorithm, however I just can't grasp it. Here is the algorithm I'm trying to understand:
http://en.wikipedia.org/wiki/Minimum-weight_triangulation#Variations
Here's another description I'm trying to follow (scroll down to 5.2 Optimal Triangulations):
http://valis.cs.uiuc.edu/~sariel/teach/notes/algos/lec/05_dprog_II.pdf
So I understand this much so far. I take all my vertices, and make sure they are in clockwise order around the perimeter of the original polygon. I make a function that returns the minimum weight triangulation, which I call MWT(i, j) of a polygon starting at vertex i and going to vertex j. This function will be recursive, so the first call should be MWT(0, n-1), where n is the total number of vertices. MWT should test all the triangles that are made of the points i, j, and k, where k is any vertex between those. Here's my code so far:
def MWT(i, j):
    if j <= i: return 0
    elif j == i+1: return 0
    cheap_cost = float("infinity")
    for k in range(i, j):
        cheap_cost = min(cheap_cost, cost((vertices[i], vertices[j], vertices[k])) + MWT(i, k) + MWT(k, j))
    return cheap_cost
However when I run it it overflows the stack. I'm just completely lost and would appreciate if somebody could help direct me in the right direction.
If you guys need any more info just ask.
I think that you want to do
for k in range(i+1, j):
not
for k in range(i, j):
because you never want k to be the same as i or j (otherwise you'll just calculate it for the same values that you're currently running).
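Putting the fix together with memoization, a self-contained sketch could look like this. It assumes the cost of a triangle is its perimeter, as described in the question, and that the vertices are given in order around a convex polygon; `min_weight_triangulation` is a hypothetical wrapper name.

```python
import math
from functools import lru_cache

def min_weight_triangulation(vertices):
    """Minimum total triangle perimeter over all triangulations of a convex polygon."""
    def cost(i, k, j):
        a, b, c = vertices[i], vertices[k], vertices[j]
        return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)

    @lru_cache(maxsize=None)  # each (i, j) sub-polygon is solved only once
    def mwt(i, j):
        if j <= i + 1:        # fewer than three vertices: nothing to triangulate
            return 0.0
        # k runs strictly between i and j, so every sub-problem is smaller
        return min(cost(i, k, j) + mwt(i, k) + mwt(k, j) for k in range(i + 1, j))

    return mwt(0, len(vertices) - 1)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(min_weight_triangulation(square))  # about 6.83, i.e. 4 + 2*sqrt(2)
```

Without the cache the recursion is still correct once k is restricted to `range(i+1, j)`, but it recomputes the same sub-polygons exponentially often.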

Intersection of two moving line segments (or a moving line segment and a point)

I'm trying to design a 2D physics engine with continuous collision detection. Objects are stored as a list of non-rotating line-segments. Therefore I can detect collisions by finding the collision time between each pair of line segments between any two objects.
I want to find the exact time for an intersection between two moving line-segments that are moving in a constant direction, and it is proving to be difficult.
I have figured out that I can simplify the problem further by finding the collision time between each point on a line-segment and the other line-segment (and vice versa). It's possible that it is computationally inefficient, so a general solution for two line segments would be the ideal answer. I can also ignore the case in which lines are parallel (I want to treat a line/point sharing the same position and velocity as 'no collision').
If the answer is "not possible" to exactly find this intersection time, I would accept it as a solution. Any help on the subject would be appreciated.
EDIT: According to Wikipedia's article on a Line segment, for a line segment with endpoints A = (a_x, a_y) and C = (c_x, c_y), a general equation for the line segment looks like this:
For a line-segment--point intersection, would substituting
p_x + p_v * t for a_x (left-side only, right-side is just p_x)
p_y + p_v * t for a_y (left-side only, right-side is just p_y)
q_x + q_v * t for c_x (left-side only, right-side is just q_x)
q_y + q_v * t for c_y (left-side only, right-side is just q_y)
r_x + r_v * t for x
r_y + r_v * t for y
for a line segment pq [(p_x, p_y), (q_x, q_y)], point r (r_x, r_y), moving at rates of p_v == q_v != r_v be solvable for t? Here's the full equation:
The equation I have up above is incorrect in that it uses the same velocity for both its x and y components.
Since velocity is constant, I can simplify the equation by treating the point as moving relative to the line segment. The number of velocity variables drops considerably by using v = r_v - qp_v for the velocity of the point r, and 0 for the velocity of the line segment. The equation with the variables plugged in then becomes:
Thanks to WolframAlpha, the equation is then solved for t:
What's interesting is that if you analyze this, it's symmetrical for 3D. Cross product for [x1, y1, 0] and [x2, y2, 0] is [0, 0, x1*y2 - y1*x2]. This equation then translates into:
For a line-segment--point intersection, I can find an interval in which there is a collision (although this interval is larger than the actual time for the collision):
Given a line segment [p, q] moving at velocity v, and a point r with velocity w, direction(w) != direction(v), define three lines L1 = [p, p+v], L2 = [q, q+v], L3 = [r, r+w]. Let t1, t_p and t2, t_q be the intersection times between L1 and L3 and between L2 and L3, respectively. If the interval [t1, t2] is not mutually exclusive with [t_p, t_q], then there is an intersection in the intersection of these two intervals (e.g. intersection between [-1, 10] and [2, 20] is [2, 10]). If these intervals are mutually exclusive, then there is no collision.
Additionally, if the direction of v and w ARE the same, but not of equal length, then you can find the exact time of collision. Let s be the point r when projected onto the line [p, q]. If this point is in the line segment [p, q], there is a collision at time t1, which can be calculated by dividing the distance between point r and point s by the relative velocity between the point r and the line-segment [p, q].
Using the interval it's possible to get an estimate for the time by using a binary-search--like method of comparing distances between the segment and point at specific times. This is very inefficient, however.
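For the point-versus-segment case, an exact time can be computed directly in the relative frame. The sketch below is one possible formulation and not a reconstruction of the equations referred to above: it assumes a non-rotating segment [p, q] translating with velocity `seg_vel` and a point r with velocity `pt_vel`, and solves cross(r + v*t - p, q - p) = 0 for t, where v is the relative velocity. All names are illustrative.

```python
def cross(a, b):
    """2D cross product (z-component of the 3D cross product)."""
    return a[0] * b[1] - a[1] * b[0]

def point_segment_hit_time(p, q, seg_vel, r, pt_vel, eps=1e-12):
    """Time t >= 0 at which moving point r meets moving segment [p, q], or None."""
    d = (q[0] - p[0], q[1] - p[1])                        # segment direction
    v = (pt_vel[0] - seg_vel[0], pt_vel[1] - seg_vel[1])  # velocity relative to segment
    denom = cross(v, d)
    if abs(denom) < eps:
        return None  # relative motion parallel to the segment: treated as no collision
    t = cross((p[0] - r[0], p[1] - r[1]), d) / denom
    if t < 0:
        return None  # the supporting line was crossed in the past
    # Position of the point at time t, relative to p, projected onto the segment
    hit = (r[0] + v[0] * t - p[0], r[1] + v[1] * t - p[1])
    s = (hit[0] * d[0] + hit[1] * d[1]) / (d[0] * d[0] + d[1] * d[1])
    return t if 0.0 <= s <= 1.0 else None

# A point falling straight down onto a resting horizontal segment
print(point_segment_hit_time((0, 0), (2, 0), (0, 0), (1, 1), (0, -1)))  # 1.0
```

For two moving segments, the earliest hit would be the minimum of this value over each endpoint of one segment against the other segment, and vice versa, as suggested in the question.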

Resources