What practical algorithms exist for simplifying sampled multidimensional curves - math

For 2-dimensional sampled curves (an array of 2D points) there exists the Rahmer-Douglas-Peucker algorithm which only keeps "important" points. It works by calculating the perpendicular distance of each point (or sample) to a line that connects the first and the last point of. If the maximum distance is larger than a value epsilon the point is kept and the array is split into 2 parts. For both parts the operation is repeated (maximal perpendicular distance, if larger than epsilon etc.) The smaller epsilon the more detail is kept.
I am trying to write a function that can also do this for higher arrays of higher dimensional points. But I am unsure how to define distance. Or if this is actually a good idea.
I guess there exist lots of complicated and elegant algorithms that fit the curves to beziers and NURBS and what not. But are there also relatively simple ones?
I would prefer not to use beziers, but simply to identify "important" N-dimensional points.

You could extend your 2D algorithm using algebra and the L2 norm. Let's say you want to calculate the distance from a point X to a line segment PQ (where X, P and Q are defined as N-dimensional vectors).
First you can calculate the vector "proj" as:
Then, the distance is the module of the vector V = PX-proj.
For this calculation you only need the dot product between vectors, and that is well defined for N-dimensional spaces.
Using this approach I have successfuly used Rahmer-Douglas-Peucker algorithm in 3D.

Related

How to find whole distance between two points in a curved line in R?

I have a similar line graph plotted using R plot function (plot(df))
I want to get distance of the whole line between two points in the graph (e.g., between x(1) and x(3)). How can I do this?
If your function is defined over a fine grid of points, you can compute the length of the line segment between each pair of points and add them. Pythagoras is your friend here:
To the extent that the points are not close enough together that the function is essentially linear between the points, it will tend to (generally only slightly) underestimate the arc length.
Note that if your x-values are stored in increasing order, these ẟx and ẟy values can be obtained directly by differencing (in R that's diff)
If you have a functional form for y as a function of x you can apply the integral for the arc length -- i.e. integrate
∫ √[1+(dy/dx)²] dx
between a and b. This is essentially just the calculation in 1 taken to the limit.
If both x and y are parametric functions of another variable (t, say) you can simplify the parametric form of the above integral (if we don't forget the Jacobian) to integrating
∫ √[(dx/dt)²+(dy/dt)²] dt
between a and b
(Note the direct parallel to 1.)
if you don't have a convenient-to-integrate functional form in 2. or 3. you can use numerical quadrature; this can be quite efficient (which can be handy when the derivative function is expensive to evaluate).

Finding the coordinates of points from distance matrix

I have a set of points (with unknow coordinates) and the distance matrix. I need to find the coordinates of these points in order to plot them and show the solution of my algorithm.
I can set one of these points in the coordinate (0,0) to simpify, and find the others. Can anyone tell me if it's possible to find the coordinates of the other points, and if yes, how?
Thanks in advance!
EDIT
Forgot to say that I need the coordinates on x-y only
The answers based on angles are cumbersome to implement and can't be easily generalized to data in higher dimensions. A better approach is that mentioned in my and WimC's answers here: given the distance matrix D(i, j), define
M(i, j) = 0.5*(D(1, j)^2 + D(i, 1)^2 - D(i, j)^2)
which should be a positive semi-definite matrix with rank equal to the minimal Euclidean dimension k in which the points can be embedded. The coordinates of the points can then be obtained from the k eigenvectors v(i) of M corresponding to non-zero eigenvalues q(i): place the vectors sqrt(q(i))*v(i) as columns in an n x k matrix X; then each row of X is a point. In other words, sqrt(q(i))*v(i) gives the ith component of all of the points.
The eigenvalues and eigenvectors of a matrix can be obtained easily in most programming languages (e.g., using GSL in C/C++, using the built-in function eig in Matlab, using Numpy in Python, etc.)
Note that this particular method always places the first point at the origin, but any rotation, reflection, or translation of the points will also satisfy the original distance matrix.
Step 1, arbitrarily assign one point P1 as (0,0).
Step 2, arbitrarily assign one point P2 along the positive x axis. (0, Dp1p2)
Step 3, find a point P3 such that
Dp1p2 ~= Dp1p3+Dp2p3
Dp1p3 ~= Dp1p2+Dp2p3
Dp2p3 ~= Dp1p3+Dp1p2
and set that point in the "positive" y domain (if it meets any of these criteria, the point should be placed on the P1P2 axis).
Use the cosine law to determine the distance:
cos (A) = (Dp1p2^2 + Dp1p3^2 - Dp2p3^2)/(2*Dp1p2* Dp1p3)
P3 = (Dp1p3 * cos (A), Dp1p3 * sin(A))
You have now successfully built an orthonormal space and placed three points in that space.
Step 4: To determine all the other points, repeat step 3, to give you a tentative y coordinate.
(Xn, Yn).
Compare the distance {(Xn, Yn), (X3, Y3)} to Dp3pn in your matrix. If it is identical, you have successfully identified the coordinate for point n. Otherwise, the point n is at (Xn, -Yn).
Note there is an alternative to step 4, but it is too much math for a Saturday afternoon
If for points p, q, and r you have pq, qr, and rp in your matrix, you have a triangle.
Wherever you have a triangle in your matrix you can compute one of two solutions for that triangle (independent of a euclidean transform of the triangle on the plane). That is, for each triangle you compute, it's mirror image is also a triangle that satisfies the distance constraints on p, q, and r. The fact that there are two solutions even for a triangle leads to the chirality problem: You have to choose the chirality (orientation) of each triangle, and not all choices may lead to a feasible solution to the problem.
Nevertheless, I have some suggestions. If the number entries is small, consider using simulated annealing. You could incorporate chirality into the annealing step. This will be slow for large systems, and it may not converge to a perfect solution, but for some problems it's the best you and do.
The second suggestion will not give you a perfect solution, but it will distribute the error: the method of least squares. In your case the objective function will be the error between the distances in your matrix, and actual distances between your points.
This is a math problem. To derive coordinate matrix X only given by its distance matrix.
However there is an efficient solution to this -- Multidimensional Scaling, that do some linear algebra. Simply put, it requires a pairwise Euclidean distance matrix D, and the output is the estimated coordinate Y (perhaps rotated), which is a proximation to X. For programming reason, just use SciKit.manifold.MDS in Python.
The "eigenvector" method given by the favourite replies above is very general and automatically outputs a set of coordinates as the OP requested, however I noticed that that algorithm does not even ask for a desired orientation (rotation angle) for the frame of the output points, the algorithm chooses that orientation all by itself!
People who use it might want to know at what angle the frame will be tipped before hand so I found an equation which gives the answer for the case of up to three input points, however I have not had time to generalize it to n-points and hope someone will do that and add it to this discussion. Here are the three angles the output sides will form with the x-axis as a function of the input side lengths:
angle side a = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2-b^2)^2)/(a^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
angle side b = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2+b^2-a^2)^2)/(4*b^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
angle side c = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2+b^2-a^2)^2)/(4*c^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
Those equations also lead directly to a solution to the OP's problem of finding the coordinates for each point because: the side lengths are already given from the OP as the input, and my equations give the slope of each side versus the x-axis of the solution, thus revealing the vector for each side of the polygon answer, and summing those sides through vector addition up to a desired vertex will produce the coordinate of that vertex. So if anyone can extend my angle equations to handling beyond three input lengths (but I note: that might be impossible?), it might be a very fast way to the general solution of the OP's question, since slow parts of the algorithms that people gave above like "least square fitting" or "matrix equation solving" might be avoidable.

Difference of simple convex and simple non-convex polygon

Given two simple polygons P and Q where P is convex but Q not, how fast can one compute the difference $P - Q$ between P and Q if P has n and Q has m vertices?
One can assume that the polygons are given as list of vertices ordered in clockwise direction.
"How fast" depends on lots of parameters, so I think, we should start with how to do it,first.
Firstly, I assume polygons lie on the same plane. Start with computing the finite intersection of each line of P with each line of Q. If the intersection exists and intersection point lies on intersecting lines(I mean between start and end, not on them), divide line into two and continue finite line-line intersections iteratively. Then, categorize each line segment(now I can call them as segment because that they are all divided if necessary) by using a point in polygon computation with mid-points of segments..Inner,Outer or OnthePolygon...After categorization construct a new polygon from the lines of P that lies outside of Q and lines of Q that lies inside of the P. Here, your challenge will be dealing with tolerances and lines that are lies on eachother..At first glance, overall algorithm is like this..
This algorithm can be improved by eliminating lines and even polygons by computing their ranges(min and max for each axis). Except from the hardware, programming language or data handling parameters, the speed of this operation is dependent on the input polygons and their orientation.

How to calculate the nearest point of a line and curve? .. or curve and curve?

Given the points of a line and a quadratic bezier curve, how do you calculate their nearest point?
There exist a scientific paper regarding this question from INRIA: Computing the minimum distance between two Bézier curves (PDF here)
I once wrote a tool to do a similar task. Bezier splines are typically parametric cubic polynomials. To compute the square of the distance between a cubic segment and a line, this is just the square of the distance between two polynomial functions, itself just another polynomial function! Note that I said the square of the distance, not the square root.
Essentially, for any point on a cubic segment, one could compute the square of the distance from that point to the line. This will be a 6th order polynomial. Can we minimize that square of the distance? Yes. The minimum must occur where the derivative of that polynomial is zero. So differentiate, getting a 5th order polynomial. Use your favorite root finding tool that generates all of the roots numerically. Jenkins & Traub, whatever. Choose the correct solution from that set of roots, excluding any solutions that are complex, and only picking a solution if it lies inside the cubic segment in question. Make sure you exclude the points that correspond to local maxima of the distance.
All of this can be efficiently done, and no iterative optimizer besides a polynomial root finder need be used, thus one does not require the use of optimization tools that require starting values, finding only a solution near that starting value.
For example, in the 3-d figure I show a curve generated by a set of points in 3-d (in red), then I took another set of points that lay in a circle outside, I computed the closest point on the inner curve from each, drawing a line down to that curve. These points of minimum distance were generated by the scheme outlined above.
I just wanna give you a few hints, in for the case Q.B.Curve // segment :
to get a fast enough computation, i think you should first think about using a kind of 'bounding box' for your algorithm.
Say P0 is first point of the Q. B. Curve, P2 the second point, P1 the control point, and P3P4 the segment then :
Compute distance from P0, P1, P2 to P3P4
if P0 OR P2 is nearest point --> this is the nearest point of the curve from P3P4. end :=).
if P1 is nearest point, and Pi (i=0 or 1) the second nearest point, the distance beetween PiPC and P3P4 is an estimate of the distance you seek that might be precise enough, depending on your needs.
if you need to be more acurate : compute P1', which is the point on the Q.B.curve the nearest from P1 : you find it applying the BQC formula with t=0.5. --> distance from PiP1' to P3P4 is an even more accurate estimate -but more costly-.
Note that if the line defined by P1P1' intersects P3P4, P1' is the closest point of QBC from P3P4.
if P1P1' does not intersect P3P4, then you're out of luck, you must go the hard way...
Now if (and when) you need precision :
think about using a divide and conquer algorithm on the parameter of the curve :
which is nearest from P3P4 ?? P0P1' or P1'P2 ??? if it is P0P1' --> t is beetween 0 and 0.5 so compute Pm for t=0.25.
Now which is nearest from P3P4?? P0Pm or PmP1' ?? if it is PmP1' --> compute Pm2 for t=0.25+0.125=0.375 then which is nearest ? PmPm2 or Pm2P1' ??? etc
you will come to accurate solution in no time, like 6 iteration and your precision on t is 0.004 !! you might stop the search when distance beetween two points becomes below a given value. (and not difference beetwen two parameters, since for a little change in parameter, points might be far away)
in fact the principle of this algorithm is to approximate the curve with segments more and more precisely each time.
For the curve / curve case i would first 'box' them also to avoid useless computation, so first use segment/segment computation, then (maybe) segment/curve computation, and only if needed curve/curve computation.
For curve/curve, divide and conquer works also, more difficult to explain but you might figure it out. :=)
hope you can find your good balance for speed/accuracy with this :=)
Edit : Think i found for the general case a nice solution :-)
You should iterate on the (inner) bounding triangles of each B.Q.C.
So we have Triangle T1, points A, B, C having 't' parameter tA, tB, tC.
and Triangle T2, points D, E, F, having t parameter tD, tE, tF.
Initially we have tA=0 tB=0.5 tC= 1.0 and same for T2 tD=0, tE=0.5, tF=1.0
The idea is to call a procedure recursivly that will split T1 and/or T2 into smaller rectangles until we are ok with the precision reached.
The first step is to compute distance from T1 from T2, keeping track of with segments were the nearest on each triangle. First 'trick': if on T1 the segment is AC, then stop recursivity on T1, the nearest point on Curve 1 is either A or C. if on T2 the nearest segment is DF, then stop recursivity on T2, the nearest point on Curve2 is either D or F. If we stopped recursivity for both -> return distance = min (AD, AF, CD, CF). then if we have recursivity on T1, and segment AB is nearest, new T1 becomes : A'=A B= point of Curve one with tB=(tA+tC)/2 = 0.25, C=old B. same goes for T2 : apply recursivityif needed and call same algorithm on new T1 and new T2. Stop algorithm when distance found beetween T1 and T2 minus distance found beetween previous T1 and T2 is below a threshold.
the function might look like ComputeDistance(curveParam1, A, C, shouldSplitCurve1, curveParam2, D, F, shouldSplitCurve2, previousDistance) where points store also their t parameters.
note that distance (curve, segment) is just a particular case of this algorithm, and that you should implement distance (triangle, triangle) and distance (segment, triangle) to have it worked. Have fun.
1.Simple bad method - by iteration go by point from first curve and go by point from second curve and get minimum
2.Determine math function of distance between curves and calc limit of this function like:
|Fcur1(t)-Fcur2(t)| ->0
Fs is vector.
I think we can calculate the derivative of this for determine extremums and get nearest and farest points
I think about this some time later, and post full response.
Formulate your problem in terms of standard analysis: You have got a quantity to minimize (distance), so you formulate an equation for this quantity and find the points where the first derivatives are zero. Parameterize with a single parameter by using the curve's parameter p, which is between 0 for the first point and 1 for the last point.
In the line case, the equation is fairly simple: Get the x/y coordinates from the spline's equation and compute the distance to the given line via vector equations (scalar product with the line's normal).
In the curve's case, the analytical solution could get pretty complicated. You might want to use a numerical minimization technique such as Nelder-Mead or, since you have a 1D continuous problem, simple bisection.
In the case of a Bézier curve and a line
There are three candidates for the closest point to the line:
The place on the Bézier curve segment that is parallel to the line (if such a place exists),
One end of the curve segment,
The other end of the curve segment.
Test all three; the shortest distance wins.
In the case of two Bézier curves
Depends if you want the exact analytical result, or if an optimised numerical result is good enough.
Analytical result
Given two Bézier curves A(t) and B(s), you can derive equations for their local orientation A'(t) and B'(s). The point pairs for which A'(t) = B'(s) are candidates, i.e. the (t, s) for which the curves are locally parallel. I haven't checked, but I assume that A'(t) - B'(s) = 0 can be solved analytically. If your curves are anything like those you show in your example, there should be either only one solution or no solution to that equation, but there could be two (or infinitely many in the case where the curves identical but translated -- in which case you can ignore this because the winner will always be one of the curve segment endpoints).
In an approach similar to the curve-line case outline above, test each of these point pairs, plus the curve segment endpoints. The shortest distance wins.
Numerical result
Let's say the points on the two Bézier curves are defined as A(t) and B(s). You want to minimize the distance d( t, s) = |A(t) - B(s)|. It's a simple two-parameter optimization problem: find the s and t that minimize d( t, s) with the constraints 0 ≤ t ≤ 1 and 0 ≤ s ≤ 1.
Since d = SQRT( ( xA - xB)² + (yA - yB)²), you can also just minimize the function f( t, s) = [d( t, s)]² to save a square root calculation.
There are numerous ready-made methods for such optimization problems. Pick and choose.
Note that in both cases above, anything higher-order than quadratic Bézier curves can giver you more than one local minimum, so this is something to watch out for. From the examples you give, it looks like your curves have no inflexion points, so this concern may not apply in your case.
The point where there normals match is their nearest point. I mean u draw a line orthogonal to the line. .if that line is orthogonal to the curve as well then the point of intersection is the nearest point

Determine if a set of points lie on a regular grid

Problem: Suppose you have a collection of points in the 2D plane. I want to know if this set of points sits on a regular grid (if they are a subset of a 2D lattice). I would like some ideas on how to do this.
For now, let's say I'm only interested in whether these points form an axis-aligned rectangular grid (that the underlying lattice is rectangular, aligned with the x and y axes), and that it is a complete rectangle (the subset of the lattice has a rectangular boundary with no holes). Any solutions must be quite efficient (better than O(N^2)), since N can be hundreds of thousands or millions.
Context: I wrote a 2D vector field plot generator which works for an arbitrarily sampled vector field. In the case that the sampling is on a regular grid, there are simpler/more efficient interpolation schemes for generating the plot, and I would like to know when I can use this special case. The special case is sufficiently better that it merits doing. The program is written in C.
This might be dumb but if your points were to lie on a regular grid, then wouldn't peaks in the Fourier transform of the coordinates all be exact multiples of the grid resolution? You could do a separate Fourier transform the X and Y coordinates. If theres no holes on grid then the FT would be a delta function I think. FFT is O(nlog(n)).
p.s. I would have left this as a comment but my rep is too low..
Not quite sure if this is what you are after but for a collection of 2d points on a plane you can always fit them on a rectangular grid (down to the precision of your points anyway), the problem may be the grid they fit to may be too sparsly populated by the points to provide any benefit to your algorithm.
to find a rectangular grid that fits a set of points you essentially need to find the GCD of all the x coordinates and the GCD of all the y coordinates with the origin at xmin,ymin this should be O( n (log n)^2) I think.
How you decide if this grid is then too sparse is not clear however
If the points all come only from intersections on the grid then the hough transform of your set of points might help you. If you find that two mutually perpendicular sets of lines occur most often (meaning you find peaks at four values of theta all 90 degrees apart) and you find repeating peaks in gamma space then you have a grid. Otherwise not.
Here's a solution that works in O(ND log N), where N is the number of points and D is the number of dimensions (2 in your case).
Allocate D arrays with space for N numbers: X, Y, Z, etc. (Time: O(ND))
Iterate through your point list and add the x-coordinate to list X, the y-coordinate to list Y, etc. (Time: O(ND))
Sort each of the new lists. (Time: O(ND log N))
Count the number of unique values in each list and make sure the difference between successive unique values is the same across the whole list. (Time: O(ND))
If
the unique values in each dimension are equally spaced, and
if the product of the number of unique values of each coordinate is equal to the number of original points (length(uniq(X))*length(uniq(Y))* ... == N,
then the points are in a regular rectangular grid.
Let's say a grid is defined by an orientation Or (within 0 and 90 deg) and a resolution Res. You could compute a cost function that evaluate if a grid (Or, Res) sticks to your points. For example, you could compute the average distance of each point to its closest point of the grid.
Your problem is then to find the (Or, Res) pair that minimize the cost function. In order to narrow the search space and improve the , some a heuristic to test "good" candidate grids could be used.
This approach is the same as the one used in the Hough transform proposed by jilles. The (Or, Res) space is comparable to the Hough's gamma space.

Resources