Data Structure - Diameter of Polygon

I need a data structure that allows
addPoint(x, y) in O(log N)
printDiameter() in O(log N)
where N is the current number of points in the polygon.
Obviously the two points realizing the diameter will lie on the convex hull of the polygon. Using the concept of antipodal pairs (rotating calipers), we can find the diameter of N points in O(N).
This explains the O(N) solution neatly, but it doesn't support insertion of points.
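For reference, the static computation can be sketched as below (Python; the names are mine, and for brevity the final step checks all hull-vertex pairs, where a rotating-calipers scan over the antipodal pairs would be linear in the hull size):

def convex_hull(points):
    # Andrew's monotone chain, O(N log N); returns the hull counter-clockwise.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    def chain(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]
    return chain(pts) + chain(reversed(pts))

def diameter(points):
    # The diameter is realized by two hull vertices.
    hull = convex_hull(points)
    return max((((p[0]-q[0])**2 + (p[1]-q[1])**2) ** 0.5
                for p in hull for q in hull), default=0.0)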

A k-d tree could do insertion in O(log N) as long as it's kept balanced in some way. For the diameter part, you would have to check every node in order to find the farthest pair, so that would be O(N).
Another solution would be to use a QuadTree, and traverse it in order to obtain only the nodes located at the external part of the tree.

Related

Shortest distance to cover all the (N+1) points. All the N points lie on the x-axis. The remaining point lies anywhere in the coordinate plane

Given (N+1) points. N of them lie on the x-axis; the remaining one (the HEAD point) lies anywhere in the coordinate plane.
Given a START point on the x-axis.
Find the shortest distance to cover all the points starting from the START point. We can traverse a point multiple times.
Example, N+1 = 4:
Points on the x-axis: (0,1), (0,2), (0,3)
HEAD point: (1,1) (only the HEAD point can lie anywhere; the rest lie on the x-axis)
START point: (0,1)
I am looking for a method as to how to approach this problem, e.g. whether we should visit the HEAD point first or somewhere in between.
I tried to find a way to use graph theory to simplify this problem and reduce the paths that need to be considered. If there is an elegant way to represent this problem using graphs to identify a solution, I was not able to find it. This approach becomes very inefficient as n increases: the time and memory are O(2^n).
Looking at this as a tree graph, the root node would be the START point, and each of its child nodes would be a point it is connected to.
Since the START point and all the points aside from the HEAD lie on the x-axis, each non-HEAD point only needs to be connected to its adjacent points on the x-axis. This is because the distance of the path between any two points is the sum of the distances between adjacent points along the path between them (the subset of nodes representing points on the x-axis does not need to form a complete graph). This reduces the brute-force approach somewhat.
Here's a simple example:
The upper left shows the original problem: points on the x-axis along with the START and HEAD points.
In the upper right, this has been transformed into a graph with each node representing a point from the original problem. The edges represent the paths that can be taken between points. This assumes that the START point only represents the first point in the path. Unlike the other nodes, it is only included in the path once. If that is not the case and the path can return to the START point, this would approximately double the possible paths, but the same approach can be followed.
In the bottom left, the START point, a, is the root of a tree graph, and each node connected to the START point is a child node. This process is repeated for each child node until either:
A path that is obviously not optimal is identified, in which case that node can just be excluded from the graph. See the nodes in red boxes; going back and forth between the same nodes is unnecessary.
All points are included when traversing the tree from the root to that node, producing a potential solution.
Note that when creating the tree graph, each time a node is repeated, its "potential" child nodes are the same as the first time the node was included. By "potential", I mean the cases above still need to be checked, because the result might include a nonsensical path, in which case that node would not be included. It is also possible that a potential solution results from the path after its child nodes are included.
The last step is to add up the distances for each of the potential solutions to determine which path is shortest.
This requires a careful examination of the different cases.
Assume for now that START (S) is on the far left and HEAD (H) is somewhere in the middle; the path may be something like
                   H
                  / \
S ---- * ----*----* * --- * ----*
Or it might be shorter to go to H and back via one of the other nodes:
                   H
                  //
S ---- * --- * -- *----------*---*
If S is not at one end, you might have something like
                   H
                  / \
* ---- * ----*----* * --- * ----*
        --------S
Or even going directly from S to H on the first step:
                   H
                  / |
* ---- * ----*----* |
                    S
A full analysis of cases would be quite extensive.
Actually solving the problem might depend on the number of nodes you have. If the number is small (< 10), then complete enumeration might be possible: just work out every possible path, eliminate the ones which are illegal, and choose the smallest. The number of paths is, I think, on the order of n!, so it's computable for small n.
For large n you can break the problem into small segments. I think it's enough to just consider a small patch of nodes on either side of H and a small patch of nodes on either side of S.
This is not really a solution, but a possible way to think about tackling the problem.
(To be pedantic, stackoverflow.com is not the right site for this question in the Stack Exchange network; Computational Science or an algorithms site might be a better place.)
This is a fun problem. First, let's try to find a brute force solution, as Poosh did.
Observations about the Shortest Path
No repeated points
You are in a Euclidean geometry, so the triangle inequality holds: for all points a, b, c, d(a,c) <= d(a,b) + d(b,c). Thus, whenever you have an optimal path that contains a point that occurs more than once, you can remove one of the occurrences without increasing the length, which proves that there is an optimal path containing each point exactly once.
Permutations
Our problem is thus to find the permutation, let's call it M, of the numbers 1...n for the points P_1...P_n (where P_0 is the fixed start point, P_n is the head point, and P_1...P_{n-1} are ordered by increasing x value) that minimizes the sum of |P_{M(i)} - P_{M(i-1)}| for i from 1 to n, with M(0) = 0 and |v| being the vector length sqrt(v_x² + v_y²).
The number of permutations of a set of size n is n!. In this case we have n+1 points, so a brute force approach testing all permutations would have complexity (n+1)!, which is even higher than 2^n and definitely not practical, so we need further observations to improve on this.
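As a sanity check, the brute force is easy to write down for tiny n (a sketch; the function name and the (x, y) tuple format are mine):

from itertools import permutations
from math import dist

def shortest_tour(start, points):
    # Try every visiting order of the remaining points, per the (n+1)!
    # analysis above; only feasible for very small n.
    best = float("inf")
    for order in permutations(points):
        total, cur = 0.0, start
        for p in order:
            total += dist(cur, p)
            cur = p
        best = min(best, total)
    return best

# The question's example: START (0,1), remaining points (0,2), (0,3) and
# HEAD (1,1); the optimum is 1 + sqrt(2) + 1 ≈ 3.414 (HEAD visited first).
print(shortest_tour((0, 1), [(0, 2), (0, 3), (1, 1)]))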
Next Steps
My next step would now be to see if there are any other sequences that can be proven to be not optimal, leading to a reduction in the number of candidates to be tested.
Paths of non-head points
Let's look at all paths (sequences of indices of points) that don't contain the head point and that are parts of the optimal path. If we don't change the start and end point of such a path, then any transpositions inside it have no effect on the outside environment, and we can perform purely local optimizations. We can prove that those sequences must have monotonic (increasing or decreasing) x coordinate values and thus monotonic indices (as the points are ordered by ascending x coordinate between indices 0 and n-1):
We are in a purely one-dimensional subspace, and the total distance of such a path is thus equal to the sum of the absolute differences in x coordinates between one point and the next. It is clear that this sum is minimized by ordering by x coordinate in either ascending or descending order, and thus by ordering the indices in the same way. Note that this is true for maximal such paths as well as for all contiguous "subpaths" of them.
Wrapping it up
The only choices we have left are:
where do we place the head node in the optimal path?
which way do we order the two paths to the left and right?
This means we have n values for the index of the head node (1...n; 0 is fixed as the start node) and 2×2 values for the sort orders. So we have 4n candidates, which we can all evaluate and pick the shortest one. One of the sort orders probably determines the other, but I leave that to you.
Anyway, the complexity of this algorithm is O(4n) = O(n). Because reading the problem's input is O(n) and writing the output is as well, I believe this is an algorithm of optimal complexity. However, if we could reformulate the problem somewhat, so that we could read and write the input and output in some compressed form (only the parameters that we actually need to solve the problem), then it is possible that we could do better.
P.S.: I'm not a mathematician, so I probably used the wrong words for some concepts and missed the usual notation for the variables and functions. I would be glad for some expert to check this for any obvious errors.

how to exclude the existence of a line 'segregating' a set of points

Apologies for the vague title: I can't find a proper name for the theory I am looking for (that's why I am asking the question), so I'm going to explain it with an example, and I hope someone can point me in the right direction.
Suppose you have a set of points in 2D.
The following R code:
# make a random set of N points in 2D space as a numerical matrix
set.seed(1010)
d = 2
N = 15
ps <- matrix(rnorm(d*N), ncol=d)
# center the points (subtract the mean of each coordinate)
pss <- scale(ps, scale=FALSE)
# represent the points in a 2D plot, with the origin (the new mean) in red
plot(pss)
text(pss,label=1:N,pos=4)
points(0,0,col=2,pch=16)
text(0,0,label=0)
abline(v=0)
abline(h=0)
should make a plot of the 15 labelled points, with the origin (the new mean, labelled 0) in red and the axes drawn through it.
Consider point 7. Intuitively one can see that there are several possible lines passing through point 7, which 'leave' all other points on 'one side' of the line (i.e. 'segregate' them in the half-plane defined by the line).
Consider instead point 6. There can never be any line passing through point 6 for which one half-plane contains all the points.
A point like 9 can also have such a line, although it's not particularly evident from the plot.
Question: is there any way to exclude the existence of such a line for each specific point? Meaning, could one do some operations on the coordinates of the points, proving that such a line can NOT exist for a given point (so one could quickly classify each point into one that can or can't have it)? I'm also thinking of higher-dimensional applications, where lines would be planes, etc.
All my searches on the topic so far took me to concepts like 'convex hull' and 'boundary', which indeed seem quite closely related to what I'm looking for, but they go far beyond my simple requirement of classifying the points, and the algorithms are reported to be 'output-sensitive', precisely because they provide a lot of information on the hull itself, which I do not need.
Any ideas?
Thanks!
Given a set of points, the following two statements about an individual point p are equivalent:
There is a line through p such that every point in the set lies on the same side of the line (or on the line),
p is on the boundary of the set's convex hull.
This is true because if p is in the interior of the convex hull, then any line through p divides the convex hull into two parts. If one side of the line has no points in the set, then the other side is a smaller convex region which contains every point. This would contradict the definition of the convex hull, which is the smallest convex set containing every point.
So the set of points satisfying the property (having a line through them that doesn't divide the set in two) is precisely the set of points on the boundary of the convex hull. The latter is what a convex hull algorithm returns, so logically, any algorithm which solves your problem for each point in the set is a convex hull algorithm.
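To make that concrete, here is a sketch (the names are mine; Andrew's monotone chain, written to keep collinear boundary points, since a line along the hull edge works for those too):

def boundary_points(points):
    # Returns the subset of points lying on the convex hull boundary,
    # i.e. exactly the points that admit a 'segregating' line.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return set(pts)
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    def chain(seq):
        out = []
        for p in seq:
            # pop only on strict clockwise turns, so collinear points stay
            while len(out) >= 2 and cross(out[-2], out[-1], p) < 0:
                out.pop()
            out.append(p)
        return out
    return set(chain(pts)) | set(chain(reversed(pts)))

# A point p admits such a line iff p in boundary_points(all_points).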
The only subtle difference I can think of is that standard convex hull algorithms usually also return the boundary points in a particular order, whereas you don't need them in any particular order. But I don't think there is a more efficient convex hull algorithm when the order doesn't matter. The running time is O(n log n) in the worst case, which gives you an average query time per point of at most O(log n).
That is asymptotically optimal for this problem if you want to test every point in the set, since computing even the unordered convex hull takes at least O(n log n) in the worst case, according to an arXiv paper by Herman Haverkort. There's evidence that this is optimal even for just finding the cardinality of the convex hull (see the paper by David Avis).

Fast algorithm to uncross any crossing edges in a set of polygons

I have a number of polygons each represented as a list of points. I'm looking for a fast algorithm to go through the list of polygons and uncross all of the crossed edges until no crossed edges remain.
Pseudocode for the current version:
While True:
    For each pair of polygons:
        for edge1 in first_polygon:
            for edge2 in second_polygon:
                if edges_cross(edge1, edge2):  # uses a line segment intersection test
                    uncross_edges(first_polygon, second_polygon, edge1, edge2)
    If no edges have been uncrossed:
        break
This can be improved a fair bit by replacing the while loop with recursion. However, it's still rather poor in terms of performance.
Below is a simple example of the untangling. There will actually be a large number of polygons and a fair number of points per polygon (around 10-500). The red line shows which two edges are being uncrossed. The result should always be a set of planar graphs, although I'm not sure whether there are multiple valid outcomes or just one.
Edit: this time I added the lines first, then added the points, and used a somewhat more complex shape. Pretend the points are fixed.
First, let us illustrate what you want (if I got it right). Suppose you have two polygons, one of which has an edge (a, b) that intersects an edge (s, r) of the other. These polygons also have a clockwise orientation, so you know the next vertex after b and the next vertex after r. Since the edges cross, you remove them both and add four new ones: (a, r), (r, next(b)); (s, b), (b, next(r)). So you again have two polygons. This is illustrated in the following figure. Note that by initially removing only two edges (one from each polygon), all the crossings were resolved.
Speeding up the trivial O(n^2)-per-iteration implementation is not entirely easy, and 500 points per polygon is a very small amount to be worried about. If you decide that you need to improve this time, my initial suggestion would be to use the Bentley-Ottmann algorithm in some smart way: run the algorithm, and when you find an intersection, apply the procedure above to eliminate it, then update the events that guide the sweep. Hopefully the events to be handled can be updated without rendering the algorithm useless for this situation, but I don't have a proof of that.
It seems that you want to end up with an embedded planar polygon whose vertices are exactly a given collection of points. The desired "order" on the points is what you get by going around the boundary of the polygon and enumerating the vertices in the order they appear.
For a given collection of points there will in general be more than one embedded polygon with this property; for example, consider the following list of points:
(-1,-1), (0,0), (1,0), (1,1), (0,1)
This list defines a polygon meeting your criteria (if I understand it correctly). But so does the following ordering of this list:
(-1,-1), (1,0), (0,0), (1,1), (0,1)
Here is one algorithm that will work (I don't know about fast).
First, sort your points by x-coordinate (eg with quicksort) in increasing order (call this list L).
Second, find the convex hull (eg with quickhull); the boundary of the convex hull will contain the leftmost and rightmost points in the sorted list L (call these L[1] and L[n]); let S be the subset of points on the boundary between L[1] and L[n].
The list you want is S in the order it appears in L (which will also be the order it appears in the boundary of the convex hull), followed by the other elements L-S in the reverse of the order they appear in L.
The first two operations should usually take time O(n log n) (worst case O(n^2)); the last will take time O(n). The polygon you get will be the lower boundary of the convex hull (from left to right, say), together with the rest of the points in a "zigzag" above them going from right to left.
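A sketch of this recipe in Python (assuming points in general position with distinct x-coordinates, and taking S to be the lower hull chain, per the previous paragraph):

def zigzag_polygon(points):
    L = sorted(points)                      # sort by x-coordinate
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    S = []                                  # lower hull chain from L[1] to L[n]
    for p in L:
        while len(S) >= 2 and cross(S[-2], S[-1], p) <= 0:
            S.pop()
        S.append(p)
    on_hull = set(S)
    rest = [p for p in L if p not in on_hull]
    return S + rest[::-1]                   # base left-to-right, zigzag back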

Node's position in tree as a Feature Vector?

Background
I have a tree of nodes and I'm trying to run some machine learning algorithms to classify them. One of the features I want to use is the position of the nodes in the tree, i.e. closer nodes are likely to be in the same class.
My Problem
I'm representing all the features as a vector of numbers. Any thoughts on how I can represent position in the tree as a vector, so that the distance between two vectors corresponds to the distance between nodes in the tree? (I have a small tree of depth around 5-7 and branching factor around 2-3.)
What I tried
P.S. I read about algorithms to find the shortest distance between two nodes (finding each one's distance to their closest common ancestor). One idea I found was to have a vector x where each index corresponds to a possible ancestor in the tree, and to set x[i] = number of levels from that ancestor. The problem with that is that I don't know what to do with nodes that aren't ancestors.
Just put the path in the tree as the vector, then simply calculate the length of the difference between the two paths. For example, 2,3,1,5,3 is one path and 2,3,3,5,9,5 is another path. They have 2,3 in common, so the difference is 1,5,3 and 3,5,9,5, whose total length is 7. Good luck.
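In code, that suggestion might look like this (a sketch; paths are lists of child indices from the root):

def path_distance(p, q):
    # Strip the common prefix, then count what remains on both sides.
    k = 0
    while k < len(p) and k < len(q) and p[k] == q[k]:
        k += 1
    return (len(p) - k) + (len(q) - k)

# The answer's example: common prefix 2,3 leaves lengths 3 + 4 = 7.
assert path_distance([2, 3, 1, 5, 3], [2, 3, 3, 5, 9, 5]) == 7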
So there is actually a very nice way to derive the features that you want; you can do so with MDS.
What MDS does is take an N by N matrix (here N is the number of nodes) where entry a_{i,j} is the distance between item i and item j (node i and node j), and for each item i it returns a D-dimensional (D pre-specified) position vector D_i, such that the distance between D_i and D_j is approximately a_{i,j}.
Thus, we can get your feature vectors with a bit of pre-processing. First, find the shortest distance (in hops) between each pair of nodes (you could use Floyd-Warshall), then use the distance matrix as input for MDS, specify the number of dimensions for your position vectors, and MDS will output position vectors of said dimensions.
If you search the web I'm sure you can find open sourced implementations to both Floyd-Warshall and MDS.
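A sketch of that pipeline, assuming scipy and scikit-learn are acceptable dependencies (the toy tree and all the names here are mine):

import numpy as np
from scipy.sparse.csgraph import floyd_warshall
from sklearn.manifold import MDS

# Toy tree: node 0 is the root, 1 and 2 its children, 3 and 4 children of 1.
n = 5
adj = np.zeros((n, n))
for u, v in [(0, 1), (0, 2), (1, 3), (1, 4)]:
    adj[u, v] = adj[v, u] = 1

# Pairwise hop distances between all nodes: the N x N input matrix for MDS.
hops = floyd_warshall(adj, directed=False, unweighted=True)

# Embed into D = 2 dimensions so that Euclidean distances between the
# output vectors approximate the hop counts.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
positions = mds.fit_transform(hops)  # one 2-D feature vector per node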

A simple algorithm for polygon intersection

I'm looking for a very simple algorithm for computing the polygon intersection/clipping.
That is, given polygons P, Q, I wish to find polygon T which is contained in P and in Q, and I wish T to be maximal among all possible polygons.
I don't mind the run time (I have a few very small polygons), and I can also afford to get an approximation of the polygons' intersection (that is, a polygon with fewer points which is still contained in the intersection).
But it is really important for me that the algorithm be simple (cheaper testing) and preferably short (less code).
Edit: please note, I wish to obtain a polygon which represents the intersection. I don't need only a boolean answer to the question of whether the two polygons intersect.
I understand the original poster was looking for a simple solution, but unfortunately there really is no simple solution.
Nevertheless, I've recently created an open-source freeware clipping library (written in Delphi, C++ and C#) which clips all kinds of polygons (including self-intersecting ones). This library is pretty simple to use: https://github.com/AngusJohnson/Clipper2
You could use a Polygon Clipping algorithm to find the intersection between two polygons. However these tend to be complicated algorithms when all of the edge cases are taken into account.
One implementation of polygon clipping that you can use your favorite search engine to look for is Weiler-Atherton; see the Wikipedia article on Weiler-Atherton.
Alan Murta has a complete implementation of a polygon clipper GPC.
Edit:
Another approach is to first divide each polygon into a set of triangles, which are easier to deal with. The Two-Ears Theorem by Gary H. Meisters does the trick. This page at McGill does a good job of explaining triangle subdivision.
If you use C++, and don't want to create the algorithm yourself, you can use Boost.Geometry. It uses an adapted version of the Weiler-Atherton algorithm mentioned above.
You have not given us your representation of a polygon. So I am choosing (more like suggesting) one for you :)
Represent each polygon as one big convex polygon, and a list of smaller convex polygons which need to be 'subtracted' from that big convex polygon.
Now given two polygons in that representation, you can compute the intersection as:
Compute the intersection of the big convex polygons to form the big polygon of the intersection. Then 'subtract' the intersections of all the smaller ones of both to get a list of subtracted polygons.
You get a new polygon following the same representation.
Since convex polygon intersection is easy, this intersection finding should be easy too.
This seems like it should work, but I haven't given it deeper thought regarding correctness and time/space complexity.
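For the "convex polygon intersection is easy" part, a Sutherland-Hodgman style clip is one option. A sketch, assuming both polygons are convex, in counter-clockwise order, and non-degenerate:

def clip_convex(subject, clip):
    # Clip 'subject' against each edge of the convex polygon 'clip' in turn.
    def side(p, a, b):  # > 0 if p is left of the directed line a->b
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
    def crossing(p, q, a, b):  # where segment p->q meets the line a->b
        d1, d2 = side(p, a, b), side(q, a, b)
        t = d1 / (d1 - d2)
        return (p[0] + t*(q[0]-p[0]), p[1] + t*(q[1]-p[1]))
    out = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j], inp[(j + 1) % len(inp)]
            if side(q, a, b) >= 0:           # q inside this half-plane
                if side(p, a, b) < 0:        # entering: record the crossing
                    out.append(crossing(p, q, a, b))
                out.append(q)
            elif side(p, a, b) >= 0:         # leaving: record the crossing
                out.append(crossing(p, q, a, b))
        if not out:
            return []                        # empty intersection
    return out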
Here's a simple-and-stupid approach: on input, discretize your polygons into a bitmap. To intersect, AND the bitmaps together. To produce output polygons, trace out the jaggy borders of the bitmap and smooth the jaggies using a polygon-approximation algorithm. (I don't remember if that link gives the most suitable algorithms, it's just the first Google hit. You might check out one of the tools out there to convert bitmap images to vector representations. Maybe you could call on them without reimplementing the algorithm?)
The most complex part would be tracing out the borders, I think.
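For what it's worth, the AND step is the easy part. A sketch with numpy and matplotlib's point-in-polygon test (the grid resolution is the accuracy knob; the border tracing and smoothing are not shown):

import numpy as np
from matplotlib.path import Path

def rasterize(poly, xs, ys):
    # Boolean bitmap: which grid samples fall inside the polygon.
    gx, gy = np.meshgrid(xs, ys)
    samples = np.column_stack([gx.ravel(), gy.ravel()])
    return Path(poly).contains_points(samples).reshape(gx.shape)

xs = np.linspace(0.0, 10.0, 200)
ys = np.linspace(0.0, 10.0, 200)
p1 = [(1, 1), (8, 1), (8, 8), (1, 8)]
p2 = [(4, 4), (10, 4), (10, 10), (4, 10)]
mask = rasterize(p1, xs, ys) & rasterize(p2, xs, ys)  # bitmap of the intersection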
Back in the early 90s I faced something like this problem at work, by the way. I muffed it: I came up with a (completely different) algorithm that would work on real-number coordinates, but seemed to run into a completely unfixable plethora of degenerate cases in the face of the realities of floating-point (and noisy input). Perhaps with the help of the internet I'd have done better!
I have no very simple solution, but here are the main steps for the real algorithm:
Use a custom doubly linked list for the polygon vertices and edges. Using std::list won't do, because you must swap next and previous pointers/offsets yourself for a special operation on the nodes. This is the only way to have simple code, and it will give good performance.
Find the intersection points by comparing each pair of edges. Note that comparing each pair of edges gives O(N²) time, but improving the algorithm to O(N·log N) is easy afterwards. For a pair of edges (say a→b and c→d), the intersection point is found by using the parameter (from 0 to 1) on edge a→b, which is given by tₐ = d₀/(d₀-d₁), where d₀ is (a-c)×(d-c) and d₁ is (b-c)×(d-c), and × is the 2D cross product, p×q = pₓ·qᵧ - pᵧ·qₓ. After having found tₐ, finding the intersection point means using it as a linear interpolation parameter on segment a→b: P = a + tₐ(b-a). (A code sketch of this step follows the list of steps.)
Split each edge, adding vertices (and nodes in your linked list) where the segments intersect.
Then you must cross the nodes at the intersection points. This is the operation for which you needed the custom doubly linked list: you must swap some pairs of next pointers (and update the previous pointers accordingly).
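Here is a sketch of step 2 in Python for concreteness (degenerate and collinear-overlap cases are ignored; the names are mine):

def seg_intersection(a, b, c, d):
    # Points are (x, y) tuples; returns the crossing of segments a->b
    # and c->d, or None if they don't cross.
    def cross(p, q):  # 2D cross product p×q
        return p[0]*q[1] - p[1]*q[0]
    def sub(p, q):
        return (p[0]-q[0], p[1]-q[1])
    cd, ab = sub(d, c), sub(b, a)
    d0 = cross(sub(a, c), cd)   # signed side of a relative to line c->d
    d1 = cross(sub(b, c), cd)   # signed side of b relative to line c->d
    e0 = cross(sub(c, a), ab)   # signed side of c relative to line a->b
    e1 = cross(sub(d, a), ab)   # signed side of d relative to line a->b
    if d0 == d1 or e0 == e1:    # parallel (or degenerate) edges
        return None
    t = d0 / (d0 - d1)          # parameter of the crossing along a->b
    s = e0 / (e0 - e1)          # parameter of the crossing along c->d
    if 0 <= t <= 1 and 0 <= s <= 1:
        return (a[0] + t*ab[0], a[1] + t*ab[1])
    return None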
Then you have the raw result of the polygon intersection algorithm. Normally, you will want to select some regions according to the winding number of each region. Search for "polygon winding number" for an explanation of this.
If you want to make an O(N·log N) algorithm out of this O(N²) one, you must do exactly the same thing, except inside a line sweep algorithm (look up the Bentley-Ottmann algorithm). The inner algorithm will be the same, with the only difference that you will have a reduced number of edges to compare inside the loop.
The way I worked on the same problem:
1. break the polygons into line segments;
2. find intersecting lines using interval trees or a line sweep algorithm;
3. find a closed path with adjacent vertices using a Graham scan;
4. cross-reference 3 with Dinic's algorithm to dissolve them.
Note: my scenario was different, given that the polygons had a common vertex. But I hope this can help.
If you do not care about predictable run time, you could try first splitting your polygons into unions of convex polygons and then computing the pairwise intersections between the sub-polygons.
This would give you a collection of convex polygons such that their union is exactly the intersection of your starting polygons.
If the polygons are not aligned then they have to be aligned. I would do this by finding the centre of the polygon (average in X, average in Y), then incrementally rotating the polygon by a matrix transformation, projecting the points onto one of the axes, and using the angle of minimum stdev to align the shapes (you could also use principal components). For finding the intersection, a simple algorithm is to define a grid of points. For each grid point, maintain a count of points inside one polygon, the other polygon, or both (union); there are simple & fast point-in-polygon algorithms for this, e.g. http://wiki.unity3d.com/index.php?title=PolyContainsPoint. Count the points inside both polygon1 & polygon2, divide by the number of points inside polygon1 or polygon2, and you have a rough (depending on the grid sampling) estimate of the proportion by which the polygons overlap. The intersection area is given by the points corresponding to an AND operation.
E.g.:
function get_polygon_intersection($arr, $user_array)
{
    $maxx = -999; // choose sensible limits for your application
    $maxy = -999;
    $minx = 999;
    $miny = 999;
    $intersection_count = 0;
    $not_intersected = 0;
    $sampling = 20;
    // find min, max values of both polygons (min/max variables passed by reference)
    get_array_extent($arr, $maxx, $maxy, $minx, $miny);
    get_array_extent($user_array, $maxx, $maxy, $minx, $miny);
    $inc_x = ($maxx - $minx) / $sampling; // parenthesised: range divided by sampling
    $inc_y = ($maxy - $miny) / $sampling;
    // see if x,y is within poly1 and poly2 and count
    for ($i = $minx; $i <= $maxx; $i += $inc_x)
    {
        for ($j = $miny; $j <= $maxy; $j += $inc_y)
        {
            $in_arr = pt_in_poly_array($arr, $i, $j);
            $in_user_arr = pt_in_poly_array($user_array, $i, $j);
            if ($in_arr && $in_user_arr)
            {
                $intersection_count++;
            }
            else
            {
                $not_intersected++;
            }
        }
    }
    // return score as percentage intersection
    return 100.0 * $intersection_count / ($not_intersected + $intersection_count);
}
This can be a huge approximation, depending on your polygons, but here's one:
Compute the center of mass for each polygon.
Compute the min or max or average distance from each point of the polygon to its center of mass.
If C1C2 <= D1 + D2 (where C1/C2 are the centers of the first/second polygon and D1/D2 are the distances you computed for the first/second polygon), then the two polygons "intersect".
Still, this should be very efficient, as any transformation of the polygon applies in the very same way to the center of mass, and the center-to-vertex distances can be computed only once.
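A sketch of that test (the vertex average stands in for the center of mass, and the max distance gives a bounding circle per polygon):

from math import dist

def may_intersect(poly1, poly2):
    # Bounding-circle pre-test: if the circles are disjoint the polygons
    # certainly don't intersect; if they overlap, they only *may* intersect.
    def center(poly):
        n = len(poly)
        return (sum(x for x, _ in poly) / n, sum(y for _, y in poly) / n)
    c1, c2 = center(poly1), center(poly2)
    d1 = max(dist(c1, p) for p in poly1)
    d2 = max(dist(c2, p) for p in poly2)
    return dist(c1, c2) <= d1 + d2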
