R: Find a shape from a point cloud

I have a point cloud like such below
df <- data.frame(x=c(2,3,3,5,6,2,6,7,7,4,3,8,9,10,10,12,11,12,14,15),
y=c(6,5,4,4,4,4,3,3,2,3,7,3,2,3,4,6,5,5,4,6))
plot(df,xlab="",ylab="",pch=20)
Think of them as GPS coordinates of movement by an animal. I would like to find the spatial area covered by the points (the animal). The most obvious solution is a convex hull, which produces this:
df1 <- df[chull(x = df$x, y = df$y), ]
polygon(x = df1$x, y = df1$y)
But this is not the result I am looking for. The movement area is not a closed geometric shape, but rather a boomerang kind of shape. The convex hull covers a lot of area not covered by the animal thereby overestimating the area. I am looking for something like this:
Of course, this is a mock dataset to give an idea. The original datasets have a lot more points and varying geometries in the point cloud. I was thinking along the lines of DBSCAN or minimum spanning networks, but they don't quite work.
I am not sure how to describe this geometrically or mathematically. If anyone has any ideas on how to approach this (even if it's not a full solution), I would very much appreciate that. If anyone has a better title for this question, that would be nice too :-)
Thanks.
Update ----------------------------------------------------------------
Plot of the minimum spanning tree (MST). I think this might be in the right direction.
library(ape)
d <- dist(df)
mstree <- mst(d)
plot(mstree, x1 = df$x, x2 = df$y)

Try alphahull
library(alphahull)
p <- ahull(df$x, df$y, alpha = 2.5)
plot(p)
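If the alpha shape looks plausible, its area can be reported too; a minimal sketch, assuming I remember the areaahull() helper in alphahull correctly and reusing the p object above:
areaahull(p)   # area enclosed by the alpha-convex hull (alpha = 2.5)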
Still, purely geometric tricks like this are rarely helpful for animal tracking data. They're too ad hoc to be applicable to other cases, and they say nothing about the temporal component, the environment, the uncertainty of the locations, or the relationship between the point samples and the real track.

library(geometry)
polyarea(df$x, df$y)
[1] 18.5
This requires the right order though.
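For instance, ordering the vertices first (here with chull(), so this gives the convex-hull area rather than the boomerang shape) avoids traversing a self-intersecting polygon; a minimal sketch:
library(geometry)
idx <- chull(df$x, df$y)          # hull vertex indices, already in traversal order
polyarea(df$x[idx], df$y[idx])    # area of the polygon visited in that order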

You might want to consider an approach based on TSP heuristics. Such approaches are near ideal when all points are relevant.
Below is a simple approach extended from the insertion heuristic for TSP that might be workable, but it's O(N^2) or worse unless you're rather careful with the data structure. The link gives the following description of the convex-hull heuristic.
Convex Hull, O(n^2*log^2(n))
Find the convex hull of our set of cities, and make it our initial subtour.
For each city not in the subtour, find its cheapest insertion (as in step 3 of Nearest Insertion). Then choose the city with the least cost/increase ratio, and insert it.
Repeat step 2 until no more cities remain.
In this case, the cities are the data points, and since the goal isn't to connect all of the data points but rather to get the general shape, an extra step is needed to determine when a data point either shouldn't be added or is no longer needed and can be removed. The issue, though, is that it's not clear which points would be considered irrelevant.
This TSP Test Data site should give you an idea of what the results of that heuristic will be, and how you want to go about removing points from the resulting "tour" that you consider irrelevant.
One possible solution is to keep track of the original convex hull and limit the increase in distance between two adjacent hull points to some (relatively small) multiple of the original distance between them, which is similar to how alpha hulls work. This would prevent shapes such as the one at the bottom of this, TSP Test Case BCL380, by limiting the distance that can be traveled between two hull points.
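A rough R sketch of my own (not a tuned implementation) showing the convex-hull insertion mechanics on the example data frame df, with no removal step yet, so it still visits every point:
hull_insertion_tour <- function(pts) {
  d <- as.matrix(dist(pts))                # pairwise point distances
  tour <- chull(pts)                       # start from the convex hull
  rest <- setdiff(seq_len(nrow(pts)), tour)
  while (length(rest) > 0) {
    best <- c(Inf, NA, NA)                 # cheapest (cost, point, edge position) so far
    for (k in rest) {
      for (i in seq_along(tour)) {
        j <- if (i == length(tour)) 1 else i + 1
        cost <- d[tour[i], k] + d[k, tour[j]] - d[tour[i], tour[j]]
        if (cost < best[1]) best <- c(cost, k, i)
      }
    }
    tour <- append(tour, best[2], after = best[3])   # insert the cheapest point
    rest <- setdiff(rest, best[2])
  }
  tour
}
tour <- hull_insertion_tour(df)
plot(df, pch = 20)
polygon(df[tour, ], border = "red")        # the "boomerang"-ish outline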


how to exclude the existence of a line 'segregating' a set of points

Apologies for the vague title: I can't find a proper name for the theory I am looking for (that's why I am asking the question), so I'm going to explain it with an example, and I hope someone can point me in the right direction.
Suppose you have a set of points in 2D.
The following R code:
# make a random set of N points in 2D space as a numerical matrix
set.seed(1010)
d = 2
N = 15
ps <- matrix(rnorm(d*N), ncol = d)
# center the points (subtract the mean of each coordinate)
pss <- scale(ps,scale=F)
# represent the points in a 2D plot, with the origin (the new mean) in red
plot(pss)
text(pss,label=1:N,pos=4)
points(0,0,col=2,pch=16)
text(0,0,label=0)
abline(v=0)
abline(h=0)
should make a plot like:
Consider point 7. Intuitively one can see that there are several possible lines passing through point 7, which 'leave' all other points on 'one side' of the line (i.e. 'segregate' them in the half-plane defined by the line).
Consider instead point 6. There can never be any line passing through point 6 for which one half-plane contains all the points.
A point like 9 can also have such a line, although it's not particularly evident from the plot.
Question: is there any way to exclude the existence of such a line for each specific point? Meaning, could one do some operations on the coordinates of the points proving that such a line can NOT exist for a given point (so one could quickly classify the points into those that can and those that can't have it)? I'm also thinking of higher-dimensional applications, where lines would be planes, etc.
All my searches on the topic so far have taken me to concepts like 'convex hull' and 'boundary', which do seem closely related to what I'm looking for, but they go far beyond my simple requirement of classifying the points and are reported to be 'output-sensitive', precisely because they provide a lot of information about the hull itself, which I do not need.
Any ideas?
Thanks!
Given a set of points, the following two statements about an individual point p are equivalent:
There is a line through p such that every point in the set lies on the same side of the line (or on the line),
p is on the boundary of the set's convex hull.
This is true because if p is in the interior of the convex hull, then any line through p divides the convex hull into two parts. If one side of the line has no points in the set, then the other side is a smaller convex region which contains every point. This would contradict the definition of the convex hull, which is the smallest convex set containing every point.
So the set of points which satisfy the property about having a line which doesn't divide the set in two, is precisely the same set of points that are on the boundary of the convex hull. The latter is what a convex hull algorithm returns, so logically, any algorithm which solves your problem for each point in the set is a convex hull algorithm.
The only subtle difference I can think of is that standard convex hull algorithms usually also return the boundary points in a particular order, whereas you don't need them in any particular order. But I don't think there is a more efficient convex hull algorithm when the order doesn't matter. The running time is O(n log n) in the worst case, which gives you an average query time per point of at most O(log n).
That is asymptotically optimal for this problem if you want to test each point in the set, since computing even the unordered convex hull takes at least O(n log n) in the worst case, according to an arXiv paper by Herman Haverkort. There's evidence that this is optimal even for just finding the cardinality of the convex hull (see this paper by David Avis).
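To make that concrete, a short sketch using the pss matrix from the question; note that chull() returns the hull vertices, so a point lying in the interior of a hull edge would need an extra collinearity check:
on_hull <- seq_len(nrow(pss)) %in% chull(pss)   # TRUE -> a segregating line exists
plot(pss, pch = 16, col = ifelse(on_hull, "red", "black"))
text(pss, labels = 1:N, pos = 4)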

Suggested package/function to compute the vertices of a 3-simplex (tetrahedron)

I'd like to draw the 3-simplex which encloses some random points in 3D. So for example:
pts <- rnorm(30)
pts <- matrix(pts, ncol = 3)
With these points, I'd like to compute the vertices of the 3-simplex (irregular tetrahedron) that just encloses these points. Can someone suggest a package/function that will do this? All manner of searching for simplex-related material is dominated by answers that relate to using simplices for other purposes, of which there are many. I just want to compute one and draw it. Seems simple, but I don't seem to know the relevant keywords for what I need.
If nobody can find a suitable package for this, you'll have to settle for doing it yourself, which isn't so difficult if you don't require it to be the absolute tightest fit. See this question over at mathexchange.
The simplest approach presented in that question seems to me to be translating the points so that they all lie in the positive orthant (i.e., all coordinates are positive) and then projecting them to lie within the simplex spanned by the unit vectors. To get this simplex in your original coordinate system, you can take the inverse projection and inverse translation of the points in this simplex.
Another approach suggested there is to find an enveloping sphere (for instance with Ritter's algorithm) and then find an enveloping simplex of that sphere, which might be an easier task depending on what you are most comfortable with.
I think you're looking for convhulln in the geometry package, but I'm no expert, so maybe that isn't quite what you are looking for.
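For what it's worth, a minimal sketch of that suggestion; note that convhulln() gives the convex hull as a set of triangular facets over however many points are extreme, not a 4-vertex enclosing tetrahedron:
library(geometry)
pts <- matrix(rnorm(30), ncol = 3)
hull <- convhulln(pts)          # each row: vertex indices of one triangular facet
pts[unique(c(hull)), ]          # coordinates of the hull vertices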

quality analysis of fitted pyramid

Sorry for posting this on a programming site, but there may be many programmers here who are well versed in geometry and 3D geometry, so please allow it.
I have been given best-fitted planes along with the original point data. I want to model a pyramid for this data, as the data represent a pyramid. My approach to this modeling is:
Finding the intersection lines (e.g. AB, CD, etc.) for each pair of adjacent planes
Then, finding the pyramid top (T) from the previously found lines, since these lines don't pass through a single point
Intersecting the available side planes with a desired horizontal plane to get the base
In the figure, black triangles are the original best-fitted triangles; red and blue triangles are the model triangles.
I want to show that the points fit the pyramid model better than they fit the given best-fitted planes. (Assume the original planes are updated as shown.)
Actually, step 2 is done using a weighted least-squares process. Each intersection line is assigned a weight proportional to the angle between the normal vectors of the corresponding planes. In this step I tried to find the point that is closest to all the intersection lines, i.e. point T. Because of the weights, the line positions might shift under the influence of a high-weight line, which means the original planes could change a little. So I want to show that these new plane positions fit the original point data better than the original planes do.
Any idea how to show this? I am thinking of using RMSE and comparing before and after. But I think I should use a weighted RMSE, since all the planes referring to point T are affected, so I should treat this as a global case rather than looking at individual planes. I can't figure out a way to show this, or maybe I should use some other measure.
So I am confused; please help.
If you are given the best-fit planes, why not intersect the three of them to get a single unambiguous T, then determine the lines AT, BT, and CT?
This is not a rhetorical question, by the way. Your actual question seems to be for reassurance that your procedure yields "well-fitted" results, but you have not explained or described what kind of fit you're looking for!
Unfortunately, without this information, your question cannot be answered as asked. If you describe your goals, we may be able to help you achieve them -- or, if you have not yet articulated them for yourself, that exercise may be enough to let you answer your own question...
That said, I will mention that the only difference between the planes you started with and the planes your procedure ends up with should be due to floating point error. This is because, geometrically speaking, all three lines should intersect at the same point as the planes that generated them.
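As a small illustration of that first suggestion, intersecting three planes nᵢ·x = dᵢ is just a 3x3 linear solve; the coefficients below are made up, since the fitted planes aren't given here:
# hypothetical plane normals (rows of A) and offsets d, i.e. A %*% x == d
A <- rbind(c( 1,  0, 1),
           c( 0,  1, 1),
           c(-1, -1, 1))
d <- c(2, 2, 1)
T_apex <- solve(A, d)   # the single intersection point T (fails only if the planes are degenerate)
T_apex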

Finding a density peak / cluster center in 2D grid / point process

I have a dataset with minute-by-minute GPS coordinates recorded by a person's cellphone, i.e. the dataset has 1440 rows with LON/LAT values. Based on the data I would like a point estimate (lon/lat value) of where the participant's home is. Let's assume that home is the single location where they spend most of their time in a given 24h interval. Furthermore, the GPS sensor most of the time has quite high accuracy, however sometimes it is completely off, resulting in gigantic outliers.
I think the best way to go about this is to treat it as a point process and use 2D density estimation to find the peak. Is there a native way to do this in R? I looked into kde2d (MASS) but this didn't really seem to do the trick. kde2d creates a 25x25 grid of the data range with density values. However, in my data the person can easily travel 100 miles or more per day, so these grid cells are generally far too coarse. I could use a much denser grid, but I am sure there must be a better way to get a point estimate.
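For reference, a hedged sketch of the kde2d() route with a much denser evaluation grid than the 25x25 default; lon, lat and the grid size here are placeholders rather than the actual data:
library(MASS)
# lon, lat: numeric vectors of the day's 1440 fixes (placeholders)
dens <- kde2d(lon, lat, n = 500)                           # 500 x 500 evaluation grid
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]  # row = x index, col = y index
home <- c(dens$x[peak[1]], dens$y[peak[2]])                # point estimate of "home"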
There are "time spent" functions in the trip package (I'm the author). You can create objects from the track data that understand the underlying track process over time, and simply process the points assuming straight line segments between fixes. If "home" is where the largest value pixel is, i.e. when you break up all the segments based on the time duration and sum them into cells, then it's easy to find it. A "time spent" grid from the tripGrid function is a SpatialGridDataFrame with the standard sp package classes, and a trip object can be composed of one or many tracks.
Using rgdal you can easily transform coordinates to an appropriate map projection if lon/lat is not appropriate for your extent, but it makes no difference to the grid/time-spent calculation of line segments.
There is a simple speedfilter to remove fixes that imply movement that is too fast, but that is very simplistic and can introduce new problems, in general updating or filtering tracks for unlikely movement can be very complicated. (In my experience a basic time spent gridding gets you as good an estimate as many sophisticated models that just open up new complications). The filter works with Cartesian or long/lat coordinates, using tools in sp to calculate distances (long/lat is reliable, whereas a poor map projection choice can introduce problems - over short distances like humans on land it's probably no big deal).
(The function tripGrid calculates the exact components of the straight line segments using pixellate.psp, but that detail is hidden in the implementation).
In terms of data preparation, trip is strict about a sensible sequence of times and will prevent you from creating an object if the data have duplicates, are out of order, etc. There is an example of reading data from a text file in ?trip, and a very simple example with (really) dummy data is:
library(trip)
d <- data.frame(x = 1:10, y = rnorm(10), tms = Sys.time() + 1:10, id = gl(1, 5))
coordinates(d) <- ~x+y
tr <- trip(d, c("tms", "id"))
g <- tripGrid(tr)
pt <- coordinates(g)[which.max(g$z), ]
image(g, col = c("transparent", heat.colors(16)))
lines(tr, col = "black")
points(pt[1], pt[2], pch = "+", cex = 2)
That dummy track has no overlapping regions, but it shows that finding the max point in "time spent" is simple enough.
How about using the location that minimises the sum squared distance to all the events? This might be close to the supremum of any kernel smoothing if my brain is working right.
If your data comprises two clusters (home and work) then I think the location will be in the biggest cluster rather than between them. It's not the same as the simple mean of the x and y coordinates.
For an uncertainty on that, jitter your data by whatever your positional uncertainty is (would be great if you had that value from the GPS, otherwise guess - 50 metres?) and recompute. Do that 100 times, do a kernel smoothing of those locations and find the 95% contour.
Not rigorous, and I need to experiment with this minimum distance/kernel supremum thing...
In response to spacedman - I am pretty sure least squares won't work. Least squares is best known for bowing to the demands of outliers, without much weighting to things that are "nearby". This is the opposite of what is desired.
The bisquare estimator would probably work better, in my opinion - but I have never used it. I think it also requires some tuning.
It's more or less like a least squares estimator up to a certain distance from 0, and then the weighting is constant beyond that. So once a point becomes an outlier, its penalty is constant. We don't want outliers to weigh more and more as we move away from them; we would rather weight them constantly and let the optimization focus on better fitting the things in the vicinity of the cluster.
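For what it's worth, a rough sketch of that idea with a Tukey-bisquare-style loss and optim(); xy is an assumed two-column matrix of fixes, and k is a tuning constant you would have to pick (roughly the scale beyond which a point counts as an outlier):
# bounded (bisquare-style) loss: quadratic-ish near zero, flat beyond k
rho <- function(r, k) ifelse(abs(r) < k, (k^2 / 6) * (1 - (1 - (r / k)^2)^3), k^2 / 6)
obj <- function(p, xy, k) sum(rho(sqrt((xy[, 1] - p[1])^2 + (xy[, 2] - p[2])^2), k))
fit <- optim(colMeans(xy), obj, xy = xy, k = 0.001)   # start at the plain mean
fit$par                                               # robust "home" estimate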

A simple algorithm for polygon intersection

I'm looking for a very simple algorithm for computing the polygon intersection/clipping.
That is, given polygons P, Q, I wish to find polygon T which is contained in P and in Q, and I wish T to be maximal among all possible polygons.
I don't mind the run time (I have a few very small polygons), I can also afford getting an approximation of the polygons' intersection (that is, a polygon with less points, but which is still contained in the polygons' intersection).
But it is really important for me that the algorithm will be simple (cheaper testing) and preferably short (less code).
edit: please note, I wish to obtain a polygon which represents the intersection. I don't need just a boolean answer to the question of whether the two polygons intersect.
I understand the original poster was looking for a simple solution, but unfortunately there really is no simple solution.
Nevertheless, I've recently created a free, open-source clipping library (written in Delphi, C++ and C#) which clips all kinds of polygons (including self-intersecting ones). This library is pretty simple to use: https://github.com/AngusJohnson/Clipper2
You could use a Polygon Clipping algorithm to find the intersection between two polygons. However these tend to be complicated algorithms when all of the edge cases are taken into account.
One implementation of polygon clipping that you can look up with your favorite search engine is Weiler-Atherton; see the Wikipedia article on Weiler-Atherton.
Alan Murta has a complete implementation of a polygon clipper GPC.
Edit:
Another approach is to first divide each polygon into a set of triangles, which are easier to deal with. The Two-Ears Theorem by Gary H. Meisters does the trick. This page at McGill does a good job of explaining triangle subdivision.
If you use C++, and don't want to create the algorithm yourself, you can use Boost.Geometry. It uses an adapted version of the Weiler-Atherton algorithm mentioned above.
You have not given us your representation of a polygon. So I am choosing (more like suggesting) one for you :)
Represent each polygon as one big convex polygon, and a list of smaller convex polygons which need to be 'subtracted' from that big convex polygon.
Now given two polygons in that representation, you can compute the intersection as:
Compute the intersection of the big convex polygons to form the big polygon of the intersection. Then 'subtract' the intersections of all the smaller ones of both to get a list of subtracted polygons.
You get a new polygon following the same representation.
Since convex polygon intersection is easy, this intersection finding should be easy too.
This seems like it should work, but I haven't given it deeper thought with regard to correctness or time/space complexity.
Here's a simple-and-stupid approach: on input, discretize your polygons into a bitmap. To intersect, AND the bitmaps together. To produce output polygons, trace out the jaggy borders of the bitmap and smooth the jaggies using a polygon-approximation algorithm. (I don't remember if that link gives the most suitable algorithms, it's just the first Google hit. You might check out one of the tools out there to convert bitmap images to vector representations. Maybe you could call on them without reimplementing the algorithm?)
The most complex part would be tracing out the borders, I think.
Back in the early 90s I faced something like this problem at work, by the way. I muffed it: I came up with a (completely different) algorithm that would work on real-number coordinates, but seemed to run into a completely unfixable plethora of degenerate cases in the face of the realities of floating-point (and noisy input). Perhaps with the help of the internet I'd have done better!
I have no very simple solution, but here are the main steps for the real algorithm:
Do a custom doubly linked list for the polygon vertices and edges. Using std::list won't do, because you must swap next and previous pointers/offsets yourself for a special operation on the nodes. This is the only way to have simple code, and it will give good performance.
Find the intersection points by comparing each pair of edges. Note that comparing each pair of edges gives O(N²) time, but improving the algorithm to O(N·logN) will be easy afterwards. For a pair of edges (say a→b and c→d), the intersection point is found by using the parameter (from 0 to 1) on edge a→b, which is given by tₐ=d₀/(d₀-d₁), where d₀ is (a-c)×(d-c) and d₁ is (b-c)×(d-c). × is the 2D cross product such that p×q=pₓ·qᵧ-pᵧ·qₓ. After having found tₐ, the intersection point is obtained by using it as a linear interpolation parameter on segment a→b: P=a+tₐ(b-a). (See the small R illustration after this answer.)
Split each edge, adding vertices (and nodes in your linked list) where the segments intersect.
Then you must cross the nodes at the intersection points. This is the operation for which you needed the custom doubly linked list: you must swap some pairs of next pointers (and update the previous pointers accordingly).
Then you have the raw result of the polygon intersection resolving algorithm. Normally, you will want to select some region according to the winding number of each region. Search for polygon winding number for an explanation of this.
If you want to make an O(N·logN) algorithm out of this O(N²) one, you must do exactly the same thing except that you do it inside a line-sweep algorithm. Look for the Bentley–Ottmann algorithm. The inner algorithm will be the same, with the only difference being that you will have a reduced number of edges to compare inside the loop.
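To make the step-2 formula concrete, here is a small stand-alone illustration of my own (in R, to match the rest of this page) of the intersection parameter and the interpolation:
cross2 <- function(p, q) p[1] * q[2] - p[2] * q[1]   # 2D cross product p x q

seg_line_crossing <- function(a, b, c, d) {
  d0 <- cross2(a - c, d - c)
  d1 <- cross2(b - c, d - c)
  if (d0 == d1) return(NULL)      # a->b is parallel to c->d, no single crossing
  ta <- d0 / (d0 - d1)            # parameter of the crossing along a->b
  # for a true segment test, also require 0 <= ta <= 1 and the analogous
  # parameter on c->d to be in [0, 1]
  a + ta * (b - a)                # P = a + ta*(b - a)
}
seg_line_crossing(c(0, 0), c(4, 4), c(0, 4), c(4, 0))   # diagonals of a square: (2, 2)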
The way I worked on the same problem:
break the polygons into line segments
find intersecting lines using IntervalTrees or LineSweepAlgo
find a closed path with adjacent vertices using GrahamScanAlgo
cross-reference step 3 (the closed path) with DinicAlgo to dissolve them
Note: my scenario was different in that the polygons had a common vertex, but I hope this can help.
If you do not care about predictable run time you could try by first splitting your polygons into unions of convex polygons and then pairwise computing the intersection between the sub-polygons.
This would give you a collection of convex polygons such that their union is exactly the intersection of your starting polygons.
If the polygons are not aligned then they have to be aligned first. I would do this by finding the centre of each polygon (average in X, average in Y), then incrementally rotating the polygon by matrix transformation, projecting the points onto one of the axes, and using the angle of minimum standard deviation to align the shapes (you could also use principal components). For finding the intersection, a simple algorithm would be to define a grid of points. For each grid point, maintain a count of whether it lies inside one polygon, the other polygon, or both (there are simple and fast point-in-polygon tests, e.g. http://wiki.unity3d.com/index.php?title=PolyContainsPoint). Count the grid points inside both polygon1 and polygon2, divide by the number of points inside polygon1 or polygon2, and you have a rough estimate (depending on the grid sampling) of the proportion of polygon overlap. The intersection area is given by the points corresponding to the AND operation.
e.g.:
function get_polygon_intersection($arr, $user_array)
{
    $maxx = -999; // choose sensible limits for your application
    $maxy = -999;
    $minx = 999;
    $miny = 999;
    $intersection_count = 0;
    $not_intersected = 0;
    $sampling = 20;
    // find min, max values of polygon (min/max variables passed as reference)
    get_array_extent($arr, $maxx, $maxy, $minx, $miny);
    get_array_extent($user_array, $maxx, $maxy, $minx, $miny);
    // note the parentheses: the grid step is the extent divided by the sampling rate
    $inc_x = ($maxx - $minx) / $sampling;
    $inc_y = ($maxy - $miny) / $sampling;
    // see if x,y is within poly1 and poly2 and count
    for ($i = $minx; $i <= $maxx; $i += $inc_x)
    {
        for ($j = $miny; $j <= $maxy; $j += $inc_y)
        {
            $in_arr = pt_in_poly_array($arr, $i, $j);
            $in_user_arr = pt_in_poly_array($user_array, $i, $j);
            if ($in_arr && $in_user_arr)
            {
                $intersection_count++;
            }
            else
            {
                $not_intersected++;
            }
        }
    }
    // return score as percentage intersection
    return 100.0 * $intersection_count / ($not_intersected + $intersection_count);
}
This can be a huge approximation depending on your polygons, but here's one:
Compute the center of mass for each polygon.
Compute the min or max or average distance from each point of the polygon to the center of mass.
If C1C2 (where C1/C2 is the center of the first/second polygon) <= D1 + D2 (where D1/D2 is the distance you computed for the first/second polygon), then the two polygons "intersect".
This should be very efficient, though, as any transformation of the polygon applies in the very same way to the center of mass, and the center-node distances need to be computed only once.
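A tiny R sketch of that test (the max-distance variant, which only tells you the polygons may overlap; a FALSE means they definitely don't); poly1 and poly2 are assumed two-column vertex matrices, not anything defined above:
circles_overlap <- function(poly1, poly2) {
  c1 <- colMeans(poly1); c2 <- colMeans(poly2)      # vertex centroids as "centers of mass"
  r1 <- max(sqrt(rowSums(sweep(poly1, 2, c1)^2)))   # max center-to-vertex distance
  r2 <- max(sqrt(rowSums(sweep(poly2, 2, c2)^2)))
  sqrt(sum((c1 - c2)^2)) <= r1 + r2                 # C1C2 <= D1 + D2
}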
