Problem: Suppose you have a collection of points in the 2D plane. I want to know if this set of points sits on a regular grid (if they are a subset of a 2D lattice). I would like some ideas on how to do this.
For now, let's say I'm only interested in whether these points form an axis-aligned rectangular grid (that the underlying lattice is rectangular, aligned with the x and y axes), and that it is a complete rectangle (the subset of the lattice has a rectangular boundary with no holes). Any solutions must be quite efficient (better than O(N^2)), since N can be hundreds of thousands or millions.
Context: I wrote a 2D vector field plot generator which works for an arbitrarily sampled vector field. In the case that the sampling is on a regular grid, there are simpler/more efficient interpolation schemes for generating the plot, and I would like to know when I can use this special case. The special case is sufficiently better that it merits doing. The program is written in C.

This might be dumb, but if your points lie on a regular grid, wouldn't the peaks in the Fourier transform of the coordinates all be exact multiples of the grid resolution? You could do a separate Fourier transform of the X and Y coordinates. If there are no holes in the grid, then the FT would be a delta function, I think. An FFT is O(n log n).
I'm not quite sure if this is what you are after, but a collection of 2D points on a plane can always be fitted to a rectangular grid (down to the precision of your points, anyway). The problem is that the grid they fit may be too sparsely populated by the points to provide any benefit to your algorithm.
To find a rectangular grid that fits a set of points, you essentially need to find the GCD of all the x coordinates and the GCD of all the y coordinates, with the origin at (xmin, ymin). This should be O(n (log n)^2), I think.
How you decide whether this grid is then too sparse is not clear, however.
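A minimal sketch of the GCD idea above for one axis, assuming the coordinates have already been scaled to integers (for example, fixed-point at the precision of the data). The names gcd_long and grid_spacing are made up for illustration.

```c
#include <stddef.h>

/* Euclid's algorithm on non-negative integers; gcd_long(0, x) == x. */
static long gcd_long(long a, long b)
{
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Largest spacing s such that every v[i] - min(v) is a multiple of s.
 * Assumption: coordinates are integers. Returns 0 when all values are
 * equal or when n == 0. */
long grid_spacing(const long *v, size_t n)
{
    if (n == 0) return 0;
    long lo = v[0];
    long g = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] < lo) lo = v[i];
    for (size_t i = 0; i < n; i++)
        g = gcd_long(g, v[i] - lo);
    return g;
}
```

Run it once on the x coordinates and once on the y coordinates; the two results are the candidate grid spacings along each axis.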

If the points all come only from intersections on the grid, then the Hough transform of your set of points might help you. If you find that two mutually perpendicular sets of lines occur most often (meaning you find peaks at four values of theta, all 90 degrees apart) and you find repeating peaks in gamma space, then you have a grid. Otherwise not.

Here's a solution that works in O(ND log N), where N is the number of points and D is the number of dimensions (2 in your case).
Allocate D arrays with space for N numbers: X, Y, Z, etc. (Time: O(ND))
Iterate through your point list and add the x-coordinate to list X, the y-coordinate to list Y, etc. (Time: O(ND))
Sort each of the new lists. (Time: O(ND log N))
Count the number of unique values in each list and check whether the difference between successive unique values is the same across the whole list. (Time: O(ND))
If
the unique values in each dimension are equally spaced, and
the product of the number of unique values of each coordinate is equal to the number of original points (length(uniq(X))*length(uniq(Y))* ... == N),
then the points are in a regular rectangular grid (assuming the input contains no duplicate points).
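The steps above could be sketched in C roughly as follows for the 2D case. This is only a sketch under a few assumptions that are not in the answer: coordinates are doubles, two values closer than tol count as equal, and the input has no duplicate points. The names equally_spaced and is_full_grid are made up.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts v in place, collapses values closer than tol into one, stores the
 * number of unique values in *count, and returns 1 if the unique values
 * are equally spaced (within tol), 0 otherwise. Requires n >= 1. */
static int equally_spaced(double *v, size_t n, double tol, size_t *count)
{
    size_t u = 1;
    qsort(v, n, sizeof *v, cmp_double);
    for (size_t i = 1; i < n; i++)
        if (v[i] - v[u - 1] > tol)
            v[u++] = v[i];
    *count = u;
    if (u < 3) return 1;
    double step = (v[u - 1] - v[0]) / (double)(u - 1);
    for (size_t i = 1; i < u; i++)
        if (fabs(v[i] - (v[0] + step * (double)i)) > tol)
            return 0;
    return 1;
}

/* Returns 1 if the n points (xs[i], ys[i]) form a complete axis-aligned
 * rectangular grid, 0 otherwise. Assumes no duplicate points. */
int is_full_grid(const double *xs, const double *ys, size_t n, double tol)
{
    if (n == 0) return 0;
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    size_t nx = 0, ny = 0;
    int ok = 0;
    if (x && y) {
        memcpy(x, xs, n * sizeof *x);
        memcpy(y, ys, n * sizeof *y);
        ok = equally_spaced(x, n, tol, &nx)
          && equally_spaced(y, n, tol, &ny)
          && nx * ny == n;
    }
    free(x);
    free(y);
    return ok;
}
```

The two sorts dominate the cost, so the whole check stays at O(N log N).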

Let's say a grid is defined by an orientation Or (between 0 and 90 degrees) and a resolution Res. You could compute a cost function that evaluates how well a grid (Or, Res) fits your points. For example, you could compute the average distance from each point to its closest grid point.
Your problem is then to find the (Or, Res) pair that minimizes the cost function. To narrow the search space, a heuristic that picks out "good" candidate grids could be used.
This approach is the same as the one used in the Hough transform proposed by jilles. The (Or, Res) space is comparable to the Hough's gamma space.


What practical algorithms exist for simplifying sampled multidimensional curves

For 2-dimensional sampled curves (an array of 2D points) there exists the Ramer-Douglas-Peucker algorithm, which only keeps "important" points. It works by calculating the perpendicular distance of each point (or sample) to the line that connects the first and the last point of the array. If the maximum distance is larger than a value epsilon, the point is kept and the array is split into 2 parts at that point. For both parts the operation is repeated (find the maximal perpendicular distance, check whether it is larger than epsilon, etc.). The smaller epsilon is, the more detail is kept.
I am trying to write a function that can also do this for arrays of higher-dimensional points. But I am unsure how to define distance, or whether this is actually a good idea.
I guess there exist lots of complicated and elegant algorithms that fit the curves to beziers and NURBS and what not. But are there also relatively simple ones?
I would prefer not to use beziers, but simply to identify "important" N-dimensional points.
You could extend your 2D algorithm using algebra and the L2 norm. Let's say you want to calculate the distance from a point X to a line segment PQ (where X, P and Q are defined as N-dimensional vectors).
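First you can calculate the vector "proj", the projection of PX onto PQ (with PX = X - P and PQ = Q - P), as:
proj = PQ * dot(PX, PQ) / dot(PQ, PQ)
Then, the distance is the modulus (length) of the vector V = PX - proj.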
For this calculation you only need the dot product between vectors, and that is well defined for N-dimensional spaces.
Using this approach I have successfully used the Ramer-Douglas-Peucker algorithm in 3D.

Given a set of points with x, y and z coordinates whose bounds are 0 to 1 (inclusive), determine if they're all uniformly distributed (or close to)

I'm trying to determine whether a set of points are uniformly distributed in a 1 x 1 x 1 cube. Each point comes with an x, y, and z coordinate that corresponds to their location in the cube.
A trivial way that I can think of is to flatten the set of points into 2 graphs and check how normally distributed both are; however, I do not know whether that's a correct way of doing it.
Does anyone else have any ideas?
I would compute a point density map and then check for anomalies in it:
Let's assume we have N points to test. If the points are uniformly distributed, then they should form a "uniform grid" of m*m*m points:
m * m * m = N
m = N^(1/3)
To account for deviations from the uniform grid and to assess statistics, you need to divide your cube into a grid of cubes where each cube holds several points (so that statistical properties can be computed). Let's assume k>=5 points per grid cube, so:
cubes = m/k
create a 3D array of counters
We simply need an integer counter for each grid cube, so:
int map[cubes][cubes][cubes];
Fill it with zeroes.
process all points p(x,y,z) and update map[][][]
Simply loop through all of your points, compute the grid cube each one belongs to, and increment that cube's counter.
compute the average count of map[][][]
A simple average like this will do:
double avg=0.0;
for (xx=0;xx<cubes;xx++)
 for (yy=0;yy<cubes;yy++)
  for (zz=0;zz<cubes;zz++)
   avg+=map[xx][yy][zz];
avg/=cubes*cubes*cubes; // average count per grid cube
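now just compute the absolute distance to this average
double d=0.0;
for (xx=0;xx<cubes;xx++)
 for (yy=0;yy<cubes;yy++)
  for (zz=0;zz<cubes;zz++)
   d+=fabs(map[xx][yy][zz]-avg);
d/=cubes*cubes*cubes; // mean absolute deviation per grid cube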
d will hold a metric telling how far the points are from uniform density, where 0 means a uniform distribution. So just threshold it. Note that d also depends on the number of points, and my intuition tells me that d>=k means totally non-uniform, so if you want to make it more robust you can do something like this (the threshold might need tweaking):
if (d<0.25) uniform;
else nonuniform;
As you can see, all of this is O(N) time, so it should be fast enough for you. If it isn't, you can evaluate only every 10th point by skipping points; however, that can be done only if the order of the points is random. If not, you would need to pick N/10 random points instead. The 10 might be any constant, but keep in mind that you still need enough points for the statistics to represent your set, so I would not go below 250 points (but that depends on what exactly you need).
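Putting the density map steps together, a self-contained C sketch might look like this. A few things here are assumptions rather than part of the answer: the points are stored as a flat array of x,y,z triples, the number of grid cubes per axis is passed in by the caller, and the result is divided by the average count so that it does not scale with the number of points. The name density_deviation is made up.

```c
#include <math.h>
#include <stdlib.h>

/* Mean absolute deviation of the per-cube counts from their average,
 * divided by that average (this normalisation is an assumption).
 * pts holds n points as x,y,z triples with coordinates in [0,1].
 * Returns 0 for perfectly even counts, larger values for less uniform
 * sets, and a negative value on bad arguments or allocation failure. */
double density_deviation(const double *pts, size_t n, int cubes)
{
    if (n == 0 || cubes < 1) return -1.0;
    size_t total = (size_t)cubes * (size_t)cubes * (size_t)cubes;
    unsigned *map = calloc(total, sizeof *map);   /* zero-filled counters */
    if (!map) return -1.0;

    for (size_t i = 0; i < n; i++) {
        int c[3];
        for (int a = 0; a < 3; a++) {
            c[a] = (int)(pts[3 * i + a] * cubes);
            if (c[a] >= cubes) c[a] = cubes - 1;  /* coordinate exactly 1.0 */
            if (c[a] < 0) c[a] = 0;
        }
        map[((size_t)c[0] * cubes + c[1]) * cubes + c[2]]++;
    }

    double avg = (double)n / (double)total;       /* average count per cube */
    double d = 0.0;
    for (size_t j = 0; j < total; j++)
        d += fabs((double)map[j] - avg);
    free(map);
    return d / ((double)total * avg);
}
```

With this normalisation a threshold such as the 0.25 mentioned above is compared against a value that is relative to the expected count per cube; whether that particular cutoff suits your data still needs tweaking.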
inverse interpolation of multidimensional grids

I am working on a project that interpolates sample data {(x_i,y_i)}, where the input x_i lies in 4D space and the output y_i lies in 3D space. I need to generate two look-up tables, one for each direction. I managed to generate the 4D -> 3D table, but the 3D -> 4D one is tricky. The sample data are not on regular grid points, and the mapping is not one to one. Is there any known method to handle this situation? I did some searching online, but what I found only covers 3D -> 3D mappings, which are not suitable for this case. Thank you!
To answer the questions of Spektre:
1. X(3D) -> Y(4D) is the case 1 X -> n Y.
2. I want to generate a table such that for any given X we can find the value of Y. The sample data do not cover the whole domain of X, but that's fine; we only need accuracy for points inside the domain of the sample data. For example, we have sample data like {(x1,x2,x3) -> (y1,y2,y3,y4)}. It is possible that we also have sample data {(x1,x2,x3) -> (y1_1,y2_1,y3_1,y4_1)}, but that is OK. We need a table in which any (a,b,c) in space X corresponds to ONE (e,f,g,h) in space Y. There might be more than one choice, but we only need one. (Sorry if the symbols are confusing.)
3. One possible way to deal with this: since I have already established a smooth mapping from Y -> X, I can use Newton's method or any other method to search in reverse for the point y for any given x. But it is not accurate enough, and it is time consuming, because I need to do the search for each point in the table, and the error is the sum of the model error and the search error.
4. So I want to know whether it is possible to find a mapping that interpolates the sample data directly, instead of doing the kind of search described in 3.
You are looking for projections/mappings
As you mentioned, you have a projection X(3D) -> Y(4D) which is not one to one in your case, so which case is it: (1 X -> n Y), (n X -> 1 Y) or (n X -> m Y)?
you want to use a look-up table
I assume you just want to generate all X for a given Y. The problem with non-(1 to 1) mappings is that you can use a look-up table only if it has
all valid points,
or the mapping has some geometric or mathematical symmetry (for example, the distance between points in X and Y space is similar, and the mapping is continuous).
You cannot interpolate between generically mapped points, so the question is what kind of mapping/projection you have in mind.
First, the 1->1 projections/mappings interpolation
if your X->Y projection mapping is suitable for interpolation
then for 3D->4D use tri-linear interpolation. Find the closest 8 points (each along its axis, to form a grid hypercube) and interpolate between them in all 4 dimensions.
if your X<-Y projection mapping is suitable for interpolation
then for 4D->3D use quatro-linear interpolation. Find the closest 16 points (each along its axis, to form a grid hypercube) and interpolate between them in all 3 dimensions.
Now what about 1->n or n->m projections/mappings
That depends solely on the projection/mapping properties, which I know nothing about. Try to provide an example of your datasets; adding an image would be best.
[edit1] 1 X <- n Y
I would still use quatro-linear interpolation. You will still need to search your Y table, but if you organize it as a 4D grid, that should be easy enough.
find the 16 closest points in the Y-table to your input Y point
These points should be the closest points to your Y in each +/- direction along all axes. In 3D it looks like this:
the red point is your input Y point
the blue points are the closest points found (the grid); they do not need to be as symmetric as in the image.
Please do not ask me to draw a 4D example that makes sense :) (at least for a sober mind)
find the corresponding X points. If there is more than one per point, choose the one closest to the others ... Now you should have 16 X points and 16+1 Y points. Then, from the Y points, you just need to calculate the distance along the lines from your input Y point. These distances are used as parameters for the linear interpolations. Normalize them to <0,1>, where
0 means the 'left' point and 1 means the 'right' point
0.5 means the exact middle
You will need this scalar distance in each dimension of the Y-domain. Now just compute all the X points along the linear interpolations until you get the point in the X-domain corresponding to the red point.
With tri-linear interpolation (3D) there are 4+2+1=7 linear interpolations (as in the image). For quatro-linear interpolation (4D) there are 8+4+2+1=15 linear interpolations.
linear interpolation
X = X0 + (X1-X0)*t
X is the interpolated point
X0,X1 are the 'left' and 'right' points
t is the distance parameter <0,1>
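The chain of linear interpolations above can be sketched in C as below. This is an illustration under assumptions of my own: the 2^D corner values are stored so that bit a of the array index says whether the corner is the 'left' (0) or 'right' (1) one along axis a, and the function handles one scalar component at a time, so for a 3D output X you would call it once per component. The name multilinear is made up.

```c
#include <stddef.h>

/* Multilinear interpolation inside one D-dimensional grid cell.
 * corners: 2^dims values; bit a of the index selects the 'left' (0) or
 *          'right' (1) corner along axis a. Used as scratch space and
 *          overwritten.
 * t:       dims parameters in <0,1>, one per axis.
 * Performs 2^(dims-1) + ... + 2 + 1 linear interpolations in total,
 * i.e. 7 for dims == 3 and 15 for dims == 4. */
double multilinear(double *corners, const double *t, int dims)
{
    size_t count = (size_t)1 << dims;
    for (int a = dims - 1; a >= 0; a--) {
        count >>= 1;                      /* collapse axis a */
        for (size_t i = 0; i < count; i++)
            corners[i] = corners[i] + (corners[i + count] - corners[i]) * t[a];
    }
    return corners[0];
}
```

Each pass halves the number of remaining values, so after the last pass corners[0] holds the interpolated value.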

Finding the coordinates of points from distance matrix

I have a set of points (with unknown coordinates) and the distance matrix. I need to find the coordinates of these points in order to plot them and show the solution of my algorithm.
I can place one of these points at the coordinate (0,0) to simplify, and then find the others. Can anyone tell me whether it's possible to find the coordinates of the other points and, if so, how?
Thanks in advance!
I forgot to say that I need the coordinates in x-y only.
The answers based on angles are cumbersome to implement and can't be easily generalized to data in higher dimensions. A better approach is that mentioned in my and WimC's answers here: given the distance matrix D(i, j), define
M(i, j) = 0.5*(D(1, j)^2 + D(i, 1)^2 - D(i, j)^2)
which should be a positive semi-definite matrix with rank equal to the minimal Euclidean dimension k in which the points can be embedded. The coordinates of the points can then be obtained from the k eigenvectors v(i) of M corresponding to non-zero eigenvalues q(i): place the vectors sqrt(q(i))*v(i) as columns in an n x k matrix X; then each row of X is a point. In other words, sqrt(q(i))*v(i) gives the ith component of all of the points.
The eigenvalues and eigenvectors of a matrix can be obtained easily in most programming languages (e.g., using GSL in C/C++, using the built-in function eig in Matlab, using Numpy in Python, etc.)
Note that this particular method always places the first point at the origin, but any rotation, reflection, or translation of the points will also satisfy the original distance matrix.
Step 1: arbitrarily assign one point P1 as (0,0).
Step 2: arbitrarily assign one point P2 along the positive x axis, at (Dp1p2, 0).
Step 3: find a point P3 such that (reading ~= as "not equal")
Dp1p2 ~= Dp1p3+Dp2p3
Dp1p3 ~= Dp1p2+Dp2p3
Dp2p3 ~= Dp1p3+Dp1p2
and set that point in the "positive" y domain (if one of these turns out to be an equality instead, the point lies on the P1P2 axis).
Use the cosine law to determine its position:
cos (A) = (Dp1p2^2 + Dp1p3^2 - Dp2p3^2)/(2*Dp1p2* Dp1p3)
P3 = (Dp1p3 * cos (A), Dp1p3 * sin(A))
You have now successfully built an orthonormal space and placed three points in that space.
Step 4: To determine all the other points, repeat step 3, which gives you a tentative y coordinate:
(Xn, Yn).
Compare the distance {(Xn, Yn), (X3, Y3)} to Dp3pn in your matrix. If they are identical, you have successfully identified the coordinates of point n. Otherwise, point n is at (Xn, -Yn).
Note that there is an alternative to step 4, but it is too much math for a Saturday afternoon.
If for points p, q, and r you have pq, qr, and rp in your matrix, you have a triangle.
Wherever you have a triangle in your matrix you can compute one of two solutions for that triangle (independent of a Euclidean transform of the triangle on the plane). That is, for each triangle you compute, its mirror image is also a triangle that satisfies the distance constraints on p, q, and r. The fact that there are two solutions even for a triangle leads to the chirality problem: you have to choose the chirality (orientation) of each triangle, and not all choices may lead to a feasible solution to the problem.
Nevertheless, I have some suggestions. If the number of entries is small, consider using simulated annealing. You could incorporate chirality into the annealing step. This will be slow for large systems, and it may not converge to a perfect solution, but for some problems it's the best you can do.
The second suggestion will not give you a perfect solution, but it will distribute the error: the method of least squares. In your case the objective function will be the error between the distances in your matrix and the actual distances between your points.
This is a math problem: deriving the coordinate matrix X given only its distance matrix.
However, there is an efficient solution to this: multidimensional scaling (MDS), which relies on some linear algebra. Simply put, it takes a pairwise Euclidean distance matrix D, and its output is an estimated coordinate matrix Y (perhaps rotated), which is an approximation of X. For programming purposes, just use sklearn.manifold.MDS in Python.
The "eigenvector" method given in the favourite replies above is very general and automatically outputs a set of coordinates, as the OP requested. However, I noticed that the algorithm does not even ask for a desired orientation (rotation angle) for the frame of the output points; the algorithm chooses that orientation all by itself!
People who use it might want to know beforehand at what angle the frame will be tipped, so I found an equation which gives the answer for the case of up to three input points. However, I have not had time to generalize it to n points, and I hope someone will do that and add it to this discussion. Here are the three angles the output sides will form with the x-axis as a function of the input side lengths:
angle side a = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2-b^2)^2)/(a^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
angle side b = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2+b^2-a^2)^2)/(4*b^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
angle side c = arcsin(sqrt(((c+b+a)*(c+b-a)*(c-b+a)*(-c+b+a)*(c^2+b^2-a^2)^2)/(4*c^4*((c^2+b^2-a^2)^2+(c^2-b^2)^2))))*180/Pi/2
Those equations also lead directly to a solution to the OP's problem of finding the coordinates for each point because: the side lengths are already given from the OP as the input, and my equations give the slope of each side versus the x-axis of the solution, thus revealing the vector for each side of the polygon answer, and summing those sides through vector addition up to a desired vertex will produce the coordinate of that vertex. So if anyone can extend my angle equations to handling beyond three input lengths (but I note: that might be impossible?), it might be a very fast way to the general solution of the OP's question, since slow parts of the algorithms that people gave above like "least square fitting" or "matrix equation solving" might be avoidable.

Computational geometry, tetrahedron signed volume

I'm not sure if this is the right place to ask, but here goes...
Short version: I'm trying to compute the orientation of a triangle on a plane, formed by the intersection of 3 edges, without explicitly computing the intersection points.
Long version: I need to triangulate a PSLG on a triangle in 3D. The vertices of the PSLG are defined by the intersections of line segments with the plane through the triangle, and are guaranteed to lie within the triangle. Assuming I had the intersection points, I could project to 2D and use a point-line-side (or triangle signed area) test to determine the orientation of a triangle between any 3 intersection points.
The problem is I can't explicitly compute the intersection points because of the floating-point error that accumulates when I find the line-plane intersection. To figure out if the line segments strike the triangle in the first place, I'm using some freely available robust geometric predicates, which give the sign of the volume of a tetrahedron, or equivalently which side of a plane a point lies on. I can determine if the line segment endpoints are on opposite sides of the plane through the triangle, then form tetrahedra between the line segment and each edge of the triangle to determine whether the intersection point lies within the triangle.
Since I can't explicitly compute the intersection points, I'm wondering if there is a way to express the same 2D orient calculation in 3D using only the original points. If there are 3 edges striking the triangle, that gives me 9 points in total to play with. Assuming what I'm asking is even possible (using only the 3D orient tests), then I'm guessing that I'll need to form some subset of all the possible tetrahedra between those 9 points. I'm having difficulty even visualizing this, let alone distilling it into a formula or code. I can't even google this because I don't know what the industry standard terminology might be for this type of problem.
Any ideas how to proceed with this? Thanks. Perhaps I should ask MathOverflow as well...
EDIT: After reading some of the comments, one thing that occurs to me... Perhaps if I could fit non-overlapping tetrahedra between the 3 line segments, then the orientation of any one of those that crossed the plane would be the answer I'm looking for. Other than when the edges enclose a simple triangular prism, I'm not sure this sub-problem is solvable either.
EDIT: The requested image.
I am answering this on both MO & SO, expanding the comments I made on MO.
My sense is that no computational trick with signed tetrahedra volumes will avoid the precision issues that are your main concern. This is because, if you have tightly twisted segments, the orientation of the triangle depends on the precise positioning of the cutting plane.
[image removed; see below]
In the above example, the upper plane crosses the segments in the order (a,b,c) [ccw from above]: (red,blue,green), while the lower plane crosses in the reverse order (c,b,a): (green,blue,red). The height of the cutting plane could be determined by your last bit of precision.
Consequently, I think it makes sense to just go ahead and compute the points of intersection in the cutting plane, using enough precision to make the computation exact. If your segment endpoint coordinates and plane coefficients have L bits of precision, then there is just a small constant-factor increase needed. Although I am not certain of precisely what that factor is, it is small, perhaps 4. You will not need, e.g., L^2 bits, because the computation is solving linear equations. So there will not be an explosion in the precision required to compute this exactly.
Good luck!
(I was prevented from posting the clarifying image because I don't have the reputation. See the MO answer instead.)
Edit: Do see the MO answer, but here's the image:
I would write symbolic vector equations, you know, with dot and cross products, to find the normal of the intersection triangle. Then, the sign of the dot product of this normal with the initial triangle one gives the orientation. So finally you can express this in a form sign(F(p1,...,p9)), where p1 to p9 are your points and F() is an ugly formula including dot and cross products of differences (pi-pj). Don't know if this can be done simpler, but this general approach does the job.
As I understand it, you have three lines intersecting the plane, and you want to calculate the orientation of the triangle formed by the intersection points, without calculating the intersection points themselves?
If so: you have a plane
N·(x - x0) = 0
and six points...
l1a, l1b, l2a, l2b, l3a, l3b
...forming three lines
l1 = l1a + t(l1b - l1a)
l2 = l2a + u(l2b - l2a)
l3 = l3a + v(l3b - l3a)
The intersection points of these lines to the plane occur at specific values of t, u, v, which I'll call ti, ui, vi
N·(l1a + ti(l1b - l1a) - x0) = 0
ti = [N·(x0 - l1a)] / [N·(l1b - l1a)]
(similarly for ui, vi)
Then the specific points of intersection are
intersect1 = l1a + ti(l1b - l1a)
intersect2 = l2a + ui(l2b - l2a)
intersect3 = l3a + vi(l3b - l3a)
Finally, the orientation of your triangle is
orientation = direction of (intersect2 - intersect1)x(intersect3 - intersect1)
(x is cross-product) Work backwards plugging the values, and you'll have an equation for orientation based only on N, x0, and your six points.
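Written out directly rather than substituted symbolically, the computation above might look like this in C. Note that this is plain double-precision arithmetic, so it has exactly the rounding problem the question is worried about; it only illustrates the formula. The Vec3 type and all function names are made up for the sketch.

```c
#include <stddef.h>

typedef struct { double x, y, z; } Vec3;

static Vec3 vsub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static double vdot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 vcross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Plane: N.(x - x0) = 0. Line k runs from la[k] to lb[k].
 * Returns +1 if the three intersection points wind counter-clockwise
 * when seen from the side N points to, -1 if clockwise, and 0 if a line
 * is parallel to the plane or the points are collinear.
 * Not robust: ordinary floating-point, no exact predicates. */
int intersection_orientation(Vec3 N, Vec3 x0, const Vec3 la[3], const Vec3 lb[3])
{
    Vec3 p[3];
    for (size_t k = 0; k < 3; k++) {
        Vec3 dir = vsub(lb[k], la[k]);
        double denom = vdot(N, dir);
        if (denom == 0.0) return 0;               /* parallel to plane */
        double t = vdot(N, vsub(x0, la[k])) / denom;
        p[k].x = la[k].x + t * dir.x;
        p[k].y = la[k].y + t * dir.y;
        p[k].z = la[k].z + t * dir.z;
    }
    double s = vdot(N, vcross(vsub(p[1], p[0]), vsub(p[2], p[0])));
    return (s > 0.0) - (s < 0.0);
}
```

Swapping any two of the three lines flips the sign of the result, which is a quick sanity check on an implementation.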
Let's call your triangle vertices T[0], T[1], T[2], and the first line segment's endpoints are L[0] and L[1], the second is L[2] and L[3], and the third is L[4] and L[5]. I imagine you want a function
int Orient(Pt3 T[3], Pt3 L[6]); // index L by L[2*i+j], i=0..2, j=0..1
which returns 1 if the intersections have the same orientation as the triangle, and -1 otherwise.
The result should be symmetric under interchange of j values, antisymmetric under interchange of i values and T indices. As long as you can compute a quantity with these symmetries, that's all you need.
Let's try
Sign(Product( Orient3D(T[i],T[i+1],L[2*i+0],L[2*i+1]) * -Orient3D(T[i],T[i+1],L[2*i+1],L[2*i+0]), i=0..2 ))
where the product should be taken over cyclic permutations of the indices (modulo 3). I believe this has all the symmetry properties required. Orient3D is Shewchuk's 4-point plane orientation test, which I assume you're using.
