I need to define a misfit function that describes how well two groups of curves fit each other. The curves are recorded as arrays of points. Does anyone have an idea, or can you give me some hints?
The two groups of curves look like the ones below; they may be very different, or sometimes they're the same. If every part of the first group lies on the second, I would call the fit perfect. They share the same x-axis and y-axis.
The first group is real-world data; the second is synthetic data produced by forward modeling. I need to define a misfit/fitness function to minimize or maximize in order to perform an inversion. The second group is actually not in its original format: the synthetic data is a 2D array with a value at each (x, y), and I pick the maxima to get the points shown below.
Appendix:
I have a set of real-world data, which has a format like this:
x y
1.1 1.2
3.1 2.3
...
I plot it and get some curves, as in the first figure.
These are the data. To perform an inversion, I run a forward simulation to get synthetics to compare with the data. The synthetics I get are in this format:
x y value
1. 3.4
2. 1.2
3. 5.6
4. 1.2
...
1. -1.3
2. 6.7
...
The second figure is the result of picking the maximum value. Of course, it would be better to use the raw synthetics.
The inversion needs a misfit/fitness function, and I have no idea how to define it. If each side were just a set of points forming a single curve, I could interpolate one curve onto the other and calculate the Euclidean distance. However, each side is now a group of curves, and that makes the misfit hard to define.
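For reference, the single-curve version I describe (interpolate, then take the Euclidean distance) can be sketched like this with numpy; the arrays below are made-up illustrations, not my actual data:

```python
import numpy as np

# Hypothetical single-curve misfit: resample the synthetic curve at the
# data's x locations, then take the Euclidean (L2) distance.
x_data = np.array([1.1, 3.1, 4.0])
y_data = np.array([1.2, 2.3, 3.0])
x_syn = np.array([1.0, 2.0, 3.0, 4.0])
y_syn = np.array([3.4, 1.2, 5.6, 1.2])

y_syn_on_data = np.interp(x_data, x_syn, y_syn)  # linear interpolation
misfit = np.linalg.norm(y_data - y_syn_on_data)
```

Minimizing this misfit over the model parameters would drive the synthetic curve toward the data; the open question is how to generalize it to a group of curves.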
Off the top of my head, I would use the result of some sort of Iterative Closest Point (ICP) algorithm to characterize the distance between the two point clouds (informally, the distance would be the size of the transformation that gives you the best fit). Basically, you assume that the red and blue points lie in the same plane. Then,
1- You associate each blue point to its closest red point.
2- Find the translation/rotation that minimizes the sum of the distances between each blue point and its associated red point (this is a simple minimization; you could use Levenberg-Marquardt, though that seems overkill, and Gauss-Newton should do it). That is, you will be solving
argmin_{R,T} sum_i ||b_i - (R*r_i + T)||^2
with (b_i, r_i) a pair of blue/red matches obtained at step 1. R is the 2D rotation matrix, and T = [t_x t_y]' is a 2D translation.
3- Iterate through steps 1 and 2 until convergence.
This gives you a vector in R^3, which is of the form
transformation=[angle translation_x translation_y]
Now, you can use norm(transformation) as a rough measure of how well the two curves fit each other. You should probably be careful to restrict the estimated angle to an interval like [0, pi], though.
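A minimal numpy sketch of this iteration is below. It solves step 2 in closed form with the Kabsch/SVD method rather than Gauss-Newton (the 2D rigid case has a direct solution); the array shapes and function name are my own assumptions:

```python
import numpy as np

def icp_2d(blue, red, iters=20):
    """Toy 2D ICP: returns (angle, tx, ty) aligning blue onto red.
    blue: (N, 2) array, red: (M, 2) array of points (hypothetical inputs)."""
    R = np.eye(2)
    T = np.zeros(2)
    for _ in range(iters):
        moved = blue @ R.T + T
        # Step 1: associate each blue point with its closest red point.
        d = np.linalg.norm(moved[:, None, :] - red[None, :, :], axis=2)
        match = red[np.argmin(d, axis=1)]
        # Step 2: closed-form least-squares rigid transform (Kabsch/SVD).
        bc, mc = blue.mean(0), match.mean(0)
        H = (blue - bc).T @ (match - mc)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T          # proper rotation (no reflection)
        T = mc - R @ bc
    angle = np.arctan2(R[1, 0], R[0, 0])
    return angle, T[0], T[1]
```

The returned (angle, tx, ty) vector is the "transformation" whose norm can serve as the misfit; with more points, a k-d tree would speed up the nearest-neighbour step.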
Say there are 3 objects defined by rectangles in x-y coordinates. The rectangles can be of any orientation (not necessarily parallel to the axes).
How would you go about approaching the problem of determining whether object C is partially, fully, or not at all obscured by object B from the perspective of object A (object A can see from anywhere on its rectangle)?
Second question: is it possible to determine the percentage of object C that is visible?
Here is my (completely untested) approach.
Consider first the same problem but only looking from one fixed point P.
Find the two (infinite) lines that go through P and enclose your rectangle B. Since it is a rectangle these two lines will be two of the four lines that go through P and each vertex of B.
Check whether each vertex of C is between these two lines or not. If any vertex of C is between the two lines, check whether it is closer to or farther from P than B is. If it is farther, then B is at least partially obscuring C.
Now do this for each vertex of A. You may get more complicated results if you see a part of C from one point P in A and a different part of C from a different point P in A. I will leave it up to you how to deal with that.
To determine the percent coverage, compute the shape that you get from intersecting rectangle C with these two lines, compute its area, then divide by the total area of C.
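The "between the two lines" test can be done with cross-product signs; here is a small sketch (the wedge checked is the one on B's side of P, and the function names are my own):

```python
def side(p, a, b):
    """Sign of the cross product (b - a) x (p - a): which side of line ab p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def in_wedge(c, p, v1, v2):
    """True if c lies in the wedge at p bounded by the lines through v1 and v2.
    c must be on the same side of line(p, v1) as v2, and on the same side
    of line(p, v2) as v1 (>= 0 counts points exactly on a boundary line)."""
    return (side(c, p, v1) * side(v2, p, v1) >= 0 and
            side(c, p, v2) * side(v1, p, v2) >= 0)
```

Using integer or rational coordinates keeps these sign tests exact, which sidesteps floating-point trouble on near-boundary vertices.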
I have left all the math in this up to you to figure out, but if you have any specific questions about your work, feel free to ask those.
This answer would benefit nicely from having some pictures added, but I'm hoping you can understand this answer while drawing your own diagrams according to the steps provided.
I am trying to solve a programming interview question that requires one to find the maximum number of points that lie on the same straight line in a 2D plane. I have looked up the solutions on the web. All of them discuss an O(N^2) solution using hashing, such as the one at this link: Here
I understand the part where a common gradient is used to check for collinear points, since that is a common mathematical technique. However, the solutions point out that one must beware of vertical lines and overlapping points. I am not sure how these cases can cause problems. Can't I just store the gradient of a vertical line as infinity (a large number)?
Hint:
Three distinct points are collinear if
x_1*(y_2-y_3)+x_2*(y_3-y_1)+x_3*(y_1-y_2) = 0
No need to check for slopes or anything else. You do need to eliminate duplicate points from the set before the search begins, though.
So pick a pair of points, find all other points that are collinear with them, and store them as a line in a list of lines. Do the same for the remaining points, and then compare which lines have the most points.
The first time around you have n-2 tests. The second time around you have n-4 tests, because there is no point in revisiting the first two points. Next time n-6, etc., for a total of about n/2 rounds. In the worst case this results in (n/2)*(n/2-1) operations, which is O(n^2) complexity.
PS. Whoever decided the canonical answer is using slopes knows very little about planar geometry. People invented homogeneous coordinates for points and lines in a plane for the exact reason of having to represent vertical lines and other degenerate situations.
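A direct Python sketch of the determinant test above; for clarity this version checks every pair against all points, which is O(n^3) overall, whereas the pair-skipping described above cuts the work down:

```python
def collinear(p1, p2, p3):
    """Determinant test: zero iff the three points are collinear.
    Works unchanged for vertical lines (no slopes involved)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2) == 0

def max_points_on_line(points):
    pts = list(dict.fromkeys(points))   # eliminate duplicate points first
    if len(pts) <= 2:
        return len(pts)
    best = 0
    for i, a in enumerate(pts):
        for b in pts[i + 1:]:
            # a and b are trivially collinear with themselves, so this
            # count already includes the pair itself.
            count = sum(collinear(a, b, q) for q in pts)
            best = max(best, count)
    return best
```

With integer coordinates the test is exact; with floats you would compare the determinant against a small tolerance instead of zero.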
So I am using Kinect with Unity.
With the Kinect, we detect a hand gesture, and while it is active we draw a line on the screen that follows wherever the hand goes. On every update, the current location is stored as the newest (and last) point in the line. However, the lines can often look very choppy.
Here is a general picture that shows what I want to achieve:
With the red being the original line, and the purple being the new smoothed line. If the user suddenly stops and changes direction, we don't want the line to do exactly that; instead, it should make a rapid turn or a loop.
My current solution uses cubic Bézier curves, keeping only points that are at least distance X apart (with Y intermediate points placed between each pair using the cubic Bézier). However, there are two problems with this, among others:
1) It often doesn't preserve the full extent of the curves the user drew. For example, if the user suddenly stops a line and reverses direction, there is a pretty good chance the smoothed line won't extend to the point where the user reversed direction.
2) There is also a chance that the selected "good" point is actually a "bad" random jump point.
So I've thought about other solutions. One involves limiting the maximum angle between points (with 0 degrees being a straight line). However, if a point's angle exceeds the limit, the math behind reducing the angle while still following the drawn line as closely as possible seems complicated. But maybe it's not. Either way, I'm not sure what to do and am looking for help.
Keep in mind this needs to be done in real time as the user is drawing the line.
You can try the Ramer-Douglas-Peucker algorithm to simplify your curve:
https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm
It's a simple algorithm, and parameterization is reasonably intuitive. You may use it as a preprocessing step or maybe after one or more other algorithms. In any case it's a good algorithm to have in your toolbox.
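For reference, a compact recursive sketch of the algorithm (the point format and the epsilon value are assumptions):

```python
import math

def rdp(points, eps):
    """Ramer-Douglas-Peucker simplification of a polyline of (x, y) tuples."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]

    def dist(p):
        # Perpendicular distance from p to the chord a-b.
        if a == b:
            return math.hypot(p[0] - a[0], p[1] - a[1])
        num = abs((b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1]))
        return num / math.hypot(b[0] - a[0], b[1] - a[1])

    i, d = max(((i, dist(p)) for i, p in enumerate(points[1:-1], 1)),
               key=lambda t: t[1])
    if d <= eps:
        return [a, b]                 # everything within tolerance: keep chord
    # Split at the farthest point and recurse on both halves.
    left, right = rdp(points[:i + 1], eps), rdp(points[i:], eps)
    return left[:-1] + right
```

For real-time use you would run it on a trailing window of recent points rather than the whole line, since the classic algorithm is offline.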
Using angles to reject "jump" points may be tricky, as you've seen. One option is to compare the total length of N line segments to the straight-line distance between the extreme end points of that chain of N line segments. You can threshold the ratio of (totalLength/straightLineLength) to identify line segments to be rejected. This would be a quick calculation, and it's easy to understand.
If you want to take segment lengths and segment-to-segment angles into consideration, you could treat the line segments as vectors and compute the cross product. If you imagine the two vectors as defining a parallelogram, and the area of that parallelogram is a reasonable accept/reject criterion for a point, then the cross product is another simple and quick calculation.
https://www.math.ucdavis.edu/~daddel/linear_algebra_appl/Applications/Determinant/Determinant/node4.html
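Both of these checks are only a few lines each; a sketch (the function names are my own):

```python
import math

def straightness_ratio(chain):
    """Total polyline length divided by the straight-line distance
    between the chain's end points (>= 1; large means a jumpy chain)."""
    total = sum(math.dist(p, q) for p, q in zip(chain, chain[1:]))
    direct = math.dist(chain[0], chain[-1])
    return total / direct if direct else float('inf')

def parallelogram_area(a, b, c):
    """|cross product| of the segment vectors a->b and b->c."""
    return abs((b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0]))
```

A chain whose ratio is far above 1, or a middle point whose parallelogram area exceeds a threshold, is a candidate "jump" point to reject.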
If you only have a few dozen points, you could randomly eliminate one point at a time, generate your spline fits, and then calculate the point-to-spline distances for all the original points. Given all those point-to-spline distances, you can compute a metric (e.g. mean distance) that you'd like to minimize: the best fit would result from eliminating the points (Pn, Pn+k, ...) that yield the best spline-fit quality S. This technique wouldn't scale well to more points, but it might be worth a try if you break each chain of line segments into groups of maybe half a dozen segments each.
Although it's overkill for this problem, I'll mention that Euler curves can be good fits to "natural" curves. What's nice about Euler curves is that you can generate an Euler curve fit from just two points in space and the tangents at those two points. The code gets hairy, but Euler curves (a.k.a. aesthetic curves, if I remember correctly) can generate better and/or more useful fits to natural curves than nth-degree Bezier splines.
https://en.wikipedia.org/wiki/Euler_spiral
I have, as input, an arbitrary "formation", which is a list of rectangles, F:
And as another input, an unordered list of 2D points, P:
In this example, I consider P to match the formation F because, if P were rotated 45° counter-clockwise, each rectangle in F would be satisfied by containing a point. It would also be considered a match if there were an extraneous point in P that did not fall into any rectangle.
Neither the formation nor the point input has any particular origin, and the scales of the two are not required to be the same; e.g., the formation could describe an area of a kilometer, and the input points could describe an area of a centimeter. And lastly, I need to know which point ended up in which node of the formation.
I'm trying to develop a general-purpose algorithm that satisfies all of these constraints. It will be executed millions of times per second against a large database of location information, so I'm trying to "fail out" as soon as I can.
I've considered taking the angles between all points in both inputs and comparing them, or calculating and comparing hulls, but every approach seems to fall apart with one of the constraints.
Points in the formation could also easily be represented as circles with an x,y origin and tolerance radius, and that seems to simplify the approaches I've tried so far. I'd appreciate any solid plan-of-attack or A-Ha! insights.
I've had another thought - using polar coordinates this time.
The description was getting complex/ambiguous, so here is some code that hopefully illustrates the idea.
The gist is to express the formations and points in terms of polar coordinates, with the origin in the center of the formation/point set. It then becomes a lot easier to find the rotation and scaling factors of the transform between points and formations. The translation component is trivially found by comparing the average of the point set and of the formation zone set.
Note that this approach will treat your formation zones not as squares or circles, but as sections of circle segments. Hopefully this is a fudge that you can live with.
It will also not return the exact scaling and rotation terms of a valid mapping transform. It will give you a mapping between formation zones and points, and a good approximation of the final rotation and scaling factors. This approximation could be very quickly refined into a valid solution via a simple relaxation scheme. It will also quickly disregard invalid point sets.
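Since the code isn't shown here, the following is my own minimal sketch of the polar-coordinate idea; it assumes equally many points and zones, and it ignores the tolerance radii and the final relaxation step:

```python
import math

def to_polar(points):
    """Each point as (radius, angle) about the set's centroid, sorted by radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted((math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
                  for x, y in points)

def estimate_transform(zones, points):
    """Rough scale and rotation mapping zone centres onto points, or None."""
    f, p = to_polar(zones), to_polar(points)
    if len(f) != len(p):
        return None
    # Scale from the ratio of total radii; rotation from the circular mean
    # of the pairwise angle differences (pairing by sorted radius).
    scale = sum(r for r, _ in p) / sum(r for r, _ in f)
    s = sum(math.sin(pa - fa) for (_, fa), (_, pa) in zip(f, p))
    c = sum(math.cos(pa - fa) for (_, fa), (_, pa) in zip(f, p))
    return scale, math.atan2(s, c)
```

Pairing by sorted radius is ambiguous when two radii are nearly equal, which is one of the cases the relaxation step would need to clean up.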
One approach would be to express the point sets and formations in relative coordinate systems.
For each point set and formation:
Identify the most mutually-distant pair of points, call them A and B
Identify the point farthest from the line through A and B, call it C. Ensure that C is on the left of the line AB - you may need to swap A and B to make this so.
Express the rest of the points in terms of A, B and C. This is a simple matter of finding, for each point, the closest point D on the line through A and B, and scaling so that all distances are expressed in terms of the distance between A and B. The distance from A to D is your relative x coordinate, and the distance from D to the point is the y coordinate.
For example, if you find that A and B are ten units apart, and that C is 5 units distant from the midpoint of AB, then the relative coordinates would be:
A: (0,0)
B: (1,0)
C: (0.5,0.5)
You can then compare the point sets and formations independently of the global coordinate system. Note that the distance tolerances to find a match also have to be scaled in terms of AB.
I can easily imagine problem formations for this approach, where the choices of A, B and C are difficult to make unambiguously, but it's a start.
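For what it's worth, the construction above can be sketched as follows (brute-force pair search, fine for small point sets; the function name is mine):

```python
import math
from itertools import combinations

def relative_coords(points):
    """Express a point set in the A-B-C frame described above."""
    # 1. A, B: the most mutually distant pair.
    A, B = max(combinations(points, 2), key=lambda ab: math.dist(*ab))
    ab = (B[0] - A[0], B[1] - A[1])
    L2 = ab[0] ** 2 + ab[1] ** 2

    def rel(p):
        ap = (p[0] - A[0], p[1] - A[1])
        x = (ap[0] * ab[0] + ap[1] * ab[1]) / L2   # projection onto AB, in AB units
        y = (ab[0] * ap[1] - ab[1] * ap[0]) / L2   # signed perpendicular offset
        return (x, y)

    # 2. C: farthest from the line AB; swap A and B if C is not on the left.
    C = max(points, key=lambda p: abs(rel(p)[1]))
    if rel(C)[1] < 0:
        A, B = B, A
        ab = (-ab[0], -ab[1])
    # 3. Every point expressed in the relative frame.
    return [rel(p) for p in points]
```

On the worked example (A and B ten units apart, C five units from the midpoint), this reproduces the (0,0), (1,0), (0.5,0.5) coordinates given above.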
In the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens we are sampling the value at different and unpredictable times, also we are alternating the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However as shown in the three smaller boxes this is not the case. Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former produces values that come close together and then drift apart between each data point; the latter produces values that stay closer together, but with a greater average error.
Can anyone suggest another strategy or interpolation method which will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe your question does not have a straightforward answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be nearly anything, so I think there is no way to guarantee that the interpolations of two sample arrays converge.
That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimize the errors. For example, if you measure drag force as a function of wing velocity, you know the relation is quadratic (a*V^2). Then you can choose a 2nd-order polynomial fit and get a pretty good match between the interpolations of the two series.
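For instance, with made-up samples of the drag example, a 2nd-order fit with numpy would look like:

```python
import numpy as np

# Hypothetical drag-force samples following F = a*V^2 with a = 0.5
V = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
F = 0.5 * V ** 2

coeffs = np.polyfit(V, F, 2)   # least-squares 2nd-order polynomial fit
drag_at = np.poly1d(coeffs)    # evaluate the fitted model anywhere
```

Fitting both the red and the blue samples to the same model form keeps their interpolated values consistent between sample points.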
Try B-splines: Catmull-Rom interpolates (goes through the data points), B-spline does smoothing.
For example, for uniformly-spaced data (not your case)
Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6
Of course, the interpolated red/blue curves depend on the spacing of the red/blue data points, so they cannot match perfectly.
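Applied at the data points, the formula above becomes a simple smoothing pass (keeping the endpoints unchanged is my own choice):

```python
def bspline_smooth(values):
    """Uniform cubic B-spline value at each interior data point:
    s[t] = (v[t-1] + 4*v[t] + v[t+1]) / 6.  Endpoints are kept as-is."""
    smoothed = list(values)
    for t in range(1, len(values) - 1):
        smoothed[t] = (values[t - 1] + 4 * values[t] + values[t + 1]) / 6
    return smoothed
```

A constant series is unchanged, while an isolated spike is pulled toward its neighbours, which is exactly the smoothing (rather than interpolating) behaviour described.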
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task.
One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines.
By definition your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation as shown in these slides.
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics - as with ysap, I recommend methods that utilize the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets and I would challenge your original expectation that:
When computing the value at any time, we expect that both red and blue data sets will return similar values.
I do not expect this. If you assume that you cannot interpolate perfectly (particularly if the interpolation error is large compared to the errors in the samples), then you are certain to have a continuous error function whose largest errors occur farthest (in time) from your sample points. Two data sets with differing sample points should therefore exhibit the behaviour you see, because points far (in time) from red sample points may be near (in time) to blue sample points, and vice versa; since your points are staggered, this is sure to be true. Thus I would expect what you show, that:
Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
(If you have no information about the underlying dynamics except frequency content, then Giacomo's points on sampling are key; however, you need not interpolate if you are only looking at information below the Nyquist frequency.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (known as aliasing). The error, being different in the two datasets, results in different values when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples at a frequency of at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them out before sampling.