how to compare two curves (arrays of points) - math

I have 2 arrays of points (x, y); with those points I can draw 2 curves.
Does anyone have ideas on how to calculate how similar those curves are?

You can always calculate the area between those two curves. (This is a bit easier if the endpoints match.) The curves are similar if the area is small, not so similar if the area is not small.
Note that I did not define 'small'. That was intentional. Then again, you didn't define 'similar'.
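For example, if both curves are sampled at the same x values, the unsigned area can be estimated with the trapezoidal rule; a minimal Python sketch:
def area_between(x, y1, y2):
    # Approximate area between two curves sampled at the same x values,
    # using the trapezoidal rule on |y1 - y2|.
    d = [abs(a - b) for a, b in zip(y1, y2)]
    return sum((x[i+1] - x[i]) * (d[i] + d[i+1]) / 2.0 for i in range(len(x) - 1))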
Edit
Sometimes area isn't the best metric. For example, consider the functions f(x)=0 and g(x)=1e6*sin(x). If the range of x spans an integral multiple of 2*pi, the signed area between these curves is zero, yet a function that oscillates between plus and minus one million is not a good approximation of f(x)=0.
A better metric is needed. Here are a couple. Note: I am assuming here that the x values are identical in the two sets; the only things that differ are the y values.
Sum of squares. For each x value, compute delta_i = y1[i] - y2[i] and accumulate delta_i^2. This metric is the basis of least-squares optimization, where the goal is to minimize the sum of the squares of the errors. It is a widely used approach because it is often fairly easy to implement.
Maximum deviation. Find the maximum of |y1[i] - y2[i]| over all x values. This metric is the basis for many implementations of the functions in the math library, where the goal is to minimize the maximum error. Those math library implementations are approximations of the true function, and as a consumer of such an approximation I typically care more about the worst thing the approximation will do to my application than about how it behaves on average.
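Both metrics are easy to compute. A minimal Python sketch, assuming the two curves share identical x values so that only the y arrays differ:
def sum_of_squares(y1, y2):
    # Accumulate squared differences; the basis of a least-squares comparison.
    return sum((a - b) ** 2 for a, b in zip(y1, y2))

def max_deviation(y1, y2):
    # Largest absolute difference; the minimax-style metric.
    return max(abs(a - b) for a, b in zip(y1, y2))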

You might want to consider using Dynamic Time Warping (DTW) or Frechet distance.
Dynamic Time Warping
Dynamic Time Warping sums the point-to-point cost along the entire curve and can handle two arrays of different sizes. Here is a snippet from Wikipedia showing how the code might look. This solution uses a two-dimensional array; the cost is the distance between two points, and the final value DTW[n, m] contains the cumulative distance.
int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]

    for i := 1 to n
        DTW[i, 0] := infinity
    for i := 1 to m
        DTW[0, i] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := 1 to m
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}
DTW is similar to Jacopson's answer.
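A direct Python translation of that pseudocode, assuming the points are (x, y) tuples and using the Euclidean distance as the cost d:
from math import hypot, inf

def dtw_distance(s, t):
    # DTW distance between two lists of (x, y) points, following the
    # pseudocode above.
    n, m = len(s), len(t)
    dtw = [[inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = hypot(s[i-1][0] - t[j-1][0], s[i-1][1] - t[j-1][1])
            dtw[i][j] = cost + min(dtw[i-1][j],      # insertion
                                   dtw[i][j-1],      # deletion
                                   dtw[i-1][j-1])    # match
    return dtw[n][m]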
Frechet Distance
Frechet distance calculates the farthest that the curves separate: under the best simultaneous traversal of both curves, no pair of matched points is farther apart than this distance. The idea is typically illustrated with a dog and its owner walking along the two curves, connected by a leash.
[image: Frechet distance example]
Depending on your arrays, you can approximate this by comparing corresponding points and taking the maximum distance.
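If you want to take the traversal into account rather than just matching indices, the discrete Frechet distance can be computed with a standard dynamic-programming recurrence; a Python sketch, again assuming (x, y) tuples:
from math import hypot

def discrete_frechet(p, q):
    # Discrete Frechet distance between two polylines p and q
    # (lists of (x, y) tuples), via dynamic programming.
    n, m = len(p), len(q)
    def d(a, b):
        return hypot(a[0] - b[0], a[1] - b[1])
    ca = [[0.0] * m for _ in range(n)]
    ca[0][0] = d(p[0], q[0])
    for i in range(1, n):
        ca[i][0] = max(ca[i-1][0], d(p[i], q[0]))
    for j in range(1, m):
        ca[0][j] = max(ca[0][j-1], d(p[0], q[j]))
    for i in range(1, n):
        for j in range(1, m):
            ca[i][j] = max(min(ca[i-1][j], ca[i-1][j-1], ca[i][j-1]),
                           d(p[i], q[j]))
    return ca[n-1][m-1]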

I assume a Curve is an array of 2D points over the real numbers, the size of the array is N, so I call p[i] the i-th point of the curve; i goes from 0 to N-1.
I also assume that the two curves have the same size and that it is meaningful to "compare" the i-th point of the first curve with the i-th point of the second curve.
I call Delta, a real number, the result of the comparison of the two curves.
Delta can be computed as follows:
Delta = 0;
for( i = 0; i < N; i++ ) {
    Delta = Delta + distance(p[i], q[i]);
}
where p are points from the first curve and q are points from the second curve.
Now you have to choose a suitable distance function depending on your problem: the function has two points as arguments and returns a real number.
For example, distance can be the usual distance between two points in the plane (the Euclidean distance computed via the Pythagorean theorem, http://en.wikipedia.org/wiki/Euclidean_distance).
An example of the method in C++:
#include <numeric>
#include <vector>
#include <cmath>
#include <iostream>
#include <functional>
#include <stdexcept>
#include <initializer_list>

typedef double Real_t;

class Point
{
public:
    Point(){}
    Point( std::initializer_list<Real_t> args ):x(args.begin()[0]),y(args.begin()[1]){}
    Point( const Real_t& xx, const Real_t& yy ):x(xx),y(yy){}
    Real_t x,y;
};

typedef std::vector< Point > Curve;

// Euclidean distance between two points.
Real_t point_distance( const Point& a, const Point& b )
{
    return std::hypot( a.x-b.x, a.y-b.y );
}

// Sum of point-to-point distances between two curves of equal size.
Real_t curve_distance( const Curve& c1, const Curve& c2 )
{
    if ( c1.size() != c2.size() ) throw std::invalid_argument("size mismatch");
    return std::inner_product( c1.begin(), c1.end(), c2.begin(), Real_t(0),
                               std::plus< Real_t >(), point_distance );
}

int main(int,char**)
{
    Curve c1{{0,0},
             {1,1},
             {2,4},
             {3,9}};
    Curve c2{{0.1,-0.1},
             {1.1,0.9},
             {2.1,3.9},
             {3.1,8.9}};
    std::cout << curve_distance(c1,c2) << "\n";
    return 0;
}
If your two curves have different sizes, then you have to think about how to extend the previous method. For example, you can reduce the size of the longer curve by means of a suitable algorithm (the Ramer–Douglas–Peucker algorithm can be a starting point) so that it matches the size of the shorter curve.
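For instance, a compact Python version of Ramer–Douglas–Peucker (points as (x, y) tuples; epsilon is a tolerance you would tune until the simplified curve has roughly the size you need):
from math import hypot

def rdp(points, epsilon):
    # Ramer-Douglas-Peucker simplification of a polyline: keep points that
    # are farther than epsilon from the chord between the current endpoints.
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    chord = hypot(x2 - x1, y2 - y1)
    def dist(p):
        # Perpendicular distance from p to the chord (or to an endpoint if
        # the endpoints coincide).
        if chord == 0:
            return hypot(p[0] - x1, p[1] - y1)
        return abs((x2 - x1)*(y1 - p[1]) - (x1 - p[0])*(y2 - y1)) / chord
    dmax, imax = max((dist(p), i) for i, p in enumerate(points[1:-1], 1))
    if dmax > epsilon:
        left = rdp(points[:imax + 1], epsilon)
        right = rdp(points[imax:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]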
I have just described a very simple method; you can also take different approaches. For example, you can fit two curves to the two sets of points and then work with the curves expressed as mathematical functions.

This can also be solved by thinking in terms of distributions, especially if the position of a value is interchangeable within an array.
You could calculate the mean and the standard deviation (and other distribution characteristics) for both arrays and then calculate the difference between those characteristics.
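A small Python sketch of that idea, using only the mean and standard deviation (the helper name is just for illustration):
from statistics import mean, stdev

def distribution_difference(a, b):
    # Compare two arrays by distribution characteristics, ignoring point order.
    return abs(mean(a) - mean(b)), abs(stdev(a) - stdev(b))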

Related

Mixing function for non power of 2 integer intervals

I'm looking for a mixing function that, given an integer from the interval [0, n), returns a random-looking integer from the same interval. The interval size n will typically be a composite number that is not a power of 2. I need the function to be one-to-one. It can only use O(1) memory; O(1) time is strongly preferred. I'm not too concerned about the randomness of the output, but visually it should look random enough (see next paragraph).
I want to use this function as a pixel shuffling step in a realtime-ish renderer to select the order in which pixels are rendered (The output will be displayed after a fixed time and if it's not done yet this gives me a noisy but fast partial preview). Interval size n will be the number of pixels in the render (n = 1920*1080 = 2073600 would be a typical value). The function must be one to one so that I can be sure that every pixel is rendered exactly once when finished.
I've looked at the reversible building blocks used by hash prospector, but these are mostly specific to power of 2 ranges.
The only other method I could think of is multiplying by a large prime, but it doesn't give particularly nice random-looking outputs.
What are some other options here?
Here is one solution based on the idea of primitive roots modulo a prime:
If a is a primitive root mod p then the function g(i) = a^i % p is a permutation of the nonzero elements which are less than p. This corresponds to the Lehmer prng. If n < p, you can get a permutation of 0, ..., n-1 as follows: Given i in that range, first add 1, then repeatedly multiply by a, taking the result mod p, until you get an element which is <= n, at which point you return the result - 1.
To fill in the details, this paper contains a table which gives a series of primes (all of which are close to various powers of 2) and corresponding primitive roots which are chosen so that they yield a generator with good statistical properties. Here is a part of that table, encoded as a Python dictionary in which the keys are the primes and the primitive roots are the values:
d = {32749: 30805,
     65521: 32236,
     131071: 66284,
     262139: 166972,
     524287: 358899,
     1048573: 444362,
     2097143: 1372180,
     4194301: 1406151,
     8388593: 5169235,
     16777213: 9726917,
     33554393: 32544832,
     67108859: 11526618,
     134217689: 70391260,
     268435399: 150873839,
     536870909: 219118189,
     1073741789: 599290962}
Given n (in a certain range -- see the paper if you need to expand that range), you can find the smallest p which works:
def find_p_a(n):
    for p in sorted(d.keys()):
        if n < p:
            return p, d[p]
Once you know n and the matching p, a, the following function is a permutation of 0, ..., n-1:
def f(i,n,p,a):
    x = a*(i+1) % p
    while x > n:
        x = a*x % p
    return x-1
For a quick test:
n = 2073600
p,a = find_p_a(n) # p = 2097143, a = 1372180
nums = [f(i,n,p,a) for i in range(n)]
print(len(set(nums)) == n) #prints True
The average number of multiplications in f() is p/n, which in this case is 1.011 and will never be more than 2 (or only very slightly more, since the p are not exact powers of 2). In practice this method is not fundamentally different from your "multiply by a large prime" approach, but here the factor is chosen more carefully, and the fact that sometimes more than one multiplication is required adds to the apparent randomness.

Calculating sqrt and arcTan in javacard without float type

I want to calculate sqrt and arctangent in Java Card. I don't have any math lib to do this for me, and I don't have a float type to calculate it manually. I have some questions in mind:
1- Can I use float numbers in byte-array form and work on them? How?
2- How are these operations usually calculated in Java Card?
I found some links but they couldn't help me:
http://stackoverflow.com/questions/15363244/math-library-for-javacard
http://javacardos.com/javacardforum/viewtopic.php?t=437
I should mention that I have to calculate these operations on the card. Thank you very much if anyone can help.
The integer square root can be computed by the Babylonian method, if integer division is available.
Just iterate
R' = (R + S / R) / 2
with a suitable initial R.
Such a value can be found with
R = 1
while S > 2:
    R *= 2
    S /= 4
(preferably implemented with shifts, if available).
You can stop the iterations when the value of R stabilizes (you can also determine a priori a constant number of iterations that yields sufficient accuracy).
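A Python sketch of the integer-only iteration (on the card the same thing would be done with short/int arithmetic and shifts; the initial guess below uses the bit length, which plays the same role as the loop above):
def isqrt_babylonian(s):
    # Integer square root by the Babylonian iteration R' = (R + S / R) / 2,
    # using only integer operations.
    if s < 2:
        return s
    r = 1 << ((s.bit_length() + 1) // 2)   # initial R, guaranteed >= sqrt(s)
    while True:
        nxt = (r + s // r) // 2
        if nxt >= r:                        # R has stabilized
            return r
        r = nxt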
The idea for CORDIC in the computation of atan is to have a table of values
angle[i] = atan(pow(2,-i));
It does not matter if the angles are precomputed in radians or degrees. Then use the tangent addition theorem
tan(a+b) = (tan(a) + tan(b)) / (1 - tan(a)*tan(b))
to successively reduce the given tangent value
atan(x) {
    if(x<0) return -atan(-x);
    if(x>1) return 2*angle[0]-atan(1/x);   // 2*angle[0] == pi/2
    pow2 = 1.0;
    phi = 0;
    for(i=0; i<10; i++) {
        if(x>pow2) {
            phi += angle[i];
            x = (x-pow2)/(1+pow2*x);
        }
        pow2 /= 2;
    }
    return phi+x;   // for small x, atan(x) is approximately x
}
Now one needs to translate these operations and constants into using some kind of fixed point format.
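A rough Python sketch of one such fixed-point translation (the 2^16 scale and the precomputed angle table are illustrative assumptions; a Java Card version would use int arithmetic and shifts):
SCALE = 1 << 16                  # fixed point: a value v is stored as round(v * SCALE)
# ANGLE[i] = round(atan(2**-i) * SCALE), in radians
ANGLE = [51472, 30386, 16055, 8150, 4091, 2047, 1024, 512, 256, 128]

def fix_mul(a, b):
    return (a * b) // SCALE

def fix_div(a, b):
    return (a * SCALE) // b

def fix_atan(x):
    # Arctangent of a fixed-point value x, returned in fixed-point radians.
    if x < 0:
        return -fix_atan(-x)
    if x > SCALE:                              # x > 1: atan(x) = pi/2 - atan(1/x)
        return 2 * ANGLE[0] - fix_atan(fix_div(SCALE, x))
    pow2 = SCALE                               # represents 2**-i, starting at 1
    phi = 0
    for i in range(10):
        if x > pow2:
            phi += ANGLE[i]
            x = fix_div(x - pow2, SCALE + fix_mul(pow2, x))
        pow2 //= 2
    return phi + x                             # for small x, atan(x) is roughly x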

3 random numbers from 2 random numbers task

Suppose you have some uniform-distribution rnd(x) function that will return 0 or 1.
How can you use this function to create an rnd(x,n) function that will return uniformly distributed numbers from 0 to n?
I mean, everyone uses this, but it's not obvious to me. For example, I can create distributions with right border 2^n-1 ([0-1], [0-3], [0-7], etc.), but I can't find a way to do this for ranges like [0-2] or [0-5] without using very big numbers for reasonable precision.
Suppose that you need to create a function rnd(n) which returns a uniformly distributed random number in the range [0, n] by using another function rnd1() which returns 0 or 1.
Find the smallest k such that 2^k >= n+1.
Create a number consisting of k bits and fill all its bits by using rnd1(). The result is a uniformly distributed number in the range [0, 2^k-1].
Compare the generated number to n. If it is smaller than or equal to n, return it. Otherwise go to step 2.
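A minimal Python sketch of those three steps, using random.getrandbits(1) as a stand-in for the given 0/1 source (only the rejection logic matters here):
import random

def rnd1():
    # Stand-in for the given fair 0/1 generator.
    return random.getrandbits(1)

def rnd(n):
    # Uniform integer in [0, n], built only from rnd1() by rejection sampling.
    k = n.bit_length()                # smallest k with 2**k >= n + 1
    while True:
        x = 0
        for _ in range(k):            # fill k bits: uniform on [0, 2**k - 1]
            x = (x << 1) | rnd1()
        if x <= n:                    # accept; otherwise redraw
            return x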
In general, this is a variation of how to generate uniform numbers in a small range by using a library function which generates numbers in a large range:
unsigned int rnd(n) {
    while (true) {
        unsigned int x = rnd_full_unsigned_int();
        if (x < MAX_UNSIGNED_INT / (n+1) * (n+1)) {
            return x % (n+1);
        }
    }
}
Explanation for the above code: if you simply return rnd_full_unsigned_int() % (n+1), the result is biased towards small values. Picture all possible values from 0 to MAX_UNSIGNED_INT laid out along a spiral, counted from the inside, where the length of a single revolution is (n+1); the last, incomplete revolution is what produces the bias. So, in order to remove this bias, we first create a random number x in the range [0, MAX_UNSIGNED_INT] (this is easy with bit-fill). Then, if x falls into the bias-generating region, we recreate it, and keep recreating it until it no longer falls into that region. At that point x is uniform over a range whose length is an exact multiple of (n+1), so x % (n+1) is a uniformly distributed number in [0, n].

How to check whether the curve is C1 class?

How to check whether the curve is C1 class or C2 class.
Example:
x = [1,2,3,4,5,6,7,8,9 ......1500]
y = [0.56, 1, 12, 41, 01. ....... 11, 0.11, 3, 23, 95]
Is this curve a C1 class "function"?
Thank you very much.
MatLab vectors contain samples of the function, not the function itself.
Sampled data is always discrete, not continuous.
There are infinitely many functions with the same samples. Specifically, there are always both continuous and discontinuous functions with those samples, so there's no way to determine C1 or not from just samples.
Example of a continuous function: The Fourier (or DCT) reconstructed estimate.
Example of a discontinuous function: The Fourier reconstructed estimate, plus a sawtooth wave with period equal to the sampling rate.
You can't tell from the data you're given; you have to know something about how you represent a function from it.
For example, if I plot those as a histogram it's discontinuous (jumps at each point). If I do straight line interpolation between points it's C0 continuous. If I use a smooth interpolation like a spline I can get C1 continuity and so on depending on how I choose to represent the function from your arrays of data.
While technically you can't check if the data corresponds to a C1 or C2 curve - you can do something that still might be useful.
C1 means a continuous 1st derivative. So if you calculate the derivative numerically and see big jumps in it, you might suspect that the underlying curve is not C1. (You can't actually guarantee that, but you can guarantee that it is either not C1 or has a derivative outside some bounds.) Conversely, if you don't see any big jumps, then there is a C1 curve with bounded derivative that does fit the data - just not necessarily the same curve that actually generated the data.
You can do something similar with the numerically calculated second derivative to determine its C2 status. (Note that if it's not C1, then it can't be C2 - so if the first test fails you can forget about the second one.)
Here's roughly how I'd do it in C++ for the C1 case with evenly spaced x points. (If things are not evenly spaced you'll need to tweak the calculation of s).
double y[N] = {0.56, 1, 12, 41, ..., 11, 0.11, 3, 23, 95 };

double max_abs_slope = 0;
double sum_abs_slope = 0;
double sum_abs_slope_sq = 0;
unsigned int imax = 0;

for( unsigned int i = 0; i < N-1; ++i )
{
    double s = fabs( y[i+1] - y[i] );
    sum_abs_slope += s;
    sum_abs_slope_sq += s*s;
    if( s > max_abs_slope ) { max_abs_slope = s; imax = i; }
}

// There are N-1 slope samples; we expect the max to be within
// three std-dev of their average.
const unsigned int M = N - 1;
double stddev = sqrt( (M*sum_abs_slope_sq - sum_abs_slope*sum_abs_slope)/(M*(M-1)) );

if( ( max_abs_slope - sum_abs_slope/M ) > 3 * stddev )
{
    std::cout<<"There's an unexpectedly large jump in interval "<<imax<<std::endl;
}
else
{
    std::cout<<"It seems smooth"<<std::endl;
}
However, you might use a different threshold than 3*stddev: you might pick an actual limit based on your knowledge of the underlying problem, or you might choose to be stricter (using a value > 3) or less strict (< 3).
I've not tested this code, so it may not run or may be buggy.
I've also not checked that 3*stddev makes sense for any curves.
This is very much caveat emptor.

Curve fitting: Find the smoothest function that satisfies a list of constraints

Consider the set of non-decreasing surjective (onto) functions from (-inf,inf) to [0,1].
(Typical CDFs satisfy this property.)
In other words, for any real number x, 0 <= f(x) <= 1.
The logistic function is perhaps the most well-known example.
We are now given some constraints in the form of a list of x-values and for each x-value, a pair of y-values that the function must lie between.
We can represent that as a list of {x,ymin,ymax} triples such as
constraints = {{0, 0, 0}, {1, 0.00311936, 0.00416369}, {2, 0.0847077, 0.109064},
{3, 0.272142, 0.354692}, {4, 0.53198, 0.646113}, {5, 0.623413, 0.743102},
{6, 0.744714, 0.905966}}
Graphically that looks like this:
[plot of the constraints as error bars; source: yootles.com]
We now seek a curve that respects those constraints.
For example:
[plot of a candidate curve passing through the constraints; source: yootles.com]
Let's first try a simple interpolation through the midpoints of the constraints:
mids = ({#1, Mean[{#2,#3}]}&) @@@ constraints
f = Interpolation[mids, InterpolationOrder->0]
Plotted, f looks like this:
[plot of the midpoint interpolation f; source: yootles.com]
That function is not surjective. Also, we'd like it to be smoother.
We can increase the interpolation order but now it violates the constraint that its range is [0,1]:
[plot of the higher-order interpolation leaving the range [0,1]; source: yootles.com]
The goal, then, is to find the smoothest function that satisfies the constraints:
Non-decreasing.
Tends to 0 as x approaches negative infinity and tends to 1 as x approaches infinity.
Passes through a given list of y-error-bars.
The first example I plotted above seems to be a good candidate but I did that with Mathematica's FindFit function assuming a lognormal CDF.
That works well in this specific example but in general there need not be a lognormal CDF that satisfies the constraints.
I don't think you've specified enough criteria to make the desired CDF unique.
If the only criteria that must hold is:
CDF must be "fairly smooth" (see below)
CDF must be non-decreasing
CDF must pass through the "error bar" y-intervals
CDF must tend toward 0 as x --> -Infinity
CDF must tend toward 1 as x --> Infinity.
then perhaps you could use Monotone Cubic Interpolation.
This will give you a C^1 (continuously differentiable) function which,
unlike ordinary cubic splines, is guaranteed to be monotone when given monotone data.
This leaves open the question, exactly what data should you use to generate the
monotone cubic interpolation. If you take the center point (mean) of each error
bar, are you guaranteed that the resulting data points are monotonically
increasing? If not, you might as well make some arbitrary choice to guarantee
that the points you select are monotonically increasing (because the criteria do not force our solution to be unique).
Now what to do about the last data point? Is there an X which is guaranteed to
be larger than any x in the constraints data set? Perhaps you can again make an
arbitrary choice of convenience and pick some very large X and put (X,1) as the
final data point.
Comment 1: Your problem can be broken into 2 sub-problems:
Given exact points (x_i,y_i) through which the CDF must pass, how do you generate CDF? I suspect there are infinitely many possible solutions, even with the infinite-smoothness constraint.
Given y-errorbars, how should you pick (x_i,y_i)? Again, there are infinitely many possible solutions. Some additional criteria may need to be added to force a unique choice. Additional criteria would also probably make the problem even harder than it currently is.
Comment 2: Here is a way to use monotonic cubic interpolation, and satisfy criteria 4 and 5:
The monotonic cubic interpolation (let's call it f) maps R --> R.
Let CDF(x) = exp(-exp(f(x))). Then CDF: R --> (0,1). If we could find the appropriate f, then by defining CDF this way, we could satisfy criteria 4 and 5.
To find f, transform the CDF constraints (x_0,y_0),...,(x_n,y_n) using the transformation xhat_i = x_i, yhat_i = log(-log(y_i)). This is the inverse of the CDF transformation. If the y_i's were increasing, then the yhat_i's are decreasing.
Now apply monotone cubic interpolation to the (x_hat,y_hat) data points to generate f. Then finally, define CDF(x) = exp(-exp(f(x))). This will be a monotonically increasing function from R --> (0,1), which passes through the points (x_i,y_i).
This, I think, satisfies all the criteria 2--5. Criteria 1 is somewhat satisfied, though there certainly could exist smoother solutions.
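Here is a rough Python sketch of Comment 2, assuming SciPy is available (its PchipInterpolator implements monotone cubic interpolation) and using approximate midpoints of the example error bars as the exact points; x = 0 is skipped because y = 0 cannot be passed through log(-log(y)):
import numpy as np
from scipy.interpolate import PchipInterpolator

# Approximate midpoints of the example error bars for x = 1..6.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
ys = np.array([0.00364, 0.0969, 0.3134, 0.5890, 0.6833, 0.8253])

# Transform to the f-domain: yhat = log(-log(y)); increasing y gives decreasing yhat.
yhat = np.log(-np.log(ys))

# Monotone cubic interpolation of the transformed points.
f = PchipInterpolator(xs, yhat)

def cdf(x):
    # CDF(x) = exp(-exp(f(x))) maps the reals into (0, 1) and is non-decreasing.
    return np.exp(-np.exp(f(x)))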
I have found a solution that gives reasonable results for a variety of inputs.
I start by fitting a model -- once to the low ends of the constraints, and again to the high ends.
I'll refer to the mean of these two fitted functions as the "ideal function".
I use this ideal function to extrapolate to the left and to the right of where the constraints end, as well as to interpolate between any gaps in the constraints.
I compute values for the ideal function at regular intervals, including all the constraints, from where the function is nearly zero on the left to where it's nearly one on the right.
At the constraints, I clip these values as necessary to satisfy the constraints.
Finally, I construct an interpolating function that goes through these values.
My Mathematica implementation follows.
First, a couple helper functions:
(* Distance from x to the nearest member of list l. *)
listdist[x_, l_List] := Min[Abs[x - #] & /@ l]

(* Return a value x for the variable var such that expr/.var->x is at least (or
   at most, if dir is -1) t. *)
invertish[expr_, var_, t_, dir_:1] := Module[{x = dir},
  While[dir*(expr /. var -> x) < dir*t, x *= 2];
  x]
And here's the main function:
(* Return a non-decreasing interpolating function that maps from the
reals to [0,1] and that is as close as possible to expr[var] without
violating the given constraints (a list of {x,ymin,ymax} triples).
The model, expr, will have free parameters, params, so first do a
model fit to choose the parameters to satisfy the constraints as well
as possible. *)
cfit[constraints_, expr_, params_, var_] :=
  Block[{xlist, bots, tops, loparams, hiparams, lofit, hifit, xmin, xmax, gap, aug, bests},
    xlist = First /@ constraints;
    bots = Most /@ constraints; (* bottom points of the constraints *)
    tops = constraints /. {x_, _, ymax_} -> {x, ymax};
    (* fit a model to the lower bounds of the constraints, and
       to the upper bounds *)
    loparams = FindFit[bots, expr, params, var];
    hiparams = FindFit[tops, expr, params, var];
    lofit[z_] = (expr /. loparams /. var -> z);
    hifit[z_] = (expr /. hiparams /. var -> z);
    (* find x-values where the fitted function is very close to 0 and to 1 *)
    {xmin, xmax} = {
      Min@Append[xlist, invertish[expr /. hiparams, var, 10^-6, -1]],
      Max@Append[xlist, invertish[expr /. loparams, var, 1 - 10^-6]]};
    (* the smallest gap between x-values in constraints *)
    gap = Min[(#2 - #1 &) @@@ Partition[Sort[xlist], 2, 1]];
    (* augment the constraints to fill in any gaps and extrapolate so there are
       constraints everywhere from where the function is almost 0 to where it's
       almost 1 *)
    aug = SortBy[Join[constraints,
                      Select[Table[{x, lofit[x], hifit[x]}, {x, xmin, xmax, gap}],
                             listdist[#[[1]], xlist] > gap &]], First];
    (* pick a y-value from each constraint that is as close as possible to
       the mean of lofit and hifit *)
    bests = ({#1, Clip[(lofit[#1] + hifit[#1])/2, {#2, #3}]} &) @@@ aug;
    Interpolation[bests, InterpolationOrder -> 3]]
For example, we can fit to a lognormal, normal, or logistic function:
g1 = cfit[constraints, CDF[LogNormalDistribution[mu,sigma], z], {mu,sigma}, z]
g2 = cfit[constraints, CDF[NormalDistribution[mu,sigma], z], {mu,sigma}, z]
g3 = cfit[constraints, 1/(1 + c*Exp[-k*z]), {c,k}, z]
Here's what those look like for my original list of example constraints:
[plot of the three fitted curves g1, g2, g3 against the constraints; source: yootles.com]
The normal and logistic are nearly on top of each other and the lognormal is the blue curve.
These are not quite perfect.
In particular, they aren't quite monotone.
Here's a plot of the derivatives:
Plot[{g1'[x], g2'[x], g3'[x]}, {x, 0, 10}]
[plot of the derivatives g1', g2', g3'; source: yootles.com]
That reveals some lack of smoothness as well as the slight non-monotonicity near zero.
I welcome improvements on this solution!
You can try to fit a Bezier curve through the midpoints. Specifically I think you want a C2 continuous curve.

Resources