cubic interpolation with not equidistant points - math

I'm trying to create an interpolation of a list of points.
I've some point of coordinates (ti, xi), where ti are timestamp and xi are associated values. I want to create a function that passes through these points and allows me to find the x value corresponding to a generic t that lies in the interval.
I want to interpolate them with a third-order interpolation. I've seen something like catmull-rom interpolation, but it works only if points xi are equidistant.
For example here http://www.mvps.org/directx/articles/catmull/ you can find that the timestamp points are equidistant, like also here http://www.cs.cmu.edu/~462/projects/assn2/assn2/catmullRom.pdf .
There' some way to apply cubic interpolation with non regular points?

Unequal spacing of the arguments is not a problem as long as they are all distinct. As you probably know, if you have four distinct times t[i], then there exists a unique polynomial interpolant of corresponding values x[i] having degree at most 3 (cubic or lower order).
There are two main approaches to computing the interpolant, Newton's divided-differences and Lagrange's method of interpolation.
Keeping in mind that just finding the polynomial is not the point, but rather evaluating it at another time in the interval, there are some programming tradeoffs to consider.
If the times t[i] are fixed but values x[i] are changed repeatedly, you might be well off using Lagrange's method. It basically constructs four cubic polynomials that take roots at three of the four points and gives a normalized value 1 at the remaining point. Once you have those four polynomials, interpolating the values x[i] is just a matter of taking the corresponding linear combination of them. Lagrange's method suffers from Runge's phenomenon at the edges of the interval.
However if the times t[i] keep changing, or perhaps you are evaluating the interpolating polynomial for a number of intermediate points for the same t[i], x[i] data, then Newton's divided differences may be better. If accuracy is important, one can vary the order that the times t[i] appear in the divided-difference tableau so that the evaluation is localized around the closest times to the intermediate time where the value is needed.
It's not hard to find sample code for Newton's divided difference method on the Web, e.g. in C++, Python, or Java.

One way might be to fit a least squares cubic through the points. I've found that approach here to be robust and practical, even with a small number of points.

Related

Approximating two different curves in R

I have two different density plots in R- one of them is the observed data (x1), and the other is randomly generated data from a Poisson distribution with the observed mean (x2). I would like to approximate the curves, i.e. make the expected curve look more like the observed data as it is over and under-estimated in certain areas. How do I go about doing this? I know you can get the absolute value between the curves by using
abs (x1 - x2)
However I'm not too sure how to proceed. Anybody have any ideas?
I think if you want to find an analytical solution, you might just have to play with the functions for a while. Otherwise, it seems that you could use calculus of variations to do this. That is, you take the difference between the area under both of your functions, and then minimize that (take the derivative). Formally, you need to take the second derivative to find if it's a max, min, or inflection point. However, you don't need to in this case if the function fits the data. I'm not sure what the best program would be for finding an analytical solution, but maybe that will put you on the right track. Just an idea to bounce around

I want to draw a curve and generate a polynomial that closely fits it. How would I go about this?

I have an arbitrary curve (defined by a set of points) and I would like to generate a polynomial that fits that curve to an arbitrary precision. What is the best way to tackle this problem, or is there already a library or online service that performs this task?
Thanks!
If your "arbitrary curve" is described by a set of points (x_i,y_i) where each x_i is unique, and if you mean by "fits" the calculation of the best least-squares polynomial approximation of degree N, you can simply obtain the coefficients b of the polynomial using
b = polyfit(X,Y,N)
where X is the vector of x_i values, Y is the vector of Y_i values. In this way you can increase N until you obtain the accuracy you require. Of course you can achieve zero approximation error by calculating the interpolating polynomial. However, data fitting often requires some thought beforehand - you need to give thought to what you want the approximation to achieve. There are a variety of mathematical ways of assessing approximation error (by using different norms), the choice of which will depend on your requirements of the resulting approximation. There are also many potential pitfalls (such as overfitting) that you may come across and blindly attempting to fit curves may result in an approximation that is theoritically sound but utterly useless to you in practical terms. I would suggest doing a little research on approximation theory if the above method does not meet your requirements, as has been suggested in the comments on your question.

Filtering methods of the complex oscilliations

If I have a system of a springs, not one, but for example 3 degree of freedom system of the springs connected in some with each other. I can make a system of differential equations for but it is impossible to solve it in a general way. The question is, are there any papers or methods for filtering such a complex oscilliations, in order to get rid of the oscilliations and get a real signal as much as possible? For example if I connect 3 springs in some way, and push them to start the vibrations, or put some weight on them, and then take the vibrations from each spring, are there any filtering methods to make it easy to determine the weight (in case if some mass is put above) of each mass? I am interested in filtering complex spring like systems.
Three springs, six degrees of freedom? This is a trivial solution using finite element methods and numerical integration. It's a system of six coupled ODEs. You can apply any form of numerical integration, such as 5th order Runge-Kutta.
I'd recommend doing an eigenvalue analysis of the system first to find out something about its frequency characteristics and normal modes. I'd also do an FFT of the dynamic forces you apply to the system. You don't mention any damping, so if you happen to excite your system at a natural frequency that's close to a resonance you might have some interesting behavior.
If the dynamic equation has this general form (sorry, I don't have LaTeX here to make it look nice):
Ma + Kx = F
where M is the mass matrix (diagonal), a is the acceleration (2nd derivative of displacements w.r.t. time), K is the stiffness matrix, and F is the forcing function.
If you're saying you know the response, you'll have to pre-multiply by the transpose of the response function and try to solve for M. It's diagonal, so you have a shot at it.
Are you connecting the springs in such a way that the behavior of the system is approximately linear? (e.g. at least as close to linear as are musical instrument springs/strings?) Is this behavior consistant over time? (e.g. the springs don't melt or break.) If so, LTI (linear time invariant) systems theory might be applicable. Given enough measurements versus the numbers of degrees of freedom in the LTI system, one might be able to estimate a pole-zero plot of the system response, and go from there. Or something like a linear predictor might be useful.
Actually it is possible to solve the resulting system of differential equations as long as you know the masses, etc.
The standard approach is to use a Laplace Transform. In particular you start with a set of linear differential equations. Add variables until you have a set of first order linear differential equations. (So if you have y'' in your equation, you'd add the equation z = y' and replace y'' with z'.) Rewrite this in the form:
v' = Av + w
where v is a vector of variable, A is a matrix, and w is a scalar vector. (An example of something that winds up in w is gravity.)
Now apply a Laplace transform to get
s L(v) - v(0) = AL(v) + s w
Solve it to get
L(v) = inv(A - I s)(s w + v(0))
where inv inverts a matrix and I is the identity matrix. Apply the inverse Laplace transform (if you read up on Laplace transforms you can find tables of inverse of common types of functions - getting a complete list of the functions you actually encounter shouldn't be that hard), and you have your solution. (Be warned, these computations quickly get very complex.)
Now you have the ability to take a particular setup and solve for the future behavior. You also have the ability to (if you do things really carefully) figure out how the model responds to a small perturbation in parameters. But your problem is that you don't know the parameters to use. However you do have the ability to measure the positions in the system at repeated times.
If you put this together, what you can do is this. Measure your position at a number of points. First estimate all of the initial values of the parameters, and then all of the values a second later. You can adjust your parameters (using Newton's method) to come close enough to the values a second later. Take the measurements from 5 seconds later and use that initial estimate as your starting point to refine your calculations for what is happening 5 seconds later. Repeat with longer intervals to get all of your answers.
Writing and debugging this should take you some time. :-) I would strongly recommend investigating how much of this Mathematica knows how to do for you already...

Calculus, How can you find an equation from a series of numbers?

I'm analyzing financial data and would like to find the inflection points of a line. I know I can do this using derivatives, but first I need an equation. Is there a way to generate an equation based off of a series of numbers. I would need to do this programmaticly.
Spline interpolation is probably more useful for you than polynomial interpolation: if you fit a polynomial, it must inevitably head off to +/- infinity outside your data range.
You will also want a method which allows a slightly loose fit: financial data is often a bit noisy which can result in very weird curves if you try to fit it exactly.
There are established procedures for turning a set of existing data points into a polynomial; this is called Polynomial Interpolation. This article in Wikipedia: http://en.wikipedia.org/wiki/Polynomial_interpolation
explains it mathematically. You can probably Google for algorithms easily enough.
Given enough points, your polynomial tracks the original, unknown function reasonably well, so the polynomial's inflection points should roughly coincide with the peaks and troughs of your data.
On the other hand, we all know there's not really a function behind financial data. So if I were you I'd scan along those points and find every point that has a smaller value to either side of it, and declare that a high; and vice versa for lows. Force-fitting this data into a fictitious function isn't going to make it any more useful.
Update: Tom Smith advises that spline interpolation is to be preferred to polynomial interpolation for this kind of thing, and Wikipedia bears him out. Or rather, it's bullish on his answer.
What you are thinking is analytical calculus ... when having discrete data (e.g. points), you have to do it numerically. Now, a line usually doesn't have inflection points, so I guess you're thinking of a curve. You can either interpolate some kind of it through the points, then calculate the first derivative (also numerically, but for a larger number of points), or you can just calculate the first derivation from the points you have (which will be better depends on how many points you actually have).
But really, this is just theory since we don't know the nature of data, or the language or anything.
For more on the subject search: numerical analysis on wiki, and go from there.
I think curve fitting might help you in this case. Here is a discussion which might be handy.
cheers

Correct way to standardize/scale/normalize multiple variables following power law distribution for use in linear combination

I'd like to combine a few metrics of nodes in a social network graph into a single value for rank ordering the nodes:
in_degree + betweenness_centrality = informal_power_index
The problem is that in_degree and betweenness_centrality are measured on different scales, say 0-15 vs 0-35000 and follow a power law distribution (at least definitely not the normal distribution)
Is there a good way to rescale the variables so that one won't dominate the other in determining the informal_power_index?
Three obvious approaches are:
Standardizing the variables (subtract mean and divide by stddev). This seems it would squash the distribution too much, hiding the massive difference between a value in the long tail and one near the peak.
Re-scaling variables to the range [0,1] by subtracting min(variable) and dividing by max(variable). This seems closer to fixing the problem since it won't change the shape of the distribution, but maybe it won't really address the issue? In particular the means will be different.
Equalize the means by dividing each value by mean(variable). This won't address the difference in scales, but perhaps the mean values are more important for the comparison?
Any other ideas?
You seem to have a strong sense of the underlying distributions. A natural rescaling is to replace each variate with its probability. Or, if your model is incomplete, choose a transformation that approximately acheives that. Failing that, here's a related approach: If you have a lot of univariate data from which to build a histogram (of each variate), you could convert each to a 10 point scale based on whether it is in the 0-10% percentile or 10-20%-percentile ...90-100% percentile. These transformed variates have, by construction, a uniform distribution on 1,2,...,10, and you can combine them however you wish.
you could translate each to a percentage and then apply each to a known qunantity. Then use the sum of the new value.
((1 - (in_degee / 15) * 2000) + ((1 - (betweenness_centrality / 35000) * 2000) = ?
Very interesting question. Could something like this work:
Lets assume that we want to scale both the variables to a range of [-1,1]
Take the example of betweeness_centrality that has a range of 0-35000
Choose a large number in the order of the range of the variable. As an example lets choose 25,000
create 25,000 bins in the original range [0-35000] and 25,000 bins in the new range [-1,1]
For each number x-i find out the bin# it falls in the original bin. Let this be B-i
Find the range of B-i in the range [-1,1].
Use either the max/min of the range of B-i in [-1,1] as the scaled version of x-i.
This preserves the power law distribution while also scaling it down to [-1,1] and does not have the problem as experienced by (x-mean)/sd.
normalizing to [0,1] would be my short answer recommendation to combine the 2 values as it will maintain the distribution shape as you mentioned and should solve the problem of combining the values.
if the distribution of the 2 variables is different which sounds likely this won't really give you what i think your after, which is a combined measure of where each variable is within its given distribution. you would have to come up with a metric which determines where in the given distribution the value lies, this could be done many ways, one of which would be to determine how many standard deviations away from the mean the given value is, you could then combine these 2 values in some way to get your index. (addition may no longer be sufficient)
you'd have to work out what makes the most sense for the data sets your looking at. standard deviations may well be meaningless for your application, but you need to look at statistical measures that related to the distribution and combine those, rather than combing absolute values, normalized or not.

Resources