Calculus: How can you find an equation from a series of numbers?

I'm analyzing financial data and would like to find the inflection points of a line. I know I can do this using derivatives, but first I need an equation. Is there a way to generate an equation based on a series of numbers? I would need to do this programmatically.

Spline interpolation is probably more useful for you than polynomial interpolation: if you fit a polynomial, it must inevitably head off to +/- infinity outside your data range.
You will also want a method which allows a slightly loose fit: financial data is often a bit noisy, which can result in very weird curves if you try to fit it exactly.
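For illustration, here is a minimal sketch of a loose spline fit with SciPy; the data and the smoothing factor s are placeholders you would tune for your own series:

import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.arange(50, dtype=float)                  # e.g. trading days
y = np.cumsum(np.random.normal(size=50))        # stand-in for a noisy series
spline = UnivariateSpline(x, y, k=3, s=len(x))  # cubic spline, deliberately loose fit
d2 = spline.derivative(n=2)                     # second derivative of the spline
xs = np.linspace(x[0], x[-1], 500)
# Inflection points are where the second derivative changes sign.
inflections = xs[:-1][np.diff(np.sign(d2(xs))) != 0]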

There are established procedures for turning a set of existing data points into a polynomial; this is called polynomial interpolation. The Wikipedia article http://en.wikipedia.org/wiki/Polynomial_interpolation
explains the mathematics, and you can probably Google for algorithms easily enough.
Given enough points, your polynomial tracks the original, unknown function reasonably well, so the polynomial's inflection points should roughly coincide with the peaks and troughs of your data.
On the other hand, we all know there's not really a function behind financial data. So if I were you, I'd scan along those points, find every point that is larger than both of its neighbours, and declare that a high; and vice versa for lows. Force-fitting this data into a fictitious function isn't going to make it any more useful.
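Here's a minimal sketch of that scan in Python (the price list is a made-up example):

def local_extrema(values):
    # A point higher than both neighbours is a high; lower is a low.
    highs, lows = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            highs.append(i)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            lows.append(i)
    return highs, lows

prices = [10, 12, 11, 13, 9, 10]
print(local_extrema(prices))  # ([1, 3], [2, 4])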
Update: Tom Smith advises that spline interpolation is to be preferred to polynomial interpolation for this kind of thing, and Wikipedia bears him out. Or rather, it's bullish on his answer.

What you are thinking of is analytical calculus; when you have discrete data (i.e. points), you have to do it numerically. Also, a line usually doesn't have inflection points, so I assume you mean a curve. You can either interpolate some kind of curve through the points and then calculate the first derivative (also numerically, but over a larger number of points), or you can calculate the first derivative directly from the points you have (which is better depends on how many points you actually have).
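As a rough sketch of the numerical route with NumPy (the sampled sine is just a stand-in for your points):

import numpy as np

t = np.linspace(0.0, 10.0, 101)  # sample locations (placeholder)
y = np.sin(t)                    # observed values (placeholder)
dy = np.gradient(y, t)           # first derivative: central differences inside, one-sided at the ends
d2y = np.gradient(dy, t)         # second derivative
# Sign changes in d2y mark candidate inflection points.
candidates = t[:-1][np.diff(np.sign(d2y)) != 0]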
But really, this is just theory, since we don't know the nature of the data, or the language, or anything.
For more on the subject, search for "numerical analysis" on Wikipedia and go from there.

I think curve fitting might help you in this case. Here is a discussion which might be handy.
cheers

Related

How do I numerically compare the Dymos solution to the simulated solution?

I want to conduct a convergence study for my Dymos optimization results where I vary the number of nodes and compare the simulated solution to the optimization solution. From what I understand, Dymos fits polynomials to the system dynamics to represent the timeseries solution. What is the best way to compare the polynomial trajectory of the optimization solution to the trajectory of the simulated solution? I specifically want to evaluate the difference between the two trajectories away from the collocation/control nodes... to show that the polynomial fitting actually represents the simulated solution. How would I access the polynomial fitting data?
Thanks in advance.
For some of the testing we have an assert_timeseries_near_equal function that treats the more dense time series as the truth and tests that the less dense timeseries (usually the discrete solution) is reasonably close to it.
We're actually working on making this method a bit more explicit right now, so it's a little easier for users to apply in general cases, such as comparing discrete solutions from two different cases.
In general, there are a few different ways you can test your collocation results against an explicit integration. You could just verify that the final states of the two solutions are reasonably close. Since the error tends to increase over the course of the trajectory, this is often good enough for a quick check. The downside of this approach is that it doesn't test that both solutions took the same path to the final condition.
To test the solution away from the nodes, I'd recommend the following: add a second timeseries output to the relevant phase that contains more segments or higher-order segments. This timeseries will have more nodes, and Dymos will interpolate from the solution's collocation grid onto this denser timeseries output grid. Times, controls, and parameters should still match the explicit simulation exactly, and you'll better capture the interpolating state polynomials versus the explicitly simulated results.
There are other statistical methods out there for comparing timeseries that you can bring to bear, but visualizing the explicit trajectory plus some error bound as a "tube" into which we want to fit the discrete solution is usually how I handle it.
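As a generic sketch of that kind of check (plain NumPy, not the Dymos API; the function name and tolerances are invented for illustration):

import numpy as np

def assert_series_close(t_sparse, y_sparse, t_dense, y_dense, abs_tol=1e-3, rel_tol=1e-3):
    # Treat the dense series as truth: interpolate it at the sparse times,
    # then require the sparse values to stay inside a tolerance "tube".
    y_ref = np.interp(t_sparse, t_dense, y_dense)
    err = np.abs(np.asarray(y_sparse) - y_ref)
    tube = abs_tol + rel_tol * np.abs(y_ref)
    assert np.all(err <= tube), "max error %.3e exceeds tolerance" % err.max()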

Approximating two different curves in R

I have two different density plots in R: one of them is the observed data (x1), and the other is randomly generated data from a Poisson distribution with the observed mean (x2). I would like to bring the curves closer together, i.e. make the expected curve look more like the observed data, as it over- and under-estimates in certain areas. How do I go about doing this? I know you can get the absolute difference between the curves by using
abs(x1 - x2)
However I'm not too sure how to proceed. Anybody have any ideas?
I think if you want to find an analytical solution, you might just have to play with the functions for a while. Otherwise, you could use the calculus of variations: take the difference between the areas under your two functions and minimize it (set the derivative to zero). Formally, you'd take the second derivative to determine whether a critical point is a maximum, minimum, or inflection point, but you may not need to in this case if the function fits the data. I'm not sure what the best program would be for finding an analytical solution, but maybe that will put you on the right track. Just an idea to bounce around.
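For what it's worth, here is a rough Python sketch of the discrete analogue: comparing observed frequencies to a Poisson PMF with the observed mean (the placeholder data and the SciPy usage are my own assumptions, since the question is in R):

import numpy as np
from scipy.stats import poisson

x1 = np.random.poisson(4.2, size=1000)                     # stand-in for the observed counts
ks = np.arange(x1.max() + 1)
observed = np.bincount(x1, minlength=len(ks)) / len(x1)    # empirical frequencies
expected = poisson.pmf(ks, mu=x1.mean())                   # Poisson with observed mean
total_abs_diff = np.abs(observed - expected).sum()         # discrete version of abs(x1 - x2)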

How can I do blind fitting on a list of x, y value pairs if I don't know the form of f(x) = y?

If I have a function f(x) = y that I don't know the form of, and if I have a long list of x and y value pairs (potentially thousands of them), is there a program/package/library that will generate potential forms of f(x)?
Obviously there's a lot of ambiguity to the possible forms of any f(x), so something that produces many non-trivial unique answers (in reduced terms) would be ideal, but something that could produce at least one answer would also be good.
If x and y are derived from observational data (i.e. experimental results), are there programs that can create approximate forms of f(x)? On the other hand, if you know beforehand that there is a completely deterministic relationship between x and y (as in the input and output of a pseudo-random number generator), are there programs that can create exact forms of f(x)?
Soooo, I found the answer to my own question. Cornell has released a piece of software for doing exactly this kind of blind fitting called Eureqa. It has to be one of the most polished pieces of software that I've ever seen come out of an academic lab. It's seriously pretty nifty. Check it out:
It's even got turnkey integration with Amazon's EC2 clusters, so you can offload some of the heavy computational lifting from your local computer onto the cloud at the push of a button, for a very reasonable fee.
I think that I'm going to have to learn more about GUI programming so that I can steal its interface.
(This is more of a numerical methods question.) If there is some kind of observable pattern (you can kinda see the function), then yes, there are several ways you can approximate the original function, but they'll be just that, approximations.
What you want to do is called interpolation. Two very simple (and not very good) methods are Newton's and Lagrange's methods of interpolation. They both work on the same principle but they are implemented differently (Lagrange's is iterative, Newton's is recursive, for one).
If there's not much going on between any two of your data points (i.e., the actual function doesn't have any "bumps" whose "peaks" are not represented by one of your data points), then spline interpolation is one of the best choices you can make. It's a bit harder to implement, but it produces nice results.
Edit: Sometimes, depending on your specific problem, the methods above might be overkill. You may find that linear interpolation (where you just connect the points with straight lines) is a perfectly good solution.
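A small sketch of both options with SciPy and NumPy (the data points are arbitrary):

import numpy as np
from scipy.interpolate import lagrange

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])
poly = lagrange(x, y)        # degree-3 polynomial through all four points (unstable for many points)
print(poly(1.5))             # exact interpolant evaluated between data points
print(np.interp(1.5, x, y))  # plain linear interpolation -> 1.0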
It depends.
If you're using data acquired from the real world, then statistical regression techniques can provide you with some tools to evaluate the best fit; if you have several hypotheses for the form of the function, you can use statistical regression to discover the "best" fit, though you may need to be careful about overfitting a curve: sometimes the best fit (highest correlation) for a specific dataset completely fails to work for future observations.
If, on the other hand, the data was generated synthetically (say, you know it came from a polynomial), then you can use polynomial curve-fitting methods that will give you the exact answer you need.
Yes, there are such things.
If you plot the values and see that there's some functional relationship that makes sense, you can use least squares fitting to calculate the parameter values that minimize the error.
If you don't know what the function should look like, you can use simple spline or interpolation schemes.
You can also use software to guess what the function should be. Maybe something like Maxima can help.
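Going back to the least-squares route: a minimal sketch with SciPy, assuming purely for illustration that the plot suggests an exponential-decay form:

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Hypothesised functional form; swap in whatever the plot suggests.
    return a * np.exp(-b * x)

x = np.linspace(0.0, 5.0, 50)
y = model(x, 2.5, 1.3) + np.random.normal(scale=0.05, size=x.size)  # synthetic noisy data
params, _ = curve_fit(model, x, y, p0=(1.0, 1.0))
print(params)  # should come out close to (2.5, 1.3)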
Wolfram Alpha can help you guess:
http://blog.wolframalpha.com/2011/05/17/plotting-functions-and-graphs-in-wolframalpha/
Polynomial interpolation is the way to go if you have a totally random set:
http://en.wikipedia.org/wiki/Polynomial_interpolation
If your set is nearly linear, then regression will give you a good approximation.
Creating an exact form from the x's and y's alone is mostly impossible.
Note that what you are trying to achieve is at the heart of many machine learning algorithms, and therefore you might find what you are looking for in some specialized libraries.
A list of x/y values N items long can always be reproduced exactly by a polynomial of degree N-1 (assuming no two x values are the same). See this article for more details:
http://en.wikipedia.org/wiki/Polynomial_interpolation
Some lists may also match other function types, such as exponential, sinusoidal, and many others. There is no unique "simplest" matching function; the best you can do is go through a list of common forms (exponential, sinusoidal, etc.) and, if none of them match, fall back to polynomial interpolation.
I'm not aware of any software that can do this for you, though.
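To make the degree-(N-1) point concrete, a short NumPy sketch (the points are arbitrary):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 4.0])
coeffs = np.polyfit(x, y, deg=len(x) - 1)  # degree 3 exactly fits 4 points
print(np.polyval(coeffs, x))               # reproduces y, up to rounding error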

I want to draw a curve and generate a polynomial that closely fits it. How would I go about this?

I have an arbitrary curve (defined by a set of points) and I would like to generate a polynomial that fits that curve to an arbitrary precision. What is the best way to tackle this problem, or is there already a library or online service that performs this task?
Thanks!
If your "arbitrary curve" is described by a set of points (x_i,y_i) where each x_i is unique, and if you mean by "fits" the calculation of the best least-squares polynomial approximation of degree N, you can simply obtain the coefficients b of the polynomial using
b = polyfit(X,Y,N)
where X is the vector of x_i values and Y is the vector of y_i values. You can then increase N until you obtain the accuracy you require. Of course, you can achieve zero approximation error by calculating the interpolating polynomial.
However, data fitting often requires some thought beforehand: you need to decide what you want the approximation to achieve. There are a variety of mathematical ways of assessing approximation error (using different norms), and the right choice depends on your requirements for the resulting approximation. There are also many potential pitfalls (such as overfitting), and blindly attempting to fit curves may result in an approximation that is theoretically sound but utterly useless to you in practical terms. I would suggest doing a little research on approximation theory if the above method does not meet your requirements, as has been suggested in the comments on your question.
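A minimal sketch of that increase-N loop using NumPy's polyfit (the same idea as the MATLAB call above; the sample curve and tolerance are placeholders):

import numpy as np

x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3 * x)        # placeholder for the curve's points
tol = 1e-3               # required accuracy (placeholder)
for n in range(1, len(x)):
    b = np.polyfit(x, y, n)
    residual = np.max(np.abs(np.polyval(b, x) - y))
    if residual < tol:
        break
print(n, residual)       # lowest degree meeting the tolerance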

Finding the Formula for a Curve

Is there a program that will take "response curve" values from me, and provide a formula that approximates the response curve?
It would be cool if such a program would take a numeric "percent correct" (perhaps with a standard deviation) so that it returns simplified formulas when laxity is permissible, and more precise (viz. complex) formulas when the curve needs to be approximated closely.
My interest is to play with the response curve values and "laxity" factor until such a tool spits out a curve-fit formula simple enough that I know it will perform well in machine computations.
Check out Eureqa, a free (as in beer) utility from Cornell University.
What's particularly interesting about Eureqa is that it uses genetic algorithms to fit the curve you specify, and you can say which functions to allow or disallow in the fit. So if you wanted to stay away from sine and cosine, for instance, it wouldn't even consider them. It will also show you the best approximation with the fewest steps, and the most accurate approximation (regardless of steps). You can also run the fitting tool across multiple networked computers to speed up getting your results.
It's a very interesting tool -- check out their how-to videos.
MATLAB, Mathematica, Octave, Maple, NumPy, and Scilab are just six among thousands of programs that will do this.
SigmaPlot does exactly what you're looking for: statistics and visualization of data.
