Fitting polynomials to data - math

Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data?
I know polynomial interpolation, which is for finding a polynomial of degree n given n+1 data points, but here there are a large number of values and we want to find a low-degree polynomial (find best linear fit, best quadratic, best cubic, etc.). It might be related to least squares...
More generally, I would like to know the answer when we have a multivariate function -- points like (x,y,f(x,y)), say -- and want to find the best polynomial (p(x,y)) of a given degree in the variables. (Specifically a polynomial, not splines or Fourier series.)
Both theory and code/libraries (preferably in Python, but any language is okay) would be useful.

Thanks for everyone's replies. Here is another attempt at summarizing them. Pardon if I say too many "obvious" things: I knew nothing about least squares before, so everything was new to me.
NOT polynomial interpolation
Polynomial interpolation is fitting a polynomial of degree n given n+1 data points, e.g. finding a cubic that passes exactly through four given points. As said in the question, this was not want I wanted—I had a lot of points and wanted a small-degree polynomial (which will only approximately fit, unless we've been lucky)—but since some of the answers insisted on talking about it, I should mention them :) Lagrange polynomial, Vandermonde matrix, etc.
What is least-squares?
"Least squares" is a particular definition/criterion/"metric" of "how well" a polynomial fits. (There are others, but this is simplest.) Say you are trying to fit a polynomial
p(x,y) = a + bx + cy + dx2 + ey2 + fxy
to some given data points (xi,yi,Zi) (where "Zi" was "f(xi,yi)" in the question). With least-squares the problem is to find the "best" coefficients (a,b,c,d,e,f), such that what is minimized (kept "least") is the "sum of squared residuals", namely
S = ∑i (a + bxi + cyi + dxi2 + eyi2 + fxiyi - Zi)2
Theory
The important idea is that if you look at S as a function of (a,b,c,d,e,f), then S is minimized at a point at which its gradient is 0. This means that for example ∂S/∂f=0, i.e. that
∑i2(a + … + fxiyi - Zi)xiyi = 0
and similar equations for a, b, c, d, e.
Note that these are just linear equations in a…f. So we can solve them with Gaussian elimination or any of the usual methods.
This is still called "linear least squares", because although the function we wanted was a quadratic polynomial, it is still linear in the parameters (a,b,c,d,e,f). Note that the same thing works when we want p(x,y) to be any "linear combination" of arbitrary functions fj, instead of just a polynomial (= "linear combination of monomials").
Code
For the univariate case (when there is only variable x — the fj are monomials xj), there is Numpy's polyfit:
>>> import numpy
>>> xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> ys = [1.1, 3.9, 11.2, 21.5, 34.8, 51, 70.2, 92.3, 117.4, 145.5]
>>> p = numpy.poly1d(numpy.polyfit(xs, ys, deg=2))
>>> print p
2
1.517 x + 2.483 x + 0.4927
For the multivariate case, or linear least squares in general, there is SciPy. As explained in its documentation, it takes a matrix A of the values fj(xi). (The theory is that it finds the Moore-Penrose pseudoinverse of A.) With our above example involving (xi,yi,Zi), fitting a polynomial means the fj are the monomials x()y(). The following finds the best quadratic (or best polynomial of any other degree, if you change the "degree = 2" line):
from scipy import linalg
import random
n = 20
x = [100*random.random() for i in range(n)]
y = [100*random.random() for i in range(n)]
Z = [(x[i]+y[i])**2 + 0.01*random.random() for i in range(n)]
degree = 2
A = []
for i in range(n):
A.append([])
for xd in range(degree+1):
for yd in range(degree+1-xd):
A[i].append((x[i]**xd)*(y[i]**yd)) #f_j(x_i)
c,_,_,_ = linalg.lstsq(A,Z)
j = 0
for xd in range(0,degree+1):
for yd in range(0,degree+1-xd):
print " + (%.2f)x^%dy^%d" % (c[j], xd, yd),
j += 1
prints
+ (0.01)x^0y^0 + (-0.00)x^0y^1 + (1.00)x^0y^2 + (-0.00)x^1y^0 + (2.00)x^1y^1 + (1.00)x^2y^0
so it has discovered that the polynomial is x2+2xy+y2+0.01. [The last term is sometimes -0.01 and sometimes 0, which is to be expected because of the random noise we added.]
Alternatives to Python+Numpy/Scipy are R and Computer Algebra Systems: Sage, Mathematica, Matlab, Maple. Even Excel might be able to do it. Numerical Recipes discusses methods to implement it ourselves (in C, Fortran).
Concerns
It is strongly influenced by how the points are chosen. When I had x=y=range(20) instead of the random points, it always produced 1.33x2+1.33xy+1.33y2, which was puzzling... until I realised that because I always had x[i]=y[i], the polynomials were the same: x2+2xy+y2 = 4x2 = (4/3)(x2+xy+y2). So the moral is that it is important to choose the points carefully to get the "right" polynomial. (If you can chose, you should choose Chebyshev nodes for polynomial interpolation; not sure if the same is true for least squares as well.)
Overfitting: higher-degree polynomials can always fit the data better. If you change the degree to 3 or 4 or 5, it still mostly recognizes the same quadratic polynomial (coefficients are 0 for higher-degree terms) but for larger degrees, it starts fitting higher-degree polynomials. But even with degree 6, taking larger n (more data points instead of 20, say 200) still fits the quadratic polynomial. So the moral is to avoid overfitting, for which it might help to take as many data points as possible.
There might be issues of numerical stability I don't fully understand.
If you don't need a polynomial, you can obtain better fits with other kinds of functions, e.g. splines (piecewise polynomials).

Yes, the way this is typically done is by using least squares. There are other ways of specifying how well a polynomial fits, but the theory is simplest for least squares. The general theory is called linear regression.
Your best bet is probably to start with Numerical Recipes.
R is free and will do everything you want and more, but it has a big learning curve.
If you have access to Mathematica, you can use the Fit function to do a least squares fit. I imagine Matlab and its open source counterpart Octave have a similar function.

For (x, f(x)) case:
import numpy
x = numpy.arange(10)
y = x**2
coeffs = numpy.polyfit(x, y, deg=2)
poly = numpy.poly1d(coeffs)
print poly
yp = numpy.polyval(poly, x)
print (yp-y)

Bare in mind that a polynomial of higher degree ALWAYS fits the data better. Polynomials of higher degree typically leads to highly improbable functions (see Occam's Razor), though (overfitting). You want to find a balance between simplicity (degree of polynomial) and fit (e.g. least square error). Quantitatively, there are tests for this, the Akaike Information Criterion or the Bayesian Information Criterion. These tests give a score which model is to be prefered.

If you want to fit the (xi, f(xi)) to an polynomial of degree n then you would set up a linear least squares problem with the data (1, xi, xi, xi^2, ..., xi^n, f(xi) ). This will return a set of coefficients (c0, c1, ..., cn) so that the best fitting polynomial is *y = c0 + c1 * x + c2 * x^2 + ... + cn * x^n.*
You can generalize this two more than one dependent variable by including powers of y and combinations of x and y in the problem.

Lagrange polynomials (as #j w posted) give you an exact fit at the points you specify, but with polynomials of degree more than say 5 or 6 you can run into numerical instability.
Least squares gives you the "best fit" polynomial with error defined as the sum of squares of the individual errors. (take the distance along the y-axis between the points you have and the function that results, square them, and sum them up) The MATLAB polyfit function does this, and with multiple return arguments, you can have it automatically take care of scaling/offset issues (e.g. if you have 100 points all between x=312.1 and 312.3, and you want a 6th degree polynomial, you're going to want to calculate u = (x-312.2)/0.1 so the u-values are distributed between -1 and +=).
NOTE that the results of least-squares fits are strongly influenced by the distribution of x-axis values. If the x-values are equally spaced, then you'll get larger errors at the ends. If you have a case where you can choose the x values and you care about the maximum deviation from your known function and an interpolating polynomial, then the use of Chebyshev polynomials will give you something that is close to the perfect minimax polynomial (which is very hard to calculate). This is discussed at some length in Numerical Recipes.
Edit: From what I gather, this all works well for functions of one variable. For multivariate functions it is likely to be much more difficult if the degree is more than, say, 2. I did find a reference on Google Books.

at college we had this book which I still find extremely useful: Conte, de Boor; elementary numerical analysis; Mc Grow Hill. The relevant paragraph is 6.2: Data Fitting.
example code comes in FORTRAN, and the listings are not very readable either, but the explanations are deep and clear at the same time. you end up understanding what you are doing, not just doing it (as is my experience of Numerical Recipes).
I usually start with Numerical Recipes but for things like this I quickly have to grab Conte-de Boor.
maybe better posting some code... it's a bit stripped down, but the most relevant parts are there. it relies on numpy, obviously!
def Tn(n, x):
if n==0:
return 1.0
elif n==1:
return float(x)
else:
return (2.0 * x * Tn(n - 1, x)) - Tn(n - 2, x)
class ChebyshevFit:
def __init__(self):
self.Tn = Memoize(Tn)
def fit(self, data, degree=None):
"""fit the data by a 'minimal squares' linear combination of chebyshev polinomials.
cfr: Conte, de Boor; elementary numerical analysis; Mc Grow Hill (6.2: Data Fitting)
"""
if degree is None:
degree = 5
data = sorted(data)
self.range = start, end = (min(data)[0], max(data)[0])
self.halfwidth = (end - start) / 2.0
vec_x = [(x - start - self.halfwidth)/self.halfwidth for (x, y) in data]
vec_f = [y for (x, y) in data]
mat_phi = [numpy.array([self.Tn(i, x) for x in vec_x]) for i in range(degree+1)]
mat_A = numpy.inner(mat_phi, mat_phi)
vec_b = numpy.inner(vec_f, mat_phi)
self.coefficients = numpy.linalg.solve(mat_A, vec_b)
self.degree = degree
def evaluate(self, x):
"""use Clenshaw algorithm
http://en.wikipedia.org/wiki/Clenshaw_algorithm
"""
x = (x-self.range[0]-self.halfwidth) / self.halfwidth
b_2 = float(self.coefficients[self.degree])
b_1 = 2 * x * b_2 + float(self.coefficients[self.degree - 1])
for i in range(2, self.degree):
b_1, b_2 = 2.0 * x * b_1 + self.coefficients[self.degree - i] - b_2, b_1
else:
b_0 = x*b_1 + self.coefficients[0] - b_2
return b_0

Remember, there's a big difference between approximating the polynomial and finding an exact one.
For example, if I give you 4 points, you could
Approximate a line with a method like least squares
Approximate a parabola with a method like least squares
Find an exact cubic function through these four points.
Be sure to select the method that's right for you!

It's rather easy to scare up a quick fit using Excel's matrix functions if you know how to represent the least squares problem as a linear algebra problem. (That depends on how reliable you think Excel is as a linear algebra solver.)

The lagrange polynomial is in some sense the "simplest" interpolating polynomial that fits a given set of data points.
It is sometimes problematic because it can vary wildly between data points.

Related

Split a cubic Bézier curve at a point

This question and this question both show how to split a cubic Bézier curve at a particular parameterized value 0 ≤ t ≤ 1 along the curve, composing the original curve shape from two new segments. I need to split my Bézier curve at a point along the curve whose coordinate I know, but not the parameterized value t for the point.
For example, consider Adobe Illustrator, where the user can click on a curve to add a point into the path, without affecting the shape of the path.
Assuming I find the point on the curve closest to where the user clicks, how do I calculate the control points from this? Is there a formula to split a Bézier curve given a point on the curve?
Alternatively (and less desirably), given a point on the curve, is there a way to determine the parameterized value t corresponding to that point (other than using De Casteljau's algorithm in a binary search)?
My Bézier curve happens to only be in 2D, but a great answer would include the vector math needed to apply in arbitrary dimensions.
It is possible, and perhaps simpler, to determine the parametric value of a point on the curve without using De Casteljau's algorithm, but you will have to use heuristics to find a good starting value and similarly approximate the result.
One possible, and fairly simple way is to use Newton's method such that:
tn+1 = tn - ( bx(tn) - cx ) / bx'(tn)
Where bx(t) refers to the x component of some Bezier curve in polynomial form with the control points x0, x1, x2 and x3, bx'(t) is the first derivative and cx is a point on the curve such that:
cx = bx(t) | 0 < t < 1
the coefficients of bx(t) are:
A = -x0 + 3x1 - 3x2 + x3
B = 3x0 - 6x1 + 3x2
C = -3x0 + 3x1
D = x0
and:
bx(t) = At3 + Bt2 + Ct + D,
bx'(t) = 3At2 + 2Bt + C
Now finding a good starting value to plug into Newton's method is the tricky part. For most curves which do not contain loops or cusps, you can simply use the formula:
tn = ( cx - x0 ) / ( x3 - x0 ) | x0 < x1 < x2 < x3
Now you already have:
bx(tn) ≈ cx
So applying one or more iterations of Newton's method will give a better approximation of t for cx.
Note that the Newton Raphson algorithm has quadratic convergence. In most cases a good starting value will result in negligible improvement after two iterations, i.e. less than half a pixel.
Finally it's worth noting that cubic Bezier curves have exact solutions for finding extrema via finding roots of the first derivative. So curves which are problematic can simply be subdivided at their extrema to remove loops or cusps, then better results can be obtained by analyzing the resulting section in question. Subdividing cubics in this way will satisfy the above constraint.

How to perform a Multivariate Polynomial Regression when output has stochastic behavior?

I have a experiment being simulated. This experiment has 3 parameters a,b,c (variables?) but the result, r, cannot be "predicted" as it has a stochastic component. In order to minimize the stochastic component I've run this experiment several times(n). So in resume I have n 4-tuples a,b,c,r where a,b,c are the same but r varies. And each batch of experiments is run with different values for a, b, c (k batches) making the complete data-set having k times n sets of 4-tuples.
I would like to find out the best polynomial fit for this data and how to compare them like:
fit1: with
fit2: with
fit3: some 3rd degree polynomial function and corresponding error
fit4: another 3rd degree (simpler) polynomial function and corresponding error
and so on...
This could be done with R or Matlab®. I've searched and found many examples but none handled same input values with different outputs.
I considered doing the multivariate polynomial regression n times adding some small delta to each parameter but I'd rather take a cleaner sollution before that.
Any help would be appreciated.
Thanks in advance,
Jacques
Polynomial regression should be able to handle stochastic simulations just fine. Just simulate r, n times, and perform a multivariate polynomial regression across all points you've simulated (I recommend polyfitn()).
You'll have multiple r values for the same [a,b,c] but a well-fit curve should be able to estimate the true distribution.
In polyfitn it will look something like this
n = 1000;
a = rand(500,1);
b = rand(500,1);
c = rand(500,1);
for n = 1:1000
for i = 1:length(a)
r(n,i) = foo(a,b,c);
end
end
my_functions = {'a^2 b^2 c^2 a b c',...};
for fun_id = 1:length(my_functions)
p{f_id} = polyfitn(repmat([a,b,c],[n,1]),r(:),myfunctions{fun_id})
end
It's not hard to iteratively/recursively generate a set of polynomial equations from a basis function; but for three variables there might not be a need to. Unless you have a specific reason for fitting higher order polynomials (planetary physics, particle physics, etc. physics), you shouldn't have too many functions to fit. It is generally not good practice to use higher-order polynomials to explain data unless you have a specific reason for doing so (risk of overfitting, sparse data inter-variable noise, more accurate non-linear methods).

Mistake in Chi-Square Distribution

I'm trying to generate chi-square random variables according to the following algorithm:
where a(i) are independent, standard normal random variables witn m even and odd
respectively.
Wikipedia gives the following definition:
https://en.wikipedia.org/wiki/Chi-squared_distribution#Definition
The code I wrote is:
dch = double(1000)
t = double(1)
for(i in 1:1000) {
for(j in 1:m) {
x = runif(1, 0, 1)
t = t + x*x
}
dch[i] = t
}
but I am getting the wrong density plot.
So, where is/are the mistake/s and how can I fix them?
As Gregor suggested in comments, you are misinterpreting the inputs to the algorithm. One way to get a Chi-squared with m degrees of freedom is to sum m independent squared standard normals, but that's not the only distributional relationship we know. It turns out that a Chi-squared(2) is the same as an exponential distribution with a mean of 2, and exponentials are straightforward to generate with inverse transform sampling, a.k.a. inversion. So in principle, if m is even you want to generate m/2 exponential(2)'s and sum them. If m is odd, do the same but add one additional standard normal squared.
What all that means is that a straightforward implementation would have you doing m/2 logarithmic evaluations to generate the exponentials. It turns out you can apply the superposition property of exponentials so you only have to do one log evaluation. Since transcendental functions are computationally expensive, this improves the efficiency of the algorithm.
When the dust settles - the z on the second line of your algorithm is a standard normal, but the a's are Uniform(0,1)'s, not normals.

What mathematical methods work for interpolation 2d to 2d functions?

So we have a matrix like
12,32
24,12
...
with length 2xN and another
44,32
44,19
...
with length 2xN and there is some function f(x, y) that returns z[1], z[2]. That 2 matrices that we were given represent known value pairs for x,y and z[1],z[2]. What are interpolation formulas that would help in such case?
If you solve the problem for one return value, you can find two functions f_1(x,y) and f_2(x,y) by interpolation, and compose your function as f(x, y) = [f_1(x,y), f_2(x,y)]. Just pick any method for solving the interpolation function suitable for your problem.
For the actual interpolation problem in two dimensions, there are a lot of ways you can handle this. If simple is what you require, you can go with linear interpolation. If you are OK with piecewise functions, you can go for bezier curves, or splines. Or, if data is uniform, you could get away with a simple polynomial interpolation (well, not quite trivial when in 2D, but easy enough).
EDIT: More information and some links.
A piecewise solution is possible using Bilinear interpolation (wikipedia).
For polynomial interpolation, if your data is on a grid, you can use the following algorithm (I cannot find the reference for it, it is from memory).
If the data points are on a k by l grid, rewrite your polynomial as follows:
f(x,y) = cx_1(x)*y^(k-1) + cx_2(x)*y^(k-2) + ... + cx_k(x)
Here, each coefficient cx_i(x) is also a polynomial of degree l. The first step is to find k polynomials of degree l by interpolating each row or column of the grid. When this is done, you have l coefficient sets (or, in other words, l polynomials) as interpolation points for each cx_i(x) polynomials as cx_i(x0), cx_i(x1), ..., cx_i(xl) (giving you a total of l*k points). Now, you can determine these polynomials using the above constants as the interpolation points, which give you the resulting f(x,y).
The same method is used for bezier curves or splines. The only difference is that you use control points instead of polynomial coefficients. You first get a set of splines that will generate your data points, and then you interpolate the control points of these intermediate curves to get the control points of the surface curve.
Let me add an example to clarify the above algorithm. Let's have the following data points:
0,0 => 1
0,1 => 2
1,0 => 3
1,1 => 4
We start by fitting two polynomials: one for data points (0,0) and (0,1), and another for (1, 0) and (1, 1):
f_0(x) = x + 1
f_1(x) = x + 3
Now, we interpolate in the other direction to determine the coefficients.When we read these polynomial coefficients vertically, we need two polynomials. One evaluates to 1 at both 0 and 1; and another that evaluates to 1 at 0, and 3 at 1:
cy_1(y) = 1
cy_2(y) = 2*y + 1
If we combine these into f(x,y), we get:
f(x,y) = cy_1(y)*x + cy_2(y)
= 1*x + (2*y + 1)*1
= x + 2*y + 1

how to evaluate derivative of function in matlab?

This should be very simple. I have a function f(x), and I want to evaluate f'(x) for a given x in MATLAB.
All my searches have come up with symbolic math, which is not what I need, I need numerical differentiation.
E.g. if I define: fx = inline('x.^2')
I want to find say f'(3), which would be 6, I don't want to find 2x
If your function is known to be twice differentiable, use
f'(x) = (f(x + h) - f(x - h)) / 2h
which is second order accurate in h. If it is only once differentiable, use
f'(x) = (f(x + h) - f(x)) / h (*)
which is first order in h.
This is theory. In practice, things are quite tricky. I'll take the second formula (first order) as the analysis is simpler. Do the second order one as an exercise.
The very first observation is that you must make sure that (x + h) - x = h, otherwise you get huge errors. Indeed, f(x + h) and f(x) are close to each other (say 2.0456 and 2.0467), and when you substract them, you lose a lot of significant figures (here it is 0.0011, which has 3 significant figures less than x). So any error on h is likely to have a huge impact on the result.
So, first step, fix a candidate h (I'll show you in a minute how to chose it), and take as h for your computation the quantity h' = (x + h) - x. If you are using a language like C, you must take care to define h or x as volatile for that computation not to be optimized away.
Next, the choice of h. The error in (*) has two parts: the truncation error and the roundoff error. The truncation error is because the formula is not exact:
(f(x + h) - f(x)) / h = f'(x) + e1(h)
where e1(h) = h / 2 * sup_{x in [0,h]} |f''(x)|.
The roundoff error comes from the fact that f(x + h) and f(x) are close to each other. It can be estimated roughly as
e2(h) ~ epsilon_f |f(x) / h|
where epsilon_f is the relative precision in the computation of f(x) (or f(x + h), which is close). This has to be assessed from your problem. For simple functions, epsilon_f can be taken as the machine epsilon. For more complicated ones, it can be worse than that by orders of magnitude.
So you want h which minimizes e1(h) + e2(h). Plugging everything together and optimizing in h yields
h ~ sqrt(2 * epsilon_f * f / f'')
which has to be estimated from your function. You can take rough estimates. When in doubt, take h ~ sqrt(epsilon) where epsilon = machine accuracy. For the optimal choice of h, the relative accuracy to which the derivative is known is sqrt(epsilon_f), ie. half the significant figures are correct.
In short: too small a h => roundoff error, too large a h => truncation error.
For the second order formula, same computation yields
h ~ (6 * epsilon_f / f''')^(1/3)
and a fractional accuracy of (epsilon_f)^(2/3) for the derivative (which is typically one or two significant figures better than the first order formula, assuming double precision).
If this is too imprecise, feel free to ask for more methods, there are a lot of tricks to get better accuracy. Richardson extrapolation is a good start for smooth functions. But those methods typically compute f quite a few times, this may or not be what you want if your function is complex.
If you are going to use numerical derivatives a lot of times at different points, it becomes interesting to construct a Chebyshev approximation.
To get a numerical difference (symmetric difference), you calculate (f(x+dx)-f(x-dx))/(2*dx)
fx = #(x)x.^2;
fPrimeAt3 = (fx(3.1)-fx(2.9))/0.2;
Alternatively, you can create a vector of function values and apply DIFF, i.e.
xValues = 2:0.1:4;
fValues = fx(xValues);
df = diff(fValues)./0.1;
Note that diff takes the forward difference, and that it assumes that dx equals to 1.
However, in your case, you may be better off to define fx as a polynomial, and evaluating the derivative of the function, rather than the function values.
Lacking the symbolic toolbox, nothing stops you from using Derivest, a tool for automatic adaptive numerical differentiation.
derivest(#sin,pi)
ans =
-1
For your example it does very nicely. In fact, it even provides an estimate of the error in the resulting approximation.
fx = inline('x.^2');
[fp,errest] = derivest(fx,3)
fp =
6
errest =
3.6308e-14
did you try diff (calculates differences and approximates a derivative), gradient, or polyder (calculates the derivative of a polynomial) functions?
You can read more on these functions by using help <commandname> on MATLAB console, or use the function browser in the Help menu.
For a given function in analytical form, you can evaluate the derivative at a desired point with the following code:
syms x
df = diff(x^2);
df3 = subs(df, 'x', 3);
fprintf('f''(3)=%f\n', df3);
For pure numerical derivatives use the already given solutions by Jonas and posdef.

Resources