Drawing random values from a Fisher distribution

In my research, I am generating discrete planes that are intended to represent fractures in rock. The orientation of a fracture plane is specified by its dip and dip direction. Knowing this, I also know the components of the normal vector for each plane.
So far, I have been drawing dip and dip direction independently from normal distributions. This is OK, but I would like to add the ability to draw from the Fisher distribution.
The Fisher distribution is described here.
Basically, I want to be able to specify an average dip and dip direction (or a mean vector) and a "fisher constant" or dispersion factor, k, and draw values randomly from that orientation distribution.
Additional info: it seems like the "von Mises-Fisher distribution" is either the same as what I've been calling the "Fisher distribution" or is somehow related.
As you can see, I've done some looking into this, but I admit that I don't fully understand the mathematics. I feel like I'm close, but am not quite getting it... Any help is much appreciated!
If it helps, my programming is in FORTRAN.

The algorithm is on page 59 of "Statistical analysis of spherical data" by N. I. Fisher, T. Lewis and B. J. J. Embleton. I highly recommend that book -- it will help you understand the mathematics.
The following will produce random Fisher-distributed locations centered on the North pole. If you want them randomly centered, produce additional uniform random locations on the sphere and rotate the pole-centered samples so that they are centered on those locations. If you are not sure of those steps, consult the aforementioned book. This Fortran code fragment uses a random number generator that produces uniform deviates from 0 to 1.
lambda = exp (-2.0 * kappa)                                     ! kappa is the dispersion parameter (> 0)
term1 = get_uniform_random () * (1.0 - lambda) + lambda         ! uniform deviate mapped onto [lambda, 1]
CoLat = 2.0 * asin ( sqrt ( -log (term1) / (2.0 * kappa) ) )    ! colatitude measured from the pole, in radians
Long = 2.0 * PI * get_uniform_random ()                         ! longitude uniform on [0, 2*PI)
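For reference (I know the question mentions FORTRAN, but this may help make the steps explicit), here is a rough NumPy sketch of the same recipe plus the rotation onto an arbitrary mean direction. The function names and the Rodrigues-rotation helper are my own, not from the book; converting the mean dip/dip direction to a unit normal vector (and back) is left out.

import numpy as np

def fisher_about_pole(kappa, n):
    # Unit vectors drawn from a Fisher distribution centered on the north pole (0, 0, 1).
    lam = np.exp(-2.0 * kappa)
    u = np.random.uniform(size=n)
    colat = 2.0 * np.arcsin(np.sqrt(-np.log(u * (1.0 - lam) + lam) / (2.0 * kappa)))
    lon = 2.0 * np.pi * np.random.uniform(size=n)
    return np.column_stack((np.sin(colat) * np.cos(lon),
                            np.sin(colat) * np.sin(lon),
                            np.cos(colat)))

def rotate_pole_to(samples, mean_dir):
    # Rotate pole-centered samples so that they scatter about mean_dir (a unit 3-vector).
    pole = np.array([0.0, 0.0, 1.0])
    v = np.cross(pole, mean_dir)                      # rotation axis (unnormalised)
    c = np.dot(pole, mean_dir)                        # cosine of the rotation angle
    if np.allclose(v, 0.0):                           # mean_dir is already +/- the pole
        return samples if c > 0 else -samples
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    R = np.eye(3) + vx + np.dot(vx, vx) / (1.0 + c)   # Rodrigues' rotation formula
    return np.dot(samples, R.T)

normals = rotate_pole_to(fisher_about_pole(kappa=50.0, n=1000),
                         mean_dir=np.array([0.5, 0.5, np.sqrt(0.5)]))   # pick your own mean direction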

I think that you can do the math by hand
Integrate the density function of the Fisher Distribution to get the cumulative distribution function
F(theta) = exp(k cos(theta)) / (exp(k) - exp(-k))
The next step is to find the inverse cumulative distribution function, F^(-1)(y). This function satisfies
F(theta) = y  <=>  F^(-1)(y) = theta
I think that you get the following.
F^(-1)(y) = arccos(log((exp(k) - exp(-k)) * y) / k)
Draw y1, y2, y3, y4... from a uniform distribution on the interval [0, 1]
Now, the numbers F^(-1)(y1), F^(-1)(y2), F^(-1)(y3), F^(-1)(y4), ... will be distributed according to the Fisher distribution.
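If pinning down the closed-form inverse proves fiddly, the same idea also works numerically: tabulate the CDF of the colatitude density (proportional to sin(theta)*exp(k cos(theta)) once the spherical area element is included), invert it by interpolation, and feed it uniform draws. A minimal NumPy sketch, with an arbitrary grid size:

import numpy as np

def fisher_colatitudes(kappa, n, grid=10000):
    # Colatitudes (radians) drawn by numerically inverting the Fisher CDF.
    theta = np.linspace(0.0, np.pi, grid)
    pdf = np.sin(theta) * np.exp(kappa * np.cos(theta))       # unnormalised colatitude density
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]                                            # normalise so the CDF ends at 1
    return np.interp(np.random.uniform(size=n), cdf, theta)   # F^(-1) by interpolation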

Related

Simulate Inhomogeneous Poisson Process by Gaussian Scattering

I am currently trying to redo plots which can be found on p. 120 of the textbook "Statistical Analysis and Modelling of Spatial Point Patterns". The following information should be sufficient to help me without having to look into the mentioned textbook. Using the fantastic spatstat package, I try to simulate point patterns in the unit square resulting from inhomogeneous Poisson point processes (IPP) with the intensity functions (a) $\lambda(x,y)=a*(x+y)$ (linear trend) and (b) $\lambda(r)=c*exp(-dr^2)$, with $r$ being the distance from the origin.
For (a) I did the following:
library(spatstat)
linear <- function(x,y,a) {a*(x+y)}
plot(rpoispp(lambda = linear, a=150))
The resulting plot is not too bad, as far as I understand it. I am unable to figure out how to implement (b) and would appreciate any help.
Hopefully, understanding how the implementation of (b) works will help me fit a model to an observed point pattern (with only a few clusters, probably one) that is likely to stem from an IPP, using ppm(pattern, function describing the simple model) or kppm.
Note: the reason I am asking this question is self-interest. I could easily retrieve the plots from the source, but this does not help me understand how to implement intensities, or how to create and fit simple models to observed point patterns.
If my question is answered elsewhere I would appreciate the provision of links. Thank you!
If you want to code an intensity function as a function in the R language, then it should be a function of the spatial location (x,y).
In (b) the intensity function is $\lambda(x,y) = c*exp(-d(x^2 + y^2))$, where we use the fact that the distance from the origin (0,0) to the point (x, y) is $r = sqrt(x^2 + y^2)$. The code is
lam <- function(x,y,c,d) { c * exp(- d * (x^2 + y^2)) }
In this example the value of lambda(x,y) depends only on the distance r, so we say loosely that "the intensity is a function of r", which may be the source of your confusion.
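If it also helps to see the mechanics rather than rely on rpoispp, below is a rough Python/NumPy sketch of the usual thinning construction for this intensity on the unit square. The parameter values c, d and the bound lmax are arbitrary; I believe rpoispp does something broadly similar internally when lambda is a function.

import numpy as np

def ipp_by_thinning(intensity, lmax, **params):
    # Inhomogeneous Poisson points on the unit square via thinning.
    # intensity(x, y, **params) is the target intensity; lmax must bound it from above.
    n = np.random.poisson(lmax)                          # homogeneous proposal of rate lmax
    x, y = np.random.uniform(size=n), np.random.uniform(size=n)
    keep = np.random.uniform(size=n) < intensity(x, y, **params) / lmax
    return x[keep], y[keep]

lam = lambda x, y, c, d: c * np.exp(-d * (x**2 + y**2))
px, py = ipp_by_thinning(lam, lmax=100.0, c=100.0, d=5.0)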

Modified exponential distribution in R

I would like to make my own probability density function in R to simulate some things from a paper.
It is somewhat similar to the exponential distribution, but what I really want to do is to redefine the exponential distribution into a "modified" one...
Is there such a way to do this? Thanks.
I want to simulate this:
b(x) = (µ/p) * exp(-(µx - q)/p)  for x > q*x̄,  and 0 otherwise
(x̄ is an x with a line above it, i.e. the mean/average of x.)
Are you aware that exponential distributions truncated from below remain exponential distributions?
I believe you can do what you want with ?sample . Use your known distribution function b(x) to generate a vector of probabilities, say bprob , then
sample(x,size, replace=TRUE,prob=bprob)
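The same discretise-and-sample idea in Python, if that is any help; the grid and the values of µ, p, q, x̄ below are made up purely for illustration.

import numpy as np

mu, p, q, xbar = 1.0, 2.0, 0.5, 1.0                  # placeholder parameter values
x = np.linspace(0.0, 20.0, 2001)                     # grid covering the support of b(x)
b = np.where(x > q * xbar, (mu / p) * np.exp(-(mu * x - q) / p), 0.0)
bprob = b / b.sum()                                  # normalise the weights to probabilities
draws = np.random.choice(x, size=10000, replace=True, p=bprob)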
There are some very interesting methods for generating samples from arbitrary distributions. See, for example, "Normal Random Numbers: Using Machine Analysis to Choose the Best Algorithm" by W. H. Payne, Washington State University (somewhere on the web), or Numerical Recipes in C: The Art of Scientific Computing, Second Edition, by William H. Press, Saul A. Teukolsky et al., chapter 7.3.

Runge-Kutta (RK4) integration for game physics

Gaffer on Games has a great article about using RK4 integration for better game physics. The implementation is straightforward, but the math behind it confuses me. I understand derivatives and integrals on a conceptual level, but haven't manipulated equations in a long while.
Here's the brunt of Gaffer's implementation:
void integrate(State &state, float t, float dt)
{
Derivative a = evaluate(state, t, 0.0f, Derivative());
Derivative b = evaluate(state, t+dt*0.5f, dt*0.5f, a);
Derivative c = evaluate(state, t+dt*0.5f, dt*0.5f, b);
Derivative d = evaluate(state, t+dt, dt, c);
const float dxdt = 1.0f/6.0f * (a.dx + 2.0f*(b.dx + c.dx) + d.dx);
const float dvdt = 1.0f/6.0f * (a.dv + 2.0f*(b.dv + c.dv) + d.dv);
state.x = state.x + dxdt * dt;
state.v = state.v + dvdt * dt;
}
Can anybody explain in simple terms how RK4 works? Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f? How is averaging derivatives up to the 4th order different from doing a simple Euler integration with a smaller timestep?
After reading the accepted answer below, and several other articles, I have a grasp on how RK4 works. To answer my own questions:
Can anybody explain in simple terms how RK4 works?
RK4 takes advantage of the fact that we can get a much better approximation of a function if we use its higher-order derivatives rather than just the first or second derivative. That's why the Taylor series converges much faster than Euler approximations. (Take a look at the animation on the right side of that page.)
Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f?
The Runge-Kutta method is an approximation of a function that samples derivatives at several points within a timestep, unlike the Taylor series, which only samples derivatives at a single point. After sampling these derivatives we need to know how to weight each sample to get the closest approximation possible. An easy way to do this is to pick constants that coincide with the Taylor series, which is how the constants of a Runge-Kutta equation are determined.
This article made it clearer for me. Notice how (15) is the Taylor series expansion while (17) is the Runge-Kutta derivation.
How is averaging derivatives up to the 4th order different from doing a simple Euler integration with a smaller timestep?
Mathematically, it converges much faster than doing many Euler approximations. Of course, with enough Euler approximations we can reach equal accuracy to RK4, but the computational power needed doesn't justify using Euler.
This may be a bit oversimplified as far as the actual math goes, but it is meant as an intuitive guide to Runge-Kutta integration.
Given some quantity at some time t1, we want to know the quantity at another time t2. With a first-order differential equation, we can know the rate of change of that quantity at t1. There is nothing else we can know for sure; the rest is guessing.
Euler integration is the simplest way to guess: linearly extrapolate from t1 to t2, using the precisely known rate of change at t1. This usually gives a bad answer. If t2 is far from t1, this linear extrapolation will fail to match any curvature in the ideal answer. If we take many small steps from t1 to t2, we'll have the problem of subtraction of similar values. Roundoff errors will ruin the result.
So we refine our guess. One way is to go ahead and do this linear extrapolation anyway, then, hoping it's not too far off from the truth, use the differential equation to compute an estimate of the rate of change at t2. This, averaged with the (accurate) rate of change at t1, better represents the typical slope of the true answer between t1 and t2. We use this to make a fresh linear extrapolation from t1 to t2. It's not obvious whether we should take the simple average, or give more weight to the rate at t1, without doing the math to estimate errors, but there is a choice here. In any case, it's a better answer than Euler gives.
Perhaps better, make our initial linear extrapolation to a point in time midway between t1 and t2, and use the differential equation to compute the rate of change there. This gives roughly as good an answer as the average just described. Then use this for a linear extrapolation from t1 to t2, since our purpose is to find the quantity at t2. This is the midpoint algorithm.
You can imagine using the midpoint estimate of the rate of change to make another linear extrapolation of the quantity from t1 to the midpoint. With the differential equation we get a better estimate of the slope there. Using this, we end by extrapolating from t1 all the way to t2, where we want an answer. This is the Runge-Kutta algorithm.
Could we do a third extrapolation to the midpoint? Sure, it's not illegal, but detailed analysis shows diminishing improvement, such that other sources of error dominate the final result.
Runge-Kutta applies the differential equation at the initial point t1, twice at the midpoint, and once at the final point t2. The in-between points are a matter of choice. It is possible to use other points between t1 and t2 for making those improved estimates of the slope. For example, we could use t1, a point one third of the way toward t2, another 2/3 of the way toward t2, and t2. The weights for the average of the four derivatives will be different. In practice this doesn't really help, but it might have a place in testing, since it ought to give the same answer but will provide a different set of round-off errors.
As to your question why: I recall once writing a cloth simulator where the cloth was a series of springs interconnected at nodes. In the simulator, the force exerted by the spring is proportional to how far the spring is stretched. The force causes acceleration at the node, which causes velocity which moves the node which stretches the spring. There are two integrals (integrating acceleration to get velocity, and integrating velocity to get position) and if they are inaccurate, the errors snowball: Too much acceleration causes too much velocity which causes too much stretch which causes even more acceleration, making the whole system unstable.
It is difficult to explain without graphics, but I'll try: Say you have f(t), where f(0) = 10, f(1) = 20, and f(2) = 30.
A proper integration of f(t) over the interval 0 < t < 1 would give you the surface under the graph of f(t) over that interval.
The rectangle-rule integration approximates that surface with a rectangle whose breadth is the delta in time and whose length is the new value of f(t); so in the interval 0 < t < 1 it will yield 20 * 1 = 20, and in the next interval 1 < t < 2 it will yield 30 * 1 = 30, for a total of 50.
Now if you were to plot these points and draw a line through them, you'll see that the region under that line is actually a trapezoid with an area of 40 (units), so this simple rectangle-rule (Euler-style) estimate overshoots it and is inadequate.
To get a more accurate estimation of the surface (integral) you can take smaller intervals of t, evaluating at for example f(0), f(0.5), f(1), f(1.5) and f(2).
If you're still following me, the RK4 method is then simply a way of estimating values of f(t) for t0 < t < t0+dt invented by people smarter than myself for getting accurate estimates of the integral.
(but as others have said, read the Wikipedia article for a more detailed explanation. RK4 is in the category of numerical integration)
RK4, in the simplest sense, builds an approximating function for each time step from four slopes: your initial condition at starting point A; a first approximated slope B, based on data point A, evaluated half a time step ahead using the slope from A; a third approximation C, which carries a correction to the slope at B to reflect how the shape of your function is changing; and finally a final slope based on the corrected slope at point C.
So basically this method lets you calculate using a starting point, an averaged midpoint which has corrections built into both parts to adjust for the shape, and a doubly corrected endpoint. This makes the effective contribution from each data point 1/6, 1/3, 1/3 and 1/6, so most of your answer is based on your corrections for the shape of your function.
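To make those weights concrete, here is a minimal sketch (mine, not Gaffer's State/Derivative code) of one classical RK4 step for a generic first-order system dy/dt = f(t, y):

def rk4_step(f, t, y, dt):
    # One classical RK4 step for dy/dt = f(t, y); y may be a float or a NumPy array.
    k1 = f(t, y)                           # slope at the start           (weight 1/6)
    k2 = f(t + dt / 2, y + dt / 2 * k1)    # midpoint slope using k1      (weight 1/3)
    k3 = f(t + dt / 2, y + dt / 2 * k2)    # corrected midpoint slope     (weight 1/3)
    k4 = f(t + dt, y + dt * k3)            # end-of-step slope using k3   (weight 1/6)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

For example, rk4_step(lambda t, y: -y, 0.0, 1.0, 0.1) returns about 0.904837, essentially exp(-0.1).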
It turns out that the order of an RK approximation (Euler is considered an RK1) corresponds to how its accuracy scales with smaller time steps.
For RK1 the error scales linearly with the step size, so a step 10 times smaller gives you roughly 10 times better accuracy.
For RK4, a step 10 times smaller yields roughly 10^4 times better accuracy. So while the calculation time per step in RK4 increases only by a constant factor (four derivative evaluations instead of one), the accuracy improves polynomially in the step size.

approximation methods

I attached image:
(source: piccy.info)
So in this image there is a diagram of the function, which is defined on the given points.
For example on points x=1..N.
Another diagram was drawn as a semitransparent curve; that is what I want to get from the original diagram, i.e. I want to approximate the original function so that it becomes smooth.
Are there any methods for doing that?
I have heard about the least squares method, which can be used to approximate a function by a straight line or by a parabolic function. But I do not need to approximate by a parabolic function.
I probably need to approximate it by a trigonometric function.
So are there any methods for doing that?
And one idea: is it possible to use the least squares method for this problem, if we can derive it for trigonometric functions?
One more question!
If I use the discrete Fourier transform and think of the function as a sum of waves, then maybe the noise has special features by which we can identify it; we could then set the corresponding frequencies to zero and perform the inverse Fourier transform.
So if you think this is possible, what can you suggest for identifying the frequencies of the noise?
Unfortunately, many of the solutions presented here don't solve the problem and/or are plain wrong.
There are many approaches, and each is built for specific conditions and requirements you must be aware of!
a) Approximation theory: If you have a sharply defined function without errors (given either by definition or by data) and you want to trace it as exactly as possible, you use polynomial or rational approximation with Chebyshev or Legendre polynomials, meaning that you approximate the function by a polynomial or, if it is periodic, by a Fourier series.
b) Interpolation: If you have a function where some points (but not the whole curve!) are given and you need a function that passes through these points, you can use several methods:
Newton-Gregory, Newton with divided differences, Lagrange, Hermite, Spline
c) Curve fitting: You have a function with given points and you want to draw a curve with a given (!) function which approximates the curve as closely as possible. There are linear and nonlinear algorithms for this case.
Your drawing implies:
It is not remotely like a mathematical function.
It is not sharply defined by data or by a function.
You need to fit the curve, not some points.
What you want and need is
d) Smoothing: Given a curve or datapoints with noise or rapidly changing elements, you only want to see the slow changes over time.
You can do that with LOESS, as Jacob suggested (but I find that overkill, especially because choosing a reasonable span needs some experience). For your problem, I simply recommend the running average, as suggested by Jim C.
http://en.wikipedia.org/wiki/Running_average
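For what it's worth, a running (moving) average is only a couple of lines in most environments; a minimal Python/NumPy sketch, with an arbitrarily chosen window size:

import numpy as np

def running_average(y, window=5):
    # Smooth a 1-D signal with a centered moving average of the given window size.
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")   # edges are effectively zero-padded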
Sorry, cdonner and Orendorff, your proposals are well-intentioned, but completely wrong, because you are applying the right tools to the wrong problem.
These guys used a sixth-degree polynomial to fit climate data and embarrassed themselves completely.
http://scienceblogs.com/deltoid/2009/01/the_australians_war_on_science_32.php
http://network.nationalpost.com/np/blogs/fullcomment/archive/2008/10/20/lorne-gunter-thirty-years-of-warmer-temperatures-go-poof.aspx
Use loess in R (free).
E.g. here the loess function approximates a noisy sine curve.
(source: stowers-institute.org)
As you can see, you can tweak the smoothness of your curve with the span parameter.
Here's some sample R code from here:
Step-by-Step Procedure
Let's take a sine curve, add some "noise" to it, and then see how the loess "span" parameter affects the look of the smoothed curve.
Create a sine curve and add some noise:
period <- 120
x <- 1:120
y <- sin(2*pi*x/period) + runif(length(x),-1,1)
Plot the points on this noisy sine curve:
plot(x, y, main="Sine Curve + 'Uniform' Noise")
mtext("showing loess smoothing (local regression smoothing)")
Apply loess smoothing using the default span value of 0.75:
y.loess <- loess(y ~ x, span=0.75, data.frame(x=x, y=y))
Compute loess smoothed values for all points along the curve:
y.predict <- predict(y.loess, data.frame(x=x))
Plot the loess smoothed curve along with the points that were already plotted:
lines(x,y.predict)
You could use a digital filter like a FIR filter. The simplest FIR filter is just a running average. For more sophisticated treatment look at something like an FFT.
This is called curve fitting. The best way to do this is to find a numeric library that can do it for you. Here is a page showing how to do this using scipy. The picture on that page shows what the code does:
(source: scipy.org)
Now it's only 4 lines of code, but the author doesn't explain it at all. I'll try to explain briefly here.
First you have to decide what form you want the answer to be. In this example the author wants a curve of the form
f(x) = p0 cos (2π/p1 x + p2) + p3 x
You might instead want the sum of several curves. That's OK; the formula is an input to the solver.
The goal of the example, then, is to find the constants p0 through p3 to complete the formula. scipy can find this array of four constants. All you need is an error function that scipy can use to see how close its guesses are to the actual sampled data points.
fitfunc = lambda p, x: p[0]*cos(2*pi/p[1]*x+p[2]) + p[3]*x # Target function
errfunc = lambda p: fitfunc(p, Tx) - tX # Distance to the target function
errfunc takes just one parameter: an array of length 4. It plugs those constants into the formula and calculates an array of values on the candidate curve, then subtracts the array of sampled data points tX. The result is an array of error values; presumably scipy will take the sum of the squares of these values.
Then just put some initial guesses in and scipy.optimize.leastsq crunches the numbers, trying to find a set of parameters p where the error is minimized.
p0 = [-15., 0.8, 0., -1.] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:])
The result p1 is an array containing the four constants. success is 1, 2, 3, or 4 if the solver actually found a solution. (If the errfunc is sufficiently crazy, the solver can fail.)
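Pulling the fragments together, a self-contained version might look like the sketch below. The synthetic data, the "true" parameter values, and the noise level are all made up for illustration; only numpy and scipy.optimize.leastsq are assumed.

import numpy as np
from scipy import optimize

# Synthetic "measurements": a cosine plus a linear trend, with noise added.
Tx = np.linspace(0.0, 40.0, 200)
true_p = [-15.0, 10.0, 1.0, -0.5]
tX = (true_p[0] * np.cos(2 * np.pi / true_p[1] * Tx + true_p[2])
      + true_p[3] * Tx + np.random.normal(scale=1.0, size=Tx.size))

fitfunc = lambda p, x: p[0] * np.cos(2 * np.pi / p[1] * x + p[2]) + p[3] * x
errfunc = lambda p: fitfunc(p, Tx) - tX          # residuals; leastsq minimises their squares

p0 = [-14.0, 9.0, 0.0, -1.0]                     # initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:])
print(p1)                                        # should land close to true_p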
This looks like a polynomial approximation. You can play with polynomials in Excel ("Add Trendline" to a chart, select Polynomial, then increase the order to the level of approximation that you need). It shouldn't be too hard to find an algorithm/code for that.
Excel can show the equation that it came up with for the approximation, too.

Fitting polynomials to data

Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data?
I know polynomial interpolation, which is for finding a polynomial of degree n given n+1 data points, but here there are a large number of values and we want to find a low-degree polynomial (find best linear fit, best quadratic, best cubic, etc.). It might be related to least squares...
More generally, I would like to know the answer when we have a multivariate function -- points like (x,y,f(x,y)), say -- and want to find the best polynomial (p(x,y)) of a given degree in the variables. (Specifically a polynomial, not splines or Fourier series.)
Both theory and code/libraries (preferably in Python, but any language is okay) would be useful.
Thanks for everyone's replies. Here is another attempt at summarizing them. Pardon if I say too many "obvious" things: I knew nothing about least squares before, so everything was new to me.
NOT polynomial interpolation
Polynomial interpolation is fitting a polynomial of degree n given n+1 data points, e.g. finding a cubic that passes exactly through four given points. As said in the question, this was not what I wanted—I had a lot of points and wanted a small-degree polynomial (which will only approximately fit, unless we've been lucky)—but since some of the answers insisted on talking about it, I should mention them :) Lagrange polynomial, Vandermonde matrix, etc.
What is least-squares?
"Least squares" is a particular definition/criterion/"metric" of "how well" a polynomial fits. (There are others, but this is simplest.) Say you are trying to fit a polynomial
p(x,y) = a + bx + cy + dx^2 + ey^2 + fxy
to some given data points (x_i, y_i, Z_i) (where "Z_i" was "f(x_i, y_i)" in the question). With least squares the problem is to find the "best" coefficients (a,b,c,d,e,f), such that what is minimized (kept "least") is the "sum of squared residuals", namely
S = ∑_i (a + bx_i + cy_i + dx_i^2 + ey_i^2 + fx_iy_i − Z_i)^2
Theory
The important idea is that if you look at S as a function of (a,b,c,d,e,f), then S is minimized at a point at which its gradient is 0. This means that for example ∂S/∂f=0, i.e. that
∑_i 2(a + … + fx_iy_i − Z_i)x_iy_i = 0
and similar equations for a, b, c, d, e.
Note that these are just linear equations in a…f. So we can solve them with Gaussian elimination or any of the usual methods.
This is still called "linear least squares", because although the function we wanted was a quadratic polynomial, it is still linear in the parameters (a,b,c,d,e,f). Note that the same thing works when we want p(x,y) to be any "linear combination" of arbitrary functions f_j, instead of just a polynomial (= "linear combination of monomials").
Code
For the univariate case (when there is only the variable x — the f_j are the monomials x^j), there is Numpy's polyfit:
>>> import numpy
>>> xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> ys = [1.1, 3.9, 11.2, 21.5, 34.8, 51, 70.2, 92.3, 117.4, 145.5]
>>> p = numpy.poly1d(numpy.polyfit(xs, ys, deg=2))
>>> print p
       2
1.517 x + 2.483 x + 0.4927
For the multivariate case, or linear least squares in general, there is SciPy. As explained in its documentation, it takes a matrix A of the values f_j(x_i). (The theory is that it finds the Moore-Penrose pseudoinverse of A.) With our above example involving (x_i, y_i, Z_i), fitting a polynomial means the f_j are the monomials x^p·y^q. The following finds the best quadratic (or best polynomial of any other degree, if you change the "degree = 2" line):
from scipy import linalg
import random
n = 20
x = [100*random.random() for i in range(n)]
y = [100*random.random() for i in range(n)]
Z = [(x[i]+y[i])**2 + 0.01*random.random() for i in range(n)]
degree = 2
A = []
for i in range(n):
    A.append([])
    for xd in range(degree+1):
        for yd in range(degree+1-xd):
            A[i].append((x[i]**xd)*(y[i]**yd))   # f_j(x_i)
c,_,_,_ = linalg.lstsq(A,Z)
j = 0
for xd in range(0,degree+1):
    for yd in range(0,degree+1-xd):
        print " + (%.2f)x^%dy^%d" % (c[j], xd, yd),
        j += 1
prints
+ (0.01)x^0y^0 + (-0.00)x^0y^1 + (1.00)x^0y^2 + (-0.00)x^1y^0 + (2.00)x^1y^1 + (1.00)x^2y^0
so it has discovered that the polynomial is x^2 + 2xy + y^2 + 0.01. [The last term is sometimes -0.01 and sometimes 0, which is to be expected because of the random noise we added.]
Alternatives to Python+Numpy/Scipy are R and Computer Algebra Systems: Sage, Mathematica, Matlab, Maple. Even Excel might be able to do it. Numerical Recipes discusses methods to implement it ourselves (in C, Fortran).
Concerns
It is strongly influenced by how the points are chosen. When I had x=y=range(20) instead of the random points, it always produced 1.33x^2 + 1.33xy + 1.33y^2, which was puzzling... until I realised that because I always had x[i]=y[i], the polynomials were the same: x^2 + 2xy + y^2 = 4x^2 = (4/3)(x^2 + xy + y^2). So the moral is that it is important to choose the points carefully to get the "right" polynomial. (If you can choose, you should choose Chebyshev nodes for polynomial interpolation; I'm not sure whether the same holds for least squares as well.)
Overfitting: higher-degree polynomials can always fit the data better. If you change the degree to 3 or 4 or 5, it still mostly recognizes the same quadratic polynomial (coefficients are 0 for higher-degree terms) but for larger degrees, it starts fitting higher-degree polynomials. But even with degree 6, taking larger n (more data points instead of 20, say 200) still fits the quadratic polynomial. So the moral is to avoid overfitting, for which it might help to take as many data points as possible.
There might be issues of numerical stability I don't fully understand.
If you don't need a polynomial, you can obtain better fits with other kinds of functions, e.g. splines (piecewise polynomials).
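For that last alternative, a minimal sketch using SciPy's smoothing spline; the smoothing factor s is arbitrary and needs tuning for real data:

import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 50)
y = np.sin(x) + np.random.normal(scale=0.2, size=x.size)   # noisy samples of a known curve
spline = UnivariateSpline(x, y, s=2.0)        # larger s => smoother but less faithful curve
smooth_y = spline(np.linspace(0, 10, 500))    # evaluate the fitted spline on a fine grid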
Yes, the way this is typically done is by using least squares. There are other ways of specifying how well a polynomial fits, but the theory is simplest for least squares. The general theory is called linear regression.
Your best bet is probably to start with Numerical Recipes.
R is free and will do everything you want and more, but it has a big learning curve.
If you have access to Mathematica, you can use the Fit function to do a least squares fit. I imagine Matlab and its open source counterpart Octave have a similar function.
For (x, f(x)) case:
import numpy
x = numpy.arange(10)
y = x**2
coeffs = numpy.polyfit(x, y, deg=2)
poly = numpy.poly1d(coeffs)
print poly
yp = numpy.polyval(poly, x)
print (yp-y)
Bear in mind that a polynomial of higher degree ALWAYS fits the data better. Polynomials of higher degree typically lead to highly improbable functions (see Occam's razor), though (overfitting). You want to find a balance between simplicity (degree of the polynomial) and fit (e.g. least squares error). Quantitatively, there are tests for this: the Akaike information criterion or the Bayesian information criterion. These tests give a score that says which model is to be preferred.
If you want to fit the (x_i, f(x_i)) to a polynomial of degree n, then you would set up a linear least squares problem with the data (1, x_i, x_i^2, ..., x_i^n, f(x_i)). This will return a set of coefficients (c0, c1, ..., cn) so that the best-fitting polynomial is y = c0 + c1*x + c2*x^2 + ... + cn*x^n.
You can generalize this to more than one variable by including powers of y and combinations of x and y in the problem.
Lagrange polynomials (as #j w posted) give you an exact fit at the points you specify, but with polynomials of degree more than say 5 or 6 you can run into numerical instability.
Least squares gives you the "best fit" polynomial, with the error defined as the sum of squares of the individual errors (take the distance along the y-axis between the points you have and the function that results, square each, and sum them up). The MATLAB polyfit function does this, and with multiple return arguments, you can have it automatically take care of scaling/offset issues (e.g. if you have 100 points all between x=312.1 and 312.3, and you want a 6th-degree polynomial, you're going to want to calculate u = (x-312.2)/0.1 so the u-values are distributed between -1 and +1).
NOTE that the results of least-squares fits are strongly influenced by the distribution of x-axis values. If the x-values are equally spaced, then you'll get larger errors at the ends. If you have a case where you can choose the x values and you care about the maximum deviation from your known function and an interpolating polynomial, then the use of Chebyshev polynomials will give you something that is close to the perfect minimax polynomial (which is very hard to calculate). This is discussed at some length in Numerical Recipes.
Edit: From what I gather, this all works well for functions of one variable. For multivariate functions it is likely to be much more difficult if the degree is more than, say, 2. I did find a reference on Google Books.
At college we had this book, which I still find extremely useful: Conte, de Boor; Elementary Numerical Analysis; McGraw-Hill. The relevant section is 6.2: Data Fitting.
The example code comes in FORTRAN, and the listings are not very readable either, but the explanations are deep and clear at the same time. You end up understanding what you are doing, not just doing it (which is my experience with Numerical Recipes).
I usually start with Numerical Recipes but for things like this I quickly have to grab Conte-de Boor.
Maybe it's better to post some code... it's a bit stripped down, but the most relevant parts are there. It relies on numpy, obviously!
import numpy

def Tn(n, x):
    # Chebyshev polynomial of the first kind, T_n(x), via the three-term recurrence.
    if n == 0:
        return 1.0
    elif n == 1:
        return float(x)
    else:
        return (2.0 * x * Tn(n - 1, x)) - Tn(n - 2, x)

class ChebyshevFit:

    def __init__(self):
        self.Tn = Memoize(Tn)   # Memoize is a simple caching wrapper, stripped out of this listing

    def fit(self, data, degree=None):
        """fit the data by a least-squares linear combination of chebyshev polynomials.

        cf. Conte, de Boor; Elementary Numerical Analysis; McGraw-Hill (6.2: Data Fitting)
        """
        if degree is None:
            degree = 5
        data = sorted(data)
        self.range = start, end = (min(data)[0], max(data)[0])
        self.halfwidth = (end - start) / 2.0
        vec_x = [(x - start - self.halfwidth) / self.halfwidth for (x, y) in data]
        vec_f = [y for (x, y) in data]
        mat_phi = [numpy.array([self.Tn(i, x) for x in vec_x]) for i in range(degree + 1)]
        mat_A = numpy.inner(mat_phi, mat_phi)
        vec_b = numpy.inner(vec_f, mat_phi)
        self.coefficients = numpy.linalg.solve(mat_A, vec_b)
        self.degree = degree

    def evaluate(self, x):
        """use Clenshaw algorithm
        http://en.wikipedia.org/wiki/Clenshaw_algorithm
        """
        x = (x - self.range[0] - self.halfwidth) / self.halfwidth
        b_2 = float(self.coefficients[self.degree])
        b_1 = 2 * x * b_2 + float(self.coefficients[self.degree - 1])
        for i in range(2, self.degree):
            b_1, b_2 = 2.0 * x * b_1 + self.coefficients[self.degree - i] - b_2, b_1
        else:
            b_0 = x * b_1 + self.coefficients[0] - b_2
        return b_0
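To actually run the stripped-down class you need some Memoize helper; a hypothetical stand-in plus a quick usage example (the sine data and the degree are arbitrary):

import math, random

class Memoize:
    # Minimal caching wrapper so repeated Tn(i, x) calls are not recomputed.
    def __init__(self, fn):
        self.fn, self.cache = fn, {}
    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.fn(*args)
        return self.cache[args]

data = [(x / 10.0, math.sin(x / 10.0) + random.uniform(-0.05, 0.05)) for x in range(61)]
fitter = ChebyshevFit()
fitter.fit(data, degree=7)
print(abs(fitter.evaluate(3.0) - math.sin(3.0)))   # should be small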
Remember, there's a big difference between approximating the polynomial and finding an exact one.
For example, if I give you 4 points, you could
Approximate a line with a method like least squares
Approximate a parabola with a method like least squares
Find an exact cubic function through these four points.
Be sure to select the method that's right for you!
It's rather easy to scare up a quick fit using Excel's matrix functions if you know how to represent the least squares problem as a linear algebra problem. (That depends on how reliable you think Excel is as a linear algebra solver.)
The Lagrange polynomial is in some sense the "simplest" interpolating polynomial that fits a given set of data points.
It is sometimes problematic because it can vary wildly between data points.

Resources