Keeping exact wave form in memory - math

Let's say I have a program that calculates the value of the sine wave at time t. The sine wave is of the form sin(f*t + phi). Amplitude is 1.
If I only have one sin term all is fine. I can easily calculate the value at any time t.
But, at runtime, the wave form becomes modified when it combines with other waves. sin(f1 * t + phi1) + sin(f2 * t + phi2) + sin(f3 * t + phi3) + ...
The simplest solution is to have a table with columns for phi and f, iterate over all rows, and sum the results. But to me it feels that once I reach thousands of rows, the computation will become slow.
Is there a different way of doing this? Like combining all the sines into one statement/formula?

If you have a Fourier series (i.e. f_i = i f for some f) you can use the Clenshaw recurrence relation which is significantly faster than computing all the sines (but it might be slightly less accurate).
In your case you can consider the sequence:
f_k = exp( i ( k f t + phi_k) ) , where i is the imaginary unit.
Notice that Im(f_k) = sin( k f t + phi_k ), that is your sequence.
Also
f_k = exp( i ( k f t + phi_k) ) = exp( i k f t ) exp( i phi_k )
Hence you have a_k = exp(i phi_k). You can precompute these values and store them in an array. For simplicity from now on assume a_0 = 0.
Now, exp( i (k + 1) f t) = exp(i k f t) * exp(i f t), so alpha_k = exp(i f t) and beta_k = 0.
You can now apply the recurrence formula, in C++ you can do something like this:
complex<double> clenshaw_fourier(double f, double t, const vector< complex<double> > & a )
{
const complex<double> alpha = exp(f * t * i);
complex<double> b = 0;
for (int k = a.size() - 1; k >0; -- k )
b = a[k] + alpha * b;
return a[0] + alpha * b;
}
Assuming that a[k] == exp( i phi_k ).
The real part of the answer is the sum of cos(k f t + phi_k), while the imaginary part is the sum of sin(k f t + phi_k).
As you can see this only uses addition and multiplications, except for exp(f * t * i) that is only computed once.

There are different bases (plural of basis) that can be advantageous (i.e. compact) for representing different waveforms. The most common and well-known one is that which you mention, called the Fourier basis usually. Daubechies wavelets for example are a relatively recent addition that cope with more discontinuous waveforms much better than a Fourier basis does. But this is really a math topic and probably if you post on Math Overflow you will get better answers.

Related

Renewal Function for Weibull Distribution

The renewal function for Weibull distribution m(t) with t = 10 is given as below.
I want to find the value of m(t). I wrote the following r code to compute m(t)
last_term = NULL
gamma_k = NULL
n = 50
for(k in 1:n){
gamma_k[k] = gamma(2*k + 1)/factorial(k)
}
for(j in 1: (n-1)){
prev = gamma_k[n-j]
last_term[j] = gamma(2*j + 1)/factorial(j)*prev
}
final_term = NULL
find_value = function(n){
for(i in 2:n){
final_term[i] = gamma_k[i] - sum(last_term[1:(i-1)])
}
return(final_term)
}
all_k = find_value(n)
af_sum = NULL
m_t = function(t){
for(k in 1:n){
af_sum[k] = (-1)^(k-1) * all_k[k] * t^(2*k)/gamma(2*k + 1)
}
return(sum(na.omit(af_sum)))
}
m_t(20)
The output is m(t) = 2.670408e+93. Does my iteratvie procedure correct? Thanks.
I don't think it will work. First, lets move Γ(2k+1) from denominator of m(t) into Ak. Thus, Ak will behave roughly as 1/k!.
In the nominator of the m(t) terms there is t2k, so roughly speaking you're computing sum with terms
100k/k!
From Stirling formula
k! ~ kk, making terms
(100/k)k
so yes, they will start to decrease and converge to something but after 100th term
Anyway, here is the code, you could try to improve it, but it breaks at k~70
N <- 20
A <- rep(0, N)
# compute A_k/gamma(2k+1) terms
ps <- 0.0 # previous sum
A[1] = 1.0
for(k in 2:N) {
ps <- ps + A[k-1]*gamma(2*(k-1) + 1)/factorial(k-1)
A[k] <- 1.0/factorial(k) - ps/gamma(2*k+1)
}
print(A)
t <- 10.0
t2 <- t*t
r <- 0.0
for(k in 1:N){
r <- r + (-t2)^k*A[k]
}
print(-r)
UPDATE
Ok, I calculated Ak as in your question, got the same answer. I want to estimate terms Ak/Γ(2k+1) from m(t), I believe it will be pretty much dominated by 1/k! term. To do that I made another array k!*Ak/Γ(2k+1), and it should be close to one.
Code
N <- 20
A <- rep(0.0, N)
psum <- function( pA, k ) {
ps <- 0.0
if (k >= 2) {
jmax <- k - 1
for(j in 1:jmax) {
ps <- ps + (gamma(2*j+1)/factorial(j))*pA[k-j]
}
}
ps
}
# compute A_k/gamma(2k+1) terms
A[1] = gamma(3)
for(k in 2:N) {
A[k] <- gamma(2*k+1)/factorial(k) - psum(A, k)
}
print(A)
B <- rep(0.0, N)
for(k in 1:N) {
B[k] <- (A[k]/gamma(2*k+1))*factorial(k)
}
print(B)
shows that
I got the same Ak values as you did.
Bk is indeed very close to 1
It means that term Ak/Γ(2k+1) could be replaced by 1/k! to get quick estimate of what we might get (with replacement)
m(t) ~= - Sum(k=1, k=Infinity) (-1)k (t2)k / k! = 1 - Sum(k=0, k=Infinity) (-t2)k / k!
This is actually well-known sum and it is equal to exp() with negative argument (well, you have to add term for k=0)
m(t) ~= 1 - exp(-t2)
Conclusions
Approximate value is positive. Probably will stay positive after all, Ak/Γ(2k+1) is a bit different from 1/k!.
We're talking about 1 - exp(-100), which is 1-3.72*10-44! And we're trying to compute it precisely summing and subtracting values on the order of 10100 or even higher. Even with MPFR I don't think this is possible.
Another approach is needed
OK, so I ended up going down a pretty different road on this. I have implemented a simple discretization of the integral equation which defines the renewal function:
m(t) = F(t) + integrate (m(t - s)*f(s), s, 0, t)
The integral is approximated with the rectangle rule. Approximating the integral for different values of t gives a system of linear equations. I wrote a function to generate the equations and extract a matrix of coefficients from it. After looking at some examples, I guessed a rule to define the coefficients directly and used that to generate solutions for some examples. In particular I tried shape = 2, t = 10, as in OP's example, with step = 0.1 (so 101 equations).
I found that the result agrees pretty well with an approximate result which I found in a paper (Baxter et al., cited in the code). Since the renewal function is the expected number of events, for large t it is approximately equal to t/mu where mu is the mean time between events; this is a handy way to know if we're anywhere in the neighborhood.
I was working with Maxima (http://maxima.sourceforge.net), which is not efficient for numerical stuff, but which makes it very easy to experiment with different aspects. At this point it would be straightforward to port the final, numerical stuff to another language such as Python.
Thanks to OP for suggesting the problem, and S. Pappadeux for insightful discussions. Here is the plot I got comparing the discretized approximation (red) with the approximation for large t (blue). Trying some examples with different step sizes, I saw that the values tend to increase a little as step size gets smaller, so I think the red line is probably a little low, and the blue line might be more nearly correct.
Here is my Maxima code:
/* discretize weibull renewal function and formulate system of linear equations
* copyright 2020 by Robert Dodier
* I release this work under terms of the GNU General Public License
*
* This is a program for Maxima, a computer algebra system.
* http://maxima.sourceforge.net/
*/
"Definition of the renewal function m(t):" $
renewal_eq: m(t) = F(t) + 'integrate (m(t - s)*f(s), s, 0, t);
"Approximate integral equation with rectangle rule:" $
discretize_renewal (delta_t, k) :=
if equal(k, 0)
then m(0) = F(0)
else m(k*delta_t) = F(k*delta_t)
+ m(k*delta_t)*f(0)*(delta_t / 2)
+ sum (m((k - j)*delta_t)*f(j*delta_t)*delta_t, j, 1, k - 1)
+ m(0)*f(k*delta_t)*(delta_t / 2);
make_eqs (n, delta_t) :=
makelist (discretize_renewal (delta_t, k), k, 0, n);
make_vars (n, delta_t) :=
makelist (m(k*delta_t), k, 0, n);
"Discretized integral equation and variables for n = 4, delta_t = 1/2:" $
make_eqs (4, 1/2);
make_vars (4, 1/2);
make_eqs_vars (n, delta_t) :=
[make_eqs (n, delta_t), make_vars (n, delta_t)];
load (distrib);
subst_pdf_cdf (shape, scale, e) :=
subst ([f = lambda ([x], pdf_weibull (x, shape, scale)), F = lambda ([x], cdf_weibull (x, shape, scale))], e);
matrix_from (eqs, vars) :=
(augcoefmatrix (eqs, vars),
[submatrix (%%, length(%%) + 1), - col (%%, length(%%) + 1)]);
"Subsitute Weibull pdf and cdf for shape = 2 into discretized equation:" $
apply (matrix_from, make_eqs_vars (4, 1/2));
subst_pdf_cdf (2, 1, %);
"Just the right-hand side matrix:" $
rhs_matrix_from (eqs, vars) :=
(map (rhs, eqs),
augcoefmatrix (%%, vars),
[submatrix (%%, length(%%) + 1), col (%%, length(%%) + 1)]);
"Generate the right-hand side matrix, instead of extracting it from equations:" $
generate_rhs_matrix (n, delta_t) :=
[delta_t * genmatrix (lambda ([i, j], if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))];
"Generate numerical right-hand side matrix, skipping over formulas:" $
generate_rhs_matrix_numerical (shape, scale, n, delta_t) :=
block ([f, F, numer: true], local (f, F),
f: lambda ([x], pdf_weibull (x, shape, scale)),
F: lambda ([x], cdf_weibull (x, shape, scale)),
[genmatrix (lambda ([i, j], delta_t * if i = 1 and j = 1 then 0
elseif j > i then 0
elseif j = i then f(0)/2
elseif j = 1 then f(delta_t*(i - 1))/2
else f(delta_t*(i - j))), n + 1, n + 1),
transpose (makelist (F(k*delta_t), k, 0, n))]);
"Solve approximate integral equation (shape = 3, t = 1) via LU decomposition:" $
fpprintprec: 4 $
n: 20 $
t: 1;
[AA, bb]: generate_rhs_matrix_numerical (3, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Iterative solution of approximate integral equation (shape = 3, t = 1):" $
xx: bb;
for i thru 10 do xx: AA . xx + bb;
xx - (AA.xx + bb);
xx_iterative: xx;
"Should find iterative and LU give same result:" $
xx_diff: xx_iterative - xx_by_lu[1];
sqrt (transpose(xx_diff) . xx_diff);
"Try shape = 2, t = 10:" $
n: 100 $
t: 10 $
[AA, bb]: generate_rhs_matrix_numerical (2, 1, n, t/n);
xx_by_lu: linsolve_by_lu (ident(n + 1) - AA, bb, floatfield);
"Baxter, et al., Eq. 3 (for large values of t) compared to discretization:" $
/* L.A. Baxter, E.M. Scheuer, D.J. McConalogue, W.R. Blischke.
* "On the Tabulation of the Renewal Function,"
* Econometrics, vol. 24, no. 2 (May 1982).
* H(t) is their notation for the renewal function.
*/
H(t) := t/mu + sigma^2/(2*mu^2) - 1/2;
tx_points: makelist ([float (k/n*t), xx_by_lu[1][k, 1]], k, 1, n);
plot2d ([H(u), [discrete, tx_points]], [u, 0, t]), mu = mean_weibull(2, 1), sigma = std_weibull(2, 1);

Given a list of coefficients, create a polynomial

I want to create a polynomial with given coefficients. This seems very simple but what I have found till now did not appear to be the thing I desired.
For example in such an environment;
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
Given a list/vector v of length n (I will set this n and v at the begining), I want to get the polynomial v(x) as v[i]*x^i.
(Actually after that I am going to build the quotient ring GF(4,'a')[x] /< x^n-v(x) > after getting this v(x) from above) then I will say;
S = R.quotient(x^n-v(x), 'y')
y = S.gen()
But I couldn't write it.
This is a frequently asked question in many places so it is better to leave it here as an answer although the answer I have is so simple:
I just wrote R(v) and it gave me the polynomial:
sage
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
R(v)
x^10 + a*x^8 + a*x^7 + x^6 + x^5 + x^4 + a*x + 1
Basically (that is, ignoring the specifics of your polynomial ring) you have a list/vector v of length n and you require a polynomial which is the sum of all v[i]*x^i. Note that this sum equals the matrix product V.X where V is a one row matrix (essentially equal to the vector v) and X is a column matrix consisting of powers of x. In Maxima you could write
v: [1,a,0,0,1,1,1,a,a,0,1]$
n: length(v)$
V: matrix(v)$
X: genmatrix(lambda([i,j], x^(i-1)), n, 1)$
V.X;
The output is
x^10+ax^8+ax^7+x^6+x^5+x^4+a*x+1

Running Time of a d-ary heap?

How is the running time of a d-Ary heap simplified from O(logd n) to O( (log n) / (log d))?
A correct simplification would be:
logdn = log d * log n
How is the division simplification derived?
This uses the common identity to convert between logarithmic bases:
logx(z) = logm(z) / logm(x)
By multiplying both sides by logm(x), you get:
logm(z) = logx(z) * logm(x)
Which is equivalent to the answer in the question you site.
More information is available here.
Suppose x = logd(n) Equivalently we have, n = dx Then log2n = log2(dx) = x log2(d) Dividing through by log2(d) yields: log2(n) / log2(d) = x And so log2(n) / log2(d) = x = logd(n)
Of course, assuming d is fixed, then log2(d) is just a constant. And so O( logd(n) ) = O( 1 / log2(d) * log2(n) ) = O( log2(n) ) That is, so far as Big-Oh notation is concerned, you can change out any logarithm base (larger than 1) for any other (such) logarithm base. And so it's customary to just drop the base and write O( log(n) )

Calculate bessel function in MATLAB using Jm+1=2mj(m) -j(m-1) formula

I tried to implement bessel function using that formula, this is the code:
function result=Bessel(num);
if num==0
result=bessel(0,1);
elseif num==1
result=bessel(1,1);
else
result=2*(num-1)*Bessel(num-1)-Bessel(num-2);
end;
But if I use MATLAB's bessel function to compare it with this one, I get too high different values.
For example if I type Bessel(20) it gives me 3.1689e+005 as result, if instead I type bessel(20,1) it gives me 3.8735e-025 , a totally different result.
such recurrence relations are nice in mathematics but numerically unstable when implementing algorithms using limited precision representations of floating-point numbers.
Consider the following comparison:
x = 0:20;
y1 = arrayfun(#(n)besselj(n,1), x); %# builtin function
y2 = arrayfun(#Bessel, x); %# your function
semilogy(x,y1, x,y2), grid on
legend('besselj','Bessel')
title('J_\nu(z)'), xlabel('\nu'), ylabel('log scale')
So you can see how the computed values start to differ significantly after 9.
According to MATLAB:
BESSELJ uses a MEX interface to a Fortran library by D. E. Amos.
and gives the following as references for their implementation:
D. E. Amos, "A subroutine package for Bessel functions of a complex
argument and nonnegative order", Sandia National Laboratory Report,
SAND85-1018, May, 1985.
D. E. Amos, "A portable package for Bessel functions of a complex
argument and nonnegative order", Trans. Math. Software, 1986.
The forward recurrence relation you are using is not stable. To see why, consider that the values of BesselJ(n,x) become smaller and smaller by about a factor 1/2n. You can see this by looking at the first term of the Taylor series for J.
So, what you're doing is subtracting a large number from a multiple of a somewhat smaller number to get an even smaller number. Numerically, that's not going to work well.
Look at it this way. We know the result is of the order of 10^-25. You start out with numbers that are of the order of 1. So in order to get even one accurate digit out of this, we have to know the first two numbers with at least 25 digits precision. We clearly don't, and the recurrence actually diverges.
Using the same recurrence relation to go backwards, from high orders to low orders, is stable. When you start with correct values for J(20,1) and J(19,1), you can calculate all orders down to 0 with full accuracy as well. Why does this work? Because now the numbers are getting larger in each step. You're subtracting a very small number from an exact multiple of a larger number to get an even larger number.
You can just modify the code below which is for the Spherical bessel function. It is well tested and works for all arguments and order range. I am sorry it is in C#
public static Complex bessel(int n, Complex z)
{
if (n == 0) return sin(z) / z;
if (n == 1) return sin(z) / (z * z) - cos(z) / z;
if (n <= System.Math.Abs(z.real))
{
Complex h0 = bessel(0, z);
Complex h1 = bessel(1, z);
Complex ret = 0;
for (int i = 2; i <= n; i++)
{
ret = (2 * i - 1) / z * h1 - h0;
h0 = h1;
h1 = ret;
if (double.IsInfinity(ret.real) || double.IsInfinity(ret.imag)) return double.PositiveInfinity;
}
return ret;
}
else
{
double u = 2.0 * abs(z.real) / (2 * n + 1);
double a = 0.1;
double b = 0.175;
int v = n - (int)System.Math.Ceiling((System.Math.Log(0.5e-16 * (a + b * u * (2 - System.Math.Pow(u, 2)) / (1 - System.Math.Pow(u, 2))), 2)));
Complex ret = 0;
while (v > n - 1)
{
ret = z / (2 * v + 1.0 - z * ret);
v = v - 1;
}
Complex jnM1 = ret;
while (v > 0)
{
ret = z / (2 * v + 1.0 - z * ret);
jnM1 = jnM1 * ret;
v = v - 1;
}
return jnM1 * sin(z) / z;
}
}

Roots of a Quartic Function

I came across a situation doing some advanced collision detection, where I needed to calculate the roots of a quartic function.
I wrote a function that seems to work fine using Ferrari's general solution as seen here: http://en.wikipedia.org/wiki/Quartic_function#Ferrari.27s_solution.
Here's my function:
private function solveQuartic(A:Number, B:Number, C:Number, D:Number, E:Number):Array{
// For paramters: Ax^4 + Bx^3 + Cx^2 + Dx + E
var solution:Array = new Array(4);
// Using Ferrari's formula: http://en.wikipedia.org/wiki/Quartic_function#Ferrari.27s_solution
var Alpha:Number = ((-3 * (B * B)) / (8 * (A * A))) + (C / A);
var Beta:Number = ((B * B * B) / (8 * A * A * A)) - ((B * C) / (2 * A * A)) + (D / A);
var Gamma:Number = ((-3 * B * B * B * B) / (256 * A * A * A * A)) + ((C * B * B) / (16 * A * A * A)) - ((B * D) / (4 * A * A)) + (E / A);
var P:Number = ((-1 * Alpha * Alpha) / 12) - Gamma;
var Q:Number = ((-1 * Alpha * Alpha * Alpha) / 108) + ((Alpha * Gamma) / 3) - ((Beta * Beta) / 8);
var PreRoot1:Number = ((Q * Q) / 4) + ((P * P * P) / 27);
var R:ComplexNumber = ComplexNumber.add(new ComplexNumber((-1 * Q) / 2), ComplexNumber.sqrt(new ComplexNumber(PreRoot1)));
var U:ComplexNumber = ComplexNumber.pow(R, 1/3);
var preY1:Number = (-5 / 6) * Alpha;
var RedundantY:ComplexNumber = ComplexNumber.add(new ComplexNumber(preY1), U);
var Y:ComplexNumber;
if(U.isZero()){
var preY2:ComplexNumber = ComplexNumber.pow(new ComplexNumber(Q), 1/3);
Y = ComplexNumber.subtract(RedundantY, preY2);
} else{
var preY3:ComplexNumber = ComplexNumber.multiply(new ComplexNumber(3), U);
var preY4:ComplexNumber = ComplexNumber.divide(new ComplexNumber(P), preY3);
Y = ComplexNumber.subtract(RedundantY, preY4);
}
var W:ComplexNumber = ComplexNumber.sqrt(ComplexNumber.add(new ComplexNumber(Alpha), ComplexNumber.multiply(new ComplexNumber(2), Y)));
var Two:ComplexNumber = new ComplexNumber(2);
var NegativeOne:ComplexNumber = new ComplexNumber(-1);
var NegativeBOverFourA:ComplexNumber = new ComplexNumber((-1 * B) / (4 * A));
var NegativeW:ComplexNumber = ComplexNumber.multiply(W, NegativeOne);
var ThreeAlphaPlusTwoY:ComplexNumber = ComplexNumber.add(new ComplexNumber(3 * Alpha), ComplexNumber.multiply(new ComplexNumber(2), Y));
var TwoBetaOverW:ComplexNumber = ComplexNumber.divide(new ComplexNumber(2 * Beta), W);
solution["root1"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.add(W, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.add(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
solution["root2"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.subtract(NegativeW, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.subtract(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
solution["root3"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.subtract(W, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.add(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
solution["root4"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.add(NegativeW, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.subtract(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
return solution;
}
The only issue is that I seem to get a few exceptions. Most notably when I have two real roots, and two imaginary roots.
For example, this equation:
y = 0.9604000000000001x^4 - 5.997600000000001x^3 + 13.951750054511718x^2 - 14.326264455924333x + 5.474214401412618
Returns the roots:
1.7820304835380467 + 0i
1.34041662585388 + 0i
1.3404185025061823 + 0i
1.7820323472855648 + 0i
If I graph that particular equation, I can see that the actual roots are closer to 1.2 and 2.9 (approximately). I can't dismiss the four incorrect roots as random, because they're actually two of the roots for the equation's first derivative:
y = 3.8416x^3 - 17.9928x^2 + 27.9035001x - 14.326264455924333
Keep in mind that I'm not actually looking for the specific roots to the equation I posted. My question is whether there's some sort of special case that I'm not taking into consideration.
Any ideas?
For finding roots of polynomials of degree >= 3, I've always had better results using Jenkins-Traub ( http://en.wikipedia.org/wiki/Jenkins-Traub_algorithm ) than explicit formulas.
I do not know why Ferrari's solution does not work, but I tried to use the standard numerical method (create a companion matrix and compute its eigenvalues), and I obtain the correct solution, i.e., two real roots at 1.2 and 1.9.
This method is not for the faint of heart. After constructing the companion matrix of the polynomial, you run the QR algorithm to find the eigenvalues of that matrix. Those are the zeroes of the polynomial.
I suggest you to use an existing implementation of the QR algorithm since a good deal of it is closer to kitchen recipe than algorithmics. But it is, I believe, the most widely used algorithm to compute eigenvalues, and thereby, roots of polynomials.
You can see my answer to a related question. I support the view of Olivier: the way to go may just be the companion matrix / eigenvalue approach (very stable, simple, reliable, and fast).
Edit
I guess it does not hurt if I reproduce the answer here, for convenience:
The numerical solution for doing this many times in a reliable, stable manner, involve: (1) Form the companion matrix, (2) find the eigenvalues of the companion matrix.
You may think this is a harder problem to solve than the original one, but this is how the solution is implemented in most production code (say, Matlab).
For the polynomial:
p(t) = c0 + c1 * t + c2 * t^2 + t^3
the companion matrix is:
[[0 0 -c0],[1 0 -c1],[0 1 -c2]]
Find the eigenvalues of such matrix; they correspond to the roots of the original polynomial.
For doing this very fast, download the singular value subroutines from LAPACK, compile them, and link them to your code. Do this in parallel if you have too many (say, about a million) sets of coefficients. You could use QR decomposition, or any other stable methodology for computing eigenvalues (see the Wikipedia entry on "matrix eigenvalues").
Notice that the coefficient of t^3 is one, if this is not the case in your polynomials, you will have to divide the whole thing by the coefficient and then proceed.
Good luck.
Edit: Numpy and octave also depend on this methodology for computing the roots of polynomials. See, for instance, this link.
The other answers are good and sound advice. However, recalling my experience with the implementation of Ferrari's method in Forth, I think your wrong results are probably caused by 1. wrong implementation of the necessary and rather tricky sign combinations, 2. not realizing yet that ".. == beta" in floating-point should become "abs(.. - beta) < eps, 3. not yet having found out that there are other square roots in the code that may return complex solutions.
For this particular problem my Forth code in diagnostic mode returns:
x1 = 1.5612244897959360787072371026316680470492e+0000 -1.6542769593216835969789894020584464029664e-0001 i
--> -4.2123274051525879873007970023884313331788e-0054 3.4544674220377778501545407451201598284464e-0077 i
x2 = 1.5612244897959360787072371026316680470492e+0000 1.6542769593216835969789894020584464029664e-0001 i
--> -4.2123274051525879873007970023884313331788e-0054 -3.4544674220377778501545407451201598284464e-0077 i
x3 = 1.2078440724224197532447709413299479764843e+0000 0.0000000000000000000000000000000000000000e-0001 i
--> -4.2123274051525879873010733597821943554068e-0054 0.0000000000000000000000000000000000000000e-0001 i
x4 = 1.9146049071693819497220585618954851525216e+0000 -0.0000000000000000000000000000000000000000e-0001 i
--> -4.2123274051525879873013497171759573776348e-0054 0.0000000000000000000000000000000000000000e-0001 i
The text after "-->" follows from backsubstituting the root into the original equation.
For reference, here are Mathematica/Alpha's results to the highest possible precision I managed to set it:
Mathematica:
x1 = 1.20784407242
x2 = 1.91460490717
x3 = 1.56122449 - 0.16542770 i
x4 = 1.56122449 + 0.16542770 i
A good alternative to the methods already mentioned is the TOMS Algorithm 326, which is based on the paper "Roots of Low Order Polynomials" by Terence R.F.Nonweiler CACM (Apr 1968).
This is an algebraic solution to 3rd and 4th order polynomials that is reasonably compact, fast, and quite accurate. It is much simpler than Jenkins Traub.
Be warned however that the TOMS code doesn't work all that well.
This Iowa Hills Root Solver page has code for a Quartic / Cubic root finder that is a bit more refined. It also has a Jenkins Traub type root finder.

Resources