Solving a system of differential equations in R

I have a simple flux model in R. It boils down to two differential equations that model two state variables, which we'll call A and B. They are built from four component fluxes (flux1-flux4), five parameters (p1-p5), and a sixth parameter, of_interest, that can take on values between 0 and 1.
parameters<- c(p1=0.028, p2=0.3, p3=0.5, p4=0.0002, p5=0.001, of_interest=0.1)
state <- c(A=28, B=1.4)
model <- function(t, state, parameters){
  with(as.list(c(state, parameters)),{
    #fluxes
    flux1 = (1-of_interest) * p1*(B / (p2 + B))*p3
    flux2 = p4* A #microbial death
    flux3 = of_interest * p1*(B / (p2 + B))*p3
    flux4 = p5* B
    #differential equations of component fluxes
    dAdt<- flux1 - flux2
    dBdt<- flux3 - flux4
    list(c(dAdt,dBdt))
  })
}
I would like to write a function that takes the derivative of dAdt with respect to of_interest, sets the derivative to 0, and then rearranges and solves for the value of of_interest. This will be the value of the parameter of_interest that maximizes the function dAdt.
So far I have been able to solve the model at steady state, across the possible values of of_interest to demonstrate there should be a maximum.
require(rootSolve)
range <- seq(0, 1, by = 0.01)
out <- c()
for(i in range){
  of_interest = i
  parameters <- c(p1=0.028, p2=0.3, p3=0.5, p4=0.0002, p5=0.001, of_interest=of_interest)
  state <- c(A=28, B=1.4)
  ST <- stode(y = state, func = model, parms = parameters, pos = TRUE)
  out <- c(out, ST$y[1])
}
Then plotting:
plot(out~range, pch=16,col='purple')
lines(smooth.spline(out~range,spar=0.35), lwd=3,lty=1)
How can I analytically solve for the value of of_interest that maximizes dAdt in R? If an analytical solution is not possible, how can I know, and how can I go about solving this numerically?
Update: I think this problem can be solved with the deSolve package in R (linked here); however, I am having trouble implementing it using my particular example.

Your equation in B(t) is just-about separable since you can divide out B(t), from which you can get that
B(t) = C * exp{-p5 * t} * (p2 + B(t)) ^ {of_interest * p1 * p3}
This is an implicit solution for B(t) which we'll solve point-wise.
You can solve for C given your initial value of B. I suppose t = 0 initially? In which case
C = B_0 / (p2 + B_0) ^ {of_interest * p1 * p3}
This also gives a somewhat nicer-looking expression for A(t):
dA(t) / dt = B_0 / (p2 + B_0) * p1 * p3 * (1 - of_interest) *
             exp{-p5 * t} * ((p2 + B(t)) / (p2 + B_0)) ^
             {of_interest * p1 * p3 - 1} - p4 * A(t)
This can be solved by integrating factor (= exp{p4 * t}), via numerical integration of the term involving B(t). We specify the lower limit of the integral as 0 so that we never have to evaluate B outside the range [0, t], which means the integrating constant is simply A_0 and thus:
A(t) = (A_0 + integral_0^t { f(tau; parameters) d tau}) * exp{-p4 * t}
The basic gist is B(t) is driving everything in this system -- the approach will be: solve for the behavior of B(t), then use this to figure out what's going on with A(t), then maximize.
First, the "outer" parameters; we also need nleqslv to get B:
library(nleqslv)
t_min <- 0
t_max <- 10000
t_N <- 10
#we'll only solve the behavior of A & B over t_rng
t_rng <- seq(t_min, t_max, length.out = t_N)
#I'm calling of_interest ttheta
ttheta_min <- 0
ttheta_max <- 1
ttheta_N <- 5
tthetas <- seq(ttheta_min, ttheta_max, length.out = ttheta_N)
B_0 <- 1.4
A_0 <- 28
#No sense storing this as a vector when we'll only ever use it as a list
parameters <- list(p1 = 0.028, p2 = 0.3, p3 = 0.5,
p4 = 0.0002, p5 = 0.001)
From here, the basic outline is:
Given the parameter values (in particular ttheta), solve for BB over t_rng via non-linear equation solving
Given BB and the parameter values, solve for AA over t_rng by numerical integration
Given AA and your expression for dAdt, plug & maximize.
derivs <-
  sapply(tthetas, function(th){
    #append current ttheta
    params <- c(parameters, ttheta = th)
    #declare a function we'll use to solve for B (see above)
    b_slv <- function(b, t)
      with(params, b - B_0 * ((p2 + b)/(p2 + B_0)) ^
             (ttheta * p1 * p3) * exp(-p5 * t))
    #solving point-wise (this is pretty fast)
    # **See below for a note**
    BB <- sapply(t_rng, function(t) nleqslv(B_0, function(b) b_slv(b, t))$x)
    #this is f(tau; params) that I mentioned above;
    # we have to do linear interpolation since the
    # numerical integrator isn't constrained to the grid.
    # **See below for note**
    a_int <- function(t){
      #approximate t to the grid (t_rng)
      # (assumes B is monotonic, which seems to be true)
      # (also, if t ends up negative, just assign t_rng[1])
      t_n <- max(1L, which.max(t_rng - t >= 0) - 1L)
      idx <- t_n:(t_n + 1)
      ts <- t_rng[idx]
      #linear interpolation between the two neighbouring B values
      B_app <- BB[idx[1]] + (t - ts[1]) / diff(ts) * diff(BB[idx])
      #finally, f(tau; params)
      with(params, (1 - ttheta) * p1 * p3 * B_0 / (p2 + B_0) *
             ((p2 + B_app)/(p2 + B_0)) ^ (ttheta * p1 * p3 - 1) *
             exp((p4 - p5) * t))
    }
    #a_int only works on scalars; the numeric integrator
    # requires a version that works on vectors
    a_int_v <- function(t) sapply(t, a_int)
    AA <- exp(-params$p4 * t_rng) *
      sapply(t_rng, function(tt)
        #I found the subdivisions constraint binding in some cases
        # at the default value; no trouble at 1000.
        A_0 + integrate(a_int_v, 0, tt, subdivisions = 1000L)$value)
    #using the explicit version of dAdt given as flux1 - flux2
    max(with(params, (1 - ttheta) * p1 * p3 * BB / (p2 + BB) - p4 * AA))
  })
Finally, simply run `tthetas[which.max(derivs)]` to get the maximizer.
Note:
This code is not optimized for efficiency. There are a few places where there are some potential speed-ups:
It would probably be faster to run the equation solver sequentially with warm starts -- using the previous time point's solution as the initial guess will converge faster than always starting from the same initial value.
It will be faster to simply use Riemann sums to integrate; the tradeoff is accuracy, but that should be fine if you have a dense enough grid. One beauty of Riemann sums is that you won't have to interpolate at all, and numerically they are simple linear algebra (a rough sketch follows these notes). I ran this with t_N == ttheta_N == 1000L and it finished within a few minutes.
It is probably possible to vectorize a_int directly instead of just sapply-ing over it, with a concomitant speed-up from a more direct appeal to BLAS.
Loads of other small stuff. Pre-compute ttheta * p1 * p3 since it's re-used so much, etc.
I didn't bother including any of that stuff, though, because you're honestly probably better off porting this to a faster language -- Julia is my own pet favorite, but of course R speaks well with C++, C, Fortran, etc.
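As a rough illustration of the Riemann-sum note above (a sketch only, not part of the original approach), the a_int/integrate block inside the sapply could be replaced by a cumulative trapezoid rule evaluated directly on the grid, reusing params, BB, t_rng, A_0 and B_0 from the code above:
#f(tau; params) evaluated directly on the grid t_rng -- no interpolation needed
f_grid <- with(params, (1 - ttheta) * p1 * p3 * B_0 / (p2 + B_0) *
                 ((p2 + BB) / (p2 + B_0)) ^ (ttheta * p1 * p3 - 1) *
                 exp((p4 - p5) * t_rng))
#cumulative trapezoid rule for integral_0^t f(tau) d tau at every grid point
cum_int <- c(0, cumsum((head(f_grid, -1) + tail(f_grid, -1)) / 2 * diff(t_rng)))
AA <- exp(-params$p4 * t_rng) * (A_0 + cum_int)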

Related

Multi-parameter optimization in R

I'm trying to estimate parameters that will maximize the likelihood of a certain event. My objective function looks like this:
event_prob = function(p1, p2) {
x = ((1-p1-p2)^4)^67 *
((1-p1-p2)^3*p2)^5 *
((1-p1-p2)^3*p1)^2 *
((1-p1-p2)^2*p1*p2)^3 *
((1-p1-p2)^2*p1^2) *
((1-p1-p2)*p1^2*p2)^2 *
(p1^3*p2) *
(p1^4)
return(x)
}
In this case, I'm looking for p1 and p2 in [0,1] that will maximize this function. I tried using optim() in the following manner:
aaa = optim(c(0,0),event_prob)
but I'm getting an error "Error in fn(par, ...) : argument "p2" is missing, with no default".
Am I using optim() wrong? Or is there a different function (package?) I should be using for multi-parameter optimization?
This problem can in fact be solved analytically.
The objective function simplifies to
F(p1,p2) = (1-p1-p2)^299 * p1^18 * p2^11
which is to be maximised over the region
C = { (p1,p2) | 0<=p1, 0<=p2, p1+p2<=1 }
Note that F is 0 if p1 = 0 or p2 = 0 or p1+p2 = 1, while if none of those hold then F is positive. Thus the maximum of F occurs in the interior of C.
Taking the log gives
f(p1,p2) = 299*log(1-p1-p2) + 18*log(p1) + 11*log(p2)
In fact it is as easy to solve the more general problem: maximise f over C where
f(p1,..,pN) = b*log(1-p1-..-pN) + Sum{ a[j]*log(p[j]) }
where b and each a[j] is positive and
C = { (p1,..,pN) | 0 < p[j], j=1..N and p1+p2+..+pN < 1 }
The critical point occurs where all the partial derivatives of f are zero, which is at
-b/(1-p1-..-pN) + a[j]/p[j] = 0   j=1..N
which can be written as
b*p[j] + a[j]*(p1+..+pN) = a[j]   j=1..N
or
M*p = a
where M = b*I + a*Ones', and Ones is a vector with each component 1
The inverse of M is
inv(M) = (1/b)*(I - a*Ones'/(b + Ones'*a))
Thus the unique critical point is
p^ = inv(M)*a
= a/(b + Sum{ a[i] })
Since there is a maximum, and only one critical point, the critical point must be the maximum.
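As a quick numeric sanity check of the closed-form result (a sketch, with b = 299 and a = c(18, 11) read off the simplified objective above), compare the analytic maximiser a/(b + sum(a)) against a numerical maximisation of the log-objective:
a <- c(18, 11)
b <- 299
p_analytic <- a / (b + sum(a))   # p1 ~ 0.0549, p2 ~ 0.0335

#maximise the log of F numerically; fnscale = -1 turns optim() into a maximiser
f_log <- function(p) b * log(1 - sum(p)) + sum(a * log(p))
p_numeric <- optim(c(0.1, 0.1), f_log, method = "L-BFGS-B",
                   lower = 1e-6, upper = 0.4,
                   control = list(fnscale = -1))$par
all.equal(p_analytic, p_numeric, tolerance = 1e-3)   # should be TRUE (up to tolerance)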
Based on Erwin Kalvelagen's comment: Redefine your function event_prob:
event_prob = function(p) {
p1 = p[1]
p2 = p[2]
x = ((1-p1-p2)^4)^67 *
((1-p1-p2)^3*p2)^5 *
((1-p1-p2)^3*p1)^2 *
((1-p1-p2)^2*p1*p2)^3 *
((1-p1-p2)^2*p1^2) *
((1-p1-p2)*p1^2*p2)^2 *
(p1^3*p2) *
(p1^4)
return(x)
}
You may want to set limits to ensure that p1 and p2 fulfill your constraints, and note that optim() minimises by default, so you need control = list(fnscale = -1) to maximise. Also start from an interior point where the objective is non-zero (at c(0.5, 0.5) the (1-p1-p2) factor makes F exactly 0):
optim(c(0.1, 0.1), event_prob, method = "L-BFGS-B", lower = 0, upper = 1,
      control = list(fnscale = -1))

deSolve ODE Not Working with Differential Equations (Calculates NA)

I have been using method deSolve::ode45 which has been working until I made a few necessary changes to my equations. Does anyone know why the ODE solver is not working? I have tried running with ode45 as well as the default ode method and neither work. Please let me know if any further explanation would be helpful.
I have checked over the differential equations and I am confident they are correct.
The equations used are as follows:
CCHFModel = function(t,x,params)
{
# get SIR values
SH <- x[1]
EH <- x[2]
IA <- x[3]
IS <- x[4]
RH <- x[5]
ST <- x[6]
IT <- x[7]
SC <- x[9]
IC <- x[10]
RC <- x[11]
# Load values ----
# Beta values
betaHHA = params["betaHHA"]
betaHHS = params["betaHHS"]
betaTH = params["betaTH"]
betaCH = params["betaCH"]
betaTC = params["betaTC"]
betaCT = params["betaCT"]
betaTT = params["betaTT"]
# Gamma value
gamma = params["gamma"]
# death rates
muH = params["muH"]
muT = params["muT"]
muC = params["muC"]
# birth rates
piH = params["piH"]
piT = params["piT"]
piC = params["piC"]
# incubation
deltaHS = params["deltaHS"]
deltaHA = params["deltaHA"]
# recovery rate
alphaA = params["alphaA"]
alphaS = params["alphaS"]
alphaC = params["alphaC"]
# total population
NH = (SH + IA + IS + EH + RH) + (piH * SH) - (muH * SH)
NT = (ST + IT) + (piT * ST) - (muT * ST)
NC = (SC + IC + RC) + (piC * SC) - (muH * SC)
# tick carrying Capacity
# KT = NC * 130 # 130 ticks per carrier max
#computations ----
dSHdt <- (piH * NH) - (betaHHA * IA + betaHHS * IS + betaCH * IC + betaTH * IT)*(SH/NH) - (muH * SH)
dEHdt <- (betaHHA * IA + betaHHS * IS + betaCH * IC + betaTH * IT)*(SH/NH) - ((deltaHA + muH)*EH)
dIAdt <- (deltaHA * EH) - ((alphaA + muH + deltaHS) * IA)
dISdt <- (deltaHS * IA) - ((alphaS + muH + gamma) * IS)
dRHdt <- alphaA * IA + alphaS * IS - muH*RH
dSTdt <- (piT * NT) - (betaTT * IT + betaCT * IC)*(ST/NT) - (muT * ST)
dITdt <- (betaTT * IT + betaCT * IC)*(ST/NT) - (muT * IT)
dSCdt <- (piC * NC) - (betaTC * IT)*(SC/NC) - (muC * SC)
dICdt <- (betaTC * IT)*(SC/NC) - ((alphaC +muC) * IC)
dRCdt <- (alphaC * IC) - (muC * RC)
# return results
list(c(dSHdt, dEHdt, dIAdt, dISdt, dRHdt, dSTdt, dITdt, dSCdt, dICdt, dRCdt))
}
I run the ODE solver using:
defaultParms = c(betaHHA = .0413,
betaHHS = .0413,
betaTH = .2891,
betaCH = .0826,
betaTC = (1/365),
betaCT = 59/365,
betaTT = ((1/(365 * 2)) * .04) * 280,
gamma = 1/10,
muH = (1/(365 * 73)),
muT = (1/(365 * 2)),
muC = (1/(11 * 365)),
piH = 1.25/(73 * 365),
piT = 4.5/730,
piC = 1/(11 * 365),
deltaHS = 1/3,
deltaHA = 1/2,
alphaA = 1/17,
alphaS = 1/17,
alphaC = 1/7)
# time to start solution
t = seq(from = 0, to = 365, by = 0.1)
#initialize initial conditions
initialConditions = c(SH = 10000, EH = 5, IA = 5, IS = 10, RH = 2, ST = 80000, IT = 50, SC = 30000, IC = 5, RC = 1)
library(deSolve)   # for ode()
library(magrittr)  # provides %>%
dataSet = ode(y = initialConditions, times = t, func = CCHFModel, parms = defaultParms) %>%
  as.data.frame()
After running this all the output following the initial conditions is NA.
This is due to a typo - you misnumbered the translation of input values in the first section of your code (i.e., you skipped x[8]). I will go through two (hopefully) useful exercises, first explaining how I debugged this and then showing how to rewrite your function to make it less error-prone ...
debugging
Try running the gradient function for t=0, x=<initial conditions>:
CCHFModel(0,initialConditions, defaultParms)
## piH betaHHA deltaHA deltaHS alphaA piT
## -15.02882327 12.62349834 0.53902803 0.07805607 0.88227788 385.31052332
## betaTT piC betaTC alphaC
## 0.85526763 NA NA NA
Hmm, we already see we have a problem. Why are the last three elements of the computed gradients NA?
Add browser() near the end of the function (before the dSCdt <- ... line) so we can take a closer look. Redefine the function, and try computing the gradient again.
When we get there and print out some of the quantities involved in the computation we see that both NC and RC are NA ... we can also see that an NA value of RC will cause NC to be NA, so let's check the definition of RC ...
aha! RC is defined as x[11], but length(initialConditions) is only 10 ... and a closer look shows that we missed x[8]. Redefining properly gives non-NA values throughout (I don't know if they're correct, but at least they're not NA).
error-proofing (1)
Although using [] or [[]] to extract elements of a vector usually gives equivalent answers, you should always use [[]] when you want to extract a single element (scalar) from a vector. Here's why:
initialConditions[11] ## NA
initialConditions[[11]] ## Error in x[[11]] : subscript out of bounds
If you use [], the NA propagates through your code and you have to hunt down the original source. If you use [[]], R fails right away and tells you where the problem is. An additional benefit is that [] propagates the names of the vector elements in a way that doesn't usually make sense (take a look at the names of the output in "debugging/1" above ...)
error-proofing (2)
You can avoid all of the tedious and error-prone unpacking of the parameter and state vectors by replacing the unpacking code (everything before the computation of total populations) with
comb <- c(as.list(x), as.list(params))
attach(comb)
on.exit(detach(comb))
Provided that your parameter and state vectors are properly named (and there are no names that overlap between them), this will create a named list and allow the elements to be looked up by name within your function; on.exit(detach(comb)) makes sure that everything gets cleaned up properly at the end. (You will see recommendations to use with() to do this; I prefer the strategy here because it makes debugging within the function [if necessary] easier. But as @tpetzoldt notes in comments, you should always pair attach(...) with on.exit(detach(...)); otherwise things get very confusing and messy ...)
At the end of the function I would use
g <- c(dSHdt, dEHdt, dIAdt, dISdt, dRHdt, dSTdt, dITdt, dSCdt, dICdt, dRCdt)
names(g) <- names(x)
list(g)
to make sure the gradient vector is properly labeled, which makes troubleshooting easier.
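For illustration only, here is a minimal toy gradient function (state names, equations, and parameter values are invented) that combines the attach()/on.exit(detach()) recipe with the named-gradient idea above:
library(deSolve)

toy_model <- function(t, x, params) {
  comb <- c(as.list(x), as.list(params))
  attach(comb)
  on.exit(detach(comb))
  dXdt <- a * X - b * X * Y
  dYdt <- b * X * Y - d * Y
  g <- c(dXdt, dYdt)
  names(g) <- names(x)   # label the gradient like the state vector
  list(g)
}

ode(y = c(X = 1, Y = 0.5), times = 0:20, func = toy_model,
    parms = c(a = 0.5, b = 0.2, d = 0.1))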

Normalizing constant for beta distribution with discrete prior : R code query

I am currently going through Bayesian Thinking with R by Jim Albert. I have a query about his code for his example with a beta likelihood and discrete prior. His code for calculating the posterior is:
pdisc <- function (p, prior, data) {
  s = data[1] # successes
  f = data[2] # failures
  #############
  p1 = p + 0.5 * (p == 0) - 0.5 * (p == 1)
  like = s * log(p1) + f * log(1 - p1)
  like = like * (p > 0) * (p < 1) - 999 * ((p == 0) * (s > 0) + (p == 1) * (f > 0))
  like = exp(like - max(like))
  #############
  product = like * prior
  post = product/sum(product)
  return(post)
}
My query is about the highlighted bit of code for calculating the likelihood and what the logic behind it is (not explained in the book). I'm aware of the pdf for the beta distribution, and that the log likelihood will be proportional to s * log(p1) + f * log(1 - p1) but it is not clear what the following 2 lines are doing - I imagine it's something to do with the normalizing constant, but again there isn't an explanation for this in the book.
The line
like = like * (p > 0) * (p < 1) - 999 * ((p == 0) * (s > 0) + (p == 1) * (f > 0))
takes care of the edge cases when you have prior probability at 0 or 1. Basically, if p=0 and any successes are observed then like=-999 and if p=1 and any failures are observed then like=-999. I would have preferred to use -Inf rather than -999 as that is what the log likelihood is in those cases.
The second line
like = exp(like - max(like))
is a numerically stable way to exponentiate when only the relative differences in the logged values are important. If like were really small, e.g. you had lots of successes and failures, then it is possible that exp(like) would be represented as a 0 vector in a computer. Only the relative differences are important here because you renormalize the product to sum to 1 when constructing the posterior probabilities.
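A tiny illustration of the underflow issue (the numbers are invented):
like <- c(-1000, -1001, -1005)   # hypothetical log-likelihoods at three prior values
exp(like)                        # underflows: all three are 0 in double precision
exp(like - max(like))            # 1.000, 0.368, 0.007 -- the relative weights survive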

Quadratic Bezier Curve: Calculate t given x

Good day. I am using a Quadratic Bezier Curve with the following configurations:
Start Point P1 = (1, 2)
Anchor Point P2 = (1, 8)
End Point P3 = (10, 8)
I know that given a t, I can solve for x and y using the following equations:
t = 0.5; // given example value
x = (1 - t) * (1 - t) * P1.x + 2 * (1 - t) * t * P2.x + t * t * P3.x;
y = (1 - t) * (1 - t) * P1.y + 2 * (1 - t) * t * P2.y + t * t * P3.y;
where P1.x is the x coordinate of P1, and so on.
What I've tried so far: given an x value, I solve for t using WolframAlpha and then plug that t into the y equation to get my x and y point.
However, I want to automate finding t and then y. I have a formula to get x and y given a t. However, I don't have a formula to get t based on x. I'm a bit rusty with my algebra and expanding the first equation to isolate t doesn't look too easy.
Does anyone have a formula to get t based on x? My google search skills are failing me as of now.
I think it's also worth noting that my Bezier curve faces right.
Any help will be very much appreciated. Thanks.
The problem is that what you want to solve is not a function in general:
for any t there is just one (x,y) pair,
but for any x there can be 0, 1, 2, or infinitely many solutions for t.
I would do this iteratively:
you can already get any point p(t) = Bezier(t), so iterate over t to minimize the distance |p(t).x - x|
for(t=0.0,dt=0.1;t<=1.0;t+=dt)
find all local mins of d=|p(t).x-x|
so when d starts rising again, set dt *= -0.1 and stop if |dt| < 1e-6 or any other threshold. Also stop if t leaves the interval <0,1>. Remember the solution in a list, then restore the original t, dt and reset the local-minimum search variables
process all local mins
eliminate all that have a bigger distance than some threshold/accuracy, compute y, and do what you need with the point ...
It is much slower than the algebraic approach, but you can use it for any curve, not just a quadratic one.
Usually cubic curves are used, and doing this algebraically with them is a nightmare.
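A rough R sketch of a simplified variant of this idea (a plain scan-and-refine over t that finds a single solution rather than all local minima; the curve and target x are taken from the question, the loop itself is only an illustration):
#x(t) for the quadratic Bezier in the question: P1 = (1, 2), P2 = (1, 8), P3 = (10, 8)
bx <- function(t) (1 - t)^2 * 1 + 2 * (1 - t) * t * 1 + t^2 * 10
x_target <- 5

t_grid <- seq(0, 1, by = 0.1)
for (i in 1:6) {                                   # each pass shrinks the step 10x
  best <- t_grid[which.min(abs(bx(t_grid) - x_target))]
  step <- t_grid[2] - t_grid[1]
  t_grid <- seq(max(0, best - step), min(1, best + step), by = step / 10)
}
best                                               # ~0.667; plug into y(t) for the point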
Look at your Bernstein polynomials B[i]; you have...
x = SUM_i ( B[i](t) * P[i].x )
...where...
B[0](t) = t^2 - 2*t + 1
B[1](t) = -2*t^2 + 2*t
B[2](t) = t^2
...so you can rearrange (assuming I did this right)...
0 = (P[0].x - 2*P[1].x + P[2].x) * t^2 + (-2*P[0].x + 2*P[1].x) * t + P[0].x - x
Now you should just be able to use the quadratic formula to find if the solutions for t exist (i.e., are real, not complex), and what they are.
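In R, a small sketch of this quadratic-formula route (the function name is made up; P1/P2/P3 follow the question's naming, so P1 here is P[0] above):
bezier_t_for_x <- function(x, P1x, P2x, P3x) {
  a  <- P1x - 2 * P2x + P3x            # t^2 coefficient
  b  <- -2 * P1x + 2 * P2x             # t coefficient
  cc <- P1x - x                        # constant term
  if (abs(a) < 1e-12) {                # degenerate case: x(t) is linear in t
    t <- -cc / b
  } else {
    disc <- b^2 - 4 * a * cc
    if (disc < 0) return(numeric(0))   # no real solution: the curve never reaches this x
    t <- (-b + c(-1, 1) * sqrt(disc)) / (2 * a)
  }
  t[t >= 0 & t <= 1]                   # keep only parameters on the curve segment
}

bezier_t_for_x(5, 1, 1, 10)            # curve from the question; returns t = 2/3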
import numpy as np
import matplotlib.pyplot as plt
#Control points
p0=(1000,2500); p1=(2000,-1500); p2=(5000,3000)
#x-coordinates to fit
xcoord = [1750., 2750., 3950.,4760., 4900.]
# t variable with as few points as needed, considering accuracy. I found 30 is good enough
t = np.linspace(0,1,30)
# calculate coordinates of quadratic Bezier curve
x = (1 - t) * (1 - t) * p0[0] + 2 * (1 - t) * t * p1[0] + t * t * p2[0];
y = (1 - t) * (1 - t) * p0[1] + 2 * (1 - t) * t * p1[1] + t * t * p2[1];
# find the closest points to each x-coordinate. Interpolate y-coordinate
ycoord=[]
for ind in xcoord:
    for jnd in range(len(x[:-1])):
        if ind >= x[jnd] and ind <= x[jnd+1]:
            ytemp = (ind-x[jnd])*(y[jnd+1]-y[jnd])/(x[jnd+1]-x[jnd]) + y[jnd]
            ycoord.append(ytemp)
plt.figure()
plt.xlim(0, 6000)
plt.ylim(-2000, 4000)
plt.plot(p0[0],p0[1],'kx', p1[0],p1[1],'kx', p2[0],p2[1],'kx')
plt.plot((p0[0],p1[0]),(p0[1],p1[1]),'k:', (p1[0],p2[0]),(p1[1],p2[1]),'k:')
plt.plot(x,y,'r', x, y, 'k:')
plt.plot(xcoord, ycoord, 'rs')
plt.show()

Roots of a Quartic Function

I came across a situation doing some advanced collision detection, where I needed to calculate the roots of a quartic function.
I wrote a function that seems to work fine using Ferrari's general solution as seen here: http://en.wikipedia.org/wiki/Quartic_function#Ferrari.27s_solution.
Here's my function:
private function solveQuartic(A:Number, B:Number, C:Number, D:Number, E:Number):Array{
// For paramters: Ax^4 + Bx^3 + Cx^2 + Dx + E
var solution:Array = new Array(4);
// Using Ferrari's formula: http://en.wikipedia.org/wiki/Quartic_function#Ferrari.27s_solution
var Alpha:Number = ((-3 * (B * B)) / (8 * (A * A))) + (C / A);
var Beta:Number = ((B * B * B) / (8 * A * A * A)) - ((B * C) / (2 * A * A)) + (D / A);
var Gamma:Number = ((-3 * B * B * B * B) / (256 * A * A * A * A)) + ((C * B * B) / (16 * A * A * A)) - ((B * D) / (4 * A * A)) + (E / A);
var P:Number = ((-1 * Alpha * Alpha) / 12) - Gamma;
var Q:Number = ((-1 * Alpha * Alpha * Alpha) / 108) + ((Alpha * Gamma) / 3) - ((Beta * Beta) / 8);
var PreRoot1:Number = ((Q * Q) / 4) + ((P * P * P) / 27);
var R:ComplexNumber = ComplexNumber.add(new ComplexNumber((-1 * Q) / 2), ComplexNumber.sqrt(new ComplexNumber(PreRoot1)));
var U:ComplexNumber = ComplexNumber.pow(R, 1/3);
var preY1:Number = (-5 / 6) * Alpha;
var RedundantY:ComplexNumber = ComplexNumber.add(new ComplexNumber(preY1), U);
var Y:ComplexNumber;
if(U.isZero()){
var preY2:ComplexNumber = ComplexNumber.pow(new ComplexNumber(Q), 1/3);
Y = ComplexNumber.subtract(RedundantY, preY2);
} else{
var preY3:ComplexNumber = ComplexNumber.multiply(new ComplexNumber(3), U);
var preY4:ComplexNumber = ComplexNumber.divide(new ComplexNumber(P), preY3);
Y = ComplexNumber.subtract(RedundantY, preY4);
}
var W:ComplexNumber = ComplexNumber.sqrt(ComplexNumber.add(new ComplexNumber(Alpha), ComplexNumber.multiply(new ComplexNumber(2), Y)));
var Two:ComplexNumber = new ComplexNumber(2);
var NegativeOne:ComplexNumber = new ComplexNumber(-1);
var NegativeBOverFourA:ComplexNumber = new ComplexNumber((-1 * B) / (4 * A));
var NegativeW:ComplexNumber = ComplexNumber.multiply(W, NegativeOne);
var ThreeAlphaPlusTwoY:ComplexNumber = ComplexNumber.add(new ComplexNumber(3 * Alpha), ComplexNumber.multiply(new ComplexNumber(2), Y));
var TwoBetaOverW:ComplexNumber = ComplexNumber.divide(new ComplexNumber(2 * Beta), W);
solution["root1"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.add(W, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.add(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
solution["root2"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.subtract(NegativeW, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.subtract(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
solution["root3"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.subtract(W, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.add(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
solution["root4"] = ComplexNumber.add(NegativeBOverFourA, ComplexNumber.divide(ComplexNumber.add(NegativeW, ComplexNumber.sqrt(ComplexNumber.multiply(NegativeOne, ComplexNumber.subtract(ThreeAlphaPlusTwoY, TwoBetaOverW)))), Two));
return solution;
}
The only issue is that I seem to get a few exceptions. Most notably when I have two real roots, and two imaginary roots.
For example, this equation:
y = 0.9604000000000001x^4 - 5.997600000000001x^3 + 13.951750054511718x^2 - 14.326264455924333x + 5.474214401412618
Returns the roots:
1.7820304835380467 + 0i
1.34041662585388 + 0i
1.3404185025061823 + 0i
1.7820323472855648 + 0i
If I graph that particular equation, I can see that the actual roots are closer to 1.2 and 2.9 (approximately). I can't dismiss the four incorrect roots as random, because they're actually two of the roots for the equation's first derivative:
y = 3.8416x^3 - 17.9928x^2 + 27.9035001x - 14.326264455924333
Keep in mind that I'm not actually looking for the specific roots to the equation I posted. My question is whether there's some sort of special case that I'm not taking into consideration.
Any ideas?
For finding roots of polynomials of degree >= 3, I've always had better results using Jenkins-Traub ( http://en.wikipedia.org/wiki/Jenkins-Traub_algorithm ) than explicit formulas.
I do not know why Ferrari's solution does not work, but I tried to use the standard numerical method (create a companion matrix and compute its eigenvalues), and I obtain the correct solution, i.e., two real roots at 1.2 and 1.9.
This method is not for the faint of heart. After constructing the companion matrix of the polynomial, you run the QR algorithm to find the eigenvalues of that matrix. Those are the zeroes of the polynomial.
I suggest you use an existing implementation of the QR algorithm since a good deal of it is closer to kitchen recipe than algorithmics. But it is, I believe, the most widely used algorithm to compute eigenvalues, and thereby, roots of polynomials.
You can see my answer to a related question. I support the view of Olivier: the way to go may just be the companion matrix / eigenvalue approach (very stable, simple, reliable, and fast).
Edit
I guess it does not hurt if I reproduce the answer here, for convenience:
The numerical solution for doing this many times in a reliable, stable manner involves: (1) forming the companion matrix, (2) finding the eigenvalues of the companion matrix.
You may think this is a harder problem to solve than the original one, but this is how the solution is implemented in most production code (say, Matlab).
For the polynomial:
p(t) = c0 + c1 * t + c2 * t^2 + t^3
the companion matrix is:
[[0 0 -c0],[1 0 -c1],[0 1 -c2]]
Find the eigenvalues of such matrix; they correspond to the roots of the original polynomial.
For doing this very fast, download the eigenvalue subroutines from LAPACK, compile them, and link them to your code. Do this in parallel if you have too many (say, about a million) sets of coefficients. You could use the QR decomposition, or any other stable methodology for computing eigenvalues (see the Wikipedia entry on "matrix eigenvalues").
Notice that the coefficient of t^3 is one; if this is not the case in your polynomials, you will have to divide the whole thing by that coefficient and then proceed.
Good luck.
Edit: Numpy and octave also depend on this methodology for computing the roots of polynomials. See, for instance, this link.
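To make this concrete, a small R sketch (an illustration, not the answerer's code) for the quartic from the question: make the polynomial monic, build the companion matrix in the form shown above, and take its eigenvalues; base R's polyroot(), a Jenkins-Traub implementation as suggested in another answer, is shown for comparison:
coefs <- c(5.474214401412618, -14.326264455924333, 13.951750054511718,
           -5.997600000000001, 0.9604000000000001)   # c0, c1, c2, c3, c4
monic <- coefs / coefs[5]                            # divide through by the leading coefficient
a <- monic[1:4]                                      # a0..a3 of the monic quartic
C <- cbind(rbind(0, diag(3)), -a)                    # companion matrix (same pattern as the 3x3 above)
eigen(C, only.values = TRUE)$values                  # two real roots near 1.21 and 1.91, plus a complex pair
polyroot(coefs)                                      # same roots via Jenkins-Traub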
The other answers are good and sound advice. However, recalling my experience with the implementation of Ferrari's method in Forth, I think your wrong results are probably caused by 1. a wrong implementation of the necessary and rather tricky sign combinations, 2. not realizing yet that ".. == beta" in floating point should become "abs(.. - beta) < eps", 3. not yet having found out that there are other square roots in the code that may return complex solutions.
For this particular problem my Forth code in diagnostic mode returns:
x1 = 1.5612244897959360787072371026316680470492e+0000 -1.6542769593216835969789894020584464029664e-0001 i
--> -4.2123274051525879873007970023884313331788e-0054 3.4544674220377778501545407451201598284464e-0077 i
x2 = 1.5612244897959360787072371026316680470492e+0000 1.6542769593216835969789894020584464029664e-0001 i
--> -4.2123274051525879873007970023884313331788e-0054 -3.4544674220377778501545407451201598284464e-0077 i
x3 = 1.2078440724224197532447709413299479764843e+0000 0.0000000000000000000000000000000000000000e-0001 i
--> -4.2123274051525879873010733597821943554068e-0054 0.0000000000000000000000000000000000000000e-0001 i
x4 = 1.9146049071693819497220585618954851525216e+0000 -0.0000000000000000000000000000000000000000e-0001 i
--> -4.2123274051525879873013497171759573776348e-0054 0.0000000000000000000000000000000000000000e-0001 i
The text after "-->" follows from backsubstituting the root into the original equation.
For reference, here are Mathematica/Alpha's results to the highest precision I managed to set:
Mathematica:
x1 = 1.20784407242
x2 = 1.91460490717
x3 = 1.56122449 - 0.16542770 i
x4 = 1.56122449 + 0.16542770 i
A good alternative to the methods already mentioned is TOMS Algorithm 326, which is based on the paper "Roots of Low Order Polynomials" by Terence R. F. Nonweiler, CACM (Apr 1968).
This is an algebraic solution to 3rd and 4th order polynomials that is reasonably compact, fast, and quite accurate. It is much simpler than Jenkins Traub.
Be warned however that the TOMS code doesn't work all that well.
This Iowa Hills Root Solver page has code for a Quartic / Cubic root finder that is a bit more refined. It also has a Jenkins Traub type root finder.
