I'm trying to solve this in LPSolve IDE:
/* Objective function */
min: x + y;
/* Variable bounds */
r_1: 2x = 2y;
r_2: x + y = 1.11 x y;
r_3: x >= 1;
r_4: y >= 1;
but the response I get is:
Model name: 'LPSolver' - run #1
Objective: Minimize(R0)
SUBMITTED
Model size: 4 constraints, 2 variables, 5 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
The model is INFEASIBLE
lp_solve unsuccessful after 2 iter and a last best value of 1e+030
How come this can happen when x=1.801801802 and y=1.801801802 are possible solutions here?
How To Find The Solution
Let's do some math.
Your problem is:
min x+y
s.t. 2x = 2y
x + y = 1.11 x y
x >= 1
y >= 1
The first constraint 2x = 2y can be simplified to x=y. We now substitute throughout the problem:
min 2*x
s.t. 2*x = 1.11 x^2
x >= 1
And rearrange:
min 2*x
s.t. 1.11 x^2-2*x=0
x >= 1
From geometry we know that 1.11 x^2 - 2x is an upward-opening parabola whose minimum lies below zero, so it has exactly two roots. Factoring, 1.11 x^2 - 2x = x (1.11 x - 2) = 0, which gives x = 0 and x = 2/1.11 = 200/111.
Only one of these, 200/111 ≈ 1.8018, satisfies the constraint x >= 1.
Why Can't I Find This Solution With My Solver?
The easy way out is to say it's because the x^2 term (x*y before the substitution) is nonlinear. But it goes a little deeper than that. Nonlinear problems can be easy to solve as long as they are convex. A convex problem is one whose constraints form a single, contiguous space such that any line drawn between two points in the space stays within the boundaries of the space.
Your problem is not convex. After the substitution, the constraint 1.11 x^2 - 2x = 0 is satisfied by just two isolated points (x = 0 and x = 200/111), and the line segment connecting them leaves that set. (In the original variables, the constraint x + y = 1.11 x y traces out a curve, and a straight line between two points on the curve likewise leaves it, because the space is curved.) If the constraint were instead 1.11 x^2 - 2x <= 0, then the feasible space would be the interval [0, 200/111], which is convex: all of its points can be connected with straight lines that stay inside it.
Nonconvex problems belong to a broader class of problems known as NP-hard. This means that there is no (and perhaps cannot be any) easy, general way of solving them. We have to be smart.
Solvers that can handle mixed-integer programming (MIP/MILP) can solve many non-convex problems efficiently, as can other techniques such as genetic algorithms. But, beneath the hood, these techniques all rely on glorified guess-and-check.
So your solver fails because the problem is nonconvex, and your solver is neither smart enough to use MIP to guess-and-check its way to a solution nor smart enough to use the quadratic formula.
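For what it's worth, a general-purpose nonlinear solver has no trouble with this particular instance. Here is a minimal sketch in Python, assuming SciPy is available (an illustration only, not the LPSolve model above):

from scipy.optimize import minimize

# minimize x + y  subject to  x = y,  x + y = 1.11*x*y,  x >= 1,  y >= 1
objective = lambda v: v[0] + v[1]
constraints = [
    {"type": "eq", "fun": lambda v: v[0] - v[1]},                       # 2x = 2y
    {"type": "eq", "fun": lambda v: v[0] + v[1] - 1.11 * v[0] * v[1]},  # x + y = 1.11 x y
]
result = minimize(objective, x0=[2.0, 2.0], bounds=[(1, None), (1, None)],
                  constraints=constraints)
print(result.x)  # both components come out near 200/111 ~= 1.8018

Of course, the local method SciPy selects here is itself a form of guess-and-check; it converges because the problem is tiny and the starting point is reasonable.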
How Then Can I Solve The Problem?
In this particular instance, we are able to use mathematics to quickly find a solution because, although the problem is nonconvex, it is part of a class of special cases. Deep thinking by mathematicians has given us a simple way of handling this class.
But consider a few generalizations of the problem:
(a) a x^3+b x^2+c x+d=0
(b) a x^4+b x^3+c x^2+d x+e =0
(c) a x^5+b x^4+c x^3+d x^2+e x+f=0
(a) has up to three real solutions which must be checked (exact solutions are tricky), (b) has up to four (trickier), and (c) has up to five. The formulas for (a) and (b) are much more complex than the quadratic formula, and mathematicians have shown that there is no formula for (c) that can be expressed using "elementary operations". Instead, we have to resort to glorified guess-and-check.
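As a concrete illustration of that guess-and-check: in Python, numpy.roots finds the roots of a quintic like (c) numerically, via the eigenvalues of a companion matrix, rather than by any closed formula (the coefficients below are made up for the example):

import numpy as np
# roots of x^5 - 3x^4 + 2x^3 + 5x^2 - x + 4, found numerically
print(np.roots([1, -3, 2, 5, -1, 4]))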
So the techniques we used to solve your problem don't generalize very well. This is what it means to live in the realm of the nonconvex and NP-hard, and it's a good reason to fund research in mathematics, computer science, and related fields.
Related
I was wondering if there is a nice way (preferably using JuMP) to get all optimal solutions of a linear program (in case there are multiple optimal solutions).
An example
minimize the statistical distance (Kolmogorov distance) between two probability distributions.
min sum_{i=1}^{4} |P[i] - Q[i]| over free variable Q
P = [0.25,0.25,0.25,0.25]
sum_i P[i] = 1
Q[1] + Q[4] = 1
sum_i Q[i] = 1 -> Q[2],Q[3] = 0
Note we can phrase the optimization as a linear program; introducing auxiliary variables S[i], the objective becomes
min sum_i S[i]
subject to
S[i] >= P[i]-Q[i]
S[i] >= Q[i]-P[i]
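For concreteness, here is a minimal sketch of that LP in Python with scipy.optimize.linprog (my own illustration; the question itself is about JuMP/Julia):

import numpy as np
from scipy.optimize import linprog

P = np.array([0.25, 0.25, 0.25, 0.25])
n = len(P)

# decision vector [Q1..Q4, S1..S4]; minimize sum(S)
c = np.concatenate([np.zeros(n), np.ones(n)])

# S[i] >= P[i] - Q[i] and S[i] >= Q[i] - P[i], written as A_ub @ x <= b_ub
A_ub, b_ub = [], []
for i in range(n):
    row = np.zeros(2 * n); row[i] = -1.0; row[n + i] = -1.0
    A_ub.append(row); b_ub.append(-P[i])   # -Q[i] - S[i] <= -P[i]
    row = np.zeros(2 * n); row[i] = 1.0; row[n + i] = -1.0
    A_ub.append(row); b_ub.append(P[i])    #  Q[i] - S[i] <=  P[i]

# Q[1] + Q[4] = 1 and sum(Q) = 1 (together these force Q[2] = Q[3] = 0)
A_eq = np.array([[1, 0, 0, 1, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0]], dtype=float)
b_eq = np.array([1.0, 1.0])

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(res.x[:n], res.fun)  # res.fun = sum|P-Q| = 1.0; the 0.5 below uses the usual 1/2 factor

The solver returns just one optimal vertex, which is exactly the difficulty raised in the question.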
There is no unique solution to this problem; instead, the set of optimal solutions is spanned by
Q1 = [0.75,0,0,0.25]
Q2 = [0.25,0,0,0.75]
Both have the minimal distance of 0.5, and any convex combination of these two solutions is optimal.
I was wondering if there is a nice way to find all these optimal extreme points (points that span the optimal subspace)?
Why am I interested in this: the points that give the maximal Bhattacharyya coefficient (a concave function) lie somewhere in the middle of the optimal subspace of the statistical distance.
So far I've tried to find optimal P,Q pairs (referring to the example I gave) by making the algorithm favor minimizing the distance between P[i] and Q[i], by adding a weight of 1.001 to this term in the sum. It seems to work to some extent, although I can hardly know for sure.
There is an interesting way to enumerate all possible optimal LP solutions (or rather all optimal LP bases) using a standard MIP solver. Basically the algorithm is:
step 1. solve LP/MIP
step 2. if infeasible or if objective starts to deteriorate: stop
step 3. add cuts (constraints) to the model to forbid current optimal solution
step 4. goto step 1
For an example see here.
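As a rough sketch of that loop in Python-style pseudocode (build_model, solve, and add_no_good_cut are hypothetical placeholders, not any particular solver's API):

def enumerate_optimal_solutions(tolerance=1e-6):
    model = build_model()                              # the original LP/MIP
    solutions, best_objective = [], None
    while True:
        status, objective, solution = solve(model)     # step 1
        if status == "infeasible":                     # step 2
            break
        if best_objective is None:
            best_objective = objective
        elif objective > best_objective + tolerance:   # objective deteriorates (minimization)
            break
        solutions.append(solution)
        add_no_good_cut(model, solution)               # step 3: forbid this solution
    return solutions                                   # step 4: loop back to step 1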
LP solvers are not designed to enumerate all optimal solutions. Once you know the optimal objective value, you can define the polyhedron containing all optimal solutions and then use a vertex enumeration algorithm to collect the possibly very large set of extreme points of this polyhedron. All optimal solutions are convex combinations of these extreme points. From Julia, you could use the wrapper for cdd.
I don't know about Julia, but there is a tool called PPL that you can use to determine all the vertices of the solution polyhedron after you have solved the linear program.
See my answer here to a similar question:
Find all alternative basic solutions using existing linear-programming tool.
Suppose I have a general function f(z,a), where z and a are both real, and the function f takes on real values for all z except in some interval (z1,z2), where it becomes complex. How do I determine z1 and z2 (which will be in terms of a) using Mathematica (or is this possible)? What are the limitations?
For a test example, consider the function f[z_,a_]=Sqrt[(z-a)(z-2a)]. For real z and a, this takes on real values except in the interval (a,2a), where it becomes imaginary. How do I find this interval in Mathematica?
In general, I'd like to know how one would go about finding it mathematically for a general case. For a function with just two variables like this, it'd probably be straightforward to do a contour plot of the Riemann surface and observe the branch cuts. But what if it is a multivariate function? Is there a general approach that one can take?
What you have appears to be a Riemann surface parametrized by 'a'. Consider the algebraic (or analytic) relation g(a,z)=0 that would be spawned from this branch of a parametrized Riemann surface. In this case it is simply g^2 - (z - a)*(z - 2*a) == 0. More generally it might be obtained using GroebnerBasis, as below (no guarantee this will always work without some amount of user intervention).
grelation = First[GroebnerBasis[g - Sqrt[(z - a)*(z - 2*a)], {x, a, g}]]
Out[472]= 2 a^2 - g^2 - 3 a z + z^2
A necessary condition for the branch points, as functions of the parameter 'a', is that the zero set for 'g' does not give a (single-valued) function of z in a neighborhood of such points. This in turn means that the partial derivative of this relation with respect to g vanishes (this follows from the implicit function theorem of multivariable calculus). So we find where grelation and its derivative both vanish, and solve for 'z' as a function of 'a'.
Solve[Eliminate[{grelation == 0, D[grelation, g] == 0}, g], z]
Out[481]= {{z -> a}, {z -> 2 a}}
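For this example the elimination can also be done by hand: the relation is g^2 - (z - a)(z - 2a) == 0, and its partial derivative with respect to g is 2g, so the derivative condition forces g == 0, which in turn gives (z - a)(z - 2a) == 0, i.e. z == a or z == 2a, matching the Solve output above.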
Daniel Lichtblau
Wolfram Research
For polynomial systems (and some class of others), Reduce can do the job.
E.g.
In[1]:= Reduce[Element[{a, z}, Reals]
&& !Element[Sqrt[(z - a) (z - 2 a)], Reals], z]
Out[1]= (a < 0 && 2a < z < a) || (a > 0 && a < z < 2a)
This type of approach also works (often giving very complicated solutions for functions with many branch cuts) for other combinations of elementary functions I checked.
To find the branch cuts (as opposed to the simple class of branch points you're interested in) in general, I don't know of a good approach. The best place to find the detailed conventions that Mathematica uses is at the functions.wolfram site.
I do remember reading a good paper on this a while back... I'll try to find it....
That's right! The easiest approach I've seen for branch cut analysis uses the unwinding number. There's a paper "Reasoning about the elementary functions of complex analysis" about this in the journal "Artificial Intelligence and Symbolic Computation". It and similar papers can be found on one of the authors' homepages: http://www.apmaths.uwo.ca/~djeffrey/offprints.html.
For general functions you cannot make Mathematica calculate it. Even for polynomials, finding an exact answer takes time; I believe Mathematica uses some sort of quantifier elimination when it uses Reduce, which takes time.
Without any restrictions on your functions (are they polynomials, continuous, smooth?), one can easily construct functions which Mathematica cannot simplify further:
f[x_,y_] := Abs[Zeta[y+0.5+x*I]]*I
If this function is real for some x and some y with -0.5 < y < 0 or 0 < y < 0.5, then you will have found a counterexample to the Riemann hypothesis, and I'm sure Mathematica cannot give a correct answer.
Let's suppose I am trying to analyze an algorithm and all I can do is run it with different inputs. I can construct a set of points (x,y) as (sample size, run time).
I would like to dynamically categorize the algorithm into a complexity class (linear, quadratic, exponential, logarithmic, etc..)
Ideally I could give an equation that more or less approximates the behavior.
I am just not sure what the best way to do this is.
For any degree polynomial I can create regression curves and come up with some measure of fitness, but I don't really have a clue how I would do that for any nonpolynomial function. It is harder since I don't have any previous knowledge of what shape I should try to fit.
This may be more of a math question than a programming question, but it is very interesting to me. I'm not a mathematician, so there may be a simpler established method to get a reasonable function from a set of points that I just don't know about. Does anyone have any ideas for solving a problem like this? Is there a numerical library for C# that could help me crunch the numbers?
Well, there are not that many complexity classes you really care about, so let's say: linear, quadratic, polynomial (degree > 2), exponential, and logarithmic.
For each of these you could use the largest (x,y) pair to solve for the unknown variable. Let y = f(x) denote the runtime of your algorithm as a function of the sample size. Let's assume that f(1) = 0; if it doesn't, we can always subtract off the value y(1) from each of the y's, which just eliminates the constant term in f(x). Let y(end) denote the last (and largest) value of y in your (x,y) data set.
At this point we can solve for the unknown in each canonical form:
f(x) = c*x
f(x) = c*x^2
f(x) = x^c
f(x) = c^x
f(x) = log(x)/log(c)
Since there is only a single unknown in each equation we can use any point to solve for it. Consider the following data, generated from a polynomial of random degree > 2:
x = [ 1 2 3 4 5 6 7 8 9 10 ];
y = [ 0 6 19 44 81 135 206 297 411 550 ];
If we use the last point to solve for c for each possibility (assuming this gives the least noisy estimate):
550 = c*10 -> c = 55
550 = c*10^2 -> c = 5.5
550 = 10^c -> c = log(550)/log(10) ~= 2.74
550 = c^10 -> c = 550^(1/10) ~= 1.88
550 = log(10)/log(c) -> c = 10^(1/550) ~= 1.0042
We can now compare how well each of these functions fit the remaining data, here is a plot:
I'm new and I can't post images so look at the plot here: http://i.stack.imgur.com/UH6T8.png
The true data is shown with red asterisks, the linear fit with a green line, the quadratic in blue, the polynomial in black, the exponential in pink, and the log fit in green with O's. It should be pretty clear from the residuals which function fits your data best.
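If you want to automate the comparison instead of eyeballing a plot, here is a small Python sketch of the same procedure (assuming NumPy is available; it solves for c from the last point exactly as above, then compares sum-of-squared residuals):

import numpy as np

x = np.arange(1, 11, dtype=float)
y = np.array([0, 6, 19, 44, 81, 135, 206, 297, 411, 550], dtype=float)
xn, yn = x[-1], y[-1]

# (c solved from the last point, model as a function of c)
models = {
    "linear      f(x) = c*x":           (yn / xn,                 lambda c: c * x),
    "quadratic   f(x) = c*x^2":         (yn / xn**2,              lambda c: c * x**2),
    "polynomial  f(x) = x^c":           (np.log(yn) / np.log(xn), lambda c: x**c),
    "exponential f(x) = c^x":           (yn ** (1.0 / xn),        lambda c: c**x),
    "logarithmic f(x) = log(x)/log(c)": (xn ** (1.0 / yn),        lambda c: np.log(x) / np.log(c)),
}

for name, (c, f) in models.items():
    sse = np.sum((y - f(c)) ** 2)
    print(f"{name:35s} c = {c:8.4f}   sum of squared residuals = {sse:12.1f}")

The form with the smallest residual over the remaining points is the best candidate; with noisy timings you would want to repeat this for several of the larger points rather than trust a single one.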
Curve fitting used to be an art, but is now somehow decadent :) (That's a joke for the physicists around)
A lot of progress has been made, that allows simple mortals to guess (some) non trivial functional dependencies.
I'll not enter into a description of the methods and limitations, but instead I'll refer you to eureqa, which is a very nice piece of software developed at Cornell.
Eureqa (pronounced "eureka") is a software tool for detecting equations and hidden mathematical relationships in your data. Its goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use. Look for the program download, video tutorial, user forum, and other reference materials.
I tried eureqa several times with very good results if the models are not too complicated. I think it is good enough for distinguishing between polynomials, logs and exponentials.
HTH!
Post Scriptum:
Regrettably the software isn't free anymore :(
There is a question I am stuck on using the following formula for the unipolar transfer function:
f(net) = 1 / (1 + e^(-net))
The example has the following:
out = 1 / (1 + e^(-3.75)) = 0.977
How do we arrive at 0.977?
What is e?
e = 2.71828... is the base of natural logarithms. It's a mathematical constant that comes up in many different equations, similar to π. You will see it all the time when doing exponents and logarithms.
Plug it into your equation and you get 0.977.
While factually correct, the other responses merely provide the value of e and confirm the underlying computation. This type of sigmoid function is so ubiquitous in neural networks that some additional insight may be welcome.
Essentially the exponential function (e to the x power) has a very characteristic curve:
Mostly flat at zero (very slightly above zero, actually), from -infinity to about -2
An increasingly sharp turn towards the vertical, between about -2 and +4
Quasi "vertical", with values in excess of 150 and increasingly huge, from +5 to infinity
As a result exponential curves are very useful for producing "S-shaped" functions; BTW, "S" is Sigma in Greek which supplied the etymology for "sigmoid". Such functions are often patterned on the formula shown in the question:
1/(1 + e^-x)
where x is the variable. Typically such functions also include constants aimed at stretching the range (the input zone where changes in x are significant) and/or at modifying the curve in this middle zone.
The result of such functions is that, up to a particular value of the input, the function is quasi constant; then, for a particular range of inputs, the function provides an increasing output; and finally, past the upper value of the range, the function is quasi constant again. Looking in more detail, such sigmoids have a point of inflection, where the curvature of the output reverses sign; this is where the output changes fastest, with the changes becoming relatively slower on either side of it.
In turn, such S-shaped curves (1) are very useful for normalizing the output of neural network neurons, or more generally, for normalizing various numeric values during processes of various natures. Intuitively these correspond to a "sweet spot" or a "sweet range" of the underlying neuron or device.
(1) Or also, possibly, "step-down" shaped curves, i.e. curves with a mostly constant high value, a decreasing value within the mid-range, and a low mostly constant value thereafter.
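A quick numeric illustration of that shape, in plain Python (nothing beyond the standard library):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 4))
# prints roughly 0.0025, 0.1192, 0.5, 0.8808, 0.9975: quasi constant at the
# extremes, with the fastest change around the inflection point at x = 0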
e is Euler's number == 2.718281828....
If you raise e to the -3.75 power, add one to it, and take the inverse, you'll get precisely 0.977022630....
'e' is the base of the natural logarithm function; its value is the sum of the infinite series 1/n! for n from 0 to infinity. The exponential function e^x is available in the C standard library and in the Java Math package as the exp() function.
If you evaluate 1/(1+exp(-3.75)) you will get 0.977
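In Python, for instance (just to confirm the arithmetic):

import math
print(1.0 / (1.0 + math.exp(-3.75)))  # prints ~0.977022630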
Gaffer on Games has a great article about using RK4 integration for better game physics. The implementation is straightforward, but the math behind it confuses me. I understand derivatives and integrals on a conceptual level, but haven't manipulated equations in a long while.
Here's the brunt of Gaffer's implementation:
void integrate(State &state, float t, float dt)
{
    // Sample the derivative four times across the step: once at the start,
    // twice at the midpoint, and once at the end.
    Derivative a = evaluate(state, t, 0.0f, Derivative());
    Derivative b = evaluate(state, t + dt*0.5f, dt*0.5f, a);
    Derivative c = evaluate(state, t + dt*0.5f, dt*0.5f, b);
    Derivative d = evaluate(state, t + dt, dt, c);

    // Weighted average of the samples (weights 1/6, 1/3, 1/3, 1/6).
    const float dxdt = 1.0f/6.0f * (a.dx + 2.0f*(b.dx + c.dx) + d.dx);
    const float dvdt = 1.0f/6.0f * (a.dv + 2.0f*(b.dv + c.dv) + d.dv);

    state.x = state.x + dxdt * dt;
    state.v = state.v + dvdt * dt;
}
Can anybody explain in simple terms how RK4 works? Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f? How is averaging derivatives up to the 4th order different from doing a simple Euler integration with a smaller timestep?
After reading the accepted answer below, and several other articles, I have a grasp on how RK4 works. To answer my own questions:
Can anybody explain in simple terms how RK4 works?
RK4 takes advantage of the fact that we can get a much better approximation of a function if we use its higher-order derivatives rather than just the first or second derivative. That's why the Taylor series converges much faster than Euler approximations. (Take a look at the animation on the right side of that page.)
Specifically, why are we averaging the derivatives at 0.0f, 0.5f, 0.5f, and 1.0f?
The Runge-Kutta method is an approximation of a function that samples derivatives at several points within a timestep, unlike the Taylor series, which only samples derivatives at a single point. After sampling these derivatives we need to know how to weigh each sample to get the closest approximation possible. An easy way to do this is to pick constants that coincide with the Taylor series, which is how the constants of a Runge-Kutta equation are determined.
This article made it clearer for me. Notice how (15) is the Taylor series expansion while (17) is the Runge-Kutta derivation.
How is averaging derivatives up to the 4th order different from doing a simple euler integration with a smaller timestep?
Mathematically, it converges much faster than doing many Euler approximations. Of course, with enough Euler approximations we can gain equal accuracy to RK4, but the computational power needed doesn't justify using Euler.
This may be a bit oversimplified so far as actual math, but meant as an intuitive guide to Runge Kutta integration.
Given some quantity at some time t1, we want to know the quantity at another time t2. With a first-order differential equation, we can know the rate of change of that quantity at t1. There is nothing else we can know for sure; the rest is guessing.
Euler integration is the simplest way to guess: linearly extrapolate from t1 to t2, using the precisely known rate of change at t1. This usually gives a bad answer. If t2 is far from t1, this linear extrapolation will fail to match any curvature in the ideal answer. If we take many small steps from t1 to t2, we'll have the problem of subtraction of similar values. Roundoff errors will ruin the result.
So we refine our guess. One way is to go ahead and do this linear extrapolation anyway, then, hoping it's not too far off from the truth, use the differential equation to compute an estimate of the rate of change at t2. This, averaged with the (accurate) rate of change at t1, better represents the typical slope of the true answer between t1 and t2. We use this to make a fresh linear extrapolation from t1 to t2. It's not obvious whether we should take the simple average or give more weight to the rate at t1 (without doing the math to estimate errors), but there is a choice here. In any case, it's a better answer than Euler gives.
Perhaps better, make our initial linear extrapolation to a point in time midway between t1 and t2, and use the differential equation to compute the rate of change there. This gives roughly as good an answer as the average just described. Then use this for a linear extrapolation from t1 to t2, since our purpose is to find the quantity at t2. This is the midpoint algorithm.
You can imagine using the midpoint estimate of the rate of change to make another linear extrapolation of the quantity from t1 to the midpoint. With the differential equation we get a better estimate of the slope there. Using this, we end by extrapolating from t1 all the way to t2, where we want an answer. This is the Runge-Kutta algorithm.
Could we do a third extrapolation to the midpoint? Sure, it's not illegal, but detailed analysis shows diminishing improvement, such that other sources of error dominate the final result.
Runge-Kutta applies the differential equation at the initial point t1, twice at the midpoint, and once at the final point t2. The in-between points are a matter of choice. It is possible to use other points between t1 and t2 for making those improved estimates of the slope. For example, we could use t1, a point one third of the way toward t2, another 2/3 of the way toward t2, and t2. The weights for the average of the four derivatives would be different. In practice this doesn't really help, but it might have a place in testing, since it ought to give the same answer while producing a different set of roundoff errors.
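Here is the same idea reduced to a minimal Python sketch for a single first-order ODE dy/dt = f(t, y) (a generic illustration, not the game-physics code from the question):

def rk4_step(f, t, y, dt):
    k1 = f(t, y)                          # slope at the starting point
    k2 = f(t + dt / 2, y + dt / 2 * k1)   # slope at the midpoint, using k1
    k3 = f(t + dt / 2, y + dt / 2 * k2)   # slope at the midpoint again, using k2
    k4 = f(t + dt, y + dt * k3)           # slope at the end point, using k3
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)   # weights 1/6, 1/3, 1/3, 1/6

# e.g. dy/dt = y with y(0) = 1: one step of size 0.1 gives ~1.10517, close to e^0.1
print(rk4_step(lambda t, y: y, 0.0, 1.0, 0.1))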
As to your question why: I recall once writing a cloth simulator where the cloth was a series of springs interconnected at nodes. In the simulator, the force exerted by the spring is proportional to how far the spring is stretched. The force causes acceleration at the node, which causes velocity which moves the node which stretches the spring. There are two integrals (integrating acceleration to get velocity, and integrating velocity to get position) and if they are inaccurate, the errors snowball: Too much acceleration causes too much velocity which causes too much stretch which causes even more acceleration, making the whole system unstable.
It is difficult to explain without graphics, but I'll try: Say you have f(t), where f(0) = 10, f(1) = 20, and f(2) = 30.
A proper integration of f(t) over the interval 0 < t < 1 would give you the surface under the graph of f(t) over that interval.
The rectangle rule integration approximates that surface with a rectangle whose width is the time delta and whose height is the new value of f(t); so in the interval 0 < t < 1 it will yield 20 * 1 = 20, and in the next interval 1 < t < 2 it will yield 30 * 1 = 30.
Now if you were to plot these points and draw a line through them, you'll see that the region under it is actually a trapezoid with a surface of 40 units over 0 < t < 2, not the 50 the rectangle rule gives, and therefore that Euler-style integration is inadequate.
To get a more accurate estimate of the surface (integral) you can take smaller intervals of t, evaluating, for example, f(0), f(0.5), f(1), f(1.5), and f(2).
If you're still following me, the RK4 method is then simply a way of estimating values of f(t) for t0 < t < t0+dt invented by people smarter than myself for getting accurate estimates of the integral.
(but as others have said, read the Wikipedia article for a more detailed explanation. RK4 is in the category of numerical integration)
RK4, in the simplest sense, builds an approximating function for each time step based on four slopes and points: your initial condition at starting point A; a second approximated slope B, taken half a time step ahead using point A and the slope from A; a third approximation C, which carries a correction to the slope at B to reflect the changing shape of your function; and finally a final slope based on the corrected slope at point C, taken over the full time step.
So basically this method lets you calculate using a starting point, an averaged midpoint (which has corrections built into both of its parts to adjust for the shape), and a doubly corrected endpoint. This makes the effective contribution of each data point 1/6, 1/3, 1/3, and 1/6, so most of your answer is based on the corrections for the shape of your function.
It turns out that the order of an RK approximation (Euler is considered RK1) describes how its accuracy scales as the time step shrinks.
For RK1 the relationship is linear: shrinking the time step by a factor of 10 reduces the error by roughly a factor of 10.
For RK4, shrinking the time step by a factor of 10 reduces the error by roughly a factor of 10^4. So while your calculation time grows only linearly with the number of steps, your accuracy improves with the fourth power of the step size.
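A quick, self-contained check of that scaling on dy/dt = y over [0, 1] (exact answer e), in Python:

import math

def euler(f, y, dt, n):
    t = 0.0
    for _ in range(n):
        y += dt * f(t, y); t += dt
    return y

def rk4(f, y, dt, n):
    t = 0.0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4); t += dt
    return y

f = lambda t, y: y
for n in (10, 100):
    print(n, abs(euler(f, 1.0, 1.0 / n, n) - math.e), abs(rk4(f, 1.0, 1.0 / n, n) - math.e))
# going from n=10 to n=100, Euler's error shrinks roughly 10x, RK4's roughly 10^4x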