Why is this linear program infeasible in GLPK? - glpk

I have the following problem set up in glpk. Two variables, p and v, and three constraints. The objective is to maximize v.
p >= 0
p == 1
-v + 3p >= 0
The answer should be v==3, but for some reason, the solver tells me it is infeasible when using the simplex method, and complains about numerical instability when using an interior point method.
This problem is generated as a subproblem of a bigger problem, and obviously not all subproblems are as trivial or I would just hardcode the solution.

Because, for some reason, column variables are by default fixed at 0 (GLP_FX) rather than free, so p can never equal 1. I don't see how that default makes sense.
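To see why those default bounds matter, here is a minimal sketch in plain Python. It does not call GLPK; the helper names row_activity_range and row_can_hold are made up for illustration. It computes the range of values each constraint row can take over the column bounds. With both columns fixed at 0, the row p == 1 can only evaluate to 0, so the problem is infeasible whatever the objective is.

```python
import math

INF = math.inf

def row_activity_range(coeffs, col_bounds):
    """Return (min, max) of sum(c * x) when each x stays within its bounds."""
    lo = hi = 0.0
    for c, (lb, ub) in zip(coeffs, col_bounds):
        if c > 0:
            lo += c * lb
            hi += c * ub
        elif c < 0:
            lo += c * ub
            hi += c * lb
    return lo, hi

def row_can_hold(coeffs, col_bounds, row_lb, row_ub):
    """False means no point within the column bounds can satisfy this row."""
    lo, hi = row_activity_range(coeffs, col_bounds)
    return lo <= row_ub and hi >= row_lb

# Columns are ordered (p, v); the rows are p == 1 and 3p - v >= 0.
rows = [((1.0, 0.0), 1.0, 1.0), ((3.0, -1.0), 0.0, INF)]

fixed_at_zero = [(0.0, 0.0), (0.0, 0.0)]  # the default described in the answer
free = [(-INF, INF), (-INF, INF)]         # what this problem needs

for name, bounds in (("fixed at 0", fixed_at_zero), ("free", free)):
    ok = all(row_can_hold(c, bounds, lb, ub) for c, lb, ub in rows)
    print(name, "-> no row is violated by the bounds" if ok else "-> infeasible")
```

This row-by-row check is only a necessary condition for feasibility, but it is enough to expose the contradiction here. In GLPK itself, the column bounds are changed with glp_set_col_bnds, for example with type GLP_FR for a free column.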

Related

Recommendations (functions/solutions) to apply in OpenMDAO instead of boolean conditions (if/else)

I have been working with OpenMDAO for a couple of months, and I find myself struggling with my code when I want to impose conditions to replicate a physical/engineering behaviour.
I have tried using sigmoid functions, but I am still not convinced by them because of the difficulty of trading off sensitivity against numerical stability. Most of the time I get overflows in exp, so I end up including other conditionals (like np.where) and losing linearity.
outputs['sigmoid'] = 1 / (1 + np.exp(-x))
I was looking for another kind of step function, or something similar, that keeps linearity and differentiability to make the optimization easier. I don't know if something like that exists or if there is a strategy that can help me. If it helps, I am working with an OpenConcept benchmark, which uses vectorized computations and Simpson's rule numerical integration.
Thank you very much.
PS: This is my first ever question on Stack Overflow, so I would like to apologize in advance for any error or bad practice committed. I hope to eventually collaborate and become active in the community.
Update after Justin's answer:
I will take the opportunity to define my problem and the strategy I tried in a little more detail. I am trying to monitor and control thermodynamic conditions inside a tank. One of the things I need is to take action when pressure P1 reaches a certain threshold P2. To define this:
eval= (inputs['P1'] - inputs['P2']) / (inputs['P1'] + inputs['P2'])
# P2 = threshold [Pa]
# P1 = calculated pressure [Pa]
k=100 #steepness control
outputs['sigmoid'] = (1 / (1 + np.exp(-eval * k)))
eval was defined to avoid overflows by normalizing the values, so that corrections are taken when the threshold is reached. In a very similar way, I defined a function to check whether there is still mass (so that flow can continue between systems):
eval= inputs['mass']/inputs['max']
k=50
outputs['sigmoid'] = (1 / (1 + np.exp(-eval*k)))**3
max is also used to normalize the value, and the exponent is added so that the output reaches zero before entering the negative domain.
Plot (sorry, it seems I cannot post images yet because of my reputation)
It may be important to highlight that both mass and pressure are calculated from a coupled ODE integration in which these activation functions take part. I guess that OpenConcept, by its nature, explores a lot of possible values before arriving at the solution, so most of the time it gives negative, infeasible values for mass and pressure and creates overflows. For that reason I sometimes try to include:
eval[np.where(eval > 1.5)] = 1.5
eval[np.where(eval < -1.5)] = -1.5
That is not a beautiful solution, but it is sometimes effective. I try to avoid using it since I feel that these bounds make the work of the solver and optimizer more difficult.
I could give you a more complete answer if you distilled your question down to a specific code example of the function you're wrestling with and its expected input range. If you provide that code-sample, I'll update my answer.
Broadly, this is a common challenge when using gradient-based optimization. You want some kind of behavior like an if-condition to turn something on/off, and in many cases that's a fundamentally discontinuous function.
To work around that we often use sigmoid functions, but these do have some of the numerical challenges you pointed out. You could try a hyperbolic tangent as an alternative, though it may suffer the same kinds of problems.
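As a small illustration of the hyperbolic tangent route, here is a scalar sketch in plain Python (the function names are mine, not from OpenMDAO or OpenConcept). It relies on the identity sigmoid(x) = (1 + tanh(x/2)) / 2. Because tanh saturates at -1 and 1, this form never has to evaluate a huge exponential, whereas the textbook form overflows in math.exp for large negative x.

```python
import math

def sigmoid_naive(x):
    """Textbook form; math.exp(-x) raises OverflowError for large negative x."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_tanh(x):
    """Same function via the identity sigmoid(x) = (1 + tanh(x / 2)) / 2."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

# Both forms agree where the naive one is safe to evaluate.
for x in (-2.0, 0.0, 2.0):
    assert abs(sigmoid_naive(x) - sigmoid_tanh(x)) < 1e-12

try:
    sigmoid_naive(-1000.0)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

print(naive_overflowed, sigmoid_tanh(-1000.0), sigmoid_tanh(1000.0))
```

This removes the overflow but not the other difficulty: a very steep sigmoid still has gradients that vanish away from the transition. In vectorized code, np.tanh would play the role of math.tanh.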
I will give you two broad options:
Option 1
Sometimes it's OK (even if not ideal) to leave the purely discrete conditional in the code. Let's say you wanted to represent a simple piecewise function:
y = 2x; x>=0
y = 0; x < 0
There is a sharp corner in that function right at 0. That corner is not differentiable, but the function is fine everywhere else. This is very much like the absolute value function in practice, though you might not draw the analogy from the piecewise definition because the piecewise nature of abs is often hidden from you.
If you know (or at least can check after the fact) that your final answer will not lie right on or very near that C1 discontinuity, then it's probably fine to leave the code the way it is. Your derivatives will be well defined everywhere except right at 0, where you can simply pick the left or the right answer.
It's not strictly mathematically correct, but it works fine as long as you don't end up stuck right there.
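A minimal scalar sketch of Option 1 in plain Python, assuming the right-hand slope is the one picked at the corner. The names ramp, ramp_derivative and near_corner, and the tolerance, are illustrative choices rather than anything from OpenMDAO.

```python
def ramp(x):
    """y = 2x for x >= 0 and y = 0 for x < 0."""
    return 2.0 * x if x >= 0.0 else 0.0

def ramp_derivative(x):
    """Exact slope away from the corner; at x == 0 the right-hand slope is used."""
    return 2.0 if x >= 0.0 else 0.0

def near_corner(x, tol=1e-6):
    """After-the-fact check that the answer is not sitting on the kink (tol is arbitrary)."""
    return abs(x) < tol

x_final = 0.75  # stand-in for a converged design value
print(ramp(x_final), ramp_derivative(x_final), near_corner(x_final))
```

If near_corner comes back true for the converged point, that is the sign that a smoothed version of the function is worth the effort.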
Option 2
Apply a smoothing function. This can be a sigmoid, or a simple polynomial. The exact nature of the smoothing function is highly specific to the kind of discontinuity you are trying to approximate.
In the case of the piecewise function above, you might be tempted to define that function as:
2x*sig(x)
That would give you roughly the correct behavior, and would be differentiable everywhere. But Wolfram Alpha shows that it actually undershoots a little, dipping below zero for negative x. That's probably undesirable, so you can increase the exponent (make the sigmoid steeper) to mitigate it. This, however, is where you start to get underflow and overflow problems.
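The undershoot is easy to see numerically. The sketch below is plain Python; the steepness parameter k and the grid search are my own additions for illustration. It evaluates 2x*sig(kx) on a grid of negative x and reports the most negative value for a few steepness settings.

```python
import math

def sig(x):
    """Sigmoid written with tanh so that it cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def smooth_ramp(x, k=1.0):
    """2x * sig(k x): a smooth stand-in for y = 2x (x >= 0), y = 0 (x < 0)."""
    return 2.0 * x * sig(k * x)

def worst_undershoot(k, n=5000, x_min=-5.0):
    """Most negative value of smooth_ramp on a uniform grid over [x_min, 0]."""
    return min(smooth_ramp(x_min * i / n, k) for i in range(n + 1))

for k in (1.0, 10.0, 100.0):
    print(k, worst_undershoot(k))
```

The dip shrinks roughly in proportion to 1/k, so removing it completely needs ever larger k, which is where the naive exponential form starts to overflow.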
So to work around that, and to get a better behaved function all around, you could instead define a three-part piecewise polynomial:
y = 2x; x>=a
y = c0 + c1*x + c2*x**2; -a <= x < a
y = 0; x < -a
You can solve for the coefficients as a function of a by requiring the value and the slope to match at both x = -a and x = a (please double check my algebra before using this!):
c0 = a/2
c1 = 1
c2 = 1/(2a)
The nice thing about this approach is that it will never overshoot and go negative. You can also make the half-width a reasonably small and still get decent numerics. But if you try to make it too small, c2 will obviously blow up.
In general, I consider the sigmoid function to be a bit of a blunt instrument. It works fine in many cases, but if you try to make it approximate a step function too closely, it's a nightmare. If you want to represent physical processes, I find polynomial fillet functions work more nicely.
It takes a little effort to derive that polynomial, because you want it to be C1 continuous on both sides of the curve. So you have to construct the system of equations to solve for it as a function of the polynomial order and the specific relaxation you want (0.1 here).
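Since the algebra is worth double checking, here is a scalar sketch in plain Python that works the coefficients out from the matching conditions: value 0 and slope 0 at x = -a, value 2a and slope 2 at x = a. The function names are illustrative. Under those four conditions the two slope equations give c1 = 1 and c2 = 1/(2a), and the value at -a then gives c0 = a/2, so the middle piece is simply (x + a)**2 / (2a).

```python
def fillet_coefficients(a):
    """Coefficients of c0 + c1*x + c2*x**2 joining y = 0 at -a to y = 2x at a."""
    return a / 2.0, 1.0, 1.0 / (2.0 * a)

def filleted_ramp(x, a=0.1):
    """Three-part piecewise function with a quadratic fillet on [-a, a)."""
    if x >= a:
        return 2.0 * x
    if x < -a:
        return 0.0
    c0, c1, c2 = fillet_coefficients(a)
    return c0 + c1 * x + c2 * x * x

def filleted_ramp_derivative(x, a=0.1):
    """Slope of filleted_ramp; continuous across both joins."""
    if x >= a:
        return 2.0
    if x < -a:
        return 0.0
    _, c1, c2 = fillet_coefficients(a)
    return c1 + 2.0 * c2 * x

a = 0.1
c0, c1, c2 = fillet_coefficients(a)
# Residuals of the four matching conditions; all should be ~0.
print(c0 - c1 * a + c2 * a * a,         # value at -a
      c1 - 2.0 * c2 * a,                # slope at -a
      c0 + c1 * a + c2 * a * a - 2 * a, # value at a
      c1 + 2.0 * c2 * a - 2.0)          # slope at a
```

Because the fillet is a perfect square divided by a positive number, it cannot go negative, which is the no-undershoot property mentioned above.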
My goto has generally been to consult the table of activation functions on wikipedia: https://en.wikipedia.org/wiki/Activation_function
I've had good luck with sigmoid and the hyperbolic tangent, scaling them such that we can choose the lower and upper values as well as choosing the location of the activation on the x-axis and the steepness.
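One possible way to parameterize such a scaled activation is sketched below as scalar Python. The parameter names lower, upper, center and steepness are my own, not from Dymos or OpenMDAO. It also returns the analytic derivative, which is what you would supply as a partial.

```python
import math

def activation(x, lower=0.0, upper=1.0, center=0.0, steepness=1.0):
    """Smooth switch from `lower` to `upper`, centred at `center`."""
    t = math.tanh(steepness * (x - center))
    return lower + 0.5 * (upper - lower) * (1.0 + t)

def activation_derivative(x, lower=0.0, upper=1.0, center=0.0, steepness=1.0):
    """Analytic d(activation)/dx, using d tanh(u)/du = 1 - tanh(u)**2."""
    t = math.tanh(steepness * (x - center))
    return 0.5 * (upper - lower) * steepness * (1.0 - t * t)

# Example: switch between -1 and 3 around x = 0.5.
print(activation(0.5, lower=-1.0, upper=3.0, center=0.5, steepness=4.0))
```

Choosing center and steepness in the scaled units of x is what keeps the argument of tanh in a reasonable range.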
Dymos uses a vectorization that I think is similar to OpenConcept's, and I've had success with numpy.where there as well, providing derivatives for each possible "branch" taken. It is true that you may have issues with derivative mismatches if you have an analysis point right on the transition, but often I've had success despite that. If the derivative at the transition becomes a hindrance, then implementing a sigmoid or ReLU is more appropriate.
If x is of a magnitude such that it can cause overflows, consider applying units or using scaling to put it within reasonable limits if you cannot bound it directly.

Efficiently finding the closest zero of an arbitrary function

In summary, I am trying to start at a given x and find the nearest point in the positive direction where f(x) = 0. For simplicity, solutions are only needed in the interval [initial_x, maximum_x] (the maximum is given), but any better reach is desirable. Additionally, a specific precision is not mandatory; I am looking to maximize it, but not at the cost of performance.
While this seems simple, there are a few caveats that make the solution more difficult.
Performance is the first priority, even over some precision. The zero needs to be found in the fewest possible calls to f(x), as this code will be run many times per second.
There are not guaranteed to be any specific number of zeros on this line. There may be zero, one, or many places that the function intersects the x-axis. (This is why a direct binary search will not work.)
The function f(x) cannot be manipulated algebraically, only supporting numerical evaluation at a discrete point. (This is why the solution cannot be found analytically.)
My current strategy is to define a step size that is within an acceptable loss of precision and then test in increments until an interval is found on which there is guaranteed to be at least one zero (in [a,b], a and b are on opposite sides of 0). From there, I use a binary search to narrow down the (more) exact point.
// assuming f(x) != 0 at the starting point
initial_y = f(x);
x += step_size; // f at the starting point is already known
while (x < maximum_x) {
    y = f(x);
    // test to see if y has crossed 0
    if (initial_y > 0) {
        if (y < 0) {
            return binary_search(x - step_size, x);
        }
    } else {
        if (y > 0) {
            return binary_search(x - step_size, x);
        }
    }
    x += step_size;
}
This has several disadvantages, mainly the fact that there is a significant trade-off between resolution and performance (the smaller step_size is, the better it works but the longer it takes). Is there a more efficient formula or strategy I can take? I thought of using the value of y to scale the step size, but I cannot figure out how to preserve precision while doing that.
The solution can be in any language because I am looking more for a strategy to find the zeros, than a specific program.
(edit:)
The function above is assumed to be continuous.
To clarify the question, I understand that this problem may be impossible to solve exactly. I am just asking for ways to improve the speed or precision of the algorithm. The one I am currently using is working quite well, even though it fails during many edge cases.
For example, a solution that requires fewer steps with similar precision or another algorithm that increases the precision or reliability with some performance impact would both be extremely helpful.
Your problem is essentially impossible to solve in the general case. For example, no algorithm can find the "first" root of sin(1/x), starting from x=0.
A tentative answer is exponential search, i.e. starting from a small step and increasing it following a geometric progression rather than an arithmetic one, until you find a change of sign. But this will fail if the first root is closer than the initial step, or if the first root is followed by a close one.
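A minimal Python sketch of that exponential search, followed by bisection on the bracket it finds. The function names, the initial step and the growth factor are illustrative assumptions, and the sketch inherits the failure modes just described: two roots inside one step cancel out and are missed.

```python
def find_first_sign_change(f, x0, x_max, first_step=1e-3, growth=2.0):
    """Return (a, b) where f changes sign or f(b) == 0, or None if none is seen."""
    a, fa = x0, f(x0)
    if fa == 0.0:
        return (x0, x0)
    step = first_step
    while a < x_max:
        b = min(a + step, x_max)
        fb = f(b)
        if fb == 0.0 or (fa > 0.0) != (fb > 0.0):
            return (a, b)
        a, fa = b, fb
        step *= growth  # geometric growth instead of a fixed increment
    return None

def bisect(f, a, b, tol=1e-9):
    """Plain bisection on a bracket returned by find_first_sign_change."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0.0:
            return m
        if (fa > 0.0) != (fm > 0.0):
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

bracket = find_first_sign_change(lambda x: x - 3.0, 0.0, 10.0)
if bracket is not None:
    print(bisect(lambda x: x - 3.0, *bracket))
```

For this example the bracket is found after about a dozen evaluations, where a fixed step of 1e-3 would need around three thousand. The price is that the steps become coarse far from the starting point.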
Without any information on the behavior of f, I would not even try anything (other than a "standard" root finder); this is too hopeless! (But I am sure you do have some information.)

Is 1 / 0 = 0 according to Isabelle?

The following lemma:
lemma "(1::real) / 0 = 0" by simp
goes through because of theorem division_ring_divide_zero
I find this very disturbing since if I want to show that some fraction is non-zero I have to show that the numerator is non-zero AND the denominator is non-zero, which might make sense but confuses two different problems into one.
Is there a way of separating the well-definition of a fraction and its non-zeroness?
Isabelle/HOL is a logic of total functions, so there is no built-in notion of a fraction or any other function application being undefined. That is, a / b is defined for all a and b, and it returns their quotient except when b is zero. But then it still has a value.
In the library, the decision was made to complete the function in such a way that x / 0 = 0. This decision simplifies many proofs, since you have to deal with fewer side conditions. Unfortunately it also sometimes confuses people who expect something else.

Julia and system of ordinary differential equations

I want to solve a system of ordinary differential equations, perhaps in parallel, and came across Julia and DifferentialEquations.jl. The system looks like
x'(t) = f(t)*z(t)
y'(t) = g(t)*z(t)
z'(t) = f(t)*(1-2*x(t))/(2) -g(t)*y(t)
over 10^2 < t < 10^14, but my conditions are given at the right-hand end of the interval:
x(10^14) == 0
y(10^14) == 0
z(10^14) == 0
Could someone please explain to me how to set up this problem in Julia? I checked the documentation and could only find u0 as a parameter, but it doesn't give details on specifying conditions at the right-hand end of the interval. Many thanks!
You're looking to solve a boundary value problem (BVP). While this area is currently less developed than other areas of DifferentialEquations.jl, methods for it do exist and are shown in the tutorial on solving BVPs. The MIRK4 method may be the one to try.
I will note, however, that your timescale is quite large and may lead to numerical errors. You may need higher precision numbers (BigFloat, ArbFloat, DoubleFloat) to handle that range, or you may want to rescale time in your equations so that it fits standard double precision floating point numbers (Float64) better.
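One way to picture the time rescaling is the substitution s = ln(t), which turns the range 10^2 to 10^14 into roughly 4.6 to 32.2 and multiplies each right-hand side by t. The sketch below is plain Python with a hand-written RK4 step, not DifferentialEquations.jl code. The coefficients f and g are hypothetical placeholders (the question does not give them), chosen as 1/t only so that the result can be checked against a closed form. Since all three conditions in the question are given at the same point t = 10^14, the sketch simply steps backward from there.

```python
import math

def f(t):
    """Hypothetical coefficient; the real f(t) is not given in the question."""
    return 1.0 / t

def g(t):
    """Hypothetical coefficient; the real g(t) is not given in the question."""
    return 1.0 / t

def rhs(s, u):
    """Right-hand side in log time s = ln(t), where du/ds = t * du/dt."""
    t = math.exp(s)
    x, y, z = u
    return (t * f(t) * z,
            t * g(t) * z,
            t * (f(t) * (1.0 - 2.0 * x) / 2.0 - g(t) * y))

def rk4_step(s, u, h):
    """One classical Runge-Kutta step of size h (h may be negative)."""
    k1 = rhs(s, u)
    k2 = rhs(s + h / 2, tuple(ui + h / 2 * ki for ui, ki in zip(u, k1)))
    k3 = rhs(s + h / 2, tuple(ui + h / 2 * ki for ui, ki in zip(u, k2)))
    k4 = rhs(s + h, tuple(ui + h * ki for ui, ki in zip(u, k3)))
    return tuple(ui + h / 6 * (a + 2 * b + 2 * c + d)
                 for ui, a, b, c, d in zip(u, k1, k2, k3, k4))

def integrate_backward(t_end=1e14, t_start=1e2, u_end=(0.0, 0.0, 0.0), n=20000):
    """Step from t_end down to t_start in log time, starting from u_end."""
    s, u = math.log(t_end), u_end
    h = (math.log(t_start) - s) / n  # negative step
    for _ in range(n):
        u = rk4_step(s, u, h)
        s += h
    return u

print(integrate_backward())  # (x, y, z) at t = 10^2 for the placeholder f and g
```

With the real f and g the same structure applies, but a fixed-step RK4 is only a stand-in for a proper adaptive solver, and whether log time is the right rescaling depends on how f and g behave over the range.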

Support Vector Machine Geometrical Intuition

Hi,
I am having great difficulty understanding why there is a 1 after the >= in the hyperplane equation of a support vector machine: w.x + b >= 1 (why this 1?). I know it could have something to do with the intersection point on the y axis, but I cannot relate that to the support vectors or to its meaning for classification.
Can anyone please explain why the equation has that 1 (or -1)?
Thank you.
The 1 is just an algebraic simplification, which comes in handy in the later optimization.
First, notice that all three hyperplanes can be denoted as
w'x+b= 0
w'x+b=+A
w'x+b=-A
If we fixed the norm of the normal w, ||w||=1, then the above would have one solution with some A depending on the data; let's call that solution v and c (the optimal values of w and b respectively). But if we let w have any norm, then we can easily see that if we put
w'x+b= 0
w'x+b=+1
w'x+b=-1
then there is one unique w which satisfies these equations, and it is given by w=v/A, b=c/A, because
(v/A)'x+(c/A)= 0 (when v'x+c=0) // for the middle hyperplane
(v/A)'x+(c/A)=+1 (when v'x+c=+A) // for the positive hyperplane
(v/A)'x+(c/A)=-1 (when v'x+c=-A) // for the negative hyperplane
In other words, we assume that these "support vectors" satisfy the equation w'x+b=+/-1 for future simplification, and we can do so because for any solution satisfying v'x+c=+/-A there is a solution to our equation (with a different norm of w).
So once we have these simplifications, our optimization problem reduces to minimizing the norm ||w|| (maximizing the size of the margin, which can now be expressed as 2/||w||). If we stayed with the "normal" equation with a (not fixed!) value of A, then the maximization of the margin would have one more "dimension": we would have to search through w, b, A to find the triple which maximizes it (as the "restrictions" would be of the form y(w'x+b)>A). Now we just search through w and b (and in the dual formulation just through alpha, but that is a whole other story).
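The rescaling argument can be checked on a toy example. The sketch below is plain Python; the numbers and the two "support vectors" are made up for illustration. It starts from a unit-norm solution v, c with supporting planes at +/-A, divides through by A, and confirms that the support vectors now satisfy w'x+b=+/-1 while the margin 2/||w|| equals the original 2A/||v||.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def rescale(v, c, A):
    """Turn v'x + c = +/-A into w'x + b = +/-1 by dividing through by A."""
    return [vi / A for vi in v], c / A

# Hypothetical unit-norm solution: separating plane x1 = 0, supporting planes at A = 2.
v, c, A = [1.0, 0.0], 0.0, 2.0
w, b = rescale(v, c, A)

support_pos, support_neg = [2.0, 5.0], [-2.0, -1.0]  # made-up support vectors
print(dot(w, support_pos) + b, dot(w, support_neg) + b)  # the +1 and -1 planes
print(2.0 / norm(w), 2.0 * A / norm(v))                  # margin in both forms
```

The geometry is unchanged by the rescaling; only the bookkeeping moves from A into the norm of w, which is why minimizing ||w|| maximizes the margin.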
This step is not required. You can build an SVM without it, but it makes things simpler: the Ockham's razor rule.
This boundary is called the "margin" and must be maximized, which means you have to minimize ||w||.
The aim of an SVM is to find a hyperplane that maximizes the distance between the two groups.
However, there are infinitely many solutions (see figure: move the optimal hyperplane along the perpendicular vector) and we need to fix at least the boundaries: the +1 or -1 is a common convention to avoid these infinite solutions.
Formally you have to optimize r ||w|| and we set a boundary condition r ||w|| = 1.

Resources