I have a component that includes np.sqrt(1-x). This works fine for normal operation, since all inputs will strictly be between 0 and 1. However, when checking partials and providing an input array that goes all the way up to 1, the finite differencer will step past 1, and break the component. The inputs shouldn't be less than 0 either, so simply switching the direction of the finite difference wouldn't work.
The workaround is just using np.linspace(0, 0.99, 400) instead of np.linspace(0, 1, 400).
Is it possible to set allowable bounds for the finite differencing?
As of OpenMDAO V3.1 there isn't a way to set bounds like that. Just make sure that you test the derivatives around a more well-posed point, like 0.5.
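The sketch below shows why the end point is a poor place to check. It uses only NumPy, and the helper names f, dfdx and fd_forward are hypothetical, not OpenMDAO API. A forward step from x = 1 evaluates sqrt of a negative number. The analytic derivative -1/(2*sqrt(1 - x)) is itself infinite there, so there is no finite value to compare against anyway. At an interior point such as 0.5, the two agree closely.

```python
import numpy as np

def f(x):
    # Expression from the question; nan for x > 1.
    return np.sqrt(1.0 - x)

def dfdx(x):
    # Analytic derivative -1 / (2 sqrt(1 - x)); infinite at x = 1.
    return -0.5 / np.sqrt(1.0 - x)

def fd_forward(func, x, step=1e-6):
    # First-order forward difference, the kind that steps past x = 1.
    return (func(x + step) - func(x)) / step

x_mid = np.array([0.5])
x_end = np.array([1.0])

# Interior point: truncation error is O(step), a few times 1e-7 here.
err_mid = np.abs(fd_forward(f, x_mid) - dfdx(x_mid))

# End point: the perturbed point 1 + 1e-6 is outside the domain.
with np.errstate(invalid="ignore", divide="ignore"):
    fd_end = fd_forward(f, x_end)   # sqrt of a negative number -> nan
    exact_end = dfdx(x_end)         # -0.5 / 0.0 -> -inf

print("error at 0.5:", err_mid, " FD at 1:", fd_end, " exact at 1:", exact_end)
```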
If you're having trouble getting things to work in the optimization context, since you might bump into both bounds ... well, that's why analytic derivatives are better :)
Related
I have been working with OpenMDAO for a couple of months, and I find myself struggling with my code when I want to impose conditions to replicate a physical/engineering behaviour.
I have tried using sigmoid functions, but I am still not convinced by them, because of the difficulty of trading off sensitivity against numerical stability. Most of the time I get overflows in exp, so I end up including other conditionals (like np.where), thereby losing linearity.
outputs['sigmoid'] = 1 / (1 + np.exp(-x))
I was looking for another kind of step function, or something similar, that keeps linearity and differentiability to ease the optimization. I don't know whether something like that exists or whether there is a strategy that can help me. If it helps, I am working with an OpenConcept benchmark, which uses vectorized computations and Simpson's rule numerical integration.
Thank you very much.
PS: This is my first ever question on Stack Overflow, so I apologize in advance for any errors or bad practices. I hope to eventually collaborate and become active in the community.
Update after Justin's answer:
I will take the opportunity to define my problem a bit more, along with the strategy I tried. I am trying to monitor and control thermodynamic conditions inside a tank. One of the things I need is to take action when pressure P1 reaches a certain threshold P2. To define this:
# P1 = calculated pressure [Pa]
# P2 = threshold [Pa]
eval = (inputs['P1'] - inputs['P2']) / (inputs['P1'] + inputs['P2'])
k = 100  # steepness control
outputs['sigmoid'] = 1 / (1 + np.exp(-eval * k))
eval is defined this way to avoid overflows by normalizing the values, so that corrections are applied when the threshold is reached. In a very similar way, I defined a function to check whether there is still mass (so that flow between systems can continue):
eval = inputs['mass'] / inputs['max']
k = 50  # steepness control
outputs['sigmoid'] = (1 / (1 + np.exp(-eval * k)))**3
max is also used to normalize the value, and the exponent is added so that the output gets close to zero before eval enters the negative domain.
It may be important to highlight that both mass and pressure are calculated from coupled ODE integration, in which these activation functions take part. I guess that, by its nature, OpenConcept explores a lot of possible values before arriving at the solution, so it often produces negative, infeasible values for mass and pressure, which create overflows. For that reason I sometimes include:
eval = np.clip(eval, -1.5, 1.5)  # hard clamp of eval to [-1.5, 1.5] before the sigmoid
It is not a beautiful solution, but it is sometimes effective. I try to avoid it, since I sense that these bounds make the solver's and optimizer's work harder.
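A tanh saturation is a smooth alternative to the hard clamp. This is a suggested swap-in, not something from the thread. limit * tanh(x / limit) behaves like x when |x| is much smaller than limit, and it approaches +/-limit smoothly. The function name is illustrative.

```python
import numpy as np

def soft_clip(x, limit=1.5):
    """Smoothly saturate x into (-limit, limit) using limit * tanh(x / limit).

    Returns (y, dy/dx). Near zero y ~ x and dy/dx ~ 1; far outside the band
    y approaches +/-limit and dy/dx decays toward 0.
    """
    x = np.asarray(x, dtype=float)
    t = np.tanh(x / limit)
    return limit * t, 1.0 - t**2

x = np.array([-1e8, -1.5, 0.0, 0.01, 1e8])
y, dydx = soft_clip(x)
print(y, dydx)
```

The trade-off is that it also compresses values inside the band (1.5 maps to about 1.14 with limit = 1.5). For that reason, limit should be somewhat larger than the range you actually care about. Far outside the band, the derivative still decays toward zero.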
I could give you a more complete answer if you distilled your question down to a specific code example of the function you're wrestling with and its expected input range. If you provide that code-sample, I'll update my answer.
Broadly, this is a common challenge when using gradient-based optimization. You want some kind of behavior like an if-condition to turn something on/off, and in many cases that's a fundamentally discontinuous function.
To work around that we often use sigmoid functions, but these do have some of the numerical challenges you pointed out. You could try a hyperbolic tangent as an alternative, though it may suffer from the same kinds of problems.
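You can remove the exp overflow itself, rather than clipping the input, by rearranging the sigmoid so that exp is only ever called with a non-positive argument. The sketch below shows this standard rearrangement, using function names of my own choosing. If SciPy is available, scipy.special.expit computes the same function. The derivative s*(1 - s) is returned separately so it could be used in compute_partials.

```python
import numpy as np

def stable_sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)) without overflow for large |z|."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # argument <= 0, so e lies in (0, 1]
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z)). Same function.
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def stable_sigmoid_deriv(z):
    """Derivative of the logistic function, s * (1 - s)."""
    s = stable_sigmoid(z)
    return s * (1.0 - s)

z = np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])
with np.errstate(over="raise"):  # would raise if exp overflowed
    s = stable_sigmoid(z)
    ds = stable_sigmoid_deriv(z)
print(s, ds)
```

Note that this only fixes the overflow. For a very steep sigmoid, the derivative still underflows to zero a short distance from the switch, so the optimizer gets no gradient information there.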
I will give you two broad options:
Option 1
Sometimes it's OK (even if not ideal) to leave the purely discrete conditional in the code. Let's say you wanted to represent a simple piecewise function:
y = 2x; x>=0
y = 0; x < 0
There is a sharp corner in that function right at 0. That corner is not differentiable, but the function is fine everywhere else. This is very much like the absolute value function in practice, though you might not draw the analogy looking at the piecewise definition of the function because the piecewise nature of abs is often hidden from you.
If you know (or at least can check after the fact) that your final answer will not lie right on or very near to that C1 discontinuity, then it's probably fine to leave the code the way it is. Your derivatives will be well defined everywhere except right at 0, and there you can simply pick the left or the right answer.
It's not strictly mathematically correct, but it works fine as long as you don't end up stuck right there.
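Here is a minimal vectorized sketch of Option 1 for the y = 2x / y = 0 example (the function name is illustrative). np.where picks the value and the matching derivative branch. At exactly x = 0, it takes the right-hand slope of 2.

```python
import numpy as np

def ramp(x):
    """y = 2x for x >= 0, y = 0 for x < 0; returns (y, dy/dx) elementwise."""
    x = np.asarray(x, dtype=float)
    y = np.where(x >= 0.0, 2.0 * x, 0.0)
    # Slope is 2 on the right branch and 0 on the left. x == 0 takes the
    # right-hand value: an arbitrary but consistent choice at the kink.
    dydx = np.where(x >= 0.0, 2.0, 0.0)
    return y, dydx

x = np.array([-1.0, -1e-3, 0.0, 1e-3, 1.0])
y, dydx = ramp(x)
print(y, dydx)
```

In an OpenMDAO component, this elementwise function has a diagonal Jacobian. dydx could therefore be assigned to a partial declared with rows and cols both set to np.arange(n), where n is the vector length.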
Option 2
Apply a smoothing function. This can be a sigmoid, or a simple polynomial. The exact nature of the smoothing function is highly specific to the kind of discontinuity you are trying to approximate.
In the case of the piecewise function above, you might be tempted to define that function as:
2x*sig(x)
That would give you roughly the correct behavior, and it would be differentiable everywhere. But Wolfram Alpha shows that it actually undershoots a little. That's probably undesirable, so you can increase the exponent (the steepness factor inside exp) to mitigate that. This, however, is where you start to get underflow and overflow problems.
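The sketch below puts numbers on the undershoot by evaluating 2x*sig(kx) on a fine grid. Here k is a steepness factor I've introduced to stand for "the exponent". The minimum is negative for every k and shrinks roughly like 1/k (analytically, it is about -0.557/k).

```python
import numpy as np

def smooth_ramp(x, k):
    """2x * sig(k x): a sigmoid-blended version of the y = 2x / y = 0 ramp."""
    return 2.0 * x / (1.0 + np.exp(-k * x))

x = np.linspace(-5.0, 5.0, 200001)  # grid spacing 5e-5
mins = {}
for k in (1.0, 10.0, 100.0):
    y = smooth_ramp(x, k)
    mins[k] = y.min()
    print(f"k = {k:6.1f}   min y = {y.min():.3e}   at x = {x[y.argmin()]:.3e}")
```

Pushing k higher shrinks the undershoot. However, exp(-k*x) grows quickly for negative x and overflows once k*|x| exceeds roughly 709 in double precision.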
So to work around that, and to make a better-behaved function all around, you could instead define a three-part piecewise polynomial:
y = 2x; x>=a
y = c0 + c1*x + c2*x**2; -a <= x < a
y = 0; x < -a
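You can solve for the coefficients as a function of a by requiring the quadratic's value and slope to match the neighboring pieces at x = -a and x = a (please double check my algebra before using this!):
c0 = a/2
c1 = 1
c2 = 1/(2a)
Equivalently, y = (x + a)**2 / (2a) on -a <= x < a.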
The nice thing about this approach is that it will never overshoot and go negative. You can also make a reasonably small and still get decent numerics. But if you try to make it too small, c2 will obviously blow up.
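Below is a vectorized sketch of this three-part fillet with its derivative. It assumes the blend half-width a is a fixed constant, and the function name is illustrative. Matching value and slope to the neighboring pieces at x = -a and x = a gives c0 = a/2, c1 = 1, c2 = 1/(2a), that is, y = (x + a)**2 / (2a) in the blend region. This is the form the code uses. The evaluation points at the end straddle x = -a and x = a so you can confirm continuity.

```python
import numpy as np

def fillet_ramp(x, a=0.1):
    """C1 blend of y = 0 (x < -a) and y = 2x (x >= a) via a quadratic on [-a, a).

    Returns (y, dy/dx). In the blend region y = (x + a)**2 / (2a), which has
    value 0 and slope 0 at x = -a, and value 2a and slope 2 at x = a.
    """
    x = np.asarray(x, dtype=float)
    quad = (x + a) ** 2 / (2.0 * a)
    dquad = (x + a) / a
    y = np.where(x >= a, 2.0 * x, np.where(x >= -a, quad, 0.0))
    dydx = np.where(x >= a, 2.0, np.where(x >= -a, dquad, 0.0))
    return y, dydx

a = 0.1
xs = np.array([-a - 1e-9, -a, a - 1e-9, a])  # just left of / at each joint
y, dydx = fillet_ramp(xs, a)
print(y, dydx)
```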
In general, I consider the sigmoid function to be a bit of a blunt instrument. It works fine in many cases, but if you try to make it approximate a step function too closely, it's a nightmare. If you want to represent physical processes, I find polynomial fillet functions work more nicely.
It takes a little effort to derive that polynomial, because you want it to be C1 continuous on both sides of the curve. So you have to construct the system of equations to solve for it as a function of the polynomial order and the specific relaxation you want (0.1 here).
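Here is a NumPy sketch of that construction. It assumes a cubic blend (four unknowns) and the value/slope conditions at x = -a and x = a for the ramp example. The function name and its arguments are hypothetical. For this particular ramp, the cubic coefficient comes out as zero, so the solution reduces to a quadratic.

```python
import numpy as np

def fillet_coefficients(a, left=(0.0, 0.0), right=None):
    """Solve for c0 + c1 x + c2 x**2 + c3 x**3 on [-a, a].

    left and right are (value, slope) pairs the cubic must match at x = -a and
    x = a. The defaults correspond to the ramp y = 0 (left) and y = 2x (right).
    """
    if right is None:
        right = (2.0 * a, 2.0)
    A = np.array([
        [1.0, -a, a**2, -a**3],            # value at -a
        [0.0, 1.0, -2.0 * a, 3.0 * a**2],  # slope at -a
        [1.0, a, a**2, a**3],              # value at +a
        [0.0, 1.0, 2.0 * a, 3.0 * a**2],   # slope at +a
    ])
    b = np.array([left[0], left[1], right[0], right[1]])
    return np.linalg.solve(A, b)

coeffs = fillet_coefficients(0.1)  # relaxation a = 0.1
print(coeffs)
```

For a higher-order fillet (for example, one that also matches second derivatives for C2 continuity), you would add a row per condition and raise the polynomial order to keep the system square.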
My go-to has generally been to consult the table of activation functions on Wikipedia: https://en.wikipedia.org/wiki/Activation_function
I've had good luck with sigmoid and the hyperbolic tangent, scaling them such that we can choose the lower and upper values as well as choosing the location of the activation on the x-axis and the steepness.
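Here is a sketch of that kind of scaling using tanh. The parameter names lower, upper, x0 (location of the activation) and k (steepness) are assumed for illustration. The logistic sigmoid belongs to the same family, since sig(z) = (1 + tanh(z/2))/2. np.tanh saturates to +1 or -1 instead of overflowing, so large arguments are safe. The analytic derivative is returned as well.

```python
import numpy as np

def scaled_tanh_activation(x, lower=0.0, upper=1.0, x0=0.0, k=10.0):
    """Smooth step from `lower` (x << x0) to `upper` (x >> x0).

    Returns (y, dy/dx). y is halfway between lower and upper at x = x0;
    k sets the steepness.
    """
    x = np.asarray(x, dtype=float)
    t = np.tanh(k * (x - x0))
    y = lower + 0.5 * (upper - lower) * (1.0 + t)
    dydx = 0.5 * (upper - lower) * k * (1.0 - t**2)
    return y, dydx

x = np.array([-1e6, 0.0, 2.0, 1e6])
y, dydx = scaled_tanh_activation(x, lower=-3.0, upper=5.0, x0=2.0, k=4.0)
print(y, dydx)
```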
Dymos uses a vectorization that I think is similar to OpenConcept's, and I've had success with numpy.where there as well, providing derivatives for each possible "branch" taken. It is true that you may have issues with derivative mismatches if you have an analysis point right on the transition, but I've often had success despite that. If the derivative at the transition becomes a hindrance, then implementing a sigmoid or ReLU is more appropriate.
If x is of a magnitude such that it can cause overflows, consider applying units or using scaling to put it within reasonable limits if you cannot bound it directly.
I'm doing some materials work right now using Open Shading Language (OSL), and it has a convenient function, isinf(), which determines whether a floating-point value is infinite...
However, I can't find anything in the documentation about actually setting a variable to infinite. I'm instead going to be setting it to "irrationally large", which will certainly work well enough for my purposes (effectively cell noise generation), but I'm curious whether there's a built-in way to express infinity in OSL?
The problem is that OSL tries very, very hard not to let you generate non-finite numbers, and there is no call to intentionally give you an infinity value. You could use what in C would be FLT_MAX: 3.402823466e+38
I recently learned about the semi-total derivative approximation feature, and I started to use it with B-splines and an explicit component. My current problem is that my design variables are inputs to two different components, similar to the XDSM below. As far as I can see, it is not possible to set different finite-difference steps for different design variables. So, looking at the XDSM again, the control points, x, and z would all have to use identical FD steps, i.e.
model.approx_totals(step=1)
works but
model.approx_totals(step=np.ones(5))
won't work. I guess one remedy is to use a relative step size, but some of my input bounds range from 0 to xx, so a relative step size may not be the best choice. Is there a way to feed in FD steps as a vector, or something similar to:
for out in outputs:
    for dep, fdstep in zip(inputs, inputsteps):
        self.declare_partials(of=out, wrt=dep, method='fd', step=fdstep, form='central')
As of OpenMDAO V2.4, you don't have the ability to set per-variable FD step sizes when using approx_totals. The best option is just to use relative step sizes.
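If you need per-variable steps inside a single component, one option is to compute the approximation yourself. The sketch below is a hand-rolled central-difference Jacobian with one absolute step per input. It is plain NumPy, not OpenMDAO's approximation machinery. model() is a hypothetical stand-in with one input of order 1 and one of order 1e5.

```python
import numpy as np

def central_fd_jacobian(func, x, steps):
    """Central-difference Jacobian of func at x, with one step size per input.

    func maps a 1-D array of n inputs to a 1-D array of m outputs; steps holds
    n positive absolute step sizes in the inputs' own units.
    """
    x = np.asarray(x, dtype=float)
    steps = np.asarray(steps, dtype=float)
    m = np.asarray(func(x), dtype=float).size
    jac = np.empty((m, x.size))
    for j, h in enumerate(steps):
        dx = np.zeros_like(x)
        dx[j] = h  # perturb only input j, by its own step
        jac[:, j] = (np.asarray(func(x + dx)) - np.asarray(func(x - dx))) / (2.0 * h)
    return jac

def model(v):
    # Hypothetical model mixing an O(1) input and an O(1e5) input.
    return np.array([v[0] ** 2 + 1e-5 * v[1], np.sin(v[0]) * v[1] * 1e-5])

jac = central_fd_jacobian(model, [0.3, 1e5], steps=[1e-6, 1e1])
print(jac)
```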
If I have a design variable with lower and upper bounds of 0 and 1e6 and an initial value of 1e5,
it will surely be very insensitive to the default finite-difference step of 1e-6.
Is the correct way to overcome this problem to:
a) change the FD step size, e.g. to 5e4, or
b) scale the design variable with a 'scaler' of 1e-6 and set the lower and upper bounds to 0 and 1, while keeping the default FD step?
I think "a" is your best bet if you are using the latest version (OpenMDAO 2.x).
When you call declare_partials for a specific derivative in a component, or when you call approx_totals on a group, you can pass in an optional argument called "step", which contains the desired stepsize. Since your variable spans [0, 1e6], I think maybe a step size between 1e1 and 1e3 would work for you.
Idea "b" wouldn't actually work at present for fixing the FD problem. The step size you declare is applied to the unscaled value of the input, so you would still have the same precision problem. This is true for both kinds of scaling (1. specified on add_output, and 2. specified on add_design_var). Note, though, that you may still want to scale this problem anyway, because the optimizer may work better on a scaled problem. If you do, you should still declare the larger "step" size mentioned above.
BTW, another option is to use a relative stepsize in the 'fd' calculation by setting the "step_calc" argument to "rel". This turns the absolute stepsize into a relative stepsize. However, I don't recommend this here because your range includes zero, and when it is close to zero, the stepsize falls back to an absolute one to prevent it from being too tiny.
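Here is a hypothetical sketch of the difference between the two step rules. The min_scale floor is an assumption standing in for the fallback described above, and OpenMDAO's actual threshold and formula may differ. With step = 1e-6, the relative rule gives a step of 0.1 at x = 1e5, but it collapses to the absolute 1e-6 near zero.

```python
import numpy as np

def fd_step(x, step=1e-6, step_calc="abs", min_scale=1.0):
    """Illustrative absolute vs. relative FD step rule (not OpenMDAO's code).

    For step_calc="rel" the step is step * |x|, but |x| is floored at
    min_scale (an assumed threshold), so near x = 0 it becomes absolute.
    """
    x = np.asarray(x, dtype=float)
    if step_calc == "abs":
        return np.full_like(x, step)
    if step_calc == "rel":
        return step * np.maximum(np.abs(x), min_scale)
    raise ValueError(f"unknown step_calc: {step_calc!r}")

x = np.array([0.0, 1e-3, 1e5, 1e6])
abs_steps = fd_step(x)
rel_steps = fd_step(x, step_calc="rel")
print(abs_steps, rel_steps)
```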
I can see that my design variable exceeds its limits (using COBYLA in this case).
I have a sample setup with single design variable where the optimum lies around 0.
I set the 'lower=0'.
I want this to be a very strict limit, because negative values yield NaN for my solver.
The optimizer's iterates go, e.g.:
1, 2, 0, -0.125000000e-01, -1.56250000e-02, -1.95312500e-03, -2.44140625e-04, -3.05175781e-05, -3.81469727e-06, -5.00000000e-07
I am guessing this is optimizer-dependent? But is there a way to enforce the bound more strictly?
Unfortunately, COBYLA does not strictly respect variable bounds (see the SciPy docs). The best you can do is to add them as linear constraints, and it will attempt to enforce them at the optimum point.
You can try SLSQP, though. It does strictly respect the bounds.