Semi-total derivative approximation with varying finite difference steps

Semi-total derivative approximation with varying finite difference steps - openmdao

I recently learned about the feature of the semi-total derivative approximation. I started to use this feature with bsplines and an explicit component. My current problem is that my design variables are input from two different components similar to the xsdm below. As far as I see it is not possible to set up different finite difference steps for different design variables. So looking at the xsdm again the control points, x and z should have identical FD steps i.e.
model.approx_totals(step=1)
works but
model.approx_totals(step=np.ones(5))
won't work. I guess, one remedy is to use the relative step size but some of my input bounds are varying from 0 to xx so maybe the relative step size is not the best. Is there a way to feed in FD steps as a vector or something similar to ;
for out in outputs:
for dep,fdstep in zip(inputs,inputsteps):
self.declare_partials(of=out,wrt=dep,method='fd',step=fdstep, form='central')

As of OpenMDAO V2.4, you don't have the ability to set per-variable FD step sizes when using approx_totals. The best option is just to use relative step sizes.

Related

Recomendations (functions/solution) to apply in OpenMDAO instead of boolean conditions (if/else)

I have been working for a couple of months with OpenMDAO and I find myself struggling with my code when I want to impose conditions for trying to replicate a physical/engineering behaviour.
I have tried using sigmoid functions, but I am still not convinced with that, due to the difficulty about trading off sensibility and numerical stabilization. Most of times I found overflows in exp so I end up including other conditionals (like np.where) so loosing linearity.
outputs['sigmoid'] = 1 / (1 + np.exp(-x))
I was looking for another kind of step function or something like that, able to keep linearity and derivability to the ease of the optimization. I don't know if something like that exists or if there is any strategy that can help me. If it helps, I am working with an OpenConcept benchmark, which uses vectorized computations ans Simpson's rule numerical integration.
Thank you very much.
PD: This is my first ever question in stackoverflow, so I would like to apologyze in advance for any error or bad practice commited. Hope to eventually collaborate and become active in the community.
Update after Justin answer:
I will take the opportunity to define a little bit more my problem and the strategy I tried. I am trying to monitorize and control thermodynamics conditions inside a tank. One of the things is to take actions when pressure P1 reaches certein threshold P2, for defining this:
eval= (inputs['P1'] - inputs['P2']) / (inputs['P1'] + inputs['P2'])
# P2 = threshold [Pa]
# P1 = calculated pressure [Pa]
k=100 #steepness control
outputs['sigmoid'] = (1 / (1 + np.exp(-eval * k)))
eval was defined in order avoid overflows normalizing the values, so when the threshold is recahed, corrections are taken. In a very similar way, I defined a function to check if there is still mass (so flowing can continue between systems):
eval= inputs['mass']/inputs['max']
k=50
outputs['sigmoid'] = (1 / (1 + np.exp(-eval*k)))**3
maxis also used for normalizing the value and the exponent is added for reaching zero before entering in the negative domain.
PLot (sorry it seems I cannot post images yet for my reputation)
It may be important to highlight that both mass and pressure are calculated from coupled ODE integration, in which this activation functions take part. I guess OpenConcept nature 'explore' a lot of possible values before arriving the solution, so most of the times giving negative infeasible values for massand pressure and creating overflows. For that sometimes I try to include:
eval[np.where(eval > 1.5)] = 1.5
eval[np.where(eval < -1.5)] = -1.5
That is not a beautiful but sometimes effective solution. I try to avoid using it since I taste that this bounds difficult solver and optimizer work.

I could give you a more complete answer if you distilled your question down to a specific code example of the function you're wrestling with and its expected input range. If you provide that code-sample, I'll update my answer.
Broadly, this is a common challenge when using gradient based optimization. You want some kind of behavior like an if-condition to turn something on/off and in many cases thats a fundamentally discontinuous function.
To work around that we often use sigmoid functions, but these do have some of the numerical challenges you pointed out. You could try a hyberbolic tangent as an alternative, though it may suffer the same kinds of problems.
I will give you two broad options:
Option 1
sometimes its ok (even if not ideal) to leave the purely discrete conditional in the code. Lets say you wanted to represent a kind of simple piecewise function:
y = 2x; x>=0
y = 0; x < 0
There is a sharp corner in that function right at 0. That corner is not differentiable, but the function is fine everywhere else. This is very much like the absolute value function in practice, though you might not draw the analogy looking at the piecewise definition of the function because the piecewise nature of abs is often hidden from you.
If you know (or at least can check after the fact) that your final answer will no lie right on or very near to that C1 discontinuity, then its probably fine to leave the code the way is is. Your derivatives will be well defined everywhere but right at 0 and you can simply pick the left or the right answer for 0.
Its not strictly mathematically correct, but it works fine as long as you're not ending up stuck right there.
Option 2
Apply a smoothing function. This can be a sigmoid, or a simple polynomial. The exact nature of the smoothing function is highly specific to the kind of discontinuity you are trying to approximate.
In the case of the piecewise function above, you might be tempted to define that function as:
2x*sig(x)
That would give you roughly the correct behavior, and would be differentiable everywhere. But wolfram alpha shows that it actually undershoots a little. Thats probably undesirable, so you can increase the exponent to mitigate that. This however, is where you start to get underflow and overflow problems.
So to work around that, and make a better behaved function all around, you could instead defined a three part piecewise polynomial:
y = 2x; x>=a
y = c0 + c1*x + c2*x**2; -a <= x < a
y = 0 x < -a
you can solve for the coefficients as a function of a (please double check my algebra before using this!):
c0 = 1.5a
c1 = 2
c2 = 1/(2a)
The nice thing about this approach is that it will never overshoot and go negative. You can also make a reasonably small and still get decent numerics. But if you try to make it too small, c2 will obviously blow up.
In general, I consider the sigmoid function to be a bit of a blunt instrument. It works fine in many cases, but if you try to make it approximate a step function too closely, its a nightmare. If you want to represent physical processes, I find polynomial fillet functions work more nicely.
It takes a little effort to derive that polynomial, because you want it to be c1 continuous on both sides of the curve. So you have to construct the system of equations to solve for it as a function of the polynomial order and the specific relaxation you want (0.1 here).

My goto has generally been to consult the table of activation functions on wikipedia: https://en.wikipedia.org/wiki/Activation_function
I've had good luck with sigmoid and the hyperbolic tangent, scaling them such that we can choose the lower and upper values as well as choosing the location of the activation on the x-axis and the steepness.
Dymos uses a vectorization that I think is similar to OpenConcept and I've had success with numpy.where there as well, providing derivatives for each possible "branch" taken. It is true that you may have issues with derivative mismatches if you have an analysis point right on the transition, but often I've had success despite that. If the derivative at the transition becomes a hinderance then implementing a sigmoid or relu are more appropriate.
If x is of a magnitude such that it can cause overflows, consider applying units or using scaling to put it within reasonable limits if you cannot bound it directly.

compute the tableau's nonbasic term in SCIP separator

In traditional Simplex Algorithm notation, we have x at the current basis selection B as so:
xB = AB-1b - AB-1ANxN. How can I compute the AB-1AN term inside a separator in SCIP, or at least iterate over its columns?
I see three helpful methods: getLPColsData, getLPRowsData, getLPBasisInd. I'm just not sure exactly what data those methods represent, particularly the last one, with its negative row indexes. How do I use those to get the value I want?
Do those methods return the same data no matter what LP algorithm is used? Or do I need to account for dual vs primal? How does the use of the "revised" algorithm play into my calculation?
Update: I discovered the getLPBInvARow and getLPBInvRow. That seems to be much closer to what I'm after. I don't yet understand their results; they seem to include more/less dimensions than expected. I'm still looking for understanding at how to use them to get the rays away from the corner.

you are correct that getLPBInvRow or getLPBInvARow are the methods you want. getLPBInvARow directly returns you a of the simplex tableau, but it is not more efficient to use than getLPBInvRow and doing the multiplication yourself since the LP solver needs to also compute the actual tableau first.
I suggest you look into either sepa_gomory.c or sepa_gmi.c for examples of how to use these methods. How do they include less dimensions than expected? They both return sparse vectors.

difference between FD steps and scaling for the design variables

if i have a design variable that has lower and upper bounds of 0 and 1e6 and an initial value of 1e5
it surely is very insensitive to the default finite difference steps of 1e-6
is the correct way of overcoming this problem ;
a) change FD step size f.e. to 5e4
b) scale the design variable with 'scaler' of 1e6 and set the lower upper bounds to 0 and 1, while keeping the default FD steps.

I think "a" is your best bet if you are using the latest (OpenMDAO 2.x).
When you call declare_partials for a specific derivative in a component, or when you call approx_totals on a group, you can pass in an optional argument called "step", which contains the desired stepsize. Since your variable spans [0, 1e6], I think maybe a step size between 1e1 and 1e3 would work for you.
Idea "b" wouldn't actually work at present for fixing the FD problem. The step size you declare is applied to the unscaled value of the input, so you would still have the same precision problem. This is true for both kinds of scaling (1. specified on add_output, and 2. specified on add_design_var). Note though that you may still want to scale this problem anyway because the optimizer may work better on a scaled problem. If you do this then, you should still declare the larger "step" size mentioned above.
BTW, another option is to use a relative stepsize in the 'fd' calculation by setting the "step_calc" argument to "rel". This turns the absolute stepsize into a relative stepsize. However, I don't recommend this here because your range includes zero, and when it is close to zero, the stepsize falls back to an absolute one to prevent it from being too tiny.

what if the FD steps varied w.r.t output/input

I am using the finite difference scheme to find gradients.
Lets say i have 2 outputs (y1,y2) and 1 input (x) in a single component. And in advance I know that the sensitivity of y1 with respect to x is not same as the sensitivity of y2 to x. And thus i could potentially have two different steps for those as in ;
self.declare_partials(of=y1, wrt=x, method='fd',step=0.01, form='central')
self.declare_partials(of=y2, wrt=x, method='fd',step=0.05, form='central')
There is nothing that stops me (algorithmically) but it is not clear what would openmdao gradient calculation exactly do in this case?
does it exchange information from the case where the steps are different by looking at the steps ratios or simply treating them independently and therefore doubling computational time ?

I just tested this, and it does the finite difference twice with the two different step sizes, and only saves the requested outputs for each step. I don't think we could do anything with the ratios as you suggested, as the reason for using different stepsizes to resolve individual outputs is because you don't trust the accuracy of the outputs at the smaller (or large) stepsize.

This is a fair question about the effect of the API. In typical FD applications you would get only 1 function call per design variable for forward and backward difference and 2 function calls for central difference.
However in this case, you have asked for two different step sizes for two different outputs, both with central difference. So here, you'll end up with 4 function calls to compute all the derivatives. dy1_dx will be computed using the step size of .01 and dy2_dx will be computed with a step size of .05.
There is no crosstalk between the two different FD calls, and you do end up with more function calls than you would have if you just specified a single step size via:
self.declare_partials(of='*', wrt=x, method='fd',step=0.05, form='central')
If the cost is something you can bear, and you get improved accuracy, then you could use this method to get different step sizes for different outputs.

Finite difference between old and new OpenMDAO

So I am converting a code from the old OpenMDAO to the new OpenMDAO. All the outputs and the partial gradients have been verified as correct. At first the problem would not optimize at all and then I realized that the old code had some components that did not provide gradients so they were automatically finite differenced. So I added fd_options['force_fd'] = True to those components but it still does not optimize to the right value. I checked the total derivative and it was still not correct. It also takes quite a bit longer to do each iteration than the old OpenMDAO. The only way I can get my new code to optimize to the same value as the old OpenMDAO code is to set each component to finite difference, even on the components that provide gradients. So I have a few questions about how finite difference works between the old and the new OpenMDAO:
When the old OpenMDAO did automatic finite difference did it only do it on the outputs and inputs needed for the optimization or did it calculate the entire Jacobian for all the inputs and outputs? Same question for the new OpenMDAO when you turn 'force_fd' to True.
Can you provide some parts of the Jacobian of a component and have it finite difference the rest? In the old OpenMDAO did it finite difference any gradients not provided unless you put missing_deriv_policy = 'assume_zero'?

So, the old OpenMDAO looked for groups of components without derivatives, and bundled them together into a group that could be finite differenced together. New OpenMDAO doesn't do that, so each of those components would be finite differenced separately.
We don't support that yet, and didn't in old OpenMDAO. We do have a story up on our pivotal tracker though, so we will eventually have this feature.
What I suspect might be happening for you is that the finite-difference groupings happened to be better in classic OpenMDAO. Consider one component with one input and 10 outputs connected to a second component with 10 inputs and 1 output. If you finite difference them together, only one execution is required. if you finite difference them individually, you need one execution of component one, and 10 executions of component two. This could cause a noticeable or even major performance hit.
Individual FD vs group FD can also cause accuracy problems, if there is an important input that has vastly different scaling than the other variables, so that the default FD stepsize of 1.0e-6 is no good. (Note: you can set a step_size when you add a param or output and it overrides the default for that var.)
Luckilly, new OpenMDAO has a way to recreate what you had in old OpenMDAO, but it is not automatic. What you would need to do is take a look at your model and figure out what components can be FD'd together, and then create a sub Group and move those components into that group. You can set fd_options['force_fd'] to True on the group, and it'll finite difference that group together. So for example, if you have A -> B -> C, with no components in between, and none have derivatives, you can move A, B, and C into a new sub Group with force_fd set to True.
If that doesn't fix things, we may have to look more deeply at your model.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex