How do you choose to compute partials with the adjoint method? - openmdao

Is there a specific option? Can you choose forward or reverse mode? Or does it not matter, since under the hood OpenMDAO computes the derivatives with the unified method?

When you call Problem.setup you can include the argument mode, which is one of 'fwd', 'rev', or 'auto'.
Note that the choice of forward or reverse (adjoint) derivatives affects how total derivatives (the ones that the optimizers and solvers need) are computed from the partial derivatives (the ones provided by the components). It does not affect how the partial derivatives are provided in the compute_partials or linearize methods, though.
Choosing the correct mode can make a big difference in performance, and in most use cases 'auto' will figure out the correct mode based on the number of design variables and the number of constraints + objective.
Problems with many constraints and few design variables (many rows, few columns in the total jacobian) will usually be much faster in forward mode.
Those with few constraints but many design variables (few rows, many columns) will be significantly faster in reverse.
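To make the rule of thumb concrete, here is a minimal sketch in plain Python. It is not OpenMDAO's actual 'auto' logic; pick_mode and linear_solves are hypothetical helpers that only count how many linear solves each direction needs when no coloring is used.

```python
def linear_solves(n_desvars, n_responses, mode):
    """Number of linear solves needed to assemble the total Jacobian (no coloring)."""
    if mode == "fwd":
        return n_desvars      # one solve per column (design variable)
    if mode == "rev":
        return n_responses    # one solve per row (objective or constraint)
    raise ValueError(f"unknown mode: {mode!r}")


def pick_mode(n_desvars, n_responses):
    """Hypothetical stand-in for the 'auto' rule of thumb: pick the cheaper direction."""
    return "fwd" if n_desvars <= n_responses else "rev"


print(pick_mode(n_desvars=3, n_responses=200))   # few columns, many rows
print(pick_mode(n_desvars=500, n_responses=4))   # many columns, few rows
```

In an actual model the choice is passed through the mode argument of Problem.setup, as described above.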
Advanced Bidirectional Derivatives
OpenMDAO will also attempt to figure out the fastest way to compute totals by "coloring" the total jacobian. The finite-difference analogy is that you can sometimes perturb multiple design variables simultaneously to speed up the calculation of the jacobian, provided those variables don't contribute to the same constraints/objective.
Traditionally, if a total jacobian has a dense column, it can't be colored efficiently in reverse mode (multiple constraints depend on the same design variable).
Similarly, a dense row would kill the efficiency of coloring the jacobian in forward mode.
However, OpenMDAO can figure out which derivatives can be efficiently colored in forward mode, and which can be colored in reverse mode, using both approaches to fill in the total jacobian.
You can read more about this capability here: http://openmdao.org/twodocs/versions/3.0.0/features/core_features/working_with_derivatives/simul_derivs.html
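The coloring idea can be illustrated with a small greedy sketch in plain Python. This is an assumption-laden toy, not the algorithm OpenMDAO uses: color_columns is a hypothetical helper that groups columns of a boolean sparsity pattern whenever their nonzero rows do not overlap, which is the forward-mode case.

```python
def color_columns(pattern):
    """Greedily group columns whose nonzero rows do not overlap.

    pattern[i][j] is True when response i depends on design variable j.
    Returns a list of groups; each group is a list of column indices that
    can share a single forward-mode solve.
    """
    n_cols = len(pattern[0])
    groups = []       # column indices per group
    used_rows = []    # set of rows already touched by each group
    for j in range(n_cols):
        rows_j = {i for i, row in enumerate(pattern) if row[j]}
        for group, rows in zip(groups, used_rows):
            if rows.isdisjoint(rows_j):
                group.append(j)
                rows |= rows_j   # in-place update of the group's row set
                break
        else:
            groups.append([j])
            used_rows.append(set(rows_j))
    return groups


diagonal = [[i == j for j in range(4)] for i in range(4)]
with_dense_row = diagonal + [[True] * 4]

print(len(color_columns(diagonal)))        # all columns share one solve
print(len(color_columns(with_dense_row)))  # the dense row forces one solve per column
```

Applying the same function to the transposed pattern gives the reverse-mode picture, where a dense column is what prevents grouping.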

Related

Min Max component, objective function

I would like to perform some optimizations by minimizing the maximum of a specific path variable within Dymos, or the maximum of the absolute value of such a variable.
In linear programming methods, this can be done by introducing slack variables.
Do you know if this has been attempted before with Dymos, or if there was a reason not to include it?
I understand gradient based methods are not entirely suitable for these problems, though I think some "functions" can be introduced to mitigate this.
For example, the space shuttle reentry problem from [Betts][1] is used as a [test example][2] in Dymos, and the original source contains a variant where the maximum heat flux is minimized. Such functionality could be implemented with the "loc" argument as:
phase.add_objective('q_c', loc='max')
[1]: J. Betts. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. Society for Industrial and Applied Mathematics, second edition, 2010. doi:10.1137/1.9780898718577. URL: https://epubs.siam.org/doi/abs/10.1137/1.9780898718577
[2]: https://openmdao.github.io/dymos/examples/reentry/reentry.html
This has been done with pseudospectral methods before. Dymos currently doesn't have any direct way of implementing this, for a few reasons:
As you said, doing this naively can introduce discontinuous gradients that confuse the optimizer. When the node at which the maximum occurs switches, this tends to cause a sharp edge discontinuity in the gradient.
Since the pseudospectral methods are discrete, you cannot guarantee that the maximum will occur at a node. It's often fine to assume it does, but sometimes your requirements might demand more precision.
There are two possible ways to get around this.
The KSComp in OpenMDAO can be used as a "differentiable maximum". Add one after the trajectory, feed it the timeseries data for the output of interest, and set it up such that it returns a smooth approximation to the maximum. The KS function is a bit conservative, so it won't pick out the precise maximum, but depending on the value of the rho option it can be tuned to get pretty close.
When a more precise value of a maximum is needed, it's pretty common to set up a trajectory such that a phase ends when the maximum or minimum is reached.
If the variable whose maximum is being sought is a state, this can be done by adding a boundary constraint on the rate source for that state.
This ensures that the maximum occurs at the first or last node in the phase (depending on whether it's an initial or final boundary constraint). That lets you capture its value more accurately.
If the variable being sought is not a state, it's possible to use the polynomials that are used for fitting states and controls in a phase to interpolate the variable of interest. By then taking the time derivative of that polynomial we can get a reasonably good approximation for its rate. The master branch of dymos has a method add_timeseries_rate_output that does this. And soon, within a few weeks hopefully, we'll add add_boundary_rate_constraint so that these interpolated rates can be easily used as boundary constraints.
In the meantime, you should be able to achieve this by adding the timeseries rate output and then manually applying the OpenMDAO method 'add_constraint' to the resulting timeseries output, using either indices=[0] or indices=[-1] to treat it as an initial or final constraint.
This is a common enough request that we'll add some documentation on how to achieve this behavior using both the KSComp approach and the boundary constraint approach.
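For intuition about the "differentiable maximum" mentioned above, here is a minimal sketch of the standard Kreisselmeier-Steinhauser (KS) formula in plain Python. It is not KSComp itself; ks_max is a hypothetical helper and the heat flux samples are made up. The point is only to show how rho trades smoothness against conservatism.

```python
import math


def ks_max(values, rho=50.0):
    """Kreisselmeier-Steinhauser smooth, conservative approximation of max(values)."""
    g_max = max(values)
    # Shift by the maximum so the exponentials cannot overflow.
    total = sum(math.exp(rho * (g - g_max)) for g in values)
    return g_max + math.log(total) / rho


heat_flux = [0.2, 0.9, 1.4, 1.35, 0.7]   # made-up timeseries samples
for rho in (5.0, 50.0, 500.0):
    print(rho, ks_max(heat_flux, rho))
```

The result never falls below the true maximum, and it tightens toward the maximum as rho grows, at the price of sharper gradients.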
Personally I'm not as much of a fan of KSComp because I've had trouble getting problems with those types of objectives to converge in the past. I've used the slack variable approach and that has worked well. In the following example, we take a guess at the rotor power in static analysis, and then we run a trajectory and get the actual rotor power during the mission. The objective was to minimize aircraft weight, so a large amount of power in statics costs more weight. The constraint shown below prevents us from decreasing our updated guess of rotor power in statics below the maximum power required during the trajectory.
p.model.add_subsystem(
    'static_power_check',
    om.ExecComp('Power_check = Power_ODE - Power_statics',
                Power_check={'value': np.ones(nn_timeseries_main_tx), 'units': 'kW'},
                Power_ODE={'value': np.ones(nn_timeseries_main_tx), 'units': 'kW'},
                Power_statics={'value': 0.0, 'units': 'kW'}),
    promotes_inputs=[
        ('Power_ODE', 'hop0.main_phase.timeseries.Power_R'),
        ('Power_statics', 'Power_{rotor,slack}')],
    promotes_outputs=['Power_check'])
p.model.add_constraint('Power_check', upper=0, ref=1)
The constraint on the slack variable effectively helped us ensure that our slack rotor power matched the maximum rotor power during the mission. This allowed us to get the right sizes for the rotor parts (i.e. motors).
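The logic of the slack constraint can be checked in isolation with a small plain-Python sketch. The helper names and the power samples are made up for illustration; the only thing carried over from the example above is the residual Power_check = Power_ODE - Power_statics with an upper bound of 0.

```python
def power_check(power_ode, power_statics):
    """Residuals of the constraint Power_check = Power_ODE - Power_statics."""
    return [p - power_statics for p in power_ode]


def is_feasible(power_ode, power_statics):
    """The constraint upper=0 holds when every residual is non-positive."""
    return all(r <= 0.0 for r in power_check(power_ode, power_statics))


mission_power = [310.0, 455.0, 420.0, 380.0]   # made-up rotor power samples, kW

print(is_feasible(mission_power, 400.0))               # slack below the peak
print(is_feasible(mission_power, max(mission_power)))  # slack equal to the peak
```

Because the objective pushes the slack value down while the constraint keeps it at or above every sample, the optimizer ends up with the slack equal to the peak, without ever differentiating a max function.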

OpenMDAO: conditional statement depending on the number of iterations

During an optimisation using OpenMDAO, is there any way to access the number of iterations or the values of the design variables in previous iterations during optimisation?
I would like to create a conditional statement depending on the corresponding number of iterations.
I have created a continuous function representing discrete points linked by exponential functions. I would like to increase the exponent of the intermediate function with the number of iterations so that it penalises the intermediate values and the optimisation converges close to one of the discrete values.
Thank you in advance.
What you are describing sounds like a form of continuation/smoothing. I can suggest two different approaches:
Set a reasonable max-iteration limit on the optimizer and add an outside for-loop around the call to run_driver. You could even adapt the iteration limit after each stopping point is reached. Start with a very small iteration limit, and let it grow as you converge more.
Pros:
fairly simple to implement
uses existing OpenMDAO Driver APIs
Cons:
Limited ability to set your own stopping conditions (you only really have the iteration limit)
Restarting the optimization does not preserve the prior Hessian approximation and may lead to poor convergence for quasi-Newton methods
Skip the OpenMDAO driver interface, and roll your own. This approach was suggested in the 2020 OpenMDAO Reverse Hackathon, for users who find the OpenMDAO Driver interface doesn't meet their need.
Pros:
A lot more flexibility
total control
Cons:
A lot more work
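The first approach (an outer loop around a capped inner optimization) can be sketched on a toy problem in plain Python. Everything here is an assumption for illustration: run_optimizer is a hypothetical stand-in for a run_driver call with an iteration limit, the objective is invented, and a penalty weight is used in place of the exponent from the question.

```python
def gradient(x, weight):
    """d/dx of (x - 0.7)**2 + weight * x * (1 - x), a toy objective.

    The penalty x * (1 - x) vanishes at the discrete values 0 and 1.
    """
    return 2.0 * (x - 0.7) + weight * (1.0 - 2.0 * x)


def run_optimizer(x, weight, max_iter, step=0.1):
    """Hypothetical stand-in for run_driver: a few projected gradient steps."""
    for _ in range(max_iter):
        x = min(1.0, max(0.0, x - step * gradient(x, weight)))
    return x


x = 0.5
for weight in (0.0, 0.5, 1.0, 2.0, 4.0):      # continuation schedule (assumed)
    x = run_optimizer(x, weight, max_iter=20)  # warm start from the last result
    print(weight, round(x, 4))
```

Each outer pass starts from the previous result, so the design drifts from the unpenalized optimum toward one of the discrete values as the penalty grows.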

OpenMDAO efficiency with using multiple comp

I recently read this sentence in a paper:
One important feature of OpenMDAO is the ability to subdivide a
problem into components that have a small number of inputs and outputs
and contain relatively simple analyzes.
Moreover, looking at the examples in the manual, each component has only a small number of inputs and outputs.
Would that mean it is more efficient to use an ExecComp that takes its input from an explicit component and outputs a constraint, instead of doing everything within the explicit component? I tried to come up with an example here:
x1,x2 --> ExplicitComp -->y1
y1 --> Execcomp --->constraint
OR
x1,x2 --->ExplicitComp -->y1,constraint
What the comment in that paper is referring to is not computational efficiency, but rather the benefit to the user in terms of making models more modular and maintainable. Additionally, when you have smaller components with fewer inputs, it is much easier to compute analytic derivatives for them.
The idea is that by breaking your calculation up into smaller steps, the partial derivatives are then simpler for you to compute by hand. OpenMDAO will then compute the total derivatives across the model for you.
So in a sense, you're leaning on OpenMDAO's ability to compute derivatives across large models to lessen your work load.
From a computational cost perspective, there is some cost associated with having more components versus fewer. Taken to the extreme, if you had one component for each line of code in a huge calculation, then the framework overhead could become a problem. There are some features in OpenMDAO that can help mitigate some of this cost for serial models, specifically the in-memory assembly of Jacobians.
With regard to the ExecComp specifically, that component is meant for simple and inexpensive calculations. It computes its derivatives using complex-step, which can be costly if large array inputs are involved. It's there to make simple steps like adding variables easier. But for expensive calculations, you shouldn't use it.
In your specific case, I would suggest that you consider whether it is hard to propagate the derivatives from x1,x2 through to the constraint yourself. If the chain rule is not hard to handle, then I would probably just lump it all into one calculation. If for some reason the derivatives are nasty when you combine all the calculations, then just split them up.
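To see what the two layouts mean for the derivatives, here is a plain-Python sketch. The functions y1 = x1 * x2 and con = y1**2 - 4 are invented for illustration; they are not from the question. The split version only needs simple partials per step, and the chain rule (which the framework applies for you) recovers the same totals as the hand-derived lumped version.

```python
def y1_and_partials(x1, x2):
    """First component: y1 = x1 * x2, with its partial derivatives."""
    return x1 * x2, {"x1": x2, "x2": x1}


def constraint_and_partial(y1):
    """Second component: con = y1**2 - 4, with d(con)/d(y1)."""
    return y1 ** 2 - 4.0, 2.0 * y1


def lumped(x1, x2):
    """Single component: same constraint with hand-derived total derivatives."""
    con = (x1 * x2) ** 2 - 4.0
    return con, {"x1": 2.0 * x1 * x2 ** 2, "x2": 2.0 * x1 ** 2 * x2}


x1, x2 = 1.5, 2.0
y1, dy1 = y1_and_partials(x1, x2)
con, dcon_dy1 = constraint_and_partial(y1)
# Chain rule, which the framework applies across connected components.
totals = {name: dcon_dy1 * d for name, d in dy1.items()}
print(totals, lumped(x1, x2)[1])
```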

OpenMDAO: When is it needed to define the partial derivative?

I've noticed that defining unnecessary partial derivatives can significantly slow down the optimizer. Therefore I'm trying to understand: how can I know whether I should define the partial derivative for a certain input/output relationship?
When you say "unnecessary" do you mean partial derivatives that are always zero?
Using declare_partials('*', '*') when a component is really more sparse than that will significantly slow down your model. Anywhere a partial derivative is always zero, you should simply not declare it.
Furthermore, if you have a vectorized operation, then your Jacobian is actually a diagonal matrix. In that case, you should declare a sparse partial derivative by giving rows and cols arguments to the declare_partials call. This will often substantially speed up your code.
Technically speaking, if you follow the data path from all of your design variables, through each component, to the objective and constraints, then any variable you pass through along the way needs to have its partials defined. But practically speaking you should declare and specify all the partials for every output w.r.t. every input (unless they are zero), so that changes to model connectivity don't break your derivatives.
It takes a little bit more time to declare your partials more sparsely, but the performance speed up is well worth it.
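As a sketch of what the rows and cols form buys you, the plain-Python snippet below builds the diagonal Jacobian of an invented elementwise function y = 3 * x**2. The helper names are hypothetical and nothing here calls OpenMDAO; the rows/cols/values triplet is simply the layout that the rows and cols arguments of declare_partials describe.

```python
def partials_sparse(x):
    """Diagonal Jacobian of the elementwise y = 3 * x**2 in rows/cols/values form."""
    n = len(x)
    rows = list(range(n))
    cols = list(range(n))           # entry k sits at (rows[k], cols[k])
    vals = [6.0 * xi for xi in x]   # dy_i/dx_i; all off-diagonal terms are zero
    return rows, cols, vals


def to_dense(rows, cols, vals, n):
    """Expand to a dense n x n matrix, only to compare storage sizes."""
    dense = [[0.0] * n for _ in range(n)]
    for r, c, v in zip(rows, cols, vals):
        dense[r][c] = v
    return dense


x = [0.5, 1.0, 2.0, 4.0]
rows, cols, vals = partials_sparse(x)
dense = to_dense(rows, cols, vals, len(x))
print(len(vals), "stored entries instead of", len(x) ** 2)
```

For an array of length n the sparse form stores n values where the dense form stores n squared, which is where the speed-up for large vectorized components comes from.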
I think they need to be defined if they are ever relevant to a response (constraint or objective) in the optimization, or as part of a nonlinear solve within a group. My personal practice is to always define them. Should I ever change my optimization problem, which I do often, I don't want to have to go back and make sure I'm always defining the appropriate derivatives.
The master-branch of OpenMDAO contains some jacobian-coloring techniques which can significantly improve performance if your problem is particularly sparse in nature. This method is enabled by setting the following options on the driver:
p.driver.options['dynamic_simul_derivs'] = True
p.driver.options['dynamic_simul_derivs_repeats'] = 5
This method works by filling in the user-described sparsity pattern (specified with rows and cols in declare_partials) with random numbers and computing the total jacobian. The repeat option is there to improve confidence in the results, since it's possible but unlikely that a single pass will result in an "incidental zero" in the jacobian that is not truly part of the sparsity structure.
With this technique, and by doing things like vectorizing my calculations instead of using nested for loops, I've been able to get very good performance in a lot of situations. Of course, the effectiveness of these methods is going to change from model to model.
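The random-fill idea described above can be mimicked in a few lines of plain Python. This is a toy under stated assumptions, not OpenMDAO's implementation: the two declared patterns are made up, total_sparsity is a hypothetical helper, and the total jacobian is just the product of two partial jacobians.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def randomize(pattern, rng):
    """Put a random nonzero value wherever the declared pattern has an entry."""
    return [[rng.uniform(1.0, 2.0) if nz else 0.0 for nz in row] for row in pattern]


def total_sparsity(pattern_a, pattern_b, repeats=5, seed=0):
    """Sparsity of the chained Jacobian A @ B, accumulated over several random fills."""
    rng = random.Random(seed)
    n_rows, n_cols = len(pattern_a), len(pattern_b[0])
    nonzero = [[False] * n_cols for _ in range(n_rows)]
    for _ in range(repeats):
        total = matmul(randomize(pattern_a, rng), randomize(pattern_b, rng))
        for i in range(n_rows):
            for j in range(n_cols):
                nonzero[i][j] = nonzero[i][j] or total[i][j] != 0.0
    return nonzero


# Declared patterns of two chained components (made up for illustration).
d_con_d_y = [[True, False], [False, True]]
d_y_d_x = [[True, True, False], [False, False, True]]
print(total_sparsity(d_con_d_y, d_y_d_x))
```

An entry is marked nonzero if any pass produces a nonzero there, which is how repeating the fill guards against an incidental zero from a single unlucky draw.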

fitness function and Selection for a Genetic Algorithm

I'm trying to design a nonlinear fitness function where I maximize variable A and minimize the variable B. The issue is that maximizing A is much more important at single digit values, almost logarithmic. B needs to be minimized and in contrast to A, it becomes less important when small (less than one) and more important when it's larger (>1), so exponential decay.
The main goal is to optimize A, so I guess an analog is A=profits, B=costs
Should I aim to keep everything positive so that I can use roulette wheel selection, or would it be better to use a rank/tournament kind of system? The purpose of my algorithm is shape optimization.
Thanks
When considering a multi-objective problem the goal is usually to identify all solutions that lie on the Pareto curve, that is, the Pareto optimal set. Have a look here for a 2-dimensional visual example. When the algorithm completes you want a set of solutions that are not dominated by any other solution. You therefore need to define a Pareto ranking mechanism to take into account both objectives. For a more in-depth explanation, as well as links to even more reading, go here.
With this in mind, in order to effectively explore all solutions along the Pareto front you do not want an implementation that encourages premature convergence, otherwise your algorithm will only explore the search space in one specific area of the Pareto curve. I would implement a selection operator that keeps all members of each iteration's optimal set of solutions, that is, all solutions that are not dominated by another, plus a parameter-controlled percentage of other solutions. This way you encourage exploration all along the Pareto curve.
You also need to ensure your mutation and crossover operators are tuned correctly too. With any novel application of Evolutionary Algorithms, part of the problem is trying to identify an optimal parameter set for the problem domain... this is where it gets really interesting!!
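The non-dominated set mentioned above is straightforward to compute for small populations. The sketch below is a minimal plain-Python version with made-up (A, B) pairs, assuming A is maximized and B is minimized as in the question; dominates and non_dominated are hypothetical helper names.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere.

    Each solution is (A, B), with A maximized and B minimized.
    """
    a_p, b_p = p
    a_q, b_q = q
    return a_p >= a_q and b_p <= b_q and (a_p > a_q or b_p < b_q)


def non_dominated(population):
    """Solutions that no other member of the population dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population)]


population = [(5.0, 2.0), (4.0, 1.0), (3.0, 3.0), (6.0, 4.0), (4.0, 2.5)]
print(non_dominated(population))
```

A selection operator along the lines described above would keep this set each generation and then top it up with a chosen fraction of the remaining solutions.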
The description is very vague, but assuming that you actually have an idea of what the function should look like and you're just wondering whether you need to modify it so that proportional selection can be used easily, then no. Regardless of fitness function, you should probably default to using something like tournament selection. Controlling selection pressure is one of the most important things you have to do in order to get consistently good results, and roulette wheel selection doesn't allow you that control. You typically get enormous pressure very early, which drives premature convergence. That might be preferable in a few cases, but it's not where I'd start my investigations.
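For reference, tournament selection takes only a few lines. The sketch below is a minimal plain-Python version with a made-up population whose fitness is simply each individual's value; tournament_select is a hypothetical helper, and the tournament size k is the knob that controls selection pressure.

```python
import random


def tournament_select(population, fitness, k, rng):
    """Draw k distinct contestants at random and return the fittest one.

    A larger k means stronger selection pressure; k = 1 is uniform random choice.
    """
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)


rng = random.Random(42)
population = [-3.0, -1.0, 0.5, 2.0, 7.5]   # fitness values may be negative


def fitness(individual):
    """Toy fitness: the individual's own value."""
    return individual


parents = [tournament_select(population, fitness, k=2, rng=rng) for _ in range(4)]
print(parents)
```

Because only the ordering of fitness values matters, there is no need to keep the fitness function positive, which is the constraint roulette wheel selection would impose.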
