How do I warm start a Dymos optimization problem?

My problem:
I have a system with 4 states and 4 (static) parameters that I would like to optimize. The parameters are initialized to known values that result in trajectories that respect the constraints. The states are initialized to a constant value. To verify the model, I run the problem with the parameters set to opt=False. Once verified, I rebuild the OpenMDAO problem with opt=True and run the optimizer.
I'm running a study to evaluate how each parameter affects the system, the cost function, etc., and how the initial guess impacts the optimization (ideally, it doesn't). The problem I encounter is that some initial guesses for a parameter result in a failed optimization (iteration limit or positive line search) while others don't, and it's not immediately clear why. Note: I always provide an initial guess that results in feasible trajectories; I check this by setting opt=False for the parameters when I build the problem.
My assumption is that although my initial guess for the parameters is okay, my initial guess for the states is not, and the problem gets stuck trying to find feasible trajectories.
My solution/idea:
Is it possible to warm start an optimization problem in Dymos? To warm start, I would like to provide a feasible solution for the states and state rates to the optimizer. As a general flow, I would like to (1) run the optimization with the opt setting of the controls and parameters set to False to get a state trajectory, then (2) set the opt setting for the controls and parameters to True, and finally (3) re-run the optimization. It seems like there should be an easy way to do this, but I can't determine how without creating two problems (with different opt settings) and setting all the initial state guesses of the opt=True problem.
Note: I did read this post: Dymos how to use previous trajectory solution as initial guess? and I can rerun a problem. I just don't know how to change the opt setting between runs.
If there is an alternate or better solution to my problem, I'd be interested in that as well.

If you are using IPOPT, using a previous solution as your initial guess doesn't really help. This is due to the nature of interior-point optimizers. On start, the barrier parameter mu is large, so the Newton steps push the "optimum" for that value of mu AWAY from your initial guess. Then mu is decreased, and Newton's method gets you closer to the true optimum. This process is repeated as mu is decreased, until finally mu is small and you arrive back at the point you guessed initially, which was the optimum.
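To see the mechanism, here is a toy log-barrier loop. It is only a sketch of the interior-point idea, not IPOPT's actual algorithm, and the problem it solves (minimize x^2 subject to x >= 1) is made up for illustration:

import numpy as np
from scipy.optimize import minimize

# Toy log-barrier loop: minimize x^2 subject to x >= 1 (true optimum x = 1).
# We start exactly at the answer to show how early iterates move away from it.
x = np.array([1.0])
mu = 1.0
while mu > 1e-8:
    def barrier(z, mu=mu):
        # clamp the log argument so the search never sees a NaN at the boundary
        return z[0]**2 - mu * np.log(max(z[0] - 1.0, 1e-12))
    x = minimize(barrier, x, method='Nelder-Mead').x
    print(f'mu={mu:.0e}  x={x[0]:.6f}')  # large mu pulls x into the interior
    mu *= 0.1

Even starting exactly at the constrained optimum x = 1, the first subproblem's minimizer sits in the interior (near x = 1.37 for mu = 1), and only the later, small-mu subproblems bring the iterates back to the guess.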
Also, because we are using a quasi-Newton method with a limited-memory Hessian approximation (L-BFGS) when going through Dymos/pyoptsparse, none of the Hessian information is there when you start again, even if your initial guess is the optimum. So that information has to be filled in again as the algorithm iterates.
I am not an IPOPT expert, but this seems to explain why I had no luck trying to use an "improved" initial guess. One thing that did help a lot with convergence was increasing the limited_memory_max_history option to 100 or so.
IPOPT does have a warm-start option, but getting it the initial information it needs about the Hessian and the initial multipliers might be something you have to go into pyoptsparse to figure out.
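For reference, a minimal sketch of forwarding those IPOPT options through pyoptsparse in OpenMDAO; it assumes p is your already-configured om.Problem, and the option names are IPOPT's own:

import openmdao.api as om

# Sketch: forwarding IPOPT options through the pyoptsparse driver.
p.driver = om.pyOptSparseDriver()
p.driver.options['optimizer'] = 'IPOPT'
p.driver.opt_settings['limited_memory_max_history'] = 100  # deeper L-BFGS memory
p.driver.opt_settings['warm_start_init_point'] = 'yes'     # IPOPT warm-start flag
# A useful warm start also needs initial bound/constraint multipliers, which is
# the part that likely requires digging into pyoptsparse itself.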

Related

Should the linear solver be converging when Coloring is being computed?

I had to add some circular dependencies to my model, and thus added NonlinearBlockGS and LinearBlockGS to the Group with the circular dependency. I get messages like this
LN: LNBGSSolver 'LN: LNBGS' on system 'XXX' failed to converge in 10 iterations.
in the phase where it's finding the Coloring of the problem. There is a Dymos trajectory as part of the problem, but the circular dependency is not in the Trajectory group; it's upstream. It converges very easily when actually solving the problem. The number of FWD solves is the same as it was before, and everything seems to work fine. Should I be worried about anything?
The way our total-derivative coloring works is that we replace the partial derivatives with random numbers and then solve the linear system, so the linear solver should be converging. Now, whether or not it should converge with LNBGS in 10 iterations... probably not.
It's hard to speak definitively when putting random numbers into a matrix and inverting it... but generally speaking it should remain invertible (though we can't promise that). That does not mean it will remain easily invertible. How close does the linear residual get during the coloring? It is decreasing, but slowly. Would more iterations let it get there?
If your problem is working well, I don't think you need to freak out about this. If you would like it to converge better, it won't hurt anything and might give you better coloring. You can increase the iprint of that solver to get more information on the convergence history.
Another option, if your system is small enough, is to try using the DirectSolver instead of LNBGS. For most models with fewer than 10,000 variables, a DirectSolver will be faster overall than LNBGS. There is a nice symmetry to using LNBGS with NLBGS... but while the nonlinear solver tends to be a good choice (i.e. fast and stable) for cyclic dependencies, the same can't be said for its linear counterpart.
So my go-to combination is NLBGS and DirectSolver. You can't always use the DirectSolver: if you have distributed components in your model, or components that use the matrix-free derivative APIs (apply_linear, compute_jacvec_product), then LNBGS is a good option. But if everything is explicit components with compute_partials, or implicit components that provide partials in the linearize method, then I suggest using the DirectSolver as your first option.
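A minimal sketch of that go-to combination, where cycle stands in for the Group that contains your circular dependency:

import openmdao.api as om

# Gauss-Seidel for the nonlinear coupling, direct factorization for the linear solves.
cycle.nonlinear_solver = om.NonlinearBlockGS(maxiter=50, iprint=2)
cycle.linear_solver = om.DirectSolver()   # replaces LinearBlockGS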
I think you may have discovered a coloring performance issue in OpenMDAO. When we compute coloring, we internally replace the component partials with random arrays matching the declared sparsity. Since we're not trying to find an actual solution when computing the coloring, we probably don't need to iterate more than once in any given system, and we shouldn't be generating convergence warnings while computing it. I don't think you need to be worried in this case. I'll put a story in our bug tracker to look into this.

Min Max component, objective function

I would like to perform some optimizations by minimizing the maximum of a specific path variable within Dymos, or the maximum of the absolute value of such a variable.
In linear programming methods, this can be done by introducing slack variables.
Do you know if this has been attempted before with Dymos, or if there was a reason not to include it?
I understand that gradient-based methods are not entirely suitable for these problems, though I think some "functions" can be introduced to mitigate this.
For example,
The space shuttle reentry problem from [Betts][1] is used as a [test example][2] in Dymos, and the original source contains a variant where the maximum heat flux is minimized. Such functionality could be implemented with the "loc" argument as:
phase.add_objective('q_c', loc='max')
[1]: J. Betts. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. Society for Industrial and Applied Mathematics, second edition, 2010. URL: https://epubs.siam.org/doi/abs/10.1137/1.9780898718577, doi:10.1137/1.9780898718577.
[2]: https://openmdao.github.io/dymos/examples/reentry/reentry.html
This has been done with pseudospectral methods before. Dymos currently doesn't have any direct way of implementing this, for a few reasons:
As you said, doing this naively can introduce discontinuous gradients that confuse the optimizer. When the node at which the maximum occurs switches, this tends to cause a sharp edge discontinuity in the gradient.
Since the pseudospectral methods are discrete, you cannot guarantee that the maximum will occur at a node. It's often fine to assume it does, but sometimes your requirements might demand more precision.
There are two possible ways to get around this.
The KSComp in OpenMDAO can be used as a "differentiable maximum". Add one after the trajectory, feed it the timeseries data for the output of interest, and set it up so that it returns a smooth approximation of the maximum. The KS function is a bit conservative (it overestimates the maximum), so it won't pick out the precise maximum, but depending on the value of the rho option it can be tuned to get pretty close.
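Here is a small, self-contained sketch of the KSComp acting as a differentiable maximum; the size and the sine-wave input are made up for illustration:

import numpy as np
import openmdao.api as om

# Standalone sketch of KSComp as a smooth, conservative max over n values.
n = 30
prob = om.Problem()
# width is the number of values aggregated; larger rho hugs the true max more tightly
prob.model.add_subsystem('ks', om.KSComp(width=n, rho=50.0))
prob.setup()
g = np.sin(np.linspace(0.0, np.pi, n))   # stand-in for a timeseries output
prob.set_val('ks.g', g.reshape((1, n)))
prob.run_model()
print(prob.get_val('ks.KS'), g.max())    # KS output is a smooth upper bound on max(g)

Connecting the Dymos timeseries output of interest to ks.g in a real model works the same way, though you may need src_indices to match the shapes.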
When a more precise value of a maximum is needed, it's pretty common to set up a trajectory such that a phase ends when the maximum or minimum is reached.
If the variable whose maximum is being sought is a state, this can be done by adding a boundary constraint on the rate source for that state.
This ensures that the maximum occurs at the first or last node in the phase (depending on whether it's an initial or final boundary constraint), which lets you capture its value more accurately.
If the variable being sought is not a state, it's possible to use the polynomials that are used for fitting states and controls in a phase to interpolate the variable of interest. Taking the time derivative of that polynomial then gives a reasonably good approximation of its rate. The master branch of Dymos has a method add_timeseries_rate_output that does this. And soon, hopefully within a few weeks, we'll add add_boundary_rate_constraint so that these interpolated rates can easily be used as boundary constraints.
In the meantime, you should be able to achieve this by adding the timeseries rate output and then manually applying the OpenMDAO method add_constraint to the resulting timeseries output, using either indices=[0] or indices=[-1] to treat it as an initial or final constraint.
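A sketch of that workaround, where traj.phase0 and q_c are illustrative names and the exact name of the generated rate output may differ in your Dymos version:

# Assumes add_timeseries_rate_output is available (dymos master at the time).
phase.add_timeseries_rate_output('q_c')   # interpolated d(q_c)/dt in the timeseries
# rate = 0 at the final node pins the maximum of q_c to the end of the phase
p.model.add_constraint('traj.phase0.timeseries.q_c_rate', equals=0.0, indices=[-1])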
This is a common enough request that we'll add some documentation on how to achieve this behavior using both the KSComp approach and the boundary constraint approach.
Personally I'm not as much of a fan of the KSComp, because I've had trouble getting those types of objectives to converge in the past. I've used the slack-variable approach and that has worked well. In the following example, we take a guess at the rotor power in a static analysis, and then we run a trajectory and get the actual rotor power during the mission. The objective was to minimize aircraft weight, so carrying a large amount of power in statics costs more weight. The constraint shown below prevents us from decreasing our updated guess of rotor power in statics below the maximum power required during the trajectory.
import numpy as np
import openmdao.api as om

# Power_check = Power_ODE - Power_statics must stay <= 0 at every node, so the
# static (slack) rotor power cannot fall below the max power seen in the mission.
p.model.add_subsystem(
    'static_power_check',
    om.ExecComp('Power_check = Power_ODE - Power_statics',
                Power_check={'value': np.ones(nn_timeseries_main_tx), 'units': 'kW'},
                Power_ODE={'value': np.ones(nn_timeseries_main_tx), 'units': 'kW'},
                Power_statics={'value': 0.0, 'units': 'kW'}),
    promotes_inputs=[('Power_ODE', 'hop0.main_phase.timeseries.Power_R'),
                     ('Power_statics', 'Power_{rotor,slack}')],
    promotes_outputs=['Power_check'])
p.model.add_constraint('Power_check', upper=0, ref=1)
The constraint on the slack variable effectively ensured that our slack rotor power matched the maximum rotor power during the mission, which allowed us to get the right sizes for the rotor parts (e.g. motors).

OpenMDAO: conditional statement depending on the number of iterations

During an optimisation using OpenMDAO, is there any way to access the number of iterations or the values of the design variables in previous iterations during optimisation?
I would like to create a conditional statement depending on the corresponding number of iterations.
I have created a continuous function representing discrete points linked by exponential functions. I would like to increase the exponent of the intermediate function with the number of iterations so that it penalises the intermediate values and the optimisation converges close to one of the discrete values.
Thank you in advance.
What you are describing sounds like a form of continuation/smoothing. I can suggest two different approaches:
Set a reasonable max-iteration limit on the optimizer and add an outer for-loop around the call to run_driver (see the sketch after the pros and cons below). You could even adapt the iteration limit after each stopping point is reached: start with a very small iteration limit, and let it grow as you converge.
Pros:
fairly simple to implement
uses existing OpenMDAO Driver APIs
Cons:
Limited ability to set your own stopping conditions (you only really have the iteration limit)
Restarting the optimization does not preserve the prior Hessian approximation and may lead to poor convergence for quasi-Newton methods
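A minimal sketch of approach 1; it assumes p is your om.Problem with a ScipyOptimizeDriver, and penalty_exponent is a hypothetical input in your model that controls the exponent of the smoothing function:

# Outer continuation loop around run_driver (all names/values illustrative).
exponent = 2.0
max_iter = 10                    # start with a very small iteration limit
for k in range(6):
    p.driver.options['maxiter'] = max_iter
    p.set_val('penalty_exponent', exponent)   # hypothetical model input
    p.run_driver()
    exponent *= 2.0              # penalize intermediate values more strongly
    max_iter *= 2                # allow more iterations as the problem sharpens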
Skip the OpenMDAO driver interface and roll your own. This approach was suggested in the 2020 OpenMDAO Reverse Hackathon, for users who find that the OpenMDAO Driver interface doesn't meet their needs.
Pros:
A lot more flexibility
total control
Cons:
A lot more work

Solver selection : NonlinearBlockGS vs NewtonSolver

I have two OpenMDAO groups with a cyclic dependency between them. I calculate the derivatives using complex step. I use a nonlinear solver for the dependency and SLSQP to optimize my objective function. The issue is the choice of the nonlinear solver. When I use NonlinearBlockGS, the optimization succeeds in 12 iterations. But when I use NewtonSolver with DirectSolver or ScipyKrylov, the optimization fails (iteration limit exceeded), even with maxiter=2000. The cyclic connections converge; it's just that the design variables do not reach their optimal values. The difference between the design variables in consecutive iterations is on the order of 1e-5, and this increases the iterations needed. Also, when I change the initial guess to a value closer to the optimum, it works.
To check further, I converted the model to IDF (by creating copies of the coupling variables and adding consistency constraints), thereby removing the need for a solver. Now the optimization succeeds in 5 iterations, and the results are similar to those obtained with NonlinearBlockGS.
Why does this happen? Am I missing something? When should I use NewtonSolver over the others? I know it is difficult to answer without seeing the code, but my code is long, with multiple components, and I couldn't recreate the issue with a toy model, so any general insight is much appreciated.
Without seeing the code, you're right that it's hard to give specifics.
Very broadly speaking, Newton can sometimes have a lot more trouble converging than NLBGS (note: this is not absolutely true, but it is a good rule of thumb). So my guess is that on your first or second iteration, the Newton solver isn't actually converging. You can check this by setting newton.options['iprint'] = 2 and looking at the iteration history as the optimizer iterates.
When you have a solver in your optimization, it's critical that you also set it to throw an error on non-convergence. Some optimizers can handle this error and will backtrack in the line search; others will just die. Either way, it's important. Otherwise, you end up giving the optimizer an unconverged case that it doesn't know is unconverged.
This is bad for two reasons. First, the objective and constraint values it gets are going to be wrong! Second, and perhaps more importantly, the derivatives it computes are going to be wrong! You can read the details in the theory manual, but in summary, the analytic derivative methods that OpenMDAO uses assume that the residuals have gone to 0. If that's not the case, the math breaks down. Even if you were doing full-model finite difference, unconverged models are a problem: you'll just get noisy garbage when you try to FD them.
So, assuming you have set up your model correctly, and that you have the linear solvers set up properly (it sounds like you do, since it works with NLBGS), then it's most likely that the Newton solver isn't converging. Use iprint, possibly combined with driver debug printing, to check this for yourself. If that's the case, you need to figure out how to get Newton to behave better.
There are some tips here that are pretty general. You could also try using the Armijo line search, which can often stabilize a Newton solve at the cost of some speed.
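Putting those suggestions together, a sketch of the solver settings (cycle stands in for the Group that owns the coupling):

import openmdao.api as om

newton = cycle.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
newton.options['maxiter'] = 50
newton.options['iprint'] = 2                   # print the iteration history
newton.options['err_on_non_converge'] = True   # raise AnalysisError when unconverged
newton.linesearch = om.ArmijoGoldsteinLS()     # backtracking (Armijo) line search
cycle.linear_solver = om.DirectSolver()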
Finally... Newton isn't the best answer in all situations. If NLBGS is more stable and computationally cheaper, you should use it. I applaud your desire to get it to work with Newton, and you should definitely track down why it's not working, but if it turns out that Newton just can't solve your coupled problem reliably, that's OK too!
The "set it to throw an error on non-convergence" link is broken in your answer. I have added the link that I think is the right one; please correct it if it's not the one you meant.

Numerical integration, Runge-Kutta, RK4 in game design

Nearly every game uses some form of game loop. Gafferongames has a great article on how to make a well-designed game loop: http://gafferongames.com/game-physics/fix-your-timestep/
In his code, he uses integrate( state, t, deltaTime );, where I believe state contains position, velocity, and acceleration of the object. He uses RK4 to integrate it from t to t+deltaTime.
My question is: why use a numerical integration technique like RK4 when you can use the kinematics equations (here) to find the exact value?
These equations work when acceleration is constant, and it seems rare that you would have a changing acceleration within a timestep. RK4 seems like a lower-performance, lower-accuracy, more complex solution.
Edit: I think you could add a "jerk" value to objects and still find exact expressions for acceleration, velocity, and displacement, if you really wanted to.
Edit 2: Well, I did not read his "Integration Basics" article too carefully. I think he's modelling a damper and spring, which do cause non-constant acceleration within a timestep.
As soon as you add things that many game designers want, like (velocity-dependent) drag, position-dependent forces, etc., the equations are no longer exactly solvable.
So, if you're happy to limit your forces to those the kinematic equations can handle, then go with them. If you want something flexible, then numerical integration is the only way to go.
Note: if you treat the forces as constant over a time interval when they are not really constant, then you are already using a form of numerical integration, and an inaccurate one at that. So why not use a tried and proven numerical method instead? RK4 is one of many such methods.
Approximating acceleration (derivatives, really) as constant within a time step is how numerical integration methods work. When the derivatives are not constant, you need to consider what sort of error you introduce by treating them as constant.
Imagine breaking a time range T into N equal steps of width h = T/N, and integrating the dynamical equations stepwise. With RK4, the local error per step is O(h^5), giving a global error of O(h^4).
Using the kinematic equations as you propose, we can assess the error by considering the Taylor expansion of the position, keeping terms up to second order. Truncating the expansion there introduces a position error of O(h^3) at each step, giving local error O(h^3) and global error O(h^2).
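Spelled out, holding the acceleration constant over a step truncates the Taylor series of the position after the second-order term:

x(t+h) = x(t) + v(t) h + (1/2) a(t) h^2 + (1/6) a'(t) h^3 + O(h^4)

The kinematic update keeps only the first three terms, so the dropped jerk term (1/6) a'(t) h^3 is exactly the O(h^3) local error.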
Based on the asymptotic error, the RK4 error goes to zero much more rapidly than that of the kinematic equations: it is more accurate. RK4 also achieves very good accuracy for the number of function evaluations it requires.
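As a quick numerical check, here is a toy comparison on a spring (a = -x, whose exact solution from x=1, v=0 is cos t), stepping with the constant-acceleration kinematic update versus RK4; the step size and end time are illustrative:

import numpy as np

def kinematic_step(x, v, h):
    a = -x                       # acceleration evaluated once, held constant
    return x + v*h + 0.5*a*h*h, v + a*h

def rk4_step(x, v, h):
    def f(s):                    # state s = (x, v); ds/dt = (v, -x)
        return np.array([s[1], -s[0]])
    s = np.array([x, v])
    k1 = f(s)
    k2 = f(s + 0.5*h*k1)
    k3 = f(s + 0.5*h*k2)
    k4 = f(s + h*k3)
    s = s + (h/6.0)*(k1 + 2*k2 + 2*k3 + k4)
    return s[0], s[1]

for step in (kinematic_step, rk4_step):
    x, v, h, T = 1.0, 0.0, 0.1, 10.0
    for _ in range(int(T/h)):
        x, v = step(x, v, h)
    print(step.__name__, 'error:', abs(x - np.cos(T)))  # RK4 error is far smaller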

Resources