OpenMDAO: conditional statement depending on the number of iterations - openmdao

During an optimisation using OpenMDAO, is there any way to access the number of iterations or the values of the design variables in previous iterations during optimisation?
I would like to create a conditional statement depending on the corresponding number of iterations.
I have created a continuous function representing discrete points linked by exponential functions. I would like to increase the exponent of the intermediate function with the number of iterations so that it penalises the intermediate values and the optimisation converges close to one of the discrete values.
Thank you in advance.

What you are describing sounds like a form of continuation/smoothing. I can suggest two different approaches:
Set a reasonable max-iteration limit on the optimizer and add an outside for-loop around the call to run_driver. You could even adapt the iteration limit after each stopping point is reached. Start with a very small iteration limit, and let it grow as you converge more.
Pros:
fairly simple to implement
uses existing OpenMDAO Driver APIs
Cons:
Limited ability to set your own stoping conditions (only really have iteration limit)
Restarting the optimization does not preserve the prior hessian approximation and may lead to poor convergence for quasi-newton method
Skip the OpenMDAO driver interface, and roll your own. This approach was suggested in the 2020 OpenMDAO Reverse Hackathon, for users who find the OpenMDAO Driver interface doesn't meet their need.
Pros:
A lot more flexibility
total control
Cons:
A lot more work

Related

Should linear solver be converging when Coloring is beign computed

I had to add some circular dependencies to my model and thus adding NonlinearBlockGS and LinearBlockGS to the Group with the circular dependency. I get messages like this
LN: LNBGSSolver 'LN: LNBGS' on system 'XXX' failed to converge in 10
iterations.
in the phase where it's finding the Coloring of the problem. There is a Dymos trajectory as part of the problem, but the circular dependency is not in the Trajectory group, it's upstream. It however converges very easily when actually solving the problem. The number of FWD solves is the same as it was before-- everything seem to work fine. Should I be worried about anything?
the way our total derivative coloring works is that we replace partial derivatives with random numbers and then solve the linear system. So the linear solver should be converging. Now, whether or not it should converge with LNBGS in 10 iterations... probably not.
Its hard to speak diffinitively when putting random numbers into a matrix to invert it... but generally speaking it should remain invertible (though we can't promise). That does not mean that it will remain easily invertible. How close does the linear residual get during the coloring? it is decreasing, but slowly. Would more iteration let it get there?
If your problem is working well, I don't think you need to freak out about this. If you would like it to converge better, it won't hurt anything and might give you better coloring. You can increase the iprint of that solver to get more information on the convergence history.
Another option, if your system is small enough, is to try using the DirectSolver instead of LNBGS. For most models with less than 10,000 variables in them a DirectSolver will be overall faster than the LNBGS. There is a nice symetry to using LNBGS with NLGBS ... but while the nonlinear solver tends to be a good choice (i.e. fast and stable) for cyclic dependencies the same can't be said for its linear counter part.
So my go-to combination if NLBGS and DirectSolver. You can't always use the DirectSolver. If you have distributed components in your model, or components that use the matrix-free derivative APIs (apply_linear, compute_jacvec_product), then LNBGS is a good option. But if everything is explicit components with compute_partials or implicit components that provide partials in the linearize method then I suggest using the DirectSolver as your first option.
I think you may have discovered a coloring performance issue in OpenMDAO. When we compute coloring, internally we replace the component partials with random arrays matching the declared sparsity. Since we're not trying to find an actual solution when we compute coloring, we probably don't need to iterate more than once in any given system. And we shouldn't be generating convergence warnings when computing the coloring. I don't think you need to be worried in this case. I'll put a story in our bug tracker to look into this.

Min Max component, objective function

I would like to perform some optimizations by minimizing the maximum of a specific path variable within Dymos. or the maximum of the absolute of such a variable.
In linear programming methods, this can be done by introducing slack variables.
Do you know if this has been attempted before with Dymos, or if there was a reason not to include it?
I understand gradient based methods are not entirely suitable for these problems, though I think some "functions" can be introduced to mitigate this.
For example,
The space shuttle reentry problem from [Betts][1] used as a [test example][2] in dymos, the original source contains an example where the maximum heat flux is minimized. Such functionality could be implemented with the "loc" argument as:
phase.add_objective('q_c', loc='max')
[1]: J. Betts. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. Society for Industrial and Applied Mathematics, second edition, 2010. URL: https://epubs.siam.org/doi/abs/10.1137/1.9780898718577, arXiv:https://epubs.siam.org/doi/pdf/10.1137/1.9780898718577, doi:10.1137/1.9780898718577.
[2]: https://openmdao.github.io/dymos/examples/reentry/reentry.html
This has been done with pseudospectral methods before. Dymos currently doesn't have any direct way of implementing this, for a few reasons:
As you said, doing this naively can introduce discontinuous gradients that confuse the optimizer. When the node at which the maximum occurs switches, this tends to cause a sharp edge discontinuity in the gradient.
Since the pseudospectral methods are discrete, you cannot guarantee that the maximum will occur at a node. It's often fine to assume it does, but sometimes your requirements might demand more precision.
There are two possible ways to get around this.
The KSComp in OpenMDAO can be used as a "differentiable maximum". Add one after the trajectory, feed it the timeseries data for the output of interest, and set it up such that it returns a smooth approximation to the maximum. The KS function is a bit conservative, so it won't pick out the precise maximum, but depending on the value of the rho option it can be tuned to get pretty close.
When a more precise value of a maximum is needed, it's pretty common to set up a trajectory such that a phase ends when the maximum or minimum is reached.
If the variable whose maximum is being sought is a state, this can be done by adding a boundary constraint on the rate source for that state.
This ensures that the maximum occurs at the first or last node in the phase (depending on if its an initial or final boundary constraint). That lets you more accurately capture its value.
If the variable being sought is not a state, its possible to use the polynomials that are used for fitting states and controls in a phase to interpolate the variable of interest. By then taking the time derivative of that polynomial we can get a reasonably good approximation for its rate. The master branch of dymos has a method add_timeseries_rate_output that does this. And soon, within a few weeks hopefully, we'll add add_boundary_rate_constraint so that these interpolated rates can be easily used as boundary constraints.
In the meantime, you should be able to achieve this by adding the timeseries rate output and then manually applying the OpenMDAO method 'add_constraint' to the resulting timeseries output, using either indices=[0] or indices=[-1] to treat it as an initial or final constraint.
This is a common enough request that we'll add some documentation on how to achieve this behavior using both the KSComp approach and the boundary constraint approach.
Personally I'm not as much of a fan of KSComp because I've had trouble getting problems getting those types of objectives to converge in the past. I've used the slack variable and that has worked well. In the following example, we take a guess at the Rotor power in static analysis, and then we run a trajectory and get the actual rotor power during the mission. The objective was to minimize aircraft weight, so if you have a large amount of power in statics, that costs more weight. The constraint shown below prevents us from decreasing our updated guess of rotor power in statics below the maximum power required during the trajectory.
p.model.add_subsystem(
'static_power_check',
om.ExecComp('Power_check = Power_ODE - Power_statics',
Power_check = {'value':np.ones(nn_timeseries_main_tx), 'units':'kW'},
Power_ODE = {'value':np.ones(nn_timeseries_main_tx), 'units':'kW'},
Power_statics = {'value':0.0, 'units':'kW'}),
promotes_inputs=[
('Power_ODE','hop0.main_phase.timeseries.Power_R'), ('Power_statics','Power_{rotor,slack}')],
promotes_outputs=['Power_check'])
p.model.add_constraint('Power_check', upper=0, ref=1)
The constraint on the slack variable effectively helped us ensure that our slack rotor power matched the maximum rotor power during the mission. This allowed us to get the right sizes for the rotor parts (i.e. motors).

How do you choose to compute partials with the adjoint method?

Is there a specific option? Can you choose forward or reverse mode? Does it not matter since under the hood openMDAO computes the derivatives with the unified method?
When you call Problem.setup you can include the argument mode which is one of `fwd', 'rev', or 'auto'.
Note that the choice of forward or reverse (adjoint) derivatives affects how total derivatives (the ones that the optimizers and solvers need) are computed from the partial derivatives (the ones the provided by the components). It does not affect how the partial derivatives are provided in the compute_partials or linearize methods though.
Choosing the correct mode can make a big difference in performance, and in most use cases 'auto' will figure out the correct mode based on the number of design variables and the number of constraints + objective.
Problems with many constraints and few design variables (many rows, few columns in the total jacobian) will usually be much faster in forward mode.
Those with few constraints but many design variables (few rows, many columns) will be significantly faster in reverse.
Advanced Bidirectional Derivatives
OpenMDAO will also attempt to figure out the fastest way to compute totals through "coloring" the total jacobian. The analogy of this is, in finite-difference, sometimes you can perturb multiple design variables simultaneously to speed the calculation of the jacobian, assuming these variables don't both contribute to the same constraints/objective.
Traditionally, if a total jacobian had a dense column, then it can't efficiently be colored in reverse mode (multiple constraints "impact" the same design variable).
Similarly, a dense row would kill the efficiency of coloring the jacobian in forward mode.
However, OpenMDAO can figure out which derivatives can be efficiently colored in forward mode, and which can be colored in reverse mode, using both approaches to fill in the total jacobian.
You can read more about this capability here: http://openmdao.org/twodocs/versions/3.0.0/features/core_features/working_with_derivatives/simul_derivs.html

A framework for comparing the time performance of Expectation Maximization

I have my own implementation of the Expectation Maximization (EM) algorithm based on this paper, and I would like to compare this with the performance of another implementation. For the tests, I am using k centroids with 1 Gb of txt data and I am just measuring the time it takes to compute the new centroids in 1 iteration. I tried it with an EM implementation in R, but I couldn't, since the result is plotted in a graph and gets stuck when there's a large number of txt data. I was following the examples in here.
Does anybody know of an implementation of EM that can measure its performance or know how to do it with R?
Fair benchmarking of EM is hard. Very hard.
the initialization will usually involve random, and can be very different. For all I know, the R implementation by default uses hierarchical clustering to find the initial clusters. Which comes at O(n^2) memory and most likely at O(n^3) runtime cost. In my benchmarks, R would run out of memory due to this. I assume there is a way to specify initial cluster centers/models. A random-objects initialization will of course be much faster. Probably k-means++ is a good way to choose initial centers in practise.
EM theoretically never terminates. It just at some point does not change much anymore, and thus you can set a threshold to stop. However, the exact definition of the stopping threshold varies.
There exist all kinds of model variations. A method only using fuzzy assignments such as Fuzzy-c-means will of course be much faster than an implementation using multivariate Gaussian Mixture Models with a covaraince matrix. In particular with higher dimensionality.
Covariance matrixes also need O(k * d^2) memory, and the inversion will take O(k * d^3) time, and thus is clearly not appropriate for text data.
Data may or may not be appropriate. If you run EM on a data set that actually has Gaussian clusters, it will usually work much better than on a data set that doesn't provide a good fit at all. When there is no good fit, you will see a high variance in runtime even with the same implementation.
For a starter, try running your own algorithm several times with different initialization, and check your runtime for variance. How large is the variance compared to the total runtime?
You can try benchmarking against the EM implementation in ELKI. But I doubt the implementation will work with sparse data such as text - that data just is not Gaussian, it is not proper to benchmark. Most likely it will not be able to process the data at all because of this. This is expected, and can be explained from theory. Try to find data sets that are dense and that can be expected to have multiple gaussian clusters (sorry, I can't give you many recommendations here. The classic Iris and Old Faithful data sets are too small to be useful for benchmarking.

fitness function and Selection for a Genetic Algorithm

I'm trying to design a nonlinear fitness function where I maximize variable A and minimize the variable B. The issue is that maximizing A is much more important at single digit values, almost logarithmic. B needs to be minimized and in contrast to A, it becomes less important when small (less than one) and more important when it's larger (>1), so exponential decay.
The main goal is to optimize A, so I guess an analog is A=profits, B=costs
Should I aim to keep everything positive so that the I can use a roulette wheel selection, or would it be better to use a rank/torunament kind of system? The purpose of my algorithm is shape optimization.
Thanks
When considering a multi-objective problem the goal is usually to identify all solutions that lie on the Pareto curve - the Pareto optimal set. Have a look here for a 2-dimensional visual example. When the algorithm completes you want a set of solutions that are not dominated by any other solution. You therefore need to define a pareto ranking mechanism to take into account both objectives - for a more in depth explanation, as well as links to even more reading, go here
With this in mind, in order to effectively explore all solutions along the pareto front you do not want an implementation that encourages premature convergence, otherwise your algorithm will only explore the search space in one specific area of the Pareto curve. I would implement a selection operator that keeps all members of each iteration's optimal set of solutions, that is all solutions which are not dominated by another + plus a parameter controlled percentage of other solutions. This way you encourage exploration all along the Pareto curve.
You also need to ensure your mutation and crossover operators are tuned correctly too. With any novel application of Evolutionary Algorithms, part of the problem is trying to identify an optimal parameter set for the problem domain... this is where it gets really interesting!!
The description is very vague, but assuming that you actually have an idea of what the function should look like and you're just wondering whether you need to modify it so that proportional selection can be used easily, then no. Regardless of fitness function, you should probably default to using something like tournament selection. Controlling selection pressure is one of the most important things you have to do in order to get consistently good results, and roulette wheel selection doesn't allow you that control. You typically get enormous pressure very early, which drives premature convergence. That might be preferable in a few cases, but it's not where I'd start my investigations.

Resources