I occasionally don't get convergence on my problem. My problem is set up as a Dymos problem. I am using IPOPT as my optimizer. If I am only running the problem once, I can check IPOPT.out for the converged string and that's ok.
I often want to run parameter sweeps, where I vary boundary conditions and problem options. To do these I use Ray https://www.ray.io/, a Python library for running parallel processes. I turn off all the file I/O that I can, as otherwise the multiple processes interfere with each other when writing to file.
However, it's then difficult to know if a particular process / case did not converge. For this reason actually having run_problem() return information on convergence would be useful. It doesn't seem to do that, so is there a way to get convergence info some other way, that does not involve reading a file?
I do realize there is the whole DOE driver system that is set up for OpenMDAO. However, the learning curve looked rather steep. I got parallel processing working with Ray in a matter of hours, and it works quite well except for this one issue.
prob.driver.fail should be False if the optimization was successful, and it doesn't need to be read from a file. However, given the various levels of success in optimizers, this might not be completely accurate. For instance, "solved to acceptable tolerance" vs. "optimal solution found" is a little difficult to capture in a simple boolean output, and we should probably find a better way to report the optimizer's success.
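A minimal sketch of how that flag could be handed back from a worker without touching a file. The helper name `convergence_record` and the stand-in objects are hypothetical; only the `driver.fail` attribute comes from the answer above. In a real sweep, `prob` would be the problem object after `run_problem`, and the returned dict would be the return value of the Ray task.

```python
from types import SimpleNamespace


def convergence_record(prob, case_id):
    """Return a small, picklable summary for one sweep case."""
    # driver.fail is False when the driver reports success.
    failed = bool(prob.driver.fail)
    return {"case_id": case_id, "converged": not failed}


# Stand-in for a solved problem so the sketch runs without OpenMDAO installed.
fake_prob = SimpleNamespace(driver=SimpleNamespace(fail=False))
record = convergence_record(fake_prob, case_id=3)
print(record)
```

Collecting these dicts in the parent process gives a per-case convergence table, subject to the caveat above that a single boolean hides the difference between "acceptable" and "optimal".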
My problem:
I have a system with 4 states and 4 parameters (static) that I would like to optimize. The parameters are initialized to some known values that would result in trajectories that respect constraints. The states are initialized to a constant value. To verify the model, I run the problem with the parameters set to opt=False. Once verified, I rebuild the OpenMDAO problem with opt=True and run the optimizer.
I'm running a study to evaluate how each parameter affects the system, cost function, etc. and how the initial guess impacts the optimization (ideally, it doesn't). The problem I encounter is that some initial guesses for a parameter result in a failed optimization (iteration limit or positive line search) while others don't and it's not immediately clear why. Note: I always provide an initial guess for the problem that results in feasible trajectories. I check this by setting opt=False for the parameters when I build the problem.
My assumption is that although my initial guess for the parameters is okay, my initial guess for the states is not, and the problem gets stuck trying to find feasible trajectories.
My solution/idea:
Is it possible to warm start an optimization problem in Dymos? To warm start, I would like to provide a feasible solution to the states and state rates of the optimizer. As a general flow I would like to first (1) run the optimization with the opt setting in controls and parameters set to False to get a state trajectory, then (2) set the opt setting for controls and parameters to True, and finally (3) re-run the optimization. It seems like there should be an easy way to do this, but I can't determine how without creating 2 problems (with different opt settings) and setting all the initial state guesses of the opt=True problem.
Note: I did read this post: Dymos how to use previous trajectory solution as initial guess? and I can rerun a problem. I just don't know how to change the opt setting between runs.
If there is an alternate or better solution to my problem, I'd be interested in that as well.
If you are using IPOPT, using a previous solution as your initial guess doesn't really help. This is due to the nature of interior point optimizers. On start, the barrier parameter mu is large. Newton's method then finds the "optimum" for that value of mu, which pushes the iterate AWAY from the initial guess. Then mu is decreased, and Newton's method gets you closer to the true optimum. This process is repeated as mu is decreased, until finally mu is small and you get back to the point that you guessed initially as the optimum.
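A toy illustration of that effect, not IPOPT's actual algorithm: take the made-up problem "minimize x subject to x >= 1", whose true optimum is x = 1. A log-barrier version minimizes x - mu*log(x - 1), and setting its derivative to zero gives the barrier minimizer x = 1 + mu. The function name and the mu schedule are assumptions chosen for illustration.

```python
def barrier_minimizer(mu):
    """Minimizer of x - mu*log(x - 1), from 1 - mu/(x - 1) = 0."""
    return 1.0 + mu


true_optimum = 1.0
mu_schedule = (1.0, 0.1, 0.01, 0.001)  # hypothetical decreasing barrier values
path = [barrier_minimizer(mu) for mu in mu_schedule]

for mu, x in zip(mu_schedule, path):
    print(f"mu={mu:<6} barrier optimum x={x:.3f} distance={x - true_optimum:.3f}")
```

Even if the initial guess is exactly the true optimum, the first barrier subproblem has its minimizer a full unit away, and the iterates only return as mu shrinks.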
Also, because we are using a Quasi-Newton method with a limited-memory Hessian approximation (L-BFGS) when going through Dymos/pyoptsparse, all the information about the Hessian is not there when you start again even if your initial guess is the optimum. So that information has to be filled in again as the algorithm iterates.
I am not an IPOPT expert but this seems to explain why I had no luck trying to use an "improved" initial guess. One thing that did help a lot with convergence was increasing the "limited_memory_max_history" parameter to 100 or so.
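A sketch of how such IPOPT options could be applied, assuming the driver exposes a dict-like `opt_settings` attribute as pyOptSparseDriver does. The helper name and the stand-in driver are hypothetical; the option names are IPOPT's own.

```python
from types import SimpleNamespace

# IPOPT option names; the value 100 is the one suggested above.
ipopt_settings = {
    "hessian_approximation": "limited-memory",
    "limited_memory_max_history": 100,
}


def apply_ipopt_settings(driver, settings):
    """Copy each option into the driver's opt_settings dict."""
    for name, value in settings.items():
        driver.opt_settings[name] = value


# Stand-in driver so the sketch runs without pyoptsparse installed.
fake_driver = SimpleNamespace(opt_settings={})
apply_ipopt_settings(fake_driver, ipopt_settings)
print(fake_driver.opt_settings)
```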
IPOPT does have the warm-start option but getting it the initial information it needs regarding the Hessian and initial multipliers might be something you have to go into pyoptsparse to figure out how to do.
I am running my trajectory problem multiple times in a row while varying a parameter to generate plots and compare to other things. I think I can make it run faster by just using the previous solution as a guess.
Would I do something like
p['traj.phase_1.states:v'] = prev_p.get_val('traj.phase_1.states:v')
Also, is there a single function to load the file "dymos_simulation.db" into memory?
The dymos.run_problem is intended to be the mechanism that makes this simple.
There is currently a PR that addresses some shortcomings, and I expect it to be merged sometime today and included in dymos 0.18.0 in the next day or two. In the meantime you can test against the source branch of the PR if you like:
https://github.com/OpenMDAO/dymos/pull/510
First, you can simulate out the initial guess of the controls (this is not recommended if you're likely to hit a singularity in the ODE during your simulation).
dymos.run_problem(p, run_driver=False, simulate=True)
That will generate the file 'dymos_simulation.db'. Then you can run
dymos.run_problem(p, run_driver=True, simulate=True, restart='dymos_simulation.db')
It will use the simulated guess as the initial guess for the solution. This should adequately satisfy the collocation constraints and give the optimizer an easier path to the solution.
I have two OpenMDAO groups with a cyclic dependency between them. I calculate the derivatives using complex step. I have a nonlinear solver for the dependency and use SLSQP to optimize my objective function. The issue is with the choice of the nonlinear solver. When I use NonlinearBlockGS the optimization is successful in 12 iterations. But when I use NewtonSolver with DirectSolver or ScipyKrylov the optimization fails (Iteration limit exceeded), even with maxiter=2000. The cyclic connections converge, but the design variables do not reach the optimal values. The difference between the design variables in consecutive iterations is on the order of 1e-5, and this increases the number of iterations needed. Also, when I change the initial guess to a value closer to the optimal value, it works.
To check further, I converted the model into IDF (by creating copies of coupling variables and consistency constraints) thereby removing the need for a solver. Now the optimization is successful in 5 iterations and the results are similar to the results when NonlinearBlockGS is used.
Why does this happen? Am I missing something? When should I use NewtonSolver over others? I know that it is difficult to answer without seeing the code. But it is just that my code is long with multiple components and I couldn't recreate the issue with a toy model. So any general insight is much appreciated.
Without seeing the code, you're right that it's hard to give specifics.
Very broadly speaking, Newton can sometimes have a lot more trouble converging than NLBGS (Note: this is not absolutely true, but is a good rule of thumb). So what I would guess is happening is that on your first or second iteration, the newton solver isn't actually converging. You can check this by setting newton.options['iprint']=2 and looking at the iteration history as the optimizer iterates.
When you have a solver in your optimization, it's critical that you also make sure that you set it to throw an error on non-convergence. Some optimizers can handle this error, and will backtrack on the line search. Others will just die. Either way, it's important. Otherwise, you end up giving the optimizer an unconverged case that it doesn't know is unconverged.
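A library-free sketch of the idea: the solver raises when it runs out of iterations, and the caller reports a failed evaluation instead of returning an unconverged value. In OpenMDAO the corresponding switch is the solver option `err_on_non_converge`; the class and function names below are hypothetical and this is not OpenMDAO's implementation.

```python
class NonConvergence(RuntimeError):
    """Raised when the solver hits its iteration limit (hypothetical name)."""


def newton_solve(residual, derivative, x, tol=1e-10, maxiter=20):
    """Scalar Newton iteration that refuses to return an unconverged value."""
    for _ in range(maxiter):
        r = residual(x)
        if abs(r) < tol:
            return x
        x -= r / derivative(x)
    if abs(residual(x)) < tol:
        return x
    raise NonConvergence(f"residual {residual(x):.3e} after {maxiter} iterations")


def evaluate(x0, maxiter):
    """What an optimizer-facing wrapper might do with the error."""
    try:
        root = newton_solve(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0, maxiter=maxiter)
        return {"ok": True, "value": root}
    except NonConvergence:
        # Flag the point as failed so the caller can backtrack or stop.
        return {"ok": False, "value": None}


good = evaluate(1.0, maxiter=20)
bad = evaluate(1000.0, maxiter=2)
print(good, bad)
```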
This is bad for two reasons. First, the objective and constraint values it gets are going to be wrong! Second, and perhaps more importantly, the derivatives it computes are going to be wrong! You can read the details in the theory manual, but in summary the analytic derivative methods that OpenMDAO uses assume that the residuals have gone to 0. If that's not the case, the math breaks down. Even if you were doing full-model finite difference, non-converged models are a problem. You'll just get noisy garbage when you try to FD it.
So, assuming you have set up your model correctly, and that you have the linear solvers set up properly (it sounds like you do since it works with NLBGS), then it's most likely that the Newton solver isn't converging. Use iprint, possibly combined with driver debug printing, to check this for yourself. If that's the case, you need to figure out how to get Newton to behave better.
There are some tips here that are pretty general. You could also try using the Armijo line search, which can often stabilize a Newton solve at the cost of some speed.
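To show why a backtracking line search can stabilize Newton, here is a small sketch on the textbook residual atan(x) = 0, where a full Newton step from x = 3 overshoots. It uses a simple sufficient-decrease test on the residual magnitude, which is an assumption for illustration and not OpenMDAO's ArmijoGoldsteinLS implementation.

```python
import math


def damped_newton(residual, derivative, x, tol=1e-10, maxiter=50, c=1e-4):
    """Newton iteration with step halving until the residual decreases."""
    for _ in range(maxiter):
        r = residual(x)
        if abs(r) < tol:
            return x
        step = -r / derivative(x)
        alpha = 1.0
        # Halve the step until |residual| drops by a small required fraction.
        while abs(residual(x + alpha * step)) > (1.0 - c * alpha) * abs(r) and alpha > 1e-8:
            alpha *= 0.5
        x += alpha * step
    raise RuntimeError("damped Newton did not converge")


def atan_derivative(x):
    return 1.0 / (1.0 + x * x)


x0 = 3.0
full_step = x0 - math.atan(x0) / atan_derivative(x0)  # undamped first step
root = damped_newton(math.atan, atan_derivative, x0)
print(f"undamped first step lands at {full_step:.2f}, damped solve ends at {root:.2e}")
```

The extra residual evaluations inside the halving loop are the speed cost mentioned above.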
Finally... Newton isn't the best answer in all situations. If NLBGS is more stable and computationally cheaper, you should use it. I applaud your desire to get it to work with Newton. You should definitely track down why it's not, but if it turns out that Newton just can't solve your coupled problem reliably, that's ok too!
The "set it to throw an error on non-convergence" link is broken in your answer. I have added the link which I think is the right one. Please correct it if the linked page is not the one you meant.
I have a rather complicated issue with my small package. Basically, I'm building a GARCH(1,1) model with the rugarch package, which is designed exactly for this purpose. It uses a chain of solvers (provided by Rsolnp and nloptr, general-purpose nonlinear optimization) and works fine. I'm testing my method with testthat by providing a benchmark solution, which was obtained previously by manually running the code under Windows (which is the main platform the package is to be used on).
Now, I initially had some issues when the solution was not consistent across several consecutive runs. The difference was within the tolerance I specified for the solver (default solver = 'hybrid', as recommended by the documentation), so my guess was it uses some sort of randomization. So I took away both random seed and parallelization ("legitimate" reasons) and the issue was solved, I'm getting identical results every time under Windows, so I run R CMD CHECK and testthat succeeds.
After that I decided to automate a little bit and now the build process is controlled by travis. To my surprise, the result under Linux is different from my benchmark, the log states that
read_sequence(file_out) not equal to read_sequence(file_benchmark)
Mean relative difference: 0.00000014688
Rebuilding several times yields the same result, and the difference is always the same, which means that under Linux the solution is also consistent. As a temporary fix, I'm setting a tolerance limit depending on the platform, and the test passes (see latest builds).
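For reference, the "Mean relative difference" reported in the log is, roughly, the summed absolute difference divided by the summed absolute benchmark values. A minimal Python sketch of that quantity and of a tolerance check (the package itself is in R; the function name, sample numbers, and tolerance are made up for illustration):

```python
def mean_relative_difference(target, current):
    """Sum of absolute differences, scaled by the sum of absolute targets."""
    numerator = sum(abs(t - c) for t, c in zip(target, current))
    denominator = sum(abs(t) for t in target)
    return numerator / denominator


benchmark = [1.0, 2.0, 4.0]    # hypothetical benchmark values
output = [1.0, 2.0, 4.000001]  # hypothetical output from another platform
tolerance = 1e-6               # hypothetical test tolerance

diff = mean_relative_difference(benchmark, output)
print(diff, diff < tolerance)
```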
So, to sum up:
A numeric procedure produces identical output on both Windows and Linux platforms separately;
However, these outputs are different and are not caused by random seeds and/or parallelization;
I generally only care about supporting under Windows and do not plan to make a public release, so this is not a big deal for my package per se. But I'm bringing this to attention as there may be an issue with one of the solvers that are being used quite widely.
And no, I'm not asking to fix my code: platform dependent tolerance is quite ugly, but it does the job so far. The questions are:
Is there anything else that can "legitimately" (or "naturally") lead to the described difference?
Are low-level numeric routines required to produce identical results on all platforms? Could it be that I'm expecting too much?
Should I care a lot about this? Is this a common situation?
Have you ever written simulations or randomized algorithms where you've run into trouble because of the quality of the (pseudo)-random numbers you used?
What was happening?
How did you detect / realize your prng was the problem?
Was switching PRNGs enough to fix the problem, or did you have to switch to a source of true randomness?
I'm trying to figure out what types of applications require one to worry about the quality of their source of randomness and how one realizes when this becomes a problem.
The dated random number generator RANDU was infamous in the seventies for producing "bad" random numbers. My PhD supervisor mentioned that it affected his PhD and he had to rerun simulations. A search on Google for "RANDU linear congruential generator" brings up other examples.
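RANDU's best-known defect can be checked in a few lines. The generator is x_{k+1} = 65539 * x_k mod 2^31, and because 65539^2 = 6 * 65539 - 9 (mod 2^31), every value is a fixed linear combination of the previous two, so consecutive triples fall on a small number of planes. The seed and sample size below are arbitrary choices.

```python
def randu(seed, n):
    """Return n successive RANDU values: x_{k+1} = 65539 * x_k mod 2**31."""
    values = []
    x = seed
    for _ in range(n):
        x = (65539 * x) % 2**31
        values.append(x)
    return values


xs = randu(seed=1, n=1000)

# Check the recurrence x_{k+2} = 6*x_{k+1} - 9*x_k (mod 2**31) on every triple.
planar = all(
    xs[k + 2] == (6 * xs[k + 1] - 9 * xs[k]) % 2**31
    for k in range(len(xs) - 2)
)
print(planar)
```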
When I run simulations on multiple machines, I've sometimes been tempted to generate "random seeds", rather than just use a proper parallel random number generator. For example, generate the seed using the current time in seconds. This has caused me enough problems that I avoid this at all costs.
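A small sketch of one way time-based seeds go wrong: two workers that start within the same second get the same seed and therefore the same stream. The timestamp value is made up, and Python's standard `random.Random` stands in for whatever generator the simulation uses.

```python
import random

timestamp_seed = 1_700_000_000  # hypothetical "current time in seconds"

# Two workers launched within the same second end up with the same seed.
worker_a = random.Random(timestamp_seed)
worker_b = random.Random(timestamp_seed)

draws_a = [worker_a.random() for _ in range(5)]
draws_b = [worker_b.random() for _ in range(5)]

print(draws_a == draws_b)
```

The two "independent" replicates are exact copies of each other. Libraries with dedicated support for parallel streams (for example, NumPy's `SeedSequence.spawn`) are designed to avoid this.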
This is mainly due to my particular interests, but other than parallel computing, the thought of creating my own random number generator would never cross my mind. Calling a well tested random number function is trivial in most languages.
It is good practice to run your PRNG against DieHard. Very good and fast PRNGs exist nowadays (see the work of Marsaglia); see Numerical Recipes, 3rd edition, for a good introduction.