How to set up a nested driver/optimizer? - openmdao

I'm pretty new to OpenMDAO. I would like to set up my problem such that a subdiscipline is driven by its own optimizer and hands off its results to the top-level problem, where a separate optimizer uses them.
For a bit more context, the sub-problem is trajectory optimization of a vehicle. I successfully got that problem to converge in a few iterations without varying the vehicle parameters (mass, thrust, fuel, etc.). So far so good. However, if I let the optimizer also vary some vehicle parameters, it doesn't seem to reach the global optimum.
So my thought was to let the trajectory optimization subproblem do what it already does successfully, incorporate it as a subproblem of the overall problem, and see if that works better.
So my question is:
Can an OpenMDAO problem have multiple drivers?
What's the right way to set that up? Do I wrap my subproblem into its own ExplicitComponent?

While this is possible, solving a problem in this way will not pass accurate analytic derivatives between the system design and the trajectory design.
We've developed another tool, Dymos, specifically for multidisciplinary optimization problems that involve trajectory optimization.
It supports pseudospectral methods (like those in GPOPS, PSOPT, and OTIS) as well as shooting methods, and it allows a trajectory to be optimized as part of a larger system optimization problem.
Take a look at some of the example problems and see if it might work for you.
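For reference, if you do still want to try the sub-problem approach from the question, a rough sketch might look like the following. This is not the recommended path, and the names here (TrajectoryGroup, the variable names, and the design variables) are illustrative assumptions, not anything from the original model:

```python
import openmdao.api as om

class TrajectorySubOpt(om.ExplicitComponent):
    """Sketch of wrapping a sub-problem, with its own driver, inside a component."""

    def setup(self):
        self.add_input('mass', val=1000.0)
        self.add_input('thrust', val=5000.0)
        self.add_output('fuel_burn', val=0.0)

        # Build the inner problem once; it carries its own (sub-)optimizer.
        self._sub = om.Problem()
        self._sub.model.add_subsystem('traj', TrajectoryGroup(), promotes=['*'])  # assumed group
        self._sub.driver = om.ScipyOptimizeDriver(optimizer='SLSQP')
        self._sub.model.add_design_var('controls')      # assumed trajectory design variable
        self._sub.model.add_objective('fuel_burn')
        self._sub.setup()

        # No analytic derivatives are available across the inner optimization,
        # so the outer problem has to fall back on finite differencing.
        self.declare_partials('fuel_burn', ['mass', 'thrust'], method='fd')

    def compute(self, inputs, outputs):
        # Push the outer design variables down, run the inner optimizer,
        # and hand the optimized result back up to the outer problem.
        self._sub.set_val('mass', inputs['mass'])
        self._sub.set_val('thrust', inputs['thrust'])
        self._sub.run_driver()
        outputs['fuel_burn'] = self._sub.get_val('fuel_burn')
```

The `method='fd'` partials are the crux of the answer above: the outer optimizer only ever sees finite-differenced sensitivities across the trajectory sub-optimization, which is exactly why a tool like Dymos, keeping everything in one problem with analytic derivatives, is usually the better route.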

Related

Avoiding singularity in analysis - does OpenMDAO automatically enable 'fully-simultaneous' solution?

Turbulent boundary layer calculations break down at the point of flow separation when solved with a prescribed boundary layer edge velocity, ue, in what is called the direct method.
This can be alleviated by solving the system in a fully-simultaneous or quasi-simultaneous manner. Details about both methods are available here (https://www.rug.nl/research/portal/files/14407586/root.pdf), pages 38 onwards. Essentially, the fully-simultaneous method combines the inviscid and viscous equations into a single large system of equations, and solves them with Newton iteration.
I have currently implemented an inviscid panel solver entirely in ExplicitComponents. I intend to implement the boundary layer solver also entirely with ExplicitComponents. I am unsure whether coupling these two groups would then result in an execution procedure like the direct method, or whether it would work like the fully-simultaneous method. I note that in the OpenMDAO paper, it is stated that the components are solved "as a single nonlinear system of equations", and that the reformulation from explicit components to the implicit system is handled automatically by OpenMDAO.
Does this mean that if I couple my two analyses (again, consisting purely of ExplicitComponents) and set the group to solve with the Newton solver, I'll get a fully-simultaneous solution 'for free'? This seems too good to be true, as ultimately the component that integrates the boundary layer equations will have to take some prescribed ue as an input, and then will run into the singularity in the execution of its compute() method.
If doing the above would instead make it execute like the direct method and lead to the singularity, (briefly) what changes would I need to make to avoid it? Would it require defining the boundary layer components implicitly?
Despite seeming too good to be true, you can in fact change the structure of your system by swapping out the top-level solver.
If you used a NonlinearBlockGS solver at the top, it would solve in the weak form. If you used a NewtonSolver at the top, it would solve as one large monolithic system. This property does indeed derive from the unique structure of how OpenMDAO stores things.
There are some caveats. I would guess that your panel code is implemented as a set of intermediate calculations broken up across several components. If that's the case, then the NewtonSolver will treat each intermediate variable as if it were its own state variable. In other words, you would have more than just delta and u_e as states, but also all the intermediate calculations.
This might be somewhat unstable (though it might work just fine, so try it!). You might need a hybrid between the weak and strong forms, which can be achieved via the solve_subsystems option on the NewtonSolver. This approach is called the hierarchical Newton method in section 5.1.2 of the OpenMDAO paper. It does a sub-iteration of NLBGS for every top-level Newton iteration, which acts as a form of nonlinear preconditioner and can help stabilize the strong form. You can limit how many sub-iterations are done; in your case you may want to use just 2 or 3 because of the risk of singularity.
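As a rough sketch of the setup being described (the group names InviscidPanelGroup and BoundaryLayerGroup are placeholders for your own components, and the variable connections are omitted):

```python
import openmdao.api as om

prob = om.Problem()
coupled = prob.model.add_subsystem('coupled', om.Group())
coupled.add_subsystem('inviscid', InviscidPanelGroup())   # placeholder: your panel code
coupled.add_subsystem('viscous', BoundaryLayerGroup())    # placeholder: your BL code

# Weak form would be: coupled.nonlinear_solver = om.NonlinearBlockGS()
# Strong (monolithic) form: Newton at the top of the coupled group...
coupled.nonlinear_solver = om.NewtonSolver(solve_subsystems=True)
# ...with the number of sub-solves limited (the answer suggests 2 or 3 here),
# so the subsystem iterations act as a nonlinear preconditioner.
coupled.nonlinear_solver.options['max_sub_solves'] = 2
coupled.linear_solver = om.DirectSolver()

prob.setup()
```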

Is there a way to specify partials for an ExecComp?

Looking into the class, I'm seeing that by default it looks like they're complex stepped. Is there a way to specify an analytical partial?
I've got some code that has a lot of essentially one-liner explicit comps with analytical partials specified. Is there any real performance benefit to that over an ExecComp? Or with simple functions does it work out to be roughly the same?
There's currently no way to specify analytic partials for ExecComps and you're right that they're complex-stepped.
The short answer to your next question is that for simple functions there's no meaningful performance benefit to using explicit components over ExecComp. This is because complex step computes derivatives to within machine precision when using an adequately small step size, which OpenMDAO does. The actual computational cost of performing the complex step, for one-liners, is generally trivial.
The longer answer involves a few considerations, such as the sizes of the component's input and output arrays, the sparsity pattern of the Jacobian, and the cost of the actual compute function. If you want, I can go into more detail about these considerations and suggest which method to use for your problems.
[Edit: I've updated the figure with results for this compute: y = sum(log(x)/x**2 + 3*log(x))]
I've added a figure below showing the cost for computing derivatives of a component as we change the size of the input array to that component. The analytic component is slightly faster across the board, but requires more lines of code.
Basically, whichever method is easier to implement is probably advantageous as there's not a huge cost difference. For this extremely simple compute function, because it's so inexpensive, the framework overhead probably has a larger impact on cost than the actual derivative computation. Of course these trends are also problem dependent.
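To make the comparison concrete, here is a sketch of the two options for the one-liner used in the figure above (the array size and names are illustrative):

```python
import numpy as np
import openmdao.api as om

n = 100  # illustrative array size

# Option 1: ExecComp -- partials are complex-stepped automatically.
exec_comp = om.ExecComp('y = sum(log(x) / x**2 + 3 * log(x))',
                        x=np.ones(n), y=0.0)

# Option 2: an equivalent ExplicitComponent with analytic partials.
class LogSum(om.ExplicitComponent):
    def setup(self):
        self.add_input('x', val=np.ones(n))
        self.add_output('y', val=0.0)
        self.declare_partials('y', 'x')

    def compute(self, inputs, outputs):
        x = inputs['x']
        outputs['y'] = np.sum(np.log(x) / x**2 + 3.0 * np.log(x))

    def compute_partials(self, inputs, partials):
        x = inputs['x']
        # d/dx [log(x)/x**2 + 3*log(x)] = (1 - 2*log(x)) / x**3 + 3/x
        partials['y', 'x'] = (1.0 - 2.0 * np.log(x)) / x**3 + 3.0 / x
```

The ExecComp is one line; the analytic version is about twenty, which is the trade-off described above.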

Openmdao 1.x: Efficient way to implement Expected Improvement

I am currently using Openmdao 1.7.1. I am trying to have a MetaModel with Kriging train itself at the best point of Expected Improvement. The aim is to find a global optimum on a compact design space with an EGO-like method.
However I am facing the following conundrum:
In order to find the best point, the only way I see is to run an optimization on the Expected Improvement function with a gradient-based optimizer in a nested Problem, with an outer problem running a FixedPointIterator and checking the value of the Expected Improvement.
My questions are the following:
Is there another, more efficient way of doing this? I couldn't find anything about EGO in Openmdao 1.x; if there is, where should I look?
If this is the only way:
Will this find the global optimum in my design space?
Thank you in advance for your responses.
I think that you could develop EGO as a stand alone driver. The driver would be responsible for running the underlying model, collecting the cases, building the surrogate and doing its own sub-optimization.
You can use the surrogate models built into OpenMDAO for this. You just wouldn't use the meta-model component. You would just use the surrogate model by itself. For an example of how to do that, look at this test which runs kriging by itself.
So 90% of the EGO process would be wrapped up into a driver. This avoids the need for a sub-problem and I think simplifies the code significantly. The EGO algorithm is fairly simple and is not hard to code into the driver. You won't gain much by using nested problems to implement it. But by making it a driver, you can still build a more complex model that will get run by EGO.
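A very rough sketch of the loop such a driver would run is below. It is written as a standalone function around an evaluate(x) callback (which would run the underlying OpenMDAO problem); the same structure would live inside the driver's run method. KrigingSurrogate exists in OpenMDAO 1.x, but the exact return of its predict() differed between versions, so treat that call as an assumption and check it against the kriging test mentioned above:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize
from openmdao.api import KrigingSurrogate   # available in OpenMDAO 1.x

def expected_improvement(x, surrogate, f_min):
    # NOTE: assumed (mean, error) return; some 1.x versions return a
    # distribution object instead -- check the linked kriging test.
    mu, sigma = surrogate.predict(np.atleast_2d(x))
    mu, sigma = float(mu), max(float(sigma), 1e-12)
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def run_ego(evaluate, x0_samples, bounds, n_iter=20):
    """evaluate(x) runs the underlying model and returns the objective value."""
    X = list(x0_samples)
    Y = [evaluate(x) for x in X]
    for _ in range(n_iter):
        # Rebuild the surrogate from all cases collected so far.
        surrogate = KrigingSurrogate()
        surrogate.train(np.array(X), np.array(Y))
        f_min = min(Y)
        # Sub-optimization: maximize EI (minimize its negative) over the design space.
        res = minimize(lambda x: -expected_improvement(x, surrogate, f_min),
                       x0=X[int(np.argmin(Y))], bounds=bounds, method='L-BFGS-B')
        # Evaluate the true model at the EI-optimal point and add it to the data.
        X.append(res.x)
        Y.append(evaluate(res.x))
    return X[int(np.argmin(Y))], min(Y)
```

Note that the EI sub-optimization here uses a local gradient-based optimizer from a single start point; in practice you would probably restart it from several points, since EI is multimodal, which also bears on the "will this find the global optimum" question above.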

Custom Graph Partitioning algorithms in Giraph

There have been mentions of using custom partitioning algorithms for Giraph applications. However, it is not clearly explained anywhere. As Castagna pointed out in "how to partition graph for pregel to maximize processing speed?", there may not be a need for such partitioning, as the HashPartitioner will by itself be very good in most cases.
The problem of partitioning a graph 'intelligently' in order to minimize execution time is an interesting one, however it's not simple and it depends on your data and your algorithm. You might find also that, in practice, it's not necessary and a random partitioning is sufficiently good.
For example, if you are interested in exploring Pregel-like approaches, you can have a look at Apache Giraph and experiment with different partitioning techniques.
However, for the purpose of learning, it would be good to see live examples, and I have found none so far. For example, the normal k-way partitioning algorithm (Kernighan-Lin) being executed in Giraph, or at least the direction in which I should implement it.
All the Google results were from the Apache Giraph pages, where there are only definitions of the functions and the various options to use them.

Tuning Mathematical Parallel Codes

Assuming that I am interested in performance rather than portability of my linear algebra iterative multi-threaded solver and that I have the results of profiling my code in hand, how do I go about tuning my code to run optimally on that machine of my choice?
The algorithm involves Matrix-Vector multiplications, norms and dot-products. (FWIW, I am working on CG and GMRES).
I am working on codes whose matrix size is roughly equivalent to the full size of the RAM (~6 GB). I'll be working on an Intel i3 laptop. I'll be linking my codes against Intel MKL.
Specifically,
Is there a good resource (PDF/book/paper) for learning manual tuning? There are numerous things that I learnt by doing, for instance that manual unrolling isn't always optimal, or things about compiler flags, but I would prefer a centralized resource.
I need something to translate profiler information into improved performance. For instance, my profiler tells me that the stacks of one processor are being accessed by another, or that my mulpd assembly is taking too much time. I have no clue what these mean or how I could use this information to improve my code.
My intention is to spend as much time as needed to squeeze out as much compute power as possible. It's more of a learning experience than for actual use or distribution as of now.
(I am concerned about manual tuning not auto-tuning)
Misc Details:
This differs from usual performance tuning since the major portions of the code are linked to Intel's proprietary MKL library.
Because of memory bandwidth issues in O(N^2) matrix-vector multiplications, and because of dependencies, there is a limit to what I can manage on my own through simple observation.
I write in C and Fortran, and I have tried both; as discussed a million times on SO, I found no difference between them if I tweak them appropriately.
Gosh, this still has no answers. After you've read this you'll still have no useful answers ...
You imply that you've already done all the obvious and generic things to make your codes fast. Specifically you have:
chosen the fastest algorithm for your problem (either that, or your problem is to optimise the implementation of an algorithm rather than to optimise the finding of a solution to a problem);
worked your compiler like a dog to squeeze out the last drop of execution speed;
linked in the best libraries you can find which are any use at all (and tested to ensure that they do in fact improve the performance of your program);
hand-crafted your memory access to optimise r/w performance;
done all the obvious little tricks that we all do (e.g. when comparing the norms of 2 vectors you don't need to take a square root to determine that one is 'larger' than another; see the small sketch after this list);
hammered the parallel scalability of your program to within a gnat's whisker of the S==P line on your performance graphs;
always executed your program on the right size of job, for a given number of processors, to maximise some measure of performance;
and still you are not satisfied!
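As a trivial illustration of the norm-comparison trick mentioned in that list (shown here in Python/NumPy purely for clarity; the same idea applies in your C or Fortran):

```python
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Since sqrt is monotonic, comparing squared norms gives the same answer
# as comparing the norms themselves, without paying for the square roots.
a_sq = np.dot(a, a)
b_sq = np.dot(b, b)
a_is_larger = a_sq > b_sq   # identical result to np.linalg.norm(a) > np.linalg.norm(b)
```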
Now, unfortunately, you are close to the bleeding edge and the information you seek is not to be found easily in books or on web-sites. Not even here on SO. Part of the reason for this is that you are now engaged in optimising your code on your platform and you are in the best position to diagnose problems and to fix them. But these problems are likely to be very local indeed; you might conclude that no-one else outside your immediate research group would be interested in what you do, I know you wouldn't be interested in any of the micro-optimisations I do on my code on my platform.
The second reason is that you have stepped into an area that is still an active research front and the useful lessons (if any) are published in the academic literature. For that you need access to a good research library, if you don't have one nearby then both the ACM and IEEE-CS Digital Libraries are good places to start. (Post or comment if you don't know what these are.)
In your position I'd be looking at journals on 2 topics: peta- and exa-scale computing for science and engineering, and compiler developments. I trust that the former is obvious, the latter may be less obvious: but if your compiler already did all the (useful) cutting-edge optimisations you wouldn't be asking this question and compiler-writers are working hard so that your successors won't have to.
You're probably looking for optimisations which, like loop unrolling, say, were relatively difficult to find implemented in compilers 25 years ago and which were therefore bleeding-edge back then, and which themselves will be old and established in another 25 years.
EDIT
First, let me make explicit something that was originally only implicit in my 'answer': I am not prepared to spend long enough on SO to guide you through even a summary of the knowledge I have gained in 25+ years in scientific/engineering and high-performance computing. I am not given to writing books, but many are and Amazon will help you find them. This answer was way longer than most I care to post before I added this bit.
Now, to pick up on the points in your comment:
on 'hand-crafted memory access' start at the Wikipedia article on 'loop tiling' (see, you can't even rely on me to paste the URL here) and read out from there; you should be able to quickly pick up the terms you can use in further searches.
on 'working your compiler like a dog' I do indeed mean becoming familiar with its documentation and gaining a detailed understanding of the intentions and realities of the various options; ultimately you will have to do a lot of testing of compiler options to determine which are 'best' for your code on your platform(s).
on 'micro-optimisations', well here's a start: Performance Optimization of Numerically Intensive Codes. Don't run away with the idea that you will learn all (or even much) of what you want to learn from this book. It's now about 10 years old. The take away messages are:
performance optimisation requires intimacy with machine architecture;
performance optimisation is made up of 1001 individual steps and it's generally impossible to predict which ones will be most useful (and which ones actually harmful) without detailed understanding of a program and its run-time environment;
performance optimisation is a participation sport, you can't learn it without doing it;
performance optimisation requires obsessive attention to detail and good record-keeping.
Oh, and never write a clever piece of optimisation that you can't easily un-write when the next compiler release implements a better approach. I spend a fair amount of time removing clever tricks from 20-year old Fortran that was justified (if at all) on the grounds of boosting execution performance but which now just confuses the programmer (it annoys the hell out of me too) and gets in the way of the compiler doing its job.
Finally, one piece of wisdom I am prepared to share: these days I do very little optimisation that is not under one of the items in my first list above; I find that the cost/benefit ratio of micro-optimisations is unfavourable to my employers.
