OpenMDAO efficiency when using multiple components

I recently read this sentence in a paper:
"One important feature of OpenMDAO is the ability to subdivide a problem into components that have a small number of inputs and outputs and contain relatively simple analyses."
Moreover, looking at the examples in the manual, each component has only a few inputs and outputs.
Would that mean it is more efficient to use an ExecComp that takes two inputs from an ExplicitComponent and outputs a constraint, instead of doing everything within the ExplicitComponent? I'll try to give an example here:
x1, x2 --> ExplicitComp --> y1
y1 --> ExecComp --> constraint
OR
x1, x2 --> ExplicitComp --> y1, constraint

What the comment in that paper is referring to is not computational efficiency, but rather the benefit to the user in terms of making models more modular and maintainable. Additionally, when you have smaller components with fewer inputs, it is much easier to compute analytic derivatives for them.
The idea is that by breaking your calculation up into smaller steps, the partial derivatives are then simpler for you to compute by hand. OpenMDAO will then compute the total derivatives across the model for you.
So in a sense, you're leaning on OpenMDAO's ability to compute derivatives across large models to lessen your workload.
From a computational cost perspective, there is some cost associated with having more components versus fewer. Taken to the extreme, if you had one component for each line of code in a huge calculation, then the framework overhead could become a problem. There are some features in OpenMDAO that can help mitigate some of this cost, specifically the in-memory assembly of Jacobians for serial models.
With regard to the ExecComp specifically, that component is meant for simple and inexpensive calculations. It computes its derivatives using complex step, which can be costly if large array inputs are involved. It's there to make simple steps, like adding variables, easier. But for expensive calculations you shouldn't use it.
In your specific case, I would suggest that you consider whether it is hard to propagate the derivatives from x1, x2 through to the constraint yourself. If the chain rule is not hard to handle, then I would probably just lump it all into one calculation. If, for some reason, the derivatives are nasty when you combine all the calculations, then just split them up.
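For illustration, here is a minimal sketch of the split layout, with a made-up calculation y1 = x1 * x2 and a made-up constraint expression (both are placeholders, not from the question):

import openmdao.api as om

class MyComp(om.ExplicitComponent):
    # Hypothetical component: y1 = x1 * x2.
    def setup(self):
        self.add_input('x1', 1.0)
        self.add_input('x2', 1.0)
        self.add_output('y1', 1.0)
        self.declare_partials('y1', ['x1', 'x2'])

    def compute(self, inputs, outputs):
        outputs['y1'] = inputs['x1'] * inputs['x2']

    def compute_partials(self, inputs, partials):
        # Simple hand-derived partials for the small component.
        partials['y1', 'x1'] = inputs['x2']
        partials['y1', 'x2'] = inputs['x1']

prob = om.Problem()
prob.model.add_subsystem('comp', MyComp(), promotes=['*'])
# ExecComp handles the cheap constraint expression and its derivatives for you.
prob.model.add_subsystem('con', om.ExecComp('constraint = y1 - 2.0'), promotes=['*'])

prob.setup()
prob.run_model()
print(prob.get_val('constraint'))

The alternative is to fold the constraint calculation into MyComp and hand-derive its partials as well; both layouts give the same total derivatives.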

Related

How do you choose to compute partials with the adjoint method?

Is there a specific option? Can you choose forward or reverse mode? Does it not matter since under the hood openMDAO computes the derivatives with the unified method?
When you call Problem.setup you can include the argument mode, which is one of 'fwd', 'rev', or 'auto'.
Note that the choice of forward or reverse (adjoint) derivatives affects how total derivatives (the ones that the optimizers and solvers need) are computed from the partial derivatives (the ones provided by the components). It does not affect how the partial derivatives are provided in the compute_partials or linearize methods though.
Choosing the correct mode can make a big difference in performance, and in most use cases 'auto' will figure out the correct mode based on the number of design variables and the number of constraints + objective.
Problems with many constraints and few design variables (many rows, few columns in the total jacobian) will usually be much faster in forward mode.
Those with few constraints but many design variables (few rows, many columns) will be significantly faster in reverse.
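As a minimal sketch (the model itself is omitted here), the mode is simply an argument to setup:

import openmdao.api as om

prob = om.Problem()
# ... add components, design variables, objective, and constraints here ...

# Few design variables, many constraints  -> 'fwd' is usually faster.
# Many design variables, few constraints  -> 'rev' is usually faster.
# 'auto' (the default) picks based on the relative sizes.
prob.setup(mode='auto')
prob.run_driver()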
Advanced Bidirectional Derivatives
OpenMDAO will also attempt to figure out the fastest way to compute totals by "coloring" the total jacobian. The finite-difference analogy is that you can sometimes perturb multiple design variables simultaneously to speed up the calculation of the jacobian, provided those variables don't contribute to any of the same constraints/objective.
Traditionally, if a total jacobian has a dense column, it cannot be efficiently colored in reverse mode (multiple constraints "impact" the same design variable).
Similarly, a dense row would kill the efficiency of coloring the jacobian in forward mode.
However, OpenMDAO can figure out which derivatives can be efficiently colored in forward mode, and which can be colored in reverse mode, using both approaches to fill in the total jacobian.
You can read more about this capability here: http://openmdao.org/twodocs/versions/3.0.0/features/core_features/working_with_derivatives/simul_derivs.html
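In current OpenMDAO versions this dynamic total coloring is requested on the driver; a hedged sketch (check the docs for your exact version):

# Assumes prob.driver and the model are already configured.
prob.driver.declare_coloring()  # let OpenMDAO color the total jacobian automatically
prob.setup(mode='auto')
prob.run_driver()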

Understanding the complex-step in a physical sense

I think I understand what complex step is doing numerically/algorithmically.
But the questions still linger. First two questions might have the same answer.
1- I replaced the partial derivative calculations of the 'Betz_limit' example with complex step and removed the analytical gradients. Looking at the recorded design_var evolution, none of the values are complex. Aren't they supposed to show up as a+bi somehow?
Or does it always step in the real space?
2- Trying to picture 'cs' in a physical context: for example, a design variable of beam length (m), an objective of mass (kg) and a constraint of loads (Nm). I could be using an explicit component to calculate these (pure Python) or an external code component (pure Fortran). Numerically they can all handle complex numbers, but obviously the mass is a real value. So when we say the code is capable of handling complex numbers, is it just an issue of handling a+bi (where the actual mass is always 'a' and b is always equal to 0)?
3- How about the step size? I understand there won't be any subtractive cancellation errors, but what if I have a design variable normalized/scaled to 1 with a range of 0.8 to 1.2? Decreasing the step to 1e-10 does not seem to make sense. I am a bit confused there.
The ability to use complex arithmetic to compute derivative approximations is based on the mathematics of complex arithmetic.
You should read about the theory to get a better understanding of why it works and how the step size issue is resolved with complex-step vs finite-difference.
There is no physical interpretation that you can make for the complex-step method. You are simply taking advantage of the mathematical properties of complex arithmetic to approximate a derivative in a more accurate manner than FD can. So the key is that your code is set up to do complex-arithmetic correctly.
Sometimes, engineering analyses do actually leverage complex numbers. One aerospace example of this is the Joukowski transformation. In electrical engineering, complex numbers come up all the time in load-flow analysis of AC circuits. If you have such an analysis, then you cannot easily use complex-step to approximate derivatives, since the analysis itself is already complex. In these cases it is technically possible to use a more general class of numbers called hyper-dual numbers, but this is not supported in OpenMDAO.
Also, occasionally there are implementations of methods that are not complex-step safe, which will prevent you from using it unless you define a new complex-step-safe version. The simplest example of this is the np.absolute() method in the numpy library for Python. When passed a complex number, it returns the absolute magnitude of the number:
abs(a+bj) = sqrt(a^2 + b^2), e.g. abs(1+1j) = sqrt(1^2 + 1^2) = 1.4142
While not mathematically incorrect, this implementation would mess up the complex-step derivative approximation.
Instead you need an alternate version that gives:
abs(a+bj) = abs(a) + abs(b)*j
So in summary, you need to watch out for these kinds of functions that are not implemented correctly for use with complex-step. If you have those functions, you need to use alternate complex-step safe versions of them. Also, if your analysis itself uses complex numbers then you can not use complex-step derivative approximations either.
With regard to your step size question, again I refer you to the paper for greater detail. The basic idea is that without subtractive cancellation you are free to use a very small step size with complex-step without fear of lost accuracy due to numerical issues. So typically you will use a step of 1e-20 or smaller. Since the complex-step truncation error scales with step^2, using such a small step gives effectively exact results. You need not worry about scaling issues in most cases; just take a small enough step.
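A minimal sketch of the complex-step formula df/dx ≈ Im(f(x + ih))/h, showing both the tiny step size and the np.abs pitfall described above (the test functions are made up for illustration):

import numpy as np

def cs_derivative(f, x, h=1e-20):
    # Approximate df/dx at x with a complex step of size h.
    return np.imag(f(x + 1j * h)) / h

# Works: the analytic derivative of x**3 at x = 2.0 is 12.0.
print(cs_derivative(lambda x: x**3, 2.0))   # ~12.0, accurate to machine precision

# Breaks: np.abs returns the real magnitude, so the imaginary part
# (and with it all of the derivative information) is thrown away.
print(cs_derivative(np.abs, -2.0))          # prints 0.0 instead of -1.0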

OpenMDAO: When is it needed to define the partial derivative?

I've noticed that defining unnecessary partial derivatives can significantly slow down the optimizer. Therefore I'm trying to understand: how can I know whether I should define the partial derivative for a certain input/output relationship?
When you say "unnecessary" do you mean partial derivatives that are always zero?
Using declare_partials('*', '*') when a component is really sparser than that will significantly slow down your model. Anywhere a partial derivative is always zero, you should simply not declare it.
Furthermore, if you have a vectorized operation, then your Jacobian is actually a diagonal matrix. In that case, you should declare a sparse partial derivative by giving rows and cols arguments to the declare_partials call. This will often substantially speed up your code.
Technically speaking, if you follow the data path from all of your design variables, through each component, to the objective and constraints, then any variable along that path needs its partials defined. But practically speaking you should declare and specify all the partials for every output w.r.t. every input (unless they are zero), so that changes to model connectivity don't break your derivatives.
It takes a little bit more time to declare your partials more sparsely, but the performance speed-up is well worth it.
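A minimal sketch of such a vectorized component with a diagonal Jacobian declared via rows and cols (the element-wise y = x**2 calculation is a placeholder):

import numpy as np
import openmdao.api as om

class Square(om.ExplicitComponent):
    # Element-wise y = x**2, so the Jacobian is diagonal.
    def setup(self):
        n = 10
        self.add_input('x', np.ones(n))
        self.add_output('y', np.ones(n))
        # Declare only the diagonal instead of a dense n x n block.
        ar = np.arange(n)
        self.declare_partials('y', 'x', rows=ar, cols=ar)

    def compute(self, inputs, outputs):
        outputs['y'] = inputs['x'] ** 2

    def compute_partials(self, inputs, partials):
        partials['y', 'x'] = 2.0 * inputs['x']

prob = om.Problem()
prob.model.add_subsystem('sq', Square())
prob.setup()
prob.run_model()
prob.check_partials(compact_print=True)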
I think they need to be defined if they are ever relevant to a response (constraint or objective) in the optimization, or as part of a nonlinear solve within a group. My personal practice is to always define them. Should I ever change my optimization problem, which I do often, I don't want to have to go back and make sure I'm always defining the appropriate derivatives.
The master-branch of OpenMDAO contains some jacobian-coloring techniques which can significantly improve performance if your problem is particularly sparse in nature. This method is enabled by setting the following options on the driver:
p.driver.options['dynamic_simul_derivs'] = True
p.driver.options['dynamic_simul_derivs_repeats'] = 5
This method works by filling in the user-described sparsity pattern (specified with rows and cols in declare_partials) with random numbers and computing the total jacobian. The repeat option is there to improve confidence in the results, since it's possible, though unlikely, that a single pass will produce an "incidental zero" in the jacobian that is not truly part of the sparsity structure.
With this technique, and by doing things like vectorizing your calculations instead of using nested for loops, I've been able to get very good performance in a lot of situations. Of course, the effectiveness of these methods will vary from model to model.

A framework for comparing the time performance of Expectation Maximization

I have my own implementation of the Expectation Maximization (EM) algorithm based on this paper, and I would like to compare it with the performance of another implementation. For the tests, I am using k centroids with 1 GB of text data, and I am just measuring the time it takes to compute the new centroids in one iteration. I tried it with an EM implementation in R, but I couldn't, since the result is plotted in a graph and it gets stuck when there's a large amount of text data. I was following the examples in here.
Does anybody know of an implementation of EM that can measure its performance or know how to do it with R?
Fair benchmarking of EM is hard. Very hard.
The initialization usually involves randomness and can be very different between implementations. For all I know, the R implementation by default uses hierarchical clustering to find the initial clusters, which comes at O(n^2) memory and most likely O(n^3) runtime cost. In my benchmarks, R would run out of memory because of this. I assume there is a way to specify initial cluster centers/models. A random-objects initialization will of course be much faster. k-means++ is probably a good way to choose initial centers in practice.
EM theoretically never terminates. It just at some point does not change much anymore, and thus you can set a threshold to stop. However, the exact definition of the stopping threshold varies.
There exist all kinds of model variations. A method using only fuzzy assignments, such as fuzzy c-means, will of course be much faster than an implementation using multivariate Gaussian mixture models with a covariance matrix, in particular at higher dimensionality.
Covariance matrices also need O(k * d^2) memory, and the inversion takes O(k * d^3) time, so they are clearly not appropriate for text data.
Data may or may not be appropriate. If you run EM on a data set that actually has Gaussian clusters, it will usually work much better than on a data set that doesn't provide a good fit at all. When there is no good fit, you will see a high variance in runtime even with the same implementation.
For a starter, try running your own algorithm several times with different initialization, and check your runtime for variance. How large is the variance compared to the total runtime?
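As a concrete (hypothetical) way to do that check in Python, using scikit-learn's GaussianMixture as the EM implementation and synthetic data standing in for your text data:

import time
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))   # stand-in for your real data

runtimes = []
for seed in range(5):
    gm = GaussianMixture(n_components=8, init_params='random',
                         random_state=seed, max_iter=100)
    t0 = time.perf_counter()
    gm.fit(X)                        # one full EM run with a different random init
    runtimes.append(time.perf_counter() - t0)

print('mean %.2f s, std %.2f s' % (np.mean(runtimes), np.std(runtimes)))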
You can try benchmarking against the EM implementation in ELKI. But I doubt the implementation will work well with sparse data such as text: that data just is not Gaussian, so it is not proper to benchmark with it. Most likely it will not be able to process the data at all because of this; that is expected and can be explained from theory. Try to find data sets that are dense and that can be expected to have multiple Gaussian clusters (sorry, I can't give you many recommendations here; the classic Iris and Old Faithful data sets are too small to be useful for benchmarking).

Fitness function and selection for a Genetic Algorithm

I'm trying to design a nonlinear fitness function where I maximize variable A and minimize variable B. The issue is that maximizing A is much more important at single-digit values, almost logarithmic. B needs to be minimized and, in contrast to A, becomes less important when small (less than one) and more important when it's larger (>1), so exponential decay.
The main goal is to optimize A, so I guess an analog is A = profits, B = costs.
Should I aim to keep everything positive so that I can use roulette wheel selection, or would it be better to use a rank/tournament kind of system? The purpose of my algorithm is shape optimization.
Thanks
When considering a multi-objective problem the goal is usually to identify all solutions that lie on the Pareto curve - the Pareto optimal set. Have a look here for a 2-dimensional visual example. When the algorithm completes you want a set of solutions that are not dominated by any other solution. You therefore need to define a pareto ranking mechanism to take into account both objectives - for a more in depth explanation, as well as links to even more reading, go here
With this in mind, in order to effectively explore all solutions along the Pareto front you do not want an implementation that encourages premature convergence, otherwise your algorithm will only explore the search space in one specific area of the Pareto curve. I would implement a selection operator that keeps all members of each iteration's optimal set of solutions, that is, all solutions which are not dominated by another, plus a parameter-controlled percentage of other solutions. This way you encourage exploration all along the Pareto curve.
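As an illustrative sketch (not tied to any particular GA library), a non-dominated filter for the two objectives described above, assuming we maximize A and minimize B:

def dominates(p, q):
    # p dominates q if p is at least as good in both objectives
    # (maximize A, minimize B) and strictly better in at least one.
    a_p, b_p = p
    a_q, b_q = q
    return a_p >= a_q and b_p <= b_q and (a_p > a_q or b_p < b_q)

def non_dominated(population):
    # Return the members of population (a list of (A, B) tuples)
    # that are not dominated by any other member.
    return [p for i, p in enumerate(population)
            if not any(dominates(q, p) for j, q in enumerate(population) if j != i)]

# Example: keep the Pareto-optimal set of a small made-up population.
pop = [(10.0, 0.5), (8.0, 0.2), (10.0, 0.9), (3.0, 2.0)]
print(non_dominated(pop))   # [(10.0, 0.5), (8.0, 0.2)]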
You also need to ensure your mutation and crossover operators are tuned correctly. With any novel application of Evolutionary Algorithms, part of the problem is trying to identify an optimal parameter set for the problem domain... this is where it gets really interesting!
The description is very vague, but assuming that you actually have an idea of what the function should look like and you're just wondering whether you need to modify it so that proportional selection can be used easily, then no. Regardless of fitness function, you should probably default to using something like tournament selection. Controlling selection pressure is one of the most important things you have to do in order to get consistently good results, and roulette wheel selection doesn't allow you that control. You typically get enormous pressure very early, which drives premature convergence. That might be preferable in a few cases, but it's not where I'd start my investigations.
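A minimal sketch of tournament selection (the population representation and the fitness function are placeholders):

import random

def tournament_select(population, fitness, k=3):
    # Pick k random individuals and return the fittest.
    # Larger k means higher selection pressure.
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

# Example usage with a made-up real-valued population and fitness.
population = [random.uniform(-5.0, 5.0) for _ in range(50)]
parent = tournament_select(population, fitness=lambda x: -x**2, k=3)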
