
Make input variables inactive when computing the jacobian
We are setting up an aeroelastic optimization framework for wind turbine optimization, and we are facing issues with defining inputs and outputs for the components.
The issue is that a solver may have many inputs and outputs (example further down), but they are not necessarily all active for every optimization case. This leads to the problem that we need to compute partials for all combinations of inputs and outputs, even though we might only use a single input and output. Is it possible to tell the component which inputs and outputs are active design variables?
Example:
An aerodynamic wind turbine rotor solver (ExplicitComponent).
Inputs
Chord (c, distributed along the blade span - 1D array)
Twist (t, distributed along the blade span - 1D array)
Outputs
Power (P, scalar)
Lift coefficient (Cl, distributed along the blade span - 1D array)
For the solver above we have both forward and reverse AD gradients. Below are two optimization problems, where the first does not lead to computational overhead but the second does.
Optimization problem 1
Maximize power while constraining the lift-coefficient to 1
max P w.r.t. c, t
subject to Cl <= 1
All inputs and outputs are active as design variables and objectives/constraints.
Optimization problem 2
Maximize power
max P w.r.t. c, t
If we use the same OpenMDAO component, the Cl output is still there, and OpenMDAO would therefore compute the gradient for it as well. That is computationally expensive: all the gradients we need are already given by a single reverse-AD pass for P, yet the gradients for Cl would still be computed. Is there a way to sidestep that behavior, e.g. by making the output inactive?
We have tried to make the inputs and outputs of the component dynamic, but that quickly leads to code that is difficult to read, and for nested components it is difficult to maintain. Moreover, which variables are active is really something you define for the problem, not the component.

You mention that you are using AD, but not which of the derivative APIs you are using. From the context of your question it sounds like you're using the compute_partials API. That means you're likely asking the AD system to compute all the partials you need and then passing them to OpenMDAO.
Assuming that I have guessed right, there is one possible way to speed things up a bit here and get the effect you are looking for without explicitly turning I/O on and off, and AD-based partials are particularly well suited to this approach.
The matrix-free derivative APIs in OpenMDAO are designed to give you the exact behavior you want automatically. For ExplicitComponent, the method is called compute_jacvec_product. In the example from the OpenMDAO docs this is implemented manually, but it ties in with an AD system very easily. For example, the JAX AD library has JVP and VJP functions that can be used in the fwd and rev modes of the OpenMDAO matrix-free APIs, respectively.
When using these matrix-free APIs, OpenMDAO will only call your AD system the minimum number of times. The exact number depends on whether OpenMDAO selects fwd or rev mode (or what you hard-code in setup), and then also on the number of design variables and constraints you have.
In your case, I would guess you'd end up using reverse mode. Then, when you don't have the Cl constraint, you wouldn't get the extra calls to the AD library.
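To make that concrete, here is a minimal sketch of what this could look like with JAX, using a toy rotor function as a stand-in for your real solver (the function, shapes, and names here are assumptions, not your actual model):

```python
import numpy as np
import jax
import jax.numpy as jnp
import openmdao.api as om

jax.config.update("jax_enable_x64", True)  # match OpenMDAO's float64 arrays

def rotor(c, t):
    """Toy stand-in for the aerodynamic solver (not real physics)."""
    P = jnp.sum(c * t)   # scalar "power"
    Cl = c + 0.1 * t     # distributed "lift coefficient"
    return P, Cl

class Rotor(om.ExplicitComponent):
    def setup(self):
        n = 10
        self.add_input('c', shape=n)
        self.add_input('t', shape=n)
        self.add_output('P')
        self.add_output('Cl', shape=n)

    def compute(self, inputs, outputs):
        P, Cl = rotor(inputs['c'], inputs['t'])
        outputs['P'] = np.asarray(P)
        outputs['Cl'] = np.asarray(Cl)

    def compute_jacvec_product(self, inputs, d_inputs, d_outputs, mode):
        c, t = inputs['c'], inputs['t']
        if mode == 'fwd':
            # One JVP pass gives directional derivatives of all outputs.
            dc = d_inputs['c'] if 'c' in d_inputs else np.zeros_like(c)
            dt = d_inputs['t'] if 't' in d_inputs else np.zeros_like(t)
            _, (dP, dCl) = jax.jvp(rotor, (c, t), (dc, dt))
            if 'P' in d_outputs:
                d_outputs['P'] += np.asarray(dP)
            if 'Cl' in d_outputs:
                d_outputs['Cl'] += np.asarray(dCl)
        else:  # mode == 'rev'
            # OpenMDAO seeds only the outputs that are active in the current
            # problem, so an unconstrained Cl adds no extra VJP work.
            _, vjp_fun = jax.vjp(rotor, c, t)
            P_bar = jnp.asarray(d_outputs['P'][0]) if 'P' in d_outputs else jnp.asarray(0.0)
            Cl_bar = jnp.asarray(d_outputs['Cl']) if 'Cl' in d_outputs else jnp.zeros_like(c)
            c_bar, t_bar = vjp_fun((P_bar, Cl_bar))
            if 'c' in d_inputs:
                d_inputs['c'] += np.asarray(c_bar)
            if 't' in d_inputs:
                d_inputs['t'] += np.asarray(t_bar)
```

In rev mode with only P as the objective, OpenMDAO seeds only the P output, so the whole total derivative comes from a single VJP call; that is effectively the "inactive output" behavior you are after, with no dynamic I/O needed.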
There are a few additional caveats for the matrix-free APIs when using implicit components that I didn't cover here. Your question specifically noted ExplicitComponent, so I'm not sure they are relevant. But I wanted to note that if you graduate to implicit components, then you have to worry about the solve_linear method along with apply_linear (the implicit analogue of the explicit compute_jacvec_product method).

Related

Can I use automatic differentiation for non-differentiable functions?

I am testing the performance of different solvers on minimizing an objective function derived from the simulated method of moments. Given that my objective function is not differentiable, I wonder if automatic differentiation would work in this case. I tried my best to read some introductions to this method, but I couldn't figure it out.
I am actually trying to use Ipopt+JuMP in Julia for this test. Previously, I tested it using BlackBoxOptim in Julia. I would also appreciate any insights on the optimization of non-differentiable functions in Julia.
It seems that I was not clear on "non-differentiable". Let me give you an example. Consider the following objective function: X is the dataset, B are unobserved random errors which will be integrated out, and \theta are the parameters. However, A is discrete and therefore not differentiable.
I'm not exactly an expert on optimization, but: it depends on what you mean by "nondifferentiable".
For many mathematical functions that are used, "nondifferentiable" will just mean "not everywhere differentiable" -- but that's still "differentiable almost everywhere, except on countably many points" (e.g., abs, relu). These functions are not a problem at all -- you can just choose any subgradient and apply any normal gradient method. That's what basically all AD systems for machine learning do. The points where the subgradient is not unique are hit with low probability anyway. An alternative for certain forms of convex objectives are proximal gradient methods, which "smooth" the objective in an efficient way that preserves optima (cf. ProximalOperators.jl).
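As a quick illustration of the "just pick a subgradient" behavior (using JAX here purely as an example AD system):

```python
import jax
import jax.numpy as jnp

# abs and relu are not differentiable at 0, but AD simply returns one
# valid subgradient there, and gradient methods proceed as usual.
print(jax.grad(jnp.abs)(0.0))       # some value in [-1, 1]
print(jax.grad(jax.nn.relu)(0.0))   # some value in [0, 1]
```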
Then there are functions that seem like they can't be differentiated at all, since they appear "combinatoric" or discrete, but are in fact piecewise differentiable (if seen from the correct point of view). This includes sorting and ranking. But you have to find them, and describing and implementing the derivative is rather complicated. Whether such functions are supported by an AD system depends on how sophisticated its "standard library" is. Some variants of this, like "permute", can just fall out of AD over control structures, while more complex ones require the primitive adjoints to be manually defined.
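Sorting is a handy example: away from ties it is locally just a permutation, so its Jacobian exists almost everywhere (again using JAX only for illustration):

```python
import jax
import jax.numpy as jnp

x = jnp.array([3.0, 1.0, 2.0])
# Away from ties, sort is piecewise linear, so its Jacobian is simply
# the permutation matrix that sorts x.
print(jax.jacobian(jnp.sort)(x))
```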
For certain kinds of problems, though, we just work in an intrinsically discrete space -- like integer parameters of some probability distributions. In these cases, differentiation makes no sense, and hence AD libraries define their primitives not to work on these parameters. Possible alternatives are (mixed) integer programming, approximations, search, and model selection. This case also occurs for problems where the optimized space itself depends on the parameter in question, like the second argument of fill. We also have things like the ℓ0 "norm" or the rank of a matrix, for which there exist well-known continuous relaxations, but that's outside the scope of AD.
(In the specific case of MCMC for discrete or dimensional parameters, there are other ways to deal with that, like combining HMC with other MC methods in a Gibbs sampler, or using a nonparametric model instead. Other tricks are possible for VI.)
That being said, you will rarely encounter complicated nowhere-differentiable continuous functions in optimization. They are complicated even to describe, and are unlikely to arise in the kind of math we use for modelling.

Possible issue in presentation of N² diagram with indexed inputs/outputs?

When visualizing the structure of the circuit tutorial via an N² diagram, I noticed that implicit components with indexed inputs/outputs labelled using the pattern x:y (e.g. I_out:0 of n1) do not display connections into the output of the block (in this case V of n1).
I understand that it is computing the residuals with the inputs and some initial "guess" to produce the output, so is this by design for ImplicitComponent because the connections are implicit? I tend to use the diagrams for debugging, and seeing no connections into the output makes it unclear whether it's connected, even though the inputs are fed into it and the code processes them via the residual equation correctly.
This is a known bug in OpenMDAO 2.9.1, but it has already been fixed on the OpenMDAO master branch. So the next release (2.10), due out before the end of February 2020, should have the issue fixed.

Understanding the complex-step in a physical sense

I think I understand what complex step is doing numerically/algorithmically.
But some questions still linger. The first two questions might have the same answer.
1- I replaced the partial derivative calculations of the 'Betz_limit' example with complex step and removed the analytical gradients. Looking at the recorded design_var evolution, none of the values are complex. Aren't they supposed to be shown somehow as a+bi? Or does it always step in the real space?
2- Trying to picture 'cs' in a physical context: for example, a design variable of beam length (m), an objective of mass (kg), and a constraint of loads (Nm). I could be using an explicit component to calculate these (pure Python) or an external code component (pure Fortran). Numerically they can all handle complex numbers, but obviously the mass is a real value. So when we say "capable of handling complex numbers", is it just a matter of handling a+bi, where the actual mass is always 'a' and b is always equal to 0?
3- How about the step size? I understand there won't be any subtractive cancellation errors, but what if I have a design variable normalized/scaled to 1 and a range of 0.8 to 1.2? Decreasing the step to 1e-10 does not seem to make sense. I am a bit confused there.
The ability to use complex arithmetic to compute derivative approximations is based on the mathematics of complex arithmetic.
You should read about the theory to get a better understanding of why it works and how the step size issue is resolved with complex-step vs finite-difference.
There is no physical interpretation that you can make for the complex-step method. You are simply taking advantage of the mathematical properties of complex arithmetic to approximate a derivative in a more accurate manner than FD can. So the key is that your code is set up to do complex-arithmetic correctly.
Sometimes, engineering analyses do actually leverage complex numbers. One aerospace example of this is the Joukowski transformation. In electrical engineering, complex numbers come up all the time in load-flow analysis of AC circuits. If you have such an analysis, then you cannot easily use complex-step to approximate derivatives, since the analysis itself is already complex. In these cases it is technically possible to use a more general class of numbers called hyperdual numbers, but this is not supported in OpenMDAO; with an analysis like this you could not use complex-step.
Also, occasionally there are implementations of methods that are not complex-step safe, which will prevent you from using it unless you define a new complex-step-safe version. The simplest example is the np.absolute() function in the NumPy library for Python. When passed a complex number, its implementation returns the absolute magnitude (modulus) of the number:
abs(a + bj) = sqrt(a^2 + b^2), e.g. abs(1 + 1j) = sqrt(1^2 + 1^2) = 1.4142
While not mathematically incorrect, this implementation would mess up the complex-step derivative approximation.
Instead you need an alternate version that gives:
abs(a + bj) = abs(a) + b*sign(a)*j
So in summary, you need to watch out for these kinds of functions that are not implemented correctly for use with complex-step. If you have those functions, you need to use alternate, complex-step-safe versions of them. Also, if your analysis itself uses complex numbers, then you cannot use complex-step derivative approximations either.
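As a sketch of what such a safe version can look like in NumPy (note that OpenMDAO also ships complex-step-safe helpers, e.g. in openmdao.utils.cs_safe, which is worth checking before rolling your own):

```python
import numpy as np

def cs_safe_abs(x):
    """Absolute value that preserves complex-step derivative information.

    The sign flip is decided by the real part only, so the imaginary
    (derivative-carrying) part is propagated with the correct sign.
    """
    x = np.asarray(x)
    if np.iscomplexobj(x):
        return np.where(x.real < 0, -x, x)
    return np.abs(x)
```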
With regard to your step-size question, again I refer you to this paper for greater detail. The basic idea is that, without subtractive cancellation, you are free to use a very small step size with complex-step without fear of lost accuracy due to numerical issues. So typically you will use a step of 1e-20 or smaller. Since complex-step accuracy scales with the order of step^2, using such a small step gives effectively exact results. In most cases you need not worry about scaling issues if you just take a small enough step.
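A minimal illustration of why the tiny step is safe, using np.sin as a stand-in for any complex-step-safe function:

```python
import numpy as np

def cs_derivative(f, x, h=1e-20):
    """Complex-step approximation: f'(x) ~= Im(f(x + ih)) / h."""
    return np.imag(f(x + 1j * h)) / h

# No subtraction, hence no cancellation: the result matches the
# analytic derivative to machine precision despite h = 1e-20.
print(cs_derivative(np.sin, 1.0), np.cos(1.0))
```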

OpenMDAO efficiency when using multiple components

I recently read this sentence in a paper:
One important feature of OpenMDAO is the ability to subdivide a problem into components that have a small number of inputs and outputs and contain relatively simple analyses.
Moreover, looking at the examples in the manual, there are only a small number of inputs and outputs for each component.
Would that mean it is more efficient to use an ExecComp that takes two inputs from an ExplicitComponent and outputs a constraint, instead of doing everything within the ExplicitComponent? I'll try to give an example here:
x1, x2 --> ExplicitComp --> y1
y1 --> ExecComp --> constraint
OR
x1, x2 --> ExplicitComp --> y1, constraint
What the comment in that paper is referring to is not computational efficiency, but rather the benefit to the user in terms of making models more modular and maintainable. Additionally, when you have smaller components with fewer inputs, it is much easier to compute analytic derivatives for them.
The idea is that by breaking your calculation up into smaller steps, the partial derivatives are then simpler for you to compute by hand. OpenMDAO will then compute the total derivatives across the model for you.
So in a sense, you're leaning on OpenMDAO's ability to compute derivatives across large models to lessen your work load.
From a computational-cost perspective, there is some cost associated with having more components vs. fewer. Taken to the extreme, if you had one component for each line of code in a huge calculation, then the framework overhead could become a problem. There are some features in OpenMDAO that can help mitigate some of this cost, specifically the in-memory assembly of Jacobians for serial models.
With regard to the ExecComp specifically, that component is meant for simple and inexpensive calculations. It computes its derivatives using complex step, which can be costly if large array inputs are involved. It's there to make simple steps like adding variables easier; you shouldn't use it for expensive calculations.
In your specific case, I would suggest that you consider whether it is hard to propagate the derivatives from x1, x2 through to the constraint yourself. If the chain rule is not hard to handle, I would probably just lump it all into one calculation. If for some reason the derivatives are nasty when you combine all the calculations, then split them up.
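For concreteness, here is a minimal sketch of the split version, with a toy component standing in for the ExplicitComp from the question (names and math are placeholders):

```python
import openmdao.api as om

class Comp(om.ExplicitComponent):
    """Toy stand-in: y1 = x1 * x2, with simple hand-coded partials."""
    def setup(self):
        self.add_input('x1')
        self.add_input('x2')
        self.add_output('y1')
        self.declare_partials('y1', ['x1', 'x2'])

    def compute(self, inputs, outputs):
        outputs['y1'] = inputs['x1'] * inputs['x2']

    def compute_partials(self, inputs, partials):
        partials['y1', 'x1'] = inputs['x2']
        partials['y1', 'x2'] = inputs['x1']

prob = om.Problem()
prob.model.add_subsystem('comp', Comp(), promotes=['*'])
# The cheap constraint lives in its own ExecComp; OpenMDAO applies the
# chain rule across the two components to get the total derivatives.
prob.model.add_subsystem('con', om.ExecComp('g = y1 - 1.0'), promotes=['*'])
prob.setup()
prob.run_model()
```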

fitness function and Selection for a Genetic Algorithm

I'm trying to design a nonlinear fitness function where I maximize variable A and minimize variable B. The issue is that maximizing A is much more important at single-digit values, almost logarithmic. B needs to be minimized and, in contrast to A, becomes less important when small (less than one) and more important when larger (>1), so exponential decay.
The main goal is to optimize A, so I guess an analog is A=profits, B=costs
Should I aim to keep everything positive so that I can use roulette-wheel selection, or would it be better to use a rank/tournament kind of system? The purpose of my algorithm is shape optimization.
Thanks
When considering a multi-objective problem, the goal is usually to identify all solutions that lie on the Pareto curve - the Pareto-optimal set. Have a look here for a 2-dimensional visual example. When the algorithm completes, you want a set of solutions that are not dominated by any other solution. You therefore need to define a Pareto ranking mechanism that takes both objectives into account - for a more in-depth explanation, as well as links to further reading, go here.
With this in mind, in order to effectively explore all solutions along the Pareto front, you do not want an implementation that encourages premature convergence, otherwise your algorithm will only explore the search space in one specific area of the Pareto curve. I would implement a selection operator that keeps all members of each iteration's optimal set of solutions, that is, all solutions which are not dominated by another, plus a parameter-controlled percentage of other solutions. This way you encourage exploration all along the Pareto curve.
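A minimal sketch of the dominance test and nondominated filter, assuming A is maximized and B minimized as in the question (the function names are my own):

```python
def dominates(p, q):
    """True if p Pareto-dominates q, where each solution is an (A, B)
    pair with A maximized and B minimized."""
    (A1, B1), (A2, B2) = p, q
    return A1 >= A2 and B1 <= B2 and (A1 > A2 or B1 < B2)

def nondominated(solutions):
    """The current Pareto-optimal set: solutions nothing else dominates."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Example with (A, B) pairs, e.g. (profit, cost):
front = nondominated([(5, 2), (3, 1), (4, 3)])
print(front)  # [(5, 2), (3, 1)] -- (4, 3) is dominated by (5, 2)
```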
You also need to ensure your mutation and crossover operators are tuned correctly too. With any novel application of Evolutionary Algorithms, part of the problem is trying to identify an optimal parameter set for the problem domain... this is where it gets really interesting!!
The description is very vague, but assuming that you actually have an idea of what the function should look like and you're just wondering whether you need to modify it so that proportional selection can be used easily, then no. Regardless of the fitness function, you should probably default to something like tournament selection. Controlling selection pressure is one of the most important things you have to do in order to get consistently good results, and roulette-wheel selection doesn't give you that control. You typically get enormous pressure very early, which drives premature convergence. That might be preferable in a few cases, but it's not where I'd start my investigations.
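For illustration, a minimal sketch of tournament selection; the tournament size k is the knob that controls selection pressure (larger k means more pressure):

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k random individuals and return the fittest of them.

    Unlike roulette-wheel selection, the pressure depends only on k,
    not on the scale of the fitness values.
    """
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

# Example usage with a toy population and identity fitness
pop = [0.1, 0.5, 0.9, 0.3, 0.7]
winner = tournament_select(pop, fitness=lambda x: x, k=3)
```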
