Constraint implementation for the Ipopt sovler with a non-contiguous target range - constraints

In a project, Ipopt is used to solve a problem. I am wondering if I can add a new constraint to the problem, but I am not very deep into the topic.
Basically, a variable should be greater than a certain value, or it can be zero. The latter could be a problem since the target range is not contiguous.
I was thinking of using binary variables, but as far as I can see, the Ipopt solver does not support them. Is there any way to implement my condition?

While it would be possible to formulate this condition with a nonconvex nonlinear constraint, I would also suggest that you look for a solver for MINLP and use a binary or semicontinuous variable.
Sometimes you have to change the tool when your problem changes.

Related

Should linear solver be converging when Coloring is beign computed

I had to add some circular dependencies to my model and thus adding NonlinearBlockGS and LinearBlockGS to the Group with the circular dependency. I get messages like this
LN: LNBGSSolver 'LN: LNBGS' on system 'XXX' failed to converge in 10
iterations.
in the phase where it's finding the Coloring of the problem. There is a Dymos trajectory as part of the problem, but the circular dependency is not in the Trajectory group, it's upstream. It however converges very easily when actually solving the problem. The number of FWD solves is the same as it was before-- everything seem to work fine. Should I be worried about anything?
the way our total derivative coloring works is that we replace partial derivatives with random numbers and then solve the linear system. So the linear solver should be converging. Now, whether or not it should converge with LNBGS in 10 iterations... probably not.
Its hard to speak diffinitively when putting random numbers into a matrix to invert it... but generally speaking it should remain invertible (though we can't promise). That does not mean that it will remain easily invertible. How close does the linear residual get during the coloring? it is decreasing, but slowly. Would more iteration let it get there?
If your problem is working well, I don't think you need to freak out about this. If you would like it to converge better, it won't hurt anything and might give you better coloring. You can increase the iprint of that solver to get more information on the convergence history.
Another option, if your system is small enough, is to try using the DirectSolver instead of LNBGS. For most models with less than 10,000 variables in them a DirectSolver will be overall faster than the LNBGS. There is a nice symetry to using LNBGS with NLGBS ... but while the nonlinear solver tends to be a good choice (i.e. fast and stable) for cyclic dependencies the same can't be said for its linear counter part.
So my go-to combination if NLBGS and DirectSolver. You can't always use the DirectSolver. If you have distributed components in your model, or components that use the matrix-free derivative APIs (apply_linear, compute_jacvec_product), then LNBGS is a good option. But if everything is explicit components with compute_partials or implicit components that provide partials in the linearize method then I suggest using the DirectSolver as your first option.
I think you may have discovered a coloring performance issue in OpenMDAO. When we compute coloring, internally we replace the component partials with random arrays matching the declared sparsity. Since we're not trying to find an actual solution when we compute coloring, we probably don't need to iterate more than once in any given system. And we shouldn't be generating convergence warnings when computing the coloring. I don't think you need to be worried in this case. I'll put a story in our bug tracker to look into this.

Solver selection : NonlinearBlockGS vs NewtonSolver

I have two openmdao groups with cyclic dependency between the groups. I calculate the derivatives using Complex step. I have a non-linear solver for the dependency and use SLSQP to optimize my objective function. The issue is with the choice of the non-linear solver. When I use NonlinearBlockGS the optimization is successful in 12 iterations. But when I use NewtonSolver with Directsolver or ScipyKrylov the optimization fails (Iteration limit exceeded), even with maxiter=2000. The cyclic connections converge, but it is just that the design variables does not reach the optimal values. The difference between the design variables in consecutive iterations is in the order 1e-5. And this increases the iterations needed. Also when I change the initial guess to a value closer to the optimal value it works.
To check further, I converted the model into IDF (by creating copies of coupling variables and consistency constraints) thereby removing the need for a solver. Now the optimization is successful in 5 iterations and the results are similar to the results when NonlinearBlockGS is used.
Why does this happen? Am I missing something? When should I use NewtonSolver over others? I know that it is difficult to answer without seeing the code. But it is just that my code is long with multiple components and I couldn't recreate the issue with a toy model. So any general insight is much appreciated.
Without seeing the code, you're right that its hard to give specifics.
Very broadly speaking, Newton can sometimes have a lot more trouble converging than NLBGS (Note: this is not absolutely true, but is a good rule of thumb). So what I would guess is happening is that on your first or second iteration, the newton solver isn't actually converging. You can check this by setting newton.options['iprint']=2 and looking at the iteration history as the optimizer iterates.
When you have a solver in your optimization, its critical that you also make sure that you set it to throw an error on non-convergence. Some optimizers can handle this error, and will backtrack on the line search. Others will just die. Either way, its important. Otherwise, you end up giving the optimizer an unconverged case that it doesn't know is unconverged.
This is bad for two reasons. First, the objective and constraints values it gets are going to be wrong! Second, and perhaps more importantly, the derivatives it computes are going to be wrong! You can read the details [in the theory manual,] but in summary the analytic derivative methods that OpenMDAO uses assume that the residuals have gone to 0. If thats not the case, the math breaks down. Even if you were doing full model finite-difference, non-convergenced models are a problem. You'll just get noisy garbage when you try to FD it.2
So, assuming you have set up your model correctly, and that you have the linear solvers set up problems (it sounds like you do since it works with NLBGS), then its most likely that the newton solver isn't converging. Use iprint, possibly combined with driver debug printing, to check this for yourself. If thats the case, you need to figure out how to get newton to behave better.
There are some tips here that are pretty general. You could also try using the armijo line search, which can often stablize a newton solve at the cost of some speed.
Finally... Newton isn't the best answer in all situations. If NLBGS is more stable, and computational cheaper you should use it. I applaud your desire to get it to work with Newton. You should definitely track down why its not, but if it turns out that Newton just can't solve your coupled problem reliably thats ok too!
the set it to throw an error on non-convergence is broken on your answer. I have added the link which I think is the right one. Please correct if the linked one is not the one you were thinking to link.

Can I use automatic differentiation for non-differentiable functions?

I am testing performance of different solvers on minimizing an objective function derived from simulated method of moments. Given that my objective function is not differentiable, I wonder if automatic differentiation would work in this case? I tried my best to read some introduction on this method, but I couldn't figure it out.
I am actually trying to use Ipopt+JuMP in Julia for this test. Previously, I have tested it using BlackBoxoptim in Julia. I will also appreciate if you could provide some insights on optimization of non-differentiable functions in Julia.
It seems that I am not clear on "non-differentiable". Let me give you an example. Consider the following objective function. X is dataset, B is unobserved random errors which will be integrated out, \theta is parameters. However, A is discrete and therefore not differentiable.
I'm not exactly an expert on optimization, but: it depends on what you mean by "nondifferentiable".
For many mathematical functions that are used, "nondifferentiable" will just mean "not everywhere differentiable" -- but that's still "differentiable almost everywhere, except on countably many points" (e.g., abs, relu). These functions are not a problem at all -- you can just chose any subgradient and apply any normal gradient method. That's what basically all AD systems for machine learning do. The case for non-singular subgradients will happen with low probability anyway. An alternative for certain forms of convex objectives are proximal gradient methods, which "smooth" the objective in an efficient way that preserves optima (cf. ProximalOperators.jl).
Then there's those functions that seem like they can't be differentiated at all, since they seem "combinatoric" or discrete, but are in fact piecewise differentiable (if seen from the correct point of view). This includes sorting and ranking. But you have to find them, and describing and implementing the derivative is rather complicated. Whether such functions are supported by an AD system depends on how sophisticated its "standard library" is. Some variants of this, like "permute", can just fall out AD over control structures, while move complex ones require the primitive adjoints to be manually defined.
For certain kinds of problems, though, we just work in an intrinsically discrete space -- like, integer parameters of some probability distributions. In these case, differentiation makes no sense, and hence AD libraries define their primitives not to work on these parameters. Possible alternatives are to use (mixed) integer programming, approximations, search, and model selection. This case also occurs for problems where the optimized space itself depends on the parameter in question, like the second argument of fill. We also have things like the ℓ0 "norm" or the rank of a matrix, for which there exist well-known continuous relaxations, but that's outside of the scope of AD).
(In the specific case of MCMC for discrete or dimensional parameters, there's other ways to deal with that, like combining HMC with other MC methods in a Gibbs sampler, or using a nonparametric model instead. Other tricks are possible for VI.)
That being said, you will rarely encounter complicated nowhere differentiable continuous functions in optimization. They are already complicated to describe, are just unlikely to arise in the kind of math we use for modelling.

Understanding the complex-step in a physical sense

I think I understand what complex step is doing numerically/algorithmically.
But the questions still linger. First two questions might have the same answer.
1- I replaced the partial derivative calculations of 'Betz_limit' example with complex step and removed the analytical gradients. Looking at the recorded design_var evolution none of the values are complex? Aren't they supposed to be shown as somehow a+bi?
Or it always steps in the real space. ?
2- Tying to picture 'cs', used in a physical concept. For example a design variable of beam length (m), objective of mass (kg) and a constraint of loads (Nm). I could be using an explicit component to calculate these (pure python) or an external code component (pure fortran). Numerically they all can handle complex numbers but obviously the mass is a real value. So when we say capable of handling the complex numbers is it just an issue of handling a+bi (where actual mass is always 'a' and b is always equal to 0?)
3- How about the step size. I understand there wont be any subtractive cancellation errors but what if i have a design variable normalized/scaled to 1 and a range of 0.8 to 1.2. Decreasing the step to 1e-10 does not make sense. I am a bit confused there.
The ability to use complex arithmetic to compute derivative approximations is based on the mathematics of complex arithmetic.
You should read about the theory to get a better understanding of why it works and how the step size issue is resolved with complex-step vs finite-difference.
There is no physical interpretation that you can make for the complex-step method. You are simply taking advantage of the mathematical properties of complex arithmetic to approximate a derivative in a more accurate manner than FD can. So the key is that your code is set up to do complex-arithmetic correctly.
Sometimes, engineering analyses do actually leverage complex numbers. One aerospace example of this is the Jukowski Transformation. In electrical engineering, complex numbers come up all the time for load-flow analysis of ac circuits. If you have such an analysis, then you can not easily use complex-step to approximate derivatives since the analysis itself is already complex. In these cases, it is technically possible to use a more general class of numbers called hyper dual numbers, but this is not supported in OpenMDAO. So if you had an analysis like this you could not use complex-step.
Also, occationally there are implementations of methods that are not complex-step safe which will prevent you from using it unless you define a new complex-step safe version. The simplest example of this is the np.absolute() method in the numpy library for python. The implementation of this, when passed a complex number, will return the asolute magnitude of the number:
abs(a+bj) = sqrt(1^2 + 1^2) = 1.4142
While not mathematically incorrect, this implementation would mess up the complex-step derivative approximation.
Instead you need an alternate version that gives:
abs(a+bj) = abs(a) + abs(b)*j
So in summary, you need to watch out for these kinds of functions that are not implemented correctly for use with complex-step. If you have those functions, you need to use alternate complex-step safe versions of them. Also, if your analysis itself uses complex numbers then you can not use complex-step derivative approximations either.
With regard to your step size question, again I refer you to the this paper for greater detail. The basic idea is that without subtractive cancellation you are free to use a very small step size with complex-step without the fear of lost accuracy due to numerical issues. So typically you will use 1e-20 smaller as the step. Since complex-step accuracy scalea with the order of step^2, using such a small step gives effectively exact results. You need not worry about scaling issues in most cases, if you just take a small enough step.

Anyone know of the logic to compute the Version of a QR code needed to encode data?

The spec has 4 of these tables:
http://www.denso-wave.com/qrcode/vertable1-e.html
to handle versions 1-40
I'm wondering if anyone has coded something to formulate calculating the version needed for a string of data. None of the libraries I've seen for encoding the data offer this.
http://code.google.com/p/jsqrencode/downloads/list
Inside is genframe that finds the smallest version that a string will fit in.
It doesn't really use a formula, simply tests linearly (maybe a bsearch would be faster). There is no algorithm or equation nor is one (compact) really possible since the tables use fixed values and aren't algorithmically generated.

Resources