OpenMDAO: discrete optimization problems; how to define the set of allowed values for discrete variables?

I am trying to learn how to use OpenMDAO in order to solve discrete optimization problems. I saw that it was possible to define discrete variables (https://openmdao.org/newdocs/versions/latest/features/core_features/working_with_components/discrete_variables.html) but I cannot find where I can define the set of possible values that the optimizer is allowed to select.
Can you help me?

OpenMDAO supports using discrete variables as design variables with the optimizers that can handle them, but that support is limited to integer variables. You can only specify a lower and an upper bound, just as you would for a continuous variable; there is no way to give an explicit set of allowed values.
A relevant example can be found here, where 'xI' is a discrete variable:
https://openmdao.org/newdocs/versions/latest/features/building_blocks/drivers/genetic_algorithm.html
Note that the SimpleGADriver will also encode any continuous OpenMDAO variable as if it were an integer, if you don't set a 'bits' value for it in the driver's options.
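To make that concrete, here is a minimal sketch of a mixed problem driven by SimpleGADriver (the objective and variable names are made up for illustration, and a reasonably recent OpenMDAO is assumed). Note that the integer variable only gets a lower and an upper bound, never an explicit value set:

```python
import openmdao.api as om

prob = om.Problem()
model = prob.model

# Toy objective; 'x' will act as the integer variable, 'y' as the continuous one.
model.add_subsystem('comp',
                    om.ExecComp('f = (x - 3.2)**2 + (y - 1.5)**2'),
                    promotes=['*'])

# Only bounds can be specified for the design variables.
model.add_design_var('x', lower=0, upper=10)
model.add_design_var('y', lower=0.0, upper=10.0)
model.add_objective('f')

prob.driver = om.SimpleGADriver()
prob.driver.options['max_gen'] = 50
# 'y' gets an 8-bit continuous encoding; 'x' has no 'bits' entry,
# so the GA encodes it as an integer between its bounds.
prob.driver.options['bits'] = {'y': 8}

prob.setup()
prob.set_val('x', 5.0)
prob.set_val('y', 5.0)
prob.run_driver()

print(prob.get_val('x'), prob.get_val('y'), prob.get_val('f'))
```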

Related

Constraint implementation for the Ipopt solver with a non-contiguous target range

In a project, Ipopt is used to solve a problem. I am wondering whether I can add a new constraint to the problem, but I am not very familiar with the topic.
Basically, a variable should either be greater than a certain value or be zero. The latter could be a problem since the target range is not contiguous.
I was thinking of using binary variables, but as far as I can see, the Ipopt solver does not support them. Is there any way to implement my condition?
While it would be possible to formulate this condition as a nonconvex nonlinear constraint, I would instead suggest that you look for an MINLP solver and use a binary or semicontinuous variable.
Sometimes you have to change the tool when your problem changes.
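To illustrate the binary-variable route (this is not tied to the asker's actual model; the bounds L and U and the solver name are placeholders), the condition "x = 0 or x >= L" can be written with a binary switch variable and two big-M-style constraints, e.g. in Pyomo:

```python
from pyomo.environ import (ConcreteModel, Var, Binary, NonNegativeReals,
                           Constraint, Objective, SolverFactory, minimize)

L, U = 5.0, 100.0  # illustrative lower threshold and upper bound

m = ConcreteModel()
m.x = Var(domain=NonNegativeReals, bounds=(0, U))
m.y = Var(domain=Binary)  # y = 1 means "x is switched on"

# y = 0  ->  x = 0 ;  y = 1  ->  L <= x <= U
m.lower = Constraint(expr=m.x >= L * m.y)
m.upper = Constraint(expr=m.x <= U * m.y)

# Dummy objective just to make the model complete.
m.obj = Objective(expr=m.x, sense=minimize)

# Any MINLP/MILP-capable solver you have installed will do, e.g.:
# SolverFactory('bonmin').solve(m)
```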

Can I use automatic differentiation for non-differentiable functions?

I am testing the performance of different solvers on minimizing an objective function derived from the simulated method of moments. Given that my objective function is not differentiable, I wonder whether automatic differentiation would work in this case. I tried my best to read some introductions to this method, but I couldn't figure it out.
I am actually trying to use Ipopt + JuMP in Julia for this test. Previously, I tested it using BlackBoxOptim in Julia. I would also appreciate any insights on the optimization of non-differentiable functions in Julia.
It seems that I was not clear about "non-differentiable". Let me give you an example. Consider the following objective function, where X is the dataset, B are unobserved random errors that will be integrated out, and \theta are the parameters. However, A is discrete and therefore not differentiable.
I'm not exactly an expert on optimization, but: it depends on what you mean by "nondifferentiable".
For many mathematical functions in common use, "nondifferentiable" just means "not everywhere differentiable" -- but that is still "differentiable almost everywhere, except at countably many points" (e.g., abs, relu). These functions are not a problem at all: you can just choose any subgradient and apply any normal gradient method, which is what essentially all AD systems for machine learning do. The points where the subdifferential is not a singleton are hit with low probability anyway. An alternative for certain forms of convex objectives are proximal gradient methods, which "smooth" the objective in an efficient way that preserves optima (cf. ProximalOperators.jl).
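The thread is about Julia, so a Julia AD package would be the natural fit, but the subgradient behaviour is easy to see in a couple of lines; here is the same idea sketched in Python with JAX purely as an illustration (the value returned exactly at the kink is implementation-defined, but it is always a valid subgradient):

```python
import jax
import jax.numpy as jnp

# abs and relu are not differentiable at 0, yet AD still returns a usable value there.
print(jax.grad(jnp.abs)(0.0))       # some element of [-1, 1]
print(jax.grad(jax.nn.relu)(0.0))   # some element of [0, 1]

# Away from the kink the result is the ordinary derivative.
print(jax.grad(jnp.abs)(-2.0))      # -1.0
print(jax.grad(jax.nn.relu)(3.0))   #  1.0
```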
Then there are functions that seem like they can't be differentiated at all, because they look "combinatoric" or discrete, but are in fact piecewise differentiable (if seen from the right point of view). This includes sorting and ranking. But you have to identify them, and describing and implementing the derivative is rather complicated. Whether such functions are supported by an AD system depends on how sophisticated its "standard library" is. Some variants of this, like "permute", just fall out of AD over control structures, while more complex ones require the primitive adjoints to be defined manually.
For certain kinds of problems, though, we work in an intrinsically discrete space -- for example, integer parameters of some probability distributions. In these cases, differentiation makes no sense, and hence AD libraries define their primitives not to work on those parameters. Possible alternatives are (mixed) integer programming, approximations, search, and model selection. This case also occurs for problems where the optimized space itself depends on the parameter in question, like the second argument of fill. We also have things like the ℓ0 "norm" or the rank of a matrix, for which well-known continuous relaxations exist, but that is outside the scope of AD.
(In the specific case of MCMC for discrete or dimensional parameters, there's other ways to deal with that, like combining HMC with other MC methods in a Gibbs sampler, or using a nonparametric model instead. Other tricks are possible for VI.)
That being said, you will rarely encounter complicated nowhere-differentiable continuous functions in optimization. They are complicated even to describe and are unlikely to arise in the kind of math we use for modelling.

Predicting a numeric attribute through high dimensional nominal attributes

I'm having difficulties mining a big (100K entries) dataset of mine concerning logistics transportation. I have around 10 nominal string attributes (e.g., city/region/country names, customer/vessel identification codes, etc.). Along with those, I have one date attribute, "departure", and one ratio-scaled numeric attribute, "goal".
What I'm trying to do is use a training set to find out which attributes correlate strongly with "goal", and then validate these patterns by predicting the "goal" value of entries in a test set.
I assume clustering, classification and neural networks could be useful for this problem, so I used RapidMiner, KNIME and ELKI and tried to apply some of their tools to my data. However, most of these tools only handle numeric data, so I got no useful results.
Is it possible to transform my nominal attributes into numeric ones? Or do I need to find different algorithms that can actually handle nominal data?
You most likely want to use a tree-based algorithm; these are good at handling nominal features. Please be aware that you do not want to use "id-like" attributes.
I would recommend RapidMiner's AutoModel feature as a start. GBT and RandomForest should work well.
Best,
Martin
The handling of nominal attributes does not depend on the tool; it is a question of which algorithm you use. For example, k-means with Euclidean distance can't handle string values, but other distance functions can, and so can some algorithms directly, for example the random forest implementation in RapidMiner.
You can of course also transform the nominal attributes into numerical ones, for example by using a binary dummy encoding or by assigning a unique integer value to each category (which might introduce some bias). In RapidMiner you have the Nominal to Numerical operator for that.
Depending on the distribution of your nominal values, it might also be useful to handle rare values. You could either group them together in a new category (such as "other") or use a feature selection algorithm after you apply the dummy encoding.
See the screen shot for a sample RapidMiner process (which uses the Replace Rare Values operator from the Operator Toolbox extension).
Edit: Martin is also right, AutoModel will be a good start to check for problematic attributes and find a fitting algorithm.
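Outside RapidMiner, the same recipe (dummy-encode the nominal columns, group rare values, then fit a tree ensemble) can be sketched in Python with scikit-learn; a recent scikit-learn is assumed, and the file name and column names below are placeholders standing in for the attributes described in the question:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names for the nominal attributes.
nominal_cols = ['city', 'region', 'country', 'customer', 'vessel']

df = pd.read_csv('shipments.csv')   # placeholder file
X = df[nominal_cols]
y = df['goal']

# One-hot (dummy) encode the nominal columns; min_frequency groups rare
# categories into a single "infrequent" bucket, similar in spirit to
# RapidMiner's Replace Rare Values operator.
pre = ColumnTransformer(
    [('onehot',
      OneHotEncoder(handle_unknown='infrequent_if_exist', min_frequency=50),
      nominal_cols)],
    remainder='drop')

model = Pipeline([('prep', pre),
                  ('rf', RandomForestRegressor(n_estimators=300, n_jobs=-1))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
model.fit(X_train, y_train)
print('R^2 on the held-out set:', model.score(X_test, y_test))
```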

Including an offset variable [or constraining a coefficient to 1] using mlogit in R

I've been estimating some MNL models in R using mlogit. The package works very well, but it seems that it does not allow including offset variables. I read the package documentation to see whether it allows constraining a coefficient when estimating the model. It seems that there is one alternative using mlogit.optim; unfortunately, the documentation does not specify how it must be used.
So, my point is: does any of you know how to either (a) include an offset variable or (b) constrain a coefficient (coefficient A = 1) using mlogit?
Thanks in advance!
Best,
Dr. Wall

Difference between solnp and gosolnp functions in R for Non-Linear optimization problems

What is the main difference between the two functions? The R help manual says that gosolnp helps to set the initial parameters correctly. Is there any other difference? Also, if that is the case, how do we determine the correct distribution type for the parameter space?
In my problem, the initial set of parameters is difficult to determine, which is why optimization is used in the first place. However, I do have an idea of the parameters' upper and lower bounds.
gosolnp is an extension of solnp, a wrapper allowing for multiple restarts. Simply put, it runs solnp several times (controllable by n.restarts) to avoid getting stuck in a local minimum. If your function is known to have no local minima (e.g., it is convex, which can be derived analytically), use solnp to save time. Otherwise, use gosolnp. If you know any additional information (for instance, an area where the global minimum is supposed to be), you may use it for finer control of the starting-parameter distribution: see the parameters distr and distr.opt.
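To make the multi-restart idea concrete, here is the same pattern sketched in Python with SciPy (a toy one-dimensional objective and plain uniform sampling over the bounds stand in for what gosolnp does via n.restarts, distr and distr.opt):

```python
import numpy as np
from scipy.optimize import minimize

# Toy multimodal objective standing in for the user's function.
def f(x):
    return np.sin(3.0 * x[0]) + 0.1 * x[0] ** 2

lower, upper = -5.0, 5.0
rng = np.random.default_rng(0)

best = None
for _ in range(20):                            # analogous to n.restarts
    x0 = rng.uniform(lower, upper, size=1)     # random start within the bounds
    res = minimize(f, x0, bounds=[(lower, upper)])
    if best is None or res.fun < best.fun:
        best = res

print(best.x, best.fun)
```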

Resources