Equality constraints handling in Evolutionary multiobjective algorithms - constraints

I am working on a problem which has two linear constraints, one of which is an equality constraint. I am using the SPEA2 algorithm. The constraints are given below.
I have tried the penalty function approach but had difficulty selecting its parameters. I have also tried the constrained dominance relation approach, but again could not get feasible solutions. Please advise...
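For reference, here is a minimal sketch of the constrained dominance comparison mentioned above, with the equality constraint relaxed to |h(x)| <= eps so that feasible points are reachable by random variation. The objectives, the two constraints and the tolerance `EPS` are hypothetical placeholders, not the constraints of the actual problem, and the relaxation is one common workaround rather than a guaranteed fix.

```python
EPS = 1e-4  # assumed tolerance: h(x) = 0 is treated as satisfied when |h(x)| <= EPS


def violation(x, ineqs, eqs, eps=EPS):
    """Total constraint violation; 0.0 means x is (relaxed-)feasible."""
    v = sum(max(0.0, g(x)) for g in ineqs)            # g(x) <= 0
    v += sum(max(0.0, abs(h(x)) - eps) for h in eqs)  # |h(x)| <= eps
    return v


def dominates(fa, fb):
    """Plain Pareto dominance for minimisation."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))


def constrained_dominates(xa, xb, objectives, ineqs, eqs):
    """Feasible beats infeasible; two infeasible points compare by violation;
    two feasible points compare by ordinary Pareto dominance."""
    va, vb = violation(xa, ineqs, eqs), violation(xb, ineqs, eqs)
    if va == 0.0 and vb > 0.0:
        return True
    if va > 0.0 and vb == 0.0:
        return False
    if va > 0.0 and vb > 0.0:
        return va < vb
    return dominates([f(xa) for f in objectives], [f(xb) for f in objectives])


# Hypothetical toy problem: minimise (x0, x1) subject to x0 <= 1 and x0 + x1 = 1.
objectives = [lambda x: x[0], lambda x: x[1]]
ineqs = [lambda x: x[0] - 1.0]
eqs = [lambda x: x[0] + x[1] - 1.0]

a, b, c = (0.5, 0.5), (0.5, 0.9), (0.2, 0.8)
print(constrained_dominates(a, b, objectives, ineqs, eqs))  # a feasible, b not
print(constrained_dominates(a, c, objectives, ineqs, eqs))  # both feasible, trade-off
```

With an exact equality the feasible set has measure zero, so almost every candidate is infeasible and the comparison degenerates into ranking by violation alone; the tolerance is what lets Pareto dominance come into play.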

Related

Can I use automatic differentiation for non-differentiable functions?

I am testing performance of different solvers on minimizing an objective function derived from simulated method of moments. Given that my objective function is not differentiable, I wonder if automatic differentiation would work in this case? I tried my best to read some introduction on this method, but I couldn't figure it out.
I am actually trying to use Ipopt+JuMP in Julia for this test. Previously, I have tested it using BlackBoxoptim in Julia. I will also appreciate if you could provide some insights on optimization of non-differentiable functions in Julia.
It seems that I am not clear on "non-differentiable". Let me give you an example. Consider the following objective function. X is dataset, B is unobserved random errors which will be integrated out, \theta is parameters. However, A is discrete and therefore not differentiable.
I'm not exactly an expert on optimization, but: it depends on what you mean by "nondifferentiable".
For many mathematical functions that are used, "nondifferentiable" will just mean "not everywhere differentiable" -- but that's still "differentiable almost everywhere, except on countably many points" (e.g., abs, relu). These functions are not a problem at all -- you can just choose any subgradient and apply any normal gradient method. That's what basically all AD systems for machine learning do. Points where the subgradient is not unique will be hit with low probability anyway. An alternative for certain forms of convex objectives is to use proximal gradient methods, which "smooth" the objective in an efficient way that preserves optima (cf. ProximalOperators.jl).
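To make the "choose any subgradient" point concrete, here is a small sketch in Python (not Julia, and not tied to any AD library) that minimises f(x) = |x - 3| with a subgradient method. The target value 3, the choice of 0 as the subgradient at the kink and the 1/k step size are all assumptions made for illustration.

```python
def f(x, c=3.0):
    """Objective with a kink at x = c."""
    return abs(x - c)


def subgrad_f(x, c=3.0):
    """One valid subgradient of f. At the kink any value in [-1, 1] is valid;
    0 is chosen here, much as AD systems pick a fixed convention for abs or relu."""
    if x > c:
        return 1.0
    if x < c:
        return -1.0
    return 0.0


def subgradient_descent(x0, steps=2000):
    """Subgradient method with diminishing step size 1/k.
    The iterates are not monotone, so the best point seen is tracked."""
    x = best = x0
    for k in range(1, steps + 1):
        x -= (1.0 / k) * subgrad_f(x)
        if f(x) < f(best):
            best = x
    return best


x_star = subgradient_descent(0.0)
print(x_star, f(x_star))
```

The iterates overshoot the kink and oscillate around it with shrinking amplitude, which is why a diminishing step size is used instead of a constant one.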
Then there's those functions that seem like they can't be differentiated at all, since they seem "combinatoric" or discrete, but are in fact piecewise differentiable (if seen from the correct point of view). This includes sorting and ranking. But you have to find them, and describing and implementing the derivative is rather complicated. Whether such functions are supported by an AD system depends on how sophisticated its "standard library" is. Some variants of this, like "permute", can just fall out of AD over control structures, while more complex ones require the primitive adjoints to be manually defined.
For certain kinds of problems, though, we just work in an intrinsically discrete space -- like, integer parameters of some probability distributions. In these cases, differentiation makes no sense, and hence AD libraries define their primitives not to work on these parameters. Possible alternatives are to use (mixed) integer programming, approximations, search, and model selection. This case also occurs for problems where the optimized space itself depends on the parameter in question, like the second argument of fill. We also have things like the ℓ0 "norm" or the rank of a matrix, for which there exist well-known continuous relaxations (but that's outside of the scope of AD).
(In the specific case of MCMC for discrete or dimensional parameters, there's other ways to deal with that, like combining HMC with other MC methods in a Gibbs sampler, or using a nonparametric model instead. Other tricks are possible for VI.)
That being said, you will rarely encounter complicated nowhere differentiable continuous functions in optimization. They are already complicated to describe, and are just unlikely to arise in the kind of math we use for modelling.

Linear programming using blocking theory R

The following linear programming problem is not of canonical form. I am really stuck when trying to put this in regular form and feed it into the normal lp() function.
Does anyone have experience with such a weird form?
B and A are the blocker and antiblocker, respectively, which are simply two sets of inequalities.
I don't know what the "normal lp() function" is. Let's assume this is the lp function from the lpSolve package.
This function does not expect a canonical form. (Canonical usually means each constraint has the same fixed sign, e.g. Ax<=b; lp() allows different signs for each constraint).
lp() just wants one big constraint matrix: each column is an individual variable and each row is an individual constraint. This is conceptually simple, but often tedious in practice. The best thing to do is to get a large piece of paper and draw the layout of the LP matrix: which variables and constraints go where.
For some classes of models there are easier-to-use tools to express an LP model, such as ompr and CVXR.
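As an illustration of that layout, here is a small Python sketch that mirrors the shape of the `const.mat`, `const.dir` and `const.rhs` arguments of lp(): one row per constraint, one column per variable, and a separate direction per row. The three-variable model is entirely hypothetical and is not the blocker/antiblocker problem from the question; the sketch only checks a candidate point against the rows and does not solve anything.

```python
# Hypothetical toy model; columns are the variables in the fixed order [x, y, z].
objective = [3.0, 2.0, 1.0]

const_mat = [
    [1.0,  1.0, 1.0],   # row 1: x + y + z
    [1.0, -1.0, 0.0],   # row 2: x - y
    [0.0,  0.0, 1.0],   # row 3: z
]
const_dir = ["<=", ">=", "="]   # each row may have its own direction
const_rhs = [10.0, 0.0, 2.0]


def row_value(row, point):
    """Left-hand side of one constraint row evaluated at a point."""
    return sum(a * v for a, v in zip(row, point))


def is_feasible(point, tol=1e-9):
    """Check a candidate point against every row of the layout."""
    for row, direction, rhs in zip(const_mat, const_dir, const_rhs):
        lhs = row_value(row, point)
        if direction == "<=" and lhs > rhs + tol:
            return False
        if direction == ">=" and lhs < rhs - tol:
            return False
        if direction == "=" and abs(lhs - rhs) > tol:
            return False
    return True


print(is_feasible([4.0, 3.0, 2.0]))
print(is_feasible([1.0, 3.0, 2.0]))
```

Writing a checker like this alongside the matrix is a cheap way to confirm that a known feasible point passes before handing the same rows to the solver.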

Choosing the proper optimisation algorithm in R

I am trying to find the extremum of a linear objective function with linear equality, linear inequality and nonlinear (quadratic) inequality constraints. The problem is that I have already tried many algorithms from packages like nloptr, Rsolnp and NlcOptim, and each time I have obtained different results. What is more, the results differ (in many cases) from those of the GRG algorithm in Excel, which can find better results in terms of minimising the objective function.
So far solnp (Rsolnp package) gives some good results, and after proper calibration the results are even better than the ones from the Excel GRG algorithm. Results from solnl (NlcOptim) are average and vary greatly, even if the input data is only slightly changed.
The nloptr function (nloptr package) implements a number of algorithms. I tried a few (I do not remember exactly which) and the results were still average and completely different from the ones obtained so far from the other algorithms.
My knowledge of optimisation algorithms is really poor and my attempts are rather based on a random selection of algorithms. Could you therefore advise some algorithms implemented in R that can handle such a problem? Which one is better than the others, and why? Maybe there is some framework or decision tree for choosing the proper optimisation algorithm.
If this helps: I am trying to find the optimal weights of the portfolio assets, where the objective is to minimise portfolio risk (standard deviation), with all asset weights summing to 1 and being greater than or equal to 0, and with a defined portfolio return, as constraints.
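To pin down the problem being described, here is a minimal Python sketch of the objective and the constraints as stated in the last paragraph. The covariance matrix, the expected returns and the target are made-up numbers, and the return constraint is assumed to be "at least the target" (it could equally be an equality). The sketch only evaluates a candidate weight vector; it does not optimise.

```python
import math

# Hypothetical two-asset data (not from the question).
cov = [[0.04, 0.01],
       [0.01, 0.09]]          # covariance matrix of asset returns
mu = [0.05, 0.10]             # expected asset returns
target_return = 0.07          # assumed lower bound on portfolio return


def portfolio_return(w):
    return sum(wi * mi for wi, mi in zip(w, mu))


def portfolio_std(w):
    """Objective: standard deviation sqrt(w' * cov * w)."""
    n = len(w)
    variance = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return math.sqrt(variance)


def is_feasible(w, tol=1e-8):
    """Weights sum to 1, are non-negative, and reach the target return."""
    return (abs(sum(w) - 1.0) <= tol
            and all(wi >= -tol for wi in w)
            and portfolio_return(w) >= target_return - tol)


w = [0.5, 0.5]
print(portfolio_std(w), portfolio_return(w), is_feasible(w))
```

Written this way, all constraints are linear in the weights, and minimising the standard deviation is equivalent to minimising the variance, which is a convex quadratic function when the covariance matrix is positive semidefinite. That may be one reason general-purpose nonlinear solvers started from different points give inconsistent answers where a method that exploits this structure would not.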

What method do you use for selecting the optimum number of clusters in k-means and EM?

Many algorithms for clustering are available. A popular algorithm is the K-means where, based on a given number of clusters, the algorithm iterates to find best clusters for the objects.
What method do you use to determine the number of clusters in the data in k-means clustering?
Does any package available in R contain the V-fold cross-validation method for determining the right number of clusters?
Another well used approach is Expectation Maximization (EM) algorithm which assigns a probability distribution to each instance which indicates the probability of it belonging to each of the clusters.
Is this algorithm implemented in R?
If it is, does it have the option to automatically select the optimum number of clusters by cross validation?
Do you prefer some other clustering method instead?
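For readers unfamiliar with the iteration described in the question, here is a minimal one-dimensional K-means sketch in Python. The data, the evenly spread deterministic initialisation and the use of the within-cluster sum of squares (WCSS) as an error metric are all assumptions for illustration, not a recommended way to choose the number of clusters.

```python
def kmeans_1d(points, k, max_iters=100):
    """Lloyd-style K-means on a list of numbers.
    Returns the final centres and the within-cluster sum of squares."""
    pts = sorted(points)
    # Assumed deterministic initialisation: centres spread over the sorted data.
    centres = [pts[(i * (len(pts) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centres[c]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    wcss = sum(min((p - c) ** 2 for c in centres) for p in points)
    return centres, wcss


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centres_1, wcss_1 = kmeans_1d(data, 1)
centres_2, wcss_2 = kmeans_1d(data, 2)
print(centres_1, wcss_1)
print(centres_2, wcss_2)
```

WCSS typically keeps shrinking as k grows, so on its own it cannot pick the number of clusters; that is why a separate validation criterion is needed.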
For large "sparse" datasets I would seriously recommend the "Affinity propagation" method.
It has superior performance compared to K-means and it is deterministic in nature.
http://www.psi.toronto.edu/affinitypropagation/
It was published in the journal "Science".
However, the choice of the optimal clustering algorithm depends on the data set under consideration. K-means is a textbook method, and it is very likely that someone has developed a better algorithm more suitable for your type of dataset.
This is a good tutorial by Prof. Andrew Moore (CMU, Google) on K Means and Hierarchical Clustering.
http://www.autonlab.org/tutorials/kmeans.html
Last week I coded up such an estimate-the-number-of-clusters algorithm for a K-Means clustering program. I used the method outlined in:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.9687&rep=rep1&type=pdf
My biggest implementation problem was that I had to find a suitable Cluster Validation Index (i.e. error metric) that would work. Now it is a matter of processing speed, but the results currently look reasonable.

What is the difference between a 'combinatorial algorithm' and a 'linear algorithm'?

Or rather, what is the definition of a combinatorial algorithm and a linear algorithm, resp.?
To make it clear because obviously the first responders misunderstood the question: I am not looking for a definition of an algorithm running in linear time vs non-linear time. A linear algorithm is somehow related to linear programming, which is a technique for finding or approximating solutions to linear optimization problems.
Since NP-hard problems are so hard, there is a whole field trying to find approximate solutions. The traveling salesman problem for instance has several approximate solutions which run in polynomial time and produce a solution which is within a given bound of the best solution.
Some of these approximating algorithms are called a linear algorithm, others a combinatorial algorithm; and the latter seems to be preferred (Why?). These are the two concepts I would like to understand.
The issue is one of problem formulation.
Just as you said, the Traveling Salesperson Problem (TSP) is NP-hard precisely because it has a discrete problem formulation (the salesperson either visits a city or not at a particular time). This discrete formulation makes the problem, and its algorithm, combinatorial. (Note that not all combinatorial problems are NP-hard; consider sorting algorithms.)
However, the Linear-Programming (LP) relaxation of TSP results in a linear algorithm. This is because the problem has been reformulated such that the salesperson visits a city a certain proportion of the time. The main reason for using an LP relaxation is because the relaxed version can be solved in polynomial time. However, the solution to the LP relaxation is not necessarily a solution to the original problem.
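The gap between a discrete problem and its LP relaxation is easier to see on something smaller than TSP, so the sketch below swaps in the 0/1 knapsack problem instead. The item data is made up. The discrete version is solved by brute force over all subsets; the relaxed version, where each item may be taken fractionally, is solved by the greedy value-per-weight rule, which is known to be optimal for the fractional knapsack.

```python
from itertools import product

# Hypothetical items (value, weight) and capacity.
values = [60, 100, 120]
weights = [10, 20, 30]
capacity = 50


def best_integer():
    """Combinatorial formulation: each item is either taken (1) or not (0)."""
    best_value, best_choice = 0, None
    for choice in product([0, 1], repeat=len(values)):
        weight = sum(c * w for c, w in zip(choice, weights))
        value = sum(c * v for c, v in zip(choice, values))
        if weight <= capacity and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


def lp_relaxation():
    """Relaxed formulation: 0 <= x_i <= 1, solved greedily by value/weight."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    x = [0.0] * len(values)
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / weights[i])
        remaining -= x[i] * weights[i]
    return sum(xi * v for xi, v in zip(x, values)), x


int_value, int_choice = best_integer()
lp_value, lp_x = lp_relaxation()
print(int_value, int_choice)
print(lp_value, lp_x)
```

The relaxed optimum takes a fraction of one item, so it is not a valid answer to the original problem; its value only bounds the discrete optimum from above.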
A linear algorithm tends to work with just one set of data - 'Take all the numbers in set a, double them, and put the result in set b'. The number of operations is equal to the count of items in set a.
A combinatorial one works on combinations of sets - 'For each number in set a, work out the sum of that number and each number in set b and print to screen'. The number of operations is the product of the size of set a and the size of set b.
Combinatorial algorithms "explode" as their input grows. Linear algorithms grow in proportion to their input, while combinatorial algorithms grow in proportion to an exponent (or worse) of their input: enumerating all possible paths through a graph, for example.
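The operation counts in this answer can be checked directly with a short Python sketch. The input sets are arbitrary, and counting the orderings of a list stands in for "enumerating all possible paths" (an assumption made to keep the example small).

```python
from itertools import permutations


def double_all(a):
    """One pass over a single set: len(a) operations."""
    ops = 0
    b = []
    for x in a:
        b.append(2 * x)
        ops += 1
    return b, ops


def pairwise_sums(a, b):
    """Every element of a combined with every element of b: len(a) * len(b) operations."""
    ops = 0
    sums = []
    for x in a:
        for y in b:
            sums.append(x + y)
            ops += 1
    return sums, ops


def count_orderings(items):
    """Enumerate every ordering of the items: n! of them."""
    return sum(1 for _ in permutations(items))


print(double_all([1, 2, 3]))
print(pairwise_sums([1, 2, 3], [10, 20]))
print(count_orderings(range(5)))
```

Adding one element adds one operation to the first function, len(b) operations to the second, and multiplies the work of the third by the new length.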

Resources