Non Linear Integer Programming - r

I would like to know if there is a package in R handling non linear integer optimization.
"Basically", I would like to solve the following problem:
max f(x) s.t x in (0,10) and x is integer.
I know that some branching algorithms are able to handle the linear version of this problem, but here my function f() might be more complicated. (I can't even make sure it would be quadratic of the form f(x)=xQx).
I guess there is always the brute force solution to test all the possibilities as long as they are bounded, but I was wondering if there wasn't anything smarter.

I have a few options for you, but none of them is the silver bullet, although it looks like your silver bullet is in the works under the rino project: http://r-forge.r-project.org/projects/rino/.
Since your function is complicated, you may want to use a genetic algorithm (i.e., gradient-based optimizers may not be reliable). genoud in the rgenoud library may do the trick (link text). If you set data.type.int=TRUE it should do the trick. I have not used this library, but have some experience with GAs in matlab and the time to convergence is sensitive to the settings, so you'll be well served to read the man page a few times through.
Or, if your function in strictly concave (unlikely, since you say it may be complicated) you can solve with a gradient solver (e.g., optim) then check the neighborhood around the optimum (can't be more than 2^n points to check).
Sorry, I can't be of more help.

If it is hardly nonlinear there is no better method than brute force (you will never know if the minimum is local or if some flat-looking fragment doesn't have any narrow and deep valleys), except of course symbolic computation (which probably won't work because the function is too complicated) or soft computing, I mean things like genetic algorithms, Monte-Carlo, swarms, etc. (here you don't have a guarantee that it will find the very global minimum and because you have integer x it can be slower than brute force).

http://cran.r-project.org/web/views/Optimization.html lists the packages Rdonlp2 and Rsolnp which may be suitable.

Discrete filled function method is one of the recent methods that can find global solution of nonlinear integer programming with about 100 constraints and variables.

Related

Should linear solver be converging when Coloring is beign computed

I had to add some circular dependencies to my model and thus adding NonlinearBlockGS and LinearBlockGS to the Group with the circular dependency. I get messages like this
LN: LNBGSSolver 'LN: LNBGS' on system 'XXX' failed to converge in 10
iterations.
in the phase where it's finding the Coloring of the problem. There is a Dymos trajectory as part of the problem, but the circular dependency is not in the Trajectory group, it's upstream. It however converges very easily when actually solving the problem. The number of FWD solves is the same as it was before-- everything seem to work fine. Should I be worried about anything?
the way our total derivative coloring works is that we replace partial derivatives with random numbers and then solve the linear system. So the linear solver should be converging. Now, whether or not it should converge with LNBGS in 10 iterations... probably not.
Its hard to speak diffinitively when putting random numbers into a matrix to invert it... but generally speaking it should remain invertible (though we can't promise). That does not mean that it will remain easily invertible. How close does the linear residual get during the coloring? it is decreasing, but slowly. Would more iteration let it get there?
If your problem is working well, I don't think you need to freak out about this. If you would like it to converge better, it won't hurt anything and might give you better coloring. You can increase the iprint of that solver to get more information on the convergence history.
Another option, if your system is small enough, is to try using the DirectSolver instead of LNBGS. For most models with less than 10,000 variables in them a DirectSolver will be overall faster than the LNBGS. There is a nice symetry to using LNBGS with NLGBS ... but while the nonlinear solver tends to be a good choice (i.e. fast and stable) for cyclic dependencies the same can't be said for its linear counter part.
So my go-to combination if NLBGS and DirectSolver. You can't always use the DirectSolver. If you have distributed components in your model, or components that use the matrix-free derivative APIs (apply_linear, compute_jacvec_product), then LNBGS is a good option. But if everything is explicit components with compute_partials or implicit components that provide partials in the linearize method then I suggest using the DirectSolver as your first option.
I think you may have discovered a coloring performance issue in OpenMDAO. When we compute coloring, internally we replace the component partials with random arrays matching the declared sparsity. Since we're not trying to find an actual solution when we compute coloring, we probably don't need to iterate more than once in any given system. And we shouldn't be generating convergence warnings when computing the coloring. I don't think you need to be worried in this case. I'll put a story in our bug tracker to look into this.

Can I use automatic differentiation for non-differentiable functions?

I am testing performance of different solvers on minimizing an objective function derived from simulated method of moments. Given that my objective function is not differentiable, I wonder if automatic differentiation would work in this case? I tried my best to read some introduction on this method, but I couldn't figure it out.
I am actually trying to use Ipopt+JuMP in Julia for this test. Previously, I have tested it using BlackBoxoptim in Julia. I will also appreciate if you could provide some insights on optimization of non-differentiable functions in Julia.
It seems that I am not clear on "non-differentiable". Let me give you an example. Consider the following objective function. X is dataset, B is unobserved random errors which will be integrated out, \theta is parameters. However, A is discrete and therefore not differentiable.
I'm not exactly an expert on optimization, but: it depends on what you mean by "nondifferentiable".
For many mathematical functions that are used, "nondifferentiable" will just mean "not everywhere differentiable" -- but that's still "differentiable almost everywhere, except on countably many points" (e.g., abs, relu). These functions are not a problem at all -- you can just chose any subgradient and apply any normal gradient method. That's what basically all AD systems for machine learning do. The case for non-singular subgradients will happen with low probability anyway. An alternative for certain forms of convex objectives are proximal gradient methods, which "smooth" the objective in an efficient way that preserves optima (cf. ProximalOperators.jl).
Then there's those functions that seem like they can't be differentiated at all, since they seem "combinatoric" or discrete, but are in fact piecewise differentiable (if seen from the correct point of view). This includes sorting and ranking. But you have to find them, and describing and implementing the derivative is rather complicated. Whether such functions are supported by an AD system depends on how sophisticated its "standard library" is. Some variants of this, like "permute", can just fall out AD over control structures, while move complex ones require the primitive adjoints to be manually defined.
For certain kinds of problems, though, we just work in an intrinsically discrete space -- like, integer parameters of some probability distributions. In these case, differentiation makes no sense, and hence AD libraries define their primitives not to work on these parameters. Possible alternatives are to use (mixed) integer programming, approximations, search, and model selection. This case also occurs for problems where the optimized space itself depends on the parameter in question, like the second argument of fill. We also have things like the ℓ0 "norm" or the rank of a matrix, for which there exist well-known continuous relaxations, but that's outside of the scope of AD).
(In the specific case of MCMC for discrete or dimensional parameters, there's other ways to deal with that, like combining HMC with other MC methods in a Gibbs sampler, or using a nonparametric model instead. Other tricks are possible for VI.)
That being said, you will rarely encounter complicated nowhere differentiable continuous functions in optimization. They are already complicated to describe, are just unlikely to arise in the kind of math we use for modelling.

Can any existing Machine Learning structures perfectly emulate recursive functions like the Fibonacci sequence?

To be clear I don't mean, provided the last two numbers in the sequence provide the next one:
(2, 3, -> 5)
But rather given any index provide the Fibonacci number:
(0 -> 1) or (7 -> 21) or (11 -> 144)
Adding two numbers is a very simple task for any machine learning structure, and by extension counting by ones, twos or any fixed number is a simple addition rule. Recursive calculations however...
To my understanding, most learning networks rely on forwards only evaluation, whereas most programming languages have loops, jumps, or circular flow patterns (all of which are usually ASM jumps of some kind), thus allowing recursion.
Sure some networks aren't forwards only; But can processing weights using the hyperbolic tangent or sigmoid function enter any computationally complete state?
i.e. conditional statements, conditional jumps, forced jumps, simple loops, complex loops with multiple conditions, providing sort order, actual reordering of elements, assignments, allocating extra registers, etc?
It would seem that even a non-forwards only network would only find a polynomial of best fit, reducing errors across the expanse of the training set and no further.
Am I missing something obvious, or did most of Machine Learning just look at recursion and pretend like those problems don't exist?
Update
Technically any programming language can be considered the DNA of a genetic algorithm, where the compiler (and possibly console out measurement) would be the fitness function.
The issue is that programming (so far) cannot be expressed in a hill climbing way - literally, the fitness is 0, until the fitness is 1. Things don't half work in programming, and if they do, there is no way of measuring how 'working' a program is for unknown situations. Even an off by one error could appear to be a totally different and chaotic system with no output. This is exactly the reason learning to code in the first place is so difficult, the learning curve is almost vertical.
Some might argue that you just need to provide stronger foundation rules for the system to exploit - but that just leads to attempting to generalize all programming problems, which circles right back to designing a programming language and loses all notion of some learning machine at all. Following this road brings you to a close variant of LISP with mutate-able code and virtually meaningless fitness functions that brute force the 'nice' and 'simple' looking code-space in attempt to follow human coding best practices.
Others might argue that we simply aren't using enough population or momentum to gain footing on the error surface, or make a meaningful step towards a solution. But as your population approaches the number of DNA permutations, you are really just brute forcing (and very inefficiently at that). Brute forcing code permutations is nothing new, and definitely not machine learning - it's actually quite common in regex golf, I think there's even an xkcd about it...
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
So other than Neural Networks trained using Backpropagation hypothetically finding the closed form of a recursive function (if a closed form even exists, and they don't in most real cases where recursion is useful), or a non-forwards only network acting like a pseudo-programming language with awful fitness prospects in the best case scenario, plus the virtually impossible task of tuning exit constraints to prevent infinite recursion... That's really it so far for machine learning and recursion?
According to Kolmogorov et al's On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, a three layer neural network can model arbitrary function with the linear and logistic functions, including f(n) = ((1+sqrt(5))^n - (1-sqrt(5))^n) / (2^n * sqrt(5)), which is the close form solution of Fibonacci sequence.
If you would like to treat the problem as a recursive sequence without a closed-form solution, I would view it as a special sliding window approach (I called it special because your window size seems fixed as 2). There are more general studies on the proper window size for your interest. See these two posts:
Time Series Prediction via Neural Networks
Proper way of using recurrent neural network for time series analysis
Ok, where to start...
Firstly, you talk about 'machine learning' and 'perfectly emulate'. This is not generally the purpose of machine learning algorithms. They make informed guesses given some evidence and some general notions about structures that exist in the world. That typically means an approximate answer is better than an 'exact' one that is wrong. So, no, most existing machine learning approaches aren't the right tools to answer your question.
Second, you talk of 'recursive structures' as some sort of magic bullet. Yet they are merely convenient ways to represent functions, somewhat analogous to higher order differential equations. Because of the feedbacks they tend to introduce, the functions tend to be non-linear. Some machine learning approaches will have trouble with this, but many (neural networks for example) should be able to approximate you function quite well, given sufficient evidence.
As an aside, having or not having closed form solutions is somewhat irrelevant here. What matters is how well the function at hand fits with the assumptions embodied in the machine learning algorithm. That relationship may be complex (eg: try approximating fibbonacci with a support vector machine), but that's the essence.
Now, if you want a machine learning algorithm tailored to the search for exact representations of recursive structures, you could set up some assumptions and have your algorithm produce the most likely 'exact' recursive structure that fits your data. There are probably real world problems in which such a thing would be useful. Indeed the field of optimisation approaches similar problems.
The genetic algorithms mentioned in other answers could be an example of this, especially if you provided a 'genome' that matches the sort of recursive function you think you may be dealing with. Closed form primitives could form part of that space too, if you believe they are more likely to be 'exact' than more complex genetically generated algorithms.
Regarding your assertion that programming cannot be expressed in a hill climbing way, that doesn't prevent a learning algorithm from scoring possible solutions by how many much of your evidence it's able to reproduce and how complex they are. In many cases (most? though counting cases here isn't really possible) such an approach will find a correct answer. Sure, you can come up with pathological cases, but with those, there's little hope anyway.
Summing up, machine learning algorithms are not usually designed to tackle finding 'exact' solutions, so aren't the right tools as they stand. But, by embedding some prior assumptions that exact solutions are best, and perhaps the sort of exact solution you're after, you'll probably do pretty well with genetic algorithms, and likely also with algorithms like support vector machines.
I think you also sum things up nicely with this:
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
The other answers go a long way to telling you where the state of the art is. If you want more, a bright new research path lies ahead!
See this article:
Turing Machines are Recurrent Neural Networks
http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/
The paper describes how a recurrent neural network can simulate a register machine, which is known to be a universal computational model equivalent to a Turing machine. The result is "academic" in the sense that the neurons have to be capable of computing with unbounded numbers. This works mathematically, but would have problems pragmatically.
Because the Fibonacci function is just one of many computable functions (in fact, it is primitive recursive), it could be computed by such a network.
Genetic algorithms should do be able to do the trick. The important this is (as always with GAs) the representation.
If you define the search space to be syntax trees representing arithmetic formulas and provide enough training data (as you would with any machine learning algorithm), it probably will converge to the closed-form solution for the Fibonacci numbers, which is:
Fib(n) = ( (1+srqt(5))^n - (1-sqrt(5))^n ) / ( 2^n * sqrt(5) )
[Source]
If you were asking for a machine learning algorithm to come up with the recursive formula to the Fibonacci numbers, then this should also be possible using the same method, but with individuals being syntax trees of a small program representing a function.
Of course, you also have to define good cross-over and mutation operators as well as a good evaluation function. And I have no idea how well it would converge, but it should at some point.
Edit: I'd also like to point out that in certain cases there is always a closed-form solution to a recursive function:
Like every sequence defined by a linear recurrence with constant coefficients, the Fibonacci numbers have a closed-form solution.
The Fibonacci sequence, where a specific index of the sequence must be returned, is often used as a benchmark problem in Genetic Programming research. In most cases recursive structures are generated, although my own research focused on imperative programs so used an iterative approach.
There's a brief review of other GP research that uses the Fibonacci problem in Section 3.4.2 of my PhD thesis, available here: http://kar.kent.ac.uk/34799/. The rest of the thesis also describes my own approach, which is covered a bit more succinctly in this paper: http://www.cs.kent.ac.uk/pubs/2012/3202/
Other notable research which used the Fibonacci problem is Simon Harding's work with Self-Modifying Cartesian GP (http://www.cartesiangp.co.uk/papers/eurogp2009-harding.pdf).

CVX-esque convex optimization in R?

I need to solve (many times, for lots of data, alongside a bunch of other things) what I think boils down to a second order cone program. It can be succinctly expressed in CVX something like this:
cvx_begin
variable X(2000);
expression MX(2000);
MX = M * X;
minimize( norm(A * X - b) + gamma * norm(MX, 1) )
subject to
X >= 0
MX((1:500) * 4 - 3) == MX((1:500) * 4 - 2)
MX((1:500) * 4 - 1) == MX((1:500) * 4)
cvx_end
The data lengths and equality constraint patterns shown are just arbitrary values from some test data, but the general form will be much the same, with two objective terms -- one minimizing error, the other encouraging sparsity -- and a large number of equality constraints on the elements of a transformed version of the optimization variable (itself constrained to be non-negative).
This seems to work pretty nicely, much better than my previous approach, which fudges the constraints something rotten. The trouble is that everything else around this is happening in R, and it would be quite a nuisance to have to port it over to Matlab. So is doing this in R viable, and if so how?
This really boils down to two separate questions:
1) Are there any good R resources for this? As far as I can tell from the CRAN task page, the SOCP package options are CLSCOP and DWD, which includes an SOCP solver as an adjunct to its classifier. Both have similar but fairly opaque interfaces and are a bit thin on documentation and examples, which brings us to:
2) What's the best way of representing the above problem in the constraint block format used by these packages? The CVX syntax above hides a lot of tedious mucking about with extra variables and such, and I can just see myself spending weeks trying to get this right, so any tips or pointers to nudge me in the right direction would be very welcome...
You might find the R package CVXfromR useful. This lets you pass an optimization problem to CVX from R and returns the solution to R.
OK, so the short answer to this question is: there's really no very satisfactory way to handle this in R. I have ended up doing the relevant parts in Matlab with some awkward fudging between the two systems, and will probably migrate everything to Matlab eventually. (My current approach predates the answer posted by user2439686. In practice my problem would be equally awkward using CVXfromR, but it does look like a useful package in general, so I'm going to accept that answer.)
R resources for this are pretty thin on the ground, but the blog post by Vincent Zoonekynd that he mentioned in the comments is definitely worth reading.
The SOCP solver contained within the R package DWD is ported from the Matlab solver SDPT3 (minus the SDP parts), so the programmatic interface is basically the same. However, at least in my tests, it runs a lot slower and pretty much falls over on problems with a few thousand vars+constraints, whereas SDPT3 solves them in a few seconds. (I haven't done a completely fair comparison on this, because CVX does some nifty transformations on the problem to make it more efficient, while in R I'm using a pretty naive definition, but still.)
Another possible alternative, especially if you're eligible for an academic license, is to use the commercial Mosek solver, which has an R interface package Rmosek. I have yet to try this, but may give it a go at some point.
(As an aside, the other solver bundled with CVX, SeDuMi, fails completely on the same problem; the CVX authors aren't kidding when they suggest trying multiple solvers. Also, in a significant subset of cases, SDTP3 has to switch from Cholesky to LU decomposition, which makes the processing orders of magnitude slower, with only very marginal improvement in the objective compared to the pre-LU steps. I've found it worth reducing the requested precision to avoid this, but YMMV.)
There is a new alternative: CVXR, which comes from the same people.
There is a website, a paper and a github project.
Disciplined Convex Programming seems to be growing in popularity observing cvxpy (Python) and Convex.jl (Julia), again, backed by the same people.

How can I do blind fitting on a list of x, y value pairs if I don't know the form of f(x) = y?

If I have a function f(x) = y that I don't know the form of, and if I have a long list of x and y value pairs (potentially thousands of them), is there a program/package/library that will generate potential forms of f(x)?
Obviously there's a lot of ambiguity to the possible forms of any f(x), so something that produces many non-trivial unique answers (in reduced terms) would be ideal, but something that could produce at least one answer would also be good.
If x and y are derived from observational data (i.e. experimental results), are there programs that can create approximate forms of f(x)? On the other hand, if you know beforehand that there is a completely deterministic relationship between x and y (as in the input and output of a pseudo random number generator) are there programs than can create exact forms of f(x)?
Soooo, I found the answer to my own question. Cornell has released a piece of software for doing exactly this kind of blind fitting called Eureqa. It has to be one of the most polished pieces of software that I've ever seen come out of an academic lab. It's seriously pretty nifty. Check it out:
It's even got turnkey integration with Amazon's ec2 clusters, so you can offload some of the heavy computational lifting from your local computer onto the cloud at the push of a button for a very reasonable fee.
I think that I'm going to have to learn more about GUI programming so that I can steal its interface.
(This is more of a numerical methods question.) If there is some kind of observable pattern (you can kinda see the function), then yes, there are several ways you can approximate the original function, but they'll be just that, approximations.
What you want to do is called interpolation. Two very simple (and not very good) methods are Newton's method and Laplace's method of interpolation. They both work on the same principle but they are implemented differently (Laplace's is iterative, Newton's is recursive, for one).
If there's not much going on between any two of your data points (ie, the actual function doesn't have any "bumps" whose "peaks" are not represented by one of your data points), then the spline method of interpolation is one of the best choices you can make. It's a bit harder to implement, but it produces nice results.
Edit: Sometimes, depending on your specific problem, these methods above might be overkill. Sometimes, you'll find that linear interpolation (where you just connect points with straight lines) is a perfectly good solution to your problem.
It depends.
If you're using data acquired from the real-world, then statistical regression techniques can provide you with some tools to evaluate the best fit; if you have several hypothesis for the form of the function, you can use statistical regression to discover the "best" fit, though you may need to be careful about over-fitting a curve -- sometimes the best fit (highest correlation) for a specific dataset completely fails to work for future observations.
If, on the other hand, the data was generated something synthetically (say, you know they were generated by a polynomial), then you can use polynomial curve fitting methods that will give you the exact answer you need.
Yes, there are such things.
If you plot the values and see that there's some functional relationship that makes sense, you can use least squares fitting to calculate the parameter values that minimize the error.
If you don't know what the function should look like, you can use simple spline or interpolation schemes.
You can also use software to guess what the function should be. Maybe something like Maxima can help.
Wolfram Alpha can help you guess:
http://blog.wolframalpha.com/2011/05/17/plotting-functions-and-graphs-in-wolframalpha/
Polynomial Interpolation is the way to go if you have a totally random set
http://en.wikipedia.org/wiki/Polynomial_interpolation
If your set is nearly linear, then regression will give you a good approximation.
Creating exact form from the X's and Y's is mostly impossible.
Notice that what you are trying to achieve is at the heart of many Machine Learning algorithm and therefor you might find what you are looking for on some specialized libraries.
A list of x/y values N items long can always be generated by an degree-N polynomial (assuming no x values are the same). See this article for more details:
http://en.wikipedia.org/wiki/Polynomial_interpolation
Some lists may also match other function types, such as exponential, sinusoidal, and many others. It is impossible to find the 'simplest' matching function, but the best you can do is go through a list of common ones like exponential, sinusoidal, etc. and if none of them match, interpolate the polynomial.
I'm not aware of any software that can do this for you, though.

Resources