Choosing the proper optimisation algorithm in R

I am trying to find the extremum of a linear objective function subject to linear equality, linear inequality and nonlinear (quadratic) inequality constraints. The problem is that I have already tried many algorithms from packages such as nloptr, Rsolnp and NlcOptim, and every time I have obtained different results. What is more, the results often differ from those of the GRG algorithm in Excel, which can find better results in terms of minimising the objective function.
So far solnp (Rsolnp package) gives good results, and after proper calibration the results are even better than those from the Excel GRG algorithm. Results from solnl (NlcOptim package) are average and very different, even if the input data are changed only slightly.
The nloptr function (nloptr package) implements a number of algorithms. I tried a few (I do not remember exactly which) and the results were still average and completely different from those obtained so far with the other algorithms.
My knowledge of optimisation algorithms is really poor and my attempts have been based on a rather random selection of algorithms. Could you therefore advise some algorithms implemented in R that can handle such a problem? And which one is better than another, and why? Maybe there is some framework or decision tree for choosing the proper optimisation algorithm.
If it helps, I am trying to find the optimal weights of portfolio assets, where the objective is to minimise portfolio risk (standard deviation), subject to the constraints that all asset weights sum to 1 and are greater than or equal to 0, and that the portfolio return equals a defined value.
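To make the problem concrete, here is a minimal reproducible version with made-up expected returns and a made-up covariance matrix, using the solnp call that has worked best so far (the numbers are purely illustrative):

```r
library(Rsolnp)

mu       <- c(0.08, 0.12, 0.10)              # expected asset returns (illustrative)
Sigma    <- matrix(c(0.10, 0.02, 0.01,
                     0.02, 0.15, 0.03,
                     0.01, 0.03, 0.08),
                   nrow = 3, byrow = TRUE)   # covariance matrix (illustrative)
r_target <- 0.10                             # required portfolio return

# Objective: portfolio standard deviation
obj <- function(w) sqrt(as.numeric(t(w) %*% Sigma %*% w))

# Two equality constraints: weights sum to 1, portfolio return hits the target
eqfun <- function(w) c(sum(w), sum(w * mu))

fit <- solnp(pars  = rep(1/3, 3),            # start from equal weights
             fun   = obj,
             eqfun = eqfun,
             eqB   = c(1, r_target),
             LB    = rep(0, 3),              # long-only: w_i >= 0
             UB    = rep(1, 3))

round(fit$pars, 4)                           # optimal weights
```
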

Related

How do I numerically compare the Dymos solution to the simulated solution?

I want to conduct a convergence study for my Dymos optimization results where I vary the number of nodes and compare the simulated solution to the optimization solution. From what I understand, Dymos fits polynomials to the system dynamics to represent the timeseries solution. What is the best way to compare the polynomial trajectory of the optimization solution to the trajectory of the simulated solution? I specifically want to evaluate the difference between the two trajectories away from the collocation/control nodes... to show that the polynomial fitting actually represents the simulated solution. How would I access the polynomial fitting data?
Thanks in advance.
For some of the testing we have an assert_timeseries_near_equal function that treats the more dense time series as the truth and tests that the less dense timeseries (usually the discrete solution) is reasonably close to it.
We're actually working on making this method a bit more explicit right now, so that it's a little easier for users to apply in general cases, such as comparing discrete solutions from two different cases.
In general, there are a few different ways you can test your collocation results against an explicit integration. You could just verify that the final states of the two solutions are reasonably close. Since the error tends to increase over the course of the trajectory, this is often good enough for a quick check. The downside of this approach is that it doesn't test that both solutions took the same path to the final condition.
To test the solution away from the nodes, I'd recommend the following: add a second timeseries output to the relevant phase that contains more segments or higher-order segments. This timeseries will have more nodes, and Dymos will interpolate from the solution's collocation grid onto this denser timeseries output grid. Comparing this against the explicit simulation will still match exactly in terms of times, controls, and parameters, but you'll better capture the interpolating state polynomials vs the explicitly simulated results.
There are other statistical methods out there for comparing timeseries that you can bring to bear, but visualizing the explicit trajectory plus some error bound as a "tube" into which we want to fit the discrete solution is usually how I handle it.
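The "tube" check itself is language-agnostic; here is a minimal sketch of it (not Dymos-specific, with stand-in series): interpolate the dense series, treated as truth, onto the sparse solution's times and require the sparse values to stay within a tolerance band.

```r
set.seed(1)
t_dense  <- seq(0, 10, length.out = 1000)
x_dense  <- sin(t_dense)                          # dense "truth" trajectory
t_sparse <- seq(0, 10, length.out = 25)
x_sparse <- sin(t_sparse) + rnorm(25, sd = 1e-3)  # discrete solution + error

# Truth evaluated at the sparse times via linear interpolation
x_truth  <- approx(t_dense, x_dense, xout = t_sparse)$y
err      <- abs(x_sparse - x_truth)

tol <- 1e-2                                       # radius of the tube
all(err < tol)                                    # stays inside the tube?
max(err)                                          # worst-case deviation
```
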

How can one calculate ROC's AUCs in complex designs with clustering in R?

The packages I've found so far that calculate AUCs do not account for sample clustering, which increases standard errors compared to simple random sampling. I wonder whether the estimates provided by these packages could be recalculated to allow for clustering.
Thank you.
Your best bet is probably replicate weights, as long as you can get point estimates of AUC that incorporate weights.
If you convert your design into a replicate-weights design object (using survey::as.svrepdesign()), you can then run any R function or expression with the replicate weights via survey::withReplicates() and obtain a standard error.
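A sketch of that approach, with simulated clustered data and a hand-rolled weighted-AUC helper (the data and the helper are illustrative; any AUC point estimate that accepts weights would do in its place):

```r
library(survey)

set.seed(1)
df <- data.frame(
  cluster = rep(1:20, each = 10),         # 20 clusters of 10 observations
  y       = rbinom(200, 1, 0.4),          # binary outcome
  score   = rnorm(200)
)
df$score <- df$score + df$y               # give the score some signal
df$w     <- 1                             # equal sampling weights

# Weighted AUC: weighted probability that a case outscores a control
weighted_auc <- function(score, y, w) {
  pos <- which(y == 1); neg <- which(y == 0)
  num <- 0; den <- 0
  for (i in pos) for (j in neg) {
    wij <- w[i] * w[j]
    num <- num + wij * ((score[i] > score[j]) + 0.5 * (score[i] == score[j]))
    den <- den + wij
  }
  num / den
}

des  <- svydesign(ids = ~cluster, weights = ~w, data = df)
rdes <- as.svrepdesign(des, type = "bootstrap")   # cluster-bootstrap weights

est  <- withReplicates(rdes, function(w, data)
  weighted_auc(data$score, data$y, w))
est                                       # AUC with a cluster-robust SE
```
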

Optimizing over 3 dimensional piece-wise constant function

I'm working on a simulation project with a 3-dimensional piece-wise constant function, and I'm trying to find the inputs that maximize the output. Using optim() in R with the Nelder-Mead or SANN algorithms seems best (they don't require the function to be differentiable), but I'm finding that optim() ends up returning my starting value exactly. This starting value was obtained using a grid search, so it's likely reasonably good, but I'd be surprised if it was the exact peak.
I suspect that optim() is not testing points far enough out from the initial guess, leading to a situation where all tested points give the same output.
Is this a reasonable concern?
How can I tweak the breadth of values that optim() is testing as it searches?
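One documented knob worth knowing about: for method = "SANN", optim()'s gr argument is repurposed as the function that generates a new candidate point, so you can control how far afield the search looks. A sketch with a made-up piece-wise constant objective (plateaus, with the best plateau around 3 in each coordinate):

```r
f <- function(x) -sum(floor(4 - abs(x - 3)))   # piece-wise constant stand-in

x0 <- c(0.5, 0.5)
f(x0)                                          # -2: the starting plateau

# Candidate generator: jump up to +/- 5 in every coordinate, far enough
# to leave the current plateau
propose <- function(x, ...) x + runif(length(x), -5, 5)

set.seed(42)
res <- optim(x0, f, gr = propose, method = "SANN",
             control = list(maxit = 5000, temp = 10))
res$par                                        # typically near the best plateau
res$value                                      # below the starting value of -2
```

With the default candidate generator the proposals may stay on the starting plateau, in which case every tested point returns the same value, which matches the behaviour described in the question.
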

Price Optimization in R

I need help in a price model optimization.
I am trying to maximize sales subject to several conditions. I have already done the optimization in Excel using Solver (GRG Nonlinear), but I want to do it in R since Solver has limitations (Microsoft Excel Solver has a limit of 200 decision variables, for both linear and nonlinear problems).
Excel's NLP solver is based on Lasdon's GRG2 solver. I don't think this is available under R. We don't know the exact form of your model or its size (details like whether the constraints are linear or not, and whether the objective is linear, quadratic or otherwise nonlinear), so it is difficult to recommend a particular solver. Here is a list of solvers available under R. As opposed to good LP solvers, which can basically solve whatever you throw at them, NLP solvers are a little more fragile and may require a little more hand-holding (things like scaling, the initial point and bounds come to mind).
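Since we don't know your model, here is only a minimal bounded stand-in: a made-up linear demand model q_i = a_i - b_i * p_i, maximising total revenue over prices. Base optim() with L-BFGS-B handles simple bounds; the dedicated NLP solvers become necessary once you add general constraints.

```r
a <- c(100, 80)     # demand intercepts (illustrative)
b <- c(2, 1.5)      # price sensitivities (illustrative)

revenue <- function(p) -sum(p * (a - b * p))   # negated: optim() minimises

fit <- optim(par = c(10, 10), fn = revenue, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(50, 50))

fit$par             # optimal prices; analytically a / (2 * b) = (25, 26.67)
-fit$value          # maximised revenue
```
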

What method do you use for selecting the optimum number of clusters in k-means and EM?

Many algorithms for clustering are available. A popular algorithm is k-means where, given a number of clusters, the algorithm iterates to find the best clusters for the objects.
What method do you use to determine the number of clusters in the data in k-means clustering?
Does any package available in R contain the V-fold cross-validation method for determining the right number of clusters?
Another widely used approach is the Expectation-Maximization (EM) algorithm, which assigns to each instance a probability distribution indicating how likely it is to belong to each of the clusters.
Is this algorithm implemented in R?
If it is, does it have the option to automatically select the optimum number of clusters by cross validation?
Do you prefer some other clustering method instead?
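As a concrete starting point for the R questions above: the mclust package implements EM for Gaussian mixtures and selects the number of components by BIC (not by cross-validation), while cluster::clusGap implements the gap statistic for k-means. A sketch on two simulated, well-separated clusters:

```r
library(mclust)    # EM for Gaussian mixture models
library(cluster)   # clusGap: gap statistic

set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# EM: fit mixtures with 1..9 components, keep the best by BIC
em_fit <- Mclust(x, G = 1:9, verbose = FALSE)
em_fit$G                                   # chosen number of clusters

# k-means: gap statistic over k = 1..6
gap   <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 50, nstart = 20)
k_hat <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"])
k_hat                                      # suggested number of clusters
```
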
For large, sparse datasets I would seriously recommend the Affinity Propagation method.
It has superior performance compared to k-means and it is deterministic in nature.
http://www.psi.toronto.edu/affinitypropagation/
It was published in the journal Science.
However, the choice of optimal clustering algorithm depends on the dataset under consideration. K-means is a textbook method and it is very likely that someone has developed a better algorithm more suitable for your type of dataset.
This is a good tutorial by Prof. Andrew Moore (CMU, Google) on K Means and Hierarchical Clustering.
http://www.autonlab.org/tutorials/kmeans.html
Last week I coded up such an estimate-the-number-of-clusters algorithm for a K-Means clustering program. I used the method outlined in:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.9687&rep=rep1&type=pdf
My biggest implementation problem was finding a suitable cluster validation index (i.e. an error metric) that would work. Now it is a matter of processing speed, but the results currently look reasonable.
