Generalized Reduced Gradient (GRG2) Algorithm in R

Does anyone know which R package has an implementation of the Generalized Reduced Gradient (GRG2) algorithm? Thanks.

Since @BenBolker has done the initial footwork in finding what sort of functionality you were hoping to replicate, I'm posting a follow-up that might be useful. A recent exchange on R-help ended with a quote that was nominated for the R fortunes package, although it is not clear to me whether it was accepted:
"The idea that the Excel solver "has a good reputation for being fast
and accurate" does not withstand an examination of the Excel solver's
ability to solve the StRD nls test problems. ...
Excel solver does have the virtue that it will always produce an
answer, albeit one with zero accurate digits."
"I am unaware of R being applied to the StRD, but I did apply S+ to the
StRD and, with analytic derivatives, it performed flawlessly."
From: Bruce McCullough <bdmccullough#drexel.edu>
Date: February 20, 2013 7:58:24 AM PST
Here is a link to the self-cited work by McCullough documenting the failures of the Excel Solver (which we now know is powered by some version of the GRG2 algorithm): www.pages.drexel.edu/~bdm25/chap8.pdf. The links to the NIST pages for the test problems are here: http://www.itl.nist.gov/div898/strd/nls/nls_info.shtml and http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml
The negative comment (brought to my attention by a downvote) from @jwg prompted me to redo the search suggested by Bolker. Still no hits for findFn("GRG2"). I can report several hits for "GRG", none of them apparently a solver, and was amused that one of them has the catchy expansion "General Random Guessing model". That seemed particularly amusing when the thrust of my arguably-non-answer was that choosing to use Excel's solver leaves one genuinely uncertain about the accuracy of the solution. I am unrepentant about posting an "answer" that does not deliver exactly what was requested, but instead warns users who might not be religiously committed to the Microsoft way in this statistical/mathematical arena. The lack of any effort on the part of the distributed R developers to provide a drop-in replacement for the Excel solver is something to ponder seriously.

Some relevant insights come from this post to R-help by a reputable statistical scientist:
The code in Excel is actually called GRG2 (the 2 does matter). Unlike any of the methods for optim(), it can handle nonlinear inequality constraints and does not need a feasible initial solution.
There's a blurb about it in the NEOS optimisation guide: http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide/Blurbs/grg2.html
Judging from this blurb, it will be similar to L-BFGS-B for problems with no constraints or box constraints.
-thomas
Thomas Lumley, Assoc. Professor, Biostatistics, tlumley at u.washington.edu, University of Washington, Seattle
So under some conditions it may be suitable to use optim like this in place of the Excel solver:
optim(pars,
      OptPars,
      ...,
      method = "L-BFGS-B")
Note that the NEOS optimisation guide is now here: http://neos-guide.org/content/optimization-guide and GRG2 is mentioned on this page: http://neos-guide.org/content/reduced-gradient-methods. It lists BFGS, CONOPT and several others as related algorithms, describing them as 'projected augmented Lagrangian' algorithms. According to the CRAN Optimization Task View, these algorithms can be found in nloptr, alabama and Rsolnp.
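Of those, Rsolnp is perhaps the closest functional match, since (like GRG2, and unlike optim) solnp() accepts nonlinear inequality constraints. A hedged sketch with a toy problem of my own, not from the thread:

library(Rsolnp)

obj  <- function(p) (p[1] - 1)^2 + (p[2] - 2)^2  # unconstrained optimum at (1, 2)
ineq <- function(p) p[1]^2 + p[2]^2              # nonlinear constraint function

# enforce 0 <= p1^2 + p2^2 <= 2, which cuts off the unconstrained optimum
res <- solnp(pars = c(0.5, 0.5), fun = obj,
             ineqfun = ineq, ineqLB = 0, ineqUB = 2)
res$pars  # lands on the constraint boundary, near c(0.63, 1.26)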
I've had good matches (to six sig figs) between the Excel solver and R using the optimx package, but YMMV.

Related

right R package for portfolio optimization using nonlinear constraints

I am currently looking at rewriting a commercial "black-box" portfolio optimiser (data in -> results out). I want to move away from it and use my own R version; so far I have two implementations working for my equality constraints, using "solve.QP" and "constrOptim".
My problem now is that the more I move towards nonlinear constraints (especially turnover limitations and transaction costs), the less information I find. It would be great if someone could recommend a package, best case an existing finance package, or else a more general mathematical one. The few packages I have come across along these lines so far are "nloptr", "fPortfolio" and sometimes "Rmetrics".
Any examples would also be highly appreciated.
thanks
Turnover constraints involve an absolute value, which can be linearized, so you can use your existing solver; a sketch follows below.
Linear transaction costs: same story. If your transaction costs have a fixed-cost structure, then things become more complicated; that may require an MIQP (mixed-integer quadratic programming) solver.
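Here is a minimal sketch of the variable-splitting trick (all numbers hypothetical, and Rglpk used purely for illustration): each trade w - w0 is split into buys b >= 0 and sells s >= 0, so that sum(|w - w0|) becomes sum(b) + sum(s), which is linear.

library(Rglpk)

mu  <- c(0.08, 0.12, 0.10)  # expected returns (made up)
w0  <- c(0.40, 0.30, 0.30)  # current weights
tau <- 0.10                 # maximum total turnover

n <- length(mu)
# decision vector: (w, b, s), length 3n; Rglpk's default bounds give
# every variable a lower bound of 0 (i.e., a long-only portfolio)
obj <- c(mu, rep(0, 2 * n))
A <- rbind(
  c(rep(1, n), rep(0, 2 * n)),       # sum(w) = 1
  cbind(diag(n), -diag(n), diag(n)), # w - b + s = w0, i.e. w - w0 = b - s
  c(rep(0, n), rep(1, 2 * n))        # sum(b) + sum(s) <= tau
)
dir <- c("==", rep("==", n), "<=")
rhs <- c(1, w0, tau)

sol <- Rglpk_solve_LP(obj, A, dir, rhs, max = TRUE)  # maximize expected return
w_new <- sol$solution[1:n]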

Finding nonlinear data dependencies

I have a multidimensional array of data (x1, x2, x3, ..., y). There is no information about the data's correlation, nature, or boundaries. I have performed some analyses to find linear dependence using regression, but nothing was found.
I would now like to try to find nonlinear dependence. I haven't found any information on how to perform such an analysis when I only have a portion of the data. Which methods and/or algorithms can I use to find dependence in the data?
The general topic you are looking for has various names. Search for "nonlinear regression" and "data mining" and "machine learning". I second the recommendation for Hastie & Tibshirani, "Elements of Statistical Learning". Brian Ripley also has a good book on the topic; I don't remember the title. There are probably many more good books.
If you can give more details about the problem, maybe someone has more specific advice. Probably it's better to take it to the StackExchange statistics forum rather than StackOverflow.
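One common first step (a hedged sketch; the data frame dat with columns x1..x3 and y is hypothetical) is to fit a smooth additive model and compare it against a linear fit; a clearly better smooth fit suggests nonlinear dependence.

library(mgcv)

fit_lin    <- gam(y ~ x1 + x2 + x3, data = dat)          # linear baseline
fit_smooth <- gam(y ~ s(x1) + s(x2) + s(x3), data = dat) # smooth terms

AIC(fit_lin, fit_smooth)  # lower AIC indicates the better-supported model
summary(fit_smooth)       # effective df well above 1 hints at nonlinearity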

Rules-of-thumb doc for mathematical programming in R?

Does there exist a simple, cheatsheet-like document which compiles the best practices for mathematical computing in R? Does anyone have a short list of their best-practices? E.g., it would include items like:
For large numerical vectors x, instead of computing x^2, one should compute x*x. This speeds up calculations.
To solve a system $Ax = b$, never compute $A^{-1}$ and left-multiply by $b$. Cheaper and more stable algorithms exist (e.g., Gaussian elimination); see the sketch below.
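For instance, a quick sketch of that second rule in R (the matrices are arbitrary):

# solve(A, b) factorizes A and back-solves, which is cheaper and more
# numerically stable than forming the inverse explicitly
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
b <- rnorm(3)
x_good <- solve(A, b)           # preferred: factorize, then back-solve
x_bad  <- solve(A) %*% b        # avoid: computes the inverse of A explicitly
all.equal(x_good, drop(x_bad))  # same answer here; the first is cheaper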
I did find a nice numerical analysis cheatsheet here. But I'm looking for something quicker, dirtier, and more specific to R.
@Dirk Eddelbuettel has posted a bunch of stuff on "high performance computing with R". He's also a regular, so he will probably come along and grab some well-deserved reputation points. While you are waiting, you can read some of his stuff here:
http://dirk.eddelbuettel.com/papers/ismNov2009introHPCwithR.pdf
There is an archive of the r-devel mailing list where discussions about numerical analysis issues relating to R performance occur. I will often put its URL in the Google advanced search page domain slot when I want to see what might have been said in the past: https://stat.ethz.ch/pipermail/r-devel/

Comparing R to Matlab for Data Mining

Instead of starting to code in Matlab, I recently started learning R, mainly because it is open source. I am currently working in the data mining and machine learning field. I have found many machine learning algorithms implemented in R, and I am still exploring different packages.
I have a quick question: how do you compare R to Matlab for data mining applications, in terms of popularity, pros and cons, industry and academic acceptance, etc.? Which one would you choose, and why?
I went through various comparisons of Matlab vs. R against various metrics, but I am specifically interested in an answer for their applicability in data mining and ML.
Since both languages are pretty new to me, I was just wondering whether R would be a good choice or not.
I appreciate any kind of suggestions.
For the past three years or so, I have used R daily, and the largest portion of that daily use is spent on machine learning/data mining problems.
I was an exclusive Matlab user while in university; at the time I thought it was an excellent set of tools/platform. I am sure it is today as well.
The Neural Network Toolbox, the Optimization Toolbox, the Statistics Toolbox, and the Curve Fitting Toolbox are each highly desirable (if not essential) for someone using MATLAB for ML/data mining work, yet they are all separate from the base MATLAB environment; in other words, they have to be purchased separately.
My Top 5 list for Learning ML/Data Mining in R:
Mining Association Rules in R
This refers to a couple of things. First, a group of R packages whose names all begin with arules (available from CRAN); you can find the complete list (arules, arulesViz, etc.) on the project homepage. Second, all of these packages are based on a data-mining technique known as Market-Basket Analysis, and alternatively as Association Rules. In many respects, this family of algorithms is the essence of data mining: exhaustively traverse large transaction databases and find above-average associations or correlations among the fields (variables or features) in those databases. In practice, you connect them to a data source and let them run overnight. The central R package in the set mentioned above is called arules; on the CRAN package page for arules, you will find links to a couple of excellent secondary sources (vignettes in R's lexicon) on the arules package and on the Association Rules technique in general. A minimal example follows.
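A minimal sketch using the Groceries transaction data set that ships with arules; the support and confidence thresholds are arbitrary:

library(arules)
data(Groceries)

# mine association rules with minimum support and confidence thresholds
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = "lift"), 3))  # show the top three rules by lift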
The standard reference, The Elements of Statistical Learning by Hastie et al.
The most current edition of this book is available in digital form for free. Likewise, at the book's website (linked to just above) are all data sets used in ESL, available for free download. (As an aside, I have the free digital version; I also purchased the hardback version from BN.com; all of the color plots in the digital version are reproduced in the hardbound version.) ESL contains thorough introductions to at least one exemplar from most of the major ML rubrics, e.g., neural networks, SVM, KNN; unsupervised techniques (LDA, PCA, MDS, SOM, clustering); numerous flavors of regression; CART; Bayesian techniques; as well as model aggregation techniques (boosting, bagging) and model tuning (regularization). Finally, get the R package that accompanies the book from CRAN (which will save the trouble of having to download and enter the data sets).
CRAN Task View: Machine Learning
The 3,500+ packages available for R are divided up by domain into about 30 package families or 'Task Views'. Machine Learning is one of these families. The Machine Learning Task View contains about 50 or so packages. Some of these packages are part of the core distribution, including e1071 (a sprawling ML package that includes working code for quite a few of the usual ML categories).
Revolution Analytics Blog
With particular focus on the posts tagged with Predictive Analytics
ML in R tutorial, comprising a slide deck and R code, by Josh Reich
A thorough study of the code would, by itself, be an excellent introduction to ML in R.
And one final resource that I think is excellent, but didn't make it into the top 5:
A Guide to Getting Started in Machine Learning [in R]
posted at the blog A Beautiful WWW
Please look at the CRAN Task Views and in particular at the CRAN Task View on Machine Learning and Statistical Learning which summarises this nicely.
Both Matlab and R are good if you are doing matrix-heavy operations, because they can use highly optimized low-level code (BLAS libraries and such) for this.
However, there is more to data mining than just crunching matrices. A lot of people totally neglect the whole data-organization aspect of data mining (as opposed to, say, plain machine learning).
And once you get to data organization, R and Matlab are a pain. Try implementing an R*-tree in R or Matlab to take an O(n^2) algorithm down to O(n log n) runtime. First, it totally goes against the way R and Matlab are designed (use bulk math operations wherever possible); second, it will kill your performance. Interpreted R code, for example, seems to run at around 50% of the speed of C code (try R's built-in k-means vs. flexclust's k-means), and the BLAS libraries are optimized to an insane level, exploiting cache sizes, data alignment, and advanced CPU features. If you are adventurous, try implementing a manual matrix multiplication in R or Matlab and benchmark it against the native one; a sketch follows.
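If you want to try that experiment, here is a rough sketch (sizes arbitrary; expect the naive version to be dramatically slower):

# naive triple-loop matrix multiplication, versus R's native %*%
# (which calls out to optimized BLAS routines)
naive_matmul <- function(A, B) {
  n <- nrow(A); m <- ncol(B); k <- ncol(A)
  C <- matrix(0, n, m)
  for (i in 1:n)
    for (j in 1:m)
      for (l in 1:k)
        C[i, j] <- C[i, j] + A[i, l] * B[l, j]
  C
}

set.seed(1)
A <- matrix(rnorm(200^2), 200, 200)
B <- matrix(rnorm(200^2), 200, 200)
system.time(naive_matmul(A, B))  # interpreted loops: seconds
system.time(A %*% B)             # BLAS-backed: milliseconds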
Don't get me wrong. There is a lot of stuff where R and Matlab are just elegant and excellent for prototyping. You can solve a lot of things in just 10 lines of code and get decent performance out of it. Writing the same thing by hand would take hundreds of lines and probably be 10x slower. But sometimes you can optimize away a whole order of complexity, which for large data sets beats even the optimized matrix operations of R and Matlab.
If you want to scale up to "Hadoop size" in the long run, you will have to think about data layout and organization too, unless all you need is a linear scan over the data. But then, you could just be sampling, too!
Yesterday I found two new books about data mining. This series of books, entitled 'Data Mining', addresses the need for a comprehensive text on the subject by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping the reader understand each section deeply, the two books present useful hints and strategies for solving problems in the later chapters. The books are: "New Fundamental Technologies in Data Mining" (http://www.intechopen.com/books/show/title/new-fundamental-technologies-in-data-mining) and "Knowledge-Oriented Applications in Data Mining" (http://www.intechopen.com/books/show/title/knowledge-oriented-applications-in-data-mining). These are open-access books, so you can download them for free or just read them on an online reading platform like I do. Cheers!
We should not forget the origins of these two pieces of software: scientific computation and signal processing led to Matlab, while statistics led to R.
I used Matlab a lot in university, since we had it installed on Unix and open to all students. However, the price of Matlab is too high, especially compared to free R. If your major focus is not matrix computation and signal processing, R should work well for your needs.
I think it also depends on which field of study you are in. I know of people in coastal research who use a lot of Matlab; using R in such a group would make your life more difficult. If a colleague has already solved a problem in Matlab, you can't reuse that solution from R.
I would also look at the capabilities of each when dealing with large amounts of data. I know that R can have problems with this, which might be restrictive if you are used to an iterative data-mining process, for example looking at multiple models concurrently. I don't know whether MATLAB has a similar data limitation.
I admit to favoring MATLAB for data mining problems, and I give some of my reasoning here:
Why MATLAB for Data Mining?
I will admit to only a passing familiarity with R/S-Plus, but I'll make the following observations:
R definitely has more of a statistical focus than MATLAB. I prefer building my own tools in MATLAB, so that I know exactly what they're doing, and I can customize them, but this is more of a necessity in MATLAB than it would be in R.
Code for new statistical techniques (spatial statistics, robust statistics, etc.) often appears early in S-Plus (I assume that this carries over to R, at least some).
Some years ago, I found S-Plus (R's commercial sibling; both implement the S language) to have an extremely limited capacity for data. I cannot say what the state of R/S-Plus is today, but you may want to check whether your data will fit into such tools comfortably.

Implementation of Particle Swarm Optimization Algorithm in R

I'm checking a simple moving-average crossover strategy in R. Instead of running a huge simulation over the 2-dimensional parameter space (length of the short-term moving average, length of the long-term moving average), I'd like to implement the Particle Swarm Optimization algorithm to find the optimal parameter values. I've been browsing the web and read that this algorithm is very effective. Moreover, the way the algorithm works fascinates me...
Does anybody of you guys have experience with implementing this algorithm in R? Are there useful packages that can be used?
Thanks a lot for your comments.
Martin
Well, there is a package available on CRAN called pso, and indeed it is a particle swarm optimizer (PSO).
I recommend this package.
It is under active development (last update 22 Sep 2010) and is consistent with the reference implementation for PSO. In addition, the package includes functions for diagnostics and plotting results.
It certainly appears to be a sophisticated package, yet the main function interface (the function psoptim) is straightforward: just pass in a few parameters that describe your problem domain, and a cost function.
More precisely, the key arguments to pass in when you call psoptim are:
a starting vector whose length sets the dimension of the problem (par);
lower and upper bounds for each variable (lower, upper); and
a cost function (fn).
There are other parameters in the psoptim method signature; those are generally related to convergence criteria and the like. A minimal call is sketched below.
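For instance, a hedged sketch on the Rastrigin function (a standard PSO test problem; the bounds are the conventional ones for it, and everything else is left at package defaults):

library(pso)

# Rastrigin function: many local minima, global minimum 0 at the origin
rastrigin <- function(x) 10 * length(x) + sum(x^2 - 10 * cos(2 * pi * x))

set.seed(42)
fit <- psoptim(par = rep(NA, 2),  # NA entries: random start; length = dimension
               fn = rastrigin,
               lower = -5.12, upper = 5.12)
fit$par    # best parameters found
fit$value  # objective value at the best parameters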
Are there any other PSO implementations in R?
There is an R package called ppso (for parallel PSO), available on R-Forge. I do not know anything about this package; I have downloaded it and skimmed the documentation, but that's it.
Beyond those two, I am not aware of any others. About three months ago, I looked for R implementations of the more popular meta-heuristics; pso is the only PSO implementation I am aware of. The R bindings to the GNU Scientific Library (GSL) have a simulated annealing algorithm, but none of the biologically inspired meta-heuristics.
The other place to look is of course the CRAN Task View for Optimization. I did not find another PSO implementation beyond what I've cited here, though there are quite a few packages listed there, and most of them I did not check beyond looking at the name and one-sentence summary.
