This question already has an answer here:
What is the closest function to R's nlminb in Python?
I'm using the gnls function of the nlme package to fit a curve. When I tried to find out which optimizer it uses, I was directed to the nlminb function documentation, which states:
Unconstrained and box-constrained optimization using PORT routines.
I don't know what "PORT routines" are. Is it a family of optimization algorithms, or a single algorithm called "PORT routines"?
Can anyone at least name some of the routines? For example, "gradient descent", "Levenberg–Marquardt", or "trust region"?
Thanks in advance!!
nlminb is an unconstrained and box-constrained quasi-Newton optimizer. The code is based on the FORTRAN PORT library by David Gay at Bell Labs, designed to be portable across different types of computers (from comments by Erwin Kalvelagen). See more here (Section 2).
L-BFGS-B and BFGS, being members of the quasi-Newton family of methods, are the closest analogs of nlminb's "adaptive nonlinear least-squares algorithm".
You can see the original report, An Adaptive Nonlinear Least-Squares Algorithm by J.E. Dennis, Jr., David M. Gay, and Roy E. Welsch (thanks to Ben Bolker's comment).
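If you want to see how the PORT-based optimizer compares with a quasi-Newton method in base R, here is a minimal sketch; the Rosenbrock function is only an illustrative objective, not anything specific to gnls/nlme:

rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

# PORT routines (what nlminb, and hence gnls, uses under the hood)
fit_port <- nlminb(start = c(-1.2, 1), objective = rosenbrock,
                   lower = c(-2, -2), upper = c(2, 2))

# quasi-Newton with box constraints, the closest analog in optim()
fit_lbfgsb <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "L-BFGS-B",
                    lower = c(-2, -2), upper = c(2, 2))

fit_port$par    # PORT solution
fit_lbfgsb$par  # L-BFGS-B solution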
I've read elsewhere that the VGAM package can be used to model underdispersed count data via the genpoisson families. However, when I look up the help files for genpoisson0, genpoisson1, and genpoisson2, they all say the following:
"In theory the λ parameter is allowed to be negative to handle underdispersion, however this is no longer supported, hence 0 < λ < 1."
"In theory the \varphi parameter might be allowed to be less than unity to handle underdispersion but this is not supported."
"In theory the α parameter might be allowed to be negative to handle underdispersion but this is not supported."
Where can I go to handle underdispersion?
You can use quasi-likelihood methods, e.g. family = quasipoisson in glm() (in base R).
The glmmTMB package supports COM-Poisson (family = compois) and generalized Poisson (family = genpois) conditional distributions. (A short sketch of both options follows below.)
It's not clear to me whether the reasons discussed (briefly) here for why underdispersed generalized Poisson distributions are no longer supported in VGAM also apply to the implementation in glmmTMB ...
There is some discussion of the glmmTMB parameterizations/implementations of COM-Poisson and generalized Poisson in Brooks et al (2019).
Brooks, Mollie E., Kasper Kristensen, Maria Rosa Darrigo, Paulo Rubim, María Uriarte, Emilio Bruna, and Benjamin M. Bolker. “Statistical Modeling of Patterns in Annual Reproductive Rates.” Ecology 100, no. 7 (2019): e02706. https://doi.org/10.1002/ecy.2706.
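A minimal sketch of the two options above (assuming the glmmTMB package is installed); the toy binomial counts are just a convenient underdispersed example:

set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, size = 10, prob = plogis(d$x))  # binomial counts are underdispersed relative to Poisson

# quasi-likelihood in base R: the dispersion parameter is estimated freely
m_quasi <- glm(y ~ x, family = quasipoisson, data = d)
summary(m_quasi)$dispersion   # values < 1 indicate underdispersion

# glmmTMB with COM-Poisson or generalized Poisson conditional distributions
library(glmmTMB)
m_compois <- glmmTMB(y ~ x, family = compois, data = d)
m_genpois <- glmmTMB(y ~ x, family = genpois, data = d)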
I am trying to optimize the averaged prediction of two logistic regressions in a classification task using a superlearner.
My measure of interest is classif.auc
The mlr3 help file tells me (?mlr_learners_avg)
Predictions are averaged using weights (in order of appearance in the
data) which are optimized using nonlinear optimization from the
package "nloptr" for a measure provided in measure (defaults to
classif.acc for LearnerClassifAvg and regr.mse for LearnerRegrAvg).
Learned weights can be obtained from $model. Using non-linear
optimization is implemented in the SuperLearner R package. For a more
detailed analysis the reader is referred to LeDell (2015).
I have two questions regarding this information:
When I look at the source code I think LearnerClassifAvg$new() defaults to "classif.ce", is that true?
I think I could set it to classif.auc with param_set$values <- list(measure="classif.auc",optimizer="nloptr",log_level="warn")
The help file refers to the SuperLearner package and LeDell (2015). If I understand correctly, the "AUC-Maximizing Ensembles through Metalearning" approach proposed in that paper is, however, not implemented in mlr3? Or am I missing something? Could this solution be applied in mlr3? In the mlr3 book I found a paragraph about calling an external optimization function; would that be possible for SuperLearner?
As far as I understand it, LeDell (2015) proposes and evaluates a general strategy that optimizes AUC as a black-box function by learning optimal weights. The paper does not really propose a single best strategy or any concrete defaults, so I looked into the defaults of the SuperLearner package's AUC optimization strategy.
Assuming I understood the paper correctly:
The LearnerClassifAvg basically implements what is proposed in LeDell (2015): it optimizes the weights for any metric using non-linear optimization, while LeDell (2015) focuses on the special case of optimizing AUC. As you rightly pointed out, by setting the measure to "classif.auc" you get a meta-learner that optimizes AUC. The default optimization routine differs between mlr3pipelines and the SuperLearner package: we use NLOPT_LN_COBYLA, whereas SuperLearner ... "uses the Nelder-Mead method via the optim function to minimize rank loss" (from the documentation).
So in order to get exactly the same behaviour, you would need to implement a Nelder-Mead bbotk::Optimizer similar to here that simply wraps stats::optim with method Nelder-Mead, and carefully compare settings and stopping criteria. I am fairly confident that NLOPT_LN_COBYLA delivers somewhat comparable results; LeDell (2015) has a comparison of the different optimizers for further reference.
Thanks for spotting the error in the documentation. I agree that the description is a little unclear and I will try to improve it!
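For reference, a minimal, untested sketch of the setting discussed above; it assumes LearnerClassifAvg from mlr3pipelines and that measure, optimizer and log_level are valid parameter ids in your installed version:

library(mlr3)
library(mlr3pipelines)

# weighted-average ensemble learner whose weights are found by non-linear optimization
avg = LearnerClassifAvg$new()
avg$predict_type = "prob"            # probability predictions are needed for AUC
avg$param_set$values = list(
  measure   = "classif.auc",         # optimize the weights for AUC instead of the default
  optimizer = "nloptr",              # non-linear optimization backend (NLOPT_LN_COBYLA)
  log_level = "warn"
)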
Which algorithm does R use for computing the one-class SVM? This is the function:
e1071::svm(..., type = "one-classification", ...)
I have found this very nice blog post, where the author writes about two algorithms for one-class SVM. Which one does the function above use?
You can see the following link:
https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
The link shows the dual problem formulation of the SVM algorithm this package uses (for the one-class SVM, see page 7, equation (3)); an easy transformation from the dual to the primal problem shows that this default implementation is the one Schölkopf suggested, see the paper:
https://www.stat.purdue.edu/~yuzhu/stat598m3/Papers/NewSVM.pdf
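A minimal sketch of fitting that one-class SVM in R (assuming the e1071 package is installed); the iris measurements are just an example data set:

library(e1071)

x <- as.matrix(iris[, 1:4])
fit <- svm(x, type = "one-classification", kernel = "radial", nu = 0.05)
inlier <- predict(fit, x)   # logical vector: TRUE = inside the estimated support
table(inlier)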
What is the weights argument for in the R gbm function? Does it implement cost-sensitive stochastic gradient boosting?
You may have already read this, but the documentation says that the weights parameter is defined in this way:
an optional vector of weights to be used in the fitting process. Must
be positive but do not need to be normalized. If keep.data=FALSE in
the initial call to gbm then it is the user’s responsibility to
resupply the weights to gbm.more.
Thus my interpretation would be that they are standard observation weights as in any statistical model.
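As a minimal sketch of that interpretation (assuming the gbm package is installed), a per-observation weights vector can simply be passed to the weights argument; the toy data and the choice of weights here are purely illustrative:

library(gbm)

set.seed(1)
d <- data.frame(x = runif(200))
d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)
w <- ifelse(d$x > 0.5, 2, 1)   # up-weight half of the observations

fit <- gbm(y ~ x, data = d, distribution = "gaussian", weights = w,
           n.trees = 500, interaction.depth = 2, shrinkage = 0.05)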
Is it cost-sensitive? Good question. I first noticed that one of the main citations for the package is:
B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a Quantitative Regression Framework.
so I figured it does imply cost-sensitivity, but there is no explicit use of that term in the vignette, so it was not immediately apparent.
I did a little bit of a deeper dive though and found some more resources. You can find the equations describing the weights towards the end of this article which describes the package.
I also found this question being asked way back in 2009 on a mailing list, and while there was no response there, I finally found a scholarly article discussing the use of gbm and other R packages for cost-sensitive gradient boosting.
The conclusion is that gbm's quantile loss function is differentiable and can be used in cost-sensitive applications wherein over/under-estimation have different error costs, however other quantitative loss functions (aside from quantile) may be necessary/appropriate in some applications of cost-sensitive gradient boosting.
That paper is centered around gbm but also discusses other packages, and if your focus is on cost-sensitive gradient boosting you may want to look at the others mentioned in the paper as well.
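To make the quantile-loss point concrete, a small, untested sketch (again assuming the gbm package): choosing alpha > 0.5 penalizes under-estimation more heavily than over-estimation, which is one way to encode asymmetric error costs:

library(gbm)

set.seed(1)
d <- data.frame(x = runif(200))
d$y <- sin(2 * pi * d$x) + rnorm(200, sd = 0.3)

# quantile loss with alpha = 0.75: under-predictions are penalized three times
# as heavily as over-predictions
fit_q <- gbm(y ~ x, data = d,
             distribution = list(name = "quantile", alpha = 0.75),
             n.trees = 500, interaction.depth = 2, shrinkage = 0.05)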
Does anyone know which R package has an implementation of the Generalized Reduced Gradient (GRG2) algorithm? Thanks.
Since @BenBolker has done the initial footwork in finding what sort of functionality you were hoping to replicate, I'm posting a follow-up that might be useful. A recent exchange on R-help ended with a quote that was nominated for the R fortunes package, although it is not clear to me whether it was accepted:
"The idea that the Excel solver "has a good reputation for being fast
and accurate" does not withstand an examination of the Excel solver's
ability to solve the StRD nls test problems. ...
Excel solver does have the virtue that it will always produce an
answer, albeit one with zero accurate digits."
"I am unaware of R being applied to the StRD, but I did apply S+ to the
StRD and, with analytic derivatives, it performed flawlessly."
From: Bruce McCullough <bdmccullough@drexel.edu>
Date: February 20, 2013 7:58:24 AM PST
Here is a link to the self-cited work documenting the failures of the Excel Solver (which we now know is powered by some version of the GRG2 algorithm) by McCullough:
www.pages.drexel.edu/~bdm25/chap8.pdf, and the links to the NIST website for the test problems are here: http://www.itl.nist.gov/div898/strd/nls/nls_info.shtml and http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml
The negative comment (brought to my attention by a downvote) from @jwg prompted me to redo the search suggested by Bolker. Still no hits for findFn("GRG2"). I can report several hits for "GRG", none of them apparently to a solver, and was amused that one of them has the catchy expansion "General Random Guessing model". That seemed particularly amusing since the thrust of my arguably non-answer was that choosing to use Excel's solver leaves one genuinely uncertain about the accuracy of the solution. I am unrepentant about posting an "answer" that does not deliver exactly what was requested, but instead warns users who might not be religiously committed to the Microsoft way in this statistical/mathematical arena. The lack of any effort on the part of the distributed R developers to provide a drop-in replacement for the Excel solver is something to ponder seriously.
Some relevant insights come from this post to R-help by a reputable statistical scientist:
The code in Excel is actually called GRG2 (the 2 does matter). Unlike
any of the methods for optim(), it can handle nonlinear inequality
constraints and does not need a feasible initial solution.
There's a blurb about it in the NEOS optimisation guide:
http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide/Blurbs/grg2.html
Judging from this blurb, it will be similar to L-BFGS-B for problems
with no constraints or box constraints.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics tlumley at
u.washington.edu University of Washington, Seattle
So under some conditions it may be suitable to use optim like this in place of the Excel solver:
optim(pars,
OptPars,
... ,
method = "L-BFGS-B")
Note that the NEOS optimisation guide is now here: http://neos-guide.org/content/optimization-guide and GRG2 is mentioned on this page: http://neos-guide.org/content/reduced-gradient-methods. It lists BFGS, CONOPT and several others as related algorithms, and describes these as 'projected augmented Lagrangian' algorithms. According to the Optimization CTV, such algorithms can be found in nloptr, alabama and Rsolnp.
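If the appeal of GRG2 is its handling of nonlinear inequality constraints (which optim lacks), here is a minimal sketch with one of those packages, nloptr, using the derivative-free COBYLA algorithm; the objective and constraint are toy examples:

library(nloptr)

objective  <- function(x) (x[1] - 1)^2 + (x[2] - 2)^2
constraint <- function(x) x[1]^2 + x[2]^2 - 2   # feasible region: x1^2 + x2^2 <= 2

res <- nloptr(
  x0 = c(0, 0),
  eval_f = objective,
  eval_g_ineq = constraint,   # inequality constraints are of the form g(x) <= 0
  opts = list(algorithm = "NLOPT_LN_COBYLA", xtol_rel = 1e-8, maxeval = 1000)
)
res$solution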
I've had good matches (to six sig figs) between the Excel solver and R using the optimx package, but YMMV.
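A short sketch of that kind of cross-check (assuming the optimx package is installed), running several optimizers, including L-BFGS-B and the PORT-based nlminb, on the same toy problem so their solutions can be compared:

library(optimx)

rosenbrock <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
optimx(par = c(-1.2, 1), fn = rosenbrock, method = c("L-BFGS-B", "nlminb"))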