I have an objective function (a log-likelihood) that I want to maximize in R over a vector of parameters. I have the gradient of the function (i.e. the score vector), and I also happen to know its Hessian.
In Matlab I can maximize the function easily, and performance improves drastically when I supply both the gradient and the Hessian to the optimizer via optimset('GradObj', 'on') and optimset('Hessian', 'on'). In particular, the latter makes a huge difference in this case.
However, I want to do this in R. In optim, I can supply the gradient, but as far as I can tell the Hessian can only be requested as output (hessian = TRUE), not supplied as input.
My question: is there a straightforward way of including the Hessian in optimization problems in R, as there is in Matlab?
Related
I need to perform optimization using a custom function in R. For the sake of argument, my complicated function is
ComplicatedFunction <- function(X) X * exp(-X / cummax(X))
How would I, in a fast and automated fashion, extract the gradient and Hessian?
A simple example is mean squared error, MSE <- function(X){mean(X**2)}. The gradient is 2*X/length(X) and the Hessian is a diagonal matrix with 2/length(X) on the diagonal.
This is a more general question, somewhat independent of data, so I do not have a MWE.
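For a quick, hedged illustration (this example is mine, not the poster's): when no closed-form derivatives are available, the numDeriv package can compute numerical gradients and Hessians of an arbitrary R function at a given point, which covers the MSE example above.

    library(numDeriv)

    MSE <- function(X) mean(X^2)

    X0 <- c(1, 2, 3)
    grad(MSE, X0)     # numerical gradient at X0; analytically 2 * X0 / length(X0)
    hessian(MSE, X0)  # numerical Hessian at X0; analytically diag(2 / length(X0), 3)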
I often have functions fn(.) that implement algorithms that are not differentiable but that I want to optimize. I usually use optim(.) with its standard method, which works fine for me in terms of speed and results.
However, I now have a problem that requires me to set bounds on one of the several parameters of fn. From what I understand, optim(method="L-BFGS-B",...) allows me to set bounds on parameters but also requires a gradient. Because fn(.) is not a mathematical function but an algorithm, I suspect it does not have a gradient that I could derive through differentiation. This leads me to ask: is there a way of performing constrained optimization in R that does not require me to supply a gradient?
I have looked at some sources, e.g. John C. Nash's texts on this topic, but as far as I understand them, they mostly concern differentiable functions for which gradients can be supplied.
Summarizing the comments so far (which are all things I would have said myself):
you can use method="L-BFGS-B" without providing explicit gradients (the gr argument is optional); in that case, R will compute approximations to the derivative by finite differencing (#G.Grothendieck). It is the simplest solution, because it works "out of the box": you can try it and see if it works for you. However:
L-BFGS-B is probably the finickiest of the methods provided by optim() (e.g. it can't handle the case where a trial set of parameters evaluates to NA)
finite-difference approximations are relatively slow and numerically unstable (but, fine for simple problems)
for simple cases you can fit the parameter on a transformed scale, e.g. if b is a parameter that must be positive, you can use log_b as a parameter (and transform it via b <- exp(log_b) in your objective function). (#SamMason) But:
there isn't always a simple transformation that will achieve the constraint you want
if the optimal solution is on the boundary, transforming will cause problems
there are a variety of derivative-free optimizers that handle constraints (typically "box constraints", i.e. independent lower and/or upper bounds on one or more parameters) (#ErwinKalvelagen): dfoptim has a few, I have used the nloptr package (and its BOBYQA optimizer) extensively, and minqa has some as well. This is the solution I would recommend; a minimal sketch follows below.
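As a minimal sketch of that recommendation (the objective and bounds below are made up for illustration; check each package's help pages for the exact argument defaults), both dfoptim::nmkb() and nloptr::bobyqa() accept box constraints without any gradient:

    fn <- function(p) sum(abs(p - c(0.5, 2)))  # stand-in for a non-differentiable algorithm

    lower <- c(0, 0)
    upper <- c(1, 5)
    start <- c(0.9, 4)  # nmkb() expects a start strictly inside the bounds

    dfoptim::nmkb(par = start, fn = fn, lower = lower, upper = upper)

    nloptr::bobyqa(x0 = start, fn = fn, lower = lower, upper = upper)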
I use nlm to maximize a likelihood in R. I would like to predict the number of likelihood evaluations and abort if the task is likely to take too long. nlm returns the number of 'iterations' (typically 10-20), and I take it that each iteration involves one numerical evaluation of the Hessian. The time for each iteration (Hessian evaluation?) depends on the number of parameters. So I'd like to know: what is the general relationship between the number of parameters and the number of function evaluations per iteration in nlm?
The question is very general, and so is my answer.
From the reference documentation of nlm:
If the function value has an attribute called gradient or both
gradient and hessian attributes, these will be used in the calculation
of updated parameter values. Otherwise, numerical derivatives are
used. deriv returns a function with suitable gradient attribute and
optionally a hessian attribute.
If you provide the gradient and the Hessian for the function being minimized, each iteration involves two function evaluations. If you don't, the gradient and the Hessian are calculated numerically. The underlying source code of nlm can be consulted for the details. As far as I understand, the parameters of the R nlm function only affect the number of iterations until convergence, but have no influence on the way gradients are evaluated numerically.
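To make the quoted mechanism concrete, here is a minimal sketch (a toy quadratic, not the original likelihood): nlm() uses analytic derivatives when the objective's return value carries gradient and hessian attributes, so no numerical differentiation is needed inside each iteration.

    f <- function(x) {
      target <- c(1, 2)
      val <- sum((x - target)^2)
      attr(val, "gradient") <- 2 * (x - target)    # analytic gradient
      attr(val, "hessian")  <- diag(2, length(x))  # analytic Hessian (constant here)
      val
    }

    fit <- nlm(f, p = c(10, 10))
    fit$iterations  # the iteration count nlm reports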
I'm working on a simulation project with a 3-dimensional piece-wise constant function, and I'm trying to find the inputs that maximize the output. Using optim() in R with the Nelder-Mead or SANN algorithms seems best (they don't require the function to be differentiable), but I'm finding that optim() ends up returning my starting value exactly. This starting value was obtained using a grid search, so it's likely reasonably good, but I'd be surprised if it was the exact peak.
I suspect that optim() is not testing points far enough out from the initial guess, leading to a situation where all tested points give the same output.
Is this a reasonable concern?
How can I tweak the breadth of values that optim() is testing as it searches?
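For what it's worth, here is a hedged sketch of two things worth trying (the objective below is a hypothetical stand-in for the real piece-wise constant function): method = "SANN" with larger temp and tmax values samples more widely, and restarting Nelder-Mead from several jittered starting values helps when the initial simplex sits entirely on one flat plateau.

    f <- function(p) -floor(10 * exp(-sum((p - 1)^2)))  # toy piece-wise constant objective

    start <- c(0.2, 0.4, 0.1)

    # simulated annealing with a hotter, longer search
    optim(start, f, method = "SANN",
          control = list(temp = 20, tmax = 20, maxit = 5000))

    # multi-start Nelder-Mead from jittered starting points
    set.seed(1)
    starts <- replicate(10, start + rnorm(length(start), sd = 0.5), simplify = FALSE)
    fits   <- lapply(starts, function(s) optim(s, f, method = "Nelder-Mead"))
    fits[[which.min(sapply(fits, `[[`, "value"))]]$par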
I have a complex objective function I am looking to optimize, and the optimization takes considerable time. Fortunately, I have both the gradient and the Hessian of the function available.
Is there an optimization package in R that can take all three of these inputs? optim() does not accept the Hessian. I have scanned the CRAN task view for optimization and nothing pops out.
For what it's worth, I am able to perform the optimization in MATLAB using 'fminunc' with the 'GradObj' and 'Hessian' options.
I think the trust package, which does trust-region optimization, will do the trick. From the documentation of trust, you see that
This function carries out a minimization or maximization of a function
using a trust region algorithm... (it accepts) an R function that
computes value, gradient, and Hessian of the function to be minimized
or maximized and returns them as a list with components value,
gradient, and hessian.
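To illustrate, here is a minimal sketch of how trust expects the objective (a toy quadratic, not a real likelihood; rinit and rmax are the initial and maximal trust-region radii, and ?trust documents the switch between minimization and maximization):

    library(trust)

    objfun <- function(x) {
      target   <- c(1, 2)
      value    <- sum((x - target)^2)
      gradient <- 2 * (x - target)
      hessian  <- diag(2, length(x))
      list(value = value, gradient = gradient, hessian = hessian)
    }

    trust(objfun, parinit = c(10, 10), rinit = 1, rmax = 5)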
In fact, I think it uses the same algorithm used by fminunc.
By default fminunc chooses the large-scale algorithm if you supply the
gradient in fun and set GradObj to 'on' using optimset. This algorithm
is a subspace trust-region method and is based on the
interior-reflective Newton method described in [2] and [3]. Each
iteration involves the approximate solution of a large linear system
using the method of preconditioned conjugate gradients (PCG). See
Large Scale fminunc Algorithm, Trust-Region Methods for Nonlinear
Minimization and Preconditioned Conjugate Gradient Method.
Both stats::nlm() and stats::nlminb() take analytical gradients and Hessians. Note, however, that the former (nlm()) currently does not update the analytical gradient correctly, but this is fixed in the current development version of R (since R-devel, svn rev 72555).
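A minimal sketch with nlminb() (again a toy quadratic rather than a real likelihood): the objective, gradient, and Hessian are passed as separate functions of the parameter vector.

    obj <- function(x) sum((x - c(1, 2))^2)
    grd <- function(x) 2 * (x - c(1, 2))
    hes <- function(x) diag(2, length(x))

    nlminb(start = c(10, 10), objective = obj, gradient = grd, hessian = hes)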