I am experimenting with two methods for minimizing a cost function: optim() and optim_nm(), the latter from the optimization package. The problem is that my error function takes two arguments:
A list of variable parameters the optimization function needs to modify
A set of fixed parameters
optim(par = variableParameters,fn = error_function, par2 = fixedParameters):
optim handles this well: it takes the variable parameters first, then the function, and then a set of optional arguments through which I can pass the fixed parameters. This works, but the optimization is slow.
optim_nm(fun = error_function,k=5,start = variable_parameters)
optim_nm allows me to tune the optimization, but I'm unsure how to pass the fixed parameters. All the examples in the documentation use only variable parameters.
Both methods implement the Nelder-Mead algorithm, which is robust for nondifferentiable error functions, and that is what I require. If there are other packages that do this fast, please mention them too!
If someone has used this, or can better interpret the documentation I could use your help.
optim_nm Documentation
optim documentation
Create a wrapper function that fills in the values for the fixed parameters:
error_function <- function(variableParameters, fixedParameters) {
...
}
wrapper <- function(x) {
error_function(x, fixedParameters = 3)
}
optim_nm(fun = wrapper,
k = 5,
start = initial_parameter_values)
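As a minimal runnable sketch of the same idea, the snippet below uses base R's optim() so it runs without extra packages. The error function and the fixed target vector are hypothetical, chosen only for illustration. It shows both routes: passing the fixed parameters through optim's `...`, and capturing them in a wrapper, which is what an optimizer that only calls fun(x) needs.

```r
# Hypothetical error function: squared distance between the variable
# parameters and a fixed target vector.
error_function <- function(variableParameters, fixedParameters) {
  sum((variableParameters - fixedParameters)^2)
}

fixed <- c(1, 2)

# Route 1: optim() forwards extra named arguments to fn through `...`.
res_dots <- optim(par = c(0, 0), fn = error_function, fixedParameters = fixed)

# Route 2: a wrapper that captures `fixed`, so the optimizer only ever
# needs to call wrapper(x).
wrapper <- function(x) error_function(x, fixedParameters = fixed)
res_wrap <- optim(par = c(0, 0), fn = wrapper)

res_dots$par
res_wrap$par
```

Both calls run the same Nelder-Mead search, so the wrapper costs nothing beyond one extra function call per evaluation.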
If error_function is expensive to evaluate, you may want to look into Bayesian optimization with the rBayesianOptimization or mlrMBO packages.
Looking at the Flux.jl docs, I see there a ton of built in loss functions: https://fluxml.ai/Flux.jl/stable/models/losses/. My question is how can I define and use my own loss function in Flux if I want something more esoteric?
You can use any differentiable function that returns a single float value as your loss. As stated in the comment above, the prepared functions are just for your convenience.
You can pass anything e.g.
using Flux
yourcustomloss(ŷ, y) = sum(.- sum(y .* logsoftmax(ŷ), dims = 1))
and then calculate its gradient or pass it to the train! function.
I have a function that I am optimizing using the optimx function in R (I'm also open to using optim, since I'm not sure it will make a difference for what I'm trying to do). I have a gradient that I am passing to optimx for (hopefully) faster convergence compared to not using a gradient. Both the function and the gradient use many of the same quantities that are computed from each new parameter set. One of these quantities in particular is very computationally costly, and it's redundant to have to compute this quantity twice for each iteration - once for the function, and again for the gradient. I'm trying to find a way to compute this quantity once, then pass it to the function and the gradient.
So here is what I am doing. So far this works, but it is inefficient:
optfunc<-function(paramvec){
quant1<-costlyfunction(paramvec)
#costlyfunction is a separate function that takes a while to run
loglikelihood<-sum(quant1)**2
#not really squared, but the log likelihood uses quant1 in its calculation
return(loglikelihood)
}
optgr<-function(paramvec){
quant1<-costlyfunction(paramvec)
mygrad<-sum(quant1) #again not the real formula, just for illustration
return(mygrad)
}
optimx(par=paramvec,fn=optfunc,gr=optgr,method="BFGS")
I am trying to find a way to calculate quant1 only once with each iteration of optimx. It seems the first step would be to combine fn and gr into a single function. I thought the answer to this question may help me, and so I recoded the optimization as:
optfngr<-function(){
quant1<-costlyfunction(paramvec)
optfunc<-function(paramvec){
loglikelihood<-sum(quant1)**2
return(loglikelihood)
}
optgr<-function(paramvec){
mygrad<-sum(quant1)
return(mygrad)
}
return(list(fn = optfunc, gr = optgr))
}
do.call(optimx, c(list(par=paramvec,method="BFGS",optfngr() )))
Here, I receive the error: "Error in optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower, : Cannot evaluate function at initial parameters." Of course, there are obvious problems with my code here. So, I'm thinking answering any or all of the following questions may shed some light:
I passed paramvec as the only arguments to optfunc and optgr so that optimx knows that paramvec is what needs to be iterated over. However, I don't know how to pass quant1 to optfunc and optgr. Is it true that if I try to pass quant1, then optimx will not properly identify the parameter vector?
I wrapped optfunc and optgr into one function, so that the quantity quant1 will exist in the same function space as both functions. Perhaps I can avoid this if I can find a way to return quant1 from optfunc, and then pass it to optgr. Is this possible? I'm thinking it's not, since the documentation for optimx is pretty clear that the function needs to return a scalar.
I'm aware that I might be able to use the dots arguments to optimx as extra parameter arguments, but I understand that these are for fixed parameters, and not arguments that will change with each iteration. Unless there is also a way to manipulate this?
Thanks in advance!
Your approach is close to what you want, but not quite right. You want to call costlyfunction(paramvec) from within optfunc(paramvec) or optgr(paramvec), but only when paramvec has changed. Then you want to save its value in the enclosing frame, along with the value of paramvec that was used to compute it. That is, something like this:
optfngr<-function(){
quant1 <- NULL
prevparam <- NULL
updatecostly <- function(paramvec) {
if (!identical(paramvec, prevparam)) {
quant1 <<- costlyfunction(paramvec)
prevparam <<- paramvec
}
}
optfunc<-function(paramvec){
updatecostly(paramvec)
loglikelihood<-sum(quant1)**2
return(loglikelihood)
}
optgr<-function(paramvec){
updatecostly(paramvec)
mygrad<-sum(quant1)
return(mygrad)
}
return(list(fn = optfunc, gr = optgr))
}
do.call(optimx, c(list(par=paramvec,method="BFGS"),optfngr() ))
I used <<- to make assignments to the enclosing frame, and fixed up your do.call second argument.
Doing this is called "memoization" (or "memoisation" in some locales; see http://en.wikipedia.org/wiki/Memoization), and there's a package called memoise that does it. It keeps track of lots of (or all of?) the previous results of calls to costlyfunction, so would be especially good if paramvec only takes on a small number of values. But I think it won't be so good in your situation because you'll likely only make a small number of repeated calls to costlyfunction and then never use the same paramvec again.
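To see the single-value cache at work, here is a small self-contained sketch that uses base R's optim() instead of optimx so it runs without extra packages. The costlyfunction below is a hypothetical stand-in that counts its own calls, and the objective and gradient are made up for illustration; they are not the asker's real likelihood.

```r
# Hypothetical stand-in for the expensive computation; counts its own calls.
n_costly <- 0
costlyfunction <- function(p) {
  n_costly <<- n_costly + 1
  p - c(1, 2)
}

make_fngr <- function() {
  quant1 <- NULL
  prevparam <- NULL
  # Recompute quant1 only when the parameter vector has changed.
  update <- function(p) {
    if (!identical(p, prevparam)) {
      quant1 <<- costlyfunction(p)
      prevparam <<- p
    }
  }
  list(
    fn = function(p) { update(p); sum(quant1^2) },  # toy objective
    gr = function(p) { update(p); 2 * quant1 }      # its exact gradient
  )
}

fg <- make_fngr()
res <- optim(par = c(0, 0), fn = fg$fn, gr = fg$gr, method = "BFGS")

# Compare how often the costly function actually ran with the number of
# fn and gr evaluations that optim reports.
c(costly = n_costly, requested = sum(res$counts))
```

The saving depends on how often the optimizer asks for the function and the gradient at the same point, so the ratio will differ between methods and problems.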
I'm using the function nls.lm {package: minpack.lm} to optimize a parameterisation for a hydrological model. The function is working quite well, but I want to use another objective function (OF). Normally, the objective function "fn" in nls.lm is defined as
A function that returns a vector of residuals, the sum square of which
is to be minimized. The first argument of fn must be par.
Now I want to use the Nash-Sutcliffe efficiency, which is defined as
NSE <- 1 - (sum((obs - sim)^2) / sum((obs - mean(obs))^2))
or another OF. The problem is that nls.lm minimizes the expression sum(x^2), and only x is modifiable. I know that the best fit has NSE = 1. Thus 1 - NSE creates a real minimization problem.
BTW: Example 1 from a nls.lm help page is a good example; there
observed - getPred(p,xx)
is returned as the residual vector, which actually means that
sum((observed - getPred(p,xx))^2)
is minimized by the nls.lm function, where getPred(p,xx) returns sim.
Any suggestion would be helpful. Thanks in advance.
Micha
nls.lm (and the related functions nls and nlsLM) are designed to minimize the sum of the squared residuals. For the problem you seek to solve, I would try a general-purpose minimizer.
If the problem is 'not too hard' (that is, has a single global minimum, is smooth), you could try to apply 'optim' to it (I would try the 'Nelder-Mead' and 'BFGS' options first), or the 'bobyqa' function from the package 'minqa', among other functions.
If the problem requires a global optimizer, you could try the 'GenSA' function from package 'GenSA', the 'genoud' function from the package 'rgenoud', or the 'DEoptim' function from package 'DEoptim', among other options. A review on 'Global Optimization in R' is forthcoming in the Journal of Statistical Software, and that will give a more complete overview of applicable functions.
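As a rough sketch of the general-purpose route, the snippet below minimizes 1 - NSE with base R's optim() and the Nelder-Mead method. The exponential model, the synthetic observations, and the starting values are all assumptions made for illustration; getPred and xx only echo the names used in the question.

```r
# Nash-Sutcliffe efficiency; 1 means a perfect fit.
nse <- function(obs, sim) 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)

# Hypothetical model and noise-free synthetic observations.
getPred <- function(p, xx) p[1] * exp(-p[2] * xx)
xx <- seq(0, 10, by = 0.5)
obs <- getPred(c(2, 0.3), xx)

# Objective for a general-purpose minimizer: 1 - NSE, which is 0 at a
# perfect fit.
objective <- function(p) 1 - nse(obs, getPred(p, xx))

fit <- optim(par = c(1, 0.1), fn = objective, method = "Nelder-Mead")
fit$par
nse(obs, getPred(fit$par, xx))
```

Because the denominator of NSE depends only on obs, 1 - NSE is the sum of squared residuals divided by a constant, so for NSE itself the minimizer is the same as for plain least squares. The general-purpose approach becomes necessary for objective functions that cannot be written as a sum of squares.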
Suppose you have a function f<- function(x,y,z) { ... }. How would you go about passing a constant to one argument, but letting the other ones vary? In other words, I would like to do something like this:
output <- outer(x,y,f(x,y,z=2))
This code doesn't evaluate, but is there a way to do this?
outer(x, y, f, z=2)
The arguments after the function are passed on to it as additional arguments; see ... in ?outer. This syntax is very common in R. The whole apply family works the same way, for instance.
Update:
I can't tell exactly what you want to accomplish in your follow-up question, but I think a solution of this form is probably what you should use.
outer(sigma_int, theta_int, function(s,t)
dmvnorm(y, rep(0, n), y_mat(n, lambda, t, s)))
This calculates a variance matrix for each combination of the values in sigma_int and theta_int, uses that matrix to define a density, and evaluates it at the point(s) defined in y. I haven't been able to test it, though, since I don't know the types and dimensions of the variables involved.
outer (along with the apply family of functions and others) will pass along extra arguments to the functions which they call. However, if you are dealing with a case where this is not supported (optim being one example), then you can use the more general approach of currying. To curry a function is to create a new function which has (some of) the variables fixed and therefore has fewer parameters.
library("functional")
output <- outer(x,y,Curry(f,z=2))
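The same effect is available in base R without the functional package by writing the curried function by hand as an anonymous function. The sketch below uses a hypothetical vectorized f and checks that the hand-written version agrees with outer's own pass-through of extra arguments.

```r
# Hypothetical vectorized function of three arguments.
f <- function(x, y, z) x + y * z

x <- 1:3
y <- 1:2

# Fix z = 2 by hand; x and y still vary.
f_z2 <- function(x, y) f(x, y, z = 2)
out_closure <- outer(x, y, f_z2)

# Same result through outer's `...` pass-through.
out_dots <- outer(x, y, f, z = 2)

identical(out_closure, out_dots)
```

Note that outer calls the function once with full vectors rather than once per pair, so f must be vectorized in x and y either way.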
I've always had trouble understanding the documentation on how S3 methods are called, and this time it's coming back to bite me.
I'll apologize up front for asking more than one question, but they are all closely related. Deep in the heart of a complex set of functions, I create a lot of glmnet fits, in particular logistic ones. Now, glmnet documentation specifies its return value to have both classes "glmnet" and (for logistic regression) "lognet". In fact, these are specified in this order.
However, looking at the end of the implementation of glmnet, right after the call to (the internal function) lognet, which sets the class of fit to "lognet", I see this line of code just before the return (of the variable fit):
class(fit) = c(class(fit), "glmnet")
From this, I would conclude that the order of the classes is in fact "lognet", "glmnet".
Unfortunately, the fit I had, had (like the doc suggests):
> class(myfit)
[1] "glmnet" "lognet"
The problem with this is the way S3 methods are dispatched for it, in particular predict. Here's the code for predict.lognet:
function (object, newx, s = NULL, type = c("link", "response",
"coefficients", "class", "nonzero"), exact = FALSE, offset,
...)
{
type = match.arg(type)
nfit = NextMethod("predict") #<- supposed to call predict.glmnet, I think
switch(type, response = {
pp = exp(-nfit)
1/(1 + pp)
}, class = ifelse(nfit > 0, 2, 1), nfit)
}
I've added a comment to explain my reasoning. Now when I call predict on this myfit with a new datamatrix mydata and type="response", like this:
predict(myfit, newx=mydata, type="response")
I do not get the predicted probabilities that the documentation promises, but the linear combinations, which is exactly what calling predict.glmnet directly would return.
I've tried reversing the order of the classes, like so:
orgclass<-class(myfit)
class(myfit)<-rev(orgclass)
And then doing the predict call again: lo and behold: it works! I do get the probabilities.
So, here come some questions:
Am I right in 'having learned' that S3 methods are dispatched in order of appearance of the classes?
Am I right in assuming the code in glmnet would cause the wrong order for correct dispatching of predict?
In my code there is nothing that manipulates classes explicitly/visibly to my knowledge. What could cause the order to change?
For completeness' sake: here's some sample code to play around with (as I'm doing myself now):
library(glmnet)
y<-factor(sample(2, 100, replace=TRUE))
xs<-matrix(runif(100), ncol=1)
colnames(xs)<-"x"
myfit<-glmnet(xs, y, family="binomial")
mydata<-matrix(runif(10), ncol=1)
colnames(mydata)<-"x"
class(myfit)
predict(myfit, newx=mydata, type="response")
class(myfit)<-rev(class(myfit))
class(myfit)
predict(myfit, newx=mydata, type="response")
class(myfit)<-rev(class(myfit))#set it back
class(myfit)
Depending on the data generated, the difference is more or less obvious (in my true dataset I noticed negative values in the so called probabilities, which is how I picked up the problem), but you should indeed see a difference.
Thanks for any input.
Edit:
I just found out the horrible truth: either order worked in glmnet 1.5.2 (which is present on the server where I ran the actual code, resulting in the fit with the class order reversed), but the code from 1.6 requires the order to be "lognet", "glmnet". I have yet to check what happens in 1.7.
Thanks to @Aaron for reminding me of the basics of informatics (besides 'if all else fails, restart': 'check your versions'; I had mistakenly assumed that a package by the gods of statistical learning would be protected from this type of error), and to @Gavin for confirming my reconstruction of how S3 works.
Yes, the order of dispatch is in the order in which the classes are listed in the class attribute. In the simple, every-day case, yes, the first stated class is the one chosen first by method dispatch, and only if it fails to find a method for that class (or NextMethod is called) will it move on to the second class, or failing that search for a default method.
No, I do not think you are right that the order of the classes is wrong in the code. The documentation appears wrong. The intent is clearly to call predict.lognet() first, use the workhorse predict.glmnet() to do the basic computations for all types of lasso/elastic net models fitted by glmnet, and finally do some post processing of those general predictions. That predict.glmnet() is not exported from the glmnet NAMESPACE whilst the other methods are is perhaps telling, also.
I'm not sure why you think the output from this:
predict(myfit, newx=mydata, type="response")
is wrong? I get a matrix of 10 rows and 21 columns, with the columns relating to the intercept-only model prediction plus predictions at 20 values of lambda at which model coefficients along the lasso/elastic net path have been computed. These do not seem to be linear combinations and are on the response scale, as you requested.
The order of the classes is not changing. I think you are misunderstanding how the code is supposed to work. There is a bug in the documentation, as the ordering is stated wrong there. But the code is working as I think it should.
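The dispatch order described above can be reproduced with a toy example that does not involve glmnet at all. The generic score and the classes "specific" and "base" below are hypothetical; they only mimic the relationship between predict.lognet and predict.glmnet, with the specific method post-processing what the base method returns.

```r
# Toy generic with two methods.
score <- function(object, ...) UseMethod("score")

# "base" returns the raw linear predictor.
score.base <- function(object, ...) object$value

# "specific" fetches the raw value via NextMethod() and maps it to a
# probability.
score.specific <- function(object, ...) {
  raw <- NextMethod()
  1 / (1 + exp(-raw))
}

fit <- structure(list(value = 0), class = c("specific", "base"))
p_specific_first <- score(fit)   # score.specific runs first: 0.5

class(fit) <- rev(class(fit))    # now c("base", "specific")
p_base_first <- score(fit)       # score.base runs first and stops: 0
```

With "base" listed first, the post-processing method is never reached because score.base does not call NextMethod(), which matches the behaviour seen when "glmnet" precedes "lognet" in the class vector.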