Looking at the Flux.jl docs, I see there a ton of built in loss functions: https://fluxml.ai/Flux.jl/stable/models/losses/. My question is how can I define and use my own loss function in Flux if I want something more esoteric?
You can use any differentiable function which returns a single float value as your loss, as stated in the comment above, the prepared functions are just for your convenience.
You can pass anything e.g.
using Flux
yourcustomloss(ŷ, y) = sum(.- sum(y .* logsoftmax(ŷ), dims = 1))
and calculate the gradient of it or pass it to train! function.
Related
There are two methods I am experimenting with in minimizing a cost function. The first is optim() and the second is optim_nm() part of the optimization package. The problem I am facing is my error function takes on 2 parameters,
A list of variable parameters the optimization function needs to modify
A set of fixed parameters
optim(par = variableParameters,fn = error_function, par2 = fixedParameters):
optim handles this well because the first argument is the variable parameters, the function and then a set of optional parameters where I can pass the fixed parameters. This works, however, the function is slow.
optim_nm(fun = error_function,k=5,start = variable_parameters)
optim_nm, allows me to tune the optimization function, however, i'm unsure of how to pass the fixed parameters. All the examples in the documentation are with variable parameters.
Both methods implement the Nelder and Mead algorithm which is robust for nondifferentiable error functions which is what I require. If there are other packages that do this fast please do mention them too!
If someone has used this, or can better interpret the documentation I could use your help.
optim_nm Documentation
optim documentation
Create a wrapper function that fills in the values for the fixed parameters:
error_function <- function(variableParameters, fixedParameters) {
...
}
wrapper <- function(x) {
error_function(x, fixedParameters = 3)
}
optim_nm(fun = wrapper,
k = 5,
start = initial_parameter_values)
If error_function is expensive to evaluate, you may want to look into Bayesian optimization with the rBayesianOptimization or mlrMBO packages.
I have written a function that takes two arguments, a number between 0:16 and a vector which contains four parameter values.
The output of the function does change if I change the parameters in the vector, but it does not change if I change the number between 0:16.
I can add, that the function I'm having troubles with, includes another function (called 'pi') which takes the same arguments.
I have checked that the 'pi' function does actually change values if I change the value from 0:16 (and it does also change if I change the values of the parameters).
Firstly, here is my code;
pterm_ny <- function(x, theta){
(1-sum(theta[1:2]))*(theta[4]^(x))*exp((-1)*theta[4])/pi(x, theta)
}
pi <- function(x, theta){
theta[1]*1*(x==0)+theta[2]*(theta[3]^(x))*exp((-1)*(theta[3]))+(1-
sum(theta[1:2]))*(theta[4]^(x))*exp((-1)*(theta[4]))
}
Which returns 0.75 for pterm_ny(i,c(0.2,0.2,2,2)), were i = 1,...,16 and 0.2634 for i = 0, which tells me that the indicator function part in 'pi' does work.
With respect to raising a number to a certain power, I have been told that one should wrap the wished number in a 'I', as an example it would be like;
x^I(2)
I have tried to do that in my code, but that didn't help either.
I can't remember the argument for doing it, but I expect that it's to ensure that the number in parentheses is interpreted as an integer.
My end goal is to get 17 different values of the 'pterm' and to accomplish that, I was thinking of using the sapply function like this;
sapply(c(0:16),pterm_ny,theta = c(0.2,0.2,2,2))
I really hope that someone can point out what I'm missing here.
In advance, thank you!
You have a theta[4]^x term both in your main expression and in your pi() function; these are cancelling out, leaving the result invariant to changes in x ...
Also:
you might want to avoid using pi as your function name, as it's also a built-in variable (3.14159...) - this can sometimes cause confusion
the advice about using the "as is" function I() to protect powers is only relevant within formulas, e.g. as used in lm() (linear regression). (It would be used as I(x^2), not x^I(2)
I would like to use optim() to optimize a cost function (fn argument), and I will be providing a gradient (gr argument). I can write separate functions for fn and gr. However, they have a lot of code in common and I don't want the optimizer to waste time repeating those calculations. So is it possible to provide one function that computes both the cost and the gradient? If so, what would be the calling syntax to optim()?
As an example, suppose the function I want to minimize is
cost <- function(x) {
x*exp(x)
}
Obviously, this is not the function I'm trying to minimize. That's too complicated to list here, but the example serves to illustrate the issue. Now, the gradient would be
grad <- function(x) {
(x+1)*exp(x)
}
So as you can see, the two functions, if called separately, would repeat some of the work (in this case, the exponential function). However, since optim() takes two separate arguments (fn and gr), it appears there is no way to avoid this inefficiency, unless there is a way to define a function like
costAndGrad <- function(x) {
ex <- exp(x)
list(cost=x*ex, grad=(x+1)*ex)
}
and then pass that function to optim(), which would need to know how to extract the cost and gradient.
Hope that explains the problem. Like I said my function is much more complicated, but the idea is the same: there is considerable code that goes into both calculations (cost and gradient), which I don't want to repeat unnecessarily.
By the way, I am an R novice, so there might be something simple that I'm missing!
Thanks very much
The nlm function does optimization and it expects the gradient information to be returned as an attribute to the value returned as the original function value. That is similar to what you show above. See the examples in the help for nlm.
I'm using the function nls.lm {package: minpack.lm} to optimize a parameteristion for a hydrological model. The function is working quite well, but I want to use an other objective function (OF). Normally, the obective function "fn" in the nls.lm is defined as
A function that returns a vector of residuals, the sum square of which
is to be minimized. The first argument of fn must be par.
Now I want to use the Nash-Sutcliff-Efficiency, which is defined as
NSE <- 1 - (sum((obs - sim)^2) / sum((obs - mean(obs))^2))
or other OF. The problem is that nls.lm minimizes the expression sum(x)^2 and only the x is modifiable. I know that the best fit NSE = 1. Thus 1 - NSE creates a real minimization problem.
BTW: Example 1 from a nls.lm help page is a good example; there
observed - getPred(p,xx)
is minimized, what actually means that
sum ( observed - getPred(p,xx) )^2
is minimized by the nls.lm function, whereas getPred(p,xx) returns sim.
Any suggestion would be helpful. Thanks in advance.
Micha
nls.lm (and the related functions nls, and nlsLM) are designed to minimize the sum square of the residuals. For the problem you seek to solve, I would try application of a general-purpose minimizer.
If the problem is 'not too hard' (that is, has a single global minimum, is smooth), you could try to apply 'optim' to it (I would try the 'Nelder-Mead' and 'BFGS' options first), or the 'bobyqa' function from the package 'minqa', among other functions.
If the problem requires a global optimizer, you could try the 'GenSA' function from package 'GenSA', the 'genoud' function from the package 'rgenoud', or the 'DEoptim' function from package 'DEoptim', among other options. A review on 'Global Optimization in R' is forthcoming in the Journal of Statistical Software, and that will give a more complete overview of applicable functions.
Suppose you have a function f<- function(x,y,z) { ... }. How would you go about passing a constant to one argument, but letting the other ones vary? In other words, I would like to do something like this:
output <- outer(x,y,f(x,y,z=2))
This code doesn't evaluate, but is there a way to do this?
outer(x, y, f, z=2)
The arguments after the function are additional arguments to it, see ... in ?outer. This syntax is very common in R, the whole apply family works the same for instance.
Update:
I can't tell exactly what you want to accomplish in your follow up question, but think a solution on this form is probably what you should use.
outer(sigma_int, theta_int, function(s,t)
dmvnorm(y, rep(0, n), y_mat(n, lambda, t, s)))
This calculates a variance matrix for each combination of the values in sigma_int and theta_int, uses that matrix to define a dennsity and evaluates it in the point(s) defined in y. I haven't been able to test it though since I don't know the types and dimensions of the variables involved.
outer (along with the apply family of functions and others) will pass along extra arguments to the functions which they call. However, if you are dealing with a case where this is not supported (optim being one example), then you can use the more general approach of currying. To curry a function is to create a new function which has (some of) the variables fixed and therefore has fewer parameters.
library("functional")
output <- outer(x,y,Curry(f,z=2))