As I was reading through the Flux docs, I saw there are a bunch of different loss functions defined for us that we can use. I understand that the loss tells us how far we are from the target value. But where do I actually make use of the loss function in the training loop?
If you are using the built-in train!() function, you can define your loss function and use it during training as follows:
# loss: mean squared error between the model output m(x) and the target y
loss(x, y) = Flux.Losses.mse(m(x), y)
# collect the trainable parameters of the model m
ps = Flux.params(m)
# one pass over `data`, updating `ps` with the optimizer `opt` to reduce `loss`
Flux.train!(loss, ps, data, opt)
where Flux.Losses.mse uses the built-in mean squared error function to calculate the distance between the model output m(x) and the target y. You can read more about loss functions in Flux here: https://fluxml.ai/Flux.jl/stable/training/training/#Loss-Functions
I'm working in R and trying to get started with neural networks, using the keras package.
I'd like to use a custom loss function for training my NN. It's possible to do this by writing the custom loss function as lossFn <- function(y_true, y_pred) { ... } and passing it to the compile method as model %>% compile(loss = lossFn, ...).
Now in order to use the gradient descent method of training the NN, the loss function needs to be differentiable. I understand that you'd usually accomplish this by restricting yourself to using backend functions in your loss function, e.g.
lossFn <- function(y_true, y_pred) {
  K <- backend()
  K$mean(K$square(y_true - y_pred), axis = 1L)  # mean squared error built from backend ops
}
or something like that.
Now, my problem is that I cannot express my loss function this way; I need to use functions that aren't available in the backend.
So my idea was that I'd work out the gradient myself on paper, and then provide it to compile as another argument, say compile(loss = lossFn, gradient = gradientFn, ...), with gradientFn suitably defined.
The documentation for keras (the R package!) does not indicate that this is possible. At the same time, it does not suggest it's not. And googling has turned up little that is relevant.
So my question is, is it possible?
An addendum: since Google has suggested that there are other training methods for NNs that do not rely on the gradient of the loss function, I should add that I'm not too hung up on the specific training method. My ultimate goal isn't to manually supply the gradient of a custom loss function, it's to use a custom loss function to train the NN. The gradient is just a technical obstacle for me right now.
Thanks!
This is certainly possible in Keras; you'll just have to move up the stack a little, implement a train_step method, and then call optimizer$apply_gradients() yourself.
Chapter 7 in the Deep Learning with R book covers this use case:
https://github.com/t-kalinowski/deep-learning-with-R-2nd-edition-code/blob/9f8b6d08dbb8d6565e4f5396e509aaea3e242b84/ch07.R#L608
Also, this keras guide may be useful, even though it's in Python and you're working in R. (The Python interface is very similar to the R interface).
https://keras.io/guides/writing_a_training_loop_from_scratch/
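To sketch what that looks like in R (assuming a recent version of the keras package that provides new_model_class() and zip_lists(), plus the %as% helper from the tensorflow package; my_loss() below is a placeholder for your custom loss):
library(keras)
library(tensorflow)

# Sketch only: a model class whose training step computes the loss and the
# gradients explicitly instead of relying on compile(loss = ...).
CustomModel <- new_model_class(
  classname = "CustomModel",
  train_step = function(data) {
    c(x, y) %<-% data
    with(tf$GradientTape() %as% tape, {
      y_pred <- self(x, training = TRUE)
      loss <- my_loss(y, y_pred)   # my_loss() is a hypothetical custom loss
    })
    grads <- tape$gradient(loss, self$trainable_weights)
    self$optimizer$apply_gradients(zip_lists(grads, self$trainable_weights))
    list(loss = loss)
  }
)
You then build the model with the functional API as usual, e.g. model <- CustomModel(inputs = inputs, outputs = outputs), compile() it with an optimizer, and fit() will call your train_step(). Method names can shift a little between keras versions, so treat this as a starting point and check it against the chapter linked above.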
I am working with a 2-variable, 2-equation model. I would like to use the optim function to find the equilibrium of the model numerically. The model looks something like this (with f1 and f2 already defined as functions):
x_{t+1} = f1(x_t, y_t)
y_{t+1} = f2(x_t, y_t)
Additionally, there are various parameters to this system, so if at all possible I would like the code to make it relatively easy to vary these parameters.
I have been struggling with trying to make this work, but always get an error from optim when I try. Does anyone know how to use this function? The documentation is unfortunately pretty sparse on details.
Thank you for any help.
Edit: I should also add that part of my problem is that optim seems to only accept an objective that returns a single value of length 1, while I have two variables and two equations. Placing them in a vector has not worked, and produced the following error:
Error in optim(initialGuess.v, sumDiffSqr, parms = parms, f1 = f1, f2 = f2) :
objective function in optim evaluates to length 2 not 1
Edit 2: I have solved the issue. Unfortunately SO will not allow me to accept the answer for 2 days.
I figured out the issue. You have to give optim a single value derived from the functions, not the functions themselves, for R to optimize. So what I did was write a function that takes the difference between the next step and the current step for both f1 and f2 and then adds the squares of those differences. That single value gets plugged into optim, which R then minimizes until it produces an estimate where the difference is very small (in other words, an equilibrium).
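For anyone who lands here later, here is a minimal sketch of that objective, reusing the names from the error message above (the signatures of f1 and f2 and the starting values are assumptions):
# Objective: squared one-step change in both variables; it is zero at an equilibrium.
sumDiffSqr <- function(v, parms, f1, f2) {
  x <- v[1]
  y <- v[2]
  dx <- f1(x, y, parms) - x   # change in x over one step
  dy <- f2(x, y, parms) - y   # change in y over one step
  dx^2 + dy^2                 # a single scalar for optim to minimize
}

initialGuess.v <- c(1, 1)     # hypothetical starting values
res <- optim(initialGuess.v, sumDiffSqr, parms = parms, f1 = f1, f2 = f2)
res$par                       # approximate equilibrium (x*, y*)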
(In R)
Suppose I have a loss function that takes a function as input and evaluates it at a (fixed) series of transformations of a fixed data set. Is it possible to integrate this into tensorflow and use it as a custom loss function for DNN regression? In order to perform deep learning, I'm currently using a tensorflow -> R interface.
The R implementation of keras allows you to use a custom loss function. However, the function needs to be implemented using a very specific syntax and should take y_true and y_pred as parameters. You can find a nice tutorial here. The following code should give you some intuition:
model %>% compile(
  optimizer = "your-choice-of-optimizer",
  loss = custom_loss_function,
  metrics = c("your-choice-of-metric")
)
where
custom_loss_function <- function(y_true, y_pred) {
  K <- backend()
  ...  # define your function using the backend K
}
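For instance, here is one concrete (if arbitrary) loss built only from backend operations, so Keras can differentiate it automatically:
# Mean absolute error plus a small squared-error penalty; the 0.1 weight is
# arbitrary and only there to show that backend ops can be combined freely.
custom_loss_function <- function(y_true, y_pred) {
  K <- backend()
  K$mean(K$abs(y_true - y_pred) + 0.1 * K$square(y_true - y_pred), axis = -1L)
}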
I have a huge dataset, and I am quite new to R, so the only way I can think of implementing 100-fold CV myself is through many for's and if's, which would be extremely inefficient for my huge dataset and might even take several hours to run. I started looking for packages that do this instead and found quite a few topics related to CV on Stack Overflow. I have been trying the approaches I found, but none of them are working for me, and I would like to know what I am doing wrong.
For instance, this code from DAAG package:
cv.lm(data=Training_Points, form.lm=formula(t(alpha_cofficient_values)
%*% Training_Points), m=100, plotit=TRUE)
..gives me the following error:
Error in formula.default(t(alpha_cofficient_values)
%*% Training_Points) : invalid formula
I am trying to do kernel ridge regression, so I already have the alpha coefficient values computed. To get predictions, I only need to do either t(alpha_cofficient_values) %*% Test_Points or simply crossprod(alpha_cofficient_values, Test_Points), and this will give me all the predictions for unknown values. So I am assuming that in order to test my model, I should do the same thing but for KNOWN values, which means I need to use my Training_Points dataset.
My Training_Points data set has 9000 columns and 9000 rows. I could write for's and if's and do 100-fold CV, each time taking 100 rows as test data and leaving 8900 rows for training, repeating this until the whole data set is covered, and then take averages and compare with my known values. But isn't there a package that does the same? (And ideally one that also compares the predicted values with the known values and plots them, if possible.)
Please do excuse me for my elementary question, I am very new to both R and cross-validation, so I might be missing some basic points.
The CVST package implements fast cross-validation via sequential testing. This method significantly speeds up the computations while preserving full cross-validation capability. Additionally, the package developers also added standard cross-validation functionality.
I haven't used the package before but it seems pretty flexible and straightforward to use. Additionally, KRR is readily available as a CVST.learner object through the constructKRRLearner() function.
To use the cross-validation functionality, you must first convert your data to a CVST.data object using the constructData(x, y) function, with x the feature data and y the labels. Next, you can use one of the cross-validation functions to optimize over a defined parameter space. You can tweak the settings of both the CV and fastCV methods to your liking.
After the cross-validation spits out the optimal parameters, you can create the model using the learn function and subsequently predict new labels.
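For your own data, that first constructData() step might look something like this (known_values is a placeholder name for your vector of known labels; the noisySinc() calls in the example below already return ready-made CVST.data objects):
library(CVST)
# Convert your own feature matrix and label vector into a CVST.data object
d <- constructData(x = Training_Points, y = known_values)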
I puzzled together an example from the package documentation on CRAN.
library(CVST)

# For your own data, construct a CVST.data object with constructData(x, y);
# here we just load some example data (noisySinc() returns one already)
ns = noisySinc(1000)
# Construct the kernel ridge regression learner
krr = constructKRRLearner()
# Create the parameter space to search over
params = constructParams(kernel = "rbfdot", sigma = 10^(-3:3),
                         lambda = c(0.05, 0.1, 0.2, 0.3) / getN(ns))
# Run cross-validation via sequential testing (fast)
opt = fastCV(ns, krr, params, constructCVSTModel())
# OR ordinary cross-validation.. much slower!
opt = CV(ns, krr, params, fold = 100)
# The first element holds the optimal parameter set
# (a list with the chosen kernel, sigma and lambda)
p = opt[[1]]
# Create the model
m = krr$learn(ns, p)
# Predict with the model
nsTest = noisySinc(10000)
pred = krr$predict(m, nsTest)
# Evaluate: mean squared error on the test set
sum((pred - nsTest$y)^2) / getN(nsTest)
If further speedup is required, you can run the cross-validations in parallel. View this post for an example using the doParallel package.
I want to estimate the forward-looking version of the Taylor rule equation using iterative nonlinear GMM. I have the data for all the variables in the model, namely the inflation rate, the unemployment gap and the effective federal funds rate, and what I am trying to estimate is the set of parameters of the rule.
Where I need help is with the usage of the gmm() function in the {gmm} R package. I 'think' the arguments I need are:
gmm(g, x, type = "iterative", ...)
where g is the formula (i.e. the model described above), x is the data vector (or matrix), and type is the type of GMM to use.
My problem is with the data matrix argument. I do not know how to construct it (it's not that I don't know about matrices in R; it's that all the examples I have seen on the internet are not similar to what I am attempting here). Also, this is my first time using the gmm() function in R. Is there anything else I need to know?
Your help will be much appreciated. Thank you :)
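For reference, here is a rough sketch of how I imagine the pieces fitting together, using simulated placeholder data and a simple static rule instead of the actual forward-looking specification (the column names, the instruments, and the starting values in t0 are all assumptions):
library(gmm)

set.seed(1)
n    <- 200
infl <- rnorm(n, 2, 1)                                   # inflation rate (placeholder data)
ugap <- rnorm(n, 0, 1)                                   # unemployment gap (placeholder data)
ffr  <- 1 + 1.5 * infl - 0.5 * ugap + rnorm(n, 0, 0.2)   # effective federal funds rate

# x: one matrix holding every series the moment function needs, with named columns
dat <- cbind(ffr = ffr, infl = infl, ugap = ugap)

# g(theta, x): the n x q matrix of moment conditions E[e_t * z_t] = 0, using a
# constant and the regressors themselves as the instruments z_t
g <- function(theta, x) {
  e <- x[, "ffr"] - (theta[1] + theta[2] * x[, "infl"] + theta[3] * x[, "ugap"])
  z <- cbind(1, x[, "infl"], x[, "ugap"])
  e * z
}

fit <- gmm(g, x = dat, t0 = c(0, 1, 0), type = "iterative")
summary(fit)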