R: Deep Neural Network with Custom Loss Function

(In R)
Suppose I have a loss function which takes a function as input and evaluates it at a (fixed) series of transformations of a fixed data set. Is it possible to integrate this into TensorFlow and use it as a custom loss function for
DNN regression? To perform the deep learning, I'm currently using a tensorflow -> R interface.

The R implementation of keras allows you to use a custom loss function. However, the function needs to be implemented with a specific signature: it must take y_true and y_pred as parameters. The following code gives you the general shape:
model %>% compile(
  optimizer = "your-choice-of-optimizer",
  loss = custom_loss_function,
  metrics = c("your-choice-of-metric")
)
where
custom_loss_function <- function(y_true, y_pred) {
  K <- backend()
  ... # define your function using the backend K
}
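For example, a minimal sketch of a concrete custom loss (mean squared error written purely with backend ops; any transformations of your data set would likewise have to be expressed with functions available through K or TensorFlow so that the loss stays differentiable):

library(keras)

# Minimal sketch: mean squared error expressed with backend operations only.
custom_loss_function <- function(y_true, y_pred) {
  K <- backend()
  K$mean(K$square(y_true - y_pred), axis = -1L)  # average over the last axis
}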

Related

Explicit gradient for custom loss function in keras

I'm working in R and trying to get started with neural networks, using the keras package.
I'd like to use a custom loss function for training my NN. It's possible to do this by writing the custom loss function as lossFn <- function(y_true, y_pred) { ... } and passing it to the compile method as model %>% compile(loss = lossFn, ...).
Now in order to use the gradient descent method of training the NN, the loss function needs to be differentiable. I understand that you'd usually accomplish this by restricting yourself to using backend functions in your loss function, e.g.
lossFn <- function(y_true, y_pred) {
  K <- backend()
  K$mean(K$square(y_true - y_pred), axis = 1L)
}
or something like that.
Now, my problem is that I cannot express my loss function this way; I need to use functions that aren't available in the backend.
So my idea was that I'd work out the gradient myself on paper, and then provide it to compile as another argument, say compile(loss = lossFn, gradient = gradientFn, ...), with gradientFn suitably defined.
The documentation for keras (the R package!) does not indicate that this is possible. At the same time, it does not suggest it's not. And googling has turned up little that is relevant.
So my question is, is it possible?
An addendum: since Google has suggested that there are other training methods for NNs that do not rely on the gradient of the loss function, I should add I'm not too hung up on the specific training method. My ultimate goal isn't to manually supply the gradient of a custom loss function, it's to use a custom loss function to train the NN. The gradient is just a technical obstacle for me right now.
Thanks!
This is certainly possible in Keras; you'll just have to move up the stack a little, implement a train_step method, and then call optimizer$apply_gradients().
Chapter 7 in the Deep Learning with R book covers this use case:
https://github.com/t-kalinowski/deep-learning-with-R-2nd-edition-code/blob/9f8b6d08dbb8d6565e4f5396e509aaea3e242b84/ch07.R#L608
Also, this keras guide may be useful, even though it's in Python and you're working in R. (The Python interface is very similar to the R interface).
https://keras.io/guides/writing_a_training_loop_from_scratch/
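For orientation, a custom training step in R might look roughly like the sketch below, loosely following the pattern in the book chapter linked above. The names model, custom_loss, and optimizer are placeholders (not from the original post), and zip_lists() is assumed to be available from a recent version of the keras package:

library(tensorflow)
library(keras)

# Sketch only: one gradient step with an arbitrary (differentiable) loss.
train_step <- function(x, y) {
  with(tf$GradientTape() %as% tape, {
    y_pred <- model(x, training = TRUE)
    loss <- custom_loss(y, y_pred)
  })
  gradients <- tape$gradient(loss, model$trainable_weights)
  # Pair each gradient with the weight it updates, then apply the update.
  optimizer$apply_gradients(zip_lists(gradients, model$trainable_weights))
  loss
}

Note that this only moves the obstacle: the operations inside custom_loss still need to be TensorFlow ops for the tape to differentiate them automatically.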

How to use a Loss function in Flux.jl

As I was reading through the Flux docs, I saw there are a bunch of different loss functions defined for us that we can use. I understand that the loss tells us how far we are away from the target value. But where do I actually make use of the loss function in the training loop?
If you are using the built-in train!() function, you can define your loss function and use it during training as follows:
loss(x, y) = Flux.Losses.mse(m(x), y)  # mean squared error between the model output m(x) and the target y
ps = Flux.params(m)                    # the trainable parameters of the model m
Flux.train!(loss, ps, data, opt)       # update ps over each batch in data using the optimiser opt
where Flux.Losses.mse is the built-in mean squared error function, used here to measure the distance between m(x) and y. You can read more about loss functions in Flux here: https://fluxml.ai/Flux.jl/stable/training/training/#Loss-Functions

Tensorflow works w/ GPU, but warns when invoked within R 'for' loop: 'triggered tf.function retracing'

I'm using R/Keras/TensorFlow to implement a CNN for regression in the context of neuroscience analysis, and this entails applying the same TF model (exact same model structure and input dimensions) to hundreds of different X/Y combinations. The network runs and outputs expected values, but TensorFlow complains about being called repeatedly.
The schematic approach I took was
for (i in 1:nrow(xy_combinations)) {
  k_clear_session()
  model = keras_model_sequential() %>% [model comes here]
  model %>% compile [details here]
  model %>% fit(as.array(x_train), as.matrix(y_train), epochs=50, batch_size=32, shuffle = FALSE, verbose=0)
  y_pred = model %>% predict(x_test)
  do_stuff(y_pred)
}
Everything works well (meaning I get similar and sane results for VMs configured for GPU or CPU), but after a few iterations on a GPU install, I receive a tensorflow warning of the sort:
5 out of the last 13 calls to <function Model.make_predict_function..predict_function at 0x7f0f19a59840> triggered tf.function retracing. Tracing is expensive
I use k_clear_session and model (re)compiling to make sure the model doesn't maintain weights across 'fit' calls, because each invocation of the model applies to different X/Y pairs, with potentially very different relations. Is there a saner way to implement this loop, within R, in a way that uses tensorflow optimally? I'd prefer not to have to code TF at the graph level and stay in R.
Thanks.
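(Not from the original thread, but for illustration: the retracing warning typically arises because a fresh model, and therefore a fresh traced predict/train function, is created on every iteration. One commonly suggested restructuring is to build and compile the model once, capture its freshly initialized weights, and restore them before each fit, sketched below with the same placeholders as above.)

model = keras_model_sequential() %>% [model comes here]
model %>% compile [details here]
init_weights = get_weights(model)   # snapshot of the untrained weights

for (i in 1:nrow(xy_combinations)) {
  set_weights(model, init_weights)  # reset to the untrained state for this X/Y pair
  model %>% fit(as.array(x_train), as.matrix(y_train), epochs=50, batch_size=32, shuffle = FALSE, verbose=0)
  y_pred = model %>% predict(x_test)
  do_stuff(y_pred)
}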

feature selection function in caret package

I am posting this because this post (feature selection in caret) hasn't helped my issue, and I have 2 questions regarding the feature selection function in the caret package.
When I run the code below on my gene expression matrix allsamplecombat, with 5 classes defined in y =:
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
results <- rfe(t(allsamplecombat[filter,]), y = factor(info$clust), sizes=c(300,400,500,600,700,800,1000,1200), rfeControl=control)
I get an output like this.
So, I want to know if I can extract the top features for each class, because predictors(results) just gives me the resulting features without indicating the importance for each class.
My second problem is that when I try to change the rfeControl functions to treebagFuncs and run the 'parRF' method
control <- rfeControl(functions=treebagFuncs, method="cv", number=5)
results <- rfe(t(allsamplecombat[filter,]), y = factor(info$clust), sizes=c(400,500,600,700,800), rfeControl=control, method="parRF")
I get an Error in { : task 1 failed - "subscript out of bounds" error.
What is wrong in my code?
For the importances, there is a sub-object called variables that contains this information for each step of the elimination.
treebagFuncs is designed to work with ipred's bagging function and isn't related to random forest.
You would probably use caretFuncs and pass method to that. However, if you are going to parallelize something, do it in the resampling loop and not the model function. This is generally more efficient. Note that if you do both with M workers, you might actually get M^3 (one for rfe, one for train, and one for parRF). There are options in rfe and train to turn their parallelism off.
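A sketch of what that might look like, reusing the object names from the question (illustrative only, not a tested answer):

# Per-class importances recorded at each step of the elimination:
head(results$variables)

# caretFuncs routes extra arguments such as method = "parRF" through to train():
control <- rfeControl(functions = caretFuncs, method = "cv", number = 5)
results2 <- rfe(t(allsamplecombat[filter, ]), y = factor(info$clust),
                sizes = c(400, 500, 600, 700, 800),
                rfeControl = control, method = "parRF")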

Spark ML Pipeline Logistic Regression Produces Much Worse Predictions Than R GLM

I used an ML Pipeline to run logistic regression models, but for some reason I got worse results than in R. I have done some research, and the only post I found related to this issue is this one. It seems that Spark Logistic Regression returns models that minimize the loss function, while R's glm function uses maximum likelihood. The Spark model only got 71.3% of the records right, while R predicted 95.55% of the cases correctly. I was wondering if I did something wrong in the setup and if there's a way to improve the prediction. Below are my Spark code and R code.
Spark code
partial model_input
label,AGE,GENDER,Q1,Q2,Q3,Q4,Q5,DET_AGE_SQ
1.0,39,0,0,1,0,0,1,31.55709342560551
1.0,54,0,0,0,0,0,0,83.38062283737028
0.0,51,0,1,1,1,0,0,35.61591695501733
def trainModel(df: DataFrame): PipelineModel = {
  val lr = new LogisticRegression().setMaxIter(100000).setTol(0.0000000000000001)
  val pipeline = new Pipeline().setStages(Array(lr))
  pipeline.fit(df)
}
val meta = NominalAttribute.defaultAttr.withName("label").withValues(Array("a", "b")).toMetadata
val assembler = new VectorAssembler().
  setInputCols(Array("AGE", "GENDER", "DET_AGE_SQ",
    "QA1", "QA2", "QA3", "QA4", "QA5")).
  setOutputCol("features")
val model = trainModel(model_input)
val pred= model.transform(model_input)
pred.filter("label!=prediction").count
R code
lr <- model_input %>% glm(data = ., formula = label ~ AGE + GENDER + Q1 + Q2 + Q3 + Q4 + Q5 + DET_AGE_SQ,
                          family = binomial)
pred <- data.frame(y = model_input$label, p = fitted(lr))
table(pred$y, pred$p > 0.5)
Feel free to let me know if you need any other information. Thank you!
Edit 9/18/2015: I have tried increasing the maximum number of iterations and decreasing the tolerance dramatically. Unfortunately, it didn't improve the prediction. It seems the model converged to a local minimum instead of the global minimum.
It seems that Spark Logistic Regression returns models that minimize loss function while R glm function uses maximum likelihood.
Minimizing a loss function is pretty much the definition of a linear model, and both glm and ml.classification.LogisticRegression are no different here. The fundamental difference between the two is the way this is achieved.
All linear models in ML/MLlib are based on some variant of gradient descent. The quality of the model generated with this approach varies on a case-by-case basis and depends on the gradient descent and regularization parameters.
R, on the other hand, computes an exact solution which, given its time complexity, is not well suited for large datasets.
As I mentioned above, the quality of a model fit with gradient descent depends on the input parameters, so the typical way to improve it is to perform hyperparameter optimization. Unfortunately, the ML version is rather limited here compared to MLlib, but for starters you can increase the number of iterations.
