Fixing initial weights when training Keras model in R

I want to fix the initial weights of the model I'm training so that I get reproducible results for a report. The problem is that on different runs I get similar but slightly different results, and training sometimes takes 600 epochs and sometimes 3500, using callback_early_stopping to monitor validation MSE with a min_delta of 0.00003.
Overall I'm happy enough with the results of every run, but I need to find out whether there is a way to make them reproducible by fixing the initial weights.
I tried setting the seed at various points in the process (before creating the model, before compiling it, and before training) but nothing seems to work. Is there any way to do this?
BATCH <- nrow(x_train)
SHAPE <- ncol(x_train)
# Create a neural network model
set.seed(42)
model <- keras_model_sequential()
model %>%
  layer_dense(units = 12, activation = "relu", input_shape = c(SHAPE)) %>%
  layer_dense(units = 24, activation = "relu") %>%
  layer_dense(units = 1, activation = "linear")
# Print model summary
print(summary(model))
# Initialise early-stopping callback and optimiser
early_stopping <- callback_early_stopping(
  monitor = "val_loss",
  min_delta = 0.00003,
  patience = 50,
  restore_best_weights = TRUE
)
optim <- optimizer_adam(learning_rate = 0.00005)
set.seed(42)
model %>% compile(
  optimizer = optim,
  loss = "mse",
  metrics = c("mse", "mae")
)
# Fit model
set.seed(42)
val_data <- list(x_val, y_val)  # validation_data expects an unnamed list(x, y)
hist <- model %>% fit(
  x = x_train,
  y = y_train,
  batch_size = BATCH,
  epochs = 6000,
  validation_data = val_data,
  shuffle = FALSE,
  callbacks = early_stopping
)
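set.seed() only seeds R's own RNG, while Keras initialises its weights through TensorFlow's RNG, which the R seed never reaches. One likely fix, sketched here on the assumption of a reasonably recent tensorflow R package: tensorflow::set_random_seed() seeds R, Python, NumPy and TensorFlow in one call. Call it once, before creating the model, instead of re-seeding at each step:
library(keras)
library(tensorflow)
# Seed R, Python, NumPy and TensorFlow together; disable_gpu = TRUE
# (the default) also avoids nondeterministic GPU kernels.
set_random_seed(42)
model <- keras_model_sequential() %>%
  layer_dense(units = 12, activation = "relu", input_shape = c(SHAPE)) %>%
  layer_dense(units = 24, activation = "relu") %>%
  layer_dense(units = 1, activation = "linear")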

Related

How to make keras utilize all available CPU capacity when training?

I am trying to implement several different neural networks for a regression problem. When I train a single model I see that my computer doesn't use all of the available CPU, which I assume would make training faster.
Ultimately I want to specify multiple models (around 4) that can be trained simultaneously, but to begin with I just want to use all CPU capacity when training a single model. The screenshot below showed how my CPU was used while training a model:
[Screenshot: CPU utilization during training]
In the code below I tried setting use_multiprocessing = TRUE, which I thought might help, but I get an error saying the argument is not used.
library(keras)
epoch <- 50
lr <- 0.1
decay <- lr / epoch
# Initialize model
fit_NN4 <- keras_model_sequential() %>%
  layer_flatten(input_shape = training %>% select(-date, -mktcap, -permno, -ret.adj) %>% ncol()) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1)
# Compile
fit_NN4 %>% compile(
  loss = "mse",  # loss objective function
  # optimizer = optimizer_rmsprop(),
  optimizer = optimizer_sgd(lr = lr, decay = decay),
  metrics = c("mean_absolute_error")
)
# Train the model
fit_NN4 %>%
  fit(
    # Training data
    x = training %>% select(-date, -mktcap, -permno, -ret.adj) %>% as.matrix(),
    y = training %>% pull(ret.adj) %>% as.matrix(),
    epochs = epoch,
    # Validation data
    validation_data =
      list(validation %>% select(-date, -mktcap, -permno, -ret.adj) %>% as.matrix(),
           validation %>% pull(ret.adj) %>% as.matrix()),
    # Callbacks
    callbacks = list(
      callback_early_stopping(monitor = "val_loss",       # early-stopping objective
                              mode = "min",               # minimize the objective
                              verbose = 1,                # report the epoch at stop
                              patience = 4,               # wait 4 epochs past the minimum
                              use_multiprocessing = TRUE) # the argument that triggers the error
    )
  )
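use_multiprocessing is an argument of Python Keras's fit() for generator input, not of callback_early_stopping(), which is why it is rejected here. More to the point, CPU usage in TensorFlow is governed by its threading configuration rather than by fit() arguments. A sketch, assuming TensorFlow 2.x through the tensorflow R package; the thread counts are illustrative and must be set before TensorFlow initialises:
library(tensorflow)
# Run these before TensorFlow is first used in the session.
# Threads used inside a single op (e.g. one large matrix multiply):
tf$config$threading$set_intra_op_parallelism_threads(8L)
# Threads used across independent ops:
tf$config$threading$set_inter_op_parallelism_threads(8L)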

Set class weights in Keras of R when there are multiple outputs

I'm using the keras package in R to fit a neural network model. The model I'm working on has two outputs: output1 is continuous (for regression) and output2 is binary (for classification).
Since we have a very imbalanced dataset for the classification problem (output2), I want to assign class weights to deal with the imbalance, but apparently we don't need to do that for output1 (the regression).
Here is the sample code for the NN model that I'm working on:
input <- layer_input(shape = c(32, 24))
output <- input %>%
  layer_lstm(units = 64, dropout = 0.2, recurrent_dropout = 0.2)
pred1 <- output %>%
  layer_dense(units = 1, name = "output1")
pred2 <- output %>%
  layer_dense(units = 1, activation = "sigmoid", name = "output2")
model <- keras_model(
  input,
  list(pred1, pred2)
)
summary(model)
model %>% compile(
  optimizer = "rmsprop",
  loss = list(
    output1 = "mse",
    output2 = "binary_crossentropy"
  ),
  loss_weights = list(
    output1 = 0.25,
    output2 = 10
  )
)
history <- model %>% fit(
  train_x, list(output1 = train_y1, output2 = train_y2),
  epochs = 10,
  batch_size = 5000,
  class_weight = ???,
  validation_data = list(valid_x, list(output1 = valid_y1, output2 = valid_y2))
)
If we have just one binary output, I know that the class weights can be assigned with:
class_weight = list("0" = 1, "1" = 100),
but this doesn't work anymore when we have two outputs and want to assign the weights to only one of them. I guess I may need to somehow specify the name of the binary output in class_weight so that Keras knows the weights only apply to output2, but I don't know how to do that in R.
Does anyone know how to assign class weights to the binary output only, when we have two outputs (one regression, one classification)? Thank you very much for the help!
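One workaround, sketched here on the assumption that the installed Keras version does not support per-output class_weight: convert the class weights into per-sample weights and pass a named sample_weight list with one element per output (mirroring the Python API's dict form). The weights of 1 and 100 below simply mirror the single-output example:
# Per-sample weights: 100 for class 1, 1 for class 0 on output2;
# uniform weights for output1, the regression head.
w2 <- ifelse(train_y2 == 1, 100, 1)
w1 <- rep(1, length(train_y1))
history <- model %>% fit(
  train_x, list(output1 = train_y1, output2 = train_y2),
  epochs = 10,
  batch_size = 5000,
  sample_weight = list(output1 = w1, output2 = w2),
  validation_data = list(valid_x, list(output1 = valid_y1, output2 = valid_y2))
)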

Why is this simple regression (keras) ANN failing so badly?

I am trying to do a non-linear regression on very simple data. When I run the following code I get really bad results. Almost every time the result is a simple linear regression. When I check the weights of my model, most (if not all) neurons are 'dead': they all have negative weights with negative biases, making the ReLU function return 0 for all inputs (since all inputs are in the range [0, 1]).
As far as I can tell this is a problem with the optimizer. I also tried using a very low and a very high learning rate, with no luck. The optimizer seems to get stuck in a 'very' suboptimal local minimum.
I also tried setting the initial weights to be all positive in [0, 0.1]; the optimizer then 'cheats' its way into a linear regression by setting all biases to roughly the same value.
Can anyone help me? What am I doing wrong? Is this really the best a state-of-the-art ANN can achieve on a simple regression problem?
library(keras)
fun <- function(x) 0.2 + 0.4 * x^2 + 0.3 * x * sin(15 * x) + 0.05 * cos(50 * x).
x_test <- seq(0, 1, 0.01)
y_test <- fun(x_test)
plot(x_test, y_test, type = 'l')
x_train <- runif(50)
y_train <- fun(x_train)
points(x_train, y_train)
model <- keras_model_sequential() %>%
  layer_dense(10, 'relu', input_shape = 1) %>%
  layer_dense(1)
model %>% compile(
  optimizer = 'sgd',
  loss = "mse"
)
history <- model %>%
  fit(x = x_train, y = y_train,
      epochs = 100,
      batch_size = 10,
      validation_data = list(x_test, y_test)
  )
y_pred <- model %>% predict(x_test)
plot(x_test, y_test, type = 'l')
points(x_train, y_train)
lines(x_test, y_pred, col = 'red')
[Plot: predicted outputs versus actual ones]
Change the sigmoid activation to relu and remove the stray . at the end of the fun definition (it is a syntax error).
EDIT
Also add a second dense layer and train for many more epochs, like this:
model <- keras_model_sequential() %>%
  layer_dense(10, 'relu', input_shape = 1) %>%
  layer_dense(10, 'relu') %>%
  layer_dense(1)
model %>% compile(
  optimizer = 'sgd',
  loss = "mse"
)
history <- model %>%
  fit(x = x_train, y = y_train,
      epochs = 2000,
      batch_size = 10,
      validation_data = list(x_test, y_test)
  )
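If neurons still die with the deeper network, one further tweak worth trying (not part of the original answer; the initializers below are illustrative) is to start the hidden units in an active state: He initialisation for the weights plus a small positive bias keeps the ReLU inputs away from the all-negative regime the question describes:
model <- keras_model_sequential() %>%
  layer_dense(10, 'relu', input_shape = 1,
              kernel_initializer = initializer_he_normal(),
              bias_initializer = initializer_constant(0.01)) %>%  # small positive bias keeps ReLUs alive
  layer_dense(10, 'relu',
              kernel_initializer = initializer_he_normal(),
              bias_initializer = initializer_constant(0.01)) %>%
  layer_dense(1)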

Model training stage: validation_data with the same data set

I would like someone to explain why, when I train with a validation_data set identical to the training data set, I get two curves that are different rather than superimposed.
x <- matrix(rnorm(50 * 10), nrow = 50)
y <- matrix(rnorm(50), nrow = 50)
model <- keras_model_sequential()
model %>%
  layer_dense(units = 1, input_shape = dim(x)[2]) %>%
  layer_dropout(rate = 1) %>%
  layer_activation("linear")
model %>% compile(
  loss = "mse",
  optimizer = "adam",
  metrics = "mse"
)
history <- model %>% fit(x, y, batch_size = 1, epochs = 10, verbose = 1,
                         validation_data = list(x, y))
plot(history)
Here are some reasons why that might happen:
The training loss is calculated and averaged over the course of an epoch. That means there are gradient updates between loss calculations, so each minibatch loss is computed on a different model. val_loss, on the other hand, is calculated after training, with the same model over the whole dataset. That's why the values differ.
To put it visually, it is like this:
Epoch 1:
batch_1 -> nnet_1 -> loss_1 -> optimize nnet_1 to nnet_2
batch_2 -> nnet_2 -> loss_2 -> optimize nnet_2 to nnet_3
...
batch_n -> nnet_n -> loss_n -> optimize nnet_n to nnet_n+1
loss = (loss_1 + loss_2 + ... + loss_n) / n
val_loss = loss of nnet_n+1 over the whole dataset
You can see how the two calculations differ.
Also, during training (when loss is calculated) dropout is enabled, while during validation (when val_loss is calculated) it is disabled.
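A quick way to check the first point, as a sketch using the objects from the run above: evaluate() computes the loss in inference mode with the final weights, so its result should sit close to the final val_loss rather than to the averaged training loss.
final_train_loss <- tail(history$metrics$loss, 1)
final_val_loss <- tail(history$metrics$val_loss, 1)
# evaluate() uses the final weights over the whole dataset, as val_loss does
eval_loss <- model %>% evaluate(x, y, verbose = 0)
print(c(train = final_train_loss, val = final_val_loss))
print(eval_loss)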

Validation accuracy much lower than training accuracy in Keras for text classification

I am new to Keras and trying to create a model. The issue is that my training accuracy is around 80 percent but the validation accuracy is drastically lower, at 15 percent. I have 545 rows in my dataset, and I have normalized all the input features. Any suggestions on what could be tweaked would be really helpful.
The complete data and code are shared here:
https://drive.google.com/open?id=1g8Cmw2bmAI9DnOU-rB4sjsOeBuFp6NUy
# Normalize data
data[, 1:(ncol(data) - 1)] = normalize(data[, 1:(ncol(data) - 1)])
data[, ncol(data)] = as.numeric(data[, ncol(data)]) - 1
set.seed(128)
ind = sample(2, nrow(data), replace = T, prob = c(0.7, 0.3))
training = data[ind == 1, 1:(ncol(data) - 1)]
test = data[ind == 2, 1:(ncol(data) - 1)]
traintarget = data[ind == 1, ncol(data)]
testtarget = data[ind == 2, ncol(data)]
# One-hot encoding
trainLabels = to_categorical(traintarget)
testLabels = to_categorical(testtarget)
print(testLabels)
model = keras_model_sequential()
model %>%
  layer_dense(units = 150, activation = 'relu', input_shape = c(520)) %>%
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dense(units = 9, activation = 'softmax')
model %>%
  compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = 'accuracy')
history = model %>%
  fit(training,
      trainLabels,
      epochs = 300,
      batch_size = 32,
      validation_split = 0.2)
prob = model %>%
  predict_proba(test)
pred = model %>%
  predict_classes(test)
table2 = table(Predicted = pred, Actual = testtarget)
cbind(prob, pred, testtarget)
Simply put, when your model succeeds on the training data but not on the validation data, it is overfitting. The best ways to combat this are (a) making sure that your inputs actually predict the outputs, because otherwise a large enough model will simply memorize the historical data, and (b) adding dropout layers to your network. Finally, roughly 500 training samples is quite low for training a neural network.
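As a sketch of point (b), here is the same architecture with dropout inserted after each hidden layer; the rates are illustrative, not tuned:
model = keras_model_sequential()
model %>%
  layer_dense(units = 150, activation = 'relu', input_shape = c(520)) %>%
  layer_dropout(rate = 0.4) %>%   # randomly zero 40% of activations during training
  layer_dense(units = 50, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 9, activation = 'softmax')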
