Retrain mxnet model in R

I have created a neural network with mxnet. Now I want to train this model iteratively on new data points: each time I simulate a new data point, I want to perform another gradient descent update on the existing model. I do not want to save the model to an external file and load it again.
I have written the following code, but the weights do not change after the second training step, and I also get NaN as the training error.
library(mxnet)
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 2, no.bias = TRUE)
lro <- mx.symbol.LinearRegressionOutput(fc1)
# first data observation
train.x = matrix(0, ncol = 3)
train.y = matrix(0, nrow = 2)
# first training step
model = mx.model.FeedForward.create(lro,
X = train.x, y = train.y, initializer = mx.init.uniform(0.001),
num.round = 1, array.batch.size = 1, array.layout = "rowmajor",
learning.rate = 0.1, eval.metric = mx.metric.mae)
print(model$arg.params)
# second data observation
train.x = matrix(0, ncol = 3)
train.x[1] = 1
train.y = matrix(0, nrow = 2)
train.y[1] = -33
# retrain model on new data
# pass on params of old model
model = mx.model.FeedForward.create(symbol = model$symbol,
arg.params = model$arg.params, aux.params = model$aux.params,
X = train.x, y = train.y, num.round = 1,
array.batch.size = 1, array.layout = "rowmajor",
learning.rate = 0.1, eval.metric = mx.metric.mae)
# weights do not change
print(model$arg.params)

I found a solution. begin.round in the second training step must be greater than num.round in the first training step, so that the model continues to train.
library(mxnet)
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 2, no.bias = TRUE)
lro <- mx.symbol.LinearRegressionOutput(fc1)
# first data observation
train.x = matrix(0, ncol = 3)
train.y = matrix(0, nrow = 2)
# first training step
model = mx.model.FeedForward.create(lro,
X = train.x, y = train.y, initializer = mx.init.uniform(0.001),
num.round = 1, array.batch.size = 1, array.layout = "rowmajor",
learning.rate = 0.1, eval.metric = mx.metric.mae)
print(model$arg.params)
# second data observation
train.x = matrix(0, ncol = 3)
train.x[1] = 1
train.y = matrix(0, nrow = 2)
train.y[1] = -33
# retrain model on new data
# pass on params of old model
model = mx.model.FeedForward.create(symbol = model$symbol,
arg.params = model$arg.params, aux.params = model$aux.params,
X = train.x, y = train.y, begin.round = 2, num.round = 3,
array.batch.size = 1, array.layout = "rowmajor",
learning.rate = 0.1, eval.metric = mx.metric.mae)
print(model$arg.params)
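The same pattern can be wrapped in a loop for fully iterative training on simulated data points. Below is a minimal sketch, where sim_point() is a hypothetical helper that returns the next observation:
# sim_point() is assumed to return list(x = 1 x 3 matrix, y = 2 x 1 matrix)
round <- 2
for (step in 1:10) {
  new <- sim_point()
  # each call continues from the previous one; begin.round always exceeds
  # the num.round of the preceding training step, as described above
  model <- mx.model.FeedForward.create(symbol = model$symbol,
    arg.params = model$arg.params, aux.params = model$aux.params,
    X = new$x, y = new$y,
    begin.round = round, num.round = round + 1,
    array.batch.size = 1, array.layout = "rowmajor",
    learning.rate = 0.1, eval.metric = mx.metric.mae)
  round <- round + 2
}
print(model$arg.params)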

Did you try to call mx.model.FeedForward.create only once and then use the fit function for incremental training?

Related

Fitting data in a for loop using ugarchroll: how do I get rid of the warning "possible convergence problem: optim gave code = 1"?

I get the following warning message that I want to get rid of, but I don't understand it:
possible convergence problem: optim gave code = 1
Below is my code:
# Creating the specification with ugarchroll
library("rugarch")
garch_spec_N <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
                           mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
                           distribution.model = "norm")
nsim <- 2      # number of simulations
nstart <- 500
alpha <- 0.05  # VaR level (assumed; alpha was not defined in the original snippet)
simul_ <- matrix(rnorm(2000), nrow = 1000, ncol = nsim)
# Model fitting
garch_sigma_N <- garch_mu_N <- matrix(NA, nrow = nstart, ncol = nsim)
for (i in 1:nsim) {
  garch_fit_N <- ugarchroll(garch_spec_N, simul_[, i], n.ahead = 1,
                            forecast.length = 1, n.start = nstart,
                            refit.every = 1, refit.window = "moving",
                            window.size = nstart, calculate.VaR = TRUE,
                            VaR.alpha = alpha)
  # Retrieving estimated garch variance and Mu
  garch_sigma_N[, i] <- garch_fit_N@forecast[["density"]][["Sigma"]]
  garch_mu_N[, i] <- garch_fit_N@forecast[["density"]][["Mu"]]
}
Any help would be very much appreciated :)

What are the parameters of Bayesian optimization for parameter tuning?

I am using Bayesian optimization to tune the parameters of an SVM for a regression problem. In the following code, what should the value of init_grid_dt = initial_grid be? I have the upper and lower bounds of the sigma and C parameters of the SVM, but I don't know what the initial grid should be.
In one of the examples on the web, the results of a random search are taken as input for the initial grid. The code is as follows:
ctrl <- trainControl(method = "repeatedcv", repeats = 5)
svm_fit_bayes <- function(logC, logSigma) {
## Use the same model code but for a single (C, sigma) pair.
txt <- capture.output(
mod <- train(y ~ ., data = train_dat,
method = "svmRadial",
preProc = c("center", "scale"),
metric = "RMSE",
trControl = ctrl,
tuneGrid = data.frame(C = exp(logC), sigma = exp(logSigma)))
)
list(Score = -getTrainPerf(mod)[, "TrainRMSE"], Pred = 0)
}
lower_bounds <- c(logC = -5, logSigma = -9)
upper_bounds <- c(logC = 20, logSigma = -0.75)
bounds <- list(logC = c(lower_bounds[1], upper_bounds[1]),
logSigma = c(lower_bounds[2], upper_bounds[2]))
## Create a grid of values as the input into the BO code
initial_grid <- rand_search$results[, c("C", "sigma", "RMSE")]
initial_grid$C <- log(initial_grid$C)
initial_grid$sigma <- log(initial_grid$sigma)
initial_grid$RMSE <- -initial_grid$RMSE
names(initial_grid) <- c("logC", "logSigma", "Value")
library(rBayesianOptimization)
ba_search <- BayesianOptimization(svm_fit_bayes,
bounds = bounds,
init_grid_dt = initial_grid,
init_points = 0,
n_iter = 30,
acq = "ucb",
kappa = 1,
eps = 0.0,
verbose = TRUE)
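For reference, the rand_search object used above is not defined in that snippet; it presumably came from caret's built-in random search. Below is a hedged sketch of how such an object could be created (train_dat and the tuneLength value are assumptions carried over from the example):
library(caret)
# Random search over (C, sigma); its results feed initial_grid above
rand_ctrl <- trainControl(method = "repeatedcv", repeats = 5, search = "random")
rand_search <- train(y ~ ., data = train_dat,
                     method = "svmRadial",
                     preProc = c("center", "scale"),
                     metric = "RMSE",
                     tuneLength = 20,        # number of random (C, sigma) candidates
                     trControl = rand_ctrl)
rand_search$results[, c("C", "sigma", "RMSE")]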

Image Recognition with Scalar output using CNN MXnet in R

I am trying to do image recognition with the mxnet package in R, using a CNN to predict a scalar output (in my case, wait time) from an image.
However, when I do this, I get the same resultant output for every image (it predicts the same number, which is probably just the average of all the results). How do I get it to predict the scalar output correctly?
My images have already been pre-processed by greyscaling them and converting them into the pixel format below.
I am essentially using images to predict wait times, which is why my train_y holds the current wait times in seconds and why I didn't convert them into a [0, 1] range. I would prefer a regression-type output or some kind of scalar output that gives the predicted wait time based on the image.
What other ways would you recommend to tackle this problem? I am not sure whether my approach is correct.
Here is my reproducible code:
library(caret)   # needed for createDataPartition below
set.seed(0)
df <- data.frame(replicate(784,runif(7538)))
df$waittime <- 1000*runif(7538)
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- df[training_index,]
dim(train_set)
test_set <- df[-training_index,]
dim(test_set)
## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])
train_y <- train_data[,785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))
test_data <- data.matrix(test_set)
test_x <- t(test_set[,-785])
test_y <- test_set[,785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2,2 ))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)
## Device used. Sadly not the GPU :-(
#device <- mx.gpu
#Didn't work well, predicted same number continuously regardless of image
## Train on 1200 samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     # ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     initializer = mx.init.uniform(0.002),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
pred <- predict(model, test_array)
#gives the same numeric output
Just modify your code a little: make train_y lie in [0, 1] as well, and use initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34).
library(caret)
set.seed(0)
df <- data.frame(replicate(784, runif(7538)))
df$waittime <- runif(7538)
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- df[training_index, ]
dim(train_set)
test_set <- df[-training_index, ]
dim(test_set)
## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[,-785])
train_y <- train_data[, 785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))
test_data <- data.matrix(test_set)
test_x <- t(test_set[, -785])
test_y <- test_set[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)
mx.set.seed(0)
model <- mx.model.FeedForward.create(NN_model,
X = train_array,
y = train_y,
num.round = 4,
array.batch.size = 64,
initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34),
learning.rate = 0.00001,
momentum = 0.9,
wd = 0.00001,
eval.metric = mx.metric.rmse)
pred <- predict(model, test_array)
pred[1,1:10]
# [1] 0.4859098 0.4865469 0.5671642 0.5729486 0.5008956 0.4962234 0.4327411 0.5478653 0.5446281 0.5707113
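For the real data, where train_y is a wait time in seconds, a minimal sketch of the corresponding scaling step follows (dividing by the training maximum is just one reasonable choice; the max_wait name is an assumption):
max_wait <- max(train_y)                 # largest wait time (in seconds) seen in training
train_y_scaled <- train_y / max_wait     # now in [0, 1]
# ... train the model with y = train_y_scaled ...
pred_scaled <- predict(model, test_array)
pred_seconds <- pred_scaled * max_wait   # convert predictions back to seconds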
It appears that your network is collapsing, which can have a number of causes. I would try the following modifications (see the sketch after this list for the first two):
Use ReLU activation instead of tanh. ReLU has proven to be a much more robust activation in conv networks than sigmoid or tanh.
Use batch normalization at the input of your convolutional layers (see the batch-normalization paper).
Divide your output range into sections and use softmax. If you must have regression, consider a separate regression network for each range and select the correct regression net based on the output of the softmax. Cross-entropy loss has shown more success in learning highly non-linear functions.
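A minimal sketch of the first two suggestions applied to the convolutional blocks of the network above (layer names are mine; placing BatchNorm on the input of each convolution is one common arrangement, not the only one):
## 1st convolutional block with batch normalization and ReLU instead of tanh
mx_data <- mx.symbol.Variable('data')
bn_0   <- mx.symbol.BatchNorm(data = mx_data)
conv_1 <- mx.symbol.Convolution(data = bn_0, kernel = c(5, 5), num_filter = 20)
relu_1 <- mx.symbol.Activation(data = conv_1, act_type = "relu")
pool_1 <- mx.symbol.Pooling(data = relu_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional block
bn_1   <- mx.symbol.BatchNorm(data = pool_1)
conv_2 <- mx.symbol.Convolution(data = bn_1, kernel = c(5, 5), num_filter = 50)
relu_2 <- mx.symbol.Activation(data = conv_2, act_type = "relu")
pool_2 <- mx.symbol.Pooling(data = relu_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))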

R Package deepnet: Why does sae.dnn.train not work with large data sets?

I am trying sae.dnn.train() with 5000 cases, 55 inputs, and 3 hidden layers.
Why does nn.predict return a vector of NaN?
I am using the following commands:
Nrow <- 5000
Ncol <- 55
v <- c(rnorm(Nrow * Ncol, 1, 0.5))
x <- matrix(v, nrow = Nrow, ncol = Ncol)
y <- c(rep(1, Nrow / 2), rep(0, Nrow / 2))
dnn <- sae.dnn.train(x, y, hidden = c(100, 90, 80),
                     activationfun = "tanh", learningrate = 0.6, momentum = 0.5,
                     learningrate_scale = 1.0, output = "sigm", sae_output = "linear",
                     numepochs = 10, batchsize = 100,
                     hidden_dropout = 0, visible_dropout = 0)
yy <- nn.predict(dnn, x)

How to isolate the plots of this method?

I am using the rugarch package and I have fitted a model. Now I want to look at the output and use the plot function. My problem is that the 5th plot contains some subplots which are all drawn in one device, but I want to plot each one in a separate device. How can I do this? As an example, here is some code which uses the sp500ret data of the package:
The code:
library(rugarch)
data(sp500ret)
somemodel <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(2, 2)),
                        mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
                        distribution.model = "ged")
somefit <- ugarchfit(spec = somemodel, data = sp500ret)
rollingesti <- ugarchroll(somemodel, sp500ret, n.start = 500,
                          refit.every = 100, refit.window = 'moving', window.size = 500,
                          calculate.VaR = FALSE, keep.coef = TRUE)
plot(rollingesti, which = 5)
The call plot(rollingesti, which = 5) draws several plots into one device, and I want to isolate them.
So I want to have them as single, bigger plots; right now they are too small because they are all put into one output.
Your example does not work (at least for me), i.e. it does not converge. However, this one works:
library(rugarch)
data(sp500ret)
spec <- ugarchspec(distribution.model = "std")
mod <- ugarchroll(spec, data = sp500ret[1:2000,], n.ahead = 1,
n.start = 1000, refit.every = 100, refit.window = "moving",
solver = "hybrid", fit.control = list(),
calculate.VaR = TRUE, VaR.alpha = c(0.01, 0.025, 0.05),
keep.coef = TRUE)
First, we find the method that is used by plot(mod, which = 5). It can be obtained with
getMethod("plot", c(x = "uGARCHroll", y = "missing"))
You are interested in the following lines
.intergarchrollPlot(x, choices = choices, plotFUN = paste(".plot.garchroll",
1:5, sep = "."), which = which, VaR.alpha = VaR.alpha,
density.support = density.support, ...)
where choices is "Fit Coefficients (with s.e. bands)". By inspecting rugarch:::.intergarchrollPlot we finally arrive at rugarch:::.plot.garchroll.5. These plots are not returned in any list or similar structure, so I provide a slightly modified version of that function so that you can draw them separately. Here I changed the first two lines and the last one:
library(xts)
x <- mod
vmodel = x@model$spec@model$modeldesc$vmodel
if (!x@model$keep.coef)
    stop("\n\nplot-->error: keep.coef set to FALSE in estimation\n")
coefs = x@model$coef
m = dim(coefs[[1]]$coef)[1]
N = length(coefs)
Z = matrix(NA, ncol = m, nrow = N)
Zup = matrix(NA, ncol = m, nrow = N)
Zdn = matrix(NA, ncol = m, nrow = N)
for (i in 1:m) {
    Z[, i] = sapply(coefs, FUN = function(y) y$coef[i, 1])
    Zup[, i] = Z[, i] + sapply(coefs, FUN = function(y) y$coef[i, 2])
    Zdn[, i] = Z[, i] - sapply(coefs, FUN = function(y) y$coef[i, 2])
}
dt = sapply(coefs, FUN = function(y) as.character(y$index))
cnames = rownames(coefs[[1]]$coef)
np = rugarch:::.divisortable(m) # added rugarch:::
This is a function that draws each plot separately; i is the number of the graph, e.g. from 1 to 7 in this case:
plotFun <- function(i) {
    plot(xts(Z[, i], as.POSIXct(dt)), type = "l",
         ylim = c(min(Zdn[, i]), max(Zup[, i])), ylab = "value", xlab = "", main = "",
         minor.ticks = FALSE, ann = FALSE, auto.grid = FALSE)
    lines(xts(Zdn[, i], as.POSIXct(dt)), col = 2)
    lines(xts(Zup[, i], as.POSIXct(dt)), col = 2)
    title(cnames[i], line = 0.4, cex = 0.9)
    grid()
}
For example:
plotFun(1)
plotFun(2)
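To get every coefficient plot in its own full-size device, the function can simply be called in a loop, opening a fresh graphics device each time:
for (i in 1:m) {
    dev.new()    # one separate device per coefficient plot
    plotFun(i)
}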
