Using a custom R generator function with fit_generator (Keras, R)

I'd like to train a convolutional network to solve a multi-class, multi-label problem on image data. Due to the nature of the data, and for reasons I'll spare you, it would be best if I could use a custom R generator function to feed to the fit_generator command, instead of its built-in image_data_generator and flow_images_from_directory commands (which I was successfully able to get working, just not for this particular problem).
Here (https://www.rdocumentation.org/packages/keras/versions/2.2.0/topics/fit_generator) it says that I can do just that, without giving any examples. So I tried the following. Here is an extremely stripped-down example of what I'm trying to do (this code is entirely self-contained):
library(keras)
library(reticulate) #for py_iterator function
play.network = keras_model_sequential() %>%
layer_dense(units = 10, activation = "relu", input_shape = c(10)) %>%
layer_dense(units = 1, activation = "relu")
play.network %>% compile(
optimizer = "rmsprop",
loss = "mse"
)
mikes.custom.generator.function = function() #generates a 2-list of a random 1 x 10 array, and a scalar
{
  new.func = function()
  {
    arr = array(dim = c(1,10))
    arr[,] = sample(1:10, 10, replace = TRUE)/10
    return(list(arr,runif(1)))
  }
}
mikes.custom.iterator = py_iterator(mikes.custom.generator.function()) #creates a python iterator object
generator_next(mikes.custom.iterator) #correctly returns a 2-member list consisting of a 1 x 10 array, and a scalar
generator_next(mikes.custom.iterator)[[1]] #a 1 x 10 array
generator_next(mikes.custom.iterator)[[2]] #a scalar
#try to fit with "fit_generator":
play.network %>% fit_generator( #FREEZES.
mikes.custom.iterator,
steps_per_epoch = 1,
epochs = 1
)
The thing freezes at training time, without giving me an error message or anything. I also tried it with a custom image data generator for my original problem, same result.
Note that this network trains just fine if I just use fit and input the training data manually:
play.network %>% fit(generator_next(mikes.custom.iterator)[[1]],generator_next(mikes.custom.iterator)[[2]], epochs = 1, batch_size = 1)
#trains just fine
I think I know the problem, but I don't know the solution. If you ask it for the class of my custom iterator, it gives
class(mikes.custom.iterator)
[1] "python.builtin.iterator" "rpytools.generator.RGenerator" "python.builtin.object"
whereas if I build an iterator using the builtin image_data_generator and flow_images_from_directory commands, it gives
train_datagen <- image_data_generator(rescale = 1/255)
class(train_datagen)
[1] "keras.preprocessing.image.ImageDataGenerator" "keras_preprocessing.image.ImageDataGenerator" "python.builtin.object"
train_generator <- flow_images_from_directory(
train_dir,
train_datagen,
....
)
class(train_generator)
[1] "python.builtin.iterator" "keras_preprocessing.image.DirectoryIterator" "keras_preprocessing.image.Iterator" "tensorflow.python.keras.utils.data_utils.Sequence" "python.builtin.object"
So my guess is that train_datagen and/or train_generator have attributes that mikes.custom.iterator does not, and fit_generator is trying to call upon mikes.custom.iterator using functions other than the basic generator_next (which is in theory all it should really need). But I don't know what they may be, or how to build mikes.custom.iterator correctly, even after searching for two hours online.
Help anyone?

In R, you can build an iterator using the <<- operator. This is very helpful for building a custom generator function, and it is compatible with Keras' fit_generator() function.
A minimal example:
# example data
data <- data.frame(
x = runif(80),
y = runif(80),
z = runif(80)
)
# example generator
data_generator <- function(data, x, y, batch_size) {
  # start iterator
  i <- 1
  # return an iterator function
  function() {
    # reset iterator if already seen all data
    if ((i + batch_size - 1) > nrow(data)) i <<- 1
    # iterate current batch's rows
    rows <- c(i:min(i + batch_size - 1, nrow(data)))
    # update to next iteration
    i <<- i + batch_size
    # create container arrays
    x_array <- array(0, dim = c(length(rows), length(x)))
    y_array <- array(0, dim = c(length(rows), length(y)))
    # fill the container
    x_array[1:length(rows), ] <- data[rows, x]
    y_array[1:length(rows), ] <- data[rows, y]
    # return the batch
    list(x_array, y_array)
  }
}
# set-up a generator
gen <- data_generator(
data = data.matrix(data),
x = 1:2, # it is flexible, you can use the column numbers,
y = c("y", "z"), # or the column name
batch_size = 32
)
From the above function, you can check the resulting arrays simply by calling the generator:
gen()
Or you could also test the generator using a simple Keras model:
# import keras
library(keras)
# set up a simple keras model
model <- keras_model_sequential() %>%
layer_dense(32, input_shape = c(2)) %>%
layer_dense(2)
model %>% compile(
optimizer = "rmsprop",
loss = "mse"
)
# fit using generator
model %>% fit_generator(
generator = gen,
steps_per_epoch = 100, # the generator auto-resets after it has seen all samples
epochs = 10
)
I have to admit that the process is a little complex and requires extensive programming. You should check this featured blog post by François Chollet himself, or the kerasgenerator package that I develop personally.

sampling_generator <- function(X_data, Y_data, batch_size) {
  function() {
    rows <- sample(1:nrow(X_data), batch_size, replace = TRUE)
    list(X_data[rows,], Y_data[rows,])
  }
}
model %>%
  fit_generator(sampling_generator(X_train, Y_train, batch_size = 128),
                steps_per_epoch = nrow(X_train) / 128, epochs = 10)
I found this answer in the R keras FAQ, which seems to work:
https://keras.rstudio.com/articles/faq.html#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory
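Applied to the toy network from the question, the same pattern looks roughly like this (a sketch: X_play and Y_play are made-up stand-ins for real data, and the point is that the plain R function itself is passed to fit_generator, without wrapping it in py_iterator()):
# hypothetical stand-in data for the question's 10-input, 1-output network
X_play <- matrix(runif(1000), ncol = 10)
Y_play <- matrix(runif(100), ncol = 1)
play.network %>% fit_generator(
  sampling_generator(X_play, Y_play, batch_size = 10),
  steps_per_epoch = 10,
  epochs = 1
)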

Related

Calculating mse for multiple dimensionality reduction techniques

I'm trying to find a metric to compare multiple dimensionality reduction techniques similar to what was done in this blog post pca-vs-autoencoders-for-dimensionality-reduction...
Specifically, this part of the comparison:
# PCA reconstruction
pca.recon <- function(pca, x, k){
  mu <- matrix(rep(pca$center, nrow(pca$x)), nrow = nrow(pca$x), byrow = T)
  recon <- pca$x[,1:k] %*% t(pca$rotation[,1:k]) + mu
  mse <- mean((recon - x)^2)
  return(list(x = recon, mse = mse))
}
xhat <- rep(NA, 10)
for(k in 1:10){
  xhat[k] <- pca.recon(pca, x_train, k)$mse
}
ae.mse <- rep(NA, 5)
for(k in 1:5){
  modelk <- keras_model_sequential()
  modelk %>%
    layer_dense(units = 6, activation = "tanh", input_shape = ncol(x_train)) %>%
    layer_dense(units = k, activation = "tanh", name = "bottleneck") %>%
    layer_dense(units = 6, activation = "tanh") %>%
    layer_dense(units = ncol(x_train))
  modelk %>% compile(
    loss = "mean_squared_error",
    optimizer = "adam"
  )
  modelk %>% fit(
    x = x_train,
    y = x_train,
    epochs = 5000,
    verbose = 0
  )
  ae.mse[k] <- unname(evaluate(modelk, x_train, x_train))
}
df <- data.frame(k = c(1:10, 1:5), mse = c(xhat, ae.mse), method = c(rep("pca", 10), rep("autoencoder", 5)))
ggplot(df, aes(x = k, y = mse, col = method)) + geom_line()
I would like to add other techniques to the mix, such as t-SNE from the Rtsne package, UMAP from the umap package, and IVIS from the ivis package (currently not on CRAN but can be installed like so:
devtools::install_github("beringresearch/ivis/R-package")
library(ivis)
install_ivis()
The data input and processing for all the techniques are similar, but it seems some of them already have mse determination baked into their functions (e.g. the autoencoder). I'm wondering if anyone has experience with what I'm trying to do.
Different decomposition methods can be regarded as interchangeable gears in a statistical machine that is there to be useful to you, the creator.
To pick the best gear, the metric you evaluate by should not necessarily concern the gears themselves, but how well the machine performs overall with each gear inserted in turn.
Disregard the gear specs:
You have a couple of gears which all come with their own validation specs from their factories (packages). Those numbers/summaries/specs are likely not what you want. The gears will likely not come with the same metrics, so it is hard to do a fair comparison. Also, those metrics will be all about the gears and not about your specific machine anyway. Do not do as the blog suggests and mix machine metrics with the gears in pca.recon(). Let the gears be gears, and defer the metric evaluation to the machine level.
Does the gear fit at all?: You need to check, for your specific machine, that all your candidate gears will actually fit inside. The gears for your composition/reconstruction machine must be able to turn both ways. t-SNE is only designed to turn forward and do decomposition, so a meaningful evaluation is not possible. Likewise for UMAP.
Maybe the whole reconstruction-loss benchmarking thing was not the actual machine you wanted to use in the first place; maybe it was just a side project to pick gears for another machine. If your machine is meant to make beautiful plots, then good quantitative benchmarking is hard to come by. If your machine is some initial decomposition fed into a simple classifier, then the t-SNE gear will fit just fine, and some prediction accuracy metric is likely useful for picking gears.
Interfacing various gears: The gears will not actually fit out-of-the-box into your machine because the size and form are not the same. Each gear needs individual adaptation. You may be tempted to re-fit the machine to the gear, and that will do for a few gears: that would be to literally copy-paste your machine code and insert and adapt each gear. A more scalable way is to interface the gears, so you can leave them in a bag next to the machine and let a robot insert one gear at a time and write you a report. That is a main selling point of frameworks such as sklearn, caret and keras. You could also code it yourself. Here is a simple example:
rm(list=ls())
#some data
X <- iris[,c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")]
#my_gear, prcomp wrapped in an interface
#any gear must have the gear(X, N, ...) signature
pca_decompose <- function(X, N=2, ...) {
  #implement gear forward (decompose)
  pca <- prcomp(
    X, rank. = N,
    scale = FALSE #must be FALSE, because the reconstructor below does not support re-scaling, because I'm lazy
  )
  #implement gear backward (reconstruct)
  reconstruct <- function(Xnew = pca$x) {
    # a pca reconstructor implementation similar to the function from the blog, pca already in closure
    # I think the blog mistakenly referred to pca$x instead of x sometimes
    pca.recon <- function(x, k){
      x_recon <- x[,1:k] %*% t(pca$rotation[,1:k])
      #slightly more efficient way to reapply the center
      for(i in seq_along(pca$center)) x_recon[,i] <- x_recon[,i] + pca$center[i]
      return(x_recon)
    }
    X_rc <- pca.recon(Xnew, k=N)
    return(X_rc)
  }
  #wrap up the interface
  self <- list(
    X_decomposed = pca$x, # any decomposition must be named X_decomposed
    reconstruct = reconstruct
  )
  class(self) <- c("my_pca","my_universal_gear")
  return(self)
}
#define a machine with the relevant use case
my_machine <- function(gear, data, ...) {
  dc_obj <- gear(data, ...)
  data_rc <- dc_obj$reconstruct(dc_obj$X_decomposed)
}
#define the most useful metric
my_metric <- function(X,Y) {
  # this 'multivariate' mse is not commonly used I think,
  # but whatever floats the boat
  mean((X-Y)^2)
}
#define how to evaluate:
#try the gear in the machine and measure the outcome with the metric
my_evaluation <- function(gear, machine, data, metric, ...) {
  data <- as.matrix(data)
  output <- machine(gear, data, ...)
  my_metric(data, output)
}
#useful syntactic sugar
set_params <- function(gear, ...) {
  params = list(...)
  function(...) do.call(gear, c(list(...), params))
}
#evaluate a gear
my_evaluation(
  gear = pca_decompose,
  machine = my_machine,
  data = X,
  #gear params
  N = 2
)
#the same as
my_evaluation(
  gear = set_params(pca_decompose, N = 2), #nice to preset gear params
  machine = my_machine,
  data = X
)
#define all gears to evaluate
#the gearbag could, in another use case, also be a grid search over hyper-parameters
my_gearbag = list(
  pca_dc_N1 = set_params(pca_decompose, N = 1),
  pca_dc_N2 = set_params(pca_decompose, N = 2),
  pca_dc_N3 = set_params(pca_decompose, N = 3),
  pca_dc_N4 = set_params(pca_decompose, N = 4)
  #put also an autoencoder or whatever in the gearbag
)
my_robot <- function(evaluation, machine, gearbag, data) {
  results <- sapply(
    X = gearbag, #this X is not the data but sapply's placeholder for what to iterate over
    FUN = evaluation,
    machine = machine,
    data = data
  )
  report = list(
    README = "metric results for gears",
    results = results
  )
}
my_report <- my_robot(my_evaluation, my_machine, my_gearbag, data = X)
print(my_report)
Printout:
$README
[1] "metric results for gears"
$results
pca_dc_N1 pca_dc_N2 pca_dc_N3 pca_dc_N4
8.560431e-02 2.534107e-02 5.919048e-03 1.692109e-31
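The gearbag comment above mentions adding an autoencoder. Purely as an illustration (this sketch is not part of the answer: the keras architecture, epoch count and object names are only illustrative, loosely mirroring the autoencoder in the question), a small keras autoencoder could be wrapped in the same gear(X, N, ...) interface:
library(keras)
ae_decompose <- function(X, N = 2, epochs = 500, ...) {
  X <- as.matrix(X) # for real use you would probably scale X first, as the question does
  # keep the layers as standalone objects so the trained decoder can be reused below
  enc1 <- layer_dense(units = 6, activation = "tanh")
  enc2 <- layer_dense(units = N, activation = "tanh", name = "bottleneck")
  dec1 <- layer_dense(units = 6, activation = "tanh")
  dec2 <- layer_dense(units = ncol(X))
  input <- layer_input(shape = ncol(X))
  code  <- input %>% enc1() %>% enc2()
  recon <- code %>% dec1() %>% dec2()
  autoencoder <- keras_model(input, recon)
  autoencoder %>% compile(optimizer = "adam", loss = "mean_squared_error")
  autoencoder %>% fit(X, X, epochs = epochs, verbose = 0)
  # forward (decompose): the encoder; backward (reconstruct): a decoder sharing the trained layers
  encoder <- keras_model(input, code)
  code_input <- layer_input(shape = N)
  decoder <- keras_model(code_input, code_input %>% dec1() %>% dec2())
  self <- list(
    X_decomposed = predict(encoder, X),
    reconstruct  = function(Xnew) predict(decoder, Xnew)
  )
  class(self) <- c("my_ae", "my_universal_gear")
  return(self)
}
#it then goes into the gearbag like any other gear, e.g.
#my_gearbag$ae_dc_N2 <- set_params(ae_decompose, N = 2)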

SHAP with Keras model : operands could not be broadcast together with shapes (2,6) (10,)

I am running SHAP from the shapper library in R for classification model interpretation on a Keras CNN model:
library(keras)
library("shapper")
library("DALEX")
I made a simple reproducible example:
mdat.train <- cbind(rep(1:2, each = 5), matrix(c(1:30), ncol = 3, byrow = TRUE))
train.conv <- array_reshape(mdat.train[,-1], c(nrow(mdat.train[,-1]), ncol(mdat.train[,-1]), 1))
mdat.test <- cbind(rep(1:2, each = 3), matrix(c(1:18), ncol = 3, byrow = TRUE))
test.conv <- array_reshape(mdat.test[,-1], c(nrow(mdat.test[,-1]), ncol(mdat.test[,-1]), 1))
My CNN model
model.CNN <- keras_model_sequential()
model.CNN %>%
layer_conv_1d(filters=16L, kernel_initializer=initializer_he_normal(seed=NULL), kernel_size=2L, input_shape = c(dim(train.conv)[[2]],1)) %>%
layer_batch_normalization() %>%
layer_activation_leaky_relu() %>%
layer_flatten() %>%
layer_dense(50, activation ="relu") %>%
layer_dropout(rate=0.5) %>%
layer_dense(units=2, activation ='sigmoid')
model.CNN %>% compile(
loss = loss_binary_crossentropy,
optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-08),
metrics = c("accuracy"))
model.CNN %>% fit(
train.conv, mdat.train[,1], epochs = 5, verbose = 1)
My Shap command
p_function <- function(model, data) predict(model.CNN, test.conv, type = "prob")
exp_cnn <- explain(model.CNN, data = train.conv)
ive_cnn <- shap(exp_cnn, data = train.conv, new_observation = test.conv, predict_function = p_function)
I am getting this error :
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: operands could not be broadcast together with shapes (2,6) (10,)
Detailed traceback:
File "/.local/lib/python3.6/site-packages/shap/explainers/kernel.py", line 120, in __init__
self.fnull = np.sum((model_null.T * self.data.weights).T, 0)
The problem you've presented has two parts. First of all, the shown error comes from a code typo: the p_function you show calls global objects instead of the passed ones. That's why you witnessed that error.
But to my surprise I found the package not working even after fixing that mistake. Let me explain the motivation and the solution.
I have to say that 3D arrays are not common in R, so the shapper package does not support that type of training data. It assumes data.frame format at the beginning of the task (because it iterates over variables). To be honest, it took me about two hours to find both the reason why it was not working and a solution.
First of all, we need new variables that shapper can understand:
shapper_data <- as.data.frame(train.conv)
shapper_new_obs <- as.data.frame(test.conv)[1,]
as well as a new predict_function:
p_function <- function(model, data) {
  mat <- as.matrix(data)
  mat <- array_reshape(mat, c(nrow(data), ncol(data), 1))
  predict(model, mat, type = "prob")
}
The two new lines convert the data.frame into a properly shaped array.
Then the line
ive_cnn <- individual_variable_effect(x = model.CNN, data = shapper_data, new_observation = shapper_new_obs, predict_function = p_function)
Works perfectly fine for me.
Best
Szymon

Keras LSTM and multiple input feature: how to define parameters

I am discovering Keras in R and the LSTM. Following this blog post, I want to predict a time series, and I would like to use several past time points (t-1, t-2) to predict the point at time t.
Here is what I tried so far:
library(data.table)
library(tensorflow)
library(keras)
Serie <- c(5.66333333333333, 5.51916666666667, 5.43416666666667, 5.33833333333333,
5.44916666666667, 6.2025, 6.57916666666667, 6.70666666666667,
6.95083333333333, 8.1775, 8.55083333333333, 8.42166666666667,
8.01333333333333, 8.99833333333333, 11.0025, 10.3116666666667,
10.51, 10.9916666666667, 10.6116666666667, 10.8475, 13.7841666666667,
16.2916666666667, 15.9975, 14.3683333333333, 13.4041666666667,
11.8666666666667, 9.11916666666667, 9.47862416666667, 9.08404666666667,
8.79606166666667, 9.93211091666667, 9.03834041666667, 8.58787275,
6.77499383333333, 7.21377583333333, 7.53497175, 6.31212966666667,
5.5825105, 4.64021041666667, 4.608787, 5.39446983333333, 4.93945983333333,
4.8612215, 4.13088808333333, 4.09916575, 3.40943183333333, 3.79573258333333,
4.30319966666667, 4.23431266666667, 3.64880758333333, 3.11700716666667,
3.321058, 2.53599408333333, 2.20433991666667, 1.66643905833333,
0.84187275, 0.467880658333333, 0.810507858333333, 0.795)
Npoints <- 2 # number of previous points to take into account
I then create a data frame with the lagged time series, and create a test and train set:
supervised <- data.table(x = diff(Serie, differences = 1))
supervised[,c(paste0("x-",1:Npoints)) := lapply(1:Npoints,function(i){c(rep(NA,i),x[1:(.N-i)])})] # create shifted versions
# take the non NA
supervised <- supervised[!is.na(get(paste0("x-",Npoints)))]
head(supervised)
# Split dataset into training and testing sets
N = nrow(supervised)
n = round(N *0.7, digits = 0)
train = supervised[1:n, ]
test = supervised[(n+1):N, ]
I rescale the data
scale_data = function(train, test, feature_range = c(0, 1)) {
  x = train
  fr_min = feature_range[1]
  fr_max = feature_range[2]
  std_train = ((x - min(x,na.rm = T) ) / (max(x,na.rm = T) - min(x,na.rm = T) ))
  std_test = ((test - min(x,na.rm = T) ) / (max(x,na.rm = T) - min(x,na.rm = T) ))
  scaled_train = std_train * (fr_max - fr_min) + fr_min
  scaled_test = std_test * (fr_max - fr_min) + fr_min
  return( list(scaled_train = as.vector(scaled_train), scaled_test = as.vector(scaled_test), scaler = c(min = min(x,na.rm = T), max = max(x,na.rm = T))) )
}
Scaled = scale_data(train, test, c(-1, 1))
# define x and y train
y_train = as.vector(Scaled$scaled_train[, 1])
x_train = Scaled$scaled_train[, -1]
And following this post I reshape the data in 3D
x_train_reshaped <- array(NA,dim= c(1,dim(x_train)))
x_train_reshaped[1,,] <- as.matrix(x_train)
I build the following model and try to start the training:
model <- keras_model_sequential()
model%>%
layer_lstm(units = 1, batch_size = 1, input_shape = dim(x_train), stateful= TRUE)%>%
layer_dense(units = 1)
# compile model ####
model %>% compile(
loss = 'mean_squared_error',
optimizer = optimizer_adam( lr= 0.02, decay = 1e-6 ),
metrics = c('accuracy')
)
# make a test
model %>% fit(x_train_reshaped, y_train, epochs=1, batch_size=1, verbose=1, shuffle=FALSE)
but I get the following error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: No data provided for "dense_11". Need data for each key in: ['dense_11']
Trying to reshape the data differently didn't help.
What am I doing wrong?
Keras and tensorflow in R cannot recognise the size of your input/target data when they are data frames.
y_train is both a data.table and a data.frame:
class(y_train)
[1] "data.table" "data.frame"
The keras fit documentation states: "y: Vector, matrix, or array of target (label) data (or list if the model has multiple outputs)." Similarly, for x.
Unfortunately, there still appears to be an input and/or target dimensionality mismatch when y_train is cast to a matrix:
model %>%
fit(x_train_reshaped, as.matrix(y_train), epochs=1, batch_size=1, verbose=1, shuffle=FALSE)
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Input arrays should have the same number of samples as target arrays.
Found 1 input samples and 39 target samples.
Hope this answer helps you, or someone else, make further progress.
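For reference, one way such a mismatch is usually resolved (a sketch, not part of the original answer; it follows the common keras convention of shaping LSTM input as (samples, timesteps, features)) is to reshape the lagged predictors so that each row becomes one sample with Npoints timesteps of a single feature, which gives keras as many input samples as there are targets:
# sketch: one sample per lagged row, Npoints timesteps, 1 feature
x_train_mat <- as.matrix(x_train) # 39 x Npoints
x_train_reshaped <- array(x_train_mat, dim = c(nrow(x_train_mat), Npoints, 1))
y_train_mat <- as.matrix(y_train) # 39 x 1
model <- keras_model_sequential() %>%
  layer_lstm(units = 1, batch_input_shape = c(1, Npoints, 1), stateful = TRUE) %>%
  layer_dense(units = 1)
model %>% compile(
  loss = 'mean_squared_error',
  optimizer = optimizer_adam(lr = 0.02, decay = 1e-6)
)
model %>% fit(x_train_reshaped, y_train_mat,
              epochs = 1, batch_size = 1, verbose = 1, shuffle = FALSE)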

Understanding Keras prediction output of a rnn model in R

I'm trying out the Keras package in R by doing this tutorial about forecasting the temperature. However, the tutorial has no explanation on how to predict with the trained RNN model and I wonder how to do this. To train a model I used the following code copied from the tutorial:
dir.create("~/Downloads/jena_climate", recursive = TRUE)
download.file(
"https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip",
"~/Downloads/jena_climate/jena_climate_2009_2016.csv.zip"
)
unzip(
"~/Downloads/jena_climate/jena_climate_2009_2016.csv.zip",
exdir = "~/Downloads/jena_climate"
)
library(readr)
data_dir <- "~/Downloads/jena_climate"
fname <- file.path(data_dir, "jena_climate_2009_2016.csv")
data <- read_csv(fname)
data <- data.matrix(data[,-1])
train_data <- data[1:200000,]
mean <- apply(train_data, 2, mean)
std <- apply(train_data, 2, sd)
data <- scale(data, center = mean, scale = std)
generator <- function(data, lookback, delay, min_index, max_index,
                      shuffle = FALSE, batch_size = 128, step = 6) {
  if (is.null(max_index))
    max_index <- nrow(data) - delay - 1
  i <- min_index + lookback
  function() {
    if (shuffle) {
      rows <- sample(c((min_index+lookback):max_index), size = batch_size)
    } else {
      if (i + batch_size >= max_index)
        i <<- min_index + lookback
      rows <- c(i:min(i+batch_size, max_index))
      i <<- i + length(rows)
    }
    samples <- array(0, dim = c(length(rows),
                                lookback / step,
                                dim(data)[[-1]]))
    targets <- array(0, dim = c(length(rows)))
    for (j in 1:length(rows)) {
      indices <- seq(rows[[j]] - lookback, rows[[j]],
                     length.out = dim(samples)[[2]])
      samples[j,,] <- data[indices,]
      targets[[j]] <- data[rows[[j]] + delay,2]
    }
    list(samples, targets)
  }
}
lookback <- 1440
step <- 6
delay <- 144
batch_size <- 128
train_gen <- generator(
data,
lookback = lookback,
delay = delay,
min_index = 1,
max_index = 200000,
shuffle = TRUE,
step = step,
batch_size = batch_size
)
val_gen = generator(
data,
lookback = lookback,
delay = delay,
min_index = 200001,
max_index = 300000,
step = step,
batch_size = batch_size
)
test_gen <- generator(
data,
lookback = lookback,
delay = delay,
min_index = 300001,
max_index = NULL,
step = step,
batch_size = batch_size
)
# How many steps to draw from val_gen in order to see the entire validation set
val_steps <- (300000 - 200001 - lookback) / batch_size
# How many steps to draw from test_gen in order to see the entire test set
test_steps <- (nrow(data) - 300001 - lookback) / batch_size
library(keras)
model <- keras_model_sequential() %>%
layer_flatten(input_shape = c(lookback / step, dim(data)[-1])) %>%
layer_dense(units = 32, activation = "relu") %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(),
loss = "mae"
)
history <- model %>% fit_generator(
train_gen,
steps_per_epoch = 500,
epochs = 20,
validation_data = val_gen,
validation_steps = val_steps
)
I tried to predict the temperature with the code below. If I am correct, this should give me the normalized predicted temperature for every batch. So when I denormalize the values and average them, I get the predicted temperature. Is this correct, and if so, for which time is the prediction made (latest observation time + delay)?
prediction.set <- test_gen()[[1]]
prediction <- predict(model, prediction.set)
Also, what is the correct way to use keras::predict_generator() and the test_gen() function? If I use the following code:
model %>% predict_generator(generator = test_gen,
steps = test_steps)
it gives this error:
error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Error when checking model input: the list of Numpy
arrays that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), but instead got the following list of 2 arrays:
[array([[[ 0.50394005, 0.6441838 , 0.5990761 , ..., 0.22060473,
0.2018686 , -1.7336458 ],
[ 0.5475698 , 0.63853574, 0.5890239 , ..., -0.45618412,
-0.45030192, -1.724062...
Note: my familiarity with R syntax is very limited, so unfortunately I can't give you an answer using R. Instead, I am using Python in my answer. I hope you can easily translate back, my words at least, to R.
... If I am correct, this should give me the normalized predicted temperature for every batch.
Yes, that's right. The predictions would be normalized since you have trained it with normalized labels:
data <- scale(data, center = mean, scale = std)
Therefore, you would need to denormalize the values using the computed mean and std to find the real predictions:
pred = model.predict(test_data)
denorm_pred = pred * std + mean
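In R, the same denormalization could look like this (a sketch; mean and std are the per-column statistics computed in the question, and the target temperature is column 2 of data):
# 'prediction' is the batch of predictions from the question's predict() call
denorm_pred <- prediction * std[[2]] + mean[[2]] # column 2 of 'data' is the temperature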
... for which time is then predicted (latest observation time + delay?)
That's right. Concretely, since in this particular dataset a new observation is recorded every ten minutes and you have set delay = 144, the predicted value is the temperature 24 hours (i.e. 144 * 10 = 1440 minutes = 24 hours) after the last given observation.
Also, what is the correct way to use keras::predict_generator() and the test_gen() function?
predict_generator takes a generator that gives as output only test samples and not the labels (since we don't need labels when we are performing prediction; the labels are needed when training, i.e. fit_generator(), and when evaluating the model, i.e. evaluate_generator()). That's why the error mentions that you need to pass one array instead of two arrays. So you need to define a generator that only gives test samples or one alternative way, in Python, is to wrap your existing generator inside another function that gives only the input samples (I don't know whether you can do this in R or not):
def pred_generator(gen):
    for data, labels in gen:
        yield data  # discards labels

preds = model.predict_generator(pred_generator(test_generator), number_of_steps)
You need to provide one other argument, which is the number of steps the generator must take to cover all the samples in the test data. Actually we have num_steps = total_number_of_samples / batch_size. For example, if you have 1000 samples and the generator generates 10 samples at a time, you need to run the generator for 1000 / 10 = 100 steps.
Bonus: To see how good your model performs you can use evaluate_generator using the existing test generator (i.e. test_gen):
loss = model.evaluate_generator(test_gen, number_of_steps)
The given loss is also normalized and to denormalize it (to get a better sense of prediction error) you just need to multiply it by std (you don't need to add mean since you are using mae, i.e. mean absolute error, as the loss function):
denorm_loss = loss * std
This would tell you how much your predictions are off on average. For example, if you are predicting the temperature, a denorm_loss of 5 means that the predictions are on average 5 degrees off (i.e. are either less or more than the actual value).
Update: For prediction, you can define a new generator using an existing generator in R like this:
pred_generator <- function(gen) {
  function() { # wrap it in a function to make it callable
    gen()[1] # call the given generator and get the first element (i.e. samples)
  }
}
preds <- model %>%
  predict_generator(
    generator = pred_generator(test_gen), # pass test_gen directly to pred_generator without calling it
    steps = test_steps
  )
evaluate_generator(model, test_gen, test_steps)

R neuralnet package too slow for millions of records

I am trying to train a neural network for churn prediction with R package neuralnet. Here is the code:
data <- read.csv('C:/PredictChurn.csv')
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled_temp <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
scaled <- data
scaled[, -c(1)] <- scaled_temp[, -c(1)]
index <- sample(1:nrow(data),round(0.75*nrow(data)))
train_ <- scaled[index,]
test_ <- scaled[-index,]
library(neuralnet)
n <- names(train_[, -c(1)])
f <- as.formula(paste("CHURNED_F ~", paste(n[!n %in% "CHURNED_F"], collapse = " + ")))
nn <- neuralnet(f,data=train_,hidden=c(5),linear.output=F)
It works as it should; however, when training with the full data set (in the range of millions of rows), it just takes too long. I know R is single-threaded by default, so I have tried researching how to parallelize the work across all the cores. Is it even possible to run this function in parallel? I have tried various packages with no success.
Has anyone been able to do this?
It doesn't have to be the neuralnet package, any solution that lets me train a neural network would work.
Thank you
I have had good experiences with the package Rmpi, and it may be applicable in your case too.
library(Rmpi)
Briefly, its usage is as follows:
nproc = 4 # could be automatically determined
# Specify one master and nproc-1 slaves
Rmpi::mpi.spawn.Rslaves(nslaves = nproc - 1)
# Execute function "func_to_be_parallelized" on multiple CPUs; pass two variables to function
my_fast_results = Rmpi::mpi.parLapply(var1_passed_to_func,
                                      func_to_be_parallelized,
                                      var2_passed_to_func)
# Close slaves
Rmpi::mpi.close.Rslaves(dellog = T)
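To make that concrete for the question (a hypothetical sketch, not from this answer: the chunking scheme and the ensembling idea are only illustrative), one way to use mpi.parLapply with neuralnet is to fit independent networks on disjoint chunks of train_ and combine their predictions as an ensemble; this assumes neuralnet is installed on the slaves and reuses f and train_ from the question:
library(Rmpi)
nproc <- 4
Rmpi::mpi.spawn.Rslaves(nslaves = nproc - 1)
# split the training rows into one chunk per slave (illustrative chunking)
chunks <- split(seq_len(nrow(train_)),
                cut(seq_len(nrow(train_)), nproc - 1, labels = FALSE))
# each task fits one network on its chunk of rows
fit_chunk <- function(rows, formula, data) {
  neuralnet::neuralnet(formula, data = data[rows, ], hidden = c(5), linear.output = FALSE)
}
nets <- Rmpi::mpi.parLapply(chunks, fit_chunk, formula = f, data = train_)
Rmpi::mpi.close.Rslaves(dellog = TRUE)
# predictions from the per-chunk networks can then be averaged as a simple ensemble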
You can try using the caret and doParallel packages for this. This is what I have been using. It works for some of the model types but may not work for all.
library(caret)
library(doParallel)
library(tictoc)

layer1 = c(6,12,18,24,30)
layer2 = c(6,12,18,24,30)
layer3 = c(6,12,18,24,30)
cv.folds = 5
# In order to make models fully reproducible when using parallel processing, we need to pass seeds as a parameter
# https://stackoverflow.com/questions/13403427/fully-reproducible-parallel-models-using-caret
total.param.permutations = length(layer1) * length(layer2) * length(layer3)
seeds <- vector(mode = "list", length = cv.folds + 1)
set.seed(1)
for(i in 1:cv.folds) seeds[[i]] <- sample.int(n=1, total.param.permutations, replace = TRUE)
seeds[[cv.folds + 1]] <- sample.int(1, 1, replace = TRUE) # for the last model
nn.grid <- expand.grid(layer1 = layer1, layer2 = layer2, layer3 = layer3)
cl <- makeCluster(detectCores()*0.5) # use 50% of cores only, leave rest for other tasks
registerDoParallel(cl)
train_control <- caret::trainControl(method = "cv"
                                     ,number = cv.folds
                                     ,seeds = seeds # user defined seeds for parallel processing
                                     ,verboseIter = TRUE
                                     ,allowParallel = TRUE
                                     )
tic("Total Time to NN Training: ")
set.seed(1)
model.nn.caret = caret::train(form = formula,
                              data = scaled.train.data,
                              method = 'neuralnet',
                              tuneGrid = nn.grid,
                              trControl = train_control
                              )
toc()
# shut the workers down only after training has finished
stopCluster(cl)
registerDoSEQ()
