I am implementing a neural network in MXNetR. I attempted to customize my loss function to compute the correlation between my output vector and the target vector. Below is my code:
# Generate example training and testing data
train.x = matrix(data = rexp(200, rate = 10), nrow = 120, ncol = 6380)
test.x = matrix(data = rexp(200, rate = 10), nrow = 60, ncol = 6380)
train.y = matrix(data = rexp(200, rate = 10), nrow = 120, ncol = 319)
test.y = matrix(data = rexp(200, rate = 10), nrow = 60, ncol = 319)
# Reshape the data into 4-D arrays
train.array <- train.x
dim(train.array) <- c(20, 319, 1, ncol(train.x))
test.array <- test.x
dim(test.array) <- c(20, 319, 1, ncol(test.x))
# Define the input data
data <- mx.symbol.Variable("data")
# Define the first fully connected layer
fc1 <- mx.symbol.FullyConnected(data, num_hidden = 100)
act.fun <- mx.symbol.Activation(fc1, act_type = "relu") # create a hidden layer with Rectified Linear Unit as its activation function.
output <- mx.symbol.FullyConnected(act.fun, num_hidden = 319)
# Customize loss function
label <- mx.symbol.Variable("label")
output_mean <- mx.symbol.mean(output)
label_mean <- mx.symbol.mean(label)
output_delta <-mx.symbol.broadcast_sub(output, output_mean)
label_delta <- mx.symbol.broadcast_sub(label, label_mean)
output_sqr <-mx.symbol.square(output_delta)
label_sqr <- mx.symbol.square(label_delta)
output_sd <- mx.symbol.sqrt(mx.symbol.sum(output_sqr))
label_sd <- mx.symbol.sqrt(mx.symbol.sum(label_sqr))
numerator <- mx.symbol.sum(output_delta * label_delta)
denominator <- output_sd * label_sd
lro <- mx.symbol.MakeLoss(numerator/denominator)
# Generate a new model
model <- mx.model.FeedForward.create(symbol=lro, X=train.array, y=train.y,
num.round=5000, array.batch.size=1, optimizer = "adam",
learning.rate = 0.0003, eval.metric = mx.metric.rmse,
epoch.end.callback = mx.callback.log.train.metric(20, logger))
And I got this error:
Error in mx.model.init.params(symbol, input.shape, initializer, mx.cpu()) :
Not enough information to get shapes
I tried to wrap the whole correlation formula in MXNet:
lro2 <- mx.symbol.MakeLoss(
mx.symbol.negative((mx.symbol.sum(output * label) -
(mx.symbol.sum(output) * mx.symbol.sum(label))) /
mx.symbol.sqrt((mx.symbol.sum(mx.symbol.square(output)) -
((mx.symbol.sum(output)) * (mx.symbol.sum(output)))) *
(mx.symbol.sum(mx.symbol.square(label)) - ((mx.symbol.sum(label)) * (mx.symbol.sum(label))))))
)
I can compile with this version, but my model runs very slowly and the code is clearly not very readable. I wonder if there is any way to get around the error and implement the first version as I described above.
MXNet performs shape inference to determine the required shape of the model parameters (weights and biases) in order to allocate memory, and the first time this is done is when the model parameters are initialized.
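One way to see how far inference gets is to query it directly. A minimal sketch, assuming the R API exposes mx.symbol.infer.shape (the shapes below are purely illustrative):
# If inference fails here with the same message, the problem is in the
# symbol itself rather than in how FeedForward.create feeds it data.
shapes <- mx.symbol.infer.shape(lro, data = c(20, 319, 1, 1))
shapes$arg.shapes  # parameter shapes MXNet managed to infer
shapes$out.shapes  # output shape of the loss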
Somewhere in your symbol you have a shape that can't be inferred from the neighbors, and I suspect it may be the broadcast_sub which you removed in the inline definition. It's hard to diagnose the exact issue due to the error in the reshape. You could also try working with NDArray to test the logic and then convert back to using Symbol.
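To sanity-check the correlation logic eagerly before returning to Symbol, here is a minimal NDArray sketch (it assumes the mx.nd.* counterparts of the ops above; if plain - does not broadcast the scalar mean, substitute the broadcast variant):
out <- mx.nd.array(matrix(rnorm(12), nrow = 3))
lab <- mx.nd.array(matrix(rnorm(12), nrow = 3))
out_delta <- out - mx.nd.mean(out)  # subtract the mean from every element
lab_delta <- lab - mx.nd.mean(lab)
numerator <- mx.nd.sum(out_delta * lab_delta)
denominator <- mx.nd.sqrt(mx.nd.sum(mx.nd.square(out_delta))) * mx.nd.sqrt(mx.nd.sum(mx.nd.square(lab_delta)))
# Compare against base R's cor() to confirm the formula is right
as.array(numerator / denominator)
cor(as.vector(as.array(out)), as.vector(as.array(lab)))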
If you're looking to batch samples, you should change the array.batch.size parameter of mx.model.FeedForward.create rather than reshaping your data into batches.
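For instance, keeping the data in its natural layout and letting the framework do the batching (values here are illustrative, not a recommendation):
model <- mx.model.FeedForward.create(symbol = lro,
  X = train.x, y = train.y,
  num.round = 50,
  array.batch.size = 32,  # batch 32 samples per update instead of reshaping
  optimizer = "adam")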
Related
I am trying to fit a piecewise regression to this dataset. I know we do not have a linear relation between the dependent and independent variables, but my real-world application requires me to model the data as a segmented linear regression.
Here is my code with a description of the steps:
bond_data <- data.frame(
yield_change = c(-1.2,-0.9,-1.8,-1.4,-1.8,-2.1,-2.3,-2.1,-2.5,-2.2,-2.4,-2.5,-2.4,-2.4,
-3.0,-2.6,-5.1,-4.8,-4.9,-5.0,-5.0,-6.2,-6.1,-6.3,-5.0,-5.0),
maturity =c(10.2795,10.8603,11.7753,12.3562,12.5205,13.3589,13.8630,14.2822,14.3589,15.3589,
15.8630,16.778,17.3616,17.8658,18.3616,21.8685,22.5288,23.8685,24.3644,25.3671,
26.8712,27.8712,28.8712,29.8740,44.3781,49.3836))
The bond_data data frame contains the two vectors stated above.
# Defining the lm model & the segmented model
model <- lm(yield_change ~ maturity, data = bond_data)
segmented.model <- segmented(model,seg.Z=~maturity,psi = list(maturity = c(15,20,30)),fixed.psi = c(15,20,30),control = seg.control(it.max = 0, n.boot = 50))
xp <- c(min(bond_data$maturity), segmented.model$psi[,"Est."], max(bond_data$maturity))
new_data <- data.frame(xp)
colnames(new_data) <- "maturity"
o <- segmented.model
new_data$dummy1 <- pmax(new_data$maturity - o$psi[1,2], 0)
new_data$dummy2 <- pmax(new_data$maturity - o$psi[2,2], 0)
new_data$dummy3 <- pmax(new_data$maturity - o$psi[3,2], 0)
new_data$dummy4 <-I(new_data$maturity > o$psi[1,2]) * coef(o)[3]
new_data$dummy5 <-I(new_data$maturity > o$psi[2,2]) * coef(o)[4]
new_data$dummy6 <-I(new_data$maturity > o$psi[3,2]) * coef(o)[5]
names(new_data)[-1] <- names(model.frame(o))[-c(1,2)]
yp <- predict(segmented.model,new_data)
plot(bond_data$maturity,bond_data$yield_change, pch=16, col="blue",ylim = c(-8,0))
lines(xp,yp)
I get the following image:
[Plot of actual values as blue points with the prediction line]
I am trying to get the first segment to start at the point (maturity = 10, yield_change = 0).
One thing to note is that all my breakpoints have fixed x positions and no estimates are made, so when I run segmented.model$psi my initial values are the same as my estimates (15, 20 and 30) and all my st.err values are zero.
How would I go about making my prediction line start at the point (maturity = 10, yield_change = 0)? I appreciate any help!
I have tried doing the following:
model <- lm(I(yield_change-0)~I(maturity-10), data = bond_data)
segmented.model <- segmented(model,seg.Z=~maturity,psi = list(maturity = c(15,20,30)),fixed.psi = c(15,20,30), control = seg.control(it.max = 0, n.boot = 50))
#But by running the previous line I get the error (object maturity not recognised).
#By running:
segmented.model <- segmented(model,seg.Z=~I(maturity-10),psi = list(I(maturity-10) = c(15,20,30)),fixed.psi = c(15,20,30), control = seg.control(it.max = 0, n.boot = 50))
I get this error:
Error: unexpected '=' in "segmented.model <- segmented(model,seg.Z=~I(maturity-10),psi = list(I(maturity-10) ="
I do not think I am using the correct method to solve my problem...
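For what it's worth, the "unexpected '='" error is plain R syntax rather than anything specific to segmented: a call such as I(maturity - 10) cannot appear to the left of = inside list(). A quoted name at least parses, though whether segmented then accepts it is a separate question:
# list(I(maturity - 10) = ...) is a parse error; backticks make a valid name
psi_list <- list(`I(maturity - 10)` = c(15, 20, 30))
# equivalently, build the list first and name it afterwards
psi_list <- setNames(list(c(15, 20, 30)), "I(maturity - 10)")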
I am running SHAP from the library shapper in R for classification-model interpretation on a Keras CNN model:
library(keras)
library("shapper")
library("DALEX")
I made a simple reproducible example:
mdat.train <- cbind(rep(1:2, each = 5), matrix(c(1:30), ncol = 3, byrow = TRUE))
train.conv <- array_reshape(mdat.train[,-1], c(nrow(mdat.train[,-1]), ncol(mdat.train[,-1]), 1))
mdat.test <- cbind(rep(1:2, each = 3), matrix(c(1:18), ncol = 3, byrow = TRUE))
test.conv <- array_reshape(mdat.test[,-1], c(nrow(mdat.test[,-1]), ncol(mdat.test[,-1]), 1))
My CNN model
model.CNN <- keras_model_sequential()
model.CNN %>%
layer_conv_1d(filters=16L, kernel_initializer=initializer_he_normal(seed=NULL), kernel_size=2L, input_shape = c(dim(train.conv)[[2]],1)) %>%
layer_batch_normalization() %>%
layer_activation_leaky_relu() %>%
layer_flatten() %>%
layer_dense(50, activation ="relu") %>%
layer_dropout(rate=0.5) %>%
layer_dense(units=2, activation ='sigmoid')
model.CNN %>% compile(
loss = loss_binary_crossentropy,
optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-08),
metrics = c("accuracy"))
model.CNN %>% fit(
train.conv, mdat.train[,1], epochs = 5, verbose = 1)
My SHAP command:
p_function <- function(model, data) predict(model.CNN, test.conv, type = "prob")
exp_cnn <- explain(model.CNN, data = train.conv)
ive_cnn <- shap(exp_cnn, data = train.conv, new_observation = test.conv, predict_function = p_function)
I am getting this error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: operands could not be broadcast together with shapes (2,6) (10,)
Detailed traceback:
File "/.local/lib/python3.6/site-packages/shap/explainers/kernel.py", line 120, in __init__
self.fnull = np.sum((model_null.T * self.data.weights).T, 0)
The problem you've presented has two parts. First of all, the error shown comes from a typo in your code: the p_function you posted calls global objects (model.CNN, test.conv) instead of the ones passed in. That's why you saw that error.
But to my surprise, I found the package still did not work even after fixing that mistake. Let me explain the cause and the solution.
I have to say that 3D arrays are not common in R, so the shapper package does not support that type of training data. It assumes a data.frame at the beginning of the task (because it iterates over variables). To be honest, it took me about two hours to find the reason it was not working, as well as a solution.
First of all we need new variables that are understandable for shapper.
shapper_data <- as.data.frame(train.conv)
shapper_new_obs <- as.data.frame(test.conv)[1,]
as well as new predict_function
p_function <- function(model, data) {
mat <- as.matrix(data)
mat <- array_reshape(mat, c(nrow(data), ncol(data), 1))
predict(model, mat, type = "prob")
}
The two new lines convert the data.frame back into a properly shaped array.
Then the line
ive_cnn <- individual_variable_effect(x = model.CNN, data = shapper_data, new_observation = shapper_new_obs, predict_function = p_function)
works perfectly fine for me.
Best
Szymon
I do not have a clear idea of how labels for the softmax classifier should be shaped.
What I could understand from my experiments is that a scalar label indicating the index of the class probability output is one option, while another is a 2D label where the rows are class probabilities, i.e. a one-hot encoded variable like c(1, 0, 0).
What puzzles me, though, is that:
I can use scalar label values that go beyond indexing, like 4 in my example below, without warning or error. Why is that?
When my label is a negative scalar or an array with a negative value, the model converges to a uniform probability distribution over classes. For example, is it expected that actor_train.y = matrix(c(0, -1, 0), ncol = 1) results in equal probabilities in the softmax output?
I try to use the softmax MXNet classifier to produce policy-gradient reinforcement learning, and my negative rewards lead to the issue above: uniform probability. Is that expected?
require(mxnet)
actor_initializer <- mx.init.Xavier(rnd_type = "gaussian",
factor_type = "avg",
magnitude = 0.0001)
actor_nn_data <- mx.symbol.Variable('data')
actor_nn_label <- mx.symbol.Variable('label')
device.cpu <- mx.cpu()
# NN architecture
actor_fc3 <- mx.symbol.FullyConnected(
data = actor_nn_data
, num_hidden = 3 )
actor_output <- mx.symbol.SoftmaxOutput(
data = actor_fc3
, label = actor_nn_label
, name = 'actor' )
crossentfunc <- function(label, pred) {
  -sum(label * log(pred))
}
actor_loss <- mx.metric.custom(
feval = crossentfunc
, name = "log-loss"
)
# Initialize NN
actor_train.x <- matrix(rnorm(11), nrow = 1)
actor_train.y = 0 #1 #2 #3 #-3 # matrix(c(0, 0, -1), ncol = 1)
rm(actor_model)
actor_model <- mx.model.FeedForward.create(
symbol = actor_output,
X = actor_train.x,
y = actor_train.y,
ctx = device.cpu,
num.round = 100,
array.batch.size = 1,
optimizer = 'adam',
eval.metric = actor_loss,
clip_gradient = 1,
wd = 0.01,
initializer = actor_initializer,
array.layout = "rowmajor" )
predict(actor_model, actor_train.x, array.layout = "rowmajor")
It is quite strange to me, but I found a solution.
I changed the optimizer from optimizer = 'adam' to optimizer = 'rmsprop', and the NN started to converge as expected in the case of negative targets. I ran simulations in R using a simple NN and the optim function and got the same result.
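Concretely, the only change to the training call above is the optimizer argument:
actor_model <- mx.model.FeedForward.create(
  symbol = actor_output,
  X = actor_train.x,
  y = actor_train.y,
  ctx = device.cpu,
  num.round = 100,
  array.batch.size = 1,
  optimizer = 'rmsprop',  # was 'adam'
  eval.metric = actor_loss,
  clip_gradient = 1,
  wd = 0.01,
  initializer = actor_initializer,
  array.layout = "rowmajor")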
It looks like adam or SGD may be buggy, or something of the sort, in the case of multinomial classification... I also used to get stuck on the fact that those optimizers did not converge to a perfect solution on even a single example, while rmsprop does! Be aware!
I'm trying out the Keras package in R by following this tutorial on forecasting temperature. However, the tutorial gives no explanation of how to predict with the trained RNN model, and I wonder how to do this. To train a model I used the following code copied from the tutorial:
dir.create("~/Downloads/jena_climate", recursive = TRUE)
download.file(
"https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip",
"~/Downloads/jena_climate/jena_climate_2009_2016.csv.zip"
)
unzip(
"~/Downloads/jena_climate/jena_climate_2009_2016.csv.zip",
exdir = "~/Downloads/jena_climate"
)
library(readr)
data_dir <- "~/Downloads/jena_climate"
fname <- file.path(data_dir, "jena_climate_2009_2016.csv")
data <- read_csv(fname)
data <- data.matrix(data[,-1])
train_data <- data[1:200000,]
mean <- apply(train_data, 2, mean)
std <- apply(train_data, 2, sd)
data <- scale(data, center = mean, scale = std)
generator <- function(data, lookback, delay, min_index, max_index,
shuffle = FALSE, batch_size = 128, step = 6) {
if (is.null(max_index))
max_index <- nrow(data) - delay - 1
i <- min_index + lookback
function() {
if (shuffle) {
rows <- sample(c((min_index+lookback):max_index), size = batch_size)
} else {
if (i + batch_size >= max_index)
i <<- min_index + lookback
rows <- c(i:min(i+batch_size, max_index))
i <<- i + length(rows)
}
samples <- array(0, dim = c(length(rows),
lookback / step,
dim(data)[[-1]]))
targets <- array(0, dim = c(length(rows)))
for (j in 1:length(rows)) {
indices <- seq(rows[[j]] - lookback, rows[[j]],
length.out = dim(samples)[[2]])
samples[j,,] <- data[indices,]
targets[[j]] <- data[rows[[j]] + delay,2]
}
list(samples, targets)
}
}
lookback <- 1440
step <- 6
delay <- 144
batch_size <- 128
train_gen <- generator(
data,
lookback = lookback,
delay = delay,
min_index = 1,
max_index = 200000,
shuffle = TRUE,
step = step,
batch_size = batch_size
)
val_gen = generator(
data,
lookback = lookback,
delay = delay,
min_index = 200001,
max_index = 300000,
step = step,
batch_size = batch_size
)
test_gen <- generator(
data,
lookback = lookback,
delay = delay,
min_index = 300001,
max_index = NULL,
step = step,
batch_size = batch_size
)
# How many steps to draw from val_gen in order to see the entire validation set
val_steps <- (300000 - 200001 - lookback) / batch_size
# How many steps to draw from test_gen in order to see the entire test set
test_steps <- (nrow(data) - 300001 - lookback) / batch_size
library(keras)
model <- keras_model_sequential() %>%
layer_flatten(input_shape = c(lookback / step, dim(data)[-1])) %>%
layer_dense(units = 32, activation = "relu") %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(),
loss = "mae"
)
history <- model %>% fit_generator(
train_gen,
steps_per_epoch = 500,
epochs = 20,
validation_data = val_gen,
validation_steps = val_steps
)
I tried to predict the temperature with the code below. If I am correct, this should give me the normalized predicted temperature for every batch. So when I denormalize the values and average them, I get the predicted temperature. Is this correct, and if so, for which time is the prediction made (latest observation time + delay)?
prediction.set <- test_gen()[[1]]
prediction <- predict(model, prediction.set)
Also, what is the correct way to use keras::predict_generator() and the test_gen() function? If I use the following code:
model %>% predict_generator(generator = test_gen,
steps = test_steps)
it gives this error:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Error when checking model input: the list of Numpy
arrays that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), but instead got the following list of 2 arrays:
[array([[[ 0.50394005, 0.6441838 , 0.5990761 , ..., 0.22060473,
0.2018686 , -1.7336458 ],
[ 0.5475698 , 0.63853574, 0.5890239 , ..., -0.45618412,
-0.45030192, -1.724062...
Note: my familiarity with R syntax is very limited, so unfortunately I can't give you an answer using R. Instead, I am using Python in my answer. I hope you can easily translate my words, at least, back to R.
... If I am correct, this should give me the normalized predicted
temperature for every batch.
Yes, that's right. The predictions would be normalized since you have trained it with normalized labels:
data <- scale(data, center = mean, scale = std)
Therefore, you would need to denormalize the values using the computed mean and std to find the real predictions:
pred = model.predict(test_data)
denorm_pred = pred * std + mean
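In R, under the same assumptions (mean and std are the per-column statistics computed earlier, and column 2 of the data holds the temperature target), that would look roughly like:
pred <- predict(model, prediction.set)
denorm_pred <- pred * std[[2]] + mean[[2]]  # hypothetical translation for the temperature column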
... for which time is then predicted (latest observation time +
delay?)
That's right. Concretely, since in this particular dataset a new observation is recorded every ten minutes and you have set delay = 144, the predicted value is the temperature 24 hours ahead (i.e. 144 * 10 = 1440 minutes = 24 hours) of the last given observation.
Also, what is the correct way to use keras::predict_generator() and
the test_gen() function?
predict_generator takes a generator that outputs only test samples, not labels (we don't need labels when performing prediction; labels are needed when training, i.e. fit_generator(), and when evaluating the model, i.e. evaluate_generator()). That's why the error says it expected one array instead of two. So you need to define a generator that gives only test samples; one alternative, in Python, is to wrap your existing generator inside another function that yields only the input samples (I don't know whether you can do this in R or not):
def pred_generator(gen):
for data, labels in gen:
yield data # discards labels
preds = model.predict_generator(pred_generator(test_generator), number_of_steps)
You need to provide one other argument, which is the number of steps the generator must run to cover all the samples in the test data. Actually we have num_steps = total_number_of_samples / batch_size. For example, if you have 1000 samples and the generator generates 10 samples each time, you need to run the generator for 1000 / 10 = 100 steps.
Bonus: To see how good your model performs you can use evaluate_generator using the existing test generator (i.e. test_gen):
loss = model.evaluate_generator(test_gen, number_of_steps)
The given loss is also normalized and to denormalize it (to get a better sense of prediction error) you just need to multiply it by std (you don't need to add mean since you are using mae, i.e. mean absolute error, as the loss function):
denorm_loss = loss * std
This would tell you how much your predictions are off on average. For example, if you are predicting the temperature, a denorm_loss of 5 means that the predictions are on average 5 degrees off (i.e. are either less or more than the actual value).
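Again in R, under the same column-2 assumption, that is roughly:
denorm_loss <- loss * std[[2]]  # loss as returned by evaluate_generator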
Update: For prediction, you can define a new generator using an existing generator in R like this:
pred_generator <- function(gen) {
function() { # wrap it in a function to make it callable
gen()[1] # call the given generator and get the first element (i.e. samples)
}
}
preds <- model %>%
predict_generator(
generator = pred_generator(test_gen), # pass test_gen directly to pred_generator without calling it
steps = test_steps
)
And the bonus evaluation, translated to R:
evaluate_generator(model, test_gen, test_steps)
I am trying to train a neural network for churn prediction with R package neuralnet. Here is the code:
data <- read.csv('C:/PredictChurn.csv')
maxs <- apply(data, 2, max)
mins <- apply(data, 2, min)
scaled_temp <- as.data.frame(scale(data, center = mins, scale = maxs - mins))
scaled <- data
scaled[, -c(1)] <- scaled_temp[, -c(1)]
index <- sample(1:nrow(data),round(0.75*nrow(data)))
train_ <- scaled[index,]
test_ <- scaled[-index,]
library(neuralnet)
n <- names(train_[, -c(1)])
f <- as.formula(paste("CHURNED_F ~", paste(n[!n %in% "CHURNED_F"], collapse = " + ")))
nn <- neuralnet(f,data=train_,hidden=c(5),linear.output=F)
It works as it should; however, when training with the full data set (in the range of millions of rows) it just takes too long. I know R is single-threaded by default, so I have tried researching how to parallelize the work across all the cores. Is it even possible to run this function in parallel?
Has anyone been able to do this?
It doesn't have to be the neuralnet package, any solution that lets me train a neural network would work.
Thank you
I have had good experiences with the package Rmpi, and it may be applicable in your case too.
library(Rmpi)
Briefly, its usage is as follows:
nproc = 4 # could be automatically determined
# Specify one master and nproc-1 slaves
Rmpi::mpi.spawn.Rslaves(nslaves = nproc - 1)
# Execute function "func_to_be_parallelized" on multiple CPUs; pass two variables to function
my_fast_results = Rmpi::mpi.parLapply(var1_passed_to_func,
func_to_be_parallelized,
var2_passed_to_func)
# Close slaves
Rmpi::mpi.close.Rslaves(dellog=T)
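Applied to your case, one hedged sketch is to fit one neuralnet per chunk of rows in parallel. Note this parallelizes across data chunks (an ensemble-style workaround), not within a single fit; the chunking and the fit_chunk function are my own assumptions, reusing f and train_ from your question:
library(Rmpi)
library(neuralnet)
nproc <- 4
Rmpi::mpi.spawn.Rslaves(nslaves = nproc - 1)
# Split the training rows into nproc roughly equal chunks
chunks <- split(train_, cut(seq_len(nrow(train_)), nproc, labels = FALSE))
# Each slave loads neuralnet and fits its own chunk
fit_chunk <- function(chunk, formula) {
  library(neuralnet)
  neuralnet(formula, data = chunk, hidden = c(5), linear.output = FALSE)
}
fits <- Rmpi::mpi.parLapply(chunks, fit_chunk, f)
Rmpi::mpi.close.Rslaves(dellog = TRUE)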
You can try using the caret and doParallel packages for this. This is what I have been using; it works for some model types but may not work for all.
library(caret)      # model training framework
library(doParallel) # parallel backend for caret
library(tictoc)     # provides tic()/toc() used below
layer1 = c(6,12,18,24,30)
layer2 = c(6,12,18,24,30)
layer3 = c(6,12,18,24,30)
cv.folds = 5
# In order to make models fully reproducible when using parallel processing, we need to pass seeds as a parameter
# https://stackoverflow.com/questions/13403427/fully-reproducible-parallel-models-using-caret
total.param.permutations = length(layer1) * length(layer2) * length(layer3)
seeds <- vector(mode = "list", length = cv.folds + 1)
set.seed(1)
for(i in 1:cv.folds) seeds[[i]] <- sample.int(n = 1, total.param.permutations, replace = TRUE)
seeds[[cv.folds + 1]] <- sample.int(1, 1, replace = TRUE) # for the last model
nn.grid <- expand.grid(layer1 = layer1, layer2 = layer2, layer3 = layer3)
cl <- makeCluster(max(1, floor(detectCores() * 0.5))) # use 50% of cores only, leave rest for other tasks
registerDoParallel(cl)
train_control <- caret::trainControl(method = "cv"
,number=cv.folds
,seeds = seeds # user defined seeds for parallel processing
,verboseIter = TRUE
,allowParallel = TRUE
)
tic("Total Time to NN Training: ")
set.seed(1)
model.nn.caret = caret::train(form = formula,
data = scaled.train.data,
method = 'neuralnet',
tuneGrid = nn.grid,
trControl = train_control
)
toc()
# Stop the cluster only after training has finished
stopCluster(cl)
registerDoSEQ()