Saliency maps for individual channels? - r

I trained a CNN for classifying multispectral images using Keras in R with the TensorFlow backend. I want to calculate saliency maps per input band for a single input image. My idea is to take the mean of the saliency map for each channel to find out which input band contributed most to the classification. Is this possible, and if so, how? Everywhere I looked, I found only Python implementations of saliency maps.
Let's assume this is my network, and I want to calculate saliency maps for all three channels of one image so that I know which channel is most important:
library(keras)
# download & load data
cifar <- dataset_cifar10()
# set up model
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = "relu", input_shape = c(32,32,3)) %>%
  layer_max_pooling_2d() %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = "relu") %>%
  layer_max_pooling_2d() %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
# compile model
model %>% compile(
  optimizer = "adam",
  loss = "sparse_categorical_crossentropy",
  metrics = "accuracy"
)
# run model
history <- model %>%
  fit(
    x = cifar$train$x,
    y = cifar$train$y,
    epochs = 10
  )
# pick out one image
test_img <- cifar$test$x[1,,,]
# what now?
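The per-band summary described above can be sketched in base R, under some assumptions. The array grads below is a random stand-in for the gradient of the predicted class score with respect to the input image, because obtaining the real gradient requires a gradient call against the trained model. All names here (grads, saliency, channel_importance, the band labels) are illustrative and not part of the original code.

```r
# Stand-in for the gradient of the predicted class score with respect to
# the input image; shape (height, width, channels) = (32, 32, 3).
# Assumption: in practice this array would be computed from the trained model.
set.seed(1)
grads <- array(rnorm(32 * 32 * 3), dim = c(32, 32, 3))

# A common choice is to use the absolute gradient as the saliency value
# for each pixel and channel.
saliency <- abs(grads)

# Mean saliency per channel: collapse height and width, keep dimension 3.
channel_importance <- apply(saliency, 3, mean)
names(channel_importance) <- c("band_1", "band_2", "band_3")

# Normalise so the shares sum to 1, then pick the band with the largest share.
channel_share <- channel_importance / sum(channel_importance)
most_important <- names(which.max(channel_share))
```

Whether the mean of absolute gradients is a good importance measure for a given sensor is a judgment call; the maximum or a sum of squares per band would be computed the same way by swapping the function passed to apply().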

Related

How to define lstm, units, lags and batch size in keras package

I have a dataset of almost 4,700 entries and need to predict power output. I don't understand some of the LSTM parameters: what are units, how do I select the number of units for my data, and what are data lags? The code I am using is available at https://www.r-bloggers.com/2018/11/lstm-with-keras-tensorflow/; since I am only interested in the LSTM, I am using only that part of the code.
library(keras)
model <- keras_model_sequential()
model %>%
  layer_lstm(units = 100,
             input_shape = c(datalags, 2),
             batch_size = batch.size,
             return_sequences = TRUE,
             stateful = TRUE) %>%
  layer_dropout(rate = 0.5) %>%
  layer_lstm(units = 50,
             return_sequences = FALSE,
             stateful = TRUE) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1)
model %>%
  compile(loss = 'mae', optimizer = 'adam')
In this code, I don't understand the following:
What is meant by units here?
What are datalags?
The code uses a datalags value of 10. How do I choose this value, and how do I set it manually for my own data?
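As background for the questions above: units is the size of the LSTM layer's hidden state (and therefore of its output vector), while datalags is the number of past time steps fed to the network for each prediction. A minimal base R sketch of what a lag window looks like follows, assuming a toy single-feature series and 3 lags instead of the 2 features and 10 lags in the linked code. The names series, lagged, x_arr and y are illustrative.

```r
# Hypothetical toy series standing in for the power output data.
series <- c(10, 11, 13, 16, 20, 25)
datalags <- 3  # number of past time steps used per prediction

# embed() builds one row per window; column 1 is the newest value and the
# remaining columns are the previous values, newest first.
lagged <- embed(series, datalags + 1)

# Target: the value that follows each window.
y <- lagged[, 1]

# Inputs: the datalags preceding values, reordered oldest to newest.
x <- lagged[, (datalags + 1):2, drop = FALSE]

# LSTM layers expect a 3D array: (samples, time steps, features).
x_arr <- array(x, dim = c(nrow(x), datalags, 1))
```

With this layout, input_shape would be c(datalags, 1). The lag length itself is a tuning choice, often guided by how far back the autocorrelation of the series stays meaningful.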

How to make keras utilize all available CPU capacity when training?

I am trying to implement several different neural networks for a regression problem. When I train a single model, I see that my computer doesn't use all the available CPU, which I assume it should in order to make training faster.
Ultimately I want to specify multiple models (around 4) that can be trained simultaneously, but to begin with I just want to use all CPU capacity when training a single model. The screenshot below shows how my CPU is used when I train a model:
[Screenshot: CPU utilization]
In the code below I tried setting use_multiprocessing = TRUE, which I thought could help, but I get an error saying that the argument is not used.
library(keras)
library(dplyr)
epoch <- 50
lr <- 0.1
decay <- lr / epoch
# initialize model
fit_NN4 <- keras_model_sequential() %>%
  layer_flatten(input_shape = training %>% select(-date, -mktcap, -permno, -ret.adj) %>% ncol()) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1)
# compile
fit_NN4 %>% compile(
  loss = "mse", # loss objective function
  #optimizer = optimizer_rmsprop(),
  optimizer = optimizer_sgd(lr = lr, decay = decay),
  metrics = c("mean_absolute_error")
)
# train the model
fit_NN4 %>%
  fit(
    # Training data
    x = training %>% select(-date, -mktcap, -permno, -ret.adj) %>% as.matrix(),
    y = training %>% pull(ret.adj) %>% as.matrix(),
    epochs = epoch,
    # Validation data
    validation_data = list(
      validation %>% select(-date, -mktcap, -permno, -ret.adj) %>% as.matrix(),
      validation %>% pull(ret.adj) %>% as.matrix()
    ),
    # Callbacks
    callbacks = list(
      callback_early_stopping(
        monitor = "val_loss", # early stop objective
        mode = "min", # minimize objective
        verbose = 1, # Return epoch at stop
        patience = 4, # wait for 4 epochs to stop relative to min
        use_multiprocessing = TRUE # attempt to use all CPU capacity (reported as unused)
      )
    )
  )

Multi-input model with generator function - Keras R

I have been trying to build a multi-input model in Keras. One input branch takes images, and the second takes metadata for the corresponding images.
For the images, I need a generator function that feeds in batches of images. The metadata is in tabular form.
Now I am wondering how to pass the data to the model so that each image is processed together with its own metadata. Note that this is a regression task.
The input data I have:
Images in dir1/
A data frame with the paths and features:
path        feature1  feature2  target
image1.jpg  23.5      100       16
image2.jpg  25.0      88        33
The code I have so far:
Generator function for the images:
train_datagen <- image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_dataframe(
  dataframe = joined_path_with_metadata,
  directory = 'data_dir',
  x_col = "path",
  y_col = "train",
  generator = train_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  color_mode = 'rgb',
  class_mode = "sparse"
)
model definition:
vision_model <- keras_model_sequential()
vision_model %>%
  layer_conv_2d(filters = 64,
                kernel_size = c(3, 3),
                activation = 'relu',
                padding = 'same',
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten()
# Now let's get a tensor with the output of our vision model:
image_input <- layer_input(shape = c(150, 150, 3))
encoded_image <- image_input %>% vision_model
# ANN for tabular data
tabular_input <- layer_input(shape = ncol(dataframe), dtype = 'float32')
mlp_model <- tabular_input %>%
  layer_dense(
    units = 16,
    kernel_initializer = "uniform",
    activation = "relu") %>%
  # Dropout to prevent overfitting
  layer_dropout(rate = 0.1) %>%
  layer_dense(
    units = 32,
    kernel_initializer = "uniform",
    activation = "relu")
# concatenate the metadata and the image vector then
# train a linear regression on it
output <- layer_concatenate(c(mlp_model, encoded_image)) %>%
  layer_dense(units = 1, activation='linear')
# This is the final model:
vqa_model <- keras_model(inputs = c(image_input, tabular_input), outputs = output)
compile:
vqa_model %>% compile(
  optimizer = 'adam',
  loss = 'mean_squared_error',
  metrics = c('mean_squared_error')
)
The last step would be to fit the model. I am not sure how to do this so that each row of features is used as the metadata for the matching image when the images are read in batches.
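One possible way to keep images and metadata aligned is a custom generator that builds both inputs from the same rows of the data frame. The sketch below is base R only and rests on several assumptions: meta is a hypothetical table shaped like the one in the question, load_image() is a hypothetical stand-in that returns a blank array instead of decoding a file, and the tabular branch is assumed to take only the two feature columns.

```r
# Hypothetical metadata table mirroring the one in the question.
meta <- data.frame(
  path     = c("image1.jpg", "image2.jpg", "image3.jpg", "image4.jpg"),
  feature1 = c(23.5, 25.0, 21.0, 30.5),
  feature2 = c(100, 88, 95, 70),
  target   = c(16, 33, 20, 41),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for an image loader: returns a 150 x 150 x 3 array.
# A real version would read and rescale the file at the given path.
load_image <- function(path) {
  array(0, dim = c(150, 150, 3))
}

make_generator <- function(meta, batch_size) {
  start <- 1
  function() {
    # Wrap around at the end of the data so the generator can run indefinitely.
    if (start > nrow(meta)) start <<- 1
    rows <- start:min(start + batch_size - 1, nrow(meta))
    start <<- start + batch_size

    # Images and tabular features come from the SAME rows, so they stay aligned.
    images <- array(0, dim = c(length(rows), 150, 150, 3))
    for (i in seq_along(rows)) {
      images[i, , , ] <- load_image(meta$path[rows[i]])
    }
    tabular <- as.matrix(meta[rows, c("feature1", "feature2")])

    # Input order follows keras_model(inputs = c(image_input, tabular_input)).
    list(list(images, tabular), meta$target[rows])
  }
}

gen <- make_generator(meta, batch_size = 2)
batch <- gen()
```

Each call to gen() returns list(inputs, targets), which is the shape a custom R generator is expected to produce for a function such as fit_generator(). Shuffling would need to permute the row indices once per epoch rather than shuffling images and metadata separately.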

Keras neural network not fitting in R

I made a neural network in R using the Keras package. It is essentially the same model I had created in Python, and I used the same data in the same order. However, when I run it in R, the model doesn't seem to fit at all.
When I call predict on the model, it returns the same value regardless of the input.
I'm guessing the weights are zeroing out and it's returning the bias.
Here's how I built the model:
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input = c(18)) %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = 32, activation = 'relu') %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = 16, activation = 'relu') %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = 8, activation = 'relu') %>%
  layer_dense(units = 2, activation = 'softmax')
Here's the output when I call predict:
model %>%
  predict(nbainput_test_x)

How to design keras neuralnet for predicting 2 (+1 not A not B) classes data with 2 classes training

I have a convnet model for binary image classification: cat/dog.
library(keras)
conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)
# Hyperparameter construction
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
img <- image_load('test_image.jpg', target_size = c(150, 150))
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
preds_class <- model %>% predict_classes(x)
model %>% predict(x)
predict(x) gives a single probability, from which we can infer whether the image is a cat or a dog.
I only have training data for two classes: cat and dog.
Is there a way to modify the code, at compile() or in the hyperparameter construction, so that it outputs 3 probabilities:
cat
dog
non_catdog
The third category is everything that is not in class 1 or 2 (cat/dog).
Strategy to design hyperparameter or compile for predicting 2 (+1 other) classes data with 2 class training
I feel your issue may be in the construction of the network:
# Hyperparameter construction
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
Your final layer uses a sigmoid activation, which squashes your output to [0,1]. I think what you are after is a softmax activation, as you have more than 2 classes.
I'm not exactly sure of the Keras syntax, but maybe something along the lines of:
model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 3, activation = "softmax")
As flagged in the comments, the network's loss function will also need to be changed. Binary cross-entropy assumes a single vector of predictions and observations, which is not the case in this architecture.
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)
Update
You are trying to capture 3 possible outcomes: the probability of belonging to class A, to class B, or to neither. Your label vectors should look something like:
Class A = [1, 0, 0]
Class B = [0, 1, 0]
Class C (!(A || B)) = [0, 0, 1]
It might seem logical to encode class C as [0, 0], but that is problematic given how softmax works. Each training case is given a probability of belonging to EACH of the classes. Therefore, a training example of class A could be given a 30% probability of belonging to class B. The class prediction is essentially a vote.
That is, I predict that this example is of class A because it has the highest probability compared to the other class probabilities.
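The label encoding and the "vote" described above can be illustrated in base R. This is a minimal sketch under assumptions: the logits vector is a made-up set of raw outputs for one image, and softmax() is written out by hand rather than taken from Keras.

```r
# Softmax turns raw scores into probabilities that sum to 1.
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}

classes <- c("cat", "dog", "non_catdog")

# One-hot label matrix: row "cat" is [1, 0, 0], row "dog" is [0, 1, 0], etc.
one_hot <- diag(length(classes))
dimnames(one_hot) <- list(classes, classes)

# Hypothetical raw outputs of the final dense layer for one image.
logits <- c(2.0, 1.0, 0.1)
probs <- softmax(logits)
names(probs) <- classes

# The predicted class is the one with the highest probability.
predicted <- names(which.max(probs))

# Categorical cross-entropy for this image if its true label is "cat".
loss <- -sum(one_hot["cat", ] * log(probs))
```

Because the one-hot row has a single 1, the loss reduces to the negative log of the probability assigned to the true class, which is why every class, including the "neither" class, needs its own output unit and its own training examples.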
