Regression using MXNet in R with image recognition - r

So I am trying to use image recognition to output a regression-style number with a CNN, using the mxnet package in R.
I have used this as the basis of my analysis: https://rstudio-pubs-static.s3.amazonaws.com/236125_e0423e328e4b437888423d3821626d92.html
That analysis is an image-recognition CNN built with mxnet in R, so I followed the same preprocessing steps (resizing, grayscaling) to prepare my data.
My "image" dataset looks like this: I have 784 columns of pixels, and the last column is a numeric "label" column that I am trying to predict, with values such as 1132, 1491, 845, etc.
From there, I create training and testing sets:
library(pbapply)
library(caret)
## test/training partitions
training_index <- createDataPartition(image$STOPPING_TIME, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- image[training_index,]
dim(train_set)
test_set <- image[-training_index,]
dim(test_set)
## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])
train_y <- train_data[,785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_x))
test_data <- data.matrix(test_set)
test_x <- t(test_set[,-785])
test_y <- test_set[,785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
Now I get to the mxnet part, which is what is causing problems; I am not sure what I am doing wrong:
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2,2 ))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2)
## Output
label <- mx.symbol.Variable("label")
NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
## Set seed for reproducibility
mx.set.seed(100)
## Train on 1200 samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     initializer = mx.init.uniform(0.002),
                                     learning.rate = 0.05,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
I get the error:
[00:30:08] D:\Program Files (x86)\Jenkins\workspace\mxnet\mxnet\dmlc-core\include\dmlc/logging.h:308: [00:30:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:134: Check failed: oshape.Size() == dshape.Size() (100 vs. 200) Target shape size is different to source. Target: (100,)
Source: (100,2)
Error in symbol$infer.shape(list(...)) :
Error in operator reshape9: [00:30:08] d:\program files (x86)\jenkins\workspace\mxnet\mxnet\src\operator\tensor\./matrix_op-inl.h:134: Check failed: oshape.Size() == dshape.Size() (100 vs. 200) Target shape size is different to source. Target: (100,)
Source: (100,2)
I can get it to work if I use
NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
and keep the rmse metric, but the model's performance does not improve after 30 iterations.
Thanks!

Your last fully connected layer fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 2) creates an output of shape (batch_size, 2); reshaping it results in a tensor of size 2 * batch_size.
Then you are doing (mx.symbol.Reshape(fcl_2, shape = 0) - label), i.e. you are trying to subtract tensors of the following shapes: (200) - (100), which cannot work.
Instead what you likely want to do is change your last fully connected layer to have only one hidden unit fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1), as you say that you are trying to learn a network that predicts a single scalar output.
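For reference, a minimal sketch of the corrected output head (the MakeLoss construction is taken from the question; the LinearRegressionOutput line is an assumed, simpler alternative):
## One hidden unit, so the output has shape (batch_size, 1) and matches the label
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
label <- mx.symbol.Variable("label")
NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
## Alternatively, mxnet's built-in squared-error regression head:
## NN_model <- mx.symbol.LinearRegressionOutput(data = fcl_2, label = label)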

Related

Raster predictions do not reproduce session to session when factor variable is included in model

This question is related to one I posted a year and a half ago: Reproducibility of results from predict() function - raster package. But since that question did not have an example, I have created this new one with updated information.
I have a somewhat obscure issue with reproducing my predictions to a raster. I am creating a gbm model with numerical variables and a single factor variable. I then use the raster package to predict to a raster using my trained model. The predictions vary session to session, but reproduce within a single R session. If I remove the factor variable, the results reproduce session to session. Also, in my below example, if I have more factor levels in the training data than in the raster variable version, I can get it to reproduce session to session. What causes this and how can I reproduce my results session to session while including the factor variable?
# This code will not reproduce session to session, but it does if I leave many factor levels
# in newwine using the commented-out line below
library(breakDown)
library(gbm)
library(dplyr)
library(raster)
# leave in many levels and code will reproduce session to session
#newwine <- wine[1:500,c(1:3,6)]
# specify only levels which are in the below raster and code will not reproduce session to session
newwine <- wine[, c(1:3, 6)] %>%
  filter(free.sulfur.dioxide == 3 | free.sulfur.dioxide == 10 | free.sulfur.dioxide == 15 |
         free.sulfur.dioxide == 37 | free.sulfur.dioxide == 76)
head(newwine)
# make free.sulfur.dioxide as factor variable
newwine$free.sulfur.dioxide <- as.factor(newwine$free.sulfur.dioxide)
levels(newwine$free.sulfur.dioxide)
set.seed(123)
model <- gbm(fixed.acidity ~ ., data = newwine,
             distribution = "gaussian",
             bag.fraction = 0.50,
             n.trees = 1000,
             interaction.depth = 16,
             shrinkage = 0.016,
             n.minobsinnode = 10, verbose = FALSE)
summary(model)
plot(model, i.var = 3, n.trees = 1000)
# make some rasters for the predictor variables
free.sulfur.dioxide <- c(rep(3,times=10), rep(10, times = 10),
rep(15, times = 10), rep(37, times = 10),
rep(76, times = 10))
free.sulfur.dioxide.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(free.sulfur.dioxide.r) <- free.sulfur.dioxide
set.seed(123)
volatile.acidity <- newwine %>%
dplyr::select(volatile.acidity) %>%
sample_n(50)
volatile.acidity <- as.vector(volatile.acidity)[,1]
volatile.acidity.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(volatile.acidity.r) <- volatile.acidity
set.seed(123)
citric.acid <- newwine %>%
dplyr::select(citric.acid) %>%
sample_n(50)
citric.acid <- as.vector(citric.acid)[,1]
citric.acid.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(citric.acid.r) <- citric.acid
# create a raster stack
r <- stack(free.sulfur.dioxide.r, volatile.acidity.r, citric.acid.r)
names(r) <- c("free.sulfur.dioxide", "volatile.acidity", "citric.acid")
###########################################################################################################################
# predict to a raster with raster predict
pred <- predict(r, model, n.trees = model$n.trees, format="GTiff")
writeRaster(pred, "prediction1.tif", overwrite = TRUE)
###########################################################################################################################
# close the session and reopen, run until line 61, then run below to make a new prediction, called prediction 2
pred <- predict(r, model, n.trees = model$n.trees, format="GTiff")
writeRaster(pred, "prediction2.tif", overwrite = TRUE)
# read in the previous prediction
prediction1 <- raster("prediction1.tif")
prediction2 <- raster("prediction2.tif")
# compare rasters built across sessions
compareRaster(prediction1, prediction2, values = TRUE)
summary(prediction1-prediction2)
# compare rasters built within same session
pred2 <- predict(r, model, n.trees = model$n.trees, format="GTiff")
compareRaster(pred, pred2, values = TRUE)
However, the below code does not use the factor variable and will reproduce session to session.
### Same exercise but without setting the free sulfur dioxide to factor
## this code will reproduce session to session
library(breakDown)
library(gbm)
library(dplyr)
library(raster)
newwine <- wine[1:500,c(1:3)]
head(newwine)
set.seed(123)
model <- gbm(fixed.acidity ~ ., data = newwine,
             distribution = "gaussian",
             bag.fraction = 0.50,
             n.trees = 1000,
             interaction.depth = 16,
             shrinkage = 0.016,
             n.minobsinnode = 10, verbose = FALSE)
summary(model)
set.seed(123)
volatile.acidity <- newwine %>%
dplyr::select(volatile.acidity) %>%
sample_n(50)
volatile.acidity <- as.vector(volatile.acidity)[,1]
volatile.acidity.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(volatile.acidity.r) <- volatile.acidity
set.seed(123)
citric.acid <- newwine %>%
dplyr::select(citric.acid) %>%
sample_n(50)
citric.acid <- as.vector(citric.acid)[,1]
citric.acid.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(citric.acid.r) <- citric.acid
# create a raster stack
r <- stack( volatile.acidity.r, citric.acid.r)
names(r) <- c( "volatile.acidity", "citric.acid")
#######################################################################################################################
# predict to a raster with raster predict
pred <- predict(r, model, n.trees = model$n.trees, format="GTiff")
writeRaster(pred, "prediction1.tif", overwrite = TRUE)
#######################################################################################################################
# close the session and reopen to make a new prediction, called prediction 2
pred <- predict(r, model, n.trees = model$n.trees, format="GTiff")
writeRaster(pred, "prediction2.tif", overwrite = TRUE)
# read in the previous prediction
prediction1 <- raster("prediction1.tif")
prediction2 <- raster("prediction2.tif")
# compare rasters built across sessions
compareRaster(prediction1, prediction2, values = TRUE)
summary(prediction1-prediction2)
# compare rasters built within same session
pred2 <- predict(r, model, n.trees = model$n.trees, format="GTiff")
compareRaster(pred, pred2, values = TRUE)
summary(pred-pred2)
It appears that this issue is not due to the raster package but to the gbm package. After some digging, I discovered that the gbm package was orphaned in March 2017 and that there is a new gbm package, called gbm3, on GitHub (not yet available on CRAN): https://github.com/gbm-developers/gbm3. When you predict to the raster, raster uses whatever predict method your model class calls for (e.g. predict.gbm() for gbm and predict.GBMFit() for gbm3). It seems predict.gbm() simply does not handle factors coming from rasters correctly. It may or may not be a bug, but in either case the package is no longer being maintained. gbm3 does the trick and is reproducible.
# This code will reproduce session to session for the gbm3 model, but not for old gbm model
library(breakDown)
# install gbm3 from github
library(gbm3)
library(dplyr)
library(raster)
# specify only levels which are in the below raster
newwine <- wine[, c(1:3, 6)] %>%
  filter(free.sulfur.dioxide == 3 | free.sulfur.dioxide == 10 | free.sulfur.dioxide == 15 |
         free.sulfur.dioxide == 37 | free.sulfur.dioxide == 76)
head(newwine)
# make free.sulfur.dioxide as factor variable
newwine$free.sulfur.dioxide <- as.factor(newwine$free.sulfur.dioxide)
levels(newwine$free.sulfur.dioxide)
#set.seed(123)
# model <- gbm(fixed.acidity ~ ., data = newwine, #gbm.fit(x = newwine[,2:4], y = newwine[,1],
# distribution = "gaussian",
# bag.fraction = 0.50,
# n.trees = 1000,
# interaction.depth = 16,
# shrinkage = 0.016,
# n.minobsinnode = 10, verbose = FALSE)
set.seed(123)
model <- gbmt(fixed.acidity ~ ., data = newwine, distribution = gbm_dist("Gaussian"))
summary(model)
plot(model, var_index = 3, num_trees = 1000)
# make some rasters for the predictor variables
free.sulfur.dioxide <- c(rep(3,times=10), rep(10, times = 10),
rep(15, times = 10), rep(37, times = 10),
rep(76, times = 10))
free.sulfur.dioxide.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(free.sulfur.dioxide.r) <- free.sulfur.dioxide
set.seed(123)
volatile.acidity <- newwine %>%
dplyr::select(volatile.acidity) %>%
sample_n(50)
volatile.acidity <- as.vector(volatile.acidity)[,1]
volatile.acidity.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(volatile.acidity.r) <- volatile.acidity
set.seed(123)
citric.acid <- newwine %>%
dplyr::select(citric.acid) %>%
sample_n(50)
citric.acid <- as.vector(citric.acid)[,1]
citric.acid.r <- raster(ext = extent(-10, 5, -10, 5), nrows = 5, ncols = 10)
values(citric.acid.r) <- citric.acid
# create a raster stack
r <- stack(free.sulfur.dioxide.r, volatile.acidity.r, citric.acid.r)
names(r) <- c("free.sulfur.dioxide", "volatile.acidity", "citric.acid")
###########################################################################################################################
# predict to a raster with raster predict
pred <- raster::predict(r, model, n.trees = 2000, format="GTiff")
writeRaster(pred, "prediction1.tif", overwrite = TRUE)
# predict to a vector with predict
v <- values(r)
v <- data.frame(v)
v$free.sulfur.dioxide <- as.factor(v$free.sulfur.dioxide)
vpred <- predict(model, v, n.trees = 2000)
write.table(vpred, "vector_predict.txt", row.names = FALSE, col.names = TRUE)
###########################################################################################################################
# close the session and reopen, run until #### line, then run below to make a new prediction, called prediction 2
pred <- raster::predict(r, model, n.trees = 2000, format="GTiff")
writeRaster(pred, "prediction2.tif", overwrite = TRUE)
# predict to a vector with predict
v <- values(r)
v <- data.frame(v)
v$free.sulfur.dioxide <- as.factor(v$free.sulfur.dioxide)
vpred <- predict(model, v, n.trees = 2000)
write.table(vpred, "vector_predict2.txt", row.names = FALSE, col.names = TRUE)
# read in the previous prediction
prediction1 <- raster("prediction1.tif")
prediction2 <- raster("prediction2.tif")
# compare rasters built across sessions
compareRaster(prediction1, prediction2, values = TRUE)
summary(prediction1-prediction2)
# compare rasters built within same session
pred2 <- raster::predict(r, model, n.trees = 2000, format="GTiff", factors = f)
compareRaster(pred, pred2, values = TRUE)
# compare the vector predictions
p1 <- read.delim("vector_predict.txt")
p2 <- read.delim("vector_predict2.txt")
plot(p1$x,p2$x)
summary(p1$x - p2$x)
This is not a solution, but an attempt at closing in on the problem. It seems to me that this is not related to raster.
When I do:
v <- values(r)
pred <- predict(model, data.frame(v), n.trees = model$n.trees)
rpred <- predict(r, model, n.trees = model$n.trees)
Quit, save the session, start a new session and do:
library(gbm)
library(raster)
pred2 <- predict(model, data.frame(v), n.trees = model$n.trees )
rpred2 <- predict(r, model, n.trees = model$n.trees)
I see that the values of pred and pred2 are not quite the same (see plot(pred, pred2)). However, the values of pred2 and rpred2 are the same: plot(values(rpred2), pred2).
Alternatively, when I save pred (saveRDS(pred, 'pred.rds')) and load it into a fresh session (pred1 <- readRDS('pred.rds')), the results are again not quite the same as the newly computed predictions.
It suggests to me that there is some randomization going on somewhere in gbm that is not controlled by set.seed.
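A minimal cross-session check along these lines (the file name and the two-session workflow are assumptions; it simply restates the comparison described above):
## Session 1: compute and save the non-raster predictions
v <- data.frame(values(r))
p1 <- predict(model, v, n.trees = model$n.trees)
saveRDS(p1, "pred_session1.rds")
## Session 2 (after restarting R, refitting model, and rebuilding r): recompute and compare
p2 <- predict(model, v, n.trees = model$n.trees)
all.equal(readRDS("pred_session1.rds"), p2)  # reportedly not TRUE when a factor predictor is used with gbm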

Training mxnet:mx.mlp

I am trying to reproduce an example from ND Lewis: Neural Networks for time series forecasting with R. If I include the device argument I get the error:
Error in mx.opt.sgd(...) :
unused argument (device = list(device = "cpu", device_id = 0, device_typeid = 1))
In addition: Warning message:
In mx.model.select.layout.train(X, y) :
Auto detect layout of input matrix, use rowmajor..
If I remove this parameter, I still get this warning:
Warning message:
In mx.model.select.layout.train(X, y) :
Auto detect layout of input matrix, use rowmajor..
The code is:
library(zoo)
library(quantmod)
library(mxnet)
# data
data("ecoli", package = "tscount")
data <- ecoli$cases
data <- as.zoo(ts(data, start = c(2001, 1), end = c(2013, 20), frequency = 52))
xorig <- do.call(cbind, lapply((1:4), function(x) as.zoo(Lag(data, k = x))))
xorig <- cbind(xorig, data)
xorig <- xorig[-(1:4), ]
# normalization
range_data <- function(x) {
(x - min(x))/(max(x) - min(x))
}
xnorm <- data.matrix(xorig)
xnorm <- range_data(xnorm)
# test/train
y <- xnorm[, 5]
x <- xnorm[, -5]
n_train <- 600
x_train <- x[(1:n_train), ]
y_train <- y[(1:n_train)]
x_test <- x[-(1:n_train), ]
y_test <- y[-(1:n_train)]
# mxnet:
mx.set.seed(2018)
model1 <- mx.mlp(x_train,
                 y_train,
                 hidden_node = c(10, 2),
                 out_node = 1,
                 activation = "sigmoid",
                 out_activation = "rmse",
                 num.round = 100,
                 array.batch.size = 20,
                 learning.rate = 0.07,
                 momentum = 0.9
                 #, device = mx.cpu()
                 )
pred1_train <- predict(model1, x_train, ctx = mx.cpu())
How can I fix this?
Regarding the second warning message: MXNet is trying to detect whether your input matrix is row-major or column-major based on its shape: https://github.com/apache/incubator-mxnet/blob/424143ac47ab3a38ae8aedaeb3319379887de0bc/R-package/R/model.R#L329
For the unused argument error with device = mx.cpu(): the argument name should be ctx rather than device.
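A sketch of the corrected call (only the device argument changes; everything else stays as in the question):
mx.set.seed(2018)
model1 <- mx.mlp(x_train, y_train,
                 hidden_node = c(10, 2), out_node = 1,
                 activation = "sigmoid", out_activation = "rmse",
                 num.round = 100, array.batch.size = 20,
                 learning.rate = 0.07, momentum = 0.9,
                 ctx = mx.cpu())  # pass the device as 'ctx', not 'device'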

Image Recognition with Scalar output using CNN MXnet in R

So I am trying to use image recognition with the mxnet package in R, using a CNN to predict a scalar output (in my case, wait time) from an image.
However, when I do this, I get the same output every time (it predicts the same number, which is probably just the average of all of the labels). How do I get it to predict the scalar output correctly?
Also, my images have already been pre-processed by greyscaling them and converting them into the pixel format below.
I am essentially using images to predict wait times, which is why my train_y is the current wait time in seconds and why I didn't convert it into a [0,1] range. I would prefer a regression-type output, or some other kind of scalar output that gives the predicted wait time based on the image.
What other ways would you recommend to tackle this problem? I am not sure if my approach is correct.
Here is my reproducible code:
library(caret)  # for createDataPartition
set.seed(0)
df <- data.frame(replicate(784,runif(7538)))
df$waittime <- 1000*runif(7538)
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- df[training_index,]
dim(train_set)
test_set <- df[-training_index,]
dim(test_set)
## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[, -785])
train_y <- train_data[,785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))
test_data <- data.matrix(test_set)
test_x <- t(test_set[,-785])
test_y <- test_set[,785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2,2 ))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5,5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)
## Device used. Sadly not the GPU :-(
#device <- mx.gpu
#Didn't work well, predicted same number continuously regardless of image
## Train on 1200 samples
model <- mx.model.FeedForward.create(NN_model, X = train_array, y = train_y,
                                     # ctx = device,
                                     num.round = 30,
                                     array.batch.size = 100,
                                     initializer = mx.init.uniform(0.002),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse,
                                     epoch.end.callback = mx.callback.log.train.metric(100))
pred <- predict(model, test_array)
#gives the same numeric output
Just modify your code a little: put train_y into [0, 1] as well, and use initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34).
library(caret)
set.seed(0)
df <- data.frame(replicate(784, runif(7538)))
df$waittime <- runif(7538)
training_index <- createDataPartition(df$waittime, p = .9, times = 1)
training_index <- unlist(training_index)
train_set <- df[training_index, ]
dim(train_set)
test_set <- df[-training_index, ]
dim(test_set)
## Fix train and test datasets
train_data <- data.matrix(train_set)
train_x <- t(train_data[,-785])
train_y <- train_data[, 785]
train_array <- train_x
dim(train_array) <- c(28, 28, 1, ncol(train_array))
test_data <- data.matrix(test_set)
test_x <- t(test_set[, -785])
test_y <- test_set[, 785]
test_array <- test_x
dim(test_array) <- c(28, 28, 1, ncol(test_x))
library(mxnet)
## Model
mx_data <- mx.symbol.Variable('data')
## 1st convolutional layer 5x5 kernel and 20 filters.
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
tanh_1 <- mx.symbol.Activation(data = conv_1, act_type = "tanh")
pool_1 <- mx.symbol.Pooling(data = tanh_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 2nd convolutional layer 5x5 kernel and 50 filters.
conv_2 <- mx.symbol.Convolution(data = pool_1, kernel = c(5, 5), num_filter = 50)
tanh_2 <- mx.symbol.Activation(data = conv_2, act_type = "tanh")
pool_2 <- mx.symbol.Pooling(data = tanh_2, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))
## 1st fully connected layer
flat <- mx.symbol.Flatten(data = pool_2)
fcl_1 <- mx.symbol.FullyConnected(data = flat, num_hidden = 500)
tanh_3 <- mx.symbol.Activation(data = fcl_1, act_type = "tanh")
## 2nd fully connected layer
fcl_2 <- mx.symbol.FullyConnected(data = tanh_3, num_hidden = 1)
## Output
#NN_model <- mx.symbol.SoftmaxOutput(data = fcl_2)
label <- mx.symbol.Variable("label")
#NN_model <- mx.symbol.MakeLoss(mx.symbol.square(mx.symbol.Reshape(fcl_2, shape = 0) - label))
NN_model <- mx.symbol.LinearRegressionOutput(fcl_2)
mx.set.seed(0)
model <- mx.model.FeedForward.create(NN_model,
                                     X = train_array,
                                     y = train_y,
                                     num.round = 4,
                                     array.batch.size = 64,
                                     initializer = mx.init.Xavier(factor_type = "in", magnitude = 2.34),
                                     learning.rate = 0.00001,
                                     momentum = 0.9,
                                     wd = 0.00001,
                                     eval.metric = mx.metric.rmse)
pred <- predict(model, test_array)
pred[1,1:10]
# [1] 0.4859098 0.4865469 0.5671642 0.5729486 0.5008956 0.4962234 0.4327411 0.5478653 0.5446281 0.5707113
It appears that your network is collapsing, for any of several possible reasons. I would try the following modifications (a sketch of the first two is shown below):
Use ReLU activation instead of tanh. ReLU has proven to be a much more robust activation in conv networks than sigmoid or tanh.
Use batch normalization at the input of your convolutional layers (see the paper here).
Divide your range into sections and use softmax. If you must have regression, consider a separate regression network for each range and select the correct regression net based on the output of the softmax. Cross-entropy loss has shown more success in learning highly non-linear functions.
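A minimal sketch of the first two suggestions applied to the question's first convolutional block (the BatchNorm placement and its default parameters are assumptions, not taken from the original code):
conv_1 <- mx.symbol.Convolution(data = mx_data, kernel = c(5, 5), num_filter = 20)
bn_1   <- mx.symbol.BatchNorm(data = conv_1)                    # normalize the conv output
relu_1 <- mx.symbol.Activation(data = bn_1, act_type = "relu")  # ReLU instead of tanh
pool_1 <- mx.symbol.Pooling(data = relu_1, pool_type = "max", kernel = c(2, 2), stride = c(2, 2))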

Separate Bayesian parameter estimates for multiple groups in JAGS/rjags

I am trying to perform a hierarchical analysis in JAGS, extrapolating from Kruschke's Doing Bayesian Data Analysis, chapter 9. I wish to obtain posterior parameter estimates for the proportion of heads for four coins (thetas 1, 2, 3 and 4) coming from two mints, and also estimates for the average bias of the coins from each mint (mint bias: omega). I have kept the variability of each mint's bias, kappa, constant. The trouble is that I cannot get a posterior estimate for the second mint; it seems to just be sampling the prior. Does anyone know how to fix the model string text (see step 3 below) so as to generate the posterior estimate for the second mint?
The entire script for the analysis is below.
library(rjags)
library(runjags)
library(coda)
############### 1. Generate the data
flips <- c(sample(c(rep(1,3), rep(0,9))), # coin 1, mint 1, 12 flips total
sample(c(rep(1,1), rep(0,4))), # coin 2, mint 1, 5 flips total
sample(c(rep(1,10), rep(0,5))), # coin 1, mint 2, 15 flips
sample(c(rep(1,17), rep(0,6)))) # coin 2, mint 2, 23 flips
coins <- factor(c(rep(1,12), rep(2,5), rep(3, 15), rep(4, 23)))
mints <- factor(c(rep(1,17), rep(2,38)))
nFlips <- length(flips)
nCoins <- length(unique(coins))
nMints <- length(unique(mints))
#################### 2. Pass data into a list
dataList <- list(
flips = flips,
coins = coins,
mints = mints,
nFlips = nFlips,
nCoins = nCoins,
nMints = nMints)
################### 3. specify and save the model
modelString <- "
model{
# start with nested likelihood function
for (i in 1:nFlips) {
flips[i] ~ dbern(theta[coins[i]])
}
# next the prior on theta
for (coins in 1:nCoins) {
theta[coins] ~ dbeta(omega[mints[coins]]*(kappa - 2) + 1, (1 - omega[mints[coins]])*(kappa - 2) + 1)
}
# next we specify the prior for the higher-level parameters on the mint, omega and kappa
for (mints in 1:nMints) {
omega[mints] ~ dbeta(2,2)
}
kappa <- 5
}
"
writeLines(modelString, "tempModelHier4CoinTwoMint.txt")
############################### Step 4: Initialise Chains
initsList <- list(theta1 = mean(flips[coins==1]),
                  theta2 = mean(flips[coins==2]),
                  theta3 = mean(flips[coins==3]),
                  theta4 = mean(flips[coins==4]),
                  omega1 = mean(c(mean(flips[coins==1]),
                                  mean(flips[coins==2]))),
                  omega2 = mean(c(mean(flips[coins==3]),
                                  mean(flips[coins==4]))))
initsList
############################### Step 5: Generate Chains
runJagsOut <- run.jags(method = "simple",
                       model = "tempModelHier4CoinTwoMint.txt",
                       monitor = c("theta[1]", "theta[2]", "theta[3]", "theta[4]", "omega[1]", "omega[2]"),
                       data = dataList,
                       inits = initsList,
                       n.chains = 1,
                       adapt = 500,
                       burnin = 1000,
                       sample = 50000,
                       thin = 1,
                       summarise = FALSE,
                       plots = FALSE)
############################### Step 6: Convert to Coda Object
codaSamples <- as.mcmc.list(runJagsOut)
head(codaSamples)
############################### Step 7: Make Graphs
library(ggplot2)
df <- data.frame(as.matrix(codaSamples))
theta1 <- ggplot(df, aes(x = df$theta.1.)) + geom_density()
theta2 <- ggplot(df, aes(x = df$theta.2.)) + geom_density()
theta3 <- ggplot(df, aes(x = df$theta.3.)) + geom_density()
theta4 <- ggplot(df, aes(x = df$theta.4.)) + geom_density()
omega1 <- ggplot(df, aes(x = df$omega.1.)) + geom_density()
omega2 <- ggplot(df, aes(x = df$omega.2.)) + geom_density()
require(gridExtra)
ggsave("coinsAndMintsHier/hierPropFourCoinsTwoMints.pdf", grid.arrange(theta1, theta2, theta3, theta4, omega1, omega2, ncol = 2), device = "pdf", height = 30, width = 10, units = "cm")
The problem was the way you were attempting to index the mints of the coins when setting the prior on theta. There are only 4 thetas in this case, not nFlips. Your nested indexing mints[coins] was accessing the mints data vector, not a vector of which mint each coin belongs to. I've created a corrected version below. Notice the explicit construction of a vector that indexes which mint each coin belongs to. Notice also that in the model specification each for-loop index has its own name, distinct from the data vector names.
graphics.off() # This closes all of R's graphics windows.
rm(list=ls()) # Careful! This clears all of R's memory!
library(runjags)
library(coda)
#library(rjags)
############### 1. Generate the data
flips <- c(sample(c(rep(1,3), rep(0,9))), # coin 1, mint 1, 12 flips total
sample(c(rep(1,1), rep(0,4))), # coin 2, mint 1, 5 flips total
sample(c(rep(1,10), rep(0,5))), # coin 1, mint 2, 15 flips
sample(c(rep(1,17), rep(0,6)))) # coin 2, mint 2, 23 flips
# NOTE: I got rid of `factor` because it was unneeded and got in the way
coins <- c(rep(1,12), rep(2,5), rep(3, 15), rep(4, 23))
# NOTE: I got rid of `factor` because it was unneeded and got in the way
mints <- c(rep(1,17), rep(2,38))
nFlips <- length(flips)
nCoins <- length(unique(coins))
nMints <- length(unique(mints))
# NEW: Create vector that specifies the mint of each coin. There must be a more
# elegant way to do this, but here is a logical brute-force approach. This
# assumes that coins are consecutively numbered from 1 to nCoins.
mintOfCoin = NULL
for ( cIdx in 1:nCoins ) {
mintOfCoin = c( mintOfCoin , unique(mints[coins==cIdx]) )
}
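# NOTE: A more compact alternative (untested sketch), still assuming coins are numbered
# consecutively from 1 to nCoins: the first flip of each coin identifies its mint, so
#   mintOfCoin <- mints[match(1:nCoins, coins)]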
#################### 2. Pass data into a list
dataList <- list(
flips = flips,
coins = coins,
mints = mints,
nFlips = nFlips,
nCoins = nCoins,
nMints = nMints,
mintOfCoin = mintOfCoin # NOTE
)
################### 3. specify and save the model
modelString <- "
model{
# start with nested likelihood function
for (fIdx in 1:nFlips) {
flips[fIdx] ~ dbern( theta[coins[fIdx]] )
}
# next the prior on theta
# NOTE: Here we use the mintOfCoin index.
for (cIdx in 1:nCoins) {
theta[cIdx] ~ dbeta( omega[mintOfCoin[cIdx]]*(kappa - 2) + 1 ,
( 1 - omega[mintOfCoin[cIdx]])*(kappa - 2) + 1 )
}
# next we specify the prior for the higher-level parameters on the mint,
# omega and kappa
# NOTE: I changed the name of the mint index so it doesn't conflict with
# mints data vector.
for (mIdx in 1:nMints) {
omega[mIdx] ~ dbeta(2,2)
}
kappa <- 5
}
"
writeLines(modelString, "tempModelHier4CoinTwoMint.txt")
############################### Step 4: Initialise Chains
initsList <- list(theta1 = mean(flips[coins==1]),
                  theta2 = mean(flips[coins==2]),
                  theta3 = mean(flips[coins==3]),
                  theta4 = mean(flips[coins==4]),
                  omega1 = mean(c(mean(flips[coins==1]),
                                  mean(flips[coins==2]))),
                  omega2 = mean(c(mean(flips[coins==3]),
                                  mean(flips[coins==4]))))
initsList
############################### Step 5: Generate Chains
runJagsOut <- run.jags(method = "parallel",
                       model = "tempModelHier4CoinTwoMint.txt",
                       # NOTE: theta and omega are vectors:
                       monitor = c("theta", "omega", "kappa"),
                       data = dataList,
                       #inits = initsList, # NOTE: Let JAGS initialize.
                       n.chains = 4, # NOTE: Not only 1 chain.
                       adapt = 500,
                       burnin = 1000,
                       sample = 10000,
                       thin = 1,
                       summarise = FALSE,
                       plots = FALSE)
############################### Step 6: Convert to Coda Object
codaSamples <- as.mcmc.list(runJagsOut)
head(codaSamples)
########################################
## NOTE: Important step --- Check MCMC diagnostics
# Display diagnostics of chain, for specified parameters:
source("DBDA2E-utilities.R") # For function diagMCMC()
parameterNames = varnames(codaSamples) # from coda package
for ( parName in parameterNames ) {
diagMCMC( codaObject=codaSamples , parName=parName )
}
############################### Step 7: Make Graphs
# ...

How to make predictions after every 50 cycles in RSNNS

I am using RSNNS to make a model with the Quickprop algorithm. Here is my neural network:
mydata1 <- read.csv("-1-5_rand1.csv");
mydata <- mydata1[1:151, ]
test_set <- mydata1[152:168, ]
test_set1 <- test_set[c(-7)]
a <- SnnsRObjectFactory()
input <- mydata[c(-7)]
output <- mydata[c(7)]
b <- splitForTrainingAndTest(input, output, ratio = 0.22)
a <- mlp(b$inputsTrain, b$targetsTrain, size = 9, maxit = 650,
         learnFunc = "Quickprop", learnFuncParams = c(0.01, 2.5, 0.0001, 0, 0),
         updateFunc = "Topological_Order", updateFuncParams = c(0.0),
         hiddenActFunc = "Act_TanH", computeError = TRUE,
         initFunc = "Randomize_Weights", initFuncParams = c(-1, 1),
         shufflePatterns = TRUE, linOut = FALSE,
         inputsTest = b$inputsTest, targetsTest = b$targetsTest)
I am predicting using test set as:
predictions <- predict(a, test_set1)
Is it possible in RSNNS to predict on the test set after every 50 cycles, instead of only after all 650 cycles?
You can't do this with the high-level interface, but you can with the low-level interface. Have a look, e.g., at the mlp_irisSnnsR.R demo that is included in RSNNS.
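A rough outline of that approach, following the structure of the mlp_irisSnnsR.R demo (the method names and parameter vectors below are taken from the demo and the question, but this is an untested sketch; check the demo for the exact calls):
library(RSNNS)
snns <- SnnsRObjectFactory()
snns$setLearnFunc("Quickprop")
snns$setUpdateFunc("Topological_Order")
snns$setUnitDefaults(1, 0, 1, 0, 1, "Act_TanH", "Out_Identity")
snns$createNet(c(ncol(input), 9, 1), fullyConnectedFeedForward = TRUE)
patset <- snns$createPatSet(as.matrix(input), as.matrix(output))
snns$setCurrPatSet(patset$set_no)
snns$initializeNet(c(-1, 1))
snns$shufflePatterns(TRUE)
snns$DefTrainSubPat()
for (cycle in 1:650) {
  snns$learnAllPatterns(c(0.01, 2.5, 0.0001, 0, 0))
  if (cycle %% 50 == 0) {
    # predictions on the current pattern set every 50 cycles;
    # to predict on the held-out test set, create and switch to a test pattern set here
    preds <- snns$predictCurrPatSet("output", c(0))
  }
}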
