LSTM Sequential model: predict future values on an M15 chart for one day - jupyter-notebook

Hello Stackoverflow members,
I have built an LSTM Sequential model for Forex M15 values, specifically for the pair EURUSD, using typical_price as the price type.
Now, after setting up and training the model, I would like to predict (extrapolate) the typical price for one future day.
In my dataset I took one month of data (January 2017, from the 1st to the 30th) as the training and testing set (1920 values). Now I would like to extrapolate the prices for the 31st of January. I cannot really work out what input data and shape the model expects in order to extrapolate from the last value of the 30th of January.
Can someone give me a hint or explain what the function model.predict() needs as input values?
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import time  # helper library
import matplotlib.pyplot as plt
from numpy import newaxis
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential
from keras.metrics import mean_squared_error
# sklearn.cross_validation has been removed; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import MinMaxScaler
df = pd.read_csv('EURUSD15.csv')
df.columns = ['date','time','open','high','low','close','vol']
df['date'] = df['date'].str.replace('.', '-', regex=False)  # e.g. '2017.01.02' -> '2017-01-02'
J = df[(df['date'] > '2017-01-01') & (df['date'] < '2017-01-30')].copy()
J['timestamp'] = pd.to_datetime(J['date'].apply(str)+' '+J['time'])
J['tp'] = (J['high'] + J['low'] + J['close']) / 3  # typical price
EURUSD = J[['timestamp','open','high','low','close','vol','tp']]
df = EURUSD.drop(['timestamp','open','high','low','close','vol'], axis=1)
scaler = MinMaxScaler(feature_range=(0,1))
df = scaler.fit_transform(df)
def window_transform_series(series, window_size):
    # containers for input/output pairs
    dataX = []
    datay = []
    for i in range(window_size, len(series)):
        dataX.append(series[i - window_size:i])
        datay.append(series[i])
    # reshape
    dataX = np.asarray(dataX)
    dataX.shape = (np.shape(dataX)[0:2])
    datay = np.asarray(datay)
    datay.shape = (len(datay), 1)
    return dataX, datay
window_size = 50
dataX,datay = window_transform_series(series = df, window_size = window_size)
train_test_split = int(np.ceil(2*len(datay)/float(3))) # set the split point
# partition the training set
X_train = dataX[:train_test_split,:]
y_train = datay[:train_test_split]
#keep the last chunk for testing
X_test = dataX[train_test_split:,:]
y_test = datay[train_test_split:]
# NOTE: to use keras's RNN LSTM module our input must be reshaped
X_train = np.asarray(np.reshape(X_train, (X_train.shape[0], window_size, 1)))
X_test = np.asarray(np.reshape(X_test, (X_test.shape[0], window_size, 1)))
import keras
np.random.seed(0)
#Build an RNN to perform regression on our time series input/output data
model = Sequential()
model.add(LSTM(5, input_shape=(window_size, 1)))
model.add(Dense(1))
optimizer = keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=1e-08, decay=0.0)
# compile the model
model.compile(loss='mean_squared_error', optimizer=optimizer)
model.fit(X_train, y_train, epochs=500, batch_size=64, verbose=1)
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
# print out training and testing errors
training_error = model.evaluate(X_train, y_train, verbose=0)
print('training error = ' + str(training_error))
testing_error = model.evaluate(X_test, y_test, verbose=0)
print('testing error = ' + str(testing_error))
training error = 0.0001732897365647525
testing error = 0.00019586048660112955
%matplotlib inline
#plot original series
plt.plot(df, color = 'k')
# plot training set prediction
split_pt = train_test_split + window_size
plt.plot(np.arange(window_size,split_pt,1),train_predict,color = 'b')
# plot testing set prediction
plt.plot(np.arange(split_pt,split_pt + len(test_predict),1), test_predict,color ='r')
# pretty up graph
plt.xlabel('day')
plt.ylabel('(normalized) price of EURUSD')
plt.legend(['original series','training fit','testing fit'],loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()

The input is supposed to be the open, high and low prices plus the volume. With those you can predict the closing price for some imaginary date, or you can call model.predict(X_test[30:31]). But one line in your code is strange: the line where you drop all of your features. I wonder what your X_train[0] looks like.
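For what it's worth, here is a minimal sketch (my own illustration, not from the answer above) of what model.predict() expects and how one extra day could be extrapolated, assuming the model, scaler, df (the scaled series) and window_size defined in the question: the input must be an array of shape (n_samples, window_size, 1), and to go beyond the known data the one-step-ahead prediction has to be fed back into the window.
import numpy as np

n_future = 96  # one day of M15 bars = 24 * 4
last_window = df[-window_size:].reshape(1, window_size, 1)  # most recent window

future_scaled = []
window = last_window.copy()
for _ in range(n_future):
    next_val = model.predict(window)[0, 0]  # one-step-ahead prediction
    future_scaled.append(next_val)
    # drop the oldest value, append the prediction, keep shape (1, window_size, 1)
    window = np.append(window[:, 1:, :], [[[next_val]]], axis=1)

# invert the MinMax scaling to get typical prices back
future_tp = scaler.inverse_transform(np.array(future_scaled).reshape(-1, 1))
print(future_tp[:5])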

Related

How do I test BERT with my own input instead of using a database?

So I am new to BERT and language classification and I wrote this code following an online course, but they use a database in order to test and train the model:
# imports and setup for BERT
!pip install transformers
import os
import gdown
import torch
import numpy as np
import transformers
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from keras.utils import pad_sequences
from transformers import BertTokenizer
from transformers import get_linear_schedule_with_warmup
from transformers import BertForSequenceClassification, AdamW, BertConfig
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
%matplotlib inline
# gdown.download('https://drive.google.com/uc?id=1q4U2gVY9tWEPdT6W-pdQpKmo152QqWLE', 'finance_train.csv', True)
# gdown.download('https://drive.google.com/uc?id=1nIBqAsItwVEGVayYTgvybz7HeK0asom0', 'finance_test.csv', True)
!wget 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%206%20-%2010%20(Projects)/Project%20-%20NLP%2BFinance/finance_test.csv'
!wget 'https://storage.googleapis.com/inspirit-ai-data-bucket-1/Data/AI%20Scholars/Sessions%206%20-%2010%20(Projects)/Project%20-%20NLP%2BFinance/finance_train.csv'
def get_finance_train():
    df_train = pd.read_csv("finance_train.csv")
    return df_train

def get_finance_test():
    df_test = pd.read_csv("finance_test.csv")
    return df_test

def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case = True)
print ("Train and Test Files Loaded as train.csv and test.csv")
LABEL_MAP = {0 : "negative", 1 : "neutral", 2 : "positive"}
NONE = 4 * [None]
RND_SEED=2020
df_train = get_finance_train()
sentences = df_train["Sentence"].values
labels = df_train["Label"].values
input_ids = []
sentences_with_special_tokens = []
tokenized_texts = []
attention_masks = []
for sentence in sentences:
    new_sentence = "[CLS] " + sentence + " [SEP]"
    sentences_with_special_tokens.append(new_sentence)
for sentence in sentences_with_special_tokens:
    tokenized_sentence = tokenizer.tokenize(sentence)
    tokenized_texts.append(tokenized_sentence)
for token in tokenized_texts:
    ids = tokenizer.convert_tokens_to_ids(token)
    input_ids.append(ids)
input_ids = pad_sequences(input_ids,
                          maxlen=128,
                          dtype="long",
                          truncating="post",
                          padding="post")
for id in input_ids:
    mask = [float(i > 0) for i in id]
    attention_masks.append(mask)
X_train, X_val, y_train, y_val =train_test_split(input_ids, labels, test_size=0.15, random_state=RND_SEED)
train_masks, validation_masks, _, _ = train_test_split(attention_masks, input_ids, test_size=0.15,random_state=RND_SEED)
train_inputs = torch.tensor(np.array(X_train));
validation_inputs = torch.tensor(np.array(X_val));
train_masks = torch.tensor(np.array(train_masks));
validation_masks = torch.tensor(np.array(validation_masks));
train_labels = torch.tensor(np.array(y_train));
validation_labels = torch.tensor(np.array(y_val));
batch_size = 32
train_data = TensorDataset(train_inputs, train_masks, train_labels);
train_sampler = RandomSampler(train_data); # Samples data randomly for training
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size);
validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels);
validation_sampler = SequentialSampler(validation_data); # Samples data sequentially
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=batch_size);
#Load BertForSequenceClassification, the pretrained BERT model with a single linear classification layer on top.
model = BertForSequenceClassification.from_pretrained(
"bert-base-uncased", # Use the 12-layer BERT small model, with an uncased vocab.
num_labels = 3,
output_attentions = False, # Whether the model returns attentions weights.
output_hidden_states = False, # Whether the model returns all hidden-states.
);
# Given that this is a huge neural network, we need to explicitly tell
# pytorch to run this model on the GPU.
model.cuda();
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
torch.cuda.get_device_name(0)
optimizer = AdamW(model.parameters(),
lr = 2e-5,
eps = 1e-8
)
epochs = 4
# Total number of training steps is [number of batches] x [number of epochs].
# (Note that this is not the same as the number of training samples).
total_steps = len(train_dataloader) * epochs
# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer,
num_warmup_steps = 0, # Default value in run_glue.py
num_training_steps = total_steps)
# We'll store training and validation loss,
# validation accuracy, and timings.
training_loss = []
validation_loss = []
training_stats = []
for epoch_i in range(0, epochs):
    # Training
    print('Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training the model')
    # Reset the total loss for this epoch.
    total_train_loss = 0
    # Put the model into training mode.
    model.train()
    # For each batch of training data
    for step, batch in enumerate(train_dataloader):
        # Progress update every 20 batches.
        if step % 20 == 0 and not step == 0:
            # Report progress.
            print('  Batch {:>5,} of {:>5,}. '.format(step, len(train_dataloader)))
        # STEP 1 & 2: Unpack this training batch from our dataloader.
        # As we unpack the batch, we also copy each tensor to the GPU using the
        # `to` method.
        # `batch` contains three pytorch tensors:
        #   [0]: input ids
        #   [1]: attention masks
        #   [2]: labels
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        # STEP 3
        # Always clear any previously calculated gradients before performing a
        # backward pass.
        model.zero_grad()
        # STEP 4
        # Perform a forward pass (evaluate the model on this training batch).
        # It returns the loss (because we provided labels) and
        # the "logits" -- the model outputs prior to activation.
        outputs = model(b_input_ids,
                        token_type_ids=None,
                        attention_mask=b_input_mask,
                        labels=b_labels)
        loss = outputs[0]
        logits = outputs[1]
        # Accumulate the training loss over all of the batches so that we can
        # calculate the average loss at the end. `loss` is a Tensor containing a
        # single value; the `.item()` function just returns the Python value
        # from the tensor.
        total_train_loss += loss.item()
        # STEP 5
        # Perform a backward pass to calculate the gradients.
        loss.backward()
        # Clip the norm of the gradients to 1.0.
        # This is to help prevent the "exploding gradients" problem.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # STEP 6
        # Update parameters and take a step using the computed gradient.
        optimizer.step()
        # Update the learning rate.
        scheduler.step()
    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    # Validation
    # After the completion of each training epoch, measure our performance on
    # our validation set.
    print("Evaluating on Validation Set")
    # Put the model in evaluation mode.
    model.eval()
    # Tracking variables
    total_eval_accuracy = 0
    total_eval_loss = 0
    nb_eval_steps = 0
    # Evaluate data for one epoch
    for batch in validation_dataloader:
        # Step 1 and Step 2
        # Unpack this validation batch from our dataloader.
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)
        # Tell pytorch not to bother with constructing the compute graph during
        # the forward pass, since this is only needed for backprop (training).
        with torch.no_grad():
            # Forward pass, calculate logit predictions.
            # The "logits" are the output values prior to applying an
            # activation function like the softmax.
            outputs = model(b_input_ids,
                            token_type_ids=None,
                            attention_mask=b_input_mask,
                            labels=b_labels)
            loss = outputs[0]
            logits = outputs[1]
        # Accumulate the validation loss.
        total_eval_loss += loss.item()
        # Move logits and labels to CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        # Calculate the accuracy for this batch of test sentences, and
        # accumulate it over all batches.
        total_eval_accuracy += flat_accuracy(logits, label_ids)
    # Report the final accuracy for this validation run.
    avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
    print("Validation Accuracy: {0:.2f}".format(avg_val_accuracy))
    # Calculate the average loss over all of the batches.
    avg_val_loss = total_eval_loss / len(validation_dataloader)
    print("Validation Loss: {0:.2f}".format(avg_val_loss))
    training_loss.append(avg_train_loss)
    validation_loss.append(avg_val_loss)
    # Record all statistics from this epoch.
    training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Valid. Loss': avg_val_loss,
            'Valid. Accur.': avg_val_accuracy
        }
    )
print("Training complete!")
Here is what I have:
import tensorflow as tf
# Note: iterating over a plain string iterates over characters, so the test
# input is kept as a list of sentences here.
df_test = ["The stock is growing, the company is doing well.",
           "The company is close to bankruptcy and the stock price is falling"]
sentences_with_special_tokens = []
tokenized_texts = []
input_ids = []
attention_masks = []
for sentence in df_test:
    new_sentence = "[CLS] " + sentence + " [SEP]"
    sentences_with_special_tokens.append(new_sentence)
print(sentences_with_special_tokens)
tokenized_texts = []
for sentence in sentences_with_special_tokens:
    tokenized_sentence = tokenizer.tokenize(sentence)
    tokenized_texts.append(tokenized_sentence)
print(tokenized_texts)
for token in tokenized_texts:
    ids = tokenizer.convert_tokens_to_ids(token)
    input_ids.append(ids)
print(input_ids)
input_ids = pad_sequences(input_ids,
                          maxlen=128,
                          dtype="long",
                          truncating="post",
                          padding="post")
for id in input_ids:
    mask = [float(i > 0) for i in id]
    attention_masks.append(mask)
print(attention_masks)
I want to input df_test into the model and have it return the percentages for each of the labels
LABEL_MAP = {0 : "negative", 1 : "neutral", 2 : "positive"}
I looked online and found this on Hugging Face for single-label classification:
import torch
from transformers import BertTokenizer, BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained("textattack/bert-base-uncased-yelp-polarity")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]
But when I tried implementing this in my code, it told me that all of the tensors need to be on the same device, while I have at least two devices:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
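The RuntimeError comes from mixing a model that lives on cuda:0 with input tensors that are still on the CPU. Below is a minimal sketch of one way to run inference and get per-label percentages, assuming the fine-tuned model, tokenizer, device and LABEL_MAP from the code above and a reasonably recent transformers version; this is my own illustration, not tested against your exact setup.
import torch
import torch.nn.functional as F

sentences = ["The stock is growing, the company is doing well.",
             "The company is close to bankruptcy and the stock price is falling"]
model.eval()
for sentence in sentences:
    # tokenizer() adds [CLS]/[SEP], truncates and builds the attention mask in one call
    encoded = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
    # move every input tensor to the same device as the model (this fixes the RuntimeError)
    encoded = {k: v.to(device) for k, v in encoded.items()}
    with torch.no_grad():
        logits = model(**encoded).logits
    probs = F.softmax(logits, dim=1).squeeze().cpu().numpy()
    print(sentence)
    for idx, p in enumerate(probs):
        print("  {}: {:.1%}".format(LABEL_MAP[idx], p))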

holoviews overlay of quadmesh and points with time slider not working

I am trying to overlay hv.Points on top of hv.QuadMesh or hv.Image, where both plots have a time dimension.
import xarray as xr
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
#create data
np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
precipitation = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")
ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(description="Weather related data."),
)
df = ds.temperature.to_dataframe().reset_index()
#create quadmesh plot
img = hv.Dataset(ds, ['lon', 'lat','time'],['temperature']).to(hv.QuadMesh,['lon', 'lat'],['temperature'])
img
#create points plot
pnt = hv.Points(df,vdims=['temperature','time'], kdims=['lon','lat']).groupby('time').opts(color='temperature',cmap='turbo')
pnt
Overlay of both the image and the points only shows the first plot:
img*pnt
If I remove the time component from the points plot, I can overlay the data, but then the slider does not change the points' values with time:
img*hv.Points(df,vdims=['temperature','time'], kdims=['lon','lat']).opts(color='temperature',cmap='turbo')
(Screenshot: image * points overlay with the time component removed.)
Thank you for your help !!
Your problem is likely due to the transformation of the time objects when you cast to a dataframe.
If you manually try to select data for a particular time step via
dfSubset = df.where(df.time == ds.time[0]).dropna()
you will encounter an error.
In my experience a manual selection of the time and then letting HoloViews decide how to handle the objects works better than grouping and then transforming via .to().
In your case this would be the following 3 lines:
dsSubset = ds.where(ds.time == ds.time[0], drop=True)
dfSubset = df.where(df.time == df.time[0]).dropna()
hv.QuadMesh(dsSubset, ['lon', 'lat'] ) * hv.Points(dfSubset, ['lon','lat'], vdims=['temperature']).opts(color='temperature',cmap='plasma',colorbar=True)
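If you want the time slider back on the combined plot, one option (my own sketch building on the same manual-selection idea, using the ds and df from your example) is to build one overlay per time step yourself and wrap them in a HoloMap keyed by time:
# build one QuadMesh * Points overlay per time step and let HoloViews add the slider
overlays = {}
for t in ds.time.values:
    dsSub = ds.where(ds.time == t, drop=True)
    dfSub = df.where(df.time == t).dropna()
    qmesh = hv.QuadMesh(dsSub, ['lon', 'lat'])
    pts = hv.Points(dfSub, ['lon', 'lat'], vdims=['temperature']).opts(
        color='temperature', cmap='plasma', colorbar=True)
    overlays[pd.Timestamp(t)] = qmesh * pts
hv.HoloMap(overlays, kdims='time')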

expected axis -1 of input shape to have value 20 but received input with shape (None, 29)

ValueError: Input 0 of layer sequential_66 is incompatible with the layer: expected axis -1 of input shape to have value 20 but received input with shape (None, 29)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
# Generate dummy data
import numpy as np
x_train = np.random.random((1000, 29))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer=sgd,
metrics=['accuracy'])
model.fit(x_train, y_train,
epochs=20,
batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
Please explain this to me! Thanks.
Learning material: Keras models are trained on NumPy arrays of input data and labels, and to train a model you usually use the fit function. See the multi-layer perceptron (MLP) for multi-class softmax classification example:
https://keras.io/ko/getting-started/sequential-model-guide/
x_train and x_test are 2-dimensional, so the number of columns of x_train is the input dimension the first layer must expect. In your code x_train has 29 columns while the first Dense layer declares input_dim=20, which is what triggers the error.
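A minimal sketch of the corrected setup (my own illustration using tf.keras throughout, not part of the original answer): the first Dense layer's input dimension must match the number of feature columns, and the train and test arrays must have the same number of features.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 29
x_train = np.random.random((1000, n_features))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, n_features))  # was (100, 20) in the question
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_dim=n_features),  # input_dim now matches the data
    layers.Dropout(0.5),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)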

Holoviews tap stream of correlation heatmap and regression plot

I want to make a correlation heatmap for a DataFrame and a regression plot for each pair of variables. I have tried to read all the docs and am still having a very hard time connecting the two plots so that when I tap the heatmap, the corresponding regression plot shows up.
Here's some example code:
import holoviews as hv
from holoviews import opts
import seaborn as sns
import numpy as np
import pandas as pd
hv.extension('bokeh')
df = sns.load_dataset('tips')
df = df[['total_bill', 'tip', 'size']]
corr = df.corr()
heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
.opts(tools=['tap', 'hover'], height=400, width=400, toolbar='above')
m, b = np.polyfit(df.tip, df.total_bill, deg=1)
x = np.linspace(df.tip.min(), df.tip.max())
y = m*x + b
curve = hv.Curve((x, y))\
.opts(height=400, width=400, color='red', ylim=(0, 100))
points = hv.Scatter((df.tip, df.total_bill))
hv.Layout((points * curve) + heatmap).cols(2)
I adjusted the relevant parts of the docs http://holoviews.org/reference/streams/bokeh/Tap.html with your code. Maybe this clears up your confusion.
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh', width=90)
import seaborn as sns
# Declare dataset
df = sns.load_dataset('tips')
df = df[['total_bill', 'tip', 'size']]
# Declare HeatMap
corr = df.corr()
heatmap = hv.HeatMap((corr.columns, corr.index, corr))
# Declare Tap stream with heatmap as source and initial values
posxy = hv.streams.Tap(source=heatmap, x='total_bill', y='tip')
# Define function to compute histogram based on tap location
def tap_histogram(x, y):
    m, b = np.polyfit(df[x], df[y], deg=1)
    x_data = np.linspace(df.tip.min(), df.tip.max())
    y_data = m*x_data + b
    return hv.Curve((x_data, y_data), x, y) * hv.Scatter((df[x], df[y]), x, y)
tap_dmap = hv.DynamicMap(tap_histogram, streams=[posxy])
(heatmap + tap_dmap).opts(
opts.Scatter(height=400, width=400, color='red', ylim=(0, 100), framewise=True),
opts.HeatMap(tools=['tap', 'hover'], height=400, width=400, toolbar='above'),
opts.Curve(framewise=True)
)
Two common problems we face while modeling are collinearity and nonlinearity. Collinearity can be visualized with a correlation heatmap, but that becomes hard to explore with a large number of variables/features. In the following application, you can hover the mouse to check the correlation coefficient between any two variables. When you tap, the scatter plot is updated with a second-degree fitted curve to reveal the nonlinearity between the two variables.
With the help of #doopler, I changed the code a little bit and am sharing it here:
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
# generate random data
df = pd.DataFrame(data={'col_1': np.random.normal(5, 2, 100)})
df['col_2'] = df.col_1 + np.random.gamma(5, 2, 100)
df['col_3'] = df.col_1*2 + np.random.normal(0, 10, 100)
df['col_4'] = df.col_1**2 + np.random.normal(0, 10, 100)
df['col_5'] = np.sin(df.col_1)
df['col_6'] = np.cos(df.col_1)
corr = df.corr().abs()
# mask the upper triangle of the heatmap
corr.values[np.triu_indices_from(corr, 0)] = np.nan
heatmap = hv.HeatMap((corr.columns, corr.index, corr))\
.opts(tools=['hover'], height=400, width=400, fontsize=9,
toolbar='above', colorbar=False, cmap='Blues',
invert_yaxis=True, xrotation=90, xlabel='', ylabel='',
title='Correlation Coefficient Heatmap (absolute value)')
# define tap stream with heatmap as source
tap_xy = hv.streams.Tap(source=heatmap, x='col_1', y='col_4')
# calculate correlation plot based on tap
def tap_corrplot(x, y):
    # drop missing values if there are any
    df_notnull = df[[x, y]].dropna(how='any')
    # fit a 2nd degree line/curve
    m1, m2, b = np.polyfit(df_notnull[x], df_notnull[y], deg=2)
    # generate data to plot fitted line/curve
    x_curve = np.linspace(df[x].min(), df[x].max())
    y_curve = m1*x_curve**2 + m2*x_curve + b
    curve = hv.Curve((x_curve, y_curve), x, y)\
        .opts(color='#fc4f30', framewise=True)
    scatter = hv.Scatter((df[x], df[y]), x, y)\
        .opts(height=400, width=400, fontsize=9, size=5,
              alpha=0.2, ylim=(df[y].min(), df[y].max()),
              color='#30a2da', framewise=True,
              title='Correlation Plot (2nd degree fit)')
    return curve * scatter
# map tap in heatmap with correlation plot
tap_dmap = hv.DynamicMap(tap_corrplot, streams=[tap_xy])
layout = heatmap + tap_dmap
layout
In case that you need to run a Bokeh application:
from bokeh.server.server import Server
renderer = hv.renderer('bokeh')
app = renderer.app(layout)
server = Server({'/': app}, port=0)
server.start()
server.show('/')
The code works well with Jupyter Lab. If you use Jupyter Notebook, check this link.

Understanding Keras prediction output of a rnn model in R

I'm trying out the Keras package in R by doing this tutorial about forecasting the temperature. However, the tutorial has no explanation of how to predict with the trained RNN model, and I wonder how to do this. To train the model I used the following code, copied from the tutorial:
dir.create("~/Downloads/jena_climate", recursive = TRUE)
download.file(
"https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip",
"~/Downloads/jena_climate/jena_climate_2009_2016.csv.zip"
)
unzip(
"~/Downloads/jena_climate/jena_climate_2009_2016.csv.zip",
exdir = "~/Downloads/jena_climate"
)
library(readr)
data_dir <- "~/Downloads/jena_climate"
fname <- file.path(data_dir, "jena_climate_2009_2016.csv")
data <- read_csv(fname)
data <- data.matrix(data[,-1])
train_data <- data[1:200000,]
mean <- apply(train_data, 2, mean)
std <- apply(train_data, 2, sd)
data <- scale(data, center = mean, scale = std)
generator <- function(data, lookback, delay, min_index, max_index,
                      shuffle = FALSE, batch_size = 128, step = 6) {
  if (is.null(max_index))
    max_index <- nrow(data) - delay - 1
  i <- min_index + lookback
  function() {
    if (shuffle) {
      rows <- sample(c((min_index+lookback):max_index), size = batch_size)
    } else {
      if (i + batch_size >= max_index)
        i <<- min_index + lookback
      rows <- c(i:min(i+batch_size, max_index))
      i <<- i + length(rows)
    }
    samples <- array(0, dim = c(length(rows),
                                lookback / step,
                                dim(data)[[-1]]))
    targets <- array(0, dim = c(length(rows)))
    for (j in 1:length(rows)) {
      indices <- seq(rows[[j]] - lookback, rows[[j]],
                     length.out = dim(samples)[[2]])
      samples[j,,] <- data[indices,]
      targets[[j]] <- data[rows[[j]] + delay,2]
    }
    list(samples, targets)
  }
}
lookback <- 1440
step <- 6
delay <- 144
batch_size <- 128
train_gen <- generator(
data,
lookback = lookback,
delay = delay,
min_index = 1,
max_index = 200000,
shuffle = TRUE,
step = step,
batch_size = batch_size
)
val_gen = generator(
data,
lookback = lookback,
delay = delay,
min_index = 200001,
max_index = 300000,
step = step,
batch_size = batch_size
)
test_gen <- generator(
data,
lookback = lookback,
delay = delay,
min_index = 300001,
max_index = NULL,
step = step,
batch_size = batch_size
)
# How many steps to draw from val_gen in order to see the entire validation set
val_steps <- (300000 - 200001 - lookback) / batch_size
# How many steps to draw from test_gen in order to see the entire test set
test_steps <- (nrow(data) - 300001 - lookback) / batch_size
library(keras)
model <- keras_model_sequential() %>%
layer_flatten(input_shape = c(lookback / step, dim(data)[-1])) %>%
layer_dense(units = 32, activation = "relu") %>%
layer_dense(units = 1)
model %>% compile(
optimizer = optimizer_rmsprop(),
loss = "mae"
)
history <- model %>% fit_generator(
train_gen,
steps_per_epoch = 500,
epochs = 20,
validation_data = val_gen,
validation_steps = val_steps
)
I tried to predict the temperature with the code below. If I am correct, this should give me the normalized predicted temperature for every batch. So when I denormalize the values and average them, I get the predicted temperature. Is this correct, and if so, for which time is the prediction made (latest observation time + delay)?
prediction.set <- test_gen()[[1]]
prediction <- predict(model, prediction.set)
Also, what is the correct way to use keras::predict_generator() and the test_gen() function? If I use the following code:
model %>% predict_generator(generator = test_gen,
steps = test_steps)
it gives this error:
error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Error when checking model input: the list of Numpy
arrays that you are passing to your model is not the size the model expected.
Expected to see 1 array(s), but instead got the following list of 2 arrays:
[array([[[ 0.50394005, 0.6441838 , 0.5990761 , ..., 0.22060473,
0.2018686 , -1.7336458 ],
[ 0.5475698 , 0.63853574, 0.5890239 , ..., -0.45618412,
-0.45030192, -1.724062...
Note: my familiarity with R syntax is very limited, so unfortunately I can't give you an answer using R. Instead, I am using Python in my answer. I hope you can easily translate at least my explanations back to R.
... If I am correct, this should give me the normalized predicted
temperature for every batch.
Yes, that's right. The predictions would be normalized since you have trained it with normalized labels:
data <- scale(data, center = mean, scale = std)
Therefore, you would need to denormalize the values using the computed mean and std to find the real predictions:
pred = model.predict(test_data)
denorm_pred = pred * std + mean
... for which time is then predicted (latest observation time +
delay?)
That's right. Concretely, since in this particular dataset a new observation is recorded every ten minutes and you have set delay = 144, the predicted value is the temperature 24 hours ahead of the last given observation (144 * 10 = 1440 minutes = 24 hours).
Also, what is the correct way to use keras::predict_generator() and
the test_gen() function?
predict_generator takes a generator that outputs only the test samples and not the labels (we don't need labels when making predictions; labels are needed when training, i.e. fit_generator(), and when evaluating the model, i.e. evaluate_generator()). That's why the error says it expected one array but got a list of two. So you need to define a generator that yields only the test samples; one alternative, in Python, is to wrap your existing generator inside another function that yields only the input samples (I don't know whether you can do this in R or not):
def pred_generator(gen):
    for data, labels in gen:
        yield data  # discard the labels

preds = model.predict_generator(pred_generator(test_generator), number_of_steps)
You also need to provide the number of generator steps needed to cover all the samples in the test data: num_steps = total_number_of_samples / batch_size. For example, if you have 1000 samples and the generator yields 10 samples at a time, you need to run the generator for 1000 / 10 = 100 steps.
Bonus: To see how good your model performs you can use evaluate_generator using the existing test generator (i.e. test_gen):
loss = model.evaluate_generator(test_gen, number_of_steps)
The given loss is also normalized and to denormalize it (to get a better sense of prediction error) you just need to multiply it by std (you don't need to add mean since you are using mae, i.e. mean absolute error, as the loss function):
denorm_loss = loss * std
This would tell you how much your predictions are off on average. For example, if you are predicting the temperature, a denorm_loss of 5 means that the predictions are on average 5 degrees off (i.e. are either less or more than the actual value).
Update: For prediction, you can define a new generator using an existing generator in R like this:
pred_generator <- function(gen) {
  function() { # wrap it in a function to make it callable
    gen()[1]   # call the given generator and get the first element (i.e. samples)
  }
}

preds <- model %>%
  predict_generator(
    generator = pred_generator(test_gen), # pass test_gen directly to pred_generator without calling it
    steps = test_steps
  )

evaluate_generator(model, test_gen, test_steps)
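One extra step worth spelling out: mean and std were computed per column, and the generator's target is column 2 of data (the temperature), so only that column's statistics are needed to denormalize the predictions. A small sketch in R, assuming the preds, mean and std objects from above:
# undo the per-column normalization for the temperature target (column 2 of `data`)
denorm_preds <- preds * std[[2]] + mean[[2]]
head(denorm_preds)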
