How to use pykalman filter_update for online regression - recursion

I want to use Kalman regression recursively on an incoming stream of price data using kf.filter_update() but I can't make it work. Here's the example code framing the problem:
The dataset (i.e. the stream):
DateTime CAT DOG
2015-01-02 09:01:00, 1471.24, 9868.76
2015-01-02 09:02:00, 1471.75, 9877.75
2015-01-02 09:03:00, 1471.81, 9867.70
2015-01-02 09:04:00, 1471.59, 9849.03
2015-01-02 09:05:00, 1471.45, 9840.15
2015-01-02 09:06:00, 1471.16, 9852.71
2015-01-02 09:07:00, 1471.30, 9860.24
2015-01-02 09:08:00, 1471.39, 9862.94
The data is read into a Pandas dataframe and the following code simulates the stream by iterating over the df:
df = pd.read_csv('data.txt')
df.dropna(inplace=True)
history = {}
history["spread"] = []
history["state_means"] = []
history["state_covs"] = []
for idx, row in df.iterrows():
if idx == 0: # Initialize the Kalman filter
delta = 1e-9
trans_cov = delta / (1 - delta) * np.eye(2)
obs_mat = np.vstack([df.iloc[0].CAT, np.ones(df.iloc[0].CAT.shape)]).T[:, np.newaxis]
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
initial_state_mean=np.zeros(2),
initial_state_covariance=np.ones((2, 2)),
transition_matrices=np.eye(2),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov)
state_means, state_covs = kf.filter(np.asarray(df.iloc[0].DOG))
history["state_means"], history["state_covs"] = state_means, state_covs
slope=state_means[:, 0]
print "SLOPE", slope
else:
state_means, state_covs = kf.filter_update(history["state_means"][-1], history["state_covs"][-1], observation = np.asarray(df.iloc[idx].DOG))
history["state_means"].append(state_means)
history["state_covs"].append(state_covs)
slope=state_means[:, 0]
print "SLOPE", slope
The Kalman filter initializes properly and I get the first regression coefficient, but the subsequent updates throws an exception:
Traceback (most recent call last):
SLOPE [ 6.70319125]
File "C:/Users/.../KalmanUpdate_example.py", line 50, in <module>
KalmanOnline(df)
File "C:/Users/.../KalmanUpdate_example.py", line 43, in KalmanOnline
state_means, state_covs = kf.filter_update(history["state_means"][-1], history["state_covs"][-1], observation = np.asarray(df.iloc[idx].DOG))
File "C:\Python27\Lib\site-packages\pykalman\standard.py", line 1253, in filter_update
2, "observation_matrix"
File "C:\Python27\Lib\site-packages\pykalman\standard.py", line 38, in _arg_or_default
+ ' You must specify it manually.') % (name,)
ValueError: observation_matrix is not constant for all time. You must specify it manually.
Process finished with exit code 1
It seems intuitively clear that the observation matrix is required (it's provided in the initial step, but not in the updating steps), but I cannot figure out how to set it up properly. Any feedback would be highly appreciated.

Pykalman allows you to declare the observation matrix in two ways:
[n_timesteps, n_dim_obs, n_dim_obs] - once for the whole estimation
[n_dim_obs, n_dim_obs] - separately for each estimation step
In your code you used the first option (that's why "observation_matrix is not constant for all time"). But then you used filter_update in the loop and Pykalman could not understand what to use as the observation matrix in each iteration.
I would declare the observation matrix as a 2-element array:
from pykalman import KalmanFilter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.txt')
df.dropna(inplace=True)
n = df.shape[0]
n_dim_state = 2;
history_state_means = np.zeros((n, n_dim_state))
history_state_covs = np.zeros((n, n_dim_state, n_dim_state))
for idx, row in df.iterrows():
if idx == 0: # Initialize the Kalman filter
delta = 1e-9
trans_cov = delta / (1 - delta) * np.eye(2)
obs_mat = [df.iloc[0].CAT, 1]
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
initial_state_mean=np.zeros(2),
initial_state_covariance=np.ones((2, 2)),
transition_matrices=np.eye(2),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov)
history_state_means[0], history_state_covs[0] = kf.filter(np.asarray(df.iloc[0].DOG))
slope=history_state_means[0, 0]
print "SLOPE", slope
else:
obs_mat = np.asarray([[df.iloc[idx].CAT, 1]])
history_state_means[idx], history_state_covs[idx] = kf.filter_update(history_state_means[idx-1],
history_state_covs[idx-1],
observation = df.iloc[idx].DOG,
observation_matrix=obs_mat)
slope=history_state_means[idx, 0]
print "SLOPE", slope
plt.figure(1)
plt.plot(history_state_means[:, 0], label="Slope")
plt.grid()
plt.show()
It results in the following output:
SLOPE 6.70322464199
SLOPE 6.70512037269
SLOPE 6.70337808649
SLOPE 6.69956406785
SLOPE 6.6961767953
SLOPE 6.69558438828
SLOPE 6.69581682668
SLOPE 6.69617670459
The Pykalman is not really good documented and there are mistakes on the official page. That's why I recomend to test the result using the offline estimation in one step. In this case the observation matrix has to be declared as you did it in your code.
from pykalman import KalmanFilter
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.txt')
df.dropna(inplace=True)
delta = 1e-9
trans_cov = delta / (1 - delta) * np.eye(2)
obs_mat = np.vstack([df.iloc[:].CAT, np.ones(df.iloc[:].CAT.shape)]).T[:, np.newaxis]
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
initial_state_mean=np.zeros(2),
initial_state_covariance=np.ones((2, 2)),
transition_matrices=np.eye(2),
observation_matrices=obs_mat,
observation_covariance=1.0,
transition_covariance=trans_cov)
state_means, state_covs = kf.filter(df.iloc[:].DOG)
print "SLOPE", state_means[:, 0]
plt.figure(1)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.show()
The result is the same.

Related

TypeError: Caught TypeError in DataLoader worker process 0. TypeError: 'KeyError' object is not iterable

from torchvision_starter.engine import train_one_epoch, evaluate
from torchvision_starter import utils
import multiprocessing
import time
n_cpu = multiprocessing.cpu_count()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
_ = model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
step_size=3,
gamma=0.2,
verbose=True
)
# Let's train for 10 epochs
num_epochs = 1
start = time.time()
for epoch in range(10, 10 + num_epochs):
# train for one epoch, printing every 10 iterations
train_one_epoch(model, optimizer, data_loaders['train'], device, epoch, print_freq=10)
# update the learning rate
lr_scheduler.step()
# evaluate on the validation dataset
evaluate(model, data_loaders['valid'], device=device)
stop = time.time()
print(f"\n\n{num_epochs} epochs in {stop - start} s ({(stop-start) / 3600:.2f} hrs)")
Before I move on to this part, everything is OK. But after I run the part, the error is like below:
I have tried to add drop_last to the helper.py's function like:
data_loaders["train"] = torch.utils.data.DataLoader(
train_data,
batch_size=batch_size,
sampler=train_sampler,
num_workers=num_workers,
collate_fn=utils.collate_fn,
drop_last=True
)
But it doesn't work. By the way, the torch and torchvision are compatible and Cuda is available.
I wonder how to fix it.
The get_data_loaders function:
def get_data_loaders(
folder, batch_size: int = 2, valid_size: float = 0.2, num_workers: int = -1, limit: int = -1, thinning: int = None
):
"""
Create and returns the train_one_epoch, validation and test data loaders.
:param foder: folder containing the dataset
:param batch_size: size of the mini-batches
:param valid_size: fraction of the dataset to use for validation. For example 0.2
means that 20% of the dataset will be used for validation
:param num_workers: number of workers to use in the data loaders. Use -1 to mean
"use all my cores"
:param limit: maximum number of data points to consider
:param thinning: take every n-th frame, instead of all frames
:return a dictionary with 3 keys: 'train_one_epoch', 'valid' and 'test' containing respectively the
train_one_epoch, validation and test data loaders
"""
if num_workers == -1:
# Use all cores
num_workers = multiprocessing.cpu_count()
# We will fill this up later
data_loaders = {"train": None, "valid": None, "test": None}
# create 3 sets of data transforms: one for the training dataset,
# containing data augmentation, one for the validation dataset
# (without data augmentation) and one for the test set (again
# without augmentation)
data_transforms = {
"train": get_transform(UdacitySelfDrivingDataset.mean, UdacitySelfDrivingDataset.std, train=True),
"valid": get_transform(UdacitySelfDrivingDataset.mean, UdacitySelfDrivingDataset.std, train=False),
"test": get_transform(UdacitySelfDrivingDataset.mean, UdacitySelfDrivingDataset.std, train=False),
}
# Create train and validation datasets
train_data = UdacitySelfDrivingDataset(
folder,
transform=data_transforms["train"],
train=True,
thinning=thinning
)
# The validation dataset is a split from the train_one_epoch dataset, so we read
# from the same folder, but we apply the transforms for validation
valid_data = UdacitySelfDrivingDataset(
folder,
transform=data_transforms["valid"],
train=True,
thinning=thinning
)
# obtain training indices that will be used for validation
n_tot = len(train_data)
indices = torch.randperm(n_tot)
# If requested, limit the number of data points to consider
if limit > 0:
indices = indices[:limit]
n_tot = limit
split = int(math.ceil(valid_size * n_tot))
train_idx, valid_idx = indices[split:], indices[:split]
# define samplers for obtaining training and validation batches
train_sampler = torch.utils.data.SubsetRandomSampler(train_idx)
valid_sampler = torch.utils.data.SubsetRandomSampler(valid_idx) # =
# prepare data loaders
data_loaders["train"] = torch.utils.data.DataLoader(
train_data,
batch_size=batch_size,
sampler=train_sampler,
num_workers=num_workers,
collate_fn=utils.collate_fn,
drop_last=True
)
data_loaders["valid"] = torch.utils.data.DataLoader(
valid_data, # -
batch_size=batch_size, # -
sampler=valid_sampler, # -
num_workers=num_workers, # -
collate_fn=utils.collate_fn,
drop_last=True
)
# Now create the test data loader
test_data = UdacitySelfDrivingDataset(
folder,
transform=data_transforms["test"],
train=False,
thinning=thinning
)
if limit > 0:
indices = torch.arange(limit)
test_sampler = torch.utils.data.SubsetRandomSampler(indices)
else:
test_sampler = None
data_loaders["test"] = torch.utils.data.DataLoader(
test_data,
batch_size=batch_size,
shuffle=False,
num_workers=num_workers,
sampler=test_sampler,
collate_fn=utils.collate_fn,
drop_last=True
# -
)
return data_loaders
class UdacitySelfDrivingDataset(torch.utils.data.Dataset):
# Mean and std of the dataset to be used in nn.Normalize
mean = torch.tensor([0.3680, 0.3788, 0.3892])
std = torch.tensor([0.2902, 0.3069, 0.3242])
def __init__(self, root, transform, train=True, thinning=None):
super().__init__()
self.root = os.path.abspath(os.path.expandvars(os.path.expanduser(root)))
self.transform = transform
# load datasets
if train:
self.df = pd.read_csv(os.path.join(self.root, "labels_train.csv"))
else:
self.df = pd.read_csv(os.path.join(self.root, "labels_test.csv"))
# Index by file id (i.e., a sequence of the same length as the number of images)
codes, uniques = pd.factorize(self.df['frame'])
if thinning:
# Take every n-th rows. This makes sense because the images are
# frames of videos from the car, so we are essentially reducing
# the frame rate
thinned = uniques[::thinning]
idx = self.df['frame'].isin(thinned)
print(f"Keeping {thinned.shape[0]} of {uniques.shape[0]} images")
print(f"Keeping {idx.sum()} objects out of {self.df.shape[0]}")
self.df = self.df[idx].reset_index(drop=True)
# Recompute codes
codes, uniques = pd.factorize(self.df['frame'])
self.n_images = len(uniques)
self.df['image_id'] = codes
self.df.set_index("image_id", inplace=True)
self.classes = ['car', 'truck', 'pedestrian', 'bicyclist', 'light']
self.colors = ['cyan', 'blue', 'red', 'purple', 'orange']
#property
def n_classes(self):
return len(self.classes)
def __getitem__(self, idx):
if idx in self.df.index:
row = self.df.loc[[idx]]
else:
return KeyError(f"Element {idx} not in dataframe")
# load images fromm file
img_path = os.path.join(self.root, "images", row['frame'].iloc[0])
img = Image.open(img_path).convert("RGB")
# Exclude bogus boxes with 0 height or width
h = row['ymax'] - row['ymin']
w = row['xmax'] - row['xmin']
filter_idx = (h > 0) & (w > 0)
row = row[filter_idx]
# get bounding box coordinates for each mask
boxes = row[['xmin', 'ymin', 'xmax', 'ymax']].values
# convert everything into a torch.Tensor
boxes = torch.as_tensor(boxes, dtype=torch.float32)
# get the labels
labels = torch.as_tensor(row['class_id'].values, dtype=int)
image_id = torch.tensor([idx])
area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
# assume no crowd for everything
iscrowd = torch.zeros((row.shape[0],), dtype=torch.int64)
target = {}
target["boxes"] = boxes
target["labels"] = labels
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd
if self.transform is not None:
img, target = self.transform(img, target)
return img, target
def __len__(self):
return self.n_images
def plot(self, idx, renormalize=True, predictions=None, threshold=0.5, ax=None):
image, label_js = self[idx]
if renormalize:
# Invert the T.Normalize transform
unnormalize = T.Compose(
[
T.Normalize(mean = [ 0., 0., 0. ], std = 1 / type(self).std),
T.Normalize(mean = -type(self).mean, std = [ 1., 1., 1. ])
]
)
image, label_js = unnormalize(image, label_js)
if ax is None:
fig, ax = plt.subplots(figsize=(8, 8))
_ = ax.imshow(torch.permute(image, [1, 2, 0]))
for i, box in enumerate(label_js['boxes']):
xy = (box[0], box[1])
h, w = (box[2] - box[0]), (box[3] - box[1])
r = patches.Rectangle(xy, h, w, fill=False, color=self.colors[label_js['labels'][i]-1], lw=2, alpha=0.5)
ax.add_patch(r)
if predictions is not None:
# Make sure the predictions are on the CPU
for k in predictions:
predictions[k] = predictions[k].detach().cpu().numpy()
for i, box in enumerate(predictions['boxes']):
if predictions['scores'][i] > threshold:
xy = (box[0], box[1])
h, w = (box[2] - box[0]), (box[3] - box[1])
r = patches.Rectangle(xy, h, w, fill=False, color=self.colors[predictions['labels'][i]-1], lw=2, linestyle=':')
ax.add_patch(r)
_ = ax.axis("off")
return ax

RuntimeError: quantile() q tensor must be same dtype as the input tensor in pytorch-forecasting

PyTorch-Forecasting version: 0.10.2
PyTorch version:1.12.1
Python version:3.10.4
Operating System: windows
Expected behavior
No Error
Actual behavior
The Error is
File c:\Users\josepeeterson.er\Miniconda3\envs\pytorch\lib\site-packages\pytorch_forecasting\metrics\base_metrics.py:979, in DistributionLoss.to_quantiles(self, y_pred, quantiles, n_samples)
977 except NotImplementedError: # resort to derive quantiles empirically
978 samples = torch.sort(self.sample(y_pred, n_samples), -1).values
--> 979 quantiles = torch.quantile(samples, torch.tensor(quantiles, device=samples.device), dim=2).permute(1, 2, 0)
980 return quantiles
RuntimeError: quantile() q tensor must be same dtype as the input tensor
How do I set them to be of same datatype? This is happening internally. I do not have control over this. I am not using any GPUs.
The link to the .csv file with input data is https://github.com/JosePeeterson/Demand_forecasting
The data is just sampled from a negative binomila distribution wiht parameters (9,0.5) every 4 hours. the time inbetween is all zero.
I just want to see if DeepAR can learn this pattern.
Code to reproduce the problem
from pytorch_forecasting.data.examples import generate_ar_data
import matplotlib.pyplot as plt
import pandas as pd
from pytorch_forecasting.data import TimeSeriesDataSet
from pytorch_forecasting.data import NaNLabelEncoder
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
import pytorch_lightning as pl
from pytorch_forecasting import NegativeBinomialDistributionLoss, DeepAR
import torch
from pytorch_forecasting.data.encoders import TorchNormalizer
data = [pd.read_csv('1_f_nbinom_train.csv')]
data["date"] = pd.Timestamp("2021-08-24") + pd.to_timedelta(data.time_idx, "H")
data['_hour_of_day'] = str(data["date"].dt.hour)
data['_day_of_week'] = str(data["date"].dt.dayofweek)
data['_day_of_month'] = str(data["date"].dt.day)
data['_day_of_year'] = str(data["date"].dt.dayofyear)
data['_week_of_year'] = str(data["date"].dt.weekofyear)
data['_month_of_year'] = str(data["date"].dt.month)
data['_year'] = str(data["date"].dt.year)
max_encoder_length = 60
max_prediction_length = 20
training_cutoff = data["time_idx"].max() - max_prediction_length
training = TimeSeriesDataSet(
data.iloc[0:-620],
time_idx="time_idx",
target="value",
categorical_encoders={"series": NaNLabelEncoder(add_nan=True).fit(data.series), "_hour_of_day": NaNLabelEncoder(add_nan=True).fit(data._hour_of_day), \
"_day_of_week": NaNLabelEncoder(add_nan=True).fit(data._day_of_week), "_day_of_month" : NaNLabelEncoder(add_nan=True).fit(data._day_of_month), "_day_of_year" : NaNLabelEncoder(add_nan=True).fit(data._day_of_year), \
"_week_of_year": NaNLabelEncoder(add_nan=True).fit(data._week_of_year), "_year": NaNLabelEncoder(add_nan=True).fit(data._year)},
group_ids=["series"],
min_encoder_length=max_encoder_length,
max_encoder_length=max_encoder_length,
min_prediction_length=max_prediction_length,
max_prediction_length=max_prediction_length,
time_varying_unknown_reals=["value"],
time_varying_known_categoricals=["_hour_of_day","_day_of_week","_day_of_month","_day_of_year","_week_of_year","_year" ],
time_varying_known_reals=["time_idx"],
add_relative_time_idx=False,
randomize_length=None,
scalers=[],
target_normalizer=TorchNormalizer(method="identity",center=False,transformation=None )
)
validation = TimeSeriesDataSet.from_dataset(
training,
data.iloc[-620:-420],
# predict=True,
stop_randomization=True,
)
batch_size = 64
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=8)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=8)
# save datasets
training.save("training.pkl")
validation.save("validation.pkl")
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=5, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
trainer = pl.Trainer(
max_epochs=10,
gpus=0,
gradient_clip_val=0.1,
limit_train_batches=30,
limit_val_batches=3,
# fast_dev_run=True,
# logger=logger,
# profiler=True,
callbacks=[lr_logger, early_stop_callback],
)
deepar = DeepAR.from_dataset(
training,
learning_rate=0.1,
hidden_size=32,
dropout=0.1,
loss=NegativeBinomialDistributionLoss(),
log_interval=10,
log_val_interval=3,
# reduce_on_plateau_patience=3,
)
print(f"Number of parameters in network: {deepar.size()/1e3:.1f}k")
torch.set_num_threads(10)
trainer.fit(
deepar,
train_dataloaders=train_dataloader,
val_dataloaders=val_dataloader,
)
Need to cast samples to torch.tensor as shown below. Then save this base_metrics.py and rerun above code.
except NotImplementedError: # resort to derive quantiles empirically
samples = torch.sort(self.sample(y_pred, n_samples), -1).values
quantiles = torch.quantile(torch.tensor(samples), torch.tensor(quantiles, device=samples.device), dim=2).permute(1, 2, 0)
return quantiles

Pitch Calculation Error via Autocorrelation Method

Aim : Pitch Calculation
Issue : The calculated pitch does not match the expected one. For instance, the output is approx. 'D3', however the expected output is 'C5'.
Source Sound : https://freewavesamples.com/1980s-casio-celesta-c5
Source Code
library("tuneR")
library("seewave")
#0: Acquisition of sample sound
snd_smpl = readWave(paste("~/Music/sample/1980s-Casio-Celesta-C5.wav"),
from = 0, to = 1, units = "seconds")
dur_smpl = duration(snd_smpl)
len_smpl = length(snd_smpl)
#1 : Pre-Processing Stage
#1.1 : Application of Hanning Window
n = 1:len_smpl
han_win = 0.5-0.5*cos(2*pi*n/(len_smpl-1))
wind_sig = han_win*snd_smpl#left
#2.1 : Auto-Correlation Calculation
rev_wind_sig = rev(wind_sig) #Reversing the windowed signal
acorr_1 = convolve(wind_sig, rev_wind_sig, type = "open")
# Obtaining the 2nd half of the correlation, to simplify calculation
n = 2*len_smpl-1
acorr_2 = (1/len_smpl)*acorr_1[len_smpl:n]
#2.2 : Note Calculation
min_index = which.min(acorr_2)
print(min_index)
fs = 44100
fo = fs/min_index #To obtain fundamental frequency
print(fo)
print(notenames(noteFromFF(fo)))
Output
> print(min_index)
[1] 37
> fs = 44100
> fo = fs/min_index
> print(fo)
[1] 1191.892
> print(notenames(noteFromFF(fo)))
[1] "d'''"
The entire calculation is performed in the Time Domain.
I'm currently using autocorrelation as a base to understand more about Pitch Detection & Analysis. I've tried to analyse the sample with 'Audacity' and the result is 'C5'. Hence, I'm wondering where actually the issue is.
Can you all help me find it?
Also, there are a few but important doubts:
How small should actually my analysis window be (20ms, 1s,..)?
Will reinforcement of the Autocorrelation Algorithm with AMDF and other similar algorithms make this Pitch Detection module more robust?
This whole analysis seems not correct. You should not use windowing in time domain analysis.
Attached a short solution in the python language; you can use it as pseudocode
from soundfile import read
from glob import glob
from scipy.signal import correlate, find_peaks
from matplotlib.pyplot import plot, show, xlim, title, xlabel
import numpy as np
%matplotlib inline
name = glob('*wav')[0]
samples, fs = read(name)
corr = correlate(samples, samples)
corr = corr[corr.size / 2:]
time = np.arange(corr.size) / float(fs)
ind = find_peaks(corr[time < 0.002])[0]
plot(time, corr)
plot(time[ind], corr[ind], '*')
xlim([0, 0.005])
title('Frequency = {} Hz'.format(1 / time[ind][0]))
xlabel('Time [Sec]')
show()

Categorical image classification always predicts one class, though calculated accuracy reaches 100%

I followed the Keras cat/dog image classification tutorial
Keras Image Classification tutorial
and found similar results to the reported values. I then took the code from the first example in that tutorial Tutorial Example 1 code, slightly altered a few lines, and trained the model for a dataset of grayscale images (~150 thousand images across 7 classes).
This gave me great initial results ( ~84% accuracy), which I am happy with.
Next I tried implementing the image batch generator myself, which is where I am having trouble. Briefly, the code seems to run well, except the reported accuracy of the model quickly shoots to >= 99% within two epochs. Due to noise in the dataset, this amount of accuracy is not believable. After using the trained model to predict a new batch of data ( images outside of the training or validation dataset ), I find the model always predicts the first class ( i.e. [1.,0.,0.,0.,0.,0.,0.]. The loss function is forcing the model to predict a single class 100% of the time, even though the labels I pass in are distributed across all the classes.
After 28 epochs of training, I see the following output:
320/320 [==============================] - 1114s - loss: 1.5820e-07 - categorical_accuracy: 1.0000 - sparse_categorical_accuracy: 0.0000e+00 - val_loss: 16.1181 - val_categorical_accuracy: 0.0000e+00 - val_sparse_categorical_accuracy: 0.0000e+00
When I examine the batch generator output from the tutorial code, and compare my batch generator output, the shape, datatype, and range of values are identical between both generators. I would like to emphasize that the generator passes y labels from each category, not just array([ 1.., 0., 0., 0., 0., 0., 0.], dtype=float32). Therefore, I am lost as to what I am doing incorrectly.
Since I posted this code several days ago, I have used the default Keras image generator, and successfully trained the network on the same dataset and same network architecture. Therefore, something about how I load and pass the data in the generator must be incorrect.
Here is the code I implemented:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
import imgaug as ia
from imgaug import augmenters as iaa
import numpy as np
import numpy.random as nprand
import imageio
import os, re, random, sys, csv
import scipy
img_width, img_height = 112, 112
input_shape = (img_width,img_height,1)
batch_size = 200
epochs = 2
train_image_directory = '/PATH/To/Directory/train/'
valid_image_directory = '/PATH/To/Directory/validate/'
video_info_file = '/PATH/To/Directory/train_labels.csv'
train_image_paths = [train_image_directory + m.group(1) for m in [re.match(r"(\d+_\d+\.png)", fname) for fname in os.listdir(train_image_directory)] if m is not None]
valid_image_paths = [valid_image_directory + m.group(1) for m in [re.match(r"(\d+_\d+\.png)", fname) for fname in os.listdir(valid_image_directory)] if m is not None]
num_train_images = len(train_image_paths)
num_val_images = len(valid_image_paths)
label_map = {}
label_decode = {
'0': [1.,0.,0.,0.,0.,0.,0.],
'1': [0.,1.,0.,0.,0.,0.,0.],
'2': [0.,0.,1.,0.,0.,0.,0.],
'3': [0.,0.,0.,1.,0.,0.,0.],
'4': [0.,0.,0.,0.,1.,0.,0.],
'5': [0.,0.,0.,0.,0.,1.,0.],
'6': [0.,0.,0.,0.,0.,0.,1.]
}
with open(video_info_file) as f:
reader = csv.reader(f)
for row in reader:
key = row[0]
if key in label_map:
pass
label_map[key] = label_decode[row[1]]
sometimes = lambda aug: iaa.Sometimes(0.5,aug)
seq = iaa.Sequential(
[
iaa.Fliplr(0.5),
iaa.Flipud(0.2),
sometimes(iaa.Crop(percent=(0, 0.1))),
sometimes(iaa.Affine(
scale={"x": (0.8, 1.2), "y": (0.8, 1.2)},
translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)},
rotate=(-5, 5),
shear=(-16, 16),
order=[0, 1],
cval=(0, 1),
mode=ia.ALL
)),
iaa.SomeOf((0, 3),
[
sometimes(iaa.Superpixels(p_replace=(0, 0.40), n_segments=(20, 100))),
iaa.Sharpen(alpha=(0, 1.0), lightness=(0.75, 1.5)),
iaa.Emboss(alpha=(0, 1.0), strength=(0, 1.0)),
iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05*255)),
iaa.OneOf([
iaa.Dropout((0.01, 0.1)),
iaa.CoarseDropout((0.03, 0.15), size_percent=(0.02, 0.05)),
]),
iaa.Invert(0.05),
iaa.Add((-10, 10)),
iaa.Multiply((0.5, 1.5), per_channel=0.5),
iaa.ContrastNormalization((0.5, 2.0)),
sometimes(iaa.ElasticTransformation(alpha=(0.5, 1.5), sigma=0.2)),
sometimes(iaa.PiecewiseAffine(scale=(0.01, 0.03))) # sometimes move parts of the image around
],
random_order=True
)
],
random_order=True)
def image_data_generator(image_paths, labels, batch_size, training):
while(1):
image_paths = nprand.choice(image_paths, batch_size)
X0 = np.asarray([imageio.imread(x) for x in image_paths])
Y = np.asarray([labels[x] for x in image_paths],dtype=np.float32)
if(training):
X = np.divide(np.expand_dims(seq.augment_images(X0)[:,:,:,0],axis=3),255.)
else:
X = np.expand_dims(np.divide(X0[:,:,:,0],255.),axis=3)
X = np.asarray(X,dtype=np.float32)
yield X,Y
def predict_videos(model,video_paths):
i=0
predictions=[]
while(i < len(video_paths)):
video_reader = imageio.get_reader(video_paths[i])
X0 = np.expand_dims([ im[:,:,0] for x,im in enumerate(video_reader) ],axis=3)
prediction = model.predict(X0)
i=i+1
predictions.append(prediction)
return predictions
train_gen = image_data_generator(train_image_paths,label_map,batch_size,True)
val_gen = image_data_generator(valid_image_paths,label_map,batch_size,False)
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(Dense(7))
model.add(Activation('softmax'))
model.load_weights('/PATH/To_pretrained_weights/pretrained_model.h5')
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=['categorical_accuracy','sparse_categorical_accuracy'])
checkpointer = ModelCheckpoint('/PATH/To_pretrained_weights/pretrained_model.h5', monitor='val_loss', verbose=0, save_best_only=True, save_weights_only=False, mode='auto', period=1)
reduceLR = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=20, verbose=0, mode='auto', cooldown=0, min_lr=0)
early_stop = EarlyStopping(monitor='val_loss', patience=20, verbose=1)
callbacks_list = [checkpointer, early_stop, reduceLR]
model.fit_generator(
train_gen,
steps_per_epoch = -(-num_train_images // batch_size),
epochs=epochs,
validation_data=val_gen,
validation_steps = -(-num_val_images // batch_size),
callbacks=callbacks_list)
For some reason that I cannot fully determine, if you do not give the fit_generator function accurate numbers for steps per epoch or steps for validation, the result is inaccurate reporting of the accuracy metric and strange gradient descent steps.
You can fix this problem by using the Train_on_batch function in Keras instead of the fit generator, or by accurately reporting these step numbers.

Tensorflow: 6 layer CNN: OOM (use 10Gb GPU memory)

I am using the following code for running a 6 layer CNN with 2 FC layers on top (on Tesla K-80 GPU).
Somehow, it consumes entire memory 10GB and died out of memory.I know that i can reduce the batch_size and then run , but i also want to run with 15 or 20 CNN layers.Whats wrong with the following code and why it takes all the memory? How should i run the code for 15 layers CNN.
Code:
import model
with tf.Graph().as_default() as g_train:
filenames = tf.train.match_filenames_once(FLAGS.train_dir+'*.tfrecords')
filename_queue = tf.train.string_input_producer(filenames, shuffle=True, num_epochs=FLAGS.num_epochs)
feats,labels = get_batch_input(filename_queue, batch_size=FLAGS.batch_size)
### feats size=(batch_size, 100, 50)
logits = model.inference(feats, FLAGS.batch_size)
loss = model.loss(logits, labels, feats)
tvars = tf.trainable_variables()
global_step = tf.Variable(0, name='global_step', trainable=False)
# Add to the Graph operations that train the model.
train_op = model.training(loss, tvars, global_step, FLAGS.learning_rate, FLAGS.clip_gradients)
# Add the Op to compare the logits to the labels during evaluation.
eval_correct = model.evaluation(logits, labels, feats)
summary_op = tf.merge_all_summaries()
saver = tf.train.Saver(tf.all_variables(), max_to_keep=15)
# The op for initializing the variables.
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
summary_writer = tf.train.SummaryWriter(FLAGS.model_dir,
graph=sess.graph)
# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
step = 0
while not coord.should_stop():
_, loss_value = sess.run([train_op, loss])
if step % 100 == 0:
print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value))
# Update the events file.
summary_str = sess.run(summary_op)
summary_writer.add_summary(summary_str, step)
if (step == 0) or (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
ckpt_model = os.path.join(FLAGS.model_dir, 'model.ckpt')
saver.save(sess, ckpt_model, global_step=step)
#saver.save(sess, FLAGS.model_dir, global_step=step)
step += 1
except tf.errors.OutOfRangeError:
print('Done training for %d epochs, %d steps.' % (FLAGS.num_epochs, step))
finally:
coord.join(threads)
sess.close()
###################### File model.py ####################
def conv2d(x, W, b, strides=1):
# Conv2D wrapper, with bias and relu activation
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1],
padding='SAME')
x = tf.nn.bias_add(x, b)
return tf.nn.relu(x)
def maxpool2d(x, k=2,s=2):
# MaxPool2D wrapper
return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, s,
s,1],padding='SAME')
def inference(feats,batch_size):
#feats size (batch_size,100,50,1) #batch_size=256
conv1_w=tf.get_variable("conv1_w", [filter_size,filter_size,1,256],initializer=tf.uniform_unit_scaling_initializer())
conv1_b=tf.get_variable("conv1_b",[256])
conv1 = conv2d(feats, conv1_w, conv1_b,2)
conv1 = maxpool2d(conv1, k=2,s=2)
### This was replicated for 6 layers and the 2 FC connected layers are added
return logits
def training(loss, train_vars, global_step, learning_rate, clip_gradients):
# Add a scalar summary for the snapshot loss.
tf.scalar_summary(loss.op.name, loss)
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, train_vars,aggregation_method=1), clip_gradients)
optimizer = tf.train.AdamOptimizer(learning_rate)
train_op = optimizer.apply_gradients(zip(grads, train_vars), global_step=global_step)
return train_op
I am not too sure what the model python library is. If it is something you wrote and can change the setting in the optimizer I would suggest the following which I use in my own code
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cost, aggregation_method = tf.AggregationMethod.EXPERIMENTAL_ACCUMULATE_N)
By default the aggeragetion_method is ADD_N but if you change it to EXPERIMENTAL_ACCUMULATE_N or EXPERIMENTAL_TREE this will greatly save memory. The main memory hog in these programs is that tensorflow must save the output values at every neuron so that it can compute the gradients. Changing the aggregation_method helps a lot from my experience.
Also BTW I don't think there is anything wrong with your code. I can run out of memory on small cov-nets as well.

Resources