exponential decay with scipy just gives step function - plot

I'm trying to do an exponential fit with a set of data:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
def func(x, a, b, c):
    return a * np.exp(x / -b) + c
epr_data = np.loadtxt('T2_text', skiprows=1)
time = epr_data[:, 1]
intensity = epr_data[:, 2]
optimizedParameters, pcov = opt.curve_fit(func, time, intensity)
print(optimizedParameters)
plt.plot(time, intensity, func(time, *optimizedParameters), label="fit")
plt.show()
But I just get this step function and these parameters:
[1.88476367e+05 1.00000000e+00 6.49563230e+03]
the plot with "fit"
as well as this error message:
OptimizeWarning: Covariance of the parameters could not be estimated
warnings.warn('Covariance of the parameters could not be estimated'
EDIT:
https://pastebin.com/GTTGf0ed
I want to plot the time and the first row with intensity.
the graph after your suggestion
edit 2:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
def func(x, a, b, c):
    return a * np.exp(x / -b) + c
epr_data = np.loadtxt('T2_text', skiprows=1)
time = epr_data[:, 1]
intensity = epr_data[:, 2]
c0 = np.mean(intensity[-10:])
a0 = intensity[0]-c0
th = time[np.searchsorted(-intensity+c0, -0.5*a0)]
b0 = th / np.log(2)
optimizedParameters, pcov = opt.curve_fit(func, time, intensity, p0=(a0, b0, c0))
print(optimizedParameters)
plt.plot(time, intensity, label='data')
plt.plot(time, func(time, *optimizedParameters), label="fit")
plt.legend()
plt.show()

First, fix your plot function call. To plot two curves with one call, you have to give the x and y values for each curve:
plt.plot(time, intensity, time, func(time, *optimizedParameters), label="fit")
For labeling, it might be simpler to call plot twice:
plt.plot(time, intensity, label="data")
plt.plot(time, func(time, *optimizedParameters), label="fit")
It would be easier to address the warning generated by curve_fit if we had your data. Experience shows, however, that it is most likely that the default initial guess of the parameters used by curve_fit (which is all 1s) is just a really bad guess.
Try helping curve_fit by giving it a better starting point for its numerical optimization routine. The following bit of code shows how you can compute rough guesses for a, b and c. Pass these to curve_fit with the argument p0=(a0, b0, c0).
# Assume that the time series is long enough that the tail
# has flattened out to approximately random noise around c.
c0 = np.mean(intensity[-10:])
# This assumes time[0] is 0.
a0 = intensity[0] - c0
# Rough guess of the half-life.
th = time[np.searchsorted(-intensity+c0, -0.5*a0)]
b0 = th / np.log(2)
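To show the whole workflow end to end, here is a minimal sketch on synthetic data. Everything about the data (the true parameters, the noise level, the time grid) is made up for illustration and is not your EPR file. Because noisy data is not sorted, this sketch finds the half-amplitude point with np.argmax on a boolean mask rather than np.searchsorted; that swap is my own choice, not something your data requires.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    """Exponential decay with amplitude a, time constant b and offset c."""
    return a * np.exp(-x / b) + c


# Synthetic stand-in for the measured data (all values here are made up).
rng = np.random.default_rng(0)
time = np.linspace(0.0, 4000.0, 200)
true_params = (1.9e5, 600.0, 6.5e3)
intensity = func(time, *true_params) + rng.normal(0.0, 500.0, time.size)

# Rough starting point: offset from the tail, amplitude from the first
# sample, time constant from the first crossing of half the amplitude.
c0 = np.mean(intensity[-10:])
a0 = intensity[0] - c0
half_idx = np.argmax(intensity - c0 <= 0.5 * a0)
b0 = time[half_idx] / np.log(2)

popt, pcov = curve_fit(func, time, intensity, p0=(a0, b0, c0))
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
print(popt, perr)
```

With a reasonable p0 the covariance matrix comes back finite, so the OptimizeWarning should not appear for data of this kind.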


How do I define matrix parameters in a Dymos problem?

I'm trying to set up a dynamic optimization with dymos where I have an analysis upstream of my dymos trajectory. This upstream analysis computes some 2D matrix K. I want to pass this matrix into my dymos problem. According to the documentation (and how I've done this in the past), the way to do this is to add K as a parameter to the trajectory:
traj.add_parameter('K', targets={'phase0': ['K']}, opt=False, static_target=True)
However, this returns an error because static_target expects K to be a scalar. If I have static_target=False, this also returns an error because it expects K to have some dimension related to the number of nodes in the trajectory.
Is there something I'm missing here?
Is it sufficient to manually connect K to the trajectory via
p.model.connect('K','traj.phase0.rhs_disc.K') and
p.model.connect('K','traj.phase0.rhs_col.K')? Or will that create issues in how dymos works the problem.
It doesn't seem appropriate to vectorize K either.
Any suggestions are greatly appreciated.
In my opinion, the easiest way to connect parameters from trajectory to phase is to add the parameter to both the Trajectory and the phases in which it is to be used.
Consider a simple oscillator where the mass, spring constant, and damping coefficient are given as a single size-3 input.
In this case, I used OpenMDAO's tags feature and a special dymos tag, dymos.static_target, so that dymos realizes the target isn't shaped with a different value at each node. I think it's a bit easier to do it this way than to add it later in the add_parameter call.
class OscillatorODEVectorParam(om.ExplicitComponent):
    """
    A Dymos ODE for a damped harmonic oscillator.
    """
    def initialize(self):
        self.options.declare('num_nodes', types=int)

    def setup(self):
        nn = self.options['num_nodes']
        # Inputs
        self.add_input('x', shape=(nn,), desc='displacement', units='m')
        self.add_input('v', shape=(nn,), desc='velocity', units='m/s')
        self.add_input('constants', shape=(3,), units=None,
                       desc='a vector of mass, spring constant, and damping coefficient [m, k, c]',
                       tags=['dymos.static_target'])
        self.add_output('v_dot', val=np.zeros(nn), desc='rate of change of velocity', units='m/s**2')
        self.declare_coloring(wrt='*', method='fd')

    def compute(self, inputs, outputs):
        x = inputs['x']
        v = inputs['v']
        m, k, c = inputs['constants']
        f_spring = -k * x
        f_damper = -c * v
        outputs['v_dot'] = (f_spring + f_damper) / m
To use the ODE, we have a problem with a single trajectory and, in this case, a single phase.
Again, in my opinion, the clearest way to link parameters from the trajectory to phases is to add them in both places with the same name.
Dymos will perform some introspection and automatically link them up.
def test_ivp_driver_shaped_param(self):
    import openmdao.api as om
    import dymos as dm
    import matplotlib.pyplot as plt
    # plt.switch_backend('Agg') # disable plotting to the screen
    from dymos.examples.oscillator.oscillator_ode import OscillatorODEVectorParam

    # Instantiate an OpenMDAO Problem instance.
    prob = om.Problem()

    # We need an optimization driver. To solve this simple problem ScipyOptimizeDriver will work.
    prob.driver = om.ScipyOptimizeDriver()

    # Instantiate a Phase
    phase = dm.Phase(ode_class=OscillatorODEVectorParam, transcription=dm.Radau(num_segments=10))

    # Tell Dymos that the duration of the phase is bounded.
    phase.set_time_options(fix_initial=True, fix_duration=True)

    # Tell Dymos the states to be propagated using the given ODE.
    phase.add_state('x', fix_initial=True, rate_source='v', targets=['x'], units='m')
    phase.add_state('v', fix_initial=True, rate_source='v_dot', targets=['v'], units='m/s')

    # The spring constant, damping coefficient, and mass are inputs to the system that are
    # constant throughout the phase.
    # Declare this parameter on phase and then we'll feed its value from the parent trajectory.
    phase.add_parameter('constants', units=None)

    # Since we're using an optimization driver, an objective is required. We'll minimize
    # the final time in this case.
    phase.add_objective('time', loc='final')

    # Instantiate a Dymos Trajectory and add it to the Problem model.
    traj = prob.model.add_subsystem('traj', dm.Trajectory())
    traj.add_phase('phase0', phase)

    # This parameter value will connect to any phase with a parameter named constants by default.
    # This is the easiest way, in my opinion, to pass parameters from trajectory to phase.
    traj.add_parameter('constants', units=None, opt=False)

    # Setup the OpenMDAO problem
    prob.setup()

    # Assign values to the times and states
    prob.set_val('traj.phase0.t_initial', 0.0)
    prob.set_val('traj.phase0.t_duration', 15.0)
    prob.set_val('traj.phase0.states:x', 10.0)
    prob.set_val('traj.phase0.states:v', 0.0)
    #                                        m    k    c
    prob.set_val('traj.parameters:constants', [1.0, 1.0, 0.5])

    # Now we're using the optimization driver to iteratively run the model and vary the
    # phase duration until the final y value is 0.
    prob.run_driver()

    # Perform an explicit simulation of our ODE from the initial conditions.
    sim_out = traj.simulate(times_per_seg=50)

    # Plot the state values obtained from the phase timeseries objects in the simulation output.
    t_sol = prob.get_val('traj.phase0.timeseries.time')
    t_sim = sim_out.get_val('traj.phase0.timeseries.time')
    states = ['x', 'v']
    fig, axes = plt.subplots(len(states), 1)
    for i, state in enumerate(states):
        sol = axes[i].plot(t_sol, prob.get_val(f'traj.phase0.timeseries.states:{state}'), 'o')
        sim = axes[i].plot(t_sim, sim_out.get_val(f'traj.phase0.timeseries.states:{state}'), '-')
        axes[i].set_ylabel(state)
    axes[-1].set_xlabel('time (s)')
    fig.legend((sol[0], sim[0]), ('solution', 'simulation'), loc='lower right', ncol=2)
    plt.tight_layout()
    plt.show()

Clustering Time Series in R - is K Mean accurate?

My data set is composed of measurements of the same index over 14 years (columns) for 105 countries (rows). I want to cluster countries based on their index trend over time.
I am trying hierarchical clustering (hclust) and K-medoids (pam) with a DTW distance matrix (dtw package).
I also tried K-means, passing the DTW distance matrix as the first argument of the kmeans function. The algorithm runs, but I'm not sure about the accuracy of the result, since K-means uses Euclidean distance and computes centroids as means.
I am also thinking about using the data directly, but I can't see how the result would be accurate, since the algorithm would treat the measurements of the same variable at different times as different variables when computing the centroids at each iteration, and would use Euclidean distance to assign observations to clusters. It doesn't seem to me that this process could cluster time series as well as hierarchical and K-medoids clustering.
Is K-means a good choice for clustering time series, or is it better to use algorithms that work with a distance such as DTW (but are slower)? Is there an R function that allows K-means to be used with a distance matrix, or a specific package for clustering time series data?
KMeans will do exactly what you tell it to do. Unfortunately, feeding a time series dataset into a KMeans algo will give meaningless results. The KMeans algo, and most general clustering methods, are built around the Euclidean distance, which does not seem to be a good measure for time series data. Quite simply, K-means often doesn't work when clusters are not round, because it measures distance from each point to the cluster center. Check out the GMM algo as an alternative. It sounds like you are going with R for this experiment. If so, check out the sample code below.
Here is a KMeans cluster.
Here is a GMM cluster.
Which one looks more like a time series plot to you??!!
I Googled around for a good sample of R code to demonstrate how GMM clustering works. Unfortunately, I couldn't find anything decent. Personally, I use Python much more than I use R. If you are open to a Python solution, check out the sample code below.
import numpy as np
import itertools
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn import mixture

# Number of samples per component
n_samples = 500

# Generate random sample, two components
np.random.seed(0)
C = np.array([[0., -0.1], [1.7, .4]])
X = np.r_[np.dot(np.random.randn(n_samples, 2), C),
          .7 * np.random.randn(n_samples, 2) + np.array([-6, 3])]

lowest_bic = np.inf
bic = []
n_components_range = range(1, 7)
cv_types = ['spherical', 'tied', 'diag', 'full']
for cv_type in cv_types:
    for n_components in n_components_range:
        # Fit a Gaussian mixture with EM
        gmm = mixture.GaussianMixture(n_components=n_components,
                                      covariance_type=cv_type)
        gmm.fit(X)
        bic.append(gmm.bic(X))
        if bic[-1] < lowest_bic:
            lowest_bic = bic[-1]
            best_gmm = gmm

bic = np.array(bic)
color_iter = itertools.cycle(['navy', 'turquoise', 'cornflowerblue',
                              'darkorange'])
clf = best_gmm
bars = []

# Plot the BIC scores
plt.figure(figsize=(8, 6))
spl = plt.subplot(2, 1, 1)
for i, (cv_type, color) in enumerate(zip(cv_types, color_iter)):
    xpos = np.array(n_components_range) + .2 * (i - 2)
    bars.append(plt.bar(xpos, bic[i * len(n_components_range):
                                  (i + 1) * len(n_components_range)],
                        width=.2, color=color))
plt.xticks(n_components_range)
plt.ylim([bic.min() * 1.01 - .01 * bic.max(), bic.max()])
plt.title('BIC score per model')
xpos = np.mod(bic.argmin(), len(n_components_range)) + .65 +\
    .2 * np.floor(bic.argmin() / len(n_components_range))
plt.text(xpos, bic.min() * 0.97 + .03 * bic.max(), '*', fontsize=14)
spl.set_xlabel('Number of components')
spl.legend([b[0] for b in bars], cv_types)

# Plot the winner
splot = plt.subplot(2, 1, 2)
Y_ = clf.predict(X)
for i, (mean, cov, color) in enumerate(zip(clf.means_, clf.covariances_,
                                           color_iter)):
    v, w = linalg.eigh(cov)
    if not np.any(Y_ == i):
        continue
    plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], .8, color=color)

    # Plot an ellipse to show the Gaussian component
    angle = np.arctan2(w[0][1], w[0][0])
    angle = 180. * angle / np.pi  # convert to degrees
    v = 2. * np.sqrt(2.) * np.sqrt(v)
    ell = mpl.patches.Ellipse(mean, v[0], v[1], angle=180. + angle, color=color)
    ell.set_clip_box(splot.bbox)
    ell.set_alpha(.5)
    splot.add_artist(ell)

plt.xticks(())
plt.yticks(())
plt.title('Selected GMM: full model, 2 components')
plt.subplots_adjust(hspace=.35, bottom=.02)
plt.show()
Finally, here's an example of how to visualize clusters using plotGMM. The code to reproduce follows:
require(quantmod)
SCHB <- fortify(getSymbols('SCHB', auto.assign=FALSE))
set.seed(730) # for reproducibility
mixmdl <- mixtools::normalmixEM(Cl(SCHB), k = 5) # 5 clusters
plot_GMM(mixmdl, k = 5)
I hope that helps. Oh, and for plotting time series with ggplot2, you should avail yourself of ggplot2's fortify function.
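For the distance-matrix route the question describes (DTW plus hierarchical clustering), here is a rough Python sketch. The data is made up (6 series of 14 points standing in for countries and years), dtw_distance is a plain textbook implementation written for this example rather than any package's function, and average linkage is just one possible choice. Note that this is deliberately not K-means: K-means needs the raw feature vectors so it can compute means, and a function like R's kmeans will treat the rows of a distance matrix as if they were feature vectors.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


# Made-up data: 6 "countries" x 14 "years", three rising and three falling.
rng = np.random.default_rng(1)
years = np.arange(14)
rising = [years + rng.normal(0, 0.5, 14) for _ in range(3)]
falling = [13 - years + rng.normal(0, 0.5, 14) for _ in range(3)]
series = np.array(rising + falling)

# Pairwise DTW distance matrix (symmetric, zero diagonal).
k = len(series)
dist = np.zeros((k, k))
for i in range(k):
    for j in range(i + 1, k):
        dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

# Average-linkage hierarchical clustering on the condensed distances,
# then cut the tree into two clusters.
tree = linkage(squareform(dist), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

The same distance matrix could be handed to a K-medoids routine instead, since medoids (unlike means) only need pairwise distances.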

Efficiently sample a collection of multi-normal variables with varying sigma (covariance) matrix

I'm new to Stan, so hoping you can point me in the right direction. I'll build up to my situation to make sure we're on the same page...
If I had a collection of univariate normals, the docs tell me that:
y ~ normal(mu_vec, sigma);
provides the same model as the unvectorized version:
for (n in 1:N)
y[n] ~ normal(mu_vec[n], sigma);
but that the vectorized version is (much?) faster. Ok, fine, makes good sense.
So the first question is: is it possible to take advantage of this vectorization speedup in the univariate normal case where both the mu and sigma of the samples vary by position in the vector. I.e. if both mu_vec and sigma_vec are vectors (in the previous case sigma was a scalar), then is this:
y ~ normal(mu_vec, sigma_vec);
equivalent to this:
for (n in 1:N)
y[n] ~ normal(mu_vec[n], sigma_vec[n]);
and if so is there a comparable speedup?
Ok. That's the warmup. The real question is how to best approach the multi-variate equivalent of the above.
In my particular case, I have N observations of bivariate data for some variable y, which I store in an N x 2 matrix. (For order of magnitude, N is about 1000 in my use case.)
My belief is that the mean of each component of each observation is 0 and that the stdev of each component of each observation is 1 (and I'm happy to hard code them as such). However, my belief is that the correlation (rho) varies from observation to observation as a (simple) function of another observed variable, x (stored in an N element vector). For example, we might say that rho[n] = 2*inverse_logit(beta * x[n]) - 1 for n in 1:N and our goal is to learn about beta from our data. I.e. the covariance matrix for the nth observation would be:
[1, rho[n]]
[rho[n], 1 ]
My question is what's the best way to put this together in a STAN model so that it isn't slow as heck? Is there a vectorized version of the multi_normal distribution so that I could specify this as:
y ~ multi_normal(vector_of_mu_2_tuples, vector_of_sigma_matrices)
or perhaps as some other similar formulation? Or will I need to write:
for (n in 1:N)
y[n] ~ multi_normal(vector_of_mu_2_tuples[n], vector_of_sigma_matrices[n])
after having set up vector_of_sigma_matrices and vector_of_mu_2_tuples in an earlier block?
Thanks in advance for any guidance!
Edit to add code
Using python, I can generate data in the spirit of my problem as follows:
import numpy as np
import pandas as pd
import pystan as pys
import scipy as sp
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import seaborn as sns
def gen_normal_data(N, true_beta, true_mu, true_stdevs):
    N = N
    true_beta = true_beta
    true_mu = true_mu
    true_stdevs = true_stdevs
    drivers = np.random.randn(N)
    correls = 2.0 * sp.special.expit(drivers*true_beta)-1.0
    observations = []
    for i in range(N):
        covar = np.array([[true_stdevs[0]**2, true_stdevs[0] * true_stdevs[1] * correls[i]],
                          [true_stdevs[0] * true_stdevs[1] * correls[i], true_stdevs[1]**2]])
        observations.append(sp.stats.multivariate_normal.rvs(true_mu, covar, size=1).tolist())
    observations = np.array(observations)
    return {
        'N': N,
        'true_mu': true_mu,
        'true_stdev': true_stdevs,
        'y': observations,
        'd': drivers,
        'correls': correls
    }
and then actually generate the data using:
normal_data = gen_normal_data(100, 1.5, np.array([1., 5.]), np.array([2., 5.]))
Here's what the data set looks like (a scatterplot of y colored by correls in the left pane and by drivers in the right pane, so the idea is that the higher the driver, the closer to 1 the correl, and the lower the driver, the closer to -1 the correl). So we would expect red dots in the left pane to run "down-left to up-right" and blue dots to run "up-left to down-right", and indeed they do:
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
x = normal_data['y'][:, 0]
y = normal_data['y'][:, 1]
correls = normal_data['correls']
drivers = normal_data['d']
for ax, colordata, cmap in zip(axes, [correls, drivers], ['coolwarm', 'viridis']):
    color_extreme = max(abs(colordata.max()), abs(colordata.min()))
    sc = ax.scatter(x, y, c=colordata, lw=0, cmap=cmap, vmin=-color_extreme, vmax=color_extreme)
    divider = make_axes_locatable(ax)
    cax = divider.append_axes('right', size='5%', pad=0.05)
    fig.colorbar(sc, cax=cax, orientation='vertical')
fig.tight_layout()
Using the brute force approach, I can set up a STAN model that looks like this:
model_naked = pys.StanModel(
    model_name='naked',
    model_code="""
data {
  int<lower=0> N;
  vector[2] true_mu;
  vector[2] true_stdev;
  real d[N];
  vector[2] y[N];
}
parameters {
  real beta;
}
transformed parameters {
}
model {
  real rho[N];
  matrix[2, 2] cov[N];
  for (n in 1:N) {
    rho[n] = 2.0*inv_logit(beta * d[n]) - 1.0;
    cov[n, 1, 1] = true_stdev[1]^2;
    cov[n, 1, 2] = true_stdev[1] * true_stdev[2] * rho[n];
    cov[n, 2, 1] = true_stdev[1] * true_stdev[2] * rho[n];
    cov[n, 2, 2] = true_stdev[2]^2;
  }
  beta ~ normal(0, 10000);
  for (n in 1:N) {
    y[n] ~ multi_normal(true_mu, cov[n]);
  }
}
"""
)
This fits nicely:
fit_naked = model_naked.sampling(data=normal_data, iter=1000, chains=2)
f = fit_naked.plot();
f.tight_layout()
But I'm hoping someone can point me in the right direction for the "marginalized" approach where we break down our bivariate normal into a pair of independent normals that can be blended using the correlation. The reason I need this is that in my actual use case, both dimensions of y are fat-tailed. I am happy to model this as a student-t distribution, but the issue is that STAN only allows a single nu to be specified (not one for each dimension), so I think I'll need to find a way to decompose a multi_student_t into a pair of independent student_t's so that I can set the degrees of freedom separately for each dimension.
The univariate normal distribution does accept vectors for any or all of its arguments and it will be faster than looping over the N observations to call it N times with scalar arguments.
However, the speedup is only going to be linear because the calculations are all the same, but it only has to allocate memory once if you only call it once. The overall wall time is more affected by the number of function evaluations you have to do, which is up to 2^10 - 1 per MCMC iteration (by default), but whether you hit the maximum treedepth depends on the geometry of the posterior distribution you are trying to sample from, which, in turn, depends on everything including the data you condition on.
The bivariate normal distribution can be written as a product of a marginal univariate normal distribution for the first variable and a conditional univariate normal distribution for the second variable given the first variable. In Stan code, we can utilize element-wise multiplication and division to write its log-density like
target += normal_lpdf(first_variable | first_means, first_sigmas);
target += normal_lpdf(second_variable | second_means +
  rhos .* second_sigmas ./ first_sigmas .* (first_variable - first_means),
  second_sigmas .* sqrt(1 - square(rhos)));
Unfortunately, the more general multivariate normal distribution in Stan does not have an implementation that inputs arrays of covariance matrices.
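As a sanity check on the marginal-times-conditional factorization, here is a small NumPy/SciPy sketch that compares it against a per-observation bivariate normal density. The means, standard deviations and correlations are made-up values chosen only for the check; the conditional mean is mu2 + rho * (sigma2 / sigma1) * (y1 - mu1) and the conditional standard deviation is sigma2 * sqrt(1 - rho^2).

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(2)
n = 5
mu1, mu2 = 1.0, 5.0                 # made-up means
s1, s2 = 2.0, 5.0                   # made-up standard deviations
rho = np.tanh(rng.normal(size=n))   # one correlation per observation
y1 = rng.normal(mu1, s1, size=n)
y2 = rng.normal(mu2, s2, size=n)

# Marginal of y1 plus conditional of y2 given y1, fully vectorized.
cond_mean = mu2 + rho * s2 / s1 * (y1 - mu1)
cond_sd = s2 * np.sqrt(1.0 - rho**2)
lp_factored = norm.logpdf(y1, mu1, s1) + norm.logpdf(y2, cond_mean, cond_sd)

# Reference: one bivariate normal log-density per observation.
lp_joint = np.array([
    multivariate_normal.logpdf(
        [y1[i], y2[i]],
        mean=[mu1, mu2],
        cov=[[s1**2, rho[i] * s1 * s2], [rho[i] * s1 * s2, s2**2]],
    )
    for i in range(n)
])

print(np.max(np.abs(lp_factored - lp_joint)))
```

The two log-densities should agree to floating-point precision, which is why the vectorized univariate form can stand in for the loop over multi_normal calls in the bivariate case.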
This isn't quite answering your question, but you can make your program more efficient by removing a bunch of redundant calculations and converting scale a bit to use tanh rather than scaled inverse logit. I'd get rid of the scaling and just use smaller betas, but I left it so that it should get the same results.
data {
  int<lower=0> N;
  vector[2] mu;
  vector[2] sigma;
  vector[N] d;
  vector[2] y[N];
}
transformed data {
  real var1 = square(sigma[1]);
  real var2 = square(sigma[2]);
  real covar12 = sigma[1] * sigma[2];
  vector[N] d_div_2 = d * 0.5;
}
parameters {
  real beta;
}
model {
  // note: 2 * inv_logit(u) - 1 = tanh(u / 2)
  vector[N] rho = tanh(beta * d_div_2);
  matrix[2, 2] Sigma;
  Sigma[1, 1] = var1;
  Sigma[2, 2] = var2;
  // only reassign what's necessary with minimal recomputation
  for (n in 1:N) {
    Sigma[1, 2] = rho[n] * covar12;
    Sigma[2, 1] = Sigma[1, 2];
    y[n] ~ multi_normal(mu, Sigma);
  }
  // weakly informative priors fit more easily
  beta ~ normal(0, 8);
}
You could also factor by figuring out Cholesky factorization as function of rho and other fixed values and use that---it saves a solver step in the multivariate normal.
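For the 2 x 2 case the Cholesky factor has a simple closed form, which the following NumPy sketch checks against a numerical factorization. The helper name chol_bivariate and the numbers are made up for the example; in Stan, a factor like this is what multi_normal_cholesky expects in place of the covariance matrix.

```python
import numpy as np


def chol_bivariate(s1, s2, rho):
    """Lower Cholesky factor of [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]."""
    return np.array([
        [s1, 0.0],
        [rho * s2, s2 * np.sqrt(1.0 - rho**2)],
    ])


s1, s2, rho = 2.0, 5.0, 0.3   # made-up values
L = chol_bivariate(s1, s2, rho)
Sigma = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])

# Both checks should hold: L reproduces Sigma and matches NumPy's factor.
print(np.allclose(L @ L.T, Sigma), np.allclose(L, np.linalg.cholesky(Sigma)))
```

Only the second row depends on rho, so inside the loop over observations just two entries need to be updated.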
The other option you have is to write out the multi-student-t directly rather than using our built-in implementation. The built-in probably won't be a whole lot faster as the whole operation's pretty heavily dominated by the matrix solve.

Sequential Monte Carlo

I was given this model, and to get the probability I am supposed to simulate the data.
x_1 ∼ N(0, 102)
x_t = 0.5 x_{t−1} + 25 x_{t−1} / (1 + x_{t−1}^2) + 8 cos(1.2 (t − 1)) + ε_t, t = 2, 3, ...
y_t = x_t^2 / 25 + η_t, t = 1, 2, 3, ...
where ε_t and η_t follow normal distributions.
I tried to inverse the function but I cannot do it because of the fact that I have no idea if my Xs will be positive or negative. I understood that I should use a sequential monte carlo, but I can't figure out how to find the functions of the algorithm. What is f and g, and how can we decide x(t-1) if it is equally likely to be positive or negative because of the x squared?
Algorithm:
1. Sample X_1 ∼ g_1(·). Let w_1 = u_1 = f_1(x_1) / g_1(x_1). Set t = 2.
2. Sample X_t | x_{t−1} ∼ g_t(x_t | x_{t−1}).
3. Append x_t to x_{1:t−1}, obtaining x_{1:t}.
4. Let u_t = f_t(x_t | x_{t−1}) / g_t(x_t | x_{t−1}).
5. Let w_t = w_{t−1} u_t, the importance weight for x_{1:t}.
6. Increment t and return to step 2.
With a time-series model like yours, essentially the only way to compute the probability distribution of x or y is to run multiple simulations of the model, with randomly drawn values of x_0, eps_t, eta_t, and then construct histograms by aggregating the samples across all the runs. In very special cases (e.g. damped Brownian motion) it may be possible to calculate the resulting probability distributions algebraically, but I don't think there's any chance of that for your model.
In Python (I'm afraid I'm not fluent enough in R), you can simulate the time-series by something like this:
import math, random
def simSequence(steps, eps=0.1, eta=0.1):
    x = random.normalvariate(0, 102)
    ySamples = []
    for t in range(steps):
        y = (x ** 2) / 25 + random.normalvariate(0, eta)
        ySamples.append(y)
        x = (0.5 * x + 25 * x / (1 + x ** 2)
             + 8 * math.cos(1.2 * t) + random.normalvariate(0, eps))
    return ySamples
(This replaces your t=1..n with t=0..(n-1).)
You could then generate a plot of a few examples of the y time-series:
import matplotlib.pyplot as plt
nSteps = 100
for run in range(5):
    history = simSequence(nSteps)
    plt.plot(range(nSteps), history)
plt.show()
to get something like:
If you then want to compute the probability distribution of y at different times, you could generate a matrix whose columns represent realizations of y_t at a common value of time and compute histograms at selected values of t:
import numpy
runs = numpy.array([ simSequence(nSteps) for run in range(10000) ])
plt.hist(runs[:,5], bins=25, label='t=5', alpha=0.5, density=True)
plt.hist(runs[:,10], bins=25, label='t=10', alpha=0.5, density=True)
plt.legend(loc='best')
plt.show()
which gives:
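On the question of what f and g are in the quoted algorithm: one common choice (an assumption on my part, since the question does not say which proposal is intended) is to take g_t to be the model's own transition density, so that the ratio u_t = f_t / g_t reduces to the observation likelihood p(y_t | x_t). With that choice there is no need to invert y = x^2 / 25, because particles of either sign simply receive weight according to how well x^2 / 25 matches y_t. Below is a rough sketch of this "bootstrap" variant with resampling at every step. The noise standard deviations sd_eps and sd_eta are hypothetical (the question does not give them), 102 is treated as the standard deviation of x_1 as in the simulation code above, and the function names are made up for this example.

```python
import numpy as np


def transition_mean(x, t):
    """Deterministic part of x_t given x_{t-1} = x, with 1-based t >= 2."""
    return 0.5 * x + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * (t - 1))


def simulate(n_steps, sd_x1=102.0, sd_eps=1.0, sd_eta=1.0, seed=1):
    """Draw one observation sequence y_1..y_n from the model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x1)
    ys = []
    for t in range(1, n_steps + 1):
        if t > 1:
            x = transition_mean(x, t) + rng.normal(0.0, sd_eps)
        ys.append(x**2 / 25.0 + rng.normal(0.0, sd_eta))
    return np.array(ys)


def bootstrap_filter(y, n_particles=2000, sd_x1=102.0, sd_eps=1.0,
                     sd_eta=1.0, seed=0):
    """Return the weighted particle means per step and the last weights."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x1, n_particles)   # step 1: g_1 is the prior
    means = np.empty(len(y))
    for k, y_t in enumerate(y):
        t = k + 1
        if t > 1:
            # step 2: propose from the transition density (g_t)
            x = transition_mean(x, t) + rng.normal(0.0, sd_eps, n_particles)
        # steps 4-5: u_t is the Gaussian likelihood of y_t given x_t,
        # computed in log space and normalized for numerical stability
        log_w = -0.5 * ((y_t - x**2 / 25.0) / sd_eta) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means[k] = np.sum(w * x)
        # resample so every particle starts the next step with equal weight
        x = rng.choice(x, size=n_particles, p=w)
    return means, w


y = simulate(50)
means, w = bootstrap_filter(y)
print(means[:5])
```

Because y only informs x through x^2, the filtered distribution can be bimodal, so the weighted mean may sit near zero even when the particles are concentrated around +x and -x; a histogram of the particles is more informative than the mean in that case.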

Math behind Conv2D function in Keras

I am using Conv2D model of Keras 2.0. However, I cannot fully understand what the function is doing mathematically. I try to understand the math using randomly generated data and a very simple network:
import numpy as np
import keras
from keras.layers import Input, Conv2D
from keras.models import Model
from keras import backend as K
# create the model
inputs = Input(shape=(10,10,1)) # 1 channel, 10x10 image
outputs = Conv2D(32, (3, 3), activation='relu', name='block1_conv1')(inputs)
model = Model(outputs=outputs, inputs=inputs)
# input
x = np.random.random(100).reshape((10,10))
# predicted output for x
y_pred = model.predict(x.reshape((1,10,10,1))) # y_pred.shape = (1,8,8,32)
I tried to calculate, for example, the value of the first row, the first column in the first feature map, following the demo in here.
w = model.layers[1].get_weights()[0] # w.shape = (3,3,1,32)
w0 = w[:,:,0,0]
b = model.layers[1].get_weights()[1] # b.shape = (32,)
b0 = b[0] # b0 = 0
y_pred_000 = np.sum(x[0:3,0:3] * w0) + b0
But relu(y_pred_000) is not equal to y_pred[0][0][0][0].
Could anyone point out what's wrong with my understanding? Thank you.
It's easy and it comes from Theano dim ordering. The result of applying a filter is stored in a so-called channel dimension. In the case of TensorFlow this is the last dimension, and that's why the results are good. In the case of Theano it's the second dimension (the convolution result has shape (cases, channels, width, height)), so in order to solve your problem you need to change the prediction line to:
y_pred = model.predict(x.reshape((1,1,10,10)))
You also need to change the way you get the weights. Since weights in Theano have shape (output_channels, input_channels, width, height), change the weight getter to:
w = model.layers[1].get_weights()[0] # w.shape = (32,1,3,3)
w0 = w[0,0,:,:]
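To see the arithmetic independently of any backend, here is a plain NumPy sketch of what a 3x3 'valid' convolution layer with ReLU computes for a single-channel image. It assumes a channels-last kernel layout (kernel_height, kernel_width, in_channels, out_channels) and that the kernel is applied without flipping (cross-correlation); both are assumptions about the backend, and the weights here are random stand-ins rather than weights pulled from a Keras model. The function name conv2d_valid is made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 10))               # single-channel 10x10 image
w = rng.normal(size=(3, 3, 1, 32))     # stand-in kernel, channels-last layout
b = np.zeros(32)                       # stand-in bias


def conv2d_valid(image, kernel, bias):
    """'valid' cross-correlation + bias + ReLU for one input channel."""
    kh, kw, _, n_out = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w, n_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Sum over the 3x3 window for all 32 filters at once.
            out[i, j] = np.tensordot(patch, kernel[:, :, 0, :],
                                     axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


y = conv2d_valid(x, w, b)

# Hand computation of the top-left value of the first feature map.
y_000 = max(np.sum(x[0:3, 0:3] * w[:, :, 0, 0]) + b[0], 0.0)
print(y.shape, np.isclose(y[0, 0, 0], y_000))
```

If the hand computation matches this sketch but not model.predict, the mismatch is most likely in how the input or the weights are laid out, which is what the dim-ordering fix above addresses.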
