Math behind Conv2D function in Keras

I am using the Conv2D layer of Keras 2.0, but I cannot fully understand what it is doing mathematically. I am trying to understand the math using randomly generated data and a very simple network:
import numpy as np
import keras
from keras.layers import Input, Conv2D
from keras.models import Model
from keras import backend as K
# create the model
inputs = Input(shape=(10,10,1)) # 1 channel, 10x10 image
outputs = Conv2D(32, (3, 3), activation='relu', name='block1_conv1')(inputs)
model = Model(outputs=outputs, inputs=inputs)
# input
x = np.random.random(100).reshape((10,10))
# predicted output for x
y_pred = model.predict(x.reshape((1,10,10,1))) # y_pred.shape = (1,8,8,32)
I tried to calculate, for example, the value at the first row and first column of the first feature map, following the demo here.
w = model.layers[1].get_weights()[0] # w.shape = (3,3,1,32)
w0 = w[:,:,0,0]
b = model.layers[1].get_weights()[1] # b.shape = (32,)
b0 = b[0] # b0 = 0
y_pred_000 = np.sum(x[0:3,0:3] * w0) + b0
But relu(y_pred_000) is not equal to y_pred[0][0][0][0].
Could anyone point out what's wrong with my understanding? Thank you.

It's easy, and it comes from the Theano dim ordering. The result of applying a filter is stored in a so-called channel dimension. With TensorFlow this is the last dimension, which is why your results would match there. With Theano it is the second dimension (the convolution result has shape (cases, channels, width, height)), so to solve your problem you need to change the prediction line to:
y_pred = model.predict(x.reshape((1,1,10,10)))
You also need to change the way you get the weights, because weights in Theano have shape (output_channels, input_channels, width, height). Change the weight getter to:
w = model.layers[1].get_weights()[0] # w.shape = (32,1,3,3)
w0 = w[0,0,:,:]
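For what it's worth, with the TensorFlow backend (channels_last) the calculation from the question does line up; a minimal check, reusing model, x and y_pred as defined above:
w = model.layers[1].get_weights()[0]            # (3, 3, 1, 32): kh, kw, in_channels, filters
b = model.layers[1].get_weights()[1]            # (32,)
z = np.sum(x[0:3, 0:3] * w[:, :, 0, 0]) + b[0]  # first 3x3 patch, first filter
manual = max(z, 0.0)                            # ReLU
print(np.isclose(manual, y_pred[0, 0, 0, 0]))   # expected: True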

Related

Interpretation/Mechanics of eigenvectors/eigenvalues of covariance matrix (PCA)

1st Q: Can someone explain the connection between the covariance matrix and its eigenvectors?
2nd Q: How do dependencies in the data affect my PCA, and how is the "best" component then chosen?
This could happen with easily detectable things like height, weight and BMI, or with less obvious things.
Maybe someone can recommend a video or explain clearly how to think about the eigenvectors extracted from the covariance matrix.
I understand them as the characteristic components that stay the same after applying the matrix as a transformation.
But here we don't apply any transformation, so how should they be interpreted now?
I have read so far that the eigenvalues are the "amount" of how much the extracted eigenvector spans/explains the space. 3rd Q: Is this wrong?
# Testing something for us!
import numpy as np
import pandas as pd
x = np.array(range(1, 11))
y = np.array(range(20, 220, 20))
z = x * y
data = np.array([x, y, z])
test_matrix = np.cov(data)
#print(test_matrix)
eigenval, eigenvect = np.linalg.eig(test_matrix)
print(eigenval)
# sorting
sorting_indices = eigenval.argsort()[::-1] # not needed in our case
eigenval = eigenval[sorting_indices]
eigenvect = eigenvect[:, sorting_indices]  # eigenvectors are the *columns* of eigenvect
print(eigenval)
print(eigenvect)
# mapping (projection onto the principal axes)
result = np.dot(eigenvect.transpose(), data)
# first component explanation amount
first_comp_perc = eigenval[0] / sum(eigenval)
print(f'The first component makes up for {round(first_comp_perc*100, 2)}% of the data variance')
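On the 3rd question: the eigenvalues of the covariance matrix are the variances along the corresponding eigenvector directions, so eigenvalue / sum(eigenvalues) is exactly the fraction of total variance that component explains, which is what the last two lines above compute. A quick sanity check against scikit-learn's PCA (sklearn is only brought in here for comparison, it is not used anywhere above):
import numpy as np
from sklearn.decomposition import PCA

x = np.array(range(1, 11))
y = np.array(range(20, 220, 20))
z = x * y
data = np.array([x, y, z])

cov = np.cov(data)
eigenval, eigenvect = np.linalg.eigh(cov)   # eigh, since the covariance matrix is symmetric
ratios = np.sort(eigenval)[::-1] / eigenval.sum()

pca = PCA().fit(data.T)                     # sklearn expects samples in rows
print(ratios)
print(pca.explained_variance_ratio_)        # matches up to numerical noise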

exponential decay with scipy just gives step function

I'm trying to do an exponential fit with a set of data:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
def func(x, a, b, c):
    return a * np.exp(x / -b) + c
epr_data = np.loadtxt('T2_text', skiprows=1)
time = epr_data[:, 1]
intensity = epr_data[:, 2]
optimizedParameters, pcov = opt.curve_fit(func, time, intensity)
print(optimizedParameters)
plt.plot(time, intensity, func(time, *optimizedParameters), label="fit")
plt.show()
But I just get this step function and these parameters:
[1.88476367e+05 1.00000000e+00 6.49563230e+03]
[plot: the data with the "fit" curve]
as well as this error message:
OptimizeWarning: Covariance of the parameters could not be estimated
warnings.warn('Covariance of the parameters could not be estimated'
EDIT:
https://pastebin.com/GTTGf0ed
I want to plot the time column against the intensity.
[graph after the suggested change]
EDIT 2:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
def func(x, a, b, c):
    return a * np.exp(x / -b) + c
epr_data = np.loadtxt('T2_text', skiprows=1)
time = epr_data[:, 1]
intensity = epr_data[:, 2]
c0 = np.mean(intensity[-10:])
a0 = intensity[0]-c0
th = time[np.searchsorted(-intensity+c0, -0.5*a0)]
b0 = th / np.log(2)
optimizedParameters, pcov = opt.curve_fit(func, time, intensity, p0=(a0, b0, c0))
print(optimizedParameters)
plt.plot(time, intensity, label='data')
plt.plot(time, func(time, *optimizedParameters), label="fit")
plt.legend()
plt.show()
First, fix your plot function call. To plot two curves with one call, you have to give the x and y values for each curve:
plt.plot(time, intensity, time, func(time, *optimizedParameters), label="fit")
For labeling, it might be simpler to call plot twice:
plt.plot(time, intensity, label="data")
plt.plot(time, func(time, *optimizedParameters), label="fit")
It would be easier to address the warning generated by curve_fit if we had your data. Experience shows, however, that it is most likely that the default initial guess of the parameters used by curve_fit (which is all 1s) is just a really bad guess.
Try helping curve_fit by giving it a better starting point for its numerical optimization routine. The following bit of code shows how you can compute rough guesses for a, b and c. Pass these to curve_fit with the argument p0=(a0, b0, c0).
# Assume that the time series is long enough that the tail
# has flattened out to approximately random noise around c.
c0 = np.mean(intensity[-10:])
# This assumes time[0] is 0.
a0 = intensity[0] - c0
# Rough guess of the half-life.
th = time[np.searchsorted(-intensity+c0, -0.5*a0)]
b0 = th / np.log(2)
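For instance, a self-contained sketch of the same recipe on simulated data (the decay parameters below are made up, since the T2_text file is not available here):
import numpy as np
import scipy.optimize as opt

def func(x, a, b, c):
    return a * np.exp(x / -b) + c

# simulate a noisy exponential decay with known parameters
rng = np.random.default_rng(0)
time = np.linspace(0, 50, 200)
intensity = func(time, 2.0e5, 8.0, 6.5e3) + rng.normal(0, 2e3, time.size)

# rough initial guesses as described above
c0 = np.mean(intensity[-10:])
a0 = intensity[0] - c0
th = time[np.searchsorted(-intensity + c0, -0.5 * a0)]
b0 = th / np.log(2)

popt, pcov = opt.curve_fit(func, time, intensity, p0=(a0, b0, c0))
print(popt)   # should be close to (2e5, 8, 6.5e3)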

customized metric function for multi class in lightgbm

In my data, there are about 70 classes and I am using lightGBM to predict the correct class label.
In R, I would like to have a customised "metric" function with which I can evaluate whether the top 3 predictions by lightgbm cover the true label.
The link here is inspiring:
def lgb_f1_score(y_hat, data):
    y_true = data.get_label()
    y_hat = np.round(y_hat)  # scikit's f1 doesn't like probabilities
    return 'f1', f1_score(y_true, y_hat), True
However, I don't know the dimensionality of the arguments passed to this function; the data seem to be shuffled for some reason.
Scikit-learn implementation
from sklearn.metrics import f1_score

def lgb_f1_score(y_true, y_pred):
    preds = y_pred.reshape(len(np.unique(y_true)), -1)
    preds = preds.argmax(axis=0)
    print(preds.shape)
    print(y_true.shape)
    return 'f1', f1_score(y_true, preds, average='weighted'), True
After reading through the docs for lgb.train and lgb.cv, I had to make a separate function get_ith_pred and then call that repeatedly within lgb_f1_score.
The function's docstring explains how it works. I have used the same argument names as in the LightGBM docs. This works for any number of classes greater than two, but not for binary classification; in the binary case, preds is a 1D array containing only the probability of the positive class.
from sklearn.metrics import f1_score

def get_ith_pred(preds, i, num_data, num_class):
    """
    preds: 1D NumPy array
        A 1D numpy array containing predicted probabilities. Has shape
        (num_data * num_class,). So, for binary classification with
        100 rows of data in your training set, preds is shape (200,),
        i.e. (100 * 2,).
    i: int
        The row/sample in your training data you wish to calculate
        the prediction for.
    num_data: int
        The number of rows/samples in your training data.
    num_class: int
        The number of classes in your classification task.
        Must be greater than 2.

    LightGBM docs tell us that to get the probability of class 0 for
    the 5th row of the dataset we do preds[0 * num_data + 5].
    For class 1 prediction of 7th row, do preds[1 * num_data + 7].

    sklearn's f1_score(y_true, y_pred) expects y_pred to be of the form
    [0, 1, 1, 1, 1, 0...] and not probabilities.

    This function translates preds into the form sklearn's f1_score
    understands.
    """
    # Only works for multiclass classification
    assert num_class > 2

    preds_for_ith_row = [preds[class_label * num_data + i]
                         for class_label in range(num_class)]
    # The element with the highest probability is predicted
    return np.argmax(preds_for_ith_row)

def lgb_f1_score(preds, train_data):
    y_true = train_data.get_label()
    num_data = len(y_true)
    num_class = 70
    y_pred = []
    for i in range(num_data):
        ith_pred = get_ith_pred(preds, i, num_data, num_class)
        y_pred.append(ith_pred)
    return 'f1', f1_score(y_true, y_pred, average='weighted'), True
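For reference, a hedged sketch of how the metric would be plugged in (X_train, y_train, X_valid and y_valid are hypothetical placeholders): the custom metric goes to lgb.train via feval, and the built-in metric is disabled so only the custom one is reported.
import lightgbm as lgb

# X_train, y_train, X_valid, y_valid are assumed to exist already
train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)

params = {'objective': 'multiclass', 'num_class': 70, 'metric': 'None'}
booster = lgb.train(params,
                    train_set,
                    valid_sets=[valid_set],
                    feval=lgb_f1_score)   # custom metric defined above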

PyQt-Fit's NonParamRegression vs. R's loess

Are those two functions more or less equivalent? For example, if I have an R call like:
loess(formula = myformula, data = mydata, span = myspan, degree = 2, normalize = TRUE, family = "gaussian")
How can I obtain the same or a similar result with PyQt-Fit? Should I simply call the smooth.NonParamRegression function (http://pythonhosted.org/PyQt-Fit/NonParam_tut.html) with method=npr_methods.LocalPolynomialKernel(q=2)? What about other parameters, such as span and family?
UPDATE
I do realize the two implementations are likely not equivalent (https://www.statsdirect.com/help/nonparametric_methods/loess.htm). But any comments regarding "approximating" their outcomes are appreciated.
Statsmodels has a LOWESS implementation
(http://www.statsmodels.org/devel/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html).
Check out this post on the difference between LOESS and LOWESS: https://stats.stackexchange.com/questions/161069/difference-between-loess-and-lowess
A quick example of how to use statsmodels' lowess function in Python:
import numpy as np
import statsmodels.api as sm
lowess = sm.nonparametric.lowess
Generate two random 1-D arrays, x and y (lowess expects 1-D endog and exog):
x = np.random.rand(100)
y = np.random.rand(100)
Run the lowess function (frac is the bandwidth, i.e. the fraction of the data used for each local fit; frac and it are set arbitrarily here, and the remaining parameters are left at their defaults. For more, see the official documentation):
results = lowess(y, x, frac=0.05, it=3)
The results are stored in a two-dimensional array. The first column contains the sorted x (exog) values and the second column the associated estimated y (endog) values.
If, for instance, you'd like to construct the residuals, remember that the output is sorted by x, so the original y values have to be put in the same order:
res = y[np.argsort(x)] - results[:, 1]
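To get something roughly comparable to the R call in the question, span maps onto frac (both are the fraction of the data used for each local fit), but note that statsmodels' lowess only does locally linear (degree-1) fits, so degree = 2 cannot be matched exactly. A rough sketch on made-up data:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, x.size)

myspan = 0.75                                        # R's span, reused as frac
smoothed = sm.nonparametric.lowess(y, x, frac=myspan, it=3)

plt.scatter(x, y, s=10, label='data')
plt.plot(smoothed[:, 0], smoothed[:, 1], 'r-', label='lowess fit')
plt.legend()
plt.show()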

Optimized fitting coefficients for better fitting

I'm running a nonlinear least squares fit using the minpack.lm package.
However, for each group in the data I would like to optimize (minimize) the fitting parameters, similar to Python's minimize function.
The minimize() function is a wrapper around Minimizer for running an
optimization problem. It takes an objective function (the function
that calculates the array to be minimized), a Parameters object, and
several optional arguments.
The reason I need this is that I want to optimize the fitting function based on the obtained fitting parameters, in order to find global fitting parameters that can fit both groups in the data.
Here is my current approach for fitting in groups:
df <- data.frame(y=c(replicate(2,c(rnorm(10,0.18,0.01), rnorm(10,0.17,0.01))),
c(replicate(2,c(rnorm(10,0.27,0.01), rnorm(10,0.26,0.01))))),
DVD=c(replicate(4,c(rnorm(10,60,2),rnorm(10,80,2)))),
gr = rep(seq(1,2),each=40),logic=rep(c(1,0),each=40))
The fitting equation for these groups is:
fitt <- function(data) {
  fit <- nlsLM(y ~ pi*label2*(DVD/2 + U1)^2,
               data = data, start = c(label2 = 1, U1 = 4), trace = T,
               control = nls.lm.control(maxiter = 130))
}
library(minpack.lm)
library(plyr) # will help to fit in groups
fit <- dlply(df, c('gr'), .fun = fitt) #,"Die" only grouped by Waferr
> fit
$`1`
Nonlinear regression model
model: y ~ pi * label2 * (DVD/2 + U1)^2
data: data
label2 U1
2.005e-05 1.630e+03
$`2`
label2 U1
2.654 -35.104
I need to know whether there is any function that optimizes the sum of squares to get the best fit for both groups together.
You could say that I already have the best fitting parameters in terms of the residual sum of squares, but I know that a minimizer can do this; I just haven't found any similar example of how to do it in R.
P.S. I made up the numbers and fitting lines.
Not sure about R, but least squares with shared parameters is usually simple to implement.
A simple Python example looks like this:
import numpy as np
from random import random
from scipy import optimize
from matplotlib import pyplot as plt

# just for my normally distributed errors
def boxmuller(x0, sigma):
    u1 = random()
    u2 = random()
    ll = np.sqrt(-2 * np.log(u1))
    z0 = ll * np.cos(2 * np.pi * u2)
    z1 = ll * np.sin(2 * np.pi * u2)  # sin here, otherwise z0 and z1 are identical
    return sigma * z0 + x0, sigma * z1 + x0

# some non-linear function
def f0(x, a, b, c, s=0.05):
    return a * np.sqrt(x**2 + b**2) - np.log(c**2 + x) + boxmuller(0, s)[0]

# residual function for least squares takes two data sets,
# not necessarily of the same length;
# two of the three parameters are shared
def residuals(parameters, l1, l2, dataPoints):
    a, b, c1, c2 = parameters
    set1 = dataPoints[:l1]
    set2 = dataPoints[-l2:]
    distance1 = [(a * np.sqrt(x**2 + b**2) - np.log(c1**2 + x)) - y for x, y in set1]
    distance2 = [(a * np.sqrt(x**2 + b**2) - np.log(c2**2 + x)) - y for x, y in set2]
    res = distance1 + distance2
    return res

xList0 = np.linspace(0, 8, 50)
# some xy data
xList1 = np.linspace(0, 7, 25)
data1 = np.array([f0(x, 1.2, 2.3, .33) for x in xList1])
# more xy data using a different third parameter
xList2 = np.linspace(0.1, 7.5, 28)
data2 = np.array([f0(x, 1.2, 2.3, .77) for x in xList2])
alldata = np.array(list(zip(xList1, data1)) + list(zip(xList2, data2)))

# rough estimates
estimate = [1, 1, 1, .1]
# fitting; providing the second length is actually redundant
bestFitValues, ier = optimize.leastsq(residuals, estimate, args=(len(data1), len(data2), alldata))
print(bestFitValues)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(xList1, data1)
ax.scatter(xList2, data2)
ax.plot(xList0, [f0(x, bestFitValues[0], bestFitValues[1], bestFitValues[2], s=0) for x in xList0])
ax.plot(xList0, [f0(x, bestFitValues[0], bestFitValues[1], bestFitValues[3], s=0) for x in xList0])
plt.show()

# output
# [ 1.19841984  2.31591587  0.34936418  0.7998094 ]
If required, you can even implement the minimization yourself. If your parameter space is reasonably well behaved, i.e. has an approximately parabolic minimum, a simple Nelder-Mead method is quite OK.
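For instance, a minimal sketch of that remark, reusing residuals, estimate, data1, data2 and alldata from the script above: minimize the summed squared residuals directly with scipy's Nelder-Mead instead of calling leastsq.
def chi2(parameters, l1, l2, dataPoints):
    # scalar objective: sum of squared residuals
    return np.sum(np.square(residuals(parameters, l1, l2, dataPoints)))

nm = optimize.minimize(chi2, estimate,
                       args=(len(data1), len(data2), alldata),
                       method='Nelder-Mead')
print(nm.x)  # should land close to the leastsq result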
