I'm making an ANN from a tutorial. In the tutorial, sigmoid and dsigmoid are defined as follows:
sigmoid(x) = tanh(x)
dsigmoid(x) = 1-x*x
However, by definition, dsigmoid is the derivative of the sigmoid function, so it should be (http://www.derivative-calculator.net/#expr=tanh%28x%29):
dsigmoid(x) = sech(x)*sech(x)
When using 1-x*x, the training converges, but when I use the mathematically correct derivative, i.e. sech squared, the training process doesn't converge.
The question is why 1-x*x works (the model trains to the correct weights), while the mathematical derivative sech^2(x) doesn't (the model obtained after the maximum number of iterations holds wrong weights).
In the first set of formulas, the derivative is expressed as a function of the function value, that is
tanh'(x) = 1 - tanh(x)^2 = dsigmoid(sigmoid(x))
Since the existing code is presumably implemented that way, feeding the already-computed activation value into dsigmoid, you will get the wrong derivative if you replace 1-x*x with the "right" formula, which expects the raw input x.
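A small sketch in R to make the distinction concrete; the identity 1 - tanh(x)^2 = sech(x)^2 means dsigmoid gives the true derivative only when it is fed the activation output (the names here are illustrative, not from the tutorial):

sigmoid  <- function(x) tanh(x)
dsigmoid <- function(y) 1 - y * y  # expects y = sigmoid(x), not x itself

x <- 0.7
y <- sigmoid(x)
dsigmoid(y)    # 1 - tanh(x)^2
1 / cosh(x)^2  # sech(x)^2; the same value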
I'm currently approximating the baseline function required in a model using M-splines of order d = 4:
r(t) = γ1 M1(t|4,x) + ... + γp Mp(t|4,x)
To avoid local fluctuations, I would like to penalize the likelihood by penalizing the curvature of the baseline function:
l(θ) - ∫ r''(u)^2 du
My problem is that I don't know how to calculate the following part in R: ∫ r''(u)^2 du
I've found that ∫ r''(u)^2 du = γ^T Ω γ,
where Ω = ∫ M''(u) M''(u)^T du
and M''(u) is the vector of second derivatives of the M-spline basis functions.
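One way to approximate Ω numerically, sketched in R under the assumption that mSpline() from the splines2 package matches your basis (order 4 means degree 3); the knots and range below are placeholders for your own setup:

library(splines2)

knots  <- c(1, 2, 3)  # interior knots (placeholders)
bounds <- c(0, 5)     # boundary knots (placeholders)

# Evaluate the second derivatives M''_j(u) on a fine grid
u  <- seq(bounds[1], bounds[2], length.out = 1001)
M2 <- mSpline(u, knots = knots, degree = 3, intercept = TRUE,
              Boundary.knots = bounds, derivs = 2)

# Trapezoidal-rule approximation of Omega = int M''(u) M''(u)^T du
h <- u[2] - u[1]
w <- rep(h, length(u))
w[c(1, length(u))] <- h / 2
Omega <- crossprod(M2 * sqrt(w))  # equals t(M2) %*% diag(w) %*% M2

# The penalty for a coefficient vector gamma is then
# as.numeric(t(gamma) %*% Omega %*% gamma)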
I would like to extract the alpha Lagrange multipliers from the svm() function in the e1071 R package; however, I am not sure whether svm$coefs is producing these.
The alphas are defined as in Equation 9.23, p. 352, of An Introduction to Statistical Learning.
The documentation for svm() says:
coefs: the corresponding coefficients times the training labels.
Could someone please explain this?
$coefs produces alpha_i * y_i. Since the alpha_i are by definition non-negative, you can simply take the absolute value of coefs to get the Lagrange multipliers, and recover y_i by taking the sign (the labels are only +1 or -1). This is a simplification often used in SVM packages: the multipliers themselves are never actually used, only their product with the label, so they are stored as a single number for simplicity and efficiency. In case of need (like this one), you can always reconstruct them.
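A minimal sketch of the reconstruction in R, assuming a two-class model fitted with e1071::svm() (the data set and model settings here are illustrative only):

library(e1071)

data(iris)
d <- droplevels(subset(iris, Species != "setosa"))  # keep two classes
fit <- svm(Species ~ ., data = d, kernel = "linear")

alpha_y <- fit$coefs[, 1]  # alpha_i * y_i, one entry per support vector
alpha   <- abs(alpha_y)    # the Lagrange multipliers alpha_i >= 0
y_sv    <- sign(alpha_y)   # the +1/-1 labels of the support vectors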
I'm trying to analyze the reliability of repairable systems using growth models.
I have already fitted a Crow-AMSAA model, but I wonder if there is any package or code for fitting a Generalized Renewal Process (Kijima Model I or Model II)
in R and finding its parameters beta, lambda (or alpha), and q
(or some other model for the mean cumulative function, MCF).
Equation 15 of this article gives an expression for the log-likelihood.
I tried to create the function like this:
likelihood.G1 <- function(theta, x) {
  # x is a vector of failure times; theta is the parameter vector
  a <- theta[1]  # alpha
  b <- theta[2]  # beta
  q <- theta[3]  # q
  logl2 <- log(b / a)  # first part of the equation
  for (i in 1:length(x)) {
    logl2 <- logl2 + (b - 1) * log(x[i] / (a * (1 + q)^(i - 1))) -
      (x[i] / (a * (1 + q)^(i - 1)))^b
  }
  return(-logl2)  # negative of the log-likelihood
}
And then use some routine to minimize the negative log-likelihood:
theta <- c(0.5, 1.2, 0.8)  # starting parameters (alpha, beta, q)
nlm(likelihood.G1, theta, x = Data)
Or alternatively:
optim(theta, likelihood.G1, method = "BFGS", x = Data)
However, there seems to be some mistake, since the parameters it returns make no sense.
Any ideas of what I'm doing wrong?
Thanks
Looking at equation (16) of the paper you reference and comparing it with your code, it looks like you are missing one term in the for loop. Each data point should contribute three terms to the log-likelihood, but in your code (inside the loop) there are only two (not counting the updating term).
Specifically, your code does not include the 4th term of equation (16), nor the 7th, and so on. This is at least one error in the code. An extra consideration is that α and β are constrained to be greater than zero; I am not sure whether the solver you are using takes this constraint into account.
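If you want to impose the positivity constraints, one option is box constraints with optim's L-BFGS-B method. A sketch, once the missing loop term is added; the bounds on q are my assumption, since Kijima's q is usually taken in [0, 1]:

theta0 <- c(0.5, 1.2, 0.8)
fit <- optim(theta0, likelihood.G1, method = "L-BFGS-B",
             lower = c(1e-6, 1e-6, 0),  # alpha > 0, beta > 0
             upper = c(Inf, Inf, 1),    # assumed range for q
             x = Data)
fit$par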
I would like to evaluate the inverse Student's t-distribution function for small values, e.g., 1e-18, in Matlab. The degrees of freedom is 2.
Unfortunately, Matlab returns NaN:
tinv(1e-18,2)
NaN
However, if I use R's built-in function:
qt(1e-18,2)
-707106781
The result is sensible. Why can Matlab not evaluate the function for this small value? The Matlab and R results are quite similar down to about 1e-15, but for smaller values the difference is considerable:
tinv(1e-16,2)/qt(1e-16,2) = 1.05
Does anyone know what the difference is between the algorithms implemented in Matlab and R? And if R gives correct results, how could I effectively calculate the inverse t-distribution in Matlab for smaller values?
It appears that R's qt may use a completely different algorithm than Matlab's tinv. I think that you and others should report this deficiency to The MathWorks by filing a service request. By the way, in R2014b and R2015a, -Inf is returned instead of NaN for small values (about eps/8 and less) of the first argument, p. This is more sensible, but I think they should do better.
In the interim, there are several workarounds.
Special Cases
First, in the case of the Student's t-distribution, there are several simple analytic solutions to the inverse CDF (quantile function) for certain integer values of the parameter ν. For your example of ν = 2:
% for v = 2
p = 1e-18;
x = (2*p-1)./sqrt(2*p.*(1-p))
which returns -7.071067811865475e+08. At a minimum, Matlab's tinv should include these special cases (currently it does so only for ν = 1); that would probably improve the accuracy and speed for these particular parameter values as well.
Numeric Inverse
The tinv function is based on the betaincinv function, and it appears to be this function that is responsible for the loss of precision for small values of the first argument, p. However, as suggested by the OP, one can use the CDF function, tcdf, together with a root-finding method to evaluate the inverse CDF numerically. The tcdf function is based on betainc, which doesn't appear to be as sensitive. Using fzero:
p = 1e-18;
v = 2;
x = fzero(@(x) tcdf(x,v) - p, 0)
This returns -7.071067811865468e+08. Note that this method is not very robust for values of p close to 1.
Symbolic Solutions
For more general cases, you can take advantage of symbolic math and variable-precision arithmetic. You can use identities in terms of Gaussian hypergeometric functions, 2F1, as given here for the CDF. Thus, using solve and hypergeom:
% Supposedly valid for x^2 < v, but appears to work for this example
p = sym('1e-18');
v = sym(2);
syms x
F = 0.5+x*gamma((v+1)/2)*hypergeom([0.5 (v+1)/2],1.5,-x^2/v)/(sqrt(sym('pi')*v)*gamma(v/2));
sol_x = solve(p==F,x);
vpa(sol_x)
As noted above, tinv is based on betaincinv. There is no equivalent function, or even an incomplete Beta function, in the Symbolic Math Toolbox or MuPAD, but a similar 2F1 relation for the incomplete Beta function can be used:
p = sym('1e-18');
v = sym(2);
syms x
a = v/2;
F = 1-x^a*hypergeom([a 0.5],a+1,x)/(a*beta(a,0.5));
sol_x = solve(2*abs(p-0.5)==F,x);
sol_x = sign(p-0.5).*sqrt(v.*(1-sol_x)./sol_x);
vpa(sol_x)
Both symbolic schemes return the same result, -707106781.186547523340184, using the default value of digits.
I've not fully validated the two symbolic methods above, so I can't vouch for their correctness in all cases. The code also needs to be vectorized and will be slower than a fully numerical solution.
I have a probability density function (PDF):
(1-cos(x-theta))/(2*pi)
theta is the unknown parameter. How do I write a log-likelihood function for this PDF? I am confused: the x will come from my data, but how do I handle the theta in the equation?
Thanks
You need to use an optimisation or maximisation function in R to compute the value of theta that maximises the log-likelihood. See help(nlm) for starters.
The function you wrote is the log-likelihood of theta given the known x:
ll(theta|x) = log((1-cos(x-theta))/(2*pi))
If you have many iid observations x1, x2, ..., xn from this distribution, just take the sum of the above:
ll(theta|x1,...,xn) = Sum_i log((1-cos(xi-theta))/(2*pi))
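In R, the maximisation could be sketched like this (x stands for your data vector):

negll <- function(theta, x) {
  -sum(log((1 - cos(x - theta)) / (2 * pi)))
}

# Brent's method suits a single bounded parameter; theta is only
# identified up to a period of 2*pi, so search one period
fit <- optim(0, negll, x = x, method = "Brent", lower = -pi, upper = pi)
fit$par  # MLE of theta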
If f(x_i) = (1-cos(x_i-theta))/(2*pi) is the density of observation i, then the likelihood is L(theta) = product(f(x_i)) and logL(theta) = sum(log(f(x_i))), of course assuming that the x_i are independent.
I think the log-likelihood only works for normal distributions. The special property of the log function is that it cancels out the exp function, but there is no exp function here.
By the way, your PDF is periodic, and theta just shifts the phase of that function. Where does this PDF come from? What is it supposed to describe?