Neural Network Initialization - Nguyen-Widrow Implementation?

I've had a go at implementing the Nguyen-Widrow algorithm (below) and it appears to function correctly, but I have some follow-on questions:
Does this look like a correct implementation?
Does Nguyen-Widrow initialization apply to any network topology / size? (i.e. a 5-layer autoencoder)
Is Nguyen-Widrow initialization valid for any input range? (0/1, -1/+1, etc.)
Is Nguyen-Widrow initialization valid for any activation function? (i.e. logistic, tanh, linear)
The code below assumes that the network has already been randomized to -1/+1:
' Calculate the number of hidden neurons
Dim HiddenNeuronsCount As Integer = Me.TotalNeuronsCount - (Me.InputsCount - Me.OutputsCount)
' Calculate the Beta value for all hidden layers
Dim Beta As Double = (0.7 * Math.Pow(HiddenNeuronsCount, (1.0 / Me.InputsCount)))
' Loop through each layer in the neural network, skipping the input layer
For i As Integer = 1 To Layers.GetUpperBound(0)
    ' Loop through each neuron in the layer
    For j As Integer = 0 To Layers(i).Neurons.GetUpperBound(0)
        Dim InputsNorm As Double = 0
        ' Loop through each weight in the neuron's inputs, adding the squared weight to InputsNorm
        For k As Integer = 0 To Layers(i).Neurons(j).ConnectionWeights.GetUpperBound(0)
            InputsNorm += Layers(i).Neurons(j).ConnectionWeights(k) * Layers(i).Neurons(j).ConnectionWeights(k)
        Next
        ' Add the squared bias value to InputsNorm
        InputsNorm += Layers(i).Neurons(j).Bias * Layers(i).Neurons(j).Bias
        ' Finalize the Euclidean norm calculation
        InputsNorm = Math.Sqrt(InputsNorm)
        ' Loop through each weight in the neuron's inputs, scaling the weight by Beta and the Euclidean norm
        For k As Integer = 0 To Layers(i).Neurons(j).ConnectionWeights.GetUpperBound(0)
            Layers(i).Neurons(j).ConnectionWeights(k) = (Beta * Layers(i).Neurons(j).ConnectionWeights(k)) / InputsNorm
        Next
        ' Scale the bias by Beta and the Euclidean norm
        Layers(i).Neurons(j).Bias = (Beta * Layers(i).Neurons(j).Bias) / InputsNorm
    Next
Next

Nguyen & Widrow assume in their paper that the inputs lie between -1 and +1.
Nguyen-Widrow initialization is valid for any activation function whose active (approximately linear) region is of finite extent.
Again, their paper only discusses a 2-layer NN; I'm not sure about a 5-layer one.
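For reference, here is a minimal sketch of the same scaling step in Julia, assuming a single hidden layer whose weight matrix W (hidden x inputs) and bias vector b have already been randomized uniformly in [-1, +1]:
function nguyen_widrow!(W::AbstractMatrix, b::AbstractVector)
    hidden, inputs = size(W)
    beta = 0.7 * hidden^(1 / inputs)          # same Beta as in the VB code above
    for j in 1:hidden
        # Euclidean norm over the neuron's incoming weights plus its bias
        nrm = sqrt(sum(abs2, view(W, j, :)) + b[j]^2)
        W[j, :] .*= beta / nrm
        b[j] *= beta / nrm
    end
    return W, b
end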


Is there any way to bound the region searched by NLsolve in Julia?

I'm trying to find one of the roots of a nonlinear (roughly quartic) equation.
The equation always has four roots: a pair of them close to zero, a large positive root, and a large negative root. I'd like to identify either of the near-zero roots, but nlsolve, even with an initial guess very close to these roots, seems to always converge on the large positive or negative root.
A plot of the function essentially looks like a constant negative value with a (very narrow) even-order pole near zero, gradually rising to cross zero at the large positive and negative roots.
Is there any way I can limit the region searched by nlsolve, or do something to make it more sensitive to the presence of this pole in my function?
EDIT:
Here's some example code reproducing the problem:
using NLsolve
function f!(F, x)
    x = x[1]
    F[1] = -15000 + x^4 / (x + 1e-5)^2
end
# nlsolve will find the root at -122
nlsolve(f!, [0.0])
As output, I get:
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [0.0]
* Zero: [-122.47447713915808]
* Inf-norm of residuals: 0.000000
* Iterations: 15
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 16
* Jacobian Calls (df/dx): 6
We can find the exact roots in this case by transforming the objective function into a polynomial:
using PolynomialRoots
roots([-1.5e-6,-0.3,-15000,0,1])
produces
4-element Array{Complex{Float64},1}:
122.47449713915809 - 0.0im
-122.47447713915808 + 0.0im
-1.0000000813048448e-5 + 0.0im
-9.999999186951818e-6 + 0.0im
I would love a way to identify the pair of roots around the pole at x = -1e-5 without knowing the exact form of the objective function.
EDIT2:
Trying out Roots.jl :
using Roots
f(x) = -15000 + x^4 / (x+1e-5)^2
find_zero(f,0.0) # finds +122... root
find_zero(f,(-1e-4,0.0)) # error, not a bracketing interval
find_zeros(f,-1e-4,0.0) # finds 0-element Array{Float64,1}
find_zeros(f,-1e-4,0.0,no_pts=6) # finds root slightly less than -1e-5
find_zeros(f,-1e-4,0.0,no_pts=10) # finds 0-element Array{Float64,1}, sensitive to value of no_pts
I can get find_zeros to work, but it's very sensitive to the no_pts argument and to the exact values of the endpoints I pick. Doing a loop over no_pts and taking the first non-empty result might work, but something that converges more deterministically would be preferable.
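A minimal sketch of that loop-over-no_pts idea, using a hypothetical helper function and an arbitrary range of grid sizes, might look like:
using Roots
# Hypothetical helper: try increasing no_pts until find_zeros returns something.
function first_nonempty_zeros(f, a, b; npts_range = 4:100)
    for n in npts_range
        zs = find_zeros(f, a, b, no_pts = n)
        isempty(zs) || return zs    # return the first non-empty result
    end
    return Float64[]                # nothing found over the whole range
end
first_nonempty_zeros(x -> -15000 + x^4 / (x + 1e-5)^2, -1e-4, 0.0)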
EDIT3:
Here's the result of applying the tanh transformation suggested by Bogumił:
using NLsolve
function f_tanh!(F, x)
    x = x[1]
    x = -1e-4 * (tanh(x) + 1) / 2
    F[1] = -15000 + x^4 / (x + 1e-5)^2
end
nlsolve(f_tanh!, [100.0]) # doesn't converge
nlsolve(f_tanh!, [1e5]) # doesn't converge
using Roots
function f_tanh(x)
    x = -1e-4 * (tanh(x) + 1) / 2
    return -15000 + x^4 / (x + 1e-5)^2
end
find_zeros(f_tanh, -1e10, 1e10) # 0-element Array
find_zeros(f_tanh, -1e3, 1e3, no_pts=100) # 0-element Array
find_zero(f_tanh, 0.0) # convergence failed
find_zero(f_tanh, 0.0, max_evals=1_000_000, maxfnevals=1_000_000) # convergence failed
EDIT4: This combination of techniques identifies at least one root around 95% of the time, which is good enough for me.
using Peaks
using Primes
using Roots
# randomize the pole location
a = 1e-4 * rand()
f(x) = -15000 + x^4 / (x + a)^2
# do an initial sample to find the pole location
l = 1000
minval = -1e-4
maxval = 0
m = []
sample_r = []
while l < 1e6
    sample_r = range(minval, maxval, length=l)
    rough_sample = f.(sample_r)
    m = maxima(rough_sample)
    if length(m) > 0
        break
    else
        l *= 10
    end
end
guess = sample_r[m[1]]
# functions to compress the range around the estimated pole
cube(x) = (x - guess)^3 + guess
uncube(x) = cbrt(x - guess) + guess
f_cube(x) = f(cube(x))
shift = l ÷ 1000
low = sample_r[m[1] - shift]
high = sample_r[m[1] + shift]
# search only over prime no_pts, so no samplings divide into each other
# possibly not necessary?
for i in primes(500)
    z = find_zeros(f_cube, uncube(low), uncube(high), no_pts=i)
    if length(z) > 0
        println(i)
        println(cube.(z))
        break
    end
end
More specific comments could be given if you provided more information about your problem.
However, in general:
It seems that your problem is univariate, in which case you can use Roots.jl, where find_zero and find_zeros give the interface you ask for (i.e. they allow you to specify the search region).
If a problem is multivariate, you have several options for how to handle this in the problem specification for nlsolve (which by default does not allow you to specify a bounding box, AFAICT). The simplest is to use a variable transformation: e.g. you can apply an a_i * tanh(x_i) + b_i transformation, selecting a_i and b_i for each variable so that it is bounded to the desired interval.
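A minimal sketch of wiring such a transformation into nlsolve, assuming a target interval (lo, hi) = (-1e-4, 0.0); note that, as EDIT3 above shows, the transformation alone does not guarantee convergence on this particular function:
using NLsolve
lo, hi = -1e-4, 0.0
a, b = (hi - lo) / 2, (hi + lo) / 2        # a * tanh(y) + b maps y in (-Inf, Inf) into (lo, hi)
function g!(F, y)
    x = a * tanh(y[1]) + b                 # bounded variable
    F[1] = -15000 + x^4 / (x + 1e-5)^2
end
res = nlsolve(g!, [0.0])
a * tanh(res.zero[1]) + b                  # map the solution back to x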
The first problem you have in your definition is that, the way you define f, it never crosses 0 near the two roots you are looking for, because Float64 does not have enough precision when you write 1e-5. You need to use higher-precision computations:
julia> using Roots
julia> f(x) = -15000 + x^4 / (x+1/big(10.0^5))^2
f (generic function with 1 method)
julia> find_zeros(f,big(-2*10^-5), big(-8*10^-6), no_pts=100)
2-element Array{BigFloat,1}:
-1.000000081649671426108658262468117284940444265467160592853348997523986352593615e-05
-9.999999183503552405580084054429938261707450678661727461293670518591720605751116e-06
and set no_pts to be sufficiently large to find intervals bracketing the roots.

Wave prediction after an FFT with a certain phase and frequency

I am using a sliding window to extract information from my EEG data with an FFT. Now I want to predict the signal from my current window into the next one. So I extract the phase from a 0.25-second time window to predict the next 0.25-second window.
I am new to signal processing/prediction, so my knowledge here is a little rusty.
I am not able to generate a sine wave with my extracted phase and frequency. I am just not finding a solution. I might just need a push in the right direction, who knows.
Is there a function in R to help me generate a suitable sine wave?
So I have the maximum frequency with the phase extracted, and I need to generate a wave from this information.
Here is pseudo-code to synthesize a sine curve of a chosen frequency ... currently it assumes an initial seed phase shift of zero, so just alter the theta value if you need a different initial phase shift.
func pop_audio_buffer(number_of_samples float64, given_freq float64,
    samples_per_second float64) ([]float64, error) {
    // output sinusoidal curve is assured to both start and stop at the zero cross over threshold,
    // independent of supplied input parms which control samples per cycle and buffer size.
    // This avoids that "pop" which otherwise happens when rendering audio curves
    // which begins at say 0.5 of a possible range -1 to 0 to +1
    int_number_of_samples := int(number_of_samples)
    if int_number_of_samples == 0 {
        panic("ERROR - seeing 0 number_of_samples in pop_audio_buffer ... float number_of_samples " +
            FloatToString(number_of_samples) + " is your desired_num_seconds too small ? " +
            " or maybe too low value of sample rate")
    }
    source_buffer := make([]float64, int_number_of_samples)
    incr_theta := (2.0 * math.Pi * given_freq) / samples_per_second
    theta := 0.0
    for curr_sample := 0; curr_sample < int_number_of_samples; curr_sample++ {
        source_buffer[curr_sample] = math.Sin(theta)
        theta += incr_theta
    }
    return source_buffer, nil
} // pop_audio_buffer
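To address the original question more directly (generating a wave from an extracted frequency and phase), a minimal sketch of the idea, written here in Julia with hypothetical values for every parameter, would be:
fs    = 250.0        # hypothetical EEG sampling rate (Hz)
freq  = 10.0         # dominant frequency taken from the FFT (Hz)
phase = 0.7          # phase extracted from that FFT bin (radians)
amp   = 1.0          # amplitude of that bin
dur   = 0.25         # length of the window to predict (seconds)
t = range(0, dur; step = 1/fs)
wave = amp .* cos.(2π .* freq .* t .+ phase)   # use cos if the phase came from angle() of the FFT bin
The same one-liner carries over directly to R, with seq() building the time vector and cos() applied elementwise.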

Solve the Heat Equation with non-zero Dirichlet BCs with Implicit Euler and Conjugate Gradient Linear Solvers?

Many users have asked how to solve the Heat Equation, u_t = u_xx, with non-zero Dirichlet BCs and with conjugate gradients for the internal linear solver. This is a common simplified PDE problem before moving to more difficult versions of parabolic PDEs. How is this done in DifferentialEquations.jl?
Let's solve this problem in steps. First, let's build the linear operator for the discretized Heat Equation with Dirichlet BCs. A discussion of the discretization can be found on this Wiki page which shows that the central difference method gives a 2nd order discretization of the second derivative by (u[i-1] - 2u[i] + u[i+1])/dx^2. This is the same as multiplying by the Tridiagonal matrix of [1 -2 1]*(1/dx^2), so let's start by building this matrix:
using LinearAlgebra, OrdinaryDiffEq
x = collect(-π : 2π/511 : π)
## Dirichlet 0 BCs
u0 = @. -(x)^2 + π^2
n = length(x)
A = 1/(2π/511)^2 * Tridiagonal(ones(n-1),-2ones(n),ones(n-1))
Notice that we have implicitly simplified the end, since (u[0] - 2u[1] + u[2])/dx^2 = (- 2u[1] + u[2])/dx^2 when the left BC is zero, so the term is dropped from the matmul. We then use this discretization of the derivative to solve the Heat Equation:
function f(du, u, A, t)
    mul!(du, A, u)
end
prob = ODEProblem(f, u0, (0.0, 10.0), A)
sol = solve(prob, ImplicitEuler())
using Plots
plot(sol[1])
plot!(sol[end])
Now we make the BCs non-zero. Notice that we just have to add back the u[0]/dx^2 that we previously dropped off, so we have:
## Dirichlet non-zero BCs
## Note that the operator is no longer linear
## To handle affine BCs, we add the dropped term
u0 = @. (x - 0.5)^2 + 1/12
n = length(x)
A = 1/(2π/511)^2 * Tridiagonal(ones(n-1),-2ones(n),ones(n-1))
function f(du, u, A, t)
    mul!(du, A, u)
    # Now do the affine part of the BCs
    du[1] += 1/(2π/511)^2 * u0[1]
    du[end] += 1/(2π/511)^2 * u0[end]
end
prob = ODEProblem(f, u0, (0.0, 10.0), A)
sol = solve(prob, ImplicitEuler())
plot(sol[1])
plot!(sol[end])
Now let's swap out the linear solver. The documentation suggests that you should use LinSolveCG here, which looks like:
sol = solve(prob,ImplicitEuler(linsolve=LinSolveCG()))
There are some advantages to this, since it has norm handling that helps conditioning. However, the documentation also states that you can build your own linear solver routine. This is done by giving a Val{:init} dispatch that returns the type to use as the linear solver, so we do:
## Create a linear solver for CG
using IterativeSolvers
function linsolve!(::Type{Val{:init}}, f, u0; kwargs...)
    function _linsolve!(x, A, b, update_matrix=false; kwargs...)
        cg!(x, A, b)
    end
end
sol = solve(prob,ImplicitEuler(linsolve=linsolve!))
plot(sol[1])
plot!(sol[end])
And there we are, non-zero Dirichlet Heat Equation with a Krylov method (conjugate gradients) for the linear solver, making it a Newton-Krylov method.

Basic RNN cost function not converging to zero

I'm writing a basic RNN.
My function for generating the hidden state is:
tanh(xU + sW)
Where x is the input vector and s is the previous hidden state vector. U and W are both the parameters that are being adjusted during backprop.
For modifying U and W I use:
U += 1/(cosh^2(xU+sW)) * x * expectedValue * stepSize
W += 1/(cosh^2(xU+sW)) * s * expectedValue * stepSize
Where stepSize is about 0.01, though I've tested lots of smaller values. The expectedValue is the same for both and it is just the value of the function I am trying to learn for testing.
For the cost function to determine how close my estimations are, I am using the mean squared error function:
1/n * (expectedValue^2 - predictedValue^2)
My cost function is not converging to zero over 10,000,000 iterations. Am I screwing up some of the math somewhere?
It could be the vanishing gradient problem. Try the ReLU activation function instead of tanh().
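A minimal sketch of that swap, written in Julia with hypothetical dimensions and weights, replacing the tanh hidden-state update with ReLU:
relu(z) = max.(z, 0.0)               # elementwise ReLU
U = 0.1 .* randn(4, 4)               # input-to-hidden weights (hypothetical size)
W = 0.1 .* randn(4, 4)               # hidden-to-hidden weights (hypothetical size)
x = randn(4)                         # current input vector
s = zeros(4)                         # previous hidden state
s_new = relu(U * x .+ W * s)         # instead of tanh.(U * x .+ W * s)
Note this sketch uses the column-vector convention (U * x) rather than the row-vector convention (xU + sW) written in the question; the idea is the same.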

Periodic boundary conditions - finite differences

Hi, I have code below that solves nonlinear coupled PDEs. However, I need to implement periodic boundary conditions, and they are troubling me: what should I add to my code to enforce them? Updated based on the modular arithmetic suggestions below.
Note that t >= 0 and x is in the interval [0,1]. Here are the coupled equations; below that I provide my code,
where a, b > 0.
Those are the initial conditions, but now I need to impose periodic boundary conditions. These can be written mathematically as u(0,t) = u(1,t) and du(0,t)/dx = du(1,t)/dx, and the same holds for f(x,t). The du/dx terms in the boundary conditions are really meant to be partial derivatives.
My code is below
program coupledPDE
    integer, parameter :: n = 10, A = 20
    real, parameter :: h = 0.1, k = 0.05
    real, dimension(0:n-1) :: u,v,w,f,g,d
    integer :: i, m, ip, il
    real :: t, R, x, c1, c2, c3, c4, aa, b
    R = (k/h)**2.
    aa = 2.0
    b = 1.0
    c1 = (2.+aa*k**2.-2.0*R)/(1+k/2.)
    c2 = R/(1.+k/2.)
    c3 = (1.0-k/2.)/(1.0+k/2.)
    c4 = b*k**2./(1+k/2.)
    do i = 0,n-1 !loop over all space points (point n is identified with point 0)
        x = real(i)*h
        w(i) = z(x) !u(x,0)
        d(i) = z(x) !f(x,0)
    end do
    do i = 0,n-1
        ip = mod(i+1,n)
        il = modulo(i-1,n)
        v(i) = (c1/(1.+c3))*w(i) + (c2/(1.+c3))*(w(ip)+w(il)) - (c4/(1.+c3))*w(i)*((w(i))**2.+(d(i))**2.) !\partial_t u(x,0)=0
        g(i) = (c1/(1.+c3))*d(i) + (c2/(1.+c3))*(d(ip)+d(il)) - (c4/(1.+c3))*d(i)*((w(i))**2.+(d(i))**2.) !\partial_t f(x,0)=0
    end do
    do m = 1,A
        do i = 0,n-1
            ip = mod(i+1,n)
            il = modulo(i-1,n)
            u(i) = c1*v(i) + c2*(v(ip)+v(il)) - c3*w(i) - c4*v(i)*((v(i))**2.+(g(i))**2.)
            f(i) = c1*g(i) + c2*(g(ip)+g(il)) - c3*d(i) - c4*g(i)*((v(i))**2.+(g(i))**2.)
        end do
        print*, "the values of u(x,t+k) for all m=",m
        print "(//3x,i5,//(3(3x,e22.14)))",m,u
        do i = 0,n-1
            w(i) = v(i)
            v(i) = u(i)
            d(i) = g(i)
            t = real(m)*k
            x = real(i)*h
        end do
    end do
end program coupledPDE

function z(x)
    real, intent(in) :: x
    real :: pi
    pi = 4.0*atan(1.0)
    z = sin(pi*x)
end function z
Thanks for reading, if I should reformat my question in a more proper way please let me know.
One option for handling boundary conditions in a PDE discretization is to use ghost (halo) cells (gridpoints). It may not be the cleverest choice for periodic BCs, but it can be used for all other boundary condition types as well.
So you declare your arrays as
real, dimension(-1:n) :: u,v,w,f,g,d
but you solve your PDE only at points 0..n-1 (point n is identical to point 0). You could also go from 1..n and declare the arrays from 0..n+1.
Then you set
u(-1) = u(n-1)
and
u(n) = u(0)
and the same for all other arrays.
At each time step you set this again for u and f, or for all other fields that are modified during the solution:
do m = 1,A
    u(-1) = u(n-1)
    u(n)  = u(0)
    f(-1) = f(n-1)
    f(n)  = f(0)
    do i = 0,n-1 !Discretization equation for all times after the 1st step
        u(i) = ...
        f(i) = ...
    end do
end do
All of the above assumes explicit temporal discretization and spatial discretization with finite differences, and it assumes that x(0) = 0 and x(n) = 1 are your boundary points.
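For illustration only, here is a minimal sketch of the same ghost-cell bookkeeping on a periodic 1D grid, written in Julia rather than Fortran and using a plain explicit heat-equation step as a stand-in for the real update:
n, h, dt = 10, 0.1, 0.001
u = zeros(n + 2)                          # indices 1 and n+2 are ghost cells; 2..n+1 are the physical points
u[2:n+1] .= sin.(2π .* (0:n-1) ./ n)      # initial condition on the physical points x = 0, h, ..., 1-h
for step in 1:20
    u[1]   = u[n+1]                       # left ghost  = last physical point (periodic wrap)
    u[n+2] = u[2]                         # right ghost = first physical point
    unew = copy(u)
    for i in 2:n+1                        # update physical points only
        unew[i] = u[i] + dt/h^2 * (u[i-1] - 2u[i] + u[i+1])
    end
    u .= unew                             # copy the new time level back in place
end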
