Gradient descent implementation is not working in Julia

I am trying to implement the gradient descent algorithm from scratch to find the slope and intercept of my linear fit line.
Using a package to calculate the slope and intercept I get slope = 0.04 and intercept = 7.2, but when I run my gradient descent algorithm on the same problem, both values come out as (-Inf, -Inf).
Here is my code:
x= [1,2,3,4,5,6,7,8,9,10,11,12,13,141,5,16,17,18,19,20]
y=[2,3,4,5,6,7,8,9,10,11,12,13,141,5,16,17,18,19,20,21]
function GradientDescent()
    m = 0
    c = 0
    for i = 1:10000
        for k = 1:length(x)
            Yp = m*x[k] + c
            E = y[k] - Yp    # error in predicted value
            dm = 2*E*(-x[k]) # partial derivative of cost function w.r.t. slope (m)
            dc = 2*E*(-1)    # partial derivative of cost function w.r.t. intercept (c)
            m = m + (dm * 0.001)
            c = c + (dc * 0.001)
        end
    end
    return m, c
end
Values = GradientDescent() # after running, Values == (-Inf, -Inf)

I have not done the math, but wrote tests instead. It seems you have a sign error when updating m and c: gradient descent steps against the gradient, so the updates must subtract dm and dc rather than add them.
Also, writing the tests really helps, and Julia makes it simple :)
function GradientDescent(x, y)
    m = 0.0
    c = 0.0
    for i = 1:10000
        for k = 1:length(x)
            Yp = m*x[k] + c
            E = y[k] - Yp
            dm = 2*E*(-x[k])
            dc = 2*E*(-1)
            m = m - (dm * 0.001)
            c = c - (dc * 0.001)
        end
    end
    return m, c
end
using Test  # `using Base.Test` on Julia 0.6 and earlier

@testset "gradient descent" begin
    @testset "slope $slope" for slope in [0, 1, 2]
        @testset "intercept $intercept" for intercept in [0, 1, 2]
            x = 1:20
            y = broadcast(x -> slope * x + intercept, x)
            computed_slope, computed_intercept = GradientDescent(x, y)
            @test slope ≈ computed_slope atol=1e-8
            @test intercept ≈ computed_intercept atol=1e-8
        end
    end
end

I can't get your exact numbers, but this is close. Perhaps it helps?
# 141 ?
datax = [1,2,3,4,5,6,7,8,9,10,11,12,13,141,5,16,17,18,19,20]
datay = [2,3,4,5,6,7,8,9,10,11,12,13,141,5,16,17,18,19,20,21]

function gradientdescent()
    m = 0
    b = 0
    learning_rate = 0.00001
    for n in 1:10000
        for i in 1:length(datay)
            x = datax[i]
            y = datay[i]
            guess = m * x + b
            error = y - guess
            dm = 2error * x
            dc = 2error
            m += dm * learning_rate
            b += dc * learning_rate
        end
    end
    return m, b
end

gradientdescent()
(-0.04, 17.35)
It seems that adjusting the learning rate is critical...
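To see just how critical, here is a quick sweep (a sketch of my own, using clean straight-line data and the corrected update rule): the very same loop converges or blows up depending only on the step size.

# Sketch: same SGD loop, three learning rates; only lr changes.
function gd(x, y; lr = 0.001, iters = 10_000)
    m, c = 0.0, 0.0
    for _ in 1:iters, k in eachindex(x)
        E = y[k] - (m * x[k] + c)  # residual at sample k
        m += 2E * x[k] * lr        # descent step: m -= lr * dL/dm, with dL/dm = -2E*x
        c += 2E * lr
    end
    return m, c
end

xs = collect(1.0:20.0)
ys = 2 .* xs .+ 1                  # true slope 2, intercept 1
for lr in (1e-5, 1e-3, 1e-2)
    println("lr = $lr  =>  ", gd(xs, ys; lr = lr))
end

With too large a step the per-sample correction overshoots and the iterates diverge to ±Inf.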

Related

Julia and Zygote: Adjoint Error When Doing QR Factorization

I have written a surrogate function called GEKPLS and I am trying to make the code work with an optimizer to find the optimal theta parameter values.
As a first step, I'm trying to make it work with Zygote and have the following code:
function min_rlfv(theta)
    g = GEKPLS(X, y, grads, n_comp, delta_x, xlimits, extra_points, theta)
    return -g.reduced_likelihood_function_value
end

Zygote.gradient(min_rlfv, [0.01, 0.1])
Running the above code results in the following error message:
ERROR: Need an adjoint for constructor LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}. Gradient is of type LinearAlgebra.Transpose{Float64, Matrix{Float64}}
The stacktrace leads to the following function; the line highlighted in my code editor is Q, G = qr(Ft):
function _reduced_likelihood_function(theta, kernel_type, d, nt, ij, y_norma, noise = 0.0)
    reduced_likelihood_function_value = -Inf
    nugget = 1000000.0 * eps() # a jitter for numerical stability
    if kernel_type == "squar_exp"
        r = squar_exp(theta, d)
    end
    R = (I + zeros(nt, nt)) .* (1.0 + nugget + noise)
    for k in 1:size(ij)[1]
        R[ij[k, 1], ij[k, 2]] = r[k]
        R[ij[k, 2], ij[k, 1]] = r[k]
    end
    C = cholesky(R).L
    F = ones(nt, 1)
    Ft = C \ F
    Q, G = qr(Ft)
    Q = Array(Q)
    Yt = C \ y_norma
    beta = G \ [(transpose(Q) ⋅ Yt)]
    rho = Yt .- (Ft .* beta)
    gamma = transpose(C) \ rho
    sigma2 = sum((rho) .^ 2, dims = 1) / nt
    detR = prod(diag(C) .^ (2.0 / nt))
    reduced_likelihood_function_value = -nt * log10(sum(sigma2)) - nt * log10(detR)
    return beta, gamma, reduced_likelihood_function_value
end
Any pointers on how this can be fixed?
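One way to narrow this down (a sketch, not a confirmed fix): check whether differentiating through qr on a small matrix reproduces the same missing-adjoint error, independent of GEKPLS. If it does, the qr call plus the Array(Q) conversion is the blocker and needs a custom adjoint or a reformulation.

using Zygote, LinearAlgebra

# Minimal probe: does Zygote handle qr followed by Array(Q)?
f(A) = begin
    Q, R = qr(A)
    sum(Array(Q)) + sum(R)
end

Zygote.gradient(f, rand(4, 2))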

Plotting credible intervals in Julia from Turing model

OK, so I figured out how to plot the credible intervals for a univariate linear model in Turing.jl using the following code (I'm replicating Statistical Rethinking by McElreath; this particular exercise is in chapter 4). If anyone has already plotted these types of models with Turing and can offer a guide, it would be great!
Univariate model code:
using Turing
using StatsPlots
using Plots

height = df2.height
weight = df2.weight

@model heightmodel(y, x) = begin
    # priors
    α ~ Normal(178, 100)
    σ ~ Uniform(0, 50)
    β ~ LogNormal(0, 10)
    x_bar = mean(x)
    # model
    μ = α .+ (x .- x_bar) .* β
    y ~ MvNormal(μ, σ)
end

chns = sample(heightmodel(height, weight), NUTS(), 100000)

## code 4.43
describe(chns) |> display

# covariance and correlation
alph = get(chns, :α)[1].data
bet = get(chns, :β)[1].data
sigm = get(chns, :σ)[1].data
vecs = (alph[1:352], bet[1:352])
arr = vcat(transpose.(vecs)...)'
ss = [vec(alph + bet.*(x)) for x in 25:1:70]
arrr = vcat(transpose.(ss)...)'
plot([mean(arrr[:,x]) for x in 1:46], 25:1:70,
     ribbon = ([-1*(quantile(arrr[:,x],[0.1,0.9])[1] - mean(arrr[:,x])) for x in 1:46],
               [quantile(arrr[:,x],[0.1,0.9])[2] - mean(arrr[:,x]) for x in 1:46]))
[Plot: credible interval, univariate model]
However, when I try to replicate it with a multivariate model, very strange things are drawn:
Multivariate model code:
weight_s = (df.weight .- mean(df.weight)) ./ std(df.weight)
weight_s² = weight_s .^ 2

@model heightmodel(height, weight, weight²) = begin
    # priors
    α ~ Normal(178, 20)
    σ ~ Uniform(0, 50)
    β1 ~ LogNormal(0, 1)
    β2 ~ Normal(0, 1)
    # model
    μ = α .+ weight .* β1 + weight² .* β2
    height ~ MvNormal(μ, σ)
end

chns = sample(heightmodel(height, weight_s, weight_s²), NUTS(), 100000)
describe(chns) |> display

### painting the fit
alph = get(chns, :α)[1].data
bet1 = get(chns, :β1)[1].data
bet2 = get(chns, :β2)[1].data
vecs = (alph[1:99000], bet1[1:99000], bet2[1:99000])
arr = vcat(transpose.(vecs)...)'
polinomial = [vec(alph + bet1.*(x) + bet2.*(x.^2)) for x in -2:0.01:2]
arrr = vcat(transpose.(polinomial)...)'
plot([mean(arrr[:,x]) for x in 1:401], -2:0.01:2,
     ribbon = ([-1*(quantile(arrr[:,x],[0.1,0.9])[1] - mean(arrr[:,x])) for x in 1:46],
               [quantile(arrr[:,x],[0.1,0.9])[2] - mean(arrr[:,x]) for x in 1:46]))
[Plot: credible interval, multivariate model; the interval comes out clearly wrong]
In the Julia Slack channel (https://slackinvite.julialang.org/) Jens was kind enough to give me the answer; the credit goes to him (he doesn't have an SO account).
The main problem was that I was overcomplicating things and plotting the mean the wrong way, in a very inefficient and roundabout manner. Each parameter has a vector of 99000 draws from the posterior distribution. I was trying to take the mean across a matrix of all draws; it is much easier to evaluate the fit at each test weight first and then take the mean of each resulting vector, and then you don't make the mistakes I made when calculating the mean.
old code
vecs = (alph[1:99000], bet1[1:99000], bet2[1:99000])
arr = vcat(transpose.(vecs)...)'
[mean(arrr[:,x]) for x in 1:401]
can be written as:
testweights = -2:0.01:2
arr = [fheight.(w, res.α, res.β1, res.β2) for w in testweights]
m = [mean(v) for v in arr]
Moreover, the way Jens defined the credible intervals is much more elegant and idiomatically Julian:
Jens' code:
quantiles = [quantile(v, [0.1, 0.9]) for v in arr]
lower = [m - q[1] for (q, m) in zip(quantiles, m)]  # positive distance below the mean
upper = [q[2] - m for (q, m) in zip(quantiles, m)]  # positive distance above the mean
My code:
ribbon = ([-1*(quantile(arrr[:,x],[0.1,0.9])[1] - mean(arrr[:,x])) for x in 1:46],
          [quantile(arrr[:,x],[0.1,0.9])[2] - mean(arrr[:,x]) for x in 1:46])
Complete Solution:

weight_s = (d.weight .- mean(d.weight)) ./ std(d.weight)
height = d.height

@model heightmodel(height, weight) = begin
    # priors
    α ~ Normal(178, 20)
    σ ~ Uniform(0, 50)
    β1 ~ LogNormal(0, 1)
    β2 ~ Normal(0, 1)
    # model
    μ = α .+ weight .* β1 + weight.^2 .* β2
    # or μ = fheight.(weight, α, β1, β2) if we are defining fheight anyway
    height ~ MvNormal(μ, σ)
end

chns = sample(heightmodel(height, weight_s), NUTS(), 10000)
describe(chns) |> display
res = DataFrame(chns)

fheight(weight, α, β1, β2) = α + weight * β1 + weight^2 * β2
testweights = -2:0.01:2
arr = [fheight.(w, res.α, res.β1, res.β2) for w in testweights]
m = [mean(v) for v in arr]
quantiles = [quantile(v, [0.1, 0.9]) for v in arr]
lower = [m - q[1] for (q, m) in zip(quantiles, m)]
upper = [q[2] - m for (q, m) in zip(quantiles, m)]
plot(testweights, m, ribbon = (lower, upper))

Can't get performant Julia Turing model

I've tried to reproduce the model from a PyMC3 and Stan comparison, but it runs slowly, and when I look at @code_warntype there are some things (K and N, I think) which the compiler seemingly types as Any.
I've tried adding type annotations, though I can't add types to turing_model's arguments, and things are complicated inside turing_model because it works with autodiff variable types rather than the usual ones. I put all the code into the function do_it to avoid globals, because untyped globals are said to slow things down. (It actually seems slower, though; a standalone demonstration of the globals effect follows the code below.)
Any suggestions as to what's causing the problem? The turing_model code is what iterates, so that should make the most difference.
using Turing, StatsPlots, Random

sigmoid(x) = 1.0 / (1.0 + exp(-x))

function scale(w0::Float64, w1::Array{Float64,1})
    scale = √(w0^2 + sum(w1 .^ 2))
    return w0 / scale, w1 ./ scale
end

function do_it(iterations::Int64)
    K = 10                      # predictor dimension
    N = 1000                    # number of data samples
    X = rand(N, K)              # predictors (1000, 10)
    w1 = rand(K)                # weights (10,)
    w0 = -median(X * w1)        # 50% of elements for each class (number)
    w0, w1 = scale(w0, w1)      # unit length (euclidean)
    w_true = [w0, w1...]
    y = (w0 .+ (X * w1)) .> 0.0 # labels
    y = [Float64(x) for x in y]
    σ = 5.0
    σm = [i == j ? σ : 0.0 for i in 1:K, j in 1:K] # diagonal matrix with σ on the diagonal
    @model turing_model(X, y, σ, σm) = begin
        w0_pred ~ Normal(0.0, σ)
        w1_pred ~ MvNormal(σm)
        p = sigmoid.(w0_pred .+ (X * w1_pred))
        @inbounds for n in 1:length(y)
            y[n] ~ Bernoulli(p[n])
        end
    end
    @time chain = sample(turing_model(X, y, σ, σm), NUTS(iterations, 200, 0.65));
    # ϵ = 0.5
    # τ = 10
    # @time chain = sample(turing_model(X, y, σ), HMC(iterations, ϵ, τ));
    return (w_true=w_true, chains=chain::Chains)
end

chain = do_it(1000)
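As referenced above, the untyped-global effect can be demonstrated in isolation. A minimal sketch (toy code, unrelated to the model; assumes the BenchmarkTools package is installed):

using BenchmarkTools

xs_global = rand(10^6)      # non-const global: its type is unknown at compile time

function sum_arg(xs)        # the concrete type flows in through the argument
    s = 0.0
    for v in xs
        s += v
    end
    return s
end

function sum_global()       # reads the untyped global on every access
    s = 0.0
    for v in xs_global
        s += v
    end
    return s
end

@btime sum_arg($xs_global)  # fully inferred
@btime sum_global()         # dynamic dispatch inside the loop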

Time-dependent events in ODE

I recently started with Julia and wanted to implement one of my usual problems: time-dependent events.
For now I have:
# Packages
using Plots
using DifferentialEquations

# Parameters
k21 = 0.14*24
k12 = 0.06*24
ke = 1.14*24
α = 0.5
β = 0.05
η = 0.477
μ = 0.218
k1 = 0.5
V1 = 6

# Time
maxtime = 10
tspan = (0.0, maxtime)

# Dose
stim = 100

# Initial conditions
x0 = [0 0 2e11 8e11]

# Model equations
function system(dy, y, p, t)
    dy[1] = k21*y[2] - (k12 + ke)*y[1]
    dy[2] = k12*y[1] - k21*y[2]
    dy[3] = (α - μ - η)*y[3] + β*y[4] - k1/V1*y[1]*y[3]
    dy[4] = μ*y[3] - β*y[4]
end

# Events
eventtimes = [2, 5]
function condition(y, t, integrator)
    t - eventtimes
end
function affect!(integrator)
    x0[1] = stim
end
cb = ContinuousCallback(condition, affect!)

# Solve
prob = ODEProblem(system, x0, tspan)
sol = solve(prob, Rodas4(), callback = cb)

# Plotting
plot(sol, layout = (2, 2))
But the output it gives is not correct. More specifically, the events are not taken into account, and the initial condition for y1 seems to be stim rather than 0.
Any help would be greatly appreciated.
t - eventtimes doesn't work because one is a scalar and the other is a vector. But for this case it's much easier to just use a DiscreteCallback. When you make it a DiscreteCallback, you should pre-set the stop times so that the integrator steps exactly onto 2 and 5 for the callback. Here's an example:
# Packages
using Plots
using DifferentialEquations

# Parameters
k21 = 0.14*24
k12 = 0.06*24
ke = 1.14*24
α = 0.5
β = 0.05
η = 0.477
μ = 0.218
k1 = 0.5
V1 = 6

# Time
maxtime = 10
tspan = (0.0, maxtime)

# Dose
stim = 100

# Initial conditions
x0 = [0 0 2e11 8e11]

# Model equations
function system(dy, y, p, t)
    dy[1] = k21*y[2] - (k12 + ke)*y[1]
    dy[2] = k12*y[1] - k21*y[2]
    dy[3] = (α - μ - η)*y[3] + β*y[4] - k1/V1*y[1]*y[3]
    dy[4] = μ*y[3] - β*y[4]
end

# Events
eventtimes = [2.0, 5.0]
function condition(y, t, integrator)
    t ∈ eventtimes
end
function affect!(integrator)
    integrator.u[1] = stim
end
cb = DiscreteCallback(condition, affect!)

# Solve
prob = ODEProblem(system, x0, tspan)
sol = solve(prob, Rodas4(), callback = cb, tstops = eventtimes)

# Plotting
plot(sol, layout = (2, 2))
This avoids rootfinding altogether, so it should be a much nicer solution than hacking time choices into a rootfinding system.
Either way, notice that the affect was changed to

function affect!(integrator)
    integrator.u[1] = stim
end

It needs to modify the current u value of the integrator; mutating the x0 array after integration has started won't do anything.
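If you'd rather not maintain the condition and the tstops by hand, DiffEqCallbacks.jl provides PresetTimeCallback, which (to the best of my knowledge) wraps exactly this DiscreteCallback-plus-tstops pattern. A sketch:

using DifferentialEquations, DiffEqCallbacks

# PresetTimeCallback fires affect! at the given times and registers
# them as stop times automatically, so no explicit tstops are needed.
cb = PresetTimeCallback(eventtimes, affect!)
sol = solve(prob, Rodas4(), callback = cb)
plot(sol, layout = (2, 2))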

Comparing SAS and R results after resolving a system of differential equations

My main objective is to obtain the same results in SAS and in R. Sometimes, depending on the case, this is very easy; other times it is difficult, especially when computing something more complicated than usual.
To understand my case, consider the following differential equation system:
y' = z
z' = b*y' + c*y
Let b = -2, c = -4, y(0) = 0 and z(0) = 1.
In order to solve this system, in SAS we use PROC MODEL:
data t;
   do time=0 to 40;
      output;
   end;
run;

proc model data=t;
   dependent y 0 z 1;
   parm b -2 c -4;
   dert.y = z;
   dert.z = b * dert.y + c * y;
   solve y z / dynamic solveprint out=out1;
run;
In R, we could write the following solution using the lsoda function of the deSolve package:
library(deSolve)

b <- -2
c <- -4

rigidode <- function(t, y, parms) {
    with(as.list(y), {
        dert.y <- z
        dert.z <- b * dert.y + c * y
        list(c(dert.y, dert.z))
    })
}

yini <- c(y = 0, z = 1)
times <- seq(from = 0, to = 40, by = 1)
out_ode <- ode(times = times, y = yini, func = rigidode, parms = NULL)
out_lsoda <- lsoda(times = times, y = yini, func = rigidode, parms = NULL)
Here are the results:
[SAS and R output tables not reproduced here]
For t = 0, ..., 10 we obtain similar results, but from t = 10 to 40 differences start to appear, and to me these differences are significant.
To reduce them, I tightened the truncation-error tolerance in R to 1e-9 instead of 1e-6. I also verified that the default numerical integration methods and assumptions are the same.
Do you have any idea how to deal with this problem?
Sincerely yours,
Mily
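A note for later readers (my addition, not part of the original exchange): this system has a closed-form solution. Substituting y' = z gives y'' + 2y' + 4y = 0, whose characteristic roots are -1 ± i*sqrt(3), so y(t) = exp(-t) * sin(sqrt(3)t) / sqrt(3). This decays to roughly 1e-18 by t = 40, so absolute differences there sit far below the default tolerances of either tool. A third reference with tight tolerances, sketched here in Julia (the language used elsewhere on this page), can confirm which trajectory is drifting:

using DifferentialEquations

function rigid!(du, u, p, t)
    y, z = u
    du[1] = z              # y' = z
    du[2] = -2z - 4y       # z' = b*y' + c*y with b = -2, c = -4 and y' = z
end

prob = ODEProblem(rigid!, [0.0, 1.0], (0.0, 40.0))
sol = solve(prob, Vern9(), abstol = 1e-12, reltol = 1e-12, saveat = 1.0)

# Compare the numerical y component against the analytic solution.
yexact(t) = exp(-t) * sin(sqrt(3) * t) / sqrt(3)
maximum(abs(sol[1, i] - yexact(t)) for (i, t) in enumerate(sol.t))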
