Lagrange Multiplier Method using NLsolve.jl

I would like to minimize a distance function ||dz - z|| under the constraint that g(z) = 0.
I wanted to use Lagrange Multipliers to solve this problem. Then I used NLsolve.jl to solve the non-linear equation that I end up with.
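(Concretely, the nonlinear system I end up with is the first-order condition of the Lagrangian ℒ(z, λ) = f(z) − λᵀg(z) with f(z) = ‖dz − z‖:

∇f(z) − Σᵢ λᵢ ∇gᵢ(z) = 0
g(z) = 0

i.e. length(z) + length(λ) equations in the unknowns z and λ.)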
using NLsolve
using ForwardDiff
using LinearAlgebra: norm, dot   # needed for norm/dot (omitted in my original post)

function ProjLagrange(dz, g::Function)
    λ_init = ones(size(g(dz...), 1))
    initial_x = vcat(dz, λ_init)

    function gradL!(F, x)
        len_dz = length(dz)
        z = x[1:len_dz]
        λ = x[len_dz+1:end]

        F = Array{Float64}(undef, length(x))

        my_distance(z) = norm(dz - z)
        ∇f = z -> ForwardDiff.gradient(my_distance, z)
        F[1:len_dz] = ∇f(z) .- dot(λ, g(z...))

        if length(λ) == 1
            F[end] = g(z...)
        else
            F[len_dz+1:end] = g(z)
        end
    end

    nlsolve(gradL!, initial_x)
end
g_test(x1, x2, x3) = x1^2 + x2 - x2 + 5
z = [1000,1,1]
ProjLagrange(z, g_test)
But I always end up with Zero: [NaN, NaN, NaN, NaN] and Convergence: false.
Just so you know, I have already solved the problem using Optim.jl, by minimizing the following function: Proj(z) = b * sum(abs.(g(z))) + a * norm(dz - z).
But I would really like to know if this is possible with NLsolve. Any help is greatly appreciated!
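For reference, a minimal sketch of that penalty formulation (my reconstruction; the weights a and b and the concrete constraint are illustrative, not part of the original post):

using Optim, LinearAlgebra

dz = [1000.0, 1.0, 1.0]
g(z) = [z[1]^2 + z[2] - z[3] + 5]                # array-valued example constraint (hypothetical)
a, b = 1.0, 100.0                                # hand-tuned penalty weights (my choice)
Proj(z) = b * sum(abs.(g(z))) + a * norm(dz - z)
res = optimize(Proj, copy(dz), NelderMead())     # derivative-free, since Proj is non-smooth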

Starting almost from scratch, using Wikipedia's Lagrange multiplier page as a guide (it worked well for me), the code below seems to work. I added a λ₀s argument to the ProjLagrange function so that it can accept a vector of initial multiplier λ values (I saw you initialized them at 1.0, but this seemed more generic). Note that this has not been optimized for performance!
using NLsolve, ForwardDiff, LinearAlgebra

function ProjLagrange(x₀, λ₀s, gs, n_it)
    # distance function from x₀ and its gradient
    f(x) = norm(x - x₀)
    ∇f(x) = ForwardDiff.gradient(f, x)

    # gradients of the constraints
    ∇gs = [x -> ForwardDiff.gradient(g, x) for g in gs]

    # form the auxiliary function and its gradients
    ℒ(x, λs) = f(x) - sum(λ * g(x) for (λ, g) in zip(λs, gs))
    ∂ℒ∂x(x, λs) = ∇f(x) - sum(λ * ∇g(x) for (λ, ∇g) in zip(λs, ∇gs))
    ∂ℒ∂λ(x, λs) = [g(x) for g in gs]

    # as a function of a single argument
    nx = length(x₀)
    ℒ(v) = ℒ(v[1:nx], v[nx+1:end])
    ∇ℒ(v) = vcat(∂ℒ∂x(v[1:nx], v[nx+1:end]), ∂ℒ∂λ(v[1:nx], v[nx+1:end]))

    # and solve
    v₀ = vcat(x₀, λ₀s)
    nlsolve(∇ℒ, v₀, iterations=n_it)
end

# test
gs_test = [x -> x[1]^2 + x[2] - x[3] + 5]
λ₀s_test = [1.0]
x₀_test = [1000.0, 1.0, 1.0]
n_it = 100

res = ProjLagrange(x₀_test, λ₀s_test, gs_test, n_it)
# test
gs_test = [x -> x[1]^2 + x[2] - x[3] + 5]
λ₀s_test = [1.0]
x₀_test = [1000.0, 1.0, 1.0]
n_it = 100
res = ProjLagrange(x₀_test, λ₀s_test, gs_test, n_it)
gives me
julia> res = ProjLagrange(x₀_test, λ₀s_test, gs_test, n_it)
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [1000.0, 1.0, 1.0, 1.0]
* Zero: [9.800027199717013, -49.52026655749088, 51.520266557490885, -0.050887973682118504]
* Inf-norm of residuals: 0.000000
* Iterations: 10
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 11
* Jacobian Calls (df/dx): 11
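As a quick sanity check (my addition, not part of the original answer): the first three entries of the zero are the projected point, which should satisfy the constraint up to the solver tolerance.

x_sol = res.zero[1:3]      # the projected point x*
gs_test[1](x_sol)          # ≈ 0, i.e. the constraint holds
norm(x_sol - x₀_test)      # the minimized distance ‖x* − x₀‖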

I altered your code as below (see my comments inline) and got the following output. It doesn't throw NaNs anymore, reduces the objective, and converges. Does this differ from your Optim.jl results?
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [1000.0, 1.0, 1.0, 1.0]
* Zero: [9.80003, -49.5203, 51.5203, -0.050888]
* Inf-norm of residuals: 0.000000
* Iterations: 10
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 11
* Jacobian Calls (df/dx): 11
using NLsolve
using ForwardDiff
using LinearAlgebra: norm, dot
using Plots

function ProjLagrange(dz, g::Function, n_it)
    λ_init = ones(size(g(dz), 1))
    initial_x = vcat(dz, λ_init)

    # These definitions can go outside as well
    len_dz = length(dz)
    my_distance = z -> norm(dz - z)
    ∇f = z -> ForwardDiff.gradient(my_distance, z)
    # In fact, this is probably the most vital difference w.r.t. your proposal:
    # we need the gradient of the constraint.
    ∇g = z -> ForwardDiff.gradient(g, z)

    function gradL!(F, x)
        z = x[1:len_dz]
        λ = x[len_dz+1:end]

        # `F` is memory allocated by NLsolve to store the residual of the
        # respective call of `gradL!` and hence doesn't need to be allocated
        # anew every time (or at all).
        F[1:len_dz] = ∇f(z) .- λ .* ∇g(z)
        F[len_dz+1:end] .= g(z)
    end

    return nlsolve(gradL!, initial_x, iterations=n_it, store_trace=true)
end
# Presumably something is wrong here: x2 - x2 is not very likely. I also made g
# callable directly with an array argument.
g_test = x -> x[1]^2 + x[2] - x[3] + 5
z = [1000, 1, 1]
n_it = 10000
res = ProjLagrange(z, g_test, n_it)

# Ugly reformatting here
trace = hcat([[state.iteration; state.fnorm; state.stepnorm] for state in res.trace.states]...)
plot(trace[1, :], trace[2, :], label="f(x) inf-norm", xlabel="steps")
[Plot: evolution of the inf-norm of f(x) over the iteration steps]
[Edit: Adapted solution to incorporate correct gradient computation for g()]

Related

MathOptInterface.OTHER_ERROR when trying to use ISRES of NLopt through JuMP

I am trying to minimize a nonlinear function with nonlinear inequality constraints with NLopt and JuMP.
In my test code below, I am minimizing a function with a known global minimum.
Local optimizers such as LD_MMA fail to find this global minimum, so I am trying to use global optimizers of NLopt that allow nonlinear inequality constraints.
However, when I check my termination status, it says “termination_status(model) = MathOptInterface.OTHER_ERROR”. I am not sure which part of my code to check for this error.
What could be the cause?
I am using JuMP since in the future I plan to use other solvers such as KNITRO as well, but should I rather use the NLopt syntax?
Below is my code:
# THIS IS A CODE TO SOLVE FOR THE TOYMODEL
# THE EQUILIBRIUM IS CHARACTERIZED BY A NONLINEAR SYSTEM OF ODEs OF INCREASING FUNCTIONS B(x) and S(y)
# THE GOAL IS TO APPROXIMATE B(x) and S(y) WITH POLYNOMIALS
# FIND THE POLYNOMIAL COEFFICIENTS THAT MINIMIZE THE LEAST SQUARES OF THE EQUILIBRIUM EQUATIONS

# load packages
using Roots, NLopt, JuMP

# model primitives and other parameters
k = .5          # equal split
d = 1           # degree of polynomial
nparam = 2*d+2  # number of parameters to estimate
m = 10          # number of grids
m -= 1
vGrid = range(0,1,m)  # discretize values
c1 = 0          # lower bound for B'() and S'()
c2 = 2          # lower and upper bounds for offers
c3 = 1          # lower and upper bounds for the parameters to be estimated

# objective function to be minimized
function obj(α::T...) where {T<:Real}
    # split parameters
    αb = α[1:d+1]    # coefficients for B(x)
    αs = α[d+2:end]  # coefficients for S(y)

    # define B(x), B'(x), S(y), and S'(y)
    B(v) = sum([αb[i] * v .^ (i-1) for i in 1:d+1])
    B1(v) = sum([αb[i] * (i-1) * v ^ (i-2) for i in 2:d+1])
    S(v) = sum([αs[i] * v .^ (i-1) for i in 1:d+1])
    S1(v) = sum([αs[i] * (i-1) * v ^ (i-2) for i in 2:d+1])

    # the equilibrium is characterized by the following first order conditions
    #FOCb(y) = B(k * y * S1(y) + S(y)) - S(y)
    #FOCs(x) = S(- (1-k) * (1-x) * B1(x) + B(x)) - B(x)
    function FOCb(y)
        sy = S(y)
        binv = find_zero(q -> B(q) - sy, (-c2, c2))
        return k * y * S1(y) + sy - binv
    end
    function FOCs(x)
        bx = B(x)
        sinv = find_zero(q -> S(q) - bx, (-c2, c2))
        return (1-k) * (1-x) * B1(x) - B(x) + sinv
    end

    # evaluate the FOCs at each grid point and return the sum of squares
    Eb = [FOCb(y) for y in vGrid]
    Es = [FOCs(x) for x in vGrid]
    E = [Eb; Es]
    return E' * E
end

# this is the actual global minimum
αa = [1/12, 2/3, 1/4, 2/3]
obj(αa...)

# do optimization
model = Model(NLopt.Optimizer)
set_optimizer_attribute(model, "algorithm", :GN_ISRES)
@variable(model, -c3 <= α[1:nparam] <= c3)
@NLconstraint(model, [j = 1:m], sum(α[i] * (i-1) * vGrid[j] ^ (i-2) for i in 2:d+1) >= c1)      # B should be increasing
@NLconstraint(model, [j = 1:m], sum(α[d+1+i] * (i-1) * vGrid[j] ^ (i-2) for i in 2:d+1) >= c1)  # S should be increasing
register(model, :obj, nparam, obj, autodiff=true)
@NLobjective(model, Min, obj(α...))

println("")
println("Initial values:")
for i in 1:nparam
    set_start_value(α[i], αa[i]+rand()*.1)
    println(start_value(α[i]))
end

JuMP.optimize!(model)

println("")
@show termination_status(model)
@show objective_value(model)
println("")
println("Solution:")
sol = [value(α[i]) for i in 1:nparam]
My output:
Initial values:
0.11233072522513032
0.7631843020124309
0.3331559403539963
0.7161240026812674
termination_status(model) = MathOptInterface.OTHER_ERROR
objective_value(model) = 0.19116585196576466
Solution:
4-element Vector{Float64}:
0.11233072522513032
0.7631843020124309
0.3331559403539963
0.7161240026812674
I answered on the Julia forum: https://discourse.julialang.org/t/mathoptinterface-other-error-when-trying-to-use-isres-of-nlopt-through-jump/87420/2.
Posting my answer for posterity:
You have multiple issues:
range(0,1,m) should be range(0,1; length = m) (how did this work otherwise?). This is true on Julia 1.6; the positional range(start, stop, length) method was only added in Julia v1.8. (Both issues are illustrated in the snippet after this list.)
Sometimes your objective function errors because the root doesn't exist. If I run with Ipopt, I get
ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.
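Both issues are easy to reproduce in isolation (a sketch of mine, not part of the original answer):

using Roots

# Issue 1: the positional range(start, stop, length) method needs Julia ≥ 1.8;
# on Julia 1.6 use the keyword form instead:
vGrid = range(0, 1; length = 9)       # 0.0:0.125:1.0

# Issue 2: bracketing root-finding requires a sign change over the interval:
# find_zero(q -> q^2 + 1, (-2, 2))    # would throw: not a bracketing interval
find_zero(q -> q^2 - 2, 1.0)          # ≈ 1.4142; an initial guess needs no bracket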
Here's what I would do:
using JuMP
import Ipopt
import Roots

function main()
    k, d, c1, c2, c3, m = 0.5, 1, 0, 2, 1, 10
    nparam = 2 * d + 2
    m -= 1
    vGrid = range(0, 1; length = m)
    function obj(α::T...) where {T<:Real}
        αb, αs = α[1:d+1], α[d+2:end]
        B(v) = sum(αb[i] * v^(i-1) for i in 1:d+1)
        B1(v) = sum(αb[i] * (i-1) * v^(i-2) for i in 2:d+1)
        S(v) = sum(αs[i] * v^(i-1) for i in 1:d+1)
        S1(v) = sum(αs[i] * (i-1) * v^(i-2) for i in 2:d+1)
        function FOCb(y)
            sy = S(y)
            binv = Roots.fzero(q -> B(q) - sy, zero(T))
            return k * y * S1(y) + sy - binv
        end
        function FOCs(x)
            bx = B(x)
            sinv = Roots.fzero(q -> S(q) - bx, zero(T))
            return (1-k) * (1-x) * B1(x) - B(x) + sinv
        end
        return sum(FOCb(x)^2 + FOCs(x)^2 for x in vGrid)
    end
    αa = [1/12, 2/3, 1/4, 2/3]
    model = Model(Ipopt.Optimizer)
    @variable(model, -c3 <= α[i=1:nparam] <= c3, start = αa[i] + 0.1 * rand())
    @constraints(model, begin
        [j = 1:m], sum(α[i] * (i-1) * vGrid[j]^(i-2) for i in 2:d+1) >= c1
        [j = 1:m], sum(α[d+1+i] * (i-1) * vGrid[j]^(i-2) for i in 2:d+1) >= c1
    end)
    register(model, :obj, nparam, obj; autodiff = true)
    @NLobjective(model, Min, obj(α...))
    optimize!(model)
    print(solution_summary(model))
    return value.(α)
end

main()

Julia minimize simple scalar function

How do I minimize a simple scalar function in Julia using Newton's method? (or any other suitable optimization scheme)
using Optim

# Function to optimize
function g(x)
    return x^2
end

x0 = 2.0  # Initial value
optimize(g, x0, Newton())
The above doesn't seem to work and returns
ERROR: MethodError: no method matching optimize(::typeof(g), ::Float64, ::Newton{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}}})
Optim is designed for vector problems, not scalar ones like in your example. You can adjust the example to be a vector problem with one variable, though:
julia> using Optim

julia> function g(x)  # <- g accepts x as a vector
           return x[1]^2
       end

julia> x0 = [2.0]  # <- make this a vector
1-element Vector{Float64}:
 2.0

julia> optimize(g, x0, Newton())
 * Status: success

 * Candidate solution
   Final objective value: 0.000000e+00
The optimize function expects an interval, not a starting point:
optimize(g, -10, 10)
returns
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [-10.000000, 10.000000]
* Minimizer: 0.000000e+00
* Minimum: 0.000000e+00
* Iterations: 5
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 6
Concerning the available methods, I have not read the docs, but you can have a look directly at the source code using the @edit macro:

@edit optimize(g, -10, 10)
Browsing the source you will see:
function optimize(f,
                  lower::Union{Integer, Real},
                  upper::Union{Integer, Real},
                  method::Union{Brent, GoldenSection};
                  kwargs...)
    T = promote_type(typeof(lower/1), typeof(upper/1))
    optimize(f,
             T(lower),
             T(upper),
             method;
             kwargs...)
end
Hence I think that you have only two methods for unidimensional minimization: Brent and GoldenSection.
For example, you can try:
julia> optimize(g, -10, 10, GoldenSection())
Results of Optimization Algorithm
* Algorithm: Golden Section Search
* Search Interval: [-10.000000, 10.000000]
* Minimizer: 1.110871e-16
* Minimum: 1.234035e-32
* Iterations: 79
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 80

non-linear solver always produces zero residual

I am learning how to solve non-linear equations, and Julia (1.3.1) in general, and want to ask how we should use NLsolve.
As a first step, I try the following:
using NLsolve

uni = 1
z = 3
x = [3,2,4,5,6]
y = z .+ x
print(y)

function g!(F, x)
    F[1:uni+4] = y .- z - x
end

nlsolve(g!, [0.5,0.9,1,2,3])
And I confirm it works, as below:
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [0.5, 0.9, 1.0, 2.0, 3.0]
* Zero: [2.999999999996542, 2.000000000003876, 4.000000000008193, 4.999999999990685, 5.999999999990221]
* Inf-norm of residuals: 0.000000
* Iterations: 2
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 3
* Jacobian Calls (df/dx): 3
Then I try a more complicated model, as follows:
using SpecialFunctions, NLsolve, Random
Random.seed!(1234)

S = 2

# Setting parameters
ε = 1.3
β = 0.4
γ = gamma((ε-1)/ε)
T = rand(S)
E = rand(S)
B = rand(S)
w = rand(S)
Q = rand(S)
d = [1 2; 2 1]

# Construct a model
rvector = T.*Q.^(1-β).*B.^ε
svector = E.*w.^ε
Φ_all = sum(rvector * svector' .* d)
π = rvector * svector' .* d ./ Φ_all

# These two are outcomes of the model
πR = (sum(π, dims=1))'
πM = sum(π, dims=2)

# E is now set as unknown, and we want to estimate it given the outcome of the model
function f!(Res, Unknown)
    rvector = T.*Q.^(1-β).*B.^ε
    svector = Unknown[1:S].*w.^ε
    Φ_all = sum(rvector * svector' .* d)
    π = rvector * svector' .* d ./ Φ_all
    Res = ones(S+S)
    Res[1:S] = πR - (sum(π, dims=1))'
    Res[S+1:S+S] = πM - sum(π, dims=2)
end

nlsolve(f!, [0.5,0.6])
and this code produces a weird result as follows.
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [0.5, 0.6]
* Zero: [0.5, 0.6]
* Inf-norm of residuals: 0.000000
* Iterations: 0
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 1
* Jacobian Calls (df/dx): 1
So, essentially, the function always returns zero, and thus the initial input always becomes the solution. I cannot understand why this does not work, and why it behaves differently from the first example. Could you suggest how to fix it?
You need to update Res in place. Now you do

Res = ones(S+S)

which rebinds the name Res to a freshly allocated array and shadows the input argument, so NLsolve never sees the residuals you compute; the residual it reads stays at zero, which is why it "converges" after 0 iterations at the starting point. Just write into Res directly.
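In other words, something like this (a sketch of the fix, reusing the definitions from the question):

function f!(Res, Unknown)
    rvector = T.*Q.^(1-β).*B.^ε
    svector = Unknown[1:S].*w.^ε
    Φ_all = sum(rvector * svector' .* d)
    π = rvector * svector' .* d ./ Φ_all
    # write into the array NLsolve passes in; no `Res = ones(S+S)` rebinding
    Res[1:S] = πR - (sum(π, dims=1))'
    Res[S+1:S+S] = πM - sum(π, dims=2)
end

nlsolve(f!, [0.5,0.6])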

Can't get performant Julia Turing model

I've tried to reproduce the model from a PyMC3 and Stan comparison. But it seems to run slowly, and when I look at @code_warntype there are some things -- K and N, I think -- which the compiler seemingly infers as Any.
I've tried adding types -- though I can't add types to turing_model's arguments, and things are complicated within turing_model because it uses autodiff variables and not the usual ones. I put all the code into the function do_it to avoid globals, because they say that globals can slow things down. (It actually seems slower, though.)
Any suggestions as to what's causing the problem? The turing_model code is what's iterating, so that should make the most difference.
using Turing, StatsPlots, Random

sigmoid(x) = 1.0 / (1.0 + exp(-x))

function scale(w0::Float64, w1::Array{Float64,1})
    scale = √(w0^2 + sum(w1 .^ 2))
    return w0 / scale, w1 ./ scale
end

function do_it(iterations::Int64)::Chains
    K = 10                       # predictor dimension
    N = 1000                     # number of data samples
    X = rand(N, K)               # predictors (1000, 10)
    w1 = rand(K)                 # weights (10,)
    w0 = -median(X * w1)         # 50% of elements for each class (number)
    w0, w1 = scale(w0, w1)       # unit length (euclidean)
    w_true = [w0, w1...]
    y = (w0 .+ (X * w1)) .> 0.0  # labels
    y = [Float64(x) for x in y]
    σ = 5.0
    σm = [x == y ? σ : 0.0 for x in 1:K, y in 1:K]

    @model turing_model(X, y, σ, σm) = begin
        w0_pred ~ Normal(0.0, σ)
        w1_pred ~ MvNormal(σm)
        p = sigmoid.(w0_pred .+ (X * w1_pred))
        @inbounds for n in 1:length(y)
            y[n] ~ Bernoulli(p[n])
        end
    end

    @time chain = sample(turing_model(X, y, σ, σm), NUTS(iterations, 200, 0.65));
    # ϵ = 0.5
    # τ = 10
    # @time chain = sample(turing_model(X, y, σ), HMC(iterations, ϵ, τ));
    return (w_true=w_true, chains=chain::Chains)
end

chain = do_it(1000)

Optim using gradient Error: "no method matching"

I’m trying to optimize a function using one of the algorithms that require a gradient. Basically I’m trying to learn how to optimize a function using a gradient in Julia. I’m fairly confident that my gradient is specified correctly. I know this because the similarly defined Matlab function for the gradient gives me the same values as in Julia for some test values of the arguments. Also, the Matlab version using fminunc with the gradient seems to optimize the function fine.
However when I run the Julia script, I seem to get the following error:
julia> include("ex2b.jl")
ERROR: `g!` has no method matching g!(::Array{Float64,1}, ::Array{Float64,1})
while loading ...\ex2b.jl, in expression starting on line 64
I'm running Julia 0.3.2 on a Windows 7 32-bit machine. Here is the code (basically a translation of some Matlab to Julia):
using Optim

function mapFeature(X1, X2)
    degrees = 5
    out = ones(size(X1)[1])
    for i in range(1, degrees+1)
        for j in range(0, i+1)
            term = reshape((X1.^(i-j) .* X2.^(j)), size(X1.^(i-j))[1], 1)
            out = hcat(out, term)
        end
    end
    return out
end

function sigmoid(z)
    return 1 ./ (1 + exp(-z))
end

function costFunc_logistic(theta, X, y, lam)
    m = length(y)
    regularization = sum(theta[2:end].^2) * lam / (2 * m)
    return sum((-y .* log(sigmoid(X * theta)) - (1 - y) .* log(1 - sigmoid(X * theta)))) ./ m + regularization
end

function costFunc_logistic_gradient!(theta, X, y, lam, m)
    grad = X' * (sigmoid(X * theta) .- y) ./ m
    grad[2:end] = grad[2:end] + theta[2:end] .* lam / m
    return grad
end

data = readcsv("ex2data2.txt")
X = mapFeature(data[:,1], data[:,2])
m, n = size(data)
y = data[:, end]
theta = zeros(size(X)[2])
lam = 1.0

f(theta::Array) = costFunc_logistic(theta, X, y, lam)
g!(theta::Array) = costFunc_logistic_gradient!(theta, X, y, lam, m)
optimize(f, g!, theta, method = :l_bfgs)
And here is some of the data:
0.051267,0.69956,1
-0.092742,0.68494,1
-0.21371,0.69225,1
-0.375,0.50219,1
-0.51325,0.46564,1
-0.52477,0.2098,1
-0.39804,0.034357,1
-0.30588,-0.19225,1
0.016705,-0.40424,1
0.13191,-0.51389,1
0.38537,-0.56506,1
0.52938,-0.5212,1
0.63882,-0.24342,1
0.73675,-0.18494,1
0.54666,0.48757,1
0.322,0.5826,1
0.16647,0.53874,1
-0.046659,0.81652,1
-0.17339,0.69956,1
-0.47869,0.63377,1
-0.60541,0.59722,1
-0.62846,0.33406,1
-0.59389,0.005117,1
-0.42108,-0.27266,1
-0.11578,-0.39693,1
0.20104,-0.60161,1
0.46601,-0.53582,1
0.67339,-0.53582,1
-0.13882,0.54605,1
-0.29435,0.77997,1
-0.26555,0.96272,1
-0.16187,0.8019,1
-0.17339,0.64839,1
-0.28283,0.47295,1
-0.36348,0.31213,1
-0.30012,0.027047,1
-0.23675,-0.21418,1
-0.06394,-0.18494,1
0.062788,-0.16301,1
0.22984,-0.41155,1
0.2932,-0.2288,1
0.48329,-0.18494,1
0.64459,-0.14108,1
0.46025,0.012427,1
0.6273,0.15863,1
0.57546,0.26827,1
0.72523,0.44371,1
0.22408,0.52412,1
0.44297,0.67032,1
0.322,0.69225,1
0.13767,0.57529,1
-0.0063364,0.39985,1
-0.092742,0.55336,1
-0.20795,0.35599,1
-0.20795,0.17325,1
-0.43836,0.21711,1
-0.21947,-0.016813,1
-0.13882,-0.27266,1
0.18376,0.93348,0
0.22408,0.77997,0
Let me know if you guys need additional details. Btw, this relates to a coursera machine learning course if curious.
The gradient argument should not be a function that computes and returns the gradient, but a function that stores it in a preallocated array (hence the exclamation mark in the function name, and the second array argument in the error message).
The following seems to work:
function g!(theta::Array, storage::Array)
    storage[:] = costFunc_logistic_gradient!(theta, X, y, lam, m)
end
optimize(f, g!, theta, method = :l_bfgs)
The same using closures and currying (a version for those who are used to a function that returns both the cost and the gradient):
function cost_gradient(θ, X, y, λ)
    m = length(y);
    return (θ::Array) -> begin
        h = sigmoid(X * θ);  # (m,n+1)*(n+1,1) -> (m,1)
        J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2);
    end, (θ::Array, storage::Array) -> begin
        h = sigmoid(X * θ);  # (m,n+1)*(n+1,1) -> (m,1)
        storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]];
    end
end
Then, somewhere in the code:
initialθ = zeros(n,1);
f, g! = cost_gradient(initialθ, X, y, λ);
res = optimize(f, g!, initialθ, method = :cg, iterations = your_iterations);
θ = res.minimum;
