Julia DiffResults examples? - julia

Julia's ForwardDiff documentation suggests that computing the function value, gradient and Hessian can be computed in one fell swoop using the DiffResults API, but there are NO examples. The DiffResults package itself has no examples either, and no documentation to speak of. The use case for this is self-evident: suppose I have a function f of a vector argument x, and I want to minimize it using Newton's method. Below is the blunt approach, where things get recomputed three times - how would I write it with DiffResults?
function NewtMin(f, x0, eps)
fgrad = x-> ForwardDiff.gradient(f, x)
fhess = x-> ForwardDiff.hessian(f, x)
oldval = f(x0)
newx = x0 - fhess(x0)\fgrad(x0)
newval = f(newx)
while abs(newval - oldval) > eps
oldval = newval
newx = newx - fhess(newx)\fgrad(newx)
newval = f(newx)
end
return newx
end

There are examples in the DiffResults.jl documentation from http://www.juliadiff.org/DiffResults.jl/stable/.
And this is a simple rewriting of Newtmin using DiffResults, it works in julia v0.6.4. But I guess it can be refactored and optimized to be more elegant and performant.
using ForwardDiff
using DiffResults
function NewtMin(f, x0, eps)
result = DiffResults.HessianResult(x0)
ForwardDiff.hessian!(result, f, x0)
fhess_x0 = DiffResults.hessian(result)
fgrad_x0 = DiffResults.gradient(result)
oldval = DiffResults.value(result)
newx = x0 - fhess_x0\fgrad_x0
newval = f(newx)
while abs(newval - oldval) > eps
oldval = newval
ForwardDiff.hessian!(result, f, newx)
fhess_newx = DiffResults.hessian(result)
fgrad_newx = DiffResults.gradient(result)
newx = newx - fhess_newx\fgrad_newx
newval = f(newx)
end
return newx
end
foo(x) = sum(x.^2)
NewtMin(foo, [1.,1.,1.], 0.01)
## which should give a correct result at [0., 0., 0.]

Related

Why my code in Julia is getting slower for higher iteration?

I wrote a main function which uses a stochastic optimization algorithm (Particle Swarm Optimization) to found optimal solution for a ODE system. I would run 50 times to make sure the optimal can be found. At first, it operates normally, but now I found the calculation time would increase with iteration increases.
It cost less than 300s for first ten calculations, but it would increase to 500s for final calculation. It seems that it would cost 3~5 seconds more for each calculation. I have followed the high performance tips to optimize my code but it doesn't work.
I am sorry I don't know quite well how to upload my code before, here is the code I wrote below. But in this code, the experimental data is not loaded, I may need to find a way to upload data. In main function, with the increase of i, the time cost is increasing for each calculation.
Oh, by the way, I found another interesting phenomenon. I changed the number of calculations and the calculation time changed again. For first 20 calculations in main loop, each calculation cost about 300s, and the memory useage fluctuates significantly. But something I don't know happend, it is speeding up. It cost 1/4 time less time for each calculation, which is about 80s. And the memory useage became a straight line like this:
I knew the Julia would do pre-heating for first run and then speed up. But this situation seems different. This situation looks like Julia run slowly for first 20 calculation, and then it found a good way to optimize the memory useage and speed up. Then the program just run at full speed.
using CSV, DataFrames
using BenchmarkTools
using DifferentialEquations
using Statistics
using Dates
using Base.Threads
using Suppressor
function uniform(dim::Int, lb::Array{Float64, 1}, ub::Array{Float64, 1})
arr = rand(Float64, dim)
#inbounds for i in 1:dim; arr[i] = arr[i] * (ub[i] - lb[i]) + lb[i] end
return arr
end
mutable struct Problem
cost_func
dim::Int
lb::Array{Float64,1}
ub::Array{Float64,1}
end
mutable struct Particle
position::Array{Float64,1}
velocity::Array{Float64,1}
cost::Float64
best_position::Array{Float64,1}
best_cost::Float64
end
mutable struct Gbest
position::Array{Float64,1}
cost::Float64
end
function PSO(problem, data_dict; max_iter=100,population=100,c1=1.4962,c2=1.4962,w=0.7298,wdamp=1.0)
dim = problem.dim
lb = problem.lb
ub = problem.ub
cost_func = problem.cost_func
gbest, particles = initialize_particles(problem, population, data_dict)
# main loop
for iter in 1:max_iter
#threads for i in 1:population
particles[i].velocity .= w .* particles[i].velocity .+
c1 .* rand(dim) .* (particles[i].best_position .- particles[i].position) .+
c2 .* rand(dim) .* (gbest.position .- particles[i].position)
particles[i].position .= particles[i].position .+ particles[i].velocity
particles[i].position .= max.(particles[i].position, lb)
particles[i].position .= min.(particles[i].position, ub)
particles[i].cost = cost_func(particles[i].position,data_dict)
if particles[i].cost < particles[i].best_cost
particles[i].best_position = copy(particles[i].position)
particles[i].best_cost = copy(particles[i].cost)
if particles[i].best_cost < gbest.cost
gbest.position = copy(particles[i].best_position)
gbest.cost = copy(particles[i].best_cost)
end
end
end
w = w * wdamp
if iter % 50 == 1
println("Iteration " * string(iter) * ": Best Cost = " * string(gbest.cost))
println("Best Position = " * string(gbest.position))
println()
end
end
gbest, particles
end
function initialize_particles(problem, population,data_dict)
dim = problem.dim
lb = problem.lb
ub = problem.ub
cost_func = problem.cost_func
gbest_position = uniform(dim, lb, ub)
gbest = Gbest(gbest_position, cost_func(gbest_position,data_dict))
particles = []
for i in 1:population
position = uniform(dim, lb, ub)
velocity = zeros(dim)
cost = cost_func(position,data_dict)
best_position = copy(position)
best_cost = copy(cost)
push!(particles, Particle(position, velocity, cost, best_position, best_cost))
if best_cost < gbest.cost
gbest.position = copy(best_position)
gbest.cost = copy(best_cost)
end
end
return gbest, particles
end
function get_dict_label(beta::Int)
beta_str = lpad(beta,2,"0")
T_label = "Temperature_" * beta_str
M_label = "Mass_" * beta_str
MLR_label = "MLR_" * beta_str
return T_label, M_label, MLR_label
end
function get_error(x::Vector{Float64}, y::Vector{Float64})
numerator = sum((x.-y).^2)
denominator = var(x) * length(x)
numerator/denominator
end
function central_diff(x::AbstractArray{Float64}, y::AbstractArray{Float64})
# Central difference quotient
dydx = Vector{Float64}(undef, length(x))
dydx[2:end] .= diff(y) ./ diff(x)
#views dydx[2:end-1] .= (dydx[2:end-1] .+ dydx[3:end])./2
# Forward and Backward difference
dydx[1] = (y[2]-y[1])/(x[2]-x[1])
dydx[end] = (y[end]-y[end-1])/(x[end]-x[end-1])
return dydx
end
function decomposition!(dm,m,p,T)
# A-> residue + volitale
# B-> residue + volatile
beta,A1,E1,n1,k1,A2,E2,n2,k2,m1,m2 = p
R = 8.314
rxn1 = -m1 * exp(A1-E1/R/T) * max(m[1]/m1,0)^n1 / beta
rxn2 = -m2 * exp(A2-E2/R/T) * max(m[2]/m2,0)^n2 / beta
dm[1] = rxn1
dm[2] = rxn2
dm[3] = -k1 * rxn1 - k2 * rxn2
dm[4] = dm[1] + dm[2] + dm[3]
end
function read_file(file_path)
df = CSV.read(file_path, DataFrame)
data_dict = Dict{String, Vector{Float64}}()
for beta in 5:5:21
T_label, M_label, MLR_label = get_dict_label(beta)
T_data = collect(skipmissing(df[:, T_label]))
M_data = collect(skipmissing(df[:, M_label]))
T = T_data[T_data .< 780]
M = M_data[T_data .< 780]
data_dict[T_label] = T
data_dict[M_label] = M
data_dict[MLR_label] = central_diff(T, M)
end
return data_dict
end
function initial_condition(beta::Int64, ode_parameters::Array{Float64,1})
m_FR_initial = ode_parameters[end]
m_PVC_initial = 1 - m_FR_initial
T_span = (300.0, 800.0) # temperature range
p = [beta; ode_parameters; m_PVC_initial]
m0 = [p[end-1], p[end], 0.0, 1.0] # initial mass
return m0, T_span, p
end
function cost_func(ode_parameters, data_dict)
total_error = 0.0
for beta in 5:5:21
T_label, M_label, MLR_label= get_dict_label(beta)
T = data_dict[T_label]::Vector{Float64}
M = data_dict[M_label]::Vector{Float64}
MLR = data_dict[MLR_label]::Vector{Float64}
m0, T_span, p = initial_condition(beta,ode_parameters)
prob = ODEProblem(decomposition!,m0,T_span,p)
sol = solve(prob, AutoVern9(Rodas5(autodiff=false)),saveat=T,abstol=1e-8,reltol=1e-8,maxiters=1e4)
if sol.retcode != :Success
# println(1)
return Inf
else
M_sol = #view sol[end, :]
MLR_sol = central_diff(T, M_sol)::Array{Float64,1}
error1 = get_error(MLR, MLR_sol)::Float64
error2 = get_error(M, M_sol)::Float64
total_error += error1 + error2
end
end
total_error
end
function main()
flush(stdout)
total_time = 0
best_costs = []
file_path = raw"F:\17-Fabric\17-Fabric (Smoothed) TG.csv"
data_dict = read_file(file_path)
dimension = 9
lb = [5, 47450, 0.0, 0.0, 24.36, 148010, 0.0, 0.0, 1e-5]
ub = [25.79, 167700, 5, 1, 58.95, 293890, 5, 1, 0.25]
problem = Problem(cost_func,dimension,lb,ub)
global_best_cost = Inf
println("-"^100)
println("Running PSO ...")
population = 50
max_iter = 1001
println("The population is: ", population)
println("Max iteration is:", max_iter)
for i in 1:50 # The number of calculation
start_time = Dates.now()
println("Current iteration is: ", string(i))
gbest, particles = PSO(problem, data_dict, max_iter=max_iter, population=population)
if gbest.cost < global_best_cost
global_best_cost = gbest.cost
global_best_position = gbest.position
end
end_time = Dates.now()
time_duration = round(end_time-start_time, Second)
total_time += time_duration.value
push!(best_costs, gbest.cost)
println()
println("The Best is:")
println(gbest.cost)
println(gbest.position)
println("The calculation time is: " * string(time_duration))
println()
println("-"^50)
end
println('-'^100)
println("Global best cost is: ", global_best_cost)
println("Global best position is: ", global_best_position)
println(total_time)
best_costs
end
#suppress_err begin
#time global best_costs = main()
end
So, what is the possible mechanism for this? Is there a way to avoid this problem? Because If I increase the population and max iterations of particles, the time increased would be extremely large and thus is unacceptable.
And what is the possible mechanism for speed up the program I mentioned above? How to trigger this mechanism?
As the parameters of an ODE optimizes it can completely change its characteristics. Your equation could be getting more stiff and require different ODE solvers. There are many other related ways, but you can see how changing parameters could give such a performance issue. It's best to use methods like AutoTsit5(Rodas5()) and the like in such estimation cases because it's hard to know or guess what the performance will be like, and thus adaptiveness in the method choice can be crucial.

2D curve fitting in Julia

I have an array Z in Julia which represents an image of a 2D Gaussian function. I.e. Z[i,j] is the height of the Gaussian at pixel i,j. I would like to determine the parameters of the Gaussian (mean and covariance), presumably by some sort of curve fitting.
I've looked into various methods for fitting Z: I first tried the Distributions package, but it is designed for a somewhat different situation (randomly selected points). Then I tried the LsqFit package, but it seems to be tailored for 1D fitting, as it is throwing errors when I try to fit 2D data, and there is no documentation I can find to lead me to a solution.
How can I fit a Gaussian to a 2D array in Julia?
The simplest approach is to use Optim.jl. Here is an example code (it was not optimized for speed, but it should show you how you can handle the problem):
using Distributions, Optim
# generate some sample data
true_d = MvNormal([1.0, 0.0], [2.0 1.0; 1.0 3.0])
const xr = -3:0.1:3
const yr = -3:0.1:3
const s = 5.0
const m = [s * pdf(true_d, [x, y]) for x in xr, y in yr]
decode(x) = (mu=x[1:2], sig=[x[3] x[4]; x[4] x[5]], s=x[6])
function objective(x)
mu, sig, s = decode(x)
try # sig might be infeasible so we have to handle this case
est_d = MvNormal(mu, sig)
ref_m = [s * pdf(est_d, [x, y]) for x in xr, y in yr]
sum((a-b)^2 for (a,b) in zip(ref_m, m))
catch
sum(m)
end
end
# test for an example starting point
result = optimize(objective, [1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
decode(result.minimizer)
Alternatively you could use constrained optimization e.g. like this:
using Distributions, JuMP, NLopt
true_d = MvNormal([1.0, 0.0], [2.0 1.0; 1.0 3.0])
const xr = -3:0.1:3
const yr = -3:0.1:3
const s = 5.0
const Z = [s * pdf(true_d, [x, y]) for x in xr, y in yr]
m = Model(solver=NLoptSolver(algorithm=:LD_MMA))
#variable(m, m1)
#variable(m, m2)
#variable(m, sig11 >= 0.001)
#variable(m, sig12)
#variable(m, sig22 >= 0.001)
#variable(m, sc >= 0.001)
function obj(m1, m2, sig11, sig12, sig22, sc)
est_d = MvNormal([m1, m2], [sig11 sig12; sig12 sig22])
ref_Z = [sc * pdf(est_d, [x, y]) for x in xr, y in yr]
sum((a-b)^2 for (a,b) in zip(ref_Z, Z))
end
JuMP.register(m, :obj, 6, obj, autodiff=true)
#NLobjective(m, Min, obj(m1, m2, sig11, sig12, sig22, sc))
#NLconstraint(m, sig12*sig12 + 0.001 <= sig11*sig22)
setvalue(m1, 0.0)
setvalue(m2, 0.0)
setvalue(sig11, 1.0)
setvalue(sig12, 0.0)
setvalue(sig22, 1.0)
setvalue(sc, 1.0)
status = solve(m)
getvalue.([m1, m2, sig11, sig12, sig22, sc])
In principle, you have a loss function
loss(μ, Σ) = sum(dist(Z[i,j], N([x(i), y(j)], μ, Σ)) for i in Ri, j in Rj)
where x and y convert your indices to points on the axes (for which you need to know the grid distance and offset positions), and Ri and Rj the ranges of the indices. dist is the distance measure you use, eg. squared difference.
You should be able to pass this into an optimizer by packing μ and Σ into a single vector:
pack(μ, Σ) = [μ; vec(Σ)]
unpack(v) = #views v[1:N], reshape(v[N+1:end], N, N)
loss_packed(v) = loss(unpack(v)...)
where in your case N = 2. (Maybe the unpacking deserves some optimization to get rid of unnecessary copying.)
Another thing is that we have to ensure that Σ is positive semidifinite (and hence also symmetric). One way to do that is to parametrize the packed loss function differently, and optimize over some lower triangular matrix L, such that Σ = L * L'. In the case N = 2, we can write this as
unpack(v) = v[1:2], LowerTriangular([v[3] zero(v[3]); v[4] v[5]])
loss_packed(v) = let (μ, L) = unpack(v)
loss(μ, L * L')
end
(This is of course prone to further optimization, such as expanding the multiplication directly in to loss). A different way is to specify the condition as constraints into the optimizer.
For the optimzer to work you probably have to get the derivative of loss_packed. Either have to find the manually calculate it (by a good choice of dist), or maybe more easily by using a log transformation (if you're lucky, you find a way to reduce it to a linear problem...). Alternatively you could try to find an optimizer that does automatic differentiation.

Static arguments to NLsolve.jl

I am trying to use NLsolve.jl to solve a set of 6 equations with known Jacobian matrix.
The unknowns are in the array U, so following the repository example I created a function fun! as
function fun!(F, U, k)
W = anotherFunction(U, k)
F[1] = U[1] + 4*k[1]*U[4] - W[1]
F[2] = U[2] - 4*k[2]*U[5] - W[2]
F[3] = U[3] - 4*k[3]*U[6] - W[3]
F[4] = U[1]*(U[4]*U[4]*U[4]*U[4]) -
U[2]*(U[5]*U[5]*U[5]*U[5]) -
U[3]*(U[6]*U[6]*U[6]*U[6])
F[5] = k[7]*(U[4]*U[4]*k[4] - 1 ) -
(k[8]*(U[5]*U[5]*k[5] - 1))
F[6] = k[7]*(U[4]*U[4]*k[4] - 1 ) -
(k[9]*(U[6]*U[6]*k[6] - 1))
end
where the values of W are computed throuhg another function. Similarly, the Jacobian reads
function jac!(J, U, k)
J = eye(6)
J[1,4] = 4*k[1]
J[2,5] = -4*k[2]
J[3,6] = -4*k[3]
J[4,1] = (U[4]*U[4]*U[4]*U[4])
J[4,2] = -(U[5]*U[5]*U[5]*U[5])
J[4,3] = -(U[6]*U[6]*U[6]*U[6])
J[4,4] = 4*U[1]*(U[4]*U[4]*U[4])
J[4,5] = -4*U[2]*(U[5]*U[5]*U[5])
J[4,6] = -4*U[3]*(U[6]*U[6]*U[6])
J[5,1] = 0.
J[5,2] = 0.
J[5,4] = 2*k[7]*U[4]*k[4]
J[5,5] = -2*k[8]*U[5]*k[5]
J[6,1] = 0.
J[6,3] = 0.
J[6,4] = 2*k[7]*U[4]*k[4]
J[6,6] = -2*k[9]*U[6]*k[6]
end
The k array is constant and defined before calling nlsove, therefore I create two more functions to enclose fun! and jac! as described here, here, and here
f_closure!(U) = fun!(F, U, k)
j_closure!(U) = jac!(J, U, k)
However, calling res = nlsolve(f_closure!, j_closure!, U) results in this error
ERROR: LoadError: MethodError: no method matching
(::#f_closure!#5{Array{Float64,1}})(::Array{Float64,1},
::Array{Float64,1}) Closest candidates are: f_closure!(::Any)
Is there a different way to pass the k array? Thanks!

Converting matlab code to R code

I was wondering how I can convert this code from Matlab to R code. It seems this is the code for midpoint method. Any help would be highly appreciated.
% Usage: [y t] = midpoint(f,a,b,ya,n) or y = midpoint(f,a,b,ya,n)
% Midpoint method for initial value problems
%
% Input:
% f - Matlab inline function f(t,y)
% a,b - interval
% ya - initial condition
% n - number of subintervals (panels)
%
% Output:
% y - computed solution
% t - time steps
%
% Examples:
% [y t]=midpoint(#myfunc,0,1,1,10); here 'myfunc' is a user-defined function in M-file
% y=midpoint(inline('sin(y*t)','t','y'),0,1,1,10);
% f=inline('sin(y(1))-cos(y(2))','t','y');
% y=midpoint(f,0,1,1,10);
function [y t] = midpoint(f,a,b,ya,n)
h = (b - a) / n;
halfh = h / 2;
y(1,:) = ya;
t(1) = a;
for i = 1 : n
t(i+1) = t(i) + h;
z = y(i,:) + halfh * f(t(i),y(i,:));
y(i+1,:) = y(i,:) + h * f(t(i)+halfh,z);
end;
I have the R code for Euler method which is
euler <- function(f, h = 1e-7, x0, y0, xfinal) {
N = (xfinal - x0) / h
x = y = numeric(N + 1)
x[1] = x0; y[1] = y0
i = 1
while (i <= N) {
x[i + 1] = x[i] + h
y[i + 1] = y[i] + h * f(x[i], y[i])
i = i + 1
}
return (data.frame(X = x, Y = y))
}
so based on the matlab code, do I need to change h in euler method (R code) to (b - a) / n to modify Euler code to midpoint method?
Note
Broadly speaking, I agree with the expressed comments; however, I decided to vote up this question. (now deleted) This is due to the existence of matconv that facilitates this process.
Answer
Given your code, we could use matconv in the following manner:
pacman::p_load(matconv)
out <- mat2r(inMat = "input.m")
The created out object will attempt to translate Matlab code into R, however, the job is far from finished. If you inspect the out object you will see that it requires further work. Simple statements are usually translated correctly with Matlab comments % replaced with # and so forth but more complex statements may require a more detailed investigation. You could then inspect respective line and attempt to evaluate them to see where further work may be required, example:
eval(parse(text=out$rCode[1]))
NULL
(first line is a comment so the output is NULL)

Optim using gradient Error: "no method matching"

I’m trying to optimize a function using one of the algorithms that require a gradient. Basically I’m trying to learn how to optimize a function using a gradient in Julia. I’m fairly confident that my gradient is specified correctly. I know this because the similarly defined Matlab function for the gradient gives me the same values as in Julia for some test values of the arguments. Also, the Matlab version using fminunc with the gradient seems to optimize the function fine.
However when I run the Julia script, I seem to get the following error:
julia> include("ex2b.jl")
ERROR: `g!` has no method matching g!(::Array{Float64,1}, ::Array{Float64,1})
while loading ...\ex2b.jl, in ex
pression starting on line 64
I'm running Julia 0.3.2 on a windows 7 32bit machine. Here is the code (basically a translation of some Matlab to Julia):
using Optim
function mapFeature(X1, X2)
degrees = 5
out = ones(size(X1)[1])
for i in range(1, degrees+1)
for j in range(0, i+1)
term = reshape( (X1.^(i-j) .* X2.^(j)), size(X1.^(i-j))[1], 1)
out = hcat(out, term)
end
end
return out
end
function sigmoid(z)
return 1 ./ (1 + exp(-z))
end
function costFunc_logistic(theta, X, y, lam)
m = length(y)
regularization = sum(theta[2:end].^2) * lam / (2 * m)
return sum( (-y .* log(sigmoid(X * theta)) - (1 - y) .* log(1 - sigmoid(X * theta))) ) ./ m + regularization
end
function costFunc_logistic_gradient!(theta, X, y, lam, m)
grad= X' * ( sigmoid(X * theta) .- y ) ./ m
grad[2:end] = grad[2:end] + theta[2:end] .* lam / m
return grad
end
data = readcsv("ex2data2.txt")
X = mapFeature(data[:,1], data[:,2])
m, n = size(data)
y = data[:, end]
theta = zeros(size(X)[2])
lam = 1.0
f(theta::Array) = costFunc_logistic(theta, X, y, lam)
g!(theta::Array) = costFunc_logistic_gradient!(theta, X, y, lam, m)
optimize(f, g!, theta, method = :l_bfgs)
And here is some of the data:
0.051267,0.69956,1
-0.092742,0.68494,1
-0.21371,0.69225,1
-0.375,0.50219,1
-0.51325,0.46564,1
-0.52477,0.2098,1
-0.39804,0.034357,1
-0.30588,-0.19225,1
0.016705,-0.40424,1
0.13191,-0.51389,1
0.38537,-0.56506,1
0.52938,-0.5212,1
0.63882,-0.24342,1
0.73675,-0.18494,1
0.54666,0.48757,1
0.322,0.5826,1
0.16647,0.53874,1
-0.046659,0.81652,1
-0.17339,0.69956,1
-0.47869,0.63377,1
-0.60541,0.59722,1
-0.62846,0.33406,1
-0.59389,0.005117,1
-0.42108,-0.27266,1
-0.11578,-0.39693,1
0.20104,-0.60161,1
0.46601,-0.53582,1
0.67339,-0.53582,1
-0.13882,0.54605,1
-0.29435,0.77997,1
-0.26555,0.96272,1
-0.16187,0.8019,1
-0.17339,0.64839,1
-0.28283,0.47295,1
-0.36348,0.31213,1
-0.30012,0.027047,1
-0.23675,-0.21418,1
-0.06394,-0.18494,1
0.062788,-0.16301,1
0.22984,-0.41155,1
0.2932,-0.2288,1
0.48329,-0.18494,1
0.64459,-0.14108,1
0.46025,0.012427,1
0.6273,0.15863,1
0.57546,0.26827,1
0.72523,0.44371,1
0.22408,0.52412,1
0.44297,0.67032,1
0.322,0.69225,1
0.13767,0.57529,1
-0.0063364,0.39985,1
-0.092742,0.55336,1
-0.20795,0.35599,1
-0.20795,0.17325,1
-0.43836,0.21711,1
-0.21947,-0.016813,1
-0.13882,-0.27266,1
0.18376,0.93348,0
0.22408,0.77997,0
Let me know if you guys need additional details. Btw, this relates to a coursera machine learning course if curious.
The gradient should not be a function to compute the gradient,
but a function to store it
(hence the exclamation mark in the function name, and the second argument in the error message).
The following seems to work.
function g!(theta::Array, storage::Array)
storage[:] = costFunc_logistic_gradient!(theta, X, y, lam, m)
end
optimize(f, g!, theta, method = :l_bfgs)
The same using closures and currying (version for those who got used to a function that returns the cost and gradient):
function cost_gradient(θ, X, y, λ)
m = length(y);
return (θ::Array) -> begin
h = sigmoid(X * θ); #(m,n+1)*(n+1,1) -> (m,1)
J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2);
end, (θ::Array, storage::Array) -> begin
h = sigmoid(X * θ); #(m,n+1)*(n+1,1) -> (m,1)
storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]];
end
end
Then, somewhere in the code:
initialθ = zeros(n,1);
f, g! = cost_gradient(initialθ, X, y, λ);
res = optimize(f, g!, initialθ, method = :cg, iterations = your_iterations);
θ = res.minimum;

Resources