2D curve fitting in Julia - julia

I have an array Z in Julia which represents an image of a 2D Gaussian function. I.e. Z[i,j] is the height of the Gaussian at pixel i,j. I would like to determine the parameters of the Gaussian (mean and covariance), presumably by some sort of curve fitting.
I've looked into various methods for fitting Z: I first tried the Distributions package, but it is designed for a somewhat different situation (randomly selected points). Then I tried the LsqFit package, but it seems to be tailored for 1D fitting, as it is throwing errors when I try to fit 2D data, and there is no documentation I can find to lead me to a solution.
How can I fit a Gaussian to a 2D array in Julia?

The simplest approach is to use Optim.jl. Here is some example code (it is not optimized for speed, but it should show how you can handle the problem):
using Distributions, Optim
# generate some sample data
true_d = MvNormal([1.0, 0.0], [2.0 1.0; 1.0 3.0])
const xr = -3:0.1:3
const yr = -3:0.1:3
const s = 5.0
const m = [s * pdf(true_d, [x, y]) for x in xr, y in yr]
decode(x) = (mu=x[1:2], sig=[x[3] x[4]; x[4] x[5]], s=x[6])
function objective(x)
mu, sig, s = decode(x)
try # sig might be infeasible so we have to handle this case
est_d = MvNormal(mu, sig)
ref_m = [s * pdf(est_d, [x, y]) for x in xr, y in yr]
sum((a-b)^2 for (a,b) in zip(ref_m, m))
catch
sum(m)
end
end
# test for an example starting point
result = optimize(objective, [1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
decode(result.minimizer)
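To make the structure of the objective above explicit without depending on Distributions or Optim, here is a minimal sketch of the same idea in plain Python (standard library only). The names gaussian2d and make_objective and the coarse 13×13 grid are assumptions made for illustration; the fallback value for an infeasible covariance mirrors the sum(m) returned in the catch branch.

```python
import math

def gaussian2d(x, y, mu, sig, scale):
    """Scaled bivariate normal density; raises ValueError if sig is not positive definite."""
    a, b, c = sig[0][0], sig[0][1], sig[1][1]      # sig = [[a, b], [b, c]]
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("covariance is not positive definite")
    dx, dy = x - mu[0], y - mu[1]
    # quadratic form (p - mu)' * inv(sig) * (p - mu), written out for a 2x2 matrix
    q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return scale * math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def decode(p):
    """Parameter vector -> (mean, covariance, scale), same layout as the Julia decode."""
    return (p[0], p[1]), [[p[2], p[3]], [p[3], p[4]]], p[5]

def make_objective(grid_x, grid_y, data):
    penalty = sum(sum(row) for row in data)        # fallback for infeasible covariances
    def objective(p):
        mu, sig, scale = decode(p)
        try:
            return sum((gaussian2d(x, y, mu, sig, scale) - data[i][j]) ** 2
                       for i, x in enumerate(grid_x)
                       for j, y in enumerate(grid_y))
        except ValueError:
            return penalty
    return objective

# synthetic image generated from known parameters (assumed values, as in the answer)
xs = [-3 + 0.5 * k for k in range(13)]
ys = list(xs)
true_p = [1.0, 0.0, 2.0, 1.0, 3.0, 5.0]
mu, sig, s = decode(true_p)
data = [[gaussian2d(x, y, mu, sig, s) for y in ys] for x in xs]
objective = make_objective(xs, ys, data)
```

At the true parameters the objective is zero, and any parameter vector whose covariance is not positive definite receives the constant fallback value, which is what lets a derivative-free optimizer walk away from infeasible points.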
Alternatively you could use constrained optimization e.g. like this:
using Distributions, JuMP, NLopt
true_d = MvNormal([1.0, 0.0], [2.0 1.0; 1.0 3.0])
const xr = -3:0.1:3
const yr = -3:0.1:3
const s = 5.0
const Z = [s * pdf(true_d, [x, y]) for x in xr, y in yr]
m = Model(solver=NLoptSolver(algorithm=:LD_MMA))
@variable(m, m1)
@variable(m, m2)
@variable(m, sig11 >= 0.001)
@variable(m, sig12)
@variable(m, sig22 >= 0.001)
@variable(m, sc >= 0.001)
function obj(m1, m2, sig11, sig12, sig22, sc)
est_d = MvNormal([m1, m2], [sig11 sig12; sig12 sig22])
ref_Z = [sc * pdf(est_d, [x, y]) for x in xr, y in yr]
sum((a-b)^2 for (a,b) in zip(ref_Z, Z))
end
JuMP.register(m, :obj, 6, obj, autodiff=true)
@NLobjective(m, Min, obj(m1, m2, sig11, sig12, sig22, sc))
@NLconstraint(m, sig12*sig12 + 0.001 <= sig11*sig22)
setvalue(m1, 0.0)
setvalue(m2, 0.0)
setvalue(sig11, 1.0)
setvalue(sig12, 0.0)
setvalue(sig22, 1.0)
setvalue(sc, 1.0)
status = solve(m)
getvalue.([m1, m2, sig11, sig12, sig22, sc])

In principle, you have a loss function
loss(μ, Σ) = sum(dist(Z[i,j], N([x(i), y(j)], μ, Σ)) for i in Ri, j in Rj)
where x and y convert your indices to points on the axes (for which you need to know the grid spacing and the offsets), and Ri and Rj are the ranges of the indices. dist is the distance measure you use, e.g. the squared difference.
You should be able to pass this into an optimizer by packing μ and Σ into a single vector:
pack(μ, Σ) = [μ; vec(Σ)]
unpack(v) = @views v[1:N], reshape(v[N+1:end], N, N)
loss_packed(v) = loss(unpack(v)...)
where in your case N = 2. (Maybe the unpacking deserves some optimization to get rid of unnecessary copying.)
Another thing is that we have to ensure that Σ is positive semidefinite (and hence also symmetric). One way to do that is to parametrize the packed loss function differently, and optimize over some lower triangular matrix L, such that Σ = L * L'. In the case N = 2, we can write this as
unpack(v) = v[1:2], LowerTriangular([v[3] zero(v[3]); v[4] v[5]])
loss_packed(v) = let (μ, L) = unpack(v)
loss(μ, L * L')
end
(This is of course open to further optimization, such as expanding the multiplication directly into loss.) A different way is to specify the condition as a constraint for the optimizer.
For the optimizer to work, you will probably need the derivative of loss_packed. You can either calculate it manually (helped by a good choice of dist), or, perhaps more easily, use a log transformation (if you're lucky, you will find a way to reduce it to a linear problem...). Alternatively, you could try to find an optimizer that does automatic differentiation.
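The Σ = L * L' parametrization described above can be sketched in plain Python for N = 2. This is an illustrative sketch only: the vector layout [μ1, μ2, l11, l21, l22] and the helper names are assumptions, and pack is only defined for a positive definite Σ.

```python
import math

def unpack(v):
    """Split v = [mu1, mu2, l11, l21, l22] into the mean and Sigma = L * L'."""
    mu = (v[0], v[1])
    l11, l21, l22 = v[2], v[3], v[4]
    # L = [[l11, 0], [l21, l22]], multiplied out by hand
    sigma = [[l11 * l11, l11 * l21],
             [l11 * l21, l21 * l21 + l22 * l22]]
    return mu, sigma

def pack(mu, sigma):
    """Inverse of unpack: 2x2 Cholesky factor of a positive definite sigma."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 * l21)
    return [mu[0], mu[1], l11, l21, l22]

# any real vector maps to a symmetric matrix with det = (l11 * l22)^2 >= 0
mu, sigma = unpack([0.3, -1.2, -0.7, 2.5, 0.1])
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]

# round trip for the covariance used in the first answer
v = pack((1.0, 0.0), [[2.0, 1.0], [1.0, 3.0]])
mu_rt, sigma_rt = unpack(v)
```

Because every choice of the five numbers yields a valid Σ, the optimizer can search over them without constraints; the price is that l11 and l22 are only determined up to sign.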

Related

Is there an R spline that matches the Schoenberg algorithm?

Is there an R algorithm that fits smoothing splines while minimizing L,
L = ρ ∑ (i from 0 to n-1) wi(yi-Si(xi))² + (1 - ρ) ∫ (x from 0 to x_(n-1)) (S''(x))² dx
Maybe it's possible with smooth.spline, but I did not manage to find the right parameters.
(The equation can be seen more clearly here : https://www.iro.umontreal.ca/~simardr/ssj/doc/html/umontreal/iro/lecuyer/functionfit/SmoothingCubicSpline.html)
pspline::smooth.Pspline( x = input_x,
y = input_y,
norder = 2,
method = 1,
spar = rho)
will do the job.

Lagrange Multiplier Method using NLsolve.jl

I would like to minimize a distance function ||dz - z|| under the constraint that g(z) = 0.
I wanted to use Lagrange Multipliers to solve this problem. Then I used NLsolve.jl to solve the non-linear equation that I end up with.
using NLsolve
using ForwardDiff
function ProjLagrange(dz, g::Function)
λ_init = ones(size(g(dz...),1))
initial_x = vcat(dz, λ_init)
function gradL!(F, x)
len_dz = length(dz)
z = x[1:len_dz]
λ = x[len_dz+1:end]
F = Array{Float64}(undef, length(x))
my_distance(z) = norm(dz - z)
∇f = z -> ForwardDiff.gradient(my_distance, z)
F[1:len_dz] = ∇f(z) .- dot(λ, g(z...))
if length(λ) == 1
F[end] = g(z...)
else
F[len_dz+1:end] = g(z)
end
end
nlsolve(gradL!, initial_x)
end
g_test(x1, x2, x3) = x1^2 + x2 - x2 + 5
z = [1000,1,1]
ProjLagrange(z, g_test)
But I always end up with Zero: [NaN, NaN, NaN, NaN] and Convergence: false.
Just so you know I have already solved the equation by using Optim.jl and minimizing the following function: Proj(z) = b * sum(abs.(g(z))) + a * norm(dz - z).
But I would really like to know if this is possible with NLsolve. Any help is greatly appreciated!
I started almost from scratch, working from Wikipedia's Lagrange multiplier page (which I found helpful), and the code below seems to work. I added a λ₀s argument to the ProjLagrange function so that it can accept a vector of initial multiplier values (I saw you initialized them to 1.0, but I thought this was more generic). Note that this has not been optimized for performance!
using NLsolve, ForwardDiff, LinearAlgebra
function ProjLagrange(x₀, λ₀s, gs, n_it)
# distance function from x₀ and its gradients
f(x) = norm(x - x₀)
∇f(x) = ForwardDiff.gradient(f, x)
# gradients of the constraints
∇gs = [x -> ForwardDiff.gradient(g, x) for g in gs]
# Form the auxiliary function and its gradients
ℒ(x,λs) = f(x) - sum(λ * g(x) for (λ,g) in zip(λs,gs))
∂ℒ∂x(x,λs) = ∇f(x) - sum(λ * ∇g(x) for (λ,∇g) in zip(λs,∇gs))
∂ℒ∂λ(x,λs) = [g(x) for g in gs]
# as a function of a single argument
nx = length(x₀)
ℒ(v) = ℒ(v[1:nx], v[nx+1:end])
∇ℒ(v) = vcat(∂ℒ∂x(v[1:nx], v[nx+1:end]), ∂ℒ∂λ(v[1:nx], v[nx+1:end]))
# and solve
v₀ = vcat(x₀, λ₀s)
nlsolve(∇ℒ, v₀, iterations=n_it)
end
# test
gs_test = [x -> x[1]^2 + x[2] - x[3] + 5]
λ₀s_test = [1.0]
x₀_test = [1000.0, 1.0, 1.0]
n_it = 100
res = ProjLagrange(x₀_test, λ₀s_test, gs_test, n_it)
gives me
julia> res = ProjLagrange(x₀_test, λ₀s_test, gs_test, n_it)
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [1000.0, 1.0, 1.0, 1.0]
* Zero: [9.800027199717013, -49.52026655749088, 51.520266557490885, -0.050887973682118504]
* Inf-norm of residuals: 0.000000
* Iterations: 10
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 11
* Jacobian Calls (df/dx): 11
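As a quick sanity check of the reported zero, the stationarity conditions ∇f(x) - λ ∇g(x) = 0 and g(x) = 0 can be evaluated by hand. The sketch below does this in plain Python with the values copied from the output above; the gradient of g is written out manually instead of using automatic differentiation, which is an assumption of this sketch and not part of the original code.

```python
import math

x0 = [1000.0, 1.0, 1.0]                      # point being projected
zero = [9.800027199717013, -49.52026655749088,
        51.520266557490885, -0.050887973682118504]
x, lam = zero[:3], zero[3]                   # solution and multiplier

def g(x):
    return x[0] ** 2 + x[1] - x[2] + 5       # constraint g(x) = 0

def grad_g(x):
    return [2 * x[0], 1.0, -1.0]             # hand-written gradient of g

# gradient of f(x) = ||x - x0|| is the unit vector pointing from x0 to x
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)))
grad_f = [(a - b) / dist for a, b in zip(x, x0)]

# residual of the Lagrange system: [grad_f - lam * grad_g, g]
residual = [gf - lam * gg for gf, gg in zip(grad_f, grad_g(x))] + [g(x)]
max_residual = max(abs(r) for r in residual)
```

The largest residual component is tiny, which agrees with the "Inf-norm of residuals: 0.000000" line in the solver output.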
I altered your code as below (see my comments in there) and got the following output. It doesn't throw NaNs anymore, reduces the objective and converges. Does this differ from your Optim.jl results?
Results of Nonlinear Solver Algorithm
* Algorithm: Trust-region with dogleg and autoscaling
* Starting Point: [1000.0, 1.0, 1.0, 1.0]
* Zero: [9.80003, -49.5203, 51.5203, -0.050888]
* Inf-norm of residuals: 0.000000
* Iterations: 10
* Convergence: true
* |x - x'| < 0.0e+00: false
* |f(x)| < 1.0e-08: true
* Function Calls (f): 11
* Jacobian Calls (df/dx): 11
using NLsolve
using ForwardDiff
using LinearAlgebra: norm, dot
using Plots
function ProjLagrange(dz, g::Function, n_it)
λ_init = ones(size(g(dz),1))
initial_x = vcat(dz, λ_init)
# These definitions can go outside as well
len_dz = length(dz)
my_distance = z -> norm(dz - z)
∇f = z -> ForwardDiff.gradient(my_distance, z)
# In fact, this is probably the most vital difference w.r.t. your proposal.
# We need the gradient of the constraints.
∇g = z -> ForwardDiff.gradient(g, z)
function gradL!(F, x)
z = x[1:len_dz]
λ = x[len_dz+1:end]
# `F` is memory allocated by NLsolve to store the residual of the
# respective call of `gradL!` and hence doesn't need to be allocated
# anew every time (or at all).
F[1:len_dz] = ∇f(z) .- λ .* ∇g(z)
F[len_dz+1:end] .= g(z)
end
return nlsolve(gradL!, initial_x, iterations=n_it, store_trace=true)
end
# Presumably something is wrong here: x2 - x2 is unlikely to be intended.
# I also made g callable directly with an array argument.
g_test = x -> x[1]^2 + x[2] - x[3] + 5
z = [1000,1,1]
n_it = 10000
res = ProjLagrange(z, g_test, n_it)
# Ugly reformatting here
trace = hcat([[state.iteration; state.fnorm; state.stepnorm] for state in res.trace.states]...)
plot(trace[1,:], trace[2,:], label="f(x) inf-norm", xlabel="steps")
[Figure: evolution of the inf-norm of f(x) over the iteration steps]
[Edit: Adapted solution to incorporate correct gradient computation for g()]

How to get a rolling window regression in julia

Say I have prices of a stock and I want to find the slope of the regression line in rolling manner with a given window size. How can I get it done in Julia? I want it to be really fast hence don't want to use a for loop.
You should not, in general, be worried about for loops in Julia, as they do not have the overhead of R or Python for loops. Thus, you only need to worry about asymptotic complexity and not the potentially large constant factor introduced by interpreter overhead.
Nevertheless, this operation can be done much more (asymptotically) efficiently with convolutions than with the naïve O(n²) slice-and-regress approach. The DSP.jl package provides convolution functionality. The following is an example with no intercept (it computes the rolling betas); support for an intercept should be possible by modifying the formulas.
using DSP, LinearAlgebra  # LinearAlgebra provides dot on Julia 0.7 and later
# Create some example x (signal) and y (stock prices)
# such that strength of signal goes up over time
const x = randn(100)
const y = (1:100) .* x .+ 100 .* randn(100)
# Create the rolling window
const window = Windows.rect(20)
# Compute linear least squares estimate (X^T X)^-1 X^T Y
const xᵗx = conv(x .* x, window)[length(window):end-length(window)+1]
const xᵗy = conv(x .* y, window)[length(window):end-length(window)+1]
const lls = xᵗy ./ xᵗx # desired beta
# Check result against naïve for loop
const βref = [dot(x[i:i+19], y[i:i+19]) / dot(x[i:i+19], x[i:i+19]) for i = 1:81]
@assert isapprox(βref, lls)
Edit to add: To support an intercept, i.e. X = [x 1], so that X^T X = [dot(x, x) sum(x); sum(x) w] where w is the window size, the formula for the inverse of a 2×2 matrix can be used to get (X^T X)^-1 = [w -sum(x); -sum(x) dot(x, x)]/(w * dot(x, x) - sum(x)^2). Thus, [β, α] = [w * dot(x, y) - sum(x) * sum(y), dot(x, x) * sum(y) - sum(x) * dot(x, y)] / (w * dot(x, x) - sum(x)^2). This can be translated to the following convolution code:
# Compute linear least squares estimate with intercept
const w = length(window)
const xᵗx = conv(x .* x, window)[w:end-w+1]
const xᵗy = conv(x .* y, window)[w:end-w+1]
const 𝟙ᵗx = conv(x, window)[w:end-w+1]
const 𝟙ᵗy = conv(y, window)[w:end-w+1]
const denom = w .* xᵗx - 𝟙ᵗx .^ 2
const α = (xᵗx .* 𝟙ᵗy .- 𝟙ᵗx .* xᵗy) ./ denom
const β = (w .* xᵗy .- 𝟙ᵗx .* 𝟙ᵗy) ./ denom
# Check vs. naive solution
const ref = vcat([([x[i:i+19] ones(20)] \ y[i:i+19])' for i = 1:81]...)
@assert isapprox([β α], ref)
Note that, for weighted least squares with a different window shape, some minor modifications will be needed to disentangle length(window) and sum(window) which are used interchangeably in the code above.
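For a rectangular window, the convolutions above are simply rolling sums, so the closed-form expressions for β and α can be checked with a short plain-Python sketch. The helper names rolling_sums and rolling_regression and the synthetic data are assumptions for illustration; a running total is used in place of conv, which is a swapped-in technique and may accumulate more floating-point error on long series.

```python
import random

def rolling_sums(values, w):
    """Sum over every window of length w, in O(n), using a running total."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= w:
            total -= values[i - w]           # drop the element leaving the window
        if i >= w - 1:
            out.append(total)
    return out

def rolling_regression(x, y, w):
    """Return (slope, intercept) of the least-squares line for each window."""
    sxx = rolling_sums([a * a for a in x], w)
    sxy = rolling_sums([a * b for a, b in zip(x, y)], w)
    sx = rolling_sums(x, w)
    sy = rolling_sums(y, w)
    fits = []
    for xx, xy, s_x, s_y in zip(sxx, sxy, sx, sy):
        denom = w * xx - s_x ** 2
        beta = (w * xy - s_x * s_y) / denom       # slope
        alpha = (xx * s_y - s_x * xy) / denom     # intercept
        fits.append((beta, alpha))
    return fits

# noise-free line y = 2x + 3, so every window should recover the same fit
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100)]
ys = [2.0 * a + 3.0 for a in xs]
fits = rolling_regression(xs, ys, 20)
```

With 100 points and a window of 20 this gives 81 (slope, intercept) pairs, matching the 1:81 range used in the reference computation above.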
Since I don't need an x variable, I created a numeric series (1 to the window size) to regress against. Using the RollingFunctions package, I was able to get rolling regressions with the function below.
using RollingFunctions
function rolling_regression(price,windowsize)
sum_x = sum(collect(1:windowsize))
sum_x_squared = sum(collect(1:windowsize).^2)
sum_xy = rolling(sum,price,windowsize,collect(1:windowsize))
sum_y = rolling(sum,price,windowsize)
b = ((windowsize*sum_xy) - (sum_x*sum_y))/(windowsize*sum_x_squared - sum_x^2)
c = [repeat([missing],windowsize-1);b]
end

Can't get performant Julia Turing model

I've tried to reproduce the model from a PyMC3 and Stan comparison. But it seems to run slowly and when I look at @code_warntype there are some things -- K and N I think -- which the compiler seemingly calls Any.
I've tried adding types -- though I can't add types to turing_model's arguments and things are complicated within turing_model because it's using autodiff variables and not the usuals. I put all the code into the function do_it to avoid globals, because they say that globals can slow things down. (It actually seems slower, though.)
Any suggestions as to what's causing the problem? The turing_model code is what's iterating, so that should make the most difference.
using Turing, StatsPlots, Random
sigmoid(x) = 1.0 / (1.0 + exp(-x))
function scale(w0::Float64, w1::Array{Float64,1})
scale = √(w0^2 + sum(w1 .^ 2))
return w0 / scale, w1 ./ scale
end
function do_it(iterations::Int64)::Chains
K = 10 # predictor dimension
N = 1000 # number of data samples
X = rand(N, K) # predictors (1000, 10)
w1 = rand(K) # weights (10,)
w0 = -median(X * w1) # 50% of elements for each class (number)
w0, w1 = scale(w0, w1) # unit length (euclidean)
w_true = [w0, w1...]
y = (w0 .+ (X * w1)) .> 0.0 # labels
y = [Float64(x) for x in y]
σ = 5.0
σm = [x == y ? σ : 0.0 for x in 1:K, y in 1:K]
@model turing_model(X, y, σ, σm) = begin
w0_pred ~ Normal(0.0, σ)
w1_pred ~ MvNormal(σm)
p = sigmoid.(w0_pred .+ (X * w1_pred))
@inbounds for n in 1:length(y)
y[n] ~ Bernoulli(p[n])
end
end
@time chain = sample(turing_model(X, y, σ, σm), NUTS(iterations, 200, 0.65));
# ϵ = 0.5
# τ = 10
# @time chain = sample(turing_model(X, y, σ), HMC(iterations, ϵ, τ));
return (w_true=w_true, chains=chain::Chains)
end
chain = do_it(1000)

Super-ellipse Point Picking

https://en.wikipedia.org/wiki/Superellipse
I have read the SO questions on how to point-pick from a circle and an ellipse.
How would one uniformly select random points from the interior of a super-ellipse?
More generally, how would one uniformly select random points from the interior of the curve described by an arbitrary super-formula?
https://en.wikipedia.org/wiki/Superformula
The discarding method is not considered a solution, as it is mathematically unenlightening.
In order to sample the superellipse, let's assume without loss of generality that a = b = 1. The general case can be then obtained by rescaling the corresponding axis.
The points in the first quadrant (positive x-coordinate and positive y-coordinate) can be then parametrized as:
x = r * ( cos(t) )^(2/n)
y = r * ( sin(t) )^(2/n)
with 0 <= r <= 1 and 0 <= t <= pi/2.
Now, we need to sample in r, t so that the sampling transformed into x, y is uniform. To this end, let's calculate the Jacobian of this transform:
dx*dy = (2/n) * r * (sin(2*t)/2)^(2/n - 1) dr*dt
= (1/n) * d(r^2) * d(f(t))
Here, we see that as for the variable r, it is sufficient to sample uniformly the value of r^2 and then transform back with a square root. The dependency on t is a bit more complicated. However, with some effort, one gets
f(t) = -(n/2) * 2F1(1/n, (n-1)/n, 1 + 1/n, cos(t)^2) * cos(t)^(2/n)
where 2F1 is the hypergeometric function.
In order to obtain uniform sampling in x,y, we need now to sample uniformly the range of f(t) for t in [0, pi/2] and then find the t which corresponds to this sampled value, i.e., to solve for t the equation u = f(t) where u is a uniform random variable sampled from [f(0), f(pi/2)]. This is essentially the same method as for r, nevertheless in that case one can calculate the inverse directly.
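The special case n = 2 (a circle) makes the recipe easy to test without the hypergeometric function: the angular factor (sin(2*t)/2)^(2/n - 1) is then constant, so f(t) is linear in t and t can be drawn uniformly, while r is obtained as the square root of a uniform variable. The following plain-Python sketch covers only this special case; the function name and the sample size are assumptions for illustration.

```python
import math
import random

def sample_quarter_disc(rng):
    """Uniform point in the first quadrant of the unit disc (superellipse, n = 2)."""
    r = math.sqrt(rng.random())              # r^2 is uniform on [0, 1)
    t = rng.uniform(0.0, math.pi / 2)        # for n = 2, f(t) is linear in t
    return r * math.cos(t), r * math.sin(t)

rng = random.Random(100)
points = [sample_quarter_disc(rng) for _ in range(20000)]

# for a uniform distribution, the disc of radius 1/2 holds 1/4 of the area,
# and the line x = y splits the quadrant into two equal halves
inner_fraction = sum(1 for x, y in points if x * x + y * y <= 0.25) / len(points)
upper_fraction = sum(1 for x, y in points if y > x) / len(points)
```

Both fractions should come out close to their expected values of 0.25 and 0.5; sampling r uniformly instead of r^2 would push the first one towards 0.5, with points clustering near the origin.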
One small issue with this approach is that the function f is not that well-behaved near zero - the infinite slope makes it quite challenging to find a root of u = f(t). To circumvent this, we can sample only the "upper part" of the first quadrant (i.e., area between lines x=y and x=0) and then obtain all the other points by symmetry (not only in the first quadrant but also for all the other ones).
An implementation of this method in Python could look like:
import numpy as np
from numpy.random import uniform, randint, seed
from scipy.optimize import brenth, ridder, bisect, newton
from scipy.special import gamma, hyp2f1
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
seed(100)
def superellipse_area(n):
    #https://en.wikipedia.org/wiki/Superellipse#Mathematical_properties
    inv_n = 1. / n
    return 4 * ( gamma(1 + inv_n)**2 ) / gamma(1 + 2*inv_n)
def sample_superellipse(n, num_of_points = 2000):
    def f(n, x):
        inv_n = 1. / n
        return -(n/2)*hyp2f1(inv_n, 1 - inv_n, 1 + inv_n, x)*(x**inv_n)
    lb = f(n, 0.5)
    ub = f(n, 0.0)
    points = [None for idx in range(num_of_points)]
    for idx in range(num_of_points):
        r = np.sqrt(uniform())
        v = uniform(lb, ub)
        w = bisect(lambda w: f(n, w**n) - v, 0.0, 0.5**(1/n))
        z = w**n
        x = r * z**(1/n)
        y = r * (1 - z)**(1/n)
        if uniform(-1, 1) < 0:
            y, x = x, y
        x = (2*randint(0, 2) - 1)*x
        y = (2*randint(0, 2) - 1)*y
        points[idx] = [x, y]
    return points
def plot_superellipse(ax, n, points):
    coords_x = [p[0] for p in points]
    coords_y = [p[1] for p in points]
    ax.set_xlim(-1.25, 1.25)
    ax.set_ylim(-1.25, 1.25)
    ax.text(-1.1, 1, '{n:.1f}'.format(n = n), fontsize = 12)
    ax.scatter(coords_x, coords_y, s = 0.6)
params = np.array([[0.5, 1], [2, 4]])
fig = plt.figure(figsize = (6, 6))
gs = gridspec.GridSpec(*params.shape, wspace = 1/32., hspace = 1/32.)
n_rows, n_cols = params.shape
for i in range(n_rows):
    for j in range(n_cols):
        n = params[i, j]
        ax = plt.subplot(gs[i, j])
        if i == n_rows-1:
            ax.set_xticks([-1, 0, 1])
        else:
            ax.set_xticks([])
        if j == 0:
            ax.set_yticks([-1, 0, 1])
        else:
            ax.set_yticks([])
        #ensure that the ellipses have similar point density
        num_of_points = int(superellipse_area(n) / superellipse_area(2) * 4000)
        points = sample_superellipse(n, num_of_points)
        plot_superellipse(ax, n, points)
fig.savefig('fig.png')
This produces a 2×2 grid of scatter plots (saved as fig.png), one panel for each of n = 0.5, 1, 2 and 4.
