Integration of interaction tensor - julia

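I need to calculate the following integral several thousand times per time step. Written in the notation of the code below, it is

$$T_{ijkl} = \frac{1}{S}\int_0^{\pi}\int_0^{2\pi} \left(A^{-1}\right)_{ik}\,\xi_j\,\xi_l\,\sin\theta\;d\phi\,d\theta, \qquad S = \int_0^{\pi}\int_0^{2\pi}\sin\theta\;d\phi\,d\theta,$$

where $A$ is the $4\times 4$ matrix whose upper-left $3\times 3$ block is $C_{rstu}\,\xi_s\,\xi_u$ (indexed by $r,t$), whose fourth row and fourth column hold $\xi$, and whose $(4,4)$ entry is zero, and

$$\xi = \left(\frac{\sin\theta\cos\phi}{a_1},\ \frac{\sin\theta\sin\phi}{a_2},\ \frac{\cos\theta}{a_3}\right).$$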
So far I have implemented in Julia as:
using StaticArrays, BenchmarkTools

function interactiontensor(C, a1, a2, a3, ϕ, θ)
    n1,n2 = 100,50
    T = fill(0.0,3,3,3,3)
    Av = zeros(4,4)
    invAv = similar(Av)
    xi = Vector{Float64}(undef, 3)
    @inbounds for p ∈ 1:n1
        sinθp = sind(θ[p])
        cosθp = cosd(θ[p])
        for q ∈ 1:n2
            sinϕq = sind(ϕ[q])
            cosϕq = cosd(ϕ[q])
            # -- Director cosines
            xi[1] = sinθp*cosϕq/a1
            xi[2] = sinθp*sinϕq/a2
            xi[3] = cosθp/a3
            # -- Upper-left 3x3 block of Av
            Christoffel!(Av, C, xi)
            # -- Fourth row and column of Av
            fillAv!(Av, xi)
            invAv = inv(SMatrix{4,4}(Av))
            # -- Accumulate the integrand into T
            tensorT!(T, invAv, xi, sinθp)
            surface += sinθp
        end
    end
    return T ./= surface
end

@inline function Christoffel!(Av,C,xi)
    @inbounds for t ∈ 1:3, r ∈ 1:3
        aux = zero(eltype(C))
        for u ∈ 1:3, s ∈ 1:3
            aux += C[r, s, t, u] * xi[s] * xi[u]
        end
        Av[r, t] = aux
    end
end

@inline function tensorT!(T,invAv,xi,sinθp)
    @inbounds for k ∈ 1:3, i ∈ 1:3
        aux = invAv[i, k]
        for l ∈ 1:3, j ∈ 1:3
            T[i, j, k, l] += aux * xi[j] * xi[l] * sinθp
        end
    end
end

@inline function fillAv!(Av, xi)
    @inbounds for i ∈ 1:3
        xi0 = xi[i]
        Av[i, 4] = xi0
        Av[4, i] = xi0
    end
end

n1,n2 = 100,100
step = π/n1
dθ,dϕ = π/n1, 2π/n2
θ = rad2deg.(range(dθ, stop = pi, length = n1))
ϕ = rad2deg.(range(dϕ, stop = 2pi, length = n2))
C = @SArray rand(3,3,3,3)
@btime interactiontensor($C, $10.0, $5.0, $1.0, $ϕ, $θ);
#   544.795 μs (4 allocations: 1.08 KiB)
Given the number of times I ideally need to compute this integral, is there any optimization to my implementation, or an alternative approach, to considerably reduce the computational cost?

Here are some suggestions:
sinθp, cosθp = sincosd(θ[p]), i.e. compute sine and cosine in one call.
Initialize xi = @SVector zeros(3) as a static vector and use Setfield.jl to update its entries in each iteration, e.g. @set! xi[1] = sinθp*cosϕq/a1. (Plain @set only returns a modified copy; @set! also rebinds xi.)
Load the package LoopVectorization.jl and use the @avx macro (very roughly speaking, similar to @simd) to speed up the loops in Christoffel!, tensorT! and fillAv!.
On my machine these changes reduce the computation time by more than a factor of 5 (relative to the original function in the OP). The biggest chunk is due to @avx; the second point above accounts for about 30%.
julia> @btime interactiontensor_original($C, $10.0, $5.0, $1.0, $ϕ, $θ);
  661.655 μs (5 allocations: 1.28 KiB)
julia> @btime interactiontensor_optimized($C, $10.0, $5.0, $1.0, $ϕ, $θ);
  125.352 μs (4 allocations: 1.17 KiB)
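The first suggestion can be checked in isolation with Base alone. The sketch below rebuilds the θ grid as in the question and confirms that one sincosd call per angle gives the same values as separate sind and cosd calls. The tolerance and the idea of tabulating the pairs up front are illustrative assumptions, not part of the benchmarked code.

```julia
# Grid of polar angles in degrees, built as in the question.
n1 = 100
θ = rad2deg.(range(π / n1, stop = π, length = n1))

# Two separate calls per angle, as in the original function.
separate = [(sind(t), cosd(t)) for t in θ]

# One combined call per angle, as in the first suggestion.
combined = sincosd.(θ)

# Compare both tables entry by entry (absolute tolerance is an assumption).
same = all(isapprox(c[1], s[1]; atol = 1e-12) && isapprox(c[2], s[2]; atol = 1e-12)
           for (c, s) in zip(combined, separate))
println(same)
```

Because θ and ϕ do not change between calls, a table like combined could also be computed once and reused, though whether that pays off depends on the rest of the loop.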
Here is the full modified code (note that I commented out the lines involving surface, which isn't defined in the OP):
using StaticArrays, Setfield, LoopVectorization

function interactiontensor_optimized(C, a1, a2, a3, ϕ, θ)
    n1,n2 = 100,50
    T = fill(0.0,3,3,3,3)
    Av = zeros(4,4)
    invAv = similar(Av)
    xi = @SVector zeros(3)
    @inbounds for p ∈ 1:n1
        sinθp, cosθp = sincosd(θ[p])
        for q ∈ 1:n2
            sinϕq, cosϕq = sincosd(ϕ[q])
            # -- Director cosines (@set! rebinds xi to the updated copy)
            @set! xi[1] = sinθp*cosϕq/a1
            @set! xi[2] = sinθp*sinϕq/a2
            @set! xi[3] = cosθp/a3
            Christoffel!(Av, C, xi)
            fillAv!(Av, xi)
            invAv = inv(SMatrix{4,4}(Av))
            tensorT!(T, invAv, xi, sinθp)
            # surface += sinθp
        end
    end
    return T #./= surface
end

@inline function Christoffel!(Av,C,xi)
    @avx for t ∈ 1:3, r ∈ 1:3
        aux = zero(eltype(C))
        for u ∈ 1:3, s ∈ 1:3
            aux += C[r, s, t, u] * xi[s] * xi[u]
        end
        Av[r, t] = aux
    end
end

@inline function tensorT!(T,invAv,xi,sinθp)
    @avx for k ∈ 1:3, i ∈ 1:3
        aux = invAv[i, k]
        for l ∈ 1:3, j ∈ 1:3
            T[i, j, k, l] += aux * xi[j] * xi[l] * sinθp
        end
    end
end

@inline function fillAv!(Av, xi)
    @avx for i ∈ 1:3
        xi0 = xi[i]
        Av[i, 4] = xi0
        Av[4, i] = xi0
    end
end


Changing Sampled Parameter Type in Turing.jl Model

This is my Julia code to simulate data and sample from a Turing.jl model:
using LinearAlgebra, Distributions, StatsBase
using Turing, FillArrays, DynamicHMC, LabelledArrays
using NNlib, GLM
using CSV, DataFrames

function generate_hmnl_data(R::Int=100, S::Int=30, C::Int=3,
                            Theta::Array{Float64, 2}=ones(2, 4),
                            Sigma::Array{Float64, 2}=Matrix(Diagonal(fill(0.1, 4))))
    K = size(Theta, 2)
    G = size(Theta, 1)
    Y = Array{Int64}(undef, R, S)
    X = randn(R, S, C, K)
    Z = Array{Float64}(undef, G, R)
    Z[1, :] .= 1
    if G > 1
        Z[2:G, :] = randn(R * (G-1))
    end
    Beta = Array{Float64}(undef, K, R)
    for r in 1:R
        println(Z[:, r])
        Beta[:, r] = rand(MvNormal(Theta' * Z[:, r], Sigma))
        for s in 1:S
            Y[r, s] = sample(1:C, Weights(exp.(X[r, s, :, :] * Beta[:, r])))
        end
    end
    return (R=R, S=S, C=C, K=K, G=G, Y=Y, X=X, Z=Z,
            beta_true=Beta, Theta_true=Theta, Sigma_true=Sigma)
end

d1 = generate_hmnl_data()

@model function hmnl(G::Int, Y::Matrix{Int64}, X::Array{Float64}, Z::Matrix{Float64})
    R, S, C, K = size(X)
    Theta = zeros(K, G)
    for k in 1:K
        for g in 1:G
            Theta[k, g] ~ Normal(0, 10)
        end
    end
    Sigma ~ InverseWishart(K, diagm(ones(K)))
    Beta = zeros(K, R)
    for r in 1:R
        Beta[:, r] ~ MvNormal(Theta * Z[:, r], Sigma)
        println(typeof(Beta[:, r]))
        for s in 1:S
            beta_r = copy(Beta[:, r])
            beta_r = convert(Vector{Float64}, beta_r)
            ut_rs = X[r, s, :, :] * beta_r
            v = softmax(ut_rs)
            Y[r, s] ~ Categorical(v)
        end
    end
end

sampler = HMC(.05, 10)
test_mod = hmnl(d1.G, d1.Y, d1.X, d1.Z)
chains = sample(test_mod, sampler, 1_000)
I get this error when I try to sample from the model: MethodError: no method matching float(::Type{Any}). The sampling statement Beta[:, r] ~ MvNormal(Theta * Z[:, r], Sigma) changes Beta[:, r] to type Vector{Any}.
I have tried
beta_r = copy(Beta[:, r])
beta_r = convert(Vector{Float64}, beta_r)
ut_rs = X[r, s, :, :] * beta_r
But then I get this error instead:
ERROR: TypeError: in typeassert, expected Float64, got a value of type ForwardDiff.Dual{Nothing, Float64, 12}
So it's messing with Turing AD somehow. I'm new to Turing and can't understand the right way to do this.
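I'm reposting an answer from Tor Fjelde, which I received on GitHub. For Turing to work, the types in your model must be inferable, and the containers you sample into must be able to hold the number type used by automatic differentiation. I wasn't doing that: zeros(K, G) and zeros(K, R) are fixed to Float64, so they cannot store the ForwardDiff.Dual values that appear during sampling.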
This function worked:
@model function hmnl(G::Int, Y::Matrix{Int64}, X::Array{Float64}, Z::Matrix{Float64}, ::Type{T} = Float64) where {T}
    R, S, C, K = size(X)
    Theta = zeros(T, K, G)
    for k in 1:K
        for g in 1:G
            Theta[k, g] ~ Normal(0, 10)
        end
    end
    Sigma ~ InverseWishart(K, diagm(ones(K)))
    Beta = zeros(T, K, R)
    for r in 1:R
        Beta[:, r] ~ MvNormal(Theta * Z[:, r], Sigma)
        println(typeof(Beta[:, r]))
        for s in 1:S
            ut_rs = X[r, s, :, :] * Beta[:, r]
            v = softmax(ut_rs)
            Y[r, s] ~ Categorical(v)
        end
    end
end
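Why the element type of the container matters can be shown without Turing. The sketch below uses ToyDual, a hypothetical stand-in for an AD number such as ForwardDiff.Dual, and make_beta, a hypothetical helper that mirrors zeros(T, K, R) in the working model. Neither name belongs to Turing or ForwardDiff.

```julia
# ToyDual is a hypothetical stand-in for an AD number type; it carries a
# value and a derivative and deliberately has no conversion to Float64.
struct ToyDual <: Real
    value::Float64
    deriv::Float64
end

# zeros(T, dims...) needs zero(T) to be defined for the element type.
Base.zero(::Type{ToyDual}) = ToyDual(0.0, 0.0)

# Hypothetical helper mirroring `Beta = zeros(T, K, R)` in the model.
make_beta(::Type{T}, K, R) where {T} = zeros(T, K, R)

# A Float64 container cannot store the AD-like number ...
beta_float = make_beta(Float64, 2, 3)
float_store_failed = try
    beta_float[1, 1] = ToyDual(1.0, 1.0)
    false
catch
    true
end

# ... while a container built with T = ToyDual accepts it.
beta_dual = make_beta(ToyDual, 2, 3)
beta_dual[1, 1] = ToyDual(1.0, 1.0)

println(float_store_failed)
println(beta_dual[1, 1].deriv)
```

In the model above, the ::Type{T} = Float64 argument plays the role of the first argument of make_beta, which lets the element type be changed when the sampler needs to differentiate through the model.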

What do multiple objective functions mean in Julia JuMP?

I have multiple objective functions for the same model in Julia JuMP, created by calling @objective in a for loop. What does it mean to have multiple objective functions in JuMP? Which objective is minimized, or are all the objectives minimized jointly? If so, how are they minimized jointly?
using JuMP
using MosekTools

K = 3
N = 2
penalties = [1.0, 3.9, 8.7]

function fac1(r::Number, i::Number, l::Number)
    fac1 = 1.0
    for m in 0:r-1
        fac1 *= (i-m)*(l-m)
    end
    return fac1
end

function fac2(r::Number, i::Number, l::Number, tau::Float64)
    return tau ^ (i + l - 2r + 1)/(i + l - 2r + 1)
end

function Q_r(i::Number, l::Number, r::Number, tau::Float64)
    if i >= r && l >= r
        return 2 * fac1(r, i, l) * fac2(r, i, l, tau)
    end
    return 0.0
end

function Q(i::Number, l::Number, tau::Number)
    elem = 0
    for r in 0:N
        elem += penalties[r + 1] * Q_r(i, l, r, tau)
    end
    return elem
end

# discrete segment starting times
mat = Array{Float64, 3}(undef, K, N+1, N+1)
function Q_mat()
    for k in 0:K-1
        for i in 1:N+1
            for j in 1:N+1
                mat[k+1, i, j] = Q(i, j, convert(Float64, k))
            end
        end
    end
    return mat
end

function A_tau(r::Number, n::Number, tau::Float64)
    fac = 1
    for m in 1:r
        fac *= (n - (m - 1))
    end
    if n >= r
        return fac * tau ^ (n - r)
    end
    return 0.0
end

function A_tau_mat(tau::Float64)
    mat = Array{Float64, 2}(undef, N+1, N+1)
    for i in 1:N+1
        for j in 1:N+1
            mat[i, j] = A_tau(i, j, tau)
        end
    end
    return mat
end

function A_0(r::Number, n::Number)
    if r == n
        fac = 1
        for m in 1:r
            fac *= r - (m - 1)
        end
        return fac
    end
    return 0.0
end

m = Model(optimizer_with_attributes(Mosek.Optimizer, "QUIET" => false, "INTPNT_CO_TOL_DFEAS" => 1e-7))
@variable(m, A[i=1:K+1,j=1:K,k=1:N+1,l=1:N+1])
@variable(m, p[i=1:K+1,j=1:N+1])
# constraint difference might be a small fractional difference.
# assuming that time difference is 1 second starting from 0.
for i in 1:K
    @constraint(m, -A_tau_mat(convert(Float64, i-1)) * p[i] .+ A_tau_mat(convert(Float64, i-1)) * p[i+1] .== [0.0, 0.0, 0.0])
end
for i in 1:K+1
    @constraint(m, A_tau_mat(convert(Float64, i-1)) * p[i] .== [1.0 12.0 13.0])
end
@constraint(m, A_tau_mat(convert(Float64, K+1)) * p[K+1] .== [0.0 0.0 0.0])
for i in 1:K+1
    @objective(m, Min, p[i]' * Q_mat()[i] * p[i])
end
println("p value is ", value.(p))
println(A_tau_mat(0.0), A_tau_mat(1.0), A_tau_mat(2.0))
With standard JuMP, a model can have only one objective function at a time. Calling the @objective macro again simply overwrites the previous objective.
Consider the following code:
julia> using JuMP, GLPK
julia> m = Model(GLPK.Optimizer);
julia> @variable(m,x >= 0)
julia> @objective(m, Max, 2x)
2 x
julia> @objective(m, Min, 2x)
2 x
julia> println(m)
Min 2 x
Subject to
 x >= 0.0
As the printed model shows, only the last objective (Min 2 x) remains.
That said, there is an area of optimization called multi-criteria (multi-objective) optimization, where the goal is to find the Pareto frontier.
There is a Julia package for multi-criteria optimization named MultiJuMP. Here is some sample code:
using MultiJuMP, JuMP
using Clp
const mmodel = multi_model(Clp.Optimizer, linear = true)
const y = @variable(mmodel, 0 <= y <= 10.0)
const z = @variable(mmodel, 0 <= z <= 10.0)
@constraint(mmodel, y + z <= 15.0)
const exp_obj1 = @expression(mmodel, -y +0.05 * z)
const exp_obj2 = @expression(mmodel, 0.05 * y - z)
const obj1 = SingleObjective(exp_obj1)
const obj2 = SingleObjective(exp_obj2)
const multim = get_multidata(mmodel)
multim.objectives = [obj1, obj2]
optimize!(mmodel, method = WeightedSum())
This library also supports plotting of the Pareto frontier.
The disadvantage is that as of today it does not seem to be actively maintained (however it works with the current Julia and JuMP versions).
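To give a feel for what a weighted-sum method does, here is a minimal plain-Julia sketch that needs no solver. It reuses the two objectives and the feasible region from the sample above, but replaces the LP solver with a brute-force search over a coarse grid. The grid, the helper weighted_sum_argmin and the chosen weights are assumptions for illustration, not MultiJuMP's implementation.

```julia
# Objectives and feasible region copied from the sample above;
# both objectives are minimized.
obj1(y, z) = -y + 0.05z
obj2(y, z) = 0.05y - z
feasible(y, z) = 0 <= y <= 10 && 0 <= z <= 10 && y + z <= 15

# Hypothetical coarse grid of candidate points (a stand-in for a real solver).
candidates = [(y, z) for y in 0:0.5:10 for z in 0:0.5:10 if feasible(y, z)]

# Weighted-sum scalarization: collapse both objectives into a single one
# with weight w on obj1 and (1 - w) on obj2, then minimize that.
function weighted_sum_argmin(w)
    scores = [w * obj1(y, z) + (1 - w) * obj2(y, z) for (y, z) in candidates]
    return candidates[argmin(scores)]
end

best_for_obj1 = weighted_sum_argmin(1.0)  # all weight on obj1
best_for_obj2 = weighted_sum_argmin(0.0)  # all weight on obj2
println(best_for_obj1, " ", best_for_obj2)
```

The two extreme weights recover the minimizers of the individual objectives. Intermediate weights trade one objective against the other, and each such minimizer is a point on the Pareto frontier.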

Can't seem to get NLsolve to converge in Julia. Can you suggest any tips?

I'm trying to solve a life-cycle problem in economics using Julia, but I'm having trouble with NLsolve. The model boils down to solving a two-equation system to find optimal leisure hours and capital stock in each working period. After retirement, the economic agent sets leisure = 1, and I only need to solve a single nonlinear equation for capital. This part works fine. It's solving the two-equation system that seems to break down.
I'm fairly new to Julia and to programming in general, so any advice would be very helpful. Advice, pointers and recommendations on all aspects of the code would also be greatly appreciated. The model is solved backwards from the final time period.
My attempt
using Parameters
using Roots
using Plots
using NLsolve
using ForwardDiff

Model = @with_kw (α = 0.66,
                  δ = 0.02,
                  τ = 0.015,
                  β = 1/1.01,
                  T = 70,
                  Ret = 40)

function du_c(c, l, η=2, γ=2)
    if c>0 && l>0
        return (c+1e-6)^(-η) * l^((1-η)*γ)
    end
    return Inf
end

function du_l(c, l, η=2, γ=2)
    if l>0 && c>0
        return γ * (c+1e-6)^(1-η) * l^(γ*(1-η)-1)
    end
    return Inf
end

function create_euler_work(x, y, m, k, l, r, w, t)
    # x = todays capital, y = leisure
    @unpack α, β, τ, δ, T, Ret = m
    c_1 = x*(1+r) + (1-τ)*w*(1-y) - k[t+1]
    c_2 = k[t+1]*(1+r) + (1-τ)*w*(1-l[t+1]) - k[t+2]
    return du_c(c_1,y) - β*(1+r)*du_c(c_2,l[t+1])
end

function create_euler_retire(x, m, k, r, b, t)
    # Holds at time periods Ret onwards
    @unpack α, β, τ, δ, T, Ret = m
    c_1 = x*(1+r) + b - k[t+1]
    c_2 = k[t+1]*(1+r) + b - k[t+2]
    return du_c(c_1,1) - β*(1+r)*du_c(c_2,1)
end

function create_euler_lyw(x, y, m, k, r, w, b, t)
    # x = todays capital, y = leisure
    @unpack α, β, τ, δ, T, Ret = m
    c_1 = x*(1+r) + (1-τ)*w*(1-y) - k[t+1]
    c_2 = k[t+1]*(1+r) + b - k[t+2]
    return du_c(c_1,y) - β*(1+r)*du_c(c_2,1)
end

function create_foc(x, y, m, k, r, w, t)
    # x = todays capital, y = leisure
    @unpack α, β, τ, δ, T = m
    c = x*(1+r) + (1-τ)*w*(1-y) - k[t+1]
    return du_l(c,y) - (1-τ)*w*du_c(c,y)
end

function life_cycle(m, guess, r, w, b, initial)
    @unpack α, β, τ, δ, T, Ret = m
    k = zeros(T+1);
    l = zeros(T);
    k[T] = guess
    println("Period t = $(T+1) Retirement, k = $(k[T+1]), l.0 = NA")
    println("Period t = $T Retirement, k = $(k[T]), l = 1.0")
    ########################## Retirement ################################
    for t in T-1:-1:Ret+1
        euler(x) = create_euler_retire(x, m, k, r, b, t)
        k[t] = find_zero(euler, (0,100))
        l[t] = 1
        println("Period t = $t Retirement, k = $(k[t]), l = $(l[t])")
    end
    ###################### Retirement Year #############################
    for t in Ret:Ret
        euler(x,y) = create_euler_lyw(x, y, m, k, r, w, b, t)
        foc(x,y) = create_foc(x, y, m, k, r, w, t)
        function f!(F, x)
            F[1] = euler(x[1], x[2])
            F[2] = foc(x[1], x[2])
        end
        res = nlsolve(f!, [5; 0.7], autodiff = :forward)
        # res.zero holds the solution vector found by nlsolve
        k[t] = res.zero[1]
        l[t] = res.zero[2]
        println("Period t = $t Working, k = $(k[t]), l = $(l[t])")
    end
    ############################ Working ###############################
    for t in Ret-1:-1:1
        euler(x,y) = create_euler_work(x, y, m, k, l, r, w, t)
        foc(x,y) = create_foc(x, y, m, k, r, w, t)
        function f!(F, x)
            F[1] = euler(x[1], x[2])
            F[2] = foc(x[1], x[2])
        end
        res = nlsolve(f!, [5; 0.7], autodiff = :forward)
        k[t] = res.zero[1]
        l[t] = res.zero[2]
        println("Period t = $t Working, k = $(k[t]), l = $(l[t])")
    end
    return k[1] - initial, k, l
end

m = Model();
residual, k, l = life_cycle(m, 0.3, 0.03, 1.0, 0.0, 0.0)
The code seems to break in period 35 with the error "During the resolution of the nonlinear system, the evaluation of following equations resulted in a non-finite number: [1,2]". However, the solutions already start to look wrong in period 37.
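One thing that can be read directly off the code is that du_c and du_l return Inf whenever consumption or leisure is non-positive, so any trial point with negative consumption makes both residuals non-finite. The sketch below reproduces that in isolation. The helper consumption and the trial values are hypothetical, and whether this is what actually happens in period 35 is an assumption that would need checking.

```julia
# Copy of du_c from the post, so that this sketch is self-contained.
function du_c(c, l, η=2, γ=2)
    if c > 0 && l > 0
        return (c + 1e-6)^(-η) * l^((1 - η) * γ)
    end
    return Inf
end

# Hypothetical helper (not in the post): consumption in a working period
# for capital x today, leisure y and capital k_next tomorrow.
consumption(x, y, k_next; r = 0.03, w = 1.0, τ = 0.015) =
    x * (1 + r) + (1 - τ) * w * (1 - y) - k_next

c_ok  = consumption(5.0, 0.7, 4.0)  # resources exceed next-period capital
c_bad = consumption(0.5, 0.7, 4.0)  # next-period capital exceeds resources

println(c_ok, " -> ", isfinite(du_c(c_ok, 0.7)))
println(c_bad, " -> ", isfinite(du_c(c_bad, 0.7)))
```

If a solver step lands on a point like the second one, the residual vector contains Inf. A check such as isfinite on c_1 and c_2 inside f! would show whether that is the case here.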

Can't get performant Julia Turing model

I've tried to reproduce the model from a PyMC3 and Stan comparison. But it seems to run slowly, and when I look at @code_warntype there are some things -- K and N, I think -- which the compiler seemingly infers as Any.
I've tried adding types -- though I can't add types to turing_model's arguments and things are complicated within turing_model because it's using autodiff variables and not the usuals. I put all the code into the function do_it to avoid globals, because they say that globals can slow things down. (It actually seems slower, though.)
Any suggestions as to what's causing the problem? The turing_model code is what's iterating, so that should make the most difference.
using Turing, StatsPlots, Random, Statistics

sigmoid(x) = 1.0 / (1.0 + exp(-x))

function scale(w0::Float64, w1::Array{Float64,1})
    scale = √(w0^2 + sum(w1 .^ 2))
    return w0 / scale, w1 ./ scale
end

function do_it(iterations::Int64)
    K = 10 # predictor dimension
    N = 1000 # number of data samples
    X = rand(N, K) # predictors (1000, 10)
    w1 = rand(K) # weights (10,)
    w0 = -median(X * w1) # 50% of elements for each class (number)
    w0, w1 = scale(w0, w1) # unit length (euclidean)
    w_true = [w0, w1...]
    y = (w0 .+ (X * w1)) .> 0.0 # labels
    y = [Float64(x) for x in y]
    σ = 5.0
    σm = [x == y ? σ : 0.0 for x in 1:K, y in 1:K]
    @model turing_model(X, y, σ, σm) = begin
        w0_pred ~ Normal(0.0, σ)
        w1_pred ~ MvNormal(σm)
        p = sigmoid.(w0_pred .+ (X * w1_pred))
        @inbounds for n in 1:length(y)
            y[n] ~ Bernoulli(p[n])
        end
    end
    @time chain = sample(turing_model(X, y, σ, σm), NUTS(iterations, 200, 0.65));
    # ϵ = 0.5
    # τ = 10
    # @time chain = sample(turing_model(X, y, σ), HMC(iterations, ϵ, τ));
    return (w_true=w_true, chains=chain::Chains)
end

chain = do_it(1000)

Improving for loop speed in Julia 1.0

I have a long vector V and a large matrix M. The computation I want to perform is given by the Julia code below.
using LinearAlgebra

function myfunction(M,V)
    n = size(V,1)
    sum = 0
    summ = 0
    for i = 1:n-1
        for j = i+1:n
            a= [i,j]
            Y = V[a]
            X = M[a,a]
            sum += Y'*inv(X)*Y
            summ += tr(X)*Y'*Y
        end
    end
    return sum, summ
end

M = randn(10000,10000)
V = randn(10000)
@time myfunction(M,V)
Since the vector is very long and the matrix is very large, this procedure takes a long time. I spent a long time on this issue. I really appreciate your help!
I would just manually unroll the calculations to avoid allocations, using the closed-form inverse of a 2×2 matrix:
function myfunction2(M::AbstractMatrix{T},V::AbstractVector{T}) where {T}
    n = size(V, 1)
    sum = zero(T)
    summ = zero(T)
    for i = 2:n
        for j = 1:i-1
            @inbounds y1, y2 = V[i], V[j]
            y11 = y1*y1
            y12 = y1*y2
            y22 = y2*y2
            @inbounds a, b, c, d = M[i,i], M[i,j], M[j,i], M[j,j]
            sum += (d*y11-(c+b)*y12+a*y22) / (a*d-b*c)
            summ += (a+d)*(y11+y22)
        end
    end
    return sum, summ
end
(note that I make explicit assumptions about M and V: both must have the same element type T)
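The unrolled expressions can be checked against the original matrix formulas on a single 2×2 block. This is a small sketch with arbitrary test values; pair_terms is a hypothetical helper that just wraps the two closed-form lines from myfunction2.

```julia
using LinearAlgebra

# Closed-form contribution of one index pair, as in myfunction2:
# X = [a b; c d], Y = [y1, y2], inv(X) = [d -b; -c a] / (a*d - b*c).
function pair_terms(a, b, c, d, y1, y2)
    y11, y12, y22 = y1 * y1, y1 * y2, y2 * y2
    quad = (d * y11 - (c + b) * y12 + a * y22) / (a * d - b * c)
    trace_term = (a + d) * (y11 + y22)
    return quad, trace_term
end

# Arbitrary test values (assumptions for illustration only).
X = [4.0 1.0; 2.0 3.0]
Y = [0.5, -1.5]

# Reference values computed the way the original myfunction does.
quad_ref = Y' * inv(X) * Y
trace_ref = tr(X) * (Y' * Y)

quad, trace_term = pair_terms(X[1, 1], X[1, 2], X[2, 1], X[2, 2], Y[1], Y[2])
println(quad ≈ quad_ref && trace_term ≈ trace_ref)
```

Both quantities are unchanged when the two indices of a pair are swapped, which is why myfunction2 can loop over j < i while the original loops over i < j.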
EDIT: this is marginally faster:
function myfunction3(M::AbstractMatrix{T},V::AbstractVector{T}) where {T}
    n = size(V, 1)
    sum = zero(T)
    summ = zero(T)
    for i = 2:n
        @inbounds y1 = V[i]
        @inbounds a = M[i,i]
        y11 = y1*y1
        for j = 1:i-1
            @inbounds y2 = V[j]
            y12 = y1*y2
            y22 = y2*y2
            @inbounds b, c, d = M[i,j], M[j,i], M[j,j]
            sum += (d*y11-(c+b)*y12+a*y22) / (a*d-b*c)
            summ += (a+d)*(y11+y22)
        end
    end
    return sum, summ
end
