Julia JuMP Multivariate ML Estimation

I am trying to perform ML estimation of a normally distributed variable in a linear regression setting in Julia, using JuMP and the NLopt solver.
There is a good working example here; however, when I try to estimate the regression parameters (slopes), the code becomes quite tedious to write, particularly as the parameter space grows.
Maybe someone has an idea how to write it more concisely. Here is my code:
using JuMP, NLopt, Distributions
#type definition to store data
type data
    n::Int
    A::Matrix
    β::Vector
    y::Vector
    ls::Vector
    err::Vector
end
#generate regression data
function Data( n = 1000 )
    A = [ones(n) rand(n, 2)]
    β = [2.1, 12.9, 3.7]
    y = A*β + rand(Normal(), n)
    ls = inv(A'A)A'y
    err = y - A * ls
    data(n, A, β, y, ls, err)
end
#initialize data
d = Data()
println( var(d.y) )
function ml( )
    m = Model( solver = NLoptSolver( algorithm = :LD_LBFGS ) )
    @defVar( m, b[1:3] )
    @defVar( m, σ >= 0, start = 1.0 )
    #this is the working example.
    #As you can see it's quite tedious to write
    #and becomes rather infeasible if there are more than,
    #let's say 10, slope parameters to estimate
    @setNLObjective( m, Max,-(d.n/2)*log(2π*σ^2) \\cont. next line
        -sum{(d.y[i]-d.A[i,1]*b[1] \\
        -d.A[i,2]*b[2] \\
        -d.A[i,3]*b[3])^2, i=1:d.n}/(2σ^2) )
    #julia returns:
    > slope: [2.14,12.85,3.65], variance: 1.04
    #which is what is to be expected
    #however:
    #this is what I would like the code to look like:
    @setNLObjective( m, Max,-(d.n/2)*log(2π*σ^2) \\
        -sum{(d.y[i]-(d.A[i,j]*b[j]))^2, \\
        i=1:d.n, j=1:3}/(2σ^2) )
    #I also tried:
    @setNLObjective( m, Max,-(d.n/2)*log(2π*σ^2) \\
        -sum{sum{(d.y[i]-(d.A[i,j]*b[j]))^2, \\
        i=1:d.n}, j=1:3}/(2σ^2) )
    #but unfortunately it returns:
    > slope: [10.21,18.89,15.88], variance: 54.78
    solve(m)
    println( getValue(b), " ", getValue(σ^2) )
end
ml()
Any ideas?
EDIT
As noted by Reza, a working example is:
@setNLObjective( m, Max,-(d.n/2)*log(2π*σ^2) \\
    -sum{(d.y[i]-sum{d.A[i,j]*b[j],j=1:3})^2,
    i=1:d.n}/(2σ^2) )

The sum{} syntax is a special syntax that only works inside JuMP macros, and is the preferred syntax for sums.
So your example would be written as:
function ml( )
    m = Model( solver = NLoptSolver( algorithm = :LD_LBFGS ) )
    @variable( m, b[1:3] )
    @variable( m, σ >= 0, start = 1.0 )
    @NLobjective(m, Max,
        -(d.n/2)*log(2π*σ^2)
        - sum{
            (d.y[i] - sum{d.A[i,j]*b[j], j=1:3})^2,
            i=1:d.n}/(2σ^2) )
    # ... solve(m) and print the results as in the question
end
where I've expanded it across multiple lines to be as clear as possible.
Reza's answer isn't technically wrong, but isn't idiomatic JuMP and won't be as efficient for larger models.

I didn't trace your code, but anyway, I hope the following works for you:
sum([(d.y[i]-sum([d.A[i,j]*b[j] for j=1:3]))^2 for i=1:d.n])
As @IainDunning mentioned, the JuMP package has a special syntax for summation inside its macros, so the more efficient and idiomatic way to do this is:
sum{(d.y[i]-sum{d.A[i,j]*b[j], j=1:3})^2, i=1:d.n}
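A small numerical check may help show why the placement of the inner sum matters. The sketch below is plain Python rather than JuMP, and the function names and the simulated data are illustrative assumptions, not part of the original code. It compares the residual sum of squares (inner sum over j inside the square) with the double sum over i and j that the question tried.

```python
import random

def rss_inner_sum(A, y, b):
    """Sum_i (y_i - sum_j A_ij * b_j)^2: the residual sum of squares."""
    return sum((y[i] - sum(A[i][j] * b[j] for j in range(len(b)))) ** 2
               for i in range(len(y)))

def rss_double_sum(A, y, b):
    """Sum_i sum_j (y_i - A_ij * b_j)^2: a different quantity."""
    return sum((y[i] - A[i][j] * b[j]) ** 2
               for i in range(len(y)) for j in range(len(b)))

# Simulated regression data, mirroring the setup in the question.
random.seed(0)
n = 200
beta = [2.1, 12.9, 3.7]
A = [[1.0, random.random(), random.random()] for _ in range(n)]
y = [sum(a * b for a, b in zip(row, beta)) + random.gauss(0.0, 1.0)
     for row in A]

good = rss_inner_sum(A, y, beta)   # evaluated at the true slopes
bad = rss_double_sum(A, y, beta)   # same slopes, wrong objective
print(good / n)  # average squared residual, near the noise variance
print(bad / n)   # far larger: each term compares y_i with one A_ij * b_j
```

Because the double sum compares each y[i] with a single term A[i,j]*b[j] instead of the full linear predictor, maximizing a likelihood built on it would be expected to give different slope and variance estimates.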

Related

Puzzling result from boundary condition code in Julia BVP solver

I am trying to solve a boundary value problem in Julia, following the example found here, using the BoundaryValueDiffEq package. In the boundary condition function, the example requires a for loop to update each index individually, à la
function bc1!(residual, u, p, t)
    for i in 1:n
        residual[i] = u[end][i] - 10
    end
end
I would like to use the following code, which should be more efficient:
function bc1!(residual, u, p, t)
    residual = u[end] .- 10
end
Though the resulting value of residual is the same for both versions of the code, the solver gives the correct result in the first case and an incorrect result in the second case.
All I can think of is that there is some difference between updating residual index by index and assigning a new vector to it, even if the result is identical in value and in type. Why is this the case, and is it possible to make the code more efficient while preserving the correct result?
Here is the full code in case it helps.
using BoundaryValueDiffEq, Plots
n = 3
f(t) = .1
F(t) = .1*t
function du!(du,u,p,t)
    fn(i) = 1/(u[i]-t)
    for i in 1:n
        du[i] = 1/(n-1)*F(u[i])/f(u[i])*((2-n)/(u[i]-t)+sum(map(fn,
            vcat(1:i-1,i+1:n))))
    end
end
function bc1!(residual, u, p, t)
    #residual = u[end] .- 10
    for i in 1:n
        residual[i] = u[end][i]-10
    end
end
# exact solution
xvals = LinRange(0,20/3,200)
yvals = 1.5*xvals
# solving BVP
tspan = (0.0,20/3)
bvp1 = BVProblem(du!, bc1!, 10*ones(Int8,n), tspan)
sol1 = solve(bvp1, GeneralMIRK4(), dt=.2)
# plotting computed solution vs actual solution
plot(sol1,vars=(0,1))
plot!(xvals,yvals,label="Exact solution")
You rebound the local name residual to a new array instead of mutating the array the solver passed in, so the solver never sees your values. You need to use .= to update it in place.
function bc1!(residual, u, p, t)
    residual .= u[end] .- 10
end
or safer:
function bc1!(residual, u, p, t)
    @. residual = u[end] .- 10
end
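The same distinction between rebinding a name and writing into an existing buffer shows up in other languages too. As an analogy only (this is Python, not Julia, and the function names are made up for illustration), slice assignment plays the role of Julia's `.=` here:

```python
def bc_rebind(residual, u_end):
    # Rebinds the local name only; the caller's list is left untouched.
    residual = [x - 10 for x in u_end]

def bc_inplace(residual, u_end):
    # Slice assignment writes into the caller's list, like `.=` in Julia.
    residual[:] = [x - 10 for x in u_end]

buf = [0.0, 0.0, 0.0]
bc_rebind(buf, [12.0, 11.0, 10.0])
print(buf)  # [0.0, 0.0, 0.0]
bc_inplace(buf, [12.0, 11.0, 10.0])
print(buf)  # [2.0, 1.0, 0.0]
```

A solver that preallocates the residual buffer and reads it back after the call only ever observes the second kind of update.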

Generalizing the inputs of the nlsolve function in Julia

This question has already been asked on another platform, but I haven't got an answer yet.
https://discourse.julialang.org/t/generalizing-the-inputs-of-the-nlsolve-function-in-julia/
After an extensive process using SymPy in Julia, I generated a system of nonlinear equations. My system is stored in an NN x S matrix, something like this (NN = 2, S = 2).
I would like to adapt the system for use with the NLsolve package. I put together a workaround for the case NN = 1 and S = 1. The system_equations2 function gives me the nonlinear system, as in the figure.
using SymPy
using Plots
using NLsolve
res = system_equations2()
In order to simulate the output, I do this:
NN = 1
S = 1
p= [Sym("p$i$j") for i in 1:NN,j in 1:S]
res = [ Eq( -331.330122303069*p[i,j]^(1.0) + p[i,j]^(2.81818181818182) - 1895.10478893046/(p[i,j]^(-1.0))^(2.0),0 ) for i in 1:NN,j in 1:S]
resf = convert( Function, lhs( res[1,1] ) )
plot(resf, 0 ,10731)
Now
resf = convert( Function, lhs( res[1,1] ) )
# Wrapper that adapts resf to the argument passed by the nlsolve function
function resf2(p)
    p = Tuple(p)[1]
    r = resf(p)
    return r
end
Now, I find the zeros
function K(F,p)
    F[1] = resf2(p[1])
end
nlsolve(K , [7500.8])
I would like to generalize this to any NN and any S. I believe there is a simpler way to do this.

Julia MethodError: no method matching parseNLExpr_runtime(

I'm attempting to code the method described here to estimate production functions of metal manufacturers. I've done this in Python and Matlab, but am trying to learn Julia.
spain_clean.csv is a dataset of log capital (lnk), log labor (lnl), log output (lnva), and log materials (lnm) that I am loading. Lagged variables are denoted with an "l" before them.
Code is at the bottom. I am getting an error:
ERROR: LoadError: MethodError: no method matching parseNLExpr_runtime(::JuMP.Model, ::JuMP.GenericQuadExpr{Float64,JuMP.Variable}, ::Array{ReverseDiffSparse.NodeData,1}, ::Int32, ::Array{Float64,1})
I think it has to do with the use of vector sums and arrays going into the non-linear objective, but I do not understand Julia enough to debug this.
using JuMP # Need to say it whenever we use JuMP
using Clp, Ipopt # Loading the solver packages
using CSV # csv reader
# read data
df = CSV.read("spain_clean.csv")
#MODEL CONSTRUCTION
#--------------------
acf = Model(solver=IpoptSolver())
@variable(acf, -10<= b0 <= 10) #
@variable(acf, -5 <= bk <= 5 ) #
@variable(acf, -5 <= bl <= 5 ) #
@variable(acf, -10<= g1 <= 10) #
const g = sum(df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl]))
const gllnk = sum((df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl])).*df[:llnk])
const gllnl = sum((df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl])).*df[:llnl])
const glphihat = sum((df[:phihat]-b0-bk* df[:lnk]-bl* df[:lnl]-g1* (df[:lphihat]-b0-bk* df[:llnk]-bl* df[:llnl])).*df[:lphihat])
#OBJECTIVE
@NLobjective(acf, Min, g* g + gllnk* gllnk + gllnl* gllnk + glphihat* glphihat)
#SOLVE IT
status = solve(acf) # solves the model
println("Objective value: ", getobjectivevalue(acf)) # getobjectivevalue(model) gives the optimum objective value
println("b0 = ", getvalue(b0))
println("bk = ", getvalue(bk))
println("bl = ", getvalue(bl))
println("g1 = ", getvalue(g1))
I'm not an expert in Julia, but I think a couple of things are wrong with your code.
First, constants are not supposed to change during the iterations, yet you are making them functions of the decision variables. Second, what you want there are nonlinear expressions rather than constants. So instead of the const definitions, what you want to write is
N = size(df, 1)
@NLexpression(acf, g, sum(df[i, :phihat]-b0-bk* df[i, :lnk]-bl* df[i, :lnl]-g1* (df[i, :lphihat]-b0-bk* df[i, :llnk]-bl* df[i, :llnl]) for i=1:N))
@NLexpression(acf, gllnk, sum((df[i,:phihat]-b0-bk* df[i,:lnk]-bl* df[i,:lnl]-g1* (df[i,:lphihat]-b0-bk* df[i,:llnk]-bl* df[i,:llnl]))*df[i,:llnk] for i=1:N))
@NLexpression(acf,gllnl,sum((df[i,:phihat]-b0-bk* df[i,:lnk]-bl* df[i,:lnl]-g1* (df[i,:lphihat]-b0-bk* df[i,:llnk]-bl* df[i,:llnl]))*df[i,:llnl] for i=1:N))
@NLexpression(acf,glphihat,sum((df[i,:phihat]-b0-bk* df[i,:lnk]-bl* df[i,:lnl]-g1* (df[i,:lphihat]-b0-bk* df[i,:llnk]-bl* df[i,:llnl]))*df[i,:lphihat] for i=1:N))
I tested this and it seems to work.

Efficient computation of bivariate empirical cdf in R/Fortran

Given an n*2 data matrix X I'd like to calculate the bivariate empirical cdf for each observation, i.e. for each i in 1:n, return the percentage of observations with 1st element not greater than X[i,1] and 2nd element not greater than X[i,2].
Because of the nested search involved it gets terribly slow for n ~ 100k, even after porting it to Fortran. Does anyone know if there's a better way of handling sample sizes like this?
Edit: I believe this problem is similar (in terms of complexity) to finding Kendall's tau, which is of order O(n^2). In that case Knight (1966) has an algorithm to reduce it to O(n log(n)). Just wondering if there's any O(n*log(n)) algorithm for finding bivariate ecdf already out there.
Edit 2: This is the code I have in Fortran, as requested. This is called in R in the usual way, so the R code is omitted here. The code is meant for arbitrary dimensions, but for the specific thing I'm doing a bivariate one is good enough.
! Calculates multivariate empirical cdf for each point
! n: number of observations
! d: dimension (>=2)
! umat: data matrix
! outvec: vector of ecdf
subroutine mecdf(n,d,umat,outvec)
    implicit none
    integer :: n, d, i, j, k, tempsum
    double precision, dimension(n) :: outvec
    double precision, dimension(n,d) :: umat
    logical :: flag
    do i = 1,n
        tempsum = 0
        do j = 1,n
            flag = .true.
            do k = 1,d
                if (umat(i,k) < umat(j,k)) then
                    flag = .false.
                    exit
                end if
            end do
            if (flag) then
                tempsum = tempsum + 1
            end if
        end do
        outvec(i) = real(tempsum)/n
    end do
    return
end subroutine
I think my first effort was not really an ecdf, although it did map the points to the interval [0,1]. The example is a 50 x 2 matrix generated with:
#M <- matrix(runif(100), ncol=2)
M <-
structure(c(0.0468267474789172, 0.296053855214268, 0.205678076483309,
0.467400068417192, 0.968577065737918, 0.435642971657217, 0.929023026255891,
0.038406387437135, 0.304360694251955, 0.964778139721602, 0.534192910650745,
0.741682186257094, 0.0848641532938927, 0.405901980120689, 0.957696850644425,
0.384813814423978, 0.639882878866047, 0.231505588628352, 0.271994129288942,
0.786155494628474, 0.349499785574153, 0.279077709652483, 0.206662984099239,
0.777465222170576, 0.705439242534339, 0.643429880728945, 0.887209519045427,
0.0794123203959316, 0.849177583120763, 0.704594585578889, 0.736909110797569,
0.503158083418384, 0.49449566937983, 0.408533290959895, 0.236613316927105,
0.297427259152755, 0.0677345870062709, 0.623845702270046, 0.139933609170839,
0.740499466424808, 0.628097783308476, 0.678438259987161, 0.186680511338636,
0.339367639739066, 0.373212536331266, 0.976724133593962, 0.94558056560345,
0.610417427960783, 0.887977657606825, 0.663434249348938, 0.447939050383866,
0.755168803501874, 0.478974275058135, 0.737040047068149, 0.429466919740662,
0.0021107573993504, 0.697435079608113, 0.444197302218527, 0.108997165458277,
0.856855363817886, 0.891898229718208, 0.93553287582472, 0.991948011796921,
0.630414301762357, 0.0604106825776398, 0.908968194155023, 0.0398679254576564,
0.251426834380254, 0.235532913124189, 0.392070295521989, 0.530511683085933,
0.319339724024758, 0.534880011575297, 0.92030712752603, 0.138276003766805,
0.213625695323572, 0.407931711757556, 0.605797187192366, 0.424798395251855,
0.471233424032107, 0.0105366336647421, 0.625802840106189, 0.524665891425684,
0.0375960320234299, 0.54812005511485, 0.0105806747451425, 0.438266788609326,
0.791981092421338, 0.363821814302355, 0.157931488472968, 0.47945317090489,
0.906797411618754, 0.762243523262441, 0.258681379957125, 0.308056800393388,
0.91944490163587, 0.412255838746205, 0.347220918396488, 0.68236422073096,
0.559149842709303), .Dim = c(50L, 2L))
So the task is to do a single summation of a two-part logical test on N items which I suspect is O(N*3). It might be marginally faster if implemented in Rcpp, but these are vectorized operations.
# Wrong: ecdf2d <- function(m,i,j) { ord <- rank(m[ , 1]^2+m[ , 2]^2)
# ord[i]/nrow(m)} # scales to [0,1] interval
ecdf2d.v2 <- function(obj, x, y) sum( obj[,1] < x & obj[,2] < y)/nrow(obj)
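For the O(n log n) version the question asks about, one possible approach is a sweep over the first coordinate combined with a Fenwick (binary indexed) tree over the ranks of the second coordinate. The sketch below is an illustration in plain Python, not taken from the original posts; the function name `ecdf2d` and its input format (a list of pairs) are assumptions. It uses the non-strict comparison from the question ("not greater than"), so each point counts itself, and it handles ties in either coordinate.

```python
from bisect import bisect_left

def ecdf2d(points):
    """Bivariate ECDF at each point: share of points (a, b) with a <= x and b <= y."""
    n = len(points)
    ys = sorted(set(p[1] for p in points))  # distinct y values, used for ranking
    tree = [0] * (len(ys) + 1)              # Fenwick tree over y ranks (1-based)

    def add(r):
        # Record one point with y rank r.
        while r <= len(ys):
            tree[r] += 1
            r += r & -r

    def prefix(r):
        # Number of recorded points with y rank <= r.
        s = 0
        while r > 0:
            s += tree[r]
            r -= r & -r
        return s

    order = sorted(range(n), key=lambda i: points[i][0])
    out = [0.0] * n
    start = 0
    while start < n:
        # Collect the group of points sharing the same x value.
        stop = start
        while stop < n and points[order[stop]][0] == points[order[start]][0]:
            stop += 1
        group = order[start:stop]
        ranks = {i: bisect_left(ys, points[i][1]) + 1 for i in group}
        # Insert the whole group before querying so x ties count each other.
        for i in group:
            add(ranks[i])
        for i in group:
            out[i] = prefix(ranks[i]) / n
        start = stop
    return out

print(ecdf2d([(1, 1), (2, 2), (1, 2), (2, 1)]))  # [0.25, 1.0, 0.5, 0.5]
```

The sort costs O(n log n) and each insert or query on the tree costs O(log n), so the total stays at O(n log n) for the bivariate case. Extending the same idea to more than two dimensions would need a different structure.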

(in R) Why is result of ksvm using user-defined linear kernel different from that of ksvm using "vanilladot"?

I wanted to use a user-defined kernel function for ksvm in R.
So, as practice, I tried to write a vanilladot kernel and compare it with the "vanilladot" kernel built into "kernlab".
I wrote my kernel as follows.
#
###vanilla kernel with class "kernel"
#
kfunction.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "kernel"
k}
l<-0.1 ; C<-1/(2*l)
###use kfunction.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.k
I thought the result of this operation would be equal to that of the following.
However, it isn't.
#
###use "vanilladot"
#
l<-0.1 ; C<-1/(2*l)
tmp1<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel="vanilladot", C = C)
alpha(tmp1)[[1]]
ind1<-alphaindex(tmp1)[[1]]
x.s<-x[ind1,] ; y.s<-y[ind1]
w.tmp1<-t(alpha(tmp1)[[1]]*y.s)%*%x.s
w.tmp1
I think this problem may be related to the kernel class.
When the class is set to "kernel", the problem occurs.
However, when the class is set to "vanillakernel", the result of ksvm using the user-defined kernel is equal to that of ksvm using the "vanilladot" kernel built into kernlab.
#
###vanilla kernel with class "vanillakernel"
#
kfunction.v.k <- function(){
k <- function (x,y){crossprod(x,y)}
class(k) <- "vanillakernel"
k}
# The only difference between kfunction.k and kfunction.v.k is "class(k)".
l<-0.1 ; C<-1/(2*l)
###use kfunction.v.k
tmp<-ksvm(x,factor(y),scaled=FALSE, type = "C-svc", kernel=kfunction.v.k(), C = C)
alpha(tmp)[[1]]
ind<-alphaindex(tmp)[[1]]
x.s<-x[ind,] ; y.s<-y[ind]
w.class.v.k<-t(alpha(tmp)[[1]]*y.s)%*%x.s
w.class.v.k
I don't understand why the result differs from "vanilladot" when the class is set to "kernel".
Is there an error in my operation?
First, it seems like a really good question!
Now to the point. In the sources of ksvm we can find where the line is drawn between a user-defined kernel and the built-in ones:
if (type(ret) == "spoc-svc") {
if (!is.null(class.weights))
weightedC <- class.weights[weightlabels] * rep(C,
nclass(ret))
else weightedC <- rep(C, nclass(ret))
yd <- sort(y, method = "quick", index.return = TRUE)
xd <- matrix(x[yd$ix, ], nrow = dim(x)[1])
count <- 0
if (ktype == 4)
K <- kernelMatrix(kernel, x)
resv <- .Call("tron_optim", as.double(t(xd)), as.integer(nrow(xd)),
as.integer(ncol(xd)), as.double(rep(yd$x - 1,
2)), as.double(K), as.integer(if (sparse) xd@ia else 0),
as.integer(if (sparse) xd@ja else 0), as.integer(sparse),
as.integer(nclass(ret)), as.integer(count), as.integer(ktype),
as.integer(7), as.double(C), as.double(epsilon),
as.double(sigma), as.integer(degree), as.double(offset),
as.double(C), as.double(2), as.integer(0), as.double(0),
as.integer(0), as.double(weightedC), as.double(cache),
as.double(tol), as.integer(10), as.integer(shrinking),
PACKAGE = "kernlab")
reind <- sort(yd$ix, method = "quick", index.return = TRUE)$ix
alpha(ret) <- t(matrix(resv[-(nclass(ret) * nrow(xd) +
1)], nclass(ret)))[reind, , drop = FALSE]
coef(ret) <- lapply(1:nclass(ret), function(x) alpha(ret)[,
x][alpha(ret)[, x] != 0])
names(coef(ret)) <- lev(ret)
alphaindex(ret) <- lapply(sort(unique(y)), function(x)
which(alpha(ret)[,
x] != 0))
xmatrix(ret) <- x
obj(ret) <- resv[(nclass(ret) * nrow(xd) + 1)]
names(alphaindex(ret)) <- lev(ret)
svindex <- which(rowSums(alpha(ret) != 0) != 0)
b(ret) <- 0
param(ret)$C <- C
}
The important part is that if we provide ksvm with our own kernel, then ktype=4 (while for vanillakernel, ktype=0), which makes two changes:
in the case of a user-defined kernel, the kernel matrix is computed up front instead of the kernel function being used directly
the tron_optim routine is run with the information about the kernel type
Now, in svm.cpp we can find the tron routines, and in tron_run (called from tron_optim) we can see that the LINEAR kernel has a separate optimization routine:
if (param->kernel_type == LINEAR)
{
    /* lots of code here */
    while (Cpj < Cp)
    {
        totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w,
                             Cpj, Cnj, param->eps, sii, param->shrinking,
                             param->qpsize);
        /* lots of code here */
    }
    totaliter += s.Solve(l, prob->x, minus_ones, y, alpha, w, Cp, Cn,
                         param->eps, sii, param->shrinking, param->qpsize);
    delete[] w;
}
else
{
    Solver_B s;
    s.Solve(l, BSVC_Q(*prob,*param,y), minus_ones, y, alpha, Cp, Cn,
            param->eps, sii, param->shrinking, param->qpsize);
}
As you can see, the linear case is treated in a more complex, more detailed way. There is an inner optimization loop calling the solver many times. It would require a really deep analysis of the actual optimization being performed here, but at this point one can answer your question in the following way:
There is no error in your operation.
kernlab's svm has a separate routine for training an SVM with a linear kernel, selected by the type of kernel passed to the code. Changing "kernel" to "vanillakernel" made ksvm think it was actually working with vanillakernel, so it performed this separate optimization routine.
It does not seem to be a bug, as linear SVM is in fact very different from the kernelized version in terms of efficient optimization techniques. The number of heuristics and numerical issues that have to be taken care of is really large. As a result, some approximations are required and can lead to different results. While for rich feature spaces (like those induced by the RBF kernel) it should not really matter, for simple kernels like linear ones these simplifications can lead to significant output changes.
