Spatial Autoregressive Maximum Likelihood in Julia: Multiple Parameters

I have the following code that evaluates the likelihood function for a spatial autoregressive model in Julia, like so:
using LinearAlgebra, SparseArrays
function like_sar2(betas,rho,sige,y,x,W)
    n = length(y)
    A = sparse(I, n, n) - rho*W    # speye(n) no longer exists in Julia 1.x
    e = y-x*betas-rho*sparse(W)*y
    epe = e'*e
    tmp2 = 1/(2*sige)
    llike = -(n/2)*log(pi) - (n/2)*log(sige) + log(det(A)) - tmp2*epe
    return llike
end
I am trying to maximize this function, but I'm not sure how to pass inputs of different sizes so that the Optim.jl package will accept them. I have tried the following:
In the first case, the matrix in brackets does not conform because of a dimension mismatch, and in the second, the Optim package does not accept tuples.
I'd like to maximize this likelihood function so that it returns the numerical Hessian matrix (using the Optim options), which I need to compute t-statistics for the parameters.
If there is an easier way to obtain the numerical Hessian for such a function I'd use that, but it appears that packages like ForwardDiff only accept a single input.
Any help would be greatly appreciated!

Not 100% sure I correctly understand how your function works, but it seems to me like you're using the likelihood to estimate the coefficient vector beta, with the other input variables fixed. The way to do this would be to amend the function as follows:
using Optim, LinearAlgebra, SparseArrays
# Initialize some parameters
coeffs = rand(10)
rho = 0.1
sige = 1.0    # error variance, held fixed (illustrative value; it was undefined before)
ys = rand(10)
xs = rand(10,10)
Wmat = rand(10,10)
# Construct the likelihood with the other parameters fixed at pre-defined values.
# Optim minimizes, so return the NEGATIVE log-likelihood in order to maximize the likelihood.
function like_sar2(β::Vector{Float64},ρ=rho,σε=sige,y=ys,x=xs,W=Wmat)
    n = length(y)
    A = sparse(I, n, n) - ρ*W
    ε = y-x*β-ρ*sparse(W)*y
    epe = ε'*ε
    tmp2 = 1/(2*σε)
    llike = -(n/2)*log(π) - (n/2)*log(σε) + log(det(A)) - tmp2*epe
    return -llike
end
# Optimize, with starting value zero for all beta coefficients
optimize(like_sar2, zeros(10), NelderMead())
If you need to optimize more than your beta parameters (in the general autoregressive models I've used, the autocorrelation parameter was often estimated jointly with the other coefficients), you can do this by stacking it into the beta vector and unpacking it within the function, like so:
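function like_sar3(coeffs::Vector{Float64},σε=sige,y=ys,x=xs,W=Wmat)
    β = coeffs[1:10]; ρ = coeffs[11]    # unpack: 10 betas followed by rho
    n = length(y)
    A = sparse(I, n, n) - ρ*W
    ε = y-x*β-ρ*sparse(W)*y
    epe = ε'*ε
    tmp2 = 1/(2*σε)
    llike = -(n/2)*log(π) - (n/2)*log(σε) + log(det(A)) - tmp2*epe
    return -llike    # negative log-likelihood, for minimization
end
# Start all betas and rho at zero
optimize(like_sar3, [zeros(10); 0.0], NelderMead())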
The key is that you end up with one vector of inputs to pass into your function.
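The same pack-and-unpack idea, together with the numerical Hessian the question asks about, can be sketched outside Julia. The Python below is a minimal, stdlib-only sketch and not Optim.jl's API: `unpack`, `make_nll`, `numerical_hessian`, `invert` and `t_statistics` are hypothetical helpers, and the toy model is a Gaussian linear regression with θ = [b0, b1, log σ] rather than the SAR likelihood. Under the usual regularity conditions, the inverse of the Hessian of the negative log-likelihood at the MLE approximates the covariance matrix of the estimates, and the square roots of its diagonal are the standard errors used for t-statistics.

```python
import math


def unpack(theta, k):
    """Split one flat parameter vector into (first k entries, entry k)."""
    return list(theta[:k]), theta[k]


def make_nll(xs, ys):
    """Toy Gaussian linear-model negative log-likelihood over theta = [b0, b1, log_sigma]."""
    def nll(theta):
        beta, log_sigma = unpack(theta, 2)
        sigma = math.exp(log_sigma)  # the log scale keeps sigma > 0 without constraints
        total = 0.0
        for x, y in zip(xs, ys):
            r = (y - beta[0] - beta[1] * x) / sigma
            total += 0.5 * math.log(2 * math.pi) + log_sigma + 0.5 * r * r
        return total
    return nll


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of the scalar function f at theta."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]

    def f_shift(i, di, j, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return f(t)

    for i in range(n):
        for j in range(i, n):
            H[i][j] = H[j][i] = (
                f_shift(i, h, j, h) - f_shift(i, h, j, -h)
                - f_shift(i, -h, j, h) + f_shift(i, -h, j, -h)
            ) / (4 * h * h)
    return H


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (fine for small dense matrices)."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def t_statistics(f, theta_hat):
    """Estimate / standard error, with SEs from the inverse Hessian of the NLL at the MLE."""
    cov = invert(numerical_hessian(f, theta_hat))
    return [est / math.sqrt(cov[i][i]) for i, est in enumerate(theta_hat)]
```

For the SAR case, `f` would be the negative log-likelihood over the stacked vector [β; ρ], evaluated at the optimizer's minimizer.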


Logistic Regression in Julia Returning Incorrect Results

I am trying to use optimize from Julia's Optim package to estimate the vector beta = [beta_0, beta_1]' , but I am getting unreasonable results.
I've provided a minimum working example where the estimates are [27.04, -14.38] when the true values are [1, 1].
What am I doing wrong here?
Here is a minimum working example. It's my first one, so please let me know how it could be improved.
using Distributions
using Optim
using LinearAlgebra
import Random
# generate data
N = 1000  # sample size (not defined in the original snippet; 1000 is an illustrative value)
true_beta = [1; 1];
X = [ones(N) rand(Normal(0,1), N)];
u = rand(Normal(0,1), N)
# generate the latent variable
y_star = X * true_beta + u;
# generate observed variable
y = Vector{Int64}(zeros(N));
for i in 1:N
    if y_star[i] >= 0
        y[i] = 1
    end
end
# (negative of) loglikelihood function
function loglike(beta::Vector{Float64})
    l = Vector{Float64}()
    pk = 1/(1+exp(-X[i,:]'*beta))
    lhood = -y[i,1]*log(pk) - (1-y[i,1])*log(1-pk)
    for i in 1:size(y,1)
        push!(l, lhood)
    end
    return sum(l)
end
# initial guess: ols
ols = inv(X'X)X'y;
# minimize negative loglikelihood
res = optimize(loglike, ols)
# save parameter estimates of beta
betahat = res.minimizer
The reason is that your function is not defined correctly: pk and lhood must be computed inside the loop, once for each observation i. It should be:
function loglike(beta::Vector{Float64})
    l = Vector{Float64}()
    for i in 1:size(y,1)
        pk = 1/(1+exp(-X[i,:]'*beta))
        lhood = -y[i]*log(pk) - (1-y[i])*log(1-pk)
        push!(l, lhood)
    end
    return sum(l)
end
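For comparison, here is a minimal Python sketch (stdlib only, not Julia) of the same per-observation sum. `neg_loglike` is a hypothetical name; it rewrites -[y log p + (1 - y) log(1 - p)] with p = 1/(1 + exp(-η)) and η = x_i'β as log(1 + exp(η)) - yη, an algebraically equivalent form that avoids log(0) when p rounds to exactly 0 or 1.

```python
import math


def neg_loglike(beta, X, y):
    """Negative log-likelihood of a logit model: sum over i of log(1 + exp(eta_i)) - y_i * eta_i."""
    total = 0.0
    for xi, yi in zip(X, y):  # each observation's term is computed inside the loop
        eta = sum(b * x for b, x in zip(beta, xi))  # linear predictor x_i' beta
        # log(1 + exp(eta)) evaluated without overflow for large |eta|
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += log1pexp - yi * eta
    return total
```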
And you can check that the result is correct by running:
using GLM
glm(@formula(y ~ x), (y=y, x=X[:, 2]), Binomial(), LogitLink())
Also notice that your data generating process is incorrect. You are adding normal noise and you should add logistic noise. With normal noise the correct model is Probit. If you use it e.g. like:
glm(@formula(y ~ x), (y=y, x=X[:, 2]), Binomial(), ProbitLink())
you will indeed recover both parameters to be around 1.
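To illustrate the point about the noise distribution, here is a minimal Python sketch (stdlib only; `simulate_logit` and `fit_logit_newton` are hypothetical helpers, and a plain Newton-Raphson iteration stands in for GLM's fitting routine). It draws standard logistic noise with the inverse-CDF transform log(u / (1 - u)), thresholds the latent variable at zero as in the question, and fits the two-parameter logit model; with logistic noise the estimates should land near the true (1, 1).

```python
import math
import random


def simulate_logit(n, beta, seed=0):
    """Latent-variable data with standard logistic noise; y = 1 when the latent value is >= 0."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.random()
        while u == 0.0:  # random() can return 0.0, and log(0) is undefined
            u = rng.random()
        eps = math.log(u / (1.0 - u))  # inverse CDF of the standard logistic distribution
        X.append((1.0, x))
        y.append(1 if beta[0] + beta[1] * x + eps >= 0 else 0)
    return X, y


def fit_logit_newton(X, y, iters=50, tol=1e-10):
    """Two-parameter logit MLE by Newton-Raphson, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # observed information (minus the Hessian)
        for (c, x), yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(b0 * c + b1 * x)))
            g0 += (yi - p) * c
            g1 += (yi - p) * x
            w = p * (1.0 - p)
            h00 += w * c * c
            h01 += w * c * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # solve information * step = score (2x2 system)
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

Replacing the logistic draw with `rng.gauss(0.0, 1.0)` reproduces the question's probit-style data, for which a logit fit is misspecified.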
You may not get estimated values close to [1, 1] since you have used a step function to generate y.
Also, as suggested by @bogumił-kamiński, it is always better to use a tested function like glm from the GLM package. The GLM package provides not only estimates but also an ecosystem around logistic regression, which is very useful for diagnosing the model.
glm produces the following:
y ~ 1 + X
              Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
(Intercept) 2.22148    0.193944  11.45    <1e-29    1.84136     2.6016
X           1.98917    0.223411   8.90    <1e-18    1.55129    2.42704
and the estimates using a correct loglike function:
julia> # save parameter estimates of beta
betahat = res.minimizer
2-element Vector{Float64}:

Compute the integral in R

I am computing a cumulative distribution function (CDF), so the result should lie in [0, 1]. The equation for the CDF is:
F(x) = \int_{\hat{a}}^{x} \frac{2}{\hat{b}-\hat{a}} \sideset{}{'}\sum_{k=0}^{N-1} C_{k} \cos\left( (y - \hat{a}) \frac{k \pi}{\hat{b} - \hat{a}} \right) dy
Here Ck is a vector, the cosine term is a vector, and length(Ck) = length(cos term) = N.
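Because each cosine term can be integrated analytically, the CDF can also be evaluated without `integrate`, which gives a way to cross-check the numerical result. A minimal Python sketch, assuming (as is the usual convention for a primed sum) that the k = 0 term is weighted by 1/2 and that `C` holds the N coefficients C_k; `cos_series_cdf` is a hypothetical name. For k ≥ 1 the integral of cos((y - â)kπ/(b̂ - â)) from â to x is sin((x - â)kπ/(b̂ - â)) · (b̂ - â)/(kπ), and for k = 0 it is x - â.

```python
import math


def cos_series_cdf(x, a, b, C):
    """F(x) = 2/(b-a) * integral from a to x of sum'_k C[k]*cos(k*pi*(y-a)/(b-a)) dy, in closed form.

    Assumes the primed sum halves the k = 0 term.
    """
    L = b - a
    total = 0.5 * C[0] * (x - a)  # k = 0: cos(0) = 1 integrates to (x - a), weight 1/2
    for k in range(1, len(C)):
        w = k * math.pi / L
        total += C[k] * math.sin(w * (x - a)) / w  # integral of cos(w*(y - a)) from a to x
    return 2.0 / L * total
```

With C = [1, 0, 0, ...], the coefficients of a uniform density on [â, b̂], this gives F(b̂) = 1; if the k = 0 term were not halved, the same input would give F(b̂) = 2, so that weighting is one thing worth checking when the numerical result exceeds 1.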
I am sure the equation is correct, but I am afraid my code is incorrect.
Here is my code:
f <- function(x){integrand(x,myCk)}
# define a vectorized version of this function
fv <- Vectorize(f,"x")
res<-integrate(fv,upper = r,lower = hat.a, subdivisions = 2000)$value
res returns the cumulative distribution function, and the result can be larger than 1.
myCk is a vector generated by another function.
hat.a is the lower bound for integration, and it is negative.
uk is a vector generated by a function. The length of uk equals the length of myCk.
I appreciate your advice!

Conducting MLE for multivariate case (bivariate normal) in R

The case is that I am trying to construct an MLE algorithm for a bivariate normal distribution. I am stuck: there seems to be no error, but when I run the script it ends up with a warning.
I have a sample of size n (a fixed constant; I tested with 100, but it can be anything else) from a bivariate normal distribution with mean vector (0, 0) and covariance matrix matrix(c(2.2,1.8,1.8,3),2,2).
I've tried several optimization functions (including nlm(), mle(), spg() and optim()) to maximize the likelihood function (or minimize the negative likelihood), but I get warnings or errors.
I've defined the first likelihood function as follows:
bvrt_ll = function(mu,sigma,rho,sample) {
  n = nrow(sample)
  mu_hat = c(mu[1],mu[2])
  p = length(mu)
  if(sigma[1]>0 && sigma[2]>0) {
    if(rho<=1 && rho>=-1) {
      sigma_hat = matrix(c(sigma[1]^2, rho*sigma[1]*sigma[2],
                           rho*sigma[1]*sigma[2], sigma[2]^2),2,2)
      neg_likelihood = (n*p/2)*log(2*pi) + (n/2)*log(det(sigma_hat)) + 0.5*sum(((sample-mu_hat)%*%solve(sigma_hat)%*%t(sample-mu_hat)))
    }
  }
  else NA
}
I preferred this one since I could set the constraints for the sigmas and rho, but when I use mle()
> mle(minuslogl = bvrt_ll ,start = list(mu = mu_est,sigma=sigma_est,rho =
+ ,method = "BFGS")
Error in optim(start, f, method = method, hessian = TRUE, ...) :
(list) object cannot be coerced to type 'double'
I also tried nlm and spg (from package BB), but they did not help either. When I tried the same function without the constraints (inside the likelihood, not in the optimization function), I got some results, but with warnings: both nlm and spg said the process failed because the covariance matrix was not positive definite, although it was. I think this happened during the iterations: the covariance matrix may not have stayed positive definite, because I did not define the constraints.
So, in short, I need to construct an MLE algorithm for the bivariate normal. Where am I making a mistake?
NOTE: I also tried the optimization functions with the following (I am not sure I did it correctly):
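neg_likelihood = function(mu,sigma,rho) {
  if(rho>=-1 && rho<=1) {
    # dmvnorm() from the mvtnorm package (assumed; this line was garbled in the original)
    -sum(dmvnorm(sample, mean = mu
      ,sigma = matrix(c(sigma[1]^2, rho*sigma[1]*sigma[2], rho*sigma[1]*sigma[2]
      ,sigma[2]^2),2,2),log = T))
  }
  else NA
}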
Any help is appreciated.
EDIT: mu is a vector of length 2 specifying the population means, sigma is a vector of length 2 specifying the population standard deviations of the random variables, and rho is a scalar, the correlation coefficient between the two random variables.
You can do it in closed form, so there is no need for numerical optimization. See Wikipedia. Just use colMeans and cov, but note that cov divides by n - 1 rather than n, so multiply its result by (n - 1)/n to get the maximum likelihood estimate. From help("cov"):
The denominator n - 1 is used which gives an unbiased estimator of the
(co)variance for i.i.d. observations. These functions return NA when
there is only one observation (whereas S-PLUS has been returning NaN),
and fail if x has length zero.
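The closed-form route can be sketched as follows (Python stdlib, not R; `bivariate_normal_mle` is a hypothetical name). It returns the MLEs in the question's parameterization (mu, sigma as standard deviations, rho) and divides by n, which is what the MLE uses; R's cov divides by n - 1, so cov(x) * (n - 1) / n gives the same covariance matrix.

```python
import math


def bivariate_normal_mle(points):
    """Closed-form MLE of (mu, sigma, rho) for i.i.d. bivariate normal (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # MLE (co)variances divide by n, not n - 1
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    sigma = (math.sqrt(sxx), math.sqrt(syy))
    rho = sxy / (sigma[0] * sigma[1])
    return (mx, my), sigma, rho
```

Because the estimates are available directly, no constraints on sigma or rho are needed: a sample covariance matrix is positive semi-definite by construction.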

Error in optim(): searching for global minimum for a univariate function

I am trying to optimize a function in R.
The function is the likelihood function of the negative binomial distribution when estimating only the mu parameter. This should not be a problem, since the function clearly has a single maximum, but I am not able to reach the desired result.
The function to be optimized is:
EMV <- function(data, par) {
  Mi <- par
  Phi <- 2
  N <- NROW(data)
  Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
  return(Resultado)
}
data is a vector of negative binomial variables with parameters mu = 2 and theta = 2:
library(MASS)  # provides rnegbin
data <- rnegbin(10000, mu = 2, theta = 2)
When I plot the function with mu as the variable, using the following code:
x <- seq(0.1, 100, 0.02)
z <- EMV(data,0.1)
for (aux in x) {z <- rbind(z, EMV(data,aux))}
z <- z[2:NROW(z)]
I get the following curve:
The maximum of z is close to the true parameter value, 2.
But the optimization does not work with BFGS:
Error in optim(par = theta, fn = EMV, data = data, method = "BFGS") :
non-finite finite-difference value [1]
And with SANN, for example, it does not go to the right value:
$par
[1] 5.19767e-05
$value
[1] -211981.8
$counts
function gradient
   10000       NA
$convergence
[1] 0
The questions are:
What am I doing wrong?
Is there a way to tell optim that the param should be bigger than 0?
Is there a way to tell optim that I want to maximize the function? (I am afraid the optim is trying to minimize and is going to a very small value where function returns smallest values)
Minimization or Maximization?
Although ?optim says it can do maximization, that is in parentheses; minimization is the default:
fn: A function to be minimized (or maximized) ...
Thus, if we want to maximize an objective function, we need to multiply it by -1 and then minimize it. This is a very common situation: in statistics we often want the maximum log-likelihood, so with optim() we minimize the negative log-likelihood.
Which method to use?
If we only do 1D minimization, we should use method "Brent". This method requires a lower and an upper bound for the search region and only searches inside that interval, so the bounds can be used to constrain the parameter. For example, if you don't want mu to be smaller than 0, just set lower = 0.
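The idea (negate the objective, then search a bounded interval) can be sketched in Python. This is a minimal sketch, not optim's implementation: it uses plain golden-section search in place of Brent's method (which adds parabolic interpolation), and `neg_loglik_nb_mu` and `golden_section_min` are hypothetical names. The objective is the question's EMV with Phi fixed at 2, multiplied by -1; for this objective the maximizing mu is mean(data), which makes the result easy to check.

```python
import math


def neg_loglik_nb_mu(mu, data, phi=2.0):
    """-1 times the question's EMV objective (negative binomial, phi fixed, constants dropped)."""
    s, n = sum(data), len(data)
    return -(s * math.log(mu / (mu + phi)) + n * phi * math.log(phi / (mu + phi)))


def golden_section_min(f, lower, upper, tol=1e-8):
    """Minimize a unimodal f on [lower, upper]; only interior points are evaluated."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = lower, upper
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # the minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:        # the minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2.0
```

Because the endpoints themselves are never evaluated, lower = 0 is safe here even though log(0) is undefined.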
When we move to 2D or higher dimensions, we should resort to "BFGS". In this case, if we want to constrain one of our parameters, say a, to be positive, we can take the log transform log_a = log(a) and reparameterize the objective function in terms of log_a, which is free of constraints. The same goes when we want to constrain several parameters to be positive.
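A minimal Python sketch of the log reparameterization for the same negative binomial objective (hypothetical helper names; a 1-D Newton iteration with finite-difference derivatives stands in for BFGS). The optimizer works on t = log(mu), which can be any real number, and mu = exp(t) is always positive, so no bound is needed.

```python
import math


def neg_loglik_nb_mu(mu, data, phi=2.0):
    """-1 times the question's EMV objective (negative binomial, phi fixed, constants dropped)."""
    s, n = sum(data), len(data)
    return -(s * math.log(mu / (mu + phi)) + n * phi * math.log(phi / (mu + phi)))


def newton_1d(g, t0, h=1e-4, tol=1e-10, max_iter=100):
    """Unconstrained 1-D Newton iteration using central-difference derivatives."""
    t = t0
    for _ in range(max_iter):
        g_plus, g_mid, g_minus = g(t + h), g(t), g(t - h)
        d1 = (g_plus - g_minus) / (2.0 * h)
        d2 = (g_plus - 2.0 * g_mid + g_minus) / (h * h)
        step = d1 / d2
        t -= step
        if abs(step) < tol:
            break
    return t


data = [0, 1, 2, 3, 4, 2, 1, 5]
# Optimize over log_mu, which is unconstrained; exp() maps any real value back to mu > 0.
log_mu_hat = newton_1d(lambda log_mu: neg_loglik_nb_mu(math.exp(log_mu), data), 0.0)
mu_hat = math.exp(log_mu_hat)
```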
How to change your code?
EMV <- function(data, par) {
  Mi <- par
  Phi <- 2
  N <- NROW(data)
  Resultado <- log(Mi/(Mi + Phi))*sum(data) + N*Phi*log(Phi/(Mi + Phi))
  return(-1 * Resultado)
}
optim(par = theta, fn = EMV, data = data, method = "Brent", lower = 0, upper = 1E5)
The help file for optim says: "By default optim performs minimization, but it will maximize if control$fnscale is negative." So if you either multiply your function output by -1 or set control = list(fnscale = -1), you should get the right answer.

Fitting an inverse function

I have a function which looks like:
g(x) = f(x) - a^b / f(x)^b
g(x) - known function, data vector provided.
f(x) - hidden process.
a,b - parameters of this function.
From the above we get the relation:
f(x) = inverse(g(x))
My goal is to optimize the parameters a and b such that f(x) is as close as possible to a normal distribution. Looking at the normal Q-Q plot of f(x) (attached), my purpose is to minimize the distance between f(x) and the straight line that represents the normal distribution, by optimizing a and b.
I wrote the below code:
g_fun <- function(x) {x - a^b/x^b}
inverse = function (f, lower = 0, upper = 2000) {
  function (y) uniroot((function (x) f(x) - y), lower = lower, upper = upper)[1]
}
f_func = inverse(function(x) g_fun(x))
# let's make up an example
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
# Calculate f(x) by using the inverse of g(x), when a=a0 and b=b0
for (i in 1:10) {
  f[i] <- f_fun(g[i])
}
I have two questions:
How to pass parameters a and b to the functions?
How to perform this optimization task, meaning find a and b such that f(x) would approximate normal distribution.
I'm not sure how you were able to produce the Q-Q plot, since the examples you provided do not work: you do not specify the values of a and b, and you define f_func but call f_fun. Anyway, here are my answers to your questions:
How to pass parameters a and b to the functions? Just pass them as arguments to the functions.
How to perform this optimization task, meaning find a and b such that f(x) would approximate a normal distribution? The same way any optimization task is done: define a cost function, then minimize it.
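As an illustration of "define a cost function, then minimize it", here is a minimal Python sketch (stdlib only; all names are hypothetical). It inverts g(x) = x - a^b/x^b by bisection instead of uniroot, uses 1 - r² between the sorted f values and standard normal quantiles at plotting positions (i - 0.5)/n as one possible measure of distance from a straight Q-Q line, and replaces optim with a coarse grid search over (a, b). It assumes a > 0 and b > 0, so that x - a^b/x^b is increasing on x > 0 and each g value has exactly one positive root.

```python
import math
from statistics import NormalDist


def g_fun(x, a, b):
    return x - a ** b / x ** b


def f_from_g(y, a, b, iters=100):
    """Positive root x of g_fun(x, a, b) = y, found by bisection."""
    lo, hi = 1.0, 1.0
    while g_fun(lo, a, b) > y:  # g -> -inf as x -> 0+, so halving lo eventually brackets y
        lo /= 2.0
    while g_fun(hi, a, b) < y:  # g -> +inf as x -> inf
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g_fun(mid, a, b) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def qq_cost(sample):
    """1 - r^2 between the sorted sample and normal quantiles (0 = perfectly straight Q-Q line)."""
    n = len(sample)
    xs = sorted(sample)
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = sum(xs) / n, sum(qs) / n
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    if sxx == 0.0:
        return math.inf
    return 1.0 - sxq * sxq / (sxx * sqq)


def fit_ab(g_values, a_grid, b_grid):
    """Grid search for the (a, b) whose implied f values look most normal."""
    best = None
    for a in a_grid:
        for b in b_grid:
            cost = qq_cost([f_from_g(y, a, b) for y in g_values])
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best


g = [-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
     0.067114, 0.026415, 0.051062]
cost, a_hat, b_hat = fit_ab(g, [0.5, 1.0, 1.5, 2.0], [0.5, 1.0, 2.0, 3.0])
```

A grid keeps the sketch dependency-free; in R the same cost could be written as a function of c(a, b) and handed to optim.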
Here is the revised code: I have added a and b as parameters, removed the inverse function and incorporated it inside f_func, which can now take vector input so no need for a for loop.
g_fun <- function(x,a,b) {x - a^b/x^b}
f_func = function(y,a,b,lower = 0, upper = 2000){
  sapply(y, function(z) { uniroot(function(x) g_fun(x,a,b) - z, lower = lower, upper = upper)$root })
}
# g(x) values are known
g <- c(-0.016339, 0.029646, -0.0255258, 0.003352, -0.053258, -0.018971, 0.005172,
0.067114, 0.026415, 0.051062)
f <- f_func(g,1,1) # using a = 1 and b = 1
#[1] 0.9918427 1.0149329 0.9873386 1.0016774 0.9737270 0.9905320 1.0025893
#[8] 1.0341199 1.0132947 1.0258569
[1] 1.876408 1.880554 1.875578 1.878138 1.873094 1.876170 1.878304 1.884049
[9] 1.880256 1.882544
Now for the optimization part: it depends on what you mean by "f(x) would approximate a normal distribution". You can compare the mean squared error from the Q-Q line if you want. Also, since you say approximate, how close is good enough? You could use shapiro.test and keep searching until you find a p-value above 0.05, i.e. until normality is no longer rejected (beware that there may not be a solution).
[1] 0.9484821
cost <- function(x,y) -shapiro.test(f_func(g,x,y))$p.value  # negated, so minimizing the cost maximizes the p-value
Now that we have a cost function, how do we go about minimizing it? There are many different ways to do numerical optimization; take a look at the optim function.
This final line does not work, but without proper data and context this is as far as I can go. Hope this helps.
