Logistic Regression in Julia Returning Incorrect Results - julia

I am trying to use optimize from Julia's Optim package to estimate the vector beta = [beta_0, beta_1]' , but I am getting unreasonable results.
I've provided a minimum working example below, where the estimates come out as [27.04, -14.38] when the true values are [1, 1].
What am I doing wrong here?
Here is a minimum working example. It's my first one, so please let me know how it could be improved.
using Distributions
using Optim
using LinearAlgebra
import Random
Random.seed!(42)
# generate data
true_beta = [1; 1];
N=500;
X = [ones(N) rand(Normal(0,1), N)];
u = rand(Normal(0,1), N)
# generate the latent variable
y_star = X * true_beta + u;
# generate observed variable
y = Vector{Int64}(zeros(N));
for i in 1:N
    if y_star[i] >= 0
        y[i] = 1
    end
end
# (negative of) loglikelihood function
function loglike(beta::Vector{Float64})
    l = Vector{Float64}()
    pk = 1/(1+exp(-X[i,:]'*beta))
    lhood = -y[i,1]*log(pk) - (1-y[i,1])*log(1-pk)
    for i in 1:size(y,1)
        push!(l, lhood)
    end
    return sum(l)
end
# initial guess: ols
ols = inv(X'X)*X'*y;
# minimize negative loglikelihood
res = optimize(loglike, ols)
# save parameter estimates of beta
betahat = res.minimizer

The reason is that your function is not defined correctly. It should be:
function loglike(beta::Vector{Float64})
    l = Vector{Float64}()
    for i in 1:size(y,1)
        pk = 1/(1+exp(-X[i,:]'*beta))
        lhood = -y[i]*log(pk) - (1-y[i])*log(1-pk)
        push!(l, lhood)
    end
    return sum(l)
end
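As an aside, the same negative log-likelihood can be written without the explicit loop; this one-liner is algebraically equivalent (a sketch, reusing the X and y defined above):
negloglike(beta) = sum(log1p.(exp.(X * beta)) .- y .* (X * beta))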
And you can check that the result is correct by running:
using GLM
glm(@formula(y ~ x), (y = y, x = X[:, 2]), Binomial(), LogitLink())
Also notice that your data generating process is inconsistent with a logit model: you are adding normal noise, but you should add logistic noise. With normal noise the correct model is a probit. If you use it instead, e.g. like:
glm(@formula(y ~ x), (y = y, x = X[:, 2]), Binomial(), ProbitLink())
you will indeed recover both parameters to be around 1.
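If you want to see the same thing with a hand-rolled likelihood, here is a minimal probit sketch (assuming the X, y, Distributions, and Optim from the question are still in scope; negloglike_probit is just an illustrative name):
function negloglike_probit(beta)
    ll = 0.0
    for i in 1:length(y)
        xb = X[i, :]' * beta
        # observed ones contribute log Phi(xb), zeros contribute log(1 - Phi(xb))
        ll += y[i] == 1 ? logcdf(Normal(), xb) : logccdf(Normal(), xb)
    end
    return -ll
end
res_probit = optimize(negloglike_probit, zeros(2))
Optim.minimizer(res_probit)   # should be roughly [1, 1]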

You may not get estimated values close to [1, 1] since you have used a step function to generate y.
Also, as suggested by @Bogumił Kamiński, it is always better to use a tested function like glm from the GLM package. The GLM package provides not only estimates but a whole ecosystem around logistic regression, which is very useful for diagnosing the model.
glm produces the following output:
GeneralizedLinearModel
y ~ 1 + X
Coefficients:
───────────────────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|) Lower 95% Upper 95%
───────────────────────────────────────────────────────────────────────
(Intercept) 2.22148 0.193944 11.45 <1e-29 1.84136 2.6016
X 1.98917 0.223411 8.90 <1e-18 1.55129 2.42704
───────────────────────────────────────────────────────────────────────
and the estimates using a proper loglike function match:
julia> # save parameter estimates of beta
betahat = res.minimizer
2-element Vector{Float64}:
2.221486647653181
1.9891622925760526
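For completeness, once you have the fitted GLM object you also get the usual diagnostics essentially for free; a minimal sketch (assuming GLM is loaded as above; m is just an illustrative name):
m = glm(@formula(y ~ x), (y = y, x = X[:, 2]), Binomial(), LogitLink())
coeftable(m)   # estimates, standard errors, z statistics, confidence intervals
stderror(m)    # standard errors alone
predict(m)     # fitted probabilities
deviance(m)    # residual deviance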

Related

Problem with simple numerical estimation for MLE of multinomial in R

I am trying to set up a simple numerical MLE estimation of a multinomial distribution.
The multinomial has one constraint - all the cell probabilities need to add up to one.
Usually the way to have this constraint is to re-express one of the probabilities as (1 - sum of the others)
When I run this, however, I have a problem: during the optimization procedure, I may end up taking the logarithm of a negative value.
Any thoughts on how to fix this? I tried another optimization package (Rsolnp) and it worked, but I would like to make it work with base R's optim in order to avoid constrained/nonlinear optimization.
Here is my code (I know that I can get the result in this particular case analytically, but this is a toy example, my actual problem is bigger than this here).
set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))
N <- test_data
loglik_function <- function(theta){
  output <- -1*(N[1]*log(theta[1]) + N[2]*log(theta[2]) + N[3]*log(theta[3]) + N[4]*log(1 - sum(theta)))
  return(output)
}
startval <- rep(0.1, 3)
my_optim <- optim(startval, loglik_function, lower = 0.0001, upper = 0.9999, method = "L-BFGS-B")
Any thoughts or help would be very much appreciated. Thanks
Full heads-up: I know you asked about (constrained) ML estimation, but how about doing this the Bayesian way, à la Stan/rstan? I will remove this if it's not useful/missing the point.
The model is only a few lines of code.
library(rstan)
model_code <- "
data {
  int<lower=1> K;       // number of choices
  int<lower=0> y[K];    // observed choices
}
parameters {
  simplex[K] theta;     // simplex of probabilities, one for every choice
}
model {
  // Priors
  theta ~ cauchy(0, 2.5);   // weakly informative
  // Likelihood
  y ~ multinomial(theta);
}
generated quantities {
  real ratio;
  ratio = theta[1] / theta[2];
}
"
You can see how easy it is to implement the simplex constraint on the thetas using Stan's simplex data type, which declares a probability (unit) simplex of dimension K, where K denotes the number of parameters (here: choices).
Also note how we use the generated quantities code block, to calculate derived quantities (here ratio) based on the parameters (here theta[1] and theta[2]). Since we have access to the posterior distributions of all parameters, calculating the distribution of derived quantities is trivial.
We then fit the model to your test_data
fit <- stan(model_code = model_code, data = list(K = 4, y = test_data[, 1]))
and show a summary of the parameter estimates
summary(fit)$summary
# mean se_mean sd 2.5% 25%
#theta[1] 0.2379866 0.0002066858 0.01352791 0.2116417 0.2288498
#theta[2] 0.2620013 0.0002208638 0.01365478 0.2358731 0.2526111
#theta[3] 0.2452539 0.0002101333 0.01344665 0.2196868 0.2361817
#theta[4] 0.2547582 0.0002110441 0.01375618 0.2277589 0.2458899
#ratio 0.9116350 0.0012555320 0.08050852 0.7639551 0.8545142
#lp__ -1392.6941655 0.0261794859 1.19050097 -1395.8297494 -1393.2406198
# 50% 75% 97.5% n_eff Rhat
#theta[1] 0.2381541 0.2472830 0.2645305 4283.904 0.9999816
#theta[2] 0.2615782 0.2710044 0.2898404 3822.257 1.0001742
#theta[3] 0.2448304 0.2543389 0.2722152 4094.852 1.0007501
#theta[4] 0.2545946 0.2638733 0.2822803 4248.632 0.9994449
#ratio 0.9078901 0.9648312 1.0764747 4111.764 0.9998184
#lp__ -1392.3914998 -1391.8199477 -1391.3274885 2067.937 1.0013440
as well as a plot showing point estimates and CIs for the theta parameters
plot(fit, pars = "theta")
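Incidentally, if you would rather stay with a plain unconstrained optimiser, the simplex constraint can also be baked into the parameterisation itself, e.g. via a softmax over unconstrained values. A minimal sketch of that idea (written in Julia, with made-up counts, just to illustrate the reparameterisation):
using Optim
counts = [278, 251, 235, 236]                     # hypothetical counts summing to 1000
tosimplex(z) = (p = exp.([0.0; z]); p ./ sum(p))  # 3 free parameters -> 4 probabilities on the simplex
negloglik(z) = -sum(counts .* log.(tosimplex(z)))
res = optimize(negloglik, zeros(3), NelderMead())
tosimplex(Optim.minimizer(res))                   # ≈ counts ./ 1000, and sums to exactly 1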
Update: Constrained ML estimation using maxLik
You can in fact implement constrained ML estimation using methods provided by the maxLik library. I found it a bit "fiddly", because convergence seems to be quite sensitive to changes in the starting values and the optimisation method used.
For what it's worth, here is a reproducible example:
library(maxLik)
x <- test_data[, 1]
Define the log-likelihood function for a multinomial distribution; I've included an if statement here to prevent theta < 0 cases from throwing an error.
loglik <- function(theta, x)
if (all(theta > 0)) sum(dmultinom(x, prob = theta, log = TRUE)) else 0
I use the Nelder-Mead optimisation method here to find the maximum of the log-likelihood function. The important bit is the constraints argument, which implements an equality constraint of the form A theta + B = 0; here A = (1, 1, 1, 1) and B = -1, i.e. theta[1] + theta[2] + theta[3] + theta[4] - 1 = 0. See ?maxNM for details and examples.
res <- maxNM(
loglik,
start = rep(0.25, length(x)),
constraints = list(
eqA = matrix(rep(1, length(x)), ncol = length(x)),
eqB = -1),
x = x)
We can inspect the results
summary(res)
--------------------------------------------
Nelder-Mead maximization
Number of iterations: 111
Return code: 0
successful convergence
Function value: -10.34576
Estimates:
estimate gradient
[1,] 0.2380216 -0.014219040
[2,] 0.2620168 0.012664714
[3,] 0.2450181 0.002736670
[4,] 0.2550201 -0.002369234
Constrained optimization based on SUMT
Return code: 1
penalty close to zero
1 outer iterations, barrier value 5.868967e-09
--------------------------------------------
and confirm that indeed the sum of the estimates equals 1 (within accuracy)
sum(res$estimate)
#[1] 1.000077
Sample data
set.seed(1234)
test_data <- rmultinom(n = 1, size = 1000, prob = rep(1/4, 4))

Spatial Autoregressive Maximum Likelihood in Julia: Multiple Parameters

I have the following code that evaluates the likelihood function for a spatial autoregressive model in Julia, like so:
function like_sar2(betas,rho,sige,y,x,W)
    n = length(y)
    A = speye(n) - rho*W
    e = y - x*betas - rho*sparse(W)*y
    epe = e'*e
    tmp2 = 1/(2*sige)
    llike = -(n/2)*log(pi) - (n/2)*log(sige) + log(det(A)) - tmp2*epe
end
I am trying to maximize this function but I'm not sure how to pass the different sized function inputs so that the Optim.jl package will accept it. I have tried the following:
optimize(like_sar2,[betas;rho;sige;y;x;W],BFGS())
and
optimize(like_sar2,tuple(betas,rho,sige,y,x,W),BFGS())
In the first case, the matrix in brackets does not conform due to dimension mismatch and in the second, the Optim package doesn't allow tuples.
I'd like to try and maximize this likelihood function so that it can return the numerical Hessian matrix (using the Optim options) so that I can compute t-statistics for the parameters.
If there is an easier way to obtain the numerical Hessian for such a function I'd use that, but it appears that packages like ForwardDiff only accept a single input vector.
Any help would be greatly appreciated!
Not 100% sure I correctly understand how your function works, but it seems to me like you're using the likelihood to estimate the coefficient vector beta, with the other input variables fixed. The way to do this would be to amend the function as follows:
using Optim
# Initialize some parameters
coeffs = rand(10)
rho = 0.1
ys = rand(10)
xs = rand(10,10)
Wmat = rand(10,10)
sige=0.5
# Construct likelihood with parameters fixed at pre-defined values
function like_sar2(β::Vector{Float64},ρ=rho,σε=sige,y=ys,x=xs,W=Wmat)
    n = length(y)
    A = speye(n) - ρ*W
    ε = y - x*β - ρ*sparse(W)*y
    epe = ε'*ε
    tmp2 = 1/(2*σε)
    llike = -(n/2)*log(π) - (n/2)*log(σε) + log(det(A)) - tmp2*epe
end
# Optimize, with starting value zero for all beta coefficients
optimize(like_sar2, zeros(10), NelderMead())
If you need to optimize more than your beta parameters (in the general autoregressive models I've used, the autocorrelation parameter is often estimated jointly with the other coefficients), you can do this by packing it into the beta vector and unpacking it within the function, like so:
append!(coeffs,rho)
function like_sar3(coeffs::Vector{Float64},σε=sige,y=ys,x=xs,W=Wmat)
    β = coeffs[1:10]; ρ = coeffs[11]
    n = length(y)
    A = speye(n) - ρ*W
    ε = y - x*β - ρ*sparse(W)*y
    epe = ε'*ε
    tmp2 = 1/(2*σε)
    llike = -(n/2)*log(π) - (n/2)*log(σε) + log(det(A)) - tmp2*epe
end
The key is that you end up with one vector of inputs to pass into your function.
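Once the parameters live in a single vector, the numerical Hessian you want for t-statistics is also easy to get, e.g. with ForwardDiff. Here is a minimal self-contained sketch with a toy objective standing in for the negative log-likelihood (note that for automatic differentiation the argument must not be annotated as Vector{Float64}):
using Optim, ForwardDiff, LinearAlgebra
toy_negloglike(θ) = 0.5 * sum((θ .- [1.0, 2.0, 3.0]).^2)   # stand-in for a negative log-likelihood
res   = optimize(toy_negloglike, zeros(3), BFGS())
θhat  = Optim.minimizer(res)
H     = ForwardDiff.hessian(toy_negloglike, θhat)   # Hessian at the optimum
se    = sqrt.(diag(inv(H)))                         # asymptotic standard errors
tstat = θhat ./ se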

Hessian Matrix in Maximum Likelihood - Gauss vs. R

I am struggling with the following problem. In a nutshell: two different software packages (Gauss by Aptech and R) yield totally different Hessian matrices in a maximum likelihood procedure. I am using the same procedure (BFGS), the exact same data, the same maximum likelihood formula (it is a very simple logit model) with the exact same starting values and, confusingly, I get the same results for the parameters and the log-likelihood. Only the Hessian matrices differ across the two programs, and therefore the estimated standard errors and the statistical inference differ.
The deviation is not large in this specific example, but every added complication of the model increases the difference, so when I try to estimate my final model, the two programs yield results that are completely off from one another.
Does anyone know how the two programs differ in the way they compute the Hessian, and possibly the right way to obtain the same results?
EDIT: In the R (Gauss) code, the matrix X (alt) holds the independent variables: a two-column matrix whose first column is entirely ones and whose second column contains the subjects' responses. The vector y (itn) is the dependent variable, a single column with the subjects' responses. The example (R code and data set) has been taken from http://www.polsci.ucsb.edu/faculty/glasgow/ps206/ps206.html, just as an example to reproduce and isolate the problem.
I have attached both codes (Gauss and R syntax) and outputs.
Any help would be greatly appreciated. Thank you :)
Gauss:
start={ 0.95568840 , -0.20459156 };
library maxlik,pgraph;
maxset;
_max_Algorithm = 2;
_max_Diagnostic = 1;
{betaa,f,g,cov,ret} = maxlik(XMAT,0,&ll,start);
call maxprt(betaa,f,g,cov,ret);
print _max_FinalHess;
proc ll(b,XMAT);
local exb, probo, logexb, yn, logexbn, yt, ynt, logl;
exb = EXP(alt*b);
//print exb;
probo = exb./(1+exb);
logexb = ln(probo);
yn = 1 - itn;
logexbn = ln(1 - probo);
yt = itn';
ynt = yn';
logl = (yt*logexb + ynt*logexbn);
print(logl);
retp(logl);
endp;
R:
startv <- c(0.95568840,-0.20459156)
logit.lf <- function(beta) {
  exb <- exp(X%*%beta)
  prob1 <- exb/(1+exb)
  logexb <- log(prob1)
  y0 <- 1 - y
  logexb0 <- log(1 - prob1)
  yt <- t(y)
  y0t <- t(y0)
  logl <- -(yt%*%logexb + y0t%*%logexb0)
  return(logl)
}
logitmodel <- optim(startv, logit.lf, method="BFGS", control=list(trace=TRUE, REPORT=1), hessian=TRUE)
logitmodel$hessian
Gauss Output:
return code = 0
normal convergence
Mean log-likelihood -0.591820
Number of cases 1924
Covariance matrix of the parameters computed by the following method:
Inverse of computed Hessian
Parameters Estimates Std. err. Est./s.e. Prob. Gradient
------------------------------------------------------------------
P01 2.1038 0.2857 7.363 0.0000 0.0000
P02 -0.9984 0.2365 -4.221 0.0000 0.0000
Gauss Hessian:
0.20133256 0.23932571
0.23932571 0.29377761
R Output:
initial value 1153.210839
iter 2 value 1148.015749
iter 3 value 1141.420328
iter 4 value 1138.668174
iter 5 value 1138.662148
iter 5 value 1138.662137
iter 5 value 1138.662137
final value 1138.662137
converged
Coeff. Std. Err. z p value
[1,] 2.10379869 0.28570765 7.3634665 1.7919000e-13
[2,] -0.99837955 0.23651060 -4.2212889 2.4290942e-05
R Hessian:
[,1] [,2]
[1,] 387.34106 460.45379
[2,] 460.45379 565.24412
They are just scaled differently: the GAUSS numbers are around 1924 times smaller than the R numbers, and 1924 is exactly the number of cases. GAUSS's maxlik works with the mean log-likelihood (note the "Mean log-likelihood" line in its output), while your R code hands optim the total negative log-likelihood, so the two Hessians differ by a factor of N. The standard errors still agree because the covariance GAUSS reports corresponds to the inverse of N times its Hessian. Keeping the objective on the mean scale also keeps the numbers in a smaller range, which helps numerical stability.
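A quick sanity check on the two reported Hessians (a sketch in Julia, using the numbers printed above; small discrepancies are just rounding of the displayed digits):
using LinearAlgebra
H_gauss = [0.20133256 0.23932571; 0.23932571 0.29377761]
H_r     = [387.34106 460.45379; 460.45379 565.24412]
N       = 1924
H_r ./ H_gauss                  # every entry is roughly N
sqrt.(diag(inv(H_r)))           # ≈ 0.2857, 0.2365: the standard errors both programs report
sqrt.(diag(inv(H_gauss) ./ N))  # the same standard errors, recovered from the GAUSS Hessian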

Maximum Likelihood Estimation for three-parameter Weibull distribution in r

I want to estimate the scale, shape and threshold parameters of a 3p Weibull distribution.
What I've done so far is the following:
Referring to this post, Fitting a 3 parameter Weibull distribution in R,
I've used the functions
EPS = sqrt(.Machine$double.eps) # "epsilon" for very small numbers
llik.weibull <- function(shape, scale, thres, x)
{
  sum(dweibull(x - thres, shape, scale, log=T))
}
thetahat.weibull <- function(x)
{
  if(any(x <= 0)) stop("x values must be positive")
  toptim <- function(theta) -llik.weibull(theta[1], theta[2], theta[3], x)
  mu = mean(log(x))
  sigma2 = var(log(x))
  shape.guess = 1.2 / sqrt(sigma2)
  scale.guess = exp(mu + (0.572 / shape.guess))
  thres.guess = 1
  res = nlminb(c(shape.guess, scale.guess, thres.guess), toptim, lower=EPS)
  c(shape=res$par[1], scale=res$par[2], thres=res$par[3])
}
to "pre-estimate" my Weibull parameters, such that I can use them as initial values for the argument "start" in the "fitdistr" function of the MASS-Package.
You might ask why I want to estimate the parameters twice... reason is that I need the variance-covariance-matrix of the estimates which is also estimated by the fitdistr function.
EXAMPLE:
set.seed(1)
thres <- 450
dat <- rweibull(1000, 2.78, 750) + thres
pre_mle <- thetahat.weibull(dat)
my_wb <- function(x, shape, scale, thres) {
dweibull(x - thres, shape, scale)
}
ml <- fitdistr(dat, densfun = my_wb, start = list(shape = round(pre_mle[1], digits = 0), scale = round(pre_mle[2], digits = 0),
thres = round(pre_mle[3], digits = 0)))
ml
> ml
      shape         scale         thres
   2.942548    779.997177    419.996196
 ( 0.152129) ( 32.194294) ( 28.729323)
> ml$vcov
shape scale thres
shape 0.02314322 4.335239 -3.836873
scale 4.33523868 1036.472551 -889.497580
thres -3.83687258 -889.497580 825.374029
This works quite well for cases where the shape parameter is above 1. Unfortunately, my approach also has to deal with cases where the shape parameter is smaller than 1.
The reason why this is not possible for shape parameters smaller than 1 is described here: http://www.weibull.com/hotwire/issue148/hottopics148.htm
In Case 1, "All three parameters are unknown", the following is said:
"Define the smallest failure time of ti to be tmin. Then when γ → tmin, ln(tmin - γ) → -∞. If β is less than 1, then (β - 1)ln(tmin - γ) goes to +∞ . For a given solution of β, η and γ, we can always find another set of solutions (for example, by making γ closer to tmin) that will give a larger likelihood value. Therefore, there is no MLE solution for β, η and γ."
This makes a lot of sense. For this very reason I want to do it the way they described it on this page.
"In Weibull++, a gradient-based algorithm is used to find the MLE solution for β, η and γ. The upper bound of the range for γ is arbitrarily set to be 0.99 of tmin. Depending on the data set, either a local optimal or 0.99tmin is returned as the MLE solution for γ."
I want to set a feasible interval for gamma (in my code called 'thres') such that the solution is between (0, .99 * tmin).
Does anyone have an idea how to solve this problem?
In the fitdistr function there seems to be no option for doing a constrained MLE, constraining just one parameter.
Another way to go could be estimating the asymptotic variance via the outer product of the score vectors. The score vectors could be taken from the function thetahat.weibull(x) used above. But calculating the outer product manually (without a ready-made function) seems to be very time consuming and does not solve the problem of the constrained ML estimation.
Best regards,
Tim
It's not too hard to set up a constrained MLE. I'm going to do this in bbmle::mle2; you could also do it in stats4::mle, but bbmle has some additional features.
The larger issue is that it's theoretically difficult to define the sampling variance of an estimate when it's on the boundary of the allowed space; the theory behind Wald variance estimates breaks down. You can still calculate confidence intervals by likelihood profiling ... or you could bootstrap. I ran into a variety of optimization issues when doing this ... I haven't really thought about whether there are specific reasons for them here.
Reformat three-parameter Weibull function for mle2 use (takes x as first argument, takes log as an argument):
dweib3 <- function(x, shape, scale, thres, log=TRUE) {
dweibull(x - thres, shape, scale, log=log)
}
Starting function (slightly reformatted):
weib3_start <- function(x) {
mu <- mean(log(x))
sigma2 <- var(log(x))
logshape <- log(1.2 / sqrt(sigma2))
logscale <- mu + (0.572 / logshape)
logthres <- log(0.5*min(x))
list(logshape = logshape, logsc = logscale, logthres = logthres)
}
Generate data:
set.seed(1)
dat <- data.frame(x=rweibull(1000, 2.78, 750) + 450)
Fit model: I'm fitting the parameters on the log scale for convenience and stability, but you could use boundaries at zero as well.
tmin <- log(0.99*min(dat$x))
library(bbmle)
m1 <- mle2(x~dweib3(exp(logshape),exp(logsc),exp(logthres)),
data=dat,
upper=c(logshape=Inf,logsc=Inf,
logthres=tmin),
start=weib3_start(dat$x),
method="L-BFGS-B")
vcov(m1), which should normally provide a variance-covariance estimate (unless the estimate is on the boundary, which is not the case here) gives NaN values ... not sure why without more digging.
library(emdbook)
tmpf <- function(x,y) m1@minuslogl(logshape=x,
    logsc=coef(m1)["logsc"],
    logthres=y)
tmpf(1.1,6)
s1 <- curve3d(tmpf,
xlim=c(1,1.2),ylim=c(5.9,tmin),sys3d="image")
with(s1,contour(x,y,z,add=TRUE))
h <- lme4:::hessian(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1))
vv <- solve(h)
diag(vv) ## [1] 0.002672240 0.001703674 0.004674833
(se <- sqrt(diag(vv))) ## standard errors
## [1] 0.05169371 0.04127558 0.06837275
cov2cor(vv)
## [,1] [,2] [,3]
## [1,] 1.0000000 0.8852090 -0.8778424
## [2,] 0.8852090 1.0000000 -0.9616941
## [3,] -0.8778424 -0.9616941 1.0000000
This is the variance-covariance matrix of the log-scaled variables. If you want to convert to the variance-covariance matrix on the original scale, you need to scale entry (i, j) by exp(x_i)*exp(x_j), i.e. by the derivatives of the back-transformation exp(x) (the delta method).
outer(exp(coef(m1)),exp(coef(m1))) * vv
## logshape logsc logthres
## logshape 0.02312803 4.332993 -3.834145
## logsc 4.33299307 1035.966372 -888.980794
## logthres -3.83414498 -888.980794 824.831463
I don't know why this doesn't work with numDeriv, so I would be very careful with the variance estimates above. (Maybe it's too close to the boundary for Richardson extrapolation to work?)
library(numDeriv)
hessian(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1))
grad(function(x) do.call(m1@minuslogl,as.list(x)),coef(m1)) ## looks OK
vcov(m1)
The profiles look OK ... (we have to supply std.err because the Hessian isn't invertible)
pp <- profile(m1,std.err=c(0.01,0.01,0.01))
par(las=1,bty="l",mfcol=c(1,3))
plot(pp,show.points=TRUE)
confint(pp)
## 2.5 % 97.5 %
## logshape 0.9899645 1.193571
## logsc 6.5933070 6.755399
## logthres 5.8508827 6.134346
Alternatively, we can do this on the original scale ... one possibility would be to use the log-scaling to fit, then refit starting from those parameters on the original scale.
wstart <- as.list(exp(unlist(weib3_start(dat$x))))
names(wstart) <- gsub("log","",names(wstart))
m2 <- mle2(x~dweib3(shape,sc,thres),
data=dat,
lower=c(shape=0.001,sc=0.001,thres=0.001),
upper=c(shape=Inf,sc=Inf,
thres=exp(tmin)),
start=wstart,
method="L-BFGS-B")
vcov(m2)
## shape sc thres
## shape 0.02312399 4.332057 -3.833264
## sc 4.33205658 1035.743511 -888.770787
## thres -3.83326390 -888.770787 824.633714
all.equal(unname(coef(m2)),unname(exp(coef(m1))),tol=1e-4)
About the same as the values above.
We can fit with a small shape if we are a little more careful to bound the parameters, but now we end up on the boundary for the threshold, which will cause lots of problems for the variance calculations.
set.seed(1)
dat <- data.frame(x = rweibull(1000, .53, 365) + 100)
tmin <- log(0.99 * min(dat$x))
m1 <- mle2(x ~ dweib3(exp(logshape), exp(logsc), exp(logthres)),
lower=c(logshape=-10,logscale=0,logthres=0),
upper = c(logshape = 20, logsc = 20, logthres = tmin),
data = dat,
start = weib3_start(dat$x), method = "L-BFGS-B")
For censored data, you need to replace dweibull with pweibull; see Errors running Maximum Likelihood Estimation on a three parameter Weibull cdf for some hints.
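The idea is that uncensored observations contribute the log density while right-censored ones contribute the log survival probability. A minimal sketch of a censored three-parameter Weibull negative log-likelihood (written in Julia with Distributions; the names and data are made up for illustration):
using Distributions
function negloglik_censored(shape, scale, thres, t, iscensored)
    d = Weibull(shape, scale)
    ll = 0.0
    for (ti, c) in zip(t, iscensored)
        # censored: log survival beyond ti; uncensored: log density at ti
        ll += c ? logccdf(d, ti - thres) : logpdf(d, ti - thres)
    end
    return -ll
end
t_raw = rand(Weibull(1.5, 10.0), 200) .+ 5.0   # hypothetical data with threshold 5
t_obs = min.(t_raw, 20.0)                      # values above 20 are right-censored
cens  = t_raw .> 20.0
negloglik_censored(1.5, 10.0, 5.0, t_obs, cens)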
Another possible solution is to do Bayesian inference. Using scale priors on the shape and scale parameters and a uniform prior on the location parameter, you can easily run Metropolis-Hastings as follows. It might be advisable to reparameterize in terms of log(shape), log(scale) and log(y_min - location), because the posterior for some of the parameters becomes strongly skewed, in particular for the location parameter. Note that the output below shows the posterior for the back-transformed parameters.
library(MCMCpack)
logposterior <- function(par,y) {
gamma <- min(y) - exp(par[3])
sum(dweibull(y-gamma,exp(par[1]),exp(par[2]),log=TRUE)) + par[3]
}
y <- rweibull(100,shape=.8,scale=10) + 1
chain0 <- MCMCmetrop1R(logposterior, rep(0,3), y=y, V=.01*diag(3))
chain <- MCMCmetrop1R(logposterior, rep(0,3), y=y, V=var(chain0))
plot(exp(chain))
summary(exp(chain))
This produces the following output
#########################################################
The Metropolis acceptance rate was 0.43717
#########################################################
Iterations = 501:20500
Thinning interval = 1
Number of chains = 1
Sample size per chain = 20000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
[1,] 0.81530 0.06767 0.0004785 0.001668
[2,] 10.59015 1.39636 0.0098738 0.034495
[3,] 0.04236 0.05642 0.0003990 0.001174
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
var1 0.6886083 0.768054 0.81236 0.8608 0.9498
var2 8.0756210 9.637392 10.50210 11.4631 13.5353
var3 0.0003397 0.007525 0.02221 0.0548 0.1939

Maximum Likelihood estimate of ARIMA-not matching the value computed using arima()

I am new to R. I am using the optim function to obtain the maximum likelihood estimate of an ARIMA model, assuming the residuals are normally distributed. I have differenced the data once to make it stationary. I have written the following to calculate the likelihood:
kings <- scan("http://robjhyndman.com/tsdldata/misc/kings.dat",skip=3)
arima1<-function(a=length(kings))
{e<-array(0,dim=a-2);
e[1:2]=0
y=diff(kings)
likelihood<-function(AR,e,y)
{for(i in 3:41)
{e[i]<-sum(y[i],-AR[1],-(AR[2]*y[i-1]),-(AR[3]*y[i-2]),-(AR[4]*e[i-1]),-(AR[5]*e[i-2]))
}
-sum(-(a-1)*(log(AR[6]*(2*22/7)^.5)),-(sum(e^2)/(2*(AR[6])^2)))
}
optim(par<-c(0,0,1,0,1,14),likelihood,y=y,e=e,control=list(maxit=1000))
}
arima1()
However, the parameters obtained by using the function arima(y, order = c(2,0,2)) are different
from those obtained from the code above. Could you identify the problem in the logic of the code?
Thanks in advance.
Can't make sense of your log-likelihood function. Additionally, I think you are expecting too much from the arima and optim functions. Here are a few comments:
1. For your purposes (learning) I would generate a series using the arima.sim function. This will allow you to know the values that optim and arima are supposed to recover.
2. The arima function gets it wrong sometimes. In two consecutive runs, these are the results I got on my computer:
y = arima.sim(n=1000,list(ar=c(0.6, -0.2), ma=c(-0.7, 0.5)),mean=0.3)
arima(y,order=c(2,0,2))$coef
ar1:0.3489 ar2:0.1060 ma1:-0.4318 ma2:0.1887 intercept:0.4919
y = arima.sim(n=1000,list(ar=c(0.6, -0.2), ma=c(-0.7, 0.5)),mean=0.3)
arima(y,order=c(2,0,2))$coef
ar1:0.6100 ar2:-0.1672 ma1:-0.6663 ma2:0.4816 intercept:0.4199
Notice that the ar2 for the first result doesn't even have the same sign and is crappy all around, but the second is a pretty nice fit. Same simulation parameters, different results.
3. Notice that the fewer values you have in the time series, the harder it is to get accurate results. Here are some runs on my computer.
y = arima.sim(n=42,list(ar=c(0.6, -0.2), ma=c(-0.7, 0.5)),mean=0.3)
arima(y,order=c(2,0,2))$coef
ar1:-1.1102 ar2:-0.4528 ma1:1.2052 ma2:0.9999 intercept:0.2353260
y = arima.sim(n=42,list(ar=c(0.6, -0.2), ma=c(-0.7, 0.5)),mean=0.3)
arima(y,order=c(2,0,2))$coef
ar1:0.8623 ar2:-0.3935 ma1:-1.0745 ma2:0.9999 intercept:0.4109
y = arima.sim(n=42,list(ar=c(0.6, -0.2), ma=c(-0.7, 0.5)),mean=0.3)
arima(y,order=c(2,0,2))$coef
ar1:-0.2749 ar2:0.4170 ma1:-0.0078 ma2:-0.0586 intercept:0.3737
All of these results are pretty sucky.
4. I found your code a bit hard to read/understand. Here is what I ran on my machine (with results).
library(stats4)
kings <- scan("http://robjhyndman.com/tsdldata/misc/kings.dat",skip=3)
y = diff(kings)
Y_0 = y[c(-1,-2)] # this is y(t)
Y_1 = y[c(-1,-length(y))] # this is y(t-1)
Y_2 = y[1:(length(y)-2)] # this is y(t-2)
logARMA22 <- function ( ar1=0.1, ar2=-0.2, ma1=-0.7, ma2=0.1, alpha=0.42, sigma=14)
{
  E_0 = array(0, dim=length(Y_0))
  E_1 = array(0, dim=length(Y_0))
  E_2 = array(0, dim=length(Y_0))
  for ( i in 1:length(E_0) )
  {
    E_0[i] = Y_0[i] - ar1*Y_1[i] - ar2*Y_2[i] -
      ma1*E_1[i] - ma2*E_2[i] - alpha
    if ( (i+1) <= length(E_1) )
      E_1[i+1] = E_0[i]
    if ( (i+2) <= length(E_2) )
      E_2[i+2] = E_0[i]
  }
  # e^2
  E2 = (Y_0-alpha-ar1*Y_1-ar2*Y_2-ma1*E_1-ma2*E_2)^2
  res = suppressWarnings( - sum(log( (1/(sqrt(2*pi)*sigma)) * exp(-(E2)/(2*sigma^2)))) )
  return(res)
}
print(arima(y, order=c(2,0,2))$coef)
print(mle(logARMA22))
#Read 42 items
# ar1 ar2 ma1 ma2 intercept
# 0.087452103 -0.122367077 -0.740473745 0.006492026 0.372799202
#
#Call:
#mle(minuslogl = logARMA22)
#
#Coefficients:
# ar1 ar2 ma1 ma2 alpha sigma
#-0.07772158 -0.48516681 -0.50804965 0.03678942 0.10385160 15.33830148
Note that mle is just a front end for optim (so you are actually calling optim here). The results are not equal, probably due to the size of the sample. The log-likelihood function in this case was taken from Henrik Madsen (2008), Time Series Analysis, equation 6.72 (page 167). This link will probably be helpful as well.
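Spelled out, the conditional Gaussian log-likelihood that logARMA22 implements (with the first two innovations set to zero, as in the code above) is
logL = -n*log(sigma*sqrt(2*pi)) - (1/(2*sigma^2)) * sum(e_t^2), where e_t = y_t - alpha - ar1*y_{t-1} - ar2*y_{t-2} - ma1*e_{t-1} - ma2*e_{t-2},
and mle/optim minimise its negative.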
Hope it helps
The arima() function does not compute the likelihood the same way your code does. The exact likelihood is computed via a state-space representation of the ARIMA process, and the innovations and their variance are found by a Kalman filter, rather than by the conditional recursion you wrote down.
