Understanding different results of optim() and lm()

Given:
set.seed(1001)
outcome<-rnorm(1000,sd = 1)
covariate<-rnorm(1000,sd = 1)
log-likelihood of normal pdf:
loglike <- function(par, outcome, covariate){
cov <- as.matrix(cbind(1, covariate))
xb <- cov * par
(- 1/2* sum((outcome - xb)^2))
}
optimize:
opt.normal <- optim(par = 0.1,fn = loglike,outcome=outcome,cov=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
However, I get different results when running a simple OLS regression. Maximizing the log-likelihood and minimizing the sum of squared residuals should lead to the same estimates, so I suppose there is something wrong with my optimization.
summary(lm(outcome~covariate))

Umm several things... Here's a proper working likelihood function (with names x and y):
loglike =
function(par,x,y){cov = cbind(1,x); xb = cov %*% par;(-1/2)*sum((y-xb)^2)}
Note use of matrix multiplication operator.
You were also running it with only one par value, so it was broken in two ways: your loglike was doing element-by-element multiplication instead of a matrix product, and optim() could only return one estimate where the model needs two (intercept and slope).
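To make the difference between the two operators concrete, here is a small sketch on made-up numbers (the names design, par and fitted are only for illustration). With `*`, R recycles the parameter vector over the matrix element by element and returns a matrix of the same shape; with `%*%`, it returns one fitted value per row.

```r
# Toy design matrix: a column of ones plus one covariate (hypothetical values)
x <- c(2, 3, 4)
design <- cbind(1, x)
par <- c(10, 1)                  # intercept 10, slope 1

elementwise <- design * par      # recycles par element by element, stays 3 x 2
fitted <- design %*% par         # matrix product, 3 x 1: 10 + 1 * x

dim(elementwise)
drop(fitted)
```

Only the second form gives the linear predictor that the likelihood needs.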
Now compare optimiser parameters with lm coefficients:
opt.normal <- optim(par = c(0.1,0.1),fn = loglike,y=outcome,x=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
opt.normal$par
[1] 0.02148234 -0.09124299
summary(lm(outcome~covariate))$coeff
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02148235 0.03049535 0.7044466 0.481319029
covariate -0.09124299 0.03049819 -2.9917515 0.002842011
shazam.
Helpful hints: create data that you know the right answer for, e.g. x=1:10; y=rnorm(10)+(1:10), so you know the slope is 1 and the intercept 0. Then you can easily see which of your things are in the right ballpark. Also, run your loglike function on its own to see if it behaves as you expect.
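A minimal sketch of that hint, assuming the corrected loglike from above (the seed and the names fit and ols are arbitrary choices for illustration): simulate data with slope 1 and intercept 0, then check that optim() and lm() land in the same place.

```r
set.seed(42)
x <- 1:10
y <- rnorm(10) + x               # true intercept 0, true slope 1

loglike <- function(par, x, y) {
  xb <- cbind(1, x) %*% par      # matrix product: one fitted value per row
  -0.5 * sum((y - xb)^2)
}

# Run the function on its own first to see that it returns a single number
loglike(c(0, 1), x = x, y = y)

fit <- optim(par = c(0, 0), fn = loglike, x = x, y = y,
             method = "BFGS", control = list(fnscale = -1))
ols <- coef(lm(y ~ x))
rbind(optim = fit$par, lm = ols)
```

Both rows should be close to each other and, up to sampling noise, close to (0, 1).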

Maybe you will find it useful to see the difference between these two methods in my code. I programmed it the following way.
data.matrix <- as.matrix(hprice1[,c("assess","bdrms","lotsize","sqrft","colonial")])
loglik <- function(p,z){
beta <- p[1:5]
sigma <- p[6]
y <- log(data.matrix[,1])
eps <- (y - beta[1] - z[,2:5] %*% beta[2:5])
-nrow(z)*log(sigma)-0.5*sum((eps/sigma)^2)
}
p0 <- c(5,0,0,0,0,2)
m <- optim(p0,loglik,method="BFGS",control=list(fnscale=-1,trace=10),hessian=TRUE,z=data.matrix)
rbind(m$par,sqrt(diag(solve(-m$hessian))))
And for the lm() method I find this
m.ols <- lm(log(assess)~bdrms+lotsize+sqrft+colonial,data=hprice1)
summary(m.ols)
Also, if you would like to estimate the elasticity of the assessed value with respect to lotsize, or calculate a 95% confidence interval for this parameter, you could use the following:
elasticity.at.mean <- mean(hprice1$lotsize) * m$par[3]
var.coefficient <- solve(-m$hessian)[3,3]
var.elasticity <- mean(hprice1$lotsize)^2 * var.coefficient
# upper bound
elasticity.at.mean + qnorm(0.975)* sqrt(var.elasticity)
# lower bound
elasticity.at.mean + qnorm(0.025)* sqrt(var.elasticity)
A simpler example of the optim method is given below for a binomial distribution.
loglik1 <- function(p,n,n.f){
n.f*log(p) + (n-n.f)*log(1-p)
}
m <- optim(c(pi=0.5),loglik1,control=list(fnscale=-1),
n=73,n.f=18)
m
m <- optim(c(pi=0.5),loglik1,method="BFGS",hessian=TRUE,
control=list(fnscale=-1),n=73,n.f=18)
m
pi.hat <- m$par
numerical calculation of s.d
rbind(pi.hat=pi.hat,sd.pi.hat=sqrt(diag(solve(-m$hessian))))
analytical
rbind(pi.hat=18/73,sd.pi.hat=sqrt((pi.hat*(1-pi.hat))/73))
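As a side note, with a single parameter the default Nelder-Mead method in optim() warns that one-dimensional optimization is unreliable and points to "Brent" or optimize(). A sketch of the same binomial problem with optimize(), assuming a search interval of (0.001, 0.999) chosen here only to stay away from log(0):

```r
loglik1 <- function(p, n, n.f) {
  n.f * log(p) + (n - n.f) * log(1 - p)
}

# One-dimensional maximisation over a bounded interval
res <- optimize(loglik1, interval = c(0.001, 0.999),
                n = 73, n.f = 18, maximum = TRUE)

c(numerical = res$maximum, analytical = 18 / 73)
```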
Or this code for the normal distribution.
loglik1 <- function(p,z){
mu <- p[1]
sigma <- p[2]
-(length(z)/2)*log(sigma^2) - sum(z^2)/(2*sigma^2) +
(mu*sum(z)/sigma^2) - (length(z)*mu^2)/(2*sigma^2)
}
m <- optim(c(mu=0,sigma2=0.1),loglik1,
control=list(fnscale=-1),z=aex)

Related

Fit Exponentially modified Gaussian to 2D data

I am trying to fit an exponentially modified Gaussian (as in equation (1) of https://en.wikipedia.org/wiki/Exponentially_modified_Gaussian_distribution) to my 2D (x, y) data in R.
My data are:
x <- c(1.13669371604919, 1.14107275009155, 1.14545404911041, 1.14983117580414,
1.15421032905579, 1.15859162807465, 1.16296875476837, 1.16734790802002,
1.17172694206238, 1.17610621452332, 1.18048334121704, 1.18486452102661,
1.18924164772034, 1.19362080097198, 1.19800209999084, 1.20237922668457,
1.20675826072693, 1.21113955974579, 1.21551668643951, 1.21989583969116,
1.22427713871002, 1.22865414619446, 1.2330334186554, 1.23741245269775,
1.24178957939148, 1.24616885185242, 1.25055003166199, 1.25492715835571,
1.25930631160736, 1.26368761062622, 1.26806473731995, 1.2724437713623
)
y <- c(42384.03125, 65262.62890625, 235535.828125, 758616, 1691651.75,
3956937.25, 8939261, 20311304, 41061724, 65143896, 72517440,
96397368, 93956264, 87773568, 82922064, 67289832, 52540768, 50410896,
35995212, 27459486, 14173627, 12645145, 10069048, 4290783.5,
2999174.5, 2759047.5, 1610762.625, 1514802, 958150.6875, 593638.6875,
368925.8125, 172826.921875)
The function I am trying to fit and the value I am trying to minimize for optimization:
EMGCurve <- function(x, par)
{
ta <- 1/par[1]
mu <- par[2]
si <- par[3]
h <- par[4]
Fct.V <- (h * si / ta) * (pi/2)^0.5 * exp(0.5 * (si / ta)^2 - (x - mu)/ta)
Fct.V
}
RMSE <- function(par)
{
Fct.V <- EMGCurve(x,par)
sqrt(sum((y - Fct.V)^2)/length(y))
}
result <- optim(c(1, x[which.max(y)], unname(quantile(x)[4]-quantile(x)[2]), max(y)),
lower = c(1, min(x), 0.0001, 0.1*max(y)),
upper = c(Inf, max(x), 0.5*(max(x) - min(x)), max(y)),
RMSE, method="L-BFGS-B", control=list(factr=1e7))
However, when I try to visualize the result, it seems like nothing useful is happening:
plot(x,y,xlab="RT/min",ylab="I")
lines(seq(min(x),max(x),length=1000),EMGCurve(seq(min(x),max(x),length=1000),result$par),col=2)
For some reason it doesn't work at all, although I managed to do it for a normal distribution with similar code. It would be great if someone had an idea.

If it might be of some use, I got an OK fit to your data using an X-shifted log-normal type peak equation, "y = a * exp(-0.5 * pow((log(x-d)-b) / c, 2.0))" with parameters a = 9.4159743234392539E+07, b = -2.7516932481669185E+00, c = -2.4343893243720971E-01, and d = 1.1251623071481867E+00 yielding R-squared = 0.994 and RMSE = 2.49E06. I personally was unable to fit using the equation in your post. There may be value in scaling the dependent data as the values seem large, but this equation seems to fit the data as is.
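One thing that may be worth checking: the chromatography form of the EMG on the Wikipedia page cited in the question multiplies the exponential factor by a complementary error function term, and EMGCurve as posted does not include it. Below is a sketch of the curve with that term, keeping the question's parameter order par = c(lambda, mu, sigma, h). The names erfc, emg_curve and rmse are made up for this illustration, and the check uses synthetic data rather than the posted data, so this is not a verified fit of the data in the question.

```r
# Complementary error function expressed through the normal CDF
erfc <- function(z) 2 * pnorm(z * sqrt(2), lower.tail = FALSE)

# EMG peak, par = c(lambda, mu, sigma, h) as in the question
emg_curve <- function(x, par) {
  ta <- 1 / par[1]
  mu <- par[2]
  si <- par[3]
  h  <- par[4]
  (h * si / ta) * sqrt(pi / 2) *
    exp(0.5 * (si / ta)^2 - (x - mu) / ta) *
    erfc((si / ta - (x - mu) / si) / sqrt(2))
}

# Objective that takes the data as arguments instead of using globals
rmse <- function(par, x, y) sqrt(mean((y - emg_curve(x, par))^2))

# Sanity checks on synthetic, noise-free data
true_par <- c(0.5, 0, 1, 1)
x_sim <- seq(-5, 20, length.out = 200)
y_sim <- emg_curve(x_sim, true_par)
rmse(true_par, x_sim, y_sim)

# The area under the peak should be close to h * sigma * sqrt(2 * pi)
area <- integrate(emg_curve, lower = -10, upper = 60, par = true_par)$value
c(area = area, expected = true_par[4] * true_par[3] * sqrt(2 * pi))
```

For data as large as the posted y values, rescaling y before passing rmse to optim() may also help, as suggested above.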

Logistic Regression in R: Optimization Issues concerning Initial Guess

I need to implement a logistic regression manually, using the Score/GMM approach, without the use of GLM. This is because at later stages the model will be much more complicated. Currently I am running into a problem where, for the logistic regression, the optimization procedures are very dependent on the initial point. To illustrate, here is my code using an online dataset. More details about the procedure are in the comments:
library(data.table)
library(nleqslv)
library(Matrix)
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
data_analysis<-data.table(mydata)
data_analysis[,constant:=1]
#Likelihood function for logit
#The logistic regression will regress the binary variable
#admit on a constant and the variable gpa
LL <- function(beta){
beta=as.numeric(beta)
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
choice_prob = h/(1+h)
llf <- sum(data_temp$admit * log(choice_prob)) + (sum((one-data_temp$admit) * log(one-choice_prob)))
return(-1*llf)
}
#Score to be used when optimizing using LL
#Identical to the Score function below but returns negative output
Score_LL <- function(beta){
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
choice_prob = h/(1+h)
resid = as.numeric(data_temp$admit - choice_prob)
score_final2 = t(mat_temp2) %*% Diagonal(length(resid), x=resid) %*% one
return(-1*as.numeric(score_final2))
}
#The Score/Deriv/Jacobian of the Likelihood function
Score <- function(beta){
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
choice_prob = as.numeric(h/(1+h))
resid = as.numeric(data_temp$admit - choice_prob)
score_final2 = t(mat_temp2) %*% Diagonal(length(resid), x=resid) %*% one
return(as.numeric(score_final2))
}
#Derivative of the Score function
Score_Deriv <- function(beta){
data_temp=data_analysis
mat_temp2 = cbind(data_temp$constant,
data_temp$gpa)
one = rep(1,dim(mat_temp2)[1])
h = exp(beta %*% t(mat_temp2))
weight = (h/(1+h)) * (1- (h/(1+h)))
weight_mat = Diagonal(length(weight), x=weight)
deriv = t(mat_temp2)%*%weight_mat%*%mat_temp2
return(-1*as.array(deriv))
}
#Quadratic Gain function
#Minimized at Score=0 and so minimizing is equivalent to solving the
#FOC of the Likelihood. This is the GMM approach.
Quad_Gain<- function(beta){
h=Score(as.numeric(beta))
return(sum(h*h))
}
#Derivative of the Quadratic Gain function
Quad_Gain_deriv <- function(beta){
return(2*t(Score_Deriv(beta))%*%Score(beta))
}
sol1=glm(admit ~ gpa, data = data_analysis, family = "binomial")
sol2=optim(c(2,2),Quad_Gain,gr=Quad_Gain_deriv,method="BFGS")
sol3=optim(c(0,0),Quad_Gain,gr=Quad_Gain_deriv,method="BFGS")
When I run this code, I get that sol3 matches what glm produces (sol1) but sol2, with a different initial point, differs from the glm solution by a lot. This is something happening in my main code with the actual data as well. One solution is to create a grid and test multiple starting points. However, my main data set has 10 parameters and this would make the grid very large and the program computationally infeasible. Is there a way around this problem?

Your code seems overly complicated. The following two functions define the negative log-likelihood and negative score vector for a logistic regression with the logit link:
logLik_Bin <- function (betas, y, X) {
eta <- c(X %*% betas)
- sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}
score_Bin <- function (betas, y, X) {
eta <- c(X %*% betas)
- crossprod(X, y - plogis(eta))
}
Then you can use it as follows:
# load the data
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
# fit with optim()
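# start at zero: with gre on its raw scale, a start such as c(-1, 1, -1)
# pushes the fitted probabilities to exactly 1 and the initial value is not finite
opt1 <- optim(c(0, 0, 0), logLik_Bin, score_Bin, method = "BFGS",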
y = mydata$admit, X = cbind(1, mydata$gre, mydata$gpa))
opt1$par
# compare with glm()
glm(admit ~ gre + gpa, data = mydata, family = binomial())
Typically, for well-behaved covariates (i.e., when the coefficients are expected to lie in the interval [-4, 4]), starting at 0 is a good idea.
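A small sketch of the zero-start advice on simulated data, so it runs without downloading anything. The data, the seed and the names negll, negscore and fit0 are assumptions made for this illustration. The negative log-likelihood of a logistic regression is convex, so minimizing it directly tends to be much less sensitive to the starting point than minimizing the squared score, which need not be convex.

```r
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))        # intercept plus two standardized covariates
beta_true <- c(-0.5, 1, -1)
y <- rbinom(n, 1, plogis(X %*% beta_true))

negll <- function(betas, y, X) {
  eta <- c(X %*% betas)
  -sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}
negscore <- function(betas, y, X) {
  eta <- c(X %*% betas)
  -c(crossprod(X, y - plogis(eta)))
}

# Start from zero and compare with glm()
fit0 <- optim(rep(0, 3), negll, negscore, method = "BFGS", y = y, X = X)
ref <- unname(coef(glm(y ~ X - 1, family = binomial())))
rbind(optim = fit0$par, glm = ref)
```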

nlm function fails with analytic Hessian

Some background: the nlm function in R is a general-purpose optimization routine that uses a Newton-type algorithm. To optimize a function, Newton's method requires the function as well as its first and second derivatives (the gradient vector and the Hessian matrix, respectively). In R, nlm lets you supply the gradient and Hessian as attributes of the value returned by the objective function; if they are left unspecified, nlm falls back on numerical derivatives (the deriv function can also be used to generate an objective that carries a gradient attribute). More accurate solutions can be found by supplying the analytic gradient and Hessian, so it's a useful feature.
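For readers unfamiliar with the mechanism, here is a minimal sketch on a toy quadratic (not the logistic model discussed below; the function f and target a are illustrative). nlm reads analytic derivatives from "gradient" and "hessian" attributes attached to the objective's return value.

```r
# Toy objective: squared distance to a target vector a
f <- function(x, a) {
  res <- sum((x - a)^2)
  attr(res, "gradient") <- 2 * (x - a)           # analytic gradient
  attr(res, "hessian") <- 2 * diag(length(x))    # analytic Hessian
  res
}

sol <- nlm(f, p = c(10, 10), a = c(3, 5))
sol$estimate
```

The estimate should be close to the target c(3, 5).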
My problem: the nlm function is slower and often fails to converge in a reasonable amount of time when the analytic Hessian is supplied. I'm guessing this is some sort of bug in the underlying code, but I'd be happy to be wrong. Is there a way to make nlm work better with an analytic Hessian matrix?
Example: my R code below demonstrates this problem using a logistic regression example, where
log(Pr(Y=1)/Pr(Y=0)) = b0 + Xb
where X is a multivariate normal of dimension N by p and b is a vector of coefficients of length p.
library(mvtnorm)
# example demonstrating a problem with NLM
expit <- function(mu) {1/(1+exp(-mu))}
mk.logit.data <- function(N,p){
set.seed(1232)
U = matrix(runif(p*p), nrow=p, ncol=p)
S = 0.5*(U+t(U)) + p*diag(rep(1,p))
X = rmvnorm(N, mean = runif(p, -1, 1), sigma = S)
Design = cbind(rep(1, N), X)
beta = sort(sample(c(rep(0,p), runif(1))))
y = rbinom(N, 1, expit(Design%*%beta))
list(X=X,y=as.numeric(y),N=N,p=p)
}
# function to calculate gradient vector at given coefficient values
logistic_gr <- function(beta, y, x, min=TRUE){
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
gri = matrix(nrow=n, ncol=p)
for(j in 1:p){
gri[,j] = D[,j]*(exp(-mu)*y-1+y)/(1+exp(-mu))
}
gr = apply(gri, 2, sum)
if(min) gr = -gr
gr
}
# function to calculate Hessian matrix at given coefficient values
logistic_hess <- function(beta, y, x, min=TRUE){
# allow to fail with NA, NaN, Inf values
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
h = matrix(nrow=p, ncol=p)
for(j in 1:p){
for(k in 1:p){
h[j,k] = -sum(D[,j]*D[,k]*(exp(-mu))/(1+exp(-mu))^2)
}
}
if(min) h = -h
h
}
# function to calculate likelihood (up to a constant) at given coefficient values
logistic_ll <- function(beta, y,x, gr=FALSE, he=FALSE, min=TRUE){
mu = beta[1] + x %*% beta[-1]
lli = log(expit(mu))*y + log(1-expit(mu))*(1-y)
ll = sum(lli)
if(is.na(ll) | is.infinite(ll)) ll = -1e16
if(min) ll=-ll
# the below specification is required for using analytic gradient/Hessian in nlm function
if(gr) attr(ll, "gradient") <- logistic_gr(beta, y=y, x=x, min=min)
if(he) attr(ll, "hessian") <- logistic_hess(beta, y=y, x=x, min=min)
ll
}
First example, with p=3:
dat = mk.logit.data(N=100, p=3)
The glm function estimates are for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
> (Intercept) dat$X1 dat$X2 dat$X3
> 0.00981465 0.01068939 0.04417671 0.01625381
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
> [1] 0.009814547 0.010689396 0.044176627 0.016253966
# works, but less accurate when correct analytic hessian is specified (even though the routine notes convergence is probable)
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, check.analyticals=TRUE))$estimate
> [1] 0.009827701 0.010687278 0.044178416 0.016255630
But the problem becomes apparent when p is larger, here it is 10
dat = mk.logit.data(N=100, p=10)
Again, glm solution for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
> (Intercept) dat$X1 dat$X2 dat$X3 dat$X4 dat$X5 dat$X6 dat$X7
> -0.07071882 -0.08670003 0.16436630 0.01130549 0.17302058 0.03821008 0.08836471 -0.16578959
> dat$X8 dat$X9 dat$X10
> -0.07515477 -0.08555075 0.29119963
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
> [1] -0.07071879 -0.08670005 0.16436632 0.01130550 0.17302057 0.03821009 0.08836472
> [8] -0.16578958 -0.07515478 -0.08555076 0.29119967
# fails to converge in 5000 iterations when correct analytic hessian is specified
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, iterlim=5000, check.analyticals=TRUE))$estimate
> [1] 0.31602065 -0.06185190 0.10775381 -0.16748897 0.05032156 0.34176104 0.02118631
> [8] -0.01833671 -0.20364929 0.63713991 0.18390489
Edit: I should also add that I have confirmed I have the correct Hessian matrix through multiple different approaches

I tried the code, but at first it seemed to be using a different rmvnorm than I can find on CRAN. I found one rmvnorm in dae package, then one in the mvtnorm package. The latter is the one to use.
nlm() was patched about the time of the above posting. I'm currently trying to verify the patches, and it now seems to work OK. Note that I am the author of a number of R's optimization codes, including three of the five methods in optim().
nashjc at uottawa.ca
Code is below.

Revised code:
# example demonstrating a problem with NLM
expit <- function(mu) {1/(1+exp(-mu))}
mk.logit.data <- function(N,p){
set.seed(1232)
U = matrix(runif(p*p), nrow=p, ncol=p)
S = 0.5*(U+t(U)) + p*diag(rep(1,p))
X = rmvnorm(N, mean = runif(p, -1, 1), sigma = S)
Design = cbind(rep(1, N), X)
beta = sort(sample(c(rep(0,p), runif(1))))
y = rbinom(N, 1, expit(Design%*%beta))
list(X=X,y=as.numeric(y),N=N,p=p)
}
# function to calculate gradient vector at given coefficient values
logistic_gr <- function(beta, y, x, min=TRUE){
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
gri = matrix(nrow=n, ncol=p)
for(j in 1:p){
gri[,j] = D[,j]*(exp(-mu)*y-1+y)/(1+exp(-mu))
}
gr = apply(gri, 2, sum)
if(min) gr = -gr
gr
}
# function to calculate Hessian matrix at given coefficient values
logistic_hess <- function(beta, y, x, min=TRUE){
# allow to fail with NA, NaN, Inf values
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
h = matrix(nrow=p, ncol=p)
for(j in 1:p){
for(k in 1:p){
h[j,k] = -sum(D[,j]*D[,k]*(exp(-mu))/(1+exp(-mu))^2)
}
}
if(min) h = -h
h
}
# function to calculate likelihood (up to a constant) at given coefficient values
logistic_ll <- function(beta, y,x, gr=FALSE, he=FALSE, min=TRUE){
mu = beta[1] + x %*% beta[-1]
lli = log(expit(mu))*y + log(1-expit(mu))*(1-y)
ll = sum(lli)
if(is.na(ll) | is.infinite(ll)) ll = -1e16
if(min) ll=-ll
# the below specification is required for using analytic gradient/Hessian in nlm function
if(gr) attr(ll, "gradient") <- logistic_gr(beta, y=y, x=x, min=min)
if(he) attr(ll, "hessian") <- logistic_hess(beta, y=y, x=x, min=min)
ll
}
##!!!! NOTE: Must have this library loaded
library(mvtnorm)
dat = mk.logit.data(N=100, p=3)
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
# works, but less accurate when correct analytic hessian is specified (even though the routine notes convergence is probable)
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, check.analyticals=TRUE))$estimate
dat = mk.logit.data(N=100, p=10)
# Again, glm solution for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
# fails to converge in 5000 iterations when correct analytic hessian is specified
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, iterlim=5000, check.analyticals=TRUE))$estimate

Estimate a probit regression model with optim()

I need to program a probit regression model by hand, without using glm. I would use optim to minimize the negative log-likelihood directly.
I wrote the code below, but it does not work and gives the error:
cannot coerce type 'closure' to vector of type 'double'
# load data: data provided via the bottom link
Datospregunta2a <- read.dta("problema2_1.dta")
attach(Datospregunta2a)
# model matrix `X` and response `Y`
X <- cbind(1, associate_professor, full_professor, emeritus_professor, other_rank)
Y <- volunteer
# number of regression coefficients
K <- ncol(X)
# initial guess on coefficients
vi <- lm(volunteer ~ associate_professor, full_professor, emeritus_professor, other_rank)$coefficients
# negative log-likelihood
probit.nll <- function (beta) {
exb <- exp(X%*%beta)
prob<- rnorm(exb)
logexb <- log(prob)
y0 <- (1-y)
logexb0 <- log(1-prob)
yt <- t(y)
y0t <- t(y0)
-sum(yt%*%logexb + y0t%*%logexb0)
}
# gradient
probit.gr <- function (beta) {
grad <- numeric(K)
exb <- exp(X%*%beta)
prob <- rnorm(exb)
for (k in 1:K) grad[k] <- sum(X[,k]*(y - prob))
return(-grad)
}
# direct minimization
fit <- optim(vi, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE)
data: https://drive.google.com/file/d/0B06Id6VJyeb5OTFjbHVHUE42THc/view?usp=sharing

case sensitive
Y and y are different. So you should use Y not y in your defined functions probit.nll and probit.gr.
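These two functions also do not look correct to me. The most evident problem is the use of rnorm, which draws random numbers; a probit model needs the standard normal CDF, pnorm, applied directly to the linear predictor (no exp). The following are correct versions.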
negative log-likelihood function
# requires model matrix `X` and binary response `Y`
probit.nll <- function (beta) {
# linear predictor
eta <- X %*% beta
# probability
p <- pnorm(eta)
# negative log-likelihood
-sum((1 - Y) * log(1 - p) + Y * log(p))
}
gradient function
# requires model matrix `X` and binary response `Y`
probit.gr <- function (beta) {
# linear predictor
eta <- X %*% beta
# probability
p <- pnorm(eta)
# chain rule
u <- dnorm(eta) * (Y - p) / (p * (1 - p))
# gradient
-crossprod(X, u)
}
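One way to gain confidence in a hand-derived gradient like this is to compare it with central finite differences. The sketch below assumes simulated X and Y and a helper num_grad that is made up for this illustration; the evaluation point beta0 is arbitrary.

```r
set.seed(0)
X <- cbind(1, matrix(runif(300, -2, 1), 100))
Y <- rbinom(100, 1, pnorm(X %*% c(0.5, 0.3, 0.1, 0.4)))

probit.nll <- function(beta) {
  p <- pnorm(X %*% beta)
  -sum((1 - Y) * log(1 - p) + Y * log(p))
}
probit.gr <- function(beta) {
  eta <- X %*% beta
  p <- pnorm(eta)
  u <- dnorm(eta) * (Y - p) / (p * (1 - p))
  -c(crossprod(X, u))
}

# Central finite-difference approximation of the gradient (hypothetical helper)
num_grad <- function(fn, beta, h = 1e-6) {
  sapply(seq_along(beta), function(k) {
    e <- replace(numeric(length(beta)), k, h)
    (fn(beta + e) - fn(beta - e)) / (2 * h)
  })
}

beta0 <- c(0.1, 0.2, -0.1, 0.3)
max_diff <- max(abs(probit.gr(beta0) - num_grad(probit.nll, beta0)))
max_diff
```

A difference that is tiny relative to the size of the gradient suggests the analytic expression is right; a large one usually points to a sign or chain-rule mistake.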
initial parameter values from lm()
This does not sound like a reasonable idea. In no cases should we apply linear regression to binary data.
However, purely focusing on the use of lm, you need + not , to separate covariates in the right hand side of the formula.
reproducible example
Let's generate a toy dataset
set.seed(0)
# model matrix
X <- cbind(1, matrix(runif(300, -2, 1), 100))
# coefficients
b <- runif(4)
# response
Y <- rbinom(100, 1, pnorm(X %*% b))
# `glm` estimate
GLM <- glm(Y ~ X - 1, family = binomial(link = "probit"))
# our own estimation via `optim`
# I am using `b` as initial parameter values (being lazy)
fit <- optim(b, probit.nll, gr = probit.gr, method = "BFGS", hessian = TRUE)
# comparison
unname(coef(GLM))
# 0.62183195 0.38971121 0.06321124 0.44199523
fit$par
# 0.62183540 0.38971287 0.06321318 0.44199659
They are very close to each other!

How to estimate the Kalman Filter with 'KFAS' R package, with an AR(1) transition equation?

I am using 'KFAS' package from R to estimate a state-space model with the Kalman filter. My measurement and transition equations are:
y_t = Z_t * x_t + \eps_t (measurement)
x_t = T_t * x_{t-1} + R_t * \eta_t (transition),
with \eps_t ~ N(0,H_t) and \eta_t ~ N(0,Q_t).
So, I want to estimate the variances H_t and Q_t, but also T_t, the AR(1) coefficient. My code is as follows:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
fit <- fitSSM(ss_model, inits = c(0,0.6,0), method = 'L-BFGS-B')
But it returns: "Error in is.SSModel(do.call(updatefn, args = c(list(inits, model), update_args)),: System matrices (excluding Z) contain NA or infinite values, covariance matrices contain values larger than 1e+07"
The NA definitions for the variances work well, as documented in the package's paper. However, it seems this cannot be done for the AR coefficient. Does anyone know how I can do this?
Note that I am aware of the SSMarima function, which eases the definition of the transition equation as ARIMA models. Although I am able to estimate the AR(1) coef. and Q_t this way, I still cannot estimate the \eps_t variance (H_t). Moreover, I am migrating my Kalman filter codes from EViews to R, so I need to learn SSMcustom for other models that are more complicated.
Thanks!

It seems that you are missing something in your example, as your error message comes from the function fitSSM. If you want to use fitSSM for estimating general state space models, you need to provide your own model updating function. The default behaviour can only handle NA's in covariance matrices H and Q. The main goal of fitSSM is just to get started with simple stuff. For complex models and/or large data, I would recommend using your self-written objective function (with help of logLik method) and your favourite numerical optimization routines manually for maximum performance. Something like this:
library(KFAS)
set.seed(100)
eps <- rt(200, 4, 1)
meas <- as.matrix((arima.sim(n=200, list(ar=0.6), innov = rnorm(200)*sqrt(0.5)) + eps),
ncol=1)
Zt <- 1
Ht <- matrix(NA)
Tt <- matrix(NA)
Rt <- 1
Qt <- matrix(NA)
ss_model <- SSModel(meas ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt,
Q = Qt), H = Ht)
objf <- function(pars, model, estimate = TRUE) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
if (estimate) {
-logLik(model)
} else {
model
}
}
opt <- optim(c(1, 0.5, 1), objf, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100), model = ss_model)
ss_model_opt <- objf(opt$par, ss_model, estimate = FALSE)
Same with fitSSM:
updatefn <- function(pars, model) {
model$H[1] <- pars[1]
model$T[1] <- pars[2]
model$Q[1] <- pars[3]
model
}
fit <- fitSSM(ss_model, c(1, 0.5, 1), updatefn, method = "L-BFGS-B",
lower = c(0, -0.99, 0), upper = c(100, 0.99, 100))
identical(ss_model_opt, fit$model)
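To show what such a self-written objective function evaluates, here is a base-R sketch of the Gaussian log-likelihood for the scalar model above (Z = 1, R = 1), computed with a hand-coded Kalman filter. This is not KFAS code: the function kf_negloglik is hypothetical, and it assumes a stationary initial state with mean 0 and variance Q / (1 - T^2), which may differ from how KFAS initializes the state. The simulated data use Gaussian measurement noise for simplicity.

```r
# Negative log-likelihood of y_t = x_t + eps_t, x_t = phi * x_{t-1} + eta_t
# pars = c(H, phi, Q); stationary initialization is an assumption of this sketch
kf_negloglik <- function(pars, y) {
  H <- pars[1]; phi <- pars[2]; Q <- pars[3]
  a <- 0                       # predicted state mean
  P <- Q / (1 - phi^2)         # predicted state variance
  ll <- 0
  for (t in seq_along(y)) {
    v  <- y[t] - a             # one-step-ahead prediction error
    Fv <- P + H                # variance of the prediction error
    ll <- ll + dnorm(v, mean = 0, sd = sqrt(Fv), log = TRUE)
    K  <- P / Fv               # Kalman gain
    a  <- phi * (a + K * v)    # update, then predict the next state mean
    P  <- phi^2 * P * (1 - K) + Q
  }
  -ll
}

set.seed(100)
n <- 300
state <- as.numeric(arima.sim(model = list(ar = 0.6), n = n, sd = sqrt(0.5)))
y <- state + rnorm(n)

opt_kf <- optim(c(1, 0.5, 1), kf_negloglik, y = y, method = "L-BFGS-B",
                lower = c(1e-4, -0.99, 1e-4), upper = c(100, 0.99, 100))
opt_kf$par
```

The two variances can be hard to separate in a short series with an AR(1) state, so the individual estimates may be noisy even when the likelihood is evaluated correctly.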
