R: error on estimating mle with MaxLik() package - r

My purpose is to find maximum likelihood estimator using Newton Raphson algorithm and compare the solution with glm(). so I tried to use maxLik() in R. And it turns out error, I have not use this package before, please fix this error, thank you!!
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=as.factor(d$disease)
d$ses=as.factor(d$ses)
d$sector=as.factor(d$sector)
str(d)
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
x3<-as.numeric(as.character(d$ses))
oreduced <- glm(disease ~ age + sector, family=binomial(link = logit), data=d)
summary(oreduced)
nlldbin=function(param){
eta<-param[1]+param[2]*x1+param[3]*(x2==2)
p<-1/(1+exp(-eta))
-sum(y*log(p)+(1-y)*log(1-p),na.rm=TRUE)
}
est_nr<-maxLik(nlldbin,start=c(0.01,0.01,0.01))
summary(est_nr)
The result is
Iteration 1
Parameter:
[1] 9841290 377928533 4325584
Gradient (first 30 components):
[,1] [,2] [,3]
[1,] NaN NaN NaN
Error in maxNRCompute(fn = function (theta, fnOrig, gradOrig = NULL, hessOrig = NULL, :
NA in gradient

We are trying to maximise the log-likelihood, but due to the negative applied to the sum, your function is minimising. Therefore just remove the negative, giving:
nlldbin <- function(param){
eta <- param[1] + param[2]*x1 + param[3]*(x2==2)
p <- 1/(1+exp(-eta))
sum(y*log(p) + (1-y) * log(1-p), na.rm=TRUE)
}
This is different from other optimisers, like optim, which often minimise by default. Which is why you would negate the sum, as you did.
ps you could write your function using in built functions, which may be a bit more stable (and less typing):
nlldbin2 <- function(param){
eta <- cbind(1, x1, x2 == 2) %*% param
p <- plogis(eta)
sum(dbinom(y, 1, p, log=TRUE))
}

Related

Using fitdist from fitdistrplus with betabinomial distribution with varying sizes

A related question is "using fitdist from fitdistplus with binomial distribution
". fitdistrplus::fitdist is a function that takes univariate data and starting guess for parameters. To fit binomial and betabinomial data, while univariate, the size is also needed. If the size is fixed for every datum, then the aforementioned link has the fix needed. However if the sizes vary and a vector needs to be passed, I'm unsure how to get a properly functioning call.
opt_one in the code below was the solution that was offered in the aforementioned linked post -- that is, the cluster size is known and fixed. For opt_one, I incorrectly specify fix.arg=list(size=125) (in essence making every element of N=125) and this is close enough and the code runs. However, the cluster sizes in N actually vary. I try to specify this in opt_two and get an error. Any thoughts would be appreciated.
library(fitdistrplus)
library(VGAM)
set.seed(123)
N <- 100 + rbinom(1000,25,0.9)
Y <- rbetabinom.ab(rep(1,length(N)), N, 1, 2)
head(cbind(Y,N))
opt_one <-
fitdist(data=Y,
distr=pbetabinom.ab,
fix.arg=list(size=125),
start=list(shape1=1,shape2=1)
)
opt_one
Which gives:
> head(cbind(Y,N))
Y N
[1,] 67 123
[2,] 14 121
[3,] 15 123
[4,] 42 121
[5,] 86 120
[6,] 28 125
> opt_one <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=125),
+ start=list(shape1=1,shape2=1)
+ )
Warning messages:
1: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The dbetabinom.ab function should return a zero-length vector when input has length zero
2: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The pbetabinom.ab function should return a zero-length vector when input has length zero
> opt_one
Fitting of the distribution ' betabinom.ab ' by maximum likelihood
Parameters:
estimate Std. Error
shape1 0.9694054 0.04132912
shape2 2.1337839 0.10108720
Fixed parameters:
value
size 125
Not, bad, as the shape1 and shape2 parameters were 1 and 2, respectively, as specified when we created Y. Here's option 2:
opt_two <-
fitdist(data=Y,
distr=pbetabinom.ab,
fix.arg=list(size=N),
start=list(shape1=1,shape2=1)
)
Which gives an error:
> opt_two <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=N),
+ start=list(shape1=1,shape2=1)
+ )
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'fix.arg' must specify names which are arguments to 'distr'.
An Attempt after initial posting (thanks to Dean Follmann)
I know I can code my own betabinomial likelihood (opt_three, presented below), but would really like to use the tools provided with having a fitdist object -- that is, to have opt_two working.
library(Rfast)
loglik <-function(parm){
A<-parm[1];B<-parm[2]
-sum( Lgamma(A+B) - Lgamma(A)- Lgamma(B) + Lgamma(Y+A) + Lgamma(N-Y+B) - Lgamma(N+A+B) )
}
opt_three <- optim(c(1,1),loglik, method = "L-BFGS-B", lower=c(0,0))
opt_three
Which gives:
> opt_three
$par
[1] 0.9525161 2.0262342
$value
[1] 61805.54
$counts
function gradient
7 7
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Also related is Ben Bolker's answer using mle2. The fitdist solution remains at large.
Look at example 4 of the ?fitdistrplus::fitdist() help page:
# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view
# dedicated to probability distributions
#
dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
qgumbel <- function(p, a, b) a-b*log(-log(p))
fitgumbel <- fitdist(serving, "gumbel", start=list(a=10, b=10))
summary(fitgumbel)
plot(fitgumbel)
and then -- feeling inspired and informed because you actually RTM -- make your own [dpq] functions with N specified:
dbbspecifiedsize <- function(x, a, b) dbetabinom.ab(x, size=N, shape1=a, shape2=b)
pbbspecifiedsize <- function(q, a, b) pbetabinom.ab(q, size=N, shape1=a, shape2=b)
qbbspecifiedsize <- function(p, a, b) qbetabinom.ab(p, size=N, shape1=a, shape2=b)
opt_four <-
fitdist(data=Y,
distr="bbspecifiedsize",
start=list(a=1,b=1)
)
opt_four
which gives:
> opt_four
Fitting of the distribution ' bbspecifiedsize ' by maximum likelihood
Parameters:
estimate Std. Error
a 0.9526875 0.04058396
b 2.0261339 0.09576709
which is quite similar to the estimates of opt_three and is a fitdist object.

nlm function fails with analytic Hessian

Some background: the nlm function in R is a general purpose optimization routine that uses Newton's method. To optimize a function, Newton's method requires the function, as well as the first and second derivatives of the function (the gradient vector and the Hessian matrix, respectively). In R the nlm function allows you to specify R functions that correspond to calculations of the gradient and Hessian, or one can leave these unspecified and numerical solutions are provided based on numerical derivatives (via the deriv function). More accurate solutions can be found by supplying functions to calculate the gradient and Hessian, so it's a useful feature.
My problem: the nlm function is slower and often fails to converge in a reasonable amount of time when the analytic Hessian is supplied. I'm guessing this is some sort of bug in the underlying code, but I'd be happy to be wrong. Is there a way to make nlm work better with an analytic Hessian matrix?
Example: my R code below demonstrates this problem using a logistic regression example, where
log(Pr(Y=1)/Pr(Y=0)) = b0 + Xb
where X is a multivariate normal of dimension N by p and b is a vector of coefficients of length p.
library(mvtnorm)
# example demonstrating a problem with NLM
expit <- function(mu) {1/(1+exp(-mu))}
mk.logit.data <- function(N,p){
set.seed(1232)
U = matrix(runif(p*p), nrow=p, ncol=p)
S = 0.5*(U+t(U)) + p*diag(rep(1,p))
X = rmvnorm(N, mean = runif(p, -1, 1), sigma = S)
Design = cbind(rep(1, N), X)
beta = sort(sample(c(rep(0,p), runif(1))))
y = rbinom(N, 1, expit(Design%*%beta))
list(X=X,y=as.numeric(y),N=N,p=p)
}
# function to calculate gradient vector at given coefficient values
logistic_gr <- function(beta, y, x, min=TRUE){
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
gri = matrix(nrow=n, ncol=p)
for(j in 1:p){
gri[,j] = D[,j]*(exp(-mu)*y-1+y)/(1+exp(-mu))
}
gr = apply(gri, 2, sum)
if(min) gr = -gr
gr
}
# function to calculate Hessian matrix at given coefficient values
logistic_hess <- function(beta, y, x, min=TRUE){
# allow to fail with NA, NaN, Inf values
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
h = matrix(nrow=p, ncol=p)
for(j in 1:p){
for(k in 1:p){
h[j,k] = -sum(D[,j]*D[,k]*(exp(-mu))/(1+exp(-mu))^2)
}
}
if(min) h = -h
h
}
# function to calculate likelihood (up to a constant) at given coefficient values
logistic_ll <- function(beta, y,x, gr=FALSE, he=FALSE, min=TRUE){
mu = beta[1] + x %*% beta[-1]
lli = log(expit(mu))*y + log(1-expit(mu))*(1-y)
ll = sum(lli)
if(is.na(ll) | is.infinite(ll)) ll = -1e16
if(min) ll=-ll
# the below specification is required for using analytic gradient/Hessian in nlm function
if(gr) attr(ll, "gradient") <- logistic_gr(beta, y=y, x=x, min=min)
if(he) attr(ll, "hessian") <- logistic_hess(beta, y=y, x=x, min=min)
ll
}
First example, with p=3:
dat = mk.logit.data(N=100, p=3)
The glm function estimates are for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
> (Intercept) dat$X1 dat$X2 dat$X3
> 0.00981465 0.01068939 0.04417671 0.01625381
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
> [1] 0.009814547 0.010689396 0.044176627 0.016253966
# works, but less accurate when correct analytic hessian is specified (even though the routine notes convergence is probable)
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, check.analyticals=TRUE))$estimate
> [1] 0.009827701 0.010687278 0.044178416 0.016255630
But the problem becomes apparent when p is larger, here it is 10
dat = mk.logit.data(N=100, p=10)
Again, glm solution for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
> (Intercept) dat$X1 dat$X2 dat$X3 dat$X4 dat$X5 dat$X6 dat$X7
> -0.07071882 -0.08670003 0.16436630 0.01130549 0.17302058 0.03821008 0.08836471 -0.16578959
> dat$X8 dat$X9 dat$X10
> -0.07515477 -0.08555075 0.29119963
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
> [1] -0.07071879 -0.08670005 0.16436632 0.01130550 0.17302057 0.03821009 0.08836472
> [8] -0.16578958 -0.07515478 -0.08555076 0.29119967
# fails to converge in 5000 iterations when correct analytic hessian is specified
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, iterlim=5000, check.analyticals=TRUE))$estimate
> [1] 0.31602065 -0.06185190 0.10775381 -0.16748897 0.05032156 0.34176104 0.02118631
> [8] -0.01833671 -0.20364929 0.63713991 0.18390489
Edit: I should also add that I have confirmed I have the correct Hessian matrix through multiple different approaches
I tried the code, but at first it seemed to be using a different rmvnorm than I can find on CRAN. I found one rmvnorm in dae package, then one in the mvtnorm package. The latter is the one to use.
nlm() was patched about the time of the above posting. I'm currently trying to verify the patches and it now seems to work OK. Note that I am author of a number of R's optimization codes, including 3/5 in optim().
nashjc at uottawa.ca
Code is below.
Revised code:
# example demonstrating a problem with NLM
expit <- function(mu) {1/(1+exp(-mu))}
mk.logit.data <- function(N,p){
set.seed(1232)
U = matrix(runif(p*p), nrow=p, ncol=p)
S = 0.5*(U+t(U)) + p*diag(rep(1,p))
X = rmvnorm(N, mean = runif(p, -1, 1), sigma = S)
Design = cbind(rep(1, N), X)
beta = sort(sample(c(rep(0,p), runif(1))))
y = rbinom(N, 1, expit(Design%*%beta))
list(X=X,y=as.numeric(y),N=N,p=p)
}
# function to calculate gradient vector at given coefficient values
logistic_gr <- function(beta, y, x, min=TRUE){
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
gri = matrix(nrow=n, ncol=p)
for(j in 1:p){
gri[,j] = D[,j]*(exp(-mu)*y-1+y)/(1+exp(-mu))
}
gr = apply(gri, 2, sum)
if(min) gr = -gr
gr
}
# function to calculate Hessian matrix at given coefficient values
logistic_hess <- function(beta, y, x, min=TRUE){
# allow to fail with NA, NaN, Inf values
mu = beta[1] + x %*% beta[-1]
p = length(beta)
n = length(y)
D = cbind(rep(1,n), x)
h = matrix(nrow=p, ncol=p)
for(j in 1:p){
for(k in 1:p){
h[j,k] = -sum(D[,j]*D[,k]*(exp(-mu))/(1+exp(-mu))^2)
}
}
if(min) h = -h
h
}
# function to calculate likelihood (up to a constant) at given coefficient values
logistic_ll <- function(beta, y,x, gr=FALSE, he=FALSE, min=TRUE){
mu = beta[1] + x %*% beta[-1]
lli = log(expit(mu))*y + log(1-expit(mu))*(1-y)
ll = sum(lli)
if(is.na(ll) | is.infinite(ll)) ll = -1e16
if(min) ll=-ll
# the below specification is required for using analytic gradient/Hessian in nlm function
if(gr) attr(ll, "gradient") <- logistic_gr(beta, y=y, x=x, min=min)
if(he) attr(ll, "hessian") <- logistic_hess(beta, y=y, x=x, min=min)
ll
}
##!!!! NOTE: Must have this library loaded
library(mvtnorm)
dat = mk.logit.data(N=100, p=3)
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
# works, but less accurate when correct analytic hessian is specified (even though the routine notes convergence is probable)
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, check.analyticals=TRUE))$estimate
dat = mk.logit.data(N=100, p=10)
# Again, glm solution for reference. nlm should give the same answer, allowing for small errors due to approximation.
(glm.sol <- glm(dat$y~dat$X, family=binomial()))$coefficients
# works when correct analytic gradient is specified
(nlm.sol1 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, y=dat$y, x=dat$X))$estimate
# fails to converge in 5000 iterations when correct analytic hessian is specified
(nlm.sol2 <- nlm(p=runif(dat$p+1), f=logistic_ll, gr=TRUE, he=TRUE, y=dat$y, x=dat$X, hessian = TRUE, iterlim=5000, check.analyticals=TRUE))$estimate

How does ar.yw estimate the variance

In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
Need to read the code if it's not documented.
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L) # .... not your situation
#....
else{
r <- as.double(drop(xacf))
z <- .Fortran(C_eureka, as.integer(order.max), r, r,
coefs = double(order.max^2), vars = double(order.max),
double(order.max))
coefs <- matrix(z$coefs, order.max, order.max)
partialacf <- array(diag(coefs), dim = c(order.max, 1L,
1L))
var.pred <- c(r[1L], z$vars)
#.......
order <- if (aic)
(0L:order.max)[xaic == 0L]
else order.max
ar <- if (order)
coefs[order, seq_len(order)]
else numeric()
var.pred <- var.pred[order + 1L]
var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think I'm finding it here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that aI think is returning the var.pred estimate. I'm not a time series guy and It's your responsibility to review this process for applicability to your problem.
subroutine eureka (lr,r,g,f,var,a)
c
c solves Toeplitz matrix equation toep(r)f=g(1+.)
c by Levinson's algorithm
c a is a workspace of size lr, the number
c of equations
c
snipped
c estimate the innovations variance
var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
if (l .eq. lr) return
d = 0.0d0
q = 0.0d0
do 50 i = 1, l
k = l-i+2
d = d + a(i)*r(k)
q = q + f(l,i)*r(k)
50 continue

Understanding different results of optim() and lm()

Given:
set.seed(1001)
outcome<-rnorm(1000,sd = 1)
covariate<-rnorm(1000,sd = 1)
log-likelihood of normal pdf:
loglike <- function(par, outcome, covariate){
cov <- as.matrix(cbind(1, covariate))
xb <- cov * par
(- 1/2* sum((outcome - xb)^2))
}
optimize:
opt.normal <- optim(par = 0.1,fn = loglike,outcome=outcome,cov=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
However I get different results when running an simple OLS. However maximizing log-likelihhod and minimizing OLS should bring me to a similar estimate. I suppose there is something wrong with my optimization.
summary(lm(outcome~covariate))
Umm several things... Here's a proper working likelihood function (with names x and y):
loglike =
function(par,x,y){cov = cbind(1,x); xb = cov %*% par;(-1/2)*sum((y-xb)^2)}
Note use of matrix multiplication operator.
You were also only running it with one par parameter, so it was not only broken because your loglike was doing element-element multiplication, it was only returning one value too.
Now compare optimiser parameters with lm coefficients:
opt.normal <- optim(par = c(0.1,0.1),fn = loglike,y=outcome,x=covariate, method = "BFGS", control = list(fnscale = -1),hessian = TRUE)
opt.normal$par
[1] 0.02148234 -0.09124299
summary(lm(outcome~covariate))$coeff
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.02148235 0.03049535 0.7044466 0.481319029
covariate -0.09124299 0.03049819 -2.9917515 0.002842011
shazam.
Helpful hints: create data that you know the right answer for - eg x=1:10; y=rnorm(10)+(1:10) so you know the slope is 1 and the intercept 0. Then you can easily see which of your things are in the right ballpark. Also, run your loglike function on its own to see if it behaves as you expect.
Maybe you will find it usefull to see the difference between these two methods from my code. I programmed it the following way.
data.matrix <- as.matrix(hprice1[,c("assess","bdrms","lotsize","sqrft","colonial")])
loglik <- function(p,z){
beta <- p[1:5]
sigma <- p[6]
y <- log(data.matrix[,1])
eps <- (y - beta[1] - z[,2:5] %*% beta[2:5])
-nrow(z)*log(sigma)-0.5*sum((eps/sigma)^2)
}
p0 <- c(5,0,0,0,0,2)
m <- optim(p0,loglik,method="BFGS",control=list(fnscale=-1,trace=10),hessian=TRUE,z=data.matrix)
rbind(m$par,sqrt(diag(solve(-m$hessian))))
And for the lm() method I find this
m.ols <- lm(log(assess)~bdrms+lotsize+sqrft+colonial,data=hprice1)
summary(m.ols)
Also if you would like to estimate the elasticity of assessed value with respect to the lotsize or calculate a 95% confidence interval
for this parameter, you could use the following
elasticity.at.mean <- mean(hprice1$lotsize) * m$par[3]
var.coefficient <- solve(-m$hessian)[3,3]
var.elasticity <- mean(hprice1$lotsize)^2 * var.coefficient
# upper bound
elasticity.at.mean + qnorm(0.975)* sqrt(var.elasticity)
# lower bound
elasticity.at.mean + qnorm(0.025)* sqrt(var.elasticity)
A more simple example of the optim method is given below for a binomial distribution.
loglik1 <- function(p,n,n.f){
n.f*log(p) + (n-n.f)*log(1-p)
}
m <- optim(c(pi=0.5),loglik1,control=list(fnscale=-1),
n=73,n.f=18)
m
m <- optim(c(pi=0.5),loglik1,method="BFGS",hessian=TRUE,
control=list(fnscale=-1),n=73,n.f=18)
m
pi.hat <- m$par
numerical calculation of s.d
rbind(pi.hat=pi.hat,sd.pi.hat=sqrt(diag(solve(-m$hessian))))
analytical
rbind(pi.hat=18/73,sd.pi.hat=sqrt((pi.hat*(1-pi.hat))/73))
Or this code for the normal distribution.
loglik1 <- function(p,z){
mu <- p[1]
sigma <- p[2]
-(length(z)/2)*log(sigma^2) - sum(z^2)/(2*sigma^2) +
(mu*sum(z)/sigma^2) - (length(z)*mu^2)/(2*sigma^2)
}
m <- optim(c(mu=0,sigma2=0.1),loglik1,
control=list(fnscale=-1),z=aex)

hand-rolled R code for Poisson MLE

I'm attempting to write my own function to understand how the Poisson distribution behaves within a Maximum Likelihood Estimation framework (as it applies to GLM).
I'm familiar with R's handy glm function, but wanted to try and hand-roll some code to understand what's going on:
n <- 10000 # sample size
b0 <- 1.0 # intercept
b1 <- 0.2 # coefficient
x <- runif(n=n, min=0, max=1.5) # generate covariate values
lp <- b0+b1*x # linear predictor
lambda <- exp(lp) # compute lamda
y <- rpois(n=n, lambda=lambda) # generate y-values
dta <- data.frame(y=y, x=x) # generate dataset
negloglike <- function(lambda) {n*lambda-sum(x)*log(lambda) + sum(log(factorial(y)))} # build negative log-likelihood
starting.vals <- c(0,0) # one starting value for each parameter
pars <- c(b0, b1)
maxLike <- optim(par=pars,fn=negloglike, data = dta) # optimize
My R output when I enter maxLike is the following:
Error in fn(par, ...) : unused argument (data = list(y = c(2, 4....
I assume I've specified optim within my function incorrectly, but I'm not familiar enough with the nuts-and-bolts of MLE or constrained optimization to understand what I'm missing.
optim can only use your function in a certain way. It assumes the first parameter in your function takes in the parameters as a vector. If you need to pass other information to this function (in your case the data) you need to have that as a parameter of your function. Your negloglike function doesn't have a data parameter and that's what it is complaining about. The way you have it coded you don't need one so you probably could fix your problem by just removing the data=dat part of your call to optim but I didn't test that. Here is a small example of doing a simple MLE for just a poisson (not the glm)
negloglike_pois <- function(par, data){
x <- data$x
lambda <- par[1]
-sum(dpois(x, lambda, log = TRUE))
}
dat <- data.frame(x = rpois(30, 5))
optim(par = 4, fn = negloglike_pois, data = dat)
mean(dat$x)
> optim(par = 4, fn = negloglike_pois, data = dat)
$par
[1] 4.833594
$value
[1] 65.7394
$counts
function gradient
22 NA
$convergence
[1] 0
$message
NULL
Warning message:
In optim(par = 4, fn = negloglike_pois, data = dat) :
one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly
> # The "true" MLE. We didn't hit it exactly but came really close
> mean(dat$x)
[1] 4.833333
Implementing the comments from Dason's answer is quite straightforward, but just in case:
library("data.table")
d <- data.table(id = as.character(1:100),
x1 = runif(100, 0, 1),
x2 = runif(100, 0, 1))
#' the assumption is that lambda can be written as
#' log(lambda) = b1*x1 + b2*x2
#' (In addition, could add a random component)
d[, mean := exp( 1.57*x1 + 5.86*x2 )]
#' draw a y for each of the observations
#' (rpois is not vectorized, need to use sapply)
d[, y := sapply(mean, function(x)rpois(1,x)) ]
negloglike_pois <- function(par, data){
data <- copy(d)
# update estimate of the mean
data[, mean_tmp := exp( par[1]*x1 + par[2]*x2 )]
# calculate the contribution of each observation to the likelihood
data[, log_p := dpois(y, mean_tmp, log = T)]
#' Now we can sum up the probabilities
data[, -sum(log_p)]
}
optim(par = c(1,1), fn = negloglike_pois, data = d)
$par
[1] 1.554759 5.872219
$value
[1] 317.8094
$counts
function gradient
95 NA
$convergence
[1] 0
$message
NULL

Resources