A related question is "using fitdist from fitdistrplus with binomial distribution". fitdistrplus::fitdist is a function that takes univariate data and a starting guess for the parameters. To fit binomial and beta-binomial data, which are univariate, the size is also needed. If the size is fixed for every datum, the linked question has the fix. However, if the sizes vary and a vector needs to be passed, I'm unsure how to construct a properly functioning call.
opt_one in the code below was the solution offered in the linked post -- that is, the cluster size is known and fixed. For opt_one, I incorrectly specify fix.arg=list(size=125) (in essence making every element of N equal to 125); this is close enough that the code runs. However, the cluster sizes in N actually vary. I try to specify this in opt_two and get an error. Any thoughts would be appreciated.
library(fitdistrplus)
library(VGAM)
set.seed(123)
N <- 100 + rbinom(1000,25,0.9)
Y <- rbetabinom.ab(rep(1,length(N)), N, 1, 2)
head(cbind(Y,N))
opt_one <-
  fitdist(data=Y,
          distr=pbetabinom.ab,
          fix.arg=list(size=125),
          start=list(shape1=1,shape2=1)
  )
opt_one
Which gives:
> head(cbind(Y,N))
Y N
[1,] 67 123
[2,] 14 121
[3,] 15 123
[4,] 42 121
[5,] 86 120
[6,] 28 125
> opt_one <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=125),
+ start=list(shape1=1,shape2=1)
+ )
Warning messages:
1: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The dbetabinom.ab function should return a zero-length vector when input has length zero
2: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The pbetabinom.ab function should return a zero-length vector when input has length zero
> opt_one
Fitting of the distribution ' betabinom.ab ' by maximum likelihood
Parameters:
estimate Std. Error
shape1 0.9694054 0.04132912
shape2 2.1337839 0.10108720
Fixed parameters:
value
size 125
Not bad, as the shape1 and shape2 parameters were 1 and 2, respectively, as specified when we created Y. Here's option 2:
opt_two <-
  fitdist(data=Y,
          distr=pbetabinom.ab,
          fix.arg=list(size=N),
          start=list(shape1=1,shape2=1)
  )
Which gives an error:
> opt_two <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=N),
+ start=list(shape1=1,shape2=1)
+ )
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'fix.arg' must specify names which are arguments to 'distr'.
An attempt after the initial posting (thanks to Dean Follmann)
I know I can code my own beta-binomial likelihood (opt_three, presented below), but I would really like to use the tools that come with having a fitdist object -- that is, to get opt_two working.
library(Rfast)
loglik <- function(parm){
  A <- parm[1]; B <- parm[2]
  # negative beta-binomial log-likelihood (the constant lchoose(N, Y) term is omitted)
  -sum( Lgamma(A+B) - Lgamma(A) - Lgamma(B) + Lgamma(Y+A) + Lgamma(N-Y+B) - Lgamma(N+A+B) )
}
opt_three <- optim(c(1,1), loglik, method = "L-BFGS-B", lower=c(0,0))
opt_three
Which gives:
> opt_three
$par
[1] 0.9525161 2.0262342
$value
[1] 61805.54
$counts
function gradient
7 7
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Also related is Ben Bolker's answer using mle2. The fitdist solution remains at large.
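For reference, here is a minimal sketch of how the varying-size fit could also be written with bbmle's formula interface (my own hedged adaptation, not Ben Bolker's exact code; it assumes Y, N and the VGAM functions from above):
library(bbmle)
# dbetabinom.ab accepts a vector of sizes, so N can be supplied per observation
# through the data argument; s1 and s2 are hypothetical parameter names
m_bb <- mle2(Y ~ dbetabinom.ab(size = N, shape1 = s1, shape2 = s2),
             start = list(s1 = 1, s2 = 1),
             data = data.frame(Y = Y, N = N))
coef(m_bb)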
Look at example 4 of the ?fitdistrplus::fitdist() help page:
# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view
# dedicated to probability distributions
#
# serving comes from an earlier example on the same help page:
data("groundbeef")
serving <- groundbeef$serving
dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
qgumbel <- function(p, a, b) a-b*log(-log(p))
fitgumbel <- fitdist(serving, "gumbel", start=list(a=10, b=10))
summary(fitgumbel)
plot(fitgumbel)
and then -- feeling inspired and informed because you actually RTM -- make your own [dpq] functions with N specified:
# these wrappers pick up the vector N from the global environment,
# so each observation gets its own size
dbbspecifiedsize <- function(x, a, b) dbetabinom.ab(x, size=N, shape1=a, shape2=b)
pbbspecifiedsize <- function(q, a, b) pbetabinom.ab(q, size=N, shape1=a, shape2=b)
qbbspecifiedsize <- function(p, a, b) qbetabinom.ab(p, size=N, shape1=a, shape2=b)
opt_four <-
  fitdist(data=Y,
          distr="bbspecifiedsize",
          start=list(a=1,b=1)
  )
opt_four
which gives:
> opt_four
Fitting of the distribution ' bbspecifiedsize ' by maximum likelihood
Parameters:
estimate Std. Error
a 0.9526875 0.04058396
b 2.0261339 0.09576709
which is quite similar to the estimates of opt_three and is a fitdist object.
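Having a fitdist object also gives you its standard accessors. A hedged sketch of follow-up calls (I stick to components that do not re-evaluate the d/p/q wrappers at new points, since those rely on the global N):
summary(opt_four)   # estimates, standard errors, log-likelihood, AIC and BIC
opt_four$estimate   # named parameter vector
opt_four$sd         # estimated standard errors
opt_four$aic        # stored goodness-of-fit criterion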
Related
My purpose is to find the maximum likelihood estimator using the Newton-Raphson algorithm and compare the solution with glm(), so I tried to use maxLik() in R. It turns out there is an error; I have not used this package before. Please help me fix this error, thank you!
library(maxLik)
d <- read.delim("http://dnett.github.io/S510/Disease.txt")
d$disease=as.factor(d$disease)
d$ses=as.factor(d$ses)
d$sector=as.factor(d$sector)
str(d)
y<-as.numeric(as.character(d$disease))
x1<-as.numeric(as.character(d$age))
x2<-as.numeric(as.character(d$sector))
x3<-as.numeric(as.character(d$ses))
oreduced <- glm(disease ~ age + sector, family=binomial(link = logit), data=d)
summary(oreduced)
nlldbin <- function(param){
  eta <- param[1] + param[2]*x1 + param[3]*(x2==2)
  p <- 1/(1 + exp(-eta))
  -sum(y*log(p) + (1-y)*log(1-p), na.rm=TRUE)
}
est_nr<-maxLik(nlldbin,start=c(0.01,0.01,0.01))
summary(est_nr)
The result is
Iteration 1
Parameter:
[1] 9841290 377928533 4325584
Gradient (first 30 components):
[,1] [,2] [,3]
[1,] NaN NaN NaN
Error in maxNRCompute(fn = function (theta, fnOrig, gradOrig = NULL, hessOrig = NULL, :
NA in gradient
We are trying to maximise the log-likelihood, but because of the negative sign applied to the sum, your function returns the negative log-likelihood, so maxLik ends up minimising the log-likelihood instead. Therefore just remove the negative, giving:
nlldbin <- function(param){
eta <- param[1] + param[2]*x1 + param[3]*(x2==2)
p <- 1/(1+exp(-eta))
sum(y*log(p) + (1-y) * log(1-p), na.rm=TRUE)
}
This is different from other optimisers, such as optim, which minimise by default; that is why you would normally negate the sum, as you did.
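To see the sign convention in action, here is a hedged sketch using optim with the corrected nlldbin above (the starting values are arbitrary; the result should land close to the glm() coefficients):
# optim minimises by default, so negate the (now positive) log-likelihood again
est_optim <- optim(c(0.01, 0.01, 0.01), function(param) -nlldbin(param))
est_optim$par
coef(oreduced)   # reference fit from glm()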
PS: you could write your function using built-in functions, which may be a bit more stable (and less typing):
nlldbin2 <- function(param){
eta <- cbind(1, x1, x2 == 2) %*% param
p <- plogis(eta)
sum(dbinom(y, 1, p, log=TRUE))
}
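A hedged usage sketch with this rewritten function (maxLik loaded as above); the estimates should again be close to the glm() fit:
est_nr2 <- maxLik(nlldbin2, start = c(0.01, 0.01, 0.01))
summary(est_nr2)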
My question relates to the use of R for the derivation of maximum likelihood estimates of parameters when a probability distribution is expressed in the form of an infinite sum, such as the one below due to Rao, Girija et al.
I wanted to see if I could reproduce the maximum likelihood estimates obtained by these authors (who used Matlab, rather than R) when the model is applied to a given set of data. My attempt is given below, although this throws up several warnings that "longer object length is not a multiple of shorter object length". I know why I am getting this warning, but I do not know how to remedy it. How can I edit my code to overcome this?
Also, is there a better way to handle infinite sums? Here I'm just using an arbitrary large number for n (1000).
library(bbmle)
svec <- list(c=1,lambda=1)
x <- scan(textConnection("0.1396263 0.1570796 0.2268928 0.2268928 0.2443461 0.3141593 0.3839724 0.4712389 0.5235988 0.5934119 0.6632251 0.6632251 0.6981317 0.7679449 0.7853982 0.8203047 0.8377580 0.8377580 0.8377580 0.8377580 0.8726646 0.9250245 0.9773844 0.9948377 1.0122910 1.0122910 1.0646508 1.0995574 1.1170107 1.1170107 1.1170107 1.1344640 1.1344640 1.1868239 1.2217305 1.2740904 1.3613568 1.3613568 1.3613568 1.4486233 1.4486233 1.5358897 1.5358897 1.5358897 1.5707963 1.6057029 1.6057029 1.6231562 1.6580628 1.6755161 1.7104227 1.7453293 1.7976891 1.8500490 1.9722221 2.0594885 2.4085544 2.6703538 2.6703538 2.7052603 3.5604717 3.7524579 3.8920842 3.9444441 4.1364303 4.1538836 4.2411501 4.2586034 4.3633231 4.3807764 4.4854962 4.6774824 4.9741884 5.5676003 5.9864793 6.1086524"))
dL <- function(x, c, lambda, n = 1000, log = TRUE) {
  k <- 0:n
  r <- log(sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda))))
  if (log) return(r) else return(exp(r))
}
dat <- data.frame(x)
m1 <- mle2( x ~ dL(c,lambda),
data=dat,
start=svec,
control=list(parscale=unlist(svec)),
method="L-BFGS-B",
lower=c(0,0)
)
I suggest starting out with that algorithm and making a density function that can be tested for proper behavior by integrating over its range of definition, (0, 2*pi). You are calling it a "probability function", but that is a term I associate with CDFs rather than density distributions (PDFs):
dL <- function(x, c=1, lambda=1, n = 1000, log=FALSE) {
  k <- 0:n
  r <- sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda)))
  if (log) log(r) else r
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(I think you were trying to pack too much into your log-likelihood function, so I decided to break the steps apart.)
When I ran that version I got a warning message from the final mle2 step that I didn't like and I thought it might be the case that this density function was occasionally returning negative values, so this was my final version:
dL <- function(x, c=1, lambda=1, n = 1000) {
  k <- 0:n
  # floor the sum at a small positive value so that log() never sees zero
  r <- max( sum(lambda*c*(x+2*k*pi)^(-c-1)*(exp(-(x+2*k*pi)^(-c))^(lambda))), 0.00000001)
  r
}
vdL <- Vectorize(dL)
integrate(vdL, 0,2*pi)
#0.999841 with absolute error < 9.3e-06
LL <- function(x, c, lambda){ -sum( log( vdL(x, c, lambda))) }
(m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
#------------------------
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9009665 1.1372237
Log-likelihood: -116.96
(The warning and the warning-free LL numbers were the same.)
So I think you were attempting to pack too much into your definition of a log-likelihood function and got tripped up somewhere. There should have been two summations: one for the density approximation and a second one for the summation of the log-likelihood. The vectors entering those summations have different lengths, hence the warnings you were seeing. Unpacking the steps allowed success, at least to the extent of not throwing errors. I'm not sure what that density represents and cannot verify correctness.
As far as the question of whether there is a better way to approximate an infinite series, the answer hinges on what is known about the rate of convergence of the partial sums, and whether you can set up a tolerance value to compare successive values and stop calculations after a smaller number of terms.
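One way to implement that idea is to accumulate terms until the newest one is negligible relative to the running sum. Here is a hedged sketch (dL_adaptive is a hypothetical helper; it assumes the terms are positive and decreasing, which the ratios below suggest):
dL_adaptive <- function(x, c = 1, lambda = 1, tol = 1e-10, kmax = 10000) {
  total <- 0
  for (k in 0:kmax) {
    term <- lambda*c*(x + 2*k*pi)^(-c - 1)*(exp(-(x + 2*k*pi)^(-c))^(lambda))
    total <- total + term
    # stop once a new term no longer changes the sum appreciably
    if (term < tol * total) break
  }
  total
}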
When I look at the density, it makes me wonder if it applies to some scattering process:
curve(vdL(x, c=.9, lambda=1.137), 0.00001, 2*pi)
You can examine the speed of convergence by looking at the ratios of successive terms. Here's a function that does that for the first 10 terms at an arbitrary x:
> ratios <- function(x, c=1, lambda=1) {lambda*c*(x+2*(1:11)*pi)^(-c-1)*(exp(-(x+2*(1:10)*pi)^(-c))^(lambda))/lambda*c*(x+2*(0:10)*pi)^(-c-1)*(exp(-(x+2*(0:10)*pi)^(-c))^(lambda)) }
> ratios(0.5)
[1] 1.015263e-02 1.017560e-04 1.376150e-05 3.712618e-06 1.392658e-06 6.351874e-07 3.299032e-07 1.880054e-07
[9] 1.148694e-07 7.409595e-08 4.369854e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
> ratios(0.05)
[1] 1.755301e-08 1.235632e-04 1.541082e-05 4.024074e-06 1.482741e-06 6.686497e-07 3.445688e-07 1.952358e-07
[9] 1.187626e-07 7.634088e-08 4.443193e-08
Warning message:
In lambda * c * (x + 2 * (1:11) * pi)^(-c - 1) * (exp(-(x + 2 * :
longer object length is not a multiple of shorter object length
That looks like pretty rapid convergence to me, so I'm guessing that you could use only the first 20 terms and get similar results. With 20 terms the results look like:
> integrate(vdL, 0,2*pi)
0.9924498 with absolute error < 9.3e-06
> (m0 <- mle2(LL,start=list(c=0.2,lambda=1),data=list(x=x)))
Call:
mle2(minuslogl = LL, start = list(c = 0.2, lambda = 1), data = list(x = x))
Coefficients:
c lambda
0.9542066 1.1098169
Log-likelihood: -117.83
Since you never attempt to interpret a log-likelihood in isolation but rather look at differences, I'm guessing that the minor difference will not affect your inferences adversely.
I am using R {fExtremes} to find the best parameters of a GEV distribution for my data (a vector), but I get the following error message:
Error in solve.default(fit$hessian) : Lapack routine dgesv: system is exactly singular: U[1,1] = 0
I traced it back to fit$hessian and found that my Hessian is a singular matrix: all of its elements are 0s. The source code (https://github.com/cran/fExtremes/blob/master/R/GevFit.R) of gevFit() shows that fit$hessian is calculated by optim(). The output parameters are exactly the same as the initial parameters. I am wondering what problems with my data could cause this. I copied my code here:
> min(sample);
[1] 5.240909
> max(sample)
[1] 175.8677
> length(sample)
[1] 6789
> mean(sample)
[1] 78.04107
>para<-gevFit(sample, type = "mle")
Error in solve.default(fit$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
fit = optim(theta, .gumLLH, hessian = TRUE, ..., tmp = data)
> fit
$par
xi -0.3129225
mu 72.5542497
beta 16.4450897
$value
[1] 1e+06
$counts
function gradient
4 NA
$convergence
[1] 0
$message
NULL
$hessian
xi mu beta
xi 0 0 0
mu 0 0 0
beta 0 0 0
I updated my dataset on google docs:
https://docs.google.com/spreadsheets/d/1IRRpjmdrrJPhNmfiLism_P0efV_Ot4HlEsa6kwMnljc/edit?usp=sharing
This is going to be a long story, possibly more suited to https://stats.stackexchange.com/.
====== Part 1 -- The problem ======
This is the sequence generating the error:
library(fExtremes)
samp <- read.csv("optimdata.csv")[ ,2]
## does not converge
para <- gevFit(samp, type = "mle")
We are facing the typical cause of lack-of-convergence when using optim() and friends: inadequate starting values for the optimisation.
To see what goes wrong, let us use the PWM estimator (http://arxiv.org/abs/1310.3222); this consists of an analytical formula, hence it does not run into convergence problems, since it makes no use of optim():
para <- gevFit(samp, type = "pwm")
fitpwm<- attr(para, "fit")
fitpwm$par.ests
The estimated tail parameter xi is negative, corresponding to a bounded upper tail; in fact the fitted distribution displays even more "upper tail boundedness" than the sample data, as you can see from the "leveling off" of the quantile-quantile graph at the right:
qqgevplot <- function(samp, params){
probs <- seq(0.1,0.99,by=0.01)
qqempir <- quantile(samp, probs)
qqtheor <- qgev(probs, xi=params["xi"], mu=params["mu"], beta=params["beta"])
rang <- range(qqempir,qqtheor)
plot(qqempir, qqtheor, xlim=rang, ylim=rang,
xlab="empirical", ylab="theoretical",
main="Quantile-quantile plot")
abline(a=0,b=1, col=2)
}
qqgevplot(samp, fitpwm$par.ests)
For xi < -0.5 the MLE estimator is not regular (http://arxiv.org/abs/1301.5611): the value of -0.46 estimated by PWM for xi is very close to that threshold. Now the PWM estimates are used internally by gevFit() as starting values for optim(); you can see this if you print out the code for the function gevFit():
print(gevFit)
print(.gevFit)
print(.gevmleFit)
The starting value for optim is theta, obtained by PWM. For the specific data at hand, this starting value is not adequate, in that it leads to non-convergence of optim().
====== Part 2 -- solutions? ======
Solution 1 is to use para <- gevFit(samp, type = "pwm") as above. If you'd like to use ML, then you have to supply good starting values for optim(). Unfortunately, the fExtremes package does not make it easy to do so. You can instead re-define your own version of .gevmleFit to accept them, e.g.
# note: .gevpwmFit and .gevLLH are internal to fExtremes; if they are not
# visible in your session, prefix them with fExtremes:::
.gevmleFit <- function (data, block = NA, start.param, ...)
{
  data = as.numeric(data)
  n = length(data)
  if (missing(start.param)) {
    theta = .gevpwmFit(data)$par.ests
  } else {
    theta = start.param
  }
  fit = optim(theta, .gevLLH, hessian = TRUE, ..., tmp = data)
  if (fit$convergence)
    warning("optimization may not have succeeded")
  par.ests = fit$par
  varcov = solve(fit$hessian)
  par.ses = sqrt(diag(varcov))
  ans = list(n = n, data = data, par.ests = par.ests, par.ses = par.ses,
             varcov = varcov, converged = fit$convergence, nllh.final = fit$value)
  class(ans) = "gev"
  ans
}
## diverges, just as above
.gevmleFit(samp)
## diverges, just as above
startp <- fitpwm$par.ests
.gevmleFit(samp, start.param=startp)
## converges
startp <- structure(c(-0.1, 1, 1), names=names(fitpwm$par.ests))
.gevmleFit(samp, start.param=startp)$par.ests
Now check this out: the beta estimated by PWM is 0.1245; by changing this by a tiny amount, the MLE is made to converge:
startp <- fitpwm$par.ests
startp["beta"]
startp["beta"] <- 0.13
.gevmleFit(samp, start.param=startp)$par.ests
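Once the fit converges, the list returned by the redefined .gevmleFit already carries standard errors from the inverse Hessian; a hedged usage sketch:
mlefit <- .gevmleFit(samp, start.param = startp)
cbind(estimate = mlefit$par.ests, std.error = mlefit$par.ses)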
This hopefully illustrates that blindly optim()ising works until it doesn't, and can then turn into quite a delicate endeavour indeed. For this reason, it might be useful to leave this reply here rather than migrate it to CrossValidated.
I am using maximum-likelihood optimization in Stan, but unfortunately the optimizing() function doesn't report standard errors:
> MLb4c <- optimizing(get_stanmodel(fitb4c), data = win.data, init = inits)
STAN OPTIMIZATION COMMAND (LBFGS)
init = user
save_iterations = 1
init_alpha = 0.001
tol_obj = 1e-012
tol_grad = 1e-008
tol_param = 1e-008
tol_rel_obj = 10000
tol_rel_grad = 1e+007
history_size = 5
seed = 292156286
initial log joint probability = -4038.66
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
13 -2772.49 9.21091e-005 0.0135987 0.07606 0.9845 15
Optimization terminated normally:
Convergence detected: relative gradient magnitude is below tolerance
> t2 <- proc.time()
> print(t2 - t1)
user system elapsed
0.11 0.19 0.74
>
> MLb4c
$par
psi alpha beta
0.9495000 0.4350983 -0.2016895
$value
[1] -2772.489
> summary(MLb4c)
Length Class Mode
par 3 -none- numeric
value 1 -none- numeric
How do I get the standard errors of the estimates (or confidence interval - quantiles), and possibly p-values?
EDIT: I did as kindly advised by @Ben Goodrich:
> MLb4cH <- optimizing(get_stanmodel(fitb4c), data = win.data, init = inits, hessian = TRUE)
> sqrt(diag(solve(-MLb4cH$hessian)))
psi alpha beta
0.21138314 0.03251696 0.03270493
But these "unconstrained" standard errors seem to be very different from the real ones - here as is the output from bayesian fitting using stan():
> print(outb4c, dig = 5)
Inference for Stan model: tmp_stan_model.
3 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=750.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha 0.43594 0.00127 0.03103 0.37426 0.41578 0.43592 0.45633 0.49915 594 1.00176
beta -0.20262 0.00170 0.03167 -0.26640 -0.22290 -0.20242 -0.18290 -0.13501 345 1.00402
psi 0.94905 0.00047 0.01005 0.92821 0.94308 0.94991 0.95656 0.96632 448 1.00083
lp__ -2776.94451 0.06594 1.15674 -2780.07437 -2777.50643 -2776.67139 -2776.09064 -2775.61263 308 1.01220
You can specify the hessian = TRUE argument to the optimizing function, which will return the Hessian as part of the output list. Thus, you can obtain estimated standard errors via sqrt(diag(solve(-MLb4c$hessian))); however, those standard errors pertain to the estimates in the unconstrained space. To obtain estimated standard errors for the parameters in the constrained space, you could either use the delta method or draw many times from a multivariate normal distribution whose mean vector is MLb4c$par and whose variance-covariance matrix is solve(-MLb4c$hessian), convert those draws to the constrained space with the constrain_pars function, and estimate the standard deviation of each column.
Here is some R code you could adapt to your case
# 1: Compile and save a model (make sure to pass the data here)
model <- stan(file="model.stan", data=c("N","K","X","y"), chains = 0, iter = 0)
# 2: Fit that model
fit <- optimizing(object=get_stanmodel(model), as_vector = FALSE,
data=c("N","K","X","y"), hessian = TRUE)
# 3: Extract the vector theta_hat and the Hessian for the unconstrained parameters
theta_hat <- unlist(fit$par)
upars <- unconstrain_pars(model, relist(theta_hat, fit$par))
Hessian <- fit$hessian
# 4: Extract the Cholesky decomposition of the (negative) Hessian and invert
R <- chol(-Hessian)
V <- chol2inv(R)
rownames(V) <- colnames(V) <- colnames(Hessian)
# 5: Produce a matrix with some specified number of simulation draws from a multinormal
SIMS <- 1000
len <- length(theta_hat)
unconstrained <- upars + t(chol(V)) %*%
matrix(rnorm(SIMS * len), nrow = len, ncol = SIMS)
theta_sims <- t(apply(unconstrained, 2, FUN = function(upars) {
  unlist(constrain_pars(model, upars))
}))
# 6: Produce estimated standard errors for the constrained parameters
se <- apply(theta_sims, 2, sd)
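If you also want rough Wald-type intervals or p-values on the constrained scale, here is a hedged follow-up using the objects from the sketch above (it assumes approximate normality of the estimates and that the names of fit$par line up with the simulated columns):
est <- unlist(fit$par)[names(se)]   # constrained point estimates, aligned with se
z   <- est / se                     # Wald z statistics (test against zero)
p   <- 2 * pnorm(-abs(z))           # two-sided p-values
ci  <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)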
I'm attempting to write my own function to understand how the Poisson distribution behaves within a Maximum Likelihood Estimation framework (as it applies to GLM).
I'm familiar with R's handy glm function, but wanted to try and hand-roll some code to understand what's going on:
n <- 10000 # sample size
b0 <- 1.0 # intercept
b1 <- 0.2 # coefficient
x <- runif(n=n, min=0, max=1.5) # generate covariate values
lp <- b0+b1*x # linear predictor
lambda <- exp(lp) # compute lamda
y <- rpois(n=n, lambda=lambda) # generate y-values
dta <- data.frame(y=y, x=x) # generate dataset
negloglike <- function(lambda) {n*lambda-sum(x)*log(lambda) + sum(log(factorial(y)))} # build negative log-likelihood
starting.vals <- c(0,0) # one starting value for each parameter
pars <- c(b0, b1)
maxLike <- optim(par=pars,fn=negloglike, data = dta) # optimize
My R output when I enter maxLike is the following:
Error in fn(par, ...) : unused argument (data = list(y = c(2, 4....
I assume I've specified optim within my function incorrectly, but I'm not familiar enough with the nuts-and-bolts of MLE or constrained optimization to understand what I'm missing.
optim can only use your function in a certain way. It assumes the first argument of your function takes the parameters as a vector. If you need to pass other information to this function (in your case the data), you need to have that as a parameter of your function. Your negloglike function doesn't have a data parameter, and that's what it is complaining about. The way you have it coded you don't need one, so you could probably fix your problem by just removing the data = dta part of your call to optim, but I didn't test that. Here is a small example of doing a simple MLE for just a Poisson (not the GLM):
negloglike_pois <- function(par, data){
x <- data$x
lambda <- par[1]
-sum(dpois(x, lambda, log = TRUE))
}
dat <- data.frame(x = rpois(30, 5))
optim(par = 4, fn = negloglike_pois, data = dat)
mean(dat$x)
> optim(par = 4, fn = negloglike_pois, data = dat)
$par
[1] 4.833594
$value
[1] 65.7394
$counts
function gradient
22 NA
$convergence
[1] 0
$message
NULL
Warning message:
In optim(par = 4, fn = negloglike_pois, data = dat) :
one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly
> # The "true" MLE. We didn't hit it exactly but came really close
> mean(dat$x)
[1] 4.833333
Implementing the comments from Dason's answer is quite straightforward, but just in case:
library("data.table")
d <- data.table(id = as.character(1:100),
                x1 = runif(100, 0, 1),
                x2 = runif(100, 0, 1))
#' the assumption is that lambda can be written as
#' log(lambda) = b1*x1 + b2*x2
#' (In addition, could add a random component)
d[, mean := exp( 1.57*x1 + 5.86*x2 )]
#' draw a y for each of the observations
#' (one Poisson draw per row; rpois(.N, mean) would also work)
d[, y := sapply(mean, function(x) rpois(1, x)) ]
negloglike_pois <- function(par, data){
  data <- copy(data)
  # update the estimate of the mean
  data[, mean_tmp := exp( par[1]*x1 + par[2]*x2 )]
  # calculate the contribution of each observation to the likelihood
  data[, log_p := dpois(y, mean_tmp, log = TRUE)]
  # now sum up the log-probabilities (negated, since optim minimises)
  data[, -sum(log_p)]
}
optim(par = c(1,1), fn = negloglike_pois, data = d)
$par
[1] 1.554759 5.872219
$value
[1] 317.8094
$counts
function gradient
95 NA
$convergence
[1] 0
$message
NULL
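As a quick sanity check (not part of the original answer), the same model can be fitted with glm using a log link and no intercept; the coefficients should be close to the optim solution above:
# hypothetical cross-check against the built-in Poisson GLM
glm(y ~ 0 + x1 + x2, family = poisson(link = "log"), data = d)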