I have been working on this algorithm all week, and I cannot seem to find the problem.
I use Stata's NLSUR command for a simple QUAIDS estimation. Stata requires me to write a program evaluator in which I parametrize my system of equations and impose the parametric restrictions. Those with experience in such implementations will find the code below familiar. Stata's NLSUR then uses MLE to find the parameters. This part is not the problem, so I use these results (which are correct) to check my log-likelihood optimization problem in R.
R is a bit trickier, because it requires me to write a script similar to the one in Stata, but I additionally need to specify the log-likelihood, and thus the variance-covariance matrix, in a function. This function is shown below. I have tried the BBoptim() solver, and the estimated parameters do not match; only those in the variance-covariance matrix do. I also wrote my own gradient descent algorithm and checked it with other toy examples. Then I checked it with this function. My gradient descent algorithm works with the toy examples, but not with my function. I believe the problem is my function, which is similar to the one employed in Stata's NLSUR.
I should also point out that when I plug Stata's estimated parameters into my own function with the same data, I get the same log-likelihood value. I also predict values with Stata and with my R function and get the same values. I have also tested whether the parameter restrictions are imposed, and it seems they are.
Could you help me figure out what I am doing wrong? The reason I need to write this specification is that I will be incorporating truncations into the log-likelihood, but I am going step by step, first trying to solve the simpler problem without the truncations, using only the log-likelihood.
library(mvtnorm) # for dmvnorm() used below
quaids.loglike <- function(param, s1, s2, s3, lnp1, lnp2, lnp3, lnp4, lnw){
# alphas
a1 <- param[1]
a2 <- param[2]
a3 <- param[3]
a4 <- (1 - a1 - a2 - a3)
# betas
b1 <- param[4]
b2 <- param[5]
b3 <- param[6]
b4 <- (-b1 - b2 - b3)
# gammas
g11 <- param[7]
g12 <- param[8]
g13 <- param[9]
g14 <- (-g11 - g12 - g13)
g21 <- g12
g22 <- param[10]
g23 <- param[11]
g24 <- (-g21 - g22 - g23)
g31 <- g13
g32 <- g23
g33 <- param[12]
g34 <- (-g31 - g32 - g33)
g41 <- g14
g42 <- g24
g43 <- g34
g44 <- (-g41 - g42 - g43)
# lambdas
l1 <- param[13]
l2 <- param[14]
l3 <- param[15]
# Sigmas
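# note: the packing order below is sig11, sig12, sig22, sig13, sig23, sig33
# (param[18] is sig22 and param[19] is sig13); the optimizer's start vector must match it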
sig11 <- param[16]
sig12 <- param[17]
sig13 <- param[19]
sig21 <- sig12
sig22 <- param[18]
sig23 <- param[20]
sig31 <- sig13
sig32 <- sig23
sig33 <- param[21]
# Lnpindex (Ln a(p) where a0 = 5)
lnpindex <- 5 + a1*lnp1 + a2*lnp2 + a3*lnp3 + a4*lnp4
lnpindex <- lnpindex + 0.5*(g11*lnp1*lnp1 + g12*lnp1*lnp2 + g13*lnp1*lnp3 + g14*lnp1*lnp4)
lnpindex <- lnpindex + 0.5*(g21*lnp2*lnp1 + g22*lnp2*lnp2 + g23*lnp2*lnp3 + g24*lnp2*lnp4)
lnpindex <- lnpindex + 0.5*(g31*lnp3*lnp1 + g32*lnp3*lnp2 + g33*lnp3*lnp3 + g34*lnp3*lnp4)
lnpindex <- lnpindex + 0.5*(g41*lnp4*lnp1 + g42*lnp4*lnp2 + g43*lnp4*lnp3 + g44*lnp4*lnp4)
#for(i in as.character(1:4)) {
# for(j in as.character(1:4)) {
# gij <- get(paste0("g", i, j))
# lnpi <- get(paste0("lnp", i))
# lnpj <- get(paste0("lnp", j))
# lnpindex <- lnpindex + 0.5*gij*lnpi*lnpj
# }
#}
# b(p) price index
bofp <- lnp1*b1 + lnp2*b2 + lnp3*b3 + lnp4*b4
#for(i in as.character(1:4)) {
# lnpi <- get(paste0("lnp", i))
# bi <- get(paste0("b", i))
# bofp <- bofp + lnpi*bi
#}
bofp <- exp(bofp)
# The parametric shares
u1 <- a1 + g11*lnp1 + g12*lnp2 + g13*lnp3 + g14*lnp4 + b1*(lnw - lnpindex) + (l1/bofp)*(lnw - lnpindex)^2
u2 <- a2 + g21*lnp1 + g22*lnp2 + g23*lnp3 + g24*lnp4 + b2*(lnw - lnpindex) + (l2/bofp)*(lnw - lnpindex)^2
u3 <- a3 + g31*lnp1 + g32*lnp2 + g33*lnp3 + g34*lnp4 + b3*(lnw - lnpindex) + (l3/bofp)*(lnw - lnpindex)^2
U <- c(mean(u1, na.rm = TRUE), mean(u2, na.rm = TRUE), mean(u3, na.rm = TRUE))
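# note: U collapses the per-observation fitted shares u1, u2, u3 into a single
# 3-vector of means, which dmvnorm() below uses as a common mean for every row of S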
# The vcov matrix:
sigma <- c(sig11, sig12, sig13, sig21, sig22, sig23, sig31, sig32, sig33)
sigma <- matrix(sigma, 3, 3)
# The shares
S <- cbind(s1, s2, s3)
# the individual log-likelihood
ll <- dmvnorm(S, U, sigma = sigma, log = TRUE)
return(sum(ll))
}
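For reference, this is roughly how I call the solver; a minimal sketch with illustrative start values (the data vectors s1, s2, s3, lnp1, ..., lnw are assumed to be in the workspace, and the last six entries must keep the implied covariance matrix positive definite):
library(BB)
# BBoptim() minimizes by default, so negate the log-likelihood
negll <- function(param) -quaids.loglike(param, s1, s2, s3, lnp1, lnp2, lnp3, lnp4, lnw)
# illustrative start values: alphas, betas, gammas, lambdas, then the sigmas
# packed as sig11, sig12, sig22, sig13, sig23, sig33 (a diagonal start)
start <- c(0.25, 0.25, 0.25, rep(0, 3), rep(0, 6), rep(0, 3), 0.1, 0, 0.1, 0, 0, 0.1)
fit <- BBoptim(par = start, fn = negll)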
I am trying to use bridge sampling in RStudio to simulate paths for the variance gamma process. My code is:
sigma = 0.5054
theta = 0.2464
nu = 0.1184
mu=1
N=2^(k)
k=5
V_<-rep(NA,252)
V_[0]<-0
G_[N]<-rgamma(1, shape=N*1/nu, scale=nu)
G_<-0
V<-rnorm(theta*G[N],sigma^2*G[N])
for(l in 1:k){
n<-2^(k-l)
for(j in 1:2^i-1){
i<-(2*j-1)*n
d1<-(n)*mu^2/nu
d2<-(n)*mu^2/nu
Y<-rbeta(1,d1,d2)
G_[i]<-G_[i-1]+(G[i+n]-G[i-n])*Y
G[i]
print(G_[i])
Z<-rnorm(0,(G_[i+n]-G_[i])*sigma^2*Y)
V_[i]<-Y*V_[i+n]+(1-Y)*V_[i-n]+Z
print(V_[i])
}
}
ts.plot(V[i])
I'm not sure what I've done wrong. The algorithm I am trying to follow is shown in the picture below:
Based on your code, I simulated a numerical sequence, which can be roughly validated by using VarianceGamma::vgFit to estimate the parameters.
Note that the time index starts from 1 because of R's 1-based indexing. The square root of the variance is used for the standard deviation in rnorm. I probably shouldn't add the change due to the interest rate vgC at the end, since it is not included in your algorithm; please set it to 0 if it doesn't make sense.
Simulation by Brownian bridge:
# Brownian-Gamma Bridge Sampling (BGBS) of a VG process
set.seed(1)
M <- 10
nt <- 2^M + 1 #number of observations
T <- nt - 1 #total time
T_ <- seq(0, T, length.out=nt) #fixed time increments
#random time increments
#T_ = c(0, runif(nt-2), 1)
#T_ = sort(T_) * T
r <- 1 + 0.2 #interest rate
vgC <- (r-1)
sigma <- 0.5054
theta <- 0.2464
nu <- 0.1184
V_ <- G_ <- rep(NA,nt)
V_[1] <- 0
G_[1] <- 0
G_[nt] <- rgamma(1, shape=T/nu, scale=nu)
V_[nt] <- rnorm(1, theta*G_[nt], sqrt(sigma^2*G_[nt]))
for (k in 1:M)
{
n <- 2^(M-k)
for (j in 1:2^(k-1))
{
i <- (2*j-1) * n
Y <- rbeta(1, (T_[i+1]-T_[i-n+1])/nu, (T_[i+n+1]-T_[i+1])/nu)
G_[i+1] <- G_[i-n+1] + (G_[i+n+1] - G_[i-n+1]) * Y
Z <- rnorm(1, sd=sqrt((G_[i+n+1] - G_[i+1]) * sigma^2 * Y))
V_[i+1] <- Y * V_[i+n+1] + (1-Y) * V_[i-n+1] + Z
}
}
V_ <- V_ + vgC*T_ # changes due to interest rate
plot(T_, V_)
The results roughly match the estimates:
#Estimated parameters:
library(VarianceGamma)
dV <- V_[2:nt] - V_[1:(nt-1)]
vgFit(dV)
> vgC sigma theta nu
> 0.2996 0.5241 0.1663 0.1184
#Real parameters:
c(vgC, sigma, theta, nu)
> vgC sigma theta nu
> 0.2000 0.5054 0.2464 0.1184
EDIT
As you commented, there is another similar algorithm, which can be implemented in a similar way.
Your code could be modified as below:
set.seed(1)
M <- 7
nt <- 2^M + 1
T <- nt - 1
T_ <- seq(0, T, length.out=nt)
sigma=0.008835
theta= -0.003856
nu=0.263743
vgc=0.004132
V_ <- G_ <- rep(1,nt)
G_[T+1] <- rgamma(1, shape=T/nu, scale=nu) #
V_[T+1] <- rnorm(1, theta*G_[T+1], sqrt(sigma^2*G_[T+1])) #
V_[1] <- 0
G_[1] <- 0
for (m in 1:M){ #
Y <- rbeta(1,T/(2^m*nu), T/(2^m*nu))
for (j in 1:2^(m-1)){ #
i <- (2*j-1)
G_[i*T/(2^m)+1] = G_[(i-1)*T/(2^m)+1]+(-G_[(i-1)*T/(2^m)+1]+G_[(i+1)*T/(2^m)+1])*Y #
b=G_[T*(i+1)/2^m+1] - G_[T*(i)/2^m+1] #
Z_i <- rnorm(1, sd=b*sigma^2*Y)
#V_[i] <- Y* V_[i+1] + (1-Y)*V_[i-1] + Z_i
V_[i*T/(2^m)+1] <- Y* V_[(i+1)*T/(2^m)+1] + (1-Y)*V_[(i-1)*T/(2^m)+1] + Z_i
}
}
V_ <- V_ + vgc*T_
V_
ts.plot(V_, main="BRIDGE", xlab="Time increment")
Ryan again. I have found another algorithm for bridge sampling, which I tried on my own, but I am not convinced that my answers are correct. I have added my code, output, and algorithm below, and also the output I think it should look like. I have used a similar format to your code:
set.seed(1)
M <- 7
nt <- 2^M + 1 #number of observations
T <- nt - 1 #total time
T_ <- seq(0, T, length.out=nt) #fixed time increments
sigma=0.008835
theta= -0.003856
nu=0.263743
vgc=0.004132
V_ <- G_ <- rep(1,nt)
G_[T] <- rgamma(1, shape=T/nu, scale=nu)
V_[T] <- rnorm(1, theta*G_[T], sqrt(sigma^2*G_[T]))
V_[1] <- 0
G_[1] <- 0
for (m in 2:M){
Y <- rbeta(1,T/(2^m*nu), T/(2^m*nu))
for (j in 2:2^(m-1)){
i <- (2*j-1)
G_[i*T/(2^m)] = G_[(i-1)*T/(2^m)]+(G_[(i-1)*T/(2^m)]+G_[(i+1)*T/(2^m)])*Y
b=G_[T*(i)/2^m] - G_[T*(i-1)/2^m]
Z_i <- rnorm(1, sd=b*sigma^2*Y)
V_[i] <- Y* V_[i+1] + (1-Y)*V_[i-1] + Z_i
}
}
V_ <- V_ + vgc*T_ # changes due to interest rate
V_
ts.plot(V_, main="BRIDGE", xlab="Time increment")
However, this is how the plot of my output looks, in figure 1:
But as the variance gamma process is a jump process with finite activity, the path should look like this (this is just an image from Google of variance gamma paths); the sequential-sampling one looks like that, and my aim is to compare it to bridge sampling for simulating paths. But my output looks really different. Please let me know your thoughts, and if there is an issue in my code, let me know. Thanks. Here is the algorithm for it, much like the one above but slightly different, with the sequential sampler I am comparing against sketched after it:
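For that comparison, a minimal sketch of the sequential sampler (assumptions: unit time step, the same seed, and the parameter values defined above):
set.seed(1)
nt <- 2^7 + 1
# gamma time increments over unit steps: mean dt and variance nu*dt with dt = 1
dG <- rgamma(nt - 1, shape = 1/nu, scale = nu)
# VG increments: drifted Brownian motion evaluated at gamma time
dV <- rnorm(nt - 1, mean = theta*dG, sd = sqrt(sigma^2*dG))
V_seq <- c(0, cumsum(dV))
ts.plot(V_seq, main = "SEQUENTIAL", xlab = "Time increment")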
I am very new to R, to RStudio, and to the whole concept of a coding language. I'm trying to create reproducible code so I can properly ask a question.
The first error says:
Error in colSums(cTrain * log(pTrain) + cCar * log(pCar) + cSM * log(pSM)) :
'x' must be an array of at least two dimensions
Using this code, where can I fix this so that 'x' can have two dimensions?
mydata <- structure(list(LUGGAGE=c(0,1,0,1,0), GA=c(0,0,0,0,0), TRAIN_AV=c(1,1,1,1,1), CAR_AV=c(1,1,1,1,1), SM_AV=c(1,1,1,1,1),
TRAIN_TT=c(114,142,235,193,227), TRAIN_CO=c(40,109,124,90,94),
SM_TxT=c(44,91,179,119,108), SM_CO=c(46,132,132,127,118),
CAR_TT=c(140,110,170,150,286), CAR_CO=c(123,104,80,95,169), CHOICE=c(2,2,3,3,2)),
.Names=c("Luggage","GA","TRAIN_AV","CAR_AV","SM_AV","TRAIN_TT","TRAIN_CO","SM_TT","SM_CO","CAR_TT","CAR_CO","CHOICE"),
row.names=c(NA,5L), class="data.frame")
## Number of parameters (start values will be numeric(initPar), a vector of zeros)
initPar <- 8
### Log-Likelihood Function of the Logit Model
library("maxLik")
loglik <- function(x) {
## Parameters
# Alternative Specific Constants
asc_train <- x[1]
asc_sm <- x[2]
# Travel Time to Destination
ttime <- x[3]
# Travel Cost to Destination
tcost_train <- x[4]
tcost_car <- x[5]
tcost_sm <- x[6]
# Effect of Swiss Annual Season Ticket
ga <- x[7]
# Effect of luggage
luggage <- x[8]
## Log-Likelihood Variable
LL = 0
## Utility Function Vin
train <- asc_train*matrix(1, nrow=nrow(mydata), ncol = 1) + tcost_train*mydata$TRAIN_CO + ttime*mydata$TRAIN_TT/100 + ga*mydata$GA + luggage*mydata$LUGGAGE
car <- tcost_car*mydata$CAR_CO + ttime*mydata$CAR_TT/100 + luggage*mydata$LUGGAGE
sm <- asc_sm*matrix(1, nrow=nrow(mydata), ncol = 1) + tcost_sm*mydata$SM_CO + ttime*mydata$SM_TT/100 + ga*mydata$GA + luggage*mydata$LUGGAGE
## exp(Vin) and Control for Mode Availability
train <- mydata$TRAIN_AV *exp(train)
car <- mydata$CAR_AV *exp(car)
sm <- mydata$SM_AV *exp(sm)
## Choice Probabilities
deno <- (train + car + sm)
## Individual Choice Probabilities
pTrain <- mydata$TRAIN_AV *(train / deno)
pCar <- mydata$CAR_AV *(car / deno)
pSM <- mydata$SM_AV *(sm / deno)
pTrain <- (pTrain!=0) *pTrain + (pTrain==0)
pCar <- (pCar!=0) *pCar + (pCar==0)
pSM <- (pSM!=0) *pSM + (pSM==0)
## Choice Results
cTrain <- mydata$CHOICE == "1"
cCar <- mydata$CHOICE == "3"
cSM <- mydata$CHOICE == "2"
## Log-Likelihood Function
LL <- colSums(cTrain*log(pTrain) + cCar*log(pCar) + cSM*log(pSM))
}
### Maximization of Log-Likelihood Function ###
# Parameter Optimization
result <- maxLik(loglik, start=numeric(initPar))
# Parameter Estimation, Hessian Matrix Calculation
parameters <- result$estimate
hessianMatrix <- result$hessian
# T-Statistic Calculation
tval <- parameters/sqrt(-diag(solve(hessianMatrix)))
# L(0), Log-Likelihood When All parameters = 0
L0 <- loglik(numeric(initPar))
# LL, Maximum Log-Likelihood
LL <- result$maximum
Nicely asked question with a reproducible example; upvoted!
Your problem was very simple. Your function looks for a variable called mydata$LUGGAGE that doesn't exist. R is case sensitive and your column is called mydata$Luggage.
All you have to do is
names(mydata)[1] <- "LUGGAGE"
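Alternatively (a small sketch, assuming every column name referenced inside loglik is upper-case, as they are here), you can normalize all the names at once:
names(mydata) <- toupper(names(mydata))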
Now run your script and you should get this result:
result <- maxLik(loglik, start=numeric(initPar))
result
# Maximum Likelihood estimation
# Newton-Raphson maximisation, 30 iterations
# Return code 2: successive function values within tolerance limit
# Log-Likelihood: -1.744552e-07 (8 free parameter(s))
# Estimate(s): -277.7676 -250.6531 8.651811 -1.680196 -4.208955 -1.281697 0 354.4692
I am new to R and taught myself what I know of it based on the other languages I know. I am currently in a student research position and must use R to find the maximum likelihood estimate of the given likelihood function:
where g, m_i, x_ij, n_ij, and mu_i are known. I have to maximize over theta_i, but I am not sure how, since I am mostly self-taught. I do know that I should have six estimated values of theta, however. I have tried researching online about using mle, but I am not far enough into statistics to understand what the websites are talking about. Any help in figuring out what I am doing wrong would be greatly appreciated. I am unsure how to attach Excel files, so I apologize for not being able to include data tables.
While trying to teach myself this and working with the professor, we receive this error:
Error in do.call("minuslogl", l) : could not find function "minuslogl"
Below is the code I have written up to this point:
library(stats4)
#####################################################################
#Likelihood Model ####CHECK
###################################################################
BB <- function(LITTERS, responses, fetuses, mu, theta) {
total <- 0
#1
for (i in 1:6) {
firstSum <- 0
#2
for (j in 1:LITTERS) {
secondSum <- 0
#log(mu[i] + kTheta[i])
insideFirst <- 0
for (k in 0:(responses[i,j] - 1))
{
insideFirst <- insideFirst + log10(mu[i] + k * theta[i])
}
#log(1-mu[i] + kTheta[i])
insideSecond <- 0
for (k in 0:(fetuses[i,j] - responses[i,j] - 1))
{
insideSecond <- insideSecond + log10(1 - mu[i] + k * theta[i])
}
#log(1 + kTheta[i])
insideThird <- 0
for (k in 0:(fetuses[i,j] - 1))
{
insideThird <- insideThird + log10(1 + k * theta[i])
}
secondSum <- insideFirst + insideSecond - insideThird
firstSum <- firstSum + secondSum
}
total <- total + firstSum
}
return (total)
}
###################################################################
#Number of litters
LITTERS.M <- 25
doses <- c(0, 30, 45, 60, 75, 90)
#Retrieves the litter sizes (fetuses)
litterSize.dose0 <- get.Litter.Sizes(dose0, LITTERS.M)
litterSize.dose30 <- get.Litter.Sizes(dose30, LITTERS.M)
litterSize.dose45 <- get.Litter.Sizes(dose45, LITTERS.M)
litterSize.dose60 <- get.Litter.Sizes(dose60, LITTERS.M)
litterSize.dose75 <- get.Litter.Sizes(dose75, LITTERS.M)
litterSize.dose90 <- get.Litter.Sizes(dose90, LITTERS.M)
litterSize <- c(litterSize.dose0, litterSize.dose30, litterSize.dose45, litterSize.dose60, litterSize.dose75, litterSize.dose90)
litterSizes <- matrix(litterSize, nrow = 6, ncol = LITTERS.M)
#Start of Linear Regression for AB By first estimating AB
estimate.dose0 <- get.estimate.AB(dose0)
estimate.dose30 <- get.estimate.AB(dose30)
estimate.dose45 <- get.estimate.AB(dose45)
estimate.dose60 <- get.estimate.AB(dose60)
estimate.dose75 <- get.estimate.AB(dose75)
estimate.dose90 <- get.estimate.AB(dose90)
rProbR <- c(estimate.dose0, estimate.dose30, estimate.dose45, estimate.dose60,
estimate.dose75, estimate.dose90)
ab <- c(get.Log.Estimate(estimate.dose0), get.Log.Estimate(estimate.dose30), get.Log.Estimate(estimate.dose45),
get.Log.Estimate(estimate.dose60), get.Log.Estimate(estimate.dose75), get.Log.Estimate(estimate.dose90))
#Fit to Linear Regression
toFit <- data.frame(rProbR, ab)
linearRegression <- lm(ab ~ rProbR, data=toFit)
#Get Coefficients of linear regression of AB
AApproximation = linearRegression$coefficients[1]
BApproximation = linearRegression$coefficients[2]
#Get probability response for each dose group (P(D[i]))
probabilityResponse.dose0 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 0)
probabilityResponse.dose30 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 30)
probabilityResponse.dose45 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 45)
probabilityResponse.dose60 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 60)
probabilityResponse.dose75 <- get.Probability.Response.Logistic(AApproximation + BApproximation* 75)
probabilityResponse.dose90 <- get.Probability.Response.Logistic(AApproximation + BApproximation * 90)
probabilityResponses <- c(probabilityResponse.dose0, probabilityResponse.dose30, probabilityResponse.dose45, probabilityResponse.dose60, probabilityResponse.dose75, probabilityResponse.dose90)
#Generate number of responses for each litter (Responses)
litterResponses.dose0 <- rbinom(LITTERS.M, litterSize.dose0, probabilityResponse.dose0)
litterResponses.dose30 <- rbinom(LITTERS.M, litterSize.dose30, probabilityResponse.dose30)
litterResponses.dose45 <- rbinom(LITTERS.M, litterSize.dose45, probabilityResponse.dose45)
litterResponses.dose60 <- rbinom(LITTERS.M, litterSize.dose60, probabilityResponse.dose60)
litterResponses.dose75 <- rbinom(LITTERS.M, litterSize.dose75, probabilityResponse.dose75)
litterResponses.dose90 <- rbinom(LITTERS.M, litterSize.dose90, probabilityResponse.dose90)
litterResponse <- c(litterResponses.dose0, litterResponses.dose30, litterResponses.dose45, litterResponses.dose60, litterResponses.dose75, litterResponses.dose90)
litterResponses <- matrix(litterResponse, 6, LITTERS.M)
backgroundResponseProb <- get.Probability.Response.Logistic(AApproximation + BApproximation * 0)
backgroundResponseProb <- backgroundResponseProb + .001
mle(BB(LITTERS.M, litterResponses, litterSizes, probabilityResponse, theta=0))
I haven't gone through the entire code (you should try to provide minimal reproducible examples), but the error you're getting is caused by the fact that you're not using the mle function correctly.
The first argument to the mle function should be another function that takes candidate parameters as arguments, and returns the negative log-likelihood of the data as a function of these parameters. The second argument to the mle function is a named list of starting parameters. Look at ?mle for more details.
Here's a minimal example for fitting normally distributed data:
library(stats4)
y <- rnorm(100, 5, 3) ## Example data
mllNorm <- function(mean, log.sd) {-sum(dnorm(y, mean, exp(log.sd), log=TRUE))} ## Minus Gaussian log-likelihood
mle.fit <- mle(mllNorm, start=list(mean=1, log.sd=1)) ## MLE
print(mle.fit)
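Applied to your BB function, a hedged sketch (assumptions: six theta parameters, illustrative start values, and the probabilityResponses vector defined earlier supplying mu):
mllBB <- function(t1, t2, t3, t4, t5, t6) {
    # mle() expects the negative log-likelihood as a function of the parameters only
    -BB(LITTERS.M, litterResponses, litterSizes, probabilityResponses,
        theta=c(t1, t2, t3, t4, t5, t6))
}
bb.fit <- mle(mllBB, start=list(t1=0.1, t2=0.1, t3=0.1, t4=0.1, t5=0.1, t6=0.1))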
As a more general tip, I highly recommend the maxLik package for MLE. It offers a more flexible interface, prettier output, and more optimization options.
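For instance, a minimal sketch of the same Gaussian fit with maxLik (note that maxLik maximizes, so you supply the positive log-likelihood; the start values are illustrative):
library(maxLik)
llNorm <- function(par) sum(dnorm(y, par["mean"], exp(par["log.sd"]), log=TRUE))
maxLik.fit <- maxLik(llNorm, start=c(mean=1, log.sd=1))
summary(maxLik.fit)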
I have been trying to use biglm to run linear regressions on a large dataset (approximately 60,000,000 lines). I want to use AIC for model selection. However, I discovered when playing with biglm on smaller datasets that the AIC values returned by biglm differ from those returned by lm. This even applies to the example in the biglm help.
data(trees)
ff<-log(Volume)~log(Girth)+log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
AIC(a) # 48.18546
a_lm <- lm(ff, trees)
AIC(a_lm) # -62.71125
Can someone please explain what is happening here? Are the AICs generated with biglm safe to use for comparing biglm models on the same dataset?
tl;dr it looks to me like there is a pretty obvious bug in the AIC method for biglm-class objects (more specifically, in the update method), in the current (0.9-1) version, but the author of the biglm package is a smart, experienced guy, and biglm is widely used, so perhaps I'm missing something. Googling for "biglm AIC df.resid", it seems this has been discussed way back in 2009? Update: the package author/maintainer reports via e-mail that this is indeed a bug.
Something funny seems to be going on here. The differences in AIC between models should be the same across modeling frameworks, whatever the constants that have been used and however parameters are counted (because these constants and parameter-counting should be consistent within modeling frameworks ...)
Original example:
data(trees)
ff <- log(Volume)~log(Girth)+log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
a_lm <- lm(ff, trees)
Now fit a reduced model:
ff2 <- log(Volume)~log(Girth)
a2 <- biglm(ff2, chunk1)
a2 <- update(a2, chunk2)
a2 <- update(a2 ,chunk3)
a2_lm <- lm(ff2,trees)
Now compare AIC values:
AIC(a)-AIC(a2)
## [1] 1.80222
AIC(a_lm)-AIC(a2_lm)
## [1] -20.50022
Check that we haven't screwed something up:
all.equal(coef(a),coef(a_lm)) ## TRUE
all.equal(coef(a2),coef(a2_lm)) ## TRUE
Look under the hood:
biglm:::AIC.biglm
## function (object, ..., k = 2)
## deviance(object) + k * (object$n - object$df.resid)
In principle this is the right formula (number of observations minus residual df should be the number of parameters fitted), but digging in, it looks like the $df.resid component of the objects hasn't been updated properly:
a$n ## 31, correct
a$df.resid ## 7, only valid before updating!
Looking at biglm:::update.biglm, I would add
object$df.resid <- object$df.resid + NROW(mm)
right before or after the line that reads
object$n <- object$n + NROW(mm)
...
This seems like a fairly obvious bug to me, so perhaps I'm missing something obvious, or perhaps it has been fixed.
A simple workaround would be to define your own AIC function as
AIC.biglm <- function(object, ..., k = 2) {
    deviance(object) + k * length(coef(object))
}
AIC(a)-AIC(a2) ## matches results from lm()
(although note that AIC(a_lm) is still not equal to AIC(a), because stats:::AIC.default() uses 2*log-likelihood rather than deviance (these two measures differ in their additive coefficients) ...)
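Until the bug is fixed upstream, another possible workaround (a sketch, assuming the stale df.resid accounting described above) is to repair df.resid yourself after each chunk:
update_biglm_fixed <- function(object, moredata) {
    n_old <- object$n
    object <- update(object, moredata)
    # update() advances object$n but leaves object$df.resid stale; add the new rows here
    object$df.resid <- object$df.resid + (object$n - n_old)
    object
}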
I have played around with this a bit. I am not certain, but I think the formula for AIC used by the package biglm is:
2 * (n.parameters + obs.added - 1) + deviance(a)
where obs.added is the number of observations in chunk2 plus the number of observations in chunk3:
obs.added <- dim(chunk2)[1] + dim(chunk3)[1]
and n.parameters is the number of estimated coefficients returned by summary(a) + 1 (where the +1 is for the error term), and deviance(a) is the deviance of your model a.
####################################################
data(trees)
ff <- log(Volume)~log(Girth)+log(Height)
n.parm <- 4
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
AIC(a)
summary(a)
deviance(a)
2 * (n.parm + obs.added - 1) + deviance(a)
round(AIC(a), 5) == round(2 * (n.parm + obs.added - 1) + deviance(a), 5)
# [1] TRUE
####################################################
Since I am not 100% certain my answer is correct, you can play around with the code below and see whether you can find a scenario where the proposed formula for AIC does not work. If I find any such scenarios I will attempt to modify the code below and the formula above as necessary.
#########################################################
# Generate some data
n <- 118 # number of observations
B0 <- 2 # intercept
B1 <- -1.5 # slope 1
B2 <- 0.4 # slope 2
B3 <- 2.0 # slope 3
B4 <- -0.8 # slope 4
sigma2 <- 5 # residual variance
x1 <- round(runif(n, -5 , 5), digits = 3) # covariate 1
x2 <- round(runif(n, 10 , 20), digits = 3) # covariate 2
x3 <- round(runif(n, 2 , 8), digits = 3) # covariate 3
x4 <- round(runif(n, 12 , 15), digits = 3) # covariate 4
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2)) # error
y <- B0 + B1 * x1 + B2 * x2 + B3 * x3 + B4 * x4 + eps # dependent variable
my.data <- data.frame(y, x1, x2, x3, x4)
# analyze data with linear regression
model.1 <- lm(my.data$y ~ my.data$x1 + my.data$x2 + my.data$x3 + my.data$x4)
summary(model.1)
AIC(model.1)
n.parms <- length(model.1$coefficients) + 1
my.AIC <- 2 * n.parms - 2 * as.numeric(logLik(model.1))
my.AIC
#########################################################
ff0 <- y ~ 1
ff1 <- y ~ x1
ff2 <- y ~ x1 + x2
ff3 <- y ~ x1 + x2 + x3
ff4 <- y ~ x1 + x2 + x3 + x4
n.parm0 <- 2
n.parm1 <- 3
n.parm2 <- 4
n.parm3 <- 5
n.parm4 <- 6
n.chunks <- 5
chunk1<-my.data[ 1:round(((nrow(my.data)/n.chunks)*1)+0),]
chunk2<-my.data[round(((nrow(my.data)/n.chunks)*1)+1):round(((nrow(my.data)/n.chunks)*2)+0),]
chunk3<-my.data[round(((nrow(my.data)/n.chunks)*2)+1):round(((nrow(my.data)/n.chunks)*3)+0),]
chunk4<-my.data[round(((nrow(my.data)/n.chunks)*3)+1):round(((nrow(my.data)/n.chunks)*4)+0),]
chunk5<-my.data[round(((nrow(my.data)/n.chunks)*4)+1):nrow(my.data),]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1] + dim(chunk4)[1] + dim(chunk5)[1]
# check division of data
foo <- list()
foo[[1]] <- chunk1
foo[[2]] <- chunk2
foo[[3]] <- chunk3
foo[[4]] <- chunk4
foo[[5]] <- chunk5
all.data.foo <- do.call(rbind, foo)
all.equal(my.data, all.data.foo)
####################################################
library(biglm)
####################################################
a0 <- biglm(ff0, chunk1)
a0 <- update(a0, chunk2)
a0 <- update(a0, chunk3)
a0 <- update(a0, chunk4)
a0 <- update(a0, chunk5)
AIC(a0)
summary(a0)
deviance(a0)
print(a0)
2 * (n.parm0 + obs.added - 1) + deviance(a0)
round(AIC(a0), 5) == round(2 * (n.parm0 + obs.added - 1) + deviance(a0), 5)
####################################################
a1 <- biglm(ff1, chunk1)
a1 <- update(a1, chunk2)
a1 <- update(a1, chunk3)
a1 <- update(a1, chunk4)
a1 <- update(a1, chunk5)
AIC(a1)
summary(a1)
deviance(a1)
print(a1)
2 * (n.parm1 + obs.added - 1) + deviance(a1)
round(AIC(a1), 5) == round(2 * (n.parm1 + obs.added - 1) + deviance(a1), 5)
####################################################
a2 <- biglm(ff2, chunk1)
a2 <- update(a2, chunk2)
a2 <- update(a2, chunk3)
a2 <- update(a2, chunk4)
a2 <- update(a2, chunk5)
AIC(a2)
summary(a2)
deviance(a2)
print(a2)
2 * (n.parm2 + obs.added - 1) + deviance(a2)
round(AIC(a2), 5) == round(2 * (n.parm2 + obs.added - 1) + deviance(a2), 5)
####################################################
a3 <- biglm(ff3, chunk1)
a3 <- update(a3, chunk2)
a3 <- update(a3, chunk3)
a3 <- update(a3, chunk4)
a3 <- update(a3, chunk5)
AIC(a3)
summary(a3)
deviance(a3)
print(a3)
2 * (n.parm3 + obs.added - 1) + deviance(a3)
round(AIC(a3), 5) == round(2 * (n.parm3 + obs.added - 1) + deviance(a3), 5)
####################################################
a4 <- biglm(ff4, chunk1)
a4 <- update(a4, chunk2)
a4 <- update(a4, chunk3)
a4 <- update(a4, chunk4)
a4 <- update(a4, chunk5)
AIC(a4)
summary(a4)
deviance(a4)
print(a4)
2 * (n.parm4 + obs.added - 1) + deviance(a4)
round(AIC(a4), 5) == round(2 * (n.parm4 + obs.added - 1) + deviance(a4), 5)
####################################################
EDIT
I suggested biglm uses the following equation for AIC:
2 * (n.parameters + obs.added - 1) + deviance(a)
Ben Bolker pointed out that the equation biglm uses for AIC is:
deviance(object) + k * (object$n - object$df.resid)
Ben also determined that biglm was not updating the first value for residual df.
Given that new information, I now see that the two equations are equivalent.
First, restrict the two equations to the following, which is the only place they differ:
(n.parameters + obs.added - 1) # mine
(object$n - object$df.resid) # Ben's
Re-arrange mine to account for me adding 1 to the number of parameters and then subtracting one:
((n.parameters-1) + obs.added) = ((4-1) + obs.added) = (3 + 21) = 24
Now morph my equation into Ben's:
My 3 is the same as:
(number of observations in chunk1 - object$df.resid) = (10 - 7) = 3
giving:
((number of obs in chunk1 - object$df.resid) + obs.added) = ((10-7) + 21)
or:
(3 + 21) = 24
Re-arrange:
((number of obs in chunk1 + obs.added) - object$df.resid) = ((10 + 21) - 7)
or:
(31 - 7) = 24
And:
((number of observations in chunk1 + obs.added) - object$df.resid)
is the same as:
(total number of observations - object$df.resid)
Which is the same as:
(object$n - object$df.resid) = (31 - 7) = 24
It appears the equation I proposed really is the equation biglm uses for AIC, just expressed in a different form.
Of course, I was only able to realize this because Ben provided both the critical code and the critical explanation of the error.
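A quick numeric check of the equivalence, using the values from the trees example above (4 parameters, 21 observations added, 31 observations in total, residual df 7):
n.parameters <- 4; obs.added <- 21
n <- 31; df.resid <- 7
(n.parameters + obs.added - 1) == (n - df.resid) # TRUE: both equal 24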
I am trying to use the mle2 command in the bbmle package. I am looking at p. 2 of "Maximum likelihood estimation and analysis with the bbmle package" by Bolker. Somehow I fail to enter the right start values. Here's the reproducible code:
l.lik.probit <-function(par, ivs, dv){
Y <- as.matrix(dv)
X <- as.matrix(ivs)
K <-ncol(X)
b <- as.matrix(par[1:K])
phi <- pnorm(X %*% b)
sum(Y * log(phi) + (1 - Y) * log(1 - phi))
}
n=200
set.seed(1000)
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- rnorm(n)
latentz<- 1 + 2.0 * x1 + 3.0 * x2 + 5.0 * x3 + 8.0 * x4 + rnorm(n,0,5)
y <- latentz
y[latentz < 1] <- 0
y[latentz >=1] <- 1
x <- cbind(1,x1,x2,x3,x4)
values.start <-c(1,1,1,1,1)
foo2<-mle2(l.lik.probit, start=list(dv=0,ivs=values.start),method="BFGS",optimizer="optim", data=list(Y=y,X=x))
And this is the error I get:
Error in mle2(l.lik.probit, start = list(Y = 0, X = values.start), method = "BFGS", :
some named arguments in 'start' are not arguments to the specified log-likelihood function
Any idea why? Thanks for your help!
You've missed a couple of things, but the most important is that by default mle2 takes a list of parameters; you can make it take a parameter vector instead, but you have to work a little bit harder.
I have tweaked the code slightly in places. (I changed the log-likelihood function to a negative log-likelihood function, without which this would never work!)
l.lik.probit <-function(par, ivs, dv){
K <- ncol(ivs)
b <- as.matrix(par[1:K])
phi <- pnorm(ivs %*% b)
-sum(dv * log(phi) + (1 - dv) * log(1 - phi))
}
n <- 200
set.seed(1000)
dat <- data.frame(x1=rnorm(n),
x2=rnorm(n),
x3=rnorm(n),
x4=rnorm(n))
beta <- c(1,2,3,5,8)
mm <- model.matrix(~x1+x2+x3+x4,data=dat)
latentz<- rnorm(n,mean=mm%*%beta,sd=5)
y <- latentz
y[latentz < 1] <- 0
y[latentz >=1] <- 1
x <- mm
values.start <- rep(1,5)
Now we do the fit. The main thing is to specify vecpar=TRUE and to use parnames to let mle2 know the names of the elements in the parameter vector ...
library("bbmle")
names(values.start) <- parnames(l.lik.probit) <- paste0("b",0:4)
m1 <- mle2(l.lik.probit, start=values.start,
vecpar=TRUE,
method="BFGS",optimizer="optim",
data=list(dv=y,ivs=x))
As pointed out above, for this particular example you have just re-implemented probit regression (although I understand that you now want to extend this to allow for heteroscedasticity in some way ...)
dat2 <- data.frame(dat,y)
m2 <- glm(y~x1+x2+x3+x4,family=binomial(link="probit"),
data=dat2)
As a final note, I would say that you should check out the parameters argument, which allows you to specify a linear sub-model for any one of the parameters, and the formula interface:
m3 <- mle2(y~dbinom(prob=pnorm(eta),size=1),
parameters=list(eta~x1+x2+x3+x4),
start=list(eta=0),
data=dat2)
PS confint(m1) appears to work fine (giving profile CIs as requested) with this set-up.
ae <- function(x,y) all.equal(unname(coef(x)),unname(coef(y)),tol=5e-5)
ae(m1,m2) && ae(m2,m3)