I am trying to set up a simple OLS model with constraints on the coefficients in R. The code below works; however, it demonstrates
y = c + a1x1 + a2x2 + a3x3 with constraint a1+a2 = 1
I would like to revise this constraint to:
a1*a2 - a3 = 0
thanks for your help!
WORKING CODE:
set.seed(1000)
n <- 20
x1 <- seq(100,length.out=n)+rnorm(n,0,2)
x2 <- seq(50,length.out=n)+rnorm(n,0,2)
x3 <- seq(10,length.out=n)+rnorm(n,0,2)
constant <- 100
ymat <- constant + .5*x1 + .5*x2 + .75*x3 + rnorm(n,0,4)
xmat <- cbind(x1,x2,x3)
X <- cbind(rep(1,n),xmat) # explicitly include vector for constant
bh <- solve(t(X)%*%X)%*%t(X)%*%ymat
XX <- solve(t(X)%*%X)
cmat <- matrix(1,1,1)
Q <- matrix(c(0,1,1,0),ncol(X),1) # a1+a2=1 for y = c + a1x1 + a2x2 + a3x3
bc <- bh-XX%*%Q%*%solve(t(Q)%*%XX%*%Q)%*%(t(Q)%*%bh-cmat)
library(quadprog)
d <- t(ymat) %*% X
Rinv = solve(chol(t(X)%*%X))
qp <- solve.QP(Dmat=Rinv, dvec=d, Amat=Q, bvec=cmat, meq=1, factorized=TRUE)
qp
cbind(bh,qp$unconstrained.solution)
cbind(bc,qp$solution)
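For reference, the bc line implements the standard restricted (equality-constrained) least squares estimator. In the code's notation, with XX = (X'X)^{-1} and constraint Q'b = cmat, the formula is

b_c \;=\; \hat{b} \;-\; (X^\top X)^{-1} Q \left[\, Q^\top (X^\top X)^{-1} Q \,\right]^{-1} \left( Q^\top \hat{b} - c \right)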
Assuming the problem is to minimize ||ymat - X b||^2 subject to b[2] * b[3] == b[4], we can substitute for b[4], giving the unconstrained nls problem shown below. The b in the call below corresponds to the first 3 elements of the original b, and b[4] is recovered by multiplying its last two elements together. No packages are used.
fm <- nls(ymat ~ X %*% c(b, b[2] * b[3]), start = list(b = 0:2))
fm
giving:
Nonlinear regression model
model: ymat ~ X %*% c(b, b[2] * b[3])
data: parent.frame()
b1 b2 b3
76.9718 0.6275 0.7598
residual sum-of-squares: 204
Number of iterations to convergence: 4
Achieved convergence tolerance: 6.555e-06
To compute b4:
prod(coef(fm)[-1])
## [1] 0.476805
Note
In a similar way the original problem (to minimize the same objective but with the original constraint) can be reduced to an unconstrained problem and solved using nls via substitution:
nls(ymat ~ X %*% c(b[1], b[2], 1-b[2], b[3]), start = list(b = 0:2))
giving:
Nonlinear regression model
model: ymat ~ X %*% c(b[1], b[2], 1 - b[2], b[3])
data: parent.frame()
b1 b2 b3
105.3186 0.3931 0.7964
residual sum-of-squares: 222.3
Number of iterations to convergence: 1
Achieved convergence tolerance: 4.838e-08
It would even be possible to reparameterize to make this original problem solvable by lm: since a2 = 1 - a1, the model y = c + a1*x1 + a2*x2 + a3*x3 can be rewritten as y = c + a1*(x1 - x2) + a3*x3 + x2, so x2 enters with a known coefficient of 1 and can be passed as an offset:
lm(ymat ~ I(X[, 2] - X[, 3]) + X[, 4] + offset(X[, 3]))
giving:
Call:
lm(formula = ymat ~ I(X[, 2] - X[, 3]) + X[, 4] + offset(X[, 3]))
Coefficients:
(Intercept) I(X[, 2] - X[, 3]) X[, 4]
105.3186 0.3931 0.7964
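A quick check (a sketch, re-running both fits from above) that the lm reparameterization reproduces the nls solution:
fm2 <- nls(ymat ~ X %*% c(b[1], b[2], 1-b[2], b[3]), start = list(b = 0:2))
fm3 <- lm(ymat ~ I(X[, 2] - X[, 3]) + X[, 4] + offset(X[, 3]))
all.equal(unname(coef(fm3)), unname(coef(fm2)), tolerance = 1e-4) # TRUE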
G. Grothendieck - thank you for your response. Unfortunately, this didn't work for me.
I decided to work out the Lagrangian longhand, which turned out to be too complicated for me to solve.
Then I realized that, assuming the coefficients are positive,
a1*a2 - a3 = 0
a1*a2 = a3
ln(a1*a2) = ln(a3)
ln(a1) + ln(a2) - ln(a3) = 0
This leaves me with an additive constraint, which I can handle with the quadprog package.
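A quick numeric sanity check of that identity (note it requires all three coefficients to be positive):
a1 <- 0.5; a2 <- 0.9; a3 <- a1 * a2 # satisfies a1*a2 - a3 = 0
log(a1) + log(a2) - log(a3)         # 0 (up to floating point)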
Maybe you can try the code below, using fmincon() from the pracma package:
library(pracma)
library(NlcOptim)
# define objective function
fn <- function(v) norm(ymat - as.vector(xmat %*% v), "2")
# the constraint a1*a2 - a3 = 0
heq1 <- function(v) prod(v[1:2]) - v[3]
# solve for a1, a2 and a3
res <- fmincon(0:2, fn, heq = heq1)
which gives:
> res$par
[1] 1.9043754 -0.1781830 -0.3393272
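Note that this fit uses xmat, so it drops the constant term. A variant that keeps the intercept (a sketch, reusing X and ymat from the question's code; v[1] is the constant and the constraint applies to the slopes v[2:4]):
fn2 <- function(v) sum((ymat - as.vector(X %*% v))^2) # same minimizer as the 2-norm
heq2 <- function(v) v[2] * v[3] - v[4]                # a1*a2 - a3 = 0
res2 <- fmincon(c(100, 0:2), fn2, heq = heq2)
res2$par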
Hi, I'm currently writing code to simulate data using the inverse method. I'm using a parallel exponential model where I let lambda = b0 + b1*x. My simulation is based on survival analysis.
#generate data
gen <- function(n, lambda, b0, b1){
  set.seed(1)
  u <- runif(n, 0, 1)
  c1 <- rexp(n, lambda)                  # censoring times
  x <- rnorm(n, 0, 1)                    # covariate
  t1 <- -log(1 - sqrt(u)) / (b0 + b1*x)  # inverse method
  c <- 1*(t1 < c1)                       # event indicator
  t <- pmin(t1, c1)                      # observed time
  data1 <- data.frame(x, t, t1, c1, c)
  return(data1)
}
data2 <- gen(20,0.01,2,4)
data2
x = data2$x
t = data2$t
xsum = sum(x)
tsum = sum(t)
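(For reference, the inverse-method line follows from the parallel exponential CDF, assuming the intended model is F(t) = (1 - exp(-lambda*t))^2: setting u = F(t) and solving for t gives t = -log(1 - sqrt(u))/lambda. A quick numeric check:)
lam <- 2; u0 <- 0.4
t0 <- -log(1 - sqrt(u0)) / lam
(1 - exp(-lam * t0))^2 # recovers u0 = 0.4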
The problem is that when I run the second piece of code below, it won't show my MLE estimates for b0 and b1.
#Likelihood
library(maxLik)
LLF <- function(para){
  set.seed(1)
  b0 <- para[1]
  b1 <- para[2]
  n <- 1
  z1 <- (n*log(2)) + (n*log(b0 + b1*xsum)) - ((b0 + b1*xsum)*tsum) +
    (n*log(1 - exp(-(b0 + b1*xsum)*tsum)))
  return(z1)
}
mle <- maxLik(LLF, start = c(2,4))
The problem is that you assigned n = 1 in the LLF. Since we maximize the likelihood over the parameters given the entire data set, n should equal the number of observations. If you update this, your MLE will converge. For example,
n<-nrow(data2)
#Likelihood
library(maxLik)
LLF <- function(para){
  set.seed(1)
  b0 <- para[1]
  b1 <- para[2]
  #n = 1   # now taken from nrow(data2) above
  z1 <- (n*log(2)) + (n*log(b0 + b1*xsum)) - ((b0 + b1*xsum)*tsum) +
    (n*log(1 - exp(-(b0 + b1*xsum)*tsum)))
  return(z1)
}
mle <- maxLik(LLF, start = c(2,4))
summary(mle)
Maximum Likelihood estimation
Newton-Raphson maximisation, 3 iterations
Return code 1: gradient close to zero
Log-Likelihood: -22.7055
2 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
[1,] 1.986 NA NA NA
[2,] 3.986 NA NA NA
I'm trying to learn QR decomposition, but can't figure out how to get the variance of beta_hat without resorting to traditional matrix calculations. I'm practising with the iris data set, and here's what I have so far:
y<-(iris$Sepal.Length)
x<-(iris$Sepal.Width)
X<-cbind(1,x)
n<-nrow(X)
p<-ncol(X)
qr.X<-qr(X)
b<-(t(qr.Q(qr.X)) %*% y)[1:p]
R<-qr.R(qr.X)
beta<-as.vector(backsolve(R,b))
res<-as.vector(y-X %*% beta)
Thanks for your help!
setup (copying in your code)
y <- iris$Sepal.Length
x <- iris$Sepal.Width
X <- cbind(1,x)
n <- nrow(X)
p <- ncol(X)
qr.X <- qr(X)
b <- (t(qr.Q(qr.X)) %*% y)[1:p] ## can be optimized; see Remark 1 below
R <- qr.R(qr.X) ## can be optimized; see Remark 2 below
beta <- as.vector(backsolve(R, b))
res <- as.vector(y - X %*% beta)
math
With the thin QR factorization X = QR (Q with orthonormal columns, R upper triangular), we have X'X = R'R, so
Var(beta_hat) = sigma^2 (X'X)^(-1) = sigma^2 (R'R)^(-1),
and chol2inv(R) computes (R'R)^(-1) directly from the triangular factor R. All that remains is to estimate sigma^2 from the residuals.
computation
The residual degrees of freedom are n - p, so the estimated error variance is
se2 <- sum(res ^ 2) / (n - p)
Thus, the variance-covariance matrix of the estimated coefficients is
V <- chol2inv(R) * se2
# [,1] [,2]
#[1,] 0.22934170 -0.07352916
#[2,] -0.07352916 0.02405009
validation
Let's check the correctness by comparing with lm:
fit <- lm(Sepal.Length ~ Sepal.Width, iris)
vcov(fit)
# (Intercept) Sepal.Width
#(Intercept) 0.22934170 -0.07352916
#Sepal.Width -0.07352916 0.02405009
Identical result!
Remark 1 (skip forming 'Q' factor)
Instead of b <- (t(qr.Q(qr.X)) %*% y)[1:p], you can use function qr.qty (to avoid forming 'Q' matrix):
b <- qr.qty(qr.X, y)[1:p]
Remark 2 (skip forming 'R' factor)
You don't have to extract R <- qr.R(qr.X) for backsolve; passing qr.X$qr is sufficient, as backsolve only reads the upper triangle:
beta <- as.vector(backsolve(qr.X$qr, b))
Appendix: A function for estimation
The above is the simplest demonstration. In practice, column pivoting and rank deficiency need to be dealt with. The following is an implementation: X is a model matrix and y is the response. Results should be compared with lm(y ~ X + 0).
qr_estimation <- function (X, y) {
  ## QR factorization
  QR <- qr(X)
  r <- QR$rank
  piv <- QR$pivot[1:r]
  ## estimate identifiable coefficients
  b <- qr.qty(QR, y)[1:r]
  beta <- backsolve(QR$qr, b, r)
  ## fitted values
  yhat <- base::c(X[, piv] %*% beta)
  ## residuals
  resi <- y - yhat
  ## error variance
  se2 <- base::c(crossprod(resi)) / (nrow(X) - r)
  ## variance-covariance for coefficients
  V <- chol2inv(QR$qr, r) * se2
  ## post-processing on pivoting and rank-deficiency
  p <- ncol(X)
  beta_full <- rep.int(NA_real_, p)
  beta_full[piv] <- beta
  V_full <- matrix(NA_real_, p, p)
  V_full[piv, piv] <- V
  ## return
  list(coefficients = beta_full, vcov = V_full,
       fitted.values = yhat, residuals = resi, sig = sqrt(se2))
}
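A quick usage sketch on the iris example from above, checking against lm:
est <- qr_estimation(X, y)
all.equal(unname(est$coefficients), unname(coef(lm(y ~ X + 0)))) # TRUE
all.equal(est$vcov, unname(vcov(lm(y ~ X + 0))))                 # TRUE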
I have several nonlinear regression models (nls) saved as a1, a2, ..., a_n. I would like to get a vector of the related coefficients of determination.
E.g.
y <- c(1.0385, 1.0195, 1.0176)
x <- c(3,4,5)
data <- data.frame(x,y)
b1 <- function(x,a,b) {a/b^x}
b2 <- function(x,a,b) {a^b^x}
a1 <- nls(y ~ b1(x,a,b), data = data, start = c(a=0.9, b=0.6))
a2 <- nls(y ~ b2(x,a,b), data = data, start = c(a=0.9, b=0.6))
I can get both coefficients of determination using
a <- sum(residuals(a1)^2)
b <- sum((y - mean(y))^2)
1 - (a/b)
#[1] 0.8198396
a <- sum(residuals(a2)^2)
b <- sum((y - mean(y))^2)
1 - (a/b)
#[1] 0.9066859
but what if I have, let's say, 20 models?
I tried to use a for loop, which didn't work for me, as the objects are of class nls, neither a vector nor a matrix.
Use a list of all your results and then apply a function to it:
results <- list(a1,a2)
b <- sum((y - mean(y))^2)
1 - (sapply(results,function(x) sum(residuals(x)^2) ) / b )
#[1] 0.8198396 0.9066859
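If the models are saved as separate objects a1, a2, ..., a20 rather than built in a list, one way to collect them (a sketch, assuming the objects follow that naming pattern) is mget:
results <- mget(paste0("a", 1:20)) # gathers a1 ... a20 from the workspace
b <- sum((y - mean(y))^2)
1 - sapply(results, function(m) sum(residuals(m)^2)) / b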
I have been trying to use biglm to run linear regressions on a large dataset (approx 60,000,000 lines). I want to use AIC for model selection. However, I discovered when playing with biglm on smaller datasets that the AIC values returned by biglm differ from those returned by lm. This even applies to the example in the biglm help.
data(trees)
ff<-log(Volume)~log(Girth)+log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
AIC(a) # 48.18546
a_lm <- lm(ff, trees)
AIC(a_lm) # -62.71125
Can someone please explain what is happening here? Are the AICs generated with biglm safe to use for comparing biglm models on the same dataset?
tl;dr It looks to me like there is a pretty obvious bug in the AIC method for biglm-class objects (more specifically, in the update method) in the current (0.9-1) version. But the author of the biglm package is a smart, experienced guy, and biglm is widely used, so perhaps I'm missing something. Googling for "biglm AIC df.resid" suggests this was discussed as far back as 2009. Update: the package author/maintainer reports via e-mail that this is indeed a bug.
Something funny seems to be going on here. The differences in AIC between models should be the same across modeling frameworks, whatever the constants that have been used and however parameters are counted (because these constants and parameter-counting should be consistent within modeling frameworks ...)
Original example:
data(trees)
ff <- log(Volume)~log(Girth)+log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
a_lm <- lm(ff, trees)
Now fit a reduced model:
ff2 <- log(Volume)~log(Girth)
a2 <- biglm(ff2, chunk1)
a2 <- update(a2, chunk2)
a2 <- update(a2 ,chunk3)
a2_lm <- lm(ff2,trees)
Now compare AIC values:
AIC(a)-AIC(a2)
## [1] 1.80222
AIC(a_lm)-AIC(a2_lm)
## [1] -20.50022
Check that we haven't screwed something up:
all.equal(coef(a),coef(a_lm)) ## TRUE
all.equal(coef(a2),coef(a2_lm)) ## TRUE
Look under the hood:
biglm:::AIC.biglm
## function (object, ..., k = 2)
## deviance(object) + k * (object$n - object$df.resid)
In principle this is the right formula (number of observations minus residual df should be the number of parameters fitted), but digging in, it looks like the $df.resid component of the objects hasn't been updated properly:
a$n ## 31, correct
a$df.resid ## 7, only valid before updating!
Looking at biglm:::update.biglm, I would add
object$df.resid <- object$df.resid + NROW(mm)
right before or after the line that reads
object$n <- object$n + NROW(mm)
...
This seems like a fairly obvious bug to me, so perhaps I'm missing something obvious, or perhaps it has been fixed.
A simple workaround would be to define your own AIC function as
AIC.biglm <- function (object, ..., k = 2) {
  deviance(object) + k * length(coef(object))
}
AIC(a)-AIC(a2) ## matches results from lm()
(although note that AIC(a_lm) is still not equal to AIC(a), because stats:::AIC.default() uses -2*log-likelihood rather than the deviance; the two measures differ by additive terms ...)
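Concretely, for a Gaussian linear model the two quantities are related by -2*logLik = n*log(2*pi*RSS/n) + n, where RSS is the deviance. A quick check on the trees fit (a sketch):
n <- nrow(trees)
rss <- deviance(a_lm) # residual sum of squares
all.equal(-2 * as.numeric(logLik(a_lm)), n * log(2 * pi * rss / n) + n) # TRUE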
I have played around with this a bit. I am not certain, but I think the formula for AIC used by the package biglm is:
2 * (n.parameters + obs.added - 1) + deviance(a)
where obs.added is the number of observations in chunk2 plus the number of observations in chunk3:
obs.added <- dim(chunk2)[1] + dim(chunk3)[1]
and n.parameters is the number of estimated coefficients returned by summary(a) + 1 (where the +1 is for the error term), and deviance(a) is the deviance of your model a.
####################################################
data(trees)
ff <- log(Volume)~log(Girth)+log(Height)
n.parm <- 4
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
AIC(a)
summary(a)
deviance(a)
2 * (n.parm + obs.added - 1) + deviance(a)
round(AIC(a), 5) == round(2 * (n.parm + obs.added - 1) + deviance(a), 5)
# [1] TRUE
####################################################
Since I am not 100% certain my answer is correct, you can play around with the code below and see whether you can find a scenario where the proposed formula for AIC does not work. If I find any such scenarios I will attempt to modify the code below and the formula above as necessary.
#########################################################
# Generate some data
n <- 118 # number of observations
B0 <- 2 # intercept
B1 <- -1.5 # slope 1
B2 <- 0.4 # slope 2
B3 <- 2.0 # slope 3
B4 <- -0.8 # slope 4
sigma2 <- 5 # residual variance
x1 <- round(runif(n, -5 , 5), digits = 3) # covariate 1
x2 <- round(runif(n, 10 , 20), digits = 3) # covariate 2
x3 <- round(runif(n, 2 , 8), digits = 3) # covariate 3
x4 <- round(runif(n, 12 , 15), digits = 3) # covariate 4
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2)) # error
y <- B0 + B1 * x1 + B2 * x2 + B3 * x3 + B4 * x4 + eps # dependent variable
my.data <- data.frame(y, x1, x2, x3, x4)
# analyze data with linear regression
model.1 <- lm(my.data$y ~ my.data$x1 + my.data$x2 + my.data$x3 + my.data$x4)
summary(model.1)
AIC(model.1)
n.parms <- length(model.1$coefficients) + 1
my.AIC <- 2 * n.parms - 2 * as.numeric(logLik(model.1))
my.AIC
#########################################################
ff0 <- y ~ 1
ff1 <- y ~ x1
ff2 <- y ~ x1 + x2
ff3 <- y ~ x1 + x2 + x3
ff4 <- y ~ x1 + x2 + x3 + x4
n.parm0 <- 2
n.parm1 <- 3
n.parm2 <- 4
n.parm3 <- 5
n.parm4 <- 6
n.chunks <- 5
chunk1<-my.data[ 1:round(((nrow(my.data)/n.chunks)*1)+0),]
chunk2<-my.data[round(((nrow(my.data)/n.chunks)*1)+1):round(((nrow(my.data)/n.chunks)*2)+0),]
chunk3<-my.data[round(((nrow(my.data)/n.chunks)*2)+1):round(((nrow(my.data)/n.chunks)*3)+0),]
chunk4<-my.data[round(((nrow(my.data)/n.chunks)*3)+1):round(((nrow(my.data)/n.chunks)*4)+0),]
chunk5<-my.data[round(((nrow(my.data)/n.chunks)*4)+1):nrow(my.data),]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1] + dim(chunk4)[1] + dim(chunk5)[1]
# check division of data
foo <- list()
foo[[1]] <- chunk1
foo[[2]] <- chunk2
foo[[3]] <- chunk3
foo[[4]] <- chunk4
foo[[5]] <- chunk5
all.data.foo <- do.call(rbind, foo)
all.equal(my.data, all.data.foo)
####################################################
library(biglm)
####################################################
a0 <- biglm(ff0, chunk1)
a0 <- update(a0, chunk2)
a0 <- update(a0, chunk3)
a0 <- update(a0, chunk4)
a0 <- update(a0, chunk5)
AIC(a0)
summary(a0)
deviance(a0)
print(a0)
2 * (n.parm0 + obs.added - 1) + deviance(a0)
round(AIC(a0), 5) == round(2 * (n.parm0 + obs.added - 1) + deviance(a0), 5)
####################################################
a1 <- biglm(ff1, chunk1)
a1 <- update(a1, chunk2)
a1 <- update(a1, chunk3)
a1 <- update(a1, chunk4)
a1 <- update(a1, chunk5)
AIC(a1)
summary(a1)
deviance(a1)
print(a1)
2 * (n.parm1 + obs.added - 1) + deviance(a1)
round(AIC(a1), 5) == round(2 * (n.parm1 + obs.added - 1) + deviance(a1), 5)
####################################################
a2 <- biglm(ff2, chunk1)
a2 <- update(a2, chunk2)
a2 <- update(a2, chunk3)
a2 <- update(a2, chunk4)
a2 <- update(a2, chunk5)
AIC(a2)
summary(a2)
deviance(a2)
print(a2)
2 * (n.parm2 + obs.added - 1) + deviance(a2)
round(AIC(a2), 5) == round(2 * (n.parm2 + obs.added - 1) + deviance(a2), 5)
####################################################
a3 <- biglm(ff3, chunk1)
a3 <- update(a3, chunk2)
a3 <- update(a3, chunk3)
a3 <- update(a3, chunk4)
a3 <- update(a3, chunk5)
AIC(a3)
summary(a3)
deviance(a3)
print(a3)
2 * (n.parm3 + obs.added - 1) + deviance(a3)
round(AIC(a3), 5) == round(2 * (n.parm3 + obs.added - 1) + deviance(a3), 5)
####################################################
a4 <- biglm(ff4, chunk1)
a4 <- update(a4, chunk2)
a4 <- update(a4, chunk3)
a4 <- update(a4, chunk4)
a4 <- update(a4, chunk5)
AIC(a4)
summary(a4)
deviance(a4)
print(a4)
2 * (n.parm4 + obs.added - 1) + deviance(a4)
round(AIC(a4), 5) == round(2 * (n.parm4 + obs.added - 1) + deviance(a4), 5)
####################################################
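The five blocks above can also be run more compactly as a loop (a sketch of the same checks, reusing the objects defined above):
ffs <- list(ff0, ff1, ff2, ff3, ff4)
n.parms.all <- c(n.parm0, n.parm1, n.parm2, n.parm3, n.parm4)
chunks <- list(chunk2, chunk3, chunk4, chunk5)
for (i in seq_along(ffs)) {
  m <- biglm(ffs[[i]], chunk1)
  for (ch in chunks) m <- update(m, ch)  # feed the remaining chunks
  ok <- round(AIC(m), 5) == round(2 * (n.parms.all[i] + obs.added - 1) + deviance(m), 5)
  cat(deparse(ffs[[i]]), ":", ok, "\n")
}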
EDIT
I suggested that biglm uses the following equation for AIC:
2 * (n.parameters + obs.added - 1) + deviance(a)
Ben Bolker pointed out that the equation biglm uses for AIC is:
deviance(object) + k * (object$n - object$df.resid)
Ben also determined that biglm was not updating the residual df as new chunks were added.
Given that new information, I now see that the two equations are equivalent.
First, restrict the two equations to the following, which is the only place they differ:
(n.parameters + obs.added - 1) # mine
(object$n - object$df.resid) # Ben's
Rearrange mine, accounting for the fact that I add 1 to the number of parameters and then subtract 1:
((n.parameters - 1) + obs.added) = ((4 - 1) + obs.added) = (3 + 21) = 24
Now morph my equation into Ben's:
My 3 is the same as:
(number of observations in chunk1 - object$df.resid) = (10 - 7) = 3
giving:
((number of obs in chunk1 - object$df.resid) + obs.added) = ((10-7) + 21)
or:
(3 + 21) = 24
Re-arrange:
((number of obs in chunk1 + obs.added) - object$df.resid) = ((10 + 21) - 7)
or:
(31 - 7) = 24
And:
((number of observations in chunk1 + obs.added) - object$df.resid)
is the same as:
(total number of observations - object$df.resid)
Which is the same as:
(object$n - object$df.resid) = (31 - 7) = 24
It appears the equation I proposed really is the equation biglm uses for AIC, just expressed in a different form.
Of course, I was only able to realize this because Ben provided both the critical code and the critical explanation of the error.
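A quick numeric check of the equivalence (a sketch; chunk2 and chunk3 are re-created here because the simulation above redefined them):
chunk2 <- trees[11:20,]; chunk3 <- trees[21:31,]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1] # 10 + 11 = 21
n.parameters <- 4                            # 3 coefficients + 1 error term
n.parameters + obs.added - 1                 # 24 (my form)
a$n - a$df.resid                             # 31 - 7 = 24 (Ben's form)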
I am trying to use the mle2 command in the bbmle package. I am looking at p. 2 of "Maximum likelihood estimation and analysis with the bbmle package" by Bolker. Somehow I fail to enter the right start values. Here's the reproducible code:
l.lik.probit <- function(par, ivs, dv){
  Y <- as.matrix(dv)
  X <- as.matrix(ivs)
  K <- ncol(X)
  b <- as.matrix(par[1:K])
  phi <- pnorm(X %*% b)
  sum(Y * log(phi) + (1 - Y) * log(1 - phi))
}
n=200
set.seed(1000)
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- rnorm(n)
latentz<- 1 + 2.0 * x1 + 3.0 * x2 + 5.0 * x3 + 8.0 * x4 + rnorm(n,0,5)
y <- latentz
y[latentz < 1] <- 0
y[latentz >=1] <- 1
x <- cbind(1,x1,x2,x3,x4)
values.start <-c(1,1,1,1,1)
foo2<-mle2(l.lik.probit, start=list(dv=0,ivs=values.start),method="BFGS",optimizer="optim", data=list(Y=y,X=x))
And this is the error I get:
Error in mle2(l.lik.probit, start = list(Y = 0, X = values.start), method = "BFGS", :
some named arguments in 'start' are not arguments to the specified log-likelihood function
Any idea why? Thanks for your help!
You've missed a couple of things, but the most important is that by default mle2 takes a list of parameters; you can make it take a parameter vector instead, but you have to work a little bit harder.
I have tweaked the code slightly in places. (I changed the log-likelihood function to a negative log-likelihood function, without which this would never work!)
l.lik.probit <- function(par, ivs, dv){
  K <- ncol(ivs)
  b <- as.matrix(par[1:K])
  phi <- pnorm(ivs %*% b)
  -sum(dv * log(phi) + (1 - dv) * log(1 - phi))
}
n <- 200
set.seed(1000)
dat <- data.frame(x1=rnorm(n),
x2=rnorm(n),
x3=rnorm(n),
x4=rnorm(n))
beta <- c(1,2,3,5,8)
mm <- model.matrix(~x1+x2+x3+x4,data=dat)
latentz<- rnorm(n,mean=mm%*%beta,sd=5)
y <- latentz
y[latentz < 1] <- 0
y[latentz >=1] <- 1
x <- mm
values.start <- rep(1,5)
Now we do the fit. The main thing is to specify vecpar=TRUE and to use parnames to let mle2 know the names of the elements in the parameter vector ...
library("bbmle")
names(values.start) <- parnames(l.lik.probit) <- paste0("b",0:4)
m1 <- mle2(l.lik.probit, start=values.start,
vecpar=TRUE,
method="BFGS",optimizer="optim",
data=list(dv=y,ivs=x))
As pointed out above, for this particular example you have just re-implemented probit regression (although I understand that you now want to extend this to allow for heteroscedasticity in some way ...)
dat2 <- data.frame(dat,y)
m2 <- glm(y~x1+x2+x3+x4,family=binomial(link="probit"),
data=dat2)
As a final note, I would say that you should check out the parameters argument, which allows you to specify a linear sub-model for any one of the parameters, and the formula interface:
m3 <- mle2(y~dbinom(prob=pnorm(eta),size=1),
parameters=list(eta~x1+x2+x3+x4),
start=list(eta=0),
data=dat2)
PS confint(m1) appears to work fine (giving profile CIs as requested) with this set-up.
ae <- function(x,y) all.equal(unname(coef(x)),unname(coef(y)),tol=5e-5)
ae(m1,m2) && ae(m2,m3)