I have been trying to use biglm to run linear regressions on a large dataset (approximately 60,000,000 rows). I want to use AIC for model selection. However, while playing with biglm on smaller datasets, I discovered that the AIC values returned by biglm are different from those returned by lm. This even applies to the example in the biglm help.
data(trees)
ff<-log(Volume)~log(Girth)+log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
AIC(a)#48.18546
a_lm <- lm(ff, trees)
AIC(a_lm)#-62.71125
Can someone please explain what is happening here? Are the AICs generated with biglm safe to use for comparing biglm models on the same dataset?
tl;dr it looks to me like there is a pretty obvious bug in the AIC method for biglm-class objects (more specifically, in the update method) in the current (0.9-1) version, but the author of the biglm package is a smart, experienced guy and biglm is widely used, so perhaps I'm missing something. Googling for "biglm AIC df.resid" suggests this was discussed as far back as 2009. Update: the package author/maintainer reports via e-mail that this is indeed a bug.
Something funny seems to be going on here. The differences in AIC between models should be the same across modeling frameworks, whatever constants have been used and however parameters are counted, because those constants and parameter-counting conventions should at least be consistent within a given framework.
Original example:
data(trees)
ff <- log(Volume)~log(Girth)+log(Height)
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
a_lm <- lm(ff, trees)
Now fit a reduced model:
ff2 <- log(Volume)~log(Girth)
a2 <- biglm(ff2, chunk1)
a2 <- update(a2, chunk2)
a2 <- update(a2 ,chunk3)
a2_lm <- lm(ff2,trees)
Now compare AIC values:
AIC(a)-AIC(a2)
## [1] 1.80222
AIC(a_lm)-AIC(a2_lm)
## [1] -20.50022
Check that we haven't screwed something up:
all.equal(coef(a),coef(a_lm)) ## TRUE
all.equal(coef(a2),coef(a2_lm)) ## TRUE
Look under the hood:
biglm:::AIC.biglm
## function (object, ..., k = 2)
## deviance(object) + k * (object$n - object$df.resid)
In principle this is the right formula (number of observations minus residual df should be the number of parameters fitted), but digging in, it looks like the $df.resid component of the objects hasn't been updated properly:
a$n ## 31, correct
a$df.resid ## 7, only valid before updating!
Looking at biglm:::update.biglm, I would add
object$df.resid <- object$df.resid + NROW(mm)
right before or after the line that reads
object$n <- object$n + NROW(mm)
...
This seems like a fairly obvious bug to me, so perhaps I'm missing something obvious, or perhaps it has been fixed.
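In the meantime, you can also patch an already-fitted object by hand (a sketch, assuming the 0.9-1 behaviour described above; the residual df should equal n minus the number of fitted coefficients):
a_fixed <- a
a_fixed$df.resid <- a_fixed$n - length(coef(a_fixed))  ## 31 - 3 = 28
AIC(a_fixed)  ## the stock AIC method now counts 3 parameters instead of 24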
A simple workaround would be to define your own AIC function as
AIC.biglm <- function (object, ..., k = 2) {
    deviance(object) + k * length(coef(object))
}
AIC(a)-AIC(a2) ## matches results from lm()
(although note that AIC(a_lm) is still not equal to AIC(a), because stats:::AIC.default() uses -2*log-likelihood rather than the deviance; the two measures differ by additive constants ...)
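For completeness, here is a sketch of how the two scales relate for a Gaussian lm fit; it just spells out the constants that stats:::AIC.default() keeps and the deviance-based formula drops:
n   <- nrow(trees)
rss <- deviance(a_lm)
ll  <- -n/2 * (log(2 * pi) + log(rss/n) + 1)  ## same value as logLik(a_lm)
-2 * ll + 2 * (length(coef(a_lm)) + 1)        ## same value as AIC(a_lm); the +1 counts sigma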
I have played around with this a bit. I am not certain, but I think the formula for AIC used by the package biglm is:
2 * (n.parameters + obs.added - 1) + deviance(a)
where obs.added is the number of observations in chunk2 plus the number of observations in chunk3:
obs.added <- dim(chunk2)[1] + dim(chunk3)[1]
and n.parameters is the number of estimated coefficients returned by summary(a) plus 1 (where the +1 is for the error term), and deviance(a) is the deviance of your model a.
####################################################
data(trees)
ff <- log(Volume)~log(Girth)+log(Height)
n.parm <- 4
chunk1<-trees[1:10,]
chunk2<-trees[11:20,]
chunk3<-trees[21:31,]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1]
library(biglm)
a <- biglm(ff,chunk1)
a <- update(a,chunk2)
a <- update(a,chunk3)
AIC(a)
summary(a)
deviance(a)
2 * (n.parm + obs.added - 1) + deviance(a)
round(AIC(a), 5) == round(2 * (n.parm + obs.added - 1) + deviance(a), 5)
# [1] TRUE
####################################################
Since I am not 100% certain my answer is correct, you can play around with the code below and see whether you can find a scenario where the proposed formula for AIC does not work. If I find any such scenarios I will attempt to modify the code below and the formula above as necessary.
#########################################################
# Generate some data
n <- 118 # number of observations
B0 <- 2 # intercept
B1 <- -1.5 # slope 1
B2 <- 0.4 # slope 2
B3 <- 2.0 # slope 3
B4 <- -0.8 # slope 4
sigma2 <- 5 # residual variance
x1 <- round(runif(n, -5 , 5), digits = 3) # covariate 1
x2 <- round(runif(n, 10 , 20), digits = 3) # covariate 2
x3 <- round(runif(n, 2 , 8), digits = 3) # covariate 3
x4 <- round(runif(n, 12 , 15), digits = 3) # covariate 4
eps <- rnorm(n, mean = 0, sd = sqrt(sigma2)) # error
y <- B0 + B1 * x1 + B2 * x2 + B3 * x3 + B4 * x4 + eps # dependent variable
my.data <- data.frame(y, x1, x2, x3, x4)
# analyze data with linear regression
model.1 <- lm(y ~ x1 + x2 + x3 + x4, data = my.data)
summary(model.1)
AIC(model.1)
n.parms <- length(model.1$coefficients) + 1
my.AIC <- 2 * n.parms - 2 * as.numeric(logLik(model.1))
my.AIC
#########################################################
ff0 <- y ~ 1
ff1 <- y ~ x1
ff2 <- y ~ x1 + x2
ff3 <- y ~ x1 + x2 + x3
ff4 <- y ~ x1 + x2 + x3 + x4
n.parm0 <- 2
n.parm1 <- 3
n.parm2 <- 4
n.parm3 <- 5
n.parm4 <- 6
n.chunks <- 5
chunk1<-my.data[ 1:round(((nrow(my.data)/n.chunks)*1)+0),]
chunk2<-my.data[round(((nrow(my.data)/n.chunks)*1)+1):round(((nrow(my.data)/n.chunks)*2)+0),]
chunk3<-my.data[round(((nrow(my.data)/n.chunks)*2)+1):round(((nrow(my.data)/n.chunks)*3)+0),]
chunk4<-my.data[round(((nrow(my.data)/n.chunks)*3)+1):round(((nrow(my.data)/n.chunks)*4)+0),]
chunk5<-my.data[round(((nrow(my.data)/n.chunks)*4)+1):nrow(my.data),]
obs.added <- dim(chunk2)[1] + dim(chunk3)[1] + dim(chunk4)[1] + dim(chunk5)[1]
# check division of data
foo <- list()
foo[[1]] <- chunk1
foo[[2]] <- chunk2
foo[[3]] <- chunk3
foo[[4]] <- chunk4
foo[[5]] <- chunk5
all.data.foo <- do.call(rbind, foo)
all.equal(my.data, all.data.foo)
####################################################
library(biglm)
####################################################
a0 <- biglm(ff0, chunk1)
a0 <- update(a0, chunk2)
a0 <- update(a0, chunk3)
a0 <- update(a0, chunk4)
a0 <- update(a0, chunk5)
AIC(a0)
summary(a0)
deviance(a0)
print(a0)
2 * (n.parm0 + obs.added - 1) + deviance(a0)
round(AIC(a0), 5) == round(2 * (n.parm0 + obs.added - 1) + deviance(a0), 5)
####################################################
a1 <- biglm(ff1, chunk1)
a1 <- update(a1, chunk2)
a1 <- update(a1, chunk3)
a1 <- update(a1, chunk4)
a1 <- update(a1, chunk5)
AIC(a1)
summary(a1)
deviance(a1)
print(a1)
2 * (n.parm1 + obs.added - 1) + deviance(a1)
round(AIC(a1), 5) == round(2 * (n.parm1 + obs.added - 1) + deviance(a1), 5)
####################################################
a2 <- biglm(ff2, chunk1)
a2 <- update(a2, chunk2)
a2 <- update(a2, chunk3)
a2 <- update(a2, chunk4)
a2 <- update(a2, chunk5)
AIC(a2)
summary(a2)
deviance(a2)
print(a2)
2 * (n.parm2 + obs.added - 1) + deviance(a2)
round(AIC(a2), 5) == round(2 * (n.parm2 + obs.added - 1) + deviance(a2), 5)
####################################################
a3 <- biglm(ff3, chunk1)
a3 <- update(a3, chunk2)
a3 <- update(a3, chunk3)
a3 <- update(a3, chunk4)
a3 <- update(a3, chunk5)
AIC(a3)
summary(a3)
deviance(a3)
print(a3)
2 * (n.parm3 + obs.added - 1) + deviance(a3)
round(AIC(a3), 5) == round(2 * (n.parm3 + obs.added - 1) + deviance(a3), 5)
####################################################
a4 <- biglm(ff4, chunk1)
a4 <- update(a4, chunk2)
a4 <- update(a4, chunk3)
a4 <- update(a4, chunk4)
a4 <- update(a4, chunk5)
AIC(a4)
summary(a4)
deviance(a4)
print(a4)
2 * (n.parm4 + obs.added - 1) + deviance(a4)
round(AIC(a4), 5) == round(2 * (n.parm4 + obs.added - 1) + deviance(a4), 5)
####################################################
EDIT
I suggested biglm uses the following equation for AIC:
2 * (n.parameters + obs.added - 1) + deviance(a)
Ben Bolker pointed out that the equation biglm uses for AIC is:
deviance(object) + k * (object$n - object$df.resid)
Ben also determined that biglm was not updating the residual degrees of freedom as new chunks were added.
Given that new information, I now see that the two equations are equivalent.
First, restrict the two equations to the following, which is the only place they differ:
(n.parameters + obs.added - 1) # mine
(object$n - object$df.resid) # Ben's
Re-arrange mine to account for me adding 1 to the number of parameters and then subtracting one:
((n.parameters-1) + obs.added) = ((4-1) + obs.added) = (3 + 21) = 24
Now morph my equation into Ben's:
My 3 is the same as:
(number of observations in chunk1 - object$df.resid) = (10 - 7) = 3
giving:
((number of obs in chunk1 - object$df.resid) + obs.added) = ((10-7) + 21)
or:
(3 + 21) = 24
Re-arrange:
((number of obs in chunk1 + obs.added) - object$df.resid) = ((10 + 21) - 7)
or:
(31 - 7) = 24
And:
((number of observations in chunk1 + obs.added) - object$df.resid)
is the same as:
(total number of observations - object$df.resid)
Which is the same as:
(object$n - object$df.resid) = (31 - 7) = 24
It appears the equation I proposed really is the equation biglm uses for AIC, just expressed in a different form.
Of course, I was only able to realize this because Ben provided both the critical code and the critical explanation of the error.
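A quick numeric check of that equivalence, using the trees fit from the start of this answer:
n.parameters <- length(coef(a)) + 1   ## 3 coefficients + 1 for the error term = 4
obs.added    <- 10 + 11               ## rows in chunk2 (trees[11:20,]) and chunk3 (trees[21:31,])
n.parameters + obs.added - 1          ## 24
a$n - a$df.resid                      ## 31 - 7 = 24, with the un-updated df.resid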
Related
I want to visualize some assumptions on regression theory. Starting point is this population and linear regression:
set.seed(1234)
runifdisc <- function(n, min = 0, max = 1) sample(min:max, n, replace = T)
x1 <- runifdisc(100000, 1, 10)
e <- runifdisc(100000, 0, 20)
y <- 4 + 2 * x1 + e
dat_pop <- data.frame(x1, e, y)
m_pop <- lm(y ~ x1, data = dat_pop)
Since x1 has 10 distinct values, m_pop leads to 10 different predictions for y given x1.
> table(round(predict(m_pop)))
16 18 20 22 24 26 28 30 32 34
10096 10081 9864 10078 9927 9914 9915 10124 10018 9983
If I am not mistaken, across a large number of samples the predictions of y given x1 should have the same variance for each specific value of x1 (an assumption that also applies to the residuals). However, in my code, the variance varies across the different values of x1:
n <- 1000
vec_y1 <- rep(0, n)
vec_y2 <- rep(0, n)
vec_y3 <- rep(0, n)
vec_y4 <- rep(0, n)
vec_y5 <- rep(0, n)
vec_y6 <- rep(0, n)
vec_y7 <- rep(0, n)
vec_y8 <- rep(0, n)
vec_y9 <- rep(0, n)
vec_y10 <- rep(0, n)
#Draw 1000 samples from dat_pop; in each sample, save the prediction of y given x1
#in vectors vec_y1-vec_y10.
for (i in 1:n){
s <- dat_pop[sample(nrow(dat_pop), 1000), ]
m <- lm(y ~ x1, data = s)
#Prediction for y given x1 == 1
vec_y1[i] <- m$coefficients[1] + 1 * m$coefficients[2]
#Prediction for y given x1 == 2
vec_y2[i] <- m$coefficients[1] + 2 * m$coefficients[2]
#Prediction for y given x1 == 3
vec_y3[i] <- m$coefficients[1] + 3 * m$coefficients[2]
#Prediction for y given x1 == ...
vec_y4[i] <- m$coefficients[1] + 4 * m$coefficients[2]
vec_y5[i] <- m$coefficients[1] + 5 * m$coefficients[2]
vec_y6[i] <- m$coefficients[1] + 6 * m$coefficients[2]
vec_y7[i] <- m$coefficients[1] + 7 * m$coefficients[2]
vec_y8[i] <- m$coefficients[1] + 8 * m$coefficients[2]
vec_y9[i] <- m$coefficients[1] + 9 * m$coefficients[2]
vec_y10[i] <- m$coefficients[1] + 10 * m$coefficients[2]
}
#Variance of different predictions for y given x1 in the samples above.
#This variance should be equal for all vectors vec_y1-vec_y10.
var(vec_y1)
var(vec_y3)
var(vec_y5)
var(vec_y8)
var(vec_y10)
The variance is larger for the lower and upper values of x1.
> var(vec_y1)
[1] 0.1234933
> var(vec_y3)
[1] 0.06295427
> var(vec_y5)
[1] 0.03637214
> var(vec_y8)
[1] 0.06016804
> var(vec_y10)
[1] 0.118478
On the one hand, my question is about my understanding of this assumption from regression theory; perhaps there is a misunderstanding on my side. On the other hand, it is about how to write code that produces the same variance for all predictions of y given x1.
I think this will help to solve it. It took me a while to see it...
The variance is smallest around 5 because that is roughly the mean of x1 (the x1 values span from 1 to 10). If you changed x1 to, e.g., 1:20, then the variance would be smallest around 10.
You train the model 1000 times in the for loop. Every time, the estimated slope changes a bit, and the fitted values change most at the ends (1 and 10). That is the reason behind it: the regression line always goes through the centre of the data, (mean(x1), mean(y)).
Below are two test samples (two iterations from the loop).
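A small sketch of the textbook formula behind this may help: for a fitted simple regression, the variance of the predicted mean response at x0 is sigma^2 * (1/n + (x0 - mean(x))^2 / sum((x - mean(x))^2)), which is smallest at the mean of x and grows towards the ends. Reusing dat_pop from the question:
s <- dat_pop[sample(nrow(dat_pop), 1000), ]
m <- lm(y ~ x1, data = s)
x0 <- 1:10
sigma2.hat <- summary(m)$sigma^2
v.hat <- sigma2.hat * (1/nrow(s) + (x0 - mean(s$x1))^2 / sum((s$x1 - mean(s$x1))^2))
round(v.hat, 4)  ## U-shaped in x0, mirroring var(vec_y1) ... var(vec_y10)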
I have been working on this algorithm all week, and I cannot seem to find the problem.
I employ Stata's NLSUR command for a simple QUAIDS maximization. Stata requires me to write a program evaluator in which I parametrize my system of equations and impose the parametric restrictions. Those with experience in such implementations will find the code below familiar. Stata's NLSUR then uses MLE to find the parameters. That part is not a problem, so I use those results (which are correct) to check my log-likelihood optimization in R.
R is a bit trickier, because it requires me to write a script similar to the one in Stata, but I additionally need to specify the log-likelihood, and hence the variance-covariance matrix, in a function. This function is shown below. I have tried the BBoptim() solver and the results for the parameters do not match; only those in the variance-covariance matrix do. I also wrote my own gradient-descent algorithm and checked it on other toy examples. Then I checked it with this function. My gradient-descent algorithm works with the toy examples, but not with my function. I believe the problem is my function, which is similar to the one employed in Stata's NLSUR.
I should also point out that when I plug Stata's estimated parameters into my own function with the same data, I get the same log-likelihood value. I also predicted values with Stata and with my R function and obtained the same values. I have also tested whether the parameter restrictions have been imposed, and it seems they have.
Could you help me figure out what I am doing wrong? The reason I need to write this specification myself is that I will be incorporating truncations into the log-likelihood, but I started step by step by first trying to solve the simpler problem without the truncations, using only the log-likelihood.
library(mvtnorm)  # provides dmvnorm(), used in the log-likelihood below

quaids.loglike <- function(param, s1, s2, s3, lnp1, lnp2, lnp3, lnp4, lnw){
# alphas
a1 <- param[1]
a2 <- param[2]
a3 <- param[3]
a4 <- (1 - a1 - a2 - a3)
# betas
b1 <- param[4]
b2 <- param[5]
b3 <- param[6]
b4 <- (-b1 - b2 - b3)
# gammas
g11 <- param[7]
g12 <- param[8]
g13 <- param[9]
g14 <- (-g11 - g12 - g13)
g21 <- g12
g22 <- param[10]
g23 <- param[11]
g24 <- (-g21 - g22 - g23)
g31 <- g13
g32 <- g23
g33 <- param[12]
g34 <- (-g31 - g32 - g33)
g41 <- g14
g42 <- g24
g43 <- g34
g44 <- (-g41 - g42 - g43)
# lambdas
l1 <- param[13]
l2 <- param[14]
l3 <- param[15]
# Sigmas
sig11 <- param[16]
sig12 <- param[17]
sig13 <- param[19]
sig21 <- sig12
sig22 <- param[18]
sig23 <- param[20]
sig31 <- sig13
sig32 <- sig23
sig33 <- param[21]
# Lnpindex (Ln a(p) where a0 = 5)
lnpindex <- 5 + a1*lnp1 + a2*lnp2 + a3*lnp3 + a4*lnp4
lnpindex <- lnpindex + 0.5*(g11*lnp1*lnp1 + g12*lnp1*lnp2 + g13*lnp1*lnp3 + g14*lnp1*lnp4)
lnpindex <- lnpindex + 0.5*(g21*lnp2*lnp1 + g22*lnp2*lnp2 + g23*lnp2*lnp3 + g24*lnp2*lnp4)
lnpindex <- lnpindex + 0.5*(g31*lnp3*lnp1 + g32*lnp3*lnp2 + g33*lnp3*lnp3 + g34*lnp3*lnp4)
lnpindex <- lnpindex + 0.5*(g41*lnp4*lnp1 + g42*lnp4*lnp2 + g43*lnp4*lnp3 + g44*lnp4*lnp4)
#for(i in as.character(1:4)) {
# for(j in as.character(1:4)) {
# gij <- get(paste0("g", i, j))
# lnpi <- get(paste0("lnp", i))
# lnpj <- get(paste0("lnp", j))
# lnpindex <- lnpindex + 0.5*gij*lnpi*lnpj
# }
#}
# b(p) price index
bofp <- lnp1*b1 + lnp2*b2 + lnp3*b3 + lnp4*b4
#for(i in as.character(1:4)) {
# lnpi <- get(paste0("lnp", i))
# bi <- get(paste0("b", i))
# bofp <- bofp + lnpi*bi
#}
bofp <- exp(bofp)
# The parametric shares
u1 <- a1 + g11*lnp1 + g12*lnp2 + g13*lnp3 + g14*lnp4 + b1*(lnw - lnpindex) + (l1/bofp)*(lnw - lnpindex)^2
u2 <- a2 + g21*lnp1 + g22*lnp2 + g23*lnp3 + g24*lnp4 + b2*(lnw - lnpindex) + (l2/bofp)*(lnw - lnpindex)^2
u3 <- a3 + g31*lnp1 + g32*lnp2 + g33*lnp3 + g34*lnp4 + b3*(lnw - lnpindex) + (l3/bofp)*(lnw - lnpindex)^2
U <- c(mean(u1, na.rm = TRUE), mean(u2, na.rm = TRUE), mean(u3, na.rm = TRUE))
# The vcov matrix:
sigma <- c(sig11, sig12, sig13, sig21, sig22, sig23, sig31, sig32, sig33)
sigma <- matrix(sigma, 3, 3)
# The shares
S <- cbind(s1, s2, s3)
# the individual log-likelihood
ll <- dmvnorm(S, U, sigma = sigma, log = TRUE)
return(sum(ll))
}
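For reference, a hypothetical call to a general-purpose optimizer might look like the sketch below; the 21-element starting vector and the data vectors (s1..s3, lnp1..lnp4, lnw) are placeholders, not taken from the question:
start <- c(rep(0.25, 3),             # a1..a3
           rep(0, 3),                # b1..b3
           rep(0, 6),                # g11, g12, g13, g22, g23, g33
           rep(0, 3),                # l1..l3
           0.1, 0, 0.1, 0, 0, 0.1)   # sig11, sig12, sig22, sig13, sig23, sig33
opt <- optim(start, quaids.loglike,
             s1 = s1, s2 = s2, s3 = s3,
             lnp1 = lnp1, lnp2 = lnp2, lnp3 = lnp3, lnp4 = lnp4, lnw = lnw,
             method = "BFGS", control = list(fnscale = -1, maxit = 1000))  # fnscale = -1 maximizes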
I'd like to assign a numerical value to a character element of a list, but to an unquoted version of the element so that I can use it in a model formula.
Suppose I have a fully specified model formula that I'll use in, say, the nls function:
m.form <- y ~ b0 + b1*x1 + b2*x2
(I know my example is linear, but that doesn't matter for this). I also have a list of the parameter names and some starting values for each parameter:
params <- c("b0","b1","b2")
startvals <- list(b0=1, b1=1, b2=-1)
I then want to assign a value to one of the parameters in params so I can estimate a restricted version of the model, let's say forcing b1 == 0. Of course, I want to do this by referring to the parameter in the vector params (because I'm going to loop over a model with more variables and parameters, estimating the model with a given restriction in each loop iteration).
So I want to do something like this:
params[2] <- 0
summary(nls(m.form, data, startvals[-2]))
where I'm trying to replace the parameter name in the formula with a numerical 0 and then delete the starting value for that parameter from startvals, since it no longer appears in the model (very likely not the best way to do this!). The above doesn't work, but if instead of the "params[2] <- 0" line I use "b1 <- 0", it does work as intended. However, I'll be looping through all the parameters in the model, so I don't want to write out the actual parameter name each time. Thanks.
Edit 1
So to be clearer, I need to be able to impose the restriction by referring to the element of the params vector because I'm ultimately going to loop through, each time estimating the model with a different restriction. So, e.g. maybe in the first loop iteration I impose params[2]=0, but in the next, maybe it's params[3]=0.5.
1) It can be done without rewriting the formula by defining the value and removing it from startvals. No packages are used.
set.seed(123)
DF <- data.frame(y = rnorm(25), x1 = rnorm(25), x2 = rnorm(25))
m.form <- y ~ b0 + b1*x1 + b2*x2
startvals <- list(b0=1, b1=1, b2=-1)
b1 <- 0
nls(m.form, DF, start = startvals[-2])
giving:
Nonlinear regression model
model: y ~ b0 + b1 * x1 + b2 * x2
data: DF
b0 b2
-0.03457 0.12139
residual sum-of-squares: 21.18
Number of iterations to convergence: 1
Achieved convergence tolerance: 3.722e-09
2) or if you want to substitute b1 = 0 into the formula anyways then
m.form0 <- do.call("substitute", list(m.form, list(b1 = 0)))
nls(m.form0, DF, start = startvals[-2])
giving:
Nonlinear regression model
model: y ~ b0 + 0 * x1 + b2 * x2
data: DF
b0 b2
-0.03457 0.12139
residual sum-of-squares: 21.18
Number of iterations to convergence: 1
Achieved convergence tolerance: 3.722e-09
Added
If you want to specify these in terms of ix, a non-empty vector of parameter index numbers, and vals, an equal-length vector of constraint values, then:
set.seed(123)
DF <- data.frame(y = rnorm(25), x1 = rnorm(25), x2 = rnorm(25))
m.form <- y ~ b0 + b1*x1 + b2*x2
params <- c("b0", "b1", "b2")
startvals <- list(b0 = 1, b1 = 1, b2 = -1)
ix <- 2
vals <- 0
L <- setNames(list(vals), params[ix])
# 1
list2env(L, environment(m.form)) # add constraints to formula's envir
nls(m.form, DF, start = startvals[-ix])
## Nonlinear regression model
## model: y ~ b0 + b1 * x1 + b2 * x2
## ...snip...
# 2
m.form0 <- do.call("substitute", list(m.form, L))
nls(m.form0, DF, start = startvals[-ix])
## Nonlinear regression model
## model: y ~ b0 + 0 * x1 + b2 * x2
## ...snip...
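And a sketch of the loop described in the question, built on approach (2); the ix/val pairs below are made-up examples:
restr <- list(list(ix = 2, val = 0), list(ix = 3, val = 0.5))
fits <- lapply(restr, function(r) {
  L <- setNames(list(r$val), params[r$ix])
  f <- do.call("substitute", list(m.form, L))  # e.g. y ~ b0 + 0 * x1 + b2 * x2
  nls(f, DF, start = startvals[-r$ix])
})
lapply(fits, coef)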
You could write a function that does the replacement:
m.form <- y ~ b0 + b1*x1 + b2*x2
restrict <- function(form, restrictions){
restrictions <- setNames(as.character(restrictions), names(restrictions))
form <- stringr::str_replace_all(deparse(form), restrictions)
as.formula(form)
}
params <- c("b0","b1","b2")
startvals <- list(b0=1, b1=1, b2=-1)
summary(nls(restrict(m.form, c(b1 = 0)),data,startvals[-2]))
You could restrict more than one parameter:
summary(nls(restrict(m.form, c(b1 = 0, b0 = 1)),data,startvals[3]))
I am trying to set up a simple OLS model with constraints on the coefficients in R. The code below works; however, it implements
y = c + a1*x1 + a2*x2 + a3*x3 with the constraint a1 + a2 = 1
I would like to revise this constraint to:
a1*a2 - a3 = 0
Thanks for your help!
WORKING CODE:
set.seed(1000)
n <- 20
x1 <- seq(100,length.out=n)+rnorm(n,0,2)
x2 <- seq(50,length.out=n)+rnorm(n,0,2)
x3 <- seq(10,length.out=n)+rnorm(n,0,2)
constant <- 100
ymat <- constant + .5*x1 + .5*x2 + .75*x3 + rnorm(n,0,4)
xmat <- cbind(x1,x2,x3)
X <- cbind(rep(1,n),xmat) # explicitly include vector for constant
bh <- solve(t(X)%*%X)%*%t(X)%*%ymat
XX <- solve(t(X)%*%X)
cmat <- matrix(1,1,1)
Q <- matrix(c(0,1,1,0),ncol(X),1) # a1+a2=1 for y = c + a1x1 + a2x2 + a3x3
bc <- bh-XX%*%Q%*%solve(t(Q)%*%XX%*%Q)%*%(t(Q)%*%bh-cmat)
library(quadprog)
d <- t(ymat) %*% X
Rinv = solve(chol(t(X)%*%X))
qp <- solve.QP(Dmat=Rinv, dvec=d, Amat=Q, bvec=cmat, meq=1, factorized=TRUE)
qp
cbind(bh,qp$unconstrained.solution)
cbind(bc,qp$solution)
Assuming the problem is to minimize || ymat - X b ||^2 subject to b[2] * b[3] == b[4], we can substitute for b[4], giving the unconstrained nls problem shown below. The b below contains the first 3 elements of the original b, and b[4] is obtained by multiplying the last two elements of b together. No packages are used.
fm <- nls(ymat ~ X %*% c(b, b[2] * b[3]), start = list(b = 0:2))
fm
giving:
Nonlinear regression model
model: ymat ~ X %*% c(b, b[2] * b[3])
data: parent.frame()
b1 b2 b3
76.9718 0.6275 0.7598
residual sum-of-squares: 204
Number of iterations to convergence: 4
Achieved convergence tolerance: 6.555e-06
To compute b4
prod(coef(fm)[-1])
## [1] 0.476805
Note
In a similar way the original problem (to minimize the same objective but with the original constraint) can be reduced to an unconstrained problem and solved using nls via substitution:
nls(ymat ~ X %*% c(b[1], b[2], 1-b[2], b[3]), start = list(b = 0:2))
giving:
Nonlinear regression model
model: ymat ~ X %*% c(b[1], b[2], 1 - b[2], b[3])
data: parent.frame()
b1 b2 b3
105.3186 0.3931 0.7964
residual sum-of-squares: 222.3
Number of iterations to convergence: 1
Achieved convergence tolerance: 4.838e-08
It would even be possible to reparameterize to make this original problem solvable by lm
lm(ymat ~ I(X[, 2] - X[, 3]) + X[, 4] + offset(X[, 3]))
giving
Call:
lm(formula = ymat ~ I(X[, 2] - X[, 3]) + X[, 4] + offset(X[, 3]))
Coefficients:
(Intercept) I(X[, 2] - X[, 3]) X[, 4]
105.3186 0.3931 0.7964
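To recover all four coefficients of the original parameterization from that lm fit (a small sketch using the objects above):
fm.lm <- lm(ymat ~ I(X[, 2] - X[, 3]) + X[, 4] + offset(X[, 3]))
b <- coef(fm.lm)
c(constant = b[[1]], a1 = b[[2]], a2 = 1 - b[[2]], a3 = b[[3]])  ## a1 + a2 = 1 by construction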
G. grothendieck - thank you for your response. Unfortunately this didn't work for me.
I decided to work out the Lagrangian longhand, which turned out too complicated for me to solve.
Then I realized:
a1*a2 - a3 = 0
a1*a2 = a3
ln(a1*a2) = ln(a3)
ln(a1) + ln(a2) - ln(a3) = 0
This leaves me with an additive constraint which I can solve with the quadprog package.
Maybe you can try the code below, using fmincon()
library(pracma)
library(NlcOptim)
# define objective function
fn <- function(v) norm(ymat - as.vector(xmat %*% v), "2")
# the constraint a1*a2 - a3 = 0
heq1 = function(v) prod(v[1:2])-v[3]
# solve a1, a2 and a3
res <- fmincon(0:2,fn,heq = heq1)
such that
> res$par
[1] 1.9043754 -0.1781830 -0.3393272
Link to dataset
Defined parameters:
M <- maximum.oxygen.uptake
m <- mass
a <- age
s <- sex
v <- as.numeric(vigorous.exercise>0)
sv <- s*v
asv <- a*s*v
as <- a*s
av <- a*v
lnm <- log(m)
lnms <- log(m)*s
lnmv <- log(m)*v
lnmsv <- log(m)*s*v
y <- M/m^(2/3)
I fit an nls model successfully using:
nls.full <- nls(M ~ (m ^ (alpha0 + alpha1 * s + alpha2 * v + alpha3 * s * v)) *
(beta0 + beta1 * s + beta2 * v + beta3 * sv +
a * gamma0 + gamma1 * as + gamma2 * av + gamma3 * asv),
trace=TRUE,
start=list(alpha0=2/3, alpha1=0, alpha2=0, alpha3=0,
beta0=est[1], beta1=est[2], beta2=est[3],beta3=est[4],
gamma0=est[5],gamma1=est[6],gamma2=est[7],gamma3=est[8]))
PROBLEM: CAN'T PLOT PREDICTION
xpredict <- seq(10,120,length.out=300)
data1 <- data.frame(a=35,s=0,v=1,m=seq(10,120,length.out=300))
ypredict <- predict(nls.full, newdata=data1, type="response")
plot(log(maximum.oxygen.uptake) ~ log(mass), subset = (s=='0' & v=='1'))
lines(xpredict,ypredict)
This fails with the error: lengths of y and x differ.
I don't see why it should. I defined a new data frame with 300 rows, so I should only get 300 predicted values.
Your question adds an important case study on the use of predict, which is currently missing on this site (as far as I know), hence I did not close it as a duplicate as I would usually do.
This simple example is sufficient to illustrate what your problem is:
set.seed(0)
x <- runif(50)
y <- runif(50)
## true model
z <- exp(4 * x - x * y) + sin(0.5 * y) + rnorm(50)
We can fit a non-linear regression model by:
fit1 <- nls(z ~ exp(a * x + b * x * y) + sin(c * y),
start = list(a = 3, b = 0, c = 1))
or
xy <- x * y
fit2 <- nls(z ~ exp(a * x + b * xy) + sin(c * y),
start = list(a = 3, b = 0, c = 1))
However, be careful when making predictions with predict.
newdat <- data.frame(x = runif(2), y = runif(2))
pred1 <- predict(fit1, newdat)
# [1] 19.476569 2.870397
pred2 <- predict(fit2, newdat)
#[1] 12.205215 2.900922 16.675160 2.588310 18.466907 3.221744 21.207958
#[8] 2.478375 16.294230 2.230084 22.675165 2.741694 22.053141 2.441442
#[15] 20.378554 2.069649 20.362845 2.380586 10.570350 3.168567 11.477691
#[22] 2.438041 19.336928 2.648129 22.282448 2.899636 16.264152 3.229857
#[29] 19.928498 1.779721 16.563424 2.688125 14.925190 2.718176 21.853093
#[36] 1.856641 20.213350 1.957830 22.960452 2.767944 21.890656 2.719899
#[43] 22.370200 2.066384 14.061771 2.237771 12.102094 3.232742 18.985547
#[50] 1.909355
predict.nls does not issue any warning the way predict.lm and predict.glm do (see Getting Warning: “ 'newdata' had 1 row but variables found have 32 rows” on predict.lm in R). Basically, you have to provide all the variables used in your fitting formula in newdata. Be aware that xy is also a variable:
newdat$xy <- with(newdat, x * y)
pred2 <- predict(fit2, newdat)
# [1] 19.476569 2.870397
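Applying this to the original question: the newdata passed to predict(nls.full, ...) must also contain every derived variable that appears in the fitting formula (sv, as, av, asv), not just a, s, v and m. A sketch, assuming the fit and data from the question are available:
data1 <- data.frame(a = 35, s = 0, v = 1, m = seq(10, 120, length.out = 300))
data1 <- transform(data1, sv = s * v, as = a * s, av = a * v, asv = a * s * v)
ypredict <- predict(nls.full, newdata = data1)
length(ypredict)  ## 300, one prediction per row of data1
lines(log(data1$m), log(ypredict))  ## overlays on the log-log plot from the question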