Regression with equality and inequality constrained coefficients in R

I am trying to estimate constrained coefficients by minimizing the residual sum of squares (RSS). The beta coefficients are constrained to [0,1] and must sum to 1. Additionally, my third parameter is constrained to (-1,1). With the code below I obtain a good solution on simulated data, but on my real data set I keep arriving at a non-unique solution. Is there a more numerically stable way to estimate these parameters?
set.seed(234)
k = 2
a = diff(c(0, sort(runif(k-1)), 1))
n = 1e4
x = matrix(rnorm(k*n), nc = k)
a2 = -0.5
y = a2 * (x %*% a) + rnorm(n)
f = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
g = function(v){
v1 = v[1]
v2 = v[2]
u = vector(mode = "double", length = 3)
# ensure in (0,1)
v1 = 1 / (1 + exp(-v1))
# ensure add up to 1
u[1:2] = c(v1, 1 - sum(v1))
# ensure between [-1,1]
u[3] = (v2^2 - 1) / (v2^2 + 1)
u
}
res = optim(rnorm(2), function(v) f(g(v)), hessian = TRUE, method = "BFGS")
eigen(res$hessian)$values
res$convergence
rbind(Est = res$par, SE = sqrt(diag(solve(res$hessian))))
rbind(g(res$par),c(a,a2))
Hats off to http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html

Since there has been no direct answer to your question so far, I'd like to show a way how to implement a parameter-constrained model in Stan/RStan. You should give this a try using your real data.
Doing Bayesian inference has the advantage of giving you posterior distributions for your (constrained) model parameters. Point estimates and credible intervals can then be easily calculated.
First off, we load the library and set RStan to store the compiled model and use multiple cores (if available).
library(rstan);
rstan_options(auto_write = TRUE);
options(mc.cores = parallel::detectCores());
We now define our Stan model. In this case, it's very simple, and we can make use of Stan's simplex type for vectors of non-negative values that sum to one.
model <- "
data {
int<lower=1> n; // number of observations
int<lower=0> k; // number of parameters
matrix[n, k] X; // data
vector[n] y; // response
}
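parameters {
real a2; // a2 is a free scaling parameter; declare it as real<lower=-1, upper=1> to enforce the (-1,1) constraint
simplex[k] a; // a is constrained to be non-negative and sum to 1
real<lower=0> sigma; // residual standard deviation (must be positive)
}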
model {
// Likelihood
y ~ normal(a2 * (X * a), sigma);
}"
Stan supports various constrained data types; I'd recommend taking a look at the Stan manual for more complex examples.
Using the sample data from your original question, we can run our model:
# Sample data
set.seed(234);
k = 2;
a = diff(c(0, sort(runif(k-1)), 1));
n = 1e4;
x = matrix(rnorm(k * n), nc = k);
a2 = -0.5;
y = a2 * (x %*% a) + rnorm(n);
# Fit stan model
fit <- stan(
model_code = model,
data = list(
n = n,
k = k,
X = x,
y = as.numeric(y)),
iter = 4000,
chains = 4);
Running the model will only take a few seconds (after Stan has translated the model to C++ and compiled it), and the full results (posterior distributions for all parameters conditional on the data) are stored in fit.
We can inspect the contents of fit using summary:
# Extract parameter estimates
pars <- summary(fit)$summary;
pars;
# mean se_mean sd 2.5% 25%
#a2 -0.4915289 1.970327e-04 0.014363398 -0.5194985 -0.5011471
#a[1] 0.7640606 2.273282e-04 0.016348488 0.7327691 0.7527457
#a[2] 0.2359394 2.273282e-04 0.016348488 0.2040952 0.2248482
#sigma 1.0048695 8.746869e-05 0.007048116 0.9909698 1.0001889
#lp__ -5048.4273105 1.881305e-02 1.204892294 -5051.4871931 -5048.9800451
# 50% 75% 97.5% n_eff Rhat
#a2 -0.4916061 -0.4819086 -0.4625947 5314.196 1.0000947
#a[1] 0.7638723 0.7751518 0.7959048 5171.881 0.9997468
#a[2] 0.2361277 0.2472543 0.2672309 5171.881 0.9997468
#sigma 1.0048994 1.0095420 1.0187554 6492.930 0.9998086
#lp__ -5048.1238783 -5047.5409682 -5047.0355381 4101.832 1.0012841
You can see that a[1]+a[2]=1.
Plotting parameter estimates including credible intervals is also easy:
plot(fit);

The simplest way to solve optimization problems with equality and inequality constraints will most likely be through the "augmented Lagrangian" approach. In R this is, for example, realized in the alabama package.
# function and gradient
fn = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
gr = function(u) numDeriv::grad(fn, u)
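# constraint sum(u) == 1
# NOTE: as written, this constrains u[1] + u[2] + u[3] == 1, which the output below reflects;
# to constrain only the two beta coefficients, use heq = function(u) u[1] + u[2] - 1
heq = function(u) sum(u) - 1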
# constraints 0 <= u[1],u[2] <= 1; -1 <= u[3] <= 1
hin = function(u) c(u[1], u[2], 1-u[1], 1-u[2], u[3]+1, 1-u[3])
sol_a = alabama::auglag(c(0.5, 0.5, 0), fn, gr, hin=hin, heq=heq)
sol_a
## $par
## [1] 1.0000000 0.3642904 -0.3642904
## $value
## [1] 10094.74
## ...
## $hessian
## [,1] [,2] [,3]
## [1,] 15009565054 9999999977 9999992926
## [2,] 9999999977 10000002578 9999997167
## [3,] 9999992926 9999997167 10000022569
For other packages containing an "augmented Lagrangian" procedure see the CRAN Task View on optimization.
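For the two-coefficient case in the question, the equality constraint can also be removed by substitution, which leaves only box constraints. The following is a minimal sketch under that assumption, not a general method: the helper names to_u and rss are hypothetical, and the comparison with unconstrained least squares is only meaningful when the unconstrained solution already satisfies the bounds.

```r
# Simulated data as in the question (y stored as a plain numeric vector)
set.seed(234)
k <- 2
a <- diff(c(0, sort(runif(k - 1)), 1))
n <- 1e4
x <- matrix(rnorm(k * n), ncol = k)
a2 <- -0.5
y <- as.numeric(a2 * (x %*% a) + rnorm(n))

# Work with w = (p, s) and map it to u = (p, 1 - p, s),
# so that u[1] + u[2] == 1 holds by construction.
to_u <- function(w) c(w[1], 1 - w[1], w[2])
rss <- function(w) {
  u <- to_u(w)
  sum((y - u[3] * as.numeric(x %*% u[1:2]))^2)
}

# Box constraints: 0 <= p <= 1 and -1 <= s <= 1
sol <- optim(c(0.5, 0), rss, method = "L-BFGS-B",
             lower = c(0, -1), upper = c(1, 1))
u_hat <- to_u(sol$par)

# Cross-check: unconstrained least squares for b = s * (p, 1 - p),
# mapped back to the (p, 1 - p, s) parameterization.
b <- unname(coef(lm(y ~ x - 1)))
u_ols <- c(b / sum(b), sum(b))
rbind(constrained = u_hat, unconstrained = u_ols, truth = c(a, a2))
```

Note that the model is only identified when the scaling parameter is non-zero: if s is close to 0, every p gives nearly the same fit, which is one possible source of a non-unique solution on real data.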

Related

Better optimizer for constrained multinomial likelihood

Using R, I wish to estimate a vector of parameters a_i (of arbitrary length, i.e. i = 1,...,s) with a multinomial likelihood using a corresponding vector of observations n_i totaling a sample size of N=sum_i (n_i). The probabilities p_i of the multinomial are determined by said a parameters and measurements of variable x such that p_i = (a_i * x_i)/sum_i (a_i * x_i). I wish further to impose the constraint that sum_i a_i = 1.
I've managed to get optim() to do the job as follows --- implementing the two tricks I've seen of estimating the first a_1 as 1 - sum_{i=2} a_i and additionally renormalizing all estimates to 1 --- but the accuracy and dependability of achieving convergence remains rather variable (in addition to being sensitive to the vector of starting estimates I provide), even when N is very large.
I would appreciate guidance on more robust alternatives and/or improvements.
s <- 10 # vector length
N <- 1000 # total sample size
# variable
x_i <- round(rlnorm(s, 2.5, 1.5)) # one measurement per parameter
# true parameter values
a_i <- rbeta(s, 2, 2)
a_i <- a_i / sum(a_i)
# generate observations
n_i <- rmultinom(1, N, (a_i * x_i) / sum(a_i * x_i))
# negative log-likelihood for parameters `par'
nll = function(par) {
if (any(0 > par | par > 1)) {
return(NA)
}else{
par <- c(1 - sum(par), par) # estimate first as remainder
par <- par / sum(par) # normalize
p_i <- (par * x_i) / sum(par * x_i) # model for probabilities
- sum(dmultinom(
x = n_i,
size = N,
prob = p_i,
log = TRUE
)) }
}
# starting values (dropping first)
start = rep(1/s, s-1)
fit <- optim(par = start,
fn = nll,
control = list(maxit = 10000)
)
ests = c(1 - sum(fit$par), fit$par)
cbind(a_i, ests)
par(pty = 's')
plot(a_i, ests)
abline(0, 1)
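One alternative to estimating the first parameter as a remainder is a softmax reparameterization, which keeps every a_i in (0,1) and makes them sum to 1 for any unconstrained input. The following is a sketch under stated assumptions rather than a tuned solution: the softmax helper and the variable names are hypothetical, and x_i is forced to be at least 1 here because a_i cannot be identified when x_i is 0.

```r
set.seed(1)
s <- 10    # vector length
N <- 1000  # total sample size

# Assumption: keep x_i > 0 so that every a_i affects the likelihood
x_i <- pmax(1, round(rlnorm(s, 2.5, 1.5)))

# True parameter values and simulated counts
a_true <- rbeta(s, 2, 2)
a_true <- a_true / sum(a_true)
n_i <- as.vector(rmultinom(1, N, a_true * x_i))

# Map any real vector to the simplex (shifted by the max for numerical stability)
softmax <- function(theta) {
  z <- exp(theta - max(theta))
  z / sum(z)
}

# Negative log-likelihood in s - 1 free parameters; the first is fixed at 0
nll <- function(theta) {
  a <- softmax(c(0, theta))
  -dmultinom(n_i, prob = a * x_i, log = TRUE)
}

fit <- optim(rep(0, s - 1), nll, method = "BFGS",
             control = list(maxit = 1000))
a_hat <- softmax(c(0, fit$par))

# Cross-check: the model is saturated, so the MLE satisfies a_i proportional to n_i / x_i
a_closed <- (n_i / x_i) / sum(n_i / x_i)
cbind(a_true, a_hat, a_closed)
```

Because the probabilities p_i are in one-to-one correspondence with the a_i when all x_i are positive, the closed-form column gives a reference against which the optimizer's result can be compared.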

R rugarch simulation

I'd like to know the range of each parameter in the rugarch specification models.
For example for distribution error "nig" and model "apARCH". I'd like to know what is the range for the parameters "skew", "shape" related to the "nig" distribution and the parameters "gamma" and "delta" for the model "apARCH".
This is my code example:
varianceModel = list(model="apARCH", garchOrder=c(1,1))
meanModel = list(armaOrder=c(1,1))
distributionModel = "nig"
fixedPars = list(mu=0, ar1 = 0.1, ma1= 0.9, omega=0.001, alpha1=0.1, beta1=0.8, gamma1 = 0.01, delta = 2, shape=1.5, skew = 0.2)
spec <- ugarchspec(variance.model = varianceModel,
mean.model= meanModel, distribution.model=distributionModel,
fixed.pars=fixedPars)
path.sgarch <- ugarchpath(spec, n.sim=1000, n.start=1, m.sim=20)
Now, for each of these parameters, how can I get the possible range or the "standard" values?
There doesn't seem to be a list of ranges of possible values of such parameters in the documentation of rugarch, while this introduction provides only some partial information.
Those ranges of possible values, however, are (at least should be) standard in the sense that they provide well-defined distributions and stationary models. Hence, you should be able to find all such ranges in some other sources.
However, regarding the distributions, there actually is a hidden source in rugarch that you can use---the rugarch:::.DistributionBounds function source code. For instance, it contains
if (distribution == "nig") {
skew = 0.2
skew.LB = -0.99
skew.UB = 0.99
shape = 0.4
shape.LB = 0.01
shape.UB = 25
}
meaning that the lower and upper bounds for skew are -0.99 and 0.99, respectively. To extract those numbers faster, you may use
rugarch:::.DistributionBounds("nig")[c("skew.LB", "skew.UB")]
# $skew.LB
# [1] -0.99
#
# $skew.UB
# [1] 0.99
Regarding the variance models, "simple" ranges such as -1 < gamma < 1 for APARCH are typically not what you want: they only ensure that the model exists, but do not guarantee stationarity. For instance, for GARCH(1,1) to be stationary we need alpha + beta < 1; hence, the constraints are actually higher dimensional than plain intervals. As I said, you may find those online.
However, ugarchpath also checks those conditions by computing persistence(spec). Now, as you can see in
getMethod("persistence", signature(object = "uGARCHspec", pars = "missing",
distribution = "missing", model = "missing",
submodel="missing"))
there is a different way to compute this persistence for each specification. For instance, for APARCH we look at
rugarch:::.persistaparch1
# function (pars, idx, distribution = "norm")
# {
# alpha = pars[idx["alpha", 1]:idx["alpha", 2]]
# beta = pars[idx["beta", 1]:idx["beta", 2]]
# gamma = pars[idx["gamma", 1]:idx["gamma", 2]]
# delta = pars[idx["delta", 1]:idx["delta", 2]]
# skew = pars[idx["skew", 1]:idx["skew", 2]]
# shape = pars[idx["shape", 1]:idx["shape", 2]]
# ghlambda = pars[idx["ghlambda", 1]:idx["ghlambda", 2]]
# ps = sum(beta) + sum(apply(cbind(gamma, alpha), 1, FUN = function(x) x[2] *
# aparchKappa(x[1], delta, ghlambda, shape, skew, distribution)))
# return(ps)
# }
and the condition is that ps < 1. Notice that
rugarch:::.persistsgarch1
# function (pars, idx, distribution = "norm")
# {
# ps = sum(pars[idx["alpha", 1]:idx["alpha", 2]]) + sum(pars[idx["beta",
# 1]:idx["beta", 2]])
# return(ps)
# }
gives exactly alpha + beta in the case of GARCH(1,1), and ugarchpath then checks the aforementioned stationarity condition. Hence, the most straightforward thing that you can do is to check if persistence(spec) < 1 before simulating. For instance, in your example,
persistence(spec)
# [1] 0.8997927
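To illustrate the idea of checking persistence before simulating without relying on rugarch internals, here is a base R sketch for the plain GARCH(1,1) case only. The function names garch11_persistence and simulate_garch11 are hypothetical, Gaussian innovations are assumed, and this is not how rugarch implements ugarchpath.

```r
# Persistence of a GARCH(1,1): alpha + beta (must be < 1 for stationarity)
garch11_persistence <- function(alpha, beta) alpha + beta

# Simulate sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]
simulate_garch11 <- function(n, omega, alpha, beta) {
  stopifnot(n >= 2, omega > 0, alpha >= 0, beta >= 0,
            garch11_persistence(alpha, beta) < 1)
  sigma2 <- numeric(n)
  eps <- numeric(n)
  # Start at the unconditional variance omega / (1 - alpha - beta)
  sigma2[1] <- omega / (1 - alpha - beta)
  eps[1] <- sqrt(sigma2[1]) * rnorm(1)
  for (t in 2:n) {
    sigma2[t] <- omega + alpha * eps[t - 1]^2 + beta * sigma2[t - 1]
    eps[t] <- sqrt(sigma2[t]) * rnorm(1)
  }
  eps
}

set.seed(1)
path <- simulate_garch11(1000, omega = 0.001, alpha = 0.1, beta = 0.8)

# A non-stationary parameter set is rejected before any simulation happens
rejected <- tryCatch({
  simulate_garch11(10, omega = 0.001, alpha = 0.3, beta = 0.8)
  FALSE
}, error = function(e) TRUE)
```

For APARCH the persistence additionally depends on gamma, delta and the innovation distribution, so the simple sum above does not carry over; persistence(spec) remains the check to use there.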

Compound Poisson Process

My task is to simulate a compound Poisson process defined as $X(t) = \sum_{i=1}^{N(t)} Y_i$, where $N(t)$ is a Poisson process and the Y_i are Gamma(shape, scale) distributed. This is my R code:
# parameter for Poisson distribution.
lambda = 1
# parameters for Gamma distribution.
shape = 7.5
scale = 1
comp.pois = function(t.max, lambda) {
stopifnot(t.max >= 0 && t.max %% 1 == 0)
# offset ns by 1 because first y is 0.
# generate N(t), that is number of arrivals until time t.
ns = cumsum(rpois(n = t.max, lambda = lambda)) + 1
# generate gamma distributed random variables Y_i.
ys = c(0, rgamma(n = max(ns), shape = shape, scale = scale))
# generate all X(t) for t <= t.max.
return(c(0, cumsum(x = ys[ns])))
}
Compute a random sample of X(10) and compare means and variances.
# sample size.
size = 1000
t = 10
# ts is a vector of sample values for X(10).
ts = sapply(1:size, function(i) comp.pois(t, lambda)[t])
# sample mean and variance:
(mean.s = mean(ts))
(var.s = var(ts))
# theoretical mean and variance:
(mean.t = lambda * t * shape * scale)
(var.t = (shape + 1) * shape * scale^2)
output:
> # sample:
> (mean.s = mean(ts))
[1] 63.38403
> (var.s = var(ts))
[1] 184.3264
> # theoretical:
> (mean.t = lambda * t * shape * scale)
[1] 75
> (var.t = (shape + 1) * shape * scale^2)
[1] 63.75
This variance is gigantic, but I cannot spot my mistake. Please help. Thank you.
EDIT:
I used the following algorithm to generate the N(t). I don't know why it is supposed to be better. I took it from Rizzo, Maria L. Statistical computing with R. CRC Press, 2007. The mean is good, but the variance is even worse. I tried sampling from the Gamma distribution only once for the entire simulation (although I'm pretty sure this does not reflect the problem very well) and the mean was off by around 10-40 for t = 10. When resampling for every X(t) (which is what the following code does), the mean is very exact. As pointed out, the variance is horrifying. This is probably not a good solution, but I suppose it is as good as it gets.
lambda = 3
shape = 6
scale = 2
size = 10000
eps = 1e-8
t = 10
# with probability 1-eps, n or less gamma distributed random variables are needed.
n = qpois(1-eps, lambda = lambda * t)
# sample from the gamma distribution. Not sure if it's ok to use the same sample every time.
# with this, the mean is off by about 10%.
# ys = c(rgamma(n = n, shape = shape, scale = scale))
# the interarrival times are exponentially distributed with rate lambda.
pp.exp = function (t0) {
# not sure how many Tn are needed :/
Tn = rexp(1000, lambda)
Sn = cumsum(Tn)
return(min(which(Sn > t0)) - 1)
}
# generate N(t) which follow the poisson process.
ns = sapply(1:size, function (i) pp.exp(t))
# generate X(t) as in the problem description.
xs = sapply(ns, function (n) {
ys = c(rgamma(n = n, shape = shape, scale = scale))
sum(ys[1:n])
})
output (t=10) in this case:
> # compare mean and variance of 'size' samples of X(t) for verification.
> # sample:
> (mean.s = mean(xs))
[1] 359.864
> (var.s = var(xs))
[1] 4933.277
> # theoretical:
> (mean.t = lambda * t * shape * scale)
[1] 360
> (var.t = (shape + 1) * shape * scale^2)
[1] 168
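One possible reading of the discrepancy is that the theoretical variance used above lacks a factor: for a compound Poisson process, Var X(t) = lambda * t * E[Y^2], and for Gamma(shape, scale) jumps E[Y^2] = shape * (shape + 1) * scale^2. With lambda = 3, t = 10, shape = 6 and scale = 2 this gives 5040, which is close to the sample variance 4933.277 reported above. The following sketch checks this for the parameters of the first example; the variable names are hypothetical.

```r
set.seed(42)
lambda <- 1
shape <- 7.5
scale <- 1
t <- 10
size <- 10000

# N(t) ~ Poisson(lambda * t); X(t) is the sum of N(t) independent gamma jumps.
# rgamma(0, ...) returns an empty vector, whose sum is 0, so N(t) = 0 is handled.
n_t <- rpois(size, lambda * t)
x_t <- sapply(n_t, function(n) sum(rgamma(n, shape = shape, scale = scale)))

# Sample moments
mean_s <- mean(x_t)
var_s <- var(x_t)

# Theoretical moments of the compound Poisson process at time t
mean_t <- lambda * t * shape * scale
var_t <- lambda * t * shape * (shape + 1) * scale^2

rbind(sample = c(mean = mean_s, var = var_s),
      theory = c(mean = mean_t, var = var_t))
```

In the first code listing there are two further points worth checking under this reading: comp.pois returns a vector whose first element is X(0), so X(t) sits at index t + 1 rather than t, and cumsum(ys[ns]) sums selected jumps rather than selecting from the cumulative sum of all jumps.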

How does ar.yw estimate the variance

In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
You need to read the code if it's not documented.
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L) # .... not your situation
#....
else{
r <- as.double(drop(xacf))
z <- .Fortran(C_eureka, as.integer(order.max), r, r,
coefs = double(order.max^2), vars = double(order.max),
double(order.max))
coefs <- matrix(z$coefs, order.max, order.max)
partialacf <- array(diag(coefs), dim = c(order.max, 1L,
1L))
var.pred <- c(r[1L], z$vars)
#.......
order <- if (aic)
(0L:order.max)[xaic == 0L]
else order.max
ar <- if (order)
coefs[order, seq_len(order)]
else numeric()
var.pred <- var.pred[order + 1L]
var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think I'm finding it here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that I think is returning the var.pred estimate. I'm not a time series guy, and it's your responsibility to review this process for applicability to your problem.
subroutine eureka (lr,r,g,f,var,a)
c
c solves Toeplitz matrix equation toep(r)f=g(1+.)
c by Levinson's algorithm
c a is a workspace of size lr, the number
c of equations
c
c ... (snipped) ...
c estimate the innovations variance
var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
if (l .eq. lr) return
d = 0.0d0
q = 0.0d0
do 50 i = 1, l
k = l-i+2
d = d + a(i)*r(k)
q = q + f(l,i)*r(k)
50 continue
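Reading the R and Fortran snippets above together suggests that, for an AR(1) fit, var.pred is the usual Yule-Walker innovations variance r[0] * (1 - phi^2) multiplied by n / (n - (order + 1)). The following sketch tries to reproduce that by hand for the example in the question; it is an interpretation of the quoted code, and the names r, phi and var_pred_manual are hypothetical.

```r
set.seed(82346)
temp <- arima.sim(n = 10, list(ar = 0.5), sd = 1)
fit <- ar(temp, method = "yule-walker", demean = FALSE,
          aic = FALSE, order.max = 1)

n <- length(temp)
p <- 1  # fitted order

# Autocovariances at lags 0 and 1 (divisor n, no mean removal)
r <- drop(acf(temp, type = "covariance", lag.max = p,
              plot = FALSE, demean = FALSE)$acf)

# Levinson recursion for order 1: phi = r[1] / r[0]
phi <- r[2] / r[1]

# Innovations variance from the recursion: var(1) = r[0] * (1 - phi^2)
var_yw <- r[1] * (1 - phi^2)

# Degrees-of-freedom rescaling applied at the end of ar.yw.default
var_pred_manual <- var_yw * n / (n - (p + 1))

c(R = fit$var.pred, manual = var_pred_manual)
all.equal(as.numeric(fit$var.pred), var_pred_manual)
```

Under this reading, var.pred differs from the plain Yule-Walker estimate in the question only by the factor n / (n - 2), here 10 / 8.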
