I am trying to obtain estimated constrained coefficients using RSS. The beta coefficients are constrained between [0,1] and sum to 1. Additionally, my third parameter is constrained between (-1,1). Utilizing the below I can obtain a nice solution using simulated variables but when implementing the methodology on my real data set I keep arriving at a non-unique solution. In turn, I'm wondering if there is a more numerically stable way to obtain my estimated parameters.
set.seed(234)
k = 2
a = diff(c(0, sort(runif(k-1)), 1))
n = 1e4
x = matrix(rnorm(k*n), nc = k)
a2 = -0.5
y = a2 * (x %*% a) + rnorm(n)
f = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
g = function(v){
v1 = v[1]
v2 = v[2]
u = vector(mode = "double", length = 3)
# ensure in (0,1)
v1 = 1 / (1 + exp(-v1))
# ensure add up to 1
u[1:2] = c(v1, 1 - sum(v1))
# ensure between [-1,1]
u[3] = (v2^2 - 1) / (v2^2 + 1)
u
}
res = optim(rnorm(2), function(v) f(g(v)), hessian = TRUE, method = "BFGS")
eigen(res$hessian)$values
res$convergence
rbind(Est = res$par, SE = sqrt(diag(solve(res$hessian))))
rbind(g(res$par),c(a,a2))
Hats off to http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html
Since there has been no direct answer to your question so far, I'd like to show a way how to implement a parameter-constrained model in Stan/RStan. You should give this a try using your real data.
Doing Bayesian inference has the advantage of giving you posterior probabilities for your (constrained) model parameters. Point estimates including confidence intervals can then be easily calculated.
First off, we load the library and set RStan to store the compiled model and use multiple cores (if available).
library(rstan);
rstan_options(auto_write = TRUE);
options(mc.cores = parallel::detectCores());
We now define our Stan model. In this case, it's very simple, and we can make use of RStan's simplex data type for vectors of non-negative values that sum to one.
model <- "
data {
int<lower=1> n; // number of observations
int<lower=0> k; // number of parameters
matrix[n, k] X; // data
vector[n] y; // response
}
parameters {
real a2; // a2 is a free scaling parameter
simplex[k] a; // a is constrained to sum to 1
real sigma; // residuals
}
model {
// Likelihood
y ~ normal(a2 * (X * a), sigma);
}"
Stan supports various constrained data types; I'd recommend taking a lot at the Stan manual for more complex examples.
Using the sample data from your original question, we can run our model:
# Sample data
set.seed(234);
k = 2;
a = diff(c(0, sort(runif(k-1)), 1));
n = 1e4;
x = matrix(rnorm(k * n), nc = k);
a2 = -0.5;
y = a2 * (x %*% a) + rnorm(n);
# Fit stan model
fit <- stan(
model_code = model,
data = list(
n = n,
k = k,
X = x,
y = as.numeric(y)),
iter = 4000,
chains = 4);
Running the model will only take a few seconds (after the parser has internally translated and compiled the model in C++), and the full results (posterior distributions for all parameters conditional on the data) are stored in fit.
We can inspect the contents of fit using summary:
# Extract parameter estimates
pars <- summary(fit)$summary;
pars;
# mean se_mean sd 2.5% 25%
#a2 -0.4915289 1.970327e-04 0.014363398 -0.5194985 -0.5011471
#a[1] 0.7640606 2.273282e-04 0.016348488 0.7327691 0.7527457
#a[2] 0.2359394 2.273282e-04 0.016348488 0.2040952 0.2248482
#sigma 1.0048695 8.746869e-05 0.007048116 0.9909698 1.0001889
#lp__ -5048.4273105 1.881305e-02 1.204892294 -5051.4871931 -5048.9800451
# 50% 75% 97.5% n_eff Rhat
#a2 -0.4916061 -0.4819086 -0.4625947 5314.196 1.0000947
#a[1] 0.7638723 0.7751518 0.7959048 5171.881 0.9997468
#a[2] 0.2361277 0.2472543 0.2672309 5171.881 0.9997468
#sigma 1.0048994 1.0095420 1.0187554 6492.930 0.9998086
#lp__ -5048.1238783 -5047.5409682 -5047.0355381 4101.832 1.0012841
You can see that a[1]+a[2]=1.
Plotting parameter estimates including confidence intervals is also easy:
plot(fit);
The simplest way to solve optimization problems with equality and inequality constraints will most likely be through the "augmented Lagrangian" approach. In R this is, for example, realized in the alabama package.
# function and gradient
fn = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
gr = function(u) numDeriv::grad(fn, u)
# constraint sum(u) == 1
heq = function(u) sum(u) - 1
# constraints 0 <= u[1],u[2] <= 1; -1 <= u[3] <= 1
hin = function(u) c(u[1], u[2], 1-u[1], 1-u[2], u[3]+1, 1-u[3])
sol_a = alabama::auglag(c(0.5, 0.5, 0), fn, gr, hin=hin, heq=heq)
sol_a
## $par
## [1] 1.0000000 0.3642904 -0.3642904
## $value
## [1] 10094.74
## ...
## $hessian
## [,1] [,2] [,3]
## [1,] 15009565054 9999999977 9999992926
## [2,] 9999999977 10000002578 9999997167
## [3,] 9999992926 9999997167 10000022569
For other packages containing an "augmented Lagrangian" procedure see the CRAN Task View on optimization.
Related
Using R, I wish to estimate a vector of parameters a_i (of arbitrary length, i.e. i = 1,...,s) with a multinomial likelihood using a corresponding vector of observations n_i totaling a sample size of N=sum_i (n_i). The probabilities p_i of the multinomial are determined by said a parameters and measurements of variable x such that p_i = (a_i * x_i)/sum_i (a_i * x_i). I wish further to impose the constraint that sum_i a_i = 1.
I've managed to get optim() to do the job as follows --- implementing the two tricks I've seen of estimating the first a_1 as 1 - sum_{i=2} a_i and additionally renormalizing all estimates to 1 --- but the accuracy and dependability of achieving convergence remains rather variable (in addition to being sensitive to the vector of starting estimates I provide), even when N is very large.
I would appreciate guidance on more robust alternatives and/or improvements.
s <- 10 # vector length
N <- 1000 # total sample size
# variable
x_i <- round(rlnorm(n_p, 2.5, 1.5))
# true parameter values
a_i <- rbeta(s, 2, 2)
a_i <- a_i / sum(a_i)
# generate observations
n_i <- rmultinom(1, N, (a_i * x_i) / sum(a_i * x_i))
# negative log-likelihood for parameters `par'
nll = function(par) {
if (any(0 > par | par > 1)) {
return(NA)
}else{
par <- c(1 - sum(par), par) # estimate first as remainder
par <- par / sum(par) # normalize
p_i <- (par * x_i) / sum(par * x_i) # model for probabilities
- sum(dmultinom(
x = n_i,
size = N,
prob = p_i,
log = TRUE
)) }
}
# starting values (dropping first)
start = rep(1/s, s-1)
fit <- optim(par = start,
fn = nll,
control = list(maxit = 10000)
)
ests = c(1 - sum(fit$par), fit$par)
cbind(a_i, ests)
par(pty = 's')
plot(a_i, ests)
abline(0, 1)
I'd like to know the range of each parameter in the rugarch specification models.
For example for distribution error "nig" and model "apARCH". I'd like to know what is the range for the parameters "skew", "shape" related to the "nig" distribution and the parameters "gamma" and "delta" for the model "apARCH".
This is my code example:
varianceModel = list(model="apARCH", garchOrder=c(1,1))
meanModel = list(armaOrder=c(1,1))
distributionModel = "nig"
fixedPars = list(mu=0, ar1 = 0.1, ma1= 0.9, omega=0.001, alpha1=0.1, beta1=0.8, gamma1 = 0.01, delta = 2, shape=1.5, skew = 0.2)
spec <- ugarchspec(variance.model = varianceModel,
mean.model= meanModel, distribution.model=distributionModel,
fixed.pars=fixedPars)
path.sgarch <- ugarchpath(spec, n.sim=1000, n.start=1, m.sim=20)
Now for each of this parameters, how I can get the possible range or the "standard" parameters?
There doesn't seem to be a list of ranges of possible values of such parameters in the documentation of rugarch, while this introduction provides only some partial information.
Those ranges of possible values, however, are (at least should be) standard in the sense that they provide well-defined distributions and stationary models. Hence, you should be able to find all such ranges in some other sources.
However, regarding the distributions, there actually is a hidden source in rugarch that you can use---the rugarch:::.DistributionBounds function source code. For instance, it contains
if (distribution == "nig") {
skew = 0.2
skew.LB = -0.99
skew.UB = 0.99
shape = 0.4
shape.LB = 0.01
shape.UB = 25
}
meaning that the lower and upper bounds for skew are -0.99 and 0.99, respectively. To extract those numbers faster, you may use
rugarch:::.DistributionBounds("nig")[c("skew.LB", "skew.UB")]
# $skew.LB
# [1] -0.99
#
# $skew.UB
# [1] 0.99
Regarding the variance models, typically "simple" ranges, such as as -1 < gamma < 1 for APARCH, are not available/what you want, because they only allow the model to exist, but doesn't guarantee stationarity. For instance, for GARCH(1,1) to be stationary we need alpha + beta < 1; hence, we actually have higher dimensional constraints than just intervals. As I said, you may find those online.
However, ugarchpath also checks those conditions by computing persistence(spec). Now, as you can see in
getMethod("persistence", signature(object = "uGARCHspec", pars = "missing",
distribution = "missing", model = "missing",
submodel="missing"))
there is a different way to compute this persistence for each specification. For instance, for APARCH we look at
rugarch:::.persistaparch1
# function (pars, idx, distribution = "norm")
# {
# alpha = pars[idx["alpha", 1]:idx["alpha", 2]]
# beta = pars[idx["beta", 1]:idx["beta", 2]]
# gamma = pars[idx["gamma", 1]:idx["gamma", 2]]
# delta = pars[idx["delta", 1]:idx["delta", 2]]
# skew = pars[idx["skew", 1]:idx["skew", 2]]
# shape = pars[idx["shape", 1]:idx["shape", 2]]
# ghlambda = pars[idx["ghlambda", 1]:idx["ghlambda", 2]]
# ps = sum(beta) + sum(apply(cbind(gamma, alpha), 1, FUN = function(x) x[2] *
# aparchKappa(x[1], delta, ghlambda, shape, skew, distribution)))
# return(ps)
# }
and the condition is that ps < 1. Notice that
rugarch:::.persistsgarch1
# function (pars, idx, distribution = "norm")
# {
# ps = sum(pars[idx["alpha", 1]:idx["alpha", 2]]) + sum(pars[idx["beta",
# 1]:idx["beta", 2]])
# return(ps)
# }
gives exactly alpha + beta in the case of GARCH(1,1) and then ugarchpathchecks the aforementioned stationarity condition. Hence, the most straightforward thing that you can do is to check if persistence(spec) < 1 before simulating. For instance, in your example,
persistence(spec)
# [1] 0.8997927
I am trying to obtain estimated constrained coefficients using RSS. The beta coefficients are constrained between [0,1] and sum to 1. Additionally, my third parameter is constrained between (-1,1). Utilizing the below I can obtain a nice solution using simulated variables but when implementing the methodology on my real data set I keep arriving at a non-unique solution. In turn, I'm wondering if there is a more numerically stable way to obtain my estimated parameters.
set.seed(234)
k = 2
a = diff(c(0, sort(runif(k-1)), 1))
n = 1e4
x = matrix(rnorm(k*n), nc = k)
a2 = -0.5
y = a2 * (x %*% a) + rnorm(n)
f = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
g = function(v){
v1 = v[1]
v2 = v[2]
u = vector(mode = "double", length = 3)
# ensure in (0,1)
v1 = 1 / (1 + exp(-v1))
# ensure add up to 1
u[1:2] = c(v1, 1 - sum(v1))
# ensure between [-1,1]
u[3] = (v2^2 - 1) / (v2^2 + 1)
u
}
res = optim(rnorm(2), function(v) f(g(v)), hessian = TRUE, method = "BFGS")
eigen(res$hessian)$values
res$convergence
rbind(Est = res$par, SE = sqrt(diag(solve(res$hessian))))
rbind(g(res$par),c(a,a2))
Hats off to http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html
Since there has been no direct answer to your question so far, I'd like to show a way how to implement a parameter-constrained model in Stan/RStan. You should give this a try using your real data.
Doing Bayesian inference has the advantage of giving you posterior probabilities for your (constrained) model parameters. Point estimates including confidence intervals can then be easily calculated.
First off, we load the library and set RStan to store the compiled model and use multiple cores (if available).
library(rstan);
rstan_options(auto_write = TRUE);
options(mc.cores = parallel::detectCores());
We now define our Stan model. In this case, it's very simple, and we can make use of RStan's simplex data type for vectors of non-negative values that sum to one.
model <- "
data {
int<lower=1> n; // number of observations
int<lower=0> k; // number of parameters
matrix[n, k] X; // data
vector[n] y; // response
}
parameters {
real a2; // a2 is a free scaling parameter
simplex[k] a; // a is constrained to sum to 1
real sigma; // residuals
}
model {
// Likelihood
y ~ normal(a2 * (X * a), sigma);
}"
Stan supports various constrained data types; I'd recommend taking a lot at the Stan manual for more complex examples.
Using the sample data from your original question, we can run our model:
# Sample data
set.seed(234);
k = 2;
a = diff(c(0, sort(runif(k-1)), 1));
n = 1e4;
x = matrix(rnorm(k * n), nc = k);
a2 = -0.5;
y = a2 * (x %*% a) + rnorm(n);
# Fit stan model
fit <- stan(
model_code = model,
data = list(
n = n,
k = k,
X = x,
y = as.numeric(y)),
iter = 4000,
chains = 4);
Running the model will only take a few seconds (after the parser has internally translated and compiled the model in C++), and the full results (posterior distributions for all parameters conditional on the data) are stored in fit.
We can inspect the contents of fit using summary:
# Extract parameter estimates
pars <- summary(fit)$summary;
pars;
# mean se_mean sd 2.5% 25%
#a2 -0.4915289 1.970327e-04 0.014363398 -0.5194985 -0.5011471
#a[1] 0.7640606 2.273282e-04 0.016348488 0.7327691 0.7527457
#a[2] 0.2359394 2.273282e-04 0.016348488 0.2040952 0.2248482
#sigma 1.0048695 8.746869e-05 0.007048116 0.9909698 1.0001889
#lp__ -5048.4273105 1.881305e-02 1.204892294 -5051.4871931 -5048.9800451
# 50% 75% 97.5% n_eff Rhat
#a2 -0.4916061 -0.4819086 -0.4625947 5314.196 1.0000947
#a[1] 0.7638723 0.7751518 0.7959048 5171.881 0.9997468
#a[2] 0.2361277 0.2472543 0.2672309 5171.881 0.9997468
#sigma 1.0048994 1.0095420 1.0187554 6492.930 0.9998086
#lp__ -5048.1238783 -5047.5409682 -5047.0355381 4101.832 1.0012841
You can see that a[1]+a[2]=1.
Plotting parameter estimates including confidence intervals is also easy:
plot(fit);
The simplest way to solve optimization problems with equality and inequality constraints will most likely be through the "augmented Lagrangian" approach. In R this is, for example, realized in the alabama package.
# function and gradient
fn = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
gr = function(u) numDeriv::grad(fn, u)
# constraint sum(u) == 1
heq = function(u) sum(u) - 1
# constraints 0 <= u[1],u[2] <= 1; -1 <= u[3] <= 1
hin = function(u) c(u[1], u[2], 1-u[1], 1-u[2], u[3]+1, 1-u[3])
sol_a = alabama::auglag(c(0.5, 0.5, 0), fn, gr, hin=hin, heq=heq)
sol_a
## $par
## [1] 1.0000000 0.3642904 -0.3642904
## $value
## [1] 10094.74
## ...
## $hessian
## [,1] [,2] [,3]
## [1,] 15009565054 9999999977 9999992926
## [2,] 9999999977 10000002578 9999997167
## [3,] 9999992926 9999997167 10000022569
For other packages containing an "augmented Lagrangian" procedure see the CRAN Task View on optimization.
My task is to simulate a compound Poisson process defined as:
where
is a Poisson process and Y_i are Gamma(shape,scale) distributed. This is my R code:
# parameter for Poisson distribution.
lambda = 1
# parameters for Gamma distribution.
shape = 7.5
scale = 1
comp.pois = function(t.max, lambda) {
stopifnot(t.max >= 0 && t.max %% 1 == 0)
# offset ns by 1 because first y is 0.
# generate N(t), that is number of arrivals until time t.
ns = cumsum(rpois(n = t.max, lambda = lambda)) + 1
# generate gamma distributed random variables Y_i.
ys = c(0, rgamma(n = max(ns), shape = shape, scale = scale))
# generate all X(t) for t <= t.max.
return(c(0, cumsum(x = ys[ns])))
}
Compute a random sample of X(10) and compare means and variances.
# sample size.
size = 1000
t = 10
# ts is a vector of sample values for X(10).
ts = sapply(1:size, function(i) comp.pois(t, lambda)[t])
# sample mean and variance:
(mean.s = mean(ts))
(var.s = var(ts))
# theoretical mean and variance:
(mean.t = lambda * t * shape * scale)
(var.t = (shape + 1) * shape * scale^2)
output:
> # sample:
> (mean.s = mean(ts))
[1] 63.38403
> (var.s = var(ts))
[1] 184.3264
> # theoretical:
> (mean.t = lambda * t * shape * scale)
[1] 75
> (var.t = (shape + 1) * shape * scale^2)
[1] 63.75
This variance is gigantic, but I cannot spot my mistake. Please help. Thank you.
EDIT:
I used the following algorithm to generate the N(t). I don't know why it is supposed to be better. I took it from Rizzo, Maria L. Statistical computing with R. CRC Press, 2007. The mean is good, but the variance is even worse. I tried sampling from the Gamma distribution only once for the entire simulation (although I'm pretty sure this does not reflect the problem very well) and the mean was off by around 10-40 for t = 10. When resampling for every X(t) (which is what the following code does), the mean is very exact. As pointed out, the variance is horrifying. This is probably not a good solution, but I suppose it is as good as it gets.
lambda = 3
shape = 6
scale = 2
size = 10000
eps = 1e-8
t = 10
# with probability 1-eps, n or less gamma distributed random variables are needed.
n = qpois(1-eps, lambda = lambda * t)
# sample from the gamma distribution. Not sure if it's ok to use the same sample every time.
# with this, the mean is of by about 10%.
# ys = c(rgamma(n = n, shape = shape, scale = scale))
# the interarrival times are exponentially distributed with rate lambda.
pp.exp = function (t0) {
# not sure how many Tn are needed :/
Tn = rexp(1000, lambda)
Sn = cumsum(Tn)
return(min(which(Sn > t0)) - 1)
}
# generate N(t) which follow the poisson process.
ns = sapply(1:size, function (i) pp.exp(t))
# generate X(t) as in the problem description.
xs = sapply(ns, function (n) {
ys = c(rgamma(n = n, shape = shape, scale = scale))
sum(ys[1:n])
})
output (t=10) in this case:
> # compare mean and variance of 'size' samples of X(t) for verification.
> # sample:
> (mean.s = mean(xs))
[1] 359.864
> (var.s = var(xs))
[1] 4933.277
> # theoretical:
> (mean.t = lambda * t * shape * scale)
[1] 360
> (var.t = (shape + 1) * shape * scale^2)
[1] 168
I came across an interesting presentation on page 32, and I started out to replicate and understand a code presented
The code from the presentation is as follows:
#Unicredit banks code
library(evir)
library(fExtremes)
# Quantile function of lognormal-GPD severity distribution
qlnorm.gpd = function(p, theta, theta.gpd, u)
{
Fu = plnorm(u, meanlog=theta[1], sdlog=theta[2])
x = ifelse(p<Fu,
qlnorm( p=p, meanlog=theta[1], sdlog=theta[2] ),
qgpd( p=(p - Fu) / (1 - Fu) , xi=theta.gpd[1], mu=theta.gpd[2], beta=theta.gpd[3]) )
return(x)
}
# Random sampling function of lognormal-GPD severity distribution
rlnorm.gpd = function(n, theta, theta.gpd, u)
{
r = qlnorm.gpd(runif(n), theta, theta.gpd, u)
}
set.seed(1000)
nSim = 1000000 # Number of simulated annual losses
H = 1500 # Threshold body-tail
lambda = 791.7354 # Parameter of Poisson body
theta1 = 2.5 # Parameter mu of lognormal (body)
theta2 = 2 # Parameter sigma of lognormal (body)
theta1.tail = 0.5 # Shape parameter of GPD (tail)
theta2.tail = H # Location parameter of GPD (tail)
theta3.tail = 1000 # Scale parameter of GPD (tail)
sj = rep(0,nSim) # Annual loss distribution inizialization
freq = rpois(nSim, lambda) # Random sampling from Poisson
for(i in 1:nSim) # Convolution with Monte Carlo method
sj[i] = sum(rlnorm.gpd(n=freq[i], theta=c(theta1,theta2), theta.gpd=c(theta1.tail, theta2.tail, theta3.tail), u=H))
However I get this error which I cannot resolve:
Error: min(p, na.rm = TRUE) >= 0 is not TRUE
APPENDED Question
Many thanks to Shadow.
I dont know how to change function reference. Is it as easy as qgpd.fExtremes to qgpd.evir?
Thanks to Shadow again to pointing this out.
For anyone who wishes to change reference to function from different package (In the above example from fExtremes to evir its as simple as adding evir:::(function).
Example:
evir:::qgpd( p=(p - Fu) / (1 - Fu) , xi=theta.gpd[1], mu=theta.gpd[2], beta=theta.gpd[3]) )
The reason you get an error here is that the packages fExtremes and evir both implement different versions of the function qgpd. In the evir version, p can be less than 0, while the fExtremes package only implements qgpd for p>=0.
The easiest solution to this is to change the qgpd function call to evir:::qgpd.