How to formulate "variable free in sign" in Linear programming? - r

I just tried to formulate DEA (data envelopment analysis) in an R script. The LP has the following format:
Snapshot of the model
Since I don't know how to set a "free-in-sign" variable in the package "Rglpk" (which I used to solve the LP in R), I set up two variables each to represent u1 and u2 in the model:
u1 = u1a - u1b ; u2 = u2a - u2b
My R code to solve this model is shown below:
f.rhs <- c(rep(0, 2*N), 1)
f.dir <- c(rep("<=", 2*N), "==")
aux1 <- cbind(-1*X, Z, 0*Y, 1, -1, 0, 0)
aux2 <- cbind(0*X, -1*Z, Y, 0, 0, 1, -1)
for (i in 1:N) {
  f.obj <- c(rep(0, s), as.numeric(Z[i,]), as.numeric(Y[i,]), 1, -1, 1, -1)
  f.con <- rbind(aux1, aux2, c(as.numeric(X[i,]), as.numeric(Z[i,]), rep(0, m+4)))
  results <- Rglpk_solve_LP(f.obj, f.con, f.dir, f.rhs, max = TRUE)
}
But I didn't get any acceptable results, and I don't know if there is a problem in my formulation. The outcomes of this model should lie in the range 0 to 1, but I got 1.033 and a negative weight in one of the outcomes.
Could somebody please guide me on how to solve this problem? Thank you!

The Rglpk docs show that there is a bounds parameter available.
Use it (and don't split)!
While the split-variable approach should work in general (I did not check your code), it's a bad approach: it doubles the number of variables and can introduce numerical instability, at least for IPM-based solvers. More direct formulations are preferred.
As GLPK is simplex-based (I think there is also a not-state-of-the-art IPM, but simplex is probably the default), there is a high probability that it handles these bounds directly, so use that via the bounds argument.
Defining a variable as free in this case simply means: set its bounds to -Inf, Inf.
Something straight from the docs (page 6 of 7; version 0.6-2):
## Same as before but with bounds replaced by
## -Inf < x_1 <= 4
## 0 <= x_2 <= 100
## 2 <= x_3 < Inf
bounds <- list(lower = list(ind = c(1L, 3L), val = c(-Inf, 2)),
upper = list(ind = c(1L, 2L), val = c(4, 100)))
Rglpk_solve_LP(obj, mat, dir, rhs, bounds, types, max)
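Transferred to your problem, you could drop the u1a/u1b and u2a/u2b pairs and simply mark u1 and u2 as free via bounds. A minimal runnable sketch (a tiny made-up LP, not your DEA model) where the second variable is free in sign:
library(Rglpk)

## maximize x1 + x2  subject to  x1 + x2 <= 4  and  x1 - x2 == 1,
## where x2 is free in sign (x1 keeps the default bound x1 >= 0)
f.obj <- c(1, 1)
f.con <- rbind(c(1,  1),
               c(1, -1))
f.dir <- c("<=", "==")
f.rhs <- c(4, 1)

## declare x2 free: replace its default lower bound 0 by -Inf
## (the default upper bound is already +Inf, so nothing to change there)
bounds <- list(lower = list(ind = 2L, val = -Inf))

Rglpk_solve_LP(f.obj, f.con, f.dir, f.rhs, bounds = bounds, max = TRUE)
## optimum: x1 = 2.5, x2 = 1.5, objective value 4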

Related

Moment Matching Scenario Generation in R

I am working on a portfolio optimization algorithm, and part of the problem consists of generating moment-matching scenarios.
Because of its simplicity and speed, my choice was to follow the paper "An algorithm for moment-matching scenario generation with application to financial portfolio optimization" (Ponomareva, Roman and Date).
The problem is that even though the mathematics is very simple, I am stuck by the fact that some of the probability weights p_i are negative, even though the formulas in the paper should ensure otherwise. If I put a loop around the algorithm to run until it finds a positive combination, it essentially runs forever.
Here is the bit of code, based on the paper, where things get stuck:
dummy1 = 0
while (dummy1 <= 0 | dummy1 >= 1) {
  dummy1 = round(rnorm(1, mean = 0.5, sd = 0.25), 2)
}
diag.cov.returns = diag(cov.returns)
Z = dummy1 * sqrt (diag.cov.returns) #Vector Z according to paper formula
ZZT = Z %*% t(Z)
LLT = cov.returns - ZZT
L = chol(LLT) #cholesky decomposition to get matrix L
s = sample (1:5, 1)
F1 = 0
F2 = -1
S = (2*N*s)+3
while (((4*F2) - (3*F1*F1)) < 0) {
  #Gamma = (2*s*s)*(((N*mean.fourth) - (0.75*(sum(Z^4)* (N*mean.third/sum(Z^3))^2)))/sum(L^4))
  #Gamma is necessary if we want to get p from a Uniform distribution
  #U = runif(s, 0, 1)
  U = rgamma(s, shape = 1, scale = ((1/exp(1)):1))
  #p = (s*(N/Gamma)) + ((1/(2*N*s)) - (s/(N*Gamma)))*U
  p = -log(U, base = exp(1))
  p = p/(((2*sum(p)) + max(p))*N*s) #this array is expected to be positive and bounded between 0 and 1
  q1 = 1/p
  pz = p
  p[s+1] = 1 - (2*N*sum(p)) #extra point necessary to get the 3 moment-matching probabilities
  F1 = (N*mean.third*sqrt(p[s+1]))/(sum(Z^3))
  F2 = p[s+1]*(((N*mean.fourth) - (1/(2*s*s))*sum(L^4)*(sum(1/p)))/sum(Z^4))
}
alpha = (0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
beta = -(0.5*F1) + 0.5*sqrt((4*F2)-(3*F1*F1))
w1 = 1/(alpha*(alpha+beta))
w2 = 1/(beta*(alpha+beta))
w0 = 1 - (1/(alpha*beta))
P = rep(pz, 2*N) #Vector with Probabilities starting from p + 3 extra probabilities to match third and fourth moments
P[(2*N*s)+1] = p[s+1]*w0
P[(2*N*s)+2] = p[s+1]*w1
P[(2*N*s)+3] = p[s+1]*w2
Unfortunately I cannot disclose the input dataset containing fund returns. However, I can surely be more specific. I start from a data.frame() containing N assets' returns (in my case there are 11 funds and monthly returns from 30/01/2001 to 30/09/2020). The mean returns, covariance matrix, central third and fourth moments (NOT skewness and kurtosis) and their averages are computed first. The algorithm then follows as I have reported in the problem. The point where I get stuck is that p also takes negative values. This is a problem, since the first s elements of p are later used as probabilities in P.
I hope that in this way the problem is clearer. I also want to add that the data used by the authors is reported in the paper, but unfortunately importing it into R would have to be done manually. However, I repeat: any data.frame() containing assets' returns will do.
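For context, here is a minimal sketch of how the inputs mentioned above could be computed from a returns data.frame (the object name returns and the averaging over assets are my assumptions about the setup, not code from the paper):
returns <- as.matrix(returns)   # data.frame of monthly fund returns, one column per asset
N <- ncol(returns)              # number of assets (11 in my case)

mean.returns <- colMeans(returns)
cov.returns  <- cov(returns)

## central third and fourth moments (NOT skewness and kurtosis), then their averages
centered    <- sweep(returns, 2, mean.returns)
mean.third  <- mean(colMeans(centered^3))
mean.fourth <- mean(colMeans(centered^4))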

ROI package doesn't give a solution (qpoases solver) for an unconstrained QP

I'm setting up a simple QP to test the ROI R package. However, the package isn't able to give a solution to a simple toy problem when it is unconstrained.
Example:
# Maximize -1/2 x^2 + x, no constraints
> x <- OP(Q_objective(as.matrix(-1), 1), maximum = TRUE, bounds = V_bound(lb = -Inf, ub = Inf, nobj = 1))
> x
> ROI Optimization Problem:
Maximize a quadratic objective function of length 1 with
- 1 continuous objective variable,
subject to
- 0 constraints
- 1 lower and 0 upper non-standard variable bounds.
Looks good. But when I solve the problem I get,
> sol <- ROI_solve(x, solver = 'qpoases')
> sol
No optimal solution found.
The solver message was: Initialisation failed! QP could not be solved!
The objective value is: 0.000000e+00
> sol$solution
[1] 0
> sol$objval
[1] 0
> sol$status
$code
[1] 1
$msg
solver qpoases
code 36
symbol RET_INIT_FAILED_HOTSTART
message Initialisation failed! QP could not be solved!
roi_code 1
That's odd. On a related note, when I use the quadprog solver, I'm able to get the unconstrained solution (= 1); however, I've had to switch from quadprog to qpoases for other reasons.
Any help is much appreciated.
EDIT:
Oddly enough this works,
> ROI_solve(OP(Q_objective(as.matrix(-1), 1), maximum = TRUE), solver = 'qpoases')
No optimal solution found.
The solver message was: Initialisation failed! QP could not be solved!
The objective value is: 0.000000e+00
> ROI_solve(OP(Q_objective(as.matrix(1), -1), maximum = FALSE), solver = 'qpoases')
Optimal solution found.
The objective value is: -5.000000e-01
These different results come from different values of the hessian_type parameter: if the argument hessian_type is not provided by the user, ROI.plugin.qpoases chooses it. Due to a bug in that choice, the first example was given hessian_type = 6 (unknown), while the second got the correct hessian_type = 1 (identity).
x1 <- OP(Q_objective(as.matrix(-1), 1), maximum = TRUE, bounds = NULL)
s1 <- ROI_solve(x1, solver = 'qpoases', hessian_type = 1L)
solution(s1)
x1 <- OP(Q_objective(as.matrix(-1), 1), maximum = TRUE, bounds = NULL)
s1 <- ROI_solve(x1, solver = 'qpoases', hessian_type = 6L)
solution(s1)
This is fixed in the new version, which is on its way to CRAN.

How does ar.yw estimate the variance

In R, how does the function ar.yw estimate the variance? Specifically, where does the number "var.pred" come from? It does not seem to come from the usual YW estimate of the variance, nor the sum of squared residuals divided by df (even though there is disagreement about what the df should be, none of the choices give an answer equivalent to var.pred). And yes, I know that there are better methods than YW; just trying to figure out what R is doing.
set.seed(82346)
temp <- arima.sim(n=10, list(ar = 0.5), sd=1)
fit <- ar(temp, method = "yule-walker", demean = FALSE, aic=FALSE, order.max=1)
## R's estimate of the sigma squared
fit$var.pred
## YW estimate
sum(temp^2)/10 - fit$ar*sum(temp[2:10]*temp[1:9])/10
## YW if there was a mean
sum((temp-mean(temp))^2)/10 - fit$ar*sum((temp[2:10]-mean(temp))*(temp[1:9]-mean(temp)))/10
## estimate based on residuals, different possible df.
sum(na.omit(fit$resid^2))/10
sum(na.omit(fit$resid^2))/9
sum(na.omit(fit$resid^2))/8
sum(na.omit(fit$resid^2))/7
Need to read the code if it's not documented.
?ar.yw
Which says: "In ar.yw the variance matrix of the innovations is computed from the fitted coefficients and the autocovariance of x." If that is not enough explanation, then you need to look at the code:
methods(ar.yw)
#[1] ar.yw.default* ar.yw.mts*
#see '?methods' for accessing help and source code
getAnywhere(ar.yw.default)
# there are two cases that I see
x <- as.matrix(x)
nser <- ncol(x)
if (nser > 1L)   # .... not your situation
    #....
else {
    r <- as.double(drop(xacf))
    z <- .Fortran(C_eureka, as.integer(order.max), r, r,
                  coefs = double(order.max^2), vars = double(order.max),
                  double(order.max))
    coefs <- matrix(z$coefs, order.max, order.max)
    partialacf <- array(diag(coefs), dim = c(order.max, 1L, 1L))
    var.pred <- c(r[1L], z$vars)
    #.......
    order <- if (aic)
        (0L:order.max)[xaic == 0L]
    else order.max
    ar <- if (order)
        coefs[order, seq_len(order)]
    else numeric()
    var.pred <- var.pred[order + 1L]
    var.pred <- var.pred * n.used/(n.used - (order + 1L))
So you now need to find the Fortran code for C_eureka. I think I'm finding it here: https://svn.r-project.org/R/trunk/src/library/stats/src/eureka.f This is the code that I think is returning the var.pred estimate. I'm not a time series guy, and it's your responsibility to review this process for applicability to your problem.
subroutine eureka (lr,r,g,f,var,a)
c
c solves Toeplitz matrix equation toep(r)f=g(1+.)
c by Levinson's algorithm
c a is a workspace of size lr, the number
c of equations
c
snipped
c estimate the innovations variance
var(l) = var(l-1) * (1 - f(l,l)*f(l,l))
if (l .eq. lr) return
d = 0.0d0
q = 0.0d0
do 50 i = 1, l
k = l-i+2
d = d + a(i)*r(k)
q = q + f(l,i)*r(k)
50 continue
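Putting those pieces together, here is a minimal sketch of my reading of that code for the AR(1) example: var.pred is the Levinson-Durbin innovations variance built from the (not demeaned, divided-by-n) autocovariances, rescaled by n.used/(n.used - (order + 1)). The variable names temp and fit come from the question.
n  <- length(temp)
cc <- acf(temp, type = "covariance", lag.max = 1, demean = FALSE, plot = FALSE)$acf
c0 <- cc[1]; c1 <- cc[2]

phi <- c1 / c0            # Yule-Walker AR(1) coefficient, as in fit$ar
v1  <- c0 * (1 - phi^2)   # var(l) = var(l-1) * (1 - f(l,l)^2), starting from var(0) = c0
v1 * n / (n - 2)          # the n.used/(n.used - (order + 1)) rescaling
fit$var.pred              # should agree with the line above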

Portfolio optimization

I am trying to build a portfolio in R that is optimized with respect to a benchmark.
I want to minimize the objective function
$$\min_w \ \mathrm{Var}(r'w - r'w_{bm})$$
with the constraints
$$1_n'w = 1, \qquad w \ge .005, \qquad w \le .8 \ \text{(element-wise)}$$
where $r$ is the vector of security returns, $w$ the portfolio weights to solve for, and $w_{bm}$ the benchmark weights. There are 10 securities, so I set the benchmark weights at .1 each.
I know that
$$\mathrm{Var}(r'w - r'w_{bm}) = \mathrm{Var}(r'w) + \mathrm{Var}(r'w_{bm}) - 2\,\mathrm{Cov}(r'w,\, r'w_{bm}),$$
and since $\mathrm{Var}(r'w_{bm})$ does not depend on $w$, minimizing the tracking variance is equivalent to minimizing
$$w'\,\mathrm{Var}(r)\,w - 2\,\mathrm{Cov}(r',\, r'w_{bm})\,w.$$
The last term is of the form I need, so I tried to solve this with solve.QP in R; the constraints are giving me a problem, though.
Here is my code:
library(quadprog)

obs    <- 100  # number of observations (value assumed for a reproducible example)
assets <- 10   # number of securities

# this is the portfolio which the assets are tracked against
trackport <- array(rnorm(obs * assets, mean = .2, sd = .15), dim = c(obs, assets))
wbm <- matrix(rep(1/assets, assets)) # benchmark weights (equal-weighted)
Aeq <- t(matrix(rep(1, assets), nrow = assets, ncol = 1)) # row of 1's for the weights
Beq <- 1 # weights should sum to 1
H <- 2 * cov(trackport) # times 2 because of the solve.QP syntax
# multiply the returns by the benchmark weights to create a vector of returns for
# the benchmark
rbm <- trackport %*% wbm
# covariance between the tracking portfolio and benchmark returns
eff <- cov(trackport, rbm)
# constraints
Amatrix <- t(matrix(c(Aeq, diag(assets), -diag(assets)), ncol = assets, byrow = TRUE))
Bvector <- matrix(c(1, rep(.005, assets), rep(.8, assets)))
# solve
zeros <- rep(0, assets) # zero linear term: reduces to the min-variance portfolio
                        # for troubleshooting purposes
solQP3 <- solve.QP(Dmat = H,
                   dvec = zeros,
                   Amat = Amatrix,
                   bvec = Bvector,
                   meq  = 1)
The error I am getting is "constraints are inconsistent, no solution!", but I can't find what's wrong with my A matrix.
My (transposed) A matrix looks like this
[1,1,...,1]
[1,0,...,0]
[0,1,...,0]
...
[0,0,...,1]
[-1,0,...,0]
[0,-1,...,0]
...
[0,0,...,-1]
and my $b_0$ looks like this
[1]
[.005]
[.005]
...
[.005]
[.8]
[.8]
...
[.8]
so I'm not sure why it isn't finding a solution, could anyone take a look?
I'm not familiar with the package, but I just took a quick look at https://cran.r-project.org/web/packages/quadprog/quadprog.pdf, which apparently is what you are using.
Your RHS values of .8 should be -0.8 because this function uses ≥ inequalities. So you have been constraining the variables to be ≥ .005 and ≤ -0.8, which of course is not what you want, and is infeasible.
So leave transposed A as is and make
b0:
[1]
[.005]
[.005]
...
[.005]
[-.8]
[-.8]
...
[-.8]
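Putting that together, a minimal sketch of the corrected call (variable names H, Amatrix, eff, assets taken from the question; the dvec comment reflects my reading of the derivation above, not the answerer's code):
# corrected right-hand side: flip the sign on the .8 rows, because solve.QP uses
# constraints of the form A'w >= b0, so -w_i >= -.8 is equivalent to w_i <= .8
Bvector <- matrix(c(1, rep(.005, assets), rep(-.8, assets)))

zeros <- rep(0, assets)           # min-variance check, as in the question
solQP3 <- solve.QP(Dmat = H,
                   dvec = zeros,  # for the full tracking objective derived above,
                                  # this would presumably be 2 * drop(eff)
                   Amat = Amatrix,
                   bvec = Bvector,
                   meq  = 1)
solQP3$solution                   # weights now sum to 1 with .005 <= w_i <= .8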

how do I select the smoothing parameter for smooth.spline()?

I know that the smoothing parameter (lambda) is quite important for fitting a smoothing spline, but I did not see any post here regarding how to select a reasonable lambda (spar = ?). I was told that spar normally ranges from 0 to 1. Could anyone share your experience using smooth.spline()? Thanks.
smooth.spline(x, y = NULL, w = NULL, df, spar = NULL,
cv = FALSE, all.knots = FALSE, nknots = NULL,
keep.data = TRUE, df.offset = 0, penalty = 1,
control.spar = list(), tol = 1e-6 * IQR(x))
agstudy provides a visual way to choose spar. What I remember learning from my linear models class (though not exactly) is to use cross validation to pick the "best" spar. Here's a toy example borrowed from agstudy:
x = seq(1:18)
y = c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))

splineres <- function(spar) {
  res <- rep(0, length(x))
  for (i in 1:length(x)) {
    mod <- smooth.spline(x[-i], y[-i], spar = spar)
    res[i] <- predict(mod, x[i])$y - y[i]
  }
  return(sum(res^2))
}

spars <- seq(0, 1.5, by = 0.001)
ss <- rep(0, length(spars))
for (i in 1:length(spars)) {
  ss[i] <- splineres(spars[i])
}
plot(spars, ss, 'l', xlab = 'spar', ylab = 'Cross Validation Residual Sum of Squares',
     main = 'CV RSS vs Spar')
spars[which.min(ss)]
R > spars[which.min(ss)]
[1] 0.381
The code is not the neatest, but it is easy to understand. Also, if you specify cv = TRUE in smooth.spline:
R > xyspline <- smooth.spline(x, y, cv=T)
R > xyspline$spar
[1] 0.3881
From the help of smooth.spline you have the following:
The computational λ used (as a function of spar) is λ = r * 256^(3*spar - 1).
So spar can be greater than 1 (but I guess not by too much). I think you can vary this parameter and choose it graphically by plotting the fitted values for different spars. For example:
library(lattice)
spars <- seq(0.2, 2, length.out = 10)       ## I will choose between 10 values
dat <- data.frame(
  spar = as.factor(rep(spars, each = 18)),  ## spar to group data (to get different colors)
  x = seq(1:18),                            ## recycling here to repeat x and y
  y = c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4)))
xyplot(y ~ x | spar, data = dat, type = c('p'), pch = 19, groups = spar,
       panel = function(x, y, groups, ...) {
         s2 <- smooth.spline(y, spar = spars[panel.number()])
         panel.lines(s2)
         panel.xyplot(x, y, groups, ...)
       })
Here, for example, I get the best results for spar = 0.4.
If you don't have duplicated points at the same x value, then try generalized cross-validation (GCV, which smooth.spline uses when cv = FALSE): it is a clever way of picking a pretty good value for lambda (spar). One neat detail about GCV is that it doesn't actually have to go to the trouble of doing the calculations for every single set of left-out points, as highlighted in Simon Wood's book. For lots of detail on this, have a look at the notes on Simon Wood's web page on MGCV.
Adrian Bowman's sm R package has a function h.select() which is intended specifically for doing the grunt work of choosing a smoothing parameter (though I'm not 100% sure that it is compatible with the smooth.spline() function in the base package).
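A minimal sketch of the GCV route, reusing the toy x and y from above (cv = FALSE is the default shown in the signature at the top of this question, so the explicit argument is just for emphasis):
fit.gcv <- smooth.spline(x, y, cv = FALSE)  # GCV-selected smoothing
fit.gcv$spar    # spar chosen by GCV
fit.gcv$lambda  # corresponding computational lambda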
