Optimization in R: constrOptim not converging

I am looking for a way to optimize a function in R subject to several constraints. That's a piece of cake in Excel, but I cannot make it work in R.
What I want is to find the set of parameters that maximizes a function under the constraints that the parameters should be non-increasing and that the sum of parameters x_i ... x_max is bounded for each i.
I wrote a simple example. It works for two parameters but not for three; for three parameters it looks like the optimization procedure is not doing anything.
In real-life cases I would like to use between 12 and 120 parameters, so I am a bit worried it does not work with 3.
So any help is welcome, and thanks in advance for any reply.
The code for two parameters (working):
Omp <- function(p) {
  calc <- -p[1] - 2*p[2]
  return(calc)
}
ui1 <- matrix(c(-1,0,1,-1,-1,-1), ncol = 2)
ci1 <- c(-100,-70,0) - 0.0000001
init1 <- c(100,0)
(ui1 %*% init1) - ci1   # must be strictly positive for a feasible start
tst <- constrOptim(init1, Omp, grad = NULL, ui = ui1, ci = ci1)
The output conforms to expectations:
> sum(tst$par)
[1] 100
> tst$par
[1] 50 50
The code for three parameters (not working):
Opm <- function(p) {
  calc <- -p[1] - 2*p[2] - 3*p[3]
  print(calc)
  return(calc)
}
ui1 <- matrix(c(-1,0,0,1,0,-1,-1,0,-1,1,-1,-1,-1,0,-1), ncol = 3)
ci1 <- c(-100,-70,0,0,0) - 0.0000001
init1 <- c(65,35,0)
(ui1 %*% init1) - ci1   # must be strictly positive for a feasible start
tst <- constrOptim(init1, Opm, grad = NULL, ui = ui1, ci = ci1)
It runs but always remains close to the initial guess.
> tst$par
[1] 6.500000e+01 3.500000e+01 9.685755e-08

Someone else may be able to provide more insight, but your starting values may not be far enough inside the feasible region.
As you stated, this does not produce the expected result:
ui1 <-matrix(c(-1,0,0,1,0,-1,-1,0,-1,1,-1,-1,-1,0,-1),ncol =3)
ci1 <-c(-100,-70,0,0,0)-0.0000001
init1 <-c(65,35,0)
(ui1 %*% init1) - ci1
tst <- constrOptim(init1, Opm, grad = NULL, ui = ui1, ci = ci1)
round(tst$par)
[1] 65 35 0
But by adjusting the small offset in ci1 slightly, I get a different result, similar to what you expected in your first example.
ci1 <-c(-100,-70,0,0,0)-0.00001
init1 <-c(65,35,0)
(ui1 %*% init1) - ci1
tst <- constrOptim(init1, Opm, grad = NULL, ui = ui1, ci = ci1)
round(tst$par)
[1] 50 50 0
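Since your real use case has 12 to 120 parameters, it may also help to build ui and ci programmatically rather than by hand. The sketch below is only an illustration under assumed constraints (a non-increasing ordering plus tail-sum bounds); the bounds b, the starting values, and the objective are hypothetical placeholders, not taken from your question.
# Sketch: n parameters with p[i] >= p[i+1] and sum(p[i:n]) <= b[i] for each i
n <- 4
b <- c(100, 70, 40, 20)            # hypothetical tail-sum bounds
# ordering constraints: p[i] - p[i+1] >= 0, one row per adjacent pair
ui_order <- t(sapply(1:(n - 1), function(i) {
  row <- numeric(n); row[i] <- 1; row[i + 1] <- -1; row
}))
# tail-sum constraints written as -sum(p[i:n]) >= -b[i]
ui_sum <- t(sapply(1:n, function(i) {
  row <- numeric(n); row[i:n] <- -1; row
}))
ui <- rbind(ui_order, ui_sum)
ci <- c(rep(0, n - 1), -b) - 1e-7  # small offset keeps the region open
init <- c(40, 25, 15, 5)           # placeholder start, strictly inside the region
stopifnot(all(ui %*% init - ci > 0))
obj <- function(p) -sum(seq_len(n) * p)   # placeholder objective (maximize sum(i * p[i]))
tst <- constrOptim(init, obj, grad = NULL, ui = ui, ci = ci)
The stopifnot() line makes the point of the answer explicit: constrOptim only works from a start that lies strictly inside the feasible region.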


Optimization function gives incorrect results for 2 similar data sets

I have 2 datasets that are not very different from each other. Each dataset has 27 rows of actual and forecast values. When tested against Solver in Excel for minimization of the absolute error, abs(actual - par * forecast), they both give nearly equal values for the parameter 'par'. However, when each of these data sets is passed to the same optimization function that I have written, it only works for one of them. For the other data set, the objective always evaluates to zero (0), with 'par' assigned the upper bound value.
This is definitely incorrect. What I am not able to understand is why R is doing this.
Here are the 2 data sets:
test
dateperiod,usage,fittedlevelusage
2019-04-13,16187.24,17257.02
2019-04-14,16410.18,17347.49
2019-04-15,18453.52,17246.88
2019-04-16,18113.1,17929.24
2019-04-17,17712.54,17476.67
2019-04-18,15098.13,17266.89
2019-04-19,13026.76,15298.11
2019-04-20,13689.49,13728.9
2019-04-21,11907.81,14122.88
2019-04-22,13078.29,13291.25
2019-04-23,15823.23,14465.34
2019-04-24,14602.43,15690.12
2019-04-25,12628.7,13806.44
2019-04-26,15064.37,12247.59
2019-04-27,17163.32,16335.43
2019-04-28,17277.18,16967.72
2019-04-29,20093.13,17418.99
2019-04-30,18820.68,18978.9
2019-05-01,18799.63,17610.66
2019-05-02,17783.24,17000.12
2019-05-03,17965.56,17818.84
2019-05-04,16891.25,18002.03
2019-05-05,18665.49,18298.02
2019-05-06,21043.86,19157.41
2019-05-07,22188.93,21092.36
2019-05-08,22358.08,21232.56
2019-05-09,22797.46,22229.69
Optimization result from R
$minimum
[1] 1.018188
$objective
[1] 28031.49
test1
dateperiod,Usage,fittedlevelusage
2019-04-13,16187.24,17248.29
2019-04-14,16410.18,17337.86
2019-04-15,18453.52,17196.25
2019-04-16,18113.10,17896.74
2019-04-17,17712.54,17464.45
2019-04-18,15098.13,17285.82
2019-04-19,13026.76,15277.10
2019-04-20,13689.49,13733.90
2019-04-21,11907.81,14152.27
2019-04-22,13078.29,13337.53
2019-04-23,15823.23,14512.41
2019-04-24,14602.43,15688.68
2019-04-25,12628.70,13808.58
2019-04-26,15064.37,12244.91
2019-04-27,17163.32,16304.28
2019-04-28,17277.18,16956.91
2019-04-29,20093.13,17441.80
2019-04-30,18820.68,18928.29
2019-05-01,18794.10,17573.40
2019-05-02,17779.00,16969.20
2019-05-03,17960.16,17764.47
2019-05-04,16884.77,17952.23
2019-05-05,18658.16,18313.66
2019-05-06,21036.49,19149.12
2019-05-07,22182.11,21103.37
2019-05-08,22335.57,21196.23
2019-05-09,22797.46,22180.51
Optimization result from R
$minimum
[1] 1.499934
$objective
[1] 0
The optimization function used is shown below:
optfn <- function(x) {
  act <- x$usage
  fcst <- x$fittedlevelusage
  fn <- function(par) {
    sum(abs(act - (fcst * par)))
  }
  adjfac <- optimize(fn, c(0.5, 1.5))
  return(adjfac)
}
adjfacresults <- optfn(test)
adjfacresults <- optfn(test1)
Optimization result from R
adjfacresults <- optfn(test)
$minimum
[1] 1.018188
$objective
[1] 28031.49
Optimization result from R
adjfacresults <- optfn(test1)
$minimum
[1] 1.499934
$objective
[1] 0
Can anyone help identify why R is not doing the same process over the 2 data sets and outputting the correct results in both cases?
The corresponding results using Excel Solver for the 2 datasets are as follows:
For the 'test' data set
par value = 1.018236659
objective function value (min): 28031
For the 'test1' data set
par value = 1.01881062927878
objective function value (min): 28010
Best regards
Deepak
That's because the second column of test1 is named Usage, not usage. Therefore, act = x$usage is NULL, and the function fn returns sum(abs(NULL - something)) = sum(NULL) = 0. You have to rename this column to usage.
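A minimal defensive sketch of optfn along those lines (normalizing column names with tolower() is just one way to handle it, not the only fix):
optfn <- function(x) {
  names(x) <- tolower(names(x))   # "Usage" and "usage" both become "usage"
  act  <- x$usage
  fcst <- x$fittedlevelusage
  stopifnot(!is.null(act), !is.null(fcst))   # fail loudly instead of optimizing over NULL
  fn <- function(par) sum(abs(act - fcst * par))
  optimize(fn, c(0.5, 1.5))
}
With this version, optfn(test) and optfn(test1) should give comparable results instead of silently returning an objective of 0.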

R - numerical errors with analytical gradient?

I've got the following code:
theta=0.05
n=1000
m=200
r=rnorm(2000)
#ER check function
nu=Vectorize(function(a,tau){return(abs(tau-(a<0))*a^2)})
#Selecting 10 lowest sum values (lowest10 function returns indices)
lowest10=function(x){
  values=sort(x)[1:min(10,length(x))]
  indices=match(values,x)
  return(indices)
}
sym.expectile=function(beta,e,abs.r){return(beta[1]+beta[2]*e+beta[3]*abs.r)}
ERsum=function(beta,tau,start,end){
  y=r[(start+1):end]
  X1=rep(1,n-1)
  X3=abs(r[start:(end-1)])
  X2=c()
  X2[1]=e.sym.optimal[start-m]
  for (i in 2:(n-1)){
    X2[i]=sym.expectile(beta,X2[i-1],X3[i-1])
  }
  X=matrix(c(X1,X2,X3),ncol=3)
  res=y-X%*%beta
  sum.nu=mean(nu(res,tau))
  return(sum.nu)
}
ERsum.gr=function(beta,tau,start,end){
  y=r[(start+1):end]
  X1=rep(1,n-1)
  X3=abs(r[start:(end-1)])
  X2=c()
  X2[1]=e.sym.optimal[start-m]
  for (i in 2:(n-1)){
    X2[i]=sym.expectile(beta,X2[i-1],X3[i-1])
  }
  X=matrix(c(X1,X2,X3),ncol=3)
  partial.beta0=c()
  for (i in 1:(n-1)){partial.beta0[i]=-(1-beta[2]^(i))/(1-beta[2])}
  gr.beta0=2/T*sum(abs(tau-(y<X%*%beta))*(y-X%*%beta)*partial.beta0)/1000
  partial.beta1=c()
  partial.beta1[1]=-X2[1]
  for (i in 2:(n-1)){partial.beta1[i]=partial.beta1[i-1]*beta[2]-X2[i]}
  gr.beta1=2/T*sum(abs(tau-(y<X%*%beta))*(y-X%*%beta)*partial.beta1)/1000
  partial.beta2=c()
  partial.beta2[1]=-X3[1]
  for (i in 2:(n-1)){partial.beta2[i]=partial.beta2[i-1]*beta[2]-X3[i]}
  gr.beta2=2/T*sum(abs(tau-(y<X%*%beta))*(y-X%*%beta)*partial.beta2)/1000
  c(gr.beta0,gr.beta1,gr.beta2)
}
beta=matrix(nrow=1e4,ncol=3)
beta[,1]=runif(1e4,-1,0)#beta0
beta[,2]=runif(1e4,0,1)#beta1
beta[,3]=runif(1e4,-1,0)#beta2
e.sym.optimal=c()
tau.found.sym.optim=0.02234724
library('expectreg')
e.sym.optimal[1]=expectile(r[1:m],tau.found.sym.optim)
ERsums.sym=c()
for (i in 1:nrow(beta)){
  ERsums.sym[i]=ERsum(beta[i,],tau.found.sym.optim,m+1,m+n)
}
initialbeta.esym=beta[lowest10(ERsums.sym),]
intermedietebeta.esym=matrix(ncol=3,nrow=10)
for (i in 1:10){
  intermedietebeta.esym[i,]=optim(initialbeta.esym[i,],ERsum,
                                  gr=ERsum.gr,tau=tau.found.sym.optim,
                                  start=m+1,end=m+n,
                                  method="BFGS")$par
}
I tried to replace the optim function with optimx, but got the following error:
Error: Gradient function might be wrong - check it!
To check whether my gradient is OK, I tried to evaluate the gradient using the grad function from numDeriv and by directly calling my ERsum.gr function. For the sample vector
beta
[1] -0.8256490 0.7146256 -0.4945032
I obtained the following results:
>grad(function(beta) ERsum(c(beta[1],beta[2],beta[3]),tau.found.sym.optim,m+1,m+n),beta)
[1] -0.6703170 2.8812666 -0.5573101
> ERsum.gr(beta,tau.found.sym.optim,m+1,m+n)
[1] -0.6696467 2.8783853 -0.5567527
So here is my question: is it possible that these differences are just numerical errors caused by rounding in partial.beta0, partial.beta1, and partial.beta2, which are just the components of the sum representing the gradient? I think so, because if my analytical formula for the gradient missed something, the discrepancies would probably be much larger, but how can I be sure? If this is the case, is there any other way to obtain more accurate values of the gradient?
You've got further problems down the line even if you solve the question of whether that is really a proper gradient, which I see as too complex to tackle. If you take out the gr argument and try to run with optimx instead of optim, you get:
Error in intermedietebeta.esym[i, ] <- optimx(initialbeta.esym[i, ], ERsum, :
number of items to replace is not a multiple of replacement length
This probably relates to the fact that optimx does not return the same structure as is returned by optim:
> optimx(initialbeta.esym[i,],ERsum,
+ tau=tau.found.sym.optim,
+ start=m+1,end=m+n,
+ method="BFGS")$par
NULL
> optimx(initialbeta.esym[i,],ERsum,
+ tau=tau.found.sym.optim,
+ start=m+1,end=m+n,
+ method="BFGS") # leave out `$par`
p1 p2 p3 value fevals gevals niter convcode kkt1 kkt2 xtimes
BFGS -1.0325 0.2978319 0.04921863 0.09326904 102 100 NA 1 TRUE FALSE 3.366
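Given the data-frame return shown above, one way to adapt the assignment inside the loop (a sketch, not the only option) is to pull the parameters out of the p1..p3 columns:
res <- optimx(initialbeta.esym[i, ], ERsum,
              tau = tau.found.sym.optim, start = m + 1, end = m + n,
              method = "BFGS")
intermedietebeta.esym[i, ] <- unlist(res[1, c("p1", "p2", "p3")])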
If you disagree with the decision to allow a default gradient estimate, then you need to narrow down your debugging to the function that throws the error:
Error: Gradient function might be wrong - check it!
> traceback()
3: stop("Gradient function might be wrong - check it! \n", call. = FALSE)
2: optimx.check(par, optcfg$ufn, optcfg$ugr, optcfg$uhess, lower,
upper, hessian, optcfg$ctrl, have.bounds = optcfg$have.bounds,
usenumDeriv = optcfg$usenumDeriv, ...)
1: optimx(initialbeta.esym[i, ], ERsum, gr = ERsum.gr, tau = tau.found.sym.optim,
start = m + 1, end = m + n, method = "BFGS")
Then look at the code for optimx:::optimx.check (there is no help page for it). This is the section of code that does the checking:
if (!is.null(ugr) && !usenumDeriv) {
    gname <- deparse(substitute(ugr))
    if (ctrl$trace > 0)
        cat("Analytic gradient from function ", gname,
            "\n\n")
    fval <- ufn(par, ...)
    gn <- grad(func = ufn, x = par, ...)
    ga <- ugr(par, ...)
    teps <- (.Machine$double.eps)^(1/3)
    if (max(abs(gn - ga))/(1 + abs(fval)) >= teps) {
        stop("Gradient function might be wrong - check it! \n",
            call. = FALSE)
        optchk$grbad <- TRUE
    }
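As a rough sketch, you can apply that same relative-discrepancy measure by hand to the two gradient vectors quoted in the question; the function value below is only a placeholder, so substitute the actual ERsum value at that beta.
gn <- c(-0.6703170, 2.8812666, -0.5573101)   # numDeriv::grad estimate from the question
ga <- c(-0.6696467, 2.8783853, -0.5567527)   # analytic gradient from the question
teps <- (.Machine$double.eps)^(1/3)          # about 6.1e-6
fval <- 0.05                                 # placeholder; use ERsum(beta, tau.found.sym.optim, m+1, m+n)
max(abs(gn - ga)) / (1 + abs(fval))          # roughly 2.7e-3, well above teps
Unless the function value is very large, that ratio stays far above teps, which is why optimx rejects the analytic gradient even though the absolute differences look small.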

L2 distance between functional data (smoothed curves)

I have used smoothing to create two "functions" fd4 and fd6.
fit6 <- smooth.basis(tid6, zbegfor, fdParobj2)
fd6 <- fit6$fd
I want to measure the L2 distance between them on the interval [0,1], but I haven't been able to find an appropriate way.
||f − g||_2 = sqrt(int(|f(x)-g(x)|^2,0,1))
The best bet has been this one: How to calculate functional L_2 norm using R, but when I use fd6 instead of f <- function(x) x^2, I get the following message:
"Error in fac - fdmat : non-conformable arrays".
I've spent hours trying to find a solution. Please help me!
Now with reproducible code:
library(fda)
# Smoothing of movement pattern without obstacle rescaled to the interval [0,1]
without <- c(22.5050173512478, 22.5038665040295, 22.5171851824298, 22.5368096190746,
22.5770229184757, 22.6709727229898, 22.8195669635573, 23.0285400460222,
23.3240853426905, 23.6895323912605, 24.0905709304813, 24.5674870961964,
25.129085512519, 25.7433521858875, 26.4096817521118, 27.1338935155912,
27.906416101033, 28.7207273157549, 29.5431756517467, 30.3697951466496,
31.2214907341765, 32.0625307132683, 32.8786845916855, 33.671550678219,
34.4449992914392, 35.1852293010227, 35.8866367048324, 36.5650863548079,
37.1776116180247, 37.7706354957587, 38.3082855431959, 38.8044130844639,
39.2471137254193, 39.6193031585418, 39.9685683244076, 40.2345560551869,
40.4394442661545, 40.5712407258558, 40.6905311089523, 40.712419802203,
40.6704560575084, 40.5583379372846, 40.3965425630546, 40.1443139907057,
39.8421899334408, 39.4671160834355, 39.018733225651, 38.5381390971577,
38.035680135599, 37.4625783280288, 36.8649362406917, 36.2320264206665,
35.5599736527209, 34.8983871226943, 34.2058073957721, 33.4893682831911,
32.7568501019309, 32.0241649500974, 31.3036406455137, 30.587636320768,
29.8962657607091, 29.2297665999702, 28.6003939337949, 28.0003531206639,
27.433551463149, 26.9088532545635, 26.4265682839796, 25.974193299003,
25.5553146923473, 25.1701249455904, 24.8107813804098, 24.4776168601955,
24.167582682288, 23.8726502760669, 23.589703789663, 23.3222235336882,
23.0616248799115, 22.8185342685607, 22.6767541125512, 22.6567795841271,
22.6488510112824, 22.6436058079441, 22.6391304188382)
timewithout <- (1:length(without))/length(without) # For scaling
splineBasis = create.bspline.basis(c(0,1), nbasis=25, norder=6) # The basis for smoothing
basis = fdPar(fdobj=splineBasis, Lfdobj=2, lambda=0.00001)
fitwithout <- smooth.basis(timewithout, without, basis) # Smoothing
fdwithout <- fitwithout$fd
# Same but movement is over an obstacle
with <- c(22.4731637093167, 22.4655561889073, 22.4853719755102, 22.4989400065304,
22.5495656349031, 22.666945409755, 22.8368941117498, 23.0846080078369,
23.4160560011242, 23.8285634914224, 24.2923085321078, 24.8297004047422,
25.4884540279408, 26.2107053559, 27.0614232848574, 27.9078055119721,
28.8449720096674, 29.8989669834473, 30.996962022701, 32.1343108758062,
33.3286403418359, 34.6364870430171, 35.9105342483246, 37.1883582665643,
38.467212668323, 39.7381525466373, 41.0395064969214, 42.3095531191294,
43.5708069740233, 44.7881178787717, 45.9965529977777, 47.1643807808923,
48.284786275036, 49.3593991064962, 50.3863035442644, 51.3535489662494,
52.2739716491521, 53.1338828493223, 53.9521101656512, 54.7037562884229,
55.3593092084143, 55.9567618011946, 56.4768579145271, 56.9251919073806,
57.2971965985674, 57.5937987523734, 57.8158626068961, 57.9554856023804,
58.009777126789, 57.9863251605612, 57.8932199088797, 57.6988126618694,
57.4350394069443, 57.1112025796509, 56.7580579506751, 56.2680669960935,
55.6963799946038, 55.0574070566765, 54.3592140352073, 53.6072275005723,
52.7876353306759, 51.9172334605074, 50.9879178368431, 49.9953932631072,
48.9460707853802, 47.8511977258834, 46.6827266395278, 45.4635999409637,
44.2633368255294, 43.0386729762103, 41.7880095105045, 40.4834298069985,
39.1610223705633, 37.9241872458281, 36.7158342529737, 35.5408830466013,
34.4070964101159, 33.307156473109, 32.2514661493348, 31.2475129673168,
30.2990631096187, 29.4096423238141, 28.590173995037, 27.8437368908309,
27.17493959411, 26.5779670740351, 26.0377946174036, 25.5731202027558,
25.1761397934058, 24.8319659155494, 24.5479180062239, 24.2940808334792,
24.09388897537, 23.934861348149, 23.7999923744404, 23.6877461628934,
23.5982309560843, 23.5207597985246, 23.4354446383638, 23.3604065265148,
23.2819126915765, 23.1725048152396, 23.0637455648184, 22.9426779696074,
22.8079176617495, 22.69360227086, 22.6622165457034, 22.6671302753094,
22.66828206305, 22.6703162730529, 22.6715781657376)
timewith <- (1:length(with))/length(with)
fitwith <- smooth.basis(timewith, with, basis) # Smoothing
fdwith <- fitwith$fd
# Plots for understanding
plot(fdwith, col=2) # Smoothed curve for movement over obstacle
plot(fdwithout, col=2, add = TRUE) # Same but no obstacle
# I have to find the L2-distance between these curves
First, one can take advantage of the possibility to perform arithmetic operations with fd objects: fdwith - fdwithout. Second, maybe there is a better way to extract values from fd objects at specific points, but this also works: predict(newdata = 0.5, fdwith - fdwithout). So,
sqrt(integrate(function(x) predict(newdata = x, fdwith-fdwithout)^2, lower = 0, upper = 1)$val)
# [1] 9.592434
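An alternative sketch uses fda's inprod(), which integrates products of functional data objects over their range; it should agree closely with the integrate() result above.
diff_fd <- fdwith - fdwithout
sqrt(inprod(diff_fd, diff_fd))   # returns a 1 x 1 matrix containing the L2 distance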

Generating random numbers from the Laplace distribution

I have been trying to generate random numbers from the double exponential (Laplace) distribution. I am stuck at this point and cannot get the code to work. Any help would be appreciated. The code below is what I have written.
rlaplace = function(u,a,b){
  u = c(runif(ns))
  for(i in 1:ns){
    if(u[i] <= 0.5){
      X = a+b*log(2*u)
    } else{
      X = a-b*log(2*(1-u))
    }
  }
  X
}
z1 = rlaplace(u,a,b)
From the Probability distributions CRAN Task View, there are several packages that already implement the Laplace distribution, notably distr and Runuran.
So you should be able to install distr, for example, and do something like:
library(distr)
D <- DExp(rate = 1)
r(D)(1)
Code taken from the examples of the DExp-class help page.
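A small usage sketch (the location a and scale b here are arbitrary examples, not from the question): since DExp(rate = 1) is a standard double exponential, draws for a general Laplace with location a and scale b can be obtained by shifting and scaling.
library(distr)
D <- DExp(rate = 1)
a <- 0; b <- 1              # example location and scale
z <- a + b * r(D)(1000)     # 1000 Laplace(a, b) draws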
Try this?
# Using the pdf of a Laplace RV:
# f(y) = 1/sqrt(2*sigma^2) * exp(-sqrt(2)*abs(y - mu)/sigma)
rlaplace = function(n, mu, sigma){
  U = runif(n, 0, 1)
  # A random sign makes the draw negative half of the time
  sign = ifelse(rbinom(n, 1, .5) > .5, 1, -1)
  y = mu + sign*sigma/sqrt(2)*log(1 - U)
  y
}
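A quick sanity-check sketch for the sampler above: in this parametrization sigma plays the role of the standard deviation (the scale parameter is sigma/sqrt(2)), so the sample variance should be close to sigma^2.
set.seed(1)
y <- rlaplace(1e5, mu = 0, sigma = 1)
mean(y)   # close to 0
var(y)    # close to 1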

While loop problem in R

I am trying to get this loop in my R program to work, but it is not giving me the results I desire. I am trying to model an insurance contract with n securities, each with a fixed likelihood of default (data[i,2]) and a payout (data[i,1]).
I need to price the value of stop losses at the security level and at the portfolio level. To do this I created two while loops over the condition vectors for each level (which will be input into the function by the user), one while loop to scan through the various securities, and a final one to model the various scenarios. I tried to use R's matrix capabilities to help organize the results.
The problem with this code is that the if statement behaves oddly, not activating and filtering correctly. This makes the program slow and produces bad results: it always fills the individual protection column, rather than conditioning on the likelihood vector (data[i,2]). There are a lot of moving parts, but overall it is a simple model.
y = years
nr = nrow(data1)
nc = ncol(data1)
isl = individualStopLoss
asl = aggregateStoploss
Lasl = length(asl)
LIsl = length(isl)
claims = vector(mode = "logical", length = asl)
individualProtection = matrix(0, ncol = LIsl, nrow = y)
aggregateProtection = matrix(0, ncol = Lasl, nrow = y)
expectedClaims = data1[,1]*data1[,2]
expectedClaims = sum(expectedClaims)
k = 1
m = 1
while (k <= y) {
  j = 1
  m = 1
  runi = runif(nr, min = 0, max = 1)
  while (m <= Lasl) {
    while (j <= LIsl) {
      i = 1
      while (i <= nr) {
        if (runi[i] < data1[i,2]) {
          individualProtection[k,j] = individualProtection[k,j] + max(data1[i,1] - isl[j], 0)
          claims[k] = claims[k] + data1[i,1]
          i = i + 1
        } else {
          i = i + 1
        }
      }
      j = j + 1
    }
    aggregateProtection[k,m] = aggregateProtection[k,m] + max(claims[k] - expectedClaims*asl[m], 0)
    m = m + 1
  }
  k = k + 1
}
Just an example to help you provide a reproducible example; it will be deleted when your question is updated.
data1 <- cbind(rnorm(1000),rnorm(1000))
y = sample(rep(1990:2011,1000),1000)
nr=nrow(data1)
nc=ncol(data1)
isl = rnorm(500)
asl = rnorm(500)
Lasl = length(asl)
LIsl = length(isl)
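Not necessarily the intended model (your question does not fully specify how claims should accumulate across thresholds), but as a sketch, the innermost while loops for a single simulated year can be vectorized along these lines, reusing the names from your code:
simulate_year <- function(data1, isl, asl, expectedClaims) {
  runi <- runif(nrow(data1))
  defaulted <- runi < data1[, 2]                 # which securities default this year
  payouts <- data1[defaulted, 1]                 # payouts of the defaulted securities
  # individual stop-loss payout for every threshold in isl
  ind <- sapply(isl, function(th) sum(pmax(payouts - th, 0)))
  # aggregate stop-loss payout for every multiple in asl
  agg <- pmax(sum(payouts) - expectedClaims * asl, 0)
  list(individual = ind, aggregate = agg)
}
# e.g. with the example data above:
# simulate_year(data1, isl, asl, sum(data1[,1] * data1[,2]))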
