I am trying to optimize the layout of a set of boxes with respect to their hanger locations, so that the boxes are aligned with their hangers as closely as possible and do not crowd each other out. I am using quadprog.
Givens:
1. box hanger x-locations (P) = 710, 850, 990, 1130
2. box sizes (W) = 690, 550, 690, 130
3. usable x-spread tuple (S) = (-150, 2090)
4. number of boxes (K) = 4
5. minimum inter-box spread (G) = 50
6. box x-locations (X) = the objective (decision variables)
We can see that the total required x-spread is sum(W) + 3G = 2060 + 150 = 2210, whereas the available x-spread is S[2] - S[1] = 2240, so a solution should exist.
Min:
sum over i of (P[i] - X[i])^2
s.t.:
(1) X[i+1] - X[i] >= G + ½ (W[i+1] + W[i]); i = 1..(K-1), i.e. the boxes do not crowd each other out
-X[i] + X[i+1] >= -( -G - ½ (W[i+1] + W[i]) )
(2) X[1] >= S[left] + ½ W[1], and (3) X[K] <= S[right] - ½ W[K], i.e. the boxes are within the given x-spread
X[1] >= -( S[left] + ½ W[1] )
-X[K] >= -( S[right] - ½ W[K] )
for a total of 5 constraints - 3 for the inter-box spread, and 2 for extremities.
in R:
> Dmat = matrix(0,4,4)
> diag(Dmat) = 1
> dvec = P, the hanger locations
[1] 710 850 990 1130
> bvec
[1] -670 -670 -460 -195 2025
> t(Amat)
[,1] [,2] [,3] [,4]
[1,] -1 1 0 0
[2,] 0 -1 1 0
[3,] 0 0 -1 1
[4,] 1 0 0 0
[5,] 0 0 0 -1
> solve.QP(Dmat, dvec, Amat, bvec)
Error in solve.QP(Dmat, dvec, Amat, bvec) :
constraints are inconsistent, no solution!
Quite obviously I have missed something or mis-specified the problem (package 'quadprog'). I am using quadprog because I found a JavaScript port of it.
Thanks a lot.
I'm not sure that this solves your physical problem, but the code below seems to solve the optimization problem as you stated it. I've generalized it to a variable number of boxes and included a plot to check the solution.
library(quadprog)
p <- c(710, 850, 990, 1130) # hanger positions
w <- c(690, 550, 690, 130) # box widths
g <- 50 # min box separation
s <- c(-150, 2090) # min and max positions of box edges
k <- length(w) # number of boxes
Dmat <- 2*diag(nrow=k)  # solve.QP minimizes (1/2) t(b) %*% Dmat %*% b - t(dvec) %*% b ...
dvec <- 2*p             # ... so Dmat = 2*I and dvec = 2*p reproduce sum((b - p)^2) up to a constant
# separation constraints
Amat <- -diag(nrow=k,ncol=(k-1))
Amat[lower.tri(Amat)] <- unlist(lapply((k-1):1, function(n) c(1,numeric(n-1))))
bvec <- sapply(1:(k-1), function(n) g + (w[n+1]+w[n])/2)
# x-spread constraints
Amat <- cbind(Amat, c(1,numeric(k-1)), c(numeric(k-1),-1))
bvec <- c(bvec, s[1] + w[1]/2, -(s[2] - w[k]/2))
sol <- solve.QP(Dmat, dvec, Amat, bvec)
plot(x=s, y=c(0,0), type="l", ylim=c(-2.5,0))
points(x=p, y=numeric(k), pch=19)
segments(x0=sol$solution, y0=-1, x1=p, y1=0)
rect(xleft=sol$solution-w/2, xright=sol$solution+w/2, ytop=-1.0, ybottom=-2, density=8)
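To check the result numerically (a quick sketch reusing the objects above): the gaps between neighbouring box centres should be at least the required separations, and the outer box edges should stay inside s.
round(sol$solution, 1)                                    # fitted box centres
diff(sol$solution) - (g + (head(w, -1) + tail(w, -1))/2)  # should all be >= 0
c(sol$solution[1] - w[1]/2 - s[1],                        # left margin, should be >= 0
  s[2] - (sol$solution[k] + w[k]/2))                      # right margin, should be >= 0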
The problem lies with the setup of Amat, bvec or both. solve.QP tries to find a solution, b, of the quadratic programming problem subject to the constraint that
t(Amat) %*% b >= bvec
Expanding out this constraint in your example, we want to find a vector b := c(b[1], b[2], b[3], b[4]) that satisfies the conditions:
-b[1] + b[2] >= -670,
-b[2] + b[3] >= -670,
-b[3] + b[4] >= -460,
b[1] >= -195
and -b[4] >= 2025 (i.e., b[4] <= -2025).
However, by adding the first four inequalities together, we get b[4] >= -670 - 670 - 460 - 195 = -1995. In other words, b[4] must be at least -1995 and at most -2025. This is a contradiction, and therefore solve.QP fails to find a solution.
Trying this example with the constraint -b[4] >= -2025 instead, i.e. setting bvec = c(-670, -670, -460, -195, -2025), yields a solution. Without going too deeply into your formulation above, perhaps this is what was intended (or another one of these values should have been positive)?
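For completeness, a quick check of that corrected call (reusing the Dmat, dvec and Amat from the question):
library(quadprog)
Dmat <- diag(4)
dvec <- c(710, 850, 990, 1130)
Amat <- cbind(c(-1,1,0,0), c(0,-1,1,0), c(0,0,-1,1), c(1,0,0,0), c(0,0,0,-1))
bvec <- c(-670, -670, -460, -195, -2025)
solve.QP(Dmat, dvec, Amat, bvec)$solution
With these sign-flipped constraints the unconstrained optimum (the hanger positions themselves) is already feasible, so the solver simply returns P; the answer above instead keeps the separation terms positive.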
Related
I have posted a problem on Mathematics Stack Exchange regarding a Metropolis-Hastings algorithm on a graph, which can be read here.
(A code solution is given in the Mathematics Stack Exchange link, but it is in CoCalc and I do not know how to translate it into R.)
In a nutshell, the problem is: consider a finite, undirected, connected graph G = (V, E) with vertex set V and edge set E. We do not know the number of vertices |V| of G, nor its structure. We can only see local information about G: e.g. if we are at a vertex x in V, we can see the neighbors of x, i.e. the vertices y in V for which (x, y) is in E, as well as how many neighbors x's neighbors have. Let us denote by d(x) the degree of x in V, i.e. the number of neighbors of x.
Compute the transition probabilities of the chain {X_n}, n in N, of the Metropolis-Hastings algorithm simulating the uniform distribution on V, using the transition probabilities of the random walk on G as the proposal.
For simplicity, let us assume that the graph has 5 vertices.
How can I write the MH algorithm in R for this specific problem, or translate the CoCalc code in the Mathematics Stack Exchange answer?
Edit:
(i) CoCalc code in R (rather slow).
## CoCalc code in R.
library(igraph)
# g <- graph(c(1,2, 2,4, 4,1, 1,3), directed=FALSE)
g <- graph_from_literal(A-B, B-D, D-A, A-C)
cur = 1
freq <- rep(0, vcount(g))
names(freq) <- as_ids(V(g))
nit <- 1E4
set.seed(1)
system.time({
  for ( i in seq(nit) ) {
    neigh <- V(g)[.nei(cur)]                     # neighbours of the current vertex
    nextnode <- neigh[sample(length(neigh), 1)]  # propose a uniformly chosen neighbour
    if (runif(1) < degree(g, cur) / degree(g, nextnode) ){
      cur <- nextnode                            # accept with probability min(1, d(cur)/d(next))
    }
    freq[cur] <- freq[cur] + 1
  }
})
freq <- freq / sum(freq)
freq
Output.
user system elapsed
4.63 0.05 5.13
A B D C
0.2471 0.2477 0.2463 0.2589
Regarding your point (ii) on Mathematics Stack Exchange:
If all cycles in an (un)directed graph are even, then the graph is bipartite and the random walk on it is periodic (period 2), i.e. not aperiodic.
As in this example; the degrees may differ.
library(igraph)
g <- graph_from_literal(A-X, X-B, B-Y, Y-A, A-Z)
degree(g)
mmm <- as.matrix(g[])
mmm <- mmm / rowSums(mmm)
## Calculate the power series.
mms <- mmm %*% mmm   # two-step transition probabilities
mms
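To see the period-2 behaviour directly (using mmm and mms from above): even powers of the transition matrix only connect vertices on the same side of the bipartition, odd powers only connect opposite sides, so the plain power sequence oscillates instead of converging.
mm3 <- mms %*% mmm   # three-step transition probabilities
round(mms, 3)        # even power: only same-side entries are positive
round(mm3, 3)        # odd power: only cross-side entries are positive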
Sources
The Metropolis-Hastings algorithm, Christian P. Robert, https://arxiv.org/abs/1504.01896
The Metropolis Hastings Algorithm, Matthew Stephens, https://stephens999.github.io/fiveMinuteStats/MH_intro.html
As an example, we use the sum of two dice (6 faces each) as the target distribution.
Possible states: 2 through 12.
library(igraph)
freq_d2 <- c(1,2,3,4,5,6,5,4,3,2,1)
dist_d2 <- freq_d2 / 36
target <- function(x) return(freq_d2[x-1])
The coding follows Matthew Stephens.
Create the proposal distribution; here each proposal is an independent fresh roll of two dice.
x = rep(0, 11 * 1000)
x[1] = 3    # initialize; I've arbitrarily set this to 3
for ( i in tail(seq_along(x), -1) ) {
  current_x  = x[i-1]
  proposed_x = sample(6, 1) + sample(6, 1)                  # independent proposal: a fresh roll of two dice
  A = target(proposed_x)/target(current_x)
  hastings_ratio = target(current_x) / target(proposed_x)   # proposal ratio q(current)/q(proposed); here q = target
  A = min(1, hastings_ratio * A)                            # with q = target this product is always 1
  if(runif(1) < A){
    x[i] = proposed_x    # accept move with probability min(1, A)
  } else {
    x[i] = current_x     # otherwise "reject" move, and stay where we are
  }
}
frq <- as.data.frame(table(x))
frq$density <- frq[,2] / sum(frq$Freq)
frq$target <- dist_d2
frq$corr <- cor(frq$density, frq$target)
print(frq, digits=3)
Output
x Freq density target corr
1 2 305 0.0277 0.0278 0.998
2 3 611 0.0555 0.0556 0.998
3 4 953 0.0866 0.0833 0.998
4 5 1191 0.1083 0.1111 0.998
5 6 1496 0.1360 0.1389 0.998
6 7 1856 0.1687 0.1667 0.998
7 8 1568 0.1425 0.1389 0.998
8 9 1191 0.1083 0.1111 0.998
9 10 952 0.0865 0.0833 0.998
10 11 568 0.0516 0.0556 0.998
11 12 309 0.0281 0.0278 0.998
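As an additional check (using frq and dist_d2 from above), a chi-squared goodness-of-fit test compares the sampled counts with the target probabilities:
chisq.test(frq$Freq, p = dist_d2)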
## Create a stochastic transition matrix, sample <--> next sample
## See https://en.wikipedia.org/wiki/Stochastic_matrix
el <- cbind(head(x, -1), tail(x, -1))                # consecutive samples become edges
g <- graph_from_edgelist(el, directed = FALSE) - 1   # "- 1" drops the unused vertex 1, leaving one vertex per state 2..12
pb <- as.matrix(g[])
pb <- pb / ifelse(rowSums(pb) > 0, rowSums(pb), 1)   # row-normalise; the guard avoids division by zero for empty rows
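A quick sanity check on pb (defined above): every visited state should give a row summing to 1.
dim(pb)             # one row/column per die-sum state
range(rowSums(pb))  # rows of a stochastic matrix sum to 1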
Next we demonstrate that the stationary state of the Markov chain reproduces the target distribution.
The stochastic matrix pb[i,j] describes the transition probability from state i to state j. For a state distribution s0, s1 = s0 %*% pb is the distribution after one transition.
We calculate the stationary distribution of the Markov chain by constructing the power sequence of pb until it is stable: pb^1, pb^2, ... describe the transition probabilities after one, two, ... transitions, and pb* = pb^n is stationary when pb^n == pb^(n+1) and all elements of pb^n are positive, that is, all states are reachable with positive probability.
Google "math Markov drunken sailor" for non-technical introductions to Markov chains.
pb_star <- pb
for (i in seq(100)) {
  pb_next <- pb_star %*% pb
  if ( sd(pb_star - pb_next) < 1E-5 ) break   # stop once successive powers agree
  pb_star <- pb_next
}
state_d2 <- pb_star[1,]
print(cbind(state=state_d2, target=dist_d2, sd = sd(state_d2 - dist_d2)), digits=3)
Output.
state target sd
[1,] 0.0291 0.0278 0.00351
[2,] 0.0552 0.0556 0.00351
[3,] 0.0870 0.0833 0.00351
[4,] 0.1119 0.1111 0.00351
[5,] 0.1353 0.1389 0.00351
[6,] 0.1585 0.1667 0.00351
[7,] 0.1427 0.1389 0.00351
[8,] 0.1146 0.1111 0.00351
[9,] 0.0836 0.0833 0.00351
[10,] 0.0549 0.0556 0.00351
[11,] 0.0273 0.0278 0.00351
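As a cross-check (a sketch using pb and state_d2 from above), the stationary distribution can also be computed directly as the normalized left eigenvector of pb for eigenvalue 1:
ev   <- eigen(t(pb))
stat <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
stat <- stat / sum(stat)
round(cbind(eigen = stat, power = state_d2, target = dist_d2), 3)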
Here is an "almost-equivalent" translation of the CoCalc code into R, but with some improvements in speed.
library(igraph)

G <- graph_from_literal(0 - -1, 0 - -3, 0 - -2, 1 - -3)
nit <- 1e4
cur <- 1
deg <- degree(G)
neighs <- setNames(ego(G, mindist = 1), V(G))
freq <- setNames(rep(0, vcount(G)), V(G))
for (k in seq(nit)) {
  nb <- neighs[[cur]]
  nxt <- nb[sample(length(nb), 1)]
  if (runif(1) < deg[cur] / deg[nxt]) {
    cur <- nxt
  }
  freq[cur] <- freq[cur] + 1
}
p <- proportions(freq)
where neighs and deg are prepared in advance to avoid calling the functions neighbors (here via ego) and degree within the loop, since those calls are expensive.
Benchmark
You can see the benchmark below (I used nit <- 1e2 to save time, but it is still enough to show the performance difference).
library(microbenchmark)

g <- graph_from_literal(0 - -1, 0 - -3, 0 - -2, 1 - -3)
nit <- 1e2
f_TIC <- function() {
  set.seed(1)
  cur <- 1
  deg <- degree(g)
  neighs <- setNames(ego(g, mindist = 1), V(g))
  freq <- setNames(rep(0, vcount(g)), V(g))
  for (k in seq(nit)) {
    nb <- neighs[[cur]]
    nxt <- nb[sample(length(nb), 1)]
    if (runif(1) < deg[cur] / deg[nxt]) {
      cur <- nxt
    }
    freq[cur] <- freq[cur] + 1
  }
  setNames(proportions(freq), names(V(g)))
}
f_clp <- function() {
  set.seed(1)
  cur <- 1
  freq <- rep(0, vcount(g))
  names(freq) <- as_ids(V(g))
  for (i in seq(nit)) {
    neigh <- V(g)[.nei(cur)]
    nextnode <- neigh[sample(length(neigh), 1)]
    if (runif(1) < degree(g, cur) / degree(g, nextnode)) {
      cur <- nextnode
    }
    freq[cur] <- freq[cur] + 1
  }
  freq / sum(freq)
}
microbenchmark(
f_TIC(),
f_clp(),
check = "identical",
times = 20L
)
and you will see
Unit: milliseconds
expr min lq mean median uq max neval
f_TIC() 14.6516 14.96495 17.54595 15.59415 19.63485 27.0141 20
f_clp() 54.8336 59.41840 63.09275 60.19275 64.41965 103.8254 20
I am trying to write a function that outputs the common fraction n/d, with min <= d <= max, that is closest to pi, i.e. that minimizes |(n/d) - pi|.
That is:
n is the numerator
d is the denominator
And min and max are the boundaries, i.e. we search over all denominators d between min and max.
If d = 1: 3/1 <= pi <= 4/1, gives the closest fraction of 3/1 with a distance of |3/1 - pi| = 0.142
...
If d = 4: 12/4 <= pi <= 13/4, gives the closest fraction of 13/4 with a distance of |13/4 - pi| = 0.108
...
If d = 6: 18/6 <= pi <= 19/6, gives the closest fraction of 19/6 with distance of |19/6 - pi| = 0.025
If d = 7: 21/7 <= pi <= 22/7, gives the closest fraction of 22/7 with a distance of |22/7 - pi| = 0.001
...
If d = 10: 31/10 <= pi <= 32/10 gives the closest fraction of 31/10 with a distance of |31/10 - pi| = 0.042
Therefore, here, the best approximation is 22/7 when d = 7 and where a distance to pi is 0.001
min = 1
max = 10
library(Rmpfr)
Const("pi", 3333) # pi correct to 1000 decimal places
1 'mpfr' number of precision 3333 bits
[1] 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989381
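For reference, the high-precision constant above can be used to measure how far a candidate fraction is from pi, e.g. for 22/7 (the name piHi is just for this illustration):
library(Rmpfr)
piHi <- Const("pi", 3333)
abs(mpfr(22, 3333) / 7 - piHi)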
The following function finds the best approximation over an entire range of denominators (and replaces an extremely inefficient function I posted earlier; look at the edit history if you want a good laugh).
findBestApprox <- function(minD, maxD) {
  # best fraction below pi for each denominator
  lowers <- floor(pi*(minD:maxD))/(minD:maxD)
  i <- which.min(abs(pi - lowers))
  best.lower <- lowers[i]
  # best fraction above pi for each denominator
  uppers <- ceiling(pi*(minD:maxD))/(minD:maxD)
  j <- which.min(abs(pi - uppers))
  best.upper <- uppers[j]
  if(abs(pi - best.lower) < abs(pi - best.upper)) {
    d <- minD + i - 1
    n <- floor(pi*d)
  } else {
    d <- minD + j - 1
    n <- ceiling(pi*d)
  }
  c(n,d)
}
For example
> findBestApprox(2,1000)
[1] 355 113
> 355/113
[1] 3.141593
The vectorized nature of the code makes it very fast, taking only a second or so to search out to 10 million:
> findBestApprox(2,10000000)
[1] 5419351 1725033
> format(5419351/1725033,digits = 16)
[1] "3.141592653589815"
> 5419351/1725033 - pi
[1] 2.220446e-14
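As a quick sanity check on the d <= 10 case from the question (a sketch using round() to pick the nearest numerator for each denominator, which is equivalent to comparing the floor and ceiling candidates):
d <- 1:10
n <- round(pi * d)                             # nearest integer numerator for each d
best <- which.min(abs(n / d - pi))
c(numerator = n[best], denominator = d[best])  # 22 and 7, as in the question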
Currently I am working on probability calibration. I am using the calibration approach called the rescaling algorithm; the source is http://lem.cnrs.fr/Portals/2/actus/DP_201106.pdf (page 7).
The algorithm I wrote is:
rescaling_fun = function(x, y, z) {
P_korg = z # yhat_test_prob$BAD
P_k_C1 = sum(as.numeric(y) - 1)/length(y) # testset$BAD
P_kt_C1 = sum(as.numeric(x) - 1)/length(x) # trainset$BAD
P_k_C0 = sum(abs(as.numeric(y) - 2))/length(y)
P_kt_C0 = sum(abs(as.numeric(x) - 2))/length(x)
P_new <- ((P_k_C1/P_kt_C1) * P_korg)/((P_k_C0/P_k_C0) * (1 - P_korg) + (P_k_C0/P_k_C1) * (P_korg))
return(P_new)
}
the input values are:
1. x - train_set$BAD (actuals of `train set`)
2. y - test_set$BAD (actuals of `test set`)
3. z - yhat_test_prob$BAD (prediction on `test set`)
The problem: the resulting values are not within the range of 0 and 1. Could you please help to solve the problem?
Your formulas for obtaining the probabilities (P_k_C1, ...) need to be modified. For example, according to the paper, y is a binary variable (0, 1), but your formula sum(y - 1)/length(y) is most likely to be negative: it converts the y values to -1 or 0 and then adds them up. I consider it should be (sum(y) - 1)/length(y). Below is an example.
set.seed(1237)
y <- sample(0:1, 10, replace = T)
y
[1] 0 1 0 0 0 1 1 0 1 1
# it must be negative as it is sum(y - 1) - y is 0 or 1
sum(as.numeric(y) - 1)/length(y)
[1] -0.5
# modification
(sum(as.numeric(y)) - 1)/length(y)
[1] 0.4
I am working with the output from a model in which there are parameter estimates that may not follow a-priori expectations. I would like to write a function that forces these utility estimates back in line with those expectations. To do this, the function should minimize the sum of the squared deviance between the starting values and the new estimates. Since we have a-priori expections, the optimization should be subject to the following constraints:
B0 < B1
B1 < B2
...
Bj < Bj+1
For example, the raw parameter estimates below are flip-flopped for B2 and B3. The columns Delta and Delta^2 show the deviance between the original parameter estimate and the new coefficient, and I am trying to minimize the sum of the column Delta^2. I've coded this up in Excel and shown how Excel's Solver would optimize this problem given the set of constraints:
Beta BetaRaw Delta Delta^2 BetaNew
B0 1.2 0 0 1.2
B1 1.3 0 0 1.3
B2 1.6 -0.2 0.04 1.4
B3 1.4 0 0 1.4
B4 2.2 0 0 2.2
After reading through ?optim and ?constrOptim, I'm not able to grok how to set this up in R. I'm sure I'm just being a bit dense, but could use some pointers in the right direction!
3/24/2012 - Added bounty since I'm not smart enough to translate the first answer.
Here's some R code that should be on the right path. Assuming that the betas start with:
betas <- c(1.2,1.3,1.6,1.4,2.2)
I want to minimize the following function such that b0 <= b1 <= b2 <= b3 <= b4
f <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  x3 <- x[3]
  x4 <- x[4]
  x5 <- x[5]
  loss <- (x1 - betas[1]) ^ 2 +
    (x2 - betas[2]) ^ 2 +
    (x3 - betas[3]) ^ 2 +
    (x4 - betas[4]) ^ 2 +
    (x5 - betas[5]) ^ 2
  return(loss)
}
To show that the function works, the loss should be zero if we pass the original betas in:
> f(betas)
[1] 0
And relatively large with some random inputs:
> set.seed(42)
> f(rnorm(5))
[1] 8.849329
And minimized at the values I was able to calculate in Excel:
> f(c(1.2,1.3,1.4,1.4,2.2))
[1] 0.04
1.
Since the objective is quadratic and the constraints linear,
you can use solve.QP.
It finds the b that minimizes
(1/2) * t(b) %*% Dmat %*% b - t(dvec) %*% b
under the constraints
t(Amat) %*% b >= bvec.
Here, we want the b that minimizes
sum( (b-betas)^2 ) = sum(b^2) - 2 * sum(b*betas) + sum(betas^2)
= t(b) %*% b - 2 * t(b) %*% betas + sum(betas^2).
Since the last term, sum(betas^2), is constant, we can drop it,
and we can set
Dmat = diag(n)
dvec = betas
(the leftover factor of 1/2 in solve.QP's objective merely rescales it and does not change the minimizer).
The constraints are
b[1] <= b[2]
b[2] <= b[3]
...
b[n-1] <= b[n]
i.e.,
-b[1] + b[2] >= 0
- b[2] + b[3] >= 0
...
- b[n-1] + b[n] >= 0
so that t(Amat) is
[ -1   1   0   0  ...   0   0 ]
[  0  -1   1   0  ...   0   0 ]
[  0   0  -1   1  ...   0   0 ]
[              ...            ]
[  0   0   0   0  ...  -1   1 ]
and bvec is zero.
This leads to the following code.
# Sample data
betas <- c(1.2, 1.3, 1.6, 1.4, 2.2)
# Optimization
n <- length(betas)
Dmat <- diag(n)
dvec <- betas
Amat <- matrix(0,nr=n,nc=n-1)
Amat[cbind(1:(n-1), 1:(n-1))] <- -1
Amat[cbind(2:n, 1:(n-1))] <- 1
t(Amat) # Check that it looks as it should
bvec <- rep(0,n-1)
library(quadprog)
r <- solve.QP(Dmat, dvec, Amat, bvec)
# Check the result, graphically
plot(betas)
points(r$solution, pch=16)
2.
You can use constrOptim in the same way (the objective function can be arbitrary, but the constraints have to be linear).
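For instance, a minimal constrOptim sketch for the same betas (the matrix ui encodes -b[i] + b[i+1] >= 0, and the starting value must be strictly feasible):
betas <- c(1.2, 1.3, 1.6, 1.4, 2.2)
n  <- length(betas)
ui <- cbind(-diag(n - 1), 0) + cbind(0, diag(n - 1))   # row i: -b[i] + b[i+1]
ci <- rep(0, n - 1)
f  <- function(b) sum((b - betas)^2)
start <- sort(betas) + seq_along(betas) * 1e-4         # strictly increasing start
fit <- constrOptim(start, f, grad = NULL, ui = ui, ci = ci)
round(fit$par, 3)
Because constrOptim works with a logarithmic barrier, the solution stays strictly inside the feasible region, so tied coefficients are approached rather than hit exactly.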
3.
More generally, you can use optim if you reparametrize the problem
into a non-constrained optimization problem,
for instance
b[1] = exp(x[1])
b[2] = b[1] + exp(x[2])
...
b[n] = b[n-1] + exp(x[n]).
There are a few examples here or there, and a small sketch follows below.
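A minimal sketch of this reparametrization with optim (same betas as above; the helper names to_b and obj are just for illustration):
betas <- c(1.2, 1.3, 1.6, 1.4, 2.2)
to_b <- function(x) cumsum(exp(x))                  # b[1] = exp(x[1]), b[k] = b[k-1] + exp(x[k])
obj  <- function(x) sum((to_b(x) - betas)^2)
x0   <- log(pmax(c(betas[1], diff(betas)), 1e-3))   # start near the raw betas
fit  <- optim(x0, obj, method = "BFGS")
round(to_b(fit$par), 3)
Since the increments exp(x[k]) are strictly positive, exact ties can only be approached in the limit, so expect values very close to (but not exactly equal to) the solve.QP solution above.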
Alright, this is starting to take form, but it still has some bugs. Based on the conversation in chat with @Joran, it seems I can include a conditional that will set the loss function to an arbitrarily large value if the values are not in order. This seems to work IF the discrepancy occurs between the first two coefficients, but not thereafter. I'm having a hard time parsing out why that would be the case.
Function to minimize:
f <- function(x, x0) {
  x1 <- x[1]
  x2 <- x[2]
  x3 <- x[3]
  x4 <- x[4]
  x5 <- x[5]
  loss <- (x1 - x0[1]) ^ 2 +
    (x2 - x0[2]) ^ 2 +
    (x3 - x0[3]) ^ 2 +
    (x4 - x0[4]) ^ 2 +
    (x5 - x0[5]) ^ 2
  # Make sure the coefficients are in order
  if (any(diff(c(x1,x2,x3,x4,x5)) > 0)) loss = 10000000
  return(loss)
}
Working example (sort of, it seems the loss would be minimized if b0 = 1.24?):
> betas <- c(1.22, 1.24, 1.18, 1.12, 1.10)
> optim(betas, f, x0 = betas)$par
[1] 1.282 1.240 1.180 1.120 1.100
Non-working example (note that the third element is still larger than the second):
> betas <- c(1.20, 1.15, 1.18, 1.12, 1.10)
> optim(betas, f, x0 = betas)$par
[1] 1.20 1.15 1.18 1.12 1.10
I am trying to use constrOptim (http://rss.acs.unt.edu/Rdoc/library/stats/html/constrOptim.html) to do optimization in R with some given linear constraints, but I am not able to figure out how to set up the problem.
For example, I need to maximize $f(x,y) = \log(x) + \frac{x^2}{y^2}$ subject to the constraints $g_1(x,y) = x+y < 1$, $g_2(x,y) = x > 0$ and $g_3(x,y) = y > 0$. This is just a hypothetical example; do not worry about its structure, I am only interested in how to set this up in R.
thanks!
Setting up the function was trivial:
fr <- function(x) { x1 <- x[1]
x2 <- x[2]
-(log(x1) + x1^2/x2^2) # need negative since constrOptim is a minimization routine
}
Setting up the constraint matrix was problematic due to a lack of much documentation, and I resorted to experimentation. The help page says "The feasible region is defined by ui %*% theta - ci >= 0". So I tested and this seemed to "work":
> rbind(c(-1,-1),c(1,0), c(0,1) ) %*% c(0.99,0.001) -c(-1,0, 0)
[,1]
[1,] 0.009
[2,] 0.990
[3,] 0.001
So I put in a row for each constraint/boundary:
constrOptim(c(0.99,0.001), fr, NULL, ui=rbind(c(-1,-1), # the -x-y > -1
c(1,0), # the x > 0
c(0,1) ), # the y > 0
ci=c(-1,0, 0)) # the thresholds
For this problem there is a potential difficulty in that for all values of x the function goes to Inf as y -> 0. I do get a max around x=.95 and y=0 even when I push the starting values out to the "corner", but I'm somewhat suspicious that this is not the true maximum which I would have guessed was in the "corner".
EDIT:
Pursuing this I reasoned that the gradient might provide additional "direction" and added a gradient function:
grr <- function(x) { ## Gradient of 'fr'
x1 <- x[1]
x2 <- x[2]
c(-(1/x[1] + 2 * x[1]/x[2]^2),
2 * x[1]^2 /x[2]^3 )
}
This did "steer" the optimization a bit closer to the c(.999..., 0) corner, instead of moving away from it, as it did for some starting values. I remain somewhat disappointed that the process seems to "head for the cliff" when the starting values are close to the center of the feasible region:
constrOptim(c(0.99,0.001), fr, grr, ui=rbind(c(-1,-1), # the -x-y > -1
c(1,0), # the x > 0
c(0,1) ), # the y > 0
ci=c(-1,0, 0) )
$par
[1] 9.900007e-01 -3.542673e-16
$value
[1] -7.80924e+30
$counts
function gradient
2001 37
$convergence
[1] 11
$message
[1] "Objective function increased at outer iteration 2"
$outer.iterations
[1] 2
$barrier.value
[1] NaN
Note: Hans Werner Borchers posted a better example on R-Help that succeeded in getting the corner values by setting the constraint slightly away from the edge:
> constrOptim(c(0.25,0.25), fr, NULL,
ui=rbind( c(-1,-1), c(1,0), c(0,1) ),
ci=c(-1, 0.0001, 0.0001))
$par
[1] 0.9999 0.0001