Could not find the optimal solution after adding constraints

My code is as follows:
from gekko import GEKKO
import numpy as np

gekko = GEKKO(remote=True)
# create variables; each variable is a vector, and each element
# of the vector is binary
s = []
for i in range(N):
    s.append(gekko.Array(gekko.Var, s_len[i], value=0, lb=0, ub=1, integer=True))
# some constants used in the objective/constraint functions
c, d, r, m, L = create_c_d_r_m_L()  # they are all NumPy ndarrays
# define the objective function
def objective():
    obj = 0
    for i in range(N):
        obj += np.dot(s[i], c[i]) + np.dot(s[i], d[i])
    for idx, (i, j) in enumerate(E):
        obj += np.dot(np.dot(s[i], r[idx].reshape(s_len[i], s_len[j])),
                      s[j])  # s[i] * r[i, j] * s[j]
    return obj
# add constraints
# (a) each vector must have exactly one 1
for i in range(N):
    gekko.Equation(gekko.sum(s[i]) == 1)
# (b) peak memory must stay below DEVICE_MEM (a predefined big int)
for t in range(N):
    peak_mem = gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
    gekko.Equation(peak_mem < DEVICE_MEM)
# solve
gekko.Obj(objective())
gekko.solve(disp=True)
I found that when I remove constraint (b), the solver outputs the optimal solution for s. However, if I add (b) and set DEVICE_MEM to a very large number (which should not affect the solution), s is no longer optimal. I'm wondering whether I'm doing something wrong here, because I tried both APOPT (solvertype=1) and IPOPT (solvertype=3) and they give the same non-optimal results.
To give more context: this is an optimization over a graph. N is the number of nodes in the graph, and E is the set of all edges. c, d, and m are three types of node cost; r is the cost of edges. Each node has multiple strategies (represented by the vector s[i]), and we need to select the best strategy for each node so that the overall cost is minimal.
Detailed constants:
from random import randrange

# s_len: records the length of each vector
# (the number of strategies for each node;
# here we assume the lengths are all 10)
s_len = np.ones(N, dtype=int) * 10
# c, d, m are the costs of each node;
# assume the c/d/m cost for node i is just i
c, d, m = [], [], []
for i in range(N):
    c.append(s_len[i] * [i])
    d.append(s_len[i] * [i])
    m.append(s_len[i] * [i])
# r is the edge cost; assume the cost for
# each edge (i, j) is just i * j
r = []
for (i, j) in E:  # E records all edges
    cur_r = s_len[i] * s_len[j] * [i * j]
    r.append(cur_r)
# L contains node ids; we just randomly generate 10 node ids per entry
L = []
for i in range(N):
    cur_L = [randrange(N) for _ in range(10)]
    L.append(cur_L)
I've been stuck on this for a while and any comments/answers are highly appreciated! Thanks!

Try reframing the inequality constraint:
for t in range(N):
    peak_mem = gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
    gekko.Equation(peak_mem < DEVICE_MEM)
as a variable with an upper bound:
peak_mem = gekko.Array(gekko.Var, N, ub=DEVICE_MEM)
for t in range(N):
    gekko.Equation(peak_mem[t] ==
                   gekko.sum([np.dot(s[i], m[i]) for i in L[t]]))
The N inequality constraints peak_mem < DEVICE_MEM are converted to equality constraints with slack variables, slack = DEVICE_MEM - peak_mem, plus a simple inequality constraint slack >= 0 on each slack variable. If the inequality constraint is far from the bound, the slack variable can become very large. Formulating the constraint as a bounded variable may help.
I tried using the information in the question to pose a minimal problem that could reproduce the error and the potential solution. If you need more specific suggestions, please modify the code to be a complete and minimal example that reproduces the error. This helps with verifying the solution.
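For reference, here is one way such a minimal sketch could look, using a toy single-node model; the strategy count, costs, memory values, and DEVICE_MEM below are invented for illustration and are not from the question:

from gekko import GEKKO
import numpy as np

gekko = GEKKO(remote=False)
gekko.options.SOLVER = 1  # APOPT, since the model has integer variables

n_strategies = 3
cost = np.array([3.0, 1.0, 2.0])  # assumed per-strategy cost
mem = np.array([5.0, 9.0, 7.0])   # assumed per-strategy memory use
DEVICE_MEM = 100.0                # large enough to be non-binding

# one binary per strategy; exactly one may be selected
s = gekko.Array(gekko.Var, n_strategies, value=0, lb=0, ub=1, integer=True)
gekko.Equation(gekko.sum(list(s)) == 1)

# peak memory as a bounded variable instead of an inequality equation
peak_mem = gekko.Var(ub=DEVICE_MEM)
gekko.Equation(peak_mem == gekko.sum([s[i] * mem[i] for i in range(n_strategies)]))

gekko.Minimize(gekko.sum([s[i] * cost[i] for i in range(n_strategies)]))
gekko.solve(disp=False)
print([v.value[0] for v in s])  # expect the cheapest strategy to be selected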

Related

CVXPY violates constraints when it solves SDP

Let's say that I want to solve the following problem:
minimize Tr(C Y)
s.t. Y = x x^T, with every entry of x either 0 or 1,
where x x^T is the outer product of the (n-1)-dimensional vector x and C is an (n-1) by (n-1) square matrix. To convert this problem to a problem with a single matrix variable, I wrote the following code using cvxpy.
import cvxpy as cp
import numpy as np

n = 8
np.random.seed(1)
S = np.zeros(shape=(int(n), int(n)))
S[int(n-1), int(n-1)] = 1
C = np.zeros(shape=(n, n))
C[:n-1, :n-1] = np.random.randn(n-1, n-1)
X = cp.Variable((n, n), PSD=True)
constraints = []
constraints.append(cp.trace(S @ X) == 1)
for i in range(n-1):
    Q = np.zeros(shape=(n, n))
    Q[i, i] = 1
    Q[-1, i] = -0.5
    Q[i, -1] = -0.5
    const = cp.trace(Q @ X) == 0
    constraints.append(const)
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve(solver=cp.MOSEK)
print("X is")
print(X.value)
print("C is")
print(C)
To satisfy the binary constraint that the entries of the vector x should be one or zero, I added some constraints on the matrix variable X:
X = [[Y, x], [x^T, 1]]
Tr(Q X) == 0
There are n-1 Q matrices, which force the vector x's entries to be 0 or 1.
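Concretely, with X = [[Y, x], [x^T, 1]], the i-th of these constraints evaluates to
Tr(Q X) = 1*Y[i,i] - 0.5*x[i] - 0.5*x[i] = Y[i,i] - x[i],
so Tr(Q X) == 0 forces Y[i,i] = x[i]. Combined with Y = x x^T (which gives Y[i,i] = x[i]^2), this means x[i]^2 = x[i], i.e. each x[i] is 0 or 1.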
However, when I ran this simple code, the constraints were severely violated.
Looking forward to seeing any suggestions or comments on this.

Understanding how the Gram-Schmidt process is translated into this piece of code

Trying to understand the Gram-Schmidt process from this explanation:
http://mlwiki.org/index.php/Gram-Schmidt_Process
The steps of the calculation make sense to me. However, the Python implementation included in the same article doesn't seem to be aligned with them.
import numpy as np

def normalize(v):
    return v / np.sqrt(v.dot(v))

# A is the input matrix, with the vectors in its columns (from the article)
n = len(A)
A[:, 0] = normalize(A[:, 0])
for i in range(1, n):
    Ai = A[:, i]
    for j in range(0, i):
        Aj = A[:, j]
        t = Ai.dot(Aj)
        Ai = Ai - t * Aj
    A[:, i] = normalize(Ai)
From the above code, we see it takes the dot product of V1 and b; however, the (V1, V1) part is never used as a denominator (refer to the standard projection formula the article shows):
v2 = b - ((v1 . b) / (v1 . v1)) * v1
I wonder how this equation is translated into the code inside the for loop?
This is what the code does, exactly.
Basically, it normalizes the previous vector (a column in A), projects the current vector onto it, and subtracts the projection from the current vector.
Normalization happens with every vector to keep the calculation neat: once Aj is normalized, Aj.Aj = 1, so the denominator is not needed.
The V2 equation above doesn't normalize the previous vector, hence the difference.
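A quick numerical check of that point, with made-up vectors (not from the article): when the previous vector is normalized, the coefficient (v.u)/(u.u) reduces to v.u, and both forms give the same projection.

import numpy as np

v = np.array([3.0, 1.0])
u = np.array([2.0, 0.0])
u_hat = u / np.sqrt(u.dot(u))  # normalized u

# coefficient with the denominator vs. without it (normalized case)
print(v.dot(u) / u.dot(u))  # 1.5
print(v.dot(u_hat))         # 3.0

# the resulting projection vector is identical either way
print((v.dot(u) / u.dot(u)) * u)  # [3. 0.]
print(v.dot(u_hat) * u_hat)       # [3. 0.]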
Try this vectorized implementation.
I would also suggest going through David C. Lay's book for the theory.
import numpy as np

def replace_zero(array):
    for i in range(len(array)):
        if array[i] == 0:
            array[i] = 1
    return array

def gram_schmidt(A, norm=True, row_vect=False):
    """Orthonormalizes vectors by the Gram-Schmidt process

    Parameters
    ----------
    A : ndarray
        Matrix having vectors in its columns
    norm : bool
        Do you need normalized vectors?
    row_vect : bool
        Does matrix A have vectors in its rows?

    Returns
    -------
    G : ndarray
        Matrix of orthogonal vectors

    Gram-Schmidt Process
    --------------------
    The Gram-Schmidt process is a simple algorithm for
    producing an orthogonal or orthonormal basis for any
    nonzero subspace of Rn.
    Given a basis {x1, ..., xp} for a nonzero subspace W of Rn,
    define
        v1 = x1
        v2 = x2 - (x2.v1 / v1.v1) * v1
        v3 = x3 - (x3.v1 / v1.v1) * v1 - (x3.v2 / v2.v2) * v2
        ...
        vp = xp - (xp.v1 / v1.v1) * v1 - (xp.v2 / v2.v2) * v2
             - ... - (xp.v(p-1) / v(p-1).v(p-1)) * v(p-1)
    Then {v1, ..., vp} is an orthogonal basis for W.
    In addition,
        Span {v1, ..., vk} = Span {x1, ..., xk} for 1 <= k <= p

    References
    ----------
    Linear Algebra and Its Applications - David C. Lay
    """
    if row_vect:
        # if true, transpose it to make a column-vector matrix
        A = A.T
    no_of_vectors = A.shape[1]
    G = A[:, 0:1].copy()  # copy the first vector in the matrix
    # 0:1 keeps the slice two-dimensional - [[1, 2, 3]]
    # iterate from the 2nd vector to the number of vectors
    for i in range(1, no_of_vectors):
        # calculate the weights (coefficients) for every vector in G
        numerator = A[:, i].dot(G)
        denominator = np.diag(np.dot(G.T, G))  # the diagonal elements
        weights = np.squeeze(numerator / denominator)
        # the vector projected onto the subspace spanned by G
        projected_vector = np.sum(weights * G,
                                  axis=1,
                                  keepdims=True)
        # the component orthogonal to the subspace spanned by G
        orthogonalized_vector = A[:, i:i+1] - projected_vector
        # add the orthogonal vector to our set
        G = np.hstack((G, orthogonalized_vector))
    if norm:
        # to get orthonormal (unit orthogonal) vectors, replace zeros
        # with 1 to avoid division by 0 when the matrix has a zero
        # vector or a normalization value comes out to be zero
        G = G / replace_zero(np.linalg.norm(G, axis=0))
    if row_vect:
        return G.T
    return G
G = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]])
gram_schmidt(G)
>
array([[ 0.5       , -0.8660254 ,  0.        ],
       [ 0.5       ,  0.28867513, -0.81649658],
       [ 0.5       ,  0.28867513,  0.40824829],
       [ 0.5       ,  0.28867513,  0.40824829]])

Given a list of coefficients, create a polynomial

I want to create a polynomial with given coefficients. This seems very simple, but what I have found so far did not appear to be what I wanted.
For example, in such an environment:
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
Given a list/vector v of length n (I will set this n and v at the beginning), I want to get the polynomial v(x) as the sum of v[i]*x^i.
(Actually, after getting this v(x), I am going to build the quotient ring GF(4,'a')[x] / <x^n - v(x)>.) Then I will say:
S = R.quotient(x^n-v(x), 'y')
y = S.gen()
But I couldn't write it.
This is a frequently asked question in many places, so it is better to leave the answer here even though it is very simple:
I just wrote R(v) and it gave me the polynomial:
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
R(v)
x^10 + a*x^8 + a*x^7 + x^6 + x^5 + x^4 + a*x + 1
Basically (that is, ignoring the specifics of your polynomial ring), you have a list/vector v of length n and you require a polynomial which is the sum of all v[i]*x^i. Note that this sum equals the matrix product V.X, where V is a one-row matrix (essentially equal to the vector v) and X is a column matrix consisting of powers of x. In Maxima you could write:
v: [1,a,0,0,1,1,1,a,a,0,1]$
n: length(v)$
V: matrix(v)$
X: genmatrix(lambda([i,j], x^(i-1)), n, 1)$
V.X;
The output is
x^10+a*x^8+a*x^7+x^6+x^5+x^4+a*x+1
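As a side note, the same sum-of-v[i]*x^i construction can be sketched in plain Python with SymPy; this illustration uses placeholder integer coefficients, since GF(4) elements like a would still need Sage:

from sympy import symbols, Add

x = symbols('x')
# placeholder 0/1 coefficients standing in for the GF(4) list above
v = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1]
p = Add(*(c * x**i for i, c in enumerate(v)))
print(p)  # x**10 + x**6 + x**5 + x**4 + 1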

Accumulating Curried Function (SML)

I have a set of problems that I've been working through and can't seem to understand what the last one is asking. Here is the first problem, and my solution to it:
a) Often we are interested in computing ∑i=m..n f(i), the sum of function values f(i) for i = m through n. Define sigma f m n which computes ∑i=m..n f(i). This is different from defining sigma (f, m, n).
fun sigma f m n = if (m=n) then f(m) else (f(m) + sigma f (m+1) n);
The second problem, and my solution:
b) In the computation of sigma above, the index i goes from the current value i to the next value i+1. We may want to compute the sum of f(i) where i goes from the current i to the next, say i+2, not i+1. If we send this information as an argument, we can compute a more generalized summation. Define 'sum f next m n' to compute such a summation, where 'next' is a function to compute the next index value from the current index value. To get 'sigma' in (a), you send the successor function as 'next'.
fun sum f next m n = if (m>=n) then f(m) else (f(m) + sum f (next) (next(m)) n);
And the third problem, with my attempt:
c) Generalizing sum in (b), we can compute not only summation but also product and other forms of accumulation. If we want to compute sum in (b), we send addition as an argument; if we want to compute the product of function values, we send multiplication as an argument for the same parameter. We also have to send the identity of the operator. Define 'accum h v f next m n' to compute such accumulation, where h is a two-variable function to do accumulation, and v is the base value for accumulation. If we send the multiplication function for h, 1 for v, and the successor function as 'next', this 'accum' computes ∏i=m..n f(i). Create examples whose 'h' is not addition or multiplication, too.
fun accum h v f next m n = if (m>=n) then f(m) else (h (f(m)) (accum (h) (v) (f) (next) (next(m)) n));
In problem (c), I'm unsure of what I'm supposed to do with my "v" argument. Right now the function will take any interval of numbers m to n and apply any kind of operation to them. For example, I could call my function
accum mult (4?) double next3 1 5;
where double is a doubling function and next3 adds 3 to a given value. Any ideas on how I'm supposed to utilize the v value?
This set of problems is designed to lead to an implementation of an accumulation function. It takes:
h - combines the previous value and the current value to produce the next value
v - starting value for h
f - function applied to each value from the [m, n) interval before it is passed to h
next - computes the next index value in the sequence
m and n - the boundaries
Here is how I'd define accum:
fun accum h v f next m n = if m >= n then v else accum h (h (f m) v) f next (next m) n
Examples like those described in (c) will look like this (note that the successor function next must also be defined):
fun sum x y = x + y;
fun mult x y = x * y;
fun id x = x;
fun next x = x + 1;
accum sum 0 id next 1 10; (* sum over [1, 10) starting from 0 *)
accum mult 1 id next 1 10; (* product over [1, 10) starting from 1 *)
For example, you can calculate the sum of the numbers in [1, 10) plus 5 by passing 5 as v in the first example.
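If it helps to see the same shape outside SML, here is a rough Python analogue of this accum, assuming the same [m, n) interval semantics (the argument order passed to h only matters for non-commutative operators):

def accum(h, v, f, nxt, m, n):
    # v is the running accumulator, seeded with the operator's base value
    while m < n:
        v = h(v, f(m))
        m = nxt(m)
    return v

print(accum(lambda a, b: a + b, 0, lambda x: x, lambda x: x + 1, 1, 10))  # 45
print(accum(lambda a, b: a * b, 1, lambda x: x, lambda x: x + 1, 1, 10))  # 362880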
The instructions will make more sense if you consider the possibility of an empty interval.
The "sum" of a single value n is n. The sum of no values is zero.
The "product" of a single value n is n. The product of no values is one.
A list of a single value n is [n] (n::nil). A list of no values is nil.
Currently, you're assuming that m ≤ n, and treating m = n as a special case that returns f m. Another approach is to treat m > n as the special case, returning v. Then, when m = n, your function will automatically return h v (f m), which is the same as (f m) (provided that v was selected properly for this h).
To be honest, though, I think the v-less approach is fine when the function's arguments specify an interval of the form [m,n], since there's no logical reason that such a function would support an empty interval. (I mean, [m,m−1] isn't so much "the empty interval" as it is "obvious error".) The v-ful approach is chiefly useful when the function's arguments specify a list or set of elements in some way that really could conceivably be empty, e.g. as an 'a list.

Minimum Weight Triangulation Taking Forever

So I've been working on a program in Python that finds the minimum weight triangulation of a convex polygon. This means that it finds the weight (the sum of all the triangle perimeters), as well as the list of chords (lines going through the polygon that break it up into triangles, not the boundary edges).
I was under the impression that I'm using the dynamic programming algorithm, but when I tried a somewhat more complex polygon, it takes forever (I'm not sure how long it takes, because I haven't gotten it to finish).
It works fine with a 10-sided polygon, but the 25-sided one makes it stall. My teacher gave me the polygons, so I assume the 25-sided one is supposed to work as well.
Since this algorithm is supposed to be O(n^3), the 25-sided polygon should take roughly (25/10)^3 ≈ 15.625 times longer to calculate; however, it's taking way longer, given that the 10-sided one seems instantaneous.
Am I doing some sort of extra operation in there that I'm not noticing? I can't see anything, except maybe the last part where I remove duplicates by turning the list into a set; however, I put a trace after the decomp call and before that conversion happens, and it's not even reaching that point.
Here's my code; if you need any more info, please ask. Something in there is making it take longer than O(n^3) and I need to find it so I can trim it out.
#!/usr/bin/python
import math

def cost(v):
    # perimeter of the triangle with vertices v[0], v[1], v[2]
    ab = math.sqrt(((v[0][0] - v[1][0])**2) + ((v[0][1] - v[1][1])**2))
    bc = math.sqrt(((v[1][0] - v[2][0])**2) + ((v[1][1] - v[2][1])**2))
    ac = math.sqrt(((v[0][0] - v[2][0])**2) + ((v[0][1] - v[2][1])**2))
    return ab + bc + ac

def triang_to_chord(t, n):
    if t[1] == t[0] + 1:
        # a and b adjacent
        if t[2] == t[1] + 1:
            # single chord: b and c adjacent
            return ((t[0], t[2]), )
        elif t[2] == n-1 and t[0] == 0:
            # single chord: c and a adjacent
            return ((t[1], t[2]), )
        else:
            # double chord
            return ((t[0], t[2]), (t[1], t[2]))
    elif t[2] == t[1] + 1:
        # b and c adjacent
        if t[0] == 0 and t[2] == n-1:
            # single chord: c and a adjacent
            return ((t[0], t[1]), )
        else:
            # double chord
            return ((t[0], t[1]), (t[0], t[2]))
    elif t[0] == 0 and t[2] == n-1:
        # c and a adjacent: double chord
        return ((t[0], t[1]), (t[1], t[2]))
    else:
        # triple chord
        return ((t[0], t[1]), (t[1], t[2]), (t[0], t[2]))
file_name = raw_input("Enter the polygon file name: ").rstrip()
file_obj = open(file_name)
vertices_raw = file_obj.read().split()
file_obj.close()
vertices = []
for i in range(len(vertices_raw)):
    if i % 2 == 0:
        vertices.append((float(vertices_raw[i]), float(vertices_raw[i+1])))
n = len(vertices)
def decomp(i, j):
    if j <= i: return (0, [])
    elif j == i+1: return (0, [])
    cheap_chord = [float("infinity"), []]
    old_cost = cheap_chord[0]
    smallest_k = None
    for k in range(i+1, j):
        old_cost = cheap_chord[0]
        itok = decomp(i, k)
        ktoj = decomp(k, j)
        cheap_chord[0] = min(cheap_chord[0],
                             cost((vertices[i], vertices[j], vertices[k])) + itok[0] + ktoj[0])
        if cheap_chord[0] < old_cost:
            smallest_k = k
            cheap_chord[1] = itok[1] + ktoj[1]
    temp_chords = triang_to_chord(sorted((i, j, smallest_k)), n)
    for c in temp_chords:
        cheap_chord[1].append(c)
    return cheap_chord

results = decomp(0, len(vertices) - 1)
chords = set(results[1])
print "Minimum sum of triangle perimeters = ", results[0]
print len(chords), "chords are:"
for c in chords:
    print " ", c[0], " ", c[1]
I'll add the polygons I'm using. Again, the first one is solved right away, while the second one has been running for about 10 minutes so far.
FIRST ONE:
202.1177 93.5606
177.3577 159.5286
138.2164 194.8717
73.9028 189.3758
17.8465 165.4303
2.4919 92.5714
21.9581 45.3453
72.9884 3.1700
133.3893 -0.3667
184.0190 38.2951
SECOND ONE:
397.2494 204.0564
399.0927 245.7974
375.8121 295.3134
340.3170 338.5171
313.5651 369.6730
260.6411 384.6494
208.5188 398.7632
163.0483 394.1319
119.2140 387.0723
76.2607 352.6056
39.8635 319.8147
8.0842 273.5640
-1.4554 226.3238
8.6748 173.7644
20.8444 124.1080
34.3564 87.0327
72.7005 46.8978
117.8008 12.5129
162.9027 5.9481
210.7204 2.7835
266.0091 10.9997
309.2761 27.5857
351.2311 61.9199
377.3673 108.9847
390.0396 148.6748
It looks like you have an issue with inefficient recursion here:
...
def decomp(i, j):
    ...
    for k in range(i+1, j):
        ...
        itok = decomp(i, k)
        ktoj = decomp(k, j)
        ...
...
You've run into the same kind of issue as a naive recursive implementation of the Fibonacci numbers, but the way this algorithm works, it's probably much worse for the run time. Assuming that is the only issue with your algorithm, you just need to use memoization to ensure that decomp is only calculated once for each unique input.
The way to spot this issue is to print out the values of i, j, and k as the triple (i, j, k). To obtain a runtime of O(N^3), you shouldn't see the same exact triple twice. However, the triple (22, 24, 23) appears at least twice (for the 25-sided polygon) and is the first such duplicate. That shows the algorithm is calculating the same thing multiple times, which is inefficient and pushes the performance well past O(N^3). I'll leave figuring out the algorithm's actual performance to you as an exercise. Assuming there isn't something else wrong with the algorithm, it should eventually stop.
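As a sketch of that memoization fix, here is the pattern on the Fibonacci analogy above; decomp(i, j) depends only on its arguments, so the same decorator would apply to it (shown commented out, since the rest of the program is unchanged):

def memoize(fn):
    cache = {}
    def wrapper(*args):
        # compute each unique argument tuple only once
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # finishes instantly with the cache; hopeless without it

# The same pattern would apply to the question's decomp:
# @memoize
# def decomp(i, j): ...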
