Understanding how the Gram-Schmidt process is translated into this piece of code (math vs. implementation)

I am trying to understand the Gram-Schmidt process from this explanation:
http://mlwiki.org/index.php/Gram-Schmidt_Process
The steps of the calculation make sense to me. However, the Python implementation included in the same article doesn't seem to match them.
import numpy as np

def normalize(v):
    return v / np.sqrt(v.dot(v))

n = len(A)
A[:, 0] = normalize(A[:, 0])

for i in range(1, n):
    Ai = A[:, i]
    for j in range(0, i):
        Aj = A[:, j]
        t = Ai.dot(Aj)
        Ai = Ai - t * Aj
    A[:, i] = normalize(Ai)
From the above code I can see the dot product of v1 and b being taken (t = Ai.dot(Aj)), but the (v1, v1) part never appears as a denominator. The equation I am referring to is

    v2 = b - (b.v1 / v1.v1) * v1

I wonder how this equation is translated into the code inside the for loop?

This is what the code does, exactly:
it normalizes the previous vector (a column of A), projects the current vector onto it, and subtracts that projection from the current vector.
Normalization happens for every vector, which keeps the calculation tidy.
The v2 equation above doesn't normalize the previous vector, hence the difference: once v1 is normalized, v1.v1 = 1, so the denominator is simply not needed.
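Here is a minimal numeric check of that point (the variable names are just illustrative): once v1 is normalized, dividing by v1.v1 becomes a no-op, and only the dot product t remains.

import numpy as np

v1 = np.array([3.0, 4.0])
b = np.array([2.0, 1.0])

u1 = v1 / np.sqrt(v1.dot(v1))                          # normalized v1, so u1.dot(u1) == 1

with_denominator = b - (b.dot(v1) / v1.dot(v1)) * v1   # textbook formula
without_denominator = b - b.dot(u1) * u1               # what the loop computes

print(np.allclose(with_denominator, without_denominator))   # True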

Try this vectorized implementation.
I would also suggest going through David C. Lay's book for the theory.
import numpy as np

def replace_zero(array):
    # replace zeros with ones so a later division by this array is safe
    for i in range(len(array)):
        if array[i] == 0:
            array[i] = 1
    return array
def gram_schmidt(A, norm=True, row_vect=False):
    """Orthonormalizes vectors by the Gram-Schmidt process

    Parameters
    ----------
    A : ndarray
        Matrix having vectors in its columns
    norm : bool
        Whether to return normalized (unit) vectors
    row_vect : bool
        Whether matrix A has its vectors in its rows

    Returns
    -------
    G : ndarray
        Matrix of orthogonal vectors

    Gram-Schmidt Process
    --------------------
    The Gram-Schmidt process is a simple algorithm for
    producing an orthogonal or orthonormal basis for any
    nonzero subspace of Rn.
    Given a basis {x1, ..., xp} for a nonzero subspace W of Rn,
    define

        v1 = x1
        v2 = x2 - (x2.v1 / v1.v1) * v1
        v3 = x3 - (x3.v1 / v1.v1) * v1 - (x3.v2 / v2.v2) * v2
        ...
        vp = xp - (xp.v1 / v1.v1) * v1 - (xp.v2 / v2.v2) * v2
                - ... - (xp.v(p-1) / v(p-1).v(p-1)) * v(p-1)

    Then {v1, ..., vp} is an orthogonal basis for W.
    In addition,

        Span {v1, ..., vk} = Span {x1, ..., xk} for 1 <= k <= p

    References
    ----------
    Linear Algebra and Its Applications - David C. Lay
    """
    if row_vect:
        # if True, transpose it to make a column-vector matrix
        A = A.T

    no_of_vectors = A.shape[1]
    G = A[:, 0:1].copy()  # copy the first vector in the matrix
    # 0:1 keeps the dimensions consistent - [[1, 2, 3]]

    # iterate from the 2nd vector to the last one
    for i in range(1, no_of_vectors):
        # calculate the weights (coefficients) for every vector in G
        numerator = A[:, i].dot(G)
        denominator = np.diag(np.dot(G.T, G))  # the diagonal holds G_j.G_j
        weights = np.squeeze(numerator / denominator)

        # projection of the current vector onto the subspace spanned by G
        projected_vector = np.sum(weights * G,
                                  axis=1,
                                  keepdims=True)

        # component of the current vector orthogonal to the subspace G
        orthogonalized_vector = A[:, i:i+1] - projected_vector

        # now add the orthogonal vector to our set
        G = np.hstack((G, orthogonalized_vector))

    if norm:
        # to get orthonormal vectors (unit orthogonal vectors),
        # replace zeros by 1 to avoid division by 0 when the matrix
        # has a zero vector or a normalization value comes out as zero
        G = G / replace_zero(np.linalg.norm(G, axis=0))

    if row_vect:
        return G.T
    return G
G = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]])
gram_schmidt(G)
> array([[ 0.5       , -0.8660254 ,  0.        ],
        [ 0.5       ,  0.28867513, -0.81649658],
        [ 0.5       ,  0.28867513,  0.40824829],
        [ 0.5       ,  0.28867513,  0.40824829]])
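As a quick sanity check (a sketch, not part of the original answer), the columns of the result should be orthonormal:

import numpy as np

A = np.array([[1, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]])
G = gram_schmidt(A)
print(np.allclose(G.T @ G, np.eye(3)))   # True: columns are orthonormal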

Related

Could not find the optimal solution after adding constraints

My code is as follows:
import numpy as np
from gekko import GEKKO

gekko = GEKKO(remote=True)

# create variables: each variable is a vector, and each element
# of the vector is a binary
s = []
for i in range(N):
    s.append(gekko.Array(gekko.Var, s_len[i], value=0, lb=0, ub=1, integer=True))

# some constants used in the objective/constraint functions
c, d, r, m, L = create_c_d_r_m_L()  # they are all numpy ndarrays

# define the objective function
def objective():
    obj = 0
    for i in range(N):
        obj += np.dot(s[i], c[i]) + np.dot(s[i], d[i])
    for idx, (i, j) in enumerate(E):
        obj += np.dot(np.dot(s[i], r[idx].reshape(s_len[i], s_len[j])),
                      s[j])  # s[i] * r[i, j] * s[j]
    return obj

# add constraints
# (a) each vector can only have, and must have, one 1
for i in range(N):
    gekko.Equation(gekko.sum(s[i]) == 1)

# (b)
for t in range(N):
    peak_mem = gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
    gekko.Equation(peak_mem < DEVICE_MEM)
    # DEVICE_MEM is a predefined big int

# solve
gekko.Obj(objective())
gekko.solve(disp=True)
I found that when removing constraint (b), the solver can output the optimal solution for s. However, if we add (b) and set DEVICE_MEM to a very large number (which should not affect the solution), s is no longer optimal. I'm wondering whether I am doing something wrong here, because I tried both APOPT (solvertype=1) and IPOPT (solvertype=3) and they give the same non-optimal results.
To give more context: this is an optimization over a graph. N is the number of nodes in the graph, and E is the set of all edges. c, d, m are three types of per-node cost, and r is the cost of edges. Each node has multiple strategies (represented by the vector s[i]), and we need to select the best strategy for each node so that the overall cost is minimal.
Detailed constants:
# s_len: record the length of each vector
# (the # of strategies for each node,
# here we assume the length are all 10)
s_len = np.ones(N) * 10
# c, d, m are the costs of each node
# let's assume the c/d/m cost for i node is just i
c, d, m = [], [], []
for i in range(N):
c[i] = s_len[i] * [i]
d[i] = s_len[i] * [i]
m[i] = s_len[i] * [i]
# r is the edge cost, let's assume the cost for
# each edge is just i * j
r = []
for (i,j) in E: # E records all edges
cur_r = s_len[i] * s_len[j] * [i*j]
r.append(cur_r)
# L contains the node ids, we just randomly generate 10 integers here
L = []
for i in range(N):
cur_L = [randrange(N) for _ in range(10)]
L.append(cur_L)
I've been stuck on this for a while and any comments/answers are highly appreciated! Thanks!
Try reframing the inequality constraint:

for t in range(N):
    peak_mem = gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
    gekko.Equation(peak_mem < DEVICE_MEM)

as a variable with an upper bound:

peak_mem = gekko.Array(gekko.Var, N, ub=DEVICE_MEM)
for t in range(N):
    gekko.Equation(peak_mem[t] ==
                   gekko.sum([np.dot(s[i], m[i]) for i in L[t]]))

The N inequality constraints peak_mem < DEVICE_MEM are converted internally into equality constraints with slack variables, slack = DEVICE_MEM - peak_mem, plus the simple inequality slack >= 0 on each slack variable. If the inequality constraint is far from the bound, the slack variable can become very large. Formulating the expression as a bounded variable may help.
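For illustration, the same idea can also be written with an explicit nonnegative slack variable (a sketch using the names from the question, untested against the full model):

# one nonnegative slack per constraint; used memory plus slack equals the budget
slack = gekko.Array(gekko.Var, N, lb=0)
for t in range(N):
    gekko.Equation(gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
                   + slack[t] == DEVICE_MEM)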
I used the information in the question to pose a minimal problem that could reproduce the error and test the potential solution. If you need more specific suggestions, please modify the code into a complete, minimal example that reproduces the error; that helps with verifying the solution.

CVXPY violates constraints when it solves SDP

Let's say that I want to solve the following problem.
minimize Tr(CY)
s.t.     Y = xxT
         each entry of x is 0 or 1

where xxT denotes the outer product of the (n-1)-dimensional vector x, and C is an (n-1) by (n-1) square matrix. To convert this problem to one with a single matrix variable, I can write the following code using cvxpy.
import cvxpy as cp
import numpy as np

n = 8
np.random.seed(1)

S = np.zeros(shape=(n, n))
S[n-1, n-1] = 1
C = np.zeros(shape=(n, n))
C[:n-1, :n-1] = np.random.randn(n-1, n-1)

X = cp.Variable((n, n), PSD=True)
constraints = []
constraints.append(cp.trace(S @ X) == 1)

for i in range(n-1):
    Q = np.zeros(shape=(n, n))
    Q[i, i] = 1
    Q[-1, i] = -0.5
    Q[i, -1] = -0.5
    const = cp.trace(Q @ X) == 0
    constraints.append(const)

prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve(solver=cp.MOSEK)

print("X is")
print(X.value)
print("C is")
print(C)
To satisfy the binary constraint that the entries of the vector x should be one or zero, I added some constraints on the matrix variable X:

X = [ Y   x ;
      xT  1 ]

Tr(QX) == 0

There are n-1 Q matrices, which force the vector x's entries to be 0 or 1.
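To see what each of those trace constraints computes, here is a small numeric check in plain numpy (the names are illustrative): with X = [Y x; xT 1], Tr(QX) reduces to Y_ii - x_i, so together with Y = xxT it would force x_i^2 = x_i, i.e. x_i in {0, 1}.

import numpy as np

n = 4
x = np.random.rand(n - 1)
Y = np.outer(x, x)
X = np.block([[Y, x[:, None]],
              [x[None, :], np.ones((1, 1))]])

i = 1
Q = np.zeros((n, n))
Q[i, i] = 1
Q[-1, i] = -0.5
Q[i, -1] = -0.5

print(np.isclose(np.trace(Q @ X), Y[i, i] - x[i]))   # True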
However, when I ran this simple code, the constraints were severely violated.
I'm looking forward to any suggestions or comments on this.

Laplace expansion for determinants with Octave

I'm trying to calculate the determinant of a matrix using Laplace expansion in Octave. I use two functions:
submatrix, which returns the submatrix of a matrix after removing the given row and column:
function A = submatrix(A, i, j)
  A(i,:) = [];
  A(:,j) = [];
endfunction
determinant, which recursively calculates the determinant:
function d = determinant(A)
  [m, n] = size(A);
  if m == 2
    # Base case: 2x2 matrices
    d = A(1,1)*A(2,2) - A(1,2)*A(2,1);
  else
    d = 0;
    for j = 1:n
      d = d + (-1).^(1+j)*determinant(submatrix(A,1,j));
    endfor
  endif
endfunction
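For reference, the cofactor (Laplace) expansion along the first row that these two functions are meant to implement is

det(A) = sum over j = 1..n of (-1)^(1+j) * A(1,j) * det( submatrix(A, 1, j) )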
The function works correctly for 3 by 3 matrices, yet it always returns 0 (or -0) for bigger matrices (4 by 4 and bigger).
Question: why does determinant return 0 for matrices bigger than 3 by 3?

Given a list of coefficients, create a polynomial

I want to create a polynomial with given coefficients. This seems very simple, but what I have found so far did not appear to be what I wanted.
For example, in an environment such as:
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
Given a list/vector v of length n (I will set this n and v at the beginning), I want to get the polynomial v(x) = sum of v[i]*x^i.
(Actually, after getting this v(x), I am going to build the quotient ring GF(4,'a')[x] / <x^n - v(x)>, i.e. I will say

S = R.quotient(x^n - v(x), 'y')
y = S.gen()

But I couldn't write v(x).)
This is a frequently asked question in many places, so it is better to leave the answer here even though it is very simple:
I just wrote R(v) and it gave me the polynomial:
n = 11
K = GF(4,'a')
R = PolynomialRing(GF(4,'a'),"x")
x = R.gen()
a = K.gen()
v = [1,a,0,0,1,1,1,a,a,0,1]
R(v)
x^10 + a*x^8 + a*x^7 + x^6 + x^5 + x^4 + a*x + 1
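Equivalently (a minimal sketch in the same Sage session), the polynomial can be built with an explicit sum over the coefficients:

v_poly = sum(c * x**i for i, c in enumerate(v))
v_poly == R(v)   # True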
Basically (that is, ignoring the specifics of your polynomial ring), you have a list/vector v of length n and you require the polynomial which is the sum of all v[i]*x^i. Note that this sum equals the matrix product V.X, where V is a one-row matrix (essentially the vector v) and X is a column matrix consisting of powers of x. In Maxima you could write
v: [1,a,0,0,1,1,1,a,a,0,1]$
n: length(v)$
V: matrix(v)$
X: genmatrix(lambda([i,j], x^(i-1)), n, 1)$
V.X;
The output is
x^10 + a*x^8 + a*x^7 + x^6 + x^5 + x^4 + a*x + 1

How to compute the original vector from a distance matrix?

I have a small question about vectors and matrices.
Suppose a vector V = {v1, v2, ..., vn}. I generate an n-by-n distance matrix M defined as

M_ij = |v_i - v_j|, for i, j in [1, n].

That is, each element M_ij of the square matrix is the absolute distance between two elements of V.
For example, for the vector V = {1, 3, 3, 5} the distance matrix is

M = [ 0 2 2 4 ;
      2 0 0 2 ;
      2 0 0 2 ;
      4 2 2 0 ]
It seems pretty simple. Now comes the question: given such a matrix M, how can one obtain the initial V?
Thank you.
Based on some answers to this question, it seems that the answer is not unique. So now suppose that the initial vector has been normalized to 0 mean and 1 variance. The question becomes: given such a symmetric distance matrix M, how can one determine the initial normalized vector?
You can't. To give you an idea why, consider these two cases:

V1 = {1, 2, 3}
M1 = [ 0 1 2 ; 1 0 1 ; 2 1 0 ]

V2 = {3, 4, 5}
M2 = [ 0 1 2 ; 1 0 1 ; 2 1 0 ]

As you can see, a single M can be the result of more than one V. Therefore, you can't map backwards.
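The counterexample is easy to verify numerically (a short numpy sketch):

import numpy as np

V1 = np.array([1, 2, 3])
V2 = np.array([3, 4, 5])
M1 = np.abs(V1[:, None] - V1[None, :])   # pairwise distance matrix of V1
M2 = np.abs(V2[:, None] - V2[None, :])   # pairwise distance matrix of V2
print(np.array_equal(M1, M2))            # True: one M, two different V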
There is no way to determine the answer uniquely, since the distance matrix is invariant to adding a constant to all elements and to multiplying all the values by -1. However, if you assume that element 1 equals 0 and that the first nonzero element is positive, you can find an answer. Here is the pseudocode:
# Assume v[1] is 0
v[1] = 0
# e is the value of the first non-zero vector element
e = 0
# ei is the index of the first non-zero vector element
ei = 0
for i = 2...n:
    # if all vector elements have been 0 so far
    if e == 0:
        # take the current distance from element 1 and remember its index;
        # this new element may still be 0
        e = d[1,i]
        ei = i
        v[i] = e
    elseif d[ei,i] == d[1,i] + v[ei]:  # v[i] <= v[1]
        # v[i] is to the left of v[1] (given that v[ei] > v[1])
        v[i] = -d[1,i]
    else:
        # otherwise v[i] is to the right of v[1]
        v[i] = d[1,i]
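Here is a short Python sketch of the same idea (my own reading of the pseudocode above, so treat it as illustrative): pin v[0] to 0, place the first point at a nonzero distance on the positive side, and use the distance to that anchor to pick the sign of every other element. The result is the original vector up to translation and reflection.

import numpy as np

def reconstruct(M):
    # recover v with |v[i] - v[j]| == M[i][j], up to translation and reflection
    n = len(M)
    v = np.zeros(n)
    nonzero = [i for i in range(1, n) if M[0][i] != 0]
    if not nonzero:
        return v                 # all points coincide
    ei = nonzero[0]
    v[ei] = M[0][ei]             # fix the reflection: anchor on the positive side
    for i in range(1, n):
        if i == ei:
            continue
        # v[i] is +/- M[0][i]; the distance to the anchor resolves the sign
        if abs(M[0][i] - v[ei]) == M[ei][i]:
            v[i] = M[0][i]
        else:
            v[i] = -M[0][i]
    return v

M = np.array([[0, 2, 2, 4],
              [2, 0, 0, 2],
              [2, 0, 0, 2],
              [4, 2, 2, 0]])
print(reconstruct(M))   # [0. 2. 2. 4.] -- a translate of {1, 3, 3, 5}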
I don't think it is possible to find the original vector, but you can find a translation of it by taking the first row of the matrix.
If you let M_ij = |v_i - v_j| and translate all v_k, k in [1, n], by a constant t, you get

M_ij = |(v_i + t) - (v_j + t)| = |v_i - v_j|

Hence, just take the first row as the vector and then find an initial point to translate the vector to.
Correction:
Let v_1 = 0, let l_k = |v_k| for k in [2, n], and let p_k be the sign of v_k, with p_1 = 1.

for(int i = 2; i < n; i++)
    if( |l_i - l_(i+1)| != M_i(i+1) )
        p_(i+1) = -p_i
    else
        p_(i+1) = p_i

Doing this for all v_k, k in [2, n], in order gives the sign of each v_k relative to the others. Then you can recover a translation of the original vector, with either the same or the opposite direction.
Update (for the normalized vector):
Let d = sqrt(v_1^2 + v_2^2 + ... + v_n^2). Then the vector is

{0, v_1/d, v_2/d, ..., v_n/d}

or

{0, -v_1/d, -v_2/d, ..., -v_n/d}
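For the normalized case asked about in the edit (0 mean and 1 variance), one option is to standardize the reconstructed vector afterwards; assuming the original vector really was standardized, this recovers it exactly, up to the overall sign. A sketch, reusing the reconstruct function from above:

v = reconstruct(M)
v_norm = (v - v.mean()) / v.std()   # zero mean, unit variance
# -v_norm is the other valid answer (the reflection)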
