I’m having the following problem, which can be reduced from my code to:
set t:= 1..5; #Time periods
set e:= 1..2; #Inventory places
set p:= 1..3; #Products
var Iq{p,e,t} >= 0; #Inventory variable
#Moving variables:
# i for sums in t
# g for sums in e
# j for sums in p
subject to inventory_balance {j in p, i in t}:
sum{g in e} Iq[j,g,i] = sum{g in e} Iq[j,g,i-1] + sum{x in k} A[j,i,x] * Mt[i] - DS[i,j] ;
This is the inventory balance: the sum over "g in e" of the inventory level at time i equals the same sum at time i-1 plus other things. The problem is the "i-1" time period. The first iteration is for time i=1, so the right-hand side refers to time t=0. I know that in that period (t=0) the amount of inventory is 0. So the question is: how can I fix the variable Iq[p,e,0] to 0 in a constraint?
Thanks in advance!
The easiest way to do it from where you are is:
set t := 0..5;
...
subject to starting_inventory_zero {j in p, k in e}: Iq[j,k,0] = 0;
and then tweak the indexing in inventory_balance to
{j in p, i in t: i > 0}
But if it were my code I'd do it with ordered sets:
set t ordered := 0..5;
...
subject to starting_inventory_zero {j in p, k in e}: Iq[j,k,first(t)] = 0;
...
subject to inventory_balance {j in p, i in t: ord(i) > 1}:
sum{g in e} Iq[j,g,i] = sum{g in e} Iq[j,g,prev(i)] + sum{x in k} A[j,i,x] * Mt[i] - DS[i,j] ;
This does the same thing, but it generalises better. For instance, I could define my index set t as {JAN_2001, FEB_2001, ..., DEC_2016} and the code above would still work. (Unless I've made some typos, which is always possible!)
My code is as follows:
gekko = GEKKO(remote=True)
# create variables; each variable is a vector, and each
# element of the vector is a binary
s = []
for i in range(N):
    s.append(gekko.Array(gekko.Var, s_len[i], value=0, lb=0, ub=1, integer=True))
# some constants used in the objective/constraint functions
c, d, r, m, L = create_c_d_r_m_L()  # they are all numpy ndarrays
# define the objective function
def objective():
    obj = 0
    for i in range(N):
        obj += np.dot(s[i], c[i]) + np.dot(s[i], d[i])
    for idx, (i, j) in enumerate(E):
        obj += np.dot(np.dot(s[i], r[idx].reshape(s_len[i], s_len[j])),
                      s[j])  # s[i] * r[i, j] * s[j]
    return obj
# add constraints
# (a) each vector can only have, and must have, one 1
for i in range(N):
    gekko.Equation(gekko.sum(s[i]) == 1)
# (b)
for t in range(N):
    peak_mem = gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
    gekko.Equation(peak_mem < DEVICE_MEM)
    # DEVICE_MEM is a predefined big int
# solve
gekko.Obj(objective())
gekko.solve(disp=True)
I found that when I remove constraint (b), the solver outputs the optimal solution for s. However, if I add (b) and set DEVICE_MEM to a very large number (which should not affect the solution), s is no longer optimal. I wonder whether I am doing something wrong here, because I tried both APOPT (solvertype=1) and IPOPT (solvertype=3) and they give the same non-optimal results.
To give more context: this is an optimization over a graph. N is the number of nodes in the graph. E is the set of all edges in the graph. c, d, m are three types of cost of a node. r is the cost of edges. Each node has multiple strategies (represented by the vector s[i]), and we need to select the best strategy for each node so that the overall cost is minimal.
Detailed constants:
from random import randrange

# s_len: records the length of each vector
# (the number of strategies for each node;
# here we assume the lengths are all 10)
s_len = np.ones(N, dtype=int) * 10
# c, d, m are the costs of each node;
# let's assume the c/d/m cost for node i is just i
c, d, m = [], [], []
for i in range(N):
    c.append([i] * s_len[i])
    d.append([i] * s_len[i])
    m.append([i] * s_len[i])
# r is the edge cost; let's assume the cost for
# each edge is just i * j
r = []
for (i, j) in E:  # E records all edges
    cur_r = [i * j] * (s_len[i] * s_len[j])
    r.append(np.array(cur_r))  # ndarray, so it can be reshaped later
# L contains the node ids; we just randomly generate 10 integers here
L = []
for i in range(N):
    cur_L = [randrange(N) for _ in range(10)]
    L.append(cur_L)
I've been stuck on this for a while and any comments/answers are highly appreciated! Thanks!
Try reframing the inequality constraint:
for t in range(N):
    peak_mem = gekko.sum([np.dot(s[i], m[i]) for i in L[t]])
    gekko.Equation(peak_mem < DEVICE_MEM)
as a variable with an upper bound:
peak_mem = gekko.Array(gekko.Var, N, ub=DEVICE_MEM)
for t in range(N):
    gekko.Equation(peak_mem[t] ==
                   gekko.sum([np.dot(s[i], m[i]) for i in L[t]]))
The N inequality constraints peak_mem < DEVICE_MEM are converted to equality constraints with slack variables, slack[t] = DEVICE_MEM - peak_mem[t], plus a simple inequality constraint on the slack, slack[t] >= 0. If the inequality constraint is far from the bound, the slack variable can be very large. Formulating the quantity as a variable with an upper bound may help.
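As an illustration of the reformulation on a toy model (the numbers and variable names here are made up, not the asker's), the bound enters as a variable's upper bound instead of an explicit inequality:

from gekko import GEKKO

gk = GEKKO(remote=False)
x = gk.Var(lb=0)
y = gk.Var(lb=0)
peak = gk.Var(ub=10)           # the upper bound replaces peak <= 10
gk.Equation(peak == x + y)     # define peak with an equality
gk.Obj(-(x + 2 * y))           # maximize x + 2y
gk.solve(disp=False)
print(x.value[0], y.value[0])  # expect roughly x = 0, y = 10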
I tried using the information in the question to pose a minimal problem that could reproduce the error and the potential solution. If you need more specific suggestions, please modify the code to be a complete and minimal example that reproduces the error. This helps with verifying the solution.
I see there are a few other questions around this exercise, but none specifically asks what is meant within the hint... "define the state transformation in such a way that the product a*b^n is unchanged from state to state".
They also mention that this idea of using an "invariant quantity" is a powerful one with respect to "iterative algorithms". By the way, this problem calls for the design of a logarithmic-step exponentiation algorithm with a space complexity of O(1).
Mainly I just have no idea what is meant by this hint and am pretty confused. Can anyone give me a nudge toward what is meant here? The only things I can really find about "invariant quantities" are descriptions using examples from physics, which only makes the concept more opaque.
Exercise description in full:
Exercise 1.16: Design a procedure that evolves an iterative exponentiation process that uses successive squaring and uses a logarithmic number of steps, as does fast-expt. (Hint: Using the observation that (b^(n/2))^2 = (b^2)^(n/2), keep, along with the exponent n and the base b, an additional state variable a, and define the state transformation in such a way that the product a*b^n is unchanged from state to state.
At the beginning of the process a is taken to be 1, and the answer is given by the value of a at the end of the process. In general, the technique of defining an invariant quantity that remains unchanged from state to state is a powerful way to think about the design of iterative algorithms.)
Thanks in advance.
This is not about "logarithmic exponentiation"; that is vague and confusing terminology.
As the quote you provided says, it is about an exponentiation function that takes a logarithmic number of steps to get its final result.
So we want to develop a function, exp(b,n), which would take O(log n) time to finish.
The numbers involved are all integers.
So how do we calculate b^n? We notice that
b^n = b * b * b * b * ... * b
      \------ n times ------/        an O(n) time process
and, when n is even,
b^n = b * b * b * b * ... * b  *
      \----- n/2 times -----/
      b * b * b * b * ... * b
      \----- n/2 times -----/
    = (b^(n/2))^2      ; viewed from the side
    = (b^2)^(n/2)      ; viewed from above
and when n is odd,
b^n = b * b * b * b * ... * b  *
      \----- n/2 times -----/
      b * b * b * b * ... * b  *
      \----- n/2 times -----/
      b
      \-- 1 time --/
    = (b^(n/2))^2 * b
    = (b^2)^(n/2) * b
Note that both expressions fall into the same category if we write the first one as (b^2)^(n/2) * 1.
Also note that the equation
b^n = (b  )^(n  ) * { a where a = 1   }
    = (b^2)^(n/2) * { a where a = ... }
    = (b' )^(n' ) * { a' }
means that to calculate b^n * a is the same as to calculate b'^n' * a' with the changed values of { b' = b^2 ; n' = n/2 ; a' = {if {n is even} then {a} else {a * b}} }.
So we don't actually compute either of the sides of that equation. Instead we keep the triples { b, n, a } at each step and transform them according to that rule
{ b, n, a } --> { b', n', a' } --> ...
with the initial values of b and n as given to us, and the first a equal to 1; and know that the final result calculated from any of the triples would be the same if we'd actually calculated it somehow. We still don't know how exactly we'd do that; just that it would be the same. That's the "invariant" part.
So what is all this good for? Well, since the chain { n --> n/2 --> ... } will certainly reach a point where n == 1, we know that at that point we can break the chain, since
c^1 * d == c * d
and this one simple multiplication of these two numbers will produce the same result as the initial formula.
Which is the final result of the function.
And that's the meaning of the hint: maintain the state of the computation as a (here) triple of numbers (or just three variables named b, n, a), and implement your computation as a chain of state-transforming steps. When the chain is broken according to some test (here, n == 1), we've reached our destination and can calculate the final result according to some simple rule (here, c * d).
This gives us a nice and powerful methodology for problem solving.
Oh, and we also know that the length of that chain of changing states is O(log n), since we halve that n at each step.
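To make the state transformation concrete, here is a minimal Python sketch (the exercise itself is in Scheme; the function name is mine, and this variant runs the chain down to n == 0 rather than breaking at n == 1, so it also handles n == 0):

def fast_expt_iter(b, n):
    a = 1                          # invariant: a * b**n stays equal to the original b**n
    while n > 0:
        if n % 2 == 0:
            b, n = b * b, n // 2   # b^n     = (b^2)^(n/2)
        else:
            a, n = a * b, n - 1    # a * b^n = (a*b) * b^(n-1)
    return a

print(fast_expt_iter(3, 14))       # 4782969, i.e. 3**14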
When fast-expt executes, the trace looks something like this:
b^14 = (b^2)^7                        ; where: b' = b^2;     a = 1,             n = 7
b^14 = (b^2)^6 * b^2                  ; where: b' = b^2;     a = b^2,           n = 6
b^14 = ((b^2)^2)^3 * b^2              ; where: b' = (b^2)^2; a = b^2,           n = 3
b^14 = ((b^2)^2)^2 * b^2 * (b^2)^2    ; where: b' = (b^2)^2; a = b^2 * (b^2)^2, n = 2
Notice that all these statements have the general form b^n = b'^n' * a', and this does not change.
Is it safe to replace a/(b*c) with a/b/c when using integer division on positive integers a, b, c, or am I at risk of losing information?
I did some random tests and couldn't find an example of a/(b*c) != a/b/c, so I'm pretty sure it's safe but not quite sure how to prove it.
Thank you.
Mathematics
As mathematical expressions, ⌊a/(bc)⌋ and ⌊⌊a/b⌋/c⌋ are equivalent whenever b is nonzero and c is a positive integer (and in particular for positive integers a, b, c). The standard reference for these sorts of things is the delightful book Concrete Mathematics: A Foundation for Computer Science by Graham, Knuth and Patashnik. In it, Chapter 3 is mostly on floors and ceilings, and this is proved on page 71 as part of a far more general result, equation (3.10): ⌊f(x)⌋ = ⌊f(⌊x⌋)⌋ for any continuous, monotonically increasing function f with the property that f(x) being an integer implies x is an integer.
In (3.10), you can define x = a/b (mathematical, i.e. real division) and f(x) = x/c (exact division again), and plug those into ⌊f(x)⌋ = ⌊f(⌊x⌋)⌋ (after verifying that the conditions on f hold here) to get ⌊a/(bc)⌋ on the LHS equal to ⌊⌊a/b⌋/c⌋ on the RHS.
If we don't want to rely on a reference in a book, we can prove ⌊a/(bc)⌋ = ⌊⌊a/b⌋/c⌋ directly using their methods. Note that with x = a/b (the real number), what we're trying to prove is that ⌊x/c⌋ = ⌊⌊x⌋/c⌋. So:
if x is an integer, then there is nothing to prove, as x = ⌊x⌋.
Otherwise, ⌊x⌋ < x, so ⌊x⌋/c < x/c, which means that ⌊⌊x⌋/c⌋ ≤ ⌊x/c⌋. (We want to show they're equal.) Suppose, for the sake of contradiction, that ⌊⌊x⌋/c⌋ < ⌊x/c⌋. Then there must be a number y such that ⌊x⌋ < y ≤ x and y/c = ⌊x/c⌋. (As we increase a number from ⌊x⌋ to x and consider division by c, somewhere we must hit the exact value ⌊x/c⌋.) But this means that y = c*⌊x/c⌋ is an integer between ⌊x⌋ and x, which is a contradiction, as no integer is strictly greater than ⌊x⌋ yet at most x!
This proves the result.
Programming
#include <stdio.h>

int main() {
    unsigned int a = 142857;
    unsigned int b = 65537;
    unsigned int c = 65537;
    printf("a/(b*c) = %u\n", a / (b * c));
    printf("a/b/c = %u\n", a / b / c);
}
prints (with 32-bit integers),
a/(b*c) = 1
a/b/c = 0
(I used unsigned integers as overflow behaviour for them is well-defined, so the above output is guaranteed. With signed integers, overflow is undefined behaviour, so the program can in fact print (or do) anything, which only reinforces the point that the results can be different.)
But if you don't have overflow, then the values you get in your program are equal to their mathematical values (that is, a/(b*c) in your code is equal to the mathematical value ⌊a/(bc)⌋, and a/b/c in code is equal to the mathematical value ⌊⌊a/b⌋/c⌋), which we've proved are equal. So it is safe to replace a/(b*c) in code by a/b/c when b*c is small enough not to overflow.
While b*c could overflow (in C) for the original computation, a/b/c can't overflow, so we don't need to worry about overflow for the forward replacement a/(b*c) -> a/b/c. We would need to worry about it the other way around, though.
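If you want to see that wraparound without compiling C, here is a small Python sketch that reproduces it by masking the product to 32 bits (Python integers never overflow, so the mask is applied by hand):

a, b, c = 142857, 65537, 65537
MASK = 0xFFFFFFFF                # 32-bit unsigned wraparound
print(a // ((b * c) & MASK))     # 1, because b*c wraps around to 131073
print(a // b // c)               # 0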
Let x = a/b/c. Then a/b == x*c + y for some y < c, and a == (x*c + y)*b + z for some z < b.
Thus, a == x*b*c + y*b + z. Since y*b + z is at most b*c - 1, we get x*b*c <= a < (x+1)*b*c, and so a/(b*c) == x.
Thus, a/b/c == a/(b*c), and replacing a/(b*c) by a/b/c is safe.
Nested floor division can be reordered as long as you keep track of your divisors and dividends.
#python3.x
x // m // n = x // (m * n)
#python2.x
x / m / n = x / (m * n)
Proof (sucks without LaTeX :( ) in python3.x:
Let k = x // m
then k <= x / m < k + 1
and k / n <= x / (m * n) < (k + 1) / n
In addition, (x // m) // n = k // n
and because x // m <= x / m (and floor division is monotonic),
k // n <= x // (m * n)
Now, suppose k // n < x // (m * n)
then k // n + 1 <= x // (m * n) <= x / (m * n) < (k + 1) / n
multiplying through by n gives n * (k // n) + n < k + 1, i.e. n * (k // n) <= k - n
but division with remainder guarantees n * (k // n) >= k - (n - 1) > k - n, a contradiction
so k // n = x // (m * n)
and (x // m) // n = x // (m * n)
https://en.wikipedia.org/wiki/Floor_and_ceiling_functions#Nested_divisions
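As a quick empirical sanity check of the identity itself (arbitrary-precision Python integers, so overflow cannot interfere):

from random import randrange

for _ in range(100000):
    a = randrange(1, 10**12)
    b = randrange(1, 10**6)
    c = randrange(1, 10**6)
    assert a // (b * c) == a // b // c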
I have spent a lot of time trying to learn how to implement/visualize dynamic programming problems using iteration, but I find it very hard to understand. I can implement the same thing using recursion with memoization, but it is slow compared to iteration.
Can someone explain the same with an example of a hard problem or by using some basic concepts? Take, for instance, matrix chain multiplication, longest palindromic subsequence, and others. I can understand the recursive process and then memoize the overlapping sub-problems for efficiency, but I can't understand how to do the same using iteration.
Thanks!
Dynamic programming is all about solving the sub-problems in order to solve the bigger one. The difference between the recursive approach and the iterative approach is that the former is top-down and the latter is bottom-up. In other words, using recursion, you start from the big problem you are trying to solve and chop it down to slightly smaller sub-problems, on which you repeat the process until you reach a sub-problem small enough to solve directly. This has the advantage that you only have to solve the sub-problems that are absolutely needed, using memoization to remember the results as you go. The bottom-up approach first solves all the sub-problems, using tabulation to remember the results. If we are not doing the extra work of solving sub-problems that are never needed, this is the better approach.
For a simpler example, let's look at the Fibonacci sequence. Say we'd like to compute F(101). When doing it recursively, we will start with our big problem - F(101). For that, we notice that we need to compute F(99) and F(100). Then, for F(99) we need F(97) and F(98). We continue until we reach the smallest solvable sub-problem, which is F(1), and memoize the results. When doing it iteratively, we start from the smallest sub-problem, F(1) and continue all the way up, keeping the results in a table (so essentially it's just a simple for loop from 1 to 101 in this case).
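As a minimal sketch of both directions in Python (using the convention F(0) = 0, F(1) = 1; the function names are mine):

from functools import lru_cache

# top-down: start from F(101) and recurse, memoizing as we go
@lru_cache(maxsize=None)
def fib_top_down(n):
    return n if n < 2 else fib_top_down(n - 1) + fib_top_down(n - 2)

# bottom-up: start from the smallest sub-problems; the full table
# shrinks to two running values because each entry depends only on
# the previous two
def fib_bottom_up(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_top_down(101) == fib_bottom_up(101))  # True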
Let's take a look at the matrix chain multiplication problem, which you requested. We'll start with a naive recursive implementation, then recursive DP, and finally iterative DP. It's going to be implemented in a C/C++ soup, but you should be able to follow along even if you are not very familiar with them.
#include <limits>  // for std::numeric_limits

/* Solve the problem recursively (naive)
   p - matrix dimensions
   n - size of p
   i..j - state (sub-problem): range of parenthesis */
int solve_rn(int p[], int n, int i, int j) {
    // A single matrix needs no multiplications
    if (i == j) return 0;
    // The minimal solution for this sub-problem; we
    // initialize it with the maximal possible value
    int min = std::numeric_limits<int>::max();
    // Recursively solve all the sub-problems
    for (int k = i; k < j; ++k) {
        int tmp = solve_rn(p, n, i, k) + solve_rn(p, n, k + 1, j) + p[i - 1] * p[k] * p[j];
        if (tmp < min) min = tmp;
    }
    // Return the solution for this sub-problem
    return min;
}
To compute the result, we start with the big problem:
solve_rn(p, n, 1, n - 1)
The key to DP is to remember all the solutions to the sub-problems instead of forgetting them, so we don't need to recompute them. It's trivial to make a few adjustments to the above code in order to achieve that:
/* Solve the problem recursively (DP)
   p - matrix dimensions
   n - size of p
   i..j - state (sub-problem): range of parenthesis */
int solve_r(int p[], int n, int i, int j) {
    /* We need to remember the results for state i..j.
       This can be done in a matrix, which we call dp,
       such that dp[i][j] is the best solution for the
       state i..j. We initialize everything to 0 first.
       The static keyword here is just a C/C++ thing for keeping
       the matrix between function calls; you can also either
       make it global or pass it as a parameter each time.
       MAXN is here because the array size when doing it like
       this has to be a constant in C/C++. I set it to 100 here.
       But you can do it some other way if you don't like it. */
    static int dp[MAXN][MAXN] = {{0}};
    /* A single matrix needs 0 multiplications, so we
       can just return 0. Also, if we already computed the result
       for this state, just return that. */
    if (i == j) return 0;
    else if (dp[i][j] != 0) return dp[i][j];
    // The minimal solution for this sub-problem; we
    // initialize it with the maximal possible value
    dp[i][j] = std::numeric_limits<int>::max();
    // Recursively solve all the sub-problems
    for (int k = i; k < j; ++k) {
        int tmp = solve_r(p, n, i, k) + solve_r(p, n, k + 1, j) + p[i - 1] * p[k] * p[j];
        if (tmp < dp[i][j]) dp[i][j] = tmp;
    }
    // Return the solution for this sub-problem
    return dp[i][j];
}
We start with the big problem as well:
solve_r(p, n, 1, n - 1)
The iterative solution is only to, well, iterate over all the states, instead of starting from the top:
/* Solve the problem iteratively
   p - matrix dimensions
   n - size of p
   We don't need to pass the state, because we iterate over the states. */
int solve_i(int p[], int n) {
    // But we do need our table, just like before
    static int dp[MAXN][MAXN];
    // A single matrix needs no multiplications
    for (int i = 1; i < n; ++i)
        dp[i][i] = 0;
    // L represents the length of the chain. We go from smallest to
    // biggest. L is capital to distinguish the letter l from the number 1.
    for (int L = 2; L < n; ++L) {
        // This double loop goes through all the states with the current
        // chain length. Note: p of size n describes n - 1 matrices, so
        // the last valid state is i..(n - 1), i.e. i runs up to n - L.
        for (int i = 1; i <= n - L; ++i) {
            int j = i + L - 1;
            dp[i][j] = std::numeric_limits<int>::max();
            for (int k = i; k <= j - 1; ++k) {
                int tmp = dp[i][k] + dp[k + 1][j] + p[i - 1] * p[k] * p[j];
                if (tmp < dp[i][j])
                    dp[i][j] = tmp;
            }
        }
    }
    // Return the result of the biggest problem
    return dp[1][n - 1];
}
To compute the result, just call it:
solve_i(p, n)
Explanation of the loop counters in the last example:
Let's say we need to optimize the multiplication of 4 matrices: A B C D. We are doing an iterative approach, so we will first compute the chains of length two: (A B) C D, A (B C) D, and A B (C D). Then the chains of three: (A B C) D and A (B C D). That is what L, i and j are for.
L represents the chain length; it goes from 2 to n - 1 (n is 4 in this case, so that is 3).
i and j represent the starting and ending position of the chain. In case L = 2, i goes from 1 to 3, and j goes from 2 to 4:
(A B) C D    A (B C) D    A B (C D)
 ^ ^            ^ ^            ^ ^
 i j            i j            i j
In case L = 3, i goes from 1 to 2, and j goes from 3 to 4:
(A B C) D    A (B C D)
 ^   ^          ^   ^
 i   j          i   j
So generally, i goes from 1 to n - L + 1, and j is i + L - 1 (counting n as the number of matrices here; in the code above n is the size of p, one more than the number of matrices, which is why the loop there stops at i <= n - L).
Now, let's continue with the algorithm, assuming that we are at the step where we have (A B C) D. We now need to take into account the sub-problems (which are already calculated): ((A B) C) D and (A (B C)) D. That is what k is for. It goes through all the positions between i and j and computes the sub-problems.
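For a concrete check, here is a sketch of the same bottom-up algorithm in Python, run on an assumed example p = [10, 30, 5, 60] (three matrices with dimensions 10x30, 30x5 and 5x60):

import sys

def solve_i(p):
    n = len(p)                         # p of size n describes n - 1 matrices
    dp = [[0] * n for _ in range(n)]
    for L in range(2, n):              # chain length, smallest to biggest
        for i in range(1, n - L + 1):  # i.e. i <= n - L
            j = i + L - 1
            dp[i][j] = sys.maxsize
            for k in range(i, j):
                tmp = dp[i][k] + dp[k + 1][j] + p[i - 1] * p[k] * p[j]
                dp[i][j] = min(dp[i][j], tmp)
    return dp[1][n - 1]

print(solve_i([10, 30, 5, 60]))  # 4500: (A1 A2) A3 beats A1 (A2 A3), which costs 27000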
I hope I helped.
The problem with recursion is the high number of stack frames that need to be pushed and popped. This can quickly become the bottleneck.
The Fibonacci series can be calculated with iterative DP or with recursion plus memoization. If we calculate F(100) with iterative DP, all we need is an array of length 100, e.g. int[100], and that's the bulk of our memory usage. We fill in all entries of the array, pre-filling f[0] and f[1] as they are defined to be 1, and each subsequent value just depends on the previous two.
If we use a recursive solution, we start at fib(100) and work down. Every method call from 100 down to 0 is pushed onto the stack AND checked against the memo. These operations add up, and iteration doesn't suffer from either of them. In iteration (bottom-up) we already know all of the previous answers are valid. The bigger impact is probably the stack frames; given a larger input, you may get a StackOverflowException for what was otherwise trivial with an iterative DP approach.
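You can see the stack-frame limitation directly in Python, where the default recursion limit is around 1000 frames. A small sketch (the helper names are mine, using the same f[0] = f[1] = 1 convention):

def fib_memo(n, memo={0: 1, 1: 1}):
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

def fib_iter(n):
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(5000))      # fine: just a loop
try:
    print(fib_memo(5000))  # ~5000 nested calls exceed the default limit
except RecursionError:
    print("RecursionError: too many stack frames")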
I am trying to minimize an objective function that is indexed by three parameters i, p, j, like this:
param mlu{i in I, p in P, j in out[p]} := traffic[i,p]/capacity[j];
minimize MAXLU{i in I, p in P, j in out[p]}: mlu[i,p,j] * x[i,p,j];
but the objective function has to be greater than 0, otherwise it defeats my purpose of minimization.
I am trying to ensure this by adding a constraint on the objective function, like this:
s.t. constraint1{i in I, p in P, j in out[p]} : MAXLU[i,p,j] != 0;
But I get the following error:
LP.mod:66: invalid reference to status, primal value, or dual value of objective MAXLU above solve statement
Context: i in I , p in P , j in out [ p ] } : MAXLU [ i , p , j ] !=
glp_mpl_generate: invalid call sequence
Error detected in file glpapi14.c at line 79
Aborted
Is it even possible to do this? Thank you for any help/suggestions!
You can refer to variables and parameters in a constraint, but not to objectives (MAXLU). A simple way to fix this is to replace MAXLU in constraint1 with the actual objective expression:
s.t. constraint1{i in I, p in P, j in out[p]} : mlu[i,p,j] * x[i,p,j] ...;
Moreover, you cannot define a constraint with the expression != 0. You can use >= epsilon (or <= -epsilon) instead, where epsilon is a small positive number:
s.t. constraint1{i in I, p in P, j in out[p]} : mlu[i,p,j] * x[i,p,j] >= epsilon;