Given is a set S of size n, which is partitioned into classes (s1,..,sk) of sizes n1,..,nk. Naturally, it holds that n = n1+...+nk.
I am interested in finding out the number of ways in which I can combine elements of this partitioning so that each combination contains exactly one element of each class.
Since I can choose n1 elements from s1, n2 elements from s2 and so on, I am looking for the solution to max(n1*..*nk) for arbitrary n1,..nk for which it holds that n1+..+nk=n.
I have the feeling that this is a linear-optimization problem, but it's been too long since I learned this stuff as an undergrad. I hope that somebody remembers how to compute this.

You're looking for the number of combinations with one element from each partition?
That's simply n1*n2*...*nk.
You seem to also be asking a separate question:
Given N, how do I assign n1, n2, ..., nk such that their product is maximized. This is not actually a linear optimization problem, since your variables are multiplied together.
It can be solved with some calculus, i.e. by taking partial derivatives in each of the variables and handling the constraint with Lagrange multipliers.
The result will be that the n1 .. nk should be as close to the same size as possible.
If n is a multiple of k, then n_1 = n_2 = ... = n_k = n/k.
Otherwise, with j = n mod k, n_1 = n_2 = ... = n_j = ceiling(n/k)
and n_(j+1) = ... = n_k = floor(n/k).
Basically, we try to distribute the elements as evenly as possible into partitions. If they divide evenly, great. If not, we divide as evenly as possible, and with whatever is left over, we give an extra element each to the first partitions. (Doesn't have to be the first partitions, that choice is fairly arbitrary.) In this way, the difference in the number of elements owned by any two partitions will be at most one.
Gory Details:
This is the product function which we wish to maximize:
P = n1*n2*...*nk
We define a new function using Lagrange multipliers:
Lambda = P + l(N - n1 - n2 ... -nk)
And take Partial derivatives in each of the k n_i variables:
dLambda/dn_i = P/n_i - l
and in l:
dLambda/dl = N - n1 -n2 ... -nk
setting all of the partial derivatives = 0, we get a system of k + 1 equations, and when we solve them, we'll get that n1 = n2 = ... = nk
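Written out, the system solves as follows. This is a sketch in LaTeX that reuses the symbols P, l, N and n_i from above and treats the n_i as positive reals; the integer case then rounds to the nearest balanced split.

```latex
\begin{align*}
\frac{\partial \Lambda}{\partial n_i} = \frac{P}{n_i} - l = 0
  &\;\Longrightarrow\; n_i = \frac{P}{l} \quad \text{for every } i, \\
\frac{\partial \Lambda}{\partial l} = N - \sum_{i=1}^{k} n_i = 0
  &\;\Longrightarrow\; k \cdot \frac{P}{l} = N
  \;\Longrightarrow\; n_1 = \dots = n_k = \frac{N}{k}.
\end{align*}
```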
floor(n/k)^(k - n mod k)*ceil(n/k)^(n mod k)
-- MarkusQ
P.S. For the example you gave of S = {1,2,3,4}, n = 4, k = 2 this gives:
floor(4/2)^(2 - 4 mod 2)*ceil(4/2)^(4 mod 2)
= floor(2)^(2 - 0)*ceil(2)^(0)
= 2^2 * 2^0
= 4 * 1
= 4, as you wanted.
To clarify, this formula gives the number of combinations generated by the partitioning with the maximum possible number of combinations. There will of course be other, suboptimal partitionings.
For a given perimeter the rectangle with the largest area is the one that is closest to a square (and the same is true in higher dimensions) which means you want the sides to be as close to equal in length as possible (e.g. all either the average length rounded up or down). The formula can then be seen to be:
(length of short sides)^(number of short sides) * (length of long sides)^(number of long sides)
which is just the volume of the hyper-rectangle meeting this constraint.
Note that, when viewed this way, it also tells you how to construct a maximal partitioning.
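As a quick check of the formula and the construction, here is a minimal Python sketch. The function names `balanced_sizes` and `max_product` are made up for illustration, and it assumes 1 <= k <= n so that every class is non-empty.

```python
from math import prod

def balanced_sizes(n, k):
    """Class sizes that differ by at most one: r classes of size q+1, the rest of size q."""
    q, r = divmod(n, k)
    return [q + 1] * r + [q] * (k - r)

def max_product(n, k):
    """floor(n/k)^(k - n mod k) * ceil(n/k)^(n mod k), written with q = floor(n/k)."""
    q, r = divmod(n, k)
    return (q + 1) ** r * q ** (k - r)

print(balanced_sizes(10, 3), max_product(10, 3))  # [4, 3, 3] 36
print(max_product(4, 2))                          # 4
```

When n is a multiple of k, r is 0, so the `(q + 1) ** r` factor is 1 and the result is simply (n/k)^k.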


Minimum number of increments so that all elements have a common divisor

I got this problem in an interview recently:
Given a set of numbers X = [X_1, X_2, ...., X_n] where X_i <= 500 for 1 <= i <= n. Increment the numbers (only positive increments) in the set so that each element in the set has a common divisor >=2, and such that the sum of all increments is minimized.
For example, if X = [5, 7, 7, 7, 7] the new set would be X = [7, 7, 7, 7, 7], since you can add 2 to X_1. X = [6, 8, 8, 8, 8] has a common divisor of 2 but is not correct, since we're adding 5 in total (add 1 to the 5 and 1 to each of the four 7's).
I had a seemingly working solution (as in it passed all the test cases) that loops through the prime numbers < 500 and, for each X_i in X, finds the closest multiple of the prime number that is greater than or equal to X_i.
function closest_multiple(x, y)
    return ceil(x/y)*y

min_increment = inf
for each prime_number < 500:
    total_increment = 0
    for each element X_i in X:
        total_increment += closest_multiple(X_i, prime_number) - X_i
    min_increment = min(min_increment, total_increment)
return min_increment
It's technically O(n) but is there a better way to solve this? I've been suggested to use dynamic programming but am unsure how that would fit in here.
Constant-bounded entries case
When X_i is bounded by a constant, the best time you can achieve asymptotically is O(n), since it takes at least that long to read all of your inputs. There are some practical improvements:
Filter out duplicates, so you work with a list of (element, frequency) pairs.
Early stopping in your loop.
Faster computation of closest_multiple(x, p) - x. This is slightly hardware/language dependent, but a single integer modulus op is almost certainly faster than an int -> float cast, float division, ceiling() call, and multiplication on the same magnitude numbers.
freq_counts <- Initialize-Counter(X) // List of (element, freq) pairs
min_increment = inf
for each prime_number < 500:
    total_increment = 0
    for each pair X_i, freq in freq_counts:
        // The outer % makes the increment 0 (not prime_number) when prime_number already divides X_i
        total_increment += ((prime_number - (X_i % prime_number)) % prime_number) * freq
        if total_increment >= min_increment: break
    min_increment = min(min_increment, total_increment)
return min_increment
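The same idea as a runnable Python sketch. The names `primes_below` and `min_total_increment` and the `bound` parameter are illustrative assumptions, and the sieve runs up to and including `bound` so that a prime equal to the largest allowed entry is also tried.

```python
from collections import Counter

def primes_below(limit):
    """Sieve of Eratosthenes: all primes p < limit."""
    sieve = [True] * max(limit, 2)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit, p):
                sieve[m] = False
    return [p for p in range(2, limit) if sieve[p]]

def min_total_increment(xs, bound=500):
    """Minimum total increment so that all entries share a prime divisor."""
    freq = Counter(xs)                      # (element, frequency) pairs
    best = float("inf")
    for p in primes_below(bound + 1):
        total = 0
        for x, count in freq.items():
            total += ((-x) % p) * count     # distance to the next multiple of p (0 if p divides x)
            if total >= best:               # early stopping: p cannot beat the best so far
                break
        else:
            best = total                    # inner loop finished, so total < best
    return best

print(min_total_increment([5, 7, 7, 7, 7]))  # 2
```

In Python, `(-x) % p` is always in the range 0..p-1, so a single modulus gives the increment directly.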
Large entries case
With uniformly chosen random data, the answer is almost always from using '2' as the divisor, and much larger prime divisors are vanishingly unlikely. However, let's solve for that worst case scenario.
Here, let max(X) = M, so that our input size is O(n (log M)) bits. We want a solution that's sub-exponential in that input size, so finding all primes below M (or even sqrt(M)) is out of the question. We're looking for any prime that gives us a min-total-increment; we'll call such a prime a min-prime. After finding such a prime, we can get the min-total-increment in linear time. We'll use a factoring approach along with two observations.
Observation 1: The answer is always at most n, since the increment needed for the prime 2 to divide X_i is at most 1.
Observation 2: We're trying to find primes that divide X_i or a number slightly larger than X_i for a large fraction of our entries X_i. Let Consecutive-Product-Divisors[i] be the set of all primes dividing either of X_i or X_i+1, which I'll abbreviate CPD[i]. This is exactly the set of all primes which divide X_i * (1 + X_i).
(Obs. 2 Continued) If U is a known upper bound on our answer (here, at most n), and p is a min-prime for X, then p must divide either X_i or X_i + 1 for at least n - U/2 of our CPD entries, because every entry where p divides neither costs an increment of at least 2. Use frequency counts on the CPD array to find all such primes.
Once you have a list of candidate primes (all min-primes are guaranteed to be in this list), you can test each one individually using your algorithm. Since a number k can have at most O(log k) distinct prime divisors, this gives O(n log M) possible distinct primes that divide at least half of the numbers
[X_1*(1 + X_1), X_2*(1 + X_2), ... X_n*(1 + X_n)] that make up our candidate list. It's possible you can lower this bound with some more careful analysis, but it likely won't strongly affect the asymptotic runtime of the whole algorithm.
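A minimal Python sketch of this candidate-list idea, taking U = n. Trial division stands in for a real factoring algorithm purely to keep the example short, so this sketch does not have the runtime discussed here, and all function names are illustrative.

```python
from collections import Counter

def prime_factors(m):
    """Distinct prime factors of m by trial division (a stand-in for real factoring)."""
    factors = set()
    d = 2
    while d * d <= m:
        if m % d == 0:
            factors.add(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors

def cost(xs, p):
    """Total increment needed to make every entry a multiple of p."""
    return sum((-x) % p for x in xs)

def min_increment_via_candidates(xs):
    n = len(xs)
    counts = Counter()
    for x in xs:
        # CPD[i]: primes dividing X_i * (X_i + 1); X_i and X_i + 1 share no prime
        counts.update(prime_factors(x) | prime_factors(x + 1))
    # A min-prime must appear in at least n - U/2 = n/2 of the CPD sets (U = n)
    candidates = [p for p, c in counts.items() if 2 * c >= n]
    return min(cost(xs, p) for p in candidates)

print(min_increment_via_candidates([5, 7, 7, 7, 7]))  # 2
```

The prime 2 divides X_i * (X_i + 1) for every i, so the candidate list is never empty.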
A more optimal complexity for large entries
The complexity of this solution is hard to write in short form, because the bottleneck is factoring n numbers of maximum size M, plus O(n^2 log M) arithmetic (i.e. addition, subtraction, multiply, modulo) operations on numbers of maximum size M. That doesn't mean the runtime is unknown: If you select any integer factoring algorithm and large-integer-arithmetic algorithms, you can derive the runtime exactly. Unfortunately, because of factoring, the best known runtime of the above algorithm is super-polynomial (but sub-exponential).
How can we do better? I did find a more complicated, dynamic-programming-like solution based on Greatest Common Divisors (GCDs). It runs in polynomial time because it doesn't rely on factoring, although it is likely much slower on non-astronomical-size inputs.
The solution relies on the fact that at least one of the following two statements is true:
The number 2 is a min-prime for X, or
For at least one value of i, 1 <= i <= n there is an optimal solution where X_i remains unincremented, i.e. where one of the divisors of X_i produces a min-total-increment.
GCD-Based polynomial time algorithm
We can test 2 and all small primes quickly for their minimum costs. In fact, we'll test all primes p, p <= n, which we can do in polynomial time, and factor out these primes from X_i and its first n increments. This leads us to the following algorithm:
// Given: input list X = [X_1, X_2, ... X_n].
// Subroutine compute-min-cost(list A, int p) is
// just the inner loop of the above algorithm.
min_increment = inf;
for each prime p <= n:
    min_increment = min(min_increment, compute-min-cost(X, p));
// Initialize empty, 2-D, n x (n+1) list Y[n][n+1], of offset X-values
for all 1 <= i <= n:
    for all 0 <= j <= n:
        Y[i][j] <- X[i] + j;
for each prime p <= n: // Factor out all small prime divisors from Y
    for each Y[i][j]:
        while Y[i][j] % p == 0:
            Y[i][j] /= p;
for all 1 <= i <= n: // Loop 1
    // Y[i][0] is the test 'unincremented' entry
    // Initialize empty hash-tables 'costs' and 'new_costs'
    // Keys of hash-tables are GCDs,
    // Values are a running sum of increment-costs for that GCD
    costs[Y[i][0]] = 0;
    for all 1 <= k <= n: // Loop 2
        if i == k: continue;
        clear all entries from new_costs // or reinitialize to empty
        for all 0 <= j < n: // Loop 3
            for each Key in costs: // Loop 4
                g = GCD(Key, Y[k][j]);
                if g == 1: continue;
                if g is not a key in new_costs:
                    new_costs[g] = j + costs[Key];
                new_costs[g] = min(new_costs[g], j + costs[Key]);
        swap(costs, new_costs);
    if costs is not empty:
        min_increment = min(min_increment, smallest Value in costs);
return min_increment;
The correctness of this solution follows from the previous two observations, and the (unproven, but straightforward) fact that there is a list
[X_1 + r_1, X_2 + r_2, ... , X_n + r_n] (with 0 <= r_i <= n for all i) whose GCD is a divisor with minimum increment cost.
The runtime of this solution is trickier: GCDs can easily be computed in O(log^2(M)) time, and the list of all primes up to n can be computed in low poly(n) time. From the loop structure of the algorithm, to prove a polynomial bound on the whole algorithm, it suffices to show that the maximum size of our 'costs' hash-table is polynomial in log M. This is where the 'factoring-out' of small primes comes into play. After iteration k of loop 2, the entries in costs are (Key, Value) pairs, where each Key is the GCD of k + 1 elements:
our initial Y[i][0], and [Y[1][j_1], Y[2][j_2], ... Y[k][j_k]] for some 0 <= j_l < n. The Value for this Key is the minimum increment sum needed for this divisor (i.e. sum of the j_l) over all possible choices of j_l.
There are at most O(log M) unique prime divisors of Y[i][0]. Each such prime divides at most one key in our 'costs' table at any time: Since we've factored out all prime divisors below n, any remaining prime divisor p can divide at most one of the n consecutive numbers in any Y[j] = [X_j, 1 + X_j, ... n-1 + X_j]. This means the overall algorithm is polynomial, and has a runtime below O(n^4 log^3(M)).
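For experimenting with the GCD-based algorithm, here is a Python port as a sketch. Two details are my own assumptions rather than part of the description above: the prime 2 is always tested, even when n = 1, and a starting entry that has been reduced to 1 is skipped, since 1 is not a valid common divisor.

```python
from math import gcd

def small_primes(n):
    """All primes <= n, by trial division (n is small here)."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def cost(xs, p):
    return sum((-x) % p for x in xs)

def min_increment_gcd(xs):
    n = len(xs)
    primes = small_primes(max(n, 2))            # assumption: always include 2
    best = min(cost(xs, p) for p in primes)     # small primes are tested directly

    def strip(v):
        """Factor all small primes out of v."""
        for p in primes:
            while v % p == 0:
                v //= p
        return v

    # y[i][j] is X_i + j with small primes removed, for 0 <= j < n
    y = [[strip(x + j) for j in range(n)] for x in xs]

    for i in range(n):                          # Loop 1: X_i stays unincremented
        if y[i][0] == 1:                        # assumption: no large prime divides X_i
            continue
        costs = {y[i][0]: 0}                    # GCD -> cheapest increment sum so far
        for k in range(n):                      # Loop 2
            if k == i:
                continue
            new_costs = {}
            for j in range(n):                  # Loop 3: increment X_k by j
                for key, c in costs.items():    # Loop 4
                    g = gcd(key, y[k][j])
                    if g > 1 and c + j < new_costs.get(g, float("inf")):
                        new_costs[g] = c + j
            costs = new_costs
        if costs:
            best = min(best, min(costs.values()))
    return best

print(min_increment_gcd([5, 7, 7, 7, 7]))  # 2
```

On small inputs the result can be compared against a brute-force scan over all divisors from 2 to max(X) + 1.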
From here, the open questions are whether a simpler algorithm exists, and how much better than this bound can you achieve. You can definitely optimize this algorithm (including using the early-stopping and frequency counts from before). It's also likely that better bounds on counting large-and-distinct-prime-divisors for consecutive numbers shows this solution is already better than that stated runtime, but a simplification of this solution would be very interesting.

Construct a bijective function to map arbitrary integer from [1, n] to [1, n] randomly

I want to construct a bijective function f(k, n, seed) from [1,n] to [1,n], where 1<=k<=n and 1<=f(k, n, seed)<=n for each given seed and n. The function should return the k-th value of a random permutation of 1,2,...,n. The randomness is decided by the seed: different seeds may correspond to different permutations. I want the time complexity of f(k, n, seed) to be O(1) for each 1<=k<=n and any given seed.
Does anyone know how I can construct such a function? Pseudo-randomness is acceptable. n can be very large (e.g. >= 1e8).
No matter how you do it, you will always have to store a list of numbers still available or numbers already used ... A simple possibility would be the following
const avail = [1,2,3, ..., n];
let random = new Random(seed)
function f(k,n) {
  let index = random.next(n - k);
  let result = avail[index]
  avail[index] = avail[n-k-1]; // move the last still-available element into the freed slot
  return result;
}
The assumptions for this are the following
the array avail is 0-indexed
random.next(x) creates a random integer i with 0 <= i < x (Random and next are placeholders for any seeded generator)
the first k to call the function f with is 0
f is called for contiguous k 0, 1, 2, 3, ..., n-1
The principle works as follows:
avail holds all numbers still available for the permutation. When you take a random index, the element at that index is the next element of the permutation. Then, instead of slicing that element out of the array, which is quite expensive, you just replace the currently selected element with the last element in the avail array. In the next iteration you (virtually) decrease the size of the avail array by 1 by decreasing the upper limit for the random index by one.
I'm not sure how good this random permutation is in terms of the distribution of the values, i.e. it may happen that a certain range of numbers is more likely to be at the beginning or at the end of the permutation.
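The swap-with-last scheme can be sketched in Python as a generator, which makes the "call for k = 0, 1, 2, ... in order" assumption explicit. The name `lazy_shuffle` is illustrative, and Python's seeded `random.Random` stands in for the generator. Because every step picks uniformly among the elements still available, this is in effect a Fisher-Yates-style selection, so it yields a uniform permutation as long as each index draw is uniform. It still needs O(n) memory for `avail`.

```python
import random

def lazy_shuffle(n, seed):
    """Yield a random permutation of 1..n one element at a time."""
    rng = random.Random(seed)
    avail = list(range(1, n + 1))          # numbers still available
    for k in range(n):
        index = rng.randrange(n - k)       # 0 <= index < n - k
        yield avail[index]
        avail[index] = avail[n - k - 1]    # overwrite with the last available element

print(list(lazy_shuffle(10, seed=1)))
```

The same seed always reproduces the same permutation, and each yielded element costs O(1).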
A simple, but not very 'random', approach would be to use the fact that, if a is relatively prime to n (ie they have no common factors), then
x-> (a*x + b)%n
is a permutation of {0,..n-1} to {0,..n-1}. To find the inverse of this, you can use the extended Euclidean algorithm to find k and l so that
1 = gcd(a,n) = k*a+l*n
for then the inverse of the map above is
y -> (k*y + c) mod n
where c = -k*b mod n
So you could choose a to be a 'random' number in {0,..n-1} that is relatively prime to n, and b to be any number in {0,..n-1}
Note that you'll need to do this in 64 bit arithmetic to avoid overflow in computing a*x.
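A minimal Python sketch of this affine map, shifted to work on [1, n] as in the question. The function name `make_affine_permutation` and the way a and b are drawn from the seed are assumptions for illustration. Python integers do not overflow, so the 64-bit concern does not apply here, and `pow(a, -1, n)` (Python 3.8+) computes the modular inverse that the extended Euclidean algorithm would give.

```python
import random
from math import gcd

def make_affine_permutation(n, seed):
    """Return (f, f_inv): an O(1) bijection on [1, n] and its inverse."""
    rng = random.Random(seed)
    a = 1
    if n > 2:
        a = rng.randrange(1, n)
        while gcd(a, n) != 1:              # a must be relatively prime to n
            a = rng.randrange(1, n)
    b = rng.randrange(n)
    a_inv = pow(a, -1, n) if n > 1 else 0  # k in 1 = k*a + l*n

    def f(k):
        return (a * (k - 1) + b) % n + 1

    def f_inv(y):
        return (a_inv * (y - 1 - b)) % n + 1

    return f, f_inv

f, f_inv = make_affine_permutation(10, seed=42)
print([f(k) for k in range(1, 11)])
```

As noted, the output is a permutation but not a very random-looking one: consecutive inputs always differ by the same step a modulo n.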

questions about AES irreducible polynomials

For galois field GF(2^8), the polynomial's format is a7x^7+a6x^6+...+a0.
For AES, the irreducible polynomial is x^8+x^4+x^3+x+1.
Apparently, the max power in GF(2^8) is x^7, but why the max power of irreducible polynomial is x^8?
How will the max power in irreducible polynomial affect inverse result in GF?
Can I set the max power of irreducible polynomial be x^9?
To understand why the modulus of GF(2⁸) must be order 8 (that is, have 8 as its largest exponent), you must know how to perform polynomial division with coefficients in GF(2), which means you must know how to perform polynomial division in general. I will assume you know how to do those things. If you don't know how, there are many tutorials on the web from which you can learn.
Remember that if r = a mod m, it means that there is a q such that a = q m + r. To make a working GF(2⁸) arithmetic, we need to guarantee that r is an element of GF(2⁸) for any a and q (even though a and q do not need to be elements of GF(2⁸)). Furthermore, we need to ensure that r can be any element of GF(2⁸), if we pick the right a from GF(2⁸).
So we must pick a modulus (the m) that makes these guarantees. We do this by picking an m of exactly order 8.
If the numerator of the division (the a in a = q m + r) is order 8 or higher, we can find something to put in the quotient (the q) that, when multiplied by x⁸, cancels out that higher order. But there's nothing we can put in the quotient that can be multiplied by x⁸ to give a term with order less than 8, so the remainder (the r) can be any order up to and including 7.
Let's try a few examples of polynomial division with a modulus (or divisor) of x⁸+x⁴+x³+x+1 to see what I mean. First let's compute x⁸+1 mod x⁸+x⁴+x³+x+1:
                          1  <- quotient
x⁸+x⁴+x³+x+1 │ x⁸        +1
                  x⁴+x³+x    <- remainder
So x⁸+1 mod x⁸+x⁴+x³+x+1 = x⁴+x³+x.
Next let's compute x¹²+x⁹+x⁷+x⁵+x² mod x⁸+x⁴+x³+x+1.
                               x⁴      +x+1  <- quotient
x⁸+x⁴+x³+x+1 │ x¹²+x⁹   +x⁷+x⁵      +x²
             -(x¹²   +x⁸+x⁷+x⁵+x⁴)
                   x⁹+x⁸      +x⁴   +x²
                 -(x⁹      +x⁵+x⁴   +x²+x)
                      x⁸   +x⁵         +x
                    -(x⁸      +x⁴+x³   +x+1)
                            x⁵+x⁴+x³     +1  <- remainder
So x¹²+x⁹+x⁷+x⁵+x² mod x⁸+x⁴+x³+x+1 = x⁵+x⁴+x³+1, which has order < 8.
Finally, let's try a substantially higher order: how about x¹⁰⁰+x⁹⁶+x⁹⁵+x⁹³+x⁸⁸+x⁸⁷+x⁸⁵+x⁸⁴+x mod x⁸+x⁴+x³+x+1?
                                x⁹²            +x⁸⁴  <- quotient
x⁸+x⁴+x³+x+1 │ x¹⁰⁰+x⁹⁶+x⁹⁵+x⁹³    +x⁸⁸+x⁸⁷+x⁸⁵+x⁸⁴+x
             -(x¹⁰⁰+x⁹⁶+x⁹⁵+x⁹³+x⁹²)
                                x⁹²+x⁸⁸+x⁸⁷+x⁸⁵+x⁸⁴+x
                              -(x⁹²+x⁸⁸+x⁸⁷+x⁸⁵+x⁸⁴)
                                                    x  <- remainder
So x¹⁰⁰+x⁹⁶+x⁹⁵+x⁹³+x⁸⁸+x⁸⁷+x⁸⁵+x⁸⁴+x mod x⁸+x⁴+x³+x+1 = x. Note that I carefully chose the numerator so that it wouldn't be a long computation. If you want some pain, try doing x¹⁰⁰ mod x⁸+x⁴+x³+x+1 by hand.
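These divisions can be checked mechanically. The sketch below assumes the common encoding of a GF(2) polynomial as an integer whose bit i is the coefficient of x^i, so subtraction becomes XOR; the helper names `poly` and `gf2_mod` are made up for this example.

```python
def poly(*exponents):
    """Build a GF(2) polynomial from its exponents, e.g. poly(8, 0) is x^8 + 1."""
    value = 0
    for e in exponents:
        value ^= 1 << e
    return value

def gf2_mod(a, m):
    """Remainder of a divided by m, with coefficients in GF(2)."""
    deg_m = m.bit_length() - 1
    while a.bit_length() - 1 >= deg_m:
        shift = (a.bit_length() - 1) - deg_m
        a ^= m << shift                    # subtract (XOR) m * x^shift to cancel the top term
    return a

AES_MODULUS = poly(8, 4, 3, 1, 0)          # x^8 + x^4 + x^3 + x + 1

print(gf2_mod(poly(8, 0), AES_MODULUS) == poly(4, 3, 1))                 # True
print(gf2_mod(poly(12, 9, 7, 5, 2), AES_MODULUS) == poly(5, 4, 3, 0))    # True
print(gf2_mod(poly(100), AES_MODULUS) < 2 ** 8)                          # True
```

The loop only stops once the remainder's highest exponent is below 8, which is why every result fits in one byte.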

Generate random natural numbers that sum to a given number and comply to a set of general constraints

I had an application that required something similar to the problem described here.
I too need to generate a set of positive integer random variables {Xi} that add up to a given sum S, where each variable might have constraints such as mi<=Xi<=Mi.
This I know how to do, the problem is that in my case I also might have constraints between the random variables themselves, say Xi<=Fi(Xj) for some given Fi (also lets say Fi's inverse is known), Now, how should one generate the random variables "correctly"? I put correctly in quotes here because I'm not really sure what it would mean here except that I want the generated numbers to cover all possible cases with as uniform a probability as possible for each possible case.
Say we even look at a very simple case:
4 random variables X1,X2,X3,X4 that need to add up to 100 and comply with the constraint X1 <= 2*X2, what would be the "correct" way to generate them?
P.S. I know that this seems like it would be a better fit for math overflow but I found no solutions there either.
For 4 random variables X1,X2,X3,X4 that need to add up to 100 and comply with the constraint X1 <= 2*X2, one could use the multinomial distribution.
As soon as the probability of the first number is low enough, your condition would almost always be satisfied; if not, reject and repeat.
And the multinomial distribution by design has the sum equal to 100.
Code, Windows 10 x64, Python 3.8
import numpy as np

def x1x2x3x4(rng):
    while True:
        v = rng.multinomial(100, [0.1, 1/2-0.1, 1/4, 1/4])
        if v[0] <= 2*v[1]:
            return v
    return None

rng = np.random.default_rng()
Lots of freedom in selecting probabilities. E.g., you could make other (##2, 3, 4) symmetric. Code
def x1x2x3x4(rng, pfirst = 0.1):
    pother = (1.0 - pfirst)/3.0
    while True:
        v = rng.multinomial(100, [pfirst, pother, pother, pother])
        if v[0] <= 2*v[1]:
            return v
    return None
If you start rejecting combinations, then you artificially bump probabilities of one subset of events and lower probabilities of another set of events - and total sum is always 1. There is NO WAY to have uniform probabilities with conditions you want to meet. Code below runs with multinomial with equal probabilities and computes histograms and mean values. Mean supposed to be exactly 25 (=100/4), but as soon as you reject some samples, you lower mean of first value and increase mean of the second value. Difference is small, but UNAVOIDABLE. If it is ok with you, so be it. Code
import numpy as np
import matplotlib.pyplot as plt

def x1x2x3x4(rng, summa, pfirst = 0.1):
    pother = (1.0 - pfirst)/3.0
    while True:
        v = rng.multinomial(summa, [pfirst, pother, pother, pother])
        if v[0] <= 2*v[1]:
            return v
    return None

rng = np.random.default_rng()
s = 100
N = 5000000
# histograms
first = np.zeros(s+1)
secnd = np.zeros(s+1)
third = np.zeros(s+1)
forth = np.zeros(s+1)
mfirst = np.float64(0.0)
msecnd = np.float64(0.0)
mthird = np.float64(0.0)
mforth = np.float64(0.0)
for _ in range(0, N): # sampling with equal probabilities
    v = x1x2x3x4(rng, s, 0.25)
    q = v[0]
    mfirst += np.float64(q)
    first[q] += 1.0
    q = v[1]
    msecnd += np.float64(q)
    secnd[q] += 1.0
    q = v[2]
    mthird += np.float64(q)
    third[q] += 1.0
    q = v[3]
    mforth += np.float64(q)
    forth[q] += 1.0
x = np.arange(0, s+1, dtype=np.int32)
fig, axs = plt.subplots(4)
axs[0].stem(x, first, markerfmt=' ')
axs[1].stem(x, secnd, markerfmt=' ')
axs[2].stem(x, third, markerfmt=' ')
axs[3].stem(x, forth, markerfmt=' ')
print((mfirst/N, msecnd/N, mthird/N, mforth/N))
(24.9267492, 25.0858356, 24.9928602, 24.994555)
NB! As I said, first mean is lower and second is higher. Histograms are a little bit different as well
Ok, Dirichlet, so be it. Lets compute mean values of your generator before and after the filter. Code
import numpy as np

def generate(n=10000):
    # Zeros, two sorted uniform cut points, then ones; row-wise differences sum to 1
    uv = np.hstack([np.zeros([n, 1]),
                    np.sort(np.random.rand(n, 2), axis=1),
                    np.ones([n, 1])])
    return np.diff(uv, axis=1)

a = generate(1000000)
print("Original Dirichlet sample means")
print(np.mean((a[:, 0] * 100).astype(int)))
print(np.mean((a[:, 1] * 100).astype(int)))
print(np.mean((a[:, 2] * 100).astype(int)))
print("\nFiltered Dirichlet sample means")
q = (a[(a[:,0]<=2*a[:,1]) & (a[:,2]>0.35),:] * 100).astype(int)
print(np.mean(q[:, 0]))
print(np.mean(q[:, 1]))
print(np.mean(q[:, 2]))
I've got
Original Dirichlet sample means
(1000000, 3)
Filtered Dirichlet sample means
(281428, 3)
Do you see the difference? As soon as you apply any kind of filter, you alter the distribution. Nothing is uniform anymore
Ok, so I have this solution for my actual question where I generate 9000 triplets of 3 random variables by joining zeros to sorted random tuple arrays and finally ones and then taking their differences as suggested in the answer on SO I mentioned in my original question.
Then I simply filter out the ones that don't match my constraints and plot them.
import numpy as np
import matplotlib.pyplot as plt

S = 100

def generate(n=9000):
    # Zeros, two sorted uniform cut points, then ones; row-wise differences sum to 1
    uv = np.hstack([np.zeros([n, 1]),
                    np.sort(np.random.rand(n, 2), axis=1),
                    np.ones([n, 1])])
    return np.diff(uv, axis=1)

a = generate()

def plotter(a):
    fig = plt.figure(figsize=(10, 10), dpi=100)
    ax = fig.add_subplot(projection='3d')
    surf = ax.scatter(*zip(*a), marker='o', color=a / 100)
    ax.view_init(elev=25., azim=75)
    ax.set_xlabel('$A_1$', fontsize='large', fontweight='bold')
    ax.set_ylabel('$A_2$', fontsize='large', fontweight='bold')
    ax.set_zlabel('$A_3$', fontsize='large', fontweight='bold')
    lim = (0, S)

# Keep only the samples that satisfy the constraints, scaled to sum to S
b = a[(a[:, 0] <= 3.5 * a[:, 1] + 2 * a[:, 2]) &\
      (a[:, 1] >= (a[:, 2])),:] * S
As you can see, the distribution is uniformly distributed over these arbitrary limits on the simplex but I'm still not sure if I could forego throwing away samples that don't adhere to the constraints (work the constraints somehow into the generation process? I'm almost certain now that it can't be done for general {Fi}). This could be useful in the general case if your constraints limit your sampled area to a very small subarea of the entire simplex (since resampling like this means that to sample from the constrained area a you need to sample from the simplex an order of 1/a times).
If someone has an answer to this last question I will be much obliged (will change the selected answer to his).
I have an answer to my question. Under a general set of constraints, what I do is:
1. Sample the constraints in order to estimate s, the size of the constrained area.
2. If s is big enough, generate random samples and throw out those that do not comply with the constraints, as described in my previous answer.
3. Otherwise:
   a. Enumerate the entire simplex.
   b. Apply the constraints to filter out all tuples outside the constrained area.
   c. List the resulting filtered tuples.
   d. When asked to generate, choose uniformly from this result list.
   (note: this is worth my effort only because I'm asked to generate very often)
A combination of these two strategies should cover most cases.
Note: I also had to handle cases where S was a randomly generated parameter (m < S < M) in which case I simply treat it as another random variable constrained between m and M and I generate it together with the rest of the variables and handle it as I described earlier.
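The enumerate-and-filter strategy can be sketched in Python for positive integers with a fixed sum. The function name `constrained_compositions` and the small total of 20 are illustrative assumptions; with a total of 100 and 4 parts the table would hold up to C(99, 3) = 156849 tuples, which is still easy to enumerate once and reuse.

```python
import random
from itertools import combinations

def constrained_compositions(total, parts, constraint):
    """All tuples of `parts` positive integers that sum to `total` and satisfy `constraint`."""
    result = []
    # Stars and bars: choosing parts-1 distinct cut points in 1..total-1 gives one composition
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        x = tuple(hi - lo for lo, hi in zip(bounds, bounds[1:]))
        if constraint(x):
            result.append(x)
    return result

# Build the table once, then every draw is a uniform choice from it
table = constrained_compositions(20, 4, lambda x: x[0] <= 2 * x[1])
rng = random.Random(0)
print(rng.choice(table))
```

Every tuple in the table is equally likely on each draw, so the result is uniform over exactly the tuples that satisfy the constraint, with no rejected samples at generation time.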

Quadratic probing: (f(k) + a*j + b*j^2) % M, How to choose a and b?

If M is prime, how to choose a and b to minimize collisions?
Also in books it is written that to find the empty slot while quadratic probing in (f(k)+j^2) % M, the hash table has to be at least half empty? Can someone provide me a proof of that?
There are some values for choosing a and b on wikipedia:
For prime M > 2, most choices of a and b will make f(k,j) distinct for j in [0,(M − 1) / 2]. Such choices include a = b = 1/2, a = b = 1, and a = 0,b = 1. Because there are only about M/2 distinct probes for a given element, it is difficult to guarantee that insertions will succeed when the load factor is > 1/2.
A proof for the guarantee of finding the empty slots is here or here.
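The distinctness claim is easy to check empirically. The sketch below uses a hypothetical helper `probe_sequence` to list the first (M + 1)/2 probe positions for a prime M and confirm that no slot repeats. For a = 0, b = 1 this follows because i^2 = j^2 (mod M) forces M to divide (i - j)(i + j), which is impossible for 0 <= i < j <= (M - 1)/2.

```python
def probe_sequence(h, m, a=0, b=1):
    """Slots visited by (h + a*j + b*j^2) % m for j = 0 .. (m - 1) / 2."""
    return [(h + a * j + b * j * j) % m for j in range((m - 1) // 2 + 1)]

for m in (7, 11, 13, 101):
    for a, b in ((0, 1), (1, 1)):
        seq = probe_sequence(3, m, a, b)
        print(m, (a, b), len(seq), len(set(seq)) == len(seq))
```

Since these (M + 1)/2 probes hit distinct slots, an insertion must find an empty slot whenever fewer than (M + 1)/2 slots are occupied, which is where the "at least half empty" condition comes from.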
