Generate random natural numbers that sum to a given number and comply with a set of general constraints

I had an application that required something similar to the problem described here.
I too need to generate a set of positive integer random variables {Xi} that add up to a given sum S, where each variable might have constraints such as mi <= Xi <= Mi.
This I know how to do; the problem is that in my case I might also have constraints between the random variables themselves, say Xi <= Fi(Xj) for some given Fi (let's also say Fi's inverse is known). Now, how should one generate the random variables "correctly"? I put correctly in quotes here because I'm not really sure what it would mean here, except that I want the generated numbers to cover all possible cases with as uniform a probability as possible for each possible case.
Say we even look at a very simple case:
4 random variables X1,X2,X3,X4 that need to add up to 100 and comply with the constraint X1 <= 2*X2, what would be the "correct" way to generate them?
P.S. I know that this seems like it would be a better fit for math overflow but I found no solutions there either.

For 4 random variables X1,X2,X3,X4 that need to add up to 100 and comply with the constraint X1 <= 2*X2, one could use the multinomial distribution.
As long as the probability assigned to the first number is low enough, the condition will almost always be satisfied; if not, reject and repeat.
And the multinomial distribution by design has the sum equal to 100.
Code, Windows 10 x64, Python 3.8
import numpy as np

def x1x2x3x4(rng):
    # sample until the constraint X1 <= 2*X2 holds
    while True:
        v = rng.multinomial(100, [0.1, 1/2 - 0.1, 1/4, 1/4])
        if v[0] <= 2*v[1]:
            return v

rng = np.random.default_rng()
print(x1x2x3x4(rng))
print(x1x2x3x4(rng))
print(x1x2x3x4(rng))
UPDATE
There is a lot of freedom in selecting the probabilities. E.g., you could make the other variables (X2, X3, X4) symmetric. Code:
def x1x2x3x4(rng, pfirst = 0.1):
    pother = (1.0 - pfirst)/3.0
    while True:
        v = rng.multinomial(100, [pfirst, pother, pother, pother])
        if v[0] <= 2*v[1]:
            return v
UPDATE II
If you start rejecting combinations, then you artificially bump the probabilities of one subset of events and lower the probabilities of another subset - and the total sum is always 1. There is NO WAY to have uniform probabilities together with the conditions you want to meet. The code below runs the multinomial with equal probabilities and computes histograms and mean values. The mean is supposed to be exactly 25 (=100/4), but as soon as you reject some samples, you lower the mean of the first value and increase the mean of the second value. The difference is small, but UNAVOIDABLE. If that is ok with you, so be it. Code
import numpy as np
import matplotlib.pyplot as plt

def x1x2x3x4(rng, summa, pfirst = 0.1):
    pother = (1.0 - pfirst)/3.0
    while True:
        v = rng.multinomial(summa, [pfirst, pother, pother, pother])
        if v[0] <= 2*v[1]:
            return v

rng = np.random.default_rng()
s = 100
N = 5000000

# histograms
first = np.zeros(s+1)
secnd = np.zeros(s+1)
third = np.zeros(s+1)
forth = np.zeros(s+1)

mfirst = np.float64(0.0)
msecnd = np.float64(0.0)
mthird = np.float64(0.0)
mforth = np.float64(0.0)

for _ in range(0, N): # sampling with equal probabilities
    v = x1x2x3x4(rng, s, 0.25)
    q = v[0]
    mfirst += np.float64(q)
    first[q] += 1.0
    q = v[1]
    msecnd += np.float64(q)
    secnd[q] += 1.0
    q = v[2]
    mthird += np.float64(q)
    third[q] += 1.0
    q = v[3]
    mforth += np.float64(q)
    forth[q] += 1.0

x = np.arange(0, s+1, dtype=np.int32)

fig, axs = plt.subplots(4)
axs[0].stem(x, first, markerfmt=' ')
axs[1].stem(x, secnd, markerfmt=' ')
axs[2].stem(x, third, markerfmt=' ')
axs[3].stem(x, forth, markerfmt=' ')
plt.show()

print((mfirst/N, msecnd/N, mthird/N, mforth/N))
prints
(24.9267492, 25.0858356, 24.9928602, 24.994555)
NB! As I said, the first mean is lower and the second is higher, and the histograms differ slightly as well.
UPDATE III
Ok, Dirichlet, so be it. Let's compute the mean values of your generator before and after the filter. Code
import numpy as np

def generate(n=10000):
    uv = np.hstack([np.zeros([n, 1]),
                    np.sort(np.random.rand(n, 2), axis=1),
                    np.ones([n, 1])])
    return np.diff(uv, axis=1)

a = generate(1000000)

print("Original Dirichlet sample means")
print(a.shape)
print(np.mean((a[:, 0] * 100).astype(int)))
print(np.mean((a[:, 1] * 100).astype(int)))
print(np.mean((a[:, 2] * 100).astype(int)))

print("\nFiltered Dirichlet sample means")
q = (a[(a[:,0] <= 2*a[:,1]) & (a[:,2] > 0.35), :] * 100).astype(int)
print(q.shape)
print(np.mean(q[:, 0]))
print(np.mean(q[:, 1]))
print(np.mean(q[:, 2]))
I've got
Original Dirichlet sample means
(1000000, 3)
32.833758
32.791228
32.88054
Filtered Dirichlet sample means
(281428, 3)
13.912784086871243
28.36360987535
56.23109285501087
Do you see the difference? As soon as you apply any kind of filter, you alter the distribution. Nothing is uniform anymore.

Ok, so I have this solution for my actual question, where I generate 9000 triplets of 3 random variables by prepending a column of zeros to sorted random tuple arrays, appending a column of ones, and then taking their differences, as suggested in the answer on SO I mentioned in my original question.
Then I simply filter out the ones that don't match my constraints and plot them.
import numpy as np
import matplotlib.pyplot as plt

S = 100

def generate(n=9000):
    uv = np.hstack([np.zeros([n, 1]),
                    np.sort(np.random.rand(n, 2), axis=1),
                    np.ones([n, 1])])
    return np.diff(uv, axis=1)

a = generate()

def plotter(a):
    fig = plt.figure(figsize=(10, 10), dpi=100)
    ax = fig.add_subplot(projection='3d')
    surf = ax.scatter(*zip(*a), marker='o', color=a / 100)
    ax.view_init(elev=25., azim=75)
    ax.set_xlabel('$A_1$', fontsize='large', fontweight='bold')
    ax.set_ylabel('$A_2$', fontsize='large', fontweight='bold')
    ax.set_zlabel('$A_3$', fontsize='large', fontweight='bold')
    lim = (0, S)
    ax.set_xlim3d(*lim)
    ax.set_ylim3d(*lim)
    ax.set_zlim3d(*lim)
    plt.show()

b = a[(a[:, 0] <= 3.5 * a[:, 1] + 2 * a[:, 2]) &
      (a[:, 1] >= a[:, 2]), :] * S
plotter(b.astype(int))
As you can see, the distribution is uniform over these arbitrary limits on the simplex, but I'm still not sure whether I could forego throwing away samples that don't adhere to the constraints (that is, work the constraints somehow into the generation process; I'm almost certain now that it can't be done for general {Fi}). This would matter in the general case where the constraints limit the sampled region to a very small fraction a of the entire simplex, since rejection sampling then means that to draw one sample from the constrained area you need on the order of 1/a draws from the simplex.
If someone has an answer to this last question I will be much obliged (and will change the selected answer to theirs).
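In the meantime, the 1/a cost can at least be estimated empirically before committing to rejection sampling. A minimal sketch (my own illustration; the constraint shown is just the X1 <= 2*X2 example):

import numpy as np

def generate(n=100000):
    # same zeros / sorted-uniforms / ones construction as above
    uv = np.hstack([np.zeros([n, 1]),
                    np.sort(np.random.rand(n, 2), axis=1),
                    np.ones([n, 1])])
    return np.diff(uv, axis=1)

a = generate()
accepted = a[:, 0] <= 2 * a[:, 1]   # example constraint X1 <= 2*X2
rate = accepted.mean()
print(rate, "-> expected draws per accepted sample:", 1.0 / rate)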

I have an answer to my question. Under a general set of constraints, what I do is:
Sample the simplex and test the constraints in order to estimate s, the size of the constrained area.
If s is big enough, then generate random samples and throw out those that do not comply with the constraints, as described in my previous answer.
Otherwise:
Enumerate the entire simplex.
Apply the constraints to filter out all tuples outside the constrained area.
List the resulting filtered tuples.
When asked to generate, I generate by choosing uniformly from this result list (a sketch of this branch follows below).
(Note: this is worth my effort only because I'm asked to generate very often.)
A combination of these two strategies should cover most cases.
Note: I also had to handle cases where S was a randomly generated parameter (m < S < M), in which case I simply treat it as another random variable constrained between m and M, generate it together with the rest of the variables, and handle it as described earlier.
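For the simple 4-variable example, the enumeration branch might look like this (a minimal Python sketch of my own, assuming non-negative integers and the X1 <= 2*X2 constraint):

import random

S = 100
# enumerate every non-negative integer 4-tuple that sums to S
candidates = [(x1, x2, x3, S - x1 - x2 - x3)
              for x1 in range(S + 1)
              for x2 in range(S + 1 - x1)
              for x3 in range(S + 1 - x1 - x2)]
# keep only the tuples satisfying the constraint
feasible = [t for t in candidates if t[0] <= 2 * t[1]]
# each generation request is then a uniform choice from this list
print(random.choice(feasible))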

Related

Is there a way to optimize the calculation of Bernoulli Log-Likelihoods for many multivariate samples?

I currently have two Torch Tensors, p and x, which both have the shape of (batch_size, input_size).
I would like to calculate the Bernoulli log likelihoods for the given data, and return a tensor of size (batch_size)
Here's an example of what I'd like to do:
I have the formula for log likelihoods of Bernoulli Random variables:
\sum_{i=1}^{d} x_i \ln(p_i) + (1 - x_i) \ln(1 - p_i)
Say I have p Tensor:
[[0.6 0.4 0], [0.33 0.34 0.33]]
And say I have the x tensor for the binary inputs based on those probabilities:
[[1 1 0], [0 1 1]]
And I want to calculate the log likelihood for every sample, which would result in:
[[ln(0.6)+ln(0.4)], [ln(0.67)+ln(0.34)+ln(0.33)]]
Would it be possible to do this computation without the use of for loops?
I know I could use torch.sum(axis=1) to do the final summation between the logs, but is it possible to do the Bernoulli log-likelihood computation without any for loops, or with at most one? I am trying to vectorize this operation as much as possible. (I could've sworn we could use LaTeX for equations before; did something change, or is it another website?)
Though not a good practice, you can directly use the formula on the tensors as follows (this works because these are element-wise operations):
import torch

p = torch.tensor([
    [0.6, 0.4, 0],
    [0.33, 0.34, 0.33]
])
x = torch.tensor([
    [1., 1, 0],
    [0, 1, 1]
])

eps = 1e-8
bll1 = (x * torch.log(p + eps) + (1 - x) * torch.log(1 - p + eps)).sum(axis=1)
print(bll1)
#tensor([-1.4271162748, -2.5879497528])
Note that to avoid log(0) error, I have introduced a very small constant eps inside it.
A better way to do this is to use BCELoss from the nn module in PyTorch.
import torch.nn as nn
bce = nn.BCELoss(reduction='none')
bll2 = -bce(p, x).sum(axis=1)
print(bll2)
#tensor([-1.4271162748, -2.5879497528])
Since PyTorch computes the BCE as a loss, it prepends your formula with a negative sign. The argument reduction='none' says that I do not want the computed losses to be reduced (averaged/summed) across the batch in any way. This is advisable to use, since we do not need to manually take care of numerical stability and error handling (such as adding eps above).
You can indeed verify that the two solutions actually return the same tensor (up to a tolerance):
torch.allclose(bll1, bll2)
# True
or the tensors (without summing each row):
torch.allclose((x * torch.log(p+eps) + (1-x) * torch.log(1-p+eps)), -bce(p, x))
# True
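For completeness, torch.distributions offers the same computation (a sketch of my own, not from the answer above; note that exact 0/1 probabilities must first be clamped away to avoid -inf/nan):

import torch
from torch.distributions import Bernoulli

# reusing p and x from above; clamp so probabilities stay strictly inside (0, 1)
p_safe = p.clamp(1e-8, 1 - 1e-8)
bll3 = Bernoulli(probs=p_safe).log_prob(x).sum(axis=1)
print(bll3)  # matches bll1 and bll2 up to tolerance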
Feel free to ask for further clarifications.

optiSolve package in R

I'm trying to maximize the portfolio return subject to 5 constraints:
1. A certain level of portfolio risk.
2. The same as above but with opposite sign (I need the risk to be exactly that number).
3. The sum of the weights has to be 1.
4. All the weights must be greater than or equal to zero.
5. All the weights must be at most one.
I'm using the optiSolve package because I didn't find any other package that allows me to write this problem (or at least one that I understood how to use).
I have three big problems here. The first is that the resulting weights vector sums to more than 1; the second is that I can't declare t(w) %*% varcov_matrix %*% w == 0 in the quadratic constraint because it only allows "<="; and finally, I don't know how to put a constraint in place to get only positive weights.
vector_de_retornos <- rnorm(5)
matriz_de_varcov <- matrix(rnorm(25), ncol = 5)
library(optiSolve)
restriccion1 <- quadcon(Q = matriz_de_varcov, dir = "<=", val = 0.04237972)

restriccion1_neg <- quadcon(Q = -matriz_de_varcov, dir = "<=",
                            val = -mean(limite_inf, limite_sup))

restriccion2 <- lincon(t(vector_de_retornos),
                       d = rep(0, nrow(t(vector_de_retornos))),
                       dir = rep("==", nrow(t(vector_de_retornos))),
                       val = rep(1, nrow(t(vector_de_retornos))),
                       id = 1:ncol(t(vector_de_retornos)),
                       name = nrow(t(vector_de_retornos)))

restriccion_nonnegativa <- lbcon(rep(0, length(vector_de_retornos)))
restriccion_positiva <- ubcon(rep(1, length(vector_de_retornos)))

funcion_lineal <- linfun(vector_de_retornos, name = "lin.fun")

funcion_obj <- cop(funcion_lineal, max = T, ub = restriccion_positiva,
                   lc = restriccion2, lb = restriccion_nonnegativa, restriccion1,
                   restriccion1_neg)

porfavor_funciona <- solvecop(funcion_obj, solver = "alabama")
> porfavor_funciona$x
1 2 3 4 5
-3.243313e-09 -4.709673e-09 9.741379e-01 3.689040e-01 -1.685290e-09
> sum(porfavor_funciona$x)
[1] 1.343042
Does anyone know how to solve this maximization problem with all the constraints mentioned above, or can you tell me what I'm doing wrong? I'd really appreciate it, because the result seems not to be taking the constraints into account. Thanks!
Your restriccion2 makes the weighted sum of x equal to 1. If you also want to ensure the regular sum of x is 1, you can modify the constraint as follows:
restriccion2 <- lincon(rbind(t(vector_de_retornos),
                             # make a second row of coefficients in the A matrix
                             t(rep(1, length(vector_de_retornos)))),
                       d = rep(0, 2),       # the scalar value for both constraints is 0
                       dir = rep('==', 2),  # the direction for both constraints is '=='
                       val = rep(1, 2),     # the rhs value for both constraints is 1
                       id = 1:ncol(t(vector_de_retornos)),  # the number of columns is the same as before
                       name = 1:2)
If you only want the regular sum to be 1 and not the weighted sum you can replace your first parameter in the lincon function as you've defined it to be t(rep(1,length(vector_de_retornos))) and that will just constrain the regular sum of x to be 1.
To make an equality constraint using only inequalities, you need the same constraint twice but with opposite signs on the coefficients and right-hand-side values (for example: 2x <= 4 and -2x <= -4 combine to make the constraint 2x == 4). In your edit above, you provide a different value to the val parameter, so these two constraints won't combine to make the equality constraint unless they match except for opposite signs, as below.
restriccion1_neg <- quadcon(Q = -matriz_de_varcov, dir = "<=", val = -0.04237972)
I'm not certain because I can't find precision information in the package documentation, but those "negative" values in the x vector are probably due to rounding. They are so small that they are effectively 0, so I think the non-negativity constraint is functioning properly.
restriccion_nonnegativa <- lbcon(rep(0,length(vector_de_retornos)))
A constraint of the form
x'Qx = a
is non-convex. (More generally: any nonlinear equality constraint is non-convex.) Non-convex problems are much more difficult to solve than convex ones and require specialized global solvers. For convex problems there are quite a few solvers available; this is not the case for non-convex problems. Most portfolio models are formulated as convex QP (quadratic programming, i.e. risk -- the quadratic term -- is in the objective) or convex QCP/SOCP problems (quadratic terms in the constraints, but in a convex fashion). So, the constraint
x'Qx <= a
is easy (convex), as long as Q is positive semi-definite. Rewriting x'Qx = a as
x'Qx <= a
-x'Qx <= -a
unfortunately does not make the non-convexity go away, as -Q is not PSD. If we are maximizing return, we usually only use x'Qx <= a to limit the risk and forget about the >= part. Even more popular is to put both the return and the risk in the objective (that is the standard mean-variance portfolio model).
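For illustration, a minimal sketch of the convex variant (risk capped above only) in Python with cvxpy; this is my own example with made-up data, not optiSolve code, and it presumes a PSD covariance matrix:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
mu = rng.normal(size=5)        # expected returns
A = rng.normal(size=(5, 5))
Q = A @ A.T                    # a PSD covariance matrix
w0 = np.ones(5) / 5
risk_cap = float(w0 @ Q @ w0)  # cap chosen so equal weights are feasible

w = cp.Variable(5)
prob = cp.Problem(cp.Maximize(mu @ w),
                  [cp.quad_form(w, Q) <= risk_cap,  # convex risk bound
                   cp.sum(w) == 1,                  # weights sum to 1
                   w >= 0, w <= 1])                 # box constraints
prob.solve()
print(w.value, w.value.sum())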
A possible solver for solving non-convex quadratic problems under R is Gurobi.

Draw random numbers from distribution within a certain range

I want to draw a number of random variables from a series of distributions. However, the values returned have to be no higher than a certain threshold.
Let's say I want to use the gamma distribution, the threshold is 10, and I need n = 100 random numbers, i.e., 100 random numbers between 0 and 10. (Say scale and shape are 1.)
Getting 100 random variables is obviously easy...
rgamma(100, shape = 1, rate = 1)
But how can I accomplish that these values range from 0 to 10?
EDIT
To make my question clearer: the 100 values drawn should be scaled between 0 and 10, so that the highest drawn value is 10 and the lowest 0. Sorry if this was not clear...
EDIT No2
To add some context to the random numbers I need: I want to draw "system repair times" that follow certain distributions. However, within the system simulation there is a binomial probability of repairs being "simple" (i.e. short repair time) or "complicated" (i.e. long repair time). I now need a function that provides "short repair times" and one that provides "long repair times"; the threshold would be the differentiation between short and long repair times. Again, I hope this makes my question a little clearer.
This is not possible with a gamma distribution.
The support of a distribution determines the range of sample data drawn from it.
As the support of the gamma distribution is (0, inf), this is not possible (see https://en.wikipedia.org/wiki/Gamma_distribution).
If you really want to have a gamma distribution, take a rejection sampling approach as Alex Reynolds suggests.
Otherwise look for a distribution with a bounded/finite support (see https://en.wikipedia.org/wiki/List_of_probability_distributions),
e.g. uniform or binomial.
Well, fill the vector with rejection; untested code:
v <- rep(-1.0, 100)
k <- 1
while (TRUE) {
    q <- rgamma(1, shape=1, rate=1)
    if (q > 0.0 && q < 10) {  # accept only draws below the threshold of 10
        v[k] <- q
        k <- k + 1
        if (k > 100)
            break
    }
}
I'm not sure you can keep the properties of the original distribution while imposing additional conditions... But something like this will do the job:
Filter(function(x) x < 10, rgamma(1000,1,1))[1:100]
For the scaling - beware, the outcome will not follow the original distribution (but there's no way to do that, as the other answers pointed out):
# rescale numeric vector into (0, 1) interval
# clip everything outside the range
rescale <- function(vec, lims=range(vec), clip=c(0, 1)) {
    # find the coefficients of the transforming linear equation
    # that maps the lims range to (0, 1)
    slope <- (1 - 0) / (lims[2] - lims[1])
    intercept <- -slope * lims[1]
    xformed <- slope * vec + intercept
    # do the clipping
    xformed[xformed < 0] <- clip[1]
    xformed[xformed > 1] <- clip[2]
    xformed
}

# this is the requested data
10 * rescale(rgamma(100, 1, 1))
Use the truncdist package. It truncates any distribution between given lower and upper bounds.
Hope that helped.
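Truncated sampling like this can also be done by hand with the inverse-CDF trick: draw uniforms between F(lower) and F(upper) and map them through the quantile function. A minimal Python sketch with scipy (my own illustration, not the truncdist internals):

import numpy as np
from scipy.stats import gamma

def rtrunc_gamma(n, shape, rate, lo=0.0, hi=10.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    scale = 1.0 / rate
    # map the bounds into CDF space, sample uniformly there,
    # then invert back with the quantile function (ppf)
    u = rng.uniform(gamma.cdf(lo, shape, scale=scale),
                    gamma.cdf(hi, shape, scale=scale), size=n)
    return gamma.ppf(u, shape, scale=scale)

print(rtrunc_gamma(5, shape=1, rate=1))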

On average, how many times will this incorrect loop iterate?

In some cases, a loop needs to run for a random number of iterations that ranges from min to max, inclusive. One working solution is to do something like this:
int numIterations = randomInteger(min, max);
for (int i = 0; i < numIterations; i++) {
    /* ... fun and exciting things! ... */
}
A common mistake that many beginning programmers make is to do this:
for (int i = 0; i < randomInteger(min, max); i++) {
    /* ... fun and exciting things! ... */
}
This recomputes the loop upper bound on each iteration.
I suspect that this does not give a uniform distribution of the number of times the loop will iterate that ranges from min to max, but I'm not sure exactly what distribution you do get when you do something like this. Does anyone know what the distribution of the number of loop iterations will be?
As a specific example: suppose that min = 0 and max = 2. Then there are the following possibilities:
When i = 0, the random value is 0. The loop runs 0 times.
When i = 0, the random value is nonzero. Then:
When i = 1, the random value is 0 or 1. Then the loop runs 1 time.
When i = 1, the random value is 2. Then the loop runs 2 times.
The probability of the first event is 1/3. The second event has probability 2/3, and within it, the first subcase has probability 2/3 and the second subcase has probability 1/3. Therefore, the average number of iterations is
0 × 1/3 + 1 × 2/3 × 2/3 + 2 × 2/3 × 1/3
= 0 + 4/9 + 4/9
= 8/9
Note that if the distribution were indeed uniform, we'd expect to get 1 loop iteration, but now we only get 8/9 on average. My question is whether it's possible to generalize this result to get a more exact value on the number of iterations.
Thanks!
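A quick Monte Carlo sketch (my own, modeling randomInteger with Python's random.randint) confirms the 8/9 figure:

import random

def loop_iterations(lo, hi):
    # the buggy loop: the bound is re-drawn on every test
    i = 0
    while i < random.randint(lo, hi):
        i += 1
    return i

N = 200000
print(sum(loop_iterations(0, 2) for _ in range(N)) / N)  # ~0.889, i.e. 8/9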
Final edit (maybe!). I'm 95% sure that this isn't one of the standard distributions that are appropriate. I've put what the distribution is at the bottom of this post, as I think the code that gives the probabilities is more readable! A plot for the mean number of iterations against max is given below.
Interestingly, the number of iterations tails off as you increase max. Would be interesting if someone else could confirm this with their code.
If I were to start modelling this, I would start with the geometric distribution, and try to modify that. Essentially we're looking at a discrete, bounded distribution. So we have zero or more "failures" (not meeting the stopping condition), followed by one "success". The catch here, compared to the geometric or Poisson, is that the probability of success changes (also, like the Poisson, the geometric distribution is unbounded, but I think structurally the geometric is a good base). Assuming min=0, the basic mathematical form for P(X=k), 0 <= k <= max, where k is the number of iterations the loop runs, is, like the geometric distribution, the product of k failure terms and 1 success term, corresponding to k "false"s on the loop condition and 1 "true". (Note that this holds even to calculate the last probability, as the chance of stopping is then 1, which obviously makes no difference to a product).
Following on from this, an attempt to implement this in code, in R, looks like this:
fx = function(k, maximum)
{
    n = maximum + 1;
    failure = factorial(n-1) / factorial(n-1-k) / n^k;
    success = (k+1) / n;
    failure * success
}
This assumes min=0, but generalizing to arbitrary mins isn't difficult (see my comment on the OP). To explain the code: first, as shown by the OP, the probabilities all have (max+1) as a denominator, so we calculate the denominator, n. Next, we calculate the product of the failure terms. Here factorial(n-1)/factorial(n-1-k) means, for example, for max=2, n=3 and k=2: 2*1. And it generalises to give you (n-1)(n-2)... for the total probability of failure. The probability of success increases as you get further into the loop, until finally, when k=maximum, it is 1.
Plotting this analytic formula gives the same results as the OP, and the same shape as the simulation plotted by John Kugelman.
Incidentally, the R code to do this is as follows:
plot_probability_mass_function = function(maximum)
{
    x = 0:maximum;
    barplot(fx(x, max(x)), names.arg=x, main=paste("max", maximum), ylab="P(X=x)");
}

par(mfrow=c(3,1))
plot_probability_mass_function(2)
plot_probability_mass_function(10)
plot_probability_mass_function(100)
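As a quick cross-check, a direct Python port of fx (assuming min = 0) reproduces the OP's 8/9 for max = 2:

from math import factorial

def fx(k, maximum):
    # probability that the buggy loop runs exactly k times (min = 0)
    n = maximum + 1
    failure = factorial(n - 1) / factorial(n - 1 - k) / n**k
    success = (k + 1) / n
    return failure * success

print(sum(k * fx(k, 2) for k in range(3)))  # 0.888... = 8/9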
Mathematically, the distribution is, if I've got my maths right, given by:
P(X = k) = (m! / (m-k)!) * (1 / (m+1)^k) * ((k+1) / (m+1))
which simplifies to
P(X = k) = m! * (k+1) / ((m-k)! * (m+1)^(k+1))
(thanks a bunch to http://www.codecogs.com/latex/eqneditor.php)
The latter is given by the R function (named f here so the code below can call it):
f = function(x, m) { factorial(m) * (x+1) / (factorial(m-x) * (m+1)^(x+1)) }
Plotting the mean number of iterations is done like this in R:
meanf = function(maximum)
{
    x = 0:maximum
    probs = f(x, maximum)
    x %*% probs
}
par(mfrow=c(2,1))
max_range = 1:10
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
max_range = 1:100
plot(sapply(max_range, meanf) ~ max_range, ylab="Mean number of iterations", xlab="max")
Here are some concrete results I plotted with matplotlib. The X axis is the value i reached. The Y axis is the number of times that value was reached.
The distribution is clearly not uniform. I don't know what distribution it is offhand; my statistics knowledge is quite rusty.
1. min = 10, max = 20, iterations = 100,000 (histogram plot omitted)
2. min = 100, max = 200, iterations = 100,000 (histogram plot omitted)
I believe that, given a sufficient number of executions, it would still conform to the distribution of the randomInteger function.
But this is probably a question better suited for Mathematics Stack Exchange.
I don’t know the math behind it, but I know how to compute it! In Haskell:
import Numeric.Probability.Distribution

iterations min max = iteration 0
  where
    iteration i = do
      x <- uniform [min..max]
      if i < x
        then iteration (i + 1)
        else return i
Now expected (iterations 0 2) gives you the expected value of ~0.89. Maybe someone with the requisite math knowledge can explain what I’m actually doing here. Because you start at 0, the loop will always run at least min times.

number of possible combinations in a partitioning

Given is a set S of size n, which is partitioned into classes (s1,..,sk) of sizes n1,..,nk. Naturally, it holds that n = n1+...+nk.
I am interested in finding out the number of ways in which I can combine elements of this partitioning so that each combination contains exactly one element of each class.
Since I can choose n1 elements from s1, n2 elements from s2 and so on, I am looking for the solution to max(n1*..*nk) for arbitrary n1,..nk for which it holds that n1+..+nk=n.
I have the feeling that this is a linear-optimization problem, but it's been too long since I learned this stuff as an undergrad. I hope that somebody remembers how to compute this.
You're looking for the number of combinations with one element from each partition?
That's simply n1*n2*...*nk.
Edit:
You seem to also be asking a separate question:
Given N, how do I assign n1, n2, ..., nk such that their product is maximized. This is not actually a linear optimization problem, since your variables are multiplied together.
It can be solved by some calculus, i.e. by taking partial derivatives in each of the variables, with the constraint, using Lagrange multipliers.
The result will be that the n1 .. nk should be as close to the same size as possible.
If n is a multiple of k, then n_1 = n_2 = ... = n_k = n/k;
otherwise, n_1 = n_2 = ... = n_j = ceil(n/k)
and n_{j+1} = ... = n_k = floor(n/k).
Basically, we try to distribute the elements as evenly as possible into partitions. If they divide evenly, great. If not, we divide as evenly as possible, and with whatever is left over, we give an extra element each to the first partitions. (Doesn't have to be the first partitions, that choice is fairly arbitrary.) In this way, the difference in the number of elements owned by any two partitions will be at most one.
Gory Details:
This is the product function which we wish to maximize:
P = n_1 * n_2 * ... * n_k
We define a new function using a Lagrange multiplier l:
Lambda = P + l * (N - n_1 - n_2 - ... - n_k)
and take partial derivatives in each of the k variables n_i:
dLambda/dn_i = P/n_i - l
and in l:
dLambda/dl = N - n_1 - n_2 - ... - n_k
Setting all of the partial derivatives to 0, we get a system of k + 1 equations, and when we solve it, we get n_1 = n_2 = ... = n_k.
Some useful links:
Lagrange Multipliers
Optimization
floor(n/k)^(k - n mod k)*ceil(n/k)^(n mod k)
-- MarkusQ
P.S. For the example you gave of S = {1,2,3,4}, n = 4, k = 2 this gives:
floor(4/2)^(2 - 4 mod 2)*ceil(4/2)^(4 mod 2)
floor(2)^(2 - 0)*ceil(2)^(0)
2^2 * 2^0
4 * 1
4
...as you wanted.
To clarify, this formula gives the number of combinations generated by the partitioning with the maximum possible number of combinations. There will of course be other, less optimal partitionings.
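A short Python sketch of this formula (my own illustration), checked against the n = 4, k = 2 example:

def max_combinations(n, k):
    # distribute n elements as evenly as possible over k classes,
    # then take the product of the class sizes:
    # floor(n/k)^(k - n mod k) * ceil(n/k)^(n mod k)
    q, r = divmod(n, k)
    return q**(k - r) * (q + 1)**r

print(max_combinations(4, 2))  # 4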
For a given perimeter, the rectangle with the largest area is the one that is closest to a square (and the same is true in higher dimensions), which means you want the sides to be as close to equal in length as possible (e.g. all either the average length rounded up or down). The formula can then be seen to be:
(length of short sides)^(number of short sides)
times
(length of long sides)^(number of long sides)
which is just the volume of the hyper-rectangle meeting this constraint.
Note that, when viewed this way, it also tells you how to construct a maximal partitioning.
