Is there a function f(n) that returns the n:th combination in an ordered list of combinations without repetition? - math

Combinations without repetitions look like this, when the number of elements to choose from (n) is 5 and elements chosen (r) is 3:
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
As n and r grow, the number of combinations gets large pretty quickly. For (n,r) = (200,4) the number of combinations is 64684950.
It is easy to iterate the list with r nested for-loops, where the initial iterating value of each for loop is greater than the current iterating value of the for loop in which it is nested, as in this dotnetfiddle example:
https://dotnetfiddle.net/wHWK5o
What I would like is a function that calculates only one combination based on its index. Something like this:
tuple combination(i, n, r) {
    return [combination with index i, when the number of elements to choose from is n and elements chosen is r]
}
Does anyone know if this is doable?

You would first need to impose some sort of ordering on the set of all combinations available for a given n and r, such that a linear index makes sense. I suggest we agree to keep our combinations in increasing order (or, at least, the indices of the individual elements), as in your example. How then can we go from a linear index to a combination?
Let us first build some intuition for the problem. Suppose we have n = 5 (e.g. the set {0, 1, 2, 3, 4}) and r = 3. How many unique combinations are there in this case? The answer is of course 5-choose-3, which evaluates to 10. Since we will sort our combinations in increasing order, consider for a minute how many combinations remain once we have exhausted all those starting with 0. This must be 4-choose-3, or 4 in total. In such a case, if we are looking for the combination at index 7 initially, this implies we must subtract 10 - 4 = 6 and search for the combination at index 1 in the set {1, 2, 3, 4}. This process continues until we find a new index that is smaller than this offset.
Once this process concludes, we know the first digit. Then we only need to determine the remaining r - 1 digits! The algorithm thus takes shape as follows (in Python, but this should not be too difficult to translate),
from math import factorial
def choose(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))
def combination_at_idx(idx, elems, r):
    if len(elems) == r:
        # We are looking for r elements in a list of size r - thus, we need
        # each element.
        return elems
    if len(elems) == 0 or len(elems) < r:
        return []
    combinations = choose(len(elems), r)  # total number of combinations
    remains = choose(len(elems) - 1, r)  # combinations after selection
    offset = combinations - remains
    if idx >= offset:  # combination does not start with first element
        return combination_at_idx(idx - offset, elems[1:], r)
    # We now know the first element of the combination, but *not* yet the next
    # r - 1 elements. These need to be computed as well, again recursively.
    return [elems[0]] + combination_at_idx(idx, elems[1:], r - 1)
Test-driving this with your initial input,
N = 5
R = 3
for idx in range(choose(N, R)):
    print(idx, combination_at_idx(idx, list(range(N)), R))
I find,
0 [0, 1, 2]
1 [0, 1, 3]
2 [0, 1, 4]
3 [0, 2, 3]
4 [0, 2, 4]
5 [0, 3, 4]
6 [1, 2, 3]
7 [1, 2, 4]
8 [1, 3, 4]
9 [2, 3, 4]
Where the linear index is zero-based.
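The same block-skipping idea can also be written without recursion or list slicing. The following is only a minimal sketch, assuming Python 3.8+ for math.comb; the name unrank_combination is illustrative and not part of the answer above. It walks the candidate values in increasing order and subtracts whole blocks of combinations until the block containing the index is found.

```python
from itertools import combinations
from math import comb


def unrank_combination(i, n, r):
    """Return the i-th (zero-based) r-combination of range(n), lexicographic order."""
    if not 0 <= i < comb(n, r):
        raise IndexError("combination index out of range")
    result = []
    candidate = 0
    for remaining in range(r, 0, -1):
        while True:
            # Number of combinations that put `candidate` in the current slot.
            block = comb(n - candidate - 1, remaining - 1)
            if i < block:
                break
            i -= block  # skip the whole block and try the next value
            candidate += 1
        result.append(candidate)
        candidate += 1  # later elements must be strictly larger
    return tuple(result)


if __name__ == "__main__":
    for idx, expected in enumerate(combinations(range(5), 3)):
        print(idx, unrank_combination(idx, 5, 3), expected)
```

Because it only subtracts binomial coefficients, the sketch never materialises the list of combinations, which matters for cases like (n, r) = (200, 4).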

Start with the first element of the result. The value of that element depends on the number of combinations you can get with smaller elements. For each such smaller first element, the number of combinations with first element k is n − k − 1 choose r − 1, with potentially some off-by-one corrections. So you would sum over a bunch of binomial coefficients. Wolfram Alpha can help you compute such a sum, but the result still has a binomial coefficient in it. Solving for the largest k such that the sum doesn't exceed your given index i is a computation you can't do with something as simple as e.g. a square root. You need a loop to test possible values, e.g. like this:
def first_naive(i, n, r):
    """Find first element and index of first combination with that first element.
    Returns a tuple of value and index.
    Example: first_naive(8, 5, 3) returns (1, 6) because the combination with
    index 8 is [1, 3, 4] so it starts with 1, and because the first combination
    that starts with 1 is [1, 2, 3] which has index 6.
    """
    s1 = 0
    for k in range(n):
        s2 = s1 + choose(n - k - 1, r - 1)
        if i < s2:
            return k, s1
        s1 = s2
You can reduce the O(n) loop iterations to O(log n) steps using bisection, which is particularly relevant for large n. In that case I find it easier to think about numbering items from the end of your list. In the case of n = 5 and r = 3 you get choose(2, 2)=1 combinations starting with 2, choose(3,2)=3 combinations starting with 1 and choose(4,2)=6 combinations starting with 0. So in the general choose(n,r) binomial coefficient you increase the n with each step, and keep the r. Taking into account that sum(choose(k,r) for k in range(r,n+1)) can be simplified to choose(n+1,r+1), you can eventually come up with bisection conditions like the following:
def first_bisect(i, n, r):
    nCr = choose(n, r)
    k1 = r - 1
    s1 = nCr
    k2 = n
    s2 = 0
    while k2 - k1 > 1:
        k3 = (k1 + k2) // 2
        s3 = nCr - choose(k3, r)
        if s3 <= i:
            k2, s2 = k3, s3
        else:
            k1, s1 = k3, s3
    return n - k2, s2
Once you know the first element to be k, you also know the index of the first combination with that same first element (also returned from my function above). You can use the difference between that first index and your actual index as input to a recursive call. The recursive call would be for r − 1 elements chosen from n − k − 1. And you'd add k + 1 to each element from the recursive call, since the top level returns values starting at 0 while the next element has to be greater than k in order to avoid duplication.
def combination(i, n, r):
    """Compute combination with a given index.
    Equivalent to list(itertools.combinations(range(n), r))[i].
    Each combination is represented as a tuple of ascending elements, and
    combinations are ordered lexicographically.
    Args:
        i: zero-based index of the combination
        n: number of possible values, will be taken from range(n)
        r: number of elements in result tuple
    """
    if r == 0:
        return ()  # empty tuple, consistent with the tuple returned below
    k, ik = first_bisect(i, n, r)
    return tuple([k] + [j + k + 1 for j in combination(i - ik, n - k - 1, r - 1)])
I've got a complete working example, including an implementation of choose, more detailed doc strings and tests for some basic assumptions.
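The quantity nCr - choose(k3, r) used in the bisection has a direct reading: it is the index of the first combination whose leading element is n - k3, because choose(k3, r) combinations use only the last k3 values. The sketch below spot-checks that reading and the summation identity mentioned above. It assumes Python 3.8+ for math.comb, and the helper names are hypothetical.

```python
from itertools import combinations
from math import comb


def first_index_with_leading(k, n, r):
    """Index of the first r-combination of range(n) whose first element is k."""
    # comb(n - k, r) combinations use only the values k, k + 1, ..., n - 1;
    # everything before them in lexicographic order starts with a value < k.
    return comb(n, r) - comb(n - k, r)


def summation_identity_holds(n, r):
    """Check sum(C(k, r) for k = r..n) == C(n + 1, r + 1)."""
    return sum(comb(k, r) for k in range(r, n + 1)) == comb(n + 1, r + 1)


if __name__ == "__main__":
    all_combos = list(combinations(range(5), 3))
    for k in range(3):
        brute = next(i for i, c in enumerate(all_combos) if c[0] == k)
        print(k, first_index_with_leading(k, 5, 3), brute)
    print(summation_identity_holds(10, 4))
```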

Related

Concatenation of binary representation of first n positive integers in O(logn) time complexity

I came across this question in a coding competition. Given a number n, concatenate the binary representations of the first n positive integers and return the decimal value of the resulting number. Since the answer can be large, return the answer modulo 10^9+7.
n can be as large as 10^9.
E.g. n=4: number formed = 11011100 (1=1, 10=2, 11=3, 100=4). Decimal value of 11011100 = 220.
I found a Stack Overflow answer to this question, but the problem is that it only contains an O(n) solution.
Link: concatenate binary of first N integers and return decimal value
Since n can be up to 10^9, we need to come up with a solution that is better than O(n).
Here's some Python code that provides a fast solution; it uses the same ideas as in Abhinav Mathur's post. It requires Python >= 3.8, but it doesn't use anything particularly fancy from Python, and could easily be translated into another language. You'd need to write algorithms for modular exponentiation and modular inverse if they're not already available in the target language.
First, for testing purposes, let's define the slow and obvious version:
# Modulus that results are reduced by.
M = 10 ** 9 + 7
def slow_binary_concat(n):
    """
    Concatenate binary representations of 1 through n (inclusive).
    Reinterpret the resulting binary string as an integer.
    """
    # range(n + 1) also includes 0; it only contributes a leading "0",
    # which does not change the value of the resulting integer.
    concatenation = "".join(format(k, "b") for k in range(n + 1))
    return int(concatenation, 2) % M
Checking that we get the expected result:
>>> slow_binary_concat(4)
220
>>> slow_binary_concat(10)
462911642
Now we'll write a faster version. First, we split the range [1, n) into subintervals such that within each subinterval, all numbers have the same length in binary. For example, the range [1, 10) would be split into four subintervals: [1, 2), [2, 4), [4, 8) and [8, 10). Here's a function to do that splitting:
def split_by_bit_length(n):
    """
    Split the numbers in [1, n) by bit-length.
    Produces triples (a, b, 2**k). Each triple represents a subinterval
    [a, b) of [1, n), with a < b, all of whose elements have bit-length k.
    """
    a = 1
    while n > a:
        b = 2 * a
        yield (a, min(n, b), b)
        a = b
Example output:
>>> list(split_by_bit_length(10))
[(1, 2, 2), (2, 4, 4), (4, 8, 8), (8, 10, 16)]
Now for each subinterval, the value of the concatenation of all numbers in that subinterval is represented by a fairly simple mathematical sum, which can be computed in exact form. Here's a function to compute that sum modulo M:
def subinterval_concat(a, b, l):
    """
    Concatenation of values in [a, b), all of which have the same bit-length k.
    l is 2**k.
    Equivalently, sum(i * l**(b - 1 - i) for i in range(a, b)) modulo M.
    """
    n = b - a
    inv = pow(l - 1, -1, M)  # modular inverse of l - 1 (Python >= 3.8)
    q = (pow(l, n, M) - 1) * inv
    return (a * q + (q - n) * inv) % M
I won't go into the evaluation of the sum here: it's a bit off-topic for this site, and it's hard to express without a good way to render formulas. If you want the details, that's a topic for https://math.stackexchange.com, or a page of fairly simple algebra.
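For readers who do want the algebra, here is one possible derivation, given as a sketch rather than as the author's own working. It uses the same names as subinterval_concat: n = b - a, and q is the geometric sum (l^n - 1)/(l - 1). In the code every division by l - 1 becomes a multiplication by the modular inverse inv, which assumes l - 1 is invertible modulo M.

```latex
\[
\begin{aligned}
S &= \sum_{i=a}^{b-1} i\, l^{\,b-1-i}
   = \sum_{j=0}^{n-1} (a+j)\, l^{\,n-1-j}
   = a\,q + T,
\qquad
q = \sum_{m=0}^{n-1} l^{m} = \frac{l^{n}-1}{l-1},
\qquad
T = \sum_{j=0}^{n-1} j\, l^{\,n-1-j}, \\[4pt]
(l-1)\,T &= \sum_{j=0}^{n-2} (j+1)\, l^{\,n-1-j} \;-\; \sum_{j=0}^{n-1} j\, l^{\,n-1-j}
          = \sum_{j=0}^{n-2} l^{\,n-1-j} \;-\; (n-1) \\
         &= (q-1) - (n-1) = q - n, \\[4pt]
S &= a\,q + \frac{q-n}{l-1}.
\end{aligned}
\]
```

The first sum in the second line is l·T after shifting the index by one, so the two sums cancel term by term except for one power of l per index and the final term (n - 1)·l^0.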
Finally, we want to put all the intervals together. Here's a function to do that.
def fast_binary_concat(n):
    """
    Fast version of slow_binary_concat.
    """
    acc = 0
    for a, b, l in split_by_bit_length(n + 1):
        acc = (acc * pow(l, b - a, M) + subinterval_concat(a, b, l)) % M
    return acc
A comparison with the slow version shows that we get the same results:
>>> fast_binary_concat(4)
220
>>> fast_binary_concat(10)
462911642
But the fast version can easily be evaluated for much larger inputs, where using the slow version would be infeasible:
>>> fast_binary_concat(10**9)
827129560
>>> fast_binary_concat(10**18)
945204784
You just have to note a simple pattern. Taking up your example for n=4, let's gradually build the solution starting from n=1.
1 -> 1 #1
2 -> 2^2(1) + 2 #6
3 -> 2^2[2^2(1)+2] + 3 #27
4 -> 2^3{2^2[2^2(1)+2]+3} + 4 #220
If you expand the coefficients of each term for n=4, you'll get the coefficients as:
1 -> (2^3)*(2^2)*(2^2)
2 -> (2^3)*(2^2)
3 -> (2^3)
4 -> (2^0)
Let the N be total number of bits in the string representation of our required number, and D(x) be the number of bits in x. The coefficients can then be written as
1 -> 2^(N-D(1))
2 -> 2^(N-D(1)-D(2))
3 -> 2^(N-D(1)-D(2)-D(3))
... and so on
Since the value of D(x) is the same for all x in the range [2^t, 2^(t+1)-1] for a given t, you can break the problem into such ranges and solve each range using mathematics (not iteration). Since the number of such ranges is about log2(n), this should work within the given time limit.
As an example, the various ranges become:
1. 1 (D(x) = 1)
2. 2-3 (D(x) = 2)
3. 4-7 (D(x) = 3)
4. 8-15 (D(x) = 4)
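The answer leaves the per-range mathematics open. One way to fill it in, which is a substitution and not necessarily what the author had in mind, is matrix exponentiation: within a range of bit length d the update "acc = acc * 2^d + x, then x = x + 1" is a fixed linear map on the vector (acc, x, 1), so applying it many times reduces to raising a 3x3 matrix to a power. The names MOD, mat_mul, mat_pow and binary_concat_mod below are illustrative.

```python
MOD = 10**9 + 7


def mat_mul(A, B):
    """Multiply two 3x3 matrices modulo MOD."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) % MOD for j in range(3)]
            for i in range(3)]


def mat_pow(A, e):
    """Raise a 3x3 matrix to the power e by repeated squaring."""
    result = [[int(i == j) for j in range(3)] for i in range(3)]
    while e:
        if e & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        e >>= 1
    return result


def binary_concat_mod(n):
    """Concatenate the binary forms of 1..n and return the value modulo MOD."""
    acc = 0
    d = 1  # current bit length D(x)
    while (1 << (d - 1)) <= n:
        lo = 1 << (d - 1)              # first number with d bits
        hi = min(n, (1 << d) - 1)      # last number with d bits that is <= n
        # One step maps (acc, x, 1) to (acc * 2^d + x, x + 1, 1).
        step = [[pow(2, d, MOD), 1, 0],
                [0, 1, 1],
                [0, 0, 1]]
        p = mat_pow(step, hi - lo + 1)
        acc = (p[0][0] * acc + p[0][1] * lo + p[0][2]) % MOD
        d += 1
    return acc


if __name__ == "__main__":
    print(binary_concat_mod(4))  # the worked example above gives 220
```

Each range costs a logarithmic number of 3x3 multiplications, and there are about log2(n) ranges.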

How do I make 100 = 1? (explanation within)

Right now I have a code that can find the number of combinations of a sum of a value using numbers greater than zero and less than the value.
I need to alter the value in order to expand the combinations so that they include more than just the value.
For example:
The number 10 yields the results:
[1, 2, 3, 4], [1, 2, 7],
[1, 3, 6], [1, 4, 5],
[1, 9], [2, 3, 5], [2, 8],
[3, 7], [4, 6]
But I need to expand this to include any number that collapses to 1 as well. In essence, I need 100 = n in the sense that the sum of the individual digits of the number equals n. So in this case 100 = 1 because 100 --> 1+0+0 = 1
Therefore the number 1999 will also be a valid combination to list for value = 100 because 1999 = 1+9+9+9 = 28, and 28 = 2+8 = 10, and 10 = 1+0 = 1
Now I realize that this will yield an infinite series of combinations, so I will need to set limits to the range I want to acquire data for. This is the current code I am using to find my combinations.
def a(lst, target, with_replacement=False):
    def _a(idx, l, r, t, w):
        if t == sum(l): r.append(l)
        elif t < sum(l): return
        for u in range(idx, len(lst)):
            _a(u if w else (u + 1), l + [lst[u]], r, t, w)
        return r
    return _a(0, [], [], target, with_replacement)
for val in range(100,101):
    s = range(1, val)
    solutions = a(s, val)
    print(solutions)
    print('Value:', val, "Combinations", len(solutions))
You seem to have multiple issues.
To repeatedly add the decimal digits of an integer until you end with a single digit, you could use this code.
d = val
while d > 9:
d = sum(int(c) for c in str(d))
This acts in just the way you describe. However, there is an easier way. Repeatedly adding the decimal digits of a number is called casting out nines and results in the digital root of the number. This almost equals the remainder of the number when divided by nine, except that you want to get a result of 9 rather than 0. So easier and faster code is
d = val % 9
if d == 0:
    d = 9
or perhaps the shorter but trickier
d = (val - 1) % 9 + 1
or the even-more-tricky
d = val % 9 or 9
To find all numbers that end up at 7 (for example, or any digit from 1 to 9) you just want all numbers with the remainder 7 when divided by 9. So start at 7 and keep adding 9 and you get all such values.
The approach you are using to find all partitions of 7 then arranging them into numbers is much more complicated and slower than necessary.
To find all numbers that end up at 16 (for example, or any integer greater than 9) your current approach may be best. It is difficult otherwise to avoid the numbers that directly add to 7 or to 25 without going through 16. If this is really what you mean, say so in your question and we can look at this situation further.
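To make the single-digit case concrete, here is a small sketch that compares the repeated digit sum with the modulo shortcut and lists every number up to a limit that collapses to a given digit. It assumes positive integers and a target digit from 1 to 9; the function names are illustrative only.

```python
def digital_root_slow(val):
    """Repeatedly add the decimal digits until a single digit remains."""
    d = val
    while d > 9:
        d = sum(int(c) for c in str(d))
    return d


def digital_root(val):
    """Same result for positive integers, using the remainder modulo 9."""
    return (val - 1) % 9 + 1


def numbers_with_root(root, limit):
    """All positive integers <= limit whose digital root is `root` (1..9)."""
    if not 1 <= root <= 9:
        raise ValueError("root must be a single digit from 1 to 9")
    return list(range(root, limit + 1, 9))


if __name__ == "__main__":
    print(digital_root(100), digital_root(1999))
    print(numbers_with_root(1, 100))
```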

How to calculate elements needed from a loop?

I have the following data:
y-n-y-y-n-n-n
This repeats infinitely, such as:
y-n-y-y-n-n-n-y-n-y-y-n-n-n-y-n-y-y-n-n-n...
I have 5 "x".
"x" only sticks with "y".
Meaning, if I distribute x on the loop above, it will be:
y-n-y-y-n-n-n-y-n-y-y-n-n-n
x---x-x-------x---x
I want to count how many of the loop's element I needed to use to spread 5 x across, and the answer is 10.
How do I calculate it with a formula?
I presume what you're saying is that you need to process the first 10 elements of the infinite list to get 5 Y's, which match/stick with the 5 X's you have.
y-n-y-y-n-n-n-y-n-y-y-n-n-n-y-n-y-y-n-n-n...
x-_-x-x-_-_-_-x-_-x
                  ^
                  |____ 10 elements read from the infinite list to place the 5 x's.
I also presume that your question is: given an input of 5 Xs, what is the number of elements you need to process in the infinite list to match those 5 Xs.
You could calculate it with a loop like the following pseudo-code:
iElementsMatchedCounter = 0
iXsMatchedCounter = 0
iXLimit = 5
strElement = ""
if (InfiniteList.IsEmpty() == false)
{
    do
    {
        strElement = InfiniteList.ReadNextElement()
        if (strElement == "y")
        {
            iXsMatchedCounter += 1
        }
        iElementsMatchedCounter += 1
    } while ( (InfiniteList.IsEndReached() == false) AND (iXsMatchedCounter < iXLimit) )
}
if (iXsMatchedCounter == iXLimit)
    then Print(iElementsMatchedCounter)
    else Print("End of list reached before all X's were matched!")
The drawback of the above approach is that you are actually reading the infinite list, which might not be preferable.
Instead, given you know your list is an infinitely repeating sequence of the same elements y-n-y-y-n-n-n, you don't even need to loop through the entire list, but just operate on the sub-list y-n-y-y-n-n-n. The following algorithm describes how:
Given your starting input:
iNumberOfXs = 5 (you have 5 Xs to match)
iNumberOfYsInSubList = 3
(you have 3 Ys in the sub-list, the total list repeats infinitely)
iLengthOfSubList = 7 (you have 7 elements in the sub-list
y-n-y-y-n-n-n)
We then have intermediate results which are calculated:
iQuotient
iPartialLengthOfList
iPendingXs
iPendingLengthOfList
iResult
The following steps should give the result:
1. Divide iNumberOfXs by iNumberOfYsInSubList. Here, this gives us 5/3 = 1.666....
2. Discard the fractional part of the result (the 0.666...), so you're left with 1 as iQuotient. This is the number of complete sub-lists you have to iterate.
3. Multiply this quotient 1 by iLengthOfSubList, giving you 1*7=7 as iPartialLengthOfList. This is the partial sum of the result, and is the number of elements in the complete sub-lists you iterate.
4. Also multiply the quotient by iNumberOfYsInSubList, and subtract this product from iNumberOfXs, i.e. iNumberOfXs - (iQuotient * iNumberOfYsInSubList) = 5 - (1 * 3) = 2. Save this value 2 as iPendingXs, which is the number of as-yet unmatched X's.
   Note that iPendingXs will always be less than iNumberOfYsInSubList (i.e. it is a modulo, iPendingXs = iNumberOfXs MODULO iNumberOfYsInSubList).
5. Now you have the trivial problem of matching 2 X's (i.e. the value of iPendingXs calculated above) in the sub-list of y-n-y-y-n-n-n.
6. The number of pending elements to read (counted as iPendingLengthOfList) is:
   Equal to iPendingXs if iPendingXs is 0 or 1
   Equal to iPendingXs + 1 otherwise (i.e. if iPendingXs is greater than 1)
   In this case, iPendingLengthOfList = 3, because iPendingXs is greater than 1.
7. The sum of iPartialLengthOfList (7) and iPendingLengthOfList (3) is the answer, namely 10.
   Note that when iPendingXs is 0, this sum also counts the trailing n's of the last complete sub-list: for 3 X's it gives 7, although the third y is already reached at element 4. If you need the position of the last matched y, treat that case separately.
In general, if your sub-list y-n-y-y-n-n-n is not pre-defined, then you cannot hard-code the rule in step 6, but will instead have to loop through only the sub-list once to count the Ys and elements, similar to the pseudo-code given above.
When it comes to actual code, you can use integer division and modulo arithmetic to quickly do the operations in steps 2 and 4 respectively.
iQuotient = iNumberOfXs / iNumberOfYsInSubList // COMMENT: here integer division automatically drops the remainder
iPartialLengthOfList = iQuotient * iLengthOfSubList
iPendingXs = iNumberOfXs - (iQuotient * iNumberOfYsInSubList)
// COMMENT: can use modulo arithmetic like the following to calculate iPendingXs
// iPendingXs = iNumberOfXs % iNumberOfYsInSubList
// The following IF statement assumes the sub-list to be y-n-y-y-n-n-n
if (iPendingXs > 1)
    then iPendingLengthOfList = iPendingXs + 1
    else iPendingLengthOfList = iPendingXs
iResult = iPartialLengthOfList + iPendingLengthOfList
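For a sub-list that is not known in advance, the same idea can be sketched in Python as follows. This is a hedged variant rather than a line-by-line translation: it records the positions of the y's once, then uses divmod on the number of X's minus one, so that a count that is an exact multiple of the y's per sub-list stops at the last y instead of at the end of the sub-list. The function names and the string encoding of the pattern are assumptions made for the example.

```python
from itertools import cycle


def elements_needed_by_scanning(pattern, num_x, match="y"):
    """Read the repeating pattern element by element, like the loop version."""
    if num_x <= 0:
        return 0
    if match not in pattern:
        raise ValueError("pattern contains no matching element")
    matched = 0
    for count, item in enumerate(cycle(pattern), start=1):
        if item == match:
            matched += 1
            if matched == num_x:
                return count


def elements_needed(pattern, num_x, match="y"):
    """Closed-form count using one pass over a single copy of the pattern."""
    if num_x <= 0:
        return 0
    # 1-based positions of the matching elements inside one sub-list.
    positions = [pos for pos, item in enumerate(pattern, start=1) if item == match]
    if not positions:
        raise ValueError("pattern contains no matching element")
    full_cycles, index = divmod(num_x - 1, len(positions))
    return full_cycles * len(pattern) + positions[index]


if __name__ == "__main__":
    pattern = "ynyynnn"  # y-n-y-y-n-n-n
    print(elements_needed(pattern, 5))  # 10, as in the question
    print(elements_needed(pattern, 3))  # 4: the third y is the fourth element
```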

Prolog recursion simple explanation

I have the following recursive rules, which return the sum of a number, but I don't understand how they compute the sum:
sum(1,1).
sum(A,Result) :-
    A > 0,
    Ax is A - 1,
    sum(Ax,Bx),
    Result is A + Bx.
Now when you execute the following query in Prolog:
sum(3,X).
the answer will be 5, but as I look at the rules, I can't see how they return values and sum them. How is the value of Bx calculated?
sum(3,X). actually gives a result of X = 6. This predicate (sum(N, X)) computes the sum of integers from 1 to N giving X:
X = 1 + 2 + 3 + ... + N.
So it is the sum of the integers from 1 to N.
sum(1,1) says the sum of 1 by itself is just 1. This is true. :)
The second clause should compute the sum for A > 1, but it's actually not totally properly written. It says A > 0 which is ignoring the fact that the first clause already takes care of the case for 1. I would have written it with A > 1. It will work as is, but be a little less efficient.
sum(A,Result) :-
    A > 0,
    Ax is A - 1,
    sum(Ax, Bx),       % Recursively find the sum of integers 1 to A-1
                       % Instantiate Bx with that sum
    Result is A + Bx.  % Result is A plus sum (in Bx) from 1 to A-1
This clause recursively says that the sum of integers from 1 to A is Result. That Result is the sum of A and the sum of integers from 1 to A-1 (which is the value Ax is unified to). The Bx is the intermediate sum of integers 1 through Ax (A-1). When it computes the sum(Ax, Bx), the value of Ax is 1 less than A. It will continue calling this second clause recursively until the first parameter goes down to 1, at which point the first clause will provide the value for the sum, and the recursion will unravel from there, summing 1, 2, 3, ...
EDIT: More Details on the Recursion
Let's look at sum(3,X) as an example.
sum(3,X) doesn't match sum(1,1). so that clause is skipped and Prolog looks at sum(A, Result). Prolog matches this by instantiating A as 3 and Result as X and steps through the statements making up the clause:
% SEQUENCE 1
% sum(A, Result) query issued with A = 3
3 > 0, % true
Ax is 3 - 1, % Ax is instantiated as the value 2
sum(2, Bx), % recursive call to `sum`, `Ax` has the value of 2
Result is 3 + Bx. % this statement is awaiting the result of `sum` above
At this point, Prolog suspends computing Result is A + Bx in order to make the recursive call. For the recursive call, Prolog can't match sum(Ax, Bx) to sum(1,1) because Ax is instantiated as 2. So it goes on to the next clause, sum(A, Result) and can match if it instantiates A as 2 and Result as Bx (remember, this is a new call to this clause, so these values for A and Result are a different copy than the ones we "suspended" above). Now Prolog goes through sum(A, Result) statements again, this time with the new values:
% SEQUENCE 2
% sum(A, Result) query issued with A = 2
2 > 0, % true
Ax is 2 - 1, % Ax is instantiated to the value 1
sum(1, Bx), % recursive call to `sum`, `Ax` has the value of 1
Result is 2 + Bx. % this statement is awaiting the result of `sum` above
Now Prolog has sum(1, Bx) (Ax is instantiated with 1). This will match sum(1,1) and instantiate Bx with 1 in the last query to sum above in SEQUENCE 2. That means Prolog will complete the sequence:
Result is 2 + 1. % `A` is 2 and `Bx` is 1, so `Result` is 3
Now that this result is complete, the recursive query to sum in the prior execution in SEQUENCE 1 will complete in a similar fashion. In this case, Bx is instantiated with 3:
Result is 3 + 3. % `A` is 3 and `Bx` is 3 (from the SEQUENCE 2 query)
% so `Result` is 6
And finally, the original query, sum(3, X) completes, where X is instantiated with the result of 6 and you get:
X = 6.
This isn't a perfect explanation of how the recursion works, and there are some texts around with graphical representations that help. But I hope this provides some insight into how it operates.
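As a rough aid, the same suspend-and-resume pattern can be traced in Python. This is only an analogue of the control flow described above, not a model of Prolog's unification or backtracking, and the function name sum_to and its trace format are made up for the illustration.

```python
def sum_to(a, depth=0, trace=None):
    """Mirror the two clauses of sum/2 and record each step in `trace`."""
    if a < 1:
        raise ValueError("the Prolog predicate fails for values below 1")
    if trace is None:
        trace = []
    indent = "  " * depth
    if a == 1:
        # Corresponds to the fact sum(1,1).
        trace.append(f"{indent}sum(1, Bx) matches sum(1,1): Bx = 1")
        return 1, trace
    ax = a - 1
    trace.append(f"{indent}sum({a}, Result): Ax = {ax}, waiting for sum({ax}, Bx)")
    bx, _ = sum_to(ax, depth + 1, trace)
    result = a + bx
    # Only now can the suspended "Result is A + Bx" be evaluated.
    trace.append(f"{indent}Result is {a} + {bx} = {result}")
    return result, trace


if __name__ == "__main__":
    value, steps = sum_to(3)
    print("\n".join(steps))
    print("X =", value)
```

The indentation of the printed lines shows which call is suspended while the deeper call is still running.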

Minimum number of element required to make a sequence that sums to a particular number

Suppose there is a number s = 12, and I want to make a sequence whose elements satisfy a1 + a2 + ... + an = 12.
The criteria are as follows:
n must be minimum.
a1 and an must be 1.
ai can differ from a(i-1) by only 1, 0 or -1.
For s = 12 the result is 6.
So how do I find the minimum value of n?
Algorithm for finding n from a given s:
1. Find q = FLOOR( SQRT(s-1) )
2. Find r = q^2 + q
3. If s <= r then n = 2q, else n = 2q + 1
Example: s = 12
q = FLOOR( SQRT(12-1) ) = FLOOR( SQRT(11) ) = 3
r = 3^2 + 3 = 12
12 <= 12, therefore n = 2*3 = 6
Example: s = 159
q = FLOOR( SQRT(159-1) ) = FLOOR( SQRT(158) ) = 12
r = 12^2 + 12 = 156
159 > 156, therefore n = 2*12 + 1 = 25
and the 25-number sequence for
159: 1,2,3,4,5,6,7,8,9,10,10,10,9,10,10,10,9,8,7,6,5,4,3,2,1
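The three steps translate almost directly into code. The sketch below uses math.isqrt (Python 3.8+) for the floor of the square root and compares the formula with a brute-force search over all valid sequences for small s. It assumes the elements must be positive integers, which the question does not state explicitly, and the function names are illustrative.

```python
from math import isqrt


def min_length(s):
    """Minimum n from the formula: q = floor(sqrt(s - 1)), r = q^2 + q."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    q = isqrt(s - 1)
    r = q * q + q
    return 2 * q if s <= r else 2 * q + 1


def min_length_brute(s):
    """Grow all valid prefixes step by step until one ends in 1 with sum s."""
    states = {(1, 1)}  # (last element, running sum) for sequences of length n
    n = 1
    while (1, s) not in states:
        states = {(v2, total + v2)
                  for v, total in states
                  for v2 in (v - 1, v, v + 1)
                  if v2 >= 1 and total + v2 <= s}
        n += 1
    return n


if __name__ == "__main__":
    print(min_length(12), min_length(159))
    print(all(min_length(s) == min_length_brute(s) for s in range(1, 60)))
```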
Here's a way to visualize the solution.
First, draw the smallest triangle (rows containing successive odd numbers of stars) whose number of stars is greater than or equal to s. In this case, we draw a 16-star triangle.
   *
  ***
 *****
*******
Then we have to remove 16 - 12 = 4 stars. We do this diagonally starting from the top.
   1
  **2
 ****3
******4
The result is:
  **
 ****
******
Finally, add up the column heights to get the final answer:
1, 2, 3, 3, 2, 1.
There are two cases: n odd and n even. When n is even, the sequence with the largest sum is:
1, 2, 3, ..., n/2, n/2, n/2-1, n/2-2, ..., 1
When n is odd it is:
1, 2, 3, ..., (n+1)/2, (n+1)/2-1, (n+1)/2-2, ..., 1
The maximum possible sum for any given series of length n is:
n is even => (n^2+2n)/4
n is odd => (n+1)^2/4
These two results are arrived at easily enough by looking at the simple arithmetic sum of the series: in the case of n even it is twice the sum of the series 1...n/2. In the case of n odd it is twice the sum of the series 1...(n-1)/2 plus (n+1)/2 (the middle element).
Clearly you can generate any positive number that is less than this max as long as n>3.
So the problem then becomes finding the smallest n with a max greater than or equal to your target.
Algorithmically I'd go for:
Find (sqrt(4*s)-1) and round up to the next odd number. Call this M. This is an easy value to work out and represents the lowest odd n that will work.
Check M-1 to see if its max sum is at least s. If so, then your n is M-1. Otherwise your n is M.
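A minimal sketch of this procedure in integer arithmetic, under the assumption that "round up to the next odd number" leaves an odd value unchanged. math.isqrt(4*s - 1) equals the ceiling of sqrt(4*s) minus one, which avoids floating point. The names max_sum and min_length_via_odd_bound are made up for the example.

```python
from math import isqrt


def max_sum(n):
    """Largest sum reachable with a valid sequence of length n."""
    if n % 2 == 0:
        return (n * n + 2 * n) // 4
    return (n + 1) ** 2 // 4


def min_length_via_odd_bound(s):
    """Find the lowest odd length M that works, then test M - 1."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    m = isqrt(4 * s - 1)  # ceil(sqrt(4 * s)) - 1, computed exactly
    if m % 2 == 0:
        m += 1            # round up to the next odd number
    return m - 1 if max_sum(m - 1) >= s else m


if __name__ == "__main__":
    print([max_sum(n) for n in range(1, 7)])  # 1, 2, 4, 6, 9, 12
    print(min_length_via_odd_bound(12))       # 6
```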
Thank you all for your answers. I derived a simpler solution. The algorithm looks like this:
First find the maximum sum that can be made using n elements:
if n=1 -> 1 sum=1;
if n=2 -> 1,1 sum=2;
if n=3 -> 1,2,1 sum=4;
if n=4 -> 1,2,2,1 sum=6;
if n=5 -> 1,2,3,2,1 sum=9;
if n=6 -> 1,2,3,3,2,1 sum=12;
So from observation it is clear that any number s with 9 < s <= 12 can be made using 6 elements; similarly, any number s with 6 < s <= 9 can be made using 5 elements.
So it requires only a binary search to find the number of elements that make a particular number.
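The binary search can be sketched as follows. The closed form used for the maximum sum is inferred from the table above (k*(k+1) for n = 2k and k*k for n = 2k - 1), and the search relies on the observation that every s up to that maximum is reachable. The function names are illustrative.

```python
def max_sum(n):
    """Maximum sum for n elements: 1, 2, 4, 6, 9, 12, ... as in the table."""
    half = n // 2
    if n % 2 == 0:
        return half * (half + 1)
    return (half + 1) ** 2


def min_length_binary_search(s):
    """Smallest n whose maximum sum is at least s."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    lo, hi = 1, 1
    while max_sum(hi) < s:  # grow the upper bound until it is large enough
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if max_sum(mid) >= s:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    for s in (9, 10, 12):
        print(s, min_length_binary_search(s))
```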

Resources