Formula for combinations with constraints? - math

Suppose we have 4 bags (A, B, C, and D) containing a, b, c, and d balls, respectively. Let ra, rb, rc, and rd be the number of balls selected from each bag.
I'd like to impose constraints on the number of balls selected from each bag (say, ra < rc, rb < rc, rd ≤ rc, etc.). Also, the balls in each bag are numbered, so C(xi, ri) represents the number of combinations possible when choosing ri balls from bag i, which contains xi balls in total.
Is there any way to calculate the number of ways to choose the balls while abiding by these constraints?
I know I could solve this by writing some code. However, I'm looking for a simple formula for the answer. Without the constraints, there would be 2^n possibilities (with n the total number of balls), but with the constraints there would be fewer. The normal combinations formula (nCr) alone is not enough because of the added constraints, and I'm not sure what to do next.
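This is not the closed-form formula being asked for, but for reference the quantity in question can be written as a sum, over all allowed (ra, rb, rc, rd), of C(a, ra) * C(b, rb) * C(c, rc) * C(d, rd). Below is a minimal brute-force sketch of that sum in JavaScript. The names choose, countSelections and isAllowed are made up for illustration, and the constraint function is just the example from the question.

```javascript
// Binomial coefficient C(n, k), built up multiplicatively.
// Each intermediate value is itself a binomial coefficient, so it stays an integer
// (exact as long as the values fit in a double).
function choose(n, k) {
  if (k < 0 || k > n) return 0;
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Sums C(a,ra)*C(b,rb)*C(c,rc)*C(d,rd) over every (ra, rb, rc, rd)
// that the hypothetical predicate isAllowed accepts.
function countSelections([a, b, c, d], isAllowed) {
  let total = 0;
  for (let ra = 0; ra <= a; ra++) {
    for (let rb = 0; rb <= b; rb++) {
      for (let rc = 0; rc <= c; rc++) {
        for (let rd = 0; rd <= d; rd++) {
          if (isAllowed(ra, rb, rc, rd)) {
            total += choose(a, ra) * choose(b, rb) * choose(c, rc) * choose(d, rd);
          }
        }
      }
    }
  }
  return total;
}

// Example constraints from the question: ra < rc, rb < rc, rd <= rc.
const example = (ra, rb, rc, rd) => ra < rc && rb < rc && rd <= rc;
console.log(countSelections([2, 2, 3, 1], example));
```

With no constraints (isAllowed always true) the sum collapses to 2^(a+b+c+d), which is a handy sanity check.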

Related

Intuitive Understanding of GCD algorithm

What's an intuitive way to understand how this algorithm finds the GCD?
function gcd(a, b) {
  // Assumes a and b are positive integers.
  while (a != b) {
    if (a > b)
      a -= b;  // subtract the smaller number from the larger one
    else
      b -= a;
  }
  return a;
}
Wikipedia has a good article on it under the name Euclidean algorithm. In particular, this image from the article might answer your literal question: the intuitive way to understand how this algorithm finds GCD:
Subtraction-based animation of the Euclidean algorithm. The initial rectangle has dimensions a = 1071 and b = 462. Squares of size 462×462 are placed within it leaving a 462×147 rectangle. This rectangle is tiled with 147×147 squares until a 21×147 rectangle is left, which in turn is tiled with 21×21 squares, leaving no uncovered area. The smallest square size, 21, is the GCD of 1071 and 462.
The original inventor of the greatest common divisor algorithm was Euclid, who described it in his book Elements about 300 years before the birth of Christ. Here is his geometric explanation, including his diagram:
Let AB and CD be the two given numbers not relatively prime.
It is required to find the greatest common measure of AB and CD.
If now CD measures AB, since it also measures itself, then CD is a common measure of CD and AB. And it is clear that it is also the greatest, for no greater number than CD measures CD.
But, if CD does not measure AB, then, when the less of the numbers AB and CD being continually subtracted from the greater, some number is left which measures the one before it.
For a unit is not left, otherwise AB and CD would be relatively prime, which is contrary to the hypothesis.
Therefore some number is left which measures the one before it.
Now let CD, measuring BE, leave EA less than itself, let EA, measuring DF, leave FC less than itself, and let CF measure AE.
Since then, CF measures AE, and AE measures DF, therefore CF also measures DF. But it measures itself, therefore it also measures the whole CD.
But CD measures BE, therefore CF also measures BE. And it also measures EA, therefore it measures the whole BA.
But it also measures CD, therefore CF measures AB and CD. Therefore CF is a common measure of AB and CD.
I say next that it is also the greatest.
If CF is not the greatest common measure of AB and CD, then some number G, which is greater than CF, measures the numbers AB and CD.
Now, since G measures CD, and CD measures BE, therefore G also measures BE. But it also measures the whole BA, therefore it measures the remainder AE.
But AE measures DF, therefore G also measures DF. And it measures the whole DC, therefore it also measures the remainder CF, that is, the greater measures the less, which is impossible.
Therefore no number which is greater than CF measures the numbers AB and CD. Therefore CF is the greatest common measure of AB and CD.
Observe that Euclid uses the word "measures" to indicate that some multiple of a smaller length is the same as a larger length; that is, his concept "measures" is identical to our concept "divides" as in 7 divides 28.
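In short, if both a and b are divisible by some D, then D must also divide a-b, so it cannot be bigger than a-b (taking a > b). Conversely, any common divisor of b and a-b also divides a, so replacing the larger number by the difference leaves the GCD unchanged. The logic is to apply this recursively, with the additional rule that for a = b the GCD is a: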
GCD(a, b) = a == b ? a : GCD(min(a, b), abs(a-b))
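To see the repeated subtraction at work on the 1071 and 462 example from the Wikipedia caption, here is a small sketch that records every intermediate pair. The function name gcdTrace is made up for illustration, and it assumes positive integer inputs.

```javascript
// Subtraction-based Euclidean algorithm that also records each (a, b) pair.
// Assumes a and b are positive integers.
function gcdTrace(a, b) {
  const steps = [[a, b]];
  while (a !== b) {
    if (a > b) a -= b; // shrink the larger side by the smaller one
    else b -= a;
    steps.push([a, b]);
  }
  return { gcd: a, steps };
}

const result = gcdTrace(1071, 462);
console.log(result.gcd); // 21
console.log(result.steps.map(([x, y]) => `${x}x${y}`).join(" -> "));
```

Every pair in the trace has the same set of common divisors as the starting pair, which is why the final equal pair gives the GCD.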

average value when comparing each element with each other element in a list

I have a number of strings (n strings), and I am computing the edit distance between them as follows: I take the first one and compare it to the (n-1) remaining strings, then the second one and compare it to the (n-2) remaining, and so on, until I run out of strings.
Why would an average edit distance be computed as the sum of all the edit distances between all the strings divided by the number of comparisons squared? This squaring is confusing me.
Thanks,
Jannine
I assume you have seen an answer somewhere that comes with a squared factor, which I'll take as n^2, where n is the number of strings (not the number of distinct comparisons, which is n*(n-1)/2, as +flaschenpost points out). It would be easier to give a more precise answer if you quoted exactly what that answer is.
From what I understand of your question, it isn't, at least it's not the usual sample average. It is, however, a valid estimator of central tendency with the caveat that it is a biased estimator.
See https://en.wikipedia.org/wiki/Bias_of_an_estimator.
Let's define the sample average, which I will denote as X', by
X' = \frac{1}{N} \sum_{i=1}^{m} X_i
If N = m, we get the standard average. In your case, m is the number of distinct pairs, m = n*(n-1)/2. Let's call this average Xo.
Then if N = n*n, it is
X' = \frac{m}{N} Xo = \frac{n-1}{2n} Xo
Xo is an unbiased estimator of the population mean \mu. Therefore, X' is biased by a factor f = (n-1)/(2*n). For very large n this factor tends to 1/2.
That said, it could be that the answer you see has a sum that runs not just over distinct pairs. The normalization would then change, of course. For instance, we could extend that sum to all ordered pairs (i, j) with i != j without changing the average value: the correct normalization would then be N = n*(n-1), and the value of the average would still be Xo, since the number of summands has doubled as well.
Such things are easier to understand when worked by hand with pen and paper on a small example.
If you have the 7 strings named a, b, c, d, e, f, g, then the simplest version would be:
Compare a to b, a to c, ... , a to g (these are 6)
Compare b to a, b to c, ... , b to g (these are 6)
. . .
Compare g to a, g to b, ... , g to f (these are 6)
So you have 7*6 or n*(n-1) values, so you divide by nearly 7^2. This is where the square comes from. Maybe you even compare a to a, which should give a distance of 0 and increase the number of values to 7*7 or n*n. But I would count that as a bit of cheating for the average distance.
You could double the speed of the algorithm by changing it just a little:
Compare a to b, a to c, ... , a to g (these are 6)
Compare b to c, ... , b to g (these are 5)
Compare c to d, ... , c to g (these are 4)
. . .
Compare f to g (this is 1)
That follows good ol' Gauss: 7*6/2, or n*(n-1)/2.
So in essence: try doing a simple example on paper and then count your distance values.
The average is still, very simply, the same as ever:
sum(values) / count(values)
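The counting can also be checked in code. Below is a minimal sketch that compares each string only with the ones after it, counts the comparisons, and reports both the plain average and the sum divided by n*n. The names editDistance and pairwiseSummary are made up for illustration, and the Levenshtein distance is only an assumption about which edit distance is meant.

```javascript
// Levenshtein edit distance using the standard dynamic-programming recurrence,
// keeping a single row of the table in memory at a time.
function editDistance(s, t) {
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Compares every distinct pair once: n*(n-1)/2 comparisons in total.
function pairwiseSummary(strings) {
  const n = strings.length;
  let sum = 0;
  let count = 0;
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      sum += editDistance(strings[i], strings[j]);
      count++;
    }
  }
  return { sum, count, average: sum / count, dividedByNSquared: sum / (n * n) };
}

console.log(pairwiseSummary(["a", "b", "c", "d", "e", "f", "g"]));
```

For the seven single-letter strings every distance is 1, so count is 21 and the plain average is 1, while dividing by n*n gives 21/49, which is the (n-1)/(2*n) factor discussed above.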

How to calculate the frequency of word in Zipf's law?

There are 4 different words a, b, c, d in a collection where their frequency order is a > b > c > d. The total number of tokens in this collection is 1500. Using Zipf's law, what are the frequencies of the four words?
Is there any formula for Zipf's law?
I read that, under Zipf's law, the most frequent word occurs approximately twice as often as the second most frequent word.
I humbly direct you to the Wikipedia article on Zipf's law:
Formally, let:
N be the number of elements;
k be their rank;
s be the value of the exponent characterizing the distribution.
Zipf's law then predicts that out of a population of N elements, the frequency of elements of rank k, f(k;s,N), is:
f(k;s,N) = \frac{1/k^s}{\sum_{n=1}^{N} 1/n^s}
There you go. There's your formula for the frequency of a word.
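As a rough worked example for the question's numbers, here is a sketch that evaluates the Wikipedia form of the law, f(k;s,N) = (1/k^s) / (1/1^s + ... + 1/N^s), and scales it by the token count. It assumes the exponent s = 1 and that the vocabulary consists of exactly the four words (N = 4); neither assumption is stated in the question, and the function name zipfFrequencies is made up.

```javascript
// Expected token counts under Zipf's law for ranks 1..numWords.
// Assumption: exponent s defaults to 1, the classic form of the law.
function zipfFrequencies(numWords, totalTokens, s = 1) {
  let normalizer = 0;
  for (let n = 1; n <= numWords; n++) {
    normalizer += 1 / Math.pow(n, s);
  }
  const counts = [];
  for (let k = 1; k <= numWords; k++) {
    counts.push((totalTokens / Math.pow(k, s)) / normalizer);
  }
  return counts;
}

console.log(zipfFrequencies(4, 1500));
```

By hand: with s = 1 and N = 4 the normalizer is 1 + 1/2 + 1/3 + 1/4 = 25/12, so the counts for a, b, c, d come out at about 720, 360, 240 and 180, which add up to 1500 and match the "twice as often" rule for the top two words.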

Multinomial Generation of Degree n

I'm basically looking for a summation function that will compute multinomials given the number of variables and a degree.
Example
2 Variables; 2 Degrees:
x^2+y^2+x*y+x+y+1
Thanks.
See Knuth The Art of Computer Programming, Vol. 4, Fascicle 3 for a comprehensive answer.
Short answer: it's enough to generate all multinomial expressions in n variables with degree exactly d. Then, for your problem, you can either put together the answers with degrees ≤d, or add a dummy variable "1".
The problem of generating all expressions with degree exactly d is thus simply one of generating all ordered partitions of d into n nonnegative parts (i.e., all nonnegative integer solutions to x1 + ... + xn = d), and this can be done with a simple backtracking algorithm (a depth-first search).
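A minimal sketch of that backtracking idea, combined with the dummy-variable trick to cover all degrees up to d. The names exponentVectors and monomials are made up for illustration, and this is only one straightforward way to do it, not the method from Knuth's fascicle.

```javascript
// All nonnegative integer vectors of length n whose entries sum to exactly d,
// generated by depth-first backtracking over the positions.
function exponentVectors(n, d) {
  const results = [];
  const current = new Array(n).fill(0);
  function fill(position, remaining) {
    if (position === n - 1) {
      current[position] = remaining; // the last slot takes whatever is left
      results.push(current.slice());
      return;
    }
    for (let e = remaining; e >= 0; e--) {
      current[position] = e;
      fill(position + 1, remaining - e);
    }
  }
  if (n > 0) fill(0, d);
  return results;
}

// Monomials of degree <= maxDegree: add one dummy variable standing for "1",
// generate degree exactly maxDegree, then drop the dummy's exponent.
function monomials(variables, maxDegree) {
  return exponentVectors(variables.length + 1, maxDegree).map((exps) => {
    const factors = variables
      .map((v, i) => (exps[i] === 0 ? "" : exps[i] === 1 ? v : `${v}^${exps[i]}`))
      .filter((f) => f !== "");
    return factors.length > 0 ? factors.join("*") : "1";
  });
}

console.log(monomials(["x", "y"], 2).join(" + "));
```

For two variables and degree 2 this produces the same six terms as the example in the question, though in a different order.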
Given N variables, and a maximum degree of D, you have an array of D slots to fill with all possible combinations of variables.
[_, _, ..., _, _]
You may fill the slots with any of the N variables, and you may leave slots empty, so at most D variables are used in total. Since multiplication is commutative, the order of the variables within the slots does not matter. As such, the problem reduces to generating (1) partitions of an integer and (2) subsets of a set.
I hope this is at least a start to your solution.
This also seems to be a Dynamic programming variant of the 0-1 Knapsack problem. Here we would be interested in all possible leaves of the decision tree.

Is a given set of group elements a set of coset representatives?

I am afraid the question is a bit technical, but I hope someone might have stumbled into a similar subject, or give me a pointer of some kind.
If G is a group (in the sense of algebraic structure), and if g1, ..., gn are elements of G, is there an algorithm (or a function in some dedicated program, like GAP) to determine whether there is a subgroup of G such that those elements form a set of representatives for the cosets of the subgroup? (We may assume that G is a permutation group, and probably even the full symmetric group.)
(There are of course several algorithms to find the cosets of a given subgroup, such as the Todd-Coxeter algorithm; this is a kind of inverse question.)
Thanks,
Daniele
The only solution I can come up with is naive. Basically if you have elements x1,...,xn, you would use GAP's LowIndexSubgroupsFpGroup to enumerate all subgroups with index n (discarding those with index < n). Then you would go through each such group, generate the cosets, and check that each coset contains one of the elements.
This is all I could think of. I would be very interested if you came up with a better approach.
What you're trying to determine is whether there is a subgroup H of G such that {g1, ..., gn} is a transversal of the cosets of H, i.e. a set of representatives of the partition of G into cosets of H.
First, by Lagrange's theorem, |G| = |G:H| * |H|, where |G:H| = |G|/|H| is the index of the subgroup H in G. If {g1, ..., gn} is indeed a transversal, then |G:H| = |{g1, ..., gn}| = n, so the first test in your algorithm should be whether n divides |G|.
Moreover, since gi and gj are in the same right coset if and only if gi*gj^-1 is in H, you can then check subgroups with index n to see if they avoid gi*gj^-1. Also, note that (gi*gj^-1)(gj*gk^-1) = gi*gk^-1, so you can choose any pairing of the gi's.
This should be sufficient if n is small compared to |G|.
Another approach is to start with H being the trivial group and add elements of the set H* = {h in G : h^k != gi*gj^-1, for all i, j, k; i != j} to the generators of H until you can't add any more (i.e. until it's no longer a subgroup). H is then a maximal subgroup of G such that H is a subset of H*. If you can get all such H (and have them be large enough) then the subgroup you're looking for must be one of them.
This approach would work better for larger n.
Either way a non-exponential-time approach isn't obvious.
EDIT: I've just found a discussion of this very topic here: http://en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archives/Mathematics/2009_April_18#Is_a_given_set_of_group_elements_a_set_of_coset_representatives.3F
A slightly less brute approach would be to enumerate all subgroups of index n, as Il-Bhima suggested, and then for each subgroup, check each xi * xj^-1 to see if it is contained in the subgroup.
The elements x1, ..., xn will be representatives for a subgroup if and only if EVERY product
xi * xj^-1 where (i != j)
is NOT in the subgroup.
This type of check seems both simpler than generating all cosets, and computationally faster.
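For a single candidate subgroup, that check could be sketched as follows. Permutations are represented as arrays of images, the subgroup is passed in as an explicit list of its elements, and the function names (compose, inverse, isTransversal) are made up for illustration. Enumerating the index-n subgroups themselves is left to a tool such as GAP.

```javascript
// A permutation p is an array where p[i] is the image of point i.
// compose(p, q) applies q first, then p.
function compose(p, q) {
  return q.map((image) => p[image]);
}

function inverse(p) {
  const inv = new Array(p.length);
  p.forEach((image, i) => {
    inv[image] = i;
  });
  return inv;
}

// True when `elements` has exactly one representative per coset of `subgroup`.
// groupOrder is |G|; subgroup must list every element of H.
function isTransversal(elements, subgroup, groupOrder) {
  // Index test from Lagrange's theorem: n * |H| must equal |G|.
  if (elements.length * subgroup.length !== groupOrder) return false;
  const inSubgroup = new Set(subgroup.map((h) => h.join(",")));
  for (let i = 0; i < elements.length; i++) {
    // j > i suffices: xj * xi^-1 is the inverse of xi * xj^-1,
    // and a subgroup contains an element exactly when it contains its inverse.
    for (let j = i + 1; j < elements.length; j++) {
      const product = compose(elements[i], inverse(elements[j]));
      if (inSubgroup.has(product.join(","))) return false;
    }
  }
  return true;
}

// Example in S3 (order 6) with H generated by the transposition swapping 0 and 1.
const identity = [0, 1, 2];
const swap01 = [1, 0, 2];
const rotation = [1, 2, 0];
const H = [identity, swap01];
console.log(isTransversal([identity, rotation, compose(rotation, rotation)], H, 6));
console.log(isTransversal([identity, swap01, rotation], H, 6));
```

In the example the first set (the identity and the two rotations) passes, while the second fails because the identity and the transposition lie in the same coset of H.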
