average value when comparing each element with each other element in a list - math

I have number of strings (n strings) and I am computing edit distance between strings in a way that I take first one and compare it to the (n-1) remaining strings, second one and compare it to (n-2) remaining, ..., comparing until I ran out of the strings.
Why would an average edit distance be computed as sum of all the edit distances between all the strings divided by the number of comparisons squared. This squaring is confusing me.
Thanks,
Jannine

I assume you have somewhere an answer that seems to come with a squared factor -which I'll take as n^2, where n is the number of strings (not the number of distinct comparisons, which is n*(n-1)/2, as +flaschenpost points to ). It would be easier to give you a more precise answer if you'd exactly quote what that answer is.
From what I understand of your question, it isn't, at least it's not the usual sample average. It is, however, a valid estimator of central tendency with the caveat that it is a biased estimator.
See https://en.wikipedia.org/wiki/Bias_of_an_estimator.
Let's define the sample average, which I will denote as X', by
X' = \sum^m_i X_i/N
IF N=m, we get the standard average. In your case, this is the number of distinct pairs which is m=n*(n-1)/2. Let's call this average Xo.
Then if N=n*n, it is
X' = (n-1)/(2*n) Xo
Xo is an unbiased estimator of the population mean \mu. Therefore, X' is biased by a factor f=(n-1)/(2*n). For n very large this bias tends to 1/2.
That said, it could be that the answer you see has a sum that runs not just over distinct pairs. The normalization would then change, of course. For instance, we could extend that sum to all pairs without changing the average value: The correct normalization would then be N = n*(n-1); the value of the average would still be Xo though as the number of summands has double as well.

Those things are getting easier to understand if done by hand with pen and paper for a small example.
If you have the 7 Strings named a,b,c,d,e,f,g, then the simplest version would
Compare a to b, a to c, ... , a to g (this are 6)
Compare b to a, b to c, ... , b to g (this are 6)
. . .
Compare g to a, g to b, ... , g to f (this are 6)
So you have 7*6 or n*(n-1) values, so you divide by nearly 7^2. This is where the square comes from. Maybe you even compare a to a, which should bring a distance of 0 and increase the values to 7*7 or n*n. But I would count it a bit as cheating for the average distance.
You could double the speed of the algorithm, just changing it a small bit
Compare a to b, a to c, ... , a to g (this are 6)
Compare b to c, ... , b to g (this are 5)
Compare c to d, ... , b to g (this are 4)
. . .
Compare f to g (this is 1)
That is following good ol' Gauss 7*6/2, or n*(n-1)/2.
So in Essence: Try doing a simple example on paper and then count your distance values.
Since Average is still and very simply the same as ever:
sum(values) / count(values)

Related

Questions about SVD, Singular Value Decomposition

I am not a mathematician, so I need to understand what SVD does and WHY more than how it works exactly from the math perspective. (I understand at least what is the decomposition though).
This guy on youtube gave the only human explanation of SVD saying, that the U matrix maps "user to concept correlation" Sigma matrix defines the strength of each concept, and V maps "movie to concept correlation" given that initial matrix M has users in the rows, and movie (ratings) in the columns.
He also mentioned two concept specifically "sci fi" and "romance" movies. See the picture below.
My questions are:
How SVD knows the number of concepts. He as human mentioned two - sci fi, and romance, but in reality in resulting matrices are 3 concepts. (for example matrix U - that one with blue titles - has 3 columns not 2).
How SVD knows what is the concept after all. I mean, what If i shuffle the columns randomly how SVD then knows what is sci fi, what is romance. I mean, I suppose there is no rule, group the concepts together in the column order. What if scifi movie is the first and last one? and not first 3 columns in the initial matrix M?
What is the practical usage of either U, Sigma or V matrices? (Except that you can multiply them to get the initial matrix M)
Is there also any other possible human explanation of SVD than the guy up provided, or it is the only one possible function? Matrices of correlations.
As was pointed out in the comments you may well get better explanations elsewhere. However since the question is still open, here is my tuppence worth.
Throughout I'll suppose that A is mxn where m>=n, ie that A has more rows than columns.
First of all there are many forms of the SVD, differing in the sizes of the matrices. They all share the fundamental properties that
A = U*S*V'
S is diagonal
U and V have orthogonal columns (ie U'*U = I, V'*V = I)
Perhaps the most useful from a theoretical point of view is the 'full fat' svd where we have that U is mxm, S is mxn and V is nxn. However this has rather a lot of elements that don't really contribute to A. For example S being diagonal we can write
S = ( S1 ) (where S1 is nxn )
( 0 )
If we divide up U into
U = ( U1 U2) (where U1 is mxn and U2 is (mx(m-n)))
Then its straightforward to calculate that
U*S = U1*S1
and so we can throw away the last m-n columns of U and the last m-n rows of S, and still recover A.
Moreover some of the diagonal elements of S1 may be 0; suppose in fact that p<n of them are non zero. Then we can write
S1 = ( S2 0)
( 0 0)
And arguing as above for U and analogously for V' we can in fact throw away all but the first p columns of U and all of S but S2, and all but the first p rows of V, and still recover A.
This latter is the form of SVD ('thin') in your question:
U is mxp
S is pxp
V' is pxm
where p is the number of non-zero singular values of A. This is my answer to your 1.
By convention the elements of S decrease as you move down the diagonal. To achieve this the routine that calculates the svd in effect works with a version of A with shuffled columns. This shuffling is undone by incorporating the shuffle in the U and V' output. This is my answer to your 2: however you shuffle A, it will be in effect shuffled again to ensure that the singular values decrease down the diagonal.
I struggle to answer 3, because I suspect that our ideas of 'practical' are rather different.
One thing that I think practical is to find simpler approximations to A. The reconstruction of A can be written
A = Sum{ 1<=i<=p | U[i]*S[i]*V[i]' }
where the S[i] are the diagonal elements of S, U[i] are the columns of U and V[i] those of V
We might want to use a simpler model for A, for example want to simplify it down to just one term. That is, we might wonder how much we would lose by using fewer 'concepts'. The 'thin' svd above has already done this in the sense that it has thrown away all the coluns that make no contribution to A. In an extreme case, we might wonder what we would get if we reduced to just one concept. This approximation is found by taking just the first term of the sum above. This extends to however many terms -- q say -- we want to allow: we just take the first q terms of the sum above.
I'm sorry, I can't answer 4.

People - Apple Puzzle [Inspired by client-puzzle protocol]

I am learning a client-puzzle protocol and i have a question about finding the possibility of a solution. Instead of going into the dry protocol facts, here is a scenario:
Lets say i have x people and I have y apples:
Each person must have at least 1 apple
Each person can have at most z apples.
Is there a formula to calculate the number of scenarios?
Example:
4 people [x], 6 apples [y], 15 MAX apples [z]
No. of scenarios calculated by hand: 10.
If my number is very huge, I hope to calculate it using a formula.
Thank you for any help.
Your problem is equivalent to "finds the number of ways you can get x by adding together z numbers, each of which lies between min and max." Sample Python implementation:
def possible_sums(x, z, min, max):
if min*z > x or max*z < x:
return 0
if z == 1:
if x >= min and x <= max:
return 1
else:
return 0
total = 0
#iterate from min, up to and including max
for i in range(min, max+1):
total += possible_sums(x-i, z-1, min, max)
return total
print possible_sums(6, 4, 1, 15)
Result:
10
This function can become quite expensive when called with large numbers, but runtime can be improved with memoization. How this can be accomplished depends on the language, but the conventional Python approach is to store previously calculated values in a dictionary.
def memoize(fn):
results = {}
def f(*args):
if args not in results:
results[args] = fn(*args)
return results[args]
return f
#memoize
def possible_sums(x, z, min, max):
#rest of code goes here
Now print possible_sums(60, 40, 1, 150), which would have taken a very long time to calculate, returns 2794563003870330 in an instant.
There are ways to do this mathematically. It is similar to asking how many ways there are to roll a total of 10 on 3 6-sided dice (x=3, y=10, z=6). You can implement this in a few different ways.
One approach is to use inclusion-exclusion. The number of ways to write y as a sum of x positive numbers with no maximum is y-1 choose x-1 by the stars-and-bars argument. You can calculate the number of ways to write y as a sum of x positive numbers so that a particular set of s of them are at least z+1: 0 if y-x-sz is negative, and y-1-s z choose x-1 if it is nonnegative. Then you can use inclusion-exclusion to write the count as the sum over nonnegative values of s so that y-x-sz is nonnegative of (-1)^s (x choose s)(y-1-sz choose x-1).
You can use generating functions. You can let powers of some variable, say t, hold the total, and the coefficients say how many combinations there are with that total. Then you are asking for the coefficient of t^y in (t+t^2+...+t^z)^x. You can compute this in a few ways.
One approach is with dynamic programming, computing coefficients of (t+t^2+...+t^z)^k for k up to x. The naive approach is probably fast enough: You can compute this for k=1, 2, 3, ..., x. It is a bit faster to use something like repeated squaring, e.g., to compute the 87th power, you could expand 87 in binary as 64+16+4+2+1=0b1010111 (written as a binary literal). You could compute the 1st, 2nd, 4th, 16th, and 64th powers by squaring and multiply these, or you could compute the 0b1, 0b10, 0b101, 0b1010, 0b10101, 0b101011, and 0b1010111 powers by squaring and multiplying to save a little space.
Another approach is to use the binomial theorem twice.
(t+t^2+...+t^z)^x = t^x ((t^z-1)/(t-1))^x
= t^x (t^z-1)^x (t-1)^-x.
The binomial theorem with exponent x lets us rewrite (t^z-1)^x as a sum of (-1)^s t^(z(x-s))(x choose s) where s ranges from 0 to x. It also lets us rewrite (t-1)^-x as an infinite sum of (r+x-1 choose x-1)t^r over nonnegative r. Then we can pick out the finite set of terms which contribute to the coefficient of t^y (r = y-x-sz), and we get the same sum as by inclusion-exclusion above.
For example, suppose we have x=1000, y=1100, z=30. The value is
=1.29 x 10^144.

How to implement fuzzy minimum function via fuzzy maximum

I know that I can represent fuzzy max via power function(i need it in neural network) i.e.
def max(p:Double)(a:Double,b:Double) =
pow(pow(a,p) + pow(b,p) , 1/p)
// assumption a >=0 and b >=0
It is become maximum when p -> infinity and sum when p = 1
Not sure how correctly implement fuzzy minimum.
If you are willing to replace "sum" with "harmonic sum" for the p=1 case, you can use
1/(pow(pow(a,-p) + pow(b,-p),1/p))
This converges to min(a,b) as p goes to infinity.
For p=1 it's 1/(1/a + 1/b), which is related to the harmonic mean but without the factor of 2. Just like in your original formula, a+b is related to the arithmetic mean but without the factor of 2.
However, note that both of these formulas (yours and mine) converge much more slowly to the limit as p goes to infinity, for cases where a and b are closer together.

Multinomial Generation of Degree n

I'm basically looking for a summation function that will compute multinomials given the number of variables and a degree.
Example
2 Variables; 2 Degrees:
x^2+y^2+x*y+x+y+1
Thanks.
See Knuth The Art of Computer Programming, Vol. 4, Fascicle 3 for a comprehensive answer.
Short answer: it's enough to generate all multinomial expressions in n variables with degree exactly d. Then, for your problem, you can either put together the answers with degrees ≤d, or add a dummy variable "1".
The problem of generating all expressions with degree exactly d is thus simply one of generating all ordered partitions (i.e., all nonnegative integer solutions to x1 + ... + xn = d), and this can be done with a simple backtracking algorithm. ("Depth-first search")
Given N variables, and a maximum degree of D, you have an array of D slots to fill with all possible combinations of variables.
[_, _, ..., _, _]
You are allowed to fill the slots with any of the N variables any number <= D times total. Since multiplication is commutative, it suffices to not care about ordering of variables. As such, this problem is reduced to generating (1) partitions of an integer and (2) subsets of a set.
I hope this is at least a start to your solution.
This also seems to be a Dynamic programming variant of the 0-1 Knapsack problem. Here we would be interested in all possible leaves of the decision tree.

Is a given set of group elements a set of coset representatives?

I am afraid the question is a bit technical, but I hope someone might have stumbled into a similar subject, or give me a pointer of some kind.
If G is a group (in the sense of algebraic structure), and if g1, ..., gn are elements of G, is there an algorithm (or a function in some dedicated program, like GAP) to determine whether there is a subgroup of G such that those elements form a set of representatives for the cosets of the subgroup? (We may assume that G is a permutation group, and probably even the full symmetric group.)
(There are of course several algorithms to find the cosets of a given subgroups, like Todd-Coxeter algorithm; this is a kind of inverse question.)
Thanks,
Daniele
The only solution I can come up with is naive. Basically if you have elements x1,...,xn, you would use GAP's LowIndexSubgroupsFpGroup to enumerate all subgroups with index n (discarding those with index < n). Then you would go through each such group, generate the cosets, and check that each coset contains one of the elements.
This is all I could think of. I would be very interested if you came up with a better approach.
What you're trying to determine is if there is a subgroup H of G such that {g1, ..., gn} is a transversal of the cosets of H. i.e. A set of representatives of the partitioning of G by the cosets of H.
First, by Lagrange's theorem, |G| = |G:H| * |G|, where |G:H| = |G|/|H| is the index of the subgroup H of G. If {g1, ..., gn} is indeed a transversal, then |G:H| = |{g1, ..., gn}|, so the first test in your algorithm should be whether n divides |G|.
Moreover, since gi and gj are in the same right coset only if gigj-1 is in H, you can then check subgroups with index n to see if they avoid gigj-1. Also, note that (gigj-1)(gjgk-1) = gigk-1, so you can choose any pairing of the gis.
This should be sufficient if n is small compared to |G|.
Another approach is to start with H being the trivial group and add elements of the set H* = {h in G : hk != gigj-1, for all i, j, k; i != j} to the generators of H until you can't add any more (i.e. until it's no longer a subgroup). H is then a maximal subgroup of G such that H is a subset of H*. If you can get all such H (and have them be large enough) then the subgroup you're looking for must be one of them.
This approach would work better for larger n.
Either way a non-exponential-time approach isn't obvious.
EDIT: I've just found a discussion of this very topic here: http://en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archives/Mathematics/2009_April_18#Is_a_given_set_of_group_elements_a_set_of_coset_representatives.3F
A slightly less brute approach would be to enumerate all subgroups of index n, as Il-Bhima suggested, and then for each subgroup, check each xi * xj-1 to see if it is contained in the subgroup.
The elements x1, ..., xn will be representatives for a subgroup if and only if EVERY product
xi * xj-1 where (i != j)
is NOT in the subgroup.
This type of check seems both simpler than generating all cosets, and computationally faster.

Resources