lattice puzzles in the Constant Propagation for hotspot - math

I read a article below about Constant Propagation for hotspot through lattice .
http://www.cliffc.org/blog/2012/02/27/too-much-theory-part-2/
And it described that " (top meet -1) == -1 == (-1 meet top), so this example works. So does: (1 meet 2,3) == bottom == (2,3 meet 1). So does: (0 meet 0,1) == 0,1 == (0,1 meet 0)"
However I cannot understand why (top meet -1)==1 , and (-1 meet top)==-1? also why (1 meet 2,3) and (2,3 meet 1)==bottom ? how the meet is calculated?

Jacky, from your question it seems that you don't get some basic concepts. Have you tried to read linked Lattice wiki article?
I'm not sure if I can be better than collective mind of the Wiki but I'll try.
Let's start with poset aka "Partially ordered set". Having a "poset" means that you have a set of some objects and some comparator <= that you can feed two objects to and it will say which one is less (or rather "less or equal"). What differs partially ordered set from totally ordered one is that in more usual totally ordered set at least one of a <= b and a >= b holds true. In "partially ordered" mean that both might be false at the same time. I.e. you have some elements that you can't compare at all.
Now lattice is a structure over poset (and potentially not every poset can be converted to a lattice). To define a lattice you need to define two methods meet and join. meet is a function from a pair of elements of the poset to an element of the poset such that (I will use meet(a, b) syntax instead of a meet b as it seems to be more friendly for Java-developers):
For every pair of elements a and b there is an element inf = meet(a,b) i.e. meet is defined for every pair of elements
For every pair of elements a and b meet(a,b) <= a and meet(a,b) <= b
For every pair of elements a and b if inf = meet(a,b) there is no other element c in the set such that c <= a AND c <= b AND NOT c <= inf i.e. meet(a,b) defines the largest common minimum element (or more technically an infimum) and such element is unique.
The same goes for join but the join is finding "maximum" of two elements or more technically supremum.
So now let's go back to the example you referenced. The poset in that example contains of 4 types or rather layers of elements:
Top - an artificial element added to poset such that it is greater than any other element
Single integers
Pairs of neighbor integers (range) such as "[0, 1]" (here unlike the author I will use "[" and "]" to define ranges to not confuse with application of meet)
Bottom - an artificial element added to poset such that it is less than any other element
Note that all elements in single layer are not comparable(!) but all elements in any higher layer are greater than all elements in any lower layer. So no 1 is not less than 2 under that poset structure but [1,2] is less than both 1 and 2.
Note that all elements in single layer are not comparable(!). So no 1 is not less than 2 under that poset structure but [1,2] is less than both 1 and 2. Top is greater than anything. Bottom is less than anything. And range [x,y] is comparable with raw number z if and only if z lines inside the range and in that case range is less, otherwise they are not comparable.
You may notice that the structure of the poset "induces" corresponding lattice. So given such structure it is easy to understand how to define the meet function to satisfy all the requirements:
meet(Top, a) = meet(a, Top) = a for any a
meet(Bottom, a) = meet(a, Bottom) = Bottom for any a
meet(x, y) where both x and y are integers (i.e. for layer #2) is either:
Just x if x = y
Range [x, y] if x + 1 = y
Range [y, x] if y + 1 = x
Bottom otherwise
(I'm not sure if this is the right definition, it might always be range [min(x,y), max(x,y)] unless x = y . It is not clear from examples but it is not very important)
meet([x,y], z) = meet(z, [x,y]) where x, y, and z are integers i.e. meet of an integer (layer #2) and range (layer #3) is:
Range [x, y] if x = z or y = z (in other words if [x,y] < z)
Bottom otherwise
So meet of a range and an integer is almost always Bottom except most trivial cases
meet(a, b) where both a and b are ranges i.e. meet of two ranges (layer #3) is:
Range a is a = b
Bottom otherwise
So meet of two ranges is also Bottom except most trivial cases
What that part of example is about is actually about "inducing" the lattice from the structure and verifying that most of the desirable features hold (except for symmetry which is added in the next example).
Hope this helps
Update (answers to comments)
It is hard to answer "why". This is because the author build his poset in that way (probably because that way will be useful later). I think you are confused because set of natural numbers has a "natural" (pun not intended) sort order that we all are used to. Put there is nothing that could prohibit me to get the same set (i.e. the same object = all natural numbers) and define some other sorting order. Are you familiar with java.util.Comparator interface? Using that interface you can specify any sorting rule for Integer type such as "all even numbers are greater than all odd ones and inside even or odd classes works "usual" comparison rule" and you can use such a Comparartor to sort collection if for some reason such sorting order makes sense for your task. This is the same case, for author's task it makes sense to define an alternative (custom) sorting order. Moreover he want to make it only a partial order (which is impossible with Comparator). An the way he defines his order is the way I described.
Also if it is possible to do compare for [1,2] with 0 or 3?
Yes, you can compare and the answer directly follows from "all elements in single layer are not comparable(!) but all elements in any higher layer are greater than all elements in any lower layer": any number such as 0, 3, 42 or Integer.MAX_VALUE from layer #2 is greater than any range (layer #3) including the [1,2] range.
After some more thinking about it, my original answer was wrong. To satisfy author's goal range [1,2] should be not comparable with 0 or 3. So the answer is No. Actually my specification of the meet is correct but my description of sorting order is wrong.
Also the explanation for top and bottom are different from you, the original author explained that "bottom” == “we don’t know what values this might take on , “top” == “we can pick any value we like", I have do idea if you both explanation for the top and bottom actual refer to the same thing.
Here you mix up how the author defines top and bottom as a part of a mathematical structure called "lattice" and how he uses them for his practical task.
What this article is about is that there is an algorithm that analyses code for optimization based on "analyses of constants" and the algorithm is build upon the lattice of the described structure. The algorithm is based on processing different objects of the defined poset and involves finding meet of them multiple times. What the quoted answer describes is how to interpret the final value that algorithm produces rather than how those values are define.
AFAIU the basic idea behind algorithm is following: we have some variable and we see a few places where the value is assigned to it. For various optimizations it is good to know what it the possible range of values that the variable can take without running the actual code with all possible inputs. So the suggested algorithm is based on a simple idea: if we have two assignments (probably conditional) to the variable and in the first we know that values are in range [L1, R1] and in the second one values are in the range [L2, R2], we can be sure that now value is in the range [min(L1, L2), max(R1, R2)] (and this is effectively how meet is defined on that lattice). So now we can analyze all assignments in a function and to each assign range of possible values. Note that this structure of numbers and unlimited ranges also forms a lattice that the author describes in the first article (http://www.cliffc.org/blog/2012/02/12/too-much-theory/).
Note that Top is effectively impossible in Java because it provide some guarantees, but in C/C++ as the author mentions, we can have a variable that is not assigned at all and in such case C Language standard allows the compiler to treat that variable as having any value by compiler's choice i.e. compiler might assume whatever is most useful for optimization and this is what Top stands for. On the other hand if some value comes in as a argument to a method, it is Bottom because it can be any value without any control by compiler i.e. compiler can not assume anything about the value.
In the second article author points out that although the lattice from the first article is good theoretically in practice it can be very inefficient computationally. Thus to simplify computations he reduces his lattice to a much simpler one but the general theory stays the same: we assign ranges to the variables at each assignment so later we can analyze code and optimize it. And when we finished computing all the ranges, the interpretation for optimization assuming that analyzed line is if in the:
if (variable > 0) {
block#1
}
else {
block#2
}
is following
Top - if the line of code might be optimized assuming the variable has some specific value, compiler is free to do that optimization. So in the example compiler is free to eliminate that branch and decide that code will always go to block#1 and remove block#2 altogether OR decide that code will always go to block#2 and remove block#1 whichever alternative seems better to the compiler.
x - i.e. some constant value x - if the line of code might be optimized assuming the variable has value exactly x, compiler is free to do that optimization. So in the example compiler can evaluate x > 0 with that constant and leave only the code branch that corresponds to the calculated boolean value removing the other branch.
[x, y] - i.e. range from x to y. If the line of code might be optimized assuming the variable has value between x and y, compiler is free to do that optimization. So in the example if x > 0 (and thus y > 0), then compiler can remove block #2; y <= 0 (and thus x <= 0) , then compiler can remove block #1; if x <= 0 and y > 0 compiler can't optimize that code
Bottom - compiler can't optimize that code.

Related

Is the concept of equal subsets a fallacy?

A definition of subset that I found on the internet(In this web, 2nd point 3th paragraph) and in a book(set theory related topics by lipschutz, Page 3 - Def. 1-1) says or implies that:
A=B If at the same time A⊂B and B⊂A;
This would imply that A is contained in B, but it would also imply that B is contained in A.
Wouldn't this be a fallacy as demonstrated in Russell's paradox?
I imagine it would be something like that, it's this right?
This Img
Unfortunately, "contains" can be used in two very different ways for a set: "contains as a member" and "contains as a subset", so I would suggest avoiding it or being very clear which one you mean. I think the second one is less common to use "contains" for, but it still happens.
It's true that A can't be a member of B when B is a member of A (but not really related to Russell's paradox); but A can be a subset of B and B a subset of A. Just consider A={1} and B={1}. Then every member of A (i.e. 1) is a member of B, so A is a subset of B. And vice versa.
I imagine it would be something like that, it's this right? This Img
This would be if B is a proper subset of A (that is, a subset of A but not equal to A) and A is a proper subset of B.
A set X is contained in set Y if every element of X is in Y.
So, even if X is equal to Y, X is considered to be contained in Y.
At this point, it will be clear that if X is contained in Y and Y is contained in X as well in the same time, they should be equal.

Is it possible to represent 'average value' in programming?

Had a tough time thinking of an appropriate title, but I'm just trying to code something that can auto compute the following simple math problem:
The average value of a,b,c is 25. The average value of b,c is 23. What is the value of 'a'?
For us humans we can easily compute that the value of 'a' is 29, without the need to know b and c. But I'm not sure if this is possible in programming, where we code a function that takes in the average values of 'a,b,c' and 'b,c' and outputs 'a' automatically.
Yes, it is possible to do this. The reason for this is that you can model the sort of problem being described here as a system of linear equations. For example, when you say that the average of a, b, and c is 25, then you're saying that
a / 3 + b / 3 + c / 3 = 25.
Adding in the constraint that the average of b and c is 23 gives the equation
b / 2 + c / 2 = 23.
More generally, any constraint of the form "the average of the variables x1, x2, ..., xn is M" can be written as
x1 / n + x2 / n + ... + xn / n = M.
Once you have all of these constraints written out, solving for the value of a particular variable - or determining that many solutions exists - reduces to solving a system of linear equations. There are a number of techniques to do this, with Gaussian elimination with backpropagation being a particularly common way to do this (though often you'd just hand this to MATLAB or a linear algebra package and have it do the work for you.)
There's no guarantee in general that given a collection of equations the computer can determine whether or not they have a solution or to deduce a value of a variable, but this happens to be one of the nice cases where the shape of the contraints make the problem amenable to exact solutions.
Alright I have figured some things out. To answer the question as per title directly, it's possible to represent average value in programming. 1 possible way is to create a list of map data structures which store the set collection as key (eg. "a,b,c"), while the average value of the set will be the value (eg. 25).
Extract the key and split its string by comma, store into list, then multiply the average value by the size of list to get the total (eg. 25x3 and 23x2). With this, no semantic information will be lost.
As for the context to which I asked this question, the more proper description to the problem is "Given a set of average values of different combinations of variables, is it possible to find the value of each variable?" The answer to this is open. I can't figure it out, but below is an attempt in describing the logic flow if one were to code it out:
Match the lists (from Paragraph 2) against one another in all possible combinations to check if a list contains all elements in another list. If so, substract the lists (eg. abc-bc) as well as the value (eg. 75-46). If upon substracting we only have 1 variable in the collection, then we have found the value for this variable.
If there's still more than 1 variables left such as abcd - bc = ad, then store the values as a map data structure and repeat the process, till the point where the substraction count in the full iteration is 0 for all possible combinations (eg. ac can't substract bc). This is unfortunately not where it ends.
Further solutions may be found by combining the lists (eg. ac + bd = abcd) to get more possible ways to subtract and derive at the answer. When this is the case, you just don't know when to stop trying, and the list of combinations will get exponential. Maybe someone with strong related mathematical theories may be able to prove that upon a certain number of iteration, further additions are useless and hence should stop. Heck, it may even be possible that negative values are also helpful, and hence contradict what I said earlier about 'ac' can't subtract 'bd' (to get a,c,-b,-d). This will give even more combinations to compute.
People with stronger computing science foundations may try what templatetypedef has suggested.

Why do we need to have absolute values of V(number of Vertices) and E(Number of Edges) in upper bound of various Graph algorithms

I have been reading through the Graph Algorithms recently and saw the notation for various upper bounds of graph algorithms is of the form O(|V| + |E|). especially in DFS/BFS search algorithms where linear time is achieved with above upper bound.
I have seen both the notations interchangeably used, i.e. O(V+E) as well. as far as I understand "|" bar notation is used for absolute values in math world. if V = # of vertices and E = # of Edges, how can they be negative numbers, such that we have need to get the absolute values before computing the linear function. Please help.
|X| refers to the cardinality (size) of X when X is a set.
O(V+E) is technically incorrect, assuming that V and E refer to sets of vertices and edges. This is because the value inside O( ) should be quantitative, rather than abstract sets of objects that have an ambiguous operator applied to them. |V| + |E| is well-defined to be one number plus another, whereas V + E could mean a lot of things.
However, in informal scenarios (e.g. conversing over the internet and in person), many people (including me) still say O(V+E), because the cardinality of the sets is implied. I like to type fast and adding in 4 pipe characters just to be technically correct is unnecessary.
But if you need to be technically correct, i.e. you're in a formal environment, or e.g. you're writing your computer science dissertation, it's best to go with O(|V|+|E|).
In this case, the vertical bars || denote the cardinality or number of elements of a set (i.e. |E| represents the count of elements in the set E).
http://en.wikipedia.org/wiki/Cardinality

derangements and permutations in cryptography

i have a problem that i am having a bit of trouble with;
we are given a partial key (missing 11 letters) for a mono-alphabetic substitution cipher and asked to calculate the number of possible keys given that no plaintext letter can be mapped to itself.
ordinarily, the number of possible keys would be the number of derangements of the missing letters (!11), however 5 of the plaintext letters that are missing mappings already exist as mappings in the partial key, so logically it shouldnt matter what the mapping of those plaintext letters is, because they can never map to themselves.
so shouldnt the number of possible keys be 5! * !6, ie. (the number of permutations of the 5 already mapped free letters) * (the number of derangements of the remaining 6)?
the problem is that 5! * !6 = 31800 which is much less than !11 = 14684570
intuitively the set of derangements should be a smaller subset of !11, shouldnt it?
am i just getting something wrong in my arithmetic? or am i completely missing the concepts? any help would be greatly appreciated
thanks gus
ps. i know this isn't strictly a programming question, but it is a computing question and related to a programming project, so i thought it might be pertinent. also, i posted it on math.stackexchange.com yesterday but havent had any responses yet..
EDIT: corrected the value of !11
I think your problem can be rephrased as the following:
How many permutation has a list with elements a_0, a_1, ... a_n-1, b_0, b_1, ..., b_m-1, in which no a_k element is at position k? (Let us denote this number with p_{n,m} - your specific question is the value of p_{6,5}.)
Please note that your suggested formula 5!*!6 is not correct because of the following:
it only counts the cases, where the a_ks are in the first 6 positions (without any of them being in the position of its own index), and the b_ks on the last 5.
You do not count any other configurations like: a_3, b_4, b_1, a_0, a_5, b_0, a_2, b_2, b_3, a_1, a_4, where the order is totally mixed.
Your other idea about the result being a subset of the !11-element derangement on all the elements is also not correct, as any of the b_ks can be at any position.
However, we can easily add a recursive formula for p_{n,m} by separating it into two cases based on the position of a_0.
If a_0 gets in one of the positions 1, 2, ..., n-1. (n-1 different possibilities.)
This means that neither a_0 is at position 0, and it also prevents another a_k from being at position k by occupying that position. Thus this a_k becomes 'free', it can go to any other positions. If a_0 gets fixed this way, the other elements can be permutated in p_{n-2,m+1} different ways.
If a_0 gets in one of the positions n, n+1, ..., n+m-1. (m different possibilities.)
This way no other a_k gets prevented to be at the position corresponding to it's index. The other elements can be permutated in p_{n-1,m} different ways.
Adding this together gives the recursion: p_{n,m} = (n-1)*p_{n-2,m+1} + m*p_{n-1,m}. The halting conditions are p_{0,m}=m! for every m, as it means, that each element can be at any location.
I also coded it in python:
import math
def derange(n,m):
if n<0:
return 0
elif n==0:
return math.factorial(m)
else:
return (n-1)*derange(n-2, m+1) + m*derange(n-1, m)
print derange(6,5)
gives 22852200.
If you are interested in the general case, you can find some related sequences on OEIS.
The search term 'differences of factorial numbers' can be interesting, e.g. in triangular form: http://oeis.org/A047920.
There is also an article mentioned there: http://www.pmfbl.org/janjic/enumfun.pdf, maybe it can help if you are interested in a generic closed formula for n and m.
Suddenly I didn't have any good idea to come up with, but I think this can be a good starting point.

Making a cryptaritmetic solver in C++

I am planning out a C++ program that takes 3 strings that represent a cryptarithmetic puzzle. For example, given TWO, TWO, and FOUR, the program would find digit substitutions for each letter such that the mathematical expression
TWO
+ TWO
------
FOUR
is true, with the inputs assumed to be right justified. One way to go about this would of course be to just brute force it, assigning every possible substitution for each letter with nested loops, trying the sum repeatedly, etc., until the answer is finally found.
My thought is that though this is terribly inefficient, the underlying loop-check thing may be a feasible (or even necessary) way to go--after a series of deductions are performed to limit the domains of each variable. I'm finding it kind of hard to visualize, but would it be reasonable to first assume a general/padded structure like this (each X represents a not-necessarily distinct digit, and each C is a carry digit, which in this case, will either be 0 or 1)? :
CCC.....CCC
XXX.....XXXX
+ XXX.....XXXX
----------------
CXXX.....XXXX
With that in mind, some more planning thoughts:
-Though leading zeros will not be given in the problem, I probably ought to add enough of them where appropriate to even things out/match operands up.
-I'm thinking I should start with a set of possible values 0-9 for each letter, perhaps stored as vectors in a 'domains' table, and eliminate values from this as deductions are made. For example, if I see some letters lined up like this
A
C
--
A
, I can tell that C is zero and this eliminate all other values from its domain. I can think of quite a few deductions, but generalizing them to all kinds of little situations and putting it into code seems kind of tricky at first glance.
-Assuming I have a good series of deductions that run through things and boot out lots of values from the domains table, I suppose I'd still just loop over everything and hope that the state space is small enough to generate a solution in a reasonable amount of time. But it feels like there has to be more to it than that! -- maybe some clever equations to set up or something along those lines.
Tips are appreciated!
You could iterate over this problem from right to left, i.e. the way you'd perform the actual operation. Start with the rightmost column. For every digit you encounter, you check whether there already is an assignment for that digit. If there is, you use its value and go on. If there isn't, then you enter a loop over all possible digits (perhaps omitting already used ones if you want a bijective map) and recursively continue with each possible assignment. When you reach the sum row, you again check whether the variable for the digit given there is already assigned. If it is not, you assign the last digit of your current sum, and then continue to the next higher valued column, taking the carry with you. If there already is an assignment, and it agrees with the last digit of your result, you proceed in the same way. If there is an assignment and it disagrees, then you abort the current branch, and return to the closest loop where you had other digits to choose from.
The benefit of this approach should be that many variables are determined by a sum, instead of guessed up front. Particularly for letters which only occur in the sum row, this might be a huge win. Furthermore, you might be able to spot errors early on, thus avoiding choices for letters in some cases where the choices you made so far are already inconsistent. A drawback might be the slightly more complicated recursive structure of your program. But once you got that right, you'll also have learned a good deal about turning thoughts into code.
I solved this problem at my blog using a randomized hill-climbing algorithm. The basic idea is to choose a random assignment of digits to letters, "score" the assignment by computing the difference between the two sides of the equation, then altering the assignment (swap two digits) and recompute the score, keeping those changes that improve the score and discarding those changes that don't. That's hill-climbing, because you only accept changes in one direction. The problem with hill-climbing is that it sometimes gets stuck in a local maximum, so every so often you throw out the current attempt and start over; that's the randomization part of the algorithm. The algorithm is very fast: it solves every cryptarithm I have given it in fractions of a second.
Cryptarithmetic problems are classic constraint satisfaction problems. Basically, what you need to do is have your program generate constraints based on the inputs such that you end up with something like the following, using your given example:
O + O = 2O = R + 10Carry1
W + W + Carry1 = 2W + Carry1 = U + 10Carry2
T + T + Carry2 = 2T + Carry2 = O + 10Carry3 = O + 10F
Generalized pseudocode:
for i in range of shorter input, or either input if they're the same length:
shorterInput[i] + longerInput2[i] + Carry[i] = result[i] + 10*Carry[i+1] // Carry[0] == 0
for the rest of the longer input, if one is longer:
longerInput[i] + Carry[i] = result[i] + 10*Carry[i+1]
Additional constraints based on the definition of the problem:
Range(digits) == {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Range(auxiliary_carries) == {0, 1}
So for your example:
Range(O, W, T) == {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Range(Carry1, Carry2, F) == {0, 1}
Once you've generated the constraints to limit your search space, you can use CSP resolution techniques as described in the linked article to walk the search space and determine your solution (if one exists, of course). The concept of (local) consistency is very important here and taking advantage of it allows you to possibly greatly reduce the search space for CSPs.
As a simple example, note that cryptarithmetic generally does not use leading zeroes, meaning if the result is longer than both inputs the final digit, i.e. the last carry digit, must be 1 (so in your example, it means F == 1). This constraint can then be propagated backwards, as it means that 2T + Carry2 == O + 10; in other words, the minimum value for T must be 5, as Carry2 can be at most 1 and 2(4)+1==9. There are other methods of enhancing the search (min-conflicts algorithm, etc.), but I'd rather not turn this answer into a full-fledged CSP class so I'll leave further investigation up to you.
(Note that you can't make assumptions like A+C=A -> C == 0 except for in least significant column due to the possibility of C being 9 and the carry digit into the column being 1. That does mean that C in general will be limited to the domain {0, 9}, however, so you weren't completely off with that.)

Resources