Transition Graph per alphabet? - graph

How do you determine how many different transition graphs (TGs) there are over a particular alphabet? For example, how many TGs are there over the alphabet {x, y}? I am taking a class with a similar question from Daniel I. A. Cohen's book, "Introduction to Computer Theory." There are plenty of examples of how to create a TG, but nothing on how to determine how many can be created per language. I'm assuming I'm looking for a finite number of TGs? Thank you very much!

There are countably infinitely many such transition graphs. One way to think about this is that you can easily construct a family of infinitely many transition graphs as follows. Suppose that I want to accept the language {a^n} for some fixed n (that is, the single string made of n copies of the letter a). Then I could construct a transition graph that accepts that language as follows. Begin with a start state, then chain n new states onto the end of that state, each with a transition on 'a' to the next state. Make the last state accepting. Each value of n gives a different transition graph, so there are infinitely many of them.
To see that there are only countably infinitely many of these, we can think of how we would describe these automata. We could do so by writing out the number of states in unary, then the transitions between those states as a list of tuples (start, end, character) (all encoded in binary), then the accepting states as a list of the numbers of the states in unary. Concatenated together, this is a binary string, and there are only countably many finite binary strings.
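The chain construction described above can be sketched in a few lines. This is only an illustration: the dictionary layout and the names chain_tg and accepts are made up here, and the edges carry single letters, which is a special case of a transition graph.

```python
def chain_tg(n, letter="a"):
    """Build the chain graph for the language {letter^n}.

    States are numbered 0..n; state 0 is the start state and state n is
    the only accepting state. (Hypothetical representation.)
    """
    edges = {(i, letter): i + 1 for i in range(n)}
    return {"start": 0, "accept": {n}, "edges": edges}


def accepts(tg, word):
    """Follow the edges letter by letter; reject if an edge is missing."""
    state = tg["start"]
    for ch in word:
        key = (state, ch)
        if key not in tg["edges"]:
            return False
        state = tg["edges"][key]
    return state in tg["accept"]


tg3 = chain_tg(3)
print(accepts(tg3, "aaa"))   # the only word this graph accepts
print(accepts(tg3, "aa"))    # too short, rejected
```

Since chain_tg(n) accepts a different language for every n, the family is infinite.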

Related

Are these two different formulas for Value-Iteration update equivalent?

While studying MDPs from different sources, I came across two different formulas for the value update in the value-iteration algorithm.
The first one is (the one on Wikipedia and a couple of books):
.
And the second one is (in some questions here on stack, and my course slides) :
For a specific iteration, they don't seem to give the same answer. Does one of them converge faster to the solution?
The difference is in the reward function: R_a(s, s') in the first formula versus R(s) in the second.
The first equation is the more general one.
In the first one, the reward is R_a(s, s') when transitioning from state s to state s' due to action a.
The reward can be different for different states and actions.
But if every state s has some pre-defined reward (regardless of the previous state and the action that leads to s), then we can simplify the formula to the second one.
The final values are not necessarily equal, but the resulting policies are the same.
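The two formulas did not survive in this copy of the thread. As a rough check of the claim above, here is a sketch that assumes they are the usual pair: V(s) <- max_a sum_s' P_a(s, s') [R_a(s, s') + gamma V(s')] (the Wikipedia form) and V(s) <- R(s) + gamma max_a sum_s' P_a(s, s') V(s'). The 3-state MDP, its numbers and all function names are made up for illustration, and the reward is taken to depend only on the state that is entered.

```python
GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = [0, 1]
# P[a][s][s2]: probability of moving from s to s2 under action a (made-up numbers)
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3], [0.0, 0.5, 0.5]],
    [[0.1, 0.0, 0.9], [0.5, 0.5, 0.0], [0.3, 0.3, 0.4]],
]
R = [0.0, 1.0, 5.0]  # reward attached to each state


def q_first(V, s, a):
    # First form: the reward R[s2] is collected on arrival in s2.
    return sum(P[a][s][s2] * (R[s2] + GAMMA * V[s2]) for s2 in STATES)


def q_second(V, s, a):
    # Second form: the reward sits outside the max, so only V is averaged.
    return sum(P[a][s][s2] * V[s2] for s2 in STATES)


def iterate_first(iters=500):
    V = [0.0] * len(STATES)
    for _ in range(iters):
        V = [max(q_first(V, s, a) for a in ACTIONS) for s in STATES]
    return V


def iterate_second(iters=500):
    V = [0.0] * len(STATES)
    for _ in range(iters):
        V = [R[s] + GAMMA * max(q_second(V, s, a) for a in ACTIONS)
             for s in STATES]
    return V


def greedy(q, V):
    return [max(ACTIONS, key=lambda a: q(V, s, a)) for s in STATES]


V1, V2 = iterate_first(), iterate_second()
print(V1, V2)                                   # the values differ
print(greedy(q_first, V1), greedy(q_second, V2))  # the greedy policies agree
```

Under these assumptions the two value functions are related by V2(s) = R(s) + gamma V1(s), which is why the values differ while the greedy policies coincide.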

Number of valid parenthesis catalan number explanation

While studying Catalan numbers, some of the applications that I came across were:
The number of possible binary search trees using n nodes.
The number of ways to draw non-intersecting chords using 2*n points on a circle.
The number of ways to arrange n pairs of parentheses.
While I understand how Catalan numbers fit into the solution of the first two problems, I am not able to understand how they fit into the third problem.
I couldn't find any useful resource on the internet that explains the HOW part. Everyone just says that it's the solution.
Can someone please explain?
Since others do not seem to agree with me that this question is off-topic, I now decide that it is on topic and provide an answer.
The Wikipedia article is indeed confusing about the "number of ways to arrange n pairs of parentheses" (the second bullet point in this link.) Part of the confusion is that the order of the strings of parentheses does not match the order of the binary trees, which you do understand, or of many of the other examples.
Here is a way to transform a string of n pairs of correctly matched parentheses into a binary tree with n internal nodes. Consider the left-most parenthesis, which will be a left-parenthesis, together with its matching right-parenthesis. Turn the string into a node of the binary tree. The sub-string that is inside the currently-considered parentheses becomes the left child of this node, and the sub-string that is after (to the right of) the currently-considered right-parenthesis becomes the right child. Either or both sub-strings may be empty, and the currently-considered parentheses are simply removed. If either sub-string is not empty, continue this procedure recursively until all parentheses have been removed.
Here are two examples. Let's start with the string ((())). We start with
The considered-parentheses are the outermost ones. This becomes
(I did not bother drawing the external leaf nodes) then
then
which is Wikipedia's left-most binary tree with 3 internal nodes.
Now let's do another string, (())(). We start with
Again, the considered-parentheses are the outermost ones. This transforms to
And now the considered-parentheses are the first two, not the outermost ones. This becomes
which finally becomes
which is the second binary tree in Wikipedia's list.
I hope you now understand. Here is a list of all five possible strings of 3 pairs of parentheses that are correctly paired, followed by Wikipedia's list of binary trees. These lists now correspond to each other.
((())) (()()) (())() ()(()) ()()()
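The transformation described above can also be sketched in code. This is only an illustration: the nested-tuple tree representation and the names to_tree and balanced are choices made here, with None standing for an empty sub-string (an external leaf).

```python
def to_tree(s):
    """Turn a correctly matched parenthesis string into a binary tree.

    The empty string maps to None; otherwise the result is a pair
    (left, right), where left comes from the text inside the first
    pair of parentheses and right from the text after it.
    """
    if not s:
        return None
    depth = 0
    for i, ch in enumerate(s):
        depth += 1 if ch == "(" else -1
        if depth < 0:
            break
        if depth == 0:
            # s[i] is the right-parenthesis matching s[0]
            return (to_tree(s[1:i]), to_tree(s[i + 1:]))
    raise ValueError("parentheses are not correctly matched")


def balanced(n):
    """Yield every correctly matched string of n pairs of parentheses."""
    if n == 0:
        yield ""
        return
    for k in range(n):
        for inside in balanced(k):
            for after in balanced(n - 1 - k):
                yield "(" + inside + ")" + after


strings = list(balanced(3))
trees = {to_tree(s) for s in strings}
print(len(strings), len(trees))  # same count: every string gives its own tree
```

Because every string maps to a different tree and every tree can be reached, the strings are counted by the same Catalan number as the binary trees.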

What is the difference between permutations and derangements?

I have been given a program to write that lists the different combinations of a set of numbers entered by the user. When I researched this, I found examples that use the terms permutations and derangements.
I am unable to find a clear distinction between them. There is also one more term, combinations. Could anyone please provide a simple one-liner for each to clarify?
Thanks in advance.
http://en.wikipedia.org/wiki/Permutation
The notion of permutation relates to the act of rearranging, or permuting, all the members of a set into some sequence or order (unlike combinations, which are selections of some members of the set where order is disregarded). For example, written as tuples, there are six permutations of the set {1,2,3}, namely: (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), and (3,2,1). As another example, an anagram of a word, all of whose letters are different, is a permutation of its letters.
http://en.wikipedia.org/wiki/Derangement
In combinatorial mathematics, a derangement is a permutation of the elements of a set such that none of the elements appear in their original position.
The number of derangements of a set of size n, usually written D_n, d_n, or !n, is called the "derangement number" or "de Montmort number". (These numbers are generalized to rencontres numbers.) The subfactorial function (not to be confused with the factorial n!) maps n to !n. No standard notation for subfactorials is agreed upon; n¡ is sometimes used instead of !n.
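As a small illustration of the three terms, here is a sketch using Python's itertools. The helper derangements is written here for the example (it is not a library function) and assumes the input elements are distinct.

```python
from itertools import combinations, permutations


def derangements(items):
    """Permutations in which no element stays in its original position."""
    items = tuple(items)
    return [p for p in permutations(items)
            if all(p[i] != items[i] for i in range(len(items)))]


nums = [1, 2, 3]
print(list(permutations(nums)))     # all 6 orderings of the whole set
print(derangements(nums))           # [(2, 3, 1), (3, 1, 2)]
print(list(combinations(nums, 2)))  # [(1, 2), (1, 3), (2, 3)], order ignored
```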

Creating an efficient function to fit a dataset

Basically I have a large (could get as large as 100,000-150,000 values) data set of 4-byte inputs and their corresponding 4-byte outputs. The inputs aren't guaranteed to be unique (which isn't really a problem because I figure I can generate pseudo-random numbers to add or xor the inputs with so that they do become unique), but the outputs aren't guaranteed to be unique either (so two different sets of inputs might have the same output).
I'm trying to create a function that effectively models the values in my data-set. I don't need it to interpolate efficiently, or even at all (by this I mean that I'm never going to feed it an input that isn't contained in this static data-set). However it does need to be as efficient as possible. I've looked into interpolation and found that it doesn't really fit what I'm looking for. For example, the large number of values means that spline interpolation won't do since it creates a polynomial per interval.
Also, from my understanding polynomial interpolation would be way too computationally expensive (n values means that the polynomial could include terms as high as pow(x,n-1). For x= a 4-byte number and n=100,000 it's just not feasible). I've tried looking online for a while now, but I'm not very strong with math and must not know the right terms to search with because I haven't come across anything similar so far.
I can see that this is not completely (to put it mildly) a programming question and I apologize in advance. I'm not looking for the exact solution or even a complete answer. I just need pointers on the topics that I would need to read up on so I can solve this problem on my own. Thanks!
TL;DR - I need a variant of interpolation that only needs to fit the initially given data-points, but which is computationally efficient.
Edit:
Some clarification - I do need the output to be exact and not an approximation. This is sort of an optimization of some research work I'm currently doing and I need to have this look-up implemented without the actual bytes of the outputs being present in my program. I can't really say a whole lot about it at the moment, but I will say that for the purposes of my work, encryption (or compression or any other form of obfuscation) is not an option to hide the table. I need a mathematical function that can recreate the output so long as it has access to the input. I hope that clears things up a bit.
Here is one idea. Make your function be the sum (mod 2^32) of a linear function over all 4-byte integers, a piecewise linear function whose pieces depend on the value of the first bit, another piecewise linear function whose pieces depend on the value of the first two bits, and so on.
The actual output values appear nowhere, you have to add together linear terms to get them. There is also no direct record of which input values you have. (Someone could conclude something about those input values, but not their actual values.)
The various coefficients you need can be stored in a hash. Any lookups you do which are not found in the hash are assumed to be 0.
If you add a certain amount of random "noise" to your dataset before starting to encode it fairly efficiently, it would be hard to tell what your input values are, and very hard to tell what the outputs are even approximately without knowing the inputs.
Since you didn't impose any restriction on the function (continuous, smooth, etc), you could simply do a piece-wise constant interpolation:
or a linear interpolation:
I assume you can figure out how to construct such a function without too much trouble.
EDIT: In light of your additional requirement that such a function should "hide" the data points...
For a piece-wise constant interpolation, the constant intervals should be randomized so as to not reveal where the data point is. So for example in the picture, the intervals are centered about the data point it's interpolating. Instead, you might want to do something like:
[0 , 0.3) -> 0
[0.3 , 1.9) -> 0.8
[1.9 , 2.1) -> 0.9
[2.1 , 3.5) -> 0.2
etc
Of course, this only hides the x-coordinate. To hide the y-coordinate as well, you can use a linear interpolation.
Simply make it so that the "pointy" part isn't where the data point is. Pick random x-values such that every adjacent data point has one of these x-values in between. Then interpolate such that the "pointy" part is at these x-values.
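The piece-wise constant variant with shifted interval boundaries can be sketched with a binary search over the breakpoints. The breakpoints and values below are the ones from the interval list above; the final endpoint 3.5 closing the table and the function name lookup are assumptions made for this example.

```python
import bisect

# Left edges of the intervals, plus the right edge of the last one.
BREAKS = [0.0, 0.3, 1.9, 2.1, 3.5]
# VALUES[i] is returned for BREAKS[i] <= x < BREAKS[i + 1].
VALUES = [0.0, 0.8, 0.9, 0.2]


def lookup(x):
    """Piece-wise constant function over half-open intervals."""
    if not BREAKS[0] <= x < BREAKS[-1]:
        raise ValueError("x is outside the tabulated range")
    return VALUES[bisect.bisect_right(BREAKS, x) - 1]


print(lookup(1.0))  # 0.8
print(lookup(2.0))  # 0.9
print(lookup(3.0))  # 0.2
```

Each lookup costs one binary search, so it stays cheap even with on the order of 100,000 intervals.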
I suggest a huge lookup table (LUT) full of unused entries. It's the brute-force approach: an ordered table of outputs, indexed by every possible value of the input (not just the data set, but also all other possible 4-byte values).
Though all of your data would be there, you could fill the unused inputs with random, arbitrary, or stochastic (random within potentially complex constraints) data. If you make it convincing, no one could pick your real data out of it. If a "real" function interpolated all your data, it would also "contain" all the information of your real data, and anyone with access to it could use it to generate an LUT as described above.
LUTs are lightning-fast, but very memory hungry. Your case is on the edge of feasibility, requiring 2^32 entries × 32 bits = 16 gigabytes of RAM, which requires a 64-bit machine to run. That is just for the data, not the program, the operating system, or other data. It's better to have 24, just to be sure. If you can afford it, they are the way to go.
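The memory estimate can be checked with a couple of lines of arithmetic (variable names are only for this example):

```python
ENTRIES = 2 ** 32        # one slot for every possible 4-byte input
BYTES_PER_ENTRY = 4      # each output is 32 bits

total_bytes = ENTRIES * BYTES_PER_ENTRY
size_gib = total_bytes / 2 ** 30
print(size_gib)  # 16.0
```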

pattern matching

Suppose I have a set of tuples like this (each tuple will have 1,2 or 3 items):
Master Set:
{(A) (A,C) (B,C,E)}
and suppose I have another set of tuples like this:
Real Set: {(BOB) (TOM) (ERIC,SALLY,CHARLIE) (TOM,SALLY) (DANNY) (DANNY,TOM) (SALLY) (SALLY,TOM,ERIC) (BOB,SALLY) }
What I want to do is to extract all subsets of Tuples from the Real Set where the tuple members can be substituted to become the same as the Master Set.
In the example above, two sets would be returned:
{(BOB) (BOB,SALLY) (ERIC,SALLY,CHARLIE)}
(let BOB=A,ERIC=B,SALLY=C,CHARLIE=E)
and
{(DANNY) (DANNY,TOM) (SALLY,TOM,ERIC)}
(let DANNY=A,SALLY=B,TOM=C,ERIC=E)
It's sort of pattern matching, sort of combinatorics, I guess. I really don't know how to classify this problem or what common plans of attack there are for it. What would the stackoverflow experts suggest?
Separate your tuples into sets by size. Within each set, create a data structure that allows you to efficiently query for tuples containing a given element. The first part of this structure is your tuples as an array (so that each tuple has a canonical index). The second part is: Map String (Set Int). This is somewhat space intensive but hopefully not prohibitive.
Then you essentially brute-force it. For all assignments to the first master set, restrict all assignments to other master sets. For all remaining assignments to the second, restrict all assignments to the third and beyond, etc. The algorithm is basically inductive.
I should add that I don't think the problem is NP-complete so much as just flat worst-case exponential. It's not a decision problem, but an enumeration problem. And it's fairly easy to imagine scenarios of inputs that blow up exponentially.
It will be difficult to do efficiently since your problem is probably NP-complete (it includes subgraph isomorphism as a special case). That assumes the patterns and database both vary in size, though. How much data are you searching? How complicated will your patterns be? I would recommend the brute force solution first, then test if that is too slow and you need something fancier.
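A minimal sketch of the brute-force search suggested in these answers, without the per-size index. It rests on assumptions the question does not spell out: tuples are matched position by position, different master letters must map to different names, and no real tuple is used twice. The function name matches is made up for this example.

```python
def matches(master, real):
    """Yield (chosen_tuples, binding) for every consistent substitution."""

    def extend(i, binding, chosen):
        if i == len(master):
            yield list(chosen), dict(binding)
            return
        pattern = master[i]
        for cand in real:
            if len(cand) != len(pattern) or cand in chosen:
                continue
            new = dict(binding)
            ok = True
            for var, name in zip(pattern, cand):
                if var in new:
                    if new[var] != name:      # clashes with an earlier binding
                        ok = False
                        break
                elif name in new.values():    # name already used by another letter
                    ok = False
                    break
                else:
                    new[var] = name
            if ok:
                yield from extend(i + 1, new, chosen + [cand])

    yield from extend(0, {}, [])


MASTER = [("A",), ("A", "C"), ("B", "C", "E")]
REAL = [("BOB",), ("TOM",), ("ERIC", "SALLY", "CHARLIE"), ("TOM", "SALLY"),
        ("DANNY",), ("DANNY", "TOM"), ("SALLY",), ("SALLY", "TOM", "ERIC"),
        ("BOB", "SALLY")]

for chosen, binding in matches(MASTER, REAL):
    print(chosen, binding)
```

Under these assumptions the search returns the two subsets listed in the question and also a third, (TOM) (TOM,SALLY) (ERIC,SALLY,CHARLIE) with TOM=A, SALLY=C, ERIC=B, CHARLIE=E, which appears to satisfy the same rules. The running time is exponential in the worst case, as the answers warn.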

Resources