Eta expansion - name origin - functional-programming

Just wondering where the name "eta" comes from.
The only two things I know about eta are:
estimated time of arrival
seventh letter of Greek alphabet

It seems that the rules were simply named after the first letters of the Greek alphabet:
α - variable renaming
β - beta reduction
γ - (I haven't seen any gamma rule, if you have, please let me know)
δ - Church's delta rule (see a very short note in Barendregt, H. P., The Lambda Calculus: Its Syntax and Semantics):
δ MN = T if M=N and δ MN = F if M is not N for all closed nf's M and N
ε - (I haven't seen any epsilon rule)
ζ - if Ux=Vx and x doesn't occur in UV then U=V
η - the eta-rule
A further interesting source of information is History of Lambda-calculus and
Combinatory Logic by F. Cardone and J. R. Hindley; it's likely that there were some rules that were abandoned a long time ago.

Introducing fixed representation for a quotient type in Isabelle

This question is better explained with an example. Suppose I want to prove the following lemma:
lemma int_inv: "(n::int) - (n::int) = (0::int)"
How I'd informally prove this is something along these lines:
Lemma: n - n = 0, for any integer n and 0 = abs_int(0,0).
Proof:
Let abs_int(a,b) = n for some fixed natural numbers a and b.
--- some complex and mind blowing argument here ---
That means it suffices to prove that a+b+0 = a+b+0, which is true by reflexivity.
QED.
However, I'm having trouble with the first step "Let abs_int(a,b) = n". The let statement doesn't seem to be made for this, as it only allows one term on the left side, so I'm lost as to how I could introduce the variables a and b in an arbitrary representation for n.
How may I introduce a fixed representation for a quotient type so I may use the variables in it?
Note: I know the statement above can be proved by auto, and the problem may be sidestepped by rewriting the lemma as lemma int_inv: "Abs_integ(a,b) - Abs_integ(a,b) = (0::int)". However, I'm looking specifically for a way to prove it by introducing an arbitrary representation in the proof.
You can introduce a concrete representation with the theorem int.abs_induct. However, you almost never want to do that manually.
The general method of proving statements about quotients is to first state an equivalent theorem about the underlying relation, and then use the transfer tool. It would've helped if your example wasn't automatically discharged by automation... in fact, let's create our own little int type so that it isn't:
theory Scratch
imports Main
begin
quotient_type int = "nat × nat" / "intrel"
morphisms Rep_Integ Abs_Integ
proof (rule equivpI)
show "reflp intrel" by (auto simp: reflp_def)
show "symp intrel" by (auto simp: symp_def)
show "transp intrel" by (auto simp: transp_def)
qed
lift_definition sub :: "int ⇒ int ⇒ int"
is "λ(x, y) (u, v). (x + v, y + u)"
by auto
lift_definition zero :: "int" is "(0, 0)".
Now, we have
lemma int_inv: "sub n n = zero"
apply transfer
proof (prove)
goal (1 subgoal):
1. ⋀n. intrel ((case n of (x, y) ⇒ λ(u, v). (x + v, y + u)) n) (0, 0)
So, the version we want to prove is
lemma int_inv': "intrel ((case n of (x, y) ⇒ λ(u, v). (x + v, y + u)) n) (0, 0)"
by (induct n) simp
Now we can transfer it with
lemma int_inv: "sub n n = zero"
by transfer (fact int_inv')
Note that the transfer proof method backtracks: it will try many possible transfers until one of them succeeds. Note, however, that this backtracking doesn't apply across separate apply commands. Thus you will always want to write a transfer proof as by transfer something_simple, instead of, say, proof transfer.
You can see the many possible versions with
apply transfer
back back back back back
Note also, that if your theorem mentions constants about int which weren't defined with lift_definition, you will need to prove a transfer rule for them separately. There are some examples of that here.
In general, after defining a quotient you will want to "forget" about its underlying construction as soon as possible, proving enough properties by transfer so that the rest can be proven without peeking into your type's construction.

Asymptotic bounds and Big O notation

Is it right to say the following: suppose we have two monotonically increasing functions f, g such that f(n) = Ω(n) and f(g(n)) = O(n); then g(n) = O(n)?
I thought this was a false claim, and I've been trying to construct a counterexample to show that it is false, but after many attempts I'm starting to think otherwise.
Can you please provide an explanation or counterexample if the claim is false, or a way to prove it if it's correct?
I believe this claim is true. Here's a proof.
Suppose that f(n) = Ω(n). That means that there are constants c, n0 such that
f(n) ≥ cn for any n ≥ n0. (1)
Similarly, since f(g(n)) = O(n), we know that there are constants d, n1 such that
f(g(n)) ≤ dn for any n ≥ n1. (2)
Now, there are two options. The first is that g(n) = O(1), in which case we're done because g(n) is then O(n). The second case is that g(n) ≠ O(1), in which case g grows without bound. That means that there is an n2 such that g(n2) ≥ n0 (g grows without bound, so it eventually overtakes n0) and n2 ≥ n1 (just pick a big n2).
Now, pick any n ≥ n2. Since n ≥ n2, we have that g(n) ≥ g(n2) ≥ n0 because g is monotone increasing, and therefore by (1) we see that
f(g(n)) ≥ cg(n).
Since n ≥ n2 ≥ n1, we can combine this inequality with equation (2) to see that
dn ≥ f(g(n)) ≥ cg(n).
so, in particular, we have that
g(n) ≤ (d / c)n
for all n ≥ n2, so g(n) = O(n).
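As a quick numerical sanity check (not part of the proof), here is a small Python sketch. The particular choice f(n) = n, g(n) = 3n is arbitrary; it just satisfies the hypotheses with c = 1 and d = 3, and the script verifies that g stays below the derived bound (d/c)n on a test range.

# Illustrative check only: f(n) = n and g(n) = 3n are arbitrary monotone functions
# satisfying f(n) = Omega(n) (c = 1) and f(g(n)) = O(n) (d = 3).
def f(n):
    return n

def g(n):
    return 3 * n

c, d = 1, 3
for n in range(1, 10000):
    assert f(n) >= c * n             # hypothesis (1)
    assert f(g(n)) <= d * n          # hypothesis (2)
    assert g(n) <= (d / c) * n       # conclusion: g(n) = O(n)
print("g(n) <= (d/c)*n holds on the tested range")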

Expected worst-case time complexity of chained hash table lookups?

When implementing a hash table using a good hash function (one where the probability of any two elements colliding is 1 / m, where m is the number of buckets), it is well known that the average-case running time for looking up an element is Θ(1 + α), where α is the load factor. The worst-case running time is O(n), though, if all the elements end up in the same bucket.
I was recently doing some reading on hash tables and found this article which claims (on page 3) that if α = 1, the expected worst-case complexity is Θ(log n / log log n). By "expected worst-case complexity," I mean, on expectation, the maximum amount of work you'll have to do if the elements are distributed by a uniform hash function. This is different from the actual worst-case, since the worst-case behavior (all elements in the same bucket) is extremely unlikely to actually occur.
My question is the following: the author seems to suggest that varying the value of α can change the expected worst-case complexity of a lookup. Does anyone know of a formula, table, or article somewhere that discusses how changing α changes the expected worst-case runtime?
For fixed α, the expected worst time is always Θ(log n / log log n). However, if you make α a function of n, then the expected worst time can change. For instance, if α = Θ(n) (the case where you have a fixed number of hash buckets), then the expected worst time is Θ(n).
In general the distribution of items into buckets is approximately a Poisson distribution; the probability that a random bucket has i items is α^i e^(-α) / i!. The worst case is the m'th worst (i.e. the maximum) of m close-to-independent observations. (Not entirely independent, but fairly close to it.) The m'th worst out of m observations tends to be something whose probability of happening is about 1/m. (More precisely, the distribution is given by a Beta distribution, but for our analysis 1/m is good enough.)
As you head into the tail of the Poisson distribution the growth of the i! term dominates everything else, so the cumulative probability of everything above a given i is smaller than the probability of selecting i itself. So to a good approximation you can figure out the expected value by solving for:
α^i e^(-α) / i! = 1/m = 1/(n/α) = α/n
Take logs of both sides and we get:
i log(α) - α - (i log(i) - i + O(log(i))) = log(α) - log(n)
log(n) - α = i log(i) - i - i log(α) + O(log(i))
If we hold α constant then this is:
log(n) = i log(i) + O(i)
Can this work if i has the form k log(n) / log(log(n)) with k = Θ(1)? Let's try it:
log(n) = (k log(n) / log(log(n))) (log(k) + log(log(n)) - log(log(log(n)))) + O(log(n) / log(log(n)))
= k log(n) + o(log(n))
This works out with k = 1 + o(1), and so we get the sharper estimate that, for any fixed load factor α, the expected worst time is (1 + o(1)) log(n) / log(log(n)).
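As a quick empirical illustration of this estimate (purely a sketch, assuming a uniformly random hash into m = n buckets, i.e. α = 1, modeled here with random.randrange), a small Python simulation compares the longest chain to log n / log log n:

import math
import random

def max_chain_length(n, m):
    # Throw n keys into m buckets uniformly at random; return the longest chain.
    buckets = [0] * m
    for _ in range(n):
        buckets[random.randrange(m)] += 1
    return max(buckets)

for n in (10**3, 10**4, 10**5, 10**6):
    observed = max_chain_length(n, n)                 # load factor alpha = 1
    predicted = math.log(n) / math.log(math.log(n))
    print(f"n={n:>8}  longest chain={observed}  log n / log log n={predicted:.2f}")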
After some searching, I came across this research paper that gives a complete analysis of the expected worst-case behavior of a whole bunch of different types of hash tables, including chained hash tables. The author shows that the expected length of the longest chain is approximately Γ⁻¹(m), where m is the number of buckets and Γ is the Gamma function. Assuming that α is a constant, this is approximately ln m / ln ln m.
Hope this helps!

Asymptotic complexity constant, why the constant?

Big-O notation says that f(n) is O(g(n)) if f(n) is bounded by c·g(n) for some constant c.
I have always wondered, and never really understood, why we need this arbitrary constant multiplying the bounding function g(n) to get our bounds.
Also, how does one decide what number this constant should be?
The constant itself doesn't characterize the limiting behavior of f(n) compared to g(n).
It is used in the mathematical definition, which requires the existence of a constant M such that
|f(x)| ≤ M |g(x)| for all x greater than some x0.
If such a constant exists then you can state that f(x) is O(g(x)). This is the usual notation when analyzing algorithms: you just don't care about the particular constant, only about the growth of the number of operations itself. The constant makes that inequality hold by ensuring that M|g(x)| is an upper bound of |f(x)|.
How to find that constant depends on f(x) and g(x); it is exactly what must be proved to show that f(x) is O(g(x)), so there is no general rule. Look at this example.
Consider function
f(n) = 4 * n
Doesn't it make sense to call this function O(n), since it grows "as fast" as g(n) = n?
But without the constant in the definition of O, you can't find an n0 such that for all n > n0, f(n) <= n. That's why you need the constant, and indeed from the condition
4 * n <= c * n for all n > n0
you can take n0 = 0 and c = 4.
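As a tiny sketch of the same point (using the constants from the example above), one can check the two bounds numerically:

# f(n) = 4n is O(n) once the constant is allowed: f(n) <= c*g(n) with c = 4, n0 = 0.
# Without the constant, f(n) <= g(n) already fails at n = 1.
def f(n):
    return 4 * n

def g(n):
    return n

c, n0 = 4, 0
assert all(f(n) <= c * g(n) for n in range(n0 + 1, 1000))   # holds with the constant
assert not all(f(n) <= g(n) for n in range(1, 1000))        # fails without it
print("f(n) = 4n is O(n) with c = 4, n0 = 0")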

Pohlig–Hellman algorithm for computing discrete logarithms

I'm working on coding the Pohlig-Hellman algorithm, but I am having problems understanding the steps based on the definition of the algorithm.
Going by the Wikipedia article on the algorithm:
I know the first part 1) is to calculate the prime factorization of p-1, which is fine.
However, I am not sure what I need to do in step 2), where you calculate the coefficients:
Let x2 = c0 + c1(2).
125^(180/2) = 125^90 ≡ 1 (mod 181), so c0 = 0.
125^(180/4) = 125^45 ≡ 1 (mod 181), so c1 = 0.
Thus, x2 = 0 + 0 = 0.
and 3), where you put the coefficients together and solve using the Chinese remainder theorem.
Can someone help by explaining this in plain English or pseudocode? I want to code the solution myself, obviously, but I cannot make any more progress unless I understand the algorithm.
Note: I have done a lot of searching for this, and I read S. Pohlig and M. Hellman (1978), "An Improved Algorithm for Computing Logarithms over GF(p) and its Cryptographic Significance", but it's still not really making sense to me.
Thanks in advance
Update:
How come q (125) stays constant in this example,
whereas in this example it appears that he is calculating a new q each time?
To be more specific I don't understand how the following is computed:
Now divide 7531 by a^c0 to get
7531 · a^(-2) ≡ 6735 (mod p).
Let's start with the main idea behind Pohlig-Hellman. Assume that we are given y, g and p and that we want to find x, such that
y == g^x (mod p).
(I'm using == to denote an equivalence relation.) To simplify things, I'm also assuming that the order of g is p-1, i.e. the smallest positive k with 1 == g^k (mod p) is k = p-1.
An inefficient method to find x, would be to simply try all values in the range 1 .. p-1.
Somewhat better is the "Baby-step giant-step" method that requires O(p^0.5) arithmetic operations. Both methods are quite slow for large p. Pohlig-Hellman is a significant improvement when p-1 has many factors. I.e. assume that
p-1 = n r
Then what Pohlig and Hellman propose is to solve the equation
y^n == (g^n)^z (mod p).
If we take logarithms to the base g on both sides, this is the same as
n·log_g(y) == log_g(y^n) == n·z (mod p-1).
n can be divided out, giving
log_g(y) == z (mod r).
Hence x == z (mod r).
This is an improvement, since we only have to search a range 0 .. r-1 for a solution of z. And again "Baby-step giant-step" can be used to improve the search for z. Obviously, doing this once is not a complete solution yet. I.e. one has to repeat the algorithm above for every prime factor r of p-1 and then to use the Chinese remainder theorem to find x from the partial solutions. This works nicely if p-1 is square free.
If p-1 is divisible by a prime power then a similar idea can be used. For example, let's assume that p-1 = m·q^k.
In the first step, we compute z such that x == z (mod q) as shown above. Next we want to extend this to a solution x == z' (mod q^2). E.g. if p-1 = m·q^2 then this means that we have to find z' such that
y^m == (g^m)^z' (mod p).
Since we already know that z' == z (mod q), z' must be in the set {z, z+q, z+2q, ..., z+(q-1)q}. Again we could either do an exhaustive search for z' or improve the search with "baby-step giant-step". This step is repeated for every power of q; that is, from knowing x mod q^i we iteratively derive x mod q^(i+1).
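To make these steps concrete, here is a short Python sketch of the idea. It is only an illustration, not an optimized implementation: it brute-forces each base-q digit (baby-step giant-step would be faster) and assumes p is prime and g is a generator of order p-1. The demo values at the bottom reuse p = 181 and y = 125 from the question, but g = 2 is chosen here just for demonstration; the question doesn't say which generator its example uses.

def dlog_prime_power(y, g, p, q, e):
    # Find x mod q^e with g^x == y (mod p), assuming g has order p-1.
    n = p - 1
    gamma = pow(g, n // q, p)           # element of order q
    x = 0
    for i in range(e):
        t = (y * pow(g, -x, p)) % p     # strip off the digits found so far
        h = pow(t, n // q**(i + 1), p)  # equals gamma^(next digit)
        # Brute-force the digit (replace with baby-step giant-step for large q).
        d = next(d for d in range(q) if pow(gamma, d, p) == h)
        x += d * q**i
    return x

def crt(residues, moduli):
    # Combine x == r (mod m) constraints via the Chinese remainder theorem.
    x, m = 0, 1
    for r, mod in zip(residues, moduli):
        k = ((r - x) * pow(m, -1, mod)) % mod
        x += m * k
        m *= mod
    return x % m

def pohlig_hellman(y, g, p, factorization):
    # factorization: list of (q, e) pairs with p - 1 == product of q**e.
    residues = [dlog_prime_power(y, g, p, q, e) for q, e in factorization]
    moduli = [q**e for q, e in factorization]
    return crt(residues, moduli)

# Demo: p = 181, p - 1 = 180 = 2^2 * 3^2 * 5, y = 125 (from the question);
# g = 2 is an assumed generator, used here only for illustration.
p, g, y = 181, 2, 125
x = pohlig_hellman(y, g, p, [(2, 2), (3, 2), (5, 1)])
print(x, pow(g, x, p) == y)

The sketch relies on Python 3.8+ for pow(base, -exponent, mod); on older versions, use pow(pow(g, x, p), p - 2, p) for the modular inverse instead.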
I'm coding it up myself right now (in Java). I'm using Pollard's rho to find the small prime factors of p-1, and then using Pohlig-Hellman to solve for a DSA private key, y = g^x. I am having the same problem.
UPDATE: "To be more specific I don't understand how the following is computed: Now divide 7531 by a^c0 to get 7531 · a^(-2) ≡ 6735 (mod p)."
If you compute the modular inverse (modInverse) of a^c0, it will make sense.
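For example, in Python (a sketch only: the actual a, c0, p, and y come from the linked example and aren't given here, so the values below are placeholders; modInverse is the equivalent method on Java's BigInteger):

# "Dividing" y by a^c0 modulo p means multiplying by the modular inverse of a^c0.
p, a, c0, y = 181, 2, 2, 125          # placeholder values, for illustration only
inv = pow(a, -c0, p)                  # modular inverse of a^c0 (Python 3.8+)
# equivalently: inv = pow(pow(a, c0, p), p - 2, p)   (Fermat's little theorem)
print((y * inv) % p)                  # this is "y / a^c0" mod p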
Regards
