Several questions about this SML recursion function

When f(x-1) is called, is it calling f(x) = x+10 or f(x) = if ...
Is this a tail recursion?
How should I rewrite it using static / dynamic allocation?
let fun f(x) = x + 10
in
let fun f(x) = if x < 1 then 0 else f(x-1)
in f(3)
end
end

Before addressing your questions, here are some observations about your code:
There are two functions f, one inside the other. They're different from one another.
To lessen this confusion you can rename the inner function to g:
let fun f(x) = x + 10
in
let fun g(x) = if x < 1 then 0 else g(x-1)
in g(3)
end
end
This clears up which function calls which, by the following rules: the outer f is in scope inside the outer in-end, but is immediately shadowed by the inner f. Any reference to f on the right-hand side of the inner fun f(x) = if ... refers to the inner f, because fun enables self-recursion. And any reference to f within the inner in-end also refers to the inner f.
In the following tangential example, which uses val rather than fun for the inner declaration, the f on the right-hand side of the inner declaration refers to the outer f, not to the f being defined:
let fun f(x) = if (x mod 2 = 0) then x - 10 else x + 10
in
let val f = fn x => f(x + 2) * 2
in f(3)
end
end
If the inner f is renamed to g in this second piece of code, it'd look like:
let fun f(x) = if (x mod 2 = 0) then x - 10 else x + 10
in
let val g = fn x => f(x + 2) * 2
in g(3)
end
end
The important bit is that the f(x + 2) part was not rewritten into g(x + 2) because val means that references to f are outer fs, not the f being defined, because a val is not a self-recursive definition. So any reference to an f within that definition would have to depend on it being available in the outer scope.
But the g(3) bit is rewritten because between in-end, the inner f (now g) is shadowing. So whether it's a fun or a val does not matter with respect to the shadowing of let-in-end.
(There are some more details wrt. val rec and the exact scope of a let val f = ... that I haven't elaborated on.)
As for your questions,
You should be able to answer this now. A nice way to provide the answer is 1) rename the inner function for clarity, 2) evaluate the code by hand using substitution (one rewrite per line, ~> denoting a rewrite, so I don't mean an SML operator here).
Here's an example of how it'd look with my second example (not your code):
g(3)
~> (fn x => f(x + 2) * 2)(3)
~> f(3 + 2) * 2
~> f(5) * 2
~> (if (5 mod 2 = 0) then 5 - 10 else 5 + 10) * 2
~> (if (1 = 0) then 5 - 10 else 5 + 10) * 2
~> (5 + 10) * 2
~> 15 * 2
~> 30
Your evaluation by hand would look different and possibly conclude differently.
What is tail recursion? Provide a definition and ask if your code satisfies that definition.
I'm not sure what you mean by rewriting it using static / dynamic allocation. You'll have to elaborate.
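If you want to check a hand evaluation like the one above mechanically, here is a minimal sketch in Python of my second example (not your code). Python's scoping rules differ from SML's, so this uses the renamed version where the inner function is already called g; it only confirms the arithmetic, not the shadowing rules.

```python
# Hypothetical Python rendering of the second example, after renaming the inner f to g.
def f(x):
    # Outer f: subtract 10 from even inputs, add 10 to odd ones.
    return x - 10 if x % 2 == 0 else x + 10

def g(x):
    # Inner function (renamed from f to g); its body calls the outer f.
    return f(x + 2) * 2

result = g(3)
print(result)  # 30, matching the last line of the hand evaluation
```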

passing a function to a function SML

Below is SML code to compute a definite integral using the trapezoidal method given input f=unary function, a & b=range to take integral under, and n=number of sub-intervals to divide the range into.
fun integrate f a b n =
let val w = (b - a) / (real n)
fun genBlock c = let val BB = f c
val SB = f (c+w)
in (BB + SB) * w / 2.0
end
fun sumSlice 0 c acc = acc
| sumSlice n c acc = sumSlice (n-1) (c+w) (acc + (genBlock c))
in sumSlice n a 0.0
end
The problem is that I can't figure out for the life of me how to define a function (say, x cubed) and feed it to this function along with a, b, and n. Here's a screenshot of me trying and receiving an error:
In this picture I define cube x = x*x*x and show it works, then try to feed it to the integrate function to no avail.
The error message is pretty specific: integrate is expecting a function of type real -> real but you defined a function, cube, of type int -> int.
There are a couple of things you can do:
1) Add a type annotation to the definition of cube:
- fun cube x:real = x*x*x;
val cube = fn : real -> real
And then:
- integrate cube 0.0 5.0 5;
val it = 162.5 : real
2) You can dispense with defining cube as a named function and just pass the computation as an anonymous function. In this case, SML's type inference mechanism gives the function fn x => x*x*x the intended type:
- integrate (fn x => x*x*x) 0.0 5.0 5;
val it = 162.5 : real
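For readers who want to cross-check the 162.5 result outside SML, here is a rough Python port of the same trapezoidal scheme. The names gen_block and cube mirror the SML code; the loop replaces the sumSlice recursion, which is my own choice and not part of the original.

```python
def integrate(f, a, b, n):
    """Trapezoidal rule: approximate the integral of f over [a, b] with n slices."""
    w = (b - a) / n  # width of each slice

    def gen_block(c):
        # Area of the trapezoid between c and c + w.
        return (f(c) + f(c + w)) * w / 2.0

    acc = 0.0
    c = a
    for _ in range(n):
        acc += gen_block(c)
        c += w
    return acc

def cube(x):
    return x * x * x

print(integrate(cube, 0.0, 5.0, 5))  # 162.5
```

Python has no int/real split of the kind that caused the SML type error, so the same cube works for both integers and floats here.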

Memoization in OCaml?

It is possible to improve the "raw" recursive Fibonacci procedure
Fib[n_] := If[n < 2, n, Fib[n - 1] + Fib[n - 2]]
with
Fib[n_] := Fib[n] = If[n < 2, n, Fib[n - 1] + Fib[n - 2]]
in Wolfram Mathematica.
The first version will suffer from exponential explosion while the second one will not, since Mathematica will see repeated function calls in the expression and memoize (reuse) them.
Is it possible to do the same in OCaml?
How to improve
let rec fib n = if n<2 then n else fib (n-1) + fib (n-2);;
in the same manner?
The solution provided by rgrinberg can be generalized so that we can memoize any function. I am going to use associative lists instead of hashtables. But it does not really matter, you can easily convert all my examples to use hashtables.
First, here is a function memo which takes another function and returns its memoized version. It is what nlucaroni suggested in one of the comments:
let memo f =
let m = ref [] in
fun x ->
try
List.assoc x !m
with
Not_found ->
let y = f x in
m := (x, y) :: !m ;
y
The function memo f keeps a list m of results computed so far. When asked to compute f x it first checks m to see if f x has been computed already. If yes, it returns the result, otherwise it actually computes f x, stores the result in m, and returns it.
There is a problem with the above memo in case f is recursive. Once memo calls f to compute f x, any recursive calls made by f will not be intercepted by memo. To solve this problem we need to do two things:
In the definition of such a recursive f we need to substitute recursive calls with calls to a function "to be provided later" (this will be the memoized version of f).
In memo f we need to provide f with the promised "function which you should call when you want to make a recursive call".
This leads to the following solution:
let memo_rec f =
let m = ref [] in
let rec g x =
try
List.assoc x !m
with
Not_found ->
let y = f g x in
m := (x, y) :: !m ;
y
in
g
To demonstrate how this works, let us memoize the naive Fibonacci function. We need to write it so that it accepts an extra argument, which I will call self. This argument is what the function should use instead of recursively calling itself:
let fib self = function
0 -> 1
| 1 -> 1
| n -> self (n - 1) + self (n - 2)
Now to get the memoized fib, we compute
let fib_memoized = memo_rec fib
You are welcome to try it out to see that fib_memoized 50 returns instantly. (This is not so for memo f where f is the usual naive recursive definition.)
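The same open-recursion idea can be sketched in Python, in case it helps to see it outside OCaml. This is only an illustration: it uses a dict instead of an association list, and the calls counter is a hypothetical addition of mine to show that each argument is computed once.

```python
def memo_rec(f):
    """Memoize an 'open-recursive' function f(self, x)."""
    cache = {}  # dict instead of the association list used above

    def g(x):
        if x not in cache:
            cache[x] = f(g, x)  # recursive calls inside f come back through g
        return cache[x]

    return g

calls = [0]  # counts how often the body of fib actually runs

def fib(self, n):
    calls[0] += 1
    if n < 2:
        return 1  # same base cases as the OCaml fib above: fib 0 = fib 1 = 1
    return self(n - 1) + self(n - 2)

fib_memoized = memo_rec(fib)
print(fib_memoized(50))  # 20365011074
print(calls[0])          # 51: one evaluation for each of 0..50
```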
You pretty much do what the Mathematica version does but manually:
let rec fib =
let cache = Hashtbl.create 10 in
begin fun n ->
try Hashtbl.find cache n
with Not_found -> begin
if n < 2 then n
else
let f = fib (n-1) + fib (n-2) in
Hashtbl.add cache n f; f
end
end
Here I choose a hashtable to store already computed results instead of recomputing them.
Note that you should still beware of integer overflow since we are using a normal int and not a big int.

How do variables in pattern matching allow parameter omission?

I'm doing some homework but I've been stuck for hours on something.
I'm sure it's really trivial but I still can't wrap my head around it after digging through all the documentation available.
Can anybody give me a hand?
Basically, the exercise in OCaml programming asks to define the function x^n with the exponentiation by squaring algorithm.
I've looked at the solution:
let rec exp x = function
0 -> 1
| n when n mod 2 = 0 -> let y = exp x (n/2) in y*y
| n when n mod 2 <> 0 -> let y = exp x ((n-1)/2) in y*y*x
;;
What I don't understand in particular is how the parameter n can be omitted from the fun statement and why should it be used as a variable for a match with x, which has no apparent link with the definition of exponentiation by squaring.
Here's how I would do it:
let rec exp x n = match n with
0 -> 1
| n when (n mod 2) = 1 -> (exp x ((n-1)/2)) * (exp x ((n-1)/2)) * x
| n when (n mod 2) = 0 -> (exp x (n/2)) * (exp x (n/2))
;;
Your version is syntactically correct and yields the right answer, but is slow to execute.
In your code, exp is called recursively twice, thus yielding twice as much computation, each call itself yielding twice as much computation, and so on down to n=0. In the solution, exp is called only once, the result is stored in the variable y, and then y is squared.
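To make the cost difference concrete, here is a small Python sketch of both variants with a call counter. The names exp_fast and exp_slow and the counter argument are mine, added only for illustration.

```python
def exp_fast(x, n, calls):
    """Exponentiation by squaring with one recursive call per step."""
    calls[0] += 1
    if n == 0:
        return 1
    y = exp_fast(x, n // 2, calls)  # computed once, then reused
    return y * y if n % 2 == 0 else y * y * x

def exp_slow(x, n, calls):
    """Same result, but the half power is recomputed by a second call."""
    calls[0] += 1
    if n == 0:
        return 1
    r = exp_slow(x, n // 2, calls) * exp_slow(x, n // 2, calls)
    return r if n % 2 == 0 else r * x

fast_calls, slow_calls = [0], [0]
print(exp_fast(2, 10, fast_calls), fast_calls[0])  # 1024 5
print(exp_slow(2, 10, slow_calls), slow_calls[0])  # 1024 31
```

The single-call version makes a number of calls proportional to log n, while the double-call version makes a number proportional to n.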
Now, about the syntax,
let f n = match n with
| 0 -> 0
| foo -> foo-1
is equivalent to:
let f = function
| 0 -> 0
| foo -> foo-1
The line let rec exp x = function is the beginning of a function that takes two arguments: x, and an unnamed argument used in the pattern matching. In the pattern matching, the line
| n when n mod 2 = 0 ->
names this argument n. Note that a different name could be used in each case of the pattern matching (even if that would be less clear):
| n when n mod 2 = 0 -> let y = exp x (n/2) in y*y
| p when p mod 2 <> 0 -> let y = exp x ((p-1)/2) in y*y*x
The keyword "function" is not syntactic sugar for
match x with
but for
fun x -> match x with
thus
let rec exp x = function
could be replaced by
let rec exp x = fun y -> match y with
which is of course equivalent with your solution
let rec exp x y = match y with
Note that I wrote "y" and not "n" to avoid confusion. The n variable introduced after the match is a new variable, which is related to the function parameter only because it is matched against it. For instance, instead of
let y = x in ...
you could write :
match x with y -> ...
In this match expression, the "y" expression is the "pattern" matched. Like any pattern, it binds its variables (here y) to the value matched (here the value of x). And like any pattern, the variables in the pattern are new variables, which may shadow previously defined variables. In your code:
let rec exp x n = match n with
0 -> 1
| n when (n mod 2) = 1 -> (exp x ((n-1)/2)) * (exp x ((n-1)/2)) * x
| n when (n mod 2) = 0 -> (exp x (n/2)) * (exp x (n/2))
;;
the variable n in the two cases shadows the parameter n. This isn't a problem, though, since the two variables with the same name have the same value.

No idea how to solve SICP exercise 1.11

Exercise 1.11:
A function f is defined by the rule that f(n) = n if n < 3 and f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3) if n > 3. Write a procedure that computes f by means of a recursive process. Write a procedure that computes f by means of an iterative process.
Implementing it recursively is simple enough. But I couldn't figure out how to do it iteratively. I tried comparing with the Fibonacci example given, but I didn't know how to use it as an analogy. So I gave up (shame on me) and Googled for an explanation, and I found this:
(define (f n)
(if (< n 3)
n
(f-iter 2 1 0 n)))
(define (f-iter a b c count)
(if (< count 3)
a
(f-iter (+ a (* 2 b) (* 3 c))
a
b
(- count 1))))
After reading it, I understand the code and how it works. But what I don't understand is the process needed to get from the recursive definition of the function to this. I don't get how the code could have formed in someone's head.
Could you explain the thought process needed to arrive at the solution?
You need to capture the state in some accumulators and update the state at each iteration.
If you have experience in an imperative language, imagine writing a while loop and tracking information in variables during each iteration of the loop. What variables would you need? How would you update them? That's exactly what you have to do to make an iterative (tail-recursive) set of calls in Scheme.
In other words, it might help to start thinking of this as a while loop instead of a recursive definition. Eventually you'll be fluent enough with recursive -> iterative transformations that you won't need the extra help to get started.
For this particular example, you have to look closely at the three function calls, because it's not immediately clear how to represent them. However, here's the likely thought process: (in Python pseudo-code to emphasise the imperativeness)
Each recursive step keeps track of three things:
f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3)
So I need three pieces of state to track the current, the last and the penultimate values of f. (that is, f(n-1), f(n-2) and f(n-3).) Call them a, b, c. I have to update these pieces inside each loop:
for _ in range(2, n + 1):
    # tuple assignment, so b and c receive the old a and b
    a, b, c = NEWVALUE, a, b
return a
So what's NEWVALUE? Well, now that we have representations of f(n-1), f(n-2) and f(n-3), it's just the recursive equation:
for _ in range(2, n + 1):
    # tuple assignment, so b and c receive the old a and b
    a, b, c = a + 2 * b + 3 * c, a, b
return a
Now all that's left is to figure out the initial values of a, b and c. But that's easy, since we know that f(n) = n if n < 3.
def f(n):
    if n < 3:
        return n
    a = 2  # f(n-1) where n = 3
    b = 1  # f(n-2)
    c = 0  # f(n-3)
    # now start off counting at 3
    for _ in range(3, n + 1):
        # tuple assignment, so b and c receive the old a and b
        a, b, c = a + 2 * b + 3 * c, a, b
    return a
That's still a little different from the Scheme iterative version, but I hope you can see the thought process now.
I think you are asking how one might discover the algorithm naturally, outside of a 'design pattern'.
It was helpful for me to look at the expansion of the f(n) at each n value:
f(0) = 0 |
f(1) = 1 | all known values
f(2) = 2 |
f(3) = f(2) + 2f(1) + 3f(0)
f(4) = f(3) + 2f(2) + 3f(1)
f(5) = f(4) + 2f(3) + 3f(2)
f(6) = f(5) + 2f(4) + 3f(3)
Looking closer at f(3), we see that we can calculate it immediately from the known values.
What do we need to calculate f(4)?
We need to at least calculate f(3) + [the rest]. But as we calculate f(3), we calculate f(2) and f(1) as well, which we happen to need for calculating [the rest] of f(4).
f(3) = f(2) + 2f(1) + 3f(0)
↘ ↘
f(4) = f(3) + 2f(2) + 3f(1)
So, for any number n, I can start by calculating f(3), and reuse the values I use to calculate f(3) to calculate f(4)...and the pattern continues...
f(3) = f(2) + 2f(1) + 3f(0)
↘ ↘
f(4) = f(3) + 2f(2) + 3f(1)
↘ ↘
f(5) = f(4) + 2f(3) + 3f(2)
Since we will reuse them, let's give them the names a, b, c, subscripted with the step we are on, and walk through a calculation of f(5):
Step 1: f(3) = f(2) + 2f(1) + 3f(0) or f(3) = a1 + 2b1 +3c1
where
a1 = f(2) = 2,
b1 = f(1) = 1,
c1 = 0
since f(n) = n for n < 3.
Thus:
f(3) = a1 + 2b1 + 3c1 = 4
Step 2: f(4) = f(3) + 2a1 + 3b1
So:
a2 = f(3) = 4 (calculated above in step 1),
b2 = a1 = f(2) = 2,
c2 = b1 = f(1) = 1
Thus:
f(4) = 4 + 2*2 + 3*1 = 11
Step 3: f(5) = f(4) + 2a2 + 3b2
So:
a3 = f(4) = 11 (calculated above in step 2),
b3 = a2 = f(3) = 4,
c3 = b2 = f(2) = 2
Thus:
f(5) = 11 + 2*4 + 3*2 = 25
Throughout the above calculation we capture state in the previous calculation and pass it to the next step, particularly:
a_step = result of (step - 1)
b_step = a_(step - 1)
c_step = b_(step - 1)
Once I saw this, then coming up with the iterative version was straightforward.
Since the post you linked to describes a lot about the solution, I'll try to only give complementary information.
You're trying to define a tail-recursive function in Scheme here, given a (non-tail) recursive definition.
The base case of the recursion (f(n) = n if n < 3) is handled by both functions. I'm not really sure why the author does this; the first function could simply be:
(define (f n)
(f-iter 2 1 0 n))
The general form would be:
(define (f-iter ... n)
(if (base-case? n)
base-result
(f-iter ...)))
Note I didn't fill in parameters for f-iter yet, because you first need to understand what state needs to be passed from one iteration to another.
Now, let's look at the dependencies of the recursive form of f(n). It references f(n - 1), f(n - 2), and f(n - 3), so we need to keep around these values. And of course we need the value of n itself, so we can stop iterating over it.
So that's how you come up with the tail-recursive call: we compute f(n) to use as f(n - 1), rotate f(n - 1) to f(n - 2) and f(n - 2) to f(n - 3), and decrement count.
If this still doesn't help, please try to ask a more specific question — it's really hard to answer when you write "I don't understand" given a relatively thorough explanation already.
I'm going to come at this in a slightly different approach to the other answers here, focused on how coding style can make the thought process behind an algorithm like this easier to comprehend.
The trouble with Bill's approach, quoted in your question, is that it's not immediately clear what meaning is conveyed by the state variables, a, b, and c. Their names convey no information, and Bill's post does not describe any invariant or other rule that they obey. I find it easier both to formulate and to understand iterative algorithms if the state variables obey some documented rules describing their relationships to each other.
With this in mind, consider this alternative formulation of the exact same algorithm, which differs from Bill's only in having more meaningful variable names for a, b and c and an incrementing counter variable instead of a decrementing one:
(define (f n)
(if (< n 3)
n
(f-iter n 2 0 1 2)))
(define (f-iter n
i
f-of-i-minus-2
f-of-i-minus-1
f-of-i)
(if (= i n)
f-of-i
(f-iter n
(+ i 1)
f-of-i-minus-1
f-of-i
(+ f-of-i
(* 2 f-of-i-minus-1)
(* 3 f-of-i-minus-2)))))
Suddenly the correctness of the algorithm - and the thought process behind its creation - is simple to see and describe. To calculate f(n):
We have a counter variable i that starts at 2 and climbs to n, incrementing by 1 on each call to f-iter.
At each step along the way, we keep track of f(i), f(i-1) and f(i-2), which is sufficient to allow us to calculate f(i+1).
Once i=n, we are done.
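As a sanity check on those three rules, here is a sketch in Python that runs the same loop and asserts the invariant at every step against a direct recursive transcription. The names f_recursive and f_iterative are mine, and the recursive reference assumes the intended rule with n >= 3 in the second clause.

```python
def f_recursive(n):
    # Direct transcription of the recursive rule, used only as a reference.
    if n < 3:
        return n
    return f_recursive(n - 1) + 2 * f_recursive(n - 2) + 3 * f_recursive(n - 3)

def f_iterative(n):
    if n < 3:
        return n
    i, f_im2, f_im1, f_i = 2, 0, 1, 2
    while i != n:
        # Invariant: the three state variables hold f(i-2), f(i-1), f(i).
        assert (f_im2, f_im1, f_i) == (
            f_recursive(i - 2), f_recursive(i - 1), f_recursive(i))
        i, f_im2, f_im1, f_i = i + 1, f_im1, f_i, f_i + 2 * f_im1 + 3 * f_im2
    return f_i

print([f_iterative(n) for n in range(8)])  # [0, 1, 2, 4, 11, 25, 59, 142]
```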
What did help me was running the process manually with a pencil and using the hint the author gave for the Fibonacci example
a <- a + b
b <- a
Translating this to the new problem shows how you push state forward in the process
a <- a + (b * 2) + (c * 3)
b <- a
c <- b
So you need a function with an interface to accept 3 variables: a, b, c. And it needs to call itself using process above.
(define (f-iter a b c)
(f-iter (+ a (* b 2) (* c 3)) a b))
If you run and print each variable for each iteration starting with (f-iter 1 0 0), you'll get something like this (it will run forever of course):
a b c
=========
1 0 0
1 1 0
3 1 1
8 3 1
17 8 3
42 17 8
100 42 17
235 100 42
...
Can you see the answer? You get it by summing columns b and c for each iteration. I must admit I found it by doing some trial and error. The only thing left is having a counter to know when to stop. Here is the whole thing:
(define (f n)
(f-iter 1 0 0 n))
(define (f-iter a b c count)
(if (= count 0)
(+ b c)
(f-iter (+ a (* b 2) (* c 3)) a b (- count 1))))
A function f is defined by the rule that f(n) = n, if n<3 and f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3), if n > 3. Write a procedure that computes f by means of a recursive process.
It is already written:
f(n) = n, (* if *) n < 3
= f(n - 1) + 2f(n - 2) + 3f(n - 3), (* if *) n > 3
Believe it or not, there was once such a language. To write this down in another language is just a matter of syntax. And by the way, the definition as you (mis)quote it has a bug, which is now very apparent and clear.
Write a procedure that computes f by means of an iterative process.
Iteration means going forward (there's your explanation!) as opposed to the recursion's going backwards at first, to the very lowest level, and then going forward calculating the result on the way back up:
f(0) = 0
f(1) = 1
f(2) = 2
f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3)
= a + 2b + 3c
f(n+1) = f(n ) + 2f(n - 1) + 3f(n - 2)
= a' + 2b' + 3c' where
a' = f(n) = a+2b+3c,
b' = f(n-1) = a,
c' = f(n-2) = b
......
This thus describes the problem's state transitions as
(n, a, b, c) -> (n+1, a+2*b+3*c, a, b)
We could code it as
g (n, a, b, c) = g (n+1, a+2*b+3*c, a, b)
but of course it wouldn't ever stop. So we must instead have
f n = g (2, 2, 1, 0)
where
g (k, a, b, c) = g (k+1, a+2*b+3*c, a, b), (* if *) k < n
g (k, a, b, c) = a, otherwise
and this is already exactly like the code you asked about, up to syntax.
Counting up to n is more natural here, following our paradigm of "going forward", but counting down to 0 as the code you quote does is of course entirely equivalent.
The corner cases and possible off-by-one errors are left as an exercise, being non-interesting technicalities.
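For what it's worth, here is one way the state transition could look in Python with the n < 3 corner case filled in. The helper name step and the explicit while loop are my own choices; this is a sketch of the idea, not the only way to handle the corner cases.

```python
def step(state):
    # One state transition: (k, a, b, c) -> (k+1, a+2*b+3*c, a, b)
    k, a, b, c = state
    return (k + 1, a + 2 * b + 3 * c, a, b)

def f(n):
    if n < 3:             # corner case that the sketch above leaves out
        return n
    state = (2, 2, 1, 0)  # k = 2, a = f(2), b = f(1), c = f(0)
    while state[0] < n:   # going forward until k reaches n
        state = step(state)
    return state[1]       # a holds f(k)

print(f(5))  # 25
```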

What is a Y-combinator?

A Y-combinator is a computer science concept from the “functional” side of things. Most programmers don't know much at all about combinators, if they've even heard about them.
What is a Y-combinator?
How do combinators work?
What are they good for?
Are they useful in procedural languages?
A Y-combinator is a "functional" (a function that operates on other functions) that enables recursion, when you can't refer to the function from within itself. In computer-science theory, it generalizes recursion, abstracting its implementation, and thereby separating it from the actual work of the function in question. The benefit of not needing a compile-time name for the recursive function is sort of a bonus. =)
This is applicable in languages that support lambda functions. The expression-based nature of lambdas usually means that they cannot refer to themselves by name. And working around this by way of declaring the variable, referring to it, then assigning the lambda to it, to complete the self-reference loop, is brittle. The lambda variable can be copied, and the original variable re-assigned, which breaks the self-reference.
Y-combinators are cumbersome to implement, and often to use, in statically typed languages (which procedural languages often are), because usually typing restrictions require the number of arguments for the function in question to be known at compile time. This means that a Y-combinator must be written for any argument count that one needs to use.
Below is an example of the usage and workings of a Y-combinator, in C#.
Using a Y-combinator involves an "unusual" way of constructing a recursive function. First you must write your function as a piece of code that calls a pre-existing function, rather than itself:
// Factorial, if func does the same thing as this bit of code...
x == 0 ? 1: x * func(x - 1);
Then you turn that into a function that takes a function to call, and returns a function that does so. This is called a functional, because it takes one function, and performs an operation with it that results in another function.
// A function that creates a factorial, but only if you pass in
// a function that does what the inner function is doing.
Func<Func<Double, Double>, Func<Double, Double>> fact =
(recurs) =>
(x) =>
x == 0 ? 1 : x * recurs(x - 1);
Now you have a function that takes a function, and returns another function that sort of looks like a factorial, but instead of calling itself, it calls the argument passed into the outer function. How do you make this the factorial? Pass the inner function to itself. The Y-Combinator does that, by being a function with a permanent name, which can introduce the recursion.
// One-argument Y-Combinator.
public static Func<T, TResult> Y<T, TResult>(Func<Func<T, TResult>, Func<T, TResult>> F)
{
return
t => // A function that...
F( // Calls the factorial creator, passing in...
Y(F) // The result of this same Y-combinator function call...
// (Here is where the recursion is introduced.)
)
(t); // And passes the argument into the work function.
}
Rather than the factorial calling itself, what happens is that the factorial calls the factorial generator (returned by the recursive call to Y-Combinator). And depending on the current value of t the function returned from the generator will either call the generator again, with t - 1, or just return 1, terminating the recursion.
It's complicated and cryptic, but it all shakes out at run-time, and the key to its working is "deferred execution", and the breaking up of the recursion to span two functions. The inner F is passed as an argument, to be called in the next iteration, only if necessary.
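If C# generics get in the way of seeing the shape, the same one-argument construction can be sketched in Python. This mirrors the Y and fact definitions above; it is an illustration under Python's dynamic typing, not a translation of any library API.

```python
def Y(F):
    # Returns a function that, when called, builds F(Y(F)) and applies it to t.
    # The lambda defers the recursive call to Y until an argument arrives.
    return lambda t: F(Y(F))(t)

# "Factorial creator": takes the function to recurse with, returns the worker.
fact = lambda recurs: lambda x: 1 if x == 0 else x * recurs(x - 1)

factorial = Y(fact)
print(factorial(5))  # 120
```

Without the deferring lambda, Y(F) would call Y(F) again immediately and never return, which is the "deferred execution" point made above.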
If you're ready for a long read, Mike Vanier has a great explanation. Long story short, it allows you to implement recursion in a language that doesn't necessarily support it natively.
I've lifted this from http://www.mail-archive.com/boston-pm#mail.pm.org/msg02716.html which is an explanation I wrote several years ago.
I'll use JavaScript in this example, but many other languages will work as well.
Our goal is to be able to write a recursive function of 1
variable using only functions of 1 variable and no
assignments, defining things by name, etc. (Why this is our
goal is another question, let's just take this as the
challenge that we're given.) Seems impossible, huh? As
an example, let's implement factorial.
Well step 1 is to say that we could do this easily if we
cheated a little. Using functions of 2 variables and
assignment we can at least avoid having to use
assignment to set up the recursion.
// Here's the function that we want to recurse.
X = function (recurse, n) {
if (0 == n)
return 1;
else
return n * recurse(recurse, n - 1);
};
// This will get X to recurse.
Y = function (builder, n) {
return builder(builder, n);
};
// Here it is in action.
Y(
X,
5
);
Now let's see if we can cheat less. Well firstly we're using
assignment, but we don't need to. We can just write X and
Y inline.
// No assignment this time.
// The outer parentheses make this a function expression that can be called in place.
(function (builder, n) {
  return builder(builder, n);
})(
  function (recurse, n) {
    if (0 == n)
      return 1;
    else
      return n * recurse(recurse, n - 1);
  },
  5
);
But we're using functions of 2 variables to get a function of 1
variable. Can we fix that? Well a smart guy by the name of
Haskell Curry has a neat trick, if you have good higher order
functions then you only need functions of 1 variable. The
proof is that you can get from functions of 2 (or more in the
general case) variables to 1 variable with a purely
mechanical text transformation like this:
// Original
F = function (i, j) {
...
};
F(i,j);
// Transformed
F = function (i) { return function (j) {
...
}};
F(i)(j);
where ... remains exactly the same. (This trick is called
"currying" after its inventor. The language Haskell is also
named for Haskell Curry. File that under useless trivia.)
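As a quick illustration of that mechanical transformation, here is a Python sketch. The helper name curry2 and the add example are hypothetical, chosen only to show that the two call styles agree.

```python
def curry2(f):
    """Turn a two-argument function into nested one-argument functions."""
    return lambda i: lambda j: f(i, j)

def add(i, j):
    return i + j

curried_add = curry2(add)
print(add(2, 3))          # 5, called as F(i, j)
print(curried_add(2)(3))  # 5, called as F(i)(j)
```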
Now just apply this transformation everywhere and we get
our final version.
// The dreaded Y-combinator in action!
(function (builder) { return function (n) {
  return builder(builder)(n);
}})(
  function (recurse) { return function (n) {
    if (0 == n)
      return 1;
    else
      return n * recurse(recurse)(n - 1);
  }})(
  5
);
Feel free to try it. alert() that return, tie it to a button, whatever.
That code calculates factorials, recursively, without using
assignment, declarations, or functions of 2 variables. (But
trying to trace how it works is likely to make your head spin.
And handing it, without the derivation, just slightly reformatted
will result in code that is sure to baffle and confuse.)
You can replace the 4 lines that recursively define factorial with
any other recursive function that you want.
I wonder if there's any use in attempting to build this from the ground up. Let's see. Here's a basic, recursive factorial function:
function factorial(n) {
return n == 0 ? 1 : n * factorial(n - 1);
}
Let's refactor and create a new function called fact that returns an anonymous factorial-computing function instead of performing the calculation itself:
function fact() {
return function(n) {
return n == 0 ? 1 : n * fact()(n - 1);
};
}
var factorial = fact();
That's a little weird, but there's nothing wrong with it. We're just generating a new factorial function at each step.
The recursion at this stage is still fairly explicit. The fact function needs to be aware of its own name. Let's parameterize the recursive call:
function fact(recurse) {
return function(n) {
return n == 0 ? 1 : n * recurse(n - 1);
};
}
function recurser(x) {
return fact(recurser)(x);
}
var factorial = fact(recurser);
That's great, but recurser still needs to know its own name. Let's parameterize that, too:
function recurser(f) {
return fact(function(x) {
return f(f)(x);
});
}
var factorial = recurser(recurser);
Now, instead of calling recurser(recurser) directly, let's create a wrapper function that returns its result:
function Y() {
return (function(f) {
return f(f);
})(recurser);
}
var factorial = Y();
We can now get rid of the recurser name altogether; it's just an argument to Y's inner function, which can be replaced with the function itself:
function Y() {
return (function(f) {
return f(f);
})(function(f) {
return fact(function(x) {
return f(f)(x);
});
});
}
var factorial = Y();
The only external name still referenced is fact, but it should be clear by now that that's easily parameterized, too, creating the complete, generic, solution:
function Y(le) {
return (function(f) {
return f(f);
})(function(f) {
return le(function(x) {
return f(f)(x);
});
});
}
var factorial = Y(function(recurse) {
return function(n) {
return n == 0 ? 1 : n * recurse(n - 1);
};
});
Most of the answers above describe what the Y-combinator is but not what it is for.
Fixed-point combinators are used to show that lambda calculus is Turing complete. This is a very important result in the theory of computation and provides a theoretical foundation for functional programming.
Studying fixed point combinators has also helped me really understand functional programming. I have never found any use for them in actual programming though.
For programmers who haven't encountered functional programming in depth, and don't care to start now, but are mildly curious:
The Y combinator is a formula which lets you implement recursion in a situation where functions can't have names but can be passed around as arguments, used as return values, and defined within other functions.
It works by passing the function to itself as an argument, so it can call itself.
It's part of the lambda calculus, which is really maths but is effectively a programming language, and is pretty fundamental to computer science and especially to functional programming.
The day to day practical value of the Y combinator is limited, since programming languages tend to let you name functions.
In case you need to identify it in a police lineup, it looks like this:
Y = λf.(λx.f (x x)) (λx.f (x x))
You can usually spot it because of the repeated (λx.f (x x)).
The λ symbols are the Greek letter lambda, which gives the lambda calculus its name, and there's a lot of (λx.t) style terms because that's what the lambda calculus looks like.
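If you try to type that formula directly into a strict language, the (x x) part is evaluated eagerly and never terminates. A common workaround is the eta-expanded variant usually called the Z combinator, which is a swapped-in technique rather than the formula above. Here is a minimal Python sketch of it, with factorial as the test case:

```python
# Z = λf.(λx.f (λv.x x v)) (λx.f (λv.x x v))
# The inner lambda v delays the self-application x(x) until it is needed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

factorial = Z(lambda recurse: lambda n: 1 if n == 0 else n * recurse(n - 1))
print(factorial(5))  # 120
```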
y-combinator in JavaScript:
var Y = function(f) {
return (function(g) {
return g(g);
})(function(h) {
return function() {
return f(h(h)).apply(null, arguments);
};
});
};
var factorial = Y(function(recurse) {
return function(x) {
return x == 0 ? 1 : x * recurse(x-1);
};
});
factorial(5) // -> 120
Edit:
I learn a lot from looking at code, but this one is a bit tough to swallow without some background - sorry about that. With some general knowledge presented by other answers, you can begin to pick apart what is happening.
The Y function is the "y-combinator". Now take a look at the var factorial line where Y is used. Notice you pass a function to it that has a parameter (in this example, recurse) that is also used later on in the inner function. The parameter name basically becomes the name of the inner function, allowing it to perform a recursive call (since it uses recurse() in its definition). The y-combinator performs the magic of associating the otherwise anonymous inner function with the parameter name of the function passed to Y.
For the full explanation of how Y does the magic, check out the linked article (not by me, btw).
Anonymous recursion
A fixed-point combinator is a higher-order function fix that by definition satisfies the equivalence
forall f. fix f = f (fix f)
fix f represents a solution x to the fixed-point equation
x = f x
The factorial of a natural number can be proved by
fact 0 = 1
fact n = n * fact (n - 1)
Using fix, arbitrary constructive proofs over general/μ-recursive functions can be derived without nonymous (that is, named) self-referentiality.
fact n = (fix fact') n
where
fact' rec n = if n == 0
                then 1
                else n * rec (n - 1)
such that
fact 3
= (fix fact') 3
= fact' (fix fact') 3
= if 3 == 0 then 1 else 3 * (fix fact') (3 - 1)
= 3 * (fix fact') 2
= 3 * fact' (fix fact') 2
= 3 * if 2 == 0 then 1 else 2 * (fix fact') (2 - 1)
= 3 * 2 * (fix fact') 1
= 3 * 2 * fact' (fix fact') 1
= 3 * 2 * if 1 == 0 then 1 else 1 * (fix fact') (1 - 1)
= 3 * 2 * 1 * (fix fact') 0
= 3 * 2 * 1 * fact' (fix fact') 0
= 3 * 2 * 1 * if 0 == 0 then 1 else 0 * (fix fact') (0 - 1)
= 3 * 2 * 1 * 1
= 6
This formal proof that
fact 3 = 6
methodically uses the fixed-point combinator equivalence for rewrites
fix fact' -> fact' (fix fact')
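The rewrite rule above can be tried out in JavaScript. This is only a sketch: because JavaScript evaluates arguments eagerly, a literal transcription of fix f = f (fix f) would recurse forever, so the version below wraps the definition in an extra function that delays the inner fix(f) until the recursive call is made. That wrapper, and the name factStep, are assumptions of this example rather than part of the derivation above.

```javascript
// fix f = f (fix f), eta-expanded: the inner fix(f) is only evaluated
// when the returned function is actually called with an argument.
const fix = (f) => (n) => f(fix(f))(n);

// Corresponds to fact' above: `rec` stands for the recursive call.
const factStep = (rec) => (n) => (n === 0 ? 1 : n * rec(n - 1));

const fact = fix(factStep);
console.log(fact(3)); // 6
```

Note that fix itself is still defined by named recursion here; the point is only that factStep never refers to itself.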
Lambda calculus
The untyped lambda calculus formalism consists in a context-free grammar
E ::= v Variable
| λ v. E Abstraction
|  E E Application
where v ranges over variables, together with the beta and eta reduction rules
(λ x. B) E -> B[x := E] Beta
λ x. E x -> E if x doesn’t occur free in E Eta
Beta reduction substitutes all free occurrences of the variable x in the abstraction (“function”) body B by the expression (“argument”) E. Eta reduction eliminates redundant abstraction. It is sometimes omitted from the formalism. An irreducible expression, to which no reduction rule applies, is in normal or canonical form.
λ x y. E
is shorthand for
λ x. λ y. E
(abstraction multiarity),
E F G
is shorthand for
(E F) G
(application left-associativity),
λ x. x
and
λ y. y
are alpha-equivalent.
Abstraction and application are the two only “language primitives” of the lambda calculus, but they allow encoding of arbitrarily complex data and operations.
The Church numerals are an encoding of the natural numbers similar to the Peano-axiomatic naturals.
0 = λ f x. x No application
1 = λ f x. f x One application
2 = λ f x. f (f x) Twofold
3 = λ f x. f (f (f x)) Threefold
. . .
SUCC = λ n f x. f (n f x) Successor
ADD = λ n m f x. n f (m f x) Addition
MULT = λ n m f x. n (m f) x Multiplication
. . .
A formal proof that
1 + 2 = 3
using the rewrite rule of beta reduction:
ADD 1 2
= (λ n m f x. n f (m f x)) (λ g y. g y) (λ h z. h (h z))
= (λ m f x. (λ g y. g y) f (m f x)) (λ h z. h (h z))
= (λ m f x. (λ y. f y) (m f x)) (λ h z. h (h z))
= (λ m f x. f (m f x)) (λ h z. h (h z))
= λ f x. f ((λ h z. h (h z)) f x)
= λ f x. f ((λ z. f (f z)) x)
= λ f x. f (f (f x)) Normal form
= 3
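The same encoding can be played with in JavaScript, writing each abstraction as a curried arrow function. This is a sketch; the helper toInt is not part of the encoding and is only assumed here to make results printable.

```javascript
// Church numerals as curried functions: a numeral n applies f n times to x.
const zero = (f) => (x) => x;
const succ = (n) => (f) => (x) => f(n(f)(x));
const add = (n) => (m) => (f) => (x) => n(f)(m(f)(x));
const mult = (n) => (m) => (f) => (x) => n(m(f))(x);

const one = succ(zero);
const two = succ(one);

// Helper (illustrative): decode a numeral by counting applications of f.
const toInt = (n) => n((k) => k + 1)(0);

console.log(toInt(add(one)(two)));  // 3
console.log(toInt(mult(two)(two))); // 4
```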
Combinators
In lambda calculus, combinators are abstractions that contain no free variables. Most simply: I, the identity combinator
λ x. x
isomorphic to the identity function
id x = x
Such combinators are the primitive operators of combinator calculi like the SKI system.
S = λ x y z. x z (y z)
K = λ x y. x
I = λ x. x
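As a small illustration (a sketch, not part of the original answer), the three combinators translate directly into curried JavaScript functions, and one can check on sample values that S K K behaves like I, since S K K a = K a (K a) = a.

```javascript
const S = (x) => (y) => (z) => x(z)(y(z));
const K = (x) => (y) => x;
const I = (x) => x;

// S K K a = K a (K a) = a, so S K K acts as the identity.
const SKK = S(K)(K);
console.log(SKK(42) === I(42)); // true
```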
Beta reduction is not strongly normalizing; not all reducible expressions, “redexes”, converge to normal form under beta reduction. A simple example is divergent application of the omega ω combinator
λ x. x x
to itself:
(λ x. x x) (λ y. y y)
= (λ y. y y) (λ y. y y)
. . .
= _|_ Bottom
Reduction of leftmost subexpressions (“heads”) is prioritized. Applicative order normalizes arguments before substitution, normal order does not. The two strategies are analogous to eager evaluation, e.g. C, and lazy evaluation, e.g. Haskell.
K (I a) (ω ω)
= (λ k l. k) ((λ i. i) a) ((λ x. x x) (λ y. y y))
diverges under eager applicative-order beta reduction
= (λ k l. k) a ((λ x. x x) (λ y. y y))
= (λ l. a) ((λ x. x x) (λ y. y y))
= (λ l. a) ((λ y. y y) (λ y. y y))
. . .
= _|_
since in strict semantics
forall f. f _|_ = _|_
but converges under lazy normal-order beta reduction
= (λ l. ((λ i. i) a)) ((λ x. x x) (λ y. y y))
= (λ l. a) ((λ x. x x) (λ y. y y))
= a
If an expression has a normal form, normal-order beta reduction will find it.
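The difference between the two strategies can be imitated in JavaScript, which is eager. In this sketch normal order is simulated by passing arguments as thunks (zero-argument functions) that are only called when their value is needed. The names kEager, kLazy and the thunk convention are assumptions of the example, and the stack-overflow behaviour assumes an engine without tail-call elimination, such as Node.js.

```javascript
const omega = (x) => x(x);
const id = (x) => x;

// Eager: omega(omega) is evaluated before K is applied to it, so the
// call never returns normally; the engine gives up with a RangeError.
const kEager = (k) => (l) => k;
let eagerDiverged = false;
try {
  kEager(id("a"))(omega(omega));
} catch (e) {
  eagerDiverged = e instanceof RangeError;
}

// Simulated normal order: arguments are thunks and are forced only when
// used. The second thunk is discarded without ever being called.
const kLazy = (k) => (l) => k();
const result = kLazy(() => id("a"))(() => omega(omega));

console.log(eagerDiverged, result); // true a
```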
Y
The essential property of the Y fixed-point combinator
λ f. (λ x. f (x x)) (λ x. f (x x))
is given by
Y g
= (λ f. (λ x. f (x x)) (λ x. f (x x))) g
= (λ x. g (x x)) (λ x. g (x x)) = Y g
= g ((λ x. g (x x)) (λ x. g (x x))) = g (Y g)
= g (g ((λ x. g (x x)) (λ x. g (x x)))) = g (g (Y g))
. . . . . .
The equivalence
Y g = g (Y g)
is isomorphic to
fix f = f (fix f)
The untyped lambda calculus can encode arbitrary constructive proofs over general/μ-recursive functions.
FACT = λ n. Y FACT' n
FACT' = λ rec n. if n == 0 then 1 else n * rec (n - 1)
FACT 3
= (λ n. Y FACT' n) 3
= Y FACT' 3
= FACT' (Y FACT') 3
= if 3 == 0 then 1 else 3 * (Y FACT') (3 - 1)
= 3 * (Y FACT') (3 - 1)
= 3 * FACT' (Y FACT') 2
= 3 * if 2 == 0 then 1 else 2 * (Y FACT') (2 - 1)
= 3 * 2 * (Y FACT') 1
= 3 * 2 * FACT' (Y FACT') 1
= 3 * 2 * if 1 == 0 then 1 else 1 * (Y FACT') (1 - 1)
= 3 * 2 * 1 * (Y FACT') 0
= 3 * 2 * 1 * FACT' (Y FACT') 0
= 3 * 2 * 1 * if 0 == 0 then 1 else 0 * (Y FACT') (0 - 1)
= 3 * 2 * 1 * 1
= 6
(Multiplication delayed, confluence)
For Churchian untyped lambda calculus, there has been shown to exist a recursively enumerable infinity of fixed-point combinators besides Y.
X = λ f. (λ x. x x) (λ x. f (x x))
Y' = (λ x y. x y x) (λ y x. y (x y x))
Z = λ f. (λ x. f (λ v. x x v)) (λ x. f (λ v. x x v))
Θ = (λ x y. y (x x y)) (λ x y. y (x x y))
. . .
Normal-order beta reduction makes the unextended untyped lambda calculus a Turing-complete rewrite system.
In Haskell, the fixed-point combinator can be elegantly implemented
fix :: forall t. (t -> t) -> t
fix f = f (fix f)
Thanks to Haskell's laziness, an expression can normalize to a finite result before all of its subexpressions have been evaluated.
primes :: Integral t => [t]
primes = sieve [2 ..]
  where
    sieve = fix (\ rec (p : ns) ->
                   p : rec [n | n <- ns
                              , n `rem` p /= 0])
David Turner: Church's Thesis and Functional Programming
Alonzo Church: An Unsolvable Problem of Elementary Number Theory
Lambda calculus
Church–Rosser theorem
Other answers cover this pretty concisely but leave out one important fact: you don't need to implement a fixed-point combinator in any practical language in this convoluted way, and doing so serves no practical purpose (except "look, I know what a Y-combinator is"). It's an important theoretical concept, but of little practical value.
Here is a JavaScript implementation of the Y-Combinator and the Factorial function (from Douglas Crockford's article, available at: http://javascript.crockford.com/little.html).
function Y(le) {
  return (function (f) {
    return f(f);
  }(function (f) {
    return le(function (x) {
      return f(f)(x);
    });
  }));
}
var factorial = Y(function (fac) {
  return function (n) {
    // Note: as written, this returns 0 for n = 0, although 0! = 1.
    return n <= 2 ? n : n * fac(n - 1);
  };
});
var number120 = factorial(5);
A Y-Combinator is another name for a flux capacitor.
I have written a sort of "idiot's guide" to the Y-Combinator in both Clojure and Scheme in order to help myself come to grips with it. They are influenced by material in "The Little Schemer".
In Scheme:
https://gist.github.com/z5h/238891
or Clojure:
https://gist.github.com/z5h/5102747
Both tutorials are code interspersed with comments and should be cut & pastable into your favourite editor.
As a newbie to combinators, I found Mike Vanier's article (thanks Nicholas Mancuso) really helpful. I would like to write a summary, partly to document my own understanding; if it helps anyone else, I would be very glad.
From Crappy to Less Crappy
Using factorial as an example, we use the following almost-factorial function to calculate factorial of number x:
def almost-factorial f x = if iszero x
then 1
else * x (f (- x 1))
In the pseudo-code above, almost-factorial takes in function f and number x (almost-factorial is curried, so it can be seen as taking in function f and returning a 1-arity function).
When almost-factorial calculates factorial for x, it delegates the calculation of factorial for x - 1 to function f and accumulates that result with x (in this case, it multiplies the result of (x - 1) with x).
In other words, almost-factorial takes in a crappy version of the factorial function (which can only calculate up to x - 1) and returns a less-crappy version of factorial (which calculates up to x). In this form:
almost-factorial crappy-f = less-crappy-f
If we repeatedly pass the less-crappy version of factorial back to almost-factorial, we will eventually get our desired factorial function f, which satisfies:
almost-factorial f = f
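The "crappy to less crappy" step can be made concrete in JavaScript. This is only a sketch: the names crappy, f0, f1 and f2 are made up for the illustration, and crappy is assumed to be a function that cannot compute anything at all.

```javascript
// JavaScript rendering of almost-factorial: recursion is delegated to f.
const almostFactorial = (f) => (x) => (x === 0 ? 1 : x * f(x - 1));

// The crappiest possible factorial: it fails for every input.
const crappy = (x) => {
  throw new Error("cannot compute factorial of " + x);
};

// Each application of almostFactorial extends the working range by one.
const f0 = almostFactorial(crappy); // works for 0
const f1 = almostFactorial(f0);     // works for 0 and 1
const f2 = almostFactorial(f1);     // works for 0, 1 and 2

console.log(f2(2)); // 2
// f2(3) would throw, because the chain bottoms out in `crappy`.
```

A function that works for every input is what you would get after applying almostFactorial "infinitely often", and that limit is the f with almost-factorial f = f.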
Fix-point
The fact that almost-factorial f = f means f is the fix-point of function almost-factorial.
This was a really interesting way of seeing the relationships of the functions above and it was an aha moment for me. (please read Mike's post on fix-point if you haven't)
Three functions
To generalize, we have a non-recursive function fn (like our almost-factorial), we have its fix-point function fr (like our f), then what Y does is when you give Y fn, Y returns the fix-point function of fn.
So in summary (simplified by assuming fr takes only one parameter; x degenerates to x - 1, x - 2... in recursion):
We define the core calculations as fn: def fn fr x = ...accumulate x with result from (fr (- x 1)), this is the almost-useful function - although we cannot use fn directly on x, it will be useful very soon. This non-recursive fn uses a function fr to calculate its result
fn fr = fr, fr is the fix-point of fn, fr is the useful function, we can use fr on x to get our result
Y fn = fr, Y returns the fix-point of a function, Y turns our almost-useful function fn into useful fr
Deriving Y (not included)
I will skip the derivation of Y and go to understanding Y. Mike Vanier's post has a lot of details.
The form of Y
Y is defined as (in lambda calculus format):
Y f = (λs.f (s s)) (λs.f (s s))
If we substitute the right-hand function for the variable s in the body of the left-hand function, we get
Y f = (λs.f (s s)) (λs.f (s s))
=> f ((λs.f (s s)) (λs.f (s s)))
=> f (Y f)
So indeed, the result of (Y f) is the fix-point of f.
Why does (Y f) work?
Depending on the signature of f, (Y f) can be a function of any arity. To simplify, let's assume (Y f) only takes one parameter, like our factorial function.
def fn fr x = accumulate x (fr (- x 1))
since fn fr = fr, we continue
=> accumulate x (fn fr (- x 1))
=> accumulate x (accumulate (- x 1) (fr (- x 2)))
=> accumulate x (accumulate (- x 1) (accumulate (- x 2) ... (fn fr 1)))
the recursive calculation terminates when the inner-most (fn fr 1) is the base case and fn doesn't use fr in the calculation.
Looking at Y again:
fr = Y fn = (λs.fn (s s)) (λs.fn (s s))
=> fn ((λs.fn (s s)) (λs.fn (s s)))
So
fr x = Y fn x = fn ((λs.fn (s s)) (λs.fn (s s))) x
To me, the magical parts of this setup are:
fn and fr interdepend on each other: fr 'wraps' fn inside, every time fr is used to calculate x, it 'spawns' ('lifts'?) an fn and delegates the calculation to that fn (passing in itself fr and x); on the other hand, fn depends on fr and uses fr to calculate result of a smaller problem x-1.
At the time fr is used to define fn (when fn uses fr in its operations), the real fr is not yet defined.
It's fn which defines the real business logic. Based on fn, Y creates fr - a helper function in a specific form - to facilitate the calculation for fn in a recursive manner.
Thinking about Y this way helped me understand it; I hope it helps you too.
BTW, I also found the book An Introduction to Functional Programming Through Lambda Calculus very good. I'm only partway through it, and the fact that I couldn't get my head around Y in the book led me to this post.
Here are answers to the original questions, compiled from the article (which is TOTALLY worth reading) mentioned in the answer by Nicholas Mancuso, as well as other answers:
What is a Y-combinator?
A Y-combinator is a "functional" (a higher-order function, i.e. a function that operates on other functions) that takes a single argument, which is a function that isn't recursive, and returns a version of the function which is recursive.
Somewhat recursive =), but more in-depth definition:
A combinator — is just a lambda expression with no free variables.
Free variable — is a variable that is not a bound variable.
Bound variable — variable which is contained inside the body of a lambda expression that has that variable name as one of its arguments.
Another way to think about this: a combinator is a lambda expression in which every occurrence of the combinator's name can be replaced by its definition and everything still works (if the combinator's body referred to the combinator itself, this substitution would never terminate).
Y-combinator is a fixed-point combinator.
Fixed point of a function is an element of the function's domain that is mapped to itself by the function.
That is to say, c is a fixed point of the function f(x) if f(c) = c
This means f(f(...f(c)...)) = f^n(c) = c
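A tiny numeric illustration of the definition (a sketch with made-up names square and iterate, not taken from the article): f(x) = x * x has the fixed points 0 and 1, and applying f repeatedly to a fixed point never changes it.

```javascript
// f(x) = x * x maps 0 to 0 and 1 to 1, so both are fixed points.
const square = (x) => x * x;
console.log(square(0) === 0, square(1) === 1); // true true

// Helper (illustrative): apply f to x, n times in a row.
const iterate = (f, n, x) => (n === 0 ? x : iterate(f, n - 1, f(x)));

// Once c is a fixed point, f applied n times to c is still c.
console.log(iterate(square, 10, 1)); // 1
```

For the Y-combinator the "elements" are themselves functions: the fixed point of a non-recursive function is the recursive function it describes.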
How do combinators work?
Examples below assume strong + dynamic typing:
Lazy (normal-order) Y-combinator:
This definition applies to languages with lazy (also: deferred, call-by-need) evaluation — evaluation strategy which delays the evaluation of an expression until its value is needed.
Y = λf.(λx.f(x x)) (λx.f(x x)) = λf.(λx.(x x)) (λx.f(x x))
What this means is that, for a given function f (which is a non-recursive function), the corresponding recursive function can be obtained first by computing λx.f(x x), and then applying this lambda expression to itself.
Strict (applicative-order) Y-combinator:
This definition applies to languages with strict (also: eager, greedy) evaluation — evaluation strategy in which an expression is evaluated as soon as it is bound to a variable.
Y = λf.(λx.f(λy.((x x) y))) (λx.f(λy.((x x) y))) = λf.(λx.(x x)) (λx.f(λy.((x x) y)))
It is the same as the lazy one in its nature; it just has extra λ wrappers to delay the evaluation of the lambda's body. I've asked another question, somewhat related to this topic.
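Since JavaScript is a strict language, the applicative-order form is the one that can be transcribed directly. The following is a sketch of that transcription; the factorial function and the parameter name self are only assumed here to have something to run.

```javascript
// Y = λf.(λx.f(λy.((x x) y))) (λx.f(λy.((x x) y)))
// The inner (y) => x(x)(y) wrapper delays x(x) until a call is made.
const Y = (f) => ((x) => f((y) => x(x)(y)))((x) => f((y) => x(x)(y)));

// A non-recursive function: `self` stands for the recursive call.
const factorial = Y((self) => (n) => (n === 0 ? 1 : n * self(n - 1)));

console.log(factorial(5)); // 120
```

Dropping the wrapper and writing f(x(x)) instead would make JavaScript evaluate x(x) immediately, over and over, before f is ever called.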
What are they good for?
Borrowed from the answer by Chris Ammerman: Y-combinator generalizes recursion, abstracting its implementation, and thereby separating it from the actual work of the function in question.
Even though the Y-combinator has some practical applications, it is mainly a theoretical concept; understanding it will broaden your overall view and will likely improve your analytical and development skills.
Are they useful in procedural languages?
As stated by Mike Vanier: it is possible to define a Y combinator in many statically typed languages, but (at least in the examples I've seen) such definitions usually require some non-obvious type hackery, because the Y combinator itself doesn't have a straightforward static type. That's beyond the scope of this article, so I won't mention it further
And as mentioned by Chris Ammerman: most procedural languages have static typing.
So the answer to this one: not really.
A fixed point combinator (or fixed-point operator) is a higher-order function that computes a fixed point of other functions. This operation is relevant in programming language theory because it allows the implementation of recursion in the form of a rewrite rule, without explicit support from the language's runtime engine. (src Wikipedia)
The y-combinator implements anonymous recursion. So instead of
function fib( n ){ if( n<=1 ) return n; else return fib(n-1)+fib(n-2) }
you can do
function ( fib, n ){ if( n<=1 ) return n; else return fib(n-1)+fib(n-2) }
Of course, the y-combinator only works in call-by-name languages. If you want to use this in any normal call-by-value language, then you will need the related z-combinator (the y-combinator will diverge, i.e. loop forever).
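To actually run the anonymous two-argument form in a call-by-value language like JavaScript, something has to supply the fib argument. The sketch below uses a variant of the combinator adapted to functions that take themselves as their first parameter; the name recursive and this uncurried calling convention are assumptions of the example, not a standard definition.

```javascript
// Ties the knot for functions written as function (self, ...args) { ... }.
// h(h) is only evaluated when the resulting function is called, so this
// does not loop under eager evaluation.
const recursive = (f) =>
  ((g) => g(g))((h) => (...args) => f(h(h), ...args));

// The function literal never refers to its own name, only to its parameter.
const fibonacci = recursive(function (fib, n) {
  if (n <= 1) return n; else return fib(n - 1) + fib(n - 2);
});

console.log(fibonacci(10)); // 55
```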
The this-operator can simplify your life:
var Y = function(f) {
  return (function(g) {
    return g(g);
  })(function(h) {
    return function() {
      return f.apply(h(h), arguments);
    };
  });
};
Then you avoid the extra function:
var fac = Y(function(n) {
  return n == 0 ? 1 : n * this(n - 1);
});
Finally, you call fac(5).
I think the best way to answer this is to pick a language, like JavaScript:
function factorial(num)
{
  // If the number is less than 0, reject it.
  if (num < 0) {
    return -1;
  }
  // If the number is 0, its factorial is 1.
  else if (num == 0) {
    return 1;
  }
  // Otherwise, call this recursive procedure again.
  else {
    return (num * factorial(num - 1));
  }
}
Now rewrite it so that it doesn't use the name of the function inside the function, but still calls it recursively.
The only place the function name factorial should be seen is at the call site.
Hint: you can't use names of functions, but you can use names of parameters.
Work the problem. Don't look it up. Once you solve it, you will understand what problem the y-combinator solves.

Resources