What is the definition of "natural recursion"?

The Third Commandment of The Little Schemer states:
When building a list, describe the first typical element, and then cons it onto the natural recursion.
What is the exact definition of "natural recursion"? The reason why I am asking is because I am taking a class on programming language principles by Daniel Friedman and the following code is not considered "naturally recursive":
(define (plus x y)
  (if (zero? y) x
      (plus (add1 x) (sub1 y))))
However, the following code is considered "naturally recursive":
(define (plus x y)
  (if (zero? y) x
      (add1 (plus x (sub1 y)))))
I prefer the "unnaturally recursive" code because it is tail recursive. However, such code is considered anathema. When I asked why we shouldn't write the function in tail-recursive form, the associate instructor simply replied, "You don't mess with the natural recursion."
What's the advantage of writing the function in the "naturally recursive" form?

"Natural" (or just "Structural") recursion is the best way to start teaching students about recursion. This is because it has the wonderful guarantee that Joshua Taylor points out: it's guaranteed to terminate[*]. Students have a hard enough time wrapping their heads around this kind of program that making this a "rule" can save them a huge amount of head-against-wall-banging.
When you choose to leave the realm of structural recursion, you (the programmer) have taken on an additional responsibility, which is to ensure that your program halts on all inputs; it's one more thing to think about & prove.
In your case, it's a bit more subtle. You have two arguments, and you're making structurally recursive calls on the second one. In fact, with this observation (program is structurally recursive on argument 2), I would argue that your original program is pretty much just as legitimate as the non-tail-calling one, since it inherits the same proof-of-convergence. Ask Dan about this; I'd be interested to hear what he has to say.
[*] To be precise here you have to legislate out all kinds of other goofy stuff like calls to other functions that don't terminate, etc.

The natural recursion has to do with the "natural", recursive definition of the type you are dealing with. Here, you are working with natural numbers; since "obviously" a natural number is either zero or the successor of another natural number, when you want to build a natural number, you naturally output 0 or (add1 z) for some other natural z which happens to be computed recursively.
The teacher probably wants you to make the link between recursive type definitions and recursive processing of values of those types. You would not have this kind of problem with trees or lists; it comes up with numbers because you routinely use natural numbers in "unnatural" ways, and so you may balk at thinking of them as Church-style numerals built from zero and successor.
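For example, a list is "naturally" either empty or an element consed onto another list, and a naturally (structurally) recursive function mirrors that definition case for case. A minimal sketch (the name copy-list is just for illustration):
; Structural recursion over lists: the code follows the shape of the type,
; "a list is either '() or (cons first rest)".
(define (copy-list lst)
  (if (null? lst)
      '()                             ; base case: the empty list
      (cons (car lst)                 ; the typical first element...
            (copy-list (cdr lst)))))  ; ...consed onto the natural recursion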
The fact that you already know how to write tail-recursive functions is irrelevant in that context: this is apparently not the objective of your teacher to talk about tail-call optimizations, at least for now.
The associate instructor was not very helpful at first ("messing with the natural recursion" sounds like "don't ask"), but the detailed explanation he or she gave in the snapshot you posted was more appropriate.

(define (plus x y)
  (if (zero? y) x
      (add1 (plus x (sub1 y)))))
When y != 0 it has to remember that once the result of (plus x (sub1 y)) is known, it has to compute add1 on it. Hence when y reaches zero, the recursion is at its deepest. Now the backtracking phase begins and the add1's are executed. This process can be observed using trace.
I did the trace for:
(require racket/trace)
(define (add1 x) (+ x 1))   ; same behaviour as the built-in add1
(define (sub1 x) (- x 1))   ; same behaviour as the built-in sub1
(define (plus x y)
  (if (zero? y) x
      (add1 (plus x (sub1 y)))))
(trace plus)
(plus 2 3)
Here's the trace:
>(plus 2 3)
> (plus 2 2)
> >(plus 2 1)
> > (plus 2 0) // Deepest point of recursion
< < 2 // Backtracking begins, performing add1 on the results
< <3
< 4
<5
5 // Result
The difference is that the other version has no backtracking phase. It calls itself several times, but the process is iterative, because the intermediate result is carried along in the arguments. Hence the process does not consume extra space.
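You can see this with trace as well. Tracing the tail-recursive version (renamed plus-iter here just for the sketch) shows the calls staying at the same depth, with the partial sum carried in the first argument; the output looks roughly like the comments below:
(require racket/trace)
(define (plus-iter x y)
  (if (zero? y) x
      (plus-iter (add1 x) (sub1 y))))
(trace plus-iter)
(plus-iter 2 3)
; >(plus-iter 2 3)
; >(plus-iter 3 2)
; >(plus-iter 4 1)
; >(plus-iter 5 0)
; <5   ; no backtracking phase, the answer is already in x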
Sometimes implementing a tail-recursive procedure is easier or more elegant than writing its iterative equivalent. But for some purposes you cannot, or may not, implement it in a recursive way.
PS: I had a class which covered a bit about garbage collection algorithms. Such algorithms may not be recursive, as there may be no space left, and hence no space for the recursion. I remember an algorithm called "Deutsch-Schorr-Waite" which was really hard to understand at first. The instructor first implemented the recursive version just to convey the concept, and afterwards wrote the iterative version (manually having to remember where in memory he came from). Believe me, the recursive one was way easier, but it could not be used in practice...

Related

Why is foldl defined in a strange way in Racket?

In Haskell, like in many other functional languages, the function foldl is defined such that, for example, foldl (-) 0 [1,2,3,4] = -10.
This is OK, because foldl (-) 0 [1,2,3,4] is, by definition, ((((0 - 1) - 2) - 3) - 4).
But, in Racket, (foldl - 0 '(1 2 3 4)) is 2, because Racket "intelligently" calculates like this: (4 - (3 - (2 - (1 - 0)))), which indeed is 2.
Of course, if we define auxiliary function flip, like this:
(define (flip bin-fn)
  (lambda (x y)
    (bin-fn y x)))
then we could in Racket achieve the same behavior as in Haskell: instead of (foldl - 0 '(1 2 3 4)) we can write: (foldl (flip -) 0 '(1 2 3 4))
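(A quick check in a Racket REPL, using the flip defined above, confirms this:)
> (foldl (flip -) 0 '(1 2 3 4))
-10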
The question is: Why is foldl in racket defined in such an odd (nonstandard and nonintuitive) way, differently than in any other language?
The Haskell definition is not uniform. In Racket, the function given to both folds takes its inputs in the same order, and therefore you can just replace foldl by foldr and get the same result. If you do that with the Haskell version you'd get a different result (usually), and you can see this in the different types of the two.
(In fact, I think that in order to do a proper comparison you should avoid these toy numeric examples where both of the type variables are integers.)
This has the nice byproduct where you're encouraged to choose either foldl or foldr according to their semantic differences. My guess is that with Haskell's order you're likely to choose according to the operation. You have a good example for this: you've used foldl because you want to subtract each number — and that's such an "obvious" choice that it's easy to overlook the fact that foldl is usually a bad choice in a lazy language.
Another difference is that the Haskell version is more limited than the Racket version in the usual way: it operates on exactly one input list, whereas Racket can accept any number of lists. This makes it more important to have a uniform argument order for the input function.
Finally, it is wrong to assume that Racket diverged from "many other functional languages", since folding is far from a new trick, and Racket has roots that are far older than Haskell (or these other languages). The question could therefore go the other way: why is Haskell's foldl defined in a strange way? (And no, (-) is not a good excuse.)
Historical update:
Since this seems to bother people again and again, I did a little bit of legwork. This is not definitive in any way, just my second-hand guessing. Feel free to edit this if you know more, or even better, email the relevant people and ask. Specifically, I don't know the dates where these decisions were made, so the following list is in rough order.
First there was Lisp, and no mention of "fold"ing of any kind. Instead, Lisp has reduce which is very non-uniform, especially if you consider its type. For example, :from-end is a keyword argument that determines whether it's a left or a right scan and it uses different accumulator functions which means that the accumulator type depends on that keyword. This is in addition to other hacks: usually the first value is taken from the list (unless you specify an :initial-value). Finally, if you don't specify an :initial-value, and the list is empty, it will actually apply the function on zero arguments to get a result.
All of this means that reduce is usually used for what its name suggests: reducing a list of values into a single value, where the two types are usually the same. The conclusion here is that it's serving a kind of a similar purpose to folding, but it's not nearly as useful as the generic list iteration construct that you get with folding. I'm guessing that this means that there's no strong relation between reduce and the later fold operations.
The first relevant language that follows Lisp and has a proper fold is ML. The choice that was made there, as noted in newacct's answer below, was to go with the uniform types version (ie, what Racket uses).
The next reference is Bird & Wadler's ItFP (1988), which uses different types (as in Haskell). However, they note in the appendix that Miranda has the same type (as in Racket).
Miranda later on switched the argument order (ie, moved from the Racket order to the Haskell one). Specifically, that text says:
WARNING - this definition of foldl differs from that in older versions of Miranda. The one here is the same as that in Bird and Wadler (1988). The old definition had the two args of `op' reversed.
Haskell took a lot of stuff from Miranda, including the different types. (But of course I don't know the dates so maybe the Miranda change was due to Haskell.) In any case, it's clear at this point that there was no consensus, hence the reversed question above holds.
OCaml went with the Haskell direction and uses different types.
I'm guessing that "How to Design Programs" (aka HtDP) was written at roughly the same period, and they chose the same type. There is, however, no motivation or explanation — and in fact, after that exercise it's simply mentioned as one of the built-in functions.
Racket's implementation of the fold operations was, of course, the "built-ins" that are mentioned here.
Then came SRFI-1, and the choice was to use the same-type version (as Racket). This decision was questioned by John David Stone, who points to a comment in the SRFI that says:
Note: MIT Scheme and Haskell flip F's arg order for their reduce and fold functions.
Olin later addressed this: all he said was:
Good point, but I want consistency between the two functions.
state-value first: srfi-1, SML
state-value last: Haskell
Note in particular his use of state-value, which suggests a view where consistent types are a possibly more important point than operator order.
"differently than in any other language"
As a counter-example, the foldl of Standard ML (ML is a very old and influential functional language) also works this way: http://www.standardml.org/Basis/list.html#SIG:LIST.foldl:VAL
Racket's foldl and foldr (and also SRFI-1's fold and fold-right) have the property that
(foldr cons null lst) = lst
(foldl cons null lst) = (reverse lst)
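A quick REPL check of both identities:
> (foldr cons '() '(1 2 3))
'(1 2 3)
> (foldl cons '() '(1 2 3))
'(3 2 1)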
I speculate the argument order was chosen for that reason.
From the Racket documentation, the description of foldl:
(foldl proc init lst ...+) → any/c
Two points of interest for your question are mentioned:
the input lsts are traversed from left to right
And
foldl processes the lsts in constant space
I'm going to speculate on what the implementation might look like, with a single list for simplicity's sake:
(define (my-foldl proc init lst)
  (define (iter lst acc)
    (if (null? lst)
        acc
        (iter (cdr lst) (proc (car lst) acc))))
  (iter lst init))
As you can see, the requirements of left-to-right traversal and constant space are met (notice the tail recursion in iter), but the order of the arguments for proc was never specified in the description. Hence, the result of calling the above code would be:
(my-foldl - 0 '(1 2 3 4))
> 2
If we had specified the order of the arguments for proc in this way:
(proc acc (car lst))
Then the result would be:
(my-foldl - 0 '(1 2 3 4))
> -10
My point is, the documentation for foldl doesn't specify the order of the arguments passed to proc; it only has to guarantee that constant space is used and that the elements in the list are traversed from left to right.
As a side note, you can get the desired evaluation order for your expression by simply writing this:
(- 0 1 2 3 4)
> -10

How do I write this Clojure function so that it doesn't blow out the stack?

I'm new to Clojure and I think my approach to writing code so far is not in line with the "Way of Clojure". At least, I keep writing functions that lead to StackOverflow errors for large values. I've learned about using recur, which has been a good step forward. But how do I make functions like the one below work for values like 2500000?
(defn fib [i]
  (if (>= 2 i)
    1
    (+ (fib (dec i))
       (fib (- i 2)))))
The function is, to my eyes, the "plain" implementation of a Fibonacci generator. I've seen other implementations that are much more optimized, but less obvious in terms of what they do. I.e. when you read the function definition, you don't go "oh, fibonacci".
Any pointers would be greatly appreciated!
You need to have a mental model of how your function works. Let's say that you execute your function yourself, using scraps of paper for each invocation. First scrap, you write (fib 250000), then you see "oh, I need to calculate (fib 249999) and (fib 249998) and finally add them", so you note that and start two new scraps. You can't throw away the first, because it still has things to do for you when you finish the other calculations. You can imagine that this calculation will need a lot of scraps.
Another way is not to start at the top, but at the bottom. How would you do this by hand? You would start with the first numbers, 1, 1, 2, 3, 5, 8 …, and then always add the last two, until you have done it i times. You can even throw away all numbers except the last two at each step, so you can re-use most scraps.
(defn fib [i]
  (loop [a 0
         b 1
         n 1]
    (if (>= n i)
      b
      (recur b
             (+ a b)
             (inc n)))))
This is also a fairly obvious implementation, but of the how to, not of the what. It always seems quite elegant when you can simply write down a definition and it gets automatically transformed into an efficient calculation, but programming is that transformation. If something gets transformed automatically, then this particular problem has already been solved (often in a more general way).
Thinking "how would I do this step by step on paper" often leads to a good implementaion.
A Fibonacci generator implemented in a "plain" way, as in the definition of the sequence, will always blow your stack up. Neither of the two recursive calls to fib is a tail call, so such a definition cannot be optimised.
Unfortunately, if you'd like to write an efficient implementation working for big numbers you'll have to accept the fact that mathematical notation doesn't translate to code as cleanly as we'd like it to.
For instance, a non-recursive implementation can be found in clojure.contrib.lazy-seqs. A whole range of various approaches to this problem can be found on Haskell wiki. It shouldn't be difficult to understand with knowledge of basics of functional programming.
;;from "The Pragmatic Programmers Programming Clojure"
(defn fib [] (map first (iterate (fn [[a b]][b (+ a b)])[0N 1N])))
(nth (fib) 2500000)

Practical use of fold/reduce in functional languages

Fold (aka reduce) is considered a very important higher order function. Map can be expressed in terms of fold (see here). But it sounds more academic than practical to me. A typical use could be to get the sum, or product, or maximum of numbers, but these functions usually accept any number of arguments. So why write (fold + 0 '(2 3 5)) when (+ 2 3 5) works fine? My question is, in what situation is it easiest or most natural to use fold?
The point of fold is that it's more abstract. It's not that you can do things that you couldn't before, it's that you can do them more easily.
Using a fold, you can generalize any function that is defined on two elements so that it applies to an arbitrary number of elements. This is a win because it's usually much easier to write, test, maintain and modify a single function that takes two arguments than one that processes a whole list. And it's always easier to write, test, maintain, etc. one simple function instead of two with similar-but-not-quite functionality.
Since fold (and for that matter, map, filter, and friends) have well-defined behaviour, it's often much easier to understand code using these functions than explicit recursion.
Basically, once you have the one version, you get the other "for free". Ultimately, you end up doing less work to get the same result.
Here are a few simple examples where reduce works really well.
Find the sum of the maximum values of each sub-list
Clojure:
user=> (def x '((1 2 3) (4 5) (0 9 1)))
#'user/x
user=> (reduce #(+ %1 (apply max %2)) 0 x)
17
Racket:
> (define x '((1 2 3) (4 5) (0 9 1)))
> (foldl (lambda (a b) (+ b (apply max a))) 0 x)
17
Construct a map from a list
Clojure:
user=> (def y '(("dog" "bark") ("cat" "meow") ("pig" "oink")))
#'user/y
user=> (def z (reduce #(assoc %1 (first %2) (second %2)) {} y))
#'user/z
user=> (z "pig")
"oink"
For a more complicated clojure example featuring reduce, check out my solution to Project Euler problems 18 & 67.
See also: reduce vs. apply
In Common Lisp, functions cannot necessarily be applied to an arbitrary number of arguments.
There is a constant defined in every Common Lisp implementation CALL-ARGUMENTS-LIMIT, which must be 50 or larger.
This means that any such portably written function should accept at least 50 arguments. But it could be just 50.
This limit exists to allow compilers to possibly use optimized calling schemes and to not provide the general case, where an unlimited number of arguments could be passed.
Thus, to process large lists or vectors (more than 50 elements) in portable Common Lisp code, it is necessary to use iteration constructs, reduce, map, and similar. It is also necessary not to use (apply '+ large-list) but (reduce '+ large-list) instead.
Code using fold is usually awkward to read. That's why people prefer map, filter, exists, sum, and so on—when available. These days I'm primarily writing compilers and interpreters; here are some ways I use fold:
Compute the set of free variables for a function, expression, or type
Add a function's parameters to the symbol table, e.g., for type checking
Accumulate the collection of all sensible error messages generated from a sequence of definitions
Add all the predefined classes to a Smalltalk interpreter at boot time
What all these uses have in common is that they're accumulating information about a sequence into some kind of set or dictionary. Eminently practical.
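As a small illustrative sketch of that pattern in Racket (the function symbols-used is made up for the example), collecting the set of distinct symbols that occur in a sequence of flat expressions is a one-liner with foldl:
(require racket/set)
; Accumulate every symbol occurring in a list of flat expressions into a set.
(define (symbols-used exprs)
  (foldl (lambda (expr acc)
           (set-union acc (list->set (filter symbol? expr))))
         (set)
         exprs))
; > (symbols-used '((x 1 y) (y 2 z) (3 w)))
; (set 'x 'y 'z 'w)   ; element order in the printed set may vary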
Your example (+ 2 3 4) only works because you know the number of arguments beforehand. Folds work on lists the size of which can vary.
fold/reduce is the general version of the "cdr-ing down a list" pattern. Each algorithm that's about processing every element of a sequence in order and computing some return value from that can be expressed with it. It's basically the functional version of the foreach loop.
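For example, here is the same computation written once as an explicit "cdr down the list" recursion and once as a fold (a minimal sketch, summing string lengths):
; Hand-written recursion: walk the list, accumulate a total.
(define (total-length-rec strs)
  (if (null? strs)
      0
      (+ (string-length (car strs)) (total-length-rec (cdr strs)))))

; The same algorithm expressed with a fold.
(define (total-length strs)
  (foldl (lambda (s acc) (+ (string-length s) acc)) 0 strs))

; > (total-length '("fold" "is" "a" "foreach"))
; 14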
Here's an example that nobody else mentioned yet.
By using a function with a small, well-defined interface like "fold", you can replace that implementation without breaking the programs that use it. You could, for example, make a distributed version that runs on thousands of PCs, so a sorting algorithm that used it would become a distributed sort, and so on. Your programs become more robust, simpler, and faster.
Your example is a trivial one: + already takes any number of arguments, runs quickly in little memory, and has already been written and debugged by whoever wrote your compiler. Those properties are not often true of algorithms I need to run.

Is there any limit to recursion in lisp?

I enjoy using recursion whenever I can; it seems like a much more natural way to loop over something than actual loops. I was wondering if there is any limit to recursion in Lisp? Like there is in Python, where it freaks out after like 1000 loops?
Could you use it for say, a game loop?
Testing it out now, simple counting recursive function. Now at >7000000!
Thanks a lot
First, you should understand what a tail call is about.
A tail call is a call that does not consume stack.
Now you need to recognize when you are consuming stack.
Let's take the factorial example:
(defun factorial (n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))
This is the non-tail-recursive implementation of factorial.
Why? Because in addition to the return from factorial, there is a pending computation:
(* n ..)
So you are stacking n each time you call factorial.
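You can see the pending work pile up by expanding a small call by hand (an informal reduction, not actual trace output):
(factorial 4)
= (* 4 (factorial 3))
= (* 4 (* 3 (factorial 2)))
= (* 4 (* 3 (* 2 (factorial 1))))
= (* 4 (* 3 (* 2 1)))
= 24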
Now let's write the tail recursive factorial:
(defun factorial-opt (n &key (result 1))
  (if (= n 1)
      result
      (factorial-opt (- n 1) :result (* result n))))
Here, the result is passed as an argument to the function.
So you're also consuming stack, but the difference is that the stack size stays constant.
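Expanding the same small call for factorial-opt shows that nothing is left pending after each call; the partial product simply travels along in the argument:
(factorial-opt 4)
= (factorial-opt 3 :result 4)
= (factorial-opt 2 :result 12)
= (factorial-opt 1 :result 24)
= 24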
Thus, the compiler can optimize it by using only registers and leaving the stack empty.
The factorial-opt is then faster, but is less readable.
factorial is limited by the size of the stack, while factorial-opt is not.
So you should learn to recognize tail-recursive functions in order to know whether the recursion is limited.
There might be some compiler technique to transform a non-tail recursive function into a tail recursive one. Maybe someone could point out some link here.
Scheme mandates tail call optimization, and some CL implementations offer it as well. However, CL does not mandate it.
Note that for tail call optimization to work, you need to make sure that you don't have to return. E.g., a naive implementation of Fibonacci, where there is a need to return and add the result to another recursive call, will not be tail-call optimized, and as a result you will run out of stack space.
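A Fibonacci that carries its two running values in accumulator arguments avoids this; a minimal sketch in Scheme/Racket (where tail calls are guaranteed not to grow the stack):
; Tail-recursive Fibonacci: both running values are passed as arguments,
; so the recursive call is the last thing done and no frames pile up.
(define (fib n)
  (let loop ([a 0] [b 1] [k n])
    (if (zero? k)
        a
        (loop b (+ a b) (sub1 k)))))
; > (fib 10)
; 55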

Are there problems that cannot be written using tail recursion?

Tail recursion is an important performance optimisation strategy in functional languages because it allows recursive calls to consume constant stack (rather than O(n)).
Are there any problems that simply cannot be written in a tail-recursive style, or is it always possible to convert a naively-recursive function into a tail-recursive one?
If so, one day might functional compilers and interpreters be intelligent enough to perform the conversion automatically?
Yes, actually you can take some code and convert every function call—and every return—into a tail call. What you end up with is called continuation-passing style, or CPS.
For example, here's a function containing two recursive calls:
(define (count-tree t)
  (if (pair? t)
      (+ (count-tree (car t)) (count-tree (cdr t)))
      1))
And here's how it would look if you converted this function to continuation-passing style:
(define (count-tree-cps t ctn)
  (if (pair? t)
      (count-tree-cps (car t)
                      (lambda (L) (count-tree-cps (cdr t)
                                                  (lambda (R) (ctn (+ L R))))))
      (ctn 1)))
The extra argument, ctn, is a procedure which count-tree-cps tail-calls instead of returning. (sdcvvc's answer says that you can't do everything in O(1) space, and that is correct; here each continuation is a closure which takes up some memory.)
I didn't transform the calls to car or cdr or + into tail-calls. That could be done as well, but I assume those leaf calls would actually be inlined.
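To actually use it, you pass an initial continuation that simply returns its argument:
> (count-tree-cps '((a . b) . c) (lambda (n) n))
3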
Now for the fun part. Chicken Scheme actually does this conversion on all code it compiles. Procedures compiled by Chicken never return. There's a classic paper explaining why Chicken Scheme does this, written in 1994 before Chicken was implemented: CONS should not cons its arguments, Part II: Cheney on the M.T.A.
Surprisingly enough, continuation-passing style is fairly common in JavaScript. You can use it to do long-running computation, avoiding the browser's "slow script" popup. And it's attractive for asynchronous APIs. jQuery.get (a simple wrapper around XMLHttpRequest) is clearly in continuation-passing style; the last argument is a function.
It's true but not useful to observe that any collection of mutually recursive functions can be turned into a tail-recursive function. This observation is on a par with the old chestnut from the 1960s that control-flow constructs could be eliminated because every program could be written as a loop with a case statement nested inside.
What's useful to know is that many functions which are not obviously tail-recursive can be converted to tail-recursive form by the addition of accumulating parameters. (An extreme version of this transformation is the transformation to continuation-passing style (CPS), but most programmers find the output of the CPS transform difficult to read.)
Here's an example of a function that is "recursive" (actually it's just iterating) but not tail-recursive:
factorial n = if n == 0 then 1 else n * factorial (n-1)
In this case the multiply happens after the recursive call.
We can create a version that is tail-recursive by putting the product in an accumulating parameter:
factorial n = f n 1
  where f n product = if n == 0 then product else f (n-1) (n * product)
The inner function f is tail-recursive and compiles into a tight loop.
I find the following distinctions useful:
In an iterative or recursive program, you solve a problem of size n by first solving one subproblem of size n-1. Computing the factorial function falls into this category, and it can be done either iteratively or recursively. (This idea generalizes, e.g., to the Fibonacci function, where you need both n-1 and n-2 to solve n.)
In a recursive program, you solve a problem of size n by first solving two subproblems of size n/2. Or, more generally, you solve a problem of size n by first solving a subproblem of size k and one of size n-k, where 1 < k < n. Quicksort and mergesort are two examples of this kind of problem, which can easily be programmed recursively, but is not so easy to program iteratively or using only tail recursion. (You essentially have to simulate recursion using an explicit stack.)
In dynamic programming, you solve a problem of size n by first solving all subproblems of all sizes k, where k < n. Finding the shortest route from one point to another on the London Underground is an example of this kind of problem. (The London Underground is a multiply-connected graph, and you solve the problem by first finding all points for which the shortest path is 1 stop, then those for which the shortest path is 2 stops, and so on.)
Only the first kind of program has a simple transformation into tail-recursive form.
Any recursive algorithm can be rewritten as an iterative algorithm (perhaps requiring a stack or list) and iterative algorithms can always be rewritten as tail-recursive algorithms, so I think it's true that any recursive solution can somehow be converted to a tail-recursive solution.
(In comments, Pascal Cuoq points out that any algorithm can be converted to continuation-passing style.)
Note that just because something is tail-recursive doesn't mean that its memory usage is constant. It just means that the call-return stack doesn't grow.
You can't do everything in O(1) space (space hierarchy theorem). If you insist on using tail recursion, then you can store the call stack as one of the arguments. Obviously this doesn't change anything; somewhere internally, there is a call stack, you're simply making it explicitly visible.
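For instance, the count-tree function from an earlier answer can be rewritten with an explicit worklist argument that plays the role of the call stack (an illustrative sketch; count-tree-iter is a made-up name):
(define (count-tree-iter t)
  ; todo is an explicit stack of subtrees still to be counted;
  ; every call is a tail call, but the "stack" now lives in an argument.
  (let loop ([todo (list t)] [count 0])
    (cond [(null? todo) count]
          [(pair? (car todo))
           (loop (cons (caar todo) (cons (cdar todo) (cdr todo))) count)]
          [else (loop (cdr todo) (add1 count))])))
; > (count-tree-iter '((a . b) . c))
; 3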
If so, one day might functional compilers and interpreters be intelligent enough to perform the conversion automatically?
Such conversion will not decrease space complexity.
As Pascal Cuoq commented, another way is to use CPS; all calls are tail recursive then.
I don't think something like tak could be implemented using only tail calls (not allowing continuations).
