recursion in implementation of "partition" function - recursion

I was randomly reading through Clojure source code and I saw that partition function was defined in terms of recursion without using "recur":
(defn partition
... ...
([n step coll]
(lazy-seq
(when-let [s (seq coll)]
(let [p (doall (take n s))]
(when (= n (count p))
(cons p (partition n step (nthrest s step))))))))
... ...)
Is there any reason of doing this?

Partition is lazy. The recursive call to partition occurs within the body of a lazy-seq. Therefore, it is not immediately invoked but returned in a special seq-able object to be evaluated when needed and cache the results realized thus far. The stack depth is limited to one invocation at a time.
A recur without a lazy-seq could be used to create an eager version, but you would not want to use it on sequences of indeterminate length as you can with the version in core.

To build on #A.Webb's answer and #amalloy's comment:
recur is not a shorthand to call the function and it's not a function. It's a special form (what in another language you would call a syntax) to perform tail call optimization.
Tail call optimization is a technique that allows to use recursion without blowing up the stack (without it, every recursive call adds its call frame to the stack). It is not implemented natively in Java (yet), which is why recur is used to mark a tail call in Clojure.
Recursion using lazy-seq is different because it delays the recursive call by wrapping it in a closure. What it means is that a call to a function implemented in terms of lazy-seq (and in particular every recursive call in such a function) returns (immediately) a LazySeq sequence, whose computation is delayed until it is iterated through.
To illustrate and qualify #amalloy's comment that recur and laziness are mutually exclusive, here's an implementation of filter that uses both techniques:
(defn filter [pred coll]
(letfn [(step [pred coll]
(when-let [[x & more] (seq coll)]
(if (pred x)
(cons x (lazy-seq (step pred more))) ;; lazy recursive call
(recur pred more))))] ;; tail call
(lazy-seq (step pred coll))))
(filter even? (range 10))
;; => (0 2 4 6 8)
Both techniques can be used in the same function, but not for the same recursive call ; the function wouldn't compile if the lazy recursive call to step used recur because in this case recur wouldn't be in tail call position (the result of the tail call would not be returned directly but would be passed to lazy-seq).

All of the lazy functions are written this way. This implementation of partition would blow the stack without the call to lazy-seq for a large enough sequence.
Read a bit about TCO (tail call optimization) if you are more interested in how recur works. When you are using tail recursion it means you can jump out of your current function call without losing anything. In the case of this implementation you wouldn't be able to do that because you are cons-ing p on the next call of partition. Being in tail position means you are the last thing being called. In the implementation cons is in tail position. recur only works on tail position to guarantee TCO.

Related

Is this Scheme function recursive?

Given the following function, am I allowed to say that it is recursive? Why I ask this question is because the 'fac' function actually doesn't call itself recursively, so am I still allowed to say that it is a recursive function even though the only function that calls itself is fac-iter?
(define (fac n)
(define (fac-iter counter result)
(if (= counter 0)
result
(fac-iter (- counter 1) (* result counter))))
(fac-iter n 1))
fac is not recursive: it does not refer to its own definition in any way.
fac-iter is recursive: it does refer to its own definition. But in Scheme it will create an iterative process, since its calls to itself are tail calls.
(In casual speech I think people would often say that neither fac nor fac-iter is recursive in Scheme, but I think speaking more precisely the above is correct.)
One problem with calling fac formally recursive is that fac-iter is liftable out of fac. You can write the code like this:
(define (fac-iter counter result)
(if (= counter 0)
result
(fac-iter (- counter 1) (* result counter))))
(define (fac n)
(fac-iter n 1))
fac is an interface which is implemented by a recursive helper function.
If the helper function had a reason to be inside fac, such as accessing the parent's local variables, then there would be more justification for calling fac formally recursive: a significant piece of the interior of fac, a local funcction doing the bulk of the work, is internally recursive, and that interior cannot be moved to the top level without some refactoring.
Informally we can call fac recursive regardless if what we mean by that is that the substantial bulk of its work is done by a recursive approach. We are emphasizing the algorithm that is used, not the details over how it is integrated.
If a homework problem states "please implement a recursive solution to the binary search problem", and the solution is required to take the form of a bsearch.scm file, then obviously the problem statement doesn't mean that the bsearch.scm file must literally invoke itself, right? It means that the main algorithmic content in that file is recursive.
Or when we say that "the POSIX utility find performs a recursive traversal of the filesystem" we don't mean that find forks and executes a copy of itself for every directory it visits.
There is room for informality: calling something recursive without meaning that the entry point of that thing which has that thing's name is literally calling itself.
On another note, in some situations the term "recursion" in the Scheme context is used to denote recursion that requires stack storage; tail calls that are required to be rewritten to express iteration aren't called recursion. That's just taking the point of view of the implementation; what the compiled code is doing. Tail calls are sometimes called "stackless recursion" as a kind of compromise. The situation is complicated because tail calls alone do not eliminate true recursion. There is a way of compiling programs such that all procedure calls become tail calls, namely transformation to CPS (continuation passing style). Yet if the source program performs true recursion that requires a stack, the CPS-transformed program will also, in spite of using nothing but tail calls. What will happen is that an ad hoc stack will emerge via a chain of captured lambda environments. A lambda being used as a continuation captures the previous continuation as a lexical variable. The previous continuation itself captures another such a continuation in its environment, and so on. A heap-allocated chain emerges which constitutes the de facto return stack for the recursion. For reasons like this we cannot automatically conclude that when we see tail calls, we have iteration and not recursion.
An example looks like this. The traversal of a binary tree is truly recursive, right? When we visit the left child, that visitation must return, so that we can then visit the right child. The right child visit can be a tail call, but the left one isn't. Under CPS, they can both be tail calls:
(define (traverse tree contin)
(cond
[(null? tree) (contin)] ;; tail call to continuation
[else (traverse (tree-left tree) ;; tail call to traverse
(lambda ()
(traverse (right tree) contin)))])) ;; ditto!
so here, when the left node is traversed, that is a tail call: the last thing our procedure does is call (traverse (tree-left tree) (lambda ...)). But it passes that lambda as a continuation, and that continuation contains more statements to execute when it is invoked, which is essentially the same as if control returned there via a procedure retun. If we take the point of view that tail calls aren't recursion then we are justified in saying that the function isn't recursive. Yet it has the recursive control flow structure, uses storage proportional to the left depth of the tree, and does so without appearing to maintain an explicit stack structure. As if that weren't enough, the following obviously recursive program can be automatically converted to the above:
(define (traverse tree)
(cond
[(null? tree)] ;; return
[else (traverse (tree-left tree))
(traverse (tree-right tree))]))
The CPS transformation inserts the continuations and lambdas, turning everything into tail calls that pass a continuation argument.

Reversing list vs non tail recursion when traversing lists

I wonder how do you, experienced lispers / functional programmers usually make decision what to use. Compare:
(define (my-map1 f lst)
(reverse
(let loop ([lst lst] [acc '()])
(if (empty? lst)
acc
(loop (cdr lst) (cons (f (car lst)) acc))))))
and
(define (my-map2 f lst)
(if (empty? lst)
'()
(cons (f (car lst)) (my-map2 f (cdr lst)))))
The problem can be described in the following way: whenever we have to traverse a list, should we collect results in accumulator, which preserves tail recursion, but requires list reversion in the end? Or should we use unoptimized recursion, but then we don't have to reverse anything?
It seems to me the first solution is always better. Indeed, there's additional complexity (O(n)) there. However, it uses much less memory, let alone calling a function isn't done instantly.
Yet I've seen different examples where the second approach was used. Either I'm missing something or these examples were only educational. Are there situations where unoptimized recursion is better?
When possible, I use higher-order functions like map which build a list under the hood. In Common Lisp I also tend to use loop a lot, which has a collect keyword for building list in a forward way (I also use the series library which also implements it transparently).
I sometimes use recursive functions that are not tail-recursive because they better express what I want and because the size of the list is going to be relatively small; in particular, when writing a macro, the code being manipulated is not usually very large.
For more complex problems I don't collect into lists, I generally accept a callback function that is being called for each solution. This ensures that the work is more clearly separated between how the data is produced and how it is used.
This approach is to me the most flexible of all, because no assumption is made about how the data should be processed or collected. But it also means that the callback function is likely to perform side-effects or non-local returns (see example below). I don't think it is particularly a problem as long the the scope of the side-effects is small (local to a function).
For example, if I want to have a function that generates all natural numbers between 0 and N-1, I write:
(defun range (n f)
(dotimes (i n)
(funcall f i)))
The implementation here iterates over all values from 0 below N and calls F with the value I.
If I wanted to collect them in a list, I'd write:
(defun range-list (N)
(let ((list nil))
(range N (lambda (v) (push v list)))
(nreverse list)))
But, I can also avoid the whole push/nreverse idiom by using a queue. A queue in Lisp can be implemented as a pair (first . last) that keeps track of the first and last cons cells of the underlying linked-list collection. This allows to append elements in constant time to the end, because there is no need to iterate over the list (see Implementing queues in Lisp by P. Norvig, 1991).
(defun queue ()
(let ((list (list nil)))
(cons list list)))
(defun qpush (queue element)
(setf (cdr queue)
(setf (cddr queue)
(list element))))
(defun qlist (queue)
(cdar queue))
And so, the alternative version of the function would be:
(defun range-list (n)
(let ((q (queue)))
(range N (lambda (v) (qpush q v)))
(qlist q)))
The generator/callback approach is also useful when you don't want to build all the elements; it is a bit like the lazy model of evaluation (e.g. Haskell) where you only use the items you need.
Imagine you want to use range to find the first empty slot in a vector, you could do this:
(defun empty-index (vector)
(block nil
(range (length vector)
(lambda (d)
(when (null (aref vector d))
(return d))))))
Here, the block of lexical name nil allows the anonymous function to call return to exit the block with a return value.
In other languages, the same behaviour is often reversed inside-out: we use iterator objects with a cursor and next operations. I tend to think it is simpler to write the iteration plainly and call a callback function, but this would be another interesting approach too.
Tail recursion with accumulator
Traverses the list twice
Constructs two lists
Constant stack space
Can crash with malloc errors
Naive recursion
Traverses list twice (once building up the stack, once tearing down the stack).
Constructs one list
Linear stack space
Can crash with stack overflow (unlikely in racket), or malloc errors
It seems to me the first solution is always better
Allocations are generally more time-expensive than extra stack frames, so I think the latter one will be faster (you'll have to benchmark it to know for sure though).
Are there situations where unoptimized recursion is better?
Yes, if you are creating a lazily evaluated structure, in haskell, you need the cons-cell as the evaluation boundary, and you can't lazily evaluate a tail recursive call.
Benchmarking is the only way to know for sure, racket has deep stack frames, so you should be able to get away with both versions.
The stdlib version is quite horrific, which shows that you can usually squeeze out some performance if you're willing to sacrifice readability.
Given two implementations of the same function, with the same O notation, I will choose the simpler version 95% of the time.
There are many ways to make recursion preserving iterative process.
I usually do continuation passing style directly. This is my "natural" way to do it.
One takes into account the type of the function. Sometimes you need to connect your function with the functions around it and depending on their type you can choose another way to do recursion.
You should start by solving "the little schemer" to gain a strong foundation about it. In the "little typer" you can discover another type of doing recursion, founded on other computational philosophy, used in languages like agda, coq.
In scheme you can write code that is actually haskell sometimes (you can write monadic code that would be generated by a haskell compiler as intermediate language). In that case the way to do recursion is also different that "usual" way, etc.
false dichotomy
You have other options available to you. Here we can preserve tail-recursion and map over the list with a single traversal. The technique used here is called continuation-passing style -
(define (map f lst (return identity))
(if (null? lst)
(return null)
(map f
(cdr lst)
(lambda (r) (return (cons (f (car lst)) r))))))
(define (square x)
(* x x))
(map square '(1 2 3 4))
'(1 4 9 16)
This question is tagged with racket, which has built-in support for delimited continuations. We can accomplish map using a single traversal, but this time without using recursion. Enjoy -
(require racket/control)
(define (yield x)
(shift return (cons x (return (void)))))
(define (map f lst)
(reset (begin
(for ((x lst))
(yield (f x)))
null)))
(define (square x)
(* x x))
(map square '(1 2 3 4))
'(1 4 9 16)
It's my intention that this post will show you the detriment of pigeonholing your mind into a particular construct. The beauty of Scheme/Racket, I have come to learn, is that any implementation you can dream of is available to you.
I would highly recommend Beautiful Racket by Matthew Butterick. This easy-to-approach and freely-available ebook shatters the glass ceiling in your mind and shows you how to think about your solutions in a language-oriented way.

How to understand clojure's lazy-seq

I'm trying to understand clojure's lazy-seq operator, and the concept of lazy evaluation in general. I know the basic idea behind the concept: Evaluation of an expression is delayed until the value is needed.
In general, this is achievable in two ways:
at compile time using macros or special forms;
at runtime using lambda functions
With lazy evaluation techniques, it is possible to construct infinite data structures that are evaluated as consumed. These infinite sequences utilizes lambdas, closures and recursion. In clojure, these infinite data structures are generated using lazy-seq and cons forms.
I want to understand how lazy-seq does it's magic. I know it is actually a macro. Consider the following example.
(defn rep [n]
(lazy-seq (cons n (rep n))))
Here, the rep function returns a lazily-evaluated sequence of type LazySeq, which now can be transformed and consumed (thus evaluated) using the sequence API. This API provides functions take, map, filter and reduce.
In the expanded form, we can see how lambda is utilized to store the recipe for the cell without evaluating it immediately.
(defn rep [n]
(new clojure.lang.LazySeq (fn* [] (cons n (rep n)))))
But how does the sequence API actually work with LazySeq?
What actually happens in the following expression?
(reduce + (take 3 (map inc (rep 5))))
How is the intermediate operation map applied to the sequence,
how does take limit the sequence and
how does terminal operation reduce evaluate the sequence?
Also, how do these functions work with either a Vector or a LazySeq?
Also, is it possible to generate nested infinite data structures?: list containing lists, containing lists, containing lists... going infinitely wide and deep, evaluated as consumed with the sequence API?
And last question, is there any practical difference between this
(defn rep [n]
(lazy-seq (cons n (rep n))))
and this?
(defn rep [n]
(cons n (lazy-seq (rep n))))
That's a lot of questions!
How does the seq API actually works with LazySeq?
If you take a look at LazySeq's class source code you will notice that it implements ISeq interface providing methods like first, more and next.
Functions like map, take and filter are built using lazy-seq (they produce lazy sequences) and first and rest (which in turn uses more) and that's how they can work with lazy seq as their input collection - by using first and more implementations of LazySeq class.
What actually happens in the following expression?
(reduce + (take 3 (map inc (rep 5))))
The key is to look how LazySeq.first works. It will invoke the wrapped function to obtain and memoize the result. In your case it will be the following code:
(cons n (rep n))
Thus it will be a cons cell with n as its value and another LazySeq instance (result of a recursive call to rep) as its rest part. It will become the realised value of this LazySeq object and first will return the value of the cached cons cell.
When you call more on it, it will in the same way ensure that the value of the particular LazySeq object is realised (or reused memoized value) and call more on it (in this case more on the cons cell containing another LazySeq object).
Once you obtain another instance of LazySeq object with more the story repeats when you call first on it.
map and take will create another lazy-seq that will call first and more of the collection passed as their argument (just another lazy seq) so it will be similar story. The difference will be only in how the values passed to cons are generated (e.g. calling f to a value obtained by first invoked on the LazySeq value mapped over in map instead of a raw value like n in your rep function).
With reduce it's a bit simpler as it will use loop with first and more to iterate over the input lazy seq and apply the reducing function to produce the final result.
As the actual implementation looks like for map and take I encourage you to check their source code - it's quite easy to follow.
How seq API can works with different collection types (e.g. lazy seq and persistent vector)?
As mentioned above, map, take and other functions work in terms of first and rest (reminder - rest is implemented on top of more). Thus we need to explain how first and rest/more can work with different collection types: they check if the collection implements ISeq (and then it implement those functions directly) or they try to create a seq view of the collection and coll its implementation of first and more.
Is it possible to generate nested infinite data structures?
It's definitely possible but I am not sure what the exact data shape you would like to get. Do you mean getting a lazy seq which generates another sequence as it's value (instead of a single value like n in your rep) but returns it as a flat sequence?
(defn nested-cons [n]
(lazy-seq (cons (repeat n n) (nested-cons (inc n)))))
(take 3 (nested-cons 1))
;; => ((1) (2 2) (3 3 3))
that would rather return (1 2 2 3 3 3)?
For such cases you might use concat instead of cons which creates a lazy sequence of two or more sequences:
(defn nested-concat [n]
(lazy-seq (concat (repeat n n) (nested-concat (inc n)))))
(take 6 (nested-concat 1))
;; => (1 2 2 3 3 3)
Is there any practical difference with this
(defn rep [n]
(lazy-seq (cons n (rep n))))
and this?
(defn rep [n]
(cons n (lazy-seq (rep n))))
In this particular case not really. But in the case where a cons cell doesn't wrap a raw value but a result of a function call to calculate it, the latter form is not fully lazy. For example:
(defn calculate-sth [n]
(println "Calculating" n)
n)
(defn rep1 [n]
(lazy-seq (cons (calculate-sth n) (rep1 (inc n)))))
(defn rep2 [n]
(cons (calculate-sth n) (lazy-seq (rep2 (inc n)))))
(take 0 (rep1 1))
;; => ()
(take 0 (rep2 1))
;; Prints: Calculating 1
;; => ()
Thus the latter form will evaluate its first element even if you might not need it.

Scheme recursive or iterative

(define unique?
(lambda (l)
(or (null? l)
(null? (cdr l))
(and (not (in? (car l) (cdr l)))
(unique? (cdr l)))))
Can someone please tell me if it's an iteratibe or recursive procedure? I guess it is iterative, but I am not sure and I don't know how to explain i
You code is clearly recursive because unique? calls unique?. The procedure unique? will be iterative if the recursive call occurs in the tail position. The R6RS standard, see Section 11.20 'Tail calls and tail contexts', details the tail contexts for each syntactic form. Analyzing your lambda we reason as:
The last expression in a lambda is a tail context, so your or ... is a tail expression
The last expression in a or is a tail expression, so your and ... is a tail expression
The last expression in a and is a tail expression, so your unique? is a tail call
Therefore, unique? calls itself as a tail call and is thus iterative.
The function is recursive because it calls itself. Note, however that since it is tail-recursive, it can be compiled to iterative form.
The criteria for a function to be tail recursive is simply that there be nothing to do after the recursive call returns, which is true in your case.
How to convert a tail-recursive function into a loop
To transform a tail-recursive function into a while loop, follow the
following steps. (There is not an automatic function for converting an
arbitrary recursive function into a while loop--you have to convert it
into a tail-recursive function first.)
Turn the helper function parameters into local variables; their
initial values are the same as the initial values for the call to
the helper function from the main function.
The continue test is the same as the test in the helper function,
for the direction that causes a tail-recursive call to be made. (If
the tail-recursive call appears in the else part of the if, then add
a call to not.)
The body uses set! to update the value of each variable; the values
are the same as the arguments passed to the tail-recursive call.
Return the value of the accumulator; this comes after the while
loop.
Source: Tail recursion and loops
Since the other two answers explained what and why your function is tail-recursive; I am just going to leave an easy way for beginners to remember the difference:
A tail-recursive function does not operate on the recursive call.
A non-tail-recursive function does operate on the recursive call.

Why aren't recursive calls automagically replaced by recur?

In the following (Clojure) SO question: my own interpose function as an exercise
The accepted answers says this:
Replace your recursive call with a call to recur because as written it
will hit a stack overflow
(defn foo [stuff]
(dostuff ... )
(foo (rest stuff)))
becomes:
(defn foo [stuff]
(dostuff ...)
(recur (rest stuff)))
to avoid blowing the stack.
It may be a silly question but I'm wondering why the recursive call to foo isn't automatically replaced by recur?
Also, I took another SO example and wrote this ( without using cond on purpose, just to try things out):
(defn is-member [elem ilist]
(if (empty? ilist)
false
(if (= elem (first ilist))
true
(is-member elem (rest ilist)))))
And I was wondering if I should replace the call to is-member with recur (which also seems to work) or not.
Are there cases where you do recurse and specifically should not use recur?
There's pretty much never a reason not to use recur if you have a tail-recursive method, although unless you're in a performance-sensitive area of code it just won't make any difference.
I think the basic argument is that having recur be explicit makes it very clear whether a function is tail-recursive or not; all tail-recursive functions use recur, and all functions that recur are tail-recursive (the compiler will complain if you try to use recur from a non-tail-position.) So it's a bit of an aesthetic decision.
recur also helps distinguish Clojure from languages which will do TCO on all tail calls, like Scheme; Clojure can't do that effectively because its functions are compiled as Java functions and the JVM doesn't support it. By making recur a special case, there's hopefully no inflated expectations.
I don't think there would be any technical reason why the compiler couldn't insert recur for you, if it were designed that way, but maybe someone will correct me.
I asked Rich Hickey that and his reasoning was basically (and I paraphrase)
"make the special cases look special"
he did not see the value in papering over a special case most of the time and leaving people
to wonder why if blows the stack later when something changes and the compiler can't guarantee the optimization. Untimely it was just one of the design decisions made to try and keep the language simple
I was wondering if I should replace the call to is-member with recur
In general, as mquander says, there is no reason to not use recur whenever you can. With small inputs (a few dozen to a few hundred elements) they are the same, but the version without recur will blow up on large inputs (a few thousand elements).
Explicit recursion (i.e. without 'recur') is good for many things, but iterating through long sequences is not one of them.
Are there cases where you specifically should not use recur?
Only when you can't use it, which is when
it is not tail recursive - i.e. it wants to do something with the return value.
the recursion is to a different function.
Some examples:
(defn foo [coll]
(when coll
(println (first coll))
(recur (next coll))) ;; OK: Tail recursive
(defn fie [coll]
(when coll
(cons (first coll)
(fie (next coll))))) ;; Can't use recur: Not tail recursive.
(defn fum
([coll]
(fum coll [])) ;; Can't use recur: Different function.
([coll acc]
(if (empty? coll) acc
(recur (next coll) ;; OK: Tail recursive
(conj acc (first coll))))))
As to why recur isn't inserted automatically when appropriate: I don't know, but at least one positive side-effect is to make actual function calls visually distinct from the non-calls (i.e. recur).
Since this can be the difference between "works" and "blows up with StackOverflowError", I think it's a fair design choice to make it explicit - visible in the code - rather than implicit, where you would have to start second-guessing the compiler when it doesn't work as expected.

Resources