In the Clojure programming language, why does this code pass with flying colors?
(let [r (range 1e9)] [(first r) (last r)])
While this one fails:
(let [r (range 1e9)] [(last r) (first r)])
I know it is related to the "losing your head" advice, but would you please explain it to me? I haven't been able to digest it yet.
UPDATE:
It is really hard to pick the correct answer, two answers are amazingly informative.
Note: Code snippets are from "The Joy of Clojure".
To elaborate on dfan and Rafał's answers, I've taken the time to run both expressions with the YourKit profiler.
It's fascinating to see the JVM at work. The first program is so GC-friendly that the JVM really shines at managing its memory.
I drew some charts.
GC friendly: (let [r (range 1e9)] [(first r) (last r)])
This program uses very little memory; less than 6 megabytes overall. As stated earlier, it is very GC-friendly: it triggers a lot of collections, but uses very little CPU doing so.
Head holder: (let [r (range 1e9)] [(last r) (first r)])
This one is very memory-hungry. It goes up to 300 MB of RAM, but that's not enough and the program does not finish (the JVM dies less than a minute later). The GC takes up to 90% of CPU time, which indicates it is desperately trying to free any memory it can, but finds almost nothing to collect.
Edit: The second program ran out of memory, which triggered a heap dump. An analysis of this dump shows that 70% of the memory is java.lang.Integer objects, which could not be collected.
range generates elements as needed.
In the case of (let [r (range 1e9)] [(first r) (last r)]), it grabs the first element (0), then generates the billion-minus-two elements in between, throwing them away as it goes, and then grabs the last element (999,999,999). It never needs to keep any part of the sequence around.
In the case of (let [r (range 1e9)] [(last r) (first r)]), it generates a billion elements in order to be able to evaluate (last r), but it also has to hold on to the beginning of the list it's generating in order to later evaluate (first r). So it's not able to throw anything away as it goes, and (I presume) runs out of memory.
What really holds the head here is the binding of the sequence to r (not the already-evaluated (first r), since you cannot reconstruct the whole sequence from its value).
In the first case the binding no longer exists when (last r) is evaluated, since there are no more expressions with r to evaluate. In the second case, the existence of the not-yet-evaluated (first r) means that the evaluator needs to keep the binding to r.
To show the difference, this evaluates OK:
user> (let [r (range 1e8) a 7] [(last r) ((constantly 5) a)])
[99999999 5]
While this fails:
(let [r (range 1e8) a 7] [(last r) ((constantly 5) r)])
Even though the expression following (last r) ignores the value of r, the evaluator is not that smart and keeps the binding to r, thus retaining the whole sequence.
Edit: I have found a post where Rich Hickey explains the details of the mechanism responsible for clearing the reference to the head in the above cases. Here it is: Rich Hickey on locals clearing
For a technical description, go to http://clojure.org/lazy. The advice is mentioned in the section "Don't hang (onto) your head".
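A further illustration of what "holding the head" means (my own sketch, not from the answers above): bind the same range to a top-level Var instead of a local. The Var keeps the head reachable for as long as it exists, so even the "GC-friendly" ordering of first and last will exhaust memory.

;; Hypothetical REPL sketch: a top-level Var pins the head of the lazy sequence,
;; so nothing can be collected while `last` walks to the end.
(def r (range 1e9))     ; the Var `r` now permanently refers to the head
[(first r) (last r)]    ; likely ends in OutOfMemoryError, even in this ordering

Inside a let, by contrast, the locals clearing described in the Rich Hickey post linked above can null out the reference to r as soon as it is no longer needed, which is exactly why the ordering of first and last matters there.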
Related
The following paragraph is from The Racket Guide (2.3.4):
At the same time, recursion does not lead to particularly bad
performance in Racket, and there is no such thing as stack overflow;
you can run out of memory if a computation involves too much context,
but exhausting memory typically requires orders of magnitude deeper
recursion than would trigger a stack overflow in other languages.
I'm curious how Racket was designed to avoid stack overflow. What's more, why can't other languages like C avoid this problem?
First, some terminology: making a non-tail call requires a context frame to store local variables, parent return address, etc. So the question is how to represent an arbitrarily large context. "The stack" (or call stack) is just one (admittedly common) implementation strategy for the context.
Here are a few implementation strategies for deep recursion (i.e., large contexts):
Allocate context frames on the heap and let the GC be responsible for cleaning them up. (This is nice and simple but probably relatively slow, although people would argue that point; a small sketch of this idea follows the list below.)
Allocate context frames on the stack. When the stack is full, copy the frames currently on the stack into the heap, clear the stack, and reset the stack pointer to the bottom. When returning, if the stack is empty, copy frames from the heap back to the stack. (This means you can't have pointers to stack-allocated objects, though, since the objects get moved around.)
Allocate context frames on the stack. When the stack is full, allocate a new big chunk of memory, call that the new stack (ie set the stack pointer), and keep going. (This might require mprotect or other operations to convince the OS that the new block of memory is okay to treat as a call stack.)
Allocate context frames on the stack. When the stack is full, make a new thread to continue the computation, and wait for the thread to finish and arrange to grab a return value from it to return to the old thread's stack. (This strategy can be useful on platforms like the JVM that don't let you directly control the stack, stack pointer, etc. On the other hand, it complicates features like thread-local storage, etc.)
... and more variations on the strategies above.
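To make the first strategy a little more concrete, here is a minimal sketch in Clojure (an illustration only, not how Racket is actually implemented): the pending context is represented as heap-allocated continuation closures, and clojure.core/trampoline drives the computation, so the recursion depth is bounded by the heap rather than by the call stack.

;; A naive non-tail-recursive sum-to would overflow the JVM stack for large n.
;; In continuation-passing style every pending "frame" becomes a closure on the
;; heap, and each step returns a thunk for trampoline to call next.
(defn sum-to-cps [n k]
  (if (zero? n)
    (fn [] (k 0))
    (fn [] (sum-to-cps (dec n)
                       (fn [acc] (fn [] (k (+ n acc))))))))

;; (trampoline sum-to-cps 1000000 identity) ;=> 500000500000, with no stack growth

Nothing here relies on the machine stack, which is the essence of "allocate context frames on the heap and let the GC be responsible for cleaning them up".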
Support for deep recursion often coincides with support for first-class continuations. In general, implementing first-class continuations means you almost automatically get support for deep recursion. There's a nice paper called Implementation Strategies for First-class Continuations by Will Clinger et al. with more detail and comparisons between different strategies.
There are two pieces to this answer.
First, in Racket and other functional languages, tail calls do not create additional stack frames. That is, a loop such as
(define (f x) (f x))
... can run forever without using any stack space at all. Many non-functional languages don't prioritize function calling in the same way as functional languages, and therefore aren't properly tail-calling.
HOWEVER, the comment that you're referring to isn't just limited to tail-calling; Racket allows very deeply nested stack frames.
Your question is a good one: why don't other languages allow deeply nested stack frames? I wrote a short test, and it looks like C unceremoniously dumps core at a depth of between 262,000 and 263,000. I wrote a simple racket test that does the same thing (being careful to ensure the recursive call was not in tail position), and I interrupted it at a depth of 48,000,000 without any apparent ill effects (except, presumably, a fairly large runtime stack).
To answer your question directly, there's no reason that I'm aware of that C couldn't allow much much more deeply nested stacks, but I think that for most C programmers, a recursion depth of 262K is plenty.
Not for us, though!
Here's my C code:
#include <stdio.h>
int f(int depth){
  if ((depth % 1000) == 0) {
    printf("%d\n", depth);
  }
  return f(depth + 1);
}

int main() {
  return f(0);
}
... and my racket code:
#lang racket
(define (f depth)
  (when (= (modulo depth 1000) 0)
    (printf "~v\n" depth))
  (f (add1 depth))
  (printf "this will never print..."))

(f 0)
EDIT: here's the version that uses randomness on the way out to stymie possible optimizations:
#lang racket
(define (f depth)
  (when (= (modulo depth 1000000) 0)
    (printf "~v\n" depth))
  (when (< depth 50000000)
    (f (add1 depth)))
  (when (< (random) (/ 1.0 100000))
    (printf "X")))

(f 0)
Also, my observations of the process size are consistent with a stack frame of about 16 bytes, plus or minus; 50M * 16 bytes = 800 Megabytes, and the observed size of the stack is about 1.2 Gigabytes.
I have written a function is-prime that verifies whether a given number is a prime number or not, and returns t or nil accordingly.
(is-prime 2) ; => T
(is-prime 3) ; => T
(is-prime 4) ; => NIL
So far, so good. Now I would like to generate a list of prime numbers between min and max, so I would like to come up with a function that takes those two values as parameters and returns a list of all prime numbers:
(defun get-primes (min max)
...)
Now this is where I am currently stuck. What I could do, of course, is create a list with the range of all numbers from min to max and run remove-if-not on it.
Anyway, this means that I would first have to create a potentially huge list with lots of numbers that I then throw away anyway. So I would like to do it the other way round: build up a list that contains only the numbers between min and max that actually are prime according to the is-prime predicate.
How can I do this in a functional way, i.e. without using loop? My current approach (with loop) looks like this:
(defun get-primes (min max)
  (loop
    for guess from min to max
    when (is-prime guess)
    collect guess))
Maybe this is a totally dumb question, but I guess I don't see the forest for the trees.
Any hints?
Common Lisp does not favor pure functional programming approaches. The language is based on a more pragmatic view of an underlying machine: no TCO, stacks are limited, various resources are limited (the number of arguments allowed, etc.), and mutation is possible. That does not sound very motivating for any Haskell enthusiast, but Common Lisp was developed to write Lisp applications, not to advance FP research and development. Evaluation in Common Lisp (as usual in Lisp) is strict, not lazy, and the default data structures are not lazy either, though there are lazy libraries. Common Lisp also does not provide, in the standard, features like continuations or coroutines, which might be useful in this case.
The default Common Lisp approach is non-functional and uses some kind of iteration construct: DO, LOOP or the ITERATE library. Like in your example. Common Lisp users find it perfectly fine. Some, like me, think that Iterate is the best looping construct ever invented. ;-) The advantage is that it is relatively easy to create efficient code, even though the macros have a large implementation.
There are various ways to do it differently:
Lazy streams. Reading SICP helps; see the part on streams. This can easily be done in Common Lisp, too. Note that Common Lisp uses the word stream for I/O streams, which is something different. (A small sketch of this on-demand style follows the list below.)
Use a generator function next-prime. Create this function from the initial range and call it until you have got all the primes you are interested in.
Here is a simple example of a generator function, generating even numbers from some start value:
(defun make-next-even-fn (start)
  (let ((current (- start
                    (if (evenp start) 2 1))))
    (lambda ()
      (incf current 2))))
CL-USER 14 > (let ((next-even-fn (make-next-even-fn 13)))
               (flet ((get-next-even ()
                        (funcall next-even-fn)))
                 (print (get-next-even))
                 (print (get-next-even))
                 (print (get-next-even))
                 (print (get-next-even))
                 (list (get-next-even)
                       (get-next-even)
                       (get-next-even))))
14
16
18
20
(22 24 26)
Defining the next-prime generator is left as an exercise. ;-)
Use some kind of more sophisticated lazy data structure. For Common Lisp there is Series, which was an early iteration proposal, also published in CLtL2: Series. The new book Common Lisp Recipes by Edi Weitz (a math professor in Hamburg) has a Series example for computing primes; see chapter 7.15. You can download the source code for the book and find the example there.
There are simpler variants of Series, for example pipes.
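As a small illustration of the lazy, on-demand style from the first bullet above, here is the same shape of solution written in Clojure rather than Common Lisp (a sketch for comparison only; it assumes an is-prime? predicate and does not spoil the next-prime exercise):

;; Both range and filter are lazy, so no full candidate list is ever materialized;
;; the primes between lo and hi are produced only as they are consumed.
(defn get-primes [lo hi]
  (filter is-prime? (range lo (inc hi))))

;; (take 5 (get-primes 10 100)) ;=> (11 13 17 19 23)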
The Third Commandment of The Little Schemer states:
When building a list, describe the first typical element, and then cons it onto the natural recursion.
What is the exact definition of "natural recursion"? The reason why I am asking is because I am taking a class on programming language principles by Daniel Friedman and the following code is not considered "naturally recursive":
(define (plus x y)
  (if (zero? y) x
      (plus (add1 x) (sub1 y))))
However, the following code is considered "naturally recursive":
(define (plus x y)
  (if (zero? y) x
      (add1 (plus x (sub1 y)))))
I prefer the "unnaturally recursive" code because it is tail recursive. However, such code is considered anathema. When I asked as to why we shouldn't write the function in tail recursive form then the associate instructor simply replied, "You don't mess with the natural recursion."
What's the advantage of writing the function in the "naturally recursive" form?
"Natural" (or just "Structural") recursion is the best way to start teaching students about recursion. This is because it has the wonderful guarantee that Joshua Taylor points out: it's guaranteed to terminate[*]. Students have a hard enough time wrapping their heads around this kind of program that making this a "rule" can save them a huge amount of head-against-wall-banging.
When you choose to leave the realm of structural recursion, you (the programmer) have taken on an additional responsibility, which is to ensure that your program halts on all inputs; it's one more thing to think about & prove.
In your case, it's a bit more subtle. You have two arguments, and you're making structurally recursive calls on the second one. In fact, with this observation (program is structurally recursive on argument 2), I would argue that your original program is pretty much just as legitimate as the non-tail-calling one, since it inherits the same proof-of-convergence. Ask Dan about this; I'd be interested to hear what he has to say.
[*] To be precise here you have to legislate out all kinds of other goofy stuff like calls to other functions that don't terminate, etc.
The natural recursion has to do with the "natural", recursive definition of the type you are dealing with. Here, you are working with natural numbers; since "obviously" a natural number is either zero or the successor of another natural number, when you want to build a natural number, you naturally output 0 or (add1 z) for some other natural z which happens to be computed recursively.
The teacher probably wants you to make the link between recursive type definitions and recursive processing of values of that type. You would not have this kind of problem if you were processing trees or lists instead of numbers, because we routinely use natural numbers in "unnatural" ways, and so you may have understandable objections to thinking of them in zero-and-successor (Church numeral) terms.
The fact that you already know how to write tail-recursive functions is irrelevant in that context: this is apparently not the objective of your teacher to talk about tail-call optimizations, at least for now.
The associate instructor was not very helpful at first ("messing with natural recursion" sounds like "don't ask"), but the detailed explanation he/she gave in the snapshot you posted was more appropriate.
(define (plus x y)
  (if (zero? y) x
      (add1 (plus x (sub1 y)))))
When y != 0 it has to remember that once the result of (plus x (sub1 y)) is known, it has to compute add1 on it. Hence when y reaches zero, the recursion is at its deepest. Now the backtracking phase begins and the add1's are executed. This process can be observed using trace.
I did the trace for:
(require racket/trace)
(define (add1 x) ...)
(define (sub1 x) ...)
(define (plus x y) ...)
(trace plus)
(plus 2 3)
Here's the trace:
>(plus 2 3)
> (plus 2 2)
> >(plus 2 1)
> > (plus 2 0) // Deepest point of recursion
< < 2 // Backtracking begins, performing add1 on the results
< <3
< 4
<5
5 // Result
The difference is that the other version has no backtracking phase. It calls itself a number of times, but the process is iterative, because the intermediate results are remembered as they are passed along as arguments. Hence the process does not consume extra space.
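For comparison, here is the same accumulating shape written in Clojure (my sketch, not part of the original answer), where the tail call has to be made explicit with recur. Nothing is pending after the recursive call, so the stack stays flat:

;; The iterative/"unnatural" shape of plus: the running result travels in the
;; arguments, so there is no backtracking phase and no stack growth.
(defn plus [x y]
  (if (zero? y)
    x
    (recur (inc x) (dec y))))

;; (plus 2 3) ;=> 5, and (plus 0 100000000) also finishes without a stack overflow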
Sometimes implementing a tail-recursive procedure is easier or more elegant than writing its iterative equivalent. But for some purposes you cannot, or may not, implement it in a recursive way.
PS: I had a class which covered a bit about garbage collection algorithms. Such algorithms may not be recursive, as there may be no memory left, and hence no space for the recursion. I remember an algorithm called Deutsch-Schorr-Waite which was really hard to understand at first. The professor first implemented the recursive version just to convey the concept, and afterwards wrote the iterative version (manually having to remember where in memory he came from); believe me, the recursive one was way easier, but could not be used in practice...
I'm new to Clojure and I think my approach to writing code so far is not in line with the "Way of Clojure". At least, I keep writing functions that lead to StackOverflowError with large values. I've learned about using recur, which has been a good step forward. But how do I make functions like the one below work for values like 2500000?
(defn fib [i]
  (if (>= 2 i)
    1
    (+ (fib (dec i))
       (fib (- i 2)))))
The function is, to my eyes, the "plain" implementation of a Fibonacci generator. I've seen other implementations that are much more optimized, but less obvious in terms of what they do. I.e. when you read the function definition, you don't go "oh, fibonacci".
Any pointers would be greatly appreciated!
You need to have a mental model of how your function works. Let's say that you execute your function yourself, using scraps of paper for each invocation. First scrap, you write (fib 250000), then you see "oh, I need to calculate (fib 249999) and (fib 249998) and finally add them", so you note that and start two new scraps. You can't throw away the first, because it still has things to do for you when you finish the other calculations. You can imagine that this calculation will need a lot of scraps.
Another way is not to start at the top, but at the bottom. How would you do this by hand? You would start with the first numbers, 1, 1, 2, 3, 5, 8 …, and then always add the last two, until you have done it i times. You can even throw away all numbers except the last two at each step, so you can re-use most scraps.
(defn fib [i]
  (loop [a 0
         b 1
         n 1]
    (if (>= n i)
      b
      (recur b
             (+ a b)
             (inc n)))))
This is also a fairly obvious implementation, but of the how to, not of the what. It always seems quite elegant when you can simply write down a definition and it gets automatically transformed into an efficient calculation, but programming is that transformation. If something gets transformed automatically, then this particular problem has already been solved (often in a more general way).
Thinking "how would I do this step by step on paper" often leads to a good implementaion.
A Fibonacci generator implemented in the "plain" way, as in the definition of the sequence, will always blow your stack up. Neither of the two recursive calls to fib is in tail position, so such a definition cannot be optimised.
Unfortunately, if you'd like to write an efficient implementation working for big numbers you'll have to accept the fact that mathematical notation doesn't translate to code as cleanly as we'd like it to.
For instance, a non-recursive implementation can be found in clojure.contrib.lazy-seqs. A whole range of approaches to this problem can be found on the Haskell wiki. They shouldn't be difficult to understand with a knowledge of the basics of functional programming.
;;from "The Pragmatic Programmers Programming Clojure"
(defn fib [] (map first (iterate (fn [[a b]][b (+ a b)])[0N 1N])))
(nth (fib) 2500000)
The function:
Given a list lst, return all permutations of the list's contents of exactly length k, where k defaults to the length of the list if not provided.
(defun permute (lst &optional (k (length lst)))
  (if (= k 1)
      (mapcar #'list lst)
      (loop for item in lst nconcing
            (mapcar (lambda (x) (cons item x))
                    (permute (remove-if (lambda (x) (eq x item)) lst)
                             (1- k))))))
The problem:
I'm using SLIME in Emacs connected to SBCL; I haven't done much customization yet. The function works fine on smaller inputs like lst = '(1 2 3 4 5 6 7 8), k = 3, which is what it will mostly be used for in practice. However, when I call it with a large input twice in a row, the second call never returns and SBCL does not even show up in top. These are the results at the REPL:
CL-USER> (time (nth (1- 1000000) (permute '(0 1 2 3 4 5 6 7 8 9))))
Evaluation took:
12.263 seconds of real time
12.166150 seconds of total run time (10.705372 user, 1.460778 system)
[ Run times consist of 9.331 seconds GC time, and 2.836 seconds non-GC time. ]
99.21% CPU
27,105,349,193 processor cycles
930,080,016 bytes consed
(2 7 8 3 9 1 5 4 6 0)
CL-USER> (time (nth (1- 1000000) (permute '(0 1 2 3 4 5 6 7 8 9))))
And it never comes back from the second call. I can only guess that for some reason I'm doing something horrible to the garbage collector but I can't see what. Does anyone have any ideas?
One thing that's wrong in your code is your use of EQ. EQ compares for identity.
EQ is not for comparing numbers. EQ of two numbers can be true or false.
Use EQL if you want to compare by identity, numbers by value or characters. Not EQ.
Actually
(remove-if (lambda (x) (eql x item)) list)
is just
(remove item list)
For your code, the EQ bug COULD mean that permute gets called in the recursive call without a number actually having been removed from the list.
Other than that, I think SBCL is just busy with memory management. SBCL on my Mac acquired lots of memory (more than a GB) and was busy doing something. After some time the result was computed.
Your recursive function generates a huge amount of 'garbage'. LispWorks says: 1360950192 bytes.
Maybe you can come up with a more efficient implementation?
Update: garbage
Lisp provides some automatic memory management, but that does not free the programmer from thinking about space effects.
Lisp uses both a stack and the heap to allocate memory. The heap may be structured in certain ways for the GC, for example in generations, half spaces and/or areas. There are precise garbage collectors and 'conservative' garbage collectors (the latter used by SBCL on Intel machines).
When a program runs we can see various effects:
normal recursive procedures allocate space on the stack. Problem: the stack size is usually fixed (even though some Lisps can increase it in an error handler).
a program may allocate a huge amount of memory and return a large result. PERMUTE is such a function. It can return very large lists.
a program may allocate memory and use it temporarily and then the garbage collector can recycle it. The rate of creation and destruction can be very high, even though the program does not use a large amount of fixed memory.
There are more problems, though. But for each of the above the Lisp programmer (and every other programmer using a language implementation with garbage collection) has to learn how to deal with that.
Replace recursion with iteration. Replace recursion with tail recursion.
Return only the part of the result that is needed and don't generate the full solution. If you need the n-th permutation, then compute that one directly and not all permutations (a sketch of this idea follows this list). Use lazy data structures that are computed on demand. Use something like SERIES, which allows stream-like, but efficient, computation. See SICP, PAIP, and other advanced Lisp books.
Reuse memory with a resource manager. Reuse buffers instead of allocating objects all the time. Use an efficient garbage collector with special support for collecting ephemeral (short-lived) objects. Sometimes it also may help to destructively modify objects, instead of allocating new objects.
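As an illustration of computing only the n-th permutation, here is a sketch in Clojure rather than Common Lisp (nth-permutation is my own hypothetical helper, not part of the code above). Since permute on an already sorted list appears to emit permutations in lexicographic order, the element you want can be built directly from the factorial-number-system digits of n:

;; Build the nth (0-based) lexicographic permutation directly, instead of
;; materializing all permutations and calling nth on the result.
(defn nth-permutation [coll n]
  (loop [items (vec coll), n n, acc []]
    (if (empty? items)
      acc
      (let [f   (reduce *' 1 (range 1 (count items)))  ; (dec (count items))!
            idx (quot n f)]
        (recur (vec (concat (subvec items 0 idx) (subvec items (inc idx))))
               (rem n f)
               (conj acc (nth items idx)))))))

;; (nth-permutation [0 1 2 3 4 5 6 7 8 9] 999999) ;=> [2 7 8 3 9 1 5 4 6 0]
;; the same element the (time (nth ...)) call above computes, without consing
;; hundreds of megabytes of intermediate lists.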
The above deals with the space problems of real programs. Ideally our compilers or runtime infrastructure would provide some automatic support to deal with these problems, but in reality this does not really work. Most Lisp systems provide low-level functionality to deal with this, and Lisp provides mutable objects, because the experience of real-world Lisp programs has shown that programmers do want to use them to optimize their programs. If you have a large CAD application that computes the shape of turbine blades, then theoretical/purist views about immutable memory simply do not apply: the developer wants the faster/smaller code and the smaller runtime footprint.
SBCL on most platforms uses a generational garbage collector, which means that allocated memory which survives more than some number of collections will be considered for collection more rarely. Your algorithm, for the given test case, generates so much garbage that it triggers GC so many times that the actual results, which obviously have to survive the entire function runtime, are tenured, that is, moved to a final generation which is collected either very rarely or not at all. Therefore, the second run will, on standard settings for 32-bit systems, run out of heap (512 MB; this can be increased with runtime options).
Tenured data can be garbage collected by manually triggering the collector with (sb-ext:gc :full t). This is obviously implementation dependent.
From the looks of the output, you're looking at the slime-repl, right?
Try changing to the "*inferior-lisp*" buffer; you'll probably see that SBCL has dropped down to ldb (the built-in low-level debugger). Most probably, you've managed to blow the call stack.