Filter a range without using an intermediate list - functional-programming

I have written a function is-prime that verifies whether a given number is a prime number or not, and returns t or nil accordingly.
(is-prime 2) ; => T
(is-prime 3) ; => T
(is-prime 4) ; => NIL
So far, so good. Now I would like to generate a list of prime numbers between min and max, so I need a function that takes those two values as parameters and returns a list of all prime numbers in that range:
(defun get-primes (min max)
  ...)
Now this is where I am currently stuck. What I could do, of course, is create a list with the range of all numbers from min to max and run remove-if-not on it.
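For illustration, that straightforward approach might look like this (a sketch, assuming the is-prime from above):

(defun get-primes (min max)
  ;; Materialize the whole range first, then filter it.
  (remove-if-not #'is-prime
                 (loop for n from min to max collect n)))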
Anyway, this means that I first have to create a potentially huge list with lots of numbers that I throw away anyway. So I would like to do it the other way round: build up a list that contains only the numbers between min and max that actually are prime according to the is-prime predicate.
How can I do this in a functional way, i.e. without using loop? My current approach (with loop) looks like this:
(defun get-primes (min max)
  (loop
    for guess from min to max
    when (is-prime guess)
    collect guess))
Maybe this is a totally dumb question, but I guess I don't see the forest for the trees.
Any hints?

Common Lisp does not favor pure Functional Programming approaches. The language is based on a more pragmatic view of an underlying machine: there is no guaranteed TCO, stacks are limited, various resources are limited (the number of arguments a function may receive, etc.), and mutation is possible. That does not sound very motivating for any Haskell enthusiast, but Common Lisp was developed to write Lisp applications, not to advance FP research and development. Evaluation in Common Lisp (as usual in Lisp) is strict, not lazy, and the default data structures are not lazy either, though there are lazy libraries. The standard also does not provide features like continuations or coroutines, which might be useful in this case.
The default Common Lisp approach is non-functional and uses some kind of iteration construct: DO, LOOP or the ITERATE library. Like in your example. Common Lisp users find it perfectly fine. Some, like me, think that Iterate is the best looping construct ever invented. ;-) The advantage is that it is relatively easy to create efficient code, even though the macros have a large implementation.
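For instance, a sketch of the same collection using DO (again assuming the is-prime from the question):

(defun get-primes (min max)
  (do ((guess min (1+ guess))
       ;; DO steps its variables in parallel, so this step form still sees
       ;; the previous value of GUESS.
       (primes '() (if (is-prime guess) (cons guess primes) primes)))
      ((> guess max) (nreverse primes))))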
There are various ways to do it differently:
lazy streams. Reading SICP helps; see the part on streams. This can easily be done in Common Lisp, too (a small sketch follows after this list). Note that Common Lisp uses the word stream for I/O streams - which is something different.
use a generator function next-prime. Generate this function from the initial range and call it until you have all the primes you are interested in.
Here is a simple example of a generator function, generating even numbers from some start value:
(defun make-next-even-fn (start)
  (let ((current (- start
                    (if (evenp start) 2 1))))
    (lambda ()
      (incf current 2))))
CL-USER 14 > (let ((next-even-fn (make-next-even-fn 13)))
               (flet ((get-next-even ()
                        (funcall next-even-fn)))
                 (print (get-next-even))
                 (print (get-next-even))
                 (print (get-next-even))
                 (print (get-next-even))
                 (list (get-next-even)
                       (get-next-even)
                       (get-next-even))))
14
16
18
20
(22 24 26)
Defining the next-prime generator is left as an exercise. ;-)
use some kind of more sophisticated lazy data structure. For Common Lisp there is Series, an early iteration proposal that is also described in CLtL2. The book Common Lisp Recipes by Edi Weitz (a math professor in Hamburg) has a Series example for computing primes; see chapter 7.15. You can download the source code for the book and find the example there.
There are simpler variants of Series, for example pipes.
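To make the lazy-streams item above concrete, here is a minimal, hypothetical sketch (a stream is just a value consed onto a thunk; is-prime is the predicate from the question, and the names are made up for illustration):

(defun integers-from (n)
  ;; An infinite stream: the current value plus a thunk producing the rest.
  (cons n (lambda () (integers-from (1+ n)))))

(defun stream-filter (pred stream)
  ;; Skip elements until PRED holds, then keep the value and delay the rest.
  (if (funcall pred (car stream))
      (cons (car stream)
            (lambda () (stream-filter pred (funcall (cdr stream)))))
      (stream-filter pred (funcall (cdr stream)))))

(defun stream-take-while (pred stream)
  ;; Force elements into an ordinary list as long as PRED holds.
  (if (funcall pred (car stream))
      (cons (car stream)
            (stream-take-while pred (funcall (cdr stream))))
      '()))

;; (stream-take-while (lambda (n) (<= n max))
;;                    (stream-filter #'is-prime (integers-from min)))
;; collects the primes between MIN and MAX without ever building the full range.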

Related

Recursively run through a vector in Clojure

I'm just starting to play with Clojure.
How do I run through a vector of items?
My naive recursive function would have a form like the classic map, e.g.:
(defn map [f xs]
  (if (= xs [])
    []
    (cons (f (first xs)) (map f (rest xs)))))
The thing is I can't find any examples of this kind of code on the web. I find a lot of examples using built-in sequence traversing functions like for, map and loop. But no-one doing the raw recursive version.
Is that because you SHOULDN'T do this kind of thing in Clojure? (e.g. because it uses lower-level Java primitives that don't have tail-call optimisation or something?)
When you say "run through a vector" this is quite vague; as Clojure is a Lisp and thus specializes in sequence analysis and manipulation, the beauty of using this language is that you don't think in terms of "run through a vector and then do something with each element"; instead you'd more idiomatically say "pull this out of a vector" or "transform this vector into X" or "I want this vector to give me X".
It is because of this type of perspective in Lisp languages that you will see so many examples and so much production code that don't just loop/recur through a vector but rather specifically go after what is wanted in a short, idiomatic way. Using simple functions like reduce, map, filter, for, into, and others allows you to elegantly move over a sequence such as a vector while simultaneously doing what you want with the contents. In most other languages, this would be at least 2 different parts: the loop, and then the actual logic to do what you want.
You'll often find that if you think about sequences using the more imperative idea you get with languages like C, C++, Java, etc, that your code is about 4x longer (at least) than it would otherwise be if you first thought about your plan in a more functional approach.
Clojure re-uses stack frames only with tail recursion and only when you use the explicit recur call. Everything else will be stack-consuming. The above map example is not tail recursive because the cons happens after the recursive call, so it can't be TCO'd in any language. If you switch it to use continuation-passing style and an explicit call to recur instead of map, then you should be good to go.
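One stack-safe variant (a sketch with an explicit accumulator and recur, rather than full CPS; my-map is just an illustrative name) could look like this:

(defn my-map [f xs]
  (loop [remaining xs
         acc []]
    (if (seq remaining)
      ;; recur reuses the same stack frame; conj appends to the vector.
      (recur (rest remaining) (conj acc (f (first remaining))))
      (seq acc))))

;; (my-map inc [1 2 3]) ; => (2 3 4)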

Tree data structure algorithms in Scheme?

My level of knowledge so far is such that I have completed The Little Schemer without issue and I'm currently 70% through The Seasoned Schemer. I have in the back of my mind ideas for projects I would like to work on to gain some real-world experience working with Scheme, yet (perhaps because I've worked primarily with OO languages throughout my career) I still find myself wondering how one would tackle some fairly basic problems to an OO language in a functional language like Scheme.
Rather than shove all my questions in a single stackoverflow question, I'll just trickle them out over time and assume the pieces will fall into place so I actually don't need answers to other questions.
It's pretty clear that Scheme's thing is lists. Lists and lists of lists. I'm used to being able to store lists that contain "attributes" that can be retrieved quickly (i.e. hashes) and can be nested one inside the other.
Taking an example of passing around a list of files and directories in a file system, recursively, how would one approach something like this in Scheme? I guess you could pass a data structure of the form:
'(("foo" (("bar.txt")
("zip.txt")
("button.txt")))
("other.txt")
("one-more" (("time.txt"))))
Where each node is represented as the car of a list and its children represented as another list contained in the car of its cdr, thus the above is a tree structure for:
foo/
  bar.txt
  zip.txt
  button.txt
other.txt
one-more/
  time.txt
Or perhaps one would pass an iterator function that accepts a visitor instead that does some sort of deep-tree traversal? (Not entirely sure how that looks, in terms of knowing when you're switching directories).
Is there a general pattern when it comes to this sort of problem, not just directory trees, but tree structures (with attached meta data) in general?
Does this inevitably finish up being quite fiddly when compared with the object-oriented equivalent?
It's pretty clear that Scheme's thing is lists.
Traditionally, yes. But since R6RS, there are also records with named fields, which make working with other kinds of data structures a lot easier. Practical Scheme implementations have had these for decades, but they were never standardized, so the syntax varies.
Is there a general pattern when it comes to this sort of problem, not just directory trees, but tree structures (with attached meta data) in general?
Yes, and with your idea of "iterator functions" you've hit the nail on the head. (Except that it would be even more elegant to define a macro instead, so you can say:
(for-each-label (x tree 'in-order)
  (display x))
Macros are created with define-syntax.)
I see two parts in your question.
Part one: How to implement trees in Scheme.
In standard R5RS most tree implementations use vectors to represent the nodes in a tree. For a binary tree one might define some helper functions:
(define (tree-left t) (vector-ref t 0))
(define (tree-right t) (vector-ref t 1))
(define (tree-value t) (vector-ref t 2))
(define (make-node l r v) (vector l r v))
(define (leaf? t) (eq? t #f))
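For illustration, a tiny tree built with these helpers (the values are made up):

(define t (make-node (make-node #f #f 1)   ; left subtree: a node over two leaves
                     (make-node #f #f 2)   ; right subtree
                     3))                   ; value at the root
(tree-value t)                    ; => 3
(tree-value (tree-left t))        ; => 1
(leaf? (tree-left (tree-left t))) ; => #t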
In Schemes with structures/records, the vector-based helpers above are replaced with:
(define-struct tree (left right value))
(define (leaf? t) (eq? t #f))
If hierarchies of structures are supported, one can write:
(define-struct tree ())
(define-struct (leaf tree) ())
(define-struct (node tree) (left right value))
For Schemes that support pattern matching, code that manipulates trees looks eerily like tree-handling code in ML and Haskell.
I can recommend the section on binary search trees in HtDP:
http://htdp.org/2003-09-26/Book/curriculum-Z-H-19.html#node_sec_14.2
Part two: How to handle directory traversals
There are several approaches. One common way is to provide a function that, given a directory, produces a function or sequence that can produce one element (file or subdirectory) at a time. This avoids the need to store a potentially huge file tree in memory.
In Racket one can use sequences:
#lang racket
(for ([file (in-directory "/users/me")])
  (displayln file))
There is no directory traversal mechanism in R5RS, but all the major implementations of course offer some way or another of doing so.
BTW: It is true that Scheme provides very good support for singly linked lists, but when implementing functional data structures it is often better to use vectors, or better yet structures, due to faster random element access.

Why is foldl defined in a strange way in Racket?

In Haskell, like in many other functional languages, the function foldl is defined such that, for example, foldl (-) 0 [1,2,3,4] = -10.
This is OK, because foldl (-) 0 [1,2,3,4] is, by definition, ((((0 - 1) - 2) - 3) - 4).
But, in Racket, (foldl - 0 '(1 2 3 4)) is 2, because Racket "intelligently" calculates like this: (4 - (3 - (2 - (1 - 0)))), which indeed is 2.
Of course, if we define auxiliary function flip, like this:
(define (flip bin-fn)
  (lambda (x y)
    (bin-fn y x)))
then we could in Racket achieve the same behavior as in Haskell: instead of (foldl - 0 '(1 2 3 4)) we can write: (foldl (flip -) 0 '(1 2 3 4))
The question is: Why is foldl in racket defined in such an odd (nonstandard and nonintuitive) way, differently than in any other language?
The Haskell definition is not uniform. In Racket, the function given to both folds takes its inputs in the same order, and therefore you can just replace foldl by foldr and get the same result. If you do that with the Haskell version you'd get a different result (usually) — and you can see this in the different types of the two.
(In fact, I think that in order to do a proper comparison you should avoid these toy numeric examples where both of the type variables are integers.)
This has the nice byproduct where you're encouraged to choose either foldl or foldr according to their semantic differences. My guess is that with Haskell's order you're likely to choose according to the operation. You have a good example for this: you've used foldl because you want to subtract each number — and that's such an "obvious" choice that it's easy to overlook the fact that foldl is usually a bad choice in a lazy language.
Another difference is that the Haskell version is more limited than the Racket version in the usual way: it operates on exactly one input list, whereas Racket can accept any number of lists. This makes it more important to have a uniform argument order for the input function.
Finally, it is wrong to assume that Racket diverged from "many other functional languages", since folding is far from a new trick, and Racket has roots that are far older than Haskell (or these other languages). The question could therefore go the other way: why is Haskell's foldl defined in a strange way? (And no, (-) is not a good excuse.)
Historical update:
Since this seems to bother people again and again, I did a little bit of legwork. This is not definitive in any way, just my second-hand guessing. Feel free to edit this if you know more, or even better, email the relevant people and ask. Specifically, I don't know the dates where these decisions were made, so the following list is in rough order.
First there was Lisp, and no mention of "fold"ing of any kind. Instead, Lisp has reduce which is very non-uniform, especially if you consider its type. For example, :from-end is a keyword argument that determines whether it's a left or a right scan and it uses different accumulator functions which means that the accumulator type depends on that keyword. This is in addition to other hacks: usually the first value is taken from the list (unless you specify an :initial-value). Finally, if you don't specify an :initial-value, and the list is empty, it will actually apply the function on zero arguments to get a result.
All of this means that reduce is usually used for what its name suggests: reducing a list of values into a single value, where the two types are usually the same. The conclusion here is that it's serving a kind of a similar purpose to folding, but it's not nearly as useful as the generic list iteration construct that you get with folding. I'm guessing that this means that there's no strong relation between reduce and the later fold operations.
The first relevant language that follows Lisp and has a proper fold is ML. The choice that was made there, as noted in newacct's answer below, was to go with the uniform types version (ie, what Racket uses).
The next reference is Bird & Wadler's ItFP (1988), which uses different types (as in Haskell). However, they note in the appendix that Miranda has the same type (as in Racket).
Miranda later on switched the argument order (ie, moved from the Racket order to the Haskell one). Specifically, that text says:
WARNING - this definition of foldl differs from that in older versions of Miranda. The one here is the same as that in Bird and Wadler (1988). The old definition had the two args of `op' reversed.
Haskell took a lot of stuff from Miranda, including the different types. (But of course I don't know the dates so maybe the Miranda change was due to Haskell.) In any case, it's clear at this point that there was no consensus, hence the reversed question above holds.
OCaml went with the Haskell direction and uses different types.
I'm guessing that "How to Design Programs" (aka HtDP) was written at roughly the same period, and they chose the same type. There is, however, no motivation or explanation — and in fact, after that exercise it's simply mentioned as one of the built-in functions.
Racket's implementation of the fold operations was, of course, the "built-ins" that are mentioned here.
Then came SRFI-1, and the choice was to use the same-type version (as Racket). This decision was questioned by John David Stone, who points at a comment in the SRFI that says
Note: MIT Scheme and Haskell flip F's arg order for their reduce and fold functions.
Olin later addressed this: all he said was:
Good point, but I want consistency between the two functions.
state-value first: srfi-1, SML
state-value last: Haskell
Note in particular his use of state-value, which suggests a view where consistent types are a possibly more important point than operator order.
"differently than in any other language"
As a counter-example, the foldl of Standard ML (ML is a very old and influential functional language) also works this way: http://www.standardml.org/Basis/list.html#SIG:LIST.foldl:VAL
Racket's foldl and foldr (and also SRFI-1's fold and fold-right) have the property that
(foldr cons null lst) = lst
(foldl cons null lst) = (reverse lst)
I speculate the argument order was chosen for that reason.
From the Racket documentation, the description of foldl:
(foldl proc init lst ...+) → any/c
Two points of interest for your question are mentioned:
the input lsts are traversed from left to right
And
foldl processes the lsts in constant space
I'm going to speculate on what the implementation for that might look like, with a single list for simplicity's sake:
(define (my-foldl proc init lst)
  (define (iter lst acc)
    (if (null? lst)
        acc
        (iter (cdr lst) (proc (car lst) acc))))
  (iter lst init))
As you can see, the requirements of left-to-right traversal and constant space are met (notice the tail recursion in iter), but the order of the arguments for proc was never specified in the description. Hence, the result of calling the above code would be:
(my-foldl - 0 '(1 2 3 4))
> 2
If we had specified the order of the arguments for proc in this way:
(proc acc (car lst))
Then the result would be:
(my-foldl - 0 '(1 2 3 4))
> -10
My point is, the documentation for foldl doesn't specify the order in which the arguments are passed to proc; it only has to guarantee that constant space is used and that the elements in the list are traversed from left to right.
As a side note, you can get the desired evaluation order for your expression by simply writing this:
(- 0 1 2 3 4)
> -10

Practical use of fold/reduce in functional languages

Fold (aka reduce) is considered a very important higher-order function. Map can be expressed in terms of fold (see here). But it sounds more academic than practical to me. A typical use could be to get the sum, or product, or maximum of numbers, but these functions usually accept any number of arguments. So why write (fold + 0 '(2 3 5)) when (+ 2 3 5) works fine? My question is, in what situation is it easiest or most natural to use fold?
The point of fold is that it's more abstract. It's not that you can do things that you couldn't before, it's that you can do them more easily.
Using a fold, you can generalize any function that is defined on two elements to apply to an arbitrary number of elements. This is a win because it's usually much easier to write, test, maintain and modify a single function that operates on two arguments than one that operates on a whole list. And it's always easier to write, test, maintain, etc. one simple function instead of two with similar-but-not-quite functionality.
Since fold (and for that matter, map, filter, and friends) have well-defined behaviour, it's often much easier to understand code using these functions than explicit recursion.
Basically, once you have the one version, you get the other "for free". Ultimately, you end up doing less work to get the same result.
Here are a few simple examples where reduce works really well.
Find the sum of the maximum values of each sub-list
Clojure:
user=> (def x '((1 2 3) (4 5) (0 9 1)))
#'user/x
user=> (reduce #(+ %1 (apply max %2)) 0 x)
17
Racket:
> (define x '((1 2 3) (4 5) (0 9 1)))
> (foldl (lambda (a b) (+ b (apply max a))) 0 x)
17
Construct a map from a list
Clojure:
user=> (def y '(("dog" "bark") ("cat" "meow") ("pig" "oink")))
#'user/y
user=> (def z (reduce #(assoc %1 (first %2) (second %2)) {} y))
#'user/z
user=> (z "pig")
"oink"
For a more complicated clojure example featuring reduce, check out my solution to Project Euler problems 18 & 67.
See also: reduce vs. apply
In Common Lisp, functions can't accept an arbitrary number of arguments.
There is a constant defined in every Common Lisp implementation CALL-ARGUMENTS-LIMIT, which must be 50 or larger.
This means that any such portably written function should accept at least 50 arguments. But it could be just 50.
This limit exists to allow compilers to possibly use optimized calling schemes and to not provide the general case, where an unlimited number of arguments could be passed.
Thus to really process large (more than 50 elements) lists or vectors in portable Common Lisp code, it is necessary to use iteration constructs, reduce, map, and similar. In particular, don't use (apply '+ large-list); use (reduce '+ large-list) instead.
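For instance (a sketch; the exact limit is implementation-dependent):

(reduce #'+ (make-list 100000 :initial-element 1))   ; => 100000, works portably
;; (apply #'+ (make-list 100000 :initial-element 1)) ; may exceed CALL-ARGUMENTS-LIMIT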
Code using fold is usually awkward to read. That's why people prefer map, filter, exists, sum, and so on—when available. These days I'm primarily writing compilers and interpreters; here's some ways I use fold:
Compute the set of free variables for a function, expression, or type
Add a function's parameters to the symbol table, e.g., for type checking
Accumulate the collection of all sensible error messages generated from a sequence of definitions
Add all the predefined classes to a Smalltalk interpreter at boot time
What all these uses have in common is that they're accumulating information about a sequence into some kind of set or dictionary. Eminently practical.
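As a small, hypothetical sketch of the symbol-table case in Racket (extend-env and the name/type lists are made up for illustration):

(define (extend-env env names types)
  ;; Fold each name/type pair into an immutable hash acting as the table.
  (foldl (lambda (name type e) (hash-set e name type))
         env names types))

(extend-env (hash) '(x y) '(int bool)) ; => a hash mapping x to int and y to bool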
Your example (+ 2 3 5) only works because you know the number of arguments beforehand. Folds work on lists whose size can vary.
fold/reduce is the general version of the "cdr-ing down a list" pattern. Each algorithm that's about processing every element of a sequence in order and computing some return value from that can be expressed with it. It's basically the functional version of the foreach loop.
Here's an example that nobody else mentioned yet.
By using a function with a small, well-defined interface like "fold", you can replace that implementation without breaking the programs that use it. You could, for example, make a distributed version that runs on thousands of PCs, so a sorting algorithm that used it would become a distributed sort, and so on. Your programs become more robust, simpler, and faster.
Your example is a trivial one: + already takes any number of arguments, runs quickly in little memory, and has already been written and debugged by whoever wrote your compiler. Those properties are not often true of algorithms I need to run.

What are best practices for including parameters such as an accumulator in functions?

I've been writing more Lisp code recently. In particular, recursive functions that take some data and build a resulting data structure. Sometimes it seems I need to pass two or three pieces of information to the next invocation of the function, in addition to the user-supplied data. Let's call these accumulators.
What is the best way to organize these interfaces to my code?
Currently, I do something like this:
(defun foo (user1 user2 &optional acc1 acc2 acc3)
  ;; do something
  (foo user1 user2 (cons x acc1) (cons y acc2) (cons z acc3)))
This works as I'd like it to, but I'm concerned because I don't really need to present the &optional parameters to the programmer.
Three approaches I'm considering:
have a wrapper function that a user is encouraged to use, which immediately invokes the extended definition.
use labels internally within a function whose signature is concise.
just start using a loop and variables. However, I'd prefer not to, since I'd like to really wrap my head around recursion.
Thanks guys!
If you want to write idiomatic Common Lisp, I'd recommend the loop and variables for iteration. Recursion is cool, but it's only one tool of many for the Common Lisper. Besides, tail-call elimination is not guaranteed by the Common Lisp spec.
That said, I'd recommend the labels approach if you have a structure, a tree for example, that is unavoidably recursive and you can't get tail calls anyway. Optional arguments let your implementation details leak out to the caller.
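For example, a concrete (made-up) function in that style, where the accumulators live only inside labels:

(defun split-evens-odds (numbers)
  ;; The public signature takes only the user data; the accumulators are internal.
  (labels ((walk (remaining evens odds)
             (cond ((null remaining)
                    (values (nreverse evens) (nreverse odds)))
                   ((evenp (first remaining))
                    (walk (rest remaining) (cons (first remaining) evens) odds))
                   (t
                    (walk (rest remaining) evens (cons (first remaining) odds))))))
    (walk numbers '() '())))

;; (split-evens-odds '(1 2 3 4)) ; => (2 4), (1 3)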
Your impulse to shield implementation details from the user is a smart one, I think. I don't know Common Lisp, but in Scheme you do it by defining your helper function in the public function's lexical scope.
(define (fibonacci n)
  (let fib-accum ((a 0)
                  (b 1)
                  (n n))
    (if (< n 1)
        a
        (fib-accum b (+ a b) (- n 1)))))
The let expression defines a function and binds it to a name that's only visible within the let, then invokes the function.
I have used all the options you mention. All have their merits, so it boils down to personal preference.
I have arrived at using whatever I deem appropriate. If I think that leaving the &optional accumulators in the API might make sense for the user, I leave it in. For example, in a reduce-like function, the accumulator can be used by the user for providing a starting value. Otherwise, I'll often rewrite it as a loop, do, or iter (from the iterate library) form, if it makes sense to perceive it as such. Sometimes, the labels helper is also used.
