Put an element to the tail of a collection

I find myself doing a lot of:
(concat coll [e]) where coll is a collection and e a single element.
Is there a function for doing this in Clojure? I know conj does the job best for vectors but I don't know up front which coll will be used. It could be a vector, list or sorted-set for example.

Some types of collections can add cheaply to the front (lists, seqs), while others can add cheaply to the back (vectors, queues, kinda-sorta lazy-seqs). Rather than using concat, if possible you should arrange to be working with one of those types (vector is most common) and just conj to it: (conj [1 2 3] 4) yields [1 2 3 4], while (conj '(1 2 3) 4) yields (4 1 2 3).

concat does not add an element to the tail of a collection, nor does it concatenate two collections.
concat returns a seq made of the concatenation of two other seqs. The original types of the collections from which those seqs were obtained are lost in the return type of concat.
Now, Clojure collections have different properties that one must know about in order to write efficient code; that's why there isn't a universal function in core to concatenate collections of any kind together.
In contrast, lists and vectors do have "natural insertion positions", which conj knows about, and it does what is right for the kind of collection.
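A short REPL-style sketch of the point above: conj keeps the collection type and inserts at that type's natural position, while concat always hands back a seq. The sample values and the name joined are illustrative only.

```clojure
;; conj inserts at the "natural" position of each collection type
(conj [1 2 3] 4)             ;=> [1 2 3 4]   vectors grow at the tail
(conj '(1 2 3) 4)            ;=> (4 1 2 3)   lists grow at the head
(conj (sorted-set 1 2 3) 0)  ;=> #{0 1 2 3}  sorted sets place it by sort order

;; concat returns a lazy seq, whatever the inputs were
(def joined (concat [1 2 3] [4]))
joined            ;=> (1 2 3 4)
(vector? joined)  ;=> false
(seq? joined)     ;=> true
```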

This is a very small addendum to @amalloy's answer in order to address the OP's request for a function that always adds to the tail of whatever kind of collection. This is an alternative to (concat coll [x]). Just create a vector version of the original collection:
(defn conj*
  [s x]
  (conj (vec s) x))
Caveats:
If you started with a lazy sequence, you've now destroyed the laziness--i.e. the output is not lazy. This may be either a good thing or a bad thing, depending on your needs.
There's some cost to creating the vector. If you need to call this function a lot, and you find (e.g. by benchmarking with Criterium) that this cost is significant for your purposes, then follow the other answers' advice to try to use vectors in the first place.
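To make the caveats concrete, here is how conj* behaves on a few input types. The function is repeated so the snippet stands alone; the inputs are made-up examples.

```clojure
(defn conj*
  "Appends x at the tail by first copying s into a vector."
  [s x]
  (conj (vec s) x))

(conj* [1 2 3] 4)             ;=> [1 2 3 4]
(conj* '(1 2 3) 4)            ;=> [1 2 3 4]
(conj* (sorted-set 3 1 2) 4)  ;=> [1 2 3 4]
(conj* (take 3 (range)) 3)    ;=> [0 1 2 3]

;; The result is always a vector, never the original collection type
(vector? (conj* '(1 2 3) 4))  ;=> true
```

Note that vec realizes its whole input, so calling conj* on an infinite sequence such as (range) never returns.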

To distill the best of what amalloy and Laurent Petit have already said: use the conj function.
One of the great abstractions that Clojure provides is the Sequence API, which includes the conj function. If at all possible, your code should be as collection-type agnostic as it can be, instead using the seq API to handle operations on collections and picking a particular collection type only when you need to be specific.
If vectors are a good match, then yes, conj will add items onto the end. If you use lists instead, conj will add items to the front of your collection. But if you then use peek and pop to take items from the "top" of a collection (the back of a vector, the front of a list), it doesn't matter which implementation you use: each works at the end where that collection type performs best, so adding and removing items stay consistent.
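A minimal sketch of that consistency, using the stack functions peek and pop from clojure.core together with conj. The names v and l are just for illustration.

```clojure
;; Vector used as a stack: conj, peek and pop all work at the tail
(def v (conj [1 2 3] 4))
(peek v)  ;=> 4
(pop v)   ;=> [1 2 3]

;; List used as a stack: conj, peek and pop all work at the head
(def l (conj '(1 2 3) 4))
(peek l)  ;=> 4
(pop l)   ;=> (1 2 3)
```

In both cases the item most recently added with conj is the one peek returns. Keep in mind that peek and pop are defined for lists, vectors and queues, not for arbitrary seqs; calling them on a lazy sequence throws.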

If you are working with lazy sequences, you can also use lazy-cat:
(take 5 (lazy-cat (range) [1])) ; (0 1 2 3 4)
Or you could make it a utility function:
(defn append [coll & items] (lazy-cat coll items))
Then use it like this:
(take 5 (append (range) 1)) ; (0 1 2 3 4)

Related

Is there an intersection function for vectors?

Very often one finds statements that lists have a performance disadvantage compared to vectors because of consing and additional GC steps, and some functions work on generic sequences, accepting both lists and vectors.
But some functions like intersection expect two lists. Is there a library providing an alternative for vectors?
I started with something like this, but have the feeling that there should be a more mature solution.
(defun vec-intersec (vec-1 vec-2 &aux (result (make-array 0 :adjustable t :fill-pointer 0)))
  "A simple implementation of intersection for vectors instead of lists."
  (loop :for v1 :across vec-1
        :if (find v1 vec-2 :test #'equal)
          :do (vector-push-extend v1 result))
  result)
It always depends on the size of your collection and what you want to do with it.
Below about 20 to 50 elements, lists are often perfectly OK even for random access (if you're not in a tight inner loop, or consing a lot).
If you already have vectors, it might be most convenient to sort one of them so that you can do a binary search instead of a naïve linear one. If that is not enough, and your collections are bigger, putting the elements into a hash-table (as keys, with an appropriate :test) gives you faster (amortized) lookup.
This should take you quite far. If you identify an issue that cannot be solved in such a simple way, you might want to look into FSet or CL-Containers, which support more advanced data structures.

What's the difference between a sequence and a collection in Clojure

I am a Java programmer and am new to Clojure. From different places, I saw sequence and collection are used in different cases. However, I have no idea what the exact difference is between them.
For some examples:
1) In Clojure's documentation for Sequence:
The Seq interface
(first coll)
Returns the first item in the collection.
Calls seq on its argument. If coll is nil, returns nil.
(rest coll)
Returns a sequence of the items after the first. Calls seq on its argument.
If there are no more items, returns a logical sequence for which seq returns nil.
(cons item seq)
Returns a new seq where item is the first element and seq is the rest.
As you can see, when describing the Seq interface, the first two functions (first/rest) use coll, which seems to indicate a collection, while the cons function uses seq, which seems to indicate a sequence.
2) There are functions called coll? and seq? that can be used to test whether a value is a collection or a sequence. Clearly, collection and sequence are different.
3) In Clojure's documentation about 'Collections', it is said:
Because collections support the seq function, all of the sequence
functions can be used with any collection
Does this mean all collections are sequences?
(coll? [1 2 3]) ; => true
(seq? [1 2 3]) ; => false
The code above tells me that this is not the case, because [1 2 3] is a collection but not a sequence.
I think this is a pretty basic question for Clojure, but I am not able to find a place that explains clearly what the difference is and which one I should use in different cases. Any comment is appreciated.
Any object supporting the core first and rest functions is a sequence.
Many objects satisfy this interface and every Clojure collection provides at least one kind of seq object for walking through its contents using the seq function.
So:
user> (seq [1 2 3])
(1 2 3)
And you can create a sequence object from a map too
user> (seq {:a 1 :b 2})
([:a 1] [:b 2])
That's why you can use filter, map, for, etc. on maps, sets and so on.
So you can treat many collection-like objects as sequences.
That's also why many sequence handling functions such as filter call seq on the input:
(defn filter
"Returns a lazy sequence of the items in coll for which
(pred item) returns true. pred must be free of side-effects."
{:added "1.0"
:static true}
([pred coll]
(lazy-seq
(when-let [s (seq coll)]
If you call (filter pred 5), you get:
Don't know how to create ISeq from: java.lang.Long
RT.java:505 clojure.lang.RT.seqFrom
RT.java:486 clojure.lang.RT.seq
core.clj:133 clojure.core/seq
core.clj:2523 clojure.core/filter[fn]
You can see that the seq call acts as the "can this object be treated as a sequence?" check.
Most of this stuff is in Joy of Clojure chapter 5 if you want to go deeper.
Here are a few points that will help in understanding the difference between collection and sequence.
"Collection" and "Sequence" are abstractions, not a property that can be determined from a given value.
Collections are bags of values.
Sequence is a data structure (subset of collection) that is expected to be accessed in a sequential (linear) manner.
The figure below best describes the relation between them:
You can read more about it here.
Every sequence is a collection, but not every collection is a sequence.
The seq function makes it possible to convert a collection into a sequence. E.g. for a map you get a list of its entries. That list of entries is different from the map itself, though.
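These statements can be checked at the REPL with the predicates coll? and seq?. The following is only a small sketch; the sample values and the name m are arbitrary.

```clojure
;; A vector is a collection but not a seq; calling seq on it yields one
(coll? [1 2 3])        ;=> true
(seq? [1 2 3])         ;=> false
(seq? (seq [1 2 3]))   ;=> true
(coll? (seq [1 2 3]))  ;=> true

;; A map is a collection; its seq is a sequence of entries, not a map
(def m {:a 1})
(seq m)         ;=> ([:a 1])
(map? (seq m))  ;=> false
(seq? (seq m))  ;=> true

;; Strings are seqable without being Clojure collections
(coll? "abc")   ;=> false
(seq "abc")     ;=> (\a \b \c)
```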
In Clojure for the brave and true the author sums it up in a really understandable way:
The collection abstraction is closely related to the sequence
abstraction. All of Clojure's core data structures — vectors, maps,
lists and sets — take part in both abstractions.
The abstractions differ in that the sequence abstraction is "about"
operating on members individually while the collection abstraction is
"about" the data structure as a whole. For example, the collection
functions count, empty?, and every? aren't about any individual
element; they're about the whole.
I have just been through Chapter 5 - "Collection Types" of "The Joy of Clojure", which is a bit confusing (i.e. the next version of that book needs a review). In Chapter 5, on page 86, there is a table which I am not fully happy with:
So here's my take (fully updated after coming back to this after a month of reflection).
collection
It's a "thing", a collection of other things.
This is based on the function coll?.
The function coll? can be used to test for this.
Conversely, anything for which coll? returns true is a collection.
The coll? docstring says:
Returns true if x implements IPersistentCollection
Things that are collections are grouped into three separate classes. Things in different classes are never equal.
Maps: test using (map? foo)
    Map (two actual implementations with slightly differing behaviours)
    Sorted map. Note: (sequential? (sorted-map :a 1)) ;=> false
Sets: test using (set? foo)
    Set
    Sorted set. Note: (sequential? (sorted-set :a :b)) ;=> false
Sequential collections: test using (sequential? foo)
    List
    Vector
    Queue
    Seq: (sequential? (seq [1 2 3])) ;=> true
    Lazy-Seq: (sequential? (lazy-seq (seq [1 2 3]))) ;=> true
The Java interop stuff is outside of this:
(coll? (to-array [1 2 3])) ;=> false
(map? (doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2))) ;=> false
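The claim that things in different classes are never equal, while members of the same class can be, is easy to try out. A brief sketch with arbitrary sample values:

```clojure
;; Sequential collections compare equal across concrete types
(= [1 2 3] '(1 2 3))                ;=> true
(= [1 2 3] (map identity [1 2 3]))  ;=> true

;; Sorted and unsorted variants of the same class can be equal
(= (sorted-set 2 1) #{1 2})         ;=> true
(= (sorted-map :a 1) {:a 1})        ;=> true

;; Values from different classes are never equal, even when empty
(= [] #{})           ;=> false
(= {} #{})           ;=> false
(= [[:a 1]] {:a 1})  ;=> false
```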
sequential collection (a "chain")
It's a "thing", a collection holding other things according to a specific, stable ordering.
This is based on the function sequential?.
The function sequential? can be used to test for this.
Conversely, anything for which sequential? returns true is a sequential collection.
The sequential? docstring says:
Returns true if coll implements Sequential
Note: "sequential" is an adjective! In "The Joy of Clojure", the adjective is used as a noun and this is really, really, really confusing:
"Clojure classifies each collection data type into one of three
logical categories or partitions: sequentials, maps, and sets."
Instead of "sequential" one should use a "sequential thing" or a "sequential collection" (as used above). On the other hand, in mathematics the following words already exist: "chain", "totally ordered set", "simply ordered set", "linearly ordered set". "chain" sounds excellent but no-one uses that word. Shame!
"Joy of Clojure" also has this to say:
Beware type-based predicates!
Clojure includes a few predicates with names like the words just
defined. Although they’re not frequently used, it seems worth
mentioning that they may not mean exactly what the definitions here
might suggest. For example, every object for which sequential? returns
true is a sequential collection, but it returns false for some that
are also sequential [better: "that can be considered sequential
collections"]. This is because of implementation details that may be
improved in a future version of Clojure [and maybe this has already been
done?]
sequence (also "sequence abstraction")
This is more a concept than a thing: a series of values (thus ordered) which may or may not exist yet (i.e. a stream). If you say that a thing is a sequence, is that thing also necessarily a Clojure collection, even a sequential collection? I suppose so.
That sequential collection may have been completely computed and be completely available. Or it may be a "machine" to generate values on need (by computation - likely in a "pure" fashion - or by querying external "impure", "oracular" sources: keyboard, databases)
seq
This is a thing: something that can be processed by the functions first, rest, next, cons (and possibly others?), i.e. something that implements the interface clojure.lang.ISeq (on the JVM this is an ordinary Java interface rather than a Clojure protocol, so the thing's class provides the implementations of those operations).
This is based on the function seq?.
The function seq? can be used to test for this
Conversely, a seq is anything for which seq? returns true.
Docstring for seq?:
Return true if x implements ISeq
Docstring for first:
Returns the first item in the collection. Calls seq on its argument.
If coll is nil, returns nil.
Docstring for rest:
Returns a possibly empty seq of the items after the first. Calls seq
on its argument.
Docstring for next:
Returns a seq of the items after the first. Calls seq on its argument.
If there are no more items, returns nil.
You call next on the seq to generate the next element and a new seq. Repeat until nil is obtained.
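The "repeat until nil" pattern can be written out as an explicit loop. This is only a sketch; walk-seq is a hypothetical helper name, not something from clojure.core.

```clojure
(defn walk-seq
  "Collects the elements of coll into a vector by stepping with first/next."
  [coll]
  (loop [s   (seq coll)   ; nil when coll is empty or nil
         acc []]
    (if s
      (recur (next s) (conj acc (first s)))  ; next returns nil at the end
      acc)))

(walk-seq '(1 2 3))  ;=> [1 2 3]
(walk-seq {:a 1})    ;=> [[:a 1]]
(walk-seq nil)       ;=> []
```

Because next returns nil rather than an empty seq, the loop can test s directly instead of calling empty?.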
Joy of Clojure calls this a "simple API for navigating collections" and says "a seq is any object that implements the seq API" - which is correct if "the API" is the ensemble of the "thing" (of a certain type) and the functions which work on that thing. It depends on suitable shift in the concept of API.
A note on the special case of the empty seq:
(def empty-seq (rest (seq [:x])))
(type empty-seq) ;=> clojure.lang.PersistentList$EmptyList
(nil? empty-seq) ;=> false ... empty seq is not nil
(some? empty-seq) ;=> true ("true if x is not nil, false otherwise.")
(first empty-seq) ;=> nil ... first of empty seq is nil ("does not exist"); beware confusing this with a nil in a nonempty list!
(next empty-seq) ;=> nil ... "next" of empty seq is nil
(rest empty-seq) ;=> () ... "rest" of empty seq is the empty seq
(type (rest empty-seq)) ;=> clojure.lang.PersistentList$EmptyList
(seq? (rest empty-seq)) ;=> true
(= (rest empty-seq) empty-seq) ;=> true
(count empty-seq) ;=> 0
(empty? empty-seq) ;=> true
Addenda
The function seq
If you apply the function seq to a thing for which that makes sense (generally a sequential collection), you get a seq representing/generating the members of that collection.
The docstring says:
Returns a seq on the collection. If the collection is empty, returns
nil. (seq nil) returns nil. seq also works on Strings, native Java
arrays (of reference types) and any objects that implement Iterable.
Note that seqs cache values, thus seq should not be used on any
Iterable whose iterator repeatedly returns the same mutable object.
After applying seq, you may get objects of various actual classes:
clojure.lang.Cons - try (class (seq (map #(* % 2) '( 1 2 3))))
clojure.lang.PersistentList
clojure.lang.APersistentMap$KeySeq
clojure.lang.PersistentList$EmptyList
clojure.lang.PersistentHashMap$NodeSeq
clojure.lang.PersistentQueue$Seq
clojure.lang.PersistentVector$ChunkedSeq
If you apply seq to a sequence, the actual class of the thing returned may be different from the actual class of the thing passed in. It will still be a sequence.
What the "elements" in the sequence are depends. For example, for maps, they are key-value pairs which look like 2-element vectors (but their actual class is not really a vector).
The function lazy-seq
Creates a thing to generate more things lazily (a suspended machine, a suspended stream, a thunk)
The docstring says:
Takes a body of expressions that returns an ISeq or nil, and yields a
Seqable object that will invoke the body only the first time seq is
called, and will cache the result and return it on all subsequent seq
calls. See also - realized?
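The "invoke the body only the first time" behaviour described in the docstring can be observed with a counter. In this sketch the atom body-runs and the name s are illustrative only.

```clojure
(def body-runs (atom 0))

(def s (lazy-seq
         (swap! body-runs inc)   ; side effect so we can count evaluations
         (list 1 2 3)))

(realized? s)  ;=> false, nothing has been computed yet
(first s)      ;=> 1, forces the body
(first s)      ;=> 1, served from the cache
@body-runs     ;=> 1
(realized? s)  ;=> true
```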
A note on "functions" and "things" ... and "objects"
In the Clojure Universe, I like to talk about "functions" and "things", but not about "objects", which is a term heavily laden with Java-ness and other badness. Mention of objects feels like shards poking up from the underlying Java universe.
What is the difference between function and thing?
It's fluid! Some stuff is pure function, some stuff is pure thing, some is in between (can be used as function and has attributes of a thing)
In particular, Clojure allows contexts where one considers keywords (things) as functions (to look up values in maps) or where one interprets maps (things) as functions, or shorthand for functions (which take a key and return the value associated to that key in the map)
Evidently, functions are things as they are "first-class citizens".
It's also contextual! In some contexts, a function becomes a thing, or a thing becomes a function.
There are nasty mentions of objects ... these are shards poking up from the underlying Java universe.
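A brief illustration of that fluidity, using a made-up map called person: keywords and maps can sit in function position, and ifn? (as opposed to fn?) reports whether a thing can be called.

```clojure
(def person {:name "Ada" :lang "Clojure"})

(:name person)           ;=> "Ada"      keyword used as a function
(person :lang)           ;=> "Clojure"  map used as a function
(:missing person :none)  ;=> :none      optional default value
(map :name [{:name "a"} {:name "b"}])  ;=> ("a" "b")

;; fn? is true only for real functions; ifn? for anything callable
(fn? inc)      ;=> true
(fn? :name)    ;=> false
(ifn? :name)   ;=> true
(ifn? person)  ;=> true
```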
For presentation purposes, a diagram of Collections
For seq?:
Return true if x implements ISeq
For coll?:
Returns true if x implements IPersistentCollection
And I found that the ISeq interface extends IPersistentCollection in the Clojure source code, so, as Rörd said, every sequence is a collection.
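The interface relationship can also be checked from the REPL without reading the Java source. A small sketch, assuming Clojure on the JVM:

```clojure
;; ISeq is a sub-interface of IPersistentCollection, but not the other way round
(isa? clojure.lang.ISeq clojure.lang.IPersistentCollection)  ;=> true
(isa? clojure.lang.IPersistentCollection clojure.lang.ISeq)  ;=> false

;; So anything that passes seq? also passes coll?
(every? coll? [(seq [1 2]) (range 3) (map inc [1])])  ;=> true
```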

Into or vec: converting sequence back to vector in Clojure

I have the following code which increments the first element of every pair in a vector:
(vec (map (fn [[key value]] [(inc key) value]) [[0 :a] [1 :b]]))
However, I fear this code is inelegant, as it first creates a sequence using map and then converts it back to a vector.
Consider this analog:
(into [] (map (fn [[key value]] [(inc key) value]) [[0 :a] [1 :b]]))
On #clojure@irc.freenode.net I was told that using the code above is bad, because into expands into (reduce conj [] (map-indexed ...)), which produces many intermediate objects in the process. Then I was told that into actually doesn't expand into (reduce conj ...) and uses transients when it can. Also, measuring elapsed time showed that into is actually faster than vec.
So my questions are:
What is the proper way to use map over vectors?
What happens underneath when I use vec and into with vectors?
Related but not duplicate questions:
Clojure: sequence back to vector
How To Turn a Reduce-Realized Sequence Back Into Lazy Vector Sequence
Actually as of Clojure 1.4.0 the preferred way of doing this is to use mapv, which is like map except its return value is a vector. It is by far the most efficient approach, with no unnecessary intermediate allocations at all.
Clojure 1.5.0 will bring a new reducers library which will provide a generic way to map, filter, take, drop etc. while creating vectors, usable with into []. You can play with it in the 1.5.0 alphas and in the recent tagged releases of ClojureScript.
As for (vec some-seq) and (into [] some-seq), the first ultimately delegates to a Java loop which pours some-seq into an empty transient vector, while the second does the same thing in very efficient Clojure code. In both cases there are some initial checks involved to determine which approach to take when constructing the final return value.
vec and into [] are significantly different for Java arrays of small length (up to 32) -- the first will alias the array (use it as the tail of the newly created vector) and demands that the array not be modified subsequently, lest the contents of the vector change (see the docstring); the latter creates a new vector with a new tail and doesn't care about future changes to the array.
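For comparison, here are the variants discussed above applied to the data from the question. The names pairs and bump-key are introduced only for this sketch; the last form uses a transducer, which the answer does not mention and which requires Clojure 1.7 or later.

```clojure
(def pairs [[0 :a] [1 :b]])

(defn bump-key
  "Increments the first element of a [key value] pair."
  [[k v]]
  [(inc k) v])

(vec (map bump-key pairs))      ;=> [[1 :a] [2 :b]]
(into [] (map bump-key pairs))  ;=> [[1 :a] [2 :b]]
(mapv bump-key pairs)           ;=> [[1 :a] [2 :b]]

;; Not from the answer: into with a transducer skips the intermediate lazy seq
(into [] (map bump-key) pairs)  ;=> [[1 :a] [2 :b]]
```

All four produce equal vectors; they differ only in which intermediate objects are created along the way.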

Practical use of fold/reduce in functional languages

Fold (aka reduce) is considered a very important higher order function. Map can be expressed in terms of fold (see here). But it sounds more academic than practical to me. A typical use could be to get the sum, or product, or maximum of numbers, but these functions usually accept any number of arguments. So why write (fold + 0 '(2 3 5)) when (+ 2 3 5) works fine? My question is, in what situation is it easiest or most natural to use fold?
The point of fold is that it's more abstract. It's not that you can do things that you couldn't before, it's that you can do them more easily.
Using a fold, you can generalize any function that is defined on two elements to apply to an arbitrary number of elements. This is a win because it's usually much easier to write, test, maintain and modify a single function that applies to two arguments than one that applies to a list. And it's always easier to write, test, maintain, etc. one simple function instead of two with similar-but-not-quite-identical functionality.
Since fold (and for that matter, map, filter, and friends) have well-defined behaviour, it's often much easier to understand code using these functions than explicit recursion.
Basically, once you have the one version, you get the other "for free". Ultimately, you end up doing less work to get the same result.
Here are a few simple examples where reduce works really well.
Find the sum of the maximum values of each sub-list
Clojure:
user=> (def x '((1 2 3) (4 5) (0 9 1)))
#'user/x
user=> (reduce #(+ %1 (apply max %2)) 0 x)
17
Racket:
> (define x '((1 2 3) (4 5) (0 9 1)))
> (foldl (lambda (a b) (+ b (apply max a))) 0 x)
17
Construct a map from a list
Clojure:
user=> (def y '(("dog" "bark") ("cat" "meow") ("pig" "oink")))
#'user/y
user=> (def z (reduce #(assoc %1 (first %2) (second %2)) {} y))
#'user/z
user=> (z "pig")
"oink"
For a more complicated clojure example featuring reduce, check out my solution to Project Euler problems 18 & 67.
See also: reduce vs. apply
In Common Lisp, functions don't accept an arbitrary number of arguments.
There is a constant defined in every Common Lisp implementation CALL-ARGUMENTS-LIMIT, which must be 50 or larger.
This means that any such portably written function should accept at least 50 arguments. But it could be just 50.
This limit exists to allow compilers to possibly use optimized calling schemes and to not provide the general case, where an unlimited number of arguments could be passed.
Thus to really process large (larger than 50 elements) lists or vectors in portable Common Lisp code, it is necessary to use iteration constructs, reduce, map, and similar. Thus it is also necessary to not use (apply '+ large-list) but use (reduce '+ large-list).
Code using fold is usually awkward to read. That's why people prefer map, filter, exists, sum, and so on—when available. These days I'm primarily writing compilers and interpreters; here are some ways I use fold:
Compute the set of free variables for a function, expression, or type
Add a function's parameters to the symbol table, e.g., for type checking
Accumulate the collection of all sensible error messages generated from a sequence of definitions
Add all the predefined classes to a Smalltalk interpreter at boot time
What all these uses have in common is that they're accumulating information about a sequence into some kind of set or dictionary. Eminently practical.
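As a rough illustration of this "accumulate into a dictionary" style, the following sketch folds a sequence of definitions into a symbol table plus a list of error messages. The data shape (:id, :kind) and the name add-definition are invented for the example and do not come from any real compiler.

```clojure
(def definitions
  [{:id "x" :kind :int}
   {:id "y" :kind :string}
   {:id "x" :kind :bool}])   ; deliberately redefines x

(defn add-definition
  "Reducing step: adds one definition to the table or records an error."
  [{:keys [table errors] :as state} {:keys [id kind]}]
  (if (contains? table id)
    (assoc state :errors (conj errors (str "duplicate definition: " id)))
    (assoc state :table (assoc table id kind))))

(reduce add-definition {:table {} :errors []} definitions)
;=> {:table {"x" :int, "y" :string}, :errors ["duplicate definition: x"]}
```

The reducing function only ever sees the state so far and one definition, which is what keeps it easy to test on its own.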
Your example (+ 2 3 5) only works because you know the number of arguments beforehand. Folds work on lists the size of which can vary.
fold/reduce is the general version of the "cdr-ing down a list" pattern. Each algorithm that's about processing every element of a sequence in order and computing some return value from that can be expressed with it. It's basically the functional version of the foreach loop.
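To see how general the pattern is, map and filter themselves can be written as folds. This is a sketch for illustration only: my-map and my-filter are hypothetical names, and unlike the core versions they are eager and return vectors.

```clojure
(defn my-map
  "map expressed as a fold: apply f to each element and collect the results."
  [f coll]
  (reduce (fn [acc x] (conj acc (f x))) [] coll))

(defn my-filter
  "filter expressed as a fold: keep only the elements satisfying pred."
  [pred coll]
  (reduce (fn [acc x] (if (pred x) (conj acc x) acc)) [] coll))

(my-map inc [1 2 3])          ;=> [2 3 4]
(my-filter even? (range 10))  ;=> [0 2 4 6 8]
```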
Here's an example that nobody else mentioned yet.
By using a function with a small, well-defined interface like "fold", you can replace that implementation without breaking the programs that use it. You could, for example, make a distributed version that runs on thousands of PCs, so a sorting algorithm that used it would become a distributed sort, and so on. Your programs become more robust, simpler, and faster.
Your example is a trivial one: + already takes any number of arguments, runs quickly in little memory, and has already been written and debugged by whoever wrote your compiler. Those properties are not often true of algorithms I need to run.

Functional Programming: what is an "improper list"?

Could somebody explain what an "improper list" is?
Note: Thanks to all ! All you guys rock!
I think @Vijay's answer is the best one so far and I just intend to Erlangify it.
Pairs (cons cells) in Erlang are written as [Head|Tail] and nil is written as []. There is no restriction as to what the head and tail are but if you use the tail to chain more cons cells you get a list. If the final tail is [] then you get a proper list. There is special syntactic support for lists in that the proper list
[1|[2|[3|[]]]]
is written as
[1,2,3]
and the improper list
[1|[2|[3|4]]]
is written as
[1,2,3|4]
so you can see the difference. Matching against proper/improper lists is correspondingly easy. So a length function len for proper lists:
len([_|T]) -> 1 + len(T);
len([]) -> 0.
where we explicitly match for the terminating []. If given an improper list this will generate an error. While the function last_tail which returns the last tail of a list can handle improper lists as well:
last_tail([_|T]) -> last_tail(T);
last_tail(Tail) -> Tail. %Will match any tail
Note that building a list, or matching against it, as you normally do with [Head|Tail] does not check if the tail is a list, so there is no problem in handling improper lists. There is seldom a need for improper lists, though you can do cool things with them.
I think it's easier to explain this using Scheme.
A list is a chain of pairs that ends with an empty list. In other words, a list ends with a pair whose cdr is ()
(a . (b . (c . (d . (e . ())))))
;; same as
(a b c d e)
A chain of pairs that doesn't end in the empty list is called an improper list. Note that an improper list is not a list. The list and dotted notations can be combined to represent improper lists, as the following equivalent notations show:
(a b c . d)
(a . (b . (c . d)))
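The same distinction can be modeled in a few lines of Clojure. This is purely a hypothetical model: the Pair record stands in for a cons cell and nil for the empty list, and the helper names last-tail and proper-list? are made up for the sketch.

```clojure
;; Hypothetical model: a Pair plays the role of a cons cell, nil the empty list
(defrecord Pair [head tail])

(defn last-tail
  "Follows the tails until something that is not a Pair is reached."
  [x]
  (if (instance? Pair x)
    (recur (:tail x))
    x))

(defn proper-list?
  "True when the chain of pairs ends in nil (the empty list)."
  [x]
  (nil? (last-tail x)))

(def proper   (->Pair 'a (->Pair 'b (->Pair 'c nil))))  ; like (a b c)
(def improper (->Pair 'a (->Pair 'b (->Pair 'c 'd))))   ; like (a b c . d)

(proper-list? proper)    ;=> true
(proper-list? improper)  ;=> false
(last-tail improper)     ;=> d
```

A record is used because Clojure's built-in cons insists on a seqable second argument: (cons 1 2) throws instead of building a dotted pair.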
An example of a usual mistake that leads to the construction of an improper list is:
scheme> (cons 1 (cons 2 3))
(1 2 . 3)
Notice the dot in (1 2 . 3)---that's like the dot in (2 . 3), saying that the cdr of a pair points to 3, not another pair or '(). That is, it's an improper list, not just a list of pairs. It doesn't fit the recursive definition of a list, because when we get to the second pair, its cdr isn't a list--it's an integer.
Scheme printed out the first part of the list as though it were a normal cdr-linked list, but when it got to the end, it couldn't do that, so it used "dot notation."
You generally shouldn't need to worry about dot notation, because you should use normal lists, not improper lists. But if you see an unexpected dot when Scheme prints out a data structure, it's a good guess that you used cons and gave it a non-list as its second argument--something besides another pair or ().
Scheme provides a handy procedure that creates proper lists, called list. list can take any number of arguments, and constructs a proper list with those elements in that order. You don't have to remember to supply the empty list---list automatically terminates the list that way.
scheme> (list 1 2 3 4)
(1 2 3 4)
Courtesy: An Introduction to Scheme
The definition of a list in Erlang is given in the manual - specifically Section 2.10
In Erlang the only thing you really need to know about improper lists is how to avoid them, and the way to do that is very simple - it is all down to the first 'thing' that you are going to build your list on. The following all create proper lists:
A = [].
B = [term()].
C = [term(), term(), term()].
In all these cases the syntax ensures that there is a hidden 'empty' tail which matches to '[]' sort of at the end....
So from them the following operations all produce a proper list:
X = [term() | A].
Y = [term() | B].
Z = [term() | C].
They are all operations which add a new head to a proper list.
What makes this useful is that you can feed each of X, Y or Z into a function like:
func([], Acc) -> Acc;
func([H | T], Acc) ->
    NewAcc = do_something(H),
    func(T, [NewAcc | Acc]).
And they will rip through the list and terminate on the top clause when the hidden empty list at the tail is all that is left.
The problem comes when your base list has been improperly made, like so:
D = [term1() | term2()]. % term2() is any term except a list
This list doesn't have the hidden empty list as the terminal tail, it has a term...
From here on downwards is mince as Robert Virding pointed out in the comments
So how do you write a terminal clause for it?
What makes it infuriating is that there is no way to see if a list is improper by inspecting it... print the damn thing out it looks good... So you end up creating an improper base list, doing some stuff on it, passing it around, and then suddenly kabloowie you have a crash miles from where the error is and you pull your hair and scream and shout...
But you should be using the dialyzer to sniff these little beasts out for you.
Apologies
Following Robert's comment I tried printing out an improper list and, lo and behold, it is obvious:
(arrian@localhost)5> A = [1, 2, 3, 4].
[1,2,3,4]
(arrian@localhost)5> B = [1, 2, 3 | 4].
[1,2,3|4]
(arrian@localhost)6> io:format("A is ~p~nB is ~p~n", [A, B]).
A is [1,2,3,4]
B is [1,2,3|4]
I had spent some time hunting an improper list once and had convinced myself it was invisible, well Ah ken noo!
To understand what an improper list is, you must first understand the definition of a proper list.
Specifically, the "neat discovery" of lists is that you can represent a list using only forms with a fixed number of elements, viz:
;; a list is either
;; - empty, or
;; - (cons v l), where v is a value and l is a list.
This "data definition" (using the terms of How To Design Programs) has all kinds of nice properties. One of the nicest is that if we define the behavior or meaning of a function on each "branch" of the data definition, we're guaranteed not to miss a case. More significantly, structures like this generally lead to nice clean recursive solutions.
The classic "length" example:
(define (length l)
  (cond [(empty? l) 0]
        [else (+ 1 (length (rest l)))]))
Of course, everything's prettier in Haskell:
length [] = 0
length (f:r) = 1 + length r
So, what does this have to do with improper lists?
Well, an improper list uses this data definition, instead:
;; an improper list is either
;; - a value, or
;; - (cons v l), where v is a value and l is an improper list
The problem is that this definition leads to ambiguity. In particular, the first and second cases overlap. Suppose I define "length" for an improper list thusly:
(define (length l)
(cond [(cons? l) (+ 1 (length (rest l)))]
[else 1]))
The problem is that I've destroyed the nice property that if I take two values and put them into an improper list with (cons a b), the result has length two. To see why, suppose I consider the values (cons 3 4) and (cons 4 5). The result is (cons (cons 3 4) (cons 4 5)), which may be interpreted either as the improper list containing (cons 3 4) and (cons 4 5), or as the improper list containing (cons 3 4), 4, and 5.
In a language with a more restrictive type system (e.g. Haskell), the notion of an "improper list" doesn't make quite as much sense; you could interpret it as a datatype whose base case has two things in it, which is probably not what you want, either.
I think possibly it refers to a "dotted pair" in LISP, i.e. a list whose final cons cell has an atom, rather than a reference to another cons cell or NIL, in the cdr.
EDIT
Wikipedia suggests that a circular list also counts as improper. See
http://en.wikipedia.org/wiki/Lisp_(programming_language)
and search for 'improper' and check the footnotes.
I would say the implication of an improper list is that a recursive treatment of the list will not match the typical termination condition.
For example, say you call the following sum, in Erlang, on an improper list:
sum([H|T]) -> H + sum(T);
sum([]) -> 0.
Then it will raise an exception since the last tail is not the empty list, but an atom.
In Common Lisp improper lists are defined as:
dotted lists that have a non-NIL terminating 'atom'.
Example
(a b c d . f)
or
circular lists
Example
#1=(1 2 3 . #1#)
A list is made up of cells, each cell consisting of two pointers. First one pointing to the data element, second one to the next cell, or nil at the end of the list.
If the second one does not point to a cell (or nil), the list is improper. Functional languages will most probably allow you to construct cells, so you should be able to generate improper lists.
In Erlang (and probably in other FP languages as well) you can save some memory by storing your 2-tuples as improper lists:
2> erts_debug:flat_size({1,2}).
3
3> erts_debug:flat_size([1|2]).
2
In Erlang, a proper list is one of the form [H|T], where H is the head of the list and T is the rest of the list, itself another proper list (or the empty list []).
An improper list does not conform to this definition.
In Erlang, a proper list is a singly linked list. An improper list is a singly linked list with the last node not being a real list node.
A proper list
[1, 2, 3] is like
An improper list
[1, 2 | 3] is like

Resources