Remove duplicate strings from a list - common-lisp

I have a dead simple Common Lisp question: what is the idiomatic way of removing duplicates from a list of strings?
remove-duplicates works as I'd expect for numbers, but not for strings:
* (remove-duplicates '(1 2 2 3))
(1 2 3)
* (remove-duplicates '("one" "two" "two" "three"))
("one" "two" "two" "three")
I'm guessing there's some sense in which the strings aren't equal, most likely because although "foo" and "foo" are apparently identical, they're actually pointers to different structures in memory. I think my expectation here may just be a C hangover.

You have to tell remove-duplicates how it should compare the values. By default, it uses eql, which is not sufficient for strings. Pass the :test function as in:
(remove-duplicates your-sequence :test #'equal).
(Edit to address the question from the comments): As an alternative to equal, you could use string= in this example. This predicate is (in a way) less generic than equal and it might (could, probably, possibly, eventually...) thus be faster. A real benefit might be that string= can tell you if you pass a wrong value:
(equal 1 "foo")
happily yields nil, whereas
(string= 1 "foo")
gives a type-error condition. Note, though, that
(string= "FOO" :FOO)
is perfectly well defined (string= and its friends are defined in terms of "string designators", not strings), so type safety would go only so far here.
The standard eql predicate, on the other hand, is almost never the right way to compare strings. If you are familiar with the Java language, think of eql as using == while equal (or string=, etc.) calls the equals(Object) method. Though eql does some type introspection (as opposed to eq, which does not), for most (non-numeric) Lisp types, eql boils down to something like a pointer comparison, which is not sufficient if you want to discriminate values based on what they actually contain and not merely on where in memory they are located.
For the more Pythonically inclined, eq (and eql for non-numeric types) is more like the is operator, whereas equal is more like == which calls __eq__.
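To make the Python analogy above concrete, here is a minimal sketch. The helper remove_duplicates and its test parameter are hypothetical names chosen to mirror the :test argument; this is an illustration of the idea, not how any Lisp implements remove-duplicates. The strings are built at run time because CPython may share identical string literals, which would hide the effect.

```python
import operator


def remove_duplicates(items, test=operator.is_):
    """Keep an item only if no later item matches it under `test`.

    Hypothetical helper: the default identity test plays the role of the
    default eql test, and keeping the later occurrence is an assumption
    modelled on remove-duplicates without :from-end.
    """
    result = []
    for i, item in enumerate(items):
        if not any(test(item, later) for later in items[i + 1:]):
            result.append(item)
    return result


# Build each string at run time so that every element is its own object.
words = ["".join(chars) for chars in (["o", "n", "e"],
                                      ["t", "w", "o"],
                                      ["t", "w", "o"],
                                      ["t", "h", "r", "e", "e"])]

by_identity = remove_duplicates(words)                 # identity test
by_value = remove_duplicates(words, test=operator.eq)  # content test

print(by_identity)  # in CPython all four strings survive
print(by_value)     # ['one', 'two', 'three']
```

With the identity test the two "two" strings count as different objects, which is the same surprise the question ran into; switching the test to a content comparison removes the duplicate.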

Related

String Comparison in Common Lisp

I am new to Common Lisp and functional programming in general. I have a function, let's call it "wordToNumber", and I want it to check whether the input string is "one", "two", "three", etc. (0-9 only) and return 1, 2, 3, etc., so (wordToNumber "one") should output the number 1. I'm having some trouble with string comparison: I tried using eq and eql, but it's not working. From what I read, they compare memory locations, not the actual strings. Is there an easier way to go about this, or is there some way to compare strings? I need any examples to be purely functional programming, no loops and stuff. This is a small portion of a project I'm working on for school.
Oh, for string comparison I'm just using a simple function at the moment, like this:
(defun wordToNumber(x)
(if(eq 'x "one")(return-from wordToNumber 1)))
and calling it with this: (wordToNumber "one")
I keep getting NIL returned.
Thanks for any help.
The functions to compare strings are string= and string-equal, depending on whether you want the comparison to be case-sensitive.
And when you want to compare the value of a variable, you mustn't quote it, since the purpose of quoting is to prevent evaluation.
(defun word-to-number (x)
  (cond ((string-equal x "one") 1)
        ((string-equal x "two") 2)
        ((string-equal x "three") 3)
        ...
        ))
As a practical matter, before you make a ten-branch conditional, consider this: you can pass string= and string-equal (and any other binary function) as a :test argument to most of the sequence functions. Look through the sequence functions and see if there's something that seems relevant to this problem. http://l1sp.org/cl/17.3 (There totally is!)
One nice thing about Lisp is the apropos function. Lisp is a big language and usually has what you want, and (apropos "string") would probably have worked for you. I also recommend the Lisp HyperSpec: http://www.lispworks.com/documentation/HyperSpec/Front/
eq is good for symbols, CLOS objects, and even cons cells but be careful: (eq (list 1) (list 1)) is false because each list form returns a different cons pointing to the same number.
eql is fine for numbers and characters and anything eq can handle. One nice thing is that (eql x 42) works even if x is not a number, in which case (= x 42) would not go well.
You need equal for lists and strings. Strings are arrays, but note that equal compares only strings and bit vectors element by element; other arrays are compared with eq. Then there is equalp, which I will leave as an exercise.
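The hint above about replacing the ten-branch conditional with a sequence lookup can be sketched in Python. This is only an illustration under assumptions: DIGIT_WORDS and word_to_number are hypothetical names, and lower-casing the input is a rough stand-in for the case-insensitive comparison that string-equal performs.

```python
# Hypothetical lookup table: the list index doubles as the numeric value.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def word_to_number(word):
    """Return 0-9 for a digit word (ignoring case), or None if unknown."""
    lowered = word.lower()
    return DIGIT_WORDS.index(lowered) if lowered in DIGIT_WORDS else None


print(word_to_number("one"))    # 1
print(word_to_number("THREE"))  # 3
print(word_to_number("ten"))    # None
```

The position of the word in the list is the answer, so no explicit branching or looping is needed in the caller's code.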

What's the difference between a sequence and a collection in Clojure

I am a Java programmer and am new to Clojure. From different places, I saw sequence and collection are used in different cases. However, I have no idea what the exact difference is between them.
For some examples:
1) In Clojure's documentation for Sequence:
The Seq interface
(first coll)
Returns the first item in the collection.
Calls seq on its argument. If coll is nil, returns nil.
(rest coll)
Returns a sequence of the items after the first. Calls seq on its argument.
If there are no more items, returns a logical sequence for which seq returns nil.
(cons item seq)
Returns a new seq where item is the first element and seq is the rest.
As you can see, when describing the Seq interface, the first two functions (first/rest) use coll, which seems to indicate this is a collection, while the cons function uses seq, which seems to indicate this is a sequence.
2) There are functions called coll? and seq? that can be used to test whether a value is a collection or a sequence. Clearly, collection and sequence are different.
3) In Clojure's documentation about 'Collections', it is said:
Because collections support the seq function, all of the sequence
functions can be used with any collection
Does this mean all collections are sequences?
(coll? [1 2 3]) ; => true
(seq? [1 2 3]) ; => false
The code above tells me that is not the case, because [1 2 3] is a collection but is not a sequence.
I think this is a pretty basic question for Clojure, but I am not able to find a place that explains clearly what the difference is and which one I should use in different cases. Any comment is appreciated.
Any object supporting the core first and rest functions is a sequence.
Many objects satisfy this interface and every Clojure collection provides at least one kind of seq object for walking through its contents using the seq function.
So:
user> (seq [1 2 3])
(1 2 3)
And you can create a sequence object from a map too
user> (seq {:a 1 :b 2})
([:a 1] [:b 2])
That's why you can use filter, map, for, etc. on maps, sets, and so on.
So you can treat many collection-like objects as sequences.
That's also why many sequence handling functions such as filter call seq on the input:
(defn filter
"Returns a lazy sequence of the items in coll for which
(pred item) returns true. pred must be free of side-effects."
{:added "1.0"
:static true}
([pred coll]
(lazy-seq
(when-let [s (seq coll)]
If you call (filter pred 5), you get:
Don't know how to create ISeq from: java.lang.Long
RT.java:505 clojure.lang.RT.seqFrom
RT.java:486 clojure.lang.RT.seq
core.clj:133 clojure.core/seq
core.clj:2523 clojure.core/filter[fn]
You can see that the seq call is the "is this object a sequence?" validation.
Most of this stuff is in Joy of Clojure chapter 5 if you want to go deeper.
Here are a few points that will help in understanding the difference between collection and sequence.
"Collection" and "Sequence" are abstractions, not a property that can be determined from a given value.
Collections are bags of values.
Sequence is a data structure (subset of collection) that is expected to be accessed in a sequential (linear) manner.
The figure below best describes the relation between them:
You can read more about it here.
Every sequence is a collection, but not every collection is a sequence.
The seq function makes it possible to convert a collection into a sequence. E.g. for a map you get a list of its entries. That list of entries is different from the map itself, though.
In Clojure for the brave and true the author sums it up in a really understandable way:
The collection abstraction is closely related to the sequence
abstraction. All of Clojure's core data structures — vectors, maps,
lists and sets — take part in both abstractions.
The abstractions differ in that the sequence abstraction is "about"
operating on members individually while the collection abstraction is
"about" the data structure as a whole. For example, the collection
functions count, empty?, and every? aren't about any individual
element; they're about the whole.
I have just been through Chapter 5 - "Collection Types" of "The Joy of Clojure", which is a bit confusing (i.e. the next version of that book needs a review). In Chapter 5, on page 86, there is a table which I am not fully happy with:
So here's my take (fully updated after coming back to this after a month of reflection).
collection
It's a "thing", a collection of other things.
This is based on the function coll?.
The function coll? can be used to test for this.
Conversely, anything for which coll? returns true is a collection.
The coll? docstring says:
Returns true if x implements IPersistentCollection
Things that are collections are grouped into three separate classes. Things in different classes are never equal.
Maps. Test using (map? foo)
  Map (two actual implementations with slightly differing behaviours)
  Sorted map. Note: (sequential? (sorted-map :a 1)) ;=> false
Sets. Test using (set? foo)
  Set
  Sorted set. Note: (sequential? (sorted-set :a :b)) ;=> false
Sequential collections. Test using (sequential? foo)
  List
  Vector
  Queue
  Seq: (sequential? (seq [1 2 3])) ;=> true
  Lazy-Seq: (sequential? (lazy-seq (seq [1 2 3]))) ;=> true
The Java interop stuff is outside of this:
(coll? (to-array [1 2 3])) ;=> false
(map? (doto (new java.util.HashMap) (.put "a" 1) (.put "b" 2))) ;=> false
sequential collection (a "chain")
It's a "thing", a collection holding other things according to a specific, stable ordering.
This is based on the function sequential?.
The function sequential? can be used to test for this.
Conversely, anything for which sequential? returns true is a sequential collection.
The sequential? docstring says:
Returns true if coll implements Sequential
Note: "sequential" is an adjective! In "The Joy of Clojure", the adjective is used as a noun and this is really, really, really confusing:
"Clojure classifies each collection data type into one of three
logical categories or partitions: sequentials, maps, and sets."
Instead of "sequential" one should use a "sequential thing" or a "sequential collection" (as used above). On the other hand, in mathematics the following words already exist: "chain", "totally ordered set", "simply ordered set", "linearly ordered set". "chain" sounds excellent but no-one uses that word. Shame!
"Joy of Clojure" also has this to say:
Beware type-based predicates!
Clojure includes a few predicates with names like the words just
defined. Although they’re not frequently used, it seems worth
mentioning that they may not mean exactly what the definitions here
might suggest. For example, every object for which sequential? returns
true is a sequential collection, but it returns false for some that
are also sequential [better: "that can be considered sequential
collections"]. This is because of implementation details that may be
improved in a future version of Clojure [and maybe this has already been
done?]
sequence (also "sequence abstraction")
This is more a concept than a thing: a series of values (thus ordered) which may or may not exist yet (i.e. a stream). If you say that a thing is a sequence, is that thing also necessarily a Clojure collection, even a sequential collection? I suppose so.
That sequential collection may have been completely computed and be completely available. Or it may be a "machine" to generate values on need (by computation - likely in a "pure" fashion - or by querying external "impure", "oracular" sources: keyboard, databases)
seq
This is a thing: something that can be processed by the functions
first, rest, next, cons (and possibly others?), i.e. something that obeys the protocol clojure.lang.ISeq (which is about the same concept as "providing an implementation for an interface" in Java), i.e. the system has registered function implementations for a pair (thing, function-name) [I sure hope I get this right...]
This is based on the function seq?.
The function seq? can be used to test for this
Conversely, a seq is anything for which seq? returns true.
Docstring for seq?:
Return true if x implements ISeq
Docstring for first:
Returns the first item in the collection. Calls seq on its argument.
If coll is nil, returns nil.
Docstring for rest:
Returns a possibly empty seq of the items after the first. Calls seq
on its argument.
Docstring for next:
Returns a seq of the items after the first. Calls seq on its argument.
If there are no more items, returns nil.
You call next on the seq to generate the next element and a new seq. Repeat until nil is obtained.
Joy of Clojure calls this a "simple API for navigating collections" and says "a seq is any object that implements the seq API", which is correct if "the API" is the ensemble of the "thing" (of a certain type) and the functions which work on that thing. It depends on a suitable shift in the concept of API.
A note on the special case of the empty seq:
(def empty-seq (rest (seq [:x])))
(type empty-seq) ;=> clojure.lang.PersistentList$EmptyList
(nil? empty-seq) ;=> false ... empty seq is not nil
(some? empty-seq) ;=> true ("true if x is not nil, false otherwise.")
(first empty-seq) ;=> nil ... first of empty seq is nil ("does not exist"); beware confusing this with a nil in a nonempty list!
(next empty-seq) ;=> nil ... "next" of empty seq is nil
(rest empty-seq) ;=> () ... "rest" of empty seq is the empty seq
(type (rest empty-seq)) ;=> clojure.lang.PersistentList$EmptyList
(seq? (rest empty-seq)) ;=> true
(= (rest empty-seq) empty-seq) ;=> true
(count empty-seq) ;=> 0
(empty? empty-seq) ;=> true
Addenda
The function seq
If you apply the function seq to a thing for which that makes sense (generally a sequential collection), you get a seq representing/generating the members of that collection.
The docstring says:
Returns a seq on the collection. If the collection is empty, returns
nil. (seq nil) returns nil. seq also works on Strings, native Java
arrays (of reference types) and any objects that implement Iterable.
Note that seqs cache values, thus seq should not be used on any
Iterable whose iterator repeatedly returns the same mutable object.
After applying seq, you may get objects of various actual classes:
clojure.lang.Cons - try (class (seq (map #(* % 2) '( 1 2 3))))
clojure.lang.PersistentList
clojure.lang.APersistentMap$KeySeq
clojure.lang.PersistentList$EmptyList
clojure.lang.PersistentHashMap$NodeSeq
clojure.lang.PersistentQueue$Seq
clojure.lang.PersistentVector$ChunkedSeq
If you apply seq to a sequence, the actual class of the thing returned may be different from the actual class of the thing passed in. It will still be a sequence.
What the "elements" in the sequence are depends on the collection. For example, for maps, they are key-value pairs which look like 2-element vectors (but their actual class is not really a vector).
The function lazy-seq
Creates a thing to generate more things lazily (a suspended machine, a suspended stream, a thunk)
The docstring says:
Takes a body of expressions that returns an ISeq or nil, and yields a
Seqable object that will invoke the body only the first time seq is
called, and will cache the result and return it on all subsequent seq
calls. See also - realized?
A note on "functions" and "things" ... and "objects"
In the Clojure Universe, I like to talk about "functions" and "things", but not about "objects", which is a term heavily laden with Java-ness and other badness. Mention of objects feels like shards poking up from the underlying Java universe.
What is the difference between function and thing?
It's fluid! Some stuff is pure function, some stuff is pure thing, some is in between (can be used as function and has attributes of a thing)
In particular, Clojure allows contexts where one considers keywords (things) as functions (to look up values in maps) or where one interprets maps (things) as functions, or shorthand for functions (which take a key and return the value associated with that key in the map).
Evidently, functions are things as they are "first-class citizens".
It's also contextual! In some contexts, a function becomes a thing, or a thing becomes a function.
There are nasty mentions of objects ... these are shards poking up from the underlying Java universe.
For presentation purposes, a diagram of Collections
For seq?:
Return true if x implements ISeq
For coll?:
Returns true if x implements IPersistentCollection
And I found that the ISeq interface extends IPersistentCollection in the Clojure source code, so as Rörd said, every sequence is a collection.

What is the difference between eq?, eqv?, equal?, and = in Scheme?

I wonder what the difference is between those operations in Scheme. I have seen similar questions in Stack Overflow but they are about Lisp, and there is not a comparison between three of those operators.
I am writing the different types of commands in Scheme, and I get the following outputs:
(eq? 5 5) -->#t
(eq? 2.5 2.5) -->#f
(equal? 2.5 2.5) --> #t
(= 2.5 2.5) --> #t
Why is this the case?
I'll answer this question incrementally. Let's start with the = equivalence predicate. The = predicate is used to check whether two numbers are equal. If you supply it anything else but a number then it will raise an error:
(= 2 3) => #f
(= 2.5 2.5) => #t
(= '() '()) => error
The eq? predicate is used to check whether its two parameters represent the same object in memory. For example:
(define x '(2 3))
(define y '(2 3))
(eq? x y) => #f
(define y x)
(eq? x y) => #t
Note however that there's only one empty list '() (in many implementations it is not an allocated object at all but a special immediate value, for example a distinguished pointer). Hence when comparing empty lists eq? will always return #t (because they represent the same object):
(define x '())
(define y '())
(eq? x y) => #t
Now depending upon the implementation eq? may or may not return #t for primitive values such as numbers, strings, etc. For example:
(eq? 2 2) => depends upon the implementation
(eq? "a" "a") => depends upon the implementation
This is where the eqv? predicate comes into the picture. eqv? is exactly the same as the eq? predicate, except that it will always return #t for the same number or the same character (strings remain implementation-dependent). For example:
(eqv? 2 2) => #t
(eqv? "a" "a") => depends upon the implementation
Hence eqv? is a superset of eq? and for most cases you should use eqv? instead of eq?.
Finally we come to the equal? predicate. The equal? predicate is exactly the same as the eqv? predicate, except that it can also be used to test whether two lists, vectors, etc. have corresponding elements which satisfy the eqv? predicate. For example:
(define x '(2 3))
(define y '(2 3))
(equal? x y) => #t
(eqv? x y) => #f
In general:
Use the = predicate when you wish to test whether two numbers are equivalent.
Use the eqv? predicate when you wish to test whether two non-numeric values are equivalent.
Use the equal? predicate when you wish to test whether two lists, vectors, etc. are equivalent.
Don't use the eq? predicate unless you know exactly what you're doing.
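As a loose cross-language illustration of the identity-versus-structure distinction summarised above, the list examples can be mirrored in Python, where is compares object identity (roughly the role of eq?) and == compares contents (roughly the role of equal? on lists). This is only an analogy: Python has no direct counterpart to eqv?, and the variable names are hypothetical.

```python
x = [2, 3]
y = [2, 3]

same_object_before = x is y  # two separately built lists: different objects
same_contents = x == y       # element-wise comparison: equal contents

y = x                        # now both names refer to one list
same_object_after = x is y

print(same_object_before)  # False
print(same_contents)       # True
print(same_object_after)   # True
```

Much like (eq? 2 2), the result of is on two equal small integers in Python depends on the implementation, which is one more reason to reserve identity tests for cases where identity is what you really mean.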
There are a full two pages in the RnRS specification related to eq?, eqv?, equal? and =. Here is the Draft R7RS Specification. Check it out!
Explanation:
= compares numbers, 2.5 and 2.5 are numerically equal.
equal? for numbers reduces to =, 2.5 and 2.5 are numerically equal.
eq? compares 'pointers'. The number 5, in your Scheme implementation, is implemented as an 'immediate' (likely), thus 5 and 5 are identical. The number 2.5 may require an allocation of a 'floating point record' in your Scheme implementation, the two pointers are not identical.
eq? is #t when both arguments are the same address/object. Normally one can expect #t for the same symbol, boolean, or object, and #f for values that are of different types, have different values, or are not the same structure. Scheme/Lisp implementations have a tradition of embedding the type in their pointers and embedding values in the same space if there is enough room. Thus some pointers are not really addresses but values, like the char R or the fixnum 10. These will be eq? since the "address" is an embedded type+value. Some implementations also reuse immutable constants: (eq? '(1 2 3) '(1 2 3)) might be #f when interpreted but #t when compiled, since both constants might get the same address (like the constant String pool in Java). Because of this, many expressions involving eq? are unspecified, so whether they evaluate to #t or #f is implementation dependent.
eqv? is #t for the same things as eq?. It is also #t if both arguments are numbers or characters with the same value, even when the data is too big to fit into a pointer. Thus for those, eqv? does the extra work of checking that the type is one of the supported ones, that both arguments are of the same type, and that their target objects have the same data value.
equal? is #t for the same things as eqv?, and if the arguments are of a compound type like pair, vector, string, or bytevector, it recursively applies equal? to the parts. In practice it will return #t if the two objects look the same. Prior to R6RS, it's unsafe to use equal? on circular structures.
= is like eqv? but it only works for numeric types. It might be more efficient.
string=? is like equal?, but it only works for strings. It might be more efficient.
equal? recursively compares two objects (of any type) for equality.
Note this could be expensive for a large data structure since potentially the entire list, string, vector, etc must be traversed.
If the object just contains a single element (EG: number, character, etc), this is the same as eqv?.
eqv? tests two objects to determine if both are "normally regarded as the same object".
eqv? and eq? are very similar operations, and the differences between them are going to be somewhat implementation specific.
eq? is the same as eqv? but may be able to discern finer distinctions, and may be implemented more efficiently.
According to the spec, this might be implemented as a fast and efficient pointer comparison, as opposed to a more complicated operation for eqv?.
= compares numbers for numerical equality.
Note that more than two numbers can be provided, eg: (= 1 1.0 1/1 2/2)
You don't mention a scheme implementation, but in Racket, eq? only returns true if the arguments refer to the same object. Your second example is yielding #f because the system is creating a new floating point number for each argument; they're not the same object.
equal? and = are checking for value equivalence, but = is only applicable to numbers.
If you're using Racket, check here for more information. Otherwise, check the documentation of your scheme implementation.
Think of eq? as pointer equality. The authors of the Report want it to be as general as possible so they don't say this outright because it's implementation-dependent, and to say it, would favor the pointer-based implementations. But they do say
It will usually be possible to implement eq? much more efficiently than eqv?, for example, as a simple pointer comparison
Here's what I mean. (eqv? 2 2) is guaranteed to return #t but (eq? 2 2) is unspecified. Now imagine a pointer-based implementation. In it eq? is just pointer comparison. Since (eq? 2 2) is unspecified, it means that this implementation is free to just create new memory object representation of each new number it reads from the source code. eqv? must actually inspect its arguments.
OTOH (eq? 'a 'a) is #t. This means that such an implementation must recognize symbols with duplicate names and use one and the same representation object in memory for all of them.
Suppose an implementation is not pointer-based. As long as it adheres to the Report, it doesn't matter. The authors just don't want to be seen as dictating the specifics of implementations to the implementors, so they choose their wording carefully.
This is my guess anyway.
So very coarsely, eq? is pointer equality, eqv? is (atomic-)values-aware, equal? is also structure-aware (checks into its arguments recursively, so that finally (equal? '(a) '(a)) is required to be #t), = is for numbers, string=? is for strings, and the details are in the Report.
Apart from the previous answers, I will add some comments.
All these predicates want to define the abstract function of identity for an object, but in different contexts.
EQ? is implementation-dependent and answers the question "are 2 objects the same?" only in limited cases. From the implementation point of view, this predicate just compares 2 numbers (pointers to objects); it does not look at the content of the objects. So, for example, if your implementation does not keep a unique copy of each string but allocates different memory for each string, then (eq? "a" "a") will be false.
EQV? looks inside the objects, but also only in limited cases. It is implementation-dependent whether it returns true for (eqv? (lambda(x) x) (lambda(x) x)). How to define this predicate is a whole philosophy of its own, as we know nowadays that there are some fast methods to compare the functionality of some functions, in limited cases. But eqv? provides a coherent answer for big numbers, characters, etc.
Practically, some of these predicates try to use the abstract definition of an object (mathematically), while others use the representation of an object (how it's implemented on a real machine). The mathematical definition of identity comes from Leibniz and it says:
X = Y iff for any P, P(X) = P(Y)
X, Y being objects and
P being any property associated with object X and Y.
Ideally one would implement this very definition on a computer, but for reasons of undecidability and/or speed it is not implemented literally. This is why there are lots of operators, each of which tries to focus on a different viewpoint around this definition.
Try to imagine the abstract definition of an identity for a continuation. Even if you can provide a definition of a subset of functions (sigma-recursive class of functions), the language does not impose any predicate to be true or false. It would complicate a lot both the definition of the language and much more the implementation.
The context for the other predicates is easier to analyze.

Why is foldl defined in a strange way in Racket?

In Haskell, like in many other functional languages, the function foldl is defined such that, for example, foldl (-) 0 [1,2,3,4] = -10.
This is OK, because foldl (-) 0 [1, 2,3,4] is, by definition, ((((0 - 1) - 2) - 3) - 4).
But, in Racket, (foldl - 0 '(1 2 3 4)) is 2, because Racket "intelligently" calculates like this: (4 - (3 - (2 - (1 - 0)))), which indeed is 2.
Of course, if we define auxiliary function flip, like this:
(define (flip bin-fn)
(lambda (x y)
(bin-fn y x)))
then we could in Racket achieve the same behavior as in Haskell: instead of (foldl - 0 '(1 2 3 4)) we can write: (foldl (flip -) 0 '(1 2 3 4))
The question is: Why is foldl in racket defined in such an odd (nonstandard and nonintuitive) way, differently than in any other language?
The Haskell definition is not uniform. In Racket, the function passed to both folds takes its inputs in the same order, and therefore you can just replace foldl by foldr and get the same result. If you do that with the Haskell version you'd get a different result (usually), and you can see this in the different types of the two.
(In fact, I think that in order to do a proper comparison you should avoid these toy numeric examples where both of the type variables are integers.)
This has the nice byproduct where you're encouraged to choose either foldl or foldr according to their semantic differences. My guess is that with Haskell's order you're likely to choose according to the operation. You have a good example for this: you've used foldl because you want to subtract each number — and that's such an "obvious" choice that it's easy to overlook the fact that foldl is usually a bad choice in a lazy language.
Another difference is that the Haskell version is more limited than the Racket version in the usual way: it operates on exactly one input list, whereas Racket can accept any number of lists. This makes it more important to have a uniform argument order for the input function.
Finally, it is wrong to assume that Racket diverged from "many other functional languages", since folding is far from a new trick, and Racket has roots that are far older than Haskell (or these other languages). The question could therefore go the other way: why is Haskell's foldl defined in a strange way? (And no, (-) is not a good excuse.)
Historical update:
Since this seems to bother people again and again, I did a little bit of legwork. This is not definitive in any way, just my second-hand guessing. Feel free to edit this if you know more, or even better, email the relevant people and ask. Specifically, I don't know the dates where these decisions were made, so the following list is in rough order.
First there was Lisp, and no mention of "fold"ing of any kind. Instead, Lisp has reduce which is very non-uniform, especially if you consider its type. For example, :from-end is a keyword argument that determines whether it's a left or a right scan and it uses different accumulator functions which means that the accumulator type depends on that keyword. This is in addition to other hacks: usually the first value is taken from the list (unless you specify an :initial-value). Finally, if you don't specify an :initial-value, and the list is empty, it will actually apply the function on zero arguments to get a result.
All of this means that reduce is usually used for what its name suggests: reducing a list of values into a single value, where the two types are usually the same. The conclusion here is that it's serving a kind of a similar purpose to folding, but it's not nearly as useful as the generic list iteration construct that you get with folding. I'm guessing that this means that there's no strong relation between reduce and the later fold operations.
The first relevant language that follows Lisp and has a proper fold is ML. The choice that was made there, as noted in newacct's answer below, was to go with the uniform types version (ie, what Racket uses).
The next reference is Bird & Wadler's ItFP (1988), which uses different types (as in Haskell). However, they note in the appendix that Miranda has the same type (as in Racket).
Miranda later on switched the argument order (ie, moved from the Racket order to the Haskell one). Specifically, that text says:
WARNING - this definition of foldl differs from that in older versions of Miranda. The one here is the same as that in Bird and Wadler (1988). The old definition had the two args of `op' reversed.
Haskell took a lot of stuff from Miranda, including the different types. (But of course I don't know the dates so maybe the Miranda change was due to Haskell.) In any case, it's clear at this point that there was no consensus, hence the reversed question above holds.
OCaml went with the Haskell direction and uses different types.
I'm guessing that "How to Design Programs" (aka HtDP) was written at roughly the same period, and they chose the same type. There is, however, no motivation or explanation — and in fact, after that exercise it's simply mentioned as one of the built-in functions.
Racket's implementation of the fold operations was, of course, the "built-ins" that are mentioned here.
Then came SRFI-1, and the choice was to use the same-type version (as Racket). This decision was questioned by John David Stone, who points at a comment in the SRFI that says
Note: MIT Scheme and Haskell flip F's arg order for their reduce and fold functions.
Olin later addressed this: all he said was:
Good point, but I want consistency between the two functions.
state-value first: srfi-1, SML
state-value last: Haskell
Note in particular his use of state-value, which suggests a view where consistent types are a possibly more important point than operator order.
"differently than in any other language"
As a counter-example, Standard ML (ML is a very old and influential functional language)'s foldl also works this way: http://www.standardml.org/Basis/list.html#SIG:LIST.foldl:VAL
Racket's foldl and foldr (and also SRFI-1's fold and fold-right) have the property that
(foldr cons null lst) = lst
(foldl cons null lst) = (reverse lst)
I speculate the argument order was chosen for that reason.
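The cons property above can be checked with a small Python sketch. Everything here is a stand-in under stated assumptions: foldl and foldr are hypothetical helpers that pass (element, accumulator) to proc in the Racket order, and cons is modelled as prepending to a Python list rather than building real pairs.

```python
from functools import reduce


def foldl(proc, init, items):
    # Racket-style order: proc receives (element, accumulator),
    # with elements consumed from left to right.
    return reduce(lambda acc, x: proc(x, acc), items, init)


def foldr(proc, init, items):
    # Same proc signature, but elements are consumed from right to left.
    return reduce(lambda acc, x: proc(x, acc), reversed(items), init)


def cons(x, rest):
    # Stand-in for cons: put x in front of an existing list.
    return [x] + rest


lst = [1, 2, 3]
copied = foldr(cons, [], lst)
backwards = foldl(cons, [], lst)

print(copied)     # [1, 2, 3]
print(backwards)  # [3, 2, 1]
```

Because both folds accept the same proc, swapping one for the other needs no change to the function being folded, which is the uniformity the answers point to.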
From the Racket documentation, the description of foldl:
(foldl proc init lst ...+) → any/c
Two points of interest for your question are mentioned:
the input lsts are traversed from left to right
And
foldl processes the lsts in constant space
I'm gonna speculate on what the implementation of that might look like, with a single list for simplicity's sake:
(define (my-foldl proc init lst)
  (define (iter lst acc)
    (if (null? lst)
        acc
        (iter (cdr lst) (proc (car lst) acc))))
  (iter lst init))
As you can see, the requirements of left-to-right traversal and constant space are met (notice the tail recursion in iter), but the order of the arguments for proc was never specified in the description. Hence, the result of calling the above code would be:
(my-foldl - 0 '(1 2 3 4))
> 2
If we had specified the order of the arguments for proc in this way:
(proc acc (car lst))
Then the result would be:
(my-foldl - 0 '(1 2 3 4))
> -10
My point is, the documentation for foldl doesn't make any assumptions on the evaluation order of the arguments for proc, it only has to guarantee that constant space is used and that the elements in the list are evaluated from left to right.
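The effect of the two argument orders discussed above can be reproduced side by side in Python. This is a sketch, not either language's actual implementation: foldl_element_first and foldl_accumulator_first are hypothetical names for the (proc (car lst) acc) and (proc acc (car lst)) variants.

```python
import operator
from functools import reduce


def foldl_element_first(proc, init, items):
    # proc is called as proc(element, accumulator), as in my-foldl above.
    return reduce(lambda acc, x: proc(x, acc), items, init)


def foldl_accumulator_first(proc, init, items):
    # proc is called as proc(accumulator, element), as in Haskell's foldl.
    return reduce(proc, items, init)


element_first = foldl_element_first(operator.sub, 0, [1, 2, 3, 4])
accumulator_first = foldl_accumulator_first(operator.sub, 0, [1, 2, 3, 4])

print(element_first)      # 2, i.e. 4 - (3 - (2 - (1 - 0)))
print(accumulator_first)  # -10, i.e. (((0 - 1) - 2) - 3) - 4
```

Both versions walk the list from left to right with a single accumulator; only the order in which the two values are handed to proc differs.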
As a side note, you can get the desired evaluation order for your expression by simply writing this:
(- 0 1 2 3 4)
> -10

Does REMOVE ever return the same sequence, in practice?

Does REMOVE ever return the same sequence in any real implementations of Common Lisp? The spec suggests that it is allowed:
The result of remove may share with
sequence; the result may be identical
to the input sequence if no elements
need to be removed.
SBCL does not seem to do this, for example, but I only did a crude (and possibly insufficient) test, and I'm wondering what other implementations do.
CL-USER> (defparameter *str* "bbb")
*STR*
CL-USER> *str*
"bbb"
CL-USER> (defparameter *str2* (remove #\a *str*))
*STR2*
CL-USER> (eq *str* *str2*)
NIL
CL-USER> *str*
"bbb"
CL-USER> *str2*
"bbb"
Returning the original string could be useful. In case no element of a string gets removed, returning the original sequence prevents allocation of a new sequence. Even if a new sequence has been allocated internally, this new sequence could be turned into garbage as soon as possible.
CLISP for example returns the original string.
[1]> (let ((a "abc")) (eq a (remove #\d a)))
T
I suspect it mostly depends on the implementation. On the whole, I suspect it's not that common: the typical case is that something gets removed when REMOVE is called, so a space optimisation for the nothing-removed case would incur a run-time penalty without necessarily saving any space, since you'd want to allocate space for the return value for strings and arrays and would either need to construct a list as you go or do a two-pass operation.

Resources