Why is there no generic operators for Common Lisp? - common-lisp

In CL, we have many operators to check for equality that depend on the data type: =, string-equal, char=, then equal, eql and whatnot, so on for other data types, and the same for comparison operators (edit don't forget to answer about these please :) do we have generic <, > etc ? can we make them work for another object ?)
However the language has mechanisms to make them generic, for example generics (defgeneric, defmethod) as described in Practical Common Lisp. I imagine very well the same == operator that will work on integers, strings and characters, at least !
There have been work in that direction: https://common-lisp.net/project/cdr/document/8/cleqcmp.html
I see this as a major frustration, and even a wall, for beginners (of which I am), specially we who come from other languages like python where we use one equality operator (==) for every equality check (with the help of objects to make it so on custom types).
I read a blog post (not a monad tutorial, great serie) today pointing this. The guy moved to Clojure, for other reasons too of course, where there is one (or two?) operators.
So why is it so ? Is there any good reasons ? I can't even find a third party library, not even on CL21. edit: cl21 has this sort of generic operators, of course.
On other SO questions I read about performance. First, this won't apply to the little code I'll write so I don't care, and if you think so do you have figures to make your point ?
edit: despite the tone of the answers, it looks like there is not ;) We discuss in comments.

Kent Pitman has written an interesting article that tackles this subject: The Best of intentions, EQUAL rights — and wrongs — in Lisp.
And also note that EQUAL does work on integers, strings and characters. EQUALP also works for lists, vectors and hash tables an other Common Lisp types but objects… For some definition of work. The note at the end of the EQUALP page has a nice answer to your question:
Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. Although these functions take any type of argument and their names sound very generic, equal and equalp are not appropriate for every application.
Specifically note that there is a trick in my last “works” definition.

A newer library adds generic interfaces to standard Common Lisp functions: https://github.com/alex-gutev/generic-cl/
GENERIC-CL provides a generic function wrapper over various functions in the Common Lisp standard, such as equality predicates and sequence operations. The goal of the wrapper is to provide a standard interface to common operations, such as testing for the equality of two objects, which is extensible to user-defined types.
It does this for equality, comparison, arithmetic, objects, iterators, sequences, hash-tables, math functions,…
So one can define his own + operator for example.

Yes we have! eq works with all values and it works all the time. It does not depend on the data type at all. It is exactly what you are looking for. It's like the is operator in python. It must be exactly what you were looking for? All the other ones agree with eq when it's t, however they tend to be t for totally different values that have various levels of similarities.
(defparameter *a* "this is a string")
(defparameter *b* *a*)
(defparameter *c* "this is a string")
(defparameter *d* "THIS IS A STRING")
All of these are equalp since they contain the same meaning. equalp is perhaps the sloppiest of equal functions. I don't think 2 and 2.0 are the same, but equalp does. In my mind 2 is 2 while 2.0 is somewhere between 1.95 and 2.04. you see they are not the same.
equal understands me. (equal *c* *d*) is definitely nil and that is good. However it returns t for (equal *a* *c*) as well. Both are arrays of characters and each character are the same value, however the two strings are not the same object. they just happen to look the same.
Notice I'm using string here for every single one of them. We have 4 equal functions that tells you if two values have something in common, but only eq tells you if they are the same.
None of these are type specific. They work on all types, however they are not generics since they were around long before that was added in the language. You could perhaps make 3-4 generic equal functions but would they really be any better than the ones we already have?

Fortunately CL21 introduces (more) generic operators, particularly for sequences it defines length, append, setf, first, rest, subseq, replace, take, drop, fill, take-while, drop-while, last, butlast, find-if, search, remove-if, delete-if, reverse, reduce, sort, split, join, remove-duplicates, every, some, map, sum (and some more). Unfortunately the doc isn't great, it's best to look at the sources. Those should work at least for strings, lists, vectors and define methods of the new abstract-sequence.
see also
https://github.com/cl21/cl21/wiki
https://lispcookbook.github.io/cl-cookbook/cl21.html

Related

Moral of the story from SICP Ex. 1.20?

In this exercise we are asked to trace Euclid's algorithm using first normal order then applicative order evaluation.
(define (gcd a b)
(if (= b 0)
a
(gcd b (remainder a b))))
I've done the manual trace, and checked it with the several solutions available on the internet. I'm curious here to consolidate the moral of the exercise.
In gcd above, note that b is re-used three times in the function body, plus this function is recursive. This being what gives rise to 18 calls to remainder for normal order, in contrast to only 4 for applicative order.
So, it seems that when a function uses an argument more than once in its body, (and perhaps recursively! as here), then not evaluating it when the function is called (i.e. applicative order), will lead to redundant recomputation of the same thing.
Note that the question is at pains to point out that the special form if does not change its behaviour when normal order is used; that is, if will always run first; if this didn't happen, there could be no termination in this example.
I'm curious regarding delayed evaluation we are seeing here.
As a plus, it might allow us to handle infinite things, like streams.
As a minus, if we have a function like here, it can cause great inefficiency.
To fix the latter it seems like there are two conceptual options. One, wrap it in some data structure that caches its result to avoid recomputation. Two, selectively force the argument to realise when you know it will otherwise cause repeated recomputation.
The thing is, neither of those options seem very nice, because both represent additional "levers" the programmer must know how to use and choose when to use.
Is all of this dealt with more thoroughly later in the book?
Is there any straightforward consolidation of these points which would be worth making clear at this point (without perhaps going into all the detail that is to come).
The short answer is yes, this is covered extensively later in the book.
It is covered in most detail in Chapter 4 where we implement eval and apply, and so get the opportunity to modify their behaviour. For example Exercise 4.31 suggests the following syntax:
(define (f a (b lazy) c (d lazy-memo))
As you can see this identifies three different behaviours.
Parameters a and c have applicative order (they are evaluated before they are passed to the f).
b is normal or lazy, it is evaluated everytime it is used (but only if it is used).
d lazy but it's value it memoized so it is evaluated at most once.
Although this syntax is possible it isn't used. I think the philosopy is that the the language has an expected behaviour (applicative order) and that normal order is only used by default when necessary (e.g., the consequent and alternative of an if statement, and in creating streams). When it is necssary (or desirable) to have a parameter with normal evaluation then we can use delay and force, and if necessary memo-proc (e.g. Exercise 3.77).
So at this point the authors are introducing the ideas of normal and applicative order so that we have some familiarity with them by the time we get into the nitty gritty later on.
In a sense this a recurring theme, applicative order is probably more intuitive, but sometimes we need normal order. Recursive functions are simpler to write, but sometimes we need the performance of iterative functions. Programs where we can use the substition model are easier to reason about, but sometimes we need environmental model because we need mutable state.

Knowing when what you're looking at must be a macro

I know there is macro-function, explained here, which allows you to check, but is it also possible in simply reading lisp source to sometimes infer of what you're looking at "that must be a macro"? (assuming of course you have never seen the function/macro before).
I'm fairly sure the answer is yes, but as this seems so fundamental, I thought worth asking, especially because any nuances on this may be valuable & interesting to know about.
In Paul Graham's ANSI Common Lisp, p70, he is describing how to use defstruct.
When I see (defstruct point x y), were I to know absolutely nothing about what defstruct was, this could just as well be a function.
But when I see
(defstruct polemic
(subject "foo")
(effect "bar"))
I know that must be a macro because (let's assume), I also know that subject and effect are undefined functions. (I know that because they error with undefined function when called 'at the top level'(?)) (if that's the right term).
If the two list arguments to defstruct above were quoted, it would not be so simple. Because they're not quoted, it must be a macro.
Is it as simple as that?
I've changed the field names slightly from those used on the book to make this question clearer.
Finally, Graham writes:
"We can specify default values for structure fields by enclosing the field name and a default expression in a list in the original definition"
What I'm noticing is that that's true but it is not a (quoted) list. Would any readers of this post have phrased the above sentence at all differently (given that macros haven't been introduced in the book yet (though I have a basic awareness of what they are)).
My feeling is it's not a "data list" those default expressions are enclosed in. (apologies for bad terminology) - seeking how rightly to conceptualise here.
In general, you're right: if there's some nesting inside the call and you are sure that the car's of the nested lists aren't functions - it's a macro.
Also, almost always, def-something and with-something are macros.
But there's no guarantee. The question is, what are you trying to accomplish? Some code walking/transformation or external processing (like in an editor). For the latter, you should keep in mind that full control is possible only if you perform code evaluation, although heuristics (like in Emacs) can take you pretty far. Or you just want to develop your intuition for faster code reading...
There is a set of conventions that identify quite cleary what forms are supposed to be macros, simply by mimicking the syntax of existing macros or special operators of CL.
For example, the following is a mix of various imaginary macros, but even without knowing their definition, the code shouldn't be too hard to figure out:
(defun/typed example ((id (integer 0 10)))
(with-connection (connection (connect id))
(do-events (event connection)
(event-case event
(:quit (&optional code) (return code))))))
The usual advice about macros is to avoid them if possible, so if you spot something that doesn't make sense as a lisp expression, it probably is, or is enclosed in, a macro.
(defstruct point x y)
[...] were I to know absolutely nothing about what defstruct was, this could just as well be a function.
There are various hints that this is not a function. First of all, the name starts with def. Then, if defstruct was a function, then point, x and y would all be evaluated before calling the function, and that means the code would be relying on global variables, even though they are not wearing earmuffs (e.g. *point*, *x*, *y*), and you probably won't find any definition for them in the preceding forms (or later in the same compilation unit). Also, if it was a function, the result would be discarded directly since it is not used (this is a toplevel form). That only indicates the probable presence of side-effects, but still, this would be unusual.
A top-level function with side-effects would look like this instead, with quoted data:
(register-struct 'point '(x y))
Finally, there are cases where you cannot easily guess if you are using a macro or a function:
(my-get object :slot)
This could be a function call, or you could have a macro that turns the above to (aref object 0) (assuming :slot is the zeroth slot in object, because all your objects are assumed to be of a certain custom type backed by a vector). You could also have compiler macros. In case of doubt, try to macroexpand it and look at the documentation.

What does the jq notation <function>/<number> mean?

In various web pages, I see references to jq functions with a slash and a number following them. For example:
walk/1
I found the above notation used on a stackoverflow page.
I could not find in the jq Manual page a definition as to what this notation means. I'm guessing it might indicate that the walk function that takes 1 argument. If so, I wonder why a more meaningful notation isn't used such as is used with signatures in C++, Java, and other languages:
<function>(type1, type2, ..., typeN)
Can anyone confirm what the notation <function>/<number> means? Are other variants used?
The notation name/arity gives the name and arity of the function. "arity" is the number of arguments (i.e., parameters), so for example explode/0 means you'd just write explode without any arguments, and map/1 means you'd write something like map(f).
The fact that 0-arity functions are invoked by name, without any parentheses, makes the notation especially handy. The fact that a function name can have multiple definitions at any one time (each definition having a distinct arity) makes it easy to distinguish between them.
This notation is not used in jq programs, but it is used in the output of the (new) built-in filter, builtins/0.
By contrast, in some other programming languages, it (or some close variant, e.g. module:name/arity in Erlang) is also part of the language.
Why?
There are various difficulties which typically arise when attempting to graft a notation that's suitable for languages in which method-dispatch is based on types onto ones in which dispatch is based solely on arity.
The first, as already noted, has to do with 0-arity functions. This is especially problematic for jq as 0-arity functions are invoked in jq without parentheses.
The second is that, in general, jq functions do not require their arguments to be any one jq type. Having to write something like nth(string+number) rather than just nth/1 would be tedious at best.
This is why the manual strenuously avoids using "name(type)"-style notation. Thus we see, for example, startswith(str), rather than startswith(string). That is, the parameter names in the documentation are clearly just names, though of course they often give strong type hints.
If you're wondering why the 'name/arity' convention isn't documented in the manual, it's probably largely because the documentation was mostly written before jq supported multi-arity functions.
In summary -- any notational scheme can be made to work, but name/arity is (1) concise; (2) precise in the jq context; (3) easy-to-learn; and (4) widely in use for arity-oriented languages, at least on this planet.

Replacing an ordinary function with a generic function

I'd like to use names such as elt, nth and mapcar with a new data structure that I am prototyping, but these names designate ordinary functions and so, I think, would need to be redefined as generic functions.
Presumably it's bad form to redefine these names?
Is there a way to tell defgeneric not to generate a program error and to go ahead and replace the function binding?
Is there a good reason for these not being generic functions or is just historic?
What's the considered wisdom and best practice here please?
If you are using SBCL or ABCL, and aren't concerned with ANSI compliance, you could investigate Extensible Sequences:
http://www.sbcl.org/manual/#Extensible-Sequences
http://www.doc.gold.ac.uk/~mas01cr/papers/ilc2007/sequences-20070301.pdf
...you can't redefine functions in the COMMON-LISP package, but you could create a new package and shadow the imports of the functions you want to redefine.
Is there a good reason for these not being generic functions or is just historic?
Common Lisp has some layers of language in some of its areas. Higher-level parts of the software might need to be built on lower-level constructs.
One of its goals was being fast enough for a range of applications.
Common Lisp also introduced the idea of sequences, the abstraction over lists and vectors, at a time, when the language didn't have an object-system. CLOS came several years after the initial Common Lisp design.
Take for example something like equality - for numbers.
Lisp has =:
(= a b)
That's the fastest way to compare numbers. = is also defined only for numbers.
Then there are eql, equal and equalp. Those work for numbers, but also for some other data types.
Now, if you need more speed, you can declare the types and tell the compiler to generate faster code:
(locally
(declare (fixnum a b)
(optimize (speed 3) (safety 0)))
(= a b))
So, why is = not a CLOS generic function?
a) it was introduced when CLOS did not exist
but equally important:
b) in Common Lisp it wasn't known (and it still isn't) how to make a CLOS generic function = as fast as a non-generic function for typical usage scenarios - while preserving dynamic typing and extensibility
CLOS generic function simply have a speed penalty. The runtime dispatch costs.
CLOS is best used for higher level code, which then really benefits from features like extensibility, multi-dispatch, inheritance/combinations. Generic functions should be used for defined generic behavior - not as collections of similar methods.
With better implementation technology, implementation-specific language enhancements, etc. it might be possible to increase the range of code which can be written in a performant way using CLOS. This has been tried with programming languages like Dylan and Julia.
Presumably it's bad form to redefine these names?
Common Lisp implementations don't let you replace them just so. Be aware, that your replacement functions should be implemented in a way which works consistently with the old functions. Also, old versions could be inlined in some way and not be replaceable everywhere.
Is there a way to tell defgeneric not to generate a program error and to go ahead and replace the function binding?
You would need to make sure that the replacement is working while replacing it. The code replacing functions, might use those function you are replacing.
Still, implementations allow you to replace CL functions - but this is implementation specific. For example LispWorks provides the variables lispworks:*packages-for-warn-on-redefinition* and lispworks:*handle-warn-on-redefinition*. One can bind them or change them globally.
What's the considered wisdom and best practice here please?
There are two approaches:
use implementation specific ways to replace standard Common Lisp functions
This can be dangerous. Plus you need to support it for all implementations of CL you want to use...
use a language package, where you define your new language. Here this would be standard Common Lisp plus your extensions/changes. Export everything the user would use. In your software use this package instead of CL.

Haskell "collections" language design

Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient
with most of the list operations (except for the cons operation), and is used a lot;
syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.
Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matchign code.
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-speficic code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.
I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
3You might ask why we need a separate map function. Why not just do away with the current
list-only map function, and rename fmap to map instead? Well, that’s a good question. The
usual argument is that someone just learning Haskell, when using map incorrectly, would much
rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].
A nitpick, Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) equivalent to list (:), and its memory representation is also somewhat larger than a list as it is made from two data types a FingerTree and a Deep.
The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.
List isn't an ideal representation for Strings - the memory layout is inefficient as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
The other major lazy functional Clean has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerfomIO under the hood.
With version 7.8, ghc supports overloading list literals, compare the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).
I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names(mixfix!) a la Agda. Then, the syntax for list constructors (:,[]) wouldn't have been magic; allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change while migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.
import Prelude hiding(map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.

Resources