Haskell "collections" language design - collections

Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient
with most of the list operations (except for the cons operation), and is used a lot;
syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.

Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matchign code.
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-speficic code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.

I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
3You might ask why we need a separate map function. Why not just do away with the current
list-only map function, and rename fmap to map instead? Well, that’s a good question. The
usual argument is that someone just learning Haskell, when using map incorrectly, would much
rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].

A nitpick, Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) equivalent to list (:), and its memory representation is also somewhat larger than a list as it is made from two data types a FingerTree and a Deep.
The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.
List isn't an ideal representation for Strings - the memory layout is inefficient as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
The other major lazy functional Clean has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerfomIO under the hood.

With version 7.8, ghc supports overloading list literals, compare the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).

I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names(mixfix!) a la Agda. Then, the syntax for list constructors (:,[]) wouldn't have been magic; allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change while migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.
import Prelude hiding(map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.

Related

Where does the word "flatMap" originate from?

Nowdays flatMap is the most widely used name for correspondent operation on monad-like objects.
But I can't find where it has appeared for the first time and what has popularized it.
The oldest appearance I know about is in Scala.
In Haskell it is called bind.
In category theory Greek notation is used.
Partial answer, which hopefully provides some useful "seed nodes" to start more thorough search. My best guess:
1958 for map used for list processing,
1988 for flatten used in context of monads,
2004 for flatMap used as important method backing for-comprehensions in Scala.
The function / method name flatMap seems to be a portmanteau word composed from flatten and map. This makes sense, because whenever M is some monad, A,B some types, and a: M[A], f: A => M[B] a value and a function, then the implementations of map, flatMap and flatten should satisfy
a.flatMap(f) = a.map(f).flatten
(in Scala-syntax).
Let's first consider the both components map and flatten separately.
Map
The map-function seems to have been used to map over lists since time immemorial. My best guess would be that it came from Lisp (around 1958), and then spread to all other languages that had anything resembling higher-order functions.
Flatten
Given how many things are represented by lists in Lisp, I assume that flatten has also been used there for list processing.
The usage of flatten in context of monads must be much more recent, because the monads themselves have been introduced in programming quite a bit later. If we are looking for the usage of word "flatten" in the context of monadic computations, we probably should at least check the papers by Eugenio Moggi. Indeed, in "Computational Lambda-Calculus and Monads" from 1988, he uses the formulation:
Remark 2.2: Intuitively eta_A: A -> TA gives the inclusion of values into computations, while mu_A: T^2 A -> TA flatten a computation of a computation into a computation.
(typesetting changed by me, emphasis mine, text in italic as in original). I think it's interesting that Moggi talks about flattening computations, and not just lists.
Math notation / "Greek"
On the Greek used in mathematical notation: in category theory, the more common way to introduce monads is through the natural transformations that correspond to pure and flatten, the morphisms corresponding to flatMap are deemphasized. However, nobody calls it "flatten". For example, Maclane calls the natural transformation corresponding to method pure "unit" (not to be confused with method unit), and flatten is usually called "multiplication", in analogy with Monoids. One might investigate further whether it was different when the "triple"-terminology was more prevalent.
flatMap
To find the origins of the flatMap portmanteau word, I'd propose to start with the most prominent popularizer today, and then try to backtrack from there. Apparently, flatMap is a Scala meme, so it seems reasonable to start from Scala. One might check the standard libraries (especially the List data structure) of the usual suspects: the languages that influenced Scala. These "roots" are named in Chapter 1, section 1.4 in Odersky's "Programming in Scala":
C, C++ and C# are probably not where it came from.
In Java it was the other way around: the flatMap came from Scala into version 1.8 of Java.
I can't say anything about Smalltalk
Ruby definitely has flat_map on Enumerable, but I don't know anything about Ruby, and I don't want to dig into the source code to find out when it was introduced.
Algol and Simula: definitely not.
Strangely enough ML (SML) seems to get by without flatMap, it only has concat (which is essentially the same as flatten). OCaml's lists also seem to have flatten, but no flatMap.
As you've already mentioned, Haskell had all this long ago, but in Haskell it is called bind and written as an operator
Erlang has flatmap on lists, but I'm not sure whether this is the origin, or whether it was introduced later. The problem with Erlang is that it is from 1986, back then there was no github.
I can't say anything about Iswim, Beta and gbeta.
I think it would be fair to say that flatMap has been popularized by Scala, for two reasons:
The flatMap took a prominent role in the design of Scala's collection library, and few years later it turned out to generalize nicely to huge distributed collections (Apache Spark and similar tools)
The flatMap became the favorite toy of everyone who decided to do functional programming on the JVM properly (Scalaz and libraries inspired by Scalaz, like Scala Cats)
To sum it up: the "flatten" terminology has been used in the context of monads since the very beginning. Later, it was combined with map into flatMap, and popularized by Scala, or more specifically by frameworks such as Apache Spark and Scalaz.
flatmap was introduced in Section 2.2.3 Sequences as Conventional Interfaces in "Structure and Interpretation of Computer Programs" as
(define (flatmap proc seq)
(accumulate append nil (map proc seq)))
The first edition of the book appeared in 1985.

Why is there no generic operators for Common Lisp?

In CL, we have many operators to check for equality that depend on the data type: =, string-equal, char=, then equal, eql and whatnot, so on for other data types, and the same for comparison operators (edit don't forget to answer about these please :) do we have generic <, > etc ? can we make them work for another object ?)
However the language has mechanisms to make them generic, for example generics (defgeneric, defmethod) as described in Practical Common Lisp. I imagine very well the same == operator that will work on integers, strings and characters, at least !
There have been work in that direction: https://common-lisp.net/project/cdr/document/8/cleqcmp.html
I see this as a major frustration, and even a wall, for beginners (of which I am), specially we who come from other languages like python where we use one equality operator (==) for every equality check (with the help of objects to make it so on custom types).
I read a blog post (not a monad tutorial, great serie) today pointing this. The guy moved to Clojure, for other reasons too of course, where there is one (or two?) operators.
So why is it so ? Is there any good reasons ? I can't even find a third party library, not even on CL21. edit: cl21 has this sort of generic operators, of course.
On other SO questions I read about performance. First, this won't apply to the little code I'll write so I don't care, and if you think so do you have figures to make your point ?
edit: despite the tone of the answers, it looks like there is not ;) We discuss in comments.
Kent Pitman has written an interesting article that tackles this subject: The Best of intentions, EQUAL rights — and wrongs — in Lisp.
And also note that EQUAL does work on integers, strings and characters. EQUALP also works for lists, vectors and hash tables an other Common Lisp types but objects… For some definition of work. The note at the end of the EQUALP page has a nice answer to your question:
Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. Although these functions take any type of argument and their names sound very generic, equal and equalp are not appropriate for every application.
Specifically note that there is a trick in my last “works” definition.
A newer library adds generic interfaces to standard Common Lisp functions: https://github.com/alex-gutev/generic-cl/
GENERIC-CL provides a generic function wrapper over various functions in the Common Lisp standard, such as equality predicates and sequence operations. The goal of the wrapper is to provide a standard interface to common operations, such as testing for the equality of two objects, which is extensible to user-defined types.
It does this for equality, comparison, arithmetic, objects, iterators, sequences, hash-tables, math functions,…
So one can define his own + operator for example.
Yes we have! eq works with all values and it works all the time. It does not depend on the data type at all. It is exactly what you are looking for. It's like the is operator in python. It must be exactly what you were looking for? All the other ones agree with eq when it's t, however they tend to be t for totally different values that have various levels of similarities.
(defparameter *a* "this is a string")
(defparameter *b* *a*)
(defparameter *c* "this is a string")
(defparameter *d* "THIS IS A STRING")
All of these are equalp since they contain the same meaning. equalp is perhaps the sloppiest of equal functions. I don't think 2 and 2.0 are the same, but equalp does. In my mind 2 is 2 while 2.0 is somewhere between 1.95 and 2.04. you see they are not the same.
equal understands me. (equal *c* *d*) is definitely nil and that is good. However it returns t for (equal *a* *c*) as well. Both are arrays of characters and each character are the same value, however the two strings are not the same object. they just happen to look the same.
Notice I'm using string here for every single one of them. We have 4 equal functions that tells you if two values have something in common, but only eq tells you if they are the same.
None of these are type specific. They work on all types, however they are not generics since they were around long before that was added in the language. You could perhaps make 3-4 generic equal functions but would they really be any better than the ones we already have?
Fortunately CL21 introduces (more) generic operators, particularly for sequences it defines length, append, setf, first, rest, subseq, replace, take, drop, fill, take-while, drop-while, last, butlast, find-if, search, remove-if, delete-if, reverse, reduce, sort, split, join, remove-duplicates, every, some, map, sum (and some more). Unfortunately the doc isn't great, it's best to look at the sources. Those should work at least for strings, lists, vectors and define methods of the new abstract-sequence.
see also
https://github.com/cl21/cl21/wiki
https://lispcookbook.github.io/cl-cookbook/cl21.html

Why must Isabelle functions have at least one argument?

When I try to write
fun foo :: "nat ⇒ nat"
where "foo = Suc"
Isabelle complains that "Function has no arguments". Why is this? What's wrong with a fun having no arguments? I know that I can change fun to abbreviation or definition and all is fine. But it seems a shame to spoil the uniformity of my .thy file, in which every other definition is declared with fun.
Alex Krauss, the author of the present generation of fun and function in Isabelle/HOL had particular opinions about that, and probably also good formal reasons to say that a "function" really needs to have arguments. In SML you actually have a similar situation: "constants" without arguments are defined via val not fun.
In the rare situations, where zero-argument functions are needed in Isabelle/HOL, it is sufficiently easy to use definition [simp] "c = t to get mostly the same result, apart from the name of the key theorems produced internally: c_def versus c.simps.
I think the main inconvenience and occasional pitfal of function in this respect is its exposure of the auxiliary c_def that is not meant to be used in applications: it unfolds the internal construction behind the function specification, not its main characterizing equation.
Since no other answer seems to be on its way, let me repeat and extend on my previous comment. In Isabelle/HOL there are three ways of defining functions:
definition for non-recursive functions (which could just be seen as constants that serve as abbreviations for longer statements).
primrec for primitive recursive functions (in the sense that in every recursive call there is a fixed argument where a datatype constructor is removed).
fun for general recursive functions.
Both, primrec and fun expect at least one argument. For the former it is automatically checked that one of its arguments corresponds to the syntactic pattern of primitive recursion on datatypes, while for the latter the task of proving "termination" (or rather the well-foundedness of the call graph) will be delegated to the user in hard cases.
Anyway, it would of course be possible to relay primrec and fun to definition for easy cases without arguments, but at least to me this rather seems to obfuscate things for the user instead of clearing them up.

What is so special about Monads?

A monad is a mathematical structure which is heavily used in (pure) functional programming, basically Haskell. However, there are many other mathematical structures available, like for example applicative functors, strong monads, or monoids. Some have more specific, some are more generic. Yet, monads are much more popular. Why is that?
One explanation I came up with, is that they are a sweet spot between genericity and specificity. This means monads capture enough assumptions about the data to apply the algorithms we typically use and the data we usually have fulfills the monadic laws.
Another explanation could be that Haskell provides syntax for monads (do-notation), but not for other structures, which means Haskell programmers (and thus functional programming researchers) are intuitively drawn towards monads, where a more generic or specific (efficient) function would work as well.
I suspect that the disproportionately large attention given to this one particular type class (Monad) over the many others is mainly a historical fluke. People often associate IO with Monad, although the two are independently useful ideas (as are list reversal and bananas). Because IO is magical (having an implementation but no denotation) and Monad is often associated with IO, it's easy to fall into magical thinking about Monad.
(Aside: it's questionable whether IO even is a monad. Do the monad laws hold? What do the laws even mean for IO, i.e., what does equality mean? Note the problematic association with the state monad.)
If a type m :: * -> * has a Monad instance, you get Turing-complete composition of functions with type a -> m b. This is a fantastically useful property. You get the ability to abstract various Turing-complete control flows away from specific meanings. It's a minimal composition pattern that supports abstracting any control flow for working with types that support it.
Compare this to Applicative, for instance. There, you get only composition patterns with computational power equivalent to a push-down automaton. Of course, it's true that more types support composition with more limited power. And it's true that when you limit the power available, you can do additional optimizations. These two reasons are why the Applicative class exists and is useful. But things that can be instances of Monad usually are, so that users of the type can perform the most general operations possible with the type.
Edit:
By popular demand, here are some functions using the Monad class:
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
(*&&) :: Monad m => m Bool -> m Bool -> m Bool
x *&& y = ifM x y (return False)
(*||) :: Monad m => m Bool -> m Bool -> m Bool
x *|| y = ifM x (return True) y
notM :: Monad m => m Bool -> m Bool
notM x = x >>= return . not
Combining those with do syntax (or the raw >>= operator) gives you name binding, indefinite looping, and complete boolean logic. That's a well-known set of primitives sufficient to give Turing completeness. Note how all the functions have been lifted to work on monadic values, rather than simple values. All monadic effects are bound only when necessary - only the effects from the chosen branch of ifM are bound into its final value. Both *&& and *|| ignore their second argument when possible. And so on..
Now, those type signatures may not involve functions for every monadic operand, but that's just a cognitive simplification. There would be no semantic difference, ignoring bottoms, if all the non-function arguments and results were changed to () -> m a. It's just friendlier to users to optimize that cognitive overhead out.
Now, let's look at what happens to those functions with the Applicative interface.
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Well, uh. It got the same type signature. But there's a really big problem here already. The effects of both x and y are bound into the composed structure, regardless of which one's value is selected.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <$> step x) (pure x)
Well, ok, that seems like it'd be ok, except for the fact that it's an infinite loop because ifA will always execute both branches... Except it's not even that close. pure x has the type f a. whileA p step <$> step x has the type f (f a). This isn't even an infinite loop. It's a compile error. Let's try again..
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well shoot. Don't even get that far. whileA p step has the type a -> f a. If you try to use it as the first argument to <*>, it grabs the Applicative instance for the top type constructor, which is (->), not f. Yeah, this isn't gonna work either.
In fact, the only function from my Monad examples that would work with the Applicative interface is notM. That particular function works just fine with only a Functor interface, in fact. The rest? They fail.
Of course it's to be expected that you can write code using the Monad interface that you can't with the Applicative interface. It is strictly more powerful, after all. But what's interesting is what you lose. You lose the ability to compose functions that change what effects they have based on their input. That is, you lose the ability to write certain control-flow patterns that compose functions with types a -> f b.
Turing-complete composition is exactly what makes the Monad interface interesting. If it didn't allow Turing-complete composition, it would be impossible for you, the programmer, to compose together IO actions in any particular control flow that wasn't nicely prepackaged for you. It was the fact that you can use the Monad primitives to express any control flow that made the IO type a feasible way to manage the IO problem in Haskell.
Many more types than just IO have semantically valid Monad interfaces. And it happens that Haskell has the language facilities to abstract over the entire interface. Due to those factors, Monad is a valuable class to provide instances for, when possible. Doing so gets you access to all the existing abstract functionality provided for working with monadic types, regardless of what the concrete type is.
So if Haskell programmers seem to always care about Monad instances for a type, it's because it's the most generically-useful instance that can be provided.
First, I think that it is not quite true that monads are much more popular than anything else; both Functor and Monoid have many instances that are not monads. But they are both very specific; Functor provides mapping, Monoid concatenation. Applicative is the one class that I can think of that is probably underused given its considerable power, due largely to its being a relatively recent addition to the language.
But yes, monads are extremely popular. Part of that is the do notation; a lot of Monoids provide Monad instances that merely append values to a running accumulator (essentially an implicit writer). The blaze-html library is a good example. The reason, I think, is the power of the type signature (>>=) :: Monad m => m a -> (a -> m b) -> m b. While fmap and mappend are useful, what they can do is fairly narrowly constrained. bind, however, can express a wide variety of things. It is, of course, canonized in the IO monad, perhaps the best pure functional approach to IO before streams and FRP (and still useful beside them for simple tasks and defining components). But it also provides implicit state (Reader/Writer/ST), which can avoid some very tedious variable passing. The various state monads, especially, are important because they provide a guarantee that state is single threaded, allowing mutable structures in pure (non-IO) code before fusion. But bind has some more exotic uses, such as flattening nested data structures (the List and Set monads), both of which are quite useful in their place (and I usually see them used desugared, calling liftM or (>>=) explicitly, so it is not a matter of do notation). So while Functor and Monoid (and the somewhat rarer Foldable, Alternative, Traversable, and others) provide a standardized interface to a fairly straightforward function, Monad's bind is considerably more flexibility.
In short, I think that all your reasons have some role; the popularity of monads is due to a combination of historical accident (do notation and the late definition of Applicative) and their combination of power and generality (relative to functors, monoids, and the like) and understandability (relative to arrows).
Well, first let me explain what the role of monads is: Monads are very powerful, but in a certain sense: You can pretty much express anything using a monad. Haskell as a language doesn't have things like action loops, exceptions, mutation, goto, etc. Monads can be expressed within the language (so they are not special) and make all of these reachable.
There is a positive and a negative side to this: It's positive that you can express all those control structures you know from imperative programming and a whole bunch of them you don't. I have just recently developed a monad that lets you reenter a computation somewhere in the middle with a slightly changed context. That way you can run a computation, and if it fails, you just try again with slightly adjusted values. Furthermore monadic actions are first class, and that's how you build things like loops or exception handling. While while is primitive in C in Haskell it's actually just a regular function.
The negative side is that monads give you pretty much no guarantees whatsoever. They are so powerful that you are allowed to do whatever you want, to put it simply. In other words just like you know from imperative languages it can be hard to reason about code by just looking at it.
The more general abstractions are more general in the sense that they allow some concepts to be expressed which you can't express as monads. But that's only part of the story. Even for monads you can use a style known as applicative style, in which you use the applicative interface to compose your program from small isolated parts. The benefit of this is that you can reason about code by just looking at it and you can develop components without having to pay attention to the rest of your system.
What is so special about monads?
The monadic interface's main claim to fame in Haskell is its role in the replacement of the original and unwieldy dialogue-based I/O mechanism.
As for their status in a formal investigative context...it is merely an iteration of a seemingly-cyclic endeavour which is now (2021 Oct) approximately one half-century old:
During the 1960s, several researchers began work on proving things about programs. Efforts were
made to prove that:
A program was correct.
Two programs with different code computed the same answers when given the
same inputs.
One program was faster than another.
A given program would always terminate.
While these are abstract goals, they are all, really, the same as the practical goal of "getting the
program debugged".
Several difficult problems emerged from this work. One was the problem of specification: before
one can prove that a program is correct, one must specify the meaning of "correct", formally and
unambiguously. Formal systems for specifying the meaning of a program were developed, and they
looked suspiciously like programming languages.
The Anatomy of Programming Languages, Alice E. Fischer and Frances S. Grodzinsky.
(emphasis by me.)
...back when "programming languages" - apart from an intrepid few - were most definitely imperative.
Anyone for elevating this mystery to the rank of Millenium problem? Solving it would definitely advance the science of computing and the engineering of software, one way or the other...
Monads are special because of do notation, which lets you write imperative programs in a functional language. Monad is the abstraction that allows you to splice together imperative programs from smaller, reusable components (which are themselves imperative programs). Monad transformers are special because they represent enhancing an imperative language with new features.

Functional Programming for Basic Algorithms

How good is 'pure' functional programming for basic routine implementations, e.g. list sorting, string matching etc.?
It's common to implement such basic functions within the base interpreter of any functional language, which means that they will be written in an imperative language (c/c++). Although there are many exceptions..
At least, I wish to ask: How difficult is it to emulate imperative style while coding in 'pure' functional language?
How good is 'pure' functional
programming for basic routine
implementations, e.g. list sorting,
string matching etc.?
Very. I'll do your problems in Haskell, and I'll be slightly verbose about it. My aim is not to convince you that the problem can be done in 5 characters (it probably can in J!), but rather to give you an idea of the constructs.
import Data.List -- for `sort`
stdlistsorter :: (Ord a) => [a] -> [a]
stdlistsorter list = sort list
Sorting a list using the sort function from Data.List
import Data.List -- for `delete`
selectionsort :: (Ord a) => [a] -> [a]
selectionsort [] = []
selectionsort list = minimum list : (selectionsort . delete (minimum list) $ list)
Selection sort implementation.
quicksort :: (Ord a) => [a] -> [a]
quicksort [] = []
quicksort (x:xs) =
let smallerSorted = quicksort [a | a <- xs, a <= x]
biggerSorted = quicksort [a | a <- xs, a > x]
in smallerSorted ++ [x] ++ biggerSorted
Quick sort implementation.
import Data.List -- for `isInfixOf`
stdstringmatch :: (Eq a) => [a] -> [a] -> Bool
stdstringmatch list1 list2 = list1 `isInfixOf` list2
String matching using isInfixOf function from Data.list
It's common to implement such basic
functions within the base interpreter
of any functional language, which
means that they will be written in an
imperative language (c/c++). Although
there are many exceptions..
Depends. Some functions are more naturally expressed imperatively. However, I hope I have convinced you that some algorithms are also expressed naturally in a functional way.
At least, I wish to ask: How difficult
is it to emulate imperative style
while coding in 'pure' functional
language?
It depends on how hard you find Monads in Haskell. Personally, I find it quite difficult to grasp.
1) Good by what standard? What properties do you desire?
List sorting? Easy. Let's do Quicksort in Haskell:
sort [] = []
sort (x:xs) = sort (filter (< x) xs) ++ [x] ++ sort (filter (>= x) xs)
This code has the advantage of being extremely easy to understand. If the list is empty, it's sorted. Otherwise, call the first element x, find elements less than x and sort them, find elements greater than x and sort those. Then concatenate the sorted lists with x in the middle. Try making that look comprehensible in C++.
Of course, Mergesort is much faster for sorting linked lists, but the code is also 6 times longer.
2) It's extremely easy to implement imperative style while staying purely functional. The essence of imperative style is sequencing of actions. Actions are sequenced in a pure setting by using monads. The essence of monads is the binding function:
(Monad m) => (>>=) :: m a -> (a -> m b) -> m b
This function exists in C++, and it's called ;.
A sequence of actions in Haskell, for example, is written thusly:
putStrLn "What's your name?" >>=
const (getLine >>= \name -> putStrLn ("Hello, " ++ name))
Some syntax sugar is available to make this look more imperative (but note that this is the exact same code):
do {
putStrLn "What's your name?";
name <- getLine;
putStrLn ("Hello, " ++ name);
}
Nearly all functional programming languages have some construct to allow for imperative coding (like do in Haskell). There are many problem domains that can't be solved with "pure" functional programming. One of those is network protocols, for example where you need a series of commands in the right order. And such things don't lend themselves well to pure functional programming.
I have to agree with Lothar, though, that list sorting and string matching are not really examples you need to solve imperatively. There are well-known algorithms for such things and they can be implemented efficiently in functional languages already.
I think that 'algorithms' (e.g. method bodies and basic data structures) are where functional programming is best. Assuming nothing completely IO/state-dependent, functional programming excels are authoring algorithms and data structures, often resulting in shorter/simpler/cleaner code than you'd get with an imperative solution. (Don't emulate imperative style, FP style is better for most of these kinds of tasks.)
You want imperative stuff sometimes to deal with IO or low-level performance, and you want OOP for partitioning the high-level design and architecture of a large program, but "in the small" where you write most of your code, FP is a win.
See also
How does functional programming affect the structure of your code?
It works pretty well the other way round emulating functional with imperative style.
Remember that the internal of an interpreter or VM ware so close to metal and performance critical that you should even consider going to assember level and count the the clock cycles for each instruction (like Smalltalk Dophin is just doing it and the results are impressive).
CPU's are imperative.
But there is no problem to do all the basic algorithm implementation - the one you mention are NOT low level - they are basics.
I don't know about list sorting, but you'd be hard pressed to bootstrapp a language without some kind of string matching in the compiler or runtime. So you need that routine to create the language. As there isn't a great deal of point writing the same code twice, when you create the library for matching strings within the language, you call the code written earlier. The degree to which this happens in successive releases will depend on how self hosting the language is, but unless that's a strong design goal there won't be any reason to change it.

Resources