Where does the word "flatMap" originate from? - functional-programming

Nowadays flatMap is the most widely used name for the corresponding operation on monad-like objects.
But I can't find where it first appeared or what popularized it.
The oldest appearance I know about is in Scala.
In Haskell it is called bind.
In category theory Greek notation is used.

Partial answer, which hopefully provides some useful "seed nodes" to start a more thorough search. My best guess:
1958 for map used for list processing,
1988 for flatten used in the context of monads,
2004 for flatMap as an important method backing for-comprehensions in Scala.
The function / method name flatMap seems to be a portmanteau of flatten and map. This makes sense, because whenever M is some monad, A, B are some types, and a: M[A], f: A => M[B] are a value and a function, then the implementations of map, flatMap and flatten should satisfy
a.flatMap(f) = a.map(f).flatten
(in Scala-syntax).
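For comparison, the same law can be stated with Haskell's names, where flatMap corresponds to the bind operator (>>=) and flatten to join; a small sketch of mine, using the list monad, that checks it:

import Control.Monad (join)

-- a.flatMap(f) = a.map(f).flatten, in Haskell spelling:
--   a >>= f  ==  join (fmap f a)
flatMapLawHolds :: Bool
flatMapLawHolds =
  let a = [1, 2, 3] :: [Int]        -- a value of type M[A], here M = []
      f x = [x, x * 10]             -- a function A => M[B]
  in (a >>= f) == join (fmap f a)   -- evaluates to True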
Let's first consider the two components, map and flatten, separately.
Map
The map-function seems to have been used to map over lists since time immemorial. My best guess would be that it came from Lisp (around 1958), and then spread to all other languages that had anything resembling higher-order functions.
Flatten
Given how many things are represented by lists in Lisp, I assume that flatten has also been used there for list processing.
The usage of flatten in the context of monads must be much more recent, because monads themselves were introduced into programming quite a bit later. If we are looking for the usage of the word "flatten" in the context of monadic computations, we should probably at least check the papers by Eugenio Moggi. Indeed, in "Computational Lambda-Calculus and Monads" from 1988, he uses the formulation:
Remark 2.2: Intuitively η_A : A → TA gives the inclusion of values into computations, while μ_A : T²A → TA flatten a computation of a computation into a computation.
(typesetting changed by me, emphasis mine, text in italic as in original). I think it's interesting that Moggi talks about flattening computations, and not just lists.
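In Haskell terms (a sketch of mine, not part of the quote), Moggi's η and μ correspond to return and join, and join is exactly the "flatten" he describes:

import Control.Monad (join)

-- eta_A : A -> T A          ~  return :: a -> m a
-- mu_A  : T (T A) -> T A    ~  join   :: m (m a) -> m a
inclusion :: [Int]
inclusion = return 5              -- [5]: a value included into a computation

flattened :: [Int]
flattened = join [[1, 2], [3]]    -- [1,2,3]: a computation of computations, flattened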
Math notation / "Greek"
On the Greek used in mathematical notation: in category theory, the more common way to introduce monads is through the natural transformations that correspond to pure and flatten; the morphisms corresponding to flatMap are deemphasized. However, nobody calls it "flatten". For example, Mac Lane calls the natural transformation corresponding to the method pure "unit" (not to be confused with the method unit), and flatten is usually called "multiplication", in analogy with monoids. One might investigate further whether it was different when the "triple" terminology was more prevalent.
flatMap
To find the origins of the flatMap portmanteau word, I'd propose to start with the most prominent popularizer today, and then try to backtrack from there. Apparently, flatMap is a Scala meme, so it seems reasonable to start from Scala. One might check the standard libraries (especially the List data structure) of the usual suspects: the languages that influenced Scala. These "roots" are named in Chapter 1, section 1.4 in Odersky's "Programming in Scala":
C, C++ and C# are probably not where it came from.
In Java it was the other way around: the flatMap came from Scala into version 1.8 of Java.
I can't say anything about Smalltalk.
Ruby definitely has flat_map on Enumerable, but I don't know anything about Ruby, and I don't want to dig into the source code to find out when it was introduced.
Algol and Simula: definitely not.
Strangely enough, ML (SML) seems to get by without flatMap; it only has concat (which is essentially the same as flatten). OCaml's lists also seem to have flatten, but no flatMap.
As you've already mentioned, Haskell had all this long ago, but in Haskell it is called bind and written as the operator (>>=).
Erlang has flatmap on lists, but I'm not sure whether this is the origin, or whether it was introduced later. The problem with Erlang is that it is from 1986; back then there was no GitHub.
I can't say anything about Iswim, Beta and gbeta.
I think it would be fair to say that flatMap has been popularized by Scala, for two reasons:
The flatMap took a prominent role in the design of Scala's collection library, and a few years later it turned out to generalize nicely to huge distributed collections (Apache Spark and similar tools)
The flatMap became the favorite toy of everyone who decided to do functional programming on the JVM properly (Scalaz and libraries inspired by Scalaz, like Scala Cats)
To sum it up: the "flatten" terminology has been used in the context of monads since the very beginning. Later, it was combined with map into flatMap, and popularized by Scala, or more specifically by frameworks such as Apache Spark and Scalaz.

flatmap was introduced in Section 2.2.3 Sequences as Conventional Interfaces in "Structure and Interpretation of Computer Programs" as
(define (flatmap proc seq)
  (accumulate append nil (map proc seq)))
The first edition of the book appeared in 1985.


Replacing an ordinary function with a generic function

I'd like to use names such as elt, nth and mapcar with a new data structure that I am prototyping, but these names designate ordinary functions and so, I think, would need to be redefined as generic functions.
Presumably it's bad form to redefine these names?
Is there a way to tell defgeneric not to generate a program error and to go ahead and replace the function binding?
Is there a good reason for these not being generic functions, or is it just historic?
What's the considered wisdom and best practice here please?
If you are using SBCL or ABCL, and aren't concerned with ANSI compliance, you could investigate Extensible Sequences:
http://www.sbcl.org/manual/#Extensible-Sequences
http://www.doc.gold.ac.uk/~mas01cr/papers/ilc2007/sequences-20070301.pdf
...you can't redefine functions in the COMMON-LISP package, but you could create a new package and shadow the imports of the functions you want to redefine.
Is there a good reason for these not being generic functions, or is it just historic?
Common Lisp has several layers of language in some of its areas; higher-level parts of the software might need to be built on lower-level constructs.
One of its goals was being fast enough for a range of applications.
Common Lisp also introduced the idea of sequences, the abstraction over lists and vectors, at a time when the language didn't have an object system. CLOS came several years after the initial Common Lisp design.
Take for example something like equality - for numbers.
Lisp has =:
(= a b)
That's the fastest way to compare numbers. = is also defined only for numbers.
Then there are eql, equal and equalp. Those work for numbers, but also for some other data types.
Now, if you need more speed, you can declare the types and tell the compiler to generate faster code:
(locally
  (declare (fixnum a b)
           (optimize (speed 3) (safety 0)))
  (= a b))
So, why is = not a CLOS generic function?
a) it was introduced when CLOS did not exist
but equally important:
b) in Common Lisp it wasn't known (and it still isn't) how to make a CLOS generic function = as fast as a non-generic function for typical usage scenarios - while preserving dynamic typing and extensibility
CLOS generic functions simply have a speed penalty: the runtime dispatch costs.
CLOS is best used for higher level code, which then really benefits from features like extensibility, multi-dispatch, inheritance/combinations. Generic functions should be used for defined generic behavior - not as collections of similar methods.
With better implementation technology, implementation-specific language enhancements, etc. it might be possible to increase the range of code which can be written in a performant way using CLOS. This has been tried with programming languages like Dylan and Julia.
Presumably it's bad form to redefine these names?
Common Lisp implementations don't let you replace them just so. Be aware that your replacement functions should be implemented in a way which works consistently with the old functions. Also, old versions could be inlined in some way and not be replaceable everywhere.
Is there a way to tell defgeneric not to generate a program error and to go ahead and replace the function binding?
You would need to make sure that the replacement is working while you replace it. The code that replaces functions might itself use the functions you are replacing.
Still, implementations allow you to replace CL functions - but this is implementation specific. For example LispWorks provides the variables lispworks:*packages-for-warn-on-redefinition* and lispworks:*handle-warn-on-redefinition*. One can bind them or change them globally.
What's the considered wisdom and best practice here please?
There are two approaches:
use implementation specific ways to replace standard Common Lisp functions
This can be dangerous. Plus you need to support it for all implementations of CL you want to use...
use a language package, where you define your new language. Here this would be standard Common Lisp plus your extensions/changes. Export everything the user would use. In your software use this package instead of CL.

What is the most minimal functional programming language?

What is the most minimal functional programming language?
It depends on what you mean by minimal.
To start with, the ancestor of functional languages is, first and foremost, mathematical logic. The computational use of certain logics came after the fact. In a sense, many mathematical systems (the cores of which are usually quite minimal) could be called functional languages. But I doubt that's what you're after!
Best known is Alonzo Church's lambda calculus, of which there are variants and descendants:
The simplest form is what's called the untyped lambda calculus; this contains nothing but lambda abstractions, with no restrictions on their use. The creation of data structures using only anonymous functions is done with what's called Church encoding and represents data by fundamental operations on it; the number 5 becomes "repeat something 5 times", and so on (a short sketch of this follows a few paragraphs below).
Lisp-family languages are little more than untyped lambda calculus, augmented with atomic values, cons cells, and a handful of other things. I'd suspect Scheme is the most minimalist here, as, if memory serves, it was first created as a teaching language.
The original purpose of the lambda calculus, that of describing logical proofs, failed when the untyped form was shown to be inconsistent, which is a polite term for "lets you prove that false is true". (Historical trivia: the paper proving this, which was a significant thing at the time, did so by writing a logical proof that, in computational terms, went into an infinite loop.) Anyway, the use as a logic was recovered by introducing typed lambda calculus. These tend not to be directly useful as programming languages, however, particularly since being logically sound makes the language not Turing-complete.
However, similarly to how Lisps derive from untyped lambda calculus, a typed lambda calculus extended with built-in recursion, algebraic data types, and a few other things gets you the extended ML family of languages. These tend to be pretty minimal at heart, with syntactic constructs having straightforward translations to lambda terms in many cases. Besides the obvious ML dialects, this also includes Haskell and a few other languages. I'm not aware of any especially minimalist typed functional languages, however; such a language would likely suffer from poor usability far worse than a minimalist untyped language.
So as far as lambda calculus variants go, the pure untyped lambda calculus with no extra features is Turing-complete and about as minimal as you can get!
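To make the Church-encoding remark above concrete, here is a small Haskell sketch (mine, not part of the original answer) of Church numerals, where a number is just "apply a function that many times":

-- Church numerals: the number n is "apply f to x, n times".
zero, five :: (a -> a) -> a -> a
zero _ x = x
five f x = f (f (f (f (f x))))

-- Successor applies f one more time.
suc :: ((a -> a) -> a -> a) -> (a -> a) -> a -> a
suc n f x = f (n f x)

-- Convert back to an ordinary Int by counting applications.
toInt :: ((Int -> Int) -> Int -> Int) -> Int
toInt n = n (+ 1) 0               -- toInt five == 5, toInt (suc five) == 6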
However, arguably more minimal is to eliminate the concept of "variables" entirely--in fact, this was originally done to simplify meta-mathematical proofs about logical systems, if memory serves me--and use only higher-order functions called combinators. Here we have:
Combinatory logic itself, as originally invented by Moses Schönfinkel and developed extensively by Haskell Curry. Each combinator is defined by a simple substitution rule, for instance Sxyz = xz(yz). The lowercase letters are used like variables in this definition, but keep in mind that combinatory logic itself doesn't use variables, or assign names to anything at all. Combinatory logic is minimal, to be sure, but not too friendly as a programming language. Best known is the SK combinator base. S is defined as in the example above; K is Kxy = x. Those two combinators alone suffice to make it Turing-complete! This is almost frighteningly minimal. (A short Haskell rendering of S and K follows after this list.)
Unlambda is a language based on SK combinators, extending it with a few extra combinators with special properties. Less minimal, but lets you write "Hello World".
Even two combinators is more than you need, though. Various one-combinator bases exist; perhaps the best known is the iota combinator, defined as ιx = xSK, which is used in a minimalist language also called Iota.
Also of some note is Lazy K, which is distinguished from Unlambda by not introducing additional combinators, having no side effects, and using lazy evaluation. Basically, it's the Haskell of the combinator-based-esoteric-language world. It supports both the SK base, as well as the iota combinator.
Which of those strikes you as most "minimal" is probably a matter of taste.
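For concreteness, the S and K substitution rules given above translate directly into Haskell definitions; a small sketch (mine, not part of the original answer):

-- The two combinators as ordinary Haskell functions:
s :: (a -> b -> c) -> (a -> b) -> a -> c
s x y z = x z (y z)

k :: a -> b -> a
k x _ = x

-- The identity function need not be primitive: S K K behaves like id.
skk :: a -> a
skk = s k k                       -- skk 42 == 42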
The arguably most minimal functional languages are iota and Jot, because they use only one combinator (while unlambda needs two). Here is a short explanation: http://web.archive.org/web/20061105204247/http://ling.ucsd.edu/~barker/Iota/
I'd imagine the most minimal functional "programming language" would be lambda calculus.
BrainF*ck is a simple, easy to use programming language. Here's a quick rundown.
Imagine you have a near-infinite range of boxes, each empty. Luckily, you are not alone! You can move back and forth along the line, put things in them, and take them out. Though quite basic, with enough time you can do about anything: http://www.iwriteiam.nl/Ha_bf_inter.html. Here are the commands.
+ | add one to current box
- | take one from current box
> | move one box to the right
< | move one box to the left
[] | loop while the current box is nonzero
. | print current value
, | input current value
other stuff to look at:
P" | simplified BF
language f | newer simplified BF
http://www2.gvsu.edu/miljours/bf.html | cool BF stuff/intro
https://www.esolangs.org/wiki/Language_list | list of similar langs/variants
An esoteric programming language (a.k.a. esolang) is a programming language designed to test the boundaries of computer programming language design, as a proof of concept, as software art, as a hacking interface to another language (particularly functional programming or procedural programming languages), or as a joke. The use of esoteric distinguishes these languages from programming languages that working developers use to write software. Usually, an esolang's creators do not intend the language to be used for mainstream programming, although some esoteric features, such as visuospatial syntax, have inspired practical applications in the arts. Such languages are often popular among hackers and hobbyists.

Haskell "collections" language design

Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient with most of the list operations (except for the cons operation), and is used a lot; syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.
Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences; a small sketch using them follows below. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
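As a small sketch of mine (assuming Data.Sequence from the containers package), the same function can run over both a list and a Seq when it only uses these class methods:

import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Only class methods are used, so this works for any Functor whose container has a Monoid instance.
doubleAndTriple :: (Functor f, Monoid (f Int)) => f Int -> f Int
doubleAndTriple xs = fmap (* 2) xs `mappend` fmap (* 3) xs

onLists :: [Int]
onLists = doubleAndTriple [1, 2, 3]                  -- [2,4,6,3,6,9]

onSeqs :: Seq Int
onSeqs = doubleAndTriple (Seq.fromList [1, 2, 3])    -- fromList [2,4,6,3,6,9]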
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matching code.
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
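For instance (a sketch of mine, not from the original answer), the "loop body" is written as a list and the container is built at the end:

import Data.Set (Set)
import qualified Data.Set as Set

-- Generate the elements as a list (the "loop"), then hand the list to the container.
squaresUpTo :: Int -> Set Int
squaresUpTo n = Set.fromList [ x * x | x <- [1 .. n] ]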
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-specific code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.
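As a small sketch of mine (not from the original answer, and assuming a GHC recent enough that sum and traverse are the Foldable/Traversable versions), the same code runs over a list and a Seq:

import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Foldable lets one summation work over any container.
total :: (Foldable t, Num a) => t a -> a
total = sum

-- Traversable lets one effectful walk work over any container.
printAll :: (Traversable t, Show a) => t a -> IO (t ())
printAll = traverse print

demo :: IO ()
demo = do
  print (total [1, 2, 3 :: Int])                  -- 6
  print (total (Seq.fromList [1, 2, 3 :: Int]))   -- 6
  _ <- printAll (Seq.fromList "ab")               -- prints 'a' then 'b'
  return ()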
I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
You might ask why we need a separate map function. Why not just do away with the current list-only map function, and rename fmap to map instead? Well, that's a good question. The usual argument is that someone just learning Haskell, when using map incorrectly, would much rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].
A nitpick: Data.Sequence isn't more efficient for "list operations"; it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) equivalent to the list (:), and its memory representation is also somewhat larger than a list, as it is made from two data types, a FingerTree and a Deep.
The extra syntax for lists is fine, it hits the sweet spot at what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax is a must.
List isn't an ideal representation for Strings - the memory layout is inefficient, as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array, ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
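One such extension is OverloadedStrings, which lets string literals stand for ByteStrings (or Text); a minimal sketch of mine, not from the original answer:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS

-- With OverloadedStrings the literal below is a ByteString, not a [Char].
greeting :: BS.ByteString
greeting = "Hello, world"

main :: IO ()
main = BS.putStrLn greeting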
The other major lazy functional language, Clean, has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerformIO under the hood.
With version 7.8, GHC supports overloaded list literals; compare the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).
I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names (mixfix!) à la Agda. Then the syntax for list constructors (:, []) wouldn't have been magic, allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change when migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.
import Prelude hiding(map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.

The difference between MapReduce and the map-reduce combination in functional programming

I read about MapReduce at http://en.wikipedia.org/wiki/MapReduce and understood the example of how to get the count of a "word" in many "documents". However, I did not understand the following line:
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.
Can someone elaborate on the difference again (MapReduce framework vs. the map and reduce combination)? Especially, what does the functional programming reduce do?
Thanks a great deal.
The main difference would be that MapReduce is apparently patentable. (Couldn't help myself, sorry...)
On a more serious note, the MapReduce paper, as I remember it, describes a methodology of performing calculations in a massively parallelised fashion. This methodology builds upon the map / reduce construct which was well known for years before, but goes beyond into such matters as distributing the data etc. Also, some constraints are imposed on the structure of data being operated upon and returned by the functions used in the map-like and reduce-like parts of the computation (the thing about data coming in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the map & reduce combination.
As for the Wikipedia comment on the function being mapped in the functional programming's map / reduce construct producing one value per input... Well, sure it does, but here there are no constraints at all on the type of said value. In particular, it could be a complex data structure like perhaps a list of things to which you would again apply a map / reduce transformation. Going back to the "counting words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results.
In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and supremely short example of a MapReduce-word-counting-like computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit -- merge-with is implemented in terms of reduce in clojure.core). The only difference between this and the Wikipedia example is that the objects being counted are URLs instead of arbitrary words -- other than that, you've got a counting words algorithm implemented with map and reduce, MapReduce-style, right there. The reason why it might not fully qualify as being an instance of MapReduce is that there's no complex distribution of workloads involved. It's all happening on a single box... albeit on all the CPUs the box provides.
For in-depth treatment of the reduce function -- also known as fold -- see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... Things like ++ = list concatenation, no deep Haskell magic.
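To make the map-plus-reduce side of the comparison concrete, here is a toy Haskell sketch (mine, not from the answer): each document is mapped to its own word-to-count table, and the tables are then merged by a fold (reduce):

import Data.Map (Map)
import qualified Data.Map as Map

-- "Map" step: each document becomes a word -> count table.
countWords :: String -> Map String Int
countWords doc = Map.fromListWith (+) [ (w, 1) | w <- words doc ]

-- "Reduce" step: fold the per-document tables into one, merging counts.
wordCount :: [String] -> Map String Int
wordCount docs = foldr (Map.unionWith (+)) Map.empty (map countWords docs)

-- wordCount ["a b a", "b c"]  ==  fromList [("a",2),("b",2),("c",1)]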
Using the word count example, the original functional map() would take a set of documents, optionally distribute subsets of that set, and for each document emit a single value representing the number of words (or of a particular word's occurrences) in the document. A functional reduce() would then add up the per-document counts into a global total. So you get a total count (either of all words or of a particular word).
In MapReduce, the map would emit a (word, count) pair for each word in each document. A MapReduce reduce() would then add up the count of each word in each document without mixing them into a single pile. So you get a list of words paired with their counts.
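The MapReduce-style version of the same computation looks slightly different (again a toy Haskell sketch of mine): mappers emit (word, 1) pairs, the framework groups the pairs by key, and a reducer sums each group independently:

import Data.Map (Map)
import qualified Data.Map as Map

-- Mapper: emit a (word, 1) pair for every word in a document.
mapper :: String -> [(String, Int)]
mapper doc = [ (w, 1) | w <- words doc ]

-- Reducer: combine all values emitted for one key.
reducer :: [Int] -> Int
reducer = sum

-- Group all emitted pairs by key, then reduce each key's values separately.
wordCountMR :: [String] -> Map String Int
wordCountMR docs =
  fmap reducer (Map.fromListWith (++) [ (k, [v]) | (k, v) <- concatMap mapper docs ])

-- wordCountMR ["a b a", "b c"]  ==  fromList [("a",2),("b",2),("c",1)]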
MapReduce is a framework built around splitting a computation into parallelizable mappers and reducers. It builds on the familiar idiom of map and reduce - if you can structure your tasks such that they can be performed by independent mappers and reducers, then you can write it in a way which takes advantage of a MapReduce framework.
Imagine a Python interpreter which recognized tasks which could be computed independently, and farmed them out to mapper or reducer nodes. If you wrote
reduce(lambda x, y: x+y, map(int, ['1', '2', '3']))
or
sum([int(x) for x in ['1', '2', '3']])
you would be using functional map and reduce methods in a MapReduce framework. With current MapReduce frameworks, there's a lot more plumbing involved, but it's the same concept.

Tail-recursion optimization in Oz

In the chapter about functions in the Oz tutorial, it says:
Similar to lazy functional languages, Oz allows certain forms of tail-recursion optimizations that are not found in certain strict functional languages including Standard ML, Scheme, and the concurrent functional language Erlang. However, standard function definitions in Oz are not lazy.
It then goes on to show the following function which is tail-recursive in Oz:
fun {Map Xs F}
   case Xs
   of nil then nil
   [] X|Xr then {F X}|{Map Xr F}
   end
end
What this does is map the empty list to the empty list, and a non-empty list to the result of applying the function F to its head, prepended to the result of calling Map on the tail. In other languages this would not be tail-recursive, because the last operation is the prepend, not the recursive call to Map.
So my question is: If "standard function definitions in Oz are not lazy", what does Oz do that languages like Scheme or Erlang can't (or won't?) to be able to perform tail-recursion optimization for this function? And exactly when is a function tail-recursive in Oz?
This is called Tail Recursion Modulo Cons. Basically, prepending to the list directly after the recursive call is the same as appending to the list directly before the recursive call (and thus building the list as a "side-effect" of the purely functional "loop"). This is a generalization of tail recursion that works not just with cons lists but any data constructor with constant operations.
It was first described (but not named) as a LISP compilation technique in 1974 by Daniel P. Friedman and David S. Wise in Technical Report TR19: Unwinding Structured Recursions into Iterations and it was formally named and introduced by David H. D. Warren in 1980 in the context of writing the first-ever Prolog compiler.
The interesting thing about Oz, though, is that TRMC is neither a language feature nor an explicit compiler optimization; it's just a side effect of the language's execution semantics. Specifically, it follows from the fact that Oz is a declarative concurrent constraint language, which means that every variable is a dataflow variable (or "everything is a promise", including every storage location). Since everything is a promise, we can model returning from a function as first setting up the return value as a promise, and then later on fulfilling it.
Peter Van Roy, co-author (with Seif Haridi) of the book Concepts, Techniques, and Models of Computer Programming, one of the designers of Oz, and one of its implementors, explains how exactly TRMC works in a comment thread on Lambda the Ultimate: Tail-recursive map and declarative agents:
The above example of bad Scheme code turns into good tail-recursive Oz code when translated directly into Oz syntax. This gives:
fun {Map F Xs}
   if Xs==nil then nil
   else {F Xs.1}|{Map F Xs.2} end
end
This is because Oz has single-assignment variables. To understand the execution, we translate this example into the Oz kernel language (I give just a partial translation for clarity):
proc {Map F Xs Ys}
   if Xs==nil then Ys=nil
   else local Y Yr in
      Ys=Y|Yr
      {F Xs.1 Y}
      {Map F Xs.2 Yr}
   end end
end
That is, Map is tail-recursive because Yr is initially unbound. This is not just a clever trick; it is profound because it allows declarative concurrency and declarative multi-agent systems.
I am not too familiar with lazy functional languages, but if you think about the function Map in your question, it is easy to translate to a tail-recursive implementation if temporarily incomplete values in the heap are allowed (mutated into more complete values one call at a time).
I have to assume that they are talking about this transformation in Oz. Lispers used to do this optimization by hand - all values were mutable, and in this case a function called setcdr would be used - but you had to know what you were doing. Computers did not always have gigabytes of memory. It was justified to do this by hand; it arguably no longer is.
Back to your question: other modern languages probably do not do it automatically because it would be possible to observe the incomplete value while it is being built, and this must be what Oz has found a solution to. What other differences are there in Oz as compared to other languages that would explain it?
