Can somebody clarify what a combinator is? What does a function need to satisfy in order to also be a combinator? Is there a formal definition? I'm going through the Functional Programming in Scala book, and the two terms seem to be used interchangeably.
Conventionally, functions represent modules of your program. Combinators, on the other hand, don't really represent modules in the same sense; instead, they are used to combine modules into a whole program.
A combinator 'combines' two (or more) values, in such a way that the arguments are 'part of' the result. A heuristic for recognizing combinators is that they take at least one argument with the same (or a similar) type as their own result.[1] So, for example, many (pseudo-Haskell):
many :: Parser a -> Parser [a]
is a combinator, because it takes a parser and returns a parser, and because the parser p is a part of the parser many p.
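To make that concrete, here is a minimal sketch (not from the book; the toy Parser type and the implementation are my own) of how such a many could be written, so you can see that the argument parser p literally becomes part of the result:

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- 'many p' wraps p in a new parser that runs p repeatedly until it fails;
-- the argument p is a component of the combined parser.
many :: Parser a -> Parser [a]
many p = Parser go
  where
    go s = case runParser p s of
      Nothing      -> Just ([], s)
      Just (x, s') -> fmap (\(xs, s'') -> (x : xs, s'')) (go s')

Compare this with two functions that are not combinators: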
length :: [a] -> Int
is not a combinator, both because it takes a list but returns an Int, and because it discards most of the information from its argument.
(+) :: Int -> Int -> Int
is also not a combinator, because while it takes Ints and returns an Int, it still discards part of the information from its arguments; 2 and 3 aren't really 'part of' 5 in the same way p is a part of many p.
The intuition is that when you're writing your program, you start by structuring it into pieces: parse the first bit of the input, parse this repeated element, parse the rest, put the parse trees together. You code those pieces, then apply combinators to them to get the final program:
putThePiecesTogether <$> parseTheFirstBit <*> many parseRepeatedBit <*> parseTheRest
[1] Combinators come from combinator libraries, which will also have '0-argument' combinators (constants or just functions of other types). Those are also important parts of the library and are probably combinators in their own right, so the real rule is that a combinator comes from a library most of whose functions both take and return values of the same type, in such a way that the arguments are 'part of' the result.
I'm trying to understand the idea behind "call-by-need." I do understand the definition, but I'm a bit confused. I would like to see a simple example which shows how call-by-need works.
After reading some previous threads, I found out that Haskell uses this kind of evaluation. Are there any other programming languages which support this feature?
I read about call-by-name in Scala, and I do understand that call-by-name and call-by-need are similar, but differ in that call-by-need keeps the evaluated value. But I really would love to see a real-life example (it does not have to be in Haskell) which shows call-by-need.
The function
say_hello numbers = putStrLn "Hello!"
ignores its numbers argument. Under call-by-value semantics, the argument at the call site is evaluated anyway, even though the function ignores it, perhaps for side effects that the rest of the program depends on.
In Haskell, we might call say_hello as
say_hello [1..]
where [1..] is the infinite list of naturals. Under call-by-value semantics, the CPU would run off trying to build an infinite list and never get to the say_hello at all!
Because Haskell never demands the unused argument, it merely outputs
$ runghc cbn.hs
Hello!
For less dramatic examples, the first ten natural numbers are
ghci> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]
The first ten odds are
ghci> take 10 $ filter odd [1..]
[1,3,5,7,9,11,13,15,17,19]
Under call-by-need semantics, each value — even a conceptually infinite one as in the examples above — is evaluated only to the extent required and no more.
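For instance (a GHCi session of my own; undefined throws an error if it is ever forced), only the demanded parts are evaluated:

ghci> fst (42, undefined)
42
ghci> take 3 (1 : 2 : 3 : undefined)
[1,2,3]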
update: A simple example, as asked for:
ff 0 = 1
ff 1 = 1
ff n = go (ff (n-1))
  where
    go x = x + x
Under call-by-name, each invocation of go evaluates ff (n-1) twice, once for each appearance of x in its definition (because + is strict in both arguments, i.e. it demands the values of both of them).
Under call-by-need, go's argument is evaluated at most once. Specifically, here, x's value is found out only once, and reused for the second appearance of x in the expression x + x. If it weren't needed, x wouldn't be evaluated at all, just as with call-by-name.
Under call-by-value, go's argument is always evaluated exactly once, prior to entering the function's body, even if it isn't used anywhere in the function's body.
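A small sketch (my own, using Debug.Trace) that makes the sharing visible under GHC's call-by-need evaluation: the trace message is printed only once, even though x is used twice in x + x.

import Debug.Trace (trace)

go :: Int -> Int
go x = x + x

main :: IO ()
main = print (go (trace "argument evaluated" (2 + 3)))
-- prints "argument evaluated" once, then 10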
Here's my understanding of it, in the context of Haskell.
According to Wikipedia, "call by need is a memoized variant of call by name where, if the function argument is evaluated, that value is stored for subsequent uses."
Call by name:
take 10 . filter even $ [1..]
With a single consumer, each produced value can be discarded as soon as it has been consumed, so this might as well be call-by-name.
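For concreteness, the expression above produces (GHCi):

ghci> take 10 . filter even $ [1..]
[2,4,6,8,10,12,14,16,18,20]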
Call by need:
import qualified Data.List.Ordered as O
h = 1 : map (2*) h <> map (3*) h <> map (5*) h
  where
    (<>) = O.union
The difference is that here the h list is reused by several consumers advancing at different tempos, so it is essential that the produced values are remembered. In a call-by-name language there would be much replication of computational effort here, because the computational expression for h would be substituted at each of its occurrences, causing a separate calculation for each. In a call-by-need-capable language like Haskell the results of computing the elements of h are shared between each reference to h.
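For reference (assuming the data-ordlist package is installed for Data.List.Ordered), the shared h above starts out as the familiar Hamming sequence:

ghci> take 10 h
[1,2,3,4,5,6,8,9,10,12]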
Another example: almost any data defined via fix is only possible under call-by-need. With call-by-value the most we can have is the Y combinator.
See: Sharing vs. non-sharing fixed-point combinator and its linked entries and comments (among them, this, and its links, like Can fold be used to create infinite lists?).
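As an illustration of that point (my own example, not one of the linked ones): a self-referential list built with fix only works reasonably because the already-computed prefix is shared between the two references to it, i.e. under call-by-need.

import Data.Function (fix)

-- fibs refers to itself twice; sharing makes this linear instead of exponential.
fibs :: [Integer]
fibs = fix (\f -> 0 : 1 : zipWith (+) f (tail f))

main :: IO ()
main = print (take 10 fibs)   -- [0,1,1,2,3,5,8,13,21,34]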
I am trying to compute two finite sets of some enumerable type (let's say char) using a least-fixpoint and a greatest-fixpoint computation, respectively. I want my definitions to be extractable to SML, and to be "semi-efficient" when executed. What are my options?
From exploring the HOL library and playing around with code generation, I have the following observations:
I could use the complete_lattice.lfp and complete_lattice.gfp constants with a pair of additional monotone functions to compute my sets, which in fact I currently am doing. Code generation does work with these constants, but the code produced is horribly inefficient, and if I understand the generated SML code correctly is performing an exhaustive search over every possible set in the powerset of characters. Any use, no matter how simple, of these two constants at type char therefore causes a divergence when executed.
I could try to make use of the iterative fixpoint described by the Kleene fixpoint theorem in directed complete partial orders. From exploring, there's a ccpo_class.fixp constant in the theory Complete_Partial_Order, but the underlying iterates constant that this is defined in terms of has no associated code equations, and so code cannot be extracted.
Are there any existing fixpoint combinators hiding somewhere, suitable for use with finite sets, that produce semi-efficient code with code generation that I have missed?
None of the general fixpoint combinators in Isabelle's standard library is meant to be used directly for code extraction, because their construction is too general to be usable in practice. (There is another one in the theory ~~/src/HOL/Library/Bourbaki_Witt_Fixpoint.) But the theory ~~/src/HOL/Library/While_Combinator connects the lfp and gfp fixpoints to the iterative implementation you are looking for; see theorems lfp_while_lattice and gfp_while_lattice. These characterisations have the precondition that the function is monotone, so they cannot be used as code equations directly. You therefore have two options:
Use the while combinator instead of lfp/gfp in your code equations and/or definitions.
Tell the code preprocessor to use lfp_while_lattice as a [code_unfold] equation. This works if you also add all the rules that the preprocessor needs to prove the assumptions of these equations for the instances at which it should apply. Hence, I recommend that you also add as [code_unfold] the monotonicity statement of your function and the theorem to prove the finiteness of char set, i.e., finite_class.finite.
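For intuition only, here is a sketch in Haskell (not Isabelle, and not what the code generator emits; the names lfpIter and step are mine) of the iterative computation that the while-combinator characterisation boils down to: start from the empty set and apply the monotone function until the set stops changing.

import qualified Data.Set as Set
import Data.Set (Set)

-- Kleene-style iteration for a least fixpoint of a monotone function on finite sets.
lfpIter :: Ord a => (Set a -> Set a) -> Set a
lfpIter f = go Set.empty
  where
    go s = let s' = f s
           in if s' == s then s else go s'

-- Hypothetical example step function: close {'a'} under succ, up to 'e'.
step :: Set Char -> Set Char
step s = Set.insert 'a' (Set.map succ (Set.filter (< 'e') s))

main :: IO ()
main = print (lfpIter step)   -- fromList "abcde"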
When I try to write
fun foo :: "nat ⇒ nat"
where "foo = Suc"
Isabelle complains that "Function has no arguments". Why is this? What's wrong with a fun having no arguments? I know that I can change fun to abbreviation or definition and all is fine. But it seems a shame to spoil the uniformity of my .thy file, in which every other definition is declared with fun.
Alex Krauss, the author of the present generation of fun and function in Isabelle/HOL, had particular opinions about that, and probably also good formal reasons to say that a "function" really needs to have arguments. In SML you actually have a similar situation: "constants" without arguments are defined via val, not fun.
In the rare situations where zero-argument functions are needed in Isabelle/HOL, it is sufficiently easy to use definition [simp] "c = t" to get mostly the same result, apart from the name of the key theorems produced internally: c_def versus c.simps.
I think the main inconvenience and occasional pitfall of function in this respect is its exposure of the auxiliary c_def, which is not meant to be used in applications: it unfolds the internal construction behind the function specification, not its main characterizing equation.
Since no other answer seems to be on its way, let me repeat and extend on my previous comment. In Isabelle/HOL there are three ways of defining functions:
definition for non-recursive functions (which could just be seen as constants that serve as abbreviations for longer statements).
primrec for primitive recursive functions (in the sense that in every recursive call there is a fixed argument where a datatype constructor is removed).
fun for general recursive functions.
Both primrec and fun expect at least one argument. For the former, it is automatically checked that one of its arguments corresponds to the syntactic pattern of primitive recursion on datatypes, while for the latter the task of proving "termination" (or rather the well-foundedness of the call graph) is delegated to the user in hard cases.
Anyway, it would of course be possible to have primrec and fun fall back on definition for easy cases without arguments, but at least to me this rather seems to obfuscate things for the user instead of clearing them up.
I recently got confused by quotation, reification, and reflection. Could someone offer a good explanation of their relationship and differences (if any)?
Quoting
This is probably the easiest one. Consider what happens when you type the following into the REPL:
(+ a 1)
REPL stands for Read Eval Print Loop, so first it Reads this in. This is a list, so after reading we have a list of 3 elements containing: <the symbol "+"> <the symbol "a"> <the number 1>
The next step is evaluation. Evaluating a list in Common Lisp involves looking up the function (or macro) bound to the first item in the list. Since + is bound to a function and not a macro, it then evaluates each subsequent element in the list. Numbers evaluate to themselves, and "a" will evaluate to whatever it is bound to. Now that the arguments are evaluated, the function "+" is called with the results of the evaluation.
We then Print the result and Loop back to the Read step.
So this is great, but what if we want something that, when evaluated, will end up as a list of 3 elements containing <the symbol "+"> <the symbol "a"> <the number 1>? The solution to this is quoting. Lisps in general have a special form called "quote" that takes a single argument, and the result is that argument, unevaluated. So
(quote (+ a 1))
will evaluate to that list. As some syntactic sugar, ' is treated the same as (quote ), so we can just write '(+ a 1).
Reification
Reification is a generic term that roughly means "make an abstract concept concrete". Specific to programming, when something is reified, it roughly means you can treat it as data (or a "first-class object"). An example of this in Lisp is functions: the lambda expression gives you the ability to create a concrete, first-class object that represents the abstract concept of a function call. Another example is a CLOS class, which is itself a CLOS object, and which represents the abstract concept of a class.
Reflection
To a certain degree, reflection is the opposite of reification. Given something concrete, you want some information about its abstract representation. Consider a Common Lisp package object, which is a reification of the concept of a package, which is just a mapping from symbol names to symbols. You can use do-symbols to iterate over all of the symbols in a package, thus getting that information out at runtime.
Also, remember how I said lambdas were a reification of functions? Well, function-lambda-expression is the reflection on functions.
The metaobject protocol (MOP) is a semi-standard way of doing all sorts of things with classes and objects. Among other things, it allows reflection on classes and objects.
Why is the Haskell implementation so focused on linked lists?
For example, I know Data.Sequence is more efficient
with most of the list operations (except for the cons operation), and is used a lot;
syntactically, though, it is "hardly supported". Haskell has put a lot of effort into functional abstractions, such as the Functor and the Foldable class, but their syntax is not compatible with that of the default list.
If, in a project I want to optimize and replace my lists with sequences - or if I suddenly want support for infinite collections, and replace my sequences with lists - the resulting code changes are abhorrent.
So I guess my wondering can be made concrete in questions such as:
Why isn't the type of map equal to (Functor f) => (a -> b) -> f a -> f b?
Why can't the [] and (:) functions be used for, for example, the type in Data.Sequence?
I am really hoping there is some explanation for this, that doesn't include the words "backwards compatibility" or "it just grew that way", though if you think there isn't, please let me know. Any relevant language extensions are welcome as well.
Before getting into why, here's a summary of the problem and what you can do about it. The constructors [] and (:) are reserved for lists and cannot be redefined. If you plan to use the same code with multiple data types, then define or choose a type class representing the interface you want to support, and use methods from that class.
Here are some generalized functions that work on both lists and sequences. I don't know of a generalization of (:), but you could write your own.
fmap instead of map
mempty instead of []
mappend instead of (++)
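For instance, here is a small sketch (the names doubleAll and joinTwo are mine) of code written against these generic interfaces, which then works for both lists and Data.Sequence:

import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Any Functor, so both [Int] and Seq Int:
doubleAll :: Functor f => f Int -> f Int
doubleAll = fmap (* 2)

-- Any Monoid, so both list concatenation and Seq concatenation:
joinTwo :: Monoid m => m -> m -> m
joinTwo = mappend

main :: IO ()
main = do
  print (doubleAll [1, 2, 3])                              -- [2,4,6]
  print (doubleAll (Seq.fromList [1, 2, 3]))               -- fromList [2,4,6]
  print (joinTwo [1, 2] [3 :: Int])                        -- [1,2,3]
  print (joinTwo (Seq.fromList "ab") (Seq.fromList "cd"))  -- fromList "abcd"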
If you plan to do a one-off data type replacement, then you can define your own names for things, and redefine them later.
-- For now, use lists
type List a = [a]
nil = []
cons x xs = x : xs
{- Switch to Seq in the future
-- type List a = Seq a
-- nil = empty
-- cons x xs = x <| xs
-}
Note that [] and (:) are constructors: you can also use them for pattern matching. Pattern matching is specific to one type constructor, so you can't extend a pattern to work on a new data type without rewriting the pattern-matching code.
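For example (my own sketch), a function that pattern-matches on the list constructors has to be rewritten against Data.Sequence's views to work on Seq:

import Data.Sequence (Seq, ViewL(..), viewl)

-- Tied to the [] / (:) constructors:
headOrZero :: [Int] -> Int
headOrZero (x : _) = x
headOrZero []      = 0

-- The Seq version must be written against Seq's own view instead:
headOrZeroSeq :: Seq Int -> Int
headOrZeroSeq s = case viewl s of
  x :< _ -> x
  EmptyL -> 0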
Why there's so much list-specific stuff in Haskell
Lists are commonly used to represent sequential computations, rather than data. In an imperative language, you might build a Set with a loop that creates elements and inserts them into the set one by one. In Haskell, you do the same thing by creating a list and then passing the list to Set.fromList. Since lists so closely match this abstraction of computation, they have a place that's unlikely to ever be superseded by another data structure.
The fact remains that some functions are list-specific when they could have been generic. Some common functions like map were made list-specific so that new users would have less to learn. In particular, they provide simpler and (it was decided) more understandable error messages. Since it's possible to use generic functions instead, the problem is really just a syntactic inconvenience. It's worth noting that Haskell language implementations have very little list-specific code, so new data structures and methods can be just as efficient as the "built-in" ones.
There are several classes that are useful generalizations of lists:
Functor supplies fmap, a generalization of map.
Monoid supplies methods useful for collections with list-like structure. The empty list [] is generalized to other containers by mempty, and list concatenation (++) is generalized to other containers by mappend.
Applicative and Monad supply methods that are useful for interpreting collections as computations.
Traversable and Foldable supply useful methods for running computations over collections.
Of these, only Functor and Monad were in the influential Haskell 98 spec, so the others have been overlooked to varying degrees by library writers, depending on when the library was written and how actively it was maintained. The core libraries have been good about supporting new interfaces.
I remember reading somewhere that map is for lists by default since newcomers to Haskell would be put off if they made a mistake and saw a complex error about "Functors", which they have no idea about. Therefore, they have both map and fmap instead of just map.
EDIT: That "somewhere" is the Monad Reader Issue 13, page 20, footnote 3:
You might ask why we need a separate map function. Why not just do away with the current list-only map function, and rename fmap to map instead? Well, that's a good question. The usual argument is that someone just learning Haskell, when using map incorrectly, would much rather see an error about lists than about Functors.
For (:), the (<|) function seems to be a replacement. I have no idea about [].
A nitpick: Data.Sequence isn't more efficient for "list operations", it is more efficient for sequence operations. That said, a lot of the functions in Data.List are really sequence operations. The finger tree inside Data.Sequence has to do quite a bit more work for a cons (<|) than the list (:) does, and its memory representation is also somewhat larger than a list's, as it is built from two data types, a FingerTree and a Deep.
The extra syntax for lists is fine; it hits the sweet spot of what lists are good at - cons (:) and pattern-matching from the left. Whether or not sequences should have extra syntax is open to further debate, but as you can get a very long way with lists, and lists are inherently simple, having good syntax for them is a must.
List isn't an ideal representation for Strings - the memory layout is inefficient, as each Char is wrapped with a constructor. This is why ByteStrings were introduced. Although they are laid out as an array, ByteStrings have to do a bit of administrative work - [Char] can still be competitive if you are using short strings. In GHC there are language extensions to give ByteStrings more String-like syntax.
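For example, with the OverloadedStrings extension (a sketch of my own) a string literal can be used directly as a ByteString:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS

greeting :: BS.ByteString
greeting = "Hello, world"   -- the literal is a ByteString, not a [Char]

main :: IO ()
main = BS.putStrLn greeting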
The other major lazy functional language, Clean, has always represented strings as byte arrays, but its type system made this more practical - I believe the ByteString library uses unsafePerformIO under the hood.
With version 7.8, GHC supports overloaded list literals; compare the manual. For example, given appropriate IsList instances, you can write
['0' .. '9'] :: Set Char
[1 .. 10] :: Vector Int
[("default",0), (k1,v1)] :: Map String Int
['a' .. 'z'] :: Text
(quoted from the documentation).
I am pretty sure this won't be an answer to your question, but still.
I wish Haskell had more liberal function names (mixfix!) à la Agda. Then the syntax for the list constructors (:, []) wouldn't have been magic, allowing us to at least hide the list type and use the same tokens for our own types.
The amount of code change while migrating between list and custom sequence types would be minimal then.
About map, you are a bit luckier. You can always hide map, and set it equal to fmap yourself.
import Prelude hiding (map)
map :: (Functor f) => (a -> b) -> f a -> f b
map = fmap
Prelude is great, but it isn't the best part of Haskell.