A Functional-Imperative Hybrid - functional-programming

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.

My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)?
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.

The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
The proof of this self-contained property is done through the type-system. The ST monad carries a state-thread parameter, usually denoted with a type-variable s. When you have such a computation you'll have monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off the immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep the most of the performance characteristics of the in place sorting algorithms and still get purity.
In this case it would look something like:
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
mutableVector <- thaw vector
sort mutableVector
freeze mutableVector
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.

"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, unmutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknonwst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to - as being specifically "imperative" and not "functional" - is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibility infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Goto's are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression that conditions the value of that expression. The archetypical case in point is Landin's original case, itself (in his 1960's paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement, the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just λx f(x) (x + 1). The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop and went "Oops. Never mind." and fell back into what become an overwrought and overthought solution, while this simple (and in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;" - which has the "infinite mutability" mentioned above, may itself be treated as an infinitary wrapper, "if (x < n) { x = x + 1; if (x < 1) { x = x + 1; if (x < 1) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n (λx x < n? (λx·...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels, here, are "A:" and "B:", while the references to the subexpressions so labelled as "goto A" and "goto B", respectively. So, by magic, the very essence of Imperativitity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus, emerges the imperative programming language paradigm automatically out of the pure lambda calculus, itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression, itself; and thus requires no additional hacks (like Landin's or later descendants, like monads) - as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980's via the USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages, like Prolog-II, and the much earlier treatment of cyclic recursive structures by infinitary expressions by Irene Guessarian LNCS 99 "Algebraic Semantics".
Now, earlier I said that the magma-based formulation might get you to the same place, or to an approximation thereof. I believe there is a kind of universal representation theorem of some sort, which asserts that the infinitary based formulation provides a purely syntactic representation, and that the semantics that arise from the monad-based representation factors through this as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.

Related

Simple example of call-by-need

I'm trying to understand the theorem behind "call-by-need." I do understand the definition, but I'm a bit confused. I would like to see a simple example which shows how call-by-need works.
After reading some previous threads, I found out that Haskell uses this kind of evaluation. Are there any other programming languages which support this feature?
I read about the call-by-name of Scala, and I do understand that call-by-name and call-by-need are similar but different by the fact that call-by-need will keep the evaluated value. But I really would love to see a real-life example (it does not have to be in Haskell), which shows call-by-need.
The function
say_hello numbers = putStrLn "Hello!"
ignores its numbers argument. Under call-by-value semantics, even though an argument is ignored, the parameter at the function call site may need to be evaluated, perhaps because of side effects that the rest of the program depends on.
In Haskell, we might call say_hello as
say_hello [1..]
where [1..] is the infinite list of naturals. Under call-by-value semantics, the CPU would run off trying to build an infinite list and never get to the say_hello at all!
Haskell merely outputs
$ runghc cbn.hs
Hello!
For less dramatic examples, the first ten natural numbers are
ghci> take 10 [1..]
[1,2,3,4,5,6,7,8,9,10]
The first ten odds are
ghci> take 10 $ filter odd [1..]
[1,3,5,7,9,11,13,15,17,19]
Under call-by-need semantics, each value — even a conceptually infinite one as in the examples above — is evaluated only to the extent required and no more.
update: A simple example, as asked for:
ff 0 = 1
ff 1 = 1
ff n = go (ff (n-1))
where
go x = x + x
Under call-by-name, each invocation of go evaluates ff (n-1) twice, each for each appearance of x in its definition (because + is strict in both arguments, i.e. demands the values of the both of them).
Under call-by-need, go's argument is evaluated at most once. Specifically, here, x's value is found out only once, and reused for the second appearance of x in the expression x + x. If it weren't needed, x wouldn't be evaluated at all, just as with call-by-name.
Under call-by-value, go's argument is always evaluated exactly once, prior to entering the function's body, even if it isn't used anywhere in the function's body.
Here's my understanding of it, in the context of Haskell.
According to Wikipedia, "call by need is a memoized variant of call by name where, if the function argument is evaluated, that value is stored for subsequent uses."
Call by name:
take 10 . filter even $ [1..]
With one consumer the produced value disappears after being produced so it might as well be call-by-name.
Call by need:
import qualified Data.List.Ordered as O
h = 1 : map (2*) h <> map (3*) h <> map (5*) h
where
(<>) = O.union
The difference is, here the h list is reused by several consumers, at different tempos, so it is essential that the produced values are remembered. In a call-by-name language there'd be much replication of computational effort here because the computational expression for h would be substituted at each of its occurrences, causing separate calculation for each. In a call-by-need--capable language like Haskell the results of computing the elements of h are shared between each reference to h.
Another example is, most any data defined by fix is only possible under call-by-need. With call-by-value the most we can have is the Y combinator.
See: Sharing vs. non-sharing fixed-point combinator and its linked entries and comments (among them, this, and its links, like Can fold be used to create infinite lists?).

Correct way of writing recursive functions in CLP(R) with Prolog

I am very confused in how CLP works in Prolog. Not only do I find it hard to see the benefits (I do see it in specific cases but find it hard to generalise those) but more importantly, I can hardly make up how to correctly write a recursive predicate. Which of the following would be the correct form in a CLP(R) way?
factorial(0, 1).
factorial(N, F):- {
N > 0,
PrevN = N - 1,
factorial(PrevN, NewF),
F = N * NewF}.
or
factorial(0, 1).
factorial(N, F):- {
N > 0,
PrevN = N - 1,
F = N * NewF},
factorial(PrevN, NewF).
In other words, I am not sure when I should write code outside the constraints. To me, the first case would seem more logical, because PrevN and NewF belong to the constraints. But if that's true, I am curious to see in which cases it is useful to use predicates outside the constraints in a recursive function.
There are several overlapping questions and issues in your post, probably too many to coherently address to your complete satisfaction in a single post.
Therefore, I would like to state a few general principles first, and then—based on that—make a few specific comments about the code you posted.
First, I would like to address what I think is most important in your case:
LP &subseteq; CLP
This means simply that CLP can be regarded as a superset of logic programming (LP). Whether it is to be considered a proper superset or if, in fact, it makes even more sense to regard them as denoting the same concept is somewhat debatable. In my personal view, logic programming without constraints is much harder to understand and much less usable than with constraints. Given that also even the very first Prolog systems had a constraint like dif/2 and also that essential built-in predicates like (=)/2 perfectly fit the notion of "constraint", the boundaries, if they exist at all, seem at least somewhat artificial to me, suggesting that:
LP &approx; CLP
Be that as it may, the key concept when working with CLP (of any kind) is that the constraints are available as predicates, and used in Prolog programs like all other predicates.
Therefore, whether you have the goal factorial(N, F) or { N > 0 } is, at least in principle, the same concept: Both mean that something holds.
Note the syntax: The CLP(&Rscr;) constraints have the form { C }, which is {}(C) in prefix notation.
Note that the goal factorial(N, F) is not a CLP(&Rscr;) constraint! Neither is the following:
?- { factorial(N, F) }.
ERROR: Unhandled exception: type_error({factorial(_3958,_3960)},...)
Thus, { factorial(N, F) } is not a CLP(&Rscr;) constraint either!
Your first example therefore cannot work for this reason alone already. (In addition, you have a syntax error in the clause head: factorial (, so it also does not compile at all.)
When you learn working with a constraint solver, check out the predicates it provides. For example, CLP(&Rscr;) provides {}/1 and a few other predicates, and has a dedicated syntax for stating relations that hold about floating point numbers (in this case).
Other constraint solver provide their own predicates for describing the entities of their respective domains. For example, CLP(FD) provides (#=)/2 and a few other predicates to reason about integers. dif/2 lets you reason about any Prolog term. And so on.
From the programmer's perspective, this is exactly the same as using any other predicate of your Prolog system, whether it is built-in or stems from a library. In principle, it's all the same:
A goal like list_length(Ls, L) can be read as: "The length of the list Ls is L."
A goal like { X = A + B } can be read as: The number X is equal to the sum of A and B. For example, if you are using CLP(Q), it is clear that we are talking about rational numbers in this case.
In your second example, the body of the clause is a conjunction of the form (A, B), where A is a CLP(&Rscr;) constraint, and B is a goal of the form factorial(PrevN, NewF).
The point is: The CLP(&Rscr;) constraint is also a goal! Check it out:
?- write_canonical({a,b,c}).
{','(a,','(b,c))}
true.
So, you are simply using {}/1 from library(clpr), which is one of the predicates it exports.
You are right that PrevN and NewF belong to the constraints. However, factorial(PrevN, NewF) is not part of the mini-language that CLP(&Rscr;) implements for reasoning over floating point numbers. Therefore, you cannot pull this goal into the CLP(&Rscr;)-specific part.
From a programmer's perspective, a major attraction of CLP is that it blends in completely seamlessly into "normal" logic programming, to the point that it can in fact hardly be distinguished at all from it: The constraints are simply predicates, and written down like all other goals.
Whether you label a library predicate a "constraint" or not hardly makes any difference: All predicates can be regarded as constraints, since they can only constrain answers, never relax them.
Note that both examples you post are recursive! That's perfectly OK. In fact, recursive predicates will likely be the majority of situations in which you use constraints in the future.
However, for the concrete case of factorial, your Prolog system's CLP(FD) constraints are likely a better fit, since they are completely dedicated to reasoning about integers.

Struggling with the basics

I am trying to self-learn SML. Though I can write some SML code, I have not attained the ah ha.
In:
val x = 5;
how does binding a value of 5 to the name x differ from assigning a value of 5 to the memory location/variable x in imperative programming?
How does the above expression elucidate "a style that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data"?
What do I have to throw away about imperative programming to catch on FP quickly?
Please be gentle with me.
How does binding a value of 5 to the name x differ from assigning a value of 5 to the memory location/variable x in imperative programming?
The key difference between variables in functional programming and variables in imperative programming is that variables in functional programming cannot be modified.
Then why are they called “variables”?
Variables in functional programming languages don't vary in the sense that you can modify their value. However, they do vary in the mathematical sense in that they represent some unknown constant value. For example, consider the following quadratic expression:
This is a quadratic expression in one variable. The variable x varies in the sense that you can select any value for x. However, once you select a certain value for x you cannot change that value.
Now, if you have a quadractic equation then the choice of x is no longer arbitrary. For example, consider the following quadratic equation:
The only choices for x are those which satisfy the equation (i.e. x = -2 or x = -1.5).
A mathematical function on the other hand is a relation between two sets, called the domain (input set) and the codomain (output set). For example, consider the following function:
This function relates every x belonging to the set of real numbers to its corresponding 2x^2 + 7x + 6, also belonging to the set of real numbers. Again, we are free to choose any value for x. However, once we choose some value for x we are not allowed to modify it.
The point is that they are called variables because they vary in the mathematical sense that they can assume different values. Hence, they vary with instance. However, they do not vary with time.
This immutability of variables is important in functional programming because it makes your program referentially transparent (i.e. invoking a function can be thought of as simply replacing the function call with the function body).
If variables were allowed to vary with time then you wouldn't be able to simply substitute the function call with the function body because the substituted variable could change over time. Hence, your substituted function body could become invalid over time.
How does the above expression elucidate "a style that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data"?
The expression val x = 5; could be thought of as a function application (fn x => y) 5 where y is the rest of the program. Functional programming is all about functions, and immutability in the sense that variables only vary with instance and not with time.
In particular, mutability is considered bad because it is implicit. Anything that is implicit could potentially cause unforeseeable errors in your program. Hence, the Zen of Python explicitly states that:
Explicit is better than implicit.
In fact, mutable state is the primary cause of software complexity. Hence, functional programming tries to eschew mutability as much as possible. However, that doesn't mean that we only use immutable structures. We can use mutable structures. We just need to be explicit about it.
What do I have to throw away about imperative programming to catch on FP quickly?
Nothing. Functional programming and imperative programming are two ways of thinking about computation. Computation is a very abstract idea. What exactly is computation? How do we know which problems can be computed? These were some of the questions that mathematicians struggled with in the early nineteen hundreds.
To think about computation we need to have a formal model of computation (i.e. a formal system that captures the notion of computation). There are various models of computation. However, the most famous are the lambda calculus (which gave rise to functional programming) and turing machines (which gave rise to imperative programming).
Both the lambda calculus and the turing machine are equivalent in power. They both capture the notion of computing and every problem that can be computed can be expressed as either a lambda calculus expression or an equivalent turing machine. The only difference is the way in which you express your problem.
You don't have to throw away anything that you learned about imperative programming to understand functional programming. Neither one is better than the other. They are both just different ways of expressing the same problem. However, you do need to start thinking like a functional programmer to write good functional programs.

What is so special about Monads?

A monad is a mathematical structure which is heavily used in (pure) functional programming, basically Haskell. However, there are many other mathematical structures available, like for example applicative functors, strong monads, or monoids. Some have more specific, some are more generic. Yet, monads are much more popular. Why is that?
One explanation I came up with, is that they are a sweet spot between genericity and specificity. This means monads capture enough assumptions about the data to apply the algorithms we typically use and the data we usually have fulfills the monadic laws.
Another explanation could be that Haskell provides syntax for monads (do-notation), but not for other structures, which means Haskell programmers (and thus functional programming researchers) are intuitively drawn towards monads, where a more generic or specific (efficient) function would work as well.
I suspect that the disproportionately large attention given to this one particular type class (Monad) over the many others is mainly a historical fluke. People often associate IO with Monad, although the two are independently useful ideas (as are list reversal and bananas). Because IO is magical (having an implementation but no denotation) and Monad is often associated with IO, it's easy to fall into magical thinking about Monad.
(Aside: it's questionable whether IO even is a monad. Do the monad laws hold? What do the laws even mean for IO, i.e., what does equality mean? Note the problematic association with the state monad.)
If a type m :: * -> * has a Monad instance, you get Turing-complete composition of functions with type a -> m b. This is a fantastically useful property. You get the ability to abstract various Turing-complete control flows away from specific meanings. It's a minimal composition pattern that supports abstracting any control flow for working with types that support it.
Compare this to Applicative, for instance. There, you get only composition patterns with computational power equivalent to a push-down automaton. Of course, it's true that more types support composition with more limited power. And it's true that when you limit the power available, you can do additional optimizations. These two reasons are why the Applicative class exists and is useful. But things that can be instances of Monad usually are, so that users of the type can perform the most general operations possible with the type.
Edit:
By popular demand, here are some functions using the Monad class:
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
(*&&) :: Monad m => m Bool -> m Bool -> m Bool
x *&& y = ifM x y (return False)
(*||) :: Monad m => m Bool -> m Bool -> m Bool
x *|| y = ifM x (return True) y
notM :: Monad m => m Bool -> m Bool
notM x = x >>= return . not
Combining those with do syntax (or the raw >>= operator) gives you name binding, indefinite looping, and complete boolean logic. That's a well-known set of primitives sufficient to give Turing completeness. Note how all the functions have been lifted to work on monadic values, rather than simple values. All monadic effects are bound only when necessary - only the effects from the chosen branch of ifM are bound into its final value. Both *&& and *|| ignore their second argument when possible. And so on..
Now, those type signatures may not involve functions for every monadic operand, but that's just a cognitive simplification. There would be no semantic difference, ignoring bottoms, if all the non-function arguments and results were changed to () -> m a. It's just friendlier to users to optimize that cognitive overhead out.
Now, let's look at what happens to those functions with the Applicative interface.
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Well, uh. It got the same type signature. But there's a really big problem here already. The effects of both x and y are bound into the composed structure, regardless of which one's value is selected.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <$> step x) (pure x)
Well, ok, that seems like it'd be ok, except for the fact that it's an infinite loop because ifA will always execute both branches... Except it's not even that close. pure x has the type f a. whileA p step <$> step x has the type f (f a). This isn't even an infinite loop. It's a compile error. Let's try again..
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well shoot. Don't even get that far. whileA p step has the type a -> f a. If you try to use it as the first argument to <*>, it grabs the Applicative instance for the top type constructor, which is (->), not f. Yeah, this isn't gonna work either.
In fact, the only function from my Monad examples that would work with the Applicative interface is notM. That particular function works just fine with only a Functor interface, in fact. The rest? They fail.
Of course it's to be expected that you can write code using the Monad interface that you can't with the Applicative interface. It is strictly more powerful, after all. But what's interesting is what you lose. You lose the ability to compose functions that change what effects they have based on their input. That is, you lose the ability to write certain control-flow patterns that compose functions with types a -> f b.
Turing-complete composition is exactly what makes the Monad interface interesting. If it didn't allow Turing-complete composition, it would be impossible for you, the programmer, to compose together IO actions in any particular control flow that wasn't nicely prepackaged for you. It was the fact that you can use the Monad primitives to express any control flow that made the IO type a feasible way to manage the IO problem in Haskell.
Many more types than just IO have semantically valid Monad interfaces. And it happens that Haskell has the language facilities to abstract over the entire interface. Due to those factors, Monad is a valuable class to provide instances for, when possible. Doing so gets you access to all the existing abstract functionality provided for working with monadic types, regardless of what the concrete type is.
So if Haskell programmers seem to always care about Monad instances for a type, it's because it's the most generically-useful instance that can be provided.
First, I think that it is not quite true that monads are much more popular than anything else; both Functor and Monoid have many instances that are not monads. But they are both very specific; Functor provides mapping, Monoid concatenation. Applicative is the one class that I can think of that is probably underused given its considerable power, due largely to its being a relatively recent addition to the language.
But yes, monads are extremely popular. Part of that is the do notation; a lot of Monoids provide Monad instances that merely append values to a running accumulator (essentially an implicit writer). The blaze-html library is a good example. The reason, I think, is the power of the type signature (>>=) :: Monad m => m a -> (a -> m b) -> m b. While fmap and mappend are useful, what they can do is fairly narrowly constrained. bind, however, can express a wide variety of things. It is, of course, canonized in the IO monad, perhaps the best pure functional approach to IO before streams and FRP (and still useful beside them for simple tasks and defining components). But it also provides implicit state (Reader/Writer/ST), which can avoid some very tedious variable passing. The various state monads, especially, are important because they provide a guarantee that state is single threaded, allowing mutable structures in pure (non-IO) code before fusion. But bind has some more exotic uses, such as flattening nested data structures (the List and Set monads), both of which are quite useful in their place (and I usually see them used desugared, calling liftM or (>>=) explicitly, so it is not a matter of do notation). So while Functor and Monoid (and the somewhat rarer Foldable, Alternative, Traversable, and others) provide a standardized interface to a fairly straightforward function, Monad's bind is considerably more flexibility.
In short, I think that all your reasons have some role; the popularity of monads is due to a combination of historical accident (do notation and the late definition of Applicative) and their combination of power and generality (relative to functors, monoids, and the like) and understandability (relative to arrows).
Well, first let me explain what the role of monads is: Monads are very powerful, but in a certain sense: You can pretty much express anything using a monad. Haskell as a language doesn't have things like action loops, exceptions, mutation, goto, etc. Monads can be expressed within the language (so they are not special) and make all of these reachable.
There is a positive and a negative side to this: It's positive that you can express all those control structures you know from imperative programming and a whole bunch of them you don't. I have just recently developed a monad that lets you reenter a computation somewhere in the middle with a slightly changed context. That way you can run a computation, and if it fails, you just try again with slightly adjusted values. Furthermore monadic actions are first class, and that's how you build things like loops or exception handling. While while is primitive in C in Haskell it's actually just a regular function.
The negative side is that monads give you pretty much no guarantees whatsoever. They are so powerful that you are allowed to do whatever you want, to put it simply. In other words just like you know from imperative languages it can be hard to reason about code by just looking at it.
The more general abstractions are more general in the sense that they allow some concepts to be expressed which you can't express as monads. But that's only part of the story. Even for monads you can use a style known as applicative style, in which you use the applicative interface to compose your program from small isolated parts. The benefit of this is that you can reason about code by just looking at it and you can develop components without having to pay attention to the rest of your system.
What is so special about monads?
The monadic interface's main claim to fame in Haskell is its role in the replacement of the original and unwieldy dialogue-based I/O mechanism.
As for their status in a formal investigative context...it is merely an iteration of a seemingly-cyclic endeavour which is now (2021 Oct) approximately one half-century old:
During the 1960s, several researchers began work on proving things about programs. Efforts were
made to prove that:
A program was correct.
Two programs with different code computed the same answers when given the
same inputs.
One program was faster than another.
A given program would always terminate.
While these are abstract goals, they are all, really, the same as the practical goal of "getting the
program debugged".
Several difficult problems emerged from this work. One was the problem of specification: before
one can prove that a program is correct, one must specify the meaning of "correct", formally and
unambiguously. Formal systems for specifying the meaning of a program were developed, and they
looked suspiciously like programming languages.
The Anatomy of Programming Languages, Alice E. Fischer and Frances S. Grodzinsky.
(emphasis by me.)
...back when "programming languages" - apart from an intrepid few - were most definitely imperative.
Anyone for elevating this mystery to the rank of Millenium problem? Solving it would definitely advance the science of computing and the engineering of software, one way or the other...
Monads are special because of do notation, which lets you write imperative programs in a functional language. Monad is the abstraction that allows you to splice together imperative programs from smaller, reusable components (which are themselves imperative programs). Monad transformers are special because they represent enhancing an imperative language with new features.

Algebraic structure and programming

May anyone give me an example how we can improve our code reusability using algebraic structures like groups, monoids and rings? (or how can i make use of these kind of structures in programming, knowing at least that i didn't learn all that theory in highschool for nothing).
I heard this is possible but i can't figure out a way applying them in programming and genereally applying hardcore mathematics in programming.
It is not really the mathematical stuff that helps as is the mathematical thinking. Abstraction is the key in programming. Transforming real live concepts into numbers and relations is what we do every day. Algebra is the mother of all, algebra is the set of rules that defines correctness, it is the highest level of abstraction, so, understanding algebra means you can think more clear, more faster, more efficient. Commencing from Sets theory to Category Theory, Domain Theory etc everything comes from practical challenges, abstraction and generalization requirements.
In common practice you will not need to actually know these, although if you are thinking of developing stuff like AI Agents, programming languages, fundamental concepts and tools then they are a must.
In functional programming, esp. Haskell, it's common to structure programs that transform states as monads. Doing so means you can reuse generic algorithms on monads in very different programs.
The C++ standard template library features the concept of a monoid. The idea is again that generic algorithms may require an operation to satisfies the axioms of monoids for their correctness.
E.g., if we can prove the type T we're operating on (numbers, string, whatever) is closed under the operation, we know we won't have to check for certain errors; we always get a valid T back. If we can prove that the operation is associative (x * (y * z) = (x * y) * z), then we can reuse the fork-join architecture; a simple but way of parallel programming implemented in various libraries.
Computer science seems to get a lot of milage out of category theory these days. You get monads, monoids, functors -- an entire bestiary of mathematical entities that are being used to improve code reusability, harnessing the abstraction of abstract mathematics.
Lists are free monoids with one generator, binary trees are groups. You have either the finite or infinite variant.
Starting points:
http://en.wikipedia.org/wiki/Algebraic_data_type
http://en.wikipedia.org/wiki/Initial_algebra
http://en.wikipedia.org/wiki/F-algebra
You may want to learn category theory, and the way category theory approaches algebraic structures: it is exactly the way functional programming languages approach data structures, at least shapewise.
Example: the type Tree A is
Tree A = () | Tree A | Tree A * Tree A
which reads as the existence of a isomorphism (*) (I set G = Tree A)
1 + G + G x G -> G
which is the same as a group structure
phi : 1 + G + G x G -> G
() € 1 -> e
x € G -> x^(-1)
(x, y) € G x G -> x * y
Indeed, binary trees can represent expressions, and they form an algebraic structure. An element of G reads as either the identity, an inverse of an element or the product of two elements. A binary tree is either a leaf, a single tree or a pair of trees. Note the similarity in shape.
(*) as well as a universal property, but they are two of them (finite trees or infinite lazy trees) so I won't dwelve into details.
As I had no idea this stuff existed in the computer science world, please disregard this answer ;)
I don't think the two fields (no pun intended) have any overlap. Rings/fields/groups deal with mathematical objects. Consider a part of the definition of a field:
For every a in F, there exists an element −a in F, such that a + (−a) = 0. Similarly, for any a in F other than 0, there exists an element a^−1 in F, such that a · a^−1 = 1. (The elements a + (−b) and a · b^−1 are also denoted a − b and a/b, respectively.) In other words, subtraction and division operations exist.
What the heck does this mean in terms of programming? I surely can't have an additive inverse of a list object in Python (well, I could just destroy the object, but that is like the multiplicative inverse. I guess you could get somewhere trying to define a Python-ring, but it just won't work out in the end). Don't even think about dividing lists...
As for code readability, I have absolutely no idea how this can even be applied, so this application is irrelevant.
This is my interpretation, but being a mathematics major probably makes me blind to other terminology from different fields (you know which one I'm talking about).
Monoids are ubiquitous in programming. In some programming languages, eg. Haskell, we can make monoids explicit http://blog.sigfpe.com/2009/01/haskell-monoids-and-their-uses.html

Resources