What are the most interesting equivalences arising from the Curry-Howard Isomorphism?

What are the most interesting equivalences arising from the Curry-Howard Isomorphism? - functional-programming

I came upon the Curry-Howard Isomorphism relatively late in my programming life, and perhaps this contributes to my being utterly fascinated by it. It implies that for every programming concept there exists a precise analogue in formal logic, and vice versa. Here's a "basic" list of such analogies, off the top of my head:
program/definition | proof
type/declaration | proposition
inhabited type | theorem/lemma
function | implication
function argument | hypothesis/antecedent
function result | conclusion/consequent
function application | modus ponens
recursion | induction
identity function | tautology
non-terminating function | absurdity/contradiction
tuple | conjunction (and)
disjoint union | disjunction (or) -- corrected by Antal S-Z
parametric polymorphism | universal quantification
So, to my question: what are some of the more interesting/obscure implications of this isomorphism? I'm no logician so I'm sure I've only scratched the surface with this list.
For example, here are some programming notions for which I'm unaware of pithy names in logic:
currying | "((a & b) => c) iff (a => (b => c))"
scope | "known theory + hypotheses"
And here are some logical concepts which I haven't quite pinned down in programming terms:
primitive type? | axiom
set of valid programs? | theory
Edit:
Here are some more equivalences collected from the responses:
function composition | syllogism -- from Apocalisp
continuation-passing | double negation -- from camccann

Since you explicitly asked for the most interesting and obscure ones:
You can extend C-H to many interesting logics and formulations of logics to obtain a really wide variety of correspondences. Here I've tried to focus on some of the more interesting ones rather than on the obscure, plus a couple of fundamental ones that haven't come up yet.
evaluation | proof normalisation/cut-elimination
variable | assumption
S K combinators | axiomatic formulation of logic
pattern matching | left-sequent rules
subtyping | implicit entailment (not reflected in expressions)
intersection types | implicit conjunction
union types | implicit disjunction
open code | temporal next
closed code | necessity
effects | possibility
reachable state | possible world
monadic metalanguage | lax logic
non-termination | truth in an unobservable possible world
distributed programs | modal logic S5/Hybrid logic
meta variables | modal assumptions
explicit substitutions | contextual modal necessity
pi-calculus | linear logic
EDIT: A reference I'd recommend to anyone interested in learning more about extensions of C-H:
"A Judgmental Reconstruction of Modal Logic" http://www.cs.cmu.edu/~fp/papers/mscs00.pdf - this is a great place to start because it starts from first principles and much of it is aimed to be accessible to non-logicians/language theorists. (I'm the second author though, so I'm biased.)

You're muddying things a little bit regarding nontermination. Falsity is represented by uninhabited types, which by definition can't be non-terminating because there's nothing of that type to evaluate in the first place.
Non-termination represents contradiction--an inconsistent logic. An inconsistent logic will of course allow you to prove anything, including falsity, however.
Ignoring inconsistencies, type systems typically correspond to an intuitionistic logic, and are by necessity constructivist, which means certain pieces of classical logic can't be expressed directly, if at all. On the other hand this is useful, because if a type is a valid constructive proof, then a term of that type is a means of constructing whatever you've proven the existence of.
A major feature of the constructivist flavor is that double negation is not equivalent to non-negation. In fact, negation is rarely a primitive in a type system, so instead we can represent it as implying falsehood, e.g., not P becomes P -> Falsity. Double negation would thus be a function with type (P -> Falsity) -> Falsity, which clearly is not equivalent to something of just type P.
However, there's an interesting twist on this! In a language with parametric polymorphism, type variables range over all possible types, including uninhabited ones, so a fully polymorphic type such as ∀a. a is, in some sense, almost-false. So what if we write double almost-negation by using polymorphism? We get a type that looks like this: ∀a. (P -> a) -> a. Is that equivalent to something of type P? Indeed it is, merely apply it to the identity function.
But what's the point? Why write a type like that? Does it mean anything in programming terms? Well, you can think of it as a function that already has something of type P somewhere, and needs you to give it a function that takes P as an argument, with the whole thing being polymorphic in the final result type. In a sense, it represents a suspended computation, waiting for the rest to be provided. In this sense, these suspended computations can be composed together, passed around, invoked, whatever. This should begin to sound familiar to fans of some languages, like Scheme or Ruby--because what it means is that double-negation corresponds to continuation-passing style, and in fact the type I gave above is exactly the continuation monad in Haskell.

Your chart is not quite right; in many cases you have confused types with terms.
function type implication
function proof of implication
function argument proof of hypothesis
function result proof of conclusion
function application RULE modus ponens
recursion n/a [1]
structural induction fold (foldr for lists)
mathematical induction fold for naturals (data N = Z | S N)
identity function proof of A -> A, for all A
non-terminating function n/a [2]
tuple normal proof of conjunction
sum disjunction
n/a [3] first-order universal quantification
parametric polymorphism second-order universal quantification
currying (A,B) -> C -||- A -> (B -> C), for all A,B,C
primitive type axiom
types of typeable terms theory
function composition syllogism
substitution cut rule
value normal proof
[1] The logic for a Turing-complete functional language is inconsistent. Recursion has no correspondence in consistent theories. In an inconsistent logic/unsound proof theory you could call it a rule which causes inconsistency/unsoundness.
[2] Again, this is a consequence of completeness. This would be a proof of an anti-theorem if the logic were consistent -- thus, it can't exist.
[3] Doesn't exist in functional languages, since they elide first-order logical features: all quantification and parametrization is done over formulae. If you had first-order features, there would be a kind other than *, * -> *, etc.; the kind of elements of the domain of discourse. For example, in Father(X,Y) :- Parent(X,Y), Male(X), X and Y range over the domain of discourse (call it Dom), and Male :: Dom -> *.

function composition | syllogism

I really like this question. I don't know a whole lot, but I do have a few things (assisted by the Wikipedia article, which has some neat tables and such itself):
I think that sum types/union types (e.g. data Either a b = Left a | Right b) are equivalent to inclusive disjunction. And, though I'm not very well acquainted with Curry-Howard, I think this demonstrates it. Consider the following function:
andImpliesOr :: (a,b) -> Either a b
andImpliesOr (a,_) = Left a
If I understand things correctly, the type says that (a ∧ b) → (a ★ b) and the definition says that this is true, where ★ is either inclusive or exclusive or, whichever Either represents. You have Either representing exclusive or, ⊕; however, (a ∧ b) ↛ (a ⊕ b). For instance, ⊤ ∧ ⊤ ≡ ⊤, but ⊤ ⊕ ⊥ ≡ ⊥, and ⊤ ↛ ⊥. In other words, if both a and b are true, then the hypothesis is true but the conclusion is false, and so this implication must be false. However, clearly, (a ∧ b) → (a ∨ b), since if both a and b are true, then at least one is true. Thus, if discriminated unions are some form of disjunction, they must be the inclusive variety. I think this holds as a proof, but feel more than free to disabuse me of this notion.
Similarly, your definitions for tautology and absurdity as the identity function and non-terminating functions, respectively, are a bit off. The true formula is represented by the unit type, which is the type which has only one element (data ⊤ = ⊤; often spelled () and/or Unit in functional programming languages). This makes sense: since that type is guaranteed to be inhabited, and since there's only one possible inhabitant, it must be true. The identity function just represents the particular tautology that a → a.
Your comment about non-terminating functions is, depending on what precisely you meant, more off. Curry-Howard functions on the type system, but non-termination is not encoded there. According to Wikipedia, dealing with non-termination is an issue, as adding it produces inconsistent logics (e.g., I can define wrong :: a -> b by wrong x = wrong x, and thus “prove” that a → b for any a and b). If this is what you meant by “absurdity”, then you're exactly correct. If instead you meant the false statement, then what you want instead is any uninhabited type, e.g. something defined by data ⊥—that is, a data type without any way to construct it. This ensures that it has no values at all, and so it must be uninhabited, which is equivalent to false. I think you could probably also use a -> b, since if we forbid non-terminating functions, then this is also uninhabited, but I'm not 100% sure.
Wikipedia says that axioms are encoded in two different ways, depending on how you interpret Curry-Howard: either in the combinators or in the variables. I think the combinator view means that the primitive functions we are given encode the things we can say by default (similar to the way that modus ponens is an axiom because function application is primitive). And I think that the variable view may actually mean the same thing—combinators, after all, are just global variables which are particular functions. As for primitive types: if I'm thinking about this correctly, then I think that primitive types are the entities—the primitive objects that we're trying to prove things about.
According to my logic and semantics class, the fact that (a ∧ b) → c ≡ a → (b → c) (and also that b → (a → c)) is called the exportation equivalence law, at least in natural deduction proofs. I didn't notice at the time that it was just currying—I wish I had, because that's cool!
While we now have a way to represent inclusive disjunction, we don't have a way to represent the exclusive variety. We should be able to use the definition of exclusive disjunction to represent it: a ⊕ b ≡ (a ∨ b) ∧ ¬(a ∧ b). I don't know how to write negation, but I do know that ¬p ≡ p → ⊥, and both implication and falsehood are easy. We should thus able to represent exclusive disjunction by:
data ⊥
data Xor a b = Xor (Either a b) ((a,b) -> ⊥)
This defines ⊥ to be the empty type with no values, which corresponds to falsity; Xor is then defined to contain both (and) Either an a or a b (or) and a function (implication) from (a,b) (and) to the bottom type (false). However, I have no idea what this means. (Edit 1: Now I do, see the next paragraph!) Since there are no values of type (a,b) -> ⊥ (are there?), I can't fathom what this would mean in a program. Does anyone know a better way to think about either this definition or another one? (Edit 1: Yes, camccann.)
Edit 1: Thanks to camccann's answer (more particularly, the comments he left on it to help me out), I think I see what's going on here. To construct a value of type Xor a b, you need to provide two things. First, a witness to the existence of an element of either a or b as the first argument; that is, a Left a or a Right b. And second, a proof that there are not elements of both types a and b—in other words, a proof that (a,b) is uninhabited—as the second argument. Since you'll only be able to write a function from (a,b) -> ⊥ if (a,b) is uninhabited, what does it mean for that to be the case? That would mean that some part of an object of type (a,b) could not be constructed; in other words, that at least one, and possibly both, of a and b are uninhabited as well! In this case, if we're thinking about pattern matching, you couldn't possibly pattern-match on such a tuple: supposing that b is uninhabited, what would we write that could match the second part of that tuple? Thus, we cannot pattern match against it, which may help you see why this makes it uninhabited. Now, the only way to have a total function which takes no arguments (as this one must, since (a,b) is uninhabited) is for the result to be of an uninhabited type too—if we're thinking about this from a pattern-matching perspective, this means that even though the function has no cases, there's no possible body it could have either, and so everything's OK.
A lot of this is me thinking aloud/proving (hopefully) things on the fly, but I hope it's useful. I really recommend the Wikipedia article; I haven't read through it in any sort of detail, but its tables are a really nice summary, and it's very thorough.

Here's a slightly obscure one that I'm surprised wasn't brought up earlier: "classical" functional reactive programming corresponds to temporal logic.
Of course, unless you're a philosopher, mathematician or obsessive functional programmer, this probably brings up several more questions.
So, first off: what is functional reactive programming? It's a declarative way to work with time-varying values. This is useful for writing things like user interfaces because inputs from the user are values that vary over time. "Classical" FRP has two basic data types: events and behaviors.
Events represent values which only exist at discrete times. Keystrokes are a great example: you can think of the inputs from the keyboard as a character at a given time. Each keypress is then just a pair with the character of the key and the time it was pressed.
Behaviors are values that exist constantly but can be changing continuously. The mouse position is a great example: it is just a behavior of x, y coordinates. After all, the mouse always has a position and, conceptually, this position changes continually as you move the mouse. After all, moving the mouse is a single protracted action, not a bunch of discrete steps.
And what is temporal logic? Appropriately enough, it's a set of logical rules for dealing with propositions quantified over time. Essentially, it extends normal first-order logic with two quantifiers: □ and ◇. The first means "always": read □φ as "φ always holds". The second is "eventually": ◇φ means that "φ will eventually hold". This is a particular kind of modal logic. The following two laws relate the quantifiers:
□φ ⇔ ¬◇¬φ
◇φ ⇔ ¬□¬φ
So □ and ◇ are dual to each other in the same way as ∀ and ∃.
These two quantifiers correspond to the two types in FRP. In particular, □ corresponds to behaviors and ◇ corresponds to events. If we think about how these types are inhabited, this should make sense: a behavior is inhabited at every possible time, while an event only happens once.

Related to the relationship between continuations and double negation, the type of call/cc is Peirce's law http://en.wikipedia.org/wiki/Call-with-current-continuation
C-H is usually stated as correspondence between intuitionistic logic and programs. However if we add the call-with-current-continuation (callCC) operator (whose type corresponds to Peirce's law), we get a correspondence between classical logic and programs with callCC.

2-continuation | Sheffer stoke
n-continuation language | Existential graph
Recursion | Mathematical Induction
One thing that is important, but have not yet being investigated is the relationship of 2-continuation (continuations that takes 2 parameters) and Sheffer stroke. In classic logic, Sheffer stroke can form a complete logic system by itself (plus some non-operator concepts). Which means the familiar and, or, not can be implemented using only the Sheffer stoke or nand.
This is an important fact of its programming type correspondence because it prompts that a single type combinator can be used to form all other types.
The type signature of a 2-continuation is (a,b) -> Void. By this implementation we can define 1-continuation (normal continuations) as (a,a) -> Void, product type as ((a,b)->Void,(a,b)->Void)->Void, sum type as ((a,a)->Void,(b,b)->Void)->Void. This gives us an impressive of its power of expressiveness.
If we dig further, we will find out that Piece's existential graph is equivalent to a language with the only data type is n-continuation, but I didn't see any existing languages is in this form. So inventing one could be interesting, I think.

While it's not a simple isomorphism, this discussion of constructive LEM is a very interesting result. In particular, in the conclusion section, Oleg Kiselyov discusses how the use of monads to get double-negation elimination in a constructive logic is analogous to distinguishing computationally decidable propositions (for which LEM is valid in a constructive setting) from all propositions. The notion that monads capture computational effects is an old one, but this instance of the Curry--Howard isomorphism helps put it in perspective and helps get at what double-negation really "means".

First-class continuations support allows you to express $P \lor \neg P$.
The trick is based on the fact that not calling the continuation and exiting with some expression is equivalent to calling the continuation with that same expression.
For more detailed view please see: http://www.cs.cmu.edu/~rwh/courses/logic/www-old/handouts/callcc.pdf

Related

Isabelle/HOL foundations

I have seen a lot of documentation about Isabelle's syntax and proof strategies. However, little have I found about its foundations. I have a few questions that I would be very grateful if someone could take the time to answer:
Why doesn't Isabelle/HOL admit functions that do not terminate? Many other languages such as Haskell do admit non-terminating functions.
What symbols are part of Isabelle's meta-language? I read that there are symbols in the meta-language for Universal Quantification (/\) and for implication (==>). However, these symbols have their counterpart in the object-level language (∀ and -->). I understand that --> is an object-level function of type bool => bool => bool. However, how are ∀ and ∃ defined? Are they object-level Boolean functions? If so, they are not computable (considering infinite domains). I noticed that I am able to write Boolean functions in therms of ∀ and ∃, but they are not computable. So what are ∀ and ∃? Are they part of the object-level? If so, how are they defined?
Are Isabelle theorems just Boolean expressions? Then Booleans are part of the meta-language?
As far as I know, Isabelle is a strict programming language. How can I use infinite objects? Let's say, infinite lists. Is it possible in Isabelle/HOL?
Sorry if these questions are very basic. I do not seem to find a good tutorial on Isabelle's meta-theory. I would love if someone could recommend me a good tutorial on these topics.
Thank you very much.

You can define non-terminating (i.e. partial) functions in Isabelle (cf. Function package manual (section 8)). However, partial functions are more difficult to reason about, because whenever you want to use its definition equations (the psimps rules, which replace the simps rules of a normal function), you have to show that the function terminates on that particular input first.
In general, things like non-definedness and non-termination are always problematic in a logic – consider, for instance, the function ‘definition’ f x = f x + 1. If we were to take this as an equation on ℤ (integers), we could subtract f x from both sides and get 0 = 1. In Haskell, this problem is ‘solved’ by saying that this is not an equation on ℤ, but rather on ℤ ∪ {⊥} (the integers plus bottom) and the non-terminating function f evaluates to ⊥, and ‘⊥ + 1 = ⊥’, so everything works out fine.
However, if every single expression in your logic could potentially evaluate to ⊥ instead of a ‘proper‘ value, reasoning in this logic will become very tedious. This is why Isabelle/HOL chooses to restrict itself to total functions; things like partiality have to be emulated with things like undefined (which is an arbitrary value that you know nothing about) or option types.
I'm not an expert on Isabelle/Pure (the meta logic), but the most important symbols are definitely
⋀ (the universal meta quantifier)
⟹ (meta implication)
≡ (meta equality)
&&& (meta conjunction, defined in terms of ⟹)
Pure.term, Pure.prop, Pure.type, Pure.dummy_pattern, Pure.sort_constraint, which fulfil certain internal functions that I don't know much about.
You can find some information on this in the Isabelle/Isar Reference Manual in section 2.1, and probably more elsewhere in the manual.
Everything else (that includes ∀ and ∃, which indeed operate on boolean expressions) is defined in the object logic (HOL, usually). You can find the definitions, of rather the axiomatisations, in ~~/src/HOL/HOL.thy (where ~~ denotes the Isabelle root directory):
All_def: "All P ≡ (P = (λx. True))"
Ex_def: "Ex P ≡ ∀Q. (∀x. P x ⟶ Q) ⟶ Q"
Also note that many, if not most Isabelle functions are typically not computable. Isabelle is not a programming language, although it does have a code generator that allows exporting Isabelle functions as code to programming languages as long as you can give code equations for all the functions involved.
3)
Isabelle theorems are a complex datatype (cf. ~~/src/Pure/thm.ML) containing a lot of information, but the most important part, of course, is the proposition. A proposition is something from Isabelle/Pure, which in fact only has propositions and functions. (and itself and dummy, but you can ignore those).
Propositions are not booleans – in fact, there isn't even a way to state that a proposition does not hold in Isabelle/Pure.
HOL then defines (or rather axiomatises) booleans and also axiomatises a coercion from booleans to propositions: Trueprop :: bool ⇒ prop
Isabelle is not a programming language, and apart from that, totality does not mean you have to restrict yourself to finite structures. Even in a total programming language, you can have infinite lists. (cf. Idris's codata)
Isabelle is a theorem prover, and logically, infinite objects can be treated by axiomatising them and then reasoning about them using the axioms and rules that you have.
For instance, HOL assumes the existence of an infinite type and defines the natural numbers on that. That already gives you access to functions nat ⇒ 'a, which are essentially infinite lists.
You can also define infinite lists and other infinite data structures as codatatypes with the (co-)datatype package, which is based on bounded natural functors.

Let me add some points to two of your questions.
1) Why doesn't Isabelle/HOL admit functions that do not terminate? Many other languages such as Haskell do admit non-terminating functions.
In short: Isabelle/HOL does not require termination, but totality (i.e., there is a specific result for each input to the function) of functions. Totality does not mean that a function is actually terminating when transcribed to a (functional) programming language or even that it is computable at all.
Therefore, talking about termination is somewhat misleading, even though it is encouraged by the fact that Isabelle/HOL's function package uses the keyword termination for proving some property P about which I will have to say a little more below.
On the one hand the term "termination" might sound more intuitive to a wider audience. On the other hand, a more precise description of P would be well-foundedness of the function's call graph.
Don't get me wrong, termination is not really a bad name for the property P, it is even justified by the fact that many techniques that are implemented in the function package are very close to termination techniques from term rewriting or functional programming (like the size-change principle, dependency pairs, lexicographic orders, etc.).
I'm just saying that it can be misleading. The answer to why that is the case also touches on question 4 of the OP.
4) As far as I know Isabelle is a strict programming language. How can I use infinite objects? Let's say, infinite lists. Is it possible in Isabelle/HOL?
Isabelle/HOL is not a programming language and it specifically does not have any evaluation strategy (we could alternatively say: it has any evaluation strategy you like).
And here is why the word termination is misleading (drum roll): if there is no evaluation strategy and we have termination of a function f, people might expect f to terminate independent of the used strategy. But this is not the case. A termination proof of a function rather ensures that f is well-defined. Even if f is computable a proof of P merely ensures that there is an evaluation strategy for which f terminates.
(As an aside: what I call "strategy" here, is typically influenced by so called cong-rules (i.e., congruence rules) in Isabelle/HOL.)
As an example, it is trivial to prove that the function (see Section 10.1 Congruence rules and evaluation order in the documentation of the function package):
fun f' :: "nat ⇒ bool"
where
"f' n ⟷ f' (n - 1) ∨ n = 0"
terminates (in the sense defined by termination) after adding the cong-rule:
lemma [fundef_cong]:
"Q = Q' ⟹ (¬ Q' ⟹ P = P') ⟹ (P ∨ Q) = (P' ∨ Q')"
by auto
Which essentially states that logical-or should be "evaluated" from right to left. However, if you write the same function e.g. in OCaml it causes a stack overflow ...

EDIT: this answer is not really correct, check out Lars' comment below.
Unfortunately I don't have enough reputation to post this as a comment, so here is my go at an answer (please bear in mind I am no expert in Isabelle, but I also had similar questions once):
1) The idea is to prove statements about the defined functions. I am not sure how familiar you are with Computability Theory, but think about the Halting Problem and the fact most undeciability problems stem from it (such as Acceptance Problem). Imagine defining a function which you can't prove it terminates. How could you then still prove it returns the number 42 when given input "ABC" and it doesn't go in an infinite loop?
If instead you limit yourself to terminating functions, you can prove much more about them, essentially making a trade-off (or at least this is how I see it).
These ideas stem from Constructivism and Intuitionism and I recommend you check out Robert Harper's very interesting lecture series: https://www.youtube.com/watch?v=9SnefrwBIDc&list=PLGCr8P_YncjXRzdGq2SjKv5F2J8HUFeqN on Type Theory
You should check out especially the part about the absence of the Law of Excluded middle: http://youtu.be/3JHTb6b1to8?t=15m34s
2) See Manuel's answer.
3,4) Again see Manuel's answer keeping in mind Intuitionistic logic: "the fundamental entity is not the boolean, but rather the proof that something is true".
For me it took a long time to get adjusted to this way of thinking and I'm still not sure I understand it. I think the key though is to understand it is a more-or-less completely different way of thinking.

What is so special about Monads?

A monad is a mathematical structure which is heavily used in (pure) functional programming, basically Haskell. However, there are many other mathematical structures available, like for example applicative functors, strong monads, or monoids. Some have more specific, some are more generic. Yet, monads are much more popular. Why is that?
One explanation I came up with, is that they are a sweet spot between genericity and specificity. This means monads capture enough assumptions about the data to apply the algorithms we typically use and the data we usually have fulfills the monadic laws.
Another explanation could be that Haskell provides syntax for monads (do-notation), but not for other structures, which means Haskell programmers (and thus functional programming researchers) are intuitively drawn towards monads, where a more generic or specific (efficient) function would work as well.

I suspect that the disproportionately large attention given to this one particular type class (Monad) over the many others is mainly a historical fluke. People often associate IO with Monad, although the two are independently useful ideas (as are list reversal and bananas). Because IO is magical (having an implementation but no denotation) and Monad is often associated with IO, it's easy to fall into magical thinking about Monad.
(Aside: it's questionable whether IO even is a monad. Do the monad laws hold? What do the laws even mean for IO, i.e., what does equality mean? Note the problematic association with the state monad.)

If a type m :: * -> * has a Monad instance, you get Turing-complete composition of functions with type a -> m b. This is a fantastically useful property. You get the ability to abstract various Turing-complete control flows away from specific meanings. It's a minimal composition pattern that supports abstracting any control flow for working with types that support it.
Compare this to Applicative, for instance. There, you get only composition patterns with computational power equivalent to a push-down automaton. Of course, it's true that more types support composition with more limited power. And it's true that when you limit the power available, you can do additional optimizations. These two reasons are why the Applicative class exists and is useful. But things that can be instances of Monad usually are, so that users of the type can perform the most general operations possible with the type.
Edit:
By popular demand, here are some functions using the Monad class:
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM c x y = c >>= \z -> if z then x else y
whileM :: Monad m => (a -> m Bool) -> (a -> m a) -> a -> m a
whileM p step x = ifM (p x) (step x >>= whileM p step) (return x)
(*&&) :: Monad m => m Bool -> m Bool -> m Bool
x *&& y = ifM x y (return False)
(*||) :: Monad m => m Bool -> m Bool -> m Bool
x *|| y = ifM x (return True) y
notM :: Monad m => m Bool -> m Bool
notM x = x >>= return . not
Combining those with do syntax (or the raw >>= operator) gives you name binding, indefinite looping, and complete boolean logic. That's a well-known set of primitives sufficient to give Turing completeness. Note how all the functions have been lifted to work on monadic values, rather than simple values. All monadic effects are bound only when necessary - only the effects from the chosen branch of ifM are bound into its final value. Both *&& and *|| ignore their second argument when possible. And so on..
Now, those type signatures may not involve functions for every monadic operand, but that's just a cognitive simplification. There would be no semantic difference, ignoring bottoms, if all the non-function arguments and results were changed to () -> m a. It's just friendlier to users to optimize that cognitive overhead out.
Now, let's look at what happens to those functions with the Applicative interface.
ifA :: Applicative f => f Bool -> f a -> f a -> f a
ifA c x y = (\c' x' y' -> if c' then x' else y') <$> c <*> x <*> y
Well, uh. It got the same type signature. But there's a really big problem here already. The effects of both x and y are bound into the composed structure, regardless of which one's value is selected.
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <$> step x) (pure x)
Well, ok, that seems like it'd be ok, except for the fact that it's an infinite loop because ifA will always execute both branches... Except it's not even that close. pure x has the type f a. whileA p step <$> step x has the type f (f a). This isn't even an infinite loop. It's a compile error. Let's try again..
whileA :: Applicative f => (a -> f Bool) -> (a -> f a) -> a -> f a
whileA p step x = ifA (p x) (whileA p step <*> step x) (pure x)
Well shoot. Don't even get that far. whileA p step has the type a -> f a. If you try to use it as the first argument to <*>, it grabs the Applicative instance for the top type constructor, which is (->), not f. Yeah, this isn't gonna work either.
In fact, the only function from my Monad examples that would work with the Applicative interface is notM. That particular function works just fine with only a Functor interface, in fact. The rest? They fail.
Of course it's to be expected that you can write code using the Monad interface that you can't with the Applicative interface. It is strictly more powerful, after all. But what's interesting is what you lose. You lose the ability to compose functions that change what effects they have based on their input. That is, you lose the ability to write certain control-flow patterns that compose functions with types a -> f b.
Turing-complete composition is exactly what makes the Monad interface interesting. If it didn't allow Turing-complete composition, it would be impossible for you, the programmer, to compose together IO actions in any particular control flow that wasn't nicely prepackaged for you. It was the fact that you can use the Monad primitives to express any control flow that made the IO type a feasible way to manage the IO problem in Haskell.
Many more types than just IO have semantically valid Monad interfaces. And it happens that Haskell has the language facilities to abstract over the entire interface. Due to those factors, Monad is a valuable class to provide instances for, when possible. Doing so gets you access to all the existing abstract functionality provided for working with monadic types, regardless of what the concrete type is.
So if Haskell programmers seem to always care about Monad instances for a type, it's because it's the most generically-useful instance that can be provided.

First, I think that it is not quite true that monads are much more popular than anything else; both Functor and Monoid have many instances that are not monads. But they are both very specific; Functor provides mapping, Monoid concatenation. Applicative is the one class that I can think of that is probably underused given its considerable power, due largely to its being a relatively recent addition to the language.
But yes, monads are extremely popular. Part of that is the do notation; a lot of Monoids provide Monad instances that merely append values to a running accumulator (essentially an implicit writer). The blaze-html library is a good example. The reason, I think, is the power of the type signature (>>=) :: Monad m => m a -> (a -> m b) -> m b. While fmap and mappend are useful, what they can do is fairly narrowly constrained. bind, however, can express a wide variety of things. It is, of course, canonized in the IO monad, perhaps the best pure functional approach to IO before streams and FRP (and still useful beside them for simple tasks and defining components). But it also provides implicit state (Reader/Writer/ST), which can avoid some very tedious variable passing. The various state monads, especially, are important because they provide a guarantee that state is single threaded, allowing mutable structures in pure (non-IO) code before fusion. But bind has some more exotic uses, such as flattening nested data structures (the List and Set monads), both of which are quite useful in their place (and I usually see them used desugared, calling liftM or (>>=) explicitly, so it is not a matter of do notation). So while Functor and Monoid (and the somewhat rarer Foldable, Alternative, Traversable, and others) provide a standardized interface to a fairly straightforward function, Monad's bind is considerably more flexibility.
In short, I think that all your reasons have some role; the popularity of monads is due to a combination of historical accident (do notation and the late definition of Applicative) and their combination of power and generality (relative to functors, monoids, and the like) and understandability (relative to arrows).

Well, first let me explain what the role of monads is: Monads are very powerful, but in a certain sense: You can pretty much express anything using a monad. Haskell as a language doesn't have things like action loops, exceptions, mutation, goto, etc. Monads can be expressed within the language (so they are not special) and make all of these reachable.
There is a positive and a negative side to this: It's positive that you can express all those control structures you know from imperative programming and a whole bunch of them you don't. I have just recently developed a monad that lets you reenter a computation somewhere in the middle with a slightly changed context. That way you can run a computation, and if it fails, you just try again with slightly adjusted values. Furthermore monadic actions are first class, and that's how you build things like loops or exception handling. While while is primitive in C in Haskell it's actually just a regular function.
The negative side is that monads give you pretty much no guarantees whatsoever. They are so powerful that you are allowed to do whatever you want, to put it simply. In other words just like you know from imperative languages it can be hard to reason about code by just looking at it.
The more general abstractions are more general in the sense that they allow some concepts to be expressed which you can't express as monads. But that's only part of the story. Even for monads you can use a style known as applicative style, in which you use the applicative interface to compose your program from small isolated parts. The benefit of this is that you can reason about code by just looking at it and you can develop components without having to pay attention to the rest of your system.

What is so special about monads?
The monadic interface's main claim to fame in Haskell is its role in the replacement of the original and unwieldy dialogue-based I/O mechanism.
As for their status in a formal investigative context...it is merely an iteration of a seemingly-cyclic endeavour which is now (2021 Oct) approximately one half-century old:
During the 1960s, several researchers began work on proving things about programs. Efforts were
made to prove that:
A program was correct.
Two programs with different code computed the same answers when given the
same inputs.
One program was faster than another.
A given program would always terminate.
While these are abstract goals, they are all, really, the same as the practical goal of "getting the
program debugged".
Several difficult problems emerged from this work. One was the problem of specification: before
one can prove that a program is correct, one must specify the meaning of "correct", formally and
unambiguously. Formal systems for specifying the meaning of a program were developed, and they
looked suspiciously like programming languages.
The Anatomy of Programming Languages, Alice E. Fischer and Frances S. Grodzinsky.
(emphasis by me.)
...back when "programming languages" - apart from an intrepid few - were most definitely imperative.
Anyone for elevating this mystery to the rank of Millenium problem? Solving it would definitely advance the science of computing and the engineering of software, one way or the other...

Monads are special because of do notation, which lets you write imperative programs in a functional language. Monad is the abstraction that allows you to splice together imperative programs from smaller, reusable components (which are themselves imperative programs). Monad transformers are special because they represent enhancing an imperative language with new features.

In pure functional languages, is data (strings, ints, floats.. ) also just functions?

I was thinking about pure Object Oriented Languages like Ruby, where everything, including numbers, int, floats, and strings are themselves objects. Is this the same thing with pure functional languages? For example, in Haskell, are Numbers and Strings also functions?
I know Haskell is based on lambda calculus which represents everything, including data and operations, as functions. It would seem logical to me that a "purely functional language" would model everything as a function, as well as keep with the definition that a function most always returns the same output with the same inputs and has no state.

It's okay to think about that theoretically, but...
Just like in Ruby not everything is an object (argument lists, for instance, are not objects), not everything in Haskell is a function.
For more reference, check out this neat post: http://conal.net/blog/posts/everything-is-a-function-in-haskell

#wrhall gives a good answer. However you are somewhat correct that in the pure lambda calculus it is consistent for everything to be a function, and the language is Turing-complete (capable of expressing any pure computation that Haskell, etc. is).
That gives you some very strange things, since the only thing you can do to anything is to apply it to something else. When do you ever get to observe something? You have some value f and want to know something about it, your only choice is to apply it some value x to get f x, which is another function and the only choice is to apply it to another value y, to get f x y and so on.
Often I interpret the pure lambda calculus as talking about transformations on things that are not functions, but only capable of expressing functions itself. That is, I can make a function (with a bit of Haskelly syntax sugar for recursion & let):
purePlus = \zero succ natCase ->
let plus = \m n -> natCase m n (\m' -> plus m' n)
in plus (succ (succ zero)) (succ (succ zero))
Here I have expressed the computation 2+2 without needing to know that there are such things as non-functions. I simply took what I needed as arguments to the function I was defining, and the values of those arguments could be church encodings or they could be "real" numbers (whatever that means) -- my definition does not care.
And you could think the same thing of Haskell. There is no particular reason to think that there are things which are not functions, nor is there a particular reason to think that everything is a function. But Haskell's type system at least prevents you from applying an argument to a number (anybody thinking about fromInteger right now needs to hold their tongue! :-). In the above interpretation, it is because numbers are not necessarily modeled as functions, so you can't necessarily apply arguments to them.
In case it isn't clear by now, this whole answer has been somewhat of a technical/philosophical digression, and the easy answer to your question is "no, not everything is a function in functional languages". Functions are the things you can apply arguments to, that's all.

The "pure" in "pure functional" refers to the "freedom from side effects" kind of purity. It has little relation to the meaning of "pure" being used when people talk about a "pure object-oriented language", which simply means that the language manipulates purely (only) in objects.
The reason is that pure-as-in-only is a reasonable distinction to use to classify object-oriented languages, because there are languages like Java and C++, which clearly have values that don't have all that much in common with objects, and there are also languages like Python and Ruby, for which it can be argued that every value is an object1
Whereas for functional languages, there are no practical languages which are "pure functional" in the sense that every value the language can manipulate is a function. It's certainly possible to program in such a language. The most basic versions of the lambda calculus don't have any notion of things that are not functions, but you can still do arbitrary computation with them by coming up with ways of representing the things you want to compute on as functions.2
But while the simplicity and minimalism of the lambda calculus tends to be great for proving things about programming, actually writing substantial programs in such a "raw" programming language is awkward. The function representation of basic things like numbers also tends to be very inefficient to implement on actual physical machines.
But there is a very important distinction between languages that encourage a functional style but allow untracked side effects anywhere, and ones that actually enforce that your functions are "pure" functions (similar to mathematical functions). Object-oriented programming is very strongly wed to the use of impure computations3, so there are no practical object-oriented programming languages that are pure in this sense.
So the "pure" in "pure functional language" means something very different from the "pure" in "pure object-oriented language".4 In each case the "pure vs not pure" distinction is one that is completely uninteresting applied to the other kind of language, so there's no very strong motive to standardise the use of the term.
1 There are corner cases to pick at in all "pure object-oriented" languages that I know of, but that's not really very interesting. It's clear that the object metaphor goes much further in languages in which 1 is an instance of some class, and that class can be sub-classed, than it does in languages in which 1 is something else than an object.
2 All computation is about representation anyway. Computers don't know anything about numbers or anything else. They just have bit-patterns that we use to represent numbers, and operations on bit-patterns that happen to correspond to operations on numbers (because we designed them so that they would).
3 This isn't fundamental either. You could design a "pure" object-oriented language that was pure in this sense. I tend to write most of my OO code to be pure anyway.
4 If this seems obtuse, you might reflect that the terms "functional", "object", and "language" have vastly different meanings in other contexts also.

A very different angle on this question: all sorts of data in Haskell can be represented as functions, using a technique called Church encodings. This is a form of inversion of control: instead of passing data to functions that consume it, you hide the data inside a set of closures, and to consume it you pass in callbacks describing what to do with this data.
Any program that uses lists, for example, can be translated into a program that uses functions instead of lists:
-- | A list corresponds to a function of this type:
type ChurchList a r = (a -> r -> r) --^ how to handle a cons cell
-> r --^ how to handle the empty list
-> r --^ result of processing the list
listToCPS :: [a] -> ChurchList a r
listToCPS xs = \f z -> foldr f z xs
That function is taking a concrete list as its starting point, but that's not necessary. You can build up ChurchList functions out of just pure functions:
-- | The empty 'ChurchList'.
nil :: ChurchList a r
nil = \f z -> z
-- | Add an element at the front of a 'ChurchList'.
cons :: a -> ChurchList a r -> ChurchList a r
cons x xs = \f z -> f z (xs f z)
foldChurchList :: (a -> r -> r) -> r -> ChurchList a r -> r
foldChurchList f z xs = xs f z
mapChurchList :: (a -> b) -> ChurchList a r -> ChurchList b r
mapChurchList f = foldChurchList step nil
where step x = cons (f x)
filterChurchList :: (a -> Bool) -> ChurchList a r -> ChurchList a r
filterChurchList pred = foldChurchList step nil
where step x xs = if pred x then cons x xs else xs
That last function uses Bool, but of course we can replace Bool with functions as well:
-- | A Bool can be represented as a function that chooses between two
-- given alternatives.
type ChurchBool r = r -> r -> r
true, false :: ChurchBool r
true a _ = a
false _ b = b
filterChurchList' :: (a -> ChurchBool r) -> ChurchList a r -> ChurchList a r
filterChurchList' pred = foldChurchList step nil
where step x xs = pred x (cons x xs) xs
This sort of transformation can be done for basically any type, so in theory, you could get rid of all "value" types in Haskell, and keep only the () type, the (->) and IO type constructors, return and >>= for IO, and a suitable set of IO primitives. This would obviously be hella impractical—and it would perform worse (try writing tailChurchList :: ChurchList a r -> ChurchList a r for a taste).

Is getChar :: IO Char a function or not? Haskell Report doesn't provide us with a definition. But it states that getChar is a function (see here). (Well, at least we can say that it is a function.)
So I think the answer is YES.
I don't think there can be correct definition of "function" except "everything is a function". (What is "correct definition"? Good question...) Consider the next example:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative
f :: Applicative f => f Int
f = pure 1
g1 :: Maybe Int
g1 = f
g2 :: Int -> Int
g2 = f
Is f a function or datatype? It depends.

A Functional-Imperative Hybrid

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.

My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)?
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.

The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
The proof of this self-contained property is done through the type-system. The ST monad carries a state-thread parameter, usually denoted with a type-variable s. When you have such a computation you'll have monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off the immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep the most of the performance characteristics of the in place sorting algorithms and still get purity.
In this case it would look something like:
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
mutableVector <- thaw vector
sort mutableVector
freeze mutableVector
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.

"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, unmutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknonwst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to - as being specifically "imperative" and not "functional" - is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibility infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Goto's are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression that conditions the value of that expression. The archetypical case in point is Landin's original case, itself (in his 1960's paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement, the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just λx f(x) (x + 1). The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop and went "Oops. Never mind." and fell back into what become an overwrought and overthought solution, while this simple (and in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;" - which has the "infinite mutability" mentioned above, may itself be treated as an infinitary wrapper, "if (x < n) { x = x + 1; if (x < 1) { x = x + 1; if (x < 1) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n (λx x < n? (λx·...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels, here, are "A:" and "B:", while the references to the subexpressions so labelled as "goto A" and "goto B", respectively. So, by magic, the very essence of Imperativitity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus, emerges the imperative programming language paradigm automatically out of the pure lambda calculus, itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression, itself; and thus requires no additional hacks (like Landin's or later descendants, like monads) - as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980's via the USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages, like Prolog-II, and the much earlier treatment of cyclic recursive structures by infinitary expressions by Irene Guessarian LNCS 99 "Algebraic Semantics".
Now, earlier I said that the magma-based formulation might get you to the same place, or to an approximation thereof. I believe there is a kind of universal representation theorem of some sort, which asserts that the infinitary based formulation provides a purely syntactic representation, and that the semantics that arise from the monad-based representation factors through this as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.

What is the difference between equality and equivalence?

I've read a few instances in reading mathematics and computer science that use the equivalence symbol ≡, (basically an '=' with three lines) and it always makes sense to me to read this as if it were equality. What is the difference between these two concepts?

Wikipedia: Equivalence relation:
In mathematics, an equivalence
relation is a binary relation between
two elements of a set which groups
them together as being "equivalent" in
some way. Let a, b, and c be arbitrary
elements of some set X. Then "a ~ b"
or "a ≡ b" denotes that a is
equivalent to b.
An equivalence relation "~" is reflexive, symmetric, and transitive.
In other words, = is just an instance of equivalence relation.
Edit: This seemingly simple criteria of being reflexive, symmetric, and transitive are not always trivial. See Bloch's Effective Java 2nd ed p. 35 for example,
public final class CaseInsensitiveString {
...
// broken
#Override public boolean equals(Object o) {
if (o instance of CaseInsensitiveString)
return s.equalsIgnoreCase(
((CaseInsensitiveString) o).s);
if (o instanceof String) // One-way interoperability!
return s.equalsIgnoreCase((String) o);
return false;
}
}
The above equals implementation breaks the symmetry because CaseInsensitiveString knows about String class, but the String class doesn't know about CaseInsensitiveString.

I take your question to be about math notation rather than programming. The triple equal sign you refer to can be written ≡ in HTML or \equiv in LaTeX.
a ≡ b most commonly means "a is defined to be b" or "let a be equal to b".
So 2+2=4 but φ ≡ (1+sqrt(5))/2.
Here's a handy equivalence table:
Mathematicians Computer scientists
-------------- -------------------
= ==
≡ =
(The other answers about equivalence relations are correct too but I don't think those are as common. There's also a ≡ b (mod m) which is pronounced "a is congruent to b, mod m" and in programmer parlance would be expressed as mod(a,m) == mod(b,m). In other words, a and b are equal after mod'ing by m.)

A lot of languages distinguish between equality of the objects and equality of the values of those objects.
Ruby for example has 3 different ways to test equality. The first, equal?, compares two variables to see if they point to the same instance. This is equivalent in a C-style language of doing a check to see if 2 pointers refer to the same address. The second method, ==, tests value equality. So 3 == 3.0 would be true in this case. The third, eql?, compares both value and class type.
Lisp also has different concepts of equality depending on what you're trying to test.

In languages that I have seen that differentiate between equality and equivalence, equality usually means the type and value are the same while equivalence means that just the values are the same. For example:
int i = 3;
double d = 3.0;
i and d would be have an equivalence relationship since they represent the same value but not equality since they have different types. Other languages may have different ideas of equivalence (such as whether two variables represent the same object).

The answers above are right / partially right but they don't explain what the difference is exactly. In theoretical computer science (and probably in other branches of maths) it has to do with quantification over free variables of the logical equation (that is when we use the two notations at once).
For me the best ways to understand the difference is:
By definition
A ≡ B
means
For all possible values of free variables in A and B, A = B
or
A ≡ B <=> [A = B]
By example
x=2x
iff (in fact iff is the same as ≡)
x=0
x ≡ 2x
iff (because it is not the case that x = 2x for all possible values of x)
False
I hope it helps
Edit:
Another thing that came to my head is the definitions of the two.
A = B is defined as A <= B and A >= B, where <= (smaller equal, not implies) can be any ordering relation
A ≡ B is defined as A <=> B (iff, if and only if, implies both sides), worth noting that implication is also an ordering relation and so it is possible (but less precise and often confusing) to use = instead of ≡.
I guess the conclusion is that when you see =, then you have to figure out the authors intention based on the context.

Take it outside the realm of programming.
(31) equal -- (having the same quantity, value, or measure as another; "on equal terms"; "all men are equal before the law")
equivalent, tantamount -- (being essentially equal to something; "it was as good as gold"; "a wish that was equivalent to a command"; "his statement was tantamount to an admission of guilt"
At least in my dictionary, 'equivelance' means its a good-enough subsitute for the original, but not necessarily identical, and likewise 'equality' conveys complete identical.
null == 0 # true , null is equivelant to 0 ( in php )
null === 0 # false, null is not equal to 0 ( in php )
( Some people use ≈ to represent nonidentical values instead )

The difference resides above all in the level at which the two concepts are introduced. '≡' is a symbol of formal logic where, given two propositions a and b, a ≡ b means (a => b AND b => a).
'=' is instead the typical example of an equivalence relation on a set, and presumes at least a theory of sets. When one defines a particular set, usually he provides it with a suitable notion of equality, which comes in the form of an equivalence relation and uses the symbol '='. For example, when you define the set Q of the rational numbers, you define equality a/b = c/d (where a/b and c/d are rational) if and only if ad = bc (where ad and bc are integers, the notion of equality for integers having already been defined elsewhere).
Sometimes you will find the informal notation f(x) ≡ g(x), where f and g are functions: It means that f and g have the same domain and that f(x) = g(x) for each x in such domain (this is again an equivalence relation). Finally, sometimes you find ≡ (or ~) as a generic symbol to denote an equivalence relation.

You could have two statements that have the same truth value (equivalent) or two statements that are the same (equality). As well the "equal sign with three bars" can also mean "is defined as."

Equality really is a special kind of equivalence relation, in fact. Consider what it means to say:
0.9999999999999999... = 1
That suggests that equality is just an equivalence relation on "string numbers" (which are defined more formally as functions from Z -> {0,...,9}). And we can see from this case, the equivalence classes are not even singletons.

The first problem is, what equality and equivalence mean in this case? Essentially, contexts are quite free to define these terms.
The general tenor I got from various definitions is: For values called equal, it should make no difference which one you read from.
The grossest example that violates this expectation is C++: x and y are said to be equal if x == y evaluates to true, and x and y are said to be equivalent if !(x < y) && !(y < x). Even apart from user-defined overloads of these operators, for floating-point numbers (float, double) those are not the same: All NaN values are equivalent to each other (in fact, equivalent to everything), but not equal to anything including themselves, and the values -0.0 and +0.0 compare equal (and equivalent) although you can distinguish them if you’re clever.
In a lot of cases, you’d need better terms to convey your intent precisely. Given two variables x and y,
identity or “the same” for expressing that there is only one object and x and y refer to it. Any change done through x is inadvertantly observable through y and vice versa. In Java, reference type variables are checked for identity using ==, in C# using the ReferenceEquals method. In C++, if x and y are references, std::addressof(x) == std::addressof(y) will do (whereas &x == &y will work most of the time, but & can be customized for user-defined types).
bitwise or structure equality for expressing that the internal representations of x and y are the same. Notice that bitwise equality breaks down when objects can reference (parts of) themselves internally. To get the intended meaning, the notion has to be refined in such cases to say: Structured the same. In D, bitwise equality is checked via is and C offers memcmp. I know of no language that has built-in structure equality testing.
indistinguishability or substitutability for expressing that values cannot be distinguished (through their public interface): If a function f takes two parameters and x and y are indistinguishable, the calls f(x, y), f(x, x), and f(y, y) always return indistinguishable values – unless f checks for identity (see bullet point above) directly or maybe by mutating the parameters. An example could be two search-trees that happen to contain indistinguishable elements, but the internal trees are layed-out differently. The internal tree layout is an implementation detail that normally cannot be observed through its public methods.
This is also called Leibniz-equality after Gottfried Wilhelm Leibniz who defined equality as the lack of differences.
equivalence for expressing that objects represent values considered essentially the same from some abstract reasoning. For an example for distinguishable equivalent values, observe that floating-point numbers have a negative zero -0.0 distinct from +0.0, and e.g. sign(1/x) is different for -0.0 and +0.0. Equivalence for floating-point numbers is checked using == in many languages with C-like syntax (aka. Algol syntax). Most object-oriented languages check equivalence of objects using an equals (or similarly named) method. C# has the IEquatable<T> interface to designate that the class has a standard/canonical/default equivalence relation defined on it. In Java, one overrides the equals method every class inherits from Object.
As you can see, the notions become increasingly vague. Checking for identity is something most languages can express. Identity and bitwise equality usually cannot be hooked by the programmer as the notions are independent from interpretations. There was a C++20 proposal, which ended up being rejected, that would have introduced the last two notions as strong† and weak equality†. († This site looks like CppReference, but is not; it is not up-to-date.) The original paper is here.
There are languages without mutation, primarily functional languages like Haskell. The difference between equality and equivalence there is less of an issue and tilts to the mathematical use of those words. (In math, generally speaking, (recursively defined) sequences are used instead of re-assignments.)
Everything C has, is also available to C++ and any language that can use C functionality. Everything said about C# is true for Visual Basic .NET and probably all languages built on the .NET framework. Analogously, Java represents the JRE languages that also include Kotlin and Scala.
If you just want stupid definitions without wisdom: An equivalence relation is a reflexive, symmetrical, and transitive binary relation on a set. Equality then is the intersection of all those equivalence relations.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex