What is a "strongly moded" programming language?

What is a "strongly moded" programming language? - functional-programming

I was looking through the Mercury programming language's about page when I found a part where it said:
Mercury is a strongly moded language
What does this mean!? I've search all over the internet, and have found no answer!

I do not know of any other language that has modes as used in Mercury. The following is from the mercury manual
The mode of a predicate (or function) is a mapping from the initial
state of instantiation of the arguments of the predicate (or the
arguments and result of a function) to their final state of
instantiation.
If you are familiar with prolog you may know what this means.
Consider a function in C with the following typedecl
void sort(int * i, int * o);
Assume this function sorts the array i into another array o. Just from this declaration we have no guarantee that i is read from and o is written to. If we can write additionally mode sort(in, out) that suggests to the compiler that the function sort reads from the first argument and writes to the second argument. The compiler then checks the function body to assure us that no writing to i and reading from o takes place.
For a language like C this may not be suitable, but for a prolog family language this is a very welcome feature. Consider the append/3 predicate which succeeds when the first two lists concatenated is the third list.
append([1, 2], [a, b], X).
X = [1, 2, a, b]
So if we provide two input lists we get an output list. But when we provide the output list and ask for all solutions that result in it, we have
append(X, Y, [1, 2, a, b]).
X = [],
Y = [1, 2, a, b] ;
X = [1],
Y = [2, a, b] ;
X = [1, 2],
Y = [a, b] ;
X = [1, 2, a],
Y = [b] ;
X = [1, 2, a, b],
Y = [] ;
false.
This append([1], [2], [3]). fails where as append([1], [2], [1, 2]). succeeds.
So depending on how we use the predicate, we can have one deterministic answer, multiple answers, or no answer at all. All these properties of the predicate can be declared initially by mode declarations. The following is a mode declaration for append :
:- pred append(list(T), list(T), list(T)).
:- mode append(in, in, out) is det.
:- mode append(out, out, in) is multi.
:- mode append(in, in, in) is semidet.
If you provide the first two, the output is deterministically determined. If you provide the last argument, then the you have multiple solutions to the first two arguments. If you provide all three lists, then it just checks if when the first two are appended we get the third list.
Modes are not restricted to in, out. You will see di destructive input and uo unique output when dealing with IO. They just tell us how a predicate changes the instantiations of argument we provided. Outputs change from free variables to ground terms and inputs remain ground terms. So as a user you can define :- mode input == ground >> ground. and :- mode output == free >> ground. and use them, which are exactly how in and out modes are defined.
Consider a predicate which calculates the length of a list. We do not need the whole list to be instantiated as we know that length([X, Y], 2) is true even when X, Y are free variables. So the mode declaration :- mode length(in, out) is det. is not sufficient as the whole first argument need not be instantiated. So we can also define the instantiatedness of the argument
:- inst listskel == bound([] ; [free | listskel]).
which states that an argument is listskel instantiated if it is a empty list or a free variable ahead of another listskel list.
Such partial evaluations also happen in haskell due to its lazy nature e.g, a whole list need not be evaluated to know its length.
References:
modes
determinism
EDIT: From the mercury website
Currently only a subset of the intended mode system is implemented. This subset effectively requires arguments to be either fully input (ground at the time of call and at the time of success) or fully output (free at the time of call and ground at the time of success).

Modes specify the direction of data-flow, e.g. input or output.
In some languages, the direction of data-flow is fixed, and implicit in the syntax. For example, in most functional languages, function arguments are always
input, and function results are always output.
In most logic programming languages, however, the direction of data-flow is determined at run-time. Most logic programming languages are dynamically moded.
In Mercury, the direction of dataflow must be declared, at least at module boundaries. However, a single predicate or function in Mercury can have multiple modes; the compiler resolves the modes at compile time, and generates separate code for each mode.
Mercury also has support for optional dynamic modes with constraint solving.
(See https://www.researchgate.net/publication/220802747_Adding_Constraint_Solving_to_Mercury.)
But the default is static modes.

Related

How are state monads / monad transformers desugared inside do notation?

Example
sumArray :: Array Int -> State Int Unit
sumArray = traverse_ \n -> modify \sum -> sum + n
t1 :: Int
t1 = execState (do
sumArray [1, 2, 3]
sumArray [4, 5]
sumArray [6]) 0
-- returns 21
module Main where
import Prelude
import Effect (Effect)
import Data.Foldable (fold, traverse_)
import Control.Monad.State (State, execState)
import Control.Monad.State.Class (modify)
import Data.Maybe (Maybe(..))
import Effect.Console (log)
main :: Effect Unit
main = log $ show t1
sumArray :: Array Int -> State Int Unit
sumArray = traverse_ \n -> modify \sum -> sum + n
t1 :: Int
t1 = execState (do
sumArray [1, 2, 3]
sumArray [4, 5]
sumArray [6]) 0
{-
execState :: forall s a. State s a -> s -> s
execState (StateT m) s = case m s of Identity (Tuple _ s') -> s'
type State s = StateT s Identity
-}
Description
How I understand the evaluation of expression t1:
Each sumArray invocation returns a state monad holding the sum of given Int array values.
All three state monads are (somehow) unified into a single one, while accumulating intermediate sums.
execState returns the overall Int sum, given State Int Unit and initial value as input.
Issue
I cannot quite understand step 2 in particular. According to do notation, an expression like sumArray [1, 2, 3] desugars to bind x \_ -> ..., so former input is ignored. If I write do with a different monad type like
t2 :: Maybe Int
t2 = do
Just 3
Just 4
, the compiler complains:
A result of type Int was implicitly discarded in a do notation block. You can use _ <- ... to explicitly discard the result.
, so rules seem to be a bit different with t1.
Question
How exactly are the three separate state monads combined into a single one? More specifically: Why does the runtime calculate the overall sum of all intermediate state monad sum results, and not something like (1+2+3) * (4+5) * 6? In other words: where is the implicit + accumulator?
I feel like I miss some concept from Chapter 11: Monadic Adventures.

#Bergi's answer already gives the essential explanation - but I thought it might be useful to expand on some parts of what he says, and to answer some of your questions more directly.
According to do notation, an expression like sumArray [1, 2, 3]
desugars to bind x \_ -> ..., so former input is ignored.
This is in one sense perfectly true, but also betrays some misunderstandings.
For one thing, I find that quote misleadingly phrased - although it's perfectly acceptable in the context of the original source. It's not talking about how an expression like sumArray [1, 2, 3] "desugars" in and of itself, but about how successive lines ("statements") of a do block are desugared into a single expression that "combines" them - which seems to be in essence what your whole question is about. So yes, it's true - and basically the definition of do notation - that an expression like
do a <- x
y
desugars to bind x \a -> y (where we imagine that y is some more complex expression which presumably involves a). And likewise that
do x
y
desugars to bind x \_ -> y. But the latter case isn't "ignoring input" - it's ignoring output. Let me explain this a little more.
It's common to think of a general monadic value, of type m a, to be some sort of "computation" that "produces" value(s) of type a. That's necessarily quite an abstract formulation - because a Monad is such a general concept, and some specific Monads fit this mental picture better than others. But it's a good way to understand the basics of monads, and do notation in particular - each line can be thought of as a "statement" in some imperative language, which may have some "side effects" (of a kind strictly constrained by the particular monad you're using) and also produces a value as a "result".
In this sense, a do block of the first type above - where we "bind" the result using the "left arrow" notation - is using that computed value (denoted by a) to decide what to do next. (Incidentally, this is what distinguishes monads from applicatives - if you just have a series of computations and just want to combine their "effects", without allowing "intermediate results" to affect what you're doing, you don't actually need monads or bind.) Whereas the second one doesn't use the result of the first computation (that computation being x) - which is exactly what I meant when I said this is "ignoring output". It's ignoring the result of x. That doesn't (necessarily) mean that x is useless though. It's still being used for its "side effects".
To make it more concrete, I'll look in a bit more detail at both of your examples, starting with the simple one in the Maybe monad (I'll make the change that the compiler suggests in order to keep it happy - note that I'm personally much more familiar with Haskell than with Purescript so I may get Purescript-specific things like this wrong, as Haskell would be perfectly OK with your original code):
t2 :: Maybe Int
t2 = do
_ <- Just 3
Just 4
In this case, t2 will simply be equal to Just 4, and it will seem - correctly - that the first line of the do block is redundant. But that's just a consequence of how the Maybe monad works, as well as of the specific value we've got there. I can easily prove to you that the first line does still matter though, by making this change
t2 :: Maybe Int
t2 = do
_ <- Nothing
Just 4
Now you will find that t2 is equal not to Just 4, not to Nothing!
That's because each "computation" in the Maybe monad - that is, each value of type Maybe a - either "succeeds" with a "result" of type a (represented by a Just value), or "fails" (represented by Nothing). And, importantly, the way the Maybe monad is defined - that is, the definition of bind - deliberately propagates failure. That is, any Nothing value that is encountered at any point immediately terminates the computation with a Nothing result.
So even here, the "side effect" of the first computation - the fact that it succeeds or fails - does make a major difference to what happens overall. We just ignore the "result" (the actual value if the computation was successful).
If we now move to the State monad - this is a somewhat more complicated monad than Maybe, but may actually for that reason make the above points easier to understand. Because this is a monad where it really does make immediate sense to talk about the "side effects" and the "result" of each monadic value - which perhaps felt a bit forced, or even silly, in the Maybe case.
A value of type State s a represents a computation that results in a value of type a, while "keeping some state" of type s. That is, the computation may use the current state in order to compute its result, and/or it may update the state as part of the computation. Concretely, this is the same as a function of type s -> (a, s) - it takes some state, and returns an updated state (possibly the same) as well as the computed value. And indeed, the State s a type is essentially a simple newtype wrapper for such a function type.
And the implementation of bind in its Monad instance does the most obvious and natural thing - much easier to explain in words than to "see" from the actual implementation details. Two such "stateful functions" are combined by feeding the original state to the first function, then taking the updated state from that and feeding that to the second function. (Actually, bind needs to do - and does - more than this, because as I mentioned earlier, it needs to be able to use the "result" - the a - from the first computation to decide what to do with the second. But we don't need to go into that now, because in this example we don't use the result value - and indeed couldn't, as it's always of the trivial Unit type. It isn't actually complicated, but I won't go into the detail as I don't want to make this answer even longer!)
So when we do
do
sumArray [1, 2, 3]
sumArray [4, 5]
sumArray [6]
we are building a stateful computation of type State Int Unit - that is, a function of type Int -> (Unit, Int). Since Unit is a non-interesting type, and basically used as a placeholder here for "we don't care about any result", we're basically building a function of type Int -> Int from three other such functions. That's easy enough to do - we can just compose the three functions! And that's what, in this simple case, the implementation of bind for the State monad ends up doing.
Hopefully this answers your main question:
where is the implicit + accumulator?
by showing that there's no "implicit accumulator" other than function composition. It's the fact that those individual functions happen to add (respectively, in this case) 6, 9 and 6 to the input that results in the final result being the sum of those 3 numbers (due to the fact that the composition of two sums is itself a sum, which ultimately comes from the associativity of addition).
But more importantly, I hope this has given you a fuller explanation of Monads, and do notation, which you can apply to many other situations.

Each sumArray invocation returns a state monad holding the sum of given Int array values.
No, it doesn't return "a state monad", and it doesn't hold the sum of the array.
It returns a State Int Unit value, which represents a "stateful computation of a Unit" (using an Int for the state). To get the sum, you actually have to run that computation:
t :: State Int Unit
t = sumArray [1, 2, 3]
x = runState t 0 // ((), 6)
y = runState t 5 // ((), 11)
Notice how for the value y, the sum of the array elements is never computed - it adds 1 to 5, then 2 to 6, then 3 to 8.
How exactly are the three separate state monads combined into a single one?
The key to understanding is that they are not state values, but stateful computations. They can be combined by simply sequencing those computations one after the other, passing the result and state on to the next.

Recursive addition in Prolog

Knowledge Base
add(0,Y,Y). // clause 1
add(succ(X),Y,succ(Z)) :- add(X,Y,Z). // clause 2
Query
add(succ(succ(succ(0))), succ(succ(0)), R)
Trace
Call: (6) add(succ(succ(succ(0))), succ(succ(0)), R)
Call: (7) add(succ(succ(0)), succ(succ(0)), _G648)
Call: (8) add(succ(0), succ(succ(0)), _G650)
Call: (9) add(0, succ(succ(0)), _G652)
Exit: (9) add(0, succ(succ(0)), succ(succ(0)))
Exit: (8) add(succ(0), succ(succ(0)), succ(succ(succ(0))))
Exit: (7) add(succ(succ(0)), succ(succ(0)), succ(succ(succ(succ(0)))))
Exit: (6) add(succ(succ(succ(0))), succ(succ(0)), succ(succ(succ(succ(succ(0))))))
My Question
I see how the recursive call in clause 2 strips the outermost succ()
at each call for argument 1.
I see how it adds an outer succ() to argument 3 at each call.
I see when the 1st argument as a result of these recursive calls
reaches 0. At that point, I see how the 1st clause copies the 2nd
argument to the 3rd argument.
This is where I get confused.
Once the 1st clause is executed, does the 2nd clause automatically
get executed as well, then adding succ() to the first argument?
Also, how does the program terminate, and why doesn't it just keep
adding succ() to the first and 3rd arguments infinitely?
Explanation from LearnPrologNow.com (which I don't understand)
Let’s go step by step through the way Prolog processes this query. The
trace and search tree for the query are given below.
The first argument is not 0 , which means that only the second clause
for add/3 can be used. This leads to a recursive call of add/3 . The
outermost succ functor is stripped off the first argument of the
original query, and the result becomes the first argument of the
recursive query. The second argument is passed on unchanged to the
recursive query, and the third argument of the recursive query is a
variable, the internal variable _G648 in the trace given below. Note
that _G648 is not instantiated yet. However it shares values with R
(the variable that we used as the third argument in the original
query) because R was instantiated to succ(_G648) when the query was
unified with the head of the second clause. But that means that R is
not a completely uninstantiated variable anymore. It is now a complex
term, that has a (uninstantiated) variable as its argument.
The next two steps are essentially the same. With every step the first
argument becomes one layer of succ smaller; both the trace and the
search tree given below show this nicely. At the same time, a succ
functor is added to R at every step, but always leaving the innermost
variable uninstantiated. After the first recursive call R is
succ(_G648) . After the second recursive call, _G648 is instantiated
with succ(_G650) , so that R is succ(succ(_G650) . After the third
recursive call, _G650 is instantiated with succ(_G652) and R therefore
becomes succ(succ(succ(_G652))) . The search tree shows this step by
step instantiation.
At this stage all succ functors have been stripped off the first
argument and we can apply the base clause. The third argument is
equated with the second argument, so the ‘hole’ (the uninstantiated
variable) in the complex term R is finally filled, and we are through.

Let us start by getting the terminology right.
These are the clauses, as you correctly indicate:
add(0, Y, Y).
add(succ(X), Y, succ(Z)) :- add(X, Y, Z).
Let us first read this program declaratively, just to make sure we understand its meaning correctly:
0 plus Y is Y. This makes sense.
If it is true that X plus Y is Z then it is true that the successor of X plus Y is the successor of Z.
This is a good way to read this definition, because it is sufficiently general to cover various modes of use. For example, let us start with the most general query, where all arguments are fresh variables:
?- add(X, Y, Z).
X = 0,
Y = Z ;
X = succ(0),
Z = succ(Y) ;
X = succ(succ(0)),
Z = succ(succ(Y)) .
In this case, there is nothing to "strip", since none of the arguments is instantiated. Yet, Prolog still reports very sensible answers that make clear for which terms the relation holds.
In your case, you are considering a different query (not a "predicate definition"!), namely the query:
?- add(succ(succ(succ(0))), succ(succ(0)), R).
R = succ(succ(succ(succ(succ(0))))).
This is simply a special case of the more general query shown above, and a natural consequence of your program.
We can also go in the other direction and generalize this query. For example, this is a generalization, because we replace one ground argument by a logical variable:
?- add(succ(succ(succ(0))), B, R).
R = succ(succ(succ(B))).
If you follow the explanation you posted, you will make your life very difficult, and arrive at a very limited view of logic programs: Realistically, you will only be able to trace a tiny fragment of modes in which you could use your predicates, and a procedural reading thus falls quite short of what you are actually describing.
If you really insist on a procedural reading, start with a simpler case first. For example, let us consider:
?- add(succ(0), succ(0), R).
To "step through" procedurally, we can proceed as follows:
Does the first clause match? (Note that "matching" is already limited reading: Prolog actually applies unification, and a procedural reading leads us away from this generality.)
Answer: No, because s(_) does not unify with 0. So only the second clause applies.
The second clause only holds if its body holds, and in this case if add(0, succ(0), Z) holds. And this holds (by applying the first clause) if Z is succ(0) and R is succ(Z).
Therefore, one answer is R = succ(succ(0)).. This answer is reported.
Are there other solutions? These are only reported on backtracking.
Answer: No, there are no other solutions, because no further clause matches.
I leave it as an exercise to apply this painstaking method to the more complex query shown in the book. It is straight-forward to do it, but will increasingly lead you away from the most valuable aspects of logic programs, found in their generality and elegant declarative expression.
Your question regarding termination is both subtle and insightful. Note that we must distinguish between existential and universal termination in Prolog.
For example, consider again the most general query shown above: It yields answers, but it does not terminate. For an answer to be reported, it is enough that an answer substitution is found that makes the query true. This is the case in your example. Alternatives, if any potentially remain, are tried and reported on backtracking.
You can always try the following to test termination of your query: Simply append false/0, for example:
?- add(X, Y, Z), false.
nontermination
This lets you focus on termination properties without caring about concrete answers.
Note also that add/3 is a terrible name for a relation: An imperative always implies a direction, but this is in fact much more general and usable also if none of the arguments are even instantiated! A good predicate name should reflect this generality.

Correct form of letrec in Hindley-Milner type system?

I'm having trouble understanding the letrec definition for HM system that is given on Wikipedia, here: https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Recursive_definitions
For me, the rule translates roughly to the following algorithm:
infer types on everything in the letrec definition part
assign temporary type variables to each defined identifier
recursively process all definitions with temporary types
in pairs, unify the results with the original temporary variables
close (with forall) the inferred types, add them to the basis (context) and infer types of the expression part with it
I'm having trouble with a program like this:
letrec
p = (+) --has type Uint -> Uint -> Uint
x = (letrec
test = p 5 5
in test)
in x
The behavior I'm observing is as follows:
definition of p gets temporary type a
definition of x gets some temporary type too, but that's out of our scope now
in x, definition of test gets a temporary type t
p gets the temporary type a from the scope, using the HM rule for a variable
(f 5) gets processed by HM rule for application, resulting type is b (and the unification that (a unifies with Uint -> b)
((p 5) 5) gets processed by the same rule, resulting in more unifications and type c, a now in result unifies with Uint -> Uint -> c
now, test gets closed to type forall c.c
variable test of in test gets the type instance (or forall c.c) with fresh variables, accrodingly to the HM rule for variable, resulting in test :: d (that is unified with test::t right on)
resulting x has effectively type d (or t, depending on the mood of unification)
The problem: x should obviously have type Uint, but I see no way those two could ever unify to produce the type. There is a loss of information when the type of test gets closed and instance'd again that I'm not sure how to overcome or connect with substitutions/unifications.
Any idea how the algorithm should be corrected to produce x::Uint typing correctly? Or is this a property of HM system and it simply will not type such case (which I doubt)?
Note that this would be perfectly OK with standard let, but I didn't want to obfuscate the example with recursive definitions that can't be handled by let.
Thanks in advance

Answering my own question:
The definition on the Wiki is wrong, although it works to certain extent at least for type checking.
Most simple and correct way to add recursion to HM system is to use fix predicate, with definition fix f = f (fix f) and type forall a. (a->a) -> a. Mutual recursion is handled by double fixpoint, etc.
Haskell approach to the problem (described at https://gist.github.com/chrisdone/0075a16b32bfd4f62b7b#binding-groups ) is (roughly) to derive an incomplete type for all functions and then run the derivation again to check them against each other.

What's the difference between "equal (=)" and "identical (==)" in ocaml?

In OCaml, we have two kinds of equity comparisons:
x = y and x == y,
So what's exact the difference between them?
Is that x = y in ocaml just like x.equals(y) in Java?
and x == y just like x == y (comparing the address) in Java?

I don't know exactly how x.equals(y) works in Java. If it does a "deep" comparison, then the analogy is pretty close. One thing to be careful of is that physical equality is a slippery concept in OCaml (and functional languages in general). The compiler and runtime system are going to move values around, and may merge and unmerge pure (non-mutable) values at will. So you should only use == if you really know what you're doing. At some level, it requires familiarity with the implementation (which is something to avoid unless necessary).
The specific guarantees that OCaml makes for == are weak. Mutable values compare as physically equal in the way you would expect (i.e., if mutating one of the two will actually mutate the other also). But for non-mutable values, the only guarantee is that values that compare physically equal (==) will also compare as equal (=). Note that the converse is not true, as sepp2k points out for floating values.
In essence, what the language spec is telling you for non-mutable values is that you can use == as a quick check to decide if two non-mutable values are equal (=). If they compare physically equal, they are equal value-wise. If they don't compare physically equal, you don't know if they're equal value-wise. You still have to use = to decide.

Edit: this answer delves into details of the inner working of OCaml, based on the Obj module. That knowledge isn't meant to be used without extra care (let me emphasis on that very important point once more: don't use it for your program, but only if you wish to experiment with the OCaml runtime). That information is also available, albeit perhaps in a more understandable form in the O'Reilly book on OCaml, available online (pretty good book, though a bit dated now).
The = operator is checking structural equality, whereas == only checks physical equality.
Equality checking is based on the way values are allocated and stored within memory. A runtime value in OCaml may roughly fit into 2 different categories : either boxed or unboxed. The former means that the value is reachable in memory through an indirection, and the later means that the value is directly accessible.
Since int (int31 on 32 bit systems, or int63 on 64 bit systems) are unboxed values, both operators are behaving the same with them. A few other types or values, whose runtime implementations are actually int, will also see both operators behaving the same with them, like unit (), the empty list [], constants in algebraic datatypes and polymorphic variants, etc.
Once you start playing with more complex values involving structures, like lists, arrays, tuples, records (the C struct equivalent), the difference between these two operators emerges: values within structures will be boxed, unless they can be runtime represented as native ints (1). This necessity arises from how the runtime system must handle values, and manage memory efficiently. Structured values are allocated when constructed from other values, which may be themselves structured values, in which case references are used (since they are boxed).
Because of allocations, it is very unlikely that two values instantiated at different points of a program could be physically equal, although they'd be structurally equal. Each of the fields, or inner elements within the values could be identical, even up to physical identity, but if these two values are built dynamically, then they would end up using different spaces in memory, and thus be physically different, but structurally equal.
The runtime tries to avoid unecessary allocations though: for instance, if you have a function returning always the same value (in other words, if the function is constant), either simple or structured, that function will always return the same physical value (ie, the same data in memory), so that testing for physical equality the result of two invocations of that function will be successful.
One way to observe when the physical operator will actually return true is to use the Obj.is_block function on its runtime representation (That is to say, the result of Obj.repr on it). This function simply tells whether its parameter runtime representation is boxed.
A more contrived way is to use the following function:
let phy x : int = Obj.magic (Obj.repr x);;
This function will return an int which is the actual value of the pointer to the value bound to x in memory, if this value is boxed. If you try it on a int literal, you will get the exact same value! That's because int are unboxed (ie. the value is stored directly in memory, not through a reference).
Now that we know that boxed values are actually "referenced" values, we can deduce that these values can be modified, even though the language says that they are immutable.
consider for instance the reference type:
# type 'a ref = {mutable contents : 'a };;
We could define an immutable ref like this:
# type 'a imm = {i : 'a };;
type 'a imm = {i : 'a; }
And then use the Obj.magic function to coerce one type into the other, because structurally, these types will be reduced to the same runtime representation.
For instance:
# let x = { i = 1 };;
- : val x : int imm = { i = 1 }
# let y : int ref = Obj.magic x;;
- : val y : int ref = { contents = 1 }
# y := 2;;
- : unit = ()
# x
- : int imm = { i = 2 }
There are a few exceptions to this:
if values are objects, then even seemingly structurally identical values will return false on structural comparison
# let o1 = object end;;
val o1 : < > = <obj>
# let o2 = object end;;
val o2 : < > = <obj>
# o1 = o2;;
- : bool = false
# o1 = o1;;
- : bool = true
here we see that = reverts to physical equivalence.
If values are functions, you cannot compare them structurally, but physical comparison works as intended.
lazy values may or may not be structurally comparable, depending on whether they have been forced or not (respectively).
# let l1 = lazy (40 + 2);;
val l1 : lazy_t = <lazy>
# let l2 = lazy (40 + 2);;
val l2 : lazy_t = <lazy>
# l1 = l2;;
Exception: Invalid_argument "equal: functional value".
# Lazy.force l1;;
- : int = 42
# Lazy.force l2;;
- : int = 42
# l1 = l2;;
- : bool = true
module or record values are also comparable if they don't contain any functional value.
In general, I guess that it is safe to say that values which are related to functions, or may hold functions inside are not comparable with =, but may be compared with ==.
You should obviously be very cautious with all this: relying on the implementation details of the runtime is incorrect (Note: I jokingly used the word evil in my initial version of that answer, but changed it by fear of it being taken too seriously). As you aptly pointed out in comments, the behaviour of the javascript implementation is different for floats (structurally equivalent in javascript, but not in the reference implementation, and what about the java one?).
(1) If I recall correctly, floats are also unboxed when stored in arrays to avoid a double indirection, but they become boxed once extracted, so you shouldn't see a difference in behaviour with boxed values.

Is that x = y in ocaml just like x.equals(y) in Java?
and x == y just like x == y (comparing the address) in Java?
Yes, that's it. Except that in OCaml you can use = on every kind of value, whereas in Java you can't use equals on primitive types. Another difference is that floating point numbers in OCaml are reference types, so you shouldn't compare them using == (not that it's generally a good idea to compare floating point numbers directly for equality anyway).
So in summary, you basically should always be using = to compare any kind of values.

according to http://rigaux.org/language-study/syntax-across-languages-per-language/OCaml.html, == checks for shallow equality, and = checks for deep equality

A Functional-Imperative Hybrid

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.

My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)?
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.

The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
The proof of this self-contained property is done through the type-system. The ST monad carries a state-thread parameter, usually denoted with a type-variable s. When you have such a computation you'll have monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off the immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep the most of the performance characteristics of the in place sorting algorithms and still get purity.
In this case it would look something like:
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
mutableVector <- thaw vector
sort mutableVector
freeze mutableVector
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.

"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, unmutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknonwst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to - as being specifically "imperative" and not "functional" - is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibility infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Goto's are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression that conditions the value of that expression. The archetypical case in point is Landin's original case, itself (in his 1960's paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement, the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just λx f(x) (x + 1). The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop and went "Oops. Never mind." and fell back into what become an overwrought and overthought solution, while this simple (and in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;" - which has the "infinite mutability" mentioned above, may itself be treated as an infinitary wrapper, "if (x < n) { x = x + 1; if (x < 1) { x = x + 1; if (x < 1) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n (λx x < n? (λx·...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels, here, are "A:" and "B:", while the references to the subexpressions so labelled as "goto A" and "goto B", respectively. So, by magic, the very essence of Imperativitity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus, emerges the imperative programming language paradigm automatically out of the pure lambda calculus, itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression, itself; and thus requires no additional hacks (like Landin's or later descendants, like monads) - as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980's via the USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages, like Prolog-II, and the much earlier treatment of cyclic recursive structures by infinitary expressions by Irene Guessarian LNCS 99 "Algebraic Semantics".
Now, earlier I said that the magma-based formulation might get you to the same place, or to an approximation thereof. I believe there is a kind of universal representation theorem of some sort, which asserts that the infinitary based formulation provides a purely syntactic representation, and that the semantics that arise from the monad-based representation factors through this as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex