How are state monads / monad transformers desugared inside do notation?

Example
sumArray :: Array Int -> State Int Unit
sumArray = traverse_ \n -> modify \sum -> sum + n
t1 :: Int
t1 = execState (do
  sumArray [1, 2, 3]
  sumArray [4, 5]
  sumArray [6]) 0
-- returns 21
module Main where
import Prelude
import Effect (Effect)
import Data.Foldable (fold, traverse_)
import Control.Monad.State (State, execState)
import Control.Monad.State.Class (modify)
import Data.Maybe (Maybe(..))
import Effect.Console (log)
main :: Effect Unit
main = log $ show t1
sumArray :: Array Int -> State Int Unit
sumArray = traverse_ \n -> modify \sum -> sum + n
t1 :: Int
t1 = execState (do
  sumArray [1, 2, 3]
  sumArray [4, 5]
  sumArray [6]) 0
{-
execState :: forall s a. State s a -> s -> s
execState (StateT m) s = case m s of Identity (Tuple _ s') -> s'
type State s = StateT s Identity
-}
Description
How I understand the evaluation of expression t1:
1. Each sumArray invocation returns a state monad holding the sum of given Int array values.
2. All three state monads are (somehow) unified into a single one, while accumulating intermediate sums.
3. execState returns the overall Int sum, given State Int Unit and an initial value as input.
Issue
I cannot quite understand step 2 in particular. According to do notation, an expression like sumArray [1, 2, 3] desugars to bind x \_ -> ..., so former input is ignored. If I write do with a different monad type like
t2 :: Maybe Int
t2 = do
  Just 3
  Just 4
then the compiler complains:
A result of type Int was implicitly discarded in a do notation block. You can use _ <- ... to explicitly discard the result.
so the rules seem to be a bit different for t1.
Question
How exactly are the three separate state monads combined into a single one? More specifically: Why does the runtime calculate the overall sum of all intermediate state monad sum results, and not something like (1+2+3) * (4+5) * 6? In other words: where is the implicit + accumulator?
I feel like I miss some concept from Chapter 11: Monadic Adventures.

Bergi's answer already gives the essential explanation - but I thought it might be useful to expand on some parts of what he says, and to answer some of your questions more directly.
According to do notation, an expression like sumArray [1, 2, 3]
desugars to bind x \_ -> ..., so former input is ignored.
This is in one sense perfectly true, but also betrays some misunderstandings.
For one thing, I find that quote misleadingly phrased - although it's perfectly acceptable in the context of the original source. It's not talking about how an expression like sumArray [1, 2, 3] "desugars" in and of itself, but about how successive lines ("statements") of a do block are desugared into a single expression that "combines" them - which seems to be in essence what your whole question is about. So yes, it's true - and basically the definition of do notation - that an expression like
do a <- x
   y
desugars to bind x \a -> y (where we imagine that y is some more complex expression which presumably involves a). And likewise that
do x
   y
desugars to bind x \_ -> y. But the latter case isn't "ignoring input" - it's ignoring output. Let me explain this a little more.
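Concretely, that means your t1 do block desugars (roughly, reusing the same bind notation as above) to nested binds in which each intermediate result is discarded:

bind (sumArray [1, 2, 3]) \_ ->
  bind (sumArray [4, 5]) \_ ->
    sumArray [6]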
It's common to think of a general monadic value, of type m a, to be some sort of "computation" that "produces" value(s) of type a. That's necessarily quite an abstract formulation - because a Monad is such a general concept, and some specific Monads fit this mental picture better than others. But it's a good way to understand the basics of monads, and do notation in particular - each line can be thought of as a "statement" in some imperative language, which may have some "side effects" (of a kind strictly constrained by the particular monad you're using) and also produces a value as a "result".
In this sense, a do block of the first type above - where we "bind" the result using the "left arrow" notation - is using that computed value (denoted by a) to decide what to do next. (Incidentally, this is what distinguishes monads from applicatives - if you just have a series of computations and just want to combine their "effects", without allowing "intermediate results" to affect what you're doing, you don't actually need monads or bind.) Whereas the second one doesn't use the result of the first computation (that computation being x) - which is exactly what I meant when I said this is "ignoring output". It's ignoring the result of x. That doesn't (necessarily) mean that x is useless though. It's still being used for its "side effects".
To make it more concrete, I'll look in a bit more detail at both of your examples, starting with the simple one in the Maybe monad (I'll make the change that the compiler suggests in order to keep it happy - note that I'm personally much more familiar with Haskell than with Purescript so I may get Purescript-specific things like this wrong, as Haskell would be perfectly OK with your original code):
t2 :: Maybe Int
t2 = do
  _ <- Just 3
  Just 4
In this case, t2 will simply be equal to Just 4, and it will seem - correctly - that the first line of the do block is redundant. But that's just a consequence of how the Maybe monad works, as well as of the specific value we've got there. I can easily prove to you that the first line does still matter though, by making this change
t2 :: Maybe Int
t2 = do
  _ <- Nothing
  Just 4
Now you will find that t2 is equal not to Just 4, but to Nothing!
That's because each "computation" in the Maybe monad - that is, each value of type Maybe a - either "succeeds" with a "result" of type a (represented by a Just value), or "fails" (represented by Nothing). And, importantly, the way the Maybe monad is defined - that is, the definition of bind - deliberately propagates failure. That is, any Nothing value that is encountered at any point immediately terminates the computation with a Nothing result.
So even here, the "side effect" of the first computation - the fact that it succeeds or fails - does make a major difference to what happens overall. We just ignore the "result" (the actual value if the computation was successful).
If we now move to the State monad - this is a somewhat more complicated monad than Maybe, but may actually for that reason make the above points easier to understand. Because this is a monad where it really does make immediate sense to talk about the "side effects" and the "result" of each monadic value - which perhaps felt a bit forced, or even silly, in the Maybe case.
A value of type State s a represents a computation that results in a value of type a, while "keeping some state" of type s. That is, the computation may use the current state in order to compute its result, and/or it may update the state as part of the computation. Concretely, this is the same as a function of type s -> (a, s) - it takes some state, and returns an updated state (possibly the same) as well as the computed value. And indeed, the State s a type is essentially a simple newtype wrapper for such a function type.
And the implementation of bind in its Monad instance does the most obvious and natural thing - much easier to explain in words than to "see" from the actual implementation details. Two such "stateful functions" are combined by feeding the original state to the first function, then taking the updated state from that and feeding that to the second function. (Actually, bind needs to do - and does - more than this, because as I mentioned earlier, it needs to be able to use the "result" - the a - from the first computation to decide what to do with the second. But we don't need to go into that now, because in this example we don't use the result value - and indeed couldn't, as it's always of the trivial Unit type. It isn't actually complicated, but I won't go into the detail as I don't want to make this answer even longer!)
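For the curious, here is roughly what that looks like as a minimal Haskell sketch (I'm skipping the StateT/Identity wrapping the real library uses, and State' and bindState are my own names):

-- A stateful computation is just a wrapped function s -> (a, s).
newtype State' s a = State' { runState' :: s -> (a, s) }

bindState :: State' s a -> (a -> State' s b) -> State' s b
bindState (State' m) f = State' $ \s ->
  let (a, s')   = m s    -- run the first computation on the incoming state
      State' m' = f a    -- use its result to choose the second computation
  in m' s'               -- run the second computation on the updated state

-- e.g. runState' (bindState (State' (\s -> ((), s + 6)))
--                           (\_ -> State' (\s -> ((), s + 9)))) 0 == ((), 15)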
So when we do
do
  sumArray [1, 2, 3]
  sumArray [4, 5]
  sumArray [6]
we are building a stateful computation of type State Int Unit - that is, a function of type Int -> (Unit, Int). Since Unit is an uninteresting type, basically used as a placeholder here for "we don't care about any result", we're essentially building a function of type Int -> Int from three other such functions. That's easy enough to do - we can just compose the three functions! And that's what, in this simple case, the implementation of bind for the State monad ends up doing.
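To make that concrete, here is a tiny self-contained Haskell sketch (step and t1Sketch are hypothetical names of my own) showing that the whole computation boils down to composing three Int -> Int functions:

-- What each sumArray call does to the state: add the sum of its array.
step :: [Int] -> (Int -> Int)
step ns = \s -> s + sum ns

t1Sketch :: Int
t1Sketch = (step [6] . step [4, 5] . step [1, 2, 3]) 0

main :: IO ()
main = print t1Sketch   -- prints 21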
Hopefully this answers your main question:
where is the implicit + accumulator?
by showing that there's no "implicit accumulator" other than function composition. It's the fact that those individual functions happen to add (respectively, in this case) 6, 9 and 6 to the input that results in the final result being the sum of those 3 numbers (due to the fact that the composition of two sums is itself a sum, which ultimately comes from the associativity of addition).
But more importantly, I hope this has given you a fuller explanation of Monads, and do notation, which you can apply to many other situations.

Each sumArray invocation returns a state monad holding the sum of given Int array values.
No, it doesn't return "a state monad", and it doesn't hold the sum of the array.
It returns a State Int Unit value, which represents a "stateful computation of a Unit" (using an Int for the state). To get the sum, you actually have to run that computation:
t :: State Int Unit
t = sumArray [1, 2, 3]
x = runState t 0 -- ((), 6)
y = runState t 5 -- ((), 11)
Notice how for the value y, the sum of the array elements is never computed - it adds 1 to 5, then 2 to 6, then 3 to 8.
How exactly are the three separate state monads combined into a single one?
The key to understanding is that they are not state values, but stateful computations. They can be combined by simply sequencing those computations one after the other, passing the result and state on to the next.
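As a quick Haskell-flavoured sketch of that sequencing, done by hand with plain functions (step1, step2 and combined are made-up names):

step1, step2 :: Int -> ((), Int)
step1 s = ((), s + 6)   -- the effect of sumArray [1, 2, 3] on the state
step2 s = ((), s + 9)   -- the effect of sumArray [4, 5]

combined :: Int -> ((), Int)
combined s0 =
  let (_, s1) = step1 s0   -- run the first computation, keep its updated state
  in  step2 s1             -- pass that state (and, in general, the result) on to the second

main :: IO ()
main = print (combined 0)   -- ((), 15)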

Related

What is a "strongly moded" programming language?

I was looking through the Mercury programming language's about page when I found a part where it said:
Mercury is a strongly moded language
What does this mean!? I've searched all over the internet and have found no answer!
I do not know of any other language that has modes as used in Mercury. The following is from the Mercury manual:
The mode of a predicate (or function) is a mapping from the initial state of instantiation of the arguments of the predicate (or the arguments and result of a function) to their final state of instantiation.
If you are familiar with prolog you may know what this means.
Consider a function in C with the following typedecl
void sort(int * i, int * o);
Assume this function sorts the array i into another array o. Just from this declaration we have no guarantee that i is read from and o is written to. If we could additionally write mode sort(in, out), that would tell the compiler that the function sort reads from the first argument and writes to the second. The compiler would then check the function body to assure us that no writing to i or reading from o takes place.
For a language like C this may not be suitable, but for a prolog family language this is a very welcome feature. Consider the append/3 predicate which succeeds when the first two lists concatenated is the third list.
append([1, 2], [a, b], X).
X = [1, 2, a, b]
So if we provide two input lists we get an output list. But when we provide the output list and ask for all solutions that result in it, we have
append(X, Y, [1, 2, a, b]).
X = [],
Y = [1, 2, a, b] ;
X = [1],
Y = [2, a, b] ;
X = [1, 2],
Y = [a, b] ;
X = [1, 2, a],
Y = [b] ;
X = [1, 2, a, b],
Y = [] ;
false.
So append([1], [2], [3]) fails, whereas append([1], [2], [1, 2]) succeeds.
So depending on how we use the predicate, we can have one deterministic answer, multiple answers, or no answer at all. All these properties of the predicate can be declared up front by mode declarations. The following are the mode declarations for append:
:- pred append(list(T), list(T), list(T)).
:- mode append(in, in, out) is det.
:- mode append(out, out, in) is multi.
:- mode append(in, in, in) is semidet.
If you provide the first two arguments, the output is deterministically determined. If you provide only the last argument, then you have multiple solutions for the first two arguments. If you provide all three lists, it just checks whether the third list is the first two appended.
Modes are not restricted to in and out. You will see di (destructive input) and uo (unique output) when dealing with IO. Modes just tell us how a predicate changes the instantiation of the arguments we provide: outputs change from free variables to ground terms, and inputs remain ground terms. So as a user you can define :- mode input == ground >> ground. and :- mode output == free >> ground. and use them - which is exactly how the in and out modes are defined.
Consider a predicate which calculates the length of a list. We do not need the whole list to be instantiated, as we know that length([X, Y], 2) is true even when X and Y are free variables. So the mode declaration :- mode length(in, out) is det. is more restrictive than necessary, as the whole first argument need not be instantiated. So we can also define the instantiatedness of the argument:
:- inst listskel == bound([] ; [free | listskel]).
which states that an argument is listskel-instantiated if it is an empty list or a free variable followed by a listskel-instantiated tail.
Such partial instantiation also happens in Haskell due to its lazy nature, e.g., a whole list need not be evaluated to know its length.
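A quick Haskell illustration of that point - length only forces the spine of the list, never the elements:

main :: IO ()
main = print (length [undefined, undefined])   -- prints 2; the elements are never evaluated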
References:
modes
determinism
EDIT: From the Mercury website:
Currently only a subset of the intended mode system is implemented. This subset effectively requires arguments to be either fully input (ground at the time of call and at the time of success) or fully output (free at the time of call and ground at the time of success).
Modes specify the direction of data-flow, e.g. input or output.
In some languages, the direction of data-flow is fixed, and implicit in the syntax. For example, in most functional languages, function arguments are always input, and function results are always output.
In most logic programming languages, however, the direction of data-flow is determined at run-time. Most logic programming languages are dynamically moded.
In Mercury, the direction of dataflow must be declared, at least at module boundaries. However, a single predicate or function in Mercury can have multiple modes; the compiler resolves the modes at compile time, and generates separate code for each mode.
Mercury also has support for optional dynamic modes with constraint solving.
(See https://www.researchgate.net/publication/220802747_Adding_Constraint_Solving_to_Mercury.)
But the default is static modes.

Erlang Recursive end loop

I just started learning Erlang and since I found out there is no for loop I tried recreating one with recursion:
display(Rooms, In) ->
    Room = array:get(In, Rooms)
    io:format("~w", [Room]),
    if
        In < 59 -> display(Rooms, In + 1);
        true -> true
    end.
With this code I need to display the content (false or true) of each array element in Rooms until the number 59 is reached. However this creates weird output which displays all of Rooms' contents about 60 times (?). When I drop the if statement and only put in the recursive call it works, except for an exception error: Bad Argument.
So basically my question is how do I put a proper end to my "for loop".
Thanks in advance!
Hmm, this code is rewritten and not pasted. It is missing a comma after Room = array:get(In, Rooms). The Bad argument error is probably this:
exception error: bad argument
in function array:get/2 (array.erl, line 633)
in call from your_module_name:display/2
This means that you called array:get/2 with bad arguments: either Rooms is not an array, or you used an index out of range. The second is the more likely cause. You are checking if:
In < 59
and then calling display again, so it will get to 58, evaluate to true and call:
display(Rooms, 59)
which is too much.
There are also a couple of other things:
In io:format/2 it is usually better to use ~p instead of ~w. It does exactly the same thing, but with pretty printing, so it is easier to read.
In Erlang if is unnatural, because it evaluates guards, and one of them has to match or you get an error... It is just really weird.
case is much more readable:
case In < 59 of
    false -> do_something();
    true -> ok
end
In a case expression you usually write something that always matches:
case Something of
    {One, Two} -> do_stuff(One, Two);
    [Head | RestOfList] -> do_other_stuff(Head, RestOfList);
    _ -> none_of_the_previous_matched()
end
The underscore is really useful in pattern matching.
In functional languages you should never worry about details like indexes! The array module has a map function, which takes a function and an array as arguments and calls the given function on each array element.
So you can write your code this way:
display(Rooms) ->
    DisplayRoom = fun(Index, Room) -> io:format("~p ~p~n", [Index, Room]) end,
    array:map(DisplayRoom, Rooms).
This isn't perfect though, because apart from calling io:format/2 and displaying the contents, it will also construct a new array. io:format returns the atom ok after completion, so you will get an array of ok atoms. There is also array:foldl/3, which doesn't have that problem.
If you don't have to have random access, it would be best to simply use lists.
Rooms = lists:duplicate(58, false),
DisplayRoom = fun(Room) -> io:format("~p~n", [Room]) end,
lists:foreach(DisplayRoom, Rooms)
If you are not comfortable with higher order functions, lists also allow you to easily write recursive algorithms with function clauses:
display([]) ->                  % always start with the base case, where you don't need recursion
    ok;                         % you have to return something
display([Room | RestRooms]) ->  % pattern match on the list, splitting it into first element and tail
    io:format("~p~n", [Room]),  % do something with the first element
    display(RestRooms).         % recursive call on the rest (RestRooms is quite a funny name :D)
To summarize - don't write for loops in Erlang :)
This is a general misunderstanding of recursive loop definitions. What you are trying to check for is called the "base condition" or "base case". This is easiest to deal with by matching:
display(0, _) ->
    ok;
display(In, Rooms) ->
    Room = array:get(In, Rooms),
    io:format("~w~n", [Room]),
    display(In - 1, Rooms).
This is, however, rather unidiomatic. Instead of using a hand-made recursive function, something like a fold or map is more common.
Going a step beyond that, though, most folks would probably have chosen to represent the rooms as a set or list, and iterated over it using list operations. When hand-written the "base case" would be an empty list instead of a 0:
display([]) ->
    ok;
display([Room | Rooms]) ->
    io:format("~w~n", [Room]),
    display(Rooms).
Which would have been avoided in favor, once again, of a list operation like foreach:
display(Rooms) ->
    lists:foreach(fun(Room) -> io:format("~w~n", [Room]) end, Rooms).
Some folks really dislike reading lambdas in-line this way. (In this case I find it readable, but the larger they get the more likely they are to become genuinely distracting.) An alternative representation of the exact same function:
display(Rooms) ->
    Display = fun(Room) -> io:format("~w~n", [Room]) end,
    lists:foreach(Display, Rooms).
Which might itself be passed up in favor of using a list comprehension as a shorthand for iteration:
_ = [io:format("~w~n", [Room]) || Room <- Rooms].
When only trying to get a side effect, though, I really think that lists:foreach/2 is the best choice for semantic reasons.
I think part of the difficulty you are experiencing is that you have chosen a rather unusual structure as the base data for your first Erlang program that does anything (arrays are not used very often, and are not very idiomatic in functional languages). Try working with lists a bit first - it's not scary - and some of the idioms and other code examples and general discussions about list processing and functional programming will make more sense.
Wait! There's more...
I didn't deal with the case where you have an irregular room layout. The assumption was always that everything was laid out in a nice even grid -- which is never the case when you get into the really interesting stuff (either because the map is irregular or because the topology is interesting).
The main difference here is that instead of simply carrying a list of [Room] where each Room value is a single value representing the Room's state, you would wrap the state value of the room in a tuple which also contained some extra data about that state such as its location or coordinates, name, etc. (You know, "metadata" -- which is such an overloaded, buzz-laden term today that I hate saying it.)
Let's say we need to maintain coordinates in a three-dimensional space in which the rooms reside, and that each room has a list of occupants. In the case of the array we would have divided the array by the dimensions of the layout. A 10*10*10 space would have an array index from 0 to 999, and each location would be found by an operation similar to
locate({X, Y, Z}) -> (1 * X) + (10 * Y) + (100 * Z).
and the value of each Room would be [Occupant1, Occupant2, ...].
It would be a real annoyance to define such an array and then mark arbitrarily large regions of it as "unusable" to give the impression of irregular layout, and then work around that trying to simulate a 3D universe.
Instead we could use a list (or something like a list) to represent the set of rooms, but the Room value would now be a tuple: Room = {{X, Y, Z}, [Occupants]}. You may have an additional element (or ten!), like the "name" of the room or some other status information or whatever, but the coordinates are the most certain real identity you're likely to get. To get the room status you would do the same as before, but mark what element you are looking at:
display(Rooms) ->
    Display =
        fun({ID, Occupants}) ->
            io:format("ID ~p: Occupants ~p~n", [ID, Occupants])
        end,
    lists:foreach(Display, Rooms).
To do anything more interesting than printing sequentially, you could replace the internals of Display with a function that uses the coordinates to plot the room on a chart, check for empty or full lists of Occupants (use pattern matching, don't do it procedurally!), or whatever else you might dream up.

Explanation of lists:fold function

I am learning more and more about the Erlang language and have recently faced a problem. I read about the foldl(Fun, Acc0, List) -> Acc1 function. I used the learnyousomeerlang.com tutorial, and there was an example (the example is a Reverse Polish Notation calculator in Erlang):
% function that deletes all whitespaces and also executes
rpn(L) when is_list(L) ->
    [Res] = lists:foldl(fun rpn/2, [], string:tokens(L, " ")),
    Res.

% function that converts a string to an integer or floating point value
read(N) ->
    case string:to_float(N) of
        % returning {error, no_float} when there is no float available
        {error, no_float} -> list_to_integer(N);
        {F, _} -> F
    end.

% rpn managing all actions
rpn("+", [N1, N2 | S]) -> [N2 + N1 | S];
rpn("-", [N1, N2 | S]) -> [N2 - N1 | S];
rpn("*", [N1, N2 | S]) -> [N2 * N1 | S];
rpn("/", [N1, N2 | S]) -> [N2 / N1 | S];
rpn("^", [N1, N2 | S]) -> [math:pow(N2, N1) | S];
rpn("ln", [N | S]) -> [math:log(N) | S];
rpn("log10", [N | S]) -> [math:log10(N) | S];
rpn(X, Stack) -> [read(X) | Stack].
As far as I understand, lists:foldl executes rpn/2 on every element of the list. But this is as far as I can understand this function. I read the documentation, but it does not help me a lot. Can someone explain to me how lists:foldl works?
Let's say we want to add a list of numbers together:
1 + 2 + 3 + 4.
This is a pretty normal way to write it. But I wrote "add a list of numbers together", not "write numbers with pluses between them". There is something fundamentally different between the way I expressed the operation in prose and the mathematical notation I used. We do this because we know it is an equivalent notation for addition (because addition is associative), and in our heads it reduces immediately to:
3 + 7.
and then
10.
So what's the big deal? The problem is that we have no way of understanding the idea of summation from this example. What if instead I had written "Start with 0, then take one element from the list at a time and add it to the starting value as a running sum"? This is actually what summation is about, and it's not arbitrarily deciding which two things to add first until the equation is reduced.
sum(List) -> sum(List, 0).

sum([], A) -> A;
sum([H|T], A) -> sum(T, H + A).
If you're with me so far, then you're ready to understand folds.
There is a problem with the function above; it is too specific. It braids three ideas together without specifying any independently:
iteration
accumulation
addition
It is easy to miss the difference between iteration and accumulation because most of the time we never give it a second thought. Most languages accidentally encourage us to miss the difference, actually, by having the same storage location change its value on each iteration of a similar function.
It is easy to miss the independence of addition merely because of the way it is written in this example: "+" looks like an "operation", not a function.
What if I had said "Start with 1, then take one element from the list at a time and multiply it by the running value"? We would still be doing the list processing in exactly the same way, but with two examples to compare it is pretty clear that multiplication and addition are the only difference between the two:
prod(List) -> prod(List, 1).

prod([], A) -> A;
prod([H|T], A) -> prod(T, H * A).
This is exactly the same flow of execution but for the inner operation and the starting value of the accumulator.
So let's make the addition and multiplication bits into functions, so we can pull that part of the pattern out:
add(A, B) -> A + B.
mult(A, B) -> A * B.
How could we write the list operation on its own? We need to pass a function in - addition or multiplication - and have it operate over the values. Also, we have to pay attention to the identity element of the operation we are aggregating with, or else we will screw up the magic that is value aggregation. add(0, X) always returns X, so 0 is the identity for addition. For multiplication the identity is 1. So we must start our accumulator at 0 for addition and 1 for multiplication (and for building lists, an empty list, and so on). This means we can't write the function with a built-in accumulator value, because it would only be correct for some type+operation pairs.
So this means to write a fold we need to have a list argument, a function to do things argument, and an accumulator argument, like so:
fold([], _, Accumulator) ->
    Accumulator;
fold([H|T], Operation, Accumulator) ->
    fold(T, Operation, Operation(H, Accumulator)).
With this definition we can now write sum/1 using this more general pattern:
fsum(List) -> fold(List, fun add/2, 0).
And prod/1 also:
fprod(List) -> fold(List, fun mult/2, 1).
And they are functionally identical to the ones we wrote above, but the notation is clearer and we don't have to write a bunch of recursive details that tangle the idea of iteration together with the idea of accumulation and with some specific operation like multiplication or addition.
In the case of the RPN calculator the idea of aggregate list operations is combined with the concept of selective dispatch (picking an operation to perform based on what symbol is encountered/matched). The RPN example is relatively simple and small (you can fit all the code in your head at once, it's just a few lines), but until you get used to functional paradigms the process it manifests can make your head hurt. In functional programming a tiny amount of code can create an arbitrarily complex process of unpredictable (or even evolving!) behavior, based just on list operations and selective dispatch; this is very different from the conditional checks, input validation and procedural checking techniques used in other paradigms more common today. Analyzing such behavior is greatly assisted by single assignment and recursive notation, because each iteration is a conceptually independent slice of time which can be contemplated in isolation of the rest of the system. I'm talking a little ahead of the basic question, but this is a core idea you may wish to contemplate as you consider why we like to use operations like folds and recursive notations instead of procedural, multiple-assignment loops.
I hope this helped more than confused.
First, you have to remember how rpn works. If you want to evaluate the operation 2 * (3 + 5), you feed the function the input "3 5 + 2 *". This was useful at a time when you had 25 steps to enter a program :o)
The first function called simply splits this character list into tokens:
1> string:tokens("3 5 + 2 *"," ").
["3","5","+","2","*"]
2>
Then it processes lists:foldl/3. For each element of this list, rpn/2 is called with the head of the input list and the current accumulator, and returns a new accumulator. Let's go step by step:
Step  Head  Accumulator  Matched rpn/2 clause                  Return value
1     "3"   []           rpn(X, Stack) -> [read(X) | Stack].   [3]
2     "5"   [3]          rpn(X, Stack) -> [read(X) | Stack].   [5,3]
3     "+"   [5,3]        rpn("+", [N1,N2|S]) -> [N2+N1|S];     [8]
4     "2"   [8]          rpn(X, Stack) -> [read(X) | Stack].   [2,8]
5     "*"   [2,8]        rpn("*", [N1,N2|S]) -> [N2*N1|S];     [16]
At the end, lists:foldl/3 returns [16], which matches [Res], and thus rpn/1 returns Res = 16.

What's the difference between "equal (=)" and "identical (==)" in ocaml?

In OCaml, we have two kinds of equality comparisons:
x = y and x == y,
So what exactly is the difference between them?
Is that x = y in ocaml just like x.equals(y) in Java?
and x == y just like x == y (comparing the address) in Java?
I don't know exactly how x.equals(y) works in Java. If it does a "deep" comparison, then the analogy is pretty close. One thing to be careful of is that physical equality is a slippery concept in OCaml (and functional languages in general). The compiler and runtime system are going to move values around, and may merge and unmerge pure (non-mutable) values at will. So you should only use == if you really know what you're doing. At some level, it requires familiarity with the implementation (which is something to avoid unless necessary).
The specific guarantees that OCaml makes for == are weak. Mutable values compare as physically equal in the way you would expect (i.e., if mutating one of the two will actually mutate the other also). But for non-mutable values, the only guarantee is that values that compare physically equal (==) will also compare as equal (=). Note that the converse is not true, as sepp2k points out for floating values.
In essence, what the language spec is telling you for non-mutable values is that you can use == as a quick check to decide if two non-mutable values are equal (=). If they compare physically equal, they are equal value-wise. If they don't compare physically equal, you don't know if they're equal value-wise. You still have to use = to decide.
Edit: this answer delves into details of the inner workings of OCaml, based on the Obj module. That knowledge isn't meant to be used without extra care (let me emphasize that very important point once more: don't use it for your programs, but only if you wish to experiment with the OCaml runtime). That information is also available, albeit perhaps in a more understandable form, in the O'Reilly book on OCaml, available online (a pretty good book, though a bit dated now).
The = operator is checking structural equality, whereas == only checks physical equality.
Equality checking is based on the way values are allocated and stored within memory. A runtime value in OCaml roughly fits into 2 different categories: either boxed or unboxed. The former means that the value is reachable in memory through an indirection, and the latter means that the value is directly accessible.
Since ints (int31 on 32-bit systems, or int63 on 64-bit systems) are unboxed values, both operators behave the same with them. A few other types or values whose runtime implementations are actually ints will also see both operators behaving the same with them, like unit (), the empty list [], constants in algebraic datatypes and polymorphic variants, etc.
Once you start playing with more complex values involving structures, like lists, arrays, tuples, records (the C struct equivalent), the difference between these two operators emerges: values within structures will be boxed, unless they can be represented at runtime as native ints (1). This necessity arises from how the runtime system must handle values and manage memory efficiently. Structured values are allocated when constructed from other values, which may themselves be structured values, in which case references are used (since they are boxed).
Because of allocations, it is very unlikely that two values instantiated at different points of a program could be physically equal, although they'd be structurally equal. Each of the fields, or inner elements within the values could be identical, even up to physical identity, but if these two values are built dynamically, then they would end up using different spaces in memory, and thus be physically different, but structurally equal.
The runtime tries to avoid unnecessary allocations though: for instance, if you have a function always returning the same value (in other words, if the function is constant), either simple or structured, that function will always return the same physical value (i.e., the same data in memory), so that testing the results of two invocations of that function for physical equality will succeed.
One way to observe when the physical operator will actually return true is to use the Obj.is_block function on a value's runtime representation (that is to say, the result of Obj.repr on it). This function simply tells whether its parameter's runtime representation is boxed.
A more contrived way is to use the following function:
let phy x : int = Obj.magic (Obj.repr x);;
This function will return an int which is the actual value of the pointer to the value bound to x in memory, if this value is boxed. If you try it on an int literal, you will get the exact same value! That's because ints are unboxed (i.e., the value is stored directly in memory, not through a reference).
Now that we know that boxed values are actually "referenced" values, we can deduce that these values can be modified, even though the language says that they are immutable.
Consider for instance the reference type:
# type 'a ref = {mutable contents : 'a };;
We could define an immutable ref like this:
# type 'a imm = {i : 'a };;
type 'a imm = {i : 'a; }
And then use the Obj.magic function to coerce one type into the other, because structurally, these types will be reduced to the same runtime representation.
For instance:
# let x = { i = 1 };;
val x : int imm = {i = 1}
# let y : int ref = Obj.magic x;;
val y : int ref = {contents = 1}
# y := 2;;
- : unit = ()
# x;;
- : int imm = {i = 2}
There are a few exceptions to this:
if values are objects, then even seemingly structurally identical values will return false on structural comparison
# let o1 = object end;;
val o1 : < > = <obj>
# let o2 = object end;;
val o2 : < > = <obj>
# o1 = o2;;
- : bool = false
# o1 = o1;;
- : bool = true
here we see that = reverts to physical equivalence.
If values are functions, you cannot compare them structurally, but physical comparison works as intended.
lazy values may or may not be structurally comparable, depending on whether they have been forced or not (respectively).
# let l1 = lazy (40 + 2);;
val l1 : lazy_t = <lazy>
# let l2 = lazy (40 + 2);;
val l2 : lazy_t = <lazy>
# l1 = l2;;
Exception: Invalid_argument "equal: functional value".
# Lazy.force l1;;
- : int = 42
# Lazy.force l2;;
- : int = 42
# l1 = l2;;
- : bool = true
module or record values are also comparable if they don't contain any functional value.
In general, I guess that it is safe to say that values which are related to functions, or may hold functions inside are not comparable with =, but may be compared with ==.
You should obviously be very cautious with all this: relying on the implementation details of the runtime is incorrect (note: I jokingly used the word evil in my initial version of this answer, but changed it for fear of it being taken too seriously). As you aptly pointed out in comments, the behaviour of the JavaScript implementation is different for floats (structurally equivalent in JavaScript, but not in the reference implementation - and what about the Java one?).
(1) If I recall correctly, floats are also unboxed when stored in arrays to avoid a double indirection, but they become boxed once extracted, so you shouldn't see a difference in behaviour with boxed values.
Is that x = y in ocaml just like x.equals(y) in Java?
and x == y just like x == y (comparing the address) in Java?
Yes, that's it. Except that in OCaml you can use = on every kind of value, whereas in Java you can't use equals on primitive types. Another difference is that floating point numbers in OCaml are reference types, so you shouldn't compare them using == (not that it's generally a good idea to compare floating point numbers directly for equality anyway).
So in summary, you basically should always be using = to compare any kind of values.
According to http://rigaux.org/language-study/syntax-across-languages-per-language/OCaml.html, == checks for shallow equality, and = checks for deep equality.

A Functional-Imperative Hybrid

Pure functional programming languages do not allow mutable data, but some computations are more naturally/intuitively expressed in an imperative way -- or an imperative version of an algorithm may be more efficient. I am aware that most functional languages are not pure, and let you assign/reassign variables and do imperative things but generally discourage it.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)? That way, all functions maintain referential transparency (they always give the same return value given the same arguments), but within a function, a computation can be expressed in imperative terms (like, say, a while loop).
IO and such could still be accomplished in the normal functional ways - through monads or passing around a "world" or "universe" token.
My question is, why not allow local state to be manipulated in local variables, but require that functions can only access their own locals and global constants (or just constants defined in an outer scope)?
Good question. I think the answer is that mutable locals are of limited practical value but mutable heap-allocated data structures (primarily arrays) are enormously valuable and form the backbone of many important collections including efficient stacks, queues, sets and dictionaries. So restricting mutation to locals only would not give an otherwise purely functional language any of the important benefits of mutation.
On a related note, communicating sequential processes exchanging purely functional data structures offer many of the benefits of both worlds because the sequential processes can use mutation internally, e.g. mutable message queues are ~10x faster than any purely functional queues. For example, this is idiomatic in F# where the code in a MailboxProcessor uses mutable data structures but the messages communicated between them are immutable.
Sorting is a good case study in this context. Sedgewick's quicksort in C is short and simple and hundreds of times faster than the fastest purely functional sort in any language. The reason is that quicksort mutates the array in-place. Mutable locals would not help. Same story for most graph algorithms.
The short answer is: there are systems to allow what you want. For example, you can do it using the ST monad in Haskell (as referenced in the comments).
The ST monad approach is from Haskell's Control.Monad.ST. Code written in the ST monad can use references (STRef) where convenient. The nice part is that you can even use the results of the ST monad in pure code, as it is essentially self-contained (this is basically what you were wanting in the question).
The proof of this self-contained property is done through the type-system. The ST monad carries a state-thread parameter, usually denoted with a type-variable s. When you have such a computation you'll have monadic result, with a type like:
foo :: ST s Int
To actually turn this into a pure result, you have to use
runST :: (forall s . ST s a) -> a
You can read this type like: give me a computation where the s type parameter doesn't matter, and I can give you back the result of the computation, without the ST baggage. This basically keeps the mutable ST variables from escaping, as they would carry the s with them, which would be caught by the type system.
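As a concrete illustration, here is a small self-contained sketch (sumST is a name I've made up; the ST and STRef APIs are the real ones from base):

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- Sum a list using a mutable local accumulator; the result is still pure.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                         -- mutable reference, local to this ST computation
  mapM_ (\x -> modifySTRef' acc (+ x)) xs   -- imperative-style loop over the list
  readSTRef acc                             -- the Int escapes; the STRef itself cannot

main :: IO ()
main = print (sumST [1 .. 10])   -- 55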
This can be used to good effect on pure structures that are implemented with underlying mutable structures (like the vector package). One can cast off the immutability for a limited time to do something that mutates the underlying array in place. For example, one could combine the immutable Vector with an impure algorithms package to keep the most of the performance characteristics of the in place sorting algorithms and still get purity.
In this case it would look something like:
pureSort :: Ord a => Vector a -> Vector a
pureSort vector = runST $ do
  mutableVector <- thaw vector
  sort mutableVector
  freeze mutableVector
The thaw and freeze functions are linear-time copying, but this won't disrupt the overall O(n lg n) running time. You can even use unsafeFreeze to avoid another linear traversal, as the mutable vector isn't used again.
"Pure functional programming languages do not allow mutable data" ... actually it does, you just simply have to recognize where it lies hidden and see it for what it is.
Mutability is where two things have the same name and mutually exclusive times of existence so that they may be treated as "the same thing at different times". But as every Zen philosopher knows, there is no such thing as "same thing at different times". Everything ceases to exist in an instant and is inherited by its successor in possibly changed form, in a (possibly) uncountably-infinite succession of instants.
In the lambda calculus, mutability thus takes the form illustrated by the following example: (λx (λx f(x)) (x+1)) (x+1), which may also be rendered as "let x = x + 1 in let x = x + 1 in f(x)" or just "x = x + 1, x = x + 1, f(x)" in a more C-like notation.
In other words, "name clash" of the "lambda calculus" is actually "update" of imperative programming, in disguise. They are one and the same - in the eyes of the Zen (who is always right).
So, let's refer to each instant and state of the variable as the Zen Scope of an object. One ordinary scope with a mutable object equals many Zen Scopes with constant, unmutable objects that either get initialized if they are the first, or inherit from their predecessor if they are not.
When people say "mutability" they're misidentifying and confusing the issue. Mutability (as we've just seen here) is a complete red herring. What they actually mean (even unbeknonwst to themselves) is infinite mutability; i.e. the kind which occurs in cyclic control flow structures. In other words, what they're actually referring to - as being specifically "imperative" and not "functional" - is not mutability at all, but cyclic control flow structures along with the infinite nesting of Zen Scopes that this entails.
The key feature that lies absent in the lambda calculus is, thus, seen not as something that may be remedied by the inclusion of an overwrought and overthought "solution" like monads (though that doesn't exclude the possibility of it getting the job done) but as infinitary terms.
A control flow structure is the wrapping of an unwrapped (possibly infinite) decision tree structure. Branches may re-converge. In the corresponding unwrapped structure, they appear as replicated, but separate, branches or subtrees. Gotos are direct links to subtrees. A goto or branch that back-branches to an earlier part of a control flow structure (the very genesis of the "cycling" of a cyclic control flow structure) is a link to an identically-shaped copy of the entire structure being linked to. Corresponding to each structure is its Universally Unrolled decision tree.
More precisely, we may think of a control-flow structure as a statement that precedes an actual expression that conditions the value of that expression. The archetypical case in point is Landin's original case, itself (in his 1960's paper, where he tried to lambda-ize imperative languages): let x = 1 in f(x). The "x = 1" part is the statement, the "f(x)" is the value being conditioned by the statement. In C-like form, we could write this as x = 1, f(x).
More generally, corresponding to each statement S and expression Q is an expression S[Q] which represents the result Q after S is applied. Thus, (x = 1)[f(x)] is just λx f(x) (x + 1). The S wraps around the Q. If S contains cyclic control flow structures, the wrapping will be infinitary.
When Landin tried to work out this strategy, he hit a hard wall when he got to the while loop and went "Oops. Never mind." and fell back into what became an overwrought and overthought solution, while this simple (and in retrospect, obvious) answer eluded his notice.
A while loop "while (x < n) x = x + 1;" - which has the "infinite mutability" mentioned above, may itself be treated as an infinitary wrapper, "if (x < n) { x = x + 1; if (x < 1) { x = x + 1; if (x < 1) { x = x + 1; ... } } }". So, when it wraps around an expression Q, the result is (in C-like notation) "x < n? (x = x + 1, x < n? (x = x + 1, x < n? (x = x + 1, ...): Q): Q): Q", which may be directly rendered in lambda form as "x < n? (λx x < n (λx x < n? (λx·...) (x + 1): Q) (x + 1): Q) (x + 1): Q". This shows directly the connection between cyclicity and infinitariness.
This is an infinitary expression that, despite being infinite, has only a finite number of distinct subexpressions. Just as we can think of there being a Universally Unrolled form to this expression - which is similar to what's shown above (an infinite decision tree) - we can also think of there being a Maximally Rolled form, which could be obtained by labelling each of the distinct subexpressions and referring to the labels, instead. The key subexpressions would then be:
A: x < n? goto B: Q
B: x = x + 1, goto A
The subexpression labels, here, are "A:" and "B:", while the references to the subexpressions so labelled as "goto A" and "goto B", respectively. So, by magic, the very essence of Imperativitity emerges directly out of the infinitary lambda calculus, without any need to posit it separately or anew.
This way of viewing things applies even down to the level of binary files. Every interpretation of every byte (whether it be a part of an opcode of an instruction that starts 0, 1, 2 or more bytes back, or as part of a data structure) can be treated as being there in tandem, so that the binary file is a rolling up of a much larger universally unrolled structure whose physical byte code representation overlaps extensively with itself.
Thus, emerges the imperative programming language paradigm automatically out of the pure lambda calculus, itself, when the calculus is extended to include infinitary terms. The control flow structure is directly embodied in the very structure of the infinitary expression, itself; and thus requires no additional hacks (like Landin's or later descendants, like monads) - as it's already there.
This synthesis of the imperative and functional paradigms arose in the late 1980's via the USENET, but has not (yet) been published. Part of it was already implicit in the treatment (dating from around the same time) given to languages, like Prolog-II, and the much earlier treatment of cyclic recursive structures by infinitary expressions by Irene Guessarian LNCS 99 "Algebraic Semantics".
Now, earlier I said that the monad-based formulation might get you to the same place, or to an approximation thereof. I believe there is a kind of universal representation theorem of some sort, which asserts that the infinitary based formulation provides a purely syntactic representation, and that the semantics that arise from the monad-based representation factors through this as "monad-based semantics" = "infinitary lambda calculus" + "semantics of infinitary languages".
Likewise, we may think of the "Q" expressions above as being continuations; so there may also be a universal representation theorem for continuation semantics, which similarly rolls this formulation back into the infinitary lambda calculus.
At this point, I've said nothing about non-rational infinitary terms (i.e. infinitary terms which possess an infinite number of distinct subterms and no finite Minimal Rolling) - particularly in relation to interprocedural control flow semantics. Rational terms suffice to account for loops and branches, and so provide a platform for intraprocedural control flow semantics; but not as much so for the call-return semantics that are the essential core element of interprocedural control flow semantics, if you consider subprograms to be directly represented as embellished, glorified macros.
There may be something similar to the Chomsky hierarchy for infinitary term languages; so that type 3 corresponds to rational terms, type 2 to "algebraic terms" (those that can be rolled up into a finite set of "goto" references and "macro" definitions), and type 0 for "transcendental terms". That is, for me, an unresolved loose end, as well.
