SML syntactical restrictions within recursive bindings? - recursion

There seems to be syntactical restrictions within SML's recursive bindings, which I'm unable to understand. What are these restrictions I'm not encountering in the second case (see source below) and I'm encountering when using a custom operator in the first case?
Below is the case with which I encountered the issue. It fails when I want to use a custom operator, as explained in comments. Of the major SML implementations I'm testing SML sources with, only Poly/ML accepts it as valid, and all of MLton, ML Kit and HaMLet rejects it.
Error messages are rather confusing to me. The clearest one to my eyes, is the one from HaMLet, which complains about “illegal expression within recursive value binding”.
(* A datatype to pass recursion as result and to arguments. *)
datatype 'a process = Chain of ('a -> 'a process)
(* A controlling iterator, where the item handler is
* of type `'a -> 'a process`, so that while handling an item,
* it's also able to return the next handler to be used, making
* the handler less passive. *)
val rec iter =
fn process: int -> int process =>
fn first: int =>
fn last: int =>
let
val rec step =
fn (i: int, Chain process) (* -> unit *) =>
if i < first then ()
else if i = last then (process i; ())
else if i > last then ()
else
let val Chain process = process i
in step (i + 1, Chain process)
end
in step (first, Chain process)
end
(* An attempt to set‑up a syntax to make use of the `Chain` constructor,
* a bit more convenient and readable. *)
val chain: unit * ('a -> 'a process) -> 'a process =
fn (a, b) => (a; Chain b)
infixr 0 THEN
val op THEN = chain
(* A test of this syntax:
* - OK with Poly/ML, which displays “0-2|4-6|8-10|12-14|16-18|20”.
* - fails with MLton, which complains about a syntax error on line #44.
* - fails with ML Kit, which complains about a syntax error on line #51.
* - fails with HaMLet, which complains about a syntax error on line #45.
* The clearest (while not helpful to me) message comes from HaMLet, which
* says “illegal expression within recursive value binding”. *)
val rec process: int -> int process =
(fn x => print (Int.toString x) THEN
(fn x => print "-" THEN
(fn x => print (Int.toString x) THEN
(fn x => print "|" THEN
process))))
val () = iter process 0 20
val () = print "\n"
(* Here is the same without the `THEN` operator. This one works with
* all of Poly/ML, MLton, ML Kit and HaMLet. *)
val rec process =
fn x =>
(print (Int.toString x);
Chain (fn x => (print "-";
Chain (fn x => (print (Int.toString x);
Chain (fn x => (print "|";
Chain process)))))))
val () = iter process 0 20
val () = print "\n"
(* SML implementations version notes:
* - MLton, is the last version, built just yesterday
* - Poly/ML is Poly/ML 5.5.2
* - ML Kit is MLKit 4.3.7
* - HaMLet is HaMLet 2.0.0 *)
Update
I could work around the issue, but still don't understand it. If I remove the outermost parentheses, then it validates:
val rec process: int -> int process =
fn x => print (Int.toString x) THEN
(fn x => print "-" THEN
(fn x => print (Int.toString x) THEN
(fn x => print "|" THEN
process)))
Instead of:
val rec process: int -> int process =
(fn x => print (Int.toString x) THEN
(fn x => print "-" THEN
(fn x => print (Int.toString x) THEN
(fn x => print "|" THEN
process))))
But why is this so? An SML syntax subtlety? What's its rational?

It's just an over-restrictive sentence in the language definition, which says:
For each value binding "pat = exp" within rec, exp must be of the form "fn match".
Strictly speaking, that doesn't allow any parentheses. In practice, that's rarely a problem, because you almost always use the fun declaration syntax anyway.

Related

converting to tail recursion

Consider a function g(x):
g(x) = x == 0 ? 0 : g(g(x - 1))
I don't think this is tail recursive, since the call to g(x) can be broken down into two parts:
let y = g(x - 1);
return g(y);
how to convert this to tail recursion?
continuation-passing style
You can convert from direct style to continuation-passing style -
g'(x,return) =
x == 0
? return(0)
: g'(x - 1, x' => # tail
g'(x', return)) # tail
Now write g to call g' with the default continuation -
g(x) = g'(x, x' => x')
Depending on the language you use, you may have to rename return to something else. k is a popular choice.
another example
We can see this technique applied to other problems like fib -
# direct style
fib(x) =
x < 2
? x
: fib(x - 1) + fib(x - 2) # "+" has tail position; not fib
# continuation-passing style
fib'(x, return) =
x < 2
? return(x)
: fib'(x - 1, x' => # tail| first compute x - 1
fib(x - 2, x'' => # tail| then compute x - 2
return(x' + x''))) # tail| then return sum
fib(x) = fib'(x, z => z)
other uses
Continuation-passing style is useful in other ways too. For example, it can be used to provide branching or early-exit behavior in this search program -
search(t, index, match, ifFound, notFound) =
i >= length(t)
? notFound()
: t[index] == match
? ifFound(index)
: search(t, index + 1, match, ifFound, notFound)
We can call search with continuations for each possible outcome -
search(["bird", "cat", "dog"],
0,
"cat",
matchedIndex => print("found at: " + matchedIndex),
() => print("not found")
)
how to
A function written in continuation-passing style takes an extra argument: an explicit "continuation"; i.e., a function of one argument — wikipedia
(* direct style *)
add(a, b) = ...
(* continuation-passing style takes extra argument *)
add(a, b, k) = ...
When the CPS function has computed its result value, it "returns" it by calling the continuation function with this value as the argument.
(* direct style *)
add(a, b) =
a + b
(* continuation-passing style "returns" by calling continuation *)
add(a, b, k) =
k(a + b) (* call k with the return value *)
That means that when invoking a CPS function, the calling function is required to supply a procedure to be invoked with the subroutine's "return" value.
(* direct style *)
print(add(5, 3)) (* 8 *)
(* continuation-passing style *)
add(5, 3, print) (* 8 *)
Expressing code in this form makes a number of things explicit which are implicit in direct style. These include: procedure returns, which become apparent as calls to a continuation; intermediate values, which are all given names; order of argument evaluation, which is made explicit; and tail calls, which simply call a procedure with the same continuation, unmodified, that was passed to the caller.
you've probably used continuations before
If you've ever run into someone saying "callback", what they really mean is continuation.
"When the button is clicked, continue the program with event => ..." -
document.querySelector("#foo").addEventListener("click", event => {
console.log("this code runs inside a continuation!")
})
<button id="foo">click me</button>
"When the file contents are read, continue the program with (err, data) => ..." -
import { readFile } from 'fs';
readFile('/etc/passwd', (err, data) => {
if (err) throw err;
console.log(data);
});

How does F# implement let rec?

I am wondering how F# implements let rec, and I couldn't find an answer. As a preface, I'll address how Scheme implements letrec:
In Scheme, let is just syntactics sugar for a definition of a lambda and applying it:
(let ((x 1)) (+ x 2))
is transformed to
((lambda (x) (+ x 2)) 1)
(in each case the expression is evaluated to 3).
letrec is also syntactic sugar, but #f is passed as initial argument to the lambda's parameters, and set! expressions are injected before the letrec body, like in this transformation:
(letrec ((x 1)) (+ x 2)) => ((lambda (x) (begin (set! x 1) (+ x 2))) #f).
Considering that F# doesn't have an equivalent operator to Scheme's set!, how does it implement let rec? Does it declare the function's parameters as mutable, and then mutate them in the function's body?
In F#, let rec allows a reference to the binding from within the function before it has been bound. let rec doesn't have an implementation per se, because it is merely a compiler hint.
In this contrived example,
let rec even =
function 0 -> true | 1 -> false | x -> odd (x - 1)
and odd =
function 0 -> false | 1 -> true | x -> even (x - 1)
the compiled IL very unglamorously translates to:
public static bool even(int _arg1)
{
switch (_arg1)
{
case 0:
return true;
case 1:
return false;
default:
return odd(_arg1 - 1);
}
}
public static bool odd(int _arg2)
{
switch (_arg2)
{
case 0:
return false;
case 1:
return true;
default:
return even(_arg2 - 1);
}
}
All function definitions are statically compiled to IL.
F# ultimately is a language which runs on the CLR.
There is no meta-programming.

ml function of type fn : 'a -> 'b

The function:
fn : 'a -> 'b
Now, are there any functions which can be defined and have this type?
There are two possible implementations for that function signature in Standard ML. One employs exceptions, the other recursion:
val raises : 'a -> 'b =
fn a => raise Fail "some error";
(* Infinite looping; satisfies the type signature, *)
(* but won't ever produce anything. *)
val rec loops : 'a -> 'b =
fn a => loops a;
The first solution may be useful for defining a helper function, say bug, which saves a few key strokes:
fun bug msg = raise Fail ("BUG: " ^ msg);
The other solution may be useful for defining server loops or REPLs.
In the Basis library, OS.Process.exit is such a function that returns an unknown generic type 'a:
- OS.Process.exit;
val it = fn : OS.Process.status -> 'a
A small echo REPL with type val repl = fn : unit -> 'a:
fun repl () =
let
val line = TextIO.inputLine TextIO.stdIn
in
case line of
NONE => OS.Process.exit OS.Process.failure
| SOME ":q\n" => OS.Process.exit OS.Process.success
| SOME line => (TextIO.print line ; repl ())
end
You might also find useful this question about the type signature of Haskell's forever function.
I can think of one example:
fun f a = raise Div;
I can think of several:
One that is recursive,
fun f x = f x
Any function that raises exceptions,
fun f x = raise SomeExn
Any function that is mutually recursive, e.g.,
fun f x = g x
and g x = f x
Any function that uses casting (requires specific compiler support, below is for Moscow ML),
fun f x = Obj.magic x
Breaking the type system like this is probably cheating, but unlike all the other functions with this type, this function actually returns something. (In the simplest case, it's the identity function.)
A function that throws if the Collatz conjecture is false, recurses infinitely if true,
fun f x =
let fun loop (i : IntInf.int) =
if collatz i
then loop (i+1)
else raise Collatz
in loop 1 end
which is really just a combination of the first two.
Any function that performs arbitrary I/O and recurses infinitely, e.g.
fun f x = (print "Woohoo!"; f x)
fun repl x =
let val y = read ()
val z = eval y
val _ = print z
in repl x end
One may argue that exceptions and infinite recursion represent the same theoretical value ⊥ (bottom) meaning "no result", although since you can catch exceptions and not infinitely recursive functions, you may also argue they're different.
If you restrict yourself to pure functions (e.g. no printing or exceptions) and only Standard ML (and not compiler-specific features) and you think of the mutually recursive cases as functionally equivalent in spite of their different recursion schemes, we're back to just fun f x = f x.
The reason why fun f x = f x has type 'a → 'b is perhaps obvious: The type-inference algorithm assumes that the input type and the output type are 'a and 'b respectively and goes on to conclude the function's only constraint: That f x's input type must be equal to f x's input type, and that f x's output type must be equal to f x's output type, at which point the types 'a and 'b have not been specialized any further.

Conduits - multiple "attempts" into one Source

I am trying to do the following:
sourceIRC
:: (MonadBaseControl IO m, MonadLogger m)
=> NetworkSettings
-> Producer (ConnectionT m) Message
sourceIRC networkSettings = do
withConnectionForever networkSettings $ \socket -> do
bracket (liftBase $ Network.socketToHandle socket IO.ReadWriteMode)
(\handle -> liftBase $ IO.hClose handle)
(\handle -> do
mvar <- newMVar False
bracket (fork $ do
threadDelay 5000000
_ <- swapMVar mvar True
return ())
(killThread)
(\_ -> runConnectionT handle (sourceHandle handle))
takeMVar mvar)
As you can see, I am trying to create a Producer in terms of a primitive withConnectionForever. That primitive is of type:
withConnectionForever
:: (MonadBaseControl IO m, MonadLogger m)
=> NetworkSettings
-> (Network.Socket -> m Bool)
-> m ()
As you can imagine, I am getting an error message on compilation! It is:
Haskell/IRC.hs:128:54:
Couldn't match expected type `ConnectionT m0 a0'
with actual type `ConduitM i0 ByteString.ByteString m1 ()'
In the return type of a call of `sourceHandle'
In the second argument of `runConnectionT', namely
`(sourceHandle handle)'
In the expression: runConnectionT handle (sourceHandle handle)
Now, I know that the type of the call to withConnectionForever is not obviously a conduit, but I had hoped that it could manage to be one, by virtue of the fact that a conduit is also a monad and withConnectionForever uses a free monad instead of a hardcoded one. My understanding of what the message is trying to communicate is that that's not happening, and I'd like to know why and what I can do about it.
Here, for completeness, is the source of the primitive:
withConnectionForever
:: (MonadBaseControl IO m, MonadLogger m)
=> NetworkSettings
-> (Network.Socket -> m Bool)
-> m ()
withConnectionForever networkSettings action = do
let loop nFailures = do
maybeSocket <- newConnection networkSettings
case maybeSocket of
Nothing -> return ()
Just socket -> do
terminatedNormally <- action socket
if terminatedNormally
then loop 0
else do
exponentTwiddle <- liftBase $ Random.randomRIO (0, 100)
let exponent =
1.25 + fromIntegral (exponentTwiddle - 50) / 100.0
delay = floor $ 1000000.0 *
((0.5 ** (fromIntegral nFailures * negate exponent))
- 1.0 :: Double)
$(logInfo) (Text.concat
["Abnormal disconnection from the network ",
networkSettingsName networkSettings,
"; pausing attempts for ",
Text.pack $ show $ fromIntegral delay / 1000000.0,
" seconds..."])
liftBase $ threadDelay delay
loop (nFailures + 1)
loop 0
I'd really prefer not to rewrite the primitive, unless it can be done in minimally invasive fashion, but I suppose that's on the table.
Thanks in advance!
The relevant thing to do was
(\_ -> transPipe (runConnectionT handle) (sourceHandle handle))
instead of
(\_ -> runConnectionT handle (sourceHandle handle))
Thanks for your time! :D

Implementing a direct-threaded interpreter in a functional language like OCaml

In C/C++ you can implement a direct threaded interpreter with an array of function pointers. The array represents your program - an array of operations. Each of the operation functions must end in a call to the next function in the array, something like:
void op_plus(size_t pc, uint8_t* data) {
*data += 1;
BytecodeArray[pc+1](pc+1, data); //call the next operation in the array
}
The BytecodeArray is an array of function pointers. If we had an array of these op_plus operations then length of the array would determine how ofter we'd be incrementing the contents of data. (of course, you'd need to add some sort of terminating operation as the last operation in the array).
How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions as in the C++. The problem with that is that I keep ending up with something like:
let op_plus pc data = Printf.printf "pc: %d, data_i: %d \n" pc data;
let f = (op_array.(pc+1)) in
f (pc+1) (data+1) ;;
Where op_array is an Array defined in the scope above and then redefine it later to be filled with a bunch of op_plus functions... however, the op_plus function uses the previous definition of op_array. It's a chicken&egg problem.
Another alternative would be using CPS and avoid explicit function array altogether. Tail call optimization still applies in this case.
I don't know how do you generate the code, but let's make not unreasonable assumption that at some point you have an array of VM instructions you want to prepare for execution. Every instruction is still represented as a function, but instead of program counter it receives continuation function.
Here is the simplest example:
type opcode = Add of int | Sub of int
let make_instr opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
let compile opcodes =
Array.fold_right make_instr opcodes (fun x -> x)
Usage (look at inferred types):
# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48
UPDATE:
It's easy to introduce [conditional] branching in this model. if continuation is constructed from two arguments: iftrue-continuation and iffalse-continuation, but has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in case of backward branching (backward, because we compile from tail to head). That's easy to overcome with destructive updates (though maybe more elegant solution is possible if you are compiling from a high level language): just leave "holes" and fill them later when branch target is reached by the compiler.
Sample implementation (I've made use of string labels instead of integer instruction pointers, but this hardly matters):
type label = string
type opcode =
Add of int | Sub of int
| Label of label | Jmp of label | Phi of (int -> bool) * label * label
let make_instr labels opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
| Label label -> (Hashtbl.find labels label) := cont; cont
| Jmp label ->
let target = Hashtbl.find labels label in
(fun data -> Printf.printf "jmp %s\n" label; !target data)
| Phi (cond, tlabel, flabel) ->
let tcont = Hashtbl.find labels tlabel
and fcont = Hashtbl.find labels flabel in
(fun data ->
let b = cond data in
Printf.printf "branch on %d to %s\n"
data (if b then tlabel else flabel);
(if b then !tcont else !fcont) data)
let compile opcodes =
let id = fun x -> x in
let labels = Hashtbl.create 17 in
Array.iter (function
| Label label -> Hashtbl.add labels label (ref id)
| _ -> ())
opcodes;
Array.fold_right (make_instr labels) opcodes id
I've used two passes for clarity but it's easy to see that it can be done in one pass.
Here is a simple loop that can be compiled and executed by the code above:
let code = [|
Label "entry";
Phi (((<) 0), "body", "exit");
Label "body";
Sub 1;
Jmp "entry";
Label "exit" |]
Execution trace:
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0
UPDATE 2:
Performance-wise, CPS representation is likely to be faster than array-based, because there is no indirection in case of linear execution. Continuation function is stored directly in the instruction closure. In the array-based implementation it has to increment program counter and perform array access (with an extra bounds checking overhead) first.
I've made some benchmarks to demonstrate it. Here is an implementation of array-based interpreter:
type opcode =
Add of int | Sub of int
| Jmp of int | Phi of (int -> bool) * int * int
| Ret
let compile opcodes =
let instr_array = Array.make (Array.length opcodes) (fun _ data -> data)
in Array.iteri (fun i opcode ->
instr_array.(i) <- match opcode with
| Add x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
| Sub x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
| Jmp pc -> (fun _ data ->
let cont = instr_array.(pc) in cont (pc + 1) data)
| Phi (cond, tbranch, fbranch) ->
(fun _ data ->
let pc = (if cond data then tbranch else fbranch) in
let cont = instr_array.(pc) in
cont pc data)
| Ret -> fun _ data -> data)
opcodes;
instr_array
let code = [|
Phi (((<) 0), 1, 3);
Sub 1;
Jmp 0;
Ret
|]
let () =
let fn = compile code in
let result = fn.(0) 0 500_000_000 in
Printf.printf "%d\n" result
Let's see how it compares to the CPS-based interpreter above (with all debug tracing stripped, of course). I used OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.
array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20
So even in tight loop CPS performs considerably better than array. If we unroll loop and replace one sub instruction with five, figures change:
array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309
It's interesting that both implementations actually beat OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:
for i = 500_000_000 downto 0 do () done
You should not redefine op_array, you should fill it in with instructions by modifying it in place so that it's the same op_array that your functions already refer to. Unfortunately, you can't change the size of an array dynamically in OCaml.
I see two solutions:
1) if you don't need to change the sequence of "instructions", define them in a mutual recursion with the array op_array. OCaml allows mutually recursive functions and values that start with the application of a constructor to be defined. Something like:
let rec op_plus pc data = ...
and op_array = [| ... |]
2) Or use an additional indirection: make op_array a reference to an array of instructions, and refer in the functions to (!op_array).(pc+1). Later, after you have defined all the instructions, you can make op_array point to an array of the right size, full of the instructions you intend.
let op_array = ref [| |] ;;
let op_plus pc data = ... ;;
op_array := [| ... |] ;;
One more option (if the size is known beforehand) - initially fill the array with void instructions :
let op_array = Array.create size (fun _ _ -> assert false)
let op_plus = ...
let () = op_array.(0) <- op_plus; ...

Resources