Implementing a direct-threaded interpreter in a functional language like OCaml

In C/C++ you can implement a direct threaded interpreter with an array of function pointers. The array represents your program - an array of operations. Each of the operation functions must end in a call to the next function in the array, something like:
void op_plus(size_t pc, uint8_t* data) {
    *data += 1;
    BytecodeArray[pc+1](pc+1, data);  // call the next operation in the array
}
The BytecodeArray is an array of function pointers. If we had an array of these op_plus operations, then the length of the array would determine how often we'd be incrementing the contents of data. (Of course, you'd need to add some sort of terminating operation as the last operation in the array.)
How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions, as in the C++ version. The problem with that is that I keep ending up with something like:
let op_plus pc data =
  Printf.printf "pc: %d, data_i: %d \n" pc data;
  let f = op_array.(pc+1) in
  f (pc+1) (data+1) ;;
where op_array is an Array defined in the enclosing scope, which I then redefine later to be filled with a bunch of op_plus functions... however, the op_plus functions still use the previous definition of op_array. It's a chicken-and-egg problem.

Another alternative would be to use CPS and avoid an explicit function array altogether. Tail-call optimization still applies in this case.
I don't know how you generate the code, but let's make the not-unreasonable assumption that at some point you have an array of VM instructions you want to prepare for execution. Every instruction is still represented as a function, but instead of a program counter it receives a continuation function.
Here is the simplest example:
type opcode = Add of int | Sub of int
let make_instr opcode cont =
  match opcode with
  | Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
  | Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)

let compile opcodes =
  Array.fold_right make_instr opcodes (fun x -> x)
Usage (look at inferred types):
# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48
UPDATE:
It's easy to introduce [conditional] branching in this model: the branch continuation is constructed from two arguments, an iftrue-continuation and an iffalse-continuation, but it has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in the case of backward branching (backward because we compile from tail to head). That's easy to overcome with destructive updates (though a more elegant solution may be possible if you are compiling from a high-level language): just leave "holes" and fill them later, when the branch target is reached by the compiler.
Sample implementation (I've made use of string labels instead of integer instruction pointers, but this hardly matters):
type label = string

type opcode =
  | Add of int | Sub of int
  | Label of label | Jmp of label | Phi of (int -> bool) * label * label

let make_instr labels opcode cont =
  match opcode with
  | Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
  | Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
  | Label label -> (Hashtbl.find labels label) := cont; cont
  | Jmp label ->
      let target = Hashtbl.find labels label in
      (fun data -> Printf.printf "jmp %s\n" label; !target data)
  | Phi (cond, tlabel, flabel) ->
      let tcont = Hashtbl.find labels tlabel
      and fcont = Hashtbl.find labels flabel in
      (fun data ->
         let b = cond data in
         Printf.printf "branch on %d to %s\n"
           data (if b then tlabel else flabel);
         (if b then !tcont else !fcont) data)

let compile opcodes =
  let id = fun x -> x in
  let labels = Hashtbl.create 17 in
  Array.iter (function
      | Label label -> Hashtbl.add labels label (ref id)
      | _ -> ())
    opcodes;
  Array.fold_right (make_instr labels) opcodes id
I've used two passes for clarity but it's easy to see that it can be done in one pass.
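For illustration, here is a minimal sketch of such a one-pass variant (tracing omitted; find_or_create_label is a helper name introduced here, not part of the answer above). It allocates a label's "hole" on first use, whether that use is the Label itself or a Jmp/Phi referring to it:
(* Sketch of a one-pass compile: allocate each label's "hole" lazily. *)
let find_or_create_label labels label =
  try Hashtbl.find labels label
  with Not_found ->
    let hole = ref (fun x -> x) in
    Hashtbl.add labels label hole;
    hole

let make_instr labels opcode cont =
  match opcode with
  | Add x -> fun data -> cont (data + x)
  | Sub x -> fun data -> cont (data - x)
  | Label label -> (find_or_create_label labels label) := cont; cont
  | Jmp label ->
      let target = find_or_create_label labels label in
      (fun data -> !target data)
  | Phi (cond, tlabel, flabel) ->
      let tcont = find_or_create_label labels tlabel
      and fcont = find_or_create_label labels flabel in
      (fun data -> (if cond data then !tcont else !fcont) data)

let compile opcodes =
  let labels = Hashtbl.create 17 in
  Array.fold_right (make_instr labels) opcodes (fun x -> x)
(* Note: a jump to an undefined label silently falls through here,
   whereas the two-pass version above raises Not_found. *)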
Here is a simple loop that can be compiled and executed by the code above:
let code = [|
Label "entry";
Phi (((<) 0), "body", "exit");
Label "body";
Sub 1;
Jmp "entry";
Label "exit" |]
Execution trace:
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0
UPDATE 2:
Performance-wise, the CPS representation is likely to be faster than the array-based one, because there is no indirection in the case of linear execution: the continuation function is stored directly in the instruction's closure. The array-based implementation has to increment the program counter and perform an array access (with the extra bounds-checking overhead) first.
I've made some benchmarks to demonstrate it. Here is an implementation of the array-based interpreter:
type opcode =
  | Add of int | Sub of int
  | Jmp of int | Phi of (int -> bool) * int * int
  | Ret

let compile opcodes =
  let instr_array = Array.make (Array.length opcodes) (fun _ data -> data) in
  Array.iteri (fun i opcode ->
      instr_array.(i) <- match opcode with
        | Add x -> (fun pc data ->
            let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
        | Sub x -> (fun pc data ->
            let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
        | Jmp pc -> (fun _ data ->
            let cont = instr_array.(pc) in cont pc data)  (* pass the target its own index *)
        | Phi (cond, tbranch, fbranch) ->
            (fun _ data ->
               let pc = (if cond data then tbranch else fbranch) in
               let cont = instr_array.(pc) in
               cont pc data)
        | Ret -> fun _ data -> data)
    opcodes;
  instr_array
let code = [|
  Phi (((<) 0), 1, 3);
  Sub 1;
  Jmp 0;
  Ret
|]

let () =
  let fn = compile code in
  let result = fn.(0) 0 500_000_000 in
  Printf.printf "%d\n" result
Let's see how it compares to the CPS-based interpreter above (with all debug tracing stripped, of course). I used OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.
array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20
So even in a tight loop, CPS performs considerably better than the array-based version. If we unroll the loop and replace the single Sub instruction with five, the figures change:
array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309
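(For reference, the unrolled array-based program might look like the following sketch; the CPS program is unrolled analogously.)
let code = [|
  Phi (((<) 0), 1, 7);
  Sub 1; Sub 1; Sub 1; Sub 1; Sub 1;
  Jmp 0;
  Ret
|]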
It's interesting that both implementations actually beat the OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:
for i = 500_000_000 downto 0 do () done

You should not redefine op_array; you should fill it in with instructions by modifying it in place, so that it is the same op_array that your functions already refer to. Unfortunately, you can't change the size of an array dynamically in OCaml.
I see two solutions:
1) If you don't need to change the sequence of "instructions", define them in mutual recursion with the array op_array. OCaml allows mutually recursive definitions of functions and of values whose definition starts with the application of a constructor. Something like:
let rec op_plus pc data = ...
and op_array = [| ... |]
2) Or use an additional indirection: make op_array a reference to an array of instructions, and refer in the functions to (!op_array).(pc+1). Later, after you have defined all the instructions, you can make op_array point to an array of the right size, full of the instructions you intend.
let op_array = ref [| |] ;;
let op_plus pc data = ... ;;
op_array := [| ... |] ;;
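For illustration, here is a minimal runnable sketch of option 2 (the op_end terminator and the array contents are made up for the example, not part of the original question):
(* The instructions read the array through the reference, so the
   reference can be filled after the functions are defined. *)
let op_array : (int -> int -> int) array ref = ref [| |]

let op_plus pc data = (!op_array).(pc + 1) (pc + 1) (data + 1)
let op_end _pc data = data   (* terminating operation *)

let () = op_array := [| op_plus; op_plus; op_plus; op_end |]
let () = Printf.printf "%d\n" ((!op_array).(0) 0 0)   (* prints 3 *)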

One more option (if the size is known beforehand): initially fill the array with void instructions:
let op_array = Array.make size (fun _ _ -> assert false)
let op_plus = ...
let () = op_array.(0) <- op_plus; ...

Related

How best to memoize based on argument only, not function closure, and inside a class?

(question edited and rewritten to reflect chat discussion results)
In one line: Given a state in a state monad, evaluate monadic function once, cache the results.
I am trying to cache the result of a function evaluation, where the key of the cache is the state of a State monad, and where I do not care about possible side effects: i.e., even if the body of the function may change in theory, I know it will be independent of the state:
f x = state { return DateTime.Now.AddMinutes(x) }
g x = state { return DateTime.Now.AddMinutes(x) }
Here, g 10 and f 10 should yield the same result; they may not differ as a result of a double call to DateTime.Now, i.e., they must be deterministic. For the sake of argument, the state variable here is x.
By the same token, (g 10) - (f 5) should yield exactly 5 minutes and not a microsecond more or less.
After finding out that caching didn't work, I toned down a more elaborate solution to its bare minimum, using Don Syme's memoization pattern with maps (or dict).
The memoization pattern:
module Cache =
    let cache f =
        let _cache = ref Map.empty
        fun x ->
            match (!_cache).TryFind(x) with
            | Some res -> res
            | None ->
                let res = f x
                _cache := (!_cache).Add(x, res)
                res
The caching is supposed to be used as part of a computation builder, in the Run method:
type someBuilder() =
    member __.Run f =
        Log.time "Calling __.Run"
        let memo_me =
            fun state ->
                let res =
                    match f with
                    | State expr -> expr state
                    | Value v -> state, v
                Log.time ("Cache miss, adding key: %A", state)
                res
        XCache.cache memo_me
This doesn't work, because the cache function is different each time because of the closure, resulting in a cache miss every time. It should be independent of expr above, and dependent on state only.
I tried placing the _cache outside the cache function on module level, but then it hits the problem of generalization:
Value restriction. The value '_cache' has been inferred to have generic type
Either define '_cache' as a simple data term, make it a function with explicit arguments or, if you do not intend for it to be generic, add a type annotation.
I then tried to solve this using type annotations, but I ended up not being able to use it in the generic function for the same reason: specific type annotations were then required at the use site:
let _cache<'T, 'U when 'T: comparison> ref : Map<'T, 'U> = ref Map.empty
Edit, a working version of the whole computation builder
Here's the computation builder as asked in the comments, tested in FSI. The caching should be dependent solely on TState, not on the whole of 'TState -> 'TState * 'TResult.
type State<'TState, 'TResult> = State of ('TState -> 'TState * 'TResult)

type ResultState<'TState, 'TResult> =
    | Expression of State<'TState, 'TResult>
    | Value of 'TResult

type RS<'S, 'T> = ResultState<'S, 'T>

type RS =
    static member run v s =
        match v with
        | Value item -> s, item
        | Expression (State expr) -> expr s

    static member bind k v =
        match v with
        | Expression (State expr) ->
            Expression
            <| State
                (fun initialState ->
                    let updatedState, result = expr initialState
                    RS.run (k result) updatedState
                )
        | Value item -> k item

type MyBuilder() =
    member __.Bind (e, f) = RS.bind f e
    member __.Return v = RS.Value v
    member __.ReturnFrom e = e
    member __.Run f =
        printfn "Running!"
        // add/remove the first following line to see it with caching
        XCache.cache <|
            fun s ->
                match f with
                | RS.Expression (State state) ->
                    printfn "Call me once!"
                    state s
                | RS.Value v -> s, v

module Builders =
    let builder = new MyBuilder()

    // constructing prints "Running!", this is as expected
    let create() =
        builder {
            let! v = RS.Expression <| (State <| fun i -> (fst i + 12.0, snd i + 3), "my value")
            return "test " + v
        }

    // for seeing the effect, recreating the builder twice,
    // it should be cached once
    let result1() = create()(30.0, 39)
    let result2() = create()(30.0, 39)
Result of running the example in FSI:
Running!
Call me once!
val it : (float * int) * string = ((42.0, 42), "test my value")
Call me once!
val it : (float * int) * string = ((42.0, 42), "test my value")
Just add the cache into the Run method:
member __.Run f =
    printfn "Running!"
    Cache.cache <|
        fun s ->
            match f with
            | RS.Expression (State state) ->
                printfn "Call me once!"
                state s
            | RS.Value v -> s, v
and modify the cache function to see if it really caches
module Cache =
    let cache f =
        let _cache = ref Map.empty
        fun x ->
            match (!_cache).TryFind(x) with
            | Some res -> printfn "from cache"; res
            | None ->
                let res = f x
                _cache := (!_cache).Add(x, res)
                printfn "to cache"
                res
and the output is
Call me once!
to cache
val it : (float * int) * string = ((42.0, 42), "test my value")
>
from cache
val it : (float * int) * string = ((42.0, 42), "test my value")

Graph with sets as vertices

I have a tiny grammar represented as a variant type term, with strings that are tokens or parts of tokens.
Given expressions from the grammar, I collect all strings from the expressions and pack them into sets (function vars). Finally, I want to create a graph with these sets as vertices (lines 48-49).
For some reason, the graph created in such a sophisticated way does not recognise sets containing the same variables and creates multiple vertices with the same content. I don't really understand why this is happening.
Here is minimal working example with this behaviour:
(* demo.ml *)
type term =
  | Var of string
  | List of term list * string option
  | Tuple of term list

module SSet = Set.Make(
  struct
    let compare = String.compare
    type t = string
  end)

let rec vars = function
  | Var v -> SSet.singleton v
  | List (x, tail) ->
      let tl = match tail with
        | None -> SSet.empty
        | Some var -> SSet.singleton var in
      SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
  | Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)

module Node = struct
  type t = SSet.t
  let compare = SSet.compare
  let equal = SSet.equal
  let hash = Hashtbl.hash
end

module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)

(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
    include G
    let edge_attributes _ = []
    let default_edge_attributes _ = []
    let get_subgraph _ = None
    let vertex_attributes _ = []
    let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
    let default_vertex_attributes _ = []
    let graph_attributes _ = []
  end)

let _ =
  (* creation of two terms *)
  let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
  (* get strings from terms packed into sets *)
  let avars, bvars = vars a, vars b in
  let g = G.create () in
  G.add_edge g avars bvars;
  Printf.printf "The content is the same: [%s] [%s]\n"
    (String.concat ", " (SSet.elements avars))
    (String.concat ", " (SSet.elements bvars));
  Printf.printf "compare/equal output: %d %b\n"
    (SSet.compare avars bvars)
    (SSet.equal avars bvars);
  Printf.printf "Hash values are different: %d %d\n"
    (Hashtbl.hash avars) (Hashtbl.hash bvars);
  Dot.fprint_graph Format.str_formatter g;
  Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
In order to compile, type ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo. When the program is executed you get this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
To sum up, I am curious why there are non-equal hash values for the sets, and why two identical vertices are created in the graph, despite the fact that these sets are equal by every other measure.
I suspect the general answer is that OCaml's built-in hashing is based on the physical representation of a value, while set equality is a more abstract notion. If you represent sets as ordered binary trees, there are many trees that represent the same set (as is well known). These will be equal as sets but might very well hash to different values.
If you want hashing to work for sets, you might have to supply your own function.
As Jeffrey pointed out, it seems that the problem is in the definition of the hash function that is part of the Node module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x) fixed the issue.
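For reference, the corrected Node module would look something like this (just applying the fix described above):
module Node = struct
  type t = SSet.t
  let compare = SSet.compare
  let equal = SSet.equal
  (* hash the canonical, sorted element list rather than the physical tree *)
  let hash x = Hashtbl.hash (SSet.elements x)
end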

Set a timer for a function

I have defined a list of values: data : int list and a function f: int -> unit, and a piece of code:
for i = 0 to (List.length data) - 1 do
  let d = List.nth data i in
  f d
done
Now, I would like to set a maximal running time for f. For instance, if f d exceeds a certain maximal time, the execution of f d stops and we carry on with the next element of data.
Does anyone know how to do it?
Update1:
Following the comments, I would like to add that the application of f to a good part of the elements of data will end up raising an exception. This is normal and accepted. So the code looks like:
List.iter
  (fun d ->
     try
       f d
     with
     | e -> printf "%s\n" (Printexc.to_string e))
  data
Something like this might work for you:
exception Timeout

let run_with_timeout t f x =
  try
    Sys.set_signal Sys.sigalrm (Sys.Signal_handle (fun _ -> raise Timeout));
    ignore (Unix.alarm t);
    f x;
    ignore (Unix.alarm 0);
    Sys.set_signal Sys.sigalrm Sys.Signal_default
  with Timeout -> Sys.set_signal Sys.sigalrm Sys.Signal_default
Here's a session that shows how it works:
$ ocaml
OCaml version 4.00.1
# #load "unix.cma";;
# #use "rwt.ml";;
exception Timeout
val run_with_timeout : int -> ('a -> 'b) -> 'a -> unit = <fun>
# run_with_timeout 2 Printf.printf "yes\n";;
yes
- : unit = ()
# run_with_timeout 2 (fun () -> while true do () done) ();;
- : unit = ()
#
Your code would be something like this:
List.iter (run_with_timeout 10 f) data
(This code hasn't been thoroughly tested but it shows a way that might work.)
Update
As the comments have shown, this code isn't suitable if f x might throw an exception (or if you're using alarms for some other purpose). I encourage gsg to post his/her improved solution. The edit seems to have been rejected.
This is based on Jeffrey's answer, with some modifications to improve exception safety:
exception Timeout

let run_with_timeout timeout f x =
  let old_handler =
    Sys.signal Sys.sigalrm (Sys.Signal_handle (fun _ -> raise Timeout)) in
  let finish () =
    ignore (Unix.alarm 0);
    ignore (Sys.signal Sys.sigalrm old_handler) in
  try
    ignore (Unix.alarm timeout);
    ignore (f x);
    finish ()
  with
  | Timeout -> finish ()
  | exn -> finish (); raise exn
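A usage sketch combining this wrapper with the loop from the question (f and data as defined there; the 10-second limit is arbitrary):
let () =
  List.iter
    (fun d ->
       try run_with_timeout 10 f d   (* Timeout is swallowed by the wrapper *)
       with e -> Printf.printf "%s\n" (Printexc.to_string e))
    data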

Implementing Okasaki's bootstrapped heaps in OCaml, why doesn't it compile?

(A minimal non-compiling example can be found at https://gist.github.com/4044467, see more background below.)
I am trying to implement Bootstrapped Heaps introduced in Chapter 10 of Okasaki's Purely Functional Data Structure. The following is a simplified version of my non-compiling code.
We're to implement a heap with following signature:
module type ORDERED =
sig
  type t
  val compare : t -> t -> int
end

module type HEAP =
sig
  module Elem : ORDERED
  type heap
  val empty : heap
  val insert : Elem.t -> heap -> heap
  val find_min : heap -> Elem.t
  val delete_min : heap -> heap
end
We say a data structure is bootstrapped when its implementation depends on another implementation of the same kind of data structure. So we have a heap like this (the actual implementation is not important):
module SomeHeap (Element : ORDERED) : (HEAP with module Elem = Element) =
struct
  module Elem = Element
  type heap
  let empty = failwith "skipped"
  let insert = failwith "skipped"
  let find_min = failwith "skipped"
  let delete_min = failwith "skipped"
end
Then, the bootstrapped heap we're going to implement, which can depend on any heap implementation, is supposed to have the following signature:
module BootstrappedHeap
(MakeH : functor (Element : ORDERED) -> HEAP with module Elem = Element)
(Element : ORDERED) : (HEAP with module Elem = Element)
So we can use it like this:
module StringHeap = BootstrappedHeap(SomeHeap)(String)
The implementation of BootstrappedHeap, according to Okasaki, is like this:
module BootstrappedHeap
    (MakeH : functor (Element : ORDERED) -> HEAP with module Elem = Element)
    (Element : ORDERED) : (HEAP with module Elem = Element) =
struct
  module Elem = Element

  module rec BootstrappedElem :
  sig
    type t =
      | E
      | H of Elem.t * PrimH.heap
    val compare : t -> t -> int
  end =
  struct
    type t =
      | E
      | H of Elem.t * PrimH.heap
    let compare t1 t2 = match t1, t2 with
      | H (x, _), H (y, _) -> Elem.compare x y
      | _ -> failwith "unreachable"
  end
  and PrimH : (HEAP with module Elem = BootstrappedElem) =
    MakeH(BootstrappedElem)

  type heap
  let empty = failwith "not implemented"
  let insert = failwith "not implemented"
  let find_min = failwith "not implemented"
  let delete_min = failwith "not implemented"
end
But this does not compile! The error message is:
File "ordered.ml", line 52, characters 15-55:
Error: In this `with' constraint, the new definition of Elem
does not match its original definition in the constrained signature:
Modules do not match:
sig type t = BootstrappedElem.t end
is not included in
ORDERED
The field `compare' is required but not provided
Line 52 is the line
and PrimH : (HEAP with module Elem = BootstrappedElem) =
I think BootstrappedElem does implement ORDERED, as it has both t and compare, but I fail to see why the compiler cannot find the compare function.
Change the signature of BootstrappedElem to
module rec BootstrappedElem : ORDERED
will make it compile, but this will hide the constructors E and H of BootstrappedElem, making it impossible to implement the later parts.
The whole non-compiling code can be downloaded at https://raw.github.com/gist/4044281/0ce0336c40b277e59cece43dbadb9b94ce6efdaf/ordered.ml
I believe this might be a bug in the type-checker. I have reduced your code to the following example:
module type ORDERED =
sig
  type t
  val compare : t -> t -> int
end

module type CARRY = sig
  module M : ORDERED
end

(* works *)
module HigherOrderFunctor
    (Make : functor (X : ORDERED) -> (CARRY with module M = X))
= struct
  module rec Base
    : (ORDERED with type t = string)
    = String
  and Other
    : (CARRY with module M = Base)
    = Make(Base)
end

(* does not work *)
module HigherOrderFunctor
    (Make : functor (X : ORDERED) -> (CARRY with module M = X))
= struct
  module rec Base
    : sig
        (* 'compare' seems dropped from this signature *)
        type t = string
        val compare : t -> t -> int
      end
    = String
  and Other
    : (CARRY with module M = (Base : sig type t = string val compare : t -> t -> int end))
    = Make(Base)
end
I don't understand why the first code works and the second (which seems equivalent) doesn't. I suggest you wait a bit to see if an expert comes up with an explanation (Andreas?), then consider sending a bug report.
In this case, a solution is to first bind the signature that seems mishandled:
(* works again *)
module HigherOrderFunctor
    (Make : functor (X : ORDERED) -> (CARRY with module M = X))
= struct
  (* bind the problematic signature first *)
  module type S = sig
    type t = string
    val compare : t -> t -> int
  end
  module rec Base : S = String
  and Other : (CARRY with module M = Base) = Make(Base)
end
However, that is not possible in your setting, because the signature of BootstrappedElem is mutually recursive with BootstrappedHeap.
A workaround is to avoid the apparently-delicate with module ... construct and replace it with a simple type equality with type Elem.t = ...:
module BootstrappedHeap
    (MakeH : functor (Element : ORDERED) -> HEAP with module Elem = Element)
    (Element : ORDERED) : (HEAP with module Elem = Element) =
struct
  module Elem = Element

  module rec BootstrappedElem :
  sig
    type t =
      | E
      | H of Elem.t * PrimH.heap
    val compare : t -> t -> int
  end =
  struct
    type t =
      | E
      | H of Elem.t * PrimH.heap
    let compare t1 t2 = match t1, t2 with
      | H (x, _), H (y, _) -> Elem.compare x y
      | _ -> failwith "unreachable"
  end
  and PrimH : (HEAP with type Elem.t = BootstrappedElem.t) =
    MakeH(BootstrappedElem)

  type heap
  let empty = failwith "not implemented"
  let insert = failwith "not implemented"
  let find_min = failwith "not implemented"
  let delete_min = failwith "not implemented"
end
You could also avoid the mutual recursion and define both BootstrappedElem and BootstrappedHeap in one recursive knot, by defining BootstrappedElem inside the recursive BootstrappedHeap.
module BootstrappedHeap
    (MakeH : functor (Element : ORDERED) -> HEAP with module Elem = Element)
    (Element : ORDERED) : (HEAP with module Elem = Element) =
struct
  module rec BootstrappedHeap : sig
    module Elem : sig
      type t = E | H of Element.t * BootstrappedHeap.heap
      val compare : t -> t -> int
    end
    include (HEAP with module Elem := Elem)
  end = struct
    module Elem = struct
      type t = E | H of Element.t * BootstrappedHeap.heap
      let compare t1 t2 = match t1, t2 with
        | H (x, _), H (y, _) -> Element.compare x y
        | _ -> failwith "unreachable"
    end
    include (MakeH(Elem) : HEAP with module Elem := Elem)
  end

  module Elem = Element
  type heap
  let empty = failwith "not implemented"
  let insert = failwith "not implemented"
  let find_min = failwith "not implemented"
  let delete_min = failwith "not implemented"
end
This style corresponds naturally to your decision to embed Elem in the HEAP signature and use with module ... for refinement. Another solution would have been to define HEAP as a functor returning a signature, used as HEAP(Elem).S, and I suppose a different recursive style could have been chosen. Not to say that this would have been better: I think the "abstract module" style is more convenient.
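For illustration, the functor-returning-a-signature alternative mentioned above might look like this sketch (HeapSig is a hypothetical name, not from the answer):
(* A functor that returns a structure containing the heap signature,
   so heap types are constrained as HeapSig(Element).S instead of
   HEAP with module Elem = Element. *)
module HeapSig (Element : ORDERED) = struct
  module type S = sig
    type heap
    val empty : heap
    val insert : Element.t -> heap -> heap
    val find_min : heap -> Element.t
    val delete_min : heap -> heap
  end
end

(* A heap implementation would then be constrained as:
   module SomeHeap (Element : ORDERED) : HeapSig(Element).S = struct ... end *)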

How to represent a simple finite state machine in Ocaml?

I have written some state machines in C++ and Java, but never in a functional language like OCaml.
The problem is that I don't know whether I can just adapt code from the object-language versions, since in OCaml records and variants are more powerful than classes.
So, I need an event-driven finite state machine (hierarchical, like in UML) that is easily configurable.
Could someone experienced in the field post a simple sample of that? Just to avoid the most common traps.
thanks :)
EDIT 16/03: Is it possible to do it without mutable state?
And I'd like to encapsulate it properly under the name "FSM"; should I choose a module or a class?
It depends on how you have to operate the FSM, e.g., if you need to be able to store its state and continue later, or if you just want to execute it immediately. In the latter case, it's trivial to do it as a bunch of tail-recursive functions.
For example, assume the regexp C((A|B)*CD)* -- the following mutually recursive functions are a direct implementation of the respective FSM that recognises a list matching this regexp (if I didn't make any mistake :) ):
type alphabet = A | B | C | D

let rec s1 = function
  | C :: rest -> s2 rest
  | _ -> false
and s2 = function
  | [] -> true
  | (A | B) :: rest -> s2 rest
  | C :: rest -> s3 rest
  | _ -> false
and s3 = function
  | D :: rest -> s2 rest
  | _ -> false
Every function corresponds to exactly one state of the automaton and implements its transition function. Applying s1 : alphabet list -> bool will run the FSM on the argument.
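A quick usage sketch (the sample inputs are only illustrative):
let () =
  Printf.printf "%b %b\n"
    (s1 [C; A; B; C; D])   (* matches C ((A|B)* C D)* -> true *)
    (s1 [C; A; D])         (* D without a preceding C -> false *)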
PS: Note how this is an application demonstrating the benefit and elegance of tail call optimization...
Usually, you create a record corresponding to a state of the automaton, and you have another type for the event triggering the transition to another state. In the state record, you have a map to find, for each event, the new state.
Let's suppose your transitions are triggered by strings:
type event = string

module EventMap = Map.Make(struct
    type t = event
    let compare = compare
  end)

type state = {
  state_info : ...;  (* the content of that state: id, comment, etc. *)
  mutable state_transitions : state EventMap.t;
}

let next_state current_state event =
  try
    EventMap.find event current_state.state_transitions
  with Not_found -> current_state
Here, I assumed that unknown events keep the machine in the same state, but you could have an error state in the record...
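A small usage sketch of this representation, assuming for illustration only that state_info is just a string:
let make_state info = { state_info = info; state_transitions = EventMap.empty }

let idle = make_state "idle"
let running = make_state "running"

let () =
  idle.state_transitions    <- EventMap.add "start" running idle.state_transitions;
  running.state_transitions <- EventMap.add "stop" idle running.state_transitions

let s1 = next_state idle "start"     (* -> running *)
let s2 = next_state s1 "unknown"     (* unknown event: stays in running *)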
There is an excellent answer which demonstrates the expressiveness and elegance of OCaml in representing finite state machines here:
automata in ocaml
For more serious use, you could look at a finite state machine library like the fsm library here.
I recently created an FSM module in OCaml which you can find here
I have some special requirements for my FSM implementation which could make it not quite as nice to look at as some of the others pointed out here; however, I think the way you declare the FSM itself is kind of nice and declarative. The special requirement is that I need to be able to generate HDL (hardware description language) code from a declarative description of the FSM, in addition to being able to simulate the FSM's operation in the OCaml version. Because of this I needed to use predicate expressions instead of transition functions (otherwise, how would I translate a function to a string?). So mainly you want to focus on the FSM module there and the create and eval_fsm functions.
Here is an example of usage:
(*********************************************************
* FSM testing *******************************************
*)
(* inputs to the FSM *)
let full = Var({name ="full"; value = F});;
let ten_minutes = Var({name = "ten_minutes"; value = F});;
let empty = Var({name = "empty"; value = F});;
let five_minutes = Var({name = "five_minutes"; value =F});;
(* T is true, F is false *)
let _ =
assign full F ;
assign ten_minutes F ;
assign empty F ;
assign five_minutes F ;;
(* outputs from the FSM *)
let water_on = Var({name = "water_on"; value = F});;
let agitate = Var({name = "agitate"; value = F});;
let drain = Var({name = "drain" ; value = F});;
let start_timer = Var({name = "start_timer"; value = F});;
let motor_on = Var({name = "motor_on"; value = F});;
let washed = Var({name = "washed"; value = F});;
let soap = Var({name = "soap"; value = F});;
let reset_actions =
  assign water_on F;
  assign agitate F;
  assign drain F;
  assign start_timer F;
  assign motor_on F;;

module WashStates =
struct
  type t = START | FILL | WASH | DRAIN | RINSE | SPIN | STOP
    deriving(Show, Enum)
  let start_state = START
end

module LogicExp =
struct
  type t = boolean Logic.bexp
  type var_t = boolean Logic.variable
  let eval_exp exp = to_bool (Logic.eval exp)
  let var_to_s = var_to_s
end
module WashFSM = FSM(WashStates)(LogicExp)
open WashStates
(* declare the state table *)
(* CS, PREDICATE, NS, ACTIONs *)
let my_fsm = [
(START, Const(T), FILL, [(water_on, T);
(soap, T)]);
(FILL, Bop(And,full,soap), WASH, [(water_on, F);
(agitate, T);
(washed, T);
(start_timer,T)]);
(WASH, ten_minutes, DRAIN,[(agitate, F);
(start_timer,F);
(empty, T)]);
(DRAIN, Bop(And,empty,soap), FILL, [(drain, F);
(soap, F);
(water_on, T)] );
(FILL, Bop(And,full,Not(soap)), RINSE,[(water_on, F);
(soap, F);
(empty, F);
(agitate, T)]);
(RINSE, ten_minutes, DRAIN, [(agitate, F);
(empty, T)] );
(DRAIN, Bop(And,empty,Not(soap)), SPIN, [(motor_on, T);
(start_timer,T)]);
(SPIN, five_minutes, STOP, [(water_on, F);
(drain, F);
(start_timer,F);
(motor_on, F)]);
(STOP, Const(T), STOP, [(motor_on, F)]);
];;
let st_table, current_state = WashFSM.create my_fsm in
let _ = assign full T in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign ten_minutes T in
let current_state = WashFSM.eval_fsm st_table current_state in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = (assign ten_minutes F);(assign empty T) in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign five_minutes T in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign five_minutes F in
let _ = assign ten_minutes T in
let current_state = WashFSM.eval_fsm st_table current_state in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign five_minutes T in
let _ = WashFSM.eval_fsm st_table current_state in
(*...and so on...*)
(Please excuse the ";;" endings - I wanted to be able to cut & paste this code into the REPL)
Some of the code used here is found in the Logic project on my github (fsm.ml is part of that project). The predicate expression evaluates to either T or F (true or false). If true, then the transition is made from the current state to the next state. Const(T) means always transition. An expression such as:
Bop(And, full, soap)
Means that if both full and soap are T (true) then the expression evaluates to true.
