How to represent a simple finite state machine in Ocaml? - functional-programming

I have written some state machine in C++ and Java but never in a functional language like Ocaml
Problem is I don't know if I can just adapt code from the object languages versions, since in Ocaml records and variants are more powerful than class;
So, I need an event-driven finite state machine (hierarchical like in UML), easily configurable
Could someone experienced in the field post a simple sample of that ? Just to avoid the most common traps
thanks :)
EDIT 16/03 : Is it possible to do it without mutable state ?
And I'd like to encapsulate it properly under the name "FSM", should I choose a module or a class ?

It depends on how you have to operate the FSM, e.g., if you need to be able to store its state and continue later, or if you just want to execute it immediately. In the latter case, it's trivial to do it as a bunch of tail-recursive functions.
For example, assume the regexp C((A|B)*CD)* -- the following mutually recursive functions are a direct implementation of the respective FSM that recognises a list matching this regexp (if I didn't make any mistake :) ):
type alphabet = A | B | C | D
let rec s1 = function
| C :: rest -> s2 rest
| _ -> false
and s2 = function
| [] -> true
| (A | B) :: rest -> s2 rest
| C :: rest -> s3 rest
| _ -> false
and s3 = function
| D :: rest -> s2 rest
| _ -> false
Every function corresponds to exactly one state of the automaton and implements its transition function. Applying s1 : alphabet list -> bool will run the FSM on the argument.
PS: Note how this is an application demonstrating the benefit and elegance of tail call optimization...

Usually, you create a record corresponding to a state of the automata, and you have another type for the event triggering the transition to another state. In the state record, you have a map to find, for each event, the new state.
Let's suppose your transitions are triggered by strings:
type event = string
module EventMap = Map.Make(struct
type t = event
let compare = compare
end)
type state = {
state_info : ...; (* the content of that state, id, comment, etc. *)
mutable state_transitions : state EventMap.t;
}
let next_state current_state event =
try
EventMap.find event current_state.state_transitions
with Not_found -> current_state
Here, I supposed that unknown events stay on the same state, but you could have an error state in the record...

There is an excellent answer which demonstrates expressiveness and elegance of OCaml in representing finite state machine here:
automata in ocaml
For more serious use, you could try to look at some finite state machine library like fsm library here.

I recently created an FSM module in OCaml which you can find here
I have some special requirements for my FSM implementation which could make it not quite as nice to look at as some of the others pointed out here, however, I think the way you declare the FSM itself is kind of nice and declarative. The special requirement is that I need to be able to generate code in HDL (hardware description language) from a declarative description of the FSM in addition to being able to simulate the FSM's operation in the OCaml version. Because of this I needed to use predicate expressions instead of transition functions (otherwise, how would I translate a function to a string?) So mainly you want to focus on the FSM module there and the create and eval_fsm functions there.
Here is an example of usage:
(*********************************************************
* FSM testing *******************************************
*)
(* inputs to the FSM *)
let full = Var({name ="full"; value = F});;
let ten_minutes = Var({name = "ten_minutes"; value = F});;
let empty = Var({name = "empty"; value = F});;
let five_minutes = Var({name = "five_minutes"; value =F});;
(* T is true, F is false *)
let _ =
assign full F ;
assign ten_minutes F ;
assign empty F ;
assign five_minutes F ;;
(* outputs from the FSM *)
let water_on = Var({name = "water_on"; value = F});;
let agitate = Var({name = "agitate"; value = F});;
let drain = Var({name = "drain" ; value = F});;
let start_timer = Var({name = "start_timer"; value = F});;
let motor_on = Var({name = "motor_on"; value = F});;
let washed = Var({name = "washed"; value = F});;
let soap = Var({name = "soap"; value = F});;
let reset_actions =
assign water_on F;
assign agitate F;
assign drain F;
assign start_timer F;
assign motor_on F;;
module WashStates =
struct
type t = START | FILL | WASH | DRAIN | RINSE | SPIN | STOP
deriving(Show, Enum)
let start_state = START
end
module LogicExp =
struct
type t = boolean Logic.bexp
type var_t = boolean Logic.variable
let eval_exp exp = to_bool (Logic.eval exp)
let var_to_s = var_to_s
end
module WashFSM = FSM(WashStates)(LogicExp)
open WashStates
(* declare the state table *)
(* CS, PREDICATE, NS, ACTIONs *)
let my_fsm = [
(START, Const(T), FILL, [(water_on, T);
(soap, T)]);
(FILL, Bop(And,full,soap), WASH, [(water_on, F);
(agitate, T);
(washed, T);
(start_timer,T)]);
(WASH, ten_minutes, DRAIN,[(agitate, F);
(start_timer,F);
(empty, T)]);
(DRAIN, Bop(And,empty,soap), FILL, [(drain, F);
(soap, F);
(water_on, T)] );
(FILL, Bop(And,full,Not(soap)), RINSE,[(water_on, F);
(soap, F);
(empty, F);
(agitate, T)]);
(RINSE, ten_minutes, DRAIN, [(agitate, F);
(empty, T)] );
(DRAIN, Bop(And,empty,Not(soap)), SPIN, [(motor_on, T);
(start_timer,T)]);
(SPIN, five_minutes, STOP, [(water_on, F);
(drain, F);
(start_timer,F);
(motor_on, F)]);
(STOP, Const(T), STOP, [(motor_on, F)]);
];;
let st_table, current_state = WashFSM.create my_fsm in
let _ = assign full T in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign ten_minutes T in
let current_state = WashFSM.eval_fsm st_table current_state in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = (assign ten_minutes F);(assign empty T) in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign five_minutes T in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign five_minutes F in
let _ = assign ten_minutes T in
let current_state = WashFSM.eval_fsm st_table current_state in
let current_state = WashFSM.eval_fsm st_table current_state in
let _ = assign five_minutes T in
let _ = WashFSM.eval_fsm st_table current_state in
(*...and so on...*)
(Please excuse the ";;" endings - I wanted to be able to cut & paste this code into the REPL)
Some of the code used here is found in the Logic project on my github (fsm.ml is part of that project). The predicate expression evaluates to either T or F (true or false). If true, then the transition is made from current state to next state. Const T means always transition. An expression such as:
Bop(And, full, soap)
Means that if both full and soap are T (true) then the expression evaluates to true.

Related

OCaml guessing number game

I've decided to do a simple exercise in OCaml, writing a guess the number application:
The solution I came up with was this:
let () =
let func_end = ref false in
while !func_end != true do
let () = print_endline "Enter guess:" in
let input = read_int () in
let bound = 50 in
let random_number = Random.int bound in
if input == random_number then
let () = print_endline "Correct!" in
func_end := true
done
I wonder if there's a more functional approach to this problem,
the use of refs seems more imperative rather than functional.
You have a while loop and a ref type, which are imperative features of OCaml, rather than functional, but you can approximate the functionality of this loop in a generalized functional way with recursion.
A while loop involves an initial state, a boolean predicate, and a body. We can pass these things to a function, with the predicate being a function which tests the state, and the body working on the state to generate a new state.
let rec do_while state pred body =
if pred state then
let state = body state in
do_while state pred body
else
state
Consider a simple example of this:
utop # do_while 1 (fun x -> x < 10) (fun x -> Printf.printf "%d\n" x; x + 1);;
1
2
3
4
5
6
7
8
9
- : int = 10
It can then be used to replace your loop. The state will be the correct number, and the guess. Of course, the guess won't exist on the first go around so we'll use a value of type int option. If the correct number and the guess are the same, the function will return true and the loop will end. Otherwise it will continue. The loop body function will prompt for input and update the state. It will also print an error message if the guess isn't correct.
let _ =
do_while
(Random.int 50, None)
(function
| (correct, Some guess) when correct = guess -> false
| _ -> true)
(fun (correct, _) ->
let () = print_endline "guess a number:" in
let guess = read_int () in
(if correct <> guess then print_endline "Incorrect guess!";
(correct, Some guess)))
As an aside, you do not need the ref type in your while loop. You can use an exception to exit the loop.
exception Exit
try
while true do
let () = print_endline "Enter guess:" in
let input = read_int () in
let bound = 50 in
let random_number = Random.int bound in
if input = random_number then raise Exit
done
with
| Exit -> ()
The "functional loop" is recursion.
A good step would be to split this into functions; something like this, perhaps
let guess () =
let () = print_endline "Enter guess: " in
read_int()
let answer () =
Random.int 50
let rec play () =
if guess () = answer () then
print_endline "Correct!"
else
play ()
But I personally feel that it's a bit unfair to create a new correct answer for every guess, so you might want to stick to just one
let guess () =
let () = print_endline "Enter guess: " in
read_int()
let answer () =
Random.int 50
let play () =
let rec with_guess it =
if guess () = it then
print_endline "Correct!"
else
with_guess it
in
with_guess ## answer ()
This can easily be extended to provide hints, such as "too high" or "too low".

How to create a cached recursive type?

open System
open System.Collections.Generic
type Node<'a>(expr:'a, symbol:int) =
member x.Expression = expr
member x.Symbol = symbol
override x.GetHashCode() = symbol
override x.Equals(y) =
match y with
| :? Node<'a> as y -> symbol = y.Symbol
| _ -> failwith "Invalid equality for Node."
interface IComparable with
member x.CompareTo(y) =
match y with
| :? Node<'a> as y -> compare symbol y.Symbol
| _ -> failwith "Invalid comparison for Node."
type Ty =
| Int
| String
| Tuple of Ty list
| Rec of Node<Ty>
| Union of Ty list
type NodeDict<'a> = Dictionary<'a,Node<'a>>
let get_nodify_tag =
let mutable i = 0
fun () -> i <- i+1; i
let nodify (dict: NodeDict<_>) x =
match dict.TryGetValue x with
| true, x -> x
| false, _ ->
let x' = Node(x,get_nodify_tag())
dict.[x] <- x'
x'
let d = Dictionary(HashIdentity.Structural)
let nodify_ty x = nodify d x
let rec int_string_stream =
Union
[
Tuple [Int; Rec (nodify_ty (int_string_stream))]
Tuple [String; Rec (nodify_ty (int_string_stream))]
]
In the above example, the int_string_stream gives a type error, but it neatly illustrates what I want to do. Of course, I want both sides to get tagged with the same symbol in nodify_ty. When I tried changing the Rec type to Node<Lazy<Ty>> I've found that it does not compare them correctly and each sides gets a new symbol which is useless to me.
I am working on a language, and the way I've dealt with storing recursive types up to now is by mapping Rec to an int and then substituting that with the related Ty in a dictionary whenever I need it. Currently, I am in the process of cleaning up the language, and would like to have the Rec case be Node<Ty> rather than an int.
At this point though, I am not sure what else could I try here. Could this be done somehow?
I think you will need to add some form of explicit "delay" to the discriminated union that represents your types. Without an explicit delay, you'll always end up fully evaluating the types and so there is no potential for closing the loop.
Something like this seems to work:
type Ty =
| Int
| String
| Tuple of Ty list
| Rec of Node<Ty>
| Union of Ty list
| Delayed of Lazy<Ty>
// (rest is as before)
let rec int_string_stream = Delayed(Lazy.Create(fun () ->
Union
[
Tuple [Int; Rec (nodify_ty (int_string_stream))]
Tuple [String; Rec (nodify_ty (int_string_stream))]
]))
This will mean that when you pattern match on Ty, you'll always need to check for Delayed, evaluate the lazy value and then pattern match again, but that's probably doable!

How best to memoize based on argument only, not function closure, and inside a class?

(question edited and rewritten to reflect chat discussion results)
In one line: Given a state in a state monad, evaluate monadic function once, cache the results.
I am trying to cache the result of a function evaluation, where the key of the cache is the state of a State monad, and where I do not care about possible side effects: i.e., even if the body of the function may change in theory, I know it will be independent of the state:
f x = state { return DateTime.Now.AddMinutes(x) }
g x = state { return DateTime.Now.AddMinutes(x) }
Here, g 10 and f 10 should yield the same result, they may not differ as result to a double call to DateTime.Now, i.e., they must be deterministic. For the sake of argument, the variable state here is x.
On a same token, (g 10) - (f 5) should yield exactly 5 minutes and not a microsecond more or less.
After finding out that caching didn't work, I toned down a more elaborate solution to its bare minimum, using Don Syme's memoization pattern with maps (or dict).
The memoization pattern:
module Cache =
let cache f =
let _cache = ref Map.empty
fun x ->
match (!_cache).TryFind(x) with
| Some res -> res
| None ->
let res = f x
_cache := (!_cache).Add(x,res)
res
The caching is supposed to be used as part of a computation builder, in the Run method:
type someBuilder() =
member __.Run f =
Log.time "Calling __.Run"
let memo_me =
fun state ->
let res =
match f with
| State expr - expr state
| Value v -> state, v
Log.time ("Cache miss, adding key: %A", s)
res
XCache.cache memo_me
This doesn't work, because the cache function is different each time because of the closure, resulting in hitting a cache miss each time over. It should be independent of expr above, and dependent on state only.
I tried placing the _cache outside the cache function on module level, but then it hits the problem of generalization:
Value restriction. The value '_cache' has been inferred to have generic type
Either define '_cache' as a simple data term, make it a function with explicit arguments or, if you do not intend for it to be generic, add a type annotation.
Which I then tried to solve using type annotations, but I ended up not being able to use it in the generic function for the same reason: it required specific type annotations then to be used:
let _cache<'T, 'U when 'T: comparison> ref : Map<'T, 'U> = ref Map.empty
Edit, a working version of the whole computation builder
Here's the computation builder as asked in the comments, tested in FSI. The caching should be dependent solely on TState, not on the whole of 'TState -> 'TState * 'TResult.
type State<'TState, 'TResult> = State of ('TState -> 'TState * 'TResult)
type ResultState<'TState, 'TResult> =
| Expression of State<'TState, 'TResult>
| Value of 'TResult
type RS<'S, 'T> = ResultState<'S, 'T>
type RS =
static member run v s =
match v with
| Value item -> s, item
| Expression (State expr) -> expr s
static member bind k v =
match v with
| Expression (State expr) ->
Expression
<| State
(fun initialState ->
let updatedState, result = expr initialState
RS.run (k result) updatedState
)
| Value item -> k item
type MyBuilder() =
member __.Bind (e, f) = RS.bind f e
member __.Return v = RS.Value v
member __.ReturnFrom e = e
member __.Run f =
printfn "Running!"
// add/remove the first following line to see it with caching
XCache.cache <|
fun s ->
match f with
| RS.Expression (State state) ->
printfn "Call me once!"
state s
| RS.Value v -> s, v
module Builders =
let builder = new MyBuilder()
// constructing prints "Running!", this is as expected
let create() = builder {
let! v = RS.Expression <| (State <| fun i -> (fst i + 12.0, snd i + 3), "my value")
return "test " + v
}
// for seeing the effect, recreating the builder twice,
// it should be cached once
let result1() = create()(30.0, 39)
let result2() = create()(30.0, 39)
Result of running the example in FSI:
Running!
Call me once!
val it : (float * int) * string = ((42.0, 42), "test my value")
Call me once!
val it : (float * int) * string = ((42.0, 42), "test my value")
Just add the Cache into the Run
member __.Run f =
printfn "Running!"
Cache.cache <|
fun s ->
match f with
| RS.Expression (State state) ->
printfn "Call me once!"
state s
| RS.Value v -> s, v
and modify the cache function to see if it really caches
module Cache =
let cache f =
let _cache = ref Map.empty
fun x ->
match (!_cache).TryFind(x) with
| Some res -> printfn "from cache"; res
| None ->
let res = f x
_cache := (!_cache).Add(x,res)
printfn "to cache"
res
and the output is
Call me once!
to cache
val it : (float * int) * string = ((42.0, 42), "test my value")
>
from cache
val it : (float * int) * string = ((42.0, 42), "test my value")

Graph with sets as vertices

I have a tiny grammar represented as a variant type term with strings that are tokens/part of tokens (type term).
Given expressions from the grammar, I am collecting all strings from expressions and pack them into sets (function vars). Finally, I want to create some graph with these sets as vertices (lines 48-49).
For some reason, the graph created in the such sophisticated way does not recognise sets containing same variables and creates multiple vertices with the same content. I don't really understand why this is happening.
Here is minimal working example with this behaviour:
(* demo.ml *)
type term =
| Var of string
| List of term list * string option
| Tuple of term list
module SSet = Set.Make(
struct
let compare = String.compare
type t = string
end)
let rec vars = function
| Var v -> SSet.singleton v
| List (x, tail) ->
let tl = match tail with
| None -> SSet.empty
| Some var -> SSet.singleton var in
SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
| Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)
module Node = struct
type t = SSet.t
let compare = SSet.compare
let equal = SSet.equal
let hash = Hashtbl.hash
end
module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)
(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
include G
let edge_attributes _ = []
let default_edge_attributes _ = []
let get_subgraph _ = None
let vertex_attributes _ = []
let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
let default_vertex_attributes _ = []
let graph_attributes _ = []
end)
let _ =
(* creation of two terms *)
let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
(* get strings from terms packed into sets *)
let avars, bvars = vars a, vars b in
let g = G.create () in
G.add_edge g avars bvars;
Printf.printf "The content is the same: [%s] [%s]\n"
(String.concat ", " (SSet.elements avars))
(String.concat ", " (SSet.elements bvars));
Printf.printf "compare/equal output: %d %b\n"
(SSet.compare avars bvars)
(SSet.equal avars bvars);
Printf.printf "Hash values are different: %d %d\n"
(Hashtbl.hash avars) (Hashtbl.hash bvars);
Dot.fprint_graph Format.str_formatter g;
Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
In order to compile, type ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo. When the program is executed you get this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
To sum up, I am curious why there are non-equal hash values for sets and two identical vertices are created in the graph, despite the fact these sets are equal by all other means.
I suspect the general answer is that OCaml's built-in hashing is based on rather physical properties of a value, while set equality is a more abstract notion. If you represent sets as ordered binary trees, there are many trees that represent the same set (as is well known). These will be equal as sets but might very well hash to different values.
If you want hashing to work for sets, you might have to supply your own function.
As Jeffrey pointed out, it seems that the problem is in the definition of the hash function that is part of Node module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x) fixed the issue.

Implementing a direct-threaded interpreter in a functional language like OCaml

In C/C++ you can implement a direct threaded interpreter with an array of function pointers. The array represents your program - an array of operations. Each of the operation functions must end in a call to the next function in the array, something like:
void op_plus(size_t pc, uint8_t* data) {
*data += 1;
BytecodeArray[pc+1](pc+1, data); //call the next operation in the array
}
The BytecodeArray is an array of function pointers. If we had an array of these op_plus operations then length of the array would determine how ofter we'd be incrementing the contents of data. (of course, you'd need to add some sort of terminating operation as the last operation in the array).
How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions as in the C++. The problem with that is that I keep ending up with something like:
let op_plus pc data = Printf.printf "pc: %d, data_i: %d \n" pc data;
let f = (op_array.(pc+1)) in
f (pc+1) (data+1) ;;
Where op_array is an Array defined in the scope above and then redefine it later to be filled with a bunch of op_plus functions... however, the op_plus function uses the previous definition of op_array. It's a chicken&egg problem.
Another alternative would be using CPS and avoid explicit function array altogether. Tail call optimization still applies in this case.
I don't know how do you generate the code, but let's make not unreasonable assumption that at some point you have an array of VM instructions you want to prepare for execution. Every instruction is still represented as a function, but instead of program counter it receives continuation function.
Here is the simplest example:
type opcode = Add of int | Sub of int
let make_instr opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
let compile opcodes =
Array.fold_right make_instr opcodes (fun x -> x)
Usage (look at inferred types):
# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48
UPDATE:
It's easy to introduce [conditional] branching in this model. if continuation is constructed from two arguments: iftrue-continuation and iffalse-continuation, but has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in case of backward branching (backward, because we compile from tail to head). That's easy to overcome with destructive updates (though maybe more elegant solution is possible if you are compiling from a high level language): just leave "holes" and fill them later when branch target is reached by the compiler.
Sample implementation (I've made use of string labels instead of integer instruction pointers, but this hardly matters):
type label = string
type opcode =
Add of int | Sub of int
| Label of label | Jmp of label | Phi of (int -> bool) * label * label
let make_instr labels opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
| Label label -> (Hashtbl.find labels label) := cont; cont
| Jmp label ->
let target = Hashtbl.find labels label in
(fun data -> Printf.printf "jmp %s\n" label; !target data)
| Phi (cond, tlabel, flabel) ->
let tcont = Hashtbl.find labels tlabel
and fcont = Hashtbl.find labels flabel in
(fun data ->
let b = cond data in
Printf.printf "branch on %d to %s\n"
data (if b then tlabel else flabel);
(if b then !tcont else !fcont) data)
let compile opcodes =
let id = fun x -> x in
let labels = Hashtbl.create 17 in
Array.iter (function
| Label label -> Hashtbl.add labels label (ref id)
| _ -> ())
opcodes;
Array.fold_right (make_instr labels) opcodes id
I've used two passes for clarity but it's easy to see that it can be done in one pass.
Here is a simple loop that can be compiled and executed by the code above:
let code = [|
Label "entry";
Phi (((<) 0), "body", "exit");
Label "body";
Sub 1;
Jmp "entry";
Label "exit" |]
Execution trace:
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0
UPDATE 2:
Performance-wise, CPS representation is likely to be faster than array-based, because there is no indirection in case of linear execution. Continuation function is stored directly in the instruction closure. In the array-based implementation it has to increment program counter and perform array access (with an extra bounds checking overhead) first.
I've made some benchmarks to demonstrate it. Here is an implementation of array-based interpreter:
type opcode =
Add of int | Sub of int
| Jmp of int | Phi of (int -> bool) * int * int
| Ret
let compile opcodes =
let instr_array = Array.make (Array.length opcodes) (fun _ data -> data)
in Array.iteri (fun i opcode ->
instr_array.(i) <- match opcode with
| Add x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
| Sub x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
| Jmp pc -> (fun _ data ->
let cont = instr_array.(pc) in cont (pc + 1) data)
| Phi (cond, tbranch, fbranch) ->
(fun _ data ->
let pc = (if cond data then tbranch else fbranch) in
let cont = instr_array.(pc) in
cont pc data)
| Ret -> fun _ data -> data)
opcodes;
instr_array
let code = [|
Phi (((<) 0), 1, 3);
Sub 1;
Jmp 0;
Ret
|]
let () =
let fn = compile code in
let result = fn.(0) 0 500_000_000 in
Printf.printf "%d\n" result
Let's see how it compares to the CPS-based interpreter above (with all debug tracing stripped, of course). I used OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.
array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20
So even in tight loop CPS performs considerably better than array. If we unroll loop and replace one sub instruction with five, figures change:
array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309
It's interesting that both implementations actually beat OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:
for i = 500_000_000 downto 0 do () done
You should not redefine op_array, you should fill it in with instructions by modifying it in place so that it's the same op_array that your functions already refer to. Unfortunately, you can't change the size of an array dynamically in OCaml.
I see two solutions:
1) if you don't need to change the sequence of "instructions", define them in a mutual recursion with the array op_array. OCaml allows mutually recursive functions and values that start with the application of a constructor to be defined. Something like:
let rec op_plus pc data = ...
and op_array = [| ... |]
2) Or use an additional indirection: make op_array a reference to an array of instructions, and refer in the functions to (!op_array).(pc+1). Later, after you have defined all the instructions, you can make op_array point to an array of the right size, full of the instructions you intend.
let op_array = ref [| |] ;;
let op_plus pc data = ... ;;
op_array := [| ... |] ;;
One more option (if the size is known beforehand) - initially fill the array with void instructions :
let op_array = Array.create size (fun _ _ -> assert false)
let op_plus = ...
let () = op_array.(0) <- op_plus; ...

Resources