Conduits - multiple "attempts" into one Source - networking

I am trying to do the following:
sourceIRC
:: (MonadBaseControl IO m, MonadLogger m)
=> NetworkSettings
-> Producer (ConnectionT m) Message
sourceIRC networkSettings = do
withConnectionForever networkSettings $ \socket -> do
bracket (liftBase $ Network.socketToHandle socket IO.ReadWriteMode)
(\handle -> liftBase $ IO.hClose handle)
(\handle -> do
mvar <- newMVar False
bracket (fork $ do
threadDelay 5000000
_ <- swapMVar mvar True
return ())
(killThread)
(\_ -> runConnectionT handle (sourceHandle handle))
takeMVar mvar)
As you can see, I am trying to create a Producer in terms of a primitive withConnectionForever. That primitive is of type:
withConnectionForever
:: (MonadBaseControl IO m, MonadLogger m)
=> NetworkSettings
-> (Network.Socket -> m Bool)
-> m ()
As you can imagine, I am getting an error message on compilation! It is:
Haskell/IRC.hs:128:54:
Couldn't match expected type `ConnectionT m0 a0'
with actual type `ConduitM i0 ByteString.ByteString m1 ()'
In the return type of a call of `sourceHandle'
In the second argument of `runConnectionT', namely
`(sourceHandle handle)'
In the expression: runConnectionT handle (sourceHandle handle)
Now, I know that the type of the call to withConnectionForever is not obviously a conduit, but I had hoped that it could manage to be one, by virtue of the fact that a conduit is also a monad and withConnectionForever uses a free monad instead of a hardcoded one. My understanding of what the message is trying to communicate is that that's not happening, and I'd like to know why and what I can do about it.
Here, for completeness, is the source of the primitive:
withConnectionForever
:: (MonadBaseControl IO m, MonadLogger m)
=> NetworkSettings
-> (Network.Socket -> m Bool)
-> m ()
withConnectionForever networkSettings action = do
let loop nFailures = do
maybeSocket <- newConnection networkSettings
case maybeSocket of
Nothing -> return ()
Just socket -> do
terminatedNormally <- action socket
if terminatedNormally
then loop 0
else do
exponentTwiddle <- liftBase $ Random.randomRIO (0, 100)
let exponent =
1.25 + fromIntegral (exponentTwiddle - 50) / 100.0
delay = floor $ 1000000.0 *
((0.5 ** (fromIntegral nFailures * negate exponent))
- 1.0 :: Double)
$(logInfo) (Text.concat
["Abnormal disconnection from the network ",
networkSettingsName networkSettings,
"; pausing attempts for ",
Text.pack $ show $ fromIntegral delay / 1000000.0,
" seconds..."])
liftBase $ threadDelay delay
loop (nFailures + 1)
loop 0
I'd really prefer not to rewrite the primitive, unless it can be done in minimally invasive fashion, but I suppose that's on the table.
Thanks in advance!

The relevant thing to do was
(\_ -> transPipe (runConnectionT handle) (sourceHandle handle))
instead of
(\_ -> runConnectionT handle (sourceHandle handle))
Thanks for your time! :D

Related

how can I repeat monadic instructions inside kind's monads?

I know that I can run monadic instructions sequentially inside monads in Kind language, like this:
Test: _
IO {
IO.print(Nat.show(2))
IO.print(Nat.show(3))
IO.print(Nat.show(4))
}
output:
2
3
4
But is it possible to run monadic instructions repeatedly, like this below?
Test: _
a = [2,1,3,4,5]
IO {
for num in a:
IO.print(Nat.show(num))
}
If it is possible how can I do it correctly?
Monads are usually represented by only two operators :
return :: a -> m(a) // that encapulapse the value inside a effectful monad
>>= :: m a -> (a -> m b) -> m b
// the monadic laws are omitted
Notice, the bind operator is naturally recursive, once it can compose two monads or even discard the value of one and the return can be thought of as a "base case".
m >>= (\a -> ... >>= (\b -> ~ i have a and b, compose or discard? ~) >>= fixpoint)
You just have to produce that sequence, which is pretty straightforward. For example, in Kind we represent monads as a pair which takes a type-for-type value and encapluse a polymorphic type.
type Monad <M: Type -> Type> {
new(
bind: <A: Type, B: Type> M<A> -> (A -> M<B>) -> M<B>
pure: <A: Type> A -> M<A>
)
}
In your example, we just have to trigger the effect and discard the value, a recursive definition is enough :
action (x : List<String>): IO(Unit)
case x {
nil : IO.end!(Unit.new) // base case but we are not worried about values here, just the effects
cons : IO {
IO.print(x.head) // print and discard the value
action(x.tail) // fixpoint
}
}
test : IO(Unit)
IO {
let ls = ["2", "1", "3", "4", "5"]
action(ls)
}
The IO as you know it will be desugared by a sequence of binds!
Normally in case of list it can be generalized like the mapM function of haskell library :
Monadic.forM(A : Type -> Type, B : Type,
C : Type, m : Monad<A>, b : A(C), f : B -> A(C), x : List<A(B)>): A(C)
case x {
nil : b
cons :
open m
let k = App.Kaelin.App.mapM!!!(m, b, f, x.tail)
let ac = m.bind!!(x.head, f)
m.bind!!(ac, (c) k) // the >> operator
}
It naturally discard the value and finally we can do it :
action2 (ls : List<String>): IO(Unit)
let ls = [IO.end!(2), IO.end!(1), IO.end!(3), IO.end!(4), IO.end!(5)]
Monadic.forM!!!(IO.monad, IO.end!(Unit.new), (b) IO.print(Nat.show(b)), ls)
So, action2 do the same thing of action, but in one line!.
When you need compose the values you can represent as monadic fold :
Monadic.foldM(A : Type -> Type, B : Type,
C : Type, m : Monad<A>, b : A(C), f : B -> C -> A(C), x : List<A(B)>): A(C)
case x {
nil : b
cons :
open m
let k = Monadic.foldM!!!(m, b, f, x.tail)
m.bind!!(x.head, (b) m.bind!!(k, (c) f(b, c)))
}
For example, suppose that you want to sum a sequence of numbers that you ask for the user in a loop, you just have to call foldM and compose with a simple function :
Monad.action3 : IO(Nat)
let ls = [IO.get_line, IO.get_line, IO.get_line]
Monadic.foldM!!!(IO.monad, IO.end!(0),
(b, c) IO {
IO.end!(Nat.add(Nat.read(b), c))
},
ls)
test : IO(Unit)
IO {
get total = action3
IO.print(Nat.show(total))
}
For now, Kind do not support typeclass so it make the things a little more verbose, but i think a new support to forM loops syntax can be thought in the future. We hope so :)

F# Recursive Objects

I'm new to F#, and functional languages. So this might be stupid question, or duplicated with this Recursive objects in F#?, but I don't know.
Here is a simple Fibonacci function:
let rec fib n =
match n with
| 0 -> 1
| 1 -> 1
| _ -> fib (n - 1) + fib (n - 2)
Its signature is int -> int.
It can be rewritten as:
let rec fib =
fun n ->
match n with
| 0 -> 1
| 1 -> 1
| _ -> fib (n - 1) + fib (n - 2)
Its signature is (int -> int) (in Visual Studio for Mac).
So what's the difference with the previous one?
If I add one more line like this:
let rec fib =
printfn "fib" // <-- this line
fun n ->
match n with
| 0 -> 1
| 1 -> 1
| _ -> fib (n - 1) + fib (n - 2)
The IDE gives me a warning:
warning FS0040: This and other recursive references to the object(s) being defined will be checked for initialization-soundness at runtime through the use of a delayed reference. This is because you are defining one or more recursive objects, rather than recursive functions. This warning may be suppressed by using '#nowarn "40"' or '--nowarn:40'.
How does this line affect the initialization?
What does "recursive object" mean? I can't find it in the documentation.
Update
Thanks for your replies, really nice explanation.
After reading your answers, I have some ideas about the Recursive Object.
First, I made a mistake about the signature. The first two code snippets above have a same signature, int -> int; but the last has signature (int -> int) (note: the signatures have different representation in vscode with Ionide extension).
I think the difference between the two signatures is, the first one means it's just a function, the other one means it's a reference to a function, that is, an object.
And every let rec something with no parameter-list is an object rather than a function, see the function definition, while the second snippet is an exception, possibly optimized by the compiler to a function.
One example:
let rec x = (fun () -> x + 1)() // same warning, says `x` is an recursive object
The only one reason I can think of is the compiler is not smart enough, it throws an warning just because it's a recursive object, like the warning indicates,
This is because you are defining one or more recursive objects, rather than recursive functions
even though this pattern would never have any problem.
let rec fib =
// do something here, if fib invoked here directly, it's definitely an error, not warning.
fun n ->
match n with
| 0 -> 1
| 1 -> 1
| _ -> fib (n - 1) + fib (n - 2)
What do you think about this?
"Recursive objects" are just like recursive functions, except they are, well, objects. Not functions.
A recursive function is a function that references itself, e.g.:
let rec f x = f (x-1) + 1
A recursive object is similar, in that it references itself, except it's not a function, e.g.:
let rec x = x + 1
The above will actually not compile. The F# compiler is able to correctly determine the problem and issue an error: The value 'x' will be evaluated as part of its own definition. Clearly, such definition is nonsensical: in order to calculate x, you need to already know x. Does not compute.
But let's see if we can be more clever. How about if I close x in a lambda expression?
let rec x = (fun() -> x + 1) ()
Here, I wrap the x in a function, and immediately call that function. This compiles, but with a warning - the same warning that you're getting, something about "checking for initialization-soundness at runtime".
So let's go to runtime:
> let rec x = (fun() -> x + 1) ()
System.InvalidOperationException: ValueFactory attempted to access the Value property of this instance.
Not surprisingly, we get an error: turns out, in this definition, you still need to know x in order to calculate x - same as with let rec x = x + 1.
But if this is the case, why does it compile at all? Well, it just so happens that, in general, it is impossible to strictly prove that x will or will not access itself during initialization. The compiler is just smart enough to notice that it might happen (and this is why it issues the warning), but not smart enough to prove that it will definitely happen.
So in cases like this, in addition to issuing a warning, the compiler will install a runtime guard, which will check whether x has already been initialized when it's being accessed. The compiled code with such guard might look something like this:
let mutable x_initialized = false
let rec x =
let x_temp =
(fun() ->
if not x_initialized then failwith "Not good!"
else x + 1
) ()
x_initialized <- true
x_temp
(the actual compiled code looks differently of course; use ILSpy to look if you're curious)
In certain special cases, the compiler can prove one way or another. In other cases it can't, so it installs runtime protection:
// Definitely bad => compile-time error
let rec x = x + 1
// Definitely good => no errors, no warnings
let rec x = fun() -> x() + 1
// Might be bad => compile-time warning + runtime guard
let rec x = (fun() -> x+1) ()
// Also might be bad: no way to tell what the `printfn` call will do
let rec x =
printfn "a"
fun() -> x() + 1
There's a major difference between the last two versions. Notice adding a printfn call to the first version generates no warning, and "fib" will be printed each time the function recurses:
let rec fib n =
printfn "fib"
match n with
| 0 -> 1
| 1 -> 1
| _ -> fib (n - 1) + fib (n - 2)
> fib 10;;
fib
fib
fib
...
val it : int = 89
The printfn call is part of the recursive function's body. But the 3rd/final version only prints "fib" once when the function is defined then never again.
What's the difference? In the 3rd version you're not defining just a recursive function, because there are other expressions creating a closure over the lambda, resulting in a recursive object. Consider this version:
let rec fib3 =
let x = 1
let y = 2
fun n ->
match n with
| 0 -> x
| 1 -> x
| _ -> fib3 (n - x) + fib3 (n - y)
fib3 is not a plain recursive function; there's a closure over the function capturing x and y (and same for the printfn version, although it's just a side-effect). This closure is the "recursive object" referred to in the warning. x and y will not be redefined in each recursion; they're part of the root-level closure/recursive object.
From the linked question/answer:
because [the compiler] cannot guarantee that the reference won't be accessed before it is initialized
Although it doesn't apply in your particular example, it's impossible for the compiler to know whether you're doing something harmless, or potentially referencing/invoking the lambda in fib3 definition before fib3 has a value/has been initialized. Here's another good answer explaining the same.

SML syntactical restrictions within recursive bindings?

There seems to be syntactical restrictions within SML's recursive bindings, which I'm unable to understand. What are these restrictions I'm not encountering in the second case (see source below) and I'm encountering when using a custom operator in the first case?
Below is the case with which I encountered the issue. It fails when I want to use a custom operator, as explained in comments. Of the major SML implementations I'm testing SML sources with, only Poly/ML accepts it as valid, and all of MLton, ML Kit and HaMLet rejects it.
Error messages are rather confusing to me. The clearest one to my eyes, is the one from HaMLet, which complains about “illegal expression within recursive value binding”.
(* A datatype to pass recursion as result and to arguments. *)
datatype 'a process = Chain of ('a -> 'a process)
(* A controlling iterator, where the item handler is
* of type `'a -> 'a process`, so that while handling an item,
* it's also able to return the next handler to be used, making
* the handler less passive. *)
val rec iter =
fn process: int -> int process =>
fn first: int =>
fn last: int =>
let
val rec step =
fn (i: int, Chain process) (* -> unit *) =>
if i < first then ()
else if i = last then (process i; ())
else if i > last then ()
else
let val Chain process = process i
in step (i + 1, Chain process)
end
in step (first, Chain process)
end
(* An attempt to set‑up a syntax to make use of the `Chain` constructor,
* a bit more convenient and readable. *)
val chain: unit * ('a -> 'a process) -> 'a process =
fn (a, b) => (a; Chain b)
infixr 0 THEN
val op THEN = chain
(* A test of this syntax:
* - OK with Poly/ML, which displays “0-2|4-6|8-10|12-14|16-18|20”.
* - fails with MLton, which complains about a syntax error on line #44.
* - fails with ML Kit, which complains about a syntax error on line #51.
* - fails with HaMLet, which complains about a syntax error on line #45.
* The clearest (while not helpful to me) message comes from HaMLet, which
* says “illegal expression within recursive value binding”. *)
val rec process: int -> int process =
(fn x => print (Int.toString x) THEN
(fn x => print "-" THEN
(fn x => print (Int.toString x) THEN
(fn x => print "|" THEN
process))))
val () = iter process 0 20
val () = print "\n"
(* Here is the same without the `THEN` operator. This one works with
* all of Poly/ML, MLton, ML Kit and HaMLet. *)
val rec process =
fn x =>
(print (Int.toString x);
Chain (fn x => (print "-";
Chain (fn x => (print (Int.toString x);
Chain (fn x => (print "|";
Chain process)))))))
val () = iter process 0 20
val () = print "\n"
(* SML implementations version notes:
* - MLton, is the last version, built just yesterday
* - Poly/ML is Poly/ML 5.5.2
* - ML Kit is MLKit 4.3.7
* - HaMLet is HaMLet 2.0.0 *)
Update
I could work around the issue, but still don't understand it. If I remove the outermost parentheses, then it validates:
val rec process: int -> int process =
fn x => print (Int.toString x) THEN
(fn x => print "-" THEN
(fn x => print (Int.toString x) THEN
(fn x => print "|" THEN
process)))
Instead of:
val rec process: int -> int process =
(fn x => print (Int.toString x) THEN
(fn x => print "-" THEN
(fn x => print (Int.toString x) THEN
(fn x => print "|" THEN
process))))
But why is this so? An SML syntax subtlety? What's its rational?
It's just an over-restrictive sentence in the language definition, which says:
For each value binding "pat = exp" within rec, exp must be of the form "fn match".
Strictly speaking, that doesn't allow any parentheses. In practice, that's rarely a problem, because you almost always use the fun declaration syntax anyway.

Set a timer for a function

I have defined a list of values: data : int list and a function f: int -> unit, and a piece of code:
for i = 0 to (List.length data) - 1 do
let d = List.nth data i in
f d
done
Now, I would like to set a maximal running time for f. For instance, if f d exceeds a certain time maximal, the execution of f d stops, and we carry on with the next element of data.
Does anyone know how to do it?
Update1:
Following the comments, I would like to add that, the application of f to a good part of elements of data will end up by raising an exception. This is normal and accepted. So the code looks like:
List.iter
(fun d ->
try
(f d)
with
| e ->
printf "%s\n" (Printexc.to_string e))
data
Something like this might work for you:
exception Timeout
let run_with_timeout t f x =
try
Sys.set_signal Sys.sigalrm (Sys.Signal_handle (fun _ -> raise Timeout));
ignore (Unix.alarm t);
f x;
ignore (Unix.alarm 0);
Sys.set_signal Sys.sigalrm Sys.Signal_default
with Timeout -> Sys.set_signal Sys.sigalrm Sys.Signal_default
Here's a session that shows how it works:
$ ocaml
OCaml version 4.00.1
# #load "unix.cma";;
# #use "rwt.ml";;
exception Timeout
val run_with_timeout : int -> ('a -> 'b) -> 'a -> unit = <fun>
# run_with_timeout 2 Printf.printf "yes\n";;
yes
- : unit = ()
# run_with_timeout 2 (fun () -> while true do () done) ();;
- : unit = ()
#
Your code would be something like this:
List.iter (run_with_timeout 10 f) data
(This code hasn't been thorougly tested but it shows a way that might work.)
Update
As the comments have shown, this code isn't suitable if f x might throw an exception (or if you're using alarms for some other purpose). I encourage gsg to post his/her improved solution. The edit seems to have been rejected.
This is based on Jeffrey's answer, with some modifications to improve exception safety:
exception Timeout
let run_with_timeout timeout f x =
let old_handler = Sys.signal Sys.sigalrm
(Sys.Signal_handle (fun _ -> raise Timeout)) in
let finish () =
ignore (Unix.alarm 0);
ignore (Sys.signal Sys.sigalrm old_handler) in
try
ignore (Unix.alarm timeout);
ignore (f x);
finish ()
with Timeout -> finish ()
| exn -> finish (); raise exn

Implementing a direct-threaded interpreter in a functional language like OCaml

In C/C++ you can implement a direct threaded interpreter with an array of function pointers. The array represents your program - an array of operations. Each of the operation functions must end in a call to the next function in the array, something like:
void op_plus(size_t pc, uint8_t* data) {
*data += 1;
BytecodeArray[pc+1](pc+1, data); //call the next operation in the array
}
The BytecodeArray is an array of function pointers. If we had an array of these op_plus operations then length of the array would determine how ofter we'd be incrementing the contents of data. (of course, you'd need to add some sort of terminating operation as the last operation in the array).
How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions as in the C++. The problem with that is that I keep ending up with something like:
let op_plus pc data = Printf.printf "pc: %d, data_i: %d \n" pc data;
let f = (op_array.(pc+1)) in
f (pc+1) (data+1) ;;
Where op_array is an Array defined in the scope above and then redefine it later to be filled with a bunch of op_plus functions... however, the op_plus function uses the previous definition of op_array. It's a chicken&egg problem.
Another alternative would be using CPS and avoid explicit function array altogether. Tail call optimization still applies in this case.
I don't know how do you generate the code, but let's make not unreasonable assumption that at some point you have an array of VM instructions you want to prepare for execution. Every instruction is still represented as a function, but instead of program counter it receives continuation function.
Here is the simplest example:
type opcode = Add of int | Sub of int
let make_instr opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
let compile opcodes =
Array.fold_right make_instr opcodes (fun x -> x)
Usage (look at inferred types):
# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48
UPDATE:
It's easy to introduce [conditional] branching in this model. if continuation is constructed from two arguments: iftrue-continuation and iffalse-continuation, but has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in case of backward branching (backward, because we compile from tail to head). That's easy to overcome with destructive updates (though maybe more elegant solution is possible if you are compiling from a high level language): just leave "holes" and fill them later when branch target is reached by the compiler.
Sample implementation (I've made use of string labels instead of integer instruction pointers, but this hardly matters):
type label = string
type opcode =
Add of int | Sub of int
| Label of label | Jmp of label | Phi of (int -> bool) * label * label
let make_instr labels opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
| Label label -> (Hashtbl.find labels label) := cont; cont
| Jmp label ->
let target = Hashtbl.find labels label in
(fun data -> Printf.printf "jmp %s\n" label; !target data)
| Phi (cond, tlabel, flabel) ->
let tcont = Hashtbl.find labels tlabel
and fcont = Hashtbl.find labels flabel in
(fun data ->
let b = cond data in
Printf.printf "branch on %d to %s\n"
data (if b then tlabel else flabel);
(if b then !tcont else !fcont) data)
let compile opcodes =
let id = fun x -> x in
let labels = Hashtbl.create 17 in
Array.iter (function
| Label label -> Hashtbl.add labels label (ref id)
| _ -> ())
opcodes;
Array.fold_right (make_instr labels) opcodes id
I've used two passes for clarity but it's easy to see that it can be done in one pass.
Here is a simple loop that can be compiled and executed by the code above:
let code = [|
Label "entry";
Phi (((<) 0), "body", "exit");
Label "body";
Sub 1;
Jmp "entry";
Label "exit" |]
Execution trace:
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0
UPDATE 2:
Performance-wise, CPS representation is likely to be faster than array-based, because there is no indirection in case of linear execution. Continuation function is stored directly in the instruction closure. In the array-based implementation it has to increment program counter and perform array access (with an extra bounds checking overhead) first.
I've made some benchmarks to demonstrate it. Here is an implementation of array-based interpreter:
type opcode =
Add of int | Sub of int
| Jmp of int | Phi of (int -> bool) * int * int
| Ret
let compile opcodes =
let instr_array = Array.make (Array.length opcodes) (fun _ data -> data)
in Array.iteri (fun i opcode ->
instr_array.(i) <- match opcode with
| Add x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
| Sub x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
| Jmp pc -> (fun _ data ->
let cont = instr_array.(pc) in cont (pc + 1) data)
| Phi (cond, tbranch, fbranch) ->
(fun _ data ->
let pc = (if cond data then tbranch else fbranch) in
let cont = instr_array.(pc) in
cont pc data)
| Ret -> fun _ data -> data)
opcodes;
instr_array
let code = [|
Phi (((<) 0), 1, 3);
Sub 1;
Jmp 0;
Ret
|]
let () =
let fn = compile code in
let result = fn.(0) 0 500_000_000 in
Printf.printf "%d\n" result
Let's see how it compares to the CPS-based interpreter above (with all debug tracing stripped, of course). I used OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.
array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20
So even in tight loop CPS performs considerably better than array. If we unroll loop and replace one sub instruction with five, figures change:
array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309
It's interesting that both implementations actually beat OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:
for i = 500_000_000 downto 0 do () done
You should not redefine op_array, you should fill it in with instructions by modifying it in place so that it's the same op_array that your functions already refer to. Unfortunately, you can't change the size of an array dynamically in OCaml.
I see two solutions:
1) if you don't need to change the sequence of "instructions", define them in a mutual recursion with the array op_array. OCaml allows mutually recursive functions and values that start with the application of a constructor to be defined. Something like:
let rec op_plus pc data = ...
and op_array = [| ... |]
2) Or use an additional indirection: make op_array a reference to an array of instructions, and refer in the functions to (!op_array).(pc+1). Later, after you have defined all the instructions, you can make op_array point to an array of the right size, full of the instructions you intend.
let op_array = ref [| |] ;;
let op_plus pc data = ... ;;
op_array := [| ... |] ;;
One more option (if the size is known beforehand) - initially fill the array with void instructions :
let op_array = Array.create size (fun _ _ -> assert false)
let op_plus = ...
let () = op_array.(0) <- op_plus; ...

Resources