Retrieve MethodInfo of a F# function - reflection

I would like to write a function that takes a function f as an argument and returns the System.Reflection.MethodInfo associated to f.
I'm not quite sure if it is feasible or not.

So, I finally found a solution. Very hacky, but hey! It works! (edit: in Debug mode only).
let Foo (f:S -> A[] -> B[] -> C[] -> D[] -> unit) =
let ty = f.GetType()
let argty = [|typeof<S>; typeof<A[]>; typeof<B[]>; typeof<C[]>;typeof<D[]>|]
let mi = ty.GetMethod("Invoke", argty)
let il = mi.GetMethodBody().GetILAsByteArray()
let offset = 9//mi.GetMethodBody().MaxStackSize
let token = System.BitConverter.ToInt32(il, offset)
let mb = ty.Module.ResolveMethod(token)
match Expr.TryGetReflectedDefinition mb with
| Some ex -> printfn "success %A" e
| None -> failwith "failed"
It works well, even if f is defined in another assembly (.dll) or in the same where the call of Foo happens.
It's not fully general yet since I have to define what argty is, but I'm sure I can write a function that does it.
Turns out after writing this code that Dustin have a similar solution for the same issue, albeit in C# (see it here).
EDIT:
So here's an usage example:
open System
open Microsoft.FSharp.Quotations
[<ReflectedDefinition>]
let F (sv:int) (a:int[]) (b:int[]) (c:int[]) (d:int[]) =
let temp = a.[2] + b.[3]
c.[0] <- temp
()
let Foo (f:S -> A[] -> B[] -> C[] -> D[] -> unit) =
let ty = f.GetType()
let arr = ty.BaseType.GetGenericArguments()
let argty = Array.init (arr.Length-1) (fun i -> arr.[i])
let mi = ty.GetMethod("Invoke", argty)
let il = mi.GetMethodBody().GetILAsByteArray()
let offset = 9
let token = System.BitConverter.ToInt32(il, offset)
let mb = ty.Module.ResolveMethod(token)
mb
let main () =
let mb = Foo F
printfn "%s" mb.Name
match Expr.TryGetReflectedDefinition mb with
| None -> ()
| Some(e) -> printfn "%A" e
do main ()
What it does is printing name of F, and its AST if the function is a reflected definition.
But after further investigation, it happens that this hack only works in debug mode (and F has to be a function value as well as a top level definition), so might as well say that it's an impossible thing to do.
Here's the IL code of the FSharpFunc's Invoke method in both debug/release build:
DEBUG mode:
.method /*06000007*/ public strict virtual
instance class [FSharp.Core/*23000002*/]Microsoft.FSharp.Core.Unit/*01000006*/
Invoke(int32 sv,
int32[] a,
int32[] b,
int32[] c,
int32[] d) cil managed
// SIG: 20 05 12 19 08 1D 08 1D 08 1D 08 1D 08
{
// Method begins at RVA 0x21e4
// Code size 16 (0x10)
.maxstack 9
IL_0000: /* 00 | */ nop
IL_0001: /* 03 | */ ldarg.1
IL_0002: /* 04 | */ ldarg.2
IL_0003: /* 05 | */ ldarg.3
IL_0004: /* 0E | 04 */ ldarg.s c
IL_0006: /* 0E | 05 */ ldarg.s d
IL_0008: /* 28 | (06)000001 */ call void Program/*02000002*/::F(int32,
int32[],
int32[],
int32[],
int32[]) /* 06000001 */
IL_000d: /* 00 | */ nop
IL_000e: /* 14 | */ ldnull
IL_000f: /* 2A | */ ret
} // end of method mb#25::Invoke
RELEASE mode:
method public strict virtual instance class [FSharp.Core]Microsoft.FSharp.Core.Unit
Invoke(int32 sv,
int32[] a,
int32[] b,
int32[] c,
int32[] d) cil managed
{
// Code size 28 (0x1c)
.maxstack 7
.locals init ([0] int32 V_0)
IL_0000: nop
IL_0001: ldarg.2
IL_0002: ldc.i4.2
IL_0003: ldelem [mscorlib]System.Int32
IL_0008: ldarg.3
IL_0009: ldc.i4.3
IL_000a: ldelem [mscorlib]System.Int32
IL_000f: add
IL_0010: stloc.0
IL_0011: ldarg.s c
IL_0013: ldc.i4.0
IL_0014: ldloc.0
IL_0015: stelem [mscorlib]System.Int32
IL_001a: ldnull
IL_001b: ret
} // end of method mb#25::Invoke
You can see that in release mode, the compiler inlines code of F into the Invoke method, so the information of calling F (and the possibility to retrieve the token) is gone..

Does the program below help?
module Program
[<ReflectedDefinition>]
let F x =
x + 1
let Main() =
let x = F 4
let a = System.Reflection.Assembly.GetExecutingAssembly()
let modu = a.GetType("Program")
let methodInfo = modu.GetMethod("F")
let reflDefnOpt = Microsoft.FSharp.Quotations.Expr.TryGetReflectedDefinition(methodInfo)
match reflDefnOpt with
| None -> printfn "failed"
| Some(e) -> printfn "success %A" e
Main()

This is not (easily) possible. The thing to note is that when you write:
let printFunctionName f =
let mi = getMethodInfo f
printfn "%s" mi.Name
Parameter 'f' is simply an instance of type FSharpFunc<,>. So the following are all possible:
printFunctionName (fun x -> x + 1) // Lambda expression
printFunctionName String.ToUpper // Function value
printFunctionName (List.map id) // Curried function
printFunctionNAme (not >> List.empty) // Function composition
In either case there is no straightforward answer to this

I don't know if there is a general answer for any kind of function, but if your function is simple ('a -> 'b) then you could write
let getMethodInfo (f : 'a -> 'b) = (FastFunc.ToConverter f).Method

Related

How can I determine the json path to a field within a record without actually hard coding the path?

I would like to work with the following type
type RecordPath<'a,'b> = {
Get: 'a -> 'b
Path:string
}
It's purpose is to define a getter for going from record type 'a to some field within 'a of type 'b. It also gives the path to that field for the json representation of the record.
For example, consider the following fields.
type DateWithoutTimeBecauseWeirdlyDotnetDoesNotHaveThisConcept = {
Year:uint
Month:uint
Day:uint
}
type Person = {
FullName:string
PassportNumber:string
BirthDate:DateWithoutTimeBecauseWeirdlyDotnetDoesNotHaveThisConcept
}
type Team = {
TeamName:string
TeamMembers:Person list
}
An example RecordPath might be
let birthYearPath = {
Get = fun (team:Team) -> team.TeamMembers |> List.map (fun p -> p.BirthDate.Year)
Path = "$.TeamMember[*].BirthDate.Year" //using mariadb format for json path
}
Is there some way of letting a library user create this record without ever actually needing to specify the string explicitly. Ideally there is some strongly typed way of the user specifying the fields involved. Maybe some kind of clever use of reflection?
It just occurred to me that with a language that supports macros, this would be possible. But can it be done in F#?
PS: I notice that I left out the s in "TeamMembers" in the path. This is the kind of thing I want to guard against to make it easier on the user.
As you noted in the comments, F# has a quotation mechanism that lets you do this. You can create those explicitly using <# ... #> notation or implicitly using a somewhat more elengant automatic quoting mechanism. The quotations are farily close representations of the F# code, so converting them to the desired path format is not going to be easy, but I think it can be done.
I tried to get this to work at least for your small example. First, I needed a helper function that does two transformations on the code and turns:
let x = e1 in e2 into e2[x <- e1] (using the notation e2[x <- e1] to mean a subsitution, i.e. expression e2 with all occurences of x replaced by e1)
e1 |> fun x -> e2 into e2[x <- e1]
This is all I needed for your example, but it's likely you'll need a few more cases:
open Microsoft.FSharp.Quotations
let rec simplify dict e =
let e' = simplifyOne dict e
if e' <> e then simplify dict e' else e'
and simplifyOne dict = function
| Patterns.Call(None, op, [e; Patterns.Lambda(v, body)])
when op.Name = "op_PipeRight" ->
simplify (Map.add v e dict) body
| Patterns.Let(v, e, body) -> simplify (Map.add v e dict) body
| ExprShape.ShapeVar(v) when Map.containsKey v dict -> dict.[v]
| ExprShape.ShapeVar(v) -> Expr.Var(v)
| ExprShape.ShapeLambda(v, e) -> Expr.Lambda(v, simplify dict e)
| ExprShape.ShapeCombination(o, es) ->
ExprShape.RebuildShapeCombination(o, List.map (simplify dict) es)
With this pre-processing, I managed to write an extractPath function like this:
let rec extractPath var = function
| Patterns.Call(None, op, [Patterns.Lambda(v, body); inst]) when op.Name = "Map" ->
extractPath var inst + "[*]." + extractPath v.Name body
| Patterns.PropertyGet(Some(Patterns.Var v), p, []) when v.Name = var -> p.Name
| Patterns.PropertyGet(Some e, p, []) -> extractPath var e + "." + p.Name
| e -> failwithf "Unexpected expression: %A" e
This looks for (1) a call to map function, (2) a property access on a variable that represents the data source and (3) a property access where the instance has some more property accesses.
The following now works for your small example (but probably for nothing else!)
type Path =
static member Make([<ReflectedDefinition(true)>] f:Expr<'T -> 'R>) =
match f with
| Patterns.WithValue(f, _, Patterns.Lambda(v, body)) ->
{ Get = f :?> 'T -> 'R
Path = "$." + extractPath v.Name (simplify Map.empty body) }
| _ -> failwith "Unexpected argument"
Path.Make(fun (team:Team) -> team.TeamMembers |> List.map (fun p -> p.BirthDate.Year))
The way I solved this is
let jsonPath userExpr =
let rec innerLoop expr state =
match expr with
|Patterns.Lambda(_, body) ->
innerLoop body state
|Patterns.PropertyGet(Some parent, propInfo, []) ->
sprintf ".%s%s" propInfo.Name state |> innerLoop parent
|Patterns.Call (None, _, expr1::[Patterns.Let (v, expr2, _)]) when v.Name = "mapping"->
let parentPath = innerLoop expr1 "[*]"
let childPath = innerLoop expr2 ""
parentPath + childPath
|ExprShape.ShapeVar x ->
state
|_ ->
failwithf "Unsupported expression: %A" expr
innerLoop userExpr "" |> sprintf "$%s"
type Path =
static member Make([<ReflectedDefinition(true)>] f:Expr<'T -> 'R>) =
match f with
|Patterns.WithValue(f, _, expr) ->
let path = jsonPath expr
{
Get = f :?> 'T -> 'R
Path = path
}
| _ -> failwith "Unexpected argument"
Caveat: I don't know enough about these techniques to tell if Tomas' answer performs better in some edge cases than mine.

F# async workflow / tasks combined with free monad

I'm trying to build pipeline for message handling using free monad pattern, my code looks like that:
module PipeMonad =
type PipeInstruction<'msgIn, 'msgOut, 'a> =
| HandleAsync of 'msgIn * (Async<'msgOut> -> 'a)
| SendOutAsync of 'msgOut * (Async -> 'a)
let private mapInstruction f = function
| HandleAsync (x, next) -> HandleAsync (x, next >> f)
| SendOutAsync (x, next) -> SendOutAsync (x, next >> f)
type PipeProgram<'msgIn, 'msgOut, 'a> =
| Act of PipeInstruction<'msgIn, 'msgOut, PipeProgram<'msgIn, 'msgOut, 'a>>
| Stop of 'a
let rec bind f = function
| Act x -> x |> mapInstruction (bind f) |> Act
| Stop x -> f x
type PipeBuilder() =
member __.Bind (x, f) = bind f x
member __.Return x = Stop x
member __.Zero () = Stop ()
member __.ReturnFrom x = x
let pipe = PipeBuilder()
let handleAsync msgIn = Act (HandleAsync (msgIn, Stop))
let sendOutAsync msgOut = Act (SendOutAsync (msgOut, Stop))
which I wrote according to this article
However it's important to me to have those methods asynchronous (Task preferably, but Async is acceptable), but when I created a builder for my pipeline, I can't figure out how to use it - how can I await a Task<'msgOut> or Async<'msgOut> so I can send it out and await this "send" task?
Now I have this piece of code:
let pipeline log msgIn =
pipe {
let! msgOut = handleAsync msgIn
let result = async {
let! msgOut = msgOut
log msgOut
return sendOutAsync msgOut
}
return result
}
which returns PipeProgram<'b, 'a, Async<PipeProgram<'c, 'a, Async>>>
In my understanding, the whole point of the free monad is that you don't expose effects like Async, so I don't think they should be used in the PipeInstruction type. The interpreter is where the effects get added.
Also, the Free Monad really only makes sense in Haskell, where all you need to do is define a functor, and then you get the rest of the implementation automatically. In F# you have to write the rest of the code as well, so there is not much benefit to using Free over a more traditional interpreter pattern.
That TurtleProgram code you linked to was just an experiment -- I would not recommend using Free for real code at all.
Finally, if you already know the effects you are going to use, and you are not going to have more than one interpretation, then using this approach doesn't make sense. It only makes sense when the benefits outweigh the complexity.
Anyway, if you did want to write an interpreter version (rather than Free) this is how I would do it:
First, define the instructions without any effects.
/// The abstract instruction set
module PipeProgram =
type PipeInstruction<'msgIn, 'msgOut,'state> =
| Handle of 'msgIn * ('msgOut -> PipeInstruction<'msgIn, 'msgOut,'state>)
| SendOut of 'msgOut * (unit -> PipeInstruction<'msgIn, 'msgOut,'state>)
| Stop of 'state
Then you can write a computation expression for it:
/// A computation expression for a PipeProgram
module PipeProgramCE =
open PipeProgram
let rec bind f instruction =
match instruction with
| Handle (x,next) -> Handle (x, (next >> bind f))
| SendOut (x, next) -> SendOut (x, (next >> bind f))
| Stop x -> f x
type PipeBuilder() =
member __.Bind (x, f) = bind f x
member __.Return x = Stop x
member __.Zero () = Stop ()
member __.ReturnFrom x = x
let pipe = PipeProgramCE.PipeBuilder()
And then you can start writing your computation expressions. This will help flush out the design before you start on the interpreter.
// helper functions for CE
let stop x = PipeProgram.Stop x
let handle x = PipeProgram.Handle (x,stop)
let sendOut x = PipeProgram.SendOut (x, stop)
let exampleProgram : PipeProgram.PipeInstruction<string,string,string> = pipe {
let! msgOut1 = handle "In1"
do! sendOut msgOut1
let! msgOut2 = handle "In2"
do! sendOut msgOut2
return msgOut2
}
Once you have described the the instructions, you can then write the interpreters. And as I said, if you are not writing multiple interpreters, then perhaps you don't need to do this at all.
Here's an interpreter for a non-async version (the "Id monad", as it were):
module PipeInterpreterSync =
open PipeProgram
let handle msgIn =
printfn "In: %A" msgIn
let msgOut = System.Console.ReadLine()
msgOut
let sendOut msgOut =
printfn "Out: %A" msgOut
()
let rec interpret instruction =
match instruction with
| Handle (x, next) ->
let result = handle x
result |> next |> interpret
| SendOut (x, next) ->
let result = sendOut x
result |> next |> interpret
| Stop x ->
x
and here's the async version:
module PipeInterpreterAsync =
open PipeProgram
/// Implementation of "handle" uses async/IO
let handleAsync msgIn = async {
printfn "In: %A" msgIn
let msgOut = System.Console.ReadLine()
return msgOut
}
/// Implementation of "sendOut" uses async/IO
let sendOutAsync msgOut = async {
printfn "Out: %A" msgOut
return ()
}
let rec interpret instruction =
match instruction with
| Handle (x, next) -> async {
let! result = handleAsync x
return! result |> next |> interpret
}
| SendOut (x, next) -> async {
do! sendOutAsync x
return! () |> next |> interpret
}
| Stop x -> x
First of all, I think that using free monads in F# is very close to being an anti-pattern. It is a very abstract construction that does not fit all that great with idiomatic F# style - but that is a matter of preference and if you (and your team) finds this way of writing code readable and easy to understand, then you can certainly go in this direction.
Out of curiosity, I spent a bit of time playing with your example - although I have not quite figured out how to fix your example completely, I hope the following might help to steer you in the right direction. The summary is that I think you will need to integrate Async into your PipeProgram so that the pipe program is inherently asynchronous:
type PipeInstruction<'msgIn, 'msgOut, 'a> =
| HandleAsync of 'msgIn * (Async<'msgOut> -> 'a)
| SendOutAsync of 'msgOut * (Async<unit> -> 'a)
| Continue of 'a
type PipeProgram<'msgIn, 'msgOut, 'a> =
| Act of Async<PipeInstruction<'msgIn, 'msgOut, PipeProgram<'msgIn, 'msgOut, 'a>>>
| Stop of Async<'a>
Note that I had to add Continue to make my functions type-check, but I think that's probably a wrong hack and you might need to remote that. With these definitions, you can then do:
let private mapInstruction f = function
| HandleAsync (x, next) -> HandleAsync (x, next >> f)
| SendOutAsync (x, next) -> SendOutAsync (x, next >> f)
| Continue v -> Continue v
let rec bind (f:'a -> PipeProgram<_, _, _>) = function
| Act x ->
let w = async {
let! x = x
return mapInstruction (bind f) x }
Act w
| Stop x ->
let w = async {
let! x = x
let pg = f x
return Continue pg
}
Act w
type PipeBuilder() =
member __.Bind (x, f) = bind f x
member __.Return x = Stop x
member __.Zero () = Stop (async.Return())
member __.ReturnFrom x = x
let pipe = PipeBuilder()
let handleAsync msgIn = Act (async.Return(HandleAsync (msgIn, Stop)))
let sendOutAsync msgOut = Act (async.Return(SendOutAsync (msgOut, Stop)))
let pipeline log msgIn =
pipe {
let! msgOut = handleAsync msgIn
log msgOut
return! sendOutAsync msgOut
}
pipeline ignore 0
This now gives you just plain PipeProgram<int, unit, unit> which you should be able to evaluate by having a recursive asynchronous functions that acts on the commands.

How best to memoize based on argument only, not function closure, and inside a class?

(question edited and rewritten to reflect chat discussion results)
In one line: Given a state in a state monad, evaluate monadic function once, cache the results.
I am trying to cache the result of a function evaluation, where the key of the cache is the state of a State monad, and where I do not care about possible side effects: i.e., even if the body of the function may change in theory, I know it will be independent of the state:
f x = state { return DateTime.Now.AddMinutes(x) }
g x = state { return DateTime.Now.AddMinutes(x) }
Here, g 10 and f 10 should yield the same result, they may not differ as result to a double call to DateTime.Now, i.e., they must be deterministic. For the sake of argument, the variable state here is x.
On a same token, (g 10) - (f 5) should yield exactly 5 minutes and not a microsecond more or less.
After finding out that caching didn't work, I toned down a more elaborate solution to its bare minimum, using Don Syme's memoization pattern with maps (or dict).
The memoization pattern:
module Cache =
let cache f =
let _cache = ref Map.empty
fun x ->
match (!_cache).TryFind(x) with
| Some res -> res
| None ->
let res = f x
_cache := (!_cache).Add(x,res)
res
The caching is supposed to be used as part of a computation builder, in the Run method:
type someBuilder() =
member __.Run f =
Log.time "Calling __.Run"
let memo_me =
fun state ->
let res =
match f with
| State expr - expr state
| Value v -> state, v
Log.time ("Cache miss, adding key: %A", s)
res
XCache.cache memo_me
This doesn't work, because the cache function is different each time because of the closure, resulting in hitting a cache miss each time over. It should be independent of expr above, and dependent on state only.
I tried placing the _cache outside the cache function on module level, but then it hits the problem of generalization:
Value restriction. The value '_cache' has been inferred to have generic type
Either define '_cache' as a simple data term, make it a function with explicit arguments or, if you do not intend for it to be generic, add a type annotation.
Which I then tried to solve using type annotations, but I ended up not being able to use it in the generic function for the same reason: it required specific type annotations then to be used:
let _cache<'T, 'U when 'T: comparison> ref : Map<'T, 'U> = ref Map.empty
Edit, a working version of the whole computation builder
Here's the computation builder as asked in the comments, tested in FSI. The caching should be dependent solely on TState, not on the whole of 'TState -> 'TState * 'TResult.
type State<'TState, 'TResult> = State of ('TState -> 'TState * 'TResult)
type ResultState<'TState, 'TResult> =
| Expression of State<'TState, 'TResult>
| Value of 'TResult
type RS<'S, 'T> = ResultState<'S, 'T>
type RS =
static member run v s =
match v with
| Value item -> s, item
| Expression (State expr) -> expr s
static member bind k v =
match v with
| Expression (State expr) ->
Expression
<| State
(fun initialState ->
let updatedState, result = expr initialState
RS.run (k result) updatedState
)
| Value item -> k item
type MyBuilder() =
member __.Bind (e, f) = RS.bind f e
member __.Return v = RS.Value v
member __.ReturnFrom e = e
member __.Run f =
printfn "Running!"
// add/remove the first following line to see it with caching
XCache.cache <|
fun s ->
match f with
| RS.Expression (State state) ->
printfn "Call me once!"
state s
| RS.Value v -> s, v
module Builders =
let builder = new MyBuilder()
// constructing prints "Running!", this is as expected
let create() = builder {
let! v = RS.Expression <| (State <| fun i -> (fst i + 12.0, snd i + 3), "my value")
return "test " + v
}
// for seeing the effect, recreating the builder twice,
// it should be cached once
let result1() = create()(30.0, 39)
let result2() = create()(30.0, 39)
Result of running the example in FSI:
Running!
Call me once!
val it : (float * int) * string = ((42.0, 42), "test my value")
Call me once!
val it : (float * int) * string = ((42.0, 42), "test my value")
Just add the Cache into the Run
member __.Run f =
printfn "Running!"
Cache.cache <|
fun s ->
match f with
| RS.Expression (State state) ->
printfn "Call me once!"
state s
| RS.Value v -> s, v
and modify the cache function to see if it really caches
module Cache =
let cache f =
let _cache = ref Map.empty
fun x ->
match (!_cache).TryFind(x) with
| Some res -> printfn "from cache"; res
| None ->
let res = f x
_cache := (!_cache).Add(x,res)
printfn "to cache"
res
and the output is
Call me once!
to cache
val it : (float * int) * string = ((42.0, 42), "test my value")
>
from cache
val it : (float * int) * string = ((42.0, 42), "test my value")

Implementing a direct-threaded interpreter in a functional language like OCaml

In C/C++ you can implement a direct threaded interpreter with an array of function pointers. The array represents your program - an array of operations. Each of the operation functions must end in a call to the next function in the array, something like:
void op_plus(size_t pc, uint8_t* data) {
*data += 1;
BytecodeArray[pc+1](pc+1, data); //call the next operation in the array
}
The BytecodeArray is an array of function pointers. If we had an array of these op_plus operations then length of the array would determine how ofter we'd be incrementing the contents of data. (of course, you'd need to add some sort of terminating operation as the last operation in the array).
How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions as in the C++. The problem with that is that I keep ending up with something like:
let op_plus pc data = Printf.printf "pc: %d, data_i: %d \n" pc data;
let f = (op_array.(pc+1)) in
f (pc+1) (data+1) ;;
Where op_array is an Array defined in the scope above and then redefine it later to be filled with a bunch of op_plus functions... however, the op_plus function uses the previous definition of op_array. It's a chicken&egg problem.
Another alternative would be using CPS and avoid explicit function array altogether. Tail call optimization still applies in this case.
I don't know how do you generate the code, but let's make not unreasonable assumption that at some point you have an array of VM instructions you want to prepare for execution. Every instruction is still represented as a function, but instead of program counter it receives continuation function.
Here is the simplest example:
type opcode = Add of int | Sub of int
let make_instr opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
let compile opcodes =
Array.fold_right make_instr opcodes (fun x -> x)
Usage (look at inferred types):
# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48
UPDATE:
It's easy to introduce [conditional] branching in this model. if continuation is constructed from two arguments: iftrue-continuation and iffalse-continuation, but has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in case of backward branching (backward, because we compile from tail to head). That's easy to overcome with destructive updates (though maybe more elegant solution is possible if you are compiling from a high level language): just leave "holes" and fill them later when branch target is reached by the compiler.
Sample implementation (I've made use of string labels instead of integer instruction pointers, but this hardly matters):
type label = string
type opcode =
Add of int | Sub of int
| Label of label | Jmp of label | Phi of (int -> bool) * label * label
let make_instr labels opcode cont =
match opcode with
| Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
| Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
| Label label -> (Hashtbl.find labels label) := cont; cont
| Jmp label ->
let target = Hashtbl.find labels label in
(fun data -> Printf.printf "jmp %s\n" label; !target data)
| Phi (cond, tlabel, flabel) ->
let tcont = Hashtbl.find labels tlabel
and fcont = Hashtbl.find labels flabel in
(fun data ->
let b = cond data in
Printf.printf "branch on %d to %s\n"
data (if b then tlabel else flabel);
(if b then !tcont else !fcont) data)
let compile opcodes =
let id = fun x -> x in
let labels = Hashtbl.create 17 in
Array.iter (function
| Label label -> Hashtbl.add labels label (ref id)
| _ -> ())
opcodes;
Array.fold_right (make_instr labels) opcodes id
I've used two passes for clarity but it's easy to see that it can be done in one pass.
Here is a simple loop that can be compiled and executed by the code above:
let code = [|
Label "entry";
Phi (((<) 0), "body", "exit");
Label "body";
Sub 1;
Jmp "entry";
Label "exit" |]
Execution trace:
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0
UPDATE 2:
Performance-wise, CPS representation is likely to be faster than array-based, because there is no indirection in case of linear execution. Continuation function is stored directly in the instruction closure. In the array-based implementation it has to increment program counter and perform array access (with an extra bounds checking overhead) first.
I've made some benchmarks to demonstrate it. Here is an implementation of array-based interpreter:
type opcode =
Add of int | Sub of int
| Jmp of int | Phi of (int -> bool) * int * int
| Ret
let compile opcodes =
let instr_array = Array.make (Array.length opcodes) (fun _ data -> data)
in Array.iteri (fun i opcode ->
instr_array.(i) <- match opcode with
| Add x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
| Sub x -> (fun pc data ->
let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
| Jmp pc -> (fun _ data ->
let cont = instr_array.(pc) in cont (pc + 1) data)
| Phi (cond, tbranch, fbranch) ->
(fun _ data ->
let pc = (if cond data then tbranch else fbranch) in
let cont = instr_array.(pc) in
cont pc data)
| Ret -> fun _ data -> data)
opcodes;
instr_array
let code = [|
Phi (((<) 0), 1, 3);
Sub 1;
Jmp 0;
Ret
|]
let () =
let fn = compile code in
let result = fn.(0) 0 500_000_000 in
Printf.printf "%d\n" result
Let's see how it compares to the CPS-based interpreter above (with all debug tracing stripped, of course). I used OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.
array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20
So even in tight loop CPS performs considerably better than array. If we unroll loop and replace one sub instruction with five, figures change:
array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309
It's interesting that both implementations actually beat OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:
for i = 500_000_000 downto 0 do () done
You should not redefine op_array, you should fill it in with instructions by modifying it in place so that it's the same op_array that your functions already refer to. Unfortunately, you can't change the size of an array dynamically in OCaml.
I see two solutions:
1) if you don't need to change the sequence of "instructions", define them in a mutual recursion with the array op_array. OCaml allows mutually recursive functions and values that start with the application of a constructor to be defined. Something like:
let rec op_plus pc data = ...
and op_array = [| ... |]
2) Or use an additional indirection: make op_array a reference to an array of instructions, and refer in the functions to (!op_array).(pc+1). Later, after you have defined all the instructions, you can make op_array point to an array of the right size, full of the instructions you intend.
let op_array = ref [| |] ;;
let op_plus pc data = ... ;;
op_array := [| ... |] ;;
One more option (if the size is known beforehand) - initially fill the array with void instructions :
let op_array = Array.create size (fun _ _ -> assert false)
let op_plus = ...
let () = op_array.(0) <- op_plus; ...

Suppress exhaustive matching warning in OCaml

I'm having a problem in fixing a warning that OCaml compiler gives to me.
Basically I'm parsing an expression that can be composed by Bool, Int and Float.
I have a symbol table that tracks all the symbols declared with their type:
type ast_type = Bool | Int | Float
and variables = (string, int*ast_type) Hashtbl.t;
where int is the index used later in the array of all variables.
I have then a concrete type representing the value in a variable:
type value =
| BOOL of bool
| INT of int
| FLOAT of float
| UNSET
and var_values = value array
I'm trying to define the behaviour of a variable reference inside a boolean expression so what I do is
check that the variable is declared
check that the variable has type bool
to do this I have this code (s is the name of the variable):
| GVar s ->
begin
try
let (i,t) = Hashtbl.find variables s in
if (t != Bool) then
raise (SemanticException (BoolExpected,s))
else
(fun s -> let BOOL v = Array.get var_values i in v)
with
Not_found -> raise (SemanticException (VarUndefined,s))
end
The problem is that my checks assure that the element taken from var_values will be of type BOOL of bool but of course this constraint isn't seen by the compiler that warns me:
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
(FLOAT _ |INT _ |UNSET)
How am I supposed to solve this kind of issues? Thanks in advance
This is a problem that you can solve using OCaml's polymorphic variants.
Here is some compilable OCaml code that I infer exhibits your problem:
type ast_type = Bool | Int | Float
and variables = (string, int*ast_type) Hashtbl.t
type value =
| BOOL of bool
| INT of int
| FLOAT of float
| UNSET
and var_values = value array
type expr = GVar of string
type exceptioninfo = BoolExpected | VarUndefined
exception SemanticException of exceptioninfo * string
let variables = Hashtbl.create 13
let var_values = Array.create 13 (BOOL false)
let f e =
match e with
| GVar s ->
begin
try
let (i,t) = Hashtbl.find variables s in
if (t != Bool) then
raise (SemanticException (BoolExpected,s))
else
(fun s -> let BOOL v = Array.get var_values i in v)
with
Not_found -> raise (SemanticException (VarUndefined,s))
end
It generates the warning:
File "t.ml", line 30, characters 42-48:
Warning P: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
(FLOAT _|INT _|UNSET)
Here is the same code transformed to use polymorphic variants. That code compiles without warnings. Note that polymorphic variants have more expressive power than standard types (here allowing to express that var_values is an array of BOOL only), but they can lead to puzzling warnings.
type ast_type = Bool | Int | Float
and variables = (string, int*ast_type) Hashtbl.t
type value =
[ `BOOL of bool
| `INT of int
| `FLOAT of float
| `UNSET ]
and var_values = value array
type expr = GVar of string
type exceptioninfo = BoolExpected | VarUndefined
exception SemanticException of exceptioninfo * string
let variables = Hashtbl.create 13
let var_values = Array.create 13 (`BOOL false)
let f e =
match e with
| GVar s ->
begin
try
let (i,t) = Hashtbl.find variables s in
if (t != Bool) then
raise (SemanticException (BoolExpected,s))
else
(fun s -> let `BOOL v = Array.get var_values i in v)
with
Not_found -> raise (SemanticException (VarUndefined,s))
end
Here are the types inferred by OCaml on the above code:
type ast_type = Bool | Int | Float
and variables = (string, int * ast_type) Hashtbl.t
type value = [ `BOOL of bool | `FLOAT of float | `INT of int | `UNSET ]
and var_values = value array
type expr = GVar of string
type exceptioninfo = BoolExpected | VarUndefined
exception SemanticException of exceptioninfo * string
val variables : (string, int * ast_type) Hashtbl.t
val var_values : [ `BOOL of bool ] array
val f : expr -> 'a -> bool
Take a look at this and search for "disable warnings". You should come to a flag -w.
If you want to fix it the "ocamlish" way, then I think you must make the pattern match exhaustive, i.e. cover all cases that might occur.
But if you don't want to match against all possible values, you might consider using wildcard (see here), that covers all cases you do not want to handle explicitly.
In this particular case, polymorphic variants, as explained by Pascal, are a good answer.
Sometimes, however, you're stuck with an impossible case. Then I find it natural to write
(fun s -> match Array.get var_values i with
| BOOL v -> v
| _ -> assert false)
This is much better than using the -w p flag which could hide other, undesired non-exhaustive pattern matches.
Whoops! Misread your question. Leaving my answer below for posterity.
Updated answer: is there a reason why you are doing the check in the hashtbl, or why you can't have the concrete data types (type value) in the hashtbl? That would simplify things. As it is, you can move the check for bool to the Array.get and use a closure:
| GVar s ->
begin
try
let (i,_) = Hashtbl.find variables s in
match (Array.get var_values i) with BOOL(v) -> (fun s -> v)
| _ -> raise (SemanticException (BoolExpected,s))
with
Not_found -> raise (SemanticException (VarUndefined,s))
end
Alternatively I think it would make more sense to simplify your code. Move the values into the Hashtbl instead of having a type, an index and an array of values. Or just store the index in the Hashtbl and check the type in the array.
INCORRECT ANSWER BELOW:
You can replace the if else with a match. Or you can replace the let with a match:
replace if/else:
| GVar s ->
begin
try
let (i,t) = Hashtbl.find variables s in
match t with Bool -> (fun s -> let BOOL v = Array.get var_values i in v)
| _ -> raise (SemanticException (BoolExpected,s))
with
Not_found -> raise (SemanticException (VarUndefined,s))
end
replace let:
| GVar s ->
begin
try
match (Hashtbl.find variables s) with (i, Bool) -> (fun s -> let BOOL v = Array.get var_values i in v)
| _ -> raise (SemanticException (BoolExpected,s))
with
Not_found -> raise (SemanticException (VarUndefined,s))
end

Resources