I have seen some graphs vertex signatures and even come up with my own:
module type VERTEX = sig
type t
type label
val equal : t -> t -> bool
val create : label -> t
val label : t -> label
end
But I have completely no idea how to implement it as a module. What types should t and label be? How can I create a t based on a label? And how do I get the label from a t?
I'm an author of Graphlib, so I can't pass by as this question hits me directly into my heart. Honestly, I was asked this question millions of times offline and never was able to provide a good answer.
The real problem is that the graph interfaces from the OCamlGraph library are all messed up. We started Graphlib as an attempt to fix them. However, OCamlGraph is a valuable repository of Graph algorithms, thus we have constrained ourselves to be compatible with the OCamlGraph interface. The main problem for us was and still is this Vertex interface that basically establishes a bijection between the set of labels and the set of nodes. People usually stumble on this, as this doesn't make sense - why do we need two different types, one for the label and another for the vertex, if they are the same?
Indeed, the simplest implementation of the VERTEX interface is the following module
module Int : VERTEX with type label = int = struct
type t = int
type label = int
let create x = x
let label x = x
end
In that case, we indeed have a trivial bijection (via the identity endofunctor) between the set of labels and the set of vertices.
However, the deeper look, shows us that a signature
val create : label -> t
val label : t -> label
Is not really a bijection, as the bijection is a one-to-one mapping. It is not really required or enforced by the type system. For example, the create function could be a surjection of label onto t, where label is some distinctive element of a family of vertices. Correspondingly, the label function, could be a forgetting functor that returns the distinctive label and forgetting everything else.
Given this approach, we can have another implementation:
module Labeled = struct
type label = int
type t = {
label : label;
data : "";
}
let create label = {label; data = ""}
let label n = n.label
let data n = n.data
let with_data n data = {n with data}
let compare x y = compare x.label y.label
end
In that implementation, we use the label as an identity of a node, and arbitrary attribute can be attached to a node. In this interpretation, the create function partitions all sets of nodes into a set of equivalence classes, where all members of a class, share the same identity, i.e., they represent the same real-world entity in different points of time or space. For example,
type color = Red | Yellow | Green
module TrafficLight = struct
type label = int
type t = {
id : label;
color : color
}
let create id = {id; color=Red}
let label t = t.id
let compare x y = compare x.id y.id
let switch t color = {t with color}
let color t = t.color
end
In this model, we represent a traffic light with its id number. The color attribute doesn't affect an identity of a traffic light (if a traffic light switches to another color it is still the same traffic light, although in a functional programming language it is represented with two different objects).
The main problem with the above representation is that in all graph textbooks the label is used in the opposite meaning - as an opaque attribute. In a textbook, they will refer to the color of a traffic light as a label. And the node itself will be represented as an int. That's why I'm saying that OCamlGraph interfaces are messed up (and consequently the Graphlib interfaces). So, if you don't want to fall in a contradiction with textbooks, then you should use unlabeled graphs (with int probably is the best representation of a node). And if you need to attach attributes to your nodes, you can use external finite maps, i.e., arrays, maps, associative lists, or any other dictionaries. Otherwise, you need to keep in mind that your label is not a label, but vice verse - the node.
With all this said, let's specify a better interface for a graph vertex:
module type VERTEX = sig
type id
type label
type t
val create : id -> t
val id : t -> id
val label : t -> label
val with_label : t -> label -> label
end
The proposed interface is compatible with your interface (and thus with the OCamlGraph), as it is isomorphic modulo renaming (i.e., we renamed label to id). It also allows us to create efficient unlabeled nodes, where id = t, as well as attach arbitrary information to a node without relying on external mappings.
Implementing a module based on a signature is like a mini puzzle. Here's how I would analyze it:
The first remark I have when reading that signature, is that there is no way in that signature to build values of type label. So, our implementation will need to be a bit larger, maybe by specifying type label = string.
Now, we have:
val create : label -> t
val label : t -> label
Which is a bijection (the types are "equivalent"). The simplest way to implement that is by defining type t = label, so that it's really only one type, but from the exterior of the module you don't know that.
The rest is
type t
val equal: t -> t -> bool
We said that label = string, and t = label. So t = string, and equal is the string equality.
Boom! here we are:
module String_vertex : VERTEX with type label = string = struct
type label = string
type t = string
let equal = String.equal
let create x = x
let label x = x
end
The VERTEX with type label = string part is just if you want to define it in the same file. Otherwise, you can do something like:
(* string_vertex.ml *)
type label = string
type t = string
let equal = String.equal
let create x = x
let label x = x
and any functor F that takes a VERTEX can be called with F(String_vertex).
It would be best practice to create string_vertex.mli with contents include VERTEX with type label = string, though.
Related
Maybe you can help. I'm an Elm beginner and I'm struggling with a rather mundane problem. I'm quite excited with Elm and I've been rather successful with smaller things, so now I tried something more complex but I just can't seem to get my head around it.
I'm trying to build something in Elm that uses a graph-like underlying data structure. I create the graph with a fluent/factory pattern like this:
sample : Result String MyThing
sample =
MyThing.empty
|> addNode 1 "bobble"
|> addNode 2 "why not"
|> addEdge 1 2 "some data here too"
When this code returns Ok MyThing, then the whole graph has been set up in a consistent manner, guaranteed, i.e. all nodes and edges have the required data and the edges for all nodes actually exist.
The actual code has more complex data associated with the nodes and edges but that doesn't matter for the question. Internally, the nodes and edges are stored in the Dict Int element.
type alias MyThing =
{ nodes : Dict Int String
, edges : Dict Int { from : Int, to : Int, label : String }
}
Now, in the users of the module, I want to access the various elements of the graph. But whenever I access one of the nodes or edges with Dict.get, I get a Maybe. That's rather inconvenient because by the virtue of my constructor code I know the indexes exist etc. I don't want to clutter upstream code with Maybe and Result when I know the indexes in an edge exist. To give an example:
getNodeTexts : Edge -> MyThing -> Maybe (String, String)
getNodeTexts edge thing =
case Dict.get edge.from thing.nodes of
Nothing ->
--Yeah, actually this can never happen...
Nothing
Just fromNode -> case Dict.get edge.to thing.nodes of
Nothing ->
--Again, this can never actually happen because the builder code prevents it.
Nothing
Just toNode ->
Just ( fromNode.label, toNode.label )
That's just a lot of boilerplate code to handle something I specifically prevented in the factory code. But what's even worse: Now the consumer needs extra boilerplate code to handle the Maybe--potentially not knowing that the Maybe will actually never be Nothing. The API is sort of lying to the consumer. Isn't that something Elm tries to avoid? Compare to the hypothetical but incorrect:
getNodeTexts : Edge -> MyThing -> (String, String)
getNodeTexts edge thing =
( Dict.get edge.from thing.nodes |> .label
, Dict.get edge.to thing.nodes |> .label
)
An alternative would be not to use Int IDs but use the actual data instead--but then updating things gets very tedious as connectors can have many edges. Managing state without the decoupling through Ints just doesn't seem like a good idea.
I feel there must be a solution to this dilemma using opaque ID types but I just don't see it. I would be very grateful for any pointers.
Note: I've also tried to use both drathier and elm-community elm-graph libraries but they don't address the specific question. They rely on Dict underneath as well, so I end up with the same Maybes.
There is no easy answer to your question. I can offer one comment and a coding suggestion.
You use the magic words "impossible state" but as OOBalance has pointed out, you can create an impossible state in your modelling. The normal meaning of "impossible state" in Elm is precisely in relation to modelling e.g. when you use two Bools to represent 3 possible states. In Elm you can use a custom type for this and not leave one combination of bools in your code.
As for your code, you can reduce its length (and perhaps complexity) with
getNodeTexts : Edge -> MyThing -> Maybe ( String, String )
getNodeTexts edge thing =
Maybe.map2 (\ n1 n2 -> ( n1.label, n2.label ))
(Dict.get edge.from thing.nodes)
(Dict.get edge.to thing.nodes)
From your description, it looks to me like those states actually aren't impossible.
Let's start with your definition of MyThing:
type alias MyThing =
{ nodes : Dict Int String
, edges : Dict Int { from : Int, to : Int, label : String }
}
This is a type alias, not a type – meaning the compiler will accept MyThing in place of {nodes : Dict Int String, edges : Dict Int {from : Int, to : Int, label : String}} and vice-versa.
So rather than construct a MyThing value safely using your factory functions, I can write:
import Dict
myThing = { nodes = Dict.empty, edges = Dict.fromList [(0, {from = 0, to = 1, label = "Edge 0"})] }
… and then pass myThing to any of your functions expecting MyThing, even though the nodes connected by Edge 0 aren't contained in myThing.nodes.
You can fix this by changing MyThing to be a custom type:
type MyThing
= MyThing { nodes : Dict Int String
, edges : Dict Int { from : Int, to : Int, label : String }
}
… and exposing it using exposing (MyThing) rather than exposing (MyThing(..)). That way, no constructor for MyThing is exposed, and code outside of your module must use the factory functions to obtain a value.
The same applies to Edge, wich I'm assuming is defined as:
type alias Edge =
{ from : Int, to : Int, label : String }
Unless it is changed to a custom type, it is trivial to construct arbitrary Edge values:
type Edge
= Edge { from : Int, to : Int, label : String }
Then however, you will need to expose some functions to obtain Edge values to pass to functions like getNodeTexts. Let's assume I have obtained a MyThing and one of its edges:
myThing : MyThing
-- created using factory functions
edge : Edge
-- an edge of myThing
Now I create another MyThing value, and pass it to getNodeTexts along with edge:
myOtherThing : MyThing
-- a different value of type MyThing
nodeTexts = getNodeTexts edge myOtherThing
This should return Maybe.Nothing or Result.Err String, but certainly not (String, String) – the edge does not belong to myOtherThing, so there is no guarantee its nodes are contained in it.
I'm doing exercises of "Modern Compiler Implementation in ML" (Andrew Appel). One of which (ex 1.1 d) is to recommend a balanced-tree data structure for functional symbol table. Appeal mentioned such data structure should rebalance on insertion but not on lookup. Being totally new to functional programming, I found this confusing. What is key insight on this requirement?
A tree that’s rebalanced on every insertion and deletion doesn’t need to rebalance on lookup, because lookup doesn’t modify the structure. If it was balanced before a lookup, it will stay balanced during and after.
In functional languages, insertion and rebalancing can be more expensive than in a procedural one. Because you can’t alter any node in place, you replace a node by creating a new node, then replacing its parent with a new node whose children are the new daughter and the unaltered older daughter, and then replace the grandparent node with one whose children are the new parent and her older sister, and so on up. You finish when you create a new root node for the updated tree and garbage-collect all the nodes you replaced. However, some tree structures have the desirable property that they need to replace no more than O(log N) nodes of the tree on an insertion and can re-use the rest. This means that the rotation of a red-black tree (for example) has not much more overhead than an unbalanced insertion.
Also, you will typically need to query a symbol table much more often than you update it. It therefore becomes less tempting to try to make insertion faster: if you’re inserting, you might as well rebalance.
The question of which self-balancing tree structure is best for a functional language has been asked here, more than once.
Since Davislor already answered your question extensively, here are mostly some implementation hints. I would add that choice of data structure for your symbol table is probably not relevant for a toy compiler. Compilation time only starts to become an issue when you compiler is used on a lot of code and the code is recompiled often.
Sticking to a O(n) insert/lookup data structure is fine in practice until it isn't.
Signature-wise, all you want is a key-value mapping, insert, and lookup:
signature SymTab =
sig
type id
type value
type symtab
val empty : symtab
val insert : id -> value -> symtab -> symtab
val lookup : id -> symtab -> value option
end
A simple O(n) implementation with lists might be:
structure ListSymTab : SymTab =
struct
type id = string
type value = int
type symtab = (id * value) list
val empty = []
fun insert id value [] = [(id, value)]
| insert id value ((id',value')::symtab) =
if id = id'
then (id,value)::symtab
else (id',value')::insert id value symtab
fun lookup _ [] = NONE
| lookup id ((id',value)::symtab) =
if id = id' then SOME value else lookup id symtab
end
You might use it like:
- ListSymTab.lookup "hello" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = SOME 42 : int option
Then again, maybe your symbol table doesn't map strings to integers, or you may have one symbol table for variables and one for functions.
You could parameterise the id/value types using a functor:
functor ListSymTabFn (X : sig
eqtype id
type value
end) : SymTab =
struct
type id = X.id
type value = X.value
(* The rest is the same as ListSymTab. *)
end
And you might use it like:
- structure ListSymTab = ListSymTabFn(struct type id = string type value = int end);
- ListSymTab.lookup "world" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = NONE : int option
All you need for a list-based symbol table is that the identifiers/symbols can be compared for equality. For your balanced-tree symbol table, you need identifiers/symbols to be orderable.
Instead of implementing balanced trees from scratch, look e.g. at SML/NJ's RedBlackMapFn:
To create a structure implementing maps (dictionaries) over a type T [...]:
structure MapT = RedBlackMapFn (struct
type ord_key = T
val compare = compareT
end)
Try this example with T as string and compare as String.compare:
$ sml
Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014]
- structure MapS = RedBlackMapFn (struct
type ord_key = string
val compare = String.compare
end);
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-LIB/Util/smlnj-lib.cm is stable]
[autoloading done]
structure MapS : ORD_MAP?
- open MapS;
...
Opening the structure is an easy way to explore the available functions and their types.
We can then create a similar functor to ListSymTabFn, but one that takes an additional compare function:
functor RedBlackSymTabFn (X : sig
type id
type value
val compare : id * id -> order
end) : SymTab =
struct
type id = X.id
type value = X.value
structure SymTabX = RedBlackMapFn (struct
type ord_key = X.id
val compare = X.compare
end)
(* The 'a map type inside SymTabX maps X.id to anything. *)
(* We are, however, only interested in mapping to values. *)
type symtab = value SymTabX.map
(* Use other stuff in SymTabT for empty, insert, lookup. *)
end
Finally, you can use this as your symbol table:
structure SymTab = RedBlackSymTabFn(struct
type id = string
type value = int
val compare = String.compare
end);
I am implementing binary search trees in OCaml, trying to use as much imperative programming as possible.
I have the following data type:
type tKey = Key of int;;
type tBST = Null | Pos of node ref
and node = {mutable key : tKey; mutable left : tBST; mutable right : tBST};;
I am having trouble with this function:
let createNode k tree =
tree := Pos ({key = k; left = Null; right = Null});;
Error: This record expression is expected to have type node ref
The field key does not belong to type ref
A binary search tree can be either Null (means empty tree) or a Pos. A tree Pos is a pointer to a node, and a node is a structure of a key and 2 other trees (left and right).
My main goal here is to have a tree that is modified after functions are over. Passing tree by reference so when createNode is over, the tBST I passed as parameter is modified.
Question: is actually possible to do what I am trying in OCaml? if so, how could I change my function createNode and/or data type to make this happen?
Thank you very much.
It is possible, but you need to create the Pos node with a reference explicitly:
Pos (ref {key = k; (*...*)})
Whether what you are trying to do is recommended practice in a language like Ocaml is a different story, though.
The question has already been answered. I would just like to add a side note: The use of ref seems superfluous in this case.
A value of type tBST is either Null or a mutable pointer. If it is Null it will remain Null. If it is non-Null, it will remain non-Null, but the actual pointer might change. That might well be what you intended, but I have my doubts. In particular, what tBST does not do, is to emulate C-style pointers (which are either null or really point somewhere). I suspect, though, that that was your intention.
The idiomatic way to emulate C-style pointers is to just use the built-in option type, like so:
type tBST = node option
A value of type node option is either None or Some n, where n is a pointer to a value of type node. You use tBST for mutable fields (of the record node), so you would effectively have mutable C-style pointers to nodes.
Here is what you probably had in mind:
type tree = node option ref
and node = {
mutable left: tree;
mutable key: int;
mutable right: tree;
};;
let t0 : tree = ref None;;
let t1 : tree = ref (Some { left = ref None; key = 1; right = ref None; }) ;;
let create_node key tree =
tree := Some { left = ref None; key; right = ref None; }
No need to have a separate type for key but you can if you want it, and with the latest OCaml there no runtime overhead for it.
If I want to change a value on a list, I will return a new list with the new value instead of changing the value on the old list.
Now I have four types. I need to update the value location in varEnd, instead of changing the value, I need to return a new type with the update value
type varEnd = {
v: ctype;
k: varkind;
l: location;
}
;;
type varStart = {
ct: ctype;
sy: sTable;
n: int;
stm: stmt list;
e: expr
}
and sEntry = Var of varEnd | Fun of varStart
and sTable = (string * sEntry) list
type environment = sTable list;;
(a function where environment is the only parameter i can use)
let allocateMem (env:environment) : environment =
I tried to use List.iter, but it changes the value directly, which type is also not mutable. I think List.fold will be a better option.
The biggest issue i have is there are four different types.
I think you're saying that you know how to change an element of a list by constructing a new list.
Now you want to do this to an environment, and an environment is a list of quite complicated things. But this doesn't make any difference, the way to change the list is the same. The only difference is that the replacement value will be a complicated thing.
I don't know what you mean when you say you have four types. I see a lot more than four types listed here. But on the other hand, an environment seems to contain things of basically two different types.
Maybe (but possibly not) you're saying you don't know a good way to change just one of the four fields of a record while leaving the others the same. This is something for which there's a good answer. Assume that x is something of type varEnd. Then you can say:
{ x with l = loc }
If, in fact, you don't know how to modify an element of a list by creating a new list, then that's the thing to figure out first. You can do it with a fold, but in fact you can also do it with List.map, which is a little simpler. You can't do it with List.iter.
Update
Assume we have a record type like this:
type r = { a: int; b: float; }
Here's a function that takes r list list and adds 1.0 to the b fields of those records whose a fields are 0.
let incr_ll rll =
let f r = if r.a = 0 then { r with b = r.b +. 1.0 } else r in
List.map (List.map f) rll
The type of this function is r list list -> r list list.
How to create a Map that I will have a key: (int * int) and when it comes to key it is my_own_type ?
Here's a small example:
module IPMap = Map.Make(struct type t = int * int let compare = compare end)
let mymap = IPMap.add (0, 0) (my_value : my_own_type) IPMap.empty
let mymap' = IPMap.add (1, 2) (t: my_own_type) mymap
Note: you don't have to write (t: my_own_type). You can just write t. I'm including it just for emphasis.
When you create a map module like IPMap, you only need to specify the type of the keys. You can have as many different maps with different value types as you like.
Note 2: OCaml maps are immutable. I worry that you haven't fully grappled with this issue yet. (Apologies if I'm wrong.)