How to collapse a recursive tree in OCaml - functional-programming

I have a tree type:
type tree = Vertex of int * tree list;;
My recursive equality definition is that two trees are equal if their ints are equal and all of their children are equal.
How do I build the function
topo: tree -> tree list
that creates a list of all of the trees in depth first search order with each tree appearing once and only once (according to the equality definition)? I want to do this in a computationally efficient way. Maybe use lazy or a hashmap?
Here is my attempt, the code blows up when the length is too large:
type tree = Vertex of int * (tree list)
let rec base = function
| 0 -> Vertex (0, [])
| i -> Vertex (i, [base (i - 1)])
let rec range = function
| 0 -> [0]
| i -> i :: range (i - 1)
let agg i = Vertex (-1, List.map base (range i))
let rec equals (a: tree) (b: tree) : bool =
let rec deep_match a_dep b_dep = match a_dep, b_dep with
| [], [] -> true
| [], _
| _, [] -> false
| x::xs, y::ys -> equals x y && deep_match xs ys
in
let Vertex (ai, al) = a in
let Vertex (bi, bl) = b in
ai = bi && deep_match al bl
let rec in_list (a: tree) (l: tree list) : bool = match l with
| [] -> false
| hd::tl -> equals a hd || in_list a tl
let rec topological (pool: tree list) (t: tree) : tree list =
if in_list t pool then pool else
t::match t with
| Vertex(_, []) -> pool
| Vertex(_, deps) -> List.fold_left topological pool deps
let big_agg = agg 100_000
let topo_ordered = topological [] big_agg;;
Printf.printf "len %i\n" (List.length topo_ordered)

To make it efficient you need to implement ordering and hash-consing. With total ordering, you can store your trees in a balanced tree or even a hashtable, thus turning your in_list into O(logN) or even O(1). Adding hash-consing will enable O(1) comparison of your trees (at the cost of less efficient tree construction).
Instead of having both, depending on your design constraints, you can have only one. For didactic purposes, let's implement hash-consing for your particular representation.
To implement hash-consing you need to make your constructor private and hide data constructors behind an abstraction wall (to prevent users from breaking your hash-consing properties):
module Tree : sig
type t = private Vertex of int * t list
val create : int -> t list -> t
val equal : t -> t -> bool
end = struct
type t = Vertex of int * t list
let repository = Hashtbl.create 64
let create n children =
let node = Vertex (n,children) in
try Hashtbl.find repository node
with Not_found -> Hashtbl.add repository node node; node
let equal x y = x == y
end
Since we guaranteed that structurally equal trees are physically equal during the tree creation (i.e., if there exists an equal tree in our repository then we return it), we are now able to substitute structural equality with physical equality, i.e., with pointer comparison.
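As a quick check of that claim, here is a minimal sketch that repeats the module above (using Hashtbl.find_opt instead of the exception, which is only a stylistic assumption) and builds the same tree twice. The names leaf1, leaf2, a and b are purely illustrative.

```ocaml
module Tree : sig
  type t = private Vertex of int * t list
  val create : int -> t list -> t
  val equal : t -> t -> bool
end = struct
  type t = Vertex of int * t list

  (* Every tree built so far, keyed by its own structure. *)
  let repository : (t, t) Hashtbl.t = Hashtbl.create 64

  let create n children =
    let node = Vertex (n, children) in
    match Hashtbl.find_opt repository node with
    | Some shared -> shared                      (* reuse the stored copy *)
    | None -> Hashtbl.add repository node node; node

  (* Pointer comparison is enough once sharing is guaranteed. *)
  let equal x y = x == y
end

(* Two independent constructions of structurally equal trees. *)
let leaf1 = Tree.create 0 []
let leaf2 = Tree.create 0 []
let a = Tree.create 1 [leaf1]
let b = Tree.create 1 [leaf2]

let () =
  Printf.printf "leaves shared: %b, parents shared: %b\n"
    (leaf1 == leaf2) (Tree.equal a b)
```

Because the type is private, code outside the module can still pattern match on Vertex but cannot build one directly, so every tree has to go through create.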
The fast comparison comes at a price: we are now leaking memory, since we need to store every tree ever created, and the create function is now O(N). We can alleviate the first problem by using ephemerons, but the second will persist, of course.
Another issue is that we cannot put our trees into an ordered structure such as a map or a set. We could of course use the regular polymorphic compare, but since it is O(N), inserting into such a structure would become quadratic. That is not an option for us, so we need to add a total ordering on our trees. In theory we can do this without changing the representation (using ephemerons), but it is easier to add an order parameter to the tree representation, e.g.,
module Tree : sig
type order (* = private int *) (* add this for debugging *)
type t = private Vertex of order * int * t list
val create : int -> t list -> t
val equal : t -> t -> bool
val compare : t -> t -> int
end = struct
type order = int
type t = Vertex of order * int * t list
type tree = t
module Repository = Hashtbl.Make(struct
type t = tree
let max_hash = 16
let rec equal (Vertex (_,p1,x)) (Vertex (_,p2,y)) =
match compare p1 p2 with
| 0 -> equal_trees x y
| _ -> false
and equal_trees xs ys = match xs, ys with
| [],[] -> true
| [],_ | _,[] -> false
| x :: xs, y::ys -> equal x y && equal_trees xs ys
let rec hash (Vertex (_,p,xs)) =
hash_trees (Hashtbl.hash p) max_hash xs
and hash_trees hash depth = function
| x :: xs when depth > 0 ->
(* mix each child into the accumulated hash instead of discarding it *)
hash_trees (Hashtbl.hash (hash, Hashtbl.hash x)) (depth-1) xs
| _ -> hash
end)
let repository = Repository.create 64
let create n children =
try Repository.find repository (Vertex (0,n,children))
with Not_found ->
let order = Repository.length repository + 1 in
let node = Vertex (order,n,children) in
Repository.add repository node node; node
let equal x y = x == y
let order (Vertex (order,_,_)) = order
let compare x y = compare (order x) (order y)
end
We had to manually implement the structural variants of equal and hash for our trees because we need to ignore the order field in the comparison when we store a new tree in the repository. It looks like a bit of work, but in real life you can do this using derivers.
Anyway, we now have a comparable version of the tree whose comparison function is O(1), so we can put our trees in sets and maps, and implement your topo efficiently.
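One way topo could look on top of such a module is sketched below. This is not the exact module above but a simplified variant written under stated assumptions: order is a plain int, children are compared by pointer (they are already hash-consed), and the hash mixes in the children's order numbers. The names Seen, topo, base and agg are illustrative.

```ocaml
module Tree : sig
  type t = private Vertex of int * int * t list  (* order, label, children *)
  val create : int -> t list -> t
  val compare : t -> t -> int
end = struct
  type t = Vertex of int * int * t list
  type tree = t

  module Repository = Hashtbl.Make (struct
    type t = tree
    (* Children were built by [create], so equal children are the same pointer. *)
    let equal (Vertex (_, p1, xs)) (Vertex (_, p2, ys)) =
      p1 = p2 && List.compare_lengths xs ys = 0 && List.for_all2 (==) xs ys
    (* The order number of a child acts as its unique id. *)
    let hash (Vertex (_, p, xs)) =
      List.fold_left (fun h (Vertex (o, _, _)) -> Hashtbl.hash (h, o))
        (Hashtbl.hash p) xs
  end)

  let repository : tree Repository.t = Repository.create 64

  let create n children =
    match Repository.find_opt repository (Vertex (0, n, children)) with
    | Some shared -> shared
    | None ->
      let node = Vertex (Repository.length repository + 1, n, children) in
      Repository.add repository node node;
      node

  let order (Vertex (o, _, _)) = o
  let compare x y = compare (order x) (order y)
end

module Seen = Set.Make (Tree)

(* Depth-first walk; a tree is emitted the first time it is met. *)
let topo (root : Tree.t) : Tree.t list =
  let rec visit (seen, acc) t =
    if Seen.mem t seen then (seen, acc)
    else
      let (Tree.Vertex (_, _, children)) = t in
      List.fold_left visit (Seen.add t seen, t :: acc) children
  in
  let _, acc = visit (Seen.empty, []) root in
  List.rev acc

(* The shapes from the question, built through the hash-consing constructor. *)
let rec base i = Tree.create i (if i = 0 then [] else [base (i - 1)])
let agg n = Tree.create (-1) (List.init (n + 1) (fun i -> base (n - i)))

let root = agg 200
let ordered = topo root
let () = Printf.printf "len %i\n" (List.length ordered)
```

Each membership test costs O(log N) integer comparisons, so the traversal itself is no longer quadratic. The recursion is still as deep as the tree, which is an accepted limitation of this sketch.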
A nice feature of both implementations is a tight representation of a tree, since sharing is guaranteed by the create function. E.g.,
# let t1 = Tree.create 42 [];;
val t1 : Tree.t = Tree.Vertex (1, 42, [])
# let t3 = Tree.create 42 [t1; t1];;
val t3 : Tree.t =
Tree.Vertex (2, 42, [Tree.Vertex (1, 42, []); Tree.Vertex (1, 42, [])])
# let t5 = Tree.create 42 [t1; t3; t1];;
val t5 : Tree.t =
Tree.Vertex (3, 42,
[Tree.Vertex (1, 42, []);
Tree.Vertex (2, 42, [Tree.Vertex (1, 42, []); Tree.Vertex (1, 42, [])]);
Tree.Vertex (1, 42, [])])
#
In this example, t1 in t5 and t3 will be the same pointer.

For optimal performance, one possibility would be to use hash-consing. However, in your current example, both the generation and the uniqueness test are quadratic in n. Fixing both points already seems to improve performance a lot.
First, we can avoid the quadratic tree generation by adding a lot of sharing:
let range max =
let rec range elt l n =
if n > max then elt::l
else
let next = Vertex(n,[elt]) in
range next (elt::l) (n+1) in
range (Vertex(0,[])) [] 1
let agg i = Vertex (-1, range i)
With this change, it becomes reasonable to generate a tree with 10^10 elements (but only 10^5 unique elements).
Then, the uniqueness test can be done with a set (or a hashtable):
module S = Set.Make(struct type t = tree let compare = compare end)
let rec topological (set, pool) t =
if S.mem t set then (set, pool) else
let set = S.add t set in
let set, pool =
match t with
| Vertex(_, []) -> set, pool
| Vertex(_, deps) -> List.fold_left topological (set,pool) deps in
set, t::pool
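The hashtable alternative mentioned above could look roughly like the following sketch. It relies on the polymorphic Hashtbl from the standard library and on the sharing introduced by range, so that equality checks between identical subtrees stop at the first shared pointer. The names topological_h and unique_count are illustrative and not part of the answer.

```ocaml
type tree = Vertex of int * tree list

(* Shared construction: each chain reuses the previous one. *)
let range max =
  let rec range elt l n =
    if n > max then elt :: l
    else range (Vertex (n, [elt])) (elt :: l) (n + 1)
  in
  range (Vertex (0, [])) [] 1

let agg i = Vertex (-1, range i)

(* Same traversal as with the set, but uniqueness is tracked in a
   mutable hashtable keyed by the trees themselves. *)
let topological_h (root : tree) : tree list =
  let seen = Hashtbl.create 1024 in
  let rec visit pool t =
    if Hashtbl.mem seen t then pool
    else begin
      Hashtbl.add seen t ();
      let (Vertex (_, deps)) = t in
      (* children first, then the tree itself in front of them *)
      t :: List.fold_left visit pool deps
    end
  in
  visit [] root

let unique_count n = List.length (topological_h (agg n))

let () = Printf.printf "len %i\n" (unique_count 1000)
```

Hashtbl.hash only inspects a bounded number of nodes, so hashing a key stays cheap even for very deep trees; the price is more collisions when many trees share the same first few labels.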

Related

create a register

I have to create a tree type that will be used to store words: every node of the tree holds a letter and the list of the next characters (so words with the same root share the same "part/branch" of the tree). The tree is basically an n-ary one, used as a dictionary.
All of this using the Caml language.
Well, I don't know if it's homework or not, but I'll still answer:
First, we need to define a signature type for letters.
module type LS = sig
type t
val compare : t -> t -> int
end
Then, we need to define our structure :
module Make (L : LS) = struct
module M = Map.Make(L)
type elt = L.t list
type t = { word : bool; branches : t M.t }
let empty = { word = false; branches = M.empty }
let is_empty t = not t.word && M.is_empty t.branches
let rec mem x t =
match x with
| [] -> t.word
| c :: cl -> try mem cl (M.find c t.branches)
with Not_found -> false
let rec add x t =
match x with
| [] -> if t.word then t else { t with word = true }
| c :: cl ->
let b = try M.find c t.branches with Not_found -> empty in
{ t with branches = M.add c (add cl b) t.branches }
end
Now, step by step :
module Make (L : LS) = struct is a functor that will return a new module if we give it a module of type LS as an argument
module M = Map.Make(L)
type elt = L.t list
type t = { word : bool; branches : t M.t }
This is the complex point; once you have it, everything becomes clear. We need to represent a node (as you can see in the Wikipedia page on tries). My representation is this: a node is
a truth value stating that this node represents a word (which means that all the letters from the root to this node form a word)
the branches that go from it. To represent these branches, I need a dictionary and luckily there's a Map functor in OCaml. So, my field branches associates some letters with a trie (which is why I wrote branches : t M.t). An element is then a list of letters, and you'll find out why I chose this type rather than a string.
let empty = { word = false; branches = M.empty } the empty trie is the record with no branches (so, just the root), and this root is not a word (so word = false) (same idea for is_empty)
let rec mem x t =
match x with
| [] -> t.word
| c :: cl -> try mem cl (M.find c t.branches)
with Not_found -> false
Here it becomes interesting. My word being a list of letters, if I want to know whether a word is in my trie, I need to write a recursive function going through this list.
If I reached the point where my list is empty it means that I reached a node where the path from the root to it is composed by all the letters of my word. I just need to know, then, if the value word at this node is true or false.
If I still have at least one letter I need to find the branch corresponding to this letter.
If I find this branch (which will be a trie), I just need to make a recursive call to find the rest of the word (cl) in it
If I don't find it I know that my word doesn't exist in my trie so I can return false.
let rec add x t =
match x with
| [] -> if t.word then t else { t with word = true }
| c :: cl ->
let b = try M.find c t.branches with Not_found -> empty in
{ t with branches = M.add c (add cl b) t.branches }
Same idea. If I want to add a word :
If my list is empty it means that I added all the letters and I've reached the node corresponding to my word. In that case, if word is already true it means that this word was already added, I don't do anything. If word is false I just return the same branch (trie) but with word equal to true.
If my list contains at least one letter c, I look in the current node for the branch corresponding to it (try M.find c t.branches with Not_found -> empty); if there's no such branch, I just use an empty one. Then I recursively add the rest of my letters to this branch and add the resulting branch to the branches of my current node under the letter c (if this branch already existed, it will be replaced since I use a dictionary).
Here, we start with the empty trie and we add the words to, top and tea.
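The three insertions just mentioned can be replayed with a short sketch. It restates the functor in compact form (using M.find_opt rather than exceptions, a stylistic assumption) and instantiates it with the standard Char module. The names CharTrie, letters and trie are illustrative.

```ocaml
module type LS = sig
  type t
  val compare : t -> t -> int
end

module Make (L : LS) = struct
  module M = Map.Make (L)
  type t = { word : bool; branches : t M.t }

  let empty = { word = false; branches = M.empty }

  let rec mem x t =
    match x with
    | [] -> t.word
    | c :: cl ->
      (match M.find_opt c t.branches with
       | Some b -> mem cl b
       | None -> false)

  let rec add x t =
    match x with
    | [] -> if t.word then t else { t with word = true }
    | c :: cl ->
      let b = match M.find_opt c t.branches with Some b -> b | None -> empty in
      { t with branches = M.add c (add cl b) t.branches }
end

(* Char already provides [type t] and [compare]. *)
module CharTrie = Make (Char)

(* Illustrative helper: turn a string into a list of letters. *)
let letters s = List.init (String.length s) (String.get s)

let trie =
  List.fold_left (fun t w -> CharTrie.add (letters w) t)
    CharTrie.empty ["to"; "top"; "tea"]

let () =
  List.iter
    (fun w -> Printf.printf "%s: %b\n" w (CharTrie.mem (letters w) trie))
    ["to"; "top"; "tea"; "te"; "toy"]
```

The prefix te is reachable in the trie but is not flagged as a word, which is exactly what the word field is for.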
In case we don't want to use functors, we can do it this way :
type elt = char list
type t = { word : bool; branches : (char * t) list }
let empty = { word = false; branches = [] }
let is_empty t = not t.word && t.branches = []
let find c l =
let rec aux = function
| [] -> raise Not_found
| (c', t) :: _ when c' = c -> t
| _ :: tl -> aux tl
in aux l
let rec mem x t =
match x with
| [] -> t.word
| c :: cl -> try mem cl (find c t.branches)
with Not_found -> false
let rec add x t =
match x with
| [] -> if t.word then t else { t with word = true }
| c :: cl ->
let b = try find c t.branches with Not_found -> empty in
(* drop any previous binding for c so the list does not accumulate duplicates *)
{ t with branches = (c, add cl b) :: List.remove_assoc c t.branches }

Understanding passing of polymorphic types in Standard ML

I am working on some exercises to help my understanding of SML and find I am having a hard time understanding how generic/polymorphic types are passed into functions.
I am given the following initial information:
datatype 'a tree = Leaf | Node of 'a tree * 'a * 'a tree
val testTree = Node (Node (Node (Leaf, ("a", 107), Leaf), ("c", 417), Node (Leaf, ("e", ~151), Node (Leaf, ("o", ~499), Leaf))), ("s", 35), Node (Leaf, ("u", ~387), Node (Leaf, ("y", 263), Leaf)))
fun nameCompare (n1: name, n2: name) : order = String.compare (n1, n2)
fun treeLookup cmp =
let
fun lkup (x, btree) =
case btree of
Leaf => NONE
| Node (lt, y, rt) =>
(case cmp (x, y) of
LESS => lkup (x, lt)
| EQUAL => SOME y
| GREATER => lkup (x, rt))
in
lkup
end
When I try to call treeLookup I continue to get type matching errors.
For example this is what I may be calling
treeLookup nameCompare ("a", testTree)
and I'll get an error like this
treeLookup nameCompare ("a", testTree);
^^^^^^^^
Type clash: expression of type
(string * int) tree
cannot have type
string tree
What do I need to do in order to satisfy the type of the tree when passing it to treeLookup?
In your tree, 'a is string * int; each element looks like ("a", 107).
treeLookup calls cmp on the value you passed and on the elements of the tree. You passed in nameCompare, which takes two strings, and "a", which is a string. That means the tree is expected to contain only strings.
To solve that you'll probably want to make your tree be a map, effectively comparing only on the first value of the pair:
| Node (lt, (k,v), rt) =>
(case cmp (x, k)
Possibly changing the definition as well:
datatype ('k, 'v) tree = Leaf | Node of ('k, 'v) tree * ('k * 'v) * ('k, 'v) tree
Alternatively, you can change your comparison function to take ('a * 'b), but that means that e.g. you'd need to do treeLookup with an element ("a", 107) which would try to match both fields.
You're comparing a string against an item in the tree, which is a string * int.
You could always change your comparison function; something like
fun nameCompare (n, (k,v)) = String.compare (n, k)
should do.

Check if a tree is a BST using a provided higher order function in OCAML

So let me start by saying this was part of a past homework I couldn't solve but as I am preparing for a test I would like to know how to do this. I have these implementations of map_tree and fold_tree provided by the instructor:
let rec map_tree (f:'a -> 'b) (t:'a tree) : 'b tree =
match t with
| Leaf x -> Leaf (f x)
| Node (x,lt,rt) -> Node (f x,(map_tree f lt),(map_tree f rt))
let fold_tree (f1:'a->'b) (f2:'a->'b->'b->'b) (t:'a tree) : 'b =
let rec aux t =
match t with
| Leaf x -> f1 x
| Node (x,lt,rt) -> f2 x (aux lt) (aux rt)
in aux t
I need to implement a function that verifies a tree is a BST using the above functions, so far this is what I've accomplished and I'm getting the error:
Error: This expression has type bool but an expression was expected of type
'a tree
This is my code:
let rec smaller_than t n : bool =
begin match t with
| Leaf x -> true
| Node(x,lt,rt) -> (x<n) && (smaller_than lt x) && (smaller_than rt x)
end
let rec greater_equal_than t n : bool =
begin match t with
| Leaf x -> true
| Node(x,lt,rt) -> (x>=n) && (greater_equal_than lt x) && (greater_equal_than rt x)
end
let check_bst t =
fold_tree (fun x -> true) (fun x lt rt -> ((check_bst lt) && (check_bst rt)&&(smaller_than lt x)&&(greater_equal_than rt x))) t;;
Any suggestions? I seem to have trouble understanding exactly how higher order functions work in OCaml.
What is the specification of a BST? It's a binary tree where:
all the elements in the left subtree (which is also a BST) are strictly smaller than the value stored at the node
and all the ones in the right subtree (which is also a BST) are greater than or equal to the value stored at the node
A fold is an induction principle: you have to explain how to deal with the base cases (the Leaf case here) and how to combine the results for the subcases in the step cases (the Node case here).
A Leaf is always a BST so the base case is going to be pretty simple. However, in the Node case, we need to make sure that the values are in the right subtrees. To be able to perform this check, we are going to need extra information. The idea is to have a fold computing:
whether the given tree is a BST
and the interval in which all of its values live
Let's introduce type synonyms to structure our thoughts:
type is_bst = bool
type 'a interval = 'a * 'a
As predicted, the base case is easy:
let leaf_bst (a : 'a) : is_bst * 'a interval = (true, (a, a))
In the Node case, we have the value a stored at the node and the results computed recursively for the left (lih as in left induction hypothesis) and right subtrees respectively. The tree thus built is a BST if and only if the two subtrees are (b1 && b2) and their values respect the properties described earlier. The interval in which this new tree's values live is now the larger (lb1, ub2).
let node_bst (a : 'a) (lih : is_bst * 'a interval) (rih : is_bst * 'a interval) =
let (b1, (lb1, ub1)) = lih in
let (b2, (lb2, ub2)) = rih in
(b1 && b2 && ub1 < a && a <= lb2, (lb1, ub2))
Finally, the function checking whether a tree is a BST is defined by projecting out the boolean out of the result of calling fold_tree leaf_bst node_bst on it.
let bst (t : 'a tree) : bool =
fst (fold_tree leaf_bst node_bst t)
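Putting the pieces together, a self-contained sketch might look as follows. The tree type is an assumption inferred from the patterns used in map_tree and fold_tree (the question does not show it), and the sample trees good and bad are made up for illustration.

```ocaml
(* Assumed definition, inferred from the patterns in fold_tree. *)
type 'a tree = Leaf of 'a | Node of 'a * 'a tree * 'a tree

let fold_tree (f1 : 'a -> 'b) (f2 : 'a -> 'b -> 'b -> 'b) (t : 'a tree) : 'b =
  let rec aux t =
    match t with
    | Leaf x -> f1 x
    | Node (x, lt, rt) -> f2 x (aux lt) (aux rt)
  in
  aux t

(* A leaf is a BST whose values all live in the interval [a, a]. *)
let leaf_bst a = (true, (a, a))

(* Combine the results of both subtrees with the value at the node. *)
let node_bst a (b1, (lb1, ub1)) (b2, (lb2, ub2)) =
  (b1 && b2 && ub1 < a && a <= lb2, (lb1, ub2))

let bst t = fst (fold_tree leaf_bst node_bst t)

(* Every node of [bad] is locally ordered, but 7 sits in the left
   subtree of 5, which only the interval check can detect. *)
let good = Node (5, Node (2, Leaf 1, Leaf 3), Node (8, Leaf 5, Leaf 9))
let bad = Node (5, Node (2, Leaf 1, Leaf 7), Leaf 8)

let () = Printf.printf "good: %b, bad: %b\n" (bst good) (bst bad)
```

Note that nothing in the fold calls bst recursively: the recursion is entirely inside fold_tree, and the two helper functions only describe what to do with results that have already been computed.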

Grabbing a list of nodes in a tree?

How would you go about grabbing a list of nodes from a tree structure that meet certain criteria using OCaml? Since everything's created anew, there's no saved data structure. Any type of function that tries to return a list could only return one element when it hits a node, not a list.
Here's a tree type:
type tree = Leaf of int | Node of int * tree * tree
Here's a function that returns all the even values from the nodes of a tree:
let evens t =
let rec go sofar = function
| Leaf k -> if k mod 2 = 0 then k :: sofar else sofar
| Node (k, lt, rt) ->
let sofar' = if k mod 2 = 0 then k :: sofar else sofar in
let sofar'' = go sofar' lt in
go sofar'' rt
in
go [] t
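The same accumulator pattern generalizes to any criterion by passing the test as a function. The sketch below is one possible generalization; the names collect and sample are illustrative, and the final List.rev is an added choice so that values come out in the order they are visited.

```ocaml
type tree = Leaf of int | Node of int * tree * tree

(* Collect every value satisfying [p], threading the accumulator
   through the node, then the left subtree, then the right one. *)
let collect (p : int -> bool) (t : tree) : int list =
  let rec go sofar = function
    | Leaf k -> if p k then k :: sofar else sofar
    | Node (k, lt, rt) ->
      let sofar = if p k then k :: sofar else sofar in
      go (go sofar lt) rt
  in
  List.rev (go [] t)

let sample = Node (4, Node (2, Leaf 1, Leaf 6), Leaf 8)

let evens = collect (fun k -> k mod 2 = 0) sample
let large = collect (fun k -> k > 5) sample
```

The list is never stored anywhere between calls: each call receives the list built so far and returns a possibly longer one, which is how a function that sees one node at a time still produces a whole list.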

Graph with sets as vertices

I have a tiny grammar represented as a variant type term with strings that are tokens/part of tokens (type term).
Given expressions from the grammar, I am collecting all strings from the expressions and packing them into sets (function vars). Finally, I want to create a graph with these sets as vertices (lines 48-49).
For some reason, the graph created in such a sophisticated way does not recognise sets containing the same variables and creates multiple vertices with the same content. I don't really understand why this is happening.
Here is a minimal working example with this behaviour:
(* demo.ml *)
type term =
| Var of string
| List of term list * string option
| Tuple of term list
module SSet = Set.Make(
struct
let compare = String.compare
type t = string
end)
let rec vars = function
| Var v -> SSet.singleton v
| List (x, tail) ->
let tl = match tail with
| None -> SSet.empty
| Some var -> SSet.singleton var in
SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
| Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)
module Node = struct
type t = SSet.t
let compare = SSet.compare
let equal = SSet.equal
let hash = Hashtbl.hash
end
module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)
(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
include G
let edge_attributes _ = []
let default_edge_attributes _ = []
let get_subgraph _ = None
let vertex_attributes _ = []
let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
let default_vertex_attributes _ = []
let graph_attributes _ = []
end)
let _ =
(* creation of two terms *)
let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
(* get strings from terms packed into sets *)
let avars, bvars = vars a, vars b in
let g = G.create () in
G.add_edge g avars bvars;
Printf.printf "The content is the same: [%s] [%s]\n"
(String.concat ", " (SSet.elements avars))
(String.concat ", " (SSet.elements bvars));
Printf.printf "compare/equal output: %d %b\n"
(SSet.compare avars bvars)
(SSet.equal avars bvars);
Printf.printf "Hash values are different: %d %d\n"
(Hashtbl.hash avars) (Hashtbl.hash bvars);
Dot.fprint_graph Format.str_formatter g;
Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
In order to compile, type ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo. When the program is executed you get this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
To sum up, I am curious why there are non-equal hash values for sets and two identical vertices are created in the graph, despite the fact these sets are equal by all other means.
I suspect the general answer is that OCaml's built-in hashing is based on rather physical properties of a value, while set equality is a more abstract notion. If you represent sets as ordered binary trees, there are many trees that represent the same set (as is well known). These will be equal as sets but might very well hash to different values.
If you want hashing to work for sets, you might have to supply your own function.
As Jeffrey pointed out, it seems that the problem is in the definition of the hash function that is part of Node module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x) fixed the issue.
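The effect of that change can be checked without ocamlgraph. The sketch below uses only the standard library: it builds the same set in two different ways and uses it as a key in a Hashtbl.Make table, which relies on equal and hash in the same way a hash-based graph does. The names s1, s2, H and table are illustrative.

```ocaml
module SSet = Set.Make (String)

module Node = struct
  type t = SSet.t
  let compare = SSet.compare
  let equal = SSet.equal
  (* Hash the sorted element list, not the balanced tree behind the set:
     equal sets always produce the same list. *)
  let hash s = Hashtbl.hash (SSet.elements s)
end

(* The same set {a, b}, assembled in two different orders. *)
let s1 = SSet.union (SSet.singleton "b") (SSet.singleton "a")
let s2 = SSet.add "b" (SSet.singleton "a")

module H = Hashtbl.Make (Node)

let table : string H.t = H.create 16
let () = H.replace table s1 "vertex"

(* Looking up with the other construction finds the same entry. *)
let found = H.find_opt table s2

let () =
  Printf.printf "equal: %b, same hash: %b\n"
    (Node.equal s1 s2) (Node.hash s1 = Node.hash s2)
```

Hashtbl.hash only looks at a bounded number of elements, so very large sets that share their smallest elements will collide more often; the table stays correct because equal is still consulted.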
