Creating set functions union, difference, intersection in SML - functional-programming

I'm trying to write three function definitions that can be used to manipulate sets in SML. As you can see we are basing the implementation on lists.
Union is the set of all elements that are in Set s, in Set t, or in both. (No duplicates are allowed.)
Intersection is the set of elements that are in both Set s and Set t.
If Set s and Set t are sets, then the relative complement of Set s in Set t, is the set of elements in Set t, but not in Set s.
Right now the code looks like this:
fun filter p [] = []
| filter p (h::t) =
if p h
then h :: (filter p t)
else (filter p t);
fun mem [] a = false
| mem (h::t) a = (a=h) orelse (mem t a);
signature SETS =
sig
type ''a set
val union : ''a set -> ''a set -> ''a set
val intersection : ''a set -> ''a set -> ''a set
val difference : ''a set -> ''a set -> ''a set
end;
structure Set : SETS =
struct
type ''a set = ''a list;
fun union s t = **(Code goes here)**
fun intersection s t = **(Code goes here)**
fun difference s t = **(Code goes here)**
end;
As you can see, there are two helper functions to use where needed: mem and filter. filter goes through a list and keeps only those elements that satisfy some boolean function p, while mem just checks whether a list contains the value a.
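One way the three operations could be built from mem and filter is sketched below. It is written in OCaml rather than SML, so it is a transliteration and not a drop-in body for the structure above. It assumes the input lists contain no duplicates, and it assumes that difference s t means the relative complement of s in t, as in the description; if the opposite argument order is intended, swap s and t in the last function.

```ocaml
(* OCaml transliterations of the two helpers from the question. *)
let rec filter p = function
  | [] -> []
  | h :: t -> if p h then h :: filter p t else filter p t

let rec mem l a =
  match l with
  | [] -> false
  | h :: t -> a = h || mem t a

(* All of s, followed by the elements of t that are not already in s. *)
let union s t = s @ filter (fun x -> not (mem s x)) t

(* The elements of s that also occur in t. *)
let intersection s t = filter (mem t) s

(* Assumed meaning: elements of t that are not in s. *)
let difference s t = filter (fun x -> not (mem s x)) t
```

Each operation scans one list for every element of the other, so all three are quadratic, which is usually acceptable for small list-based sets.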

Related

How to find the minimum element in a Map and return a tuple (key,minimum element)?

I have these types :
type position = float * float
type node = position
I've written those modules to create my Map :
module MyMap =
struct
type t = node
let compare (a1,b1) (a2,b2) =
if a1 > a2 then 1
else if a1 < a2 then -1
else if b1 > b2 then 1
else if b1 < b2 then -1
else 0
end
module DistMap = Map.Make(MyMap)
I've tried to write functions that used iter but my attempts to formulate my ideas in a correct syntax were unsuccessful.
My goal would be able to have a function that takes a Map as argument and return a tuple of the minimum element and its key.
Thanks.
If you're asking for the minimum key and its corresponding element, that's easy: use DistMap.min_binding_opt, or DistMap.min_binding if you're fine with raising an exception on an empty map.
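A minimal sketch of the first case, assuming OCaml 4.05 or later (where min_binding_opt is available). The Position module is a stand-in for MyMap from the question; it uses the polymorphic compare, which orders float pairs lexicographically.

```ocaml
(* Stand-in for MyMap: keys are float pairs compared lexicographically. *)
module Position = struct
  type t = float * float
  let compare = compare
end

module DistMap = Map.Make (Position)

let m = DistMap.(empty |> add (4., 4.) "b" |> add (3., 3.) "a")

(* Binding with the smallest key, wrapped in an option. *)
let smallest = DistMap.min_binding_opt m

(* On an empty map the result is None instead of an exception. *)
let nothing = DistMap.min_binding_opt (DistMap.empty : string DistMap.t)
```

Here smallest is Some ((3., 3.), "a"), because the map is ordered by key and not by insertion order.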
If you're asking for the minimum element and its corresponding key, you will want to use a fold. Luckily, the DistMap module returned by Map.Make exposes a fold function, so you don't have to do extra allocation by, say, calling to_seq and doing a fold on the result. In addition, because the type of elements in a map is not constrained by the functor application (i.e., you can create a map with any element type), you will need the client to supply a comparison function for the element type.
DistMap.fold has type (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b, so we'll have to instantiate 'b in such a way as to keep track of both the key and the min element; in other words, we'll instantiate 'a as the element type of the map (let’s call it t), and 'b as (key * t) option (where key = position = float * float).
Here's what the code might look like:
let min_element_and_its_key map ~compare_element =
let take_min key element key_and_min_element =
match key_and_min_element with
| None -> Some (key, element)
| Some (key_for_min_element, min_element) ->
if compare_element element min_element < 0
then Some (key, element)
else Some (key_for_min_element, min_element)
in
DistMap.fold take_min map None
min_element_and_its_key will return None on an empty map.
Example client code (which you can run in an ocaml repl) might look like:
let map = DistMap.(empty |> add (3., 3.) "a" |> add (4., 4.) "b") in
min_element_and_its_key map ~compare_element:String.compare;;
(* Output: *)
- : (node * string) option = Some ((3., 3.), "a")
In general, anytime that you want to traverse all keys/elements in a data structure and accumulate a value, a fold is the way to go. iter will sort of work, but you'll have to accumulate the value in mutable state instead of accumulating it directly as the return value of the function you're folding with.
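For comparison, here is roughly what the iter version could look like. This is only a sketch: min_element_by_iter is a hypothetical name, and Position again stands in for MyMap. The running minimum has to live in a ref because iter returns unit.

```ocaml
module Position = struct
  type t = float * float
  let compare = compare
end

module DistMap = Map.Make (Position)

let min_element_by_iter ~compare_element map =
  (* Mutable accumulator: the best (key, element) pair seen so far. *)
  let best = ref None in
  DistMap.iter
    (fun key element ->
      match !best with
      | Some (_, current) when compare_element element current >= 0 -> ()
      | _ -> best := Some (key, element))
    map;
  !best

let example =
  DistMap.(empty |> add (3., 3.) "b" |> add (4., 4.) "a")
  |> min_element_by_iter ~compare_element:String.compare
```

example evaluates to Some ((4., 4.), "a"): the smallest element is "a", even though its key is not the smallest key.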

How to collapse a recursive tree in OCaml

I have a tree type:
type tree = Vertex of int * tree list;;
My recursive equality definition is that two trees are equal if their ints are equal and all of their children are equal.
How do I build the function
topo: tree -> tree list
that creates a list of all of the trees in depth first search order with each tree appearing once and only once (according to the equality definition)? I want to do this in a computationally efficient way. Maybe use lazy or a hashmap?
Here is my attempt, the code blows up when the length is too large:
type tree = Vertex of int * (tree list)
let rec base = function
| 0 -> Vertex (0, [])
| i -> Vertex (i, [base (i - 1)])
let rec range = function
| 0 -> [0]
| i -> i :: range (i - 1)
let agg i = Vertex (-1, List.map base (range i))
let rec equals (a: tree) (b: tree) : bool =
let rec deep_match a_dep b_dep = match a_dep, b_dep with
| [], [] -> true
| [], _
| _, [] -> false
| x::xs, y::ys -> equals x y && deep_match xs ys
in
let Vertex (ai, al) = a in
let Vertex (bi, bl) = b in
ai = bi && deep_match al bl
let rec in_list (a: tree) (l: tree list) : bool = match l with
| [] -> false
| hd::tl -> equals a hd || in_list a tl
let rec topological (pool: tree list) (t: tree) : tree list =
if in_list t pool then pool else
t::match t with
| Vertex(_, []) -> pool
| Vertex(_, deps) -> List.fold_left topological pool deps
let big_agg = agg 100_000
let topo_ordered = topological [] big_agg;;
Printf.printf "len %i\n" (List.length topo_ordered)
To make it efficient you need to implement ordering and hash-consing. With total ordering, you can store your trees in a balanced tree or even a hashtable, thus turning your in_list into O(logN) or even O(1). Adding hash-consing will enable O(1) comparison of your trees (at the cost of less efficient tree construction).
Depending on your design constraints, you can have only one instead of both. For didactic purposes, let's implement hash-consing for your particular representation.
To implement hash-consing, you need to make your constructor private and hide the data constructors behind an abstraction wall (to prevent users from breaking your hash-consing properties):
module Tree : sig
type t = private Vertex of int * t list
val create : int -> t list -> t
val equal : t -> t -> bool
end = struct
type t = Vertex of int * t list
let repository = Hashtbl.create 64
let create n children =
let node = Vertex (n,children) in
try Hashtbl.find repository node
with Not_found -> Hashtbl.add repository node node; node
let equal x y = x == y
end
Since we guaranteed that structurally equal trees are physically equal during the tree creation (i.e., if there exists an equal tree in our repository then we return it), we are now able to substitute structural equality with physical equality, i.e., with pointer comparison.
We got a fast comparison, but at a price: we are now leaking memory, since we need to store every tree ever created, and the create function is now O(N). We can alleviate the first problem by using ephemerons, but the latter problem will persist, of course.
Another issue is that we're not able to put our trees into an ordered structure, like a map or a set. We can of course use the regular polymorphic compare, but since it is O(N), inserting into such a structure becomes quadratic. Not an option for us. Therefore we need to add a total ordering on our trees. We can theoretically do this without changing the representation (using ephemerons), but it is easier just to add an order parameter to our tree representation, e.g.,
module Tree : sig
type order (* = private int *) (* add this for debugging *)
type t = private Vertex of order * int * t list
val create : int -> t list -> t
val equal : t -> t -> bool
val compare : t -> t -> int
end = struct
type order = int
type t = Vertex of order * int * t list
type tree = t
module Repository = Hashtbl.Make(struct
type t = tree
let max_hash = 16
let rec equal (Vertex (_,p1,x)) (Vertex (_,p2,y)) =
match compare p1 p2 with
| 0 -> equal_trees x y
| n -> false
and equal_trees xs ys = match xs, ys with
| [],[] -> true
| [],_ | _,[] -> false
| x :: xs, y::ys -> equal x y && equal_trees xs ys
let rec hash (Vertex (_,p,xs)) =
hash_trees (Hashtbl.hash p) max_hash xs
and hash_trees hash depth = function
| x :: xs when depth > 0 ->
hash_trees (Hashtbl.hash x) (depth-1) xs
| _ -> hash
end)
let repository = Repository.create 64
let create n children =
try Repository.find repository (Vertex (0,n,children))
with Not_found ->
let order = Repository.length repository + 1 in
let node = Vertex (order,n,children) in
Repository.add repository node node; node
let equal x y = x == y
let order (Vertex (order,_,_)) = order
let compare x y = compare (order x) (order y)
end
We had to manually implement the structural variants of equal and hash for our trees because we need to ignore the order in the comparison when we store a new tree in the repository. It looks like a bit of work, but in real life you can do this using derivers.
Anyway, we now have a comparable version of a tree with an O(1) comparison function, so we can put our trees in sets and maps and implement your topo efficiently.
A nice feature of both implementations is a tight representation of a tree, since sharing is guaranteed by the create function. E.g.,
# let t1 = Tree.create 42 [];;
val t1 : Tree.t = Tree.Vertex (1, 42, [])
# let t3 = Tree.create 42 [t1; t1];;
val t3 : Tree.t =
Tree.Vertex (2, 42, [Tree.Vertex (1, 42, []); Tree.Vertex (1, 42, [])])
# let t5 = Tree.create 42 [t1; t3; t1];;
val t5 : Tree.t =
Tree.Vertex (3, 42,
[Tree.Vertex (1, 42, []);
Tree.Vertex (2, 42, [Tree.Vertex (1, 42, []); Tree.Vertex (1, 42, [])]);
Tree.Vertex (1, 42, [])])
#
In this example, t1 in t5 and t3 will be the same pointer.
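To make the last step concrete, here is a sketch of how topo might look on top of a hash-consed tree. It is a simplified variation and not the module above: the repository is keyed by the label plus the ids of the children (which are already hash-consed), so no hand-written structural equal or hash is needed. The names id, TreeSet and visit are assumptions made for the illustration.

```ocaml
module Tree : sig
  (* Fields: unique id (creation order), label, children. *)
  type t = private Vertex of int * int * t list
  val create : int -> t list -> t
  val compare : t -> t -> int
end = struct
  type t = Vertex of int * int * t list

  let id (Vertex (i, _, _)) = i

  (* Key: the label and the ids of the children. *)
  let repository : (int * int list, t) Hashtbl.t = Hashtbl.create 64

  let create label children =
    let key = (label, List.map id children) in
    match Hashtbl.find_opt repository key with
    | Some node -> node
    | None ->
      let node = Vertex (Hashtbl.length repository + 1, label, children) in
      Hashtbl.add repository key node;
      node

  (* Comparing ids takes constant time. *)
  let compare x y = Int.compare (id x) (id y)
end

module TreeSet = Set.Make (Tree)

(* Each distinct tree appears once; a tree comes before its children. *)
let topo root =
  let rec visit (seen, acc) t =
    if TreeSet.mem t seen then (seen, acc)
    else
      let (Tree.Vertex (_, _, children)) = t in
      let seen, acc =
        List.fold_left visit (TreeSet.add t seen, acc) children in
      (seen, t :: acc)
  in
  snd (visit (TreeSet.empty, []) root)

let leaf = Tree.create 0 []
let one = Tree.create 1 [leaf]
let root = Tree.create (-1) [one; leaf; Tree.create 1 [Tree.create 0 []]]
let order = topo root
```

In this small example order holds three trees (root, one, leaf), because the third child of root is hash-consed to the same node as one. Note that visit recurses along the depth of the tree, so a very deep tree may still need an explicit stack.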
For optimal performance, one possibility would be to use hash-consing. However, in your current example, both the generation and the uniqueness test are quadratic in n. Fixing both points already seems to improve performance a lot.
First, we can avoid the quadratic tree generation by adding a lot of sharing:
let range max =
let rec range elt l n =
if n > max then elt::l
else
let next = Vertex(n,[elt]) in
range next (elt::l) (n+1) in
range (Vertex(0,[])) [] 1
let agg i = Vertex (-1, range i)
With this change, it becomes reasonable to generate a tree with 10^10 elements (but only 10^5 unique elements).
Then, the unicity test can be done with a set (or a hashtable):
module S = Set.Make(struct type t = tree let compare = compare end)
let rec topological (set, pool) t =
if S.mem t set then (set, pool) else
let set = S.add t set in
let set, pool =
match t with
| Vertex(_, []) -> set, pool
| Vertex(_, deps) -> List.fold_left topological (set,pool) deps in
set, t::pool
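Put together, the two pieces can be run as follows. This is a sketch that repeats the definitions above so that it is self-contained; the wrapper topo and the small size 100 are choices made only for this illustration.

```ocaml
type tree = Vertex of int * tree list

(* Builds [V max; ...; V 0], where each V n shares V (n-1) as its child. *)
let range max =
  let rec range elt l n =
    if n > max then elt :: l
    else
      let next = Vertex (n, [elt]) in
      range next (elt :: l) (n + 1)
  in
  range (Vertex (0, [])) [] 1

let agg i = Vertex (-1, range i)

module S = Set.Make (struct type t = tree let compare = compare end)

let rec topological (set, pool) t =
  if S.mem t set then (set, pool)
  else
    let set = S.add t set in
    let set, pool =
      match t with
      | Vertex (_, []) -> (set, pool)
      | Vertex (_, deps) -> List.fold_left topological (set, pool) deps
    in
    (set, t :: pool)

(* Start from an empty set and an empty pool, keep only the pool. *)
let topo t = snd (topological (S.empty, []) t)

let ordered = topo (agg 100)
let () = Printf.printf "len %i\n" (List.length ordered)
```

This prints len 102: the trees numbered 0 to 100 plus the aggregate root, each appearing once, with the root at the head of the list.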

SML Common type for different structures

I am implementing sets in Standard ML. Currently it looks like this:
signature SET = sig
type t
type 'a set
...
val map : ('a -> t) -> 'a set -> t set
end
functor ListSetFn (EQ : sig type t val equal : t * t -> bool end)
:> SET where type t = EQ.t = struct
type t = EQ.t
type 'a set = 'a list
...
fun map f = fromList o (List.map f)
end
I want the map function to be able to take any set in a structure SET, ideally not even constrained to those from ListSetFn functor. However, on the top level it can only operate on sets created by a single structure: the one it is called from, e.g.:
functor EqListSetFn(eqtype t) :> SET where type t = t = struct
structure T = ListSetFn(struct type t = t val equal = op= end)
open T
end
structure IntSet = EqListSetFn(type t = int)
IntSet.map : ('a -> IntSet.t) -> 'a IntSet.set -> IntSet.t IntSet.set
While I'd really like it to be something like
IntSet.map : ('a -> IntSet.t) -> 'a ArbitrarySet.set -> IntSet.t IntSet.set
Is there a way to do it? I know it could be declared on the top level, but I want to hide the internal implementation and hence use opaque signature(s).
In principle, there are two ways to perform such a parameterisation:
1. Wrap the function into its own functor that takes the other structure as argument.
2. Make the function polymorphic, passing the relevant functions to operate on the other type as individual arguments, or as a record of arguments.
Let's assume the SET signature contains these functions:
val empty : 'a set
val isEmpty : 'a set -> bool
val add : 'a * 'a set -> 'a set
val remove : 'a * 'a set -> 'a set
val pick : 'a set -> 'a
Then the former solution would look like this:
functor SetMap (structure S1 : SET; structure S2 : SET) =
struct
fun map f s =
if S1.isEmpty s then S2.empty else
let val x = S1.pick s
in S2.add (f x, map f (S1.remove (x, s)))
end
end
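For readers who want to run the functor idea, here is a rough OCaml rendering of it (OCaml and not SML, so the syntax differs from the code above). The curried signature and the list-backed ListSet are assumptions made for the illustration; ListSet is left unsealed so that the result can be inspected as a plain list.

```ocaml
module type SET = sig
  type 'a set
  val empty : 'a set
  val is_empty : 'a set -> bool
  val add : 'a -> 'a set -> 'a set
  val remove : 'a -> 'a set -> 'a set
  val pick : 'a set -> 'a
end

(* map reads from an S1 set and builds an S2 set. *)
module SetMap (S1 : SET) (S2 : SET) = struct
  let rec map f s =
    if S1.is_empty s then S2.empty
    else
      let x = S1.pick s in
      S2.add (f x) (map f (S1.remove x s))
end

(* Hypothetical list-backed implementation, for demonstration only. *)
module ListSet = struct
  type 'a set = 'a list
  let empty = []
  let is_empty s = (s = [])
  let add x s = if List.mem x s then s else x :: s
  let remove x s = List.filter (fun y -> y <> x) s
  let pick = List.hd
end

module M = SetMap (ListSet) (ListSet)

let parities = M.map (fun x -> x mod 2) [1; 2; 3]
```

parities is [0; 1]: the three inputs map to 1, 0 and 1, and add drops the duplicate.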
For solution 2, you would need to pass all relevant functions directly, e.g. as records:
fun map {isEmpty, pick, remove} {empty, add} f s =
let
fun iter s =
if isEmpty s then empty else
let val x = pick s
in add (f x, iter (remove (x, s)))
end
in iter s end
FWIW, this would be nicer with first-class structures, but SML does not have them as a standard feature.
fun map (pack S1 : SET) (pack S2 : SET) f s =
let
fun iter s =
if S1.isEmpty s then S2.empty else
let val x = S1.pick s
in S2.add (f x, iter (S1.remove (x, s)))
end
in iter s end

Hashtable of mutable variable in Ocaml

I need to use a hashtable of mutable variables in OCaml, but it doesn't work out.
let link = Hashtbl.create 3;;
let a = ref [1;2];;
let b = ref [3;4];;
Hashtbl.add link a b;;
# Hashtbl.mem link a;;
- : bool = true
# a := 5::!a;;
- : unit = ()
# Hashtbl.mem link a;;
- : bool = false
Is there any way to make it work?
You can use the functorial interface, which lets you supply your own hash and equality functions. Then you write functions that are based only on the non-mutable parts of your values. In this example, there are no non-mutable parts. So, it's not especially clear what you're expecting to find in the table. But in a more realistic example (in my experience) there are non-mutable parts that you can use.
If there aren't any non-mutable parts, you can add them specifically for use in hashing. You could add an arbitrary unique integer to each value, for example.
It's also possible to do hashing based on physical equality (==), which has a reasonable definition for references (and other mutable values). You have to be careful with it, though, as physical equality is a little tricky. For example, you can't use the physical address of a value as your hash key--an address can change at any time due to garbage collection.
Mutable variables that may happen to have the same content can still be distinguished because they are stored at different locations in memory. They can be compared with the physical equality operator (==). However, OCaml doesn't provide anything better than equality, it doesn't provide a nontrivial hash function or order on references, so the only data structure you can build to store references is an association list of some form, with $\Theta(n)$ access time for most uses.
(You can actually get at the underlying pointer if you play dirty. But the pointer can move under your feet. There is a way to make use of it nonetheless, but if you need to ask, you shouldn't use it. And you aren't desperate enough for that anyway.)
It would be easy to compare references if two distinct references had a distinct content. So make it so! Add a unique identifier to your references. Keep a global counter, increment it by 1 each time you create a reference, and store the counter value with the data. Now your references can be indexed by their counter value.
let counter = ref 0
let new_var x = incr counter; ref (!counter, x)
let var_value v = snd !v
let update_var v x = v := (fst !v, x)
let hash_var v = Hashtbl.hash (fst !v)
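As a sketch of how these helpers might be plugged into the functorial interface, the snippet below replays the session from the question. VarTbl is a hypothetical name, and the key type is fixed to int list contents to keep the example short.

```ocaml
let counter = ref 0
let new_var x = incr counter; ref (!counter, x)
let var_value v = snd !v
let update_var v x = v := (fst !v, x)

(* Hash and compare on the immutable identifier only. *)
module VarTbl = Hashtbl.Make (struct
  type t = (int * int list) ref
  let equal a b = fst !a = fst !b
  let hash v = Hashtbl.hash (fst !v)
end)

let link = VarTbl.create 3
let a = new_var [1; 2]
let b = new_var [3; 4]
let () = VarTbl.add link a b

let found_before = VarTbl.mem link a

(* Mutating the content leaves the identifier, and so the hash, unchanged. *)
let () = update_var a (5 :: var_value a)
let found_after = VarTbl.mem link a
```

Both found_before and found_after are true, since neither equal nor hash ever looks at the mutable content.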
For better type safety and improved efficiency, define a data structure containing a counter value and an item.
let counter = ref 0
type counter = int
type 'a variable = {
key : counter;
mutable data : 'a;
}
let new_var x = incr counter; {key = !counter; data = x}
let hash_var v = Hashtbl.hash v.key
You can put the code above in a module and make the counter type abstract. Also, you can define a hash table module using the Hashtbl functorial interface. Here's another way to define variables and a hash table structure on them with a cleaner, more modular structure.
module Counter = (struct
type t = int
let counter = ref 0
let next () = incr counter; !counter
let value c = c
end : sig
type t
val next : unit -> t
val value : t -> int
end)
module Variable = struct
type 'a variable = {
mutable data : 'a;
key : Counter.t;
}
let make x = {key = Counter.next(); data = x}
let update v x = v.data <- x
let get v = v.data
let equal v1 v2 = v1 == v2
let hash v = Counter.value v.key
let compare v1 v2 = Counter.value v2.key - Counter.value v1.key
end
module Make = functor(A : sig type t end) -> struct
module M = struct
type t = A.t Variable.variable
include Variable
end
module Hashtbl = Hashtbl.Make(M)
module Set = Set.Make(M)
module Map = Map.Make(M)
end
We need the intermediate module Variable because the type variable is parametric and the standard library data structure functors (Hashtbl.Make, Set.Make, Map.Make) are only defined for type constructors with no argument. Here's an interface that exposes both the polymorphic interface (with the associated functions, but no data structures) and a functor to build any number of monomorphic instances, with an associated hash table (and set, and map) type.
module Variable : sig
type 'a variable
val make : 'a -> 'a variable
val update : 'a variable -> 'a -> unit
val get : 'a variable -> 'a
val equal : 'a -> 'a -> bool
val hash : 'a variable -> int
val compare : 'a variable -> 'b variable -> int
end
module Make : functor(A : sig type t end) -> sig
module M : sig
type t = A.t Variable.variable
val make : A.t -> t
val update : t -> A.t -> unit
val get : t -> A.t
val equal : t -> t -> bool
val hash : t -> int
val compare : t -> t -> int
end
module Hashtbl : Hashtbl.S with type key = M.t
module Set : Set.S with type elt = M.t
module Map : Map.S with type key = M.t
end
Note that if you expect that your program may generate more than 2^30 variables during a run, an int won't cut it. You need two int values to make a 60-bit value, or an Int64.t.
Note that if your program is multithreaded, you need a lock around the counter, to make the incrementation and lookup atomic.
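As an alternative to an explicit lock, the counter could be made atomic. This swaps in a different technique from the one described above, and it assumes OCaml 4.12 or later, where the Atomic module is part of the standard library.

```ocaml
(* Shared counter; fetch_and_add updates it in one atomic step. *)
let counter = Atomic.make 0

(* fetch_and_add returns the previous value, so add 1 to get the new
   one, matching the behaviour of "incr counter; !counter". *)
let next () = Atomic.fetch_and_add counter 1 + 1

let first = next ()
let second = next ()
```

Because the increment and the read happen in a single operation, two threads calling next cannot receive the same identifier.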

Functors in OCaml

I am having a bit of a problem with a functor (and its resultant type). Below, I have a Set functor that uses an Ordered type. I actually used the set.ml that comes with OCaml for some guidance, but I seem to be doing everything, ahem, right. I created an Ordered module with integers and applied it to the Set functor to get the last module in this code sample, IntSet.
The next line fails, when I try to insert an integer. I get the following type error:
Error: This expression has type int but is here used with type
SetInt.elt = Set(OrdInt).elt
Don't get me wrong, the type system is correct here. The top level reports that the type of the SetInt.elt is Set(OrdInt).elt, but when I do the same operations to set up a Set using the one provided by OCaml the 'same' line is, SetInt.elt = OrderedInt.t. Seems like I should be getting SetInt.elt = Ordered.t.
This is so simple, I'm probably just missing some stupid detail! argh!
Please Note: I have simplified the member/insert functions here since this issue has to do with types.
module type Ordered =
sig
type t
val lt : t -> t -> bool
val eq : t -> t -> bool
val leq : t -> t -> bool
end
module type S =
sig
type elt
type t
exception Already_Exists
val empty : t
val insert : elt -> t -> t
val member : elt -> t -> bool
end
module Set (Elt:Ordered) : S =
struct
type elt = Elt.t
type t = Leaf | Node of t * elt * t
exception Already_Exists
let empty = Leaf
let insert e t = t
let member e t = false
end
module OrdInt : Ordered =
struct
type t = int
let lt a b = a < b
let eq a b = a = b
let leq a b = a <= b
end
module IntSet = Set (OrdInt)
(* line that fails *)
let one_elm = IntSet.insert 1 IntSet.empty
You need to change these two lines
module Set (Elt:Ordered) : S =
module OrdInt : Ordered =
to
module Set (Elt:Ordered) : S with type elt = Elt.t =
module OrdInt : Ordered with type t = int =
Without these, the modules will not have signatures that expose the types elt and t as int.
[Edit]:
The set.ml doesn't have the 'with' bit, because there's a set.mli, which declares the signature for the functor, and it does have the 'with'. Also, OrdInt doesn't need 'with' if you don't explicitly specify a signature for it, like this:
module OrdInt =
You can also construct the set by defining the module in place:
module IntSet = Set (struct
type t = int
let lt a b = a < b
let eq a b = a = b
let leq a b = a <= b
end)
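For reference, here is a compact version with both constraints applied that should type-check. It is a sketch: the signatures are trimmed to what the example needs, and insert and member are given simple unbalanced-tree bodies, which are an assumption since the question stubbed them out.

```ocaml
module type Ordered = sig
  type t
  val lt : t -> t -> bool
end

module type S = sig
  type elt
  type t
  val empty : t
  val insert : elt -> t -> t
  val member : elt -> t -> bool
end

(* "with type elt = Elt.t" keeps the element type visible to clients. *)
module Set (Elt : Ordered) : S with type elt = Elt.t = struct
  type elt = Elt.t
  type t = Leaf | Node of t * elt * t
  let empty = Leaf
  let rec insert e = function
    | Leaf -> Node (Leaf, e, Leaf)
    | Node (l, x, r) as node ->
      if Elt.lt e x then Node (insert e l, x, r)
      else if Elt.lt x e then Node (l, x, insert e r)
      else node
  let rec member e = function
    | Leaf -> false
    | Node (l, x, r) ->
      if Elt.lt e x then member e l
      else if Elt.lt x e then member e r
      else true
end

(* "with type t = int" keeps OrdInt.t from becoming abstract. *)
module OrdInt : Ordered with type t = int = struct
  type t = int
  let lt a b = a < b
end

module IntSet = Set (OrdInt)

(* The line that failed in the question now type-checks. *)
let one_elm = IntSet.insert 1 IntSet.empty
```

The set type IntSet.t itself stays abstract, which is usually what you want: only the element type needs to be exposed.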

Resources