Balanced tree for functional symbol table - functional-programming

I'm doing the exercises in "Modern Compiler Implementation in ML" (Andrew Appel). One of them (ex. 1.1 d) is to recommend a balanced-tree data structure for a functional symbol table. Appel mentions that such a data structure should rebalance on insertion but not on lookup. Being totally new to functional programming, I found this confusing. What is the key insight behind this requirement?

A tree that’s rebalanced on every insertion and deletion doesn’t need to rebalance on lookup, because lookup doesn’t modify the structure. If it was balanced before a lookup, it will stay balanced during and after.
In functional languages, insertion and rebalancing can be more expensive than in procedural ones. Because you can’t alter any node in place, you replace a node by creating a new node, then replace its parent with a new node whose children are the new daughter and the unaltered older daughter, then replace the grandparent node with one whose children are the new parent and her older sister, and so on up. You finish when you create a new root node for the updated tree; the nodes you replaced can be garbage-collected once nothing references the old version. However, some tree structures have the desirable property that they need to replace no more than O(log N) nodes of the tree on an insertion and can re-use the rest. This means that the rotation of a red-black tree (for example) has not much more overhead than an unbalanced insertion.
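To make the node-replacement idea above concrete, here is a minimal sketch in OCaml (not SML, and deliberately unbalanced to stay short). The names tree, insert, left and right are made up for this illustration. It only shows that an insertion rebuilds the nodes on the path to the new key and shares everything else. The sharing check relies on physical equality (==), whose behaviour on immutable values is implementation-dependent, so treat that part as an observation about the usual compilers rather than a guarantee.

```ocaml
(* A persistent (immutable) binary search tree; no rebalancing in this sketch. *)
type tree =
  | Leaf
  | Node of tree * int * tree

(* Insertion allocates new nodes only along the path from the root to the
   insertion point; every subtree off that path is reused as-is. *)
let rec insert x t =
  match t with
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, k, r) ->
    if x < k then Node (insert x l, k, r)
    else if x > k then Node (l, k, insert x r)
    else t

(* Hypothetical helpers to look at the children of the root. *)
let left = function Leaf -> Leaf | Node (l, _, _) -> l
let right = function Leaf -> Leaf | Node (_, _, r) -> r

let t1 = List.fold_left (fun t x -> insert x t) Leaf [5; 2; 8; 1; 3]
let t2 = insert 9 t1

(* The left subtree is not on the path to 9, so both versions point at the
   same nodes; the right subtree had to be rebuilt. *)
let shares_left = left t1 == left t2
let copied_right = right t1 != right t2
```

Both t1 and t2 remain usable after the insertion. A balanced variant does the same thing, plus a constant amount of extra node construction per level for rotations.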
Also, you will typically need to query a symbol table much more often than you update it. It therefore becomes less tempting to try to make insertion faster: if you’re inserting, you might as well rebalance.
The question of which self-balancing tree structure is best for a functional language has been asked here, more than once.

Since Davislor already answered your question extensively, here are mostly some implementation hints. I would add that the choice of data structure for your symbol table is probably not relevant for a toy compiler. Compilation time only starts to become an issue when your compiler is used on a lot of code and the code is recompiled often.
Sticking to an O(n) insert/lookup data structure is fine in practice until it isn't.
Signature-wise, all you want is a key-value mapping, insert, and lookup:
signature SymTab =
sig
type id
type value
type symtab
val empty : symtab
val insert : id -> value -> symtab -> symtab
val lookup : id -> symtab -> value option
end
A simple O(n) implementation with lists might be:
structure ListSymTab : SymTab =
struct
type id = string
type value = int
type symtab = (id * value) list
val empty = []
fun insert id value [] = [(id, value)]
| insert id value ((id',value')::symtab) =
if id = id'
then (id,value)::symtab
else (id',value')::insert id value symtab
fun lookup _ [] = NONE
| lookup id ((id',value)::symtab) =
if id = id' then SOME value else lookup id symtab
end
You might use it like:
- ListSymTab.lookup "hello" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = SOME 42 : int option
Then again, maybe your symbol table doesn't map strings to integers, or you may have one symbol table for variables and one for functions.
You could parameterise the id/value types using a functor:
functor ListSymTabFn (X : sig
eqtype id
type value
end) : SymTab =
struct
type id = X.id
type value = X.value
(* The rest is the same as ListSymTab. *)
end
And you might use it like:
- structure ListSymTab = ListSymTabFn(struct type id = string type value = int end);
- ListSymTab.lookup "world" (ListSymTab.insert "hello" 42 ListSymTab.empty);
> val it = NONE : int option
All you need for a list-based symbol table is that the identifiers/symbols can be compared for equality. For your balanced-tree symbol table, you need identifiers/symbols to be orderable.
Instead of implementing balanced trees from scratch, look e.g. at SML/NJ's RedBlackMapFn:
To create a structure implementing maps (dictionaries) over a type T [...]:
structure MapT = RedBlackMapFn (struct
type ord_key = T
val compare = compareT
end)
Try this example with T as string and compare as String.compare:
$ sml
Standard ML of New Jersey v110.76 [built: Sun Jun 29 03:29:51 2014]
- structure MapS = RedBlackMapFn (struct
type ord_key = string
val compare = String.compare
end);
[autoloading]
[library $SMLNJ-BASIS/basis.cm is stable]
[library $SMLNJ-LIB/Util/smlnj-lib.cm is stable]
[autoloading done]
structure MapS : ORD_MAP?
- open MapS;
...
Opening the structure is an easy way to explore the available functions and their types.
We can then create a similar functor to ListSymTabFn, but one that takes an additional compare function:
functor RedBlackSymTabFn (X : sig
type id
type value
val compare : id * id -> order
end) : SymTab =
struct
type id = X.id
type value = X.value
structure SymTabX = RedBlackMapFn (struct
type ord_key = X.id
val compare = X.compare
end)
(* The 'a map type inside SymTabX maps X.id to anything. *)
(* We are, however, only interested in mapping to values. *)
type symtab = value SymTabX.map
(* Use other stuff in SymTabX for empty, insert, lookup. *)
end
Finally, you can use this as your symbol table:
structure SymTab = RedBlackSymTabFn(struct
type id = string
type value = int
val compare = String.compare
end);
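For comparison, the same shape can be sketched in OCaml, whose standard library provides Map.Make, a functor over an ordered key type that is implemented with balanced binary trees. This is an analogue, not a translation of the SML/NJ code: SYMTAB, MakeSymTab and StringIntTab are names invented for this example, and it assumes OCaml 4.05 or later for find_opt.

```ocaml
(* OCaml counterpart of the SymTab signature above. *)
module type SYMTAB = sig
  type id
  type value
  type symtab
  val empty : symtab
  val insert : id -> value -> symtab -> symtab
  val lookup : id -> symtab -> value option
end

(* K supplies the key type and its ordering; V supplies the value type. *)
module MakeSymTab (K : Map.OrderedType) (V : sig type t end)
  : SYMTAB with type id = K.t and type value = V.t = struct
  module M = Map.Make (K)
  type id = K.t
  type value = V.t
  type symtab = value M.t
  let empty = M.empty
  let insert id v tab = M.add id v tab
  let lookup id tab = M.find_opt id tab
end

module StringIntTab = MakeSymTab (String) (struct type t = int end)

(* Inserting returns a new table; the old one is still valid, which is
   convenient for nested scopes. *)
let outer = StringIntTab.insert "x" 1 StringIntTab.empty
let inner = StringIntTab.insert "x" 2 outer
```

Here inner sees the shadowing binding while outer is unchanged, so leaving a scope is just a matter of going back to the old table.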

Related

Pointer to a record in OCaml

I am implementing binary search trees in OCaml, trying to use as much imperative programming as possible.
I have the following data type:
type tKey = Key of int;;
type tBST = Null | Pos of node ref
and node = {mutable key : tKey; mutable left : tBST; mutable right : tBST};;
I am having trouble with this function:
let createNode k tree =
tree := Pos ({key = k; left = Null; right = Null});;
Error: This record expression is expected to have type node ref
The field key does not belong to type ref
A binary search tree can be either Null (means empty tree) or a Pos. A tree Pos is a pointer to a node, and a node is a structure of a key and 2 other trees (left and right).
My main goal here is to have a tree that is modified after functions are over. Passing tree by reference so when createNode is over, the tBST I passed as parameter is modified.
Question: is it actually possible to do what I am trying in OCaml? If so, how could I change my function createNode and/or data type to make this happen?
Thank you very much.
It is possible, but you need to create the Pos node with a reference explicitly:
Pos (ref {key = k; (*...*)})
Whether what you are trying to do is recommended practice in a language like OCaml is a different story, though.
The question has already been answered. I would just like to add a side note: The use of ref seems superfluous in this case.
A value of type tBST is either Null or a mutable pointer. If it is Null it will remain Null. If it is non-Null, it will remain non-Null, but the actual pointer might change. That might well be what you intended, but I have my doubts. In particular, what tBST does not do, is to emulate C-style pointers (which are either null or really point somewhere). I suspect, though, that that was your intention.
The idiomatic way to emulate C-style pointers is to just use the built-in option type, like so:
type tBST = node option
A value of type node option is either None or Some n, where n is a pointer to a value of type node. You use tBST for mutable fields (of the record node), so you would effectively have mutable C-style pointers to nodes.
Here is what you probably had in mind:
type tree = node option ref
and node = {
mutable left: tree;
mutable key: int;
mutable right: tree;
};;
let t0 : tree = ref None;;
let t1 : tree = ref (Some { left = ref None; key = 1; right = ref None; }) ;;
let create_node key tree =
tree := Some { left = ref None; key; right = ref None; }
No need to have a separate type for key, but you can have one if you want it, and recent OCaml versions can avoid the runtime overhead for it (via the [@@unboxed] attribute on a single-constructor type).
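Building on that representation, a minimal sketch of in-place insertion might look like the following. The type definitions are repeated so the snippet stands alone; insert and mem are names chosen for this example and are not from the question.

```ocaml
type tree = node option ref
and node = {
  mutable left : tree;
  mutable key : int;
  mutable right : tree;
}

(* Walk down to the empty slot where the key belongs and fill it in place. *)
let rec insert key (tree : tree) =
  match !tree with
  | None -> tree := Some { left = ref None; key; right = ref None }
  | Some n ->
    if key < n.key then insert key n.left
    else if key > n.key then insert key n.right
    (* equal keys: already present, nothing to do *)

(* Membership test following the same ordering. *)
let rec mem key (tree : tree) =
  match !tree with
  | None -> false
  | Some n ->
    if key < n.key then mem key n.left
    else if key > n.key then mem key n.right
    else true

(* The caller's tree is modified after each call returns. *)
let t : tree = ref None
let () = List.iter (fun k -> insert k t) [4; 2; 6]
```

Because every child slot is itself a ref, the same function handles the empty root and an empty subtree without a special case.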

F# map and distinct objects

I have some nondescript but distinct objects (specifically, unnamed variables in logic expressions) that I want to put in a map that associates them with their values. As I understand it, map needs to distinguish objects by some ordered field, so I can't just have
type Term =
...
| Var
as this would not make different variables distinguishable from each other. Instead I could presumably have
type Term =
...
| Var of int64
and then have a new_var function that increments a global int64 counter and returns a new variable with the incremented value. This seems slightly inelegant, but should work.
Is the global counter the recommended way to handle this, or is there a more idiomatic method?
It's not really a "map having to distinguish objects" thing - when you declare a type like this:
type Term =
| Var
you have a type with a single valid value - Var. If you're saying you want to have objects that are distinct - this is not what you want. You can still use that type as a key in a map - not a particularly useful one though, since it will have at most a single element.
Using a counter is a good enough way to handle it. If you don't want a "global" one, you can roll it into a function using a ref cell to hold it:
type Term =
    | Var of int
let make =
    let counter = ref 0
    fun () ->
        counter := !counter + 1
        Term.Var (!counter)
Or use GUIDs if you don't care about the values and want the counter out of the picture:
type Term =
    | Var of System.Guid
let make () =
    Term.Var (System.Guid.NewGuid())

SML/NJ - linked list which can hold any types

I am trying to create a datatype for a linked list which can hold all types at the same time, i.e. a linked list of void* elements. The design is to create a Node datatype which holds a record containing Value and Next.
What I did so far is -
datatype 'a anything = dummy of 'a ; (* suppose to hold any type (i.e void*) *)
datatype linkedList = Node of {Value:dummy, Next:linkedList}; (* Node contain this record *)
As you can see, the above attempt does not work, but I believe my idea is clear enough. What changes are required here to make it work?
I am not sure if you are being forced to use a record type. Because otherwise I think it is simpler to do:
datatype 'a linkedlist = Empty | Cons of 'a * 'a linkedlist
Then you can use it somewhat like:
val jedis = Cons ("Obi-wan", Cons("Luke", Cons("Yoda", Cons("Anakin", Empty))));
I think the use of the record is a poor choice here. I cannot even think how I could represent an empty list with that approach.
-EDIT-
To answer your comment about supporting multiple types:
datatype polymorphic = N of int | S of string | B of bool
Cons(S("A"), Cons(N(5), Cons(N(6), Cons(B(true), Empty))));
Given the circumstances you may prefer SML lists instead:
S("A")::N(5)::N(6)::B(true)::[];
Which produces the list
[S "A",N 5,N 6,B true]
That is, a list of the same type (i.e. polymorphic), but this type is capable of containing different kinds of things through its multiple constructors.
FYI, if it is important that the types of your polymorphic list remain open, you can use SML's built-in exception type: exn. The exn type is open and can be extended anywhere in the program.
exception INT of int
exception STR of string
val xs = [STR "A", INT 5, INT 6] : exn list
You can case selectively on particular types as usual:
val inc_ints = List.map (fn INT i => INT (i + 1) | other => other)
And you can later extend the type without mention of its previous definition:
exception BOOL of bool
val ys = [STR "A", INT 5, INT 6, BOOL true] : exn list
Notice that you can put the construction of any exception in there (here the div-by-zero exception):
val zs = Div :: ys : exn list
That said, this (ab)use really has very few good use cases and you are generally better off with a closed sum type as explained by Edwin in the answer above.

Function with different argument types

I read about polymorphism in function and saw this example
fun len nil = 0
| len rest = 1 + len (tl rest)
All the other examples dealt with nil arg too.
I wanted to check the polymorphism concept on other types, like
fun func (a : int) : int = 1
| func (b : string) : int = 2 ;
and got the follow error
stdIn:1.6-2.33 Error: parameter or result constraints of clauses don't agree
[tycon mismatch]
this clause: string -> int
previous clauses: int -> int
in declaration:
func = (fn a : int => 1: int
| b : string => 2: int)
What is the mistake in the above function? Is it legal at all?
Subtype Polymorphism:
In programming languages like Java, C# or C++ you have a set of subtyping rules that govern polymorphism. For instance, in object-oriented programming languages, if you have a type A that is a supertype of a type B, then wherever A appears you can pass a B, right?
For instance, if you have a type Mammal, and Dog and Cat were subtypes of Mammal, then wherever Mammal appears you could pass a Dog or a Cat.
You can achieve the same concept in SML using datatypes and constructors. For instance:
datatype mammal = Dog of string | Cat of string
Then if you have a function that receives a mammal, like:
fun walk(m: mammal) = ...
Then you could pass a Dog or a Cat, because they are constructors for mammals. For instance:
walk(Dog("Fido"));
walk(Cat("Zoe"));
So this is the way SML achieves something similar to what we know as subtype polymorphism in object-oriented languages.
Ad-hoc Polymorphism:
Coercions
The actual point of confusion could be the fact that languages like Java, C# and C++ typically have automatic coercions of types. For instance, in Java an int can be automatically coerced to a long, and a float to a double. As such, I could have a function that accepts doubles and I could pass integers. Some call these automatic coercions ad-hoc polymorphism.
Such a form of polymorphism does not exist in SML. In those cases you are forced to manually coerce or convert one type to another.
fun calc(r: real) = r
You cannot call it with an integer, to do so you must convert it first:
calc(Real.fromInt(10));
So, as you can see, there is no ad-hoc polymorphism of this kind in SML. You must do castings/conversions/coercions manually.
Function Overloading
Another form of ad-hoc polymorphism is what we call method overloading in languages like Java, C# and C++. Again, there is no such thing for user-defined functions in SML. You may define two different functions with different names, but not the same function (same name) receiving different parameters or parameter types.
This concept of function or method overloading must not be confused with what you use in your examples, which is simply pattern matching for functions. That is syntactic sugar for something like this:
fun len xs =
if null xs then 0
else 1 + len(tl xs)
Parametric Polymorphism:
Finally, SML offers parametric polymorphism, very similar to what generics do in Java and C# and I understand that somewhat similar to templates in C++.
So, for instance, you could have a type like
datatype 'a list = Empty | Cons of 'a * 'a list
In a type like this 'a represents any type. Therefore this is a polymorphic type. As such, I could use the same type to define a list of integers, or a list of strings:
val listOfString = Cons("Obi-wan", Empty);
Or a list of integers
val numbers = Cons(1, Empty);
Or a list of mammals:
val pets = Cons(Cat("Milo"), Cons(Dog("Bentley"), Empty));
This is the same thing you could do with SML lists, which also have parametric polymorphism:
You could define lists of many "different types":
val listOfString = "Yoda"::"Anakin"::"Luke"::[]
val listOfIntegers = 1::2::3::4::[]
val listOfMammals = Cat("Misingo")::Dog("Fido")::Cat("Dexter")::Dog("Tank")::[]
In the same sense, we could have parametric polymorphism in functions, like in the following example where we have an identity function:
fun id x = x
The type of x is 'a, which basically means you can substitute it for any type you want, like
id("hello");
id(35);
id(Dog("Diesel"));
id(Cat("Milo"));
So, as you can see, combining all these different forms of polymorphism you should be able to achieve the same things you do in other statically typed languages.
No, it's not legal. In SML, every function has a type. The type of the len function you gave as an example is
fn : 'a list -> int
That is, it takes a list of any type and returns an integer. The function you're trying to make takes an integer or a string, and returns an integer, and that's not legal in the SML type system. The usual workaround is to make a wrapper type:
datatype wrapper = I of int | S of string
fun func (I a) = 1
| func (S a) = 2
That function has type
fn : wrapper -> int
Where wrapper can contain either an integer or a string.

Hashtables in ocaml

Is it possible to store different types in the same hashtable (Hashtbl) in OCaml? Are hashtables really restricted to just one type?
Yes, hash table entries are restricted to one type for each table. This is really a question about the OCaml type system and not about hash tables. If it seems odd to require things to be the same type in a hash table, how about in a list?
Without knowing the problem you're solving, it's hard to know what to suggest. However, a common thing to do is to create an algebraic type that has one variant for each of the types you're dealing with:
type alg = A of int | B of float
A value of type (string, alg) Hashtbl.t would store ints and floats, using a string as the lookup key.
# let ht = Hashtbl.create 44;;
val ht : ('_a, '_b) Hashtbl.t = <abstr>
# Hashtbl.add ht "yes" (A 3);;
- : unit = ()
# Hashtbl.add ht "no" (B 1.7);;
- : unit = ()
# ht;;
- : (string, alg) Hashtbl.t = <abstr>
# Hashtbl.find ht "yes";;
- : alg = A 3
After you get used to the flexible and strong typing of OCaml, it's hard to go back to systems without it.
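Getting the values back out is then a matter of pattern matching on the variant. A small sketch, assuming OCaml 4.05 or later for Hashtbl.find_opt; describe is a helper invented for this example:

```ocaml
type alg = A of int | B of float

(* Each case knows which underlying type it holds. *)
let describe = function
  | A i -> Printf.sprintf "int %d" i
  | B f -> Printf.sprintf "float %g" f

let ht : (string, alg) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.add ht "yes" (A 3);
  Hashtbl.add ht "no" (B 1.7)

let yes = describe (Hashtbl.find ht "yes")
let no = describe (Hashtbl.find ht "no")

(* find_opt returns None for a missing key instead of raising Not_found. *)
let missing = Hashtbl.find_opt ht "maybe"
```

The compiler checks that describe covers every constructor, so adding a third variant to alg later produces a warning wherever a case is missing.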
