Simple non-optimal unionfind in OCaml - functional-programming

I wrote a OCaml program for union find algorithm. This algorithm I wrote is not optimal and is the simplest version.
I put my OCaml code here because I am not sure whether this code is good enough or not (despite of the algorithm itself), although this code can run without errors.
This is the first time I wrote a complete working thing after I started to learn OCaml, so please help me by reviewing it.
Useful suggestions will help me improving my OCaml skills. Thanks
type union_find = {id_ary : int array; sz_ary : int array};;
let create_union n = {id_ary = Array.init n (fun i -> i);
sz_ary = Array.init n (fun i -> 1)};;
let union u p q =
let rec unionfy id_ary i =
let vp = id_ary.(p) in
let vq = id_ary.(q) in
if i < Array.length id_ary then begin
if i != q && id_ary.(i) = vp then id_ary.(i) <- vq;
unionfy id_ary (i + 1)
end
else print_string "end of union\n"
in
unionfy u.id_ary 0;;
let is_connected u p q = u.id_ary.(p) = u.id_ary.(q);;
First of all,
Am I creating the data structure of union (as in union find) correctly?
Should I include two arrays inside or is there any better way?
Second,
I am using array in this code, but array is mutable which is not that good for fp right?
Is there a way to avoid using array?
Finally,
Overall, is this piece of code good enough?
Anything can be improved?
P.S. I am not using OCaml's object oriented bit yet, as I haven't learnt to that part.

Some comments on the code:
You don't seem to use sz_ary for anything.
Your code iterates through the whole array for each union operation. This is not correct for the standard (Tarjan) union-find. For a linear number of union operations your code produces a quadratic solution. Wikipedia has the standard algorithm: Disjoint-set Data Structure.
To answer your second question: as far as I know, union-find is one of the algorithms for which there's no known functional (immutable) solution with the same complexity as the best imperative solution. Since an array is simply a map from integers to values, you could always translate any array-based solution into an immutable one using maps. As far as I've been able to determine, this would match the best known solution in asymptotic complexity; i.e., it would add an extra factor of log n. Of course there would also be a constant factor that might be large enough to be a problem.
I've implemented union-find a few times in OCaml, and I've always chosen to do it using mutable data. However, I haven't used arrays. I have a record type for my basic objects, and I use a mutable field in each record to point to its parent object. To do path compression, you modify the parent pointer to point to the current root of the tree.

A few stylistic points:
Not sure why unionfy takes id_ary as a parameter since it keeps it constant throughout
don't use Array.init with a constant function. Just use Array.make.
print_string "...\n" is equivalent to print_endline "..."
The following definition can be cleaned up with punning to
let union u p q = to: let union {id_ary; _} p q so that there are no extraneous references to u.
Same punning trick for let is_connected u p q = u.id_ary.(p) = u.id_ary.(q);;
This might be a personal choice but I would get rid of:
let vp = id_ary.(p) in
let vq = id_ary.(q) in
Or at least shove them above the recursive definition so that it's clear they are constant.
EDIT: corrected version
let union {id_ary;_} p q =
let (vp, vq) = (id_ary.(p), id_ary.(q)) in
let rec unionfy i =
if i < Array.length id_ary then begin
if i != q && id_ary.(i) = vp then id_ary.(i) <- vq;
unionfy (i + 1)
end else print_endline "end of union"
in unionfy 0;;

Related

Constrain the range of a parametric type to integer values

I've created a parametrized type where the parameter is an integer. Below is a reasonable MWE.
struct InverseTaylor{N}
a::Vector{Float64}
function InverseTaylor{N}() where {N}
a = something_done_with_n(N)
new(a)
end
end
But I'm unhappy to have the where {N} not constraining the N further.
What I could think of during writing this question is to add a verification in the inner constructor, like so:
N isa Int || error("$N should be an integer")
Yet it still feels like there should be a better way.
P.S. I know there are caveats to using types parameterized on integer values, but I feel like if the language allows for the concept, there should also be a way to make it clear what I expect from N in the struct definition.
would
struct InverseTaylor{N}
a::Vector{Float64}
function InverseTaylor{N}() where N <: Integer
a = something_done_with_n(N)
new(a)
end
end
work for what you want to do?

Extract nth element of a tuple

For a list, you can do pattern matching and iterate until the nth element, but for a tuple, how would you grab the nth element?
TL;DR; Stop trying to access directly the n-th element of a t-uple and use a record or an array as they allow random access.
You can grab the n-th element by unpacking the t-uple with value deconstruction, either by a let construct, a match construct or a function definition:
let ivuple = (5, 2, 1, 1)
let squared_sum_let =
let (a,b,c,d) = ivuple in
a*a + b*b + c*c + d*d
let squared_sum_match =
match ivuple with (a,b,c,d) -> a*a + b*b + c*c + d*d
let squared_sum_fun (a,b,c,d) =
a*a + b*b + c*c + d*d
The match-construct has here no virtue over the let-construct, it is just included for the sake of completeness.
Do not use t-uples, Don¹
There are only a few cases where using t-uples to represent a type is the right thing to do. Most of the times, we pick a t-uple because we are too lazy to define a type and we should interpret the problem of accessing the n-th field of a t-uple or iterating over the fields of a t-uple as a serious signal that it is time to switch to a proper type.
There are two natural replacements to t-uples: records and arrays.
When to use records
We can see a record as a t-uple whose entries are labelled; as such, they are definitely the most natural replacement to t-uples if we want to access them directly.
type ivuple = {
a: int;
b: int;
c: int;
d: int;
}
We then access directly the field a of a value x of type ivuple by writing x.a. Note that records are easily copied with modifications, as in let y = { x with d = 0 }. There is no natural way to iterate over the fields of a record, mostly because a record do not need to be homogeneous.
When to use arrays
A large² homogeneous collection of values is adequately represented by an array, which allows direct access, iterating and folding. A possible inconvenience is that the size of an array is not part of its type, but for arrays of fixed size, this is easily circumvented by introducing a private type — or even an abstract type. I described an example of this technique in my answer to the question “OCaml compiler check for vector lengths”.
Note on float boxing
When using floats in t-uples, in records containing only floats and in arrays, these are unboxed. We should therefore not notice any performance modification when changing from one type to the other in our numeric computations.
¹ See the TeXbook.
² Large starts near 4.
Since the length of OCaml tuples is part of the type and hence known (and fixed) at compile time, you get the n-th item by straightforward pattern matching on the tuple. For the same reason, the problem of extracting the n-th element of an "arbitrary-length tuple" cannot occur in practice - such a "tuple" cannot be expressed in OCaml's type system.
You might still not want to write out a pattern every time you need to project a tuple, and nothing prevents you from generating the functions get_1_1...get_i_j... that extract the i-th element from a j-tuple for any possible combination of i and j occuring in your code, e.g.
let get_1_1 (a) = a
let get_1_2 (a,_) = a
let get_2_2 (_,a) = a
let get_1_3 (a,_,_) = a
let get_2_3 (_,a,_) = a
...
Not necessarily pretty, but possible.
Note: Previously I had claimed that OCaml tuples can have at most length 255 and you can simply generate all possible tuple projections once and for all. As #Virgile pointed out in the comments, this is incorrect - tuples can be huge. This means that it is impractical to generate all possible tuple projection functions upfront, hence the restriction "occurring in your code" above.
It's not possible to write such a function in full generality in OCaml. One way to see this is to think about what type the function would have. There are two problems. First, each size of tuple is a different type. So you can't write a function that accesses elements of tuples of different sizes. The second problem is that different elements of a tuple can have different types. Lists don't have either of these problems, which is why you can have List.nth.
If you're willing to work with a fixed size tuple whose elements are all the same type, you can write a function as shown by #user2361830.
Update
If you really have collections of values of the same type that you want to access by index, you should probably be using an array.
here is a function wich return you the string of the ocaml function you need to do that ;) very helpful I use it frequently.
let tup len n =
if n>=0 && n<len then
let rec rep str nn = match nn<1 with
|true ->""
|_->str ^ (rep str (nn-1))in
let txt1 ="let t"^(string_of_int len)^"_"^(string_of_int n)^" tup = match tup with |" ^ (rep "_," n) ^ "a" and
txt2 =","^(rep "_," (len-n-2)) and
txt3 ="->a" in
if n = len-1 then
print_string (txt1^txt3)
else
print_string (txt1^txt2^"_"^txt3)
else raise (Failure "Error") ;;
For example:
tup 8 6;;
return:
let t8_6 tup = match tup with |_,_,_,_,_,_,a,_->a
and of course:
val t8_6 : 'a * 'b * 'c * 'd * 'e * 'f * 'g * 'h -> 'g = <fun>

How to implement a dictionary as a function in OCaml?

I am learning Jason Hickey's Introduction to Objective Caml.
Here is an exercise I don't have any clue
First of all, what does it mean to implement a dictionary as a function? How can I image that?
Do we need any array or something like that? Apparently, we can't have array in this exercise, because array hasn't been introduced yet in Chapter 3. But How do I do it without some storage?
So I don't know how to do it, I wish some hints and guides.
I think the point of this exercise is to get you to use closures. For example, consider the following pair of OCaml functions in a file fun-dict.ml:
let empty (_ : string) : int = 0
let add d k v = fun k' -> if k = k' then v else d k'
Then at the OCaml prompt you can do:
# #use "fun-dict.ml";;
val empty : string -> int =
val add : ('a -> 'b) -> 'a -> 'b -> 'a -> 'b =
# let d = add empty "foo" 10;;
val d : string -> int =
# d "bar";; (* Since our dictionary is a function we simply call with a
string to look up a value *)
- : int = 0 (* We never added "bar" so we get 0 *)
# d "foo";;
- : int = 10 (* We added "foo" -> 10 *)
In this example the dictionary is a function on a string key to an int value. The empty function is a dictionary that maps all keys to 0. The add function creates a closure which takes one argument, a key. Remember that our definition of a dictionary here is function from key to values so this closure is a dictionary. It checks to see if k' (the closure parameter) is = k where k is the key just added. If it is it returns the new value, otherwise it calls the old dictionary.
You effectively have a list of closures which are chained not by cons cells by by closing over the next dictionary(function) in the chain).
Extra exercise, how would you remove a key from this dictionary?
Edit: What is a closure?
A closure is a function which references variables (names) from the scope it was created in. So what does that mean?
Consider our add function. It returns a function
fun k' -> if k = k' then v else d k
If you only look at that function there are three names that aren't defined, d, k, and v. To figure out what they are we have to look in the enclosing scope, i.e. the scope of add. Where we find
let add d k v = ...
So even after add has returned a new function that function still references the arguments to add. So a closure is a function which must be closed over by some outer scope in order to be meaningful.
In OCaml you can use an actual function to represent a dictionary. Non-FP languages usually don't support functions as first-class objects, so if you're used to them you might have trouble thinking that way at first.
A dictionary is a map, which is a function. Imagine you have a function d that takes a string and gives back a number. It gives back different numbers for different strings but always the same number for the same string. This is a dictionary. The string is the thing you're looking up, and the number you get back is the associated entry in the dictionary.
You don't need an array (or a list). Your add function can construct a function that does what's necessary without any (explicit) data structure. Note that the add function takes a dictionary (a function) and returns a dictionary (a new function).
To get started thinking about higher-order functions, here's an example. The function bump takes a function (f: int -> int) and an int (k: int). It returns a new function that returns a value that's k bigger than what f returns for the same input.
let bump f k = fun n -> k + f n
(The point is that bump, like add, takes a function and some data and returns a new function based on these values.)
I thought it might be worth to add that functions in OCaml are not just pieces of code (unlike in C, C++, Java etc.). In those non-functional languages, functions don't have any state associated with them, it would be kind of rediculous to talk about such a thing. But this is not the case with functions in functional languages, you should start to think of them as a kind of objects; a weird kind of objects, yes.
So how can we "make" these objects? Let's take Jeffrey's example:
let bump f k =
fun n ->
k + f n
Now what does bump actually do? It might help you to think of bump as a constructor that you may already be familiar with. What does it construct? it constructs a function object (very losely speaking here). So what state does that resulting object has? it has two instance variables (sort of) which are f and k. These two instance variables are bound to the resulting function-object when you invoke bump f k. You can see that the returned function-object:
fun n ->
k + f n
Utilizes these instance variables f and k in it's body. Once this function-object is returned, you can only invoke it, there's no other way for you to access f or k (so this is encapsulation).
It's very uncommon to use the term function-object, they are called just functions, but you have to keep in mind that they can "enclose" state as well. These function-objects (also called closures) are not far separated from the "real" objects in object-oriented programming languages, a very interesting discussion can be found here.
I'm also struggling with this problem. Here's my solution and it works for the cases listed in the textbook...
An empty dictionary simply returns 0:
let empty (k:string) = 0
Find calls the dictionary's function on the key. This function is trivial:
let find (d: string -> int) k = d k
Add extends the function of the dictionary to have another conditional branching. We return a new dictionary that takes a key k' and matches it against k (the key we need to add). If it matches, we return v (the corresponding value). If it doesn't match we return the old (smaller) dictionary:
let add (d: string -> int) k v =
fun k' ->
if k' = k then
v
else
d k'
You could alter add to have a remove function. Also, I added a condition to make sure we don't remove a non-exisiting key. This is just for practice. This implementation of a dictionary is bad anyways:
let remove (d: string -> int) k =
if find d k = 0 then
d
else
fun k' ->
if k' = k then
0
else
d k'
I'm not good with the terminology as I'm still learning functional programming. So, feel free to correct me.

Erlang syntax for nested function with if

I've been looking around and can't find examples of this and all of my syntax wrestling skills are failing me. Can someone please tell me how to make this compile?? My ,s ;s or .s are just wrong I guess for defining a nested function...
I'm aware there is a function for doing string replaces already so I don't need to implement this, but I'm playing with Erlang trying to pick it up so I'm hand spinning some of the basics I need to use..
replace(Whole,Old,New) ->
OldLen = length(Old),
ReplaceInit = fun(Next, NewWhole) ->
if
lists:prefix(Old, [Next|NewWhole]) -> {_,Rest} = lists:split(OldLen-1, NewWhole), New ++ Rest;
true -> [Next|NewWhole]
end,
lists:foldr(ReplaceInit, [], Whole).
Basically I'm trying to write this haskell (also probably bad but beyond the point):
repl xs ys zs =
foldr replaceInit [] xs
where
ylen = length ys
replaceInit y newxs
| take ylen (y:newxs) == ys = zs ++ drop (ylen-1) newxs
| otherwise = y:newxs
The main problem is that in an if you are only allowed to use guards as tests. Guards are very restricted and, amongst other things, calls to general Erlang functions are not allowed. Irrespective of whether they are part of the OTP release or written by you. The best solution for your function is to use case instead of if. For example:
replace(Whole,Old,New) ->
OldLen = length(Old),
ReplaceInit = fun (Next, NewWhole) ->
case lists:prefix(Old, [Next|NewWhole]) of
true ->
{_,Rest} = lists:split(OldLen-1, NewWhole),
New ++ Rest;
false -> [Next|NewWhole]
end
end,
lists:foldr(ReplaceInit, [], Whole).
Because of this if is not used that often in Erlang. See about if and about guards in the Erlang documentation.

Implementing quicksort in a functional language

I need to implement quicksort in SML for a homework assignment, and I'm lost. I was previously unfamiliar with how quicksort was implemented, so I read up on that, but every implementation I read about was in an imperative one. These don't look too difficult, but I had no idea how to implement quicksort functionally.
Wikipedia happens to have quicksort code in Standard ML (which is the language required for my assignment), but I don't understand how it's working.
Wikipedia code:
val filt = List.filter
fun quicksort << xs = let
fun qs [] = []
| qs [x] = [x]
| qs (p::xs) = let
val lessThanP = (fn x => << (x, p))
in
qs (filt lessThanP xs) # p :: (qs (filt (not o lessThanP) xs))
end
in
qs xs
end
In particular, I don't understand this line: qs (filt lessThanP xs) # p :: (qs (filt (not o lessThanP) xs)). filt will return a list of everything in xs less than p*, which is concatenated with p, which is cons-ed onto everything >= p.*
*assuming the << (x, p) function returns true when x < p. Of course it doesn't have to be that.
Actually, typing this out is helping me understand what's going on a bit. Anyways, I'm trying to compare that SML function to wiki's quicksort pseudocode, which follows.
function quicksort(array, 'left', 'right')
// If the list has 2 or more items
if 'left' < 'right'
// See "Choice of pivot" section below for possible choices
choose any 'pivotIndex' such that 'left' ≤ 'pivotIndex' ≤ 'right'
// Get lists of bigger and smaller items and final position of pivot
'pivotNewIndex' := partition(array, 'left', 'right', 'pivotIndex')
// Recursively sort elements smaller than the pivot
quicksort(array, 'left', 'pivotNewIndex' - 1)
// Recursively sort elements at least as big as the pivot
quicksort(array, 'pivotNewIndex' + 1, 'right')
Where partition is defined as
// left is the index of the leftmost element of the array
// right is the index of the rightmost element of the array (inclusive)
// number of elements in subarray = right-left+1
function partition(array, 'left', 'right', 'pivotIndex')
'pivotValue' := array['pivotIndex']
swap array['pivotIndex'] and array['right'] // Move pivot to end
'storeIndex' := 'left'
for 'i' from 'left' to 'right' - 1 // left ≤ i < right
if array['i'] < 'pivotValue'
swap array['i'] and array['storeIndex']
'storeIndex' := 'storeIndex' + 1
swap array['storeIndex'] and array['right'] // Move pivot to its final place
return 'storeIndex'
So, where exactly is the partitioning happening? Or am I thinking about SMLs quicksort wrongly?
So, where exactly is the partitioning happening? Or am I thinking about SMLs quicksort wrongly?
A purely functional implementation of quicksort works by structural recursion on the input list (IMO, this is worth mentioning). Moreover, as you see, the two calls to "filt" allow you to partition the input list into two sublists (say A and B), which then can be treated individually. What is important here is that:
all elements of A are less than or equal to the pivot element ("p" in the code)
all elements of B are greater than the pivot element
An imperative implementation works in-place, by swapping elements in the same array. In the pseudocode you've provided, the post-invariant of the "partition" function is that you have two subarrays, one starting at 'left' of input array (and ending at 'pivotIndex'), and another starting right after 'pivotIndex' and ending at 'right'. What is important here is that the two subarrays can be seen as representations of the sublists A and B.
I think that by now, you have the idea where the partitioning step is happening (or conversely, how the imperative and functional are related).
You said this:
filt will return a list of everything in xs less than p*, which is concatenated with p, which is cons-ed onto everything >= p.*
That's not quite accurate. filt will return a list of everything in xs less than p, but that new list isn't immediately concatenated with p. The new list is in fact passed to qs (recursively), and whatever qs returns is concatenated with p.
In the pseudocode version, the partitioning happens in-place in the array variable. That's why you see swap in the partition loop. Doing the partitioning in-place is much better for performance than making a copy.

Resources