How to implement a binary heap using lists in OCaml?

I am implementing a binary heap using lists in OCaml, just to sharpen my OCaml skills.
I find it very difficult to do with lists, and after struggling for two days I have come here for suggestions and hints.
Here is my thinking so far.
Obviously, I can't reuse the original array-based algorithm with lists.
What I am trying to use instead is a binary tree. I keep the invariant that a node must be bigger than every node below it.
I have roughly figured out how to implement insert, although I am not sure whether it is correct.
In my binary tree, each node carries two children, a value, and a size n, which is the total number of offspring (descendants) it has. This n is used to keep the tree balanced.
When inserting x, I compare it with a node (starting from the root, recursively). Assume x is less than the value of the node; then:
If one or both of the node's children are Leaf, I insert x in that Leaf position.
If neither of the node's children is a Leaf, I choose the child whose n is smaller and insert into it recursively.
Here is my code:
type 'a heap =
  | Node of 'a * 'a heap * 'a heap * int
  | Leaf

exception EmptyHeapException

let create_heap () = Leaf;;

let rec insert x = function
  | Leaf -> Node (x, Leaf, Leaf, 0)
  | Node (v, l, r, n) ->
    let (stay, move) = if x > v then (x, v) else (v, x)
    in
    match (l, r) with
    | (Leaf, Leaf) ->
      Node (stay, Node (move, Leaf, Leaf, 0), Leaf, 1)
    | (Leaf, _) ->
      Node (stay, Node (move, Leaf, Leaf, 0), r, n+1)
    | (_, Leaf) ->
      Node (stay, l, Node (move, Leaf, Leaf, 0), n+1)
    | (Node (_, _, _, n1), Node (_, _, _, n2)) ->
      if n1 <= n2 then
        Node (stay, (insert move l), r, n1+1)
      else
        Node (stay, l, (insert move r), n2+1);;
OK, I have the following questions.
Am I heading in the right direction? Is my approach and implementation correct?
I am stuck implementing the get_top function and don't know how to continue. Any hints?
OCaml Batteries has an efficient implementation in batHeap.ml. I have had a look, but its approach is totally different from mine and I can't understand it. Can anyone help me understand it?

This insertion code looks pretty nice to me. (I was confused by the counts for a while, but now I see they're counting the number of offspring.)
The function to remove the largest element (the root) is basically a deletion, which is always the most difficult. In essence you need to merge two trees while maintaining your invariant. I don't have time right now to work through it in detail, but I think it will turn out to be possible.
If you look in Okasaki (which you can do if you get stuck!) you'll see his trees have an extra invariant that makes it easier to do these operations. I'm pretty sure it's not something I would come up with right away. His implementation is based on an operation that merges two trees. It's used for insertion and deletion.
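If you're curious what that merge-centric approach looks like, here is a rough, compressed sketch of a leftist-style heap in the same spirit (my own rendition of the idea, not Okasaki's exact code, written as a max-heap to match your ordering; all the names here, lheap, lmerge and so on, are mine):
type 'a lheap = E | T of int * 'a * 'a lheap * 'a lheap  (* rank, value, left, right *)

let rank = function E -> 0 | T (r, _, _, _) -> r

(* keep the shorter right spine: the "leftist" invariant *)
let make x a b =
  if rank a >= rank b then T (rank b + 1, x, a, b)
  else T (rank a + 1, x, b, a)

let rec lmerge h1 h2 = match h1, h2 with
  | E, h | h, E -> h
  | T (_, x, a1, b1), T (_, y, a2, b2) ->
    if x >= y then make x a1 (lmerge b1 h2)
    else make y a2 (lmerge h1 b2)

let linsert x h = lmerge (T (1, x, E, E)) h
let ldelete_top = function E -> E | T (_, _, a, b) -> lmerge a b
The whole point is that a single lmerge gives you both insertion and deletion almost for free, which is why the extra invariant pays off.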
At a quick glance the Batteries heap code is based on "binomial trees", which are in fact a lot more complicated. They're explained in Okasaki also.
Update
Okasaki's book Purely Functional Data Structures is an elaboration of his PhD thesis. It seems the priority-queue material appears only in the book, not the thesis--sorry. If you're really interested in FP and not too strapped for cash, the book is well worth owning.
As I said, your insert code looks great to me. It seems to me you actually have two invariants:
The value in a node is less than or equal to the values at the roots of its subtrees (ordering invariant).
The populations of the subtrees of a node differ by at most 1 (balance invariant).
As I said, I don't have time to verify in detail, but it looks to me like your insert code maintains the invariants and thus is O(log n).
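If you want a quick sanity check of the ordering invariant, something like this works at the toplevel (just a throwaway test of my own; of_list is not part of your code):
(* Build a heap from a list with your insert; the root should be the maximum. *)
let of_list xs = List.fold_left (fun h x -> insert x h) (create_heap ()) xs

let () =
  match of_list [3; 1; 4; 1; 5; 9; 2; 6] with
  | Node (top, _, _, _) -> assert (top = 9)
  | Leaf -> assert false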
The usefulness of this structure depends on your being able to delete the root in O(log n) while maintaining these two invariants.
The sketch of delete would be something like this:
let pop = function Leaf -> 0 | Node (_, _, _, p) -> p

let rec merge a b =
  (* populations of a, b differ by at most one; pop a >= pop b *)
  match a, b with
  | Leaf, Leaf -> Leaf
  | Leaf, _ -> b
  | _, Leaf -> a
  | Node (av, al, ar, ap), Node (bv, bl, br, bp) ->
    if av >= bv then Node (av, merge al ar, b, ap + bp)
    else Node (bv, merge al ar, insert av (delete_min b), ap + bp)

and delete_min = function
  | Leaf -> Leaf
  | Node (_, Leaf, Leaf, _) -> Leaf
  | Node (_, l, Leaf, _) -> l
  | Node (_, Leaf, r, _) -> r
  | Node (_, l, r, _) ->
    if pop l >= pop r then merge l r else merge r l
I still don't have a lot of time, so this might need some fixing up for correctness or for complexity.
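As for your get_top question: that part is actually the easy one, because the ordering invariant guarantees the largest element sits at the root, so it is an O(1) read. A minimal sketch, reusing the EmptyHeapException you already declared:
(* The maximum is always at the root, so get_top never has to search. *)
let get_top = function
  | Leaf -> raise EmptyHeapException
  | Node (v, _, _, _) -> v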
Update
Being a purely cerebral guy, I (truly) never wondered what Chris Okasaki is like in real life. He teaches at West Point, and it's not too difficult to find his personal page there. It might satisfy some of your curiosity.

Related

How do I check if the path from a node to another has a depth equal to a given one in a graph in OCaml?

First of all, I'm sorry for how I wrote my question.
Anyway, I'm trying to write a function in OCaml that, given a graph, a max depth, a starting node, and another node, returns the list of the nodes that make up the path, but only if its depth is equal to the given one. However, I can't figure out how to implement the depth part.
This is what I did:
let m = [(1, 2, "A"); (2, 3, "A");
         (3, 1, "A"); (2, 4, "B");
         (4, 5, "B"); (4, 6, "C");
         (6, 3, "C"); (5, 7, "D");
         (6, 7, "D")]

let rec vicini n = function
    [] -> []
  | (x, y, _) :: rest ->
      if x = n then y :: vicini n rest
      else if y = n then x :: vicini n rest
      else vicini n rest

exception NotFound

let raggiungi m maxc start goal =
  let rec from_node visited n =
    if List.mem n visited then raise NotFound
    else if n = goal then [n]
    else n :: from_list (n :: visited) (vicini n m)
  and from_list visited = function
      [] -> raise NotFound
    | n :: rest ->
        try from_node visited n
        with NotFound -> from_list visited rest
  in start :: from_list [] (vicini start m)
I know I have to add another parameter that increases with every recursive call and then check whether it is equal to the given depth, but I don't know where.
I am not going to solve your homework, but I will try to teach you how to use recursion.
In programming, especially functional programming, we use recursion to express iteration. In an iterative procedure, there are things that change with each step and things that remain the same on each step. An iteration is well-founded if it has an end, i.e., at some point the thing that changes reaches its foundation and stops. The thing that changes on each step is usually called the induction variable, as a tribute to mathematical induction. In mathematical induction, we take a complex construct and deconstruct it step by step. For example, consider how we induct over a list to compute its length:
let rec length xs = match xs with
  | [] -> 0
  | _ :: xs -> 1 + length xs
A list is defined inductively: it is either the empty list [] or a pair of an element x and a list, x :: list, called a cons. So to discover how many elements are in the list, we follow its recursive definition and deconstruct it step by step until we reach the foundation, which in our case is the empty list.
In the example above, our induction variable was the list, and we didn't introduce any variable to represent the length itself. We used the program stack to store the length of the list, which results in an algorithm that consumes memory proportional to the size of the list just to compute its length. That doesn't sound very efficient, so we can devise another version that uses a variable passed to the function to track the length of the list; let's call it cnt:
let rec length cnt xs = match xs with
  | [] -> cnt
  | _ :: xs -> length (cnt+1) xs
Notice how on each step we deconstruct the list and increment the cnt variable. Here, the call length (cnt+1) xs plays the same role as a step in an English-language description of the algorithm: increment cnt by one, set xs to the tail of xs, and go to step 1. The only difference from an imperative implementation is that we pass the changed values as function arguments on each call, instead of changing variables in place.
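In practice you usually hide the accumulator behind a small wrapper, so callers don't have to supply the seed themselves. A minimal sketch (the inner helper name go is my own choice):
(* The accumulator is an implementation detail: seed it once in a wrapper. *)
let length xs =
  let rec go cnt = function
    | [] -> cnt
    | _ :: rest -> go (cnt + 1) rest
  in
  go 0 xs

let () = assert (length [10; 20; 30] = 3)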
As a final example, let's devise a function that checks whether a given letter occurs among the first n letters of a word, represented as a list of characters. In this function we have two inductive parameters (note that a natural number is also an inductive type, defined much like a list: a number is either zero or the successor of a number). Our recursion is still well-founded; in fact, it has two foundations, the length reaching 0 and the empty list, whichever comes first. It also has a parameter that doesn't change.
let rec has_letter_in_prefix letter length input =
  length > 0 && match input with
    | [] -> false
    | char :: input ->
      char = letter || has_letter_in_prefix letter (length-1) input
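A quick check at the toplevel, with an example word of my own, shows both foundations in action:
(* 'a' occurs within the first 3 letters of "banana"; 'n' does not occur within the first 2. *)
let () =
  assert (has_letter_in_prefix 'a' 3 ['b'; 'a'; 'n'; 'a'; 'n'; 'a']);
  assert (not (has_letter_in_prefix 'n' 2 ['b'; 'a'; 'n'; 'a'; 'n'; 'a']))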
I hope that this will help you in understanding how to encode iterations with recursion.

Find a key by having its value using Data.Map in Haskell

I just started using Haskell a few weeks ago, and I'm short on ideas for how to write a function in this situation.
So I am trying to find the predecessors of a vertex in a graph implemented in Haskell.
My graph:
-- | A directed graph
data Graph v = Graph
  { arcsMap  :: Map v [v]     -- A map associating a vertex with its successors
  , labelMap :: Map v String  -- The Graphviz label of each node
  , styleMap :: Map v String  -- The Graphviz style of each node
  }
The function successors:
-- | Returns the successors of a vertex in a graph in ascending order
--
-- We say that `v` is a successor of `u` in a graph `G` if the arc `(u,v)`
-- belongs to `G`.
-- Note: Returns the empty list if the vertex does not belong to the graph.
successors :: Ord v => v -> Graph v -> [v]
successors v (Graph arcs _ _) = findWithDefault [] v arcs
And the function I'm currently trying to write:
-- | Returns the predecessors of a vertex in a graph in ascending order
--
-- We say that `u` is a predecessor of `v` in a graph `G` if the arc `(u,v)`
-- belongs to `G`.
-- Note: Returns the empty list if the vertex does not belong to the graph.
predecessors :: Ord v => v -> Graph v -> [v]
predecessors v (Graph arcs _ _) =
  map fst (filter (\(x, [y]) -> elem v [y]) (assocs arcs))
I need to find a way to get the keys (the vertices) given a value (one of their successors). For example:
-- >>> predecessors 3 $ addArcs emptyGraph [(1,2),(2,3),(1,3)]
-- [1,2]
But when I run that line, I get Non-exhaustive patterns in lambda.
What is that and how can I fix it?
Thank you!
Edit: never mind, I corrected it, but I still do not really understand how, haha.
Haskell's Map and HashMap don't support efficient lookup of keys by value. The best you can do is O(n), and you have to write it yourself. I have something like this in my projects, which we can edit a bit to find all matching keys:
-- assuming: import qualified Data.Map as Map
lookupKey :: Eq v => v -> Map.Map k v -> [k]
lookupKey val = Map.foldrWithKey go [] where
  go key value found =
    if value == val
      then key : found
      else found
You might want to use strict folds if you use strict maps.

Union and intersection operations on collections in Cypher (Neo4j)

I need to calculate both the union and the intersection of a set of arrays/collections in Cypher. Let's say, for instance, I have the topics of interest of a number of individuals saved as an array property on each individual node, and I need to know (1) the topics that every member of a given group finds interesting, but also (2) the topics that may attract the attention of any of the group members.
So, take the following individuals as the members of a group of two:
CREATE ({name: 'bill', interests: ["biking", "hiking", "fishing", "swimming"]})
CREATE ({name: 'joe', interests: ["swimming", "hiking", "biking", "tennis"]})
Inspired by this great answer I have written the following scripts to get what I need:
Intersection (n.interests ∩ m.interests)
MATCH (n {name:'bill'}), (m {name:'joe'})
RETURN FILTER(x IN n.interests WHERE x IN m.interests)
Response: biking, hiking, swimming
Union (n.interests ∪ m.interests)
MATCH (n {name:'bill'}), (m {name:'joe'})
RETURN FILTER(x IN n.interests WHERE x IN m.interests)+
FILTER(x IN n.interests WHERE NOT(x IN m.interests))+
FILTER(x IN m.interests WHERE NOT(x IN n.interests))
Response: biking, hiking, swimming, fishing, tennis
Both work pretty well for groups of two. The problem is the union script is not generalizable and needs to be expanded further for each additional group member. This is because instead of doing a straightforward n.interests ∪ m.interests I am going the long way by producing (n.interests ∩ m.interests) ∪ (n.interests - m.interests) ∪ (m.interests - n.interests) which equals n.interests ∪ m.interests, but necessitates pairwise comparison of all individuals in the group.
Hence my question: Is there any better way in Cypher to produce the union of two collections/arrays, without redundant results in the response collection?
P.S. As you may have noticed these interests don't really have an ordering, so I am actually treating Neo4j collections as sets.
P.S.2 It is possible that I am misunderstanding and incorrectly conflating the notions of collection and array in Cypher, in which case please don't hesitate to point out what the mistake is.
APOC Procedures has union and intersection functions, which should be exactly what you need.
MATCH (n {name:'bill'}), (m {name:'joe'})
RETURN apoc.coll.union(n.interests, m.interests) as interests_union,
apoc.coll.intersection(n.interests, m.interests) as interests_intersection
The above is usable with Neo4j 3.1 and up (which supports user-defined functions). In Neo4j 3.0, these are procedures instead, and you'll need to CALL them as procedures.
This is also easily applied to multiple collections, instead of just two. If the collections are collected, you can run REDUCE() on the list of lists to apply the union or intersection for all collections.
I recently solved the same problem by first taking the union with duplicates, and then deduplicating using DISTINCT:
MATCH (n {name:'bill'}), (m {name:'joe'})
UNWIND n.interests + m.interests AS interests
RETURN COLLECT(distinct interests) AS interests_union
I think you can generalize it using REDUCE to produce your collections, and probably use one of the quantifier predicates (ANY, ALL, SINGLE, NONE).
Something like this for intersection:
WITH [1,2,3] as a, [3,4,5] as b, [2,3,4] as c
RETURN REDUCE(res=[], x in a | case when x in b AND x in c then res + [x] else res end)
WITH [1,2,3] as a, [3,4,5] as b, [2,3,4] as c
RETURN REDUCE(res=[], x in a | case when ALL(coll in [b,c] WHERE x in coll) then res + [x] else res end)
But none of these operations will have really good runtime characteristics.

Prolog infinite loop

I'm fairly new to Prolog and I hope this question hasn't been asked and answered but if it has I apologize, I can't make sense of any of the other similar questions and answers.
My problem is that I have 3 towns, connected by roads. Most are one way, but there are two towns connected by a two-way street, i.e.:
facts:
road(a, b, 1).
road(b, a, 1).
road(b, c, 3).
where a, b and c are towns, and the numbers are the distances
I need to be able to go from town a to c without getting stuck between a and b.
Up to here I can solve it with these predicates (where R is the list of towns on the route):
route(A, B, R, N) :-
    road(A, B, N),
    R1 = [B],
    R = [A|R1],
    !.
route(A, B, R, N) :-
    road(A, C, N1),
    route(C, B, R1, N2),
    \+ member(A, R1),
    R = [A | R1],
    N is N1+N2.
However, if I add a town d like so:
facts:
road(b, d, 10)
I can't get Prolog to recognize that this creates a second possible route. I know this is because I have used a cut, but without the cut it doesn't stop and ends in a stack overflow.
Furthermore, I will then need to write a new predicate that returns true when R is given as the shortest route between a and c.
Sorry for the long description. I hope someone can help me!
This is a problem of graph traversal. I think your problem is that you've got a cyclic graph — you find the leg a-->b and the next leg you find is b-->a where it again finds the leg a-->b and ... well, you get the picture.
I would approach the problem using a helper predicate with accumulators to build the route and compute the total distance. Something like this:
% ===========================================================================
% route/4: find the route(s) from Origin to Destination and compute the total distance
%
% This predicate simply invokes the helper predicate with the
% accumulator variables properly seeded.
% ===========================================================================
route(Origin,Destination,Route,Distance) :-
  route(Origin,Destination,[],0,Route,Distance)
  .

% ------------------------------------------------
% route/6: helper predicate that does all the work
% ------------------------------------------------
route(D,D,V,L,R,L) :-              % special case: you're where you want to be.
  reverse([D|V],R)                 % - reverse the visited list, since it gets built in reverse order,
  .                                % - and unify the length accumulator with the final value.
route(O,D,V,L,Route,Length) :-     % direct connection
  road(O,D,N) ,                    % - a segment exists connecting origin and destination directly
  L1 is L+N ,                      % - increment the length accumulator
  V1 = [O|V] ,                     % - prepend the current origin to the visited accumulator
  route(D,D,V1,L1,Route,Length)    % - recurse down, indicating that we've arrived at our destination
  .                                %
route(O,D,V,L,Route,Length) :-     % indirect connection
  road(O,X,N) ,                    % - a segment exists from the current origin to some destination
  X \= D ,                         % - that destination is other than the desired destination
  \+ member(X,V) ,                 % - and we've not yet visited that destination
  L1 is L+N ,                      % - increment the length accumulator
  V1 = [O|V] ,                     % - prepend the current origin to the visited accumulator
  route(X,D,V1,L1,Route,Length)    % - recurse down using the current destination as the new origin.
  .

In which functional programming function would one grow a set of items?

Which of the following three (if any; if none, please provide an alternative) would be used to add elements to a list of items?
Fold
Map
Filter
Also, how would items be added? (Appended to the end, inserted after the working item, or something else?)
A list in functional programming is usually defined as a recursive data structure that is either a special empty value, or is composed of a value (dubbed "head") and another list (dubbed "tail"). In Haskell:
-- A* = 1 + A x A*
-- there is a builtin list type:
data [a] = [] | (a : [a])
To add an element at the head, you can use "cons": the function that takes a head and a tail, and produces the corresponding list.
-- (:) is "cons" in Haskell
(:) :: a -> [a] -> [a]
x = [1,2,3] -- this is short for (1:(2:(3:[])))
y = 0 : x -- y = [0,1,2,3]
To add elements at the end, you need to recurse down the list to add it. You can do this easily with a fold.
consAtEnd :: a -> [a] -> [a]
consAtEnd x = foldr (:) [x]
-- this "rebuilds" the whole list with cons,
-- but uses [x] in the place of [],
-- effectively adding to the end
To add elements in the middle, you need to use a similar strategy:
consAt :: Int -> a -> [a] -> [a]
consAt n x l = consAtEnd x (take n l) ++ drop n l
-- ++ is the concatenation operator: it joins two lists
-- into one.
-- take picks the first n elements of a list
-- drop picks all but the first n elements of a list
Notice that except for insertions at the head, these operations cross the whole list, which may become a performance issue.
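For comparison, since the question itself is language-agnostic, here is the same picture in OCaml syntax (just a translation of the snippets above; :: is cons and @ is append there):
(* Prepending is O(1); appending at the end rebuilds the whole list. *)
let xs = [1; 2; 3]        (* sugar for 1 :: (2 :: (3 :: [])) *)
let ys = 0 :: xs          (* [0; 1; 2; 3] *)
let zs = xs @ [4]         (* [1; 2; 3; 4] *)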
"cons" is the low-level operation used in most functional programming languages to construct various data structure including lists. In lispy syntax it looks like this:
(cons 0 (cons 1 (cons 2 (cons 3 nil))))
Visually this is a linked list
0 -> 1 -> 2 -> 3 -> nil
Or perhaps more accurately
cons -- cons -- cons -- cons -- nil
 |       |       |       |
 0       1       2       3
Of course you could construct various "tree"-like data structures with cons as well.
A tree like structure might look something like this
(cons (cons 1 2) (cons 3 4))
I.e. Visually:
      cons
     /    \
  cons    cons
  /  \    /  \
 1    2  3    4
However most functional programming languages will provide many "higher level" functions for manipulating lists.
For example, in Haskell there's
Append: (++) :: [a] -> [a] -> [a]
List comprehension: [foo c | c <- s]
Cons: (:) :: a -> [a] -> [a] (as Martinho already mentioned)
And many many more
Just to offer a concluding remark: you wouldn't often operate on individual elements of a list in the way you're probably thinking; that is an imperative mindset. You're more likely to copy the entire structure using a recursive function or something along those lines. The compiler/virtual machine is responsible for recognizing when memory can be modified in place, updating pointers, etc.
