Suggestion for solving fragile pattern matching - functional-programming

I often need to match a tuple of values that should have the same constructor. The catch-all _,_ always winds up at the end. This is of course fragile: any additional constructor added to the type will still compile without a warning. My current thought is to write matches that constrain the first but not the second argument. But are there any other options?
For example,
type data = | States of int array
| Chars of (char list) array
let median a b = match a,b with
| States xs, States ys ->
assert( (Array.length xs) = (Array.length ys) );
States (Array.init (Array.length xs) (fun i -> xs.(i) lor ys.(i)))
| Chars xs, Chars ys ->
assert( (Array.length xs) = (Array.length ys) );
let union c1 c2 = (List.filter (fun x -> not (List.mem x c2)) c1) @ c2 in
Chars (Array.init (Array.length xs) (fun i -> union xs.(i) ys.(i)))
(* inconsistent pairs of matching *)
| Chars _, _
| States _, _ -> assert false

You can use the slightly shorter pattern below:
| (Chars _| States _), _ -> assert false
In fact, you can let the compiler generate it for you, because it's still a little tedious.
Type the following and compile:
let median a b = match a,b with
| States xs, States ys ->
assert( (Array.length xs) = (Array.length ys) );
States (Array.init (Array.length xs) (fun i -> xs.(i) lor ys.(i)))
| Chars xs, Chars ys ->
assert( (Array.length xs) = (Array.length ys) );
let union c1 c2 = (List.filter (fun x -> not (List.mem x c2)) c1) @ c2 in
Chars (Array.init (Array.length xs) (fun i -> union xs.(i) ys.(i)))
Warning 8: this pattern-matching is
not exhaustive. Here is an example of
a value that is not matched: (Chars _,
States _)
You can now copy-paste the suggested pattern back into your code. This is usually how I generate non-fragile catch-all patterns for types with tens of constructors. You may need to launch the compiler several times, but it's still faster than typing them yourself.
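As a minimal sketch of why the explicit catch-all is less fragile than _,_, consider the type extended with a third constructor. The Floats constructor and the same_constructor function are hypothetical, introduced only for illustration. Because the final clause names every constructor, adding a fourth one to the type later makes this match non-exhaustive and the compiler reports it (warning 8), whereas a _,_ clause would swallow the new case silently.

```ocaml
type data =
  | States of int array
  | Chars of char list array
  | Floats of float array  (* hypothetical constructor, added for illustration *)

(* Returns true when both values were built with the same constructor. *)
let same_constructor a b =
  match a, b with
  | States _, States _ -> true
  | Chars _, Chars _ -> true
  | Floats _, Floats _ -> true
  (* Explicit catch-all: each constructor is listed by name, so a constructor
     added to [data] later is NOT covered here and triggers warning 8. *)
  | (States _ | Chars _ | Floats _), _ -> false
```

The price is that the last clause has to be extended by hand for each new constructor, which is exactly the reminder you want.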

It's only a matter of taste/style, but I tend to prefer grouping clauses on the same constructor together, rather than having the useful clauses for everything first, then all the "absurd cases" together. This can be quite helpful when you get to write several "useful" clauses for one given constructor, and want to check you didn't forget anything.
let median a b = match a,b with
| States xs, States ys ->
assert( (Array.length xs) = (Array.length ys) );
States (Array.init (Array.length xs) (fun i -> xs.(i) lor ys.(i)))
| States _, _ -> assert false
| Chars xs, Chars ys ->
assert( (Array.length xs) = (Array.length ys) );
let union c1 c2 = (List.filter (fun x -> not (List.mem x c2)) c1) @ c2 in
Chars (Array.init (Array.length xs) (fun i -> union xs.(i) ys.(i)))
| Chars _, _ -> assert false

This is pretty hackish (and results in warnings) but you can use Obj to check if the tags are equal or not. It should catch all cases where a and b have different values:
type data = | States of int array
| Chars of (char list) array
let median a b = match a,b with
| States xs, States ys ->
assert( (Array.length xs) = (Array.length ys) );
States (Array.init (Array.length xs) (fun i -> xs.(i) lor ys.(i)))
| Chars xs, Chars ys ->
assert( (Array.length xs) = (Array.length ys) );
let union c1 c2 = (List.filter (fun x -> not (List.mem x c2)) c1) @ c2 in
Chars (Array.init (Array.length xs) (fun i -> union xs.(i) ys.(i)))
(* inconsistent pairs of matching *)
| x, y when (Obj.tag (Obj.repr x)) <> (Obj.tag (Obj.repr y)) -> assert false
The warning is for non-exhaustive pattern-matching (since it can't tell whether or not the guarded clause matches the rest or not).
EDIT: you don't need to use Obj at all, you can just compare x and y directly:
| x, y when x <> y -> assert false
Though this still results in a warning, unfortunately.

Related

OCaml function apparently works but does not return the expected result

Who can help? I am a beginner in OCaml, I am trying to perform an action of unpacking sets. Having a set [(1, 4); (2, 5); (3, 6)] I want to get the exit [(1,2,3), (4,5,6)]. I am using a script that I tested with Haskell and it worked, but in OCaml, it does not show the result. Where am I going wrong? I could not figure out where my mistake is. Thx.
let fst num1 num2 =
match num1, num2 with
| (x, y) -> x;;
let snd num1 num2 =
match num1, num2 with
| (x, y) -> y;;
let rec dcp_base list1 list2 list3 =
match list1, list2, list3 with
| (xs, ys, []) -> (xs, ys)
| (xs, ys, z :: zs) -> dcp_base (xs @ [fst z]) (ys @ [snd z]) zs;;
let descompact list =
match list with
| [] -> ([], [])
| xs -> dcp_base [] [] xs;;
The problem is your redefinition of fst and snd. They're not needed, as they're already defined in the standard library and in scope with exactly those names. But they're also wrong. Your implementation takes two arguments and selects either the first or second in a roundabout way by creating an intermediary tuple, instead of taking a single tuple argument directly. Therefore, when you apply it to a single tuple argument it will return a partially applied function expecting the second argument.
You can fix the problem just by removing the definitions of fst and snd from your code, but if you absolutely want to reimplement it, it ought to look something more like this:
let fst (x, _) = x;;
let snd (_, y) = y;;
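To make the partial application visible, here is a small sketch. The name fst2 is hypothetical; it is the two-argument definition from the question, renamed so that it does not shadow the standard fst.

```ocaml
(* The two-argument definition from the question, renamed to fst2. *)
let fst2 num1 num2 =
  match num1, num2 with
  | (x, _) -> x

(* Applied to one tuple, it does not return 1: it returns a function
   that is still waiting for its second argument. *)
let partial = fst2 (1, 4)

(* Supplying any second argument gives back the whole first argument. *)
let result = partial ()

(* The standard library fst takes a single pair and returns its first component. *)
let fixed = fst (1, 4)
```

Here result is the pair (1, 4) and fixed is 1, which is why the lists built in dcp_base ended up holding functions instead of integers.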
Your fst and snd functions are actually strange since you take two arguments to return the first one or the second one. I guess you wanted to get the first or second element of a pair so you should write (from most detailed to least detailed)
(* too much details *)
let fst num = match num with (x, y) -> x
let snd num = match num with (x, y) -> y
(* let's use the wildcards *)
let fst num = match num with (x, _) -> x
let snd num = match num with (_, y) -> y
(* do we really need num? *)
let fst = function (x, _) -> x
let snd = function (_, y) -> y
(* do we really need to match on a single pattern? *)
let fst (x, _) = x
let snd (_, y) = y
And it should work.
As a side note, fst and snd already exist in the standard library but it's never wrong to try implementing them yourself
Second side note, appending at the end of a list is usually not advised (not tail recursive, you're forcing the program to traverse the entire list to append an element at the end). What you could do instead is to add each new element at the head of the list and reverse the final list:
let rec dcp_base list1 list2 list3 =
match list1, list2, list3 with
| (xs, ys, []) -> (List.rev xs, List.rev ys)
| (xs, ys, z :: zs) -> dcp_base (fst z :: xs) (snd z :: ys) zs;;
And actually, since OCaml is really strong, you don't need fst and snd at all:
let rec dcp_base list1 list2 list3 =
match list1, list2, list3 with
| (xs, ys, []) -> (List.rev xs, List.rev ys)
| (xs, ys, (x, y) :: zs) -> dcp_base (x :: xs) (y :: ys) zs;;
Proof:
let rec dcp_base list1 list2 list3 =
match list1, list2, list3 with
| (xs, ys, []) -> (List.rev xs, List.rev ys)
| (xs, ys, (x, y) :: zs) -> dcp_base (x :: xs) (y :: ys) zs;;
let descompact list =
match list with
| [] -> ([], [])
| xs -> dcp_base [] [] xs;;
descompact [(1, 4); (2, 5); (3, 6)];;
- : int list * int list = ([1; 2; 3], [4; 5; 6])
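For comparison, the standard library already provides this operation as List.split, so a hand-written version is only needed as an exercise. A minimal sketch (the names pairs, firsts and seconds are just for illustration):

```ocaml
let pairs = [(1, 4); (2, 5); (3, 6)]

(* List.split : ('a * 'b) list -> 'a list * 'b list *)
let firsts, seconds = List.split pairs
(* firsts = [1; 2; 3] and seconds = [4; 5; 6] *)
```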

First and last element of list OCaml

I am trying to get first and last element of the list in OCaml. I expect that my function will be like
'a list -> 'a * 'a
What I am trying to do is
let lista = [1;2;3;4;6;0];;
let rec first_last myList =
match myList with
[x] -> (List.hd lista,x)
| head::tail ->
first_last tail;;
first_last lista;;
Of course, because I defined the list as a list of integers, the inferred type looks like
int list -> int * 'a
The point is that I don't have any idea how to write this function for 'a.
What's the direction?
The direction is to write two different functions first and last and implement the first_and_last function as:
let first_and_last xs = first xs, last xs
Another possibility with only one function:
let rec first_last = function
| [] -> failwith "too bad"
| [e] -> failwith "too bad"
| [e1;e2] -> (e1,e2)
| e1 :: _ :: r -> first_last (e1::r)
You may prefer it like that:
let rec first_last myList = match myList with
| [] -> failwith "too bad"
| [e] -> failwith "too bad"
| [e1;e2] -> (e1,e2)
| e1 :: _ :: r -> first_last (e1::r)
You can create two separate functions to return first element and last element, and then in your first_and_last function return a tuple (first_element, last_element).
let rec first_element list =
match list with
| [] -> failwith "List is empty"
| first_el::rest_of_list -> first_el
let rec last_element list =
match list with
| [] -> failwith "List is empty"
| [x] -> x
| first_el::rest_of_list -> last_element rest_of_list
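The answer above stops before combining the two helpers, so here is one way it could look. This is a sketch, with first_and_last as the assumed name of the combining function; first_element does not need rec because it never calls itself.

```ocaml
let first_element = function
  | [] -> failwith "List is empty"
  | first_el :: _ -> first_el

let rec last_element = function
  | [] -> failwith "List is empty"
  | [x] -> x
  | _ :: rest_of_list -> last_element rest_of_list

(* 'a list -> 'a * 'a ; raises Failure on the empty list *)
let first_and_last xs = (first_element xs, last_element xs)
```

Since neither helper mentions a concrete list such as lista, the result is polymorphic: first_and_last [1;2;3;4;6;0] gives (1, 0), and a singleton list gives the same element twice.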
You can create a helper function that has a base-case of the empty-list - for which it returns itself, and otherwise checks if the next recursive call will return an empty list. If it does, return the current element (which is by definition the last element in the list), and if it doesn't, return what was returned by the recursive call.
For the regular (non-helper) method, if the list is at least one element long (i.e. hd::tl = hd::[]) then you can just concatenate the list you got from the last function onto the head from ls.
It can be implemented as follow:
let rec last ls =
match ls with
| [] -> []
| hd::tl -> let next = last tl in
if next = [] then [hd]
else next
;;
let first_last ls =
match ls with
| [] -> failwith "Oh no!!!!! Empty list!"
| hd::tl -> hd::last tl
;;
Yet another take on this problem.
let first_last xs =
let rec last_non_empty = function
| [x] -> x
| _ :: xs' -> last_non_empty xs'
| [] -> failwith "first_last: impossible case!"
in
match xs with
| [] -> failwith "first_last"
| x::_ -> (x, last_non_empty xs)
Some properties of this implementation:
(1) it meets the specification 'a list -> 'a * 'a:
utop > #typeof "first_last";;
val first_last : 'a list -> 'a * 'a
(2) it works for singleton lists: first_last [x] = (x,x):
utop> first_last [1];;
- : int * int = (1, 1)
utop> first_last ["str"];;
- : bytes * bytes = ("str", "str")
(3) it's tail-recursive (hence it won't cause stack overflow for sufficiently big lists):
utop > first_last (Array.to_list (Array.init 1000000 (fun x -> x+1)));;
- : int * int = (1, 1000000)
(4) it traverses the input list one time only; (5) it avoids creating new lists as it goes down the recursive ladder; (6) it avoids polluting the namespace (with the price of not allowing the reuse of a function like last).
And another rather simple variant, from the first principles (I was trying to illustrate "wishful thinking" in the spirit of the SICP book):
(* Not tail-recursive, might result in stack overflow *)
let rec first_last = function
| [] -> failwith "first_last"
| [x] -> (x,x)
| x :: xs -> (x, snd (first_last xs))
You could write it like this:
let first_last = function
| [] -> assert false
| x :: xs -> (x, List.fold_left (fun _ y -> y) x xs)
Or, if you are using the Base library, you could write in this way:
let first_last xs = (List.hd_exn xs, List.reduce_exn ~f:(fun _ y -> y) xs)
The basic idea is that List.fold_left (fun _ y -> y) x xs computes the last element of x :: xs. You can prove this by induction on xs. If xs = [] then List.fold_left (fun _ y -> y) x [] = x, which is the last element of x :: []. If xs = x' :: xs' then List.fold_left (fun _ y -> y) x (x' :: xs') can be rewritten as List.fold_left (fun _ y -> y) x' xs', because List.fold_left f acc (x :: xs) = List.fold_left f (f acc x) xs and here f x x' = x'. By the induction hypothesis this is the last element of x' :: xs', which is also the last element of x :: x' :: xs', so we are finished.
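The induction argument can be checked on small inputs. In this sketch, last_of is a hypothetical helper name wrapping the fold; the comment traces how the accumulator is replaced at each step.

```ocaml
(* last_of x xs is the last element of x :: xs. *)
let last_of x xs = List.fold_left (fun _ y -> y) x xs

(* Unfolding for last_of 1 [2; 3]:
     fold_left f 1 [2; 3]
   = fold_left f (f 1 2) [3]   = fold_left f 2 [3]
   = fold_left f (f 2 3) []    = fold_left f 3 []
   = 3 *)
let base_case = last_of 1 []        (* 1: the accumulator is returned unchanged *)
let step_case = last_of 1 [2; 3]    (* 3 *)
```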

How to create a function that encodes run-length using fold_right?

I created a function and helper function that find the number of repeating elements in a list, and what those elements.
let rec _encode l x =
match l with
| [] -> 0
| head::rest -> (if head = x then 1 else 0) + _encode rest x
let encode l x = ((_encode l x), x)
In this case, I have to specify what that element is for it to search.
So this is a two part question. 1) How do I do it to return a list of tuples, with format (int * 'a) list, where int is the # of rep, and 'a is the element that is repeating.
2) How would I implement this using fold_right?
I was thinking something along the lines of:
let encode (l : 'a list) : (int * 'a) list = fold_right (fun (x,hd) lst ->
match x with
| [] -> 0
| hd :: rest -> if hd x then (x+1, hd) else (x, hd)) l []
Your attempt looks very confused:
It doesn't use lst, hd (the first one), or rest.
x is used as a list (match x with []) and a number (x+1).
The elements of x (list) are functions that return bools?? (... hd::rest -> ... if hd x)
The function sometimes returns a number (0) and sometimes a tuple ((x, hd)).
Here's how I'd do it:
let encode l =
let f x = function
| (n, y) :: zs when x = y -> (n + 1, y) :: zs
| zs -> (1, x) :: zs
in
List.fold_right f l []
Which is the same as:
let encode l =
let f x z = match z with
| (n, y) :: zs when x = y -> (n + 1, y) :: zs
| zs -> (1, x) :: zs
in
List.fold_right f l []
Which is the same as:
let encode l =
List.fold_right (fun x z ->
match z with
| (n, y) :: zs when x = y -> (n + 1, y) :: zs
| zs -> (1, x) :: zs
) l []
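A short usage sketch may help to see what the fold does. Because List.fold_right visits the list from the right, each element is compared with the head of the result built so far: it either bumps that head's counter or starts a new run. The value name example is just for illustration.

```ocaml
let encode l =
  let f x = function
    | (n, y) :: zs when x = y -> (n + 1, y) :: zs  (* x extends the current run *)
    | zs -> (1, x) :: zs                           (* x starts a new run *)
  in
  List.fold_right f l []

let example = encode ['a'; 'a'; 'b'; 'c'; 'c'; 'c']
(* example = [(2, 'a'); (1, 'b'); (3, 'c')] *)
```

Note that this counts consecutive runs, so ['a'; 'b'; 'a'] yields three entries rather than merging the two 'a's.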

Any simpler way to implement non-in-place selection sort in OCaml?

I implemented a non-in-place version of selection sort in OCaml.
let sort compare_fun l =
let rec find_min l' min_l origin_l =
match l' with
| [] ->
if min_l = [] then (min_l, l')
else
let min = List.hd min_l
in
(min_l, List.filter (fun x -> if x != min then true else false) origin_l)
| x::tl ->
if min_l = [] then
find_min tl [x] origin_l
else
let c = compare_fun (List.hd min_l) x
in
if c = 1 then
find_min tl [x] origin_l
else if c = 0 then
find_min tl (min_l @ [x]) origin_l
else
find_min tl min_l origin_l
in
let rec insert_min l' new_l =
match l' with
| [] -> new_l
| _ ->
let (min_l, rest) = find_min l' [] l'
in
insert_min rest (new_l @ min_l)
in
insert_min l [];;
My idea is that in a list, every time I find the list of minimum items (in case of duplicate values) and add this min list to the result list, then redo the finding_min in the rest of the list.
I use List.filter to filter out the min_list, so the resulting list will be the list for next find_min.
I find my implementation is quite complicated, and far more complicated than the Java in-place version of selection sort.
Any suggestions to improve it?
Edit: Here's a much better implementation: http://rosettacode.org/wiki/Sorting_algorithms/Selection_sort#OCaml
here's my own crappier implementation
(* partial function - bad habit, don't do this. *)
let smallest (x::xs) = List.fold_right (fun e acc -> min e acc) xs x
let remove l y =
let rec loop acc = function
| [] -> raise Not_found
| x::xs -> if y = x then (List.rev acc) @ xs else loop (x::acc) xs
in loop [] l
let selection_sort =
let rec loop acc = function
| [] -> List.rev acc
| xs ->
let small = smallest xs in
let rest = remove xs small in
loop (small::acc) rest
in loop []
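Another possible simplification, sketched here under the assumption that stability is not required, is to find the minimum and the remaining elements in a single pass, so that no separate removal step is needed. The helper name extract_min is hypothetical, and compare_fun follows the convention of the standard compare (negative, zero, positive).

```ocaml
(* One pass over x :: xs: returns the minimum and all other elements.
   The order of the remaining elements is not preserved. *)
let extract_min compare_fun x xs =
  List.fold_left
    (fun (m, rest) y ->
       if compare_fun y m < 0 then (y, m :: rest) else (m, y :: rest))
    (x, []) xs

let selection_sort compare_fun l =
  let rec loop acc = function
    | [] -> List.rev acc
    | x :: xs ->
      let m, rest = extract_min compare_fun x xs in
      loop (m :: acc) rest          (* tail call: smallest element goes first *)
  in
  loop [] l
```

For example, selection_sort compare [3; 1; 2; 1] evaluates to [1; 1; 2; 3]. It is still quadratic, as any selection sort is, but both functions are tail-recursive and duplicates need no special handling.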

F# System.OutOfMemoryException with recursive call

This is actually a solution to Project Euler Problem 14 in F#. However, I'm running into a System.OutOfMemory exception when attempting to calculate an iterative sequence for larger numbers. As you can see, I'm writing my recursive function with tail calls.
I was running into a problem with StackOverFlowException because I was debugging in visual studio (which disables the tail calls). I've documented that in another question. Here, I'm running in release mode--but I'm getting out of memory exceptions when I run this as a console app (on windows xp with 4gb ram).
I'm really at a loss to understand how I coded myself into this memory overflow, and I'm hoping someone can show me the error of my ways.
let E14_interativeSequence x =
    let rec calc acc startNum =
        match startNum with
        | d when d = 1 -> List.rev (d::acc)
        | e when e%2 = 0 -> calc (e::acc) (e/2)
        | _ -> calc (startNum::acc) (startNum * 3 + 1)
    let maxNum pl=
        let rec maxPairInternal acc pairList =
            match pairList with
            | [] -> acc
            | x::xs -> if (snd x) > (snd acc) then maxPairInternal x xs
                       else maxPairInternal acc xs
        maxPairInternal (0,0) pl
        |> fst
    // if I lower this to like [2..99999] it will work.
    [2..99999]
    |> List.map (fun n -> (n,(calc [] n)))
    |> List.map (fun pair -> ((fst pair), (List.length (snd pair))))
    |> maxNum
    |> (fun x-> Console.WriteLine(x))
EDIT
Given the suggestions via the answers, I rewrote to use a lazy list and also to use Int64's.
#r "FSharp.PowerPack.dll"
let E14_interativeSequence =
    let rec calc acc startNum =
        match startNum with
        | d when d = 1L -> List.rev (d::acc) |> List.toSeq
        | e when e%2L = 0L -> calc (e::acc) (e/2L)
        | _ -> calc (startNum::acc) (startNum * 3L + 1L)
    let maxNum (lazyPairs:LazyList<System.Int64*System.Int64>) =
        let rec maxPairInternal acc (pairs:seq<System.Int64*System.Int64>) =
            match pairs with
            | :? LazyList<System.Int64*System.Int64> as p ->
                match p with
                | LazyList.Cons(x,xs)-> if (snd x) > (snd acc) then maxPairInternal x xs
                                        else maxPairInternal acc xs
                | _ -> acc
            | _ -> failwith("not a lazylist of pairs")
        maxPairInternal (0L,0L) lazyPairs
        |> fst
    {2L..999999L}
    |> Seq.map (fun n -> (n,(calc [] n)))
    |> Seq.map (fun pair -> ((fst pair), (Convert.ToInt64(Seq.length (snd pair)))))
    |> LazyList.ofSeq
    |> maxNum
which solves the problem. I'd also look at Yin Zhu's solution which is better, though.
As mentioned by Brian, List.* operations are not appropriate here. They cost too much memory.
The stack overflow comes from somewhere else. There are two places where it could occur: calc and maxPairInternal. It must be the first, as the second has the same depth as the first. The real problem is the numbers: values in the 3n+1 problem can easily become very large. So you first get an int32 overflow, and then you get a stack overflow. After changing the numbers to 64-bit, the program works.
Here is my solution page, where you can see a memoization trick.
open System
let E14_interativeSequence x =
    let rec calc acc startNum =
        match startNum with
        | d when d = 1L -> List.rev (d::acc)
        | e when e%2L = 0L -> calc (e::acc) (e/2L)
        | _ -> calc (startNum::acc) (startNum * 3L + 1L)
    let maxNum pl=
        let rec maxPairInternal acc pairList =
            match pairList with
            | [] -> acc
            | x::xs -> if (snd x) > (snd acc) then maxPairInternal x xs
                       else maxPairInternal acc xs
        maxPairInternal (0L,0) pl
        |> fst
    // if I lower this to like [2..99999] it will work.
    [2L..1000000L]
    |> Seq.map (fun n -> (n,(calc [] n)))
    |> Seq.maxBy (fun (n, lst) -> List.length lst)
    |> (fun x-> Console.WriteLine(x))
If you change List.map to Seq.map (and re-work maxPairInternal to iterate over a seq) that will probably help tons. Right now, you're manifesting all the data at once in a giant structure before processing the whole structure to get a single number result. It is much better to do this lazily via Seq, and just create one row, and compare it with the next row, and create a single row at a time and then discard it.
I don't have time to code my suggestion now, but let me know if you are still having trouble and I'll revisit this.
Stop trying to use lists everywhere, this isn't Haskell! And stop writing fst pair and snd pair everywhere, this isn't Lisp!
If you want a simple solution in F# you can do it directly like this without creating any intermediate data structures:
let rec f = function
    | 1L -> 0
    | n when n % 2L = 0L -> 1 + f(n / 2L)
    | n -> 1 + f(3L * n + 1L)
let rec g (li, i) = function
    | 1L -> i
    | n -> g (max (li, i) (f n, n)) (n - 1L)
let euler14 n = g (0, 1L) n
That takes around 15s on my netbook. If you want something more time efficient, reuse previous results via an array:
let rec inside (a : _ array) n =
    if n <= 1L || a.[int n] > 0s then a.[int n] else
        let p =
            if n &&& 1L = 0L then inside a (n >>> 1) else
                let n = 3L*n + 1L
                if n < int64 a.Length then inside a n else outside a n
        a.[int n] <- 1s + p
        1s + p
and outside (a : _ array) n =
    let n = if n &&& 1L = 0L then n >>> 1 else 3L*n + 1L
    1s + if n < int64 a.Length then inside a n else outside a n
let euler14 n =
    let a = Array.create (n+1) 0s
    let a = Array.Parallel.init (n+1) (fun n -> inside a (int64 n))
    let i = Array.findIndex (Array.reduce max a |> (=)) a
    i, a.[i]
That takes around 0.2s on my netbook.
Found this looking for Microsoft.FSharp.Core.Operators.Checked.
I'm just learning F#, so I thought I'd take the Project Euler 14 Challenge.
This uses recursion but not tail-recursion.
Takes about 3.1 sec for me, but has the advantage that I can almost understand it.
let Collatz (n:int64) = if n % 2L = 0L then n / 2L else n * 3L + 1L
let rec CollatzLength (current:int64) (acc:int) =
    match current with
    | 1L -> acc
    | _ -> CollatzLength (Collatz current) (acc + 1)
let collatzSeq (max:int64) =
    seq{
        for i in 1L..max do
            yield i, CollatzLength i 0
    }
let collatz = Seq.toList(collatzSeq 1000000L)
let result, steps = List.maxBy snd collatz
