How to write a pattern match in Ocaml so it is easy to scale? - functional-programming

I am learning Jason Hickey's Introduction to Objective Caml.
There is an exercise like this:
Exercise 4.3 Suppose we have a crypto-system based on the following substitution cipher, where each plain letter is encrypted according to the following table.
Plain | A B C D
--------------------
Encrypted | C A D B
For example, the string BAD would be encrypted as ACB.
Write a function check that, given a plaintext string s1 and a ciphertext string s2, returns true if, and only if, s2 is the ciphertext for s1. Your function should raise an exception if s1 is not a plaintext string. You may wish to refer to the string operations on page 8. How does your code scale as the alphabet gets larger? [emphasis added]
Basically, I wrote two functions with might-be-stupid-naive ways for this exercise.
I would like to ask for advice on my solutions first.
Then I would like to ask for hints for the scaled solution as highlighted in the exercise.
Using if else
let check_cipher_1 s1 s2 =
let len1 = String.length s1 in
let len2 = String.length s2 in
if len1 = len2 then
let rec check pos =
if pos = -1 then
true
else
let sub1 = s1.[pos] in
let sub2 = s2.[pos] in
match sub1 with
| 'A' -> (match sub2 with
|'C' -> check (pos-1)
| _ -> false)
| 'B' -> (match sub2 with
|'A' -> check (pos-1)
| _ -> false)
| 'C' -> (match sub2 with
|'D' -> check (pos-1)
| _ -> false)
| 'D' -> (match sub2 with
|'B' -> check (pos-1)
| _ -> false)
| _ -> false;
in
check (len1-1)
else
false
Using pure match everywhere
let check_cipher_2 s1 s2 =
let len1 = String.length s1 in
let len2 = String.length s2 in
match () with
| () when len1 = len2 ->
let rec check pos =
match pos with
| -1 -> true
| _ ->
let sub1 = s1.[pos] in
let sub2 = s2.[pos] in
(*http://stackoverflow.com/questions/257605/ocaml-match-expression-inside-another-one*)
match sub1 with
| 'A' -> (match sub2 with
|'C' -> check (pos-1)
| _ -> false)
| 'B' -> (match sub2 with
|'A' -> check (pos-1)
| _ -> false)
| 'C' -> (match sub2 with
|'D' -> check (pos-1)
| _ -> false)
| 'D' -> (match sub2 with
|'B' -> check (pos-1)
| _ -> false)
| _ -> false
in
check (len1-1)
| () -> false
Ok. The above two solutions are similar.
I produced these two, because in here http://www.quora.com/OCaml/What-is-the-syntax-for-nested-IF-statements-in-OCaml, some people say that if else is not prefered.
This is essentially the first time I ever wrote a not-that-simple function in my whole life. So I am really hungry for suggestions here.
For exmaple,
how can I improve these solutions?
should I prefer match over if else?
Am I designing the rec or use the rec correctly?
if that in check (len1-1) correct?
Scale it
The exercise asks How does your code scale as the alphabet gets larger?. I really don't have a clue for now. In Java, I would say I will have a map, then for each char in s1, I am looking s2 for the according char and to see whether it is the value in the map.
Any suggestions on this?

Here's a simple solution:
let tr = function
| 'A' -> 'C'
| 'B' -> 'A'
| 'C' -> 'D'
| 'D' -> 'B'
| _ -> failwith "not a plaintext"
let check ~tr s1 s2 = (String.map tr s1) = s2
check ~tr "BAD" "ACD"
you can add more letters by composing with tr. I.e.
let comp c1 c2 x = try (c1 x) with _ -> (c2 x)
let tr2 = comp tr (function | 'X' -> 'Y')

how can I improve these solutions?
You misuse indentation which makes the program much harder to read. Eliminating unnecessary tabs and move check to outer scope for readability:
let check_cipher_1 s1 s2 =
let rec check pos =
if pos = -1 then
true
else
let sub1 = s1.[pos] in
let sub2 = s2.[pos] in
match sub1 with
| 'A' -> (match sub2 with
|'C' -> check (pos-1)
| _ -> false)
| 'B' -> (match sub2 with
|'A' -> check (pos-1)
| _ -> false)
| 'C' -> (match sub2 with
|'D' -> check (pos-1)
| _ -> false)
| 'D' -> (match sub2 with
|'B' -> check (pos-1)
| _ -> false)
| _ -> false in
let len1 = String.length s1 in
let len2 = String.length s2 in
if len1 = len2 then
check (len1-1)
else false
should I prefer match over if else?
It depends on situations. If pattern matching is superficial as you demonstrate in the 2nd function (match () with | () when len1 = len2) then it brings no value compared to a simple if/else construct. If you pattern match on values, it is better than if/else and potentially shorter when you make use of advanced constructs. For example, you can shorten the function by matching on tuples:
let check_cipher_1 s1 s2 =
let rec check pos =
if pos = -1 then
true
else
match s1.[pos], s2.[pos] with
| 'A', 'C' | 'B', 'A'
| 'C', 'D' | 'D', 'B' -> check (pos-1)
| _ -> false in
let len1 = String.length s1 in
let len2 = String.length s2 in
len1 = len2 && check (len1 - 1)
Here we also use Or pattern to group patterns having the same output actions and replace an unnecessary if/else block by &&.
Am I designing the rec or use the rec correctly?
if that in check (len1-1) correct?
Your function looks nice. There's no better way than testing with a few inputs on OCaml top-level.
Scale it
The number of patterns grows linearly with the size of the alphabet. It's pretty nice IMO.

The simplest solution seems to be to just cipher the text and compare the result:
let cipher_char = function
| 'A' -> 'C'
| 'B' -> 'A'
| 'C' -> 'D'
| 'D' -> 'B'
| _ -> failwith "cipher_char"
let cipher = String.map cipher_char
let check_cipher s1 s2 = (cipher s1 = s2)
The cipher_char function scales linearly with the size of the alphabet. To make it a bit more compact and generic you could use a lookup table of some form, e.g.
(* Assume that only letters are needed *)
let cipher_mapping = "CADB"
let cipher_char c =
try cipher_mapping.[Char.code c - Char.code 'A']
with Invalid_argument _ -> failwith "cipher_char"

Related

Inserting into a family tree (a real old-school family tree)

I'm learning F# and I'm trying to solve this exercise but my solution feels really... heavy and I suspect that there might an easier way of solving this task.
The task goes like that:
Declare two mutually recursive functions:
insertChildOf: Name -> FamilyTree -> FamilyTree -> FamilyTree option
insertChildOfInList: Name -> FamilyTree -> Children -> Children option
The value of insertChildOf n c t = Some t when t is the family tree obtained from t by insertion of c as a child of the person with name n. The value is None if such an insertion is not possible. Similarly, the value of insertChildOfInList n c cs = Some cs when cs is the list of children obtained from cs by inserting c as a child of a person named n in one of the children in cs. The value is None if such an insertion is not possible. Note that the person named n may occur anywhere in the family tree.
The type for the tree:
type Name = string;;
type Sex =
| M // male
| F // female
type YearOfBirth = int;;
type FamilyTree = P of Name * Sex * YearOfBirth * Children
and Children = FamilyTree list;;
You can assume that the tree has the following proprieties.
All children are younger than their parent.
The children are arranged form the oldest to the youngest.
Make sure that the tree you return also has those parameters.
My code:
let rec insertChildOf n c t =
let (P (_, _, yobi, _)) = c
match t with
| (P (name, sex, yob, children)) when n = name && yob < yobi ->
match insertHere c children -infinity with
| Some a -> Some ( P (name, sex, yob, a ))
| None -> None
| (P (name, _, yob, children)) when n = name && yob > yobi -> None
| (P (n, s, y, children)) ->
match insertChildOfInList n c children with
| Some a -> Some ( P (n, s, y, a ))
| None -> None
and insertChildOfInList n c cs =
match cs with
| h::t ->
match insertChildOf n c h with
| Some h2 ->
match insertChildOfInList n c t with
| Some a -> Some (h2::a)
| None -> None
| None -> None
| [] -> Some []
and insertHere t cs acc =
match cs with
| [] -> Some [t]
| h::tail ->
let (P (_, _, yob, _)) = t
let (P (_, _, yob2, _)) = h
if acc < yob && yob < yob2 then Some (t::h::tail)
else if yob = yob2 then None
else // h::(insertHere t tail (float yob2))
match insertHere t tail (float yob2) with
| Some a -> Some (h::a )
| None -> None
Once again, my question is: Can I do it in any simpler way?
Also, is there any way to return None if we didn't find FamilyTree with the right name? The one way I can think of is making all the functions return one more extra value called (found) which would signal if the node with the correct name was found, and creating a wrapper that would check the value of that variable and return None if the found was false.
To be honest, it isnt really any shorter than yours.
I've not used any 'fancy' library methods (e.g. things like sortby, or Option.map as you havent either)
I can't guarentee that its completely correct
I've written it as explicit lambda functions to make the types explicit - this isnt normal.
I didnt use the tuple syntax you need, because i find tuple syntax clumsy in this example.
I've put some test cases at the end.
type Name = string
type Sex =
| M // male
| F // female
type YearOfBirth = int
type FamilyTree =
{ name: string
sex: Sex
yearOfBirth: YearOfBirth
children: Children
}
and Children = FamilyTree list
let makeFamilyTree name sex yearOfBirth children =
{ name = name
sex = sex
yearOfBirth = yearOfBirth
children = children
}
let rec addChild : FamilyTree -> Children -> Children =
fun newChild children ->
match children with
| [] ->
[newChild]
| eldest :: tail when eldest.yearOfBirth < newChild.yearOfBirth ->
newChild :: eldest :: tail
| eldest :: tail ->
eldest :: addChild newChild tail
let rec insertChildOf: Name -> FamilyTree -> FamilyTree -> FamilyTree option =
fun name childTree tree ->
let childrenMaybe =
if name = tree.name && tree.yearOfBirth < childTree.yearOfBirth then
addChild childTree tree.children
|> Some
else
insertChildOfInList name childTree tree.children
match childrenMaybe with
| Some children ->
Some { tree with children = children }
| None ->
None
and insertChildOfInList: Name -> FamilyTree -> Children -> Children option =
fun name childTree children ->
match children with
| [] ->
None
| eldest :: younger ->
match insertChildOf name childTree eldest with
| Some eldest' ->
Some (eldest' :: younger)
| _ ->
None
let jon =
{ name = "Jon"
sex = Sex.M
yearOfBirth = 1100
children = []
}
let jon2 =
insertChildOf
"Jon"
(makeFamilyTree
"Dave"
Sex.M
1120
[])
jon
let jon3 =
insertChildOf
"Dave"
(makeFamilyTree
"Mary"
Sex.F
1140
[])
jon2.Value
let jon4 =
insertChildOf
"Dave"
(makeFamilyTree
"Elizabeth"
Sex.F
1141
[])
jon3.Value
let jon5 =
insertChildOf
"Dave"
(makeFamilyTree
"George"
Sex.F
1142
[])
jon4.Value

Comparison in pattern matching in OCaml

I want to write a function set which changes the index i in the 'a array a to the value 'a v and raise an invalid_argument exception if i is bigger then the length-1 of the array.
I know that this can be done with if/then/else:
let set i v a =
let l = Array.length a in
if i > (l-1) then
raise (Invalid_argument "index out of bounds")
else
a.(i) <- v
However I want to know if this can be achieved in a pure functional approach, using pattern matching and the OCaml standard library. I don't how to compare values inside the pattern matching, I get an error at the marked line:
let set i v a =
let l = Array.length a in
match i with
>>>>>> | > l-1 -> raise (Invalid_argument "index out of bounds")
| _ -> a.(i) <- v
Is there a workaround to achieve this? perhaps with a helper function?
An if expression is a pure functional approach, and is also the right approach. In general, pattern matching has the purpose of deconstructing values; it's not an alternative to an if.
However, it's still possible to do this with pattern matching:
let set i v a =
let l = Array.length a in
match compare l i with
| 1 -> a.(i) <- v
| _ -> raise ## Invalid_argument "index out of bounds"
EDIT: Apparently, compare can return other values than -1, 0 and 1 so this version is not reliable (but you wouldn't use it anyway, would you?)...
Or, more efficiently
let set i v a =
let l = Array.length a in
match l > i with
| true -> a.(i) <- v
| false -> raise ## Invalid_argument "index out of bounds"
But then you realize that matching over a boolean is just an if. Which is why the correct version is still
let set i v a =
let l = Array.length a in
if l > i then a.(i) <- v
else raise ## Invalid_argument "index out of bounds"
BlackBeans' answer is correct. But also know that pattern-matching in OCaml can take advantage of conditional guards when you want to place a conditional on a pattern.
Consider the following simple example.
type species = Dog | Cat
type weight = int
type pet = Pet of species * weight
let sound = function
| Pet (Dog, weight) when weight < 10 -> "Yip!"
| Pet (Dog, _) -> "Woof!"
| Pet (Cat, weight) when weight > 100 -> "ROAR!!!"
| _ -> "Meow!"
The patterns Pet (Dog, weight) and Pet (Dog, _) would otherwise match the same values (with the latter not binding a name to the weight).
An equivalent with if/else would look like:
let sound = function
| Pet (Dog, weight) ->
if weight < 10 then "Yip!"
else "Woof!"
| Pet (Cat, weight) ->
if weight > 100 -> "ROAR!!!"
else "Meow!"
In many ways which you prefer boils down to opinion, and which you feel is more expressive.

How to create a cached recursive type?

open System
open System.Collections.Generic
type Node<'a>(expr:'a, symbol:int) =
member x.Expression = expr
member x.Symbol = symbol
override x.GetHashCode() = symbol
override x.Equals(y) =
match y with
| :? Node<'a> as y -> symbol = y.Symbol
| _ -> failwith "Invalid equality for Node."
interface IComparable with
member x.CompareTo(y) =
match y with
| :? Node<'a> as y -> compare symbol y.Symbol
| _ -> failwith "Invalid comparison for Node."
type Ty =
| Int
| String
| Tuple of Ty list
| Rec of Node<Ty>
| Union of Ty list
type NodeDict<'a> = Dictionary<'a,Node<'a>>
let get_nodify_tag =
let mutable i = 0
fun () -> i <- i+1; i
let nodify (dict: NodeDict<_>) x =
match dict.TryGetValue x with
| true, x -> x
| false, _ ->
let x' = Node(x,get_nodify_tag())
dict.[x] <- x'
x'
let d = Dictionary(HashIdentity.Structural)
let nodify_ty x = nodify d x
let rec int_string_stream =
Union
[
Tuple [Int; Rec (nodify_ty (int_string_stream))]
Tuple [String; Rec (nodify_ty (int_string_stream))]
]
In the above example, the int_string_stream gives a type error, but it neatly illustrates what I want to do. Of course, I want both sides to get tagged with the same symbol in nodify_ty. When I tried changing the Rec type to Node<Lazy<Ty>> I've found that it does not compare them correctly and each sides gets a new symbol which is useless to me.
I am working on a language, and the way I've dealt with storing recursive types up to now is by mapping Rec to an int and then substituting that with the related Ty in a dictionary whenever I need it. Currently, I am in the process of cleaning up the language, and would like to have the Rec case be Node<Ty> rather than an int.
At this point though, I am not sure what else could I try here. Could this be done somehow?
I think you will need to add some form of explicit "delay" to the discriminated union that represents your types. Without an explicit delay, you'll always end up fully evaluating the types and so there is no potential for closing the loop.
Something like this seems to work:
type Ty =
| Int
| String
| Tuple of Ty list
| Rec of Node<Ty>
| Union of Ty list
| Delayed of Lazy<Ty>
// (rest is as before)
let rec int_string_stream = Delayed(Lazy.Create(fun () ->
Union
[
Tuple [Int; Rec (nodify_ty (int_string_stream))]
Tuple [String; Rec (nodify_ty (int_string_stream))]
]))
This will mean that when you pattern match on Ty, you'll always need to check for Delayed, evaluate the lazy value and then pattern match again, but that's probably doable!

How to shorten this OCaml code?

I am just wondering how to shorten these code as I suspect it is too redundant
let get ename doc =
try Some (StringMap.find ename doc) with Not_found -> None;;
let get_double ename doc =
let element = get ename doc in
match element with
| None -> None
| Some (Double v) -> Some v
| _ -> raise Wrong_bson_type;;
let get_string ename doc =
let element = get ename doc in
match element with
| None -> None
| Some (String v) -> Some v
| _ -> raise Wrong_bson_type;;
let get_doc ename doc =
let element = get ename doc in
match element with
| None -> None
| Some (Document v) -> Some v
| _ -> raise Wrong_bson_type;;
So, basically, I have different types of values, and I put all those kinds of values into a map.
The code above is for getting according type of values out of the map. What I do is that for each type, I have a get. To get one type of value, I have to see a). whether it is there or not; b). whether it is that type indeed, if not, raise an exception.
But the code above seems to redundant as you can see. The only diff between each type's get is just the type itself.
How can I shorten this code?
You can do this:
let get_generic extract ename doc =
let element = get ename doc in
match element with
| None -> None
| Some v -> Some (extract v)
let get_double = get_generic (function Double v -> v | _ -> raise Wrong_bson_type)
let get_string = get_generic (function String v -> v | _ -> raise Wrong_bson_type)
let get_doc = get_generic (function Document v -> v | _ -> raise Wrong_bson_type)
EDIT:
To remove the redundant raise Wrong_bson_type (But it is ugly):
let get_generic extract ename doc = try
let element = get ename doc in
match element with
| None -> None
| Some v -> Some (extract v)
with Match_failure _ -> raise Wrong_bson_type
let get_double = get_generic (fun (Double v) -> v)
let get_string = get_generic (fun (String v) -> v)
let get_doc = get_generic (fun (Document v)-> v)
You can use GADT to do that:
If you define a type expr like this:
type _ expr =
| Document: document -> document expr
| String: string -> string expr
| Double: float -> float expr
You can write a function get like this:
let get : type v. v expr -> v = function
Document doc -> doc
| String s -> s
| Double d -> d
With GADTs:
type _ asked =
| TDouble : float asked
| TString : string asked
| TDocument : document asked
let get : type v. v asked -> string -> doc StringMap.t -> v option =
fun asked ename doc ->
try
Some (match asked, StringMap.find ename doc with
| TDouble, Double f -> f
| TString, String s -> s
| TDocument, Document d -> d)
with Not_found -> None
let get_double = get TDouble
let get_string = get TString
let get_document = get TDocument
If you can live with these extractor functions:
let extract_double = function
| Double v -> v
| _ -> raise Wrong_bson_type
let extract_string = function
| String v -> v
| _ -> raise Wrong_bson_type
let extract_doc = function
| Document v -> v
| _ -> raise Wrong_bson_type
Then you can use monadic style for the higher-order function, which allows you to keep your original definition of get:
let return x = Some x
let (>>=) mx f =
match mx with
| Some x -> f x
| None -> None
let get_with exf ename doc =
(get ename doc) >>= fun v ->
return (exf v)
let get_double = get_with extract_double
let get_string = get_with extract_string
let get_doc = get_with extract_doc
Less redundant and abstracts the side effect to generic bind and return operations.

Pattern matching and constructors

Why do i get errors when I write this kind of pattern matching :
type t = A of int | B of float
let f = function
| (A i | B f) -> true
| _ -> false
or
let f = function
| A i | B f -> true
| _ -> false
Error: Variable f must occur on both sides of this | pattern
let f = function
| (A i | B i) -> true
| _ -> false
or
let f = function
| A i | B i -> true
| _ -> false
Error: This pattern matches values of type ints of type float
but a pattern was expected which matches value
If you provide a single right-hand side for multiple patterns (as you do), OCaml requires that the patterns consistently bind to pattern variables.
In the first situation,
match ... with
| A i | B f -> ...
...
the patterns don't agree on the variables they bind to: the first pattern binds to i, while the second binds to f.
In the second situation,
match ... with
| A i | B i -> ...
...
the patterns don't agree on the type of values to bind to their variables: the first pattern binds a value of type int to i, while the second binds a value of type float to i.
The only way in which these two pattern can consistently bind to variables is not to bind to any variables at all:
match ... with
| A _ | B _ -> ...
...
The complete example then becomes
type t = A of int | B of float
let f = function
| A _ | B _ -> true
| _ -> false
(But note that the last arm of the pattern match is superfluous as the first two pattern already exhaustively match all values of your type t. Hence, we get:
let f = function
| A _ | B _ -> true
This of course is equivalent to writing let f _ = true.)
In Or pattern (| pattern), you lose track of which constructors you are in. Therefore, you need to bind the same set of variables to work without referring to constructors.
And OCaml is strongly-typed; a value i cannot have both type int and type float.
If type t has more than two cases, you should write:
let f = function
| A _ | B _ -> true
| _ -> false
otherwise:
let f = function
| A _ | B _ -> true
is enough since pattern matching is already exhaustive.
I agree that Or pattern is quite restrictive, but sometimes it is helpful when you have symmetric cases in your function:
type num =
| Int of int
| Float of float
let add s1 s2 =
match s1, s2 with
| Int i1, Int i2 -> Int (i1 + i2)
| Int i, Float f | Float f, Int i -> Float (float i +. f)
| Float f1, Float f2 -> Float (f1 +. f2)

Resources